NUMERICAL ANALYSIS WITH
APPLICATIONS IN MECHANICS
AND ENGINEERING
PETRE TEODORESCU
NICOLAE-DORU STĂNESCU
NICOLAE PANDREA
Copyright © 2013 by The Institute of Electrical and Electronics Engineers, Inc.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey. All rights reserved
Published simultaneously in Canada
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any
means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under
Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the
Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center,
Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at
www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions
Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201)
748-6008, or online at http://www.wiley.com/go/permission.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in
preparing this book, they make no representations or warranties with respect to the accuracy or completeness of
the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a
particular purpose. No warranty may be created or extended by sales representatives or written sales materials.
The advice and strategies contained herein may not be suitable for your situation. You should consult with a
professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other
commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services or for technical support, please contact our Customer
Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax
(317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be
available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.
Library of Congress Cataloging-in-Publication Data:
Teodorescu, P. P.
Numerical Analysis with Applications in Mechanics and Engineering / Petre Teodorescu,
Nicolae-Doru Stanescu, Nicolae Pandrea.
pages cm
ISBN 978-1-118-07750-4 (cloth)
1. Numerical analysis. 2. Engineering mathematics. I. Stanescu, Nicolae-Doru. II. Pandrea,
Nicolae. III. Title.
QA297.T456 2013
620.001 518–dc23
2012043659
Printed in the United States of America
ISBN: 9781118077504
10 9 8 7 6 5 4 3 2 1
CONTENTS
Preface xi
1 Errors in Numerical Analysis 1
1.1 Input Data Errors, 1
1.2 Approximation Errors, 2
1.3 Round-Off Errors, 3
1.4 Propagation of Errors, 3
1.4.1 Addition, 3
1.4.2 Multiplication, 5
1.4.3 Inversion of a Number, 7
1.4.4 Division of Two Numbers, 7
1.4.5 Raising to a Negative Integer Power, 7
1.4.6 Taking the Root of pth Order, 7
1.4.7 Subtraction, 8
1.4.8 Computation of Functions, 8
1.5 Applications, 8
Further Reading, 14
2 Solution of Equations 17
2.1 The Bipartition (Bisection) Method, 17
2.2 The Chord (Secant) Method, 20
2.3 The Tangent Method (Newton), 26
2.4 The Contraction Method, 37
2.5 The Newton–Kantorovich Method, 42
2.6 Numerical Examples, 46
2.7 Applications, 49
Further Reading, 52
3 Solution of Algebraic Equations 55
3.1 Determination of Limits of the Roots of Polynomials, 55
3.2 Separation of Roots, 60
3.3 Lagrange’s Method, 69
3.4 The Lobachevski–Graeffe Method, 72
3.4.1 The Case of Distinct Real Roots, 72
3.4.2 The Case of a Pair of Complex Conjugate Roots, 74
3.4.3 The Case of Two Pairs of Complex Conjugate Roots, 75
3.5 The Bernoulli Method, 76
3.6 The Birge–Viète Method, 79
3.7 Lin Methods, 79
3.8 Numerical Examples, 82
3.9 Applications, 94
Further Reading, 109
4 Linear Algebra 111
4.1 Calculation of Determinants, 111
4.1.1 Use of Definition, 111
4.1.2 Use of Equivalent Matrices, 112
4.2 Calculation of the Rank, 113
4.3 Norm of a Matrix, 114
4.4 Inversion of Matrices, 123
4.4.1 Direct Inversion, 123
4.4.2 The Gauss–Jordan Method, 124
4.4.3 The Determination of the Inverse Matrix by its Partition, 125
4.4.4 Schur’s Method of Inversion of Matrices, 127
4.4.5 The Iterative Method (Schulz), 128
4.4.6 Inversion by Means of the Characteristic Polynomial, 131
4.4.7 The Frame–Fadeev Method, 131
4.5 Solution of Linear Algebraic Systems of Equations, 132
4.5.1 Cramer’s Rule, 132
4.5.2 Gauss’s Method, 133
4.5.3 The Gauss–Jordan Method, 134
4.5.4 The LU Factorization, 135
4.5.5 The Schur Method of Solving Systems of Linear Equations, 137
4.5.6 The Iteration Method (Jacobi), 142
4.5.7 The Gauss–Seidel Method, 147
4.5.8 The Relaxation Method, 149
4.5.9 The Monte Carlo Method, 150
4.5.10 Infinite Systems of Linear Equations, 152
4.6 Determination of Eigenvalues and Eigenvectors, 153
4.6.1 Introduction, 153
4.6.2 Krylov’s Method, 155
4.6.3 Danilevski’s Method, 157
4.6.4 The Direct Power Method, 160
4.6.5 The Inverse Power Method, 165
4.6.6 The Displacement Method, 166
4.6.7 Leverrier’s Method, 166
4.6.8 The L–R (Left–Right) Method, 166
4.6.9 The Rotation Method, 168
4.7 QR Decomposition, 169
4.8 The Singular Value Decomposition (SVD), 172
4.9 Use of the Least Squares Method in Solving the Linear
Overdetermined Systems, 174
4.10 The Pseudo-Inverse of a Matrix, 177
4.11 Solving of the Underdetermined Linear Systems, 178
4.12 Numerical Examples, 178
4.13 Applications, 211
Further Reading, 269
5 Solution of Systems of Nonlinear Equations 273
5.1 The Iteration Method (Jacobi), 273
5.2 Newton’s Method, 275
5.3 The Modified Newton’s Method, 276
5.4 The Newton–Raphson Method, 277
5.5 The Gradient Method, 277
5.6 The Method of Entire Series, 280
5.7 Numerical Example, 281
5.8 Applications, 287
Further Reading, 304
6 Interpolation and Approximation of Functions 307
6.1 Lagrange’s Interpolation Polynomial, 307
6.2 Taylor Polynomials, 311
6.3 Finite Differences: Generalized Power, 312
6.4 Newton’s Interpolation Polynomials, 317
6.5 Central Differences: Gauss’s Formulae, Stirling’s Formula, Bessel’s
Formula, Everett’s Formulae, 322
6.6 Divided Differences, 327
6.7 Newton-Type Formula with Divided Differences, 331
6.8 Inverse Interpolation, 332
6.9 Determination of the Roots of an Equation by Inverse Interpolation, 333
6.10 Interpolation by Spline Functions, 335
6.11 Hermite’s Interpolation, 339
6.12 Chebyshev’s Polynomials, 340
6.13 Mini–Max Approximation of Functions, 344
6.14 Almost Mini–Max Approximation of Functions, 345
6.15 Approximation of Functions by Trigonometric Functions (Fourier), 346
6.16 Approximation of Functions by the Least Squares, 352
6.17 Other Methods of Interpolation, 354
6.17.1 Interpolation with Rational Functions, 354
6.17.2 The Method of Least Squares with Rational Functions, 355
6.17.3 Interpolation with Exponentials, 355
6.18 Numerical Examples, 356
6.19 Applications, 363
Further Reading, 374
7 Numerical Differentiation and Integration 377
7.1 Introduction, 377
7.2 Numerical Differentiation by Means of an Expansion into a Taylor Series, 377
7.3 Numerical Differentiation by Means of Interpolation Polynomials, 380
7.4 Introduction to Numerical Integration, 382
7.5 The Newton–Côtes Quadrature Formulae, 384
7.6 The Trapezoid Formula, 386
7.7 Simpson’s Formula, 389
7.8 Euler’s and Gregory’s Formulae, 393
7.9 Romberg’s Formula, 396
7.10 Chebyshev’s Quadrature Formulae, 398
7.11 Legendre’s Polynomials, 400
7.12 Gauss’s Quadrature Formulae, 405
7.13 Orthogonal Polynomials, 406
7.13.1 Legendre Polynomials, 407
7.13.2 Chebyshev Polynomials, 407
7.13.3 Jacobi Polynomials, 408
7.13.4 Hermite Polynomials, 408
7.13.5 Laguerre Polynomials, 409
7.13.6 General Properties of the Orthogonal Polynomials, 410
7.14 Quadrature Formulae of Gauss Type Obtained by Orthogonal Polynomials, 412
7.14.1 Gauss–Jacobi Quadrature Formulae, 413
7.14.2 Gauss–Hermite Quadrature Formulae, 414
7.14.3 Gauss–Laguerre Quadrature Formulae, 415
7.15 Other Quadrature Formulae, 417
7.15.1 Gauss Formulae with Imposed Points, 417
7.15.2 Gauss Formulae in which the Derivatives of the Function Also
Appear, 418
7.16 Calculation of Improper Integrals, 420
7.17 Kantorovich’s Method, 422
7.18 The Monte Carlo Method for Calculation of Definite Integrals, 423
7.18.1 The One-Dimensional Case, 423
7.18.2 The Multidimensional Case, 425
7.19 Numerical Examples, 427
7.20 Applications, 435
Further Reading, 447
8 Integration of Ordinary Differential Equations
and of Systems of Ordinary Differential Equations 451
8.1 State of the Problem, 451
8.2 Euler’s Method, 454
8.3 Taylor Method, 457
8.4 The Runge–Kutta Methods, 458
8.5 Multistep Methods, 462
8.6 Adams’s Method, 463
8.7 The Adams–Bashforth Methods, 465
8.8 The Adams–Moulton Methods, 467
8.9 Predictor–Corrector Methods, 469
8.9.1 Euler’s Predictor–Corrector Method, 469
8.9.2 Adams’s Predictor–Corrector Methods, 469
8.9.3 Milne’s Fourth-Order Predictor–Corrector Method, 470
8.9.4 Hamming’s Predictor–Corrector Method, 470
8.10 The Linear Equivalence Method (LEM), 471
8.11 Considerations about the Errors, 473
8.12 Numerical Example, 474
8.13 Applications, 480
Further Reading, 525
9 Integration of Partial Differential Equations and of
Systems of Partial Differential Equations 529
9.1 Introduction, 529
9.2 Partial Differential Equations of First Order, 529
9.2.1 Numerical Integration by Means of Explicit Schemata, 531
9.2.2 Numerical Integration by Means of Implicit Schemata, 533
9.3 Partial Differential Equations of Second Order, 534
9.4 Partial Differential Equations of Second Order of Elliptic Type, 534
9.5 Partial Differential Equations of Second Order of Parabolic Type, 538
9.6 Partial Differential Equations of Second Order of Hyperbolic Type, 543
9.7 Point Matching Method, 546
9.8 Variational Methods, 547
9.8.1 Ritz’s Method, 549
9.8.2 Galerkin’s Method, 551
9.8.3 Method of the Least Squares, 553
9.9 Numerical Examples, 554
9.10 Applications, 562
Further Reading, 575
10 Optimizations 577
10.1 Introduction, 577
10.2 Minimization Along a Direction, 578
10.2.1 Localization of the Minimum, 579
10.2.2 Determination of the Minimum, 580
10.3 Conjugate Directions, 583
10.4 Powell’s Algorithm, 585
10.5 Methods of Gradient Type, 585
10.5.1 The Gradient Method, 585
10.5.2 The Conjugate Gradient Method, 587
10.5.3 Solution of Systems of Linear Equations by Means of Methods of
Gradient Type, 589
10.6 Methods of Newton Type, 590
10.6.1 Newton’s Method, 590
10.6.2 Quasi-Newton Method, 592
10.7 Linear Programming: The Simplex Algorithm, 593
10.7.1 Introduction, 593
10.7.2 Formulation of the Problem of Linear Programming, 595
10.7.3 Geometrical Interpretation, 597
10.7.4 The Primal Simplex Algorithm, 597
10.7.5 The Dual Simplex Algorithm, 599
10.8 Convex Programming, 600
10.9 Numerical Methods for Problems of Convex Programming, 602
10.9.1 Method of Conditional Gradient, 602
10.9.2 Method of Gradient’s Projection, 602
10.9.3 Method of Possible Directions, 603
10.9.4 Method of Penalizing Functions, 603
10.10 Quadratic Programming, 603
10.11 Dynamic Programming, 605
10.12 Pontryagin’s Principle of Maximum, 607
10.13 Problems of Extremum, 609
10.14 Numerical Examples, 611
10.15 Applications, 623
Further Reading, 626
Index 629
PREFACE
In writing this book, the authors' wish has been to create a bridge between the mathematical and the technical disciplines, one that requires strong mathematical tools in the area of numerical analysis. Unlike other books in this area, this interdisciplinary work links the applicative side of numerical methods, where mathematical results are used without understanding their proofs, to the theoretical side of these methods, where each statement is rigorously demonstrated.
Each chapter is followed by problems of mechanics, physics, or engineering. Each problem is first stated in its mechanical or technical form. Then the mathematical model is set up, emphasizing the physical magnitudes playing the part of unknown functions and the laws that lead to the mathematical problem. The solution is then obtained by applying the mathematical methods described in the corresponding theoretical presentation. Finally, a mechanical, physical, and technical interpretation of the solution is provided, leading to a complete understanding of the studied phenomenon.
The book is organized into 10 chapters. Each of them begins with a theoretical presentation,
which is based on practical computation—the “know-how” of the mathematical method—and ends
with a range of applications.
The book contains some personal results of the authors, which have been found to be beneficial
to readers.
The authors are grateful to Mrs. Eng. Ariadna–Carmen Stan for her valuable help in the pre-
sentation of this book. The excellent cooperation from the team of John Wiley & Sons, Hoboken,
USA, is gratefully acknowledged.
The prerequisites of this book are courses in elementary analysis and algebra, acquired by a
student in a technical university. The book is addressed to a broad audience—to all those interested
in using mathematical models and methods in various fields such as mechanics, physics, and civil
and mechanical engineering; people involved in teaching, research, or design; as well as students.
Petre Teodorescu
Nicolae-Doru Stănescu
Nicolae Pandrea
1
ERRORS IN NUMERICAL ANALYSIS
In this chapter, we deal with the errors most frequently encountered in numerical analysis, that is, input data errors, approximation errors, round-off errors, and the propagation of errors.
1.1 INPUT DATA ERRORS
Input data errors usually appear when the input data are obtained from measurements or experiments. In such a case, the errors in the estimation of the input data are propagated, by means of the calculation algorithm, to the output data.
We define in what follows the notion of stability of errors.
Definition 1.1 A calculation process $P$ is stable with respect to errors if, for any $\varepsilon > 0$, there exists $\delta > 0$ such that for any two sets $I_1$ and $I_2$ of input data with $\|I_1 - I_2\|_i < \delta$, the two output sets $S_1$ and $S_2$, corresponding to $I_1$ and $I_2$, respectively, verify the relation $\|S_1 - S_2\|_e < \varepsilon$.

Observation 1.1 The two norms $\|\cdot\|_i$ and $\|\cdot\|_e$ of the input and output quantities, respectively, which occur in Definition 1.1, depend on the process considered.

Intuitively, according to Definition 1.1, the calculation process is stable if small variations of the input data produce small variations of the output data.

Hence, we must characterize the stable calculation processes. Let us consider that the calculation process $P$ is characterized by a family $f_k$ of functions defined on a set of input data, with values in a set of output data. We consider such a vector function of vector variable, $f_k : D \to \mathbb{R}^n$, where $D$ is a domain in $\mathbb{R}^m$ (we suppose that there are $m$ input data and $n$ output data).

Definition 1.2 $f : D \to \mathbb{R}^n$ is a Lipschitz function (has the Lipschitz property) if there exists a constant $m > 0$ such that $\|f(x) - f(y)\| < m\|x - y\|$ for any $x, y \in D$ (the first norm is in $\mathbb{R}^n$ and the second one in $\mathbb{R}^m$).
It is easy to see that a calculation process characterized by Lipschitz functions is a stable one.
In addition, a function with the Lipschitz property is continuous (even uniformly continuous), but the converse does not hold; for example, the function $f : \mathbb{R}_+ \to \mathbb{R}_+$, $f(x) = \sqrt{x}$, is continuous but not Lipschitz. Indeed, let us suppose that $f(x) = \sqrt{x}$ is Lipschitz, hence that there exists a positive constant $m > 0$ such that

$$|f(x) - f(y)| < m|x - y|, \quad (\forall)\, x, y \in \mathbb{R}_+. \tag{1.1}$$

Let us choose $x$ and $y$ such that $0 < y < x < 1/(4m^2)$. Expression (1.1) leads to

$$\sqrt{x} - \sqrt{y} < m(\sqrt{x} - \sqrt{y})(\sqrt{x} + \sqrt{y}), \tag{1.2}$$

from which we get

$$1 < m(\sqrt{x} + \sqrt{y}). \tag{1.3}$$

From the choice of $x$ and $y$, it follows that

$$\sqrt{x} + \sqrt{y} < \sqrt{\frac{1}{4m^2}} + \sqrt{\frac{1}{4m^2}} = \frac{1}{m}, \tag{1.4}$$

so that relations (1.3) and (1.4) lead to

$$1 < m \cdot \frac{1}{m} = 1, \tag{1.5}$$

which is absurd. Hence, the continuous function $f : \mathbb{R}_+ \to \mathbb{R}_+$, $f(x) = \sqrt{x}$ is not a Lipschitz one.
1.2 APPROXIMATION ERRORS
Approximation errors must be accepted in the very design of the algorithms because of various objective considerations.
Let us determine the limit of a sequence using a computer; the sequence is supposed to be convergent. Let the sequence $\{x_n\}_{n\in\mathbb{N}}$ be defined by the relation

$$x_{n+1} = \frac{1}{1 + x_n^2}, \quad n \in \mathbb{N}, \ x_0 \in \mathbb{R}. \tag{1.6}$$

We observe that the terms of the sequence are positive, except possibly $x_0$. The limit of this sequence, denoted by $x$, is the positive root of the equation

$$x = \frac{1}{1 + x^2}. \tag{1.7}$$

If we wish to determine $x$ with two exact decimal digits, then we take an arbitrary value of $x_0$, for example, $x_0 = 0$, and calculate the successive terms of the sequence $\{x_n\}$ (Table 1.1).
TABLE 1.1 Calculation of x with Two Exact Decimal Digits
n xn n xn n xn n xn
0 0 4 0.6028 8 0.6705 12 0.6804
1 1 5 0.7290 9 0.6899 13 0.6836
2 0.5 6 0.6530 10 0.6775 14 0.6815
3 0.8 7 0.7011 11 0.6854 15 0.6828
We obtain $x = 0.68\ldots$
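The experiment is easy to reproduce on a computer. The sketch below is an assumed Python transcription (not part of the original text), which stops when two successive terms agree after rounding to two decimal digits:

```python
# Fixed-point iteration for x = 1/(1 + x^2), reproducing Table 1.1.
# Stopping rule (an assumption for this sketch): accept x as determined
# with two exact decimal digits when two successive terms round alike.
def fixed_point(x0=0.0, max_iter=100):
    x = x0
    for n in range(1, max_iter + 1):
        x_new = 1.0 / (1.0 + x * x)
        if round(x_new, 2) == round(x, 2):
            return x_new, n
        x = x_new
    return x, max_iter

x, steps = fixed_point()
print(f"x = {x:.4f} after {steps} iterations")   # x = 0.68..., as in the text
```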
1.3 ROUND-OFF ERRORS
Round-off errors are due to the mode of representation of the data in the computer. For instance, the number 0.8125 in base 10 is represented in base 2 as $0.8125 = 0.1101_2$, and the number 0.75 as $0.11_2$. Let us suppose that we have a computer that works with three significant binary digits. The sum $0.8125 + 0.75$ becomes

$$1.5625 = 0.1101_2 + 0.11_2 \approx 0.110_2 + 0.11_2 = 1.100_2 = 1.5. \tag{1.8}$$

Such errors may also appear because of the choice of inadequate data types in the programs run on the computer.
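The three-digit example (1.8) can be imitated directly by forcing a short binary mantissa; the helper below is an illustration assumed by the editor, not a routine from the book:

```python
import math

def round_binary(x, p):
    """Round x to p significant binary digits, mimicking a machine that
    works with a p-digit binary mantissa."""
    if x == 0.0:
        return 0.0
    e = math.floor(math.log2(abs(x))) + 1   # exponent with mantissa in [0.5, 1)
    m = x / 2.0 ** e                        # mantissa
    m = round(m * 2 ** p) / 2 ** p          # keep p binary digits
    return m * 2.0 ** e

a, b = 0.8125, 0.75                              # 0.1101_2 and 0.11_2
print(round_binary(a, 3) + round_binary(b, 3))   # 1.5 instead of 1.5625, cf. (1.8)
```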
1.4 PROPAGATION OF ERRORS
Let us consider the number $x$ and let $\bar{x}$ be an approximation of it.

Definition 1.3

(i) We call absolute error the expression

$$E = x - \bar{x}. \tag{1.9}$$

(ii) We call relative error the expression

$$R = \frac{x - \bar{x}}{x}. \tag{1.10}$$
1.4.1 Addition
Let $x_1, x_2, \ldots, x_n$ be the numbers whose relative errors are $R_1, R_2, \ldots, R_n$ and whose absolute errors are $E_1, E_2, \ldots, E_n$.

The relative error of the sum is

$$R\left(\sum_{i=1}^{n} x_i\right) = \frac{\sum_{i=1}^{n} E_i}{\sum_{i=1}^{n} x_i} \tag{1.11}$$

and we may write the relation

$$\min_{i=\overline{1,n}} |R_i| \le \left|R\left(\sum_{i=1}^{n} x_i\right)\right| \le \max_{i=\overline{1,n}} |R_i|, \tag{1.12}$$
that is, the modulus of the relative error of the sum lies between the smallest and the largest of the moduli of the relative errors of the terms.

Thus, if the terms $x_1, x_2, \ldots, x_n$ are positive and of the same order of magnitude,

$$\frac{\max_{i=\overline{1,n}} x_i}{\min_{i=\overline{1,n}} x_i} < 10, \tag{1.13}$$

then we must take the same number of significant digits for each term $x_i$, $i = \overline{1,n}$, the same number of significant digits occurring in the sum too.

If the numbers $x_1, x_2, \ldots, x_n$ differ much among themselves, then the number of significant digits after the decimal point is given by the greatest number $x_i$ (we suppose that $x_i > 0$, $i = \overline{1,n}$). For instance, if we have to add the numbers

$$x_1 = 100.32, \quad x_2 = 0.57381, \tag{1.14}$$

both numbers having five significant digits, then we round off $x_2$ to two decimal digits (as $x_1$) and write

$$x_1 + x_2 = 100.32 + 0.57 = 100.89. \tag{1.15}$$
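A quick numerical check of the bound (1.12) for positive terms with errors of the same sign (the data below are made up for illustration; they are not from the book):

```python
# Relation (1.11): R(sum) = (sum of Ei) / (sum of xi); for positive terms
# and same-sign errors it is a weighted mean of the Ri, hence (1.12).
xs = [100.32, 57.381, 3.1416]            # exact values (assumed)
es = [0.004, 0.002, 0.0005]              # absolute errors Ei (assumed)
rs = [e / x for e, x in zip(es, xs)]     # relative errors Ri

r_sum = sum(es) / sum(xs)
assert min(abs(r) for r in rs) <= abs(r_sum) <= max(abs(r) for r in rs)
print(f"R(sum) = {r_sum:.2e}")
```

Note that for errors of mixed signs the lower bound may fail: as remarked below, errors can then compensate and make the modulus of the relative error of the sum smaller than every $|R_i|$.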
It is observed that addition may result in a compensation of the errors, in the sense that the absolute error of the sum is, in general, smaller than the sum of the absolute errors of the terms.

We consider that the absolute error has a Gauss distribution for each of the terms $x_i$, $i = \overline{1,n}$, given by the distribution density

$$\varphi(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{x^2}{2\sigma^2}}, \tag{1.16}$$

from which we obtain the distribution function

$$\Phi(x) = \int_{-\infty}^{x} \varphi(t)\,\mathrm{d}t, \tag{1.17}$$

with the properties

$$\Phi(-\infty) = 0, \quad \Phi(\infty) = 1, \quad \Phi(x) \in (0, 1), \ -\infty < x < \infty. \tag{1.18}$$

The probability that $x$ is contained between $-x_0$ and $x_0$, with $x_0 > 0$, is

$$P(|x| < x_0) = \Phi(x_0) - \Phi(-x_0) = \int_{-x_0}^{x_0} \varphi(t)\,\mathrm{d}t = \frac{\sqrt{2}}{\sigma\sqrt{\pi}} \int_{0}^{x_0} e^{-\frac{t^2}{2\sigma^2}}\,\mathrm{d}t. \tag{1.19}$$

Because $\varphi(x)$ is an even function, it follows that the mean value of a variable with a normal Gauss distribution is

$$x_{\mathrm{med}} = \int_{-\infty}^{\infty} x\varphi(x)\,\mathrm{d}x = 0, \tag{1.20}$$

while its mean square deviation reads

$$(x^2)_{\mathrm{med}} = \int_{-\infty}^{\infty} x^2\varphi(x)\,\mathrm{d}x = \sigma^2. \tag{1.21}$$

Usually, we choose $\sigma$ as the root mean square of the relative errors,

$$\sigma = \sigma_{\mathrm{RMS}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} R_i^2}. \tag{1.22}$$
1.4.2 Multiplication

Let us consider two numbers $x_1, x_2$ whose relative errors are $R_1, R_2$ and whose approximations are $\bar{x}_1, \bar{x}_2$, respectively. We have

$$x_1 x_2 = \bar{x}_1(1 + R_1)\bar{x}_2(1 + R_2) = \bar{x}_1\bar{x}_2(1 + R_1 + R_2 + R_1 R_2). \tag{1.23}$$

Because $R_1$ and $R_2$ are small, we may consider $R_1 R_2 \approx 0$, hence

$$x_1 x_2 = \bar{x}_1\bar{x}_2(1 + R_1 + R_2), \tag{1.24}$$

so that the relative error of the product of the two numbers reads

$$R(x_1 x_2) = R_1 + R_2. \tag{1.25}$$

Similarly, for $n$ numbers $x_1, x_2, \ldots, x_n$, of relative errors $R_1, R_2, \ldots, R_n$, we have

$$R\left(\prod_{i=1}^{n} x_i\right) = \sum_{i=1}^{n} R_i. \tag{1.26}$$
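Relation (1.25) is easy to verify numerically (an illustrative sketch with assumed values; the convention $\bar{x} = x(1 - R)$ of Definition 1.3 is used):

```python
# Verify R(x1*x2) = R1 + R2 - R1*R2 ≈ R1 + R2, relations (1.32) and (1.25).
x1, x2 = 3.85, 12.6                       # exact values (assumed)
r1, r2 = 2e-4, -5e-4                      # relative errors (assumed)
x1b, x2b = x1 * (1 - r1), x2 * (1 - r2)   # approximations

r_prod = (x1 * x2 - x1b * x2b) / (x1 * x2)
print(f"R1 + R2  = {r1 + r2:.6e}")
print(f"R(x1*x2) = {r_prod:.6e}")         # differs only by the tiny term R1*R2
```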
Let $x$ be a number that may be written in the form

$$x = x^* \times 10^r, \quad 1 \le x^* < 10, \quad 10^r \le x < 10^{r+1}, \quad r \in \mathbb{Z}. \tag{1.27}$$

If $x$ has $n$ significant digits, the absolute error is

$$|E| \le 10^{r-n+1}, \tag{1.28}$$

while the relative one is

$$|R| = \frac{|E|}{x} \le \frac{10^{r-n+1}}{x^* \times 10^r} = \frac{10^{-n+1}}{x^*} \le 10^{-n+1}. \tag{1.29}$$

If $\bar{x}$ is the round-off of $x$ at $n$ significant digits, then

$$|E| \le 5 \times 10^{r-n}, \quad |R| \le \frac{5}{x} \times 10^{r-n}. \tag{1.30}$$

The error of the last significant digit, the $n$th, is

$$\varepsilon = \frac{E}{10^{r-n+1}} = \frac{xR}{10^{r-n+1}} = x^* R \times 10^{n-1}. \tag{1.31}$$

Let now $x_1, x_2$ be two numbers of relative errors $R_1, R_2$, and let $R$ be the relative error of the product $x_1 x_2$. We have

$$R = \frac{x_1 x_2 - \bar{x}_1\bar{x}_2}{x_1 x_2} = R_1 + R_2 - R_1 R_2. \tag{1.32}$$

Moreover, $|R|$ takes its greatest value if $R_1$ and $R_2$ are negative; hence, we may write

$$|R| \le 5\left(\frac{1}{x_1^*} + \frac{1}{x_2^*}\right) \times 10^{-n} + \frac{25}{x_1^* x_2^*} \times 10^{-2n}, \tag{1.33}$$
where the error of the digit in the $n$th position is

$$|\varepsilon(x_1 x_2)| \le \frac{(x_1 x_2)^*}{2}\left(\frac{1}{x_1^*} + \frac{1}{x_2^*}\right) + \frac{5}{2}\cdot\frac{(x_1 x_2)^*}{x_1^* x_2^*} \times 10^{-n}. \tag{1.34}$$

On the other hand,

$$(x_1 x_2)^* = x_1^* x_2^* \times 10^{-p}, \tag{1.35}$$

where $p = 0$ or $p = 1$, the most disadvantageous case being that described by $p = 0$.

The function

$$\varphi(x_1^*, x_2^*) = \frac{10^{-p}}{2}\left(x_1^* + x_2^* + 5 \times 10^{-n}\right), \tag{1.36}$$

defined for $1 \le x_1^* < 10$, $1 \le x_2^* < 10$, $1 \le x_1^* x_2^* < 10$, attains its maximum on the frontier of the above domain, that is, for $x_1^* = 1$, $x_2^* = 10$ or $x_1^* = 10$, $x_2^* = 1$. It follows that

$$\varphi(x_1^*, x_2^*) \le \frac{10^{-p}}{2}\left(11 + 5 \times 10^{-n}\right), \tag{1.37}$$

and hence

$$|\varepsilon(x_1 x_2)| \le \frac{11}{2} + \frac{5}{2} \times 10^{-n} < 6, \tag{1.38}$$

so that the error of the $n$th digit of the result amounts to at most six units.

If $x_1 = x_2 = x$, then the most disadvantageous case is given by

$$(x^*)^2 = (x^2)^* = 10, \tag{1.39}$$

when

$$|\varepsilon(x^2)| \le 10^{\frac{1}{2}} + \frac{5}{2} \times 10^{-n} < 4, \tag{1.40}$$

that is, the $n$th digit of $x^2$ is given with an approximation of four units.

Let now $x_1, \ldots, x_m$ be $m$ numbers; then

$$\left|\varepsilon\left(\prod_{i=1}^{m} x_i\right)\right| \le \frac{(x_1 \cdots x_m)^*}{2 \times 5 \times 10^{-n}}\left[\prod_{i=1}^{m}\left(1 + \frac{5 \times 10^{-n}}{x_i^*}\right) - 1\right], \tag{1.41}$$

the most disadvantageous case being that in which $m - 1$ of the numbers $x_i^*$ are equal to 1, while one is equal to 10. In this case, we have

$$\left|\varepsilon\left(\prod_{i=1}^{m} x_i\right)\right| \le \frac{5}{5 \times 10^{-n}}\left[\left(1 + 5 \times 10^{-n}\right)^{m-1}\left(1 + \frac{5 \times 10^{-n}}{10}\right) - 1\right]. \tag{1.42}$$

If all the $m$ numbers are equal, $x_i = x$, $i = \overline{1,m}$, then the most disadvantageous situation appears for $(x^*)^m = (x^m)^* = 10$, and hence it follows that

$$|\varepsilon(x^m)| \le \frac{5}{5 \times 10^{-n}}\left[\left(1 + \frac{5 \times 10^{-n}}{10^{1/m}}\right)^m - 1\right]. \tag{1.43}$$
1.4.3 Inversion of a Number

Let $x$ be a number, $\bar{x}$ its approximation, and $R$ its relative error. We may write

$$\frac{1}{x} = \frac{1}{\bar{x}(1 + R)} = \frac{1}{\bar{x}}\left(1 - R + R^2 - R^3 + \cdots\right) \approx \frac{1}{\bar{x}}(1 - R), \tag{1.44}$$

hence

$$R\left(\frac{1}{x}\right) = \frac{\dfrac{1}{x} - \dfrac{1}{\bar{x}}}{\dfrac{1}{x}} = R, \tag{1.45}$$

so that the relative error remains the same.

In general,

$$E\left(\frac{1}{x}\right) = -\frac{E}{x^2}. \tag{1.46}$$
1.4.4 Division of Two Numbers

We may imagine the division of $x_1$ by $x_2$ as the multiplication of $x_1$ by $1/x_2$, so that

$$R\left(\frac{x_1}{x_2}\right) = R(x_1) + R(x_2); \tag{1.47}$$

hence, the relative errors are summed up.
1.4.5 Raising to a Negative Integer Power

We may write

$$R\left(\frac{1}{x^m}\right) = R\left(\frac{1}{x}\cdot\frac{1}{x}\cdots\frac{1}{x}\right) = \sum_{i=1}^{m} R\left(\frac{1}{x}\right) = \sum_{i=1}^{m} R(x) = mR(x), \quad m \in \mathbb{N}, \ m \neq 0, \tag{1.48}$$

so that the relative errors are summed up.
1.4.6 Taking the Root of pth Order

We have, successively,

$$x^{\frac{1}{p}} = \left[\bar{x}(1 + R)\right]^{\frac{1}{p}} = \bar{x}^{\frac{1}{p}}(1 + R)^{\frac{1}{p}} = \bar{x}^{\frac{1}{p}}\left[1 + \frac{R}{p} + \frac{1}{p}\left(\frac{1}{p} - 1\right)\frac{R^2}{2!} + \frac{1}{p}\left(\frac{1}{p} - 1\right)\left(\frac{1}{p} - 2\right)\frac{R^3}{3!} + \cdots\right], \tag{1.49}$$

$$R\left(x^{\frac{1}{p}}\right) = \frac{x^{\frac{1}{p}} - \bar{x}^{\frac{1}{p}}}{x^{\frac{1}{p}}} \approx \frac{R}{p}. \tag{1.50}$$

The maximum error of the $n$th digit is now obtained for $x = 10^{(k-m)/m}$, $x^* = 1$, $(x^*)^m = 10^{1-m}$, $m = 1/p$, $k$ integer, and is given by

$$\left|\varepsilon\left(x^{\frac{1}{p}}\right)\right| \le \frac{10^{1-m}}{2 \times 5 \times 10^{-n}}\left[\left(1 + 5 \times 10^{-n}\right)^m - 1\right] = 10^{n-m}\left[\left(1 + 5 \times 10^{-n}\right)^m - 1\right]. \tag{1.51}$$
1.4.7 Subtraction

Subtraction is the most disadvantageous operation if the result is small with respect to the minuend and the subtrahend.

Let us consider the subtraction $20.003 - 19.998$, in which the first four digits of each number are known with precision; concerning the fifth digit, we can say only that it is determined with a precision of one unit. It follows that for 20.003 the relative error is

$$R_1 \le \frac{10^{-3}}{20.003} < 5 \times 10^{-5}, \tag{1.52}$$

while for 19.998 the relative error is

$$R_2 \le \frac{10^{-3}}{19.998} < 5.1 \times 10^{-5}. \tag{1.53}$$

The result of the subtraction is $5 \times 10^{-3}$, while its last digit may be wrong by two units, so that the relative error of the difference is

$$R = \frac{2 \times 10^{-3}}{5 \times 10^{-3}} = 400 \times 10^{-3}, \tag{1.54}$$

that is, a relative error approximately 8000 times greater than $R_1$ or $R_2$.

Hence the rule: the difference of two nearly equal quantities must be calculated directly, without previously calculating the two quantities separately.
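The catastrophic loss of accuracy in this example is easy to reproduce (an illustrative sketch; the one-unit perturbations of the fifth digits are the assumption made in the text):

```python
# Subtraction of nearly equal numbers: 20.003 - 19.998.
a, b = 20.003, 19.998
a_pert, b_pert = 20.004, 19.997      # last digits each off by one unit

exact = a - b                        # 0.005
perturbed = a_pert - b_pert          # 0.007
rel_err = abs(perturbed - exact) / exact
print(f"relative error of the difference: {rel_err:.1f}")   # 0.4, cf. (1.54)
```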
1.4.8 Computation of Functions

Starting from Taylor's relation

$$f(x) - f(\bar{x}) = (x - \bar{x})f'(\xi), \tag{1.55}$$

where $\xi$ is a point situated between $x$ and $\bar{x}$, it follows that the absolute error is

$$|E(f)| \le |E| \sup_{\xi \in \mathrm{Int}(x, \bar{x})} |f'(\xi)|, \tag{1.56}$$

while the relative error reads

$$|R(f)| \le \frac{|E|}{|f(\bar{x})|} \sup_{\xi \in \mathrm{Int}(x, \bar{x})} |f'(\xi)|, \tag{1.57}$$

where $\mathrm{Int}(x, \bar{x})$ denotes the real interval of ends $x$ and $\bar{x}$.
1.5 APPLICATIONS

Problem 1.1

Let us consider the sequence of integrals

$$I_n = \int_0^1 x^n e^x\,\mathrm{d}x, \quad n \in \mathbb{N}. \tag{1.58}$$

(i) Determine a recurrence formula for $\{I_n\}_{n\in\mathbb{N}}$.
Solution: To calculate $I_n$, $n \ge 1$, we use integration by parts and obtain

$$I_n = \int_0^1 x^n e^x\,\mathrm{d}x = \left.x^n e^x\right|_0^1 - n\int_0^1 x^{n-1} e^x\,\mathrm{d}x = e - nI_{n-1}. \tag{1.59}$$
(ii) Show that $\lim_{n\to\infty} I_n$ exists.

Solution: For $x \in [0, 1]$ we have

$$x^{n+1} e^x \le x^n e^x, \tag{1.60}$$

hence $I_{n+1} \le I_n$ for any $n \in \mathbb{N}$. It follows that $\{I_n\}_{n\in\mathbb{N}}$ is a decreasing sequence of real numbers. On the other hand,

$$x^n e^x \ge 0, \quad x \in [0, 1], \ n \in \mathbb{N}, \tag{1.61}$$

so that $\{I_n\}_{n\in\mathbb{N}}$ is a positive sequence of real numbers.

We get

$$0 \le \cdots \le I_{n+1} \le I_n \le \cdots \le I_1 \le I_0, \tag{1.62}$$

so that $\{I_n\}_{n\in\mathbb{N}}$ is convergent and, moreover,

$$0 \le \lim_{n\to\infty} I_n \le I_0 = \int_0^1 e^x\,\mathrm{d}x = e - 1. \tag{1.63}$$
(iii) Calculate $I_{13}$.

Solution: To calculate the integral we have two methods.

Method 1.

$$I_0 = \int_0^1 e^x\,\mathrm{d}x = \left.e^x\right|_0^1 = e - 1, \tag{1.64}$$
$$I_1 = e - 1 \cdot I_0 = 1, \tag{1.65}$$
$$I_2 = e - 2I_1 = e - 2, \tag{1.66}$$
$$I_3 = e - 3I_2 = 6 - 2e, \tag{1.67}$$
$$I_4 = e - 4I_3 = 9e - 24, \tag{1.68}$$
$$I_5 = e - 5I_4 = 120 - 44e, \tag{1.69}$$
$$I_6 = e - 6I_5 = 265e - 720, \tag{1.70}$$
$$I_7 = e - 7I_6 = 5040 - 1854e, \tag{1.71}$$
$$I_8 = e - 8I_7 = 14833e - 40320, \tag{1.72}$$
$$I_9 = e - 9I_8 = 362880 - 133496e, \tag{1.73}$$
$$I_{10} = e - 10I_9 = 1334961e - 3628800, \tag{1.74}$$
$$I_{11} = e - 11I_{10} = 39916800 - 14684570e, \tag{1.75}$$
$$I_{12} = e - 12I_{11} = 176214841e - 479001600, \tag{1.76}$$
$$I_{13} = e - 13I_{12} = 6227020800 - 2290792932e. \tag{1.77}$$

It follows that

$$I_{13} = 0.1820\ldots \tag{1.78}$$
Method 2. In this case, we replace the calculated values directly, thus obtaining

$$I_0 = e - 1 = 1.718281828, \tag{1.79}$$
$$I_1 = e - 1 \cdot I_0 = 1, \tag{1.80}$$
$$I_2 = e - 2I_1 = 0.718281828, \tag{1.81}$$
$$I_3 = e - 3I_2 = 0.563436344, \tag{1.82}$$
$$I_4 = e - 4I_3 = 0.464536452, \tag{1.83}$$
$$I_5 = e - 5I_4 = 0.395599568, \tag{1.84}$$
$$I_6 = e - 6I_5 = 0.344684420, \tag{1.85}$$
$$I_7 = e - 7I_6 = 0.305490888, \tag{1.86}$$
$$I_8 = e - 8I_7 = 0.274354724, \tag{1.87}$$
$$I_9 = e - 9I_8 = 0.249089312, \tag{1.88}$$
$$I_{10} = e - 10I_9 = 0.227388708, \tag{1.89}$$
$$I_{11} = e - 11I_{10} = 0.217006040, \tag{1.90}$$
$$I_{12} = e - 12I_{11} = 0.114209348, \tag{1.91}$$
$$I_{13} = e - 13I_{12} = 1.233560304. \tag{1.92}$$

We observe that, because of the propagation of errors, the second method cannot be used to calculate $I_n$, $n \ge 12$.
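Both methods can be rerun on a machine; the sketch below is an assumed Python transcription. The forward recurrence multiplies the initial rounding error of $I_0 = e - 1$ by $n!$, while the backward recurrence $I_{n-1} = (e - I_n)/n$ divides the error of a crude starting guess by the same factor and is therefore stable:

```python
import math

def forward(n_max, digits=None):
    # Forward recurrence I_n = e - n*I_{n-1}; if `digits` is given, every
    # intermediate result is rounded to that many significant figures,
    # mimicking the 9-digit arithmetic of Method 2 (an assumption).
    I = math.e - 1.0
    for n in range(1, n_max + 1):
        I = math.e - n * I
        if digits is not None:
            I = float(f"{I:.{digits}g}")
    return I

print(forward(13, digits=9))   # useless value: the error grows like 13!
print(forward(13))             # double precision: still about 10 digits lost

# Stable backward recurrence, started from the rough guess I_20 = 0.
J = 0.0
for n in range(20, 13, -1):
    J = (math.e - J) / n
print(f"I_13 = {J:.4f}")       # 0.1820, in agreement with Method 1
```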
Problem 1.2

Let the sequences $\{x_n\}_{n\in\mathbb{N}}$ and $\{y_n\}_{n\in\mathbb{N}}$ be defined recursively by

$$x_{n+1} = \frac{1}{2}\left(x_n + \frac{0.5}{x_n}\right), \quad x_0 = 1, \tag{1.93}$$

$$y_{n+1} = y_n - \lambda\left(y_n^2 - 0.5\right), \quad y_0 = 1. \tag{1.94}$$

(i) Calculate $x_1, x_2, \ldots, x_7$.
Solution: We have, successively,

$$x_1 = \frac{1}{2}\left(x_0 + \frac{0.5}{x_0}\right) = \frac{3}{4}, \tag{1.95}$$
$$x_2 = \frac{1}{2}\left(x_1 + \frac{0.5}{x_1}\right) = \frac{17}{24}, \tag{1.96}$$
$$x_3 = \frac{1}{2}\left(x_2 + \frac{0.5}{x_2}\right) = \frac{577}{816}, \tag{1.97}$$
$$x_4 = \frac{1}{2}\left(x_3 + \frac{0.5}{x_3}\right) = 0.707107, \tag{1.98}$$
$$x_5 = \frac{1}{2}\left(x_4 + \frac{0.5}{x_4}\right) = 0.707107, \tag{1.99}$$
$$x_6 = \frac{1}{2}\left(x_5 + \frac{0.5}{x_5}\right) = 0.707107, \tag{1.100}$$
$$x_7 = \frac{1}{2}\left(x_6 + \frac{0.5}{x_6}\right) = 0.707107. \tag{1.101}$$
(ii) Calculate $y_1, y_2, \ldots, y_7$ for $\lambda = 0.49$.

Solution: There result the values

$$y_1 = y_0 - 0.49\left(y_0^2 - 0.5\right) = 0.755, \tag{1.102}$$
$$y_2 = y_1 - 0.49\left(y_1^2 - 0.5\right) = 0.720688, \tag{1.103}$$
$$y_3 = y_2 - 0.49\left(y_2^2 - 0.5\right) = 0.711186, \tag{1.104}$$
$$y_4 = y_3 - 0.49\left(y_3^2 - 0.5\right) = 0.708351, \tag{1.105}$$
$$y_5 = y_4 - 0.49\left(y_4^2 - 0.5\right) = 0.707488, \tag{1.106}$$
$$y_6 = y_5 - 0.49\left(y_5^2 - 0.5\right) = 0.707224, \tag{1.107}$$
$$y_7 = y_6 - 0.49\left(y_6^2 - 0.5\right) = 0.707143. \tag{1.108}$$
(iii) Calculate $y_1, y_2, \ldots, y_7$ for $\lambda = 49$.

Solution: In this case, we obtain the values

$$y_1 = y_0 - 49\left(y_0^2 - 0.5\right) = -23.5, \tag{1.109}$$
$$y_2 = y_1 - 49\left(y_1^2 - 0.5\right) = -27059.25, \tag{1.110}$$
$$y_3 = y_2 - 49\left(y_2^2 - 0.5\right) = -3.587797 \times 10^{10}, \tag{1.111}$$
$$y_4 = y_3 - 49\left(y_3^2 - 0.5\right) = -6.307422 \times 10^{22}, \tag{1.112}$$
$$y_5 = y_4 - 49\left(y_4^2 - 0.5\right) = -1.949395 \times 10^{47}, \tag{1.113}$$
$$y_6 = y_5 - 49\left(y_5^2 - 0.5\right) = -1.862070 \times 10^{96}, \tag{1.114}$$
$$y_7 = y_6 - 49\left(y_6^2 - 0.5\right) = -1.698979 \times 10^{194}. \tag{1.115}$$
We observe that the sequences $\{x_n\}_{n\in\mathbb{N}}$ and $\{y_n\}_{n\in\mathbb{N}}$ converge to $\sqrt{0.5} = 0.707107$ for $\lambda = 0.49$, while the sequence $\{y_n\}_{n\in\mathbb{N}}$ is divergent for $\lambda = 49$.
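Both recursions are immediate to program; the sketch below (an assumed Python transcription) exhibits the fast convergence of (1.93), the slow convergence of (1.94) for $\lambda = 0.49$, and its violent divergence for $\lambda = 49$:

```python
# Iterations (1.93) and (1.94) for the computation of sqrt(0.5).
x = 1.0
for _ in range(7):
    x = 0.5 * (x + 0.5 / x)
print(f"x_7 = {x:.6f}")                      # 0.707107, already reached by x_4

for lam in (0.49, 49.0):
    y = 1.0
    for _ in range(7):
        y = y - lam * (y * y - 0.5)
    print(f"lambda = {lam}: y_7 = {y:.6g}")
# lambda = 0.49: y_7 = 0.707143; lambda = 49: y_7 = -1.69898e+194
```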
Problem 1.3

If the independent random variables $X_1$ and $X_2$ have the distribution densities $p_1(x)$ and $p_2(x)$, respectively, then the random variable $X_1 + X_2$ has the distribution density

$$p(x) = \int_{-\infty}^{\infty} p_1(x - s)\,p_2(s)\,\mathrm{d}s. \tag{1.116}$$
(i) Demonstrate that if the random variables $X_1$ and $X_2$ have normal distributions with zero mean and standard deviations $\sigma_1$ and $\sigma_2$, then the random variable $X_1 + X_2$ has a normal distribution.

Solution: From equation (1.116) we have

$$p(x) = \int_{-\infty}^{\infty} \frac{1}{\sigma_1\sqrt{2\pi}}\, e^{-\frac{(x-s)^2}{2\sigma_1^2}} \cdot \frac{1}{\sigma_2\sqrt{2\pi}}\, e^{-\frac{s^2}{2\sigma_2^2}}\,\mathrm{d}s = \frac{1}{2\pi\sigma_1\sigma_2} \int_{-\infty}^{\infty} e^{-\frac{(x-s)^2}{2\sigma_1^2}}\, e^{-\frac{s^2}{2\sigma_2^2}}\,\mathrm{d}s. \tag{1.117}$$
We require the values $\lambda_1$, $\lambda_2$, and $a$, real, such that

$$\frac{(x - s)^2}{2\sigma_1^2} + \frac{s^2}{2\sigma_2^2} = \frac{x^2}{2\lambda_1^2} + \frac{(s - ax)^2}{2\lambda_2^2}, \tag{1.118}$$

from which, identifying the coefficients,

$$\frac{x^2}{\sigma_1^2} = \frac{x^2}{\lambda_1^2} + \frac{a^2 x^2}{\lambda_2^2}, \quad \frac{s^2}{\sigma_1^2} + \frac{s^2}{\sigma_2^2} = \frac{s^2}{\lambda_2^2}, \quad -\frac{2xs}{\sigma_1^2} = -\frac{2asx}{\lambda_2^2}, \tag{1.119}$$

with the solution

$$\lambda_2^2 = \frac{\sigma_1^2\sigma_2^2}{\sigma_1^2 + \sigma_2^2}, \quad a = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}, \quad \lambda_1^2 = \sigma_1^2 + \sigma_2^2. \tag{1.120}$$
We make the change of variable

$$s - ax = \sqrt{2}\lambda_2 t, \quad \mathrm{d}s = \sqrt{2}\lambda_2\,\mathrm{d}t, \tag{1.121}$$

and expression (1.117) becomes

$$p(x) = \frac{1}{2\pi\sigma_1\sigma_2} \int_{-\infty}^{\infty} e^{-\frac{x^2}{2\lambda_1^2}}\, e^{-t^2} \sqrt{2}\lambda_2\,\mathrm{d}t = \frac{1}{\sqrt{\sigma_1^2 + \sigma_2^2}\,\sqrt{2\pi}}\, e^{-\frac{x^2}{2(\sigma_1^2 + \sigma_2^2)}}. \tag{1.122}$$
(ii) Calculate the mean and the standard deviation of the random variable $X_1 + X_2$ of point (i).

Solution: We calculate

$$\int_{-\infty}^{\infty} x p(x)\,\mathrm{d}x = \frac{1}{\sqrt{\sigma_1^2 + \sigma_2^2}\,\sqrt{2\pi}} \int_{-\infty}^{\infty} x e^{-\frac{x^2}{2(\sigma_1^2 + \sigma_2^2)}}\,\mathrm{d}x = 0, \tag{1.123}$$

$$\int_{-\infty}^{\infty} x^2 p(x)\,\mathrm{d}x = \frac{1}{\sqrt{\sigma_1^2 + \sigma_2^2}\,\sqrt{2\pi}} \int_{-\infty}^{\infty} x^2 e^{-\frac{x^2}{2(\sigma_1^2 + \sigma_2^2)}}\,\mathrm{d}x$$
$$= \left.\left[-\frac{\sqrt{\sigma_1^2 + \sigma_2^2}}{\sqrt{2\pi}}\, x e^{-\frac{x^2}{2(\sigma_1^2 + \sigma_2^2)}}\right]\right|_{-\infty}^{\infty} + \frac{\sqrt{\sigma_1^2 + \sigma_2^2}}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{x^2}{2(\sigma_1^2 + \sigma_2^2)}}\,\mathrm{d}x = \sigma_1^2 + \sigma_2^2. \tag{1.124}$$
(iii) Let $X$ be a random variable with a normal distribution, zero mean, and standard deviation $\sigma$. Calculate

$$I_1 = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{x^2}{2\sigma^2}}\,\mathrm{d}x \tag{1.125}$$

and

$$I_2 = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\sigma}^{\sigma} e^{-\frac{x^2}{2\sigma^2}}\,\mathrm{d}x. \tag{1.126}$$
Solution: Through the change of variable

$$x = \sigma\sqrt{2}u, \quad \mathrm{d}x = \sigma\sqrt{2}\,\mathrm{d}u, \tag{1.127}$$

it follows that

$$I_1 = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-u^2} \sigma\sqrt{2}\,\mathrm{d}u = 1. \tag{1.128}$$

Similarly, we have

$$I_2 = \frac{1}{\sqrt{\pi}} \int_{-\sigma}^{\sigma} e^{-u^2}\,\mathrm{d}u. \tag{1.129}$$
On the other hand, passing to polar coordinates,

$$\left(\int_{-\sigma}^{\sigma} e^{-u^2}\,\mathrm{d}u\right)^2 \approx \int_0^{2\pi}\!\!\int_0^{\sigma} e^{-\rho^2}\rho\,\mathrm{d}\rho\,\mathrm{d}\theta = \pi\left(1 - e^{-\sigma^2}\right), \tag{1.130}$$

so that

$$I_2 = \sqrt{1 - e^{-\sigma^2}}. \tag{1.131}$$
(iv) Let $0 < \varepsilon < 1$ be fixed. Determine $R > 0$ such that

$$\frac{1}{\sqrt{\pi}} \int_{-R}^{R} e^{-x^2}\,\mathrm{d}x < \varepsilon. \tag{1.132}$$

Solution: Proceeding as at point (iii), it follows that

$$\int_{-R}^{R} e^{-x^2}\,\mathrm{d}x = \sqrt{\pi\left(1 - e^{-R^2}\right)}, \tag{1.133}$$

so that we obtain the inequality

$$\sqrt{1 - e^{-R^2}} < \varepsilon, \tag{1.134}$$

from which

$$R < \sqrt{-\ln\left(1 - \varepsilon^2\right)}. \tag{1.135}$$
(v) Calculate

$$I_3 = \frac{1}{\sigma\sqrt{2\pi}} \int_{-R}^{R} e^{-\frac{x^2}{2\sigma^2}}\,\mathrm{d}x \tag{1.136}$$

and

$$I_4 = \frac{1}{\sigma\sqrt{2\pi}} \int_{R}^{\infty} e^{-\frac{x^2}{2\sigma^2}}\,\mathrm{d}x. \tag{1.137}$$
Solution: We again make the change of variable (1.127) and obtain

$$I_3 = \frac{1}{\sqrt{\pi}} \int_{-\frac{R}{\sigma\sqrt{2}}}^{\frac{R}{\sigma\sqrt{2}}} e^{-u^2}\,\mathrm{d}u. \tag{1.138}$$

Point (iii) shows that

$$\int_{-A}^{A} e^{-x^2}\,\mathrm{d}x = \sqrt{\pi\left(1 - e^{-A^2}\right)}, \quad A > 0; \tag{1.139}$$

hence, it follows that

$$I_3 = \sqrt{1 - e^{-\frac{R^2}{2\sigma^2}}}. \tag{1.140}$$
On the other hand, we have seen that $I_1 = 1$ and we may write

$$I_1 = \frac{1}{\sigma\sqrt{2\pi}}\left[2\int_{R}^{\infty} e^{-\frac{x^2}{2\sigma^2}}\,\mathrm{d}x + \int_{-R}^{R} e^{-\frac{x^2}{2\sigma^2}}\,\mathrm{d}x\right] = 2I_4 + I_3. \tag{1.141}$$

It follows immediately that

$$I_4 = \frac{I_1 - I_3}{2} = \frac{1 - \sqrt{1 - e^{-\frac{R^2}{2\sigma^2}}}}{2}. \tag{1.142}$$
(vi) Let $X_1$ and $X_2$ be two random variables with normal distributions, zero mean, and standard deviation $\sigma$. Determine the distribution density of the random variable $X_1 + X_2$, as well as its mean and standard deviation.

Solution: This is a particular case of points (i) and (ii); hence, we obtain

$$p(x) = \frac{1}{2\sigma\sqrt{\pi}}\, e^{-\frac{x^2}{4\sigma^2}}, \tag{1.143}$$

that is, a normal random variable of zero mean and standard deviation $\sigma\sqrt{2}$.
(vii) Let $N_1$ and $N_2$ be numbers estimated with errors $\varepsilon_1$ and $\varepsilon_2$, respectively, considered as random variables with normal distribution, zero mean, and standard deviation $\sigma$. Calculate the probability that the error of the sum $N_1 + N_2$ is less than a given value $\varepsilon > 0$.

Solution: The requested probability is given by

$$I = \int_{-\infty}^{\varepsilon} \frac{1}{2\sigma\sqrt{\pi}}\, e^{-\frac{x^2}{4\sigma^2}}\,\mathrm{d}x = \int_{-\infty}^{-\varepsilon} \frac{1}{2\sigma\sqrt{\pi}}\, e^{-\frac{x^2}{4\sigma^2}}\,\mathrm{d}x + \int_{-\varepsilon}^{\varepsilon} \frac{1}{2\sigma\sqrt{\pi}}\, e^{-\frac{x^2}{4\sigma^2}}\,\mathrm{d}x. \tag{1.144}$$
Taking into account the previous results, we obtain

$$\int_{-\infty}^{-\varepsilon} \frac{1}{2\sigma\sqrt{\pi}}\, e^{-\frac{x^2}{4\sigma^2}}\,\mathrm{d}x = \frac{1 - \sqrt{1 - e^{-\frac{\varepsilon^2}{4\sigma^2}}}}{2}, \tag{1.145}$$

$$\int_{-\varepsilon}^{\varepsilon} \frac{1}{2\sigma\sqrt{\pi}}\, e^{-\frac{x^2}{4\sigma^2}}\,\mathrm{d}x = \sqrt{1 - e^{-\frac{\varepsilon^2}{4\sigma^2}}}, \tag{1.146}$$

so that

$$I = \frac{1}{2}\left(1 + \sqrt{1 - e^{-\frac{\varepsilon^2}{4\sigma^2}}}\right). \tag{1.147}$$
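The closed form (1.147) can be compared with the exact probability, which involves the error function $\operatorname{erf}$; the sketch below is an assumed check by the editor, not part of the book. The polar-coordinate step (1.130) amounts to replacing $\operatorname{erf}(t)$ by $\sqrt{1 - e^{-t^2}}$, which slightly underestimates it:

```python
import math

def prob_book(eps, sigma):
    """Closed form (1.147) for the probability that the error of N1 + N2
    is smaller than eps."""
    return 0.5 * (1.0 + math.sqrt(1.0 - math.exp(-eps**2 / (4.0 * sigma**2))))

def prob_exact(eps, sigma):
    """Exact value: the error of the sum is normal with standard
    deviation sigma*sqrt(2)."""
    return 0.5 * (1.0 + math.erf(eps / (2.0 * sigma)))

for eps in (0.5, 1.0, 2.0):
    print(eps, round(prob_book(eps, 1.0), 4), round(prob_exact(eps, 1.0), 4))
# The two columns agree to within a few hundredths.
```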
FURTHER READING
Acton FS (1990). Numerical Methods that Work. 4th ed. Washington: Mathematical Association of
America.
Ackleh AS, Allen EJ, Kearfott RB, Seshaiyer P (2009). Classical and Modern Numerical Analysis: Theory, Methods and Practice. Boca Raton: CRC Press.
Atkinson KE (1989). An Introduction to Numerical Analysis. 2nd ed. New York: John Wiley & Sons,
Inc.
Atkinson KE (2003). Elementary Numerical Analysis. 2nd ed. New York: John Wiley & Sons, Inc.
Bakhvalov N (1976). Méthodes Numériques. Moscou: Éditions Mir (in French).
Berbente C, Mitran S, Zancu S (1997). Metode Numerice. București: Editura Tehnică (in Romanian).
Burden RL, Faires L (2009). Numerical Analysis. 9th ed. Boston: Brooks/Cole.
Chapra SC (1996). Applied Numerical Methods with MATLAB for Engineers and Scientists. Boston:
McGraw-Hill.
Cheney EW, Kincaid DR (1997). Numerical Mathematics and Computing. 6th ed. Belmont: Thomson.
Dahlquist G, Björck Å (1974). Numerical Methods. Englewood Cliffs: Prentice Hall.
Démidovitch B, Maron I (1973). Éléments de Calcul Numérique. Moscou: Éditions Mir (in French).
Epperson JF (2007). An Introduction to Numerical Methods and Analysis. Hoboken: John Wiley &
Sons, Inc.
Gautschi W (1997). Numerical Analysis: An Introduction. Boston: Birkhäuser.
Greenbaum A, Chartier TP (2012). Numerical Methods: Design, Analysis, and Computer Implemen-
tation of Algorithms. Princeton: Princeton University Press.
Hamming RW (1987). Numerical Methods for Scientists and Engineers. 2nd ed. New York: Dover
Publications.
Hamming RW (2012). Introduction to Applied Numerical Analysis. New York: Dover Publications.
Heinbockel JH (2006). Numerical Methods for Scientific Computing. Victoria: Trafford Publishing.
Higham NJ (2002). Accuracy and Stability of Numerical Algorithms. 2nd ed. Philadelphia: SIAM.
Hildebrand FB (1987). Introduction to Numerical Analysis. 2nd ed. New York: Dover Publications.
Hoffman JD (1992). Numerical Methods for Engineers and Scientists. New York: McGraw-Hill.
Kharab A, Guenther RB (2011). An Introduction to Numerical Methods: A MATLAB Approach. 3rd
ed. Boca Raton: CRC Press.
Krîlov AN (1957). Lecții de Calcule prin Aproximații. București: Editura Tehnică (in Romanian).
Kunz KS (1957). Numerical Analysis. New York: McGraw-Hill.
Levine L (1964). Methods for Solving Engineering Problems Using Analog Computers. New York:
McGraw-Hill.
Marinescu G (1974). Analiză Numerică. București: Editura Academiei Române (in Romanian).
Press WH, Teukolsky SA, Vetterling WT, Flannery BP (2007). Numerical Recipes: The Art of Scientific Computing. 3rd ed. Cambridge: Cambridge University Press.
Quarteroni A, Sacco R, Saleri F (2010). Numerical Mathematics. 2nd ed. Berlin: Springer-Verlag.
Ralston A, Rabinowitz P (2001). A First Course in Numerical Analysis. 2nd ed. New York: Dover
Publications.
Ridgway Scott L (2011). Numerical Analysis. Princeton: Princeton University Press.
Sauer T (2011). Numerical Analysis. 2nd ed. London: Pearson.
Simionescu I, Dranga M, Moise V (1995). Metode Numerice în Tehnică. Aplicații în FORTRAN. București: Editura Tehnică (in Romanian).
Stănescu ND (2007). Metode Numerice. București: Editura Didactică și Pedagogică (in Romanian).
Stoer J, Bulirsch R (2010). Introduction to Numerical Analysis. 3rd ed. New York: Springer-Verlag.
2
SOLUTION OF EQUATIONS
We deal with several methods for the approximate solution of equations, that is, the bipartition (bisection) method, the chord (secant) method, the tangent method (Newton), the contraction method, and the Newton–Kantorovich method. These are followed by applications.
2.1 THE BIPARTITION (BISECTION) METHOD
Let us consider the equation¹

$$f(x) = 0, \tag{2.1}$$

where $f : [a, b] \to \mathbb{R}$, $a, b \in \mathbb{R}$, $a < b$, $f$ continuous on $[a, b]$, with a single root $\alpha$, $f(\alpha) = 0$, in the interval $[a, b]$.

First, we verify whether $f(a) = 0$ or $f(b) = 0$; if this occurs, then the algorithm stops. Otherwise, we consider the middle of the interval $[a, b]$, $c = (a + b)/2$. We verify whether $c$ is a solution of equation (2.1); if $f(c) = 0$, the algorithm stops; if not, we examine the sign of $f(c)$. If $f(a) \times f(c) < 0$, then we consider the interval $[a, c]$, which contains the root; if not, we consider the interval $[c, b]$. Thus, the interval $[a, b]$ is diminished to $[a, c]$ or $[c, b]$, its new length being equal to $(b - a)/2$. We thus obtain a new interval $[a, b]$, where $a = c$ or $b = c$, and we apply the procedure described above. The procedure stops when a certain criterion (e.g., the length of the interval $[a, b]$ is less than a given $\varepsilon$) is fulfilled.

¹The bipartition method is the simplest and most popular method for solving equations. It was known to ancient mathematicians.
As we can see from this exposition, the bipartition method consists in the construction of three sequences $\{a_n\}$, $\{b_n\}$, and $\{c_n\}$, $n \in \mathbb{N}$, as follows:

$$a_0 = a, \quad b_0 = b, \quad c_n = \frac{a_n + b_n}{2}, \ n \ge 0,$$
$$a_{n+1} = \begin{cases} a_n & \text{if } f(a_n) \times f(c_n) < 0, \\ c_n & \text{otherwise,} \end{cases} \qquad b_{n+1} = \begin{cases} b_n & \text{if } f(c_n) \times f(b_n) < 0, \\ c_n & \text{otherwise.} \end{cases} \tag{2.2}$$
The bipartition method is based on the following theorem.

Theorem 2.1 The sequences $\{a_n\}$, $\{b_n\}$, $\{c_n\}$, $n \in \mathbb{N}$, given by formulae (2.2), are convergent, and their common limit is the value of the unique real root $\alpha$ of equation (2.1) in the interval $[a, b]$.

Demonstration. Let us show that

$$b_n - a_n = \frac{b - a}{2^n} \tag{2.3}$$

for any $n \in \mathbb{N}$.

To fix the ideas, we suppose that $f(a) < 0$ and $f(b) > 0$. If $f(c_{n-1}) < 0$, then

$$b_n - a_n = b_{n-1} - c_{n-1} = b_{n-1} - \frac{a_{n-1} + b_{n-1}}{2} = \frac{b_{n-1} - a_{n-1}}{2}, \tag{2.4}$$

whereas if $f(c_{n-1}) > 0$, we get

$$b_n - a_n = c_{n-1} - a_{n-1} = \frac{a_{n-1} + b_{n-1}}{2} - a_{n-1} = \frac{b_{n-1} - a_{n-1}}{2}. \tag{2.5}$$

Hence, in general,

$$b_n - a_n = \frac{b_{n-1} - a_{n-1}}{2} = \frac{b_{n-2} - a_{n-2}}{2^2} = \cdots = \frac{b_0 - a_0}{2^n} = \frac{b - a}{2^n}. \tag{2.6}$$

It is obvious that

$$a_n < c_n < b_n, \quad n \in \mathbb{N}. \tag{2.7}$$

From the definition of the sequence $\{a_n\}$, $n \in \mathbb{N}$, it follows that $a_{n+1} = a_n$ or $a_{n+1} = c_n = (a_n + b_n)/2 > a_n$. We may write

$$a_{n+1} \ge a_n, \quad n \in \mathbb{N}; \tag{2.8}$$

hence, the sequence $\{a_n\}$, $n \in \mathbb{N}$, is monotone increasing. Analogously, we obtain the relation

$$b_{n+1} \le b_n, \quad n \in \mathbb{N}; \tag{2.9}$$

this means that the sequence $\{b_n\}$, $n \in \mathbb{N}$, is monotone decreasing.

We thus have the sequence of relations

$$a = a_0 \le a_1 \le \cdots \le a_n \le \cdots \le b_n \le \cdots \le b_1 \le b_0 = b, \tag{2.10}$$

where the sequence $\{a_n\}$, $n \in \mathbb{N}$, is bounded from above by any value $b_n$, $n \in \mathbb{N}$; in particular, we can take $b_n = b$. The sequence $\{b_n\}$, $n \in \mathbb{N}$, is bounded from below by any value $a_n$, $n \in \mathbb{N}$, in particular by $a_n = a$.
We have thus established that $\{a_n\}$, $n \in \mathbb{N}$, is a monotone increasing sequence bounded from above (by $b$), hence convergent, while $\{b_n\}$, $n \in \mathbb{N}$, is a monotone decreasing sequence bounded from below (by $a$), hence convergent too.

Let $A = \lim_{n\to\infty} a_n$ and $B = \lim_{n\to\infty} b_n$. Let us show that $A = B$, that is, that the sequences $\{a_n\}$, $\{b_n\}$, $n \in \mathbb{N}$, have the same limit. We have

$$A - B = \lim_{n\to\infty} a_n - \lim_{n\to\infty} b_n = \lim_{n\to\infty}(a_n - b_n). \tag{2.11}$$

On the other hand, taking into account relation (2.6), we get

$$\lim_{n\to\infty}(a_n - b_n) = \lim_{n\to\infty} \frac{a - b}{2^n} = 0. \tag{2.12}$$

The last two expressions show that $A - B = 0$, hence $A = B$.

Let $A = \lim_{n\to\infty} a_n = \lim_{n\to\infty} b_n$. Applying now the squeeze theorem to the sequences $\{a_n\}$, $\{b_n\}$, and $\{c_n\}$, $n \in \mathbb{N}$, and taking into account (2.7), it follows that the sequence $\{c_n\}$, $n \in \mathbb{N}$, is convergent and $\lim_{n\to\infty} c_n = A$.

Let us show that $f(A) = 0$. We have

$$f(A) = f\left(\lim_{n\to\infty} a_n\right) = \lim_{n\to\infty} f(a_n) \le 0, \tag{2.13}$$

$$f(A) = f\left(\lim_{n\to\infty} b_n\right) = \lim_{n\to\infty} f(b_n) \ge 0, \tag{2.14}$$

where, because $f$ is continuous, the function commutes with the limit. The last two expressions lead to $f(A) = 0$, and hence $A$ is the root $\alpha$ of the equation $f(x) = 0$ in the interval $[a, b]$.
To determine the corresponding error, we can proceed in two ways.

In the first method, we start from the evident relations

$$|a_n - b_n| = 2|a_n - c_n|, \quad |a_n - b_n| = 2|b_n - c_n|, \tag{2.15}$$

where $a_n = c_{n-1}$ or $b_n = c_{n-1}$, from which we obtain

$$|a_n - b_n| = 2|c_{n-1} - c_n|, \tag{2.16}$$

so that

$$|c_n - \alpha| < |a_n - b_n| = 2|c_{n-1} - c_n|. \tag{2.17}$$

To determine the solution $\alpha$ with an error $\varepsilon$, we must calculate the terms of the sequence $\{c_n\}$, $n \in \mathbb{N}$, until the relation

$$2|c_{n-1} - c_n| < \varepsilon \tag{2.18}$$

is fulfilled. We then have an a posteriori estimation of the error.

In the second method, we start from the relation

$$|c_n - \alpha| < b_n - a_n = \frac{b - a}{2^n}. \tag{2.19}$$

To determine the solution $\alpha$ with an error $\varepsilon$, we must now calculate $n$ terms of the sequence $\{c_n\}$, $n \in \mathbb{N}$, so that

$$\frac{b - a}{2^n} < \varepsilon \ \Rightarrow \ n = \left[\frac{\ln\frac{b-a}{\varepsilon}}{\ln 2}\right] + 1, \tag{2.20}$$

where the square brackets denote the integer part. We then have an a priori estimation of the error.
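A direct transcription of the algorithm, together with the a priori step count (2.20), might look as follows (an illustrative Python sketch by the editor; the test equation $x^3 + x - 1 = 0$ is assumed, not taken from the book):

```python
import math

def bisection(f, a, b, eps):
    """Bipartition on [a, b] with f(a)*f(b) < 0; the number of steps is
    fixed a priori by (2.20), so the final bracket is shorter than eps."""
    n_max = int(math.log((b - a) / eps) / math.log(2.0)) + 1
    for _ in range(n_max):
        c = 0.5 * (a + b)
        if f(c) == 0.0:
            return c
        if f(a) * f(c) < 0.0:
            b = c
        else:
            a = c
    return 0.5 * (a + b)

root = bisection(lambda x: x**3 + x - 1.0, 0.0, 1.0, 1e-6)
print(f"{root:.6f}")   # 0.682328
```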
Observation 2.1 If equation (2.1) has several roots on the interval [a, b], the above algorithm
leads to one of these roots.
2.2 THE CHORD (SECANT) METHOD
Let us consider the equation²

$$f(x) = 0, \tag{2.21}$$

where $f : [a, b] \subset \mathbb{R} \to \mathbb{R}$, $f$ continuous on $[a, b]$, with a single root in the interval $[a, b]$. We construct a point $c \in [a, b]$ using the following rule. Let us consider the straight line $AB$, where $A(a, f(a))$ and $B(b, f(b))$. The equation of this line, denoted by $(d)$ in Figure 2.1, is

$$\frac{y - f(a)}{f(b) - f(a)} = \frac{x - a}{b - a}. \tag{2.22}$$

The abscissa $c$ of the intersection point of the straight line $(d)$ with the $Ox$-axis is given by the equation

$$\frac{-f(a)}{f(b) - f(a)} = \frac{c - a}{b - a}, \tag{2.23}$$

from which we obtain

$$c = \frac{af(b) - bf(a)}{f(b) - f(a)}. \tag{2.24}$$

The method consists firstly in verifying whether $a$ or $b$ is a solution of equation (2.21). If $f(a) = 0$ or $f(b) = 0$, then the procedure stops. Otherwise, we determine $c$ using formula (2.24). If $f(c) = 0$, then the algorithm ends, the required solution of equation (2.21) having been found. If $f(c) \neq 0$, then we calculate the product $f(a) \cdot f(c)$. If $f(a) \cdot f(c) < 0$, then the solution is in the interval $[a, c]$, while if $f(a) \cdot f(c) > 0$, then, obviously, we consider the interval $[c, b]$. Thus, the interval $[a, b]$ is replaced by one of the intervals $[a, c]$ or $[c, b]$, the length of which is strictly smaller than that of the interval $[a, b]$.
Figure 2.1 The chord method: the line $(d)$ through $A(a, f(a))$ and $B(b, f(b))$ cuts the $Ox$-axis at $c$.
²The method was known, in different forms, to the Babylonian and Egyptian mathematicians. It also appears (as regula falsi) in the papers of Abu Kamil (tenth century), Qusta ibn Luqa (tenth century), and Leonardo of Pisa (Fibonacci, 1202).
The chord method involves the construction of three sequences $\{a_n\}$, $\{b_n\}$, and $\{c_n\}$, $n \in \mathbb{N}$, defined recurrently as follows:

$$a_0 = a, \quad b_0 = b, \quad c_n = \frac{a_n f(b_n) - b_n f(a_n)}{f(b_n) - f(a_n)},$$
$$a_{n+1} = \begin{cases} a_n & \text{if } f(a_n) \cdot f(c_n) < 0, \\ c_n & \text{otherwise,} \end{cases} \qquad b_{n+1} = \begin{cases} b_n & \text{if } f(c_n) \cdot f(b_n) < 0, \\ c_n & \text{otherwise.} \end{cases} \tag{2.25}$$
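A transcription of the recurrence (2.25) might read as follows (an illustrative Python sketch; the stopping rule based on two successive values of $c_n$, suggested by the a posteriori estimate (2.47) below, and the test equation are the editor's assumptions):

```python
def chord(f, a, b, eps, n_max=100):
    """Chord (false-position) iteration (2.25) on [a, b], f(a)*f(b) < 0.
    Stops when two successive c's differ by less than eps."""
    c_old = a
    for _ in range(n_max):
        c = (a * f(b) - b * f(a)) / (f(b) - f(a))
        if f(c) == 0.0 or abs(c - c_old) < eps:
            return c
        if f(a) * f(c) < 0.0:
            b = c
        else:
            a = c
        c_old = c
    return c

print(f"{chord(lambda x: x**3 + x - 1.0, 0.0, 1.0, 1e-9):.6f}")   # 0.682328
```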
Theorem 2.2 Let $f : [a, b] \to \mathbb{R}$, $f \in C^0([a, b])$, with a single root in the interval $[a, b]$. Under these conditions, the sequence $\{c_n\}$, defined by relations (2.25), converges to $\alpha$, the unique solution of equation (2.21) in the interval $[a, b]$.

Demonstration. The sequences $\{a_n\}$ and $\{b_n\}$, $n \in \mathbb{N}$, satisfy the relation

$$a_n < b_n, \quad (\forall)\, n \in \mathbb{N}. \tag{2.26}$$

Indeed, for $n = 0$ we have $a_0 = a < b = b_0$.

On the other hand, if $f(c_{n-1}) \neq 0$, then we have

$$a_n - b_n = c_{n-1} - b_{n-1} = \frac{a_{n-1}f(b_{n-1}) - b_{n-1}f(a_{n-1})}{f(b_{n-1}) - f(a_{n-1})} - b_{n-1} = \frac{f(b_{n-1})(a_{n-1} - b_{n-1})}{f(b_{n-1}) - f(a_{n-1})} \tag{2.27}$$

for $a_n = c_{n-1}$, and

$$a_n - b_n = a_{n-1} - c_{n-1} = a_{n-1} - \frac{a_{n-1}f(b_{n-1}) - b_{n-1}f(a_{n-1})}{f(b_{n-1}) - f(a_{n-1})} = -\frac{f(a_{n-1})(a_{n-1} - b_{n-1})}{f(b_{n-1}) - f(a_{n-1})} \tag{2.28}$$

for $b_n = c_{n-1}$, respectively.

Let us suppose that $f(a) < 0$ and $f(b) > 0$, which leads to $f(a_n) < 0$ and $f(b_n) > 0$, $(\forall)\, n \in \mathbb{N}$, respectively. In this case, it follows that $a_n - b_n$ has the same sign as $a_{n-1} - b_{n-1}$. By complete induction we obtain $a_n < b_n$, hence relation (2.26) is true.

We have

$$a_n < c_n < b_n, \quad (\forall)\, n \in \mathbb{N}. \tag{2.29}$$

Indeed, we can write

$$a_n - c_n = a_n - \frac{a_n f(b_n) - b_n f(a_n)}{f(b_n) - f(a_n)} = -\frac{f(a_n)(a_n - b_n)}{f(b_n) - f(a_n)} < 0 \tag{2.30}$$

and

$$b_n - c_n = b_n - \frac{a_n f(b_n) - b_n f(a_n)}{f(b_n) - f(a_n)} = \frac{f(b_n)(b_n - a_n)}{f(b_n) - f(a_n)} > 0, \tag{2.31}$$

respectively, hence relation (2.29) is true.

We now observe that the sequence $\{a_n\}_{n\in\mathbb{N}}$ is monotone increasing and bounded from above by any element $b_n$ of the sequence $\{b_n\}_{n\in\mathbb{N}}$, in particular by $b_0 = b$. Hence, the sequence $\{a_n\}_{n\in\mathbb{N}}$ is convergent; let $A$ be its limit. Analogously, the sequence $\{b_n\}_{n\in\mathbb{N}}$ is monotone decreasing and bounded from below by any element of the sequence $\{a_n\}_{n\in\mathbb{N}}$, particularly by $a_0 = a$; hence, the sequence $\{b_n\}_{n\in\mathbb{N}}$ is convergent; let $B$ be its limit. We thus obtain

$$\lim_{n\to\infty} a_n = A, \quad \lim_{n\to\infty} b_n = B, \quad A \le B, \quad A, B \in [a, b]. \tag{2.32}$$
Let us show now that the sequence $\{c_n\}$ is convergent.

Case 2.1 We suppose that $A = B$. Using inequality (2.29) and passing to the limit, we obtain

$$A = \lim_{n\to\infty} a_n \le \lim_{n\to\infty} c_n \le \lim_{n\to\infty} b_n = B, \tag{2.33}$$

and the squeeze theorem leads to

$$\lim_{n\to\infty} c_n = A = B. \tag{2.34}$$

On the other hand,

$$f(A) = f\left(\lim_{n\to\infty} a_n\right) = \lim_{n\to\infty} f(a_n) \le 0 \tag{2.35}$$

and

$$f(B) = f\left(\lim_{n\to\infty} b_n\right) = \lim_{n\to\infty} f(b_n) \ge 0; \tag{2.36}$$

because of the continuity of $f$, the limit commutes with the function. It follows from equation (2.35) and equation (2.36) that $f(A) = f(\lim_{n\to\infty} c_n) = 0$ and, because $f$ has a single root in the interval $[a, b]$, we deduce that $A = \alpha$, hence $\lim_{n\to\infty} c_n = \alpha$.

Case 2.2 We suppose that $A \neq B$. Let us observe from the very beginning that it is not possible to have $f(A) = f(B) = 0$, because $f$ has only one root in the interval $[a, b]$. Hence, $f(A) \neq f(B)$. Let us now pass to the limit in

$$c_n = \frac{a_n f(b_n) - b_n f(a_n)}{f(b_n) - f(a_n)}. \tag{2.37}$$

We get

$$\lim_{n\to\infty} c_n = \lim_{n\to\infty} \frac{a_n f(b_n) - b_n f(a_n)}{f(b_n) - f(a_n)} = \frac{Af(B) - Bf(A)}{f(B) - f(A)}. \tag{2.38}$$

If $f(A) = 0$ and $f(B) \neq 0$, then relation (2.38) leads to

$$\lim_{n\to\infty} c_n = \frac{Af(B) - B \cdot 0}{f(B) - 0} = A, \tag{2.39}$$

hence $c_n \to \alpha$. If $f(B) = 0$ and $f(A) \neq 0$, then relation (2.38) leads to

$$\lim_{n\to\infty} c_n = \frac{A \cdot 0 - Bf(A)}{0 - f(A)} = B, \tag{2.40}$$

so that we get once more $c_n \to \alpha$. Finally, if $f(A) \neq 0$ and $f(B) \neq 0$, it is obvious that $f(A) < 0$ and $f(B) > 0$.

On the other hand, the inequalities

$$A < \frac{Af(B) - Bf(A)}{f(B) - f(A)} < B \tag{2.41}$$

hold, which is evident because they lead to

$$-Af(A) < -Bf(A) \quad \text{and} \quad Af(B) < Bf(B). \tag{2.42}$$

Passing now to the limit in equation (2.37) and taking into account inequality (2.41), we get

$$A < \lim_{n\to\infty} c_n < B. \tag{2.43}$$

On the other hand, we have $a_{n+1} = c_n$ or $b_{n+1} = c_n$ for any $n \ge 0$. Hence,

$$\{c_n \mid n \in \mathbb{N}\} \subset \{a_m \mid m \in \mathbb{N}^*\} \cup \{b_m \mid m \in \mathbb{N}^*\} \subset (-\infty, A] \cup [B, +\infty), \tag{2.44}$$

from which

$$\lim_{n\to\infty} c_n \in (-\infty, A] \cup [B, \infty). \tag{2.45}$$

Relations (2.43) and (2.45) are in contradiction, so that this case is not possible, and the theorem is proved.
Theorem 2.3 (a posteriori estimation of the error). Let $f : [a, b] \to \mathbb{R}$ be continuous on $[a, b]$ and differentiable on $(a, b)$; we suppose that $f$ has a single root $\alpha$ in $[a, b]$ and that there exist strictly positive real constants $m > 0$, $M > 0$ such that

$$m \le |f'(x)| \le M, \quad (\forall)\, x \in (a, b). \tag{2.46}$$

Under these conditions, the relation

$$|c_{n-1} - \alpha| \le \frac{M}{m}|c_n - c_{n-1}|, \tag{2.47}$$

which represents the a posteriori estimation of the error in the chord method, holds.

Demonstration. Assuming that $f(c_{n-1}) \neq 0$, we can write

$$c_n - c_{n-1} = \frac{a_n f(b_n) - b_n f(a_n)}{f(b_n) - f(a_n)} - a_n = \frac{f(a_n)(a_n - b_n)}{f(b_n) - f(a_n)} \tag{2.48}$$

if $f(c_{n-1}) < 0$, and

$$c_n - c_{n-1} = \frac{a_n f(b_n) - b_n f(a_n)}{f(b_n) - f(a_n)} - b_n = \frac{f(b_n)(a_n - b_n)}{f(b_n) - f(a_n)} \tag{2.49}$$

if $f(c_{n-1}) > 0$, respectively.

Let us now apply Lagrange's finite increments formula to the function $f$ on the interval $[a_n, b_n]$. Hence, there exists $\xi \in (a_n, b_n)$ such that

$$f(b_n) - f(a_n) = f'(\xi)(b_n - a_n). \tag{2.50}$$

From equation (2.48), equation (2.49), and equation (2.50) we get

$$c_n - c_{n-1} = -\frac{f(a_n)}{f'(\xi)} \quad \text{for } f(c_{n-1}) < 0 \tag{2.51}$$

or

$$c_n - c_{n-1} = -\frac{f(b_n)}{f'(\xi)} \quad \text{for } f(c_{n-1}) > 0. \tag{2.52}$$

Let us now apply Lagrange's formula to the restriction of the function $f$ to the interval $[a_n, \alpha]$. Hence, there exists $\xi_n \in (a_n, \alpha)$ such that

$$f(\alpha) - f(a_n) = f'(\xi_n)(\alpha - a_n); \tag{2.53}$$

because $f(\alpha) = 0$, we get

$$-f(a_n) = f'(\xi_n)(\alpha - a_n). \tag{2.54}$$

Obviously, in the case $f(c_{n-1}) > 0$, we apply Lagrange's formula to the restriction of the function $f$ to the interval $(\alpha, b_n)$, the calculation being analogous. From equation (2.51) and equation (2.54), it follows that

$$c_n - c_{n-1} = \frac{f'(\xi_n)(\alpha - a_n)}{f'(\xi)}. \tag{2.55}$$

On the other hand, since $a_n = c_{n-1}$ in this case, we can write the relations

$$|\alpha - a_n| = |\alpha - c_{n-1}|, \tag{2.56}$$

$$|f'(\xi)| \le M, \quad |f'(\xi_n)| \ge m, \tag{2.57}$$

so that, by applying the modulus, expression (2.55) leads to

$$|\alpha - c_{n-1}| \le \frac{M}{m}|c_n - c_{n-1}|. \tag{2.58}$$
Theorem 2.4 (a priori estimation of the error). Let $f : [a, b] \to \mathbb{R}$ have a single root $\alpha$ in $[a, b]$. If $f$ is convex, strictly increasing, and differentiable on $[a, b]$, and if $f'(a) > 0$, then the relation

$$\alpha - c_n \le \left(1 - \frac{f'(a)}{f'(b)}\right)^n(\alpha - c_0) \le \left(1 - \frac{f'(a)}{f'(b)}\right)^n(b - a) \tag{2.59}$$

holds.

Demonstration. Because $f$ is convex, we deduce that $f'$ is strictly increasing on $[a, b]$, so that we have

$$f'(a) < f'(x) < f'(b), \quad (\forall)\, x \in (a, b). \tag{2.60}$$

From equation (2.37), taking into account that $f$ is convex and supposing that $f(a) < 0$, $f(b) > 0$, we obtain

$$\alpha - c_n = \alpha - \frac{c_{n-1}f(b) - bf(c_{n-1})}{f(b) - f(c_{n-1})} = \frac{f(b)(\alpha - c_{n-1}) - f(c_{n-1})(\alpha - b)}{f(b) - f(c_{n-1})}. \tag{2.61}$$

We now apply Lagrange's theorem to the function $f$ on the interval $[\alpha, b]$; hence, there exists $\xi \in (\alpha, b)$ such that

$$f(b) - f(\alpha) = f'(\xi)(b - \alpha). \tag{2.62}$$

Analogously, applying Lagrange's formula to the function $f$ on the interval $[c_{n-1}, \alpha]$, there results the existence of $\zeta \in (c_{n-1}, \alpha)$, for which we can write

$$f(\alpha) - f(c_{n-1}) = f'(\zeta)(\alpha - c_{n-1}). \tag{2.63}$$
Figure 2.2 The modified chord method: the ordinate at the fixed end $b$ is successively replaced by $f(b)/2$, $f(b)/4$, $f(b)/8, \ldots$, yielding the points $c_0, c_1, c_2, c_3$.
Because $f(\alpha) = 0$, expressions (2.62) and (2.63) take the simpler forms

$$f(b) = f'(\xi)(b - \alpha), \tag{2.64}$$

$$-f(c_{n-1}) = f'(\zeta)(\alpha - c_{n-1}), \tag{2.65}$$

respectively. Replacing the last two relations in formula (2.61), we obtain

$$\alpha - c_n = \frac{f'(\xi) - f'(\zeta)}{f(b) - f(c_{n-1})}(b - \alpha)(\alpha - c_{n-1}). \tag{2.66}$$

Because $\zeta < \xi$ and $f'$ is strictly increasing, we get

$$f'(\xi) - f'(\zeta) > 0. \tag{2.67}$$

On the other hand, $f(b) > 0$ and $f(c_{n-1}) < 0$. Relation (2.66) now leads to

$$\alpha - c_n \le \frac{f'(\xi) - f'(\zeta)}{f(b)}(b - \alpha)(\alpha - c_{n-1}). \tag{2.68}$$

Replacing relation (2.64) in the last formula, we get

$$\alpha - c_n \le \frac{f'(\xi) - f'(\zeta)}{f'(\xi)}(\alpha - c_{n-1}) = \left(1 - \frac{f'(\zeta)}{f'(\xi)}\right)(\alpha - c_{n-1}) \le \left(1 - \frac{f'(a)}{f'(b)}\right)(\alpha - c_{n-1}). \tag{2.69}$$

If we write relation (2.69) for $n - 1, n - 2, \ldots, 1$, it results in

$$\alpha - c_n \le \left(1 - \frac{f'(a)}{f'(b)}\right)^n(\alpha - c_0) \le \left(1 - \frac{f'(a)}{f'(b)}\right)^n(b - a), \tag{2.70}$$

and the theorem is proved.
A variant of this method supposes the division by 2, at each step, of the value of the function at the end which is maintained. The situation is presented graphically in Figure 2.2. In the case considered in the figure, we obtain the results

$$c_0 = \frac{af(b) - bf(a)}{f(b) - f(a)}, \quad c_1 = \frac{c_0\frac{f(b)}{2} - bf(c_0)}{\frac{f(b)}{2} - f(c_0)}, \quad c_2 = \frac{c_1\frac{f(b)}{4} - bf(c_1)}{\frac{f(b)}{4} - f(c_1)}, \quad c_3 = \frac{c_2\frac{f(b)}{8} - bf(c_2)}{\frac{f(b)}{8} - f(c_2)}. \tag{2.71}$$
Figure 2.3 The tangent method.
2.3 THE TANGENT METHOD (NEWTON)
Let us consider the equation³

$$f(x) = 0, \tag{2.72}$$

with the root $\bar{x}$, and let $(\bar{x} - \Delta, \bar{x} + \Delta)$ be an interval on which equation (2.72) has a single solution (obviously $\bar{x}$).

Let us consider the point $x_0 \in (\bar{x} - \Delta, \bar{x} + \Delta)$ and construct the tangent to the graph of the function $f$ at the point $(x_0, f(x_0))$ (Fig. 2.3); the corresponding equation is

$$y - f(x_0) = f'(x_0)(x - x_0). \tag{2.73}$$

The point of intersection with the $Ox$-axis is given by

$$-f(x_0) = f'(x_0)(x_1 - x_0), \tag{2.74}$$

from which

$$x_1 = x_0 - \frac{f(x_0)}{f'(x_0)}. \tag{2.75}$$

The last formula allows the construction of a recurrent sequence $\{x_n\}_{n\in\mathbb{N}}$ in the form

$$x_0 \in (\bar{x} - \Delta, \bar{x} + \Delta), \quad x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}. \tag{2.76}$$

The tangent method consists in the construction of the terms of the sequence $\{x_n\}_{n\in\mathbb{N}}$ until a certain stopping criterion is satisfied, that is, until we obtain $\bar{x}$ or until the modulus of the difference between two consecutive terms $x_n$ and $x_{n+1}$ of the sequence becomes smaller than an a priori given $\varepsilon$.

³The method is sometimes called the Newton–Raphson method. It appears in De analysi per aequationes numero terminorum infinitas (1711) by Isaac Newton (1642–1727), used for finding polynomial roots; in De metodis fluxionum et serierum infinitarum (1736) by Isaac Newton, again for polynomial roots; in A Treatise of Algebra both Historical and Practical (1690) by John Wallis; and in Analysis aequationum universalis (1690) by Joseph Raphson (circa 1648–circa 1715). The general case of the method for arbitrary equations was given by Thomas Simpson in 1740.
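A transcription of the iteration (2.76) might read as follows (an illustrative Python sketch; the stopping rule $|x_{n+1} - x_n| < \varepsilon$, which anticipates Observation 2.3 below, and the test equation are the editor's assumptions):

```python
def newton(f, fprime, x0, eps, n_max=50):
    """Tangent (Newton) iteration (2.76); stops when two successive
    iterates differ by less than eps."""
    x = x0
    for _ in range(n_max):
        x_new = x - f(x) / fprime(x)
        if abs(x_new - x) < eps:
            return x_new
        x = x_new
    return x

root = newton(lambda x: x**3 + x - 1.0, lambda x: 3.0 * x**2 + 1.0, 1.0, 1e-12)
print(f"{root:.12f}")   # 0.682327803828; note the very fast convergence
```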
Figure 2.4 The tangent at the point $x^*$ to the graph of the function $f$ is horizontal and does not intersect the $Ox$-axis.
Therefore, we state the following theorem.

Theorem 2.5 Let us consider the function $f : (\bar{x} - \Delta, \bar{x} + \Delta) \to \mathbb{R}$, with $f(\bar{x}) = 0$, which has a single root in the interval $(\bar{x} - \Delta, \bar{x} + \Delta)$. Let us suppose that $f$ is twice differentiable on $(\bar{x} - \Delta, \bar{x} + \Delta)$ and that there exist strictly positive real constants $\alpha > 0$, $\beta > 0$ such that $|f'(x)| \ge \alpha$, $|f''(x)| \le \beta$ for any $x \in (\bar{x} - \Delta, \bar{x} + \Delta)$. If we denote by $\lambda$ the value $\min\{\Delta, 2\alpha/\beta\}$, then, for any $x_0 \in (\bar{x} - \lambda, \bar{x} + \lambda)$, the sequence $\{x_n\}_{n\in\mathbb{N}}$, defined by relations (2.76), converges to $\bar{x}$.

Demonstration. Let us observe firstly that, because of the hypothesis of the existence of the constant $\alpha > 0$, the derivative $f'$ does not vanish on the interval $(\bar{x} - \Delta, \bar{x} + \Delta)$. Hence, the situation considered in Figure 2.4, in which the tangent to the graph of the function $f$ at the point $x^*$ is horizontal, cannot occur under the hypotheses considered.

Taking into account the existence of the constant $\beta > 0$, we can state that if there exists $\tilde{x} \in (\bar{x} - \Delta, \bar{x} + \Delta)$ such that $|f'(\tilde{x})| < \infty$, then we have $|f'(x)| < \infty$ for any $x \in (\bar{x} - \Delta, \bar{x} + \Delta)$. Indeed, let us apply Lagrange's formula of finite increments to the function $f'$ on the interval defined by the ends $x$ and $\tilde{x}$. We deduce the existence of a point $\xi$ in this interval for which we have

$$|f'(x) - f'(\tilde{x})| = |x - \tilde{x}| \cdot |f''(\xi)| \le \beta|x - \tilde{x}| \le 2\beta\Delta, \tag{2.77}$$

from which

$$-2\beta\Delta + f'(\tilde{x}) \le f'(x) \le 2\beta\Delta + f'(\tilde{x}), \tag{2.78}$$

hence $|f'(x)| < \infty$. Thus, the hypotheses of the theorem exclude the situation in Figure 2.5, which would make the iteration sequence (2.76) stagnate at $x = x^*$ ($x_n = x_{n+k}$ for any $k \ge 1$).
Figure 2.5 The tangent at the point $x^*$ to the graph of the function $f$ is vertical; by iterating relation (2.76) from $x = x^*$, we get $x_n = x_{n+k}$ for any $k \ge 1$.
We cannot have |f (x)| = ∞ for any x ∈ (x − , x + ) because the graph of f would be a
vertical straight line passing through x∗
so that f can no more be a function in the sense of the
known definition.
The sequence {xn}n∈N satisfies the relation
|x − xn+1| ≤
β
2α
|x − xn|2
. (2.79)
Indeed, we may write successively
|x − xn+1| = x − xn +
f xn
f (xn)
=
f xn (x − xn) + f (xn)
f (xn)
=
−f xn (x)xn − f (xn)
f (xn)
,
(2.80)
so that
|x − xn+1| =
f (x) − f (xn) − f (xn)(x − xn)
f (xn)
, (2.81)
because f (x) = 0.
On the other hand, by representing the function f by means of a Taylor series around the point
xn, we have
f (x) = f (xn) +
x − xn
1!
f (xn) +
(x − xn)2
2!
f (ξ), (2.82)
where ξ is a point situated between x and xn.
From relations (2.81) and (2.82) we get
|x − xn+1| =
x − xn
2
f (ξ)
2!
1
f xn
(2.83)
and, taking into account that |f (xn)| ≥ α, |f (ξ)| ≤ β, we obtain equation (2.79).
To show that the sequence {xn}n∈N has its terms in the interval (x − λ, x + λ), we use an
induction method. The affirmation is obvious for n = 0 because of the choice of x0. Let us now
suppose that xn ∈ (x − λ, x + λ). From equation (2.79) we get
\[ |x - x_{n+1}| \le \frac{\beta}{2\alpha} |x - x_n|^2 < \frac{\beta}{2\alpha} \lambda^2 = \left( \frac{\beta}{2\alpha} \lambda \right) \lambda \le \lambda, \tag{2.84} \]
which leads to
\[ -\lambda < x - x_{n+1} < \lambda, \quad x - \lambda < x_{n+1} < x + \lambda, \tag{2.85} \]
hence xn+1 ∈ (x − λ, x + λ). Therefore, if xn ∈ (x − λ, x + λ), then xn+1 ∈ (x − λ, x + λ) and
also x0 ∈ (x − λ, x + λ). It follows that xn ∈ (x − λ, x + λ) for any n ∈ N.
To show that {xn}n∈N converges to x, we multiply expression (2.79) by β/(2α). We obtain
\[ \frac{\beta}{2\alpha} |x - x_{n+1}| \le \left( \frac{\beta}{2\alpha} |x - x_n| \right)^2. \tag{2.86} \]
Let us denote by {zn}n∈N the sequence defined by
\[ z_n = \frac{\beta}{2\alpha} |x - x_n|, \quad n \in \mathbb{N}, \tag{2.87} \]
so that equation (2.86) can now be written as
\[ z_{n+1} \le z_n^2. \tag{2.88} \]
Written for n − 1, n − 2, . . . , 0, relation (2.88) leads to
\[ z_{n+1} \le z_0^{2^{n+1}}. \tag{2.89} \]
On the other hand,
\[ z_0 = \frac{\beta}{2\alpha} |x - x_0| < \frac{\beta}{2\alpha} \lambda \le 1, \tag{2.90} \]
corresponding to the definition of λ. Finally, there results
\[ \lim_{n\to\infty} z_n = 0, \quad \lim_{n\to\infty} \frac{\beta}{2\alpha} |x - x_n| = 0, \tag{2.91} \]
from which
\[ \lim_{n\to\infty} x_n = x, \tag{2.92} \]
so that the sequence {xn}n∈N converges to the single root x ∈ (x − Δ, x + Δ) of the equation f(x) = 0.
Proposition 2.1 (a priori estimation of the error in the tangent method). If λ < 2α/β, then the relation
\[ |x - x_n| \le \frac{2\alpha}{\beta} \left( \frac{\beta}{2\alpha} \lambda \right)^{2^n} \tag{2.93} \]
holds under the conditions of Theorem 2.5.
Demonstration. From relation (2.79) we easily obtain
\[ \frac{\beta}{2\alpha} |x - x_n| \le \left( \frac{\beta}{2\alpha} |x - x_0| \right)^{2^n} < \left( \frac{\beta}{2\alpha} \lambda \right)^{2^n}, \tag{2.94} \]
and the proposition is proved.
Observation 2.2 To obtain the root x with a precision ε we impose, from formula (2.93), the condition
\[ \frac{2\alpha}{\beta} \left( \frac{\beta}{2\alpha} \lambda \right)^{2^n} < \varepsilon, \tag{2.95} \]
from which we get the number of iteration steps
\[ n = \left[ \ln\!\left( \ln \frac{\varepsilon\beta}{2\alpha} \Big/ \ln \frac{\beta\lambda}{2\alpha} \right) \Big/ \ln 2 \right] + 1, \tag{2.96} \]
where the square brackets denote the entire part function.
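As an illustration, the step count (2.96) is easy to evaluate numerically; the following Python sketch (the language and the function name are our choice, not the book's) computes n for given α, β, λ, and ε:

    import math

    def newton_steps_a_priori(alpha, beta, lam, eps):
        # number of iteration steps given by (2.96); assumes
        # lam < 2*alpha/beta and eps*beta/(2*alpha) < 1
        q = beta * lam / (2.0 * alpha)      # q < 1 by the choice of lam
        n = math.log(math.log(eps * beta / (2.0 * alpha)) / math.log(q)) / math.log(2.0)
        return int(n) + 1

For instance, with α = β = λ = 1 and ε = 10⁻¹⁰ one obtains n = 6, reflecting the quadratic convergence of the method.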
Proposition 2.2 (a posteriori estimation of the error in the tangent method). We have the expression
\[ |x_{n+1} - x| \le \frac{\beta}{2\alpha} |x_{n+1} - x_n|^2, \tag{2.97} \]
in the frame of Theorem 2.5.
Demonstration. By expansion into a Taylor series of the function f around x_n, we get
\[ f(x_{n+1}) = f(x_n) + \frac{x_{n+1} - x_n}{1!} f'(x_n) + \frac{(x_{n+1} - x_n)^2}{2!} f''(\zeta), \tag{2.98} \]
from which
\[ f(x_{n+1}) - f(x_n) - \frac{x_{n+1} - x_n}{1} f'(x_n) = \frac{(x_{n+1} - x_n)^2}{2} f''(\zeta), \tag{2.99} \]
where ζ is a point situated between x_n and x_{n+1}. Applying the modulus to equation (2.99) and taking into account equation (2.76), we get
\[ |f(x_{n+1})| = \frac{(x_{n+1} - x_n)^2}{2} |f''(\zeta)|. \tag{2.100} \]
On the other hand, from the hypotheses of Theorem 2.5 we obtain
\[ |f''(\zeta)| \le \beta, \tag{2.101} \]
and relation (2.100) may be transcribed in the form
\[ |f(x_{n+1})| \le \frac{\beta}{2} |x_{n+1} - x_n|^2. \tag{2.102} \]
Applying the formula of finite increments to the function f between the points x_{n+1} and x (the root of the equation f(x) = 0 in the interval (x − Δ, x + Δ)), the existence of a point ξ between x_{n+1} and x such that
\[ f(x_{n+1}) - f(x) = f'(\xi)(x_{n+1} - x) \tag{2.103} \]
is proved.
Taking into account that f(x) = 0, relations (2.102) and (2.103) lead to
\[ |f'(\xi)| \, |x_{n+1} - x| \le \frac{\beta}{2} |x_{n+1} - x_n|^2 \tag{2.104} \]
and, taking into account that |f′(ξ)| ≥ α, we obtain relation (2.97), which we had to prove.
Observation 2.3 To obtain the root x with precision ε, formula (2.97) leads to the condition
\[ \frac{\beta}{2\alpha} |x_{n+1} - x_n|^2 < \varepsilon, \tag{2.105} \]
from which
\[ |x_{n+1} - x_n| < \sqrt{\frac{2\alpha\varepsilon}{\beta}}; \tag{2.106} \]
the iteration algorithm continues until the modulus of the difference of two consecutive iterations becomes smaller than \(\sqrt{2\alpha\varepsilon/\beta}\).
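A minimal sketch of the tangent method with this stopping rule, in Python (an illustration under the hypotheses of Theorem 2.5; the function names are ours):

    def tangent_method(f, fprime, x0, alpha, beta, eps, max_iter=100):
        # iteration (2.76) stopped by the a posteriori criterion (2.106)
        tol = (2.0 * alpha * eps / beta) ** 0.5
        x = x0
        for _ in range(max_iter):
            x_new = x - f(x) / fprime(x)
            if abs(x_new - x) < tol:
                return x_new
            x = x_new
        raise RuntimeError("no convergence in max_iter steps")

By Proposition 2.2, the returned value then approximates the root within ε.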
Theorem 2.6 Let f : [a, b] → R be a function that satisfies the following conditions:
(i) f′ is strictly positive on (a, b), that is, f′(x) > 0, (∀) x ∈ (a, b);
(ii) f″ is strictly positive on (a, b), hence f″(x) > 0, (∀) x ∈ (a, b);
(iii) f has a single root x in the interval (a, b).
In the above hypotheses, the sequence {xn}n∈N, defined by relation (2.76) with f (x0) > 0, is a
sequence of real numbers that converges to x.
Demonstration. The sequence {xn}n∈N is a decreasing one. To prove this, we write Taylor's relation for the points x_{n+1} and x_n, so that
\[ f(x_{n+1}) = f(x_n) + \frac{x_{n+1} - x_n}{1!} f'(x_n) + \frac{(x_{n+1} - x_n)^2}{2!} f''(\xi), \tag{2.107} \]
where ξ is a point between x_n and x_{n+1}.
On the other hand, from relation (2.76) we obtain
\[ f(x_n) + f'(x_n)(x_{n+1} - x_n) = 0, \tag{2.108} \]
which, replaced in formula (2.107), leads to
\[ f(x_{n+1}) = \frac{f''(\xi)}{2} (x_{n+1} - x_n)^2. \tag{2.109} \]
Taking into account hypothesis (ii), we get f(x_{n+1}) > 0, (∀) n ≥ 0, and because f(x_0) > 0 it follows that f(x_n) > 0, (∀) n ∈ N.
Relation (2.76) may be written in the form
\[ x_{n+1} - x_n = -\frac{f(x_n)}{f'(x_n)} \tag{2.110} \]
and, because f(x_n) > 0 and f′(x_n) > 0 (hypothesis (i)), we have
\[ x_{n+1} - x_n < 0, \tag{2.111} \]
and hence the sequence {xn}n∈N is a decreasing one (even strictly decreasing).
The sequence {xn}n∈N is bounded below by x, the unique solution of the equation f(x) = 0 in the interval (a, b). Indeed, because f(x_n) ≥ 0, (∀) n ∈ N, the function f is strictly increasing on (a, b) (hypothesis (i)), and f(x) = 0, we obtain x_n ≥ x, (∀) n ∈ N; hence the sequence {xn}n∈N is bounded below by x.
From the previous two steps, we deduce that {xn}n∈N is convergent; let x∗ be its limit. Passing to the limit for n → ∞ in the definition relation (2.76), we get
\[ \lim_{n\to\infty} x_{n+1} = \lim_{n\to\infty} x_n - \lim_{n\to\infty} \frac{f(x_n)}{f'(x_n)}, \tag{2.112} \]
from which
\[ x^* = x^* - \frac{f(x^*)}{f'(x^*)}, \tag{2.113} \]
hence f(x∗) = 0. But f has a single root x in (a, b), so that x∗ = x; hence the theorem is proved.
Observation 2.4
(i) Theorem 2.6 makes sure that, in the conditions of the hypotheses, the sequence {xn}n∈N is
convergent to x with f (x) = 0, and x0 can be taken arbitrarily in the interval (a, b), with
the condition f (x0) > 0. In particular, if the conditions (i) and (ii) are satisfied at the point
b, we can take x0 = b.
(ii) If the function f is strictly concave and decreasing, then we can consider the function −f, which has the same root x and satisfies the hypotheses of Theorem 2.6.
(iii) If f is strictly convex and decreasing, then we can take x0 = a, assuming that the hypotheses
(i) and (ii) of Theorem 2.6 are satisfied at the point a.
(iv) If the function f is strictly concave and increasing, then we consider the function −f ,
which satisfies the conditions of point (iii) of this observation.
Observation 2.5 We can no longer give formulae for an a priori or an a posteriori estimation of the error under the conditions of Theorem 2.6. Therefore, the sequence of iterations usually stops when |x_{n+1} − x_n|² < ε, where ε is the imposed error.
Observation 2.6 Newton’s method presented here has at least two deficiencies. The first one
consists in the choice of intervals of the form (x − µ, x + µ), where x is the required solution,
that is, intervals centered just at the point x, which is unknown. This deficiency can be easily
eliminated for functions twice differentiable as shown later. The second deficiency arises because
in any iteration step we must calculate f (xn) as well as f (xn). We can construct a simplified
Newton’s method in which we need not calculate f (xn) every time, but always use f (x0). Such
a method is given by Theorem 2.8.
Theorem 2.7 (general procedure of choice of the start point x0). Let f : [a, b] → R be a function twice differentiable for which f(a) < 0 and f(b) > 0. Let us suppose that there exist the strictly positive constants α and β such that |f′(x)| ≥ α and |f″(x)| ≤ β for any x ∈ [a, b]. We apply the bisection method to the equation f(x) = 0 on the interval [a, b] until we obtain an interval [m1, m2] for which a < m1, m2 < b and m2 − m1 < 2α/β. Choosing x0 ∈ (m1, m2), the sequence of successive iterations given by Newton's method converges to the unique solution x of the equation f(x) = 0 in the interval [a, b].
Demonstration. From the condition |f′(x)| ≥ α, α > 0, and because f is twice differentiable, it follows that f′(x) does not change sign in the interval [a, b]. But f(a) < 0 and f(b) > 0, hence f is strictly increasing (f′(x) > 0, (∀) x ∈ [a, b]) and has a single root x in the interval [a, b], so that such a hypothesis is not necessary.
Let [γn, γ′n] be the interval obtained at the nth iteration of the bipartition method. It is known that the sequences {γn}n∈N and {γ′n}n∈N converge to x. Let us introduce the value
\[ \varepsilon = \min\left\{ x - a, \; b - x, \; \frac{2\alpha}{\beta} \right\}; \tag{2.114} \]
we observe that ε > 0.
There result the following statements:
• there exists n′ such that |γn − x| < ε for n > n′;
• there exists n″ such that |γ′n − x| < ε for n > n″;
• there exists n‴ such that γ′n − γn < ε for n > n‴.
Let us denote nε = max{n′, n″, n‴}. From the above three statements, we obtain
\[ |\gamma_n - x| < \varepsilon, \quad |\gamma'_n - x| < \varepsilon, \quad \gamma'_n - \gamma_n < \varepsilon, \quad \text{for } n > n_\varepsilon. \tag{2.115} \]
We denote by [m1, m2] the interval [γn, γ′n] corresponding to n = nε + 1. The first inequality (2.115) leads to
\[ -\varepsilon < x - m_1 < \varepsilon; \tag{2.116} \]
hence, because ε ≤ x − a, we get m1 > a. Analogously, from the second relation (2.115) we obtain m2 < b, while the last relation (2.115) leads to
\[ m_2 - m_1 < \varepsilon \le \frac{2\alpha}{\beta}. \tag{2.117} \]
On the other hand, the interval [m1, m2] can be written in the form
\[ \left[ a + i\,\frac{b - a}{2^{n_\varepsilon + 1}}, \; a + (i + 1)\,\frac{b - a}{2^{n_\varepsilon + 1}} \right], \tag{2.118} \]
with i ∈ N, i > 0 (because m1 > a) and i + 1 < 2^{nε+1} (because m2 < b). We have
\[ m_1 - (m_2 - m_1) = a + (i - 1)\,\frac{b - a}{2^{n_\varepsilon + 1}} \ge a, \tag{2.119} \]
\[ m_2 + (m_2 - m_1) = a + (i + 2)\,\frac{b - a}{2^{n_\varepsilon + 1}} \le b. \tag{2.120} \]
Considering that x ∈ (m1, m2), we get
\[ m_1 > x - (m_2 - m_1), \quad m_2 < x + (m_2 - m_1), \quad x - (m_2 - m_1) > a, \quad x + (m_2 - m_1) < b. \tag{2.121} \]
Introducing the notation
\[ \Delta = m_2 - m_1, \tag{2.122} \]
we are led to the sequence of inclusions
\[ (m_1, m_2) \subset (x - \Delta, x + \Delta) \subset [a, b]. \tag{2.123} \]
On the other hand, m2 − m1 < 2α/β, hence λ = m2 − m1 = Δ in Theorem 2.5.
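The procedure of Theorem 2.7 can be sketched as follows (a Python illustration, assuming f(a) < 0 < f(b) as in the theorem; for simplicity, the Newton phase stops when two consecutive iterates differ by less than ε):

    def newton_with_bisection_start(f, fprime, a, b, alpha, beta, eps, max_iter=100):
        # bisect until the bracket is shorter than 2*alpha/beta ...
        while b - a >= 2.0 * alpha / beta:
            c = 0.5 * (a + b)
            if f(c) < 0.0:
                a = c
            else:
                b = c
        x = 0.5 * (a + b)                  # start point x0 in (m1, m2)
        for _ in range(max_iter):          # ... then iterate with (2.76)
            x_new = x - f(x) / fprime(x)
            if abs(x_new - x) < eps:
                return x_new
            x = x_new
        raise RuntimeError("no convergence")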
Theorem 2.8 (simplified Newton’s method). Let f : (x − , x + ) → R be a function for
which x is its single root in the interval (x − , x + ). Let us suppose that f is twice
differentiable on (x − , x + ) and that there exist two strictly positive constants α and β
such that |f (α)| ≥ α and |f (x)| ≤ β for any x ∈ (x − , x + ). Also, let λ be such that
0 < λ < min{ , α/(2β)}. Under these conditions, the sequence {xn}n∈N defined by
x0 ∈ (x − λ, x + λ), xn+1 = xn −
f (xn)
f (x0)
, with f (x0) = 0 (2.124)
converges to x (Fig. 2.6).
Demonstration. Let us show that xn ∈ (x − λ, x + λ) for any n ∈ N using the induction method. By
the choice of x0, it follows that the statement is true for n = 0. Let us suppose that the affirmation
is true for n and let us state it for n + 1. We have, successively,
|x − xn+1| = x − xn +
f xn
f (x0)
=
f x0 (x − xn) + f (xn)
f (x0)
. (2.125)
Figure 2.6 Simplified Newton's method.
On the other hand, f(x) = 0, and the previous relation leads to
\[ |x - x_{n+1}| = \frac{1}{|f'(x_0)|} \, |f'(x_0)(x - x_n) + f(x_n) - f(x)|. \tag{2.126} \]
Let us now apply Lagrange's formula of finite increments to the function f on the interval defined by the points x_n and x. It results in the existence of a point ξ situated between x_n and x such that
\[ f(x_n) - f(x) = f'(\xi)(x_n - x). \tag{2.127} \]
Relation (2.126) becomes
\[ |x - x_{n+1}| = \frac{1}{|f'(x_0)|} \, |[f'(x_0) - f'(\xi)](x - x_n)|. \tag{2.128} \]
We now apply Lagrange's formula to the function f′ on the interval defined by the points x_0 and ξ; we deduce that there exists a point ζ in this interval such that
\[ f'(x_0) - f'(\xi) = f''(\zeta)(x_0 - \xi). \tag{2.129} \]
Relation (2.128) now becomes
\[ |x - x_{n+1}| = \frac{1}{|f'(x_0)|} \, |f''(\zeta)| \, |x_0 - \xi| \, |x - x_n|. \tag{2.130} \]
Taking into account the hypotheses of the theorem concerning the derivatives f′ and f″ and the constants α > 0 and β > 0, relation (2.130) leads to
\[ |x - x_{n+1}| \le \frac{\beta}{\alpha} \, |x_0 - \xi| \, |x - x_n|. \tag{2.131} \]
We may now write the following sequence of relations:
\[ |x_0 - \xi| = |x_0 - x + x - \xi| \le |x_0 - x| + |x - \xi| \le \lambda + \lambda = 2\lambda; \tag{2.132} \]
from equation (2.131) and equation (2.132) we obtain
\[ |x - x_{n+1}| \le \frac{2\beta\lambda}{\alpha} |x - x_n|. \tag{2.133} \]
By the choice of λ in the hypotheses of the theorem, we get 2βλ/α < 1; hence,
\[ |x - x_{n+1}| < |x - x_n|. \tag{2.134} \]
The induction hypothesis |x − x_n| < λ leads to |x − x_{n+1}| < λ, hence x_{n+1} ∈ (x − λ, x + λ), and the induction principle states that x_n ∈ (x − λ, x + λ) for any n ∈ N.
Let us show that x_n → x for n → ∞. We write relation (2.133) for n, n − 1, . . . , 0, hence
\[ |x - x_{n+1}| \le \left( \frac{2\beta\lambda}{\alpha} \right)^{n+1} |x - x_0|; \tag{2.135} \]
because 2βλ/α < 1, we get
\[ |x - x_{n+1}| \to 0 \quad \text{for } n \to \infty, \tag{2.136} \]
that is, lim_{n→∞} x_n = x, and the theorem is proved.
Proposition 2.3 (a priori estimation of the error in the simplified Newton method). The relation
\[ |x - x_n| \le \left( \frac{2\beta\lambda}{\alpha} \right)^n \lambda \tag{2.137} \]
holds under the conditions of Theorem 2.8.
Demonstration. If we write relation (2.135) for n, that is,
\[ |x - x_n| \le \left( \frac{2\beta\lambda}{\alpha} \right)^n |x - x_0|, \tag{2.138} \]
and if we consider that x0 ∈ (x − λ, x + λ), hence |x − x0| < λ, we obtain the required formula.
Observation 2.7 If we wish to determine x with an imposed accuracy ε, then we have to impose
\[ |x - x_n| \le \left( \frac{2\beta\lambda}{\alpha} \right)^n \lambda < \varepsilon; \tag{2.139} \]
we thus obtain the necessary number of iteration steps in the simplified Newton method,
\[ n = \left[ \ln \frac{\varepsilon}{\lambda} \Big/ \ln \frac{2\beta\lambda}{\alpha} \right] + 1, \tag{2.140} \]
where, as usual, the square brackets denote the entire part function.
Proposition 2.4 (a posteriori estimation of the error in the simplified Newton method). The relation
\[ |x_{n+1} - x| \le \frac{3\beta\lambda}{\alpha} |x_{n+1} - x_n| \tag{2.141} \]
holds under the conditions of Theorem 2.8.
Demonstration. Let us write Taylor’s formula for the function f at the points xn+1 and xn. We have
f (xn+1) = f (xn) +
xn+1 − xn
1!
f (xn) +
(xn+1 − xn)2
2!
f (ξ), (2.142)
where ξ is a point between xn and xn+1. From the definition of the sequence {xn}n∈N, we obtain
the relation
f (xn) = f (x0)(xn − xn+1), (2.143)
which, when replaced in equation (2.142), leads to
f (xn+1) = [f (x0) − f (xn)](xn − xn+1) +
(xn+1 − xn)2
2
f (ξ). (2.144)
Let us now apply Lagrange’s formula to the function f (x) for the points x0 and xn. It follows
that there exists ζ such that
f (x0) − f (xn) = f (ζ)(x0 − xn). (2.145)
From equation (2.145) and equation (2.144), we get
f (xn+1) = f (ζ)(x0 − xn)(xn − xn+1) +
(xn+1 − xn)2
2
f (ξ). (2.146)
In modulus, we obtain
|f (xn+1)| = f (ζ) (x0 − xn)(xn − xn+1) +
(xn+1 − xn)2
2
f (ξ)
≤ f (ζ) |x0 − xn| +
|f (ξ)|
2
|xn+1 − xn| |xn+1 − xn|. (2.147)
On the other hand, we have
|x0 − xn| = |x0 − x + x − xn| ≤ |x0 − x| + |x − xn| < 2λ (2.148)
and
|xn+1 − xn| = |xn+1 − x + x − xn| ≤ |xn+1 − x| + |x − xn| < 2λ. (2.149)
Hence,
\[ |f(x_{n+1})| \le \left( |f''(\zeta)| \, |x_0 - x_n| + \frac{|f''(\xi)|}{2} |x_{n+1} - x_n| \right) |x_{n+1} - x_n| < [\,2\lambda |f''(\zeta)| + \lambda |f''(\xi)|\,] \, |x_{n+1} - x_n|. \tag{2.150} \]
The condition of boundedness of |f″(x)| on (x − Δ, x + Δ), expressed by |f″(x)| ≤ β with β > 0, and relation (2.150) lead to
\[ |f(x_{n+1})| < 3\beta\lambda \, |x_{n+1} - x_n|. \tag{2.151} \]
Let us now apply Lagrange's formula to the function f for the points x_{n+1} and x,
\[ f(x_{n+1}) - f(x) = f'(\gamma)(x_{n+1} - x), \tag{2.152} \]
where γ is a point situated between x_{n+1} and x.
On the other hand, f(x) = 0, so that
\[ f(x_{n+1}) = f'(\gamma)(x_{n+1} - x), \tag{2.153} \]
which, when introduced in relation (2.151), leads to
\[ |f'(\gamma)| \, |x_{n+1} - x| < 3\beta\lambda \, |x_{n+1} - x_n|. \tag{2.154} \]
Considering that |f′(x)| ≥ α for any x ∈ (x − Δ, x + Δ), the above formula leads to relation (2.141), and the proposition is proved.
Observation 2.8 If we wish to determine x with an imposed precision ε, then we must continue the sequence of iterations (2.124) until
\[ |x_{n+1} - x_n| < \frac{\alpha\varepsilon}{3\beta\lambda}. \tag{2.155} \]
Observation 2.9 The statements in Observation 2.4 remain valid in this case too.
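A sketch of the simplified method, with the stopping rule (2.155) (Python; illustrative only, the names are ours):

    def simplified_newton(f, fprime, x0, alpha, beta, lam, eps, max_iter=1000):
        # iteration (2.124): f'(x0) is computed once and reused
        d0 = fprime(x0)                    # assumed nonzero
        tol = alpha * eps / (3.0 * beta * lam)
        x = x0
        for _ in range(max_iter):
            x_new = x - f(x) / d0
            if abs(x_new - x) < tol:
                return x_new
            x = x_new
        raise RuntimeError("no convergence")

The saving is one derivative evaluation per step, paid for by the linear (instead of quadratic) convergence seen in (2.133).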
2.4 THE CONTRACTION METHOD
Let us consider the equation
f (x) = 0 (2.156)
with f : I → R, where I is an interval of the real axis.
We suppose that equation (2.156) can be rewritten in the form
x = φ(x), (2.157)
assuming that x is a solution of equation (2.156) if and only if it is a solution of equation (2.157).
Definition 2.1 The roots of equation (2.157) are called fixed points of the function φ.
Observation 2.10 The passing from equation (2.156) to equation (2.157) is not unique. Indeed,
let us consider that
φ(x) = x − λf (x), (2.158)
where λ is a real arbitrary parameter. In this case, any root x of equation (2.156) is also a root of
equation (2.157) and the converse is also true.
Let us consider an approximation x0 of the root of equation (2.157) and let us construct the
sequence {xn}n∈N defined by the relation of recurrence
xn+1 = φ(xn), n ≥ 0. (2.159)
We have to state sufficient conditions under which this sequence converges to the root x of equation (2.157).
Definition 2.2 Let B be a Banach space and φ : B → B a mapping for which there exists q ∈ (0, 1) such that for any two elements x and y of B we have
\[ \|\varphi(x) - \varphi(y)\| \le q \, \|x - y\|. \tag{2.160} \]
Such a function is called a contraction.
Theorem 2.9 (Stefan Banach (1892–1945)). Let B be a Banach space and φ a contraction on it. In this case, the sequence {xn}n∈N defined by equation (2.159) is convergent to the unique root x of equation (2.157), for any x0 ∈ B.
Demonstration. Let us consider two successive terms xn and xn+1 of the sequence {xn}n∈N, for which we can write
\[ \|x_{n+1} - x_n\| = \|\varphi(x_n) - \varphi(x_{n-1})\| \le q \, \|x_n - x_{n-1}\| \le q^2 \, \|x_{n-1} - x_{n-2}\| \le \cdots \le q^n \, \|x_1 - x_0\|. \tag{2.161} \]
On the other hand,
\[ \|x_{n+p} - x_n\| = \|x_{n+p} - x_{n+p-1} + x_{n+p-1} - x_{n+p-2} + \cdots + x_{n+1} - x_n\| \le \|x_{n+p} - x_{n+p-1}\| + \|x_{n+p-1} - x_{n+p-2}\| + \cdots + \|x_{n+1} - x_n\| \le (q^{n+p-1} + q^{n+p-2} + \cdots + q^n) \, \|x_1 - x_0\| = q^n \, \|x_1 - x_0\| \, \frac{1 - q^p}{1 - q} < \frac{q^n}{1 - q} \, \|x_1 - x_0\|. \tag{2.162} \]
The sequence {xn}n∈N is a Cauchy one: for any ε > 0 there exists nε ∈ N such that for any n ≥ nε and for any p > 0, p ∈ N, we have
\[ \|x_{n+p} - x_n\| < \varepsilon. \tag{2.163} \]
Indeed, it is sufficient to require
\[ \frac{q^n}{1 - q} \, \|x_1 - x_0\| < \varepsilon, \tag{2.164} \]
as relation (2.162) suggests; hence, {xn}n∈N is a Cauchy sequence. Because B is a Banach space, {xn}n∈N is convergent; let
\[ x^* = \lim_{n\to\infty} x_n. \tag{2.165} \]
We observe that φ satisfies condition (2.160) because it is a contraction, and hence it is continuous. We may write
\[ x^* = \lim_{n\to\infty} \varphi(x_n) = \varphi\!\left( \lim_{n\to\infty} x_n \right) = \varphi(x^*); \tag{2.166} \]
hence, x∗ = x is a root of equation (2.157).
Let us show that x is the unique solution of equation (2.157). Per absurdum, let us suppose that x is not the unique solution of equation (2.157) and let x′ ≠ x be another solution of the same. We have
\[ \|x - x'\| = \|\varphi(x) - \varphi(x')\| \le q \, \|x - x'\| < \|x - x'\|, \tag{2.167} \]
because φ is a contraction; this is absurd, and hence x is unique.
Corollary 2.1 Let φ : [a, b] → R be such that
(a) for any x ∈ [a, b], we have φ(x) ∈ [a, b];
(b) there exists q ∈ (0, 1) such that for any x, y of [a, b] we have
\[ |\varphi(x) - \varphi(y)| \le q \, |x - y|. \tag{2.168} \]
Under these conditions,
(i) if x0 ∈ [a, b], then xn ∈ [a, b] for any n ∈ N and the sequence {xn}n∈N is convergent;
(ii) if x = lim_{n→∞} xn, then x is the unique root of equation (2.157) in [a, b].
Demonstration. We can apply Banach’s theorem 2.9 because the set of real numbers R is a Banach
space and relation (2.170) shows that φ is a contraction.
On the other hand, φ(x) ∈ [a, b] for any x ∈ [a, b] and, because x0 ∈ [a, b], we successively
deduce that x1 = φ(x0) ∈ [a, b], x2 ∈ [a, b], . . . , xn ∈ [a, b], . . . ; hence the corollary is proved.
Corollary 2.2 Let φ : [a, b] → R be such that
(a) we have φ(x) ∈ [a, b] for any x ∈ [a, b];
(b) φ is differentiable on [a, b] and there exists q ∈ (0, 1) such that
\[ |\varphi'(x)| \le q < 1, \quad \text{for any } x \in [a, b]. \tag{2.169} \]
Under these conditions,
(i) if x0 ∈ [a, b], then xn ∈ [a, b] for any n ∈ N and the sequence {xn}n∈N is convergent;
(ii) if x = lim_{n→∞} xn, then x is the only root of equation (2.157) in [a, b].
Demonstration. Let us consider x ∈ [a, b], y ∈ [a, b], x < y. Under these conditions, we can apply Lagrange's formula of finite increments to the function φ on the interval [x, y]. Hence, there exists ξ ∈ (x, y) such that
\[ \varphi(y) - \varphi(x) = \varphi'(\xi)(y - x). \tag{2.170} \]
Applying the modulus, we get
\[ |\varphi(x) - \varphi(y)| = |\varphi'(\xi)| \, |x - y|, \tag{2.171} \]
from which
\[ |\varphi(x) - \varphi(y)| \le \sup_{\xi \in [a,b]} |\varphi'(\xi)| \, |x - y| \le q \, |x - y|, \tag{2.172} \]
so that we can use Corollary 2.1.
Observation 2.11 To apply a method based on the above considerations, we must solve the following problems:
(i) the determination of an interval [a, b] such that φ(x) ∈ [a, b] for any x ∈ [a, b];
(ii) the verification that φ is a contraction on the interval [a, b].
Proposition 2.5 Let φ : [a − λ, a + λ] → R be a contraction with contraction constant q. If |φ(a) − a| ≤ (1 − q)λ, then φ([a − λ, a + λ]) ⊆ [a − λ, a + λ].
Demonstration. Let x ∈ [a − λ, a + λ]. We have
|φ(x) − a| = |φ(x) − φ(a) + φ(a) − a| ≤ |φ(x) − φ(a)| + |φ(a) − a|. (2.173)
On the other hand, φ is a contraction, hence
|φ(x) − φ(a)| ≤ q|x − a|. (2.174)
If we take into account the hypothesis and relation (2.174), then relation (2.173) leads to
|φ(x) − a| ≤ q|x − a| + (1 − q)λ. (2.175)
Because x ∈ [a − λ, a + λ], it follows that
|x − a| ≤ λ (2.176)
so that relation (2.175) allows
|φ(x) − a| ≤ qλ + (1 − q)λ = λ, (2.177)
that is,
φ(x) ∈ [a − λ, a + λ], for any x ∈ [a − λ, a + λ], (2.178)
and the proposition is proved.
Proposition 2.6 Let φ : [a, b] → R. If φ satisfies the conditions
(a) φ is differentiable on [a, b];
(b) the equation x = φ(x) has a root x ∈ (α, β), with
\[ \alpha = a + \frac{b - a}{3}, \quad \beta = b - \frac{b - a}{3}; \tag{2.179} \]
(c) there exists q ∈ (0, 1) such that
\[ |\varphi'(x)| \le q < 1, \quad \text{for any } x \in [a, b]; \tag{2.180} \]
(d) x0 ∈ (α, β);
then
(i) the sequence {xn}n∈N has all the terms in the interval (a, b);
(ii) the sequence {xn}n∈N is convergent and lim_{n→∞} xn = x;
(iii) x is the unique solution of the equation x = φ(x) in (a, b).
Demonstration. The points (ii) and (iii) are obvious consequences of Corollary 2.2.
To demonstrate point (i), let x1 = φ(x0). Applying the finite increments formula to the function φ between the points x0 and x, it follows that there exists ξ between x0 and x such that
\[ |x_1 - x| = |\varphi(x_0) - \varphi(x)| = |\varphi'(\xi)| \, |x_0 - x|. \tag{2.181} \]
On the other hand,
\[ |\varphi'(\xi)| \le \sup_{\xi \in [a,b]} |\varphi'(\xi)| \le q < 1 \tag{2.182} \]
and relation (2.181) yields
\[ |x_1 - x| \le q \, |x_0 - x| \le q(\beta - \alpha) < \frac{b - a}{3}; \tag{2.183} \]
hence, x1 ∈ (a, b). Let us suppose that xn−1 ∈ (a, b) and |xn−1 − x| < (b − a)/3; we wish to show that |xn − x| < (b − a)/3. We have
\[ |x_n - x| = |\varphi(x_{n-1}) - \varphi(x)|. \tag{2.184} \]
We now apply Lagrange's finite increments formula between the points xn−1 and x, so that
\[ |\varphi(x_{n-1}) - \varphi(x)| = |x_{n-1} - x| \, |\varphi'(\zeta)| \le |x_{n-1} - x| \sup_{\zeta \in [a,b]} |\varphi'(\zeta)| \le q \, \frac{b - a}{3} < \frac{b - a}{3}; \tag{2.185} \]
hence, xn ∈ (a, b); this is valid for any n ∈ N, by the mathematical induction principle.
Proposition 2.7 (a priori estimation of the error in the contraction method). Let x = φ(x), with φ : [a, b] → [a, b] a contraction, and let x be its unique root in [a, b]. Let {xn}n∈N be the sequence of successive approximations defined by the recurrence relation (2.159). Under these conditions, the relation
\[ |x_n - x| \le q^n (b - a) \tag{2.186} \]
holds, where q is the contraction constant of φ, 0 < q < 1.
Demonstration. Formula (2.186) is an obvious consequence of the successive relations
\[ |x_n - x| = |\varphi(x_{n-1}) - \varphi(x)| \le q \, |x_{n-1} - x| = q \, |\varphi(x_{n-2}) - \varphi(x)| \le q^2 \, |x_{n-2} - x| \le \cdots \le q^n \, |x_0 - x|, \tag{2.187} \]
where
\[ |x_0 - x| \le b - a. \tag{2.188} \]
Observation 2.12 To determine the solution x of equation (2.157) with precision ε, we must determine the necessary number nε of iterations from
\[ q^n (b - a) < \varepsilon, \tag{2.189} \]
from which
\[ n_\varepsilon = \left[ \frac{\ln(\varepsilon/(b - a))}{\ln q} \right] + 1, \tag{2.190} \]
where the square brackets represent the entire part function.
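For instance (a Python sketch; the function name is ours):

    import math

    def contraction_steps_a_priori(q, a, b, eps):
        # number of iterations n_eps given by (2.190)
        return int(math.log(eps / (b - a)) / math.log(q)) + 1

With q = 0.5, b − a = 1, and ε = 10⁻⁶, this gives nε = 20.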
Proposition 2.8 (a posteriori estimation of the error in the contraction method). Let x = φ(x), with φ : [a, b] → [a, b] a contraction with contraction constant q, 0 < q < 1, and let x be the unique root of this equation in [a, b]. Let us also consider the sequence {xn}n∈N of successive approximations defined by the recurrence relation (2.159). Under these conditions, the relation
\[ |x_n - x| \le \frac{1}{1 - q} |x_{n+1} - x_n| \tag{2.191} \]
holds for any n ∈ N.
Demonstration. Formula (2.162) leads to the relation
\[ |x_{n+p} - x_n| \le |x_{n+p} - x_{n+p-1}| + |x_{n+p-1} - x_{n+p-2}| + \cdots + |x_{n+1} - x_n| \le (q^{p-1} + q^{p-2} + \cdots + 1) \, |x_{n+1} - x_n| = \frac{1 - q^p}{1 - q} |x_{n+1} - x_n|. \tag{2.192} \]
We pass to the limit for p → ∞ in relation (2.192), hence
\[ \lim_{p\to\infty} |x_{n+p} - x_n| \le \lim_{p\to\infty} \frac{1 - q^p}{1 - q} |x_{n+1} - x_n| \tag{2.193} \]
and, because \(\lim_{p\to\infty} x_{n+p} = x\) and \(\lim_{p\to\infty} q^p = 0\), we obtain formula (2.191), which had to be proved.
Observation 2.13 To determine the solution of equation (2.157) with precision ε, we must calculate the terms of the sequence {xn}n∈N until
\[ \frac{1}{1 - q} |x_{n+1} - x_n| < \varepsilon, \tag{2.194} \]
that is,
\[ |x_{n+1} - x_n| < \varepsilon (1 - q). \tag{2.195} \]
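The whole method then reads, as a Python sketch (under the hypotheses of Corollary 2.1; the names are ours):

    def contraction_method(phi, x0, q, eps, max_iter=10000):
        # successive approximations (2.159) with the a posteriori stop (2.195)
        x = x0
        for _ in range(max_iter):
            x_new = phi(x)
            if abs(x_new - x) < eps * (1.0 - q):
                return x_new
            x = x_new
        raise RuntimeError("no convergence")

By Proposition 2.8, the returned iterate approximates the fixed point within ε.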
2.5 THE NEWTON–KANTOROVICH METHOD
We now deal with a variant⁴ of Newton's method, in which the sequence of successive iterations is defined by a contraction.
Theorem 2.10 Let f : [x∗ − λ, x∗ + λ] → R, with f′(x∗) ≠ 0, be a twice differentiable function. Let us denote
\[ a = |f'(x^*)|, \tag{2.196} \]
\[ c = |f(x^*)|. \tag{2.197} \]
We also suppose that there exists b > 0 such that
\[ |f''(x)| \le b, \quad \text{for any } x \in [x^* - \lambda, x^* + \lambda], \tag{2.198} \]
4The theorem was stated by Leonid Vitaliyevich Kantorovich (1912–1986) in 1940.
and let us denote
\[ \mu = \frac{bc}{2a^2}. \tag{2.199} \]
If µ < 1/4, then, under these conditions, the mapping
\[ g(x) = x - \frac{f(x)}{f'(x^*)} \tag{2.200} \]
is a contraction from [x∗ − ky∗, x∗ + ky∗] to [x∗ − ky∗, x∗ + ky∗], where
\[ k = \frac{c}{a} \tag{2.201} \]
and y∗ is the smallest solution of the equation
\[ \mu y^2 - y + 1 = 0, \tag{2.202} \]
that is,
\[ y^* = \frac{1 - \sqrt{1 - 4\mu}}{2\mu}. \tag{2.203} \]
Demonstration. Firstly, we show that g([x∗ − ky∗, x∗ + ky∗]) ⊆ [x∗ − ky∗, x∗ + ky∗]. Let us calculate |g(x) − x∗|. We have
\[ |g(x) - x^*| = \left| x - \frac{f(x)}{f'(x^*)} - x^* \right| = \left| x - x^* - \frac{f(x)}{f'(x^*)} + \frac{f(x^*)}{f'(x^*)} - \frac{f(x^*)}{f'(x^*)} \right| = \left| \frac{f'(x^*)(x - x^*) - f(x) + f(x^*)}{f'(x^*)} - \frac{f(x^*)}{f'(x^*)} \right| \le \frac{1}{|f'(x^*)|} \, |f'(x^*)(x - x^*) - f(x) + f(x^*)| + \left| \frac{f(x^*)}{f'(x^*)} \right|. \tag{2.204} \]
If we take into account relations (2.196), (2.197), and (2.201), then relation (2.204) leads to
\[ |g(x) - x^*| \le \frac{1}{a} \, |f'(x^*)(x - x^*) - f(x) + f(x^*)| + k. \tag{2.205} \]
Taylor’s formula written for the points x and x∗
leads to
f (x) = f (x∗
) + f (x∗
)(x − x∗
) +
1
2
f (ξ)(x − x∗
)2
, (2.206)
where ξ is a point situated between x and x∗. Obviously, it follows that
|f (x∗
)(x − x∗
) − f (x) + f (x∗
)| ≤
1
2
|f (ξ)|(x − x∗
)2
(2.207)
and, taking into account condition (2.198), we have
|f (x∗
)(x − x∗
) − f (x) + f (x∗
)| ≤
1
2
b(x − x∗
)2
. (2.208)
We obtain
\[ |g(x) - x^*| \le \frac{b}{2a} |x - x^*|^2 + k \tag{2.209} \]
from relations (2.205) and (2.208). On the other hand, x ∈ [x∗ − ky∗, x∗ + ky∗], hence
\[ |x - x^*| \le k y^* \tag{2.210} \]
and relation (2.209) leads to
\[ |g(x) - x^*| \le \frac{b}{2a} k^2 (y^*)^2 + k = k \left[ \frac{bc}{2a^2} (y^*)^2 + 1 \right]. \tag{2.211} \]
From relations (2.199) and (2.202), we get
\[ \frac{bc}{2a^2} (y^*)^2 + 1 = \mu (y^*)^2 + 1 = y^*, \tag{2.212} \]
hence
\[ |g(x) - x^*| \le k y^*. \tag{2.213} \]
Concluding, g([x∗ − ky∗, x∗ + ky∗]) ⊆ [x∗ − ky∗, x∗ + ky∗].
Let us now show that g is a contraction. We have
\[ |g'(x)| = \left| 1 - \frac{f'(x)}{f'(x^*)} \right| = \frac{1}{|f'(x^*)|} \, |f'(x^*) - f'(x)|. \tag{2.214} \]
Applying the finite increments formula to the function f′ for the points x and x∗, it follows that there exists η between x and x∗ such that
\[ f'(x^*) - f'(x) = f''(\eta)(x^* - x) \tag{2.215} \]
and, applying the modulus to the last relation, we get
\[ |f'(x^*) - f'(x)| = |f''(\eta)| \, |x^* - x|. \tag{2.216} \]
Taking into account equation (2.198), relation (2.216) leads to
\[ |f'(x^*) - f'(x)| \le b \, |x^* - x|. \tag{2.217} \]
Relations (2.214) and (2.217) imply that
\[ |g'(x)| \le \frac{1}{|f'(x^*)|} \, b \, |x^* - x| \tag{2.218} \]
and, taking into account equation (2.196), we obtain
\[ |g'(x)| \le \frac{b}{a} |x^* - x|. \tag{2.219} \]
Applying now formulae (2.210), (2.201), and (2.199), we get
\[ |g'(x)| \le \frac{b}{a} \, k y^* = 2\mu y^* = 1 - \sqrt{1 - 4\mu}. \tag{2.220} \]
Because 0 < µ < 1/4, we get |g′(x)| < 1, and we can choose as contraction constant
\[ q = 1 - \sqrt{1 - 4\mu} < 1, \tag{2.221} \]
proving that g is a contraction.
Observation 2.14 We must obviously have
\[ [x^* - k y^*, x^* + k y^*] \subset [x^* - \lambda, x^* + \lambda]. \tag{2.222} \]
To fulfill condition (2.222), it is sufficient that
\[ k y^* \le \lambda, \tag{2.223} \]
from which
\[ k \le \frac{\lambda}{y^*} = \frac{2\lambda\mu}{1 - \sqrt{1 - 4\mu}}. \tag{2.224} \]
Observation 2.15 The solution x of the equation
\[ x = g(x), \tag{2.225} \]
which is the same as that of the equation
\[ f(x) = 0, \tag{2.226} \]
is obtained by constructing the sequence
\[ x_0 \in [x^* - k y^*, x^* + k y^*] \text{ arbitrary}, \quad x_{n+1} = g(x_n), \quad n \ge 0. \tag{2.227} \]
Observation 2.16 The formulae that define the a priori estimation of the error,
\[ |x - x_n| \le \frac{q^n}{1 - q} |x_1 - x_0|, \tag{2.228} \]
and the a posteriori estimation of the error,
\[ |x - x_n| \le \frac{1}{1 - q} |x_{n+1} - x_n|, \tag{2.229} \]
respectively, are obviously those of the contraction method, with q given by equation (2.221).
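Gathering the above, a Python sketch of the method (the bound b for |f″| on the interval must be supplied by the user; the names and the choice of the a posteriori stop are ours):

    import math

    def newton_kantorovich(f, fprime, b, x_star, eps, max_iter=1000):
        a = abs(fprime(x_star))            # (2.196)
        c = abs(f(x_star))                 # (2.197)
        mu = b * c / (2.0 * a * a)         # (2.199)
        if mu >= 0.25:
            raise ValueError("mu >= 1/4: Theorem 2.10 does not apply")
        q = 1.0 - math.sqrt(1.0 - 4.0 * mu)    # contraction constant (2.221)
        d = fprime(x_star)                 # kept fixed, as in (2.200)
        x = x_star
        for _ in range(max_iter):
            x_new = x - f(x) / d           # x_{n+1} = g(x_n), see (2.227)
            if abs(x_new - x) < eps * (1.0 - q):   # stop via (2.229)
                return x_new
            x = x_new
        raise RuntimeError("no convergence")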
2.6 NUMERICAL EXAMPLES
Consider the equation
\[ f(x) = x - \frac{1 - \sin x}{2} = 0, \quad x \in [0, 1]. \tag{2.230} \]
We observe that f(0) = −0.5 and f(1) = 0.9207; we also have
\[ f'(x) = 1 + 0.5 \cos x, \tag{2.231} \]
hence f′(x) > 0 for x ∈ [0, 1]. We conclude that the equation f(x) = 0 has only one root in the interval [0, 1].
Let us apply the bipartition method to solve equation (2.230). The calculation is given in
Table 2.1.
We may state that
x ∈ [0.333984375, 0.3359375]. (2.232)
We now apply the method of the chord to solve equation (2.230); the calculation may be found
in Table 2.2.
It follows that
x ≈ 0.335418. (2.233)
The recurrence formula in the tangent method reads
\[ x_{n+1} = x_n - \frac{x_n - 0.5(1 - \sin x_n)}{1 + 0.5 \cos x_n}. \tag{2.234} \]
Because
\[ f'(x) = 1 + 0.5 \cos x, \quad f''(x) = -0.5 \sin x \tag{2.235} \]
TABLE 2.1 Solution of Equation (2.230) by the Bipartition Method
Step an bn cn f (an) f (bn) f (cn)
0 0 1 0.5 −0.5 0.9207 0.2397 > 0
1 0 0.5 0.25 −0.5 0.2397 −0.1263 < 0
2 0.25 0.5 0.375 −0.1263 0.2397 0.0581 > 0
3 0.25 0.375 0.3125 −0.1263 0.0581 −0.0338 < 0
4 0.3125 0.375 0.34375 −0.0338 0.0581 0.0123 > 0
5 0.3125 0.34375 0.328125 −0.0338 0.0123 −0.0107 < 0
6 0.328125 0.34375 0.3359375 −0.0107 0.0123 0.0008 > 0
7 0.328125 0.3359375 0.33203125 −0.0107 0.0008 −0.005 < 0
8 0.33203125 0.3359375 0.333984375 −0.005 0.0008 −0.0021 < 0
TABLE 2.2 Solution of Equation (2.230) by the Chord Method
Step an bn cn f (an) f (bn) f (cn)
0 0 1 0.351931 −0.5 < 0 0.920735 0.024287 > 0
1 0 0.351931 0.335628 −0.5 < 0 0.024287 0.000309 > 0
2 0 0.335628 0.335421 −0.5 < 0 0.000309 4 × 10⁻⁶ > 0
3 0 0.335421 0.335418 −0.5 < 0 4 × 10⁻⁶ > 0 ≈ 0
and f′(x) > 0, f″(x) ≤ 0 for x ∈ [0, 1], we deduce that the function f is strictly increasing and concave on the interval [0, 1]. We may thus choose
\[ x_0 = a = 0. \tag{2.236} \]
The calculations are given in Table 2.3.
We obtain
x ≈ 0.335418. (2.237)
Let us solve the same problem by means of the modified Newton method, for which
\[ x_{n+1} = x_n - \frac{x_n - 0.5(1 - \sin x_n)}{1.5}. \tag{2.238} \]
The calculations are given in Table 2.4.
We get
x ≈ 0.335418. (2.239)
To solve equation (2.230) by the contractions method, we write it in the form
x = 0.5(1 − sin x) = φ(x). (2.240)
Taking into account that the derivative satisfies
\[ \varphi'(x) = -0.5 \cos x, \quad |\varphi'(x)| \le 0.5 < 1, \tag{2.241} \]
it follows that φ(x) is a contraction, so that the recurrence formula is of the form
\[ x_{n+1} = \varphi(x_n) = 0.5(1 - \sin x_n); \tag{2.242} \]
the calculation is given in Table 2.5.
TABLE 2.3 Solution of Equation (2.230) by the Tangent Method
Step xn f (xn) f′(xn)
0 0 −0.5 1.5
1 0.333333 −0.003070 1.472479
2 0.335418 −4.7675 × 10⁻⁸ 1.472136
3 0.335418 – –
TABLE 2.4 Solution of Equation (2.230) by Means of the Modified Newton Method
Step xn f (xn)
0 0 −0.5
1 0.333333 −0.003070
2 0.335380 −0.000056
3 0.335417 −2 × 10−6
4 0.335418 −4.7675 × 10−8
5 0.335418 –
TABLE 2.5 Solution of Equation (2.230) by the Contraction Method
Step xn φ(xn)
0 0.5 0.260287
1 0.260287 0.371321
2 0.371321 0.318577
3 0.318577 0.343392
4 0.343392 0.331659
5 0.331659 0.337194
6 0.337194 0.334580
7 0.334580 0.335814
8 0.335814 0.335231
9 0.335231 0.335506
10 0.335506 0.335377
11 0.335377 0.335437
12 0.335437 0.335409
13 0.335409 0.335422
14 0.335422 0.335416
15 0.335416 0.335419
16 0.335419 0.335418
17 0.335418 0.335418
We obtain
x ≈ 0.335418. (2.243)
To apply the Newton–Kantorovich method, let us consider
\[ x^* = 0.5. \tag{2.244} \]
We have
\[ c = |f(x^*)| = |0.5 - 0.5(1 - \sin 0.5)| = 0.239713, \tag{2.245} \]
\[ a = |f'(x^*)| = |1 + 0.5 \cos 0.5| = 1.438791, \tag{2.246} \]
\[ |f''(x)| = |-0.5 \sin x| = 0.5 \sin x \le 0.5 \sin 1 = 0.420735; \tag{2.247} \]
we may thus take
\[ |f''(x)| \le b = 0.43, \tag{2.248} \]
\[ \mu = \frac{bc}{2a^2} = 0.024896 < \frac{1}{4}, \tag{2.249} \]
\[ \lambda = 0.5. \tag{2.250} \]
Hence, we can apply the Newton–Kantorovich method, with
\[ k = \frac{c}{a} = 0.166607, \tag{2.251} \]
\[ y^* = \frac{1 - \sqrt{1 - 4\mu}}{2\mu} = 1.026219, \tag{2.252} \]
\[ k y^* = 0.170975, \tag{2.253} \]
and the function g : [0.329025, 0.670975] → [0.329025, 0.670975],
\[ g(x) = x - \frac{f(x)}{f'(x^*)}. \tag{2.254} \]
TABLE 2.6 Solution of Equation (2.230) by the Newton–Kantorovich Method
Step xn f (xn)
0 0.5 0.239713
1 0.333393 −0.002981
2 0.335465 0.000069
3 0.335417 −0.000002
4 0.335418 −4.7675 × 10−8
5 0.335418 –
The calculation is given in Table 2.6.
We deduce that
x ≈ 0.335418. (2.255)
The following conclusions result:
(i) the most unfavorable method is that of bisection, for which a relatively large number of
steps are necessary to determine the solution with a good approximation;
(ii) the number of steps in the contractions method depends on the value of the contraction
constant; if this constant is close to 1, then the number of iteration steps increases;
(iii) Newton’s method is quicker than the modified Newton method;
(iv) the Newton–Kantorovich method has both the advantages and the disadvantages of New-
ton’s and contractions methods;
(v) the chord method is quicker than the bisection one, but less quick than Newton’s method.
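These computations are easy to reproduce; here is a short Python sketch for the tangent and contraction iterations of this example (compare Tables 2.3 and 2.5):

    import math

    f = lambda x: x - 0.5 * (1.0 - math.sin(x))
    fp = lambda x: 1.0 + 0.5 * math.cos(x)
    phi = lambda x: 0.5 * (1.0 - math.sin(x))

    x = 0.0                                # tangent method, x0 = 0
    for _ in range(3):
        x = x - f(x) / fp(x)
    print(x)                               # 0.335418...

    y = 0.5                                # contraction method, x0 = 0.5
    for _ in range(18):
        y = phi(y)
    print(y)                               # 0.335418...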
2.7 APPLICATIONS
Problem 2.1
Let us consider a material point of mass m, which moves on the Ox-axis (Fig. 2.7) under the action of a force
\[ F = -\frac{F_0}{b} \, x \, e^{x/b}. \tag{2.256} \]
Determine the displacement x_max, knowing the initial conditions: at t = 0, x = x0, ẋ = v0.
Numerical application: x0 = 0, v0 = 40 m s⁻¹, F0 = 50 N, b = 2 m, m = 1 kg.
Figure 2.7 Problem 2.1.
Solution:
1. Theory
The theorem of variation of the kinetic energy is
\[ \frac{m v^2}{2} - \frac{m v_0^2}{2} = W, \tag{2.257} \]
where v is the velocity of the material point, while
\[ W = \int_{x_0}^{x} F(x)\,\mathrm{d}x \tag{2.258} \]
is the work of the force F; imposing the condition v = 0, we obtain x_max as the solution of the equation
\[ -\frac{m v_0^2}{2} = \int_{x_0}^{x} F(x)\,\mathrm{d}x. \tag{2.259} \]
In the considered case, by using the notations
\[ \xi = \frac{x}{b}, \quad k = \frac{m v_0^2}{2 b F_0} + \frac{x_0}{b}\, e^{x_0/b} - e^{x_0/b}, \tag{2.260} \]
we obtain the equation
\[ \xi e^{\xi} - e^{\xi} - k = 0. \tag{2.261} \]
2. Numerical Calculation
In the case of the numerical application, equation (2.261) takes the form
\[ \xi e^{\xi} - e^{\xi} - 7 = 0, \tag{2.262} \]
the solution of which is
\[ \xi \approx 1.973139, \tag{2.263} \]
that is,
\[ x_{\max} = b\,\xi \approx 3.946278 \text{ m}. \tag{2.264} \]
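The root (2.263) may be checked, for example, with Newton's method applied to h(ξ) = (ξ − 1)e^ξ − 7 (a Python sketch; the starting point ξ = 2 is our choice):

    import math

    h = lambda xi: (xi - 1.0) * math.exp(xi) - 7.0
    hp = lambda xi: xi * math.exp(xi)      # h'(xi) = xi * e**xi
    xi = 2.0
    for _ in range(10):
        step = h(xi) / hp(xi)
        xi -= step
        if abs(step) < 1e-9:
            break
    print(xi, 2.0 * xi)                    # xi = 1.973139..., x_max = b*xi = 3.946278... m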
Problem 2.2
Two particles move on the Ox-axis according to the laws (Fig. 2.8)
\[ x_1 = A_1 \cos(\omega_1 t), \tag{2.265} \]
\[ x_2 = d + A_2 \cos(\omega_2 t + \varphi), \tag{2.266} \]
where ω1, ω2, φ, A1, and A2 are positive constants, while t is the time.
Figure 2.8 Problem 2.2.
Let us determine the first positive value of the time at which the two particles meet.
Numerical application: ω1 = 2 s⁻¹, ω2 = π s⁻¹, φ = π/6 rad, d = 1 m, A1 = 0.6 m, A2 = 0.8 m.
Solution: The meeting condition reads
\[ x_1 = x_2, \tag{2.267} \]
from which
\[ A_1 \cos(\omega_1 t) = d + A_2 \cos(\omega_2 t + \varphi) \tag{2.268} \]
or
\[ \cos(\omega_1 t) = \frac{d}{A_1} + \frac{A_2}{A_1} \cos(\omega_2 t + \varphi). \tag{2.269} \]
Because −1 ≤ cos(ω1 t) ≤ 1, we obtain a condition that the parameters of the problem must verify:
\[ -1 \le \frac{d}{A_1} + \frac{A_2}{A_1} \cos(\omega_2 t + \varphi) \le 1. \tag{2.270} \]
In the numerical case considered, it follows that
\[ \cos 2t = \frac{1}{0.6} + \frac{0.8}{0.6} \cos\!\left( \pi t + \frac{\pi}{6} \right). \tag{2.271} \]
Let us represent graphically the functions f : R₊ → R, g : R₊ → R (Fig. 2.9),
\[ f(t) = 0.6 \cos 2t, \quad g(t) = 1 + 0.8 \cos\!\left( \pi t + \frac{\pi}{6} \right). \tag{2.272} \]
Figure 2.9 The functions f(t) (continuous line) and g(t) (dotted line).
From the figure, we obtain the first meeting time t1 between 2.5 and 3 s. Solving the equation
\[ 0.6 \cos 2t - 1 - 0.8 \cos\!\left( \pi t + \frac{\pi}{6} \right) = 0, \tag{2.273} \]
we obtain the required solution
\[ t_1 \approx 2.6485 \text{ s}. \tag{2.274} \]
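The value (2.274) follows, for instance, by bisection on [2.5, 3], where the figure guarantees a sign change (a Python sketch):

    import math

    F = lambda t: 0.6 * math.cos(2.0 * t) - 1.0 \
        - 0.8 * math.cos(math.pi * t + math.pi / 6.0)
    a, b = 2.5, 3.0
    for _ in range(40):
        c = 0.5 * (a + b)
        if F(a) * F(c) <= 0.0:
            b = c
        else:
            a = c
    print(0.5 * (a + b))                   # t1 = 2.6485... s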
3
SOLUTION OF ALGEBRAIC EQUATIONS
In this chapter, we deal with the determination of limits of the roots of polynomials, including their
separation. Three methods are considered, namely, Lagrange’s method, the Lobachevski–Graeffe
method, and Bernoulli’s method.
3.1 DETERMINATION OF LIMITS OF THE ROOTS OF POLYNOMIALS
Let
\[ f(X) = a_0 X^n + a_1 X^{n-1} + \cdots + a_n \tag{3.1} \]
be a polynomial in R[X], where n ∈ N∗, a_i ∈ R, i = 0, 1, . . . , n. Let us consider the algebraic equation
\[ f(x) = a_0 x^n + a_1 x^{n-1} + \cdots + a_n = 0. \tag{3.2} \]
Theorem 3.1 All the roots of the algebraic equation (3.2) lie in the circular annulus of the complex plane defined by the inequalities
\[ \frac{|a_n|}{a'' + |a_n|} \le |x| \le 1 + \frac{a'}{|a_0|}, \tag{3.3} \]
where a′ and a″ are specified by
\[ a' = \max\{|a_1|, \ldots, |a_n|\}, \quad a'' = \max\{|a_0|, \ldots, |a_{n-1}|\}. \tag{3.4} \]
Demonstration. We first show that
\[ |x| \le 1 + \frac{a'}{|a_0|}. \tag{3.5} \]
We may write
\[ |a_1 x^{n-1} + a_2 x^{n-2} + \cdots + a_n| \le |a_1| |x|^{n-1} + |a_2| |x|^{n-2} + \cdots + |a_n| \le a' (|x|^{n-1} + |x|^{n-2} + \cdots + 1) = a' \, \frac{|x|^n - 1}{|x| - 1}. \tag{3.6} \]
If |x| > 1, then
\[ a' \, \frac{|x|^n - 1}{|x| - 1} < a' \, \frac{|x|^n}{|x| - 1} \tag{3.7} \]
and relation (3.6) leads to
\[ |a_1 x^{n-1} + a_2 x^{n-2} + \cdots + a_n| < a' \, \frac{|x|^n}{|x| - 1}. \tag{3.8} \]
Let x be a root of equation (3.2). Thus,
\[ |f(x)| = |a_0 x^n + a_1 x^{n-1} + \cdots + a_n| = 0 \tag{3.9} \]
and
\[ |f(x)| \ge |a_0 x^n| - |a_1 x^{n-1} + \cdots + a_n|, \tag{3.10} \]
from which
\[ |a_0 x^n| \le |a_1 x^{n-1} + \cdots + a_n|. \tag{3.11} \]
If |x| ≤ 1, inequality (3.5) is trivially satisfied; if |x| > 1, then, taking into account relations (3.6) and (3.7), the latter formula leads to the relation
\[ |a_0| |x|^n < a' \, \frac{|x|^n}{|x| - 1}, \tag{3.12} \]
from which |x| − 1 < a′/|a0|, and hence inequality (3.5) is proved.
To arrive at the other inequality, we perform the transformation x → 1/y, hence
\[ \tilde{f}(y) = a_n y^n + a_{n-1} y^{n-1} + \cdots + a_0. \tag{3.13} \]
Let y be a root of this polynomial. Taking into account equation (3.5), we get
\[ |y| \le 1 + \frac{a''}{|a_n|}, \tag{3.14} \]
that is,
\[ \frac{1}{|x|} \le 1 + \frac{a''}{|a_n|}, \tag{3.15} \]
hence
\[ |x| \ge \frac{|a_n|}{|a_n| + a''}. \tag{3.16} \]
Observation 3.1 Let
\[ L = 1 + \frac{a'}{|a_0|}, \quad l = \frac{|a_n|}{a'' + |a_n|}. \tag{3.17} \]
We have l < 1, L > 1, and L > l.
Figure 3.1 Domain in which the roots of equation (3.2) lie.
Observation 3.2 The roots of equation (3.2) are in the hatched domain of the complex plane
(Fig. 3.1).
Observation 3.3 If equation (3.2) has positive real roots, then formula (3.3) can be written for these roots in the form
\[ \frac{|a_n|}{a'' + |a_n|} \le x \le 1 + \frac{a'}{|a_0|}. \tag{3.18} \]
Observation 3.4 We can always assume that an ≠ 0. In the opposite case, dividing by the appropriate power of x, we obtain an equation of the form
\[ a_0 x^p + a_1 x^{p-1} + \cdots + a_p = 0, \tag{3.19} \]
where ap ≠ 0.
Definition 3.1 The real number L > 0 is called a superior bound of the positive roots of equation
(3.2) if for any such root x, we have x < L.
Definition 3.2 The real number l > 0 is called an inferior bound of the positive roots of equation
(3.2) if for any such root x, we have x > l.
Observation 3.5
(i) The value −l < 0 will be a superior bound of the negative roots of equation (3.2) if l > 0
is an inferior bound of the positive roots of the same equation.
(ii) The value −L < 0 will be an inferior bound of the negative roots of equation (3.2) if L > 0
is a superior bound of the positive roots of the same equation.
(iii) The real roots of equation (3.2) are in the set (−L, −l) ∪ (l, L).
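The bounds l and L of (3.17) are immediate to compute; here is a Python sketch (the function name is ours):

    def root_annulus(coeffs):
        # coeffs = [a0, a1, ..., an], a0 != 0, an != 0; returns (l, L) of (3.17)
        a0, an = abs(coeffs[0]), abs(coeffs[-1])
        a_prime = max(abs(c) for c in coeffs[1:])        # a' = max{|a1|,...,|an|}
        a_second = max(abs(c) for c in coeffs[:-1])      # a'' = max{|a0|,...,|a_{n-1}|}
        return an / (a_second + an), 1.0 + a_prime / a0

    # example: x**3 - 6x**2 + 11x - 6 = 0, with roots 1, 2, 3:
    print(root_annulus([1, -6, 11, -6]))   # l = 6/17 = 0.3529..., L = 12.0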
Observation 3.6
(i) Let us consider the equation
\[ f_1(x) = (-1)^n f(-x) = 0, \tag{3.20} \]
for which L1 denotes a superior bound of its positive roots. If α < 0 is a negative root of equation (3.2), then −α > 0 will be a root of equation (3.20), hence −α < L1, from which α > −L1.
(ii) Let us consider the equation
\[ f_2(x) = x^n f\!\left( \frac{1}{x} \right) = 0 \tag{3.21} \]
and let L2 denote a superior bound of its positive roots. If α > 0 is a positive root of equation (3.2), then 1/α > 0 is a solution of equation (3.21), for which 1/α < L2, hence α > 1/L2.
(iii) Let us now consider the equation
\[ f_3(x) = (-1)^n x^n f\!\left( -\frac{1}{x} \right) = 0. \tag{3.22} \]
Let L3 be a superior bound of its positive roots. If α < 0 is a negative root of equation (3.2), then −1/α > 0 is a positive root of equation (3.22), for which the relation −1/α < L3 is true. Hence, it follows that α < −1/L3.
(iv) From the above considerations, it follows that the real roots of equation (3.2) belong to the set (−L1, −1/L3) ∪ (1/L2, L).
Theorem 3.2 Let A be the greatest absolute value of the negative coefficients of the algebraic equation (3.2), for which a0 > 0 (multiplying by −1 if necessary). In these conditions, a superior limit of the positive roots of this equation is given by
\[ L = 1 + \left( \frac{A}{a_0} \right)^{1/k}, \tag{3.23} \]
where k is the index of the first negative coefficient in the expression of the polynomial function (3.1).
Demonstration. Let us specify the terms that appear in equation (3.23). Thus, A is given by
\[ A = \max_{1 \le i \le n} \{ |a_i| : a_i < 0 \}, \tag{3.24} \]
while k is given by
\[ k = \min\{ i : a_i < 0, \ a_j \ge 0 \ (\forall) j < i \}. \tag{3.25} \]
Let x > 0. Then f(x) can be written in the form
\[ f(x) = a_0 x^n + \cdots + a_{k-1} x^{n-k+1} + (a_k x^{n-k} + \cdots + a_n) \ge a_0 x^n - A(x^{n-k} + \cdots + 1) = a_0 x^n - A \, \frac{x^{n-k+1} - 1}{x - 1}. \tag{3.26} \]
For x > 1, the last formula leads to
\[ f(x) > a_0 x^n - A \, \frac{x^{n-k+1}}{x - 1}. \tag{3.27} \]
Let x be a positive root of equation (3.2), with x > 1. Relation (3.27) leads to
\[ 0 > a_0 x^n - A \, \frac{x^{n-k+1}}{x - 1}, \tag{3.28} \]
from which
\[ a_0 < A \, \frac{x^{-(k-1)}}{x - 1} = \frac{A}{x^{k-1}(x - 1)} < \frac{A}{(x - 1)^k}, \tag{3.29} \]
so that
\[ x < 1 + \left( \frac{A}{a_0} \right)^{1/k}. \tag{3.30} \]
Observation 3.7 If all the coefficients of equation (3.2) are positive, then this equation has no
positive roots.
Observation 3.8 We notice that Theorem 3.2 gives more restricted bounds for the limits of the
real roots of equation (3.2).
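As a Python sketch of Theorem 3.2 (the function name is ours):

    def superior_bound_positive_roots(coeffs):
        # L = 1 + (A/a0)**(1/k) of (3.23), for a0 > 0; returns None when all
        # coefficients are nonnegative (no positive roots, Observation 3.7)
        a0 = coeffs[0]
        negatives = [(i, c) for i, c in enumerate(coeffs) if c < 0]
        if not negatives:
            return None
        k = negatives[0][0]                # index of the first negative coefficient
        A = max(-c for _, c in negatives)  # greatest absolute value among them
        return 1.0 + (A / a0) ** (1.0 / k)

    # example: x**3 - 6x**2 + 11x - 6 gives k = 1, A = 6, hence L = 7,
    # sharper than the bound L = 12 of Theorem 3.1 (see Observation 3.8)
    print(superior_bound_positive_roots([1, -6, 11, -6]))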
Theorem 3.3 (Newton). Let f be the polynomial function given by equation (3.1), with a0 > 0, and let a ∈ R, a > 0, be a number such that f(a) > 0, f′(a) > 0, . . . , f^{(n)}(a) > 0. In these conditions, a is a superior bound of the positive roots of equation (3.2).
Demonstration. The expansion of f into a Taylor series around a is of the form
\[ f(x) = f(a) + (x - a) \frac{f'(a)}{1!} + (x - a)^2 \frac{f''(a)}{2!} + \cdots + (x - a)^n \frac{f^{(n)}(a)}{n!}. \tag{3.31} \]
We observe that if x ≥ a, then f(x) > 0, because f^{(i)}(a) > 0 and x − a ≥ 0. It thus follows that f cannot have roots greater than a; hence a is a superior bound of the roots of the equation f(x) = 0.
Let us show that such an a exists. We have
\[ f^{(n)}(x) = a_0 \, n! > 0, \tag{3.32} \]
because a0 > 0 by hypothesis. It follows that f^{(n−1)}(x) is strictly increasing, and hence there exists a1 ∈ R such that f^{(n−1)}(x) > 0 for x ≥ a1; obviously, we may consider a1 > 0. We now pass to f^{(n−2)}(x), which is strictly increasing for x > a1; it follows that there exists a2, with a2 ≥ a1, such that f^{(n−2)}(x) > 0 for x ≥ a2. The procedure continues until an, with an ≥ an−1 ≥ · · · ≥ a1, so that f^{(i)}(an) > 0 for any i = 0, 1, . . . , n. We now take a = an.
Theorem 3.4 Let f be a polynomial function of the form (3.1) and let us suppose that a0 > 0 and that the polynomial has only one variation of sign in the sequence of its coefficients, that is, there exists i, 1 ≤ i < n, such that aj > 0 for any j, 0 ≤ j ≤ i, and aj < 0 for any j, i < j ≤ n. Let us suppose that there exists a ∈ R, a > 0, such that f(a) ≥ 0. Then f(x) > 0 for any x > a.
Demonstration. Let us write the polynomial f in the form
\[ f(x) = (a_0 x^n + \cdots + a_i x^{n-i}) - (|a_{i+1}| x^{n-i-1} + \cdots + |a_n|). \tag{3.33} \]
It follows that
\[ f(x) = x^{n-i} \left[ (a_0 x^{i} + \cdots + a_i) - \left( \frac{|a_{i+1}|}{x} + \cdots + \frac{|a_n|}{x^{n-i}} \right) \right]. \tag{3.34} \]
If x increases starting from a, the expression in the first parentheses increases, while that in the second parentheses decreases; hence the bracket is increasing and, because f(a) ≥ 0, it follows that f(x) > 0 for x > a. Hence a is a superior bound of the positive roots of the equation f(x) = 0.
Observation 3.9 The previous theorem suggests a method to determine a superior bound of the
positive roots of equation (3.2). To do this, we group the terms of the polynomial so that
(i) the powers of x are decreasing in any group;
(ii) the first coefficient of a group is positive;
(iii) we have only one variation of sign in the interior of each group.
We now determine a superior bound of the positive roots for each group; hence, the superior
bound of the positive roots of equation (3.2) will be the maximum of the superior bounds of the
positive roots of the groups.
Observation 3.10 The method presented above, called the method of terms grouping, is sensitive to the choice of the groups.
3.2 SEPARATION OF ROOTS
Definition 3.3 Let {bi}i=0,m be a finite sequence of real numbers such that bi < bi+1, i = 0, 1, . . . , m − 1. We say that this sequence separates the roots of the algebraic equation f(x) = 0, with
\[ f(x) = a_0 x^n + \cdots + a_n, \tag{3.35} \]
if there is a single root of this equation in each interval (bi, bi+1), i = 0, 1, . . . , m − 1.
Observation 3.11 The sequence {bi}i=0, m can be chosen as consisting of a part of Rolle’s
sequence.
Proposition 3.1 Let f be a polynomial of even degree n, n = 2k, for which a0 a2k < 0. In these conditions, the equation
\[ f(x) = a_0 x^{2k} + a_1 x^{2k-1} + \cdots + a_{2k} = 0 \tag{3.36} \]
has at least one positive root and one negative root.
Demonstration. To fix the ideas, let us suppose that a0 > 0. Because
\[ \lim_{x\to\infty} f(x) = +\infty, \tag{3.37} \]
it follows that there exists m1 > 0 such that f(x) > 0 for any x > m1. Analogously, we have
\[ \lim_{x\to-\infty} f(x) = +\infty, \tag{3.38} \]
hence there exists m2 < 0 such that f(x) > 0 for any x < m2. Let M = max{|m1|, |m2|}. Hence, for any x with |x| > M, we will have f(x) > 0.
On the other hand, f(0) = a2k < 0, because a0 a2k < 0 and a0 > 0. It follows that there exist ξ1 ∈ (−M, 0) and ξ2 ∈ (0, M) such that f(ξ1) = 0 and f(ξ2) = 0. Hence, equation (3.36) has at least one negative root and at least one positive root, and the proposition is proved.
Observation 3.12 Proposition 3.1 specifies only the existence of a positive root and a negative
one, but there can exist several positive and negative roots.
Proposition 3.2 Let f be a polynomial with real coefficients and (a, b) an interval of the real axis. Let us suppose that f has a single root x̄, of multiplicity order k, on this interval. Under these conditions,
(i) f (a)f (b) > 0 if k is an even number;
(ii) f (a)f (b) < 0 if k is an odd number.
Demonstration. We write f in the form
\[ f(x) = (x - \bar{x})^k g(x), \tag{3.39} \]
where g(x) is a polynomial with real coefficients, without roots in the interval (a, b). We have
\[ f(a) = (a - \bar{x})^k g(a), \quad f(b) = (b - \bar{x})^k g(b). \tag{3.40} \]
We mention that g(a) and g(b) have the same sign, because g has no roots in the interval (a, b). On the other hand,
\[ a - \bar{x} < 0, \quad b - \bar{x} > 0, \tag{3.41} \]
because x̄ ∈ (a, b). We may write
\[ f(a) f(b) = (a - \bar{x})^k (b - \bar{x})^k g(a) g(b) = [(a - \bar{x})(b - \bar{x})]^k g(a) g(b). \tag{3.42} \]
The sign of f(a)f(b) is given by [(a − x̄)(b − x̄)]^k. If k is an even number (eventually 0), then [(a − x̄)(b − x̄)]^k > 0, hence f(a)f(b) > 0. Analogously, if k is an odd number, then [(a − x̄)(b − x̄)]^k < 0, so that f(a)f(b) < 0.
Proposition 3.3 Let f be a polynomial of degree n with real coefficients and (a, b) an interval of the real axis. Let us suppose that, in the interval (a, b), f has s roots denoted by x1, x2, . . . , xs, of multiplicity orders k1, k2, . . . , ks. In these conditions,
(i) if f(a)f(b) < 0, then \(\sum_{i=1}^{s} k_i\) is an odd number;
(ii) if f(a)f(b) > 0, then \(\sum_{i=1}^{s} k_i\) is an even number (eventually 0).
Demonstration. Let us suppose that the roots x1, x2, . . . , xs have been increasingly ordered on the interval (a, b), so that x1 < x2 < · · · < xs. Let x̃ be one of these roots, of multiplicity order k, and h > 0 a real number. We may write
\[ f(\tilde{x} + h) = h^k \frac{f^{(k)}(\tilde{x})}{k!} + h^{k+1} \frac{f^{(k+1)}(\xi_1)}{(k+1)!}, \tag{3.43} \]
where ξ1 ∈ (x̃, x̃ + h). Analogously,
\[ f(\tilde{x} - h) = (-1)^k h^k \frac{f^{(k)}(\tilde{x})}{k!} + (-1)^{k+1} h^{k+1} \frac{f^{(k+1)}(\xi_2)}{(k+1)!}, \tag{3.44} \]
with ξ2 ∈ (x̃ − h, x̃). Hence, it follows that
\[ f(\tilde{x} + h) f(\tilde{x} - h) = (-1)^k \left[ h^k \frac{f^{(k)}(\tilde{x})}{k!} \right]^2 + (-1)^k \frac{h^{2k+1}}{k!\,(k+1)!} f^{(k)}(\tilde{x}) f^{(k+1)}(\xi_1) + (-1)^{k+1} \frac{h^{2k+1}}{k!\,(k+1)!} f^{(k)}(\tilde{x}) f^{(k+1)}(\xi_2) + (-1)^{k+1} \frac{h^{2k+2}}{[(k+1)!]^2} f^{(k+1)}(\xi_1) f^{(k+1)}(\xi_2) \tag{3.45} \]
or
\[ f(\tilde{x} + h) f(\tilde{x} - h) = (-1)^k \left[ h^k \frac{f^{(k)}(\tilde{x})}{k!} \right]^2 + h^{2k+1} \varphi(\tilde{x}, \xi_1, \xi_2), \tag{3.46} \]
where the notation for the function φ is obvious. We can immediately show that, for h sufficiently small, the sign of f(x̃ + h)f(x̃ − h) is given by the sign of (−1)^k, that is, +1 for k even and −1 for k odd, respectively. It follows that f has the sign of f(a) on the interval (a, x1), the sign of (−1)^{k1} f(a) on the interval (x1, x2), the sign of (−1)^{k1+k2} f(a) on the interval (x2, x3), . . . , and the sign of (−1)^{k1+···+ks} f(a) on the interval (xs, b). Hence, we can state that if f(a)f(b) < 0, then \(\sum_{i=1}^{s} k_i\) is an odd number, while if f(a)f(b) > 0, then this sum is an even number (eventually 0).
Theorem 3.5 (Edward Waring, 1736–1798). Let f be a polynomial with real coefficients, and let x1 and x2 be two consecutive roots of the polynomial (i.e., no other root of the polynomial exists between x1 and x2). Let x1 be of order of multiplicity k1, and x2 of order of multiplicity k2. Under these conditions, the polynomial g = f′ + λf, λ ∈ R, has, on the interval (x1, x2), a number of real roots the sum of whose multiplicity orders is odd. Moreover, x1 and x2 are roots of the polynomial g, of multiplicity orders k1 − 1 and k2 − 1, respectively.
Demonstration. Let us write the polynomial f in the form
\[ f(x) = (x - x_1)^{k_1} (x - x_2)^{k_2} h(x), \tag{3.47} \]
where h(x) does not change sign in the interval (x1, x2). Hence,
\[ f'(x) = k_1 (x - x_1)^{k_1 - 1} (x - x_2)^{k_2} h(x) + k_2 (x - x_1)^{k_1} (x - x_2)^{k_2 - 1} h(x) + (x - x_1)^{k_1} (x - x_2)^{k_2} h'(x) = (x - x_1)^{k_1 - 1} (x - x_2)^{k_2 - 1} [k_1 (x - x_2) h(x) + k_2 (x - x_1) h(x) + (x - x_1)(x - x_2) h'(x)]. \tag{3.48} \]
Denoting by p(x) the polynomial
\[ p(x) = k_1 (x - x_2) h(x) + k_2 (x - x_1) h(x) + (x - x_1)(x - x_2) h'(x), \tag{3.49} \]
relation (3.48) leads to
\[ f'(x) = (x - x_1)^{k_1 - 1} (x - x_2)^{k_2 - 1} p(x). \tag{3.50} \]
The polynomial g(x) = f′(x) + λf(x) can then be written in the form
\[ g(x) = (x - x_1)^{k_1 - 1} (x - x_2)^{k_2 - 1} p(x) + \lambda (x - x_1)^{k_1} (x - x_2)^{k_2} h(x) = (x - x_1)^{k_1 - 1} (x - x_2)^{k_2 - 1} [p(x) + \lambda (x - x_1)(x - x_2) h(x)]. \tag{3.51} \]
Denoting by q(x) the polynomial
\[ q(x) = p(x) + \lambda (x - x_1)(x - x_2) h(x), \tag{3.52} \]
formula (3.51) leads to
\[ g(x) = (x - x_1)^{k_1 - 1} (x - x_2)^{k_2 - 1} q(x). \tag{3.53} \]
Note that g(x) has the roots x1 and x2, of multiplicity orders k1 − 1 and k2 − 1 (a root of multiplicity order 0 is, in fact, not a root), respectively. The roots of g(x) other than x1 and x2 are the roots of q(x). But
\[ q(x_1) = p(x_1) = k_1 (x_1 - x_2) h(x_1), \quad q(x_2) = p(x_2) = k_2 (x_2 - x_1) h(x_2), \tag{3.54} \]
hence
\[ q(x_1) q(x_2) = -k_1 k_2 (x_1 - x_2)^2 h(x_1) h(x_2). \tag{3.55} \]
On the other hand, h(x1) and h(x2) have the same sign on (x1, x2), k1 > 0, k2 > 0, (x1 − x2)² > 0, and we obtain
\[ q(x_1) q(x_2) < 0. \tag{3.56} \]
Taking into account Proposition 3.3, the theorem is proved.
Corollary 3.1 Let f be a polynomial with real coefficients, the roots of which are x1, . . . , xs, of multiplicity orders k1, . . . , ks, respectively.
(i) If all the roots of f are real, then all the roots of f′ are also real.
(ii) If all the roots of f are simple, then all the roots of f′ are also simple and separate the roots of f.
Demonstration.
(i) We may write
\[ \sum_{i=1}^{s} k_i = n, \tag{3.57} \]
where n is the degree of f. Waring's theorem shows that x_i, i = 1, . . . , s, are roots of the polynomial f′ + λf, λ ∈ R, of multiplicity orders k_i − 1. It follows that the sum of the multiplicity orders of these roots of f′ + λf is given by
\[ \sum_{i=1}^{s} (k_i - 1) = \sum_{i=1}^{s} k_i - s = n - s. \tag{3.58} \]
On the other hand, there exists at least one root between x_i and x_{i+1}. The addition of these s − 1 roots to the sum (3.58) shows that the sum of multiplicity orders of the roots of the polynomial f′ + λf is at least equal to
\[ n - s + (s - 1) = n - 1. \tag{3.59} \]
Let us note that, from Waring’s theorem, from formula (3.59), and because the sum of
the multiplicity orders of the polynomial f + λf is equal to n, it follows that each of the
roots of f + λf , situated between xi and xi+1, are simple roots. Accordingly, it follows
that the last root of f + λf is situated either in the interval (−∞, x1), or in the interval
(xs, ∞) and that this root is simple. This last root cannot be complex, without being real,
ξ ∈ C − R, because the polynomial f + λf being with real coefficients, it would result
that the conjugate ξ of ξ is also a root. The sum of the multiplicity orders of the other roots
of the polynomial f + λf , which are real, in accordance with Waring’s theorem, would
be equal to n − 2, in contradiction to formula (3.59). The s roots of f are x1, . . . , xs,
of multiplicity orders k1 − 1, . . . , ks − 1, respectively, as well as the s − 1 roots situated
64 SOLUTION OF ALGEBRAIC EQUATIONS
between xi and xi+1, i = 1, s − 1. The sum of the multiplicity orders of these roots is
equal to
s
i=1
(ki − 1) + (s − 1) = n − 1 (3.60)
and, because the degree of f is equal to n − 1, it is sufficient to make λ = 0, obtaining
thus all the roots of f , all of which are real.
(ii) It is a particular case of point (i) for
k1 = k2 = · · · = ks = 1. (3.61)
Proposition 3.4 Let a1, . . . , an be a finite sequence of nonzero numbers. If we leave out the
intermediate terms a2, . . . , an−1, the extremes a1 and an remaining unchanged, then the number of
sign variations in the sequence of two elements obtained differs from the number of sign variations
of the initial sequence by an even number (eventually 0).
Demonstration. Let us consider a sequence of three consecutive elements of the initial sequence,
that is, ai, ai+1, ai+2, i ≥ 1, i ≤ n − 2, and let us eliminate the intermediate element ai+1. To fix
the ideas, let us suppose that ai > 0. The following situations are possible:
(a) ai+1 > 0, ai+2 > 0. The number of sign variations is equal to zero in the initial sequence
and in the last one also it is equal to zero; hence, the difference of the two numbers is an
even number;
(b) ai+1 > 0, ai+2 < 0. The number of sign variations is equal to one in the initial sequence,
and in the last one it is equal to one too; the difference of the two numbers is zero, hence
an even number;
(c) ai+1 < 0, ai+2 > 0. In this case, we have two sign variations in the initial sequence, while
in the last sequence we have none; the difference is equal to two, hence an even number;
(d) ai+1 < 0, ai+2 < 0. We have one variation of sign in both sequences; the difference is thus
equal to zero, hence an even number.
The considered property thus holds for this sequence of three elements.
In the general case, by eliminating any intermediate term from a2 to an−1, the number of sign
variations differs by two or remains the same and the proposition is proved.
Corollary 3.2 Let
f (x) = a0xn
+ a1xn−1
+ · · · + an−1x + an (3.62)
be a polynomial of degree n with real coefficients. The number of sign variations of the sequence
of the coefficients of f has the same parity as the sum of the orders of multiplicity of the positive
real roots of the equation f (x) = 0.
Demonstration. Let us suppose that a0 > 0. There are two cases. If an < 0, then f (0) = an < 0 and
lim
x→∞
f (x) = +∞. According to Proposition 3.3, it follows that the sum of the orders of multiplicity
of the positive roots of the equation f (x) = 0 is an odd number, and hence Proposition 3.4 shows
that the number of sign variations in the sequence of the coefficients of f is an odd number. If
an > 0, then f (0) = an > 0 and lim
x→∞
f (x) = +∞. As the number of sign variations in the sequence
of the coefficients of f is even, Proposition 3.3 shows that the sum of the orders of multiplicity of
the positive roots of the equation f (x) = 0 is an even number.
SEPARATION OF ROOTS 65
Corollary 3.3 Let f be a polynomial of degree n with real coefficients, the positive real roots of
which are all simple. In this case, the number of sign variations in the sequence of the coefficients
of f has the same parity as the number of positive roots of the equation f (x) = 0.
Demonstration. In the given conditions, the sum of the multiplicity orders of the positive roots of
the equation f (x) = 0 is equal just to the number of these roots and we apply Corollary 3.2.
Lemma 3.1 Let α be a nonzero positive number and let f (x) be a polynomial of degree n. Let
us consider the polynomial g(x) = (x − α)f (x). In these conditions, the number of sign variations
in the sequence of the coefficients of the polynomial g differs from the number of sign variations
of the coefficients of f by a positive odd number.
Demonstration. Let us consider the polynomial
f (x) = anxn
+ an−1xn−1
+ · · · + a1x + a0, (3.63)
which we write in the form
f (x) = anxn
+ · · · + aixi
− aj xj
− · · · − akxk
+ alxl
+ · · · , (3.64)
where we have marked groups of terms of the same sign. The polynomial g(x) is now written in
the form
g(x) = (x − α)(anxn
+ · · · + aixi
) + (x − α)(−aj xj
− · · · − akxk
) + (x − α)(alxl
+ · · ·) − · · ·
= anxn+1
+ · · · + aixi+1
− αanxn
− · · · − αaixi
− aj xj+1
− · · · − akxk+1
+ αaj xj
+ · · · + αakxk
+ alxl+1
+ · · · − αalxl
− · · · (3.65)
The following situations may occur:
(a) i > j + 1. We have only one sign variation in this case.
(b) i = j + 1. We introduce the terms −αai and −aj in the same group and have once more a
sign variation between the first and the last term in the group.
(c) k > l + 1. We have a sign variation too.
(d) k = l + 1. The coefficient of xl+1 is positive and we have a sign variation in this case.
Let an and ap be the coefficients of the extreme terms of the polynomial f . It follows that
the extreme terms of g are an and αap. If anap > 0, then −αanap < 0, whereas if anap < 0, then
−αanap > 0. It follows that the number of sign variations in the sequence of the coefficients of g
is greater than the number of variations in the sequence of variations of f ; we mention also that
the difference between the two numbers is an odd number.
Theorem 3.6 (Descartes1
). Let us suppose that the equation f (x) = 0 has only simple roots, the
number of positive roots of which is p. In this case, p is either equal to the number of sign variations
in the sequence of coefficients of f or is less than minus from this one by an even number.
Demonstration. Let x1, . . . , xp be the p positive simple roots of the equation f (x) = 0. We may
write
f (x) = (x − x1) · · · (x − xp)g(x), (3.66)
1The theorem was presented by Ren´e Descartes (1596–1650) in La G´eom´etrie (1637) and is also known as the
rule of signs.
66 SOLUTION OF ALGEBRAIC EQUATIONS
where g(x) has no positive roots. Let n1 be the number of sign variations of the coefficients of the
polynomial g(x). According to the Corollary 3.3, n1 is an even number
n1 = 2m, m ∈ N. (3.67)
We now apply the Lemma 3.1 p times, so that every time number of the sign variations in the
sequence of the coefficients of the obtained polynomials will increase by an odd number. It follows
that the number of sign variations in the sequence of the coefficients of the polynomial g is given by
Nv = n1 +
p
i=1
(2ki + 1) = 2 m +
p
i=1
ki + p. (3.68)
We obtain
p = Nv − 2 m +
p
i=1
ki (3.69)
and the theorem is proved.
Observation 3.13 Taking into account the polynomial f1(x) = (−1)n
f (−x), we may apply
Descartes’ theorem for the negative roots of f too.
Definition 3.4 Let f be a polynomial with real coefficients, which does not have multiple roots,
and let [a, b] be an interval of the real axis. A finite sequence f0, f1, . . . , fk of polynomials
associated with the polynomial f on this interval is called a Sturm sequence if
(i) the last polynomial fk(x) has no real roots;
(ii) two consecutive polynomials fi(x) and fi+1(x) have no common roots;
(iii) if x ∈ R and fi(x) = 0, then fi−1(x)fi+1(x) < 0, i = 0, k − 1;
(iv) fi(a) = 0; fi(b) = 0 for any i = 0, k.
Proposition 3.5 For any polynomial f with real coefficients, without multiple roots, and for any
interval [a, b] with f (a) = 0, f (b) = 0, there exists a Sturm sequence associated with f on [a, b].
Demonstration. Let us construct the sequence fi so that
f0 = f, f1 = f , (3.70)
while for i ≥ 2 we have
f0 = f1q1 − f2, f1 = f2q2 − f3, . . . , fi-2 = fi−1qi−1 − fi, . . . (3.71)
Because the degrees of the polynomials decrease, it follows that there exist only a finite number of
such polynomials.
In the following, we verify that the sequence of these polynomials fi, i = 0, k, previously defined
is a Sturm sequence associated with f on [a, b].
(i) Because f = f0 and f = f1 have no common factors (f has no multiple roots), it follows
that the last polynomial fk of the sequence is a constant.
SEPARATION OF ROOTS 67
(ii) If fi and fi−1, 1 ≤ i ≤ k, have a common root, then from relation (3.71) it would follow
that fi−2 has the same root. Finally, we can show that the root is common to f0 = f and
f1 = f , so that the polynomial f would then have multiple roots, which is a contradiction
to the hypothesis. Therefore, fi and fi−1 have no common roots, 1 ≤ i ≤ k.
(iii) Let x ∈ R be so that fi(x) = 0 for a certain index i, 1 ≤ i ≤ k − 1. From
fi−1(x) = fi(x)qi(x) − fi+1(x), (3.72)
we get
fi−1(x) = −fi+1(x), (3.73)
because fi(x) = 0; hence
fi−1(x)fi+1(x) < 0. (3.74)
(iv) From (ii) and (iii) it follows that fi(a) may be equal to zero only for a finite number
of indices i1, i2, . . . , ip between 0 and k, as well as for any two neighboring indices
ik+1 − ik > 1. We can replace the value a with the value a + ε, ε sufficiently small, so that
the properties (i), (ii) and (iii) are not violated, and fi(a) = 0 for any i = 0, k . Analogically,
we may also replace the value b with the value b − µ, µ sufficiently small, to get all the
properties required by the Sturm sequence.
Theorem 3.7 (Sturm2). Let f be a polynomial with real coefficients and without multiple roots.
The number of real roots of the polynomial f in the interval [a, b] is given by Ws(a) − Ws(b),
where Ws(x∗
) is the number of the sign variations in the sequence f0(x∗
), f1(x∗
), . . . , fk(x∗
).
Demonstration. Let fi, 0 ≤ i ≤ k − 1, be an arbitrary term (but not the last) of the Sturm sequence
and let us denote by x1, x2, . . . , xs the roots of this polynomial in the interval [a, b].
We shall show that Ws(x∗
) remains constant for x∗
∈ (xk, xk+1). Let us suppose per absurdum
that Ws(x∗) is not constant. Then there exist two real numbers, y1 and y2, in the interval (xk, xk+1)
so that fi(y1)fi(y2) < 0. It follows that there exists ξ ∈ (y1, y2), y1 < y2, so that fi(ξ) = 0. But
ξ is not a root of fi because ξ ∈ (xk, xk+1) and xk and xk+1 are consecutive roots. Hence, Ws(x∗
)
is constant for x∗
∈ (xk, xk+1).
Let us consider y ∈ [a, b] and fi(y) = 0, 1 ≤ i ≤ k − 1, that is y is not a root for f . We
shall show that Ws(a) = Ws(b). From property (iii) of the Sturm sequence, it follows that
fi−1(y)fi+1(y) < 0, that is, fi−1(y) and fi+1(y) have opposite signs. These signs do not change
if we replace y by a and b, respectively. Hence, it follows that the number of sign variations in
the triples fi−1(a), fi(a), fi+1(a) and fi−1(b), fi(b), fi+1(b), respectively is every time equal to
unity. We conclude that if y is not a root of f , then Ws(a) = Ws(b).
Let y ∈ [a, b], y a root of f . In this case, f (a)f (b) < 0, and hence f (a) and f (b) have
the same sign as f (b). It results Ws(a) − Ws(b) = 1. It follows that each root adds a unity to
Ws(a) − Ws(b). Thus the theorem is proved.
Theorem 3.8 (Budan3
or Budan–Fourier). Let f be a polynomial in the variable x, and a and
b two real numbers, not necessarily finite. Let us denote by δf the sequence f , f , . . . , f (n)
and
by W(δf , p) the number of variations of sign in the sequence f (p), f (p), . . . , f (n)(p). In these
conditions, if R(f, a, b) is the number of real roots of f in the interval [a, b], each root being
2
The idea is a generalization of Euclid’s algorithm in the case of polynomials and was proved in 1829 by Jacques
Charles Franc¸ois Sturm (1803–1855).
3Ferdinand Franc¸ois D´esir´e Budan de Boislaurent (1761–1840) proved this theorem in 1807. The proof was
lost and was replaced by another statement of an equivalent theorem belonging to Jean Baptiste Joseph Fourier
(1768–1830), published in 1836.
68 SOLUTION OF ALGEBRAIC EQUATIONS
counted as many times as its order of multiplicity, then W(δf , a) − W(δf , b) is at least equal to
R(f, a, b), while the difference between them is a positive even number, that is,
W(δf , a) − W(δf , b) = R(f, a, b) + 2k, k ∈ N. (3.75)
Demonstration. First, let us remark that W(δf , x) may change its value only if x passes through a
root x of a polynomial of the sequence δf . We can find an ε > 0 so that, in the interval [x − ε, x + ε],
no function of the sequence δf has roots, other than x. Let us denote by m the order of multiplicity
of x.
If we show that
W(δf , x) = W(δf , x + ε) (3.76)
and
W(δf , x − ε) − [W(δf , x) + m] = 2k, k ∈ N, (3.77)
then the theorem is proved.
Indeed, when x goes through the interval [a, b], R(f, a, b) and W(δf , x) are modified only if x
becomes equal to a root x of f or of one of its derivatives. At such a point, R(f, a, b) increases
with the order of multiplicity of x for f , while W(δf , x) decreases with the sum of m and an
even natural number (Proposition 3.4). It follows therefore that the sum R(f, a, b) + W(δf , x) may
be changed only by the roots x of f or of its derivatives, in which case the value of the sum
decreases by an even natural number. We thus obtain the above theorem, because this sum is equal
to W(δf , a) for x = a.
Let us now prove relations (3.76) and (3.77). The proof is obtained by induction on the degree
of f .
If f is of first degree, then
W(δf , x − ε) = 1, W(δf , x) = W(δf , x + ε) = 0 (3.78)
and the induction hypothesis is verified.
Let us suppose now that the degree of f is at least equal to 2 and that m is the order of
multiplicity of x for f . We begin by assuming that f (x) = 0, from which m > 0 and x is the root
of an order of multiplicity m − 1 of f . The induction hypothesis leads to
W(δf , x) = W(δf , x + ε), W(δf , x − ε) − [W(δf , x) + (m − 1)] = 2k1, k1 ∈ N. (3.79)
From Lagrange’s mean theorem, applied to the intervals [x − ε, x] and [x, x + ε], we deduce
that f and f do not have the same sign in x − ε but have the same sign in x + ε, hence
W(δf x) = W(δf , x) = W(δf , x + ε) = W(δf , x + ε), (3.80)
W(δf , x − ε) = W(δf , x − ε) + 1 ≥ W(δf , x) + (m − 1) + 1
= W(δf , x) + m, (3.81)
W(δf , x − ε) − [W(δf , x) + m] = 2k, k ∈ N, (3.82)
so that the theorem is proved in this case.
If f (x) = 0, that is m = 0, then we denote by m the order of multiplicity of x for f . From the
induction hypothesis, we have
W(δf , x) = W(δf , x + ε), (3.83)
W(δf , x − ε) − [W(δf , x) − m ] = 2k1, k1 ∈ N. (3.84)
LAGRANGE’S METHOD 69
On the other hand, f (x) = 0 and f (x) = 0, f (x) = 0, . . . , f (m )(x) = 0, f (m +1)(x) = 0. We
may suppose that f (m +1)(x) > 0 (eventually, multiplying f by −1). The following situations may
occur:
• m is an even number. In this case, f (x − ε) and f (x + ε) are positive, hence, for each x
of the set {x − ε, x, x + ε}, the first nonzero term of the sequence f (x), f (x), . . . , f (k)
(x)
is positive. If f (x) > 0, then W(δf , x) = W(δf , x), while if f (x) < 0, then W(δf , x) =
W(δf , x) + 1. The theorem is proved, because it follows that
W(δf , x) = W(δf , x + ε), (3.85)
W(δf , x − ε) − W(δf , x) = W(δf , x − ε) − W(δf , x), (3.86)
the term on the right being greater than m by an even number; and because m is an even
number, it follows that this term is also an even number.
• m is an odd number. We get
f (x − ε) < 0 < f (x + ε), (3.87)
hence the first nonzero term of the sequence f (x), f (x), . . . , f (k)(x) will have the signs +,
−, + at the points x − ε, x, x + ε, respectively. If f (x) > 0, then W(δf , x − ε) = W(δf , x −
ε) + 1, so that the other two variations of sign remain unchanged. If f (x) < 0, then the number
of variations of sign does not change for x − ε, but increases by unity for x and x + ε. We
obtain
W(δf , x) = W(δf , x + ε), (3.88)
W(δf , x − ε) − W(δf , x) = W(δf , x − ε) − W(δf , x) ± 1. (3.89)
On the other hand, W(δf , x − ε) − W(δf , x) is equal to m to which we add an even natural
number (i.e., an odd number, because m is odd). It follows therefore that if we add or subtract 1 to
this difference, we obtain an even natural number which is just W(δf , x − ε) − W(δf , x) and the
theorem is proved.
Observation 3.14 Descartes’ theorem is a particular case of Budan’s theorem. Indeed, if
f = a0xn
+ a1xn−1
+ · · · + an, (3.90)
then
sgnf (0) = sgnan, sgnf (0) = sgnan−1, . . . , sgnf (n)
(0) = a0, (3.91)
sgnf (∞) = sgnf (∞) = · · · = sgnf (n)
(∞) = sgna0 (3.92)
and from Budan’s theorem, for a = 0, b = ∞, it follows that W(δf , 0) is just the number of
variations of sign in the sequence a0, a1, . . . , an, W(δf , ∞) = 0. Hence we obtain Descartes’
theorem.
3.3 LAGRANGE’S METHOD
Let us consider the equation4
f (x) = a0xn
+ a1xn−1
+ · · · + a0 = 0, (3.93)
4The method was named in the honor of Joseph Louis Lagrange (Giuseppe Luigi Lagrancia or Giuseppe Luigi
Lagrangia) (1736–1813) who studied the problem in 1770.
70 SOLUTION OF ALGEBRAIC EQUATIONS
TABLE 3.1 The Generalized Horner’s Schema
a0 a1 a2 · · · an−2 an−1 an
α a0 a11 a12 · · · a1,n−2 a1,n−1
α a0 a21 a22 · · · a2,n−2
· · · · · · · · · · · · · · ·
α a0 an−1,1
α a0
the coefficients ai, i = 0, n, of which are real numbers and let α ∈ R be an arbitrary value. We may
write Taylor’s formula around α in the form
f (x) = f (α) +
x − α
1!
f (α) +
(x − α)2
2!
f (α) + · · · +
(x − α)n
n!
f (n)
(α). (3.94)
Hence, it follows that the remainder of the division of f (x) by x − α is just f (α), while the quotient
is given by
Q1(x) = f (α) +
x − α
2!
f (α) + · · · +
(x − α)n−1
n!
f (n)
(α). (3.95)
The remainder of the division of Q1(x) by x − α is f (α), while the quotient becomes
Q2(x) =
f (α)
2!
+
x − α
3!
f (α) + · · · +
(x − α)n−2
n!
f (n)
(α). (3.96)
In general,
Qi(x) =
f (i)
(α)
i!
+
x − α
(i + 1)!
f (i+1)
(α) + · · · +
(x − α)n−i
n!
f (n)
(α), (3.97)
while the remainder of the division of Qi(x) by x − α is
Ri(x) =
f (i)
(α)
i!
. (3.98)
Hence, we have the following relations between the coefficients a0, . . . , an of f (α) and the
coefficients a0, . . . , an−1 of f (α):
a0 = a0, a1 = a0α + a1, . . . , an−1 = an−2α + an−1. (3.99)
Analogically, the coefficients a0, . . . , an−1 of f (α) and the coefficients a 0, . . . , an−2 of f (α)/2
are related as follows:
a0 = a0, a1 = a0 α + a1, . . . , an−2 = an−3α + an−2. (3.100)
The above relations may be systematized in Table 3.1.
The first row gives the coefficients of f (α), that is, a0, . . . , an, the second gives the coef-
ficients of f (α)/1!, that is, a0 = a0, a1 = a11, . . . , an−1 = a1,n−1, the third the coefficients of
f (α)/2!, that is, a0 = a0, a1 = a21, . . . , an−2 = a2,n−2, . . . , the nth row has the coefficients of
f (n−1)
(α)/(n − 1)!, that is, a(n−1)
0 = a0, a(n−1)
1 = an−1,1, and the last row, the (n + 1)th, the coef-
ficients of f (n)(α)/n!, that is, a(n)
0 = a0. This table is known as the generalized Horner’s schema.
LAGRANGE’S METHOD 71
Let us suppose that equation (3.93) is the one that has a positive real root x. The case of the
negative root is similar to the previous one if we consider the equation
g(x) = (−1)n
f (−x) = 0. (3.101)
Let us suppose that we have found a natural number a1, so that
a1 < x < a1 + 1; (3.102)
hence x becomes
x = a1 +
1
x1
, (3.103)
with x1 ∈ R∗
+. We then have
f1(x1) = xn
1 f (a1) + xn−1
1
f (a1)
1!
+ xn−2
1
f (a1)
2!
+ · · · +
f (n)
(a1)
n!
= 0. (3.104)
We now search for a natural number a2, so that
x1 = a2 +
1
x2
, (3.105)
and hence
f2(x2) = xn
2 f (a2) + xn−1
2
f (a2)
1!
+ xn−2
2
f (a2)
2!
+ · · · +
f (n)
(a2)
n!
= 0. (3.106)
The procedure continues by searching for a3, so that
x2 = a3 +
1
x3
. (3.107)
Finally, we obtain
x = a1 +
1
a2 +
1
a3 +
1
...
, (3.108)
a decomposition of x in a continued fraction.
Let us denote
Rn = a1 +
1
a2 +
1
a3 +
1
... +
1
an
=
An
Bn
. (3.109)
Dirichlet’s theorem shows that
|x − Rn| <
1
B2
n
, (3.110)
thus obtaining the error of approximation in the solution x.
The method presented above is called Lagrange’s method.
Observation 3.15 To apply Lagrange’s method, it is necessary to have one and only one solution
of equation (3.93) between a1 and a1 + 1.
72 SOLUTION OF ALGEBRAIC EQUATIONS
3.4 THE LOBACHEVSKI–GRAEFFE METHOD
Let us consider the algebraic equation5
a0xn
+ a1xn−1
+ · · · + an−1x + an = 0, (3.111)
where ai ∈ R, i = 0, n, and let us denote by xi, i = 1, n, its roots.
3.4.1 The Case of Distinct Real Roots
Let us suppose that the n distinct roots are obtained as follows
|x1| > |x2| > · · · > |xn|. (3.112)
The corresponding Vi`ete’s relations are
x1 + x2 + · · · + xn = −
a1
a0
, x1x2 + · · · + x1xn + · · · + xn−1xn =
a2
a0
,
x1x2x3 + · · · + x1x2xn + · · · + xn−2xn−1xn = −
a3
a0
, . . . , x1x2 · · · xn = (−1)n an
a0
. (3.113)
If
|x1| |x2| |x3| · · · |xn|, (3.114)
then the roots xi, i = 1, n, may be given by the approximate formulae
x1 ≈ −
a1
a0
, x2 ≈ −
a2
a1
, x3 ≈ −
a3
a2
, . . . , xn ≈ −
an
an−1
. (3.115)
Let us see now how we can transform equation (3.111) into another one for which the roots yi,
i = 1, n, satisfy condition (3.114); there exist certain relations between the roots xk, k = 1, n, and
yi, i = 1, n. We now introduce the polynomial function
f (x) = a0xn
+ a1xn−1
+ · · · + an−1x + an; (3.116)
we can then write
f (x) = a0(x − x1)(x − x2) · · · (x − xn) (3.117)
because of the supposition that the roots xi, i = 1, n, are real and distinct.
On the other hand,
f (−x) = (−1)n
a0(x + x1)(x + x2) · · · (x + xn), (3.118)
hence
f (x)f (−x) = (−1)n
a2
0(x2
− x2
1)(x2
− x2
2) · · · (x2
− x2
n). (3.119)
From relation (3.116), we get
f (−x) = (−1)n
a0xn
+ (−1)n−1
a1xn−1
+ · · · + (−1)an−1x + an, (3.120)
5This method was presented by Germinal Pierre Dandelin (1794–1847) in 1826, Karl Heinrich Graeffe (Karl
Heinrich Gr¨affe) (1799–1873) in 1837, and Nikolai Ivanovich Lobachevski (1792–1856) in 1834.
THE LOBACHEVSKI–GRAEFFE METHOD 73
and hence
f (x)f (−x) = (−1)n
[a2
0x2n
− (a2
1 − 2a0a2)x2n−2
+ (a2
2 − 2a1a3 + 2a0a4)x2n−2
+ · · · + (−1)n
a2
n].
(3.121)
By the transformation
y = x2
, (3.122)
the equation
f (x)f (−x) = 0 (3.123)
becomes
a0yn
+ a1yn−1
+ · · · + an−1y + an = 0, (3.124)
where
a0 = a2
0, a1 = −(a2
1 − 2a0a2), a2 = a2
2 − 2a1a3 + 2a0a4, . . . , an = (−1)n
a2
n. (3.125)
We can write these relations in the form
aj = a2
j + 2
n
i=1
(−1)i
aj−1aj+1 (−1)j
, j = 0, n, (3.126)
where aj = 0 for j < 0 or j > n.
Observation 3.16
(i) Equation (3.123) has 2n roots, namely, ±x1, ±x2, . . . , ±xn.
(ii) By solving equation (3.124), we obtain the roots y1, y2, . . . , yn. The roots x1, x2, . . . , xn
are no more unique, because xi = −yi or xi = − −yi, i = 1, n.
The procedure described above can be repeated for equation (3.124) in y. In general, the proce-
dure is repeated p times, obtaining thus an equation of the form
a
(p)
0 zn
+ a
(p)
1 zn−1
+ · · · + a
(p)
n−1z + a(p)
n = 0, (3.127)
the roots of which are z1, z2, . . . , zn. The connection between zi and xi is given by
xi = 2p
−zi or xi = − 2p
−zi, i = 1, n. (3.128)
The roots of equation (3.127) are given by the formulae of the form (3.115), hence
z1 = −
a
(p)
1
a
(p)
0
, z2 = −
a
(p)
2
a
(p)
1
, . . . , zi = −
a
(p)
i+1
a
(p)
i
, . . . , zn = −
a
(p)
n
a
(p)
n−1
, (3.129)
so relations (3.128) may also be written in the form
xi = 2p
−
a
(p)
i
a
(p)
i−1
or xi = − 2p
−
a
(p)
i
a
(p)
i−1
, i = 1, n. (3.130)
Relations (3.130) must satisfy the initial equation f (x) = 0, retaining only its solutions.
74 SOLUTION OF ALGEBRAIC EQUATIONS
3.4.2 The Case of a Pair of Complex Conjugate Roots
Let us again consider equation (3.111), supposing that two of its roots, say xk and xk+1, are conjugate
complex ones. We can write relation (3.112) in the form
|x1| > |x2| > · · · > |xk| = |xk+1| > |xk+2| > · · · > |xn|. (3.131)
We denote by
r = |xk| = |xk+1|, (3.132)
the modulus of the conjugate complex roots, where
xk = α + iβ, xk+1 = α − iβ, r = α2 + β2. (3.133)
From Vi`ete’s relation,
x1 + x2 + · · · + xk + xk+1 + · · · + xn = −
a1
a0
, (3.134)
we easily obtain
α = −
a1
2a0
−
1
2
n
j=1
j=k;j=k+1
xj , (3.135)
by taking into account relations (3.133). Squaring equation (3.111) and proceeding as from equation
(3.111), we obtain the equation
a
(p)
0 zn
+ a
(p)
1 zn−1
+ · · · + a
(p)
k−1zn−k+1
+ a
(p)
k zn−k
+ a
(p)
k+1zn−k−1
+ · · · + a
(p)
1 z + a
(p)
0 = 0. (3.136)
The roots zk and zk+1 satisfy the relation
a
(p)
k−1z2
+ a
(p)
k z + a
(p)
k+1 = 0. (3.137)
Then
zkzk+1 =
a
(p)
k+1
a
(p)
k−1
. (3.138)
On the other hand,
zkzk+1 = (xkxk+1)2p
= (r2
)2p
, (3.139)
from which
r2
= 2p a
(p)
k+1
a
(p)
k−1
. (3.140)
From relations (3.135) and (3.140), we get
β2
= r2
− α2
= 2p a
(p)
k+1
a
(p)
k−1
−



−
a1
2a0
−
1
2
n
j=1
j=k;j=k+1
xj



 . (3.141)
Knowing α and β, we obtain the roots xk and xk+1.
THE LOBACHEVSKI–GRAEFFE METHOD 75
Observation 3.17
(i) If all the roots of equation (3.111) are real and distinct, then all the products of the form
aj−iaj+i become negligible with respect to a2
j , hence all the coefficients a(s)
j become perfect
quasi-squares beginning from a certain rank.
(ii) If a certain a(s)
j , 1 ≤ j ≤ n − 1, does not become a perfect square, but is situated between
two perfect squares a(s)
j−1 and a(s)
j+1, then the ratio
(r2
)2s
=
a(s)
j+1
a(s)
j−1
, (3.142)
where r is the modulus of the pair of conjugate complex roots or even the value of a double
real root (if the imaginary part of the conjugate complex roots vanishes).
(iii) More generally, if 2l − 1 coefficients a(s)
k−2l+1, a(s)
k−2l+2, . . . , a(s)
k are situated between two
perfect squares a(s)
k−2l and a(s)
k , then there exist l pairs of roots that have the same modulus r.
3.4.3 The Case of Two Pairs of Complex Conjugate Roots
Let xk, xk+1 and xl, xl+1 be two pairs of conjugate complex roots, so that
xk = α1 + iβ1, xk+1 = α1 − iβ1, xl = α2 + iβ2, xl+1 = α2 − iβ2, (3.143)
with β1 = 0, β2 = 0. We may write the sequence of inequalities
|x1| > |x2| > · · · > |xk−1| > |xk| = |xk+1| > |xk+2| > · · ·
> |xl−1| > |xl| = |xl+1| > |xl+2| > · · · > |xn|, (3.144)
where x1, . . . , xn are the roots of equation (3.111), all real, except for xk, xk+1, xl, and xl+1. We
obtain thus two equations of second degree, that is,
a
(p)
k−1z2
+ a
(p)
k z + a
(p)
k+1 = 0, a
(p)
l−1z2
+ a
(p)
l z + a
(p)
l+1 = 0. (3.145)
Let us denote by r1 and r2 the moduli of the two pairs of complex roots
r1 = |xk| = |xk+1|, r2 = |xl| = |xl+1|. (3.146)
We can write the relations
r2
1 = xkxk+1 = α2
1 + β2
1, r2
2 = xlxl+1 = α2
2 + β2
2; (3.147)
and from equation (3.145), we obtain
r2
1 = 2p a
(p)
k+1
a
(p)
k−1
, r2
2 = 2p a
(p)
l+1
a
(p)
l−1
. (3.148)
From the first Vi`ete relation for equation (3.111), we have
n
i=1
xi = −
a1
a0
(3.149)
76 SOLUTION OF ALGEBRAIC EQUATIONS
or
x1 + x2 + · · · + xk−1 + xk+1 + xk+2 + · · · + xl−1 + xl + xl+1 + xl+2 + · · · + xn = −
a1
a0
, (3.150)
because
xk + xk+1 = 2α1, xl + xl+1 = 2α2, (3.151)
we have
α1 + α2 = −
1
2






a1
a0
+
n
i=1
i=k;i=k+1
i=l;i=l+1
xi






. (3.152)
Let us consider now the last two Vi`ete relations,
x1x2 . . . xn−1 + x1x2 . . . xn−2xn + · · · + x2x3 . . . xn = (−1)n−1 an−1
a0
, (3.153)
x1x2 . . . xn = (−1)n an
a0
. (3.154)
By division, we get
1
x1
+ · · · +
1
xk−1
+
1
xk
+
1
xk+1
+
1
xk+2
+ · · · +
1
xl−1
+
1
xl
+
1
xl+1
+
1
xl+2
+ · · · +
1
xn
= −
an−1
an
.
(3.155)
On the other hand,
1
xk
+
1
xk+1
=
xk + xk+1
xkxk+1
=
2α1
r2
1
,
1
xl
+
1
xl+1
=
xl + xl+1
xlxl+1
=
2α2
r2
2
, (3.156)
leading to
α1
r2
1
+
α2
r2
2
= −
1
2






an−1
an
+
n
i=1
i=k;i=k+1
i=l;i=l+1
1
xi






. (3.157)
We obtain α1 and α2 from relations (3.152) and (3.157). Taking into account r1, r2, α1, α2, it follows
that
β1 = r2
1 − α2
1, β2 = r2
2 − α2
2. (3.158)
3.5 THE BERNOULLI METHOD
Let us consider the equation6
f (x) = xn
+ a1xn−1
+ · · · + an = 0, (3.159)
6Daniel Bernoulli (1700–1782) used this method for the first time in 1724.
THE BERNOULLI METHOD 77
to which we associate the recurrence formula
µn + a1µn−1 + · · · + an = 0. (3.160)
If the roots of equation (3.195) are ξ1, ξ2, . . . , ξn and if equation (3.160) is considered to be a
difference equation, then the solution of the latter is of the form
µk = C1ξk
1 + C2ξk
2 + · · · + Cnξk
n, k = 1, n, (3.161)
where Ci, i = 1, n, are constants that do not depend on k, while the roots ξi, i = 1, n, are assumed
to be distinct.
Let us further suppose that the roots ξi, i = 1, n, are indexed such that
|ξ1| > |ξ2| > · · · > |ξn|. (3.162)
Writing expression (3.161) in the form
µk = C1ξk
1 1 + C2
ξk
2
ξk
1
+ · · · + Cn
ξk
n
ξk
1
(3.163)
and making k → k − 1, from which
µk−1 = C1ξk−1
1 1 + C2
ξk−1
2
ξk−1
1
+ · · · + Cn
ξk−1
n
ξk−1
1
, (3.164)
it follows that
ξ1 = lim
k→∞
µk
µk−1
, (3.165)
supposing that µ1, µ2, . . . , µn are chosen so as not to have C1 = 0. Such a choice is given by
µ1 = µ2 = · · · = µn−1 = 0, µn = −a0. (3.166)
Another choice for the n values is given by
µr = −(a1µr−1 + a2µr−2 + · · · + ar−1µ1 + ar ), r = 1, n, (3.167)
where we suppose that µi = 0 if i ≤ 0. In the case of this choice, we obtain Ci = 1, i = 1, n, and
µk = ξk
1 + ξk
2 + · · · + ξk
n, k ≥ 1. (3.168)
Moreover, we also obtain also the approximate relations
ξ1 ≈
µk
µk−1
, ξ1 ≈ k
√
µk, (3.169)
with k sufficiently large.
If ξ1 is a complex root, then ξ2 = ξ1, |ξ1| = |ξ2|. We may write
ξ1 = ζ1 + iη1 = β1eiφ1 , ξ1 = ζ1 − iη1 = β1e−iφ1 , (3.170)
where
β1 = ξ2
1 + η2
1 > 0. (3.171)
78 SOLUTION OF ALGEBRAIC EQUATIONS
The sum C1ξk
1 + C2ξk
2 may be replaced by
βk
1(C1 cos kφ1 + C2 sin kφ1), (3.172)
where we have made the substitutions
C1 →
C1 − iC2
2
, C2 →
C1 + iC2
2
. (3.173)
Hence it follows that, for k sufficiently large, we may write
µk ≈ βk
1(C1 cos kφ1 + C2 sin kφ1). (3.174)
Moreover, µk must satisfy the recurrence relation
µk+1 − 2µkβ1 cos φ1 + β2
1µk−1 = 0. (3.175)
Making k → k − 1, we obtain the second relation of recurrence
µk − 2µk−1β1 cos φ1 + β2
1µk−2 = 0. (3.176)
By eliminating cos φ1 between these relations, it follows that
(µ2
k−1 − µkµk−2)β2
1 = µ2
k − µk+1µk−1, (3.177)
whereas by eliminating β2
1, we obtain
2(µ2
k−1 − µkµk−2) cos φ1 = µkµk−1 − µk+1µk−2. (3.178)
Denoting
sk = µ2
k − µk+1µk−1, tk = µkµk−1 − µk+1µk−2, (3.179)
we obtain the values
β2
1 ≈
sk
sk−1
, 2β1 cos φ1 ≈
tk
tk−1
, (3.180)
for k sufficiently large and C1 = 0, C2 = 0 (a case that may be eliminated).
If ξ1 is a double root, ξ1 = ξ2, then in the sum (3.161) we obtain the expression ξk
1(C1 + C2k).
It follows that µk satisfies the relation
µk+1 − 2µkξ1 + µk−1ξ2
1 = 0, (3.181)
for k → ∞.
Proceeding as above, we obtain the relation
2ξ1 ≈
tk
sk−1
. (3.182)
LIN METHODS 79
3.6 THE BIERGE–VI `ETE METHOD
Let us consider the polynomial7
f (x) = xn
+ a1xn−1
+ · · · + an (3.183)
which we divide by x − ξ. It follows that
f (x) = xn
+ a1xn−1
+ · · · + an = (x − ξ)(xn−1
+ b1xn−2
+ · · · + bn−1) + R, (3.184)
where R is the remainder. In particular,
R = f (ξ). (3.185)
Dividing now the quotient of relation (3.184) by x − ξ, we obtain
xn−1
+ b1xn−2
+ · · · + bn−1 = (x − ξ)(xn−2
+ c1xn−3
+ · · · + cn−2) + R , (3.186)
while the remainder R verifies the relation
R = f (ξ). (3.187)
Obviously, the procedure may continue.
Between the coefficients ai, i = 1, n, and bj , j = 1, n − 1, there take place the relations
a1 = b1 − ξ, a2 = b2 − ξb1, . . . , an−1 = bn−1 − ξbn−2, an = R − ξbn−1 (3.188)
and similarly for bj , j = 1, n − 1, and ck, k = 1, n − 2.
It follows that
R = f (ξ) = bn = an + ξbn−1, (3.189)
R = f (ξ) = cn−1 = bn−1 + ξcn−2. (3.190)
Thus, we obtain the relation of recurrence
ξ∗
= ξ −
R
R
= ξ −
an + ξbn−1
bn−1 + ξcn−2
. (3.191)
As a matter of fact, the Bierge–Vi`ete method is a variant of Newton’s method, in which the
computation of the functions f (ξ) and f (ξ) is avoided.
3.7 LIN METHODS
The first Lin method8
derives from the Bierge–Vi`ete one, for which the relation f (ξ) = 0 is
equivalent to
an + ξbn−1 = 0, (3.192)
the notations being those in the previous paragraph.
7
This method is the Newton–Raphson method in the case of polynomials. It was named in the honor of Franc¸ois
Vi`ete (1540–1603) who stated it for the first time, in a primary form, in 1600.
8The methods were presented for the first time by Sir Leonard Bairstow (1880–1963) in 1920. They were mathe-
matically developed by S. N. Lin in 1941 and 1943.
80 SOLUTION OF ALGEBRAIC EQUATIONS
In this case, bn−1 is seen as a function of ξ, hence relation (3.192) is written in the form
ξ = −
an
bn−1(ξ)
. (3.193)
We obtain thus an iterative formula in which
ξ∗
= −
an
bn−1(ξ)
, (3.194)
from which
ξ = ξ∗
− ξ = −
an + ξbn−1(ξ)
bn−1(ξ)
(3.195)
or, equivalently,
ξ∗
= ξ −
R
bn−1(ξ)
. (3.196)
On the other hand, we have seen in the previous paragraph that
bn−1(ξ) =
f (ξ) − an
ξ
(3.197)
and the recurrence relation (3.193) becomes
ξ∗
= −
anξ
f (ξ) − an
. (3.198)
Hence, it follows that the first Lin method is equivalent to the application of the method of con-
tractions to the function
F(ξ) = −
anξ
f (ξ) − an
; (3.199)
this method is convergent if
|F (x)| = an
xf (x) − f (x) + an
[f (x) − an]2
< 1. (3.200)
On the other hand, if ξ is a root of the equation f (x) = 0, then we may write
µr = F (ξr ) = 1 +
ξr
an
f (ξr ) = 1 +
ξr f (ξr )
f (0)
(3.201)
and the convergence is ensured if
|µr | = 1 +
ξr
an
f ξr < 1, (3.202)
that is, if the start value for the iterations sequence is sufficiently close to ξr .
The second method of Lin starts from the idea of dividing the polynomial
f (x) = xn
+ a1xn−1
+ · · · + an (3.203)
by the quadratic factor x2
+ px + q, obtaining
xn
+ a1xn−1
+ · · · + an = (x2
+ px + q)(xn−2
+ b1xn−3
+ · · · + bn−2) + Rx + S. (3.204)
It follows therefore that x2 + px + q is a divisor of f if and only if R = 0 and S = 0.
LIN METHODS 81
Expanding the computations in equation (3.204), we obtain the relations
a1 = b1 + p, a2 = b2 + pb1 + q, a3 = b3 + pb2 + qb1, . . . ,
an−2 = bn−2 + pbn−3 + qbn−4, an−1 = R + pbn−2 + qbn−3, an = S + qbn−2. (3.205)
Using the recurrence formula
bk = ak − pbk−1 − qbk−2, k = 1, n, b0 = 1, b−1 = 0, (3.206)
it follows that R and S are given by
R = bn−1 = an−1 − pbn−2 − qn−3, S = bn + pbn−1 = an − qbn−2. (3.207)
Using the condition R = 0, S = 0, so that x2
+ px + q divides f , we obtain
an−1 − pbn−2 − qbn−3 = 0, an − qbn−2 = 0. (3.208)
Lin’s idea consists in applying the method of successive iterations to the sequences defined by
p =
an−1 − qbn−3
bn−2
, q =
an
bn−2
, (3.209)
so that the new values p∗
, q∗
after iteration become
p∗
=
an−1 − qbn−3
bn−2
, q∗
=
an
bn−2
, (3.210)
p = p∗
− p =
an−1 − pbn−2 − qbn−3
bn−2
, q = q∗
− q =
an − qbn−2
bn−2
(3.211)
or, equivalently,
p∗
= p +
R
bn−2
, q∗
= q +
S
bn−2
. (3.212)
Because x1 and x2 are the roots of the equation x2
+ px + q, we have
Rx1 + S = f (x1), Rx2 + S = f (x2), (3.213)
resulting in the expressions
x1(p∗
− p) + (q∗
− q) =
qf (x1)
an − S
, x2(p∗
− p) + (q∗
− q) =
qf (x2)
an − S
. (3.214)
Denoting the roots of the equation x2 + p∗x + q∗ = 0 by x∗
1 , x∗
2 , we obtain the relations
(x2 − x1)(x∗
1 − x1) =
qf (x1)
an − S
− (x∗
1 − x1)(x∗
2 − x2),
(x2 − x1)(x∗
2 − x2) = −
qf (x2)
an − S
+ (x∗
1 − x1)(x∗
2 − x2). (3.215)
82 SOLUTION OF ALGEBRAIC EQUATIONS
If (x∗
1 , x∗
2 ) is sufficiently close to (x1, x2), then Lagrange’s theorem of finite increments leads to
x1 − x∗
1 ≈ 1 +
ξ1ξ2
ξ2 − ξ1
f ξ1
an
(ξ1 − x1), x2 − x∗
2 ≈ 1 −
ξ1ξ2
ξ2 − ξ1
f ξ2
an
(ξ2 − x2),
(3.216)
where ξ1 and ξ2 are the roots of equation f (x) = 0.
Hence, the method is convergent if the moduli of the expressions in the brackets of relation (3.50)
are strictly subunitary. Moreover, it is necessary that the start values for p and q be sufficiently
close to −(ξ1 + ξ2) and ξ1ξ2, respectively.
3.8 NUMERICAL EXAMPLES
Example 3.1 Let us consider the polynomial
P (x) = X5
+ 3X4
− 2X3
+ 6X2
+ 5X − 7 (3.217)
for which we wish to determine the limits between which its roots can be found.
Using the notation in Section 3.1, we have
a = max{|3|, |−2|, |6|, |5|, |−7|} = 7, a0 = 1, (3.218)
a = max{|1|, |3|, |−2|, |6|, |5|} = 6, a5 = −7, (3.219)
so that the roots of the equation P (x) = 0 can be found in the interval
7
13
=
7
6 + 7
≤ |x| ≤ 1 +
7
1
= 8. (3.220)
The positive roots of the equation P (x) = 0 has as upper limit, the value
L = 8. (3.221)
Let us consider the equations
P (−x) = 0, (3.222)
x5
P
1
x
= 0, (3.223)
and
−x5
P −
1
x
= 0, (3.224)
which may be written also in the forms
x5
− 3x4
− 2x3
− 6x2
+ 5x + 7 = 0, (3.225)
7x5
− 5x4
− 6x3
+ 2x2
− 3x − 1 = 0, (3.226)
and
7x5
+ 5x4
− 6x3
− 2x2
− 3x + 1 = 0. (3.227)
NUMERICAL EXAMPLES 83
The upper limits of the positive roots of these equations are given by
L1 = 1 +
7
1
= 8, L2 = 1 +
6
7
=
13
7
, L3 = 1 +
6
7
=
13
7
, (3.228)
so that the real roots of the equation P (x) = 0 are to be found in the set
M1 = −L1, −
1
L3
1
L2
, L = −8, −
7
13
7
13
, 8 . (3.229)
If we solve the problem by using the second method of determination of the upper limit of the
roots of the equation, we get
(i) for the equation P (x) = 0: a0 = 1, A = 7, k = 2, L = 1 + (a0/A)1/k
= 1 +
√
1/7;
(ii) for equation (3.225): a0 = 1, A = 6, k = 1, L1 = 1 + 1/6 = 7/6;
(iii) for equation (3.226): a0 = 7, A = 5, k = 1, L2 = 1 + 7/5 = 12/5;
(iv) for equation (3.227): a0 = 7, A = 6, k = 2, L3 = 1 + (7/6)1/2
= 1 +
√
7/6.
In this case, the real roots of the equation P (x) = 0 have to be found in the set
M2 = −
7
6
, −
1
1 +
√
7/6
5
12
, 1 + 1/7 . (3.230)
Let us denote by f (x), f1(x), f2(x), and f3(x) the functions
f (x) = x5
+ 3x4
− 2x3
+ 6x2
+ 5x − 7, f1(x) = x5
− 3x4
− 2x3
− 6x2
+ 5x + 7,
f2(x) = 7x5
− 5x4
− 6x3
+ 2x2
− 3x − 1, f3(x) = 7x5
+ 5x4
− 6x3
− 2x2
− 3x + 1, (3.231)
for which the derivatives are
f (x) = 5x4
+ 12x3
− 6x2
+ 12x + 5, f (x) = 20x3
+ 36x2
− 12x + 12,
f (x) = 60x2
+ 72x − 12, f (iv)
(x) = 120x + 72, f (v)
(x) = 120, (3.232)
f1(x) = 5x4
− 12x3
− 6x2
− 12x + 5, f1 (x) = 20x3
− 36x2
− 12x − 12,
f1 (x) = 60x2
− 72x − 12, f (iv)
1 (x) = 120x − 72, f (v)
1 (x) = 120, (3.233)
f2(x) = 35x4
− 20x3
− 18x2
+ 4x − 3, f2 (x) = 140x3
− 60x2
− 36x + 4,
f2 (x) = 420x2
− 120x − 36, f (iv)
2 (x) = 840x − 120, f (v)
2 (x) = 840, (3.234)
f3(x) = 35x4
+ 20x3
− 18x2
− 4x − 3, f3 (x) = 140x3
+ 60x2
− 36x − 4,
f3 (x) = 420x2
+ 120x − 36, f (iv)
3 (x) = 840x + 120, f (v)
3 (x) = 840. (3.235)
To apply Newton’s method, we search first for a value M > 0 so that f (v)(M) > 0. Obviously,
M may be any positive real number. We choose M = 0.1. We search now for a value M ≥ M
so that f (iv)
(M ) > 0. We choose M = M. The procedure is continued for the value M and the
derivative f (x), obtaining M = M = M. Step by step, it follows that we may choose the value
L = 1 for the function f (x).
Analogically, we get the following values
• for f1(x): L1 = 4;
• for f2(x): L2 = 2;
• for f3(x): L3 = 1.
84 SOLUTION OF ALGEBRAIC EQUATIONS
It follows that the real roots of the equation f (x) = 0 have to be found in the set
M3 = [−4, −1]
1
2
, 1 . (3.236)
Let us solve the same problem by the method of grouping the terms.
For f (x) we may make a group of the form
(x5
+ 3x4
− 2x3
) + (6x2
+ 5x − 7), (3.237)
for which we find as upper bounds of the positive roots the values M1 = 2, M2 = 1, so that an
upper bound of these roots is given by M1 = 2.
In the same case of the function f (x) we may make also the group
(x5
− 2x3
) + (3x4
+ 6x2
+ 5x − 7), (3.238)
for which the upper bounds of the positive roots are the values M3 = 2 and M4 = 1, from which
we deduce that the upper bound of the positive roots is given by the value M3 = 2.
In conclusion, the upper limit of the positive roots of the equation f (x) = 0 is
L = max{M1, M3} = 2.
By an analogous procedure, it follows that:
• for f1(x) there is only one possibility of grouping
(x5
− 3x4
− 2x3
− 6x2
) + (5x + 7), (3.239)
hence the value L1 = 4;
• for f2(x), there is only one possibility of grouping
(7x5
− 5x4
− 6x3
) + (2x2
− 3x − 1) (3.240)
to which corresponds L2 = 2;
• for f3(x) the possibilities of grouping
(7x5
+ 5x4
− 6x3
− 2x2
− 3x) + (1), (3.241)
with L3 = 1,
(7x5
− 6x3
) + (5x4
− 2x2
− 3x) + (1), (3.242)
with L3 = 2,
(7x5
− 6x3
− 2x2
) + (5x4
− 3x) + (1), (3.243)
with L3 = 2,
(7x5
− 6x3
− 3x) + (5x4
− 2x2
) + (1), (3.244)
with L(iv)
3 = 2,
(7x5
− 2x2
− 3x) + (5x4
− 6x3
) + (1), (3.245)
with L(v)
3 = 2,
(7x5
− 2x2
) + (5x4
− 6x3
− 3x) + (1), (3.246)
with L(vi)
3 = 2,
(7x5
− 3x) + (5x4
− 6x3
− 2x2
) + (1), (3.247)
with L(vii)
3 = 2, so that L3 = 2.
NUMERICAL EXAMPLES 85
In conclusion, the real roots of the equation f (x) = 0 may be found in the set
M4 = −4, −
1
2
1
2
, 2 . (3.248)
We observe that the four methods lead to different results.
Moreover, Newton’s method and the method of grouping of terms lead to sufficiently laborious
expressions for the determination of the values L, L1, L2, and L3, because they imply polynomials
of a great degree for which we have no formulas to calculate the roots. In the example presented
here, we have preferred to determine these limits by entire numbers, although sometimes they can
be found as roots of some algebraic equations of small degrees (1 or 2). The first two methods are
simpler to apply, the second one having a more restricted area of values for the real roots.
Example 3.2 We wish to determine, as a function of the real parameter λ, the number of negative
and positive roots of the equation
x4
− 2x2
− λ = 0. (3.249)
To do this, we denote by f : R → R the polynomial function
f (x) = x4
− 2x2
− λ, (3.250)
the derivative of which is
f (x) = 4x3
− 4x, (3.251)
so that the first two polynomials of the Sturm sequence are
f0(x) = x4
− 2x2
− λ, (3.252)
f1(x) = x3
− x. (3.253)
Dividing f0 by f1, we obtain the quotient x and the remainder (−x2
− λ), so that the following
polynomial in the Sturm sequence reads
f2(x) = x2
+ λ. (3.254)
Now dividing f1 by f2 results in the quotient x and the remainder −(λ + 1)x, from we get the
polynomial
f3(x) = (λ + 1)x. (3.255)
We continue this process with the polynomials f2 and f3, for which we obtain the quotient
x/(λ + 1) and the remainder λ; hence, the last polynomial of the Sturm sequence is
f4(x) = −λ. (3.256)
Case 1 λ ∈ (−∞, −1) We construct the following table, where ε > 0 is a sufficiently small value.
f0 f1 f2 f3 f4 WS
−∞ + − + + + 2
−ε + + − + + 2
ε + − − – + 2
∞ + + + − + 2
The number of negative roots of the equation f (x) = 0 is given by
WS(−∞) − WS(−ε) = 2 − 2 = 0, (3.257)
86 SOLUTION OF ALGEBRAIC EQUATIONS
while the number of positive roots of the same equation is
WS(ε) − WS(∞) = 0. (3.258)
In conclusion, for λ ∈ (−∞, −1) our equation has no real roots.
Case 2 λ = −1 In this case, the equation f (x) = 0 becomes
x4
− 2x2
+ 1 = (x2
− 1)2
= 0 (3.259)
and has the double roots x1 = −1 and x2 = 1.
Case 3 λ ∈ (−1, 0) We construct the following table, where ε is a sufficiently small positive value.
f0 f1 f2 f3 f4 WS
−∞ + − + − + 4
−ε + + − − + 2
ε + − − + + 2
∞ + + + + + 0
It follows that the number of negative values of the equation f (x) = 0 is
WS(−∞) − WS(−ε) = 4 − 2 = 2, (3.260)
while the number of positive roots of the same equation is given by
WS(ε) − WS(∞) = 2 − 0 = 2. (3.261)
Case 4 λ = 0 The equation f (x) = 0 now takes the form
x4
− 2x2
= x2
(x2
− 2) = 0 (3.262)
and has the double root x1 = 0 and the simple roots x2 = −
√
2 and x3 =
√
2.
Case 5 λ ∈ (0, ∞) We construct the following table, in which ε > 0 is a sufficiently small value.
f0 f1 f2 f3 f4 WS
−∞ + − + − + 4
−ε + + − − + 2
ε + − − + + 2
∞ + + + + + 0
In this case, the number of negative roots of the equation f (x) = 0 is
WS(−∞) − WS(−ε) = 3 − 2 = 1, (3.263)
while the number of positive roots is
WS(ε) − WS(∞) = 2 − 1 = 1. (3.264)
NUMERICAL EXAMPLES 87
If we were to apply Descartes’ theorem to solve the same problem, then we would find that for
λ > 0, we have only one variation of sign in the sequence of the coefficients of the polynomial
x4
− 2x2
− λ, which means that equation (3.249) has only one positive root. Making x → −x, we
obtain the same equation (3.249) and, analogically it follows that it has a negative root. If λ < 0,
then Descartes’ theorem shows that equation (3.249) has zero or two positive roots and zero or two
negative roots.
The same conclusion is obtained from Budan’s theorem.
Example 3.3 Let us consider the equation
f (x) = x3
+ x − 3 = 0, (3.265)
the roots of which we wish to determine.
We begin by presenting an exact method of solving the equation of third degree, that is, the
method of Hudde (Johann van Waveren Hudde, 1628–1704).
Let us observe that any equation of third degree,
a0y3
+ a1y2
+ a2y + a3 = 0, a0 = 0, (3.266)
may be brought to the canonical form
x3
+ ax + b = 0, (3.267)
by dividing it by a0 and by the transformation
y = x −
a1
3a0
. (3.268)
We search for solutions of the form
x = u + v, u, v ∈ C, (3.269)
for equation (3.267). It follows that
(u + v)3
+ a(u + v) + b = 0 (3.270)
or, equivalently,
u3
+ v3
+ 3uv(u + v) + a(u + v) + b = 0. (3.271)
We shall determine u and v so that
u3
+ v3
= −b, uv = −
a
3
. (3.272)
The last relation (3.272) leads to
u3
+ v3
= −
a3
27
, (3.273)
hence u3
and v3
are solutions of second-degree equation
z2
+ bz −
a3
27
= 0, (3.274)
88 SOLUTION OF ALGEBRAIC EQUATIONS
from which
z1,2 =
−b ± b2 +
4a3
27
2
. (3.275)
We get the values
u = 3
−b − b2 +
4a3
27
2
, v = 3
−b + b2 +
4a3
27
2
. (3.276)
Let us denote by the expression
=
4a3
27
+ b2
, (3.277)
called the discriminant of the equation of third degree. Three situations may occur:
Case 1 = 0. In this case, all the roots of equation (3.267) are real, one of them being a double;
this is just the condition for such a root. Indeed, denoting by g(x) the function
g(x) = x3
+ ax + b, (3.278)
the derivative of which is
g (x) = 3x2
+ a. (3.279)
From the condition that g(x) and g (x) have a common root, we deduce
3x3
+ 3ax + 3b = 0, 3x3
+ ax = 0, (3.280)
from which
2ax + 3b = 0. (3.281)
Hence the common root is
x = −
3b
2a
. (3.282)
Replacing x in equation (3.267), we get
−
27b3
8a3
−
3b
2
+ b = 0, (3.283)
from which
27b3
4a3
+ b = 0, (3.284)
that is, the condition = 0.
Case 2 < 0. In this situation, expressions (3.276) become
u =
3 −b − i
√
| |
2
, v =
3 −b + i
√
| |
2
(3.285)
or, taking into account the trigonometric representation of complex numbers,
u = 3
A(cos θ − i sin θ), v = 3
A(cos θ + i sin θ), (3.286)
NUMERICAL EXAMPLES 89
where
A =
1
2
b2 + 2 (3.287)
and θ is the argument, θ ∈ [0, 2π). We deduce the values
u1 =
3
√
A cos
θ
3
− i sin
θ
3
, u2 =
3
√
A cos
θ + 2π
3
− i sin
θ + 2π
3
,
u3 =
3
√
A cos
θ + 4π
3
− i sin
θ + 4π
3
, (3.288)
v1 =
3
√
A cos
θ
3
+ i sin
θ
3
, v2 =
3
√
A cos
θ + 2π
3
+ i sin
θ + 2π
3
,
v3 =
3
√
A cos
θ + 4π
3
+ i sin
θ + 4π
3
, (3.289)
and the roots of equation (3.267) are
x1 = u1 + v1 = 2
3
√
A cos
θ
3
, x2 = u2 + v2 = 2
3
√
A cos
θ + 2π
3
,
x3 = u3 + v3 = 2
3
√
A cos
θ + 4π
3
. (3.290)
All these roots are real and distinct.
Case 3 > 0. In this situation, expressions (3.276) read
u =
3 −b −
√
2
, v =
3 −b +
√
2
(3.291)
or, equivalently,
u =
3 |b +
√
|
2
(cos λπ + i sin λπ), v =
3 |b −
√
|
2
(cos µπ + i sin µπ), (3.292)
where λ and µ are two entire parameters with the values 0 or 1, function of the sign of the
expressions −b −
√
and −b +
√
.
It follows that
u1 =
3 |b +
√
|
2
cos
λπ
3
+ i sin
λπ
3
, u2 =
3 |b +
√
|
2
cos
λπ + 2π
3
+ i sin
λπ + 2π
3
,
u3 =
3 |b +
√
|
2
cos
λπ + 4π
3
+ i sin
λπ + 4π
3
, (3.293)
v1 =
3 |b −
√
|
2
cos
µπ
3
+ i sin
µπ
3
, v2 =
3 |b −
√
|
2
cos
µπ + 2π
3
+ i sin
µπ + 2π
3
,
v3 =
3 |b −
√
|
2
cos
µπ + 4π
3
+ i sin
µπ + 4π
3
. (3.294)
We obtain nine combinations for the roots of equation (3.267) from which only three lead to
roots, one real and two complex conjugate.
90 SOLUTION OF ALGEBRAIC EQUATIONS
Returning to equation (3.265), it has already been brought to the canonical form with a = 1 and
b = −3. It follows that
=
4a3
27
+ b2
=
247
27
> 0, (3.295)
and hence equation (3.265) has a real root and two complex conjugate.
We have
u = 3
3 −
247
27
2
, v = 3
3 +
247
27
2
, (3.296)
so that
u1 = −0.230806, u2 = 0.115403 + 0.199883i, u3 = 0.115403 − 0.199883i, (3.297)
v1 = 1.444217, v2 = −0.722109 + 1.250729i, v3 = −0.722109 − 1.250729i. (3.298)
These result in the solutions
x1 = u1 + v1 = 1.213411, x2 = u2 + v2 = −0.606706 + 1.450612i,
x3 = u3 + v3 = −0.606706 − 1.450612i. (3.299)
Applying Descartes theorem to the function f (x), we deduce that equation (3.265) has only one
positive root. Now making x → −x in equation (3.265), we deduce the equation
x3
+ x + 3 = 0, (3.300)
so that equation (3.265) has no negative roots. In conclusion, equation (3.265) has a positive root
and two complex conjugate roots.
Let us apply now Lagrange’s method to determine the positive root of equation (3.265). We have
f (1) = −1 < 0, f (2) = 7 > 0, hence the positive root of equation (3.265) lies between 1 and 2.
We construct the following table:
1 0 1 −3
1 1 1 2 −1
1 1 2 4
1 1 3
1 1
It results in the equation
f1(x) = x3
− 4x2
− 3x − 1 = 0, (3.301)
while the solution reads
x = 1 +
1
· · ·
. (3.302)
As f1(4) = −13 < 0, f1(5) = 9 > 0, the equation f1(x) = 0 has a root between 4 and 5, while
the solution x reads as
x = 1 +
1
4 + 1
...
. (3.303)
NUMERICAL EXAMPLES 91
We construct the following table:
1 −4 −3 −1
4 1 0 −3 −13
4 1 4 13
4 1 8
4 1
and obtain the equation
f2(x) = 13x3
− 13x2
− 8x − 1 = 0, (3.304)
for which f2(1) = −9 < 0, f2(2) = 35 > 0. Now, the solution becomes
x = 1 +
1
4 +
1
1 +
1
· · ·
. (3.305)
It results in the table
13 −13 −8 −1
1 13 0 −8 −9
1 13 13 5
1 13 26
1 13
and the new equation
f3(x) = 9x3
− 5x2
− 26x − 13 = 0, (3.306)
for which f3(2) = −13 < 0, f3(3) = 107 > 0; the equation f3(x) = 0 has a root between 2 and 3.
Moreover, the solution x takes the form
x = 1 +
1
4 +
1
1 +
1
2 +
1
...
(3.307)
and we obtain the following table.
9 −5 −26 −13
2 9 13 0 −13
2 9 31 62
2 9 49
2 9
It results in the equation
f4(x) = 13x3
− 62x2
− 49x − 9 = 0, (3.308)
92 SOLUTION OF ALGEBRAIC EQUATIONS
TABLE 3.2 Solving of Equation (2.49) by the Lobachevski–Graeffe Method
Step a0 a1 a2 a3
0 1 0 1 −3
1 1 2 1 −9
2 1 −2 37 −81
3 1 70 1045 −6561
4 1 −2810 2010565 −43046721
5 1 −3874970 3.800449 × 1012
−1.853020 × 1015
6 1 −7.414495 × 1012
1.412905 × 1025
−3.433684 × 1030
7 1 −2.671664 × 1025
2.081974 × 1050
−1.179018 × 1061
for which f4(5) = −179 < 0, f4(6) = 273 > 0; the solution x takes the form
x = 1 +
1
4 +
1
1 +
1
2 +
1
5 +
1
...
. (3.309)
We stop here and write
x ≈ 1 +
1
4 +
1
1 +
1
2 +
1
5 + 1
=
108
89
= 1.213483, (3.310)
the precision of determination of the solution being
x −
108
89
<
1
892
=
1
7921
. (3.311)
Let us solve now equation (3.265) using the Lobachevski–Graeffe method. We may pass from
the coefficients a
(p)
i , i = 0, 3, at the step p, to the coefficients a
(p+1)
i , i = 0, 3, using the formulae
a
(p+1)
0 = [a
(p)
0 ]2
, a
(p+1)
1 = −{[a
(p)
1 ]2
− 2a
(p)
0 a
(p)
2 }, a
(p+1)
2 = [a
(p)
2 ]2
− 2a
(p)
1 a
(p)
3 ,
a
(p+1)
3 = −[a
p
3 ]2
. (3.312)
It results in Table 3.2.
The changes of sign in the column of a1 indicates the presence of a pair of complex roots. The
real root is determined by the relation
x3 = ± 27
−
a(7)
3
a(7)
2
= ±1.21341; (3.313)
we observe that equation (3.265) is verified by
x3 = 1.21341. (3.314)
NUMERICAL EXAMPLES 93
Searching for the complex roots of the form
x1 = α + iβ, x2 = α − iβ, (3.315)
we obtain, from the Vi`ete relation,
α = −
a1
2a0
−
1
2
x3 = −0.60671. (3.316)
If r is the modulus of the two complex roots (3.315), then
r2
= 27 a(7)
2
a(7)
0
= 2.472368, (3.317)
hence
β2
= r2
− α2
= 2.104271, β = 1.45061. (3.318)
The required roots are
x1 = −0.60671 + 1.45061i, x2 = −0.60671 − 1.45061i, x3 = 1.21341. (3.319)
We shall use now the Bernoulli method to solve equation (3.265).
We choose parameters µk, k ∈ N, using the recurrence formula
µk = −(a1µk−1 + a2µk−2 + · · · + ak−1µ1 + ak), (3.320)
where a0 = 1, a1 = 0, a2 = 1, a3 = −3, ai = 0 for i < 0 or i > 3 and µi = 0 for i ≤ 0.
Successively, we get
µ1 = x1 + x2 + x3 = 0, (3.321)
µ2 = x2
1 + x2
2 + x2
3 = −2, (3.322)
µ3 = x3
1 + x3
2 + x3
3 = 9, (3.323)
µ4 = −(a1µ3 + a2µ2 + a3µ1) = 2, (3.324)
µ5 = −15, µ6 = 25, µ7 = 21, µ8 = −70, µ9 = 54, µ10 = 133, µ11 = −264,
µ12 = 29, µ13 = 663, µ14 = −821, µ15 = −576, µ16 = 2810, µ17 = −1887, (3.325)
s16 = µ2
16 − µ15µ17 = 6809188, s15 = µ2
15 − µ14µ16 = 2638786,
t16 = µ16µ15 − µ17µ14 = −3167787, t15 = µ15µ14 − µ16µ13 = −1390134, (3.326)
so that
|x2|2
= |x3|2
≈
s16
s15
= 2.580424, |x2| = |x3| ≈ 1.60637, (3.327)
2|x2| cos φ = 2|x3| cos φ ≈
t16
t15
= 2.278764, cos φ = 0.709. (3.328)
Although the modulus of the two complex conjugate roots determined is relatively correct, their
argument (cos φ) is obtained as positive; but, in reality, this has a negative value.
Let us apply now the Bierge–Vi`ete method to determine the real root of equation (3.265). To
do this, let ξ be a real number. Dividing f (x) twice by x − ξ, we obtain the following data.
94 SOLUTION OF ALGEBRAIC EQUATIONS
1 0 1 −3
ξ 1 ξ = b1 ξ2 + 1 = b2 ξ3 + ξ − 3
ξ 1 2ξ = c1 3ξ2
+ 1
The following recurrence relation results:
ξ∗
= ξ −
−3 + ξ(ξ2
+ 1)
ξ2 + 1 + ξ × 2ξ
= ξ −
ξ3
+ ξ − 3
3ξ2 + 1
. (3.329)
As a matter of fact, we have obtained thus the same Newton method, the results of which have
been presented before.
We have seen that the application of the first Lin method is equivalent to the application of the
contractions method to the function
F(ξ) = −
anξ
f (ξ) − an
=
3
ξ2 + 1
. (3.330)
The method is convergent if
1 +
x1
an
f x1 < 1, (3.331)
that is,
1 −
1.213411
3
3 × 1.2134112
+ 1 < 1, (3.332)
which is absurd because it leads to 1.191 < 1.
The convergence is assured in the case of the second Lin method if we have simultaneously
1 +
x2x3
x3 − x2
f x2
an
< 1, 1 −
x2x3
x3 − x2
f x3
an
< 1. (3.333)
We obtain
1 −
2.472367
2.901224
1
−3
3(−0.606706 − 1.450612i)2
+ 1 < 1,
1 −
2.472367
2.901224
1
−3
3(−0.606706 + 1.450612i)2
+ 1 < 1, (3.334)
that is,
|−0.195481 + 1.5i| < 1, |−0.195481 − 1.5i| < 1, (3.335)
which is absurd; hence neither the second Lin method also cannot be applied.
3.9 APPLICATIONS
Problem 3.1
A material point of mass m moves along the Ox-axis (Fig. 3.2) acted upon by a force
F = −F0P
x
b
, (3.336)
where P is a polynomial of nth degree, while b is a given constant.
APPLICATIONS 95
F
x
xMO
Figure 3.2 Problem 3.1.
Determine the displacements of extreme values, knowing the following initial conditions: t = 0,
x = x0, ˙x = v0.
Solution:
1. Theory
From the theorem of kinetic energy,
mv2
2
−
mv2
0
2
= W, (3.337)
where v is the velocity of the material point, while W is the work done by force F.
W =
x
x0
F(x)dx, (3.338)
and with the initial condition v = 0, we obtain the extreme values of the distance x as solutions
of the equation
−
mv2
0
2
=
x
x0
F(x)dx. (3.339)
With the help of the notations
ξ =
x
b
, ξ0 =
x0
b
, (3.340)
we obtain from equation (3.339) the algebraic equation
P(ξ) − k = 0, (3.341)
where P(ξ) is a primitive of the polynomial P (ξ), while k is given by
k =
mv2
0
2bF0
+ P(ξ). (3.342)
Numerical application: m = 4 kg, x0 = 0, v0 = 20 ms−1
, F0 = 50 N, P (x/b) = A2(x/b)2
+
A1(x/b) + A0, A0 = −2, A1 = 2, A2 = 3, b = 1 m.
2. Numerical computation
We obtain the following successively:
P (ξ) = 3ξ2
+ 2ξ − 2, (3.343)
P (ξ) = ξ3
+ ξ2
− 2ξ, P(ξ0) = 0; (3.344)
it results in the equation
ξ3
+ ξ2
− 2ξ − 16 = 0 (3.345)
with the solutions
ξ1 = −2.459120, ξ2 = −3.45912 − 3.749674i, ξ3 = −3.45912 + 3.749674i. (3.346)
96 SOLUTION OF ALGEBRAIC EQUATIONS
Problem 3.2
Consider the system illustrated in Figure 3.3 (the schema of half of an automobile), formed by two
equal masses m1 and a mass m2. The nonlinear springs (denoted by k1, ε1) give an elastic force
Fe = k1z + ε1z2
, (3.347)
where z is the elongation.
ϕ
l l
x3
m2g
m1g m1g
k2
k2
k1,ε1 k1,ε1
x2
x1
Figure 3.3 Problem 3.2.
The system moves in a vertical plane, the rotation of the bar of mass m2 (denoted by φ) being
considered sufficiently small to admit the approximations sin φ ≈ φ, cos φ ≈ 1.
Let us suppose that both the nonlinear and the linear springs are contracted.
Determine
• the positions of equilibrium;
• their stability as a function of the parameter ε1, assuming that k1, k2, m1, m2 are known.
Numerical application: m1 = 50 kg, m2 = 750 kg, l = 2 m, k2 = 20000 Nm−1
, k1 = 105
Nm−1
,
J = [m2(2l)2
]/12 = 1000 kg m2
, g = 9.8065 m s−2
.
Solution:
1. Theory
1.1. Equations of equilibrium
Isolating the three bodies, we obtain the representations in Figure 3.4.
The equations of equilibrium are
ε1x2
10 + k1x10 − k2(x30 + lφ0 − x10) − m1g = 0, (3.348)
ε1x2
20 + k1x20 − k2(x30 − lφ0 − x20) − m1g = 0, (3.349)
k2(x30 + lφ0 − x10) + k2(x30 − lφ0 − x20) − m2g = 0, (3.350)
k2l(x30 − lφ0 − x20) + k2l(x30 + lφ0 − x10) = 0, (3.351)
where the index 0 corresponds to the position of equilibrium.
APPLICATIONS 97
m2g
m1g m1g
k1(x30 − lϕ − x20)
k2(x30 + lϕ − x10)
k1x10 + ε1 x 0
10 k1x20 + ε1 x 0
20
ϕ
Figure 3.4 Isolation of the rigid bodies.
The above equations may be put in the form
$$\varepsilon_1 x_{10}^2 + (k_1 + k_2)x_{10} - k_2 x_{30} - k_2 l\varphi_0 = m_1 g, \tag{3.352}$$
$$\varepsilon_1 x_{20}^2 + (k_1 + k_2)x_{20} - k_2 x_{30} + k_2 l\varphi_0 = m_1 g, \tag{3.353}$$
$$-k_2 x_{10} - k_2 x_{20} + 2k_2 x_{30} = m_2 g, \tag{3.354}$$
$$x_{10} - x_{20} - 2l\varphi_0 = 0. \tag{3.355}$$
1.2. Positions of equilibrium
From relation (3.355), we obtain
$$\varphi_0 = \frac{x_{10} - x_{20}}{2l}, \tag{3.356}$$
which, replaced in relations (3.352) and (3.353), leads to
$$\varepsilon_1 x_{10}^2 + (k_1 + k_2)x_{10} - k_2 x_{30} - \frac{k_2}{2}(x_{10} - x_{20}) = m_1 g, \tag{3.357}$$
$$\varepsilon_1 x_{20}^2 + (k_1 + k_2)x_{20} - k_2 x_{30} + \frac{k_2}{2}(x_{10} - x_{20}) = m_1 g. \tag{3.358}$$
From equation (3.354), we get
$$x_{30} = \frac{m_2 g}{2k_2} + \frac{x_{10} + x_{20}}{2}. \tag{3.359}$$
Subtracting relation (3.358) from (3.357), term by term, it follows that
$$\varepsilon_1(x_{10}^2 - x_{20}^2) + k_1(x_{10} - x_{20}) = 0, \tag{3.360}$$
from which it follows that
$$x_{10} = x_{20} \quad\text{or}\quad x_{10} + x_{20} = -\frac{k_1}{\varepsilon_1}. \tag{3.361}$$
If x10 = x20, then from equation (3.356) we obtain φ0 = 0, so that from equation (3.359) we get
$$x_{30} = \frac{m_2 g}{2k_2} + x_{10} = \frac{m_2 g}{2k_2} + x_{20}. \tag{3.362}$$
If x10 + x20 = −k1/ε1, then we may write
$$x_{10} = -\frac{k_1}{\varepsilon_1} - x_{20}, \qquad x_{20} = -\frac{k_1}{\varepsilon_1} - x_{10}. \tag{3.363}$$
Relation (3.359) leads to
$$x_{30} = \frac{m_2 g}{2k_2} - \frac{k_1}{2\varepsilon_1}, \tag{3.364}$$
while from equation (3.356) we obtain
$$\varphi_0 = \frac{x_{10}}{l} + \frac{k_1}{2l\varepsilon_1}, \qquad \varphi_0 = -\frac{x_{20}}{l} - \frac{k_1}{2l\varepsilon_1}. \tag{3.365}$$
Equation (3.357) now takes the form
$$\varepsilon_1 x_{10}^2 + (k_1 - k_2)x_{10} - \frac{k_1 k_2}{2\varepsilon_1} - \left(m_1 + \frac{m_2}{2}\right)g = 0, \tag{3.366}$$
while equation (3.358) becomes
$$\varepsilon_1 x_{20}^2 + (k_1 - k_2)x_{20} - \frac{k_1 k_2}{2\varepsilon_1} - \left(m_1 + \frac{m_2}{2}\right)g = 0. \tag{3.367}$$
As a matter of fact, equations (3.366) and (3.367) are the same. The discriminant of these equations is
$$\Delta = k_1^2 + k_2^2 + 4\varepsilon_1\left(m_1 + \frac{m_2}{2}\right)g \tag{3.368}$$
and the condition Δ ≥ 0 leads to the inequality
$$\varepsilon_1 \ge -\frac{k_1^2 + k_2^2}{4\left(m_1 + \dfrac{m_2}{2}\right)g}. \tag{3.369}$$
The sum of the roots of equation (3.366) (of equation (3.367) too) is
$$S = -\frac{k_1 - k_2}{\varepsilon_1} = \frac{k_2 - k_1}{\varepsilon_1} \ne -\frac{k_1}{\varepsilon_1}, \tag{3.370}$$
so that x10 and x20 cannot be two distinct roots of the same equation; this means that the position of equilibrium (if it exists) is given by
$$x_{10} = x_{20} = \frac{k_2 - k_1 - \sqrt{\Delta}}{2\varepsilon_1} \quad\text{or}\quad x_{10} = x_{20} = \frac{k_2 - k_1 + \sqrt{\Delta}}{2\varepsilon_1}. \tag{3.371}$$
As x10 > 0, x20 > 0 (the springs are compressed), from x10 + x20 = −k1/ε1 it follows that ε1 < 0.
It follows, from the first equality (3.371) and from x10 + x20 = −k1/ε1, that
$$k_2 = \sqrt{\Delta}, \tag{3.372}$$
from which
$$k_1^2 + k_2^2 + 4\varepsilon_1\left(m_1 + \frac{m_2}{2}\right)g = k_2^2, \tag{3.373}$$
that is,
$$\varepsilon_1 = -\frac{k_1^2}{4\left(m_1 + \dfrac{m_2}{2}\right)g}, \tag{3.374}$$
which verifies inequality (3.369) and the condition ε1 < 0.
It follows that the position of equilibrium is
$$x_{10} = x_{20} = -\frac{k_1}{2\varepsilon_1} > 0, \tag{3.375}$$
ε1 being given by equation (3.374).
For the second equality (3.371), we obtain
$$k_2 = -\sqrt{\Delta}, \tag{3.376}$$
which is absurd.
Let us remark that equation (3.375) is a particular case of the first relation (3.361),
and hence at equilibrium x10 = x20.
1.3. Equations of motion
Using the schema in Figure 3.3, these equations are
$$m_1\ddot{x}_1 = k_2(x_3 + l\varphi - x_1) - k_1 x_1 - \varepsilon_1 x_1^2 + m_1 g, \tag{3.377}$$
$$m_1\ddot{x}_2 = k_2(x_3 - l\varphi - x_2) - k_1 x_2 - \varepsilon_1 x_2^2 + m_1 g, \tag{3.378}$$
$$m_2\ddot{x}_3 = k_2(-x_3 - l\varphi + x_1) + k_2(-x_3 + l\varphi + x_2) + m_2 g, \tag{3.379}$$
$$J\ddot{\varphi} = k_2 l(-x_3 - l\varphi + x_1) - k_2 l(-x_3 + l\varphi + x_2). \tag{3.380}$$
Denoting $x_1 = \xi_1$, $x_2 = \xi_2$, $x_3 = \xi_3$, $\varphi = \xi_4$, $\dot{x}_1 = \xi_5$, $\dot{x}_2 = \xi_6$, $\dot{x}_3 = \xi_7$, $\dot{\varphi} = \xi_8$,
$$a_{10} = -\frac{\varepsilon_1}{m_1},\quad a_{11} = -\frac{k_1 + k_2}{m_1},\quad a_{13} = \frac{k_2}{m_1},\quad a_{14} = \frac{k_2 l}{m_1}, \tag{3.381}$$
$$a_{20} = -\frac{\varepsilon_1}{m_1},\quad a_{22} = -\frac{k_1 + k_2}{m_1},\quad a_{23} = \frac{k_2}{m_1},\quad a_{24} = -\frac{k_2 l}{m_1}, \tag{3.382}$$
$$a_{31} = \frac{k_2}{m_2},\quad a_{32} = \frac{k_2}{m_2},\quad a_{33} = -\frac{2k_2}{m_2}, \tag{3.383}$$
$$a_{41} = \frac{k_2 l}{J},\quad a_{42} = -\frac{k_2 l}{J},\quad a_{44} = -\frac{2k_2 l^2}{J}, \tag{3.384}$$
we obtain the system
$$\dot{\xi}_1 = \xi_5,\quad \dot{\xi}_2 = \xi_6,\quad \dot{\xi}_3 = \xi_7,\quad \dot{\xi}_4 = \xi_8,\quad \dot{\xi}_5 = a_{10}\xi_1^2 + a_{11}\xi_1 + a_{13}\xi_3 + a_{14}\xi_4 + g,$$
$$\dot{\xi}_6 = a_{20}\xi_2^2 + a_{22}\xi_2 + a_{23}\xi_3 + a_{24}\xi_4 + g,\quad \dot{\xi}_7 = a_{31}\xi_1 + a_{32}\xi_2 + a_{33}\xi_3 + g,\quad \dot{\xi}_8 = a_{41}\xi_1 + a_{42}\xi_2 + a_{44}\xi_4. \tag{3.385}$$
1.4. Stability of the positions of equilibrium
Denoting by $f_k(\xi_1, \ldots, \xi_8)$, $k = \overline{1,8}$, the expressions on the right members of relations (3.385) and by $j_{kl} = \partial f_k/\partial\xi_l$, $k, l = \overline{1,8}$, their partial derivatives, the characteristic equation is
$$\begin{vmatrix} -\lambda & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & -\lambda & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & -\lambda & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -\lambda & 0 & 0 & 0 & 1 \\ j_{51} & 0 & j_{53} & j_{54} & -\lambda & 0 & 0 & 0 \\ 0 & j_{62} & j_{63} & j_{64} & 0 & -\lambda & 0 & 0 \\ j_{71} & j_{72} & j_{73} & 0 & 0 & 0 & -\lambda & 0 \\ j_{81} & j_{82} & 0 & j_{84} & 0 & 0 & 0 & -\lambda \end{vmatrix} = 0, \tag{3.386}$$
from which
$$\begin{vmatrix} j_{51} - \lambda^2 & 0 & j_{53} & j_{54} \\ 0 & j_{62} - \lambda^2 & j_{63} & j_{64} \\ j_{71} & j_{72} & j_{73} - \lambda^2 & 0 \\ j_{81} & j_{82} & 0 & j_{84} - \lambda^2 \end{vmatrix} = 0. \tag{3.387}$$
We obtain the algebraic equation of eighth degree in λ,
$$\lambda^8 + A\lambda^6 + B\lambda^4 + C\lambda^2 + D = 0, \tag{3.388}$$
where
$$A = -j_{51} - j_{62} - j_{73} - j_{84}, \tag{3.389}$$
$$B = j_{62}j_{73} + j_{62}j_{84} + j_{73}j_{84} - j_{64}j_{82} - j_{63}j_{72} + j_{51}j_{62} + j_{51}j_{73} + j_{51}j_{84} - j_{53}j_{71} - j_{54}j_{81}, \tag{3.390}$$
$$C = -j_{62}j_{73}j_{84} + j_{64}j_{73}j_{82} + j_{63}j_{72}j_{84} - j_{51}j_{62}j_{73} - j_{51}j_{62}j_{84} - j_{51}j_{73}j_{84} + j_{51}j_{64}j_{82} + j_{51}j_{63}j_{72} + j_{53}j_{62}j_{71} + j_{53}j_{71}j_{84} + j_{54}j_{62}j_{81} + j_{54}j_{73}j_{81}, \tag{3.391}$$
$$D = j_{51}j_{62}j_{73}j_{84} - j_{51}j_{64}j_{73}j_{82} - j_{51}j_{63}j_{72}j_{84} + j_{53}j_{64}j_{71}j_{82} - j_{53}j_{64}j_{72}j_{81} - j_{53}j_{62}j_{71}j_{84} - j_{54}j_{62}j_{73}j_{81} - j_{54}j_{63}j_{71}j_{82} + j_{54}j_{63}j_{72}j_{81}. \tag{3.392}$$
Equation (3.388), with the notation u = λ², may be written in the form
$$u^4 + Au^3 + Bu^2 + Cu + D = 0 \tag{3.393}$$
and, for a position of stable equilibrium, it is necessary and sufficient that all the roots of equation (3.393) be negative and distinct (see 1.10. Discussion).
The following situations may occur:
• The roots are distinct.
• There is a double root.
• There is a triple root.
• There is a root of an order of multiplicity equal to four.
• There are two double roots.
APPLICATIONS 101
1.5. Case of distinct roots
Making u → −u in equation (3.393), we obtain
$$u^4 - Au^3 + Bu^2 - Cu + D = 0 \tag{3.394}$$
and, from Descartes' theorem, we deduce the necessary condition for the existence of four negative roots,
$$A > 0, \quad B > 0, \quad C > 0, \quad D > 0. \tag{3.395}$$
We construct the Sturm sequence associated to the polynomial
$$f(u) = u^4 + Au^3 + Bu^2 + Cu + D. \tag{3.396}$$
We choose
$$f_0(u) = u^4 + Au^3 + Bu^2 + Cu + D, \tag{3.397}$$
$$f_1(u) = u^3 + \frac{3A}{4}u^2 + \frac{B}{2}u + \frac{C}{4}. \tag{3.398}$$
Dividing f0 by f1, we obtain the remainder
$$R_2 = \frac{8B - 3A^2}{16}u^2 + \frac{6C - AB}{8}u + \frac{16D - AC}{16}. \tag{3.399}$$
We find that it is necessary that 8B − 3A² ≠ 0; in the opposite case, R2 would have a degree at most equal to 1 (as would the polynomial f2 in the Sturm sequence) and there would result only four terms in the Sturm sequence (f0, f1, f2, and f3, the last term being a constant), so that in the sequence f0(−∞), f1(−∞), f2(−∞), f3(−∞) we would have at most three variations of sign. It would follow that equation (3.393) has at most three negative roots, which is not convenient. As a conclusion, it results in the necessary condition
$$8B - 3A^2 \ne 0. \tag{3.400}$$
Writing
$$R_2 = -\alpha_2 u^2 - \beta_2 u - \gamma_2, \tag{3.401}$$
we may choose the following term of Sturm's sequence in the form
$$f_2(u) = u^2 + \bar{\beta}_2 u + \bar{\gamma}_2, \tag{3.402}$$
where
$$\bar{\beta}_2 = \frac{\beta_2}{\alpha_2} = \frac{2(6C - AB)}{8B - 3A^2}, \qquad \bar{\gamma}_2 = \frac{\gamma_2}{\alpha_2} = \frac{16D - AC}{8B - 3A^2}. \tag{3.403}$$
Dividing f1 by f2, we obtain the remainder
$$R_3 = -\beta_3 u - \gamma_3, \tag{3.404}$$
where
$$\beta_3 = \bar{\gamma}_2 - \frac{B}{2} + \bar{\beta}_2\left(\frac{3}{4}A - \bar{\beta}_2\right), \qquad \gamma_3 = -\frac{C}{4} + \bar{\gamma}_2\left(\frac{3}{4}A - \bar{\beta}_2\right). \tag{3.405}$$
TABLE 3.3 Table of the Variations of Sign in the Sturm Sequence

  u      f0    f1    f2            f3            f4         WS
  −∞     +     −     +             −             sgn f4     3 or 4
  0      +     +     sgn γ̄2        sgn γ̄3        sgn f4     0, 1, 2, or 3
Similar considerations lead to the condition β3 ≠ 0, from which
$$\frac{16D - AC}{8B - 3A^2} - \frac{B}{2} + \frac{2(6C - AB)}{8B - 3A^2}\left[\frac{3}{4}A - \frac{2(6C - AB)}{8B - 3A^2}\right] \ne 0. \tag{3.406}$$
We choose
$$f_3(u) = u + \bar{\gamma}_3, \tag{3.407}$$
with
$$\bar{\gamma}_3 = \frac{\gamma_3}{\beta_3}. \tag{3.408}$$
Dividing f2 by f3 results in the remainder
$$R_4 = \bar{\gamma}_2 - \bar{\gamma}_3(\bar{\beta}_2 - \bar{\gamma}_3) \tag{3.409}$$
and the polynomial
$$f_4(u) = \bar{\gamma}_3(\bar{\beta}_2 - \bar{\gamma}_3) - \bar{\gamma}_2, \tag{3.410}$$
which must be nonzero (the roots are distinct!), from which we obtain the condition
$$\frac{-\dfrac{C}{4} + \bar{\gamma}_2\left(\dfrac{3}{4}A - \bar{\beta}_2\right)}{\bar{\gamma}_2 - \dfrac{B}{2} + \bar{\beta}_2\left(\dfrac{3}{4}A - \bar{\beta}_2\right)}\left[\bar{\beta}_2 - \frac{-\dfrac{C}{4} + \bar{\gamma}_2\left(\dfrac{3}{4}A - \bar{\beta}_2\right)}{\bar{\gamma}_2 - \dfrac{B}{2} + \bar{\beta}_2\left(\dfrac{3}{4}A - \bar{\beta}_2\right)}\right] - \bar{\gamma}_2 \ne 0. \tag{3.411}$$
We may construct Table 3.3.
The only possibility to have four negative distinct roots is that WS(−∞) = 4 and WS(0) = 0, from which result the conditions
$$f_4 > 0, \quad \bar{\gamma}_2 > 0, \quad \bar{\gamma}_3 > 0. \tag{3.412}$$
1.6. The case of a double root
If the polynomial f(u) given by (3.396) has a double root, say ū, then ū is also a root of the derivative f′(u), that is,
$$\bar{u}^4 + A\bar{u}^3 + B\bar{u}^2 + C\bar{u} + D = 0, \qquad 4\bar{u}^3 + 3A\bar{u}^2 + 2B\bar{u} + C = 0. \tag{3.413}$$
Relations (3.413), multiplied by 4 and −ū, respectively, and summed, lead to
$$A\bar{u}^3 + 2B\bar{u}^2 + 3C\bar{u} + 4D = 0. \tag{3.414}$$
We multiply the second relation (3.413) by A, relation (3.414) by −4 and make the sum, obtaining
$$(3A^2 - 8B)\bar{u}^2 + (2AB - 12C)\bar{u} + AC - 16D = 0. \tag{3.415}$$
Summing relation (3.414), multiplied by (8B − 3A²), with relation (3.415), multiplied by Aū, we get the relation
$$(4B^2 + A^2B - 3AC)\bar{u}^2 + (6BC - 2A^2C - 4AD)\bar{u} + D(3A^2 - 8B) = 0. \tag{3.416}$$
Multiplying expressions (3.415) and (3.416) by (4B² − A²B − 3AC) and (8B − 3A²), respectively, and summing the results thus obtained, we get
$$(8AB^3 - 2A^3B^2 - 28A^2BC + 36AC^2 - 32ABD - 6A^4C + 12A^3D)\bar{u} + 4AB^2C - A^3BC - 3A^2C^2 - 128B^2D + 64A^2BD + 48ACD - 9A^4D = 0 \tag{3.417}$$
and the condition
$$4AB^3 - A^3B^2 - 14A^2BC + 18AC^2 - 16ABD - 3A^4C + 6A^3D \ne 0. \tag{3.418}$$
We now construct Horner’s schema in Table 3.4.
The other roots result from
$$u^2 + (A + 2\bar{u})u + 3\bar{u}^2 + 2A\bar{u} + B = 0, \tag{3.419}$$
which must have two negative roots, distinct and different from ū, from which result the conditions
$$\Delta = (A + 2\bar{u})^2 - 4(3\bar{u}^2 + 2A\bar{u} + B) > 0, \qquad A + 2\bar{u} > 0, \qquad 3\bar{u}^2 + 2A\bar{u} + B > 0, \qquad \frac{-(A + 2\bar{u}) \pm \sqrt{\Delta}}{2} \ne \bar{u}. \tag{3.420}$$
Writing relation (3.417) in the form E1ū + E2 = 0, the notations being obvious, we also obtain the condition
$$\frac{E_2}{E_1} > 0. \tag{3.421}$$
1.7. Case of a triple root
Let us denote this root by ū; then, it must satisfy the conditions
$$\bar{u}^4 + A\bar{u}^3 + B\bar{u}^2 + C\bar{u} + D = 0, \qquad 4\bar{u}^3 + 3A\bar{u}^2 + 2B\bar{u} + C = 0, \qquad 6\bar{u}^2 + 3A\bar{u} + B = 0. \tag{3.422}$$
Multiplying the second relation (3.422) by 3, the third one by −2ū and summing, we obtain the equation
$$3A\bar{u}^2 + 4B\bar{u} + 3C = 0. \tag{3.423}$$
Summing now the last relation (3.422), multiplied by A, to relation (3.423), multiplied by −2, it follows that
$$(3A^2 - 8B)\bar{u} + AB - 6C = 0, \tag{3.424}$$
TABLE 3.4 Horner's Schema for a Double Root

        1    A         B                   C                          D
  ū     1    A + ū     ū² + Aū + B         ū³ + Aū² + Bū + C          0
  ū     1    A + 2ū    3ū² + 2Aū + B       0
from which
$$\bar{u} = \frac{6C - AB}{3A^2 - 8B} < 0, \qquad 3A^2 - 8B \ne 0. \tag{3.425}$$
We construct now Horner's schema in Table 3.5.
We obtain thus the last root
$$u^* = -A - 3\bar{u} < 0. \tag{3.426}$$
1.8. Case of the root of order of multiplicity equal to four
Let ū be this root. It will satisfy the relations
$$\bar{u}^4 + A\bar{u}^3 + B\bar{u}^2 + C\bar{u} + D = 0, \qquad 4\bar{u}^3 + 3A\bar{u}^2 + 2B\bar{u} + C = 0, \qquad 6\bar{u}^2 + 3A\bar{u} + B = 0, \qquad 4\bar{u} + A = 0, \tag{3.427}$$
from which
$$\bar{u} = -\frac{A}{4} < 0; \tag{3.428}$$
it follows that
$$\left(u + \frac{A}{4}\right)^4 = u^4 + Au^3 + Bu^2 + Cu + D, \tag{3.429}$$
from which
$$B = \frac{3A^2}{8}, \qquad C = \frac{A^3}{16}, \qquad D = \frac{A^4}{256}. \tag{3.430}$$
1.9. Case of two double roots
Let $\bar{u} < 0$ and $\bar{\bar{u}} < 0$ be the two double roots. We may write
$$u^4 + Au^3 + Bu^2 + Cu + D = (u - \bar{u})^2(u - \bar{\bar{u}})^2, \tag{3.431}$$
from which
$$A = -2(\bar{u} + \bar{\bar{u}}), \quad B = (\bar{u} + \bar{\bar{u}})^2 + 2\bar{u}\bar{\bar{u}}, \quad C = -2\bar{u}\bar{\bar{u}}(\bar{u} + \bar{\bar{u}}), \quad D = (\bar{u}\bar{\bar{u}})^2, \qquad A > 0,\ B > 0,\ C > 0,\ D > 0. \tag{3.432}$$
It follows that $\bar{u}$ and $\bar{\bar{u}}$ are solutions of the equation
$$z^2 + \frac{A}{2}z + \sqrt{D} = 0, \tag{3.433}$$
that is,
$$z_{1,2} = -\frac{A}{4} \pm \frac{1}{4}\sqrt{A^2 - 16\sqrt{D}}, \tag{3.434}$$
TABLE 3.5 Horner's Schema for a Triple Root

        1    A         B                   C                          D
  ū     1    A + ū     ū² + Aū + B         ū³ + Aū² + Bū + C          0
  ū     1    A + 2ū    3ū² + 2Aū + B       0
  ū     1    A + 3ū    0
obtaining thus a new condition
$$\frac{A^4}{256} > D. \tag{3.435}$$
Denoting by
$$\bar{u} = -\frac{A}{4} + \frac{1}{4}\sqrt{A^2 - 16\sqrt{D}}, \qquad \bar{\bar{u}} = -\frac{A}{4} - \frac{1}{4}\sqrt{A^2 - 16\sqrt{D}}, \tag{3.436}$$
it follows that
$$\bar{u} + \bar{\bar{u}} = -\frac{A}{2}, \quad \bar{u}\bar{\bar{u}} = \sqrt{D}, \quad (\bar{u} + \bar{\bar{u}})^2 + 2\bar{u}\bar{\bar{u}} = \frac{A^2}{4} + 2\sqrt{D} = B, \quad -2\bar{u}\bar{\bar{u}}(\bar{u} + \bar{\bar{u}}) = A\sqrt{D} = C. \tag{3.437}$$
1.10. Discussion
Let u = α + iβ, α ≠ 0, β ≠ 0, be a root of equation (3.393), which will be written in the trigonometric form
$$u = |u|(\cos\theta + \mathrm{i}\sin\theta), \tag{3.438}$$
from which
$$\lambda = u^{1/2} = \sqrt{|u|}\left(\cos\frac{\theta}{2} + \mathrm{i}\sin\frac{\theta}{2}\right) \quad\text{or}\quad \lambda = u^{1/2} = \sqrt{|u|}\left[\cos\left(\frac{\theta}{2} + \pi\right) + \mathrm{i}\sin\left(\frac{\theta}{2} + \pi\right)\right]. \tag{3.439}$$
Let us remark that, irrespective of the value of θ, we get either cos(θ/2) > 0 or cos(θ/2 + π) > 0; hence equation (3.388) will have at least one root with a positive real part, that is, the position of equilibrium is unstable. Let us suppose now that a root of equation (3.393) is of the form
$$u = \mathrm{i}\beta, \qquad \beta \ne 0, \tag{3.440}$$
that is,
$$u = |\beta|\left(\cos\frac{\pi}{2} + \mathrm{i}\sin\frac{\pi}{2}\right) \quad\text{or}\quad u = |\beta|\left(\cos\frac{3\pi}{2} + \mathrm{i}\sin\frac{3\pi}{2}\right). \tag{3.441}$$
We deduce
$$\lambda = u^{1/2} = \sqrt{|\beta|}\left(\cos\frac{\pi}{4} + \mathrm{i}\sin\frac{\pi}{4}\right) \quad\text{or}\quad \lambda = u^{1/2} = \sqrt{|\beta|}\left(\cos\frac{3\pi}{4} + \mathrm{i}\sin\frac{3\pi}{4}\right) \quad\text{or}\quad \lambda = u^{1/2} = \sqrt{|\beta|}\left(\cos\frac{5\pi}{4} + \mathrm{i}\sin\frac{5\pi}{4}\right), \tag{3.442}$$
hence at least one root of the characteristic equation (3.388) has a positive real part, so that the equilibrium is unstable.
The case α = 0, β = 0 leads to the root u = 0, from which it follows that λ = 0 is a double root of the characteristic equation (3.388). The linear approximation of the motion around the position of equilibrium will then contain a term of the form Kt, where K is a constant; hence the equilibrium is also unstable.
Thus, the only possibility of stability of the equilibrium is that all the roots of equation (3.393) are negative.
[Figure 3.5 The first branch of stability described by ξ11 for ε1 < 0.]
If such a root u < 0 is double, then for the characteristic equation we obtain the double roots λ1 = i√|u|, λ2 = −i√|u|. Each such root leads, in the linear approximation of the motion around the position of equilibrium, to secular terms of the form Kt sin(√|u| t + φ); the equilibrium is unstable too.
Hence, it follows that the equilibrium is stable (in fact, simply stable) if and only if the four roots of equation (3.393) are negative and distinct.
2. Numerical computation
We obtain successively the values
$$a_{11} = -2400,\ a_{13} = 400,\ a_{14} = 53.333,\ a_{22} = -2400,\ a_{23} = 400,\ a_{24} = -53.333,$$
$$a_{31} = 26.667,\ a_{32} = 26.667,\ a_{33} = -53.333,\ a_{41} = 40,\ a_{42} = -40,\ a_{44} = -160, \tag{3.443}$$
$$a_{10} = -\frac{\varepsilon_1}{50}, \qquad a_{20} = -\frac{\varepsilon_1}{50}, \tag{3.444}$$
$$j_{51} = -\frac{\varepsilon_1\xi_1}{25} - 2400, \quad j_{53} = 400, \quad j_{54} = 53.333, \quad j_{62} = -\frac{\varepsilon_1\xi_2}{25} - 2400 = -\frac{\varepsilon_1\xi_1}{25} - 2400,$$
$$j_{63} = 400, \quad j_{64} = -53.333, \quad j_{71} = 26.667, \quad j_{72} = 26.667, \quad j_{73} = -53.333, \quad j_{81} = 40, \quad j_{82} = -40, \quad j_{84} = -160. \tag{3.445}$$
The stability diagrams are plotted in Figure 3.5, Figure 3.6, and Figure 3.7.
We have to consider two branches for ε1 < 0. The first branch is given by
$$\xi_{11} = \frac{k_2 - k_1 + \sqrt{k_1^2 + k_2^2 + 4\varepsilon_1\left(m_1 + \dfrac{m_2}{2}\right)g}}{2\varepsilon_1} \tag{3.446}$$
[Figure 3.6 (a) The second branch of stability described by ξ12 for ε1 < 0 and (b) detail of this branch.]
and the second one by
$$\xi_{12} = \frac{k_2 - k_1 - \sqrt{k_1^2 + k_2^2 + 4\varepsilon_1\left(m_1 + \dfrac{m_2}{2}\right)g}}{2\varepsilon_1}. \tag{3.447}$$
They exist only if the expression under the radical is positive. The two branches start from the
same point for which the expression under the radical vanishes.
The first branch may lead, for values of ε1 sufficiently close to zero, to negative roots ξ11, a
fact which is not in concordance with the hypothesis that all the springs are compressed. The
branch contains simply stable positions of equilibrium and is presented in Figure 3.5.
The second branch leads to solutions valid for any ε1 < 0. Moreover, these solutions define
simply stable positions of equilibrium. For ε1 → 0, we obtain ξ12 → ∞. This branch is presented
in Figure 3.6.
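As a numerical illustration (not part of the original text), the two branches (3.446) and (3.447) can be evaluated for the data of the problem; the sketch below assumes NumPy and uses our own names. The radical must be nonnegative for a branch to exist, and both branches start from the value of ε1 at which it vanishes.

```python
import numpy as np

m1, m2, g = 50.0, 750.0, 9.8065
k1, k2 = 1.0e5, 2.0e4

def branches(eps1):
    """Return (xi11, xi12) from (3.446)-(3.447); NaN where the radical is negative."""
    rad = k1**2 + k2**2 + 4.0 * eps1 * (m1 + m2 / 2.0) * g
    s = np.sqrt(np.where(rad >= 0.0, rad, np.nan))
    return (k2 - k1 + s) / (2.0 * eps1), (k2 - k1 - s) / (2.0 * eps1)

eps = np.linspace(-6.0e5, -2.0e5, 5)   # range of Figure 3.5
print(branches(eps))
```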
[Figure 3.7 (a) Branch of stability described by ξ1 for ε1 > 0 and (b) detail of this branch.]
If ε1 > 0, then we have to consider only one branch, described by
$$\xi_1 = \frac{k_2 - k_1 + \sqrt{k_1^2 + k_2^2 + 4\varepsilon_1\left(m_1 + \dfrac{m_2}{2}\right)g}}{2\varepsilon_1}. \tag{3.448}$$
This branch also leads to ξ1 → ∞ for ε1 → 0. It is presented in Figure 3.7.
If ε1 = 0, then we obtain the linear case described by
$$\xi_1 = \frac{\left(m_1 + \dfrac{m_2}{2}\right)g}{k_1}, \tag{3.449}$$
which is a simply stable position of equilibrium.
Obviously, the stability diagram in the general case is much more complicated and to draw it
we must take into consideration all the possibilities of compression or expansion of the springs.
Moreover, because the function that describes the elastic force in the nonlinear springs is not
an odd function, the situations to be studied cannot be obtained one from the other by simple
changes of sign. The diagrams that are presented are only parts of the stability diagram of the
mechanical system considered.
FURTHER READING
Acton FS (1990). Numerical Methods that Work. 4th ed. Washington: Mathematical Association of America.
Ackleh AS, Allen EJ, Kearfott RB, Seshaiyer P (2009). Classical and Modern Numerical Analysis: Theory, Methods and Practice. Boca Raton: CRC Press.
Atkinson KE (1989). An Introduction to Numerical Analysis. 2nd ed. New York: John Wiley & Sons, Inc.
Atkinson KE (2003). Elementary Numerical Analysis. 2nd ed. Hoboken: John Wiley & Sons, Inc.
Bakhvalov N (1976). Méthodes Numériques. Moscow: Éditions Mir (in French).
Berbente C, Mitran S, Zancu S (1997). Metode Numerice. București: Editura Tehnică (in Romanian).
Burden RL, Faires L (2009). Numerical Analysis. 9th ed. Boston: Brooks/Cole.
Butt R (2009). Introduction to Numerical Analysis Using MATLAB. Boston: Jones and Bartlett Publishers.
Chapra SC (1996). Applied Numerical Methods with MATLAB for Engineers and Scientists. Boston: McGraw-Hill.
Cheney EW, Kincaid DR (1997). Numerical Mathematics and Computing. 6th ed. Belmont: Thomson.
Cira O, Măruster Ș (2008). Metode Numerice pentru Ecuații Neliniare. București: Editura Matrix Rom (in Romanian).
Dahlquist G, Björck Å (1974). Numerical Methods. Englewood Cliffs: Prentice-Hall, Inc.
Démidovitch B, Maron I (1973). Éléments de Calcul Numérique. Moscow: Éditions Mir (in French).
DiBenedetto E (2010). Classical Mechanics: Theory and Mathematical Modeling. New York: Springer-Verlag.
Epperson JF (2007). An Introduction to Numerical Methods and Analysis. Hoboken: John Wiley & Sons, Inc.
Fung YC, Tong P (2011). Classical and Computational Solid Mechanics. Singapore: World Scientific Publishing.
Gautschi W (1997). Numerical Analysis: An Introduction. Boston: Birkhäuser.
Godunov SK, Reabenki VS (1977). Scheme de Calcul cu Diferențe Finite. București: Editura Tehnică (in Romanian).
Greenbaum A, Chartier TP (2012). Numerical Methods: Design, Analysis, and Computer Implementation of Algorithms. Princeton: Princeton University Press.
Hamming RW (1987). Numerical Methods for Scientists and Engineers. 2nd ed. New York: Dover Publications.
Hamming RW (2012). Introduction to Applied Numerical Analysis. New York: Dover Publications.
Heinbockel JH (2006). Numerical Methods for Scientific Computing. Victoria: Trafford Publishing.
Hildebrand FB (1987). Introduction to Numerical Analysis. 2nd ed. New York: Dover Publications.
Hoffman JD (1992). Numerical Methods for Engineers and Scientists. New York: McGraw-Hill.
Jazar RN (2008). Vehicle Dynamics: Theory and Applications. New York: Springer-Verlag.
Kharab A, Guenther RB (2011). An Introduction to Numerical Methods: A MATLAB Approach. 3rd ed. Boca Raton: CRC Press.
Kleppner D, Kolenkow RJ (2010). An Introduction to Mechanics. Cambridge: Cambridge University Press.
Kress R (1996). Numerical Analysis. New York: Springer-Verlag.
Krîlov AN (1957). Lecții de Calcule prin Aproximații. București: Editura Tehnică (in Romanian).
Kunz KS (1957). Numerical Analysis. New York: McGraw-Hill.
Lange K (2010). Numerical Analysis for Statisticians. 2nd ed. New York: Springer-Verlag.
Lurie AI (2002). Analytical Mechanics. New York: Springer-Verlag.
Marinescu G (1974). Analiză Numerică. București: Editura Academiei Române (in Romanian).
Meriam JL, Kraige LG (2012). Engineering Mechanics: Dynamics. Hoboken: John Wiley & Sons, Inc.
Otto SR, Denier JP (2005). An Introduction to Programming and Numerical Methods in MATLAB. London: Springer-Verlag.
Pandrea N (2000). Elemente de Mecanica Solidului în Coordonate Plückeriene. București: Editura Academiei Române (in Romanian).
Pandrea N, Pârlac S, Popa D (2001). Modele pentru Studiul Vibrațiilor Automobilelor. Pitești: Tiparg (in Romanian).
Pandrea N, Stănescu ND (2002). Mecanică. București: Editura Didactică și Pedagogică (in Romanian).
Popovici P, Cira O (1992). Rezolvarea Numerică a Ecuațiilor Neliniare. Timișoara: Editura Signata (in Romanian).
Postolache M (2006). Modelare Numerică. Teorie și Aplicații. București: Editura Fair Partners (in Romanian).
Press WH, Teukolsky SA, Vetterling WT, Flannery BP (2007). Numerical Recipes: The Art of Scientific Computing. 3rd ed. Cambridge: Cambridge University Press.
Quarteroni A, Sacco R, Saleri F (2010). Numerical Mathematics. 2nd ed. Berlin: Springer-Verlag.
Ralston A, Rabinowitz P (2001). A First Course in Numerical Analysis. 2nd ed. New York: Dover Publications.
Ridgway Scott L (2011). Numerical Analysis. Princeton: Princeton University Press.
Salvadori MG, Baron ML (1962). Numerical Methods in Engineering. Englewood Cliffs: Prentice-Hall, Inc.
Sauer T (2011). Numerical Analysis. 2nd ed. London: Pearson.
Simionescu I, Dranga M, Moise V (1995). Metode Numerice în Tehnică. Aplicații în FORTRAN. București: Editura Tehnică (in Romanian).
Stănescu ND (2007). Metode Numerice. București: Editura Didactică și Pedagogică (in Romanian).
Stănescu ND (2011). Mechanical Systems with neo-Hookean Elements: Stability and Behavior. Saarbrücken: LAP.
Stoer J, Bulirsch R (2010). Introduction to Numerical Analysis. 3rd ed. New York: Springer-Verlag.
Süli E, Mayers D (2003). An Introduction to Numerical Analysis. Cambridge: Cambridge University Press.
Teodorescu PP (2010). Mechanical Systems: Classical Models. Volume 1: Particle Mechanics. Dordrecht: Springer-Verlag.
Udriște C, Iftode V, Postolache M (1996). Metode Numerice de Calcul. Algoritmi și Programe Turbo Pascal. București: Editura Tehnică (in Romanian).
4
LINEAR ALGEBRA
4.1 CALCULATION OF DETERMINANTS
4.1.1 Use of Definition
Let A be a square matrix of order n, A ∈ Mn(ℝ), the elements of which are a_ij, i, j = 1, n; hence
$$A = [a_{ij}]_{\substack{i=\overline{1,n}\\ j=\overline{1,n}}}. \tag{4.1}$$
The determinant of the matrix A, denoted by det A, is given by
$$\det A = \sum_{\sigma\in\mathcal{S}_n} \operatorname{sgn}\sigma \prod_{i=1}^{n} a_{i\sigma(i)}, \tag{4.2}$$
where σ is a permutation of the set {1, 2, . . . , n}, $\mathcal{S}_n$ is the set of all these permutations, while sgn σ is the signature of the permutation σ, having the value 1 if σ is an even permutation and the value −1 if σ is an odd permutation.
Observation 4.1 In the calculation of the determinant det A, there appear n! terms in formula
(4.2).
Observation 4.2 As n! is a quickly increasing function with respect to n, it follows that the
number of terms that must be calculated becomes very large, even for small values of n. For each
generated permutation, one must calculate its signature too. It follows that the calculation time
increases considerably, even for small values of n. For instance, 7! = 5040, so that a determinant
of seventh order will generate 5040 permutations, and it is necessary to determine the signature of
every one of them.
Observation 4.3 Formula (4.2) must be applied even in the case in which the value of the
determinant can be exactly obtained by other methods.
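As an illustration of formula (4.2) and of the cost warning in Observation 4.2, here is a direct (and deliberately inefficient) sketch in Python; it is our own, not the book's. itertools.permutations generates the n! permutations, and the signature is obtained by counting inversions.

```python
from itertools import permutations

def sgn(sigma):
    """Signature of a permutation given as a tuple of 0-based indices."""
    inv = sum(1 for i in range(len(sigma)) for j in range(i + 1, len(sigma))
              if sigma[i] > sigma[j])
    return -1 if inv % 2 else 1

def det_by_definition(a):
    """Formula (4.2): sum over all permutations; O(n! * n), usable only for tiny n."""
    n = len(a)
    total = 0.0
    for sigma in permutations(range(n)):
        prod = sgn(sigma)
        for i in range(n):
            prod *= a[i][sigma[i]]
        total += prod
    return total

print(det_by_definition([[1.0, 2.0], [3.0, 4.0]]))   # -2.0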
4.1.2 Use of Equivalent Matrices
This method starts from the following properties of the determinants:
• if two rows (columns) of a square matrix A commute into each other, then a new matrix A′ is obtained, for which det A′ = −det A;
• if a row (column) of a square matrix A is multiplied by λ, then a new matrix A′ is obtained, for which det A′ = λ det A;
• if a row (column) of a square matrix A is multiplied by λ and this is added to another row (column) of A, then a new matrix A′ is obtained, for which det A′ = det A.
The idea of the method consists in applying such transformations to the matrix A, in order to obtain a new matrix A′ of a particular form, for which det A′ is easier to calculate (directly). We must take into account the factors λ1, . . . , λm that may occur because of the transformations made, so that
$$\det A = \prod_{i=1}^{m}\lambda_i \det A'. \tag{4.3}$$
Observation 4.4 Let us consider a permutation σ of the set {1, 2, . . . , n} and let us suppose that σ is different from the identical one. Let us write this permutation in the form
$$\sigma = \begin{pmatrix} 1 & 2 & \cdots & i & \cdots & n \\ \sigma(1) & \sigma(2) & \cdots & \sigma(i) & \cdots & \sigma(n) \end{pmatrix}. \tag{4.4}$$
Then there exists an index i ∈ {1, 2, . . . , n} so that σ(i) = j < i.
Demonstration. Let us suppose that this affirmation is not true. Then, for any i ∈ {1, 2, . . . , n} we have σ(i) = j ≥ i. First, we take i = n. We deduce σ(n) = j ≥ n, hence σ(n) = n. Let us suppose now that i = n − 1. It follows that σ(n − 1) = j ≥ n − 1, from which σ(n − 1) = n − 1 or σ(n − 1) = n. But σ(n) = n, hence σ(n − 1) ≠ n. We obtain σ(n − 1) = n − 1. Proceeding analogically for i = n − 2, i = n − 3, . . . , i = 1, it follows that σ(i) = i for any i, 1 ≤ i ≤ n. But, by definition, σ is different from the identical permutation. Hence, there is a contradiction, so that the supposition made is false and the observation is proved.
The previous observation shows that any term in formula (4.2), excepting the one obtained by using the identical permutation, will contain a factor a_ij with i > j, that is, an element situated under the principal diagonal of the matrix A. It follows that for a matrix A of the form
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1,n-1} & a_{1n} \\ 0 & a_{22} & \cdots & a_{2,n-1} & a_{2n} \\ \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & \cdots & a_{n-1,n-1} & a_{n-1,n} \\ 0 & 0 & \cdots & 0 & a_{nn} \end{bmatrix}, \tag{4.5}$$
the determinant is easy to calculate; it is given by
$$\det A = \prod_{i=1}^{n} a_{ii}. \tag{4.6}$$
By this method, we try to obtain a matrix A′ of the form (4.5) so as to have
$$\det A = \pm\det A', \tag{4.7}$$
where we take the sign + in case of an even number of row permutations or the sign − in case of an odd number of such permutations.
Observation 4.5 Let us suppose that at a certain transformation step we obtain a_ii = 0 for a certain i, and that for any j, i < j ≤ n, we have a_ji = 0. In this case, det A = 0 and it is no longer necessary to reach the form (4.5).
Observation 4.6 The procedure presented above may be modified to obtain a matrix A′ of the form
$$A' = \begin{bmatrix} 0 & 0 & \cdots & 0 & a_{1n} \\ 0 & 0 & \cdots & a_{2,n-1} & a_{2n} \\ \cdots & \cdots & \cdots & \cdots & \cdots \\ 0 & a_{n-1,2} & \cdots & a_{n-1,n-1} & a_{n-1,n} \\ a_{n1} & a_{n2} & \cdots & a_{n,n-1} & a_{nn} \end{bmatrix}, \tag{4.8}$$
for which
$$\det A' = (-1)^{n(n-1)/2}\prod_{i=1}^{n} a_{i,n+1-i}. \tag{4.9}$$
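The method of equivalent matrices can be sketched as follows; this is our own plain-Python illustration, not the book's code. A is reduced to the upper triangular form (4.5) using only row swaps (each changing the sign of the determinant, first property) and row additions (third property), and then formula (4.6) is applied.

```python
def det_by_triangularization(a):
    """Reduce to the form (4.5) by row swaps and row additions, then use (4.6)."""
    a = [row[:] for row in a]               # work on a copy
    n, sign = len(a), 1.0
    for i in range(n):
        # bring a nonzero pivot to position (i, i); a swap flips the sign
        p = next((r for r in range(i, n) if a[r][i] != 0.0), None)
        if p is None:
            return 0.0                       # cf. Observation 4.5
        if p != i:
            a[i], a[p] = a[p], a[i]
            sign = -sign
        for r in range(i + 1, n):            # annihilate the entries under the pivot
            f = a[r][i] / a[i][i]
            for c in range(i, n):
                a[r][c] -= f * a[i][c]
    prod = sign
    for i in range(n):
        prod *= a[i][i]
    return prod

print(det_by_triangularization([[2.0, 1.0], [4.0, 5.0]]))   # 6.0
```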
4.2 CALCULATION OF THE RANK
Let A be a matrix with m rows and n columns and real elements, A ∈ Mm,n(ℝ). Let us suppose that m ≤ n.
By definition, the rank of the matrix A is the order of the greatest nonzero minor; but, to obtain this rank directly, we must consider a great number of determinants.
Observation 4.7 We have
$$\operatorname{rank} A \le \min\{m, n\} \tag{4.10}$$
for a matrix A ∈ Mm,n(ℝ).
To calculate this rank we use the following properties:
• The rank of the matrix A is equal to the rank of its transpose Aᵀ.
• The rank of the matrix A is not modified by multiplying a row (column) by a nonzero number.
• The rank of the matrix A does not change by commuting two of its rows (columns) into each other.
• The rank of the matrix A is not modified by multiplying one of its rows (columns) by λ and by adding the result to another row (column) of A.
The idea of the method consists in obtaining a matrix A′ of the same rank as A, but of the particular form (m ≤ n)
$$A' = \begin{bmatrix} a_{11} & 0 & \cdots & 0 & 0 & \cdots & 0 & 0 \\ 0 & a_{22} & \cdots & 0 & 0 & \cdots & 0 & 0 \\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\ 0 & 0 & \cdots & a_{m-1,m-1} & 0 & \cdots & 0 & 0 \\ 0 & 0 & \cdots & 0 & a_{mm} & \cdots & 0 & 0 \end{bmatrix}, \tag{4.11}$$
where the greatest nonzero minor is obtained by selecting those rows and columns for which a_ii ≠ 0, 1 ≤ i ≤ m.
Hence, it follows that the rank of the matrix A is equal to the number of nonzero elements on the principal pseudodiagonal of the matrix A′ in formula (4.11).
Observation 4.8 We need to continue the calculation until we obtain the form (4.11). If we try to obtain only a superior triangular matrix, then we may obtain an incorrect result.
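A sketch of the rank computation follows (our own, using the four properties above): Gaussian elimination with explicit handling of zero columns; the rank is the number of nonzero pivots, which corresponds to the nonzero elements of the pseudodiagonal in (4.11).

```python
def rank(a, tol=1e-12):
    """Count the nonzero pivots obtained by row reduction (cf. form (4.11))."""
    a = [row[:] for row in a]
    m, n = len(a), len(a[0])
    r, col = 0, 0
    while r < m and col < n:
        p = max(range(r, m), key=lambda i: abs(a[i][col]))   # partial pivoting
        if abs(a[p][col]) <= tol:
            col += 1                                          # zero column: skip it
            continue
        a[r], a[p] = a[p], a[r]
        for i in range(r + 1, m):
            f = a[i][col] / a[r][col]
            for j in range(col, n):
                a[i][j] -= f * a[r][j]
        r, col = r + 1, col + 1
    return r

print(rank([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 0.0, 1.0]]))   # 2
```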
4.3 NORM OF A MATRIX
Definition 4.1 Let A and B be two matrices with m rows and n columns and real elements. We say that A ≤ B if for any i and j, 1 ≤ i ≤ m, 1 ≤ j ≤ n, we have
$$a_{ij} \le b_{ij}. \tag{4.12}$$
Definition 4.2 Let A ∈ Mm,n(ℝ). We define the modulus of the matrix A = [a_ij] by the relation
$$|A| = [|a_{ij}|]_{\substack{i=\overline{1,m}\\ j=\overline{1,n}}}. \tag{4.13}$$
Proposition 4.1 The application of modulus has the following properties:
(i) If A and B are two matrices with m rows and n columns and real elements, then
$$|A + B| \le |A| + |B|. \tag{4.14}$$
(ii) If A ∈ Mm,n(ℝ) and B ∈ Mn,p(ℝ), then
$$|AB| \le |A| \cdot |B|. \tag{4.15}$$
(iii) If A ∈ Mm,n(ℝ) and α ∈ ℝ, then
$$|\alpha A| = |\alpha| \cdot |A|. \tag{4.16}$$
Demonstration.
(i) Let A = [a_ij], B = [b_ij], and let us denote by C = [c_ij] the matrix sum C = A + B. An element of this matrix is given by
$$c_{ij} = a_{ij} + b_{ij}, \tag{4.17}$$
hence, by applying the modulus, we obtain
$$|c_{ij}| = |a_{ij} + b_{ij}| \le |a_{ij}| + |b_{ij}|. \tag{4.18}$$
Because i and j have been chosen arbitrarily, 1 ≤ i ≤ m, 1 ≤ j ≤ n, we have the result (4.14).
(ii) Let us denote by C = [c_ij] the matrix product C = A · B. An element of this matrix is given by
$$c_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj}. \tag{4.19}$$
Analogically, we denote by D = [d_ij] the matrix D = |A| · |B|, an element of which is given by
$$d_{ij} = \sum_{k=1}^{n} |a_{ik}||b_{kj}|. \tag{4.20}$$
Comparing relations (4.19) and (4.20) and taking into account that i and j are arbitrary, 1 ≤ i ≤ m, 1 ≤ j ≤ p, we obtain the relation (4.15).
(iii) Let B = [b_ij] be the matrix B = |αA|, an element of which is
$$b_{ij} = |\alpha a_{ij}| = |\alpha||a_{ij}|. \tag{4.21}$$
We obtain the relation (4.16) immediately.
Corollary 4.1 Let A ∈ Mn(ℝ) be a square matrix of order n with real elements. Then, for arbitrary p ∈ ℕ, one has the relation
$$|A^p| \le |A|^p. \tag{4.22}$$
Demonstration. If p = 0, then relation (4.22) is obvious because
$$|A^0| = I_n \quad\text{and}\quad |A|^0 = I_n. \tag{4.23}$$
Let us suppose that the relation is true for p and let us prove it for p + 1. We have
$$|A^{p+1}| = |A^p \cdot A|. \tag{4.24}$$
Applying the property (ii) of Proposition 4.1, we get
$$|A^p \cdot A| \le |A^p| \cdot |A| \tag{4.25}$$
and relation (4.24) becomes
$$|A^{p+1}| = |A^p \cdot A| \le |A^p| \cdot |A| \le |A|^p \cdot |A| = |A|^{p+1}. \tag{4.26}$$
The principle of mathematical induction shows thus that relation (4.22) is true for any p ∈ ℕ and the corollary is proved.
Definition 4.3 Let A ∈ Mm,n(ℝ). A real number satisfying the following properties is called the norm of the matrix A and is denoted by ‖A‖:
(i) ‖A‖ ≥ 0 and ‖A‖ = 0 if and only if A = 0m,n;
(ii) ‖αA‖ = |α| ‖A‖ for any α ∈ ℝ;
(iii) ‖A + B‖ ≤ ‖A‖ + ‖B‖ for any matrix B ∈ Mm,n(ℝ);
(iv) ‖AB‖ ≤ ‖A‖ · ‖B‖ for any matrix B ∈ Mn,p(ℝ).
Observation 4.9
(i) If we put α = −1 in the property (ii) of Definition 4.3, it follows that
$$\|-A\| = \|A\| \quad\text{for any } A \in M_{m,n}(\mathbb{R}). \tag{4.27}$$
(ii) From
$$\|A\| = \|B + A - B\| \le \|B\| + \|A - B\| \tag{4.28}$$
it follows that
$$\|A\| - \|B\| \le \|A - B\| \tag{4.29}$$
and, taking into account equation (4.27), we get
$$\|B\| - \|A\| \le \|A - B\| \tag{4.30}$$
too. The last two relations lead to
$$\|A - B\| \ge \big|\,\|A\| - \|B\|\,\big|. \tag{4.31}$$
Definition 4.4 A norm of a matrix is called canonical if, in addition to the four properties of Definition 4.3, it also fulfills the following conditions:
(i) for any matrix A ∈ Mm,n(ℝ), A = [a_ij], we have |a_ij| ≤ ‖A‖ for any i = 1, m, j = 1, n;
(ii) for any A, B ∈ Mm,n(ℝ) with |A| ≤ |B|, we have ‖A‖ ≤ ‖B‖.
Proposition 4.2 Let A ∈ Mm,n(ℝ) be a matrix with m rows and n columns and real elements. We define
$$\|A\|_\infty = \max_{1\le i\le m}\sum_{j=1}^{n}|a_{ij}|, \tag{4.32}$$
$$\|A\|_1 = \max_{1\le j\le n}\sum_{i=1}^{m}|a_{ij}|, \tag{4.33}$$
$$\|A\|_k = \sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n} a_{ij}^2}. \tag{4.34}$$
Under these conditions, ‖·‖∞, ‖·‖1, and ‖·‖k are canonical norms.
Demonstration. One must verify six properties:
(i) It is obvious that ‖A‖ ≥ 0 and ‖A‖ = 0 if and only if A = 0m,n for any of the three norms.
(ii) The relation ‖αA‖ = |α| ‖A‖ is immediate because the modulus of a product is equal to the product of the moduli.
(iii) Let B ∈ Mm,n(ℝ) be arbitrary. We may write successively
$$\|A + B\|_\infty = \max_{1\le i\le m}\sum_{j=1}^{n}|a_{ij} + b_{ij}| \le \max_{1\le i\le m}\left(\sum_{j=1}^{n}|a_{ij}| + \sum_{j=1}^{n}|b_{ij}|\right) \le \max_{1\le i\le m}\sum_{j=1}^{n}|a_{ij}| + \max_{1\le i\le m}\sum_{j=1}^{n}|b_{ij}| = \|A\|_\infty + \|B\|_\infty, \tag{4.35}$$
$$\|A + B\|_1 = \max_{1\le j\le n}\sum_{i=1}^{m}|a_{ij} + b_{ij}| \le \max_{1\le j\le n}\left(\sum_{i=1}^{m}|a_{ij}| + \sum_{i=1}^{m}|b_{ij}|\right) \le \max_{1\le j\le n}\sum_{i=1}^{m}|a_{ij}| + \max_{1\le j\le n}\sum_{i=1}^{m}|b_{ij}| = \|A\|_1 + \|B\|_1, \tag{4.36}$$
$$\|A + B\|_k^2 = \sum_{i=1}^{m}\sum_{j=1}^{n}(a_{ij} + b_{ij})^2 = \sum_{i=1}^{m}\sum_{j=1}^{n}|a_{ij} + b_{ij}|^2 \le \sum_{i=1}^{m}\sum_{j=1}^{n}(|a_{ij}| + |b_{ij}|)^2 = \sum_{i=1}^{m}\sum_{j=1}^{n}|a_{ij}|^2 + \sum_{i=1}^{m}\sum_{j=1}^{n}|b_{ij}|^2 + 2\sum_{i=1}^{m}\sum_{j=1}^{n}|a_{ij}||b_{ij}|. \tag{4.37}$$
The Cauchy–Buniakowski–Schwarz inequality states that for any real numbers x_i, y_i, i = 1, r, one has
$$\left(\sum_{i=1}^{r} x_iy_i\right)^2 \le \sum_{i=1}^{r}|x_i|^2 \sum_{i=1}^{r}|y_i|^2, \tag{4.38}$$
hence
$$\sum_{i=1}^{m}\sum_{j=1}^{n}|a_{ij}||b_{ij}| \le \sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n}|a_{ij}|^2}\sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n}|b_{ij}|^2}. \tag{4.39}$$
From equations (4.37) and (4.39) we obtain
$$\|A + B\|_k \le \sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n}|a_{ij}|^2 + \sum_{i=1}^{m}\sum_{j=1}^{n}|b_{ij}|^2 + 2\sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n}|a_{ij}|^2}\sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n}|b_{ij}|^2}} = \sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n}|a_{ij}|^2} + \sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n}|b_{ij}|^2} = \|A\|_k + \|B\|_k. \tag{4.40}$$
(iv) Let B ∈ Mn,p(ℝ) be arbitrary. We have
$$\|AB\|_\infty = \max_{1\le i\le m}\sum_{j=1}^{p}\left|\sum_{l=1}^{n} a_{il}b_{lj}\right| \le \max_{1\le i\le m}\sum_{j=1}^{p}\sum_{l=1}^{n}|a_{il}b_{lj}| = \max_{1\le i\le m}\sum_{l=1}^{n}|a_{il}|\sum_{j=1}^{p}|b_{lj}|. \tag{4.41}$$
But
$$\max_{1\le i\le m}\sum_{l=1}^{n}|a_{il}| = \|A\|_\infty \tag{4.42}$$
and
$$\sum_{j=1}^{p}|b_{lj}| \le \|B\|_\infty, \tag{4.43}$$
because it is a sum of moduli on the row l. It follows that
$$\|AB\|_\infty \le \|A\|_\infty\|B\|_\infty. \tag{4.44}$$
Then
$$\|AB\|_1 = \max_{1\le j\le p}\sum_{i=1}^{m}\left|\sum_{l=1}^{n} a_{il}b_{lj}\right| \le \max_{1\le j\le p}\sum_{l=1}^{n}|b_{lj}|\sum_{i=1}^{m}|a_{il}| \le \max_{1\le j\le p}\sum_{l=1}^{n}|b_{lj}|\,\|A\|_1 = \|A\|_1\|B\|_1. \tag{4.45}$$
We also have
$$\|AB\|_k^2 = \sum_{i=1}^{m}\sum_{j=1}^{p}\left(\sum_{l=1}^{n} a_{il}b_{lj}\right)^2 \le \sum_{i=1}^{m}\sum_{j=1}^{p}\left(\sum_{l=1}^{n}|a_{il}||b_{lj}|\right)^2. \tag{4.46}$$
From the Cauchy–Buniakowski–Schwarz inequality, we obtain
$$\left(\sum_{l=1}^{n}|a_{il}||b_{lj}|\right)^2 \le \sum_{l=1}^{n}|a_{il}|^2 \cdot \sum_{l=1}^{n}|b_{lj}|^2 \tag{4.47}$$
and relation (4.46) becomes
$$\|AB\|_k \le \sqrt{\sum_{i=1}^{m}\sum_{l=1}^{n}|a_{il}|^2}\sqrt{\sum_{j=1}^{p}\sum_{l=1}^{n}|b_{lj}|^2} = \|A\|_k\|B\|_k. \tag{4.48}$$
(v) Let a_lc be an arbitrary element of the matrix A. Denoting by i the row and by j the column on which the sum of the moduli is maximum, we may write
$$\|A\|_\infty = |a_{i1}| + |a_{i2}| + \cdots + |a_{in}| \ge |a_{lc}|, \tag{4.49}$$
$$\|A\|_1 = |a_{1j}| + |a_{2j}| + \cdots + |a_{mj}| \ge |a_{lc}|, \tag{4.50}$$
$$\|A\|_k = \sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n}|a_{ij}|^2} \ge \sqrt{|a_{lc}|^2} = |a_{lc}|. \tag{4.51}$$
(vi) This is an immediate consequence of the definitions: from |A| ≤ |B| it follows that ‖A‖ ≤ ‖B‖, where ‖·‖ corresponds to any of the three norms.
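The three canonical norms of Proposition 4.2 take a few lines of Python; this is our own sketch, assuming NumPy. The assertions spot-check properties (iii) and (iv) of Definition 4.3 on a small example.

```python
import numpy as np

def norm_inf(a):  # (4.32): maximum row sum of moduli
    return np.max(np.sum(np.abs(a), axis=1))

def norm_1(a):    # (4.33): maximum column sum of moduli
    return np.max(np.sum(np.abs(a), axis=0))

def norm_k(a):    # (4.34): square root of the sum of squares
    return np.sqrt(np.sum(a * a))

A = np.array([[1.0, -2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])
for norm in (norm_inf, norm_1, norm_k):
    assert norm(A + B) <= norm(A) + norm(B) + 1e-12    # property (iii)
    assert norm(A @ B) <= norm(A) * norm(B) + 1e-12    # property (iv)
```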
Definition 4.5 Let {A_k}_{k∈ℕ*} be a sequence of matrices of the same order, A_k ∈ Mm,n(ℝ) for any k ∈ ℕ*. We call the limit of the sequence of matrices {A_k}_{k∈ℕ*} a matrix
$$A = \lim_{k\to\infty} A_k, \qquad A \in M_{m,n}(\mathbb{R}), \tag{4.52}$$
for which the elements of A = [a_ij] are given by
$$a_{ij} = \lim_{k\to\infty} a_{ij}^{(k)}, \tag{4.53}$$
where A_k = [a_ij^{(k)}].
Proposition 4.3
(i) The necessary and sufficient condition for the sequence {A_k}_{k∈ℕ*} to be convergent to the matrix A is
$$\lim_{k\to\infty}\|A - A_k\| = 0, \tag{4.54}$$
where ‖·‖ is a certain canonical norm. Moreover,
$$\lim_{k\to\infty}\|A_k\| = \|A\|. \tag{4.55}$$
(ii) The Cauchy criterion for convergence: the necessary and sufficient condition for the sequence of matrices {A_k}_{k∈ℕ*} to be convergent is that for any ε > 0 there exists N(ε) ∈ ℕ so that for any k > N(ε) and any p ∈ ℕ*, ‖A_{k+p} − A_k‖ < ε, where ‖·‖ takes the place of any canonical norm.
Demonstration.
(i) The necessity. If A_k = [a_ij^{(k)}] → [a_ij], then for any ε > 0, there exists N(ε) ∈ ℕ so that for any k > N(ε),
$$|a_{ij} - a_{ij}^{(k)}| < \varepsilon \tag{4.56}$$
or, equivalently,
$$|A - A_k| < \varepsilon\,1_{m,n}, \tag{4.57}$$
where 1_{m,n} is the matrix with m rows and n columns and all elements equal to 1.
Passing to the norm in relation (4.57) and taking into account that ‖·‖ is a canonical norm, we have
$$\|A - A_k\| < \varepsilon\|1_{m,n}\| \tag{4.58}$$
for any k > N(ε). We get the relation
$$\lim_{k\to\infty}\|A - A_k\| = 0, \tag{4.59}$$
because ‖1_{m,n}‖ is a finite real number.
The sufficiency. From ‖A − A_k‖ → 0 for k → ∞ it follows that for any ε > 0, there exists N(ε) ∈ ℕ so that for any k > N(ε),
$$\|A - A_k\| < \varepsilon. \tag{4.60}$$
Taking into account that ‖·‖ is a canonical norm, we get
$$|a_{ij} - a_{ij}^{(k)}| \le \|A - A_k\| < \varepsilon \tag{4.61}$$
for k > N(ε). Therefore, it follows that
$$\lim_{k\to\infty} a_{ij}^{(k)} = a_{ij}, \tag{4.62}$$
hence
$$\lim_{k\to\infty} A_k = A. \tag{4.63}$$
On the other hand, one has
$$\big|\,\|A_k\| - \|A\|\,\big| \le \|A - A_k\| < \varepsilon \tag{4.64}$$
and one obtains
$$\lim_{k\to\infty}\|A_k\| = \|A\|. \tag{4.65}$$
(ii) The necessity. If lim_{k→∞} A_k = A, then from (i) it follows that lim_{k→∞}‖A − A_k‖ = 0. In that case,
$$\|A_{k+p} - A_k\| = \|A_{k+p} - A + A - A_k\| \le \|A_{k+p} - A\| + \|A - A_k\|. \tag{4.66}$$
Let ε > 0. Because lim_{k→∞}‖A − A_k‖ = 0, there exists N(ε) ∈ ℕ* so that ‖A_k − A‖ < ε/2 for any k > N(ε). We may choose N(ε) so that
$$\|A_{k+p} - A\| < \frac{\varepsilon}{2}, \qquad \|A - A_k\| < \frac{\varepsilon}{2} \tag{4.67}$$
for any p ∈ ℕ; then equation (4.66) leads to
$$\|A_{k+p} - A_k\| < \varepsilon. \tag{4.68}$$
The sufficiency. If ‖A_{k+p} − A_k‖ < ε, because ‖·‖ is a canonical norm, we have
$$|a_{ij}^{(k+p)} - a_{ij}^{(k)}| < \varepsilon \tag{4.69}$$
for any i, j, 1 ≤ i ≤ m, 1 ≤ j ≤ n. It follows that (a_ij^{(k)})_k is a Cauchy sequence of real numbers, hence convergent; let us denote its limit by a_ij. Taking into account (i), the proposition is proved.
Definition 4.6 Let {A_k}_{k∈ℕ*} be a sequence of matrices with A_k ∈ Mm,n(ℝ) for any k ∈ ℕ*. In that case,
$$\sum_{k=1}^{\infty} A_k = \lim_{N\to\infty}\sum_{k=1}^{N} A_k. \tag{4.70}$$
Definition 4.7
(i) We say that the series S_N = Σ_{k=1}^{N} A_k, with A_k ∈ Mm,n(ℝ), is convergent if the sequence {S_N}_{N∈ℕ*} is convergent.
(ii) The series is called absolutely convergent if the series S̄_N = Σ_{k=1}^{N} |A_k| is convergent.
Proposition 4.4
(i) If the series Σ_{k=1}^{∞} A_k, A_k ∈ Mm,n(ℝ), is convergent, then lim_{k→∞} A_k = 0.
(ii) If a series of matrices is absolutely convergent, then it is convergent.
(iii) Let ‖·‖ be a canonical norm. If the numerical series Σ_{k=1}^{∞} ‖A_k‖ is convergent, then the series Σ_{k=1}^{∞} A_k is absolutely convergent.
(iv) Let r be the convergence radius of the series Σ_{k=1}^{∞} ‖A_k‖x^k, where ‖·‖ is a canonical norm. In this case, if ‖x‖ < r, then the series Σ_{k=1}^{∞} A_k x^k and Σ_{k=1}^{∞} x^k A_k are convergent, where the matrix x is chosen so that the calculations may be performed (x ∈ Mn(ℝ) for Σ A_k x^k, x ∈ Mm(ℝ) for Σ x^k A_k, respectively).
(v) If ‖x‖ < 1, then the series Σ_{k=0}^{∞} Ax^k and Σ_{k=0}^{∞} x^k A, where the matrices A and x are of such a nature that one may perform the calculations, are convergent.
(vi) Let x ∈ Mn(ℝ) and A ∈ Mm,n(ℝ), with ‖x‖ < 1. Under these conditions,
$$\sum_{k=0}^{\infty} Ax^k = A(I_n - x)^{-1}. \tag{4.71}$$
If x ∈ Mm(ℝ) and ‖x‖ < 1, then
$$\sum_{k=0}^{\infty} x^kA = (I_m - x)^{-1}A. \tag{4.72}$$
Demonstration.
(i) Let S_N = Σ_{k=1}^{N} A_k. As {S_N} is convergent, there exists S ∈ Mm,n(ℝ) so that S = lim_{N→∞} S_N. On the other hand, A_{N+1} = S_{N+1} − S_N, and we pass to the limit with respect to N. We have
$$\lim_{N\to\infty} A_{N+1} = \lim_{N\to\infty} S_{N+1} - \lim_{N\to\infty} S_N = S - S = 0. \tag{4.73}$$
(ii) Let the series Σ_{k=1}^{∞} A_k be absolutely convergent, that is, Σ_{k=1}^{∞} |A_k| is convergent. But
$$\sum_{k=1}^{\infty}|A_k| = \left[\sum_{k=1}^{\infty}|a_{ij}^{(k)}|\right]_{\substack{i=\overline{1,m}\\ j=\overline{1,n}}}. \tag{4.74}$$
It follows that any series Σ_{k=1}^{∞} |a_ij^{(k)}| is convergent, hence any series Σ_{k=1}^{∞} a_ij^{(k)} is absolutely convergent; therefore the series Σ_{k=1}^{∞} A_k = [Σ_{k=1}^{∞} a_ij^{(k)}] is convergent.
(iii) Let A_k = [a_ij^{(k)}]. As ‖·‖ is a canonical norm, it follows that |a_ij^{(k)}| ≤ ‖A_k‖ for any i, j, 1 ≤ i ≤ m, 1 ≤ j ≤ n. Hence, any series Σ_{k=1}^{∞} a_ij^{(k)} is absolutely convergent and Σ_{k=1}^{∞} A_k is absolutely convergent too.
(iv) We may write successively
$$\left\|\sum_{k=1}^{N+p} A_kx^k - \sum_{k=1}^{N} A_kx^k\right\| = \|A_{N+1}x^{N+1} + \cdots + A_{N+p}x^{N+p}\| \le \|A_{N+1}\|\|x\|^{N+1} + \cdots + \|A_{N+p}\|\|x\|^{N+p} \le \|x\|^{N+1}\left(\|A_{N+1}\| + \cdots + \|A_{N+p}\|\|x\|^{p-1}\right). \tag{4.75}$$
As the series Σ_{k=1}^{∞} ‖A_k‖x^k is convergent for ‖x‖ < r, it follows that
$$\left\|\sum_{k=1}^{N+p} A_kx^k - \sum_{k=1}^{N} A_kx^k\right\| < \varepsilon, \qquad N > N(\varepsilon),\ p \in \mathbb{N}. \tag{4.76}$$
Cauchy's criterion states that Σ_{k=1}^{∞} A_kx^k is convergent. Analogically, we state that Σ_{k=1}^{∞} x^kA_k is convergent.
(v) The series Σ_{k=0}^{∞} Ax^k is convergent for ‖x‖ < 1, as a consequence of (iv) with r = 1, the geometric series with subunitary ratio being convergent. Analogically, this follows for the series Σ_{k=0}^{∞} x^kA.
(vi) Starting from the relation
$$A(I_n + x + x^2 + \cdots + x^k)(I_n - x) = A(I_n - x^{k+1}), \tag{4.77}$$
passing to the limit for k → ∞ and taking into account that x^{k+1} → 0, because ‖x‖ < 1, we obtain
$$\left(\sum_{k=0}^{\infty} Ax^k\right)(I_n - x) = A. \tag{4.78}$$
Let us consider the particular case A = I_n. One has
$$\left(\sum_{k=0}^{\infty} x^k\right)(I_n - x) = I_n, \tag{4.79}$$
hence
$$\det\left(\sum_{k=0}^{\infty} x^k\right)\det(I_n - x) = \det(I_n) = 1, \tag{4.80}$$
from which
$$\det(I_n - x) \ne 0, \tag{4.81}$$
the matrix I_n − x being invertible. Hence, equation (4.78) leads to
$$\sum_{k=0}^{\infty} Ax^k = A(I_n - x)^{-1}. \tag{4.82}$$
The second relation is analogous.
Corollary 4.2
(i) If ‖x‖ < 1, x ∈ Mn(ℝ), then
$$\sum_{k=0}^{\infty} x^k = (I_n - x)^{-1}. \tag{4.83}$$
(ii) Under the same conditions as in (i), we have
$$\|(I_n - x)^{-1}\| \le \|I_n\| + \frac{\|x\|}{1 - \|x\|}. \tag{4.84}$$
Demonstration.
(i) It is a consequence of point (vi) of the previous proposition for A = I_n.
(ii) We have
$$\|(I_n - x)^{-1}\| = \left\|\sum_{k=0}^{\infty} x^k\right\| \le \|I_n\| + \sum_{k=1}^{\infty}\|x\|^k = \|I_n\| + \|x\|\frac{1}{1 - \|x\|}. \tag{4.85}$$
Observation 4.10 If ‖I_n‖ = 1, then relation (4.84) becomes
$$\|(I_n - x)^{-1}\| \le \frac{1}{1 - \|x\|}. \tag{4.86}$$
Proposition 4.5 (Evaluation of the Remainder). Let us denote by R_k the remainder of the series Σ_{k=0}^{∞} Ax^k, ‖x‖ < 1, that is,
$$R_k = \sum_{i=k+1}^{\infty} Ax^i. \tag{4.87}$$
Under these conditions,
$$\|R_k\| \le \frac{\|A\|\|x\|^{k+1}}{1 - \|x\|}. \tag{4.88}$$
Demonstration. We may write
$$\|R_k\| = \left\|\sum_{i=k+1}^{\infty} Ax^i\right\| \le \|A\|\sum_{i=k+1}^{\infty}\|x^i\| \le \|A\|\sum_{i=k+1}^{\infty}\|x\|^i = \frac{\|A\|\|x\|^{k+1}}{1 - \|x\|}. \tag{4.89}$$
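A numerical illustration follows (our own, assuming NumPy) of Corollary 4.2 and of the remainder bound (4.88): for ‖x‖ < 1 the partial sums of Σ x^k approach (I_n − x)⁻¹, and the error decays like ‖x‖^{k+1}/(1 − ‖x‖). The particular matrix x is an arbitrary choice for the demonstration.

```python
import numpy as np

n = 3
x = 0.2 * np.eye(n) + 0.05 * np.ones((n, n))   # chosen so that ||x||_inf < 1
q = np.max(np.sum(np.abs(x), axis=1))          # canonical norm ||x||_inf
assert q < 1.0

target = np.linalg.inv(np.eye(n) - x)          # (I_n - x)^(-1), cf. (4.83)
s, term = np.zeros((n, n)), np.eye(n)
for k in range(30):
    s += term                                   # partial sum I + x + ... + x^k
    term = term @ x
    err = np.max(np.sum(np.abs(target - s), axis=1))
    bound = q ** (k + 1) / (1.0 - q)            # remainder bound (4.88) with A = I_n
    assert err <= bound + 1e-12
print("converged:", np.allclose(s, target))
```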
4.4 INVERSION OF MATRICES
4.4.1 Direct Inversion
Let the matrix
$$A = [a_{ij}]_{i,j=\overline{1,n}}, \qquad A \in M_n(\mathbb{R}), \tag{4.90}$$
for which
$$\det A \ne 0; \tag{4.91}$$
that is, the matrix A is a nonsingular square matrix of order n with real elements.
Under these conditions, the inverse of the matrix A is given by
$$A^{-1} = \frac{1}{\det A}\left[(-1)^{i+j}\Delta_{ji}\right]_{i,j=\overline{1,n}}, \tag{4.92}$$
where Δ_lk is the determinant of the matrix A_lk obtained from the matrix A by eliminating the row l and the column k, hence a square matrix of order n − 1,
$$A_{lk} = \begin{bmatrix} a_{11} & \cdots & a_{1,k-1} & a_{1,k+1} & \cdots & a_{1n} \\ a_{21} & \cdots & a_{2,k-1} & a_{2,k+1} & \cdots & a_{2n} \\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\ a_{l-1,1} & \cdots & a_{l-1,k-1} & a_{l-1,k+1} & \cdots & a_{l-1,n} \\ a_{l+1,1} & \cdots & a_{l+1,k-1} & a_{l+1,k+1} & \cdots & a_{l+1,n} \\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\ a_{n1} & \cdots & a_{n,k-1} & a_{n,k+1} & \cdots & a_{nn} \end{bmatrix}. \tag{4.93}$$
4.4.2 The Gauss–Jordan Method
This method¹ is based on
Lemma 4.1 (Substitution Lemma). Let V be a finite-dimensional vector space of dimension n and let B = {v1, v2, . . . , vn} be a basis of the vector space. Let x, x ≠ 0, be a vector of V. Then x may replace a vector v_i of the basis B if, by expressing x as a linear combination of the basis vectors, the scalar α_i which multiplies v_i is nonzero. Moreover, in this case, the set B′ = {v1, . . . , v_{i−1}, x, v_{i+1}, . . . , vn} is a basis of the vector space V.
Demonstration. There exist scalars α1, α2, . . . , αn, not all equal to zero, so that
$$x = \sum_{i=1}^{n}\alpha_iv_i, \tag{4.94}$$
because B is a basis. Let us suppose that α1 ≠ 0. It follows that
$$v_1 = \frac{1}{\alpha_1}\left(x - \sum_{i=2}^{n}\alpha_iv_i\right). \tag{4.95}$$
Let us show that the vectors x, v2, . . . , vn are linearly independent. We suppose firstly that this is not so; in this case, there would exist scalars β1, . . . , βn, not all zero, so that
$$\beta_1x + \beta_2v_2 + \cdots + \beta_nv_n = 0. \tag{4.96}$$
Taking into account equation (4.94), we have
$$\beta_1(\alpha_1v_1 + \alpha_2v_2 + \cdots + \alpha_nv_n) + \beta_2v_2 + \cdots + \beta_nv_n = 0, \tag{4.97}$$
from which
$$\beta_1\alpha_1v_1 + (\beta_1\alpha_2 + \beta_2)v_2 + \cdots + (\beta_1\alpha_n + \beta_n)v_n = 0. \tag{4.98}$$
As B is a basis, one obtains the system
$$\beta_1\alpha_1 = 0, \quad \beta_1\alpha_2 + \beta_2 = 0, \quad \ldots, \quad \beta_1\alpha_n + \beta_n = 0, \tag{4.99}$$
with the solution β_i = 0, i = 1, n (because α1 ≠ 0), which contradicts the supposition. The vectors x, v2, . . . , vn are thus linearly independent. Let us now show that they constitute a system of generators. Let y be a vector of V. As B is a basis, there exist scalars γ_i, i = 1, n, so that
$$y = \gamma_1v_1 + \gamma_2v_2 + \cdots + \gamma_nv_n. \tag{4.100}$$
Taking into account equation (4.95), we get
$$y = \frac{\gamma_1}{\alpha_1}\left(x - \alpha_2v_2 - \alpha_3v_3 - \cdots - \alpha_nv_n\right) + \gamma_2v_2 + \cdots + \gamma_nv_n, \tag{4.101}$$
from which we have
$$y = \frac{\gamma_1}{\alpha_1}x + \left(\gamma_2 - \frac{\gamma_1\alpha_2}{\alpha_1}\right)v_2 + \left(\gamma_3 - \frac{\gamma_1\alpha_3}{\alpha_1}\right)v_3 + \cdots + \left(\gamma_n - \frac{\gamma_1\alpha_n}{\alpha_1}\right)v_n. \tag{4.102}$$
¹The method is named after Carl Friedrich Gauss (1777–1855), who discovered it in 1810, and Wilhelm Jordan (1842–1899), who described it in 1887. The method was known to the Chinese (tenth–second century BC). It also appears in 1670, attributed to Isaac Newton.
As y is arbitrary, the vectors x, v2, . . . , vn form a system of generators; hence B′ is a new basis for V, and the lemma is proved.
If the matrix A is nonsingular, then its columns are linearly independent. Let us write A in the form
$$A = [c_1\ c_2\ \cdots\ c_n], \tag{4.103}$$
where c_i is the column i,
$$c_i = [a_{1i}\ a_{2i}\ \cdots\ a_{ni}]^T. \tag{4.104}$$
Let us consider now an arbitrary column vector of dimension n,
$$b = [b_1\ b_2\ \cdots\ b_n]^T. \tag{4.105}$$
One can associate to every column c_i, i = 1, n, a vector v_i of ℝⁿ. The vectors v_i, i = 1, n, are linearly independent too, because c_i, i = 1, n, are linearly independent. One has dim ℝⁿ = n, so that the v_i (the c_i, respectively) form a basis. Hence, the vector b given by equation (4.105) will be generated by the columns of the matrix A,
$$b = \alpha_1c_1 + \alpha_2c_2 + \cdots + \alpha_nc_n. \tag{4.106}$$
In particular, b may be a column of the unit matrix.
Let us construct the table
$$\left[\begin{array}{cccc|cccc} a_{11} & a_{12} & \cdots & a_{1n} & 1 & 0 & \cdots & 0 \\ a_{21} & a_{22} & \cdots & a_{2n} & 0 & 1 & \cdots & 0 \\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} & 0 & 0 & \cdots & 1 \end{array}\right]. \tag{4.107}$$
On the left side of the table are the columns of the matrix A, while on the right side is the unit matrix; thus, a row of the table has 2n elements. We multiply the rows of the table by conveniently chosen numbers, commute them into one another, or add one to another, so as to obtain the unit matrix on the left side; we then obtain the inverse matrix A⁻¹ on the right. The procedure is an application of the substitution lemma because, obviously, the columns of the matrix A (supposed to be nonsingular), as well as the columns of the unit matrix, are bases in the space ℝⁿ.
Observation 4.11 If, at a given point, on the left side of the table, on trying to obtain the column i, we have a_ii = 0, a_{i+1,i} = 0, . . . , a_{ni} = 0, then det A = 0 and we cannot obtain the inverse matrix.
Observation 4.12 Usually, one tries to bring to the position of a_ii the element of greatest modulus among |a_ii|, |a_{i+1,i}|, . . . , |a_{ni}|, so as to reduce the errors of calculation.
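A compact sketch of the procedure follows (our own Python, not the book's code): the table (4.107) is stored as an n × 2n list of rows, pivoting follows Observation 4.12, and the right half ends as A⁻¹.

```python
def gauss_jordan_inverse(a):
    """Reduce the table [A | I] of (4.107) until the left half is I; right half is A^(-1)."""
    n = len(a)
    t = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(t[r][i]))    # Observation 4.12
        if t[p][i] == 0.0:
            raise ValueError("det A = 0 (Observation 4.11)")
        t[i], t[p] = t[p], t[i]
        piv = t[i][i]
        t[i] = [v / piv for v in t[i]]
        for r in range(n):                                   # clear column i elsewhere
            if r != i and t[r][i] != 0.0:
                f = t[r][i]
                t[r] = [vr - f * vi for vr, vi in zip(t[r], t[i])]
    return [row[n:] for row in t]

print(gauss_jordan_inverse([[4.0, 7.0], [2.0, 6.0]]))   # [[0.6, -0.7], [-0.2, 0.4]]
```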
4.4.3 The Determination of the Inverse Matrix by its Partition
Let us consider the nonsingular square matrix of nth order A ∈ Mn(ℝ), with real elements. Let us partition the matrix
$$A = \begin{bmatrix} A_1 & A_3 \\ A_2 & A_4 \end{bmatrix}, \tag{4.108}$$
where A1 ∈ Mp(ℝ) is a square matrix of order p, p < n, A2 ∈ Mn−p,p(ℝ) is a matrix with n − p rows and p columns, A3 ∈ Mp,n−p(ℝ) is a matrix with p rows and n − p columns, while A4 ∈ Mn−p(ℝ) is a square matrix of order n − p.
Let us denote by
$$B = \begin{bmatrix} B_1 & B_3 \\ B_2 & B_4 \end{bmatrix} \tag{4.109}$$
the inverse of the matrix A, of the same form (4.108), where the dimensions of the matrices B_i are the same as those of the matrices A_i, i = 1, 4. As the matrix B is the inverse of the matrix A, one has
$$A \cdot B = I_n, \tag{4.110}$$
where I_n is the unit matrix of order n. Taking into account relations (4.108) and (4.109), relation (4.110) leads to
$$\begin{bmatrix} A_1 & A_3 \\ A_2 & A_4 \end{bmatrix}\begin{bmatrix} B_1 & B_3 \\ B_2 & B_4 \end{bmatrix} = \begin{bmatrix} I_p & 0_{p,n-p} \\ 0_{n-p,p} & I_{n-p} \end{bmatrix}, \tag{4.111}$$
from which
$$A_1B_1 + A_3B_2 = I_p, \quad A_1B_3 + A_3B_4 = 0_{p,n-p}, \quad A_2B_1 + A_4B_2 = 0_{n-p,p}, \quad A_2B_3 + A_4B_4 = I_{n-p}. \tag{4.112}$$
The second relation (4.112) leads to
$$B_3 = -A_1^{-1}A_3B_4, \tag{4.113}$$
which, replaced in the last relation (4.112), leads to
$$-A_2A_1^{-1}A_3B_4 + A_4B_4 = I_{n-p}, \tag{4.114}$$
hence
$$B_4 = (A_4 - A_2A_1^{-1}A_3)^{-1}. \tag{4.115}$$
On the other hand, one may write the relation
$$B \cdot A = I_n \tag{4.116}$$
too; it follows the system
$$B_1A_1 + B_3A_2 = I_p, \quad B_1A_3 + B_3A_4 = 0_{p,n-p}, \quad B_2A_1 + B_4A_2 = 0_{n-p,p}, \quad B_2A_3 + B_4A_4 = I_{n-p}. \tag{4.117}$$
The third relation (4.117) leads to
$$B_2 = -B_4A_2A_1^{-1}, \tag{4.118}$$
while from the first relation (4.112), one obtains
$$B_1 = A_1^{-1} - A_1^{-1}A_3B_2. \tag{4.119}$$
Finally, formulae (4.113), (4.115), (4.118), and (4.119) lead to
$$B_4 = (A_4 - A_2A_1^{-1}A_3)^{-1}, \qquad B_3 = -A_1^{-1}A_3(A_4 - A_2A_1^{-1}A_3)^{-1},$$
$$B_2 = -(A_4 - A_2A_1^{-1}A_3)^{-1}A_2A_1^{-1}, \qquad B_1 = A_1^{-1} + A_1^{-1}A_3(A_4 - A_2A_1^{-1}A_3)^{-1}A_2A_1^{-1}. \tag{4.120}$$
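The formulae (4.120) can be checked numerically; the sketch below (our own, assuming NumPy) partitions a well-conditioned test matrix and compares the assembled inverse with numpy.linalg.inv.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 2
A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned test matrix
A1, A3 = A[:p, :p], A[:p, p:]
A2, A4 = A[p:, :p], A[p:, p:]

A1i = np.linalg.inv(A1)
S = np.linalg.inv(A4 - A2 @ A1i @ A3)             # (A4 - A2 A1^-1 A3)^-1
B4 = S
B3 = -A1i @ A3 @ S
B2 = -S @ A2 @ A1i
B1 = A1i + A1i @ A3 @ S @ A2 @ A1i
B = np.block([[B1, B3], [B2, B4]])                # assemble per (4.108)-(4.109)
print(np.allclose(B, np.linalg.inv(A)))           # True
```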
4.4.4 Schur's Method of Inversion of Matrices
Let A be a square matrix of nth order, which we partition in the form²
$$A = \begin{bmatrix} A_1 & A_2 \\ A_3 & A_4 \end{bmatrix}, \tag{4.121}$$
where A1 is a square matrix of pth order, p < n, A2 is a matrix with p rows and n − p columns, A3 is a matrix with n − p rows and p columns, while A4 is a square matrix of (n − p)th order. Let us also suppose that A4 and A1 − A2A4⁻¹A3 are invertible matrices.
Definition 4.8 The matrix A1 − A2A4⁻¹A3 is called the Schur complement of the matrix A.
Proposition 4.6 Under the above conditions, the decomposition
$$\begin{bmatrix} A_1 & A_2 \\ A_3 & A_4 \end{bmatrix} = \begin{bmatrix} I_p & A_2A_4^{-1} \\ 0_{n-p,p} & I_{n-p} \end{bmatrix}\begin{bmatrix} A_1 - A_2A_4^{-1}A_3 & 0_{p,n-p} \\ 0_{n-p,p} & A_4 \end{bmatrix}\begin{bmatrix} I_p & 0_{p,n-p} \\ A_4^{-1}A_3 & I_{n-p} \end{bmatrix} \tag{4.122}$$
takes place, where I_p and I_{n−p} are the unit matrices of orders p and n − p, respectively, while 0_{p,n−p} and 0_{n−p,p} mark zero matrices with p rows and n − p columns and with n − p rows and p columns, respectively.
Demonstration. The result is evident, being an elementary multiplication of matrices.
Corollary 4.3 Under the same conditions, the inverse of the matrix A is
$$A^{-1} = \begin{bmatrix} I_p & 0_{p,n-p} \\ -A_4^{-1}A_3 & I_{n-p} \end{bmatrix}\begin{bmatrix} \left(A_1 - A_2A_4^{-1}A_3\right)^{-1} & 0_{p,n-p} \\ 0_{n-p,p} & A_4^{-1} \end{bmatrix}\begin{bmatrix} I_p & -A_2A_4^{-1} \\ 0_{n-p,p} & I_{n-p} \end{bmatrix}. \tag{4.123}$$
Demonstration. The result is obvious.
Observation 4.13
(i) We may also consider that the matrix A1 is invertible, in which case the Schur complement of the matrix A is given by A4 − A3A1⁻¹A2.
(ii) If A1 and A4 − A3A1⁻¹A2 are invertible, then we may write
$$A = \begin{bmatrix} I_p & 0_{p,n-p} \\ A_3A_1^{-1} & I_{n-p} \end{bmatrix}\begin{bmatrix} A_1 & 0_{p,n-p} \\ 0_{n-p,p} & A_4 - A_3A_1^{-1}A_2 \end{bmatrix}\begin{bmatrix} I_p & A_1^{-1}A_2 \\ 0_{n-p,p} & I_{n-p} \end{bmatrix} \tag{4.124}$$
and
$$A^{-1} = \begin{bmatrix} I_p & -A_1^{-1}A_2 \\ 0_{n-p,p} & I_{n-p} \end{bmatrix}\begin{bmatrix} A_1^{-1} & 0_{p,n-p} \\ 0_{n-p,p} & \left(A_4 - A_3A_1^{-1}A_2\right)^{-1} \end{bmatrix}\begin{bmatrix} I_p & 0_{p,n-p} \\ -A_3A_1^{-1} & I_{n-p} \end{bmatrix}, \tag{4.125}$$
respectively.
²The method was found by Issai Schur (1875–1941).
Observation 4.14
(i) If A, A2, A3, and A4 are invertible, while A1 = 0p, then the Schur complement is given by −A2A4⁻¹A3 and the inverse of the matrix A is given, corresponding to formula (4.123), by
$$A^{-1} = \begin{bmatrix} -A_3^{-1}A_4A_2^{-1} & A_3^{-1} \\ A_2^{-1} & 0_{n-p} \end{bmatrix}. \tag{4.126}$$
(ii) If A, A1, A2, and A3 are invertible, while A4 = 0_{n−p}, then the Schur complement becomes −A3A1⁻¹A2 and relation (4.125) leads to
$$A^{-1} = \begin{bmatrix} 0_p & A_3^{-1} \\ A_2^{-1} & -A_2^{-1}A_1A_3^{-1} \end{bmatrix}. \tag{4.127}$$
In multibody dynamics, matrices of the form
$$A = \begin{bmatrix} A_1 & -A_3^T \\ A_3 & 0 \end{bmatrix}, \tag{4.128}$$
for which the decomposition, corresponding to relation (4.124), is
$$A = \begin{bmatrix} I_p & 0_{p,n-p} \\ A_3A_1^{-1} & I_{n-p} \end{bmatrix}\begin{bmatrix} A_1 & 0_{p,n-p} \\ 0_{n-p,p} & A_3A_1^{-1}A_3^T \end{bmatrix}\begin{bmatrix} I_p & -A_1^{-1}A_3^T \\ 0_{n-p,p} & I_{n-p} \end{bmatrix}, \tag{4.129}$$
are of interest. Using relation (4.125), the inverse of this matrix is of the form
$$A^{-1} = \begin{bmatrix} I_p & A_1^{-1}A_3^T \\ 0_{n-p,p} & I_{n-p} \end{bmatrix}\begin{bmatrix} A_1 & 0_{p,n-p} \\ 0_{n-p,p} & A_3A_1^{-1}A_3^T \end{bmatrix}^{-1}\begin{bmatrix} I_p & 0_{p,n-p} \\ -A_3A_1^{-1} & I_{n-p} \end{bmatrix}. \tag{4.130}$$
4.4.5 The Iterative Method (Schulz)
Let A ∈ Mn(ℝ) be nonsingular and B0 be an approximate value of A⁻¹. Let us consider the matrix³
$$C_0 = I_n - AB_0, \tag{4.131}$$
where I_n is the unit matrix of order n. If C0 = 0, then B0 = A⁻¹ and the procedure stops. Let us suppose that C0 ≠ 0. We construct the sequence
$$B_k = B_{k-1} + B_{k-1}C_{k-1}, \qquad k \ge 1, \tag{4.132}$$
where
$$C_{k-1} = I_n - AB_{k-1}. \tag{4.133}$$
Proposition 4.7 The relation
$$C_k = C_0^{2^k}, \qquad k \ge 1, \tag{4.134}$$
takes place for the sequence {C_k}_{k≥1} defined by relations (4.132) and (4.133).
³The method was published by G. Schulz in 1933.
Demonstration. We have
$$C_1 = I_n - AB_1 = I_n - A(B_0 + B_0C_0) = I_n - AB_0(I_n + C_0) \tag{4.135}$$
for k = 1. On the other hand,
$$AB_0 = I_n - C_0, \tag{4.136}$$
hence
$$C_1 = I_n - (I_n - C_0)(I_n + C_0) = C_0^2. \tag{4.137}$$
Let us now suppose that $C_k = C_0^{2^k}$ and let us show that $C_{k+1} = C_0^{2^{k+1}}$. We may write
$$C_{k+1} = I_n - AB_{k+1} = I_n - A(B_k + B_kC_k) = I_n - AB_k(I_n + C_k). \tag{4.138}$$
Then
$$AB_k = I_n - C_k, \tag{4.139}$$
corresponding to relation (4.133); relation (4.138) leads to
$$C_{k+1} = I_n - (I_n - C_k)(I_n + C_k) = C_k^2 = C_0^{2^{k+1}}. \tag{4.140}$$
Taking into account the principle of mathematical induction, relation (4.134) is true for any k ≥ 1.
Proposition 4.8 If there exists q ∈ ℝ, 0 < q < 1, so that ‖C0‖ ≤ q, then
$$\lim_{k\to\infty} B_k = A^{-1}. \tag{4.141}$$
Demonstration. We may write successively
$$\|C_k\| = \|C_0^{2^k}\| \le \|C_0\|^{2^k} \le q^{2^k}, \tag{4.142}$$
hence
$$\lim_{k\to\infty}\|C_k\| = 0. \tag{4.143}$$
On the other hand,
$$C_k = I_n - AB_k \tag{4.144}$$
and relation (4.143) leads to
$$\lim_{k\to\infty} C_k = \lim_{k\to\infty}(I_n - AB_k) = 0. \tag{4.145}$$
The last relation implies
$$I_n = \lim_{k\to\infty} AB_k, \tag{4.146}$$
hence
$$\lim_{k\to\infty} B_k = A^{-1}. \tag{4.147}$$
Proposition 4.9 Taking into account the previous notations, the following relation holds:
$$\|A^{-1} - B_k\| \le \|B_0\|\left(\|I_n\| + \frac{q}{1 - q}\right)q^{2^k}. \tag{4.148}$$
Demonstration. The difference A⁻¹ − B_k may be written in the form
$$A^{-1} - B_k = A^{-1} - A^{-1}(AB_k) = A^{-1}(I_n - AB_k),$$
hence
$$\|A^{-1} - B_k\| \le \|A^{-1}\|\|I_n - AB_k\| = \|A^{-1}\|\|C_k\| = \|A^{-1}\|\|C_0^{2^k}\| \le \|A^{-1}\|\|C_0\|^{2^k} \le \|A^{-1}\|q^{2^k}. \tag{4.149}$$
Then
$$A^{-1} = B_0(I_n - C_0)^{-1}, \tag{4.150}$$
hence
$$\|A^{-1}\| \le \|B_0\|\|(I_n - C_0)^{-1}\| = \|B_0\|\|I_n + C_0 + C_0^2 + \cdots\| \le \|B_0\|(\|I_n\| + \|C_0\| + \|C_0\|^2 + \cdots) \le \|B_0\|(\|I_n\| + q + q^2 + q^3 + \cdots) \le \|B_0\|\left(\|I_n\| + q\frac{1}{1 - q}\right). \tag{4.151}$$
It follows that
$$\|A^{-1} - B_k\| \le \|B_0\|\left(\|I_n\| + \frac{q}{1 - q}\right)q^{2^k}. \tag{4.152}$$
Observation 4.15 If ‖I_n‖ = 1, then relation (4.148) becomes
$$\|A^{-1} - B_k\| \le \frac{\|B_0\|}{1 - q}q^{2^k}. \tag{4.153}$$
Observation 4.16 If we wish to obtain the matrix A⁻¹ with a precision ε, then we stop at the point at which
$$\frac{\|B_0\|}{1 - q}q^{2^k} < \varepsilon \tag{4.154}$$
if ‖I_n‖ = 1, or when
$$\|B_0\|\left(\|I_n\| + \frac{q}{1 - q}\right)q^{2^k} < \varepsilon. \tag{4.155}$$
Each of the relations (4.154) and (4.155) indicates the number of necessary iteration steps. We have thus to deal with an a priori estimation of the error.
Observation 4.17 One sometimes uses a stopping condition of the form
$$\|B_k - B_{k-1}\| < \varepsilon, \tag{4.156}$$
which is justified because the sequence {B_k} is convergent.
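A sketch of the Schulz iteration (4.132)–(4.133) follows (our own, assuming NumPy). The starting guess B0 = Aᵀ/(‖A‖1‖A‖∞) is a common choice that makes ‖C0‖ < 1 for nonsingular A; the book itself leaves B0 unspecified.

```python
import numpy as np

def schulz_inverse(A, eps=1e-12, kmax=60):
    """Iterate B_k = B_{k-1}(I + C_{k-1}), C_{k-1} = I - A B_{k-1}, cf. (4.132)-(4.133)."""
    n = A.shape[0]
    # assumed starting guess (not from the book); it yields ||I - A B0|| < 1
    B = A.T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))
    for _ in range(kmax):
        C = np.eye(n) - A @ B
        if np.linalg.norm(C, np.inf) < eps:     # a posteriori stop, cf. Observation 4.17
            break
        B = B + B @ C
    return B

A = np.array([[4.0, 1.0], [2.0, 3.0]])
print(np.allclose(schulz_inverse(A) @ A, np.eye(2)))   # True
```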
4.4.6 Inversion by Means of the Characteristic Polynomial
Let A ∈ Mn(ℝ) be a square matrix with det A ≠ 0 and consider the secular equation
$$\det[A - \lambda I_n] = 0, \tag{4.157}$$
which may be transcribed in the form
$$\begin{vmatrix} a_{11} - \lambda & a_{12} & a_{13} & \ldots & a_{1,n-1} & a_{1n} \\ a_{21} & a_{22} - \lambda & a_{23} & \ldots & a_{2,n-1} & a_{2n} \\ \ldots & \ldots & \ldots & \ldots & \ldots & \ldots \\ a_{n-1,1} & a_{n-1,2} & a_{n-1,3} & \ldots & a_{n-1,n-1} - \lambda & a_{n-1,n} \\ a_{n1} & a_{n2} & a_{n3} & \ldots & a_{n,n-1} & a_{nn} - \lambda \end{vmatrix} = 0 \tag{4.158}$$
or, equivalently,
$$(-1)^n\lambda^n + \sigma_1\lambda^{n-1} + \sigma_2\lambda^{n-2} + \cdots + \sigma_n = 0. \tag{4.159}$$
Replacing λ by A in the characteristic equation (4.157), one gets
$$\det[A - A] = 0, \tag{4.160}$$
which, obviously, is true. Hence, by the Hamilton–Cayley theorem,
$$(-1)^nA^n + \sigma_1A^{n-1} + \sigma_2A^{n-2} + \cdots + \sigma_nI_n = 0. \tag{4.161}$$
On multiplying relation (4.161) on the right by A⁻¹, we get
$$(-1)^nA^{n-1} + \sigma_1A^{n-2} + \cdots + \sigma_{n-1}I_n = -\sigma_nA^{-1}, \tag{4.162}$$
obtaining
$$A^{-1} = -\frac{1}{\sigma_n}\left[(-1)^nA^{n-1} + \sigma_1A^{n-2} + \cdots + \sigma_{n-1}I_n\right]. \tag{4.163}$$
4.4.7 The Frame–Fadeev Method
The Frame–Fadeev method⁴ is a different reading of the previous one, in which the coefficients σ_i, i = 1, n, of the various powers of λ in the secular equation are obtained as traces of certain matrices. We multiply the characteristic equation
$$(-1)^n\lambda^n + \sigma_1\lambda^{n-1} + \sigma_2\lambda^{n-2} + \cdots + \sigma_{n-1}\lambda + \sigma_n = 0 \tag{4.164}$$
by (−1)ⁿ to bring it to the form
$$\lambda^n + \sigma_1^*\lambda^{n-1} + \sigma_2^*\lambda^{n-2} + \cdots + \sigma_{n-1}^*\lambda + \sigma_n^* = 0. \tag{4.165}$$
The sequences
$$A_1 = A, \quad \sigma_1^* = -\operatorname{Tr}A_1, \quad B_1 = A_1 + \sigma_1^*I_n, \tag{4.166}$$
$$A_2 = AB_1, \quad \sigma_2^* = -\frac{1}{2}\operatorname{Tr}A_2, \quad B_2 = A_2 + \sigma_2^*I_n, \tag{4.167}$$
⁴This method was published by J. S. Frame in 1949 and then in 1964, and by D. K. Fadeev (Faddeev) in 1952 in Russian and then in 1963 in English.
and, in general,
$$A_k = AB_{k-1}, \quad \sigma_k^* = -\frac{1}{k}\operatorname{Tr}A_k, \quad B_k = A_k + \sigma_k^*I_n, \tag{4.168}$$
are constructed until
$$A_n = AB_{n-1}, \quad \sigma_n^* = -\frac{1}{n}\operatorname{Tr}A_n, \quad B_n = A_n + \sigma_n^*I_n \tag{4.169}$$
are obtained.
The last relation (4.169) is just the Hamilton–Cayley equation, hence
$$B_n = A_n + \sigma_n^*I_n = 0, \tag{4.170}$$
from which
$$A_n = -\sigma_n^*I_n. \tag{4.171}$$
The first formula (4.169) now leads to
$$AB_{n-1} = -\sigma_n^*I_n, \tag{4.172}$$
hence
$$A^{-1} = -\frac{1}{\sigma_n^*}B_{n-1}. \tag{4.173}$$
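A direct transcription of relations (4.166)–(4.173) follows (our own sketch, assuming NumPy): the recursion produces the coefficients σ*_k as traces, and A⁻¹ = −B_{n−1}/σ*_n.

```python
import numpy as np

def frame_fadeev_inverse(A):
    """A_k = A B_{k-1}, sigma*_k = -Tr(A_k)/k, B_k = A_k + sigma*_k I_n, cf. (4.166)-(4.168)."""
    n = A.shape[0]
    B = np.eye(n)                        # B_0 = I_n, so that A_1 = A B_0 = A
    for k in range(1, n + 1):
        Ak = A @ B                       # A_k = A B_{k-1}
        sigma = -np.trace(Ak) / k        # sigma*_k as a trace
        B_prev, B = B, Ak + sigma * np.eye(n)
    # after the loop, sigma = sigma*_n and B_prev = B_{n-1}
    if sigma == 0.0:
        raise ValueError("sigma*_n = 0: the matrix is singular")
    return -B_prev / sigma               # (4.173)

A = np.array([[2.0, 1.0], [1.0, 3.0]])
print(frame_fadeev_inverse(A) @ A)       # approximately the unit matrix
```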
4.5 SOLUTION OF LINEAR ALGEBRAIC SYSTEMS OF EQUATIONS
4.5.1 Cramer’s Rule
Let us consider the linear system of n equations with n unknowns⁵
$$\begin{cases} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = b_1, \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n = b_2, \\ \cdots \\ a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n = b_n, \end{cases} \tag{4.174}$$
which may be written in the form
$$Ax = b, \tag{4.175}$$
where A = [a_ij]_{i,j=1,n}, x = [x1 x2 · · · xn]ᵀ, b = [b1 b2 · · · bn]ᵀ, A ∈ Mn(ℝ), x ∈ Mn,1(ℝ), b ∈ Mn,1(ℝ).
We suppose that the system is compatible and determined, that is, det A ≠ 0. In this case, the solution of the system is given by
$$x_i = \frac{\Delta_i}{\Delta}, \qquad i = \overline{1,n}, \tag{4.176}$$
where
$$\Delta = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \cdots & \cdots & \cdots & \cdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix}, \tag{4.177}$$
$$\Delta_i = \begin{vmatrix} a_{11} & \cdots & a_{1,i-1} & b_1 & a_{1,i+1} & \cdots & a_{1n} \\ a_{21} & \cdots & a_{2,i-1} & b_2 & a_{2,i+1} & \cdots & a_{2n} \\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\ a_{n1} & \cdots & a_{n,i-1} & b_n & a_{n,i+1} & \cdots & a_{nn} \end{vmatrix}. \tag{4.178}$$
Formulae (4.176) form the so-called Cramer's rule.
The obvious disadvantage of this method consists in the fact that one must calculate n + 1 determinants of distinct matrices of nth order.
If det A = 0, then the system may be undetermined compatible or incompatible.
Obviously, the first step consists in the calculation of Δ = det A. If Δ ≠ 0, then one may apply formula (4.176); in the contrary case, the algorithm stops.
⁵The method was named after Gabriel Cramer (1704–1752), who published it in 1750.
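Cramer's rule in Python (our own sketch, using numpy.linalg.det for the determinants): column i of A is replaced by b as in (4.178).

```python
import numpy as np

def cramer(A, b):
    """Solve Ax = b by (4.176): x_i = det(A_i)/det(A), A_i having column i replaced by b."""
    d = np.linalg.det(A)
    if d == 0.0:
        raise ValueError("det A = 0: the system is not determined")
    x = np.empty(len(b))
    for i in range(len(b)):
        Ai = A.copy()
        Ai[:, i] = b
        x[i] = np.linalg.det(Ai) / d
    return x

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
print(cramer(A, b))                      # [0.8, 1.4]
```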
4.5.2 Gauss’s Method
The idea of Gauss’s method consists in bringing the system of n linear algebraic equations with n
unknowns 


a11x1 + a12x2 + · · · + a1nxn = b1,
a21x1 + a22x2 + · · · + a2nxn = b2,
...
an1x1 + an2x2 + · · · + annxn = bn
(4.179)
to a canonical form in which the unknowns may easily be obtained. Such a form is the triangular
one in which the system (4.179) is written in the form



a11x1 + a12x2 + a13x3 + · · · + a1,n−1xn−1 + a1nxn = b1,
a22x2 + a23x3 + · · · + a1,n−1xn−1 + a2nxn = b2,
a33x3 + · · · + a3,n−1xn−1 + a3nxn = b3,
...
an−1,n−1xn−1 + an−1,nxn = bn−1,
annxn = bn.
. (4.180)
The solution of this system is given by
xn =
1
ann
bn, xn−1 =
1
an−1,n−1
(bn−1 − an−1,nxn), . . . , xi =
1
aii

bi −
n
j=i+1
aij

 , . . . ,
x1 =
1
a11

b1 −
n
j=2
a1j xj

 (4.181)
and is obtained step by step, starting from xn, continuing with xn−1, xn−2, . . . , until x1.
Observation 4.18 Obviously, equation (4.180) is not the only possible form in which the solution
can be immediately obtained. We may take, for instance, an inferior triangular form of the system,
the determination of the unknowns beginning with x1, continuing with x2, x3, . . . , until xn.
Observation 4.19 We observe from equation (4.181) that all the coefficients aii , i = 1, n, must
be nonzero.
In Gauss’s method, we multiply successively the first row with suitable values, adding then to
the rows 2, 3, . . . , n, so as to obtain a zero value for all ai1, 2 ≤ i ≤ n, remaining with a11 = 0. If
134 LINEAR ALGEBRA
a11 = 0 in the initial system, then we look for a nonzero coefficient between a21, a31, . . . , an1. If
all ai1, i = 1, n, vanish, then the procedure stops, and the variable x1 disappears in all n equations
(if not, we have a system of n equations with at most n − 1 unknowns). We continue then with the
second row and multiply it (we suppose a22 = 0) with suitable values so that all elements a32, a42,
. . . , an2 do vanish. Obviously, if a22 = 0, then we commute the second row with another row i for
which ai2 = 0. After the first step, the system (4.181) becomes



a(1)
11 x1 + a(1)
12 x2 + a(1)
13 x3 + · · · + a(1)
1n xn = b(1)
1 ,
a(1)
22 x2 + a(1)
23 x3 + · · · + a(1)
2n xn = b(1)
2 ,
a(1)
32 x2 + a(1)
33 x3 + · · · + a(1)
3n = b(1)
3 ,
...
a(1)
n2 x2 + a(1)
n3 x3 + · · · + a(1)
nn xn = b(1)
n ,
(4.182)
while, after the second step, the form of the system will be
$$\begin{cases} a^{(2)}_{11}x_1 + a^{(2)}_{12}x_2 + a^{(2)}_{13}x_3 + \cdots + a^{(2)}_{1n}x_n = b^{(2)}_1,\\ a^{(2)}_{22}x_2 + a^{(2)}_{23}x_3 + \cdots + a^{(2)}_{2n}x_n = b^{(2)}_2,\\ a^{(2)}_{33}x_3 + \cdots + a^{(2)}_{3n}x_n = b^{(2)}_3,\\ \vdots\\ a^{(2)}_{n3}x_3 + \cdots + a^{(2)}_{nn}x_n = b^{(2)}_n. \end{cases} \quad (4.183)$$
The procedure will continue until we obtain the form (4.180).
Observation 4.20 To reduce the calculation errors at step i, $i = \overline{1,n-1}$, we do not take $a^{(i)}_{ii}$ as pivot, but bring to the pivot position the element among $a^{(i)}_{ii}, a^{(i)}_{i+1,i}, \ldots, a^{(i)}_{ni}$ that is greatest in modulus. If all these elements vanish (the maximum is equal to zero), then the algorithm stops.
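The elimination, the partial pivoting of Observation 4.20, and the back substitution (4.181) may be sketched in Python as follows (an illustrative implementation, not the book's):

```python
import numpy as np

def gauss_solve(A, b, eps=1e-12):
    """Gauss's method with partial pivoting, then back substitution (4.181)."""
    A = np.asarray(A, dtype=float).copy()
    b = np.asarray(b, dtype=float).copy()
    n = len(b)
    for i in range(n - 1):
        p = i + np.argmax(np.abs(A[i:, i]))   # pivot of greatest modulus
        if abs(A[p, i]) < eps:
            raise ValueError("zero pivot column: algorithm stops")
        if p != i:
            A[[i, p]] = A[[p, i]]
            b[[i, p]] = b[[p, i]]
        for k in range(i + 1, n):
            m = A[k, i] / A[i, i]
            A[k, i:] -= m * A[i, i:]          # zero out a_ki
            b[k] -= m * b[i]
    x = np.empty(n)
    for i in range(n - 1, -1, -1):            # start from x_n (4.181)
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x
```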
4.5.3 The Gauss–Jordan Method
The Gauss–Jordan method is similar to Gauss's method, except that row i, after being multiplied by suitable values, is added not only to the rows $i+1, i+2, \ldots, n$, but to the rows $1, 2, \ldots, i-1$ as well. Thus, the matrix of the system becomes a diagonal one:
$$\begin{cases} a_{11}x_1 + 0\cdot x_2 + 0\cdot x_3 + \cdots + 0\cdot x_{n-1} + 0\cdot x_n = b_1,\\ a_{22}x_2 + 0\cdot x_3 + \cdots + 0\cdot x_{n-1} + 0\cdot x_n = b_2,\\ a_{33}x_3 + \cdots + 0\cdot x_{n-1} + 0\cdot x_n = b_3,\\ \vdots\\ a_{n-1,n-1}x_{n-1} + 0\cdot x_n = b_{n-1},\\ a_{nn}x_n = b_n. \end{cases} \quad (4.184)$$
In this case, the solution of the system (4.184) becomes
$$x_i = \frac{b_i}{a_{ii}}, \quad a_{ii} \neq 0, \quad i = \overline{1,n}. \quad (4.185)$$
The observations to Gauss’s method remain valid.
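A hedged Python sketch of the Gauss–Jordan variant (our own illustration, reusing the pivoting of Observation 4.20):

```python
import numpy as np

def gauss_jordan_solve(A, b, eps=1e-12):
    """Gauss–Jordan: eliminate above and below each pivot, reaching (4.184)."""
    A = np.asarray(A, dtype=float).copy()
    b = np.asarray(b, dtype=float).copy()
    n = len(b)
    for i in range(n):
        p = i + np.argmax(np.abs(A[i:, i]))       # partial pivoting
        if abs(A[p, i]) < eps:
            raise ValueError("zero pivot column: algorithm stops")
        A[[i, p]], b[[i, p]] = A[[p, i]], b[[p, i]]
        for k in range(n):
            if k != i:                            # rows above and below
                m = A[k, i] / A[i, i]
                A[k] -= m * A[i]
                b[k] -= m * b[i]
    return b / np.diag(A)                         # x_i = b_i / a_ii (4.185)
```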
4.5.4 The LU Factorization
The idea of the method6 consists in writing the matrix A of the linear system $Ax = b$, $A \in M_n(\mathbb{R})$, $x \in M_{n,1}(\mathbb{R})$, $b \in M_{n,1}(\mathbb{R})$, in the form
$$A = LU, \quad (4.186)$$
where $L \in M_n(\mathbb{R})$ is an inferior triangular matrix
$$L = \begin{bmatrix} l_{11} & 0 & 0 & \cdots & 0 & 0\\ l_{21} & l_{22} & 0 & \cdots & 0 & 0\\ l_{31} & l_{32} & l_{33} & \cdots & 0 & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\ l_{n-1,1} & l_{n-1,2} & l_{n-1,3} & \cdots & l_{n-1,n-1} & 0\\ l_{n,1} & l_{n,2} & l_{n,3} & \cdots & l_{n,n-1} & l_{n,n} \end{bmatrix}, \quad (4.187)$$
while $U \in M_n(\mathbb{R})$ is a superior triangular matrix
$$U = \begin{bmatrix} u_{11} & u_{12} & u_{13} & \cdots & u_{1,n-1} & u_{1n}\\ 0 & u_{22} & u_{23} & \cdots & u_{2,n-1} & u_{2,n}\\ 0 & 0 & u_{33} & \cdots & u_{3,n-1} & u_{3,n}\\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\ 0 & 0 & 0 & \cdots & u_{n-1,n-1} & u_{n-1,n}\\ 0 & 0 & 0 & \cdots & 0 & u_{n,n} \end{bmatrix}. \quad (4.188)$$
Taking into account the previous relations, we may determine the values of the elements of the matrices L and U:
$$l_{11}u_{11} = a_{11},\; l_{11}u_{12} = a_{12},\; \ldots,\; l_{11}u_{1n} = a_{1n},\; l_{21}u_{11} = a_{21},\; l_{21}u_{12} + l_{22}u_{22} = a_{22},\; \ldots,\; l_{21}u_{1n} + l_{22}u_{2n} = a_{2n},\; \ldots,\; l_{n1}u_{11} = a_{n1},\; l_{n1}u_{12} + l_{n2}u_{22} = a_{n2},\; \ldots,\; l_{n1}u_{1n} + l_{n2}u_{2n} + \cdots + l_{nn}u_{nn} = a_{nn}. \quad (4.189)$$
The system
$$Ax = b \quad (4.190)$$
now takes the form
$$LUx = b. \quad (4.191)$$
We denote
$$Ux = y \quad (4.192)$$
and we now have to solve two systems, that is,
$$Ly = b, \quad (4.193)$$
with L an inferior triangular matrix, which has the solution
$$y_i = \frac{1}{l_{ii}}\left(b_i - \sum_{j=1}^{i-1} l_{ij}y_j\right), \quad i = \overline{1,n}, \quad (4.194)$$
6The method was introduced by Alan Mathison Turing (1912–1954).
and the system
$$Ux = y, \quad (4.195)$$
where U is a superior triangular matrix, having the solution
$$x_i = \frac{1}{u_{ii}}\left(y_i - \sum_{j=i+1}^{n} u_{ij}x_j\right), \quad i = \overline{1,n}, \quad (4.196)$$
respectively.
Observation 4.21 We observe that the system (4.189) has $n^2$ equations and $n^2 + n$ unknowns. To make it determined, n unknowns must be specified a priori. Depending on the specified unknowns, the method takes various readings, presented in the following.
A. The Doolittle Method In the frame of this method,7 the values
$$l_{ii} = 1, \quad i = \overline{1,n} \quad (4.197)$$
are imposed, and the system (4.189) becomes
$$u_{11} = a_{11},\; u_{12} = a_{12},\; \ldots,\; u_{1n} = a_{1n},\; l_{21}u_{11} = a_{21},\; l_{21}u_{12} + u_{22} = a_{22},\; \ldots,\; l_{21}u_{1n} + u_{2n} = a_{2n},\; \ldots,\; l_{n1}u_{11} = a_{n1},\; l_{n1}u_{12} + l_{n2}u_{22} = a_{n2},\; \ldots,\; l_{n1}u_{1n} + l_{n2}u_{2n} + \cdots + u_{nn} = a_{nn}. \quad (4.198)$$
The solution of this system is
$$l_{ii} = 1,\; i = \overline{1,n}, \quad u_{1j} = a_{1j},\; j = \overline{1,n}, \quad l_{21} = \frac{a_{21}}{a_{11}}, \quad u_{2k} = a_{2k} - l_{21}u_{1k},\; k = \overline{2,n},\; \ldots,\; l_{n1} = \frac{a_{n1}}{u_{11}}, \quad l_{n2} = \frac{1}{u_{22}}(a_{n2} - l_{n1}u_{12}),\; \ldots,\; u_{nn} = a_{nn} - l_{n1}u_{1n} - l_{n2}u_{2n} - \cdots - l_{n,n-1}u_{n-1,n}. \quad (4.199)$$
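The recurrences (4.199) may be organized as in the following sketch; it assumes no pivoting is needed (all leading principal minors of A nonzero):

```python
import numpy as np

def doolittle(A):
    """Doolittle LU factorization with l_ii = 1 (4.197); returns L, U, A = LU."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L = np.eye(n)
    U = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):                      # row i of U
            U[i, j] = A[i, j] - L[i, :i] @ U[:i, j]
        for k in range(i + 1, n):                  # column i of L
            L[k, i] = (A[k, i] - L[k, :i] @ U[:i, i]) / U[i, i]
    return L, U
```

Once L and U are available, the forward substitution (4.194) and the back substitution (4.196) finish the solution.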
B. The Crout Method In the Crout method,8 the values
$$u_{ii} = 1, \quad i = \overline{1,n}, \quad (4.200)$$
are imposed, so that the system (4.189) becomes
$$l_{11} = a_{11},\; l_{11}u_{12} = a_{12},\; \ldots,\; l_{11}u_{1n} = a_{1n},\; l_{21} = a_{21},\; l_{21}u_{12} + l_{22} = a_{22},\; \ldots,\; l_{21}u_{1n} + l_{22}u_{2n} = a_{2n},\; \ldots,\; l_{n1} = a_{n1},\; l_{n1}u_{12} + l_{n2} = a_{n2},\; \ldots,\; l_{n1}u_{1n} + l_{n2}u_{2n} + \cdots + l_{n,n-1}u_{n-1,n} + l_{nn} = a_{nn}, \quad (4.201)$$
with the solution
$$u_{ii} = 1,\; i = \overline{1,n}, \quad l_{j1} = a_{j1},\; j = \overline{1,n}, \quad u_{12} = \frac{a_{12}}{l_{11}}, \quad l_{k2} = a_{k2} - l_{k1}u_{12},\; k = \overline{2,n},\; \ldots,\; u_{1n} = \frac{a_{1n}}{l_{11}}, \quad u_{2n} = \frac{1}{l_{22}}(a_{2n} - l_{21}u_{1n}),\; \ldots,\; l_{nn} = a_{nn} - l_{n1}u_{1n} - \cdots - l_{n,n-1}u_{n-1,n}. \quad (4.202)$$
7The method was described by Myrick Hascall Doolittle (1830–1913).
8The method was named after Prescott Durand Crout (1907–1984).
C. The Cholesky Method We suppose that the matrix A is symmetric and positive definite,9 that is,
$$A^T = A \quad (4.203)$$
and
$$x^TAx > 0 \quad (4.204)$$
for any $x \in M_{n,1}(\mathbb{R})$, $x \neq 0$. Under these conditions, we may choose the matrices L and U so that
$$U = L^T. \quad (4.205)$$
The condition A = LU, written now in the form
$$A = LL^T, \quad (4.206)$$
or, equivalently,
$$\begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n}\\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n}\\ \cdots & \cdots & \cdots & \cdots & \cdots\\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} \end{bmatrix} = \begin{bmatrix} l_{11} & 0 & 0 & \cdots & 0\\ l_{21} & l_{22} & 0 & \cdots & 0\\ \cdots & \cdots & \cdots & \cdots & \cdots\\ l_{n1} & l_{n2} & l_{n3} & \cdots & l_{nn} \end{bmatrix}\begin{bmatrix} l_{11} & l_{21} & \cdots & l_{n1}\\ 0 & l_{22} & \cdots & l_{n2}\\ \cdots & \cdots & \cdots & \cdots\\ 0 & 0 & \cdots & l_{nn} \end{bmatrix}, \quad (4.207)$$
leads to
$$l_{11}^2 = a_{11},\; l_{11}l_{21} = a_{12},\; \ldots,\; l_{11}l_{n1} = a_{1n},\; l_{21}l_{11} = a_{21},\; l_{21}^2 + l_{22}^2 = a_{22},\; \ldots,\; l_{21}l_{n1} + l_{22}l_{n2} = a_{2n},\; \ldots,\; l_{n1}l_{11} = a_{n1},\; l_{n1}l_{21} + l_{n2}l_{22} = a_{n2},\; \ldots,\; l_{n1}^2 + l_{n2}^2 + \cdots + l_{nn}^2 = a_{nn}, \quad (4.208)$$
the solution of which is
$$l_{ii} = \sqrt{a_{ii} - \sum_{k=1}^{i-1} l_{ik}^2}, \quad i = \overline{1,n}, \qquad l_{ji} = \frac{1}{l_{ii}}\left(a_{ji} - \sum_{k=1}^{i-1} l_{jk}l_{ik}\right), \quad j > i. \quad (4.209)$$
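Formulae (4.209) translate into the following sketch (assuming A symmetric positive definite):

```python
import numpy as np

def cholesky(A):
    """Cholesky factorization A = L L^T following (4.209)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L = np.zeros((n, n))
    for i in range(n):
        s = A[i, i] - L[i, :i] @ L[i, :i]
        if s <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[i, i] = np.sqrt(s)
        for j in range(i + 1, n):
            L[j, i] = (A[j, i] - L[j, :i] @ L[i, :i]) / L[i, i]
    return L
```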
4.5.5 The Schur Method of Solving Systems of Linear Equations
Let us consider the linear system
$$\begin{cases} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = b_1,\\ \vdots\\ a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n = b_n, \end{cases} \quad (4.210)$$
which we write in a condensed form as
$$Ax = b. \quad (4.211)$$
We suppose that the system is compatible and determined and that the matrix A allows a partition of the form
$$A = \begin{bmatrix} A_1 & A_2\\ A_3 & A_4 \end{bmatrix}, \quad (4.212)$$
where $A_1 \in M_p(\mathbb{R})$, $A_2 \in M_{p,n-p}(\mathbb{R})$, $A_3 \in M_{n-p,p}(\mathbb{R})$, and $A_4 \in M_{n-p}(\mathbb{R})$.
9The method was presented by Andr´e–Louis Cholesky (1876–1918).
We partition the column vectors x and b in the form
$$x = \begin{bmatrix} \mathbf{x}_1\\ \mathbf{x}_2 \end{bmatrix}, \quad b = \begin{bmatrix} \mathbf{b}_1\\ \mathbf{b}_2 \end{bmatrix}, \quad (4.213)$$
where
$$\mathbf{x}_1 = [x_1 \cdots x_p]^T, \quad \mathbf{x}_2 = [x_{p+1} \cdots x_n]^T, \quad \mathbf{b}_1 = [b_1 \cdots b_p]^T, \quad \mathbf{b}_2 = [b_{p+1} \cdots b_n]^T. \quad (4.214)$$
The system (4.211) is now written in the form
$$A_1\mathbf{x}_1 + A_2\mathbf{x}_2 = \mathbf{b}_1, \quad A_3\mathbf{x}_1 + A_4\mathbf{x}_2 = \mathbf{b}_2. \quad (4.215)$$
If the matrix $A_4$ is invertible, then the second equation (4.215) becomes
$$A_4^{-1}A_3\mathbf{x}_1 + \mathbf{x}_2 = A_4^{-1}\mathbf{b}_2, \quad (4.216)$$
from which
$$\mathbf{x}_2 = A_4^{-1}\mathbf{b}_2 - A_4^{-1}A_3\mathbf{x}_1. \quad (4.217)$$
Substituting now relation (4.217) in the first equation (4.215), we get
$$A_1\mathbf{x}_1 + A_2A_4^{-1}\mathbf{b}_2 - A_2A_4^{-1}A_3\mathbf{x}_1 = \mathbf{b}_1 \quad (4.218)$$
or, equivalently,
$$(A_1 - A_2A_4^{-1}A_3)\mathbf{x}_1 = \mathbf{b}_1 - A_2A_4^{-1}\mathbf{b}_2. \quad (4.219)$$
Now, if $A_1 - A_2A_4^{-1}A_3$ is invertible, it follows that
$$\mathbf{x}_1 = (A_1 - A_2A_4^{-1}A_3)^{-1}(\mathbf{b}_1 - A_2A_4^{-1}\mathbf{b}_2). \quad (4.220)$$
Relations (4.220) and (4.217) give the solution of the system (4.211).
The conditions of invertibility of the matrices $A_4$ and $A_1 - A_2A_4^{-1}A_3$ are just the Schur conditions for the determination of the matrix $A^{-1}$.
If the matrix $A_1$ is invertible, then the first equation (4.215) leads to
$$\mathbf{x}_1 = A_1^{-1}\mathbf{b}_1 - A_1^{-1}A_2\mathbf{x}_2, \quad (4.221)$$
while from the second equation (4.215) we obtain
$$A_3A_1^{-1}\mathbf{b}_1 - A_3A_1^{-1}A_2\mathbf{x}_2 + A_4\mathbf{x}_2 = \mathbf{b}_2, \quad (4.222)$$
from which, if $A_4 - A_3A_1^{-1}A_2$ is an invertible matrix, we get
$$\mathbf{x}_2 = (A_4 - A_3A_1^{-1}A_2)^{-1}(\mathbf{b}_2 - A_3A_1^{-1}\mathbf{b}_1). \quad (4.223)$$
In this case too, the invertibility conditions of the matrices $A_1$ and $A_4 - A_3A_1^{-1}A_2$ are Schur's conditions to determine the inverse of the matrix A.
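A minimal sketch of the block solution (4.217) and (4.220), assuming $A_4$ and the Schur complement are invertible; np.linalg.solve is used instead of explicit inverses:

```python
import numpy as np

def schur_solve(A1, A2, A3, A4, b1, b2):
    """Solve the partitioned system (4.215) via the Schur complement of A4."""
    A4inv_b2 = np.linalg.solve(A4, b2)
    A4inv_A3 = np.linalg.solve(A4, A3)
    S = A1 - A2 @ A4inv_A3                        # Schur complement
    x1 = np.linalg.solve(S, b1 - A2 @ A4inv_b2)   # (4.220)
    x2 = A4inv_b2 - A4inv_A3 @ x1                 # (4.217)
    return x1, x2
```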
Let us suppose now that the matrix $A_4$ is invertible, while Q is a nonsingular square matrix. Moreover, we assume that the relations
$$Q\mathbf{b}_2 = 0, \quad \mathbf{x}_2 = Q^T\boldsymbol{\lambda} \quad (4.224)$$
hold.
Under these conditions, the equation
$$\begin{bmatrix} A_1 & A_2\\ A_3 & A_4 \end{bmatrix}\begin{bmatrix} \mathbf{x}_1\\ \mathbf{x}_2 \end{bmatrix} = \begin{bmatrix} \mathbf{b}_1\\ \mathbf{b}_2 \end{bmatrix} \quad (4.225)$$
may be written in the form
$$\begin{bmatrix} I & 0\\ 0 & Q \end{bmatrix}\begin{bmatrix} A_1 & A_2\\ A_3 & A_4 \end{bmatrix}\begin{bmatrix} I & 0\\ 0 & Q^T \end{bmatrix}\begin{bmatrix} \mathbf{x}_1\\ \boldsymbol{\lambda} \end{bmatrix} = \begin{bmatrix} \mathbf{b}_1\\ 0 \end{bmatrix}, \quad (4.226)$$
which may be easily verified by performing the requested products and taking into account the relations (4.224). It follows that
$$\begin{bmatrix} A_1 & A_2Q^T\\ QA_3 & QA_4Q^T \end{bmatrix}\begin{bmatrix} \mathbf{x}_1\\ \boldsymbol{\lambda} \end{bmatrix} = \begin{bmatrix} \mathbf{b}_1\\ 0 \end{bmatrix}, \quad (4.227)$$
from which
$$A_1\mathbf{x}_1 + A_2Q^T\boldsymbol{\lambda} = \mathbf{b}_1, \quad QA_3\mathbf{x}_1 + QA_4Q^T\boldsymbol{\lambda} = 0. \quad (4.228)$$
From the second relation (4.228) one obtains
$$\boldsymbol{\lambda} = -(Q^T)^{-1}A_4^{-1}A_3\mathbf{x}_1, \quad (4.229)$$
which, replaced in the first relation (4.228), leads to
$$(A_1 - A_2A_4^{-1}A_3)\mathbf{x}_1 = \mathbf{b}_1. \quad (4.230)$$
If the expression between parentheses in equation (4.230) defines a nonsingular matrix, then relations (4.230) and (4.229) give the required solution, because $QA_4Q^T$ is always invertible.
Let us consider now the case in which $A_4$ is not invertible, a situation frequently encountered in the mechanics of multibody systems, where $A_4 = 0$. From the first relation (4.228), we get
$$\mathbf{x}_1 = A_1^{-1}(\mathbf{b}_1 - A_2Q^T\boldsymbol{\lambda}), \quad (4.231)$$
which, replaced in the second relation (4.228), leads to
$$(A_4 - A_3A_1^{-1}A_2)Q^T\boldsymbol{\lambda} = -A_3A_1^{-1}\mathbf{b}_1. \quad (4.232)$$
If the expression in parentheses in equation (4.232), as well as $A_1$, are nonsingular matrices, then relations (4.232) and (4.231) lead to the solution of the system (4.225) with the conditions (4.224).
In the frequently encountered particular case $A_4 = 0$, the relation (4.232) is simplified in the form
$$A_3A_1^{-1}A_2Q^T\boldsymbol{\lambda} = A_3A_1^{-1}\mathbf{b}_1, \quad (4.233)$$
from which
$$\boldsymbol{\lambda} = (A_3A_1^{-1}A_2Q^T)^{-1}A_3A_1^{-1}\mathbf{b}_1, \quad (4.234)$$
the relation (4.231) now leading to
$$\mathbf{x}_1 = A_1^{-1}\left[I - A_2Q^T(A_3A_1^{-1}A_2Q^T)^{-1}A_3A_1^{-1}\right]\mathbf{b}_1. \quad (4.235)$$
Let us now consider the case of the system
$$\begin{bmatrix} A_1 & A_2\\ A_3 & A_4 \end{bmatrix}\begin{bmatrix} \mathbf{x}_1\\ \mathbf{x}_2 \end{bmatrix} + \begin{bmatrix} \mathbf{c}_1\\ \mathbf{c}_2 \end{bmatrix} = \begin{bmatrix} \mathbf{b}_1\\ \mathbf{b}_2 \end{bmatrix}, \quad (4.236)$$
the relations (4.224) remaining valid. Proceeding analogously, we obtain the relation
$$\begin{bmatrix} A_1 & A_2Q^T\\ QA_3 & QA_4Q^T \end{bmatrix}\begin{bmatrix} \mathbf{x}_1\\ \boldsymbol{\lambda} \end{bmatrix} = \begin{bmatrix} \mathbf{b}_1 - \mathbf{c}_1\\ -Q\mathbf{c}_2 \end{bmatrix}, \quad (4.237)$$
resulting in the system
$$A_1\mathbf{x}_1 + A_2Q^T\boldsymbol{\lambda} = \mathbf{b}_1 - \mathbf{c}_1, \quad QA_3\mathbf{x}_1 + QA_4Q^T\boldsymbol{\lambda} = -Q\mathbf{c}_2. \quad (4.238)$$
If $A_4$ is invertible, then the last relation (4.238) leads to
$$\boldsymbol{\lambda} = -(Q^T)^{-1}A_4^{-1}(\mathbf{c}_2 + A_3\mathbf{x}_1), \quad (4.239)$$
which, replaced in the first equation (4.238), allows us to write
$$(A_1 - A_2A_4^{-1}A_3)\mathbf{x}_1 = \mathbf{b}_1 - \mathbf{c}_1 + A_2A_4^{-1}\mathbf{c}_2. \quad (4.240)$$
If the expression between parentheses of equation (4.240) defines a nonsingular matrix, then formulae (4.240) and (4.239) give the required solution.
If $A_1$ is invertible, then the first relation (4.238) leads to
$$\mathbf{x}_1 = A_1^{-1}(\mathbf{b}_1 - \mathbf{c}_1 - A_2Q^T\boldsymbol{\lambda}), \quad (4.241)$$
which, replaced in the second equation (4.238), allows us to write
$$(A_4 - A_3A_1^{-1}A_2)Q^T\boldsymbol{\lambda} = -[\mathbf{c}_2 + A_3A_1^{-1}(\mathbf{b}_1 - \mathbf{c}_1)]. \quad (4.242)$$
If the expression between the parentheses in the left-hand term of the formula (4.242) defines an invertible matrix, then relations (4.242) and (4.241) give the solution we require.
In the particular case defined by $A_4 = 0$, we obtain, from relations (4.242) and (4.241), the formulae
$$\boldsymbol{\lambda} = (A_3A_1^{-1}A_2Q^T)^{-1}[\mathbf{c}_2 + A_3A_1^{-1}(\mathbf{b}_1 - \mathbf{c}_1)], \quad (4.243)$$
$$\mathbf{x}_1 = A_1^{-1}\{\mathbf{b}_1 - \mathbf{c}_1 - A_2(A_3A_1^{-1}A_2)^{-1}[\mathbf{c}_2 + A_3A_1^{-1}(\mathbf{b}_1 - \mathbf{c}_1)]\}. \quad (4.244)$$
Let us now modify the second condition (4.224) in the form
$$\mathbf{x}_2 = Q^T\boldsymbol{\lambda} + \boldsymbol{\beta}. \quad (4.245)$$
The system (4.236) now becomes
$$\begin{bmatrix} A_1 & A_2Q^T\\ QA_3 & QA_4Q^T \end{bmatrix}\begin{bmatrix} \mathbf{x}_1\\ \boldsymbol{\lambda} \end{bmatrix} = \begin{bmatrix} \mathbf{b}_1 - \mathbf{c}_1 - A_2\boldsymbol{\beta}\\ -Q\mathbf{c}_2 - QA_4\boldsymbol{\beta} \end{bmatrix}, \quad (4.246)$$
from which we get
$$A_1\mathbf{x}_1 + A_2Q^T\boldsymbol{\lambda} = \mathbf{b}_1 - \mathbf{c}_1 - A_2\boldsymbol{\beta}, \quad QA_3\mathbf{x}_1 + QA_4Q^T\boldsymbol{\lambda} = -Q\mathbf{c}_2 - QA_4\boldsymbol{\beta}. \quad (4.247)$$
If $A_4$ is invertible, then the last relation (4.247) leads to
$$\boldsymbol{\lambda} = -(A_4Q^T)^{-1}(A_3\mathbf{x}_1 + \mathbf{c}_2 + A_4\boldsymbol{\beta}), \quad (4.248)$$
which, replaced in the first equation (4.247), allows us to write
$$(A_1 - A_2A_4^{-1}A_3)\mathbf{x}_1 = \mathbf{b}_1 - \mathbf{c}_1 - A_2\boldsymbol{\beta} + A_2A_4^{-1}(\mathbf{c}_2 + A_4\boldsymbol{\beta}). \quad (4.249)$$
If the expression between the parentheses on the left-hand side of this formula defines an invertible matrix, then relations (4.249) and (4.248) give the required answer.
If $A_1$ is invertible, then the first relation (4.247) leads to
$$\mathbf{x}_1 = A_1^{-1}(\mathbf{b}_1 - \mathbf{c}_1 - A_2\boldsymbol{\beta}) - A_1^{-1}A_2Q^T\boldsymbol{\lambda}, \quad (4.250)$$
which, replaced in the last relation (4.247), allows us to write
$$(A_4 - A_3A_1^{-1}A_2)Q^T\boldsymbol{\lambda} = -\mathbf{c}_2 - A_4\boldsymbol{\beta} - A_3A_1^{-1}(\mathbf{b}_1 - \mathbf{c}_1 - A_2\boldsymbol{\beta}). \quad (4.251)$$
If the parentheses of the left-hand side of the previous relation define a nonsingular matrix, then the relations (4.251) and (4.250) constitute the required answer.
In the particular case given by $A_4 = 0$, formulae (4.251) and (4.250) are simplified in the form
$$\boldsymbol{\lambda} = (A_3A_1^{-1}A_2Q^T)^{-1}[\mathbf{c}_2 + A_3A_1^{-1}(\mathbf{b}_1 - \mathbf{c}_1 - A_2\boldsymbol{\beta})], \quad (4.252)$$
$$\mathbf{x}_1 = A_1^{-1}(\mathbf{b}_1 - \mathbf{c}_1 - A_2\boldsymbol{\beta}) - A_1^{-1}A_2(A_3A_1^{-1}A_2)^{-1}[\mathbf{c}_2 + A_3A_1^{-1}(\mathbf{b}_1 - \mathbf{c}_1 - A_2\boldsymbol{\beta})]. \quad (4.253)$$
Observation 4.22 The theory presented above remains valid also in the case in which we renounce the condition that Q be invertible. The only condition required is that Q should be a full-rank matrix.
Considering now the system (4.247), if $A_4$ is invertible, then the last equation (4.247) leads to
$$\boldsymbol{\lambda} = (QA_4Q^T)^{-1}(-Q\mathbf{c}_2 - QA_4\boldsymbol{\beta} - QA_3\mathbf{x}_1), \quad (4.254)$$
while the first relation (4.247) gives
$$[A_1 - A_2Q^T(QA_4Q^T)^{-1}QA_3]\mathbf{x}_1 = \mathbf{b}_1 - \mathbf{c}_1 - A_2\boldsymbol{\beta} + A_2Q^T(QA_4Q^T)^{-1}(Q\mathbf{c}_2 + QA_4\boldsymbol{\beta}). \quad (4.255)$$
If the square brackets on the left-hand side of this formula define an invertible matrix, then formulae (4.255) and (4.254) give the required answer.
If $A_1$ is invertible, then, from the first relation (4.247), we get
$$\mathbf{x}_1 = A_1^{-1}(\mathbf{b}_1 - \mathbf{c}_1 - A_2\boldsymbol{\beta} - A_2Q^T\boldsymbol{\lambda}), \quad (4.256)$$
which, replaced in the second formula (4.247), leads to
$$(QA_3A_1^{-1}A_2Q^T - QA_4Q^T)\boldsymbol{\lambda} = Q\mathbf{c}_2 + QA_4\boldsymbol{\beta} + QA_3A_1^{-1}(\mathbf{b}_1 - \mathbf{c}_1 - A_2\boldsymbol{\beta}). \quad (4.257)$$
If the parentheses on the left-hand side of equation (4.257) define a nonsingular matrix, then the formulae (4.257) and (4.256) give the desired answer.
If $A_4 = 0$, then the relation (4.257) may be written in the form
$$QA_3A_1^{-1}A_2Q^T\boldsymbol{\lambda} = Q\mathbf{c}_2 + QA_3A_1^{-1}(\mathbf{b}_1 - \mathbf{c}_1 - A_2\boldsymbol{\beta}). \quad (4.258)$$
If $\boldsymbol{\beta} = 0$, then the relations (4.254)–(4.258) become
$$\boldsymbol{\lambda} = (QA_4Q^T)^{-1}(-Q\mathbf{c}_2 - QA_3\mathbf{x}_1), \quad (4.259)$$
$$[A_1 - A_2Q^T(QA_4Q^T)^{-1}QA_3]\mathbf{x}_1 = \mathbf{b}_1 - \mathbf{c}_1 + A_2Q^T(QA_4Q^T)^{-1}Q\mathbf{c}_2, \quad (4.260)$$
$$\mathbf{x}_1 = A_1^{-1}(\mathbf{b}_1 - \mathbf{c}_1 - A_2Q^T\boldsymbol{\lambda}), \quad (4.261)$$
$$(QA_3A_1^{-1}A_2Q^T - QA_4Q^T)\boldsymbol{\lambda} = Q\mathbf{c}_2 + QA_3A_1^{-1}(\mathbf{b}_1 - \mathbf{c}_1), \quad (4.262)$$
$$QA_3A_1^{-1}A_2Q^T\boldsymbol{\lambda} = Q\mathbf{c}_2 + QA_3A_1^{-1}(\mathbf{b}_1 - \mathbf{c}_1). \quad (4.263)$$
If we also have $\mathbf{c}_1 = 0$, $\mathbf{c}_2 = 0$, then the last relations are simplified further, and we are led to
$$\boldsymbol{\lambda} = -(QA_4Q^T)^{-1}QA_3\mathbf{x}_1, \quad (4.264)$$
$$[A_1 - A_2Q^T(QA_4Q^T)^{-1}QA_3]\mathbf{x}_1 = \mathbf{b}_1, \quad (4.265)$$
$$\mathbf{x}_1 = A_1^{-1}(\mathbf{b}_1 - A_2Q^T\boldsymbol{\lambda}), \quad (4.266)$$
$$(QA_3A_1^{-1}A_2Q^T - QA_4Q^T)\boldsymbol{\lambda} = QA_3A_1^{-1}\mathbf{b}_1, \quad (4.267)$$
$$QA_3A_1^{-1}A_2Q^T\boldsymbol{\lambda} = QA_3A_1^{-1}\mathbf{b}_1. \quad (4.268)$$
4.5.6 The Iteration Method (Jacobi)
Let us consider the system of linear equations10
$$\begin{cases} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = b_1,\\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n = b_2,\\ \vdots\\ a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{in}x_n = b_i,\\ \vdots\\ a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n = b_n, \end{cases} \quad (4.269)$$
which may also be written in the matrix form
$$Ax = b, \quad (4.270)$$
where
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \cdots & \cdots & \cdots & \cdots\\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}, \quad b = [b_1\; b_2\; \cdots\; b_n]^T, \quad x = [x_1\; x_2\; \cdots\; x_n]^T. \quad (4.271)$$
We suppose that $a_{ii} \neq 0$, $i = \overline{1,n}$, in the system (4.269) and that A is nonsingular.
10The method was named after Carl Gustav Jacob Jacobi (1804–1851).
Solving equation i of the system (4.269) for the unknown $x_i$, one may write
$$\begin{cases} x_1 = \dfrac{b_1}{a_{11}} - \dfrac{a_{12}}{a_{11}}x_2 - \dfrac{a_{13}}{a_{11}}x_3 - \cdots - \dfrac{a_{1n}}{a_{11}}x_n,\\[1mm] x_2 = \dfrac{b_2}{a_{22}} - \dfrac{a_{21}}{a_{22}}x_1 - \dfrac{a_{23}}{a_{22}}x_3 - \cdots - \dfrac{a_{2n}}{a_{22}}x_n,\\ \vdots\\ x_i = \dfrac{b_i}{a_{ii}} - \dfrac{a_{i1}}{a_{ii}}x_1 - \dfrac{a_{i2}}{a_{ii}}x_2 - \cdots - \dfrac{a_{in}}{a_{ii}}x_n,\\ \vdots\\ x_n = \dfrac{b_n}{a_{nn}} - \dfrac{a_{n1}}{a_{nn}}x_1 - \dfrac{a_{n2}}{a_{nn}}x_2 - \cdots - \dfrac{a_{n,n-1}}{a_{nn}}x_{n-1}. \end{cases} \quad (4.272)$$
Let us denote
$$\beta_i = \frac{b_i}{a_{ii}}, \quad i = \overline{1,n}, \quad (4.273)$$
$$\alpha_{ij} = -\frac{a_{ij}}{a_{ii}}, \quad i,j = \overline{1,n},\; i \neq j, \qquad \alpha_{ij} = 0,\; i = j. \quad (4.274)$$
It follows that
$$\alpha = \begin{bmatrix} 0 & \alpha_{12} & \cdots & \alpha_{1,n-1} & \alpha_{1n}\\ \alpha_{21} & 0 & \cdots & \alpha_{2,n-1} & \alpha_{2n}\\ \cdots & \cdots & \cdots & \cdots & \cdots\\ \alpha_{n1} & \alpha_{n2} & \cdots & \alpha_{n,n-1} & 0 \end{bmatrix}, \quad \beta = \begin{bmatrix} \beta_1\\ \beta_2\\ \vdots\\ \beta_n \end{bmatrix}, \quad (4.275)$$
so that the system (4.272) becomes
$$x = \beta + \alpha x. \quad (4.276)$$
Let $x^{(0)} \in M_{n,1}(\mathbb{R})$ be an initial approximation of the solution of the system (4.276). We define the sequence of iterations
$$x^{(1)} = \beta + \alpha x^{(0)}, \quad x^{(2)} = \beta + \alpha x^{(1)}, \;\ldots,\; x^{(k+1)} = \beta + \alpha x^{(k)}, \;\ldots, \quad (4.277)$$
where $k \in \mathbb{N}^*$. Let us suppose that the sequence $x^{(0)}, x^{(1)}, \ldots, x^{(k)}, \ldots$ is convergent and let
$$x = \lim_{k\to\infty} x^{(k)} \quad (4.278)$$
be its limit. It follows that
$$x = \beta + \alpha x, \quad (4.279)$$
hence x is the solution of the system (4.276).
Proposition 4.10 A sufficient condition for the convergence of the sequence of successive iterations
$$x^{(k+1)} = \beta + \alpha x^{(k)}, \quad k \in \mathbb{N}^*, \quad x^{(0)} \text{ arbitrary}, \quad (4.280)$$
is $\|\alpha\| < 1$, where $\|\cdot\|$ is one of the canonical norms.
Demonstration. We may write
$$x^{(k)} = \beta + \alpha x^{(k-1)} = \beta + \alpha(\beta + \alpha x^{(k-2)}) = (I_n + \alpha)\beta + \alpha^2x^{(k-2)}. \quad (4.281)$$
We get, in general,
$$x^{(k)} = (I_n + \alpha + \alpha^2 + \cdots + \alpha^{k-1})\beta + \alpha^kx^{(0)}, \quad (4.282)$$
where $I_n$ is the unit matrix of order n.
On the other hand, from $\|\alpha\| < 1$ and because $\|\cdot\|$ is canonical, we also have
$$\|\alpha^k\| \le \|\alpha\|^k. \quad (4.283)$$
It follows that $\|\alpha^k\| \to 0$ for $k \to \infty$, because $\|\alpha\|^k \to 0$ for $k \to \infty$. One obtains
$$\lim_{k\to\infty} \alpha^k = 0. \quad (4.284)$$
Then
$$\lim_{k\to\infty}(I_n + \alpha + \cdots + \alpha^{k-1}) = (I_n - \alpha)^{-1} \quad (4.285)$$
and, passing to the limit in (4.282), it follows that
$$x = (I_n - \alpha)^{-1}\beta, \quad (4.286)$$
from which
$$(I_n - \alpha)x = \beta, \quad (4.287)$$
which is just the relation (4.279), showing that x is a solution of the system (4.276), hence of the system (4.269).
Observation 4.23 Instead of the sequence of successive iterations $x^{(0)}, x^{(1)}, \ldots, x^{(k)}, \ldots$, we may consider the sequence
$$y^{(0)} = x^{(0)}, \quad y^{(k)} = x^{(k)} - x^{(k-1)}, \quad k \in \mathbb{N}^*. \quad (4.288)$$
We get
$$y^{(k+1)} = x^{(k+1)} - x^{(k)} = \beta + \alpha x^{(k)} - \beta - \alpha x^{(k-1)}, \quad (4.289)$$
from which
$$y^{(k+1)} = \alpha(x^{(k)} - x^{(k-1)}), \quad k \in \mathbb{N}^*, \quad (4.290)$$
hence
$$y^{(k+1)} = \alpha y^{(k)}, \quad k \in \mathbb{N}^*. \quad (4.291)$$
On the other hand,
$$x^{(k+1)} = \sum_{i=0}^{k+1} y^{(i)} = x^{(0)} + \sum_{i=1}^{k+1} \alpha^{i-1}y^{(1)}. \quad (4.292)$$
Observation 4.24
(i) If $x^{(0)} = \beta$, then the sequence of successive iterations takes a particular form:
$$x^{(0)} = \beta, \quad x^{(1)} = \beta + \alpha x^{(0)} = (I_n + \alpha)\beta, \quad x^{(2)} = \beta + \alpha x^{(1)} = \beta + \alpha\beta + \alpha^2\beta = (I_n + \alpha + \alpha^2)\beta, \;\ldots,\; x^{(k)} = (I_n + \alpha + \alpha^2 + \cdots + \alpha^k)\beta, \;\ldots \quad (4.293)$$
(ii) For $x^{(0)} = \beta$, relation (4.292) is written in the form
$$x^{(k+1)} = \sum_{i=0}^{k+1} \alpha^i\beta, \quad (4.294)$$
where $\alpha^0 = I_n$.
Proposition 4.11 (Estimation of the Error). Under the above conditions, the relation
$$\|x^{(k)} - x\| \le \frac{1}{1 - \|\alpha\|}\,\|x^{(k+1)} - x^{(k)}\| \le \frac{\|\alpha\|^k\,\|x^{(1)} - x^{(0)}\|}{1 - \|\alpha\|} \quad (4.295)$$
follows.
Demonstration. Let $x^{(m+1)}$ and $x^{(m)}$ be two consecutive iterations, with $m \in \mathbb{N}^*$. We have
$$x^{(m+1)} - x^{(m)} = \beta + \alpha x^{(m)} - \beta - \alpha x^{(m-1)} = \alpha(x^{(m)} - x^{(m-1)}). \quad (4.296)$$
It follows that
$$x^{(m+1)} - x^{(m)} = \alpha^{m-k}(x^{(k+1)} - x^{(k)}) = \alpha^m(x^{(1)} - x^{(0)}) \quad (4.297)$$
for any $1 \le k < m$. Passing to the norm in the relation (4.297), it follows that
$$\|x^{(m+1)} - x^{(m)}\| \le \|\alpha\|^{m-k}\,\|x^{(k+1)} - x^{(k)}\| \le \|\alpha\|^m\,\|x^{(1)} - x^{(0)}\|. \quad (4.298)$$
Let $p \in \mathbb{N}^*$ be arbitrary. We calculate
$$\|x^{(k+p)} - x^{(k)}\| = \|x^{(k+p)} - x^{(k+p-1)} + x^{(k+p-1)} - \cdots - x^{(k+1)} + x^{(k+1)} - x^{(k)}\| \le \|x^{(k+p)} - x^{(k+p-1)}\| + \|x^{(k+p-1)} - x^{(k+p-2)}\| + \cdots + \|x^{(k+1)} - x^{(k)}\|. \quad (4.299)$$
From (4.298), we get
$$\|x^{(k+p)} - x^{(k+p-1)}\| \le \|\alpha\|^{p-1}\|x^{(k+1)} - x^{(k)}\|, \quad \|x^{(k+p-1)} - x^{(k+p-2)}\| \le \|\alpha\|^{p-2}\|x^{(k+1)} - x^{(k)}\|, \;\ldots,\; \|x^{(k+2)} - x^{(k+1)}\| \le \|\alpha\|\,\|x^{(k+1)} - x^{(k)}\|, \quad (4.300)$$
so that the relation (4.299) leads to
$$\|x^{(k+p)} - x^{(k)}\| \le (\|\alpha\|^{p-1} + \cdots + \|\alpha\| + 1)\,\|x^{(k+1)} - x^{(k)}\| = \frac{1 - \|\alpha\|^p}{1 - \|\alpha\|}\,\|x^{(k+1)} - x^{(k)}\| \le \frac{1}{1 - \|\alpha\|}\,\|x^{(k+1)} - x^{(k)}\|. \quad (4.301)$$
Taking into account that
$$\|x^{(k+1)} - x^{(k)}\| \le \|\alpha\|\,\|x^{(k)} - x^{(k-1)}\| \le \|\alpha\|^2\,\|x^{(k-1)} - x^{(k-2)}\| \le \cdots \le \|\alpha\|^k\,\|x^{(1)} - x^{(0)}\|, \quad (4.302)$$
we get
$$\|x^{(k+p)} - x^{(k)}\| \le \frac{1}{1 - \|\alpha\|}\,\|x^{(k+1)} - x^{(k)}\| \le \frac{\|\alpha\|^k}{1 - \|\alpha\|}\,\|x^{(1)} - x^{(0)}\| \quad (4.303)$$
from the formula (4.301).
We now pass to the limit for $p \to \infty$ in the last relation and take into account $\lim_{p\to\infty} x^{(k+p)} = x$, obtaining the relation (4.295), which had to be proved.
Corollary 4.4 If $x^{(0)} = \beta$, then the relation (4.295) becomes
$$\|x^{(k)} - x\| \le \frac{1}{1 - \|\alpha\|}\,\|x^{(k+1)} - x^{(k)}\| \le \frac{\|\alpha\|^{k+1}}{1 - \|\alpha\|}\,\|\beta\|. \quad (4.304)$$
Demonstration. We have
$$x^{(0)} = \beta, \quad x^{(1)} = (I_n + \alpha)\beta, \quad x^{(2)} = (I_n + \alpha + \alpha^2)\beta, \;\ldots,\; x^{(m)} = (I_n + \alpha + \alpha^2 + \cdots + \alpha^m)\beta \quad (4.305)$$
for $x^{(0)} = \beta$, so that
$$\|x^{(k+1)} - x^{(k)}\| = \|\alpha^{k+1}\beta\| \le \|\alpha\|^{k+1}\|\beta\| \quad (4.306)$$
and the relation (4.304) is obvious.
Observation 4.25
(i) A priori estimation of the error: The formula (4.295), written in the form
$$\|x^{(k)} - x\| \le \frac{\|\alpha\|^k\,\|x^{(1)} - x^{(0)}\|}{1 - \|\alpha\|} < \varepsilon, \quad (4.307)$$
leads to the a priori estimation of the error in the iteration method. So, to determine the solution x with an imposed precision ε, we must make a number of iterations given by
$$k = \left\lfloor \frac{\ln\!\left[\varepsilon(1 - \|\alpha\|)/\|x^{(1)} - x^{(0)}\|\right]}{\ln\|\alpha\|} \right\rfloor + 1, \quad (4.308)$$
where the external brackets mark the entire (integer) part.
(ii) A posteriori estimation of the error: This estimation is given by the formula (4.295) written in the form
$$\|x^{(k)} - x\| \le \frac{1}{1 - \|\alpha\|}\,\|x^{(k+1)} - x^{(k)}\| < \varepsilon. \quad (4.309)$$
Hence, to determine x with an imposed precision ε, we must iterate until the difference between two successive iterations $x^{(k)}$ and $x^{(k+1)}$ verifies the relation
$$\|x^{(k+1)} - x^{(k)}\| < \varepsilon(1 - \|\alpha\|). \quad (4.310)$$
Observation 4.26 A sufficient condition to have $\|\alpha\| < 1$ is given by the relation
$$|a_{ii}| > \sum_{\substack{j=1\\ j\neq i}}^{n} |a_{ij}|, \quad i = \overline{1,n}; \quad (4.311)$$
thus, it follows that $\|\alpha\|_\infty < 1$. Analogously, if
$$|a_{jj}| > \sum_{\substack{i=1\\ i\neq j}}^{n} |a_{ij}|, \quad j = \overline{1,n}, \quad (4.312)$$
then we get $\|\alpha\|_1 < 1$.
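The iteration (4.277) with the a posteriori stopping rule (4.310) may be sketched as follows (our illustration; it assumes $\|\alpha\|_\infty < 1$, e.g., via (4.311)):

```python
import numpy as np

def jacobi(A, b, x0=None, eps=1e-10, kmax=1000):
    """Jacobi iteration x^{(k+1)} = beta + alpha x^{(k)} (4.277)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    d = np.diag(A)
    alpha = -A / d[:, None]
    np.fill_diagonal(alpha, 0.0)          # alpha_ii = 0 (4.274)
    beta = b / d                          # beta_i = b_i / a_ii (4.273)
    na = np.linalg.norm(alpha, np.inf)    # must be < 1 for (4.310) to apply
    x = beta.copy() if x0 is None else np.asarray(x0, dtype=float)
    for _ in range(kmax):
        x_new = beta + alpha @ x
        if np.linalg.norm(x_new - x, np.inf) < eps * (1.0 - na):
            return x_new                  # a posteriori test (4.310)
        x = x_new
    return x
```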
4.5.7 The Gauss–Seidel Method
The Gauss–Seidel method11 is a variant of the iteration method; at the step $k + 1$, for the determination of $x^{(k+1)}_i$, one uses the values $x^{(k+1)}_1, \ldots, x^{(k+1)}_{i-1}$ (already obtained at this step) and the values $x^{(k)}_{i+1}, \ldots, x^{(k)}_n$ (determined at the preceding step). We may write
$$x^{(k+1)}_1 = \beta_1 + \sum_{j=1}^{n} \alpha_{1j}x^{(k)}_j, \quad x^{(k+1)}_2 = \beta_2 + \alpha_{21}x^{(k+1)}_1 + \sum_{j=2}^{n} \alpha_{2j}x^{(k)}_j, \;\ldots,$$
$$x^{(k+1)}_i = \beta_i + \sum_{j=1}^{i-1} \alpha_{ij}x^{(k+1)}_j + \sum_{j=i+1}^{n} \alpha_{ij}x^{(k)}_j, \;\ldots,\; x^{(k+1)}_n = \beta_n + \sum_{j=1}^{n-1} \alpha_{nj}x^{(k+1)}_j. \quad (4.313)$$
Proposition 4.12 Let $x = \alpha x + \beta$ be the linear system, with $\|\alpha\|_\infty < 1$. Under these conditions, the iterative Gauss–Seidel process described by the relations (4.313) converges to the unique solution of the system for any choice of the initial value $x^{(0)}$.
Demonstration. The component $x^{(k)}_i$ is given by
$$x^{(k)}_i = \sum_{j=1}^{i-1} \alpha_{ij}x^{(k)}_j + \sum_{j=i+1}^{n} \alpha_{ij}x^{(k-1)}_j + \beta_i, \quad i = \overline{1,n}. \quad (4.314)$$
On the other hand,
$$x_i = \sum_{j=1}^{n} \alpha_{ij}x_j + \beta_i, \quad i = \overline{1,n}, \quad (4.315)$$
and, by subtracting the relation (4.314) from relation (4.315) term by term, we obtain
$$x_i - x^{(k)}_i = \sum_{j=1}^{i-1} \alpha_{ij}(x_j - x^{(k)}_j) + \sum_{j=i+1}^{n} \alpha_{ij}(x_j - x^{(k-1)}_j). \quad (4.316)$$
We apply the modulus in the last relation and obtain
$$|x_i - x^{(k)}_i| \le \sum_{j=1}^{i-1} |\alpha_{ij}||x_j - x^{(k)}_j| + \sum_{j=i+1}^{n} |\alpha_{ij}||x_j - x^{(k-1)}_j|, \quad i = \overline{1,n}. \quad (4.317)$$
But
$$|x_i - x^{(k)}_i| \le \|x - x^{(k)}\|_\infty, \quad (4.318)$$
because $\|\cdot\|_\infty$ is a canonical norm, and hence, denoting $\lambda_i = \sum_{j=1}^{i-1} |\alpha_{ij}|$ and $\mu_i = \sum_{j=i+1}^{n} |\alpha_{ij}|$,
$$|x_i - x^{(k)}_i| \le \lambda_i\,\|x - x^{(k)}\|_\infty + \mu_i\,\|x - x^{(k-1)}\|_\infty. \quad (4.319)$$
Let us denote by m the value of the index $i = \overline{1,n}$ for which $|x_m - x^{(k)}_m|$ attains the norm $\|x - x^{(k)}\|_\infty$, hence
$$|x_m - x^{(k)}_m| = \max_{1\le i\le n} |x_i - x^{(k)}_i| = \|x - x^{(k)}\|_\infty. \quad (4.320)$$
11The method is named after Carl Friedrich Gauss (1777–1855) and Philipp Ludwig von Seidel (1821–1896).
We have
$$\|x - x^{(k)}\|_\infty \le \lambda_m\,\|x - x^{(k)}\|_\infty + \mu_m\,\|x - x^{(k-1)}\|_\infty, \quad (4.321)$$
hence
$$\|x - x^{(k)}\|_\infty \le \frac{\mu_m}{1 - \lambda_m}\,\|x - x^{(k-1)}\|_\infty. \quad (4.322)$$
Let
$$q = \max_{1\le i\le n} \frac{\mu_i}{1 - \lambda_i}. \quad (4.323)$$
Let us show that $q \le \|\alpha\|_\infty < 1$. Now,
$$\lambda_i + \mu_i = \sum_{j=1}^{n} |\alpha_{ij}| \le \|\alpha\|_\infty, \quad (4.324)$$
from which
$$\mu_i \le \|\alpha\|_\infty - \lambda_i, \quad i = \overline{1,n}, \quad (4.325)$$
with $\|\alpha\|_\infty < 1$. We may also write
$$\frac{\mu_i}{1 - \lambda_i} \le \frac{\|\alpha\|_\infty - \lambda_i}{1 - \lambda_i} \le \frac{\|\alpha\|_\infty - \lambda_i\|\alpha\|_\infty}{1 - \lambda_i} = \|\alpha\|_\infty < 1, \quad (4.326)$$
hence $q \le \|\alpha\|_\infty$.
The relation (4.322) now leads to the sequence of inequalities
$$\|x - x^{(k)}\|_\infty \le q\,\|x - x^{(k-1)}\|_\infty \le q^2\,\|x - x^{(k-2)}\|_\infty \le \cdots \le q^k\,\|x - x^{(0)}\|_\infty \quad (4.327)$$
and, by passing to the limit as $k \to \infty$, we get
$$\lim_{k\to\infty} x^{(k)} = x \quad (4.328)$$
and the proposition is thus proved.
Proposition 4.13 (Error Estimation). Under the above conditions, the inequalities
$$\|x^{(k)} - x\|_\infty \le \frac{1}{1 - q}\,\|x^{(k+1)} - x^{(k)}\|_\infty \le \frac{q^k}{1 - q}\,\|x^{(1)} - x^{(0)}\|_\infty \quad (4.329)$$
result.
Demonstration. The proof is analogous to that of Proposition 4.11.
Observation 4.27 Obviously, the formulae for error estimation are
$$\|x^{(k)} - x\|_\infty \le \frac{q^k}{1 - q}\,\|x^{(1)} - x^{(0)}\|_\infty < \varepsilon \quad (4.330)$$
and
$$\|x^{(k)} - x\|_\infty \le \frac{1}{1 - q}\,\|x^{(k+1)} - x^{(k)}\|_\infty < \varepsilon, \quad (4.331)$$
respectively.
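A sketch of the sweep (4.313), written directly in terms of A and b and with a simple stopping test on successive iterates (our own illustration):

```python
import numpy as np

def gauss_seidel(A, b, x0=None, eps=1e-10, kmax=1000):
    """Gauss-Seidel: new components are used as soon as available (4.313)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    n = len(b)
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float).copy()
    for _ in range(kmax):
        x_old = x.copy()
        for i in range(n):
            s = A[i, :i] @ x[:i] + A[i, i + 1:] @ x_old[i + 1:]
            x[i] = (b[i] - s) / A[i, i]
        if np.linalg.norm(x - x_old, np.inf) < eps:
            return x
    return x
```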
4.5.8 The Relaxation Method
Let the linear system be given by
$$\begin{cases} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = b_1,\\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n = b_2,\\ \vdots\\ a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n = b_n, \end{cases} \quad (4.332)$$
which we assume to be compatible and determined, with $a_{ii} \neq 0$, $i = \overline{1,n}$. Dividing row i by $a_{ii}$, $i = \overline{1,n}$, and moving all terms to the left-hand side, one obtains the system
$$\begin{cases} -x_1 + \gamma_{12}x_2 + \cdots + \gamma_{1n}x_n + \delta_1 = 0,\\ \gamma_{21}x_1 - x_2 + \cdots + \gamma_{2n}x_n + \delta_2 = 0,\\ \vdots\\ \gamma_{n1}x_1 + \gamma_{n2}x_2 + \cdots - x_n + \delta_n = 0, \end{cases} \quad (4.333)$$
where
$$\gamma_{ij} = -\frac{a_{ij}}{a_{ii}}, \quad \delta_i = \frac{b_i}{a_{ii}}, \quad i,j = \overline{1,n},\; i \neq j. \quad (4.334)$$
Let $x^{(0)} = [x^{(0)}_1\; x^{(0)}_2\; \cdots\; x^{(0)}_n]^T$ be an approximation of the solution of the system (4.333), which we substitute into it. We thus obtain residuals of the form
$$R^{(0)}_1 = -x^{(0)}_1 + \sum_{j=2}^{n} \gamma_{1j}x^{(0)}_j + \delta_1, \quad R^{(0)}_2 = -x^{(0)}_2 + \sum_{\substack{j=1\\ j\neq 2}}^{n} \gamma_{2j}x^{(0)}_j + \delta_2, \;\ldots,\; R^{(0)}_n = -x^{(0)}_n + \sum_{j=1}^{n-1} \gamma_{nj}x^{(0)}_j + \delta_n. \quad (4.335)$$
Let
$$|R^{(0)}_k| = \max\{|R^{(0)}_1|, |R^{(0)}_2|, \ldots, |R^{(0)}_n|\} \quad (4.336)$$
be the maximum of the moduli of these residuals, and let us give to $x_k$ the value $x_k + R^{(0)}_k$. At this point, $R^{(1)}_k = 0$ and the other residuals become
$$R^{(1)}_i = R^{(0)}_i + \gamma_{ik}R^{(0)}_k, \quad i = \overline{1,n},\; i \neq k. \quad (4.337)$$
Among the residuals $R^{(1)}_i$, $i = \overline{1,n}$, one will be maximum in modulus, say $R^{(1)}_l$. We give to $x_l$ the increment $R^{(1)}_l$; it follows that $R^{(2)}_l = 0$ and
$$R^{(2)}_i = R^{(1)}_i + \gamma_{il}R^{(1)}_l, \quad i = \overline{1,n},\; i \neq l. \quad (4.338)$$
The process continues either until one obtains the desired precision, or until $R^{(p)}_i = 0$, $i = \overline{1,n}$, at some step.
The solution of the system is given by
$$x_i = x^{(0)}_i + \sum_{k=1}^{p} R^{(k)}_i, \quad (4.339)$$
where p is the number of iteration steps performed.
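A sketch of the procedure (4.335)–(4.338), zeroing at each step the residual of greatest modulus (illustrative only):

```python
import numpy as np

def relaxation(A, b, x0=None, eps=1e-10, pmax=10000):
    """Relaxation method for Ax = b with nonzero diagonal."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    n = len(b)
    d = np.diag(A)
    gamma = -A / d[:, None]
    np.fill_diagonal(gamma, 0.0)
    delta = b / d
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float).copy()
    R = -x + gamma @ x + delta            # residuals (4.335)
    for _ in range(pmax):
        k = int(np.argmax(np.abs(R)))     # index of greatest residual (4.336)
        if abs(R[k]) < eps:
            break
        Rk = R[k]
        x[k] += Rk                        # give x_k the increment R_k
        R += gamma[:, k] * Rk             # update the other residuals (4.337)
        R[k] = 0.0
    return x
```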
4.5.9 The Monte Carlo Method
Let us consider the linear system12
$$Ax = b, \quad A \in M_n(\mathbb{R}), \quad x, b \in M_{n,1}(\mathbb{R}), \quad (4.340)$$
which can be written in the form
$$x = \alpha x + \beta, \quad (4.341)$$
where $\|\alpha\| < 1$, $\|\cdot\|$ being one of the canonical norms.
Let us choose the factors $v_{ij}$, $i,j = \overline{1,n}$, so that
$$\alpha_{ij} = p_{ij}v_{ij}, \quad (4.342)$$
where
$$p_{ij} \ge 0, \quad \text{with } p_{ij} > 0 \text{ for } \alpha_{ij} \neq 0, \quad i,j = \overline{1,n}, \quad (4.343)$$
$$\sum_{j=1}^{n} p_{ij} < 1, \quad i = \overline{1,n}. \quad (4.344)$$
We construct the matrix H so that $h_{ij} = p_{ij}$, $i,j = \overline{1,n}$, $h_{n+1,j} = 0$, $j = \overline{1,n}$, $h_{i,n+1} = 1 - \sum_{j=1}^{n} p_{ij}$, $i = \overline{1,n}$, $h_{n+1,n+1} = 1$, that is,
$$H = \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1n} & 1 - \sum_{j=1}^{n} p_{1j}\\ p_{21} & p_{22} & \cdots & p_{2n} & 1 - \sum_{j=1}^{n} p_{2j}\\ \cdots & \cdots & \cdots & \cdots & \cdots\\ p_{n1} & p_{n2} & \cdots & p_{nn} & 1 - \sum_{j=1}^{n} p_{nj}\\ 0 & 0 & \cdots & 0 & 1 \end{bmatrix}. \quad (4.345)$$
Moreover, we choose a sequence $S_1, S_2, \ldots, S_{n+1}$ of mutually exclusive possible states, in which $S_{n+1}$ is the frontier or the absorbing barrier. Thus, $p_{ij}$ represents the probability of passage of a particle from the state $S_i$ to the state $S_j$, independently of the previous states. The state $S_{n+1}$ is a singular one and implies the stopping of the particle, which is evidenced by $p_{n+1,j} = 0$, $j = \overline{1,n}$.
Thus, a particle starts from an initial state $S_i$, $i = \overline{1,n}$, then passes into another state $S_j$, and so on until it attains the final state $S_{n+1}$. Obviously, the number of states through which the particle passes is finite, but it differs from simulation to simulation; that is, there are many possible paths from the initial state $S_i$, $i = \overline{1,n}$, to the final one $S_{n+1}$. This is a simple, homogeneous Markov chain with a finite number of states.
Let $S_{i_0}$, $i_0 = \overline{1,n}$, be an initial state, and let one such Markov chain that defines the trajectory of the particle be given by
$$T_i = \{S_{i_0}, S_{i_1}, \ldots, S_{i_m}, S_{i_{m+1}}\}, \quad (4.346)$$
where $S_{i_{m+1}} = S_{n+1}$, that is, the final state.
12The Monte Carlo method was stated in the 1940s by John von Neumann (1903–1957), Stanislaw Marcin Ulam
(1909–1984), and Nicholas Constantine Metropolis (1915–1999). The name of the method comes from the famous
Monte Carlo Casino.
We associate with this trajectory the random variable $X_i$, whose value is
$$\xi(T_i) = \beta_{i_0} + v_{i_0i_1}\beta_{i_1} + v_{i_0i_1}v_{i_1i_2}\beta_{i_2} + \cdots + v_{i_0i_1}v_{i_1i_2}\cdots v_{i_{m-1}i_m}\beta_{i_m}. \quad (4.347)$$
Theorem 4.1 The mathematical expectation
$$MX_i = \sum_{T_i} \xi(T_i)P(T_i) = \sum_{j} \sum_{T_{ij}} \xi(T_{ij})P(T_{ij}) = x_i \quad (4.348)$$
is a solution of the system (4.341).
Demonstration. The trajectories of type $T_i$ may be divided into distinct classes according to the state through which the particle passes first. We have
$$T_{i1} = \{S_i, S_1, \ldots\}, \quad T_{i2} = \{S_i, S_2, \ldots\}, \;\ldots,\; T_{in} = \{S_i, S_n, \ldots\}, \quad T_{i,n+1} = \{S_i, S_{n+1}\}. \quad (4.349)$$
Thus, if $T_i$ is a trajectory from one of the sets (4.349), given by (4.346), then the associated random variable $X_i$ takes the value
$$\xi(T_{ij}) = \beta_i + v_{ij}\beta_j + v_{ij}v_{ji_2}\beta_{i_2} + \cdots = \beta_i + v_{ij}\xi(T_j). \quad (4.350)$$
Obviously, for the trajectory $T_{i,n+1} = \{S_i, S_{n+1}\}$, we have
$$\xi(T_{i,n+1}) = \beta_i. \quad (4.351)$$
If $j < n + 1$, then the trajectory $T_i$ is composed of the segment $(S_i, S_j)$, to which we add a trajectory from the set $T_j$ defined by (4.349). It follows that
$$P(T_{ij}) = p_{ij}P(T_j). \quad (4.352)$$
If $j = n + 1$, then
$$P(T_{i,n+1}) = p_{i,n+1}. \quad (4.353)$$
It follows that
$$MX_i = \sum_{j=1}^{n} \sum_{T_j} [\beta_i + v_{ij}\xi(T_j)]\,p_{ij}P(T_j) + \beta_ip_{i,n+1} \quad (4.354)$$
or
$$MX_i = \sum_{j=1}^{n} p_{ij}v_{ij}\sum_{T_j} \xi(T_j)P(T_j) + \beta_i\left(\sum_{j=1}^{n} p_{ij}\sum_{T_j} P(T_j) + p_{i,n+1}\right). \quad (4.355)$$
On the other hand,
$$\sum_{T_j} \xi(T_j)P(T_j) = MX_j, \quad j = \overline{1,n}, \quad (4.356)$$
$$\sum_{T_j} P(T_j) = 1, \quad (4.357)$$
$$\sum_{j=1}^{n} p_{ij}\sum_{T_j} P(T_j) + p_{i,n+1} = \sum_{j=1}^{n+1} p_{ij} = 1, \quad (4.358)$$
so that the formula (4.355) becomes
$$MX_i = \sum_{j=1}^{n} \alpha_{ij}MX_j + \beta_i, \quad i = \overline{1,n}, \quad (4.359)$$
and the theorem is proved.
Chebyshev’s theorem ensures that the inequality
xi −
1
N
N
k=1
ξ T (k)
i < ε (4.360)
is realized with a probability tending to 1 for N → ∞. Thus it follows that
xi ≈
1
N
N
k=1
ξ(T (k)
i ). (4.361)
Practically, the problem is solved in a simpler manner. One constructs the matrix H. Let us observe that if $\|\alpha\|_\infty < 1$, then we may choose $p_{ij} = |\alpha_{ij}|$ and $v_{ij} = 1$ if $\alpha_{ij} > 0$ or $v_{ij} = -1$ if $\alpha_{ij} < 0$. Let us suppose that we wish to determine $x_i$; hence we start with the state $S_i$. A uniformly distributed random number $\pi_1$ is generated in the interval (0, 1). On row i of the matrix H, an index j is sought so that
$$\sum_{k=1}^{j-1} p_{ik} \le \pi_1 < \sum_{k=1}^{j} p_{ik}. \quad (4.362)$$
This index defines the new state $S_j$ through which the particle passes. Obviously, this state may also be $S_{n+1}$, in which case the trajectory stops. If $j \neq n + 1$, then we use the row j of the matrix H, for which a new uniformly distributed random number is generated in the interval (0, 1). The process continues until the final state $S_{n+1}$ is attained. Thus $\xi(T^{(1)}_i)$, where the upper index (1) marks the first simulation, is calculated. The procedure is repeated N times, the approximate value of $x_i$ being given by the formula (4.361).
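The described simulation may be sketched as follows, with $p_{ij} = |\alpha_{ij}|$ and $v_{ij} = \operatorname{sign}\alpha_{ij}$ (a toy illustration of one component; it assumes $\|\alpha\|_\infty < 1$):

```python
import numpy as np

rng = np.random.default_rng()

def monte_carlo_component(alpha, beta, i, N=100000):
    """Estimate x_i of x = alpha x + beta via absorbing Markov chains (4.361)."""
    n = len(beta)
    cum = np.cumsum(np.abs(alpha), axis=1)   # row-wise cumulative p_ik (4.362)
    total = 0.0
    for _ in range(N):
        state, weight, score = i, 1.0, beta[i]
        while True:
            u = rng.random()
            j = int(np.searchsorted(cum[state], u))
            if j >= n:                       # absorbing state S_{n+1}
                break
            weight *= np.sign(alpha[state, j])   # v factors accumulate (4.347)
            score += weight * beta[j]
            state = j
        total += score
    return total / N
```

Transitions with $\alpha_{ij} = 0$ have zero probability and are (almost surely) never selected.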
Observation 4.28 The process also offers a possibility to calculate the inverse of the matrix A, with $\|A\| < 1$, because determining the inverse $A^{-1}$ is equivalent to solving a system of $n^2$ linear equations with $n^2$ unknowns.
4.5.10 Infinite Systems of Linear Equations
We have considered until now a linear system of n equations with n unknowns, where n is a finite
integer. We try to generalize this for n → ∞.
Let us consider the infinite system
$$Ax = b, \quad (4.363)$$
where $x = [x_k]^T_{k\in\mathbb{N}}$, $A = [a_{jk}]_{j,k\in\mathbb{N}}$, $b = [b_j]^T_{j\in\mathbb{N}}$.
Definition 4.9 The system is called regular if the matrix A is diagonally dominant, that is,
$$|a_{jj}| \ge \sum_{\substack{k\in\mathbb{N}\\ k\neq j}} |a_{jk}|, \quad j \in \mathbb{N}, \quad (4.364)$$
and completely regular if the above inequality (4.364) is strict, that is, if A is strictly diagonally dominant.
The well-known theorem that asserts the existence and the uniqueness of the solution of a finite linear algebraic system whose associated matrix is strictly diagonally dominant can be extended to completely regular infinite systems. If the system is regular, but not completely regular, only the existence of the solution is ensured.
The condition (4.364) may also be written in the form
$$\rho_j = 1 - \sum_{\substack{k\in\mathbb{N}\\ k\neq j}} \frac{|a_{jk}|}{|a_{jj}|} \ge 0, \quad j \in \mathbb{N}. \quad (4.365)$$
In the case of a regular system, one may use the method of sections: considering n a finite integer, one solves the finite system formed by the first n equations in the first n unknowns by the methods presented above. Obviously, the accuracy of the solution depends on the number n.
4.6 DETERMINATION OF EIGENVALUES AND EIGENVECTORS
4.6.1 Introduction
Let $A \in M_n(\mathbb{C})$ be a matrix with complex elements and $V \subset \mathbb{C}^n$ a vector space. The matrix A defines a linear transformation by the relation
$$x \in V \to Ax \in V. \quad (4.366)$$
Let us consider a subspace $V_1$ of V and let us suppose that $V_1$ is invariant with respect to the linear transformation induced by the matrix A; hence, for any $x \in V_1$ it follows that $Ax \in V_1$. It follows that the subspace $V_1$ is defined by the equation
$$Ax = \lambda x, \quad (4.367)$$
where λ is an element of the field that defines the multiplication by scalars over V.
Definition 4.10 Any nonzero element x that satisfies the relation (4.367) is called an eigenvector
of the matrix A, while the element λ is called an eigenvalue of the matrix A.
Definition 4.11 The set of all the eigenvalues of the matrix A is called the spectrum of this matrix
and is denoted by SpA or Λ(A).
Observation 4.29
(i) If λ is an eigenvalue of the matrix A, then the matrix A − λIn, where In is the unit matrix
of order n, is a singular matrix and, conversely, if the matrix A − λIn is singular, then λ is
an eigenvalue for the matrix A.
(ii) The eigenvalues of the matrix A are obtained by solving the algebraic equation
$$\det[A - \lambda I_n] = 0, \quad (4.368)$$
called the characteristic equation or secular equation.
(iii) Equation (4.368) is an algebraic equation of degree n which, according to the fundamental theorem of algebra, has n roots in $\mathbb{C}$. These roots may be distinct, or some may have various orders of multiplicity. Hence, to an eigenvector there corresponds only one eigenvalue, but to an eigenvalue there may correspond several eigenvectors.
(iv) If $A \in M_n(\mathbb{R})$, then the eigenvalues are real or complex conjugate.
(v) If the matrix $A \in M_n(\mathbb{C})$ has n distinct eigenvalues $\lambda_i$, $i = \overline{1,n}$, then any vector $y \in \mathbb{C}^n$ may be written in the form
$$y = \sum_{i=1}^{n} \mu_ix_i, \quad (4.369)$$
where $\mu_i \in \mathbb{C}$, $i = \overline{1,n}$, the representation being unique.
(vi) As
$$Ax_i = \lambda_ix_i, \quad i = \overline{1,n}, \quad (4.370)$$
by multiplying the relation (4.369) on the left by $A^k$, we obtain
$$A^ky = \sum_{i=1}^{n} A^k\mu_ix_i = \sum_{i=1}^{n} \mu_iA^kx_i = \sum_{i=1}^{n} \mu_iA^{k-1}(Ax_i) = \sum_{i=1}^{n} \mu_iA^{k-1}\lambda_ix_i = \cdots = \sum_{i=1}^{n} \lambda_i^k\mu_ix_i. \quad (4.371)$$
(vii) Let us suppose that we have the relation
$$|\lambda_1| > |\lambda_i|, \quad i = \overline{2,n}, \quad (4.372)$$
for the matrix A; that is, $\lambda_1$ is the greatest eigenvalue in modulus. The expression (4.371) may also be written in the form
$$A^ky = \sum_{i=1}^{n} \lambda_i^k\mu_ix_i = \lambda_1^k\mu_1x_1 + \lambda_2^k\mu_2x_2 + \cdots + \lambda_n^k\mu_nx_n = \lambda_1^k\mu_1\left[x_1 + \frac{\lambda_2^k}{\lambda_1^k}\frac{\mu_2}{\mu_1}x_2 + \cdots + \frac{\lambda_n^k}{\lambda_1^k}\frac{\mu_n}{\mu_1}x_n\right], \quad (4.373)$$
where we suppose that $\mu_1 \neq 0$. Passing to the limit after k in the last relation, we get
$$\lim_{k\to\infty} A^ky = \lim_{k\to\infty} \lambda_1^k\mu_1x_1. \quad (4.374)$$
(viii) Let $A \in M_n(\mathbb{C})$ and $k \in \mathbb{N}^*$. Under these conditions, if the eigenvalues of A are distinct, $\lambda_i \in \mathbb{C}$, $i = \overline{1,n}$, $\lambda_i \neq \lambda_j$ for $i \neq j$, then the spectrum of the matrix $A^k$ is given by
$$\Lambda(A^k) = \{\lambda_i^k\}, \quad i = \overline{1,n}. \quad (4.375)$$
It follows that if A is idempotent (that is, $A^2 = A$), then $\Lambda(A) = \{0, 1\}$, and if A is nilpotent (that is, there exists $k \in \mathbb{N}$ so that $A^k = 0$), then $\Lambda(A) = \{0\}$.
(ix) If x is an eigenvector of the matrix A corresponding to the eigenvalue λ, that is, if
$$Ax = \lambda x, \quad (4.376)$$
while y is a vector in $\mathbb{C}^n$ depending on the variable $t \in \mathbb{R}$ (in general, t is the time) according to the law
$$y(t) = e^{\lambda t}x, \quad (4.377)$$
then y verifies the differential equation
$$\frac{dy}{dt} = Ay. \quad (4.378)$$
Indeed, one may write
$$\frac{dy}{dt} = \lambda e^{\lambda t}x = e^{\lambda t}\lambda x = e^{\lambda t}Ax = Ae^{\lambda t}x = Ay. \quad (4.379)$$
It follows that a particular solution of a system of ordinary differential equations may be immediately written if one knows the eigenvectors and the eigenvalues of the matrix A.
Definition 4.12 The matrices A and B of $M_n(\mathbb{C})$ are said to be similar if there exists a nonsingular matrix $P \in M_n(\mathbb{C})$ so that
$$B = P^{-1}AP. \quad (4.380)$$
Observation 4.30 Let λ be an eigenvalue of the matrix A and x the corresponding eigenvector. If B is a matrix similar to A by means of the matrix P, then λ is an eigenvalue of A if and only if it is an eigenvalue of B, with the eigenvector $P^{-1}x$. Indeed, from $Ax = \lambda x$ we obtain
$$B(P^{-1}x) = P^{-1}APP^{-1}x = P^{-1}Ax = P^{-1}\lambda x = \lambda P^{-1}x. \quad (4.381)$$
4.6.2 Krylov’s Method
Let us denote by P(λ) the characteristic polynomial13
$$P(\lambda) = \det[A - \lambda I_n], \quad (4.382)$$
where $A \in M_n(\mathbb{R})$, $I_n$ being as usual the unit matrix of order n. We may write
$$P(\lambda) = (-1)^n\lambda^n + p_1\lambda^{n-1} + p_2\lambda^{n-2} + \cdots + p_n. \quad (4.383)$$
Multiplying the relation (4.383) by $(-1)^n$, we obtain a polynomial of degree n whose leading coefficient is equal to 1,
$$P_1(\lambda) = \lambda^n + q_1\lambda^{n-1} + q_2\lambda^{n-2} + \cdots + q_n, \quad (4.384)$$
in which
$$q_i = (-1)^np_i, \quad i = \overline{1,n}. \quad (4.385)$$
13The method is credited to Aleksey Nikolaevich Krylov (1863–1945) who first presented it in 1931.
The Hamilton–Cayley theorem states that the matrix A satisfies its own characteristic equation. Hence, we obtain
$$A^n + q_1A^{n-1} + q_2A^{n-2} + \cdots + q_nI_n = 0. \quad (4.386)$$
Let
$$y^{(0)} = [y^{(0)}_1\; y^{(0)}_2\; \cdots\; y^{(0)}_n]^T \quad (4.387)$$
be a nonzero vector in $\mathbb{R}^n$. Let us multiply the relation (4.386) on the right by $y^{(0)}$. It results in
$$A^ny^{(0)} + q_1A^{n-1}y^{(0)} + q_2A^{n-2}y^{(0)} + \cdots + q_ny^{(0)} = 0. \quad (4.388)$$
We denote
$$A^ky^{(0)} = y^{(k)}, \quad k = \overline{0,n}, \quad (4.389)$$
and the relation (4.388) becomes
$$y^{(n)} + q_1y^{(n-1)} + q_2y^{(n-2)} + \cdots + q_ny^{(0)} = 0, \quad (4.390)$$
an equation in which the unknowns are $q_1, q_2, \ldots, q_n$. The relation (4.390) may also be written in the form
$$q_1y^{(n-1)} + q_2y^{(n-2)} + \cdots + q_ny^{(0)} = -y^{(n)} \quad (4.391)$$
or, in components,
$$\begin{bmatrix} y^{(n-1)}_1 & y^{(n-2)}_1 & \cdots & y^{(0)}_1\\ y^{(n-1)}_2 & y^{(n-2)}_2 & \cdots & y^{(0)}_2\\ \cdots & \cdots & \cdots & \cdots\\ y^{(n-1)}_n & y^{(n-2)}_n & \cdots & y^{(0)}_n \end{bmatrix}\begin{bmatrix} q_1\\ q_2\\ \vdots\\ q_n \end{bmatrix} = -\begin{bmatrix} y^{(n)}_1\\ y^{(n)}_2\\ \vdots\\ y^{(n)}_n \end{bmatrix}. \quad (4.392)$$
The coefficients $q_1, q_2, \ldots, q_n$ of the characteristic polynomial are determined by solving the linear system (4.392) of n equations with n unknowns.
Observation 4.31 The relation (4.389) that defines the vector $y^{(k)}$ may also be written recursively:
$$y^{(0)} \in \mathbb{R}^n \text{ arbitrary}, \quad y^{(0)} \neq 0, \quad y^{(k)} = Ay^{(k-1)}, \quad k \ge 1. \quad (4.393)$$
Observation 4.32 If the roots of the characteristic polynomial are real and distinct, then Krylov's method also leads to the corresponding eigenvectors. Indeed, the n eigenvectors $x_1, \ldots, x_n$ form a basis in $\mathbb{R}^n$; hence any vector of $\mathbb{R}^n$ may be written as a linear combination of these basis vectors. In particular, there exist constants $c_1, c_2, \ldots, c_n$, not all zero, so that
$$y^{(0)} = c_1x_1 + c_2x_2 + \cdots + c_nx_n. \quad (4.394)$$
The relations (4.393) are now transcribed in the form
$$y^{(1)} = Ay^{(0)} = A(c_1x_1 + \cdots + c_nx_n) = c_1\lambda_1x_1 + c_2\lambda_2x_2 + \cdots + c_n\lambda_nx_n,$$
$$y^{(2)} = c_1\lambda_1^2x_1 + c_2\lambda_2^2x_2 + \cdots + c_n\lambda_n^2x_n, \;\ldots,\; y^{(n)} = c_1\lambda_1^nx_1 + c_2\lambda_2^nx_2 + \cdots + c_n\lambda_n^nx_n. \quad (4.395)$$
Let us introduce the polynomials
$$\varphi_i(\lambda) = \lambda^{n-1} + q_{1i}\lambda^{n-2} + \cdots + q_{n-1,i}, \quad i = \overline{1,n}; \quad (4.396)$$
hence, it follows that
$$y^{(n-1)} + q_{1i}y^{(n-2)} + \cdots + q_{n-1,i}y^{(0)} = c_1\varphi_i(\lambda_1)x_1 + \cdots + c_n\varphi_i(\lambda_n)x_n. \quad (4.397)$$
On the other hand, we consider
$$\varphi_i(\lambda) = \frac{P_1(\lambda)}{\lambda - \lambda_i}, \quad (4.398)$$
so that the coefficients $q_{ij}$ are given by Horner's scheme
$$q_{0j} = 1, \quad q_{ij} = \lambda_jq_{i-1,j} + q_i. \quad (4.399)$$
Under these conditions,
$$\varphi_i(\lambda_j) = 0 \quad \text{for any } i \text{ and } j \text{ with } i \neq j \quad (4.400)$$
and
$$\varphi_i(\lambda_i) = P_1'(\lambda_i) \neq 0. \quad (4.401)$$
We thus obtain
$$y^{(n-1)} + q_{1i}y^{(n-2)} + \cdots + q_{n-1,i}y^{(0)} = c_i\varphi_i(\lambda_i)x_i, \quad i = \overline{1,n}, \quad (4.402)$$
and if $c_i \neq 0$, then we get the eigenvectors $c_i\varphi_i(\lambda_i)x_i$, $i = \overline{1,n}$.
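A sketch of the construction (4.389)–(4.392) (our illustration; it assumes the Krylov vectors are linearly independent, so the system (4.392) is solvable):

```python
import numpy as np

def krylov_coefficients(A, y0=None):
    """Krylov's method: solve (4.392) for q_1, ..., q_n of P_1 (4.384)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    y = np.ones(n) if y0 is None else np.asarray(y0, dtype=float)
    ys = [y]
    for _ in range(n):
        ys.append(A @ ys[-1])              # y^{(k)} = A y^{(k-1)} (4.393)
    M = np.column_stack(ys[n - 1::-1])     # columns y^{(n-1)}, ..., y^{(0)}
    return np.linalg.solve(M, -ys[n])      # (4.392)
```

The eigenvalues then follow as the roots of $P_1(\lambda)$, for example with np.roots([1, *q]).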
4.6.3 Danilevski’s Method
Let14
$$P(\lambda) = \begin{vmatrix} a_{11}-\lambda & a_{12} & \cdots & a_{1,n-1} & a_{1n}\\ a_{21} & a_{22}-\lambda & \cdots & a_{2,n-1} & a_{2n}\\ \cdots & \cdots & \cdots & \cdots & \cdots\\ a_{n-1,1} & a_{n-1,2} & \cdots & a_{n-1,n-1}-\lambda & a_{n-1,n}\\ a_{n1} & a_{n2} & \cdots & a_{n,n-1} & a_{nn}-\lambda \end{vmatrix} = (-1)^n\left(\lambda^n - \sum_{i=1}^{n} p_i\lambda^{n-i}\right) \quad (4.403)$$
be the characteristic polynomial of the matrix $A \in M_n(\mathbb{R})$. The idea of the method consists in the transformation of the matrix
$$A - \lambda I_n = \begin{bmatrix} a_{11}-\lambda & a_{12} & \cdots & a_{1,n-1} & a_{1n}\\ a_{21} & a_{22}-\lambda & \cdots & a_{2,n-1} & a_{2n}\\ \cdots & \cdots & \cdots & \cdots & \cdots\\ a_{n-1,1} & a_{n-1,2} & \cdots & a_{n-1,n-1}-\lambda & a_{n-1,n}\\ a_{n1} & a_{n2} & \cdots & a_{n,n-1} & a_{nn}-\lambda \end{bmatrix} \quad (4.404)$$
14The method was stated by A. M. Danilevski (Danilevsky) in Russian in 1937, and then in 1959 it was translated
into English.
into the matrix
$$B - \lambda I_n = \begin{bmatrix} p_1-\lambda & p_2 & p_3 & \cdots & p_{n-2} & p_{n-1} & p_n\\ 1 & -\lambda & 0 & \cdots & 0 & 0 & 0\\ 0 & 1 & -\lambda & \cdots & 0 & 0 & 0\\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots\\ 0 & 0 & 0 & \cdots & 0 & 1 & -\lambda \end{bmatrix} \quad (4.405)$$
of normal Frobenius form.15
On the other hand, the determinant of the matrix $B - \lambda I_n$, calculated by expansion along the first row, is
$$\det[B - \lambda I_n] = (-1)^{n-1}\left(\sum_{i=1}^{n} p_i\lambda^{n-i} - \lambda^n\right) = P(\lambda). \quad (4.406)$$
To bring the matrix A to the Frobenius form B, we proceed as follows:
• We multiply the (n−1)th column of the matrix A by $a_{n1}/a_{n,n-1}$, $a_{n2}/a_{n,n-1}$, ..., $a_{n,n-2}/a_{n,n-1}$, $a_{nn}/a_{n,n-1}$, respectively, and subtract it from the columns 1, 2, ..., n−2, n, respectively, dividing also the (n−1)th column itself by $a_{n,n-1}$. This is equivalent to the multiplication of the matrix A on the right by the matrix
$$M_1 = \begin{bmatrix} 1 & 0 & \cdots & 0 & 0 & 0\\ 0 & 1 & \cdots & 0 & 0 & 0\\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots\\ -\dfrac{a_{n,1}}{a_{n,n-1}} & -\dfrac{a_{n,2}}{a_{n,n-1}} & \cdots & -\dfrac{a_{n,n-2}}{a_{n,n-1}} & \dfrac{1}{a_{n,n-1}} & -\dfrac{a_{nn}}{a_{n,n-1}}\\ 0 & 0 & \cdots & 0 & 0 & 1 \end{bmatrix}. \quad (4.407)$$
The inverse of the matrix $M_1$ is
$$M_1^{-1} = \begin{bmatrix} 1 & 0 & \cdots & 0 & 0 & 0\\ 0 & 1 & \cdots & 0 & 0 & 0\\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots\\ a_{n1} & a_{n2} & \cdots & a_{n,n-2} & a_{n,n-1} & a_{nn}\\ 0 & 0 & \cdots & 0 & 0 & 1 \end{bmatrix}. \quad (4.408)$$
• To obtain a similar matrix, we must consider, in the following step, the matrix
$$A_2 = M_1^{-1}A_1M_1, \quad A_1 = A. \quad (4.409)$$
• The procedure is repeated for the (n−1)th row of the matrix $A_2$ until we obtain the (n−1)th row of the Frobenius matrix.
• The procedure continues up to the second row, after which the Frobenius matrix directly results.
15This form was introduced by Ferdinand Georg Frobenius (1849–1917).
Observation 4.33 If the element $a_{i,i-1}$ is equal to zero (on the computer, this means $|a_{i,i-1}| < \varepsilon$, with ε given a priori), then one searches on row i for a nonzero element among $a_{i1}, a_{i2}, \ldots, a_{i,i-2}$; let it be $a_{ij}$, $j < i-1$. One then adds column j of the current matrix to column i. This means multiplication on the right by the matrix
$$M^*_{ij} = I_n + e_je_i^T, \quad (4.410)$$
that is, the unit matrix of order n carrying the supplementary element $m^*_{ji} = 1$, the inverse of which is
$$(M^*_{ij})^{-1} = I_n - e_je_i^T, \quad (4.411)$$
the unit matrix carrying instead the element $-1$ in the same position; here $e_k$ denotes the kth column of $I_n$.
Observation 4.34
(i) If y is an eigenvector of the Frobenius matrix B, then the corresponding eigenvector of the matrix A is
$$x = M_1M_2\cdots M_{n-1}y, \quad (4.412)$$
where we suppose that, in passing from the matrix A to the Frobenius matrix B, additions of columns have not been necessary (otherwise, matrices of the form (4.410) would also appear in the product (4.412)).
(ii) Let us consider that the Frobenius matrix has distinct eigenvalues and let λ be one such value (which is an eigenvalue for the matrix A too, the matrix A also having distinct eigenvalues). The eigenvector y corresponding to the eigenvalue λ of the Frobenius matrix satisfies the relation
$$\begin{bmatrix} p_1 & p_2 & \cdots & p_{n-1} & p_n\\ 1 & 0 & \cdots & 0 & 0\\ 0 & 1 & \cdots & 0 & 0\\ \cdots & \cdots & \cdots & \cdots & \cdots\\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix}\begin{bmatrix} y_1\\ y_2\\ y_3\\ \vdots\\ y_n \end{bmatrix} = \lambda\begin{bmatrix} y_1\\ y_2\\ y_3\\ \vdots\\ y_n \end{bmatrix}, \quad (4.413)$$
from which
$$y_{n-j} = \lambda y_{n-j+1}, \quad j = \overline{1,n-1}, \quad (4.414)$$
and
$$\sum_{i=1}^{n} p_iy_i = \lambda y_1. \quad (4.415)$$
The relation (4.414) leads to
$$y_{n-1} = \lambda y_n, \quad y_{n-2} = \lambda y_{n-1} = \lambda^2y_n, \;\ldots,\; y_1 = \lambda^{n-1}y_n \quad (4.416)$$
and, by replacing in (4.415), we obtain
$$y_n\left(\lambda^n - \sum_{i=1}^{n} p_i\lambda^{n-i}\right) = 0; \quad (4.417)$$
hence, the characteristic polynomial of the matrix A is the same as that of the Frobenius matrix B. Moreover, because $y_n \neq 0$ (otherwise it would follow that $y = 0$), one also obtains the eigenvector of the Frobenius matrix B,
$$y = [\lambda^{n-1}\; \lambda^{n-2}\; \cdots\; \lambda\; 1]^T, \quad (4.418)$$
where we have taken $y_n = 1$.
Observation 4.35 To reduce the errors in calculation, we usually consider as pivot not the element $a_{i,i-1}$ but the greatest element in modulus among $a_{i1}, a_{i2}, \ldots, a_{i,i-1}$. Let that element be $a_{ij}$, that is,
$$|a_{ij}| = \max_{k=\overline{1,i-1}} |a_{ik}|. \quad (4.419)$$
Under these conditions, an interchange of the columns i and j is necessary; one thus uses a permutation matrix
$$P_{ij} = P_{ij}^{-1}, \quad (4.420)$$
obtained from the unit matrix $I_n$ by setting $p_{ii} = p_{jj} = 0$ and $p_{ij} = p_{ji} = 1$.
Observation 4.36 If, at a certain point, all the elements $a^{(n+1-i)}_{ij}$, $j = \overline{1,i-1}$, vanish, that is, at the step $n + 1 - i$ we cannot find a pivot on the row i, then, according to Laplace's theorem, the determinant of the matrix A is written as the product of two determinants and the matrix A is decomposed into blocks.
4.6.4 The Direct Power Method
Let us consider the matrix $A \in M_n(\mathbb{R})$, for which we suppose that the eigenvalues are distinct and ordered as follows:
$$|\lambda_1| > |\lambda_2| > \cdots > |\lambda_n|. \quad (4.421)$$
The n eigenvectors $x_1, x_2, \ldots, x_n$ corresponding to the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ are linearly independent, hence they form a basis in $\mathbb{R}^n$.
Let $y \in \mathbb{R}^n$ be arbitrary. Under these conditions, y has a unique representation with respect to the basis vectors $x_1, \ldots, x_n$; hence there exist real constants $a_1, \ldots, a_n$, uniquely determined, so that
$$y = \sum_{j=1}^{n} a_jx_j. \quad (4.422)$$
On the other hand,
$$Ay = A\sum_{j=1}^{n} a_jx_j = \sum_{j=1}^{n} a_j(Ax_j) \quad (4.423)$$
and because
$$Ax_j = \lambda_jx_j, \quad j = \overline{1,n}, \quad (4.424)$$
we obtain the representation
$$Ay = \sum_{j=1}^{n} a_j\lambda_jx_j. \quad (4.425)$$
Analogously, we obtain the relations
$$A^2y = \sum_{j=1}^{n} a_j\lambda_j^2x_j, \quad (4.426)$$
$$A^3y = \sum_{j=1}^{n} a_j\lambda_j^3x_j \quad (4.427)$$
and, in general,
$$A^my = \sum_{j=1}^{n} a_j\lambda_j^mx_j \quad (4.428)$$
for any $m \in \mathbb{N}^*$.
Let us denote
$$y^{(m)} = A^my = [y^{(m)}_1\; y^{(m)}_2\; \cdots\; y^{(m)}_n]^T; \quad (4.429)$$
the relation (4.428) becomes
$$y^{(m)} = \sum_{j=1}^{n} a_j\lambda_j^mx_j. \quad (4.430)$$
Let V be the subspace of $\mathbb{R}^n$ generated by the set of vectors
$$Y = \{y^{(1)}, y^{(2)}, \ldots, y^{(m)}, \ldots\} \quad (4.431)$$
and let
$$B = \{e_1, e_2, \ldots, e_n\} \quad (4.432)$$
be a basis of it or an extension of one of its bases to $\mathbb{R}^n$.
Observation 4.37
(i) All the previous considerations are valid for $y \neq 0$. Obviously, if $y = 0$, then $y^{(m)} = 0$ for any $m \in \mathbb{N}^*$. Moreover, if $A = 0$, then $y^{(m)} = 0$ for any $m \in \mathbb{N}^*$, irrespective of the y initially chosen. We will suppose that $A \neq 0$ and $y \neq 0$.
(ii) Obviously, the space Y may have a dimension less than n. As $y^{(m)} \in \mathbb{R}^n$ for any $m \in \mathbb{N}^*$, it follows that $Y \subset \mathbb{R}^n$ and $\dim Y \le \dim \mathbb{R}^n$. If $\dim Y = n$, then B is given by the formula (4.432). If $\dim Y = k < n$, then one can add $n - k$ vectors to its basis vectors to form the basis B in $\mathbb{R}^n$. As B is such a basis, any vector of $\mathbb{R}^n$ may be written as a unique linear combination of vectors of B. In particular,
$$x_j = \sum_{i=1}^{n} x_{ij}e_i, \quad j = \overline{1,n}. \quad (4.433)$$
Under these conditions, the vector $y^{(m)}$ becomes
$$y^{(m)} = \sum_{j=1}^{n} a_j\lambda_j^m\sum_{i=1}^{n} x_{ij}e_i = \sum_{i=1}^{n} e_i\left(\sum_{j=1}^{n} a_jx_{ij}\lambda_j^m\right). \quad (4.434)$$
But
$$\sum_{j=1}^{n} a_jx_{ij}\lambda_j^m = y^{(m)}_i, \quad (4.435)$$
so that
$$y^{(m)}_i = \sum_{j=1}^{n} a_jx_{ij}\lambda_j^m. \quad (4.436)$$
Writing the previous relation for m + 1,
$$y^{(m+1)}_i = \sum_{j=1}^{n} a_jx_{ij}\lambda_j^{m+1}, \quad (4.437)$$
and taking the ratio of (4.437) to (4.436), we obtain
$$\frac{y^{(m+1)}_i}{y^{(m)}_i} = \frac{\sum_{j=1}^{n} a_jx_{ij}\lambda_j^{m+1}}{\sum_{j=1}^{n} a_jx_{ij}\lambda_j^m}. \quad (4.438)$$
On the other hand,
$$y^{(m+1)}_i = \sum_{j=1}^{n} a_jx_{ij}\lambda_j^{m+1} = a_1x_{i1}\lambda_1^{m+1} + \cdots + a_nx_{in}\lambda_n^{m+1} = \lambda_1^{m+1}\left[a_1x_{i1} + a_2x_{i2}\left(\frac{\lambda_2}{\lambda_1}\right)^{m+1} + \cdots + a_nx_{in}\left(\frac{\lambda_n}{\lambda_1}\right)^{m+1}\right] \quad (4.439)$$
and, analogously,
$$y^{(m)}_i = \lambda_1^m\left[a_1x_{i1} + a_2x_{i2}\left(\frac{\lambda_2}{\lambda_1}\right)^m + \cdots + a_nx_{in}\left(\frac{\lambda_n}{\lambda_1}\right)^m\right]. \quad (4.440)$$
Taking into account the relations (4.421), (4.439), and (4.440) and making $m \to \infty$ in the relation (4.438), we get
$$\lim_{m\to\infty} \frac{y^{(m+1)}_i}{y^{(m)}_i} = \lambda_1. \quad (4.441)$$
Observation 4.38
(i) The formula (4.441) suggests that the index i, $1 \le i \le n$, chosen for the ratio $y^{(m+1)}_i/y^{(m)}_i$ does not matter, because we obtain $\lambda_1$ as the limit in any case. This statement is erroneous (the coefficient $a_1x_{i1}$ may vanish for a particular i).
(ii) It is also possible that the limit in the relation (4.441) is infinite or does not exist, which may lead to erroneous values for the approximation of $\lambda_1$.
(iii) It follows from (i) and (ii) that the method is sensitive to the choice of the start vector y.
(iv) Instead of the ratio (4.441), we may choose
$$\lambda_1 = \lim_{m\to\infty} \frac{\sum_{i=1}^{n} y^{(m+1)}_i}{\sum_{i=1}^{n} y^{(m)}_i}, \quad (4.442)$$
so as to obtain the approximate formula
$$\lambda_1 \approx \frac{\sum_{i=1}^{n} y^{(m+1)}_i}{\sum_{i=1}^{n} y^{(m)}_i}. \quad (4.443)$$
(v) The procedure may be accelerated with regard to convergence by using powers of 2 as values of m, so that
$$A^2 = AA, \quad A^4 = A^2A^2, \;\ldots,\; A^{2^k} = A^{2^{k-1}}A^{2^{k-1}}. \quad (4.444)$$
The value of $\lambda_1$ is given by the ratio
$$\lambda_1 = \lim_{k\to\infty} \frac{y^{(2^k)}_i}{y^{(2^{k-1})}_i}. \quad (4.445)$$
(vi) The vector
$$y^{(m)} = A^my \quad (4.446)$$
is the approximate value of the eigenvector of the matrix A associated with the eigenvalue $\lambda_1$. Indeed, one may write
$$A^my = a_1\lambda_1^mx_1 + \sum_{j=2}^{n} a_j\lambda_j^mx_j = a_1\lambda_1^m\left[x_1 + \sum_{j=2}^{n} \frac{a_j}{a_1}\left(\frac{\lambda_j}{\lambda_1}\right)^mx_j\right]. \quad (4.447)$$
But
$$\lim_{m\to\infty}\left(\frac{\lambda_j}{\lambda_1}\right)^m = 0, \quad (4.448)$$
and it follows that
$$A^my \approx a_1\lambda_1^mx_1, \quad (4.449)$$
hence the vector $A^my$ differs from the eigenvector $x_1$ only by a multiplicative factor.
(vii) One can also choose
$$\lambda_1 = \frac{1}{n}\sum_{i=1}^{n} \frac{y^{(m+1)}_i}{y^{(m)}_i}. \quad (4.450)$$
If the root of greatest modulus is multiple, with order of multiplicity p, that is,
$$|\lambda_1| = |\lambda_2| = \cdots = |\lambda_p| > |\lambda_{p+1}| > \cdots > |\lambda_n|, \quad (4.451)$$
then
$$\frac{y^{(m+1)}_i}{y^{(m)}_i} = \frac{\lambda_1^{m+1}\sum_{j=1}^{p} a_jx_{ij} + \sum_{j=p+1}^{n} a_jx_{ij}\lambda_j^{m+1}}{\lambda_1^m\sum_{j=1}^{p} a_jx_{ij} + \sum_{j=p+1}^{n} a_jx_{ij}\lambda_j^m}. \quad (4.452)$$
Assuming that
$$\sum_{j=1}^{p} a_jx_{ij} \neq 0, \quad (4.453)$$
we obtain
$$\frac{y^{(m+1)}_i}{y^{(m)}_i} = \lambda_1\,\frac{1 + \displaystyle\sum_{j=p+1}^{n} \frac{a_jx_{ij}}{\sum_{l=1}^{p} a_lx_{il}}\left(\frac{\lambda_j}{\lambda_1}\right)^{m+1}}{1 + \displaystyle\sum_{j=p+1}^{n} \frac{a_jx_{ij}}{\sum_{l=1}^{p} a_lx_{il}}\left(\frac{\lambda_j}{\lambda_1}\right)^m}. \quad (4.454)$$
Passing to the limit for $m \to \infty$ and taking into account that $(\lambda_j/\lambda_1)^m \to 0$ for $m \to \infty$, we obtain
$$\lim_{m\to\infty} \frac{y^{(m+1)}_i}{y^{(m)}_i} = \lambda_1. \quad (4.455)$$
Now, $A^my = y^{(m)}$ is one of the eigenvectors associated with the eigenvalue $\lambda_1$.
Observation 4.39
(i) Let us form the sequence of matrices $A, A^2, A^{2^2}, \ldots, A^{2^k}$. As
$$\sum_{i=1}^{n} \lambda_i^m = \mathrm{Tr}(A^m), \quad m = 2^k, \quad (4.456)$$
where Tr(·) denotes the trace, we obtain the equality
$$\lambda_1^m + \lambda_2^m + \cdots + \lambda_n^m = \lambda_1^m\left[1 + \left(\frac{\lambda_2}{\lambda_1}\right)^m + \cdots + \left(\frac{\lambda_n}{\lambda_1}\right)^m\right] = \mathrm{Tr}(A^m) \quad (4.457)$$
for the simple eigenvalue $\lambda_1$. Passing to the limit for $m \to \infty$, it follows that
$$\mathrm{Tr}(A^m) \approx \lambda_1^m, \quad (4.458)$$
from which
$$\lambda_1 = \sqrt[m]{\mathrm{Tr}(A^m)}. \quad (4.459)$$
(ii) If now we write
$$A^{m+1} = A^m\cdot A, \quad (4.460)$$
$$\lambda_1^{m+1} + \cdots + \lambda_n^{m+1} = \mathrm{Tr}(A^{m+1}), \quad (4.461)$$
$$\lambda_1^m + \cdots + \lambda_n^m = \mathrm{Tr}(A^m), \quad (4.462)$$
dividing the last two relations and making $m \to \infty$, it follows that
$$\lambda_1 = \frac{\mathrm{Tr}(A^{m+1})}{\mathrm{Tr}(A^m)}. \quad (4.463)$$
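A sketch of the direct power method (our illustration); the iterate is normalized at each step to avoid overflow, which does not change the ratio estimate (4.443):

```python
import numpy as np

def power_method(A, y0=None, m_max=200, eps=1e-12):
    """Direct power method: y^{(m)} = A^m y (4.429), lambda_1 by (4.443)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    y = np.ones(n) if y0 is None else np.asarray(y0, dtype=float)
    lam_old = 0.0
    for _ in range(m_max):
        z = A @ y
        lam = np.sum(z) / np.sum(y)       # ratio estimate (4.443)
        y = z / np.linalg.norm(z)         # normalization against overflow
        if abs(lam - lam_old) < eps:
            break
        lam_old = lam
    return lam, y                         # y approximates x_1 (4.449)
```

As Observation 4.38 warns, the estimate is sensitive to the start vector; in practice one restarts with a different y if convergence fails.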
4.6.5 The Inverse Power Method
The inverse power method is used to find the smallest eigenvalue in modulus of the matrix $A \in M_n(\mathbb{R})$, in the case in which A is nonsingular. In this case, $\det A \neq 0$ and, in the characteristic polynomial
$$P(\lambda) = (-1)^n\lambda^n + p_{n-1}\lambda^{n-1} + \cdots + p_1\lambda + p_0, \quad (4.464)$$
the free term is nonzero:
$$p_0 = \det A \neq 0. \quad (4.465)$$
Hence, $\lambda = 0$ is not an eigenvalue of the matrix A.
Let x be an eigenvector corresponding to the eigenvalue λ of the matrix A. One can successively write ($\lambda \neq 0$)
$$x = \lambda^{-1}\lambda x = \lambda^{-1}Ax, \quad (4.466)$$
from which
$$A^{-1}x = \lambda^{-1}x; \quad (4.467)$$
hence, the eigenvalues of the inverse $A^{-1}$ are the inverses of the eigenvalues of the original matrix A. Thus, if $\lambda_1$ is the smallest eigenvalue in modulus of the matrix A, that is,
$$0 < |\lambda_1| < |\lambda_2| \le \cdots \le |\lambda_n|, \quad (4.468)$$
then $1/\lambda_1$ is the greatest eigenvalue in modulus of $A^{-1}$ and we can use the direct power method for the matrix $A^{-1}$.
Obviously, all the commentaries and discussions made for the direct power method remain valid.
4.6.6 The Displacement Method
The idea of the displacement method is based on the observation that if the matrix A has the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$, then the matrix $A - qI_n$, $A \in M_n(\mathbb{R})$, $q \in \mathbb{R}$, has the eigenvalues $\lambda_1 - q$, $\lambda_2 - q$, ..., $\lambda_n - q$. Thus, one can also find eigenvalues other than the maximum or the minimum ones in modulus for the matrix A.
Let us suppose that $\lambda_1$ is the eigenvalue of maximum modulus of the matrix A, while $\lambda_n$ is the one of minimum modulus. After a displacement q, considering the matrix $A - qI_n$, the maximum and the minimum eigenvalues in modulus of this new matrix are given by
$$\lambda_1' = \max_{1\le i\le n} |\lambda_i - q|, \quad \lambda_n' = \min_{1\le i\le n} |\lambda_i - q|. \quad (4.469)$$
4.6.7 Leverrier’s Method
Let $A \in M_n(\mathbb{R})$, with the characteristic polynomial16
$$P(\lambda) = \det(\lambda I_n - A) = \lambda^n + p_1\lambda^{n-1} + p_2\lambda^{n-2} + \cdots + p_n. \quad (4.470)$$
The roots of P(λ) are $\lambda_1, \lambda_2, \ldots, \lambda_n$. Let us denote
$$S_k = \lambda_1^k + \lambda_2^k + \cdots + \lambda_n^k. \quad (4.471)$$
The following Newton formulae are known:
$$S_k + S_{k-1}p_1 + \cdots + S_1p_{k-1} = -kp_k. \quad (4.472)$$
We obtain the relations
$$k = 1 \Rightarrow p_1 = -S_1, \quad k = 2 \Rightarrow p_2 = -\frac{1}{2}(S_2 + p_1S_1), \;\ldots,\; k = n \Rightarrow p_n = -\frac{1}{n}(S_n + S_{n-1}p_1 + S_{n-2}p_2 + \cdots + S_1p_{n-1}) \quad (4.473)$$
for $k = 1, 2, \ldots, n$. On the other hand,
$$S_1 = \lambda_1 + \cdots + \lambda_n = \mathrm{Tr}(A), \quad S_2 = \lambda_1^2 + \cdots + \lambda_n^2 = \mathrm{Tr}(A^2), \;\ldots,\; S_k = \mathrm{Tr}(A^k), \;\ldots,\; S_n = \mathrm{Tr}(A^n). \quad (4.474)$$
The coefficients $p_1, p_2, \ldots, p_n$ are given by the formulae (4.472) and (4.473).
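The traces (4.474) and the Newton formulae (4.472) translate directly into the following sketch:

```python
import numpy as np

def leverrier(A):
    """Leverrier's method: coefficients p_1, ..., p_n of P (4.470)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    S = np.empty(n + 1)
    Ak = np.eye(n)
    for k in range(1, n + 1):
        Ak = Ak @ A
        S[k] = np.trace(Ak)               # S_k = Tr(A^k) (4.474)
    p = np.empty(n + 1)
    for k in range(1, n + 1):
        acc = S[k] + sum(S[k - j] * p[j] for j in range(1, k))
        p[k] = -acc / k                   # Newton formula (4.472)
    return p[1:]                          # P(l) = l^n + p_1 l^{n-1} + ... + p_n
```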
4.6.8 The L–R (Left–Right) Method
This method is based on the fact that any matrix $A \in M_n(\mathbb{R})$ may be decomposed as a product of two matrices,
$$A = LR, \quad (4.475)$$
in which $L \in M_n(\mathbb{R})$ is an inferior triangular matrix, while $R \in M_n(\mathbb{R})$ is a superior triangular matrix. The decomposition leads to the sequence of matrices $A_1, A_2, \ldots, A_k$, in which
$$A_i = L_iR_i, \quad i = \overline{1,k}, \quad (4.476)$$
16The method was named in the honor of Urbain Jean Joseph Le Verrier (Leverrier) (1811–1877).
where the matrices $L_i$ are inferior triangular, with the elements on the principal diagonal equal to unity,
$$l^{(i)}_{jj} = 1, \quad j = \overline{1,n}, \quad (4.477)$$
while the matrices $R_i$ are superior triangular.
The recurrence formula of the sequence is given by
$$A_{i+1} = R_iL_i, \quad A_1 = A. \quad (4.478)$$
One thus obtains a similarity transformation, because
$$L_1 = A_1R_1^{-1}, \quad A_2 = R_1L_1 = R_1A_1R_1^{-1} = L_1^{-1}A_1L_1,$$
$$A_3 = R_2L_2 = R_2R_1A_1(R_2R_1)^{-1} = (L_1L_2)^{-1}A_1(L_1L_2), \;\ldots,$$
$$A_k = (R_{k-1}R_{k-2}\cdots R_1)A_1(R_{k-1}R_{k-2}\cdots R_1)^{-1} = (L_1L_2\cdots L_{k-1})^{-1}A_1(L_1L_2\cdots L_{k-1}). \quad (4.479)$$
Moreover, all the matrices $A_1, \ldots, A_k$ have the same eigenvalues.
Observation 4.40
(i) The matrix
$$S_{k-1} = \prod_{j=1}^{k-1} L_j \quad (4.480)$$
is an inferior triangular matrix with
$$s^{(k-1)}_{ii} = 1 \quad (4.481)$$
and
$$\lim_{i\to\infty} L_i = I_n, \quad (4.482)$$
where $I_n$ is the unit matrix of order n, while the matrix
$$D_{k-1} = \prod_{j=1}^{k-1} R_j \quad (4.483)$$
is a superior triangular matrix.
(ii) The elements of the principal diagonal of the matrix $R_i$ tend to the eigenvalues of the matrix A for $i \to \infty$.
(iii) The elements of the matrices $L_i$ and $R_i$ are the solutions of the system of equations (4.476), that is,
$$r_{1i} = a_{1i},\; i = \overline{1,n}, \quad r_{ij} = 0 \text{ for } i > j, \quad l_{i1} = \frac{a_{i1}}{a_{11}},\; i = \overline{2,n},\; a_{11} \neq 0, \quad r_{ij} = a_{ij} - \sum_{k=1}^{i-1} l_{ik}r_{kj},\; i \le j,$$
$$l_{ij} = \frac{1}{r_{jj}}\left(a_{ij} - \sum_{k=1}^{j-1} l_{ik}r_{kj}\right),\; i > j,\; r_{jj} \neq 0, \quad l_{ii} = 1,\; i = \overline{1,n}, \quad l_{ij} = 0,\; i < j. \quad (4.484)$$
(iv) If the sequence $A_1, A_2, \ldots, A_k$ is convergent, then the limit matrix $A_k$ is superior triangular and the elements situated on the principal diagonal are the eigenvalues of the matrix A, that is,
$$a^{(k)}_{ii} = \lambda_i, \quad i = \overline{1,n}. \quad (4.485)$$
(v) The condition for stopping the algorithm is given by
$$\|A_k - A_{k-1}\| < \varepsilon, \quad (4.486)$$
where ε is a positive error imposed a priori, while $\|\cdot\|$ is one of the canonical matrix norms.
4.6.9 The Rotation Method
The rotation method applies to symmetric matrices $A \in M_n(\mathbb{R})$ and supplies both the eigenvectors and the eigenvalues of the matrix.
The idea consists in the construction of a sequence of matrices $A_0 = A, A_1, \ldots, A_p, \ldots$, obtained by the rule
$$A_{p+1} = R_{ij}^{-1}A_pR_{ij}, \quad (4.487)$$
the transformations being unitary and orthogonal.
To do this, we choose the matrix $R_{ij}$ in the form of a rotation matrix, that is, a unit matrix in which the elements $r_{ii}, r_{ij}, r_{ji}, r_{jj}$ have been modified in the form
$$r_{ii} = \cos\alpha, \quad r_{ij} = -\sin\alpha, \quad r_{ji} = \sin\alpha, \quad r_{jj} = \cos\alpha. \quad (4.488)$$
Obviously,
$$R_{ij}^{-1} = R_{ij}^T. \quad (4.489)$$
By multiplying a matrix $A_p$ by $R_{ij}^{-1}$ on the left and by $R_{ij}$ on the right, respectively,
$$A_{p+1} = R_{ij}^{-1}A_pR_{ij}, \quad (4.490)$$
we obtain a new matrix which has the property
$$a^{(p+1)}_{ij} = a^{(p+1)}_{ji} = 0, \quad (4.491)$$
that is, two new extradiagonal elements equal to zero have been created.
On the other hand, the Euclidian norm remains unchanged under similarity transformations with rotation matrices, so that
$$\sum_{k\neq l}\left(a^{(p+1)}_{kl}\right)^2 = \sum_{k\neq l}\left(a^{(p)}_{kl}\right)^2 - 2\left(a^{(p)}_{ij}\right)^2 + \frac{1}{2}\left[\left(a^{(p)}_{jj} - a^{(p)}_{ii}\right)\sin 2\alpha + 2a^{(p)}_{ij}\cos 2\alpha\right]^2. \quad (4.492)$$
It follows therefore that the Euclidian norm of the new matrix, calculated only with the extradiagonal elements, will diminish the most if
$$|a^{(p)}_{ij}| = \max_{k\neq l} |a^{(p)}_{kl}| \quad (4.493)$$
and
$$\tan 2\alpha = \frac{2a^{(p)}_{ij}}{a^{(p)}_{jj} - a^{(p)}_{ii}}, \quad |\alpha| \le \frac{\pi}{4}. \quad (4.494)$$
If we denote such a norm by $\|\cdot\|_k$, then
$$\|A_{p+1}\|_k^2 = \|A_p\|_k^2 - 2\left(a^{(p)}_{i_pj_p}\right)^2 \quad (4.495)$$
and, furthermore,
$$\left(a^{(p)}_{i_pj_p}\right)^2 \ge \frac{\|A_p\|_k^2}{n(n-1)}, \quad (4.496)$$
because $a^{(p)}_{i_pj_p}$ is the maximal element in modulus outside the principal diagonal of the matrix $A_p$. This results in the sequence of inequalities
$$\|A_{p+1}\|_k^2 \le \|A_p\|_k^2\left(1 - \frac{2}{n(n-1)}\right) \le \cdots \le \|A_0\|_k^2\left(1 - \frac{2}{n(n-1)}\right)^{p+1}, \quad (4.497)$$
hence
$$\lim_{p\to\infty} \|A_{p+1}\|_k = 0. \quad (4.498)$$
One thus obtains a matrix $A^*$, all the extradiagonal elements of which are equal to zero, while on the principal diagonal it has the eigenvalues of the matrix A.
Moreover, for a matrix Ap, p ∈ N, the elements of the principal diagonal approximate the
eigenvalues of the matrix A, while the columns of the matrix
R = Ri1j1
Ri2j2
· · · Rip−1jp−1
(4.499)
approximate the eigenvectors of the matrix A.
4.7 QR DECOMPOSITION
Definition 4.13 Let v ∈ Mn,1(R), v = 0. The matrix
H = In −
2vvT
vTv
(4.500)
is called the Householder17
reflexion or Householder matrix or Householder transformation, while
the vector v is called the Householder vector, In being the unit matrix of order n.
17Introduced by Alston Scott Householder (1904–1993) in 1958.
170 LINEAR ALGEBRA
Let a vector x ∈ Mn,1(R),
x = [x1 x2 · · · xn]T
(4.501)
and let us calculate
Hx = In −
2vvT
vTv
x = x −
2vT
x
vTv
v. (4.502)
Let e1 be the first column of the unit matrix In. If Hx is in the vector subspace generated by e1,
then it follows results that v is in the vector subspace generated by x and e1. Let us take
v = x + λe1, (4.503)
where λ ∈ R. It follows that
vT
x = xT
x + λx1, (4.504)
vT
v = xT
x + 2λx1 + λ2
, (4.505)
Hx = 1 − 2
xT
x + λx1
xTx + 2λx1 + λ2
− 2λ
vT
x
vTv
e1; (4.506)
the condition that Hx be in the vector subspace generated by e1 leads to
1 − 2
xT
x + λx1
xTx + 2λx1 + λ2
= 0, (4.507)
that is,
λ2
= xT
x, λ = ±
√
xTx. (4.508)
Definition 4.14 Let A ∈ Mm,n(R). We call the following expression the QR factorization of the
matrix A:
A = QR, (4.509)
where Q ∈ Mm(R) is an orthogonal matrix.
QT
Q = Im, (4.510)
and R ∈ Mm,n(R) is an upper triangular matrix.
Let
A =




a11 a12 a13 · · · a1n
a21 a22 a23 · · · a2n
· · · · · · · · · · · · · · ·
am1 am2 am3 · · · amn



 . (4.511)
We find a Householder matrix H1 ∈ Mm(R), so that
H1A =




a11 a12 a13 · · · a1n
0 a22 a23 · · · a2n
· · · · · · · · · · · · · · ·
0 am2 am3 · · · amn



 . (4.512)
QR DECOMPOSITION 171
We determine now a new Householder matrix H2 ∈ Mm−1(R) with the property
H2





a22
a23
...
am2





=





a22
0
...
0





(4.513)
and choose
H2 =
1 O
0 H2
. (4.514)
Thus,
H2H1A =






a11 a12 a13 · · · a1n
0 a22 a23 · · · a2n
0 0 a33 · · · a3n
· · · · · · · · · · · · · · ·
0 0 am3 · · · amn






. (4.515)
The procedure is continuing with the determination of the matrix H3 with the property
H3





a33
a43
...
am3





=





a33
0
...
0





(4.516)
and with the choice of the matrix
H3 =
I2 0
0 H3
. (4.517)
Thus, we determine the Householder matrices H1, H2, H3, . . . , Hp, where p = min{m, n}.
Moreover,
R = HpHp−1 . . . H2H1A (4.518)
and
Q = H1H2 . . . Hp−1Hp. (4.519)
Another possibility to obtain the QR decomposition is by the use of the Givens rotation matrices.
Definition 4.15 The matrix denoted by G(i, j, θ), which is different from the unit matrix In, and
whose elements are given by
gii = cos θ, gij = sin θ, gji = − sin θ, gjj = cos θ. (4.520)
is called the Givens rotation18
matrix of order n.
18Defined by James Wallace Givens Jr. (1910–1993) in 1950.
172 LINEAR ALGEBRA
Let y be the product
y =





y1
y2
...
yn





= G(i, j, θ)





x1
x2
...
xn





. (4.521)
It follows that
yk =



xi cos θ − xj sin θ, for k = i
xi sin θ + xj cos θ, for k = j
xk, otherwise,
(4.522)
so that yk = 0 for
cos θ =
xi
x2
i + x2
j
, sin θ = −
xj
x2
i + x2
j
. (4.523)
Multiplying the matrix A on the left by various Givens matrices GT
1 , GT
2 , . . . , GT
r , results, in a
finite number of steps, in the matrix
R = GT
r GT
r−1 · · · GT
2 GT
1 A, (4.524)
an upper triangular matrix. The matrix Q is given by
Q = G1G2 · · · Gr−1Gr . (4.525)
4.8 THE SINGULAR VALUE DECOMPOSITION (SVD)
Definition 4.16
(i) Let x1, x2, . . . , xp be vectors in Rn, p ≤ n. We say that the vectors xi, i = 1, p, are
orthogonal if
xT
i xj = 0, (4.526)
for any 1 ≤ i, j ≤ p, i = j.
(ii) If, in addition,
xT
i xi = 1 (4.527)
for any 1 ≤ i ≤ p, then the vectors x1, x2, . . . , xp are called orthonormal vectors.
Observation 4.41
(i) If xi, i = 1, p, are orthogonal, then they are also linear independent.
(ii) The system of orthogonal vectors xi, i = 1, p, of Rn
, p < n, may be completed by the
vectors xp+1, . . . , xn, so that the new system of vectors x1, . . . , xn is orthogonal.
(iii) There exists A1 ∈ Mn,(n−p)(R), p < n, so that the matrix
A = A1 A2 (4.528)
has orthonormal columns.
THE SINGULAR VALUE DECOMPOSITION (SVD) 173
Theorem 4.2 (Singular Value Decomposition (SVD)19
). If A ∈ Mm,n(R) then there exist
orthogonal matrices U ∈ Mm(R) and V ∈ Mn(R) so that
UT
AV =




σ1 0 · · · 0
0 σ2 · · · 0
· · · · · · · · · · · ·
0 0 · · · σp



 , (4.529)
where p = min{m, n}.
Demonstration. Let x ∈ Rn
and y ∈ Rm
be two vectors of unitary norm that fulfill the relation
Ax = A 2y = σy. (4.530)
Taking into account the previous observation, we know that matrices V2 ∈ Mn,n−1(R) and
U2 ∈ Mm,m−1(R) exist, so that the matrices V = [xV2] ∈ Mn(R) and U = [yU2] ∈ Mm(R) are
orthogonal.
On the other hand,
UT
AV =
σ wT
0 B
= A1. (4.531)
But
A1
σ
w
2
2
≥ (σ2
+ wT
w)2
(4.532)
and
A1
σ
w 2
≤ A1 2
σ
w 2
. (4.533)
Then
X 2 = XT
2, (4.534)
for any matrix X, and hence,
σ
w 2
=
σ
w 2
σ
w
T
2
≥ σ w
σ
w 2
= (σ2 + wTw), (4.535)
so that
A1
2
2 ≥ σ2
+ wT
w. (4.536)
But U and V are orthogonal; we have
UAV 2 = UT
AV 2 = A 2 (4.537)
and we deduce
σ2
= A 2
2 = UT
AV 2
2 = A1
2
2. (4.538)
19The algorithm for SVD was given by Gene Howard Golub (1932–2007) and William Morton Kahan (1933–)
in 1970.
174 LINEAR ALGEBRA
Comparing relations (4.536) and (4.538), it follows that
wT
w = w 2
2 = 0, (4.539)
and hence,
w = 0. (4.540)
The procedure is continued for the matrix B ∈ Mm−1,n−1(R) and so on, the theorem being
proved.
In the demonstration, we used 2 defined as follows:
• for x ∈ Rn
,
x 2 = x2
1 + x2
2 + · · · + x2
n. (4.541)
• for A ∈ Mm,n(R),
A 2 = sup
x=0
Ax 2
x 2
= max
x 2=1
Ax 2. (4.542)
4.9 USE OF THE LEAST SQUARES METHOD IN SOLVING THE LINEAR
OVERDETERMINED SYSTEMS
We consider the linear system
Ax = b, (4.543)
where A ∈ Mm,n(R), m ≥ n, x ∈ Mn,1(R), b ∈ Mm,1(R).
Definition 4.17 System (4.543) is called an overdetermined system.
Obviously, system (4.543) has an exact solution only in some particular cases.
An idea of solving consists in finding the vector x so as to minimize the expression Ax − b ,
where is one of the norms of the matrix, that is
min
x∈Mn,1(R)
Ax − b . (4.544)
It is obvious that the answer depends on the chosen norm.
Usually, we consider the norm 2, which leads to the least squares method.
To begin, let us consider the case in which the columns of the matrix A are linearly independent.
We start from the equality
A(x + αz) − b 2
2 = Ax − b 2
2 + 2αxT
AT
(Ax − b) + α2
Az 2
2, (4.545)
where α is a real parameter, while z ∈ Mn,1(R).
If x is a solution of relation (4.544), then
AT
(Ax − b) = 0. (4.546)
Indeed, if relation (4.546) is not satisfied, then we choose
z = −AT
(Ax − b) (4.547)
USE OF THE LEAST SQUARES METHOD 175
and from equation (4.545) we get
A(x + αz) − b 2
2 < Ax − b 2
2, (4.548)
that is, x does not minimize expression (4.544), which is absurd.
It follows therefore that if the columns of the matrix A are linearly independent, then the solution
of system (4.543) in the sense of the least squares, denoted by xLS, is obtained from the linear system
AT
AxLS = AT
b. (4.549)
Definition 4.18
(i) System (4.549) is called the system of normal equations.
(ii) The expression
rLS = b − AxLS (4.550)
is called the minimum residual.
If A has the QR decomposition, where Q ∈ Mm(R) is orthogonal, while R is upper triangular,
then
QT
A = R =










r11 r12 · · · r1n
0 r22 · · · r2n
· · · · · · · · · · · ·
0 0 · · · rnn
0 0 · · · 0
· · · · · · · · · · · ·
0 0 · · · 0










. (4.551)
We also denote
QT
b =
c
d
, (4.552)
where
c = [c1 c2 · · · cn]T
, d = [d1 d2 · · · dm−n]T
. (4.553)
Thus, it follows that
Ax − b 2
2 = QT
Ax − QT
b 2
2 = R1x − c 2
2 + d 2
2, (4.554)
with
R1 =




r11 r12 · · · 0
0 r22 · · · r2n
· · · · · · · · · · · ·
0 0 · · · rnn



 , R1 ∈ Mn(R). (4.555)
As
rank(A) = rank(R1) = n, (4.556)
the solution of system (4.543) in the sense of the least squares is obtained from the system
R1xLS = c. (4.557)
176 LINEAR ALGEBRA
The case in which the columns of the matrix A are not linearly independent is somewhat more
complicated.
Let us denote by x a solution of equation (4.544) and let z ∈ null(A). It follows that x + z is also
a solution of equation (4.544), hence condition (4.544) does not have a unique solution. Moreover,
the set of all x ∈ Mn,1(R) for which Ax − b 2 is minimum is a convex set. We define in this set
xLS as being that x for which x 2 is minimum. Let us show that xLS is unique.
We denote by Q and Z two orthogonal matrices for which
QT
AZ = T =










t11 t12 · · · t1r 0 · · · 0
t21 t22 · · · t2r 0 · · · 0
· · · · · · · · · · · · · · · · · · · · ·
tr1 tr2 · · · trr 0 · · · 0
0 0 · · · 0 0 · · · 0
· · · · · · · · · · · · · · · · · · · · ·
0 0 · · · 0 0 · · · 0










, (4.558)
where r = rank(A).
Under these conditions,
Ax − b 2
2 = (QT
AZ)ZT
x − QT
b 2
2 = T1w − c 2
2 + d 2
2, (4.559)
where
ZT
x =
w
y
, QT
b =
c
d
, (4.560)
w = [w1 w2 · · · wr ]T
, y = [y1 y2 · · · yn−r ]T
,
c = [c1 c2 · · · cr ]T
, d = [d1 d2 · · · dn−r ]T
, (4.561)
T1 =




t11 t12 · · · t1r
t21 t22 · · · t2r
· · · · · · · · · · · ·
tr1 tr2 · · · trr



 . (4.562)
If we choose x such that equation (4.559) be minimum, then
w = T−1
1 c (4.563)
and
xLS = Z
T−1
1 c
0
. (4.564)
If we use SVD for the matrix A, then
xLS =
r
i=1
uT
i b
σi
vi, (4.565)
THE PSEUDO-INVERSE OF A MATRIX 177
where
UT
AV = Σ =










σ1 0 · · · 0 0 · · · 0
0 σ2 · · · 0 0 · · · 0
· · · · · · · · · · · · · · · · · · · · ·
0 0 · · · σr 0 · · · 0
0 0 · · · 0 0 · · · 0
· · · · · · · · · · · · · · · · · · · · ·
0 0 · · · 0 0 · · · 0










, Σ ∈ Mm,n(R), (4.566)
U = [u1 u2 · · · um], V = [v1 v2 · · · vn]. (4.567)
4.10 THE PSEUDO-INVERSE OF A MATRIX
Let A ∈ Mm,n(R) for which we know the SVD,
UT
AV = Σ ∈ Mm,n(R) (4.568)
and let r = rank(A).
Definition 4.19 The matrix A+
∈ Mn,m(R) is defined by
A+
= VΣ+
UT
, (4.569)
where Σ+ ∈ Mn,m(R) and
Σ+
=
















1
σ1
0 · · · 0 0 · · · 0
0
1
σ2
· · · 0 0 · · · 0
· · · · · · · · · · · · · · · · · · · · ·
0 0 · · ·
1
σr
0 · · · 0
0 0 · · · 0 0 · · · 0
· · · · · · · · · · · · · · · · · · 0
0 0 · · · 0 0 · · · 0
















. (4.570)
Let us observe that
xLS = A+
b; (4.571)
hence A+
is the unique solution of the problem
min
X∈Mn,m(R)
AX − Im k. (4.572)
178 LINEAR ALGEBRA
4.11 SOLVING OF THE UNDERDETERMINED LINEAR SYSTEMS
Definition 4.20 The linear system
Ax = b, (4.573)
where A ∈ Mm,n(R), b ∈ Mm,1(R), x ∈ Mn,1(R) and m < n is called an underdetermined linear
system.
Let us consider the QR decomposition of the matrix AT
,
AT
= QR = Q
R1
0n−m,m
, (4.574)
where R1 ∈ Mm(R), while 0n−m,m is a matrix of Mn−m,m(R) with all elements equal to zero.
System (4.573) is now written in the form
(QR)T
x = RT
1 0m,n−m
z1
z2
= b, (4.575)
where z1 ∈ Mm,1(R), z2 ∈ Mn−m,1(R) and
QT
x = [z1 z2]T
. (4.576)
The minimum in norm solution is obtained if we impose the condition z2 = 0.
In general, an underdetermined system either does not have a solution or has an infinite number
of solutions.
4.12 NUMERICAL EXAMPLES
Example 4.1 Let us calculate the determinant of the matrix
A =


1 2 −3
5 0 4
2 1 7

 . (4.577)
If we calculate the determinant by means of the definition, then we have to consider 3! = 6
permutations. These permutations, together with their signs and the corresponding products are
given below.
Permutation Sign Product
p1 = (1, 2, 3) + P1 = a11a22a33 = 0
p2 = (1, 3, 2) – P2 = a11a23a32 = 4
p3 = (2, 1, 3) – P3 = a12a21a33 = 70
p4 = (2, 3, 1) + P4 = a12a23a31 = 16
p5 = (3, 1, 2) + P5 = a13a21a32 = −15
p6 = (3, 2, 1) – P6 = a13a22a31 = 0
We obtain
det A = P1 − P2 − P3 + P4 + P5 − P6 = −73. (4.578)
NUMERICAL EXAMPLES 179
The same problem may be solved by means of equivalent matrices. Let us denote by the
required determinant and let us commute the rows 1 and 2 of the matrix A with each other in order
to realize the pivoting with the maximum element in modulus of the column 1. We have
= −
5 0 4
1 2 −3
2 1 7
. (4.579)
We multiply row 1 by −1/5 and −2/5, and we add it to the rows 2 and 3, respectively, obtaining
= −
5 0 −4
0 2 −
19
5
0 1
27
5
. (4.580)
We now multiply row 2 by −1/2 and we add it to row 3, obtaining
= −
5 0 −4
0 2 −
19
5
0 0
73
10
= −73. (4.581)
Example 4.2 Let us calculate the rank of the matrix
A =


1 2 3 0
3 5 8 1
6 11 17 1

 . (4.582)
We observe that the minor of second order
2 =
1 2
3 5
= −1 (4.583)
has a non zero value, hence the rank of A is at least equal to two.
Let us now border this minor by elements so as to obtain all the minors of order 3. As a matter
of fact we have only two such minors, that is
31 =
1 2 3
3 5 8
6 11 17
= 0, (4.584)
32 =
1 2 0
3 5 1
6 11 1
= 0, (4.585)
so it follows that
rank A = 2. (4.586)
180 LINEAR ALGEBRA
We may solve this problem by using equivalent matrices too. Thus, the rank of the matrix A is
the same with the rank of the matrix obtained from the matrix A by commuting rows 1 and 3 with
each other,
A =


1 2 3 0
3 5 8 1
6 11 17 1

 ∼


6 11 17 1
3 5 8 1
1 2 3 0

 . (4.587)
We now multiply, in the new matrix, row 1 by −1/2 and −1/6, and add it to rows 2 and 3,
respectively, obtaining
A ∼





6 11 17 1
0 −
1
2
−
1
2
1
2
0
1
6
1
6
−
1
6





. (4.588)
We now multiply the rows 2 and 3 by 2 and 6, respectively, obtaining
A ∼


6 11 17 1
0 −1 −1 1
0 1 1 −1

 . (4.589)
We multiply column 1 by −11/6, by −17/6 and by −1/6 and add it to columns 2, 3, and 4,
respectively, obtaining
A ∼


6 0 0 0
0 −1 −1 1
0 1 1 −1

 . (4.590)
We add now the second row to the third one, resulting
A ∼


6 0 0 0
0 −1 −1 −1
0 0 0 0

 . (4.591)
The last step consists in the subtraction of the second column from the third one and by addition
of the second column to the fourth one, deducing
A ∼


6 0 0 0
0 −1 0 0
0 0 0 0

 = B. (4.592)
To determine the rank of the matrix A it is now sufficient to number the non-zero elements of
the principal quasi-diagonal of the matrix B, that is, the elements b11 = 6, b22 = −1 and b33 = 0.
It follows that
rank A = 2. (4.593)
Example 4.3 Let the matrix
A =


1 2 −1
0 3 4
5 6 −2

 . (4.594)
We pose the problem of calculating the inverse of this matrix.
The direct method supposes the calculation of the determinant
det A = 25 (4.595)
NUMERICAL EXAMPLES 181
and of the minors
11 =
3 4
6 −2
= −30, 12 =
0 4
5 −2
= −20, 13 =
0 3
5 6
= −15,
21 =
2 −1
6 −2
= 2, 22 =
1 −1
5 −2
= 3, 23 =
1 2
5 6
= −4, 31 =
2 −1
3 4
= 11,
32 =
1 −1
0 4
= 4, 33 =
1 2
0 3
= 3, (4.596)
from which
A−1
=
1
25


−30 −2 11
20 3 −4
−15 4 3

 . (4.597)
We now pass on to the Gauss–Jordan method for which we construct the table
1 2 −1
0 3 4
5 6 −2
1 0 0
0 1 0
0 0 1
. (4.598)
We commute rows 1 and 3 with each other,
5 6 −2
0 3 4
1 2 −1
0 0 1
0 1 0
1 0 0
, (4.599)
we divide row 1 by 5,
1
6
5
−
2
5
0 3 4
1 2 −1
0 0
1
5
0 1 0
1 0 −
1
5
. (4.600)
and then we subtract row 1 from row 3, obtaining
1
6
5
−
2
5
0 3 4
0
4
5
−
3
5
0 0
1
5
0 1 0
1 0 −
1
5
. (4.601)
We now divide row 2 by 3,
1
6
5
−
2
5
0 1
4
3
0
4
5
−
3
5
0 0
1
5
0
1
3
0
1 0 −
1
5
, (4.602)
182 LINEAR ALGEBRA
and then multiply the new row 2 by −6/5 and −4/5, and add the results to rows 1 and 3, respectively,
obtaining
1 0 −2
0 1
4
3
0 0 −
3
5
0 −
2
5
1
5
0
1
3
0
1 −
14
5
−
1
5
. (4.603)
Further, we divide the third row by −5/3,
1 0 −2
0 1
4
3
0 0 1
0 −
2
5
1
5
0
1
3
0
−
3
5
4
25
3
25
, (4.604)
and multiply the new row 3 by 2 and −4/3, and add it to the rows 1 and 2, respectively,
1 0 0
0 1 0
0 0 1
−
6
5
−
2
25
11
25
4
5
3
25
−
4
25
−
3
5
4
25
3
25
. (4.605)
We have thus, in the right part of table (4.605), the searched required inverse, given before in
equation (4.597).
We shall solve now the same problem by the method of partitioning the matrix A.
If
A =
A1 A3
A2 A4
, A−1
= B =
B1 B3
B2 B4
, (4.606)
then we have
B4 = (A4 − A2A−1
1 A3)−1
, ∈ B3 = −A−1
1 A3B4, B2 = −B4A2A−1
1 , B1 = A−1
1 − A−1
1 A3B2,
(4.607)
with the conditions that A4 − A2A−1
1 A3 and A1 be invertible matrices.
Let us choose
A1 = [1], A2 =
0
5
, A3 = [2 − 1], A4 =
3 4
6 −2
, (4.608)
NUMERICAL EXAMPLES 183
from which
A−1
1 = [1], (4.609)
B4 = A4 − A2A−1
1 A3 =
3 4
6 −2
−
0
5
[1] 2 −1 =
3 4
−4 3
, (4.610)
(A4 − A2A−1
1 A3)−1
=
1
25
3 −4
4 3
, (4.611)
B3 = −[1] 2 −1
1
25
3 −4
4 3
=
1
25
[2 11], (4.612)
B2 = −
1
25
3 −4
4 3
0
5
=
1
25
20
−15
, (4.613)
B1 = −[1] − [1] 2 −1
1
25
20
−15
= −
6
5
. (4.614)
We obtained thus the same inverse (4.597).
To determine the inverse using the iterative method (Schulz) we shall consider an approximation
B0 of A−1
, given by
B0 =


−1.23 −0.1 0.46
0.77 0.13 −0.15
−0.62 0.17 0.11

 . (4.615)
We deduce
C0 = I3 − AB0 =


0.07 0.01 −0.05
0.17 −0.07 0.01
0.29 0.06 −0.18

 , (4.616)
C0 ∞ = 0.53, (4.617)
so that we may apply Schulz’s method.
There follows, successively,
B1 = B0 + B0C0 =


−1.1997 −0.0777 0.4377
0.8025 0.1196 −0.1602
−0.6026 0.1595 0.1229

 , (4.618)
C1 = I3 − AB1 =


−0.0079 −0.002 0.0056
0.0029 0.0032 −0.011
−0.0217 −0.0101 0.0185

 , (4.619)
B2 = B1 + B1C1 =


−1.199946 −0.079970 0.439934
0.799983 0.119996 −0.159985
−0.600044 0.159974 0.120045

 . (4.620)
The procedure may, obviously, continue, the exact value of the inverse being
A−1
= lim
n→∞
Bn =


−1.2 −0.08 0.44
0.8 0.12 −0.16
−0.6 0.16 0.12

 . (4.621)
184 LINEAR ALGEBRA
Another possibility to determine A−1
consists in the use of the characteristic polynomial of the
matrix A. To do this, we calculate
1 − λ 2 −1
0 3 − λ 4
5 6 −2 − λ
= −λ3
+ 2λ2
+ 24λ + 25, (4.622)
which leads to the equation
A3
+ 2A2
+ 24A + 25I3 = O3, (4.623)
from which, multiplying by A−1
, we get
−A2
+ 2A + 24I3 = −25A−1
; (4.624)
hence
A−1
=
1
25
(A2
− 2A − 24I3). (4.625)
But
A2
=


1 2 −1
0 3 4
5 6 −2




1 2 −1
0 3 4
5 6 −2

 =


−4 2 9
20 33 4
−5 16 23

 (4.626)
and it follows that
A−1
=
1
25




−4 2 9
20 33 4
−5 16 23

 − 2


1 2 −1
0 3 4
5 6 −2

 − 24


1 0 0
0 1 0
0 0 1



 = −
1
25


−30 −2 11
20 3 −4
−15 4 3

 .
(4.627)
Let us now calculate the inverse of A, using the Frame–Fadeev method.
We have successively
A1 = A =


1 2 −1
0 3 4
5 6 −2

 , σ1 = −Tr(A1) = −2,
B1 = A1 + σ1I3 =


−1 2 −1
0 1 4
5 6 −4

 , (4.628)
A2 = AB1 =


−6 −2 11
20 27 −4
−15 4 27

 , σ2 = −
1
2
Tr(A2) = −24,
B2 = A2 + σ2I3 =


−30 −2 11
20 3 −4
−15 4 3

 , (4.629)
A3 = AB2 =


25 0 0
0 25 0
0 0 25

 , σ3 = −
1
3
Tr(A3) = −25,
B3 = A3 + σ3I3 =


0 0 0
0 0 0
0 0 0

 ; (4.630)
NUMERICAL EXAMPLES 185
hence
A−1
= −
1
σ3
B2 =
1
25


−30 −2 11
20 3 −4
−15 4 3

 . (4.631)
To determine the inverse of A by Schur’s method, let us consider
A =
A1 A2
A3 A4
, (4.632)
where
A1 = [1], A2 = [2 − 1], A3 =
0
5
, A4 =
3 4
6 −2
. (4.633)
We have
A−1
4 = −
1
30
−2 −4
−6 3
, (4.634)
A2A−1
4 = −
1
30
[2 − 1]
−2 −4
−6 3
= −
1
30
[2 − 11], (4.635)
A−1
4 A3 = −
1
30
−2 −4
−6 3
0
5
= −
1
30
−20
15
, (4.636)
A2A−1
4 A3 = −
1
30
[2 − 11]
0
5
=
11
6
, (4.637)
A1 − A2A−1
4 A3 = −
5
6
, (A1 − A2A−1
4 A3)−1
= −
6
5
. (4.638)
We may write
A =


1 2 −1
0 3 4
5 6 −2

 =




1 −
2
30
11
30
0 1 0
0 0 1








−
5
6
0 0
0 3 4
0 6 −2










1 0 0
2
3
1 0
−
1
2
0 1






, (4.639)
A−1
=







1 0 0
−
2
3
1 0
1
2
0 1















−
6
5
0 0
0
1
15
2
15
0
1
5
−
1
10













1
1
15
−
11
30
0 1 0
0 0 1





=








−
6
5
−
2
25
11
25
4
5
3
25
−
4
25
−
3
5
4
25
3
25








. (4.640)
Example 4.4 Let the linear system be
10x1 + 2x2 − x3 = 7, 2x1 + 8x2 + x3 = −5, x1 + x2 + 10x3 = 8, (4.641)
the solution of which is required.
186 LINEAR ALGEBRA
If we wish to apply Cramer’s rule, then we must calculate the determinants
=
10 2 −1
2 8 1
−1 1 10
= 738, 1 =
7 2 −1
−5 8 1
8 1 10
= 738,
2 =
10 7 −1
2 −5 1
−1 8 10
= −738, 3 =
10 2 7
2 8 −5
−1 1 8
= 738, (4.642)
wherefrom
x1 = 1
= 1, x2 = 2
= −1, x3 = 3
= 1. (4.643)
To solve the same problem by Gauss’s method, we multiply the first equation in system (4.641)
by −1/5 and by 1/10 and we will add it to the second and third equations (4.642) and (4.643),
respectively, obtaining
10x1 + 2x2 − x3 = 7,
38
5
x2 +
6
5
x3 = −
32
5
,
6
5
x2 +
99
10
x3 =
87
10
. (4.644)
We now multiply the second equation in system (4.644) by −3/19 and add it to the third equation
(4.644), resulting in the system
10x1 + 2x2 − x3 = 7,
38
5
x2 +
6
5
x3 = −
32
5
,
369
38
x3 =
369
38
, (4.645)
with the solution
x3 = 1, x2 = −1, x1 = 1. (4.646)
The first step in solving by the Gauss–Jordan method leads to the same system (4.644). We
now multiply the second equation by −5/19 and by −3/19 and add it to the first and to the third
equations of system (4.644), respectively, obtaining
10x1 −
25
19
x3 =
165
19
,
38
5
x2 +
6
5
x3 = −
32
5
,
369
38
x3 =
369
38
. (4.647)
We now multiply the third equation of system (4.647) by 50/369 and −76/615, and add it to the
first and second equations (4.641) and (4.642), respectively, obtaining
10x1 = 10,
38
5
x2 = −
38
5
,
369
38
x3 =
369
38
(4.648)
and the solution
x1 = 1, x2 = −1, x3 = 1. (4.649)
Applying the Doolittle method of factorization LU, we are led to


1 0 0
l21 1 0
l31 l32 1




u11 u12 u13
0 u22 u23
0 0 u33

 =


10 2 −1
2 8 1
−1 1 10

 , (4.650)
wherefrom we obtain the system
u11 = 10, u12 = 2, u13 = −1, l21u11 = 2, l21u12 + u22 = 8, l21u13 + u23 = 1,
l31u11 = −1, l31u12 + l32u22 = 1, l31u13 + l32u23 + u33 = 10, (4.651)
NUMERICAL EXAMPLES 187
with the solution
u11 = 10, u12 = 2, u13 = −1, l21 =
1
5
, u22 =
38
5
, u23 =
6
5
, l31 = −
1
10
, l32 =
3
19
,
u33 =
369
38
. (4.652)
There results
L =







1 0 0
1
5
1 0
−
1
10
3
19
1







, U =







10 2 −1
0
38
5
6
5
0 0
369
38







. (4.653)
We denote
Ux = y (4.654)
and solve the system
Ly = b, (4.655)
that is
y1 = 7,
1
5
y1 + y2 = −5, −
1
10
y1 +
3
19
y2 + y3 = 8, (4.656)
wherefrom
y1 = 7, y2 = −
32
5
, y3 =
369
38
. (4.657)
Expression (4.654) leads to the system
10x1 + 2x2 − x3 = 7,
38
5
x2 +
6
5
x3 = −
32
5
,
369
38
x3 =
369
38
, (4.658)
with the known solution (4.649).
The Crout method leads to


l11 0 0
l21 l22 0
l31 l32 l33




1 u12 u13
0 1 u23
0 0 1

 =


10 2 −1
2 8 1
−1 1 10

 , (4.659)
wherefrom
l11 = 10, l11u12 = 2, l11u13 = −1, l21 = 2, l21u12 + l22 = 8, l21u13 + l22u23 = 1,
l31 = −1, l31u12 + l32 = 1, l31u13 + l32u23 + l33 = 10, (4.660)
with the solution
l11 = 10, u12 =
1
5
, u13 = −
1
10
, l22 =
38
5
, u23 =
3
19
, l31 = −1; (4.661)
hence
L =


10 0 0
2 38
5 0
−1 6
5
369
38

 , U =


1 1
5 − 1
10
0 1 3
19
0 0 1

 . (4.662)
188 LINEAR ALGEBRA
This results in the system
10y1 = 7, 2y1 +
38
5
y2 = −5, −y1 +
6
5
y2 +
369
38
y3 = 8, (4.663)
with the solution
y1 =
7
10
, y2 = −
16
19
, y3 = 1, (4.664)
and the system
x1 +
1
5
x2 −
1
10
x3 =
7
10
, x2 +
3
19
x3 = −
16
19
, x3 = 1, (4.665)
with the same solution (4.649).
To apply the Cholesky method, we must verify that the matrix A is symmetric (obviously!) and
positive definite.
We have
A =


10 2 −1
2 8 1
−1 1 10

 , (4.666)
xT
Ax = x1 x2 x3


10 2 −1
2 8 1
−1 1 10




x1
x2
x3


= (2x1 + x2)2
+ (x1 − x3)2
+ (x2 + x3)2
+ 5x2
1 + 6x2
2 + 8x2
3 > 0, (4.667)
for any x = 0.
Hence, we may apply the Cholesky method in which
L =


l11 0 0
l21 l22 0
l31 l32 l33

 , U =


l11 l21 l31
0 l22 l32
0 0 l33

 . (4.668)
It results the system
l2
11 = 10, l11l21 = 2, l11l31 = −1, l21l11 = 2, l2
21 + l2
22 = 8, l21l31 + l22l32 = 1,
l31l11 = −1, l31l21 + l32l22 = 1, l2
31 + l2
32 + l2
33 = 10, (4.669)
with the solution
l11 =
√
10, l21 =
2
√
10
, l31 = −
1
√
10
, l22 =
38
5
, l32 =
6
√
190
, l33 =
369
38
, (4.670)
so that
L =




√
10 0 0
2√
10
38
5 0
− 1√
10
6√
190
369
38



 , U =





√
10 2√
10
− 1√
10
0 38
5
6√
190
0 0 369
38





. (4.671)
We obtain the systems
√
10y1 = 7,
2
√
10
y1 +
38
5
y2 = −5, −
1
√
10
y1 +
6
√
190
y2 +
369
38
y3 = 8, (4.672)
NUMERICAL EXAMPLES 189
with the solution
y1 =
7
√
10
, y2 = −
32
√
190
, y3 =
369
38
, (4.673)
and
√
10x1 +
2
√
10
x2 −
1
√
10
x3 =
7
√
10
,
38
5
x2 +
6
√
190
x3 = −
32
√
190
,
369
38
x3 =
369
38
,
(4.674)
respectively, wherefrom results solution (4.649).
To solve system (4.641) by the iteration (Jacobi) method, we write it in the form
x1 = 0.7 − 0.2x2 + 0.1x3, x2 = −0.625 − 0.25x1 − 0.125x3, x3 = 0.8 + 0.1x1 − 0.1x2,
(4.675)
the matrices α and β having the expressions
α =


0 −0.2 0.1
−0.25 0 −0.125
0.1 −0.1 0

 , β =


0.7
−0.625
0.8

 . (4.676)
We choose
x(0)
= β, (4.677)
the iteration formula being
x(n+1)
= αx(n)
+ β. (4.678)
Let us observe that
α ∞ = 0.375 < 1, (4.679)
so that the Jacobi method may be applied.
We have successively
x(1)
= αx(0)
+ β =


0 −0.2 0.1
−0.25 0 −0.125
0.1 −0.1 0




0.7
−0.625
0.8

 +


0.7
−0.625
0.8

 =


0.905
−0.9
0.9325

 , (4.680)
x(2)
= αx(1)
+ β =


0 −0.2 0.1
−0.25 0 −0.125
0.1 −0.1 0




0.905
−0.9
0.9325

 +


0.7
−0.625
0.8

 =


0.97325
−0.9678125
0.9805

 ,
(4.681)
x(3)
= αx(2)
+ β =


0.9916125
−0.990875
0.99410625

 , x(4)
= αx(3)
+ β =


0.997585625
−0.997166406
0.99824875

 . (4.682)
The procedure may continue, so that at the limit, we obtain
x = lim
n→∞
x(n)
=


1
−1
1

 . (4.683)
190 LINEAR ALGEBRA
The solution of system (4.641) may be determined by means of the Gauss–Seidel method too.
In this case, the iteration formulae read
x(n+1)
1 = 0.7 − 0.2x(n)
2 + 0.1x(n)
3 , x(n+1)
2 = −0.625 − 0.25x(n+1)
1 − 0.125x(n)
3 ,
x(n+1)
3 = 0.8 + 0.1x(n+1)
1 − 0.1x(n+1)
2 . (4.684)
It results successively in
x(1)
1 = 0.7 + 0.2 · 0.625 + 0.1 · 0.8 = 0.905,
x(1)
2 = −0.625 − 0.2 · 0.905 + 0.125 · 0.8 = −0.95125,
x(1)
3 = 0.8 + 0.1 · 0.905 + 0.1 · 0.95125 = 0.985625, (4.685)
x(2)
1 = 0.7 + 0.2 × 0.95125 + 0.1 × 0.985625 = 0.9888125,
x(2)
2 = −0.625 − 0.25 × 0.9888125 − 0.125 × 0.985625 = −0.99540625,
x(2)
3 = 0.8 + 0.1 × 0.9888125 + 0.1 × 0.99540625 = 0.998421875, (4.686)
x(3)
1 = 0.998923437, x(3)
2 = −0.999533593, x(3)
3 = 0.999845703, (4.687)
x(4)
1 = 0.999891288, x(4)
2 = −0.999953534, x(4)
3 = 0.999984482. (4.688)
The procedure continues by obtaining at the limit, for n → ∞, solution (4.649).
If we wish to solve the problem by the relaxation method, then we write system (4.641) in the
form
x1 + 0.2x2 − 0.1x3 − 0.7 = 0, x2 + 0.25x1 + 0.125x3 + 0.625 = 0,
x3 − 0.1x1 + 0.1x2 − 0.8 = 0. (4.689)
Let us replace in equation (4.689) the values given by x(0). It follows that
0.7 − 0.2 × 0.625 − 0.1 × 0.8 − 0.7 = −0.205 = R(0)
1 ,
−0.625 + 0.25 × 0.7 + 0.125 × 0.8 + 0.625 = −0.275 = R(0)
2 ,
0.8 − 0.1 × 0.7 − 0.1 × 0.625 − 0.8 = −0.1325 = R(0)
3 . (4.690)
The greatest remainder in modulus is R(0)
2 , so that
x(1)
= x(0)
+


0
−0.275
0

 =


0.7
−0.9
0.8

 . (4.691)
We now replace in system (4.689) the values given by x(1)
, obtaining the remainders
0.7 − 0.2 × 0.9 − 0.1 × 0.8 − 0.7 = −0.26 = R(1)
1 ,
−0.9 − 0.25 × 0.7 + 0.125 × 0.8 + 0.625 = 0 = R(1)
2 ,
0.8 − 0.1 × 0.7 − 0.1 × 0.9 − 0.8 = −0.16 = R(1)
3 ; (4.692)
NUMERICAL EXAMPLES 191
the greatest remainder in modulus is R(1)
1 , hence
x(2)
= x(1)
+


0.26
0
0

 =


0.96
−0.9
0.8

 . (4.693)
Continuing the procedure, we obtain the values
x(3)
=


0.96
−0.9
0.986

 , x(4)
=


0.96
−0.98825
0.986

 , x(5)
=


0.99625
−0.98825
0.986

 , x(6)
=


0.99625
−0.98825
0.99845

 ,
x(7)
=


0.99625
−0.9988687
0.99845

 , . . . (4.694)
To apply Schur’s method, we write the matrix
A =


10 2 −1
2 8 1
−1 1 10

 (4.695)
in the form
A =
A1 A2
A3 A4
, (4.696)
where
A1 =
10 2
2 8
, A2 =
−1
1
, A3 = [−1 1], A4 = [10]. (4.697)
The vectors
x =


x1
x2
x3

 , b =


7
−5
8

 , (4.698)
are written in the form
x =
x1
x2
, b =
b1
b2
, (4.699)
where
x1 =
x1
x2
, x2 = [x3], b1 =
7
−5
, b2 = [8]. (4.700)
It follows that
x1 = (A1 − A2A−1
4 A3)−1
(b1 − A2A−1
4 b2), (4.701)
x2 = A−1
4 b2 − A−1
4 A3x1. (4.702)
Effecting the necessary calculations, we obtain
A−1
4 =
1
10
, (4.703)
192 LINEAR ALGEBRA
A1 − A2A−1
4 A3 =
1
10
99 21
21 79
, (A1 − A2A−1
4 A3)−1
=
1
738
79 −21
−21 99
, (4.704)
b1 − A2A−1
4 b2 =
1
10
78
−58
, (4.705)
x1 =
1
7380
79 −21
−21 99
78
−58
=
1
−1
(4.706)
x2 =
1
10
[8] −
1
10
[−1 1]
1
−1
= [1]. (4.707)
System (4.641) may be solved by the Monte Carlo method too. To do this, we write it in the
form
x1 = −0.2x2 + 0.1x3 + 0.7, x2 = −0.25x1 − 0.125x3 − 0.625, x3 = 0.1x1 − 0.1x2 + 0.8
(4.708)
and the matrix H becomes
H =




0 0.2 0.1 0.7
0.25 0 0.125 0.625
0.1 0.1 0 0.8
0 0 0 1



 . (4.709)
For the initial state S1, we may write as follows:
• If 0 ≤ x < 0.2, then we pass to the state S2.
• If 0.2 ≤ x < 0.3, then we pass to the state S3.
• If 0.3 ≤ x < 1, then we pass in the final state S4.
For the initial state S2, we have the following:
• If 0 ≤ x < 0.25, then we pass to the state S1.
• If 0.25 ≤ x < 0.375, then we pass to the state S3.
• If 0.375 ≤ x < 1, then we pass to the final state S4.
Finally, for the initial state S3 we get the following:
• If 0 ≤ x < 0.1, then we pass to the state S1.
• If 0.1 ≤ x < 0.2, then we pass to the state S2.
• If 0.2 ≤ x < 1, then we pass to the final state S4.
Moreover,
v11 = 0, v12 = −1, v13 = 1, v21 = −1, v22 = 0, v23 = −1, v31 = 1,
v32 = −1, v33 = 0. (4.710)
There have been 1000 simulations made for each unknown xi, i = 1, 3, of the following form:
NUMERICAL EXAMPLES 193
Nr. Random number Trajectory The value of the aleatory variable X
1 0.263
0.194
0.925 S1, S3, S2, S4 0.7 − 0.8 + 0.325 − 0.625
We obtain the values
x1 ≈ 0.98, x2 ≈ −1.03, x3 ≈ 1.06. (4.711)
Example 4.5 Let x ∈ M2,1(R). We define the norm
x 2 = x2
1 + x2
2 , (4.712)
where
x = x1 x2
T
. (4.713)
For a matrix A ∈ M2(R) we define the norm
A 2 = sup
x=0
Ax 2
x 2
. (4.714)
Let us consider
A =
1 2
0 −1
. (4.715)
We wish to calculate A 2.
Let us show that expression (4.712) defines a norm. First of all x 2 ≥ 0 for any x ∈ M2,1(R).
Moreover, x 2 = 0 leads to x2
1 + x2
2 = 0, with the unique solution x1 = x2 = 0, hence x = 0.
Let y ∈ M2,1(R),
y = y1 y2
T
. (4.716)
We have successively
x + y 2 = (x1 + y1)2 + (x2 + y2)2 = x2
1 + x2
2 + y2
1 + y2
2 + 2x1y1 + 2x2y2, (4.717)
x 2 + y 2 = x2
1 + x2
2 + y2
1 + y2
2 (4.718)
and the inequality
x + y 2 ≤ x 2 + y 2 (4.719)
is equivalent to
x1y1 + x2y2 ≤ x2
1 + x2
2 y2
1 + y2
2 . (4.720)
If x1y1 + x2y2 < 0, then inequality (4.720) is obviously satisfied.
If x1y1 + x2y2 > 0, then we square both members of inequality (4.720) and obtain the equivalent
relation
2x1x2y1y2 ≤ x2
1 y2
2 + x2
2 y2
1 , (4.721)
Which is obviously true.
194 LINEAR ALGEBRA
We also may write
αx 2 = α2x2
1 + α2x2
2 = |α| x 2, (4.722)
where α ∈ R, hence x 2 is a norm.
On the other hand,
A 2 = sup
x=0
Ax 2
x 2
= sup
x=0
A
x
x 2
= max
x 2=1
Ax 2. (4.723)
From x 2 = 1, it follows that there exists θ ∈ [0, 2π) with the property
x = x1 x2
T
= cos θ sin θ
T
. (4.724)
If
A =
a11 a12
a21 a22
, (4.725)
then
Ax| x 2=1 =
a11 cos θ + a12 sin θ
a21 cos θ + a22 sin θ
(4.726)
and
Ax 2 = [(a2
11 + a2
21)cos2
θ + (a2
12 + a2
22)sin2
θ + 2(a11a12 + a21a22) sin θ cos θ]
1
2 . (4.727)
It follows that
A 2 = max
θ∈[0,2π)
a2
11 + a2
21 − a2
12 − a2
22
2
cos 2θ + a11a12 + a21a22 sin 2θ
+
a2
11 + a2
21 + a2
12 + a2
22
2
1
2
. (4.728)
We verify immediately that 2 is norm.
For the matrix A given by equation (4.715), we get
A 2 = max
θ∈[0,2π)
[−2 cos 2θ + 2 sin 2θ + 3]
1
2 . (4.729)
We denote f : [0, 2π) → R,
f (θ) = −2 cos 2θ + 2 sin 2θ + 3, (4.730)
and we may write
f (θ) = 4 sin 2θ + 4 cos 2θ. (4.731)
The equation f (θ) = 0 leads to the solution
tan 2θ = −1, (4.732)
wherefrom
sin 2θ =
√
2
2
, cos 2θ =
−
√
2
2
. (4.733)
NUMERICAL EXAMPLES 195
It follows that
A 2 = 3 + 2
√
2. (4.734)
Example 4.6 Let the matrix
A =


2 1 −1 3
0 3 2 5
2 4 1 8

 , (4.735)
for which we calculate the QR factorization.
We have
x1 = 2 0 2
T
, x 2 = 2
√
2 = λ1 (4.736)
and choose
v1 = x1 + 2
√
2e1 = 2 1 +
√
2 0 1
T
. (4.737)
It follows successively that
v1vT
1 = 4


3 + 2
√
2 0 1 +
√
2
0 0 0
1 +
√
2 0 1

 , (4.738)
vT
1 v1 = 8(2 +
√
2), (4.739)
2
v1vT
1
vT
1 v1
=
1
2 +
√
2


3 + 2
√
2 0 1 +
√
2
0 0 0
1 +
√
2 0 1

 , (4.740)
H1 =
1
2 +
√
2


−1 −
√
2 0 −1 −
√
2
0 2 +
√
2 0
−1 −
√
2 0 1 +
√
2

 , (4.741)
H1A =


−2.828427 −3.535534 0 −7.778175
0 3 2 5
0 2.121320 1.414215 3.535534

 . (4.742)
We also find
x2 = 3 2.121320
T
, x2 2 = 3.674234 = λ2. (4.743)
v2 = x2 + 3.674234e2 = 6.674234 2.121320
T
, (4.744)
v2vT
2 =
44.545399 14.158186
14.158186 4.5
, (4.745)
vT
2 v2 = 49.045399, (4.746)
2
v2vT
2
vT
2 v2
=
1.816497 0.577350
0.577350 0.183503
, (4.747)
H2 =
−0.816497 −0.577350
−0.577350 0.816497
, (4.748)
H2 =


1 0 0
0 −0.816497 −0.577350
0 −0.577350 0.816497

 , (4.749)
196 LINEAR ALGEBRA
H2H1A =


−2.828427 −3.535534 0 −7.778175
0 −3.674235 −2.449491 −6.123726
0 0 0 0

 = R, (4.750)
Q = H1H2 =


−0.707107 0.408248 −0.577350
0 −0.816497 −0.577350
−0.707107 −0.408248 0.577350

 . (4.751)
The same factorization may be found by u with the Givens matrices too.
At the beginning, we equate to zero the element a31 = 2. To do this, we choose the Givens
matrix
G1 =


1 0 0
0 cos θ sin θ
0 − sin θ cos θ

 , (4.752)
such that
GT
1


2
0
2

 =


2
−2 sin θ
2 cos θ

 . (4.753)
The element 2 cos θ vanishes for θ = π/2 and we obtain
G1 =


1 0 0
0 0 1
0 −1 0

 , GT
1 =


1 0 0
0 0 −1
0 1 0

 , (4.754)
GT
1 A =


2 1 −1 3
−2 −4 −1 −8
0 3 2 5

 . (4.755)
We now equate to zero the element −2 of row 2 and column 1 in the matrix GT
1 A. For this, we
choose
G2 =


cos θ sin θ 0
− sin θ cos θ 0
0 0 1

 , GT
2 =


cos θ − sin θ 0
sin θ cos θ 0
0 0 1

 (4.756)
and obtain
GT
2


2
−2
0

 =


2 cos θ + 2 sin θ
2 sin θ − 2 cos θ
0

 . (4.757)
The element 2 sin θ − 2 cos θ vanishes for θ = π/4 and we obtain
G2 =








√
2
2
√
2
2
0
−
√
2
2
√
2
2
0
0 0 1








, (4.758)
NUMERICAL EXAMPLES 197
GT
2 GT
1 A =








√
2
2
−
√
2
2
0
√
2
2
√
2
2
0
0 0 1












2 1 −1 3
−2 −4 −1 −8
0 3 2 5



 =








2
√
2 5
√
2
2
0 11
√
2
2
0 −3
√
2
2
−
√
2 −5
√
2
2
0 3 2 5








.
(4.759)
Obviously, the procedure may be continued, obtaining again the known factorization.
Example 4.7 Let us consider the matrix
A =
1 2
0 2
, (4.760)
for which we want to calculate the SVD.
Let u ∈ M2,1(R),
u = cos θ sin θ
T
, θ ∈ [0, 2π), u 2 = 1. (4.761)
To determine A 2 we have to calculate
Au =
cos θ + 2 sin θ
2 sin θ
(4.762)
and
Au 2 =
9
2
+ 2 sin 2θ −
7
2
cos 2θ. (4.763)
Let f : [0, 2π) → R,
f (θ) =
9
2
+ 2 sin 2θ −
7
2
cos 2θ, (4.764)
for which
f (θ) = 4 cos 2θ + 7 sin 2θ. (4.765)
The equation f (θ) = 0 leads to the solution
tan 2θ = −
4
7
, sin 2θ =
4
√
65
, cos 2θ = −
7
√
65
, (4.766)
hence
A 2 =
9
2
+
√
65
2
= 2.92081 (4.767)
The equation
Ax = σy = A 2y (4.768)
leads to
1 2
0 2
x1
x2
= σ
y1
y2
(4.769)
wherefrom
x1 + 2x2 = σy1, 2x2 = σy2; (4.770)
198 LINEAR ALGEBRA
moreover,
x2
1 + x2
2 = 1, y2
1 + y2
2 = 1. (4.771)
Relation (4.720) leads to
(x1 + x2)2
+ (2x2)2
= σ2
, (4.772)
hence
x2
1 + 4x1x2 + 8x2
2 = σ2
. (4.773)
It follows that
4x1x2 + 7x2
2 = σ2
− 1. (4.774)
We obtain successively
x1 =
σ2
− 1 − 7x2
2
4x2
, (4.775)
σ2
− 1 − 7x2
2
4x2
2
+ x2
2 = 1, (4.776)
65x4
2 − [14(σ2
− 1) + 16]x2
2 + (σ2
− 1)2
= 0, (4.777)
x2
2 = 0.93412, x2 = ±0.9665. (4.778)
We choose
x2 = 0.9665, x1 = 0.2567, (4.779)
wherefrom
y1 = 0.7497, y2 = 0.6618. (4.780)
We now determine the vector v2 so that x = x1 x2
T
and v2 = v1 v2
T
are orthogonal. We
deduce
0.2567v1 + 0.9665v2 = 0 (4.781)
and may choose
v1 = −0.9665, v2 = 0.2567, (4.782)
resulting in the matrix
V =
0.2567 −0.9665
0.9665 0.2567
. (4.783)
Analogically, we get
U =
0.7497 −0.6618
0.6618 0.7497
. (4.784)
Moreover,
UT
AV =
2.92 0
0 0.68
(4.785)
and the problem is solved.
Example 4.8 Let the matrix
A =


−1 −3 −4
8 12 14
−4 −5 −5

 , (4.786)
NUMERICAL EXAMPLES 199
for which we wish to determine the eigenvalues and eigenvectors.
We begin solving with Krylov’s method. To do this, we consider the vector
y(0)
= 1 0 1
T
(4.787)
and calculate successively
y(1)
= Ay(0)
=


−1 −3 −4
8 12 14
−4 −5 −5




1
0
1

 =


−5
22
−9

 , (4.788)
y(2)
= Ay(1)
=


−1 −3 −4
8 12 14
−4 −5 −5




−5
22
−9

 =


−25
98
−45

 , (4.789)
y(3)
= Ay(2)
=


−1 −3 −4
8 12 14
−4 −5 −5




−5
22
−9

 =


−89
346
−165

 . (4.790)
It results in the linear system


−25 −5 1
98 22 0
−45 −9 1




q1
q2
q3

 = −


−89
346
−165

 , (4.791)
with the solution
q1 = −6, q2 = 11, q3 = −6 (4.792)
and the characteristic polynomial
P (λ) = λ3
− 6λ2
+ 11λ − 6. (4.793)
The eigenvalues of the matrix A result from the equation P (λ) = 0 and are
λ1 = 3, λ2 = 2, λ3 = 1. (4.794)
The polynomials φi(λ), i = 1, 3, are obtained by dividing P (λ) by λ − λi; we have
φ1(x) = λ2
− 3λ + 2, φ2(λ) = λ2
− 4λ + 3, φ3(λ) = λ2
− 5λ + 6. (4.795)
The eigenvectors are
ciφi(λi)xi = y(2)
+ q1iy(1)
+ q21y(0)
, i = 1, 3, (4.796)
where
φ1(λ1) = 2, φ2(λ2) = −1, φ3(λ3) = 2. (4.797)
It follows that
2c1x1 =


−25
98
−45

 − 3


−5
22
−9

 + 2


1
0
1

 =


−8
32
−16

 , (4.798)
200 LINEAR ALGEBRA
−c2x2 =


−25
98
−45

 − 4


−5
22
−9

 + 3


1
0
1

 =


−2
10
−6

 , (4.799)
2c3x3 =


−25
98
−45

 − 5


−5
22
−9

 + 6


1
0
1

 =


6
−12
6

 . (4.800)
To apply the Danilevski method, we must obtain the Frobenius form of the matrix A.
We multiply the matrix A on the left by the matrix
M1 =





1 0 0
−
4
5
−
1
5
−1
0 0 1





, (4.801)
the inverse of which is
M−1
1 =


1 0 0
−4 −5 −5
0 0 1

 , (4.802)
and obtain
A2 = M−1
1 AM1 =







7
5
3
5
−1
12
5
23
5
−6
0 1 0







. (4.803)
We now multiply the matrix A2 on the left by the matrix
M2 =





5
12
−
23
12
5
2
0 1 0
0 0 1





, (4.804)
the inverse of which is
M−1
2 =





12
5
23
12
−6
0 1 0
0 0 1





, (4.805)
obtaining
A3 = M−1
2 A2M2 =


6 −11 6
1 0 0
0 1 0

 . (4.806)
The matrix A3 is just the required Frobenius form. The characteristic polynomial is
−λ3
+ 6λ2
− 11λ + 6 = 0 (4.807)
and has the roots given by equation (4.720).
NUMERICAL EXAMPLES 201
We obtain the eigenvectors of the Frobenius matrix in the form
yi = λ2
i λi 1
T
, i = 1, 3, (4.808)
that is
y1 = 9 3 1
T
, y2 = 4 2 1
T
, y3 = 1 1 1
T
. (4.809)
The eigenvectors of the matrix A are
xi = M1M2yi, i = 1, 3, (4.810)
and it follows successively that
M1M2 =







5
12
−
23
12
5
2
−
1
3
4
3
−3
0 0 1







, (4.811)
x1 =







5
12
−
23
12
5
2
−
1
3
4
3
−3
0 0 1











9
3
1



 =





1
2
−2
1





, (4.812)
x2 =







5
12
−
23
12
5
2
−
1
3
4
3
−3
0 0 1









4
2
1

 =







1
3
−
5
3
1







, (4.813)
x3 =







5
12
−
23
12
5
2
−
1
3
4
3
−3
0 0 1









1
1
1

 =


1
−2
1

 . (4.814)
The maximum eigenvalue in modulus and the corresponding eigenvector may be determined by
means of the direct power method.
To do this, we use the vector y(0)
defined by relation (4.713) and calculate successively
y(1)
= y(0)
= 1 0 1
T
, (4.815)
y(2)
= Ay(1)
=


−1 −3 −4
8 12 14
−4 −5 −5




1
0
1

 =


−5
22
−9

 , (4.816)
y(3)
= Ay(2)
=


−1 −3 −4
8 12 14
−4 −5 −5




−5
22
−9

 =


−25
98
−45

 , (4.817)
202 LINEAR ALGEBRA
y(4)
= Ay(3)
=


−1 −3 −4
8 12 14
−4 −5 −5




−25
98
−45

 =


−89
346
−165

 , (4.818)
y(5)
= Ay(4)
=


−1 −3 −4
8 12 14
−4 −5 −5




−89
346
−165

 =


−289
1130
−549

 , (4.819)
y(6)
= Ay(5)
=


−1 −3 −4
8 12 14
−4 −5 −5




−289
1130
−549

 =


−905
3562
−1749

 , (4.820)
y(7)
= Ay(6)
=


−1 −3 −4
8 12 14
−4 −5 −5




−905
3562
−1749

 =


−2785
11018
−5445

 , (4.821)
y(8)
= Ay(7)
=


−1 −3 −4
8 12 14
−4 −5 −5




−2785
11018
−5445

 =


−8489
33706
−16725

 , (4.822)
y(9)
= Ay(8)
=


−1 −3 −4
8 12 14
−4 −5 −5




−8489
33706
−16725

 =


−25729
102410
−50949

 , (4.823)
y(10)
= Ay(9)
=


−1 −3 −4
8 12 14
−4 −5 −5




−8489
33706
−16725

 =


−25729
102410
−50949

 . (4.824)
It follows that
λ1 ≈
y(10)
1
y(9)
1
= 3.02, λ1 ≈
y(0)
2
y(9)
2
= 3.025, λ1 ≈
y(10)
3
y(9)
3
= 3.029. (4.825)
The eigenvector is y(10)
on normalization gives
y(10)
= −0.219 0.873 −0.435
T
. (4.826)
The eigenvalue λ3 = 1 may be obtained by using the inverse power method.
We have
A−1
=
1
6


10 5 6
−16 −11 −18
8 7 12

 (4.827)
and, using the same vector y(0)
given by equation (4.787), we have
y(1)
= A−1
y(0)
=
1
6


10 5 6
−16 −11 −18
8 7 12




1
0
1

 =
1
6


16
−34
20

 , (4.828)
y(2)
= A−1
y(1)
=
1
36


10 5 6
−16 −11 −18
8 7 12




16
−34
20

 =
1
36


110
−242
110

 , (4.829)
NUMERICAL EXAMPLES 203
y(3)
= A−1
y(2)
=
1
63


10 5 6
−16 −11 −18
8 7 12




110
−242
110

 =
1
63


550
−1078
506

 , (4.830)
y(4)
= A−1
y(3)
=
1
64


10 5 6
−16 −11 −18
8 7 12




550
−1078
506

 =
1
64


3146
6050
2926

 , (4.831)
y(5)
= A−1
y(4)
=
1
65


10 5 6
−16 −11 −18
8 7 12




3146
6050
2926

 =
1
65


18766
−36454
17930

 , (4.832)
y(6)
= A−1
y(5)
=
1
66


10 5 6
−16 −11 −18
8 7 12




18766
−36454
17930

 =
1
66


112970
−222002
110110

 , (4.833)
y(7)
= A−1
y(6)
=
1
67


10 5 6
−16 −11 −18
8 7 12




112970
−222002
110110

 =
1
67


680350
−1347478
671066

 , (4.834)
y(8)
= A−1
y(7)
=
1
68


10 5 6
−16 −11 −18
8 7 12




680350
−1347478
671066

 =
1
68


4092506
−8142530
4063246

 , (4.835)
y(9)
= A−1
y(8)
=
1
69


10 5 6
−16 −11 −18
8 7 12




4092506
−8142530
4063246

 =
1
69


245914886
−49050694
24501290

 , (4.836)
y(10)
= A−1
y(9)
=
1
610


10 5 6
−16 −11 −18
8 7 12




245914886
−49050694
24501290

=
1
610


147673130
−294935762
147395710

. (4.837)
It follows that
λ3 ≈
y(10)
1
y(9)
1
= 1.0008, λ3 ≈
y(10)
2
y(9)
2
= 1.0021, λ3 ≈
y(10)
3
y(9)
3
= 1.0026, (4.838)
and we obtain the eigenvector y(10)
or, when normalized,
y(10)
= 0.4088 −0.8164 0.4080
T
. (4.839)
The eigenvalue λ2 may be found by means of the displacement method. To do this, we consider
the matrix
B = A − 1.9I3 =


−2.9 −3 −4
8 10.1 14
−4 −5 −6.9

 , (4.840)
the inverse of which is
B−1
=


−3.131313 7.070707 16.161616
8.080808 −40.505051 −86.868687
−4.040404 25.252525 53.434343

 . (4.841)
204 LINEAR ALGEBRA
We successively calculate
B−2
=


1.64269 99.58167 198.75522
−1.63249 −495.85751 −992.55170
0.81624 297.92876 596.27586

 , (4.842)
B−4
=


2.364 10000.159 19999.477
−3.360 −49997.593 −99994.870
1.679 29998.797 59997.436

 , (4.843)
B−8
=


−15.8 100000011.4 200000020.9
92.6 −499999600.9 −1000000199
−56.3 3000000505 600000099.5

 . (4.844)
It follows that for B−8
, the eigenvalue
λ ≈
8
Tr(B−8) = 10.0; (4.845)
hence, the matrix B has the eigenvalue
λ =
1
λ
= 0.1. (4.846)
We deduce from equation (4.840) that the matrix A has the eigenvalue
λ2 = λ + 1.9 = 2.0. (4.847)
The eigenvalues of the matrix A may be determined by the Leverier method too. We calculate
A =


−1 −3 −4
8 12 14
−4 −5 −5

 , S1 = Tr(A) = 6, (4.848)
A2
=


−7 −13 −18
32 50 66
−16 −23 −29

 , S2 = Tr(A2
) = 14, (4.849)
A3
=


−25 −45 −64
104 174 242
−52 −83 −113

 , S3 = Tr(A3
) = 36, (4.850)
the coefficients of the characteristic polynomial being given by
p1 = −S1 = −6, (4.851)
p2 = −
1
2
(S2 + p1S1) = 11, (4.852)
p3 = −
1
3
(S3 + p1S2 + p2S1) = −6. (4.853)
We obtain the characteristic equation
λ3
− 6λ2
+ 11λ − 6 = 0, (4.854)
whose roots are given by equation (4.794).
NUMERICAL EXAMPLES 205
Another method to determine the eigenvalues is the Left–Right one.
We write the matrix A in the form
A =


1 0 0
l21 1 0
l31 l32 1




r11 r12 r13
0 r22 r23
0 0 r33

 ; (4.855)
it results in the system
r1 = −1, r12 = −3, r13 = −4, l21r11 = 8, l2r12 + r22 = 12, l21r13 + r23 = 14,
l31r11 = −4, l31r12 + l32r22 = −5, l31r13 + l32r23 + r33 = −5, (4.856)
with the solution
r11 = −1, r12 = −3, r13 = −4, l21 = −8, r22 = −12, r23 = −18, l31 = 4,
l32 = −
7
12
, r33 =
1
2
, (4.857)
hence the matrices
L1 =


1 0 0
−8 1 0
4 − 7
12 1

 , R1 =


−1 −3 −4
0 −12 −18
0 0 1
2

 , (4.858)
A2 = R1L1 =


−1 −3 −4
0 −12 −18
0 0 1
2




1 0 0
−8 1 0
4 − 7
12 1

 =




7 −2
3 −4
24 −3
2 −18
2 − 7
24
1
2



 . (4.859)
The procedure can continue, the data obtained being given in Table 4.1.
This results in the following eigenvalues
λ1 ≈ 3.0002, λ2 ≈ 1.9888, λ3 ≈ 1.0056. (4.860)
Example 4.9 Let the linear system be
2x1 + 3x2 + x3 + 3x4 = 9, x1 − 2x2 − x3 + 5x4 = 3, 3x1 + 6x2 + x3 − 2x4 = 8,
−2x1 − x2 + 6x3 + 4x4 = 7, x1 + 2x2 + 5x3 − 7x4 = 1. (4.861)
We wish to determine the solution of this system in the sense of the least squares method. We
have
A =






2 3 1 3
1 −2 −1 5
3 6 1 −2
−2 −1 6 4
1 2 5 −7






. (4.862)
We shall first determine the rank of the matrix A.
We commute rows 1 and 2 with each other,
A ∼






1 −2 −1 5
2 3 1 3
3 6 1 −2
−2 −1 6 4
1 2 5 −7






; (4.863)
206 LINEAR ALGEBRA
TABLE 4.1 Determination of the Eigenvalues by the L–R Method
Step A L R
1


−1 −3 −4
8 12 14
−4 −5 −5




1 0 0
−8 11 0
4 0.583333 1




−1 −3 −4
0 −12 −18
0 0 0.5


2


7 −0.6667 −4
24 −1.5 −18
2 0.2917 0.5




1 0 0
3.4286 1 0
0.2857 −0.1288 1




7 −0.6667 −4
0 0.7857 −4.2857
0 0 1.0909


3


3.5714 −0.1515 −4
1.4694 1.3377 −4.2857
0.3117 −0.1405 1.0909




1 0 0
0.4114 1 0
0.0873 −0.0909 1




3.5714 −0.1515 −4
0 1.4 −2.64
0 0 1.2


4


3.16 0.2121 −4
0.3456 1.64 −2.64
0.1047 −0.1091 1.2




1 0 0
0.1094 1 0
0.0331 −0.0718 1




3.16 0.2121 −4
0 1.16168 −2.2025
0 0 1.1744


5


3.0506 0.4994 −4
0.1038 1.7750 −2.2025
0.0389 −0.0843 1.1744




1 0 0
0.0348 1 0
0.0128 −0.0516 1




3.0506 0.4994 −4
0 1.7580 −2.0664
0 0 1.1188


6


3.0166 0.7058 −4
0.0335 1.8646 −2.0664
0.0143 −0.0577 1.1188




1 0 0
0.0111 1 0
0.0047 −0.0329 1




3.0166 0.7058 −4
0 1.8568 −2.0220
0 0 1.0712


7


3.0055 0.8374 −4
0.0110 1.9233 −2.0220
0.0051 −0.0352 1.0712




1 0 0
0.0037 1 0
0.0017 −0.0191 1




3.0055 0.8374 −4
0 1.9202 −2.0073
0 0 1.0396


8


3.0018 0.9137 −4
0.0037 1.9585 −2.0073
0.0018 −0.0198 1.0396




1 0 0
0.0012 1 0
0.0006 −0.0104 1




3.0018 0.9137 −4
0 1.9574 −2.0024
0 0 1.0211


9


3.0006 0.9553 −4
0.0012 1.9783 −2.0024
0.0006 −0.0106 1.0211




1 0 0
0.0004 1 0
0.0002 −0.0055 1




3.0006 0.9553 −4
0 1.9779 −2.0008
0 0 1.0110


10


3.0002 0.9772 −4
0.0004 1.9888 −2.0008
0.0002 −0.0055 1.0110




1 0 0
0.0001 1 0
0.0001 −0.0028 1




3.0002 0.9772 −4
0 1.9887 −2.0003
0 0 1.0056


then we multiply row 1 by −2, −3, 2 and −1, and add it to rows 2, 3, 4, 5, respectively, obtaining
A ∼






1 −2 −1 5
0 7 3 −7
0 12 4 −17
0 −5 4 −6
0 4 6 −12






. (4.864)
We multiply column 1 by 2, 1, −5 and add this to columns 2, 3, 4, respectively, to get
A ∼






1 0 0 0
0 7 3 −7
0 12 4 −17
0 −5 4 −6
0 4 6 −12






; (4.865)
NUMERICAL EXAMPLES 207
We also multiply row 2 by −12/7, 5/7 and −4/7, and add this to rows 3, 4, 5, respectively
A ∼













1 0 0 0
0 7 3 −7
0 0 −
8
7
−5
0 0
43
7
−11
0 0
30
7
−8













. (4.866)
We now multiply column 2 by −3/7 and 1, and add this to columns 3, 4, respectively
A ∼












1 0 0 0
0 7 0 0
0 0 −
8
7
−5
0 0
43
7
−11
0 0
30
7
−8












. (4.867)
We now multiply row 3 by 43/8 and 30/8, and add this to rows 4, 5, respectively
A ∼













1 0 0 0
0 7 0 0
0 0 −
8
7
−5
0 0 0 −
303
7
0 0 0 −
107
4













. (4.868)
Finally, we multiply row 4 by −749/1212 and add this to row 5
A ∼












1 0 0 0
0 7 0 0
0 0 −
8
7
−5
0 0 0 −
303
7
0 0 0 0












. (4.869)
It follows that
rank(A) = 4, (4.870)
so that we must solve the linear system
AT
AxLS = AT
b, (4.871)
208 LINEAR ALGEBRA
that is




2 1 3 −2 1
3 −2 6 −1 2
1 −1 1 6 5
3 5 −2 4 −7










2 3 1 3
1 −2 −1 5
3 6 1 −2
−2 −1 6 4
1 2 5 −7










x1
x2
x3
x4



 =




2 1 3 −2 1
3 −2 6 −1 2
1 −1 1 6 5
3 5 −2 4 −7










9
3
8
7
1






(4.872)
or, equivalently, 



19 26 −3 −10
26 54 15 −31
−3 15 64 −15
−10 −31 −15 103








x1
x2
x3
x4



 =




32
64
61
47



 . (4.873)
The solution of this system is
xLS = x1 x2 x3 x4
T
= 1 1 1 1
T
. (4.874)
Example 4.10 Let us again take the matrix A of Example 4.7 for which we have found
U =
0.7497 −0.6618
0.6618 0.7497
, V =
0.2567 −0.9665
0.9665 0.2567
, (4.875)
A =
1 2
0 2
, (4.876)
Σ = UT
AV =
2.92 0
0 0.68
. (4.877)
Its pseudo-inverse (in fact, it is just the inverse) is given by
A+
= VΣ+
UT
=
0.2567 −0.9665
0.9665 0.2567
1
2.92 0
0 1
0.68
0.7497 0.6618
−0.6618 0.7497
=
1 −1
0 0.5
. (4.878)
Example 4.11 Let the underdetermined linear system be
2x1 + 3x2 + x3 = 6, x1 + 4x2 + 3x3 = 8. (4.879)
The matrix A has the expression
A =
2 3 1
1 4 3
, AT
=


2 1
3 4
1 3

 . (4.880)
We find now the QR decomposition of the matrix AT
.
We have
x1 = 2 3 1
T
, x1 2 =
√
14 = λ1 (4.881)
and choose
v1 = x1 + λ1e1 = 2 +
√
14 3 1
T
. (4.882)
NUMERICAL EXAMPLES 209
Then
v1vT
1 =


18 + 4
√
14 6 + 3
√
14 2 +
√
14
6 + 3
√
14 9 3
2 +
√
14 3 1

 , (4.883)
vT
1 v1 = 28 + 4
√
14, (4.884)
2
v1vT
1
vT
1 v1
=


1.53452 0.80178 0.26726
0.80178 0.41893 0.13964
0.26726 0.13964 0.04655

 , (4.885)
H1 =


−0.53452 −0.80178 −0.26726
−0.80178 0.58107 −0.13964
−0.26726 −0.13964 0.95345

 , (4.886)
H1AT
=


−0.53452 −0.80178 −0.26726
−0.80178 0.58107 −0.13964
−0.26726 −0.13964 0.95345




2 1
3 4
1 3

 =


−3.74164 −4.54342
0 1.10358
0 2.03453

 . (4.887)
The next vector is
x2 = 1.10358 2.03453
T
, (4.888)
for which
x2 2 = 2.31456. (4.889)
We choose
v2 = x2 + x2 2e2 = 3.418174 2.03453
T
, (4.890)
for which
v2vT
2 =
11, 68368 6, 95431
6, 95431 4, 13931
, (4.891)
vT
2 v2 = 15, 82299, (4.892)
2
v2vT
2
vT
2 v2
=
1, 47680 0, 87901
0, 87901 0, 52320
, (4.893)
H2 =


1 0 0
0 −0.47680 −0.87901
0 −0.87901 0.47680

 , (4.894)
H2H1AT
=


−3.74164 −4.54342
0 −2.31456
0 0

 = R, (4.895)
Q = H1H2 =


−0.53452 0.61721 0.57735
−0.80178 −0.15431 −0.57735
−0.26726 −0.77151 0.57735

 . (4.896)
It results in the system
−3.74164 0 0
−4.54342 −2.31456 0


z1
z2
z3

 =
6
8
, (4.897)
210 LINEAR ALGEBRA
with the solution
z1 = −1.60357, z2 = −0.30861, z3 = 0. (4.898)
The vector x is given by the system


0.53452 −0.80178 −0.26726
0.61721 −0.15431 −0.77151
0.57735 −0.57735 0.57735




x1
x2
x3

 =


−1.60357
−0.30861
0

 (4.899)
and it follows that
x1 = 0.66667, x2 = 1.33334, x3 = 0.66667, x 2 = 1.633. (4.900)
If we consider z3 = 0.57785, then we obtain x1 = 1, x2 = 1, x3 = 1, x 2 =
√
3 = 1.73205 >
x 2.
Example 4.12 Let
A =
1 2
2 2
, (4.901)
be the matrix for which we wish to determine the eigenvalues and eigenvectors by means of the
rotation method.
To do this, we construct the matrix R1 given by
R1 =
cos α − sin α
sin α cos α
, R−1
1 = RT
1 , (4.902)
where
tan 2α =
2a12
a11 − a22
= −4. (4.903)
It follows that
α = −0.66291, (4.904)
R1 =
0.78821 0.61541
−0.61541 0.78821
(4.905)
and the new matrix
A2 = R−1
1 AR1 =
−0.56156 0
0 3.56157
. (4.906)
We observe that the matrix A2 is a diagonal one, the eigenvalues of the matrix A being given
by
λ1 ≈ 0.56156, λ2 = 3.56157, (4.907)
while the eigenvectors read
v1 = 0.78821 −0.61541
T
, v2 = 0.61541 0.78821
T
. (4.908)
The exact eigenvalues of the matrix A are
λ1 =
3 −
√
17
2
= −0.56155, λ2 =
3 +
√
17
2
= 3.56155. (4.909)
APPLICATIONS 211
4.13 APPLICATIONS
Problem 4.1
Let us show that the motion of the system in Figure 4.1 is stable if the force F is given by
¨F = −40˙x − 25x, (4.910)
the constants of the system being m = 4 kg, c = 20 N s m−1
, k = 41 N m−1
.
x
c
k
F
Figure 4.1 Problem 4.1.
Solution: Differentiating twice the differential equation of motion
m¨x + c ˙x + kx = F (4.911)
with respect to time and taking into account the numerical values, we obtain
4
····
x + 20
···
x + 41
··
x + 40
·
x + 25x = 0; (4.912)
the characteristic equation is
b0r4
+ b1r3
+ b2r2
+ b3r + b4 = 0, (4.913)
where b0 = 4, b1 = 20, b2 = 41, b3 = 40, b4 = 25.
The motion is asymptotically stable if the solutions of equation (4.913) are either strictly negative
or complex with a strict negative real part. To this end, the conditions of the Routh–Hurwitz criterion
must be fulfilled, that is,
bi > 0, i = 1, 4, det A1 > 0, det A2 > 0, (4.914)
where
A1 =
b1 b0
b3 b2
, A2 =


b1 b0 0
b3 b2 b1
0 b4 b3

 (4.915)
or, equivalent,
A1 =
20 4
40 41
, A2 =


20 4 0
40 41 20
0 25 40

 . (4.916)
In case of the numerical application, we obtain the values det A1 = 660, det A2 = 16,400, con-
ditions (4.914) are fulfilled and, as a consequence, the motion is asymptotically stable.
212 LINEAR ALGEBRA
Moreover, the roots of equation (4.913) are r1,2 = −2 ± i, r3,4 = −1/2 ± i, obtaining a solution
obviously asymptotically stable:
x = C1e−2t
cos(t + φ1) + C2e− t
2 cos(t + φ2), (4.917)
where C1, C2, φ1, φ2 are integration constants that may be determined by the initial conditions.
Problem 4.2
We consider a rigid solid acted upon by five forces of intensities Fi, i = 1, 5, the supports of which
being the straight lines of equations
bix − aiy = 0, z − zi = 0, a2
i + b2
i = 1, i = 1, 5. (4.918)
If we show that if the rank of the matrix
A =




a1 a2 a3 a4 a5
b1 b2 b3 b4 b5
a1z1 a2z2 a3z3 a4z4 a5z5
b1z1 b2z2 b3z3 b4z4 b5z5



 (4.919)
is equal to four, then we may determine the intensities Fi, i = 1, 5, so that the solid is in equilibrium.
Solution: The equations of equilibrium
5
i=1
Fiai = 0,
5
i=1
Fibi = 0,
5
i=1
Fiaizi = 0,
5
i=1
Fibizi = 0 (4.920)
form a system of homogeneous algebraic equations, which admits solutions if rankA = 4. Because
the determinant
a1 a2 a3 a4
b1 b2 b3 b4
a1z1 a2z2 a3z3 a4z4
b1z1 b2z2 b3z3 b4z4
=
1 0
√
2
2
3
5
0 1
√
2
2
4
5
0 0
√
2
2
−
3
10
0
1
2
√
2
2
−
2
5
=
3
√
2
40
= 0, (4.921)
system (4.920) admits the solution
F1
a2 a3 a4 a5
b2 b3 b4 b5
a2z2 a3z3 a4z4 a5z5
b2z2 b3z3 b4z4 b5z5
=
F2
a3 a4 a5 a1
b3 b4 b5 b1
a3z3 a4z4 a5z5 a1z1
b3z3 b4z4 b5z5 b1z1
=
F3
a4 a5 a1 a2
b4 b5 b1 b2
a4z4 a5z5 a1z1 a2z2
b4z4 b5z5 b1z1 b2z2
=
F4
a5 a1 a2 a3
b5 b1 b2 b3
a5z5 a1z1 a2z2 a3z3
b5z5 b1z1 b2z2 b3z3
=
F5
a1 a2 a3 a4
b1 b2 b3 b4
a1z1 a2z2 a3z3 a4z4
b1z1 b2z2 b3z3 b4z4
(4.922)
APPLICATIONS 213
or, equivalent,
F1
0
√
2
2
3
5
−
3
5
1
√
2
2
4
5
4
5
0
√
2
2
−
3
10
3
5
1
2
√
2
2
−
2
5
−
4
5
=
F2
√
2
2
3
5
−
3
5
1
√
2
2
4
5
4
5
0
√
2
2
−
3
10
3
5
0
√
2
2
−
2
5
−
4
5
0
=
F3
3
5
−
3
5
1 0
4
5
4
5
0 1
−
3
10
3
5
0 0
−
2
5
−
4
5
0
1
2
=
F4
−
3
5
1 0
√
2
2
4
5
0 1
√
2
2
3
5
0 0
√
2
2
−
4
5
0
1
2
√
2
2
=
F5
1 0
√
2
2
3
5
0 1
√
2
2
4
5
0 0
√
2
2
−
3
10
0
1
2
√
2
2
−
2
5
, (4.923)
from which
F1
−
213
√
2
200
=
F2
−
19
√
2
25
=
F3
21
25
=
F4
19
√
2
20
=
F5
−
13
√
2
40
. (4.924)
Denoting now by λ (arbitrary real number) the common value of the ratios in relations (4.924), we
obtain the forces
F1 = −
213
√
2
200
λ, F2 = −
19
√
2
25
λ, F3 =
21
25
λ, F4 =
19
√
2
20
λ, F5 = −
13
√
2
40
λ. (4.925)
Problem 4.3
Let us consider a rigid solid (Fig. 4.2), suspended by n = 6 bars with spherical hinges.
Let ui be the unit vectors in the directions AiA0i, ri the position vectors of the points Ai, the
vectors mi = ri · ui, (ai, bi, ci), (di, ei, fi) the projections of the vectors ui, mi, i = 1, 6, on the
axes of the OXYZ-trihedron, while A is the matrix defined by
A =








a1 a2 a3 a4 a5 a6
b1 b2 b3 b4 b5 b6
c1 c2 c3 c4 c5 c6
d1 d2 d3 d4 d5 d6
e1 e2 e3 e4 e5 e6
f1 f2 f3 f4 f5 f6








. (4.926)
Let us show that if rankA = 6, then the equilibrium of the rigid solid is a statically determined
(isostatic) problem, hence the efforts Ni in the bars AiA0i, i = 1, 6, may be determined for any
system of forces (F, MO) that acts upon the rigid solid.
214 LINEAR ALGEBRA
Z
Mo
F
Y
O
Ai
ui
A0i
X
Figure 4.2 Problem 4.3.
As a numerical application, we consider the cube in Figure 4.3 of side l = 2 m, acted upon by
the force of components FX = 2000 N, FY = 2000 N, FZ = 4000 N, by the moment of projections
MOX = 3000 Nm, MOY = 1000 N m, MOZ = 2000 N m, the bars A1A01, A5A05 being parallel to
the OX-axis, the bars A2A02, A6A06 being parallel to the OY -axis, while the bars A3A03, A4A04
are parallel to the OZ-axis.
Solution: By means of the vectors ui, mi, i = 1, 6, the equations of equilibrium
6
i=1
Ni + F = 0,
6
i=1
ri · Ni + MO = 0 (4.927)
are obtained in the form
6
i=1
Niui + F = 0,
6
i=1
Nimi + MO = 0. (4.928)
A03
X
Z
O Y
F
A3
A05
A5
A02
A06
A01
A4
A2
A6A1
A04
Mo
u1
Figure 4.3 Numerical application.
APPLICATIONS 215
If we denote by (FX, FY , FZ), (MOX , MOY , MOZ ) the projections of the vectors F, MO on the
axes OX, OY , OZ and by {F}, {N} the column matrices
{F} = FX FY FZ MOX MOY MOZ
T
,
{N} = N1 N2 N3 N4 N5 N6
T
,
(4.929)
then system (4.928) leads to the matrix equation
A{N} + {F} = {0}, (4.930)
which has a solution if rankA = 6 and the problem is isostatic.
Observation 4.42 If the number of bars n > 6, then equation (4.930) may have as well a solution
if rankA = 6. In this case, the problem is statically undetermined (hyperstatic); the determination
of the reactions Ni, i = 1, n, is possible by taking into account the elastic equilibrium equations.
In the numerical case, it follows that
A =








1 0 0 0 1 0
0 1 0 0 0 1
0 0 1 1 0 0
0 0 0 2 0 0
0 0 0 0 2 0
0 0 0 0 0 2








, (4.931)
det A = 8 = 0, rankA = 6; (4.932)
and because
{F} = 1000 2 2 4 3 1 2
T
, (4.933)
we obtain the matrix equation








1 0 0 0 1 0
0 1 0 0 0 1
0 0 1 1 0 0
0 0 0 2 0 0
0 0 0 0 2 0
0 0 0 0 0 2
















N1
N2
N3
N4
N5
N6








=








−2000
−2000
−4000
−3000
−1000
−2000








, (4.934)
from which the values
N1 = −1500 N, N2 = −1000 N, N3 = −2500 N,
N4 = −1500 N, N5 = −500 N, N6 = −1000 N.
(4.935)
Problem 4.4
A homogeneous straight bar AB, of constant cross section, of mass m and length 2l is moving,
under the action of its own weight, in the vertical plane OXY (Fig. 4.4), with the end A on the
hyperbola of equation
F(X, Y) = (X − 2l)Y − 8l2
= 0. (4.936)
216 LINEAR ALGEBRA
O
Y
X
B0(l,8l)
B
A0(3l,8l)
A(XA,YA)
mg
θ
Figure 4.4 Problem 4.4.
Knowing that at the initial moment the bar is in rest and parallel to the OX-axis, the end A being
of coordinates (3l, 8l), determine the reaction NA at this moment, the acceleration of the gravity
center C, as well as the angular acceleration.
Numerical application for m = 3 kg, l = 1 m.
Solution: Denoting by (X, Y) the coordinates of the center of gravity C and by θ the angle made
by the bar with the OX-axis, we may write the relations
XA = X + l cos θ, YA = Y + l sin θ, ˙XA = ˙X − l˙θ sin θ, ˙YA = ˙Y + l˙θ cos θ,
¨XA = ¨X − l¨θ sin θ − l˙θ2
cos θ, ¨YA = ¨Y + l¨θ cos θ − l˙θ2
sin θ,
(4.937)
from which, for the initial moment (X = 3l, Y = 8l, θ = 0, ˙θ = 0, ˙X = 0, ˙Y = 0), we obtain
˙XA = 0, ˙YA = 0, ¨XA = ¨X, ¨YA = ¨Y + l¨θ. (4.938)
By successive differentiation of equation (4.936) with regard to time, we get the relations
˙XAYA + (XA − 2l) ˙YA = 0, ¨XAYA + (XA − 2l) ¨YA + 2 ˙XA
˙YA = 0. (4.939)
Taking into account the relations at the initial moment, it follows that
8l ¨X + l ¨Y + l2 ¨θ = 0. (4.940)
The reaction NA has the components
NAX = λ
∂F
∂X X=XA
Y=YA
, NAY = λ
∂F
∂Y X=XA
Y=YA
(4.941)
or
NAX = λYA, NAY = λ(XA − 2l), (4.942)
which, at the initial moment, become
NAX = 8λl, NAY = λl. (4.943)
APPLICATIONS 217
Under these conditions, the theorem of momentum leads to the equations
m ¨X = 8λl, m ¨Y = −mg + λl, (4.944)
while the theorem of moment of momentum with respect to the point C allows to write
ml2
3
¨θ = λl2
. (4.945)
Using the notation
A =







m 0 0 −8l
0 m 0 −l
0 0
ml2
3
−l2
8l l l2
0







, (4.946)
equation (4.944) and equation (4.945), equation (4.940) can be written in the matrix form
A




¨X
¨Y
¨θ
λ



 =




0
−mg
0
0



 , (4.947)
from which, by inverting the matrix A, the matrix




¨X
¨Y
¨θ
λ



 = −mgA−1




0
1
0
0



 (4.948)
is obtained.
For the numerical application, we have
A =




3 0 0 −8
0 3 0 −1
0 0 1 −1
8 1 1 0



 , A−1
=




0.019608 −0.039216 −0.117647 0.117647
−0.039216 0.328431 −0.014706 0.014706
−0.117647 −0.014706 0.955882 0.044118
−0.117647 −0.014706 −0.044118 0.044118



, (4.949)




¨X
¨Y
¨θ
λ



 =




1.153715
−9.662276
0.432643
0.432643



 , (4.950)
NAX = 3.461144 N, NAY = 0.432643 N. (4.951)
Problem 4.5
We consider a system of two homogeneous straight bars, of constant cross sections, lengths 2l1 and
2l2 and masses m1 and m2 (Fig. 4.5), respectively, acted upon by their own weights m1g and m2g
(a double pendulum). The fixed reference system is OXY , the OX-axis being vertical.
Taking as generalized coordinates the coordinates (X1, Y1), (X2, Y2) of the center of gravity C1
and C2, respectively, as well as the angles θ1 and θ2 made by the bars with the OX-axis, it is
required
218 LINEAR ALGEBRA
X
Y
m1g
m2g
θ1
O
C1(X1,Y1)
C2(X2,Y2)
θ2
Figure 4.5 Problem 4.5.
(a) to write the differential equation of motion, using the multibody method;
(b) the dimensions l1 = l2 = 0.5 m and the initial conditions at t = 0: X1 = l1, Y1 = 0, θ1 = 0,
X2 = 2l1, Y2 = l2, θ2 = π/2, ˙X1 = ˙X2 = ˙Y1 = ˙Y2 = 0, ˙θ1 = ˙θ2 = 0 being given, determine
the accelerations ¨X1, ¨Y1, ¨θ1, ¨X2, ¨Y2, ¨θ2 and the reactions, by inverting the matrix in two
cases, that is, m1 = m2 = 3 kg and m1 = 0, m2 = 3 kg.
Solution: Differentiating the constraints functions with respect to time,
X1 − l1cosθ1 = 0, Y1 − l1sinθ1 = 0,
−X1 − l1cosθ1 + X2 − l2cosθ2 = 0,
−Y1 − l1sinθ1 + Y2 − l2sinθ2 = 0, (4.952)
and using the notations
[B] =




1 0 l1 sin θ1 0 0 0
0 1 −l1 cos θ1 0 0 0
−1 0 l1 sin θ1 1 0 l2 sin θ2
0 −1 −l1 cos θ1 0 1 −l2 cos θ2



 , (4.953)
{q} = X1 Y1 θ1 X2 Y2 θ2
T
, (4.954)
[B] being the constraints matrix and {q} the column matrix of the generalized coordinates, we obtain
the relation
[B]{˙q} = {0}. (4.955)
We apply Lagrange’s equations
d
dt
∂T
∂ ˙qk
−
∂T
∂qk
+
∂V
∂qk
=
4
i=1
Bik λi, k = 1, 6, (4.956)
where λi, i = 1, 4, are Lagrange’s multipliers, while the kinetic energy T and the potential energy
V are given by the relations
T =
1
2
2
i=1
mi
˙X2
i + ˙Y2
i +
mil2
i
3
˙θ2
i , (4.957)
V = −g
2
i=1
miXi, (4.958)
respectively.
APPLICATIONS 219
Using the notations
[M] =












m1 0 0 0 0 0
0 m1 0 0 0 0
0 0
m1l2
1
3
0 0 0
0 0 0 m2 0 0
0 0 0 0 m2 0
0 0 0 0 0
m2l2
2
3












, {F} =








m1g
0
0
m2g
0
0








, {λ} =




λ1
λ2
λ3
λ4



 , (4.959)
we obtain the matrix equation
[M]{¨q} = {F} + [B]T
{λ}. (4.960)
Relation (4.960) and relation (4.955), differentiated with respect to time, are expressed together
in the matrix equation of motion of the mechanical system
[M] −[B]T
[B] [0]
{¨q}
{λ}
=
{F}
−[ ˙B]{˙q}
. (4.961)
We obtain
{¨q}
{λ}
=
[M]−1
− [M]−1
[B]T
[[B][M]−1
[B]T
]−1
[B][M]−1
[M]−1
[B]T
[[B][M]−1
[B]T
]−1
−[[B][M]−1
[B]T
]−1
[B][M]−1
[[B][M]−1
[B]T
]−1
×
{F}
−[ ˙B]{˙q}
(4.962)
if the matrix [M] is invertible.
For the first numerical application, we obtain the values
[M] =








3 0 0 0 0 0
0 3 0 0 0 0
0 0 0.25 0 0 0
0 0 0 3 0 0
0 0 0 0 3 0
0 0 0 0 0 0.25








, [M]−1
=


















1
3
0 0 0 0 0
0
1
3
0 0 0 0
0 0 4 0 0 0
0 0 0
1
3
0 0
0 0 0 0
1
3
0
0 0 0 0 0 4


















, (4.963)
[B] =




1 0 0 0 0 0
0 1 −0.5 0 0 0
−1 0 0 1 0 0.5
0 −1 −0.5 0 1 0



 , [B]T
=








1 0 −1 0
0 1 0 −1
0 −0.5 0 −0.5
0 0 1 0
0 0 0 1
0 0 0.5 0








, (4.964)
{F} = 29.4195 0 0 29.4195 0 0
T
, (4.965)
[ ˙B]{˙q} = 0 0 0 0
T
, (4.966)
220 LINEAR ALGEBRA
{¨q}
{λ}
= ¨X1
¨Y1
¨θ1
¨X2
¨Y2
¨θ2 λ1 λ2 λ3 λ4
T
= 0 0 0 7.354875 0 −14.709750 −36.774375 0 −7.354875 0
T
(4.967)
for the initial moment, where λ1, λ2 are the reactions at the hinge O, while λ3, λ4 are the reactions
at the hinge O1.
For the second numerical application, the matrix [M] is not invertible, so that it is necessary to
proceed to the inversion of the total matrix
[A] =
[M] −[B]T
[B] [0]
. (4.968)
Hence, it follows that
{¨q}
{λ}
= ¨X1
¨Y1
¨θ1
¨X2
¨Y2
¨θ2 λ1 λ2 λ3 λ4
T
= 0 0 0 7.354875 0 −14.709750 −7.354875 0 −7.354875 0
T
. (4.969)
Problem 4.6
Consider a rigid solid, as illustrated in Figure 4.6, upon which a percussion P is applied at the point
A. We denote by
• Oxyz —the reference system rigidly connected to the solid;
• m—the mass;
• [JO]—the matrix of the moments of inertia
[JO] =


Jx −Jxy −Jxz
−Jxy Jy −Jyz
−Jxz −Jyz Jz

 ; (4.970)
• rC —the position vector of the center of gravity C;
• xC, yC, zC —the coordinates of the gravity center C;
• rA —the position vector of the point A;
• xA, yA, zA —the coordinates of the point A;
C (xC,yC,zC)
A (xA,yA,zA)
rA
rC
P
z
x
O y
uv0
Oω0
O
Figure 4.6 Problem 4.6.
APPLICATIONS 221
• u—the unit vector of the percussion P;
• a, b, c—the components of the unit vector u;
• d, e, f —the projections on the axes of the vector rA · u;
• P —the intensity of the percussion P;
• v0
O —the velocity of the point O before the application of the percussion;
• v0
Ox , v0
Oy , v0
Oz —the projections of the velocity v0
O on the axes;
• ω0
—the angular velocity of the rigid solid before the application of the percussion;
• ω0
x, ω0
y, ω0
z —the projections of the vector ω0
on the axes;
• vO —the velocity of the point O after application of the percussion;
• vOx , vOy , vOz —projections of the velocity vO on the axes;
• ω—the angular velocity after percussion;
• ωx, ωy, ωz —the projections of the vector ω on the axes;
• {v0
O}, {ω0
}, {vO}, {ω}—the column matrices defined by
{v0
O} = v0
Ox v0
Oy v0
Oz
T
, {ω0
} = ω0
x ω0
y ω0
z
T
,
{vO} = vOx vOy vOz
T
, {ω} = ωx ωy ωz
T
; (4.971)
• {u}, {mu}—the column matrices defined by
{u} = a b c
T
, {mu} = d e f
T
; (4.972)
• {V}, {V0}, {U}—the column matrices defined by
{V} = vOx vOy vOz ωx ωy ωz
T
,
{V0
} = v0
Ox v0
Oy v0
Oz ω0
x ω0
y ω0
z
T
, (4.973)
{U} = a b c d e f
T
;
• [m], [S], [M]—the matrices defined by
[m] =


m 0 0
0 m 0
0 0 m

 , [S] =


0 −mzC myC
mzC 0 −mxC
−myC mxC 0

 , [M] =
[m] [S]T
[S] [JO]
.
(4.974)
Determine the velocities vOx , vOy , vOz , ωx, ωy, ωz after the application of the percussion.
For the numerical application, we take m = 80, Jx = 2, Jxy = 0.8, Jxz = 0.4, Jy = 2, Jyz = 0.4,
Jz = 3.2, xC = 0.05, yC = 0.05, zC = 0.025, xA = 0.2, yA = 0.2, zA = 0.1, a = 2/3, b = 1/3, c =
2/3, v0
Ox = 10, v0
Oy = 8, v0
Oz = 7, ω0
x = 4, ω0
y = 3, ω0
z = 5, P = 100 (quantities given in SI).
Solution: The theorem of momentum for collisions, in matrix form, leads to
[m] vO − {v0
O} + [S]T
{{ω} − {ω0
}} = P {u}. (4.975)
Analogically, the theorem of moment of momentum for collisions about the point O, in matrix
form, reads
[S]{{vO} − {v0
O}} + [JO]{{ω} − {ω0
}} = P {mu}. (4.976)
222 LINEAR ALGEBRA
Equation (4.975) and equation (4.976) may be written together in a matrix form
[M]{{V} − {V0
}} = P {U}; (4.977)
inverting the matrix [M]
{V} = {V0
} + P [M]−1
{U}. (4.978)
For the numerical application, we obtain
rA = 0.2i + 0.2j + 0.1k, u =
2
3
i +
1
3
j +
2
3
k,
rA · u =
i j k
0.2 0.2 0.1
2
3
1
3
2
3
= 0.1i −
0.2
3
j −
0.2
3
k, (4.979)
[S] =


0 −2 4
2 0 −4
−4 4 0

 , (4.980)
{U} =
2
3
1
3
2
3
0.1 −
0.2
3
−
0.2
3
T
, {V0
} = 10 8 7 4 3 5
T
, (4.981)
[M] =








80 0 0 0 2 −4
0 80 0 −2 0 4
0 0 80 4 −4 0
0 −2 4 2 −0.8 −0.4
2 0 −4 −0.8 2 −0.4
−4 4 0 −0.4 −0.4 3.2








, (4.982)
[M]−1
=








−0.013620 −0.000854 −0.000532 −0.001260 −0.011898 0.016447
−0.000854 1.013620 −0.000532 0.011898 0.001260 −0.016447
−0.000532 −0.000532 −0.014628 −0.021277 0.021277 0
−0.001260 0.011898 −0.021277 0.673292 0.247760 0.098681
−0.011898 0.001260 0.021277 0.247760 0.673292 0.098681
0.016447 −0.016447 0 0.098681 0.098681 0.378289








,
(4.983)
from which
{V} = vOx vOy vOz ωx ωy ωz
T
= 8.985140 8.581827 5.616983 7.317427 0.998380 3.355253
T
.
(4.984)
Problem 4.7
The matrix of the moments of inertia of a rigid solid is
[J] =


Jxx −Jxy −Jxz
−Jxy Jyy −Jyz
−Jxz −Jyz Jzz

 =


2.178606 0.313753 −0.219693
0.313753 3.209143 0.553764
−0.219693 0.553764 3.612250

 ;
let us determine the principal moments of inertia Jx, Jy, Jz, as well as the principal directions.
APPLICATIONS 223
Solution:
1. Theory
The principal moments of inertia are just the eigenvalues of the matrix [J], which are given by
the third-degree equation
det[[J] − λ[I]] = 0, (4.985)
where [I] is the unit matrix of third order, hence
Jx = λ1, Jy = λ2, Jz = λ3. (4.986)
The principal directions ai, bi, ci, a2
i + b2
i + c2
i = 1, i = 1, 3, are given by the system
(Jxx − λi)ai − Jxy bi − Jxz ci = 0, −Jxy ai + (Jyy − λi)bi − Jyz ci = 0. (4.987)
Using the notations
1i =
−Jxy −Jxz
Jyy − λi −Jyz
, 2i =
−Jxz Jxx − λi
−Jyz −Jxy
, 3i =
Jxx − λi −Jxy
Jxy Jyy − λi
, (4.988)
we obtain the equalities
ai
1i
=
bi
2i
=
ci
3i
= µi; (4.989)
the condition a2
i + b2
i + c2
i = 1 leads to
µi =
1
2
1i + 2
2i + 2
3i
, (4.990)
so that the solution is
ai = µi 1i, bi = µi 2i, ci = µi 3i, i = 1, 3. (4.991)
2. Numerical calculation
Solving system (4.985), we obtain the eigenvalues
λ1 = 2, λ2 = 3, λ3 = 4, (4.992)
hence relations (4.988) lead to
11 = 0.439385, 21 = −0.167835, 31 = 0.117519, (4.993)
12 = 0.219692, 22 = 0.385929, 32 = −0.270230, (4.994)
13 = −0.000001, 23 = 0.939693, 33 = 1.342021, (4.995)
µ1 = 2.062672, a1 = 0.906308, b1 = −0.346188, c1 = 0.242404, (4.996)
µ2 = 1.923681, a2 = 0.422618, b2 = 0.742405, c2 = −0.519836, (4.997)
µ3 = 0.610387, a3 = 4 × 10−7
, b3 = 0.573576, c3 = 0.819152. (4.998)
224 LINEAR ALGEBRA
Mmin
Fi
F
ri A (xA, yA, zA)
P
z
x
yO
ui
Figure 4.7 Problem 4.8.
Problem 4.8
Consider a rigid solid (Fig. 4.7) in the reference frame Oxyz and the straight lines that pass through
the points Ai of position vectors ri(xi, yi, zi), i = 1, 3, the unit vectors along which are ui(ai, bi, ci),
i = 1, 3. Upon this solid act three forces of unknown intensities F1, F2, F3, the supports of which
are the three straight lines. Let us determine the intensities F1, F2, F3 of the forces so that, at the
point A of position vector rA(xA, yA, zA), the system of forces is reduced to a minimal torsor.
Numerical application: x1 = 0, y1 = 0, z1 = 8a, x2 = a, y2 = 0, z2 = 0, x3 = 0, y3 = −6a,
z3 = 0, a1 = 1, b1 = 0, c1 = 0, a2 = 0, b2 = 1, c2 = 0, a3 = 0, b3 = 0, c3 = 1, xA = 0, yA = 0,
zA = 7a, a = 1 m.
Solution:
1. Theory
Reduced at O, the system of three forces is of components
F =
3
i=1
Fiui, MO =
3
i=1
Firi · ui; (4.999)
by reducing it at A, we obtain the components
F =
3
i=1
Fiui, MA =
3
i=1
Firi · ui − rA ·
3
i=1
Fiui. (4.1000)
The conditions to have the minimal moment is transcribed in the relation
MA = λF. (4.1001)
Using the notations
{F} = Fx Fy Fz
T
, {MA} = MAx MAy MAz
T
, {F} = F1 F2 F3
T
, (4.1002)
di = yici − zibi, ei = ziai − xici, fi = xibi − yiai, i = 1, 3, (4.1003)
[U] =


a1 a2 a3
b1 b2 b3
c1 c2 c3

 , [V] =


d1 d2 d3
e1 e2 e3
f1 f2 f3

 , [rA] =


0 −zA yA
zA 0 −xA
−yA xA 0

 , (4.1004)
[A] = [V] − [rA][U], [B] = [U]−1
[A], (4.1005)
APPLICATIONS 225
in a matrix form, relations (4.1000) become
{F} = [U]{F}, {MA} = [A]{F} (4.1006)
and condition (4.1001) reads
[B]{F} = λ{F}; (4.1007)
the problem becomes one of eigenvalues and eigenvectors.
The eigenvalues λ1, λ2, λ3 are given by the equation
det[[B] − λ[I]] = 0, (4.1008)
while the intensities of the forces are given by the first two secular equations of the matrix equation
(4.1007).
We obtain thus three directions, hence three minimal torsors to which the considered system of
forces is reduced.
2. Numerical calculation
It follows, successively, that
d1 = 0, e1 = 6a, f1 = 0, d2 = 0, e2 = 0, f2 = a, d3 = 6a, e3 = 0, f3 = 0,
(4.1009)
[U] =


1 0 0
0 1 0
0 0 1

 , [V] =


0 0 6a
6a 0 0
0 a 0

 , [rA] =


0 −7a 0
7a 0 0
0 0 0

 , (4.1010)
[A] = [B] =


0 7a −6a
a 0 0
0 a 0

 =


0 7 −6
1 0 0
0 1 0

 , (4.1011)
while equation (4.1008) is
λ3
− 7λ + 6 = 0, (4.1012)
with the solutions
λ1 = 1, λ2 = 2, λ3 = −3. (4.1013)
Equation (4.1007), written in the form


−λi 7 −6
1 −λi 0
0 1 −λi




F1
F2
F3

 = {0}, (4.1014)
leads to the solutions
F2 =
F1
λi
, F3 =
F1
λ2
i
, (4.1015)
that is, to the set of values of the components of the resultant along the axes
F1 F1 F1
T
, F1
F1
2
F1
4
T
, F1 −F1
3
F1
9
T
, (4.1016)
F1 being an arbitrary value.
Finally, it results in
• the first minimal torsor: resultant F = F1
√
3, minimum moment M1 min = F1a
√
3 = F1
√
3,
direction of the resultant 1/
√
3 1/
√
3 1/
√
3
T
;
226 LINEAR ALGEBRA
• the second minimal torsor: resultant F = F1
√
21/4, minimum moment M1 min =
F1a
√
21/2 = F1
√
21/2, direction of the resultant 4/
√
21 2/
√
21 1/
√
21
T
;
• the third minimal torsor: resultant F = F1
√
91/4, minimum moment M1 min =
−F1a
√
91/3 = −F1
√
91/3, direction of the resultant 9/
√
91 −3/
√
91 1/
√
21
T
.
Problem 4.9
To study the free vibrations of an automobile, let us consider the model in Figure 4.8. Thus, for
this model (half of an automobile) let the notations be as follows:
k1, k2 —stiffness of the tires;
k3, k4 —stiffness of the suspension springs;
k5, k6 —stiffness of the passengers’ chairs;
m1, m2 —the masses of the wheels (to which are added the masses of the pivot pins);
m3 —half of the suspended mass of the automobile;
m5, m6 —the masses of the chairs, to which are added 75% of the passengers’ masses;
J —moment of inertia of the suspended mass with respect to the gravity center C.
It is required
• to determine the deflections of the springs in the state of equilibrium;
• to write the matrix equation of the free vibrations;
• to determine the eigenpulsations and the modal matrix;
• to discuss the results thus obtained.
Solution:
1. Theory
Denoting by zi0, i = 1, 6, the deflections of the springs in the state of equilibrium and taking
into account the forces represented in Figure 4.9, we obtain the equilibrium equations
k1z10 − k3z30 = m1g, k2z20 − k4z40 = m2g, k3z30 + k4z40 − k5z50 − k6z60 = m3g,
k3z30l1 − k5z50ls1 − k4z40l2 + k6z60ls2 = 0, k5z50 = m5g, k6z60 = m6g, (4.1017)
m6g
k3k4
k2 k1
k6
m3g
l2 l1
ls2
ls1
m5g
k5
m2g m1g
Figure 4.8 Problem 4.9.
APPLICATIONS 227
m6g
m3g
m2g m1g
m5g
k6z60
k6z60
k4z40
k4z40
k2z20
k1z10
k5z50
k5z50
k3 z30
k3 z30
l2 l1
ls2
ls1
l
Figure 4.9 Equations of equilibrium.
from which it follows that
z10 =
g
k1l
[m4l2 + m1l + m5(l1 + ls1) + m6(l2 − ls2)],
z20 =
g
k2l
[m3l1 + m2l + m5(l1 − ls1) + m6(l2 + ls2)],
z30 =
g
k3l
[(m3 + m5 + m6)l2 + m5ls1 − m6ls2],
z40 =
g
k4l
[(m3 + m5 + m6)l1 − m5ls1 + m6ls2],
z50 =
m5g
k5
, z60 =
m6g
k6
. (4.1018)
For an arbitrary position, denoting the displacements with respect to the position of equilibrium by
z1, z2, z5, z6 for the masses m1, m2, m5, m6, the displacement of the point C by z3 and the angle of
rotation of the suspended mass by φ, we obtain the forces represented in Figure 4.10. The theorem
of momentum, written for the bodies of masses m1, m2, m3, m5, m6, leads to the equations
m1 ¨z1 = −k1(z1 − z10) + k3(z3 + l1φ − z1 − z30) − m1g,
m2 ¨z2 = −k2(z2 − z20) + k4(z3 − l2φ − z2 − z40) − m2g,
m3 ¨z3 = −k3(z3 + l1φ − z1 − z30) − k4(z3 − l2φ − z2 − z40)
+ k5(z5 − z3 − ls1φ − z50) + k6(z6 − z3 + ls2φ − z60) − m3g,
m5 ¨z5 = −k5(z5 − z3 − ls1φ − z50) − m5g,
m6 ¨z6 = −k6(z6 − z3 + ls2φ − z60) − m6g, (4.1019)
228 LINEAR ALGEBRA
m6g
m3g
m1gm2g
m5g
k6(z6−z3+ls2
ϕ−z60)
k4(z4−z2−l2ϕ−z40) k3(z3−z1−l1ϕ−z30)
k5(z5−z3+ls1
ϕ−z50)
k2(z2−z20) k1(z1−z10)
z6
z3
z2 z1
z5
C ϕ
Figure 4.10 Equations of motion.
while the theorem of moment of momentum with respect to the center of gravity of the body of
mass m3 leads to the equation
J ¨φ = k4l2(z3 − l2φ − z2 − z40) + k5ls1(z5 − z3 − ls1φ − z50)
− k3l1(z3 + l1φ − z1 − z30) − k6ls2(z6 − z3 + ls2φ − z60). (4.1020)
Using the matrix notations,
{z} = z1 z2 z3 φ z5 z6
T
, (4.1021)
[M] =








m1 0 0 0 0 0
0 m2 0 0 0 0
0 0 m3 0 0 0
0 0 0 J 0 0
0 0 0 0 m5 0
0 0 0 0 0 m6








, (4.1022)
[K] =







k1 + k3 0 −k3 k3l1 0 0
0 k2 + k4 −k4 k4l2 0 0
−k3 −k4 k2 + k4 + k5 + k6 k3l1 + k4l2 + k5ls1 − k6ls2 −k5 −k6
−k3l1 k4l2 k5l1 − k4l2 + k5ls1 − k6ls2 k3l2
1 + k4l2
2 + k3l2
s1 + k6l2
s2 −k5ls1 k6ls2
0 0 −k5 −k5ls1 k5 0
0 0 −k6 k6ls2 0 k6







(4.1023)
and taking into account equation (4.1017), equation (4.1019), and equation (4.1020), we obtain the
matrix differential equation
[M]{¨z} + [K]{z} = {0}. (4.1024)
The solution of this equation is of the form
{z} = {a} cos(pt − φ) (4.1025)
APPLICATIONS 229
and leads to the matrix equation
[−p2
[M] + [K]]{a} = p2
{a}, (4.1026)
equivalent to the equation
[M]−1
[K]{a} = p2
{a}, (4.1027)
which is a problem of eigenvalues and eigenvectors.
Solving the equation
det[[K] − p2
[M]] = 0, (4.1028)
we obtain the eigenvalues p2
1, p2
2, . . . , p2
6 and the eigenpulsations p1, p2, . . . , p6.
Corresponding to each eigenvalue, we obtain the eigenvectors {a(i)
}, i = 1, 6, which define the
modal matrix
[A] = a(1) {a(2)
} · · · {a(6)
} . (4.1029)
2. Numerical calculation
We obtain successively
[M] =








30 0 0 0 0 0
0 30 0 0 0 0
0 0 450 0 0 0
0 0 0 300 0 0
0 0 0 0 60 0
0 0 0 0 0 60








, (4.1030)
[M]−1
=








0.03333 0 0 0 0 0
0 0.03333 0 0 0 0
0 0 0.002222 0 0 0
0 0 0 0.003333 0 0
0 0 0 0 0.066667 0
0 0 0 0 0 0.066667








, (4.1031)
[K] =








152000 0 −12000 15000 0 0
0 154000 −14000 17500 0 0
−12000 −14000 158000 −2500 −2000 −2000
15000 17500 −2500 42065 −1200 1200
0 0 −2000 −1200 2000 0
0 0 −2000 1200 0 2000








, (4.1032)
[M]−1
[K] =








5066.67 0 −400 500 0 0
0 5133.33 −466.67 583.33 0 0
−26.67 −31.11 351.11 −5.56 −4.44 −4.44
50 58.33 −8.33 140.22 −4 4
0 0 −33.33 −20 33.33 0
0 0 −33.33 20 0 33.33








, (4.1033)
230 LINEAR ALGEBRA
p1 = 6.22 s−1
, p2 = 8.04 s−1
, p3 = 13.13 s−1
,
p4 = 14.26 s−1
, p5 = 71.19 s−1
, p6 = 41.69 s−1
, (4.1034)
[A] =








0 0 0 0 1.0 0
0 0 0 0 0 1.0
0.5 0 0 0.2 0 0
0 −0.6 −0.2 0 0 0
0.7 −0.5 0.7 −0.7 0 0
0.6 0.7 −0.7 −0.7 0 0








. (4.1035)
The first mode of vibration defined by the eigenvector for the eigenpulsation p1 corresponds to
raising the vibrations of the suspended mass together with the displacement in phase of the chairs.
The second and the third modes of vibration correspond to a motion of pitching, together with the
motion in opposition to the phase of the chairs. The fourth mode of vibrations corresponds to a
vibration of raise, together with the motion in opposition of the phase of the chairs. The last two
modes of vibration correspond exclusively to the vibrations of the wheels.
Problem 4.10
We consider the rectangular plate in Figure 4.11, of dimensions 2l1, 2l2, of mass m and of moments
of inertia JX = ml2
2/3, JY = ml2
1/3, JZ = m(l2
1 + l2
2)/3, suspended by the springs AiBi of stiffness
ki, i = 1, 4.
As shown in Figure 4.11, the plate is in equilibrium under the action of the weight mg and of
the deformed springs of deflections si, i = 1, 4.
Considering that the deflections si are relatively great with respect to the displacements of the
plate when it is vibrating, knowing the lengths Li = Ai0Bi and the angles αi, i = 1, 4, determine
the following:
• the matrix differential equation of the linear vibrations;
• the eigenpulsations;
• the modal matrix.
B2
B3
B4
B1
k1
O0
2l2
2l1
Y
X
A20
A30
A40
A10
k2
k3 k4
α2
α3
α1
α4
Figure 4.11 Problem 4.10.
APPLICATIONS 231
C
z
y
x
mg
uC
ki
O
Bi
Ai
δi
ui
Ai0
O0
Z
Y
X
δ
θ
Figure 4.12 Small displacements of the rigid body.
Solution:
1. Theory
We consider a rigid solid the position of which is specified with respect to the fixed reference
system O0XYZ and to a system of reference Oxyz rigidly linked to it (Fig. 4.12), so that at the
position of equilibrium the mobile system coincides with the fixed one.
A small displacement of an arbitrary position of the rigid solid is defined by the linear displace-
ment δ of the point O and by the rotation angle θ.
The rigid solid is acted upon by its own weight mg and is suspended by the springs AiBi,
i = 1, n.
To construct the mathematical model of the linear solutions of the rigid solid, we introduce the
following notations:
• (θX, θY , θZ), (δX, δY , δZ)—projections of the vectors θ and δ on the axes of the system
O0XYZ;
• {θ}, {δ}, {∆}—column matrices
{θ} = θX θY θZ
T
, {δ} = δX δY δZ
T
, {∆} = δX δY δZ θX θY θZ
T
;
(4.1036)
• δi —displacement Ai0Ai of the end of the spring AiBi;
• ui —unit vector in the direction Ai0Ai of the spring AiBi in the position of equilibrium of
the solid;
• ri —position vector of the point Ai0;
• xi, yi, zi —coordinates of the point Ai0 in the system O0XYZ, respectively, the coordinates
of the point Ai in the system Oxyz;
• [ri]—the matrix defined by
[ri] =


0 −zi yi
zi 0 −xi
−yi xi 0

 ; (4.1037)
• ai, bi, ci —projections of the vector ui in the system O0XYZ;
232 LINEAR ALGEBRA
• m∗
i —the vector defined by the relation
m∗
i = ri · ui; (4.1038)
• di, ei, fi —the projections of the vector m∗
i on the axes of the trihedron O0XYZ, that is,
the quantities
di = yici − zibi, ei = ziai − xici, fi = xibi − yiai; (4.1039)
• {ui}, {m∗
i }, {Ui}—the column matrices given by the relations
{ui} = ai bi ci
T
, {m∗
i } = di ei fi
T
, {Ui} = ai bi ci di ei fi
T
;
(4.1040)
• C —the center of gravity of the rigid solid;
• uC —the unit vector in the direction toward to the surface of the Earth;
• xC, yC, zC —the coordinates of the center C in the system Oxyz, respectively, the coordi-
nates of the point C0 in the system O0XYZ;
• rC —the position vector O0C0 of the point C0;
• aC, bC, cC —the projections of the vector uC in the system O0XYZ;
• dC, eC, fC —parameters defined by the relations
dC = yCcC − zCbC, eC = zCaC − xCcC, fC = xCbC − yCaC; (4.1041)
• {UC}—the column matrix
{UC} = aC bC cC dC eC fC
T
; (4.1042)
• δC —displacement of the point C;
• li0 —the undeformed length of the spring AiBi;
• [S]—the matrix defined by
[S] =


0 −mzC myC
mzC 0 −mxC
−myC mxC 0

 ; (4.1043)
• [m]—the matrix
[m] =


m 0 0
0 m 0
0 0 m

 ; (4.1044)
• [J]—the matrix of the moments of inertia
[J] =


Jxx −Jxy −Jxz
−Jxy Jyy −Jyz
−Jxz −Jyz Jzz

 ; (4.1045)
• [M]—the matrix of inertia of the rigid solid
[M] =
[m] [S]T
[S] [J]
; (4.1046)
APPLICATIONS 233
• T , V —the kinetic energy and the potential energy, respectively;
• Va, VC —the potential energy of the springs and the potential energy of the weight mg,
respectively;
• {∂T /∂ ˙∆}. {∂T /∂∆}, {∂V/∂∆}—the column matrices of the partial derivatives
∂T
∂ ˙∆
=
∂T
∂˙δX
∂T
∂˙δY
∂T
∂˙δZ
∂T
∂˙θX
∂T
∂˙θY
∂T
∂˙θZ
T
,
∂T
∂∆
=
∂T
∂δX
∂T
∂δY
∂T
∂δZ
∂T
∂θX
∂T
∂θY
∂T
∂θZ
T
, (4.1047)
∂V
∂∆
=
∂V
∂δX
∂V
∂δY
∂V
∂δZ
∂V
∂θX
∂V
∂θY
∂V
∂θZ
T
.
By these notations, we may write
T =
1
2
{ ˙∆}[M]{ ˙∆}, Va =
1
2
n
i=1
ki(AiBi − li0)2
, VC = mgδCuC, (4.1048)
V = Va + VC. (4.1049)
Lagrange’s equations have the matrix form
d
dt
∂T
∂ ˙∆
−
∂T
∂∆
+
∂V
∂∆
= {0}; (4.1050)
taking into account the relation
d
dt
∂T
∂ ˙∆
= [M]{ ¨∆} (4.1051)
and the fact that {∂T /∂∆} = {0}, because it is a function of second degree in the components of
the matrix { ˙∆}, equation (4.1050) reads
[M]{ ¨∆} +
∂VC
∂∆
+
∂Va
∂∆
= {0}. (4.1052)
The displacements δC, δi, i = 1, n, being small, can be expressed by the relations
δC = δ + θ · rC, δi = δ + θ · ri, i = 1, n, (4.1053)
so that
VC = mg{UC}T
{∆}, (4.1054)
∂VC
∂∆
= mg{UC}. (4.1055)
To calculate the column matrix {∂Va/∂∆} we express first the length AiBi, taking into account
the second relation (4.1053),
AiBi = (AiBi)2 = (Liui − δi)2, (4.1056)
AiBi = (Liui − δ − θ · ri)2 (4.1057)
234 LINEAR ALGEBRA
or
AiBi = [(Liai − δX − θY zi + θZyi)2
+ (Libi − δY − θZxi + θXzi)2
+ (Lici − δZ − θXyi + θY xi)2
]; (4.1058)
by computing, it follows that
∂AiBi
∂∆
=
1
AiBi
−Li Ui +
[I] −[ri]
[ri] −[ri]2 {∆}, (4.1059)
where [I] is the unit matrix of third order.
From relation (4.1057), expanding the binomial into series and neglecting the nonlinear terms,
we obtain
AiBi = Li 1 − 2
ui
Li
δ − θ · ri
1
2
= Li − {Ui}T
{∆}; (4.1060)
taking into account the relation
si = Li − li0 (4.1061)
and neglecting the nonlinear terms, it follows that
AiBi − li0
AiBi
=
si − {Ui}T
{∆}
Li − {Ui}T{∆}
=
si
Li
−
1
Li
Ui
T
{∆} 1 +
1
Li
Ui
T
{∆} (4.1062)
or AiBi − li0
AiBi
=
si
Li
− 1 −
si
Li
{Ui}T
{∆}. (4.1063)
Finally, denoting by [K] the rigidity matrix
[K] =
n
i=1
kisi
Li
[I] [ri]T
[ri] −[ri]2 +
n
i=1
1 −
si
Li
{Ui}{Ui}T
(4.1064)
and taking into account the equilibrium equation
mg{UC} −
n
i=1
kisi{Ui} = {0}, (4.1065)
we get, from equation (4.1062), the matrix differential equation of the linear vibrations
[M]{ ¨∆} + [K]{∆} = {0}. (4.1066)
2. Numerical calculation
We obtain successively
JX = 5 kg m2
, JY = 3.2 kg m2
, JZ = 8.2 kg m2
, (4.1067)
xC = 0, yC = 0, zC = 0, [S] =


0 0 0
0 0 0
0 0 0

 , (4.1068)
[M] =








60 0 0 0 0 0
0 60 0 0 0 0
0 0 60 0 0 0
0 0 0 5 0 0
0 0 0 0 3.2 0
0 0 0 0 0 8.2








, (4.1069)
APPLICATIONS 235
x1 = 0.3, y1 = 0.5, z1 = 0, x2 = −0.3, y2 = 0.5, z2 = 0,
x3 = −0.3, y3 = −0.5, z3 = 0, x4 = 0, y4 = −0.5, z4 = 0,
(4.1070)
[r1] =


0 0 0.5
0 0 −0.3
−0.5 0.3 0

 , [r2] =


0 0 0.5
0 0 0.3
−0.5 −0.3 0

 ,
[r3] =


0 0 −0.5
0 0 −0.3
0.5 0.3 0

 , [r4] =


0 0 −0.5
0 0 −0.3
0.5 0.3 0

 ,
(4.1071)
a1 =
√
3
2
, b1 =
1
2
, c1 = 0, d1 = 0, e1 = 0, f1 = −0.28301,
a2 = −
√
3
2
, b2 =
1
2
, c2 = 0, d2 = 0, e2 = 0, f2 = 0.28301,
a3 = −
√
3
2
, b3 = −
1
2
, c3 = 0, d3 = 0, e3 = 0, f3 = −0.28301,
a4 =
√
3
2
, b4 = −
1
2
, c4 = 0, d4 = 0, e4 = 0, f4 = 0.28301,
(4.1072)
[r1]2
=


−0.25 0.15 0
0.15 −0.09 0
0 0 −0.34

 , [r2]2
=


−0.25 0.15 0
0.15 −0.09 0
0 0 −0.34

 ,
[r3]2
=


−0.25 0.15 0
0.15 −0.09 0
0 0 −0.34

 , [r4]2
=


−0.25 −0.15 0
−0.15 −0.09 0
0 0 −0.34

 ,
(4.1073)
{U1} = 0.86603 0.5 0 0 0 −0.28301
T
,
{U2} = −0.86603 0.5 0 0 0 0.28301
T
,
{U3} = −0.86603 −0.5 0 0 0 −0.28301
T
,
{U4} = 0.86603 −0.5 0 0 0 0.28301
T
,
(4.1074)
{U1}{U1}T
=








0.75 0.43301 0 0 0 −0.24510
0.43301 0.25 0 0 0 −0.14151
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
−0.24510 −0.14151 0 0 0 008010








,
{U2}{U2}T
=








0.75 −0.43301 0 0 0 −0.24510
−0.43301 0.25 0 0 0 0.14151
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
−0.24510 0.14151 0 0 0 0.08010








,
236 LINEAR ALGEBRA
{U3}{U3}T
=








0.75 0.43301 0 0 0 0.24510
0.43301 0.25 0 0 0 0.14151
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0.24510 0.14151 0 0 0 0.08010








,
{U4}{U4}T
=








0.75 −0.43301 0 0 0 0.24510
−0.43301 0.25 0 0 0 −0.14151
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0.24510 −0.14151 0 0 0 0.08010








, (4.1075)
k1s1
L1
[I] [r1]T
[r1] −[r1]2 + k1 1 −
s1
L1
{U1}{U1}T
=








6206.67 2829 0 0 0 −2254.65
2829 2940 0 0 0 −532.53
0 0 1306.67 653.33 −392 0
0 0 653.33 326.67 −196 0
0 0 −392 −196 117.6 0
−2254.65 −532.53 0 0 0 967.59








, (4.1076)
k2s2
L2
[I] [r2]T
[r2] −[r2]2 + k2 1 −
s2
L2
{U2}{U2}T
=








3266.67 −1131.6 0 0 0 −1293.86
−1131.6 1960 0 0 0 −22.19
0 0 1306.67 653.33 392 0
0 0 653.33 326.67 −196 0
0 0 392 −196 117.6 0
−1293.86 −22.19 0 0 0 653.39








, (4.1077)
k3s3
L3
[I] [r3]T
[r3] −[r3]2 + k3 1 −
s3
L3
{U3}{U3}T
=








1568 678.96 0 0 0 580.32
678.96 784 0 0 0 104.29
0 0 392 −196 117.6 0
0 0 −196 98 −58.8 0
0 0 117.6 −58.8 35.28 0
580.32 104.29 0 0 0 258.88








, (4.1078)
k4s4
L4
[I] [r4]T
[r4] −[r4]2 + k4 1 −
s4
L4
{U4}{U4}T
=








1148 −436.47 0 0 0 443.06
−436.47 644 0 0 0 −25.04
0 0 392 −196 −117.6 0
0 0 −196 98 58.8 0
0 0 −117.6 58.8 35.28 0
443.06 −25.04 0 0 0 214.02








, (4.1079)
APPLICATIONS 237
[K] =
4
i=1
kisi
Li
[I] [ri]T
[ri] −[ri]2 +
4
i=1
ki 1 −
si
Li
{Ui}{Ui}T
=








12189.34 1939.89 0 0 0 −2525.13
1939.89 6328 0 0 0 −475.47
0 0 3397.34 914.66 0 0
0 0 914.66 849.34 −392 0
0 0 0 −392 305.76 0
−2525.13 −475.47 0 0 0 2094.08








. (4.1080)
The eigenpulsations are obtained from the equation
12189.34
−60p2 1939.89 0 0 0 −2525.13
1939.89
6328
−60p2 0 0 0 −475.47
0 0
3397.34
−60p2 914.66 0 0
0 0 914.66
849.34
−5p2 −392 0
0 0 0 −392
305.76
−3.2p2 0
−2525.13 −475.47 0 0 0
2094.08
−8.2p2
= 0, (4.1081)
from which
12189.34 − 60p2 1939.89 −2525.13
1939.89 6328 − 60p2
−475.47
−2525.13 −475.47 2094.08 − 8.2p2
×
3397.34 − 60p2
914.66 0
914.66 849.34 − 5p2
−392
0 −392 305.76 − 3.2p2
= 0, (4.1082)
that is,
−29520p6
+ 16649219.28p4
− 2532108243.09p2
+ 115198062272.87 = 0 (4.1083)
or
−960p6
+ 309158.72p4
− 18112125.58p2
+ 104420926.76 = 0. (4.1084)
It follows that
p1 = 18.751, p2 = 10.934, p3 = 9.635,
p4 = 567.485, p5 = 0.763, p6 = 0.761.
(4.1085)
For the first three eigenpulsations, we use the system
(12189.34 − 60p2
)a1 + 1939.89a2 = 2525.13,
1939.89a1 + (6328 − 60p2
)a2 = 475.47,
(4.1086)
238 LINEAR ALGEBRA
while for the last three eigenpulsations, we use the system
(3397.34 − 60p2
)b1 + 914.66b2 = 0,
914.66b1 + (849.34 − 5p2
)b2 = 392.
(4.1087)
The modal matrix reads
[A] =








−0.495 0.382 0.791 0 0 0
−0.973 0.314 −1.396 0 0 0
0 0 0 10−8
−0.17843 −0.17841
0 0 0 0.00024 0.65594 0.65591
0 0 0 1 1 1
1 1 1 0 0 0








. (4.1088)
Problem 4.11
Determine the efforts in the homogeneous, articulated, straight bars, of constant cross section from
which a rigid solution is suspended.
Solution:
1. Theory
1.1. Generalities. Notations
Consider a rigid solid, as illustrated in Figure 4.13, suspended by the elastic straight
bars A0iAi, i = 1, n, of constant cross section and spherical articulated (with spherical
hinges) and the notations:
• O0XYZ —the dextrorsum three-axes orthogonal fixed reference system;
• Oxyz —the dextrorsum three-axes orthogonal reference system, rigidly linked to the
solid;
• XO, YO, ZO —the coordinates of the point O in the system O0XYZ;
• F, MO —the resultant and the resultant moment, respectively, of the external forces
that act upon the body;
O0
z
x
y
Z
X
Y
Ai
ui
MO
F
O (XO,YO,ZO)
A0i
Figure 4.13 Problem 4.11.
APPLICATIONS 239
li +∆li
ri
Ai0
(Xi0
,Yi0
,Zi0
)
z
x
O (XO,YO,ZO)
y
Ai
li
Figure 4.14 Small displacements.
• (Fx, Fy, Fz), (Mx, My, Mz)—projection of the vectors F, MO on the axes of the
Oxyz-trihedron;
• li —length of the bar A0iAi;
• Ai —area of the cross section of the bar A0iAi;
• Ei —the longitudinal elasticity modulus of the bar A0iAi;
• ki —the stiffness of the bar A0iAi
ki =
EiAi
li
; (4.1089)
• δ—the displacement (small) of the point O (Fig. 4.14);
• θ—the rotation angle (small) of the rigid solid;
• (δx, δy, δz), (θx, θy, θz)—the projections of the vectors δ and θ on the axes of the
Oxyz-trihedron;
• δi —the displacement (small) of the point Ai;
• ui —the unit vector of the direction AiA0i;
• ri —the position vector of the point Ai;
• xi, yi, zi —the coordinates of the point Ai in the Oxyz-system;
• ai, bi, ci —projections of the unit vector ui on the axes of the Oxyz-trihedron;
• di, ei, fi —projections of the vector ri · ui on the axes of the Oxyz-trihedron, that is,
di = yici − zibi, ei = ziai − xici, fi = xibi − yiai; (4.1090)
• Ni —intensity of the effort Ni in the bar A0iAi;
• li —deformation of the bar A0iAi;
• {F}, {∆}, {Ui}—column matrices defined by
{F} = Fx Fy Fz Mx My Mz
T
,
{∆} = δx δy δz θx θy θz
T
, (4.1091)
{Ui} = ai bi ci di ei fi
T
.
1.2. Case in which none of the bars is deformed by the application of the external load F,
MOWith the above notations, we write the obvious relation
(li + li)2
= (−liui + δi)2
, (4.1092)
240 LINEAR ALGEBRA
from which, neglecting the nonlinear terms ( li)2
, we obtain the relation
li = −ui · δi. (4.1093)
The displacement of the point Ai of the solid is small, so that it can be expressed by
δi = δ + θ · ri; (4.1094)
hence, using the mentioned notations, relation (4.1093) becomes
li = −{Ui}T
{∆}. (4.1095)
Under these conditions, the intensities of the efforts in the bars are
Ni = ki li = −ki{Ui}T
{∆}, i = 1, n; (4.1096)
if Ni > 0 the bars are subjected to traction and if Ni < 0 they are subjected to com-
pression.
The effort vector reads
Ni = −ki{Ui}T
{∆}ui. (4.1097)
Taking into account the previous notations and the equations of equilibrium
n
i=1
Ni + F = 0,
n
i=1
ri · Ni + MO = 0, (4.1098)
we obtain the matrix equation
[K]{∆} = {F}, (4.1099)
where [K] is the stiffness matrix given by
[K] =
n
i=1
ki{Ui}{Ui}T
=
n
i=1
ki








a2
i aibi aici aidi aiei aifi
biai b2
i bici bidi biei bifi
ciai cibi c2
i cidi ciei cifi
diai dibi dici d2
i diei difi
eiai eibi eici eidi e2
i eifi
fiai fibi fici fidi fiei f 2
i








. (4.1100)
Thus, equation (4.1099) gives the displacement {∆}, and then the efforts in the bars are
given by equation (4.1096)
Particular cases:
(a) The bars are parallel
We suppose in this case, that the bars are parallel to the Oz-axis and we get, successively
ai = bi = 0, ci = 1, di = yi, ei = −xi, fi = 0, (4.1101)
{Ui} = ci di ei
T
, {∆} = δz θx θy
T
, {F} = Fz Mx My
T
, (4.1102)
[K] =
n
i=1
ki{Ui}{Ui}T
=
n
i=1
ki


c2
i cidi ciei
dici d2
i diei
eici eidi e2
i

. (4.1103)
APPLICATIONS 241
(b) The bars are coplanar
We assume that the bars are situated in the Oxy-plane, so that
ci = 0, di = ei = 0, fi = xibi − yiai, (4.1104)
{Ui} = ai bi fi
T
, {∆} = δx δy θz
T
, {F} = Fx Fy Mz
T
, (4.1105)
[K] =
n
i=1
ki{Ui}{Ui}T
=
n
i=1
ki


a2
i aibi aifi
biai b2
i bifi
fiai fibi f 2
i

. (4.1106)
(c) The bars are parallel and coplanar
In this case, we assume that the bars are situated in the Oz-plane and are parallel to the
Oz-axis; it follows that
ai = bi = 0, ci = 1, di = fi = 0, ei = −xi, (4.1107)
{Ui} = ci ei
T
, {∆} = δz θy
T
, {F} = Fz My
T
, (4.1108)
[K] =
n
i=1
ki{Ui}{Ui}T
=
n
i=1
ki
c2
i ciei
eici e2
i
. (4.1109)
(d) The bars are concurrent
In this case, the solid is reduced to the concurrence point, so that
θx = θy = θz, di=0 = ei = fi, (4.1110)
{Ui} = ai bi ci
T
, { } = δx δy δz
T
, (4.1111)
[K] =
n
i=1
ki{Ui}{Ui}T
=
n
i=1
ki


a2
i aibi aici
biai b2
i bici
ciai cibi c2
i

. (4.1112)
(e) The bars are concurrent and coplanar
If the bars are situated in the Oxy-plane, we have
ci = 0, (4.1113)
{Ui} = ai bi
T
, {∆} = δx δy
T
, (4.1114)
[K] =
n
i=1
ki{Ui}{Ui}T
=
n
i=1
ki
a2
i aibi
biai b2
i
. (4.1115)
1.3. Case in which the bars have errors of fabrication equal to li
In this case, relations (4.1096) and (4.1097) become
Ni = −ki[ li + {Ui}T
{∆}], (4.1116)
Ni = −kiui[ li + {Ui}T
{∆}]. (4.1117)
Using the notation
{F} = −ki li{Ui}, (4.1118)
242 LINEAR ALGEBRA
C
G
u5
u6
u3
u5
u4
u2
u1
A5 A6
A3
A2
A1 A4
A05
A07
A7
A06
A03
A04
A01
A05
X
Z
O Y
Figure 4.15 Application 2.1.
for where li corresponds to bars that are longer, the equation of equilibrium (4.1098)
leads to the equation
[K]{∆} = {F} + {F}. (4.1119)
The rigidity matrix is also given by relation (4.1100).
In the case of temperature variations, the deviations that appear are given by
←→
l i = liαi T, (4.1120)
where αi is the coefficient of linear dilatation, while T is the temperature variation in
Kelvin.
2. Numerical applications
Application 2.1. We consider the rigid solid in the form of a homogeneous parallelepiped (Fig. 4.15)
of weight G and dimensions 2a, 2b, 2c, suspended by seven homogeneous articulated straight
bars, of the same length l and of the same stiffness k, the bars A1A01, A2A02 being parallel to
the OX-axis, the bars A3A03, A4A04 being parallel to the OY -axis, while the bars A5A05, A6A06
and A7A07 are parallel to the vertical OZ-axis.Assuming that the rigid solid is acted upon only
by its own weight G, let us determine the efforts in the seven bars in the following cases:
(a) The bars have no fabrication errors.
(b) The bars A1A01, A6A06 have the fabrication errors l1, l6.
(c) The bar A4A04 is heated by T 0
.
Solution of Application 2.1:
(a) It follows, successively, that
{U1} = 1 0 0 0 0 0
T
, {U2} = 1 0 0 0 0.5 0
T
,
{U3} = 0 1 0 0 0 0
T
, {U4} = 0 1 0 0 0 1
T
,
{U5} = 0 0 1 0 0 0
T
, {U6} = 0 0 1 1 0 0
T
,
{U7} = 0 0 1 1 −1 0
T
, (4.1121)
APPLICATIONS 243
{U1}{U1}T
=








1 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0








, {U2}{U2}T
=








1 0 0 0 0.5 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0.5 0 0 0 0.25 0
0 0 0 0 0 0








,
{U3}{U3}T
=








0 0 0 0 0 0
0 1 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0








, {U4}{U4}T
=








0 0 0 0 0 0
0 1 0 0 0 1
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 1 0 0 0 1








,
{U5}{U5}T
=








0 0 0 0 0 0
0 0 0 0 0 0
0 0 1 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0








, {U6}{U6}T
=








0 0 0 0 0 0
0 0 0 0 0 0
0 0 1 1 0 0
0 0 1 1 0 0
0 0 0 0 0 0
0 0 0 0 0 0








,
{U7}{U7}T
=








0 0 0 0 0 0
0 0 0 0 0 0
0 0 1 1 −1 0
0 0 1 1 −1 0
0 0 −1 −1 1 0
0 0 0 0 0 0








, (4.1122)
[K] = k
7
i=1
{Ui}{Ui}T
= 107








2 0 0 0 0.5 0
0 2 0 0 0 1
0 0 3 2 −1 0
0 0 2 2 −1 0
0.5 0 −1 −1 1.25 0
0 1 0 0 0 1








, (4.1123)
{F} = 200000 0 0 −1 −1 1 0
T
, (4.1124)
[K]−1
= 10−7








0.6 0 0 −0.2 −0.4 0
0 1 0 0 0 −1
0 0 1 −1 0 0
−0.2 0 −1 1.9 0.8 0
−0.4 0 0 0.8 1.6 0
0 −1 0 0 0 2








. (4.1125)
{∆} = [K]−1
{F} = 0.004 0 0 −0.002 0.016 0
T
. (4.1126)
N1 = −k1{U1}T
{ } = −40,000 N, N2 = −k2{U2}T
{ } = −12,0000 N,
N3 = −k3{U3}T
{ } = 0 N, N4 = −k4{U4}T
{ } = 0 N,
N5 = −k5{U5}T
{ } = 0 N, N6 = −k6{U6}T
{ } = 20,000 N,
N7 = −k7{U7}T
{ } = 180,000 N. (4.1127)
244 LINEAR ALGEBRA
O
G
u3u4
u2
u1
A1
A4
A2
A3
A04
A02
A03
A01
X
Z
Y
Figure 4.16 Application 2.2.
(b) We have, in this case,
{F} = −k1 l1 U1 − k6 l6{U6} = 200000 −1 0 −1 −1 0 0
T
, (4.1128)
{F} + {
←→
F } = 200000 −1 0 −2 −2 1 0
T
, (4.1129)
{∆} = [K]−1
{{F} + {F}} = −0.012 0 0 −0.016 0.008 0
T
, (4.1130)
N1 = −k1[ l1 + {U1}T
{∆}] = −80,000 N, N2 = −k2{U2}T
{∆} = 80,000 N,
N3 = −k3{U3}T
{∆} = 0 N, N4 = −k4{U4}T
{∆} = 0 N,
N5 = −k5{U5}T
{∆} = 160,000 N, N6 = −k6[ l6 + {U6}T
{∆}] = −40,000 N,
N7 = −k7{U7}T
{∆} = 240,000 N. (4.1131)
(c) We obtain
l4 = lα4 T = 0.012 m, (4.1132)
{F} = −k4 l4{U4} = 120,000 0 −1 0 0 0 −1
T
, (4.1133)
{F} + {F} = 40,000 0 −3 −5 −5 5 −3
T
, (4.1134)
{∆} = [K]−1
{{F} + {F}} = −0.004 0 0 −0.002 0.016 −0.012
T
, (4.1135)
N1 = −k1{U1}T
{∆} = 40,000 N, N2 = −k2{U2}T
{∆} = −40,000 N,
N3 = −k3{U3}T
{∆} = 0 N, N4 = −k4[ l4 + {U4}T
{∆}] = 0 N,
N5 = −k5{U5}T
{∆} = 20,000 N, N6 = −k6{U6}{∆} = −140,000 N,
N7 = −k7{U7}{∆} = 180,000 N. (4.1136)
Application 2.2. A square horizontal plate of side 2l and weight G is suspended by four vertical
bars of elastic stiffness k1, k2, k3, k4 (Fig. 4.16).
Determine the efforts in the bars in the following cases:
(a) The bars have no fabrication errors.
(b) The bar A1A01 has a fabrication error given by l1. Numerical data: l = 1 m, G = 200000 N,
k1 = 8 × 106
Nm−1
, k2 = 2 × 106
Nm−1
, k3 = 5 × 106
Nm−1
, k4 = 6 × 106
Nm−1
,
l1 = 0.02 m.
Solution of Application 2.2: (a) It follows that
{U1} = 1 −l −l
T
= 1 −1 −1
T
, {U2} = 1 l −l
T
= 1 1 −1
T
,
{U3} = 1 l l
T
= 1 1 1
T
, {U4} = 1 −l l
T
= 1 −1 1
T
, (4.1137)
APPLICATIONS 245
{U1}{U1}T
=


1 −1 −1
−1 1 1
−1 1 1

 , {U2}{U2}T
=


1 1 −1
1 1 −1
−1 −1 1

 ,
{U3}{U3}T
=


1 1 1
1 1 1
1 1 1

 , {U4}{U4}T
=


1 −1 1
−1 1 −1
1 −1 1

 , (4.1138)
[K] =
4
i=1
ki{Ui}{Ui}T
= 106


21 −7 1
−7 21 5
1 5 21

 , (4.1139)
{F} = 200,000 −1 0 0
T
, (4.1140)
[K]−1
= 10−6


0.054622 0.019958 −0.007353
0.019958 0.057773 −0.014706
−0.007353 −0.014706 0.051471

 , (4.1141)
{∆} = [K]−1
{F} = 0.010924 0.003992 −0.001471
T
, (4.1142)
N1 = −k1{U1}T
{∆} = −67,224 N, N2 = −k2{U2}T
{∆} = −32,774 N,
N3 = −k3{U3}T
{∆} = −67,225 N, N4 = −k4{U4}T
{∆} = −32,766 N. (4.1143)
(b) The rigidity matrix [K] and the matrix {F} remain the same. We calculate, successively,
{F} = −k1 l1{U1} = 160,000 −1 1 1
T
, (4.1144)
{F} + {F} = 40,000 −9 4 4
T
, (4.1145)
{∆} = [K]−1
{{F} + {F}} = −0.017647 −0.000294 0.008529
T
, (4.1146)
N1 = −k1[ l1 + {U1}T
{∆}] = 47,056 N, N2 = −k2{U2}T
{∆} = 52,040 N,
N3 = −k3{U3}T
{∆} = 47,060 N, N4 = −k4{U4}T
{∆} = 52,944 N. (4.1147)
Application 2.3. Let us consider the rectangular plate in Figure 4.17 of dimensions 2a, 2b, suspended
by the hinged bars A1A01, A3A03 parallel to the Ox-axis and by the bars A2A02, A4A04 parallel
to the Oy-axis.
O
A3
A4 A1
A2
A02
A03
A04
A01
P
x
y
2a
2b
Figure 4.17 Application 2.3.
246 LINEAR ALGEBRA
Knowing that the plate is acted upon at the point A2 by the force P parallel to the Ox-axis and
knowing the rigidities k1 = k, k2 = k3 = 2k, k4 = 3k, determine the efforts in the bars in the
following cases:
(a) The bars have not fabrication errors.
(b) The bar A1A01 has a fabrication error equal to l1.
(c) The bar A1A01 of length l1 is heated by T 0
. Numerical data: a = 0.5 m, b = 0.4 m, P =
10000 N, k = 106
N m−1
, l1 = 0.01 m, T = 100 K, α1 = 12 × 106
deg−1
, l1 = 1 m.
Solution of Application 2.3: (a) We have
{U1} = 1 0 b
T
= 1 0 0.4
T
, {U2} = 0 1 a
T
= 0 1 0.5
T
,
{U3} = −1 0 b
T
= −1 0 0.4
T
, {U4} = 0 1 −a
T
= 0 1 −0.5
T
,
(4.1148)
{U1}{U1}T
=


1 0 0.4
0 0 0
0.4 0 0.16

 , {U2}{U2}T
=


0 0 0
0 1 0.5
0 0.5 0.25

 ,
{U3}{U3}T
=


1 0 −0.4
0 0 0
−0.4 0 0.16

 , {U4}{U4}T
=


0 0 0
0 1 −0.5
0 −0.5 0.25

 , (4.1149)
[K] =
4
i=1
ki{Ui}{Ui}T
= 106


3 0 −0.4
0 5 −0.5
−0.4 −0.5 1.73

 , (4.1150)
[K]−1
= 10−6


0.344262 0.008197 0.081967
0.008197 0.206148 0.061475
0.081967 0.061475 0.614754

 , (4.1151)
{F} = P 1 0 −b
T
= 10,000 1 0 −0.4
T
, (4.1152)
{∆} = δx δy θz
T
= [K]−1
{F} = 0.003115 0.000164 −0.001639
T
, (4.1153)
N1 = −k1{U1}T
{∆} = −24,594 N, N2 = −k2{U2}T
{∆} = 1311 N,
N3 = −k3{U3}T
{∆} = 75,412 N, N4 = −k4{U4}T
{∆} = −24,505 N. (4.1154)
(b) We obtain
{F} = −k1 l1{U1} = 2000 −5 0 −2
T
, (4.1155)
{F} + {F} = 8000 0 0 −1
T
, (4.1156)
{∆} = [K]−1
{{F} + {F}} = −0.000656 −0.000492 −0.004918
T
, (4.1157)
N1 = −k1[ l1 + {U1}T
{∆}] = −73,768 N, N2 = −k2{U2}T
{∆} = 5902 N,
N3 = −k3{U3}T
{∆} = 2622.4 N, N4 = −k4{U4}T
{∆} = 5901 N. (4.1158)
(c) It follows successively that
l1 = l1α1 T = 0.0012 m, (4.1159)
{F} = −k1 l1{U1} = −12000 0 −4800
T
, (4.1160)
{F} + {F} = −2000 0 −8800
T
, (4.1161)
APPLICATIONS 247
A05
A04
A03
A02
A01
y
z
P
O
x
Figure 4.18 Application 2.4.
{∆} = [K]−1
{{F} + {F}} = −0.001410 −0.000557 −0.005574
T
, (4.1162)
N1 = −k1[ l1 + {U1}T
{∆}] = 24,396 N, N2 = −k2{U2}T
{∆} = 6688 N,
N3 = −k3{U3}T
{∆} = 1639.2 N, N4 = −k4{U4}T
{∆} = −6690 N. (4.1163)
Application 2.4. We consider the spatial system of articulated bars in Figure 4.18, concurrent at
the articulation O, where the vertical force P situated on the Oz-axis is acting.
Knowing the rigidities of the bars A0iO, i = 1, 5, k1 = 2k, k2 = 1.5k, k3 = 2k, k4 = 3k,
k5 = 2.5k and the direction cosines of their directions (ai, bi, ci), i = 1, 5, determine the efforts
in the bars in the following cases:
(a) The bars have no fabrication errors.
(b) The bar A01O has a fabrication error equal to l1.
(c) The bar A01O of length l1 is heated by T 0
. Numerical data: P = 20000 N,
k = 106
N m−1
, l1 = 0.02 m, l1 = 1 m, α1 = 12 × 10−6
deg−1
, (a1, b1, c1) = (3/5, 4/5, 0),
(a2, b2, c2) = (2/3, 2/3, 1/3), (a3, b3, c3) = (0, 3/5, 4/5), (a4, b4, c4) = (−2/3, 2/3, 1/3),
(a5, b5, c5) = (−3/5, 4/5, 0).
Solution of Application 2.4: (a) We have, successively,
{Ui} = ai bi ci
T
, i = 1, 5, (4.1164)
{U1}{U1}T
=







9
25
12
25
0
12
25
16
25
0
0 0 0







, {U2}{U2}T
=








4
9
4
9
2
9
4
9
4
9
2
9
2
9
2
9
1
9








,
{U3}{U3}T
=






0 0 0
0
9
25
12
25
0
12
25
16
25






, {U4}{U4}T
=








4
9
−
4
9
−
2
9
−
4
9
4
9
2
9
−
2
9
2
9
1
9








,
{U5}{U5}T
=







9
25
−
12
25
0
−
12
25
16
25
0
0 0 0







, (4.1165)
248 LINEAR ALGEBRA
[K] =
5
i=1
ki{Ui}{Ui}T
= 106


3.62 −0.906667 −0.333333
−0.906667 5.6 1.96
−0.333333 1.96 1.78

 , (4.1166)
[K]−1
= 10−6


0.324021 0.119911 −0.192715
0.119911 0.334921 −0.391245
−0.192715 −0.391245 1.028696

 , (4.1167)
{F} = 20,000 0 0 −1
T
, (4.1168)
{∆} = δx δy δz
T
= [K]−1
{F} = 0.003854 0.007825 −0.020574
T
, (4.1169)
N1 = −k1{U1}T
{∆} = −17,144.8 N, N2 = −k2{U2}T
{∆} = −1392 N,
N3 = −k3{U3}T
{∆} = 23,528.4 N, N4 = −k4{U4}T
{∆} = 12,632 N,
N5 = −k5{U5}T
{∆} = −9869 N. (4.1170)
(b) It follows that
{F} = −k1 l1{U1} = 8000 −3 −4 0
T
, (4.1171)
{F} + {F} = 4000 −6 −8 −5
T
, (4.1172)
{∆} = [K]−1
{{F} + {F}} = −0.007759 −0.005770 −0.003429
T
, (4.1173)
N1 = −k1[ l1 + {U1}T
{∆}] = −21,457.2 N, N2 = −k2{U2}T
{∆} = 15,243.5 N,
N3 = −k3{U3}T
{∆} = 12,410.4 N, N4 = −k4{U4}T
{∆} = 6309 N,
N5 = −k5{U5}T
{∆} = −98.5 N. (4.1174)
(c) We obtain the values
l1 = l1α1 T = 12 × 10−4
m, (4.1175)
{F} = −k1 l1{U1} = −1440 −1920 0
T
, (4.1176)
{F} + {F} = −1440 −1920 −20000
T
, (4.1177)
{∆} = [K]−1
{{F} + {F}} = 0.003157 −0.008641 −0.021048
T
, (4.1178)
N1 = −k1[ l1 + {U1}T
{∆}] = 7637.2 N, N2 = −k2{U2}T
{∆} = 16,008 N,
N3 = −k3{U3}T
{∆} = 44,046 N, N4 = −k4{U4}T
{∆} = 44,644 N,
N5 = −k5{U5}T
{∆} = 22,017.5 N. (4.1179)
Problem 4.12
Let us consider the continuous beam in Figure 4.19, where the sections have lengths lk and rigidities
EIk, k = 1, n − 1.
The beam is acted upon by given distributed loads and by given concentrated forces and moments.
It is required to determine the reactions Vk, k = 1, n, in the supports.
APPLICATIONS 249
l1
l1
l2
l2
ln−2
ln−2
ln−1
ln−1
Figure 4.19 Problem 4.12.
Q q
Mk−1 Mk+1
Mk+1Mk−1
Mk
Mk
Mk
Mk
Ak−1
Ak+1
Ak+1
Ak−1
Ak Ak
Ak Ak
Q
V ″k−1 V″kV ′k
V ′k+1
q
lk−1 lk
(a) (b)
(d)(c)
Figure 4.20 Isolation of the sections Ak−1Ak and AkAk+1.
Solution:
1. Theory
By isolating the sections Ak−1Ak, AkAk+1, we obtain the representations in Figure 4.20a and b,
where we have the notations:
• q, Q—given external loads;
• Mk−1, Mk, Mk+1 —bending moments;
• Vk, Vk+1 —reactions at the right of each section;
• Vk−1, Vk —reactions at the left of each section.
Figure 4.20c and d represents the loadings (bending moments) of the conjugate beams, while
the bending moments given by the external loads q, Q are represented under the reference lines
Ak−1Ak, AkAk+1. Denoting by Mr
k−1, Ml
k+1 the resultant moments of the external loading q with
respect to the points Ak−1 and Ak+1, respectively, it follows that the reactions Vk, Vk are given by
Vk =
1
lk−1
(Mk−1 + Mr
k−1 − Mk), Vk =
1
lk
(Mk∗1 + Ml
k∗1 − Mk), (4.1180)
so that the reaction at the support Ak
Vk = Vk + Vk (4.1181)
reads
Vk =
Mk−1
lk−1
− Mk
1
lk−1
+
1
lk
+
Mk+1
lk
+
Mr
k−1
lk−1
+
Ml
k+1
lk
. (4.1182)
Because the rotations φk, φk at the fixed support Ak for the two sections, respectively, are equal to
the shearing forces divided by the rigidities EIk−1, EIk of the conjugate beams, it follows that
φk =
1
EIk−1
Mklk−1
2
2lk−1
3
+
Mk−1lk−1
2
lk−1
3
+ Sr
k−1 , (4.1183)
φk =
1
EIk
Mklk
2
2lk
3
+
Mk+1lk+1
2
lk
3
+ Sl
k+1 , (4.1184)
250 LINEAR ALGEBRA
where by Sr
k−1, Sl
k+1 we have denoted the static moments of the bending moments of the areas
corresponding to the external loads q, Q.
The indices r, l specify the loadings to the right and at the left of the supports Ak−1, Ak+1,
respectively.
Taking into account the relation
φk + φk = 0, (4.1185)
we obtain, from equation (4.1183) and equation (4.1184), the Clapeyron relation
Mk−1lk−1
Ik−1
+ 2Mk
lk−1
Ik−1
+
lk
Ik
+
Mk+1
Ik
+ 6
Sr
k−1
lk−1Ik−1
+
Sl
k+1
lkIk
= 0. (4.1186)
If we take into account that the moments at the supports A1, An vanish (M1 = Mn = 0) and if we
use the notations
[A] =










2
l1
I1
+
l2
I2
l2
I2
0 0 · · · 0 0 0
l2
I2
2
l2
I2
+
l3
I3
l3
I3
0 · · · 0 0 0
0
l3
I3
2
l3
I3
+
l4
I4
0 · · · 0 0 0
· · · · · · · · · · · · · · · · · · · · · · · ·
0 0 0 0 · · ·
ln−3
In−3
2
ln−3
In−3
+
ln−2
In−2
ln−2
In−2
0 0 0 0 · · · 0
ln−2
In−2
2
ln−2
In−2
+
ln−1
In−1










,
(4.1187)
{B} = −6




















Sr
1
l1I1
+
Sl
2
l2I2
Sr
2
l2I2
+
Sl
3
l3I3
Sr
3
l3I3
+
Sl
3
l3I3
...
Sr
n−2
ln−2In−2
+
Sl
n−1
ln−1In−1




















, (4.1188)
{M} = M2 M3 · · · Mn−1
T
, (4.1189)
{V} = V1 V2 · · · Vn
T
, (4.1190)
[C] =














1
l1
0 0 0 · · ·
− 1
l1
+ 1
l2
1
l2
0 0 · · ·
1
l2
− 1
l2
+ 1
l3
1
l3
0 · · ·
· · · · · · · · · · · · · · · · · · · · · · · ·
0 0 0 0 · · · 1
ln−3
− 1
ln−3
+ 1
ln−2
1
ln−2
0 0 0 0 · · · 0 1
ln−2
− 1
ln−2
+ 1
ln−1
0 0 0 0 · · · 0 0 1
ln−1














,
(4.1191)
APPLICATIONS 251
Pq
l l l l l/2l/2
A1
A2 A3 A4 A5
A6
Figure 4.21 Numerical application.
{D} =


























Ml
1
l1
Mr
1
l1
+
Ml
2
l2
Mr
2
l2
+
Ml
3
l3
...
Mr
n−2
ln−2
+
Ml
n−1
ln−1
Mr
n−1
ln−1


























, (4.1192)
then equation (4.1186) and equation (4.1182), for k = 1, n, may be written in the matrix form as
[A]{M} = {B}, (4.1193)
{V} = [C]{M} + {D}, (4.1194)
from which we obtain the solution
{V} = [C][A]−1
{B} + {D}. (4.1195)
2. Numerical application
Figure 4.21 gives n = 6, Ik = I = 256 × 10−6
m2
, k = 1, 6, lk = l = 1 m, k = 1, 6,
P = 20,000 N, q = 40,000 N m−1
. The reactions Vk, k = 1,6, are required.
Solution of the numerical application: The matrices [A] and [C] are obtained, directly, from
relations (4.1187) and (4.1191)
[A] = 3906.25




4 1 0 0
1 4 1 0
0 1 4 1
0 0 1 4



 , [C] =








1 0 0 0
−2 1 0 0
1 −2 1 0
0 1 −2 1
0 0 1 −2
0 0 0 1








. (4.1196)
The matrix {B} is written first in the form
{B} = −
6
EI
Sr
1 0 0 Sl
6
T
, (4.1197)
252 LINEAR ALGEBRA
(b)
(a) A2
q
l/2 l/2
A1
q
l2
8
Figure 4.22 Section A1A2.
(b)
(a)
l/2l/2
A5 A6
P
P l
4
Figure 4.23 Section A5A6.
and from Figure 4.22b and Figure 4.23b it follows that
Sr
1 =
ql4
24
, (4.1198)
Sl
6 =
P l3
16
, (4.1199)
{B} = −39062500 0 0 −29296875
T
. (4.1200)
Analogically, the matrix {D} is written first in the form
{D} =
Ml
2
l1
Mr
1
l1
0 0
Ml
6
l5
Mr
5
l1
T
, (4.1201)
and, because from Figure 4.22a and Figure 4.23a we have
Mr
1 = Ml
2 =
ql2
2
, (4.1202)
Mr
5 = Ml
6 =
Pl
2
, (4.1203)
we obtain
{D} = 10000 2 2 0 0 1 1
T
. (4.1204)
APPLICATIONS 253
A
B
C
P
l2
l1
l3
Figure 4.24 Problem 4.13.
X1
X3
X1
P
C
B
Figure 4.25 Basic system.
In the numerical case, from relation (4.1195) it follows that
{V} = V1 V2 V3 V4 V5 V6
T
= −2000 9000 −27500 −25000 62500 −20000
T
. (4.1205)
Problem 4.13
Let us determine, by the method of efforts, the reactions in the built-in sections A, B in Figure 4.24,
assuming that the bars AC, CB have the same rigidity EI .
Numerical application: EI = 2 × 108
N m2
, l1 = 0.5 m, l2 = 0.4 m, l3 = 0.6 m, P = 12,000 N.
Solution:
1. Theory
Introducing at the built-in section A, the reactions forces X1 and X2 and the reaction moment
X3 (Fig. 4.25), we obtain the basic system, which is the bent beamACB, built in at B and acted
upon by the force P and by the unknown reactions X1, X2, X3.
The external load P , the unit forces along the forces X1 and X2 and the unit moment in the direction
of the moment X3 produce on the basic system the diagrams of bending moments M0, m1, m2 m3,
represented in Figure 4.26.
By means of these diagrams we calculate the coefficients of influence
δi0 =
miM0
EI
dx, (4.1206)
δij =
mimj
EI
dx. (4.1207)
Being given that the variations of the moments mi are linear, we can also calculate them by
Vereshchyagin’s rule:
δi0 =
miC
EI
, (4.1208)
δij =
imjC
EI
, (4.1209)
254 LINEAR ALGEBRA
(a) (b)
(c) (d)
l2+l3A
1
BC
1
m1
m3
m2
M0
−l1
B
1
C
−Pl3
−Pl3
A
BC
C
B
Figure 4.26 Diagrams of bending moments.
where is the area of the moment surface of the diagram M0, while miC is the moment of the
diagram corresponding to the center of gravity of the surface and mi is the area of the surface
of moments of the diagram mi, while mjC is the moment of the diagram mj corresponding to the
center of gravity of the surface of moments mi, respectively.
2. Numerical application
It follows successively that
δ10 =
P l1l2
3
2EI
, δ20 = −
P l2
3(3l2 + 2l3)
6EI
, δ30 = −
P l2
3
2EI
, (4.1210)
δ11 =
l2
1(l1 + 3l2 + 3l3)
3EI
, δ12 = δ21 = −
l1(l2 + l3)2
2EI
, δ13 = δ31 = −
l1(l1 + 2l2 + 2l3)
2EI
,
δ22 =
(l2 + l3)3
3EI
, δ23 = δ32 =
(l2 + l3)2
2EI
, δ33 =
l1 + l2 + l3
EI
. (4.1211)
δ10 = 3.24 × 10−6
m2
, δ20 = −8.64 × 10−6
m2
, δ30 = −6.48 × 10−6
, (4.1212)
δ11 = 1.45833 × 10−9
m N−1
, δ12 = δ21 = −1.25 × 10−9
m N−1
,
δ13 = δ31 = −3.125 × 10−9
N−1
, δ22 = 1.66667 × 10−9
m N−1
,
δ23 = δ32 = 2.5 × 109
N−1
, δ33 = 7.5 × 10−9
N−1
m−1
. (4.1213)
Using the notations
[δ] =


δ11 δ12 δ13
δ21 δ22 δ23
δ31 δ32 δ33

 = 10−9


1.45833 −1.25 −3.125
−1.25 1.66667 2.5
−3.125 2.5 7.5

 ,
{δ0} = δ10 δ20 δ30
T
= 10−6
3.24 −8.64 −6.48
T
, {X} = X1 X2 X3
T
, (4.1214)
we obtain the matrix equation of condition
[δ]{X} = −{δ0}, (4.1215)
from which we obtain
{X} = −[δ]−1
{δ0}. (4.1216)
APPLICATIONS 255
In our case,
[δ]−1
= 1.53604 × 109


6.25003 1.5625 2.08334
1.5625 1.17185 0.26043
2.08334 0.26043 0.86805

 , {X} =


10368.187
10368.071
1728.133

 . (4.1217)
The reactions at B are
HB = X1 = 10,368.187 N, VB = P − X2 = 1631.929 N,
MB = P l3 + X1l1 − X2(l2 + l3) − X3 = 287.89 N m. (4.1218)
Observation 4.43 If l1 = l2 = l3 = l, then we obtain the values
δ10 =
P l4
2EI
, δ20 = −
5P l4
6EI
, δ30 = −
P l2
2EI
, (4.1219)
δ11 =
7l3
3EI
, δ12 = δ21 = −
2l3
EI
, δ13 = δ31 = −
5l2
2EI
,
δ22 =
8l3
3EI
, δ23 = δ32 =
2l2
EI
, δ33 =
3l
EI
. (4.1220)
The condition for this is given by
l
EI









7l2
3
−2l2
−
5l
2
−2l2 8l2
3
2l
−
5l
2
2l 3











X1
X2
X3

 = −
P l2
EI









l2
2
−
5l2
6
−
l2
2









(4.1221)
or, equivalently, by








7l2
3 −2l2
−
5l
2
−2l2 8l2
3
2l
−
5l
2
2l 3










X1
X2
X3

 = Pl









−
l2
2
5l2
6
l2
2









, (4.1222)
with the solution
X1 X2 X3
T
= P
4
7P
16
Pl
12
T
. (4.1223)
P
A2
A1
A5 A6
A7
A4
A9
A8A3
Figure 4.27 Problem 4.14.
256 LINEAR ALGEBRA
Problem 4.14
Let us show that the plane frame in Figure 4.27 is with fixed knots and determine the reactions at
the points A5, A6, A7, A8 by the method of displacements, knowing that the bars have the same
rigidity EI and the same length, while A3A9 = A9A8 = l.
Numerical application for l = 1 m, EI = 6 × 108
N m2
, P = 12,000 N.
Solution:
1. Theory
If we replace the elastic knots A1, A2, A3, A4 and the built-in ones A5, A6, A7, A8 by articula-
tions, we obtain the structure in Figure 4.28, which has b = 8 bars and n = 8 articulations.
The structure in Figure 4.27 has r = 12 external constraints (three in each built-in section A5, A6,
A7, A8). It follows thus that the expression 2n − (b + r) = −4 is negative, so that the structure is
with fixed knots.
Isolating an arbitrary bar AhAj (Fig. 4.29), denoting by φh, φj , Mh, Mj the rotation angles and the
moments at the ends of the bar, respectively, and using the method of the conjugate bar, we obtain
the relations
Mhj =
2EI
l
(2φh + φj ) +
2(Sh − 2Sj )
l2
, Mjh =
2EI
l
(2φj + φh) +
2(Sj − 2Sh)
l2
, (4.1224)
where by Sh, Sj we have denoted the static moments of the areas of the bending moments given
by the external loads Q, q (Fig. 4.29).
In the case of Figure 4.27, these static moments vanish for all the bars, excepting the bar A3A8
(Fig. 4.30), for which
S3 = S8 =
P l3
2
. (4.1225)
To determine the unknown rotations φ1, φ2, φ3, φ4 at the knots A1, A2, A3, A4, we write the
equilibrium equations that are obtained by isolating the knots, that is,
M12 + M14 + M15 = 0, M21 + M23 = 0,
M32 + M34 + M38 = 0, M41 + M43 + M46 + M47 = 0. (4.1226)
A2
A3 A8
A7
A4
A6
A1
A5
Figure 4.28 Resulting structure.
Ah
Ajϕh ϕj
q
l
Mhj
Mjh
Q
Figure 4.29 Isolation of the bar AhAj .
APPLICATIONS 257
2
Pl
P A8A3
Figure 4.30 Diagram of bending moments for the bar A3A8.
Vj = VhMhj
Mjh
Ah
Vh
Aj
Figure 4.31 Bar without external loads.
2. Computation relations
With the view to obtain the system of four equations with four unknowns from system (4.1226),
we take into account that φ5 = φ6 = φ7 = φ8 = 0, obtaining thus the equalities
M12 =
2EI (2φ1 + φ2)
l
, M14 =
2EI (2φ1 + φ4)
l
, M15 =
EI φ1
l
,
M21 =
2EI (2φ2 + φ1)
l
, M23 =
2EI (2φ2 + φ3)
l
, M31 =
2EI (2φ1 + φ3)
l
, (4.1227)
M34 =
2EI (2φ3 + φ4)
l
, M32 =
2EI (2φ3 + φ2)
l
, M38 =
2EI φ3
l
−
Pl
4
,
M41 =
2EI (2φ4 + φ1)
l
, M43 =
2EI (2φ4 + φ3)
l
, M46 =
4EI φ4
l
, M47 =
4EI φ4
l
,
so that system (4.1226), with the notation
[A] =




6 1 0 1
1 4 1 0
0 1 5 1
1 0 1 8



 , {φ} = φ1 φ2 φ3 φ4
T
, {B} = 0 0 Pl2
8 0
T
, (4.1228)
becomes
[A]{φ} = {B} (4.1229)
and has the solution
{φ} = [A]−1
{B}. (4.1230)
The rotations φ1, φ2, φ3, φ4 being known now, from relations (4.1227), we determine the indicated
moments and, moreover, the moments M51, M64, M74, M83 by the formulae
M51 =
2EI φ1
l
, M64 = M74 =
2EI φ4
l
, M83 =
Pl
8
+
EI φ1
l
. (4.1231)
258 LINEAR ALGEBRA
For bars unloaded with external loads (Fig. 4.31), we obtain the reactions
Vh = Vj =
Mhj + Mjh
l
, (4.1232)
while for the bar A3A8 (Fig. 4.32) we obtain the reactions
V3 =
M38 + M83 − Pl
2l
, V8 =
M38 + M83 + Pl
2l
. (4.1233)
On the basis of relation (4.1232), we may determine (Fig. 4.33) the reactions H5, H6, V7, that is,
H5 =
M51 + M15
l
, H6 =
M64 + M46
l
, V7 =
M47 + M74
l
. (4.1234)
To determine the reactions V5 and V6, we isolate the parts in Figure 4.34; there result the successive
relations
V2 =
M23 + M32
l
, V1 =
M14 + M41
l
, V5 = −(V1 + V7),
V3 =
M38 + M83 − Pl
2l
, V4 =
M47 + M74
l
, (4.1235)
V6 = V1 + V2 − V3 − V4, (4.1236)
P
V8
V′3
M83
M38
Figure 4.32 The bar A3A8.
A2
A1
A5
A3 A8
A7
A4
A6
V5 V6
V7
V8
H5 H6
H7
H8
Figure 4.33 Calculation of the reactions H5, H6, and V7.
V2
V1
V5
V′3
V′
V6
V2
V1
A3
A4
A6
A2
A1
A5
Figure 4.34 Determination of the reactions V5 and V6.
APPLICATIONS 259
H2
H4H1
H2 H3
H3
H7
A8 H8
A7
A3
A4
A2
A1
Figure 4.35 Determination of the reactions H7 and H8.
while, for the determination of the reactions H7 and H8, we isolate the parts in Figure 4.35 and
there result the successive relations
H2 =
M12 + M21
l
, H3 =
M34 + M43
l
, H8 = −(H2 + H3), (4.1237)
H1 =
M15 + M51
l
, H4 =
M46 + M64
l
, H7 = H2 + H3 − H1 − H4. (4.1238)
In conclusion, we obtain the reactions:
• at the point A5 —H5, V5, M51;
• at the point A6 —H6, V6, M64;
• at the point A7 —H7, V7, M74;
• at the point A8 —H8, V8, M83.
3. Numeric computation
We calculate successively
{B} = 0 0 0.25 × 10−6
0
T
, (4.1239)
[A]−1
=




0.178744 −0.048309 0.014493 −0.024155
−0.048309 0.276570 −0.057971 0.013285
0.014493 −0.057971 0.217391 −0.028986
−0.024155 0.013285 −0.028986 0.131643



 , (4.1240)
{φ} = φ1 φ2 φ3 φ4
T
= 10−9
3.62325 −14.49275 54.34775 −7.2465
T
,
(4.1241)
M51 = 4.3479 N m, M64 = M74 = −8.6958 N m,
M83 = 152.17395 N m, M12 = −8.6955 N m, M15 = 2.17395 N m,
M14 = 0 N m, M21 = −30.4347 N m, M23 = 30.4347 N m,
M31 = 73.9131 N m, M34 = 121.7388 N m, M38 = −234.7827 N m,
M41 = 13.0437 N m, M43 = 47.8257 N m, M46 = −17.3916 N m,
M47 = 47.8257 N m, M32 = 113.0433 N m, (4.1242)
V3 = −641.3 N, V8 = 558.7 N, H5 = 6.49 N, H6 = −26.09 N,
V7 = 39.13 N, V2 = 143.5 N, V1 = −13.04 N, V5 = −26.09 N,
V4 = 39.13 N, V6 = 758.7 N, H2 = −39.13 N, H3 = 169.6 N,
H8 = −130.5 N, H1 = 6.52 N, H4 = −26.09 N, H7 = 150 N. (4.1243)
260 LINEAR ALGEBRA
A B
C
D
E F
G60° 60°
60°
l
3l
l
2l
X
Y
32l
Figure 4.36 Problem 4.15.
Problem 4.15
Let us consider the plane articulated mechanism in Figure 4.36, where the crank AB is rotating with
the constant angular velocity ω1. For the position in Figure 4.36, determine the angular velocities
ω2, ω3, ω4, ω5 and the angular accelerations ε2, ε3, ε4, ε5 of the bars BC, CD, EF, FG.
Numerical application for ω1 = 100 s−1
, l = 0.2 m.
Solution:
1. Theory
In an arbitrary position and in a more general case, the mechanism is represented in Figure 4.37.
Denoting by l1, l2, l∗
2 , l3, l4, l5 the lengths of the bars AB, BC, BE, CD, EF, FG, from the vector
equations
AB + BC + CD = OD − OA, AB + BE + EF + FG = OG − OA, (4.1244)
projected on the axes OX and OY , we obtain the scalar equations
l1 cos φ1 + l2 cos φ2 + l3 cos φ3 = XD − XA,
l1 sin φ1 + l2 sin φ2 + l3 sin φ3 = YD − YA,
l1 cos φ1 + l∗
2 cos φ2 + l4 cos φ4 + l5 cos φ5 = XG − XA,
l1 sin φ1 + l∗
2 sin φ2 + l4 sin φ4 + l5 sin φ5 = YG − YA.
(4.1245)
O
ϕ5
ϕ4
D (xD,yD)
ϕ3
ϕ2
ϕ1
Y
X
G (xG,yG)
F
E
C
B
A(xA,yA)
Figure 4.37 The general case.
APPLICATIONS 261
Differentiating successively with respect to time relations (4.1225) and denoting by ωi, εi the angular
velocities and accelerations, respectively,
ωi = ˙φi, εi = ˙ωi, i = 1,5, (4.1246)
and knowing that ˙ω1 = 0, we obtain the systems of equations
−l2ω2 sin φ2 − l3ω3 sin φ3 = l1ω1 sin φ1,
l2ω2 cos φ2 + l3ω3 cos φ3 = −l1ω1 cos φ1,
−l∗
2 ω2 sin φ2 − l4ω4 sin φ4 − l5ω5 sin φ5 = l1ω1 sin φ1, (4.1247)
l∗
2 ω2 cos φ2 + l4ω4 cos φ4 + l5ω5 cos φ5 = −l1ω1 cos φ1,
−l2ε2 sin φ2 − l3ε3 sin φ3 − l2ω2
2 cos φ2 − l3ω2
3 cos φ3 = l1ω2
1 cos φ1,
l2ε2 cos φ2 + l3ε3 cos φ3 − l2ω2
2 sin φ2 − l3ω2
3 sin φ3 = l1ω2
1 sin φ1,
−l∗
2 ε2 sin φ2 − l4ε4 sin φ4 − l5ε5 sin φ5 − l∗
2 ω2
2 cos φ2 (4.1248)
−l4ω2
4 cos φ4 − l5ω2
5 cos φ5 = l1ω2
1 cos φ1,
l∗
2 ε2 cos φ2 + l4ε4 cos φ4 + l5ε5 cos φ5 − l∗
2 ω2
2 sin φ2
−l4ω2
4 sin φ4 − l5ω2
5 sin φ5 = l1ω2
1 sin φ1.
By using the notations
[A] =




−l2 sin φ2 −l3 sin φ3 0 0
l2 cos φ2 l3 cos φ3 0 0
−l∗
2 sin φ2 0 −l4 sin φ4 −l5 sin φ5
l∗
2 cos φ2 0 l4 cos φ4 l5 cos φ5



 , (4.1249)
{B} = sin φ1 − cos φ1 sin φ1 − cos φ1
T
, {ω} = ω2 ω3 ω4 ω5
T
, (4.1250)
[Ap] =




−l2 cos φ2 −l3 cos φ3 0 0
−l2 sin φ2 −l3 sin φ3 0 0
−l∗
2 cos φ2 0 −l4 cos φ4 −l5 cos φ5
−l∗
2 sin φ2 0 −l4 sin φ4 −l5 sin φ5



 , (4.1251)
{Bp} = cos φ1 sin φ1 cos φ1 sin φ1
T
,
{ω2
} = ω2
1 ω2
2 ω2
3 ω2
4
T
,
{ε} = ε1 ε2 ε3 ε4
T
, (4.1252)
the systems of equation (4.1247) and equation (4.1248) are written in the matrix form
[A]{ω} = l1ω1{B}, (4.1253)
[A]{ε} = l1ω2
1{Bp} − [Ap]{ω2
}, (4.1254)
obtaining thus the solutions
{ω} = l1ω1[A]−1
{B}, (4.1255)
{ε} = l1ω2
1[A]−1
{Bp} − [A]−1
[Ap]{ω2
}. (4.1256)
262 LINEAR ALGEBRA
2. Numerical calculation
The following values result:
l1 = l, l2 = 3l, l∗
2 = 4l, l3 = 3l, l4 = 2l, l5 = 2
√
3l,
XA = 0, YA = 0, XG = 5l, YG = 0, XD = 4l, YD = 0,
φ1 = 0
◦
, φ2 = 60
◦
, φ3 = 300
◦
, φ4 = 0
◦
, φ5 = 270
◦
, (4.1257)
[A] =




−0.51962 0.51962 0 0
0.3 0.3 0 0
−0.69282 0 0 0.69282
0.4 0 0.4 0



 , (4.1258)
{B} = 0.86603 −0.5 0.86603 −0.5
T
, (4.1259)
[Ap] =




−0.3 −0.3 0 0
−0.51962 0.51962 0 0
−0.4 0 −0.4 0
−0.69282 0 0 0.69282



 , (4.1260)
{Bp} = 0.5 0.86603 0.5 0.86603
T
, (4.1261)
[A]−1
=




−0.962242 1.666667 0 0
0.962242 1.666667 0 0
0.962242 −1.666667 0 2.5
−0.962242 1.666667 1.44376 0



 , (4.1262)
{ω} = −33.333 0 8.333 −33.333
T
, (4.1263)
{ω2
} = 1111.089 0 69.439 1111.089
T
, (4.1264)
{ε} = 2566.03 5131.99 −280.61 4690.98
T
. (4.1265)
Problem 4.16
We consider a mechanical system, the motion of which is defined by the matrix differential equation
with constant coefficients
{˙x} = [A]{x} + [B]{u}, (4.1266)
where
• {x} = x1 x2 · · · xn
T
is the state vector;
• {u} = u1 u2 · · · um
T
is the command vector;
• [A] = [aij ]1≤i,j≤n is the matrix of coefficients;
• [B] = [bij ]
1 ≤ i ≤ n
1 ≤ j ≤ m
is the command matrix.
Knowing that the matrix [A] has either positive solutions or complex ones with a positive real
part, determine a command vector to make stable the motion with the aid of a reaction matrix.
Numerical application for
[A] =


1 1 0
1 1 1
0 1 1

 , [B] =


0
0
1

 . (4.1267)
APPLICATIONS 263
Solution:
1. Theory
If the matrix [A] has all its eigenvalues either strictly negative or complex with a negative real
part, then even the null command vector {u} = {0} satisfies the condition that the motion be stable.
If this condition is not fulfilled, then we may determine a command vector of the form
{u} = [K]{x}, (4.1268)
[K] being the reaction matrix, so that the motion is stable. To do this, the eigenvalues of the matrix
[A] + [B][K] must be either negative or complex with the real part negative. To determine the
matrix [K] that must fulfill these conditions, we may use the method of allocation of poles, by
choosing convenient eigenvalues λ1, λ2, . . . , λn and obtaining the elements of the matrix [K] by
means of the equations
det[[A] + [B][K] − λ[I]] = 0, λ = λ1, λ2, . . . , λn. (4.1269)
2. Numerical calculation
In the numerical case considered, the eigenvalues of the matrix [A] are given by the equation
1 − λ 1 0
1 1 − λ 1
0 1 1 − λ
= 0, (4.1270)
That is,
λ1 = 1, λ2,3 = 1 ±
√
2; (4.1271)
thus, in the absence of the command, the motion is unstable.
In the numerical case considered, the reaction matrix [K] is of the form
[K] = α1 α2 α3 , (4.1272)
hence equation (4.1269) reads
1 − λ 1 0
1 1 − λ 1
α1 1 + α2 1 + α3 − λ
= 0 (4.1273)
or
λ3
− λ2
(3 + α3) + λ(−α2 + 2α3 + 1) + 1 − α1 + α2 = 0. (4.1274)
If we allocate the poles
λ1 = −1, λ2 = −2 + i, λ3 = −2 − i, (4.1275)
then
λ1 + λ2 + λ3 = −5, λ1λ2 + λ1λ3 + λ2λ3 = 9, λ1λ2λ3 = −5 (4.1276)
and we obtain the system
−5 = 3 + α3, 9 = −α2 + 2α3 + 1, −5 = α1 − α2 − 1, (4.1277)
from which it follows that
α1 = −28, α2 = −24, α3 = −8; (4.1278)
264 LINEAR ALGEBRA
x
bb
y
a a
O
p(x)
p(x)
Figure 4.38 Problem 4.17.
as a conclusion, the reaction matrix is
[K] = −28 −24 −8 , (4.1279)
so that the command becomes
u = −28x1 − 24x2 − 8x3. (4.1280)
Problem 4.17
Let a rectangular plate of dimensions 2a and 2b (λ = a/b, λ = 1/λ = b/a) be subjected in the
middle plane by the distributed loads
p(x) = p(x) = b0 +
n
bn cos γnx, γn =
nπ
a
, n ∈ N, (4.1281)
distributed on y = ±b, respectively, symmetrical with respect to both axes of coordinates (Fig. 4.38).
The state of stress (σx and σy, normal stresses; τxy , tangential stress) may be expressed in the form
σx =
n
(−1)n
An 1(γny) cos γnx +
m
(−1)m
Bm 2(δmx) cos δmy,
σy = b0 +
n
(−1)n
An 2(γny) cos γnx +
m
(−1)m
Bm 1(δmx) cos δmy, (4.1282)
τxy =
n
(−1)n
An 3(γny) sin γnx +
m
(−1)m
Bm 3(δmx) sin δmy,
where it has been denoted (i = 1, 2, 3)
i(γny) = i(νπζ) for ν = nλ , n ∈ N, ζ = η,
i(δmx) = i(νπζ) for ν = mλ, m ∈ N, ζ = ξ, (4.1283)
with
ξ =
x
a
, η =
y
b
, δm =
mπ
b
, m ∈ N. (4.1284)
The functions i(νπζ) are defined by the relations
1(νπζ) =
νπ
sinh νπ
[(1 − νπ coth νπ) cosh νπζ + νπζ sinh νπζ],
2(νπζ) =
νπ
sinh νπ
[(1 + νπ coth νπ) cosh νπζ − νπζ sinh νπζ],
3(νπζ) = −
νπ
sinh νπ
(νπ coth νπ sinh νπζ − νπζ cosh νπζ). (4.1285)
APPLICATIONS 265
The sequences of coefficients An and Bn are given by the system of equations with a double
infinity of unknowns
i
µ2
mi Ai + κ(mλ)Bm = 0, m, i ∈ N,
l
µ2
lnBl + κ (nλ )An = (−1)n
bn, n, l ∈ N,
(4.1286)
where we have introduced the rational function
µmn =
2γmδm
γ2
n + δ2
m
, n, m ∈ N (4.1287)
and the hyperbolic functions
κ(mλ) = 1(δma) = coth δma +
δma
sinh2
δma
δma, m ∈ N,
κ (nλ ) = 1(γnb) = coth γnb +
γnb
sinh2
γnb
γnb, n ∈ N. (4.1288)
To solve the system of infinite linear algebraic equations (4.1286) by approximate methods, we
must prove the existence and the uniqueness of the solution, as well as its boundedness; thus we
search whether the system is regular or not.
The system is completely regular if the conditions
i
µ2
mi < κ(mλ),
l
µ2
ln < κ (mλ ) (4.1289)
are fulfilled.
Solution: Let the expansions into Fourier series be given by
3(γny) =
m
(−1)m
µ2
mn cos δmy, 3(δmx) =
n
(−1)n
µ2
mn cos γnx. (4.1290)
In particular, we get
κ(mλ) = 3(δma) = coth δma −
δma
sinh2
δma
δma, m ∈ N,
κ (nλ ) = 3(γnb) = coth γnb −
γnb
sinh2
γnb
γnb, n ∈ N (4.1291)
Relations (4.1289) and (4.1290) lead to
i
µ2
mi = κ(mλ) − 2
δma
sinh δma
2
,
l
µ2
ln = κ (nλ ) − 2
γnb
sinh γnb
2
. (4.1292)
Thus, conditions (4.1288) become
2
δma
sinh δma
2
> 0, 2
γnb
sinh γnb
2
> 0. (4.1293)
266 LINEAR ALGEBRA
We notice that these magnitudes tend to zero for m → ∞ or n → ∞. Hence, the system of
equations (4.1286) is regular, but not completely regular.
We have
ρm = 1 −
i
µ2
mi
κ(mλ)
=
2 δma
sinh δma
2
κ(mλ)
=
4δma
sinh 2δma + 2δma
,
ρn =
4γnb
sinh 2γnb + 2γnb
. (4.1294)
Asking that the solution of the infinite system of equations, the existence of which is assured
for a regular system, be bounded and obtained by successive approximations, the free terms, that
is, the Fourier coefficients bn must satisfy the condition
bn =
bn
κ2(nλ )
≤ Kρn, (4.1295)
where K is a positive constant, hence bn must be of the same order of magnitude as ρn. As a result,
the type of external loads that may be taken into consideration is very restricted.
The solution of a regular system, however, is not necessarily unique. To study this problem, we
make the change of variable
An = γ2
nAn, Bm = δ2
mBm, m, n ∈ N. (4.1296)
Thus, system (4.1286) becomes
i
ω2
µ2
mi Ai + κ(mλ)Bm = 0, κ (mλ )An +
l
ω
2
µ2
lnBl = (−1)n
bn, (4.1297)
where we have denoted
ω =
m
n
λ, ω =
1
ω
=
n
m
λ , (4.1298)
eventually taking n = i or m = l.
Let the expansions into Fourier series be given by
1(γny) = 2 +
m
(−1)m
ω
2
µ2
mn cos δmy, 1(δmx) = 2 +
n
(−1)n
ω2
µ2
mn cos γnx. (4.1299)
Relations (4.1287) and (4.1298) allow us now to write
i
ω2
µ2
mi = χ(mλ) − 2,
l
ω
2
µ2
ln = χ (nλ ) − 2. (4.1300)
Thus, we get
ρm = 1 −
i
ω2
µ2
mi
κ2(mλ)
=
2
κ2(mλ)
=
2
δma coth δma + δma
sinh2δma
,
ρn =
2
κ (nλ )
=
2
γnb coth γnb + γnb
sinh2γnb
. (4.1301)
APPLICATIONS 267
Hence, the system of equations (4.1296) is regular too (not completely regular). Thus, the Fourier
coefficients bn must be of order of magnitude 1/n2
(ρm and ρn tend to zero for m → ∞ and for
n → ∞).
As, by a change of variable of form (4.1295), where γn → ∞ and δm → ∞, together with
n → ∞ and m → ∞, we have obtained also a regular system with bounded free terms, on the
basis of a theorem of P. S. Bondarenko, we can affirm that the solution of both systems is unique.
It is also sufficient for system (4.1286) to have Fourier coefficients of order of magnitude 1/n2
.
Hence, we can consider any case of loading with a distributed load (we cannot make a calculation
for a concentrated load; in this case, this force must be replaced by an equivalent distributed load
on a certain interval).
To diminish the restriction imposed on the external loads, we will try a new change of variable,
namely,
An = γnAn, Bm = δmBm, m, n ∈ N. (4.1302)
System (4.1286) reads
i
ωµ2
mi Ai + κ(mλ)Bm = 0, κ (nλ )An +
l
ω µ2
lnBl = (−1)n
γnbn, (4.1303)
in this case. Taking into account
i
ωµ2
mi =
i
µmi (ωµmi ) ≤
i
µ2
mi
i
ω2
µ2
mi
= κ(mλ)[κ(mλ) − 2] < κ(mλ) (4.1304)
and
l
ω µ2
ln ≤ κ (nλ )[κ (nλ ) − 2] < κ (nλ ), (4.1305)
we may affirm that system (4.1302) is regular too, obtaining the same conclusions as above. But
the evaluations thus made are not strict; we try now to bring some improvements.
We notice that we may write
i
ωµ2
mi = 4(mλ)2
i
i
[i2 + (mλ)2]2
. (4.1306)
On the basis of some evaluations made by P. S. Bondarenko, who considers that the series above
approximates a certain definite integral, we can write
i
i
[i2 + (mλ)2]2
≤



f1 (mλ) , mλ ≤ 3,
f2(mλ), 3 < mλ ≤ 4,
f3(mλ), mλ > 4,
(4.1307)
where we denoted
f1(mλ) =
1
[1 + (mλ)2]2
+
1
[4 + (mλ)2]2
+
21 + (mλ)2
4[9 + (mλ)2]2
+
32 + (mλ)2
4[16 + (mλ)2]
,
f2(mλ) =
1
8(mλ)2
+
3 + (mλ)2
4[1 + (mλ)2]2
+
2
[4 + (mλ)2]2
+
32 + (mλ)2
4[16 + (mλ)2]
, (4.1308)
f3(mλ) =
1
4(mλ)2
+
3 + (mλ)2
4[1 + (mλ)2]2
+
8 + (mλ)2
4[4 + (mλ)2]
.
268 LINEAR ALGEBRA
It follows that
ρm = 1 −
i
ωµ2
mi
χ(mλ)
≥ 1 −
4(mλ)2
fk(mλ)
χ(mλ)
= 1 −
1
π
(mλ)2
fk(mλ)
coth mλ + πmλ
sinh2πmλ
, (4.1309)
for k = 1, 2, 3. The denominator of the last fraction is superunitary, being equal to the unity only
for m → ∞. Hence, we may write
ρm ≥ 1 −
4
π
(mλ)2
fk(mλ). (4.1310)
The maximum of the function (mλ)2fk(mλ) is smaller or at most equal to the sum of the
maximum values of each component fraction, for every variation interval of the argument mλ. We
may thus write
max[(mλ)2
f1(mλ)] ≤
1
4
+
1
9
+
5
24
+
369
2500
< 0.250 + 0.112 + 0.210 + 0.148 = 0.720,
max[(mλ)2
f2(mλ)] ≤
1
24
+
27
100
+
18
169
+
1
4
< 0.042 + 0.270 + 0.108 + 0.250 = 0.670,
max[(mλ)2
f3(mλ)] ≤
1
16
+
76
289
+
1
4
< 0.065 + 0.265 + 0.250 = 0.580. (4.1311)
Thus,
ρm > 1 −
4
π
0.720 > 1 − 0.920 = 0.080 > 0, (4.1312)
for any m (for m → ∞ too).
Analogically, we may show that
ρn = 1 −
l
ω µ2
ln
κ (nλ )
≥ 1 −
4(nλ )2
fk(nλ )
κ (nλ )
> 0.080 > 0, (4.1313)
for any n (for n → ∞ too).
Hence, the infinite system (4.1302) is completely regular. Its free terms, that is, the Fourier
coefficients bn must be all bounded; but we cannot consider loadings with concentrated moments
(in this case, the Fourier coefficients bn must be of the order of magnitude of n, so that they cannot
all be bounded).
The linear system of algebraic equations may now be solved on sections (the first n equations
with the first n unknowns), obtaining a result as accurate as we choose.
Let us now show that, from the infinite system of linear algebraic equations, we may obtain
An =
1
κ (nλ )
(−1)n
bn −
l
µ2
lnBl , Bm = −
1
κ(mλ)
l
µ2
mi Ai. (4.1314)
FURTHER READING 269
Introducing An in the first group of equations (4.1286), we obtain the system
l
aml Bl = cm, (4.1315)
with
aml = −
i
µ2
mi µ2
li
κ (iλ )
, aml = alm, m = l,
amm = κ(mλ) −
i
µ4
mi
κ (iλ )
, cm = −
k
(−1)k
bk
µ2
mk
κ (kλ )
, (4.1316)
while, introducing Bm in the second group of equations (4.1286), we obtain the system
i
bni Bi = dn, (4.1317)
with
bni = −
l
µ2
lnµ2
li
κ(lλ)
, bni = bin, n = i,
bnn = κ (nλ ) −
l
µ4
ln
κ(lλ)
, dn = (−1)n
bn. (4.1318)
We obtain that both systems are symmetric with respect to the principal diagonal. We obtain thus
a system of equations for each sequence of unknown coefficients. These systems have, obviously,
the same properties as system (4.1286) and may be similarly studied.
FURTHER READING
Acton FS (1990). Numerical Methods that Work. 4th ed. Washington: Mathematical Association of
America.
Ackleh AS, Allen EJ, Hearfott RB, Seshaiyer P (2009). Classical and Modern Numerical Analysis:
Theory, Methods and Practice. Boca Raton: CRC Press.
Atkinson KE (1989). An Introduction to Numerical Analysis. 2nd ed. New York: John Wiley & Sons,
Inc.
Atkinson KE (1993). Elementary Numerical Analysis. 2nd ed. New York: John Wiley & Sons, Inc.
Atkinson K, Han W (2010). Theoretical Numerical Analysis: A Functional Analysis Framework. 3rd
ed. New York: Springer-Verlag.
Bakhvalov N (1976). M´ethodes Num´erique. Moscou: Editions Mir (in French).
Berbente C, Mitran S, Zancu S (1997). Metode Numerice. Bucures¸ti: Editura Tehnic˘a (in Romanian).
Bhatia R (1996). Matrix Analysis. New York: Springer-Verlag.
Burden RL, Faires L (2009). Numerical Analysis. 9th ed. Boston: Brooks/Cole.
Butt R (2009). Introduction to Numerical Analysis Using MATLAB. Boston: Jones and Bartlett Pub-
lishers.
Chapra SC (1996). Applied Numerical Methods with MATLAB for Engineers and Scientists. Boston:
McGraw-Hill.
Cheney EW, Kincaid DR (1997). Numerical Mathematics and Computing. 6th ed. Belmont: Thomson.
270 LINEAR ALGEBRA
Dahlquist G, Bj¨orck ´˚A (1974). Numerical Methods. Englewood Cliffs: Prentice Hall.
Den Hartog JP (1961). Strength of Materials. New York: Dover Books on Engineering.
D´emidovitch B, Maron I (1973). ´El´ements de Calcul Num´erique. Moscou: Editions Mir (in French).
DiBenedetto E (2010). Classical Mechanics: Theory and Mathematical Modeling. New York: Springer-
Verlag.
Epperson JF (2007). An Introduction to Numerical Methods and Analysis. Hoboken: John Wiley &
Sons, Inc.
Fung YC, Tong P (2011). Classical and Computational Solid Mechanics. Singapore: World Scientific
Publishing.
Golub GH, van Loan CF (1996). Matrix Computations. 3rd ed. Baltimore: John Hopkins University
Press.
Greenbaum A, Chartier TP (2012). Numerical Methods: Design, Analysis, and Computer Implemen-
tation of Algorithms. Princeton: Princeton University Press.
Hamming RW (1987). Numerical Methods for Scientists and Engineers. 2nd ed. New York: Dover
Publications.
Hamming RW (2012). Introduction to Applied Numerical Analysis. New York: Dover Publications.
Heinbockel JH (2006). Numerical Methods for Scientific Computing. Victoria: Trafford Publishing.
Hibbeler RC (2010). Mechanics of Materials. 8th ed. Englewood Cliffs: Prentice Hall.
Higham NJ (2002). Accuracy and Stability of Numerical Algorithms. 2nd ed. Philadelphia: SIAM.
Hildebrand FB (1987). Introduction to Numerical Analysis. 2nd ed. New York: Dover Publications.
Hoffman JD (1992). Numerical Methods for Engineers and Scientists. New York: McGraw-Hill.
Ionescu GM (2005). Algebr˘a Liniar˘a. Bucures¸ti: Editura Garamond (in Romanian).
Jazar RN (2008). Vehicle Dynamics: Theory and Applications. New York: Springer-Verlag.
Kharab A, Guenther RB (2011). An Introduction to Numerical Methods: A MATLAB Approach. 3rd
ed. Boca Raton: CRC Press.
Kelley CT (1987). Iterative Methods for Linear and Nonlinear Equations. Philadelphia: SIAM.
Kleppner D, Kolenkow RJ (2010). An Introduction to Mechanics. Cambridge: Cambridge University
Press.
Kress R (1996). Numerical Analysis. New York: Springer-Verlag.
Krˆılov AN (1957). Lect¸ii de Calcule prin Aproximat¸ii. Bucures¸ti: Editura Tehnic˘a (in Romanian).
Kunz KS (1957). Numerical Analysis. New York: McGraw-Hill.
Lange K (2010). Numerical Analysis for Statisticians. 2nd ed. New York: Springer-Verlag.
Lurie AI (2002). Analytical Mechanics. New York: Springer-Verlag.
Lurie AI (2005). Theory of Elasticity. New York: Springer-Verlag.
Mabie HH, Reinholtz CF (1987). Mechanisms and Dynamics of Machinery. 4th ed. New York: John
Wiley & Sons, Inc.
Marinescu G (1974). Analiza Numeric˘a. Bucures¸ti: Editura Academiei Romˆane (in Romanian).
Meriam JL, Kraige LG (2012). Engineering Mechanics: Dynamics. Hoboken: John Wiley & Sons, Inc.
Otto SR, Denier JP (2005). An Introduction to Programming and Numerical Methods in MATLAB.
London: Springer-Verlag.
Palm WJ III (2007). Mechanical Vibrations. Hoboken: John Wiley & Sons, Inc.
Pandrea N (2000). Elemente de Mecanica Solidului ˆın Coordonate Pl¨uckeriene. Bucures¸ti: Editura
Academiei Romˆane (in Romanian).
Pandrea N, Pˆarlac S, Popa D (2001). Modele pentru Studiul Vibrat¸iilor Automobilelor. Pites¸ti: Tiparg
(in Romanian).
Pandrea N, Popa D (2000). Mecanisme Teorie s¸i Aplicat¸ii CAD. Bucures¸ti: Editura Tehnic˘a
(in Romanian).
Pandrea N, St˘anescu ND (2002). Mecanic˘a. Bucures¸ti: Editura Didactic˘a s¸i Pedagogic˘a (in Romanian).
FURTHER READING 271
Postolache M (2006). Modelare Numeric˘a. Teorie s¸i Aplicat¸ii. Bucures¸ti: Editura Fair Partners (in
Romanian).
Press WH, Teukolski SA, Vetterling WT, Flannery BP (2007). Numerical Recipes: The Art of Scientific
Computing. 3rd ed. Cambridge: Cambridge University Press.
Quarteroni A, Sacco R, Saleri F (2010). Numerical Mathematics. 2nd ed. Berlin: Springer-Verlag.
Ralston A, Rabinowitz P (2001). A First Course in Numerical Analysis. 2nd ed. New York: Dover
Publications.
Reza F (1973). Spat¸ii Liniare. Bucures¸ti: Editura Didactic˘a s¸i Pedagogic˘a (in Romanian).
Ridgway Scott L (2011). Numerical Analysis. Princeton: Princeton University Press.
Salvadori MG, Baron ML (1962). Numerical Methods in Engineering. Englewood Cliffs: Prentice
Hall.
Sauer T (2011). Numerical Analysis. 2nd ed. London: Pearson.
Simionescu I, Dranga M, Moise V (1995). Metode Numerice ˆın Tehnic˘a. Aplicat¸ii ˆın FORTRAN.
Bucures¸ti: Editura Tehnic˘a (in Romanian).
Sinha AK (2010). Vibration of Mechanical Systems. Cambridge: Cambridge University Press.
St˘anescu ND (2007). Metode Numerice. Bucures¸ti: Editura Didactic˘a s¸i Pedagogic˘a (in Romanian).
Stoer J, Bulirsh R (2010). Introduction to Numerical Analysis. 3rd ed. New York: Springer-Verlag.
S¨uli E, Mayers D (2003). An Introduction to Numerical Analysis. Cambridge: Cambridge University
Press.
Trefethen LN, Bau D III (1997). Numerical Linear Algebra. Philadelphia: SIAM.
Udris¸te C, Iftode V, Postolache M (1996). Metode Numerice de Calcul. Algoritmi s¸i Programe Turbo
Pascal. Bucures¸ti: Editura Tehnic˘a (in Romanian).
Voi´evodine V (1980). Principes Num´eriques d’Alg´ebre Lin´eare. Moscou: Editions Mir (in French).
Wilkinson JH (1988). The Algebraic Eigenvalue Problem. Oxford: Oxford University Press.
5
SOLUTION OF SYSTEMS OF NONLINEAR
EQUATIONS
5.1 THE ITERATION METHOD (JACOBI)
Let us consider the equation1
F(x) = 0, (5.1)
where F : D ⊂ Rn
→ Rn
, x ∈ Rn
.
In components, we have
f1(x1, x2, . . . , xn) = 0, f2(x1, x2, . . . , xn) = 0, . . . , fn(x1, x2, . . . , xn). (5.2)
Let us now write equation (5.1) in the form
x = G(x), (5.3)
where G : D ⊂ Rn
→ Rn
or, in components,
x1 = g1(x1, . . . , xn), x2 = g2(x1, . . . , xn), . . . , xn = gn(x1, . . . , xn). (5.4)
We observe that, if G is a contraction, then the sequence of successive iterations
x(0)
∈ D arbitrary, x(1)
= G(x(0)
), x(2)
= G(x(1)
), . . . , x(n+1)
= G(x(n)
), . . . , n ∈ N∗
(5.5)
where we assume that x(i) ∈ D for any i ∈ N∗ is convergent, as proved by Banach’s fixed-point
theorem, because Rn
is a Banach space with the usual Euclidean norm. The limit of this sequence
is
lim
n→∞
x(n)
= x (5.6)
1
The method is a generalization in the case of nonlinear systems for the Jacobi method in the case of linear systems
of equations.
Numerical Analysis with Applications in Mechanics and Engineering, First Edition.
Petre Teodorescu, Nicolae-Doru St˘anescu, and Nicolae Pandrea.
 2013 The Institute of Electrical and Electronics Engineers, Inc. Published 2013 by John Wiley & Sons, Inc.
273
274 SOLUTION OF SYSTEMS OF NONLINEAR EQUATIONS
and satisfies the relation
x = G(x). (5.7)
Observation 5.1 If G is a contraction, then all the functions gi(x), i = 1, n, are contractions.
Indeed, if G is a contraction, then there exists q ∈ R, 0 < q < 1, so that
G (x) − G(y) k ≤ q x − y k, (5.8)
for any x and y of D, k being the Euclidean norm on Rn. Relation (5.8) may also be written in
the form
n
i=1
(gi(x) − gi(y))2
≤ q
n
i=1
(xi − yi)2
. (5.9)
On the other hand,
n
i=1
(gi(x) − gi(y))2
≥ (gj (x) − gj (y))2
(5.10)
for any j = 1, n and relation (5.9) leads to
|gj (x) − gj (y)| ≤ q x − y , (5.11)
that is, gj : D ⊂ Rn
→ R is a contraction.
Observation 5.2 Let us suppose that gi : D ⊂ Rn
→ R is a contraction for any i = 1, n; it does
not mean that G : D ⊂ Rn
→ Rn
is also a contraction. Indeed, let us suppose that n = 2, so that
gi(x) = gi(x1, x2) = λxi, i = 1, 2, 0 < λ < 1. (5.12)
We have
|gi(x) − gi(y)| = λ|x1 − y1| = λ (x1 − y1)2
≤ λ (x1 − y1)2 + (x2 − y2)2 = λ x − y ,
(5.13)
so that gi, i = 1, 2, are contractions.
On the other hand,
G (x) − G(y) = (g1(x) − g1(y))2 + (g2(x) − g2(y))2
= λ2(x1 − y1)2 + λ2(x1 − y1)2 = λ
√
2|x1 − y1|.
(5.14)
Let us now choose λ > 1/
√
2 and x and y so that
x = x1 a
T
, y = y1 a
T
. (5.15)
It follows that
x − y = (x1 − y1)2 = |x1 − y1|, (5.16)
NEWTON’S METHOD 275
hence the condition G (x) − G(y) ≤ q x − y , 0 < q < 1, leads to
q|x1 − y1| ≥ λ
√
2|x1 − y1| > |x1 − y1|, (5.17)
which is absurd.
Observation 5.3 Let us consider the Jacobian of G,
J =











∂g1
∂x1
∂g1
∂x2
· · ·
∂g1
∂xn
∂g2
∂x1
∂g2
∂x2
· · ·
∂g2
∂xn
· · · · · · · · · · · ·
∂gn
∂x1
∂gn
∂x2
· · ·
∂gn
∂xn











, (5.18)
and one of the norms ∞ or 1. Proceeding as in the one-dimensional case, it follows that if
J ∞ = max
i=1,n
n
j=1
∂gi
∂xj
< 1 on D (5.19)
or if
J 1 = max
j=1,n
n
i=1
∂gj
∂xi
< 1 on D, (5.20)
respectively, then the function G is a contraction and the sequence of successive iterations is
convergent.
5.2 NEWTON’S METHOD
Let the equation2
be
f(x) = 0, (5.21)
where f : D ⊂ Rn
→ Rn
, and let us denote by x its solution.
We suppose that
f = f1 f2 · · · fn
T
, (5.22)
the functions fi, i = 1, n, being of class C1
on D. We also suppose that the determinant of Jacobi’s
matrix does not vanish at x,
det J =
∂f1
∂x1
∂f1
∂x2
· · ·
∂f1
∂xn
∂f2
∂x1
∂f2
∂x2
· · ·
∂f2
∂xn
· · · · · · · · · · · ·
∂fn
∂x1
∂fn
∂x2
· · ·
∂fn
∂xn x=x
= 0. (5.23)
2This method is the generalization of the Newton method presented in Chapter 2.
276 SOLUTION OF SYSTEMS OF NONLINEAR EQUATIONS
It follows that there exists a neighborhood V of x so that det J = 0 on V.
Let us consider an arbitrary point x ∈ V. There exists J−1
(x) under these conditions, and we
may define a recursive sequence, so that
x(0)
∈ V arbitrary, x(k)
= x(k−1)
− J−1
(x(k−1)
)f(x), k ∈ N∗
, (5.24)
with the condition x(i) ∈ V, i ∈ N.
Theorem 5.1 Let f : D ⊂ Rn → Rn and equation (5.21) with the solution x. Let us suppose that
det J(x) = 0 too. If there exists the real constants α, β, and γ so that
J−1
x(0)
≤ α, (5.25)
x(1)
− x(0)
≤ γ, (5.26)
n
i=1
n
j=1
∂2
fi
∂xi∂yj
< β; i = 1, n; j = 1, n, (5.27)
2nαβγ < 1, (5.28)
then the recurrent sequence defined by relation (5.24) is convergent to the solution x of equation
(5.21).
Demonstration. It is analogous to that of the Theorem 2.5.
As stopping conditions for the iterative process we use
x(k)
− x(k−1)
< ε, (5.29)
f x(k)
< ε, (5.30)
where is one of the canonical norms of the matrix. Sometimes, we use both conditions (5.29)
and (5.30) together. A variant of condition (5.29) is given by
x(k)
− x(k−1)
< ε, if x(k)
≤ 1, (5.31)
x(k) − x(k−1)
x(k)
< ε, if x(k)
> 1. (5.32)
5.3 THE MODIFIED NEWTON METHOD
If the matrix J−1 is continuous on a neighborhood of the solution x and if the start vector x(0) is
sufficiently close to x, that is, it fulfills the conditions of Theorem 5.1, then we may replace the
sequence of iterations
x(k+1)
= x(k)
− J−1
(x(k)
)f(x(k)
) (5.33)
by the sequence
x(k+1)
= x(k)
− J−1
(x(0)
)f(x(k)
), (5.34)
obtaining thus a variant of Newton’s method3
; this variant has the advantage in that the calculation
of the inverse J−1
at each iteration step is no more necessary.
3It is the generalization of the modified Newton method discussed in Chapter 2.
THE GRADIENT METHOD 277
5.4 THE NEWTON–RAPHSON METHOD
Let us consider the system of nonlinear equations4
f(x) = 0 (5.35)
for which an approximation x(0)
of the solution x is known.
Let us now determine the variation
δ(0)
= δ1 δ2 · · · δn
T
, (5.36)
so that x(0)
+ δ(0)
be a solution of equation (5.35). Expanding the components fi, i = 1, n, of the
vector function f into a Taylor series around x(0), we have
fi(x0
) + δ(0)
1
∂fi
∂x1 x=x(0)
+ δ(0)
2
∂fi
∂x2 x=x(0)
+ · · · + δ(0)
n
∂fi
∂xn x=x(0)
+ · · · + = 0, i = 1, n. (5.37)
We neglect the terms of higher order in relation (5.37), obtaining thus a linear system of n equations
with n unknowns
δ(0)
1
∂f1
∂x1 x=x(0)
+ δ(0)
2
∂f1
∂x2 x=x(0)
+ · · · + δ(0)
n
∂f1
∂xn x=x(0)
= −f1(x0
), . . . ,
δ(0)
1
∂fn
∂x1 x=x(0)
+ δ(0)
2
∂fn
∂x2 x=x(0)
+ · · · + δ(0)
n
∂fn
∂xn x=x(0)
= −fn(x0
).
(5.38)
By solving this system, we obtain the values δ(0)
i , i = 1, n. The new approximation of the solution is
x(1)
= x(0)
+ δ(0)
(5.39)
and the procedure continues, obtaining successively δ(1), x(2), δ(2), x(3), and so on.
5.5 THE GRADIENT METHOD
Let the equation be
f(x) = 0, (5.40)
where f : Rn → Rn is at least of class C1 on a domain D ⊂ Rn, while x = x1 x2 · · · xn
T
.
Equation (5.40) may also be written in the form of a system with n unknowns
f1(x1, . . . , xn) = 0, f2(x1, . . . , xn) = 0, . . . , fn(x1, . . . , xn) = 0. (5.41)
Let us consider the function
U(x) =
n
i=1
f 2
i (x1, x2, . . . , xn). (5.42)
4It is easy to prove that the Newton method is equivalent to the Newton–Raphson method; moreover, they lead
to the same results.
278 SOLUTION OF SYSTEMS OF NONLINEAR EQUATIONS
We observe that the solution x of equation (5.40) is solution of the equation
U(x) = 0 (5.43)
too and reciprocally. We thus reduce the problem of solving equation (5.40) to an equivalent problem
of determination of the absolute minimum of the function U(x).
Let us denote by x(0)
the first approximation of the solution of equation (5.40), or the first
approximation of the absolute minimum of function (5.42). We will draw through x(0)
the level
hypersurface of the function U(x) = U(x(0)) (Fig. 5.1). We will go along the normal to this hyper-
surface at the point P0 until it pierces another hypersurface U(x) = U(x(1)) where it meets the point
P1 of coordinate x(1)
with U(x(1)
) < U(x(0)
). Starting now from the point P1 along the normal to
the hypersurface U(x) = U(x(1)
), we obtain the point P2 corresponding to the intersection of the
normal with the hypersphere U(x) = U(x(2)
), where U(x(2)
) < U(x(1)
). Let this point of coordi-
nate be x(2)
. The procedure continues, obtaining thus the sequence of points P1, P2, P3, . . . , Pn for
which we have the sequence of relations
U(x(1)
) > U(x(2)
) > U(x(3)
) > · · · > U(x(n)
) > · · · (5.44)
it follows that the sequence of points P1, P2, . . . , Pn, . . . approaches the point P , which realizes
the minimum value of the function U(x).
The triangle OP0P1 leads to
x(1)
= x(0)
− λ0∇U(x(0)
), (5.45)
where ∇U means the gradient of the function U
∇U(x) =
∂U
∂x1
∂U
∂x2
· · ·
∂U
∂xn
T
. (5.46)
Let the function now be
φ(λ0) = U(x(0)
− λ0∇U(x(0)
)). (5.47)
We must search that value of the parameter λ0 for which the function φ(λ0) will be minimum, from
which it follows that
∂φ
∂λ0
= 0 (5.48)
P
O
P0
P2
P1
P3
U(x(2)
)
U(x(3)
)
U(x(1)
)
U(x(0)
)
x(3)
x(1)
x(0)
x(2)
Figure 5.1 The gradient method.
THE GRADIENT METHOD 279
or
∂
∂λ0
U(x(0)
− λ0∇U(x(0)
)) = 0, (5.49)
λ0 being the smallest positive solution of equation (5.49).
On the other hand, we have
φ(λ0) =
n
i=1
f 2
i (x(0)
− λ0∇U(x(0)
)) = 0. (5.50)
Expanding the functions fi into a Taylor series, supposing that λ0 1 and neglecting the nonlinear
terms in λ0, we obtain the relation
φ(λ0) =
n
i=1
f 2
i

x(0)
− λ0
n
j=1
∂fi x(0)
∂xj
∂U(x(0)
)
∂xj

 . (5.51)
Condition (5.48) of minimum may be now written in the form
2
n
i=1




fi x(0)
− λ0
n
j=1
∂fi x(0)
∂xj
∂U(x(0)
)
∂xj


n
j=1
∂fi x(0)
∂xj
∂U(x(0)
)
∂xj



= 0, (5.52)
from which it follows that
λ0 =
n
i=1

fi x(0)
n
j=1
∂fi x(0)
∂xj
∂fi(x(0)
)
∂xj


n
i=1


n
j=1
∂fi x(0)
∂xj
∂fi(x(0)
)
∂xj


2
. (5.53)
From the definition of the function U(x) we have
∂U
∂xj
=
∂
∂xj
n
i=1
f 2
i (x) = 2
n
i=1
fi (x)
∂fi(x)
∂xj
, (5.54)
∇U(x) = 2
n
i=1
fi (x)
∂fi(x)
∂x1
· · ·
n
i=1
fi(x)
∂fi(x)
∂xn
T
= 2J(x)f(x), (5.55)
where we have denoted the Jacobian of the vector function f by J(x),
J(x) =






∂f1
∂x1
∂f1
∂x2
· · ·
∂f1
∂xn
· · · · · · · · · · · ·
∂fn
∂x1
∂fn
∂x2
· · ·
∂fn
∂xn






. (5.56)
We denote the scalar product by ·, ·
x, y = xT
y, (5.57)
280 SOLUTION OF SYSTEMS OF NONLINEAR EQUATIONS
where
x = x1 · · · xn
T
, y = y1 · · · yn
T
, (5.58)
so that relation (5.53) may be written in a more compact form
2λ0 =
f(x(0)
), J(x(0)
)JT
(x(0)
)f(x(0)
)
J(x(0))JT(x(0))f(x(0)), J(x(0))JT(x(0))f(x(0))
. (5.59)
Relation (5.45) now becomes
x(1)
= x(0)
− 2λ0JT
(x(0)
)f(x(0)
). (5.60)
Using the recurrence relation
x(k+1)
= x(k)
− 2λkJT
(x(k)
)f(x(k)
), (5.61)
we thus obtain the sequence of vectors x(1), . . . , x(k), . . . , where
2λk =
f(x(k)
), J(x(k)
)JT
(x(k)
)f(x(k)
)
J(x(k))JT(x(k))f(x(k)), J(x(k))JT(x(k))f(x(k))
. (5.62)
5.6 THE METHOD OF ENTIRE SERIES
Instead of solving the system of equations
fk(x1, x2, . . . , xn) = 0, k = 1, n, (5.63)
we will solve the system formed by the equations
Fk(x1, x2, . . . , xn, λ) = 0, k = 1, n, (5.64)
where Fk, k = 1, n, are analytic on a neighborhood of the solution x, while λ is a real parameter.
The functions Fk(x, λ) fulfill the condition that the solution of system (5.64) is known for λ = 0,
while for λ = 1 we have Fk(x, 1) = fk(x), k = 1, n. Moreover, Fk, k = 1, n, are analytic in λ.
Moreover, we also suppose that for 0 ≤ λ ≤ 1 system (5.64) admits an analytic solution x(λ),
while x = x(0) is an isolated solution of system (5.63).
Expanding into a Taylor series the solution xj (λ) around 0, we have
xj (λ) = xj (0) + λxj (0) +
λ2
2!
xj (0) + · · · , j = 1, n. (5.65)
Differentiating relation (5.64) with respect to λ, we obtain
n
j=1
∂Fk
∂xj
xj (λ) +
∂Fk
∂λ
= 0, k = 1, n. (5.66)
If we denote by x(0)
= x(0) the solution for λ = 0, then relation (5.66) leads to
n
j=1
∂Fk(x(0)
, 0)
∂xj
xj (0) +
∂Fk(x(0)
, 0)
∂λ
= 0, k = 1, n, (5.67)
NUMERICAL EXAMPLE 281
and if
det
∂Fk x(0)
, 0
∂xj
= 0, (5.68)
then from equation (5.67) we obtain the values xj (0), j = 1, n.
Differentiating once more expression (5.66) with respect to λ, we get
n
j=1
∂Fk
∂xj
xj (λ) +
n
j=1
n
l=1
∂2
Fk
∂xj ∂xl
xj (λ)xl(λ)
+ 2
n
j=1
∂2
Fk
∂xj ∂λ
xj (λ) +
∂2
Fk
∂λ2
= 0, k = 1, n.
(5.69)
Making now λ = 0, expression (5.69) becomes
n
j=1
∂Fk(x(0)
, 0)
∂xj
xj (0) +
n
j=1
n
l=1
∂2
Fk(x(0)
, 0)
∂xj ∂xl
xj (0)xl(0)
+ 2
n
j=1
∂2
Fk(x(0)
, 0)
∂xj ∂λ
xj (0) +
∂2
Fk(x(0)
, 0)
∂λ2
= 0, k = 1, n;
(5.70)
because the values xj (0), j = 1, n, are known, it follows that xj (0) are determined from equation
(5.70).
Obviously, the procedure of differentiation may continue now with relation (5.69), solving xj (0),
and so on.
The solution of system (5.63) is thus given by expression (5.65).
5.7 NUMERICAL EXAMPLE
Example 5.1 Let us consider the nonlinear system
50x1 + x2
1 + x2
2 + x3
2 = 52, 50x2 + x3
1 + x4
2 = 52, (5.71)
which has the obvious solution x1 = 1 and x2 = 1.
To determine the solution by Jacobi’s method, we write this system in the form
x1 = 1.04 − 0.02x2
1 − 0.02x3
2 = g1(x1, x2), x2 = 1.04 − 0.02x3
1 − 0.02x4
2 = g2(x1, x2). (5.72)
The Jacobi matrix is given by
J =




∂g1
∂x1
∂g1
∂x2
∂g2
∂x1
∂g2
∂x2



 =
−0.04x1 −0.06x2
2
−0.06x2
1 −0.08x3
2
(5.73)
and we observe that J < 1 for a neighborhood of the solution 1 1
T
.
282 SOLUTION OF SYSTEMS OF NONLINEAR EQUATIONS
Let use choose as vector of start
x(0)
= 1.05 0.92
T
. (5.74)
The relation of recurrence reads


x(n+1)
1
x(n+1)
2

 =



1.04 − 0.02 x(n)
1
2
− 0.02(x(n)
2 )3
1.04 − 0.02(x(n)
1 )3 − 0.02(x(n)
2 )4


 , (5.75)
the calculations being given in Table 5.1.
To apply Newton’s method, we write
F(x) =
f1 x1, x2
f2(x1, x2)
=
50x1 + x2
1 + x3
2 − 52
50x2 + x3
1 + x3
2 − 52
, (5.76)
so that the Jacobian is
J(x) =
50 + 2x1 3x2
2
3x2
1 50 + 4x3
2
, (5.77)
from which
J−1
(x) =
1
(50 + 2x1)(50 + 4x3
2 ) − 9x2
1 x2
2
50 + 4x3
2 −3x2
2
−3x2
1 50 + 2x1
. (5.78)
The recurrence formula reads
x(n+1)
= x(n)
− J−1
(x(n)
)F(x(n)
), (5.79)
the calculation being systematized in Table 5.2.
In the case of the modified Newton method, the recurrence relation reads
x(n+1)
= x(n)
− J−1
(x(0)
)F(x(n)
), (5.80)
where
J−1
(x(0)
) =
0.019252 −0.000920
−0.001199 0.018884
, (5.81)
The calculations are given in Table 5.3.
TABLE 5.1 Solution of Equation (5.71) by Jacobi’s Method
Step x(n)
1 x(n)
2
0 1.05 0.92
1 1.002376 1.002520
2 0.999753 0.999655
3 1.000031 1.000042
4 0.999996 0.999995
5 1.000000 1.000001
NUMERICAL EXAMPLE 283
TABLE 5.2 Solution of Equation (5.71) by Newton’s Method
Step x(n)
1 x(n)
2 J−1
(xn
) F(x(n)
)
0 1.05 0.92
0.019252 −0.000920
−0.001199 0.018884
2.381188
−4.125982
1 1.000361 1.000770
0.019252 −0.001075
−0.001072 0.018575
0.021084
0.042667
2 1.000001 1.000000
TABLE 5.3 Solution of Equation (5.71) by Newton’s
Modified Method
Step x(n)
1 x(n)
2 F(x(n)
)
0 1.05 0.92
2.381188
−4.125982
1 1.000361 1.000770
0.021084
0.042667
2 0.999994 0.999990
−0.000342
−0.000558
3 1.000000 1.000000
The problem is put to see if Newton’s method has been correctly applied, that is, if the conditions
of Theorem 5.1 are fulfilled. We thus calculate successively
J−1
x0 ∞
=
0.019252 −0.000920
−0.001199 0.018884 ∞
= 0.020172 = α, (5.82)
x(1)
− x(0)
∞
=
1.000361 −1.05
1.000770 −0.92 ∞
= 0.08077 = β, (5.83)
2
i=1
2
j=1
∂2
fi
∂xi∂xj
∞
=
2 0
0 6x2 ∞
+
6x1 0
0 12x2
2 ∞
. (5.84)
Choosing now a neighborhood of the point (1, 1), given by
x −
1
1 ∞
< 0.1, (5.85)
we deduce
2
i=1
2
j=1
∂2
fi
∂xi∂xj
∞
= |6x2| + |12x2
2 | < 6 × 1.1 + 12 × 1.12
= 21.12 = γ. (5.86)
It follows that the relation
2nαβγ = 2 × 2 × 0.020172 × 0.08077 × 21.12 = 0.1376 < 1; (5.87)
284 SOLUTION OF SYSTEMS OF NONLINEAR EQUATIONS
hence, Newton’s method has been correctly applied.
Let us now pass to the solving of system (5.71) by means of the Newton–Raphson method.
To do this, we successively calculate
∂f1
∂x1
= 50 + 2x1,
∂f1
∂x2
= 3x2
2 ,
∂f2
∂x1
= 3x2
1 ,
∂f2
∂x2
= 50 + 4x3
2 , (5.88)
∂f1
∂x1 x=x(0)
= 52.1,
∂f1
∂x2 x=x(0)
= 2.5392,
∂f2
∂x1 x=x(0)
= 3.3075,
∂f2
∂x2 x=x(0)
= 53.114752, f1(x(0)
) = 2.381188, f2(x(0)
) = −4.125982
(5.89)
and obtain the system
52.1δ(0)
1 + 2.5392δ(0)
2 = −2.381188, 3.3075δ(0)
1 + 53.114752δ(0)
2 = 4.125982, (5.90)
with the solution
δ(0)
1 = −0.049641, δ(0)
2 = 0.080772, (5.91)
so that
x(1)
= x(0)
+
δ(0)
1
δ(0)
2
=
1.000359
1.000772
. (5.92)
In the following step, we have
∂f1
∂x1 x=x(0)
= 52.000718,
∂f1
∂x2 x=x(0)
= 3.004634,
∂f2
∂x1 x=x(0)
= 3.002154,
∂f2
∂x2 x=x(0)
= 54.009271, f1(x(0)
) = 0.020986, f2(x(0)
) = 0.042769,
(5.93)
the system
52.000718δ(1)
1 + 3.004634δ(1)
2 = −0.020986, 3.002154δ(1)
1 + 54.009271δ(1)
2 = −0.042769,
(5.94)
and the solution
δ(1)
1 = −0.000359, δ(1)
2 = −0.000772. (5.95)
It follows that
x(2)
= x(1)
+
δ(1)
1
δ(1)
2
=
1.000000
1.000000
. (5.96)
We observe that the Newton and Newton–Raphson methods lead to the same solution (in the
limits of the calculation approximates). As a matter of fact, the two methods are equivalent.
Let us now pass to the solution of system (5.71) by means of the gradient method.
NUMERICAL EXAMPLE 285
We calculate successively
J(x) =
50 + 2x1 3x2
2
3x2
1 50 + 4x3
2
, (5.97)
JT
(x) =
50 + 2x1 3x2
1
3x2
2 50 + 4x3
2
, (5.98)
F(x) =
50x1 + x2
1 + x3
2 − 52
50x2 + x3
1 + x4
2 − 52
, (5.99)
JT
(x)F(x) =
50 + 2x1 (50x1 + x2
1 + x2
2 − 52) + 3x2
1 (50x2 + x3
1 + x4
2 − 52)
3x2
2 (50x1 + x2
1 + x3
2 − 52) + (50 + 4x3
2 )(50x2 + x3
1 + x4
2 − 52)
, (5.100)
J(x)JT
(x) =
50 + 2x1
2
+ 9x4
2 3x2
1 (50 + 2x1)
3x2
1 (50 + 2x1) + 3x2
2 (50 + 4x2)3 9x4
4 + (50 + 4x3
2 )2
, (5.101)
The calculations are contained in Table 5.4.
Let us consider the system
F(x, λ) =
50x1 + λ x2
1 + x3
2 − 52
50x2 + λ(x3
1 + x4
2 ) − 52
=
0
0
, (5.102)
where λ is a real parameter. For λ = 1 we obtain system (5.71), while for λ = 0, the solution of
system (5.102) becomes obvious
x(0) =
1.04
1.04
. (5.103)
We observe that the conditions asked by the method of entire series are fulfilled.
TABLE 5.4 The Solution of Equation (5.71) by the
Gradient Method
Step x(n)
2λn
0
1.05
0.92
0.0003957
1
1.0076065
1.0043272
0.0003189
2
1.0004987
0.9995063
0.0004063
3
1.0000230
1.0000285
0.0003117
4
1.0000002
1.0000002
0.0003126
286 SOLUTION OF SYSTEMS OF NONLINEAR EQUATIONS
We have
∂F1
∂x1
= 50 + 2λx1,
∂F1
∂x2
= 3λx2
2 ,
∂F2
∂x1
= 3λx2
1 ,
∂F2
∂x2
= 50 + 4λx3
2 , (5.104)
∂F1(x(0)
, 0)
∂x1
= 50,
∂F1(x(0)
, 0)
∂x2
= 0,
∂F2(x(0)
, 0)
∂x1
= 0,
∂F2(x(0)
, 0)
∂x2
= 50, (5.105)
∂F1
∂λ
= x2
1 + x3
2 ,
∂F2
∂λ
= x3
1 + x4
2 , (5.106)
∂F1(x(0)
, 0)
∂λ
= 1.042
+ 1.043
= 2.206464,
∂F2(x(0)
, 0)
∂λ
= 1.043
+ 1.044
= 2.29472256,
(5.107)
where
x(0)
= x(0) =
1.04
1.04
. (5.108)
It follows that the system
50x1(0) + 2.206464 = 0, 50x2(0) + 2.29472256 = 0, (5.109)
with the solution
x1(0) = −0.04412928, x2(0) = −0.045894451. (5.110)
On the other hand,
∂2
F1
∂x2
1
= 2λ,
∂2
F1
∂x1∂x2
= 0,
∂2
F2
∂x1∂x2
= 0,
∂2
F2
∂x2
2
= 12λx2
2 , (5.111)
∂2
F1
∂x1∂λ
= 2x1,
∂2
F1
∂x2∂λ
= 3x2
2 ,
∂2
F2
∂x1∂λ
= 3x2
1 ,
∂2
F2
∂x2∂λ
= 4x3
2 , (5.112)
∂2F1
∂λ2
= 0,
∂2F2
∂λ2
= 0, (5.113)
∂2
F1(x(0)
, 0)
∂x2
1
= 0,
∂2
F2(x(0)
, 0)
∂x2
2
= 0, (5.114)
∂2
F1(x(0)
, 0)
∂x1∂λ
= 2.08,
∂2
F1(x(0)
, 0)
∂x2∂λ
= 3.2448,
∂2F2(x(0), 0)
∂x1∂λ
= 3.2448,
∂2F2(x(0), 0)
∂x2∂λ
= 4.499456.
(5.115)
There follows the system
50x1 (0) − 0.481414433 = 0, 50x2 (0) − 0.699381501 = 0, (5.116)
with the solution
x1 (0) = 0.009628288, x2 (0) = 0.01398763. (5.117)
APPLICATIONS 287
We obtain the values
x1 ≈ x1(0) + x1(0) +
1
2
x1 (0) = 1.000684864,
x2 ≈ x2(0) + x2(0) +
1
2
x2 (0) = 1.001099364.
(5.118)
5.8 APPLICATIONS
Problem 5.1
Let us consider the plane articulated mechanism in Figure 5.2, where the dimensions OA = l1,
AB = l2, BC = l3, AD = l∗
2 , DE = l4, EF = l5, the angle α, the coordinates XC, YC, XF , YF and
the initial position φi = φ◦
i , i = 1, 5, are known.
Determine and represent graphically the functions φi(φ1), i = 2, 5.
Numerical application: l = 0.2 m, l1 = l, l2 = 3l, l3 = 3l, l∗
2 = 4l, l4 = 2l, l5 = 2l
√
3, α = 0◦
,
XC = 4l, YC = 0, XF = 5l, YF = 0, φ◦
1 = 0◦
, φ◦
2 = 60◦
, φ◦
3 = 60◦
, φ◦
4 = 0◦
, φ◦
5 = −90◦
, ω =
100 s−1, the imposed precision being ε = 0.0001, while the variation of the angle φ1 is φ1 = 1◦
.
Solution:
1. Theory
The vector equations
OA + AB + BC = OC, OA + AD + DE + EF = OF, (5.119)
projected on the axes OX, OY , the notations
f1 = l1 cos φ1 + l2 cos φ2 + l3 cos φ3 − XC,
f2 = l1 sin φ1 + l2 sin φ2 + l3 sin φ3 − YC,
f3 = l1 cos φ1 + l∗
2 cos(φ2 + α) + l4 cos φ4 + l5 cos φ5 − XF ,
f4 = l1 sin φ1 + l∗
2 sin(φ2 + α) + l4 sin φ4 + l5 sin φ5 − YF ,
(5.120)
being used, lead to the system of nonlinear equations
fi(φ2, φ3, φ4, φ5) = 0, i = 2, 5; (5.121)
we must determine the unknowns φ2, φ3, φ4, φ5 in function of the angle φ1.
O
ϕ2
ϕ1
α
ϕ3
ϕ4
ϕ5
(xA,yA)
C (xC,yC)
Y
X
F (xF,yF)
E
D
B
A
Figure 5.2 Problem 5.1.
288 SOLUTION OF SYSTEMS OF NONLINEAR EQUATIONS
Denoting by [J] the Jacobian
[J] =




−l2 sin φ2 −l3 sin φ3 0 0
l2 cos φ2 l3 cos φ3 0 0
−l∗
2 sin φ2 + α 0 −l4 sin φ4 −l5 sin φ5
l∗
2 cos(φ2 + α) 0 l4 cos φ4 l5 cos φ5



 (5.122)
and by {φ}, {f}, { φ} the column matrices
{φ} = φ2 φ3 φ4 φ5
T
, {f} = f2 f3 f4 f5
T
, { φ} = φ2 φ3 φ4 φ5
T
,
(5.123)
we obtain the equation
[J]{ φ} = −{f}, (5.124)
from which, by means of the known initial values φ◦
i , i = 1, 5, we determine { φ}; then {φ} →
{φ◦
} + { φ}, and the iteration process is continued until | φi| < ε, i = 2, 5, ε becomes the imposed
precision.
After determination of the angles φi, i = 2, 5, an increment φ1 = 1◦
of the angle φ1 is given;
the values known from the previous step are considered to be approximate values for φi, i = 2, 5,
and the iteration process is taken again.
2. Numerical calculation
The results of the simulation are presented in Table 5.5 and graphically are plotted in the diagrams
of Figure 5.3, Figure 5.4, Figure 5.5, and Figure 5.6.
Problem 5.2
We consider the rigid solid in Figure 5.7 suspended by six bars A0iAi, i = 1, 6, spherical articulated
and having lengths variable in time
li(t) = l0i + si(t), si(0) = 0, i = 1, 6. (5.125)
0 50 100 150 200 250 300 350 400
25
30
35
40
45
50
55
60
65
70
ϕ1
ϕ2
Figure 5.3 Time history φ2 = φ2(φ1).
APPLICATIONS 289
0 50 100 150 200 250 300 350 400
–70
–65
–60
–55
–50
–45
–40
–35
–30
–25
ϕ1
ϕ3
Figure 5.4 Time history φ3 = φ3(φ1).
0 50 100 150 200 250 300 350 400
−5
0
5
10
15
20
25
30
35
ϕ1
ϕ4
Figure 5.5 Time history φ4 = φ4(φ1).
In particular, this may be the mechanical model of a Stewart platform.
The position of the rigid solid with respect to a fixed frame of reference O0XYZ is defined by
the position of the frame of reference rigidly linked to the body Oxyz, by the coordinates XO, YO,
ZO of the point O and by the Bryan angles ψ, θ, φ, respectively.
Knowing the coordinates xi, yi, zi of the points Ai, i = 1, 6, in the system Oxyz, the coordinates
X0i, Y0i, Z0i of the points A0i, i = 1, 6, in the system O0XYZ, the functions si(t), i = 1, 6, the
initial position X◦
O, Y◦
O, Z◦
O, ψ◦
, θ◦
, φ◦
, the error ε and the step t, determine the functions XO(t),
YO(t), ZO(t), ψ(t), θ(t), φ(t) and represent them graphically.
290 SOLUTION OF SYSTEMS OF NONLINEAR EQUATIONS
0 50 100 150 200 250 300 350 400
−105
−100
−95
−90
−85
−80
−75
−70
−65
ϕ1
ϕ5
Figure 5.6 Time history φ5 = φ5(φ1).
Ai
A0i
O0
O (xO,yO,zO)
z
y
x
Z
X Y
Figure 5.7 Problem 5.2.
Numerical application (Fig. 5.8): l = 1 m; l0i = l, i = 1, 6; s1(t) = (l/100) sin πt; si(t) = 0,
i = 2, 6; the coordinates of the points A0i, Ai are given in Table 5.6.
We know X◦
O = Y◦
O = Z◦
O = 0 m, ψ◦
= θ◦
= φ◦
= 0 rad, ε = 10−6
, t = 0.05 s too.
Solution:
1. Theory
1.1. Notations
We denote
• Xi, Yi, Zi —the coordinates of the points Ai, i = 1, 6 in the system O0XYZ;
• {Ri}, {RO}, {ri}, i = 1, 6—column matrices defined by the relations
{Ri} = Xi Yi Zi
T
, {RO} = XO YO ZO
T
, {ri} = xi yi zi
T
;
(5.126)
APPLICATIONS 291
TABLE 5.5 Results of the Simulation
φ1[
◦
] φ2[
◦
] φ3[
◦
] φ4[
◦
] φ5[
◦
]
0.000000 60.000000 −60.000000 0.000000 −90.000000
10.000000 56.481055 −63.073225 −1.425165 −93.194507
20.000000 52.744084 −65.497920 −2.303616 −95.958155
30.000000 49.001803 −67.131160 −2.561647 −98.101199
40.000000 45.423080 −67.906509 −2.163048 −99.505742
50.000000 42.122571 −67.829804 −1.112080 −100.129729
60.000000 39.165924 −66.961696 0.551096 −99.994181
70.000000 36.582735 −65.396930 2.758832 −99.163908
80.000000 34.380507 −63.247016 5.424664 −97.729573
90.000000 32.556126 −60.628613 8.450173 −95.794641
100.000000 31.103960 −57.657128 11.729479 −93.467451
110.000000 30.020982 −54.444041 15.151752 −90.857173
120.000000 29.309571 −51.096361 18.602544 −88.072056
130.000000 28.978502 −47.716900 21.964860 −85.218490
140.000000 29.042278 −44.404335 25.120966 −82.399651
150.000000 29.518760 −41.252238 27.955835 −79.712771
160.000000 30.425042 −38.346622 30.362862 −77.244546
170.000000 31.771829 −35.762068 32.251753 −75.065015
180.000000 33.557310 −33.557310 33.557310 −73.221345
190.000000 35.762068 −31.771829 34.246675 −71.733836
200.000000 38.346622 −30.425042 34.322340 −70.596343
210.000000 41.252238 −29.518760 33.819449 −69.781847
220.000000 44.404335 −29.042278 32.798103 −69.251836
230.000000 47.716900 −28.978502 31.333094 −68.966865
240.000000 51.096361 −29.309571 29.503898 −68.895743
250.000000 54.444041 −30.020982 27.386860 −69.021894
260.000000 57.657128 −31.103960 25.050335 −69.346594
270.000000 60.628613 −32.556126 22.552622 −69.889466
280.000000 63.247016 −34.380507 19.942121 −70.686657
290.000000 65.396930 −36.582735 17.259152 −71.786823
300.000000 66.961696 −39.165924 14.539060 −73.244478
310.000000 67.829804 −42.122571 11.816457 −75.109848
320.000000 67.906509 −45.423080 9.130485 −77.414403
330.000000 67.131160 −49.001803 6.530785 −80.152359
340.000000 65.497920 −52.744084 4.083089 −83.261374
350.000000 63.073225 −56.481055 1.872281 −86.609758
360.000000 60.000000 −60.000000 0.000000 −90.000000
• [ψ], [θ], [φ]—rotation matrices
[ψ] =



1 0 0
0 cos ψ − sin ψ
0 sin ψ cos ψ


 , [θ] =



cos θ 0 sin θ
0 1 0
− sin θ 0 cos θ


 ,
[φ] =



cos φ − sin φ 0
sin φ cos φ 0
0 0 1


 ; (5.127)
292 SOLUTION OF SYSTEMS OF NONLINEAR EQUATIONS
• [Uψ], [Uθ], [Uφ]—matrices given by the relations
[Uψ] =


0 0 0
0 0 −1
0 1 0

 , [Uθ] =


0 0 1
0 0 0
−1 0 0

 , [Uφ] =


0 −1 0
1 0 0
0 0 0

 ;
(5.128)
• [A]—rotation matrix
[A] = [ψ][θ][φ]; (5.129)
• [Aψ], [Aθ], [Aφ]—partial derivatives of the rotation matrix, which are written in the
form
[Aψ] = [Uψ][A], [Aθ] = [A][φ]T
[Uθ][φ], [Aφ] = [A][Uφ]; (5.130)
• fi, i = 1, 6—functions of variables XO, YO, ZO, ψ, θ, φ, defined by the relations
{fi} = [{Ri}T
− {R0i}T
]{{Ri} − {R0i}} − (l0i + si)2
, i = 1, 6; (5.131)
• {f}—the column matrix
{f} = f1 f2 f3 f4 f5 f6
T
; (5.132)
• {q}, { q}—the column matrices
{q} = XO YO ZO ψ θ φ
T
,
{ q} = XO YO ZO ψ θ φ
T
;
(5.133)
• [Bi]—matrix given by the relation
[Bi] = Aψ {ri} [Aθ]{ri} [Aφ]{ri} , i = 1, 6. (5.134)
1.2. Computation relations
The column matrices {Ri}, {RO}, {ri} are dependent on the relation
{Ri} = {RO} + [A]{ri}, i = 1, 6. (5.135)
The conditions
(A0iAi)2
= (l01 + si)2
, i = 1, 6 (5.136)
are transcribed in the system of nonlinear equations
fi = 0, i = 1, 6, (5.137)
the solution of which leads to the equation
[J]{ q} = −{f}, (5.138)
APPLICATIONS 293
O
A1
A01
A02
A2
A4
A3
A6
A06A05
A5
A04
A03
X,x
Z,z
Y,y
l
l
l l
l/2l
Figure 5.8 Numerical application.
TABLE 5.6 Coordinates of the Points A0i, Ai, i = 1, 6.
i X0i Y0i Z0i xi yi zi
1 2l 0 0 l 0 0
2 2l 0 l/2 l 0 l/2
3 0 2l 0 0 l 0
4 l 2l 0 l l 0
5 0 0 3l/2 0 0 l/2
6 0 l 3l/2 0 l l/2
[J] being the Jacobian of the system, which, with the given notations, reads
[J] = 2






R1
T
− {R01}T [I] [B1]
[{R2}T
− {R02}T
] [I] [B2]
· · · · · ·
[{R6}T
− {R06}T
] [I] [B6]






. (5.139)
We calculate successively
• the values of the functions si;
• the matrices [ψ], [θ], [φ], [A], [Aψ], [Aθ], [Aφ];
• the matrices {Ri};
• the values of the functions fi, i = 1, 6, and the column matrix {f};
• the matrices [Bi], i = 1, 6;
• the Jacobian [J];
• the column matrix { q};
• the column matrix {q} that becomes {q} + { q};
294 SOLUTION OF SYSTEMS OF NONLINEAR EQUATIONS
• in a cyclic manner, until | qi| < ε, i = 1, 6;
• the parameter t becomes t + t, and the calculation is taken again, the approximate
values of the matrix {q} being those given at the previous step.
2. Numerical calculation
The motion is periodic with the period T = 2π/π = 2 s, while the results are transcribed in
Figure. 5.9, Figure 5.10, Figure 5.11, Figure 5.12, Figure 5.13, and Figure 5.14.
0 0.5 1 1.5 2 2.5 3 3.5 4
−0.01
−0.005
0
0.005
0.01
0.015
t (s)
xo(m)
Figure 5.9 Time history XO(t).
0 0.5 1 1.5 2 2.5 3 3.5 4
0
1
2
3
4
5
6
× 10−5
t (s)
YO(m)
Figure 5.10 Time history YO(t).
APPLICATIONS 295
0 0.5 1 1.5 2 2.5 3 3.5 4
0
0.2
0.4
0.6
0.8
1
1.2
× 10−4
t (s)
ZO(m)
Figure 5.11 Time history ZO(t).
0 0.5 1 1.5 2 2.5 3 3.5 4
−3
−2
−1
0
1
2
3
× 10−4
t (s)
ψ(°)
Figure 5.12 Time history ψ(t).
Problem 5.3
Let us consider the planetary gear in Figure 5.15, with an angular axial tripod coupling to the gear
box, and with an angular coupling to the wheel in the ball joint C.
The motion is transmitted from the tulip axle (body 1) by contacts between the ramps of the
tulip BiAi, i = 1, 3, symmetrical parallel to the rotation axis and to the arms of the tripod O2Ai,
i = 1, 3, axisymmetric and normal to the axle O2C disposed.
296 SOLUTION OF SYSTEMS OF NONLINEAR EQUATIONS
0 0.5 1 1.5 2 2.5 3 3.5 4
−1.5
−1
−0.5
0
0.5
1
1.5
t (s)
θ(°)
Figure 5.13 Time history θ(t).
0 0.5 1 1.5 2 2.5 3 3.5 4
0
0.002
0.004
0.006
0.008
0.01
0.012
t (s)
φ(°)
Figure 5.14 Time history φ(t).
On the rotation axis of the tulip, we consider the point O0, so chosen as to have O2C = O0C = l.
The fixed reference system O0x0y0z0 is so chosen that the O0z0-axis coincides with the rotation
axis; as well, we choose the mobile reference system rigidly linked to the tulip O0x1y1z1, so that the
O0z1-axis coincides with the O0z0-axis, while the O0x1-axis be parallel with O∗
C1 and intersects
the ramp B1A1.
APPLICATIONS 297
Gear
box
1
2
B1
B3
B2
A1
A3
A2
O2O∗
z0,z1O0
C
O0x0
O0x1 x2
x0 x1
z2
θ
α
θ
Figure 5.15 Problem 5.3.
We denote by θ the rotation angle of the tulip (the angle between the axes O0x0 and O0x1);
knowing the distances O∗
B1 = O∗
B2 = O∗
B3 = r, the angle α (the angle between the O0z0-axis
and the O2C-line), the length l and the coordinates XC, YC of the point C in the system O0x0y0z0,
determine
• the variation of the angle γ (the angle between O2C and the O0z0-axis), as a function of the
angle θ;
• the variation of the coordinates ξ, η, ζ of the point O2 in the reference system O0x1y1z1;
• the variation of the coordinates ξ0, η0, ζ0 of the point O2 in the reference system O0x0y0z0
as a function of the angle θ;
• the projections of the trajectory of the point O2 on each of the planes O0x1y1 and O0x0y0.
Numerical application: r = 0.04 m, l = 0.2 m, α = 30◦
, XC = 0 m, YC = −0.1 m.
Solution:
1. Theory
We choose the system of reference O2x2y2z2, so that the O2x2-axis coincides with the straight
line O2A1, while the O2z-axis coincides with the straight line O2C, and denoting by x1i, y1i, z1i,
x2i, y2i, z2i the coordinates of the points Ai, i = 1, 3, in each of the systems O0x1y1z1, O2x2y2z2,
we write the relations 

x1i
y1i
z1i

 =


ξ
η
ζ

 + [A21]


x2i
y2i
z2i

 , i = 1, 3, (5.140)
where [A21] is the rotation matrix of the system O2x2y2z2, with respect to the system O0x1y1z1
[A21] =


α1 α2 α3
β1 β2 β3
γ1 γ2 γ3

 . (5.141)
Taking into account the relations


x1i
y1i
z1i

 =


r cos δi
r sin δi
z1i

 ,


x2i
y2i
z2i

 =


µi cos δi
µi sin δi
0

 , (5.142)
298 SOLUTION OF SYSTEMS OF NONLINEAR EQUATIONS
where
δi =
2
3
(i − 1)π, µi = O2Ai, i = 1, 3, (5.143)
from equation (5.140) we obtain the relations
\[
r\cos\delta_i = \xi + \mu_i(\alpha_1\cos\delta_i + \alpha_2\sin\delta_i), \tag{5.144}
\]
\[
r\sin\delta_i = \eta + \mu_i(\beta_1\cos\delta_i + \beta_2\sin\delta_i), \tag{5.145}
\]
\[
z_{1i} = \zeta + \mu_i(\gamma_1\cos\delta_i + \gamma_2\sin\delta_i). \tag{5.146}
\]
By eliminating the parameter $\mu_i$ between equation (5.144) and equation (5.145), we obtain
\[
\xi(\beta_1\cos\delta_i + \beta_2\sin\delta_i) - \eta(\alpha_1\cos\delta_i + \alpha_2\sin\delta_i) = \frac{r}{2}\left[(\beta_1 - \alpha_2) + (\beta_1 + \alpha_2)\cos 2\delta_i + (\beta_2 - \alpha_1)\sin 2\delta_i\right], \quad i = \overline{1,3}, \tag{5.147}
\]
and taking into account the equalities
\[
\sum_{i=1}^{3}\sin\delta_i = \sum_{i=1}^{3}\cos\delta_i = \sum_{i=1}^{3}\sin 2\delta_i = \sum_{i=1}^{3}\cos 2\delta_i = 0, \tag{5.148}
\]
by summation of relation (5.147), we obtain the condition
\[
\alpha_2 = \beta_1; \tag{5.149}
\]
by adding and subtracting relation (5.147) for $i = 2, 3$, we obtain the system
\[
\xi\beta_1 - \eta\alpha_1 = r\beta_1, \quad \xi\beta_2 - \eta\alpha_2 = \frac{r}{2}(\alpha_1 - \beta_2), \tag{5.150}
\]
from which we obtain the unknowns
\[
\xi = \frac{r}{2\gamma_3}\left[-2\beta_1\alpha_1 + \alpha_1(\alpha_1 - \beta_2)\right], \quad \eta = \frac{r\beta_1}{2\gamma_3}(\alpha_1 - 3\beta_2). \tag{5.151}
\]
By means of Euler's angles $\psi, \gamma, \varphi$, condition (5.149) becomes $\psi = -\varphi$, and the rotation matrix takes the form
\[
[A_{21}] = \begin{pmatrix}
\cos^2\varphi + \sin^2\varphi\cos\gamma & -\sin\varphi\cos\varphi(1 - \cos\gamma) & -\sin\varphi\sin\gamma \\
-\sin\varphi\cos\varphi(1 - \cos\gamma) & \sin^2\varphi + \cos^2\varphi\cos\gamma & -\cos\varphi\sin\gamma \\
\sin\varphi\sin\gamma & \cos\varphi\sin\gamma & \cos\gamma
\end{pmatrix}, \tag{5.152}
\]
while the coordinates $\xi, \eta$ are given by
\[
\xi = \frac{r(1 - \cos\gamma)}{2\cos\gamma}(\cos 3\varphi\cos\varphi + \cos\gamma\sin 3\varphi\sin\varphi), \tag{5.153}
\]
\[
\eta = \frac{r(1 - \cos\gamma)}{2\cos\gamma}(-\cos 3\varphi\sin\varphi + \cos\gamma\sin 3\varphi\cos\varphi). \tag{5.154}
\]
Starting from the vector relation
\[
\overrightarrow{O_0O_2} + \overrightarrow{O_2C} = \overrightarrow{O_0C}, \tag{5.155}
\]
denoting by $[\theta]$ the rotation matrix from the system $O_0x_0y_0z_0$ to the system $O_0x_1y_1z_1$,
\[
[\theta] = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}, \tag{5.156}
\]
and denoting by $\beta$ the angle defined by the relations
\[
\cos\beta = \frac{X_C}{\sqrt{X_C^2 + Y_C^2}}, \quad \sin\beta = \frac{Y_C}{\sqrt{X_C^2 + Y_C^2}}, \tag{5.157}
\]
we obtain the matrix equation
\[
\begin{pmatrix} \xi \\ \eta \\ \zeta \end{pmatrix} + [A_{21}]\begin{pmatrix} 0 \\ 0 \\ l \end{pmatrix} = [\theta]^{\mathrm{T}}\begin{pmatrix} l\sin\alpha\cos\beta \\ l\sin\alpha\sin\beta \\ l\cos\alpha \end{pmatrix}, \tag{5.158}
\]
from which the scalar relations
\[
\frac{r(1 - \cos\gamma)}{2\cos\gamma}(\cos 3\varphi\cos\varphi + \cos\gamma\sin 3\varphi\sin\varphi) - l\sin\gamma\sin\varphi = l\sin\alpha\cos(\theta - \beta), \tag{5.159}
\]
\[
\frac{r(1 - \cos\gamma)}{2\cos\gamma}(-\cos 3\varphi\sin\varphi + \cos\gamma\sin 3\varphi\cos\varphi) - l\sin\gamma\cos\varphi = l\sin\alpha\sin(\theta - \beta), \tag{5.160}
\]
\[
\zeta + l\cos\gamma = l\cos\alpha \tag{5.161}
\]
are obtained.
Summing relations (5.159) and (5.160), multiplied by $\sin\varphi, \cos\varphi$ and $\cos\varphi, -\sin\varphi$, respectively, and using the notation
\[
\lambda = \frac{r}{2l}, \tag{5.162}
\]
we obtain the equations
\[
f_1(\varphi, \gamma) = \lambda(1 - \cos\gamma)\sin 3\varphi - \sin\gamma - \sin\alpha\sin(\varphi - \theta + \beta) = 0, \tag{5.163}
\]
\[
f_2(\varphi, \gamma) = \lambda(1 - \cos\gamma)\cos 3\varphi - \sin\alpha\cos\gamma\cos(\varphi - \theta + \beta) = 0, \tag{5.164}
\]
the solving of which leads to $\varphi(\theta)$, $\gamma(\theta)$.
2. Numerical calculation
For $\theta = 0$ we take the approximate values $\gamma = \alpha$, $\varphi = 0$; because $\beta = 3\pi/2$, from equation (5.163) and equation (5.164) we obtain, by the Newton–Raphson method, the results plotted in the diagrams in Figure 5.16 and Figure 5.17; then, from relations (5.153), (5.154), and (5.161), we obtain the results plotted in Figure 5.18, Figure 5.19, Figure 5.20, and Figure 5.21.
Figure 5.16 Variation $\varphi = \varphi(\theta)$ ($\varphi$ in ° versus $\theta$ in °).
Figure 5.17 Variation $\gamma = \gamma(\theta)$ ($\gamma$ in ° versus $\theta$ in °).
To calculate $\varphi$ and $\gamma$, we have taken into account that
\[
\begin{pmatrix} \Delta\varphi \\ \Delta\gamma \end{pmatrix} = -\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}^{-1}\begin{pmatrix} f_1 \\ f_2 \end{pmatrix}, \tag{5.165}
\]
Figure 5.18 Variation $\xi = \xi(\theta)$ ($\xi$ in m versus $\theta$ in °).
Figure 5.19 Variation $\eta = \eta(\theta)$ ($\eta$ in m versus $\theta$ in °).
where
\[
\begin{aligned}
A_{11} &= 3\lambda(1 - \cos\gamma)\cos 3\varphi - \sin\alpha\cos(\varphi - \theta + \beta), \\
A_{12} &= \lambda\sin\gamma\sin 3\varphi - \cos\gamma, \\
A_{21} &= -3\lambda(1 - \cos\gamma)\sin 3\varphi + \sin\alpha\cos\gamma\sin(\varphi - \theta + \beta), \\
A_{22} &= \lambda\sin\gamma\cos 3\varphi + \sin\alpha\sin\gamma\cos(\varphi - \theta + \beta).
\end{aligned} \tag{5.166}
\]
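A minimal sketch of this sweep over $\theta$, applying the update (5.165) with the Jacobian entries (5.166), might look as follows; it assumes NumPy, and all the helper names (f, jac, and so on) are ours, not the book's.

```python
import numpy as np

# Data of the numerical application and the notation (5.162).
r, l, alpha, beta = 0.04, 0.2, np.radians(30.0), 1.5 * np.pi
lam = r / (2.0 * l)

def f(phi, gam, theta):
    """Residuals f1, f2 of (5.163)-(5.164)."""
    u = phi - theta + beta
    return np.array([
        lam * (1 - np.cos(gam)) * np.sin(3 * phi) - np.sin(gam)
        - np.sin(alpha) * np.sin(u),
        lam * (1 - np.cos(gam)) * np.cos(3 * phi)
        - np.sin(alpha) * np.cos(gam) * np.cos(u),
    ])

def jac(phi, gam, theta):
    """Jacobian entries A11, A12, A21, A22 of (5.166)."""
    u = phi - theta + beta
    return np.array([
        [3 * lam * (1 - np.cos(gam)) * np.cos(3 * phi) - np.sin(alpha) * np.cos(u),
         lam * np.sin(gam) * np.sin(3 * phi) - np.cos(gam)],
        [-3 * lam * (1 - np.cos(gam)) * np.sin(3 * phi)
         + np.sin(alpha) * np.cos(gam) * np.sin(u),
         lam * np.sin(gam) * np.cos(3 * phi)
         + np.sin(alpha) * np.sin(gam) * np.cos(u)],
    ])

phi, gam = 0.0, alpha                       # starting values for theta = 0
for theta in np.radians(np.arange(0.0, 361.0, 1.0)):
    for _ in range(20):                     # Newton-Raphson update (5.165)
        d = np.linalg.solve(jac(phi, gam, theta), -f(phi, gam, theta))
        phi, gam = phi + d[0], gam + d[1]
        if np.max(np.abs(d)) < 1e-12:
            break
    # phi(theta), gam(theta) may now be stored; xi, eta, zeta then follow
    # from (5.153), (5.154), and (5.161).
```

Carrying the converged $(\varphi, \gamma)$ from one station of $\theta$ to the next keeps the starting point close to the solution, so the Newton iteration converges in a few steps.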
Figure 5.20 Variation $\zeta = \zeta(\theta)$ ($\zeta$ in m versus $\theta$ in °).
Figure 5.21 Variation $\eta = \eta(\xi)$ ($\eta$ in m versus $\xi$ in m).
For the diagrams $\xi_0(\theta)$, $\eta_0(\theta)$, $\zeta_0(\theta)$ we take into account the relations
\[
\xi_0 = \xi\cos\theta - \eta\sin\theta, \quad \eta_0 = \xi\sin\theta + \eta\cos\theta, \quad \zeta_0 = l(\cos\alpha - \cos\gamma); \tag{5.167}
\]
and the diagrams in Figure 5.22, Figure 5.23, Figure 5.24, Figure 5.25 are obtained.
Figure 5.22 Variation $\xi_0 = \xi_0(\theta)$ ($\xi_0$ in m versus $\theta$ in °).
Figure 5.23 Variation $\eta_0 = \eta_0(\theta)$ ($\eta_0$ in m versus $\theta$ in °).
Figure 5.24 Variation $\zeta_0 = \zeta_0(\theta)$ ($\zeta_0$ in m versus $\theta$ in °).
Figure 5.25 Variation $\eta_0 = \eta_0(\xi_0)$ ($\eta_0$ in m versus $\xi_0$ in m).
6 INTERPOLATION AND APPROXIMATION OF FUNCTIONS
6.1 LAGRANGE’S INTERPOLATION POLYNOMIAL
Definition 6.1 Let $[a, b]$, $-\infty < a < b < \infty$, be an interval of the real axis and $x_0, x_1, \ldots, x_n$, $n + 1$ points of the segment $[a, b]$, with
\[
a \le x_0 < x_1 < x_2 < \cdots < x_{n-1} < x_n \le b. \tag{6.1}
\]
The points $x_i$, $i = \overline{0,n}$, are called interpolation knots.
Let us consider a function $f : [a, b] \to \mathbb{R}$ for which we know the values
\[
y_i = f(x_i), \quad i = \overline{0,n}. \tag{6.2}
\]
We wish to construct a polynomial¹ function $L(x)$ whose values at the interpolation knots $x_i$, $i = \overline{0,n}$, coincide with the values of the function $f$ at the very same points, that is,
\[
y_i = L(x_i), \quad i = \overline{0,n}. \tag{6.3}
\]
Theorem 6.1 Let $f : [a, b] \to \mathbb{R}$, the interpolation knots $x_i$, $i = \overline{0,n}$, and the values of the function $f$ at the points $x_i$, that is, $y_i = f(x_i)$, $i = \overline{0,n}$. Under these conditions, there exists a unique polynomial $L_n(x)$, of degree $n$ at the most, whose values coincide with the values of the function $f$ at the interpolation knots.
1The polynomial was discovered by Edward Waring (circa 1736–1798) in 1779, then by Leonhard Euler
(1707–1783) in 1783, and published by Joseph Louis Lagrange (1736–1813) in 1795.
Demonstration. Let us consider a polynomial $\psi_i(x)$ with the property
\[
\psi_i(x_j) = \delta_{ij}, \tag{6.4}
\]
where $\delta_{ij}$ is Kronecker's symbol,
\[
\delta_{ij} = \begin{cases} 1 & \text{for } i = j, \\ 0 & \text{for } i \ne j. \end{cases} \tag{6.5}
\]
It follows that the polynomial $\psi_i(x)$ may be written in the form
\[
\psi_i(x) = C_i(x - x_0)(x - x_1)\cdots(x - x_{i-1})(x - x_{i+1})\cdots(x - x_n), \tag{6.6}
\]
where $C_i$ is given by the condition
\[
\psi_i(x_i) = C_i(x_i - x_0)\cdots(x_i - x_{i-1})(x_i - x_{i+1})\cdots(x_i - x_n) = 1. \tag{6.7}
\]
We obtain
\[
C_i = \frac{1}{(x_i - x_0)\cdots(x_i - x_{i-1})(x_i - x_{i+1})\cdots(x_i - x_n)}, \tag{6.8}
\]
hence
\[
\psi_i(x) = \frac{(x - x_0)\cdots(x - x_{i-1})(x - x_{i+1})\cdots(x - x_n)}{(x_i - x_0)\cdots(x_i - x_{i-1})(x_i - x_{i+1})\cdots(x_i - x_n)}. \tag{6.9}
\]
Let us construct the polynomial $L_n(x)$ in the form
\[
L_n(x) = \sum_{i=0}^{n}\psi_i(x)\,y_i. \tag{6.10}
\]
We have
\[
L_n(x_j) = \sum_{i=0}^{n}\psi_i(x_j)\,y_i = \psi_j(x_j)\,y_j = y_j. \tag{6.11}
\]
Because $\psi_i(x)$, $i = \overline{0,n}$, are polynomials of $n$th degree, it follows that $L_n(x)$ has a degree $n$ at the most. Formula (6.10) may also be written in the form
\[
L_n(x) = \sum_{i=0}^{n}\frac{(x - x_0)\cdots(x - x_{i-1})(x - x_{i+1})\cdots(x - x_n)}{(x_i - x_0)\cdots(x_i - x_{i-1})(x_i - x_{i+1})\cdots(x_i - x_n)}\,y_i. \tag{6.12}
\]
We will show that $L_n(x)$ is unique. Let us suppose that there exists another polynomial $\tilde{L}_n(x)$, of degree $n$ at the most, such that $\tilde{L}_n(x_i) = y_i$. Let us consider the polynomial
\[
D_n(x) = L_n(x) - \tilde{L}_n(x), \tag{6.13}
\]
which is of degree $n$ at the most (as a difference of two polynomials of degrees equal to $n$ at the most), and observe that
\[
D_n(x_i) = 0, \quad i = \overline{0,n}. \tag{6.14}
\]
It follows that the polynomial $D_n(x)$, of degree $n$ at the most, has at least $n + 1$ real roots, $x_0, x_1, \ldots, x_n$. Hence the polynomial $D_n(x)$ vanishes identically, so that
\[
L_n(x) = \tilde{L}_n(x), \tag{6.15}
\]
and $L_n(x)$ is unique.
Definition 6.2 The polynomial Ln(x) given by formula (6.12) is called Lagrange’s interpolation
polynomial.
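A minimal sketch of formula (6.12) in Python may clarify the construction; the function name lagrange_eval is ours, not the book's.

```python
def lagrange_eval(x_knots, y_knots, x):
    """Evaluate L_n(x) = sum_i psi_i(x) * y_i, with psi_i from (6.9)."""
    n = len(x_knots)
    total = 0.0
    for i in range(n):
        psi = 1.0
        for j in range(n):
            if j != i:
                psi *= (x - x_knots[j]) / (x_knots[i] - x_knots[j])
        total += psi * y_knots[i]
    return total

# Example: four knots reproduce f(x) = x**3 exactly.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [x**3 for x in xs]
print(lagrange_eval(xs, ys, 1.5))   # 3.375 = 1.5**3
```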
Observation 6.1 Let us denote by $P_{n+1}(x)$ the polynomial
\[
P_{n+1}(x) = \prod_{i=0}^{n}(x - x_i). \tag{6.16}
\]
Under these conditions, we have
\[
L_n(x) = P_{n+1}(x)\sum_{i=0}^{n}\frac{y_i}{(x - x_i)\,P'_{n+1}(x_i)}. \tag{6.17}
\]
Demonstration. We may successively write
\[
L_n(x) = \sum_{i=0}^{n}\frac{(x - x_0)\cdots(x - x_{i-1})(x - x_{i+1})\cdots(x - x_n)}{(x_i - x_0)\cdots(x_i - x_{i-1})(x_i - x_{i+1})\cdots(x_i - x_n)}\,\frac{x - x_i}{x - x_i}\,y_i = P_{n+1}(x)\sum_{i=0}^{n}\frac{y_i}{x - x_i}\,\frac{1}{(x_i - x_0)\cdots(x_i - x_{i-1})(x_i - x_{i+1})\cdots(x_i - x_n)}. \tag{6.18}
\]
On the other hand,
\[
P'_{n+1}(x) = \sum_{i=0}^{n}(x - x_0)\cdots(x - x_{i-1})(x - x_{i+1})\cdots(x - x_n), \tag{6.19}
\]
and it follows that
\[
P'_{n+1}(x_i) = (x_i - x_0)\cdots(x_i - x_{i-1})(x_i - x_{i+1})\cdots(x_i - x_n). \tag{6.20}
\]
Formula (6.18), in which we replace relation (6.20), leads to relation (6.17), which had to be proved.
Observation 6.2 The polynomial $L_n(x)$ may also be written in the form
\[
L_n(x) = a_nx^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0, \tag{6.21}
\]
and condition (6.3) implies a system of $n + 1$ linear equations with $n + 1$ unknowns $a_0, a_1, \ldots, a_n$:
\[
\begin{cases}
a_nx_0^n + a_{n-1}x_0^{n-1} + \cdots + a_1x_0 + a_0 = y_0, \\
\quad\vdots \\
a_nx_n^n + a_{n-1}x_n^{n-1} + \cdots + a_1x_n + a_0 = y_n.
\end{cases} \tag{6.22}
\]
The determinant of the system matrix,
\[
\Delta = \begin{vmatrix}
x_0^n & x_0^{n-1} & \cdots & x_0 & 1 \\
x_1^n & x_1^{n-1} & \cdots & x_1 & 1 \\
\cdots & \cdots & \cdots & \cdots & \cdots \\
x_n^n & x_n^{n-1} & \cdots & x_n & 1
\end{vmatrix}, \tag{6.23}
\]
is of the Vandermonde type, the value of which,
\[
\Delta = \prod_{0\le i<j\le n}(x_i - x_j), \tag{6.24}
\]
does not vanish because $x_i \ne x_j$ for $i \ne j$. Hence, it follows that system (6.22) has a unique solution, so that Lagrange's polynomial does exist and is unique.
Theorem 6.2 (Evaluation of the Error for Lagrange's Polynomial). Let $f : [a, b] \to \mathbb{R}$ be of class $C^{n+1}$ on $[a, b]$ and let us denote by $M$ the value
\[
M = \sup_{x\in[a,b]}|f^{(n+1)}(x)|. \tag{6.25}
\]
Let $x_0, x_1, \ldots, x_n$ be the $n + 1$ interpolation knots on $[a, b]$ and let $y_i = f(x_i)$, $i = \overline{0,n}$; Lagrange's polynomial $L_n(x)$ satisfies $L_n(x_i) = y_i$, $i = \overline{0,n}$. Under these conditions, we have
\[
|f(x) - L_n(x)| \le \frac{M}{(n+1)!}|P_{n+1}(x)|. \tag{6.26}
\]
Demonstration. Let us denote by $\theta : [a, b] \to \mathbb{R}$ an auxiliary function,
\[
\theta(x) = f(x) - L_n(x) - \lambda P_{n+1}(x), \quad \lambda \in \mathbb{R}. \tag{6.27}
\]
We observe that
\[
\theta(x_i) = f(x_i) - L_n(x_i) - \lambda P_{n+1}(x_i) = 0, \quad i = \overline{0,n}; \tag{6.28}
\]
hence, $\theta(x)$ has at least $n + 1$ roots in the interval $[a, b]$. Let us choose $\lambda \in \mathbb{R}$ so that $\theta(x)$ admits an $(n + 2)$th root on $[a, b]$ as well, and let us denote this root by $\bar{x}$. In this case,
\[
\theta(\bar{x}) = f(\bar{x}) - L_n(\bar{x}) - \lambda P_{n+1}(\bar{x}) = 0; \tag{6.29}
\]
hence
\[
\lambda = \frac{f(\bar{x}) - L_n(\bar{x})}{P_{n+1}(\bar{x})}. \tag{6.30}
\]
Let us arrange the $n + 2$ roots of $\theta$ in increasing order. The intervals $[x_0, x_1], [x_1, x_2], \ldots, [x_j, \bar{x}], [\bar{x}, x_{j+1}], \ldots, [x_{n-1}, x_n]$ are thus obtained. The function $\theta(x)$ vanishes at the ends of each of these intervals for the value $\lambda$ given by relation (6.30). Applying Rolle's theorem on each of these intervals, it follows that the function $\theta'(x)$ has at least $n + 1$ distinct roots. Analogously, it follows that $\theta''(x)$ has at least $n$ distinct roots, \ldots, and the function $\theta^{(n+1)}(x)$ has at least one root $\zeta$; hence,
\[
\theta^{(n+1)}(\zeta) = 0. \tag{6.31}
\]
Differentiating $n + 1$ times the function $\theta$ in relation (6.27) and taking into account that
\[
L_n^{(n+1)}(x) = 0, \quad P_{n+1}^{(n+1)}(x) = (n+1)!, \tag{6.32}
\]
because $L_n(x)$ is a polynomial of degree $n$ at the most while $P_{n+1}(x)$ is a polynomial of $(n+1)$th degree, we get
\[
\theta^{(n+1)}(x) = f^{(n+1)}(x) - \lambda(n+1)!, \tag{6.33}
\]
from which
\[
f^{(n+1)}(\zeta) - \lambda(n+1)! = 0; \tag{6.34}
\]
hence
\[
\lambda = \frac{f^{(n+1)}(\zeta)}{(n+1)!}. \tag{6.35}
\]
Equating relations (6.30) and (6.35), we get
\[
f(\bar{x}) - L_n(\bar{x}) = \frac{f^{(n+1)}(\zeta)}{(n+1)!}P_{n+1}(\bar{x}), \tag{6.36}
\]
and, because $\bar{x}$ is arbitrary, it follows that
\[
|f(x) - L_n(x)| = \frac{1}{(n+1)!}|f^{(n+1)}(\zeta)||P_{n+1}(x)|. \tag{6.37}
\]
Passing to the supremum over $\zeta$ on $[a, b]$, we obtain
\[
|f(x) - L_n(x)| \le \frac{1}{(n+1)!}\sup_{\zeta\in[a,b]}|f^{(n+1)}(\zeta)||P_{n+1}(x)| = \frac{M}{(n+1)!}|P_{n+1}(x)|, \tag{6.38}
\]
and the theorem is proved.
6.2 TAYLOR POLYNOMIALS
We recall a well-known theorem of analysis.
Theorem 6.3 (Taylor²). Let us consider $f : I \to \mathbb{R}$, where $I$ is an interval of the real axis, and let $\bar{x}$ and $x$ be two elements of $I$. If $f$ is of class $C^{n+1}$ on $I$, then the relation
\[
f(x) = f(\bar{x}) + \frac{(x - \bar{x})^1}{1!}f'(\bar{x}) + \cdots + \frac{(x - \bar{x})^n}{n!}f^{(n)}(\bar{x}) + \frac{(x - \bar{x})^{n+1}}{(n+1)!}f^{(n+1)}(\zeta) \tag{6.39}
\]
holds, where $\zeta$ is a point between $\bar{x}$ and $x$.
Observation 6.3 Relation (6.39) leads to an approximate formula for the calculation of $f(x)$, that is,
\[
f(x) \approx f(\bar{x}) + \frac{(x - \bar{x})^1}{1!}f'(\bar{x}) + \frac{(x - \bar{x})^2}{2!}f''(\bar{x}) + \cdots + \frac{(x - \bar{x})^n}{n!}f^{(n)}(\bar{x}) = \sum_{k=0}^{n}\frac{(x - \bar{x})^k}{k!}f^{(k)}(\bar{x}). \tag{6.40}
\]
2Brook Taylor (1685–1731) stated this theorem in 1712.
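For instance, a short sketch of the truncated sum (6.40) for $f = \exp$ around $\bar{x} = 0$, where every derivative equals $\exp$ itself (the helper name is ours):

```python
import math

def taylor_exp(x, x_bar, n):
    """Truncated Taylor sum (6.40) for f = exp."""
    return sum(math.exp(x_bar) * (x - x_bar)**k / math.factorial(k)
               for k in range(n + 1))

print(taylor_exp(1.0, 0.0, 8), math.exp(1.0))  # 2.7182787... vs 2.7182818...
```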
6.3 FINITE DIFFERENCES: GENERALIZED POWER

Let there be a function $f : \mathbb{R} \to \mathbb{R}$.
Definition 6.3 We call step a fixed value $h = \Delta x$ of the increment of the argument of the function $f$.
Definition 6.4 The expression
\[
\Delta y = \Delta f(x) = f(x + \Delta x) - f(x) \tag{6.41}
\]
is called the first difference of the function $f$. In general, the difference of order $n$ of the function $f$ is defined by
\[
\Delta^ny = \Delta(\Delta^{n-1}y), \tag{6.42}
\]
where $n \ge 2$.
Proposition 6.1 Let
\[
P_n(x) = a_0x^n + a_1x^{n-1} + \cdots + a_n \tag{6.43}
\]
be a polynomial of $n$th degree, where $a_i \in \mathbb{R}$, $i = \overline{0,n}$, and let $h$ be the step. Under these conditions,
(i) $\Delta^kP_n(x)$ is a polynomial of degree $n - k$, $1 \le k \le n$, the dominant coefficient of which is given by
\[
a_0^{(k)} = a_0n(n-1)\cdots(n-k+1)h^k; \tag{6.44}
\]
(ii) for $k = n$, we have
\[
a_0^{(n)} = a_0n!h^n; \tag{6.45}
\]
(iii) if $k > n$, then
\[
\Delta^kP_n(x) = 0. \tag{6.46}
\]
Demonstration
(i) Successively, we may write
\[
\Delta P_n(x) = P_n(x + h) - P_n(x) = a_0(x + h)^n + a_1(x + h)^{n-1} + \cdots + a_n - a_0x^n - a_1x^{n-1} - \cdots - a_n = C_n^1a_0x^{n-1}h + \cdots; \tag{6.47}
\]
hence, $\Delta P_n(x)$ is a polynomial of degree $n - 1$, the dominant coefficient of which is
\[
a_0^{(1)} = C_n^1a_0h = na_0h. \tag{6.48}
\]
Then
\[
\Delta^2P_n(x) = \Delta(\Delta P_n(x)) = na_0h(x + h)^{n-1} + \cdots - na_0hx^{n-1} - \cdots = na_0hC_{n-1}^1x^{n-2}h + \cdots = n(n-1)a_0h^2x^{n-2} + \cdots, \tag{6.49}
\]
and hence $\Delta^2P_n(x)$ is a polynomial of degree $n - 2$, the dominant coefficient of which is given by
\[
a_0^{(2)} = n(n-1)a_0h^2. \tag{6.50}
\]
We can thus show that $\Delta^kP_n(x)$ is a polynomial of degree $n - k$, $1 \le k \le n$, the dominant coefficient of which is given by (6.44).
(ii) It is a particular case of (i) for $k = n$. It follows that $\Delta^nP_n(x)$ is a polynomial of degree 0 (hence a constant), its value being given by
\[
\Delta^nP_n(x) = a_0n!h^n. \tag{6.51}
\]
(iii) Let $k = n + 1$. We have
\[
\Delta^kP_n(x) = \Delta(\Delta^nP_n(x)) = \Delta(a_0n!h^n) = 0, \tag{6.52}
\]
and, in general, the finite difference of a constant is zero; the proposition is thus proved.
Proposition 6.2 Finite differences have the following properties:
(i) If $f$ and $g$ are two functions and $a$ and $b$ two real constants, then
\[
\Delta(af + bg) = a\Delta f + b\Delta g. \tag{6.53}
\]
(ii) The relation
\[
\Delta^m(\Delta^ny) = \Delta^{m+n}y \tag{6.54}
\]
holds for any $m, n \in \mathbb{N}^*$ ($\mathbb{N}^* = \mathbb{N} - \{0\} = \{1, 2, 3, \ldots\}$).
(iii) If we write
\[
f(x + \Delta x) = f(x) + \Delta f(x) = (1 + \Delta)f(x), \tag{6.55}
\]
then the relation
\[
f(x + n\Delta x) = (1 + \Delta)^nf(x) = \sum_{k=0}^{n}C_n^k\Delta^kf(x) \tag{6.56}
\]
holds for any $n \in \mathbb{N}^*$.
Demonstration
(i) We have
\[
\Delta(af + bg) = af(x + \Delta x) + bg(x + \Delta x) - af(x) - bg(x) = a[f(x + \Delta x) - f(x)] + b[g(x + \Delta x) - g(x)] = a\Delta f + b\Delta g. \tag{6.57}
\]
(ii) Let $n \in \mathbb{N}^*$ be arbitrary but fixed. If $m = 1$, then we have
\[
\Delta^m(\Delta^ny) = \Delta(\Delta^ny) = \Delta^{1+n}y = \Delta^{m+n}y, \tag{6.58}
\]
corresponding to the definition of the finite difference. Let us suppose that relation (6.54) is valid for $n \in \mathbb{N}^*$ arbitrary and $m \in \mathbb{N}^*$ and let us write it for $m + 1$. We have
\[
\Delta^{m+1}(\Delta^ny) = \Delta[\Delta^m(\Delta^ny)] = \Delta(\Delta^{m+n}y) = \Delta^{m+1+n}y, \tag{6.59}
\]
and, conforming to the principle of mathematical induction, it follows that property (ii) holds for any $m, n \in \mathbb{N}^*$.
(iii) For $n = 1$ we have
\[
f(x + \Delta x) = f(x) + \Delta f(x) = (1 + \Delta)f(x), \tag{6.60}
\]
while for $n = 2$ we may write
\[
f(x + 2\Delta x) = (1 + \Delta)f(x + \Delta x) = (1 + \Delta)^2f(x). \tag{6.61}
\]
Let us suppose that relation (6.56) holds for $n$ and let us show that it holds for $n + 1$ too. We have
\[
f[x + (n + 1)\Delta x] = (1 + \Delta)f(x + n\Delta x) = (1 + \Delta)(1 + \Delta)^nf(x) = (1 + \Delta)^{n+1}f(x), \tag{6.62}
\]
and, conforming to the principle of mathematical induction, the property is valid for any $n \in \mathbb{N}^*$.
Corollary 6.1 We may write
\[
\Delta^nf(x) = f(x + n\Delta x) - C_n^1f[x + (n-1)\Delta x] + C_n^2f[x + (n-2)\Delta x] + \cdots + (-1)^nf(x) \tag{6.63}
\]
for any $n \in \mathbb{N}^*$.
Demonstration. Indeed,
\[
\Delta^nf(x) = [(1 + \Delta) - 1]^nf(x) = \sum_{k=0}^{n}C_n^k(-1)^k(1 + \Delta)^{n-k}f(x) = \sum_{k=0}^{n}C_n^k(-1)^kf[x + (n-k)\Delta x] = f(x + n\Delta x) - C_n^1f[x + (n-1)\Delta x] + C_n^2f[x + (n-2)\Delta x] + \cdots + (-1)^nf(x). \tag{6.64}
\]
Proposition 6.3 Let $I$ be an open interval of the real axis and $f : I \to \mathbb{R}$ of class $C^\infty$ on $I$. Let us denote the step by $h = \Delta x$. Under these conditions,
\[
\Delta^nf(x) = (\Delta x)^nf^{(n)}(x + n\xi\Delta x), \tag{6.65}
\]
where $0 < \xi < 1$.
Demonstration. We proceed by induction on $n$. For $n = 1$ we get
\[
\Delta f(x) = \Delta x\,f'(x + \xi\Delta x), \tag{6.66}
\]
which is just Lagrange's theorem of finite increments. Let us suppose that the statement holds for $n$ and let us show that it is valid for $n + 1$ too. We have
\[
\Delta^{n+1}f(x) = \Delta(\Delta^nf(x)) = \Delta^nf(x + \Delta x) - \Delta^nf(x) = (\Delta x)^n[f^{(n)}(x + \Delta x + n\xi_1\Delta x) - f^{(n)}(x + n\xi_1\Delta x)] = (\Delta x)^n(\Delta x)f^{(n+1)}(x + n\xi_1\Delta x + \lambda\Delta x), \tag{6.67}
\]
the last relation being the result of the application of Lagrange's theorem, with $\lambda \in (0, 1)$. Let us denote
\[
\xi = \frac{n\xi_1 + \lambda}{n + 1} \in (0, 1); \tag{6.68}
\]
hence
\[
\Delta^{n+1}f(x) = (\Delta x)^{n+1}f^{(n+1)}[x + (n+1)\xi\Delta x]. \tag{6.69}
\]
Corresponding to the principle of mathematical induction, the property is valid for any $n \in \mathbb{N}^*$.
Corollary 6.2 Under the above conditions, the relation
\[
f^{(n)}(x) = \lim_{\Delta x\to 0}\frac{\Delta^nf(x)}{(\Delta x)^n} \tag{6.70}
\]
holds.
Demonstration. We pass to the limit for $\Delta x \to 0$ in the relation
\[
f^{(n)}(x + n\xi\Delta x) = \frac{\Delta^nf(x)}{(\Delta x)^n}, \tag{6.71}
\]
with $0 < \xi < 1$, and obtain just the requested relation.
Observation 6.4
(i) Let there be a system of equidistant points $x_i$, $i = \overline{0,n}$, for which
\[
\Delta x_i = x_{i+1} - x_i = h = \text{const}, \quad i = \overline{0,n-1}, \tag{6.72}
\]
and let us denote by $y_i$, $i = \overline{0,n}$, the values of the function at the points $x_i$. We may write the relations
\[
\Delta y_i = y_{i+1} - y_i, \quad i = \overline{0,n-1}, \tag{6.73}
\]
\[
\Delta^2y_i = \Delta y_{i+1} - \Delta y_i = y_{i+2} - 2y_{i+1} + y_i, \quad i = \overline{0,n-2}, \tag{6.74}
\]
and, in general,
\[
\Delta^ky_i = \Delta^{k-1}y_{i+1} - \Delta^{k-1}y_i, \quad i = \overline{0,n-k}. \tag{6.75}
\]
On the other hand,
\[
y_{i+1} = y_i + \Delta y_i = (1 + \Delta)y_i, \quad i = \overline{0,n-1}, \tag{6.76}
\]
\[
y_{i+2} = y_{i+1} + \Delta y_{i+1} = (1 + \Delta)y_{i+1} = (1 + \Delta)^2y_i, \quad i = \overline{0,n-2}, \tag{6.77}
\]
and, in general,
\[
y_{i+k} = (1 + \Delta)^ky_i, \quad i = \overline{0,n-k}. \tag{6.78}
\]
Hence, it follows that
\[
y_{i+k} = \sum_{j=0}^{k}C_k^j\Delta^jy_i = y_i + C_k^1\Delta y_i + \cdots + \Delta^ky_i. \tag{6.79}
\]
(ii) We can calculate
\[
\Delta^ky_i = [(1 + \Delta) - 1]^ky_i = \sum_{j=0}^{k}C_k^j(-1)^j(1 + \Delta)^{k-j}y_i = (1 + \Delta)^ky_i - C_k^1(1 + \Delta)^{k-1}y_i + C_k^2(1 + \Delta)^{k-2}y_i + \cdots + (-1)^kC_k^ky_i, \tag{6.80}
\]
and, taking into account relation (6.78), we obtain
\[
\Delta^ky_i = y_{i+k} - C_k^1y_{i+k-1} + C_k^2y_{i+k-2} + \cdots + (-1)^ky_i. \tag{6.81}
\]
Usually, we put the finite differences as, for example, in Table 6.1.
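The construction of such a table is mechanical; a small Python sketch (the function name is ours) builds the triangular array whose column $k$ holds $\Delta^k y_i$, $i = \overline{0,n-k}$:

```python
def difference_table(y):
    """Triangular array of forward differences; table[k][i] = Delta^k y_i."""
    table = [list(y)]
    while len(table[-1]) > 1:
        prev = table[-1]
        table.append([prev[i + 1] - prev[i] for i in range(len(prev) - 1)])
    return table

# Example with y = x^2 on equidistant knots: the second differences are
# constant and the third differences vanish (Proposition 6.1).
for row in difference_table([0, 1, 4, 9, 16]):
    print(row)      # [0,1,4,9,16], [1,3,5,7], [2,2,2], [0,0], [0]
```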
Definition 6.5 We denote by generalized power of order $n$ the product
\[
x^{(n)} = x(x - h)(x - 2h)\cdots[x - (n-1)h]. \tag{6.82}
\]
Proposition 6.4 The relation
\[
\Delta^kx^{(n)} = n(n-1)\cdots[n - (k-1)]h^kx^{(n-k)} \tag{6.83}
\]
holds for $k \in \mathbb{N}^*$.
Demonstration. Let us consider firstly that $k = 1$. We have
\[
\Delta x^{(n)} = (x + h)^{(n)} - x^{(n)} = (x + h)x(x - h)\cdots[x - (n-2)h] - x(x - h)\cdots[x - (n-2)h][x - (n-1)h] = x(x - h)\cdots[x - (n-2)h]\,nh = nhx^{(n-1)}. \tag{6.84}
\]
It follows that
\[
\Delta^2x^{(n)} = nh\,\Delta x^{(n-1)} = nh[(x + h)^{(n-1)} - x^{(n-1)}] = nh\{(x + h)x\cdots[x - (n-3)h] - x(x - h)\cdots[x - (n-2)h]\} = nhx(x - h)\cdots[x - (n-3)h]\,h(n-1) = n(n-1)h^2x^{(n-2)}, \tag{6.85}
\]
for $k = 2$. Let us suppose that the relation holds for $k$ and let us show that it remains valid for $k + 1$. We have
\[
\Delta^kx^{(n)} = n(n-1)\cdots[n - (k-1)]h^kx^{(n-k)}, \tag{6.86}
\]
TABLE 6.1 Table of the Finite Differences

x      y      Δy      Δ²y      ...   Δⁿ⁻³y      Δⁿ⁻²y      Δⁿ⁻¹y      Δⁿy
x0     y0     Δy0     Δ²y0     ...   Δⁿ⁻³y0     Δⁿ⁻²y0     Δⁿ⁻¹y0     Δⁿy0
x1     y1     Δy1     Δ²y1     ...   Δⁿ⁻³y1     Δⁿ⁻²y1     Δⁿ⁻¹y1
x2     y2     Δy2     Δ²y2     ...   Δⁿ⁻³y2     Δⁿ⁻²y2
x3     y3     Δy3     Δ²y3     ...   Δⁿ⁻³y3
...    ...    ...     ...      ...
xn−2   yn−2   Δyn−2   Δ²yn−2
xn−1   yn−1   Δyn−1
xn     yn
\[
\Delta^{k+1}x^{(n)} = n(n-1)\cdots[n - (k-1)]h^k[(x + h)^{(n-k)} - x^{(n-k)}] = n(n-1)\cdots[n - (k-1)]h^k\{(x + h)x\cdots[x - (n-k-2)h] - x(x - h)\cdots[x - (n-k-1)h]\} = n(n-1)\cdots[n - (k-1)]h^kx(x - h)\cdots[x - (n-k-2)h](n-k)h = n(n-1)\cdots(n-k)h^{k+1}x^{(n-k-1)}, \tag{6.87}
\]
and, conforming to the principle of mathematical induction, property (6.83) is valid for any $k \in \mathbb{N}^*$.
Observation 6.5 If h = 0, then the generalized power coincides with the normal power.
6.4 NEWTON’S INTERPOLATION POLYNOMIALS
Proposition 6.5 Let us consider the function $f : [a, b] \to \mathbb{R}$ and an equidistant system of knots³
\[
x_i = x_0 + ih, \quad i = \overline{0,n}, \tag{6.88}
\]
where $h$ is the constant interpolation step. If $y_i = f(x_i)$, $i = \overline{0,n}$, then there exists a polynomial $P_n(x)$ of degree $n$ at the most so that $P_n(x_i) = y_i$ and
\[
P_n = y_0 + \frac{q}{1!}\Delta y_0 + \frac{q(q-1)}{2!}\Delta^2y_0 + \cdots + \frac{q(q-1)\cdots[q - (n-1)]}{n!}\Delta^ny_0, \tag{6.89}
\]
where
\[
q = \frac{x - x_0}{h}. \tag{6.90}
\]
Demonstration. Let us search for the polynomial $P_n$ in the form
\[
P_n = a_0 + a_1(x - x_0) + a_2(x - x_0)(x - x_1) + \cdots + a_n(x - x_0)\cdots(x - x_{n-1}) \tag{6.91}
\]
or, equivalently,
\[
P_n = a_0 + a_1(x - x_0)^{(1)} + a_2(x - x_0)^{(2)} + \cdots + a_n(x - x_0)^{(n)}. \tag{6.92}
\]
The condition $P_n(x_i) = y_i$ is equivalent to the condition
\[
\Delta^kP_n(x_0) = \Delta^ky_0, \quad k \ge 0. \tag{6.93}
\]
For $k = 0$, we obtain
\[
P_n(x_0) = y_0, \tag{6.94}
\]
from which
\[
a_0 = y_0. \tag{6.95}
\]
For $k = 1$ we have
\[
\Delta P_n(x) = 1!a_1h + 2a_2h(x - x_0)^{(1)} + \cdots + na_nh(x - x_0)^{(n-1)}, \tag{6.96}
\]
³Newton's interpolation polynomials were described by Isaac Newton in a letter to Smith in 1675; in a letter to Oldenburg in 1676; in Methodus Differentialis in 1711; in Regula Differentiarum, written in 1676 and discovered in the twentieth century; and in Philosophiae Naturalis Principia Mathematica, published in 1687.
obtaining
\[
\Delta P_n(x_0) = 1!a_1h, \tag{6.97}
\]
hence
\[
a_1 = \frac{\Delta y_0}{1!h}. \tag{6.98}
\]
For $k = 2$ we have
\[
\Delta^2P_n(x) = 1\times 2\times a_2h^2 + 2\times 3\times a_3h^2(x - x_0)^{(1)} + \cdots + n(n-1)a_nh^2(x - x_0)^{(n-2)}, \tag{6.99}
\]
and we get
\[
\Delta^2P_n(x_0) = 2!a_2h^2, \tag{6.100}
\]
from which
\[
a_2 = \frac{\Delta^2y_0}{2!h^2}. \tag{6.101}
\]
Step by step, we obtain
\[
a_k = \frac{\Delta^ky_0}{k!h^k}, \quad k = \overline{0,n}, \tag{6.102}
\]
and the polynomial $P_n(x)$ may now be written as
\[
P_n(x) = y_0 + \frac{\Delta y_0}{1!h}(x - x_0)^{(1)} + \frac{\Delta^2y_0}{2!h^2}(x - x_0)^{(2)} + \cdots + \frac{\Delta^ny_0}{n!h^n}(x - x_0)^{(n)}. \tag{6.103}
\]
We now verify that $P_n(x)$ is an interpolation polynomial, that is,
\[
P_n(x_k) = y_k, \quad k = \overline{0,n}. \tag{6.104}
\]
Observing that
\[
(x_k - x_0)^{(k+p)} = 0 \tag{6.105}
\]
for any $p \in \mathbb{N}^*$, it follows that $P_n(x_k)$ may be written in the form
\[
P_n(x_k) = y_0 + \frac{\Delta y_0}{1!h}(x_k - x_0)^{(1)} + \frac{\Delta^2y_0}{2!h^2}(x_k - x_0)^{(2)} + \cdots + \frac{\Delta^ky_0}{k!h^k}(x_k - x_0)^{(k)}. \tag{6.106}
\]
Then
\[
x_k - x_0 = kh, \quad x_k - x_1 = (k-1)h, \quad x_k - x_2 = (k-2)h, \quad \ldots, \quad x_k - x_{k-1} = h, \tag{6.107}
\]
and formula (6.106) is now written as
\[
P_n(x_k) = y_0 + \frac{\Delta y_0}{1!h}kh + \frac{\Delta^2y_0}{2!h^2}k(k-1)h^2 + \cdots + \frac{\Delta^ky_0}{k!h^k}h^kk(k-1)\cdots 1. \tag{6.108}
\]
Because
\[
\frac{k(k-1)\cdots[k - (p-1)]}{p!} = C_k^p, \tag{6.109}
\]
relation (6.108) becomes
\[
P_n(x_k) = y_0 + C_k^1\Delta y_0 + C_k^2\Delta^2y_0 + \cdots + C_k^k\Delta^ky_0. \tag{6.110}
\]
But we know that
\[
y_k = (1 + \Delta)^ky_0, \tag{6.111}
\]
and it follows that
\[
P_n(x_k) = (1 + \Delta)^ky_0 = y_k. \tag{6.112}
\]
We calculate
\[
\frac{(x - x_0)^{(k)}}{h^k} = \frac{(x - x_0)(x - x_1)\cdots(x - x_{k-1})}{h^k} = \frac{x - x_0}{h}\,\frac{x - x_1}{h}\cdots\frac{x - x_{k-1}}{h}. \tag{6.113}
\]
But
\[
\frac{x - x_0}{h} = q, \quad \frac{x - x_1}{h} = \frac{x - x_0 - h}{h} = q - 1, \quad \ldots, \quad \frac{x - x_{k-1}}{h} = \frac{x - x_0 - (k-1)h}{h} = q - (k-1), \tag{6.114}
\]
and, taking into account relation (6.103), we obtain relation (6.89); hence the proposition is proved.
Definition 6.6 The polynomial Pn(x) is called Newton’s polynomial or Newton’s forward
polynomial.
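A sketch of the evaluation of (6.89) on equidistant knots may be useful; the top edge of the finite difference table supplies the coefficients $\Delta^k y_0$. The function name is ours, not the book's.

```python
def newton_forward(x0, h, y, x):
    """Evaluate Newton's forward polynomial (6.89) at x."""
    # Top edge of the forward-difference table: diffs[k] = Delta^k y_0.
    diffs, row = [y[0]], list(y)
    while len(row) > 1:
        row = [row[i + 1] - row[i] for i in range(len(row) - 1)]
        diffs.append(row[0])
    q = (x - x0) / h                     # notation (6.90)
    value, coeff = diffs[0], 1.0
    for k in range(1, len(diffs)):
        coeff *= (q - (k - 1)) / k       # accumulates q(q-1)...(q-k+1)/k!
        value += coeff * diffs[k]
    return value

print(newton_forward(0.0, 1.0, [0, 1, 4, 9, 16], 2.5))   # 6.25 = 2.5**2
```

Newton's backward polynomial (6.118) below is evaluated in exactly the same way, using the bottom edge of the table and the factors $q(q+1)\cdots$ instead.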
Observation 6.6 Newton's formula (6.89) is inconvenient for $x$ close to the value $x_n$ ($x$ situated in the lower part of the finite difference table); therefore, another Newton polynomial, beginning with $x_n$, is necessary.
Observation 6.7 Because
\[
f^{(k)}(x) = \lim_{\Delta x\to 0}\frac{\Delta^kf(x)}{(\Delta x)^k}, \tag{6.115}
\]
corresponding to the demonstrations in Section 6.3, and considering that
\[
\lim_{h\to 0}\frac{\Delta^ky_0}{h^k} = y^{(k)}(x_0), \tag{6.116}
\]
it follows that
\[
f^{(k)}(x_0) = y^{(k)}(x_0), \tag{6.117}
\]
so that, for $h \to 0$, Newton's polynomial is transformed into the formula of expansion into a Taylor series.
Proposition 6.6 Let $f : [a, b] \to \mathbb{R}$ and the equidistant interpolation knots $x_i = x_0 + ih$, $i = \overline{0,n}$. Let us denote by $y_i$ the values of the function $f$ at the points $x_i$, $y_i = f(x_i)$, $i = \overline{0,n}$. Under these conditions, the polynomial of degree $n$ at the most, given by
\[
P_n(x) = y_n + \frac{q}{1!}\Delta y_{n-1} + \frac{q(q+1)}{2!}\Delta^2y_{n-2} + \cdots + \frac{q(q+1)\cdots(q+n-1)}{n!}\Delta^ny_0, \tag{6.118}
\]
is an interpolation polynomial, with
\[
q = \frac{x - x_n}{h}. \tag{6.119}
\]
Demonstration. We seek the polynomial $P_n(x)$ in the form
\[
P_n(x) = a_0 + a_1(x - x_n) + a_2(x - x_n)(x - x_{n-1}) + \cdots + a_n(x - x_n)(x - x_{n-1})\cdots(x - x_1). \tag{6.120}
\]
The condition $P_n(x_i) = y_i$ is equivalent to the condition
\[
\Delta^iP_n(x_{n-i}) = \Delta^iy_{n-i}. \tag{6.121}
\]
Relation (6.120) may also be written in the form
\[
P_n(x) = a_0 + a_1(x - x_n)^{(1)} + a_2(x - x_{n-1})^{(2)} + \cdots + a_n(x - x_1)^{(n)}. \tag{6.122}
\]
We obtain
\[
P_n(x_n) = a_0 \tag{6.123}
\]
for $i = 0$ in relation (6.121), from which
\[
a_0 = y_n. \tag{6.124}
\]
If we make $i = 1$ in the same relation, then it follows that
\[
\Delta P_n(x_{n-1}) = \Delta y_{n-1}, \tag{6.125}
\]
where
\[
\Delta P_n(x_{n-1}) = 1\times a_1\times h; \tag{6.126}
\]
hence,
\[
a_1 = \frac{\Delta y_{n-1}}{1!h}. \tag{6.127}
\]
On the other hand,
\[
\Delta^2P_n(x) = 1\times 2\times a_2h^2 + 2\times 3\,a_3h^2(x - x_{n-2})^{(1)} + \cdots + n(n-1)a_nh^2(x - x_1)^{(n-2)}; \tag{6.128}
\]
making $x = x_{n-2}$, we obtain
\[
\Delta^2P_n(x_{n-2}) = 2!a_2h^2. \tag{6.129}
\]
But
\[
\Delta^2P_n(x_{n-2}) = \Delta^2y_{n-2}, \tag{6.130}
\]
corresponding to relation (6.121) for $i = 2$, so that it follows that
\[
a_2 = \frac{\Delta^2y_{n-2}}{2!h^2}. \tag{6.131}
\]
Step by step, we obtain
\[
a_i = \frac{\Delta^iy_{n-i}}{i!h^i}, \quad i = \overline{0,n}. \tag{6.132}
\]
Newton's polynomial becomes
\[
P_n(x) = y_n + \frac{\Delta y_{n-1}}{1!h}(x - x_n)^{(1)} + \frac{\Delta^2y_{n-2}}{2!h^2}(x - x_{n-1})^{(2)} + \cdots + \frac{\Delta^ny_0}{n!h^n}(x - x_1)^{(n)}. \tag{6.133}
\]
$P_n(x)$ is an interpolation polynomial, that is,
\[
P_n(x_i) = y_i, \quad i = \overline{0,n}. \tag{6.134}
\]
Firstly, let us observe that
\[
(x - x_{n-k})^{(k+p)} = 0 \tag{6.135}
\]
for any $p \in \mathbb{N}^*$; hence,
\[
P_n(x_i) = y_n + \frac{\Delta y_{n-1}}{1!h}(x_i - x_n)^{(1)} + \frac{\Delta^2y_{n-2}}{2!h^2}(x_i - x_{n-1})^{(2)} + \cdots + \frac{\Delta^{n-i}y_i}{(n-i)!h^{n-i}}(x_i - x_{i+1})^{(n-i)}. \tag{6.136}
\]
Then
\[
x_i - x_n = (i - n)h, \quad x_i - x_{n-1} = (i - n + 1)h, \quad \ldots, \quad x_i - x_{i+1} = -h, \tag{6.137}
\]
and relation (6.136) reads
\[
P_n(x_i) = y_n + \frac{(i - n)h}{1!h}\Delta y_{n-1} + \frac{(i - n)(i - n + 1)h^2}{2!h^2}\Delta^2y_{n-2} + \cdots + \frac{(i - n)(i - n + 1)\cdots(-1)h^{n-i}}{(n-i)!h^{n-i}}\Delta^{n-i}y_i. \tag{6.138}
\]
On the other hand,
\[
\frac{i - n}{1!} = -\frac{n - i}{1!} = -C_{n-i}^1, \quad \frac{(i - n)(i - n + 1)}{2!} = \frac{(n - i)(n - i - 1)}{2!} = C_{n-i}^2, \quad \ldots, \quad \frac{(i - n)(i - n + 1)\cdots(-1)}{(n - i)!} = (-1)^{n-i}\frac{(n - i)!}{(n - i)!} = (-1)^{n-i}C_{n-i}^{n-i}, \tag{6.139}
\]
and relation (6.138) leads to
\[
P_n(x_i) = y_n - C_{n-i}^1\Delta y_{n-1} + C_{n-i}^2\Delta^2y_{n-2} + \cdots + (-1)^{n-i}C_{n-i}^{n-i}\Delta^{n-i}y_i = y_i, \tag{6.140}
\]
corresponding to Section 6.3.
We have
\[
\frac{x - x_n}{h} = q, \quad \frac{x - x_{n-1}}{h} = \frac{x - x_n + h}{h} = q + 1, \quad \frac{x - x_{n-2}}{h} = \frac{x - x_n + 2h}{h} = q + 2, \quad \ldots, \quad \frac{x - x_1}{h} = \frac{x - x_n + (n-1)h}{h} = q + (n - 1), \tag{6.141}
\]
and relation (6.133) leads to relation (6.118), which had to be proved.
Definition 6.7 The polynomial $P_n(x)$ is called Newton's polynomial or Newton's backward polynomial.
Observation 6.8 Newton's formula (6.118) is used for values close to $x_n$ (situated in the lower part of the finite difference table).
Observation 6.9
(i) We know that the Lagrange interpolation polynomial is unique; hence, Newton’s polynomials
are in fact Lagrange polynomials written differently.
(ii) The error in the case of the Lagrange polynomial is given by
\[
|f(x) - L_n(x)| = \frac{|f^{(n+1)}(\zeta)|}{(n+1)!}|P_{n+1}(x)|, \tag{6.142}
\]
where $\zeta$ is a point situated in the interval $[a, b]$, while
\[
P_{n+1}(x) = (x - x_0)(x - x_1)\cdots(x - x_n). \tag{6.143}
\]
Considering that
\[
P_{n+1}(x) = qh(q - 1)h\cdots(q - n)h = q(q - 1)\cdots(q - n)h^{n+1}, \tag{6.144}
\]
where we used Newton's forward polynomial, and the relation
\[
f^{(n+1)}(\zeta) = \lim_{h\to 0}\frac{\Delta^{n+1}f(\zeta)}{h^{n+1}}, \tag{6.145}
\]
relation (6.142) becomes
\[
|f(x) - P_n(x)| \approx \frac{\Delta^{n+1}f(\zeta)}{(n+1)!h^{n+1}}\,q(q - 1)\cdots(q - n)h^{n+1} \approx \frac{\Delta^{n+1}y_0\,q(q - 1)\cdots(q - n)}{(n+1)!}. \tag{6.146}
\]
Analogously, for Newton's backward polynomial we have
\[
P_{n+1} = qh(q + 1)h\cdots(q + n)h = q(q + 1)\cdots(q + n)h^{n+1}, \tag{6.147}
\]
and it follows that
\[
|f(x) - P_n(x)| \approx \frac{\Delta^{n+1}f(\zeta)}{(n+1)!h^{n+1}}\,q(q + 1)\cdots(q + n)h^{n+1} \approx \frac{\Delta^{n+1}y_0\,q(q + 1)\cdots(q + n)}{(n+1)!}. \tag{6.148}
\]
6.5 CENTRAL DIFFERENCES: GAUSS’S FORMULAE, STIRLING’S FORMULA,
BESSEL’S FORMULA, EVERETT’S FORMULAE
Let us consider the function f : [a, b] → R and 2n + 1 equidistant points in the interval [a, b]. We
denote these points by x−n, x−n+1, . . . , x−1, x0, x1, . . . , xn−1, xn and denote by h the step
h = xi+1 − xi = const, i = −n, n − 1. (6.149)
Theorem 6.4 (Gauss's first formula⁴). Under the above conditions and denoting
\[
q = \frac{x - x_0}{h} \tag{6.150}
\]
and $y_i = f(x_i)$, $i = \overline{-n,n}$, there exists a unique interpolation polynomial of degree $2n$ at the most, the expression of which is
\[
P(x) = y_0 + q\Delta y_0 + \frac{q(q-1)}{2!}\Delta^2y_{-1} + \frac{(q+1)q(q-1)}{3!}\Delta^3y_{-1} + \frac{(q+1)q(q-1)(q-2)}{4!}\Delta^4y_{-2} + \frac{(q+2)(q+1)q(q-1)(q-2)}{5!}\Delta^5y_{-2} + \cdots + \frac{(q+n-1)\cdots(q+1)q(q-1)\cdots(q-n)}{(2n)!}\Delta^{2n}y_{-n}. \tag{6.151}
\]
⁴Carl Friedrich Gauss (1777–1855) gave these formulae in 1812, in a lecture on interpolation.
Demonstration. In the case of Gauss's polynomial, the conditions are
\[
\Delta^kP(x_i) = \Delta^ky_i, \quad i = \overline{-n,n}, \quad k = \overline{0,2n}. \tag{6.152}
\]
We seek the polynomial in the form
\[
P(x) = a_0 + a_1(x - x_0)^{(1)} + a_2(x - x_0)^{(2)} + a_3(x - x_{-1})^{(3)} + a_4(x - x_{-1})^{(4)} + \cdots + a_{2n-1}(x - x_{-n+1})^{(2n-1)} + a_{2n}(x - x_{-n+1})^{(2n)}. \tag{6.153}
\]
Proceeding as with Newton's polynomials, conditions (6.152) lead to
\[
a_0 = y_0, \quad a_1 = \frac{\Delta y_0}{1!h}, \quad a_2 = \frac{\Delta^2y_{-1}}{2!h^2}, \quad a_3 = \frac{\Delta^3y_{-1}}{3!h^3}, \quad a_4 = \frac{\Delta^4y_{-2}}{4!h^4}, \quad \ldots, \quad a_{2n} = \frac{\Delta^{2n}y_{-n}}{(2n)!h^{2n}}. \tag{6.154}
\]
Taking into account equation (6.150) and equation (6.154) and replacing in relation (6.153), we get formula (6.151), which had to be proved. As for Newton's polynomials, we may show that $P(x)$ is an interpolation polynomial.
Observation 6.10 The first Gauss formula may also be written in the form
\[
P(x) = y_0 + q^{(1)}\Delta y_0 + \frac{q^{(2)}}{2!}\Delta^2y_{-1} + \frac{(q+1)^{(3)}}{3!}\Delta^3y_{-1} + \frac{(q+1)^{(4)}}{4!}\Delta^4y_{-2} + \cdots + \frac{(q+n-1)^{(2n)}}{(2n)!}\Delta^{2n}y_{-n}. \tag{6.155}
\]
Definition 6.8 The finite differences $\Delta y_{-1}$, $\Delta y_0$, and $\Delta^2y_{-1}$ are called central differences. For an arbitrary $i$ between $-n + 1$ and $0$, we call central differences the finite differences $\Delta y_{i-1}$, $\Delta y_i$, and $\Delta^2y_{i-1}$.
Theorem 6.5 (Gauss's Second Formula). Under the conditions of Theorem 6.4, the interpolation polynomial may be written in the form
\[
P(x) = y_0 + q^{(1)}\Delta y_{-1} + \frac{(q+1)^{(2)}}{2!}\Delta^2y_{-1} + \frac{(q+1)^{(3)}}{3!}\Delta^3y_{-2} + \frac{(q+2)^{(4)}}{4!}\Delta^4y_{-2} + \cdots + \frac{(q+n)^{(2n)}}{(2n)!}\Delta^{2n}y_{-n}. \tag{6.156}
\]
Demonstration. It is analogous to the demonstrations of the first Gauss formula and the Newton polynomials.
Corollary 6.3 (The Stirling Formula⁵). Under the conditions of Theorem 6.4, the interpolation polynomial reads
\[
P(x) = y_0 + q\,\frac{\Delta y_{-1} + \Delta y_0}{2} + \frac{q^2}{2}\Delta^2y_{-1} + \frac{q(q^2 - 1)}{3!}\,\frac{\Delta^3y_{-2} + \Delta^3y_{-1}}{2} + \frac{q^2(q^2 - 1)}{4!}\Delta^4y_{-2} + \frac{q(q^2 - 1^2)(q^2 - 2^2)}{5!}\,\frac{\Delta^5y_{-3} + \Delta^5y_{-2}}{2} + \cdots + \frac{q^2(q^2 - 1^2)\cdots[q^2 - (n-1)^2]}{(2n)!}\Delta^{2n}y_{-n}. \tag{6.157}
\]
Demonstration. Formula (6.157) is the arithmetic mean of relations (6.151) and (6.156).
For Bessel's formulae⁶ we start from Gauss's second formula, in which we take as initial values $x_1$ and, correspondingly, $y_1 = f(x_1)$. We have
\[
\frac{x - x_1}{h} = q - 1, \tag{6.158}
\]
and, replacing $q$ by $q - 1$, we obtain
\[
P(x) = y_1 + (q - 1)\Delta y_0 + \frac{q(q-1)}{2!}\Delta^2y_0 + \frac{q(q-1)(q-2)}{3!}\Delta^3y_{-1} + \frac{(q+1)q(q-1)(q-2)}{4!}\Delta^4y_{-1} + \frac{(q+1)q(q-1)(q-2)(q-3)}{5!}\Delta^5y_{-2} + \cdots + \frac{(q+n-2)(q+n-3)\cdots(q-n)}{(2n-1)!}\Delta^{2n-1}y_{-(n-1)} + \frac{(q+n-1)(q+n-2)\cdots(q-n)}{(2n)!}\Delta^{2n}y_{-(n-1)}. \tag{6.159}
\]
To obtain the first interpolation formula of Bessel, we take the arithmetic mean of relation (6.159) and the first interpolation formula of Gauss, resulting in
\[
P(x) = \frac{y_0 + y_1}{2} + \left(q - \frac{1}{2}\right)\Delta y_0 + \frac{q(q-1)}{2!}\,\frac{\Delta^2y_{-1} + \Delta^2y_0}{2} + \frac{\left(q - \frac{1}{2}\right)q(q-1)}{3!}\Delta^3y_{-1} + \frac{q(q-1)(q+1)(q-2)}{4!}\,\frac{\Delta^4y_{-2} + \Delta^4y_{-1}}{2} + \frac{\left(q - \frac{1}{2}\right)q(q-1)(q+1)(q-2)}{5!}\Delta^5y_{-2} + \frac{q(q-1)(q+1)(q-2)(q+2)(q-3)}{6!}\,\frac{\Delta^6y_{-3} + \Delta^6y_{-2}}{2} + \cdots + \frac{q(q-1)(q+1)(q-2)(q+2)\cdots(q-n)(q+n-1)}{(2n)!}\,\frac{\Delta^{2n}y_{-n} + \Delta^{2n}y_{-n+1}}{2} + \frac{\left(q - \frac{1}{2}\right)q(q-1)(q+1)(q-2)(q+2)\cdots(q-n)(q+n-1)}{(2n+1)!}\Delta^{2n+1}y_{-n}, \tag{6.160}
\]
⁵In 1719, James Stirling (1692–1770) discussed some of Newton's interpolation formulae in Methodus Differentialis. In 1730, Stirling published a more elaborate booklet on the topic.
⁶Friedrich Wilhelm Bessel (1784–1846) published these formulae in 1824.
where
\[
q = \frac{x - x_0}{h}. \tag{6.161}
\]
The polynomial $P(x)$ in formula (6.160) coincides with $f(x)$ at the points $x_{-n}, x_{-n+1}, \ldots, x_n, x_{n+1}$, that is, at $2n + 2$ points.
If we consider the particular case $n = 1$, then we obtain the quadratic interpolation formula of Bessel,
\[
P(x) = y_0 + q\Delta y_0 + \frac{q(q-1)}{4}(\Delta y_1 - \Delta y_{-1}). \tag{6.162}
\]
Let us observe that in Bessel's formula (6.160) all the terms that contain differences of odd order have the factor $(q - 1/2)$. If we choose $q = 1/2$, then we obtain Bessel's dichotomy formula,
\[
P\left(\frac{x_0 + x_1}{2}\right) = \frac{y_0 + y_1}{2} - \frac{1}{8}\,\frac{\Delta^2y_{-1} + \Delta^2y_0}{2} + \frac{3}{128}\,\frac{\Delta^4y_{-2} + \Delta^4y_{-1}}{2} - \frac{5}{1024}\,\frac{\Delta^6y_{-3} + \Delta^6y_{-2}}{2} + \cdots + (-1)^n\frac{[1\times 3\times 5\times\cdots\times(2n-1)]^2}{2^{2n}(2n)!}\,\frac{\Delta^{2n}y_{-n} + \Delta^{2n}y_{-n+1}}{2}. \tag{6.163}
\]
If we denote
\[
q_1 = q - \frac{1}{2}, \tag{6.164}
\]
then Bessel's formula reads
\[
P(x) = \frac{y_0 + y_1}{2} + q_1\Delta y_0 + \frac{q_1^2 - \frac{1}{4}}{2!}\,\frac{\Delta^2y_{-1} + \Delta^2y_0}{2} + \frac{q_1\left(q_1^2 - \frac{1}{4}\right)}{3!}\Delta^3y_{-1} + \frac{\left(q_1^2 - \frac{1}{4}\right)\left(q_1^2 - \frac{9}{4}\right)}{4!}\,\frac{\Delta^4y_{-2} + \Delta^4y_{-1}}{2} + \frac{q_1\left(q_1^2 - \frac{1}{4}\right)\left(q_1^2 - \frac{9}{4}\right)}{5!}\Delta^5y_{-2} + \frac{\left(q_1^2 - \frac{1}{4}\right)\left(q_1^2 - \frac{9}{4}\right)\left(q_1^2 - \frac{25}{4}\right)}{6!}\,\frac{\Delta^6y_{-3} + \Delta^6y_{-2}}{2} + \cdots + \frac{\left(q_1^2 - \frac{1}{4}\right)\left(q_1^2 - \frac{9}{4}\right)\cdots\left(q_1^2 - \frac{(2n-1)^2}{4}\right)}{(2n)!}\,\frac{\Delta^{2n}y_{-n} + \Delta^{2n}y_{-n+1}}{2} + \frac{q_1\left(q_1^2 - \frac{1}{4}\right)\left(q_1^2 - \frac{9}{4}\right)\cdots\left(q_1^2 - \frac{(2n-1)^2}{4}\right)}{(2n+1)!}\Delta^{2n+1}y_{-n}, \tag{6.165}
\]
where
\[
q_1 = \frac{x - \dfrac{x_0 + x_1}{2}}{h}. \tag{6.166}
\]
Definition 6.9 We define the operator $\delta$ by the relations
\[
\delta f(x) = f\left(x + \frac{h}{2}\right) - f\left(x - \frac{h}{2}\right), \quad \delta^{k+1}f(x) = \delta^kf\left(x + \frac{h}{2}\right) - \delta^kf\left(x - \frac{h}{2}\right), \tag{6.167}
\]
where $k \ge 1$, $k \in \mathbb{N}$.
Observation 6.11
(i) Calculating $\delta^2f(x)$, we obtain
\[
\delta^2f(x) = \delta f\left(x + \frac{h}{2}\right) - \delta f\left(x - \frac{h}{2}\right) = f(x + h) - 2f(x) + f(x - h). \tag{6.168}
\]
(ii) Proceeding by induction, it follows immediately that if $k$ is an even number, then the calculation of $\delta^ky_p$ introduces no supplementary intermediate points. Indeed, if $k = 2$, we have seen above that the affirmation is true. Let us suppose that the affirmation is true for $k = 2l$ and let us show that it remains true for $k = 2l + 2$, $l \in \mathbb{N}$, $l \ge 1$. We have
\[
\delta^{2l+2}y_p = \delta^{2l}y_{p+1} - 2\delta^{2l}y_p + \delta^{2l}y_{p-1}, \tag{6.169}
\]
and, because none of the terms on the right side introduces new supplementary points besides the given ones $x_{-n}, x_{-n+1}, \ldots, x_n$, the affirmation is proved.
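A quick numerical check of (6.168) may be reassuring; this sketch and its names are ours.

```python
import math

def delta(f, h):
    """Central-difference operator delta of (6.167), returned as a function."""
    return lambda x: f(x + h / 2) - f(x - h / 2)

f, h, x = math.sin, 0.1, 0.7
lhs = delta(delta(f, h), h)(x)          # delta applied twice
rhs = f(x + h) - 2 * f(x) + f(x - h)    # right side of (6.168)
print(lhs, rhs)                         # the two values coincide up to rounding
```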
Starting from the first formula of Gauss and writing all the finite differences as functions of $\delta^ky_0$ and $\delta^ky_1$, we obtain the first Everett formula⁷
\[
P(x) = (1 - q)y_0 - \frac{q(q-1)(q-2)}{3!}\delta^2y_0 - \frac{(q+1)q(q-1)(q-2)(q-3)}{5!}\delta^4y_0 - \cdots - \frac{(q+n-1)(q+n-2)\cdots(q-n-1)}{(2n+1)!}\delta^{2n}y_0 + qy_1 + \frac{(q+1)q(q-1)}{3!}\delta^2y_1 + \frac{(q+2)(q+1)q(q-1)(q-2)}{5!}\delta^4y_1 + \cdots + \frac{(q+n)(q+n-1)\cdots(q-n)}{(2n+1)!}\delta^{2n}y_1. \tag{6.170}
\]
Observation 6.12
(i) The expression $\delta y_{p+1/2}$ reads
\[
\delta y_{p+\frac{1}{2}} = f(x_p + h) - f(x_p) = y_{p+1} - y_p. \tag{6.171}
\]
(ii) Proceeding as in Observation 6.11, we deduce that $\delta^ky_{p+1/2}$ does not introduce supplementary points if $k$ is an odd natural number.
The first Gauss formula may also be written in the form
\[
P(x) = y_0 + \frac{(q+1)q}{2!}\delta y_{\frac{1}{2}} + \frac{(q+2)(q+1)q(q-1)}{4!}\delta^3y_{\frac{1}{2}} + \cdots + \frac{(q+n+1)(q+n)\cdots(q-n)}{(2n+2)!}\delta^{2n+1}y_{\frac{1}{2}} - \frac{q(q-1)}{2!}\delta y_{-\frac{1}{2}} - \frac{(q+1)q(q-1)(q-2)}{4!}\delta^3y_{-\frac{1}{2}} - \cdots - \frac{(q+n)(q+n-1)\cdots(q-n-1)}{(2n+2)!}\delta^{2n+1}y_{-\frac{1}{2}}, \tag{6.172}
\]
called the second interpolation formula of Everett or the interpolation formula of Steffensen.⁸
⁷Joseph Davis Everett (1831–1904) published his formulae in 1900.
⁸The formula is named after Johan Frederik Steffensen (1873–1961), who presented it in 1950.
6.6 DIVIDED DIFFERENCES

Definition 6.10 Let there be $f : I \subset \mathbb{R} \to \mathbb{R}$, $I$ an interval of the real axis, and the division points $x_1, x_2, \ldots, x_n$. The values of the function at these points are $y_i = f(x_i)$, $i = \overline{1,n}$. We define the divided differences by the relations
\[
[x_i, x_j] = f(x_i; x_j) = \frac{f(x_j) - f(x_i)}{x_j - x_i}, \tag{6.173}
\]
\[
[x_i, x_j, x_k] = f(x_i; x_j; x_k) = \frac{f(x_j; x_k) - f(x_i; x_j)}{x_k - x_i}, \tag{6.174}
\]
and, in general, by
\[
[x_{i_1}, x_{i_2}, \ldots, x_{i_{k+1}}] = f(x_{i_1}; x_{i_2}; \ldots; x_{i_{k+1}}) = \frac{f(x_{i_2}; \ldots; x_{i_{k+1}}) - f(x_{i_1}; \ldots; x_{i_k})}{x_{i_{k+1}} - x_{i_1}}, \tag{6.175}
\]
where $i_l \in \{1, 2, \ldots, n\}$, $l = \overline{1,k+1}$.
Theorem 6.6 There holds the relation
\[
f(x_1; \ldots; x_k) = \sum_{j=1}^{k}\frac{f(x_j)}{\prod\limits_{\substack{i=1 \\ i\ne j}}^{k}(x_j - x_i)}. \tag{6.176}
\]
Demonstration. We proceed by induction. For $k = 1$, we have
\[
f(x_1) = f(x_1), \tag{6.177}
\]
which is true.
For $k = 2$, we obtain
\[
f(x_1; x_2) = \frac{f(x_2)}{x_2 - x_1} + \frac{f(x_1)}{x_1 - x_2} = \frac{f(x_2) - f(x_1)}{x_2 - x_1}, \tag{6.178}
\]
which is the definition of divided differences.
Let us suppose now that the affirmation is valid for any $i \le k$ and let us show that it holds for $k + 1$. We have
\[
f(x_1; \ldots; x_{k+1}) = \frac{f(x_2; \ldots; x_{k+1}) - f(x_1; \ldots; x_k)}{x_{k+1} - x_1} = \frac{1}{x_{k+1} - x_1}\left[\sum_{j=2}^{k+1}\frac{f(x_j)}{\prod\limits_{\substack{2\le i\le k+1 \\ i\ne j}}(x_j - x_i)} - \sum_{j=1}^{k}\frac{f(x_j)}{\prod\limits_{\substack{1\le i\le k \\ i\ne j}}(x_j - x_i)}\right], \tag{6.179}
\]
corresponding to the induction hypothesis.
We calculate the coefficient of $f(x_j)$, that is,
\[
c_j = \frac{1}{x_{k+1} - x_1}\left[\frac{1}{\prod\limits_{\substack{2\le i\le k+1 \\ i\ne j}}(x_j - x_i)} - \frac{1}{\prod\limits_{\substack{1\le i\le k \\ i\ne j}}(x_j - x_i)}\right] = \frac{(x_j - x_1) - (x_j - x_{k+1})}{(x_{k+1} - x_1)\prod\limits_{\substack{1\le i\le k+1 \\ i\ne j}}(x_j - x_i)} = \frac{1}{\prod\limits_{\substack{1\le i\le k+1 \\ i\ne j}}(x_j - x_i)}, \tag{6.180}
\]
and the theorem is proved.
Observation 6.13
(i) The divided differences are linear operators, that is,
\[
\left(\sum_{i=1}^{l}\alpha_if_i\right)(x_1; \ldots; x_k) = \sum_{i=1}^{l}\alpha_if_i(x_1; \ldots; x_k). \tag{6.181}
\]
(ii) A divided difference is a symmetric function with respect to its arguments.
We may construct Table 6.2 of divided differences in the following form; a code sketch of the construction is given below.
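A sketch of this construction in Python (the function name is ours): column $k$ of the returned array holds the differences $f(x_i; \ldots; x_{i+k})$ of (6.175).

```python
def divided_differences(x, y):
    """Columns of the divided-difference table; table[k][i] = f(x_i;...;x_{i+k})."""
    table = [list(y)]
    for k in range(1, len(x)):
        prev = table[-1]
        table.append([(prev[i + 1] - prev[i]) / (x[i + k] - x[i])
                      for i in range(len(prev) - 1)])
    return table

xs = [1.0, 2.0, 4.0, 5.0]
ys = [1.0, 4.0, 16.0, 25.0]        # f(x) = x^2
for col in divided_differences(xs, ys):
    print(col)   # [1,4,16,25], [3.0,6.0,9.0], [1.0,1.0], [0.0]
```

For $f(x) = x^2$ the second-order divided differences are constant and the third-order ones vanish, in agreement with Lemma 6.1 below.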
Observation 6.14
(i) If $x_2 = x_1 + \varepsilon$, then
\[
f(x_1; x_2) = \frac{f(x_1 + \varepsilon) - f(x_1)}{\varepsilon}, \tag{6.182}
\]
and it follows that
\[
f(x; x) = \lim_{\varepsilon\to 0}\frac{f(x + \varepsilon) - f(x)}{\varepsilon} = f'(x). \tag{6.183}
\]
(ii) In general,
\[
f(x; x; \ldots; x) = \frac{1}{k!}f^{(k)}(x), \tag{6.184}
\]
where $x$ appears $k + 1$ times on the left side of formula (6.184).
TABLE 6.2 Table of Divided Differences

x1   f(x1)
               f(x1; x2)
x2   f(x2)                  f(x1; x2; x3)
               f(x2; x3)                     f(x1; x2; x3; x4)
x3   f(x3)                  f(x2; x3; x4)                        f(x1; x2; x3; x4; x5) ...
               f(x3; x4)                     f(x2; x3; x4; x5)
x4   f(x4)                  f(x3; x4; x5)
...  ...       ...          ...
xn   f(xn)
The demonstration is made by induction. For $k = 1$, the affirmation has been given at point (i). Let us suppose that the affirmation holds for $k$ and let us show that it remains valid for $k + 1$. We may write
\[
\underbrace{f(x; \ldots; x; x)}_{k+2 \text{ arguments}} = \lim_{\varepsilon\to 0}f(x; x + \varepsilon; \ldots; x + (k+1)\varepsilon) = \lim_{\varepsilon\to 0}\frac{f(x + \varepsilon; \ldots; x + (k+1)\varepsilon) - f(x; \ldots; x + k\varepsilon)}{x + (k+1)\varepsilon - x} = \frac{1}{k+1}\,\frac{f^{(k+1)}(x)}{k!} = \frac{f^{(k+1)}(x)}{(k+1)!}, \tag{6.185}
\]
the affirmation thus being proved.
(iii) There holds the relation
\[
\frac{\mathrm{d}}{\mathrm{d}x}f(x_1; \ldots; x_n; x) = f(x_1; \ldots; x_n; x; x). \tag{6.186}
\]
(iv) If $u_1, \ldots, u_p$ are differentiable functions of $x$, then
\[
\frac{\mathrm{d}}{\mathrm{d}x}f(x_1; \ldots; x_n; u_1; \ldots; u_p) = \sum_{i=1}^{p}f(x_1; \ldots; x_n; u_1; \ldots; u_p; u_i)\frac{\mathrm{d}u_i}{\mathrm{d}x}. \tag{6.187}
\]
(v) We may write
\[
\frac{\mathrm{d}^r}{\mathrm{d}x^r}f(x_1; \ldots; x_n; x) = r!\,f(x_1; \ldots; x_n; \underbrace{x; \ldots; x}_{r+1 \text{ times}}). \tag{6.188}
\]
Theorem 6.7 Let $x_0, x_1, \ldots, x_n$ be distinct interior points of a connected domain $D$ included in the complex plane and $f : D \to \mathbb{C}$ holomorphic. Under these conditions,
\[
[x_0; x_1; \ldots; x_n] = \frac{1}{2\pi i}\oint_C\frac{f(z)\,\mathrm{d}z}{(z - x_0)\cdots(z - x_n)}, \tag{6.189}
\]
where $C$ is a rectifiable contour in the complex plane, contained in $D$, which contains in its interior the points $x_0, x_1, \ldots, x_n$.
Demonstration. Let
\[
I = \frac{1}{2\pi i}\oint_C\frac{f(z)\,\mathrm{d}z}{(z - x_0)\cdots(z - x_n)}, \tag{6.190}
\]
where $C$ is traversed in the positive sense. We apply the residue theorem, knowing that the function under the integral admits the poles of first order $x_0, x_1, \ldots, x_n$; it follows that
\[
I = \sum_{k=0}^{n}\frac{f(x_k)}{\prod\limits_{\substack{i=0 \\ i\ne k}}^{n}(x_k - x_i)}, \tag{6.191}
\]
the last expression being $[x_0; x_1; \ldots; x_n]$, in conformity with Theorem 6.6.
Observation 6.15
(i) It follows that Theorem 6.7 holds in the domain of holomorphy of the function $f(z)$ too; hence, the representation remains valid irrespective of the choice of the points $x_i$ in the domain bounded by the curve $C$, in particular if these points coincide.
(ii) If we denote by $L$ the length of the curve $C$, then we have
\[
|[x_0; x_1; \ldots; x_n]| \le \frac{L}{2\pi}\,\frac{\max_{z\in C}|f(z)|}{\min_{z\in C}|(z - x_0)\cdots(z - x_n)|}. \tag{6.192}
\]
Theorem 6.8 (Hermite). Let $f : D \to \mathbb{C}$ be analytic, $D$ connected, with $z_k$, $k = \overline{1,\nu}$, interpolation knots of multiplicity orders $p_k$, $\sum_{k=0}^{\nu}p_k = n + 1$. Under these conditions, we have
\[
f(x) = \sum_{k=1}^{\nu}\sum_{m=0}^{p_k-1}\sum_{s=0}^{m}\frac{f^{(m)}(z_k)}{(p_k - m - 1)!(m - s)!}\left[\frac{\mathrm{d}^{m-s}}{\mathrm{d}z^{m-s}}\frac{(z - z_k)^{p_k}}{Q(z)}\right]_{z=z_k}\frac{Q(x)}{(x - z_k)^{s+1}} + Q(x)[x_0; x_1; \ldots; x_n], \tag{6.193}
\]
where $x_0, x_1, \ldots, x_n$ are the interpolation knots $z_1, \ldots, z_\nu$, counted as many times as indicated by their multiplicity orders, $x = x_0$, while $Q$ will be specified later.
Demonstration. From Theorem 6.7, we have
\[
[x_0; x_1; \ldots; x_n] = \frac{1}{2\pi i}\oint_C\frac{f(z)\,\mathrm{d}z}{(z - z_0)^{p_0}\cdots(z - z_\nu)^{p_\nu}}. \tag{6.194}
\]
Let us choose the curves $C_k$ in the form of circles of radii $r_k$, sufficiently small, centered at $z_k$ and interior to the domain bounded by the curve $C$. It follows that formula (6.194) may be written in the form
\[
[x_0; x_1; \ldots; x_n] = \sum_k\frac{1}{2\pi i}\oint_{C_k}\frac{f(z)\,\mathrm{d}z}{(z - z_0)^{p_0}\cdots(z - z_\nu)^{p_\nu}}. \tag{6.195}
\]
We denote
\[
q(z) = \prod_{k=0}^{\nu}(z - z_k)^{p_k}, \tag{6.196}
\]
\[
I_k = \frac{1}{2\pi i}\oint_{C_k}\frac{(z - z_k)^{p_k}}{q(z)}f(z)\frac{1}{(z - z_k)^{p_k}}\,\mathrm{d}z. \tag{6.197}
\]
The function $(z - z_k)^{p_k}f(z)/q(z)$ is holomorphic in the disk bounded by $C_k$.
From Cauchy's theorem, we have
\[
I_k = \frac{1}{(p_k - 1)!}\left[\frac{\mathrm{d}^{p_k-1}}{\mathrm{d}z^{p_k-1}}\frac{(z - z_k)^{p_k}}{q(z)}f(z)\right]_{z=z_k}. \tag{6.198}
\]
Applying now Leibniz's formula of differentiation of a product of functions, it follows that
\[
I = \sum_{k=0}^{\nu}\sum_{m=0}^{p_k-1}\frac{f^{(m)}(z_k)}{(p_k - m - 1)!}\left[\frac{\mathrm{d}^m}{\mathrm{d}z^m}\frac{(z - z_k)^{p_k}}{q(z)}\right]_{z=z_k}. \tag{6.199}
\]
We denote
\[
Q(z) = \frac{q(z)}{z - x} = \prod_{k=1}^{\nu}(z - z_k)^{p_k} \tag{6.200}
\]
and have
\[
I = \sum_{k=0}^{\nu}\sum_{m=0}^{p_k-1}\frac{f^{(m)}(z_k)}{(p_k - m - 1)!}\left[\frac{\mathrm{d}^m}{\mathrm{d}z^m}\frac{(z - z_k)^{p_k}}{Q(z)}\,\frac{1}{z - x}\right]_{z=z_k}. \tag{6.201}
\]
We take out the term $k = 0$ and apply once more Leibniz's formula in relation (6.201), obtaining
\[
I = \frac{f(x)}{Q(x)} - \sum_{k=1}^{\nu}\sum_{m=0}^{p_k-1}\frac{f^{(m)}(z_k)}{(p_k - m - 1)!}\sum_{s=0}^{m}\frac{1}{(m - s)!}\left[\frac{\mathrm{d}^{m-s}}{\mathrm{d}z^{m-s}}\frac{(z - z_k)^{p_k}}{Q(z)}\right]_{z=z_k}\frac{1}{(x - z_k)^{s+1}} = [x_0; x_1; \ldots; x_n], \tag{6.202}
\]
that is, Hermite's formula, the theorem thus being proved.
6.7 NEWTON-TYPE FORMULA WITH DIVIDED DIFFERENCES
Lemma 6.1 If $P(x)$ is a polynomial of $n$th degree, then its divided difference of $(n+1)$th order satisfies the relation
\[
P(x; x_0; x_1; \ldots; x_n) = 0, \tag{6.203}
\]
where the knots $x_i$, $i = \overline{0,n}$, are distinct.
Demonstration. From the definition, we have
\[
P(x; x_0) = \frac{P(x) - P(x_0)}{x - x_0}, \tag{6.204}
\]
which is a polynomial of $(n-1)$th degree.
Further,
\[
P(x; x_0; x_1) = \frac{P(x; x_0) - P(x_0; x_1)}{x - x_1} \tag{6.205}
\]
is a polynomial of $(n-2)$th degree. Moreover, it follows that $x - x_1$ divides $P(x; x_0) - P(x_0; x_1)$.
Proceeding step by step, we obtain $P(x; x_0; \ldots; x_{n-1})$, which is a polynomial of zeroth degree, that is, a constant, which will be denoted by $C$. Finally,
\[
P(x; x_0; x_1; \ldots; x_n) = \frac{C - C}{x - x_n} = 0, \tag{6.206}
\]
hence the lemma is proved.
A consequence for the Lagrange interpolation polynomial is immediately obtained. Indeed, if $P(x)$ is a Lagrange interpolation polynomial for which $P(x_i) = y_i$, $i = \overline{0,n}$, then
\[
P(x; x_0; x_1; \ldots; x_n) = 0. \tag{6.207}
\]
On the other hand,
\[
P(x) = P(x_0) + (x - x_0)\frac{P(x) - P(x_0)}{x - x_0} = P(x_0) + P(x; x_0)(x - x_0). \tag{6.208}
\]
Proceeding step by step, it follows that
\[
P(x) = P(x_0) + P(x; x_0)(x - x_0) = P(x_0) + P(x_0; x_1)(x - x_0) + P(x; x_0; x_1)(x - x_0)(x - x_1) = P(x_0) + P(x_0; x_1)(x - x_0) + P(x_0; x_1; x_2)(x - x_0)(x - x_1) + P(x; x_0; x_1; x_2)(x - x_0)(x - x_1)(x - x_2) = \cdots = P(x_0) + P(x_0; x_1)(x - x_0) + P(x_0; x_1; x_2)(x - x_0)(x - x_1) + \cdots + P(x_0; x_1; \ldots; x_n)(x - x_0)\cdots(x - x_{n-1}) + P(x; x_0; x_1; \ldots; x_n)(x - x_0)\cdots(x - x_{n-1})(x - x_n), \tag{6.209}
\]
where we have written the last term too, even though it is equal to zero.
Definition 6.11 The expression
\[
P(x) = y_0 + [x_0, x_1](x - x_0) + [x_0, x_1, x_2](x - x_0)(x - x_1) + \cdots + [x_0, x_1, \ldots, x_n](x - x_0)\cdots(x - x_{n-1}) \tag{6.210}
\]
is called the Newton-type formula with divided differences.
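A sketch evaluating (6.210): the coefficients $[x_0, \ldots, x_k]$ are the top-edge entries of the divided-difference table, recomputed inline here so that the snippet stands alone (all names are ours).

```python
def newton_divided(x_knots, y_knots, x):
    """Evaluate the Newton-type formula (6.210) at x."""
    # Top edge of the divided-difference table: coef[k] = [x0, ..., xk].
    coef, row = [y_knots[0]], list(y_knots)
    for k in range(1, len(x_knots)):
        row = [(row[i + 1] - row[i]) / (x_knots[i + k] - x_knots[i])
               for i in range(len(row) - 1)]
        coef.append(row[0])
    value, product = 0.0, 1.0
    for k in range(len(coef)):
        value += coef[k] * product       # [x0,...,xk](x-x0)...(x-x_{k-1})
        product *= (x - x_knots[k])
    return value

print(newton_divided([1.0, 2.0, 4.0, 5.0], [1.0, 4.0, 16.0, 25.0], 3.0))  # 9.0
```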
6.8 INVERSE INTERPOLATION
The determination of the value $x$ for which the function takes a certain value $y$ is considered in the framework of inverse interpolation.
Two cases may occur:
• the division points are equidistant;
• the division points are arbitrary.
Let us begin with the first case. Newton's forward interpolation polynomial leads to
\[
y = y_0 + \frac{q}{1!}\Delta y_0 + \frac{q(q-1)}{2!}\Delta^2y_0 + \cdots + \frac{q(q-1)\cdots(q-n+1)}{n!}\Delta^ny_0; \tag{6.211}
\]
that is,
\[
y = y(q). \tag{6.212}
\]
The problem consists in solving equation (6.211), because if we know $q$ and the relation
\[
q = \frac{x - x_0}{h}, \tag{6.213}
\]
$h$ being the interpolation step, then the required value of $x$ results automatically.
We start with an initial approximation of the solution and customarily take
\[
q_0 = \frac{y - y_0}{\Delta y_0}, \tag{6.214}
\]
the solution obtained from equation (6.211) by neglecting the nonlinear terms.
the solution obtained from equation (6.211) by neglecting the nonlinear terms.
If f is of class Cn+1([a, b]), [a, b] being the interval that contains the points of division, while
f is the function that connects the values xi and yi = f (xi), i = 0, n, then the iterative sequence
given by the relation
qp+1 =
y − y0
y0
−
qp(qp − 1)
2! y0
− · · · −
qp(qp − 1) · · · (qp − n + 1)
n!
n
y0, p ∈ N, (6.215)
where $q_0$ is defined by equation (6.214), is convergent to $q$, the solution of equation (6.211), the problem thus being solved.
If the knots are arbitrary, then instead of constructing the Lagrange polynomial that gives $y$ as a function of $x$, we construct the Lagrange polynomial that gives $x$ as a function of $y$, that is,
\[
x = \sum_{i=0}^{n}\frac{(y - y_0)\cdots(y - y_{i-1})(y - y_{i+1})\cdots(y - y_n)}{(y_i - y_0)\cdots(y_i - y_{i-1})(y_i - y_{i+1})\cdots(y_i - y_n)}\,x_i \tag{6.216}
\]
or
\[
x = x_0 + [y_0, y_1](y - y_0) + [y_0, y_1, y_2](y - y_0)(y - y_1) + \cdots + [y_0, y_1, \ldots, y_n](y - y_0)(y - y_1)\cdots(y - y_{n-1}), \tag{6.217}
\]
the problem being solved by a simple numerical replacement.
Obviously, this method may be applied in the case of equidistant knots as well.
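A sketch of the iteration (6.215) for equidistant knots; the function name and the example data are ours, and the example is chosen so that $y$ lies between $y_0$ and $y_1$, where the iteration converges quickly.

```python
import math

def inverse_interpolation(y_knots, y, iterations=50):
    """Iterate (6.215) to find q with y(q) = y on equidistant knots."""
    # Top edge of the forward-difference table: diffs[k] = Delta^k y_0.
    diffs, row = [y_knots[0]], list(y_knots)
    while len(row) > 1:
        row = [row[i + 1] - row[i] for i in range(len(row) - 1)]
        diffs.append(row[0])
    q = (y - y_knots[0]) / diffs[1]          # starting value (6.214)
    for _ in range(iterations):
        q_next = (y - y_knots[0]) / diffs[1]
        for k in range(2, len(diffs)):
            prod = 1.0
            for j in range(k):
                prod *= (q - j)               # q(q-1)...(q-k+1)
            q_next -= prod * diffs[k] / (math.factorial(k) * diffs[1])
        q = q_next
    return q

# Example: y = x^2 tabulated at x = 0, 1, 2, 3; solve f(x) = 0.25.
q = inverse_interpolation([0.0, 1.0, 4.0, 9.0], 0.25)
print(0.0 + 1.0 * q)          # x = x0 + h*q -> 0.5
```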
6.9 DETERMINATION OF THE ROOTS OF AN EQUATION BY INVERSE
INTERPOLATION
The method of determination of the roots of an equation by inverse interpolation is an application of the preceding section.
The idea consists in constructing a table of values, with knots that are equidistant or not, and in finding the value $x$ for which $f(x) = 0$ on a certain interval.
An application consists in the determination of the eigenvalues of a matrix.
Let us consider the characteristic equation written in the form
\[
D(\lambda) = \begin{vmatrix}
a_{11} - \lambda & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} - \lambda & \cdots & a_{2n} \\
\cdots & \cdots & \cdots & \cdots \\
a_{n1} & a_{n2} & \cdots & a_{nn} - \lambda
\end{vmatrix} = 0, \tag{6.218}
\]
and let us give to $\lambda$ the values $0, 1, 2, \ldots, n$, resulting in $D(0), D(1), \ldots, D(n)$.
By using Newton's forward formula, we obtain
\[
D(\lambda) = D(0) + \lambda\Delta D(0) + \frac{\lambda(\lambda - 1)}{2!}\Delta^2D(0) + \cdots + \frac{\lambda(\lambda - 1)\cdots(\lambda - n + 1)}{n!}\Delta^nD(0). \tag{6.219}
\]
On the other hand,
\[
\frac{\lambda(\lambda - 1)\cdots(\lambda - r + 1)}{r!} = \sum_{p=1}^{r}c_{pr}\lambda^p, \quad r = \overline{1,n}, \tag{6.220}
\]
so that expression (6.219) reads
\[
D(\lambda) = D(0) + \sum_{p=1}^{n}\left[\lambda^p\sum_{i=p}^{n}c_{pi}\Delta^iD(0)\right], \tag{6.221}
\]
thus obtaining Markoff's formula.
If, instead of the values $0, 1, \ldots, n$, we choose the values $a, a + h, \ldots, a + nh$, then Markoff's formula takes the form
\[
D(\lambda) = D(a) + \sum_{p=1}^{n}\left[\left(\frac{\lambda - a}{h}\right)^{p}\sum_{i=p}^{n}c_{pi}\Delta^iD(a)\right]. \tag{6.222}
\]
Let us consider, for example, that the matrix $A$ is given by
\[
A = \begin{pmatrix} 1 & 0 & 3 \\ 1 & 2 & -1 \\ 0 & 3 & 1 \end{pmatrix}; \tag{6.223}
\]
then
\[
D(0) = \begin{vmatrix} 1 & 0 & 3 \\ 1 & 2 & -1 \\ 0 & 3 & 1 \end{vmatrix} = 14, \quad D(1) = \begin{vmatrix} 0 & 0 & 3 \\ 1 & 1 & -1 \\ 0 & 3 & 0 \end{vmatrix} = 9, \quad D(2) = \begin{vmatrix} -1 & 0 & 3 \\ 1 & 0 & -1 \\ 0 & 3 & -1 \end{vmatrix} = 6, \quad D(3) = \begin{vmatrix} -2 & 0 & 3 \\ 1 & -1 & -1 \\ 0 & 3 & -2 \end{vmatrix} = -1, \tag{6.224}
\]
\[
\frac{\lambda}{1!} = \lambda, \quad \frac{\lambda(\lambda - 1)}{2!} = \frac{\lambda^2}{2} - \frac{\lambda}{2}, \quad \frac{\lambda(\lambda - 1)(\lambda - 2)}{3!} = \frac{\lambda^3}{6} - \frac{\lambda^2}{2} + \frac{\lambda}{3}, \tag{6.225}
\]
\[
c_{11} = 1, \quad c_{12} = -\frac{1}{2}, \quad c_{22} = \frac{1}{2}, \quad c_{13} = \frac{1}{3}, \quad c_{23} = -\frac{1}{2}, \quad c_{33} = \frac{1}{6}. \tag{6.226}
\]
We thus construct Table 6.3, the table of finite differences.
We obtain
\[
D(\lambda) = D(0) + \lambda\left(c_{11}\Delta D(0) + c_{12}\Delta^2D(0) + c_{13}\Delta^3D(0)\right) + \lambda^2\left(c_{22}\Delta^2D(0) + c_{23}\Delta^3D(0)\right) + \lambda^3c_{33}\Delta^3D(0) = 14 - 8\lambda + 4\lambda^2 - \lambda^3. \tag{6.227}
\]
Let us consider the function $f : \mathbb{R} \to \mathbb{R}$, $f(\lambda) = -\lambda^3 + 4\lambda^2 - 8\lambda + 14$, the derivative of which is $f'(\lambda) = -3\lambda^2 + 8\lambda - 8$. The equation $f'(\lambda) = 0$ has no real roots, and hence the function $f(\lambda)$ is strictly decreasing on $\mathbb{R}$. It follows that the equation $f(\lambda) = 0$ has a single real root; because $D(2) > 0$ and $D(3) < 0$, we may state that this root lies between 2 and 3.
Refining this interval a little, we find that the root is between 2.7 and 3, a situation for which
Table 6.4 of finite differences has been created.
TABLE 6.3 The Table of Finite Differences

λ    D     ΔD     Δ²D    Δ³D
0    14    −5     2      −6
1    9     −3     −4
2    6     −7
3    −1
TABLE 6.4 Table of Finite Differences

λ      f(λ)     Δf        Δ²f       Δ³f
2.7    1.877    −0.869    −0.088    −0.006
2.8    1.008    −0.957    −0.094
2.9    0.051    −1.051
3.0    −1
We choose $\lambda_0 = 2.9$, which corresponds to $q_0 = 2$. We have
\[
q_1 = \frac{0 - 1.877}{-0.869} - \frac{2\times 1}{2!\times(-0.869)}\times(-0.088) - \frac{2\times 1\times 0}{3!\times(-0.869)}\times(-0.006) = 2.05869,
\]
\[
q_2 = 2.04945, \quad q_3 = 2.05093, \quad q_4 = 2.05069, \quad q_5 = 2.05073, \tag{6.228}
\]
from which we obtain the root of the equation $f(\lambda) = 0$, that is,
\[
\lambda \approx 2.7 + 0.1q_5 = 2.905, \tag{6.229}
\]
for which
\[
f(\lambda) = 0.00073. \tag{6.230}
\]
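As a cross-check of this worked example, assuming NumPy (the calls are standard NumPy routines; the rest is ours): the real root of $D(\lambda)$ should reappear among the eigenvalues of $A$.

```python
import numpy as np

A = np.array([[1.0, 0.0, 3.0],
              [1.0, 2.0, -1.0],
              [0.0, 3.0, 1.0]])
# Roots of D(lambda) = -lambda^3 + 4 lambda^2 - 8 lambda + 14, see (6.227).
print(np.roots([-1.0, 4.0, -8.0, 14.0]))   # real root near 2.905
print(np.linalg.eigvals(A))                # the same real eigenvalue appears
```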
6.10 INTERPOLATION BY SPLINE FUNCTIONS
Let us consider a function $f : [a, b] \to \mathbb{R}$ and an approximation of it by an interpolation polynomial $P$ such that $P(x_i) = f(x_i) = y_i$, $i = \overline{0,n}$, $x_i$ being the interpolation knots. For higher values of $n$, the degree of the interpolation polynomial tends to increase (obviously remaining $n$ at the most). But a polynomial of high degree has a pronounced oscillatory character, as can be seen in Figure 6.1. Because of this oscillation property, interpolation polynomials of high degree are avoided.
An alternative used to obtain interpolation functions is to divide the interval $[a, b]$ into a finite set of subintervals, using on each subinterval another interpolation polynomial. We thus obtain a piecewise interpolation. Let us observe that such a method does not guarantee the differentiability of the approximation function at the ends of the subintervals. Usually, it is required that the approximation function be of the same differentiability class as the original function. Practically, if the approximation function is of class $C^2$ on $[a, b]$, then it is sufficient for most situations. Usually, we use on each subinterval polynomial functions of third degree; hence, we realize a cubic spline interpolation.
Definition 6.12 Let $f : [a, b] \to \mathbb{R}$ and the interpolation knots be
\[
a = x_0 < x_1 < \cdots < x_n = b. \tag{6.231}
\]
A cubic spline for the function $f$ is a function $S$ that satisfies the following conditions:
(a) $S_j = S|_{[x_j,x_{j+1}]}$ is a polynomial of degree at most 3 for each $j = \overline{0,n-1}$;
(b) $S(x_j) = f(x_j)$ for any $j = \overline{0,n}$;
(c) $S_{j+1}(x_{j+1}) = S_j(x_{j+1})$ for any $j = \overline{0,n-2}$;
(d) $S'_{j+1}(x_{j+1}) = S'_j(x_{j+1})$ for $j = \overline{0,n-2}$;
Figure 6.1 The oscillatory character of polynomials of high degree (curve $f$ and polynomial $P$ through the points $A_{i-1}(x_{i-1}, y_{i-1})$, $A_i(x_i, y_i)$, $A_{i+1}(x_{i+1}, y_{i+1})$).
(e) $S''_{j+1}(x_{j+1}) = S''_j(x_{j+1})$ for $j = \overline{0,n-2}$;
(f) one of the following boundary conditions is satisfied:
• either $S''(x_0) = S''(x_n) = 0$ (the so-called free boundary condition),
• or $S'(x_0) = f'(x_0)$ and $S'(x_n) = f'(x_n)$ (the so-called imposed boundary condition).
Observation 6.16 We have to determine $n$ polynomials of third degree $S_j$, $j = \overline{0,n-1}$. As any polynomial of third degree has four coefficients, it follows that the interpolation by spline functions is equivalent to the determination of $4n$ coefficients. Condition (b) of Definition 6.12 leads to $n + 1$ equations, condition (c) leads to $n - 1$ equations, condition (d) implies $n - 1$ equations, while condition (e) leads to $n - 1$ equations. We thus have $4n - 2$ equations, to which are added the two equations of point (f) for the free or imposed boundary. A system of $4n$ equations with $4n$ unknowns is thus obtained.
Observation 6.17 Let us choose the polynomials $S_j$, $j = \overline{0,n-1}$, in the form
\[
S_j(x) = a_j + b_j(x - x_j) + c_j(x - x_j)^2 + d_j(x - x_j)^3. \tag{6.232}
\]
Immediately, we notice that
\[
S_j(x_j) = S(x_j) = f(x_j) = a_j, \quad j = \overline{0,n-1}. \tag{6.233}
\]
On the other hand,
\[
a_{j+1} = S_{j+1}(x_{j+1}) = S_j(x_{j+1}), \tag{6.234}
\]
hence
\[
a_{j+1} = a_j + b_j(x_{j+1} - x_j) + c_j(x_{j+1} - x_j)^2 + d_j(x_{j+1} - x_j)^3, \quad j = \overline{0,n-1}, \tag{6.235}
\]
where we have set
\[
a_n = f(x_n). \tag{6.236}
\]
Defining
\[
b_n = S'(x_n) \tag{6.237}
\]
and observing that
\[
S'_j(x) = b_j + 2c_j(x - x_j) + 3d_j(x - x_j)^2, \tag{6.238}
\]
from which
\[
S'_j(x_j) = b_j, \quad j = \overline{0,n-1}, \tag{6.239}
\]
we obtain
\[
b_{j+1} = b_j + 2c_j(x_{j+1} - x_j) + 3d_j(x_{j+1} - x_j)^2, \quad j = \overline{0,n-1}, \tag{6.240}
\]
from condition (d).
Finally, defining
\[
c_n = \frac{S''(x_n)}{2} \tag{6.241}
\]
and applying condition (e), we obtain the relation
\[
c_{j+1} = c_j + 3d_j(x_{j+1} - x_j). \tag{6.242}
\]
Relation (6.242) leads to
\[
d_j = \frac{c_{j+1} - c_j}{3(x_{j+1} - x_j)}; \tag{6.243}
\]
replacing in relations (6.235) and (6.240), we obtain
\[
a_{j+1} = a_j + b_j(x_{j+1} - x_j) + \frac{(x_{j+1} - x_j)^2}{3}(2c_j + c_{j+1}), \tag{6.244}
\]
\[
b_{j+1} = b_j + (x_{j+1} - x_j)(c_j + c_{j+1}), \tag{6.245}
\]
for j = 0, n − 1. Eliminating bj between the last two relations, it follows that the system
(x_j − x_{j−1})c_{j−1} + 2(x_{j+1} − x_{j−1})c_j + (x_{j+1} − x_j)c_{j+1}
    = [3/(x_{j+1} − x_j)](a_{j+1} − a_j) − [3/(x_j − x_{j−1})](a_j − a_{j−1}), j = 1, n − 1, (6.246)

the unknowns being c_j, j = 0, n; this system is a linear one.
Theorem 6.9 If f : [a, b] → R, then f has a unique natural interpolation spline, that is, a unique
interpolation spline that satisfies the free boundary conditions S''(a) = S''(b) = 0.

Demonstration. The free boundary conditions imply

c_n = S''(x_n)/2 = 0, (6.247)

0 = S''(x_0) = 2c_0 + 6d_0(x_0 − x_0), hence c_0 = 0. (6.248)
System (6.246), together with conditions (6.247) and (6.248), leads to the matrix

| 1           0             0             · · ·  0                  0              |
| x_1 − x_0   2(x_2 − x_0)  x_2 − x_1     · · ·  0                  0              |
| 0           x_2 − x_1     2(x_3 − x_1)  · · ·  0                  0              |
| · · ·       · · ·         · · ·         · · ·  · · ·              · · ·          |
| 0           0             0             · · ·  2(x_n − x_{n−2})   x_n − x_{n−1}  |
| 0           0             0             · · ·  0                  1              |  (6.249)
Observation 6.18 We can describe an algorithm for the determination of a natural spline interpo-
lation function as follows (see also the sketch below):

– for i = 1, n − 1, calculate α_i = 3[f(x_{i+1})(x_i − x_{i−1}) − f(x_i)(x_{i+1} − x_{i−1}) + f(x_{i−1})(x_{i+1} − x_i)] / [(x_{i+1} − x_i)(x_i − x_{i−1})];
– set β_0 = 1, γ_0 = 0, δ_0 = 0;
– for i = 1, n − 1, calculate β_i = 2(x_{i+1} − x_{i−1}) − (x_i − x_{i−1})γ_{i−1}, γ_i = (x_{i+1} − x_i)/β_i, δ_i = [α_i − (x_i − x_{i−1})δ_{i−1}]/β_i;
– set β_n = 1, δ_n = 0, c_n = δ_n;
– for j = n − 1, 0, calculate c_j = δ_j − γ_j c_{j+1}, b_j = [f(x_{j+1}) − f(x_j)]/(x_{j+1} − x_j) − (x_{j+1} − x_j)(c_{j+1} + 2c_j)/3, d_j = (c_{j+1} − c_j)/[3(x_{j+1} − x_j)];
– the natural spline interpolation function reads

S_j(x) = f(x_j) + b_j(x − x_j) + c_j(x − x_j)^2 + d_j(x − x_j)^3, j = 0, n − 1.
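As an illustration, here is a minimal Python sketch of this algorithm (the forward sweep for β_i, γ_i, δ_i followed by the backward substitution for c_j); the function name and the final verification values are our own additions, checked against Example 6.5 below.

import numpy as np

def natural_cubic_spline(x, y):
    # coefficients (a_j, b_j, c_j, d_j) of
    # S_j(t) = a_j + b_j (t - x_j) + c_j (t - x_j)**2 + d_j (t - x_j)**3
    n = len(x) - 1
    h = np.diff(x)                                   # h_j = x_{j+1} - x_j
    a = np.asarray(y, dtype=float)
    alpha = np.zeros(n + 1)                          # right-hand sides of (6.246)
    for i in range(1, n):
        alpha[i] = 3*(a[i+1] - a[i])/h[i] - 3*(a[i] - a[i-1])/h[i-1]
    beta = np.ones(n + 1)
    gamma = np.zeros(n + 1)
    delta = np.zeros(n + 1)
    for i in range(1, n):                            # forward sweep
        beta[i] = 2*(x[i+1] - x[i-1]) - h[i-1]*gamma[i-1]
        gamma[i] = h[i]/beta[i]
        delta[i] = (alpha[i] - h[i-1]*delta[i-1])/beta[i]
    c = np.zeros(n + 1)
    b = np.zeros(n)
    d = np.zeros(n)
    for j in range(n - 1, -1, -1):                   # backward substitution
        c[j] = delta[j] - gamma[j]*c[j+1]
        b[j] = (a[j+1] - a[j])/h[j] - h[j]*(c[j+1] + 2*c[j])/3
        d[j] = (c[j+1] - c[j])/(3*h[j])
    return a[:n], b, c[:n], d

# data of Example 6.5: f(x) = e**x at the knots 0, 0.5, 1
a, b, c, d = natural_cubic_spline(np.array([0.0, 0.5, 1.0]), [1.0, 1.64872, 2.71828])
# b ≈ [1.08702, 1.71828], c ≈ [0, 1.26252], d ≈ [0.84168, -0.84168]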
Theorem 6.10 If f : [a, b] → R, then f admits a unique spline interpolation function that satisfies
the imposed boundary conditions S'(a) = f'(a) and S'(b) = f'(b).

Demonstration. Because

S'(a) = S'(x_0) = b_0, (6.250)

equation (6.244), written for j = 0, implies

f'(a) = (a_1 − a_0)/(x_1 − x_0) − [(x_1 − x_0)/3](2c_0 + c_1), (6.251)

from which

2(x_1 − x_0)c_0 + (x_1 − x_0)c_1 = [3/(x_1 − x_0)](a_1 − a_0) − 3f'(a). (6.252)

Analogously,

f'(b) = b_n = b_{n−1} + (x_n − x_{n−1})(c_{n−1} + c_n) (6.253)

and equation (6.244), written for j = n − 1, leads to

f'(b) = (a_n − a_{n−1})/(x_n − x_{n−1}) − [(x_n − x_{n−1})/3](2c_{n−1} + c_n) + (x_n − x_{n−1})(c_{n−1} + c_n)
      = (a_n − a_{n−1})/(x_n − x_{n−1}) + [(x_n − x_{n−1})/3](c_{n−1} + 2c_n), (6.254)

from which

(x_n − x_{n−1})c_{n−1} + 2(x_n − x_{n−1})c_n = 3f'(b) − [3/(x_n − x_{n−1})](a_n − a_{n−1}). (6.255)
The system formed by equation (6.246), equation (6.252), and equation (6.255) is a linear system,
the matrix of which is

| 2(x_1 − x_0)  x_1 − x_0     0             · · ·  0                  0                |
| x_1 − x_0     2(x_2 − x_0)  x_2 − x_1     · · ·  0                  0                |
| 0             x_2 − x_1     2(x_3 − x_1)  · · ·  0                  0                |
| · · ·         · · ·         · · ·         · · ·  · · ·              · · ·            |
| 0             0             0             · · ·  2(x_n − x_{n−2})   x_n − x_{n−1}    |
| 0             0             0             · · ·  x_n − x_{n−1}      2(x_n − x_{n−1}) |  (6.256)

The determinant of this matrix does not vanish; hence the solution of the system is unique.
Observation 6.19 In this case too, we may give an algorithm to determine the cubic spline
interpolation function with imposed boundary conditions, as follows (a cross-check appears after this list):

– set α_0 = 3[f(x_1) − f(x_0)]/(x_1 − x_0) − 3f'(x_0), α_n = 3f'(x_n) − 3[f(x_n) − f(x_{n−1})]/(x_n − x_{n−1});
– for i = 1, n − 1, calculate α_i = 3[f(x_{i+1})(x_i − x_{i−1}) − f(x_i)(x_{i+1} − x_{i−1}) + f(x_{i−1})(x_{i+1} − x_i)] / [(x_{i+1} − x_i)(x_i − x_{i−1})];
– set β_0 = 2(x_1 − x_0), γ_0 = 1/2, δ_0 = α_0/[2(x_1 − x_0)], b_0 = f'(x_0);
– for i = 1, n − 1, calculate β_i = 2(x_{i+1} − x_{i−1}) − (x_i − x_{i−1})γ_{i−1}, γ_i = (x_{i+1} − x_i)/β_i, δ_i = [α_i − (x_i − x_{i−1})δ_{i−1}]/β_i;
– set β_n = (x_n − x_{n−1})(2 − γ_{n−1}), δ_n = [α_n − (x_n − x_{n−1})δ_{n−1}]/β_n, c_n = δ_n;
– for j = n − 1, 0, calculate c_j = δ_j − γ_j c_{j+1}, b_j = [f(x_{j+1}) − f(x_j)]/(x_{j+1} − x_j) − (x_{j+1} − x_j)(c_{j+1} + 2c_j)/3, d_j = (c_{j+1} − c_j)/[3(x_{j+1} − x_j)];
– the cubic spline interpolation function is given by

S_j(x) = f(x_j) + b_j(x − x_j) + c_j(x − x_j)^2 + d_j(x − x_j)^3, j = 0, n − 1.
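Both variants can be cross-checked against a library implementation, assuming SciPy is available; the sketch below uses scipy.interpolate.CubicSpline, whose bc_type argument selects the natural or the imposed (clamped) boundary conditions. The data are those of Example 6.5; this is only a verification aid of ours, not part of the original algorithm.

import numpy as np
from scipy.interpolate import CubicSpline

x = np.array([0.0, 0.5, 1.0])
y = np.exp(x)                      # f(x) = e**x, as in Example 6.5

# natural spline: S''(x_0) = S''(x_n) = 0
s_nat = CubicSpline(x, y, bc_type='natural')

# imposed boundary: S'(x_0) = f'(x_0), S'(x_n) = f'(x_n)
s_imp = CubicSpline(x, y, bc_type=((1, np.exp(0.0)), (1, np.exp(1.0))))

print(s_nat(0.25), s_imp(0.25), np.exp(0.25))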
6.11 HERMITE'S INTERPOLATION

Definition 6.13 Let [a, b] be an interval of the real axis, let x_0, x_1, . . . , x_n be n + 1 distinct
points in this interval, and let m_i, i = 0, n, be n + 1 integers associated with the points x_i. We
denote by m the value

m = max_{0≤i≤n} m_i. (6.257)

Let f : [a, b] → R be a function at least of class C^m on the interval [a, b]. The polynomial P of
minimum degree that satisfies

d^k P(x_i)/dx^k = d^k f(x_i)/dx^k (6.258)

for any i = 0, n and k = 0, m_i is called the approximation osculating polynomial of the function f
on the interval [a, b].

Observation 6.20 The degree of the approximation osculating polynomial P will be at the most

M = Σ_{i=0}^{n} m_i + n, (6.259)

because the number of conditions that must be satisfied is Σ_{i=0}^{n} m_i + n + 1 and a polynomial
of degree M has M + 1 coefficients that are deduced from these conditions.
Observation 6.21
(i) If n = 0, then the approximation osculating polynomial P becomes just the Taylor polyno-
mial of degree m0 for f at x0.
(ii) If mi = 0 for i = 0, n, then the approximation osculating polynomial P coincides with
Lagrange’s interpolation polynomial at the interpolation knots x0, x1, . . . , xn.
Theorem 6.11 If f ∈ C^1([a, b]), f : [a, b] → R, and x_0, x_1, . . . , x_n are n + 1 distinct points in
[a, b], then the unique polynomial^9 of minimum degree that coincides with f at the knots x_i,
i = 0, n, and the derivative of which coincides with f' at the very same points x_i, is given by

H_{2n+1}(x) = Σ_{j=0}^{n} f(x_j) H_{n,j}(x) + Σ_{j=0}^{n} f'(x_j) Ĥ_{n,j}(x), (6.260)

where

H_{n,j}(x) = [1 − 2(x − x_j)L'_{n,j}(x_j)] L²_{n,j}(x) (6.261)

and

Ĥ_{n,j}(x) = (x − x_j) L²_{n,j}(x), (6.262)

while L_{n,j} represents the Lagrange coefficient polynomial of degree n and order j, that is,

L_{n,j}(x) = [(x − x_0) · · · (x − x_{j−1})(x − x_{j+1}) · · · (x − x_n)] / [(x_j − x_0) · · · (x_j − x_{j−1})(x_j − x_{j+1}) · · · (x_j − x_n)]. (6.263)

If f ∈ C^{2n+2}([a, b]), then the following expression of the approximation osculating polynomial
error exists:

f(x) − H_{2n+1}(x) = [(x − x_0)² · · · (x − x_n)² / (2n + 2)!] f^{(2n+2)}(ξ), (6.264)

where ξ is a point situated between a and b.

Demonstration. It is similar to the proof of the existence and uniqueness of the Lagrange polynomial,
formula (6.264) being obtained in a way analogous to the error formula in the case of the
Lagrange polynomial.
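A direct Python transcription of formulas (6.260)–(6.263) follows; it assembles H_{2n+1} from the Lagrange coefficient polynomials and their derivatives. The helper names are ours, and numpy.polynomial is used only for convenience; this is a sketch, not the authors' code.

import numpy as np
from numpy.polynomial import polynomial as P

def hermite_interpolant(xk, fk, dfk):
    # coefficient array of H_{2n+1} from (6.260)-(6.263)
    xk = np.asarray(xk, dtype=float)
    n = len(xk) - 1
    Ls = []                                   # Lagrange coefficients L_{n,j}
    for j in range(n + 1):
        num, den = np.array([1.0]), 1.0
        for i in range(n + 1):
            if i != j:
                num = P.polymul(num, np.array([-xk[i], 1.0]))   # (x - x_i)
                den *= (xk[j] - xk[i])
        Ls.append(num/den)
    H = np.array([0.0])
    for j in range(n + 1):
        L = Ls[j]
        dL = P.polyval(xk[j], P.polyder(L))   # L'_{n,j}(x_j)
        L2 = P.polymul(L, L)
        # H_{n,j}(x) = [1 - 2(x - x_j) L'_{n,j}(x_j)] L^2_{n,j}(x)
        Hnj = P.polymul(np.array([1.0 + 2*xk[j]*dL, -2*dL]), L2)
        # Ĥ_{n,j}(x) = (x - x_j) L^2_{n,j}(x)
        Htj = P.polymul(np.array([-xk[j], 1.0]), L2)
        H = P.polyadd(H, P.polyadd(fk[j]*Hnj, dfk[j]*Htj))
    return H

# e.g. f = exp on the knots 0 and 1:
H = hermite_interpolant([0.0, 1.0], [1.0, np.e], [1.0, np.e])
# P.polyval(0.5, H) ≈ 1.6444, close to exp(0.5) ≈ 1.6487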
6.12 CHEBYSHEV’S POLYNOMIALS
Definition 6.14 Let f : [a, b] → R be a real function of a real variable. We call deviation from zero
of the function f(x) on the segment [a, b] the greatest value of the modulus of the function f on
the very same interval.
Lemma 6.2 Let x ∈ [−1, 1] and
T_n(x) = cos(n arccos x). (6.265)
Under these conditions,
^9 The name of the polynomial is given in honor of Charles Hermite (1822–1901).
(i) T_n(x) represents a polynomial^10 of degree n in x, the dominant coefficient of which is equal
to 2^{n−1};
(ii) all the roots of the equation T_n(x) = 0 are distinct and lie in the interval [−1, 1];
(iii) the maximal value of the polynomial T_n(x) on the interval [−1, 1] is equal to 1 and is attained
for

x_k = cos(2kπ/n), k = 0, 1, . . . , ⌊n/2⌋; (6.266)

(iv) the minimal value of the polynomial T_n(x) on the interval [−1, 1] is equal to −1 and is attained
for

x_l = cos[(2l + 1)π/n], l = 0, 1, . . . , ⌊(n − 1)/2⌋. (6.267)
Demonstration. From de Moivre's formula,

(cos α + i sin α)^n = cos nα + i sin nα, n ∈ N*; (6.268)

considering

(cos α + i sin α)^n = cos^n α + iC_n^1 cos^{n−1} α sin α − C_n^2 cos^{n−2} α sin² α + · · · + i^n sin^n α, (6.269)

we obtain

cos nα = cos^n α − C_n^2 cos^{n−2} α sin² α + C_n^4 cos^{n−4} α sin⁴ α − · · · (6.270)

Choosing now

α = arccos x, (6.271)

hence

cos α = x, sin α = √(1 − x²), (6.272)

formula (6.270) leads to

T_n(x) = cos(n arccos x) = x^n − C_n^2 x^{n−2}(1 − x²) + C_n^4 x^{n−4}(1 − x²)² − · · · (6.273)

It follows that T_n is a polynomial of degree n at the most.

(i) On the other hand, the coefficient of x^n is given by

1 + C_n^2 + C_n^4 + · · · = 2^{n−1}, (6.274)

so that point (i) of the lemma is proved.
(ii) The equation

cos φ = 0 (6.275)

has the solutions

φ = (2k − 1)π/2, k ∈ Z. (6.276)

It follows that

T_n(x) = cos(n arccos x) = 0 (6.277)
^10 The polynomials are named after Pafnuty Lvovich Chebyshev (1821–1894), who introduced them in 1854.
if and only if

n arccos x = (2k − 1)π/2, x = cos[(2k − 1)π/(2n)], k ∈ Z. (6.278)

Giving the values 1, 2, 3, . . . , n to k, we get n distinct roots of the equation T_n(x) = 0,
that is,

x_1 = cos(π/(2n)), x_2 = cos(3π/(2n)), x_3 = cos(5π/(2n)), . . . , x_n = cos[(2n − 1)π/(2n)]. (6.279)

(iii) From (6.265) it follows that

−1 ≤ T_n(x) ≤ 1. (6.280)

The condition T_n(x) = 1 leads to

n arccos x = 2kπ, k ∈ Z, (6.281)

from which relation (6.266) is obtained immediately.

(iv) The proof is analogous to that of point (iii), the condition T_n(x) = −1 leading to

n arccos x = (2k + 1)π, k ∈ Z. (6.282)
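A quick numerical check of Lemma 6.2 — comparing cos(n arccos x) at the roots (6.279) and at the extremum abscissas (6.266)–(6.267) — can be written as follows; the script is ours, for illustration only.

import numpy as np

n = 6
roots = np.cos((2*np.arange(1, n + 1) - 1)*np.pi/(2*n))      # (6.279)
maxima = np.cos(2*np.arange(0, n//2 + 1)*np.pi/n)            # (6.266)
minima = np.cos((2*np.arange(0, (n - 1)//2 + 1) + 1)*np.pi/n)  # (6.267)

Tn = lambda x: np.cos(n*np.arccos(x))
print(np.allclose(Tn(roots), 0.0))     # True: the n roots of T_n
print(np.allclose(Tn(maxima), 1.0))    # True: T_n = +1 at the maxima
print(np.allclose(Tn(minima), -1.0))   # True: T_n = -1 at the minima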
Definition 6.15 The polynomials K_n(x) = 2^{1−n} T_n(x), x ∈ [−1, 1], are called Chebyshev's
polynomials.

Theorem 6.12 (Chebyshev)

(i) The deviation from zero of the polynomial

Q(x) = x^n + a_1 x^{n−1} + a_2 x^{n−2} + · · · + a_{n−1} x + a_n (6.283)

cannot be less than 2^{1−n} on the interval [−1, 1], and it is equal to 2^{1−n} only for Chebyshev's
polynomial K_n(x).

(ii) There exists a unique polynomial of degree n with dominant coefficient equal to 1, the
deviation of which from zero on the segment [−1, 1] is equal to 2^{1−n}, this polynomial being,
obviously, K_n(x).
Demonstration

(i) Let us suppose, per absurdum, that there exists a polynomial Q(x) of the form (6.283) for
which the deviation from zero is less than 2^{1−n}. This means that for any x ∈ [−1, 1] we have

−1/2^{n−1} < Q(x) < 1/2^{n−1} (6.284)

or, equivalently,

Q(x) − 1/2^{n−1} < 0, Q(x) + 1/2^{n−1} > 0. (6.285)

Let us consider the polynomial

P(x) = Q(x) − K_n(x). (6.286)

Because the coefficients of the terms of maximal degree are equal to 1 both for Q(x) and
for K_n(x), it follows that P(x) is a polynomial of degree n − 1 at the most. On the other
hand, from formulae (6.266) and (6.267) it follows that

P(1) = Q(1) − K_n(1) < 0,
P(cos(π/n)) = Q(cos(π/n)) − K_n(cos(π/n)) > 0,
P(cos(2π/n)) = Q(cos(2π/n)) − K_n(cos(2π/n)) < 0,
P(cos(3π/n)) = Q(cos(3π/n)) − K_n(cos(3π/n)) > 0, . . . (6.287)

This means that for x = 1, x = cos(2π/n), x = cos(4π/n), . . . , the polynomial P(x) is
negative, while for x = cos(π/n), x = cos(3π/n), . . . , the polynomial P(x) is positive.
It follows that the polynomial P(x) has at least one root between 1 and cos(π/n), at
least one root between x = cos(π/n) and x = cos(2π/n), . . . , and at least one root between
x = cos[(n − 1)π/n] and x = cos π = −1. Hence, the polynomial P(x) has at least n roots.
But P(x) is of degree n − 1 at the most. This means that P(x) ≡ 0, hence Q(x) = K_n(x).
(ii) Let us assume now, also per absurdum, that there exists a polynomial Q(x) of degree n at
the most, different from K_n(x), with dominant coefficient equal to 1, for which the deviation
from zero on the segment [−1, 1] is equal to 2^{1−n}. Let

P(x) = Q(x) − K_n(x), (6.288)

which obviously is a polynomial of degree n − 1 at the most. For the polynomial P(x) we
may state that it has nonpositive values at the points x = 1, x = cos(2π/n), x = cos(4π/n),
. . . , while at the points x = cos(π/n), x = cos(3π/n), . . . it has nonnegative ones. It
follows that on each interval [−1, cos((n − 1)π/n)], [cos((n − 1)π/n), cos((n − 2)π/n)],
. . . , [cos(3π/n), cos(2π/n)], [cos(2π/n), cos(π/n)], [cos(π/n), 1] the equation P(x) =
0 has at least one root. But, although we have n intervals, the number of roots of the
equation P(x) = 0 may seem to be less than n, because a root may be the common end of two
neighboring intervals. Let us consider such a case, for example, the case in which the common
root is x̄ = cos(π/n). This means that in the interval [cos(2π/n), 1] the equation P(x) = 0
has a single root, namely x̄. Because of this, it follows that the curve y = P(x) is tangent to
the Ox-axis at the point x̄. Indeed, if not, then the curve y = P(x) pierces the Ox-axis at
the point x̄, and P(x) becomes positive either on the interval (cos(2π/n), cos(π/n)) or on the
interval (cos(π/n), 1). But P(x) is a continuous function with P(cos(2π/n)) ≤ 0 and P(1) ≤ 0,
and hence the equation P(x) = 0 would have a second root in the interval [cos(2π/n), 1], which
is a contradiction; therefore the curve y = P(x) is tangent to the Ox-axis at the point
x̄. This means that x̄ is a double root of the equation P(x) = 0. Indeed, let us suppose that x̄
is not a double root of the equation P(x) = 0. Then the equation may be written in the
form

[x − cos(π/n)] P_1(x) = 0, (6.289)

where the polynomial P_1(x) is of degree n − 2 at the most and P_1(x̄) ≠ 0. But P_1(x) is
a continuous function, so that it has a constant sign in a neighborhood V of x̄. The
polynomial

P(x) = [x − cos(π/n)] P_1(x) (6.290)
then changes its sign on V, together with the factor x − cos(π/n); this means that the curve
y = P(x) pierces the axis Ox at the point x̄, which is not possible. Hence, if x̄ = cos(π/n)
is a root of the equation P(x) = 0, then it is at least a double one. It follows that on each
interval of the form [cos(kπ/n), cos((k − 1)π/n)], k = 1, n − 1, we count at least one root
of the equation P(x) = 0 (if it lies in the interior of the interval, it counts at least once; if it
lies at one of the common ends — excepting the ends −1 and 1, where there are no such
shared roots — it is at least a double one). The equation P(x) = 0 will thus have at least n
roots (distinct or not), P(x) being a polynomial of degree n − 1 at the most. It follows that
P(x) is identically the zero polynomial, and point (ii) of the theorem is proved.
6.13 MINI–MAX APPROXIMATION OF FUNCTIONS

Let the function f : [a, b] → R and its approximate g : [a, b] → R. We suppose that both f and g
are at least of class C^0 on the interval [a, b]. The mini–max principle requires that the approximation
function g satisfy the condition

max_{x∈[a,b]} |f(x) − g(x)| = minimum. (6.291)

Observation 6.22 Condition (6.291) is incomplete for at least one reason: the kind of function
we require for the approximate g is not specified. Usually, g is sought in the set of polynomial
functions.

Let us consider on the interval [a, b] a division formed by the points x_0, x_1, . . . , x_n, so that
x_i < x_{i+1}, i = 0, n − 1, and let g : [a, b] → R be the approximate of the function f, which we
seek in the form of a polynomial P_n(x) of degree n at the most. The mini–max principle given by
relation (6.291) is thus written in the form

max_{x∈[a,b]} |f(x) − P_n(x)| = minimum. (6.292)

In this case, the required polynomial P_n(x) will have the smallest deviation from the function f on
the interval [a, b]. We also require that the polynomial P_n(x) pass through the interpolation knots
x_i, that is,

P_n(x_i) = y_i, y_i = f(x_i), i = 0, n. (6.293)

In contrast to the interpolations considered until now, the interpolation knots are not known in
advance; we minimize error (6.292) by an adequate choice of knots. Lagrange's interpolation leads to

|f(x) − P_n(x)| = [|f^{(n+1)}(ξ)| / (n + 1)!] |(x − x_0)(x − x_1) · · · (x − x_n)|, (6.294)

where ξ is a point situated between a and b, while f is at least of class C^{n+1} on [a, b]. Let us
consider the product

R_{n+1}(x) = (x − x_0)(x − x_1) · · · (x − x_n) (6.295)

and let us make the change of variable

x = [(b − a)/2]u + (b + a)/2, (6.296)

so that the interval [a, b] is transformed into the interval [−1, 1]. It follows that

R_{n+1}(u) = [(b − a)/2]^{n+1} (u − u_0)(u − u_1) · · · (u − u_n). (6.297)
As we know from Chebyshev's polynomials, the minimum of the deviation from zero of the product
R_{n+1}(u), which is a polynomial of (n + 1)th degree in u, is realized if u_i, i = 0, n, are just the
zeros of Chebyshev's polynomial K_{n+1}(u). We may write

max_{u∈[−1,1]} |R_{n+1}(u)| ≥ [(b − a)/2]^{n+1} (1/2^n), (6.298)

and formula (6.294) leads to

max_{x∈[a,b]} |f(x) − P_n(x)| ≥ [|f^{(n+1)}(ξ)| / (n + 1)!] [(b − a)/2]^{n+1} (1/2^n). (6.299)

On the other hand, the roots of Chebyshev's polynomial K_{n+1}(u) are

u_0 = cos{[2(n + 1) − 1]π / [2(n + 1)]}, u_1 = cos{(2n − 1)π / [2(n + 1)]}, . . . , u_n = cos{π / [2(n + 1)]}, (6.300)

so that the interpolation knots will be

x_i = [(b − a)/2]u_i + (b + a)/2. (6.301)

Hence, it follows that among all the polynomials of degree n at the most, the one that minimizes
error (6.292) is the one constructed with the abscissas of the knots given by the roots of Chebyshev's
polynomial K_{n+1}(x), of degree n + 1.
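In code, the knots (6.300)–(6.301) for an arbitrary interval [a, b] can be generated as follows (a small sketch of ours; the Runge function of Example 6.7 serves only as test data).

import numpy as np

def chebyshev_knots(a, b, n):
    # interpolation knots (6.301): roots of K_{n+1} mapped to [a, b]
    k = np.arange(n + 1)
    u = np.cos((2*k + 1)*np.pi/(2*(n + 1)))   # roots of T_{n+1} on [-1, 1]
    return (b - a)/2*u + (b + a)/2

# e.g. knots for the Runge function 1/(1 + x**2) on [-1, 1]
x = chebyshev_knots(-1.0, 1.0, 10)
y = 1.0/(1.0 + x**2)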
6.14 ALMOST MINI–MAX APPROXIMATION OF FUNCTIONS

Let us give a somewhat new formulation to the mini–max optimization criterion. Instead of

max_{x∈[a,b]} |f(x) − P_n(x)| = minimum, (6.302)

where f is a real function defined on [a, b], at least of class C^0 on [a, b], while P_n(x) is a polynomial
of degree n at the most, we will require

max_{x∈[a,b]} |f(x) − P_n(x)| ≤ ε, (6.303)

where ε is a positive error imposed a priori. We reduce the problem to the interval [−1, 1], its
generality not being affected. We also suppose that f is analytic on [−1, 1], that is, f may be
expanded into a convergent power series

f(x) = Σ_{k=0}^{∞} b_k x^k. (6.304)
Lemma 6.3 The Chebyshev polynomials constitute a basis for the vector space of real
polynomials.

Demonstration. The idea consists in showing that every polynomial P(x) may be written as a linear
combination of the polynomials K_n(x), n ∈ N. The demonstration is made by induction on n.
The affirmation is true for n = 0, because 1 is Chebyshev's polynomial K_0(x). Let us suppose that
the affirmation holds for any monomial x^k, k ≤ n, and let us prove it for x^{n+1}. The polynomial
x^{n+1} − K_{n+1}(x) is of degree n at the most, for which we can write

x^{n+1} − K_{n+1}(x) = α_0 K_0(x) + α_1 K_1(x) + · · · + α_n K_n(x), (6.305)

with α_i ∈ R, i = 0, n. It follows that x^{n+1} can also be written as a combination of Chebyshev
polynomials and, by mathematical induction, the lemma is proved.
Taking into account Lemma 6.3, it follows that relation (6.304) may be written by means of the
Chebyshev polynomials as follows:

f(x) = Σ_{k=0}^{∞} a_k K_k(x). (6.306)

Truncating series (6.306) at k = n, we get

P_n(x) = Σ_{k=0}^{n} a_k K_k(x), (6.307)

and criterion (6.303) leads to

|f(x) − P_n(x)| = |Σ_{k=n+1}^{∞} a_k K_k(x)| ≤ Σ_{k=n+1}^{∞} |a_k||K_k(x)| ≤ Σ_{k=n+1}^{∞} |a_k| (1/2^{k−1}) < Σ_{k=n+1}^{∞} |a_k| < ε. (6.308)

Instead of the infinite sum Σ_{k=n+1}^{∞} |a_k| we usually consider the approximation Σ_{k=n+1}^{N} |a_k|,
so that condition (6.303) now reads

Σ_{k=n+1}^{N} |a_k| < ε. (6.309)
Definition 6.16 The polynomial Pn(x) thus obtained is called an almost mini–max polynomial for
the function f .
Observation 6.23
(i) The almost mini–max polynomial Pn(x) of the function f may be different from the
mini–max polynomial constructed in Section 6.13.
(ii) We know that the mini–max polynomial minimizes the error, but this minimal error is not
known. Using the almost mini–max polynomial, the error is less than ε > 0 imposed a
priori.
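The truncation criterion (6.309) is easy to apply numerically: expand f in a Chebyshev basis, then keep the smallest n whose discarded coefficients sum below ε. The sketch below (ours) uses numpy's Chebyshev helpers; note that numpy works with the classical T_k rather than the normalized K_k, for which |T_k(x)| ≤ 1, so the tail-sum bound still holds.

import numpy as np
from numpy.polynomial import chebyshev as C

f = np.exp
coef = C.chebinterpolate(f, 30)              # a_k in the basis T_k on [-1, 1]

eps = 1e-8
tail = np.cumsum(np.abs(coef[::-1]))[::-1]   # tail[k] = sum_{j>=k} |a_j|
n = int(np.argmax(tail < eps)) - 1           # smallest n with sum_{k>n} |a_k| < eps
# (assumes eps is attainable within the 30 computed coefficients)
p = C.Chebyshev(coef[:n + 1])

x = np.linspace(-1, 1, 5)
print(np.max(np.abs(f(x) - p(x))) < eps)     # True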
6.15 APPROXIMATION OF FUNCTIONS BY TRIGONOMETRIC FUNCTIONS (FOURIER)

Definition 6.17

(i) Let H be a fixed Hilbert space. We call basis in H a linearly independent system B = {e_i}_{i∈I}
of elements of H for which the Hilbert subspace generated by it is dense in H.

(ii) We call orthonormal basis in H (total or complete orthonormal system) any basis B of H
for which, for any two elements e_i and e_j of B,

⟨e_i, e_j⟩ = δ_{ij}, (6.310)

where ⟨·, ·⟩ is the scalar product on H, while δ_{ij} is Kronecker's symbol

δ_{ij} = 1 for i = j, 0 otherwise. (6.311)

(iii) Let H be a Hilbert space with an orthonormal basis B = {e_n}_{n≥1}. For an arbitrary u ∈ H,
we call generalized Fourier coefficients of u relative to B the numbers

c_n = ⟨u, e_n⟩, n ≥ 1, (6.312)

while the series Σ_{n≥1} c_n e_n is called the generalized Fourier series^11 of u relative to B.
Theorem 6.13 (Generalization of Dirichlet's Theorem) Let H be a Hilbert space with an
orthonormal basis B = {e_n}_{n≥1}. For any u ∈ H, its generalized Fourier series relative to B is
convergent in H, its sum being equal to u. The numerical series Σ_{n≥1} |c_n|² is convergent, its sum
being equal to ‖u‖².

Demonstration. We must show that

lim_{n→∞} ‖u − Σ_{i=1}^{n} c_i e_i‖ = 0, (6.313)

lim_{n→∞} (‖u‖² − Σ_{i=1}^{n} |c_i|²) = 0, (6.314)

respectively. Let

u_n = Σ_{i=1}^{n} c_i e_i, n ≥ 1, (6.315)

where the c_i given by equation (6.312) are the Fourier coefficients of u relative to the basis B. Let
k ∈ N, 1 ≤ k ≤ n, be arbitrary. We may write

⟨u_n, e_k⟩ = Σ_{i=1}^{n} c_i ⟨e_i, e_k⟩ = Σ_{i=1}^{n} c_i δ_{ik} = c_k = ⟨u, e_k⟩, (6.316)

that is,

⟨u_n − u, e_k⟩ = 0. (6.317)

Let n ≥ 1 be arbitrary but fixed, and let us denote by H_n the vector subspace of H generated by
the elements e_1, e_2, . . . , e_n. It follows that u_n − u ∈ H_n^⊥ for any n ≥ 1. But H_n is a subspace of
finite dimension (dim H_n = n), hence a closed set in H. Moreover, u_n is the projection of u on H_n.
Because, for any v ∈ H_n,

‖u − u_n‖² + ‖u_n − v‖² = ‖u − v‖², (6.318)
^11 The series is named after Jean Baptiste Joseph Fourier (1768–1830), who published his results in Mémoire sur la
propagation de la chaleur dans les corps solides in 1807 and then in Théorie analytique de la chaleur in 1822. The
first steps in this field were made by Leonhard Euler (1707–1783), Jean-Baptiste le Rond d'Alembert (1717–1783),
and Daniel Bernoulli (1700–1782).
corresponding to Pythagoras's theorem, it follows that

‖u − u_n‖ ≤ ‖u − v‖. (6.319)

Let ε > 0 be fixed. Because the subspace generated by B in H is dense, it follows that there exists
v ∈ H, a finite linear combination of elements of B, such that

‖u − v‖ < ε. (6.320)

It follows that there exists a natural number N(ε) such that v ∈ H_n for any n ≥ N(ε), and from
(6.319) and (6.320) we obtain

‖u − u_n‖ < ε (6.321)

for any n ≥ N(ε) too. We have shown that u_n → u in H, that is,

Σ_{i=1}^{∞} c_i e_i = u. (6.322)

On the other hand,

‖u_n‖² = ⟨u_n, u_n⟩ = ⟨Σ_{i=1}^{n} c_i e_i, Σ_{j=1}^{n} c_j e_j⟩ = Σ_{i=1}^{n} Σ_{j=1}^{n} c_i c_j δ_{ij} = Σ_{i=1}^{n} |c_i|², (6.323)

a relation valid for any n ≥ 1. Making n → ∞ and considering that ‖u_n‖ → ‖u‖, it follows that

Σ_{i=1}^{∞} |c_i|² = ‖u‖². (6.324)
Definition 6.18 Relation (6.324) is called the relation (or the equality) of Parseval.

Corollary 6.4

(i) If the basis B is fixed and u ∈ H, then the Fourier expansion of u is unique.

(ii) For any n ≥ 1 we have Bessel's inequality

Σ_{i=1}^{n} |c_i|² ≤ ‖u‖² (6.325)

and

lim_{n→∞} c_n = 0. (6.326)
(iii) Let H = L²_{[−π,π]}, that is, the space of real square integrable functions, on which the scalar
product

⟨f, g⟩ = ∫_{−π}^{π} f(x)g(x)dx (6.327)

has been defined, and let us consider as orthonormal basis in H the sequence

e_1 = 1/√(2π), e_2 = (1/√π) cos x, e_3 = (1/√π) sin x, e_4 = (1/√π) cos 2x, e_5 = (1/√π) sin 2x, . . . (6.328)

Under these conditions, for u : [−π, π] → R, u ∈ H, the generalized Fourier coefficients
of u relative to the orthonormal basis B = {e_n}_{n≥1} are

c_1 = √(π/2) a_0, c_2 = a_1√π, c_3 = b_1√π, c_4 = a_2√π, c_5 = b_2√π, . . . , (6.329)

where

a_n = (1/π) ∫_{−π}^{π} u(x) cos(nx)dx, b_n = (1/π) ∫_{−π}^{π} u(x) sin(nx)dx, n ≥ 0. (6.330)
Parseval's equality now reads

a_0²/2 + Σ_{i=1}^{∞} (a_i² + b_i²) = (1/π) ∫_{−π}^{π} u²(x)dx. (6.331)
(iv) (Dirichlet's theorem) If the periodic function f(x) of period 2π satisfies Dirichlet's condi-
tions in the interval (−π, π), that is,

(a) f is uniformly bounded on (−π, π), that is, there exists M > 0, finite, such that
|f(x)| ≤ M for any x ∈ (−π, π), and
(b) f has a finite number of strict extrema,

then, at each point of continuity x ∈ (−π, π), the function f(x) may be expanded into a
trigonometric Fourier series

f(x) = a_0/2 + Σ_{i=1}^{∞} [a_i cos(ix) + b_i sin(ix)], (6.332)

where the Fourier coefficients a_i and b_i are given by

a_i = (1/π) ∫_{−π}^{π} f(x) cos(ix)dx, i = 0, 1, 2, . . . , (6.333)

b_i = (1/π) ∫_{−π}^{π} f(x) sin(ix)dx, i = 1, 2, . . . , (6.334)

respectively. If x ∈ (−π, π) is a point of discontinuity for the function f(x), then the
sum S(x) of the Fourier series (6.332) attached to f reads

S(x) = [f(x − 0) + f(x + 0)]/2. (6.335)

At the ends, we have

S(−π) = S(π) = [f(−π + 0) + f(π − 0)]/2. (6.336)
Demonstration

(i) Let us suppose, per absurdum, that the expansion is not unique, that is,

u = Σ_{i=1}^{∞} c_i e_i and u = Σ_{i=1}^{∞} d_i e_i, (6.337)

where there exists at least one i ∈ N* such that c_i ≠ d_i. Let v_n = Σ_{i=1}^{n} d_i e_i. It follows that
⟨v_n, e_i⟩ = d_i for any i ≤ n; making n → ∞, because v_n → u, it also follows that ⟨u, e_i⟩ =
d_i, that is, d_i = c_i for any i ≥ 1, which contradicts the assumption.
(ii) The relations are obvious, taking into account Parseval's equality.

(iii) We successively have

c_1 = ⟨u, e_1⟩ = ∫_{−π}^{π} u(x) [1/√(2π)] dx = √(π/2) a_0, (6.338)

c_2 = ⟨u, e_2⟩ = ∫_{−π}^{π} u(x) (1/√π) cos x dx = √π a_1, (6.339)

c_3 = ⟨u, e_3⟩ = ∫_{−π}^{π} u(x) (1/√π) sin x dx = √π b_1, (6.340)

and, in general, all the requested relations are satisfied. Parseval's equality becomes

∫_{−π}^{π} u²(x)dx = Σ_{i=1}^{∞} |c_i|² = (π/2) a_0² + π Σ_{i=1}^{∞} (a_i² + b_i²) = π [a_0²/2 + Σ_{i=1}^{∞} (a_i² + b_i²)], (6.341)

that is, relation (6.331).

(iv) Obviously, a function f that satisfies Dirichlet's conditions is a function of L²_{[−π,π]}, and the
theorem is proved. At the points of discontinuity, the values of the Fourier series are given by
relations (6.335) and (6.336), respectively, which may or may not satisfy equality (6.332).
Observation 6.24

(i) If the function f(x) is even, f(x) = f(−x), then b_i = 0 for any i ∈ N*, and the Fourier
series becomes

f(x) = a_0/2 + Σ_{i=1}^{∞} a_i cos(ix), a_i = (2/π) ∫_{0}^{π} f(x) cos(ix)dx, i ∈ N. (6.342)

(ii) If the function f(x) is odd, f(−x) = −f(x), then a_i = 0, i ∈ N, and the Fourier series
reads

f(x) = Σ_{i=1}^{∞} b_i sin(ix), b_i = (2/π) ∫_{0}^{π} f(x) sin(ix)dx, i ∈ N*. (6.343)

(iii) If the function f(x) satisfies Dirichlet's conditions on the interval (−l, l), then we have the
expansion

f(x) = a_0/2 + Σ_{i=1}^{∞} [a_i cos(πix/l) + b_i sin(πix/l)], (6.344)

where

a_i = (1/l) ∫_{−l}^{l} f(x) cos(πix/l) dx, i = 0, 1, 2, . . . , (6.345)

b_i = (1/l) ∫_{−l}^{l} f(x) sin(πix/l) dx, i = 1, 2, 3, . . . (6.346)
(iv) If the function f(x) satisfies Dirichlet's conditions on a finite interval (a, b), then we make
the change of variable

x = αz + β, (6.347)

so that

a = −απ + β, b = απ + β, (6.348)

from which

β = (a + b)/2, α = (b − a)/(2π). (6.349)

Transformation (6.347) may thus be written as

x = [(b − a)/(2π)]z + (b + a)/2. (6.350)
Let us consider now the case in which the function f is given numerically, that is, we know the
values

y_i = f(x_i), (6.351)

with x_i, i = 0, n, division knots, x_i ∈ [−π, π]. We denote by S(x) the sum

S(x) = a_0/2 + Σ_{k=1}^{m} a_k cos(kx) + Σ_{k=1}^{m} b_k sin(kx). (6.352)

The coefficients a_j, j = 0, m, and b_j, j = 1, m, are determined by the condition of minimal error

ε_f = Σ_{i=0}^{n} [y_i − S(x_i)]² = minimum. (6.353)

There result the conditions

∂ε_f/∂a_j = 0, j = 0, m; ∂ε_f/∂b_j = 0, j = 1, m. (6.354)

Taking into account that

∂S(x_i)/∂a_0 = 1/2, ∂S(x_i)/∂a_j = cos(jx_i), ∂S(x_i)/∂b_j = sin(jx_i), j = 1, m, (6.355)

equation (6.353) and equation (6.354) lead to the system

Σ_{i=0}^{n} y_i = Σ_{i=0}^{n} S(x_i),
Σ_{i=0}^{n} y_i cos(jx_i) = Σ_{i=0}^{n} S(x_i) cos(jx_i),
Σ_{i=0}^{n} y_i sin(jx_i) = Σ_{i=0}^{n} S(x_i) sin(jx_i),   j = 1, m. (6.356)

The system is compatible if

n + 1 ≥ 2m + 1. (6.357)
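System (6.356) is linear in the coefficients, so in practice one assembles the design matrix of the basis 1/2, cos jx, sin jx and solves it in the least-squares sense; a small sketch of ours follows.

import numpy as np

def trig_fit(x, y, m):
    # least-squares coefficients a_0..a_m, b_1..b_m of (6.352)
    cols = [0.5*np.ones_like(x)]                       # dS/da_0 = 1/2
    cols += [np.cos(j*x) for j in range(1, m + 1)]     # dS/da_j
    cols += [np.sin(j*x) for j in range(1, m + 1)]     # dS/db_j
    A = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:m + 1], coef[m + 1:]                  # a, b

# requires n + 1 >= 2m + 1 sample points, cf. (6.357)
x = np.linspace(-np.pi, np.pi, 41)
a, b = trig_fit(x, np.sign(x), m=5)                    # square-wave-like data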
Figure 6.2 Discrete approximation by the least squares.
6.16 APPROXIMATION OF FUNCTIONS BY THE LEAST SQUARES

An idea for constructing the approximation function g(x) for a given function f(x) is that of writing
the approximate in the form of a finite linear combination of certain functions^12 Φ = {φ_i}_{i=1,n} that
satisfy certain properties. Under these conditions, the approximate g(x) will be of the form

g(x) = Σ_{i=1}^{n} c_i φ_i(x), (6.358)

where c_i, i = 1, n, are real constants. Thus, once the set Φ is chosen, the problem is reduced to the
determination of the constants c_i, i = 1, n. These constants result from the condition that the graph
of the approximate g(x) be sufficiently near the set M = {(x_i, y_i), i = 1, N}. The nearness of the
approximate g(x) to the set M is measured by means of a norm, which usually is

‖f‖₂ = [∫_{a}^{b} f²(x)dx]^{1/2} (6.359)

for f ∈ C^0([a, b]) and

‖f‖₂ = [Σ_{i=0}^{n} |f(x_i)|²]^{1/2} (6.360)

for the discrete case, respectively.

The problem of approximation of a given function f by a linear combination of the functions of
the set Φ may be seen as a problem of determination of the constants c_i, i = 1, n, which realize

‖f − Σ_{i=1}^{n} c_i φ_i‖ = minimum. (6.361)

Definition 6.19 If the norm in relation (6.361) is one of the norms (6.359) or (6.360), then the
approximation of the function f(x) by

g(x) = Σ_{i=1}^{n} c_i φ_i(x) (6.362)

is called approximation by least squares.
^12 The first description of the least squares method was given by Carl Friedrich Gauss (1777–1855) in Theoria
motus corporum coelestium in sectionibus conicis Solem ambientum in 1809.
Let us suppose, at the beginning, that we have a sequence of values (x_i, y_i), i = 0, N, as a
result of the application of an unknown function f(x) at the distinct values x_i, i = 0, N (Fig. 6.2).
We require the straight line that realizes the best approximation. The problem is thus reduced to the
minimization of the function

E(a, b) = Σ_{i=0}^{N} [y_i − (ax_i + b)]², (6.363)

where a and b are the parameters of the straight line

(d): y = ax + b. (6.364)

For minimizing expression (6.363), it is necessary that

∂E(a, b)/∂a = 0, ∂E(a, b)/∂b = 0 (6.365)

or, otherwise,

∂/∂a Σ_{i=0}^{N} [y_i − (ax_i + b)]² = 0, ∂/∂b Σ_{i=0}^{N} [y_i − (ax_i + b)]² = 0. (6.366)

System (6.366) is equivalent to

a Σ_{i=0}^{N} x_i² + b Σ_{i=0}^{N} x_i = Σ_{i=0}^{N} x_i y_i, a Σ_{i=0}^{N} x_i + b(N + 1) = Σ_{i=0}^{N} y_i (6.367)
and has the solution

a = [(N + 1) Σ_{i=0}^{N} x_i y_i − Σ_{i=0}^{N} x_i Σ_{i=0}^{N} y_i] / [(N + 1) Σ_{i=0}^{N} x_i² − (Σ_{i=0}^{N} x_i)²],

b = [Σ_{i=0}^{N} x_i² Σ_{i=0}^{N} y_i − Σ_{i=0}^{N} x_i y_i Σ_{i=0}^{N} x_i] / [(N + 1) Σ_{i=0}^{N} x_i² − (Σ_{i=0}^{N} x_i)²]. (6.368)

Considering that d²E(a, b) is everywhere positive definite, it follows that the function E(a, b) is
convex; hence, the critical point given by relation (6.368) is a point of global minimum.
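Formula (6.368) translates directly into code; the following sketch of ours also checks it against numpy.polyfit.

import numpy as np

def fit_line(x, y):
    # slope a and intercept b of the least-squares line, formula (6.368)
    N1 = len(x)                        # N + 1 points
    sx, sy = x.sum(), y.sum()
    sxx, sxy = (x*x).sum(), (x*y).sum()
    den = N1*sxx - sx*sx
    a = (N1*sxy - sx*sy)/den
    b = (sxx*sy - sxy*sx)/den
    return a, b

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 1.9, 3.2, 3.8])
print(fit_line(x, y))                  # ≈ np.polyfit(x, y, 1)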
Let us pass now to the general case, in which the approximate g is a polynomial of nth degree,

g(x) = a_0 + a_1 x + a_2 x² + · · · + a_n x^n, (6.369)

with n < N. The problem is obviously reduced to the determination of the coefficients a_0, a_1, . . . ,
a_n, which minimize the expression

E(a_l) = Σ_{i=0}^{N} [y_i − g(x_i)]²
       = Σ_{i=0}^{N} y_i² − 2 Σ_{i=0}^{N} (Σ_{j=0}^{n} a_j x_i^j) y_i + Σ_{i=0}^{N} (Σ_{j=0}^{n} a_j x_i^j)²
       = Σ_{i=0}^{N} y_i² − 2 Σ_{j=0}^{n} a_j Σ_{i=0}^{N} y_i x_i^j + Σ_{j=0}^{n} Σ_{k=0}^{n} a_j a_k Σ_{i=0}^{N} x_i^{j+k}. (6.370)
To minimize E(a_l) it is necessary that

∂E/∂a_l = 0 for l = 0, n. (6.371)

There result the equations

∂E/∂a_j = −2 Σ_{i=0}^{N} y_i x_i^j + 2 Σ_{k=0}^{n} a_k Σ_{i=0}^{N} x_i^{j+k} = 0, j = 0, n. (6.372)

We obtain the determined compatible system

a_0 Σ_{i=0}^{N} x_i^0 + a_1 Σ_{i=0}^{N} x_i^1 + a_2 Σ_{i=0}^{N} x_i^2 + · · · + a_n Σ_{i=0}^{N} x_i^n = Σ_{i=0}^{N} y_i x_i^0,
a_0 Σ_{i=0}^{N} x_i^1 + a_1 Σ_{i=0}^{N} x_i^2 + a_2 Σ_{i=0}^{N} x_i^3 + · · · + a_n Σ_{i=0}^{N} x_i^{n+1} = Σ_{i=0}^{N} y_i x_i^1, . . . ,
a_0 Σ_{i=0}^{N} x_i^n + a_1 Σ_{i=0}^{N} x_i^{n+1} + a_2 Σ_{i=0}^{N} x_i^{n+2} + · · · + a_n Σ_{i=0}^{N} x_i^{2n} = Σ_{i=0}^{N} y_i x_i^n. (6.373)

Because the error is a convex function, it follows that the solution of system (6.373) is a point of
global minimum.
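A compact way to assemble and solve system (6.373) is sketched below (our code; note that for high degrees the normal equations are ill-conditioned, and an orthogonal method such as numpy.polyfit is preferable).

import numpy as np

def polyfit_normal(x, y, n):
    # coefficients a_0..a_n from the normal equations (6.373)
    # G[j, k] = sum_i x_i**(j+k),  r[j] = sum_i y_i x_i**j
    p = np.array([np.sum(x**m) for m in range(2*n + 1)])
    G = np.array([[p[j + k] for k in range(n + 1)] for j in range(n + 1)])
    r = np.array([np.sum(y*x**j) for j in range(n + 1)])
    return np.linalg.solve(G, r)

x = np.linspace(0, 1, 20)
y = 1 + 2*x - 3*x**2
print(polyfit_normal(x, y, 2))   # ≈ [1, 2, -3]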
6.17 OTHER METHODS OF INTERPOLATION

6.17.1 Interpolation with Rational Functions

The interpolation with polynomials has at least one disadvantage: for x → ±∞, the values of the
polynomials become infinite too.

Many times we know, in practice, some information about the real function concerning its
behavior at ±∞; for instance, that it has a certain oblique asymptote, that it is bounded, and so on.
For this reason, we may choose as approximation function a rational one,

R(x) = P(x)/Q(x), (6.374)

where P and Q are polynomials of mth and nth degree, respectively. We may write

R(x) = (a_0 x^m + a_1 x^{m−1} + · · · + a_m) / (b_0 x^n + b_1 x^{n−1} + · · · + b_n). (6.375)

Because b_0 may be taken as a common factor, we may choose b_0 = 1, such that expression (6.375)
takes the form

R(x) = (a_0 x^m + a_1 x^{m−1} + · · · + a_m) / (x^n + b_1 x^{n−1} + · · · + b_n). (6.376)

We have m + n + 1 unknown coefficients (a_0, . . . , a_m, b_1, . . . , b_n) to determine in relation
(6.376), so that m + n + 1 division points are necessary.

If, for instance, we know that the function has an oblique asymptote of the form y = cx + d,
then we obtain the values m = n + 1, a_0 = c, a_1 − b_1 c = d, the number of division points necessary
to determine the coefficients thus being reduced.
6.17.2 The Method of Least Squares with Rational Functions

We may also give in this case a criterion of optimization, that is,

Σ_{i=1}^{N} [y_i − P(x_i)/Q(x_i)]² = minimum. (6.377)

Proceeding as with the method of least squares, it follows that the coefficients of the polynomials
P(x) and Q(x) will be determined by equations of the form

Σ_{i=1}^{N} {[y_i − P(x_i)/Q(x_i)] [1/Q(x_i)] ∂P(x_i)/∂a_j} = 0,
Σ_{i=1}^{N} {[y_i − P(x_i)/Q(x_i)] [P(x_i)/Q²(x_i)] ∂Q(x_i)/∂b_k} = 0, (6.378)

where j = 0, m, k = 1, n, while N is the number of the division points at which we know the
values of the function.

Unfortunately, system (6.378) is a nonlinear one, so that the calculation of the coefficients of the
polynomials P(x) and Q(x) becomes difficult.
6.17.3 Interpolation with Exponentials

We may require an approximate of the function f(x) in the form

g(x) = C_1 e^{α_1 x} + C_2 e^{α_2 x} + · · · + C_p e^{α_p x}, (6.379)

thus introducing 2p unknowns C_i, α_i, i = 1, p. These unknowns are deduced from the conditions

f(x_i) = y_i = g(x_i), i = 0, 2p − 1. (6.380)

Two cases may occur:

(i) The exponents are known, that is, we know the values α_i, i = 1, p. In this case, because
the exponentials are linearly independent, we obtain a linear system of p equations with p
unknowns C_i, i = 1, p, compatible and determined, of the form

C_1 e^{α_1 x_1} + C_2 e^{α_2 x_1} + · · · + C_p e^{α_p x_1} = y_1,
C_1 e^{α_1 x_2} + C_2 e^{α_2 x_2} + · · · + C_p e^{α_p x_2} = y_2, . . . ,
C_1 e^{α_1 x_p} + C_2 e^{α_2 x_p} + · · · + C_p e^{α_p x_p} = y_p. (6.381)

(ii) The exponents are unknown. If the division points are equidistant, then we apply Prony's
method.^13 To do this, we observe that the exponential

e^{α_i j} = (e^{α_i})^j = ρ_i^j (6.382)

satisfies, for any i = 0, k − 1, a relation of the form

y(j + k) + C_{k−1} y(j + k − 1) + C_{k−2} y(j + k − 2) + · · · + C_0 y(j) = 0, (6.383)
^13 The method was introduced by Gaspard Clair François Marie Riche de Prony (1755–1839) in 1795.
where we have supposed that the division points are x_j = j − 1; this can always be arranged by
a scaling of the Ox-axis. In relation (6.383), the coefficients C_i, i = 0, k − 1, are constant real
numbers. The characteristic equation is of the form

ρ^k + C_{k−1} ρ^{k−1} + C_{k−2} ρ^{k−2} + · · · + C_0 = 0. (6.384)

We remark that the original function f(x) satisfies equation (6.383), that is,

f(j + k) + C_{k−1} f(j + k − 1) + C_{k−2} f(j + k − 2) + · · · + C_0 f(j) = 0, j = 1, n − k. (6.385)

From the last relation there result the values of the constants C_0, . . . , C_{k−1}; from relation
(6.384) we then obtain the roots ρ_0, . . . , ρ_{k−1}, hence the exponents α_i = ln ρ_i, the interpolation
function being of the form

g(x) = A_0 e^{x ln ρ_0} + A_1 e^{x ln ρ_1} + · · · + A_{k−1} e^{x ln ρ_{k−1}}, (6.386)

with the amplitudes A_i determined as in case (i). If certain parameters are imposed, for example,
if we know α_0, then the number of unknowns diminishes, so that equation (6.384) has an imposed
root ρ_0 = e^{α_0}.
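A minimal Prony sketch of ours: estimate the recurrence coefficients C_i from (6.385) by least squares, take the roots of (6.384) as the ρ_i, and recover the amplitudes from a linear system as in case (i). The function name and test data are our own choices.

import numpy as np

def prony(y, p):
    # fit y_j = sum_i A_i * rho_i**j, j = 0..len(y)-1, with p exponentials
    y = np.asarray(y, dtype=float)
    n = len(y)
    # recurrence (6.385): C_0 y[t] + ... + C_{p-1} y[t+p-1] = -y[t+p]
    M = np.column_stack([y[j:n - p + j] for j in range(p)])
    C, *_ = np.linalg.lstsq(M, -y[p:], rcond=None)
    # characteristic equation (6.384); np.roots wants highest degree first
    rho = np.roots(np.concatenate(([1.0], C[::-1])))
    # amplitudes from the linear system (6.381) on the sample grid
    V = np.vander(rho, n, increasing=True).T       # V[j, i] = rho_i**j
    A, *_ = np.linalg.lstsq(V, y, rcond=None)
    return A, rho

# two known exponentials, recovered exactly (ordering of roots may vary)
j = np.arange(8)
y = 2.0*0.5**j + 1.0*0.9**j
A, rho = prony(y, 2)   # A ≈ [2, 1], rho ≈ [0.5, 0.9]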
6.18 NUMERICAL EXAMPLES

Example 6.1 Let us consider the following table of data:

x_i    y_i = f(x_i)
0      −2
1      −3
2      −16
3      −35
4      −30
We pose the problem of determining the Lagrange interpolation polynomial. From the relation

L_4(x) = Σ_{i=0}^{4} [Π_{j=0, j≠i}^{4} (x − x_j)/(x_i − x_j)] y_i, (6.387)

we deduce

L_4(x) = [(x − 1)(x − 2)(x − 3)(x − 4)] / [(0 − 1)(0 − 2)(0 − 3)(0 − 4)] (−2)
       + [x(x − 2)(x − 3)(x − 4)] / [(1 − 0)(1 − 2)(1 − 3)(1 − 4)] (−3)
       + [x(x − 1)(x − 3)(x − 4)] / [(2 − 0)(2 − 1)(2 − 3)(2 − 4)] (−16)
       + [x(x − 1)(x − 2)(x − 4)] / [(3 − 0)(3 − 1)(3 − 2)(3 − 4)] (−35)
       + [x(x − 1)(x − 2)(x − 3)] / [(4 − 0)(4 − 1)(4 − 2)(4 − 3)] (−30)
       = x⁴ − 5x³ + 2x² + x − 2. (6.388)
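This result is easy to verify numerically, for instance with SciPy's lagrange helper (a check of ours, not part of the original text):

import numpy as np
from scipy.interpolate import lagrange

x = np.array([0, 1, 2, 3, 4])
y = np.array([-2, -3, -16, -35, -30])
print(lagrange(x, y))   # poly1d: x**4 - 5 x**3 + 2 x**2 + x - 2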
Example 6.2 Let the function f : [−1, ∞) → [0, ∞),

f(x) = √(x + 1). (6.389)

We wish to determine approximations of √1.1 and √0.89 by means of the expansion of the
function f into a Taylor series. Because

f'(x) = (x + 1)^{−1/2}/2, f''(x) = −(x + 1)^{−3/2}/2², f'''(x) = (1 × 3/2³)(x + 1)^{−5/2}, . . . ,
f^{(n)}(x) = [(−1)^{n+1}(2n − 3)!!/2^n] (x + 1)^{(1−2n)/2}, n ≥ 2, (6.390)

we deduce

f'(0) = 1/2, f''(0) = −1/2², f'''(0) = 1 × 3/2³, . . . , f^{(n)}(0) = (−1)^{n+1}(2n − 3)!!/2^n, n ≥ 2, (6.391)
obtaining the expansion into a Taylor series around the origin

f(x) = f(0) + (1/2)(x/1!) + Σ_{k=2}^{n} (x^k/k!) (−1)^{k+1}(2k − 3)!!/2^k
     + [x^{n+1}/(n + 1)!] (−1)^{n+2}(2n − 1)!!/2^{n+1} (1 + ξ)^{−(1+2n)/2}, (6.392)

where ξ is a point situated between 0 and x.

For an approximate calculation of √1.1 we have x = 0.1, and it follows that

f(0.1) ≈ f(0) = 1, (6.393)

f(0.1) ≈ f(0) + 0.1/(2 × 1!) = 1.05, (6.394)

f(0.1) ≈ f(0) + 0.1/(2 × 1!) − 0.1²/(2² × 2!) = 1.04875, (6.395)

f(0.1) ≈ f(0) + 0.1/(2 × 1!) − 0.1²/(2² × 2!) + 0.1³ × 3/(2³ × 3!) = 1.0488125, (6.396)

f(0.1) ≈ f(0) + 0.1/(2 × 1!) − 0.1²/(2² × 2!) + 0.1³ × 3/(2³ × 3!) − 0.1⁴ × 3 × 5/(2⁴ × 4!) = 1.048808594. (6.397)

The exact value is

√1.1 = 1.048808848, (6.398)

so that approximation (6.397) gives six exact decimal digits.
For √0.89 we must take x = −0.11, and we obtain

f(−0.11) ≈ f(0) = 1, (6.399)

f(−0.11) ≈ f(0) − 0.11/(2 × 1!) = 0.945, (6.400)

f(−0.11) ≈ f(0) − 0.11/(2 × 1!) − 0.11²/(2² × 2!) = 0.9434875, (6.401)

f(−0.11) ≈ f(0) − 0.11/(2 × 1!) − 0.11²/(2² × 2!) − 0.11³ × 3/(2³ × 3!) = 0.943404312, (6.402)

f(−0.11) ≈ f(0) − 0.11/(2 × 1!) − 0.11²/(2² × 2!) − 0.11³ × 3/(2³ × 3!) − 0.11⁴ × 3 × 5/(2⁴ × 4!) = 0.943398593. (6.403)

On the other hand,

√0.89 = 0.943398113, (6.404)

and hence approximation (6.403), which uses the first four derivatives of the function f, leads to
six exact decimal digits.
Example 6.3 For the function f : [−1, 3] → R we know the following values:

x_i    y_i = f(x_i)
−1     6
0      3
1      −2
2      9
3      78

We wish to determine approximate values for f(−0.9) and f(2.8) using the forward and backward
Newton interpolation polynomials, respectively. To do this, we construct Table 6.5 of finite
differences.

In the case of the forward Newton polynomial, the value of q is given by

q = (x − x_0)/h = (−0.9 + 1)/1 = 0.1 (6.405)

and we have

P(q) = y_0 + (q/1!)Δy_0 + [q(q − 1)/2!]Δ²y_0 + [q(q − 1)(q − 2)/3!]Δ³y_0 + [q(q − 1)(q − 2)(q − 3)/4!]Δ⁴y_0, (6.406)

TABLE 6.5 Table of Finite Differences

x_i    y_i    Δy_i    Δ²y_i    Δ³y_i    Δ⁴y_i
−1     6      −3      −2       18       24
0      3      −5      16       42
1      −2     11      58
2      9      69
3      78
from which

f(−0.9) ≈ P(0.1) = 5.8071. (6.407)

For the backward Newton polynomial we may write

q = (x − x_n)/h = (2.8 − 3)/1 = −0.2, (6.408)

P(q) = y_4 + (q/1!)Δy_3 + [q(q + 1)/2!]Δ²y_2 + [q(q + 1)(q + 2)/3!]Δ³y_1 + [q(q + 1)(q + 2)(q + 3)/4!]Δ⁴y_0, (6.409)

hence

f(2.8) ≈ P(−0.2) = 56.7376. (6.410)
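The finite-difference table and formula (6.406) are straightforward to program; a sketch of ours reproducing the forward value follows.

import numpy as np
from math import factorial

def forward_differences(y):
    # columns Δ^k y of the finite-difference table; first entries give Δ^k y_0
    table = [np.asarray(y, dtype=float)]
    while len(table[-1]) > 1:
        table.append(np.diff(table[-1]))
    return table

def newton_forward(x0, h, y, x):
    q = (x - x0)/h
    s, prod = 0.0, 1.0
    for k, col in enumerate(forward_differences(y)):
        s += prod*col[0]/factorial(k)   # q(q-1)...(q-k+1)/k! * Δ^k y_0
        prod *= (q - k)
    return s

y = [6, 3, -2, 9, 78]
print(newton_forward(-1.0, 1.0, y, -0.9))   # 5.8071
# the backward value f(2.8) ≈ 56.7376 follows from (6.409) with Δ^k y_{n-k}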
Example 6.4 Let the function f : [−3, 3] → R be given by the following table of values:

x_i    y_i = f(x_i)
−3     68
−2     42
−1     18
0      2
1      0
2      18
3      62

We wish to obtain an approximate value for f(0.5). We construct Table 6.6 of finite differences.
We have

x_0 = 0, x_{−1} = −1, x_{−2} = −2, x_{−3} = −3, x_1 = 1, x_2 = 2, x_3 = 3, (6.411)

h = 1, q = (x − x_0)/h = 0.5. (6.412)

TABLE 6.6 Table of Finite Differences

x_i    y_i = f(x_i)    Δy_i    Δ²y_i    Δ³y_i    Δ⁴y_i    Δ⁵y_i    Δ⁶y_i
−3     68              −26     2        6        0        0        0
−2     42              −24     8        6        0        0
−1     18              −16     14       6        0
0      2               −2      20       6
1      0               18      26
2      18              44
3      62
If we apply Gauss's first formula, then we obtain

f(0.5) ≈ y_0 + (q/1!)Δy_0 + [q(q − 1)/2!]Δ²y_{−1} + [(q + 1)q(q − 1)/3!]Δ³y_{−1}
       + [(q + 1)q(q − 1)(q − 2)/4!]Δ⁴y_{−2} + [(q + 2)(q + 1)q(q − 1)(q − 2)/5!]Δ⁵y_{−2}
       + [(q + 2)(q + 1)q(q − 1)(q − 2)(q − 3)/6!]Δ⁶y_{−3}
       = −1.125. (6.413)

The use of Gauss's second formula leads to the relation

f(0.5) ≈ y_0 + (q^{(1)}/1!)Δy_{−1} + [(q + 1)^{(2)}/2!]Δ²y_{−1} + [(q + 1)^{(3)}/3!]Δ³y_{−2}
       + [(q + 2)^{(4)}/4!]Δ⁴y_{−2} + [(q + 2)^{(5)}/5!]Δ⁵y_{−3} + [(q + 3)^{(6)}/6!]Δ⁶y_{−3}
       = −1.125, (6.414)

where t^{(k)} = t(t − 1) · · · (t − k + 1) denotes the factorial power. Analogously, we may use the
formulae of Stirling, Bessel, or Everett.
Example 6.5 Let us consider the function f : [0, 1] → R,

f(x) = e^x, (6.415)

as well as the intermediary points

x_0 = 0, x_1 = 0.5, x_2 = 1. (6.416)

The values of the function f at these points are

f(0) = 1, f(0.5) = 1.64872, f(1) = 2.71828. (6.417)

If we wish to determine the natural cubic spline interpolation function, then we calculate
successively

α_1 = 3[f(x_2)(x_1 − x_0) − f(x_1)(x_2 − x_0) + f(x_0)(x_2 − x_1)] / [(x_2 − x_1)(x_1 − x_0)] = 2.52504, (6.418)

β_0 = 1, γ_0 = 0, δ_0 = 0, (6.419)

β_1 = 2(x_2 − x_0) − (x_1 − x_0)γ_0 = 2, (6.420)

γ_1 = (x_2 − x_1)/β_1 = 0.25, (6.421)

δ_1 = [α_1 − (x_1 − x_0)δ_0]/β_1 = 1.26252, (6.422)

β_2 = 1, δ_2 = 0, c_2 = 0, (6.423)
c_1 = δ_1 − γ_1 c_2 = 1.26252, (6.424)

b_1 = [f(x_2) − f(x_1)]/(x_2 − x_1) − (x_2 − x_1)(c_2 + 2c_1)/3 = 1.71828, (6.425)

d_1 = (c_2 − c_1)/[3(x_2 − x_1)] = −0.84168, (6.426)

b_0 = [f(x_1) − f(x_0)]/(x_1 − x_0) − (x_1 − x_0)(c_1 + 2c_0)/3 = 1.08702, (6.427)

d_0 = (c_1 − c_0)/[3(x_1 − x_0)] = 0.84168. (6.428)

We obtain the natural cubic spline interpolation function in the form

S(x) = { 1 + 1.08702x + 0.84168x³, for x ∈ [0, 0.5],
       { 1.64872 + 1.71828(x − 0.5) + 1.26252(x − 0.5)² − 0.84168(x − 0.5)³, for x ∈ [0.5, 1]. (6.429)

If we wish to determine the cubic spline interpolation function with imposed boundary conditions,
then we must take into account that

f'(0) = 1, f'(0.5) = 1.64872, f'(1) = 2.71828, (6.430)

obtaining the answer

S(x) = { 1 + x + 0.48895x² + 0.21188x³, for x ∈ [0, 0.5],
       { 1.64872 + 1.64785(x − 0.5) + 0.80677(x − 0.5)² + 0.35155(x − 0.5)³, for x ∈ [0.5, 1]. (6.431)
Example 6.6 Let us consider the function f : [0, 4] → R,

f(x) = sin x / (3 + x + sin x), (6.432)

and the interpolation knots

x_0 = 0, x_1 = 1, x_2 = 2, x_3 = 3, x_4 = 4. (6.433)

If we interpolate this function by polynomials, then the limit at infinity of any such polynomial
is ±∞, in contradiction to

lim_{x→±∞} f(x) = 0. (6.434)

We therefore realize the interpolation by rational functions; let

R(x) = P(x)/Q(x) (6.435)

be such a function. From relation (6.434), we deduce

deg P < deg Q, (6.436)

and, because we have five interpolation points, we may take

P(x) = a_1 x + a_0, Q(x) = b_2 x² + b_1 x + b_0, (6.437)

with b_2 ≠ 0, a_i ∈ R, i = 0, 1, b_i ∈ R, i = 0, 2. It follows the linear system

a_0/b_0 = f(0) = 0,
(a_1 + a_0)/(b_2 + b_1 + b_0) = f(1) = 0.17380,
(2a_1 + a_0)/(4b_2 + 2b_1 + b_0) = f(2) = 0.15388,
(3a_1 + a_0)/(9b_2 + 3b_1 + b_0) = f(3) = 0.02298,
(4a_1 + a_0)/(16b_2 + 4b_1 + b_0) = f(4) = −0.12122, (6.438)

which is equivalent to

a_0 = 0, a_1 − 0.17380b_0 − 0.17380b_1 − 0.17380b_2 = 0,
2a_1 − 0.15388b_0 − 0.30776b_1 − 0.61552b_2 = 0,
3a_1 − 0.02298b_0 − 0.06894b_1 − 0.20682b_2 = 0,
4a_1 + 0.12122b_0 + 0.48488b_1 + 1.93952b_2 = 0. (6.439)

System (6.439) is compatible and indeterminate. We shall determine its general solution; to do this,
we set a_1 = λ, where λ is a real parameter. It follows the system

0.15388b_0 + 0.30776b_1 + 0.61552b_2 = 2λ, 0.02298b_0 + 0.06894b_1 + 0.20682b_2 = 3λ,
0.12122b_0 + 0.48488b_1 + 1.93952b_2 = −4λ, (6.440)

with the solution

b_0 = −1065.4λ, b_1 = 820.29λ, b_2 = −140.55λ. (6.441)

We deduce

R(x) = λx / (−140.55λx² + 820.29λx − 1065.4λ) = x / (−140.55x² + 820.29x − 1065.4), λ ≠ 0. (6.442)
Example 6.7 Let us consider the function f : [−1, 1] → R,

f(x) = 1/(1 + x²), (6.443)

called the Runge function,^14 for which we choose two systems of interpolation knots. The first
system contains four equidistant interpolation knots, that is,

x_0 = −1, x_1 = −1/3, x_2 = 1/3, x_3 = 1, (6.444)

while the second system has as interpolation knots the roots of the Chebyshev polynomial
K_4(x), that is,

x_0 = −√(2 + √2)/2, x_1 = −√(2 − √2)/2, x_2 = √(2 − √2)/2, x_3 = √(2 + √2)/2. (6.445)

^14 The function was presented by Carl David Tolmé Runge (1856–1927) in 1901.
TABLE 6.7 The Values of the Interpolation Knots and of Function (6.443) at These Knots

Equidistant knots                Chebyshev knots
x                  f(x)          x                   f(x)
x_0 = −1           0.5           x_0 = −0.9238795    0.5395043
x_1 = −0.3333333   0.9           x_1 = −0.3826834    0.8722604
x_2 = 0.3333333    0.9           x_2 = 0.3826834     0.8722604
x_3 = 1            0.5           x_3 = 0.9238795     0.5395043
We shall construct the interpolation polynomials corresponding to each system of knots and shall
verify that the deviation is minimum in the case of the second system of interpolation knots, for
various numbers of interpolation knots.

The Lagrange polynomial that passes through the interpolation knots z_i, i = 0, 3, reads

L_3(x) = [(x − z_1)(x − z_2)(x − z_3)] / [(z_0 − z_1)(z_0 − z_2)(z_0 − z_3)] y_0
       + [(x − z_0)(x − z_2)(x − z_3)] / [(z_1 − z_0)(z_1 − z_2)(z_1 − z_3)] y_1
       + [(x − z_0)(x − z_1)(x − z_3)] / [(z_2 − z_0)(z_2 − z_1)(z_2 − z_3)] y_2
       + [(x − z_0)(x − z_1)(x − z_2)] / [(z_3 − z_0)(z_3 − z_1)(z_3 − z_2)] y_3, (6.446)

where y_i = f(z_i), i = 0, 3. We construct Table 6.7 with the values of the interpolation knots and
of function (6.443) at these knots.

The Lagrange polynomial for the first system of interpolation knots reads

L_3^{(1)}(x) = −0.45x² + 0.95. (6.447)

The Lagrange polynomial for the second system of interpolation knots is

L_3^{(2)}(x) = −0.4705883x² + 0.9411765. (6.448)
In general, calculating the values of the function f and of the polynomials L_n^{(1)}(x) and L_n^{(2)}(x)
on the interval [−1, 1] with the step Δx = 0.001, we have determined the values in Table 6.8.
We have denoted by ε_eq the maximum deviation for equidistant knots, by P_eq the points at
which this deviation takes place, by ε_Ch the maximum deviation with Chebyshev knots, and by P_Ch
the points at which the maximum deviation with Chebyshev knots takes place.

We observe that for the interpolation knots given by the roots of the Chebyshev polynomial the
error is stable at values of order 10^{−15}, while for equidistant interpolation knots the error is
unbounded; thus, the oscillatory character of the polynomials of high degree is established.
6.19 APPLICATIONS

Problem 6.1

Let us consider the planar linkage in Figure 6.3, where OA = d, OC = c, AB = a, BC = b, and
CM = λBC. The polynomials of first, second, and third degree that approximate, in the sense of
the least squares, the trajectory of the point M have to be determined, knowing the positions of the
points M_i specified by the angles

φ_i = −3π/4 + (i − 1)π/4, i = 1, 7. (6.449)
TABLE 6.8 Deviation

n     ε_eq            P_eq      ε_Ch            P_Ch
4     0.058359        ±0.701    0.058824        0
5     0.022282        ±0.827    0.012195        ±1
6     0.014091        ±0.851    0.10101         0
7     0.006873        ±0.894    0.002092        ±1
8     0.004273        ±0.905    0.001733        0
9     0.002258        ±0.925    0.000359        ±1
10    0.001425        ±0.931    0.000297        0
11    0.000791        ±0.943    0.00062         ±1
12    0.000501        ±0.947    0.00051         0
13    0.00029         ±0.954    0.00001         ±1
14    0.0001815       ±0.957    88 × 10⁻⁷       0
15    0.0001061       ±0.962    18 × 10⁻⁷       ±1
16    6.73 × 10⁻⁵     ±0.964    1.5 × 10⁻⁶      0
17    3.99 × 10⁻⁵     ±0.968    3.11 × 10⁻⁷     ±1
18    2.54 × 10⁻⁵     ±0.969    2.58 × 10⁻⁷     0
19    1.52 × 10⁻⁵     ±0.972    5.34 × 10⁻⁸     ±1
20    9.67 × 10⁻⁶     ±0.973    4.42 × 10⁻⁸     0
25    8.84 × 10⁻⁷     ±0.980    2.70 × 10⁻¹⁰    ±1
30    8.79 × 10⁻⁸     ±0.984    6.57 × 10⁻¹²    0
35    1.92 × 10⁻⁸     ±0.979    4.02 × 10⁻¹⁴    ±0.964
40    4.13 × 10⁻⁷     ±0.989    1.78 × 10⁻¹⁵    ±0.082
45    9.37 × 10⁻⁶     ±0.991    1.22 × 10⁻¹⁵    ±0.052
50    0.0003145       ±0.988    1.22 × 10⁻¹⁵    ±0.319
60    0.365949        ±0.994    1.67 × 10⁻¹⁵    ±0.163
70    218.546         ±0.990    1.67 × 10⁻¹⁵    ±0.035
80    171416          ±0.995    1.67 × 10⁻¹⁵    ±0.056
90    2.03 × 10⁸      ±0.996    1.55 × 10⁻¹⁵    ±0.753
100   1.47 × 10¹¹     ±0.998    2 × 10⁻¹⁵       ±0.054
200   1.42 × 10⁴¹     ±0.998    2.78 × 10⁻¹⁵    ±0.544
300   3.95 × 10⁷⁰     ±0.999    2.66 × 10⁻¹⁵    ±0.043
400   4.67 × 10¹⁰⁰    ±0.999    3.33 × 10⁻¹⁵    ±0.320
500   4.23 × 10¹³⁰    ±0.999    3.66 × 10⁻¹⁵    ±0.445
Solution:
1. Theory

Denoting by X_C, Y_C the coordinates of the point C and writing OC² = c², CB² = b², we obtain
the equations

X_C² + Y_C² = c², (6.450)

[X_C − (d + a cos φ)]² + (Y_C − a sin φ)² = b², (6.451)

from which, by subtraction and using the notation

f = (c² + a² + d² + 2ad cos φ − b²)/2, (6.452)

we get the equation of first degree

X_C(d + a cos φ) + Y_C a sin φ = f. (6.453)
Figure 6.3 Problem 6.1.
Further, using the notations

h = fa sin φ / (a² + d² + 2ad cos φ), k = [c²(d + a cos φ)² − f²] / (a² + d² + 2ad cos φ), (6.454)

equation (6.450) and equation (6.453) lead to the equation

Y_C² − 2hY_C − k = 0, (6.455)

the solution of which is

Y_C = h + √(h² + k); (6.456)

also, from equation (6.453) we obtain

X_C = (f − Y_C a sin φ) / (d + a cos φ). (6.457)

Denoting then by X, Y the coordinates of the point M, there result

X = (1 − λ)X_C + λ(d + a cos φ), (6.458)

Y = (1 − λ)Y_C + λ a sin φ. (6.459)

Numerical application for a = l, b = c = 3l, d = 2l, l = 1, λ = 1/3 (with this positive value of λ,
it follows, on the basis of a known relation of affine geometry, that the point M lies between C
and B).
2. Numerical calculation

Relations (6.449), (6.452), (6.454), (6.456), (6.457), (6.458), and (6.459) lead to the values in
Table 6.9. Successively, the polynomials

Y = 2.405819 − 0.496319X, (6.460)

Y = 2.220796 + 0.377282X − 0.390308X², (6.461)

Y = 2.209666 + 0.773455X − 0.888467X² + 0.147989X³ (6.462)

are obtained (Fig. 6.4).
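These fits can be reproduced from the data of Table 6.9, for instance with numpy.polyfit (a verification sketch of ours; polyfit returns coefficients highest degree first, that is, (6.460)–(6.462) read back to front).

import numpy as np

# (X_i, Y_i) of the points M_i from Table 6.9
X = np.array([1.792217, 2.163327, 2.251866, 2.000000, 1.357610, 0.503340, -0.068359])
Y = np.array([1.229559, 0.993320, 1.240393, 1.732051, 2.183202, 2.326653, 2.172368])

for deg in (1, 2, 3):
    print(deg, np.polyfit(X, Y, deg))   # cf. (6.460)-(6.462)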
Problem 6.2

Let there be a cam mechanism with flat translating follower, as shown in Figure 6.5; the
mechanism is used for the admission and evacuation of gases in heat engines.
Figure 6.4 The trajectory of the point M and the polynomials of approximation by the least squares
method: (a) of first degree, (b) of second degree, (c) of third degree (the dashed line represents the
original function).
TABLE 6.9 Numerical Results

i    φ_i          X_Ci        Y_Ci       X_i         Y_i
1    −2.356194    2.041879    2.197892   1.792217    1.229559
2    −1.570796    2.244990    1.989980   2.163327    0.993320
3    −0.785398    2.024246    2.214143   2.251866    1.240393
4    0.000000     1.500000    2.598076   2.000000    1.732051
5    0.785398     0.682861    2.921250   1.357610    2.183202
6    1.570796     −0.244990   2.989980   0.503340    2.326653
7    2.356194     −0.748985   2.904999   −0.068359   2.172368
Figure 6.5 Problem 6.2.
Figure 6.6 The displacement of the follower versus the rotation angle of the cam.
If the motion law s = s(φ) of the follower, where s is the displacement and φ is the rotation
angle of the cam, is piecewise polynomial, then the cam is called a polydyne cam (Fig. 6.6).

Let us determine, on the interval [φ_1, φ_2], the Hermite polynomial of minimal degree that
verifies the conditions

s(φ_i) = s_i, v_i = ds/dφ|_{φ=φ_i}, a_i = d²s/dφ²|_{φ=φ_i}, i = 1, 2. (6.463)

Solution:
1. Theory

Because there are six conditions, the polynomial is of fifth degree and may be written in the
form

s = b_0 + b_1 u + b_2 u² + b_3 u³ + b_4 u⁴ + b_5 u⁵, (6.464)

where

u = (φ − φ_1)/(φ_2 − φ_1), u ∈ [0, 1]. (6.465)
Moreover, taking into account conditions (6.463), polynomial (6.464) reads

s = s_1 P_1(u) + s_2 P_2(u) + (φ_2 − φ_1)[v_1 P_3(u) + v_2 P_4(u)] + (φ_2 − φ_1)²[a_1 P_5(u) + a_2 P_6(u)], (6.466)

where P_i(u), i = 1, 6, are polynomials of fifth degree in u, which satisfy the conditions

P_1(0) = 1, P_1(1) = 0, P_1'(0) = P_1'(1) = 0, P_1''(0) = P_1''(1) = 0;
P_2(0) = 0, P_2(1) = 1, P_2'(0) = P_2'(1) = 0, P_2''(0) = P_2''(1) = 0;
P_3(0) = P_3(1) = 0, P_3'(0) = 1, P_3'(1) = 0, P_3''(0) = P_3''(1) = 0;
P_4(0) = P_4(1) = 0, P_4'(0) = 0, P_4'(1) = 1, P_4''(0) = P_4''(1) = 0;
P_5(0) = P_5(1) = 0, P_5'(0) = P_5'(1) = 0, P_5''(0) = 1, P_5''(1) = 0;
P_6(0) = P_6(1) = 0, P_6'(0) = P_6'(1) = 0, P_6''(0) = 0, P_6''(1) = 1. (6.467)

If we express the polynomials P_i(u) and their derivatives in the form

P_i(u) = c_{0i} + c_{1i}u + c_{2i}u² + · · · + c_{5i}u⁵,
P_i'(u) = c_{1i} + 2c_{2i}u + 3c_{3i}u² + 4c_{4i}u³ + 5c_{5i}u⁴,
P_i''(u) = 2c_{2i} + 6c_{3i}u + 12c_{4i}u² + 20c_{5i}u³, i = 1, 6, (6.468)

then conditions (6.467) lead to the system

c_{3i} + c_{4i} + c_{5i} = α_i, 3c_{3i} + 4c_{4i} + 5c_{5i} = β_i, 6c_{3i} + 12c_{4i} + 20c_{5i} = γ_i, i = 1, 6, (6.469)

where the constants α_i, β_i, γ_i and c_{0i}, c_{1i}, c_{2i}, determined for each case, are given in Table 6.10.
The solution

c_{3i} = (20α_i − 8β_i + γ_i)/2, c_{4i} = −15α_i + 7β_i − γ_i, c_{5i} = (12α_i − 6β_i + γ_i)/2 (6.470)

is obtained from system (6.469); using the data of Table 6.10, the numerical results given in
Table 6.11 follow. Thus, the six polynomials read

P_1(u) = 1 − 10u³ + 15u⁴ − 6u⁵, P_2(u) = 10u³ − 15u⁴ + 6u⁵,
P_3(u) = u − 6u³ + 8u⁴ − 3u⁵, P_4(u) = −4u³ + 7u⁴ − 3u⁵,
P_5(u) = u²/2 − 3u³/2 + 3u⁴/2 − u⁵/2, P_6(u) = u³/2 − u⁴ + u⁵/2. (6.471)
Particular case: φ_1 = 0 rad, s_1 = 0 mm, φ_2 = 1 rad, s_2 = h = 7 mm, v_1 = v_2 = 0, a_1 = a_2 = 0.

2. Particular case

The answer

s = hP_2(u) = h(10φ³/φ_2³ − 15φ⁴/φ_2⁴ + 6φ⁵/φ_2⁵) = 7(10φ³ − 15φ⁴ + 6φ⁵) (6.472)

is obtained; the diagram is shown in Figure 6.7.
Problem 6.3

Let us consider the quadrangular mechanism in Figure 6.8, where AB = a, OA = b, BC = CM =
OC = c. It is required to determine the distance Y_0 so that the straight line Y − Y_0 = 0
approximates the trajectory of the point M on the interval φ ∈ [−π/2, π/2] in the sense of the
mini–max method.
TABLE 6.10 The Values c_{0i}, c_{1i}, c_{2i}, α_i, β_i, and γ_i

i    c_{0i}   c_{1i}   c_{2i}   α_i     β_i    γ_i
1    1        0        0        −1      0      0
2    0        0        0        1       0      0
3    0        1        0        −1      −1     0
4    0        0        0        0       1      0
5    0        0        1/2      −1/2    −1     −1
6    0        0        0        0       0      1

TABLE 6.11 The Values c_{3i}, c_{4i}, and c_{5i}

i    c_{3i}   c_{4i}   c_{5i}
1    −10      15       −6
2    10       −15      6
3    −6       8        −3
4    −4       7        −3
5    −3/2     3/2      −1/2
6    1/2      −1       1/2
Figure 6.7 The variation diagram s = s(φ).
Solution:
1. Theoretical aspects

Let us consider a function y = y(x), the graph of which is symmetric on the interval [−a, a]
(Fig. 6.9). We wish to determine the straight line y − y_0 = 0 that approximates this curve in the
sense of the mini–max method. Let us choose, for example, candidate values

y_{0i} = y(0) + Δy_i; (6.473)
Figure 6.8 Problem 6.3.
Figure 6.9 Theoretical aspects.
we then calculate, for each candidate,

max_i |y(x_i) − y_{0i}| = y^i_max. (6.474)

For each Δy_i we construct a table such as the following (created here for Δy_i = 0.2):

x        |y(x) − y_{0i}|
0        0.2
0.01     0.3
0.02     0.5
...      ...
a        0.01

We thus obtain a sequence of data of the following form:

Δy_i     y^i_max
0        0.5
0.01     0.8
...      ...
0.5      0.125
...      ...
Figure 6.10 Function (6.477).
The minimum in this table is obtained (in the case considered by us) for Δy_i = 0.5 and has the value

y^i_max = 0.125 = minimum. (6.475)

We deduce the required straight line as the one corresponding to Δy_i = 0.5, that is,

y_0 − y(0) = 0.5, with the mini–max deviation 0.125. (6.476)
Sometimes, the problem may be solved analytically also.
Let there be the function (Fig. 6.10)

y = 2x², x ∈ [−1, 1], (6.477)

for which we consider

y_0 < f(1) = 2. (6.478)

It follows immediately that

g(x) = |y_0 − 2x²| = { y_0 − 2x² for |x| ≤ √(y_0/2),
                     { 2x² − y_0 for |x| > √(y_0/2). (6.479)

In the first case of formula (6.479), we deduce g_max = y_0, while in the second case we have
g_max = 2 − y_0. It follows that the required straight line is given by

y_0 = 1, y − 1 = 0. (6.480)
Let us return to the problem in Figure 6.8. The triangle OBM has a right angle at O, so that

OM = √(BM² − OB²) = √(4c² − (a² + b² + 2ab cos φ)). (6.481)
There also result the relations

cos β = (b + a cos φ)/√(a² + b² + 2ab cos φ), sin β = a sin φ/√(a² + b² + 2ab cos φ); (6.482)

hence

X_M = OM cos(π/2 + β) = −a sin φ √(4c² − (a² + b² + 2ab cos φ)) / √(a² + b² + 2ab cos φ), (6.483)

Y_M = OM sin(π/2 + β) = (b + a cos φ) √(4c² − (a² + b² + 2ab cos φ)) / √(a² + b² + 2ab cos φ). (6.484)

Because X_M(−φ) = −X_M(φ), Y_M(−φ) = Y_M(φ), it follows that the trajectory of the point M
is symmetric with respect to the OY-axis.

Numerical application for a = 0.1 m, b = 0.2 m, c = 0.25 m.
2. Numerical calculation

Expressions (6.483) and (6.484) become

X_M = −0.1 sin φ √(0.2 − 0.04 cos φ) / √(0.05 + 0.04 cos φ), (6.485)

Y_M = (0.2 + 0.1 cos φ) √(0.2 − 0.04 cos φ) / √(0.05 + 0.04 cos φ). (6.486)

Denoting now

φ = (π/2)φ*, φ* ∈ [−1, 1], (6.487)

we obtain the following table of values:
φ*      X_M         Y_M
−1      0.200000    0.400000
−0.9    0.183292    0.400183
−0.8    0.164973    0.400529
−0.7    0.145533    0.400825
−0.6    0.125354    0.400968
−0.5    0.104726    0.400934
−0.4    0.083858    0.400758
−0.3    0.062893    0.400505
−0.2    0.041912    0.400251
−0.1    0.020947    0.400067
0       0.000000    0.400000
0.1     −0.020947   0.400067
0.2     −0.041912   0.400251
0.3     −0.062893   0.400505
0.4     −0.083858   0.400758
0.5     −0.104726   0.400934
0.6     −0.125354   0.400968
0.7     −0.145533   0.400825
0.8     −0.164973   0.400529
0.9     −0.183292   0.400183
1       −0.200000   0.400000
We consider now the step

ΔY = 10⁻⁶ m (6.488)

and the interval 0.4 m ≤ Y ≤ 0.401 m. For each Y we have constructed a table of the following
form (here created for Y = 0.4 m):

X^i_M       Y^i_M      |Y^i_M − Y|
0.200000    0.400000   0.000000
0.183292    0.400183   0.000183
0.164973    0.400529   0.000529
0.145533    0.400825   0.000825
0.125354    0.400968   0.000968
0.104726    0.400934   0.000934
0.083858    0.400758   0.000758
0.062893    0.400505   0.000505
0.041912    0.400251   0.000251
0.020947    0.400067   0.000067
0.000000    0.400000   0.000000
−0.020947   0.400067   0.000067
−0.041912   0.400251   0.000251
−0.062893   0.400505   0.000505
−0.083858   0.400758   0.000758
−0.104726   0.400934   0.000934
−0.125354   0.400968   0.000968
−0.145533   0.400825   0.000825
−0.164973   0.400529   0.000529
−0.183292   0.400183   0.000183
−0.200000   0.400000   0.000000

From the above table, it follows that

max_i |Y^i_M − Y| = 0.000968. (6.489)

Analyzing each such table, we deduce the value

min_Y max_i |Y^i_M − Y| = 0.000484, (6.490)

obtained for

Y_0 = 0.400484 m; (6.491)

hence, the equation of the required straight line is

Y − 0.400484 = 0. (6.492)
In Figure 6.11 the trajectory of the point M has been drawn (with a continuous line), as well as
the straight line (6.492) (with a broken line).
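For a constant approximant, the mini–max value is simply the midpoint between the extreme ordinates, Y_0 = (min Y_M + max Y_M)/2, which the grid search above approximates; a short check of ours:

import numpy as np

phi = np.pi/2*np.linspace(-1, 1, 2001)
Y = (0.2 + 0.1*np.cos(phi))*np.sqrt(0.2 - 0.04*np.cos(phi))/np.sqrt(0.05 + 0.04*np.cos(phi))
Y0 = 0.5*(Y.min() + Y.max())
print(Y0, np.max(np.abs(Y - Y0)))   # ≈ 0.400484, ≈ 0.000484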
Figure 6.11 Trajectory of the point M (continuous line) and its approximation by the straight line (6.492) (broken line).
FURTHER READING
Acton FS (1990). Numerical Methods that Work. 4th ed. Washington: Mathematical Association of America.
Ackleh AS, Allen EJ, Kearfott RB, Seshaiyer P (2009). Classical and Modern Numerical Analysis: Theory, Methods and Practice. Boca Raton: CRC Press.
Atkinson KE (1989). An Introduction to Numerical Analysis. 2nd ed. New York: John Wiley & Sons, Inc.
Atkinson KE (1993). Elementary Numerical Analysis. 2nd ed. New York: John Wiley & Sons, Inc.
Atkinson K, Han W (2010). Theoretical Numerical Analysis: A Functional Analysis Framework. 3rd ed. New York: Springer-Verlag.
Bakhvalov N (1976). Méthodes Numériques. Moscou: Éditions Mir (in French).
Berbente C, Mitran S, Zancu S (1997). Metode Numerice. București: Editura Tehnică (in Romanian).
Bloch S (1951). Angenäherte Synthese von Mechanismen. Berlin: Verlag Technik (in German).
Burden RL, Faires L (2009). Numerical Analysis. 9th ed. Boston: Brooks/Cole.
Butt R (2009). Introduction to Numerical Analysis Using MATLAB. Boston: Jones and Bartlett Publishers.
Chapra SC (1996). Applied Numerical Methods with MATLAB for Engineers and Scientists. Boston: McGraw-Hill.
Cheney EW, Kincaid DR (1997). Numerical Mathematics and Computing. 6th ed. Belmont: Thomson.
Dahlquist G, Björck Å (1974). Numerical Methods. Englewood Cliffs: Prentice Hall.
Démidovitch B, Maron I (1973). Éléments de Calcul Numérique. Moscou: Éditions Mir (in French).
DiBenedetto E (2010). Classical Mechanics: Theory and Mathematical Modeling. New York: Springer-Verlag.
Epperson JF (2007). An Introduction to Numerical Methods and Analysis. Hoboken: John Wiley & Sons, Inc.
Fung YC, Tong P (2011). Classical and Computational Solid Mechanics. Singapore: World Scientific Publishing.
Golub GH, van Loan CF (1996). Matrix Computations. 3rd ed. Baltimore: Johns Hopkins University Press.
Greenbaum A, Chartier TP (2012). Numerical Methods: Design, Analysis, and Computer Implementation of Algorithms. Princeton: Princeton University Press.
Hamming RW (1987). Numerical Methods for Scientists and Engineers. 2nd ed. New York: Dover Publications.
Hamming RW (2012). Introduction to Applied Numerical Analysis. New York: Dover Publications.
Heinbockel JH (2006). Numerical Methods for Scientific Computing. Victoria: Trafford Publishing.
Hildebrand FB (1987). Introduction to Numerical Analysis. 2nd ed. New York: Dover Publications.
Hoffman JD (1992). Numerical Methods for Engineers and Scientists. New York: McGraw-Hill.
Kharab A, Guenther RB (2011). An Introduction to Numerical Methods: A MATLAB Approach. 3rd ed. Boca Raton: CRC Press.
Kleppner D, Kolenkow RJ (2010). An Introduction to Mechanics. Cambridge: Cambridge University Press.
Kress R (1996). Numerical Analysis. New York: Springer-Verlag.
Krîlov AN (1957). Lecții de Calcule prin Aproximații. București: Editura Tehnică (in Romanian).
Kunz KS (1957). Numerical Analysis. New York: McGraw-Hill.
Lange K (2010). Numerical Analysis for Statisticians. 2nd ed. New York: Springer-Verlag.
Lurie AI (2002). Analytical Mechanics. New York: Springer-Verlag.
Mabie HH, Reinholtz CF (1987). Mechanisms and Dynamics of Machinery. 4th ed. New York: John Wiley & Sons, Inc.
Marciuk GI (1983). Metode de Analiză Numerică. București: Editura Academiei Române (in Romanian).
Marciuk GI, Șaidurov VV (1981). Creșterea Preciziei Soluțiilor în Scheme cu Diferențe. București: Editura Academiei Române (in Romanian).
Meriam JL, Kraige LG (2012). Engineering Mechanics: Dynamics. Hoboken: John Wiley & Sons, Inc.
Otto SR, Denier JP (2005). An Introduction to Programming and Numerical Methods in MATLAB. London: Springer-Verlag.
Pandrea N (2000). Elemente de Mecanica Solidului în Coordonate Plückeriene. București: Editura Academiei Române (in Romanian).
Pandrea N, Pârlac S, Popa D (2001). Modele pentru Studiul Vibrațiilor Automobilelor. Pitești: Tiparg (in Romanian).
Pandrea N, Popa D (2000). Mecanisme. Teorie și Aplicații CAD. București: Editura Tehnică (in Romanian).
Pandrea N, Stănescu ND (2002). Mecanică. București: Editura Didactică și Pedagogică (in Romanian).
Postolache M (2006). Modelare Numerică. Teorie și Aplicații. București: Editura Fair Partners (in Romanian).
Press WH, Teukolsky SA, Vetterling WT, Flannery BP (2007). Numerical Recipes: The Art of Scientific Computing. 3rd ed. Cambridge: Cambridge University Press.
Quarteroni A, Sacco R, Saleri F (2010). Numerical Mathematics. 2nd ed. Berlin: Springer-Verlag.
Ralston A, Rabinowitz P (2001). A First Course in Numerical Analysis. 2nd ed. New York: Dover Publications.
Reza F (1973). Spații Liniare. București: Editura Didactică și Pedagogică (in Romanian).
Ridgway Scott L (2011). Numerical Analysis. Princeton: Princeton University Press.
Rivière B (2008). Discontinuous Galerkin Methods for Solving Elliptic and Parabolic Equations: Theory and Implementation. Philadelphia: SIAM.
Salvadori MG, Baron ML (1962). Numerical Methods in Engineering. Englewood Cliffs: Prentice Hall.
Sauer T (2011). Numerical Analysis. 2nd ed. London: Pearson.
Simionescu I, Dranga M, Moise V (1995). Metode Numerice în Tehnică. Aplicații în FORTRAN. București: Editura Tehnică (in Romanian).
Stănescu ND (2007). Metode Numerice. București: Editura Didactică și Pedagogică (in Romanian).
Stoer J, Bulirsch R (2010). Introduction to Numerical Analysis. 3rd ed. New York: Springer-Verlag.
Süli E, Mayers D (2003). An Introduction to Numerical Analysis. Cambridge: Cambridge University Press.
Udriște C, Iftode V, Postolache M (1996). Metode Numerice de Calcul. Algoritmi și Programe Turbo Pascal. București: Editura Tehnică (in Romanian).
7
NUMERICAL DIFFERENTIATION
AND INTEGRATION
7.1 INTRODUCTION
Numerical differentiation is used if the function to be differentiated is defined in numerical form by its values $y_i$ at the knots $x_i$,
$$y_i = f(x_i), \quad i = 0, 1, \ldots, n, \qquad (7.1)$$
with $f : D \subset \mathbb{R} \to \mathbb{R}$, or if the expression of the function is very complicated and difficult to use, or if the function is the solution of an equation or of a system of equations.

The operation of differentiation is, in general, avoided, because it amplifies the small errors. Such an example is given in Figure 7.1, where the function $f$ has been drawn with an unbroken line, while its approximation $\tilde{f}$ has been drawn with a broken one. The function and its approximation pass through the points $A_{i-1}(x_{i-1}, y_{i-1})$, $A_i(x_i, y_i)$, $A_{i+1}(x_{i+1}, y_{i+1})$. The straight line $(\tau)$ is tangent to the graph of the function $f$ at the point $A_{i-1}(x_{i-1}, y_{i-1})$, while the straight line $(\tau_1)$ marks the tangent to the graph of the approximation $\tilde{f}$ at the very same point $A_{i-1}(x_{i-1}, y_{i-1})$. Thus we obtain the relations
$$\tan\alpha = f'(x_{i-1}), \quad \tan\alpha_1 = \tilde{f}'(x_{i-1}), \qquad (7.2)$$
and, in the figure, we observe that the error is $|\tan\alpha - \tan\alpha_1|$.

Figure 7.1 Numerical differentiation.
7.2 NUMERICAL DIFFERENTIATION BY MEANS OF AN EXPANSION
INTO A TAYLOR SERIES
Let $f : [a, b] \to \mathbb{R}$ be of class $C^3([a, b])$ and let
$$a = x_0 < x_1 < x_2 < \cdots < x_n = b \qquad (7.3)$$
be a division of the interval $[a, b]$. Let us denote by $h$ the magnitude
$$h = x_i - x_{i-1} \qquad (7.4)$$
and by $h_1$ the magnitude
$$h_1 = x_{i+1} - x_i. \qquad (7.5)$$
In the general case, $h \ne h_1$, and we may write
$$h_1 = h\alpha, \qquad (7.6)$$
where $\alpha \in \mathbb{R}_+^*$.
Let us consider now the expansion into a Taylor series of the function $f$ around the point $x_i$,
$$f(x) = f(x_i) + \frac{x - x_i}{1!}f'(x_i) + \frac{(x - x_i)^2}{2!}f''(x_i) + \frac{(x - x_i)^3}{3!}f'''(\xi), \qquad (7.7)$$
where $\xi$ is a point situated between $x$ and $x_i$. We may also write
$$\xi = x_i + \theta(x - x_i), \qquad (7.8)$$
where $\theta \in (0, 1)$. It follows that
$$f(x) = f(x_i) + \frac{x - x_i}{1!}f'(x_i) + \frac{(x - x_i)^2}{2!}f''(x_i) + \frac{(x - x_i)^3}{3!}f'''[x_i + \theta(x - x_i)]. \qquad (7.9)$$
Let us now consider the values $x = x_{i-1}$ and $x = x_{i+1}$, $i = 1, 2, \ldots, n-1$. We thus obtain
$$f(x_{i+1}) = f(x_i) + \frac{\alpha h}{1!}f'(x_i) + \frac{(\alpha h)^2}{2!}f''(x_i) + \frac{(\alpha h)^3}{3!}f'''(\xi_i), \qquad (7.10)$$
with $\xi_i$ situated between $x_i$ and $x_{i+1}$, and
$$f(x_{i-1}) = f(x_i) - \frac{h}{1!}f'(x_i) + \frac{h^2}{2!}f''(x_i) - \frac{h^3}{3!}f'''(\zeta_i), \qquad (7.11)$$
where $\zeta_i$ is situated between $x_{i-1}$ and $x_i$.
We will now subtract the last two relations one from the other, obtaining
$$f(x_{i+1}) - f(x_{i-1}) = \frac{(\alpha + 1)h}{1!}f'(x_i) + \frac{h^2(\alpha^2 - 1)}{2!}f''(x_i) + \frac{(\alpha h)^3}{3!}f'''(\xi_i) + \frac{h^3}{3!}f'''(\zeta_i), \qquad (7.12)$$
from which
$$f'(x_i) = \frac{1}{(\alpha + 1)h}\bigl(f(x_{i+1}) - f(x_{i-1})\bigr) + \frac{(1 - \alpha)h}{2!}f''(x_i) - \frac{h^2}{3!(\alpha + 1)}\bigl(\alpha^3 f'''(\xi_i) + f'''(\zeta_i)\bigr). \qquad (7.13)$$

Observation 7.1 If $f : [a, b] \to \mathbb{R}$ is at least of class $C^3([a, b])$, then we may consider
$$f'(x_i) \approx \frac{1}{(\alpha + 1)h}\bigl(f(x_{i+1}) - f(x_{i-1})\bigr). \qquad (7.14)$$
We now add relation (7.10) to relation (7.11) multiplied by $\alpha$. It follows that
$$f(x_{i+1}) + \alpha f(x_{i-1}) = (1 + \alpha)f(x_i) + \frac{\alpha h^2}{2!}f''(x_i)(1 + \alpha) + \frac{(\alpha h)^3}{3!}f'''(\xi_i) - \frac{\alpha h^3}{3!}f'''(\zeta_i), \qquad (7.15)$$
from which
$$f''(x_i) = \frac{2}{\alpha(\alpha + 1)h^2}\left[f(x_{i+1}) + \alpha f(x_{i-1}) - (1 + \alpha)f(x_i) - \frac{(\alpha h)^3}{3!}f'''(\xi_i) + \frac{\alpha h^3}{3!}f'''(\zeta_i)\right]. \qquad (7.16)$$

Observation 7.2 Under the same conditions as in Observation 7.1, we can use the approximate formula
$$f''(x_i) \approx \frac{2}{\alpha(\alpha + 1)h^2}\bigl[\alpha f(x_{i-1}) - (1 + \alpha)f(x_i) + f(x_{i+1})\bigr]. \qquad (7.17)$$
Proposition 7.1 Let $f : [a, b] \to \mathbb{R}$ be at least of class $C^3([a, b])$. Under these conditions:
(i) the approximation error of $f'(x_i)$ obtained by using formula (7.14) is
$$\varepsilon_{f'} = \frac{(1 - \alpha)h}{2!}f''(x_i) - \frac{h^2}{3!(1 + \alpha)}\bigl(\alpha^3 f'''(\xi_i) + f'''(\zeta_i)\bigr); \qquad (7.18)$$
(ii) the approximation error of $f''(x_i)$ obtained by using formula (7.17) is
$$\varepsilon_{f''} = \frac{h}{3(\alpha + 1)}\bigl(\alpha^2 f'''(\xi_i) - f'''(\zeta_i)\bigr). \qquad (7.19)$$
Demonstration. It is immediate, using formulae (7.13) and (7.16), respectively.
Corollary 7.1 If the knots are equidistant ($\alpha = 1$), then
(i) formula (7.14) for the approximation of the first order derivative $f'(x_i)$ takes the form
$$f'(x_i) \approx \frac{f(x_{i+1}) - f(x_{i-1})}{2h}, \qquad (7.20)$$
the error being
$$\varepsilon_{f'} = -\frac{h^2}{12}\bigl(f'''(\xi_i) + f'''(\zeta_i)\bigr); \qquad (7.21)$$
(ii) formula (7.17) for the approximation of the second order derivative $f''(x_i)$ reads
$$f''(x_i) \approx \frac{f(x_{i-1}) - 2f(x_i) + f(x_{i+1})}{h^2}, \qquad (7.22)$$
the error being
$$\varepsilon_{f''} = \frac{h}{6}\bigl(f'''(\xi_i) - f'''(\zeta_i)\bigr). \qquad (7.23)$$
Corollary 7.2 If $f : [a, b] \to \mathbb{R}$, $f \in C^3([a, b])$, and the interpolation knots are equidistant, then we denote
$$M = \sup_{x \in [a, b]} f'''(x), \quad m = \inf_{x \in [a, b]} f'''(x). \qquad (7.24)$$
In this case,
$$|\varepsilon_{f'}| \le \frac{h^2}{6}\max\{|M|, |m|\}, \qquad (7.25)$$
$$|\varepsilon_{f''}| \le \frac{h}{6}|M - m|. \qquad (7.26)$$
Observation 7.3 At the end points $x_0$ and $x_n$ we use the approximate formulae
$$f'(x_0) \approx \frac{-3f(x_0) + 4f(x_1) - f(x_2)}{x_2 - x_0}, \qquad (7.27)$$
$$f'(x_n) \approx \frac{3f(x_n) - 4f(x_{n-1}) + f(x_{n-2})}{x_n - x_{n-2}}. \qquad (7.28)$$
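As a quick check of these formulae, the sketch below (plain Python with NumPy, illustrative only; the end treatment of $f''$ is our own crude choice, not from the text) approximates $f'$ and $f''$ of $f(x) = \sin x$ with (7.20), (7.22) and the one-sided formulae (7.27)-(7.28), and verifies that halving $h$ divides the central-difference error roughly by four, as (7.21) predicts:

```python
import numpy as np

def derivatives(f, a, b, n):
    """Central differences (7.20), (7.22) inside; one-sided (7.27)-(7.28) at the ends."""
    x = np.linspace(a, b, n + 1)
    y = f(x)
    h = (b - a) / n
    d1 = np.empty_like(y)
    d1[1:-1] = (y[2:] - y[:-2]) / (2 * h)                 # (7.20)
    d1[0] = (-3 * y[0] + 4 * y[1] - y[2]) / (2 * h)       # (7.27), since x2 - x0 = 2h
    d1[-1] = (3 * y[-1] - 4 * y[-2] + y[-3]) / (2 * h)    # (7.28)
    d2 = np.empty_like(y)
    d2[1:-1] = (y[:-2] - 2 * y[1:-1] + y[2:]) / h**2      # (7.22)
    d2[0], d2[-1] = d2[1], d2[-2]                         # crude end values (our choice)
    return x, d1, d2

for n in (10, 20, 40):
    x, d1, d2 = derivatives(np.sin, 0.0, np.pi, n)
    print(n, np.max(np.abs(d1[1:-1] - np.cos(x[1:-1]))))  # error ~ h^2, cf. (7.21)
```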
7.3 NUMERICAL DIFFERENTIATION BY MEANS OF INTERPOLATION
POLYNOMIALS
Let the function be $f : [a, b] \to \mathbb{R}$ and the equidistant interpolation knots $x_i$, $i = 0, 1, \ldots, n$, so that
$$x_{i+1} - x_i = h = \text{const}, \quad i = 0, 1, \ldots, n-1. \qquad (7.29)$$
We also denote by $P(q)$ Newton's interpolation polynomial, where $q = (x - x_0)/h$ for $x$ in the superior part of the finite differences table and $q = (x - x_n)/h$ for $x$ in the inferior part of the finite differences table. We approximate the derivative $f'(x) = \mathrm{d}f/\mathrm{d}x$ by the derivative of Newton's polynomial at the very same point,
$$f'(x) \approx \frac{\mathrm{d}P}{\mathrm{d}x}. \qquad (7.30)$$
We mention that we may write
$$\frac{\mathrm{d}P}{\mathrm{d}x} = \frac{\mathrm{d}P}{\mathrm{d}q}\frac{\mathrm{d}q}{\mathrm{d}x} = \frac{1}{h}\frac{\mathrm{d}P}{\mathrm{d}q}, \quad \frac{\mathrm{d}^2P}{\mathrm{d}x^2} = \frac{\mathrm{d}}{\mathrm{d}x}\left(\frac{\mathrm{d}P}{\mathrm{d}x}\right) = \frac{1}{h}\frac{\mathrm{d}}{\mathrm{d}x}\left(\frac{\mathrm{d}P}{\mathrm{d}q}\right) = \frac{1}{h^2}\frac{\mathrm{d}^2P}{\mathrm{d}q^2}, \; \ldots, \; \frac{\mathrm{d}^kP}{\mathrm{d}x^k} = \frac{1}{h^k}\frac{\mathrm{d}^kP}{\mathrm{d}q^k}, \; \ldots \qquad (7.31)$$
Lemma 7.1 Let $x^{(n)} = x(x - h)(x - 2h)\cdots(x - (n-1)h)$ be the generalized power of $n$th order. Under these conditions,
$$\frac{\mathrm{d}^k}{\mathrm{d}x^k}x^{(n)} = n(n - 1)\cdots(n - k + 1)\,x^{(n-k)}. \qquad (7.32)$$
Demonstration. We have
$$\Delta x^{(n)} = nh\,x^{(n-1)} \qquad (7.33)$$
and
$$\frac{\mathrm{d}}{\mathrm{d}x}x^{(n)} = \lim_{h \to 0}\frac{\Delta x^{(n)}}{h} = n\,x^{(n-1)}. \qquad (7.34)$$
Step by step, we obtain formula (7.32).
Let P (q) be Newton’s forward polynomial
P (q) = y0 +
q(1)
1!
y0 +
q(2)
2!
2
y0 + · · · +
q(n)
n!
n
y0. (7.35)
Under these conditions, assuming that $q = (x - x_0)/h$, Lemma 7.1 leads to
$$\frac{\mathrm{d}P}{\mathrm{d}x} = \frac{1}{h}\left[\Delta y_0 + \frac{2q^{(1)}}{2!}\Delta^2 y_0 + \frac{3q^{(2)}}{3!}\Delta^3 y_0 + \cdots + \frac{nq^{(n-1)}}{n!}\Delta^n y_0\right] = \frac{1}{h}\left[\Delta y_0 + \frac{q^{(1)}}{1!}\Delta^2 y_0 + \frac{q^{(2)}}{2!}\Delta^3 y_0 + \cdots + \frac{q^{(n-1)}}{(n-1)!}\Delta^n y_0\right], \qquad (7.36)$$
$$\frac{\mathrm{d}^2P}{\mathrm{d}x^2} = \frac{1}{h^2}\left[\Delta^2 y_0 + \frac{2q^{(1)}}{2!}\Delta^3 y_0 + \frac{3q^{(2)}}{3!}\Delta^4 y_0 + \cdots + \frac{(n-1)q^{(n-2)}}{(n-1)!}\Delta^n y_0\right] = \frac{1}{h^2}\left[\Delta^2 y_0 + \frac{q^{(1)}}{1!}\Delta^3 y_0 + \frac{q^{(2)}}{2!}\Delta^4 y_0 + \cdots + \frac{q^{(n-2)}}{(n-2)!}\Delta^n y_0\right]. \qquad (7.37)$$
In general, we may write
$$\frac{\mathrm{d}^kP}{\mathrm{d}x^k} = \frac{1}{h^k}\left[\Delta^k y_0 + \frac{q^{(1)}}{1!}\Delta^{k+1} y_0 + \frac{q^{(2)}}{2!}\Delta^{k+2} y_0 + \cdots + \frac{q^{(n-k)}}{(n-k)!}\Delta^n y_0\right]. \qquad (7.38)$$
Let us consider now Newton's backward polynomial,
$$P(q) = y_n + \frac{q^{(1)}}{1!}\Delta y_{n-1} + \frac{(q+1)^{(2)}}{2!}\Delta^2 y_{n-2} + \cdots + \frac{(q+n-1)^{(n)}}{n!}\Delta^n y_0. \qquad (7.39)$$
Applying again Lemma 7.1 with $q = (x - x_n)/h$, we have
$$\frac{\mathrm{d}P}{\mathrm{d}x} = \frac{1}{h}\left[\Delta y_{n-1} + \frac{2(q+1)^{(1)}}{2!}\Delta^2 y_{n-2} + \cdots + \frac{n(q+n-1)^{(n-1)}}{n!}\Delta^n y_0\right] = \frac{1}{h}\left[\Delta y_{n-1} + \frac{(q+1)^{(1)}}{1!}\Delta^2 y_{n-2} + \cdots + \frac{(q+n-1)^{(n-1)}}{(n-1)!}\Delta^n y_0\right], \qquad (7.40)$$
$$\frac{\mathrm{d}^2P}{\mathrm{d}x^2} = \frac{1}{h^2}\left[\Delta^2 y_{n-2} + \frac{2(q+2)^{(1)}}{2!}\Delta^3 y_{n-3} + \cdots + \frac{(n-1)(q+n-1)^{(n-2)}}{(n-1)!}\Delta^n y_0\right] = \frac{1}{h^2}\left[\Delta^2 y_{n-2} + \frac{(q+2)^{(1)}}{1!}\Delta^3 y_{n-3} + \cdots + \frac{(q+n-1)^{(n-2)}}{(n-2)!}\Delta^n y_0\right] \qquad (7.41)$$
and, in general,
$$\frac{\mathrm{d}^kP}{\mathrm{d}x^k} = \frac{1}{h^k}\left[\Delta^k y_{n-k} + \frac{(q+k)^{(1)}}{1!}\Delta^{k+1} y_{n-k-1} + \cdots + \frac{(q+n-1)^{(n-k)}}{(n-k)!}\Delta^n y_0\right]. \qquad (7.42)$$
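A forward-difference table makes this section concrete. The sketch below (illustrative Python; `sympy` is our choice of tool, and rather than coding (7.36) term by term it differentiates the Newton forward polynomial (7.35) exactly, in the spirit of (7.30)):

```python
import math
import sympy as sp

def newton_forward_derivative(xs, ys):
    """Build Newton's forward polynomial (7.35) in q = (x - x0)/h, return dP/dx (7.30)."""
    n = len(xs) - 1
    h = xs[1] - xs[0]
    col = list(ys)
    diffs = [col[0]]                          # diffs[k] = Delta^k y_0
    for k in range(1, n + 1):
        col = [col[i + 1] - col[i] for i in range(len(col) - 1)]
        diffs.append(col[0])
    x = sp.Symbol('x')
    q = (x - xs[0]) / h
    P = sp.Integer(0)
    for k in range(n + 1):
        term = diffs[k] / sp.factorial(k)
        for m in range(k):                    # generalized power q^(k) = q(q-1)...(q-k+1)
            term *= (q - m)
        P += term
    return sp.lambdify(x, sp.diff(sp.expand(P), x))

# Example: f(x) = exp(x) tabulated at 5 knots on [0, 1]
xs = [i / 4 for i in range(5)]
ys = [math.exp(t) for t in xs]
dP = newton_forward_derivative(xs, ys)
print(dP(0.5), math.exp(0.5))   # the two values agree to a few decimals
```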
7.4 INTRODUCTION TO NUMERICAL INTEGRATION
We want to calculate integrals
$$I = \int_a^b f(x)\,\mathrm{d}x, \qquad (7.43)$$
where $-\infty \le a < b \le \infty$, $f$ being integrable on $[a, b]$.
In general, two situations exist. The first case is that of a proper integral (7.43), which will be considered here. The second case assumes that the integral (7.43) is an improper one. Several techniques exist to transform an improper integral into a proper one, or to approximate the value of the improper integral with an imposed precision.
If the interval $[a, b]$ is an infinite one, that is, if the integral (7.43) has one of the forms
$$I = \int_{-\infty}^{b} f(x)\,\mathrm{d}x, \quad I = \int_{a}^{\infty} f(x)\,\mathrm{d}x, \quad I = \int_{-\infty}^{\infty} f(x)\,\mathrm{d}x, \qquad (7.44)$$
then we may use the following techniques to calculate the improper integrals:
• the change of variable, which may lead to the transformation of the infinite interval $(-\infty, b]$, $[a, \infty)$, or $(-\infty, \infty)$ into an interval of finite length;
• the separation of the integral into two or three integrals of the form
$$\int_{-\infty}^{b} f(x)\,\mathrm{d}x = \int_{-\infty}^{b_1} f(x)\,\mathrm{d}x + \int_{b_1}^{b} f(x)\,\mathrm{d}x, \quad \int_{a}^{\infty} f(x)\,\mathrm{d}x = \int_{a}^{a_1} f(x)\,\mathrm{d}x + \int_{a_1}^{\infty} f(x)\,\mathrm{d}x, \qquad (7.45)$$
$$\int_{-\infty}^{\infty} f(x)\,\mathrm{d}x = \int_{-\infty}^{a_2} f(x)\,\mathrm{d}x + \int_{a_2}^{b_2} f(x)\,\mathrm{d}x + \int_{b_2}^{\infty} f(x)\,\mathrm{d}x.$$
The idea is that if $|a_i|$, $|b_i|$, $i = 1, 2$, are sufficiently great, then the improper integrals $\int_{-\infty}^{b_1} f\,\mathrm{d}x$, $\int_{a_1}^{\infty} f\,\mathrm{d}x$, $\int_{-\infty}^{a_2} f\,\mathrm{d}x$, and $\int_{b_2}^{\infty} f\,\mathrm{d}x$ may be neglected, the values of the integrals in formula (7.45) being given by
$$\int_{-\infty}^{b} f(x)\,\mathrm{d}x \approx \int_{b_1}^{b} f(x)\,\mathrm{d}x, \quad \int_{a}^{\infty} f(x)\,\mathrm{d}x \approx \int_{a}^{a_1} f(x)\,\mathrm{d}x, \quad \int_{-\infty}^{\infty} f(x)\,\mathrm{d}x \approx \int_{a_2}^{b_2} f(x)\,\mathrm{d}x. \qquad (7.46)$$
A question arises: what can we understand by $|a_i|$, $|b_i|$, $i = 1, 2$, sufficiently great? In general, the answer to this question is based on the following considerations: we may analytically show that the neglected improper integrals $\int_{-\infty}^{b_1} f\,\mathrm{d}x$, $\int_{a_1}^{\infty} f\,\mathrm{d}x$, $\int_{-\infty}^{a_2} f\,\mathrm{d}x$, and $\int_{b_2}^{\infty} f\,\mathrm{d}x$ may be made less than an $\varepsilon$ given a priori for $|a_i|$, $|b_i|$, $i = 1, 2$, sufficiently great in modulus; or we calculate the integrals
$$\int_{b_1}^{b} f(x)\,\mathrm{d}x, \quad \int_{d_1}^{b_1} f(x)\,\mathrm{d}x, \quad d_1 \ll b_1, \qquad (7.47)$$
$$\int_{a}^{a_1} f(x)\,\mathrm{d}x, \quad \int_{a_1}^{c_1} f(x)\,\mathrm{d}x, \quad c_1 \gg a_1, \qquad (7.48)$$
$$\int_{a_2}^{b_2} f(x)\,\mathrm{d}x, \quad \int_{c_2}^{a_2} f(x)\,\mathrm{d}x, \quad \int_{b_2}^{d_2} f(x)\,\mathrm{d}x, \quad c_2 \ll a_2, \; d_2 \gg b_2, \qquad (7.49)$$
and we show that
$$\frac{\left|\int_{d_1}^{b_1} f(x)\,\mathrm{d}x\right|}{\left|\int_{b_1}^{b} f(x)\,\mathrm{d}x\right|} \ll 1, \quad \frac{\left|\int_{a_1}^{c_1} f(x)\,\mathrm{d}x\right|}{\left|\int_{a}^{a_1} f(x)\,\mathrm{d}x\right|} \ll 1, \quad \frac{\left|\int_{c_2}^{a_2} f(x)\,\mathrm{d}x\right| + \left|\int_{b_2}^{d_2} f(x)\,\mathrm{d}x\right|}{\left|\int_{a_2}^{b_2} f(x)\,\mathrm{d}x\right|} \ll 1; \qquad (7.50)$$
• if we know the asymptotic behavior of $f(x)$, that is, if we know functions $g_1(x)$ and $g_2(x)$ such that
$$\lim_{x \to \infty}\frac{f(x)}{g_1(x)} = \mu_1, \quad \lim_{x \to -\infty}\frac{f(x)}{g_2(x)} = \mu_2, \qquad (7.51)$$
where $\mu_1$ and $\mu_2$ are two finite real values, then we may write the approximate relations
$$\int_{-\infty}^{a} f(x)\,\mathrm{d}x \approx \mu_2\int_{-\infty}^{a_1} g_2(x)\,\mathrm{d}x + \int_{a_1}^{a} f(x)\,\mathrm{d}x, \quad \int_{b}^{\infty} f(x)\,\mathrm{d}x \approx \int_{b}^{b_1} f(x)\,\mathrm{d}x + \mu_1\int_{b_1}^{\infty} g_1(x)\,\mathrm{d}x,$$
$$\int_{-\infty}^{\infty} f(x)\,\mathrm{d}x \approx \mu_2\int_{-\infty}^{a_2} g_2(x)\,\mathrm{d}x + \int_{a_2}^{b_2} f(x)\,\mathrm{d}x + \mu_1\int_{b_2}^{\infty} g_1(x)\,\mathrm{d}x; \qquad (7.52)$$
• a last method to solve the problem of the improper integral on an infinite interval is a change of variable that transforms the infinite limit into a finite one; but, in many cases, this technique introduces a singularity.

The last situation that may appear for the integral (7.43) is that in which the interval $[a, b]$ is bounded, but
$$\lim_{x \to a} f(x) = \pm\infty \quad \text{or} \quad \lim_{x \to b} f(x) = \pm\infty. \qquad (7.53)$$
There are several methods to avoid the singularities, that is:
• their elimination, by using integration by parts, a change of variable, etc.;
• the use of certain Gauss-type quadrature formulae, which eliminate some types of singularities, using polynomials other than the Legendre ones;
• the use of Gauss-type quadrature formulae with Legendre polynomials, because the calculation of the values of the function $f$ at the points $a$ and $b$ is not necessary;
• the division of the integral into several integrals of the form
$$\int_a^b f(x)\,\mathrm{d}x = \int_a^{a+\varepsilon_1} f(x)\,\mathrm{d}x + \int_{a+\varepsilon_1}^{b-\varepsilon_2} f(x)\,\mathrm{d}x + \int_{b-\varepsilon_2}^{b} f(x)\,\mathrm{d}x, \qquad (7.54)$$
using a very small integration step for the first and the last integrals on the right-hand side, which leads to a very great calculation time;
• the transformation of the finite interval into an infinite one by a certain change of variable, the new integral thus obtained being easier to calculate.
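As an illustration of the truncation idea (7.45)-(7.46) (a minimal sketch; the integrand and tolerance are our own arbitrary choices), the cutoff can be taken from an analytic tail bound and the finite part integrated numerically:

```python
import numpy as np
from scipy.integrate import quad

# Approximate I = \int_0^inf exp(-x)/(1 + x^2) dx by truncating the interval at a1.
# Tail bound: \int_{a1}^inf exp(-x)/(1+x^2) dx <= \int_{a1}^inf exp(-x) dx = exp(-a1),
# so exp(-a1) <= eps gives a1 = -log(eps).
eps = 1e-10
a1 = -np.log(eps)
I, _ = quad(lambda x: np.exp(-x) / (1 + x**2), 0.0, a1)
print(I)  # the neglected tail is below 1e-10 by construction
```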
7.5 THE NEWTON–CÔTES QUADRATURE FORMULAE
We begin with a definition.
Definition 7.1 A quadrature formula is a numerical procedure by which the value of a definite integral is approximated by using information about the integrand only at certain points at which it is defined.
Let $N$ be a nonzero natural number and let the integral be
$$I = \int_0^N f(x)\,\mathrm{d}x. \qquad (7.55)$$
Observation 7.4 Any integral of the form
$$I = \int_a^b g(u)\,\mathrm{d}u, \qquad (7.56)$$
with $-\infty < a < b < \infty$, may be brought to form (7.55) by using the change of variable
$$u = a + \frac{b - a}{N}x, \quad \mathrm{d}u = \frac{b - a}{N}\mathrm{d}x. \qquad (7.57)$$
Indeed,
$$I = \int_a^b g(u)\,\mathrm{d}u = \int_0^N g\left(a + \frac{b - a}{N}x\right)\frac{b - a}{N}\mathrm{d}x, \qquad (7.58)$$
where
$$f(x) = g\left(a + \frac{b - a}{N}x\right)\frac{b - a}{N}. \qquad (7.59)$$
Let us further denote by
$$y_i = f(i), \quad i = 0, 1, \ldots, N, \qquad (7.60)$$
the values of the function $f$ of equation (7.55) at the points $i$, and by $L_N(x)$ the Lagrange polynomial corresponding to the function $f$ on the interval $[0, N]$ and to the division points $x_i = i$, $i = 0, 1, \ldots, N$. We replace the integral (7.55) by the approximate value
$$I \approx \int_0^N L_N(x)\,\mathrm{d}x. \qquad (7.61)$$
On the other hand, we have
$$L_N(x) = \sum_{i=0}^{N}\frac{(x - 0)(x - 1)\cdots(x - i + 1)(x - i - 1)\cdots(x - N)}{(i - 0)(i - 1)\cdots(i - i + 1)(i - i - 1)\cdots(i - N)}\,y_i \qquad (7.62)$$
or, equivalently,
$$L_N(x) = \sum_{i=0}^{N}\varphi_i(x)\,y_i, \qquad (7.63)$$
where the notations are obvious. Replacing relation (7.63) in formula (7.61), we get
$$I \approx \int_0^N \sum_{i=0}^{N}\varphi_i(x)y_i\,\mathrm{d}x = \sum_{i=0}^{N}y_i\int_0^N \varphi_i(x)\,\mathrm{d}x = \sum_{i=0}^{N}c_i^{(N)}y_i, \qquad (7.64)$$
where
$$c_i^{(N)} = \int_0^N \varphi_i(x)\,\mathrm{d}x. \qquad (7.65)$$
Definition 7.2 The formula
$$I \approx \sum_{i=0}^{N}c_i^{(N)}y_i \qquad (7.66)$$
is called the Newton–Côtes quadrature formula.¹
Proposition 7.2 (Error in the Newton–Côtes Quadrature Formula). If the function $f$ is of class $C^{N+1}$ and if we denote
$$M = \sup_{x \in [0, N]}|f^{(N+1)}(x)|, \qquad (7.67)$$
then the inequality
$$\left|I - \sum_{i=0}^{N}c_i^{(N)}y_i\right| \le \frac{M}{(N + 1)!}\int_0^N |x(x - 1)\cdots(x - N)|\,\mathrm{d}x \qquad (7.68)$$
takes place.

¹The formula is named after Sir Isaac Newton (1642–1727) and Roger Côtes (1682–1716).
Demonstration. If, from the error formula of Lagrange's polynomial,
$$|f(x) - L_N(x)| \le \frac{M}{(N + 1)!}|x(x - 1)\cdots(x - N)|, \qquad (7.69)$$
we pass to integration, then
$$\left|I - \sum_{i=0}^{N}c_i^{(N)}y_i\right| = \left|\int_0^N f(x)\,\mathrm{d}x - \int_0^N L_N(x)\,\mathrm{d}x\right| = \left|\int_0^N \bigl(f(x) - L_N(x)\bigr)\,\mathrm{d}x\right| \le \int_0^N |f(x) - L_N(x)|\,\mathrm{d}x \le \frac{M}{(N + 1)!}\int_0^N |x(x - 1)\cdots(x - N)|\,\mathrm{d}x \qquad (7.70)$$
and the proposition is stated.
Observation 7.5 We can also write the exact formula
$$I - \sum_{i=0}^{N}c_i^{(N)}y_i = \frac{f^{(N+1)}(\xi)}{(N + 1)!}\int_0^N x(x - 1)\cdots(x - N)\,\mathrm{d}x, \qquad (7.71)$$
obtained analogously to equation (7.68), taking into account the expression of the remainder in Lagrange's polynomial, $\xi$ being a point between $0$ and $N$.
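The coefficients $c_i^{(N)}$ in (7.65) are integrals of polynomials, so they can be computed exactly with a computer algebra system. A small sketch (illustrative; `sympy` is one convenient choice) that reproduces, e.g., the trapezoid and Simpson weights for $N = 1, 2$:

```python
import sympy as sp

def newton_cotes_coeffs(N):
    """Coefficients c_i^(N) of (7.65): integrals of the Lagrange basis (7.62) on [0, N]."""
    x = sp.Symbol('x')
    coeffs = []
    for i in range(N + 1):
        phi = sp.Integer(1)
        for j in range(N + 1):
            if j != i:
                phi *= (x - j) / (i - j)   # Lagrange basis polynomial
        coeffs.append(sp.integrate(phi, (x, 0, N)))
    return coeffs

print(newton_cotes_coeffs(1))  # [1/2, 1/2]      -> trapezoid formula (7.79)
print(newton_cotes_coeffs(2))  # [1/3, 4/3, 1/3] -> Simpson's formula (7.103)
```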
7.6 THE TRAPEZOID FORMULA
This formula is a particular case of the Newton–Côtes quadrature formula for $N = 1$.
Let the integral be
$$I_i = \int_{x_i}^{x_{i+1}} f(x)\,\mathrm{d}x, \qquad (7.72)$$
where $f : [x_i, x_{i+1}] \to \mathbb{R}$, $x_i \ne x_{i+1}$, $f$ at least of class $C^0$ on $[x_i, x_{i+1}]$. We make the change of variable
$$x = x_i + (x_{i+1} - x_i)u, \quad \mathrm{d}x = (x_{i+1} - x_i)\,\mathrm{d}u, \qquad (7.73)$$
and the integral (7.72) now reads
$$I_i = \int_0^1 F(u)\,\mathrm{d}u, \qquad (7.74)$$
with
$$F(u) = f[x_i + (x_{i+1} - x_i)u](x_{i+1} - x_i). \qquad (7.75)$$
Taking into account the discussion in Section 7.5, we have
$$I_i \approx c_0^{(1)}y_0 + c_1^{(1)}y_1, \qquad (7.76)$$
where
$$y_0 = F(0) = f(x_i)(x_{i+1} - x_i), \quad y_1 = F(1) = f(x_{i+1})(x_{i+1} - x_i), \qquad (7.77)$$
$$c_0^{(1)} = \int_0^1 \frac{x - 1}{0 - 1}\,\mathrm{d}x = \int_0^1 (1 - x)\,\mathrm{d}x = \left.\left(x - \frac{x^2}{2}\right)\right|_0^1 = \frac{1}{2}, \quad c_1^{(1)} = \int_0^1 \frac{x - 0}{1 - 0}\,\mathrm{d}x = \int_0^1 x\,\mathrm{d}x = \left.\frac{x^2}{2}\right|_0^1 = \frac{1}{2}. \qquad (7.78)$$
It follows that
$$I_i \approx \frac{x_{i+1} - x_i}{2}\bigl(f(x_i) + f(x_{i+1})\bigr). \qquad (7.79)$$
Definition 7.3 Relation (7.79) is called the trapezoid formula.
Observation 7.6 Relation (7.79) means that the area under the curve y = f (x), equal to the
integral Ii, is approximated by the area of the trapezium hatched in Figure 7.2.
Let $f : [a, b] \to \mathbb{R}$ be of class $C^2$ on $[a, b]$. Let us assume that the interval $[a, b]$ is divided into $n$ equal parts, so that
$$a = x_0 < x_1 < x_2 < \cdots < x_n = b, \quad x_{j+1} - x_j = h = \frac{b - a}{n}, \quad j = 0, 1, \ldots, n-1. \qquad (7.80)$$
Applying the trapezoid formula on each interval $[x_j, x_{j+1}]$ and summing, we obtain
$$I = \int_a^b f(x)\,\mathrm{d}x = \sum_{j=0}^{n-1}\int_{x_j}^{x_{j+1}} f(x)\,\mathrm{d}x \approx \sum_{j=0}^{n-1}\frac{x_{j+1} - x_j}{2}\bigl(f(x_j) + f(x_{j+1})\bigr) = \frac{h}{2}\bigl[(f(a) + f(x_1)) + (f(x_1) + f(x_2)) + \cdots + (f(x_{n-1}) + f(b))\bigr], \qquad (7.81)$$
that is,
$$I \approx \frac{h}{2}\left[f(a) + f(b) + 2\sum_{j=1}^{n-1}f(x_j)\right]. \qquad (7.82)$$
Definition 7.4 Formula (7.82) is called the generalized trapezoid formula.
Figure 7.2 The trapezoid formula.
Proposition 7.3 (The Error in the Generalized Trapezoid Formula). If $f : [a, b] \to \mathbb{R}$ is of class $C^2$ on $[a, b]$, then the relation
$$\int_a^b f(x)\,\mathrm{d}x - \frac{h}{2}\left[f(a) + f(b) + 2\sum_{j=1}^{n-1}f(x_j)\right] = -\frac{(b - a)^3}{12n^2}f''(\xi) \qquad (7.83)$$
holds, where $\xi$ is a point situated between $a$ and $b$, while $x_j$, $j = 0, 1, \ldots, n$, is an equidistant division of the interval $[a, b]$, with $x_0 = a$, $x_n = b$, and $x_{j+1} - x_j = h = (b - a)/n$.
Demonstration. Let us calculate the error on each interval of the form $[x_j, x_{j+1}]$, $j = 0, 1, \ldots, n-1$. Taking into account Observation 7.4, we have
$$\varepsilon_j(f(x)) = \varepsilon_j(F(u)) = \frac{F''(\zeta)}{2!}\int_0^1 x(x - 1)\,\mathrm{d}x, \qquad (7.84)$$
where $\zeta \in [0, 1]$, while
$$\int_0^1 x(x - 1)\,\mathrm{d}x = \left.\left(\frac{x^3}{3} - \frac{x^2}{2}\right)\right|_0^1 = -\frac{1}{6}. \qquad (7.85)$$
Relation (7.84) now becomes
$$\varepsilon_j(f(x)) = -\frac{F''(\zeta)}{12}. \qquad (7.86)$$
Formula (7.75) leads to
$$F''(u) = (x_{j+1} - x_j)^3 f''[x_j + (x_{j+1} - x_j)u] \qquad (7.87)$$
and, taking into account that $x_{j+1} - x_j = h$, relation (7.86) reads
$$\varepsilon_j(f(x)) = -\frac{h^3}{12}f''(\xi_j), \qquad (7.88)$$
where $\xi_j$ is a point in the interval $[x_j, x_{j+1}]$. We have
$$\varepsilon_{[a,b]}(f(x)) = \sum_{j=0}^{n-1}\varepsilon_j(f(x)) = -\frac{h^3}{12}\sum_{j=0}^{n-1}f''(\xi_j) \qquad (7.89)$$
on the entire interval $[a, b]$. Because $f \in C^2([a, b])$, there exists $\xi \in [a, b]$ so that
$$f''(\xi) = \frac{1}{n}\sum_{j=0}^{n-1}f''(\xi_j), \qquad (7.90)$$
and relation (7.89) becomes
$$\varepsilon_{[a,b]}(f(x)) = -\frac{(b - a)^3}{12n^2}f''(\xi), \qquad (7.91)$$
that is, relation (7.83), which had to be demonstrated.
Corollary 7.3 In the conditions of Proposition 7.3, denoting
$$M = \sup_{x \in [a, b]}|f''(x)|, \qquad (7.92)$$
there exists the inequality
$$|\varepsilon_{[a,b]}(f(x))| \le \frac{M}{12n^2}(b - a)^3. \qquad (7.93)$$
Demonstration. From relation (7.91) we obtain immediately
$$|\varepsilon_{[a,b]}(f(x))| = \frac{(b - a)^3}{12n^2}|f''(\xi)| \le \frac{(b - a)^3}{12n^2}\sup_{\xi \in [a, b]}|f''(\xi)| = \frac{M}{12n^2}(b - a)^3. \qquad (7.94)$$

Observation 7.7 We observe that, by increasing the number of division points (increasing $n$), the error $\varepsilon_{[a,b]}(f(x))$ diminishes like $1/n^2$. This method of increasing the precision may not always be used, because the growth of $n$ leads to an increase of the calculation time.
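A direct transcription of (7.82), with a quick check of the $1/n^2$ error behavior from Proposition 7.3 (illustrative sketch):

```python
import numpy as np

def trapezoid(f, a, b, n):
    """Generalized trapezoid formula (7.82) with n equal subintervals."""
    x = np.linspace(a, b, n + 1)
    y = f(x)
    h = (b - a) / n
    return h / 2 * (y[0] + y[-1] + 2 * y[1:-1].sum())

exact = 2.0  # integral of sin x over [0, pi]
for n in (8, 16, 32):
    print(n, abs(trapezoid(np.sin, 0.0, np.pi, n) - exact))
# The error drops by a factor of about 4 each time n doubles, as (7.83) predicts.
```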
7.7 SIMPSON’S FORMULA
This formula is a particular case of the Newton–Côtes formula² for $N = 2$.
Let $f : [a, b] \to \mathbb{R}$ be of class $C^0$ on $[a, b]$, and let a division of the interval $[a, b]$ be such that
$$a = x_0 < x_1 < \cdots < x_{2n} = b, \quad x_{i+1} - x_i = h = \frac{b - a}{2n}, \quad i = 0, 1, \ldots, 2n-1. \qquad (7.95)$$
Let us consider the integral
$$I_{2i} = \int_{x_{2i}}^{x_{2i+2}} f(x)\,\mathrm{d}x \qquad (7.96)$$
and let us make the change of variable
$$x = x_{2i} + \frac{x_{2i+2} - x_{2i}}{2}u, \quad \mathrm{d}x = \frac{x_{2i+2} - x_{2i}}{2}\mathrm{d}u. \qquad (7.97)$$
The integral (7.96) now reads
$$I_{2i} = \int_0^2 F(u)\,\mathrm{d}u, \qquad (7.98)$$
where
$$F(u) = f\left(x_{2i} + \frac{x_{2i+2} - x_{2i}}{2}u\right)\frac{x_{2i+2} - x_{2i}}{2}. \qquad (7.99)$$
Corresponding to Section 7.5, we have
$$I_{2i} \approx c_0^{(2)}y_0 + c_1^{(2)}y_1 + c_2^{(2)}y_2, \qquad (7.100)$$

²The method was introduced by Thomas Simpson (1710–1761) in 1750. The method was already known to Bonaventura Francesco Cavalieri (1598–1647) since 1639, Johannes Kepler (1571–1630) since 1609, and James Gregory (1638–1675) since 1668 in the book The Universal Part of Geometry.
where
$$y_0 = F(0) = hf(x_{2i}), \quad y_1 = F(1) = hf(x_{2i+1}), \quad y_2 = F(2) = hf(x_{2i+2}), \qquad (7.101)$$
$$c_0^{(2)} = \int_0^2 \frac{(x - 1)(x - 2)}{(0 - 1)(0 - 2)}\,\mathrm{d}x = \int_0^2 \frac{x^2 - 3x + 2}{2}\,\mathrm{d}x = \left.\left(\frac{x^3}{6} - \frac{3x^2}{4} + x\right)\right|_0^2 = \frac{1}{3},$$
$$c_1^{(2)} = \int_0^2 \frac{x(x - 2)}{(1 - 0)(1 - 2)}\,\mathrm{d}x = \int_0^2 (2x - x^2)\,\mathrm{d}x = \left.\left(x^2 - \frac{x^3}{3}\right)\right|_0^2 = \frac{4}{3}, \qquad (7.102)$$
$$c_2^{(2)} = \int_0^2 \frac{x(x - 1)}{(2 - 0)(2 - 1)}\,\mathrm{d}x = \int_0^2 \frac{x^2 - x}{2}\,\mathrm{d}x = \left.\left(\frac{x^3}{6} - \frac{x^2}{4}\right)\right|_0^2 = \frac{1}{3}.$$
We thus obtain
$$I_{2i} \approx \frac{h}{3}\bigl(f(x_{2i}) + 4f(x_{2i+1}) + f(x_{2i+2})\bigr). \qquad (7.103)$$
Definition 7.5 Formula (7.103) is called Simpson’s formula.
Observation 7.8 Geometrically, relation (7.103) shows that the integral $I_{2i}$, equal to the area under the curve $f(x)$, is approximated by the area hatched in Figure 7.3, which lies under the parabola $L_2(x)$.
Applying Simpson’s formula on each interval [x2j , x2j+2], with j = 0, n − 1 and summing, we
obtain
I =
b
a
f (x)dx ≈
n−1
j=0
I2j =
h
3
n−1
j=0
(y2j + 4y2j+1 + y2j+2)
=
h
3
[y0 + y2n + 4(y1 + y3 + · · · + y2n−1) + 2(y2 + y4 + · · · + y2n−2)]. (7.104)
Definition 7.6 Formula (7.104) is called the generalized Simpson formula.
Proposition 7.4 (The Error in the Generalized Simpson Formula). If $f : [a, b] \to \mathbb{R}$ is of class $C^4$ on $[a, b]$, while $x_j$, $j = 0, 1, \ldots, 2n$, is an equidistant division of the interval $[a, b]$, with $x_0 = a$, $x_{2n} = b$, and $x_{j+1} - x_j = h = (b - a)/(2n)$, then the relation
$$\int_a^b f(x)\,\mathrm{d}x - \frac{h}{3}\bigl[y_0 + y_{2n} + 4(y_1 + \cdots + y_{2n-1}) + 2(y_2 + \cdots + y_{2n-2})\bigr] = -\frac{(b - a)^5}{2880n^4}y^{(4)}(\xi) \qquad (7.105)$$
takes place, where $\xi \in [a, b]$.

Figure 7.3 The Simpson formula.
Demonstration. Let us consider the interval $[x_{2j}, x_{2j+2}]$, for which the error is
$$\varepsilon_{2j}(f(x)) = \int_{x_{2j}}^{x_{2j+1}+h} f(x)\,\mathrm{d}x - \frac{h}{3}\bigl(y_{2j} + 4y_{2j+1} + y_{2j+2}\bigr) \qquad (7.106)$$
or, equivalently,
$$\varepsilon_{2j}(f(x)) = \int_{x_{2j+1}-h}^{x_{2j+1}+h} f(x)\,\mathrm{d}x - \frac{h}{3}\bigl[y(x_{2j+1} - h) + 4y(x_{2j+1}) + y(x_{2j+1} + h)\bigr], \qquad (7.107)$$
$\varepsilon_{2j}$ being a function of $h$. We have
$$\frac{\mathrm{d}\varepsilon_{2j}}{\mathrm{d}h} = y(x_{2j+1} + h) + y(x_{2j+1} - h) - \frac{1}{3}\bigl[y(x_{2j+1} - h) + 4y(x_{2j+1}) + y(x_{2j+1} + h)\bigr] - \frac{h}{3}\bigl[y'(x_{2j+1} + h) - y'(x_{2j+1} - h)\bigr] \qquad (7.108)$$
and it follows that
$$\frac{\mathrm{d}\varepsilon_{2j}}{\mathrm{d}h} = \frac{2}{3}\bigl[y(x_{2j+1} + h) + y(x_{2j+1} - h)\bigr] - \frac{4}{3}y(x_{2j+1}) - \frac{h}{3}\bigl[y'(x_{2j+1} + h) - y'(x_{2j+1} - h)\bigr]. \qquad (7.109)$$
Further,
$$\frac{\mathrm{d}^2\varepsilon_{2j}}{\mathrm{d}h^2} = \frac{2}{3}\bigl[y'(x_{2j+1} + h) - y'(x_{2j+1} - h)\bigr] - \frac{1}{3}\bigl[y'(x_{2j+1} + h) - y'(x_{2j+1} - h)\bigr] - \frac{h}{3}\bigl[y''(x_{2j+1} + h) + y''(x_{2j+1} - h)\bigr], \qquad (7.110)$$
that is,
$$\frac{\mathrm{d}^2\varepsilon_{2j}}{\mathrm{d}h^2} = \frac{1}{3}\bigl[y'(x_{2j+1} + h) - y'(x_{2j+1} - h)\bigr] - \frac{h}{3}\bigl[y''(x_{2j+1} + h) + y''(x_{2j+1} - h)\bigr]. \qquad (7.111)$$
Analogously,
$$\frac{\mathrm{d}^3\varepsilon_{2j}}{\mathrm{d}h^3} = \frac{1}{3}\bigl[y''(x_{2j+1} + h) + y''(x_{2j+1} - h)\bigr] - \frac{1}{3}\bigl[y''(x_{2j+1} + h) + y''(x_{2j+1} - h)\bigr] - \frac{h}{3}\bigl[y'''(x_{2j+1} + h) - y'''(x_{2j+1} - h)\bigr] = -\frac{h}{3}\bigl[y'''(x_{2j+1} + h) - y'''(x_{2j+1} - h)\bigr]. \qquad (7.112)$$
Applying Lagrange's finite increments formula to the function $y'''$ on the interval $[x_{2j+1} - h, x_{2j+1} + h]$, it follows that there exists an intermediate point $\xi_{2j} \in (x_{2j+1} - h, x_{2j+1} + h)$ so that
$$y'''(x_{2j+1} + h) - y'''(x_{2j+1} - h) = 2h\,y^{(4)}(\xi_{2j}); \qquad (7.113)$$
hence
$$\frac{\mathrm{d}^3\varepsilon_{2j}}{\mathrm{d}h^3} = -\frac{2h^2}{3}y^{(4)}(\xi_{2j}). \qquad (7.114)$$
On the other hand, we have
$$\varepsilon_{2j}(0) = 0, \quad \frac{\mathrm{d}\varepsilon_{2j}(0)}{\mathrm{d}h} = 0, \quad \frac{\mathrm{d}^2\varepsilon_{2j}(0)}{\mathrm{d}h^2} = 0, \qquad (7.115)$$
and, by successive integration of formula (7.114) between $0$ and $h$, we obtain
$$\frac{\mathrm{d}^2\varepsilon_{2j}(h)}{\mathrm{d}h^2} = \frac{\mathrm{d}^2\varepsilon_{2j}(0)}{\mathrm{d}h^2} + \int_0^h \frac{\mathrm{d}^3\varepsilon_{2j}(\tau)}{\mathrm{d}\tau^3}\mathrm{d}\tau = -\frac{2}{3}y^{(4)}(\xi_{2j})\int_0^h \tau^2\,\mathrm{d}\tau = -\frac{2}{9}h^3y^{(4)}(\xi_{2j}), \qquad (7.116)$$
$$\frac{\mathrm{d}\varepsilon_{2j}(h)}{\mathrm{d}h} = \frac{\mathrm{d}\varepsilon_{2j}(0)}{\mathrm{d}h} + \int_0^h \frac{\mathrm{d}^2\varepsilon_{2j}(\tau)}{\mathrm{d}\tau^2}\mathrm{d}\tau = -\frac{2}{9}y^{(4)}(\xi_{2j})\int_0^h \tau^3\,\mathrm{d}\tau = -\frac{1}{18}h^4y^{(4)}(\xi_{2j}), \qquad (7.117)$$
$$\varepsilon_{2j}(h) = \varepsilon_{2j}(0) + \int_0^h \frac{\mathrm{d}\varepsilon_{2j}(\tau)}{\mathrm{d}\tau}\mathrm{d}\tau = -\frac{1}{18}y^{(4)}(\xi_{2j})\int_0^h \tau^4\,\mathrm{d}\tau = -\frac{1}{90}h^5y^{(4)}(\xi_{2j}). \qquad (7.118)$$
It follows that
$$\varepsilon_{2j}(h) = -\frac{h^5}{90}y^{(4)}(\xi_{2j}), \qquad (7.119)$$
where $\xi_{2j} \in (x_{2j}, x_{2j+2})$.
Summing on the entire interval $[a, b]$, we obtain the error
$$\varepsilon_{[a,b]}(f(x)) = -\frac{h^5}{90}\sum_{j=0}^{n-1}y^{(4)}(\xi_{2j}). \qquad (7.120)$$
Because $f$ is of class $C^4$ on $[a, b]$, there exists $\xi \in [a, b]$ so that
$$\frac{1}{n}\sum_{j=0}^{n-1}y^{(4)}(\xi_{2j}) = y^{(4)}(\xi), \qquad (7.121)$$
and expression (7.120) of the error reads
$$\varepsilon_{[a,b]}(f(x)) = -\frac{nh^5}{90}y^{(4)}(\xi). \qquad (7.122)$$
Taking into account that $h = (b - a)/(2n)$, the last formula leads to
$$\varepsilon_{[a,b]}(f(x)) = -\frac{n}{90}\frac{(b - a)^5}{32n^5}y^{(4)}(\xi) = -\frac{(b - a)^5}{2880n^4}y^{(4)}(\xi), \qquad (7.123)$$
that is, relation (7.105), which had to be stated.
Corollary 7.4 In the conditions of Proposition 7.4, denoting
$$M = \sup_{x \in [a, b]}|f^{(4)}(x)|, \qquad (7.124)$$
the relation
$$|\varepsilon_{[a,b]}(f(x))| \le \frac{M}{2880n^4}(b - a)^5 \qquad (7.125)$$
is valid.
Demonstration. From equation (7.123) it follows that
$$|\varepsilon_{[a,b]}(f(x))| = \frac{(b - a)^5}{2880n^4}|y^{(4)}(\xi)| \le \frac{(b - a)^5}{2880n^4}\sup_{\xi \in [a, b]}|f^{(4)}(\xi)| = \frac{M(b - a)^5}{2880n^4}. \qquad (7.126)$$

Observation 7.9 If the number $n$ of division points increases, then the error decreases like $1/n^4$. But the growth of $n$ cannot be as great as we wish, because the calculation time may increase too much.
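The generalized Simpson formula (7.104) in code (illustrative sketch), together with a check of the $1/n^4$ error decay from (7.105):

```python
import numpy as np

def simpson(f, a, b, n):
    """Generalized Simpson formula (7.104) with 2n equal subintervals."""
    x = np.linspace(a, b, 2 * n + 1)
    y = f(x)
    h = (b - a) / (2 * n)
    return h / 3 * (y[0] + y[-1] + 4 * y[1:-1:2].sum() + 2 * y[2:-1:2].sum())

exact = 2.0  # integral of sin x over [0, pi]
for n in (4, 8, 16):
    print(n, abs(simpson(np.sin, 0.0, np.pi, n) - exact))
# The error falls by a factor of about 16 each time n doubles, cf. (7.105).
```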
7.8 EULER’S AND GREGORY’S FORMULAE
Definition 7.7 We define the operators $\nabla$, $E$, $D$, $J$, called the operator of backward differentiation, the operator of shifting, the operator of differentiation, and the operator of integration, respectively, by the formulae
$$\nabla f(x) = f(x) - f(x - h), \qquad (7.127)$$
$$Ef(x) = f(x + h), \qquad (7.128)$$
$$Df(x) = f'(x), \qquad (7.129)$$
$$Jf(x) = \int_x^{x+h} f(t)\,\mathrm{d}t, \qquad (7.130)$$
where $h$ is the division step.
Observation 7.10
(i) There exist the immediate relations
$$E^pf(x) = f(x + ph), \quad p \in \mathbb{N}, \qquad (7.131)$$
$$DJf(x) = JDf(x), \qquad (7.132)$$
$$DJ = JD = \Delta, \qquad (7.133)$$
$$D^{-1}f(x) = F(x) + C, \qquad (7.134)$$
where $F(x)$ is a primitive of $f(x)$, while $C$ is a constant,
$$D^{-1}\Delta f(x) = Jf(x), \qquad (7.135)$$
$$\Delta = E - 1, \quad \nabla = 1 - E^{-1}, \qquad (7.136)$$
where $1$ is the identity operator, $1f(x) = f(x)$,
$$DJ = JD = E - 1, \qquad (7.137)$$
$$\Delta^p = E^p\nabla^p = (E - 1)^p = E^p - C_p^1E^{p-1} + C_p^2E^{p-2} - \cdots + (-1)^{p-1}C_p^{p-1}E + (-1)^p, \qquad (7.138)$$
$$\nabla^py_k = y_k - C_p^1y_{k-1} + C_p^2y_{k-2} - \cdots + (-1)^{p-1}C_p^{p-1}y_{k-p+1} + (-1)^py_{k-p}, \qquad (7.139)$$
$$(1 - \nabla)(1 + \nabla + \nabla^2 + \cdots + \nabla^p) = 1 - \nabla^{p+1}, \qquad (7.140)$$
$$(1 - \nabla)^{-1} = 1 + \nabla + \nabla^2 + \cdots + \nabla^p + \cdots = \sum_{i=0}^{\infty}\nabla^i. \qquad (7.141)$$
(ii) If the function $f$ is a polynomial of $n$th degree, then
$$(1 - \nabla)^{-1} = 1 + \nabla + \nabla^2 + \cdots + \nabla^n. \qquad (7.142)$$
Let us consider the sum
$$\sum_{l=m}^{k-1}f(x_0 + lh) = f(x_m) + f(x_{m+1}) + \cdots + f(x_{k-1}) = y_m + y_{m+1} + \cdots + y_{k-1}, \qquad (7.143)$$
where $y_i = f(x_i)$, $i \in \mathbb{N}$.
The problem is connected to finding a function $F(x)$ with the property $\Delta F(x) = f(x)$. Indeed, if we find such a function $F(x)$, then
$$\sum_{l=m}^{k-1}f(x_0 + lh) = F(x_{m+1}) - F(x_m) + F(x_{m+2}) - F(x_{m+1}) + \cdots + F(x_k) - F(x_{k-1}) = F(x_k) - F(x_m). \qquad (7.144)$$
Writing $F(x) = \Delta^{-1}f(x)$, we have
$$\Delta^{-1}f(x_k) = C + \sum_{l=m}^{k-1}f(x_l), \qquad (7.145)$$
$$\sum_{l=l_0}^{k-1}f(x_l) = \Delta^{-1}f(x_k) - \Delta^{-1}f(x_{l_0}), \qquad (7.146)$$
where $C$ is a constant, while $l_0$ is an integer for which $m \le l_0 \le k$.
If $f$ is a polynomial, then
$$\sum_{l=0}^{p-1}f(x_l) = (1 + E + E^2 + \cdots + E^{p-1})f(x_0) = \frac{E^p - 1}{E - 1}f(x_0) = \frac{(1 + \Delta)^p - 1}{\Delta}f(x_0) = \left[p + \frac{p(p-1)}{2!}\Delta + \frac{p(p-1)(p-2)}{3!}\Delta^2 + \cdots + \frac{p(p-1)\cdots(p-n)}{(n+1)!}\Delta^n\right]f(x_0), \qquad (7.147)$$
where $n$ is its degree.
Let us remark that the formula is useful for $n$ small in comparison with $p$.
Taking into account the identity
$$DJ\Delta^{-1} = 1, \qquad (7.148)$$
obtained from equation (7.133), and the fact that $E = e^{hD}$, hence $\Delta = e^{hD} - 1$, it follows that
$$hf(x) = \frac{hD}{e^{hD} - 1}Jf(x). \qquad (7.149)$$

Definition 7.8 The coefficients $B_i$ of the expansion
$$\frac{t}{e^t - 1} = \sum_{i=0}^{\infty}\frac{B_i}{i!}t^i \qquad (7.150)$$
are called Bernoulli's numbers.³
Bernoulli's numbers verify the property
$$B_1 = -\frac{1}{2}, \quad B_{2p+1} = 0, \quad p \in \mathbb{N}, \; p \ne 0. \qquad (7.151)$$
Hence it follows that expression (7.149) now becomes
$$hf(x) = \sum_{i=0}^{\infty}\frac{B_i}{i!}h^iD^iJf(x) \qquad (7.152)$$
or
$$hf(x) = \int_x^{x+h}f(t)\,\mathrm{d}t + \sum_{i=1}^{\infty}\frac{B_i}{i!}h^iD^iJf(x). \qquad (7.153)$$
If we take into account that
$$D^iJf(x) = D^{i-1}\bigl(f(x + h) - f(x)\bigr), \qquad (7.154)$$
then relation (7.153) becomes
$$f(x) = \frac{1}{h}\int_x^{x+h}f(t)\,\mathrm{d}t + \sum_{i=1}^{\infty}\frac{B_i}{i!}h^{i-1}\bigl(f^{(i-1)}(x + h) - f^{(i-1)}(x)\bigr) \qquad (7.155)$$
or, equivalently,
$$\sum_{l=0}^{p-1}f(x_l) = \frac{1}{h}\int_{x_0}^{x_p}f(t)\,\mathrm{d}t + \sum_{i=1}^{\infty}\frac{B_i}{i!}h^{i-1}\bigl(f^{(i-1)}(x_p) - f^{(i-1)}(x_0)\bigr), \qquad (7.156)$$
called the first Euler formula or the first Euler–Maclaurin formula.⁴

³The numbers are named after Jacob Bernoulli (1654–1705), who used them in the book Ars Conjectandi, published in 1713. The numbers were also known to Seki Takakazu (Seki Kōwa) (1642–1708).
⁴The formulae are named after Leonhard Euler (1707–1783) and Colin Maclaurin (1698–1746), who discovered them in 1735.
If we take into account equation (7.151), then relation (7.156) reads
$$\sum_{l=0}^{p-1}f(x_l) = \frac{1}{h}\int_{x_0}^{x_p}f(t)\,\mathrm{d}t + \frac{1}{2}\bigl(f(x_0) - f(x_p)\bigr) + \sum_{i=1}^{\infty}\frac{B_{2i}}{(2i)!}h^{2i-1}\bigl(f^{(2i-1)}(x_p) - f^{(2i-1)}(x_0)\bigr). \qquad (7.157)$$
Obviously, if $f$ is a polynomial, then the infinite sum on the right-hand side becomes a finite one.
Analogously, we also obtain the second Euler formula or the second Euler–Maclaurin formula, in the form
$$\sum_{l=0}^{p-1}f\left(x_l + \frac{h}{2}\right) = \frac{1}{h}\int_{x_0}^{x_p}f(t)\,\mathrm{d}t - \sum_{i=1}^{\infty}\frac{(1 - 2^{1-2i})B_{2i}}{(2i)!}h^{2i-1}\bigl(f^{(2i-1)}(x_p) - f^{(2i-1)}(x_0)\bigr). \qquad (7.158)$$
In the first Euler formula we express the derivatives at the point $x_0$ by forward differences and the derivatives at the point $x_p$ by backward differences, in the form
$$hf'(x_0) = \Delta y_0 - \frac{1}{2}\Delta^2y_0 + \frac{1}{3}\Delta^3y_0 - \frac{1}{4}\Delta^4y_0 + \frac{1}{5}\Delta^5y_0 - \cdots,$$
$$hf'(x_p) = \nabla y_p + \frac{1}{2}\nabla^2y_p + \frac{1}{3}\nabla^3y_p + \frac{1}{4}\nabla^4y_p + \frac{1}{5}\nabla^5y_p + \cdots,$$
$$h^3f'''(x_0) = \Delta^3y_0 - \frac{3}{2}\Delta^4y_0 + \frac{7}{4}\Delta^5y_0 - \cdots,$$
$$h^3f'''(x_p) = \nabla^3y_p + \frac{3}{2}\nabla^4y_p + \frac{7}{4}\nabla^5y_p + \cdots; \qquad (7.159)$$
then we obtain Gregory's formula⁵
$$\int_{x_0}^{x_p}f(t)\,\mathrm{d}t = h\left(\frac{1}{2}y_0 + y_1 + y_2 + \cdots + y_{p-1} + \frac{1}{2}y_p\right) - \frac{h}{12}\bigl(\nabla y_p - \Delta y_0\bigr) - \frac{h}{24}\bigl(\nabla^2y_p + \Delta^2y_0\bigr) - \frac{19h}{720}\bigl(\nabla^3y_p - \Delta^3y_0\bigr) - \frac{3h}{160}\bigl(\nabla^4y_p + \Delta^4y_0\bigr) - \frac{863h}{60480}\bigl(\nabla^5y_p - \Delta^5y_0\bigr) - \cdots \qquad (7.160)$$

⁵The formula was discovered by James Gregory (1638–1675) in 1670.
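Formula (7.157) can be checked mechanically for a polynomial, where the series terminates (a small sketch; `sympy.bernoulli` supplies the numbers $B_{2i}$, and the polynomial, step, and knots are our own arbitrary choices):

```python
import sympy as sp

x = sp.Symbol('x')
f = x**4 + x          # any polynomial: the Euler-Maclaurin series then terminates
h, p, x0 = sp.Rational(1, 2), 10, 0
lhs = sum(f.subs(x, x0 + l * h) for l in range(p))      # left side of (7.157)

xp = x0 + p * h
rhs = sp.integrate(f, (x, x0, xp)) / h + sp.Rational(1, 2) * (f.subs(x, x0) - f.subs(x, xp))
for i in range(1, 3):                  # derivatives of order 1 and 3 suffice for degree 4
    d = sp.diff(f, x, 2 * i - 1)
    rhs += sp.bernoulli(2 * i) / sp.factorial(2 * i) * h**(2 * i - 1) \
           * (d.subs(x, xp) - d.subs(x, x0))
print(sp.simplify(lhs - rhs))          # 0: both sides of (7.157) agree
```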
7.9 ROMBERG’S FORMULA
Let us suppose that the error in the calculation of the integral
$$I = \int_a^b f(x)\,\mathrm{d}x \qquad (7.161)$$
may be written in the form
$$E = Ch^pf^{(p)}(\xi), \qquad (7.162)$$
where $h$ is the integration step, $C$ is a positive constant that does not depend on $h$, $p$ is a nonzero natural number, while $\xi \in (a, b)$.
If we calculate the integral (7.161) with the integration steps $h_1$ and $h_2$, then the errors are
$$E_1 = I - I_1 = Ch_1^pf^{(p)}(\xi_1), \qquad (7.163)$$
$$E_2 = I - I_2 = Ch_2^pf^{(p)}(\xi_2). \qquad (7.164)$$
Let us remark that, in general, $\xi_1 \ne \xi_2$.
Let us suppose that $f^{(p)}(\xi_1) \approx f^{(p)}(\xi_2)$. Under these conditions, the integral $I$ may be approximated by Richardson's extrapolation formula⁶
$$I = \frac{h_1^pI_2 - h_2^pI_1}{h_1^p - h_2^p} = I_2 + \frac{I_2 - I_1}{\left(\dfrac{h_1}{h_2}\right)^p - 1}. \qquad (7.165)$$
If, for example, $h_2 = \lambda h_1$, then
$$I = \frac{I_2 - \lambda^pI_1}{1 - \lambda^p} = I_2 + \frac{I_2 - I_1}{\left(\dfrac{1}{\lambda}\right)^p - 1}. \qquad (7.166)$$
Usually, we consider $h_2 = h_1/2$, and it follows that
$$I = \frac{2^pI_2 - I_1}{2^p - 1} = I_2 + \frac{I_2 - I_1}{2^p - 1}. \qquad (7.167)$$
On the other hand, the error in the trapezoid formula may be put in the form
$$E = C_1h^2 + C_2h^4 + \cdots + C_ph^{2p} + (b - a)h^{2p+2}\frac{B_{2p+2}}{(2p + 2)!}f^{(2p+2)}(\xi), \qquad (7.168)$$
where the $B_{2k}$ are Bernoulli's numbers.
Suppose now that the integration step is chosen of the form
$$h_n = \frac{b - a}{2^n}, \qquad (7.169)$$
and let us denote by $I_n^{(0)}$ the value of the integral calculated with the step $h_n$. We apply Richardson's extrapolation formula, in which $I_{n+1}^{(0)}$ is the value of the same integral with a halved step. Since the leading error term is in $h^2$, we obtain the approximation
$$I_n^{(1)} = \frac{4I_{n+1}^{(0)} - I_n^{(0)}}{4 - 1}. \qquad (7.170)$$
The procedure may continue, and we obtain the general recurrence formulae
$$I_n^{(p)} = \frac{4^pI_{n+1}^{(p-1)} - I_n^{(p-1)}}{4^p - 1}, \qquad (7.171)$$
$$I_0^{(p)} = \frac{4^pI_1^{(p-1)} - I_0^{(p-1)}}{4^p - 1}. \qquad (7.172)$$

⁶The formula was published by Lewis Fry Richardson (1881–1953) in 1910.
TABLE 7.1 Table of the Romberg Procedure

$I_0^{(0)}$
$I_1^{(0)}$   $I_0^{(1)}$
$I_2^{(0)}$   $I_1^{(1)}$   $I_0^{(2)}$
$I_3^{(0)}$   $I_2^{(1)}$   $I_1^{(2)}$   $I_0^{(3)}$
⋮             ⋮             ⋮             ⋮

Using these formulae, the approximation $I_1^{(p)}$ has an error of the order $h^{2p+2}$, so that, for example, in expression (7.168) of the error for $I_1^{(1)}$, the term $C_1h^2$ does not appear any longer.
This procedure is called the Romberg procedure.⁷
Usually, we work in a table form, where the integrals are arranged as shown in Table 7.1.

⁷Werner Romberg (1909–2003) published the procedure in 1955. In fact, the procedure is an application of the Richardson extrapolation to the trapezoid formula.
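A compact implementation of the Romberg table (7.170)-(7.172), built on the trapezoid formula (illustrative sketch):

```python
import numpy as np

def romberg(f, a, b, levels):
    """Romberg procedure: column 0 holds trapezoid values I_n^(0) with h_n = (b-a)/2^n;
    later columns apply Richardson extrapolation, eq. (7.171)."""
    T = [[None] * (levels + 1) for _ in range(levels + 1)]
    for n in range(levels + 1):
        m = 2 ** n
        x = np.linspace(a, b, m + 1)
        y = f(x)
        T[n][0] = (b - a) / m * (y[0] / 2 + y[1:-1].sum() + y[-1] / 2)
    for p in range(1, levels + 1):
        for n in range(levels + 1 - p):
            T[n][p] = (4 ** p * T[n + 1][p - 1] - T[n][p - 1]) / (4 ** p - 1)
    return T[0][levels]

print(romberg(np.sin, 0.0, np.pi, 5))  # approx. 2.0
```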
7.10 CHEBYSHEV’S QUADRATURE FORMULAE
In the Newton–Côtes formulae the division knots have been arbitrarily chosen, the only condition imposed being that of their equidistance. If this condition is dropped and we choose certain special points as division knots, then we obtain Chebyshev's quadrature formulae.⁸
Let us consider the integral
$$I = \int_{-1}^{1}f(x)\,\mathrm{d}x \qquad (7.173)$$
and let us write the relation
$$I \approx \sum_{i=1}^{n}A_if(x_i), \qquad (7.174)$$
where the $A_i$ are certain constants and the $x_i$ are the division knots. Obviously, relation (7.174) is an equality only in certain cases.
In the case of Chebyshev's quadrature formulae, the following conditions are imposed:
(a) the constants $A_i$, $i = 1, 2, \ldots, n$, are equal, that is,
$$A_1 = A_2 = \cdots = A_n = A; \qquad (7.175)$$
(b) the quadrature formula (7.174) is exact for any polynomial up to degree $n$ inclusive.
Observation 7.11
(i) Let us write the quadrature formula (7.174) for the polynomial $f(x) = 1$. We obtain
$$I = \int_{-1}^{1}\mathrm{d}x = 2; \qquad (7.176)$$
taking into account condition (a), it follows that
$$I = A_1 + A_2 + \cdots + A_n = nA, \qquad (7.177)$$
from which
$$A_1 = A_2 = \cdots = A_n = A = \frac{2}{n}. \qquad (7.178)$$
(ii) Because the polynomials $1, x, x^2, \ldots, x^n$ form a basis for the vector space of polynomials of degree at most $n$, it follows that we must verify condition (b) only for these polynomials. But
$$\int_{-1}^{1}x^k\,\mathrm{d}x = \left.\frac{x^{k+1}}{k + 1}\right|_{-1}^{1} = \frac{1 - (-1)^{k+1}}{k + 1}, \qquad (7.179)$$
and we obtain the system
$$x_1 + x_2 + \cdots + x_n = 0, \quad x_1^2 + x_2^2 + \cdots + x_n^2 = \frac{2}{3}\cdot\frac{n}{2}, \quad x_1^3 + x_2^3 + \cdots + x_n^3 = 0, \; \ldots,$$
$$x_1^k + x_2^k + \cdots + x_n^k = \frac{1 - (-1)^{k+1}}{k + 1}\cdot\frac{n}{2}, \; \ldots, \quad x_1^n + x_2^n + \cdots + x_n^n = \frac{1 - (-1)^{n+1}}{n + 1}\cdot\frac{n}{2}. \qquad (7.180)$$
Solving system (7.180) in the unknowns $x_1, x_2, \ldots, x_n$ is equivalent to solving an algebraic equation of degree $n$. A question arises: are the solutions of system (7.180) real and contained in the interval $[-1, 1]$? The answer to this question is positive only for $n \le 7$ and $n = 9$.⁹ It has been shown that for $n = 8$ and $n \ge 10$ system (7.180) does not have only real roots; hence, Chebyshev's method cannot be applied.

⁸The formula is named in honor of Pafnuty Lvovich Chebyshev (1821–1894).
⁹This result belongs to Francis Begnaud Hildebrand (1915–2002), who published it in Introduction to Numerical Analysis in 1956.
Observation 7.12 Let the integral be
$$J = \int_a^bF(u)\,\mathrm{d}u, \qquad (7.181)$$
for which we make the change of variable
$$u = \frac{b + a}{2} + \frac{b - a}{2}x, \quad \mathrm{d}u = \frac{b - a}{2}\mathrm{d}x. \qquad (7.182)$$
It follows that
$$J = \int_a^bF(u)\,\mathrm{d}u = \int_{-1}^{1}F\left(\frac{b + a}{2} + \frac{b - a}{2}x\right)\frac{b - a}{2}\mathrm{d}x; \qquad (7.183)$$
denoting
$$f(x) = F\left(\frac{b + a}{2} + \frac{b - a}{2}x\right)\frac{b - a}{2}, \qquad (7.184)$$
we obtain form (7.173). The quadrature formula now reads
$$\int_a^bF(u)\,\mathrm{d}u \approx \frac{2}{n}\cdot\frac{b - a}{2}\sum_{i=1}^{n}F(u_i) = \frac{b - a}{n}\sum_{i=1}^{n}F(u_i), \qquad (7.185)$$
where
$$u_i = \frac{b + a}{2} + \frac{b - a}{2}x_i. \qquad (7.186)$$
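For $n = 3$, solving (7.180) gives the knots $x_{1,3} = \mp 1/\sqrt{2}$, $x_2 = 0$ (a standard result for Chebyshev's equal-weight quadrature). The sketch below (illustrative) checks exactness on monomials and applies the rule on $[a, b]$ via (7.185):

```python
import numpy as np

nodes = np.array([-1 / np.sqrt(2), 0.0, 1 / np.sqrt(2)])  # Chebyshev knots for n = 3

def chebyshev_quad(F, a, b):
    """Equal-weight Chebyshev rule (7.185) on [a, b] with n = 3 knots."""
    u = (b + a) / 2 + (b - a) / 2 * nodes                  # (7.186)
    return (b - a) / 3 * F(u).sum()

# Exactness on [-1, 1] for x^k, k = 0..3 (condition (b)):
for k in range(4):
    exact = (1 - (-1) ** (k + 1)) / (k + 1)                # (7.179)
    print(k, chebyshev_quad(lambda x: x ** k, -1.0, 1.0), exact)
```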
7.11 LEGENDRE’S POLYNOMIALS
Let us consider an interval $[a, b] \subset \mathbb{R}$ and let $f$ and $g$ be two functions of class at least $C^n$ on $[a, b]$. The obvious relation (obtained by repeated integration by parts)
$$\int_a^bf(x)g^{(n)}(x)\,\mathrm{d}x = \left.f(x)g^{(n-1)}(x)\right|_a^b - \left.f'(x)g^{(n-2)}(x)\right|_a^b + \left.f''(x)g^{(n-3)}(x)\right|_a^b - \cdots + (-1)^{n-1}\left.f^{(n-1)}(x)g(x)\right|_a^b + (-1)^n\int_a^bf^{(n)}(x)g(x)\,\mathrm{d}x \qquad (7.187)$$
takes place in these conditions. We will particularize relation (7.187), taking for $f(x)$ any polynomial $Q(x)$ of degree at most $n - 1$ and for $g(x)$ the polynomial $A_n(x - a)^n(x - b)^n$, $A_n \in \mathbb{R}$. Because the degree of $Q(x)$ is at most $n - 1$, we get
$$Q^{(n)}(x) = 0, \quad \int_a^bQ^{(n)}(x)g(x)\,\mathrm{d}x = 0. \qquad (7.188)$$
From
$$g(x) = A_n(x - a)^n(x - b)^n \qquad (7.189)$$
we obtain
$$g(a) = g'(a) = g''(a) = \cdots = g^{(n-1)}(a) = 0, \quad g(b) = g'(b) = g''(b) = \cdots = g^{(n-1)}(b) = 0, \qquad (7.190)$$
and relation (7.187) now reduces to
$$A_n\int_a^bQ(x)\frac{\mathrm{d}^n}{\mathrm{d}x^n}\bigl[(x - a)^n(x - b)^n\bigr]\,\mathrm{d}x = 0. \qquad (7.191)$$
Let us now denote by $P_n$ the polynomial of degree $n$ given by
$$P_n(x) = A_n\frac{\mathrm{d}^n}{\mathrm{d}x^n}\bigl[(x - a)^n(x - b)^n\bigr]. \qquad (7.192)$$
On the other hand, $Q(x)$ is an arbitrary polynomial of degree at most $n - 1$, so that for $Q(x)$ we may take the polynomials of a basis of the vector space of the polynomials of degree at most $n - 1$, that is, the polynomials $1, x, x^2, \ldots, x^{n-1}$. We may write
$$\int_a^bP_n(x)\,\mathrm{d}x = 0, \quad \int_a^bxP_n(x)\,\mathrm{d}x = 0, \; \ldots, \; \int_a^bx^{n-1}P_n(x)\,\mathrm{d}x = 0. \qquad (7.193)$$
We observe that we may also write the relation
$$\int_a^bP_m(x)P_n(x)\,\mathrm{d}x = 0, \quad m \ne n. \qquad (7.194)$$
Indeed, let us suppose that $m < n$; we may then consider that $P_m(x)$ is one of the polynomials $Q(x)$ of degree at most $n - 1$.
Observation 7.13 Relation (7.194) means that the sequence $\{P_n(x)\}_{n \in \mathbb{N}}$ is a sequence of orthogonal polynomials on $[a, b]$.

Observation 7.14 The polynomials $P_n$ are unique, up to a multiplicative constant. Indeed, let us suppose that there also exists a sequence $\{\Pi_n(x)\}_{n \in \mathbb{N}}$ of orthogonal polynomials. We may write the relations
$$\int_a^bQ(x)\Pi_n(x)\,\mathrm{d}x = 0, \quad \int_a^bQ(x)P_n(x)\,\mathrm{d}x = 0, \quad \int_a^bQ(x)C_nP_n(x)\,\mathrm{d}x = 0, \qquad (7.195)$$
where $C_n$ is an arbitrary constant, while $Q(x)$ is an arbitrary polynomial of degree at most equal to $n - 1$. From the first and the third relations (7.195) we obtain
$$\int_a^b\bigl(C_nP_n(x) - \Pi_n(x)\bigr)Q(x)\,\mathrm{d}x = 0. \qquad (7.196)$$
We choose the constant $C_n$ so that the polynomial $C_nP_n(x) - \Pi_n(x)$ has degree at most $n - 1$, and we take
$$Q(x) = C_nP_n(x) - \Pi_n(x). \qquad (7.197)$$
We obtain the expression
$$\int_a^b\bigl(C_nP_n(x) - \Pi_n(x)\bigr)^2\,\mathrm{d}x = 0, \qquad (7.198)$$
hence
$$C_nP_n(x) - \Pi_n(x) = 0, \qquad (7.199)$$
that is, the $\{P_n(x)\}_{n \in \mathbb{N}}$ are uniquely determined up to a multiplicative constant.
Definition 7.9 The sequence of polynomials¹⁰
$$P_n(x) = \frac{1}{2^nn!}\frac{\mathrm{d}^n}{\mathrm{d}x^n}\bigl[(x^2 - 1)^n\bigr] \qquad (7.200)$$
is called the sequence of Legendre polynomials.

Theorem 7.1 Let $\{P_n(x)\}_{n \in \mathbb{N}}$ be the sequence of Legendre polynomials and let $R_n(x)$ be the polynomials
$$R_n(x) = 2^nn!\,P_n(x). \qquad (7.201)$$

¹⁰These polynomials were introduced by Adrien-Marie Legendre (1752–1833) in Recherches sur la figure des planètes, published in 1784.
Under these conditions, the following affirmations hold:
(i) for any $n \in \mathbb{N}$,
$$P_n(1) = 1; \qquad (7.202)$$
(ii) for any $n \in \mathbb{N}$,
$$P_n(-1) = (-1)^n; \qquad (7.203)$$
(iii) all the real roots of Legendre's polynomials $P_n(x)$ are in the interval $(-1, 1)$ for any $n \in \mathbb{N}$;
(iv) for any $n \in \mathbb{N}$ we have
$$(x^2 - 1)R_n'(x) = nxR_n(x) - 2n^2R_{n-1}(x); \qquad (7.204)$$
(v) for any $n \in \mathbb{N}$ we have
$$R_{n+1}(x) = 2(2n + 1)xR_n(x) - 4n^2R_{n-1}(x); \qquad (7.205)$$
(vi) the sequence of the polynomials $R_n(x)$ forms a Sturm sequence.
Demonstration.
(i) We rewrite the Legendre polynomial (7.200) by means of the Leibniz formula
$$\frac{\mathrm{d}^n}{\mathrm{d}x^n}(uv) = \frac{\mathrm{d}^nu}{\mathrm{d}x^n}v + C_n^1\frac{\mathrm{d}^{n-1}u}{\mathrm{d}x^{n-1}}\frac{\mathrm{d}v}{\mathrm{d}x} + C_n^2\frac{\mathrm{d}^{n-2}u}{\mathrm{d}x^{n-2}}\frac{\mathrm{d}^2v}{\mathrm{d}x^2} + \cdots + u\frac{\mathrm{d}^nv}{\mathrm{d}x^n}, \qquad (7.206)$$
assuming
$$u = (x - 1)^n, \quad v = (x + 1)^n. \qquad (7.207)$$
It follows that
$$P_n(x) = \frac{1}{2^nn!}\Bigl\{[(x - 1)^n]^{(n)}(x + 1)^n + C_n^1[(x - 1)^n]^{(n-1)}[(x + 1)^n]' + C_n^2[(x - 1)^n]^{(n-2)}[(x + 1)^n]'' + \cdots + (x - 1)^n[(x + 1)^n]^{(n)}\Bigr\}. \qquad (7.208)$$
But
$$\left.[(x - 1)^n]^{(k)}\right|_{x=1} = 0, \quad k = 0, 1, \ldots, n-1, \quad [(x - 1)^n]^{(n)} = n!, \qquad (7.209)$$
and
$$\left.[(x + 1)^n]^{(k)}\right|_{x=-1} = 0, \quad k = 0, 1, \ldots, n-1, \quad [(x + 1)^n]^{(n)} = n!. \qquad (7.210)$$
Relation (7.208) leads to
$$P_n(1) = \frac{1}{2^nn!}\,n!(1 + 1)^n = 1. \qquad (7.211)$$
(ii) From equation (7.208) we get
$$P_n(-1) = \frac{1}{2^nn!}\,n!(-1 - 1)^n = (-1)^n. \qquad (7.212)$$
(iii) Let us observe that the polynomial $(x^2 - 1)^n$ and its $n - 1$ successive derivatives vanish at the points $x = -1$ and $x = 1$. Taking into account Rolle's theorem, under these conditions, the first derivative will have a real root in the interval $(-1, 1)$. The first derivative vanishes at three points, $x = -1$, $x = 1$, and a point between $-1$ and $1$, and it follows that the second derivative will have two distinct roots in the interval $(-1, 1)$. Applying Rolle's theorem step by step, it follows that the $(n - 1)$th derivative has $n - 1$ distinct roots in the interval $(-1, 1)$; hence, $P_n(x)$ has $n$ distinct roots in the interval $(-1, 1)$.
(iv) Let us write
$$R_n(x) = [(x^2 - 1)^{n-1}(x^2 - 1)]^{(n)}, \qquad (7.213)$$
a relation to which we apply Leibniz's formula (7.206) with
$$u = (x^2 - 1)^{n-1}, \quad v = x^2 - 1. \qquad (7.214)$$
It follows that
$$R_n(x) = [(x^2 - 1)^{n-1}]^{(n)}(x^2 - 1) + 2nx[(x^2 - 1)^{n-1}]^{(n-1)} + n(n - 1)[(x^2 - 1)^{n-1}]^{(n-2)}. \qquad (7.215)$$
Now, we write
$$R_n(x) = [(x^2 - 1)^n]^{(n)} = 2n[(x^2 - 1)^{n-1}x]^{(n-1)} \qquad (7.216)$$
and apply again Leibniz's formula (7.206) with
$$u = (x^2 - 1)^{n-1}, \quad v = x, \qquad (7.217)$$
obtaining
$$R_n(x) = 2nx[(x^2 - 1)^{n-1}]^{(n-1)} + 2n(n - 1)[(x^2 - 1)^{n-1}]^{(n-2)}. \qquad (7.218)$$
Multiplying relation (7.215) by 2 and subtracting relation (7.218), we get
$$R_n(x) = 2(x^2 - 1)R_{n-1}'(x) + 2nxR_{n-1}(x). \qquad (7.219)$$
On the other hand,
$$R_n'(x) = [(x^2 - 1)^n]^{(n+1)} = 2n[(x^2 - 1)^{n-1}x]^{(n)}, \qquad (7.220)$$
and we may again apply Leibniz's formula (7.206) with
$$u = (x^2 - 1)^{n-1}, \quad v = x, \qquad (7.221)$$
resulting in
$$R_n'(x) = 2nxR_{n-1}'(x) + 2n^2R_{n-1}(x). \qquad (7.222)$$
Multiplying relation (7.219) by $nx$ and relation (7.222) by $x^2 - 1$ and subtracting the results thus obtained one from the other, we obtain
$$(x^2 - 1)R_n'(x) = nxR_n(x) - 2n^2R_{n-1}(x), \qquad (7.223)$$
that is, relation (7.204), which had to be stated.
(v) Making $n \to n + 1$ in relation (7.219), it follows that
$$R_{n+1}(x) = 2(x^2 - 1)R_n'(x) + 2(n + 1)xR_n(x) \qquad (7.224)$$
or, equivalently,
$$2(x^2 - 1)R_n'(x) = R_{n+1}(x) - 2(n + 1)xR_n(x). \qquad (7.225)$$
We multiply relation (7.223) by 2 and subtract expression (7.225) from the result thus obtained, that is,
$$0 = 2nxR_n(x) + 2(n + 1)xR_n(x) - R_{n+1}(x) - 4n^2R_{n-1}(x) \qquad (7.226)$$
or
$$R_{n+1}(x) = 2(2n + 1)xR_n(x) - 4n^2R_{n-1}(x), \qquad (7.227)$$
that is, relation (7.205), which had to be stated.
(vi) The last polynomial of the sequence (i.e., $R_0(x)$) preserves a constant sign, because it is a constant. Two neighboring polynomials $R_k(x)$ and $R_{k+1}(x)$ cannot vanish simultaneously because, taking into account equation (7.227), $R_{k-1}(x)$ would vanish too and, step by step, $R_0(x)$ would also vanish, which is absurd. If $R_n(x_0) = 0$, then from equation (7.227) we obtain
$$R_{n+1}(x_0) = -4n^2R_{n-1}(x_0), \qquad (7.228)$$
hence
$$R_{n+1}(x_0)R_{n-1}(x_0) < 0. \qquad (7.229)$$
Let $x_0$ be a root of $R_n(x)$. From equation (7.223) we obtain
$$(x_0^2 - 1)R_n'(x_0) = nx_0R_n(x_0) - 2n^2R_{n-1}(x_0) \qquad (7.230)$$
and, because $R_n(x_0) = 0$, it follows that
$$(1 - x_0^2)R_n'(x_0) = 2n^2R_{n-1}(x_0). \qquad (7.231)$$
But $x_0 \in (-1, 1)$, because the roots of Legendre's polynomial
$$P_n(x) = \frac{1}{2^nn!}R_n(x) \qquad (7.232)$$
are in the interval $(-1, 1)$; hence,
$$1 - x_0^2 > 0. \qquad (7.233)$$
From equations (7.231) and (7.233) it follows that $R_n'(x_0)$ and $R_{n-1}(x_0)$ have the same sign. It follows that $\{R_n(x)\}$ forms a Sturm sequence.
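The definition (7.200) and the recurrence (7.205) are easy to cross-check symbolically (a small sketch, using `sympy` as our tool of choice):

```python
import sympy as sp

x = sp.Symbol('x')

def P(n):
    """Legendre polynomial via definition (7.200)."""
    return sp.expand(sp.diff((x**2 - 1)**n, x, n) / (2**n * sp.factorial(n)))

def R(n):
    return sp.expand(2**n * sp.factorial(n) * P(n))   # (7.201)

n = 3
lhs = R(n + 1)
rhs = sp.expand(2 * (2 * n + 1) * x * R(n) - 4 * n**2 * R(n - 1))
print(sp.simplify(lhs - rhs))   # 0: recurrence (7.205) holds
print(P(3))                     # 5*x**3/2 - 3*x/2, with roots in (-1, 1)
```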
7.12 GAUSS'S QUADRATURE FORMULAE

Let $f : [-1, 1] \to \mathbb{R}$ and let the quadrature formula¹¹ be
$$I = \int_{-1}^{1}f(x)\,\mathrm{d}x \approx \sum_{i=1}^{n}A_if(x_i). \qquad (7.234)$$
We wish formula (7.234) to be exact for polynomials of the maximum possible degree $N$. Because we have $2n$ unknowns, that is, the constants $A_1, A_2, \ldots, A_n$ and the knots $x_1, x_2, \ldots, x_n$ of the division, it follows that
$$N = 2n - 1, \qquad (7.235)$$
because a polynomial of degree $2n - 1$ has $2n$ coefficients. Proceeding as for Chebyshev's quadrature formulae, it follows that it is sufficient to satisfy relation (7.234) only for the polynomials $1, x, x^2, x^3, \ldots, x^{2n-1}$, because they form a basis in the vector space of the polynomials of degree at most $2n - 1$.
On the other hand,
$$\int_{-1}^{1}x^k\,\mathrm{d}x = \frac{1 - (-1)^{k+1}}{k + 1}, \qquad (7.236)$$
and we obtain the system
$$A_1 + A_2 + \cdots + A_n = \int_{-1}^{1}\mathrm{d}x = 2, \quad A_1x_1 + A_2x_2 + \cdots + A_nx_n = \int_{-1}^{1}x\,\mathrm{d}x = 0,$$
$$A_1x_1^2 + A_2x_2^2 + \cdots + A_nx_n^2 = \int_{-1}^{1}x^2\,\mathrm{d}x = \frac{2}{3}, \; \ldots, \quad A_1x_1^k + A_2x_2^k + \cdots + A_nx_n^k = \int_{-1}^{1}x^k\,\mathrm{d}x = \frac{1 - (-1)^{k+1}}{k + 1}, \; \ldots,$$
$$A_1x_1^{2n-1} + A_2x_2^{2n-1} + \cdots + A_nx_n^{2n-1} = \int_{-1}^{1}x^{2n-1}\,\mathrm{d}x = 0. \qquad (7.237)$$
Let us consider now
$$f(x) = x^kP_n(x), \quad k = 0, 1, \ldots, n-1, \qquad (7.238)$$
where $P_n(x)$ is Legendre's polynomial of degree $n$. Taking into account the properties of the Legendre polynomial, we have
$$\int_{-1}^{1}x^kP_n(x)\,\mathrm{d}x = 0, \quad k = 0, 1, \ldots, n-1, \qquad (7.239)$$
and from formula (7.234) we get
$$\int_{-1}^{1}x^kP_n(x)\,\mathrm{d}x = \sum_{i=1}^{n}A_ix_i^kP_n(x_i), \quad k = 0, 1, \ldots, n-1. \qquad (7.240)$$
Equating the last two relations, it follows that the $x_i$ are the roots of Legendre's polynomial of $n$th degree, all these roots being real, distinct, and situated in the interval $(-1, 1)$.
We now select the first $n$ equations from system (7.237), which form a linear system of $n$ equations with $n$ unknowns, that is, the coefficients $A_1, A_2, \ldots, A_n$. The determinant of this system is a Vandermonde one,
$$\Delta = \prod_{\substack{i,j=1 \\ i<j}}^{n}(x_i - x_j) \ne 0, \qquad (7.241)$$
because the roots $x_i$ of Legendre's polynomial $P_n(x)$ are distinct. It thus follows that the system has a unique solution.

¹¹The method was developed by Carl Friedrich Gauss (1777–1855) in Methodus nova integralium valores per approximationem inveniendi in 1815. The method is also known as Gauss–Legendre quadrature.
Observation 7.15 If we have to calculate
$$J = \int_a^bF(u)\,\mathrm{d}u, \qquad (7.242)$$
then, by the change of variable
$$u = \frac{b + a}{2} + \frac{b - a}{2}x, \quad \mathrm{d}u = \frac{b - a}{2}\mathrm{d}x, \qquad (7.243)$$
we obtain
$$J = \int_{-1}^{1}F\left(\frac{b + a}{2} + \frac{b - a}{2}x\right)\frac{b - a}{2}\mathrm{d}x; \qquad (7.244)$$
denoting
$$f(x) = F\left(\frac{b + a}{2} + \frac{b - a}{2}x\right)\frac{b - a}{2}, \qquad (7.245)$$
we obtain form (7.234) of the integral,
$$\int_a^bF(u)\,\mathrm{d}u \approx \frac{b - a}{2}\sum_{i=1}^{n}A_iF\left(\frac{b + a}{2} + \frac{b - a}{2}x_i\right), \qquad (7.246)$$
where the $x_i$ are the roots of the Legendre polynomial.
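In practice, the Legendre roots $x_i$ and weights $A_i$ are tabulated or generated numerically; `numpy.polynomial.legendre.leggauss` returns exactly these quantities. A sketch of (7.246) on a general interval:

```python
import numpy as np
from numpy.polynomial.legendre import leggauss

def gauss_legendre(F, a, b, n):
    """Gauss quadrature (7.246): nodes x_i are Legendre roots, A_i the weights."""
    x, A = leggauss(n)                       # nodes and weights on [-1, 1]
    u = (b + a) / 2 + (b - a) / 2 * x        # change of variable (7.243)
    return (b - a) / 2 * (A * F(u)).sum()

# n nodes integrate polynomials of degree 2n - 1 exactly, cf. (7.235):
print(gauss_legendre(lambda u: u**5, 0.0, 1.0, 3))  # 1/6 up to rounding
```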
7.13 ORTHOGONAL POLYNOMIALS
Let us denote by $\mathbb{R}[X]$ the set of the polynomials with real coefficients in the indeterminate $X$. We define the scalar product on $\mathbb{R}[X]$ by
$$\langle P(x), Q(x)\rangle = \int_a^bP(x)Q(x)\rho(x)\,\mathrm{d}x, \qquad (7.247)$$
where $\rho(x)$ is a weight function.

Definition 7.10 We say that the polynomials $P$ and $Q$ are orthogonal if and only if $\langle P, Q\rangle = 0$, where the scalar product $\langle\cdot,\cdot\rangle$ has been defined by relation (7.247).
Observation 7.16 Starting from the sequence of polynomials $1, x, x^2, \ldots$, we construct a sequence of orthogonal polynomials $P_0, P_1, \ldots, P_n$ with the help of the Gram–Schmidt procedure. Thus, we have
$$P_0 = 1, \quad P_1 = x - \frac{\langle x, P_0\rangle}{\|P_0\|^2}P_0, \; \ldots, \; P_n = x^n - \sum_{i=0}^{n-1}\frac{\langle x^n, P_i\rangle}{\|P_i\|^2}P_i, \; \ldots, \qquad (7.248)$$
where $\|\cdot\|$ marks the norm defined by
$$\|P\| = \sqrt{\langle P, P\rangle}. \qquad (7.249)$$
We may thus construct various sequences of orthogonal polynomials.
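The construction (7.248) is a few lines of computer algebra. The sketch below (illustrative; `sympy` integration, with the weight and interval as parameters) reproduces the monic families of the following subsections by changing $\rho$, $a$, $b$:

```python
import sympy as sp

x = sp.Symbol('x')

def orthogonal_polynomials(rho, a, b, n):
    """Gram-Schmidt construction (7.248) for the scalar product (7.247)."""
    def dot(P, Q):
        return sp.integrate(P * Q * rho, (x, a, b))
    Ps = [sp.Integer(1)]
    for k in range(1, n + 1):
        Pk = x**k
        for Pi in Ps:
            Pk -= dot(x**k, Pi) / dot(Pi, Pi) * Pi   # subtract projections
        Ps.append(sp.expand(Pk))
    return Ps

print(orthogonal_polynomials(sp.Integer(1), -1, 1, 3))     # Legendre: 1, x, x**2 - 1/3, x**3 - 3x/5
print(orthogonal_polynomials((1 - x) * (1 + x), -1, 1, 3)) # Jacobi (alpha = beta = 1): x**2 - 1/5, x**3 - 3x/7
```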
7.13.1 Legendre Polynomials
In the case of Legendre’s polynomials, we choose a = −1, b = 1, ρ(x) = 1; It follows that
P0 = 1, (7.250)
P1 = x −
x, 1
P0
2
× 1 = x, (7.251)
P2 = x2
−
x2, 1
P0
2
× 1 −
x2, x
P1
× x = x2
−
1
3
, (7.252)
P3 = x3
−
x3
, 1
P0
2
× 1 −
x3
, x
P1
2
× x −
x3
, x2
− 1
3
P2
2
× x2
−
1
3
= x3
−
3
5
x, . . . (7.253)
7.13.2 Chebyshev Polynomials
We define the Chebyshev polynomials by $a = -1$, $b = 1$, $\rho(x) = 1/\sqrt{1 - x^2}$.
Because
$$I_k = \int_{-1}^{1}\frac{x^k}{\sqrt{1 - x^2}}\,\mathrm{d}x = \left.\left(-x^{k-1}\sqrt{1 - x^2}\right)\right|_{-1}^{1} + (k - 1)\int_{-1}^{1}x^{k-2}\sqrt{1 - x^2}\,\mathrm{d}x = (k - 1)\int_{-1}^{1}\frac{x^{k-2}\,\mathrm{d}x}{\sqrt{1 - x^2}} - (k - 1)\int_{-1}^{1}\frac{x^k\,\mathrm{d}x}{\sqrt{1 - x^2}} = (k - 1)I_{k-2} - (k - 1)I_k, \qquad (7.254)$$
we obtain
$$kI_k = (k - 1)I_{k-2}, \qquad (7.255)$$
that is,
$$I_k = \frac{k - 1}{k}I_{k-2}. \qquad (7.256)$$
On the other hand,
$$I_0 = \int_{-1}^{1}\frac{\mathrm{d}x}{\sqrt{1 - x^2}} = \pi, \qquad (7.257)$$
$$I_1 = \int_{-1}^{1}\frac{x\,\mathrm{d}x}{\sqrt{1 - x^2}} = 0; \qquad (7.258)$$
hence
$$I_{2p+1} = 0, \quad p \in \mathbb{N}, \qquad (7.259)$$
$$I_2 = \frac{1}{2}I_0 = \frac{\pi}{2}, \quad I_4 = \frac{3}{4}I_2 = \frac{3\pi}{8}, \; \ldots, \; I_{2p} = \frac{2p - 1}{2p}I_{2p-2} = \frac{(2p - 1)!!}{(2p)!!}\pi. \qquad (7.260)$$
We obtain the Chebyshev polynomials in the form
$$P_0 = 1, \qquad (7.261)$$
$$P_1 = x - \frac{\langle x, 1\rangle}{\|P_0\|^2}\times 1 = x, \qquad (7.262)$$
$$P_2 = x^2 - \frac{\langle x^2, 1\rangle}{\|P_0\|^2}\times 1 - \frac{\langle x^2, x\rangle}{\|P_1\|^2}\times x = x^2 - \frac{1}{2}, \qquad (7.263)$$
$$P_3 = x^3 - \frac{\langle x^3, 1\rangle}{\|P_0\|^2}\times 1 - \frac{\langle x^3, x\rangle}{\|P_1\|^2}\times x - \frac{\langle x^3, x^2 - \frac{1}{2}\rangle}{\|P_2\|^2}\times\left(x^2 - \frac{1}{2}\right) = x^3 - \frac{3}{4}x, \; \ldots \qquad (7.264)$$
7.13.3 Jacobi Polynomials
In the case of the Jacobi polynomials,¹² $a = -1$, $b = 1$, $\rho(x) = (1 - x)^\alpha(1 + x)^\beta$, $\alpha > -1$, $\beta > -1$, $\alpha$, $\beta$ integers.
We observe that we obtain various sequences of orthogonal polynomials, depending on the choice of the parameters $\alpha$ and $\beta$. If $\alpha = \beta = 0$, then we get Legendre's polynomials.
Let us take $\alpha = \beta = 1$. We have
$$P_0 = 1, \qquad (7.265)$$
$$P_1 = x - \frac{\langle x, 1\rangle}{\|P_0\|^2}\times 1 = x, \qquad (7.266)$$
$$P_2 = x^2 - \frac{\langle x^2, 1\rangle}{\|P_0\|^2}\times 1 - \frac{\langle x^2, x\rangle}{\|P_1\|^2}\times x = x^2 - \frac{1}{5}, \qquad (7.267)$$
$$P_3 = x^3 - \frac{\langle x^3, 1\rangle}{\|P_0\|^2}\times 1 - \frac{\langle x^3, x\rangle}{\|P_1\|^2}\times x - \frac{\langle x^3, x^2 - \frac{1}{5}\rangle}{\|P_2\|^2}\times\left(x^2 - \frac{1}{5}\right) = x^3 - \frac{3}{7}x, \; \ldots \qquad (7.268)$$
7.13.4 Hermite Polynomials
In the case of the Hermite polynomials¹³ we have $a = -\infty$, $b = \infty$, $\rho(x) = \exp(-x^2)$.
We may write
$$I_k = \int_{-\infty}^{\infty}x^ke^{-x^2}\mathrm{d}x = \left.\left(-\frac{x^{k-1}}{2}e^{-x^2}\right)\right|_{-\infty}^{\infty} + \frac{k - 1}{2}\int_{-\infty}^{\infty}x^{k-2}e^{-x^2}\mathrm{d}x = \frac{k - 1}{2}I_{k-2}. \qquad (7.269)$$
On the other hand,
$$I_0 = \int_{-\infty}^{\infty}e^{-x^2}\mathrm{d}x = \sqrt{\pi}, \qquad (7.270)$$
$$I_1 = \int_{-\infty}^{\infty}xe^{-x^2}\mathrm{d}x = 0; \qquad (7.271)$$

¹²These polynomials were introduced by Carl Gustav Jacob Jacobi (1804–1851).
¹³These polynomials were named in honor of Charles Hermite (1822–1901), who studied them in Sur un nouveau développement en série de fonctions in 1864. They were also studied by Pierre-Simon Laplace (1749–1827) in a memoir from 1810 and by Chebyshev in Sur le développement des fonctions à une seule variable in 1859.
hence, it follows that
$$I_{2p+1} = 0, \quad p \in \mathbb{N}, \qquad (7.272)$$
$$I_0 = \sqrt{\pi}, \quad I_2 = \frac{1}{2}I_0 = \frac{\sqrt{\pi}}{2}, \quad I_4 = \frac{3}{4}\sqrt{\pi}, \; \ldots, \; I_{2p} = \frac{2p - 1}{2}I_{2p-2} = \frac{(2p - 1)!!}{2^p}\sqrt{\pi}, \; \ldots \qquad (7.273)$$
We obtain the Hermite polynomials
$$P_0 = 1, \qquad (7.274)$$
$$P_1 = x - \frac{\langle x, 1\rangle}{\|P_0\|^2}\times 1 = x, \qquad (7.275)$$
$$P_2 = x^2 - \frac{\langle x^2, 1\rangle}{\|P_0\|^2}\times 1 - \frac{\langle x^2, x\rangle}{\|P_1\|^2}\times x = x^2 - \frac{1}{2}, \qquad (7.276)$$
$$P_3 = x^3 - \frac{\langle x^3, 1\rangle}{\|P_0\|^2}\times 1 - \frac{\langle x^3, x\rangle}{\|P_1\|^2}\times x - \frac{\langle x^3, x^2 - \frac{1}{2}\rangle}{\|P_2\|^2}\times\left(x^2 - \frac{1}{2}\right) = x^3 - \frac{3}{2}x, \; \ldots \qquad (7.277)$$
7.13.5 Laguerre Polynomials
The Laguerre polynomials¹⁴ are defined by $a = 0$, $b = \infty$, $\rho(x) = e^{-x}x^\alpha$, $\alpha$ integer.
Obviously, we obtain various sequences of Laguerre polynomials as functions of the exponent $\alpha$. We consider the case $\alpha = 1$.
Taking into account that
$$I_k = \int_0^{\infty}x^k\,xe^{-x}\mathrm{d}x = \left.\left(-x^{k+1}e^{-x}\right)\right|_0^{\infty} + (k + 1)\int_0^{\infty}x^{k-1}\,xe^{-x}\mathrm{d}x = (k + 1)I_{k-1}, \qquad (7.278)$$
$$I_0 = \int_0^{\infty}xe^{-x}\mathrm{d}x = \left.\left(-xe^{-x}\right)\right|_0^{\infty} + \int_0^{\infty}e^{-x}\mathrm{d}x = 1, \qquad (7.279)$$
we get
$$I_1 = 2I_0 = 2, \quad I_2 = 3I_1 = 6, \; \ldots, \; I_k = (k + 1)I_{k-1} = (k + 1)!. \qquad (7.280)$$
We thus obtain Laguerre's polynomials
$$P_0 = 1, \qquad (7.281)$$
$$P_1 = x - \frac{\langle x, 1\rangle}{\|P_0\|^2}\times 1 = x - 2, \qquad (7.282)$$
$$P_2 = x^2 - \frac{\langle x^2, 1\rangle}{\|P_0\|^2}\times 1 - \frac{\langle x^2, x - 2\rangle}{\|P_1\|^2}\times(x - 2) = x^2 - 6x + 6, \qquad (7.283)$$
$$P_3 = x^3 - \frac{\langle x^3, 1\rangle}{\|P_0\|^2}\times 1 - \frac{\langle x^3, x - 2\rangle}{\|P_1\|^2}\times(x - 2) - \frac{\langle x^3, x^2 - 6x + 6\rangle}{\|P_2\|^2}\times(x^2 - 6x + 6) = x^3 - 12x^2 + 36x - 24, \; \ldots \qquad (7.284)$$

¹⁴They are named after Edmond Nicolas Laguerre (1834–1886), who studied them in 1879.
7.13.6 General Properties of the Orthogonal Polynomials
Let us begin with a remark.

Observation 7.17
(i) The roots $\lambda_1, \lambda_2, \ldots, \lambda_n$ of the polynomial $P_n$ given by formulae (7.248) satisfy the relation
$$\lambda_k = \frac{\langle xQ_k, Q_k\rangle}{\|Q_k\|^2}, \quad k = 1, 2, \ldots, n, \qquad (7.285)$$
in which
$$Q_k(x) = \prod_{\substack{l=1 \\ l \ne k}}^{n}(x - \lambda_l), \quad k = 1, 2, \ldots, n. \qquad (7.286)$$
Indeed, if $\lambda_k$ is a root of $P_n(x)$, then
$$P_n(x) = (x - \lambda_k)Q_k(x); \qquad (7.287)$$
from the orthogonality condition
$$\langle Q_k, P_n\rangle = 0 \qquad (7.288)$$
we get
$$0 = \langle Q_k, P_n\rangle = \langle Q_k, xQ_k\rangle - \langle Q_k, \lambda_kQ_k\rangle, \qquad (7.289)$$
that is, a relation equivalent to equation (7.285).
(ii) The scalar product defined by relation (7.247) has the property of symmetry, that is, we have the relation
$$\langle xP, Q\rangle = \langle P, xQ\rangle. \qquad (7.290)$$
Proposition 7.5 If the scalar product (7.247) satisfies the symmetry condition (7.290), then the polynomials $P_0, P_1, \ldots, P_m$ of (7.248) verify the relations
$$P_0(x) = 1, \quad P_1(x) = x - \alpha_0, \; \ldots, \; P_{i+1}(x) = (x - \alpha_i)P_i(x) - \beta_iP_{i-1}(x), \quad i = 1, 2, \ldots, m-1, \qquad (7.291)$$
where
$$\alpha_i = \frac{\langle xP_i, P_i\rangle}{\|P_i\|^2}, \quad i = 1, 2, \ldots, m-1, \qquad (7.292)$$
$$\beta_i = \frac{\|P_i\|^2}{\|P_{i-1}\|^2}, \quad i = 1, 2, \ldots, m-1. \qquad (7.293)$$
Demonstration. The first relations (7.291) result directly from formulae (7.248).
Let now $m \ge 2$ and, for any $i = 1, 2, \ldots, m-1$, let us consider
$$Q_{i+1}(x) = (x - \alpha_i)P_i(x) - \beta_iP_{i-1}(x), \qquad (7.294)$$
with $P_{i-1}$ and $P_i$ given by relation (7.248).
Because $P_{i-1}$ and $P_i$ are orthogonal, we get
$$\langle Q_{i+1}, P_i\rangle = \langle xP_i, P_i\rangle - \alpha_i\|P_i\|^2 = 0. \qquad (7.295)$$
Moreover,
$$\langle Q_{i+1}, P_{i-1}\rangle = \langle xP_i, P_{i-1}\rangle - \beta_i\|P_{i-1}\|^2 = \langle P_i, xP_{i-1} - P_i\rangle = 0, \qquad (7.296)$$
because $x^i$ does not appear in the difference $xP_{i-1} - P_i$, while $P_i$ is orthogonal to the polynomials $1, x, x^2, \ldots, x^{i-1}$.
On the other hand, for any $k = 0, 1, \ldots, i-2$, the polynomial $P_i$ is orthogonal to the polynomials $P_k$ and $xP_k$; hence,
$$\langle Q_{i+1}, P_k\rangle = \langle P_i, xP_k\rangle - \alpha_i\langle P_i, P_k\rangle - \beta_i\langle P_{i-1}, P_k\rangle = 0. \qquad (7.297)$$
We thus deduce that the polynomial $Q_{i+1}$ is orthogonal to all the polynomials of degree strictly less than $i + 1$ and is of the form
$$Q_{i+1}(x) = x^{i+1} + R(x), \qquad (7.298)$$
where the degree of $R$ is at most equal to $i$.
On the other hand, the polynomials $P_0, P_1, \ldots, P_i$ form an orthogonal basis for the space of polynomials of degree at most equal to $i$, so that $R(x)$ may be written in the form
$$R(x) = \sum_{k=0}^{i}\frac{\langle R, P_k\rangle}{\|P_k\|^2}P_k. \qquad (7.299)$$
From the relation
$$\langle x^{i+1}, P_k\rangle + \langle R, P_k\rangle = \langle Q_{i+1}, P_k\rangle = 0, \quad k = 0, 1, \ldots, i, \qquad (7.300)$$
we deduce
$$\langle R, P_k\rangle = -\langle x^{i+1}, P_k\rangle; \qquad (7.301)$$
hence, $Q_{i+1} = P_{i+1}$, the proposition being stated.
Theorem 7.2 If the scalar product (7.247) satisfies the symmetry condition (7.290), then the roots $\lambda_1, \lambda_2, \ldots, \lambda_n$ of the polynomial $P_n$ constructed with relation (7.248) are real, distinct, and verify the relations
$$\lambda_i = \frac{\langle xL_i, L_i\rangle}{\|L_i\|^2}, \quad i = 1, 2, \ldots, n, \qquad (7.302)$$
in which
$$L_i(x) = \prod_{\substack{k=1 \\ k \ne i}}^{n}\frac{x - \lambda_k}{\lambda_i - \lambda_k}. \qquad (7.303)$$
Demonstration. Because
$$\langle xQ_j, Q_j\rangle = \langle Q_j, xQ_j\rangle = \overline{\langle xQ_j, Q_j\rangle}, \qquad (7.304)$$
where the upper bar marks the complex conjugate, and taking into account Proposition 7.5, we deduce that the roots are real and distinct.
Indeed, if the coefficients of the polynomial $P_n$ are real numbers, then the complex roots of this polynomial are conjugate two by two, which means that the polynomial $P_n$ may be written in the form
$$P_n(x) = [(x - a)^2 + b^2]R(x), \qquad (7.305)$$
where $a$ and $b$ are real numbers, while $R$ is a polynomial with real coefficients of degree $n - 2$. We may write successively
$$0 = \langle P_n, R\rangle = \langle[(x - a)^2 + b^2]R, R\rangle = \langle(x - a)^2R, R\rangle + b^2\langle R, R\rangle = \|(x - a)R\|^2 + b^2\|R\|^2 > 0, \qquad (7.306)$$
which is absurd.
If the polynomial $P_n$ had a multiple real root $a$, then
$$P_n(x) = (x - a)^2R(x), \qquad (7.307)$$
where $R$ is a polynomial of degree $n - 2$, which may have $a$ as a root. We have
$$0 = \langle P_n, R\rangle = \langle(x - a)^2R, R\rangle = \|(x - a)R\|^2 > 0, \qquad (7.308)$$
obtaining again a contradiction.
Formula (7.302) is a consequence of Proposition 7.5.
7.14 QUADRATURE FORMULAE OF GAUSS TYPE OBTAINED
BY ORTHOGONAL POLYNOMIALS
In the previous section we calculated various orthogonal polynomials up to the third degree.
Let P be such a polynomial of degree 3, and denote by x1, x2, and x3 its real and distinct roots.
We search for a quadrature formula of the form

\[ \int_a^b f(x)\,dx \approx A_1 f(x_1) + A_2 f(x_2) + A_3 f(x_3), \tag{7.309} \]

where A1, A2, and A3 are constants; the formula should be exact for polynomials of the maximum possible degree.
We have

\[ \int_a^b dx = b - a, \qquad \int_a^b x\,dx = \frac{b^2 - a^2}{2}, \qquad \int_a^b x^2\,dx = \frac{b^3 - a^3}{3}, \tag{7.310} \]

obtaining thus a linear system of three equations with three unknowns,

\[ A_1 + A_2 + A_3 = b - a, \quad A_1 x_1 + A_2 x_2 + A_3 x_3 = \frac{b^2 - a^2}{2}, \quad A_1 x_1^2 + A_2 x_2^2 + A_3 x_3^2 = \frac{b^3 - a^3}{3}. \tag{7.311} \]

We deduce the values A1, A2, and A3 from system (7.311).
Obviously, if we wish to have a quadrature formula at n points, then we consider the polynomial Pn with the roots x1, x2, . . . , xn; it follows the system

\[ A_1 + A_2 + \cdots + A_n = b - a, \quad A_1 x_1 + A_2 x_2 + \cdots + A_n x_n = \frac{b^2 - a^2}{2}, \quad \ldots, \quad A_1 x_1^{n-1} + A_2 x_2^{n-1} + \cdots + A_n x_n^{n-1} = \frac{b^n - a^n}{n}. \tag{7.312} \]
7.14.1 Gauss–Jacobi Quadrature Formulae
The Jacobi polynomial of second degree (the case α = β = 1) is given by

\[ P_2(x) = x^2 - \frac{1}{5}; \tag{7.313} \]

it has the roots

\[ x_1 = -\frac{1}{\sqrt{5}}, \qquad x_2 = \frac{1}{\sqrt{5}}, \tag{7.314} \]

and it follows the system

\[ A_1 + A_2 = 2, \qquad -\frac{A_1}{\sqrt{5}} + \frac{A_2}{\sqrt{5}} = 0, \tag{7.315} \]

with the solution A1 = A2 = 1. We obtain the Gauss–Jacobi quadrature formula

\[ \int_{-1}^{1} f(x)\,dx \approx f\left(-\frac{1}{\sqrt{5}}\right) + f\left(\frac{1}{\sqrt{5}}\right). \tag{7.316} \]
Considering now the Jacobi polynomial of third degree (the case α = β = 1),

\[ P_3(x) = x^3 - \frac{3}{7}x, \tag{7.317} \]

we obtain the roots

\[ x_1 = -\sqrt{\frac{3}{7}}, \qquad x_2 = 0, \qquad x_3 = \sqrt{\frac{3}{7}} \tag{7.318} \]

and the system

\[ A_1 + A_2 + A_3 = 2, \qquad -A_1\sqrt{\frac{3}{7}} + A_3\sqrt{\frac{3}{7}} = 0, \qquad \frac{3}{7}A_1 + \frac{3}{7}A_3 = \frac{2}{3}, \tag{7.319} \]

with the solution

\[ A_1 = \frac{7}{9}, \qquad A_2 = \frac{4}{9}, \qquad A_3 = \frac{7}{9}. \tag{7.320} \]

We thus obtain the Gauss–Jacobi quadrature formula

\[ \int_{-1}^{1} f(x)\,dx \approx \frac{7}{9}f\left(-\sqrt{\frac{3}{7}}\right) + \frac{4}{9}f(0) + \frac{7}{9}f\left(\sqrt{\frac{3}{7}}\right). \tag{7.321} \]
7.14.2 Gauss–Hermite Quadrature Formulae
A formula of the form

\[ \int_{-\infty}^{\infty} e^{-x^2} f(x)\,dx \approx \sum_{i=1}^{n} A_i f(x_i) \tag{7.322} \]

is searched for; it should be exact for f a polynomial of the maximum possible degree.
The Hermite polynomial P1(x) = x has the root x1 = 0, so that formula (7.322) becomes

\[ \int_{-\infty}^{\infty} e^{-x^2} f(x)\,dx \approx A_1 f(0). \tag{7.323} \]
Choosing f(x) = 1, we obtain

\[ \int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi} = A_1 \tag{7.324} \]

and the first Gauss–Hermite quadrature formula reads

\[ \int_{-\infty}^{\infty} e^{-x^2} f(x)\,dx \approx \sqrt{\pi}\,f(0). \tag{7.325} \]
Let us now consider the Hermite polynomial P2(x) = x² − 1/2, with the roots

\[ x_1 = -\frac{1}{\sqrt{2}}, \qquad x_2 = \frac{1}{\sqrt{2}}; \tag{7.326} \]

the quadrature formula is now of the form

\[ \int_{-\infty}^{\infty} e^{-x^2} f(x)\,dx \approx A_1 f\left(-\frac{1}{\sqrt{2}}\right) + A_2 f\left(\frac{1}{\sqrt{2}}\right). \tag{7.327} \]

Taking f(x) = 1 and f(x) = x, we obtain the linear algebraic system

\[ \int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi} = A_1 + A_2, \qquad \int_{-\infty}^{\infty} x e^{-x^2}\,dx = 0 = -\frac{A_1}{\sqrt{2}} + \frac{A_2}{\sqrt{2}}, \tag{7.328} \]

with the solution

\[ A_1 = A_2 = \frac{\sqrt{\pi}}{2}; \tag{7.329} \]

it follows the second Gauss–Hermite quadrature formula

\[ \int_{-\infty}^{\infty} e^{-x^2} f(x)\,dx \approx \frac{\sqrt{\pi}}{2} f\left(-\frac{1}{\sqrt{2}}\right) + \frac{\sqrt{\pi}}{2} f\left(\frac{1}{\sqrt{2}}\right). \tag{7.330} \]
For a Gauss–Hermite quadrature formula in three points, one starts from the Hermite polynomial P3(x) = x³ − 3x/2, the roots of which are

\[ x_1 = -\sqrt{\frac{3}{2}}, \qquad x_2 = 0, \qquad x_3 = \sqrt{\frac{3}{2}}. \tag{7.331} \]

From

\[ \int_{-\infty}^{\infty} e^{-x^2} f(x)\,dx \approx A_1 f(x_1) + A_2 f(x_2) + A_3 f(x_3), \tag{7.332} \]

choosing f(x) = 1, f(x) = x, and f(x) = x², we obtain the linear algebraic system

\[ \int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi} = A_1 + A_2 + A_3, \quad \int_{-\infty}^{\infty} x e^{-x^2}\,dx = 0 = -A_1\sqrt{\frac{3}{2}} + A_3\sqrt{\frac{3}{2}}, \quad \int_{-\infty}^{\infty} x^2 e^{-x^2}\,dx = \frac{\sqrt{\pi}}{2} = \frac{3}{2}A_1 + \frac{3}{2}A_3, \tag{7.333} \]

with the solution

\[ A_1 = \frac{\sqrt{\pi}}{6}, \qquad A_2 = \frac{2\sqrt{\pi}}{3}, \qquad A_3 = \frac{\sqrt{\pi}}{6}; \tag{7.334} \]

there thus results the Gauss–Hermite quadrature formula

\[ \int_{-\infty}^{\infty} e^{-x^2} f(x)\,dx \approx \frac{\sqrt{\pi}}{6} f\left(-\sqrt{\frac{3}{2}}\right) + \frac{2\sqrt{\pi}}{3} f(0) + \frac{\sqrt{\pi}}{6} f\left(\sqrt{\frac{3}{2}}\right). \tag{7.335} \]
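As a quick check of (7.335) (a hedged sketch of ours, not from the text, with Python assumed), the rule must integrate e^{−x²}x⁴ exactly, being exact up to degree 5; the closed-form value is 3√π/4:

```python
import math

def gauss_hermite_3(f):
    """Three-point Gauss-Hermite rule (7.335) for integrals of e^(-x^2) f(x)."""
    r = math.sqrt(1.5)
    w = math.sqrt(math.pi)
    return (w / 6.0) * (f(-r) + f(r)) + (2.0 * w / 3.0) * f(0.0)

print(gauss_hermite_3(lambda x: x**4))   # 1.3293403...
print(3.0 * math.sqrt(math.pi) / 4.0)    # the exact value, equal up to rounding
```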
7.14.3 Gauss–Laguerre Quadrature Formulae
We take quadrature formulae of the form (for α = 1)

\[ \int_{0}^{\infty} x e^{-x} f(x)\,dx \approx \sum_{i=1}^{n} A_i f(x_i). \tag{7.336} \]
For the Laguerre polynomial P1(x) = x − 2 we find the root x1 = 2, and formula (7.336) becomes

\[ \int_{0}^{\infty} x e^{-x} f(x)\,dx \approx A_1 f(2). \tag{7.337} \]

Choosing f(x) = 1, it follows the equation

\[ \int_{0}^{\infty} x e^{-x}\,dx = 1 = A_1; \tag{7.338} \]

thus we obtain the Gauss–Laguerre quadrature formula

\[ \int_{0}^{\infty} x e^{-x} f(x)\,dx \approx f(2). \tag{7.339} \]
In the case of the Laguerre polynomial P2(x) = x² − 6x + 6, the roots being

\[ x_1 = 3 - \sqrt{3}, \qquad x_2 = 3 + \sqrt{3}, \tag{7.340} \]

we obtain the relation

\[ \int_{0}^{\infty} x e^{-x} f(x)\,dx \approx A_1 f(3 - \sqrt{3}) + A_2 f(3 + \sqrt{3}). \tag{7.341} \]

Taking now f(x) = 1 and f(x) = x, it follows the linear algebraic system

\[ \int_{0}^{\infty} x e^{-x}\,dx = 1 = A_1 + A_2, \qquad \int_{0}^{\infty} x^2 e^{-x}\,dx = 2 = A_1(3 - \sqrt{3}) + A_2(3 + \sqrt{3}), \tag{7.342} \]

with the solution

\[ A_1 = \frac{1 + \sqrt{3}}{2\sqrt{3}}, \qquad A_2 = \frac{\sqrt{3} - 1}{2\sqrt{3}}. \tag{7.343} \]

We obtain the Gauss–Laguerre quadrature formula

\[ \int_{0}^{\infty} x e^{-x} f(x)\,dx \approx \frac{1 + \sqrt{3}}{2\sqrt{3}} f(3 - \sqrt{3}) + \frac{\sqrt{3} - 1}{2\sqrt{3}} f(3 + \sqrt{3}). \tag{7.344} \]
Let now the Laguerre polynomial be P3(x) = x³ − 12x² + 36x − 24, the roots of which are

\[ x_1 \approx 0.9358, \qquad x_2 \approx 3.3050, \qquad x_3 \approx 7.7598. \tag{7.345} \]

The quadrature formula reads

\[ \int_{0}^{\infty} x e^{-x} f(x)\,dx \approx A_1 f(x_1) + A_2 f(x_2) + A_3 f(x_3). \tag{7.346} \]

Choosing f(x) = 1, f(x) = x, and f(x) = x², it follows the linear algebraic system

\[ \int_{0}^{\infty} x e^{-x}\,dx = 1 = A_1 + A_2 + A_3, \quad \int_{0}^{\infty} x^2 e^{-x}\,dx = 2 = A_1 x_1 + A_2 x_2 + A_3 x_3, \quad \int_{0}^{\infty} x^3 e^{-x}\,dx = 6 = A_1 x_1^2 + A_2 x_2^2 + A_3 x_3^2, \tag{7.347} \]

from which we obtain the values

\[ A_1 = 0.589, \qquad A_2 = 0.391, \qquad A_3 = 0.020. \tag{7.348} \]

The Gauss–Laguerre quadrature formula reads

\[ \int_{0}^{\infty} x e^{-x} f(x)\,dx \approx 0.589 f(0.9358) + 0.391 f(3.3050) + 0.020 f(7.7598). \tag{7.349} \]
7.15 OTHER QUADRATURE FORMULAE
7.15.1 Gauss Formulae with Imposed Points
We now present the theory in the case in which a point of division is imposed, so that

\[ \int_{-1}^{1} f(x)\,dx = C_0 f(x_0) + \sum_{i=1}^{n} C_i f(x_i), \tag{7.350} \]

where the division point x0 is the imposed one.
Let us remark that 2n + 1 parameters remain to be determined, that is, the points xi, i = 1, n, and the coefficients C0, C1, . . . , Cn.
Proceeding as in the Gauss method, we have

\[ \int_{-1}^{1} dx = 2 = C_0 + \sum_{i=1}^{n} C_i, \qquad \int_{-1}^{1} x\,dx = 0 = C_0 x_0 + \sum_{i=1}^{n} C_i x_i, \qquad \int_{-1}^{1} x^2\,dx = \frac{2}{3} = C_0 x_0^2 + \sum_{i=1}^{n} C_i x_i^2, \quad \ldots, \]
\[ \int_{-1}^{1} x^{2n-1}\,dx = 0 = C_0 x_0^{2n-1} + \sum_{i=1}^{n} C_i x_i^{2n-1}, \qquad \int_{-1}^{1} x^{2n}\,dx = \frac{2}{2n+1} = C_0 x_0^{2n} + \sum_{i=1}^{n} C_i x_i^{2n}. \tag{7.351} \]
Multiplying each such relation (except the last one) by x0 and subtracting it from the following one, we obtain

\[ -2x_0 = \sum_{i=1}^{n} C_i (x_i - x_0) = \sum_{i=1}^{n} C_i x_i - x_0 \sum_{i=1}^{n} C_i, \]
\[ \frac{2}{3} = \sum_{i=1}^{n} C_i x_i (x_i - x_0) = \sum_{i=1}^{n} C_i x_i^2 - x_0 \sum_{i=1}^{n} C_i x_i, \]
\[ -\frac{2}{3}x_0 = \sum_{i=1}^{n} C_i x_i^2 (x_i - x_0) = \sum_{i=1}^{n} C_i x_i^3 - x_0 \sum_{i=1}^{n} C_i x_i^2, \quad \ldots, \]
\[ \frac{2}{2n+1} = \sum_{i=1}^{n} C_i x_i^{2n-1} (x_i - x_0) = \sum_{i=1}^{n} C_i x_i^{2n} - x_0 \sum_{i=1}^{n} C_i x_i^{2n-1}. \tag{7.352} \]

From the first relation (7.352), we find

\[ \sum_{i=1}^{n} C_i x_i = -2x_0 + x_0 \sum_{i=1}^{n} C_i, \tag{7.353} \]

which, replaced in the second relation (7.352), leads to

\[ \sum_{i=1}^{n} C_i x_i^2 = \frac{2}{3} - 2x_0^2 + x_0^2 \sum_{i=1}^{n} C_i. \tag{7.354} \]
Step by step, we deduce

\[ \sum_{i=1}^{n} C_i x_i^k = P_k(x_0) + x_0^k \sum_{i=1}^{n} C_i, \tag{7.355} \]

where Pk is a polynomial of degree at most k.
On the other hand, from the first relation (7.351), we obtain

\[ \sum_{i=1}^{n} C_i = 2 - C_0, \tag{7.356} \]

so that expression (7.355) becomes

\[ \sum_{i=1}^{n} C_i x_i^k = P_k(x_0) + (2 - C_0) x_0^k. \tag{7.357} \]

The problem has thus been reduced to a Gauss quadrature in which the sums Σ_{i=1}^{n} C_i x_i^k are no longer equated to ∫_{−1}^{1} x^k dx, but to the expressions on the right-hand side of relation (7.357).
We find the same interpolation knots, but the constants C0, C1, . . . , Cn are now different.
The case in which several points are imposed is discussed similarly.
7.15.2 Gauss Formulae in which the Derivatives of the Function Also Appear
A formula in which the derivatives of the function also appear is of the form

\[ \int_{-1}^{1} f(x)\,dx = C_1 f(x_1) + \cdots + C_p f(x_p) + D_1 f'(y_1) + \cdots + D_r f'(y_r) + E_1 f''(z_1) + \cdots + E_s f''(z_s) + \cdots, \tag{7.358} \]

where the points yi and zi at which the derivatives are evaluated need not coincide with the xi.
Such a relation may or may not have certain imposed points, but we must be careful, because the resulting system may have no solution.
As a first example, let us consider a Gauss formula of the form

\[ \int_{-1}^{1} f(x)\,dx = C f(y) + D f'(y), \tag{7.359} \]

where the unknowns are C, D, and y. We have

\[ \int_{-1}^{1} dx = 2 = C, \qquad \int_{-1}^{1} x\,dx = 0 = Cy + D, \qquad \int_{-1}^{1} x^2\,dx = \frac{2}{3} = Cy^2 + 2Dy. \tag{7.360} \]

From the first relation (7.360) it follows that C = 2, and from the second one we get D = −Cy = −2y, which, replaced in the last expression (7.360), leads to

\[ \frac{2}{3} = 2y^2 - 4y^2 = -2y^2 \le 0, \tag{7.361} \]

which is absurd; hence, such a Gauss formula cannot exist.
Let us now search for a Gauss formula of the form

\[ \int_{-1}^{1} f(x)\,dx = C f(-1) + D f(1) + E f'(y), \tag{7.362} \]

in which the unknowns are C, D, E, and y. We have

\[ \int_{-1}^{1} dx = 2 = C + D, \qquad \int_{-1}^{1} x\,dx = 0 = -C + D + E, \]
\[ \int_{-1}^{1} x^2\,dx = \frac{2}{3} = C + D + 2Ey, \qquad \int_{-1}^{1} x^3\,dx = 0 = -C + D + 3Ey^2. \tag{7.363} \]

It follows successively that

\[ C = 2 - D, \qquad E = C - D = 2 - 2D, \tag{7.364} \]
\[ \frac{2}{3} = 2 + 2(2 - 2D)y, \qquad 2D - 2 + 3(2 - 2D)y^2 = 0, \tag{7.365} \]

from which

\[ y = \frac{1}{3(D-1)}, \qquad y^2 = \frac{1}{9(D-1)^2}, \tag{7.366} \]
\[ y^2 = \frac{2 - 2D}{3(2 - 2D)} = \frac{1}{3}. \tag{7.367} \]

For y = 1/√3, relation (7.366) gives D − 1 = 1/(3y) = 1/√3; hence

\[ D = 1 + \frac{1}{\sqrt{3}}, \qquad E = 2 - 2D = -\frac{2}{\sqrt{3}}, \qquad C = 2 - D = 1 - \frac{1}{\sqrt{3}}, \tag{7.368} \]

as well as the quadrature formula

\[ \int_{-1}^{1} f(x)\,dx = \left(1 - \frac{1}{\sqrt{3}}\right) f(-1) + \left(1 + \frac{1}{\sqrt{3}}\right) f(1) - \frac{2}{\sqrt{3}} f'\left(\frac{1}{\sqrt{3}}\right). \tag{7.371} \]

For y = −1/√3 we obtain, in the same way, D = 1 − 1/√3, E = 2/√3, C = 1 + 1/√3, and the formula reads

\[ \int_{-1}^{1} f(x)\,dx = \left(1 + \frac{1}{\sqrt{3}}\right) f(-1) + \left(1 - \frac{1}{\sqrt{3}}\right) f(1) + \frac{2}{\sqrt{3}} f'\left(-\frac{1}{\sqrt{3}}\right). \tag{7.374} \]
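A short check (an illustrative sketch of ours, Python assumed) confirms that formula (7.371) integrates the monomials 1, x, x², x³ exactly, as its four parameters were chosen to guarantee:

```python
import math

s = 1.0 / math.sqrt(3.0)

def quad_371(f, fp):
    """Formula (7.371): f at the end points -1 and 1, plus f' at 1/sqrt(3)."""
    return (1.0 - s) * f(-1.0) + (1.0 + s) * f(1.0) - 2.0 * s * fp(s)

for k in range(4):
    exact = (1.0 - (-1.0)**(k + 1)) / (k + 1)   # integral of x^k over [-1, 1]
    approx = quad_371(lambda x: x**k, lambda x: k * x**(k - 1) if k > 0 else 0.0)
    print(k, exact, approx)                      # the columns agree for k = 0..3
```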
7.16 CALCULATION OF IMPROPER INTEGRALS
We exemplify in this section the methods described in Section 7.4 for the calculation of improper integrals.
We consider the integral

\[ I = \int_{0}^{\infty} \frac{dx}{(x+2)\sqrt{x+1}}. \tag{7.375} \]

The integral may be written in the form

\[ I = \int_{0}^{\infty} \frac{x e^{-x}\,dx}{x e^{-x}(x+2)\sqrt{x+1}}; \tag{7.376} \]

we may apply the Gauss–Laguerre quadrature formula

\[ I \approx 0.589 f(0.9358) + 0.391 f(3.3050) + 0.020 f(7.7598), \tag{7.377} \]

where

\[ f(x) = \frac{e^x}{x(x+2)\sqrt{x+1}}. \tag{7.378} \]

It follows that

\[ f(0.9358) \approx 0.667, \qquad f(3.3050) \approx 0.749, \qquad f(7.7598) \approx 10.459, \tag{7.379} \]
\[ I \approx 0.895. \tag{7.380} \]
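The estimate (7.377)–(7.380) can be reproduced in a few lines; the sketch below (Python assumed, not the book's code) uses the rounded nodes (7.345) and weights (7.348):

```python
import math

def f(x):   # the integrand (7.378)
    return math.exp(x) / (x * (x + 2.0) * math.sqrt(x + 1.0))

nodes = (0.9358, 3.3050, 7.7598)   # roots (7.345)
weights = (0.589, 0.391, 0.020)    # coefficients (7.348)
print(sum(w * f(x) for w, x in zip(weights, nodes)))   # about 0.895
```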
By the change of variable

\[ x = u - 2, \qquad dx = du, \tag{7.381} \]

it follows that

\[ I = \int_{2}^{\infty} \frac{du}{u\sqrt{u-1}}. \tag{7.382} \]

By a new change of variable,

\[ u = \frac{1}{v}, \qquad du = -\frac{dv}{v^2}, \tag{7.383} \]

the integral takes the form

\[ I = \int_{1/2}^{0} \frac{-\frac{1}{v^2}\,dv}{\frac{1}{v}\sqrt{\frac{1}{v}-1}} = \int_{0}^{1/2} \frac{dv}{\sqrt{v(1-v)}}. \tag{7.384} \]

By a new change of variable,

\[ v = \frac{w+1}{4}, \qquad dv = \frac{dw}{4}, \tag{7.385} \]

it follows that

\[ I = \int_{-1}^{1} \frac{dw}{\sqrt{w+1}\sqrt{3-w}}. \tag{7.386} \]
We may apply the Gauss quadrature formula in three points, obtaining

\[ I \approx \frac{5}{9} f\left(-\sqrt{\frac{3}{5}}\right) + \frac{8}{9} f(0) + \frac{5}{9} f\left(\sqrt{\frac{3}{5}}\right), \tag{7.387} \]

where

\[ f(w) = \frac{1}{\sqrt{w+1}\sqrt{3-w}}, \tag{7.388} \]
\[ f\left(-\sqrt{\tfrac{3}{5}}\right) \approx 1.0841, \qquad f(0) \approx 0.5774, \qquad f\left(\sqrt{\tfrac{3}{5}}\right) \approx 0.5032, \tag{7.389} \]
\[ I \approx 1.3951. \tag{7.390} \]
If we wish to apply the Gauss–Jacobi quadrature formula in three points, we calculate successively

\[ f\left(-\sqrt{\tfrac{3}{7}}\right) \approx 0.8901, \qquad f(0) \approx 0.5774, \qquad f\left(\sqrt{\tfrac{3}{7}}\right) \approx 0.5076, \tag{7.391} \]
\[ I \approx \frac{7}{9} f\left(-\sqrt{\tfrac{3}{7}}\right) + \frac{4}{9} f(0) + \frac{7}{9} f\left(\sqrt{\tfrac{3}{7}}\right) \approx 1.3438. \tag{7.392} \]
Returning to form (7.382) of the integral, we observe that the asymptotic behavior of the function

\[ f(u) = \frac{1}{u\sqrt{u-1}} \tag{7.393} \]

is given by the function

\[ g(u) = \frac{1}{u\sqrt{u}} = u^{-3/2}. \tag{7.394} \]

Calculating (a > 0)

\[ \int_{a}^{\infty} g(u)\,du = \left. -2u^{-1/2} \right|_{a}^{\infty} = \frac{2}{\sqrt{a}}, \tag{7.395} \]

we observe that the integral (7.395) may be made as small as we wish by a convenient choice of a. For example, let a = 100. We may write

\[ \int_{2}^{\infty} \frac{du}{u\sqrt{u-1}} = \int_{2}^{100} \frac{du}{u\sqrt{u-1}} + \int_{100}^{\infty} \frac{du}{u\sqrt{u-1}} \approx \int_{2}^{100} \frac{du}{u\sqrt{u-1}} + \int_{100}^{\infty} \frac{du}{u^{3/2}} = 0.2 + \int_{2}^{100} \frac{du}{u\sqrt{u-1}}. \tag{7.396} \]

By the change of variable

\[ u = 49v + 51, \qquad du = 49\,dv, \tag{7.397} \]

the last integral in (7.396) becomes

\[ \int_{2}^{100} \frac{du}{u\sqrt{u-1}} = \int_{-1}^{1} \frac{49\,dv}{(49v+51)\sqrt{49v+50}}. \tag{7.398} \]
Applying the Gauss quadrature formula in three points to the last integral, with

\[ f(v) = \frac{49}{(49v+51)\sqrt{49v+50}}, \tag{7.399} \]
\[ f\left(-\sqrt{\tfrac{3}{5}}\right) \approx 1.0823, \qquad f(0) \approx 0.1359, \qquad f\left(\sqrt{\tfrac{3}{5}}\right) \approx 0.0587, \tag{7.400} \]

we get

\[ \int_{-1}^{1} \frac{49\,dv}{(49v+51)\sqrt{49v+50}} \approx 0.7547, \tag{7.401} \]
\[ I \approx 0.9547. \tag{7.402} \]

In form (7.384), this integral is easily calculated; it has the value

\[ I = \left. \arcsin(2v-1) \right|_{0}^{1/2} = \frac{\pi}{2} \approx 1.5708. \tag{7.403} \]

We remark that the values thus obtained are sensibly different from the exact value (7.403). The precision may be improved by using Gauss quadrature formulae with more points, but this leads to an increased computation time.
7.17 KANTOROVICH’S METHOD
The idea of this method15 consists in writing

\[ I = \int_{a}^{b} f(x)\,dx \tag{7.404} \]

in the form

\[ I = \int_{a}^{b} g(x)\,dx + \int_{a}^{b} (f(x) - g(x))\,dx, \tag{7.405} \]

where the first integral is calculated directly, while the second one is calculated by numerical formulae.
Let us return, as an example, to the integral of the preceding section, written in the form

\[ I = \int_{0}^{1/2} \frac{dx}{\sqrt{x}\sqrt{1-x}}. \tag{7.406} \]

The function

\[ f(x) = \frac{1}{\sqrt{x}\sqrt{1-x}} \tag{7.407} \]

is not defined for x = 0.
We expand into series the function

\[ \varphi(x) = \frac{1}{\sqrt{1-x}} \tag{7.408} \]

15The method was described by Leonid Vitalyevich Kantorovich (1912–1986).
around x = 0; it follows that

\[ \varphi(x) = 1 + \frac{1}{2}x + \frac{3}{8}x^2 + \frac{5}{16}x^3 + \frac{35}{128}x^4 + \cdots \tag{7.409} \]

We get

\[ I = \int_{0}^{1/2} x^{-1/2}\,dx + \frac{1}{2}\int_{0}^{1/2} x^{1/2}\,dx + \frac{3}{8}\int_{0}^{1/2} x^{3/2}\,dx + \frac{5}{16}\int_{0}^{1/2} x^{5/2}\,dx + \frac{35}{128}\int_{0}^{1/2} x^{7/2}\,dx + J = 1.5691585 + J, \tag{7.410} \]
where J is the integral

\[ J = \int_{0}^{1/2} \left[ \frac{1}{\sqrt{1-x}} - \left( 1 + \frac{1}{2}x + \frac{3}{8}x^2 + \frac{5}{16}x^3 + \frac{35}{128}x^4 \right) \right] dx. \tag{7.411} \]

This last integral is no longer an improper one and may be calculated as usual, for example, by the trapezoid formula with the step h = 0.1.
Denoting

\[ \psi(x) = \frac{1}{\sqrt{1-x}} - \left( 1 + \frac{1}{2}x + \frac{3}{8}x^2 + \frac{5}{16}x^3 + \frac{35}{128}x^4 \right), \tag{7.412} \]

we have

\[ \psi(0) = 0, \quad \psi(0.1) = 2.7\times10^{-6}, \quad \psi(0.2) = 9.65\times10^{-5}, \quad \psi(0.3) = 8.263\times10^{-4}, \quad \psi(0.4) = 0.0039944, \quad \psi(0.5) = 0.0143112, \tag{7.413} \]
\[ J \approx \frac{0.1}{2}\left[ \psi(0) + 2(\psi(0.1) + \psi(0.2) + \psi(0.3) + \psi(0.4)) + \psi(0.5) \right] = 0.001208. \tag{7.414} \]

It follows that

\[ I \approx 1.56916 + 0.00121 = 1.57037, \tag{7.415} \]

which is a value very close to the exact one, I = π/2.
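The whole computation (7.409)–(7.415) fits in a short script; the sketch below (Python assumed, an illustration of ours) reproduces both the closed-form part of (7.410) and the trapezoid value (7.414):

```python
import math

def phi5(x):   # truncated expansion (7.409) of (1 - x)**(-1/2)
    return 1 + x/2 + 3*x**2/8 + 5*x**3/16 + 35*x**4/128

def psi(x):    # the remainder (7.412), which vanishes to high order at x = 0
    return 1.0 / math.sqrt(1.0 - x) - phi5(x)

u = math.sqrt(0.5)   # term-by-term integrals of x**(-1/2) * phi5(x) on [0, 1/2]
closed = 2*u + u**3/3 + 3*u**5/20 + 5*u**7/56 + 35*u**9/576   # 1.5691585...

xs = [0.1 * i for i in range(6)]   # trapezoid knots 0, 0.1, ..., 0.5
J = 0.05 * (psi(xs[0]) + 2.0 * sum(psi(x) for x in xs[1:5]) + psi(xs[5]))
print(closed, J, closed + J)       # about 1.56916 + 0.00121 = 1.57037
```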
7.18 THE MONTE CARLO METHOD FOR CALCULATION OF
DEFINITE INTEGRALS
Hereafter we consider first the one-dimensional case and then generalize to the multidimensional case.
7.18.1 The One-Dimensional Case
Let us suppose that we must calculate the integral

\[ I = \int_{a}^{b} f(x)\,dx, \tag{7.416} \]

where a and b are two finite real numbers, a < b, while f is continuous and positive on [a, b].
With the change of variable

\[ x = a + (b-a)t, \qquad dx = (b-a)\,dt, \tag{7.417} \]
Figure 7.4 The Monte Carlo method in the one-dimensional case.
the integral I reads

\[ I = \int_{0}^{1} F(t)\,dt, \tag{7.418} \]

where

\[ F(t) = (b-a)\,f(a + (b-a)t). \tag{7.419} \]

Let

\[ M = \max_{t \in [0,1]} F(t), \tag{7.420} \]

so that the integral I may be put in the form

\[ I = M\int_{0}^{1} \frac{F(t)}{M}\,dt = M\int_{0}^{1} G(t)\,dt. \tag{7.421} \]

The function G is defined on the interval [0, 1] and takes values in the same interval; graphically, this is shown in Figure 7.4.
Denoting by A the hatched area in Figure 7.4, it follows that the integral I has the form

\[ I = MA. \tag{7.422} \]

Obviously, if the value M given by relation (7.420) is difficult to determine, then we may take a covering (upper-bound) value for M.
Hence, the determination of the value of the integral has been reduced to the determination of the area A. To do this, we generate pairs (x, y) of random numbers, uniformly distributed in the interval [0, 1], resulting in the points P1(x1, y1), P2(x2, y2), . . . , Pn(xn, yn). We initialize the integer variable s to 0. If the point Pi(xi, yi) lies in the interior of the hatched area (the case of the point P1 in Fig. 7.4), then the variable s is incremented by one unit; in the opposite case (the case of the point P2 in Fig. 7.4), the variable s remains unchanged. Finally, the area A is approximated by the formula

\[ A \approx \frac{s}{n}, \tag{7.423} \]

where n is the number of generated pairs. Obviously,

\[ A = \lim_{n \to \infty} \frac{s}{n}. \tag{7.424} \]
Observation 7.18
(i) If the function f changes sign in the interval [a, b], then we divide [a, b] into subintervals on which f keeps a constant sign and apply the described method on each such subinterval.
(ii) If F(t) is negative on the interval [0, 1], then we choose

\[ M = \min_{t \in [0,1]} F(t) \tag{7.425} \]

and it follows that G : [0, 1] → [0, 1]; the procedure may then be applied.
7.18.2 The Multidimensional Case
Let the function be

\[ y = f(x_1, x_2, \ldots, x_n), \tag{7.426} \]

continuous on the closed domain D of R^n, and the integral

\[ I = \int \cdots \int_{D} f(x_1, x_2, \ldots, x_n)\,dx_1\,dx_2 \cdots dx_n. \tag{7.427} \]

The domain D may be included in the n-dimensional hyperparallelepiped

\[ [a_1, b_1] \times [a_2, b_2] \times \cdots \times [a_n, b_n] \supseteq D. \tag{7.428} \]

We make the change of variables

\[ x_i = a_i + (b_i - a_i)\xi_i, \qquad i = \overline{1, n}, \tag{7.429} \]

from which

\[ \frac{D(x_1, x_2, \ldots, x_n)}{D(\xi_1, \xi_2, \ldots, \xi_n)} = \begin{vmatrix} b_1 - a_1 & 0 & \cdots & 0 \\ 0 & b_2 - a_2 & \cdots & 0 \\ \cdots & \cdots & \cdots & \cdots \\ 0 & 0 & \cdots & b_n - a_n \end{vmatrix} = \prod_{i=1}^{n} (b_i - a_i); \tag{7.430} \]

the integral I becomes

\[ I = \int \cdots \int_{E} F(\xi_1, \xi_2, \ldots, \xi_n)\,d\xi_1\,d\xi_2 \cdots d\xi_n, \tag{7.431} \]

where E marks the n-dimensional hypercube

\[ E = [0,1] \times [0,1] \times \cdots \times [0,1], \tag{7.432} \]

while

\[ F(\xi_1, \xi_2, \ldots, \xi_n) = \prod_{i=1}^{n} (b_i - a_i)\, f(a_1 + (b_1 - a_1)\xi_1, \ldots, a_n + (b_n - a_n)\xi_n). \tag{7.433} \]
We generate groups of n random numbers uniformly distributed in the interval [0, 1]. Let g = (g1, g2, . . . , gn) be such a group. The point P(g1, g2, . . . , gn) may fall in the interior of the transform of the domain D by the change of variables (7.429), in which case it is taken into consideration with the value F(g1, g2, . . . , gn). Let us denote by S the set of all points of this kind obtained after N generations of groups of uniformly distributed random numbers.
We define the value

\[ y_{\mathrm{med}} = \frac{1}{|S|} \sum_{g \in S} F(g), \tag{7.434} \]

where |S| is the cardinal number of S, F(g) = F(g1, g2, . . . , gn), while g = (g1, g2, . . . , gn) is a group of n uniformly distributed random numbers. For the integral I, it follows the approximate value

\[ I \approx \frac{1}{N} \sum_{g \in S} F(g). \tag{7.435} \]
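A sketch of the sample-mean estimate (7.435) in n dimensions follows (Python assumed; the quarter-disc domain and the integrand are our illustrative choices, not the book's):

```python
import random

def mc_mean_value(F, inside, n_dim, N=200_000, seed=1):
    """Estimate I by (1/N) * sum of F(g) over the groups g that fall inside D."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(N):
        g = [rng.random() for _ in range(n_dim)]
        if inside(g):
            total += F(g)
    return total / N          # equation (7.435)

# Example: area of the quarter disc x^2 + y^2 <= 1 inside [0, 1]^2, i.e. pi/4.
print(mc_mean_value(lambda g: 1.0, lambda g: g[0]**2 + g[1]**2 <= 1.0, 2))
```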
If the function F(ξ1, ξ2, . . . , ξn) is positive, then we may consider the integral (7.431) as defining the volume of a body in the (n + 1)-dimensional space given by

\[ 0 \le \xi_i \le 1, \quad i = \overline{1, n}, \qquad 0 \le y \le F(\boldsymbol{\xi}). \tag{7.436} \]

We may find a real positive number B for which 0 ≤ F(ξ) ≤ B. We introduce the variable

\[ \eta = \frac{y}{B}, \tag{7.437} \]

so that the integral I now becomes B times the volume of the body 0 ≤ η ≤ F(ξ)/B,

\[ I = B \int \cdots \int_{0 \le \eta \le F(\boldsymbol{\xi})/B} d\xi_1\,d\xi_2 \cdots d\xi_n\,d\eta, \tag{7.438} \]

a hypercylinder interior to the (n + 1)-dimensional hypercube E × [0, 1].
Now we also generate groups of uniformly distributed random numbers in the interval [0, 1], but in this case a group contains n + 1 numbers. If we denote by S the set of groups that define points in the interior of the hypercylinder, then

\[ I \approx B\frac{|S|}{N}, \tag{7.439} \]

where N is the number of generated groups.
where N is the number of generations of such groups.
Observation 7.19
(i) If the generation of a group of random numbers yields a point that lies on the frontier of the domain, then this point may be considered valid or, on the contrary, may be left out of consideration. Obviously, no matter how we treat such a point, passing to the limit for the number of generations N → ∞, we obtain the searched value of the integral.
(ii) The method supposes that we can determine whether a group g belongs to the set S or not. If the frontier of the domain D is described by complicated expressions, then the validation of a group g may take considerable time, so that the method can be quite slow.
7.19 NUMERICAL EXAMPLES
Example 7.1 Let us consider the function f : [0, 3] → R,

\[ f(x) = e^x(\sin x + \cos^2 x), \tag{7.440} \]

for which the values in the following table are known.

xi     yi = f(xi)
0      1
0.5    2.060204
1.2    3.530421
1.8    6.203722
2.3    11.865576
3.0    22.520007

We wish to determine approximations for the values of the derivatives f′(x) and f″(x) at the interior division knots and to compare these values with the exact ones.
The derivatives of the function f are given by

\[ f'(x) = e^x(\sin x + \cos x + \cos^2 x - \sin 2x), \tag{7.441} \]
\[ f''(x) = e^x(2\cos x + \cos^2 x - 2\sin 2x - 2\cos 2x). \tag{7.442} \]
For the knot x1 = 0.5 we have

\[ h = x_1 - x_0 = 0.5, \qquad h_1 = x_2 - x_1 = 0.7 = 1.4h, \tag{7.443} \]

and, with α = 1.4, it follows that

\[ f'(x_1) \approx \frac{1}{(\alpha+1)h}\bigl(f(x_2) - f(x_0)\bigr) = \frac{3.530421 - 1}{(1.4+1)\cdot 0.5} = 2.10868, \tag{7.444} \]
\[ f''(x_1) \approx \frac{2}{\alpha(\alpha+1)h}\bigl(f(x_2) + \alpha f(x_0) - (1+\alpha)f(x_1)\bigr) = \frac{2}{1.4 \times 2.4 \times 0.5}(3.530421 + 1.4 \times 1 - 2.4 \times 2.060204) = -0.01675. \tag{7.445} \]

The exact values are

\[ f'(x_1) = 2.11974, \qquad f''(x_1) = -0.39278. \tag{7.446} \]

The calculations are given in the following table.

xi     yi           approx. f′(xi)   exact f′(xi)   approx. f″(xi)   exact f″(xi)
0      1            —                2               —                1
0.5    2.060204     2.10868          2.11974         −0.01675         −0.39278
1.2    3.530421     3.18732          2.49087         2.53636          3.25332
1.8    6.203722     7.57741          7.50632         7.49259          13.76763
2.3    11.865576    13.59690         15.13127        3.24742          13.19643
3.0    22.520007    —                8.24769         —                −47.43018
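The approximate f′ column can be checked with a few lines implementing formula (7.444) at every interior knot (an illustrative Python sketch of ours):

```python
xs = [0.0, 0.5, 1.2, 1.8, 2.3, 3.0]
ys = [1.0, 2.060204, 3.530421, 6.203722, 11.865576, 22.520007]
for i in range(1, 5):
    h = xs[i] - xs[i - 1]
    alpha = (xs[i + 1] - xs[i]) / h
    d1 = (ys[i + 1] - ys[i - 1]) / ((alpha + 1.0) * h)   # formula (7.444)
    print(xs[i], round(d1, 5))   # 2.10868, 3.18732, 7.57741, 13.5969
```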
Example 7.2 Let f : [0, 4] → R,

\[ f(x) = \frac{\sin x}{1 + \cos^2 x}, \tag{7.447} \]

and the equidistant division knots x0 = 0, x1 = 1, x2 = 2, x3 = 3, x4 = 4.
Approximate values of the derivatives f′(xi), f″(xi), i = 1, 3, as well as of the derivatives f′(0.5), f″(0.4), f′(3.7), f″(3.73), are asked for.
We construct the table of finite differences.

xi   yi = f(xi)   Δyi         Δ²yi        Δ³yi        Δ⁴yi
0    0            0.651330    −0.527588   −0.299956   1.229384
1    0.651330     0.123742    −0.827544   0.929824
2    0.775072     −0.703802   0.102280
3    0.071270     −0.601522
4    −0.530252
If we use an expansion into a Taylor series, then we obtain the following results:

\[ f'(1) \approx \frac{f(2) - f(0)}{2} = 0.387536, \qquad f'(2) \approx \frac{f(3) - f(1)}{2} = -0.290030, \qquad f'(3) \approx \frac{f(4) - f(2)}{2} = -0.652662, \tag{7.448} \]
\[ f''(1) \approx \frac{f(0) + f(2) - 2f(1)}{1^2} = -0.527588, \qquad f''(2) \approx \frac{f(1) + f(3) - 2f(2)}{1^2} = -0.827544, \qquad f''(3) \approx \frac{f(2) + f(4) - 2f(3)}{1^2} = 0.102280. \tag{7.449} \]
The Newton forward and backward interpolation polynomials read

\[ P_1(q_1) = y_0 + \frac{q_1^{(1)}}{1!}\Delta y_0 + \frac{q_1^{(2)}}{2!}\Delta^2 y_0 + \frac{q_1^{(3)}}{3!}\Delta^3 y_0 + \frac{q_1^{(4)}}{4!}\Delta^4 y_0, \tag{7.450} \]
\[ P_2(q_2) = y_4 + \frac{q_2^{(1)}}{1!}\Delta y_3 + \frac{(q_2+1)^{(2)}}{2!}\Delta^2 y_2 + \frac{(q_2+2)^{(3)}}{3!}\Delta^3 y_1 + \frac{(q_2+3)^{(4)}}{4!}\Delta^4 y_0, \tag{7.451} \]

respectively, where

\[ q_1 = \frac{x - x_0}{h}, \qquad q_2 = \frac{x - x_n}{h}. \tag{7.452} \]
We deduce the following values:

• for x = 0.5:

\[ q_1 = \frac{0.5 - 0}{1} = 0.5, \tag{7.453} \]
\[ f'(0.5) \approx \frac{1}{1}\left[ \Delta y_0 + \frac{q_1^{(1)}}{1!}\Delta^2 y_0 + \frac{q_1^{(2)}}{2!}\Delta^3 y_0 + \frac{q_1^{(3)}}{3!}\Delta^4 y_0 \right] = 0.501867; \tag{7.454} \]

• for x = 0.4:

\[ q_1 = \frac{0.4 - 0}{1} = 0.4, \tag{7.455} \]
\[ f''(0.4) \approx \frac{1}{1^2}\left[ \Delta^2 y_0 + \frac{q_1^{(1)}}{1!}\Delta^3 y_0 + \frac{q_1^{(2)}}{2!}\Delta^4 y_0 \right] = -0.801243; \tag{7.456} \]

• for x = 3.7:

\[ q_2 = \frac{3.7 - 4}{1} = -0.3, \tag{7.457} \]
\[ f'(3.7) \approx \frac{1}{1}\left[ \Delta y_3 + \frac{(q_2+1)^{(1)}}{1!}\Delta^2 y_2 + \frac{(q_2+2)^{(2)}}{2!}\Delta^3 y_1 + \frac{(q_2+3)^{(3)}}{3!}\Delta^4 y_0 \right] = 0.681654; \tag{7.458} \]

• for x = 3.73:

\[ q_2 = \frac{3.73 - 4}{1} = -0.27, \tag{7.459} \]
\[ f''(3.73) \approx \frac{1}{1^2}\left[ \Delta^2 y_2 + \frac{(q_2+2)^{(1)}}{1!}\Delta^3 y_1 + \frac{(q_2+3)^{(2)}}{2!}\Delta^4 y_0 \right] = 4.614004. \tag{7.460} \]
On the other hand,

\[ f'(x) = \frac{\cos x\,(2 + \sin^2 x)}{(1 + \cos^2 x)^2}, \tag{7.461} \]
\[ f''(x) = \frac{\sin x\,(1 + 7\cos^2 x - 4\sin^2 x)}{(1 + \cos^2 x)^3}, \tag{7.462} \]

and the exact values of the function and of its first two derivatives are given in the following table.

x      f(x)         f′(x)        f″(x)
0      0            0.5          0
0.4    0.2102684    0.876641     0.422211
0.5    0.270839     0.624515     0.534146
1      0.651330     0.876641     0.405069
2      0.775072     −0.854707    −0.294121
3      0.071270     −0.510032    0.142858
3.7    −0.308174    −0.654380    −0.596319
3.73   −0.328049    −0.670677    −0.633525
4      −0.530252    −0.825554    −0.697247
These two examples show that:
(i) the method that uses the expansion into a Taylor series is more precise than the one using interpolation polynomials;
(ii) the first-order derivative is calculated more precisely than the second-order one;
(iii) numerical differentiation does not offer a good precision.
Example 7.3 Let

\[ I = \int_{1}^{2} x\sin x\,dx. \tag{7.463} \]

We shall give approximations of the integral I using various numerical methods.
The integral I may be calculated directly, obtaining the value

\[ I = \left(-x\cos x + \sin x\right)\big|_{1}^{2} = 1.4404224. \tag{7.464} \]

To apply the trapezium method, we take the division step h = 0.1, obtaining the following data.
xi yi = f (xi)
1 0.8414710
1.1 0.9803281
1.2 1.1184469
1.3 1.2526256
1.4 1.3796296
1.5 1.4962425
1.6 1.5993178
1.7 1.6858302
1.8 1.7529257
1.9 1.7979702
2.0 1.8185949
It follows that

\[ I \approx \frac{0.1}{2}\bigl(f(1) + 2(f(1.1) + f(1.2) + \cdots + f(1.9)) + f(2)\bigr) = 1.4393350. \tag{7.465} \]

The same problem may be solved by using Simpson's formula, obtaining

\[ I \approx \frac{0.1}{3}\bigl(f(1) + 2(f(1.2) + f(1.4) + f(1.6) + f(1.8)) + 4(f(1.1) + f(1.3) + f(1.5) + f(1.7) + f(1.9)) + f(2)\bigr) = 1.4404233. \tag{7.466} \]
Let us consider the transformation

\[ x = \frac{y+3}{2}, \qquad dx = \frac{dy}{2}. \tag{7.467} \]

Now the integral I reads

\[ I = \int_{-1}^{1} \frac{y+3}{2}\,\sin\!\left(\frac{y+3}{2}\right)\frac{dy}{2}. \tag{7.468} \]
We shall determine the Chebyshev quadrature formulae for the cases n = 2, n = 3, and n = 4, applying them to the integral (7.468).
In the case n = 2 we obtain

\[ A_1 = A_2 = 1 \tag{7.469} \]

and the system

\[ x_1 + x_2 = 0, \qquad x_1^2 + x_2^2 = \frac{2}{3}, \tag{7.470} \]

which results in Chebyshev's formula

\[ I = \int_{-1}^{1} f(x)\,dx \approx f\left(-\frac{1}{\sqrt{3}}\right) + f\left(\frac{1}{\sqrt{3}}\right) \tag{7.471} \]

and, numerically,

\[ I \approx 1.440144. \tag{7.472} \]
If n = 3, then we deduce the values

\[ A_1 = A_2 = A_3 = \frac{2}{3} \tag{7.473} \]

and the system

\[ x_1 + x_2 + x_3 = 0, \qquad x_1^2 + x_2^2 + x_3^2 = 1, \qquad x_1^3 + x_2^3 + x_3^3 = 0, \tag{7.474} \]

with the solution

\[ x_1 = -\frac{1}{\sqrt{2}}, \qquad x_2 = 0, \qquad x_3 = \frac{1}{\sqrt{2}}. \tag{7.475} \]

Chebyshev's formula reads

\[ I = \int_{-1}^{1} f(x)\,dx \approx \frac{2}{3}\left[ f\left(-\frac{1}{\sqrt{2}}\right) + f(0) + f\left(\frac{1}{\sqrt{2}}\right) \right], \tag{7.476} \]

leading to the value

\[ I \approx 1.440318. \tag{7.477} \]
Finally, in the case n = 4 we obtain the values

\[ A_1 = A_2 = A_3 = A_4 = \frac{1}{2} \tag{7.478} \]

and the system

\[ x_1 + x_2 + x_3 + x_4 = 0, \qquad x_1^2 + x_2^2 + x_3^2 + x_4^2 = \frac{4}{3}, \qquad x_1^3 + x_2^3 + x_3^3 + x_4^3 = 0, \qquad x_1^4 + x_2^4 + x_3^4 + x_4^4 = \frac{4}{5}, \tag{7.479} \]

with the solution

\[ x_1 = -0.79466, \qquad x_2 = -0.18759, \qquad x_3 = 0.18759, \qquad x_4 = 0.79466. \tag{7.480} \]

The integral I will have the value

\[ I = \int_{-1}^{1} f(x)\,dx \approx 0.5\bigl(f(-0.79466) + f(-0.18759) + f(0.18759) + f(0.79466)\bigr), \tag{7.481} \]

hence

\[ I \approx 1.440422. \tag{7.482} \]
The same integral I of equation (7.468) may be calculated by quadrature formulae of Gauss type. To do this, we first determine the Legendre polynomials:

\[ P_0(x) = 1, \tag{7.483} \]
\[ P_1(x) = \frac{1}{2 \times 1!}\,\frac{d(x^2 - 1)}{dx} = x, \tag{7.484} \]
\[ P_2(x) = \frac{1}{2^2 \times 2!}\,\frac{d^2[(x^2 - 1)^2]}{dx^2} = \frac{1}{2}(3x^2 - 1), \tag{7.485} \]
\[ P_3(x) = \frac{1}{2^3 \times 3!}\,\frac{d^3[(x^2 - 1)^3]}{dx^3} = \frac{1}{2}(5x^3 - 3x), \tag{7.486} \]
\[ P_4(x) = \frac{1}{2^4 \times 4!}\,\frac{d^4[(x^2 - 1)^4]}{dx^4} = \frac{1}{8}(35x^4 - 30x^2 + 3), \ldots \tag{7.487} \]

The roots of these polynomials are

• for P1(x):

\[ x_1 = 0; \tag{7.488} \]

• for P2(x):

\[ x_1 = -\frac{1}{\sqrt{3}}, \qquad x_2 = \frac{1}{\sqrt{3}}; \tag{7.489} \]

• for P3(x):

\[ x_1 = -\sqrt{\frac{3}{5}}, \qquad x_2 = 0, \qquad x_3 = \sqrt{\frac{3}{5}}; \tag{7.490} \]

• for P4(x):

\[ x_1 = -\sqrt{\frac{30 + \sqrt{480}}{70}}, \quad x_2 = -\sqrt{\frac{30 - \sqrt{480}}{70}}, \quad x_3 = \sqrt{\frac{30 - \sqrt{480}}{70}}, \quad x_4 = \sqrt{\frac{30 + \sqrt{480}}{70}}. \tag{7.491} \]
In the case n = 2 we obtain the system

\[ A_1 + A_2 = 2, \qquad -\frac{A_1}{\sqrt{3}} + \frac{A_2}{\sqrt{3}} = 0, \tag{7.492} \]

with the solution

\[ A_1 = 1, \qquad A_2 = 1; \tag{7.493} \]

it results in the quadrature formula

\[ I = \int_{-1}^{1} f(x)\,dx \approx f\left(-\frac{1}{\sqrt{3}}\right) + f\left(\frac{1}{\sqrt{3}}\right), \tag{7.494} \]

which is Chebyshev's quadrature formula (7.471), leading to the same value (7.472) for I.
In the case n = 3 we obtain the system

\[ A_1 + A_2 + A_3 = 2, \qquad -\sqrt{\frac{3}{5}}A_1 + \sqrt{\frac{3}{5}}A_3 = 0, \qquad \frac{3}{5}A_1 + \frac{3}{5}A_3 = \frac{2}{3}, \tag{7.495} \]

with the solution

\[ A_1 = \frac{5}{9}, \qquad A_2 = \frac{8}{9}, \qquad A_3 = \frac{5}{9}, \tag{7.496} \]

and the formula

\[ I = \int_{-1}^{1} f(x)\,dx \approx \frac{5}{9}f\left(-\sqrt{\frac{3}{5}}\right) + \frac{8}{9}f(0) + \frac{5}{9}f\left(\sqrt{\frac{3}{5}}\right), \tag{7.497} \]

from which

\[ I \approx 1.440423. \tag{7.498} \]
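Formula (7.497) applied to the transformed integrand of (7.468) can be checked directly (a sketch of ours, Python assumed):

```python
import math

def g(y):   # integrand of (7.468); the factor 1/2 comes from dx = dy/2
    x = 0.5 * (y + 3.0)
    return 0.5 * x * math.sin(x)

r = math.sqrt(0.6)
print((5.0 * g(-r) + 8.0 * g(0.0) + 5.0 * g(r)) / 9.0)   # about 1.440423
```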
For n = 4 we obtain the system

\[ A_1 + A_2 + A_3 + A_4 = 2, \qquad A_1x_1 + A_2x_2 + A_3x_3 + A_4x_4 = 0, \]
\[ A_1x_1^2 + A_2x_2^2 + A_3x_3^2 + A_4x_4^2 = \frac{2}{3}, \qquad A_1x_1^3 + A_2x_2^3 + A_3x_3^3 + A_4x_4^3 = 0, \tag{7.499} \]

with x1, x2, x3, and x4 given by equation (7.491). The solution of this system is

\[ A_1 = A_4 = \frac{x_3^2 - \frac{1}{3}}{x_3^2 - x_4^2} = 0.3478548, \qquad A_2 = A_3 = \frac{x_4^2 - \frac{1}{3}}{x_4^2 - x_3^2} = 0.6521452, \tag{7.500} \]

leading to the quadrature formula

\[ I = \int_{-1}^{1} f(x)\,dx \approx 0.3478548\,[f(-0.8611363) + f(0.8611363)] + 0.6521452\,[f(-0.3399810) + f(0.3399810)], \tag{7.501} \]

from which

\[ I \approx 1.440422. \tag{7.502} \]
Another possibility for the determination of the integral (7.463) is the use of the Monte Carlo method. To do this, we denote by f the function f : [1, 2] → R,

\[ f(x) = x\sin x, \tag{7.503} \]

the derivative of which is

\[ f'(x) = \sin x + x\cos x. \tag{7.504} \]

The equation f′(x) = 0 leads to

\[ \tan x = -x, \tag{7.505} \]

without solution in the interval [1, 2].
Moreover, f′(x) > 0 for any x ∈ [1, 2]. We deduce that the maximum value of f is attained at the point x = 2, while the minimum value is attained at x = 1; we may write

\[ \max f = f(2) = 1.818595, \qquad \min f = f(1) = 0.841471. \tag{7.506} \]

We generate pairs of random numbers (a, b), where a is uniformly distributed in the interval [1, 2], while b is uniformly distributed in the interval [0, 2]. The value b is then compared with f(a). If b < f(a), then the pair (a, b) is counted; otherwise it is rejected. We made 1000 generations, of which the first ten are shown below.
Step a b f (a) Counter
1 1.644 1.958 1.639597 0
2 1.064 1.622 0.930259 0
3 1.622 1.414 1.619874 1
4 1.521 0.606 1.519115 2
5 1.212 0.600 1.134820 3
6 1.303 1.086 1.256556 3
7 1.856 0.872 1.781026 4
8 1.648 1.648 1.643091 4
9 1.713 0.702 1.695709 5
10 1.000 1.288 0.841471 5
We obtained the result

\[ I \approx 1.456. \tag{7.507} \]
To apply Euler’s or Gregory’s formulae, we may calculate first Bernoulli’s numbers.
Writing
et
− 1 = t +
t2
2!
+
t3
3!
+
t4
4!
+
t5
5!
+ · · · (7.508)
and
t
et − 1
= B0 + B1t +
B2
2!
t2
+
B3
3!
t3
+
B4
4!
t4
+
B5
5!
t5
+ · · · , (7.509)
it follows that
t = B0 + B1t +
B2
2
t2
+
B3
6
t3
+
B4
24
t4
+
B5
120
t5
+ · · ·
× t +
t2
2
+
t3
6
+
t4
24
+
t5
120
+ · · · ,
(7.510)
hence
B0 = 1,
B0
2
+ B1 = 0,
B0
6
+
B1
2
+
B2
2
= 0,
B0
24
+
B1
6
+
B2
4
+
B3
6
= 0,
B0
120
+
B1
24
+
B2
12
+
B3
12
+
B4
24
= 0, . . . ,
(7.511)
from which
B0 = 1, B1 = −
1
2
, B2 =
1
6
, B3 = 0, B4 = −
1
30
, . . . (7.512)
On the other hand,

\[ f'(x) = x\cos x + \sin x, \tag{7.513} \]
\[ f''(x) = -x\sin x + 2\cos x, \tag{7.514} \]
\[ f'''(x) = -x\cos x - 3\sin x. \tag{7.515} \]

The first formula of Euler leads to

\[ \int_{1}^{2} f(x)\,dx \approx h\sum_{i=0}^{9} f(x_i) - h\sum_{i=1}^{4} \frac{B_i}{i!}h^{i-1}\left[ f^{(i-1)}(2) - f^{(i-1)}(1) \right], \tag{7.516} \]

where

\[ f(2) = 1.8185949, \qquad f(1) = 0.8414710, \tag{7.517} \]
\[ f'(2) = 0.077004, \qquad f'(1) = 1.381773, \tag{7.518} \]
\[ f''(2) = -2.650889, \qquad f''(1) = 0.239134, \tag{7.519} \]
\[ f'''(2) = -1.895599, \qquad f'''(1) = -3.064715. \tag{7.520} \]

It follows that

\[ I \approx 1.440422. \tag{7.521} \]

Analogously, we may also use the second formula of Euler or Gregory's formula.
We have seen that the value of the considered integral, calculated by the trapezium method, is

\[ I_2 \approx 1.4393350. \tag{7.522} \]

If we were to use only the points 1.0, 1.2, 1.4, 1.6, 1.8, and 2.0, then the value of the integral, calculated by the same trapezium method, would be

\[ I_1 \approx \frac{0.2}{2}\bigl(f(1) + 2(f(1.2) + f(1.4) + f(1.6) + f(1.8)) + f(2)\bigr) = 1.4360706. \tag{7.523} \]

The Richardson extrapolation formula leads to the value

\[ I = I_2 + \frac{I_2 - I_1}{2^2 - 1} = 1.440423. \tag{7.524} \]
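The extrapolation step itself is one line (an illustrative sketch of ours, Python assumed):

```python
I1, I2 = 1.4360706, 1.4393350     # trapezoid values with steps 0.2 and 0.1
I = I2 + (I2 - I1) / (2**2 - 1)   # formula (7.524); the trapezoid rule is O(h^2)
print(I)                          # 1.4404231..., near the exact 1.4404224
```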
7.20 APPLICATIONS
Problem 7.1
Let us consider the forward eccentric with pusher rod (Fig. 7.5) of a heat engine; the motion law
of the valve is given by s = s(φ), where φ is the rotation angle of the cam.
Let us determine
• the law of motion of the follower;
• the parametric equations of the cam;
• the variation of the curvature radius of the cam, in numerical values.
Figure 7.5 Distribution mechanism.
We know

\[ s = \begin{cases} 0 & \text{for } \varphi \in \left[0, \dfrac{\pi}{2}\right), \\[1mm] h(1 + \cos 2\varphi) & \text{for } \varphi \in \left[\dfrac{\pi}{2}, \dfrac{3\pi}{2}\right], \\[1mm] 0 & \text{for } \varphi \in \left(\dfrac{3\pi}{2}, 2\pi\right], \end{cases} \tag{7.525} \]

h = 4 mm, CD = a = 3 mm, CB = b = 20 mm, AB = l = 70 mm, l1 = 30.72 mm, r0 = 10 mm, XC = 30 mm, YC = 110 mm.
Solution:
1. Theory
Denoting by θ the rotation angle of the rocker BD, we obtain the relation

\[ \theta = \arcsin\frac{s}{a}. \tag{7.526} \]

The coordinates of the points A and B in the OXY-system (Fig. 7.5) read

\[ X_A = 0, \qquad Y_A = r_0 + l_1 + s, \qquad X_B = X_C - b\cos\theta, \qquad Y_B = Y_C + b\sin\theta; \tag{7.527} \]

under these conditions, taking into account the relation

\[ (X_C - b)^2 + (Y_C - r_0 - l_1)^2 - l^2 = 0 \tag{7.528} \]

and using the notations

\[ \alpha = b\sin\theta + Y_C - r_0 - l_1, \tag{7.529} \]
\[ \beta = 2b\left[(Y_C - r_0 - l_1)\sin\theta + X_C(1 - \cos\theta)\right], \tag{7.530} \]

the relation

\[ (X_B - X_A)^2 + (Y_B - Y_A)^2 - l^2 = 0 \tag{7.531} \]

leads to the equation

\[ s^2 - 2\alpha s + \beta = 0, \tag{7.532} \]
Figure 7.6 Parametric equations of the cam.
the solution of which,

\[ s = \alpha - \sqrt{\alpha^2 - \beta}, \tag{7.533} \]

represents the law of motion of the follower.
The numerical solution is obtained by giving the angle φ values degree by degree and calculating the values of the parameters θi, αi, βi, si, i = 0, 360, by means of relations (7.526), (7.529), (7.530), and (7.533).
To establish the parametric equations of the cam in its proper system of axes (Fig. 7.6), the relation between the absolute velocity vM2, the transportation velocity vM1, and the relative velocity vM2M1 of the point M2 is written in the form

\[ \mathbf{v}_{M_2} = \mathbf{v}_{M_1} + \mathbf{v}_{M_2M_1}; \tag{7.534} \]

projecting on the Oy-axis, we obtain

\[ \omega\frac{ds}{d\varphi} = \omega\,OM\sin\gamma \tag{7.535} \]

or

\[ MP = \frac{ds}{d\varphi}, \tag{7.536} \]

where ω is the angular velocity of the cam. Under these conditions, the coordinates x, y of the point M are

\[ x = -(r_0 + s)\sin\varphi - \frac{ds}{d\varphi}\cos\varphi, \tag{7.537} \]
\[ y = (r_0 + s)\cos\varphi - \frac{ds}{d\varphi}\sin\varphi, \tag{7.538} \]

while the curvature radius

\[ R = \frac{\left[\left(\dfrac{dx}{d\varphi}\right)^2 + \left(\dfrac{dy}{d\varphi}\right)^2\right]^{3/2}}{\dfrac{d^2x}{d\varphi^2}\dfrac{dy}{d\varphi} - \dfrac{d^2y}{d\varphi^2}\dfrac{dx}{d\varphi}} \tag{7.539} \]

becomes

\[ R = r_0 + s + \frac{d^2s}{d\varphi^2}. \tag{7.540} \]
Figure 7.7 Representation of the cam.
2. Numerical calculation
For the numerical calculation, we give the angle φi, i = 0, 360, values degree by degree; thus we calculate successively the parameters si, θi, αi, βi, si, xi, yi, Ri by means of relations (7.525), (7.526), (7.529), (7.530), (7.533), (7.537), (7.538), and (7.540), where for the derivatives ds/dφ, d²s/dφ² we use the finite differences

\[ \left.\frac{ds}{d\varphi}\right|_{\varphi=\varphi_i} = \frac{s_{i+1} - s_{i-1}}{2}\cdot\frac{180}{\pi}, \tag{7.541} \]
\[ \left.\frac{d^2s}{d\varphi^2}\right|_{\varphi=\varphi_i} = (s_{i+1} - 2s_i + s_{i-1})\left(\frac{180}{\pi}\right)^2. \tag{7.542} \]
The results obtained for φ = 0°, φ = 10°, . . . , φ = 360° are given in Table 7.2.
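The stencils (7.541) and (7.542) are straightforward to code; the sketch below (Python assumed, an illustration of ours) applies them to the cam law (7.525) itself with h = 4 mm, purely to show the stencil (the table differentiates the follower law (7.533) instead):

```python
import math

def s(phi_deg):   # the cam law (7.525), with h = 4 mm
    phi = math.radians(phi_deg % 360.0)
    if math.pi / 2.0 <= phi <= 3.0 * math.pi / 2.0:
        return 4.0 * (1.0 + math.cos(2.0 * phi))
    return 0.0

DEG = math.pi / 180.0   # one degree, in radians

def ds(phi_deg):        # central difference (7.541)
    return (s(phi_deg + 1.0) - s(phi_deg - 1.0)) / (2.0 * DEG)

def d2s(phi_deg):       # central difference (7.542)
    return (s(phi_deg + 1.0) - 2.0 * s(phi_deg) + s(phi_deg - 1.0)) / DEG**2

print(ds(130.0), -8.0 * math.sin(math.radians(260.0)))  # FD vs exact ds/dphi
```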
The representation of the cam is given in Figure 7.7.
If the radius r0 of the base circle is small, then the curvature radius may become negative; in this case the cam is no longer useful from a technical point of view. To avoid this situation, we increase the radius r0 of the base circle.
Problem 7.2
Let the equation of undamped free nonlinear vibrations be

\[ \ddot{x} + f(x) = 0, \tag{7.543} \]

where f(x) is an odd function,

\[ f(x) = \begin{cases} \dfrac{p^2}{l^{n+1}}\,x^n & \text{if } x \ge 0, \\[1mm] \dfrac{p^2}{l^{n+1}}\,(-1)^{n-1}x^n & \text{if } x < 0. \end{cases} \tag{7.544} \]
It is asked to show that the period of the vibrations is given by

\[ T_n = \frac{4}{p}\sqrt{\frac{(n+1)\,l^{n+1}}{2A^{n+1}}}\,I_n, \tag{7.545} \]
TABLE 7.2 Numerical Results
φi    si    θi    αi    βi    si    ds/dφ    d²s/dφ²    xi    yi    Ri
0 0.000 0.000 69.280 0.000 0.000 0.000 0.000 0.000 10.000 10.000
10 0.000 0.000 69.280 0.000 0.000 0.000 0.000 −1.736 9.848 10.000
20 0.000 0.000 69.280 0.000 0.000 0.000 0.000 −3.420 9.397 10.000
30 0.000 0.000 69.280 0.000 0.000 0.000 0.000 −5.000 8.660 10.000
40 0.000 0.000 69.280 0.000 0.000 0.000 0.000 −6.428 7.660 10.000
50 0.000 0.000 69.280 0.000 0.000 0.000 0.000 −7.660 6.428 10.000
60 0.000 0.000 69.280 0.000 0.000 0.000 0.000 −8.660 5.000 10.000
70 0.000 0.000 69.280 0.000 0.000 0.000 0.000 −9.397 3.420 10.000
80 0.000 0.000 69.280 0.000 0.000 0.000 0.000 −9.848 1.736 10.000
90 0.000 0.000 69.280 0.000 0.000 0.008 0.931 −10.000 −0.008 10.931
100 0.241 0.080 69.308 3.890 0.028 0.318 1.750 −9.820 −2.055 11.778
110 0.936 0.312 69.389 15.105 0.109 0.599 1.430 −9.295 −4.020 11.539
120 2.000 0.667 69.513 32.326 0.233 0.807 0.937 −8.458 −5.816 11.170
130 3.305 1.102 69.665 53.512 0.385 0.919 0.330 −7.365 −7.379 10.715
140 4.695 1.565 69.826 76.135 0.547 0.920 −0.318 −6.075 −8.671 10.229
150 6.000 2.000 69.978 97.464 0.700 0.810 −0.931 −4.648 −9.671 9.769
160 7.064 2.355 70.102 114.904 0.824 0.602 −1.432 −3.137 −10.377 9.393
170 7.759 2.587 70.183 126.311 0.906 0.320 −1.760 −1.578 −10.796 9.146
180 8.000 2.668 70.211 130.278 0.934 0.000 −1.874 0.000 −10.934 9.060
190 7.759 2.587 70.183 126.311 0.906 −0.320 −1.760 1.578 −10.796 9.146
200 7.064 2.355 70.102 114.904 0.824 −0.602 −1.432 3.137 −10.377 9.393
210 6.000 2.000 69.978 97.464 0.700 −0.810 −0.931 4.648 −9.671 9.769
220 4.695 1.565 69.826 76.135 0.547 −0.920 −0.318 6.075 −8.671 10.229
230 3.305 1.102 69.665 53.512 0.385 −0.919 0.330 7.365 −7.379 10.715
240 2.000 0.667 69.513 32.326 0.233 −0.807 0.937 8.458 −5.816 11.170
250 0.936 0.312 69.389 15.105 0.109 −0.599 1.430 9.295 −4.020 11.539
260 0.241 0.080 69.308 3.890 0.028 −0.318 1.750 9.820 −2.055 11.778
270 0.000 0.000 69.280 0.000 0.000 −0.008 0.931 10.000 −0.008 10.931
280 0.000 0.000 69.280 0.000 0.000 0.000 0.000 9.848 1.736 10.000
290 0.000 0.000 69.280 0.000 0.000 0.000 0.000 9.397 3.420 10.000
300 0.000 0.000 69.280 0.000 0.000 0.000 0.000 8.660 5.000 10.000
310 0.000 0.000 69.280 0.000 0.000 0.000 0.000 7.660 6.428 10.000
320 0.000 0.000 69.280 0.000 0.000 0.000 0.000 6.428 7.660 10.000
330 0.000 0.000 69.280 0.000 0.000 0.000 0.000 5.000 8.660 10.000
340 0.000 0.000 69.280 0.000 0.000 0.000 0.000 3.420 9.397 10.000
350 0.000 0.000 69.280 0.000 0.000 0.000 0.000 1.736 9.848 10.000
360 0.000 0.000 69.280 0.000 0.000 0.000 0.000 0.000 10.000 10.000
where

\[ I_n = \int_{0}^{1} \frac{d\beta}{\sqrt{1 - \beta^{n+1}}}, \tag{7.546} \]

the initial conditions being

\[ t = 0, \qquad x = A, \qquad \dot{x} = 0. \tag{7.547} \]

Determine numerically the periods T1, T2, T3, T4, T5 for A = l/λ, λ positive.
Solution:
1. Theory
The differential equation (7.543), written in the form

\[ \dot{x}\,d\dot{x} + f(x)\,dx = 0, \tag{7.548} \]

is integrated in the form

\[ \frac{\dot{x}^2}{2} + \int_{0}^{x} f(\xi)\,d\xi = C_1, \tag{7.549} \]

the integration constant C1 being specified by the initial conditions in the form

\[ C_1 = \int_{0}^{A} f(\xi)\,d\xi, \tag{7.550} \]

from which relation (7.549) becomes

\[ \dot{x}^2 = 2\int_{x}^{A} f(\xi)\,d\xi. \tag{7.551} \]

From the very beginning, the velocity ẋ is negative; hence

\[ \dot{x} = -\sqrt{2\int_{x}^{A} f(\xi)\,d\xi}, \tag{7.552} \]

so that

\[ dt = -\frac{dx}{\sqrt{2\int_{x}^{A} f(\xi)\,d\xi}}, \tag{7.553} \]

hence

\[ t = -\int_{0}^{x} \frac{d\eta}{\sqrt{2\int_{\eta}^{A} f(\xi)\,d\xi}} + C_2. \tag{7.554} \]

Taking into account the given initial conditions, it follows that

\[ C_2 = \int_{0}^{A} \frac{d\eta}{\sqrt{2\int_{\eta}^{A} f(\xi)\,d\xi}}, \tag{7.555} \]

from which the relation becomes

\[ t = \int_{x}^{A} \frac{d\eta}{\sqrt{2\int_{\eta}^{A} f(\xi)\,d\xi}}. \tag{7.556} \]
For x = 0 in equation (7.556), we obtain the time T/4 (a quarter of the period); therefore

\[ T = 4\int_{0}^{A} \frac{d\eta}{\sqrt{2\int_{\eta}^{A} f(\xi)\,d\xi}}; \tag{7.557} \]

replacing f(ξ) by (p²/l^{n+1})ξ^n, we obtain

\[ T = \frac{4}{p}\sqrt{\frac{(n+1)\,l^{n+1}}{2}}\int_{0}^{A} \frac{d\eta}{\sqrt{A^{n+1} - \eta^{n+1}}}, \tag{7.558} \]

so that the substitution η = Aβ leads to

\[ T = \frac{4}{p}\sqrt{\frac{(n+1)\,l^{n+1}}{2A^{n+1}}}\,I_n, \tag{7.559} \]

where In is the integral (7.546).
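The integrals In are easy to tabulate; the sketch below (Python assumed, an illustration of ours) uses the four-point Gauss rule mapped to [0, 1] and reproduces the values of (7.562). Note that the integrand is singular at β = 1, which is why the estimates in (7.560)–(7.562) still drift as points are added (for n = 1 the exact value is I1 = π/2):

```python
import math

NODES = (-0.8611363116, -0.3399810436, 0.3399810436, 0.8611363116)
WEIGHTS = (0.3478548451, 0.6521451549, 0.6521451549, 0.3478548451)

def I(n):
    """Four-point Gauss estimate of (7.546), with beta = (t + 1)/2 on [-1, 1]."""
    total = 0.0
    for w, t in zip(WEIGHTS, NODES):
        beta = 0.5 * (t + 1.0)
        total += w / math.sqrt(1.0 - beta**(n + 1))
    return 0.5 * total

print([round(I(n), 6) for n in range(1, 6)])   # 1.434062, 1.290703, ... as in (7.562)
```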
2. Numerical results
Numerically, we obtain the following values:

• with the Gauss formula in two points:

\[ I_1 = 1.328412,\; T_1 = 5.3137\frac{\lambda}{p}; \quad I_2 = 1.202903,\; T_2 = 5.8930\frac{\lambda^{3/2}}{p}; \quad I_3 = 1.139060,\; T_3 = 6.3977\frac{\lambda^{2}}{p}; \quad I_4 = 1.099923,\; T_4 = 6.9565\frac{\lambda^{5/2}}{p}; \quad I_5 = 1.073808,\; T_5 = 7.4603\frac{\lambda^{3}}{p}; \tag{7.560} \]

• with the Gauss formula in three points:

\[ I_1 = 1.395058,\; T_1 = 5.5802\frac{\lambda}{p}; \quad I_2 = 1.259053,\; T_2 = 6.1681\frac{\lambda^{3/2}}{p}; \quad I_3 = 1.187340,\; T_3 = 6.7166\frac{\lambda^{2}}{p}; \quad I_4 = 1.143415,\; T_4 = 7.2316\frac{\lambda^{5/2}}{p}; \quad I_5 = 1.113941,\; T_5 = 7.7176\frac{\lambda^{3}}{p}; \tag{7.561} \]

• with the Gauss formula in four points:

\[ I_1 = 1.434062,\; T_1 = 5.7362\frac{\lambda}{p}; \quad I_2 = 1.290703,\; T_2 = 6.3231\frac{\lambda^{3/2}}{p}; \quad I_3 = 1.214628,\; T_3 = 6.8710\frac{\lambda^{2}}{p}; \quad I_4 = 1.167633,\; T_4 = 7.3848\frac{\lambda^{5/2}}{p}; \quad I_5 = 1.135837,\; T_5 = 7.8693\frac{\lambda^{3}}{p}. \tag{7.562} \]
Problem 7.3
We consider the equation of undamped free vibrations

\[ \ddot{x} + f(x) = 0, \tag{7.563} \]

where

\[ f(x) = \begin{cases} f_1(x) & \text{if } x \le 0, \\ f_2(x) & \text{if } x > 0, \end{cases} \tag{7.564} \]

and

\[ f_1(0) = f_2(0) = 0, \qquad f_1(x) \le 0. \tag{7.565} \]

Show that the period is given by

\[ T = 2\int_{-A_1}^{0} \frac{d\eta}{\sqrt{2\int_{\eta}^{-A_1} f_1(\xi)\,d\xi}} + 2\int_{0}^{A_2} \frac{d\eta}{\sqrt{2\int_{\eta}^{A_2} f_2(\xi)\,d\xi}}, \tag{7.566} \]

where the distance A1 is specified by the equation

\[ \int_{-A_1}^{0} f(x)\,dx + \int_{0}^{A_2} f(x)\,dx = 0, \tag{7.567} \]

for the initial conditions

\[ t = 0, \qquad x = A_2, \qquad \dot{x} = 0, \qquad A_2 > 0. \tag{7.568} \]

Numerical application for A2 = 0.25 and

\[ f(x) = \begin{cases} -6x^2 & \text{if } x \le 0, \\ 6x + 64x^3 & \text{if } x > 0. \end{cases} \tag{7.569} \]
Solution:
1. Theory
Applying the theorem of kinetic energy and work on the interval BC (Fig. 7.8) and observing that the kinetic energy at the points B and C vanishes, we obtain relation (7.567).
Starting from the point B (Fig. 7.8), the particle travels along the direction BO in the time interval t2 given by relation (7.556) of Problem 7.2, where x is replaced by 0, f(x) by f2(x), and A by A2, so that

\[ t_2 = \int_{0}^{A_2} \frac{d\eta}{\sqrt{2\int_{\eta}^{A_2} f_2(\xi)\,d\xi}}. \tag{7.570} \]

Figure 7.8 Problem 7.3.
In the study of the motion from the point C toward the point O, we obtain

\[ \frac{\dot{x}^2}{2} + \int_{0}^{x} f_1(\xi)\,d\xi = C_1, \tag{7.571} \]

the initial conditions t = 0, x = −A1, ẋ = 0 leading to

\[ C_1 = \int_{0}^{-A_1} f_1(\xi)\,d\xi, \tag{7.572} \]

so that

\[ \dot{x}^2 = 2\int_{x}^{-A_1} f_1(\xi)\,d\xi; \tag{7.573} \]

because the velocity is ẋ > 0, it follows that

\[ \dot{x} = \sqrt{2\int_{x}^{-A_1} f_1(\xi)\,d\xi}, \tag{7.574} \]

from which

\[ t = \int_{0}^{x} \frac{d\eta}{\sqrt{2\int_{\eta}^{-A_1} f_1(\xi)\,d\xi}} + C_2. \tag{7.575} \]

The initial conditions lead successively to

\[ C_2 = -\int_{0}^{-A_1} \frac{d\eta}{\sqrt{2\int_{\eta}^{-A_1} f_1(\xi)\,d\xi}}, \tag{7.576} \]
\[ t = \int_{-A_1}^{x} \frac{d\eta}{\sqrt{2\int_{\eta}^{-A_1} f_1(\xi)\,d\xi}}, \tag{7.577} \]

obtaining the time of travel over the distance CO (equal to the time corresponding to the distance OC)

\[ t_1 = \int_{-A_1}^{0} \frac{d\eta}{\sqrt{2\int_{\eta}^{-A_1} f_1(\xi)\,d\xi}}. \tag{7.578} \]

Summing the times t1 and t2 given by relations (7.578) and (7.570), we obtain half of the period (T/2); hence relation (7.566).
2. Numerical calculation
Relations (7.567) and (7.569) lead to

\[ \int_{-A_1}^{0} (-6x^2)\,dx + \int_{0}^{A_2} (6x + 64x^3)\,dx = 0, \tag{7.579} \]
\[ -2A_1^3 + 3A_2^2 + 16A_2^4 = 0; \tag{7.580} \]

because A2 = 0.25, it follows that A1 = 0.5, so that we obtain successively

\[ \int_{\eta}^{-A_1} f_1(\xi)\,d\xi = 2(\eta^3 + 0.125), \tag{7.581} \]
\[ \int_{\eta}^{A_2} f_2(\xi)\,d\xi = 0.25 - 3\eta^2 - 16\eta^4, \tag{7.582} \]
\[ T = 2\int_{-0.5}^{0} \frac{d\eta}{\sqrt{4(\eta^3 + 0.125)}} + 2\int_{0}^{0.25} \frac{d\eta}{\sqrt{2(0.25 - 3\eta^2 - 16\eta^4)}}, \tag{7.583} \]

so that

\[ T = 2.668799\ \text{s}, \tag{7.584} \]

where, for the calculation of the integrals, we used the Gauss formula in four points.
Problem 7.4
Let us consider the crankshaft mechanism in Figure 7.9, where:
• the crank OA has the length r, while its moment of inertia with respect to the point O is equal to J1;
• the connecting rod AB is a homogeneous bar of length l, of constant cross section, of mass m2 and moment of inertia with respect to its center of gravity J2 = m2l²/12;
• the slider in B has the mass m3.
The crank OA is acted upon by a moment M,

\[ M = \begin{cases} M_0 & \text{if } 0 \le \varphi \le \pi, \\ -M_0 & \text{if } \pi < \varphi \le 2\pi, \end{cases} \tag{7.585} \]

and the motion of the mechanism is in a steady periodic regime, the mean angular velocity of the crank being ωm.
We are asked to determine:
• the variation of the angular velocity ω of the crank OA as a function of the angle φ;
• the irregularity degree δ0 of the motion;
• the moment of inertia Jv of a flywheel rigidly linked to the crank OA, such that the irregularity degree δ be equal to δ0/4.

Figure 7.9 Problem 7.4.
Numerical application: ωm = 100 rad s⁻¹, r = 0.04 m, l = 0.2 m, J1 = 0.00016 kg m², m2 = 1.2 kg, J2 = 0.004 kg m², m3 = 0.8 kg, M0 = 4 N m.
Solution:
1. Theory
Denoting by X2, Y2 the coordinates of the point C2 and by X3 the distance OB, the kinetic energy of the mechanism is

\[ T = \frac{1}{2}J_1\omega^2 + \frac{1}{2}J_2\left(\frac{d\psi}{d\varphi}\right)^2\omega^2 + \frac{1}{2}m_2\left[\left(\frac{dX_2}{d\varphi}\right)^2 + \left(\frac{dY_2}{d\varphi}\right)^2\right]\omega^2 + \frac{1}{2}m_3\left(\frac{dX_3}{d\varphi}\right)^2\omega^2 \tag{7.586} \]

or, with the notation

\[ J_{\mathrm{red}}(\varphi) = J_1 + J_2\left(\frac{d\psi}{d\varphi}\right)^2 + m_2\left[\left(\frac{dX_2}{d\varphi}\right)^2 + \left(\frac{dY_2}{d\varphi}\right)^2\right] + m_3\left(\frac{dX_3}{d\varphi}\right)^2, \tag{7.587} \]

\[ T = \frac{1}{2}J_{\mathrm{red}}(\varphi)\,\omega^2. \tag{7.588} \]
The numerical computation of the moment of inertia Jred(φ) is made by the successive relations

\[ \psi = \arcsin\frac{r\sin\varphi}{l}, \tag{7.589} \]
\[ \frac{d\psi}{d\varphi} = \frac{r\cos\varphi}{l\cos\psi}, \tag{7.590} \]
\[ \frac{dX_2}{d\varphi} = -r\sin\varphi - \frac{l}{2}\frac{d\psi}{d\varphi}\sin\psi, \tag{7.591} \]
\[ \frac{dY_2}{d\varphi} = \frac{l}{2}\frac{d\psi}{d\varphi}\cos\psi, \tag{7.592} \]
\[ \frac{dX_3}{d\varphi} = -r\sin\varphi - l\frac{d\psi}{d\varphi}\sin\psi. \tag{7.593} \]
Applying the theorem of kinetic energy between the position in which φ = 0, Jred(0) = J0 = J1 + m2r²/3, ω(0) = ω0, and an arbitrary position, we obtain the equality

\[ J_{\mathrm{red}}(\varphi)\,\omega^2 - J_0\omega_0^2 = 2L(\varphi), \tag{7.594} \]

where

\[ L(\varphi) = \int_{0}^{\varphi} M\,d\varphi = \begin{cases} M_0\varphi & \text{if } 0 \le \varphi \le \pi, \\ M_0(2\pi - \varphi) & \text{if } \pi < \varphi \le 2\pi. \end{cases} \tag{7.595} \]

The motion is periodic because L(2π) = L(0) = 0, the period being φd = 2π.
From equation (7.594), we deduce

\[ \omega(\varphi) = \sqrt{\frac{2L(\varphi) + J_0\omega_0^2}{J_{\mathrm{red}}(\varphi)}}, \tag{7.596} \]

while the mean angular velocity is given by

\[ \omega_m = \frac{1}{2\pi}\int_{0}^{2\pi}\sqrt{\frac{2L(\varphi) + J_0\omega_0^2}{J_{\mathrm{red}}(\varphi)}}\,d\varphi. \tag{7.597} \]
From equation (7.597) we obtain the unknown ω0.
We take as starting approximate value ω0 = ωm and, with the notation

\[ F(\omega_0) = \frac{1}{2\pi}\int_{0}^{2\pi}\sqrt{\frac{2L(\varphi) + J_0\omega_0^2}{J_{\mathrm{red}}(\varphi)}}\,d\varphi - \omega_m, \tag{7.598} \]

applying Newton's method, it follows that the correction

\[ \Delta\omega_0 = -\frac{\displaystyle\int_{0}^{2\pi}\sqrt{\frac{2L(\varphi) + J_0\omega_0^2}{J_{\mathrm{red}}(\varphi)}}\,d\varphi - 2\pi\omega_m}{\displaystyle\int_{0}^{2\pi}\frac{J_0\omega_0}{\sqrt{J_{\mathrm{red}}(\varphi)\bigl(2L(\varphi) + J_0\omega_0^2\bigr)}}\,d\varphi}, \tag{7.599} \]

and the iterative process continues until |Δω0| < 0.01.
From the graphical representation of the function ω(φ), we obtain the values ωmin, ωmax, and it follows that

\[ \delta = \frac{\omega_{\max} - \omega_{\min}}{\omega_m}. \tag{7.600} \]

Adding the flywheel of moment of inertia Jv, relation (7.598) becomes

\[ F(\omega_0) = \frac{1}{2\pi}\int_{0}^{2\pi}\sqrt{\frac{2L(\varphi) + (J_0 + J_v)\omega_0^2}{J_{\mathrm{red}}(\varphi) + J_v}}\,d\varphi - \omega_m. \tag{7.601} \]

We consider Jv = J0/10 and we calculate ω0, ωmin, ωmax, δ for the set of values Jv, 2Jv, . . . , comparing δ with δ0/4. The function δ(Jv) is decreasing.
Figure 7.10 Diagram ω = ω(φ).
Figure 7.11 Diagram δ = δ(Jv); dashed lines mark δ0 and δ0/4.
2. Numerical calculation
We obtain the results plotted in the diagrams in Figure 7.10 and Figure 7.11.
It follows that

\[ \omega_{\min} = 67.2455\ \text{rad s}^{-1}, \quad \omega_{\max} = 195.8535\ \text{rad s}^{-1}, \quad \delta_0 = 1.2861, \quad J_v \approx 4.5\times10^{-3}\ \text{kg m}^2. \tag{7.602} \]
FURTHER READING
Acton FS (1990). Numerical Methods that Work. 4th ed. Washington: Mathematical Association of America.
Ackleh AS, Allen EJ, Kearfott RB, Seshaiyer P (2009). Classical and Modern Numerical Analysis: Theory, Methods and Practice. Boca Raton: CRC Press.
Atkinson KE (1989). An Introduction to Numerical Analysis. 2nd ed. New York: John Wiley & Sons, Inc.
Atkinson KE (2003). Elementary Numerical Analysis. 2nd ed. New York: John Wiley & Sons, Inc.
Bakhvalov N (1976). Méthodes Numériques. Moscou: Éditions Mir (in French).
Berbente C, Mitran S, Zancu S (1997). Metode Numerice. Bucureşti: Editura Tehnică (in Romanian).
Burden RL, Faires L (2009). Numerical Analysis. 9th ed. Boston: Brooks/Cole.
Butt R (2009). Introduction to Numerical Analysis Using MATLAB. Boston: Jones and Bartlett Publishers.
Chapra SC (1996). Applied Numerical Methods with MATLAB for Engineers and Scientists. Boston: McGraw-Hill.
Cheney EW, Kincaid DR (1997). Numerical Mathematics and Computing. 6th ed. Belmont: Thomson.
Dahlquist G, Björck Å (1974). Numerical Methods. Englewood Cliffs: Prentice Hall.
Davis JD, Rabinowitz P (2007). Methods of Numerical Integration. 2nd ed. New York: Dover Publications.
Démidovitch B, Maron I (1973). Éléments de Calcul Numérique. Moscou: Éditions Mir (in French).
DiBenedetto E (2010). Classical Mechanics: Theory and Mathematical Modeling. New York: Springer-Verlag.
Epperson JF (2007). An Introduction to Numerical Methods and Analysis. Hoboken: John Wiley & Sons, Inc.
Fung YC, Tong P (2011). Classical and Computational Solid Mechanics. Singapore: World Scientific Publishing.
Gautschi W (1997). Numerical Analysis: An Introduction. Boston: Birkhäuser.
Greenbaum A, Chartier TP (2012). Numerical Methods: Design, Analysis, and Computer Implementation of Algorithms. Princeton: Princeton University Press.
Hamming RW (1987). Numerical Methods for Scientists and Engineers. 2nd ed. New York: Dover Publications.
Hamming RW (2012). Introduction to Applied Numerical Analysis. New York: Dover Publications.
Heinbockel JH (2006). Numerical Methods for Scientific Computing. Victoria: Trafford Publishing.
Hildebrand FB (1987). Introduction to Numerical Analysis. 2nd ed. New York: Dover Publications.
Hoffman JD (1992). Numerical Methods for Engineers and Scientists. New York: McGraw-Hill.
Kharab A, Guenther RB (2011). An Introduction to Numerical Methods: A MATLAB Approach. 3rd ed. Boca Raton: CRC Press.
Kleppner D, Kolenkow RJ (2010). An Introduction to Mechanics. Cambridge: Cambridge University Press.
Kress R (1996). Numerical Analysis. New York: Springer-Verlag.
Krîlov AN (1957). Lecţii de Calcule prin Aproximaţii. Bucureşti: Editura Tehnică (in Romanian).
Kunz KS (1957). Numerical Analysis. New York: McGraw-Hill.
Lange K (2010). Numerical Analysis for Statisticians. 2nd ed. New York: Springer-Verlag.
Lurie AI (2002). Analytical Mechanics. New York: Springer-Verlag.
Mabie HH, Reinholtz CF (1987). Mechanisms and Dynamics of Machinery. 4th ed. New York: John Wiley & Sons, Inc.
Marciuk GI (1983). Metode de Analiză Numerică. Bucureşti: Editura Academiei Române (in Romanian).
Meriam JL, Kraige LG (2012). Engineering Mechanics: Dynamics. Hoboken: John Wiley & Sons, Inc.
Otto SR, Denier JP (2005). An Introduction to Programming and Numerical Methods in MATLAB. London: Springer-Verlag.
Palm WJ III (2007). Mechanical Vibrations. Hoboken: John Wiley & Sons, Inc.
Pandrea N, Popa D (2000). Mecanisme. Teorie şi Aplicaţii CAD. Bucureşti: Editura Tehnică (in Romanian).
Pandrea N (2000). Elemente de Mecanica Solidului în Coordonate Plückeriene. Bucureşti: Editura Academiei Române (in Romanian).
Pandrea N, Stănescu ND (2002). Mecanică. Bucureşti: Editura Didactică şi Pedagogică (in Romanian).
Postolache M (2006). Modelare Numerică. Teorie şi Aplicaţii. Bucureşti: Editura Fair Partners (in Romanian).
Press WH, Teukolsky SA, Vetterling WT, Flannery BP (2007). Numerical Recipes: The Art of Scientific Computing. 3rd ed. Cambridge: Cambridge University Press.
Quarteroni A, Sacco R, Saleri F (2010). Numerical Mathematics. 2nd ed. Berlin: Springer-Verlag.
Ralston A, Rabinowitz P (2001). A First Course in Numerical Analysis. 2nd ed. New York: Dover Publications.
Ridgway Scott L (2011). Numerical Analysis. Princeton: Princeton University Press.
Salvadori MG, Baron ML (1962). Numerical Methods in Engineering. Englewood Cliffs: Prentice Hall.
Sauer T (2011). Numerical Analysis. 2nd ed. London: Pearson.
Simionescu I, Dranga M, Moise V (1995). Metode Numerice în Tehnică. Aplicaţii în FORTRAN. Bucureşti: Editura Tehnică (in Romanian).
Sinha AK (2010). Vibration of Mechanical Systems. Cambridge: Cambridge University Press.
Stănescu ND (2007). Metode Numerice. Bucureşti: Editura Didactică şi Pedagogică (in Romanian).
Stoer J, Bulirsch R (2010). Introduction to Numerical Analysis. 3rd ed. New York: Springer-Verlag.
Süli E, Mayers D (2003). An Introduction to Numerical Analysis. Cambridge: Cambridge University Press.
Udrişte C, Iftode V, Postolache M (1996). Metode Numerice de Calcul. Algoritmi şi Programe Turbo Pascal. Bucureşti: Editura Tehnică (in Romanian).
8
INTEGRATION OF ORDINARY DIFFERENTIAL EQUATIONS AND OF SYSTEMS OF ORDINARY DIFFERENTIAL EQUATIONS

This chapter presents the numerical methods for the integration of ordinary differential equations and of systems of differential equations. We thus present Euler's method, Taylor's method, the Runge–Kutta methods, the multistep methods, and the predictor–corrector methods. Finally, we close the chapter with some applications.

8.1 STATE OF THE PROBLEM

Let us consider the ordinary differential equation

\[ \frac{d\mathbf{x}}{dt} = \mathbf{f}(t, \mathbf{x}), \tag{8.1} \]

where x ∈ R^n, f : R^{n+1} → R^n, and t ∈ I, with I an interval on the real axis. We attach to equation (8.1) the initial condition

\[ \mathbf{x}(t_0) = \mathbf{x}^0. \tag{8.2} \]

Relations (8.1) and (8.2) form the so-called Cauchy problem or problem with initial values, which can be written in detail as

\[ \frac{dx_1}{dt} = f_1(t, x_1, x_2, \ldots, x_n), \quad \frac{dx_2}{dt} = f_2(t, x_1, x_2, \ldots, x_n), \quad \ldots, \quad \frac{dx_n}{dt} = f_n(t, x_1, x_2, \ldots, x_n), \tag{8.3} \]

to which we add

\[ x_1(t_0) = x_1^0, \quad x_2(t_0) = x_2^0, \quad \ldots, \quad x_n(t_0) = x_n^0. \tag{8.4} \]

Equation (8.1), with the initial condition (8.2), is equivalent to the differential system (8.3) with the initial conditions (8.4). It follows that we can thus treat the most general case of Cauchy problems (8.1) and (8.2).
The first question is to find the conditions under which Cauchy problems (8.1) and (8.2) have solutions, and especially unique solutions.

Theorem 8.1 (Of Existence and Uniqueness; Cauchy–Lipschitz¹). Let f : I × G ⊂ R × R^n → R^n be continuous and Lipschitzian. Under these conditions, for any t0 ∈ I and x⁰ ∈ G, fixed, there exists a neighborhood I0 × J0 × G0 ∈ V_{R^{n+2}}(t0, t0, x⁰) (i.e., a neighborhood in R^{n+2} of (t0, t0, x⁰)) with the property that I0 × J0 × G0 ⊂ I × I × G and there exists a unique function α ∈ C⁰(I0 × J0 × G0) with the properties

\[ \frac{d\alpha(t, \tau, \boldsymbol{\xi}^0)}{dt} = \mathbf{f}\bigl(t, \alpha(t, \tau, \boldsymbol{\xi}^0)\bigr) \tag{8.5} \]

for any t ∈ I0, and

\[ \alpha(\tau, \tau, \boldsymbol{\xi}^0) = \boldsymbol{\xi}^0 \tag{8.6} \]

for any (τ, ξ⁰) ∈ J0 × G0.
Definition 8.1 We say that Cauchy problems (8.1) and (8.2) are correctly stated if
(i) there exists a unique solution x = x(t) of problems (8.1) and (8.2);
(ii) there exists ε > 0 with the property that the problem

\[ \frac{d\mathbf{z}}{dt} = \mathbf{f}(t, \mathbf{z}) + \boldsymbol{\delta}(t), \qquad \mathbf{z}(0) = \mathbf{z}^0 = \mathbf{x}^0 + \boldsymbol{\varepsilon}^0 \tag{8.7} \]

admits a unique solution z = z(t) for any ε⁰ with ‖ε⁰‖ < ε and ‖δ(t)‖ < ε;
(iii) there exists a constant K > 0 such that

\[ \|\mathbf{z}(t) - \mathbf{x}(t)\| < K\varepsilon \tag{8.8} \]

for any t ∈ I.

Definition 8.2 Cauchy problem (8.7) is named the perturbed problem associated to Cauchy problems (8.1) and (8.2).

Corollary 8.1 Cauchy problems (8.1) and (8.2) are correctly stated under the conditions of the Cauchy–Lipschitz theorem.

Demonstration. The corollary is obvious, considering ε and ε⁰ such that we do not leave the domain I × G.
If we abandon the Lipschitz condition in the Cauchy–Lipschitz theorem, then we can prove only the existence of the solution of the Cauchy problem.

Theorem 8.2 (Of Existence; Peano²). Let f : I × G ⊂ R × R^n → R^n be continuous in I × G. Under these conditions, for any (t0, x⁰) ∈ I × G there exists a solution of Cauchy problems (8.1) and (8.2).

¹The theorem is also known as the Picard–Lindelöf theorem. It is named after Charles Émile Picard (1856–1941), Ernst Leonard Lindelöf (1870–1946), Rudolf Otto Sigismund Lipschitz (1832–1903), and Augustin-Louis Cauchy (1789–1857).
²Giuseppe Peano (1858–1932) proved this theorem in 1886.
Observation 8.1
(i) The Cauchy–Lipschitz and Peano theorems assure the existence and uniqueness, or only the existence, of the solution of the Cauchy problem, respectively, in a neighborhood of the initial conditions. In general, the solution can be extended without problems to sufficiently long intervals, but there is no general rule in this sense.
(ii) If we consider the ordinary differential equation

\[ \frac{d^n y}{dt^n} = f\left(t, y, \frac{dy}{dt}, \frac{d^2y}{dt^2}, \ldots, \frac{d^{n-1}y}{dt^{n-1}}\right), \tag{8.9} \]

with the conditions

\[ y(0) = y_0, \quad \frac{dy}{dt}(0) = y_0', \quad \ldots, \quad \frac{d^{n-1}y}{dt^{n-1}}(0) = y_0^{(n-1)}, \tag{8.10} \]

then, using the notations

\[ x_1 = y, \quad x_2 = \frac{dy}{dt}, \quad \ldots, \quad x_n = \frac{d^{n-1}y}{dt^{n-1}}, \tag{8.11} \]

we obtain the system

\[ \frac{dx_1}{dt} = x_2, \quad \frac{dx_2}{dt} = x_3, \quad \ldots, \quad \frac{dx_{n-1}}{dt} = x_n, \quad \frac{dx_n}{dt} = f(t, x_1, x_2, \ldots, x_n), \tag{8.12} \]

for which the initial conditions are

\[ x_1(0) = x_1^0 = y_0, \quad x_2(0) = x_2^0 = y_0', \quad \ldots, \quad x_n(0) = x_n^0 = y_0^{(n-1)}. \tag{8.13} \]

It thus follows that equation (8.9) is not a special case and can be considered in the frame of the general Cauchy problems (8.1) and (8.2).
(iii) The Cauchy–Lipschitz and Peano theorems give sufficient conditions for the existence and uniqueness, or only for the existence, of the solution of Cauchy problems (8.1) and (8.2), respectively. Therefore, if the hypotheses of these theorems are not satisfied, it does not mean that the Cauchy problem has no solution or that the solution is not unique.
Let us consider, for instance, the problem of a ball that falls on the surface of the Earth, the restitution coefficient being k. The mechanical problem is simple: if we denote by h0 the initial height of the ball, then it collides with the Earth at the speed v0 = √(2gh0); after the collision, it has the speed v0′ = v1 = kv0 (Fig. 8.1). The new height reached by the ball is h1 = v1²/(2g) = k²h0, and the process continues, the ball jumping less and less each time. During the time when the ball is in the air, the mathematical problem is simple, the equation of motion being

\[ \ddot{x} = -g. \tag{8.14} \]

The inconveniences appear at the collision between the ball and the Earth, when the velocity vector presents discontinuities in both modulus and sense. Obviously, none of the previous theorems can be applied, although the problem has a unique solution.
(iv) As we observed, the Cauchy–Lipschitz or Peano theorems can be applied on some subintervals (the time in which the ball is in the air), the solution being obtained piecewise.
Figure 8.1 The collision between a ball and the surface of the Earth.
8.2 EULER’S METHOD
The goal of the method3 is to obtain an approximation of the solution of the Cauchy problem
dx
dt
= f (t, x), t ∈ [t0, tf ], x(t0) = x0
, (8.15)
considered as a correct stated problem.
Let the interval [t0, tf ] be divided by N + 1 equidistant points (including the limit ones),
ti = t0 + ih, h =
tf − t0
N
, i = 0, 1, . . . , N. (8.16)
We shall assume that the unique solution $x = x(t)$ is at least of class $C^2$ on the interval $[t_0, t_f]$ and we shall use the Taylor theorem
$$x(t_{i+1}) = x(t_i + h) = x(t_i) + h\frac{dx(t_i)}{dt} + \frac{h^2}{2}\frac{d^2 x(\xi_i)}{dt^2}, \qquad (8.17)$$
where $\xi_i \in (t_i, t_{i+1})$. Relation (8.17) holds for all $i = 0, 1, \ldots, N-1$. Writing relation (8.15) for $t = t_i$,
$$\frac{dx(t_i)}{dt} = f(t_i, x(t_i)) \qquad (8.18)$$
and replacing expression (8.18) in equation (8.17), we obtain
$$x(t_{i+1}) = x(t_i + h) = x(t_i) + hf(t_i, x(t_i)) + \frac{h^2}{2}\frac{d^2 x(\xi_i)}{dt^2}. \qquad (8.19)$$
There results the equation
$$\frac{x(t_{i+1}) - x(t_i)}{h} = f(t_i, x(t_i)) + \frac{h}{2}\frac{d^2 x(\xi_i)}{dt^2}. \qquad (8.20)$$
Because $x$ is of class $C^2$ on the interval $[t_0, t_f]$, we deduce that, for a small $h$, the expression $(h/2)\,d^2x(\xi_i)/dt^2$ is small enough to be neglected in relation (8.20); hence, we obtain
$$\frac{x(t_{i+1}) - x(t_i)}{h} \approx f(t_i, x(t_i)). \qquad (8.21)$$
³Leonhard Euler (1707–1783) published this method in Institutionum calculi integralis in 1768–1770.
Denoting
$$w_0 = x(t_0),\quad w_{i+1} = w_i + hf(t_i, w_i), \qquad (8.22)$$
we get
$$w_i \approx x(t_i) \qquad (8.23)$$
for all $i = 0, 1, \ldots, N$.

Definition 8.3 Expression (8.22) is named the equation with finite differences associated with Euler's method.
Observation 8.2 Euler's method can be easily generalized to the n-dimensional case, resulting in the following algorithm (a short sketch in code follows the list):
– input $N$, $t_0$, $t_f$, $x(t_0) = x^{(0)}$, $w^{(0)} = x^{(0)}$;
– calculate $h = (t_f - t_0)/N$;
– for $i$ from 1 to $N$:
  – calculate $t_i = t_0 + ih$;
  – calculate $w^{(i)} = w^{(i-1)} + h\,f(t_{i-1}, w^{(i-1)})$.
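A minimal sketch of this algorithm in Python; the function name and the NumPy conventions are our own illustration, not the book's:

```python
import numpy as np

def euler(f, t0, tf, x0, N):
    """Explicit Euler scheme (8.22): w_{i+1} = w_i + h f(t_i, w_i).

    Works for scalar or vector problems; f must return something
    of the same shape as w.
    """
    h = (tf - t0) / N
    t = t0 + h * np.arange(N + 1)
    w = np.empty((N + 1,) + np.shape(x0))
    w[0] = x0
    for i in range(N):
        w[i + 1] = w[i] + h * np.asarray(f(t[i], w[i]))
    return t, w
```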
Lemma 8.1 Let $x \in \mathbb{R}$, $x \ge -1$, and let $m \in \mathbb{N}^*$ be arbitrary. Under these conditions, the inequality
$$0 \le (1 + x)^m \le e^{mx} \qquad (8.24)$$
holds.

Demonstration. The first inequality in (8.24) is evident. For the second one, we proceed by induction. For $m = 1$, the relation becomes (the case $m = 0$ being evident)
$$1 + x \le e^x. \qquad (8.25)$$
Let us consider the function
$$g : [-1, \infty) \to \mathbb{R},\quad g(x) = e^x - x - 1, \qquad (8.26)$$
for which
$$g'(x) = e^x - 1,\quad g''(x) = e^x > 0. \qquad (8.27)$$
The equation $g'(x) = 0$ has the unique solution $x = 0$, which is a point of minimum, and $g(0) = 0$, so that relation (8.25) is true for any $x \in [-1, \infty)$. Let us assume that expression (8.24) is true for $m \in \mathbb{N}$ and let us prove it for $m + 1$. We have
$$(1 + x)^{m+1} = (1 + x)(1 + x)^m \le (1 + x)e^{mx} \le e^x e^{mx} = e^{(m+1)x}. \qquad (8.28)$$
Taking into account the principle of mathematical induction, it follows that equation (8.24) is true for any $m \in \mathbb{N}$.
Lemma 8.2 If $m$ and $n$ are two real positive numbers and $\{a_i\}_{i=\overline{0,k}}$ is a finite set of real numbers with $a_0 \ge 0$, which satisfies the relation
$$a_{i+1} \le (1 + m)a_i + n,\quad i = \overline{0, k-1}, \qquad (8.29)$$
then
$$a_{i+1} \le e^{(i+1)m}\left(\frac{n}{m} + a_0\right) - \frac{n}{m},\quad i = \overline{0, k-1}. \qquad (8.30)$$
Demonstration. We shall use induction on $i$. For $i = 0$, we have
$$a_1 \le (1 + m)a_0 + n; \qquad (8.31)$$
applying Lemma 8.1, we obtain
$$a_1 \le e^m a_0 + n. \qquad (8.32)$$
We shall prove that
$$e^m a_0 + n \le e^m\left(\frac{n}{m} + a_0\right) - \frac{n}{m}. \qquad (8.33)$$
The last relation reads equivalently in the form
$$n \le e^m\frac{n}{m} - \frac{n}{m},\quad 1 + m \le e^m, \qquad (8.34)$$
obviously true from Lemma 8.1. Let us suppose that the affirmation is true for $i$ and let us prove it for $i + 1$. We can write
$$a_{i+1} \le (1 + m)a_i + n \le (1 + m)e^{im}\left(\frac{n}{m} + a_0\right) - \frac{n}{m}. \qquad (8.35)$$
We shall prove that
$$(1 + m)e^{im}\left(\frac{n}{m} + a_0\right) - \frac{n}{m} \le e^{(i+1)m}\left(\frac{n}{m} + a_0\right) - \frac{n}{m}, \qquad (8.36)$$
which reduces to $1 + m \le e^m$, obviously true. The lemma is completely proved.
Theorem 8.3 (Determination of the Error in Euler's Method). Let $x(t)$ be the unique solution of Cauchy problem (8.15) and $w_i$, $i = \overline{0, N}$, the approximations of the values of the solution obtained using Euler's method for a certain $N > 0$, $N \in \mathbb{N}$. If $f$ is defined in a convex set $D$, if it is Lipschitzian in $D$ with constant $L$, and if there is $M \in \mathbb{R}$, $M > 0$, such that
$$\left|\frac{d^2 x}{dt^2}\right| \le M \quad \forall t \in [t_0, t_f], \qquad (8.37)$$
then
$$|x(t_i) - w_i| \le \frac{hM}{2L}\left[e^{L(t_i - t_0)} - 1\right] \qquad (8.38)$$
for $i = \overline{0, N}$.

Demonstration. For $i = 0$ we obtain
$$0 = |x(t_0) - w_0| \le \frac{hM}{2L}\left(e^{L\cdot 0} - 1\right) = 0 \qquad (8.39)$$
and the theorem is true.
On the other hand,
$$x(t_{i+1}) = x(t_i) + hf(t_i, x(t_i)) + \frac{h^2}{2}\frac{d^2 x(t_i + \theta_i h)}{dt^2}, \qquad (8.40)$$
where $\theta_i \in (0, 1)$, $i = \overline{0, N-1}$, and
$$w_{i+1} = w_i + hf(t_i, w_i),\quad i = \overline{0, N-1}. \qquad (8.41)$$
It successively results in
$$|x(t_{i+1}) - w_{i+1}| = \left|x(t_i) - w_i + hf(t_i, x(t_i)) - hf(t_i, w_i) + \frac{h^2}{2}\frac{d^2 x(t_i + \theta_i h)}{dt^2}\right|$$
$$\le |x(t_i) - w_i| + h|f(t_i, x(t_i)) - f(t_i, w_i)| + \frac{h^2}{2}\left|\frac{d^2 x(t_i + \theta_i h)}{dt^2}\right| \le |x(t_i) - w_i|(1 + hL) + \frac{h^2 M}{2}. \qquad (8.42)$$
Now applying Lemma 8.2 with $a_j = |x(t_j) - w_j|$, $j = \overline{0, N}$, $m = hL$, $n = h^2 M/2$, expression (8.42) leads to
$$|x(t_{i+1}) - w_{i+1}| \le e^{(i+1)hL}\left(|x(t_0) - w_0| + \frac{hM}{2L}\right) - \frac{hM}{2L}. \qquad (8.43)$$
Taking into account that $x(t_0) = w_0 = x_0$ and $(i+1)h = t_{i+1} - t_0$, relation (8.43) leads us to expression (8.38), which we had to prove.
Observation 8.3 Relation (8.38) shows that the bound of the error depends linearly on the step size $h$. Consequently, a better approximation of the solution is obtained by decreasing the division step.
8.3 TAYLOR METHOD

We shall consider the Cauchy problem
$$\frac{dx}{dt} = f(t, x(t)),\quad t \in [t_0, t_f],\quad x(t_0) = x_0, \qquad (8.44)$$
regarded as a correctly stated one, and we shall assume that the function $x = x(t)$, the solution of the problem, is at least of class $C^{n+1}$ on the interval $[t_0, t_f]$.
Using the expansion of the function $x = x(t)$ into a Taylor series, we can write the relation
$$x(t_{i+1}) = x(t_i) + h\frac{dx(t_i)}{dt} + \frac{h^2}{2}\frac{d^2 x(t_i)}{dt^2} + \cdots + \frac{h^n}{n!}\frac{d^n x(t_i)}{dt^n} + \frac{h^{n+1}}{(n+1)!}\frac{d^{n+1} x(\xi_i)}{dt^{n+1}}, \qquad (8.45)$$
in which $\xi_i \in (t_i, t_{i+1})$ is an intermediate point between $t_i$ and $t_{i+1}$, $t_i$ are the nodes of an equidistant division of the interval $[t_0, t_f]$, $h = (t_f - t_0)/N$ is the step of the division, $t_i = t_0 + ih$, $i = \overline{1, N}$, and $N + 1$ is the number of points of the division.
On the other hand, we have
$$\frac{dx}{dt} = f(t, x(t)), \qquad (8.46)$$
$$\frac{d^2 x}{dt^2} = \frac{\partial f(t, x(t))}{\partial t} + \frac{\partial f(t, x(t))}{\partial x}\frac{dx}{dt} = \frac{\partial f(t, x(t))}{\partial t} + \frac{\partial f(t, x(t))}{\partial x}f(t, x(t)) = \frac{df(t, x(t))}{dt} = f'(t, x(t)), \qquad (8.47)$$
$$\frac{d^3 x}{dt^3} = \frac{d}{dt}\left(\frac{df(t, x(t))}{dt}\right) = \frac{\partial f'(t, x(t))}{\partial t} + \frac{\partial f'(t, x(t))}{\partial x}\frac{dx(t)}{dt} = \frac{\partial f'(t, x(t))}{\partial t} + \frac{\partial f'(t, x(t))}{\partial x}f(t, x(t)) = \frac{d^2 f(t, x(t))}{dt^2} = f''(t, x(t)), \qquad (8.48)$$
and, in general,
$$\frac{d^k x(t)}{dt^k} = \frac{dx^{(k-1)}(t)}{dt} = \frac{df^{(k-2)}(t, x(t))}{dt} = f^{(k-1)}(t, x(t)). \qquad (8.49)$$
Replacing these derivatives in equation (8.45), it follows that
$$x(t_{i+1}) = x(t_i) + hf(t_i, x(t_i)) + \frac{h^2}{2}f'(t_i, x(t_i)) + \cdots + \frac{h^n}{n!}f^{(n-1)}(t_i, x(t_i)) + \frac{h^{n+1}}{(n+1)!}f^{(n)}(\xi_i, x(\xi_i)). \qquad (8.50)$$
Dropping the remainder, we obtain the equation with finite differences
$$w_0 = x(t_0) = x_0,\quad w_{i+1} = w_i + hT^{(n)}(t_i, w_i),\quad i = \overline{0, N-1}, \qquad (8.51)$$
where
$$T^{(n)}(t_i, w_i) = f(t_i, w_i) + \frac{h}{2}f'(t_i, w_i) + \cdots + \frac{h^{n-1}}{n!}f^{(n-1)}(t_i, w_i). \qquad (8.52)$$

Definition 8.4 Relation (8.51) is called the equation with differences associated with the nth-order Taylor method.

Observation 8.4 Euler's method is in fact the first-order Taylor method.
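For illustration, a sketch of the second-order Taylor method (8.51)-(8.52) in Python; it assumes the user supplies the total derivative $f'(t, x)$ analytically, which is the method's main practical burden:

```python
import numpy as np

def taylor2(f, fprime, t0, tf, x0, N):
    """Second-order Taylor method: w_{i+1} = w_i + h T^(2)(t_i, w_i).

    `fprime` must return the total derivative f'(t, x) =
    df/dt + (df/dx) f(t, x), supplied by the user.
    """
    h = (tf - t0) / N
    t = t0 + h * np.arange(N + 1)
    w = np.empty(N + 1)
    w[0] = x0
    for i in range(N):
        T2 = f(t[i], w[i]) + 0.5 * h * fprime(t[i], w[i])
        w[i + 1] = w[i] + h * T2
    return t, w
```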
8.4 THE RUNGE–KUTTA METHODS

The Runge–Kutta method⁴ implies obtaining the values $c_1$, $\alpha_1$, and $\beta_1$ such that $c_1 f(t + \alpha_1, x + \beta_1)$ approximates
$$T^{(2)}(t, x) = f(t, x) + \frac{h}{2}f'(t, x)$$
with an error at most equal to $O(h^2)$, which is the truncation error for the second-order Taylor method.
On the other hand,
$$f'(t, x(t)) = \frac{df}{dt}(t, x(t)) = \frac{\partial f}{\partial t}(t, x(t)) + \frac{\partial f}{\partial x}(t, x(t))\,x'(t), \qquad (8.53)$$
where
$$x'(t) = f(t, x(t)), \qquad (8.54)$$
⁴The methods were developed by Carl David Tolmé Runge (1856–1927) and Martin Wilhelm Kutta (1867–1944) in 1901.
hence
$$T^{(2)}(t, x(t)) = f(t, x(t)) + \frac{h}{2}\frac{\partial f}{\partial t}(t, x(t)) + \frac{h}{2}\frac{\partial f}{\partial x}(t, x(t))\,f(t, x(t)). \qquad (8.55)$$
Expanding $c_1 f(t + \alpha_1, x + \beta_1)$ into a Taylor series around $(t, x)$, we obtain
$$c_1 f(t + \alpha_1, x + \beta_1) = c_1 f(t, x) + c_1\alpha_1\frac{\partial f}{\partial t}(t, x) + c_1\beta_1\frac{\partial f}{\partial x}(t, x) + c_1 R_2(t + \alpha_1, x + \beta_1), \qquad (8.56)$$
where the remainder $R_2(t + \alpha_1, x + \beta_1)$ reads
$$R_2(t + \alpha_1, x + \beta_1) = \frac{\alpha_1^2}{2}\frac{\partial^2 f}{\partial t^2}(\tau, \xi) + \alpha_1\beta_1\frac{\partial^2 f}{\partial t\,\partial x}(\tau, \xi) + \frac{\beta_1^2}{2}\frac{\partial^2 f}{\partial x^2}(\tau, \xi). \qquad (8.57)$$
Identifying the coefficients of $f$ and of its derivatives in formulae (8.55) and (8.56), we find the system
$$c_1 = 1,\quad c_1\alpha_1 = \frac{h}{2},\quad c_1\beta_1 = \frac{h}{2}f(t, x), \qquad (8.58)$$
the solution of which is
$$c_1 = 1,\quad \alpha_1 = \frac{h}{2},\quad \beta_1 = \frac{h}{2}f(t, x). \qquad (8.59)$$
Therefore, it follows that
$$T^{(2)}(t, x) = f\left(t + \frac{h}{2},\; x + \frac{h}{2}f(t, x)\right) - R_2\left(t + \frac{h}{2},\; x + \frac{h}{2}f(t, x)\right), \qquad (8.60)$$
where
$$R_2\left(t + \frac{h}{2},\; x + \frac{h}{2}f(t, x)\right) = \frac{h^2}{8}\frac{\partial^2 f}{\partial t^2}(\tau, \xi) + \frac{h^2}{4}f(t, x)\frac{\partial^2 f}{\partial t\,\partial x}(\tau, \xi) + \frac{h^2}{8}[f(t, x)]^2\frac{\partial^2 f}{\partial x^2}(\tau, \xi). \qquad (8.61)$$
Observation 8.5 If all second-order partial derivatives of $f$ are bounded, then $R_2[t + (h/2), x + (h/2)f(t, x)]$ is of order $O(h^2)$.

Definition 8.5 The method with differences obtained from Taylor's method by replacing $T^{(2)}(t, x)$ in this way is called the Runge–Kutta method of the mean point.
The mean point method is given by the relations
$$w_0 = x(t_0) = x_0,\quad w_{i+1} = w_i + hf\left(t_i + \frac{h}{2},\; w_i + \frac{h}{2}f(t_i, w_i)\right),\quad i = \overline{0, N-1}. \qquad (8.62)$$
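A possible Python rendering of the mean point method (8.62), as a sketch:

```python
def rk_midpoint(f, t0, tf, x0, N):
    """Runge-Kutta method of the mean point, equation (8.62)."""
    h = (tf - t0) / N
    t, w = t0, x0
    values = [(t, w)]
    for _ in range(N):
        # evaluate f at the midpoint of the step
        w = w + h * f(t + h / 2, w + (h / 2) * f(t, w))
        t = t + h
        values.append((t, w))
    return values
```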
Definition 8.6
(i) If we approximate
$$T^{(2)}(t, x(t)) = f(t, x(t)) + \frac{h}{2}f'(t, x(t)) + \frac{h^2}{6}f''(t, x(t)) \qquad (8.63)$$
by an expression of the form
$$T^{(2)}(t, x(t)) \approx c_1 f(t, x(t)) + c_2 f(t + \alpha_2,\, x + \delta_2 f(t, x(t))) \qquad (8.64)$$
so that the error is of order $O(h^2)$, and if we choose the parameters
$$c_1 = c_2 = \frac{1}{2},\quad \alpha_2 = \delta_2 = h, \qquad (8.65)$$
then we obtain the modified Euler method, for which the equation with differences reads
$$w_0 = x(t_0) = x_0,\quad w_{i+1} = w_i + \frac{h}{2}\left[f(t_i, w_i) + f(t_{i+1},\, w_i + hf(t_i, w_i))\right],\quad i = \overline{0, N-1}. \qquad (8.66)$$
(ii) Under the same conditions, choosing
$$c_1 = \frac{1}{4},\quad c_2 = \frac{3}{4},\quad \alpha_2 = \delta_2 = \frac{2}{3}h, \qquad (8.67)$$
we obtain Heun's method,⁵ for which the equation with differences is of the form
$$w_0 = x(t_0) = x_0,\quad w_{i+1} = w_i + \frac{h}{4}\left[f(t_i, w_i) + 3f\left(t_i + \frac{2}{3}h,\; w_i + \frac{2}{3}hf(t_i, w_i)\right)\right],\quad i = \overline{0, N-1}. \qquad (8.68)$$
Analogously, the higher-order Runge–Kutta formulae are established:

– the third-order Runge–Kutta method, for which the equation with differences is
$$w_0 = x(t_0) = x_0,\quad K_1 = hf(t_i, w_i),\quad K_2 = hf\left(t_i + \frac{h}{2}, w_i + \frac{K_1}{2}\right),$$
$$K_3 = hf(t_i + h,\, w_i + 2K_2 - K_1),\quad w_{i+1} = w_i + \frac{1}{6}(K_1 + 4K_2 + K_3); \qquad (8.69)$$

– the fourth-order Runge–Kutta method, for which the equation with differences reads (see the sketch after this list)
$$w_0 = x(t_0) = x_0,\quad K_1 = hf(t_i, w_i),\quad K_2 = hf\left(t_i + \frac{h}{2}, w_i + \frac{K_1}{2}\right),$$
$$K_3 = hf\left(t_i + \frac{h}{2}, w_i + \frac{K_2}{2}\right),\quad K_4 = hf(t_i + h,\, w_i + K_3),$$
$$w_{i+1} = w_i + \frac{1}{6}(K_1 + 2K_2 + 2K_3 + K_4); \qquad (8.70)$$

– the sixth-order Runge–Kutta method, for which the equation with differences has the form
$$w_0 = x(t_0) = x_0,\quad K_1 = hf(t_i, w_i),\quad K_2 = hf\left(t_i + \frac{h}{3}, w_i + \frac{K_1}{3}\right),$$
$$K_3 = hf\left(t_i + \frac{2h}{5},\; w_i + \frac{1}{25}(6K_2 + 4K_1)\right),\quad K_4 = hf\left(t_i + h,\; w_i + \frac{1}{4}(15K_3 - 12K_2 + K_1)\right),$$
$$K_5 = hf\left(t_i + \frac{2h}{3},\; w_i + \frac{2}{81}(4K_4 - 25K_3 + 45K_2 + 3K_1)\right),$$
$$K_6 = hf\left(t_i + \frac{4h}{5},\; w_i + \frac{1}{75}(8K_4 + 10K_3 + 36K_2 + 6K_1)\right),$$
$$w_{i+1} = w_i + \frac{1}{192}(23K_1 + 125K_3 - 81K_5 + 125K_6). \qquad (8.71)$$
⁵After Karl L. W. M. Heun (1859–1929), who published it in 1900.
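A compact Python sketch of the classical fourth-order scheme (8.70); the names are ours:

```python
def rk4(f, t0, tf, x0, N):
    """Classical fourth-order Runge-Kutta scheme, equation (8.70)."""
    h = (tf - t0) / N
    t, w = t0, x0
    values = [(t, w)]
    for _ in range(N):
        K1 = h * f(t, w)
        K2 = h * f(t + h / 2, w + K1 / 2)
        K3 = h * f(t + h / 2, w + K2 / 2)
        K4 = h * f(t + h, w + K3)
        w = w + (K1 + 2 * K2 + 2 * K3 + K4) / 6
        t = t + h
        values.append((t, w))
    return values
```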
Definition 8.7 The local error is the absolute value of the difference between the approximation at a division point and the exact solution, at the same point, of the Cauchy problem that has as initial value the approximation at the previous division point.

Observation 8.6 If $y(t)$ is the solution of the Cauchy problem
$$\dot{y}(t) = f(t, y),\quad t_0 \le t \le t_f,\quad y(t_0) = w_i, \qquad (8.72)$$
where $w_i$ is the approximate value obtained using the method with differences, then the local error at the point $t_{i+1}$ has the expression
$$\varepsilon_{i+1}(h) = |y(t_{i+1}) - w_{i+1}|. \qquad (8.73)$$
In various problems, we can apply methods that also exert some control on the error. One of these methods is the Runge–Kutta–Fehlberg method⁶, for which the algorithm is the following (a sketch in code follows the list):
– input $\varepsilon > 0$, $t_0$, $w_0 = x(t_0) = x_0$, $h = \varepsilon^{1/4}$, $t_f$;
– $i = 0$;
– while $t_i + h \le t_f$:
  – calculate
$$K_1 = hf(t_i, w_i),\quad K_2 = hf\left(t_i + \frac{h}{4},\; w_i + \frac{1}{4}K_1\right),\quad K_3 = hf\left(t_i + \frac{3h}{8},\; w_i + \frac{3}{32}K_1 + \frac{9}{32}K_2\right),$$
$$K_4 = hf\left(t_i + \frac{12}{13}h,\; w_i + \frac{1932}{2197}K_1 - \frac{7200}{2197}K_2 + \frac{7296}{2197}K_3\right),$$
$$K_5 = hf\left(t_i + h,\; w_i + \frac{439}{216}K_1 - 8K_2 + \frac{3680}{513}K_3 - \frac{845}{4104}K_4\right),$$
$$K_6 = hf\left(t_i + \frac{h}{2},\; w_i - \frac{8}{27}K_1 + 2K_2 - \frac{3544}{2565}K_3 + \frac{1859}{4104}K_4 - \frac{11}{40}K_5\right);$$
  – calculate the fourth- and fifth-order estimates
$$w_{i+1} = w_i + \frac{25}{216}K_1 + \frac{1408}{2565}K_3 + \frac{2197}{4104}K_4 - \frac{1}{5}K_5,$$
$$\tilde{w}_{i+1} = w_i + \frac{16}{135}K_1 + \frac{6656}{12825}K_3 + \frac{28561}{56430}K_4 - \frac{9}{50}K_5 + \frac{2}{55}K_6;$$
  – calculate
$$r_{i+1} = \frac{1}{h}\left|\tilde{w}_{i+1} - w_{i+1}\right|,\quad \delta = 0.84\left(\frac{\varepsilon}{r_{i+1}}\right)^{1/4};$$
  – if $\delta \le 0.1$, then $h := 0.1h$;
  – if $\delta \ge 4$, then $h := 4h$;
  – if $0.1 < \delta < 4$, then $h := \delta h$;
  – if $r_{i+1} \le \varepsilon$, then $i := i + 1$ (the step is accepted).

In this case, $w_i$ approximates $x(t_i)$ with a local error of at most $\varepsilon$.
⁶The algorithm was presented by Erwin Fehlberg (1911–1990) in Classical fifth-, sixth-, seventh-, and eighth-order Runge–Kutta formulae with stepsize control in 1968.
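A sketch of the adaptive algorithm above in Python; the guard for $r = 0$ is our own addition, while the clipping of $\delta$ to $[0.1, 4]$ follows the three update rules of the list:

```python
def rkf45(f, t0, tf, x0, eps):
    """Runge-Kutta-Fehlberg scheme with step-size control; the initial
    step h = eps**(1/4) follows the stated inputs."""
    t, w, h = t0, x0, eps ** 0.25
    accepted = [(t, w)]
    while t + h <= tf:
        K1 = h * f(t, w)
        K2 = h * f(t + h / 4, w + K1 / 4)
        K3 = h * f(t + 3 * h / 8, w + 3 * K1 / 32 + 9 * K2 / 32)
        K4 = h * f(t + 12 * h / 13, w + 1932 / 2197 * K1
                   - 7200 / 2197 * K2 + 7296 / 2197 * K3)
        K5 = h * f(t + h, w + 439 / 216 * K1 - 8 * K2
                   + 3680 / 513 * K3 - 845 / 4104 * K4)
        K6 = h * f(t + h / 2, w - 8 / 27 * K1 + 2 * K2
                   - 3544 / 2565 * K3 + 1859 / 4104 * K4 - 11 / 40 * K5)
        w4 = w + 25 / 216 * K1 + 1408 / 2565 * K3 + 2197 / 4104 * K4 - K5 / 5
        w5 = (w + 16 / 135 * K1 + 6656 / 12825 * K3 + 28561 / 56430 * K4
              - 9 / 50 * K5 + 2 / 55 * K6)
        r = abs(w5 - w4) / h                      # local error estimate
        if r <= eps:                              # accept the step
            t, w = t + h, w4
            accepted.append((t, w))
        delta = 0.84 * (eps / r) ** 0.25 if r > 0 else 4.0
        h *= min(max(delta, 0.1), 4.0)            # the three update rules
    return accepted
```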
8.5 MULTISTEP METHODS

The methods presented so far required only the knowledge of the value $x_i$ at the point $t_i$ to determine numerically the value $x_{i+1}$ at the point $t_{i+1}$; hence, it was necessary to go back only one step, and we speak about one-step methods. The following methods use approximations of the solution at several previous points to determine the approximate solution at the current division point.

Definition 8.8 A multistep method for the determination of the approximation $w_{i+1}$ of the solution of the Cauchy problem
$$\dot{x}(t) = f(t, x),\quad t_0 \le t \le t_f,\quad x(t_0) = x_0, \qquad (8.74)$$
at the division point $t_{i+1}$ is a method that uses equations with finite differences which can be represented in the form
$$w_{i+1} = a_{m-1}w_i + a_{m-2}w_{i-1} + \cdots + a_0 w_{i+1-m}$$
$$+\; h\left[b_m f(t_{i+1}, w_{i+1}) + b_{m-1}f(t_i, w_i) + \cdots + b_0 f(t_{i+1-m}, w_{i+1-m})\right],\quad i = m-1, \ldots, N-1, \qquad (8.75)$$
where $N$ is the number of division steps of the interval $[t_0, t_f]$, $h = (t_f - t_0)/N$ is the division step of the same interval, $m > 1$, and, in addition,
$$w_0 = x(t_0) = x_0,\quad w_1 = x(t_1) = x_1,\quad \ldots,\quad w_{m-1} = x(t_{m-1}) = x_{m-1}. \qquad (8.76)$$
Definition 8.9
(i) If $b_m = 0$, then the method is called explicit or open, because relation (8.75) gives an explicit equation for $w_{i+1}$.
(ii) If $b_m \ne 0$, then the method is called implicit or closed, because $w_{i+1}$ appears in both members of expression (8.75).

Observation 8.7 The start values $w_0, w_1, \ldots, w_{m-1}$ must be specified according to formula (8.76); that is, they must be the exact values of the function $x = x(t)$ at the points $t_0, t_1 = t_0 + h, \ldots, t_{m-1} = t_0 + (m-1)h$, or they can be determined using a one-step method starting from the value $w_0 = x(t_0) = x_0$.
The most used technique to obtain multistep methods starts from the evident equality
$$x(t_{i+1}) = x(t_i) + \int_{t_i}^{t_{i+1}} f(t, x(t))\,dt. \qquad (8.77)$$
Because the integral on the right-hand side of relation (8.77) cannot be calculated, the solution $x(t)$ being unknown, we replace $f(t, x(t))$ by an interpolation polynomial $P(t)$ determined as a function of the known values $(t_0, w_0), (t_1, w_1), \ldots, (t_i, w_i)$, where $w_j = x(t_j)$, $j = \overline{0, i}$. Relation (8.77) now becomes
$$x(t_{i+1}) \approx x(t_i) + \int_{t_i}^{t_{i+1}} P(t)\,dt. \qquad (8.78)$$
8.6 ADAMS'S METHOD

In the equation⁷
$$\frac{dx}{dt} = f(t, x), \qquad (8.79)$$
we replace the function $f(t, x)$ by the first five terms of Newton's polynomial
$$N_5(q) = f(t_0, x_0) + \frac{q}{1!}\Delta f(t_0, x_0) + \frac{q(q-1)}{2!}\Delta^2 f(t_0, x_0) + \frac{q(q-1)(q-2)}{3!}\Delta^3 f(t_0, x_0) + \frac{q(q-1)(q-2)(q-3)}{4!}\Delta^4 f(t_0, x_0), \qquad (8.80)$$
in which $q = (t - t_0)/h$, $h = (t_f - t_0)/N$, $N$ being the number of division intervals of $[t_0, t_f]$, $t = t_0 + qh$, $dt = h\,dq$. Integrating, it follows that
$$x_1 - x_0 = \int_{t_0}^{t_0+h} f(t, x)\,dt = h\int_0^1 f(t, x)\,dq = h\left[f(t_0, x_0) + \frac{1}{2}\Delta f(t_0, x_0) - \frac{1}{12}\Delta^2 f(t_0, x_0) + \frac{1}{24}\Delta^3 f(t_0, x_0) - \frac{19}{720}\Delta^4 f(t_0, x_0)\right], \qquad (8.81)$$
$$x_2 - x_0 = \int_{t_0}^{t_0+2h} f(t, x)\,dt = h\int_0^2 f(t, x)\,dq = h\left[2f(t_0, x_0) + 2\Delta f(t_0, x_0) + \frac{1}{3}\Delta^2 f(t_0, x_0) - \frac{1}{90}\Delta^4 f(t_0, x_0)\right], \qquad (8.82)$$
$$x_3 - x_0 = \int_{t_0}^{t_0+3h} f(t, x)\,dt = h\int_0^3 f(t, x)\,dq = h\left[3f(t_0, x_0) + \frac{9}{2}\Delta f(t_0, x_0) + \frac{9}{4}\Delta^2 f(t_0, x_0) + \frac{3}{8}\Delta^3 f(t_0, x_0) - \frac{3}{80}\Delta^4 f(t_0, x_0)\right], \qquad (8.83)$$
$$x_4 - x_0 = \int_{t_0}^{t_0+4h} f(t, x)\,dt = h\int_0^4 f(t, x)\,dq = h\left[4f(t_0, x_0) + 8\Delta f(t_0, x_0) + \frac{20}{3}\Delta^2 f(t_0, x_0) + \frac{8}{3}\Delta^3 f(t_0, x_0) + \frac{14}{45}\Delta^4 f(t_0, x_0)\right]. \qquad (8.84)$$
The calculation involves successive approximations:
– approximation 1:
$$x_1^{(1)} = x_0 + hf(t_0, x_0),\quad \Delta f(t_0, x_0) = f(t_1, x_1^{(1)}) - f(t_0, x_0); \qquad (8.85)$$
⁷The method was presented by John Couch Adams (1819–1892). It appears for the first time in a letter written to F. Bashforth in 1855.
– approximation 2:
$$x_1^{(2)} = x_0 + hf(t_0, x_0) + \frac{1}{2}h\,\Delta f(t_0, x_0),\quad x_2^{(1)} = x_0 + 2hf(t_0, x_0) + 2h\,\Delta f(t_0, x_0),$$
$$\Delta f(t_0, x_0) = f(t_1, x_1^{(2)}) - f(t_0, x_0),\quad \Delta^2 f(t_0, x_0) = f(t_2, x_2^{(1)}) - 2f(t_1, x_1^{(2)}) + f(t_0, x_0); \qquad (8.86)$$
– approximation 3:
$$x_1^{(3)} = x_0 + hf(t_0, x_0) + \frac{1}{2}h\,\Delta f(t_0, x_0) - \frac{1}{12}h\,\Delta^2 f(t_0, x_0),$$
$$x_2^{(2)} = x_0 + 2hf(t_0, x_0) + 2h\,\Delta f(t_0, x_0) + \frac{1}{3}h\,\Delta^2 f(t_0, x_0),$$
$$x_3^{(1)} = x_0 + 3hf(t_0, x_0) + \frac{9}{2}h\,\Delta f(t_0, x_0) + \frac{9}{4}h\,\Delta^2 f(t_0, x_0),$$
$$\Delta f(t_0, x_0) = f(t_1, x_1^{(3)}) - f(t_0, x_0),\quad \Delta^2 f(t_0, x_0) = f(t_2, x_2^{(2)}) - 2f(t_1, x_1^{(3)}) + f(t_0, x_0),$$
$$\Delta^3 f(t_0, x_0) = f(t_3, x_3^{(1)}) - 3f(t_2, x_2^{(2)}) + 3f(t_1, x_1^{(3)}) - f(t_0, x_0); \qquad (8.87)$$
– approximation 4:
$$x_1^{(4)} = x_0 + hf(t_0, x_0) + \frac{1}{2}h\,\Delta f(t_0, x_0) - \frac{1}{12}h\,\Delta^2 f(t_0, x_0) + \frac{1}{24}h\,\Delta^3 f(t_0, x_0),$$
$$x_2^{(3)} = x_0 + 2hf(t_0, x_0) + 2h\,\Delta f(t_0, x_0) + \frac{1}{3}h\,\Delta^2 f(t_0, x_0),$$
$$x_3^{(2)} = x_0 + 3hf(t_0, x_0) + \frac{9}{2}h\,\Delta f(t_0, x_0) + \frac{9}{4}h\,\Delta^2 f(t_0, x_0) + \frac{3}{8}h\,\Delta^3 f(t_0, x_0),$$
$$x_4^{(1)} = x_0 + 4hf(t_0, x_0) + 8h\,\Delta f(t_0, x_0) + \frac{20}{3}h\,\Delta^2 f(t_0, x_0) + \frac{8}{3}h\,\Delta^3 f(t_0, x_0),$$
$$\Delta f(t_0, x_0) = f(t_1, x_1^{(4)}) - f(t_0, x_0),\quad \Delta^2 f(t_0, x_0) = f(t_2, x_2^{(3)}) - 2f(t_1, x_1^{(4)}) + f(t_0, x_0),$$
$$\Delta^3 f(t_0, x_0) = f(t_3, x_3^{(2)}) - 3f(t_2, x_2^{(3)}) + 3f(t_1, x_1^{(4)}) - f(t_0, x_0),$$
$$\Delta^4 f(t_0, x_0) = f(t_4, x_4^{(1)}) - 4f(t_3, x_3^{(2)}) + 6f(t_2, x_2^{(3)}) - 4f(t_1, x_1^{(4)}) + f(t_0, x_0); \qquad (8.88)$$
– approximation 5:
$$x_1^{(5)} = x_0 + hf(t_0, x_0) + \frac{1}{2}h\,\Delta f(t_0, x_0) - \frac{1}{12}h\,\Delta^2 f(t_0, x_0) + \frac{1}{24}h\,\Delta^3 f(t_0, x_0) - \frac{19}{720}h\,\Delta^4 f(t_0, x_0),$$
$$x_2^{(4)} = x_0 + 2hf(t_0, x_0) + 2h\,\Delta f(t_0, x_0) + \frac{1}{3}h\,\Delta^2 f(t_0, x_0) - \frac{1}{90}h\,\Delta^4 f(t_0, x_0),$$
$$x_3^{(3)} = x_0 + 3hf(t_0, x_0) + \frac{9}{2}h\,\Delta f(t_0, x_0) + \frac{9}{4}h\,\Delta^2 f(t_0, x_0) + \frac{3}{8}h\,\Delta^3 f(t_0, x_0) - \frac{3}{80}h\,\Delta^4 f(t_0, x_0),$$
$$x_4^{(2)} = x_0 + 4hf(t_0, x_0) + 8h\,\Delta f(t_0, x_0) + \frac{20}{3}h\,\Delta^2 f(t_0, x_0) + \frac{8}{3}h\,\Delta^3 f(t_0, x_0) + \frac{14}{45}h\,\Delta^4 f(t_0, x_0),$$
$$\Delta f(t_0, x_0) = f(t_1, x_1^{(5)}) - f(t_0, x_0),\quad \Delta^2 f(t_0, x_0) = f(t_2, x_2^{(4)}) - 2f(t_1, x_1^{(5)}) + f(t_0, x_0),$$
$$\Delta^3 f(t_0, x_0) = f(t_3, x_3^{(3)}) - 3f(t_2, x_2^{(4)}) + 3f(t_1, x_1^{(5)}) - f(t_0, x_0),$$
$$\Delta^4 f(t_0, x_0) = f(t_4, x_4^{(2)}) - 4f(t_3, x_3^{(3)}) + 6f(t_2, x_2^{(4)}) - 4f(t_1, x_1^{(5)}) + f(t_0, x_0). \qquad (8.89)$$
The values $x_1, x_2, x_3, x_4$ are calculated repeatedly according to formulas (8.86), (8.87), (8.88), and (8.89) until the difference between two successive iterations decreases below an imposed value.
We now replace the function $f(t, x)$ by Newton's polynomial
$$N_5^*(q) = f(t_i, x_i) + \frac{q}{1!}\Delta f(t_{i-1}, x_{i-1}) + \frac{q(q+1)}{2!}\Delta^2 f(t_{i-2}, x_{i-2}) + \frac{q(q+1)(q+2)}{3!}\Delta^3 f(t_{i-3}, x_{i-3}) + \frac{q(q+1)(q+2)(q+3)}{4!}\Delta^4 f(t_{i-4}, x_{i-4}), \qquad (8.90)$$
where $q = (t - t_i)/h$. Thus, it follows that
$$\int_{t_i}^{t_{i+1}} f(t, x)\,dt = h\int_0^1 f(t, x)\,dq. \qquad (8.91)$$
Integrating, we deduce Adams's formula
$$x_{i+1} = x_i + hf(t_i, x_i) + \frac{1}{2}h\,\Delta f(t_{i-1}, x_{i-1}) + \frac{5}{12}h\,\Delta^2 f(t_{i-2}, x_{i-2}) + \frac{3}{8}h\,\Delta^3 f(t_{i-3}, x_{i-3}) + \frac{251}{720}h\,\Delta^4 f(t_{i-4}, x_{i-4}),\quad i = 4, 5, \ldots \qquad (8.92)$$
8.7 THE ADAMS–BASHFORTH METHODS

To deduce the recurrence formula of the Adams–Bashforth method⁸, we shall start from the relation
$$f(t_i + qh) = f(t_i) + \frac{q}{1!}\Delta f(t_{i-1}) + \frac{q(q+1)}{2!}\Delta^2 f(t_{i-2}) + \frac{q(q+1)(q+2)}{3!}\Delta^3 f(t_{i-3}) + \cdots \qquad (8.93)$$

⁸The methods were published by John Couch Adams (1819–1892) and Francis Bashforth (1819–1912) in An Attempt to Test the Theories of Capillary Action by Comparing the Theoretical and Measured Forms of Drops of Fluid, with an Explanation of the Method of Integration Employed in Constructing the Tables which Give the Theoretical Forms of Such Drops in 1882.
It follows that
$$\int_{t_i}^{t_{i+1}} P(t)\,dt = h\int_0^1 f(t_i + qh)\,dq; \qquad (8.94)$$
using expression (8.93), we obtain
$$\int_{t_i}^{t_{i+1}} P(t)\,dt = hf(t_i)\int_0^1 dq + \frac{h}{1!}\Delta f(t_{i-1})\int_0^1 q\,dq + \frac{h}{2!}\Delta^2 f(t_{i-2})\int_0^1 q(q+1)\,dq + \cdots + \frac{h}{r!}\Delta^r f(t_{i-r})\int_0^1 q(q+1)\cdots(q+r-1)\,dq + \cdots \qquad (8.95)$$
Calculating the integrals and limiting ourselves to the terms up to $\Delta^r f(t_{i-r})$, we deduce the expression
$$\int_{t_i}^{t_{i+1}} P(t)\,dt = hf(t_i) + \frac{h}{2}\Delta f(t_{i-1}) + \frac{5h}{12}\Delta^2 f(t_{i-2}) + \frac{3h}{8}\Delta^3 f(t_{i-3}) + \frac{251h}{720}\Delta^4 f(t_{i-4}) + \cdots \qquad (8.96)$$
Thus results the recurrence relation for the Adams–Bashforth method
$$x_{i+1} = x_i + hf(t_i) + \frac{h}{2}\Delta f(t_{i-1}) + \frac{5h}{12}\Delta^2 f(t_{i-2}) + \frac{3h}{8}\Delta^3 f(t_{i-3}) + \frac{251h}{720}\Delta^4 f(t_{i-4}) + \cdots \qquad (8.97)$$
Depending on the degree $r$ of the interpolation polynomial, we deduce different Adams–Bashforth formulae:
– for $r = 1$:
$$x_{i+1} = x_i + \frac{h}{2}[3f(t_i, x_i) - f(t_{i-1}, x_{i-1})]; \qquad (8.98)$$
– for $r = 2$:
$$x_{i+1} = x_i + \frac{h}{12}[23f(t_i, x_i) - 16f(t_{i-1}, x_{i-1}) + 5f(t_{i-2}, x_{i-2})]; \qquad (8.99)$$
– for $r = 3$:
$$x_{i+1} = x_i + \frac{h}{24}[55f(t_i, x_i) - 59f(t_{i-1}, x_{i-1}) + 37f(t_{i-2}, x_{i-2}) - 9f(t_{i-3}, x_{i-3})]; \qquad (8.100)$$
– for $r = 4$:
$$x_{i+1} = x_i + \frac{h}{720}[1901f(t_i, x_i) - 2774f(t_{i-1}, x_{i-1}) + 2616f(t_{i-2}, x_{i-2}) - 1274f(t_{i-3}, x_{i-3}) + 251f(t_{i-4}, x_{i-4})]. \qquad (8.101)$$
The most used methods are those of the third, fourth, and fifth order, for which the recurrence relations read as follows:
– the third-order Adams–Bashforth method:
$$w_0 = x(t_0) = x_0,\quad w_1 = x(t_1) = x_1,\quad w_2 = x(t_2) = x_2,$$
$$w_{i+1} = w_i + \frac{h}{12}[23f(t_i, w_i) - 16f(t_{i-1}, w_{i-1}) + 5f(t_{i-2}, w_{i-2})]; \qquad (8.102)$$
– the fourth-order Adams–Bashforth method:
$$w_0 = x(t_0) = x_0,\quad w_1 = x(t_1) = x_1,\quad w_2 = x(t_2) = x_2,\quad w_3 = x(t_3) = x_3,$$
$$w_{i+1} = w_i + \frac{h}{24}[55f(t_i, w_i) - 59f(t_{i-1}, w_{i-1}) + 37f(t_{i-2}, w_{i-2}) - 9f(t_{i-3}, w_{i-3})]; \qquad (8.103)$$
– the fifth-order Adams–Bashforth method:
$$w_0 = x(t_0) = x_0,\quad w_1 = x(t_1) = x_1,\quad w_2 = x(t_2) = x_2,\quad w_3 = x(t_3) = x_3,\quad w_4 = x(t_4) = x_4,$$
$$w_{i+1} = w_i + \frac{h}{720}[1901f(t_i, w_i) - 2774f(t_{i-1}, w_{i-1}) + 2616f(t_{i-2}, w_{i-2}) - 1274f(t_{i-3}, w_{i-3}) + 251f(t_{i-4}, w_{i-4})]. \qquad (8.104)$$
Observation 8.8 The start values w0, w1, . . . are obtained using a one-step method.
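A sketch of the fourth-order Adams–Bashforth method (8.103) in Python, with the start values generated by the one-step Runge–Kutta scheme (8.70), as Observation 8.8 suggests:

```python
import numpy as np

def adams_bashforth4(f, t0, tf, x0, N):
    """Fourth-order Adams-Bashforth scheme (8.103); start values w_1..w_3
    are produced by the classical RK4 scheme (8.70)."""
    h = (tf - t0) / N
    t = t0 + h * np.arange(N + 1)
    w = np.empty(N + 1)
    w[0] = x0
    for i in range(3):                      # one-step start values
        K1 = h * f(t[i], w[i])
        K2 = h * f(t[i] + h / 2, w[i] + K1 / 2)
        K3 = h * f(t[i] + h / 2, w[i] + K2 / 2)
        K4 = h * f(t[i] + h, w[i] + K3)
        w[i + 1] = w[i] + (K1 + 2 * K2 + 2 * K3 + K4) / 6
    for i in range(3, N):                   # the multistep recurrence
        w[i + 1] = w[i] + h / 24 * (55 * f(t[i], w[i]) - 59 * f(t[i-1], w[i-1])
                                    + 37 * f(t[i-2], w[i-2]) - 9 * f(t[i-3], w[i-3]))
    return t, w
```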
8.8 THE ADAMS–MOULTON METHODS

Writing the interpolation polynomial $P(t)$ in the form
$$P(t) = f(t_{i+1}) + \frac{q-1}{1!}\Delta f(t_i) + \frac{(q-1)q}{2!}\Delta^2 f(t_{i-1}) + \frac{(q-1)q(q+1)}{3!}\Delta^3 f(t_{i-2}) + \frac{(q-1)q(q+1)(q+2)}{4!}\Delta^4 f(t_{i-3}) + \cdots + \frac{(q-1)q(q+1)\cdots(q+r-2)}{r!}\Delta^r f(t_{i-r+1}), \qquad (8.105)$$
it results, by integration, in
$$\int_{t_i}^{t_{i+1}} P(t)\,dt = hf(t_{i+1}) - \frac{h}{2}\Delta f(t_i) - \frac{h}{12}\Delta^2 f(t_{i-1}) - \frac{h}{24}\Delta^3 f(t_{i-2}) - \frac{19h}{720}\Delta^4 f(t_{i-3}) - \cdots \qquad (8.106)$$
Limiting the number of terms on the right-hand side of formula (8.106), we obtain the following particular expressions:
– for $r = 1$:
$$x_{i+1} = x_i + 0.5h[f(t_{i+1}, x_{i+1}) + f(t_i, x_i)]; \qquad (8.107)$$
– for $r = 2$:
$$x_{i+1} = x_i + \frac{h}{12}[5f(t_{i+1}, x_{i+1}) + 8f(t_i, x_i) - f(t_{i-1}, x_{i-1})]; \qquad (8.108)$$
– for $r = 3$:
$$x_{i+1} = x_i + \frac{h}{24}[9f(t_{i+1}, x_{i+1}) + 19f(t_i, x_i) - 5f(t_{i-1}, x_{i-1}) + f(t_{i-2}, x_{i-2})]; \qquad (8.109)$$
– for $r = 4$:
$$x_{i+1} = x_i + \frac{h}{720}[251f(t_{i+1}, x_{i+1}) + 646f(t_i, x_i) - 264f(t_{i-1}, x_{i-1}) + 106f(t_{i-2}, x_{i-2}) - 19f(t_{i-3}, x_{i-3})]. \qquad (8.110)$$
The most used Adams–Moulton methods⁹ are those of the third, fourth, and fifth order, for which the equations with differences read as follows:
– the third-order Adams–Moulton method:
$$w_0 = x(t_0) = x_0,\quad w_1 = x(t_1) = x_1,$$
$$w_{i+1} = w_i + \frac{h}{12}[5f(t_{i+1}, w_{i+1}) + 8f(t_i, w_i) - f(t_{i-1}, w_{i-1})]; \qquad (8.111)$$
– the fourth-order Adams–Moulton method:
$$w_0 = x(t_0) = x_0,\quad w_1 = x(t_1) = x_1,\quad w_2 = x(t_2) = x_2,$$
$$w_{i+1} = w_i + \frac{h}{24}[9f(t_{i+1}, w_{i+1}) + 19f(t_i, w_i) - 5f(t_{i-1}, w_{i-1}) + f(t_{i-2}, w_{i-2})]; \qquad (8.112)$$
– the fifth-order Adams–Moulton method:
$$w_0 = x(t_0) = x_0,\quad w_1 = x(t_1) = x_1,\quad w_2 = x(t_2) = x_2,\quad w_3 = x(t_3) = x_3,$$
$$w_{i+1} = w_i + \frac{h}{720}[251f(t_{i+1}, w_{i+1}) + 646f(t_i, w_i) - 264f(t_{i-1}, w_{i-1}) + 106f(t_{i-2}, w_{i-2}) - 19f(t_{i-3}, w_{i-3})]. \qquad (8.113)$$
Observation 8.9
(i) Unlike the Adams–Bashforth methods, in which the required value $w_{i+1}$ appears only on the left side of the equality, in the Adams–Moulton formulae it appears on both sides of the equal sign. It follows that, at each step, it is necessary to solve an equation of the form
$$w_{i+1} = w_i + h[c_0 f(t_{i+1}, w_{i+1}) + c_1 f(t_i, w_i) + \cdots], \qquad (8.114)$$
where $c_0, c_1, \ldots$ are the coefficients that appear in the respective Adams–Moulton formula.
(ii) Equation (8.114) is solved by successive approximations using the recurrence formula
$$w^{(k)}_{i+1} = w_i + h\left[c_0 f(t_{i+1}, w^{(k-1)}_{i+1}) + c_1 f(t_i, w_i) + \cdots\right], \qquad (8.115)$$
an expression that can also be written in the form
$$w^{(k)}_{i+1} = w_{i+1} + hc_0 f(t_{i+1}, w^{(k-1)}_{i+1}) - hc_0 f(t_{i+1}, w_{i+1}), \qquad (8.116)$$
obtained by subtraction of equation (8.114) from equation (8.115).
(iii) If the function $f$ is Lipschitzian in the second variable, that is, if there exists $L > 0$ such that for any $y$ and $z$
$$|f(t, y) - f(t, z)| \le L|y - z|, \qquad (8.117)$$
then expression (8.116) can be written as
$$\left|w^{(k)}_{i+1} - w_{i+1}\right| \le hc_0 L\left|w^{(k-1)}_{i+1} - w_{i+1}\right|. \qquad (8.118)$$
The last formula offers us the sufficient condition for the convergence of the iterative procedure,
$$hc_0 L < 1 \quad\text{or}\quad h < \frac{1}{c_0 L}. \qquad (8.119)$$

⁹Forest Ray Moulton (1872–1952) published these methods in New Methods in Exterior Ballistics in 1926.
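A sketch of one fourth-order Adams–Moulton step (8.112), solving the implicit equation by the successive approximations (8.115); the tolerance and the iteration cap are our own choices, and convergence requires $hc_0 L < 1$ as in (8.119):

```python
def am4_step(f, t, w, i, h, tol=1e-12, kmax=50):
    """One step of the fourth-order Adams-Moulton method (8.112).

    `w` holds the already computed values w_0..w_i; returns w_{i+1},
    obtained by fixed-point iteration on the implicit equation (8.114).
    """
    # the explicit part of the right-hand side, fixed during the iteration
    known = h / 24 * (19 * f(t[i], w[i]) - 5 * f(t[i-1], w[i-1]) + f(t[i-2], w[i-2]))
    wk = w[i]                      # initial guess for w_{i+1}
    for _ in range(kmax):
        wk_new = w[i] + h / 24 * 9 * f(t[i+1], wk) + known
        if abs(wk_new - wk) < tol:
            return wk_new
        wk = wk_new
    return wk
```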
8.9 PREDICTOR–CORRECTOR METHODS

Definition 8.10 A predictor–corrector method is a combination of an explicit multistep method and an implicit multistep one, the first realizing a predetermination of the value $x_{i+1}$ as a function of the previous values $x_i, x_{i-1}, \ldots$, and the second realizing a more accurate evaluation of the value $x_{i+1}$.

Observation 8.10 The corrector formula can be applied several times, until the difference between two successive iterations $x^{(k)}_{i+1}$ and $x^{(k+1)}_{i+1}$ becomes less than an imposed value $\varepsilon$, that is,
$$\left|x^{(k+1)}_{i+1} - x^{(k)}_{i+1}\right| < \varepsilon. \qquad (8.120)$$
We shall now present a few of the most used predictor–corrector methods.

8.9.1 Euler's Predictor–Corrector Method

In this case, the formula with differences reads
$$w_0 = x(t_0) = x_0,\quad w^{\mathrm{pred}}_{i+1} = w_i + hf(t_i, w_i),$$
$$w^{\mathrm{cor}}_{i+1} = w_i + 0.5h\left[f(t_i, w_i) + f(t_{i+1}, w^{\mathrm{pred}}_{i+1})\right]. \qquad (8.121)$$
8.9.2 Adams's Predictor–Corrector Methods

These methods consist of an Adams–Bashforth method with the role of predictor for $w_{i+1}$ and of an Adams–Moulton method with the role of corrector, both methods having the same order.
We obtain
– the third-order Adams predictor–corrector algorithm, for which the equations with differences read
$$w_0 = x(t_0) = x_0,\quad w_1 = x(t_1) = x_1,\quad w_2 = x(t_2) = x_2,$$
$$w^{\mathrm{pred}}_{i+1} = w_i + \frac{h}{12}[23f(t_i, w_i) - 16f(t_{i-1}, w_{i-1}) + 5f(t_{i-2}, w_{i-2})],$$
$$w^{\mathrm{cor}}_{i+1} = w_i + \frac{h}{12}\left[5f(t_{i+1}, w^{\mathrm{pred}}_{i+1}) + 8f(t_i, w_i) - f(t_{i-1}, w_{i-1})\right]; \qquad (8.122)$$
– the fourth-order Adams predictor–corrector algorithm, for which the equations with differences are
$$w_0 = x(t_0) = x_0,\quad w_1 = x(t_1) = x_1,\quad w_2 = x(t_2) = x_2,\quad w_3 = x(t_3) = x_3,$$
$$w^{\mathrm{pred}}_{i+1} = w_i + \frac{h}{24}[55f(t_i, w_i) - 59f(t_{i-1}, w_{i-1}) + 37f(t_{i-2}, w_{i-2}) - 9f(t_{i-3}, w_{i-3})],$$
$$w^{\mathrm{cor}}_{i+1} = w_i + \frac{h}{24}\left[9f(t_{i+1}, w^{\mathrm{pred}}_{i+1}) + 19f(t_i, w_i) - 5f(t_{i-1}, w_{i-1}) + f(t_{i-2}, w_{i-2})\right]; \qquad (8.123)$$
– the fifth-order Adams predictor–corrector algorithm, for which the equations with differences have the expressions
$$w_0 = x(t_0) = x_0,\quad w_1 = x(t_1) = x_1,\quad w_2 = x(t_2) = x_2,\quad w_3 = x(t_3) = x_3,\quad w_4 = x(t_4) = x_4,$$
$$w^{\mathrm{pred}}_{i+1} = w_i + \frac{h}{720}[1901f(t_i, w_i) - 2774f(t_{i-1}, w_{i-1}) + 2616f(t_{i-2}, w_{i-2}) - 1274f(t_{i-3}, w_{i-3}) + 251f(t_{i-4}, w_{i-4})],$$
$$w^{\mathrm{cor}}_{i+1} = w_i + \frac{h}{720}\left[251f(t_{i+1}, w^{\mathrm{pred}}_{i+1}) + 646f(t_i, w_i) - 264f(t_{i-1}, w_{i-1}) + 106f(t_{i-2}, w_{i-2}) - 19f(t_{i-3}, w_{i-3})\right]. \qquad (8.124)$$
The most used is the fourth-order predictor–corrector algorithm (a short sketch in code follows).
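A sketch of one step of the fourth-order Adams predictor–corrector scheme (8.123) in Python, with a single application of the corrector:

```python
def abm4_step(f, t, w, i, h):
    """One predictor-corrector step of the fourth-order Adams scheme (8.123):
    Adams-Bashforth predicts, Adams-Moulton corrects once."""
    fi = [f(t[i - k], w[i - k]) for k in range(4)]   # f_i, f_{i-1}, f_{i-2}, f_{i-3}
    pred = w[i] + h / 24 * (55 * fi[0] - 59 * fi[1] + 37 * fi[2] - 9 * fi[3])
    return w[i] + h / 24 * (9 * f(t[i] + h, pred) + 19 * fi[0]
                            - 5 * fi[1] + fi[2])
```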
8.9.3 Milne's Fourth-Order Predictor–Corrector Method

For this method¹⁰, the equations with differences read
$$w_0 = x(t_0) = x_0,\quad w_1 = x(t_1) = x_1,\quad w_2 = x(t_2) = x_2,\quad w_3 = x(t_3) = x_3,$$
$$w^{\mathrm{pred}}_{i+1} = w_{i-3} + \frac{4}{3}h[2f(t_i, w_i) + 2f(t_{i-2}, w_{i-2}) - f(t_{i-1}, w_{i-1})],$$
$$w^{\mathrm{cor}}_{i+1} = w_{i-1} + \frac{h}{3}\left[f(t_{i+1}, w^{\mathrm{pred}}_{i+1}) + 4f(t_i, w_i) + f(t_{i-1}, w_{i-1})\right]. \qquad (8.125)$$
8.9.4 Hamming's Predictor–Corrector Method

The equations with differences are, in this case,¹¹
$$w_0 = x(t_0) = x_0,\quad w_1 = x(t_1) = x_1,\quad w_2 = x(t_2) = x_2,\quad w_3 = x(t_3) = x_3,$$
$$w^{\mathrm{pred}}_{i+1} = w_{i-3} + \frac{4}{3}h[2f(t_i, w_i) + 2f(t_{i-2}, w_{i-2}) - f(t_{i-1}, w_{i-1})],$$
$$w^{\mathrm{cor}}_{i+1} = \frac{9}{8}w_i - \frac{1}{8}w_{i-2} + \frac{3h}{8}\left[f(t_{i+1}, w^{\mathrm{pred}}_{i+1}) + 2f(t_i, w_i) - f(t_{i-1}, w_{i-1})\right]. \qquad (8.126)$$
¹⁰The method was presented by William Edmund Milne (1890–1971) in Numerical Calculus in 1949.
¹¹The method was described by Richard Wesley Hamming (1915–1998) in Numerical Methods for Scientists and Engineers in 1962.
8.10 THE LINEAR EQUIVALENCE METHOD (LEM)

The linear equivalence method (LEM) was introduced by Ileana Toma to study nonlinear ordinary differential systems depending on parameters within a classical linear frame.
The method is presented here only for homogeneous nonlinear differential operators with constant coefficients, although it can be, and has been, applied in more general cases.
Consider, therefore, the system
$$F(y) = \dot{y} - f(y) = 0,\quad f(y) = [f_j(y)]_{j=\overline{1,n}},\quad f_j(y) = \sum_{|\mu|=1}^{\infty} f_{j\mu}\,y^{\mu},\quad f_{j\mu} \in \mathbb{R}, \qquad (8.127)$$
to which are associated the arbitrary Cauchy conditions
$$y(t_0) = y_0,\quad t_0 \in \mathbb{R}. \qquad (8.128)$$
The main idea of LEM consists of an exponential mapping depending on $n$ parameters, $\xi = (\xi_1, \xi_2, \ldots, \xi_n) \in \mathbb{R}^n$, namely
$$\nu(t, \xi) \equiv e^{\langle \xi, y\rangle}. \qquad (8.129)$$
Multiplying equation (8.127) by $\nu$, then differentiating with respect to $t$ and replacing the derivatives $\dot{y}_j$ from the nonlinear system, gives
(a) the first LEM equivalent:
$$L\nu(t, \xi) \equiv \frac{\partial \nu}{\partial t} - \langle \xi, f(D_\xi)\rangle\,\nu = 0, \qquad (8.130)$$
a linear partial differential equation, always of first order with respect to $t$, accompanied by the obvious condition
$$\nu(t_0, \xi) = e^{\langle \xi, y_0\rangle},\quad \xi \in \mathbb{R}^n. \qquad (8.131)$$
The usual notation $f_j(D_\xi)$ stands for the formal operator
$$f_j(D_\xi) = \sum_{|\mu|=1}^{\infty} f_{j\mu}\,\frac{\partial^{|\mu|}}{\partial \xi^{\mu}}. \qquad (8.132)$$
The formal scalar product in (8.130) is expressed as
$$\sum_{j=1}^{n} \xi_j f_j(D_\xi) \equiv \langle \xi, f(D_\xi)\rangle. \qquad (8.133)$$
Searching now for the unknown function $\nu$ in the class of functions analytic with respect to $\xi$,
$$\nu(t, \xi) = 1 + \sum_{|\gamma|=1}^{\infty} \nu_\gamma(t)\,\frac{\xi^\gamma}{\gamma!}, \qquad (8.134)$$
we obtain
(b) the second LEM equivalent:
$$\delta V \equiv \frac{dV}{dt} - AV = 0,\quad V = (V_j)_{j\in\mathbb{N}},\quad V_j = (\nu_\gamma)_{|\gamma|=j}, \qquad (8.135)$$
which must be solved under the Cauchy conditions
$$V(t_0) = (y_0^\gamma)_{|\gamma|\in\mathbb{N}}. \qquad (8.136)$$
The LEM matrix $A$ is always column-finite; in the case of a polynomial operator, $A$ is also row-finite. The cells $A_{ss}$ on the main diagonal are square, of $s + 1$ rows and columns, and are generated by those $f_{j\mu}$ for which $|\mu| = 1$. The other cells $A_{k,k+s}$ contain only those $f_{j\mu}$ with $|\mu| = s + 1$. More precisely, the diagonal cells contain the coefficients of the linear part; on the next upper diagonal we find cells containing the coefficients of the second degree in $y$, and so on. In the case of polynomial operators of degree $m$, the associated LEM matrix is band-diagonal, the band being made up of $m$ lines.
We can express the LEM matrix as
$$A(t) = \begin{pmatrix} A_{11} & A_{12} & A_{13} & \cdots & A_{1m} & A_{1,m+1} & \cdots\\ 0 & A_{22} & A_{23} & \cdots & A_{2m} & A_{2,m+1} & \cdots\\ 0 & 0 & A_{33} & \cdots & A_{3m} & A_{3,m+1} & \cdots\\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \end{pmatrix}. \qquad (8.137)$$
It should be mentioned that this particular form of the LEM matrix is also preserved if the method is applied to nonhomogeneous ordinary differential systems with variable coefficients. This form permits the calculation by block partitioning, which represents a considerable simplification.
It was proved that any solution, analytic with respect to $\xi$, of linear problems (8.130) and (8.131) is of the exponential form (8.129), with $y$ a solution of the nonlinear initial problems (8.127) and (8.128).
Starting from this essential fact, we can establish various representations of the solution of nonlinear ordinary differential systems.
Theorem 8.4 The solution of the nonlinear initial problem
(i) coincides with the first $n$ components of the infinite vector
$$V(t) = e^{A(t - t_0)}V_0, \qquad (8.138)$$
where the exponential matrix
$$e^{A(t - t_0)} = I + \frac{(t - t_0)}{1!}A + \frac{(t - t_0)^2}{2!}A^2 + \cdots + \frac{(t - t_0)^n}{n!}A^n + \cdots \qquad (8.139)$$
can be computed by block partitioning, each step involving finite sums;
(ii) coincides with the series
$$y_j(t) = y_{j0} + \sum_{l=1}^{\infty}\sum_{|\gamma|=l} u_{j\gamma}(t)\,y_0^{\gamma},\quad j = \overline{1, n}, \qquad (8.140)$$
where $u_{j\gamma}(t)$ are solutions of the finite linear ordinary differential systems
$$\frac{dU_k}{dt} = A^T_{1k}U_1 + A^T_{2k}U_2 + \cdots + A^T_{kk}U_k,\quad k = \overline{1, l},\quad U_s(t) = [u_\gamma(t)]_{|\gamma|=s}, \qquad (8.141)$$
which satisfy the Cauchy conditions
$$U_1(t_0) = e_j,\quad U_s(t_0) = 0,\quad s = \overline{2, l}, \qquad (8.142)$$
$T$ standing for transposition.
The above theorem generalizes a similar one stated for polynomial ordinary differential systems. The corresponding result is very much like the solution of a linear ordinary differential system with constant coefficients. Moreover, the computation is even easier because the eigenvalues of the diagonal cells are always known. The generalized representation (8.140) is the normal LEM representation; it was used in many applications requiring the qualitative behavior of the solution.
8.11 CONSIDERATIONS ABOUT THE ERRORS

The integration error is obviously of order $O(h)$ for Euler's method.
Taylor's method has the advantage that the order of the error is $O(h^n)$, but it has the disadvantage that it requires the calculation of the derivatives of the function $f(t, x(t))$.
In the case of the Runge–Kutta type methods, the error is of order $O(h^{p+1})$, where $p$ is the order of the method.
Butcher stated that between the number of evaluations of the function $f$ at each step and the order of the truncation error there is a link of the following form:
– for two evaluations of the function $f$, the truncation error is of order $O(h^2)$;
– for three evaluations, the truncation error is of order $O(h^3)$;
– for four or five evaluations, the truncation error is of order $O(h^4)$;
– for six evaluations, the truncation error is of order $O(h^5)$;
– for seven evaluations, the truncation error is of order $O(h^6)$;
– for eight or more evaluations of the function $f$, the truncation error is of order $O(h^{n-2})$, where $n$ is the number of evaluations.
Proceeding as with the evaluation of the error in the case of Lagrange's interpolation polynomials, we obtain the following estimations of the errors in the case of multistep methods:
– for the second-order Adams–Bashforth method,
$$\varepsilon_x = \frac{5h^3}{12}M_2,\quad M_2 = \sup_{\xi\in[t_0, t_f]}|f''(\xi, x(\xi))|; \qquad (8.143)$$
– for the third-order Adams–Bashforth method,
$$\varepsilon_x = \frac{3h^4}{8}M_3,\quad M_3 = \sup_{\xi\in[t_0, t_f]}|f^{(3)}(\xi, x(\xi))|; \qquad (8.144)$$
– for the fourth-order Adams–Bashforth method,
$$\varepsilon_x = \frac{251h^5}{720}M_4,\quad M_4 = \sup_{\xi\in[t_0, t_f]}|f^{(4)}(\xi, x(\xi))|; \qquad (8.145)$$
– for the fifth-order Adams–Bashforth method,
$$\varepsilon_x = \frac{95h^6}{288}M_5,\quad M_5 = \sup_{\xi\in[t_0, t_f]}|f^{(5)}(\xi, x(\xi))|; \qquad (8.146)$$
– for the second-order Adams–Moulton method,
$$\varepsilon_x = \frac{h^3}{12}M_2,\quad M_2 = \sup_{\xi\in[t_0, t_f]}|f''(\xi, x(\xi))|; \qquad (8.147)$$
– for the third-order Adams–Moulton method,
$$\varepsilon_x = \frac{h^4}{24}M_3,\quad M_3 = \sup_{\xi\in[t_0, t_f]}|f^{(3)}(\xi, x(\xi))|; \qquad (8.148)$$
– for the fourth-order Adams–Moulton method,
$$\varepsilon_x = \frac{19h^5}{720}M_4,\quad M_4 = \sup_{\xi\in[t_0, t_f]}|f^{(4)}(\xi, x(\xi))|; \qquad (8.149)$$
– for the fifth-order Adams–Moulton method,
$$\varepsilon_x = \frac{3h^6}{160}M_5,\quad M_5 = \sup_{\xi\in[t_0, t_f]}|f^{(5)}(\xi, x(\xi))|. \qquad (8.150)$$
We can easily observe that the Adams–Moulton methods are more precise than the Adams–Bashforth methods of the same order.
8.12 NUMERICAL EXAMPLE

Example Let us consider the Cauchy problem
$$\dot{x} = \frac{dx}{dt} = x + e^t(2\cos 2t - \sin t),\quad t \in [0, 2],\quad x(0) = 1, \qquad (8.151)$$
the solution of which, obviously, is
$$x(t) = e^t(\sin 2t + \cos t). \qquad (8.152)$$
We shall determine the numerical solution of this Cauchy problem by various methods, with the step $h = 0.1$.
In the case of Euler's method, the calculation relation is
$$w^{(i)} = w^{(i-1)} + hf(t_{i-1}, w^{(i-1)}),\quad i = \overline{1, 20}, \qquad (8.153)$$
where
$$f(t, w) = w + e^t(2\cos 2t - \sin t). \qquad (8.154)$$
The results are given in Table 8.1.
TABLE 8.1 Solution of Problem (8.151) with Euler’s Method
Step ti xi = x(ti) f (ti−1, wi−1) wi
0 0.0 1.000000 – 1.000000
1 0.1 1.319213 3.000000 1.300000
2 0.2 1.672693 3.355949 1.635595
3 0.3 2.051757 3.642913 1.999886
4 0.4 2.444231 3.829149 2.382801
5 0.5 2.834240 3.880586 2.770860
6 0.6 3.202145 3.762036 3.147063
7 0.7 3.524655 3.438735 3.490937
8 0.8 3.775141 2.878185 3.778755
9 0.9 3.924192 2.052280 3.983983
10 1.0 3.940421 0.939656 4.077949
11 1.1 3.791535 −0.471815 4.030767
12 1.2 3.445687 −2.182478 3.812520
13 1.3 2.873060 −4.178426 3.394677
14 1.4 2.047695 −6.429262 2.751751
15 1.5 0.949478 −8.886245 1.863126
16 1.6 −0.433755 −11.481013 0.715025
17 1.7 −2.104107 −14.125068 −0.697482
18 1.8 −4.051585 −16.710208 −2.368502
19 1.9 −6.252297 −19.110082 −4.279511
20 2.0 −8.666988 −21.183026 −6.397813
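The Euler column $w_i$ of Table 8.1 can be reproduced with a short Python script (a sketch; the printed layout is ours, and the exact solution (8.152) gives the $x_i$ column):

```python
import numpy as np

f = lambda t, w: w + np.exp(t) * (2 * np.cos(2 * t) - np.sin(t))
exact = lambda t: np.exp(t) * (np.sin(2 * t) + np.cos(t))

h, w = 0.1, 1.0
for i in range(1, 21):
    w += h * f((i - 1) * h, w)      # Euler step (8.153)
    print(f"{i:2d}  t={i*h:.1f}  x={exact(i*h): .6f}  w={w: .6f}")
```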
Another possibility to solve problem (8.151) is the use of Taylor's method. We shall use the second-order Taylor method, for which we have
$$T^{(2)}(t_i, w_i) = f(t_i, w_i) + \frac{h}{2}f'(t_i, w_i),\quad i = \overline{0, 19}, \qquad (8.155)$$
$$f'(t, x) = \frac{df(t, x)}{dt} = x + e^t(4\cos 2t - 4\sin 2t - 2\sin t - \cos t), \qquad (8.156)$$
$$w_{i+1} = w_i + hT^{(2)}(t_i, w_i),\quad i = \overline{0, 19}. \qquad (8.157)$$
The numerical results are given in Table 8.2.
If we solve the same Cauchy problem by the modified Euler method, then we have the relation
$$w_{i+1} = w_i + \frac{h}{2}[f(t_i, w_i) + f(t_{i+1}, w_i + hf(t_i, w_i))],\quad i = \overline{0, 19}, \qquad (8.158)$$
resulting in Table 8.3.
The solution of Cauchy problem (8.151), with exact solution (8.152), by Heun's method leads to the relation
$$w_{i+1} = w_i + \frac{h}{4}\left[f(t_i, w_i) + 3f\left(t_i + \frac{2}{3}h,\; w_i + \frac{2}{3}hf(t_i, w_i)\right)\right] \qquad (8.159)$$
and to the data in Table 8.4. Another way to treat Cauchy problem (8.151) is that of the Runge–Kutta method.
TABLE 8.2 Solution of Problem (8.151) with Taylor’s Second-Order Method
Step ti f (ti, wi) f (ti, wi) T (2)
(ti, wi) wi
0 0.0 3.000000 4.000000 3.200000 1.000000
1 0.1 3.375949 3.453994 3.548649 1.320000
2 0.2 3.682182 2.589898 3.811677 1.674865
3 0.3 3.885295 1.376238 3.954107 2.056033
4 0.4 3.949228 −0.207727 3.938842 2.451443
5 0.5 3.836504 −2.168613 3.728074 2.845327
6 0.6 3.509807 −4.495524 3.285031 3.218135
7 0.7 2.933886 −7.156876 2.576043 3.546638
8 0.8 2.077767 −10.097625 1.572886 3.804242
9 0.9 0.917204 −13.237152 0.255346 3.961531
10 1.0 −0.562699 −16.468063 −1.386102 3.987065
11 1.1 −2.364790 −19.656143 −3.347597 3.848455
12 1.2 −4.477250 −22.641732 −5.609337 3.513696
13 1.3 −6.871177 −25.242758 −8.133315 2.952762
14 1.4 −9.498565 −27.259588 −10.861545 2.139430
15 1.5 −12.290864 −28.481849 −13.714956 1.053276
16 1.6 −15.158313 −28.697264 −16.593176 −0.318220
17 1.7 −17.990263 −27.702427 −19.375385 −1.977537
18 1.8 −20.656655 −25.315371 −21.922424 −3.915076
19 1.9 −23.010834 −21.389601 −24.080314 −6.107318
20 2.0 −24.893818 −15.829130 −25.685275 −8.515350
TABLE 8.3 Solution of Problem (8.151) with the Modified Euler Method
Step ti f (ti, wi) + f (ti+1, wi + hf (ti, wi)) wi
0 0.0 6.355949 1.000000
1 0.1 6.355949 1.317797
2 0.2 7.036236 1.669609
3 0.3 7.543491 2.046784
4 0.4 7.808220 2.437195
5 0.5 7.756849 2.825037
6 0.6 7.314545 3.190765
7 0.7 6.408693 3.511199
8 0.8 4.973017 3.759850
9 0.9 2.952235 3.907462
10 1.0 0.307146 3.922819
11 1.1 −2.980065 3.773816
12 1.2 −6.900502 3.428791
13 1.3 −11.413518 2.858115
14 1.4 −16.442287 2.036000
15 1.5 −21.870334 0.942484
16 1.6 −27.539431 −0.434488
17 1.7 −33.249253 −2.096950
18 1.8 −38.759174 −4.034909
19 1.9 −43.792562 −6.224537
20 2.0 −48.043864 −8.626730
TABLE 8.4 Solution of Equation (8.151) by Heun’s Method
Step ti xi wi
0 0.0 1.000000 1.000000
1 0.1 1.3192132 1.3185770
2 0.2 1.6726927 1.6714575
3 0.3 2.0517570 2.0500182
4 0.4 2.4442311 2.4421527
5 0.5 2.8342401 2.8320649
6 0.6 3.2021455 3.2002036
7 0.7 3.5246551 3.5233706
8 0.8 3.7751413 3.7750360
9 0.9 3.9241925 3.9258857
10 1.0 3.9404206 3.9446252
11 1.1 3.7915355 3.7990486
12 1.2 3.4456868 3.4573757
13 1.3 2.8730600 2.8898418
14 1.4 2.0476947 2.0705108
15 1.5 0.9494781 0.9792638
16 1.6 −0.4337552 −0.3961070
17 1.7 −2.1041065 −2.0577838
18 1.8 −4.0515853 −3.9958985
19 1.9 −6.2522972 −6.1867206
20 2.0 −8.6669884 −8.5911995
Thus, for the Runge–Kutta method of third order we apply the relations
$$K_1 = hf(t_i, w_i),\quad K_2 = hf\left(t_i + \frac{h}{2}, w_i + \frac{K_1}{2}\right),\quad K_3 = hf(t_i + h,\, w_i + 2K_2 - K_1), \qquad (8.160)$$
$$w_{i+1} = w_i + \frac{1}{6}(K_1 + 4K_2 + K_3), \qquad (8.161)$$
the results being given in Table 8.5. Analogously, for the Runge–Kutta method of fourth order we have the relations
$$K_1 = hf(t_i, w_i),\quad K_2 = hf\left(t_i + \frac{h}{2}, w_i + \frac{K_1}{2}\right),\quad K_3 = hf\left(t_i + \frac{h}{2}, w_i + \frac{K_2}{2}\right),\quad K_4 = hf(t_i + h,\, w_i + K_3), \qquad (8.162)$$
$$w_{i+1} = w_i + \frac{1}{6}(K_1 + 2K_2 + 2K_3 + K_4), \qquad (8.163)$$
while for the Runge–Kutta method of sixth order we may write
$$K_1 = hf(t_i, w_i),\quad K_2 = hf\left(t_i + \frac{h}{3}, w_i + \frac{K_1}{3}\right),\quad K_3 = hf\left(t_i + \frac{2h}{5},\; w_i + \frac{1}{25}(6K_2 + 4K_1)\right),$$
$$K_4 = hf\left(t_i + h,\; w_i + \frac{1}{4}(15K_3 - 12K_2 + K_1)\right),$$
TABLE 8.5 Solution of Equation (8.151) by the Runge–Kutta
Method of Third Order
Step xi wi
0 1.0000000 1.0000000
1 1.3192132 1.3291972
2 1.6726927 1.6949971
3 2.0517570 2.0887387
4 2.4442311 2.4981579
5 2.8342401 2.9071609
6 3.2021455 3.2957409
7 3.5246551 3.6400730
8 3.7751413 3.9128210
9 3.9241925 4.0836843
10 3.9404206 4.1202083
11 3.7915355 3.9888704
12 3.4456868 3.6564447
13 2.8730600 3.0916307
14 2.0476947 2.2669173
15 0.9494781 1.1606331
16 −0.4337552 −0.2408872
17 −2.1041065 −1.9411070
18 −4.0515853 −3.9311771
19 −6.2522972 −6.1880339
20 −8.6669884 −8.6728551
$$K_5 = hf\left(t_i + \frac{2}{3}h,\; w_i + \frac{2}{81}(4K_4 - 25K_3 + 45K_2 + 3K_1)\right),$$
$$K_6 = hf\left(t_i + \frac{4}{5}h,\; w_i + \frac{1}{75}(8K_4 + 10K_3 + 36K_2 + 6K_1)\right), \qquad (8.164)$$
$$w_{i+1} = w_i + \frac{1}{192}(23K_1 + 125K_3 - 81K_5 + 125K_6). \qquad (8.165)$$
The results are given in Table 8.6 and Table 8.7.
The solution of Cauchy problem (8.151) by the Runge–Kutta–Fehlberg method leads to the data in Table 8.8.
We may study the problem by using the multistep methods too.
Thus, Adams's method leads to the results in Table 8.9. For the Adams–Bashforth methods of the third, fourth, and fifth order we obtain the data in Table 8.10, Table 8.11, and Table 8.12, respectively. The use of the Adams–Moulton methods of the third, fourth, and fifth order leads to the results in Table 8.13, Table 8.14, and Table 8.15, respectively.
If we use the predictor–corrector methods, then we obtain
– for Euler's predictor–corrector method, the data in Table 8.16;
– for Adams's predictor–corrector methods, the data in Table 8.17, Table 8.18, and Table 8.19;
– for Milne's fourth-order predictor–corrector method, the data in Table 8.20;
– for Hamming's predictor–corrector method, the data in Table 8.21.
TABLE 8.6 Solution of Equation (8.151) by the Runge–Kutta
Method of Fourth Order
Step xi wi
0 1.0000000 1.0000000
1 1.3192132 1.3192130
2 1.6726927 1.6726923
3 2.0517570 2.0517565
4 2.4442311 2.4442305
5 2.8342401 2.8342396
6 3.2021455 3.2021451
7 3.5246551 3.5246551
8 3.7751413 3.7751417
9 3.9241925 3.9241937
10 3.9404206 3.9404228
11 3.7915355 3.7915390
12 3.4456868 3.4456919
13 2.8730600 2.8730670
14 2.0476947 2.0477038
15 0.9494781 0.9494898
16 −0.4337552 −0.4337406
17 −2.1041065 −2.1040888
18 −4.0515853 −4.0515641
19 −6.2522972 −6.2522725
20 −8.6669884 −8.6669599
TABLE 8.7 Solution of Equation (8.151) by the Runge–Kutta
Method of Sixth Order
Step xi wi
0 1.0000000 1.0000000
1 1.3192132 1.3192132
2 1.6726927 1.6726927
3 2.0517570 2.0517570
4 2.4442311 2.4442311
5 2.8342401 2.8342402
6 3.2021455 3.2021455
7 3.5246551 3.5246551
8 3.7751413 3.7751413
9 3.9241925 3.9241926
10 3.9404206 3.9404208
11 3.7915355 3.7915357
12 3.4456868 3.4456871
13 2.8730600 2.8730603
14 2.0476947 2.0476951
15 0.9494781 0.9494786
16 −0.4337552 −0.4337547
17 −2.1041065 −2.1041059
18 −4.0515853 −4.0515846
19 −6.2522972 −6.2522964
20 −8.6669884 −8.6669876
TABLE 8.8 Solution of Equation (8.151) by the
Runge–Kutta–Fehlberg Method
t w x
0.0000000 1.0000000 1.0000000
0.0316228 1.0968461 1.0968461
0.1469862 1.4814837 1.4814836
0.2666438 1.9231489 1.9231488
0.4162211 2.5081341 2.5081341
0.5210182 2.9141284 2.9141283
0.6250856 3.2883041 3.2883041
0.7170764 3.5733095 3.5733096
0.8010931 3.7773744 3.7773745
0.8799701 3.9039488 3.9039489
0.9551846 3.9516312 3.9516314
1.0276254 3.9171714 3.9171716
1.0979024 3.7965716 3.7965719
1.1664726 3.5854933 3.5854937
1.2337016 3.2794283 3.2794287
1.2998982 2.8737670 2.8737676
1.3653372 2.3638101 2.3638107
1.4302756 1.7447325 1.7447331
1.4949669 1.0114991 1.0114998
1.5596750 0.1587110 0.1587118
1.6246911 −0.8196561 −0.8196552
1.6903571 −1.9307042 −1.9307032
1.7571028 −3.1834262 −3.1834251
1.8255128 −4.5902733 −4.5902721
1.8964618 −6.1705045 −6.1705032
1.9714434 −7.9585369 −7.9585354
8.13 APPLICATIONS
Problem 8.1
Study the motion of a rigid solid with a point constrained to move without friction on a given curve (Fig. 8.2). As a numerical application, let us consider a body formed (Fig. 8.3) of a homogeneous cube ABDEA′B′D′E′ of mass $m$ and edge $l$ and a bar OG of length $l$ and negligible mass, G being the center of the square ABDE. The point O moves without friction on the cylindrical curve of equations
$$X_O = l\cos\xi_1,\quad Y_O = l\sin\xi_1,\quad Z_O = kl\xi_1. \qquad (8.166)$$
Knowing that the mass $m$, the length $l$, and the parameter $k$ have the values
$$m = 12\ \mathrm{kg},\quad l = 0.1\ \mathrm{m},\quad k = 0.1, \qquad (8.167)$$
and that the initial conditions of the attached Cauchy problem are (for $t = 0$)
$$\xi_1 = 0,\quad \xi_5 = 0\ \mathrm{m\,s^{-1}},\quad \psi = 0\ \mathrm{rad},\quad \theta = 0.001\ \mathrm{rad},\quad \varphi = 0\ \mathrm{rad},$$
$$\omega_x = 0\ \mathrm{rad\,s^{-1}},\quad \omega_y = 0\ \mathrm{rad\,s^{-1}},\quad \omega_z = 0\ \mathrm{rad\,s^{-1}}, \qquad (8.168)$$
draw the variables $\xi_i(t)$, $i = \overline{1, 8}$.
TABLE 8.9 Solution of Equation (8.151) by the Adams Method
Step  t  x  f(t_i, w_i)  Δf  Δ²f  Δ³f  Δ⁴f  w
0 0.0 1.00000 3.00000 0.37516 −0.07031 −0.03352 −0.00265 1.00000
1 0.1 1.31921 3.37516 0.30485 −0.10384 −0.03617 −0.00145 1.31921
2 0.2 1.67269 3.68001 0.20101 −0.14001 −0.03762 0.00024 1.67269
3 0.3 2.05176 3.88102 0.06100 −0.17764 −0.03739 0.00208 2.05176
4 0.4 2.44423 3.94202 −0.11664 −0.21502 −0.03530 0.00429 2.44423
5 0.5 2.83424 3.82538 −0.33166 −0.25033 −0.03102 0.00678 2.83420
6 0.6 3.20215 3.49371 −0.58199 −0.28134 −0.02424 0.00940 3.20204
7 0.7 3.52466 2.91173 −0.86333 −0.30558 −0.01484 0.01209 3.52448
8 0.8 3.77514 2.04839 −1.16892 −0.32042 −0.00275 0.01473 3.77487
9 0.9 3.92419 0.87948 −1.48934 −0.32317 0.01198 0.01714 3.92381
10 1.0 3.94042 −0.60986 −1.81251 −0.31120 0.02912 0.01916 3.93990
11 1.1 3.79154 −2.42238 −2.12371 −0.28208 0.04827 0.02060 3.79087
12 1.2 3.44569 −4.54609 −2.40579 −0.23381 0.06887 0.02125 3.44486
13 1.3 2.87306 −6.95188 −2.63960 −0.16494 0.09012 0.02092 2.87206
14 1.4 2.04769 −9.59148 −2.80454 −0.07482 0.11104 0.01942 2.04652
15 1.5 0.94948 −12.39601 −2.87935 0.03622 0.13046 0.01658 0.94813
16 1.6 −0.43376 −15.27536 −2.84313 0.16668 0.14704 0.01229 −0.43527
17 1.7 −2.10411 −18.11850 −2.67646 0.31372 0.15933
18 1.8 −4.05159 −20.79496 −2.36274 0.47305
19 1.9 −6.25230 −23.15770 −1.88970
20 2.0 −8.66699 −25.04739
TABLE 8.10 Solution of Equation (8.151) by the
Adams–Bashforth Method of Third Order
Step xi wi
0 1.00000 1.00000
1 1.31921 1.31921
2 1.67269 1.67269
3 2.05176 2.05301
4 2.44423 2.44707
5 2.83424 2.83887
6 3.20215 3.20874
7 3.52466 3.53335
8 3.77514 3.78599
9 3.92419 3.93717
10 3.94042 3.95539
11 3.79154 3.80823
12 3.44569 3.46373
13 2.87306 2.89191
14 2.04769 2.06669
15 0.94948 0.96783
16 −0.43376 −0.41696
17 −2.10411 −2.08986
18 −4.05159 −4.04092
19 −6.25230 −6.24627
20 −8.66699 −8.66659
TABLE 8.11 Solution of Equation (8.151) by the
Adams–Bashforth Method of Fourth Order
Step xi wi
0 1.00000 1.00000
1 1.31921 1.31921
2 1.67269 1.67269
3 2.05176 2.05176
4 2.44423 2.44433
5 2.83424 2.83441
6 3.20215 3.20233
7 3.52466 3.52478
8 3.77514 3.77513
9 3.92419 3.92394
10 3.94042 3.93980
11 3.79154 3.79041
12 3.44569 3.44391
13 2.87306 2.87047
14 2.04769 2.04414
15 0.94948 0.94479
16 −0.43376 −0.43971
17 −2.10411 −2.11146
18 −4.05159 −4.06043
19 −6.25230 −6.26269
20 −8.66699 −8.67894
TABLE 8.12 Solution of Equation (8.151) by the
Adams–Bashforth Method of Fifth Order
Step xi wi
0 1.00000 1.00000
1 1.31921 1.31921
2 1.67269 1.67269
3 2.05176 2.05176
4 2.44423 2.44423
5 2.83424 2.83420
6 3.20215 3.20204
7 3.52466 3.52448
8 3.77514 3.77487
9 3.92419 3.92381
10 3.94042 3.93990
11 3.79154 3.79087
12 3.44569 3.44486
13 2.87306 2.87206
14 2.04769 2.04652
15 0.94948 0.94813
16 −0.43376 −0.43527
17 −2.10411 −2.10577
18 −4.05159 −4.05338
19 −6.25230 −6.25418
20 −8.66699 −8.66892
TABLE 8.13 Solution of Equation (8.151) by the
Adams–Moulton Method of Third Order
Step xi wi
0 1.00000 1.00000
1 1.31921 1.31921
2 1.67269 1.67255
3 2.05176 2.05145
4 2.44423 2.44372
5 2.83424 2.83351
6 3.20215 3.20118
7 3.52466 3.52345
8 3.77514 3.77369
9 3.92419 3.92250
10 3.94042 3.93852
11 3.79154 3.78946
12 3.44569 3.44349
13 2.87306 2.87081
14 2.04769 2.04548
15 0.94948 0.94739
16 −0.43376 −0.43561
17 −2.10411 −2.10562
18 −4.05159 −4.05264
19 −6.25230 −6.25278
20 −8.66699 −8.66680
TABLE 8.14 Solution of Equation (8.151) by the
Adams–Moulton Method of Fourth Order
Step xi wi
0 1.00000 1.00000
1 1.31921 1.31921
2 1.67269 1.67269
3 2.05176 2.05175
4 2.44423 2.44422
5 2.83424 2.83422
6 3.20215 3.20213
7 3.52466 3.52465
8 3.77514 3.77515
9 3.92419 3.92423
10 3.94042 3.94049
11 3.79154 3.79165
12 3.44569 3.44586
13 2.87306 2.87330
14 2.04769 2.04802
15 0.94948 0.94990
16 −0.43376 −0.43323
17 −2.10411 −2.10347
18 −4.05159 −4.05084
19 −6.25230 −6.25143
20 −8.66699 −8.66601
TABLE 8.15 Solution of Equation (8.151) by the
Adams–Moulton Method of Fifth Order
Step xi wi
0 1.00000 1.00000
1 1.31921 1.31921
2 1.67269 1.67269
3 2.05176 2.05176
4 2.44423 2.44423
5 2.83424 2.83425
6 3.20215 3.20216
7 3.52466 3.52467
8 3.77514 3.77516
9 3.92419 3.92422
10 3.94042 3.94046
11 3.79154 3.79158
12 3.44569 3.44574
13 2.87306 2.87313
14 2.04769 2.04777
15 0.94948 0.94956
16 −0.43376 −0.43366
17 −2.10411 −2.10400
18 −4.05159 −4.05148
19 −6.25230 −6.25219
20 −8.66699 −8.66688
TABLE 8.16 Solution of Equation (8.151) by Euler’s Predictor–Corrector Method
Step  x_i  w_i^pred  w_i^corr
0 1.000000000 1.000000000 1.000000000
1 1.319213234 1.300000000 1.317797459
2 1.672692659 1.655172121 1.669609276
3 2.051756990 2.037301965 2.046783846
4 2.444231072 2.434388485 2.437194823
5 2.834240148 2.830692770 2.825037271
6 3.202145482 3.206658671 3.190764509
7 3.524655087 3.539008168 3.511199171
8 3.775141261 3.801043935 3.759850010
9 3.924192475 3.963187530 3.907461783
10 3.940420612 3.993775235 3.922819068
11 3.791535483 3.860124570 3.773815798
12 3.445686849 3.529872878 3.428790709
13 2.873060026 2.972575236 2.858114788
14 2.047694688 2.161532373 2.036000413
15 0.949478141 1.075800882 0.942483717
16 −0.433755206 −0.297681859 −0.434487826
17 −2.104106532 −1.961945934 −2.096950472
18 −4.051585254 −3.907918108 −4.034909166
19 −6.252297191 −6.112558023 −6.224537284
20 −8.666988414 −8.537342587 −8.626730488
TABLE 8.17 Solution of Equation (8.151) by Adams’s Predictor–Corrector Method
of Third Order
Step  x_i  w_i^pred  w_i^corr
0 1.000000000 1.000000000 1.000000000
1 1.319213234 1.319213234 1.319213234
2 1.672692659 1.672692659 1.672692659
3 2.051756990 2.053006306 2.051661525
4 2.444231072 2.445469036 2.444025281
5 2.834240148 2.835416070 2.833912943
6 3.202145482 3.203185641 3.201689577
7 3.524655087 3.525480687 3.524068322
8 3.775141261 3.775667889 3.774427807
9 3.924192475 3.924333295 3.923363951
10 3.940420612 3.940090450 3.939497056
11 3.791535483 3.790655697 3.790546062
12 3.445686849 3.444190896 3.444670252
13 2.873060026 2.870899572 2.872064424
14 2.047694688 2.044846047 2.046777180
15 0.949478141 0.945948761 0.948703618
16 −0.433755206 −0.437920555 −0.434315861
17 −2.104106532 −2.108820265 −2.104378926
18 −4.051585254 −4.056712629 −4.051494673
19 −6.252297191 −6.257653609 −6.251772462
20 −8.666988414 −8.672338860 −8.665966364
TABLE 8.18 Solution of Equation (8.151) by Adams’s Predictor–Corrector Method
of Fourth Order
Step  x_i  w_i^pred  w_i^corr
0 1.000000000 1.000000000 1.000000000
1 1.319213234 1.319213234 1.319213234
2 1.672692659 1.672692659 1.672692659
3 2.051756990 2.051756990 2.051756990
4 2.444231072 2.444325646 2.444229802
5 2.834240148 2.834290471 2.834239769
6 3.202145482 3.202142622 3.202148906
7 3.524655087 3.524590325 3.524665906
8 3.775141261 3.775008126 3.775163759
9 3.924192475 3.923986977 3.924231579
10 3.940420612 3.940142130 3.940481805
11 3.791535483 3.791187569 3.791624671
12 3.445686849 3.445277980 3.445810180
13 2.873060026 2.872604252 2.873223668
14 2.047694688 2.047212125 2.047904558
15 0.949478141 0.948995244 0.949739603
16 −0.433755206 −0.434205656 −0.433437678
17 −2.104106532 −2.104485777 −2.103729696
18 −4.051585254 −4.051849311 −4.051147446
19 −6.252297191 −6.252398041 −6.251798643
20 −8.666988414 −8.666875667 −8.666431527
TABLE 8.19 Solution of Equation (8.151) by Adams’s Predictor–Corrector Method
of Fifth Order
Step  x_i  w_i^pred  w_i^corr
0 1.000000000 1.000000000 1.000000000
1 1.319213234 1.319213234 1.319213234
2 1.672692659 1.672692659 1.672692659
3 2.051756990 2.051756990 2.051756990
4 2.444231072 2.444231072 2.444231072
5 2.834240148 2.834199636 2.834241554
6 3.202145482 3.202095487 3.202148720
7 3.524655087 3.524595924 3.524660533
8 3.775141261 3.775074728 3.775149304
9 3.924192475 3.924120543 3.924203459
10 3.940420612 3.940346215 3.940434826
11 3.791535483 3.791462325 3.791553128
12 3.445686849 3.445619382 3.445708016
13 2.873060026 2.873003382 2.873084668
14 2.047694688 2.047654547 2.047722600
15 0.949478141 0.949460545 0.949508941
16 −0.433755206 −0.433744107 −0.433722093
17 −2.104106532 −2.104060787 −2.104071874
18 −4.051585254 −4.051499477 −4.051550007
19 −6.252297191 −6.252166980 −6.252262480
20 −8.666988414 −8.666810798 −8.666955499
TABLE 8.20 Solution of Equation (8.151) by Milne’s Predictor–Corrector Method
of Fourth Order
Step  x_i  w_i^pred  w_i^corr
0 1.000000000 1.000000000 1.000000000
1 1.319213234 1.319213234 1.319213234
2 1.672692659 1.672692659 1.672692659
3 2.051756990 2.051756990 2.051756990
4 2.444231072 2.444313815 2.444232221
5 2.834240148 2.834284533 2.834241933
6 3.202145482 3.202140594 3.202149027
7 3.524655087 3.524591299 3.524660029
8 3.775141261 3.775009983 3.775148704
9 3.924192475 3.923986136 3.924202128
10 3.940420612 3.940134740 3.940433506
11 3.791535483 3.791168274 3.791551327
12 3.445686849 3.445241097 3.445706520
13 2.873060026 2.872542757 2.873083141
14 2.047694688 2.047118841 2.047721888
15 0.949478141 0.948862029 0.949508855
16 −0.433755206 −0.434386680 −0.433720655
17 −2.104106532 −2.104722840 −2.104068978
18 −4.051585254 −4.052149724 −4.051544739
19 −6.252297191 −6.252768644 −6.252254840
20 −8.666988414 −8.667321559 −8.666944624
TABLE 8.21 Solution of Equation (8.151) by Hamming’s Predictor–Corrector Method
of Fourth Order
Step  x_i  w_i^pred  w_i^corr
0 1.000000000 1.000000000 1.000000000
1 1.319213234 1.319213234 1.319213234
2 1.672692659 1.672692659 1.672692659
3 2.051756990 2.051756990 2.051756990
4 2.444231072 2.444313815 2.444229732
5 2.834240148 2.834283869 2.834239485
6 3.202145482 3.202140273 3.202148436
7 3.524655087 3.524590804 3.524665679
8 3.775141261 3.775008426 3.775164256
9 3.924192475 3.923986924 3.924233399
10 3.940420612 3.940141921 3.940485639
11 3.791535483 3.791187804 3.791631306
12 3.445686849 3.445279364 3.445820481
13 2.873060026 2.872607655 2.873238554
14 2.047694688 2.047218551 2.047924973
15 0.949478141 0.949005833 0.949766481
16 −0.433755206 −0.434189654 −0.433403459
17 −2.104106532 −2.104463036
18 −4.051585254 −4.051818467 −4.051096382
19 −6.252297191 −6.252357753 −6.251738441
20 −8.666988414 −8.666824674 −8.666362041
Solution:
1. Theory
Let us consider the rigid solid in Figure 8.2, whose point O moves on the curve Γ, and let $O_0X_0Y_0Z_0$ be a fixed reference system, $Oxyz$ the movable system of the principal axes of inertia, and C the weight center of the rigid solid.
Further on, we use the notations:
• $X_O$, $Y_O$, $Z_O$, the coordinates of the point O in the fixed reference system;
• $\mathbf{r}_C$, the vector $\overrightarrow{OC}$;
• $x_C$, $y_C$, $z_C$, the coordinates of the point C in the system $Oxyz$;
• $m$, the mass of the rigid solid;
• $J_x$, $J_y$, $J_z$, the principal moments of inertia;
• the parametric equations of the curve Γ, given by
$$X_O = f_1(\lambda),\quad Y_O = f_2(\lambda),\quad Z_O = f_3(\lambda),\quad \lambda \in \mathbb{R}; \qquad (8.169)$$
• $Ox_0'y_0'z_0'$, a reference system with the origin at O and with the axes parallel to those of the system $O_0X_0Y_0Z_0$, respectively;
• $\psi$, $\theta$, $\varphi$, the Euler angles, which define the position of the system $Oxyz$ relative to the system $Ox_0'y_0'z_0'$;
• $\mathbf{F}$, the resultant of the forces that act upon the rigid solid;
• $\mathbf{M}_O$, the resultant moment of the given forces at O.
Considering that the parameters $\psi$, $\theta$, $\varphi$, $\lambda$ and their derivatives $\dot\psi$, $\dot\theta$, $\dot\varphi$, $\dot\lambda$, the inertial parameters $m$, $J_x$, $J_y$, $J_z$, $x_C$, $y_C$, $z_C$, and also the torsor of the forces $\{\mathbf{F}, \mathbf{M}_O\}$ at the moment $t = 0$ are known,
Figure 8.2 The rigid solid, a point of which is constrained to move without friction on a given curve.
Figure 8.3 Numerical application.
determine the motion, that is, the functions of time $\psi = \psi(t)$, $\theta = \theta(t)$, $\varphi = \varphi(t)$, $\lambda = \lambda(t)$, $X_O(t)$, $Y_O(t)$, $Z_O(t)$.
The theorem of momentum can be written in the vector form
$$m[\mathbf{a}_O + \boldsymbol{\varepsilon}\times\mathbf{r}_C + \boldsymbol{\omega}\times(\boldsymbol{\omega}\times\mathbf{r}_C)] = \mathbf{F} + N_1\boldsymbol{\nu} + N_2\boldsymbol{\beta}, \qquad (8.170)$$
where
• $\mathbf{a}_O$ is the acceleration of the point O;
• $\boldsymbol{\varepsilon}$ is the angular acceleration of the rigid solid;
• $\boldsymbol{\omega}$ is the angular velocity of the rigid solid;
• $\boldsymbol{\nu}$, $\boldsymbol{\beta}$ are the unit vectors of the principal normal and of the binormal, respectively, to the curve Γ;
• $N_1$, $N_2$ are the reactions in the direction of the principal normal and in the direction of the binormal, respectively, to the curve Γ.
The theorem of moment of momentum relative to the point O, in the vector form, reads
$$\mathbf{r}_C\times m\mathbf{a}_O + [J_x\varepsilon_x - (J_y - J_z)\omega_y\omega_z]\mathbf{i} + [J_y\varepsilon_y - (J_z - J_x)\omega_z\omega_x]\mathbf{j} + [J_z\varepsilon_z - (J_x - J_y)\omega_x\omega_y]\mathbf{k} = \mathbf{M}_O, \qquad (8.171)$$
where
• $\omega_x$, $\omega_y$, $\omega_z$ are the projections of the vector $\boldsymbol{\omega}$ onto the axes of the system $Oxyz$;
• $\varepsilon_x$, $\varepsilon_y$, $\varepsilon_z$ are the projections of the vector $\boldsymbol{\varepsilon}$ onto the axes of the system $Oxyz$.
If $\mathbf{T}_1$ is a tangent vector at the point O to the curve Γ, then from relation (8.170), by a dot product of both members by $\mathbf{T}_1$, we can eliminate the reactions $N_1$ and $N_2$, obtaining
$$m\left\{\mathbf{T}_1\cdot\mathbf{a}_O + \mathbf{T}_1\cdot(\boldsymbol{\varepsilon}\times\mathbf{r}_C) + \mathbf{T}_1\cdot[\boldsymbol{\omega}\times(\boldsymbol{\omega}\times\mathbf{r}_C)]\right\} = \mathbf{T}_1\cdot\mathbf{F}. \qquad (8.172)$$
As we shall see, the system consisting of equations (8.171) and (8.172) can be transformed into a system of eight first-order differential equations, from which the parameters $\psi$, $\theta$, $\varphi$, $\lambda$ are finally deduced.
To pass from the system $O_0X_0Y_0Z_0$ to the system $Oxyz$, the rotation matrix $[R]$ is written in the form
$$[R] = [\varphi][\theta][\psi], \qquad (8.173)$$
where
$$[\varphi] = \begin{pmatrix} \cos\varphi & \sin\varphi & 0\\ -\sin\varphi & \cos\varphi & 0\\ 0 & 0 & 1 \end{pmatrix},\quad [\theta] = \begin{pmatrix} 1 & 0 & 0\\ 0 & \cos\theta & \sin\theta\\ 0 & -\sin\theta & \cos\theta \end{pmatrix},\quad [\psi] = \begin{pmatrix} \cos\psi & \sin\psi & 0\\ -\sin\psi & \cos\psi & 0\\ 0 & 0 & 1 \end{pmatrix}. \qquad (8.174)$$
The vector $\mathbf{T}_1$, tangent to the curve, and the acceleration $\mathbf{a}_O$ have the matrix expressions
$$\{T_1\} = \begin{pmatrix} T_{1x}\\ T_{1y}\\ T_{1z} \end{pmatrix} = \begin{pmatrix} f_1'(\lambda)\\ f_2'(\lambda)\\ f_3'(\lambda) \end{pmatrix}, \qquad (8.175)$$
$$\{a_O\} = \begin{pmatrix} a_{Ox}\\ a_{Oy}\\ a_{Oz} \end{pmatrix} = \ddot{\lambda}\{T_1\} + \dot{\lambda}^2\{T_2\}, \qquad (8.176)$$
where
$$\{T_2\} = \begin{pmatrix} T_{2x}\\ T_{2y}\\ T_{2z} \end{pmatrix} = \begin{pmatrix} f_1''(\lambda)\\ f_2''(\lambda)\\ f_3''(\lambda) \end{pmatrix}, \qquad (8.177)$$
in the system $O_0X_0Y_0Z_0$. On the basis of these notations, we calculate the dot product $m\mathbf{T}_1\cdot\mathbf{a}_O$ and obtain
$$m\mathbf{T}_1\cdot\mathbf{a}_O = \ddot{\lambda}A_{14} + \dot{\lambda}^2 A_{15}, \qquad (8.178)$$
where
$$A_{14} = m(T_{1x}^2 + T_{1y}^2 + T_{1z}^2),\quad A_{15} = m(T_{1x}T_{2x} + T_{1y}T_{2y} + T_{1z}T_{2z}). \qquad (8.179)$$
490 INTEGRATION OF ORDINARY DIFFERENTIAL EQUATIONS
Further on, the calculation is made in the system Oxyz because the vectors ε, ω, rC, F, MO are
represented in this system. Hence, we calculate successively
$$\{T_1^*\} = [R]\{T_1\}, \qquad \{T_2^*\} = [R]\{T_2\}, \quad (8.180)$$
$$\{a_O^*\} = \ddot\lambda\{T_1^*\} + \dot\lambda^2\{T_2^*\}. \quad (8.181)$$
The components ωx, ωy, ωz of the angular velocity are given by the relations
$$\omega_x = \dot\psi\sin\theta\sin\varphi + \dot\theta\cos\varphi, \quad \omega_y = \dot\psi\sin\theta\cos\varphi - \dot\theta\sin\varphi, \quad \omega_z = \dot\psi\cos\theta + \dot\varphi, \quad (8.182)$$
from which it follows that
$$\dot\psi = \frac{1}{\sin\theta}(\omega_x\sin\varphi + \omega_y\cos\varphi), \quad \dot\theta = \omega_x\cos\varphi - \omega_y\sin\varphi, \quad \dot\varphi = \omega_z - \frac{\cos\theta}{\sin\theta}(\omega_x\sin\varphi + \omega_y\cos\varphi). \quad (8.183)$$
Further on, using the matrix notations
$$\{r_C\} = \begin{Bmatrix} x_C \\ y_C \\ z_C \end{Bmatrix}, \quad (8.184)$$
$$[\omega] = \begin{bmatrix} 0 & -\omega_z & \omega_y \\ \omega_z & 0 & -\omega_x \\ -\omega_y & \omega_x & 0 \end{bmatrix}, \quad (8.185)$$
$$\{F\} = \begin{Bmatrix} F_x \\ F_y \\ F_z \end{Bmatrix}, \qquad \{M_O\} = \begin{Bmatrix} M_{Ox} \\ M_{Oy} \\ M_{Oz} \end{Bmatrix}, \quad (8.186)$$
and the scalar notations
$$A_{11} = m(y_C T_{1z}^* - z_C T_{1y}^*), \quad A_{12} = m(z_C T_{1x}^* - x_C T_{1z}^*), \quad A_{13} = m(x_C T_{1y}^* - y_C T_{1x}^*), \quad (8.187)$$
$$B_1 = -A_{15}\dot\lambda^2 - m\{T_1^*\}^T[\omega]^2\{r_C\} + \{T_1^*\}^T\{F\}, \quad (8.188)$$
too, we obtain, from equation (8.172), the relation
$$A_{11}\varepsilon_x + A_{12}\varepsilon_y + A_{13}\varepsilon_z + A_{14}\ddot\lambda = B_1. \quad (8.189)$$
Taking into account relations (8.180) and (8.181), we get for equation (8.171) the matrix formulation
$$\begin{Bmatrix} m\ddot\lambda(y_C T_{1z}^* - z_C T_{1y}^*) + m\dot\lambda^2(y_C T_{2z}^* - z_C T_{2y}^*) + J_x\varepsilon_x - (J_y - J_z)\omega_y\omega_z \\ m\ddot\lambda(z_C T_{1x}^* - x_C T_{1z}^*) + m\dot\lambda^2(z_C T_{2x}^* - x_C T_{2z}^*) + J_y\varepsilon_y - (J_z - J_x)\omega_z\omega_x \\ m\ddot\lambda(x_C T_{1y}^* - y_C T_{1x}^*) + m\dot\lambda^2(x_C T_{2y}^* - y_C T_{2x}^*) + J_z\varepsilon_z - (J_x - J_y)\omega_x\omega_y \end{Bmatrix} = \begin{Bmatrix} M_{Ox} \\ M_{Oy} \\ M_{Oz} \end{Bmatrix}; \quad (8.190)$$
using the scalar notations
$$B_2 = M_{Ox} - m\dot\lambda^2(y_C T_{2z}^* - z_C T_{2y}^*) + (J_y - J_z)\omega_y\omega_z,$$
$$B_3 = M_{Oy} - m\dot\lambda^2(z_C T_{2x}^* - x_C T_{2z}^*) + (J_z - J_x)\omega_z\omega_x,$$
$$B_4 = M_{Oz} - m\dot\lambda^2(x_C T_{2y}^* - y_C T_{2x}^*) + (J_x - J_y)\omega_x\omega_y, \quad (8.191)$$
we get the system
$$A_{11}\ddot\lambda + J_x\varepsilon_x = B_2, \qquad A_{12}\ddot\lambda + J_y\varepsilon_y = B_3, \qquad A_{13}\ddot\lambda + J_z\varepsilon_z = B_4. \quad (8.192)$$
Equations (8.189) and (8.192) form a linear system of four equations with the four unknowns λ̈, εx, εy,
εz. Finally, if we denote
$$C = \frac{B_1 - \dfrac{A_{11}B_2}{J_x} - \dfrac{A_{12}B_3}{J_y} - \dfrac{A_{13}B_4}{J_z}}{A_{14} - \dfrac{A_{11}^2}{J_x} - \dfrac{A_{12}^2}{J_y} - \dfrac{A_{13}^2}{J_z}}, \quad (8.193)$$
then we obtain, from equations (8.189) and (8.192), the system of four differential equations
$$\ddot\lambda = C, \quad \varepsilon_x = \frac{1}{J_x}(B_2 - A_{11}C), \quad \varepsilon_y = \frac{1}{J_y}(B_3 - A_{12}C), \quad \varepsilon_z = \frac{1}{J_z}(B_4 - A_{13}C). \quad (8.194)$$
To determine the parameters involved in the problem, we have to couple the equations
of the kinematic system (8.183) with the equations of system (8.194). This results in a system
of seven differential equations of first and second order. To apply the fourth-order Runge–Kutta
method, the system must contain only first-order differential equations. With the notations
$$\lambda = \xi_1, \quad \psi = \xi_2, \quad \theta = \xi_3, \quad \varphi = \xi_4, \quad \dot\lambda = \xi_5, \quad \omega_x = \xi_6, \quad \omega_y = \xi_7, \quad \omega_z = \xi_8, \quad (8.195)$$
we obtain, from relations (8.183) and (8.194), the following system of eight first-order differential
equations
$$\dot\xi_1 = \xi_5, \quad \dot\xi_2 = \frac{1}{\sin\xi_3}(\xi_6\sin\xi_4 + \xi_7\cos\xi_4), \quad \dot\xi_3 = \xi_6\cos\xi_4 - \xi_7\sin\xi_4,$$
$$\dot\xi_4 = \xi_8 - \frac{\cos\xi_3}{\sin\xi_3}(\xi_6\sin\xi_4 + \xi_7\cos\xi_4),$$
$$\dot\xi_5 = C, \quad \dot\xi_6 = \frac{1}{J_x}(B_2 - A_{11}C), \quad \dot\xi_7 = \frac{1}{J_y}(B_3 - A_{12}C), \quad \dot\xi_8 = \frac{1}{J_z}(B_4 - A_{13}C). \quad (8.196)$$
Taking into account that the initial conditions are known (or can be deduced), we choose the
integration step Δt and apply the fourth-order Runge–Kutta method to determine the numerical
results. At each step of the method, we calculate successively (a sketch in code follows this list):
• the matrices {T1} and {T2}, with relations (8.175) and (8.177);
• the parameters A14 and A15, with relations (8.179);
• the rotation matrix, with relations (8.173) and (8.174);
• the matrices {T1*} and {T2*}, with relations (8.180);
• the matrix [ω], with relation (8.185);
• the expression B1, with relation (8.188);
• the parameters B2, B3, B4, with relations (8.191);
• the parameter C, with relation (8.193).
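To fix ideas, here is a minimal Python sketch of one right-hand-side evaluation of system (8.196), in the order listed above. The callables T1_of, T2_of (the derivatives of the fi), F_of, MO_of and the argument layout are assumptions of the sketch, standing in for the concrete data of the application.

```python
import numpy as np

def euler_rotation(psi, theta, phi):
    """Rotation matrix [R] = [phi][theta][psi] of relations (8.173)-(8.174)."""
    cps, sps = np.cos(psi), np.sin(psi)
    ct, st = np.cos(theta), np.sin(theta)
    cph, sph = np.cos(phi), np.sin(phi)
    Mpsi = np.array([[cps, sps, 0], [-sps, cps, 0], [0, 0, 1]])
    Mth = np.array([[1, 0, 0], [0, ct, st], [0, -st, ct]])
    Mphi = np.array([[cph, sph, 0], [-sph, cph, 0], [0, 0, 1]])
    return Mphi @ Mth @ Mpsi

def rhs(xi, m, J, rC, T1_of, T2_of, F_of, MO_of):
    """Right-hand side of the eight first-order equations (8.196).

    xi = (lambda, psi, theta, phi, dlambda, wx, wy, wz); J = (Jx, Jy, Jz)."""
    lam, psi, theta, phi, dlam, wx, wy, wz = xi
    Jx, Jy, Jz = J
    T1, T2 = T1_of(lam), T2_of(lam)            # relations (8.175), (8.177)
    A14, A15 = m * T1 @ T1, m * T1 @ T2        # relation (8.179)
    R = euler_rotation(psi, theta, phi)        # relations (8.173)-(8.174)
    T1s, T2s = R @ T1, R @ T2                  # relation (8.180)
    Om = np.array([[0, -wz, wy], [wz, 0, -wx], [-wy, wx, 0]])   # (8.185)
    F, MO = F_of(xi), MO_of(xi)
    B1 = -A15 * dlam**2 - m * T1s @ (Om @ Om @ rC) + T1s @ F    # (8.188)
    A11 = m * (rC[1] * T1s[2] - rC[2] * T1s[1])                 # (8.187)
    A12 = m * (rC[2] * T1s[0] - rC[0] * T1s[2])
    A13 = m * (rC[0] * T1s[1] - rC[1] * T1s[0])
    B2 = MO[0] - m * dlam**2 * (rC[1] * T2s[2] - rC[2] * T2s[1]) + (Jy - Jz) * wy * wz
    B3 = MO[1] - m * dlam**2 * (rC[2] * T2s[0] - rC[0] * T2s[2]) + (Jz - Jx) * wz * wx
    B4 = MO[2] - m * dlam**2 * (rC[0] * T2s[1] - rC[1] * T2s[0]) + (Jx - Jy) * wx * wy
    C = (B1 - A11 * B2 / Jx - A12 * B3 / Jy - A13 * B4 / Jz) / (
        A14 - A11**2 / Jx - A12**2 / Jy - A13**2 / Jz)          # (8.193)
    return np.array([dlam,
                     (wx * np.sin(phi) + wy * np.cos(phi)) / np.sin(theta),
                     wx * np.cos(phi) - wy * np.sin(phi),
                     wz - (wx * np.sin(phi) + wy * np.cos(phi)) / np.tan(theta),
                     C,
                     (B2 - A11 * C) / Jx,
                     (B3 - A12 * C) / Jy,
                     (B4 - A13 * C) / Jz])
```

A standard fourth-order Runge–Kutta step then combines four such evaluations of rhs per integration step Δt.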
2. Numerical calculation
The principal axes of inertia determine the system Oxyz, where Ox ∥ BD, Oy ∥ AB, Oz ∥ AA′. In
this reference frame, the co-ordinates of the center of gravity C are
$$x_C = 0, \qquad y_C = 0, \qquad z_C = -\frac{3}{2}l. \quad (8.197)$$
The principal moments of inertia read
$$J_x = m z_C^2 + \frac{ml^2}{6} = \frac{29}{12}ml^2, \qquad J_y = m z_C^2 + \frac{ml^2}{6} = \frac{29}{12}ml^2, \qquad J_z = \frac{ml^2}{6}. \quad (8.198)$$
The force F is given by the weight G, which, in the system O0X0Y0Z0, has the expression
$$\{F_0\} = mg\begin{Bmatrix} 0 \\ 0 \\ -1 \end{Bmatrix}. \quad (8.199)$$
The rotation matrix reads
$$[R] = \begin{bmatrix} c\xi_2 c\xi_4 - s\xi_2 c\xi_3 s\xi_4 & s\xi_2 c\xi_4 + c\xi_2 c\xi_3 s\xi_4 & s\xi_3 s\xi_4 \\ -c\xi_2 s\xi_4 - s\xi_2 c\xi_3 c\xi_4 & -s\xi_2 s\xi_4 + c\xi_2 c\xi_3 c\xi_4 & s\xi_3 c\xi_4 \\ s\xi_2 s\xi_3 & -c\xi_2 s\xi_3 & c\xi_3 \end{bmatrix}, \quad (8.200)$$
from which, if we take into account the relation
$$\{F\} = [R]\{F_0\}, \quad (8.201)$$
we obtain the expression
$$\{F\} = -mg\begin{Bmatrix} \sin\xi_3\sin\xi_4 \\ \sin\xi_3\cos\xi_4 \\ \cos\xi_3 \end{Bmatrix}. \quad (8.202)$$
For the moment MO = OC × F, we obtain the matrix representation
$$\{M_O\} = \frac{3}{2}mgl\sin\xi_3\begin{Bmatrix} -\cos\xi_4 \\ \sin\xi_4 \\ 0 \end{Bmatrix}. \quad (8.203)$$
The graphic results obtained from the simulation are shown in Figure 8.4.
This problem may also be solved by a method of multibody type, as will be seen in Problem 8.4;
in that case it is necessary to solve an algebraic-differential system of equations, with the advantage
of obtaining the reactions at the same time.
Problem 8.2
Study the motion of a rigid solid having a point constrained to move without friction on a given
surface (Fig. 8.5). As numerical application, let us consider the body (Fig. 8.6) formed by a
homogeneous cube ABDEA′B′D′E′ of mass m and edge l and a bar OG of length l and negligible
mass, G being the center of the square ABDE. The point O moves without friction on the plane of
equations
$$X_0 = \xi_1, \qquad Y_0 = \xi_2, \qquad Z_0 = l - \xi_1 - \xi_2. \quad (8.204)$$
Figure 8.4 Results of the simulation: (a) ξ1 (= λ) (m); (b) ξ2 (= ψ) (rad); (c) ξ3 (= θ) (rad); (d) ξ4 (= φ) (rad); (e) ξ5 (= λ̇) (m s−1); (f) ξ6 (= ωx) (rad s−1); (g) ξ7 (= ωy) (rad s−1); (h) ξ8 (= ωz) (rad s−1); all versus t (s).
Figure 8.5 The rigid solid with a point constrained to move without friction on a given surface Σ.
Figure 8.6 Numerical application.
Knowing that
$$m = 12\ \mathrm{kg}, \qquad l = 0.1\ \mathrm{m}, \quad (8.205)$$
and that the initial conditions (for t = 0) are given by
$$\xi_1 = 0\ \mathrm{m}, \quad \xi_2 = 0\ \mathrm{m}, \quad \dot\xi_1 = 0\ \mathrm{m\,s^{-1}}, \quad \dot\xi_2 = 0\ \mathrm{m\,s^{-1}}, \quad \psi = 0\ \mathrm{rad}, \quad \theta = 0.001\ \mathrm{rad}, \quad \varphi = 0\ \mathrm{rad},$$
$$\omega_x = 0\ \mathrm{rad\,s^{-1}}, \quad \omega_y = 0\ \mathrm{rad\,s^{-1}}, \quad \omega_z = 0\ \mathrm{rad\,s^{-1}}, \quad (8.206)$$
we seek the graphical representation of the variations of the variables ξi(t), i = 1, 2, . . . , 10.
Solution:
1. Theory
Let us consider a rigid solid (Fig. 8.5), the point O of which is constrained to move on the
surface Σ. We shall consider
• the three-orthogonal system O0XYZ;
• the three-orthogonal system Oxyz of the principal axes of inertia, relative to the point O of
the rigid solid;
• the three-orthogonal system Ox′y′z′ having the axes parallel to those of the three-orthogonal
system O0XYZ.
The following are known:
– the equations of the surface Σ,
$$X = X(\xi_1, \xi_2), \qquad Y = Y(\xi_1, \xi_2), \qquad Z = Z(\xi_1, \xi_2), \quad (8.207)$$
where ξ1 and ξ2 are two real parameters;
– the mass m and the principal moments of inertia Jx, Jy, Jz of the rigid solid;
– the resultant of the given forces F(Fx, Fy, Fz) and the resultant moment M(Mx, My, Mz)
of the given forces;
– the position vector of the center of gravity rC(xC, yC, zC).
In addition, we shall define the Euler angles:
$$\psi = \xi_3, \qquad \theta = \xi_4, \qquad \varphi = \xi_5. \quad (8.208)$$
We wish to determine
• the motion and the functions of time ξi = ξi(t), i = 1, 2, . . . , 5;
• the normal reaction N = N(t).
Applying the theorem of momentum in the form of the theorem of gravity center’s motion, we
obtain the vector relation
m[aO + ε × rC + ω × (ω × rC)] = F + N. (8.209)
The theorem of the moment of momentum leads to
mrC × aO + Jε + ω × Jω = M. (8.210)
The passing from the fixed system O0XYZ to the movable system Oxyz, rigidly linked to the
rigid solid, is made by the matrix
$$[P] = [\varphi][\theta][\psi] = [\xi_5][\xi_4][\xi_3], \quad (8.211)$$
where
$$[\varphi] = \begin{bmatrix} \cos\varphi & \sin\varphi & 0 \\ -\sin\varphi & \cos\varphi & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad [\theta] = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & \sin\theta \\ 0 & -\sin\theta & \cos\theta \end{bmatrix}, \quad [\psi] = \begin{bmatrix} \cos\psi & \sin\psi & 0 \\ -\sin\psi & \cos\psi & 0 \\ 0 & 0 & 1 \end{bmatrix}. \quad (8.212)$$
Making the calculation, we find
$$[P] = \begin{bmatrix} c\xi_3 c\xi_5 - s\xi_3 c\xi_4 s\xi_5 & s\xi_3 c\xi_5 + c\xi_3 c\xi_4 s\xi_5 & s\xi_4 s\xi_5 \\ -c\xi_3 s\xi_5 - s\xi_3 c\xi_4 c\xi_5 & -s\xi_3 s\xi_5 + c\xi_3 c\xi_4 c\xi_5 & s\xi_4 c\xi_5 \\ s\xi_3 s\xi_4 & -c\xi_3 s\xi_4 & c\xi_4 \end{bmatrix}, \quad (8.213)$$
where the cosine and sine functions are abbreviated by c and s, respectively.
We make the following notations:
$$\dot\xi_1 = \xi_6, \quad \dot\xi_2 = \xi_7, \quad \omega_x = \xi_8, \quad \omega_y = \xi_9, \quad \omega_z = \xi_{10}, \quad (8.214)$$
$$\{r\} = \begin{Bmatrix} X \\ Y \\ Z \end{Bmatrix}, \quad \{r_1\} = \begin{Bmatrix} \partial X/\partial\xi_1 \\ \partial Y/\partial\xi_1 \\ \partial Z/\partial\xi_1 \end{Bmatrix}, \quad \{r_2\} = \begin{Bmatrix} \partial X/\partial\xi_2 \\ \partial Y/\partial\xi_2 \\ \partial Z/\partial\xi_2 \end{Bmatrix}, \quad (8.215)$$
$$\{r_{11}\} = \begin{Bmatrix} \partial^2 X/\partial\xi_1^2 \\ \partial^2 Y/\partial\xi_1^2 \\ \partial^2 Z/\partial\xi_1^2 \end{Bmatrix}, \quad \{r_{12}\} = \begin{Bmatrix} \partial^2 X/\partial\xi_1\partial\xi_2 \\ \partial^2 Y/\partial\xi_1\partial\xi_2 \\ \partial^2 Z/\partial\xi_1\partial\xi_2 \end{Bmatrix}, \quad \{r_{22}\} = \begin{Bmatrix} \partial^2 X/\partial\xi_2^2 \\ \partial^2 Y/\partial\xi_2^2 \\ \partial^2 Z/\partial\xi_2^2 \end{Bmatrix}, \quad (8.216)$$
$$\{R_1\} = [P]\{r_1\}, \quad \{R_2\} = [P]\{r_2\}, \quad \{R_{11}\} = [P]\{r_{11}\}, \quad \{R_{12}\} = [P]\{r_{12}\}, \quad \{R_{22}\} = [P]\{r_{22}\}, \quad (8.217)$$
$$\{r_C\} = \begin{Bmatrix} x_C \\ y_C \\ z_C \end{Bmatrix}, \quad \{\omega\} = \begin{Bmatrix} \xi_8 \\ \xi_9 \\ \xi_{10} \end{Bmatrix}, \quad \{\varepsilon\} = \begin{Bmatrix} \dot\xi_8 \\ \dot\xi_9 \\ \dot\xi_{10} \end{Bmatrix}, \quad (8.218)$$
$$[r_C] = \begin{bmatrix} 0 & -z_C & y_C \\ z_C & 0 & -x_C \\ -y_C & x_C & 0 \end{bmatrix}, \quad [\omega] = \begin{bmatrix} 0 & -\xi_{10} & \xi_9 \\ \xi_{10} & 0 & -\xi_8 \\ -\xi_9 & \xi_8 & 0 \end{bmatrix}, \quad [J] = \begin{bmatrix} J_x & 0 & 0 \\ 0 & J_y & 0 \\ 0 & 0 & J_z \end{bmatrix}, \quad (8.219)$$
$$\{a_O\} = \begin{Bmatrix} a_{Ox} & a_{Oy} & a_{Oz} \end{Bmatrix}^T, \qquad \{A_O\} = [P]\{a_O\}. \quad (8.220)$$
Considering that {r1} ⊥ N, {r2} ⊥ N, that aO is expressed in the system O0XYZ, and that ε, rC,
ω, F are expressed in the system Oxyz, from equation (8.209) there result the matrix relations
$$m\{r_1\}^T\{a_O\} + m\{\varepsilon\}^T[r_C]\{R_1\} = \{R_1\}^T\{F\} - m\{R_1\}^T[\omega]^2\{r_C\},$$
$$m\{r_2\}^T\{a_O\} + m\{\varepsilon\}^T[r_C]\{R_2\} = \{R_2\}^T\{F\} - m\{R_2\}^T[\omega]^2\{r_C\}, \quad (8.221)$$
where
$$\{a_O\} = \{r_1\}\ddot\xi_1 + \{r_2\}\ddot\xi_2 + \{r_{11}\}\dot\xi_1^2 + \{r_{22}\}\dot\xi_2^2 + 2\{r_{12}\}\dot\xi_1\dot\xi_2 \quad (8.222)$$
or
$$\{a_O\} = \{r_1\}\dot\xi_6 + \{r_2\}\dot\xi_7 + \{r_{11}\}\xi_6^2 + \{r_{22}\}\xi_7^2 + 2\{r_{12}\}\xi_6\xi_7. \quad (8.223)$$
It follows that
$$\{A_O\} = \{R_1\}\dot\xi_6 + \{R_2\}\dot\xi_7 + \{R_{11}\}\xi_6^2 + \{R_{22}\}\xi_7^2 + 2\{R_{12}\}\xi_6\xi_7 \quad (8.224)$$
too.
We denote
$$A_{11} = m\{r_1\}^T\{r_1\}, \quad A_{12} = m\{r_1\}^T\{r_2\}, \quad A_{13} = m(y_C R_{1z} - z_C R_{1y}), \quad A_{14} = m(z_C R_{1x} - x_C R_{1z}),$$
$$A_{15} = m(x_C R_{1y} - y_C R_{1x}), \quad A_{21} = m\{r_2\}^T\{r_1\}, \quad A_{22} = m\{r_2\}^T\{r_2\}, \quad A_{23} = m(y_C R_{2z} - z_C R_{2y}),$$
$$A_{24} = m(z_C R_{2x} - x_C R_{2z}), \quad A_{25} = m(x_C R_{2y} - y_C R_{2x}), \quad (8.225)$$
$$B_1 = \{R_1\}^T\{F\} - m\{R_1\}^T[\omega]^2\{r_C\} - m\{r_1\}^T\{r_{11}\}\xi_6^2 - m\{r_1\}^T\{r_{22}\}\xi_7^2 - 2m\{r_1\}^T\{r_{12}\}\xi_6\xi_7,$$
$$B_2 = \{R_2\}^T\{F\} - m\{R_2\}^T[\omega]^2\{r_C\} - m\{r_2\}^T\{r_{11}\}\xi_6^2 - m\{r_2\}^T\{r_{22}\}\xi_7^2 - 2m\{r_2\}^T\{r_{12}\}\xi_6\xi_7. \quad (8.226)$$
From equation (8.209) we obtain the equations
$$A_{11}\dot\xi_6 + A_{12}\dot\xi_7 + A_{13}\dot\xi_8 + A_{14}\dot\xi_9 + A_{15}\dot\xi_{10} = B_1,$$
$$A_{21}\dot\xi_6 + A_{22}\dot\xi_7 + A_{23}\dot\xi_8 + A_{24}\dot\xi_9 + A_{25}\dot\xi_{10} = B_2. \quad (8.227)$$
In matrix form, relation (8.210) reads
$$m[r_C]\{A_O\} + [J]\{\varepsilon\} + [\omega][J]\{\omega\} = \{M\} \quad (8.228)$$
or
$$m[r_C]\{R_1\}\dot\xi_6 + m[r_C]\{R_2\}\dot\xi_7 + [J]\{\varepsilon\} = \{M\} - [\omega][J]\{\omega\} - m[r_C]\{R_{11}\}\xi_6^2 - m[r_C]\{R_{22}\}\xi_7^2 - 2m[r_C]\{R_{12}\}\xi_6\xi_7. \quad (8.229)$$
If we denote
$$B_3 = M_x + (J_y - J_z)\xi_9\xi_{10} - m(y_C R_{11z} - z_C R_{11y})\xi_6^2 - m(y_C R_{22z} - z_C R_{22y})\xi_7^2 - 2m(y_C R_{12z} - z_C R_{12y})\xi_6\xi_7,$$
$$B_4 = M_y + (J_z - J_x)\xi_{10}\xi_8 - m(z_C R_{11x} - x_C R_{11z})\xi_6^2 - m(z_C R_{22x} - x_C R_{22z})\xi_7^2 - 2m(z_C R_{12x} - x_C R_{12z})\xi_6\xi_7,$$
$$B_5 = M_z + (J_x - J_y)\xi_8\xi_9 - m(x_C R_{11y} - y_C R_{11x})\xi_6^2 - m(x_C R_{22y} - y_C R_{22x})\xi_7^2 - 2m(x_C R_{12y} - y_C R_{12x})\xi_6\xi_7, \quad (8.230)$$
then we obtain the system
$$A_{13}\dot\xi_6 + A_{23}\dot\xi_7 + J_x\dot\xi_8 = B_3, \qquad A_{14}\dot\xi_6 + A_{24}\dot\xi_7 + J_y\dot\xi_9 = B_4, \qquad A_{15}\dot\xi_6 + A_{25}\dot\xi_7 + J_z\dot\xi_{10} = B_5. \quad (8.231)$$
Solving the linear system formed by equations (8.227) and (8.231), it follows that
$$\dot\xi_i = D_i, \qquad i = 6, 7, \ldots, 10. \quad (8.232)$$
From the known relations
$$\omega_x = \dot\psi\sin\theta\sin\varphi + \dot\theta\cos\varphi, \quad \omega_y = \dot\psi\sin\theta\cos\varphi - \dot\theta\sin\varphi, \quad \omega_z = \dot\psi\cos\theta + \dot\varphi, \quad (8.233)$$
which form a system of three equations with the unknowns ψ̇, θ̇, and φ̇, it follows that
$$\dot\psi = \frac{1}{\sin\theta}(\omega_x\sin\varphi + \omega_y\cos\varphi), \quad \dot\theta = \omega_x\cos\varphi - \omega_y\sin\varphi, \quad \dot\varphi = \omega_z - \frac{\cos\theta}{\sin\theta}(\omega_x\sin\varphi + \omega_y\cos\varphi). \quad (8.234)$$
With the notations
$$D_1 = \xi_6, \quad D_2 = \xi_7, \quad D_3 = \frac{1}{\sin\xi_4}(\xi_8\sin\xi_5 + \xi_9\cos\xi_5), \quad D_4 = \xi_8\cos\xi_5 - \xi_9\sin\xi_5,$$
$$D_5 = \xi_{10} - \frac{\cos\xi_4}{\sin\xi_4}(\xi_8\sin\xi_5 + \xi_9\cos\xi_5), \quad (8.235)$$
it results in the system of first-order differential equations
$$\dot\xi_i = D_i, \qquad i = 1, 2, \ldots, 10. \quad (8.236)$$
To apply the fourth-order Runge–Kutta method, it is necessary to execute the following calculations
at each step (a sketch in code follows this list):
• the rotation matrix, with relation (8.213);
• {r1}, {r2}, {r11}, {r12}, {r22}, with relations (8.215) and (8.216);
• {R1}, {R2}, {R11}, {R12}, {R22}, with relations (8.217);
• {rC}, {ω}, [rC], [ω], with relations (8.218) and (8.219);
• A11, A12, A13, A14, A15, A21, A22, A23, A24, A25, B1, B2, with relations (8.225) and (8.226);
• B3, B4, B5, with relations (8.230);
• the solution of the linear system formed by equations (8.227) and (8.231), obtaining the
parameters Di, i = 6, 7, . . . , 10;
• Di, i = 1, 2, . . . , 5, with relations (8.235).
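As an illustration, a minimal sketch of the final linear solve that produces D6, ..., D10; the coefficients are assumed to have been evaluated beforehand as in the list above, and the dictionary layout for the Aij is an assumption of the sketch.

```python
import numpy as np

def solve_D6_to_D10(A, J, B):
    """Solve the 5x5 linear system (8.227) + (8.231) for (D6, ..., D10).

    A: dict of coefficients A11..A25; J = (Jx, Jy, Jz); B = (B1, ..., B5)."""
    Jx, Jy, Jz = J
    M = np.array([
        [A['11'], A['12'], A['13'], A['14'], A['15']],   # first equation (8.227)
        [A['21'], A['22'], A['23'], A['24'], A['25']],   # second equation (8.227)
        [A['13'], A['23'], Jx, 0.0, 0.0],                # first equation (8.231)
        [A['14'], A['24'], 0.0, Jy, 0.0],                # second equation (8.231)
        [A['15'], A['25'], 0.0, 0.0, Jz],                # third equation (8.231)
    ])
    return np.linalg.solve(M, np.asarray(B))
```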
2. Numerical calculation
Proceeding as in the previous application, we get
– the co-ordinates of the center of gravity C of the body,
$$x_C = 0, \qquad y_C = 0, \qquad z_C = -\frac{3}{2}l; \quad (8.237)$$
– the principal moments of inertia,
$$J_x = m z_C^2 + \frac{ml^2}{6} = \frac{29}{12}ml^2, \qquad J_y = m z_C^2 + \frac{ml^2}{6} = \frac{29}{12}ml^2, \qquad J_z = \frac{ml^2}{6}; \quad (8.238)$$
– the rotation matrix,
$$[P] = \begin{bmatrix} c\xi_3 c\xi_5 - s\xi_3 c\xi_4 s\xi_5 & s\xi_3 c\xi_5 + c\xi_3 c\xi_4 s\xi_5 & s\xi_4 s\xi_5 \\ -c\xi_3 s\xi_5 - s\xi_3 c\xi_4 c\xi_5 & -s\xi_3 s\xi_5 + c\xi_3 c\xi_4 c\xi_5 & s\xi_4 c\xi_5 \\ s\xi_3 s\xi_4 & -c\xi_3 s\xi_4 & c\xi_4 \end{bmatrix}; \quad (8.239)$$
– the matrix expression of the force F, in the system Oxyz,
$$\{F\} = -mg\begin{Bmatrix} \sin\xi_4\sin\xi_5 \\ \sin\xi_4\cos\xi_5 \\ \cos\xi_4 \end{Bmatrix}; \quad (8.240)$$
– the matrix expression of the moment MO = OC × F,
$$\{M_O\} = \frac{3}{2}mgl\sin\xi_4\begin{Bmatrix} -\cos\xi_5 \\ \sin\xi_5 \\ 0 \end{Bmatrix}. \quad (8.241)$$
Integrating the obtained system of differential equations by the fourth-order Runge–Kutta
method, we get the numerical results plotted in the diagrams of Figure 8.7.
This problem may also be solved by a multibody-type method, as seen in Problem 8.3; in this
case too we have to solve an algebraic-differential system of equations, with the advantage of
obtaining the reactions at the same time.
Problem 8.3
We consider the parallelepiped ABCDA′B′C′D′ (Fig. 8.8) of dimensions AD = 2a, AB = 2b,
BB′ = 2c and of mass m, with the vertex A situated without friction on the cylindrical surface
$$Z = 1 - X^2. \quad (8.242)$$
Knowing that the parallelepiped is acted on only by its own weight mg, while the O0Z-axis is
vertical, with the initial conditions t = 0, $X_O = X_O^0$, $Y_O = Y_O^0$, $Z_O = Z_O^0$, $\psi = \psi^0$, $\theta = \theta^0$, $\varphi = \varphi^0$,
O being the center of gravity, and ψ, θ, φ being Bryan's angles, let us determine
Figure 8.7 Results of the simulation: (a) ξ1 (m); (b) ξ2 (m); (c) ξ3 (= ψ) (rad); (d) ξ4 (= θ) (rad); (e) ξ5 (= φ) (rad); (f) ξ6 (= ξ̇1) (m s−1); (g) ξ7 (= ξ̇2) (m s−1); (h) ξ8 (= ωx) (rad s−1); (i) ξ9 (= ωy) (rad s−1); (j) ξ10 (= ωz) (rad s−1); all versus t (s).
• the trajectory of the point A;
• the trajectory of the point O;
• the reaction at A.
Numerical application for a = 0.3 m, b = 0.2 m, c = 0.1 m, $X_O^0 = 0.1$ m, $Y_O^0 = 0.2$ m, $Z_O^0 = 0.74$ m,
m = 100 kg, $\psi^0 = 0$ rad, $\theta^0 = 0$ rad, $\varphi^0 = 0$ rad, $\dot\psi^0 = 0$ rad s−1, $\dot\theta^0 = 0$ rad s−1, $\dot\varphi^0 = 0$ rad s−1.
Solution:
1. Theory
1.1. Kinematic relations
We consider the frame of reference Oxyz rigidly linked to the parallelepiped, the axes
Ox, Oy, Oz being parallel to AD, AB, BB′, respectively, and the frame of reference
Ox′y′z′ with the axes Ox′, Oy′, Oz′ parallel to the axes O0X, O0Y, O0Z.
If, from the position Ox′y′z′, we attain the position Oxyz by successive rotations of
angles ψ, θ, φ, as specified in the schema
$$Ox'y'z' \xrightarrow[\text{angle }\psi]{\text{axis }Ox} Ox''y''z'' \xrightarrow[\text{angle }\theta]{\text{axis }Oy} Ox'''y'''z''' \xrightarrow[\text{angle }\varphi]{\text{axis }Oz} Oxyz,$$
Figure 8.8 Problem 8.3.
where ψ, θ, φ are Bryan's angles, then the partial rotation matrices are
$$[\psi] = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\psi & -\sin\psi \\ 0 & \sin\psi & \cos\psi \end{bmatrix}, \quad [\theta] = \begin{bmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{bmatrix}, \quad [\varphi] = \begin{bmatrix} \cos\varphi & -\sin\varphi & 0 \\ \sin\varphi & \cos\varphi & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad (8.243)$$
while the matrix of the system Oxyz with respect to the system O0XYZ is
$$[A] = [\psi][\theta][\varphi]. \quad (8.244)$$
Associating to the matrices [ψ], [θ], [φ] the antisymmetric matrices
$$[U_\psi] = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{bmatrix}, \quad [U_\theta] = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{bmatrix}, \quad [U_\varphi] = \begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \quad (8.245)$$
we obtain the derivatives [ψp], [θp], [φp] from the relations
$$[\psi_p] = [U_\psi][\psi] = [\psi][U_\psi], \quad [\theta_p] = [U_\theta][\theta] = [\theta][U_\theta], \quad [\varphi_p] = [U_\varphi][\varphi] = [\varphi][U_\varphi]; \quad (8.246)$$
thus, the partial derivatives [Aψ], [Aθ], [Aφ] of the matrix [A] are
$$[A_\psi] = [U_\psi][A], \quad [A_\theta] = [A][\varphi]^T[U_\theta][\varphi], \quad [A_\varphi] = [A][U_\varphi], \quad (8.247)$$
while the derivative with respect to time of the matrix [A] is
$$[\dot A] = \dot\psi[A_\psi] + \dot\theta[A_\theta] + \dot\varphi[A_\varphi]. \quad (8.248)$$
The square matrix [ω] of the angular velocity with respect to the frame Oxyz is
antisymmetric, and we deduce the relation
$$[\omega] = [A]^T[\dot A] = \begin{bmatrix} 0 & -\omega_z & \omega_y \\ \omega_z & 0 & -\omega_x \\ -\omega_y & \omega_x & 0 \end{bmatrix}, \quad (8.249)$$
from which it follows that
$$\{\omega\} = [Q]\{\dot\beta\}, \quad (8.250)$$
where
$$\{\omega\} = \begin{Bmatrix} \omega_x & \omega_y & \omega_z \end{Bmatrix}^T, \qquad \{\dot\beta\} = \begin{Bmatrix} \dot\psi & \dot\theta & \dot\varphi \end{Bmatrix}^T, \quad (8.251)$$
$$[Q] = \begin{bmatrix} \cos\theta\cos\varphi & \sin\varphi & 0 \\ -\cos\theta\sin\varphi & \cos\varphi & 0 \\ \sin\theta & 0 & 1 \end{bmatrix}. \quad (8.252)$$
Moreover, we obtain
$$[Q_\theta] = \begin{bmatrix} -\sin\theta\cos\varphi & 0 & 0 \\ \sin\theta\sin\varphi & 0 & 0 \\ \cos\theta & 0 & 0 \end{bmatrix}, \quad [Q_\varphi] = \begin{bmatrix} -\cos\theta\sin\varphi & \cos\varphi & 0 \\ -\cos\theta\cos\varphi & -\sin\varphi & 0 \\ 0 & 0 & 0 \end{bmatrix}, \quad (8.253)$$
$$[\dot Q] = \dot\theta[Q_\theta] + \dot\varphi[Q_\varphi]. \quad (8.254)$$
1.2. The constraints matrix
In the frame of reference Oxyz, the point A has the co-ordinates a, −b, c; denoting
by XA, YA, ZA the co-ordinates of the same point in the frame O0XYZ, we obtain the
matrix equation
$$\begin{Bmatrix} X_A \\ Y_A \\ Z_A \end{Bmatrix} = \begin{Bmatrix} X_O \\ Y_O \\ Z_O \end{Bmatrix} + [A]\begin{Bmatrix} a \\ -b \\ c \end{Bmatrix}, \quad (8.255)$$
or
$$\{R_A\} = \{R_O\} + [A]\{r_A\}, \quad (8.256)$$
where
$$\{R_A\} = \begin{Bmatrix} X_A & Y_A & Z_A \end{Bmatrix}^T, \quad \{R_O\} = \begin{Bmatrix} X_O & Y_O & Z_O \end{Bmatrix}^T, \quad \{r_A\} = \begin{Bmatrix} a & -b & c \end{Bmatrix}^T. \quad (8.257)$$
Writing equation (8.242) in the general form
f (X, Y, Z) = 0, (8.258)
we must verify the relation
f (XA, YA, ZA) = 0. (8.259)
Differentiating equation (8.259) with respect to time, it follows that
$$\{f_p\}^T\begin{Bmatrix} \dot X_A \\ \dot Y_A \\ \dot Z_A \end{Bmatrix} = 0, \quad (8.260)$$
where
$$\{f_p\} = \begin{Bmatrix} \dfrac{\partial f}{\partial X} & \dfrac{\partial f}{\partial Y} & \dfrac{\partial f}{\partial Z} \end{Bmatrix}^T. \quad (8.261)$$
Differentiating relation (8.256) with respect to time and taking into account the
successive relations
$$[\dot A]\{r_A\} = [A][\omega]\{r_A\} = [A][r_A]^T\{\omega\}, \quad (8.262)$$
where
$$[r_A] = \begin{bmatrix} 0 & -c & -b \\ c & 0 & -a \\ b & a & 0 \end{bmatrix}, \quad (8.263)$$
we obtain
$$\{\dot R_A\} = \{\dot R_O\} + [A][r_A]^T[Q]\{\dot\beta\}; \quad (8.264)$$
equation (8.260), with the notations
$$[B] = \{f_p\}^T\begin{bmatrix} [I] & [A][r_A]^T[Q] \end{bmatrix}, \quad (8.265)$$
$$\{q\} = \begin{Bmatrix} X_O & Y_O & Z_O & \psi & \theta & \varphi \end{Bmatrix}^T, \quad (8.266)$$
becomes
$$[B]\{\dot q\} = 0, \quad (8.267)$$
where [B] is the constraints matrix.
1.3. The matrix differential equation of the motion
The kinetic energy T of the rigid solid reads
$$T = \frac{1}{2}m\{\dot R_O\}^T\{\dot R_O\} + \frac{1}{2}\{\omega\}^T[J]\{\omega\}, \quad (8.268)$$
where [J] is the matrix of the moments of inertia with respect to the axes Ox, Oy, Oz,
$$[J] = \begin{bmatrix} J_{xx} & -J_{xy} & -J_{xz} \\ -J_{yx} & J_{yy} & -J_{yz} \\ -J_{zx} & -J_{zy} & J_{zz} \end{bmatrix}. \quad (8.269)$$
In the considered case, Jxy = Jxz = Jyz = 0 and
$$J_{xx} = \frac{m}{3}(b^2 + c^2), \qquad J_{yy} = \frac{m}{3}(a^2 + c^2), \qquad J_{zz} = \frac{m}{3}(a^2 + b^2). \quad (8.270)$$
Applying Lagrange's equations and using the notations
$$[m] = \begin{bmatrix} m & 0 & 0 \\ 0 & m & 0 \\ 0 & 0 & m \end{bmatrix}, \qquad [M] = \begin{bmatrix} [m] & [0] \\ [0] & [Q]^T[J][Q] \end{bmatrix}, \qquad \{F\} = \begin{Bmatrix} 0 & 0 & -mg & 0 & 0 & 0 \end{Bmatrix}^T,$$
$$[\Delta] = \begin{bmatrix} \{\dot\beta\}^T[Q_\psi]^T[J][Q] \\ \{\dot\beta\}^T[Q_\theta]^T[J][Q] \\ \{\dot\beta\}^T[Q_\varphi]^T[J][Q] \end{bmatrix}, \qquad \{F_\beta\} = \left[[\dot Q]^T[J][Q] + [Q]^T[J][\dot Q] + [\Delta]\right]\{\dot\beta\},$$
$$\{\bar F\} = \begin{Bmatrix} 0 & 0 & 0 & \{F_\beta\}^T \end{Bmatrix}^T, \quad (8.271)$$
we obtain the matrix differential equation
$$[M]\{\ddot q\} = \{F\} + \{\bar F\} + \lambda[B]^T. \quad (8.272)$$
Equation (8.272), together with equation (8.267) differentiated with respect to time,
forms the equation
$$\begin{bmatrix} [M] & -[B]^T \\ [B] & 0 \end{bmatrix}\begin{Bmatrix} \{\ddot q\} \\ \lambda \end{Bmatrix} = \begin{Bmatrix} \{F\} + \{\bar F\} \\ -[\dot B]\{\dot q\} \end{Bmatrix}, \quad (8.273)$$
from which we obtain {q̈} and λ; then, by the Runge–Kutta method, we get the new values
for {q} and {q̇}.
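A minimal sketch of one evaluation of equation (8.273), assuming [M], [B], [Ḃ], {F}, and {F̄} have already been assembled at the current step; it forms the bordered matrix and solves for {q̈} and the multiplier(s).

```python
import numpy as np

def accelerations_and_multipliers(M, B, Bdot, F, Fbar, qdot):
    """Solve the bordered system (8.273) for qddot and the multiplier(s).

    M: (6, 6) mass matrix; B, Bdot: (m, 6) constraints matrix and derivative;
    F, Fbar: (6,) generalized force columns; qdot: (6,) velocities."""
    n, m = M.shape[0], B.shape[0]
    lhs = np.block([[M, -B.T],
                    [B, np.zeros((m, m))]])
    rhs = np.concatenate([F + Fbar, -Bdot @ qdot])
    sol = np.linalg.solve(lhs, rhs)
    return sol[:n], sol[n:]          # qddot, lambda(s)
```

The same routine covers Problem 8.4 below, where [B] has two rows and two multipliers λ1, λ2 are returned.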
2. Numerical calculation
With the initial values we calculate, successively, [ψ], [θ], [φ], [A], [Aψ], [Aθ], [Aφ], [Ȧ], [Q], [Qψ],
[Qθ], [Qφ], [Q̇] by relations (8.243)-(8.254), then the co-ordinates XA, YA, ZA by relation (8.255)
and
$$\{f_p\} = \begin{Bmatrix} 2X_A & 0 & 1 \end{Bmatrix}^T, \quad (8.274)$$
as well as the matrix [B] by relation (8.265).
Hereafter, from equation (8.264) we obtain ẊA, ẎA, ŻA; we may thus calculate
$$\{\dot f_p\} = \begin{Bmatrix} 2\dot X_A & 0 & 0 \end{Bmatrix}^T \quad (8.275)$$
and
$$[\dot B] = \{\dot f_p\}^T\begin{bmatrix} [I] & [A][r_A]^T[Q] \end{bmatrix} + \{f_p\}^T\begin{bmatrix} [0] & [\dot A][r_A]^T[Q] + [A][r_A]^T[\dot Q] \end{bmatrix}, \quad (8.276)$$
and then the matrices [Δ], {Fβ}, {F̄}, by relation (8.271), where [Qψ] = [0].
Finally, from equation (8.273) we calculate {q̈} and λ; then, by the Runge–Kutta method, we
determine the new values {q}, {q̇}, the iteration process being then taken up again.
We obtain the diagrams in Figure 8.9.
For the reaction, it follows that
$$\{N_A\} = \lambda\{f_p\} = \lambda\begin{Bmatrix} 2X_A & 0 & 1 \end{Bmatrix}^T, \quad (8.277)$$
hence
$$N_A = \lambda\sqrt{4X_A^2 + 1}. \quad (8.278)$$
The graphic is drawn in Figure 8.10.
Problem 8.4
Let ABCDA′B′C′D′ in Figure 8.8 be the parallelepiped discussed in Problem 8.3, where the point
A is situated without friction on the curve of equations
$$X^2 + Z - 1 = 0, \qquad X^2 + (Y - 1)^2 - 1 = 0. \quad (8.279)$$
Assuming the same data as in Problem 8.3 and the initial conditions $X_O^0 = -0.3$ m, $Y_O^0 = 2.2$ m,
$Z_O^0 = 0.9$ m, $\psi^0 = 0$ rad, $\theta^0 = 0$ rad, $\varphi^0 = 0$ rad, $\dot\psi^0 = 0$ rad s−1, $\dot\theta^0 = 0$ rad s−1, $\dot\varphi^0 = 0$ rad s−1,
let us determine
• the trajectory of the point O;
• the reaction at A.
Figure 8.9 Variation diagrams of the co-ordinates XO, YO, ZO and XA, YA, ZA.
Figure 8.10 The diagram NA = NA(t).
Solution:
1. Theory
In this case, the calculation algorithm remains, in principle, the same as that of the previous
problem; the constraints matrix becomes
$$[B] = \begin{bmatrix} \{f_{1p}\}^T\begin{bmatrix} [I] & [A][r_A]^T[Q] \end{bmatrix} \\ \{f_{2p}\}^T\begin{bmatrix} [I] & [A][r_A]^T[Q] \end{bmatrix} \end{bmatrix}, \quad (8.280)$$
where
$$\{f_{1p}\} = \begin{Bmatrix} 2X_A & 0 & 1 \end{Bmatrix}^T, \qquad \{f_{2p}\} = \begin{Bmatrix} 2X_A & 2Y_A - 2 & 0 \end{Bmatrix}^T, \quad (8.281)$$
$$\{\dot f_{1p}\} = \begin{Bmatrix} 2\dot X_A & 0 & 0 \end{Bmatrix}^T, \qquad \{\dot f_{2p}\} = \begin{Bmatrix} 2\dot X_A & 2\dot Y_A & 0 \end{Bmatrix}^T. \quad (8.282)$$
The calculation algorithm follows:
– we determine the matrices [ψ], [θ], [φ], [A], [Aψ], [Aθ], [Aφ], [Ȧ], [Q], [Qψ], [Qθ], [Qφ];
– we determine the matrices {RA}, {ṘA},
$$\{R_A\} = \{R_O\} + [A]\{r_A\}, \qquad \{\dot R_A\} = \{\dot R_O\} + [A][r_A]^T[Q]\{\dot\beta\}; \quad (8.283)$$
– we determine the constraints matrix by relation (8.280) and its derivative by the relation
$$[\dot B] = \begin{bmatrix} \{\dot f_{1p}\}^T\begin{bmatrix} [I] & [A][r_A]^T[Q] \end{bmatrix} + \{f_{1p}\}^T\begin{bmatrix} [0] & [\dot A][r_A]^T[Q] + [A][r_A]^T[\dot Q] \end{bmatrix} \\ \{\dot f_{2p}\}^T\begin{bmatrix} [I] & [A][r_A]^T[Q] \end{bmatrix} + \{f_{2p}\}^T\begin{bmatrix} [0] & [\dot A][r_A]^T[Q] + [A][r_A]^T[\dot Q] \end{bmatrix} \end{bmatrix}; \quad (8.284)$$
– we calculate the matrices [M], {F̄} by the relations
$$[m] = \begin{bmatrix} m & 0 & 0 \\ 0 & m & 0 \\ 0 & 0 & m \end{bmatrix}, \quad [M] = \begin{bmatrix} [m] & [0] \\ [0] & [Q]^T[J][Q] \end{bmatrix}, \quad [\Delta] = \begin{bmatrix} \{\dot\beta\}^T[Q_\psi]^T[J][Q] \\ \{\dot\beta\}^T[Q_\theta]^T[J][Q] \\ \{\dot\beta\}^T[Q_\varphi]^T[J][Q] \end{bmatrix},$$
$$\{F_\beta\} = \left[[\dot Q]^T[J][Q] + [Q]^T[J][\dot Q] + [\Delta]\right]\{\dot\beta\}, \qquad \{\bar F\} = \begin{Bmatrix} 0 & 0 & 0 & \{F_\beta\}^T \end{Bmatrix}^T; \quad (8.285)$$
APPLICATIONS 507
– we calculate {¨q}, λ1, λ2 from the equation
[M] −[B]T
[B] [0]


{¨q}
λ1
λ2

 =
{F} + {F}
−[ ˙B]{˙q}
, (8.286)
and then the new values of the matrices {q}, {˙q} by means of the Runge–Kutta method.
The reaction NA reads
{NA} = λ1{f1p} + λ2{f2p}, (8.287)
NA = λ2
1{f1p}T{f1p} + λ2
2{f2p}T{f2p} + 2λ1λ2{f1p}T{f2p}. (8.288)
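For the reaction, a short sketch under the same assumptions as the solver above:

```python
import numpy as np

def reaction_at_A(lam1, lam2, f1p, f2p):
    """Reaction at A from relations (8.287)-(8.288)."""
    NA_vec = lam1 * np.asarray(f1p) + lam2 * np.asarray(f2p)   # (8.287)
    return NA_vec, np.linalg.norm(NA_vec)                      # (8.288)
```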
2. Numerical calculation
We obtain the numerical results plotted in the diagrams in Figure 8.11 and Figure 8.12.
Problem 8.5
Let us consider the system formed by n bodies, hung in a vertical plane and linked to one another
in series (Fig. 8.13). Study the motion of this system. As numerical application, consider the system
formed by four bodies (Fig. 8.14) for which
$$n = 4, \quad m_1 = 10\ \mathrm{kg}, \quad m_2 = 8\ \mathrm{kg}, \quad m_3 = 50\ \mathrm{kg}, \quad m_4 = 16\ \mathrm{kg}, \quad l_1 = 4\ \mathrm{m}, \quad l_2 = 0.5\ \mathrm{m},$$
$$l_3 = 0.5\ \mathrm{m}, \quad l_4 = 0.7\ \mathrm{m}, \quad r_1 = 2\ \mathrm{m}, \quad r_2 = 0.25\ \mathrm{m}, \quad r_3 = 0.25\ \mathrm{m}, \quad r_4 = 0.35\ \mathrm{m},$$
$$J_1 = 13.3333\ \mathrm{kg\,m^2}, \quad J_2 = 0.1666\ \mathrm{kg\,m^2}, \quad J_3 = 1.0416\ \mathrm{kg\,m^2}, \quad J_4 = 0.6533\ \mathrm{kg\,m^2}. \quad (8.289)$$
The initial conditions are (for t = 0)
$$\theta_1^0 = 0\ \mathrm{rad}, \quad \theta_2^0 = 1\ \mathrm{rad}, \quad \theta_3^0 = 3.12414\ \mathrm{rad}, \quad \theta_4^0 = 3.12414\ \mathrm{rad}, \quad \dot\theta_1^0 = 0\ \mathrm{rad\,s^{-1}},$$
$$\dot\theta_2^0 = 0.25\ \mathrm{rad\,s^{-1}}, \quad \dot\theta_3^0 = 0\ \mathrm{rad\,s^{-1}}, \quad \dot\theta_4^0 = 0\ \mathrm{rad\,s^{-1}}. \quad (8.290)$$
Solution:
1. Theory
The following are known:
• the masses of the n bodies, mi, i = 1, . . . , n;
• the moments of inertia relative to the gravity centers Ci of the bodies, calculated with respect
to an axis perpendicular to the plane of the motion and denoted by Ji, i = 1, . . . , n;
• the lengths of the bodies, measured from the link point with the previous body to the link
point with the next body, denoted by li, i = 1, . . . , n;
• the distances from the link point with the previous body to the center of gravity, denoted
by ri, i = 1, . . . , n.
We are required to
• establish the equations of motion of the bodies;
• integrate these equations numerically.
To establish the equations of motion, we shall use the second-order Lagrange equations, which,
in the general case of holonomic constraints and assuming that the forces derive from a force
function, read
$$\frac{d}{dt}\left(\frac{\partial T}{\partial\dot q_i}\right) - \frac{\partial T}{\partial q_i} + \frac{\partial V}{\partial q_i} = 0, \quad (8.291)$$
Figure 8.11 Variation diagrams of the co-ordinates XO, YO, ZO and XA, YA, ZA.
Figure 8.12 The diagram NA = NA(t).
Figure 8.13 Problem 8.5.
Figure 8.14 Numerical application.
where T denotes the kinetic energy of the system, V represents the potential energy, and qi, i = 1, n,
is a generalized co-ordinate of the system.
In this case, the kinetic energy is given by the relation
$$T = \sum_{i=1}^{n} T_i, \quad (8.292)$$
where Ti, i = 1, 2, . . . , n, are the kinetic energies of the component bodies of the system. These
read
$$T_i = \frac{1}{2}m_i v_{C_i}^2 + \frac{1}{2}J_i\dot\theta_i^2, \quad (8.293)$$
where $v_{C_i}$ is the velocity of the center of gravity of body i, given by the relation
$$v_{C_i}^2 = \dot x_{C_i}^2 + \dot y_{C_i}^2. \quad (8.294)$$
We obtain
$$T = \frac{1}{2}\sum_{i=1}^{n}\left\{ m_i\left[\sum_{j=1}^{i-1} l_j^2\dot\theta_j^2 + r_i^2\dot\theta_i^2 + 2\sum_{j=1}^{i-2}\sum_{k=j+1}^{i-1} l_j l_k\dot\theta_j\dot\theta_k\cos(\theta_k - \theta_j) + 2\sum_{j=1}^{i-1} l_j r_i\dot\theta_j\dot\theta_i\cos(\theta_i - \theta_j)\right] + J_i\dot\theta_i^2\right\}. \quad (8.295)$$
Taking into account that the only forces that act are the weights of the bodies, the potential
energy of the system takes the form
$$V = -m_1 g r_1\cos\theta_1 - m_2 g(l_1\cos\theta_1 + r_2\cos\theta_2) - m_3 g(l_1\cos\theta_1 + l_2\cos\theta_2 + r_3\cos\theta_3) - \cdots - m_n g(l_1\cos\theta_1 + l_2\cos\theta_2 + l_3\cos\theta_3 + \cdots + l_{n-1}\cos\theta_{n-1} + r_n\cos\theta_n). \quad (8.296)$$
With the notations
$$J_{ii} = J_i + m_i r_i^2 + \left(\sum_{j=i+1}^{n} m_j\right) l_i^2, \quad (8.297)$$
$$a_i = m_i r_i + \left(\sum_{j=i+1}^{n} m_j\right) l_i, \quad (8.298)$$
$$[J] = \begin{bmatrix} J_{11} & a_2 l_1\cos(\theta_1 - \theta_2) & \cdots & a_n l_1\cos(\theta_1 - \theta_n) \\ a_2 l_1\cos(\theta_1 - \theta_2) & J_{22} & \cdots & a_n l_2\cos(\theta_2 - \theta_n) \\ \cdots & \cdots & \cdots & \cdots \\ a_n l_1\cos(\theta_1 - \theta_n) & a_n l_2\cos(\theta_2 - \theta_n) & \cdots & J_{nn} \end{bmatrix}, \quad (8.299)$$
$$[A] = \begin{bmatrix} 0 & a_2 l_1\sin(\theta_1 - \theta_2) & \cdots & a_n l_1\sin(\theta_1 - \theta_n) \\ -a_2 l_1\sin(\theta_1 - \theta_2) & 0 & \cdots & a_n l_2\sin(\theta_2 - \theta_n) \\ \cdots & \cdots & \cdots & \cdots \\ -a_n l_1\sin(\theta_1 - \theta_n) & -a_n l_2\sin(\theta_2 - \theta_n) & \cdots & 0 \end{bmatrix}, \quad (8.300)$$
$$[K] = \begin{bmatrix} g a_1 & 0 & \cdots & 0 \\ 0 & g a_2 & \cdots & 0 \\ \cdots & \cdots & \cdots & \cdots \\ 0 & 0 & \cdots & g a_n \end{bmatrix}, \quad (8.301)$$
$$\{\theta\} = \begin{Bmatrix} \theta_1 & \theta_2 & \cdots & \theta_n \end{Bmatrix}^T, \quad \{\ddot\theta\} = \begin{Bmatrix} \ddot\theta_1 & \ddot\theta_2 & \cdots & \ddot\theta_n \end{Bmatrix}^T, \quad \{\dot\theta^2\} = \begin{Bmatrix} \dot\theta_1^2 & \dot\theta_2^2 & \cdots & \dot\theta_n^2 \end{Bmatrix}^T,$$
$$\{\sin\theta\} = \begin{Bmatrix} \sin\theta_1 & \sin\theta_2 & \cdots & \sin\theta_n \end{Bmatrix}^T, \quad (8.302)$$
where the elements of the matrices [J], [A], and [K] are given by the formulae
$$J_{pq} = \begin{cases} J_{pp} & \text{for } p = q, \\ a_q l_p\cos(\theta_p - \theta_q) & \text{for } p < q, \\ J_{qp} & \text{for } p > q, \end{cases} \quad (8.303)$$
$$A_{pq} = \begin{cases} 0 & \text{for } p = q, \\ a_q l_p\sin(\theta_p - \theta_q) & \text{for } p < q, \\ -A_{qp} & \text{for } p > q, \end{cases} \quad (8.304)$$
and
$$K_{pq} = \begin{cases} g a_p & \text{for } p = q, \\ 0 & \text{for } p \neq q, \end{cases} \quad (8.305)$$
respectively, the system of the equations of motion reads
$$[J]\{\ddot\theta\} + [A]\{\dot\theta^2\} + [K]\{\sin\theta\} = \{0\}. \quad (8.306)$$
Relation (8.306) can be written in the form
$$\{\ddot\theta\} = -[J]^{-1}[A]\{\dot\theta^2\} - [J]^{-1}[K]\{\sin\theta\}. \quad (8.307)$$
With the notations
$$\theta_1 = \xi_1, \quad \theta_2 = \xi_2, \quad \ldots, \quad \theta_n = \xi_n, \quad \dot\theta_1 = \xi_{n+1}, \quad \dot\theta_2 = \xi_{n+2}, \quad \ldots, \quad \dot\theta_n = \xi_{2n}, \quad (8.308)$$
$$[B] = [J]^{-1}[A], \quad (8.309)$$
$$[L] = [J]^{-1}[K], \quad (8.310)$$
we obtain the system (a sketch in code follows)
$$\frac{d\xi_i}{dt} = \begin{cases} \xi_{n+i} & \text{for } i \le n, \\ -\displaystyle\sum_{j=1}^{n} B_{i-n,j}\,\xi_{n+j}^2 - \sum_{j=1}^{n} L_{i-n,j}\sin\xi_j & \text{for } i > n. \end{cases} \quad (8.311)$$
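A minimal Python sketch of the right-hand side (8.311), assembling [J], [A], [K] directly from relations (8.297)-(8.301); the argument names and layout are assumptions of the sketch.

```python
import numpy as np

def chain_rhs(t, xi, m, l, r, Jc, g=9.81):
    """Right-hand side of system (8.311) for the chain of n bodies.

    xi = (theta_1..theta_n, dtheta_1..dtheta_n); m, l, r, Jc are the arrays
    of masses, lengths, centroid distances, and centroidal inertias."""
    n = len(m)
    th, dth = xi[:n], xi[n:]
    tail = np.concatenate([np.cumsum(m[::-1])[::-1][1:], [0.0]])  # sum_{j>i} m_j
    a = m * r + tail * l                                          # (8.298)
    Jd = Jc + m * r**2 + tail * l**2                              # (8.297)
    dthm = th[:, None] - th[None, :]                              # theta_p - theta_q
    J = np.outer(l, a) * np.cos(dthm)                             # (8.299), p < q part
    J = np.triu(J, 1); J = J + J.T + np.diag(Jd)                  # symmetric, (8.303)
    A = np.outer(l, a) * np.sin(dthm)                             # (8.300)
    A = np.triu(A, 1); A = A - A.T                                # antisymmetric, (8.304)
    K = np.diag(g * a)                                            # (8.301), (8.305)
    ddth = -np.linalg.solve(J, A @ dth**2 + K @ np.sin(th))       # (8.307)
    return np.concatenate([dth, ddth])
```

This right-hand side can then be handed to any fourth-order Runge–Kutta integrator together with the initial conditions (8.290).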
2. Numerical calculation
In the case of the numerical application, we obtain, with the aid of the fourth-order Runge–Kutta
method, the numerical results plotted in the diagrams in Figure 8.15.
Problem 8.6
Consider the kinematic schema in Figure 8.16 of a torque converter of G. Constantinescu.12
It is composed of the principal axle 1, the floating lever 2, the connection bars 3, 3′, and the
bars 4, 4′. The principal axle is articulated to the floating lever at the point A, the latter acting on
the connection bars through the multiple articulation B. The connection bars act on the bars 4, 4′
through the articulations D, D′. The bars 4, 4′ are hinged at the fixed point E. Thus, the motion of
rotation of the principal axle 1 is transformed into the oscillatory plane-parallel motion of the lever
2, and this is transformed, by means of a coupling system, into a motion of rotation in the same
sense of the secondary axle 5. In Figure 8.16, the simplest coupling system, formed by the ratchet
wheel 5 and the ratchets 6, 6′, has been chosen.
12After George "Gogu" Constantinescu (1881–1965), who created the theory of sonics in A Treatise of Transmission
of Power by Vibrations in 1918. This torque converter is an invention of G. Constantinescu.
Figure 8.15 Results of the simulation: (a) θ1 (rad); (b) θ2 (rad); (c) θ3 (rad); (d) θ4 (rad); all versus t (s).
Figure 8.16 Torque converter of G. Constantinescu.
The following are known:
– the distances OA = a, AB = b, AC = l, BD = BD′ = c, ED = ED′ = d, xE, yE;
– the moment of inertia J1 and the mass m;
– the motor torque
$$M_1 = M_0 - k\omega^2; \quad (8.312)$$
– the resistant torque
$$M_5 = \begin{cases} M_5 & \text{if } \dot\psi \ge 0, \\ -M_5 & \text{if } \dot\psi < 0; \end{cases} \quad (8.313)$$
– the initial conditions (which have to be consistent with the position and velocity constraint
equations; see below): t = 0, φ = φ0, θ = θ0, γ = γ0, ψ = ψ0, φ̇ = θ̇ = γ̇ = ψ̇ = 0.
It is required to determine and represent graphically
• ω1(t), ω5(t) = |ψ̇|;
• the trajectory of the point B.
Numerical application: l = 0.3 m, a = 0.015 m, b = 0.15 m, c = 0.25 m, $d = \sqrt{b^2 - a^2}$,
$x_E = \sqrt{c^2 - d^2}$, $y_E = d$, m = 3 kg, J1 = 0.1 kg m², M0 = 3.2 N m, k = 2 × 10⁻⁵ N m s²,
M5 = 20 N m, φ0 = −π/2, θ0 = arctan(a/d), γ0 = arctan(d/√(c² − d²)), ψ0 = 0 rad,
φ̇ = θ̇ = γ̇ = ψ̇ = 0 rad s⁻¹.
Solution:
1. Theory
The chosen mechanical model is that in which the bodies 3, 3′, 4, 4′ have no mass, while the
one-directional system formed by these bars leads (approximately) to a symmetry of the motion of
the bars 4, 4′.
Under these conditions, we study the motion of the mechanism with two degrees of freedom,
formed by the elements 1, 2, 3, 4, 5, the bar 4 being acted on by the torque M5, given by relation
(8.313).
We obtain the equations of constraints
$$a\sin\varphi + b\sin\theta + c\cos\gamma - d\sin\psi = x_E, \qquad -a\cos\varphi + b\cos\theta - c\sin\gamma + d\cos\psi = y_E; \quad (8.314)$$
by differentiation with respect to time, denoting by [B] the matrix of constraints,
$$[B] = \begin{bmatrix} a\cos\varphi & b\cos\theta & -c\sin\gamma & -d\cos\psi \\ a\sin\varphi & -b\sin\theta & -c\cos\gamma & -d\sin\psi \end{bmatrix}, \quad (8.315)$$
and by {q} the column matrix of the generalized co-ordinates,
$$\{q\} = \begin{Bmatrix} \varphi & \theta & \gamma & \psi \end{Bmatrix}^T, \quad (8.316)$$
we obtain the equation of constraints
$$[B]\{\dot q\} = \{0\}. \quad (8.317)$$
The kinetic energy T of the system reads
$$T = \frac{1}{2}\left[J_1\dot\varphi^2 + m(\dot X_C^2 + \dot Y_C^2)\right] \quad (8.318)$$
or
$$T = \frac{1}{2}\left[(J_1 + ma^2)\dot\varphi^2 + ml^2\dot\theta^2 + 2mal\,\dot\varphi\dot\theta\cos(\varphi + \theta)\right]. \quad (8.319)$$
Using Lagrange's equations, we write successively the relations
$$\frac{d}{dt}\left(\frac{\partial T}{\partial\dot\varphi}\right) = (J_1 + ma^2)\ddot\varphi + mal\,\ddot\theta\cos(\varphi + \theta) - mal\,\dot\theta(\dot\varphi + \dot\theta)\sin(\varphi + \theta), \quad (8.320)$$
$$\frac{\partial T}{\partial\varphi} = -mal\,\dot\varphi\dot\theta\sin(\varphi + \theta), \quad (8.321)$$
$$\frac{d}{dt}\left(\frac{\partial T}{\partial\dot\theta}\right) = ml^2\ddot\theta + mal\,\ddot\varphi\cos(\varphi + \theta) - mal\,\dot\varphi(\dot\varphi + \dot\theta)\sin(\varphi + \theta), \quad (8.322)$$
$$\frac{\partial T}{\partial\theta} = -mal\,\dot\varphi\dot\theta\sin(\varphi + \theta), \quad (8.323)$$
$$\frac{d}{dt}\left(\frac{\partial T}{\partial\dot\gamma}\right) = \frac{d}{dt}\left(\frac{\partial T}{\partial\dot\psi}\right) = 0, \qquad \frac{\partial T}{\partial\gamma} = \frac{\partial T}{\partial\psi} = 0; \quad (8.324)$$
because the generalized forces are
$$Q_\varphi = M_1 + mga\sin\varphi, \qquad Q_\theta = -mgl\sin\theta, \qquad Q_\psi = -M_5, \quad (8.325)$$
Lagrange's equations, which are of the form
$$\frac{d}{dt}\left(\frac{\partial T}{\partial\dot q_k}\right) - \frac{\partial T}{\partial q_k} = Q_k + B_{1k}\lambda_1 + B_{2k}\lambda_2, \quad (8.326)$$
B1k, B2k being the elements of the matrix [B], while λ1, λ2 are Lagrange's multipliers, are written
in the matrix form
$$[M]\{\ddot q\} = \{F\} + \{\bar F\} + [B]^T\{\lambda\}, \quad (8.327)$$
where
$$[M] = \begin{bmatrix} J_1 + ma^2 & mal\cos(\varphi + \theta) & 0 & 0 \\ mal\cos(\varphi + \theta) & ml^2 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \quad (8.328)$$
$$\{F\} = \begin{Bmatrix} Q_\varphi & Q_\theta & 0 & Q_\psi \end{Bmatrix}^T, \quad (8.329)$$
$$\{\bar F\} = mal\begin{Bmatrix} \dot\theta^2 & \dot\varphi^2 & 0 & 0 \end{Bmatrix}^T\sin(\varphi + \theta), \quad (8.330)$$
$$\{\lambda\} = \begin{Bmatrix} \lambda_1 & \lambda_2 \end{Bmatrix}^T. \quad (8.331)$$
If to the differential equation (8.327) we add equation (8.317), differentiated with respect to time,
we obtain the matrix differential equation
$$\begin{bmatrix} [M] & -[B]^T \\ [B] & [0] \end{bmatrix}\begin{Bmatrix} \{\ddot q\} \\ \{\lambda\} \end{Bmatrix} = \begin{Bmatrix} \{F\} + \{\bar F\} \\ -[\dot B]\{\dot q\} \end{Bmatrix}, \quad (8.332)$$
where
$$[\dot B] = \begin{bmatrix} -a\dot\varphi\sin\varphi & -b\dot\theta\sin\theta & -c\dot\gamma\cos\gamma & d\dot\psi\sin\psi \\ a\dot\varphi\cos\varphi & -b\dot\theta\cos\theta & c\dot\gamma\sin\gamma & -d\dot\psi\cos\psi \end{bmatrix}. \quad (8.333)$$
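A small sketch of relations (8.315) and (8.333), which are the only problem-specific matrices rebuilt at each step of the integration loop; the argument names are assumptions of the sketch.

```python
import numpy as np

def constraints(q, qdot, a, b, c, d):
    """Constraints matrix (8.315) and its time derivative (8.333)."""
    ph, th, ga, ps = q
    dph, dth, dga, dps = qdot
    B = np.array([[a*np.cos(ph),  b*np.cos(th), -c*np.sin(ga), -d*np.cos(ps)],
                  [a*np.sin(ph), -b*np.sin(th), -c*np.cos(ga), -d*np.sin(ps)]])
    Bdot = np.array([[-a*dph*np.sin(ph), -b*dth*np.sin(th),
                      -c*dga*np.cos(ga),  d*dps*np.sin(ps)],
                     [ a*dph*np.cos(ph), -b*dth*np.cos(th),
                       c*dga*np.sin(ga), -d*dps*np.cos(ps)]])
    return B, Bdot
```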
For the given initial conditions, from equation (8.332) we determine the matrices {q̈}, {λ}; then,
by the Runge–Kutta numerical method, we determine the new values of the matrices {q}, {q̇}, which
become the initial conditions for the following integration step.
This problem is a particular one in the class of drift and constraint stabilization.
Figure 8.17 Variation of ω1 = ω1(t).
Figure 8.18 Variation of ω5 = ω5(t).
2. Numerical calculation
On the basis of the calculation algorithm constructed by means of relations (8.312), (8.313),
(8.315), (8.316), (8.323), (8.329), (8.330), (8.331), (8.333), and (8.332) as well as of the relations
XB = a sin φ + b sin θ, YB = −a cos φ + b cos θ, (8.334)
the results plotted in the diagrams in Figure 8.17, Figure 8.18, and Figure 8.19 have been obtained.
Problem 8.7
We consider the toroidal wheel of radius r0 and balloon radius r, which, under the influence of the
weight mg, is rolling without sliding on a horizontal plane.
Knowing that, at the initial moment, the wheel axis is inclined by the angle θ0 with respect to
the vertical and that the angular velocity is parallel to the rotation axis of the wheel and has the
value ω0, let us determine
Figure 8.19 Variation of YB = YB(XB).
• the variation in time of the inclination angle of the wheel axis with respect to the vertical;
• the trajectory of the point of contact wheel-plane;
• the variation in time of the contact forces wheel-plane.
Numerical application: r0 = 0.3 m, r = 0.05 m, m = 20 kg, Jx = Jy = 0.9 kg m²,
Jz = 1.8 kg m², θ0 = 5π/12 rad.
Solution:
1. Theory
1.1. Equations of the torus
We consider the circle of radius r situated in the plane Oy′z′ (Fig. 8.20), its center
C being chosen so that y′C = −r0.
The Oy′z′-plane is obtained by rotation of the Oyz-plane with the angle η around the
Oz-axis.
With the notations in Figure 8.20, the co-ordinates of a point of the circle in the system
Ox′y′z′ are
$$x' = 0, \qquad y' = -(r_0 + r\cos\xi), \qquad z' = r\sin\xi. \quad (8.335)$$
By rotating the circle, we obtain the torus, the parametric equations of which are
obtained, in the Oxyz-frame, from the relation
$$\begin{Bmatrix} x \\ y \\ z \end{Bmatrix} = \begin{bmatrix} \cos\eta & -\sin\eta & 0 \\ \sin\eta & \cos\eta & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{Bmatrix} 0 \\ -(r_0 + r\cos\xi) \\ r\sin\xi \end{Bmatrix}; \quad (8.336)$$
it follows that
$$x = (r_0 + r\cos\xi)\sin\eta, \qquad y = -(r_0 + r\cos\xi)\cos\eta, \qquad z = r\sin\xi. \quad (8.337)$$
Figure 8.20 Equations of the torus.
Figure 8.21 Conditions of tangency of the torus with the plane.
1.2. Conditions of tangency of the torus with the plane
We take as rolling plane the horizontal O0XY-plane (Fig. 8.21) and choose as
rotation angles Euler's angles ψ, θ, φ, to which correspond the partial rotation matrices
$$[\psi] = \begin{bmatrix} \cos\psi & -\sin\psi & 0 \\ \sin\psi & \cos\psi & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad [\theta] = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{bmatrix}, \quad [\varphi] = \begin{bmatrix} \cos\varphi & -\sin\varphi & 0 \\ \sin\varphi & \cos\varphi & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad (8.338)$$
and the rotation matrix [A] of the frame Oxyz with respect to the frame O0XYZ,
$$[A] = [\psi][\theta][\varphi]. \quad (8.339)$$
Denoting by {r}, {rξ}, {rη} the matrices
$$\{r\} = \begin{Bmatrix} (r_0 + r\cos\xi)\sin\eta \\ -(r_0 + r\cos\xi)\cos\eta \\ r\sin\xi \end{Bmatrix}, \quad \{r_\xi\} = \begin{Bmatrix} -r\sin\xi\sin\eta \\ r\sin\xi\cos\eta \\ r\cos\xi \end{Bmatrix}, \quad \{r_\eta\} = \begin{Bmatrix} (r_0 + r\cos\xi)\cos\eta \\ (r_0 + r\cos\xi)\sin\eta \\ 0 \end{Bmatrix}, \quad (8.340)$$
the tangency conditions at the point M are written in the form
$$\begin{bmatrix} 0 & 0 & 1 \end{bmatrix}[A]\{r_\xi\} = 0, \qquad \begin{bmatrix} 0 & 0 & 1 \end{bmatrix}[A]\{r_\eta\} = 0; \quad (8.341)$$
hence, we obtain the equations
$$\sin\theta\sin(\varphi + \eta) = 0, \qquad \sin\theta\sin\xi\cos(\varphi + \eta) + \cos\theta\cos\xi = 0, \quad (8.342)$$
from which it follows that
$$\eta = -\varphi, \qquad \xi = \theta - \frac{\pi}{2}. \quad (8.343)$$
1.3. Initial conditions
If we choose the frame of reference O0XYZ so that the contact point at the initial
moment is O0, the Ox-axis being parallel to the O0Y-axis and the Oz-axis normal to
the O0Y-axis, then, at the initial moment, the conditions
$$\psi = \frac{\pi}{2}, \quad \theta = \theta_0, \quad \varphi = 0, \quad [A] = \begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta_0 & -\sin\theta_0 \\ 0 & \sin\theta_0 & \cos\theta_0 \end{bmatrix}, \quad \{r\} = \begin{Bmatrix} 0 \\ -(r_0 + r\sin\theta_0) \\ -r\cos\theta_0 \end{Bmatrix} \quad (8.344)$$
are fulfilled; also, from the contact equation at O0,
$$\begin{Bmatrix} 0 \\ 0 \\ 0 \end{Bmatrix} = \begin{Bmatrix} X_O \\ Y_O \\ Z_O \end{Bmatrix} + [A]\{r\}, \quad (8.345)$$
we obtain the initial conditions
$$X_O = -r_0\cos\theta_0, \qquad Y_O = 0, \qquad Z_O = r_0\sin\theta_0 + r. \quad (8.346)$$
From the conditions specified in the enunciation, it also follows that, at the initial moment,
$$\dot\psi = \dot\theta = 0, \qquad \dot\varphi = \omega_0, \quad (8.347)$$
while, from the condition of rolling without sliding, we get
$$\begin{Bmatrix} \dot X_O \\ \dot Y_O \\ \dot Z_O \end{Bmatrix} + [A][r]^T[Q]\begin{Bmatrix} \dot\psi \\ \dot\theta \\ \dot\varphi \end{Bmatrix} = \{0\}; \quad (8.348)$$
knowing that
$$[r] = \begin{bmatrix} 0 & r\cos\theta_0 & -(r_0 + r\sin\theta_0) \\ -r\cos\theta_0 & 0 & 0 \\ r_0 + r\sin\theta_0 & 0 & 0 \end{bmatrix}, \qquad [Q] = \begin{bmatrix} \sin\varphi\sin\theta & \cos\varphi & 0 \\ \cos\varphi\sin\theta & -\sin\varphi & 0 \\ \cos\theta & 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 \\ \sin\theta_0 & 0 & 0 \\ \cos\theta_0 & 0 & 1 \end{bmatrix}, \quad (8.349)$$
we obtain the initial conditions
$$\dot X_O = \dot Z_O = 0, \qquad \dot Y_O = -(r_0 + r\sin\theta_0)\omega_0. \quad (8.350)$$
1.4. The constraints matrix
Taking into account relation (8.343), from the last relation (8.340) we get
$$\{r\} = \begin{Bmatrix} -(r_0 + r\sin\theta)\sin\varphi \\ -(r_0 + r\sin\theta)\cos\varphi \\ -r\cos\theta \end{Bmatrix}; \quad (8.351)$$
with the notations
$$[r] = \begin{bmatrix} 0 & r\cos\theta & -(r_0 + r\sin\theta)\cos\varphi \\ -r\cos\theta & 0 & (r_0 + r\sin\theta)\sin\varphi \\ (r_0 + r\sin\theta)\cos\varphi & -(r_0 + r\sin\theta)\sin\varphi & 0 \end{bmatrix}, \quad (8.352)$$
from equation (8.348) we obtain the constraints matrix
$$[B] = \begin{bmatrix} [I] & [A][r]^T[Q] \end{bmatrix}. \quad (8.353)$$
The derivative with respect to time of the constraints matrix is
$$[\dot B] = \begin{bmatrix} [0] & [\dot A][r]^T[Q] + [A][\dot r]^T[Q] + [A][r]^T[\dot Q] \end{bmatrix}, \quad (8.354)$$
where
$$[\dot r] = \begin{bmatrix} 0 & -\dot z & \dot y \\ \dot z & 0 & -\dot x \\ -\dot y & \dot x & 0 \end{bmatrix}, \quad (8.355)$$
$$\dot x = -r\dot\theta\cos\theta\sin\varphi - \dot\varphi(r_0 + r\sin\theta)\cos\varphi, \quad \dot y = -r\dot\theta\cos\theta\cos\varphi + \dot\varphi(r_0 + r\sin\theta)\sin\varphi, \quad \dot z = -r\dot\theta\sin\theta. \quad (8.356)$$
2. Numerical calculation
As has been shown in Problem 8.6, the matrix differential equation of the motion is
$$\begin{bmatrix} [M] & -[B]^T \\ [B] & [0] \end{bmatrix}\begin{Bmatrix} \{\ddot q\} \\ \{\lambda\} \end{Bmatrix} = \begin{Bmatrix} \{F\} + \{\bar F\} \\ -[\dot B]\{\dot q\} \end{Bmatrix}, \quad (8.357)$$
where
$$[m] = \begin{bmatrix} m & 0 & 0 \\ 0 & m & 0 \\ 0 & 0 & m \end{bmatrix}, \qquad [J] = \begin{bmatrix} J_x & 0 & 0 \\ 0 & J_y & 0 \\ 0 & 0 & J_z \end{bmatrix}, \quad (8.358)$$
$$[Q] = \begin{bmatrix} \sin\varphi\sin\theta & \cos\varphi & 0 \\ \cos\varphi\sin\theta & -\sin\varphi & 0 \\ \cos\theta & 0 & 1 \end{bmatrix}, \quad (8.359)$$
$$[M] = \begin{bmatrix} [m] & [0] \\ [0] & [Q]^T[J][Q] \end{bmatrix}, \quad (8.360)$$
$$\{q\} = \begin{Bmatrix} X_O & Y_O & Z_O & \psi & \theta & \varphi \end{Bmatrix}^T, \qquad \{\lambda\} = \begin{Bmatrix} \lambda_1 & \lambda_2 & \lambda_3 \end{Bmatrix}^T, \quad (8.361)$$
$$\{F\} = \begin{Bmatrix} 0 & 0 & -mg & 0 & 0 & 0 \end{Bmatrix}^T, \quad (8.362)$$
$$\{\beta\} = \begin{Bmatrix} \psi & \theta & \varphi \end{Bmatrix}^T, \quad (8.363)$$
$$[U_\psi] = [U_\varphi] = \begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \qquad [U_\theta] = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{bmatrix}, \quad (8.364)$$
$$[Q_\psi] = [0], \quad [Q_\theta] = \begin{bmatrix} \sin\varphi\cos\theta & 0 & 0 \\ \cos\varphi\cos\theta & 0 & 0 \\ -\sin\theta & 0 & 0 \end{bmatrix}, \quad [Q_\varphi] = \begin{bmatrix} \cos\varphi\sin\theta & -\sin\varphi & 0 \\ -\sin\varphi\sin\theta & -\cos\varphi & 0 \\ 0 & 0 & 0 \end{bmatrix}, \quad (8.365)$$
$$[\Delta] = \begin{bmatrix} \{\dot\beta\}^T[Q_\psi]^T[J][Q] \\ \{\dot\beta\}^T[Q_\theta]^T[J][Q] \\ \{\dot\beta\}^T[Q_\varphi]^T[J][Q] \end{bmatrix}, \quad (8.366)$$
$$[\dot Q] = \dot\theta[Q_\theta] + \dot\varphi[Q_\varphi], \quad [A_\psi] = [U_\psi][A], \quad [A_\theta] = [A][\varphi]^T[U_\theta][\varphi], \quad [A_\varphi] = [A][U_\varphi], \quad (8.367)$$
$$\{F_\beta\} = \left[[\dot Q]^T[J][Q] + [Q]^T[J][\dot Q] + [\Delta]\right]\{\dot\beta\}, \qquad \{\bar F\} = \begin{Bmatrix} 0 & 0 & 0 & \{F_\beta\}^T \end{Bmatrix}^T. \quad (8.368)$$
By solving equation (8.357), we determine the functions XO(t), YO(t), ZO(t), ψ(t), θ(t), φ(t).
The variation of the inclination angle θ is given in Figure 8.22.
The trajectory of the contact point is obtained by means of the co-ordinates X, Y, Z = 0, which
result from the relation
$$\begin{Bmatrix} X \\ Y \\ Z \end{Bmatrix} = \begin{Bmatrix} X_O \\ Y_O \\ Z_O \end{Bmatrix} + [A]\{r\}; \quad (8.369)$$
it results in the trajectory in Figure 8.23.
The reaction of contact has the components along the axes O0X, O0Y, O0Z
$$R_X = \lambda_1, \qquad R_Y = \lambda_2, \qquad R_Z = \lambda_3; \quad (8.370)$$
thus, the force tangent to the wheel is
$$F_t = \frac{\dot X\lambda_1 + \dot Y\lambda_2}{\sqrt{\dot X^2 + \dot Y^2}}, \quad (8.371)$$
while the force in the plane of contact, normal to the tangent to the wheel, is
$$F_n = \frac{\dot Y\lambda_1 - \dot X\lambda_2}{\sqrt{\dot X^2 + \dot Y^2}}. \quad (8.372)$$
The variation in time of the forces RZ, Ft, Fn is given in Figure 8.24, Figure 8.25, and Figure 8.26.
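A small sketch of relations (8.370)-(8.372), assuming the multipliers λ1, λ2, λ3 and the derivatives Ẋ, Ẏ of the contact point are available at the current step.

```python
import numpy as np

def contact_forces(lam, Xdot, Ydot):
    """Contact forces from relations (8.370)-(8.372).

    lam = (lambda1, lambda2, lambda3); Xdot, Ydot are the derivatives of
    the contact-point co-ordinates."""
    v = np.hypot(Xdot, Ydot)                  # sqrt(Xdot**2 + Ydot**2)
    Ft = (Xdot * lam[0] + Ydot * lam[1]) / v  # (8.371)
    Fn = (Ydot * lam[0] - Xdot * lam[1]) / v  # (8.372)
    return Ft, Fn, lam[2]                     # lam[2] = RZ, relation (8.370)
```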
Problem 8.8
(Postcritical behavior of the cantilever beam.) Let us consider a cantilever beam of length l, acted
upon by the constant axial force P (Fig. 8.27). The mathematical model of the problem may be
expressed in the nonlinear general form
$$\frac{dy}{ds} = \sin\theta, \qquad \frac{d\theta}{ds} = \alpha^2(f - y), \qquad \alpha^2 = \frac{P}{EI}, \quad (8.373)$$
where $ds = \sqrt{(dx)^2 + (dy)^2}$, Ox is the direction along the bar axis, O corresponds to the left
end of the bar, Oy is the transverse axis, θ is the rotation of the bar cross section, and EI is the constant
Figure 8.22 The variation θ = θ(t).
Figure 8.23 Trajectory of the contact point.
bending rigidity of the bar (E is the modulus of longitudinal elasticity, I is the moment of inertia
of the cross section with respect to the neutral axis). The solution must be found under null Cauchy
conditions
$$y(0) = 0, \qquad \theta(0) = 0. \quad (8.374)$$
We first perform the change of function
$$\bar y(x) = y(x) - f, \quad (8.375)$$
and then apply the LEM mapping, which in this case depends on two parameters,
$$\nu(x, \sigma, \xi) = e^{\sigma\bar y(s) + \xi\theta(s)}; \quad (8.376)$$
Figure 8.24 The variation RZ = RZ(t).
Figure 8.25 The variation Ft = Ft(t).
this leads to the first linear partial differential equation, equivalent to equation (8.373), the first
LEM equivalent
$$\frac{\partial\nu}{\partial x} = \sigma\sin D_\xi\,\nu - \alpha^2\xi\frac{\partial\nu}{\partial\sigma}. \quad (8.377)$$
By sin Dξ we mean the operator obtained by formally replacing the powers of θ with derivatives
with respect to ξ of the same order in the expansion of sin θ.
Considering for ν a series expansion in σ and ξ, we get the second LEM equivalent
$$\frac{d\nu_{ij}}{ds} = i\sum_{k=1}^{\infty}\frac{(-1)^{k+1}}{(2k-1)!}\nu_{i-1,j+2k-1} - j\alpha^2\nu_{i+1,j-1}. \quad (8.378)$$
Figure 8.26 The variation Fn = Fn(t).
Figure 8.27 Problem 8.8.
Applying Theorem 8.4, we get the following normal LEM representation
$$y(x) \cong -f(\cos\alpha x - 1) - f^2\alpha^2\Phi(\alpha x) - f^4\alpha^4\Psi(\alpha x), \quad (8.379)$$
where Φ(αx) is analytic in αx and Ψ(αx) is given by
$$\Psi(\alpha x) = \frac{1}{16}\left[\frac{1}{4}(\cos 3\alpha x - \cos\alpha x) + \alpha x\sin\alpha x\right]. \quad (8.380)$$
To equation (8.379) we apply the condition y(l) = f, meaning that the bar length remains l
if the shortening is neglected in postcritical behavior. This gives
$$\cos\alpha l + (\alpha f)^2\Psi(\alpha l) \cong 0, \quad (8.381)$$
in fact, an approximate relationship depending on the parameters f and α.
From equation (8.381), by elementary computation we obtain
$$\frac{f}{l} \cong \frac{4}{\alpha l}\sqrt{\frac{2\cot\alpha l}{\sin 2\alpha l - 2\alpha l}}, \qquad \frac{\pi}{2} < \alpha l < \pi, \quad (8.382)$$
which is, in fact, a direct LEM representation of the postcritical values of f/l as a function of the
supraunitary ratio P/Pcr (Pcr = π²EI/(4l²) is the critical force). It will be marked by LEM.
Considering for α the following expansion,
$$\alpha = \alpha_0 + \alpha_1 f + \alpha_2\frac{f^2}{2!} + \cdots \equiv \sum_{j=0}^{\infty}\alpha_j\frac{f^j}{j!}, \quad (8.383)$$
and introducing it in equation (8.381), a power series in f appears that must vanish identically.
Determining the coefficients αj up to j = 2, we obtain
$$\frac{\alpha}{\alpha_0} \cong 1 + \frac{\alpha_0^2 f^2}{16}, \quad (8.384)$$
from which another approximate LEM formula for the postcritical values of f/l, marked by LEM1,
is finally deduced:
$$\frac{f}{l} \cong \frac{8}{\pi}\sqrt{\sqrt{\frac{P}{P_{cr}}} - 1}. \quad (8.385)$$
We can also relate the dimensionless quantities αl and αf by taking
$$(\alpha f)^2 = \sum_{j=0}^{\infty} p_j\frac{(\alpha - \alpha_0)^j l^j}{j!}; \quad (8.386)$$
introducing this in formula (8.381) again leads to a series in αl, whose coefficients must vanish.
Going as far as j = 1, we obtain the following approximating value for αf,
$$(\alpha f)^2 \cong 16(\alpha - \alpha_0)l, \quad (8.387)$$
and from equation (8.381) we get a third formula for the postcritical cantilever bar, marked by
LEM2,
$$\frac{f}{l} \cong \frac{8}{\pi}\sqrt{\frac{P_{cr}}{P}}\sqrt{\sqrt{\frac{P}{P_{cr}}} - 1}, \quad (8.388)$$
which coincides with Schneider's formula.
The form of these formulae suggests a comparison with Grashof's formula (marked by G),
$$\frac{f}{l} \cong \frac{8}{\pi}\sqrt{\frac{P}{P_{cr}}}\sqrt{\sqrt{\frac{P}{P_{cr}}} - 1}, \quad (8.389)$$
established from the well-known form of the solution of the cantilever problem by using elliptic
integrals.
The LEM representation for y was also used to get good postcritical formulae for other quantities
of interest, such as δ/l, where δ = l − x(l), the displacement of the bar along its straight axis, and
θ(l).
In Table 8.22 the values of the ratio f/l given by elliptic integrals (the exact solution),
$$\frac{f}{l} = \frac{2k}{K(k)}, \qquad k = \sin\frac{\theta_l}{2}, \qquad K(k) = \frac{\pi}{2}\sqrt{\frac{P}{P_{cr}}}, \quad (8.390)$$
are compared with LEM, LEM1, LEM2, and G.
TABLE 8.22 The Values of the Ratio f/l Computed Comparatively by Using Three LEM
Variants, Grashof's Formula, and Elliptic Integrals

P/Pcr             1.004  1.015  1.035  1.063  1.102  1.152  1.215  1.293
Exact solution    0.110  0.220  0.324  0.422  0.514  0.594  0.662  0.720
LEM               0.110  0.220  0.324  0.422  0.516  0.601  0.676  0.741
LEM1              0.116  0.220  0.329  0.435  0.541  0.642  0.738  0.829
LEM2 (Schneider)  0.114  0.220  0.335  0.448  0.563  0.689  0.814  0.942
G                 0.114  0.221  0.341  0.462  0.596  0.740  0.898  1.072
This comparison is emphasized for 1 < P/Pcr < 1.3, the formulae approximating the postcritical
behavior of the cantilever bar being ordered with respect to their “goodness.”
The mean square errors with respect to the exact solution are 0.24% for LEM, 1.36% for LEM1,
2.67% for LEM2 (Schneider), and 4.22% for G. These results point out that LEM leads to quite
simple formulae, which give very good approximations for the ratio f/l, and that it is, in any case,
much better than Grashof’s formula. Similar conclusions can be drawn for the ratio δ/l and for θ(l).
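For reference, a short script (an addition of this edition, not part of the original text) that reproduces the LEM and exact rows of Table 8.22 from formulae (8.382) and (8.390); note that scipy's ellipk takes the parameter m = k².

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import ellipk

def f_over_l_lem(p_ratio):
    """Direct LEM representation (8.382), valid for pi/2 < alpha*l < pi."""
    al = 0.5 * np.pi * np.sqrt(p_ratio)      # alpha*l = (pi/2) sqrt(P/Pcr)
    return 4.0 / al * np.sqrt(2.0 / np.tan(al) / (np.sin(2.0 * al) - 2.0 * al))

def f_over_l_exact(p_ratio):
    """Exact elliptic-integral solution (8.390): K(k) = (pi/2) sqrt(P/Pcr)."""
    target = 0.5 * np.pi * np.sqrt(p_ratio)
    k = brentq(lambda k: ellipk(k * k) - target, 1e-9, 0.999999)
    return 2.0 * k / target

for p in (1.004, 1.015, 1.035, 1.063, 1.102, 1.152, 1.215, 1.293):
    print(f"P/Pcr = {p:.3f}: exact = {f_over_l_exact(p):.3f}, "
          f"LEM = {f_over_l_lem(p):.3f}")
```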
We can conclude that the method presented here provides direct approximate formulae for f/l,
δ/l, and θ(l) in the case of the cantilever bar, as well as critical values for the loads, considering
various hypotheses.
It must also be mentioned that this method, based on LEM, does not depend on some particular
mechanical interpretation. Using the same pattern, we can obtain similar results for various cases
of loading and support.
FURTHER READING
Acton FS (1990). Numerical Methods that Work. 4th ed. Washington: Mathematical Association of
America.
Ackleh AS, Allen EJ, Kearfott RB, Seshaiyer P (2009). Classical and Modern Numerical Analysis:
Theory, Methods and Practice. Boca Raton: CRC Press.
Atkinson KE (1989). An Introduction to Numerical Analysis. 2nd ed. New York: John Wiley & Sons,
Inc.
Atkinson KE (2003). Elementary Numerical Analysis. 2nd ed. Hoboken: John Wiley & Sons, Inc.
Babuška I, Práger M, Vitásek E (1966). Numerical Processes in Differential Equations. Prague: SNTI.
Bakhvalov N (1976). Méthodes Numériques. Moscou: Éditions Mir (in French).
Boyce WE, DiPrima RC (2008). Elementary Differential Equations and Boundary Value Problems.
9th ed. Hoboken: John Wiley & Sons, Inc.
Burden RL, Faires L (2009). Numerical Analysis. 9th ed. Boston: Brooks/Cole.
Chapra SC (1996). Applied Numerical Methods with MATLAB for Engineers and Scientists. Boston:
McGraw-Hill.
Cheney EW, Kincaid DR (1997). Numerical Mathematics and Computing. 6th ed. Belmont: Thomson.
Constantinescu G (1985). Teoria sonicității. București: Editura Academiei (in Romanian).
Dahlquist G, Björck Å (1974). Numerical Methods. Englewood Cliffs: Prentice Hall.
Démidovitch B, Maron I (1973). Éléments de Calcul Numérique. Moscou: Éditions Mir (in French).
Den Hartog JP (1961). Strength of Materials. New York: Dover Books on Engineering.
DiBenedetto E (2010). Classical Mechanics: Theory and Mathematical Modeling. New York: Springer-
Verlag.
Epperson JF (2007). An Introduction to Numerical Methods and Analysis. Hoboken: John Wiley &
Sons, Inc.
Fung YC, Tong P (2011). Classical and Computational Solid Mechanics. Singapore: World Scientific
Publishing.
Gautschi W (1997). Numerical Analysis: An Introduction. Boston: Birkhäuser.
Godunov SK, Reabenki VS (1977). Scheme de Calcul cu Diferențe Finite. București: Editura Tehnică
(in Romanian).
Greenbaum A, Chartier TP (2012). Numerical Methods: Design, Analysis, and Computer Implemen-
tation of Algorithms. Princeton: Princeton University Press.
Hamming RW (1987). Numerical Methods for Scientists and Engineers. 2nd ed. New York: Dover
Publications.
Hamming RW (2012). Introduction to Applied Numerical Analysis. New York: Dover Publications.
Heinbockel JH (2006). Numerical Methods for Scientific Computing. Victoria: Trafford Publishing.
Hibbeler RC (2010). Mechanics of Materials. 8th ed. Englewood Cliffs: Prentice Hall.
Hildebrand FB (1987). Introduction to Numerical Analysis. 2nd ed. New York: Dover Publications.
Hoffman JD (1992). Numerical Methods for Engineers and Scientists. New York: McGraw-Hill.
Iserles A (2008). A First Course in the Numerical Analysis of Differential Equations. 2nd ed.
Cambridge: Cambridge University Press.
Ixaru LG (1979). Metode Numerice pentru Ecuații Diferențiale cu Aplicații. București: Editura
Academiei Române (in Romanian).
Jazar RN (2008). Vehicle Dynamics: Theory and Applications. New York: Springer-Verlag.
Kharab A, Guenther RB (2011). An Introduction to Numerical Methods: A MATLAB Approach. 3rd
ed. Boca Raton: CRC Press.
Kleppner D, Kolenkow RJ (2010). An Introduction to Mechanics. Cambridge: Cambridge University
Press.
Kress R (1996). Numerical Analysis. New York: Springer-Verlag.
Kunz KS (1957). Numerical Analysis. New York: McGraw-Hill.
Levine L (1964). Methods for Solving Engineering Problems Using Analog Computers. New York:
McGraw-Hill.
Lurie AI (2002). Analytical Mechanics. New York: Springer-Verlag.
Mabie HH, Reinholtz CF (1987). Mechanisms and Dynamics of Machinery. 4th ed. New York: John
Wiley & Sons, Inc.
Lurie AI (2005). Theory of Elasticity. New York: Springer-Verlag.
Marciuk GI (1983). Metode de Analiză Numerică. București: Editura Academiei Române (in Romanian).
Marciuk GI, Șaidurov VV (1981). Creșterea Preciziei Soluțiilor în Scheme cu Diferențe. București:
Editura Academiei Române (in Romanian).
Marinescu G (1974). Analiza Numerică. București: Editura Academiei Române (in Romanian).
Meriam JL, Kraige LG (2012). Engineering Mechanics: Dynamics. Hoboken: John Wiley & Sons, Inc.
Otto SR, Denier JP (2005). An Introduction to Programming and Numerical Methods in MATLAB.
London: Springer-Verlag.
Palm WJ III (2007). Mechanical Vibrations. Hoboken: John Wiley & Sons, Inc.
Pandrea N (2000). Elemente de Mecanica Solidului în Coordonate Plückeriene. București: Editura
Academiei Române (in Romanian).
Pandrea N, Pârlac S, Popa D (2001). Modele pentru Studiul Vibrațiilor Automobilelor. Pitești: Tiparg
(in Romanian).
Pandrea N, Popa D (2000). Mecanisme. Teorie și Aplicații CAD. București: Editura Tehnică
(in Romanian).
Pandrea N, Stănescu ND (2002). Mecanică. București: Editura Didactică și Pedagogică (in Romanian).
Press WH, Teukolsky SA, Vetterling WT, Flannery BP (2007). Numerical Recipes: The Art of Scientific
Computing. 3rd ed. Cambridge: Cambridge University Press.
Quarteroni A, Sacco R, Saleri F (2010). Numerical Mathematics. 2nd ed. Berlin: Springer-Verlag.
Ralston A, Rabinowitz P (2001). A First Course in Numerical Analysis. 2nd ed. New York: Dover
Publications.
Ridgway Scott L (2011). Numerical Analysis. Princeton: Princeton University Press.
Salvadori MG, Baron ML (1962). Numerical Methods in Engineering. Englewood Cliffs: Prentice
Hall.
Sauer T (2011). Numerical Analysis. 2nd ed. London: Pearson.
Simionescu I, Dranga M, Moise V (1995). Metode Numerice în Tehnică. Aplicații în FORTRAN.
București: Editura Tehnică (in Romanian).
Sinha AK (2010). Vibration of Mechanical Systems. Cambridge: Cambridge University Press.
Soare M, Teodorescu PP, Toma I (2010). Ordinary Differential Equations with Applications to Mechan-
ics. Dordrecht: Springer-Verlag.
Stănescu ND (2007). Metode Numerice. București: Editura Didactică și Pedagogică (in Romanian).
Stănescu ND, Munteanu L, Chiroiu V, Pandrea N (2007). Sisteme Dinamice: Teorie și Aplicații.
Volume 1. București: Editura Academiei Române (in Romanian).
Stănescu ND, Munteanu L, Chiroiu V, Pandrea N (2011). Sisteme Dinamice: Teorie și Aplicații.
Volume 2. București: Editura Academiei Române (in Romanian).
Stoer J, Bulirsch R (2010). Introduction to Numerical Analysis. 3rd ed. New York: Springer-Verlag.
Stuart AM, Humphries AR (1998). Dynamical Systems and Numerical Analysis. Cambridge: Cam-
bridge University Press.
Süli E, Mayers D (2003). An Introduction to Numerical Analysis. Cambridge: Cambridge University
Press.
Teodorescu PP (2008). Mechanical Systems: Classical Models. Volume 2: Mechanics of Discrete and
Continuous Systems. Dordrecht: Springer-Verlag.
Teodorescu PP (2009). Mechanical Systems: Classical Models. Volume 3: Analytical Mechanics. Dor-
drecht: Springer-Verlag.
Toma I (2008). Metoda Echivalenței Lineare și Aplicațiile Ei în Mecanică. București: Editura Tehnică
(in Romanian).
Udriște C, Iftode V, Postolache M (1996). Metode Numerice de Calcul. Algoritmi și Programe Turbo
Pascal. București: Editura Tehnică (in Romanian).
9
INTEGRATION OF PARTIAL
DIFFERENTIAL EQUATIONS
AND OF SYSTEMS OF PARTIAL
DIFFERENTIAL EQUATIONS
9.1 INTRODUCTION
Many problems of science and technique lead to partial differential equations. The mathematical
theories of such equations, especially of the nonlinear ones, are very intricate, such that their
numerical study becomes inevitable.
To classify the partial differential equations, we may use various criteria, that is,
• considering the order of the derivatives, we have equations of first order, second order, or nth
order;
• considering the linearity character, we have linear, quasilinear, or nonlinear equations;
• considering the influence of the integration domain at a point, we have equations of elliptic,
parabolic, or hyperbolic type;
• considering the types of limit conditions, we get Dirichlet, Neumann, or mixed problems.
The partial differential equations which will be dealt with further are mostly the usual equations,
the existence and the uniqueness of the solution being assured.
9.2 PARTIAL DIFFERENTIAL EQUATIONS OF FIRST ORDER
The partial differential equations of first order have the general form
$$\sum_{i=1}^{n} a_i(x_1, x_2, \ldots, x_n, u)\frac{\partial u}{\partial x_i} = b(x_1, x_2, \ldots, x_n, u), \quad (9.1)$$
where u is the unknown function, xi, i = 1, n, are the independent variables, while ai, i = 1, n,
and b are functions that do not depend on the partial derivatives ∂u/∂xi, i = 1, n.
Definition 9.1
(i) If the functions ai, i = 1, n, and b do not depend on the unknown function u, then the
equation is linear.
(ii) If the function b is identically zero, b ≡ 0, then the equation is called homogeneous.
The solution of equation (9.1) is reduced to the solving of a system of n ordinary differential
equations,
$$\frac{dx_1}{a_1(x_1, \ldots, x_n, u)} = \frac{dx_2}{a_2(x_1, \ldots, x_n, u)} = \cdots = \frac{dx_n}{a_n(x_1, \ldots, x_n, u)} = \frac{du}{b(x_1, \ldots, x_n, u)}. \quad (9.2)$$
Definition 9.2 System (9.2) is called a characteristic system.
In general, the solution of equation (9.1) is an $n$-dimensional hypersurface in a domain $D_{n+1} \subset \mathbb{R}^{n+1}$, the solution being of the form $F(x_1, \ldots, x_n, u) = 0$, or of the form $u = f(x_1, \ldots, x_n)$.
In the case of the Cauchy problem, the $n$-dimensional integral hypersurface pierces an $(n-1)$-dimensional hypersurface $\Sigma$, contained in the $(n+1)$-dimensional definition domain, the hypersurface $\Sigma$ being the intersection of two $n$-dimensional hypersurfaces,
$$F_1(x_1, \ldots, x_n, u) = 0, \quad F_2(x_1, \ldots, x_n, u) = 0. \tag{9.3}$$
The solution of system (9.2) depends on $n$ arbitrary constants $C_i$, $i = \overline{1,n}$, and is of the form
$$\varphi_i(x_1, \ldots, x_n, u) = C_i, \quad i = \overline{1,n}. \tag{9.4}$$

Definition 9.3 The hypersurfaces $\varphi_i(x_1, \ldots, x_n, u) = C_i$, $i = \overline{1,n}$, are called characteristic hypersurfaces and depend on one parameter.

Relations (9.3) and (9.4) form a system of $n + 2$ equations from which the $n + 1$ variables $x_1, x_2, \ldots, x_n, u$ are expressed as functions of $C_i$, $i = \overline{1,n}$; introducing these in the last equation, we obtain
$$\Phi(C_1, \ldots, C_n) = 0. \tag{9.5}$$
From equations (9.4) and (9.5) we get the solution
$$\Phi(C_1, \ldots, C_n) = \Phi(\varphi_1, \ldots, \varphi_n) \equiv F(x_1, \ldots, x_n, u) = 0. \tag{9.6}$$
To solve the problem numerically, we proceed as follows. We seek the solution in the domain $D_{n+1} \subset \mathbb{R}^{n+1}$, which contains the hypersurface $\Sigma$ of equation (9.3). We divide the hypersurface $\Sigma$ conveniently, observing that the values at the knots represent initial conditions for the system of differential equations (9.2).
If $b \equiv 0$, then system (9.2) is simpler and reads
$$\frac{dx_1}{a_1(x_1, \ldots, x_n, u_0)} = \cdots = \frac{dx_n}{a_n(x_1, \ldots, x_n, u_0)}, \tag{9.7}$$
where $u = u_0 = \text{const}$ is a first integral of the system.
There are two possibilities to tackle a numerical solution: the first implies the use of explicit schemata, while the second implies the use of implicit schemata.
9.2.1 Numerical Integration by Means of Explicit Schemata
The first step, in this case, consists of the discretization of the partial differential equation, that is, dividing the domain by means of a calculation net and replacing the partial differential equation by a new and simpler one. The simplest method is based on finite differences.
Let us deal with this method for a simple problem, that is, the partial differential equation of first order with two independent variables
$$a_1(x_1, x_2, u)\,\frac{\partial u}{\partial x_1} + a_2(x_1, x_2, u)\,\frac{\partial u}{\partial x_2} = b(x_1, x_2, u), \quad x_1 \in [0, l_1],\; x_2 \in [0, l_2]. \tag{9.8}$$
To solve equation (9.8), initial conditions of the form
$$u(x_1, 0) = f(x_1) \tag{9.9}$$
are necessary. Sometimes, limit conditions of the form
$$u(0, x_2) = g_0(x_2), \quad u(l_1, x_2) = g_1(x_2) \tag{9.10}$$
are imposed, where the functions $f$, $g_0$, and $g_1$ are known.
The numerical solution of equation (9.8) implies the division of the rectangular domain
[0, l1] × [0, l2] by means of a net with equal steps on each axis, denoted by h and k for the
variables x1 and x2, respectively (Fig. 9.1).
Using the expansion of the function $u(x_1, x_2)$ into a Taylor series around the point $A(x_1^i, x_2^j)$, we get
$$u(x_1^{i-1}, x_2^j) = u(x_1^i, x_2^j) - h\,\frac{\partial u(x_1^i, x_2^j)}{\partial x_1} + O(h^2), \tag{9.11}$$
$$u(x_1^i, x_2^{j+1}) = u(x_1^i, x_2^j) + k\,\frac{\partial u(x_1^i, x_2^j)}{\partial x_2} + O(k^2), \tag{9.12}$$
Figure 9.1 The calculation net for equation (9.8).
where $x_1^i = ih$, $i = \overline{0,I}$, $x_2^j = jk$, $j = \overline{0,J}$, $h = l_1/I$, $k = l_2/J$. It follows that
$$\frac{\partial u(x_1^i, x_2^j)}{\partial x_1} = \frac{u(x_1^i, x_2^j) - u(x_1^{i-1}, x_2^j)}{h} + O(h), \tag{9.13}$$
$$\frac{\partial u(x_1^i, x_2^j)}{\partial x_2} = \frac{u(x_1^i, x_2^{j+1}) - u(x_1^i, x_2^j)}{k} + O(k). \tag{9.14}$$
Neglecting $O(h)$ and $O(k)$ in equations (9.13) and (9.14), we obtain the equation with finite differences
$$a_1\bigl(x_1^i, x_2^j, u(x_1^i, x_2^j)\bigr)\,\frac{u(x_1^i, x_2^j) - u(x_1^{i-1}, x_2^j)}{h} + a_2\bigl(x_1^i, x_2^j, u(x_1^i, x_2^j)\bigr)\,\frac{u(x_1^i, x_2^{j+1}) - u(x_1^i, x_2^j)}{k} = b\bigl(x_1^i, x_2^j, u(x_1^i, x_2^j)\bigr). \tag{9.15}$$
Let us now consider the wave propagation equation
$$\frac{\partial u}{\partial t} + a\,\frac{\partial u}{\partial x} = 0, \quad x \in [0, 1],\; t \in [0, T], \tag{9.16}$$
where $a$ is a positive constant.
Applying the previous theory, we obtain the equation in finite differences
$$V(x^i, t^{j+1}) = V(x^i, t^j) + c\,[V(x^{i-1}, t^j) - V(x^i, t^j)], \quad i = \overline{1,I},\; j = \overline{1,J}, \tag{9.17}$$
where $V(x^i, t^j)$ denotes the approximate value of the function $u(x^i, t^j)$, $x^i = ih$, $t^j = jk$, $h = 1/I$, $k = T/J$.

Definition 9.4 The number $c$ of relation (9.17), the expression of which is
$$c = \frac{ak}{h}, \tag{9.18}$$
bears the name of Courant number.¹

Equation (9.16) is equivalent to the system
$$\frac{dt}{1} = \frac{dx}{a}, \tag{9.19}$$
which leads to the first integral
$$x - at = C_1, \tag{9.20}$$
where $C_1$ is a constant; hence, the exact solution of the problem is
$$u = \varphi(x - at), \tag{9.21}$$
where $\varphi$ is an arbitrary function.

¹The number appears in the Courant–Friedrichs–Lewy condition of convergence, named after Richard Courant (1888–1972), Kurt O. Friedrichs (1901–1982), and Hans Lewy (1904–1988), who published it in 1928.
If $c = 1$, then the schema is
$$V(x^i, t^{j+1}) = V(x^{i-1}, t^j). \tag{9.22}$$
Definition 9.5 We say that a method with finite differences is convergent if the solution obtained
by means of the equation with differences converges to the exact solution, when the norm of the
net tends to zero.
Observation 9.1
(i) No schema is unconditionally stable or unstable.
(ii) The schema given in the previous example is stable for 0 < c ≤ 1.
(iii) A better approximation of the derivative $\partial u(x^i, t^j)/\partial x$ by using central differences,
$$\frac{\partial u(x_i, t_j)}{\partial x} = \frac{u(x_{i+1}, t_j) - u(x_{i-1}, t_j)}{2h} + O(h^2), \tag{9.23}$$
leads to an unstable schema for any Courant number $c$.
An often used explicit schema is the Lax–Wendroff² schema, for which, in the case of the previous example, the equation with differences reads
$$V(x^i, t^{j+1}) = (1 - c^2)\,V(x^i, t^j) - \frac{c}{2}(1 - c)\,V(x^{i+1}, t^j) + \frac{c}{2}(1 + c)\,V(x^{i-1}, t^j), \tag{9.24}$$
its order of accuracy being $O(h^2)$. Let us note that for $c = 1$ the Lax–Wendroff schema leads to the exact solution $V(x^i, t^{j+1}) = V(x^{i-1}, t^j)$.
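To make the explicit schemata concrete, the following minimal sketch in Python (our own illustration, not a program from the book) advances the upwind schema (9.17) and the Lax–Wendroff schema (9.24) for equation (9.16), assuming a smooth initial profile and a prescribed Courant number $c = ak/h$.

```python
import numpy as np

def upwind_step(V, c):
    """One time step of the explicit schema (9.17); first-order accurate."""
    W = V.copy()                       # W[0] keeps the inflow boundary value
    W[1:] = V[1:] + c * (V[:-1] - V[1:])
    return W

def lax_wendroff_step(V, c):
    """One time step of the Lax-Wendroff schema (9.24); accuracy O(h^2)."""
    W = V.copy()                       # boundary values are kept unchanged
    W[1:-1] = ((1 - c**2) * V[1:-1]
               - 0.5 * c * (1 - c) * V[2:]
               + 0.5 * c * (1 + c) * V[:-2])
    return W

# Transport of a bump with a = 1 on [0, 1]; c = 0.5 satisfies 0 < c <= 1.
I, a, c, T = 100, 1.0, 0.5, 0.5
h = 1.0 / I
k = c * h / a
x = np.linspace(0.0, 1.0, I + 1)
V = np.exp(-200.0 * (x - 0.25) ** 2)   # initial profile u(x, 0)
for _ in range(round(T / k)):
    V = upwind_step(V, c)
# by (9.21), the exact solution is the initial profile translated by a*T
```

For $c = 1$ both updates reduce to $V(x^i, t^{j+1}) = V(x^{i-1}, t^j)$, that is, to the exact transport noted above.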
9.2.2 Numerical Integration by Means of Implicit Schemata
The implicit schemata avoid the disadvantage of the conditional convergence that appears in the case of the explicit schemata.
In the case of implicit schemata, the space derivative is approximated by using the approximate values $V(x^i, t^{j+1})$ and not the $V(x^i, t^j)$ ones. Thus, we may write
$$\frac{\partial u(x_i, t_{j+1})}{\partial x} = \frac{u(x_{i+1}, t_{j+1}) - u(x_i, t_{j+1})}{h} + O(h). \tag{9.25}$$
In our example, the equation with finite differences takes the form
$$V(x^i, t^{j+1}) = \frac{c\,V(x^{i+1}, t^{j+1}) + V(x^i, t^j)}{1 + c}, \quad i = 1, 2, \ldots, \tag{9.26}$$
which is unconditionally convergent.
Another schema often used in the case of the considered example is that of Wendroff, for which the equation with differences reads
$$V(x^i, t^{j+1}) = V(x^{i-1}, t^j) + \frac{1 - c}{1 + c}\,[V(x^i, t^j) - V(x^{i-1}, t^j)]. \tag{9.27}$$
²After Peter David Lax (1926–) and Burton Wendroff (1930–), who presented the method in 1960.
9.3 PARTIAL DIFFERENTIAL EQUATIONS OF SECOND ORDER
Let us consider the quasilinear partial differential equation of second order
$$\sum_{i=1}^{n} a_i(x_1, \ldots, x_n, u)\,\frac{\partial^2 u}{\partial x_i^2} + \sum_{i=1}^{n} b_i(x_1, \ldots, x_n, u)\,\frac{\partial u}{\partial x_i} + c(x_1, \ldots, x_n, u) = 0, \tag{9.28}$$
written in a canonical form (it does not contain mixed partial derivatives).
Equation (9.28) is
• of elliptic type if all the coefficients $a_i(x_1, \ldots, x_n, u)$, $i = \overline{1,n}$, have the same sign;
• of parabolic type if there exists an index $j$, $1 \le j \le n$, so that $a_j(x_1, \ldots, x_n, u) = 0$, $a_i(x_1, \ldots, x_n, u) \neq 0$ for $i \neq j$, $1 \le i \le n$, and $b_j(x_1, \ldots, x_n, u) \neq 0$;
• of hyperbolic type if all the coefficients $a_i(x_1, \ldots, x_n, u)$ have the same sign, excepting one, which is of opposite sign.
Observation 9.2
(i) In the case of an equation of elliptic type, an arbitrary point of the domain is influenced by all the points of any of its neighborhoods. Because of this reciprocal influence, a problem of elliptic type is numerically solved simultaneously for all the points of the domain. Moreover, the limit conditions are given on a closed frontier.
(ii) If the equation is of parabolic type, then we can numerically advance in the direction $x_j$ for which $a_j(x_1, \ldots, x_n, u) = 0$. Equation (9.28) is now written in the form
$$b_j(x_1, \ldots, x_n, u)\,\frac{\partial u}{\partial x_j} = F\left(x_1, \ldots, x_n, u, \frac{\partial u}{\partial x_i}, \frac{\partial^2 u}{\partial x_i^2}\right), \quad i = \overline{1,n},\; i \neq j. \tag{9.29}$$
The problem is now solved only for the points situated on the hypersurfaces $x_j = \text{const}$ and not for all the points of the domain.
(iii) In the case of hyperbolic equations, there exist points which do not influence each other. The numerical solution must take this fact into account. Moreover, there exist several distinct characteristic directions along which we may advance starting from a certain initial state. In the case of these equations, we may have not only initial conditions but boundary conditions too.
9.4 PARTIAL DIFFERENTIAL EQUATIONS OF SECOND ORDER OF ELLIPTIC TYPE
We consider Poisson’s equation3
∇2
u(x, y) =
∂2
u
∂x2
(x, y) +
∂2
u
∂y2
(x, y) = f (x, y), (9.30)
where (x, y) ∈ D, D rectangular domain,
D = {(x, y)| a < x < b, c < y < d}, (9.31)
3The equation was studied by Sim´eon Denis Poisson (1781–1840) in 1818.
with the boundary condition
$$u(x, y) = g(x, y), \quad (x, y) \in \partial D. \tag{9.32}$$

Observation 9.3 If $f(x, y)$ and $g(x, y)$ are continuous, then problem (9.30) with the boundary condition (9.32) has a unique solution.

We divide the interval $[a, b]$ into $n$ equal subintervals of length $h$ and the interval $[c, d]$ into $m$ equal subintervals of length $k$, so that
$$h = \frac{b - a}{n}, \quad k = \frac{d - c}{m}. \tag{9.33}$$
Thus, the rectangle $D$ will be covered by a net with vertical and horizontal lines which pass through the points $x_i$, $i = \overline{0,n}$, and $y_j$, $j = \overline{0,m}$, where
$$x_i = a + ih, \quad i = \overline{0,n}, \tag{9.34}$$
$$y_j = c + jk, \quad j = \overline{0,m}. \tag{9.35}$$
Let $A_{ij}(x_i, y_j)$, $i = \overline{1,n-1}$, $j = \overline{1,m-1}$, be a knot inside the net. We may expand the function $u(x, y)$ into a Taylor series in the $x$-variable, around $x_i$, obtaining
$$\frac{\partial^2 u}{\partial x^2}(x_i, y_j) = \frac{u(x_{i+1}, y_j) - 2u(x_i, y_j) + u(x_{i-1}, y_j)}{h^2} - \frac{h^2}{12}\,\frac{\partial^4 u}{\partial x^4}(\xi_i, y_j), \tag{9.36}$$
where $\xi_i$ is an intermediary value between $x_{i-1}$ and $x_{i+1}$. Analogously, expanding the function $u(x, y)$ into a Taylor series in the $y$-variable, around $y_j$, it follows that
$$\frac{\partial^2 u}{\partial y^2}(x_i, y_j) = \frac{u(x_i, y_{j+1}) - 2u(x_i, y_j) + u(x_i, y_{j-1})}{k^2} - \frac{k^2}{12}\,\frac{\partial^4 u}{\partial y^4}(x_i, \eta_j), \tag{9.37}$$
with $\eta_j$, in this case, being an intermediary point between $y_{j-1}$ and $y_{j+1}$.
By means of formulae (9.36) and (9.37), problems (9.30) and (9.32) become
$$\frac{u(x_{i+1}, y_j) - 2u(x_i, y_j) + u(x_{i-1}, y_j)}{h^2} + \frac{u(x_i, y_{j+1}) - 2u(x_i, y_j) + u(x_i, y_{j-1})}{k^2} = f(x_i, y_j) + \frac{h^2}{12}\,\frac{\partial^4 u}{\partial x^4}(x_i, y_j) + \frac{k^2}{12}\,\frac{\partial^4 u}{\partial y^4}(x_i, y_j), \quad i = \overline{1,n-1},\; j = \overline{1,m-1}, \tag{9.38}$$
$$u(x_0, y_j) = g(x_0, y_j), \quad j = \overline{0,m}, \tag{9.39}$$
$$u(x_n, y_j) = g(x_n, y_j), \quad j = \overline{0,m}, \tag{9.40}$$
$$u(x_i, y_0) = g(x_i, y_0), \quad i = \overline{1,n-1}, \tag{9.41}$$
$$u(x_i, y_m) = g(x_i, y_m), \quad i = \overline{1,n-1}. \tag{9.42}$$

Observation 9.4 The local truncation error is of order $O(h^2 + k^2)$.

We use the notation
$$w_{ij} = u(x_i, y_j), \quad i = \overline{0,n},\; j = \overline{0,m}, \tag{9.43}$$
and take into account that $h$ and $k$ are sufficiently small to rewrite formulae (9.38)–(9.42) in the form
$$2\left[\left(\frac{h}{k}\right)^2 + 1\right]w_{ij} - (w_{i+1,j} + w_{i-1,j}) - \left(\frac{h}{k}\right)^2(w_{i,j+1} + w_{i,j-1}) = -h^2 f(x_i, y_j), \tag{9.44}$$
$$w_{0,j} = g(x_0, y_j), \quad j = \overline{0,m}, \tag{9.45}$$
$$w_{n,j} = g(x_n, y_j), \quad j = \overline{0,m}, \tag{9.46}$$
$$w_{i,0} = g(x_i, y_0), \quad i = \overline{1,n-1}, \tag{9.47}$$
$$w_{i,m} = g(x_i, y_m), \quad i = \overline{1,n-1}. \tag{9.48}$$
Equations (9.44)–(9.48) lead to a system of $(n-1)(m-1)$ linear equations with the $(n-1)(m-1)$ unknowns $w_{i,j} \approx u(x_i, y_j)$, $i = \overline{1,n-1}$, $j = \overline{1,m-1}$.
Renumbering the knots of the net, so that
$$A_{i,j} = A_l, \tag{9.49}$$
where
$$l = i + (m - 1 - j)(n - 1), \quad i = \overline{1,n-1},\; j = \overline{1,m-1}, \tag{9.50}$$
and noting
$$w_{i,j} = w_l, \tag{9.51}$$
we may write the system of $(n-1)(m-1)$ equations with $(n-1)(m-1)$ unknowns in a matrix form.
Figure 9.2 The numbering of the internal knots of the net.
Observation 9.5 The renumbering orders the internal knots of the net in succession, starting from the upper left corner and proceeding row by row to the lower right one, as shown in Figure 9.2.
The algorithm of the finite differences for problems (9.30) and (9.32) reads

– given $m$, $n$, $a$, $b$, $c$, $d$, $\varepsilon$, $g(x, y)$, $f(x, y)$;
– calculate $h = (b - a)/n$, $k = (d - c)/m$;
– for $i$ from 0 to $n$ calculate $x_i = a + ih$;
– for $j$ from 0 to $m$ calculate $y_j = c + jk$;
– for $i$ from 1 to $n - 1$ do
  – for $j$ from 1 to $m - 1$ do
    – calculate $w_{i,j}^{(0)} = 0$;
– calculate $\lambda = h^2/k^2$;
– set $l = 1$;
– repeat
  – calculate $w_{1,m-1}^{(l)} = \frac{1}{2(\lambda + 1)}\bigl[-h^2 f(x_1, y_{m-1}) + g(x_0, y_{m-1}) + \lambda g(x_1, y_m) + \lambda w_{1,m-2}^{(l-1)} + w_{2,m-1}^{(l-1)}\bigr]$;
  – for $i$ from 2 to $n - 2$ calculate $w_{i,m-1}^{(l)} = \frac{1}{2(\lambda + 1)}\bigl[-h^2 f(x_i, y_{m-1}) + \lambda g(x_i, y_m) + w_{i-1,m-1}^{(l)} + w_{i+1,m-1}^{(l-1)} + \lambda w_{i,m-2}^{(l-1)}\bigr]$;
  – calculate $w_{n-1,m-1}^{(l)} = \frac{1}{2(\lambda + 1)}\bigl[-h^2 f(x_{n-1}, y_{m-1}) + g(x_n, y_{m-1}) + \lambda g(x_{n-1}, y_m) + w_{n-2,m-1}^{(l)} + \lambda w_{n-1,m-2}^{(l-1)}\bigr]$;
  – for $j$ from $m - 2$ down to 2 do
    – calculate $w_{1,j}^{(l)} = \frac{1}{2(\lambda + 1)}\bigl[-h^2 f(x_1, y_j) + g(x_0, y_j) + \lambda w_{1,j+1}^{(l)} + \lambda w_{1,j-1}^{(l-1)} + w_{2,j}^{(l-1)}\bigr]$;
    – for $i$ from 2 to $n - 2$ do
      – calculate $w_{i,j}^{(l)} = \frac{1}{2(\lambda + 1)}\bigl[-h^2 f(x_i, y_j) + w_{i-1,j}^{(l)} + \lambda w_{i,j+1}^{(l)} + w_{i+1,j}^{(l-1)} + \lambda w_{i,j-1}^{(l-1)}\bigr]$;
    – calculate $w_{n-1,j}^{(l)} = \frac{1}{2(\lambda + 1)}\bigl[-h^2 f(x_{n-1}, y_j) + g(x_n, y_j) + w_{n-2,j}^{(l)} + \lambda w_{n-1,j+1}^{(l)} + \lambda w_{n-1,j-1}^{(l-1)}\bigr]$;
  – calculate $w_{1,1}^{(l)} = \frac{1}{2(\lambda + 1)}\bigl[-h^2 f(x_1, y_1) + g(x_0, y_1) + \lambda g(x_1, y_0) + \lambda w_{1,2}^{(l)} + w_{2,1}^{(l-1)}\bigr]$;
  – for $i$ from 2 to $n - 2$ do
    – calculate $w_{i,1}^{(l)} = \frac{1}{2(\lambda + 1)}\bigl[-h^2 f(x_i, y_1) + \lambda g(x_i, y_0) + w_{i-1,1}^{(l)} + \lambda w_{i,2}^{(l)} + w_{i+1,1}^{(l-1)}\bigr]$;
  – calculate $w_{n-1,1}^{(l)} = \frac{1}{2(\lambda + 1)}\bigl[-h^2 f(x_{n-1}, y_1) + g(x_n, y_1) + \lambda g(x_{n-1}, y_0) + w_{n-2,1}^{(l)} + \lambda w_{n-1,2}^{(l)}\bigr]$;
  – set $b = \text{true}$;
  – for $i$ from 1 to $n - 1$ do
    – for $j$ from 1 to $m - 1$ do
      – if $|w_{i,j}^{(l)} - w_{i,j}^{(l-1)}| \ge \varepsilon$ then $b = \text{false}$;
  – if $b = \text{false}$ then $l = l + 1$;
– until $b = \text{true}$.

At the end, $w_{i,j}^{(l)}$ approximates $u(x_i, y_j)$ for $i = \overline{1,n-1}$, $j = \overline{1,m-1}$.
Observation 9.6 The linear system has thus been solved by the Gauss–Seidel method.
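A compact transcription of this algorithm in Python may look as follows (a sketch in our own notation; the function name is not from the book). It sweeps the internal knots by Gauss–Seidel, top row first, in the order of the list above, until two successive iterates differ by less than $\varepsilon$.

```python
import numpy as np

def poisson_gauss_seidel(f, g, a, b, c, d, n, m, eps=1e-10, maxiter=100000):
    """Five-point schema (9.44) for u_xx + u_yy = f on (a,b) x (c,d),
    with u = g on the boundary, solved by Gauss-Seidel sweeps."""
    h, k = (b - a) / n, (d - c) / m
    lam = h**2 / k**2
    x = a + h * np.arange(n + 1)
    y = c + k * np.arange(m + 1)
    w = np.zeros((n + 1, m + 1))
    for j in range(m + 1):             # boundary values from g
        w[0, j], w[n, j] = g(x[0], y[j]), g(x[n], y[j])
    for i in range(n + 1):
        w[i, 0], w[i, m] = g(x[i], y[0]), g(x[i], y[m])
    for _ in range(maxiter):
        delta = 0.0
        for j in range(m - 1, 0, -1):  # top row first, as in the text
            for i in range(1, n):
                new = (-h**2 * f(x[i], y[j]) + w[i - 1, j] + w[i + 1, j]
                       + lam * (w[i, j + 1] + w[i, j - 1])) / (2 * (lam + 1))
                delta = max(delta, abs(new - w[i, j]))
                w[i, j] = new
        if delta < eps:                # stopping test with epsilon
            break
    return x, y, w
```

For instance, `poisson_gauss_seidel(lambda x, y: 0.0, lambda x, y: x * y, 0, 1, 0, 1, 5, 5)` reproduces the exact solution $u = xy$ of Example 9.5 below.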
9.5 PARTIAL DIFFERENTIAL EQUATIONS OF SECOND ORDER OF PARABOLIC TYPE

We consider the partial differential equation of second order of parabolic type
$$\frac{\partial u}{\partial t}(x, t) - \alpha^2\,\frac{\partial^2 u}{\partial x^2}(x, t) = 0, \quad 0 \le x \le l,\; t > 0, \tag{9.52}$$
with the initial and frontier conditions
$$u(x, 0) = f(x), \quad 0 \le x \le l, \tag{9.53}$$
$$u(0, t) = u(l, t) = 0, \quad t > 0. \tag{9.54}$$
We begin by choosing two net constants $h$ and $k$, where
$$h = \frac{l}{m}, \quad m \in \mathbb{N}. \tag{9.55}$$
Thus, the knots of the net are $(x_i, t_j)$, where
$$x_i = ih, \quad i = \overline{0,m}, \tag{9.56}$$
$$t_j = jk, \quad j = 0, 1, \ldots \tag{9.57}$$
Expanding into a Taylor series, we obtain the formulae with differences
$$\frac{\partial u}{\partial t}(x_i, t_j) = \frac{u(x_i, t_j + k) - u(x_i, t_j)}{k} - \frac{k}{2}\,\frac{\partial^2 u}{\partial t^2}(x_i, \tau_j), \tag{9.58}$$
where $\tau_j \in (t_j, t_{j+1})$, and
$$\frac{\partial^2 u}{\partial x^2}(x_i, t_j) = \frac{u(x_i + h, t_j) - 2u(x_i, t_j) + u(x_i - h, t_j)}{h^2} - \frac{h^2}{12}\,\frac{\partial^4 u}{\partial x^4}(\xi_i, t_j), \tag{9.59}$$
where $\xi_i$ is a point between $x_{i-1}$ and $x_{i+1}$. Replacing expressions (9.58) and (9.59) in equation (9.52), we obtain the linear system
$$\frac{w_{i,j+1} - w_{i,j}}{k} - \alpha^2\,\frac{w_{i+1,j} - 2w_{i,j} + w_{i-1,j}}{h^2} = 0, \quad i = \overline{1,m-1},\; j = 1, 2, \ldots, \tag{9.60}$$
where $w_{ij}$ is the approximation of $u(x_i, t_j)$.
Observation 9.7 The truncation error is of order $O(k + h^2)$.
From equation (9.60) we get
$$w_{i,j+1} = \left(1 - \frac{2\alpha^2 k}{h^2}\right)w_{i,j} + \alpha^2\frac{k}{h^2}\,(w_{i+1,j} + w_{i-1,j}), \quad i = \overline{1,m-1},\; j = 1, 2, \ldots \tag{9.61}$$
Condition (9.53) leads to
$$w_{i,0} = f(x_i), \quad i = \overline{0,m}. \tag{9.62}$$
With these values, we can determine $w_{i,1}$, $i = \overline{1,m-1}$. From the frontier condition (9.54) we obtain
$$w_{0,1} = w_{m,1} = 0. \tag{9.63}$$
Applying now the above described procedure with the known values $w_{i,1}$, it follows that we can determine, step by step, the values $w_{i,2}, w_{i,3}, \ldots$ We obtain a square tridiagonal matrix of order $m - 1$ associated to the linear system, of the form
$$A = \begin{bmatrix} 1 - 2\lambda & \lambda & 0 & 0 & \cdots & 0 & 0 \\ \lambda & 1 - 2\lambda & \lambda & 0 & \cdots & 0 & 0 \\ 0 & \lambda & 1 - 2\lambda & \lambda & \cdots & 0 & 0 \\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\ 0 & 0 & 0 & 0 & \cdots & 1 - 2\lambda & \lambda \\ 0 & 0 & 0 & 0 & \cdots & \lambda & 1 - 2\lambda \end{bmatrix}, \tag{9.64}$$
where
$$\lambda = \alpha^2\frac{k}{h^2}. \tag{9.65}$$
If we now denote
$$w^{(j)} = \begin{bmatrix} w_{1,j} \\ w_{2,j} \\ \vdots \\ w_{m-1,j} \end{bmatrix}, \quad j = 1, 2, \ldots, \tag{9.66}$$
$$w^{(0)} = \begin{bmatrix} f(x_1) \\ f(x_2) \\ \vdots \\ f(x_{m-1}) \end{bmatrix}, \tag{9.67}$$
then the approximate solution of problems (9.52)–(9.54) is given by the matrix equation
$$w^{(j)} = A\,w^{(j-1)}, \quad j = 1, 2, \ldots \tag{9.68}$$
Definition 9.6 The technique that has been presented is called the method with forward differences.
If we denote the error in the representation of the initial data $w^{(0)}$ by $\varepsilon^{(0)}$, then $w^{(1)}$ reads
$$w^{(1)} = A(w^{(0)} + \varepsilon^{(0)}) = Aw^{(0)} + A\varepsilon^{(0)}, \tag{9.69}$$
so that the representation error of $w^{(1)}$ is given by $A\varepsilon^{(0)}$. Step by step, we obtain the representation error $A^n\varepsilon^{(0)}$ of $w^{(n)}$. Hence, the method is stable if and only if
$$\|A^n\varepsilon^{(0)}\| \le \|\varepsilon^{(0)}\|, \quad n = 1, 2, \ldots \tag{9.70}$$
This implies $\|A^n\| \le 1$, where $\|\cdot\|$ denotes any of the canonical norms; it follows that the spectral radius of the matrix $A^n$ must be at most equal to unity,
$$\rho(A^n) = [\rho(A)]^n \le 1. \tag{9.71}$$
This happens if all the eigenvalues of the matrix $A$ are at most equal to unity in modulus.
On the other hand, the eigenvalues of the matrix $A$ are given by
$$\mu_i = 1 - 4\lambda\sin^2\frac{\pi i}{2m}, \quad i = \overline{1,m-1}, \tag{9.72}$$
while the stability condition
$$\left|1 - 4\lambda\sin^2\frac{\pi i}{2m}\right| \le 1, \quad i = \overline{1,m-1}, \tag{9.73}$$
leads to
$$0 \le \lambda\sin^2\frac{\pi i}{2m} \le \frac{1}{2}, \quad i = \overline{1,m-1}. \tag{9.74}$$
Making now $m \to \infty$ (or, equivalently, $h \to 0$), we get
$$\lim_{m \to \infty}\sin^2\frac{(m - 1)\pi}{2m} = 1, \tag{9.75}$$
hence the searched condition is
$$0 \le \lambda = \alpha^2\frac{k}{h^2} \le \frac{1}{2}. \tag{9.76}$$
The schema presented above is thus conditionally stable.
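A minimal Python sketch of the method with forward differences (our illustration, assuming a vectorized initial-condition function f) checks the stability condition (9.76) before marching:

```python
import numpy as np

def heat_forward(f, alpha, l, m, k, nsteps):
    """Explicit schema (9.61) for u_t = alpha^2 u_xx, u(0,t) = u(l,t) = 0."""
    h = l / m
    lam = alpha**2 * k / h**2
    if lam > 0.5:
        raise ValueError("stability condition (9.76) violated: lambda > 1/2")
    x = h * np.arange(m + 1)
    w = f(x).astype(float)
    w[0] = w[m] = 0.0
    for _ in range(nsteps):
        # the whole right-hand side is evaluated before the assignment
        w[1:m] = (1 - 2 * lam) * w[1:m] + lam * (w[2:] + w[:m - 1])
    return x, w

# Example 9.7, first case: m = 20, k = 0.01, 50 steps up to t = 0.5
x, w = heat_forward(np.sin, 1.0, np.pi, 20, 0.01, 50)
```

With $m = 20$ and $k = 0.01$ one has $\lambda \approx 0.405 \le 1/2$, while $k = 0.1$ gives $\lambda \approx 4.05$, and the raised exception signals the instability observed later in Example 9.7.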
An unconditionally stable schema starts from the relation
$$\frac{\partial u}{\partial t}(x_i, t_j) = \frac{u(x_i, t_j) - u(x_i, t_{j-1})}{k} + \frac{k}{2}\,\frac{\partial^2 u}{\partial t^2}(x_i, \tau_j), \tag{9.77}$$
where $\tau_j$ is a point between $t_{j-1}$ and $t_j$, as well as from formula (9.59). We obtain
$$\frac{w_{i,j} - w_{i,j-1}}{k} - \alpha^2\,\frac{w_{i+1,j} - 2w_{i,j} + w_{i-1,j}}{h^2} = 0, \quad w_{ij} \approx u(x_i, t_j). \tag{9.78}$$
Definition 9.7 The above presented method is called the method with backward differences.

Equation (9.78) is written in the form
$$(1 + 2\lambda)w_{i,j} - \lambda w_{i+1,j} - \lambda w_{i-1,j} = w_{i,j-1}, \quad i = \overline{1,m-1},\; j = 1, 2, \ldots \tag{9.79}$$
Because $w_{i,0} = f(x_i)$, $i = \overline{1,m-1}$, and $w_{m,j} = w_{0,j} = 0$, $j = 1, 2, \ldots$, the linear system takes the matrix form
$$A\,w^{(j)} = w^{(j-1)}, \tag{9.80}$$
where the matrix $A$ is
$$A = \begin{bmatrix} 1 + 2\lambda & -\lambda & 0 & 0 & \cdots & 0 & 0 \\ -\lambda & 1 + 2\lambda & -\lambda & 0 & \cdots & 0 & 0 \\ 0 & -\lambda & 1 + 2\lambda & -\lambda & \cdots & 0 & 0 \\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\ 0 & 0 & 0 & 0 & \cdots & 1 + 2\lambda & -\lambda \\ 0 & 0 & 0 & 0 & \cdots & -\lambda & 1 + 2\lambda \end{bmatrix}. \tag{9.81}$$
The solving algorithm of problems (9.52)–(9.54) is as follows:

– given $m > 0$, $k$, $N > 0$, $T = kN$, $l$;
– calculate $h = l/m$;
– for $i$ from 0 to $m$ do
  – calculate $x_i = ih$;
– calculate $\lambda = \alpha^2(k/h^2)$;
– for $i$ from 1 to $m - 1$ do
  – calculate $w_{i,0} = f(x_i)$;
– for $j$ from 1 to $N$ do
  – calculate $w_{0,j} = w_{m,j} = 0$;
– calculate $l_1 = 1 + 2\lambda$, $u_1 = -\lambda/l_1$;
– for $n$ from 2 to $m - 2$ do
  – calculate $l_n = 1 + 2\lambda + \lambda u_{n-1}$, $u_n = -\lambda/l_n$;
– calculate $l_{m-1} = 1 + 2\lambda + \lambda u_{m-2}$;
– for $j$ from 0 to $N - 1$ do
  – calculate $z_1 = w_{1,j}/l_1$;
  – for $n$ from 2 to $m - 1$ do
    – calculate $z_n = (w_{n,j} + \lambda z_{n-1})/l_n$;
  – calculate $w_{m-1,j+1} = z_{m-1}$;
  – for $n$ from $m - 2$ down to 1 do
    – calculate $w_{n,j+1} = z_n - u_n w_{n+1,j+1}$.

The values $w_{i,j}$ approximate $u(x_i, t_j)$, $i = \overline{0,m}$, $j = \overline{0,N}$.
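In Python, the same algorithm may be sketched as follows (our transcription, not the book's program); the factors $l_n$, $u_n$ of the matrix (9.81) are computed once, and each time level then costs one forward and one backward sweep.

```python
import numpy as np

def heat_backward(f, alpha, l, m, k, N):
    """Method with backward differences (9.79)-(9.81): at each time level
    solve the tridiagonal system A w^(j) = w^(j-1) using the factorization
    of the algorithm above (a Thomas-type algorithm)."""
    h = l / m
    lam = alpha**2 * k / h**2
    x = h * np.arange(m + 1)
    w = f(x).astype(float)
    w[0] = w[m] = 0.0                      # frontier conditions
    lfac = np.zeros(m)
    ufac = np.zeros(m)
    lfac[1] = 1 + 2 * lam
    ufac[1] = -lam / lfac[1]
    for n in range(2, m):
        lfac[n] = 1 + 2 * lam + lam * ufac[n - 1]
        ufac[n] = -lam / lfac[n]
    for _ in range(N):
        z = np.zeros(m)
        z[1] = w[1] / lfac[1]
        for n in range(2, m):              # forward sweep
            z[n] = (w[n] + lam * z[n - 1]) / lfac[n]
        w[m - 1] = z[m - 1]
        for n in range(m - 2, 0, -1):      # backward substitution
            w[n] = z[n] - ufac[n] * w[n + 1]
    return x, w

# Example 9.8 below: m = 20, k = 0.1, 5 steps up to t = 0.5 (Table 9.4)
x, w = heat_backward(np.sin, 1.0, np.pi, 20, 0.1, 5)
```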
In the case of the above described algorithm, the matrix $A$ has the eigenvalues
$$\mu_i = 1 + 4\lambda\sin^2\frac{i\pi}{2m}, \quad i = \overline{1,m-1}, \tag{9.82}$$
all of them being positive and superunitary. Thus, it follows that the eigenvalues of the matrix $A^{-1}$ are positive and subunitary; hence, the method with backward differences is unconditionally stable.
Using for $\partial u(x_i, t_j)/\partial t$ the formula with central differences
$$\frac{\partial u}{\partial t}(x_i, t_j) = \frac{u(x_i, t_{j+1}) - u(x_i, t_{j-1})}{2k} - \frac{k^2}{6}\,\frac{\partial^3 u}{\partial t^3}(x_i, \tau_j), \tag{9.83}$$
where $\tau_j$ is between $t_{j-1}$ and $t_{j+1}$, and for $\partial^2 u(x_i, t_j)/\partial x^2$ formula (9.59), we obtain the approximating system
$$\frac{w_{i,j+1} - w_{i,j-1}}{2k} - \alpha^2\,\frac{w_{i+1,j} - 2w_{i,j} + w_{i-1,j}}{h^2} = 0, \quad w_{ij} \approx u(x_i, t_j). \tag{9.84}$$
Definition 9.8 The method put in evidence by relation (9.84) bears the name of Richardson.⁴

Observation 9.8
(i) The error of Richardson's method is of order $O(h^2 + k^2)$.
(ii) The Richardson method is conditionally stable.

An unconditionally stable method leads to the equation with finite differences
$$\frac{w_{i,j+1} - w_{i,j}}{k} - \frac{\alpha^2}{2}\left[\frac{w_{i+1,j} - 2w_{i,j} + w_{i-1,j}}{h^2} + \frac{w_{i+1,j+1} - 2w_{i,j+1} + w_{i-1,j+1}}{h^2}\right] = 0. \tag{9.85}$$

Definition 9.9 The method given by formula (9.85) is called the Crank–Nicolson method.

Observation 9.9 The truncation error in the Crank–Nicolson method is of order $O(h^2 + k^2)$.
The Crank–Nicolson system may be written in matrix form
$$A\,w^{(j+1)} = B\,w^{(j)}, \quad j = 0, 1, 2, \ldots, \tag{9.86}$$
the matrices $A$ and $B$ being given by
$$A = \begin{bmatrix} 1 + \lambda & -\frac{\lambda}{2} & 0 & 0 & \cdots & 0 & 0 \\ -\frac{\lambda}{2} & 1 + \lambda & -\frac{\lambda}{2} & 0 & \cdots & 0 & 0 \\ 0 & -\frac{\lambda}{2} & 1 + \lambda & -\frac{\lambda}{2} & \cdots & 0 & 0 \\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\ 0 & 0 & 0 & 0 & \cdots & 1 + \lambda & -\frac{\lambda}{2} \\ 0 & 0 & 0 & 0 & \cdots & -\frac{\lambda}{2} & 1 + \lambda \end{bmatrix}, \tag{9.87}$$
$$B = \begin{bmatrix} 1 - \lambda & \frac{\lambda}{2} & 0 & 0 & \cdots & 0 & 0 \\ \frac{\lambda}{2} & 1 - \lambda & \frac{\lambda}{2} & 0 & \cdots & 0 & 0 \\ 0 & \frac{\lambda}{2} & 1 - \lambda & \frac{\lambda}{2} & \cdots & 0 & 0 \\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\ 0 & 0 & 0 & 0 & \cdots & 1 - \lambda & \frac{\lambda}{2} \\ 0 & 0 & 0 & 0 & \cdots & \frac{\lambda}{2} & 1 - \lambda \end{bmatrix}. \tag{9.88}$$
The Crank–Nicolson algorithm⁵ of solving problems (9.52)–(9.54) is as follows:

– given: $m > 0$, $k > 0$, $N > 0$, $T = kN$, $l$;
– calculate $h = l/m$;
– for $i$ from 0 to $m$ do
  – calculate $x_i = ih$;
– for $j$ from 0 to $N$ do
  – calculate $t_j = jk$;
– calculate $\lambda = \alpha^2(k/h^2)$;
– for $i$ from 1 to $m - 1$ do
  – calculate $w_{i,0} = f(x_i)$;
– for $j$ from 1 to $N$ do
  – calculate $w_{0,j} = w_{m,j} = 0$;
– calculate $l_1 = 1 + \lambda$, $u_1 = -\lambda/(2l_1)$;
– for $n$ from 2 to $m - 2$ do
  – calculate $l_n = 1 + \lambda + \lambda u_{n-1}/2$, $u_n = -\lambda/(2l_n)$;
– calculate $l_{m-1} = 1 + \lambda + \lambda u_{m-2}/2$;
– for $j$ from 0 to $N - 1$ do
  – calculate $z_1 = [(1 - \lambda)w_{1,j} + (\lambda/2)w_{2,j}]/l_1$;
  – for $n$ from 2 to $m - 1$ do
    – calculate $z_n = [(1 - \lambda)w_{n,j} + (\lambda/2)w_{n+1,j} + (\lambda/2)w_{n-1,j} + (\lambda/2)z_{n-1}]/l_n$;
  – calculate $w_{m-1,j+1} = z_{m-1}$;
  – for $n$ from $m - 2$ down to 1 do
    – calculate $w_{n,j+1} = z_n - u_n w_{n+1,j+1}$.

Finally, $w_{i,j}$ approximates $u(x_i, t_j)$, $i = \overline{0,m}$, $j = \overline{0,N}$.

⁴After Lewis Fry Richardson (1881–1953), who presented it in 1922.
⁵John Crank (1916–2006) and Phyllis Nicolson (1917–1968) published this algorithm in A Practical Method for Numerical Evaluation of Solutions of Partial Differential Equations of the Heat Conduction Type in 1947.
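The Crank–Nicolson algorithm admits the same kind of transcription (again a sketch of ours): the factorization of $A$ from (9.87) is reused at every time level, while the right-hand side is $Bw^{(j)}$ with $B$ from (9.88).

```python
import numpy as np

def crank_nicolson(f, alpha, l, m, k, N):
    """Crank-Nicolson schema (9.86): A w^(j+1) = B w^(j), with the
    tridiagonal matrices (9.87)-(9.88), solved by factorization."""
    h = l / m
    lam = alpha**2 * k / h**2
    x = h * np.arange(m + 1)
    w = f(x).astype(float)
    w[0] = w[m] = 0.0                      # frontier conditions
    lfac = np.zeros(m)
    ufac = np.zeros(m)
    lfac[1] = 1 + lam
    ufac[1] = -lam / (2 * lfac[1])
    for n in range(2, m):
        lfac[n] = 1 + lam + lam * ufac[n - 1] / 2
        ufac[n] = -lam / (2 * lfac[n])
    for _ in range(N):
        # right-hand side B w^(j); the ends w[0] = w[m] = 0 throughout
        rhs = (1 - lam) * w[1:m] + (lam / 2) * (w[2:] + w[:m - 1])
        z = np.zeros(m)
        z[1] = rhs[0] / lfac[1]
        for n in range(2, m):              # forward sweep
            z[n] = (rhs[n - 1] + (lam / 2) * z[n - 1]) / lfac[n]
        w[m - 1] = z[m - 1]
        for n in range(m - 2, 0, -1):      # backward substitution
            w[n] = z[n] - ufac[n] * w[n + 1]
    return x, w

# Example 9.9 below: m = 20, k = 0.1, 5 steps up to t = 0.5 (Table 9.5)
x, w = crank_nicolson(np.sin, 1.0, np.pi, 20, 0.1, 5)
```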
9.6 PARTIAL DIFFERENTIAL EQUATIONS OF SECOND ORDER OF HYPERBOLIC TYPE

We start from the equation
$$\frac{\partial^2 u}{\partial t^2}(x, t) - \alpha^2\,\frac{\partial^2 u}{\partial x^2}(x, t) = 0, \quad 0 < x < l,\; t > 0, \tag{9.89}$$
to which the conditions
$$u(0, t) = u(l, t) = 0, \quad t > 0, \tag{9.90}$$
$$u(x, 0) = f(x), \quad 0 \le x \le l, \tag{9.91}$$
$$\frac{\partial u}{\partial t}(x, 0) = g(x), \quad 0 \le x \le l, \tag{9.92}$$
are added; $\alpha$ is a real constant in equation (9.89).
Let us choose a nonzero natural number $m$ and a time step $k > 0$ and denote
$$h = \frac{l}{m}. \tag{9.93}$$
Thus, the knots $(x_i, t_j)$ of the net are given by
$$x_i = ih, \quad i = \overline{0,m}, \tag{9.94}$$
$$t_j = jk, \quad j = 0, 1, \ldots \tag{9.95}$$
Let $A_{i,j}(x_i, t_j)$ be an interior point of the net. We can write the relation
$$\frac{\partial^2 u}{\partial t^2}(x_i, t_j) - \alpha^2\,\frac{\partial^2 u}{\partial x^2}(x_i, t_j) = 0 \tag{9.96}$$
at this point. Using the central differences of second order, we can write
$$\frac{\partial^2 u}{\partial t^2}(x_i, t_j) = \frac{u(x_i, t_{j+1}) - 2u(x_i, t_j) + u(x_i, t_{j-1})}{k^2} - \frac{k^2}{12}\,\frac{\partial^4 u}{\partial t^4}(x_i, \tau_j), \tag{9.97}$$
where $\tau_j$ is an intermediary value between $t_{j-1}$ and $t_{j+1}$, and
$$\frac{\partial^2 u}{\partial x^2}(x_i, t_j) = \frac{u(x_{i+1}, t_j) - 2u(x_i, t_j) + u(x_{i-1}, t_j)}{h^2} - \frac{h^2}{12}\,\frac{\partial^4 u}{\partial x^4}(\xi_i, t_j), \tag{9.98}$$
where $\xi_i \in (x_{i-1}, x_{i+1})$. It follows that
$$\frac{u(x_i, t_{j+1}) - 2u(x_i, t_j) + u(x_i, t_{j-1})}{k^2} - \alpha^2\,\frac{u(x_{i+1}, t_j) - 2u(x_i, t_j) + u(x_{i-1}, t_j)}{h^2} = \frac{1}{12}\left[k^2\,\frac{\partial^4 u}{\partial t^4}(x_i, \tau_j) - \alpha^2 h^2\,\frac{\partial^4 u}{\partial x^4}(\xi_i, t_j)\right], \tag{9.99}$$
which will be approximated by
$$\frac{w_{i,j+1} - 2w_{i,j} + w_{i,j-1}}{k^2} - \alpha^2\,\frac{w_{i+1,j} - 2w_{i,j} + w_{i-1,j}}{h^2} = 0. \tag{9.100}$$
Denoting
$$\lambda = \frac{\alpha k}{h}, \tag{9.101}$$
we obtain
$$w_{i,j+1} - 2w_{i,j} + w_{i,j-1} - \lambda^2 w_{i+1,j} + 2\lambda^2 w_{i,j} - \lambda^2 w_{i-1,j} = 0 \tag{9.102}$$
from equation (9.100), or equivalently,
$$w_{i,j+1} = 2(1 - \lambda^2)w_{i,j} + \lambda^2(w_{i+1,j} + w_{i-1,j}) - w_{i,j-1}, \quad i = \overline{1,m-1},\; j = 1, 2, \ldots \tag{9.103}$$
The frontier conditions (9.90) give
$$w_{0,j} = w_{m,j} = 0, \quad j = 1, 2, \ldots, \tag{9.104}$$
while the initial condition (9.91) leads to
$$w_{i,0} = f(x_i), \quad i = \overline{1,m-1}. \tag{9.105}$$
We obtain the matrix equation
$$w^{(j+1)} = A\,w^{(j)} - w^{(j-1)}, \tag{9.106}$$
where
$$w^{(k)} = \begin{bmatrix} w_{1,k} \\ w_{2,k} \\ \vdots \\ w_{m-1,k} \end{bmatrix}, \tag{9.107}$$
$$A = \begin{bmatrix} 2(1 - \lambda^2) & \lambda^2 & 0 & 0 & \cdots & 0 & 0 \\ \lambda^2 & 2(1 - \lambda^2) & \lambda^2 & 0 & \cdots & 0 & 0 \\ 0 & \lambda^2 & 2(1 - \lambda^2) & \lambda^2 & \cdots & 0 & 0 \\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\ 0 & 0 & 0 & 0 & \cdots & 2(1 - \lambda^2) & \lambda^2 \\ 0 & 0 & 0 & 0 & \cdots & \lambda^2 & 2(1 - \lambda^2) \end{bmatrix}. \tag{9.108}$$
Observation 9.10 We notice that the determination of $w^{(j+1)}$ requires both $w^{(j)}$ and $w^{(j-1)}$, which creates difficulties for $j = 0$, because the values $w_{i,1}$ must be determined from condition (9.92).

Usually, $\partial u/\partial t$ is replaced by the first-order difference expression
$$\frac{\partial u}{\partial t}(x_i, 0) = \frac{u(x_i, t_1) - u(x_i, t_0)}{k} - \frac{k}{2}\,\frac{\partial^2 u}{\partial t^2}(x_i, \tau_i), \tag{9.109}$$
where $\tau_i \in (0, k)$. Thus, it follows that
$$w_{i,1} = w_{i,0} + kg(x_i), \quad i = \overline{1,m-1}, \tag{9.110}$$
which leads to an error $O(k)$ in the initial data.
On the other hand, the local truncation error of equation (9.103) is of order $O(h^2 + k^2)$; we wish to have an error of order $O(k^2)$ for the initial data too. We have
$$\frac{u(x_i, t_1) - u(x_i, t_0)}{k} = \frac{\partial u}{\partial t}(x_i, 0) + \frac{k}{2}\,\frac{\partial^2 u}{\partial t^2}(x_i, 0) + \frac{k^2}{6}\,\frac{\partial^3 u}{\partial t^3}(x_i, \tau_i), \tag{9.111}$$
where $\tau_i \in (0, k)$.
Supposing that equation (9.89) holds on the initial interval too, that is, we may write
$$\frac{\partial^2 u}{\partial t^2}(x_i, 0) - \alpha^2\,\frac{\partial^2 u}{\partial x^2}(x_i, 0) = 0, \quad i = \overline{0,m}, \tag{9.112}$$
and if $f''(x)$ also exists, then we may write
$$\frac{\partial^2 u}{\partial t^2}(x_i, 0) = \alpha^2\,\frac{\partial^2 u}{\partial x^2}(x_i, 0) = \alpha^2\,\frac{d^2 f(x_i)}{dx^2} = \alpha^2 f''(x_i). \tag{9.113}$$
But
$$f''(x_i) = \frac{f(x_{i+1}) - 2f(x_i) + f(x_{i-1})}{h^2} - \frac{h^2}{12}\,f^{(4)}(\xi_i), \tag{9.114}$$
where $\xi_i$ is between $x_{i-1}$ and $x_{i+1}$, while $f \in C^4([0, l])$, and we obtain
$$\frac{u(x_i, t_1) - u(x_i, 0)}{k} = g(x_i) + \frac{k\alpha^2}{2h^2}\,[f(x_{i+1}) - 2f(x_i) + f(x_{i-1})] + O(h^2 + k^2). \tag{9.115}$$
We get successively
$$u(x_i, t_1) = u(x_i, 0) + kg(x_i) + \frac{\lambda^2}{2}\,[f(x_{i+1}) - 2f(x_i) + f(x_{i-1})] + O(k^3 + h^2k^2)$$
$$= (1 - \lambda^2)f(x_i) + \frac{\lambda^2}{2}f(x_{i+1}) + \frac{\lambda^2}{2}f(x_{i-1}) + kg(x_i) + O(k^3 + h^2k^2). \tag{9.116}$$
It follows that the determination of the values $w_{i,1}$, $i = \overline{1,m-1}$, can be made by means of the relation
$$w_{i,1} = (1 - \lambda^2)f(x_i) + \frac{\lambda^2}{2}f(x_{i+1}) + \frac{\lambda^2}{2}f(x_{i-1}) + kg(x_i). \tag{9.117}$$
Thus, the algorithm with finite differences used to solve problems (9.89)–(9.92) is

– given: $m$, $N > 0$, $k > 0$, $l$, $\alpha$, $f(x)$, $g(x)$;
– calculate $h = l/m$, $T = kN$, $\lambda = \alpha k/h$;
– for $i$ from 0 to $m$ do
  – calculate $x_i = ih$;
– for $j$ from 0 to $N$ do
  – calculate $t_j = jk$;
– for $j$ from 1 to $N$ do
  – calculate $w_{0,j} = w_{m,j} = 0$;
– for $i$ from 0 to $m$ do
  – calculate $w_{i,0} = f(x_i)$;
– for $i$ from 1 to $m - 1$ do
  – calculate $w_{i,1} = (1 - \lambda^2)f(x_i) + (\lambda^2/2)(f(x_{i+1}) + f(x_{i-1})) + kg(x_i)$;
– for $j$ from 1 to $N - 1$ do
  – for $i$ from 1 to $m - 1$ do
    – calculate $w_{i,j+1} = 2(1 - \lambda^2)w_{i,j} + \lambda^2(w_{i+1,j} + w_{i-1,j}) - w_{i,j-1}$.

Thus, $w_{i,j}$ approximates $u(x_i, t_j)$, $i = \overline{0,m}$, $j = \overline{0,N}$.
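A direct Python transcription of this algorithm might read as follows (our sketch; it assumes compatible data $f(0) = f(l) = 0$ and vectorized functions f and g):

```python
import numpy as np

def wave_fd(f, g, alpha, l, m, k, N):
    """Schema (9.103) for u_tt = alpha^2 u_xx with u(0,t) = u(l,t) = 0;
    the first time level is started with the O(k^2) formula (9.117)."""
    h = l / m
    lam = alpha * k / h                    # Courant-type number (9.101)
    x = h * np.arange(m + 1)
    w_prev = f(x).astype(float)            # time level j = 0
    w_prev[0] = w_prev[m] = 0.0
    w = np.zeros(m + 1)                    # time level j = 1, by (9.117)
    w[1:m] = ((1 - lam**2) * w_prev[1:m]
              + (lam**2 / 2) * (w_prev[2:] + w_prev[:m - 1])
              + k * g(x[1:m]))
    for _ in range(N - 1):                 # levels j = 2, ..., N by (9.103)
        w_next = np.zeros(m + 1)
        w_next[1:m] = (2 * (1 - lam**2) * w[1:m]
                       + lam**2 * (w[2:] + w[:m - 1])
                       - w_prev[1:m])
        w_prev, w = w, w_next
    return x, w

# Example 9.10 below: u(x,0) = sin(pi x), u_t(x,0) = 0, 50 steps of k = 0.01
x, w = wave_fd(lambda x: np.sin(np.pi * x),
               lambda x: np.zeros_like(x), 1.0, 1.0, 20, 0.01, 50)
```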
9.7 POINT MATCHING METHOD
This method⁶ was developed in the middle of the twentieth century. We will present it in the two-dimensional case for partial differential equations of elliptic type, particularly for biharmonic equations (it may be used in the same way for polyharmonic equations too). The method fits well the plane problem of the theory of elasticity, formulated for a plane domain $D$.
Some methods of calculation (e.g., the variational methods) allow us to obtain, with an approximation as good as we wish, the searched function (the solution of the partial differential equation) and its derivatives at any point of the domain $D$. Other methods (the finite difference method, net methods, the relaxation method, etc.) allow us to obtain approximate values of the searched function at a finite number of points in the interior of the domain, satisfying the boundary conditions also at a finite number of points.
We can imagine a method of calculation that uses ideas from both types of methods. The method consists in searching for an analytic function of a form as simple as possible, which verifies the partial differential equation at any point of $D$, except on the boundary, where the conditions are satisfied only at a finite number of points.
We will thus search for a biharmonic function
$$F(x, y) = \sum_{i=2}^{n} P_i(x, y), \tag{9.118}$$
where $P_i(x, y)$ are biharmonic polynomials ($\nabla^4 P_i = 0$) of $i$th degree, $i = 2, 3, \ldots$ We notice that such a polynomial implies four arbitrary constants, except $P_2(x, y)$, which contains only three such constants. Thus, $F(x, y)$ contains $4n - 5$ arbitrary constants.
At a point of the boundary, we may impose two conditions, that is, for the function $F$ (or for its tangential derivative $\partial F/\partial s$) and for the normal derivative $\partial F/\partial n$. Hence, for a point of the contour we get two conditions for the constants to be determined. If we impose boundary conditions at $2n - 3$ points of the contour, we find a system of $4n - 6$ equations with $4n - 5$ unknowns, which will determine the coefficients of the biharmonic polynomials. One of these constants may be taken arbitrarily.
⁶Also known as the collocation method. It was introduced by Leonid Vitaliyevich Kantorovich (1912–1986) in 1934.
Figure 9.3 Point matching method.
Let $B_1$ and $B_2$ be the distributions of the real boundary conditions and $B_1'$ and $B_2'$ the boundary conditions obtained after calculation (Fig. 9.3). The differences $\Delta B_1 = B_1 - B_1'$, $\Delta B_2 = B_2 - B_2'$ must be as small as possible, so that the error in the determination of the biharmonic function will also be as small as possible. The estimation of the error may be made from case to case from the physical point of view.
As an advantage, we mention that the contour can be a complicated one and that one gets an analytical expansion for the solution.
Besides elementary representations (biharmonic polynomials), we may also use other functions,
adequate for some particular problems.
We have to solve a system of linear algebraic equations, so that various methods of calculation
can be used.
In fact, the method considered above is a collocation method.
9.8 VARIATIONAL METHODS
Let us consider the functional
$$I(y) = \int_{x_0}^{x_1} f(x, y, y')\,dx, \tag{9.119}$$
where $f$ is a function continuous, together with its derivatives up to the second order inclusive, in a domain of $\mathbb{R}^3$, $y = y(x)$ is continuous with continuous derivative $y' = dy/dx$, and $y(x_0) = y_0$, $y(x_1) = y_1$. The extremal function $y$ then verifies Euler's equation
$$\frac{\partial f}{\partial y} - \frac{d}{dx}\,\frac{\partial f}{\partial y'} = 0. \tag{9.120}$$
If the functional is of the form
$$I(y) = \int\cdots\int f\left(x_1, x_2, \ldots, x_n, y, \frac{\partial y}{\partial x_1}, \frac{\partial y}{\partial x_2}, \ldots, \frac{\partial y}{\partial x_n}\right)dx_1\,dx_2\cdots dx_n, \tag{9.121}$$
then Euler's equation reads
$$\frac{\partial f}{\partial y} - \frac{d}{dx_1}\,\frac{\partial f}{\partial\left(\dfrac{\partial y}{\partial x_1}\right)} - \frac{d}{dx_2}\,\frac{\partial f}{\partial\left(\dfrac{\partial y}{\partial x_2}\right)} - \cdots - \frac{d}{dx_n}\,\frac{\partial f}{\partial\left(\dfrac{\partial y}{\partial x_n}\right)} = 0. \tag{9.122}$$
In the general case, we consider the equation
$$Lu = f, \tag{9.123}$$
where $L$ is a self-adjoint, positive, linear operator with the domain of definition $D$ in the Hilbert space $H$ endowed with the scalar product $\langle\,\cdot\,,\cdot\,\rangle$, taking values in $H$, $u \in D$, while $f \in H$.

Proposition 9.1 If the solution of problem (9.123) exists, then it assures the minimum of the functional
$$I(u) = \langle Lu, u\rangle - 2\langle u, f\rangle. \tag{9.124}$$
Demonstration. Let $u$ be the solution of problem (9.123) and $v \in D$ arbitrary and nonzero. If $c$ is a real nonzero number, then we consider
$$v_c = u + cv \tag{9.125}$$
and we may write
$$I(v_c) = \langle L(u + cv), u + cv\rangle - 2\langle u + cv, f\rangle. \tag{9.126}$$
Because $L$ is self-adjoint, we have
$$I(v_c) = I(u) + 2c\langle Lu - f, v\rangle + c^2\langle Lv, v\rangle, \tag{9.127}$$
obtaining thus, since $Lu = f$,
$$I(v_c) = I(u) + c^2\langle Lv, v\rangle; \tag{9.128}$$
because $L$ is positive, it follows that
$$I(v_c) > I(u) \tag{9.129}$$
for any $c \neq 0$. Hence $u$ minimizes the functional $I(u)$.
Proposition 9.2 If $u \in H$ minimizes the functional $I(u)$ and $u \in D$, then $Lu = f$.

Demonstration. Let $v \in D$ be arbitrary. Because $w = \alpha u + \beta v \in D$, with $\alpha$ and $\beta$ constants, and taking into account that the functional $I(u)$ attains its minimum at $u$, we get
$$I(u + cv) \ge I(u). \tag{9.130}$$
Because $L$ is symmetric, from equation (9.130) we necessarily obtain
$$2c\langle Lu - f, v\rangle + c^2\langle Lv, v\rangle \ge 0; \tag{9.131}$$
it follows that
$$\langle Lu - f, v\rangle = 0, \tag{9.132}$$
that is, $Lu - f$ is orthogonal to every element of $D$; hence,
$$Lu - f = 0. \tag{9.133}$$
9.8.1 Ritz’s Method
In the frame of this method,7
we consider the problem
Lu = f, (9.134)
in Hilbert’s space H, with the scalar product , ; D is the domain of definition of L, considered
dense in H, while L is a positive definite autoadjoint operator.
The problem is equivalent to the finding of the element u ∈ D, which minimizes the functional
I(u) = Lu, u − 2 f, u . (9.135)
To ensure the existence of the solution, we consider a new scalar product in H, defined by
u, v L = Lu, v , u, v ∈ D, (9.136)
the norm being given by
u L = u, u L. (9.137)
Definition 9.10 We call energetic space defined by the operator L, the space obtained by the
completing of D by the norm L. We denote this space by HL.
We may write
I(u) = u, u L − 2 f, u , u ∈ D. (9.138)
Because L is positive definite, that is,
Lu, u = u, u L ≥ c2
u 2
, u ∈ D, (9.139)
with c constant, c > 0, the by completing D to HL, it follows that u, u L ≥ c2 u , for any u ∈ HL.
On the other hand,
| u, f | ≤ u f ≤
1
c
u L f = B u L, (9.140)
so that u, f is bounded, and we may apply Ritz’s theorem. It follows that there exists u0 ∈ HL,
so that for any u ∈ HL we have
u, f = u, u0 L. (9.141)
Thus, the functional reads
I(u) = u, u L − 2 f, u = u, u L − 2 u, u0 L = u − u0
2
L − u0
2
L, (9.142)
with u ∈ HL; hence it attains its minimum for u = u0.
Definition 9.11 The element u0 ∈ HL bears the name of generalized solution of the equation
Lu = f .
Observation 9.11 If u0 ∈ D, then u0 is the classical solution of problem (9.134).
7After Walther Ritz (1878–1909) who published this method in 1909.
We will consider a sequence of finite dimensional subspaces $H_k \subseteq H_L$, indexed by parameters $k_1, k_2, \ldots$, with $k_i \to 0$ for $i \to \infty$.

Definition 9.12 We say that the sequence $\{H_k\}$ is complete in $H_L$ if for any $u \in H_L$ and $\varepsilon > 0$ there exists $\bar{k} = \bar{k}(u, \varepsilon) > 0$ so that
$$\inf_{v \in H_k}\|u - v\|_L < \varepsilon \tag{9.143}$$
for any $k < \bar{k}$.

From the previous definition we deduce that if $\{H_k\}$ is complete, then any element $u \in H_L$ may be approximated to any precision we may wish by elements of $H_k$.
We will ask to determine the element $u^k \in H_k$ which minimizes the functional $I(u)$ in $H_k$.

Proposition 9.3 In the above conditions, the sequence $\{u^k\}$ of Ritz's approximations for the solution of the equation $Lu = f$ converges to the generalized solution of this problem.

Demonstration. For $v \in H_k$ we have
$$\|u_0 - u^k\|_L^2 = I(u^k) - I(u_0) \le I(v) - I(u_0) = \|u_0 - v\|_L^2. \tag{9.144}$$
Because $v$ is arbitrary, we may write
$$\|u_0 - u^k\|_L^2 \le \inf_{v \in H_k}\|u_0 - v\|_L^2 \xrightarrow[k \to 0]{} 0. \tag{9.145}$$
If a basis of the space $H_k$ formed by the functions $\varphi_1^k, \varphi_2^k, \ldots, \varphi_{n_k}^k$ ($n_k$ being the dimension of the space $H_k$) is known, then the problem of the determination of $u^k \in H_k$ is equivalent to the determination of the coefficients $c_1, c_2, \ldots, c_{n_k}$ in the expansion
$$u^k = c_1\varphi_1^k + c_2\varphi_2^k + \cdots + c_{n_k}\varphi_{n_k}^k. \tag{9.146}$$
We obtain the system
$$Ac = g, \tag{9.147}$$
where
$$c = \begin{bmatrix} c_1 & \cdots & c_{n_k} \end{bmatrix}^T, \tag{9.148}$$
$$g = \begin{bmatrix} g_1 & \cdots & g_{n_k} \end{bmatrix}^T, \quad g_i = \langle f, \varphi_i^k\rangle, \quad i = \overline{1,n_k}, \tag{9.149}$$
$$A = [a_{ij}]_{i,j=\overline{1,n_k}}, \quad a_{ij} = \langle\varphi_i^k, \varphi_j^k\rangle_L, \quad i, j = \overline{1,n_k}. \tag{9.150}$$
If $\varphi_i^k \in D$, $i = \overline{1,n_k}$, then we may also write
$$a_{ij} = \langle L\varphi_i^k, \varphi_j^k\rangle, \quad i, j = \overline{1,n_k}. \tag{9.151}$$
Let us remark that the matrix $A$ is symmetric and positive definite, because
$$\langle Av, v\rangle = \sum_{i=1}^{n_k}\sum_{j=1}^{n_k} a_{ij}v_iv_j = \left\langle\sum_{i=1}^{n_k} v_i\varphi_i^k,\; \sum_{j=1}^{n_k} v_j\varphi_j^k\right\rangle_L \ge c^2\left\|\sum_{i=1}^{n_k} v_i\varphi_i^k\right\|^2 \ge 0. \tag{9.152}$$
Observation 9.12 It is possible that the functions $\varphi_1^k, \varphi_2^k, \ldots, \varphi_{n_k}^k$ do not verify the limit conditions imposed on problem (9.134). This is due to the completion of the space to $H_L$.

Definition 9.13
(i) The limit conditions which are necessarily satisfied by the functions of the domain $D$, but are not necessarily satisfied by the functions of the energetic space $H_L$, are called natural conditions for the operator $L$.
(ii) The limit conditions which are necessarily satisfied by the functions of the energetic space $H_L$ are called essential conditions.

Observation 9.13 In the frame of Ritz's method we choose bases in the energetic space; it follows that the functions $\varphi_i^k$, $i = \overline{1,n_k}$, are not subjected to the natural conditions.
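As an elementary illustration of Ritz's method, take $Lu = -u''$ on $(0, 1)$ with $u(0) = u(1) = 0$, which is self-adjoint and positive definite, and the basis $\varphi_i(x) = \sin(i\pi x)$ (our choice for this sketch, not an example from the book). Then $a_{ij} = \langle\varphi_i, \varphi_j\rangle_L = \int_0^1 \varphi_i'\varphi_j'\,dx$ and $g_i = \langle f, \varphi_i\rangle$, which the sketch below evaluates by numerical quadrature before solving (9.147).

```python
import numpy as np

def ritz_sine(f, nk, quad_pts=2000):
    """Ritz approximation of -u'' = f, u(0) = u(1) = 0, with sine basis."""
    xq = np.linspace(0.0, 1.0, quad_pts)
    A = np.zeros((nk, nk))
    g = np.zeros(nk)
    for i in range(1, nk + 1):
        dphi_i = i * np.pi * np.cos(i * np.pi * xq)
        for j in range(1, nk + 1):
            dphi_j = j * np.pi * np.cos(j * np.pi * xq)
            A[i - 1, j - 1] = np.trapz(dphi_i * dphi_j, xq)      # (9.150)
        g[i - 1] = np.trapz(f(xq) * np.sin(i * np.pi * xq), xq)  # (9.149)
    c = np.linalg.solve(A, g)                                    # (9.147)
    return lambda x: sum(c[i] * np.sin((i + 1) * np.pi * x) for i in range(nk))

# f(x) = pi^2 sin(pi x) has the solution u(x) = sin(pi x), recovered with nk = 1
u = ritz_sine(lambda x: np.pi**2 * np.sin(np.pi * x), nk=5)
```

Since this basis is orthogonal in the energetic scalar product, $A$ is (up to quadrature error) diagonal here; for a general basis the full symmetric positive definite system (9.147) must be solved.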
9.8.2 Galerkin’s Method
In the frame of Ritz’s method it has been asked that the operator L be autoadjoint and positive
definite, which represents a limitation of this method.
In the case of Galerkin’s method8
we solve the operational equation
Lu = f (9.153)
in a Hilbert space H, f ∈ H, while the domain D of definition of L is dense in H.
We write L in the form
L = L0 + K, (9.154)
where L0 is a positive definite symmetric operator with L−1
0 total continuous in H, while the domain
DK of definition of K satisfies the relation DF ⊇ DL0
, where DL0
is the domain of definition
of L0.
We also introduce now the energetic space HL0
of the operator L0, with the scalar product
u, v L0
and the norm u 2
L0
= u, u L0
.
Let us perform a scalar product of relation (9.135) and an arbitrary function v ∈ DL0
. We obtain
L0u, v + Ku, v = f, v (9.155)
or
u, v L0
+ Ku, v = f, v . (9.156)
Definition 9.14 We call the generalized solution of equation (9.135) a function u0 ∈ HL0
, which
satisfies relation (9.156) for any v ∈ HL0
.
Observation 9.14 If u0 ∈ DL0
, then, because
u, v L0
= L0u, v , (9.157)
it follows that
L0u0 + Ku0 − f, v = 0 (9.158)
and because DL0
is dense in H, we deduce that u0 satisfies equation (9.153).
8Boris Grigoryevich Galerkin (1871–1945) described the method in 1915.
We now construct the spaces $H_k \subseteq H_{L_0}$ and the bases $\varphi_1^k, \varphi_2^k, \ldots, \varphi_{n_k}^k$, the approximation of the solution being
$$u^k = \sum_{i=1}^{n_k} c_i\varphi_i^k, \tag{9.159}$$
where the coefficients $c_i$, $i = \overline{1,n_k}$, are chosen so that $u^k$ verifies relation (9.156) for any $v \in H_k$. On the other hand, because $v \in H_k$, we deduce that $v$ is written in the form
$$v = \sum_{i=1}^{n_k} b_i\varphi_i^k; \tag{9.160}$$
hence, $u^k$ is determined by the system of equations
$$\langle u^k, \varphi_i^k\rangle_{L_0} + \langle Ku^k, \varphi_i^k\rangle = \langle f, \varphi_i^k\rangle, \quad i = \overline{1,n_k}. \tag{9.161}$$
The last system may be put in the form
$$Ac = g, \tag{9.162}$$
where
$$A = [a_{ij}]_{i,j=\overline{1,n_k}}, \quad a_{ij} = \langle\varphi_i^k, \varphi_j^k\rangle_{L_0} + \langle K\varphi_i^k, \varphi_j^k\rangle, \quad i, j = \overline{1,n_k}, \tag{9.163}$$
$$g = \begin{bmatrix} g_1 & \cdots & g_{n_k} \end{bmatrix}^T, \quad g_i = \langle f, \varphi_i^k\rangle, \quad i = \overline{1,n_k}, \tag{9.164}$$
$$c = \begin{bmatrix} c_1 & \cdots & c_{n_k} \end{bmatrix}^T. \tag{9.165}$$
Observation 9.15 If K = 0, then Galerkin’s method becomes Ritz’s method.
Observation 9.16 We consider that there exists the operator $L_0^{-1}$, bounded and defined on the whole space $H$. Equation (9.153) is then equivalent to
$$u + L_0^{-1}Ku = L_0^{-1}f. \tag{9.166}$$
We denote by $H_1$ the Hilbert space with the scalar product
$$\langle u, v\rangle_1 = \langle L_0u, L_0v\rangle \tag{9.167}$$
and the norm
$$\|u\|_1 = \|L_0u\|. \tag{9.168}$$
We now also construct the subspaces $H_k$, finite dimensional and included in $H_1$, with bases $\psi_i^k$, $i = \overline{1,n_k}$, and search for the approximate solution in the form
$$u^k = \sum_{i=1}^{n_k} c_i\psi_i^k, \tag{9.169}$$
where $c_i$, $i = \overline{1,n_k}$, are obtained from the system
$$\langle u^k, \psi_i^k\rangle_1 + \langle L_0^{-1}Ku^k, \psi_i^k\rangle_1 = \langle L_0^{-1}f, \psi_i^k\rangle_1, \quad i = \overline{1,n_k}. \tag{9.170}$$
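For a non-self-adjoint illustration of Galerkin's method, take $L_0u = -u''$ and $Ku = u'$ on $(0, 1)$ with $u(0) = u(1) = 0$ (again our own example, not the book's). The entries (9.163) become $a_{ij} = \int_0^1 \varphi_i'\varphi_j'\,dx + \int_0^1 \varphi_i'\varphi_j\,dx$ with the sine basis, and the sketch below assembles and solves the system.

```python
import numpy as np

def galerkin_sine(f, nk, quad_pts=2000):
    """Galerkin approximation of -u'' + u' = f, u(0) = u(1) = 0."""
    xq = np.linspace(0.0, 1.0, quad_pts)
    phi = [np.sin((i + 1) * np.pi * xq) for i in range(nk)]
    dphi = [(i + 1) * np.pi * np.cos((i + 1) * np.pi * xq) for i in range(nk)]
    A = np.zeros((nk, nk))
    g = np.zeros(nk)
    for i in range(nk):
        for j in range(nk):
            # <phi_i, phi_j>_{L0} + <K phi_i, phi_j>, with K = d/dx
            A[i, j] = (np.trapz(dphi[i] * dphi[j], xq)
                       + np.trapz(dphi[i] * phi[j], xq))
        g[i] = np.trapz(f(xq) * phi[i], xq)
    c = np.linalg.solve(A, g)                                    # (9.162)
    return lambda x: sum(c[i] * np.sin((i + 1) * np.pi * x) for i in range(nk))
```

Note that, unlike in Ritz's method, the matrix $A$ is no longer symmetric, because of the term $\langle K\varphi_i^k, \varphi_j^k\rangle$.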
9.8.3 Method of the Least Squares
Let the operational equation be
$$Lu = f \tag{9.171}$$
in the Hilbert space $H$, and let $H_k$ be finite dimensional subspaces of $H$ with the bases $\varphi_i^k$, $i = \overline{1,n_k}$, and with $H_k \subseteq D$.
Starting from the relations
$$\frac{\partial}{\partial c_i}\,\|Lu - f\|^2 = 0, \quad i = \overline{1,n_k}, \tag{9.172}$$
we obtain system (9.147) in which
$$A = [a_{ij}]_{i,j=\overline{1,n_k}}, \quad a_{ij} = \langle L\varphi_i^k, L\varphi_j^k\rangle, \quad i, j = \overline{1,n_k}, \tag{9.173}$$
$$g = \begin{bmatrix} g_1 & \cdots & g_{n_k} \end{bmatrix}^T, \quad g_i = \langle f, L\varphi_i^k\rangle, \quad i = \overline{1,n_k}, \tag{9.174}$$
$$c = \begin{bmatrix} c_1 & \cdots & c_{n_k} \end{bmatrix}^T, \tag{9.175}$$
the approximate solution being
$$u^k = \sum_{i=1}^{n_k} c_i\varphi_i^k. \tag{9.176}$$
The approximate solution $u^k$ converges to the exact solution of equation (9.171) if that equation has a unique solution, the sequence of subspaces $LH_k$ is complete in $D$, and the operator $L^{-1}$ exists and is bounded.
Observation 9.17 The question arises whether the approximate solution verifies the limit conditions of problem (9.171). There are two possibilities of tackling this problem:
(i) we impose on the functions of the space $H_k$ to verify the limit conditions; but the method is then difficult to apply;
(ii) if $Lu = f$ in $D$ and $L_iu = f_i$ on $\partial D_i$, $i = \overline{1,p}$, are the problem and the limit conditions, then we consider the functional
$$I_k(u) = \|Lu - f\|^2 + \sum_{i=1}^{p} c_i(k)\,\|L_iu - f_i\|^2, \tag{9.177}$$
where $c_i(k)$, $i = \overline{1,p}$, are positive functions of the parameter $k$. If the solution is smooth, then
$$c_i(k) = k^{-2\left(2m - m_i - \frac{1}{2}\right)}, \quad i = \overline{1,p}, \tag{9.178}$$
where $2m$ is the order of the partial differential equation $Lu = f$, while $m_i$ is the order of the highest order derivative in the operator $L_i$, $i = \overline{1,p}$.
We now search for the approximations $u^k$ as solutions of the variational problem
$$\inf_{v \in H_k} I_k(v) = I_k(u^k). \tag{9.179}$$
9.9 NUMERICAL EXAMPLES
Example 9.1 Let us consider the equation of wave propagation
$$\frac{\partial u}{\partial t} + a\,\frac{\partial u}{\partial x} = 0, \quad x \in [0, 1],\; t \in [0, T], \tag{9.180}$$
where $a$ is a positive constant.
Applying the theory of numerical integration of partial differential equations of first order by explicit schemata, we obtain the equation with finite differences
$$V(x^i, t^{j+1}) = V(x^i, t^j) + c\,[V(x^{i-1}, t^j) - V(x^i, t^j)], \quad i = \overline{1,I},\; j = \overline{1,J}, \tag{9.181}$$
where $V(x^i, t^j)$ denotes the approximate value of the function $u(x^i, t^j)$, $x^i = ih$, $t^j = jk$, $h = 1/I$, $k = T/J$.
Equation (9.180) is equivalent to the system
$$\frac{dt}{1} = \frac{dx}{a}, \tag{9.182}$$
which leads to the first integral
$$x - at = C_1, \tag{9.183}$$
where $C_1$ is a constant; hence the exact solution of the problem is
$$u = \varphi(x - at), \tag{9.184}$$
where $\varphi$ is an arbitrary function.
If $c = 1$, then the schema becomes
$$V(x^i, t^{j+1}) = V(x^{i-1}, t^j). \tag{9.185}$$
Example 9.2 Let the partial differential equation be
$$\frac{\partial u}{\partial t} + \frac{\partial u}{\partial x} = 0, \tag{9.186}$$
for which the initial and boundary conditions are
$$u(x, 0) = 0, \quad 0 < x \le 1, \qquad u(0, t) = 1, \quad t \ge 0. \tag{9.187}$$
At the initial moment $t = 0$ the function $u$ is identically null for all the values of $x$ in the domain, excepting $x = 0$, for which $u = 1$.
We wish to obtain the equation with differences for problem (9.186), $t \le 1$, with the steps $h = 0.1$, $k = 0.1$.
We shall apply relation (9.185) from Example 9.1. It follows that
$$V(x^i, t^0) = 0, \quad i > 0, \tag{9.188}$$
$$V(x^0, t^j) = 1, \quad j \ge 0, \tag{9.189}$$
$$V(x^i, t^{j+1}) = V(x^{i-1}, t^j), \quad 1 \le i \le 10,\; 0 \le j \le 9, \tag{9.190}$$
Figure 9.4 Numerical solution of problem (9.186).
and the solution
$$V(x^i, t^j) = \begin{cases} 1 & \text{for } i \le j, \\ 0 & \text{otherwise}. \end{cases} \tag{9.191}$$
Graphically, the situation is given in Figure 9.4, wherein the points where $V(x^i, t^j) = 1$ have been marked by a star, while the points where $V(x^i, t^j) = 0$ have been marked by a circle.
Let us observe that for $c = 1$ the Lax–Wendroff schema leads to the exact solution $V(x^i, t^{j+1}) = V(x^{i-1}, t^j)$, as in this example.
Example 9.3 The equation with finite differences for Example 9.1 is now of the form (using implicit schemata)
$$V(x^i, t^{j+1}) = \frac{c\,V(x^{i+1}, t^{j+1}) + V(x^i, t^j)}{1 + c}, \quad i = 1, 2, \ldots, \tag{9.192}$$
which is unconditionally convergent.
Another schema often used in the case of Example 9.1 is the Wendroff schema, for which the equation with differences reads
$$V(x^i, t^{j+1}) = V(x^{i-1}, t^j) + \frac{1 - c}{1 + c}\,[V(x^i, t^j) - V(x^{i-1}, t^j)]. \tag{9.193}$$
Example 9.4 Returning to Example 9.2 and using the implicit schemata (9.192) and (9.193) from Example 9.3 for $c = 1$, we obtain the same results as in Figure 9.4.
Example 9.5 Let the problem of elliptic type be
$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0, \quad 0 < x < 1,\; 0 < y < 1, \tag{9.194}$$
with the boundary conditions
$$u(x, 0) = 0, \quad u(x, 1) = x, \quad 0 \le x \le 1, \tag{9.195}$$
$$u(0, y) = 0, \quad u(1, y) = y, \quad 0 \le y \le 1, \tag{9.196}$$
the exact solution of which is
$$u(x, y) = xy. \tag{9.197}$$
Using a net with $n = 5$, $m = 5$, we will determine the numerical solution of the problem. In the case of our problem,
$$h = \frac{1 - 0}{5} = 0.2, \quad k = \frac{1 - 0}{5} = 0.2, \quad \frac{h}{k} = 1, \tag{9.198}$$
and the linear approximating system is
$$4w_{i,j} - w_{i+1,j} - w_{i-1,j} - w_{i,j+1} - w_{i,j-1} = 0, \quad i = \overline{1,4},\; j = \overline{1,4}, \tag{9.199}$$
$$w_{0,j} = 0, \quad j = \overline{0,5}, \tag{9.200}$$
$$w_{5,j} = 0.2j, \quad j = \overline{0,5}, \tag{9.201}$$
$$w_{i,0} = 0, \quad i = \overline{1,4}, \tag{9.202}$$
$$w_{i,5} = 0.2i, \quad i = \overline{1,4}. \tag{9.203}$$
Renumbering the knots as in Figure 9.5, we obtain the linear system
$$4w_{13} - w_{14} - w_9 = w_{0,1} + w_{1,0} = 0, \quad 4w_9 - w_{10} - w_5 - w_{13} = w_{0,2} = 0,$$
$$4w_5 - w_6 - w_1 - w_9 = w_{0,3} = 0, \quad 4w_1 - w_2 - w_5 = w_{0,4} + w_{1,5} = 0 + 0.2 = 0.2,$$
$$4w_{14} - w_{15} - w_{13} - w_{10} = w_{2,0} = 0, \quad 4w_{10} - w_{11} - w_9 - w_6 - w_{14} = 0,$$
$$4w_6 - w_7 - w_5 - w_2 - w_{10} = 0, \quad 4w_2 - w_3 - w_1 - w_6 = w_{2,5} = 0.4,$$
$$4w_{15} - w_{16} - w_{14} - w_{11} = w_{3,0} = 0, \quad 4w_{11} - w_{12} - w_{10} - w_7 - w_{15} = 0,$$
$$4w_7 - w_8 - w_6 - w_3 - w_{11} = 0, \quad 4w_3 - w_4 - w_2 - w_7 = w_{3,5} = 0.6,$$
$$4w_{16} - w_{15} - w_{12} = w_{5,1} + w_{4,0} = 0.2 + 0 = 0.2, \quad 4w_{12} - w_{11} - w_8 - w_{16} = w_{5,2} = 0.4,$$
$$4w_8 - w_7 - w_4 - w_{12} = w_{5,3} = 0.6, \quad 4w_4 - w_3 - w_8 = w_{5,4} + w_{4,5} = 0.8 + 0.8 = 1.6. \tag{9.204}$$
The solution of this system is
$$w_1 = 0.16, \; w_2 = 0.32, \; w_3 = 0.48, \; w_4 = 0.64, \; w_5 = 0.12, \; w_6 = 0.24, \; w_7 = 0.36, \; w_8 = 0.48,$$
$$w_9 = 0.08, \; w_{10} = 0.16, \; w_{11} = 0.24, \; w_{12} = 0.32, \; w_{13} = 0.04, \; w_{14} = 0.08, \; w_{15} = 0.12, \; w_{16} = 0.16. \tag{9.205}$$
Figure 9.5 Numbering of the knots for problem (9.196).
We observe that the numerical solution coincides with the exact solution, and this is because
$$\frac{\partial^4 u}{\partial x^4} = 0, \quad \frac{\partial^4 u}{\partial y^4} = 0; \tag{9.206}$$
hence the truncation error vanishes at each step.
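The system (9.199)–(9.203) can also be assembled and solved directly; the following sketch (our own verification of the example, using the numbering (9.50)) reproduces the values (9.205).

```python
import numpy as np

n = m = 5
h = 1.0 / n
g = lambda x, y: x * y                      # boundary values of the problem
A = np.zeros((16, 16))
b = np.zeros(16)
idx = lambda i, j: (i - 1) + (m - 1 - j) * (n - 1)   # numbering (9.50)
for i in range(1, n):
    for j in range(1, m):
        r = idx(i, j)
        A[r, r] = 4.0
        for (p, q) in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if 1 <= p <= n - 1 and 1 <= q <= m - 1:
                A[r, idx(p, q)] = -1.0      # internal neighbour
            else:
                b[r] += g(p * h, q * h)     # known boundary knot
w = np.linalg.solve(A, b)
# w[0] = w_1 = 0.16 = u(0.2, 0.8), and so on, as in (9.205)
```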
Example 9.6 Let the problem of elliptic type be
$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0, \quad 0 < x < 1,\; 0 < y < 1, \tag{9.207}$$
with the boundary conditions
$$u(x, 0) = 0, \quad u(x, 1) = \sin(\pi x)\sinh(\pi), \quad 0 \le x \le 1, \qquad u(0, y) = u(1, y) = 0, \quad 0 \le y \le 1, \tag{9.208}$$
the exact solution of which is
$$u(x, y) = \sin(\pi x)\sinh(\pi y). \tag{9.209}$$
Using the algorithm presented for the differential equations of elliptic type with $n = 6$, $m = 6$, and the stopping condition given by $\varepsilon = 10^{-10}$, we will determine the approximate numerical solution of the problem, as well as the error with respect to the exact solution, $|u(x_i, y_j) - w_{i,j}^{(l)}|$, $i = \overline{1,n-1}$, $j = \overline{1,m-1}$, where $l$ is given by the algorithm.
We have
$$f(x, y) = 0 \quad \text{for } (x, y) \in [0, 1] \times [0, 1], \tag{9.210}$$
$$g(x, y) = \begin{cases} 0 & \text{for } y = 0,\; x = 0 \text{ or } x = 1, \\ \sin(\pi x)\sinh(\pi) & \text{for } y = 1, \end{cases} \tag{9.211}$$
or, written at the knots,
$$g(x_i, y_j) = \begin{cases} 0 & \text{for } j = 0,\; i = 0 \text{ or } i = n, \\ \sin(\pi x_i)\sinh(\pi) & \text{for } j = m. \end{cases} \tag{9.212}$$
The results of the program are given in Table 9.1, in which $l = 80$.
TABLE 9.1 Numerical Solution of Problems (9.207) and (9.208)

i  j  x_i     y_j     w^(80)_{i,j}  u(x_i, y_j)  |u(x_i, y_j) − w^(80)_{i,j}|
1  1  0.1667  0.1667  0.28665      0.27393      0.01272
1  2  0.1667  0.3333  0.65011      0.62468      0.02542
1  3  0.1667  0.5000  1.18776      1.15065      0.03711
1  4  0.1667  0.6667  2.04367      1.99935      0.04433
1  5  0.1667  0.8333  3.44719      3.40881      0.03837
2  1  0.3333  0.1667  0.49649      0.47446      0.02204
2  2  0.3333  0.3333  1.12602      1.08198      0.04404
2  3  0.3333  0.5000  2.05726      1.99298      0.06428
2  4  0.3333  0.6667  3.53975      3.46297      0.07678
2  5  0.3333  0.8333  5.97070      5.90423      0.06647
3  1  0.5000  0.1667  0.57330      0.54785      0.02545
3  2  0.5000  0.3333  1.30021      1.24937      0.05085
3  3  0.5000  0.5000  2.37552      2.30130      0.07422
3  4  0.5000  0.6667  4.08735      3.99869      0.08865
3  5  0.5000  0.8333  6.89437      6.81762      0.07675
4  1  0.6667  0.1667  0.49649      0.47446      0.02204
4  2  0.6667  0.3333  1.12602      1.08198      0.04404
4  3  0.6667  0.5000  2.05726      1.99298      0.06428
4  4  0.6667  0.6667  3.53975      3.46297      0.07678
4  5  0.6667  0.8333  5.97070      5.90423      0.06647
5  1  0.8333  0.1667  0.28665      0.27393      0.01272
5  2  0.8333  0.3333  0.65011      0.62468      0.02542
5  3  0.8333  0.5000  1.18776      1.15065      0.03711
5  4  0.8333  0.6667  2.04367      1.99935      0.04433
5  5  0.8333  0.8333  3.44719      3.40881      0.03837
Example 9.7 Let the problem of parabolic type be
$$\frac{\partial u}{\partial t} - \frac{\partial^2 u}{\partial x^2} = 0, \quad 0 < x < \pi,\; t > 0, \tag{9.213}$$
with the initial and boundary conditions
$$u(x, 0) = \sin x, \tag{9.214}$$
$$u(0, t) = u(\pi, t) = 0, \tag{9.215}$$
the exact solution of which is
$$u(x, t) = e^{-t}\sin x. \tag{9.216}$$
Considering $m = 20$, from which $h = \pi/20$, and $k = 0.01$, we seek the approximate solution of the problem for $t = 0.5$, which will be compared with the exact solution. We shall then solve the same problem for $h = \pi/20$ and $k = 0.1$.
The results are given in Table 9.2. The numerical and the exact solutions in the second case are given in Table 9.3. We observe that the method presented is not stable in the second case studied above.
TABLE 9.2 Solution of Equation (9.213) in the First Case

i   x_i           u(x_i, 0.5)  w_{i,50}     |u(x_i, 0.5) − w_{i,50}|
0   0             0            0            0
1   0.157079633   0.094882299  0.094742054  0.000140245
2   0.314159265   0.187428281  0.187151245  0.000277037
3   0.471238898   0.275359157  0.274952150  0.000407007
4   0.628318531   0.356509777  0.355982821  0.000526955
5   0.785398163   0.428881942  0.428248014  0.000633928
6   0.942477796   0.490693611  0.489968319  0.000725292
7   1.099557429   0.540422775  0.539623979  0.000798796
8   1.256637061   0.576844936  0.575992305  0.000852632
9   1.413716694   0.599063261  0.598177788  0.000885473
10  1.570796327   0.606530660  0.605634150  0.000896510
11  1.727875959   0.599063261  0.598177788  0.000885473
12  1.884955592   0.576844936  0.575992305  0.000852632
13  2.042035225   0.540422775  0.539623979  0.000798796
14  2.199114858   0.490693611  0.489968319  0.000725292
15  2.356194490   0.428881942  0.428248014  0.000633928
16  2.513274123   0.356509777  0.355982821  0.000526955
17  2.670353756   0.275359157  0.274952150  0.000407007
18  2.827433388   0.187428281  0.187151245  0.000277037
19  2.984513021   0.094882299  0.094742054  0.000140245
20  3.141592654   0            0            0
TABLE 9.3 Solution of Equation (9.213) in the Second Case

i   x_i           u(x_i, 0.5)  w_{i,5}      |u(x_i, 0.5) − w_{i,5}|
0   0             0            0            0
1   0.157079633   0.094882299  0.092478468  0.002403832
2   0.314159265   0.187428281  0.182679809  0.004748473
3   0.471238898   0.275359157  0.268382966  0.006976191
4   0.628318531   0.356509777  0.347477645  0.009032132
5   0.785398163   0.428881942  0.418016274  0.010865672
6   0.942477796   0.490693611  0.478261948  0.012431663
7   1.099557429   0.540422775  0.526731229  0.013691545
8   1.256637061   0.576844936  0.562230640  0.014614296
9   1.413716694   0.599063261  0.582886066  0.015177195
10  1.570796327   0.606530660  0.591164279  0.015366381
11  1.727875959   0.599063261  0.582886066  0.015177195
12  1.884955592   0.576844936  0.562230640  0.014614296
13  2.042035225   0.540422775  0.526731229  0.013691545
14  2.199114858   0.490693611  0.478261948  0.012431663
15  2.356194490   0.428881942  0.418016270  0.010865672
16  2.513274123   0.356509777  0.347477645  0.009032132
17  2.670353756   0.275359157  0.268382966  0.006976191
18  2.827433388   0.187428281  0.182679809  0.004748473
19  2.984513021   0.094882299  0.092478468  0.002403832
20  3.141592654   0            0            0
Example 9.8 Let the problem of parabolic type be
$$\frac{\partial u}{\partial t} - \frac{\partial^2 u}{\partial x^2} = 0, \quad 0 < x < \pi,\; t > 0, \tag{9.217}$$
with the initial and boundary conditions
$$u(x, 0) = \sin x, \tag{9.218}$$
$$u(0, t) = u(\pi, t) = 0, \tag{9.219}$$
the exact solution of which is
$$u(x, t) = e^{-t}\sin x. \tag{9.220}$$
Considering $m = 20$, from which $h = \pi/20$, and $k = 0.1$, we will determine the approximate solution of the problem for $t = 0.5$, which will be compared with the exact solution, using the method with backward differences.
By means of the presented algorithm, the results given in Table 9.4 are obtained.
Example 9.9 Let the problem of parabolic type be
$$\frac{\partial u}{\partial t} - \frac{\partial^2 u}{\partial x^2} = 0, \quad 0 < x < \pi,\; t > 0, \tag{9.221}$$
with the initial and boundary conditions
$$u(x, 0) = \sin x, \tag{9.222}$$
$$u(0, t) = u(\pi, t) = 0, \tag{9.223}$$
TABLE 9.4 Solution of Problem (9.217)

i   x_i           u(x_i, 0.5)  w_{i,5}      |u(x_i, 0.5) − w_{i,5}|
0   0             0            0            0
1   0.157079633   0.094882299  0.097224254  0.002341955
2   0.314159265   0.187428281  0.192054525  0.004626243
3   0.471238898   0.275359157  0.282155775  0.006796618
4   0.628318531   0.356509777  0.365309415  0.008799638
5   0.785398163   0.428881942  0.439467923  0.010585981
6   0.942477796   0.490693611  0.502805274  0.012111662
7   1.099557429   0.540422775  0.553761889  0.013339114
8   1.256637061   0.576844936  0.591083049  0.014238113
9   1.413716694   0.599063261  0.613849783  0.014786522
10  1.570796327   0.606530660  0.621501498  0.014970838
11  1.727875959   0.599063261  0.613849783  0.014786522
12  1.884955592   0.576844936  0.591083049  0.014238113
13  2.042035225   0.540422775  0.553761889  0.013339114
14  2.199114858   0.490693611  0.502805274  0.012111662
15  2.356194490   0.428881942  0.439467923  0.010585981
16  2.513274123   0.356509777  0.365309415  0.008799638
17  2.670353756   0.275359157  0.282155775  0.006796618
18  2.827433388   0.187428281  0.192054525  0.004626243
19  2.984513021   0.094882299  0.097224254  0.002341955
20  3.141592654   0            0            0
the exact solution of which is
$$u(x, t) = e^{-t}\sin x. \tag{9.224}$$
Considering $m = 20$, where $h = \pi/20$ and $k = 0.1$, we will determine the approximate solution of the problem for $t = 0.5$, which will be compared with the exact solution, by using the Crank–Nicolson method.
The results are given in Table 9.5.
Example 9.10 Let the problem of hyperbolic type be
$$\frac{\partial^2 u}{\partial t^2} - \frac{\partial^2 u}{\partial x^2} = 0, \quad 0 < x < 1,\; t > 0, \tag{9.225}$$
with the conditions
$$u(0, t) = u(1, t) = 0, \quad t > 0, \tag{9.226}$$
$$u(x, 0) = \sin(\pi x), \quad 0 \le x \le 1, \tag{9.227}$$
$$\frac{\partial u}{\partial t}(x, 0) = 0, \quad 0 \le x \le 1; \tag{9.228}$$
the exact solution is
$$u(x, t) = \sin(\pi x)\cos(\pi t). \tag{9.229}$$
TABLE 9.5 Solution of Problem (9.221)

i   x_i           u(x_i, 0.5)  w_{i,5}      |u(x_i, 0.5) − w_{i,5}|
0   0             0            0            0
1   0.157079633   0.094882299  0.094940434  0.000058135
2   0.314159265   0.187428281  0.187543119  0.000114838
3   0.471238898   0.275359157  0.275527871  0.000168713
4   0.628318531   0.356509777  0.356728211  0.000218434
5   0.785398163   0.428881942  0.429144720  0.000262777
6   0.942477796   0.490693611  0.490994261  0.000300649
7   1.099557429   0.540422775  0.540753893  0.000331118
8   1.256637061   0.576844936  0.577198371  0.000353434
9   1.413716694   0.599063261  0.599430308  0.000367048
10  1.570796327   0.606530660  0.606902283  0.000371623
11  1.727875959   0.599063261  0.599430308  0.000367048
12  1.884955592   0.576844936  0.577198371  0.000353434
13  2.042035225   0.540422775  0.540753893  0.000331118
14  2.199114858   0.490693611  0.490994261  0.000300649
15  2.356194490   0.428881942  0.429144720  0.000262777
16  2.513274123   0.356509777  0.356728211  0.000218434
17  2.670353756   0.275359157  0.275527871  0.000168713
18  2.827433388   0.187428281  0.187543119  0.000114838
19  2.984513021   0.094882299  0.094940434  0.000058135
20  3.141592654   0            0            0
TABLE 9.6 Solution of Equation (9.225)

i   x_i   u(x_i, 0.5)   w_{i,60}      |u(x_i, 0.5) − w_{i,60}|
0   0     0             0             0
1   0.05  −0.048340908  −0.051663969  0.003323061
2   0.10  −0.095491503  −0.102101248  0.006609746
3   0.15  −0.140290780  −0.150138925  0.009848145
4   0.20  −0.181635632  −0.193803147  0.012167515
5   0.25  −0.218508012  −0.234218363  0.015710350
6   0.30  −0.250000000  −0.266551849  0.016551849
7   0.35  −0.275336158  −0.292401548  0.017065390
8   0.40  −0.293892626  −0.311103275  0.017210649
9   0.45  −0.305212482  −0.313895800  0.008683318
10  0.50  −0.309016994  −0.299780167  0.009236827
11  0.55  −0.305212482  −0.278282952  0.026929531
12  0.60  −0.293892626  −0.259112488  0.034780138
13  0.65  −0.275336158  −0.241810622  0.033525536
14  0.70  −0.250000000  −0.218502651  0.031497349
15  0.75  −0.218508012  −0.189734816  0.028773196
16  0.80  −0.181635632  −0.158609575  0.023026057
17  0.85  −0.140290780  −0.122055771  0.018235009
18  0.90  −0.095491503  −0.083173084  0.012318419
19  0.95  −0.048340908  −0.042127931  0.006212977
20  1.00  0             0             0
Using the algorithm of finite differences for $h = 0.05$, $k = 0.01$, $T = 0.5$, we will determine the approximate solution, which will be compared with the exact solution. The results are given in Table 9.6.
9.10 APPLICATIONS
Problem 9.1
Let there be a square deep beam of side $2a$, acted upon on the upper side by a uniformly distributed normal load and by the reactions, which act as tangential loadings parabolically distributed (Fig. 9.6a). We are asked to calculate the corresponding state of stress.

Solution:
We decompose the loading into two cases, using the properties of symmetry with respect to the $Ox$-axis. We thus have to solve the problem in Figure 9.6b, with properties of skew symmetry with respect to $Ox$; the case in Figure 9.6c is symmetric with respect to $Ox$ and represents a simple compression, for which the state of stress is given by ($\sigma_x$, $\sigma_y$ being normal stresses, $\tau_{xy}$ the tangential stress)
$$\sigma_x = 0, \quad \sigma_y = -\frac{p}{2}, \quad \tau_{xy} = 0. \tag{9.230}$$
For the first case, we use the Airy biharmonic function $F(x, y)$, the second derivatives of which give the state of stress in the form
$$\sigma_x = \frac{\partial^2 F}{\partial y^2}, \quad \sigma_y = \frac{\partial^2 F}{\partial x^2}, \quad \tau_{xy} = -\frac{\partial^2 F}{\partial x\,\partial y}; \tag{9.231}$$
we notice that $F(x, y)$ must be even with respect to $x$ and odd with respect to $y$, so that we take the function of the form (the polynomials are obtained from the general form, putting the condition of biharmonicity)
$$F(x, y) = P_3(x, y) + P_5(x, y) + P_7(x, y) + P_9(x, y) + P_{11}(x, y)$$
$$= \gamma_3x^2y + \delta_3y^3 + \gamma_5(x^4y - x^2y^3) + \delta_5(y^5 - 5x^2y^3) + \gamma_7\left(x^6y - \frac{10}{3}x^4y^3 + x^2y^5\right) + \delta_7\left(y^7 - 14x^2y^5 + \frac{35}{3}x^4y^3\right)$$
$$+ \gamma_9(x^8y - 7x^6y^3 + 7x^4y^5 - x^2y^7) + \delta_9(y^9 - 27x^2y^7 + 63x^4y^5 - 21x^6y^3) + \gamma_{11}\left(x^{10}y - 12x^8y^3 + \frac{126}{5}x^6y^5 - 12x^4y^7 + x^2y^9\right); \tag{9.232}$$
hence the state of stress is given by
$$\sigma_x = 6\delta_3y - 6\gamma_5x^2y + \delta_5(20y^3 - 30x^2y) + \gamma_7(-20x^4y + 20x^2y^3) + \delta_7(42y^5 - 280x^2y^3 + 70x^4y) + \gamma_9(-42x^6y + 140x^4y^3 - 42x^2y^5)$$
$$+ \delta_9(72y^7 - 1134x^2y^5 + 1260x^4y^3 - 126x^6y) + \gamma_{11}(-72x^8y + 504x^6y^3 - 504x^4y^5 + 72x^2y^7),$$
$$\sigma_y = 2\gamma_3y + \gamma_5(12x^2y - 2y^3) - 10\delta_5y^3 + \gamma_7(30x^4y - 40x^2y^3 + 2y^5) + \delta_7(-28y^5 + 140x^2y^3) + \gamma_9(56x^6y - 210x^4y^3 + 84x^2y^5 - 2y^7)$$
$$+ \delta_9(-54y^7 + 756x^2y^5 - 630x^4y^3) + \gamma_{11}(90x^8y - 672x^6y^3 + 756x^4y^5 - 144x^2y^7 + 2y^9),$$
$$\tau_{xy} = -2\gamma_3x + \gamma_5(-4x^3 + 6xy^2) + 30\delta_5xy^2 + \gamma_7(-6x^5 + 40x^3y^2 - 10xy^4) + \delta_7(140xy^4 - 140x^3y^2) + \gamma_9(-8x^7 + 126x^5y^2 - 140x^3y^4 + 14xy^6)$$
$$+ \delta_9(378xy^6 - 1260x^3y^4 + 378x^5y^2) + \gamma_{11}(-10x^9 + 288x^7y^2 - 756x^5y^4 + 336x^3y^6 - 18xy^8). \tag{9.233}$$
We impose conditions at 16 points of the contour. Because of the symmetry, there remain five distinct points (Fig. 9.6b). The conditions

σx(a, 0) = 0,   τxy(0, a) = 0   (9.234)

are identically satisfied. We then have (τyx = τxy)
σy(0, a) = σy(a/2, a) = σy(a, a) = −0.5p,   τyx(a/2, a) = τyx(a, a) = 0,
σx(a, a/2) = σx(a, a) = 0,   τxy(a, 0) = 0.75p,   τxy(a, a/2) = 0.5625p;   (9.235)
we notice that at the point (a, a), three conditions must be satisfied, because of the symmetry of
the stress tensor, hence of the tangential stresses.
[Figure 9.6 Square deep beam: (a) the given loading; (b) the skew-symmetric case, with the five distinct contour points 1–5 and the loads 0.5p, 0.75p, 0.5625p; (c) the symmetric case (simple compression 0.5p).]
We thus find the following system of nine linear equations for the nine arbitrary parameters (α1 = γ3a, α2 = δ3a, α3 = γ5a³, α4 = δ5a³, α5 = γ7a⁵, α6 = δ7a⁵, α7 = γ9a⁷, α8 = δ9a⁷, α9 = γ11a⁹):

α1 − α3 − 5α4 + α5 − 14α6 − α7 − 27α8 + α9 = −0.25p,
2α1 + α3 − 10α4 − 6.125α5 + 7α6 + 6.75α7 + 95.625α8 + 3.1016α9 = −0.5p,
α1 + 5α3 − 5α4 − 4α5 + 56α6 − 36α7 + 36α8 + 16α9 = −0.25p,
α1 − 2.5α3 − 15α4 + 0.1875α5 − 52.5α6 + 6.625α7 − 13.3125α8 − 11.6055α9 = 0,
α1 − α3 − 15α4 − 12α5 + 4α7 + 252α8 + 80α9 = 0,
3α2 − 3α3 − 12.5α4 − 7.5α5 + 1.3125α6 − 4.8125α7 + 59.625α8 + 11.8125α9 = 0,
3α2 − 3α3 − 5α4 − 84α6 + 28α7 + 36α8 = 0,
α1 + 2α3 + 3α5 + 4α7 + 5α9 = −0.375p,
2α1 + 2.5α3 − 7.5α4 − 3.375α5 + 26.25α6 − 14.9688α7 − 21.6563α8 − 19.9297α9 = −0.5625p.   (9.236)
By solving the system (we use one of the methods that have been presented in Section 4.5),
we get
γ3 = −0.347100 p/a,   δ3 = −0.083952 p/a,   γ5 = 0.009407 p/a³,
δ5 = −0.014571 p/a³,  γ7 = −0.009264 p/a⁵,  δ7 = −0.003585 p/a⁵,
γ9 = −0.003837 p/a⁷,  δ9 = 0.000376 p/a⁷,   γ11 = −0.000654 p/a⁹,   (9.237)
the function F(x, y) being thus completely determined.
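The numerical work in (9.236)–(9.237) can also be checked mechanically with any dense linear solver. Below is a minimal sketch in Python with numpy (the language and library are our choice, not the book's); it types in the coefficients of (9.236) for p = 1 and solves for α1, . . . , α9, from which the constants of (9.237) follow through the substitutions listed above.

import numpy as np

# Coefficient matrix of system (9.236), unknowns alpha_1..alpha_9, with p = 1,
# transcribed as printed in the text.
A = np.array([
    [1, 0, -1,    -5,    1,      -14,      -1,       -27,       1],
    [2, 0,  1,   -10,   -6.125,    7,       6.75,     95.625,   3.1016],
    [1, 0,  5,    -5,   -4,       56,     -36,        36,      16],
    [1, 0, -2.5, -15,    0.1875, -52.5,     6.625,   -13.3125, -11.6055],
    [1, 0, -1,   -15,  -12,        0,       4,       252,      80],
    [0, 3, -3,   -12.5,  -7.5,     1.3125, -4.8125,   59.625,  11.8125],
    [0, 3, -3,    -5,    0,      -84,      28,        36,       0],
    [1, 0,  2,     0,    3,        0,       4,         0,       5],
    [2, 0,  2.5,  -7.5, -3.375,   26.25,  -14.9688,  -21.6563, -19.9297],
])
b = np.array([-0.25, -0.5, -0.25, 0, 0, 0, 0, -0.375, -0.5625])

alpha = np.linalg.solve(A, b)           # direct dense solve of (9.236)
for i, a_i in enumerate(alpha, start=1):
    print(f"alpha_{i} = {a_i: .6f} p")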
Taking into account the state of stress (9.230) and formulae (9.233), we get finally (ξ = x/a,
η = y/a)
σx = [(−0.504 + 0.380ξ² − 0.064ξ⁴ + 0.113ξ⁶ + 0.047ξ⁸)η
   + (−0.291 + 0.819ξ² − 0.064ξ⁴ − 0.329ξ⁶)η³
   + (−0.151 − 0.265ξ² + 0.329ξ⁴)η⁵ + (0.027 − 0.047ξ²)η⁷]p,

σy = [−0.500 + (−0.695 + 0.113ξ² − 0.278ξ⁴ − 0.215ξ⁶ − 0.059ξ⁸)η
   + (0.127 − 0.132ξ² + 0.570ξ⁴ + 0.439ξ⁶)η³
   + (0.082 − 0.038ξ² − 0.494ξ⁴)η⁵ + (−0.013 + 0.094ξ²)η⁷ − 0.001η⁹]p,

τxy = [0.695ξ − 0.638ξ³ + 0.056ξ⁵ + 0.031ξ⁷ + 0.006ξ⁹
   + (−0.381ξ + 0.131ξ³ − 0.338ξ⁵ − 0.189ξ⁷)η²
   + (−0.409ξ + 0.063ξ³ + 0.494ξ⁵)η⁴ + (0.088ξ − 0.221ξ³)η⁶ + 0.012ξη⁸]p.   (9.238)
We thus obtain on the contour a distribution of stresses from which we subtract the distribution of the external loading; it follows that
• on the sides ξ = ±1:
σx(±1, η) = (−0.028η + 0.135η³ − 0.087η⁵ − 0.020η⁷)p
          = −0.02η(1 − η²)(0.25 − η²)(5.6 + η²)p,
τxy(±1, η) = ∓(0.027η² − 0.148η⁴ + 0.133η⁶ − 0.012η⁸)p
          ≅ ∓0.012η²(1 − η²)(0.25 − η²)(9.7 − η²)p;   (9.239)
• on the sides η = ±1:
σy(ξ, ±1) = ±(0.037ξ² − 0.202ξ⁴ + 0.224ξ⁶ − 0.059ξ⁸)p
          ≅ ±0.05ξ²(1 − ξ²)(0.25 − ξ²)(2.55 − ξ²)p,
τyx(ξ, ±1) = (0.005ξ − 0.065ξ³ + 0.212ξ⁵ − 0.158ξ⁷ + 0.006ξ⁹)p
          ≅ 0.006ξ(1 − ξ²)(0.25 − ξ²)(0.15 − ξ²)(25 − ξ²)p.   (9.240)
We represent these parasitic stresses in Figure 9.7. Although Saint-Venant's principle cannot be applied, because the deep beam has equal dimensions, a negligible state of stress takes place in the interior (the stresses are very small with respect to the loading). We can make an elementary verification, approximating the loading by parabolically distributed loads and using methods of strength of materials.

[Figure 9.7 Parasitic stresses on the boundary.]

The bending moments at the vertical cross sections 2–2 and 1–1 are (conservative values)
M2–2 = −2 · (2/3) · 0.01p · (a/2) · (a/4) = −(1/6) · 0.01pa²,
M1–1 = −2 · (2/3) · 0.01p · (a/2) · (3a/4) + 2 · (2/3) · 0.02p · (a/2) · (a/4) = −(1/6) · 0.01pa²;   (9.241)
hence, we get (the strength modulus is W = (1/6)(2a)² = 2a²/3)

σmax = ∓0.0025p.   (9.242)

We can thus see that the error is not greater than 1.7% of the maximum external load, which takes place at four points of the contour.
We may consider that the relations (9.238) give the sought state of stress, which we represent in Figure 9.8a and Figure 9.8b. The broken line in Figure 9.8a corresponds to the linear distribution obtained in strength of materials (Navier's formula).
[Figure 9.8 State of stress: (a) σx; (b) σy, τxy.]
[Figure 9.9 Problem 9.2: thread fixed at A, passing over a small pulley at B, of span l, holding the weight G; deflection w(x, t).]
Problem 9.2
We consider a thread of length l and density ρ, the cross section of which is constant, of area equal to A. The thread is fixed at A (Fig. 9.9) and passes over a small pulley at B, the other end of the thread holding a weight G.
The partial differential equation of the free transverse vibrations of the thread is

∂²w/∂x² − (1/c²) ∂²w/∂t² = 0,   (9.243)

where w(x, t) is the deflection, while c is a constant,

c = √(G/(ρA));   (9.244)
knowing the initial conditions

t = 0,   w(x, 0) = 4h0(x/l − x²/l²),   ∂w(x, 0)/∂t = 0,   (9.245)
determine
• the exact solution, integrating the equation by Fourier’s method;
• a numerical solution, integrating with finite differences, and compare the results.
Numerical application: A = 10⁻⁶ m², ρ = 10⁴ kg m⁻³, l = 2 m, h0 = 2 × 10⁻² m, G = 10⁻² N.
Solution:
1. Solution by the Fourier method
We consider a solution of the form

w(x, t) = Y(x) cos(pt − φ)   (9.246)

and expression (9.243) leads to the differential equation

Y″ + (p²/c²)Y = 0,   (9.247)

from which we obtain

Y(x) = B cos(px/c) + D sin(px/c);   (9.248)
taking into account the boundary conditions
w(0, t) = w(l, t) = 0, (9.249)
we obtain

sin(pl/c) = 0,   (9.250)

which leads to the eigenvalues

pk = kπc/l,   k = 1, 2, . . .   (9.251)
Under these conditions, the general solution takes the form

w(x, t) = Σ_{k=1}^{∞} Dk sin(kπx/l) cos(pkt − φk),   (9.252)

the constants Dk, φk being given by

Dk cos φk = (2/l) ∫₀ˡ w(x, 0) sin(kπx/l) dx,   Dk sin φk = (2/(pkl)) ∫₀ˡ [∂w(x, 0)/∂t] sin(kπx/l) dx.   (9.253)
We obtain the results

φk = 0,   Dk = (16h0/(k³π³))(1 − cos kπ),   (9.254)

from which the solution

w(x, t) = (32h0/π³) Σ_{i=1}^{∞} sin[(2i − 1)πx/l] cos(p_{2i−1}t)/(2i − 1)³.   (9.255)
2. Numerical calculation
We apply the theory presented for the partial differential equations of second order of hyperbolic type for

α = c,   f(x) = 4h0(x/l − x²/l²),   g(x) = 0.   (9.256)

The results for x = l/2 are plotted in Figure 9.10.
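For reference, a minimal sketch of the standard explicit scheme for (9.243), written in Python with numpy; the grid sizes m and n are illustrative choices of ours, not the ones used for Figure 9.10.

import numpy as np

# Explicit scheme for the wave equation (9.243), data of Problem 9.2.
G, rho, A, l, h0, T = 1e-2, 1e4, 1e-6, 2.0, 2e-2, 5.0
c = np.sqrt(G / (rho * A))                 # wave speed (9.244), here c = 1 m/s
m, n = 40, 4000                            # space/time divisions (assumed)
h, k = l / m, T / n
r2 = (c * k / h) ** 2                      # must satisfy r <= 1 for stability

x = np.linspace(0.0, l, m + 1)
w_prev = 4 * h0 * (x / l - (x / l) ** 2)   # initial deflection (9.245)
w = w_prev.copy()                          # zero initial velocity
for _ in range(n - 1):
    w_next = np.empty_like(w)
    w_next[1:-1] = (2 * w[1:-1] - w_prev[1:-1]
                    + r2 * (w[2:] - 2 * w[1:-1] + w[:-2]))
    w_next[0] = w_next[-1] = 0.0           # fixed ends (9.249)
    w_prev, w = w, w_next
print("w(l/2, T) ~", w[m // 2])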
Problem 9.3
Let us consider the bar BC of length l (Fig. 9.11), of density ρ, of modulus of longitudinal elasticity E, having a constant area A of the cross section; the bar is built-in at B, the end C being free.
The partial differential equation of the free transverse vibrations of the bar reads

∂⁴w/∂x⁴ + (Aρ/(EI)) ∂²w/∂t² = 0,   (9.257)
where w(x, t) is the deflection (Fig. 9.11), and I is the principal moment of inertia of the cross section of the bar with respect to the neutral axis (normal to Bx and Bw). Being given the initial conditions

t = 0,   w(x, 0) = h0 [f1(β1)f4(β1x/l) − f2(β1)f3(β1x/l)] / |f1(β1)f4(β1) − f2(β1)f3(β1)|,   ∂w(x, 0)/∂t = 0,   (9.258)
[Figure 9.10 (a) The analytic w(l/2) calculated with 20 terms and the numerical w(l/2) versus time; (b) the error.]
[Figure 9.11 Problem 9.3: cantilever bar BC, deflection w(x, t).]
where f1, f2, f3, f4 are Krylov's functions

f1(z) = (cosh z + cos z)/2,   f2(z) = (sinh z + sin z)/2,
f3(z) = (cosh z − cos z)/2,   f4(z) = (sinh z − sin z)/2,   (9.259)
while β1 is the smallest positive solution of the equation
cosh β cos β + 1 = 0, (9.260)
determine:
• the exact solution, integrating the equation by Fourier’s method;
• a numerical solution, integrating by means of finite differences, and compare the results for x = l/2.
Numerical application: ρ = 7800 kg m⁻³, l = 1 m, A = 6 × 10⁻⁴ m², I = 5 × 10⁻⁹ m⁴, E = 2 × 10¹¹ N m⁻², h0 = 0.02 m.
Solution:
1. Solution by Fourier's method
Let us consider a solution of the form

w(x, t) = Y(x) cos(pt − φ);   (9.261)

from equation (9.257) we obtain the differential equation

Y⁽ⁱᵛ⁾ − α⁴Y = 0,   (9.262)

where

α⁴ = p²ρA/(EI).   (9.263)
The solution of equation (9.262) and its derivatives Y′, Y″, Y‴ satisfy the matrix equation

[ Y(x)      ]   [ f1(αx)  f2(αx)  f3(αx)  f4(αx) ] [ Y(0)      ]
[ Y′(x)/α   ] = [ f4(αx)  f1(αx)  f2(αx)  f3(αx) ] [ Y′(0)/α   ]   (9.264)
[ Y″(x)/α²  ]   [ f3(αx)  f4(αx)  f1(αx)  f2(αx) ] [ Y″(0)/α²  ]
[ Y‴(x)/α³  ]   [ f2(αx)  f3(αx)  f4(αx)  f1(αx) ] [ Y‴(0)/α³  ]
Observing from Figure 9.11 that the conditions at the ends of the bar are

Y(0) = Y′(0) = 0,   Y″(l) = Y‴(l) = 0,   (9.265)

we obtain from expression (9.264) the homogeneous equations in Y″(0), Y‴(0),

αf1(αl)Y″(0) + f2(αl)Y‴(0) = 0,   αf4(αl)Y″(0) + f1(αl)Y‴(0) = 0.   (9.266)

The system (9.266) admits a nontrivial solution if

f1²(β) − f2(β)f4(β) = 0,   (9.267)
where

β = αl.   (9.268)

Taking into account equation (9.259), equation (9.267) becomes

cosh β cos β + 1 = 0,   (9.269)

with the solutions β1, β2, . . . , βn, . . . , so that, from equation (9.263) and equation (9.268), we deduce the eigenpulsations

pn = (βn²/l²)√(EI/(ρA)).   (9.270)
Taking into account relations (9.264), (9.266), and (9.270), the functions Yn(x) read

Yn(x) = DnΦn(x),   (9.271)

where Dn are constants, while Φn(x) are the eigenfunctions

Φn(x) = f1(βn)f4(βnx/l) − f2(βn)f3(βnx/l),   (9.272)

with the property of orthogonality

∫₀ˡ Φn(x)Φm(x) dx = 0   if m ≠ n.   (9.273)
Under these conditions, the general solution is

w(x, t) = Σ_{n=1}^{∞} DnΦn(x) cos(pnt − φn),   (9.274)

where Dn and φn are given by

Dn cos φn = ∫₀ˡ w(x, 0)Φn(x) dx / ∫₀ˡ Φn²(x) dx,   Dn sin φn = ∫₀ˡ [∂w(x, 0)/∂t]Φn(x) dx / (pn ∫₀ˡ Φn²(x) dx).   (9.275)
In the considered case, with the conditions (9.258), it follows that

φn = 0, n ≥ 1,   Dn = 0, n ≥ 2,   (9.276)

D1 = h0 / |f1(β1)f4(β1) − f2(β1)f3(β1)|,   (9.277)

where β1 ≈ 1.875, p1 = β1²√(EI/(ρA))/l², hence

w(x, t) = h0 [f1(β1)f4(β1x/l) − f2(β1)f3(β1x/l)] / |f1(β1)f4(β1) − f2(β1)f3(β1)| · cos p1t.   (9.278)
2. Numerical calculation
We consider the domain

[0, l] × [0, T] ⊂ R²,   (9.279)

the number of division points being m and n, respectively. We may write

h = l/m,   k = T/n.   (9.280)

From the relation

w(x, k) = w(x, 0) + k ∂w(x, 0)/∂t + O(k²)   (9.281)

we obtain

wi,1 ≈ wi,0 + k ∂w(xi, 0)/∂t,   i = 0, m.   (9.282)
On the other hand, the conditions

Y″(l) = Y‴(l) = 0   (9.283)

are put; we take into account that

Y(l − h) = Y(l) − hY′(l) + O(h⁴),
Y(l − 2h) = Y(l) − 2hY′(l) + O(h⁴),   (9.284)
Y(l − 2h) = Y(l − h) − hY′(l − h) + O(h⁴),

from which

Y′(l) = Y′(l − h) = Y′(l − 2h),   (9.285)

and that

Y′(l) ≈ (wm−1,j − wm,j)/h,   (9.286)
Y′(l − h) ≈ (wm−2,j − wm−1,j)/h,   (9.287)
Y′(l − 2h) ≈ (wm−3,j − wm−2,j)/h,   (9.288)

we are led to

wm−1,j = 2wm−2,j − wm−3,j,   wm,j = 2wm−1,j − wm−2,j.   (9.289)
On the other hand,

∂⁴w/∂x⁴ ≈ (wi+2,j − 4wi+1,j + 6wi,j − 4wi−1,j + wi−2,j)/h⁴,   (9.290)
∂²w/∂t² ≈ (wi,j+1 − 2wi,j + wi,j−1)/k²,   (9.291)

so that equation (9.257) takes the form

wi,j+1 = 2wi,j − wi,j−1 − λ²(wi+2,j − 4wi+1,j + 6wi,j − 4wi−1,j + wi−2,j)   (9.292)
in finite differences, where

λ² = (EI/(Aρ)) k²/h⁴.   (9.293)
By formula (9.292), we may calculate the values w at the points A, B, and C, marked in
Figure 9.12.
The values w for the points of type D or E cannot be calculated by this formula. We apply the
formula (9.289) for these points and we obtain:
[Figure 9.12 Working schema of the grid on [0, l] × [0, T]: the points of type A, B, C1, C2 are computed by formula (9.292); the points D (i = m − 1) and E (i = m) by (9.289).]
[Figure 9.13 The analytic w(l/2) (continuous line) and the numerical w(l/2) (dashed line) versus time.]
• for the point D

wm−1,j+1 = 2wm−2,j+1 − wm−3,j+1   (9.294)

or

wD = 2wC1 − wC2;   (9.295)

• for the point E

wm,j+1 = 2wm−1,j+1 − wm−2,j+1   (9.296)

or

wE = 2wD − wC1.   (9.297)
The results obtained for x = l/2 are plotted in Figure 9.13.
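A minimal sketch of scheme (9.292) in Python with numpy follows. The grid sizes m, n and the mirror ghost point at the built-in end (enforcing w′(0) = 0) are our assumptions; the free end uses (9.289).

import numpy as np

# Explicit scheme (9.292) for the cantilever of Problem 9.3.
E, I, rho, A, l, h0, T = 2e11, 5e-9, 7.8e3, 6e-4, 1.0, 0.02, 1e-3
m, n = 20, 20000                               # illustrative grid sizes
h, k = l / m, T / n
lam2 = (E * I / (A * rho)) * k**2 / h**4       # lambda^2 of (9.293)

# Initial shape (9.258) built from Krylov's functions (9.259), beta_1 ~ 1.875
b1 = 1.8751
f1 = lambda z: (np.cosh(z) + np.cos(z)) / 2
f2 = lambda z: (np.sinh(z) + np.sin(z)) / 2
f3 = lambda z: (np.cosh(z) - np.cos(z)) / 2
f4 = lambda z: (np.sinh(z) - np.sin(z)) / 2
x = np.linspace(0.0, l, m + 1)
phi = f1(b1) * f4(b1 * x / l) - f2(b1) * f3(b1 * x / l)
w_prev = h0 * phi / abs(f1(b1) * f4(b1) - f2(b1) * f3(b1))
w = w_prev.copy()                              # zero initial velocity (9.282)

for _ in range(n - 1):
    w_next = np.empty_like(w)
    for i in range(1, m - 1):                  # interior points, formula (9.292)
        wim2 = w[i - 2] if i >= 2 else w[1]    # ghost value from w'(0) = 0 (assumed)
        w_next[i] = (2 * w[i] - w_prev[i]
                     - lam2 * (w[i + 2] - 4 * w[i + 1] + 6 * w[i]
                               - 4 * w[i - 1] + wim2))
    w_next[0] = 0.0                                      # built-in end
    w_next[m - 1] = 2 * w_next[m - 2] - w_next[m - 3]    # point D, (9.294)
    w_next[m] = 2 * w_next[m - 1] - w_next[m - 2]        # point E, (9.296)
    w_prev, w = w, w_next
print("w(l/2, T) ~", w[m // 2])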
FURTHER READING
Acton FS (1990). Numerical Methods that Work. 4th ed. Washington: Mathematical Association of
America.
Atkinson K, Han W (2010). Theoretical Numerical Analysis: A Functional Analysis Framework. 3rd
ed. New York: Springer-Verlag.
Babuška I, Práger M, Vitásek E (1966). Numerical Processes in Differential Equations. Prague: SNTL.
Bakhvalov N (1976). Méthodes Numériques. Moscou: Éditions Mir (in French).
Boyce WE, DiPrima RC (2008). Elementary Differential Equations and Boundary Value Problems.
9th ed. Hoboken: John Wiley & Sons, Inc.
Burden RL, Faires L (2009). Numerical Analysis. 9th ed. Boston: Brooks/Cole.
Cheney EW, Kincaid DR (1997). Numerical Mathematics and Computing. 6th ed. Belmont: Thomson.
Dahlquist G, Björck Å (1974). Numerical Methods. Englewood Cliffs: Prentice Hall.
Den Hartog JP (1961). Strength of Materials. New York: Dover Books on Engineering.
Epperson JF (2007). An Introduction to Numerical Methods and Analysis. Hoboken: John Wiley &
Sons, Inc.
Farlow SJ (1982). Partial Differential Equations for Scientists and Engineers. New York: John Wiley
& Sons, Inc.
Gockenbach MS (2010). Partial Differential Equations: Analytical and Numerical Methods. 2nd ed.
Philadelphia: SIAM.
Godunov SK, Reabenki VS (1977). Scheme de Calcul cu Diferențe Finite. București: Editura Tehnică (in Romanian).
Golub GH, van Loan CF (1996). Matrix Computations. 3rd ed. Baltimore: Johns Hopkins University Press.
Greenbaum A, Chartier TP (2012). Numerical Methods: Design, Analysis, and Computer Implemen-
tation of Algorithms. Princeton: Princeton University Press.
Grossmann C, Roos HG, Stynes M (2007). Numerical Treatment of Partial Differential Equations.
Berlin: Springer-Verlag.
Heinbockel JH (2006). Numerical Methods for Scientific Computing. Victoria: Trafford Publishing.
Hibbeler RC (2010). Mechanics of Materials. 8th ed. Englewood Cliffs: Prentice Hall.
Hoffman JD (1992). Numerical Methods for Engineers and Scientists. New York: McGraw-Hill.
Iserles A (2008). A First Course in the Numerical Analysis of Differential Equations. 2nd ed.
Cambridge: Cambridge University Press.
Ixaru LG (1979). Metode Numerice pentru Ecuații Diferențiale cu Aplicații. București: Editura Academiei Române (in Romanian).
Kharab A, Guenther RB (2011). An Introduction to Numerical Methods: A MATLAB Approach. 3rd
ed. Boca Raton: CRC Press.
Kunz KS (1957). Numerical Analysis. New York: McGraw-Hill.
Lurie AI (2005). Theory of Elasticity. New York: Springer-Verlag.
Marciuk GI (1983). Metode de Analiză Numerică. București: Editura Academiei Române (in Romanian).
Marciuk GI, Șaidurov VV (1981). Creșterea Preciziei Soluțiilor în Scheme cu Diferențe. București: Editura Academiei Române (in Romanian).
Marinescu G (1974). Analiza Numerică. București: Editura Academiei Române (in Romanian).
Palm WJ III (2007). Mechanical Vibrations. Hoboken: John Wiley & Sons, Inc.
Pandrea N, Pârlac S (2000). Vibrații Mecanice: Teorie și Aplicații din Domeniile Autovehiculelor Rutiere și din Domeniul Prelucrărilor Mecanice. Pitești: Editura Universității din Pitești (in Romanian).
Press WH, Teukolsky SA, Vetterling WT, Flannery BP (2007). Numerical Recipes: The Art of Scientific Computing. 3rd ed. Cambridge: Cambridge University Press.
Quarteroni A, Sacco R, Saleri F (2010). Numerical Mathematics. 2nd ed. Berlin: Springer-Verlag.
Rivi`ere B (2008). Discontinuous Galerkin Methods for Solving Elliptic and Parabolic Equations:
Theory and Implementation. Philadelphia: SIAM.
Salvadori MG, Baron ML (1962). Numerical Methods in Engineering. Englewood Cliffs: Prentice
Hall.
Samarski A, Andr´eev V (1978). M´ethodes aux Diff´erences pour ´Equations Elliptiques. Moscou:
Editions Mir (in French).
Sauer T (2011). Numerical Analysis. 2nd ed. London: Pearson.
Shabana AA (2011). Computational Continuum Mechanics. 2nd ed. Cambridge: Cambridge University
Press.
Simionescu I, Dranga M, Moise V (1995). Metode Numerice în Tehnică. Aplicații în FORTRAN. București: Editura Tehnică (in Romanian).
Sinha AK (2010). Vibration of Mechanical Systems. Cambridge: Cambridge University Press.
Smith GD (1986). Numerical Solution of Partial Differential Equations: Finite Difference Methods.
3rd ed. Oxford: Oxford University Press.
Stănescu ND (2007). Metode Numerice. București: Editura Didactică și Pedagogică (in Romanian).
Stănescu ND, Munteanu L, Chiroiu V, Pandrea N (2007). Sisteme Dinamice: Teorie și Aplicații. Volume 1. București: Editura Academiei Române (in Romanian).
Stănescu ND, Munteanu L, Chiroiu V, Pandrea N (2011). Sisteme Dinamice: Teorie și Aplicații. Volume 2. București: Editura Academiei Române (in Romanian).
Teodorescu PP, Nicorovici NAP (2010). Applications of the Theory of Groups in Mechanics and
Physics. Dordrecht: Kluwer Academic Publishers.
Teodorescu PP (2008). Mechanical Systems: Classical Models. Volume 2: Mechanics of Discrete and
Continuous Systems. Dordrecht: Springer-Verlag.
Teodorescu PP (2009). Mechanical Systems: Classical Models. Volume 3: Analytical Mechanics.
Dordrecht: Springer-Verlag.
Udriște C, Iftode V, Postolache M (1996). Metode Numerice de Calcul. Algoritmi și Programe Turbo Pascal. București: Editura Tehnică (in Romanian).
10
OPTIMIZATIONS
10.1 INTRODUCTION
Definition 10.1 A method of optimization solves the problem of determination of the minimum
(maximum) of an objective (purpose) function U, where U : D ⊂ Rn → R.
Observation 10.1 Because the determination of the maximum of the objective function U is
equivalent to the determination of the minimum of the function −U, it follows that we may limit
ourselves to the determination of the minimum of the objective function.
In general, in the case of optimization problems, the global minimum is of interest. Such a point of global minimum will be found among the points of local minimum; it can be unique or multiple (i.e., there exists only one point at which the function U takes its least value in D, or there are several such points, possibly even infinitely many).
For a local minimum x̄ of the function U we can write

∇U(x̄) = 0,   ∇²U(x̄) > 0,   (10.1)

where ∇U is the gradient of U, that is,

∇U(x̄) = (∂U/∂x1)|x=x̄ i1 + · · · + (∂U/∂xn)|x=x̄ in,   (10.2)

and where x = (x1, . . . , xn)ᵀ is a point of D ⊂ Rⁿ, i1, . . . , in are the unit vectors of the coordinate axes in Rⁿ, while ∇²U is the Hessian matrix

∇²U(x̄) = [ ∂²U/∂x1²     ∂²U/∂x1∂x2   · · ·   ∂²U/∂x1∂xn ]
          [ · · ·         · · ·         · · ·   · · ·       ]
          [ ∂²U/∂xn∂x1   ∂²U/∂xn∂x2   · · ·   ∂²U/∂xn²   ]  evaluated at x = x̄.   (10.3)
Definition 10.2 Conditions (10.1) are called optimality conditions.

Observation 10.2
(i) The optimality conditions are sufficient for x̄ to be a minimum of the function U, but they are not necessary.
(ii) The condition ∇²U(x̄) > 0 requires that the Hessian matrix be positive definite at the point x̄.
To determine the global minimum of the function U, we can proceed intuitively in two ways:
• we start with different points x(0), determining in each case the minimum of the function U; the point x̄ is that which leads to the least value among the minima previously obtained;
• we determine a local minimum; if, by a perturbation, the algorithm returns us to the same point, then it may be a serious candidate for the global minimum.
The classification of the optimization methods can be made according to several criteria:
• from the point of view of the restrictions imposed on the variables, we have optimization problems with or without restrictions;
• from the point of view of the objective function, we may have linear optimization problems, for which both the objective function and the restrictions are linear, and nonlinear optimization problems in the opposite case;
• from the point of view of the calculation of the derivatives, we encounter (i) optimization methods of Newton type, where the Hessian matrix ∇²U(x) and the gradient vector ∇U are calculated, (ii) optimization methods of quasi-Newton type and optimization methods with conjugate gradients, where only the partial derivatives of first order are calculated, and (iii) optimization methods where no partial derivatives are calculated.
The optimization methods are iterative. They determine the value x̄ as a limit of a sequence x(0), x(1), . . . , x(k), . . . defined iteratively by the relation

x(k+1) = x(k) + αkp(k),   k = 0, 1, . . . ,   (10.4)

where p(k) is a direction of decrease of the objective function U at step k, while αk is a positive real number such that

U(x(k+1)) < U(x(k)),   k = 0, 1, . . .   (10.5)

The point x(0) ∈ D is necessary to start the algorithm.
10.2 MINIMIZATION ALONG A DIRECTION
Let us consider the function f : R → R, the minimum of which we wish to determine. Two situations can appear:
• the derivative f′ may be analytically determined. In this case, we have to solve the equation f′(x) = 0 and to verify which of its solutions are local minima. The global minimum will, obviously, be the smallest value of these local minima and will correspond to one or several points at which f′(x) = 0;
• the derivative f′ cannot be analytically determined. In this case, we have to go through two steps:
(a) localization of the minimum, that is, the determination of an interval (a, b) of the real axis that contains the point of minimum;
(b) reduction of the length of the interval (a, b) until it has a length strictly smaller than an imposed value ε,

|b − a| < ε.   (10.6)
Observation 10.3 Let us denote the representation error of the numbers in the computer by εm, that is, the minimal distance between two numbers which can be represented in the computer, for which the two representations differ. Under these conditions, ε must fulfill the relation

ε ≥ √εm.   (10.7)

Indeed, let a be a point sufficiently near to the point of minimum, so that

f′(a) ≈ 0.   (10.8)

Taylor's relation around the point a leads to

f(b) ≈ f(a) + ((b − a)²/2!) f″(a).   (10.9)

The values a and b must satisfy the relation

|f(b) − f(a)| > εm|f(a)|,   (10.10)

so that the representations of f(a) and f(b) be different. We thus deduce

|b − a| ≈ √(2εm|f(a)|/|f″(a)|) = |a|√εm √(2|f(a)|/(a²|f″(a)|)).   (10.11)

Moreover, if 2|f(a)|/(a²f″(a)) is of order O(1), then |b − a| is of order O(|a|√εm) and the condition

|b − a| < ε|a|   (10.12)

leads to equation (10.7).
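In double precision, εm ≈ 2.22 × 10⁻¹⁶, so (10.7) recommends tolerances no tighter than about 1.5 × 10⁻⁸; a one-line check in Python:

import numpy as np

# Check of (10.7): eps_m ~ 2.22e-16, so the smallest sensible tolerance
# for a one-dimensional minimizer is about sqrt(eps_m) ~ 1.5e-8.
eps_m = np.finfo(float).eps
print(eps_m, np.sqrt(eps_m))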
10.2.1 Localization of the Minimum
To localize the minimum of a function f : R → R, at least three points are necessary.
Considering three points a, b and c, so that a < b < c, the minimum xm is situated in the interval
(a, c) if f (a) > f (b) and f (b) < f (c).
If we have two values a and b, with a < b and f (a) > f (b), we use the following algorithm
for the localization of the minimum:
– given: a, b, a < b, f (a) > f (b);
– calculate fa = f (a), fb = f (b);
– repeat
– calculate c = b + k(b − a), fc = f (c);
– if fc > fb
then xm ∈ (a, c); stop;
else
– calculate a = b, b = c, fa = fb, fb = fc;
until false.
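The localization algorithm above translates almost line by line into code; a minimal Python sketch (the constant step factor k = 2 is an assumption of ours):

def bracket_minimum(f, a, b, k=2.0, max_iter=100):
    """Given a < b with f(a) > f(b), step downhill until the function
    value rises again, returning a bracketing triple (a, b, c)."""
    fa, fb = f(a), f(b)
    assert a < b and fa > fb, "need a < b and f(a) > f(b)"
    for _ in range(max_iter):
        c = b + k * (b - a)          # step beyond b
        fc = f(c)
        if fc > fb:                  # f decreased then increased: bracket found
            return a, b, c
        a, b, fa, fb = b, c, fb, fc  # slide the triple downhill
    raise RuntimeError("no bracket found (function may be unbounded below)")

# Example: bracket the minimum of f(x) = (x - 3)**2 starting from 0, 1.
print(bracket_minimum(lambda x: (x - 3.0) ** 2, 0.0, 1.0))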
Observation 10.4
(i) Usually, the searching step is not taken constant (k = 1), but it increases from one step to another, so that the localization of the minimum takes place as fast as possible:

hj+1 = khj,   k > 1.   (10.13)
(ii) The algorithm may be improved by using a parabolic interpolation. Thus, a parabola passes through the points A(a, f(a)), B(b, f(b)), and C(c, f(c)), whose equation is

g(x) = [(x − b)(x − c)/((a − b)(a − c))] f(a) + [(x − a)(x − c)/((b − a)(b − c))] f(b)
     + [(x − a)(x − b)/((c − a)(c − b))] f(c) = d2x² + d1x + d0.   (10.14)
Let us denote the point of minimum of this parabola by

x∗ = −d1/(2d2).   (10.15)
The following situations may occur:
• x∗ > c. In this case we require that x∗ not be very far from the point c, so that |x∗ − c| < λ|c − b|, where we may take, for example, λ = 50;
• x∗ < a. The situation is similar to the previous one, replacing the point c by the point a;
• x∗ ∈ (b, c), f(b) > f(x∗), f(x∗) < f(c). It follows that the minimum of the function is between the points b and c;
• x∗ ∈ (a, b), f(a) > f(x∗), f(x∗) < f(b). The case is analogous to the preceding one, the minimum of the function f now taking place between a and b;
• x∗ ∈ (b, c), f(a) ≤ f(x∗) or f(x∗) ≥ f(c). The algorithm fails;
• x∗ ∈ (a, b), f(a) ≤ f(x∗), f(x∗) ≥ f(b). The algorithm fails.
10.2.2 Determination of the Minimum
There are two ways to solve the problem.
The first method supposes the reduction of the interval in which the minimum has been localized by successive steps, until the point of minimum is obtained with the desired accuracy. The method has the advantage of reliability (the point of minimum is certainly determined), but also the disadvantage of a slow convergence.
A second method to determine a point of minimum consists in replacing the function f(x) by another function g(x), which passes through certain points common with f(x), that is, g(xi) = f(xi) for certain xi of the interval in which the minimum takes place; we then seek the minimum of the function g(x). The method has the advantage of a faster convergence than the previous one, but also the disadvantage of possibly leading to large errors if the point of minimum of the function g(x) is not in the considered interval. Usually, we take a parabola for g(x), because only three points are necessary to determine it.
In connection with the first method, let us present the golden section algorithm1:
– given: a < b < c, f(a) > f(b), f(b) < f(c), ε > √εm, w = 0.38197;
1The algorithm was presented by Jack Carl Kiefer (1924–1981) in 1953.
– calculate w1 = 1 − w, x0 = a, x3 = c, f0 = f(a), f3 = f(c);
– if |c − b| > |b − a|
then x1 = b, x2 = b + w|c − b|;
else x2 = b, x1 = b − w|b − a|;
– calculate f1 = f(x1), f2 = f(x2);
– while |x3 − x0| > ε(|x1| + |x2|) do
– if f2 < f1
then x0 = x1, x1 = x2, x2 = w1x1 + wx3, f0 = f1, f1 = f2, f2 = f(x2);
else x3 = x2, x2 = x1, x1 = w1x2 + wx0, f3 = f2, f2 = f1, f1 = f(x1);
– if f1 < f2
then xmin = x1, fmin = f1;
else xmin = x2, fmin = f2.
The idea of the golden section algorithm is based on the following considerations. Let us consider three points a, b, and c with

a < b < c,   fa = f(a) > f(b) = fb,   fb < fc = f(c).   (10.16)

Let

w = (b − a)/(c − a),   1 − w = (c − b)/(c − a).   (10.17)

We shall try to find a point x ∈ (a, c) so as to diminish the interval in which the minimum will be determined. We suppose also that (b, c) is an interval of length greater than (a, b) and that x is in (b, c). Let us denote

z = (x − b)/(c − a).   (10.18)

The point of minimum will be either in the interval (a, x) or in the interval (b, c). We may write

(x − a)/(c − a) = w + z,   (c − b)/(c − a) = 1 − w.   (10.19)

Imposing the condition of equality of the two ratios of (10.19) (the most unfavorable case), it follows that

z = 1 − 2w.   (10.20)

But the same method has been used also for the determination of the point b at the previous step,

(x − b)/(c − b) = (b − a)/(c − a) = w,   (10.21)

from which we may successively deduce

x − b = w(c − b) = z(c − a),   1 − w = (c − b)/(c − a) = z/w.   (10.22)

We thus obtain the equation

w² − 3w + 1 = 0,   (10.23)

which has the solution (w must be in the interval (0, 1))

w = (3 − √5)/2 ≈ 0.38197;   (10.24)
hence, it follows that the position of the point x is

x = b + w(c − b) = c − (1 − w)(c − b).   (10.25)
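A compact Python sketch of the golden section algorithm described above; the bracketing triple (a, b, c) comes, for example, from the localization routine of Section 10.2.1.

import math

def golden_section(f, a, b, c, eps=1e-8):
    """(a, b, c) brackets a minimum of f; w = (3 - sqrt(5))/2 from (10.24)."""
    w = (3.0 - math.sqrt(5.0)) / 2.0
    x0, x3 = a, c
    if abs(c - b) > abs(b - a):       # place the new point in the larger part
        x1, x2 = b, b + w * (c - b)
    else:
        x1, x2 = b - w * (b - a), b
    f1, f2 = f(x1), f(x2)
    while abs(x3 - x0) > eps * (abs(x1) + abs(x2)):
        if f2 < f1:                   # keep (x1, x3)
            x0, x1, x2 = x1, x2, (1 - w) * x2 + w * x3
            f1, f2 = f2, f(x2)
        else:                         # keep (x0, x2)
            x3, x2, x1 = x2, x1, (1 - w) * x1 + w * x0
            f2, f1 = f1, f(x1)
    return (x1, f1) if f1 < f2 else (x2, f2)

print(golden_section(lambda x: (x - 3.0) ** 2, 1.0, 3.0, 7.0))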
We now present Brent's algorithm2 for the second method:
– given: a, c, f (a), f (c), nmax, w = 0.381966, ε;
– calculate b = c, fb = fc, u = b, fu = fb;
– if fb < fa
then t = b, ft = fb, v = a, fv = fa;
else t = a, ft = fa, v = b, fv = fb;
– set i = 1, δu = 0, δx = b − a;
– calculate x = 0.5(b + a), fx = f (x);
– while (b − a) > ε(2|x| + 1) and i ≤ nmax do
– calculate xm = 0.5(b + a);
– if |δx| > 0.5δu or u − a < ε(2|x| + 1) or b − u < ε(2|x| + 1)
then
– if x > xm
then δx = w(a − x);
else δx = w(b − x), δu = max(|b − x|, |a − x|);
else r = (x − t)(fx − fv), q = (x − v)(fx − ft), p = (x − v)q − (x − t)r,
δx = −0.5p/(q − r), δu = |δx|;
– calculate u = x + δx, fu = f(u);
– if fu ≤ fx
then
– if u ≥ x
then a = x;
else b = x;
– calculate v = t, t = x, x = u, fv = ft , ft = fx, fx = fu;
else
– if u < x
then a = u;
else b = u;
– if fu ≤ ft or t = x
then v = t, t = u, fv = ft , ft = fu;
else
– if fu ≤ fv or v = x or x = t
then v = u, fv = fu;
– set i = i + 1.
Brent’s algorithm uses six points a, b, u, v, t, x, not necessarily distinct, which have the following
meanings: a and b are the points of the limits of the interval which contains the minimum; x is the
point at which the function f takes the smallest value until a given moment; t is the value previous
to x; v is the value previous to t, while u is the point at which the function f has been calculated
last time. The parabolic interpolation is made through the points (x, f (x)), (t, f (t)) and (v, f (v)).
Brent’s algorithm combines the assurance of the first method with the speed of the parabolic
interpolation. To do this, we must take certain precautions so that the parabolic interpolation can
be accepted, that is:
• the calculated minimum be in the interval (a, b);
2Richard Pierce Brent (1946–) published this algorithm (also known as Brent’s method) in 1973.
CONJUGATE DIRECTIONS 583
• the displacement with respect to the last value which approximates the minimum of f must be at most equal to half of the previous displacement, to be sure that we have a convergent process;
• the calculated point of minimum u must not be very near to another previously calculated value, that is, |u − p| > εp.
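In practice one rarely codes Brent's method by hand; for instance, SciPy packages it as minimize_scalar (a usage sketch, with an arbitrary test function and the bracketing triple of the earlier example):

from scipy.optimize import minimize_scalar

# Brent's method as packaged in SciPy; method='golden' is also available
# for comparison with the golden section algorithm.
res = minimize_scalar(lambda x: (x - 3.0) ** 2, bracket=(1.0, 3.0, 7.0),
                      method='brent')
print(res.x, res.fun)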
10.3 CONJUGATE DIRECTIONS
A method to determine the minimum of a function U : Rⁿ → R may be conceived as a repetition of the method of one-dimensional search along the directions i1, i2, . . . , in, not necessarily in this order. We thus determine a partial minimum of the function U, realizing the minimization of this function along the direction ij1; let U1 be this minimum. We then minimize along the direction ij2, resulting in the minimum U2, and so on until ijn, obtaining the minimum Un. In the above procedure, we have jk ∈ {1, 2, . . . , n} and ijk ≠ ijl for jk ≠ jl, k = 1, n, l = 1, n. Moreover, there exists the sequence of inequalities

U1 ≥ U2 ≥ · · · ≥ Un.   (10.26)
The algorithm is as follows:
– given: x(0), U(x);
– for j from 1 to n do determine x(j) realizing min α∈R [U(x(j−1) + αij)].
Definition 10.3 The method considered above is called the method of one-dimensional search.
Observation 10.5 The method is very simple, but has the disadvantage that either the minimum is not found or the running time of the algorithm is great enough to make it inefficient.
The problem is now to determine other, more efficient displacement directions.
Definition 10.4 The decreasing directions for which the method of one-dimensional search con-
verges are called conjugate directions.
Let us suppose that U(x) is twice differentiable with continuous derivatives. We may define the quadratic form

φ(x) = U(x(k)) + [x1 − x1(k) · · · xn − xn(k)] [∂U/∂x1 · · · ∂U/∂xn]ᵀ|x=x(k)
     + (1/2)[x1 − x1(k) · · · xn − xn(k)] [∂²U/∂xi∂xj]i,j=1,n |x=x(k) [x1 − x1(k) · · · xn − xn(k)]ᵀ.   (10.27)

We observe that the quadratic form φ coincides with the first three terms of the expansion into a Taylor series of the function U(x) about x(k). The previous expression may be written in the compact form

φ(x) = U(x(k)) + (x − x(k))ᵀ ∇U(x)|x=x(k) + (1/2)(x − x(k))ᵀ ∇²U(x)|x=x(k) (x − x(k))   (10.28)
too. Moreover,

∇φ(x) = ∇U(x)|x=x(k) + ∇²U(x)|x=x(k) (x − x(k)).   (10.29)
Let us denote by p(k) the conjugate directions. The point x(k) is the point which minimizes the function φ(x(k−1) + αp(k−1)); hence ∇U(x)|x=x(k) must be normal to the direction p(k−1), which is written in the form

[p(k−1)]ᵀ ∇U(x)|x=x(k) = 0.   (10.30)

Moreover, the gradient of the function U(x), calculated at x = x(k+1), must be normal to the direction p(k−1), otherwise p(k−1) would not be a conjugate direction of minimization. Hence,

[p(k−1)]ᵀ ∇U(x)|x=x(k+1) = 0   (10.31)

and equation (10.29) leads to

∇φ(x) = ∇U(x)|x=x(k+1) + ∇²U(x)|x=x(k+1) (x − x(k+1)).   (10.32)

Subtracting relations (10.32) and (10.29) one from the other, we get

∇U(x)|x=x(k+1) − ∇U(x)|x=x(k) + [∇²U(x)|x=x(k+1) − ∇²U(x)|x=x(k)](x − x(k+1)) + ∇²U(x)|x=x(k) (x(k) − x(k+1)) = 0.   (10.33)

Taking now into account that x(k+1) has been determined by the displacement along the conjugate direction p(k), it follows that

∇U(x)|x=x(k+1) = ∇U(x)|x=x(k) + ∇²U(x)|x=x(k) (x(k+1) − x(k)) = ∇U(x)|x=x(k) + αk∇²U(x)|x=x(k) p(k),   (10.34)

with αk ∈ R. Taking into account formulae (10.29) and (10.30), the product of the last relation and [p(k−1)]ᵀ leads to

[p(k−1)]ᵀ ∇²U(x)|x=x(k) p(k) = 0.   (10.35)
Definition 10.5 Two directions which satisfy condition (10.35) are called G-conjugate directions.

Observation 10.6
(i) If φ is a quadratic form, then its minimum is obtained after n displacements along n conjugate directions defined by relation (10.35). Therefore, it is required that at each minimization stage of φ along the direction p(k), the minimum be determined so that

[p(k)]ᵀ ∇U(x)|x=x(k) = 0.   (10.36)

(ii) If the function U is not a quadratic form, then its minimum is not obtained after n displacements, but we arrive sufficiently near to it.
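Condition (10.35) is easy to verify numerically. The sketch below (Python with numpy) builds directions that are G-conjugate with respect to a positive definite matrix G by Gram–Schmidt orthogonalization in the G inner product (an illustrative construction of ours, not one of the book's algorithms) and checks that p(i)ᵀ G p(j) vanishes for i ≠ j.

import numpy as np

rng = np.random.default_rng(0)
n = 4
M = rng.standard_normal((n, n))
G = M @ M.T + n * np.eye(n)            # symmetric positive definite

P = []
for v in np.eye(n):                    # start from the coordinate directions
    for p in P:
        v = v - (p @ G @ v) / (p @ G @ p) * p   # remove G-components
    P.append(v)
P = np.array(P)
print(np.round(P @ G @ P.T, 10))       # diagonal => pairwise G-conjugate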
10.4 POWELL’S ALGORITHM
The Powell algorithm3 gives a procedure to determine n conjugate directions without using the
matrix ∇2
U(x) and is as follows:
– given: x(0), U(x), ε, n, iter;
– for l from 1 to iter do
– for j from 1 to n do
– set p(j) = ij;
– for k from 1 to n − 1 do
– for i from 1 to n do
– determine x(i) so that min α∈R [U(x(i−1) + αp(i))] is attained;
– for i from 1 to n − 1 do p(i) = p(i+1);
– set p(n) = x(n) − x(0);
– determine x(0) so that min α∈R [U(x(n) + αp(n))] is attained;
– if |U − U0| < ε(1 + |U|)
then stop (the minimum has been determined).
Powell showed that, for a quadratic form φ, k iterations lead to a set of directions p(i), the last k of which are G-conjugate if the minimizations along the directions p(i) have been exactly made.
In the frame of the algorithm, an iteration means n + 1 minimizations made along the directions p(1), p(2), . . . , p(n) and the new direction x(n) − x(0).
Powell’s algorithm has the tendency to lead to linearly dependent directions. To avoid this
phenomenon, we have two possibilities, that is:
• either we use new initial positions for the directions p(j) = ij after n + 1 iterations;
• or we renounce to the direction p(j)
which has produced the greatest decrease of the function
U(x).
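Powell's method is available ready-made, for example in SciPy; a usage sketch on the Rosenbrock test function (our choice of example, not from the book):

import numpy as np
from scipy.optimize import minimize

# Powell's derivative-free method as packaged in SciPy.
rosen = lambda x: (1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2
res = minimize(rosen, x0=np.array([-1.2, 1.0]), method='Powell',
               options={'xtol': 1e-8})
print(res.x)    # approaches (1, 1) without any gradient evaluations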
10.5 METHODS OF GRADIENT TYPE
The methods of gradient type are characterized by the use of the gradient of the function to be
optimized, ∇U(x).
10.5.1 The Gradient Method
This method arises from the observation that the (n − 1)-dimensional hypersurfaces of equations

U(x) = Ci = const,   i = 1, 2, . . . ,   (10.37)

are disposed so that the constants Ci take greater and greater values when we go along the positive direction of the gradient.
Definition 10.6 The hypersurfaces defined by relation (10.37) bear the name of level surfaces of
the function U.
3Michael James David Powell (1936–) proposed this method in 1964.
The gradient method supposes the construction of the sequence of iterations

x(0) arbitrary,   x(k+1) = x(k) − αk∇U(x(k)),   (10.38)

where

U(x(k)) > U(x(k+1)).   (10.39)

Let us notice that the direction p(k) = −∇U(x(k)) is a direction of decrease of the value of the function U(x) at the point x(k) (as a matter of fact, it is the direction of maximum decrease for the function U(x) at the point x(k)).
The real value αk is determined by using one of the methods previously emphasized. Moreover, if the value αk is exactly determined, then between the gradients at the points x(k) and x(k+1) there exist the relations

∇U(x(k)) ⊥ ∇U(x(k+1))  ⇒  [∇U(x(k))]ᵀ ∇U(x(k+1)) = 0.   (10.40)
Definition 10.7 If the value of the scalar αk is exactly determined at each step k, then we say that
the gradient method uses an optimal step or a Cauchy step.
Any algorithm which uses the gradient of the objective function U(x) has the following structure:
– given: x(0), U(x), ∇U(x), ε, iter;
– set x = x(0), Uk = U(x(0)), ∇U(x(k)) = ∇U(x(0)), p = −∇U(x(k));
– for i from 1 to iter do
– determine x so that min α∈R [U(x(k) + αp)] is attained;
– set Uk+1 = U(x), ∇U(x(k+1)) = ∇U(x);
– if Uk+1 ≥ Uk
then the algorithm failed; stop;
else perform the test of convergence; update the decreasing direction p;
– set Uk = Uk+1.
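A minimal Python sketch of this structure, with p = −∇U and the one-dimensional minimization delegated to Brent's method; the test function is our own example.

import numpy as np
from scipy.optimize import minimize_scalar

def gradient_descent(U, gradU, x0, eps=1e-8, max_iter=500):
    """Gradient method: descend along p = -grad U, with the step alpha_k
    found by a one-dimensional minimization (Brent)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        p = -gradU(x)                                  # steepest descent direction
        if np.linalg.norm(p) <= eps:                   # criterion (10.46)
            break
        alpha = minimize_scalar(lambda a: U(x + a * p)).x
        x_new = x + alpha * p
        if np.linalg.norm(x_new - x) <= eps * (1 + np.linalg.norm(x_new)):
            return x_new                               # criterion (10.44)
        x = x_new
    return x

U = lambda x: x[0] ** 2 + 10 * x[1] ** 2
gradU = lambda x: np.array([2 * x[0], 20 * x[1]])
print(gradient_descent(U, gradU, [3.0, 1.0]))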
Observation 10.7
(i) A one-dimensional minimization method may be chosen, for example, Brent's method.
(ii) The gradient method does not require an exact calculus for the one-dimensional minimization. Therefore, we must specify a certain sufficiency criterion to determine the one-dimensional minimum. An idea is that of using the directional derivative in the form

|[p(k)]ᵀ ∇U(x(k) + αkp(k))| ≤ η|[p(k)]ᵀ ∇U(x(k))|,   0 ≤ η ≤ 1.   (10.41)

Thus, for η = 0 it follows that [p(k)]ᵀ∇U(x(k+1)) = 0, hence the unidirectional minimization has been exactly made.
We may also impose a condition of sufficient decrease in the form

U(x(k+1)) − U(x(k)) ≤ µαk[∇U(x(k))]ᵀ p(k).   (10.42)

In general, we take

10⁻⁵ ≤ µ ≤ 10⁻¹,   µ < η < 1.   (10.43)
(iii) Concerning the convergence test, we may use many criteria. One of the criteria is defined by the relation

‖x(k+1) − x(k)‖ ≤ ε(1 + ‖x(k+1)‖).   (10.44)

A second criterion reads

|U(x(k+1)) − U(x(k))| ≤ ε(1 + |U(x(k+1))|).   (10.45)

Sometimes one uses a criterion of the form

‖∇U(x(k+1))‖ ≤ ε,   (10.46)

but its fulfillment does not necessarily mean that U has a minimum at that point (it can be a point of maximum or a mini–max one).
10.5.2 The Conjugate Gradient Method
Let us consider the quadratic form

φ(x) = U(x(k)) + [x − x(k)]ᵀ ∇U(x(k)) + (1/2)[x − x(k)]ᵀ ∇²U(x(k)) [x − x(k)]   (10.47)

and a point x(k+1) for which we can write

∇φ(x(k+1)) = ∇U(x(k)) + ∇²U(x(k)) [x(k+1) − x(k)] = ∇U(x(k)) + αk∇²U(x(k)) p(k),   (10.48)

where

x(k+1) = x(k) + αkp(k),   (10.49)

while the decreasing directions are given by

p(k+1) = −∇U(x(k+1)) + βkp(k).   (10.50)
Imposing the condition that the directions p(k) and p(k+1) be G-conjugate,

[p(k+1)]ᵀ ∇²U(x(k)) p(k) = 0,   (10.51)

transposing relation (10.50),

[p(k+1)]ᵀ = −[∇U(x(k+1))]ᵀ + βk[p(k)]ᵀ,   (10.52)

and multiplying it at the right by ∇²U(x(k))p(k), we get

βk = [∇U(x(k+1))]ᵀ ∇²U(x(k)) p(k) / ([p(k)]ᵀ ∇²U(x(k)) p(k)).   (10.53)

Multiplying relation (10.52) by ∇²U(x(k))p(k+1), it now follows that

[p(k+1)]ᵀ ∇²U(x(k)) p(k+1) = −[∇U(x(k+1))]ᵀ ∇²U(x(k)) p(k+1),   (10.54)

where we take into account relation (10.51).
On the other hand, formula (10.48) leads to

∇²U(x(k)) p(k) = [∇U(x(k+1)) − ∇U(x(k))]/αk,   (10.55)

relation which holds if ∇U(x(k+1)) and ∇U(x(k)) are normal to each other, hence

[∇U(x(k+1))]ᵀ ∇U(x(k)) = 0.   (10.56)

Relation (10.53) leads now to

βk = −[∇U(x(k+1))]ᵀ ∇²U(x(k)) p(k) / ([∇U(x(k))]ᵀ ∇²U(x(k)) p(k)) = [∇U(x(k+1))]ᵀ ∇U(x(k+1)) / ([∇U(x(k))]ᵀ ∇U(x(k))).   (10.57)

Multiplying relation (10.48) by [∇U(x(k+1))]ᵀ and by [∇U(x(k))]ᵀ and imposing condition (10.56) of perpendicularity of the vectors ∇U(x(k)) and ∇U(x(k+1)), we obtain

αk = −[∇U(x(k))]ᵀ ∇U(x(k)) / ([∇U(x(k))]ᵀ ∇²U(x(k)) p(k)) = [∇U(x(k+1))]ᵀ ∇U(x(k+1)) / ([∇U(x(k+1))]ᵀ ∇²U(x(k)) p(k)).   (10.58)

On the other hand, the value αk of equation (10.48) is the value obtained from the minimization min α∈R U[x(k) + αp(k)]. Indeed, it is sufficient to show that the vectors p(k) and ∇U(x(k+1)) are normal to each other,

[p(k)]ᵀ ∇U(x(k+1)) = 0.   (10.59)
But, from equation (10.48), equation (10.50), and equation (10.54) it follows that

[p(k)]ᵀ ∇U(x(k+1)) = βk−1 [p(k−1)]ᵀ ∇U(x(k)).   (10.60)
We thus deduce that if at the previous step the one-dimensional search has been exactly made, that is, αk−1 has been determined so that p(k−1) and ∇U(x(k)) be normal to each other, then we have relation (10.59) too.
Observation 10.8 We have thus obtained the G-conjugate directions p(k) for which it has not been necessary to know the Hessian matrix, but for which it is necessary that the weights αk be exactly calculated.
We use several variants to determine βk, that is:
• the Fletcher–Reeves method4, for which

βk = [∇U(x(k+1))]ᵀ ∇U(x(k+1)) / ([∇U(x(k))]ᵀ ∇U(x(k)));   (10.61)

• the Polak–Ribière method5, given by

βk = [∇U(x(k+1))]ᵀ y(k) / ([∇U(x(k))]ᵀ ∇U(x(k))),   y(k) = ∇U(x(k+1)) − ∇U(x(k));   (10.62)

4Roger Fletcher and C. M. Reeves published it in 1964.
5The method was presented by E. Polak and G. Ribière in 1969.
• the Hestenes–Stiefel method6, characterized by

βk = [∇U(x(k+1))]ᵀ y(k) / ([∇U(x(k))]ᵀ p(k)),   y(k) = ∇U(x(k+1)) − ∇U(x(k)).   (10.63)
The most robust of these three methods is the Polak–Ribi`ere method.
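A sketch of the conjugate gradient method with the Polak–Ribière choice (10.62), in Python; the restart when βk < 0 is a common safeguard, not part of the formulas above, and the test function is our own example.

import numpy as np
from scipy.optimize import minimize_scalar

def cg_polak_ribiere(U, gradU, x0, eps=1e-8, max_iter=200):
    """Nonlinear conjugate gradient with beta_k from (10.62); the line
    search along p uses Brent's method."""
    x = np.asarray(x0, dtype=float)
    g = gradU(x)
    p = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) <= eps:
            break
        alpha = minimize_scalar(lambda a: U(x + a * p)).x
        x = x + alpha * p
        g_new = gradU(x)
        beta = g_new @ (g_new - g) / (g @ g)       # Polak-Ribiere (10.62)
        p = -g_new + max(beta, 0.0) * p            # restart if beta < 0
        g = g_new
    return x

U = lambda x: (1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2
gradU = lambda x: np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0] ** 2),
                            200 * (x[1] - x[0] ** 2)])
print(cg_polak_ribiere(U, gradU, [-1.2, 1.0]))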
10.5.3 Solution of Systems of Linear Equations by Means of Methods of Gradient Type
Let the linear system be

Ax = b,   (10.64)

where A is a positive definite symmetric matrix,

Aᵀ = A,   xᵀAx > 0 for all x ≠ 0.   (10.65)

The solution of system (10.64) is equivalent to the minimization of the quadratic form

U(x) = ⟨x, Ax⟩ − 2⟨x, b⟩,   (10.66)

where by ⟨·, ·⟩ we denoted the dot product given by

⟨y, z⟩ = yᵀz.   (10.67)

The gradient of U(x) is expressed by

∇U(x) = −2(b − Ax)   (10.68)

for the symmetric matrix A, while the Hessian reads

∇²U(x) = 2A.   (10.69)

If we denote by x̄ the solution of system (10.64), then

∇U(x̄) = 0,   ∇²U(x̄) = 2A,   (10.70)

hence the function U has a minimum at x̄. Moreover, if p is a decreasing direction, then we also have

U(x + αp) = ⟨x + αp, A(x + αp)⟩ − 2⟨x + αp, b⟩ = U(x) + 2α⟨p, Ax − b⟩ + α²⟨p, Ap⟩.   (10.71)

On the other hand,

⟨p, Ap⟩ > 0,   (10.72)

because A is a positive definite matrix; hence, U(x + αp) has a minimum for α = ᾱ, obtained from

dU(x + αp)/dα = 0,   (10.73)
that is,

2⟨p, Ax − b⟩ + 2α⟨p, Ap⟩ = 0,   (10.74)

from which

ᾱ = ⟨p, b − Ax⟩ / ⟨p, Ap⟩.   (10.75)

For α = ᾱ it follows that the minimum of the function U(x + αp) along the direction p is

U(x + ᾱp) = U(x) + ᾱ[2⟨p, Ax − b⟩ + ᾱ⟨p, Ap⟩] = U(x) − ⟨p, b − Ax⟩² / ⟨p, Ap⟩.   (10.76)
Observation 10.9
(i) Using the method of the gradient, for which the decreasing direction is

p = −∇U(x),   (10.77)

we obtain the following algorithm:
– given: x(0), A, b, iter, ε;
– set i = 1, norm = 1, x = x(0);
– while norm > ε and i ≤ iter do
– calculate p = b − Ax, norm = √⟨p, p⟩, α = norm²/⟨p, Ap⟩, x = x + αp, i = i + 1.
(ii) If we apply the Fletcher–Reeves method, then we obtain the algorithm:
– given: x(0), A, b, iter, ε, δ;
– set r(0) = b − Ax(0), p(0) = r(0);
– for k from 0 to iter − 1 do
– if ⟨p(k), p(k)⟩ < δ
then stop;
– calculate αk = ⟨r(k), r(k)⟩/⟨p(k), Ap(k)⟩, x(k+1) = x(k) + αkp(k), r(k+1) = r(k) − αkAp(k);
– if ⟨r(k+1), r(k+1)⟩ < ε
then stop;
– calculate βk = ⟨r(k+1), r(k+1)⟩/⟨r(k), r(k)⟩, p(k+1) = r(k+1) + βkp(k).
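The algorithm of Observation 10.9(ii) in Python, a minimal sketch with the stopping tests of the pseudocode above:

import numpy as np

def conjugate_gradient(A, b, x0=None, eps=1e-10, max_iter=None):
    """Conjugate gradient for A x = b with A symmetric positive definite."""
    n = len(b)
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float)
    r = b - A @ x                      # residual; r = -(1/2) grad U by (10.68)
    p = r.copy()
    for _ in range(max_iter or n):
        rr = r @ r
        if rr < eps:
            break
        Ap = A @ p
        alpha = rr / (p @ Ap)          # exact step along p, cf. (10.75)
        x += alpha * p
        r -= alpha * Ap
        beta = (r @ r) / rr            # Fletcher-Reeves ratio
        p = r + beta * p
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b), np.linalg.solve(A, b))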
10.6 METHODS OF NEWTON TYPE

The methods of Newton type use the Hessian matrix ∇²U(x).

10.6.1 Newton's Method
The Newton method approximates the objective function U(x), at an arbitrary iteration k, by a quadratic form

φk(x) = U(x(k)) + [x − x(k)]ᵀ ∇U(x(k)) + (1/2)[x − x(k)]ᵀ ∇²U(x(k)) [x − x(k)].   (10.78)
If the Hessian matrix ∇²U(x(k)) is positive definite, then the quadratic form φk(x) has a minimum x = x̄, hence

φk(x) − φk(x̄) > 0   (10.79)

in a neighborhood of x̄. Moreover, the point of minimum x̄ is a stationary point, hence the gradient of φk(x) vanishes at this point,

∇φk(x̄) = 0.   (10.80)

We may write the approximate relation

φk(x) − φk(x̄) ≈ (1/2)[x − x̄]ᵀ ∇²U(x(k)) [x − x̄].   (10.81)
Equation (10.80) may be solved using Newton's method, which leads to the definition of the iterative sequence

x(0) arbitrary,   x(k+1) = x(k) − [∇²U(x(k))]⁻¹ ∇U(x(k)).   (10.82)

Definition 10.8 The decreasing direction p(k), defined by

p(k) = −[∇²U(x(k))]⁻¹ ∇U(x(k)),   [∇U(x(k))]ᵀ p(k) < 0,   (10.83)

bears the name of Newton direction.
Observation 10.10
(i) The statement "x(0) arbitrary" in relation (10.82) must be understood as x(0) being an arbitrary point in a sufficiently small neighborhood of the exact solution, as is the case in any Newton method.
(ii) If the Hessian matrix ∇²U(x(k)) is not positive definite, then it may happen that ‖∇U(x(k+1))‖ be greater than ‖∇U(x(k))‖, that is, the direction p(k) is no longer a decreasing direction.
(iii) If U(x) has flat zones, in other words if it can be approximated by a hyperplane, then in these zones the Hessian matrix ∇²U(x) vanishes and the method cannot be applied. For these zones it would be necessary to determine, instead of the Hessian ∇²U(x), another positive definite matrix so as to continue the procedure.
Various algorithms have been conceived to eliminate such inconveniences; one such algorithm is the trust region algorithm, which is presented as follows:
– given: x(0), U(x), ∇U(x), ∇²U(x), µ, η, γ1, γ2, δ0, λ0, ε, εp, iter, np;
– set x = x(0), δ = δ0, λ = λ0, Uk = U(x(0)), ∇U(x) = ∇U(x(0)), ∇²U(x(k)) = ∇²U(x(0)), φk(x) = Uk;
– for k from 1 to iter do
– set d = 1, ip = 1;
– while |d| > εp|λ| + 10⁻⁵ and ip < np do
– calculate the Cholesky factorization ∇²U(x(k)) + λI = RᵀR;
– solve the system RᵀRp(k) = −∇U(x(k));
– solve the system Rᵀq = −p(k);
– calculate d = (‖p(k)‖/‖q‖)²((‖p(k)‖/δ) − 1), λ = λ + d, ip = ip + 1;
– calculate x(k+1) = x(k) + p(k), Uk+1 = U(x(k+1)), φk+1 = Uk + [p(k)]ᵀ∇U(x(k)) + (1/2)[p(k)]ᵀ∇²U(x(k))p(k), d = Uk+1 − Uk;
– if |d| < ε|Uk+1|
then stop (the minimum has been found);
– calculate rk = d/(φk+1 − φk);
– if rk ≤ µ
then x(k+1) = x(k) (the step is rejected) and δ = γ1δ;
else
– if rk > η
then δ = γ2δ.
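The basic iteration (10.82), without the trust-region safeguards, is only a few lines; the following Python sketch assumes the Hessian remains positive definite near the solution, and the test function U = x⁴ + x²y² + (y − 1)² is our own example.

import numpy as np

def newton_minimize(gradU, hessU, x0, eps=1e-10, max_iter=50):
    """Newton iteration (10.82): solve Hessian * p = -grad at each step."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = gradU(x)
        if np.linalg.norm(g) <= eps:
            break
        p = np.linalg.solve(hessU(x), -g)   # Newton direction (10.83)
        x = x + p
    return x

U_grad = lambda x: np.array([4 * x[0] ** 3 + 2 * x[0] * x[1] ** 2,
                             2 * x[1] * x[0] ** 2 + 2 * (x[1] - 1)])
U_hess = lambda x: np.array([[12 * x[0] ** 2 + 2 * x[1] ** 2, 4 * x[0] * x[1]],
                             [4 * x[0] * x[1], 2 * x[0] ** 2 + 2]])
print(newton_minimize(U_grad, U_hess, [0.5, 0.5]))   # tends to (0, 1)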
Observation 10.11
(i) The usual values for the parameters µ, η, γ1, and γ2 are

µ = 0.25,   η = 0.75,   γ1 = 0.5,   γ2 = 2.   (10.84)

(ii) The algorithm establishes a trust region of the model, that is, a region in which U(x) may be well approximated by the quadratic form φk(x). This zone is a hypersphere of center x(k) and radius δk; we seek the point of minimum of φk(x) in this hypersphere. This minimum is not taken into consideration if it does not belong to the interior of the hypersphere.
(iii) The radius of the hypersphere which defines the trust zone at the step k + 1 is calculated as a function of the previous value and of the ratio rk between the effective reduction of U and the predicted one,

rk = [U(x(k+1)) − U(x(k))] / [φ(x(k+1)) − φ(x(k))].   (10.85)

If rk is small, then δk+1 < δk; otherwise we consider δk+1 > δk.
(iv) The searching Newton direction p(k) is determined by the relation

[∇²U(x(k)) + λI] p(k) = −∇U(x(k)),   (10.86)

where λ is a parameter which assures that the matrix ∇²U(x(k)) + λI is positive definite, so as to avoid the cases in Observation 10.10.
(v) The Cholesky decomposition is not imperatively necessary, but it increases the calculation speed for solving system (10.86) in the case of the positive definite matrix ∇²U(x(k)) + λI.
10.6.2 Quasi-Newton Method
The quasi-Newton method approximates the Hessian matrix ∇²U(x) by a positive definite symmetric matrix B.
The equation which determines the decreasing direction p(k) is now written in the form

Bkp(k) = −∇U(x),   (10.87)

while x(k+1) is determined by the relation

x(k+1) = x(k) + αkp(k),   (10.88)
where αk results from the condition of minimum of the function of one variable U[x(k) + αkp(k)].
It remains to solve the problem of updating the matrix Bk into the matrix Bk+1. There exist several methods, the best known being:
• the Davidon–Fletcher–Powell7 method, for which

Bk+1 = Bk + (z(k)[y(k)]ᵀ + y(k)[z(k)]ᵀ) / ([y(k)]ᵀ[x(k+1) − x(k)])
     − ([z(k)]ᵀ[x(k+1) − x(k)]) / ({[y(k)]ᵀ[x(k+1) − x(k)]}²) · y(k)[y(k)]ᵀ,
z(k) = y(k) + αk∇U(x(k)),   y(k) = ∇U(x(k+1)) − ∇U(x(k));   (10.89)

• the Broyden–Fletcher–Goldfarb–Shanno method, in which

Bk+1 = Bk + y(k)[y(k)]ᵀ / ([y(k)]ᵀ[x(k+1) − x(k)])
     − Bk[x(k+1) − x(k)][x(k+1) − x(k)]ᵀBk / ([x(k+1) − x(k)]ᵀBk[x(k+1) − x(k)]),
y(k) = ∇U(x(k+1)) − ∇U(x(k)).   (10.90)
We may also write

x(k+1) = x(k) − αkBk⁻¹∇U(x(k)),   (10.91)

while formulae (10.89) and (10.90) also give the inverse Bk+1⁻¹ as a function of Bk⁻¹. Thus:
• the Davidon–Fletcher–Powell method gives

Bk+1⁻¹ = Bk⁻¹ + [x(k+1) − x(k)][x(k+1) − x(k)]ᵀ / ([y(k)]ᵀ[x(k+1) − x(k)])
       − Bk⁻¹y(k)[y(k)]ᵀBk⁻¹ / ([y(k)]ᵀBk⁻¹y(k));   (10.92)

• the Broyden–Fletcher–Goldfarb–Shanno8 method leads to

Bk+1⁻¹ = Bk⁻¹ − (Bk⁻¹y(k)[x(k+1) − x(k)]ᵀ + [x(k+1) − x(k)][y(k)]ᵀBk⁻¹) / ([y(k)]ᵀ[x(k+1) − x(k)])
       + [x(k+1) − x(k)][y(k)]ᵀBk⁻¹y(k)[x(k+1) − x(k)]ᵀ / ({[y(k)]ᵀ[x(k+1) − x(k)]}²)
       + [x(k+1) − x(k)][x(k+1) − x(k)]ᵀ / ([y(k)]ᵀ[x(k+1) − x(k)]).   (10.93)
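A sketch of the quasi-Newton method with the BFGS inverse update, in Python; the product form used for H = B⁻¹ is algebraically equivalent to (10.93), and the curvature safeguard sᵀy > 0 is a standard addition of ours, not part of the formula.

import numpy as np
from scipy.optimize import minimize_scalar

def bfgs(U, gradU, x0, eps=1e-8, max_iter=100):
    """Quasi-Newton method with the BFGS inverse update, cf. (10.87)-(10.93),
    written with s = x_{k+1} - x_k and y = gradient difference."""
    x = np.asarray(x0, dtype=float)
    H = np.eye(len(x))                         # approximation of B^{-1}
    g = gradU(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) <= eps:
            break
        p = -H @ g                             # direction from (10.87)
        alpha = minimize_scalar(lambda a: U(x + a * p)).x
        s = alpha * p
        x = x + s
        g_new = gradU(x)
        y = g_new - g                          # y^(k) of (10.90)
        sy = s @ y
        if sy > 1e-12:                         # curvature condition (safeguard)
            rho = 1.0 / sy
            V = np.eye(len(x)) - rho * np.outer(s, y)
            H = V @ H @ V.T + rho * np.outer(s, s)   # equivalent to (10.93)
        g = g_new
    return x

U = lambda x: (1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2
gradU = lambda x: np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0] ** 2),
                            200 * (x[1] - x[0] ** 2)])
print(bfgs(U, gradU, [-1.2, 1.0]))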
10.7 LINEAR PROGRAMMING: THE SIMPLEX ALGORITHM
10.7.1 Introduction
Let a linear system of m equations with n unknowns be
Ax = b. (10.94)
Definition 10.9 Two linear systems are called equivalent if any solution of the first system is a solution of the second system too, and reciprocally.
Definition 10.10 We call an elementary transformation applied to a linear system any one of the following:
• the multiplication of an equation by a nonzero number;
• the change of the order of two equations;
7
William C. Davidon, Roger Fletcher and Michael James David Powell published the method in 1958 and 1964.
8Charles George Broyden (1933–2011), Roger Fletcher, Donald Goldfarb and David Shanno published the method
in 1970.
• the multiplication of an equation by a nonzero number, the addition of the result to another
equation, and the replacing of the latter equation by the equation thus obtained.
Observation 10.12
(i) Each of the above operations determines an operation on the enlarged matrix of the system. These transformations are equivalent to the multiplication of the extended matrix at the left by certain matrices.
Thus, considering the matrix

M1 = [ 1  · · ·  0  · · ·  0 ]
     [ · · ·    · · ·   · · · ]
     [ 0  · · ·  α  · · ·  0 ]   (row i)
     [ · · ·    · · ·   · · · ]
     [ 0  · · ·  0  · · ·  1 ],   M1 ∈ Mm(R),   (10.95)

which differs from the unit matrix only by the element α ≠ 0 situated at the position (i, i), the multiplication at the left of the extended matrix by M1 has the effect of multiplying the row i of the extended matrix by α.
If we multiply the extended matrix at the left by the matrix M2, given by

M2 = [ 1  · · ·  0  · · ·  0  · · ·  0 ]
     [ · · ·    · · ·   · · ·    · · · ]
     [ 0  · · ·  0  · · ·  1  · · ·  0 ]   (row i)
     [ · · ·    · · ·   · · ·    · · · ]
     [ 0  · · ·  1  · · ·  0  · · ·  0 ]   (row j)
     [ · · ·    · · ·   · · ·    · · · ]
     [ 0  · · ·  0  · · ·  0  · · ·  1 ],   M2 ∈ Mm(R),   (10.96)

which differs from the unit matrix of order m by the elements at the positions (i, i) and (j, j), replaced by 0, and by the elements at the positions (i, j) and (j, i), replaced by 1, then the product M2A, where A is the extended matrix, has the effect of interchanging the rows i and j of the extended matrix A.
Let us now consider the matrix

M3 = [ 1  · · ·  0  · · ·  0  · · ·  0 ]
     [ · · ·    · · ·   · · ·    · · · ]
     [ 0  · · ·  1  · · ·  α  · · ·  0 ]   (row i, with α at column j)
     [ · · ·    · · ·   · · ·    · · · ]
     [ 0  · · ·  0  · · ·  1  · · ·  0 ]   (row j)
     [ · · ·    · · ·   · · ·    · · · ]
     [ 0  · · ·  0  · · ·  0  · · ·  1 ],   M3 ∈ Mm(R),   (10.97)

which differs from the unit matrix by the element at the position (i, j), which has the value α ≠ 0; then the product M3A has the effect of multiplying the row j by α and adding it to the row i.
(ii) The elementary operations lead, obviously, to equivalent systems.
Definition 10.11 A system is called explicit if the matrix of the system contains all the columns
of the unit matrix of order m (the number of the equations of the system).
Observation 10.13
(i) The columns of the unit matrix may be found at any position in the matrix A of the system.
(ii) An explicit linear system has the number of unknowns at least equal to the number of equations, that is, m ≤ n.
Definition 10.12 The variables, the coefficients of which form the columns of the unit matrix
are called principal or basic variables. The other variables of the system are called secondary or
nonbasic variables.
Observation 10.14 A compatible system may be made explicit so as to have exactly m columns of the unit matrix. To do this, it is sufficient to apply the elementary transformations presented above in a suitable order.
Definition 10.13
(i) A solution of system (10.94) in which the n − m secondary variables vanish is called basic
solution.
(ii) A basic solution is called nondegenerate if it has exactly m nonzero components (the
principal variables have nonzero values) and degenerate in the opposite case.
10.7.2 Formulation of the Problem of Linear Programming
Definition 10.14
(i) A problem of linear programming is a problem which requires the minimization (maximiza-
tion) of the function
f (x1, x2, . . . , xn) = minimum (maximum) (10.98)
if
fi(x1, x2, . . . , xn) ≤ bi, i = 1, p,
fj (x1, x2, . . . , xn) ≥ bj , j = p + 1, q,
fk(x1, x2, . . . , xn) = bk, k = q + 1, r (10.99)
and
xl ≥ 0, l = 1, m1,
xh ≤ 0, h = m1 + 1, m2,
xt arbitrary, t = m2 + 1, m, (10.100)
the functions f , fi, fj , and fk being linear.
(ii) Conditions (10.99) are called the restrictions of the problem, while the vector x = [x1 . . .
xn]T, which verifies the system of restrictions, is called possible solution of the linear
programming problem.
(iii) The possible solution x which verifies conditions (10.100) too is called admissible solution
of the linear programming problem.
(iv) The admissible solution which realizes the extremum of function (10.98) is called optimal
solution or optimal program.
The linear programming problem may be written in matrix form,

Ax S b,   x ≥ 0,   f = Cᵀx = minimum (maximum),   (10.101)

in which

A = [aij], i = 1, m, j = 1, n,   b = [b1 · · · bm]ᵀ,   C = [c1 · · · cn]ᵀ,   (10.102)

and where S takes the place of one of the signs ≤, =, or ≥.
Let us observe that the second relation (10.101) puts the condition that all variables be non-negative. This can always be obtained, as will be seen later.
Definition 10.15
(i) A problem of linear programming is of standard form if all the restrictions are equations
and if we impose conditions of non-negativeness to all variables.
(ii) A problem of linear programming is of canonical form if all the restrictions are inequalities
of the same sense and if conditions of non-negativeness are imposed to all variables.
Observation 10.15
(i) A program of standard form reads

Ax = b,   x ≥ 0,   f = Cᵀx.   (10.103)

(ii) A program of canonical form is written

Ax ≥ b,   x ≥ 0,   Cᵀx = minimum   (10.104)

or

Ax ≤ b,   x ≥ 0,   Cᵀx = maximum.   (10.105)

(iii) A program may be brought to a standard or to a canonical form by using the following elementary transformations:
• an inequality of a certain sense may be transformed into one of the opposite sense by multiplication by −1;
• a negative variable may be transformed into a positive one by its multiplication by −1;
• a variable, let us say xk ∈ R, is written in the form

xk = xk(1) − xk(2),   (10.106)

where xk(1) ≥ 0, xk(2) ≥ 0;
• an equality is expressed by means of two inequalities; so

ai1x1 + ai2x2 + · · · + ainxn = bi   (10.107)

is written in the form

ai1x1 + ai2x2 + · · · + ainxn ≤ bi,   ai1x1 + ai2x2 + · · · + ainxn ≥ bi;   (10.108)

• the inequalities are transformed into equalities by means of the compensation (slack) variables; thus

ai1x1 + ai2x2 + · · · + ainxn ≤ bi   (10.109)

becomes

ai1x1 + ai2x2 + · · · + ainxn + y = bi,   y ≥ 0,   (10.110)

while

ai1x1 + ai2x2 + · · · + ainxn ≥ bi   (10.111)

is transformed into

ai1x1 + ai2x2 + · · · + ainxn − y = bi,   y ≥ 0.   (10.112)
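Modern libraries solve such programs directly from the canonical form (10.105); for instance, with SciPy's linprog (the data below are an arbitrary illustration of ours, not from the book):

import numpy as np
from scipy.optimize import linprog

# A small canonical-form program (10.105): maximize c^T x subject to
# A x <= b, x >= 0. linprog minimizes, so we pass -c.
c = np.array([3.0, 5.0])
A_ub = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 2.0]])
b_ub = np.array([4.0, 12.0, 18.0])
res = linprog(-c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)],
              method='highs')
print(res.x, -res.fun)    # optimal vertex (2, 6) with maximum 36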
10.7.3 Geometrical Interpretation
In the space R^n, an equality of the restrictions system defines a hyperplane, while an inequality
defines a half-space. The restrictions thus define a convex polyhedron in R^n, and if the optimum is
unique, then it is situated at one of the vertices of this polyhedron.
The objective function, written in the form
f (x) = c1x1 + c2x2 + · · · + cnxn = λ, λ ∈ R, (10.113)
defines a pencil of hyperplanes, while for λ = 0 we obtain a hyperplane which passes through the
origin.
Definition 10.16
(i) The hyperplanes of the pencil (10.113) are called level hyperplanes.
(ii) In R² the hyperplanes become straight lines and are called level straight lines.
Observation 10.16 The objective function has the same value at points situated on the same level
hyperplane.
10.7.4 The Primal Simplex Algorithm
Definition 10.17 A linear program is said to be in primal admissible form if it is given by the
relations
maximum (minimum) f(x) = f0 + Σ_{k∈K} a0k xk,   (10.114)

xi + Σ_{k∈K} aik xk = bi,  i ∈ I,   (10.115)

xk ≥ 0,  xi ≥ 0,  k ∈ K,  i ∈ I,   (10.116)
where K is the set of indices of secondary variables, while I marks the set of indices of principal
variables.
Observation 10.17 Obviously, any linear program may be brought to the primal admissible form
by means of the elementary transformations presented above.
Let a program be in primal admissible form, with an admissible basic solution corresponding
to this form. The Simplex algorithm9 performs a partial examination of the list of basic solutions
of the system of restrictions, its aim being to find an optimal basic solution or to prove that no
such solution exists.
Let us assume that after r steps the program takes the primal admissible form

f = f0^(r) + Σ_{k∈K(r)} a0k xk,   (10.117)

xi + Σ_{k∈K(r)} aik xk = bi,  i ∈ I(r),   (10.118)

xk ≥ 0,  xi ≥ 0,  k ∈ K(r),  i ∈ I(r),   (10.119)

where the upper index r marks the iteration step.
There are four operations to perform:
• application of the optimality criterion. If a0k ≥ 0 for all k ∈ K(r), then the linear program has
the optimal basic solution obtained at step r and the algorithm stops; in the opposite case, we
pass to the following stage;
• application of the entrance criterion. At this stage, we determine the secondary unknown xh
which becomes a principal variable, given by

a0h = min_{k∈K(r)} a0k < 0.   (10.120)

If all aih ≤ 0, i ∈ I(r), then the program does not have an optimal solution and the algorithm
stops; in the opposite case, we pass to the following stage;
• application of the exit criterion. We determine the principal variable xj which becomes
secondary by the relation

bj / ajh = min_{i∈I(r), aih>0} (bi / aih);   (10.121)

• we make a pivoting with the pivot ajh so as to obtain the column of the unit matrix in column h.
Usually, we organize the computation in tables.
        x1  ···  xi  ···  xm   x(m+1)   ···   xk    ···   xn
         0  ···   0  ···   0   a0,m+1   ···   a0k   ···   a0n | −f0
x1       1  ···   0  ···   0   a1,m+1   ···   a1k   ···   a1n |  b1
···     ···      ···       ···   ···           ···          ··· |  ···
xi       0  ···   1  ···   0   ai,m+1   ···   aik   ···   ain |  bi
···     ···      ···       ···   ···           ···          ··· |  ···
xm       0  ···   0  ···   1   am,m+1   ···   amk   ···   amn |  bm
9The algorithm was proposed by George Bernard Dantzig (1914–2005) in 1947.
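The four stages above translate directly into code. The following Python sketch — an illustration written for this section, not code from the book — performs primal simplex iterations on such a tableau; the layout (reduced costs in row 0, right-hand sides in the last column) and the function name are assumptions of the sketch, and the program is assumed to be already in primal admissible form with b ≥ 0.

```python
import numpy as np

def primal_simplex(T, basis):
    """Primal simplex iterations on a dense tableau (a minimal sketch).

    T is the (m+1) x (n+1) tableau: row 0 holds the reduced costs a_0k and
    -f_0 in the last column; rows 1..m hold the restrictions with b_i in the
    last column.  basis[i] is the index of the principal variable of row i+1.
    """
    while True:
        costs = T[0, :-1]
        # optimality criterion: stop when all reduced costs are non-negative
        if np.all(costs >= 0):
            return T, basis
        # entrance criterion: the most negative reduced cost picks column h
        h = int(np.argmin(costs))
        col = T[1:, h]
        if np.all(col <= 0):
            raise ValueError("no optimal solution (unbounded program)")
        # exit criterion: ratio test over the rows with a_ih > 0 picks row j
        ratios = np.where(col > 0, T[1:, -1] / np.where(col > 0, col, 1.0), np.inf)
        j = int(np.argmin(ratios)) + 1
        # pivoting with the pivot a_jh to create the unit column in column h
        T[j] /= T[j, h]
        for i in range(T.shape[0]):
            if i != j:
                T[i] -= T[i, h] * T[j]
        basis[j - 1] = h
```

Ties in the ratio test are broken here by taking the first minimal row, so on degenerate programs the sequence of bases may differ from a hand computation.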
10.7.5 The Dual Simplex Algorithm
Definition 10.18 Let the problem of linear programming in canonical form be

a11 x1 + a12 x2 + · · · + a1n xn ≥ b1,  a21 x1 + a22 x2 + · · · + a2n xn ≥ b2,  . . . ,
am1 x1 + am2 x2 + · · · + amn xn ≥ bm,   (10.122)

x1 ≥ 0, x2 ≥ 0, . . . , xn ≥ 0,   (10.123)

minimum f = c1 x1 + c2 x2 + · · · + cn xn.   (10.124)

By definition, the dual of this problem is

a11 y1 + a21 y2 + · · · + am1 ym ≤ c1,  a12 y1 + a22 y2 + · · · + am2 ym ≤ c2,  . . . ,
a1n y1 + a2n y2 + · · · + amn ym ≤ cn,   (10.125)

y1 ≥ 0, y2 ≥ 0, . . . , ym ≥ 0,   (10.126)

maximum g = b1 y1 + · · · + bm ym.   (10.127)
Observation 10.18 The dual problem is obtained from the primal problem as follows:
• to each restriction of the system of restrictions (10.122) we associate a dual variable yi;
• the variable yi has no sign restriction if the corresponding restriction of (10.122) is an
equality, and has a sign restriction in the case of an inequality: to ≥ corresponds yi ≥ 0, to ≤
corresponds yi ≤ 0, and to = corresponds yi arbitrary;
• to each variable xi we associate a restriction in which the coefficients of the variables y1, . . . ,
ym are the coefficients of the variable xi in system (10.122), while the free terms are ci;
• the dual restriction associated with xi is ≤ if xi ≥ 0, is ≥ if xi ≤ 0, and is = if xi is arbitrary;
• the minimum of the objective function of the primal problem is transformed into the maximum
of the objective function of the dual problem;
• the objective function of the dual problem is built from the free terms of the initial
restrictions (10.122).
Definition 10.19 A linear program is in explicit dual admissible form if

xi + Σ_{k∈K} aik xk = bi,  i ∈ I,   (10.128)

xi ≥ 0,  xk ≥ 0,  i ∈ I,  k ∈ K,   (10.129)

f0 + Σ_{k∈K} a0k xk = minimum.   (10.130)
Let us suppose that at step r the linear program is expressed by the relations

xi + Σ_{k∈K(r)} aik xk = bi,  i ∈ I(r),   (10.131)

xi ≥ 0,  xk ≥ 0,  i ∈ I(r),  k ∈ K(r),   (10.132)

f0 + Σ_{k∈K(r)} a0k xk = minimum.   (10.133)
For the step r + 1, we have to pass through the stages:
• application of the optimality criterion. At this stage we establish if bi ≥ 0 for all i ∈ I(r). If
the answer is yes, then the solution is optimal; in the opposite case, we pass to the following
stage;
• application of the exit criterion. We determine the unknown xj which becomes secondary by
the condition

bj = min_{i∈I(r)} bi,   (10.134)

and we verify whether all the elements ajk ≥ 0, k ∈ K(r). If yes, then the problem does not
have an admissible solution; in the opposite case, we pass to the following step;
• application of the entrance criterion. We determine the unknown xh which becomes a principal
variable, from the condition

a0h / ajh = min_{k∈K(r), ajk<0} (a0k / ajk);   (10.135)

we then effect the pivoting with the element ajh.
10.8 CONVEX PROGRAMMING
Definition 10.20 Let X be a convex set and f a function f : X → R. We say that the function
f is convex (or convex in Jensen's sense) if for any α ∈ (0, 1) and any x1, x2 of X we have

f(αx1 + (1 − α)x2) ≤ αf(x1) + (1 − α)f(x2).   (10.136)
Observation 10.19
(i) If f is differentiable, then, instead of relation (10.136), we may consider the inequality

f(x) ≥ f(x*) + ⟨∇f(x*), x − x*⟩,   (10.137)

where ⟨·, ·⟩ marks the scalar product.
(ii) If f : I ⊂ R → R, I being an interval, let us consider the expansion into a Taylor series

f(x) = f(x*) + f′(x*)(x − x*) + (1/2) f″(ξ)(x − x*)²,   (10.138)

where ξ is a point between x and x*. The convexity condition of f leads to the inequality
f″(x) ≥ 0 for any x ∈ I.
(iii) In the case f : D ⊂ R^n → R, convexity requires that the matrix of the second-order
derivatives (the Hessian matrix) be positive semidefinite, that is,
[x1 x2 · · · xn] [[∂²f/∂x1²,   ∂²f/∂x1∂x2, · · ·, ∂²f/∂x1∂xn],
                 [∂²f/∂x2∂x1, ∂²f/∂x2²,   · · ·, ∂²f/∂x2∂xn],
                 [  · · ·  ,    · · ·  ,  · · ·,    · · ·  ],
                 [∂²f/∂xn∂x1, ∂²f/∂xn∂x2, · · ·, ∂²f/∂xn²  ]] [x1 x2 · · · xn]^T ≥ 0,   (10.139)
for any x ∈ D.
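Condition (10.139) can be verified numerically through the eigenvalues of the Hessian matrix. The short sketch below — an illustration added here, with the helper name our own — checks it for the Hessian of the function used later in Example 10.2 (relation (10.247)).

```python
import numpy as np

def satisfies_10_139(H, tol=1e-12):
    """Check condition (10.139): the symmetric Hessian H must be positive
    semidefinite, i.e., all of its eigenvalues must be non-negative."""
    return bool(np.min(np.linalg.eigvalsh(H)) >= -tol)

# Hessian of U(x, y, z) = 2x^2 + 5y^2 + 5z^2 + 2xy - 4xz - 4yz, cf. (10.247)
H = np.array([[ 4.0,  2.0, -4.0],
              [ 2.0, 10.0, -4.0],
              [-4.0, -4.0, 10.0]])
print(satisfies_10_139(H))  # True: this U is convex on R^3
```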
The problem of convex programming requires determining the minimum of f(x), f : D ⊂ R^n → R,
under restrictions of the form gi(x) ≤ 0, i = 1, m.
If we denote by E the admissible set

E = {x ∈ D | gi(x) ≤ 0, i = 1, m},   (10.140)

then the problem of convex programming requires the determination of the value inf_{x∈E} f(x).
We define Lagrange's function by

L(x, λ) = f(x) + Σ_{i=1}^{m} λi gi(x),   (10.141)

where

λ = [λ1 λ2 · · · λm]^T   (10.142)

is a vector of non-negative components, λi ≥ 0, i = 1, m.
We suppose that a condition of regularity is fulfilled too, in the sense that there exists at least
one point ξ ∈ E at which all the inequalities gi(x) ≤ 0 become strict, that is,

gi(ξ) < 0,  i = 1, m.   (10.143)
The Kuhn–Tucker theorem states that for x* to ensure the minimum of the function f it is
sufficient (and, if the condition of regularity is fulfilled, also necessary) that there exist a vector
λ* = [λ1* λ2* . . . λm*]^T such that

L(x*, λ) ≤ L(x*, λ*) ≤ L(x, λ*)   (10.144)

for all x ∈ D and all λ ≥ 0 (λ1 ≥ 0, λ2 ≥ 0, . . . , λm ≥ 0).
The point (x*, λ*) ∈ D × R+^m is called a saddle point of Lagrange's function and fulfills the
condition

λi* gi(x*) = 0,  i = 1, m.   (10.145)
Moreover,

L(x*, λ*) = f(x*).   (10.146)

Let us suppose that Lagrange's function has the saddle point (x*, λ*) and let us consider the
function

φ(λ) = inf_{x∈D} L(x, λ),  λ ≥ 0,   (10.147)
for which φ(λ*) = f(x*). Now, let the extended function f̄ : R^n → R̄ be given by

f̄(x) = ∞ if x ∉ E,  f̄(x) = f(x) if x ∈ E.   (10.148)
Definition 10.21 We define the dual of the convex programming problem

inf_{x∈E} f(x)   (10.149)

as the problem

inf_{x∈D} f̄(x).   (10.150)

We have

f̄(x*) = min_{x∈D} f̄(x) = max_{λ≥0} φ(λ);   (10.151)

hence, instead of searching for f̄(x*) = min_{x∈D} f̄(x), we may determine max_{λ≥0} φ(λ).
10.9 NUMERICAL METHODS FOR PROBLEMS OF CONVEX PROGRAMMING
We present hereafter some methods of convex programming.
10.9.1 Method of Conditional Gradient
For a point x of the admissible set E, we consider the linearized problem

min_{x̄∈E} [f(x) + ⟨∇f(x), x̄ − x⟩].   (10.152)

If x0 is the solution of this problem, then on the segment of the line which joins the points x and
x0, that is, on the set of points

x̄ = (1 − α)x + αx0,  α ∈ [0, 1],   (10.153)

we search for the point of minimum of f; that is, we solve the problem

min_{α∈[0,1]} f(x + α(x0 − x)).   (10.154)

Let us suppose that this minimum is attained for α = ᾱ. Under these conditions, we continue the
procedure with the point

x1 = x + ᾱ(x0 − x).   (10.155)
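A minimal Python sketch of the iteration (10.152)–(10.155) follows. It assumes the caller supplies the solver of the linearized subproblem over E (often a linear program), and it replaces the exact one-dimensional search (10.154) by a simple sampling of α — both are assumptions of this illustration, not prescriptions of the text.

```python
import numpy as np

def conditional_gradient(f, grad_f, linear_minimizer, x, iters=50):
    """Sketch of the conditional gradient method (10.152)-(10.155).

    linear_minimizer(g) must return the point x0 of E minimizing <g, x0>,
    i.e., the solution of the linearized problem (10.152); it is
    problem-specific and assumed to be supplied by the caller.
    """
    for _ in range(iters):
        x0 = linear_minimizer(grad_f(x))
        # one-dimensional search (10.154) on the segment joining x and x0,
        # approximated here by sampling alpha on a grid in [0, 1]
        alphas = np.linspace(0.0, 1.0, 101)
        values = [f(x + a * (x0 - x)) for a in alphas]
        a_bar = alphas[int(np.argmin(values))]
        x = x + a_bar * (x0 - x)       # the new point (10.155)
    return x
```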
10.9.2 Method of Gradient’s Projection
The idea of the method of gradient projection consists in a displacement along the antigradient
direction −∇f(x) by a step chosen so as not to leave the domain of admissible solutions. If h is the
length of the step (which may depend on the iteration), then we calculate

x0 = x − h∇f(x);   (10.156)

we then solve the projection problem

min_{x̄∈E} (1/2)⟨x̄ − x0, x̄ − x0⟩,   (10.157)

continuing the procedure with the point of minimum thus obtained.
10.9.3 Method of Possible Directions
Let x be an admissible point and let us define the set of active restrictions at the point x, denoted
by S(x), as the set of all indices i for which gi(x) = 0.
At the point x, we search for a direction x̄ which makes an obtuse angle with ∇f(x) as well
as with the external normals ∇gi(x), i ∈ S(x), to the active restrictions. This choice diminishes
the function to be minimized and ensures that we remain in the interior of E if we impose the
conditions

⟨∇f(x), x̄⟩ + β‖∇f(x)‖ ≤ 0,   (10.158)

⟨∇gi(x), x̄⟩ + β‖∇gi(x)‖ ≤ 0,  i ∈ S(x),   (10.159)

where the factor β has to be minimized. Usually, we also introduce a normalization condition of
the form

⟨x̄, x̄⟩ ≤ 1   (10.160)

or

−1 ≤ x̄j ≤ 1,  j = 1, n.   (10.161)

Once the direction x̄ is determined, we pass to the one-directional minimization problem

min_β f(x + βx̄),  with gi(x + βx̄) ≤ 0, i ∈ S(x).   (10.162)
10.9.4 Method of Penalizing Functions
In the frame of the penalizing functions method, we introduce into the function to be minimized
a term which penalizes the violation of the restrictions. Let us consider

Ψ(x) = Σ_{i=1}^{m} [max{gi(x), 0}]²   (10.163)

and let us search for the minimum of the function f(x) + rΨ(x), where r is a sufficiently great
positive number.
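The penalizing-functions idea is easy to sketch in code. In the fragment below the objective, the restriction, and the increasing sequence of values of r are illustrative data chosen for this example only; the minimization of f + rΨ is delegated to scipy.optimize.minimize.

```python
import numpy as np
from scipy.optimize import minimize

def penalized(f, gs, r):
    """Return f(x) + r * Psi(x) with the quadratic penalty (10.163)."""
    def F(x):
        psi = sum(max(g(x), 0.0) ** 2 for g in gs)
        return f(x) + r * psi
    return F

# illustrative data: minimize x^2 + y^2 subject to x + y >= 1,
# written as g(x) = 1 - x - y <= 0; the exact minimizer is (1/2, 1/2)
f = lambda x: x[0] ** 2 + x[1] ** 2
g = lambda x: 1.0 - x[0] - x[1]
x = np.array([0.0, 0.0])
for r in [1.0, 10.0, 100.0, 1000.0]:   # a sufficiently great r, reached gradually
    x = minimize(penalized(f, [g], r), x, method="Nelder-Mead").x
print(x)  # close to [0.5, 0.5]
```

Increasing r gradually, restarting each minimization from the previous point, avoids the ill-conditioning that a single very large r would cause.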
10.10 QUADRATIC PROGRAMMING
Let us consider the programming problem

min f(x) = min [ (1/2) Σ_{j=1}^{n} Σ_{k=1}^{n} cjk xj xk + Σ_{j=1}^{n} dj xj ],   (10.164)

Σ_{j=1}^{n} aij xj ≥ bi,  i = 1, m,   (10.165)

or, in matrix form,

min f(x) = min [ (1/2)⟨x, Cx⟩ + ⟨D, x⟩ ],   (10.166)

Ax ≥ b,   (10.167)
where C ∈ Mn(R) is symmetric and positive definite, A ∈ Mm,n(R), D ∈ Mn,1(R), b ∈ Mm,1(R).
Lagrange's function is of the form

L = (1/2)⟨x, Cx⟩ + ⟨D, x⟩ + ⟨λ, b − Ax⟩,   (10.168)

the saddle point being searched for λ ≥ 0.
The optimality criterion

L(x*, λ*) ≤ L(x, λ*)   (10.169)

leads to

∂L(x, λ*)/∂x |_{x=x*} = 0   (10.170)

or

Cx* + D − A^T λ* = 0.   (10.171)

The inequality

L(x*, λ) ≤ L(x*, λ*)   (10.172)

leads to

Ax* ≥ b,   (10.173)

λi*(bi − (Ax*)i) = 0,  i = 1, m.   (10.174)

Moreover,

λi* ≥ 0,  i = 1, m.   (10.175)
It follows that if the pair (x*, λ*) satisfies conditions (10.171), (10.173), (10.174), and (10.175),
then x* is the solution of problem (10.166)–(10.167), while λ* is the solution of the dual problem.
We suppose that the rows of the matrix A are linearly independent (that is, the restrictions
(10.165) are independent). In relation (10.174), we have denoted by (Ax*)i the element in row i
and column 1 of the product Ax*.
The system of restrictions (10.167) defines a polyhedral set with faces of various dimensions.
Each face contains only admissible points, which satisfy a system of equations

A_I x = b_I,   (10.176)

where A_I is the matrix obtained from the matrix A by retaining only the rows of the set
I = {i1, i2, . . . , iI}, that is, the matrix of rows (A)i, i ∈ I; analogously for the vector b_I.
On the other hand, the minimum x* is found on a face of the polyhedron, in particular on an
edge or at a vertex.
Let us suppose that there exists an admissible point x̄ for which we have a set I = {i1, i2, . . . , iI}
such that the rows of the matrix A_I are independent, while x̄ satisfies relation (10.176). Two
situations may occur.
(i) The point x̄ gives the minimum of the function f under the restrictions (10.176). It follows
that there exist factors λi, i ∈ I, for which

∂/∂x [f(x) + λ_I^T(b_I − A_I x)] |_{x=x̄} = 0,   (10.177)

that is,

Cx̄ + D − A_I^T λ_I = 0.   (10.178)

From relation (10.178) we determine the vector λ_I; if all its components λi, i ∈ I, are non-
negative, then the algorithm stops, because the searched solution x* = x̄ has been found.
But if there exists an index ik ∈ I for which λ_{ik} < 0, then ik is eliminated from the set I,
resulting in

I = {i1, . . . , i_{k−1}, i_{k+1}, . . . , iI},   (10.179)

that is, we pass to a new face of the polyhedral set.
(ii) The function f attains its minimum on the set of solutions of (10.176) at a point x0 ≠ x̄. In
this case, we write

z = x0 − x̄,   (10.180)

gi = −(Az)i = −Σ_{j=1}^{n} aij zj,  i ∉ I,   (10.181)

Δi = (Ax̄)i − bi = Σ_{j=1}^{n} aij x̄j − bi,  i ∉ I,   (10.182)

and determine

ε0 = min_{gi>0} (Δi / gi).   (10.183)

We choose

ε = min{ε0, 1}.   (10.184)

If ε = 1, then the displacement is made to the point x0, the set I being preserved. If ε < 1,
then this minimum has been attained for an index i′ which did not belong to the set, and the set I
must be brought up to date by adding this index also:

I = {i1, i2, . . . , iI, i′};   (10.185)

we thus replace the point x̄ by the point x̄ + εz.
Let us notice that for the determination of x(0), that is, of the start point, we must solve the linear
system

Cx + D − A_I^T λ_I = 0,  A_I x = b_I.   (10.186)
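System (10.186) is linear in the pair (x, λ_I) and can be solved as one block system. The sketch below does this with NumPy; the data of the small illustration (C, D, A_I, b_I) are arbitrary and serve only to show the call.

```python
import numpy as np

def qp_face_point(C, D, A_I, b_I):
    """Solve (10.186): C x + D - A_I^T lam = 0, A_I x = b_I,
    as one block linear system in the unknowns (x, lam)."""
    n, k = C.shape[0], A_I.shape[0]
    K = np.block([[C, -A_I.T],
                  [A_I, np.zeros((k, k))]])
    rhs = np.concatenate([-D, b_I])
    sol = np.linalg.solve(K, rhs)
    return sol[:n], sol[n:]   # point on the face and multipliers lambda_I

# arbitrary illustration: min (1/2)<x, Cx> + <D, x> on the face x1 + x2 = 1
C = np.array([[2.0, 0.0], [0.0, 2.0]])
D = np.array([-2.0, 0.0])
A_I = np.array([[1.0, 1.0]])
b_I = np.array([1.0])
x, lam = qp_face_point(C, D, A_I, b_I)
print(x, lam)   # x = [1., 0.], lambda_I = [0.]
```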
10.11 DYNAMIC PROGRAMMING
Let us consider the optimal control problem for the system10

dϕ/dt = f(ϕ, u),  0 ≤ t ≤ T,  ϕ(0) = ϕ0,   (10.187)

where

ϕ = [φ1 φ2 · · · φn]^T,  f = [f1 f2 · · · fn]^T,  u = [u1 u2 · · · um]^T,
n ∈ N, m ∈ N, n ≥ 1, m ≥ 1.   (10.188)
10The concept of dynamic programming was introduced by Richard E. Bellman (1920–1984) in 1953.
The admissible commands are given by u = u(t) and are piecewise continuous, u(t) ∈ U, where
U is a closed set.
In the class of admissible commands we must find a command u(t) to which corresponds
the solution ϕ(t) of problem (10.187) for which the functional

F(u) = ∫_0^T f0(ϕ, u) dt   (10.189)

is minimum.
To do this, we apply Bellman's principle11, which states that the optimal command, at any
moment, does not depend on the previous history of the system, but is determined only by the goal
of the command and by the state of the system at that moment.
Denoting

Q(ϕ, t) = min_{u∈U} ∫_t^T f0(ϕ(τ), u(τ)) dτ,   (10.190)

Bellman's optimality principle leads to the relation

Q(ϕ(t), t) = min_u { ∫_t^{t+Δt} f0(ϕ(τ), u(τ)) dτ + min_u ∫_{t+Δt}^{T} f0(ϕ(τ), u(τ)) dτ }.   (10.191)
But

∫_{t+Δt}^{T} f0(ϕ(τ), u(τ)) dτ = Q(ϕ(t) + Δξ, t + Δt),   (10.192)

where

Δξ = ∫_t^{t+Δt} f(ϕ, u) dτ.   (10.193)
Let us suppose that both terms between the brackets in relation (10.191) may be expanded into a
Taylor series and let us make Δt → 0. It follows Bellman's equation

−∂Q/∂t = min_{u∈U} [ f0(ϕ, u) + ⟨f(ϕ, u), ∂Q/∂ϕ⟩ ],   (10.194)

Q(ϕ, T) = 0.   (10.195)
If the minimum in the right side of relation (10.194) is attained at only one point u*, then u* is
a function of ϕ and ∂Q/∂ϕ, that is,

u* = u*(ϕ, ∂Q/∂ϕ).   (10.196)

Introducing this result in relation (10.194), we obtain a nonlinear equation of the form

−∂Q/∂t = f0(ϕ, u*(ϕ, ∂Q/∂ϕ)) + ⟨f(ϕ, u*(ϕ, ∂Q/∂ϕ)), ∂Q/∂ϕ⟩.   (10.197)
11Richard E. Bellman (1920–1984) stated this principle in 1952.
If u* is a function of ϕ and t, then system (10.197) is a hyperbolic one, with the characteristics
oriented from t = 0 to t = T.
Let us consider a process described by a system of difference equations

ϕ_{i+1} = g(ϕi, ui),  i = 0, N−1.   (10.198)

We must minimize the functional

F(u) = Σ_{i=0}^{N−1} f0(ϕi, ui),   (10.199)

the solution of which depends on the initial state ϕ0 and on the number of steps N. If we denote
the searched optimal value by Q_N(ϕ0), then the problem of minimum leads to the relation

Q_N(ϕ0) = min_{u0} min_{[u1, u2, ..., u_{N−1}]} [ f0(ϕ0, u0) + Σ_{i=1}^{N−1} f0(ϕi, ui) ].   (10.200)

But

min_{[u1, ..., u_{N−1}]} Σ_{i=1}^{N−1} f0(ϕi, ui) = Q_{N−1}(ϕ1),   (10.201)

obtaining thus

Q_N(ϕ0) = min_{u0} [ f0(ϕ0, u0) + Q_{N−1}(ϕ1) ].   (10.202)

Step by step, we get the recurrence relations

Q_{N−j}(ϕj) = min_{uj∈U} [ f0(ϕj, uj) + Q_{N−j−1}(ϕ_{j+1}) ],  j = 0, N−2,  ϕ_{j+1} = g(ϕj, uj),
Q_1(ϕ_{N−1}) = min_{u_{N−1}∈U} f0(ϕ_{N−1}, u_{N−1}),  ϕ_{N−1} = g(ϕ_{N−2}, u_{N−2}).   (10.203)

If ϕ_{N−1} is known, then from (10.203) we get u_{N−1}, u_{N−2}, . . . , u0 and Q_N(ϕ0).
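On a finite grid of states and commands, the recurrence relations (10.203) become a short backward loop. The following sketch assumes — as a simplification of this illustration — that the transition g maps grid states to grid states; the function names are our own.

```python
import numpy as np

def dp_min(f0, g, controls, states, N, phi0):
    """Backward recursion (10.203) on a finite state/control grid.

    f0(phi, u): stage cost; g(phi, u): transition.  Both are assumed to map
    grid states to grid states (a simplifying assumption of this sketch).
    Returns Q_N(phi0) and the optimal command at each step and state.
    """
    Q = {phi: 0.0 for phi in states}          # beyond the horizon the cost is 0
    policy = [dict() for _ in range(N)]
    for j in reversed(range(N)):              # steps j = N-1, ..., 0
        Q_new = {}
        for phi in states:
            costs = [f0(phi, u) + Q[g(phi, u)] for u in controls]
            k = int(np.argmin(costs))
            Q_new[phi], policy[j][phi] = costs[k], controls[k]
        Q = Q_new
    return Q[phi0], policy
```

Reading the stored policy forward from ϕ0 then yields the optimal sequence of commands u0, . . . , u_{N−1}.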
10.12 PONTRYAGIN’S PRINCIPLE OF MAXIMUM
Let us consider the system of ordinary differential equations

dϕ/dt = f(ϕ, u),  0 ≤ t ≤ T,   (10.204)

where

ϕ = [φ1 φ2 · · · φn]^T,  f = [f1 f2 · · · fn]^T,  u = [u1 u2 · · · un]^T,   (10.205)

and to which we add the limit conditions

ϕ(0) ∈ S0,  ϕ(T) ∈ S1,   (10.206)

where S0 and S1 are given manifolds in the Euclidean space E^n.
The problem requires, being given a closed set U ⊂ E^n, to determine a moment T and a
piecewise continuous command u = u(t) ∈ U for which the trajectory ϕ = ϕ(t, u) satisfies the
conditions (10.204) and (10.206), as well as

F(u) = ∫_0^T f0(ϕ, u) dt = minimum.   (10.207)

We will suppose that:
• the functions f(ϕ, u) are defined and continuous in the pair (ϕ, u), together with the partial
derivatives ∂fi/∂φj, i, j = 1, n;
• the manifolds S0 and S1 are given by the relations

S0 = {ϕ | φi(0) = φi^0; i = 1, n},   (10.208)

S1 = {ϕ | hk(ϕ(T)) = 0; k = 1, l, l ≤ n},   (10.209)

where hk(x) are functions with continuous partial derivatives; moreover, the gradients ∇hk(x),
k = 1, l, are linearly independent for any x ∈ S1.
Let us remark that if l = n, then we get the optimal control problem (10.204), (10.206), (10.207)
with fixed right end. Condition (10.208) means fixation of the left end. If S1 = E^n, then we have an
optimal control problem with a free right end, while if 0 < l < n, then we have a problem with
a mobile right end. In the mobile case, the dimension of the manifold S1 is equal to n − l.
Theorem 10.1 (Pontryagin12). Let there be given the system of ordinary differential equations

dϕ/dt = f(ϕ, u),  u ∈ U,  S0 = {ϕ(0) = ϕ0},  S1 = {hk(ϕ(T)) = 0, k = 1, l},   (10.210)

for which the above conditions are fulfilled. Let {ϕ(t), u(t)}, 0 ≤ t ≤ T, be the optimal process that
leads the system from the state ϕ0 to the state ϕ1 ∈ S1, and let us introduce Hamilton's function

H(ϕ, ψ, u) = Σ_{i=0}^{n} ψi fi(ϕ, u).   (10.211)

Under these conditions, there exists the nontrivial vector function

ψ(t) = [ψ1(t) ψ2(t) · · · ψn(t)]^T,  ψ0 = const ≤ 0,   (10.212)

which satisfies the system of equations

∂ψi/∂t = −∂H(ϕ(t), ψ, u(t))/∂φi,  i = 1, n,   (10.213)

with the limit conditions

ψi(T) = Σ_{k=1}^{l} γk ∂hk(ϕ(T))/∂φi,  i = 1, n,   (10.214)
12Lev Semenovich Pontryagin (1908–1988) formulated this principle in 1956.
where γ1, . . . , γl are numbers such that at any moment 0 ≤ t ≤ T the condition of maximum

H(ϕ(t), ψ(t), u(t)) = max_{u∈U} H(ϕ(t), ψ(t), u)   (10.215)

is verified. If the moment T is not fixed, then the following relation also takes place:

H_T = H(ϕ(T), ψ(T), u(T)) = 0.   (10.216)
The classical problem of the variational calculus, which consists in the minimization of the
functional

F = ∫_0^T f0(φ, dφ/dt, t) dt   (10.217)

in the class of sectionally smooth functions satisfying the limit conditions φ(0) ∈ S0, φ(T) ∈ S1,
is a particular case of problem (10.204), (10.206), (10.207): it reduces to finding the minimum of
the functional

F = ∫_{t0}^{T} f0(φ, u, t) dt,   (10.218)

with the condition

dφ/dt = u.   (10.219)
10.13 PROBLEMS OF EXTREMUM
In what follows, we denote by H a Hilbert space over the field of real numbers, and by ⟨·, ·⟩_H
and ‖·‖_H the scalar product and the norm in H, respectively. Let π(u, v) be a symmetric and
continuous bilinear form and L(u) a continuous linear form on H. We also denote by D ⊂ H a
convex and closed set.
We define the quadratic functional

F(v) = π(v, v) − 2L(v),   (10.220)

where π(v, v) is positive definite on H, that is, there exists c > 0 with the property

π(v, v) ≥ c‖v‖²_H   (10.221)

for any v ∈ H.
Under these conditions there exists a uniquely determined element u ∈ D which is the solution
of the problem

F(u) = inf_{v∈D} F(v).   (10.222)

Theorem 10.2 If the above conditions are fulfilled, then u ∈ D is a solution of problem (10.222)
if and only if for any v ∈ D we have

π(u, v − u) ≥ L(v − u).   (10.223)
Demonstration. The necessity results from the following considerations. If u is a solution of problem
(10.222), then for any v ∈ D and θ ∈ (0, 1) we have

F(u) ≤ F((1 − θ)u + θv),   (10.224)

where we take into account that D is convex.
From inequality (10.224), we obtain

[F(u + θ(v − u)) − F(u)] / θ ≥ 0,   (10.225)

and passing to the limit for θ → 0, it follows that

lim_{θ→0} [F(u + θ(v − u)) − F(u)] / θ = 2[π(u, v − u) − L(v − u)] + lim_{θ→0} θπ(v − u, v − u)
= 2[π(u, v − u) − L(v − u)] ≥ 0   (10.226)

for any v ∈ D.
Let us now show the sufficiency. Because F is convex, for any v ∈ D and any θ ∈ (0, 1) the
inequality

F((1 − θ)u + θv) ≤ (1 − θ)F(u) + θF(v)   (10.227)

holds, from which it follows that

F(v) − F(u) ≥ [F((1 − θ)u + θv) − F(u)] / θ.   (10.228)

We pass to the limit for θ → 0 and, taking (10.223) into account, it follows that

F(u) ≤ F(v)   (10.229)

for any v ∈ D; hence the theorem is proved.
Observation 10.20 If we may write v = u ± φ with φ ∈ D arbitrary (e.g., when D is a subspace),
then

π(u, φ) ≥ L(φ),  −π(u, φ) ≥ −L(φ);   (10.230)

hence u is a solution of problem (10.222) if and only if, for any φ ∈ D, we have

π(u, φ) = L(φ),   (10.231)

that is, Euler's equation for the variational problem

F(u) = inf_{v∈D} F(v).   (10.232)
10.14 NUMERICAL EXAMPLES
Example 10.1 Let the function f : [0, 2] → R be

f(x) = x⁵/5 − x.   (10.233)
We wish to localize the minimum of this function knowing a = 0, b = 0.8, c = 2.
First of all we use the linear algorithm of localization of the minimum and have

a = 0 < 0.8 = b,   (10.234)

f(a) = 0,  f(b) = 0.8⁵/5 − 0.8 = −0.734464,  f(a) > f(b).   (10.235)

Let

k = 1.1   (10.236)

and calculate

c1 = b + k(b − a) = 1.68,   (10.237)

f(c1) = 1.68⁵/5 − 1.68 = 0.99656 > 0,  f(c1) > f(b).   (10.238)
It follows that the point of minimum is in the interval [0, 1.68].
On the other hand, the parabola which passes through the points A(0, 0), B(0.8, −0.734464),
C(2, 4.4) has the equation

L2(x) = 2.5984x² − 2.9968x   (10.239)

and attains its minimum at the point

x* = 2.9968 / (2 × 2.5984) = 0.576663.   (10.240)

Moreover,

f(x*) = −0.563909 < f(2) = 4.4,   (10.241)

f(x*) = −0.563909 > f(0.8) = −0.734464;   (10.242)
hence, the point of minimum of the function f is in the interval [0.8, 2].
To determine the minimum, we may use the algorithm of the golden section, the results being
given in Table 10.1.
We may also use the Brent algorithm, the calculations being given in Table 10.2.
In both cases the precision is

ε = 10⁻³.   (10.243)
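A compact Python version of the golden-section algorithm used for Table 10.1 may look as follows; it is a generic sketch of the method, not the exact implementation behind the table, so its iterates need not coincide with the table column by column.

```python
import math

def golden_section(f, a, b, eps=1e-3):
    """Golden-section search for the minimum of a unimodal f on [a, b]."""
    r = (math.sqrt(5.0) - 1.0) / 2.0      # golden ratio conjugate ~ 0.618
    x1, x2 = b - r * (b - a), a + r * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > eps:
        if f1 > f2:                       # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + r * (b - a)
            f2 = f(x2)
        else:                             # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - r * (b - a)
            f1 = f(x1)
    return (a + b) / 2.0

f = lambda x: x ** 5 / 5.0 - x            # the function (10.233)
print(golden_section(f, 0.8, 2.0))        # close to 1, where f'(x) = x^4 - 1 = 0
```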
Example 10.2 Let us consider the function U : D ⊂ R³ → R,

U(x) = U(x, y, z) = 2x² + 5y² + 5z² + 2xy − 4xz − 4yz,   (10.244)

where

D = {(x, y, z) ∈ R³ | x² + 2y² + z² ≤ 2}.   (10.245)

Let p(1) be the direction given by

p(1) = [1 2 3]^T.   (10.246)
TABLE 10.1 Determination of the Minimum of Function (10.233) by Means of the Algorithm
of the Golden Section
Step x0 x1 x2 x3 f (x0) f (x1) f (x2) f (x3)
0 0.000 0.800 1.258 2.000 0.000 −0.735 −0.627 4.400
1 0.000 0.494 0.800 1.258 0.000 −0.489 −0.734 −0.627
2 0.494 0.800 0.975 1.258 −0.489 −0.734 −0.799 −0.627
3 0.800 0.975 1.083 1.258 −0.734 −0.799 −0.785 −0.627
4 0.800 0.908 0.975 1.083 −0.734 −0.785 −0.799 −0.785
5 0.908 0.975 1.016 1.083 −0.785 −0.799 −0.799 −0.785
6 0.975 1.016 1.042 1.083 −0.799 −0.799 −0.796 −0.785
7 0.975 1.001 1.016 1.042 −0.799 −0.800 −0.799 −0.796
8 0.975 0.991 1.001 1.016 −0.799 −0.800 −0.800 −0.799
10 0.991 1.001 1.007 1.016 −0.800 −0.800 −0.800 −0.799
11 0.991 0.997 1.001 1.007 −0.800 −0.800 −0.800 −0.800
TABLE 10.2 Determination of the Minimum of Function (10.233) by Brent’s Algorithm
Step a b u v t x fa fb fu fv ft fx
0 0.000 2.000 2.000 2.000 0.000 1.000 0.000 4.400 4.400 4.400 0.000 −0.800
1 0.000 1.382 1.382 1.382 0.000 1.000 0.000 4.400 4.400 4.400 0.000 −0.800
2 0.618 1.382 0.618 0.000 0.618 1.000 0.000 4.400 −0.374 0.000 −0.374 −0.800
3 0.618 1.146 1.146 0.618 1.146 1.000 0.000 4.400 −0.600 −0.374 −0.600 −0.800
4 0.854 1.146 0.854 1.146 0.854 1.000 0.000 4.400 −0.751 −0.600 −0.751 −0.800
5 0.854 1.056 1.056 0.854 1.056 1.000 0.000 4.400 −0.763 −0.751 −0.763 −0.800
6 0.944 1.056 0.944 1.056 0.944 1.000 0.000 4.400 −0.793 −0.763 −0.793 −0.800
7 0.944 1.021 1.021 0.944 1.021 1.000 0.000 4.400 −0.794 −0.793 −0.794 −0.800
8 0.979 1.021 0.979 1.021 0.979 1.000 0.000 4.400 −0.799 −0.794 −0.799 −0.800
9 0.979 1.008 1.008 0.979 1.008 1.000 0.000 4.400 −0.799 −0.799 −0.799 −0.800
We wish to determine the other G-conjugate directions, as well as the minimum of the
function U.
To do this, we calculate the Hessian matrix

∇²U(x) = [[∂²U/∂x²,   ∂²U/∂x∂y, ∂²U/∂x∂z],
          [∂²U/∂x∂y, ∂²U/∂y²,   ∂²U/∂y∂z],
          [∂²U/∂x∂z, ∂²U/∂y∂z, ∂²U/∂z²  ]]
        = [[4, 2, −4], [2, 10, −4], [−4, −4, 10]].   (10.247)
The second G-conjugate direction is determined by the relation

[1 2 3] [[4, 2, −4], [2, 10, −4], [−4, −4, 10]] [p21 p22 p23]^T = 0,   (10.248)

which leads to the equation

−4p21 + 10p22 + 18p23 = 0;   (10.249)

we choose

p21 = 2,  p22 = −1,  p23 = 1.   (10.250)

We have obtained

p(2) = [2 −1 1]^T.   (10.251)
The last G-conjugate direction is given by the relation

[2 −1 1] [[4, 2, −4], [2, 10, −4], [−4, −4, 10]] [p31 p32 p33]^T = 0,   (10.252)

from which

2p31 − 10p32 + 6p33 = 0.   (10.253)

We choose

p31 = 2,  p32 = 1,  p33 = 1,   (10.254)

hence

p(3) = [2 1 1]^T.   (10.255)
We take as start point the value

x(0) = [1 0 1]^T.   (10.256)

The expression

U(x(0) + αp(1)) = U(1 + α, 2α, 1 + 3α) = 35α² + 14α + 3   (10.257)

becomes minimum for

α = −1/5   (10.258)

and it follows that

x(1) = x(0) + αp(1) = [4/5 −2/5 2/5]^T.   (10.259)
We calculate

U(x(1) + αp(2)) = U(4/5 + 2α, −2/5 − α, 2/5 + α) = 10α² + 8α + 8/5.   (10.260)

The minimum of this expression is attained for

α = −2/5,   (10.261)

from which

x(2) = x(1) + αp(2) = [0 0 0]^T.   (10.262)
Finally, the expression

U(x(2) + αp(3)) = U(2α, α, α) = 10α²   (10.263)

attains its minimum for

α = 0   (10.264)

and it follows that

x(3) = x(2) + αp(3) = [0 0 0]^T = x(2).   (10.265)

The point of minimum of the function U is given by x(3), while the minimum value of U is

Umin = U(x(3)) = 0.   (10.266)
Example 10.3 Let us consider the function U : R³ → R,

U(x) = U(x, y, z) = e^(x²)(y² + z²),   (10.267)

for which we wish to calculate the minimum by Powell's algorithm. We know

ε = 10⁻²,  iter = 3,   (10.268)

x(0) = [2 1 −3]^T.   (10.269)
We have

U(x(k−1) + αp(k)) = U(x(k−1) + αp1(k), y(k−1) + αp2(k), z(k−1) + αp3(k))
= e^((x(k−1) + αp1(k))²) [(y(k−1) + αp2(k))² + (z(k−1) + αp3(k))²],   (10.270)

dU(x(k−1) + αp(k))/dα = e^((x(k−1) + αp1(k))²) {2p1(k)(x(k−1) + αp1(k))[(y(k−1) + αp2(k))²
+ (z(k−1) + αp3(k))²] + 2p2(k)(y(k−1) + αp2(k)) + 2p3(k)(z(k−1) + αp3(k))}
= e^((x(k−1) + αp1(k))²) F(α).   (10.271)

The value αmin which minimizes expression (10.270) is obtained by solving the equation

F(α) = 0.   (10.272)
The directions p(1), p(2), and p(3) are

p(1) = [1 0 0]^T,  p(2) = [0 1 0]^T,  p(3) = [0 0 1]^T.   (10.273)
We have

U(x(0) + αp(1)) = U(2 + α, 1, −3) = 10e^((2+α)²),   (10.274)

dU(x(0) + αp(1))/dα = 20e^((2+α)²)(2 + α),   (10.275)

from which

αmin = −2,   (10.276)

x(1) = x(0) − 2p(1) = [0 1 −3]^T.   (10.277)
We now calculate

U(x(1) + αp(2)) = U(0, 1 + α, −3) = 10 + 2α + α²,   (10.278)

dU(x(1) + αp(2))/dα = 2α + 2,   (10.279)

such that

αmin = −1,   (10.280)

x(2) = x(1) − p(2) = [0 0 −3]^T.   (10.281)
Finally, we also find

U(x(2) + αp(3)) = U(0, 0, −3 + α) = 9 − 6α + α²,   (10.282)

so that

αmin = 3,   (10.283)

x(3) = x(2) + 3p(3) = [0 0 0]^T.   (10.284)
On the other hand, the new direction p(3) is given by

p(3) = x(3) − x(2) = [0 0 3]^T;   (10.285)

we have

U(x(3) + αp(3)) = U(0, 0, 3α) = 9α²,   (10.286)

from which

αmin = 0,   (10.287)

x(4) = x(3) = [0 0 0]^T.   (10.288)
But

‖x(4) − x(3)‖ = 0 < ε,   (10.289)

such that the point of minimum is

xmin = [0 0 0]^T,   (10.290)

the minimum value of the function U being

Umin = U(xmin) = 0.   (10.291)
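The same minimization can be checked against a library implementation of Powell's method; scipy's variant differs in details from the hand computation above, so the returned point need not be exactly [0 0 0]^T — any point with y = z = 0 gives U = 0.

```python
import numpy as np
from scipy.optimize import minimize

U = lambda x: np.exp(x[0] ** 2) * (x[1] ** 2 + x[2] ** 2)   # the function (10.267)
res = minimize(U, x0=np.array([2.0, 1.0, -3.0]), method="Powell")
print(res.x, res.fun)   # y and z go to 0 and U_min = 0; x may remain arbitrary
```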
Example 10.4 Let us consider again the function U of Example 10.3, for which we will calculate
the minimum using gradient-type methods.
We begin with the gradient method. We calculate

∇U(x) = [2x e^(x²)(y² + z²), 2y e^(x²), 2z e^(x²)]^T   (10.292)

and it follows that

∇U(x(0)) = [40e⁴, 2e⁴, −6e⁴]^T,   (10.293)

this being the first direction p(1).
The scalar α0 minimizes the expression

U(x(0) + α0p(1)) = U(2 + 40α0e⁴, 1 + 2α0e⁴, −3 − 6α0e⁴)
= e^((2 + 40α0e⁴)²) (10 + 40α0e⁴ + 40α0²e⁸).   (10.294)

But

U′(α0) = e^((2 + 40α0e⁴)²) (3200e¹²α0² + 3280α0e⁸ + 840e⁴)   (10.295)

and the equation U′(α0) = 0 leads to

α01 = −21/(40e⁴)  or  α02 = −1/(2e⁴).   (10.296)

Then

U(α01) = e³⁶¹/40,  U(α02) = 0,   (10.297)

so that we choose α0 = α02.
It follows that

x(1) = x(0) − (1/(2e⁴)) p(1) = [2 1 −3]^T − [20 1 −3]^T = [−18 0 0]^T.   (10.298)

We calculate

∇U(x(1)) = [0 0 0]^T;   (10.299)
hence, the sequence x(k) becomes constant x(k) = x(1), k ≥ 2.
If we wish to solve the problem by methods of conjugate gradient type, then we calculate:
• for the Fletcher–Reeves method:

β1 = [∇U(x(1))]^T [∇U(x(1))] / ([∇U(x(0))]^T [∇U(x(0))]) = 0;   (10.300)

• for the Polak–Ribière method:

y(0) = ∇U(x(1)) − ∇U(x(0)) = [−40e⁴ −2e⁴ 6e⁴]^T,   (10.301)

β0 = [∇U(x(1))]^T y(0) / ([∇U(x(0))]^T [∇U(x(0))]) = 0;   (10.302)

• for the Hestenes–Stiefel method:

β0 = [∇U(x(1))]^T y(0) / ([y(0)]^T p(0)) = 0,   (10.303)

p(1) = −∇U(x(1)) − β0 p(0) = [0 0 0]^T.   (10.304)
We observe that in all cases we obtain the same constant sequence x(k) = x(1), k ≥ 2; hence
Umin = 0.
Comparing Example 10.3 and Example 10.4, we see that we do not obtain the same point
of minimum. This is explained by the fact that the function U has an infinity of points of
minimum, characterized by x ∈ R arbitrary, y = 0, z = 0.
Example 10.5 We wish to solve the linear system

5x1 + 2x2 + 2x3 = 11,  2x1 + 5x2 + 2x3 = 14,  2x1 + 2x2 + 5x3 = 11,   (10.305)

using methods of gradient type and starting with

x(0) = [−1 0 1]^T.   (10.306)

We know the values

ε = 10⁻³,  δ = 10⁻¹,  iter = 10.   (10.307)
We have

A = [[5, 2, 2], [2, 5, 2], [2, 2, 5]],  b = [11 14 11]^T.   (10.308)
The matrix A is positive definite because

[x1 x2 x3] A [x1 x2 x3]^T = (x1 + 2x2)² + (x2 + 2x3)² + (x3 + 2x1)².   (10.309)
The data are given in Table 10.3.
TABLE 10.3 Solution of System (10.305) by the Gradient Method

Step  x                              p                              ⟨p, p⟩      ⟨p, Ap⟩      α
0     (−1.00000, 0.00000, 1.00000)   (14.00000, 14.00000, 8.00000)  456.00000   3960.00000   0.11515
1     (0.61212, 1.61212, 1.92121)    (0.87273, 0.87273, −3.05455)   10.85355    35.98810     0.30159
2     (0.87532, 1.87532, 1.00000)    (0.87273, 0.87273, 0.49870)    1.77201     15.38850     0.11515
3     (0.97582, 1.97582, 1.05743)    (0.05440, 0.05440, −0.19041)   0.04218     0.13985      0.30159
4     (0.99223, 1.99223, 1.00000)    (0.05440, 0.05440, 0.03109)    0.00689     0.05980      0.11515
5     (0.99849, 1.99849, 1.00358)    (0.00339, 0.00339, −0.01187)   0.00016     0.00054      0.30159
6     (0.99952, 1.99952, 1.00000)    (0.00339, 0.00339, 0.00194)    0.00003     0.00023      0.11515
7     (0.99991, 1.99991, 1.00000)    (0.00021, 0.00021, −0.00074)   0.00000     0.00000      0.30159
8     (0.99997, 1.99997, 1.00000)    —                              —           —            —
If we apply the Fletcher–Reeves method, then we obtain the data given in Table 10.4.
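The quantities of Table 10.3 come from the iteration x ← x + αp with p = b − Ax and α = ⟨p, p⟩/⟨p, Ap⟩; the short NumPy sketch below (function name our own) reproduces them.

```python
import numpy as np

def gradient_method(A, b, x, eps=1e-3, iters=100):
    """Gradient (steepest descent) method for A x = b with A symmetric
    positive definite; produces the quantities of Table 10.3."""
    for _ in range(iters):
        p = b - A @ x                      # residual = descent direction
        pp = p @ p                         # <p, p>
        if np.sqrt(pp) < eps:
            break
        alpha = pp / (p @ (A @ p))         # <p, p> / <p, Ap>
        x = x + alpha * p
    return x

A = np.array([[5.0, 2.0, 2.0], [2.0, 5.0, 2.0], [2.0, 2.0, 5.0]])
b = np.array([11.0, 14.0, 11.0])
print(gradient_method(A, b, np.array([-1.0, 0.0, 1.0])))   # ~ [1, 2, 1]
```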
Example 10.6 Let the function U : R³ → R be

U(x) = U(x, y, z) = 5x² + 5y² + 5z² + 4xy + 4yz + 4xz,   (10.310)

for which we wish to determine the minimum using Newton-type methods.
We know

ε = 10⁻²,   (10.311)

B0 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]] = I3,   (10.312)

while the start vector is

x(0) = [1 −1 1]^T.   (10.313)
TABLE 10.4 Solution of System (10.305) by the Fletcher–Reeves Method

Step  x                              p                              r                              ⟨p, p⟩      α        β
0     (−1.00000, 0.00000, 1.00000)   (14.00000, 14.00000, 8.00000)  (14.00000, 14.00000, 8.00000)  456.00000   0.11515  0.00274
1     (0.61212, 1.61212, 1.92121)    (0.91110, 0.91110, −3.03262)   (0.87273, 0.87273, −3.05455)   10.85698    0.30572  0.03963
2     (0.89067, 1.89067, 0.99407)    (0.81331, 0.81331, 0.34682)    (0.77720, 0.77720, 0.46700)    1.44323     0.11769  0.00126
3     (0.98638, 1.98638, 1.03488)    (0.02660, 0.02660, −0.11950)   (0.02557, 0.02557, −0.11994)   0.01570     0.28084  0.06347
We calculate

∇U(x) = [10x + 4y + 4z, 4x + 10y + 4z, 4x + 4y + 10z]^T,   (10.314)

∇²U(x) = [[10, 4, 4], [4, 10, 4], [4, 4, 10]].   (10.315)
The matrix ∇²U(x) is positive definite because

[x1 x2 x3] [[10, 4, 4], [4, 10, 4], [4, 4, 10]] [x1 x2 x3]^T
= 2[(x1 + 2x2)² + (x2 + 2x3)² + (x3 + 2x1)²].   (10.316)
Moreover,

[∇²U(x)]⁻¹ = (1/648) [[84, −24, −24], [−24, 84, −24], [−24, −24, 84]].   (10.317)
In the case of Newton's method we obtain the sequence of iterations

x(k+1) = x(k) − [∇²U(x)]⁻¹ ∇U(x(k))
= [ −(8/27)x1(k) + (4/27)x2(k) + (4/27)x3(k),
    (4/27)x1(k) − (8/27)x2(k) + (4/27)x3(k),
    (4/27)x1(k) + (4/27)x2(k) − (8/27)x3(k) ]^T.   (10.318)
The calculations are given in Table 10.5.
TABLE 10.5 Determination of the Minimum of the Function U by Newton’s Method
Step x1 x2 x3
0 1.000000 −1.000000 1.000000
1 −0.296296 0.592593 −0.296296
2 0.131687 −0.263374 0.131687
3 −0.058528 0.117055 −0.058528
4 0.026012 −0.052025 0.026012
5 −0.011561 0.023122 −0.011561
6 0.005138 −0.010276 0.005138
7 −0.002284 0.004567 −0.002284
8 0.001015 −0.002030 0.001015
9 −0.000451 0.000902 −0.000451
10 0.000200 −0.000401 0.000200
11 −0.000089 0.000178 −0.000089
12 0.000040 −0.000079 0.000040
13 −0.000018 0.000035 −0.000018
14 0.000008 −0.000016 0.000008
15 −0.000003 0.000007 −0.000003
16 0.000002 −0.000003 0.000002
17 0.000001 0.000001 −0.000001
18 0.000000 −0.000001 0.000000
19 −0.000000 0.000000 −0.000000
20 0.000000 −0.000000 0.000000
In the case of the Davidon–Fletcher–Powell method we have successively

B0 p(0) = −∇U(x(0)),  [[1, 0, 0], [0, 1, 0], [0, 0, 1]] [p1(0) p2(0) p3(0)]^T = −[10 −2 10]^T,   (10.319)

p(0) = [−10 2 −10]^T,   (10.320)

U(x(0) + αp(0)) = U(1 − 10α, −1 + 2α, 1 − 10α) = 1260α² − 204α + 11.   (10.321)
This expression is minimized for

α0 = 17/210   (10.322)

and it follows that

x(1) = x(0) + α0 p(0) = [4/21 −88/105 4/21]^T,   (10.323)

y(0) = ∇U(x(1)) − ∇U(x(0)) = [−374/35 −34/7 −374/35]^T,   (10.324)
z(0) = y(0) + (17/10)∇U(x(0)) = [−374/35 −34/7 −374/35]^T + (17/10)[10 −2 10]^T
= [221/35 −289/35 221/35]^T,   (10.325)

B1 = B0 + (z(0)[y(0)]^T + y(0)[z(0)]^T) / ([y(0)]^T (x(1) − x(0)))
− ([z(0)]^T (x(1) − x(0)) / {[y(0)]^T (x(1) − x(0))}²) y(0)[y(0)]^T
= [[−7.171836, 4.971392, −8.171836],
   [4.971392, 1.971425, 4.971392],
   [8.171021, 0.114249, 9.171021]].   (10.326)
Obviously, the procedure may continue.
The application of the Broyden–Fletcher–Goldfarb–Shanno method is completely similar.
The minimum of the function U(x) is obtained for

xmin = 0,  U(xmin) = 0.   (10.327)
Example 10.7 Let the problem of linear programming be

maximum (2x1 − x2) = ?,   (10.328)

with the restrictions

x1 + x2 ≤ 5,  x2 − x1 ≤ 4,  x2 − x1 ≥ −3,  x2 + (4/3)x1 ≥ 4.   (10.329)

Having only two variables x1 and x2, we can associate the straight lines

d1: x1 + x2 − 5 = 0,  d2: x2 − x1 − 4 = 0,  d3: x2 − x1 + 3 = 0,  d4: x2 + (4/3)x1 − 4 = 0,   (10.330)
represented in Figure 10.1.
Figure 10.1 Geometric solution of the problem of linear programming (10.328) and (10.329):
the lines d1–d4 bound the quadrilateral ABCD in the (x1, x2) plane.
These lines define the quadrilateral ABCD, its vertices having the coordinates

A(0.5, 4.5),  B(0, 4),  C(3, 0),  D(4, 1).   (10.331)

The function f : R² → R,

f(x1, x2) = 2x1 − x2,   (10.332)

has at these points the values

f(A) = −3.5,  f(B) = −4,  f(C) = 6,  f(D) = 7,   (10.333)

the maximum value being attained at the point D.
It follows that the solution of problem (10.328) and (10.329) is

maximum (2x1 − x2) = 7.   (10.334)
The same problem (10.328) and (10.329) to which we add the conditions xi ≥ 0, i = 1, 2, may
be solved by the primal Simplex algorithm.
Thus, the system of restrictions (10.329) is replaced by the system

x1 + x2 + x3 = 5,  x2 − x1 + x4 = 4,  x1 − x2 + x5 = 3,
(4/3)x1 + x2 − x6 + x7 = 4,  xi ≥ 0, i = 1, 7,   (10.335)

while problem (10.328) is replaced by

minimum f(x) = minimum (x2 − 2x1) = ?.   (10.336)
We construct the Simplex table.

      x1    x2   x3   x4   x5   x6   x7
     −2      1    0    0    0    0    0 |  0
x3    1      1    1    0    0    0    0 |  5
x4   −1      1    0    1    0    0    0 |  4
x5    1     −1    0    0    1    0    0 |  3
x7   4/3     1    0    0    0   −1    1 |  4

A basic solution is

x1 = 0, x2 = 0, x3 = 5, x4 = 4, x5 = 3, x6 = 0, x7 = 4.   (10.337)
At the first iteration, x1 enters the basis and x5 exits from the basis (the exit of x7 would also
be possible, because 3/1 = 4/(4/3)).
It follows the new table:

      x1   x2    x3   x4    x5   x6   x7
      0    −1     0    0     2    0    0 |  6
x3    0     2     1    0    −1    0    0 |  2
x4    0     0     0    1     1    0    0 |  7
x1    1    −1     0    0     1    0    0 |  3
x7    0    7/3    0    0   −4/3  −1    1 |  0
The new basic solution reads

x1 = 3, x2 = 0, x3 = 2, x4 = 7, x5 = 0, x6 = 0, x7 = 0.   (10.338)

In the next step, x2 enters the basis instead of x3 and we obtain the new Simplex table.
      x1   x2    x3    x4    x5   x6   x7
      0    0    1/2     0   3/2    0    0 |   7
x2    0    1    1/2     0  −1/2    0    0 |   1
x4    0    0     0      1    1     0    0 |   7
x1    1    0    1/2     0   1/2    0    0 |   4
x7    0    0   −7/6     0  −1/6   −1    1 | −7/3
It follows the solution

x1 = 4, x2 = 1, x3 = 0, x4 = 7, x5 = 0, x6 = 7/3, x7 = 0.   (10.339)
We observe that the anomaly which appears in the last line of the Simplex table, that is, the
solution x6 = 0, x7 = −7/3, is due to the way the last relation (10.329) was transformed into the
last equality (10.335). Indeed, we would obtain

(4/3)x1 + x2 ≥ 4,  (4/3)x1 + x2 − x6 = 4,   (10.340)

but not the unit column corresponding to x6. In this situation, we have written x6 → x6 − x7 to
obtain the unit column for the variable x7. In fact, this has been only a trick to start the Simplex
algorithm.
Analogously, we can use the dual Simplex algorithm, obviously after the transformation of
problem (10.335) and (10.336) into the dual problem.
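The result (10.334) can be checked with scipy.optimize.linprog, after rewriting the maximization as a minimization and the ≥ restrictions as ≤ rows; this is an independent check added here, not the book's procedure.

```python
from scipy.optimize import linprog

# maximize 2*x1 - x2  <=>  minimize -2*x1 + x2, with (10.329) as <= rows
c = [-2.0, 1.0]
A_ub = [[1.0, 1.0],           # x1 + x2 <= 5
        [-1.0, 1.0],          # x2 - x1 <= 4
        [1.0, -1.0],          # x2 - x1 >= -3  <=>  x1 - x2 <= 3
        [-4.0 / 3.0, -1.0]]   # x2 + (4/3) x1 >= 4, multiplied by -1
b_ub = [5.0, 4.0, 3.0, -4.0]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)        # expected: [4. 1.] and the maximum 7
```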
10.15 APPLICATIONS
Problem 10.1
Let us consider the model of half of an automobile in Figure 10.2, formed of the bar AB of length
l1 + l2 and of the nonlinear springs 1 and 2. The forces in the two springs are given by f1(z) and
f2(z), respectively, where z is the elongation, while the functions f1 and f2 are odd in the variable
z. The weight of the bar is G, its center of gravity C being at the distance l1 with respect to A,
while its moment of inertia with respect to this center is J. We suppose that the rotation θ of the
bar AB is small and that the springs have the same length in the non-deformed state.
Determine the positions of equilibrium of the bar.
Numerical application for G = 5000 N, f1(z) = f2(z) = f(z), f(z) = kz^p, p = 1 or p = 3,
k = 25000 N/m^p, l1 = 1.5 m, l2 = 2.5 m.
Solution:
1. Theoretical aspects
The system has two degrees of freedom: the displacement x of the center of gravity C and the
rotation θ of the bar. We have denoted the position of the bar in the absence of any deformation
by A0B0.
Figure 10.2 Theoretical model: bar AB with center of gravity C, springs 1 and 2, end
displacements x1 and x2, lengths l1 and l2, displacement x, and rotation θ (A0B0 marks the
non-deformed position).
There result the displacements x1 and x2 of the ends A and B, respectively, in the form

x1 = x − l1θ,  x2 = x + l2θ.   (10.341)

The theorem of momentum leads to the equation

G + f1(x − l1θ) + f2(x + l2θ) = 0,   (10.342)

while the theorem of moment of momentum with respect to the center of gravity C allows us to
write

f1(x − l1θ)l1 − f2(x + l2θ)l2 = 0.   (10.343)

The two equations (10.342) and (10.343) may be put together in the equation

U(x) = U(x, θ) = 0,   (10.344)

where

U(x, θ) = [G + f1(x − l1θ) + f2(x + l2θ)]² + [l1 f1(x − l1θ) − l2 f2(x + l2θ)]².   (10.345)

If the system formed by equations (10.342) and (10.343) has a solution, then equation
(10.344) has a solution too, and conversely. The determination of the solution of equation
(10.344) is thus equivalent to the determination of the minimum of the function U,
given by expression (10.345).
2. Numerical case
For p = 1, we have successively

f1(x − l1θ) = 25000(x − 1.5θ),   (10.346)

f2(x + l2θ) = 25000(x + 2.5θ),   (10.347)

the function U being of the form

U(x, θ) = [5000 + 25000(x − 1.5θ) + 25000(x + 2.5θ)]²
+ {1.5[25000(x − 1.5θ)] − 2.5[25000(x + 2.5θ)]}²   (10.348)

or, equivalently,

U(x, θ) = (5000 + 50000x + 25000θ)² + (−25000x − 212500θ)².   (10.349)
It follows that

U(x, θ) = 3.125 × 10⁹ x² + 4.578125 × 10¹⁰ θ² + 1.3125 × 10¹⁰ xθ
+ 5 × 10⁸ x + 2.5 × 10⁸ θ + 2.5 × 10⁷,   (10.350)

with the solution

θ = −0.011 rad,  x = −0.094 m.   (10.351)
For p = 3, we obtain

U(x, θ) = [5000 + 25000(x − 1.5θ)³ + 25000(x + 2.5θ)³]²
+ {1.5[25000(x − 1.5θ)³] − 2.5[25000(x + 2.5θ)³]}²,   (10.352)

with the solution

θ = 0.0196 rad,  x = −0.47064 m.   (10.353)
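The minimization of U(x, θ) in (10.345) can also be performed with a general-purpose minimizer; the sketch below uses the Nelder–Mead method of scipy. The function and constants follow the statement of the problem, and the minimizers it finds for p = 1 and p = 3 may be compared with (10.351) and (10.353).

```python
import numpy as np
from scipy.optimize import minimize

def U(v, p):
    """The function (10.345) for f1 = f2 = f, f(z) = k z^p."""
    x, theta = v
    k, G, l1, l2 = 25000.0, 5000.0, 1.5, 2.5
    f1 = k * (x - l1 * theta) ** p
    f2 = k * (x + l2 * theta) ** p
    return (G + f1 + f2) ** 2 + (l1 * f1 - l2 * f2) ** 2

for p in (1, 3):
    res = minimize(U, x0=np.array([0.0, 0.0]), args=(p,), method="Nelder-Mead",
                   options={"xatol": 1e-8, "fatol": 1e-8})
    print(p, res.x)   # equilibrium (x, theta) for each spring law
```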
Problem 10.2
Let us consider the linear program

min c^T x   (10.354)

with the restrictions

Bx = b,  x ≥ 0,   (10.355)

where x ∈ M_{n,1}(R), b ∈ M_{m,1}(R), c ∈ M_{n,1}(R), B ∈ M_{m,n}(R).
Let us solve this program in the case m = n − 1, with B a full-rank matrix.
Let us solve this program in the case m = n − 1, while B is a full rank matrix.
Solution: Because B is a full-rank matrix, the components of the vector x may be written as
functions of only one component, say x1, that is,

x2 = α2x1 + β2, . . . , xn = αnx1 + βn.   (10.356)

The function

f(x) = c^T x   (10.357)

now takes the form

f(x) = c1x1 + · · · + cnxn = ax1 + b,   (10.358)

that is, it becomes a linear function in the single unknown x1 (here a and b are scalars resulting
from the substitution).
If a ≥ 0, then obviously,

min f = f(0) = b.   (10.359)
If a < 0, then one considers relation (10.356). If all the coefficients αi, i = 2, n, are positive, then
expression (10.356) does not introduce any upper bound on the variable x1 and the program does
not have an optimal solution. If there exist negative coefficients αj, then from

xj = αjx1 + βj,  xj ≥ 0,   (10.360)

we deduce

αjx1 + βj ≥ 0,   (10.361)

hence

x1 ≤ −βj/αj.   (10.362)

If at least one such βj is strictly negative, then x1 would have to be strictly negative and the linear
program does not have an optimal solution.
It follows that, in the case a < 0, the necessary and sufficient condition for the program to
have an optimal solution is the existence of at least one expression (10.356) with αj < 0,
provided that the expressions of this form have strictly positive coefficients βj.
Let us remark that if relation (10.359) takes place, then the linear program has an optimal solution
if and only if all the values βi ≥ 0, i = 2, n.
FURTHER READING
Ackleh AS, Allen EJ, Kearfott RB, Seshaiyer P (2009). Classical and Modern Numerical Analysis: Theory, Methods and Practice. Boca Raton: CRC Press.
Atkinson K, Han W (2010). Theoretical Numerical Analysis: A Functional Analysis Framework. 3rd ed. New York: Springer-Verlag.
Baldick R (2009). Applied Optimization: Formulation and Algorithms for Engineering Systems. Cambridge: Cambridge University Press.
Berbente C, Mitran S, Zancu S (1997). Metode Numerice. București: Editura Tehnică (in Romanian).
Boyd S, Vandenberghe L (2004). Convex Optimization. Cambridge: Cambridge University Press.
Cheney EW, Kincaid DR (1997). Numerical Mathematics and Computing. 6th ed. Belmont: Thomson.
Chong EKP, Żak SH (2008). An Introduction to Optimization. 3rd ed. Hoboken: John Wiley & Sons, Inc.
Dahlquist G, Björck Å (1974). Numerical Methods. Englewood Cliffs: Prentice Hall.
Dennis JE Jr, Schnabel RB (1987). Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Philadelphia: SIAM.
Diwekar U (2010). Introduction to Applied Optimization. 2nd ed. New York: Springer-Verlag.
Fletcher R (2000). Practical Methods of Optimization. 2nd ed. New York: John Wiley & Sons, Inc.
Golub GH, van Loan CF (1996). Matrix Computations. 3rd ed. Baltimore: Johns Hopkins University Press.
Griva I, Nash SG, Sofer A (2008). Linear and Nonlinear Optimization. 2nd ed. Philadelphia: SIAM.
Hamming RW (2012). Introduction to Applied Numerical Analysis. New York: Dover Publications.
Hoffman JD (1992). Numerical Methods for Engineers and Scientists. New York: McGraw-Hill.
Jazar RN (2008). Vehicle Dynamics: Theory and Applications. New York: Springer-Verlag.
Kharab A, Guenther RB (2011). An Introduction to Numerical Methods: A MATLAB Approach. 3rd ed. Boca Raton: CRC Press.
Kleppner D, Kolenkow RJ (2010). An Introduction to Mechanics. Cambridge: Cambridge University Press.
Lange K (2010a). Numerical Analysis for Statisticians. 2nd ed. New York: Springer-Verlag.
Lanczos C (1949). The Variational Principles of Mechanics. Toronto: University of Toronto Press.
Lange K (2010b). Optimization. New York: Springer-Verlag.
Lawden DF (2006). Analytical Methods of Optimization. 2nd ed. New York: Dover Publications.
Luenberger DG (1997). Optimization by Vector Space Methods. New York: John Wiley & Sons, Inc.
Lurie AI (2002). Analytical Mechanics. New York: Springer-Verlag.
Marciuk GI (1983). Metode de Analiză Numerică. București: Editura Academiei Române (in Romanian).
Meriam JL, Kraige LG (2012). Engineering Mechanics: Dynamics. Hoboken: John Wiley & Sons, Inc.
Nocedal J, Wright SJ (2006). Numerical Optimization. 2nd ed. New York: Springer-Verlag.
Pandrea N, Pârlac S, Popa D (2001). Modele pentru Studiul Vibrațiilor Automobilelor. Pitești: Tiparg (in Romanian).
Rao SS (2009). Engineering Optimization: Theory and Practice. 3rd ed. Hoboken: John Wiley & Sons, Inc.
Ridgway Scott L (2011). Numerical Analysis. Princeton: Princeton University Press.
Ruszczyński A (2006). Nonlinear Optimization. Princeton: Princeton University Press.
Sauer T (2011). Numerical Analysis. 2nd ed. London: Pearson.
Stănescu ND (2007). Metode Numerice. București: Editura Didactică și Pedagogică (in Romanian).
Süli E, Mayers D (2003). An Introduction to Numerical Analysis. Cambridge: Cambridge University Press.
Venkataraman P (2009). Applied Optimization with MATLAB Programming. 2nd ed. Hoboken: John Wiley & Sons, Inc.
INDEX
Adams method, 463
Adams predictor-corrector method, 469
fifth-order, 470
fourth-order, 470
third-order, 469
Adams–Bashforth methods, 465
fifth-order, 467
fourth-order, 467
third-order, 467
Adams–Moulton methods, 468
fifth-order, 468
fourth-order, 468
third-order, 468
sufficient condition for convergence, 469
aleatory variable, 151
almost mini–max
approximation, 345
polynomial, 346
approximate osculating polynomial, 339
approximation of functions by trigonometric
functions, 346
Banach theorem, 38
base of a vector space, 124
Bellman
equation, 606
principle, 606
Bernoulli
method, 76
numbers, 395
Bessel
formulae of interpolation, 324
dichotomy, 325
first, 324
quadratic, 325
inequality, 348
Birge–Vieta method, 79
biharmonic
function, 546
polynomials, 546
bipartition method, 17
a posteriori estimation of the error, 19
a priori estimation of the error, 19
convergence, 18
bisection method see bipartition method
Brent algorithm, 582
Broyden–Fletcher–Goldfarb–Shanno method, 593
Budan theorem, 67
Budan–Fourier theorem see Budan theorem
calculation
process, 1
stability, 1
Cauchy
criterion of convergence for a sequence of matrices,
119
problem, 452
correct stated, 452
perturbed, 452
Cauchy–Buniakowski–Schwarz inequality, 117
Cauchy–Lipschitz theorem, 452
characteristic
equation, 131, 153, 156
polynomial, 131
Chebyshev
interpolation polynomials, 340, 407
polynomials, 420
quadrature formulae, 398
theorem
for aleatory variable, 152
for polynomials, 342
Cholesky method, 137
chord method, 20
a posteriori estimation of the error, 23
a priori estimation of the error, 24
convergence, 20
complete sequence, 550
conditions
essential, 551
natural, 551
conjugate directions, 583
algorithm, 583
G-conjugate directions, 584
Constantinescu torque converter, 511
contraction method, 37
a posteriori estimation of the error, 42
a priori estimation of the error, 41
control problem, 605
convex programming, 600
problem, 601
dual problem, 602
Courant number, 532
Cramer rule, 133
Crank–Nicholson
algorithm, 542
method, 542
Crout method, 136
Danilevski method, 157
Davidon–Fletcher–Powell method, 593
Descartes theorem, 65
determinant
calculation
using definition, 111
using equivalent matrices, 112
definition, 111
determination of limits of the roots of polynomials, 55,
58
determination of the minimum, 580
diagonal form of a matrix, 134
Dirichlet
conditions, 349
generalized theorem, 347
theorem, 349
direct power method, 160
accelerated convergence, 163
discriminant of the equation of third degree, 88
displacement method, 166
divided differences, 327
Doolittle method, 136
dynamic programming, 605
eigenvalue, 153
eigenvector, 153
elementary transformations, 593
energetic space, 549
equivalent systems, 593
errors
absolute, 3
approximation, 2
enter data, 1
in integration of ordinary differential equations, 473
propagation, 3
addition, 3
computation of functions, 8
division of two numbers, 7
inversion of a number, 7
multiplication, 5
raising to a negative entire power, 7
subtraction, 8
taking root of pth order, 7
relative, 3
round-off, 3
Euler
formulae of integration
first, 395
second, 396
method, 454
algorithm, 455
determination of the error, 456
modified, 460
predictor-corrector method, 469
variational
equation, 547
problem, 610
Euler–Maclaurin formulae of integration see Euler
formulae of integration
Everett formulae of interpolation, 326
first, 326
second, 326
explicit system, 594
extremum, 609
finite differences, 312
Fletcher–Reeves
algorithm, 590
method, 588
Fourier
approximation of functions see approximation of
functions by trigonometric functions
generalized coefficients, 347
generalized series, 347
method, 568, 571
Frame–Fadeev method, 131
Frobenius form of a matrix, 158
full rank matrix, 141
Galerkin method, 551
Gauss
formulae of interpolation, 322
first, 322
second, 323
method, 133
quadrature formulae, 405
Gauss–Jordan method
for inversion of matrices, 124
for linear systems, 134
Gauss–Legendre quadrature formulae see Gauss
quadrature formulae
Gauss–Seidel method, 147
convergence, 147
error estimation, 148
Gauss type quadrature formulae, 412
Gauss–Hermite, 414
Gauss–Jacobi, 413
Gauss–Laguerre, 415
in which appear the derivatives, 418
with imposed points, 417
generalized power, 316
generalized solution, 549, 551
Givens rotation matrices, 171
golden section algorithm, 580
gradient
conditional gradient method, 602
conjugate gradient method, 587
gradient type methods in optimizations, 585
algorithm, 586
convergence, 587
gradient projection method, 602
method for linear systems, 589
algorithm, 590
method for nonlinear systems, 277
Gram–Schmidt procedure, 406
Grashof formula, 524
Hamilton–Cayley equation see characteristic equation
Hamming predictor-corrector method, 470
Hermite
formula, 331
interpolation, 339
interpolation polynomial, 340
interpolation theorem, 340
polynomials, 408
theorem, 330
Hessian matrix, 577
Hestenes–Stiefel method, 589
Heun method, 460
Horner generalized schema, 70
Householder
matrix, 169
reflexion, 169
vector, 169
Hudde method, 87
improper integrals, 382
calculation, 420
infinite systems of linear equations, 152
completely regular, 152
regular, 152
interpolation, 307
knots, 307
with exponentials, 355
with rational functions, 355
inverse interpolation, 332
determination of the roots of an equation, 333
with arbitrary division points, 333
with equidistant division points, 332
inverse power method, 165
inversion of matrices, 123
by means of the characteristic polynomial, 131
by partition, 125
direct, 123
iterative methods
for inversion of the matrices, 128
a priori estimation of the error, 130
convergence, 129
for linear systems, 142
a posteriori estimation of the error, 146
a priori estimation of the error
convergence
for nonlinear systems, 273
Jacobi
method see iteration method
polynomials, 408
Jacobian, 275
Kantorovich method, 422
Krylov method, 155
Lagrange
function, 601
saddle point, 601
interpolation polynomial, 307
evaluation of the error, 310
existence and uniqueness, 307
method, 69
Laguerre polynomials, 409
Lax–Wendroff schema, 533
least square method
for approximation of functions, 352
for overdetermined systems, 174
for partial differential equations, 355
Legendre polynomials, 400, 407
Leverrier method, 166
Lin methods, 79
first method, 79
second method, 80
linear equivalence method (LEM), 471
first LEM equivalent, 471
second LEM equivalent, 472
linear programming
admissible solution, 595
canonical form
formulation of the problem, 595
geometrical interpretation, 597
optimal solution (program), 595
possible solution, 595
restrictions, 595
standard form, 596
linear transformation, 153
Lobacevski–Graeffe method, 72
case of a pair of complex conjugate roots, 74
case of distinct real roots, 72
Lobacevski–Graeffe method (Continued)
case of two pairs of complex conjugate roots, 75
localization of the minimum, 579
algorithm, 579
L–R method, 166
LU factorization, 135
Markoff formula, 333
Markov chain, 150
mathematical expectation, 151
matrix
symmetric, 137
positive definite, 137
method
of entire series, 280
of one-dimensional search, 583
of penalizing functions, 603
of possible direction, 603
of terms grouping, 59
modulus of a matrix, 114
Milne fourth-order predictor-corrector method, 470
minimization along a direction, 578
minimum, 577
global, 577
local, 577
minimum residual, 175
mini–max
approximations of functions, 344
principle, 344
Moivre formula, 341
Monte Carlo method
for definite integrals, 423
for linear systems, 150
multibody dynamics, 128, 492, 499, 504
multistep methods, 462
explicit (open), 462
implicit (closed), 462
Newton
direction, 590
formula with divided differences, 331
formulae, 166
interpolation polynomials, 317
backward, 319
error, 322
forward, 317
method
for one dimensional case see tangent method
for systems of nonlinear equations, 275
convergence, 276
modified, 276
stopping condition, 276
simplified method, 33
a posteriori estimation of the error, 35
a priori estimation of the error, 35
convergence, 33
theorem, 59
Newton type methods, 590
quasi Newton method, 593
Newton method, 590
Newton–Côtes
error in quadrature formula, 385
quadrature, 384
quadrature formula, 385
Newton–Kantorovich method, 42
a posteriori estimation of the error, 45
a priori estimation of the error, 45
convergence, 42
Newton–Raphson method, 277
norm of a matrix
canonical, 116
definition, 115
1 norm, 116
2 norm, 173, 193
k norm, 116, 169
∞ norm, 116
numerical differentiation, 377
by means of expansion into a Taylor series,
377
approximation error, 379
by means of interpolation polynomials, 380
numerical integration, 382
optimality conditions, 578
optimizations, 577
orthogonal
matrix, 170
polynomials, 406
properties, 410
overdetermined systems, 174
Parseval relation (equality), 348
partial differential equations of first order, 529
characteristic hypersurfaces, 530
characteristic system, 530
homogeneous, 530
numerical solution with explicit schemata, 530
numerical solution with implicit schemata, 530,
533
partial differential equations of second order, 534
of elliptic type, 534
of hyperbolic type, 543
algorithm, 546
of parabolic type, 538
method with differences backward, 540
method with differences forward, 539
Peano theorem, 452
point matching method, 546
Poisson equation, 534
algorithm, 537
Polak–Ribière method, 588
polydine cam, 367
Pontryagin principle of maximum, 607
Powell algorithm, 585
predictor-corrector methods, 469
Prony method, 355
pseudo-inverse of a matrix, 177
QR decomposition, 169
QR factorization, 170
quadratic programming, 603
optimality criterion, 604
quadrature formula, 384
rank of a matrix, 113
definition, 113
calculation, 113
relaxation method, 149
remainders of series of matrices, 123
Richardson
formula of extrapolation, 396
method of integration, 542
Ritz method, 549
Romberg
formula of integration, 396
procedure, 398
rotation method, 168
Runge function, 362
Runge–Kutta methods, 458
of the fourth order, 460
of the sixth order, 460
of the mean point, 459
of the third order, 460
Runge–Kutta–Fehlberg method, 461
Schneider formula, 524
Schultz
conditions to determine the inverse of a matrix, 138
method
for inversion of the matrices, see iterative method for inversion of the matrices
for solving systems of linear equations, 137
Schur
complement, 127
method of inversion of the matrices, 127
secant method, see chord method
separation of roots, 60
sequence of matrices, 119
convergence, 119
limit, 119
series of matrices, 120
absolute convergence, 120
convergence, 120
similar matrices, 155
simplex algorithm, 597
dual, 599
primal, 597
Simpson
error for the formula of quadrature, 389
formula of quadrature, 389
generalized formula of quadrature, 389
singular value decomposition (SVD), 172
theorem, 173
solution
basic, 595
nondegenerate, 595
spectrum of a matrix, 153
spline
cubic spline function
with free boundary, 336
algorithm, 338
uniqueness, 337
with imposed boundary, 336
algorithm, 339
uniqueness, 338
functions, 335
interpolation, 335
Steffensen formula of interpolation, 326
Stirling formula of interpolation, 324
Sturm
sequence, 66
theorem, 67
substitution lemma, 124
system of normal equations, 175
tangent method, 26
a posteriori estimation of the error, 29
a priori estimation of the error, 29
convergence, 27
procedure of choice of the start point, 32
Taylor
method, 457
polynomials, 311
theorem, 311
trapezoid
error for the formula of quadrature, 387
formula of quadrature, 386
generalized formula of quadrature, 388
triangular form of a linear system, 133
trust region, 591
algorithm, 591
underdetermined linear systems, 178
variable
principal (basic), 595
secondary (nonbasic), 595
variational methods for partial differential equations, 547
vector
orthogonal, 172
orthonormal, 172
space, 124
Viète relations, 72
Waring theorem, 62
wave propagation equation, 554
Wendroff schema, 555
  • 1. NUMERICAL ANALYSIS WITH APPLICATIONS IN MECHANICS AND ENGINEERING
  • 3. NUMERICAL ANALYSIS WITH APPLICATIONS IN MECHANICS AND ENGINEERING PETRE TEODORESCU NICOLAE-DORU ST ˘ANESCU NICOLAE PANDREA
  • 4. Copyright  2013 by The Institute of Electrical and Electronics Engineers, Inc. Published by John Wiley & Sons, Inc., Hoboken, New Jersey. All rights reserved Published simultaneously in Canada No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission. Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages. For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002. Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com. Library of Congress Cataloging-in-Publication Data: Teodorescu, P. P. Numerical Analysis with Applications in Mechanics and Engineering / Petre Teodorescu, Nicolae-Doru Stanescu, Nicolae Pandrea. pages cm ISBN 978-1-118-07750-4 (cloth) 1. Numerical analysis. 2. Engineering mathematics. I. Stanescu, Nicolae-Doru. II. Pandrea, Nicolae. III. Title. QA297.T456 2013 620.001 518–dc23 2012043659 Printed in the United States of America ISBN: 9781118077504 10 9 8 7 6 5 4 3 2 1
  • 5. CONTENTS Preface xi 1 Errors in Numerical Analysis 1 1.1 Enter Data Errors, 1 1.2 Approximation Errors, 2 1.3 Round-Off Errors, 3 1.4 Propagation of Errors, 3 1.4.1 Addition, 3 1.4.2 Multiplication, 5 1.4.3 Inversion of a Number, 7 1.4.4 Division of Two Numbers, 7 1.4.5 Raising to a Negative Entire Power, 7 1.4.6 Taking the Root of pth Order, 7 1.4.7 Subtraction, 8 1.4.8 Computation of Functions, 8 1.5 Applications, 8 Further Reading, 14 2 Solution of Equations 17 2.1 The Bipartition (Bisection) Method, 17 2.2 The Chord (Secant) Method, 20 2.3 The Tangent Method (Newton), 26 2.4 The Contraction Method, 37 2.5 The Newton–Kantorovich Method, 42 2.6 Numerical Examples, 46 2.7 Applications, 49 Further Reading, 52 v
  • 6. vi CONTENTS 3 Solution of Algebraic Equations 55 3.1 Determination of Limits of the Roots of Polynomials, 55 3.2 Separation of Roots, 60 3.3 Lagrange’s Method, 69 3.4 The Lobachevski–Graeffe Method, 72 3.4.1 The Case of Distinct Real Roots, 72 3.4.2 The Case of a Pair of Complex Conjugate Roots, 74 3.4.3 The Case of Two Pairs of Complex Conjugate Roots, 75 3.5 The Bernoulli Method, 76 3.6 The Bierge–Vi`ete Method, 79 3.7 Lin Methods, 79 3.8 Numerical Examples, 82 3.9 Applications, 94 Further Reading, 109 4 Linear Algebra 111 4.1 Calculation of Determinants, 111 4.1.1 Use of Definition, 111 4.1.2 Use of Equivalent Matrices, 112 4.2 Calculation of the Rank, 113 4.3 Norm of a Matrix, 114 4.4 Inversion of Matrices, 123 4.4.1 Direct Inversion, 123 4.4.2 The Gauss–Jordan Method, 124 4.4.3 The Determination of the Inverse Matrix by its Partition, 125 4.4.4 Schur’s Method of Inversion of Matrices, 127 4.4.5 The Iterative Method (Schulz), 128 4.4.6 Inversion by Means of the Characteristic Polynomial, 131 4.4.7 The Frame–Fadeev Method, 131 4.5 Solution of Linear Algebraic Systems of Equations, 132 4.5.1 Cramer’s Rule, 132 4.5.2 Gauss’s Method, 133 4.5.3 The Gauss–Jordan Method, 134 4.5.4 The LU Factorization, 135 4.5.5 The Schur Method of Solving Systems of Linear Equations, 137 4.5.6 The Iteration Method (Jacobi), 142 4.5.7 The Gauss–Seidel Method, 147 4.5.8 The Relaxation Method, 149 4.5.9 The Monte Carlo Method, 150 4.5.10 Infinite Systems of Linear Equations, 152 4.6 Determination of Eigenvalues and Eigenvectors, 153 4.6.1 Introduction, 153 4.6.2 Krylov’s Method, 155 4.6.3 Danilevski’s Method, 157 4.6.4 The Direct Power Method, 160 4.6.5 The Inverse Power Method, 165 4.6.6 The Displacement Method, 166 4.6.7 Leverrier’s Method, 166
  • 7. CONTENTS vii 4.6.8 The L–R (Left–Right) Method, 166 4.6.9 The Rotation Method, 168 4.7 QR Decomposition, 169 4.8 The Singular Value Decomposition (SVD), 172 4.9 Use of the Least Squares Method in Solving the Linear Overdetermined Systems, 174 4.10 The Pseudo-Inverse of a Matrix, 177 4.11 Solving of the Underdetermined Linear Systems, 178 4.12 Numerical Examples, 178 4.13 Applications, 211 Further Reading, 269 5 Solution of Systems of Nonlinear Equations 273 5.1 The Iteration Method (Jacobi), 273 5.2 Newton’s Method, 275 5.3 The Modified Newton’s Method, 276 5.4 The Newton–Raphson Method, 277 5.5 The Gradient Method, 277 5.6 The Method of Entire Series, 280 5.7 Numerical Example, 281 5.8 Applications, 287 Further Reading, 304 6 Interpolation and Approximation of Functions 307 6.1 Lagrange’s Interpolation Polynomial, 307 6.2 Taylor Polynomials, 311 6.3 Finite Differences: Generalized Power, 312 6.4 Newton’s Interpolation Polynomials, 317 6.5 Central Differences: Gauss’s Formulae, Stirling’s Formula, Bessel’s Formula, Everett’s Formulae, 322 6.6 Divided Differences, 327 6.7 Newton-Type Formula with Divided Differences, 331 6.8 Inverse Interpolation, 332 6.9 Determination of the Roots of an Equation by Inverse Interpolation, 333 6.10 Interpolation by Spline Functions, 335 6.11 Hermite’s Interpolation, 339 6.12 Chebyshev’s Polynomials, 340 6.13 Mini–Max Approximation of Functions, 344 6.14 Almost Mini–Max Approximation of Functions, 345 6.15 Approximation of Functions by Trigonometric Functions (Fourier), 346 6.16 Approximation of Functions by the Least Squares, 352 6.17 Other Methods of Interpolation, 354 6.17.1 Interpolation with Rational Functions, 354 6.17.2 The Method of Least Squares with Rational Functions, 355 6.17.3 Interpolation with Exponentials, 355 6.18 Numerical Examples, 356 6.19 Applications, 363 Further Reading, 374
  • 8. viii CONTENTS 7 Numerical Differentiation and Integration 377 7.1 Introduction, 377 7.2 Numerical Differentiation by Means of an Expansion into a Taylor Series, 377 7.3 Numerical Differentiation by Means of Interpolation Polynomials, 380 7.4 Introduction to Numerical Integration, 382 7.5 The Newton–Cˆotes Quadrature Formulae, 384 7.6 The Trapezoid Formula, 386 7.7 Simpson’s Formula, 389 7.8 Euler’s and Gregory’s Formulae, 393 7.9 Romberg’s Formula, 396 7.10 Chebyshev’s Quadrature Formulae, 398 7.11 Legendre’s Polynomials, 400 7.12 Gauss’s Quadrature Formulae, 405 7.13 Orthogonal Polynomials, 406 7.13.1 Legendre Polynomials, 407 7.13.2 Chebyshev Polynomials, 407 7.13.3 Jacobi Polynomials, 408 7.13.4 Hermite Polynomials, 408 7.13.5 Laguerre Polynomials, 409 7.13.6 General Properties of the Orthogonal Polynomials, 410 7.14 Quadrature Formulae of Gauss Type Obtained by Orthogonal Polynomials, 412 7.14.1 Gauss–Jacobi Quadrature Formulae, 413 7.14.2 Gauss–Hermite Quadrature Formulae, 414 7.14.3 Gauss–Laguerre Quadrature Formulae, 415 7.15 Other Quadrature Formulae, 417 7.15.1 Gauss Formulae with Imposed Points, 417 7.15.2 Gauss Formulae in which the Derivatives of the Function Also Appear, 418 7.16 Calculation of Improper Integrals, 420 7.17 Kantorovich’s Method, 422 7.18 The Monte Carlo Method for Calculation of Definite Integrals, 423 7.18.1 The One-Dimensional Case, 423 7.18.2 The Multidimensional Case, 425 7.19 Numerical Examples, 427 7.20 Applications, 435 Further Reading, 447 8 Integration of Ordinary Differential Equations and of Systems of Ordinary Differential Equations 451 8.1 State of the Problem, 451 8.2 Euler’s Method, 454 8.3 Taylor Method, 457 8.4 The Runge–Kutta Methods, 458 8.5 Multistep Methods, 462 8.6 Adams’s Method, 463 8.7 The Adams–Bashforth Methods, 465 8.8 The Adams–Moulton Methods, 467 8.9 Predictor–Corrector Methods, 469 8.9.1 Euler’s Predictor–Corrector Method, 469
  • 9. CONTENTS ix 8.9.2 Adams’s Predictor–Corrector Methods, 469 8.9.3 Milne’s Fourth-Order Predictor–Corrector Method, 470 8.9.4 Hamming’s Predictor–Corrector Method, 470 8.10 The Linear Equivalence Method (LEM), 471 8.11 Considerations about the Errors, 473 8.12 Numerical Example, 474 8.13 Applications, 480 Further Reading, 525 9 Integration of Partial Differential Equations and of Systems of Partial Differential Equations 529 9.1 Introduction, 529 9.2 Partial Differential Equations of First Order, 529 9.2.1 Numerical Integration by Means of Explicit Schemata, 531 9.2.2 Numerical Integration by Means of Implicit Schemata, 533 9.3 Partial Differential Equations of Second Order, 534 9.4 Partial Differential Equations of Second Order of Elliptic Type, 534 9.5 Partial Differential Equations of Second Order of Parabolic Type, 538 9.6 Partial Differential Equations of Second Order of Hyperbolic Type, 543 9.7 Point Matching Method, 546 9.8 Variational Methods, 547 9.8.1 Ritz’s Method, 549 9.8.2 Galerkin’s Method, 551 9.8.3 Method of the Least Squares, 553 9.9 Numerical Examples, 554 9.10 Applications, 562 Further Reading, 575 10 Optimizations 577 10.1 Introduction, 577 10.2 Minimization Along a Direction, 578 10.2.1 Localization of the Minimum, 579 10.2.2 Determination of the Minimum, 580 10.3 Conjugate Directions, 583 10.4 Powell’s Algorithm, 585 10.5 Methods of Gradient Type, 585 10.5.1 The Gradient Method, 585 10.5.2 The Conjugate Gradient Method, 587 10.5.3 Solution of Systems of Linear Equations by Means of Methods of Gradient Type, 589 10.6 Methods of Newton Type, 590 10.6.1 Newton’s Method, 590 10.6.2 Quasi-Newton Method, 592 10.7 Linear Programming: The Simplex Algorithm, 593 10.7.1 Introduction, 593 10.7.2 Formulation of the Problem of Linear Programming, 595 10.7.3 Geometrical Interpretation, 597 10.7.4 The Primal Simplex Algorithm, 597 10.7.5 The Dual Simplex Algorithm, 599
  • 10. x CONTENTS 10.8 Convex Programming, 600 10.9 Numerical Methods for Problems of Convex Programming, 602 10.9.1 Method of Conditional Gradient, 602 10.9.2 Method of Gradient’s Projection, 602 10.9.3 Method of Possible Directions, 603 10.9.4 Method of Penalizing Functions, 603 10.10 Quadratic Programming, 603 10.11 Dynamic Programming, 605 10.12 Pontryagin’s Principle of Maximum, 607 10.13 Problems of Extremum, 609 10.14 Numerical Examples, 611 10.15 Applications, 623 Further Reading, 626 Index 629
  • 11. PREFACE In writing this book, it is the authors’ wish to create a bridge between mathematical and technical disciplines, which requires knowledge of strong mathematical tools in the area of numerical analysis. Unlike other books in this area, this interdisciplinary work links the applicative part of numerical methods, where mathematical results are used without understanding their proof, to the theoretical part of these methods, where each statement is rigorously demonstrated. Each chapter is followed by problems of mechanics, physics, or engineering. The problem is first stated in its mechanical or technical form. Then the mathematical model is set up, emphasizing the physical magnitudes playing the part of unknown functions and the laws that lead to the mathematical problem. The solution is then obtained by specifying the mathematical methods described in the corresponding theoretical presentation. Finally, a mechanical, physical, and technical interpretation of the solution is provided, giving rise to complete knowledge of the studied phenomenon. The book is organized into 10 chapters. Each of them begins with a theoretical presentation, which is based on practical computation—the “know-how” of the mathematical method—and ends with a range of applications. The book contains some personal results of the authors, which have been found to be beneficial to readers. The authors are grateful to Mrs. Eng. Ariadna–Carmen Stan for her valuable help in the pre- sentation of this book. The excellent cooperation from the team of John Wiley & Sons, Hoboken, USA, is gratefully acknowledged. The prerequisites of this book are courses in elementary analysis and algebra, acquired by a student in a technical university. The book is addressed to a broad audience—to all those interested in using mathematical models and methods in various fields such as mechanics, physics, and civil and mechanical engineering; people involved in teaching, research, or design; as well as students. Petre Teodorescu Nicolae-Doru St˘anescu Nicolae Pandrea xi
  • 12. 1 ERRORS IN NUMERICAL ANALYSIS In this chapter, we deal with the most encountered errors in numerical analysis, that is, enter data errors, approximation errors, round-off errors, and propagation of errors. 1.1 ENTER DATA ERRORS Enter data errors appear, usually, if the enter data are obtained from measurements or experiments. In such a case, the errors corresponding to the estimation of the enter data are propagated, by means of the calculation algorithm, to the exit data. We define in what follows the notion of stability of errors. Definition 1.1 A calculation process P is stable to errors if, for any ε > 0, there exists δ > 0 such that if for any two sets I1 and I2 of enter data we have I1 − I2 i < δ, then the two exit sets S1 and S2, corresponding to I1 and I2, respectively, verify the relation S1 − S2 e < ε. Observation 1.1 The two norms i and e of the enter and exit quantities, respectively, which occur in Definition 1.1, depend on the process considered. Intuitively, according to Definition 1.1, the calculation process is stable if, for small variations of the enter data, we obtain small variations of the exit data. Hence, we must characterize the stable calculation process. Let us consider that the calculation process P is characterized by a family fk of functions defined on a set of enter data with values in a set of exit data. We consider such a vector function fk of vector variable fk : D → Rn , where D is a domain in Rm (we propose to have m enter data and n exit data). Definition 1.2 f : D → Rn is a Lipschitz function (has the Lipschitz property) if there exists m > 0, constant, so as to have f(x) − f(y) < m x − y for any x, y ∈ D (the first norm is in Rn and the second one in Rm ). Numerical Analysis with Applications in Mechanics and Engineering, First Edition. Petre Teodorescu, Nicolae-Doru St˘anescu, and Nicolae Pandrea.  2013 The Institute of Electrical and Electronics Engineers, Inc. Published 2013 by John Wiley & Sons, Inc. 1
  • 13. 2 ERRORS IN NUMERICAL ANALYSIS It is easy to see that a calculation process characterized by Lipschitz functions is a stable one. In addition, a function with the Lipschitz property is continuous (even uniform continuous) but the converse does not hold; for example, the function f : R+ → R+, f (x) = √ x, is continuous but it is not Lipschitz. Indeed, let us suppose that f (x) = √ x is Lipschitz, hence that it has a positive constant m > 0 such that |f (x) − f (y)| < m|x − y|, (∀)x, y ∈ R+. (1.1) Let us choose x and y such that 0 < y < x < 1/4m2. Expression (1.1) leads to √ x − √ y < m( √ x − √ y)( √ x + √ y), (1.2) from which we get 1 < m( √ x + √ y). (1.3) From the choice of x and y, it follows that √ x + √ y < 1 4m2 + 1 4m2 = 1 m , (1.4) so that relations (1.3) and (1.4) lead to 1 < m 1 m = 1, (1.5) which is absurd. Hence, the continuous function f : R+ → R+, f (x) = √ x is not a Lipschitz one. 1.2 APPROXIMATION ERRORS The approximation errors have to be accepted by the conception of the algorithms because of various objective considerations. Let us determine the limit of a sequence using a computer; it is supposed that the sequence is convergent. Let the sequence {xn}n∈N be defined by the relation xn+1 = 1 1 + x2 n , n ∈ N, x0 ∈ R. (1.6) We observe that the terms of the sequence are positive, excepting eventually x0. The limit of this sequence, denoted by x, is the positive root of the equation x = 1 1 + x2 . (1.7) If we wish to determine x with two exact decimal digits, then we take an arbitrary value of x0, for example, x0 = 0, and calculate the successive terms of the sequence {xn} (Table 1.1).
  • 14. PROPAGATION OF ERRORS 3 TABLE 1.1 Calculation of x with Two Exact Decimal Digits n xn n xn n xn n xn 0 0 4 0.6028 8 0.6705 12 0.6804 1 1 5 0.7290 9 0.6899 13 0.6836 2 0.5 6 0.6530 10 0.6775 14 0.6815 3 0.8 7 0.7011 11 0.6854 15 0.6828 We obtain x = 0.68 . . . 1.3 ROUND-OFF ERRORS Round-off errors are due to the mode of representation of the data in the computer. For instance, the number 0.8125 in base 10 is represented in base 2 in the form 0.8125 = 0.11012 and the number 0.75 in the form 0.112. Let us suppose that we have a computer that works with three significant digits. The sum 0.8125 + 0.75 becomes 1.5625 = 0.11012 + 0.112 ≈ 0.1102 + 0.112 = 1.1002 = 1.5. (1.8) Such errors may also appear because of the choice of inadequate types of data in the programming realized on the computer. 1.4 PROPAGATION OF ERRORS Let us consider the number x and let x be an approximation of it. Definition 1.3 (i) We call absolute error the expression E = x − x. (1.9) (ii) We call relative error the expression R = x − x x . (1.10) 1.4.1 Addition Let x1, x2, . . . , xn be the numbers for which the relative errors are R1, R2, . . . , Rn, while their absolute errors read E1, E2, . . . , En. The relative error of the sum is R n i=1 xi = n i=1 Ei n i=1 xi (1.11) and we may write the relation min i=1,n |Ri| ≤ R n i=1 xi ≤ max i=1,n |Ri|, (1.12)
  • 15. 4 ERRORS IN NUMERICAL ANALYSIS that is, the modulus of the relative error of the sum is contained between the lower and the higher values in the modulus of the relative errors of the component members. Thus, if the terms x1, x2, . . . , xn are positive and of the same order of magnitude, max i=1,n xi min i=1,n xi < 10, (1.13) then we must take the same number of significant digits for each term xi, i = 1, n, the same number of significant digits occurring in the sum too. If the numbers x1, x2, . . . , xn are much different among them, then the number of the significant digits after the comma is given by the greatest number xi (we suppose that xi > 0, i = 1, n). For instance, if we have to add the numbers x1 = 100.32, x2 = 0.57381, (1.14) both numbers having five significant digits, then we will round off x2 to two digits (as x1) and write x1 + x2 = 100.32 + 0.57 = 100.89. (1.15) It is observed that addition may result in a compensation of the errors, in the sense that the absolute error of the sum is, in general, smaller than the sum of the absolute error of each term. We consider that the absolute error has a Gauss distribution for each of the terms xi, i = 1, n, given by the distribution density φ(x) = 1 σ √ 2π e − x2 2σ2 , (1.16) from which we obtain the distribution function (x) = x −∞ φ(x)dx, (1.17) with the properties (−∞) = 0, (∞) = 1, (x) ∈ (0, 1), −∞ < x < ∞. (1.18) The probability that x is contained between −x0 and x0, with x0 > 0 is P (|x| < x0) = (x0) − (−x0) = x0 −x0 φ(t)dt = √ 2 σ √ π x0 0 e − t2 2σ2 dt. (1.19) Because φ(x) is an even function, it follows that the mean value of a variable with a normal Gauss distribution is xmed = ∞ −∞ xφ(x)dx = 0, (1.20) while its mean square deviation reads (x2 )max = ∞ −∞ x2 φ(x)dx = σ2 . (1.21) Usually, we choose σ as being the mean square root σ = σRMS = 1 n n i=1 R2 i . (1.22)
  • 16. PROPAGATION OF ERRORS 5 1.4.2 Multiplication Let us consider two numbers x1, x2 for which the relative errors are R1, R2, while the approxima- tions are x1, x2, respectively. We have x1x2 = x1(1 + R1)x2(1 + R2) = x1x2(1 + R1 + R2 + R1R2). (1.23) Because R1 and R2 are small, we may consider R1R2 ≈ 0, hence x1x2 = x1x2(1 + R1 + R2), (1.24) so that the relative error of the product of the two numbers reads R(x1x2) = R1 + R2. (1.25) Similarly, for n numbers x1, x2, . . . , xn, of relative errors R1, R2, . . . , Rn, we have R n i=1 xi = n i=1 Ri. (1.26) Let x be a number that may be written in the form x = x∗ × 10r , 1 ≤ x∗ < 10, 10r ≤ x < 10r+1 , x∗ ∈ Z. (1.27) The absolute error is |E| ≤ 10r−n , (1.28) while the relative one is |R| = |E| x ≤ 10r−n+1 x∗ × 10r = 10−n+1 x∗ ≤ 10−n+1 , (1.29) where we have supposed that x has n significant digits. If x is the round-off of x at n significant digits, then |E| ≤ 5 × 10r−n , |R| ≤ 5 x × 10r−n . (1.30) The error of the last significant digit, the nth, is ε = E 10r−n+1 = xR 10r−n+1 = x∗ R × 10n−1 . (1.31) Let x1, x2 now be two numbers of relative errors R1, R2 and let R be the relative error of the product x1x2. We have R = x1x2 − x1x2 x1x2 = R1 + R2 − R1R2. (1.32) Moreover, |R| takes its greatest value if R1 and R2 are negative; hence, we may write |R| ≤ 5 1 x∗ 1 + 1 x∗ 2 × 10−n + 25 x∗ 1 x∗ 2 × 10−2n , (1.33)
  • 17. 6 ERRORS IN NUMERICAL ANALYSIS where the error of the digit on the nth position is |ε(x1x2)| ≤ (x1x2)∗ 2 1 x∗ 1 + 1 x∗ 2 + 5 2 (x1x2)∗ x∗ 1 x∗ 2 × 10−n . (1.34) On the other hand, (x1x2)∗ = x∗ 1 x∗ 2 × 10−p , (1.35) where p = 0 or p = 1, the most disadvantageous case being that described by p = 0. The function φ(x∗ 1 , x∗ 2 ) = 10p 2 (x∗ 1 + x∗ 2 + 5 × 10−n ) (1.36) defined for 1 ≤ x∗ 1 < 10, 1 ≤ x∗ 2 < 10, 1 ≤ x∗ 1 x∗ 2 < 10 will attain its maximum on the frontier of the above domain, that is, for x∗ 1 = 1, x∗ 2 = 10 or x∗ 1 = 10, x∗ 2 = 1. It follows that φ(x∗ 1 , x∗ 2 ) ≤ 10−p 2 (11 + 5 × 10−n ), (1.37) and hence |ε(x1, x2)| ≤ 11 2 + 5 2 × 10−n < 6, (1.38) so that the error of the nth digit of the response will have at the most six units. If x1 = x2 = x, then the most disadvantageous case is given by (x∗ )2 = (x2 )∗ = 10 (1.39) when |ε(x2 )| ≤ 10 1 2 + 5 2 × 10−n < 4, (1.40) that is, the nth digit of x2 is given by an approximation of four units. Let x1, . . . , xm now be m numbers; then ε m i=1 xi ≤ (x1 · · · xm)∗ 2 × 5 × 10−n m i=1 1 + 5 × 10−n x∗ i − 1 , (1.41) the most disadvantageous case being that in which m − 1 numbers x∗ i are equal to 1, while one number is equal 10. In this case, we have ε m i=1 xi ≤ 5 5 × 10−n 1 + 5 × 10−n m−1 1 + 5 × 10−n 10 − 1 . (1.42) If all the m numbers are equal, xi = x, i = 1, m, then the most disadvantageous situation appears for (x∗ )m = (xm )∗ = 10, and hence it follows that |ε(xm )| ≤ 5 5 × 10−n 1 + 5 × 10−m 10 m − 1 . (1.43)
  • 18. PROPAGATION OF ERRORS 7 1.4.3 Inversion of a Number Let x be a number, x its approximation, and R its relative error. We may write 1 x = 1 x(1 + R) = 1 x (1 − R + R2 − R3 + · · ·) ≈ 1 x (1 − R), (1.44) hence R 1 x = 1 x − 1 x 1 x = R, (1.45) so that the relative error remains the same. In general, E 1 x = − E x2 . (1.46) 1.4.4 Division of Two Numbers We may imagine the division of x1 by x2 as the multiplication of x1 by 1/x2, so that R x1 x2 = R(x1) + R(x2); (1.47) hence, the relative errors are summed up. 1.4.5 Raising to a Negative Entire Power We may write R 1 xm = R 1 x 1 x · · · 1 x = m i=1 R 1 x = m i=1 R(x), m ∈ N, m = 0, (1.48) so that the relative errors are summed up. 1.4.6 Taking the Root of pth Order We have, successively, x 1 p = (x + R) 1 p = x 1 p (1 + R) 1 p = x 1 p 1 + R p + 1 p 1 p − 1 R2 2! + 1 p 1 p − 1 1 p − 2 R3 3! + · · · , (1.49) R x 1 p = x 1 p − x 1 p x 1 p ≈ − R p . (1.50) The maximum error for the nth digit is now obtained for x = 10(k−m)/m , x∗ = 1, (x∗ )m = 101−m , m = 1/p, k entire, and is given by ε x∗ 1 p ≤ 101−m 2 × 5 × 10−n [(1 + 5 × 10−n )m − 1] = 10n−m [(1 + 5 × 10−n )m − 1]. (1.51)
  • 19. 8 ERRORS IN NUMERICAL ANALYSIS 1.4.7 Subtraction Subtraction is the most disadvantageous operation if the result is small with respect to the minuend and the subtrahend. Let us consider the subtraction 20.003 − 19.998 in which the first four digits of each number are known with precision; concerning the fifth digit, we can say that it is determined with a precision of 1 unit. It follows that for 20.003 the relative error is R1 ≤ 10−3 20.003 < 5 × 10−5 , (1.52) while for 19.998 the relative error is R1 ≤ 10−3 19.998 < 5.1 × 10−5 . (1.53) The result of the subtraction operation is 5 × 10−3 , while the last digit may be wrong with two units, so that the relative error of the difference is R = 2 × 10−3 5 × 10−3 = 400 × 10−3 , (1.54) that is, a relative error that is approximately 8000 times greater than R1 or R2. It follows the rule that the difference of two quantities must be directly calculated, without previously calculating the two quantities. 1.4.8 Computation of Functions Starting from Taylor’s relation f (x) − f (x) = (x − x)f (ξ), (1.55) where ξ is a point situated between x and x, it follows that the absolute error is |E(f )| ≤ |E| sup ξ∈Int(x,x) |f (ξ)|, (1.56) while the relative error reads |R(f )| ≤ |E| |f (x)| sup ξ∈Int(x,x) |f (ξ)|, (1.57) where Int(x, x) defines the real interval of ends x and x. 1.5 APPLICATIONS Problem 1.1 Let us consider the sequence of integrals In = 1 0 xn ex dx, n ∈ N. (1.58) (i) Determine a recurrence formula for {In}n∈N.
  • 20. APPLICATIONS 9 Solution: To calculate In, n ≥ 1, we use integration by parts and have In = 1 0 xn ex dx = xn ex 1 0 − n 0 0 xn−1 ex dx = e − In−1. (1.59) (ii) Show that lim n→∞ In does exist. Solution: For x ∈ [0, 1] we have xn+1 ex ≤ xn ex , (1.60) hence In+1 ≤ In for any n ∈ N. It follows that {In}n∈N is a decreasing sequence of real numbers. On the other hand, xn ex ≥ 0, x ∈ [0, 1], n ∈ N, (1.61) so that {In}n∈N is a positive sequence of real numbers. We get 0 ≤ · · · ≤ In+1 ≤ In ≤ · · · ≤ I1 ≤ I0, (1.62) so that {In}n∈N is convergent and, moreover, 0 ≤ lim n→∞ In ≤ I0 = 1 0 ex dx = e − 1. (1.63) (iii) Calculate I13. Solution: To calculate the integral we have two methods. Method 1. I0 = 1 0 ex dx = ex 1 0 = e − 1, (1.64) I1 = e − 1I0 = 1, (1.65) I2 = e − 2I1 = e − 2, (1.66) I3 = e − 3I2 = 6 − 2e, (1.67) I4 = e − 4I3 = 9e − 24, (1.68) I5 = e − 5I4 = 120 − 44e, (1.69) I6 = e − 6I5 = 265e − 720, (1.70) I7 = e − 7I6 = 5040 − 1854e, (1.71) I8 = e − 8I7 = 14833e − 40320, (1.72) I9 = e − 9I8 = 362880 − 133496e, (1.73) I10 = e − 10I9 = 1334961e − 3628800, (1.74) I11 = e − 11I10 = 39916800 − 14684570e, (1.75) I12 = e − 12I11 = 176214841e − 479001600, (1.76) I13 = e − 13I12 = 6227020800 − 2290792932e. (1.77) It follows that I13 = 0.1798. (1.78)
  • 21. 10 ERRORS IN NUMERICAL ANALYSIS Method 2. In this case, we replace directly the calculated values, thus obtaining I0 = e − 1 = 1.718281828, (1.79) I1 = e − 1I0 = 1, (1.80) I2 = e − 2I1 = 0.718281828, (1.81) I3 = e − 3I2 = 0.563436344, (1.82) I4 = e − 4I3 = 0.464536452, (1.83) I5 = e − 5I4 = 0.395599568, (1.84) I6 = e − 6I5 = 0.34468442, (1.85) I7 = e − 7I6 = 0.305490888, (1.86) I8 = e − 8I7 = 0.274354724, (1.87) I9 = e − 9I8 = 0.249089312, (1.88) I10 = e − 10I9 = 0.227388708, (1.89) I11 = e − 11I10 = 0.21700604, (1.90) I12 = e − 12I11 = 0.114209348, (1.91) I13 = e − 13I12 = 1.233560304. (1.92) We observe that, because of the propagation of errors, the second method cannot be used to calculate In, n ≥ 12. Problem 1.2 Let the sequences {xn}n∈N and {yn}n∈N be defined recursively: xn+1 = 1 2 xn + 0.5 xn , x0 = 1, (1.93) yn+1 = yn − λ(y2 n − 0.5), y0 = 1. (1.94) (i) Calculate x1, x2, . . . , x7. Solution: We have, successively, x1 = 1 2 x0 + 0.5 x0 = 3 4 , (1.95) x2 = 1 2 x1 + 0.5 x1 = 17 24 , (1.96) x3 = 1 2 x2 + 0.5 x2 = 577 816 , (1.97) x4 = 1 2 x3 + 0.5 x3 = 0.707107, (1.98) x5 = 1 2 x4 + 0.5 x4 = 0.707107, (1.99)
  • 22. APPLICATIONS 11 x6 = 1 2 x5 + 0.5 x5 = 0.707107, (1.100) x7 = 1 2 x6 + 0.5 x6 = 0.707107. (1.101) (ii) Calculate y1, y2, . . . , y7 for λ = 0.49. Solution: There result the values y1 = y0 − 0.49(y2 0 − 0.5) = 0.755, (1.102) y2 = y1 − 0.49(y2 1 − 0.5) = 0.720688, (1.103) y3 = y2 − 0.49(y2 2 − 0.5) = 0.711186, (1.104) y4 = y3 − 0.49(y2 3 − 0.5) = 0.708351, (1.105) y5 = y4 − 0.49(y2 4 − 0.5) = 0.707488, (1.106) y6 = y5 − 0.49(y2 5 − 0.5) = 0.707224, (1.107) y7 = y8 − 0.49(y2 8 − 0.5) = 0.707143. (1.108) (iii) Calculate y1, y2, . . . , y7 for λ = 49. Solution: In this case, we obtain the values y1 = y0 − 49(y2 0 − 0.5) = −23.5, (1.109) y2 = y1 − 49(y2 1 − 0.5) = −27059.25, (1.110) y3 = y2 − 49(y2 2 − 0.5) = −3.587797 × 1010 , (1.111) y4 = y3 − 49(y2 3 − 0.5) = −6.307422 × 1022 , (1.112) y5 = y4 − 49(y2 4 − 0.5) = −1.949395 × 1047 , (1.113) y6 = y5 − 49(y2 5 − 0.5) = −1.862070 × 1096 , (1.114) y7 = y8 − 49(y2 8 − 0.5) = −1.698979 × 10194 . (1.115) We observe that the sequences {xn}n∈N and {yn}n∈N converge to √ 0.5 = 0.707107 for λ = 0.49, while the sequence {yn}n∈N is divergent for λ = 49. Problem 1.3 If the independent aleatory variables X1 and X2 have the density distributions p1(x) and p2(x), respectively, then the aleatory variable X1 + X2 has the density distribution p(x) = ∞ −∞ p1(x − s)p2(s)ds. (1.116) (i) Demonstrate that if the aleatory variables X1 and X2 have a normal distribution by zero mean and standard deviations σ1 and σ2, then the aleatory variable X1 + X2 has a normal distribution. Solution: From equation (1.116) we have p(x) = ∞ −∞ 1 σ1 √ 2π e − (x − s)2 2σ2 1 1 σ2 √ 2π e − x2 2σ2 2 ds = 1 2πσ1σ2 ∞ −∞ e − (x − s)2 2σ2 1 e − s2 2σ2 2 ds. (1.117)
  • 23. 12 ERRORS IN NUMERICAL ANALYSIS We require the values λ1, λ2, and a real, such that (x − s)2 2σ2 1 + s2 2σ2 2 = x2 λ2 1 + (s − ax)2 λ2 2 , (1.118) from which x2 σ2 1 = x2 2 λ2 1 + a2x2 λ2 2 , s2 σ2 1 + s2 σ2 2 = s2 λ2 2 , − 2xs σ2 1 = − 2asx λ2 2 , (1.119) with the solution λ2 2 = σ2 1σ2 2 σ2 1 + σ2 2 , a = σ2 2 σ2 1 + σ2 2 , λ2 1 = σ2 1 + σ2 2. (1.120) We make the change of variable s − ax = √ 2λ2t, ds = √ 2λ2dt (1.121) and expression (1.118) becomes p(x) = 1 2πσ1σ2 ∞ −∞ e − x2 2λ2 1 e−t2 λ2dt = 1 σ2 1 + σ2 2 √ 2π e − x2 2(σ2 1 + σ2 2) . (1.122) (ii) Calculate the mean and the standard deviation of the aleatory variable X1 + X2 of point (i). Solution: We calculate ∞ −∞ xp(x)dx = 1 σ2 1 + σ2 2 √ 2π ∞ −∞ xe − x2 2 σ2 1+σ2 2 dx = 0, (1.123) ∞ −∞ x2 p(x)dx = 1 σ2 1 + σ2 2 √ 2π ∞ −∞ x2 e − x2 2 σ2 1 + σ2 2 dx =   − σ2 1 + σ2 2 √ 2π xe − x2 σ2 1+σ2 2    ∞ −∞ + σ2 1 + σ2 2 √ 2π ∞ −∞ e − x2 2(σ2 1+σ2 2) dx = σ2 1 + σ2 2. (1.124) (iii) Let X be an aleatory variable with a normal distribution, a zero mean, and standard deviation σ. Calculate I1 = 1 σ √ 2π ∞ −∞ e − x2 2σ2 dx (1.125) and I2 = 1 σ √ 2π σ −σ e − x2 2σ2 dx. (1.126) Solution: Through the change of variable x = σ √ 2u, dx = σ √ 2du, (1.127)
  • 24. APPLICATIONS 13 it follows that I1 = 1 σ √ 2π ∞ −∞ e−u2 σ √ 2du = 1. (1.128) Similarly, we have I2 = 1 √ π σ −σ e−u2 du. (1.129) On the other hand, σ −σ e−u2 du = 2π 0 σ 0 e−ρ2 ρdρdθ = π(1 − e−σ2 ), (1.130) so that I2 = 1 − e−σ2 . (1.131) (iv) Let 0 < ε < 1, fixed. Determine R > 0 so that 1 √ π R −R e−x2 dx < ε. (1.132) Solution: Proceeding as with point (iii), it follows that R −R e−x2 dx = π(1 − e−R2 ), (1.133) so that we obtain the inequality 1 − e−R2 < ε, (1.134) from which R < − ln(1 − ε2). (1.135) (v) Calculate I3 = 1 σ √ 2π R −R e − x2 2σ2 dx (1.136) and I4 = 1 σ √ 2π ∞ R e − x2 2σ2 dx (1.137) Solution: We again make the change of variable (1.127) and obtain I3 = 1 √ π R σ √ 2 − R σ √ 2 e−u2 du. (1.138) Point (ii) shows that A −A e−x2 dx = π(1 − e−A2 ), A > 0; (1.139) hence, it follows that I3 = 1 − e − R2 2σ2 . (1.140)
  • 25. 14 ERRORS IN NUMERICAL ANALYSIS On the other hand, we have seen that I1 = 1 and we may write I1 = 1 σ √ 2π 2 ∞ R e − x2 σ2 dx + R −R e − x2 2σ2 dx = 2I4 + I3. (1.141) Immediately, it follows that I4 = I1 − I3 2 = 1 − 1 − e − R2 2σ2 2 . (1.142) (vi) Let X1 and X2 be two aleatory variables with a normal distribution, a zero mean, and standard deviation σ. Determine the density distribution of the aleatory variable X1 + X2, as well as its mean and standard deviation. Solution: It is a particular case of points (i) and (ii); hence, we obtain p(x) = 1 2σ √ π e − x2 4σ2 , (1.143) that is, a normal aleatory variable of zero mean and standard deviation σ √ 2. (vii) Let N1 and N2 be numbers estimated with errors ε1 and ε2, respectively, considered to be aleatory variables with normal distribution, zero mean, and standard deviation σ. Calculate the sum N1 + N2 so that the error is less than a value ε > 0. Solution: The requested probability is given by I = ε −∞ 1 2σ √ π e − x2 4σ2 dx = −ε −∞ 1 2σ √ π e − x2 4σ2 dx + ε −ε 1 2σ √ π e − x2 4σ2 dx. (1.144) Taking into account the previous results, we obtain −ε −∞ 1 2σ √ π e − x2 4σ2 dx = 1 − 1 − e − ε2 4σ2 2 , (1.145) ε −ε 1 2σ √ π e − x2 4σ2 dx = 1 − e − ε2 4σ2 , (1.146) so that I = 1 2 1 + 1 − e − ε2 4σ2 . (1.147) FURTHER READING Acton FS (1990). Numerical Methods that Work. 4th ed. Washington: Mathematical Association of America. Ackleh AS, Allen EJ, Hearfott RB, Seshaiyer P (2009). Classical and Modern Numerical Analysis: Theory, Methods and Practice. Boca Raton: CRC Press. Atkinson KE (1989). An Introduction to Numerical Analysis. 2nd ed. New York: John Wiley & Sons, Inc.
  • 26. FURTHER READING 15 Atkinson KE (2003). Elementary Numerical Analysis. 2nd ed. New York: John Wiley & Sons, Inc. Bakhvalov N (1976). M´ethodes Num´erique. Moscou: Editions Mir (in French). Berbente C, Mitran S, Zancu S (1997). Metode Numerice. Bucures¸ti: Editura Tehnic˘a (in Romanian). Burden RL, Faires L (2009). Numerical Analysis. 9th ed. Boston: Brooks/Cole. Chapra SC (1996). Applied Numerical Methods with MATLAB for Engineers and Scientists. Boston: McGraw-Hill. Cheney EW, Kincaid DR (1997). Numerical Mathematics and Computing. 6th ed. Belmont: Thomson. Dahlquist G, Bj¨orck ´˚A (1974). Numerical Methods. Englewood Cliffs: Prentice Hall. D´emidovitch B, Maron I (1973). ´El´ements de Calcul Num´erique. Moscou: Editions Mir (in French). Epperson JF (2007). An Introduction to Numerical Methods and Analysis. Hoboken: John Wiley & Sons, Inc. Gautschi W (1997). Numerical Analysis: An Introduction. Boston: Birkh¨auser. Greenbaum A, Chartier TP (2012). Numerical Methods: Design, Analysis, and Computer Implemen- tation of Algorithms. Princeton: Princeton University Press. Hamming RW (1987). Numerical Methods for Scientists and Engineers. 2nd ed. New York: Dover Publications. Hamming RW (2012). Introduction to Applied Numerical Analysis. New York: Dover Publications. Heinbockel JH (2006). Numerical Methods for Scientific Computing. Victoria: Trafford Publishing. Higham NJ (2002). Accuracy and Stability of Numerical Algorithms. 2nd ed. Philadelphia: SIAM. Hildebrand FB (1987). Introduction to Numerical Analysis. 2nd ed. New York: Dover Publications. Hoffman JD (1992). Numerical Methods for Engineers and Scientists. New York: McGraw-Hill. Kharab A, Guenther RB (2011). An Introduction to Numerical Methods: A MATLAB Approach. 3rd ed. Boca Raton: CRC Press. Krˆılov AN (1957). Lect¸ii de Calcule prin Aproximat¸ii. Bucures¸ti: Editura Tehnic˘a (in Romanian). Kunz KS (1957). Numerical Analysis. New York: McGraw-Hill. Levine L (1964). Methods for Solving Engineering Problems Using Analog Computers. New York: McGraw-Hill. Marinescu G (1974). Analiz˘a Numeric˘a. Bucures¸ti: Editura Academiei Romˆane (in Romanian). Press WH, Teukolski SA, Vetterling WT, Flannery BP (2007). Numerical Recipes: The Art of Scientific Computing. 3rd ed. Cambridge: Cambridge University Press. Quarteroni A, Sacco R, Saleri F (2010). Numerical Mathematics. 2nd ed. Berlin: Springer-Verlag. Ralston A, Rabinowitz P (2001). A First Course in Numerical Analysis. 2nd ed. New York: Dover Publications. Ridgway Scott L (2011). Numerical Analysis. Princeton: Princeton University Press. Sauer T (2011). Numerical Analysis. 2nd ed. London: Pearson. Simionescu I, Dranga M, Moise V (1995). Metode Numerice ˆın Tehnic˘a. Aplicat¸ii ˆın FORTRAN. Bucures¸ti: Editura Tehnic˘a (in Romanian). St˘anescu ND (2007). Metode Numerice. Bucures¸ti: Editura Didactic˘a s¸i Pedagogic˘a (in Romanian). Stoer J, Bulirsh R (2010). Introduction to Numerical Analysis. 3rd ed. New York: Springer-Verlag.
  • 27. 2 SOLUTION OF EQUATIONS We deal with several methods of approximate solutions of equations, that is, the bipartition method, the chord (secant) method, the tangent method (Newton), and the Newton–Kantorovich method. These are followed by applications. 2.1 THE BIPARTITION (BISECTION) METHOD Let us consider the equation1 f (x) = 0, (2.1) where f : [a, b] → R, a, b ∈ R, a < b, f continuous on [a, b], with a single root α, f (α) = 0, on the interval [a, b]. First, we verify if f (a) = 0 or f (b) = 0; if this occurs, then the algorithm stops. Otherwise, we consider the middle of the interval [a, b], c = (a + b)/2. We verify if c is a solution of equation (2.1); if f (c) = 0, the algorithm stops; if not, we calculate f (c). If f (a) × f (c) < 0, then we consider the interval [a, c] on which we have the true solution; if not, we consider the interval [c, b]. Thus, the interval [a, b] is diminished to [a, c] or [c, b], its new length being equal to (a + b)/2. We thus obtain a new interval [a, b], where a = c or b = c, and we apply the procedure described above. The procedure stops when a certain criterion (e.g., the length of the interval [a, b] is less than a given ε) is fulfilled. 1 The bipartition method is the simplest and most popular method for solving equations. It was known by ancient mathematicians. Numerical Analysis with Applications in Mechanics and Engineering, First Edition. Petre Teodorescu, Nicolae-Doru St˘anescu, and Nicolae Pandrea.  2013 The Institute of Electrical and Electronics Engineers, Inc. Published 2013 by John Wiley & Sons, Inc. 17
  • 28. 18 SOLUTION OF EQUATIONS As we can see from this exposition, the bipartition method consists in the construction of three sequences {an}, {bn}, and {cn}, n ∈ N, as follows: a0 = 0, b0 = b, cn = an + bn 2 , n ≥ 0, an+1 = an for f an × f (cn) < 0 cn otherwise , bn+1 = bn for f cn × f (bn) < 0 cn otherwise . (2.2) The bipartition method is based on the following theorem. Theorem 2.1 The sequences {an}, {bn}, {cn}, n ∈ N, given by formulae (2.2), are convergent, and their limit is the value of the unique real root α of equation (2.1) on the interval [a, b]. Demonstration. Let us show that bn − an = b − a 2n , (2.3) for any n ∈ N. To fix the ideas, we suppose that f (a) < 0 and f (b) > 0. If f (cn−1) < 0, then bn − an = bn−1 − cn−1 = bn−1 − an−1 + bn−1 2 = bn−1 − an−1 2 , (2.4) whereas if f (cn−1) > 0, we get bn − an = cn−1 − an−1 = an−1 + bn−1 2 − an−1 = bn−1 − an−1 2 . (2.5) Hence, in general, bn − an = bn−1 − an−1 2 = bn−2 − an−2 22 = · · · = b0 − a0 2n = b − a 2n . (2.6) It is obvious that an < cn < bn, n ∈ N. (2.7) From the definition of the sequence {an}, n ∈ N, it follows that an+1 = an or an+1 = cn = (an + bn)/2 > an. We may write an+1 ≥ an, n ∈ N; (2.8) hence, the sequence {an}, n ∈ N, is monotone increasing. Analogically, we obtain the relation bn+1 ≤ bn, n ∈ N; (2.9) this means that the sequence {bn}, n ∈ N, is monotone decreasing. Let us have the sequence of relations a = a0 ≤ a1 ≤ · · · ≤ an ≤ · · · ≤ bn ≤ · · · ≤ b1 ≤ b0 = b, (2.10) where the sequence {an}, n ∈ N, is superior bounded by any value bn, n ∈ N; in particular, we can take bn = b. The sequence {bn}, n ∈ N, is inferior bounded by any value an, n ∈ N, in particular by an = a.
  • 29. THE BIPARTITION (BISECTION) METHOD 19 We have stated thus that {an}, n ∈ N, is a monotone increasing and superior bounded (by b) sequence, and hence it is convergent, while the sequence {bn}, n ∈ N is a monotone decreasing and inferior bounded (by a) sequence, and hence it is convergent, too. Let A = lim n→∞ an and B = lim n→∞ bn. Let us show that A = B, that is, that the sequences {an}, {bn}, n ∈ N, have the same limit. We have A − B = lim n→∞ an − lim n→∞ bn = lim n→∞ (an − bn). (2.11) On the other hand, taking into account relation (2.6), we get lim n→∞ (an − bn) = lim n→∞ a − b 2n = 0. (2.12) The last two expressions show that A − B = 0, hence A = B. Let A = lim n→∞ an = lim n→∞ bn. Applying now the tongs theorem for the sequences {an}, {bn}, and {cn}, n ∈ N, and taking into account (2.7), it follows that the sequence {cn}, n ∈ N, is convergent and lim n→∞ cn = A. Let us show that f (A) = 0. We have f (A) = f lim n→∞ an = lim n→∞ f (an) ≤ 0, (2.13) f (A) = f lim n→∞ bn = lim n→∞ f (bn) ≥ 0, (2.14) where if f is continuous, then the function commutes into the limit. The last two expressions lead to f (A) = 0, and hence A is the root α of the equation f (x) = 0 on the interval [a, b]. To determine the corresponding error, we can proceed in two modes. In the first method, we start from the evident relations |an − bn| = 2|an − cn|, |an − bn| = 2|bn − cn|, (2.15) where an = cn−1, or bn = cn−1, from which we obtain |an − bn| = 2|cn−1 − cn|, (2.16) so that |cn − α| < |an − bn| = 2|cn−1 − cn|. (2.17) To determine the solution α with an error ε, we must calculate the terms of the sequence {cn}, n ∈ N, until the relation 2|cn−1 − cn| < ε (2.18) is fulfilled. We then have an a posteriori estimation of the error. In the second method, we start from the relation |cn − α| < bn − an = b − a 2n . (2.19) To determine now the solution α with an error ε, we must calculate n terms of the sequence {cn}, n ∈ N, so that b − a 2n < ε ⇒ n = ln b−a ε ln 2 + 1, (2.20) where the brackets represent the entire part of the fraction. We then have an a priori estimation of the error.
  • 30. 20 SOLUTION OF EQUATIONS Observation 2.1 If equation (2.1) has several roots on the interval [a, b], the above algorithm leads to one of these roots. 2.2 THE CHORD (SECANT) METHOD Let us consider the equation2 f (x) = 0, (2.21) where f : [a, b] ⊂ R → R, f continuous on [a, b], with a single root on the interval [a, b].We construct a point c ∈ [a, b] using the following rule. Let us consider the straight line AB, where A(a, f (a)) and B(b, f (b)). The equation of this line, denoted by (d) in Figure 2.1, is y − f (a) f (b) − f (a) = x − a b − a . (2.22) The value of the abscissa c of the intersection point of the straight line (d) with the Ox-axis is given by the equation −f (a) f (b) − f (a) = c − a b − a , (2.23) from which we obtain c = af (b) − bf (a) f (b) − f (a) . (2.24) The method consists firstly in verifying if a or b is solution of equation (2.21). If f (a) = 0 or f (b) = 0, then the procedure stops. Otherwise, we determine c using formula (2.24). If f (c) = 0, then the algorithm ends, the required solution of equation (2.21) being found. If f (c) = 0, then we calculate the products. If f (a) · f (c) < 0, then the solution is in the interval [a, c], and if f (a) · f (c) > 0, then, obviously, we consider the interval [c, b]. Thus, the interval [a, b] is replaced by one of the intervals [a, c] or [c, b], the length of which is strictly smaller than that of the interval [a, b]. A(a, f(a)) f(x) O x c ba B(b, f(b)) y Figure 2.1 The chord method. 2The method was known by the Babylonian and Egyptian mathematicians in different forms. It also appears (as regula falsi) in the papers of Abu Kamil (tenth century), Qusta ibn Luqa (tenth century), and Leonardo of Pisa (Fibonacci, 1202).
  • 31. THE CHORD (SECANT) METHOD 21 The chord method involves the construction of three sequences {an}, {bn}, and {c}, n ∈ N, defined recurrently as follows: a0 = a, b0 = b, cn = anf (bn) − bnf (an) f (bn) − f (an) , an+1 = an if f an · f (cn) < 0 cn otherwise ; bn+1 = bn if f cn · f (bn) < 0 cn otherwise . (2.25) Theorem 2.2 Let f : [a, b] → R, f ∈ C0 ([a, b]), f with a single root in the interval [a, b]. Under these conditions, the sequence {cn}, defined by relations (2.25), converges to α, the unique solution of equation (2.21) in the interval [a, b]. Demonstration. The sequences {an} and {bn}, n ∈ N, satisfy the relation an < bn, (∀) n ∈ N. (2.26) Indeed, for n = 0 we have a0 = a < b = b0. On the other hand, if f (cn−1) = 0, then we have an − bn = cn−1 − bn = an−1f (bn−1) − bn−1f (an−1) f (bn−1) − f (an−1) − bn−1 = f (bn−1)(an−1 − bn−1) f (bn−1) − f (an−1) (2.27) for an = cn−1, and an − bn = an−1 − cn−1 = an−1 − an−1f (bn−1) − bn−1f (an−1) f (bn−1) − f (an−1) = − f (an−1) f (bn−1) − f (an−1) (2.28) for bn = cn−1, respectively. Let us suppose that f (a) < 0 and f (b) > 0, which leads to f (an) < 0 and f (bn) > 0, (∀) n ∈ N, respectively. In this case, it follows that an − bn has the same sign as an−1 − bn−1. By complete induction we obtain an < bn, hence relation (2.26) is true. We have an < cn < bn, (∀)n ∈ N. (2.29) Indeed, we can write an − cn = an − anf (bn) − bnf (an) f (bn) − f (an) = − f (an)(an − bn) f (bn) − f (an) < 0 (2.30) and bn − cn = bn − anf (bn) − bnf (an) f (bn) − f (an) = − f (bn)(bn − an) f (bn) − f (an) > 0, (2.31) respectively, hence relation (2.29) is true. We thus show that the sequence {an}n∈N is monotone increasing and superior bounded by any element bn of the sequence {bn}n∈N, in particular by b0 = b. Hence, the sequence {an}n∈N is conver- gent, and let A be its limit. Analogically, the sequence {bn}n∈N is monotone decreasing and inferior bounded by any element of the sequence {an}n∈N, particularly by a0 = a; hence, the sequence {bn}n∈N is convergent, and let B be its limit. We thus obtain lim n→∞ an = A, lim n→∞ bn = B, A ≤ B, A, B ∈ [a, b]. (2.32)
  • 32. 22 SOLUTION OF EQUATIONS Let us show now that the sequence {cn} is convergent. Case 2.1 We suppose that A = B. Using inequality (2.29) and passing to the limit, we obtain A = lim n→∞ an ≤ lim n→∞ cn ≤ lim n→∞ bn = B (2.33) and the theorem of tongs leads to lim n→∞ cn = A = B. (2.34) On the other hand, f (A) = f ( lim n→∞ an) = lim n→∞ f (an) ≤ 0 (2.35) and f (B) = f ( lim n→∞ bn) = lim n→∞ f (bn) ≥ 0; (2.36) because of the continuity of f , the limit commutes with the function. It follows from equation (2.35) and equation (2.36) that f (A) = f ( lim n→∞ cn) = 0 and, because f has a single root in the interval [a, b], we deduce that A = α, hence lim n→∞ cn = α. Case 2.2 We suppose that A = B. Let us assume at the very beginning that it is not possible to have f (A) = f (B) = 0 because f has only one root in the interval [a, b]. Hence, f (A) = f (B). Let us now pass to the limit for cn = anf (bn) − bnf (an) f (bn) − f (an) . (2.37) We get lim n→∞ cn = lim n→∞ anf (bn) − bnf (an) f (bn) − f (an) = Af (B) − Bf (A) f (B) − f (A) . (2.38) If f (A) = 0 and f (B) = 0, then relation (2.38) leads to lim n→∞ cn = Af (B) − B · 0 f (B) − 0 = A, (2.39) hence cn → α. If f (B) = 0 and f (A) = 0, then relation (2.38) leads to lim n→∞ cn = A · 0 − Bf (A) 0 − f (A) = B, (2.40) so that we get once more cn → α. Finally, if f (A) = 0 and f (B) = 0, it is obvious that f (A) < 0 and f (B) > 0. On the other hand, the inequalities A < Af (B) − Bf (A) f (B) − f (A) < B (2.41) hold, which is evident because they lead to −Af (A) < −Bf (A) and Af (B) < Bf (B). (2.42)
Passing now to the limit in equation (2.37) and taking into account inequality (2.41), we get
$$A < \lim_{n\to\infty} c_n < B. \quad (2.43)$$
On the other hand, we have $a_{n+1} = c_n$ or $b_{n+1} = c_n$ for any $n \ge 0$. Hence
$$\{c_n \mid n \in \mathbb{N}\} \subset \{a_m \mid m \in \mathbb{N}^*\} \cup \{b_m \mid m \in \mathbb{N}^*\} \subset (-\infty, A) \cup (B, +\infty), \quad (2.44)$$
from which
$$\lim_{n\to\infty} c_n \in (-\infty, A] \cup [B, \infty). \quad (2.45)$$
Relations (2.43) and (2.45) are in contradiction, so that this case is not possible, and the theorem is proved.

Theorem 2.3 (a posteriori estimation of the error). Let $f : [a,b] \to \mathbb{R}$ be continuous on $[a,b]$ and differentiable on $(a,b)$; we suppose that $f$ has a single root $\alpha$ in $[a,b]$ and that there exist the real, strictly positive constants $m > 0$, $M > 0$ such that
$$m \le |f'(x)| \le M, \quad (\forall)\, x \in (a,b). \quad (2.46)$$
Under these conditions, the relation
$$|c_{n-1} - \alpha| \le \frac{M}{m}|c_n - c_{n-1}|, \quad (2.47)$$
which represents the a posteriori estimation of the error in the chord method, holds.

Demonstration. Assuming that $f(c_{n-1}) \neq 0$, we can write
$$c_n - c_{n-1} = \frac{a_n f(b_n) - b_n f(a_n)}{f(b_n) - f(a_n)} - a_n = \frac{f(a_n)(a_n - b_n)}{f(b_n) - f(a_n)} \quad (2.48)$$
if $f(c_{n-1}) < 0$, and
$$c_n - c_{n-1} = \frac{a_n f(b_n) - b_n f(a_n)}{f(b_n) - f(a_n)} - b_n = \frac{f(b_n)(a_n - b_n)}{f(b_n) - f(a_n)} \quad (2.49)$$
if $f(c_{n-1}) > 0$, respectively (in the first case $a_n = c_{n-1}$, in the second $b_n = c_{n-1}$). Let us apply now Lagrange's finite increments formula to the function $f$ on the interval $[a_n, b_n]$. Hence there exists $\xi \in (a_n, b_n)$ such that
$$f(b_n) - f(a_n) = f'(\xi)(b_n - a_n). \quad (2.50)$$
From equations (2.48), (2.49), and (2.50) we get
$$c_n - c_{n-1} = -\frac{f(a_n)}{f'(\xi)} \quad\text{for } f(c_{n-1}) < 0 \quad (2.51)$$
or
$$c_n - c_{n-1} = -\frac{f(b_n)}{f'(\xi)} \quad\text{for } f(c_{n-1}) > 0. \quad (2.52)$$
Let us now apply Lagrange's formula to the restriction of the function $f$ to the interval $[a_n, \alpha]$. Hence there exists $\xi_n \in (a_n, \alpha)$ such that
$$f(\alpha) - f(a_n) = f'(\xi_n)(\alpha - a_n); \quad (2.53)$$
because $f(\alpha) = 0$, we get
$$-f(a_n) = f'(\xi_n)(\alpha - a_n). \quad (2.54)$$
Obviously, in the case $f(c_{n-1}) > 0$ we apply Lagrange's formula to the restriction of the function $f$ to the interval $(\alpha, b_n)$, the calculation being analogous. From equations (2.51) and (2.54) it follows that
$$c_n - c_{n-1} = \frac{f'(\xi_n)(\alpha - a_n)}{f'(\xi)}. \quad (2.55)$$
On the other hand, since $a_n = c_{n-1}$ in this case, we can write the relations
$$|\alpha - a_n| = |\alpha - c_{n-1}|, \quad (2.56)$$
$$|f'(\xi_n)| \ge m, \quad |f'(\xi)| \le M, \quad (2.57)$$
so that, by applying the modulus, expression (2.55) leads to
$$|\alpha - c_{n-1}| \le \frac{M}{m}|c_n - c_{n-1}|. \quad (2.58)$$

Theorem 2.4 (a priori estimation of the error). Let $f : [a,b] \to \mathbb{R}$, $f$ having a single root $\alpha$ in $[a,b]$. If $f$ is convex, strictly increasing, and differentiable on $[a,b]$, and if $f'(a) > 0$, then the relation
$$\alpha - c_n \le \left(1 - \frac{f'(a)}{f'(b)}\right)^n(\alpha - c_0) \le \left(1 - \frac{f'(a)}{f'(b)}\right)^n(b - a) \quad (2.59)$$
holds.

Demonstration. Because $f$ is convex, we deduce that $f'$ is strictly increasing on $[a,b]$, so that we have
$$f'(a) < f'(x) < f'(b), \quad (\forall)\, x \in (a,b). \quad (2.60)$$
From equation (2.37), taking into account that $f$ is convex and supposing that $f(a) < 0$, $f(b) > 0$ (so that the chord is always drawn from the fixed right end $b$), we obtain
$$\alpha - c_n = \alpha - \frac{c_{n-1}f(b) - b f(c_{n-1})}{f(b) - f(c_{n-1})} = \frac{f(b)(\alpha - c_{n-1}) - f(c_{n-1})(\alpha - b)}{f(b) - f(c_{n-1})}. \quad (2.61)$$
We now apply Lagrange's theorem to the function $f$ on the interval $[\alpha, b]$; hence there exists $\xi \in (\alpha, b)$ such that
$$f(b) - f(\alpha) = f'(\xi)(b - \alpha). \quad (2.62)$$
Analogously, applying Lagrange's formula to the function $f$ on the interval $[c_{n-1}, \alpha]$, it results in the existence of $\zeta \in (c_{n-1}, \alpha)$, for which we can write
$$f(\alpha) - f(c_{n-1}) = f'(\zeta)(\alpha - c_{n-1}). \quad (2.63)$$
Figure 2.2 Modified chord method.

Because $f(\alpha) = 0$, expressions (2.62) and (2.63) take the simpler forms
$$f(b) = f'(\xi)(b - \alpha), \quad (2.64)$$
$$-f(c_{n-1}) = f'(\zeta)(\alpha - c_{n-1}), \quad (2.65)$$
respectively. Replacing the last two relations in formula (2.61), we obtain
$$\alpha - c_n = \frac{f'(\xi) - f'(\zeta)}{f(b) - f(c_{n-1})}(b - \alpha)(\alpha - c_{n-1}). \quad (2.66)$$
Because $\zeta < \xi$ and $f'$ is strictly increasing, we get
$$f'(\xi) - f'(\zeta) > 0. \quad (2.67)$$
On the other hand, $f(b) > 0$ and $f(c_{n-1}) < 0$. Relation (2.66) now leads to
$$\alpha - c_n \le \frac{f'(\xi) - f'(\zeta)}{f(b)}(b - \alpha)(\alpha - c_{n-1}). \quad (2.68)$$
Replacing relation (2.64) in the last formula, we get
$$\alpha - c_n \le \frac{f'(\xi) - f'(\zeta)}{f'(\xi)}(\alpha - c_{n-1}) = \left(1 - \frac{f'(\zeta)}{f'(\xi)}\right)(\alpha - c_{n-1}) \le \left(1 - \frac{f'(a)}{f'(b)}\right)(\alpha - c_{n-1}). \quad (2.69)$$
If we write relation (2.69) for $n-1, n-2, \ldots, 1$, it results in
$$\alpha - c_n \le \left(1 - \frac{f'(a)}{f'(b)}\right)^n(\alpha - c_0) \le \left(1 - \frac{f'(a)}{f'(b)}\right)^n(b - a), \quad (2.70)$$
and the theorem is proved.

A variant of this method supposes the division by 2, at each step, of the value of the function at the end which is kept fixed. The situation is presented graphically in Figure 2.2. In the case considered in the figure, we obtain successively
$$c_0 = \frac{a f(b) - b f(a)}{f(b) - f(a)}, \quad c_1 = \frac{c_0\frac{f(b)}{2} - b f(c_0)}{\frac{f(b)}{2} - f(c_0)}, \quad c_2 = \frac{c_1\frac{f(b)}{4} - b f(c_1)}{\frac{f(b)}{4} - f(c_1)}, \quad c_3 = \frac{c_2\frac{f(b)}{8} - b f(c_2)}{\frac{f(b)}{8} - f(c_2)}. \quad (2.71)$$
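Before passing to the tangent method, a short implementation sketch may help. The following Python function follows recurrence (2.25) literally; the name `chord`, the tolerance `tol`, and the iteration cap `max_iter` are illustrative choices, not taken from the text.

```python
def chord(f, a, b, tol=1e-10, max_iter=100):
    """Chord (secant) method, recurrence (2.25): keeps a bracket [a_n, b_n]
    of the root and replaces one of its ends by c_n at every step."""
    fa, fb = f(a), f(b)
    if fa * fb >= 0:
        raise ValueError("f must change sign on [a, b]")
    c_old = a
    for _ in range(max_iter):
        c = (a * fb - b * fa) / (fb - fa)   # the chord meets the Ox-axis at c_n
        fc = f(c)
        if fc == 0 or abs(c - c_old) < tol:
            return c
        if fa * fc < 0:                     # root in [a_n, c_n]: move b
            b, fb = c, fc
        else:                               # root in [c_n, b_n]: move a
            a, fa = c, fc
        c_old = c
    return c
```

Applied to the worked example of Section 2.6 below, `chord(lambda x: x - 0.5 * (1 - math.sin(x)), 0, 1)` (after `import math`) reproduces the value $\bar x \approx 0.335418$ of Table 2.2.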
Figure 2.3 The tangent method.

2.3 THE TANGENT METHOD (NEWTON)

Let us consider the equation³
$$f(x) = 0, \quad (2.72)$$
with the root $\bar x$, and let $(\bar x - \ell, \bar x + \ell)$ be the interval on which equation (2.72) has a single solution (obviously $\bar x$). Let us consider the point $x_0 \in (\bar x - \ell, \bar x + \ell)$ and construct the tangent to the graph of the function $f$ at the point $(x_0, f(x_0))$ (Fig. 2.3); the corresponding equation is
$$y - f(x_0) = f'(x_0)(x - x_0). \quad (2.73)$$
The point of intersection with the $Ox$-axis is given by
$$-f(x_0) = f'(x_0)(x_1 - x_0), \quad (2.74)$$
from which
$$x_1 = x_0 - \frac{f(x_0)}{f'(x_0)}. \quad (2.75)$$
The last formula allows the construction of a recurrent sequence $\{x_n\}_{n\in\mathbb{N}}$ in the form
$$x_0 \in (\bar x - \ell, \bar x + \ell), \quad x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}. \quad (2.76)$$
The tangent method consists in the construction of the terms of the sequence $\{x_n\}_{n\in\mathbb{N}}$ until a certain stopping criterion is satisfied, that is, until we obtain $\bar x$ or until the modulus of the difference between two consecutive terms $x_n$ and $x_{n+1}$ of the sequence is smaller than an a priori given $\varepsilon$.

³The method is sometimes called the Newton–Raphson method. It appears in De analysi per aequationes numero terminorum infinitas (1711) by Isaac Newton (1642–1727), used for finding polynomial roots; in De methodis fluxionum et serierum infinitarum (1736) by Isaac Newton, again for polynomial roots; in A Treatise of Algebra both Historical and Practical (1690) by John Wallis; and in Analysis aequationum universalis (1690) by Joseph Raphson (circa 1648–circa 1715). The general case of the method for arbitrary equations was given by Thomas Simpson in 1740.
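A minimal Python sketch of recurrence (2.76) follows; the stopping test on $|x_{n+1} - x_n| < \varepsilon$ is the criterion just described, while the name `newton` and the safeguard `max_iter` are illustrative additions.

```python
import math

def newton(f, fprime, x0, eps=1e-12, max_iter=50):
    """Tangent (Newton) method: x_{n+1} = x_n - f(x_n)/f'(x_n), relation (2.76)."""
    x = x0
    for _ in range(max_iter):
        x_new = x - f(x) / fprime(x)
        if abs(x_new - x) < eps:   # two consecutive terms closer than eps
            return x_new
        x = x_new
    return x

# The equation solved in Section 2.6: f(x) = x - 0.5*(1 - sin x), f'(x) = 1 + 0.5*cos x
print(newton(lambda x: x - 0.5 * (1 - math.sin(x)),
             lambda x: 1 + 0.5 * math.cos(x), 0.0))   # ~0.335418, cf. Table 2.3
```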
Figure 2.4 The tangent at the point x* to the graph of the function f is horizontal and does not intersect the Ox-axis.

Therefore, we state the following theorem.

Theorem 2.5 Let us consider the function $f : (\bar x - \ell, \bar x + \ell) \to \mathbb{R}$, with $f(\bar x) = 0$, which has a single root in the interval $(\bar x - \ell, \bar x + \ell)$. Let us suppose that $f$ is twice differentiable on $(\bar x - \ell, \bar x + \ell)$ and that there exist the real, strictly positive constants $\alpha > 0$, $\beta > 0$ such that $|f'(x)| \ge \alpha$, $|f''(x)| \le \beta$ for any $x \in (\bar x - \ell, \bar x + \ell)$. If we denote by $\lambda$ the value $\min\{\ell, 2\alpha/\beta\}$, then, for any $x_0 \in (\bar x - \lambda, \bar x + \lambda)$, the sequence $\{x_n\}_{n\in\mathbb{N}}$ defined by relations (2.76) converges to $\bar x$.

Demonstration. Let us observe first that, because of the hypothesis of the existence of the constant $\alpha > 0$, the derivative $f'$ does not vanish on the interval $(\bar x - \ell, \bar x + \ell)$. Hence the situation considered in Figure 2.4, in which the tangent to the graph of the function $f$ at the point $x^*$ is horizontal, cannot occur under the hypotheses considered.

Taking into account the existence of the constant $\beta > 0$, we can state that if there exists $x' \in (\bar x - \ell, \bar x + \ell)$ such that $|f'(x')| < \infty$, then we have $|f'(x)| < \infty$ for any $x \in (\bar x - \ell, \bar x + \ell)$. Indeed, let us apply Lagrange's formula of finite increments to the function $f'$ on the interval defined by the ends $x$ and $x'$. We deduce the existence of a point $\xi$ in this interval for which we have
$$|f'(x) - f'(x')| = |x - x'| \cdot |f''(\xi)| \le \beta|x - x'| \le 2\beta\ell, \quad (2.77)$$
from which
$$f'(x') - 2\beta\ell \le f'(x) \le f'(x') + 2\beta\ell, \quad (2.78)$$
hence $|f'(x)| < \infty$. Thus, the hypotheses of the theorem exclude the situation in Figure 2.5, which would lead, for $x = x^*$, to a stationary iteration sequence in (2.76) ($x_n = x_{n+k}$ for any $k \ge 1$).

Figure 2.5 The tangent at the point x* to the graph of the function f is vertical; by iterating relation (2.76) from x = x* we get xn = xn+k for any k ≥ 1.
We cannot have $|f'(x)| = \infty$ for any $x \in (\bar x - \ell, \bar x + \ell)$, because the graph of $f$ would then be a vertical straight line passing through $x^*$, so that $f$ would no longer be a function in the sense of the usual definition.

The sequence $\{x_n\}_{n\in\mathbb{N}}$ satisfies the relation
$$|\bar x - x_{n+1}| \le \frac{\beta}{2\alpha}|\bar x - x_n|^2. \quad (2.79)$$
Indeed, we may write successively
$$|\bar x - x_{n+1}| = \left|\bar x - x_n + \frac{f(x_n)}{f'(x_n)}\right| = \frac{|f'(x_n)(\bar x - x_n) + f(x_n)|}{|f'(x_n)|}, \quad (2.80)$$
so that
$$|\bar x - x_{n+1}| = \frac{|f(\bar x) - f(x_n) - f'(x_n)(\bar x - x_n)|}{|f'(x_n)|}, \quad (2.81)$$
because $f(\bar x) = 0$. On the other hand, by representing the function $f$ by means of a Taylor series around the point $x_n$, we have
$$f(\bar x) = f(x_n) + \frac{\bar x - x_n}{1!}f'(x_n) + \frac{(\bar x - x_n)^2}{2!}f''(\xi), \quad (2.82)$$
where $\xi$ is a point situated between $\bar x$ and $x_n$. From relations (2.81) and (2.82) we get
$$|\bar x - x_{n+1}| = |\bar x - x_n|^2\,\frac{|f''(\xi)|}{2!}\,\frac{1}{|f'(x_n)|} \quad (2.83)$$
and, taking into account that $|f'(x_n)| \ge \alpha$, $|f''(\xi)| \le \beta$, we obtain equation (2.79).

To show that the sequence $\{x_n\}_{n\in\mathbb{N}}$ has its terms in the interval $(\bar x - \lambda, \bar x + \lambda)$, we use induction. The affirmation is obvious for $n = 0$ because of the choice of $x_0$. Let us now suppose that $x_n \in (\bar x - \lambda, \bar x + \lambda)$. From equation (2.79) we get
$$|\bar x - x_{n+1}| \le \frac{\beta}{2\alpha}|\bar x - x_n|^2 < \frac{\beta}{2\alpha}\lambda^2 = \frac{\beta}{2\alpha}\lambda\cdot\lambda \le \lambda, \quad (2.84)$$
which leads to
$$-\lambda < \bar x - x_{n+1} < \lambda, \quad \bar x - \lambda < x_{n+1} < \bar x + \lambda, \quad (2.85)$$
hence $x_{n+1} \in (\bar x - \lambda, \bar x + \lambda)$. Therefore, if $x_n \in (\bar x - \lambda, \bar x + \lambda)$, then $x_{n+1} \in (\bar x - \lambda, \bar x + \lambda)$, and also $x_0 \in (\bar x - \lambda, \bar x + \lambda)$. It follows that $x_n \in (\bar x - \lambda, \bar x + \lambda)$ for any $n \in \mathbb{N}$.

To show that $\{x_n\}_{n\in\mathbb{N}}$ converges to $\bar x$, we multiply expression (2.79) by $\beta/(2\alpha)$. We obtain
$$\frac{\beta}{2\alpha}|\bar x - x_{n+1}| \le \left(\frac{\beta}{2\alpha}|\bar x - x_n|\right)^2. \quad (2.86)$$
Let us denote by $\{z_n\}_{n\in\mathbb{N}}$ the sequence defined by
$$z_n = \frac{\beta}{2\alpha}|\bar x - x_n|, \quad n \in \mathbb{N}, \quad (2.87)$$
so that equation (2.86) can now be written as
$$z_{n+1} \le z_n^2. \quad (2.88)$$
Written for $n-1, n-2, \ldots, 0$, relation (2.88) leads to
$$z_{n+1} \le z_0^{2^{n+1}}. \quad (2.89)$$
On the other hand,
$$z_0 = \frac{\beta}{2\alpha}|\bar x - x_0| < \frac{\beta}{2\alpha}\lambda \le 1, \quad (2.90)$$
corresponding to the definition of $\lambda$. Finally, there results
$$\lim_{n\to\infty} z_n = 0, \quad \lim_{n\to\infty}\frac{\beta}{2\alpha}|\bar x - x_n| = 0, \quad (2.91)$$
from which
$$\lim_{n\to\infty} x_n = \bar x, \quad (2.92)$$
so that the sequence $\{x_n\}_{n\in\mathbb{N}}$ converges to the single root $\bar x \in (\bar x - \ell, \bar x + \ell)$ of the equation $f(x) = 0$.

Proposition 2.1 (a priori estimation of the error in the tangent method). If $\lambda < 2\alpha/\beta$, then the relation
$$|\bar x - x_n| \le \frac{2\alpha}{\beta}\left(\frac{\beta}{2\alpha}\lambda\right)^{2^n} \quad (2.93)$$
holds under the conditions of Theorem 2.5.

Demonstration. We easily obtain
$$\frac{\beta}{2\alpha}|\bar x - x_n| \le \left(\frac{\beta}{2\alpha}|\bar x - x_0|\right)^{2^n} < \left(\frac{\beta}{2\alpha}\lambda\right)^{2^n} \quad (2.94)$$
from relation (2.79), and the proposition is proved.

Observation 2.2 To obtain the root $\bar x$ with a precision $\varepsilon$ we impose, from formula (2.93), the estimation
$$\frac{2\alpha}{\beta}\left(\frac{\beta\lambda}{2\alpha}\right)^{2^n} < \varepsilon, \quad (2.95)$$
from which we get the number of iteration steps
$$n = \left[\ln\left(\ln\frac{\varepsilon\beta}{2\alpha}\Big/\ln\frac{\beta\lambda}{2\alpha}\right)\Big/\ln 2\right] + 1, \quad (2.96)$$
where the square brackets denote the integer part function.

Proposition 2.2 (a posteriori estimation of the error in the tangent method). We have the expression
$$|x_{n+1} - \bar x| \le \frac{\beta}{2\alpha}|x_{n+1} - x_n|^2 \quad (2.97)$$
in the frame of Theorem 2.5.
Demonstration. By expanding the function $f$ into a Taylor series around $x_n$, we get
$$f(x_{n+1}) = f(x_n) + \frac{x_{n+1} - x_n}{1!}f'(x_n) + \frac{(x_{n+1} - x_n)^2}{2!}f''(\zeta), \quad (2.98)$$
from which
$$f(x_{n+1}) - f(x_n) - \frac{x_{n+1} - x_n}{1}f'(x_n) = \frac{(x_{n+1} - x_n)^2}{2}f''(\zeta), \quad (2.99)$$
where $\zeta$ is a point situated between $x_n$ and $x_{n+1}$. Applying the modulus to equation (2.99) and taking into account equation (2.76), we get
$$|f(x_{n+1})| = \frac{(x_{n+1} - x_n)^2}{2}|f''(\zeta)|. \quad (2.100)$$
On the other hand, from the hypotheses of Theorem 2.5 we obtain
$$|f''(\zeta)| \le \beta, \quad (2.101)$$
and relation (2.100) may be transcribed in the form
$$|f(x_{n+1})| \le \frac{\beta}{2}|x_{n+1} - x_n|^2. \quad (2.102)$$
Applying the finite increments formula to the function $f$ between the points $x_{n+1}$ and $\bar x$ (the root of the equation $f(x) = 0$ in the interval $(\bar x - \ell, \bar x + \ell)$), the existence of a point $\xi$ between $x_{n+1}$ and $\bar x$ such that
$$f(x_{n+1}) - f(\bar x) = f'(\xi)(x_{n+1} - \bar x) \quad (2.103)$$
is proved. Taking into account that $f(\bar x) = 0$, relations (2.102) and (2.103) lead to
$$|f'(\xi)||x_{n+1} - \bar x| \le \frac{\beta}{2}|x_{n+1} - x_n|^2 \quad (2.104)$$
and, taking into account that $|f'(\xi)| \ge \alpha$, we obtain relation (2.97), which we had to prove.

Observation 2.3 To obtain the root $\bar x$ with precision $\varepsilon$, formula (2.97) leads to
$$\frac{\beta}{2\alpha}|x_{n+1} - x_n|^2 < \varepsilon, \quad (2.105)$$
from which
$$|x_{n+1} - x_n| < \sqrt{\frac{2\alpha\varepsilon}{\beta}}; \quad (2.106)$$
the iteration algorithm continues until the modulus of the difference of two consecutive iterations becomes smaller than $\sqrt{2\alpha\varepsilon/\beta}$.

Theorem 2.6 Let $f : [a,b] \to \mathbb{R}$ be a function that satisfies the following conditions:
(i) $f'$ is strictly positive on $(a,b)$, that is, $f'(x) > 0$, $(\forall)\, x \in (a,b)$;
(ii) $f''$ is strictly positive on $(a,b)$, hence $f''(x) > 0$, $(\forall)\, x \in (a,b)$;
(iii) $f$ has a single root $\bar x$ in the interval $(a,b)$.
In the above hypotheses, the sequence $\{x_n\}_{n\in\mathbb{N}}$ defined by relation (2.76), with $f(x_0) > 0$, is a sequence of real numbers that converges to $\bar x$.

Demonstration. The sequence $\{x_n\}_{n\in\mathbb{N}}$ is a decreasing one. To prove this, we write Taylor's relation for the points $x_{n+1}$ and $x_n$, so that
$$f(x_{n+1}) = f(x_n) + \frac{x_{n+1} - x_n}{1!}f'(x_n) + \frac{(x_{n+1} - x_n)^2}{2!}f''(\xi), \quad (2.107)$$
where $\xi$ is a point between $x_n$ and $x_{n+1}$. On the other hand, from relation (2.76) we obtain
$$f(x_n) + f'(x_n)(x_{n+1} - x_n) = 0, \quad (2.108)$$
which, replaced in formula (2.107), leads to
$$f(x_{n+1}) = \frac{f''(\xi)}{2}(x_{n+1} - x_n)^2. \quad (2.109)$$
Taking into account hypothesis (ii), we get $f(x_{n+1}) > 0$, $(\forall)\, n \ge 0$, and because $f(x_0) > 0$ it follows that $f(x_n) > 0$, $(\forall)\, n \in \mathbb{N}$. Relation (2.76) may be written in the form
$$x_{n+1} - x_n = -\frac{f(x_n)}{f'(x_n)} \quad (2.110)$$
and, because $f(x_n) > 0$, $f'(x_n) > 0$ (hypothesis (i)), we have
$$x_{n+1} - x_n < 0, \quad (2.111)$$
hence the sequence $\{x_n\}_{n\in\mathbb{N}}$ is a decreasing one (even strictly decreasing).

The sequence $\{x_n\}_{n\in\mathbb{N}}$ is bounded below by $\bar x$, the unique solution of the equation $f(x) = 0$ in the interval $(a,b)$. Indeed, because $f(x_n) \ge 0$, $(\forall)\, n \in \mathbb{N}$, the function $f$ is strictly increasing on $(a,b)$ (hypothesis (i)), and $f(\bar x) = 0$, we obtain $x_n \ge \bar x$, $(\forall)\, n \in \mathbb{N}$; hence the sequence $\{x_n\}_{n\in\mathbb{N}}$ is bounded below by $\bar x$.

From the previous two steps we deduce that $\{x_n\}_{n\in\mathbb{N}}$ is convergent; let $x^*$ be its limit. Passing to the limit for $n \to \infty$ in the definition relation (2.76), we get
$$\lim_{n\to\infty} x_{n+1} = \lim_{n\to\infty} x_n - \lim_{n\to\infty}\frac{f(x_n)}{f'(x_n)}, \quad (2.112)$$
from which
$$x^* = x^* - \frac{f(x^*)}{f'(x^*)}, \quad (2.113)$$
hence $f(x^*) = 0$. But $f$ has a single root in $(a,b)$, so that $x^* = \bar x$; hence the theorem is proved.

Observation 2.4
(i) Theorem 2.6 makes sure that, under its hypotheses, the sequence $\{x_n\}_{n\in\mathbb{N}}$ converges to $\bar x$ with $f(\bar x) = 0$, and $x_0$ can be taken arbitrarily in the interval $(a,b)$, with the condition $f(x_0) > 0$. In particular, if the conditions (i) and (ii) are satisfied at the point $b$, we can take $x_0 = b$.
(ii) If the function $f$ is strictly concave and decreasing, then we can consider the function $-f$, which has the same root $\bar x$ and satisfies the hypotheses of Theorem 2.6.
(iii) If $f$ is strictly convex and decreasing, then we can take $x_0 = a$, assuming that the hypotheses (i) and (ii) of Theorem 2.6 are satisfied at the point $a$.
(iv) If the function $f$ is strictly concave and increasing, then we consider the function $-f$, which satisfies the conditions of point (iii) of this observation.

Observation 2.5 We can no longer give formulae for an a priori or an a posteriori estimation of the error under the conditions of Theorem 2.6. Therefore, the sequence of iterations usually stops when $|x_{n+1} - x_n|^2 < \varepsilon$, where $\varepsilon$ is the imposed error.

Observation 2.6 Newton's method presented here has at least two deficiencies. The first one consists in the choice of intervals of the form $(\bar x - \mu, \bar x + \mu)$, where $\bar x$ is the required solution, that is, intervals centered just at the point $\bar x$, which is unknown. This deficiency can easily be eliminated for twice differentiable functions, as shown later. The second deficiency arises because at any iteration step we must calculate $f(x_n)$ as well as $f'(x_n)$. We can construct a simplified Newton's method in which we need not calculate $f'(x_n)$ every time, but always use $f'(x_0)$. Such a method is given by Theorem 2.8.

Theorem 2.7 (general procedure for the choice of the start point $x_0$). Let $f : [a,b] \to \mathbb{R}$ be a function twice differentiable for which $f(a) < 0$ and $f(b) > 0$. Let us suppose that there exist the strictly positive constants $\alpha$ and $\beta$ such that $|f'(x)| \ge \alpha$ and $|f''(x)| \le \beta$ for any $x \in [a,b]$. We apply the bisection method to the equation $f(x) = 0$ on the interval $[a,b]$ until we obtain an interval $[m_1, m_2]$ for which $a < m_1$, $m_2 < b$ and $m_2 - m_1 < 2\alpha/\beta$. Choosing $x_0 \in (m_1, m_2)$, the sequence of successive iterations given by Newton's method converges to the unique solution $\bar x$ of the equation $f(x) = 0$ in the interval $[a,b]$.

Demonstration. From the condition $|f'(x)| \ge \alpha$, $\alpha > 0$, and because $f$ is twice differentiable, it follows that $f'(x)$ does not change sign in the interval $[a,b]$. But $f(a) < 0$ and $f(b) > 0$, and hence $f$ is strictly increasing ($f'(x) > 0$, $(\forall)\, x \in [a,b]$). Hence $f$ has a single solution $\bar x$ in the interval $[a,b]$, so that such a hypothesis is not necessary.

Let $[\gamma_n, \overline{\gamma}_n]$ be the interval obtained at the $n$th iteration of the bipartition method. It is known that the sequences $\{\gamma_n\}_{n\in\mathbb{N}}$ and $\{\overline{\gamma}_n\}_{n\in\mathbb{N}}$ converge to $\bar x$. Let us introduce the value
$$\varepsilon = \min\left\{\bar x - a,\; b - \bar x,\; \frac{2\alpha}{\beta}\right\}; \quad (2.114)$$
we observe that $\varepsilon > 0$. There result the following statements:
• there exists $n'$ such that $|\gamma_n - \bar x| < \varepsilon$ for $n > n'$;
• there exists $n''$ such that $|\overline{\gamma}_n - \bar x| < \varepsilon$ for $n > n''$;
• there exists $n'''$ such that $\overline{\gamma}_n - \gamma_n < \varepsilon$ for $n > n'''$.
Let us denote $n_\varepsilon = \max\{n', n'', n'''\}$. From the above three statements we obtain
$$|\gamma_n - \bar x| < \varepsilon, \quad |\overline{\gamma}_n - \bar x| < \varepsilon, \quad \overline{\gamma}_n - \gamma_n < \varepsilon, \quad\text{with } n > n_\varepsilon. \quad (2.115)$$
We denote by $[m_1, m_2]$ the interval $[\gamma_n, \overline{\gamma}_n]$ corresponding to $n = n_\varepsilon + 1$. The first inequality (2.115) leads to
$$-\varepsilon < \bar x - m_1 < \varepsilon; \quad (2.116)$$
hence, because $\bar x - a > \varepsilon$, we get $m_1 > a$. Analogously, from the second relation (2.115) we obtain $m_2 < b$, while the last relation (2.115) leads to
$$m_2 - m_1 < \varepsilon \le \frac{2\alpha}{\beta}. \quad (2.117)$$
On the other hand, the interval $[m_1, m_2]$ can be written in the form
$$\left[a + i\frac{b-a}{2^{n_\varepsilon+1}},\; a + (i+1)\frac{b-a}{2^{n_\varepsilon+1}}\right], \quad (2.118)$$
with $i \in \mathbb{N}$, $i > 0$ (because $m_1 > a$) and $i + 1 < 2^{n_\varepsilon+1}$ (because $m_2 < b$). We have
$$m_1 - (m_2 - m_1) = a + (i-1)\frac{b-a}{2^{n_\varepsilon+1}} \ge a, \quad (2.119)$$
$$m_2 + (m_2 - m_1) = a + (i+2)\frac{b-a}{2^{n_\varepsilon+1}} \le b. \quad (2.120)$$
Considering that $\bar x \in (m_1, m_2)$, we get
$$m_1 > \bar x - (m_2 - m_1), \quad m_2 < \bar x + (m_2 - m_1), \quad \bar x - (m_2 - m_1) > a, \quad \bar x + (m_2 - m_1) < b. \quad (2.121)$$
Introducing the notation
$$\ell = m_2 - m_1, \quad (2.122)$$
we are led to the sequence of inclusions
$$(m_1, m_2) \subset (\bar x - \ell, \bar x + \ell) \subset [a,b]. \quad (2.123)$$
On the other hand, $m_2 - m_1 < 2\alpha/\beta$, hence $\lambda = m_2 - m_1 = \ell$ in Theorem 2.5, and $x_0 \in (m_1, m_2) \subset (\bar x - \lambda, \bar x + \lambda)$, so that Theorem 2.5 applies.

Theorem 2.8 (simplified Newton's method). Let $f : (\bar x - \ell, \bar x + \ell) \to \mathbb{R}$ be a function for which $\bar x$ is its single root in the interval $(\bar x - \ell, \bar x + \ell)$. Let us suppose that $f$ is twice differentiable on $(\bar x - \ell, \bar x + \ell)$ and that there exist two strictly positive constants $\alpha$ and $\beta$ such that $|f'(x)| \ge \alpha$ and $|f''(x)| \le \beta$ for any $x \in (\bar x - \ell, \bar x + \ell)$. Also, let $\lambda$ be such that $0 < \lambda < \min\{\ell, \alpha/(2\beta)\}$. Under these conditions, the sequence $\{x_n\}_{n\in\mathbb{N}}$ defined by
$$x_0 \in (\bar x - \lambda, \bar x + \lambda), \quad x_{n+1} = x_n - \frac{f(x_n)}{f'(x_0)}, \quad\text{with } f'(x_0) \neq 0, \quad (2.124)$$
converges to $\bar x$ (Fig. 2.6).

Demonstration. Let us show that $x_n \in (\bar x - \lambda, \bar x + \lambda)$ for any $n \in \mathbb{N}$, using induction. By the choice of $x_0$, the statement is true for $n = 0$. Let us suppose that the affirmation is true for $n$ and let us prove it for $n + 1$. We have, successively,
$$|\bar x - x_{n+1}| = \left|\bar x - x_n + \frac{f(x_n)}{f'(x_0)}\right| = \frac{|f'(x_0)(\bar x - x_n) + f(x_n)|}{|f'(x_0)|}. \quad (2.125)$$
Figure 2.6 Simplified Newton's method.

On the other hand, $f(\bar x) = 0$, and the previous relation leads to
$$|\bar x - x_{n+1}| = \frac{1}{|f'(x_0)|}|f'(x_0)(\bar x - x_n) + f(x_n) - f(\bar x)|. \quad (2.126)$$
Let us now apply Lagrange's formula of finite increments to the function $f$ on the interval defined by the points $x_n$ and $\bar x$. It results in the existence of a point $\xi$ situated between $x_n$ and $\bar x$ such that
$$f(x_n) - f(\bar x) = f'(\xi)(x_n - \bar x). \quad (2.127)$$
Relation (2.126) becomes
$$|\bar x - x_{n+1}| = \frac{1}{|f'(x_0)|}\left|[f'(x_0) - f'(\xi)](\bar x - x_n)\right|. \quad (2.128)$$
We now apply Lagrange's formula to the function $f'$ on the interval defined by the points $x_0$ and $\xi$; we deduce that there exists a point $\zeta$ in this interval such that
$$f'(x_0) - f'(\xi) = f''(\zeta)(x_0 - \xi). \quad (2.129)$$
Relation (2.128) now becomes
$$|\bar x - x_{n+1}| = \frac{1}{|f'(x_0)|}|f''(\zeta)||x_0 - \xi||\bar x - x_n|. \quad (2.130)$$
Taking into account the hypotheses of the theorem concerning the derivatives $f'$ and $f''$ and the constants $\alpha > 0$ and $\beta > 0$, relation (2.130) leads to
$$|\bar x - x_{n+1}| \le \frac{\beta}{\alpha}|x_0 - \xi||\bar x - x_n|. \quad (2.131)$$
We may now write the following sequence of relations:
$$|x_0 - \xi| = |x_0 - \bar x + \bar x - \xi| \le |x_0 - \bar x| + |\bar x - \xi| \le \lambda + \lambda = 2\lambda; \quad (2.132)$$
from equations (2.131) and (2.132) we obtain
$$|\bar x - x_{n+1}| \le \frac{2\beta\lambda}{\alpha}|\bar x - x_n|. \quad (2.133)$$
By the choice of $\lambda$ in the hypotheses of the theorem, we get $2\beta\lambda/\alpha < 1$; hence
$$|\bar x - x_{n+1}| < |\bar x - x_n|. \quad (2.134)$$
The induction hypothesis $|\bar x - x_n| < \lambda$ leads to $|\bar x - x_{n+1}| < \lambda$, hence $x_{n+1} \in (\bar x - \lambda, \bar x + \lambda)$, and the induction principle states that $x_n \in (\bar x - \lambda, \bar x + \lambda)$ for any $n \in \mathbb{N}$.

Let us show that $x_n \to \bar x$ for $n \to \infty$. We write relation (2.133) for $n-1, n-2, \ldots, 0$, hence
$$|\bar x - x_{n+1}| \le \left(\frac{2\beta\lambda}{\alpha}\right)^{n+1}|\bar x - x_0|; \quad (2.135)$$
because $2\beta\lambda/\alpha < 1$, we get
$$|\bar x - x_{n+1}| \to 0 \quad\text{for } n \to \infty, \quad (2.136)$$
that is, $\lim_{n\to\infty} x_n = \bar x$, and the theorem is proved.

Proposition 2.3 (a priori estimation of the error in the simplified Newton method). The relation
$$|\bar x - x_n| \le \left(\frac{2\beta\lambda}{\alpha}\right)^n\lambda \quad (2.137)$$
holds under the conditions of Theorem 2.8.

Demonstration. If we write relation (2.135) for $n$, that is,
$$|\bar x - x_n| \le \left(\frac{2\beta\lambda}{\alpha}\right)^n|\bar x - x_0|, \quad (2.138)$$
and if we consider that $x_0 \in (\bar x - \lambda, \bar x + \lambda)$, hence $|\bar x - x_0| < \lambda$, we obtain the required formula.

Observation 2.7 If we wish to determine $\bar x$ with an imposed accuracy $\varepsilon$, then we have to impose
$$|\bar x - x_n| \le \left(\frac{2\beta\lambda}{\alpha}\right)^n\lambda < \varepsilon; \quad (2.139)$$
we thus obtain the necessary number of iteration steps in the simplified Newton method,
$$n = \left[\ln\frac{\varepsilon}{\lambda}\Big/\ln\frac{2\beta\lambda}{\alpha}\right] + 1, \quad (2.140)$$
where, as usual, the square brackets denote the integer part function.

Proposition 2.4 (a posteriori estimation of the error in the simplified Newton method). The relation
$$|x_{n+1} - \bar x| \le \frac{3\beta\lambda}{\alpha}|x_{n+1} - x_n| \quad (2.141)$$
holds under the conditions of Theorem 2.8 (the constant $3\beta\lambda/\alpha$ agrees with the demonstration below and with the stopping criterion of Observation 2.8).
Demonstration. Let us write Taylor's formula for the function $f$ at the points $x_{n+1}$ and $x_n$. We have
$$f(x_{n+1}) = f(x_n) + \frac{x_{n+1} - x_n}{1!}f'(x_n) + \frac{(x_{n+1} - x_n)^2}{2!}f''(\xi), \quad (2.142)$$
where $\xi$ is a point between $x_n$ and $x_{n+1}$. From the definition of the sequence $\{x_n\}_{n\in\mathbb{N}}$ we obtain the relation
$$f(x_n) = f'(x_0)(x_n - x_{n+1}), \quad (2.143)$$
which, when replaced in equation (2.142), leads to
$$f(x_{n+1}) = [f'(x_0) - f'(x_n)](x_n - x_{n+1}) + \frac{(x_{n+1} - x_n)^2}{2}f''(\xi). \quad (2.144)$$
Let us now apply Lagrange's formula to the function $f'(x)$ for the points $x_0$ and $x_n$. It follows that there exists $\zeta$ such that
$$f'(x_0) - f'(x_n) = f''(\zeta)(x_0 - x_n). \quad (2.145)$$
From equations (2.145) and (2.144) we get
$$f(x_{n+1}) = f''(\zeta)(x_0 - x_n)(x_n - x_{n+1}) + \frac{(x_{n+1} - x_n)^2}{2}f''(\xi). \quad (2.146)$$
In modulus, we obtain
$$|f(x_{n+1})| = \left|f''(\zeta)(x_0 - x_n)(x_n - x_{n+1}) + \frac{(x_{n+1} - x_n)^2}{2}f''(\xi)\right| \le \left[|f''(\zeta)||x_0 - x_n| + \frac{|f''(\xi)|}{2}|x_{n+1} - x_n|\right]|x_{n+1} - x_n|. \quad (2.147)$$
On the other hand, we have
$$|x_0 - x_n| = |x_0 - \bar x + \bar x - x_n| \le |x_0 - \bar x| + |\bar x - x_n| < 2\lambda \quad (2.148)$$
and
$$|x_{n+1} - x_n| = |x_{n+1} - \bar x + \bar x - x_n| \le |x_{n+1} - \bar x| + |\bar x - x_n| < 2\lambda. \quad (2.149)$$
Hence,
$$|f(x_{n+1})| \le \left[|f''(\zeta)||x_0 - x_n| + \frac{|f''(\xi)|}{2}|x_{n+1} - x_n|\right]|x_{n+1} - x_n| < [2\lambda|f''(\zeta)| + \lambda|f''(\xi)|]|x_{n+1} - x_n|. \quad (2.150)$$
The condition of boundedness of $|f''(x)|$ on $(\bar x - \ell, \bar x + \ell)$, expressed by $|f''(x)| \le \beta$ with $\beta > 0$, and relation (2.150) lead to
$$|f(x_{n+1})| < 3\beta\lambda|x_{n+1} - x_n|. \quad (2.151)$$
Let us now apply Lagrange's formula to the function $f$ for the points $x_{n+1}$ and $\bar x$:
$$f(x_{n+1}) - f(\bar x) = f'(\gamma)(x_{n+1} - \bar x), \quad (2.152)$$
where $\gamma$ is a point situated between $x_{n+1}$ and $\bar x$.
On the other hand, $f(\bar x) = 0$, so that
$$f(x_{n+1}) = f'(\gamma)(x_{n+1} - \bar x), \quad (2.153)$$
which, when introduced in relation (2.151), leads to
$$|f'(\gamma)||x_{n+1} - \bar x| < 3\beta\lambda|x_{n+1} - x_n|. \quad (2.154)$$
Considering that $|f'(x)| \ge \alpha$ for any $x \in (\bar x - \ell, \bar x + \ell)$, the above formula leads to relation (2.141), so that the proposition is proved.

Observation 2.8 If we wish to determine $\bar x$ with an imposed precision $\varepsilon$, then we must continue the sequence of iterations (2.124) until
$$|x_{n+1} - x_n| < \frac{\alpha\varepsilon}{3\beta\lambda}. \quad (2.155)$$

Observation 2.9 The statements in Observation 2.4 remain valid in this case too.

2.4 THE CONTRACTION METHOD

Let us consider the equation
$$f(x) = 0 \quad (2.156)$$
with $f : I \to \mathbb{R}$, where $I$ is an interval of the real axis. We suppose that we can rewrite this equation in the form
$$x = \varphi(x), \quad (2.157)$$
such that $x$ is a solution of equation (2.156) if and only if it is a solution of equation (2.157).

Definition 2.1 The roots of equation (2.157) are called fixed points of the function $\varphi$.

Observation 2.10 The passage from equation (2.156) to equation (2.157) is not unique. Indeed, let us consider
$$\varphi(x) = x - \lambda f(x), \quad (2.158)$$
where $\lambda$ is an arbitrary real parameter. In this case, any root $\bar x$ of equation (2.156) is also a root of equation (2.157), and the converse is also true.

Let us consider an approximation $x_0$ of the root of equation (2.157) and let us construct the sequence $\{x_n\}_{n\in\mathbb{N}}$ defined by the recurrence relation
$$x_{n+1} = \varphi(x_n), \quad n \ge 0. \quad (2.159)$$
We have to state sufficient conditions under which this sequence converges to the root $\bar x$ of equation (2.157).
Definition 2.2 Let $B$ be a Banach space and $\varphi : B \to B$ a mapping for which there exists $q \in (0,1)$ such that for any two elements $x$ and $y$ of $B$ we have
$$\|\varphi(x) - \varphi(y)\| \le q\|x - y\|. \quad (2.160)$$
Such a function is called a contraction.

Theorem 2.9 (Stefan Banach (1892–1945)). Let $B$ be a Banach space and $\varphi$ a contraction on it. In this case, the sequence $\{x_n\}_{n\in\mathbb{N}}$ defined by equation (2.159) converges to the unique fixed point $\bar x$, for any $x_0 \in B$.

Demonstration. Let us consider two successive terms $x_n$ and $x_{n+1}$ of the sequence $\{x_n\}_{n\in\mathbb{N}}$, for which we can write
$$\|x_{n+1} - x_n\| = \|\varphi(x_n) - \varphi(x_{n-1})\| \le q\|x_n - x_{n-1}\| \le q^2\|x_{n-1} - x_{n-2}\| \le \cdots \le q^n\|x_1 - x_0\|. \quad (2.161)$$
On the other hand,
$$\|x_{n+p} - x_n\| = \|x_{n+p} - x_{n+p-1} + x_{n+p-1} - x_{n+p-2} + \cdots + x_{n+1} - x_n\| \le \|x_{n+p} - x_{n+p-1}\| + \|x_{n+p-1} - x_{n+p-2}\| + \cdots + \|x_{n+1} - x_n\|$$
$$\le (q^{n+p-1} + q^{n+p-2} + \cdots + q^n)\|x_1 - x_0\| = q^n\frac{1 - q^p}{1 - q}\|x_1 - x_0\| < \frac{q^n}{1-q}\|x_1 - x_0\|. \quad (2.162)$$
The sequence $\{x_n\}_{n\in\mathbb{N}}$ is a Cauchy one; indeed, for any $\varepsilon > 0$ there exists $n_\varepsilon \in \mathbb{N}$ such that for any $n \ge n_\varepsilon$ and for any $p > 0$, $p \in \mathbb{N}$, we have the relation
$$\|x_{n+p} - x_n\| < \varepsilon. \quad (2.163)$$
It is sufficient to require
$$\frac{q^n}{1-q}\|x_1 - x_0\| < \varepsilon, \quad (2.164)$$
as relation (2.162) suggests; hence $\{x_n\}_{n\in\mathbb{N}}$ is a Cauchy sequence. Because $B$ is a Banach space, it follows that $\{x_n\}_{n\in\mathbb{N}}$ is convergent; let
$$x^* = \lim_{n\to\infty} x_n. \quad (2.165)$$
We observe that $\varphi$ satisfies condition (2.160) because it is a contraction, and hence it is continuous. We may write
$$x^* = \lim_{n\to\infty}\varphi(x_n) = \varphi\left(\lim_{n\to\infty} x_n\right) = \varphi(x^*); \quad (2.166)$$
hence $x^* = \bar x$ is a root of equation (2.157).

Let us show that $\bar x$ is the unique solution of equation (2.157). Per absurdum, let us suppose that $\bar x$ is not the unique solution of equation (2.157), and let $\tilde x \neq \bar x$ be another solution of the same equation. We have
$$\|\bar x - \tilde x\| = \|\varphi(\bar x) - \varphi(\tilde x)\| \le q\|\bar x - \tilde x\| < \|\bar x - \tilde x\|, \quad (2.167)$$
a contradiction, because $\varphi$ is a contraction; hence $\bar x$ is unique.

Corollary 2.1 Let $\varphi : [a,b] \to \mathbb{R}$ be such that
(a) for any $x \in [a,b]$, we have $\varphi(x) \in [a,b]$;
(b) there exists $q \in (0,1)$ such that for any $x, y$ of $[a,b]$ we have
$$|\varphi(x) - \varphi(y)| \le q|x - y|. \quad (2.168)$$
Under these conditions,
(i) if $x_0 \in [a,b]$, then $x_n \in [a,b]$ for any $n \in \mathbb{N}$ and the sequence $\{x_n\}_{n\in\mathbb{N}}$ is convergent;
(ii) if $\bar x = \lim_{n\to\infty} x_n$, then $\bar x$ is the unique root of equation (2.157) in $[a,b]$.

Demonstration. We can apply Banach's Theorem 2.9 because the set of real numbers $\mathbb{R}$ is a Banach space and relation (2.168) shows that $\varphi$ is a contraction. On the other hand, $\varphi(x) \in [a,b]$ for any $x \in [a,b]$ and, because $x_0 \in [a,b]$, we successively deduce that $x_1 = \varphi(x_0) \in [a,b]$, $x_2 \in [a,b]$, $\ldots$, $x_n \in [a,b]$, $\ldots$; hence the corollary is proved.

Corollary 2.2 Let $\varphi : [a,b] \to \mathbb{R}$ be such that
(a) we have $\varphi(x) \in [a,b]$ for any $x \in [a,b]$;
(b) $\varphi$ is differentiable on $[a,b]$ and there exists $q \in (0,1)$ such that
$$|\varphi'(x)| \le q < 1 \quad\text{for any } x \in [a,b]. \quad (2.169)$$
Under these conditions,
(i) if $x_0 \in [a,b]$, then $x_n \in [a,b]$ for any $n \in \mathbb{N}$ and the sequence $\{x_n\}_{n\in\mathbb{N}}$ is convergent;
(ii) if $\bar x = \lim_{n\to\infty} x_n$, then $\bar x$ is the only root of equation (2.157) in $[a,b]$.

Demonstration. Let us consider $x \in [a,b]$, $y \in [a,b]$, $x < y$. Under these conditions, we can apply Lagrange's formula of finite increments to the function $\varphi$ on the interval $[x,y]$. Hence there exists $\xi \in (x,y)$ such that
$$\varphi(y) - \varphi(x) = \varphi'(\xi)(y - x). \quad (2.170)$$
Applying the modulus, we get
$$|\varphi(x) - \varphi(y)| = |\varphi'(\xi)||x - y|, \quad (2.171)$$
from which
$$|\varphi(x) - \varphi(y)| \le \sup_{\xi\in[a,b]}|\varphi'(\xi)|\,|x - y| \le q|x - y|, \quad (2.172)$$
so that we can use Corollary 2.1.

Observation 2.11 To apply a method based on the above considerations, we must solve the following problems:
(i) the determination of the interval $[a,b]$ such that $\varphi(x) \in [a,b]$ for any $x \in [a,b]$;
(ii) the verification that $\varphi$ is a contraction on the interval $[a,b]$.
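In code, the successive approximations (2.159) amount to a one-line loop. The sketch below assumes the two problems of Observation 2.11 have already been settled for $\varphi$ (invariance of $[a,b]$ and a contraction constant $q$); the names and the defaults are illustrative, and the stopping test anticipates the a posteriori estimate proved in Proposition 2.8 below.

```python
import math

def fixed_point(phi, x0, q, eps=1e-9, max_iter=10000):
    """Successive approximations x_{n+1} = phi(x_n), relation (2.159).

    q in (0, 1) is the contraction constant of phi from (2.160); by the
    a posteriori bound |x_n - x| <= |x_{n+1} - x_n|/(1 - q), stopping when
    |x_{n+1} - x_n| < eps*(1 - q) guarantees an error below eps."""
    x = x0
    for _ in range(max_iter):
        x_new = phi(x)
        if abs(x_new - x) < eps * (1 - q):
            return x_new
        x = x_new
    return x

# The example of Section 2.6: phi(x) = 0.5*(1 - sin x), with q = 0.5
print(fixed_point(lambda x: 0.5 * (1 - math.sin(x)), 0.5, 0.5))  # ~0.335418
```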
Proposition 2.5 Let $\varphi : [a - \lambda, a + \lambda] \to \mathbb{R}$ be a contraction with contraction constant $q$. If $|\varphi(a) - a| \le (1 - q)\lambda$, then
$$\varphi([a - \lambda, a + \lambda]) \subseteq [a - \lambda, a + \lambda].$$

Demonstration. Let $x \in [a - \lambda, a + \lambda]$. We have
$$|\varphi(x) - a| = |\varphi(x) - \varphi(a) + \varphi(a) - a| \le |\varphi(x) - \varphi(a)| + |\varphi(a) - a|. \quad (2.173)$$
On the other hand, $\varphi$ is a contraction, hence
$$|\varphi(x) - \varphi(a)| \le q|x - a|. \quad (2.174)$$
If we take into account the hypothesis and relation (2.174), then relation (2.173) leads to
$$|\varphi(x) - a| \le q|x - a| + (1 - q)\lambda. \quad (2.175)$$
Because $x \in [a - \lambda, a + \lambda]$, it follows that
$$|x - a| \le \lambda, \quad (2.176)$$
so that relation (2.175) yields
$$|\varphi(x) - a| \le q\lambda + (1 - q)\lambda = \lambda, \quad (2.177)$$
that is,
$$\varphi(x) \in [a - \lambda, a + \lambda] \quad\text{for any } x \in [a - \lambda, a + \lambda], \quad (2.178)$$
and the proposition is proved.

Proposition 2.6 Let $\varphi : [a,b] \to \mathbb{R}$. If $\varphi$ satisfies the conditions:
(a) $\varphi$ is differentiable on $[a,b]$;
(b) the equation $x = \varphi(x)$ has a root $\bar x \in (\alpha, \beta)$, with
$$\alpha = a + \frac{b-a}{3}, \quad \beta = b - \frac{b-a}{3}; \quad (2.179)$$
(c) there exists $q \in (0,1)$ such that
$$|\varphi'(x)| \le q < 1 \quad\text{for any } x \in [a,b]; \quad (2.180)$$
(d) $x_0 \in (\alpha, \beta)$;
then
(i) the sequence $\{x_n\}_{n\in\mathbb{N}}$ has all its terms in the interval $(a,b)$;
(ii) the sequence $\{x_n\}_{n\in\mathbb{N}}$ is convergent and $\lim_{n\to\infty} x_n = \bar x$;
(iii) $\bar x$ is the unique solution of the equation $x = \varphi(x)$ in $(a,b)$.

Demonstration. The points (ii) and (iii) are obvious consequences of Corollary 2.2.
To demonstrate point (i), let $x_1 = \varphi(x_0)$. Applying the finite increments formula to the function $\varphi$ between the points $x_0$ and $\bar x$, it follows that there exists $\xi$ between $x_0$ and $\bar x$ such that
$$|x_1 - \bar x| = |\varphi(x_0) - \varphi(\bar x)| = |\varphi'(\xi)||x_0 - \bar x|. \quad (2.181)$$
On the other hand,
$$|\varphi'(\xi)| \le \sup_{\xi\in[a,b]}|\varphi'(\xi)| \le q < 1 \quad (2.182)$$
and relation (2.181) yields
$$|x_1 - \bar x| \le q|x_0 - \bar x| \le q(\beta - \alpha) < \frac{b-a}{3}; \quad (2.183)$$
hence $x_1 \in (a,b)$. Let us suppose that $x_{n-1} \in (a,b)$ and $|x_{n-1} - \bar x| < (b-a)/3$. We wish to show that $|x_n - \bar x| < (b-a)/3$. We have
$$|x_n - \bar x| = |\varphi(x_{n-1}) - \varphi(\bar x)|. \quad (2.184)$$
We now apply Lagrange's finite increments formula between the points $x_{n-1}$ and $\bar x$, so that
$$|\varphi(x_{n-1}) - \varphi(\bar x)| = |x_{n-1} - \bar x||\varphi'(\zeta)| \le |x_{n-1} - \bar x|\sup_{\zeta\in[a,b]}|\varphi'(\zeta)| \le q\frac{b-a}{3} < \frac{b-a}{3}; \quad (2.185)$$
hence $x_n \in (a,b)$; this is valid for any $n \in \mathbb{N}$, taking into account the mathematical induction principle.

Proposition 2.7 (a priori estimation of the error in the contraction method). Let $x = \varphi(x)$, with $\varphi : [a,b] \to [a,b]$ a contraction, and let $\bar x$ be its unique root in $[a,b]$. Let $\{x_n\}_{n\in\mathbb{N}}$ be the sequence of successive approximations defined by the recurrence relation (2.159). Under these conditions, the relation
$$|x_n - \bar x| \le q^n(b - a) \quad (2.186)$$
holds, where $q$ is the contraction constant of $\varphi$, $0 < q < 1$.

Demonstration. Formula (2.186) is an obvious consequence of the successive relations
$$|x_n - \bar x| = |\varphi(x_{n-1}) - \varphi(\bar x)| \le q|x_{n-1} - \bar x| = q|\varphi(x_{n-2}) - \varphi(\bar x)| \le q^2|x_{n-2} - \bar x| \le \cdots \le q^n|x_0 - \bar x|, \quad (2.187)$$
where
$$|x_0 - \bar x| \le b - a. \quad (2.188)$$

Observation 2.12 To determine the solution $\bar x$ of equation (2.157) with precision $\varepsilon$, we must determine the necessary number $n_\varepsilon$ of iterations from
$$q^n(b - a) < \varepsilon, \quad (2.189)$$
from which
$$n_\varepsilon = \left[\ln\frac{\varepsilon}{b-a}\Big/\ln q\right] + 1, \quad (2.190)$$
where the square brackets represent the integer part function.
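As a quick numerical illustration of (2.190), with assumed, purely illustrative values of $q$, $b - a$, and $\varepsilon$:

```python
import math

# A priori iteration count (2.190): n_eps = [ln(eps/(b - a)) / ln q] + 1.
q, width, eps = 0.5, 1.0, 1e-6      # illustrative values, not from the text
n_eps = math.floor(math.log(eps / width) / math.log(q)) + 1
print(n_eps)  # 20: with q = 1/2 the error bound q**n * (b - a) halves each step
```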
Proposition 2.8 (a posteriori estimation of the error in the contraction method). Let $x = \varphi(x)$, with $\varphi : [a,b] \to [a,b]$ a contraction with contraction constant $q$, $0 < q < 1$, and let $\bar x$ be the unique root of this equation in $[a,b]$. Let us also consider the sequence $\{x_n\}_{n\in\mathbb{N}}$ of successive approximations defined by the recurrence relation (2.159). Under these conditions, the relation
$$|x_n - \bar x| \le \frac{1}{1-q}|x_{n+1} - x_n| \quad (2.191)$$
holds for any $n \in \mathbb{N}$.

Demonstration. Formula (2.162) leads to the relation
$$|x_{n+p} - x_n| \le |x_{n+p} - x_{n+p-1}| + |x_{n+p-1} - x_{n+p-2}| + \cdots + |x_{n+1} - x_n| \le (q^{p-1} + q^{p-2} + \cdots + 1)|x_{n+1} - x_n| = \frac{1 - q^p}{1-q}|x_{n+1} - x_n|. \quad (2.192)$$
We pass to the limit for $p \to \infty$ in relation (2.192), hence
$$\lim_{p\to\infty}|x_{n+p} - x_n| \le \lim_{p\to\infty}\frac{1 - q^p}{1-q}|x_{n+1} - x_n| \quad (2.193)$$
and, because $\lim_{p\to\infty} x_{n+p} = \bar x$ and $\lim_{p\to\infty} q^p = 0$, we obtain formula (2.191), which had to be proved.

Observation 2.13 To determine the solution of equation (2.157) with precision $\varepsilon$, we must calculate the terms of the sequence $\{x_n\}_{n\in\mathbb{N}}$ until
$$\frac{1}{1-q}|x_{n+1} - x_n| < \varepsilon, \quad (2.194)$$
that is, until
$$|x_{n+1} - x_n| < \varepsilon(1 - q). \quad (2.195)$$

2.5 THE NEWTON–KANTOROVICH METHOD

We now deal with a variant⁴ of Newton's method in which the sequence of successive iterations is defined by a contraction.

Theorem 2.10 Let $f : [x^* - \lambda, x^* + \lambda] \to \mathbb{R}$, with $f'(x^*) \neq 0$, be a twice differentiable function. Let us denote
$$a = |f'(x^*)|, \quad (2.196)$$
$$c = |f(x^*)|. \quad (2.197)$$
We also suppose that there exists $b > 0$ such that
$$|f''(x)| \le b \quad\text{for any } x \in [x^* - \lambda, x^* + \lambda] \quad (2.198)$$

⁴The theorem was stated by Leonid Vitaliyevich Kantorovich (1912–1986) in 1940.
and let us denote
$$\mu = \frac{bc}{2a^2}. \quad (2.199)$$
If $\mu < 1/4$, then, under these conditions, the mapping
$$g(x) = x - \frac{f(x)}{f'(x^*)} \quad (2.200)$$
is a contraction from $[x^* - ky^*, x^* + ky^*]$ to $[x^* - ky^*, x^* + ky^*]$, where
$$k = \frac{c}{a}, \quad (2.201)$$
and $y^*$ is the smallest solution of the equation
$$\mu y^2 - y + 1 = 0, \quad (2.202)$$
that is,
$$y^* = \frac{1 - \sqrt{1 - 4\mu}}{2\mu}. \quad (2.203)$$

Demonstration. First, we show that $g([x^* - ky^*, x^* + ky^*]) \subseteq [x^* - ky^*, x^* + ky^*]$. Let us evaluate $|g(x) - x^*|$. We have
$$|g(x) - x^*| = \left|x - \frac{f(x)}{f'(x^*)} - x^*\right| = \left|\frac{f'(x^*)(x - x^*) - f(x) + f(x^*)}{f'(x^*)} - \frac{f(x^*)}{f'(x^*)}\right| \le \frac{1}{|f'(x^*)|}|f'(x^*)(x - x^*) - f(x) + f(x^*)| + \left|\frac{f(x^*)}{f'(x^*)}\right|. \quad (2.204)$$
If we take into account relations (2.196), (2.197), and (2.201), then relation (2.204) leads to
$$|g(x) - x^*| \le \frac{1}{a}|f'(x^*)(x - x^*) - f(x) + f(x^*)| + k. \quad (2.205)$$
Taylor's formula written for the points $x$ and $x^*$ leads to
$$f(x) = f(x^*) + f'(x^*)(x - x^*) + \frac{1}{2}f''(\xi)(x - x^*)^2, \quad (2.206)$$
where $\xi$ is a point situated between $x$ and $x^*$. It obviously follows that
$$|f'(x^*)(x - x^*) - f(x) + f(x^*)| \le \frac{1}{2}|f''(\xi)|(x - x^*)^2 \quad (2.207)$$
and, taking into account condition (2.198), we have
$$|f'(x^*)(x - x^*) - f(x) + f(x^*)| \le \frac{1}{2}b(x - x^*)^2. \quad (2.208)$$
We obtain
$$|g(x) - x^*| \le \frac{b}{2a}|x - x^*|^2 + k \quad (2.209)$$
from relations (2.205) and (2.208). On the other hand, $x \in [x^* - ky^*, x^* + ky^*]$, hence
$$|x - x^*| \le ky^* \quad (2.210)$$
and relation (2.209) leads to
$$|g(x) - x^*| \le \frac{b}{2a}k^2(y^*)^2 + k = k\left[\frac{bc}{2a^2}(y^*)^2 + 1\right]. \quad (2.211)$$
From relations (2.199) and (2.202), we get
$$\frac{bc}{2a^2}(y^*)^2 + 1 = \mu(y^*)^2 + 1 = y^*, \quad (2.212)$$
hence
$$|g(x) - x^*| \le ky^*. \quad (2.213)$$
Concluding, $g([x^* - ky^*, x^* + ky^*]) \subset [x^* - ky^*, x^* + ky^*]$.

Let us show now that $g$ is a contraction. We have
$$|g'(x)| = \left|1 - \frac{f'(x)}{f'(x^*)}\right| = \frac{1}{|f'(x^*)|}|f'(x^*) - f'(x)|. \quad (2.214)$$
Applying the finite increments formula to the function $f'$ for the points $x$ and $x^*$, it follows that there exists $\eta$ between $x$ and $x^*$ such that
$$f'(x^*) - f'(x) = f''(\eta)(x^* - x) \quad (2.215)$$
and, applying the modulus to the last relation, we get
$$|f'(x^*) - f'(x)| = |f''(\eta)||x^* - x|. \quad (2.216)$$
Taking into account equation (2.198), relation (2.216) leads to
$$|f'(x^*) - f'(x)| \le b|x^* - x|. \quad (2.217)$$
Relations (2.214) and (2.217) imply that
$$|g'(x)| \le \frac{1}{|f'(x^*)|}b|x^* - x| \quad (2.218)$$
and, taking into account equation (2.196), we obtain
$$|g'(x)| \le \frac{b}{a}|x^* - x|. \quad (2.219)$$
Applying now formulae (2.210), (2.201), and (2.199), we get
$$|g'(x)| \le \frac{b}{a}ky^* = 2\mu y^* = 1 - \sqrt{1 - 4\mu}. \quad (2.220)$$
Because $0 < \mu < 1/4$, we get $|g'(x)| < 1$ and can choose as contraction constant
$$q = 1 - \sqrt{1 - 4\mu} < 1, \quad (2.221)$$
proving that $g$ is a contraction.

Observation 2.14 We must obviously have
$$[x^* - ky^*, x^* + ky^*] \subset [x^* - \lambda, x^* + \lambda]. \quad (2.222)$$
To fulfill condition (2.222), it is sufficient that
$$ky^* \le \lambda, \quad (2.223)$$
from which
$$k \le \frac{\lambda}{y^*} = \frac{2\lambda\mu}{1 - \sqrt{1 - 4\mu}}. \quad (2.224)$$

Observation 2.15 The solution $\bar x$ of the equation
$$x = g(x), \quad (2.225)$$
which is the same as that of the equation
$$f(x) = 0, \quad (2.226)$$
is obtained by constructing the sequence
$$x_0 \in [x^* - ky^*, x^* + ky^*] \text{ arbitrary}, \quad x_{n+1} = g(x_n), \quad n \ge 0. \quad (2.227)$$

Observation 2.16 The formulae that define the a priori estimation of the error,
$$|\bar x - x_n| \le \frac{q^n}{1-q}|x_1 - x_0|, \quad (2.228)$$
and the a posteriori estimation of the error,
$$|\bar x - x_n| \le \frac{1}{1-q}|x_{n+1} - x_n|, \quad (2.229)$$
respectively, are obviously those of the contraction method, specifying that $q$ is given by equation (2.221).
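The iteration (2.227) is simply the contraction method applied to the mapping $g$ of (2.200). In the sketch below the convenient start $x_0 = x^*$ is an illustrative choice (any $x_0$ in $[x^* - ky^*, x^* + ky^*]$ is allowed), as are the names and safeguards; checking the hypotheses of Theorem 2.10 ($\mu < 1/4$, condition (2.222)) is left to the caller.

```python
import math

def newton_kantorovich(f, fprime, x_star, eps=1e-12, max_iter=100):
    """Iteration x_{n+1} = g(x_n) with g(x) = x - f(x)/f'(x*), from (2.200).

    The derivative is evaluated once, at the reference point x*, so every
    step costs a single evaluation of f."""
    d = fprime(x_star)              # f'(x*), computed once
    x = x_star
    for _ in range(max_iter):
        x_new = x - f(x) / d
        if abs(x_new - x) < eps:
            return x_new
        x = x_new
    return x

# The example of Section 2.6: f(x) = x - 0.5*(1 - sin x), x* = 0.5
f = lambda x: x - 0.5 * (1 - math.sin(x))
print(newton_kantorovich(f, lambda x: 1 + 0.5 * math.cos(x), 0.5))  # ~0.335418
```

This reproduces the iterates of Table 2.6 in the next section.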
2.6 NUMERICAL EXAMPLES

Consider the equation
$$f(x) = x - \frac{1 - \sin x}{2} = 0, \quad x \in [0,1]. \quad (2.230)$$
We observe that $f(0) = -0.5$ and $f(1) = 0.9207$; we also have
$$f'(x) = 1 + 0.5\cos x, \quad (2.231)$$
hence $f'(x) > 0$ for $x \in [0,1]$. We conclude that the equation $f(x) = 0$ has only one root in the interval $[0,1]$.

Let us apply the bipartition method to solve equation (2.230). The calculation is given in Table 2.1. We may state that
$$\bar x \in [0.333984375,\ 0.3359375]. \quad (2.232)$$

TABLE 2.1 Solution of Equation (2.230) by the Bipartition Method

| Step | $a_n$ | $b_n$ | $c_n$ | $f(a_n)$ | $f(b_n)$ | $f(c_n)$ |
|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 0.5 | −0.5 | 0.9207 | 0.2397 > 0 |
| 1 | 0 | 0.5 | 0.25 | −0.5 | 0.2397 | −0.1263 < 0 |
| 2 | 0.25 | 0.5 | 0.375 | −0.1263 | 0.2397 | 0.0581 > 0 |
| 3 | 0.25 | 0.375 | 0.3125 | −0.1263 | 0.0581 | −0.0338 < 0 |
| 4 | 0.3125 | 0.375 | 0.34375 | −0.0338 | 0.0581 | 0.0123 > 0 |
| 5 | 0.3125 | 0.34375 | 0.328125 | −0.0338 | 0.0123 | −0.0107 < 0 |
| 6 | 0.328125 | 0.34375 | 0.3359375 | −0.0107 | 0.0123 | 0.0008 > 0 |
| 7 | 0.328125 | 0.3359375 | 0.33203125 | −0.0107 | 0.0008 | −0.005 < 0 |
| 8 | 0.33203125 | 0.3359375 | 0.333984375 | −0.005 | 0.0008 | −0.0021 < 0 |

We now apply the chord method to solve equation (2.230); the calculation may be found in Table 2.2. It follows that
$$\bar x \approx 0.335418. \quad (2.233)$$

TABLE 2.2 Solution of Equation (2.230) by the Chord Method

| Step | $a_n$ | $b_n$ | $c_n$ | $f(a_n)$ | $f(b_n)$ | $f(c_n)$ |
|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 0.351931 | −0.5 < 0 | 0.920735 | 0.024287 > 0 |
| 1 | 0 | 0.351931 | 0.335628 | −0.5 < 0 | 0.024287 | 0.000309 > 0 |
| 2 | 0 | 0.335628 | 0.335421 | −0.5 < 0 | 0.000309 | 4 × 10⁻⁶ > 0 |
| 3 | 0 | 0.335421 | 0.335418 | −0.5 < 0 | 4 × 10⁻⁶ > 0 | ≈ 0 |

The recurrence formula in the tangent method reads
$$x_{n+1} = x_n - \frac{x_n - 0.5(1 - \sin x_n)}{1 + 0.5\cos x_n}. \quad (2.234)$$
Because
$$f'(x) = 1 + 0.5\cos x, \quad f''(x) = -0.5\sin x \quad (2.235)$$
and $f'(x) > 0$, $f''(x) \le 0$ for $x \in [0,1]$, we deduce that the function $f$ is strictly increasing and concave on the interval $[0,1]$. We may thus choose
$$x_0 = a = 0. \quad (2.236)$$
The calculations are given in Table 2.3. We obtain
$$\bar x \approx 0.335418. \quad (2.237)$$

TABLE 2.3 Solution of Equation (2.230) by the Tangent Method

| Step | $x_n$ | $f(x_n)$ | $f'(x_n)$ |
|---|---|---|---|
| 0 | 0 | −0.5 | 1.5 |
| 1 | 0.333333 | −0.003070 | 1.472479 |
| 2 | 0.335418 | −4.7675 × 10⁻⁸ | 1.472136 |
| 3 | 0.335418 | – | – |

Let us solve the same problem by means of the modified (simplified) Newton method, for which
$$x_{n+1} = x_n - \frac{x_n - 0.5(1 - \sin x_n)}{1.5}. \quad (2.238)$$
The calculations are given in Table 2.4. We get
$$\bar x \approx 0.335418. \quad (2.239)$$

TABLE 2.4 Solution of Equation (2.230) by Means of the Modified Newton Method

| Step | $x_n$ | $f(x_n)$ |
|---|---|---|
| 0 | 0 | −0.5 |
| 1 | 0.333333 | −0.003070 |
| 2 | 0.335380 | −0.000056 |
| 3 | 0.335417 | −2 × 10⁻⁶ |
| 4 | 0.335418 | −4.7675 × 10⁻⁸ |
| 5 | 0.335418 | – |

To solve equation (2.230) by the contraction method, we write it in the form
$$x = 0.5(1 - \sin x) = \varphi(x). \quad (2.240)$$
Taking into account that the derivative satisfies
$$\varphi'(x) = -0.5\cos x, \quad |\varphi'(x)| \le 0.5 < 1, \quad (2.241)$$
it follows that $\varphi(x)$ is a contraction, so that the recurrence formula is of the form
$$x_{n+1} = \varphi(x_n) = 0.5(1 - \sin x_n); \quad (2.242)$$
the calculation is given in Table 2.5.
TABLE 2.5 Solution of Equation (2.230) by the Contraction Method

| Step | $x_n$ | $\varphi(x_n)$ |
|---|---|---|
| 0 | 0.5 | 0.260287 |
| 1 | 0.260287 | 0.371321 |
| 2 | 0.371321 | 0.318577 |
| 3 | 0.318577 | 0.343392 |
| 4 | 0.343392 | 0.331650 |
| 5 | 0.331659 | 0.337194 |
| 6 | 0.337194 | 0.334580 |
| 7 | 0.334580 | 0.335814 |
| 8 | 0.335814 | 0.335231 |
| 9 | 0.335231 | 0.335506 |
| 10 | 0.335506 | 0.335377 |
| 11 | 0.335377 | 0.335437 |
| 12 | 0.335437 | 0.335409 |
| 13 | 0.335409 | 0.335422 |
| 14 | 0.335422 | 0.335416 |
| 15 | 0.335416 | 0.335419 |
| 16 | 0.335419 | 0.335418 |
| 17 | 0.335418 | 0.335418 |

We obtain
$$\bar x \approx 0.335418. \quad (2.243)$$
To apply the Newton–Kantorovich method, let us consider
$$x^* = 0.5, \quad (2.244)$$
$$c = |f(x^*)| = |0.5 - 0.5(1 - \sin 0.5)| = 0.239713, \quad (2.245)$$
$$a = |f'(x^*)| = |1 + 0.5\cos 0.5| = 1.438791, \quad (2.246)$$
$$|f''(x)| = |-0.5\sin x| = 0.5\sin x \le 0.5\sin 1 = 0.420735; \quad (2.247)$$
we may thus take
$$|f''(x)| \le b = 0.43, \quad (2.248)$$
$$\mu = \frac{bc}{2a^2} = 0.024896 < \frac{1}{4}, \quad (2.249)$$
$$\lambda = 0.5. \quad (2.250)$$
Hence, we can apply the Newton–Kantorovich method, with
$$k = \frac{c}{a} = 0.166607, \quad (2.251)$$
$$y^* = \frac{1 - \sqrt{1 - 4\mu}}{2\mu} = 1.026219, \quad (2.252)$$
$$ky^* = 0.170975 \quad (2.253)$$
and the function $g : [0.329025, 0.670975] \to [0.329025, 0.670975]$,
$$g(x) = x - \frac{f(x)}{f'(x^*)}. \quad (2.254)$$
The calculation is given in Table 2.6. We deduce that
$$\bar x \approx 0.335418. \quad (2.255)$$

TABLE 2.6 Solution of Equation (2.230) by the Newton–Kantorovich Method

| Step | $x_n$ | $f(x_n)$ |
|---|---|---|
| 0 | 0.5 | 0.239713 |
| 1 | 0.333393 | −0.002981 |
| 2 | 0.335465 | 0.000069 |
| 3 | 0.335417 | −0.000002 |
| 4 | 0.335418 | −4.7675 × 10⁻⁸ |
| 5 | 0.335418 | – |

The following conclusions result:
(i) the most unfavorable method is that of bisection, for which a relatively large number of steps is necessary to determine the solution with a good approximation;
(ii) the number of steps in the contraction method depends on the value of the contraction constant; if this constant is close to 1, then the number of iteration steps increases;
(iii) Newton's method is quicker than the modified Newton method;
(iv) the Newton–Kantorovich method has both the advantages and the disadvantages of Newton's method and of the contraction method;
(v) the chord method is quicker than the bisection one, but less quick than Newton's method.

2.7 APPLICATIONS

Problem 2.1 Let us consider a material point of mass $m$, which moves on the $Ox$-axis (Fig. 2.7) under the action of a force
$$F = -\frac{F_0}{b}\,x\,e^{x/b}. \quad (2.256)$$
Determine the maximum displacement $x_{\max}$, knowing the initial conditions: $t = 0$, $x = x_0$, $\dot x = v_0$.

Numerical application: $x_0 = 0$, $v_0 = 40\ \mathrm{m\,s^{-1}}$, $F_0 = 50\ \mathrm{N}$, $b = 2\ \mathrm{m}$, $m = 1\ \mathrm{kg}$.

Figure 2.7 Problem 2.1.

Solution:
1. Theory. The theorem of variation of the kinetic energy is
$$\frac{mv^2}{2} - \frac{mv_0^2}{2} = W, \quad (2.257)$$
where $v$ is the velocity of the material point, while
$$W = \int_{x_0}^{x} F(x)\,\mathrm{d}x \quad (2.258)$$
is the work of the force $F$; imposing the condition $v = 0$, we obtain $x_{\max}$ as the solution of the equation
$$-\frac{mv_0^2}{2} = \int_{x_0}^{x_{\max}} F(x)\,\mathrm{d}x. \quad (2.259)$$
In the considered case, by using the notations
$$\xi = \frac{x}{b}, \quad k = \frac{mv_0^2}{2bF_0} + \frac{x_0}{b}e^{x_0/b} - e^{x_0/b}, \quad (2.260)$$
we obtain the equation
$$\xi e^\xi - e^\xi - k = 0. \quad (2.261)$$
2. Numerical Calculation. In the case of the numerical application, equation (2.261) takes the form
$$\xi e^\xi - e^\xi - 7 = 0, \quad (2.262)$$
the solution of which is
$$\xi \approx 1.973139, \quad (2.263)$$
that is,
$$x_{\max} = b\xi \approx 3.946278\ \mathrm{m}. \quad (2.264)$$

Problem 2.2 Two particles move on the $Ox$-axis according to the laws (Fig. 2.8)
$$x_1 = A_1\cos(\omega_1 t), \quad (2.265)$$
$$x_2 = d + A_2\cos(\omega_2 t + \varphi), \quad (2.266)$$
where $\omega_1$, $\omega_2$, $\varphi$, $A_1$, and $A_2$ are positive constants, while $t$ is the time.

Figure 2.8 Problem 2.2.

Let us determine the first positive value of the time at which the two particles meet.

Numerical application: $\omega_1 = 2\ \mathrm{s^{-1}}$, $\omega_2 = \pi\ \mathrm{s^{-1}}$, $\varphi = \pi/6\ \mathrm{rad}$, $d = 1\ \mathrm{m}$, $A_1 = 0.6\ \mathrm{m}$, $A_2 = 0.8\ \mathrm{m}$.
Solution: The meeting condition reads
$$x_1 = x_2, \quad (2.267)$$
from which
$$A_1\cos(\omega_1 t) = d + A_2\cos(\omega_2 t + \varphi) \quad (2.268)$$
or
$$\cos(\omega_1 t) = \frac{d}{A_1} + \frac{A_2}{A_1}\cos(\omega_2 t + \varphi). \quad (2.269)$$
Because $-1 \le \cos(\omega_1 t) \le 1$, we obtain a condition that the parameters of the problem must satisfy:
$$-1 \le \frac{d}{A_1} + \frac{A_2}{A_1}\cos(\omega_2 t + \varphi) \le 1. \quad (2.270)$$
In the numerical case considered, it follows that
$$\cos 2t = \frac{1}{0.6} + \frac{0.8}{0.6}\cos\left(\pi t + \frac{\pi}{6}\right). \quad (2.271)$$
Let us represent graphically the functions $f : \mathbb{R}_+ \to \mathbb{R}$, $g : \mathbb{R}_+ \to \mathbb{R}$ (Fig. 2.9),
$$f(t) = 0.6\cos 2t, \quad g(t) = 1 + 0.8\cos\left(\pi t + \frac{\pi}{6}\right). \quad (2.272)$$

Figure 2.9 The functions f(t) (continuous line) and g(t) (dotted line).

From the figure, we obtain the first meeting point for $t_1$ contained between 2.5 and 3 s. Solving the equation
$$0.6\cos 2t - 1 - 0.8\cos\left(\pi t + \frac{\pi}{6}\right) = 0, \quad (2.273)$$
we obtain the required solution
$$t_1 \approx 2.6485\ \mathrm{s}. \quad (2.274)$$
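Both applications can be checked with a few lines of Python; the helper `bisection` below is a minimal sketch of the bipartition method from the beginning of this chapter, with illustrative names and tolerance.

```python
import math

def bisection(h, a, b, eps=1e-6):
    """Bipartition method: halve the bracket [a, b] while the sign change persists."""
    ha = h(a)
    while b - a > eps:
        c = 0.5 * (a + b)
        if ha * h(c) <= 0:
            b = c
        else:
            a, ha = c, h(c)
    return 0.5 * (a + b)

# Problem 2.1, equation (2.262): xi*e^xi - e^xi - 7 = 0, bracketed on [1, 3]
xi = bisection(lambda s: (s - 1) * math.exp(s) - 7, 1.0, 3.0)
print(xi, 2 * xi)   # ~1.973139 and x_max = b*xi ~ 3.946278 m, cf. (2.263)-(2.264)

# Problem 2.2, equation (2.273), bracketed on [2.5, 3] as read off Figure 2.9
t1 = bisection(lambda t: 0.6 * math.cos(2 * t) - 1
               - 0.8 * math.cos(math.pi * t + math.pi / 6), 2.5, 3.0)
print(t1)           # ~2.6485 s, cf. (2.274)
```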
FURTHER READING

Acton FS (1990). Numerical Methods that Work. 4th ed. Washington: Mathematical Association of America.
Ackleh AS, Allen EJ, Kearfott RB, Seshaiyer P (2009). Classical and Modern Numerical Analysis: Theory, Methods and Practice. Boca Raton: CRC Press.
Atkinson KE (1989). An Introduction to Numerical Analysis. 2nd ed. New York: John Wiley & Sons, Inc.
Atkinson KE (2003). Elementary Numerical Analysis. 2nd ed. Hoboken: John Wiley & Sons, Inc.
Bakhvalov N (1976). Méthodes Numériques. Moscou: Éditions Mir (in French).
Berbente C, Mitran S, Zancu S (1997). Metode Numerice. București: Editura Tehnică (in Romanian).
Burden RL, Faires JD (2009). Numerical Analysis. 9th ed. Boston: Brooks/Cole.
Butt R (2009). Introduction to Numerical Analysis Using MATLAB. Boston: Jones and Bartlett Publishers.
Chapra SC (1996). Applied Numerical Methods with MATLAB for Engineers and Scientists. Boston: McGraw-Hill.
Cheney EW, Kincaid DR (1997). Numerical Mathematics and Computing. 6th ed. Belmont: Thomson.
Cira O, Mărușter Ș (2008). Metode Numerice pentru Ecuații Neliniare. București: Editura Matrix Rom (in Romanian).
Dahlquist G, Björck Å (1974). Numerical Methods. Englewood Cliffs: Prentice Hall.
Démidovitch B, Maron I (1973). Éléments de Calcul Numérique. Moscou: Éditions Mir (in French).
Dennis JE Jr, Schnabel RB (1987). Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Philadelphia: SIAM.
DiBenedetto E (2010). Classical Mechanics: Theory and Mathematical Modeling. New York: Springer-Verlag.
Epperson JF (2007). An Introduction to Numerical Methods and Analysis. Hoboken: John Wiley & Sons, Inc.
Fung YC, Tong P (2011). Classical and Computational Solid Mechanics. Singapore: World Scientific Publishing.
Gautschi W (1997). Numerical Analysis: An Introduction. Boston: Birkhäuser.
Godunov SK, Reabenki VS (1977). Scheme de Calcul cu Diferențe Finite. București: Editura Tehnică (in Romanian).
Greenbaum A, Chartier TP (2012). Numerical Methods: Design, Analysis, and Computer Implementation of Algorithms. Princeton: Princeton University Press.
Hamming RW (1987). Numerical Methods for Scientists and Engineers. 2nd ed. New York: Dover Publications.
Hamming RW (2012). Introduction to Applied Numerical Analysis. New York: Dover Publications.
Heinbockel JH (2006). Numerical Methods for Scientific Computing. Victoria: Trafford Publishing.
Higham NJ (2002). Accuracy and Stability of Numerical Algorithms. 2nd ed. Philadelphia: SIAM.
Hildebrand FB (1987). Introduction to Numerical Analysis. 2nd ed. New York: Dover Publications.
Hoffman JD (1992). Numerical Methods for Engineers and Scientists. New York: McGraw-Hill.
Kharab A, Guenther RB (2011). An Introduction to Numerical Methods: A MATLAB Approach. 3rd ed. Boca Raton: CRC Press.
Kelley CT (1987a). Iterative Methods for Linear and Nonlinear Equations. Philadelphia: SIAM.
Kelley CT (1987b). Solving Nonlinear Equations with Newton's Method. Philadelphia: SIAM.
Kleppner D, Kolenkow RJ (2010). An Introduction to Mechanics. Cambridge: Cambridge University Press.
Kress R (1996). Numerical Analysis. New York: Springer-Verlag.
Krîlov AN (1957). Lecții de Calcule prin Aproximații. București: Editura Tehnică (in Romanian).
Kunz KS (1957). Numerical Analysis. New York: McGraw-Hill.
Lange K (2010). Numerical Analysis for Statisticians. 2nd ed. New York: Springer-Verlag.
Lurie AI (2002). Analytical Mechanics. New York: Springer-Verlag.
Marinescu G (1974). Analiză Numerică. București: Editura Academiei Române (in Romanian).
Meriam JL, Kraige LG (2012). Engineering Mechanics: Dynamics. Hoboken: John Wiley & Sons, Inc.
Otto SR, Denier JP (2005). An Introduction to Programming and Numerical Methods in MATLAB. London: Springer-Verlag.
Pandrea N (2000). Elemente de Mecanica Solidului în Coordonate Plückeriene. București: Editura Academiei Române (in Romanian).
Pandrea N, Stănescu ND (2002). Mecanică. București: Editura Didactică și Pedagogică (in Romanian).
Popovici P, Cira O (1992). Rezolvarea Numerică a Ecuațiilor Neliniare. Timișoara: Editura Signata (in Romanian).
Postolache M (2006). Modelare Numerică. Teorie și Aplicații. București: Editura Fair Partners (in Romanian).
Press WH, Teukolsky SA, Vetterling WT, Flannery BP (2007). Numerical Recipes: The Art of Scientific Computing. 3rd ed. Cambridge: Cambridge University Press.
Quarteroni A, Sacco R, Saleri F (2010). Numerical Mathematics. 2nd ed. Berlin: Springer-Verlag.
Ralston A, Rabinowitz P (2001). A First Course in Numerical Analysis. 2nd ed. New York: Dover Publications.
Ridgway Scott L (2011). Numerical Analysis. Princeton: Princeton University Press.
Salvadori MG, Baron ML (1962). Numerical Methods in Engineering. Englewood Cliffs: Prentice Hall.
Sauer T (2011). Numerical Analysis. 2nd ed. London: Pearson.
Simionescu I, Dranga M, Moise V (1995). Metode Numerice în Tehnică. Aplicații în FORTRAN. București: Editura Tehnică (in Romanian).
Stănescu ND (2007). Metode Numerice. București: Editura Didactică și Pedagogică (in Romanian).
Stoer J, Bulirsch R (2010). Introduction to Numerical Analysis. 3rd ed. New York: Springer-Verlag.
Süli E, Mayers D (2003). An Introduction to Numerical Analysis. Cambridge: Cambridge University Press.
Teodorescu PP (2010). Mechanical Systems: Classical Models. Volume 1: Particle Mechanics. Dordrecht: Springer-Verlag.
Udriște C, Iftode V, Postolache M (1996). Metode Numerice de Calcul. Algoritmi și Programe Turbo Pascal. București: Editura Tehnică (in Romanian).
3 SOLUTION OF ALGEBRAIC EQUATIONS

In this chapter, we deal with the determination of limits of the roots of polynomials, including their separation. Three methods are considered, namely, Lagrange's method, the Lobachevski–Graeffe method, and Bernoulli's method.

3.1 DETERMINATION OF LIMITS OF THE ROOTS OF POLYNOMIALS

Let
$$f(X) = a_0X^n + a_1X^{n-1} + \cdots + a_n \quad (3.1)$$
be a polynomial in $\mathbb{R}[X]$, where $n \in \mathbb{N}^*$, $a_i \in \mathbb{R}$, $i = 0, 1, \ldots, n$. Let us consider the algebraic equation
$$f(x) = a_0x^n + a_1x^{n-1} + \cdots + a_n = 0. \quad (3.2)$$

Theorem 3.1 All the roots of the algebraic equation (3.2) lie in the circular annulus of the complex plane defined by the inequalities
$$\frac{|a_n|}{a' + |a_n|} \le |x| \le 1 + \frac{a}{|a_0|}, \quad (3.3)$$
where $a$ and $a'$ are specified by
$$a = \max\{|a_1|, \ldots, |a_n|\}, \quad a' = \max\{|a_0|, \ldots, |a_{n-1}|\}. \quad (3.4)$$

Demonstration. We first show that
$$|x| \le 1 + \frac{a}{|a_0|}. \quad (3.5)$$
We may write
$$|a_1x^{n-1} + a_2x^{n-2} + \cdots + a_n| \le |a_1||x|^{n-1} + |a_2||x|^{n-2} + \cdots + |a_n| \le a(|x|^{n-1} + |x|^{n-2} + \cdots + 1) = a\frac{|x|^n - 1}{|x| - 1}. \quad (3.6)$$
If $|x| > 1$, then
$$a\frac{|x|^n - 1}{|x| - 1} < a\frac{|x|^n}{|x| - 1} \quad (3.7)$$
and relation (3.6) leads to
$$|a_1x^{n-1} + a_2x^{n-2} + \cdots + a_n| < a\frac{|x|^n}{|x| - 1}. \quad (3.8)$$
Let $x$ be a root of equation (3.2). Thus,
$$f(x) = a_0x^n + a_1x^{n-1} + \cdots + a_n = 0 \quad (3.9)$$
and
$$0 = |f(x)| \ge |a_0x^n| - |a_1x^{n-1} + \cdots + a_n|, \quad (3.10)$$
from which
$$|a_0x^n| \le |a_1x^{n-1} + \cdots + a_n|. \quad (3.11)$$
Taking into account relations (3.6) and (3.7), the latter formula leads, for $|x| > 1$, to the relation
$$|a_0||x|^n < a\frac{|x|^n}{|x| - 1}, \quad (3.12)$$
that is, $|x| - 1 < a/|a_0|$; for $|x| \le 1$, inequality (3.5) is obvious, and hence inequality (3.5) is proved.

To arrive at the other inequality, we perform the transformation $x \to 1/y$, hence
$$\tilde f(y) = a_ny^n + a_{n-1}y^{n-1} + \cdots + a_0. \quad (3.13)$$
Let $y$ be a root of this polynomial. Taking into account equation (3.5), we get
$$|y| \le 1 + \frac{a'}{|a_n|}, \quad (3.14)$$
that is,
$$\frac{1}{|x|} \le 1 + \frac{a'}{|a_n|}, \quad (3.15)$$
hence
$$|x| \ge \frac{|a_n|}{|a_n| + a'}. \quad (3.16)$$

Observation 3.1 Let
$$L = 1 + \frac{a}{|a_0|}, \quad l = \frac{|a_n|}{a' + |a_n|}. \quad (3.17)$$
We have $l < 1$, $L > 1$, and $L > l$.
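Relation (3.17) is immediate to evaluate. A small Python sketch follows; the names are illustrative, and the coefficients are listed from $a_0$ to $a_n$, with $a_0 \neq 0$ and $a_n \neq 0$ as in Observation 3.4 below.

```python
def root_annulus(coeffs):
    """Bounds (3.3)/(3.17): every root x of a0*x^n + ... + an = 0 satisfies
    l <= |x| <= L, with a = max(|a1|,...,|an|), a' = max(|a0|,...,|a_{n-1}|)."""
    a0, an = coeffs[0], coeffs[-1]
    a = max(abs(c) for c in coeffs[1:])
    a_prime = max(abs(c) for c in coeffs[:-1])
    return abs(an) / (a_prime + abs(an)), 1 + a / abs(a0)

# x^3 - 6x^2 + 11x - 6 = (x - 1)(x - 2)(x - 3), with roots 1, 2, 3:
print(root_annulus([1, -6, 11, -6]))   # l = 6/17 ~ 0.353, L = 12
```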
Figure 3.1 Domain where the roots of equation (3.2) lie.

Observation 3.2 The roots of equation (3.2) lie in the hatched domain of the complex plane (Fig. 3.1).

Observation 3.3 If equation (3.2) has positive real roots, then formula (3.3) can be written for these roots in the form
$$\frac{|a_n|}{a' + |a_n|} \le x \le 1 + \frac{a}{|a_0|}. \quad (3.18)$$

Observation 3.4 We can always assume that $a_n \neq 0$. In the opposite case, after factoring out the corresponding power of $x$, we obtain an equation of the form
$$a_0x^p + a_1x^{p-1} + \cdots + a_p = 0, \quad (3.19)$$
where $a_p \neq 0$.

Definition 3.1 The real number $L > 0$ is called a superior bound of the positive roots of equation (3.2) if for any such root $x$ we have $x < L$.

Definition 3.2 The real number $l > 0$ is called an inferior bound of the positive roots of equation (3.2) if for any such root $x$ we have $x > l$.

Observation 3.5
(i) The value $-l < 0$ will be a superior bound of the negative roots of equation (3.2) if $l > 0$ is an inferior bound of the positive roots of the same equation.
(ii) The value $-L < 0$ will be an inferior bound of the negative roots of equation (3.2) if $L > 0$ is a superior bound of the positive roots of the same equation.
(iii) The real roots of equation (3.2) are in the set $(-L, -l) \cup (l, L)$.

Observation 3.6
(i) Let us consider the equation
$$f_1(x) = (-1)^n f(-x) = 0, \quad (3.20)$$
for which $L_1$ denotes a superior bound of its positive roots. If $\alpha < 0$ is a negative root of equation (3.2), then $-\alpha > 0$ will be a root of equation (3.20), hence $-\alpha < L_1$, from which $\alpha > -L_1$.
(ii) Let us consider the equation
$$f_2(x) = x^n f\!\left(\frac{1}{x}\right) = 0 \quad (3.21)$$
and let $L_2$ denote a superior bound of its positive roots. If $\alpha > 0$ is a positive root of equation (3.2), then $1/\alpha > 0$ is a solution of equation (3.21), for which $1/\alpha < L_2$, hence $\alpha > 1/L_2$.
(iii) Let us now consider the equation
$$f_3(x) = (-1)^n x^n f\!\left(-\frac{1}{x}\right) = 0. \quad (3.22)$$
Let $L_3$ be a superior bound of its positive roots. If $\alpha < 0$ is a negative root of equation (3.2), then $-1/\alpha > 0$ is a positive root of equation (3.22), for which the relation $-1/\alpha < L_3$ is true. Hence it follows that $\alpha < -1/L_3$.
(iv) From the above considerations, it follows that the real roots of equation (3.2) belong to the set $(-L_1, -1/L_3) \cup (1/L_2, L)$.

Theorem 3.2 Let $A$ be the greatest absolute value of the negative coefficients of the algebraic equation (3.2), for which $a_0 > 0$ (multiplying by $-1$ if necessary). Under these conditions, a superior limit of the positive roots of this equation is given by
$$L = 1 + \left(\frac{A}{a_0}\right)^{1/k}, \quad (3.23)$$
where $k$ is the index of the first negative coefficient in the expression of the polynomial function (3.1).

Demonstration. Let us specify the terms which appear in equation (3.23). Thus, $A$ is given by
$$A = \max_{1\le i\le n}\{|a_i| \mid a_i < 0\}, \quad (3.24)$$
while $k$ is given by
$$k = \min\{i \mid a_i < 0,\ a_j \ge 0\ (\forall)\, j < i\}. \quad (3.25)$$
Let $x > 0$. Then $f(x)$ can be written in the form
$$f(x) = a_0x^n + \cdots + a_{k-1}x^{n-k+1} + (a_kx^{n-k} + \cdots + a_n) \ge a_0x^n - A(x^{n-k} + \cdots + 1) = a_0x^n - A\frac{x^{n-k+1} - 1}{x - 1}. \quad (3.26)$$
For $x > 1$, the last formula leads to
$$f(x) > a_0x^n - A\frac{x^{n-k+1}}{x - 1}. \quad (3.27)$$
Let $x$ be a positive root, $x > 1$, of equation (3.2). Relation (3.27) leads to
$$0 > a_0x^n - A\frac{x^{n-k+1}}{x - 1}, \quad (3.28)$$
from which
$$a_0 < A\frac{x^{-(k-1)}}{x - 1} = \frac{A}{x^{k-1}(x - 1)} < \frac{A}{(x - 1)^k}, \quad (3.29)$$
so that
$$x < 1 + \left(\frac{A}{a_0}\right)^{1/k}. \quad (3.30)$$

Observation 3.7 If all the coefficients of equation (3.2) are positive, then this equation has no positive roots.

Observation 3.8 We notice that Theorem 3.2 gives tighter bounds for the limits of the real roots of equation (3.2).

Theorem 3.3 (Newton). Let $f$ be the polynomial function given by equation (3.1), with $a_0 > 0$, and let $a \in \mathbb{R}$, $a > 0$, be a number such that $f(a) > 0$, $f'(a) > 0$, $\ldots$, $f^{(n)}(a) > 0$. Under these conditions, $a$ is a superior bound of the positive roots of equation (3.2).

Demonstration. The expansion of $f$ into a Taylor series around $a$ is of the form
$$f(x) = f(a) + (x - a)\frac{f'(a)}{1!} + (x - a)^2\frac{f''(a)}{2!} + \cdots + (x - a)^n\frac{f^{(n)}(a)}{n!}. \quad (3.31)$$
We observe that if $x \ge a$, then $f(x) > 0$, because $f^{(i)}(a) > 0$ and $x - a \ge 0$. It thus follows that $f$ cannot have roots greater than $a$; hence $a$ is a superior bound of the roots of the equation $f(x) = 0$.

Let us show that such an $a$ exists. We have
$$f^{(n)}(x) = a_0 n! > 0, \quad (3.32)$$
because $a_0 > 0$ by hypothesis. It follows that $f^{(n-1)}$ is strictly increasing, and hence there exists $a_1 \in \mathbb{R}$ such that $f^{(n-1)}(x) > 0$ for $x \ge a_1$. Obviously, we may consider $a_1 > 0$. Since $f^{(n-1)}$ is strictly positive for $x \ge a_1$, the derivative $f^{(n-2)}$ is strictly increasing there, and hence there exists $a_2$, with $a_2 \ge a_1$, such that $f^{(n-2)}(x) > 0$ for $x \ge a_2$. The procedure continues until $a_n$, with $a_n \ge a_{n-1} \ge \cdots \ge a_1$, so that $f^{(i)}(a_n) > 0$ for any $i = 0, 1, \ldots, n$. We now take $a = a_n$.

Theorem 3.4 Let $f$ be a polynomial function of the form (3.1); let us suppose that $a_0 > 0$ and that the polynomial has only one variation of sign in the sequence of its coefficients, that is, there exists $i$, $1 \le i < n$, such that $a_j > 0$ for any $j$, $0 \le j \le i$, and $a_j < 0$ for any $j$, $i < j \le n$. Let us suppose that there exists $a \in \mathbb{R}$, $a > 0$, such that $f(a) \ge 0$. Then $f(x) > 0$ for any $x > a$.

Demonstration. Let us write the polynomial $f$ in the form
$$f(x) = (a_0x^n + \cdots + a_ix^{n-i}) - (|a_{i+1}|x^{n-i-1} + \cdots + |a_n|). \quad (3.33)$$
It follows that
$$f(x) = x^{n-i}\left[(a_0x^i + \cdots + a_i) - \left(\frac{|a_{i+1}|}{x} + \cdots + \frac{|a_n|}{x^{n-i}}\right)\right]. \quad (3.34)$$
If $x$ increases starting from $a$, the expression in the first parentheses increases, while that in the second parentheses decreases. Hence the bracket is increasing and, because $f(a) \ge 0$, it follows that $f(x) > 0$ for $x > a$. Hence $a$ is a superior bound of the positive roots of the equation $f(x) = 0$.
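Formula (3.23) is likewise a one-liner in code. The sketch below returns None when no coefficient is negative, in agreement with Observation 3.7 (no positive roots in that case); the names are illustrative.

```python
def superior_bound(coeffs):
    """Superior bound L = 1 + (A/a0)**(1/k) of Theorem 3.2, relation (3.23).

    coeffs = [a0, ..., an] with a0 > 0; A from (3.24), k from (3.25)."""
    a0 = coeffs[0]
    negative = [(i, c) for i, c in enumerate(coeffs) if c < 0]
    if not negative:
        return None                  # Observation 3.7: no positive roots
    k = negative[0][0]               # index of the first negative coefficient
    A = max(abs(c) for _, c in negative)
    return 1 + (A / a0) ** (1.0 / k)

# x^3 - 6x^2 + 11x - 6: A = 6, k = 1, hence L = 7 (the roots are 1, 2, 3)
print(superior_bound([1, -6, 11, -6]))
```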
Observation 3.9. The previous theorem suggests a method to determine a superior bound of the positive roots of equation (3.2). To do this, we group the terms of the polynomial so that (i) the powers of x are decreasing in any group; (ii) the first coefficient of a group is positive; (iii) we have only one variation of sign in the interior of each group. We then determine a superior bound of the positive roots for each group; the superior bound of the positive roots of equation (3.2) will be the maximum of the superior bounds of the positive roots of the groups.

Observation 3.10. The method presented above, called the method of grouping of terms, is sensitive to the choice of the groups.

3.2 SEPARATION OF ROOTS

Definition 3.3. Let {b_i}, i = 0, 1, ..., m, be a finite sequence of real numbers, so that b_i < b_{i+1}, i = 0, 1, ..., m − 1. We say that this sequence separates the roots of the algebraic equation f(x) = 0, with

f(x) = a_0 x^n + ··· + a_n, (3.35)

if we have a single root of this equation in any interval (b_i, b_{i+1}), i = 0, 1, ..., m − 1.

Observation 3.11. The sequence {b_i}, i = 0, 1, ..., m, can be chosen as consisting of a part of Rolle's sequence.

Proposition 3.1. Let f be a polynomial of even degree n, n = 2k, for which a_0 a_{2k} < 0. Under these conditions, the equation

f(x) = a_0 x^{2k} + a_1 x^{2k−1} + ··· + a_{2k} = 0 (3.36)

has at least one positive root and one negative root.

Demonstration. To fix the ideas, let us suppose that a_0 > 0. Because

lim_{x→∞} f(x) = +∞, (3.37)

it follows that there exists m_1 > 0, so that f(x) > 0 for any x > m_1. Analogically, we have

lim_{x→−∞} f(x) = +∞, (3.38)

hence there exists m_2 < 0, so that f(x) > 0 for any x < m_2. Let M = max{|m_1|, |m_2|}. Hence, for any x with |x| > M, we have f(x) > 0. On the other hand, f(0) = a_{2k} < 0, because a_0 a_{2k} < 0 and a_0 > 0. It follows that there exist ξ_1 ∈ (−M, 0) and ξ_2 ∈ (0, M), so that f(ξ_1) = 0 and f(ξ_2) = 0. Hence, equation (3.36) has at least one negative root and at least one positive root, and the proposition is proved.

Observation 3.12. Proposition 3.1 specifies only the existence of a positive root and of a negative one, but there can exist several positive and negative roots.

Proposition 3.2. Let f be a polynomial with real coefficients and (a, b) an interval of the real axis. Let us suppose that f has a single root x̄ of multiplicity order k on this interval. Under these conditions,
  • 70. SEPARATION OF ROOTS 61 (i) f (a)f (b) > 0 if k is an even number; (ii) f (a)f (b) < 0 if k is an odd number. Demonstration. We write f in the form f (x) = (x − x)k g(x), (3.39) where g(x) is a polynomial with real coefficients, without solution in the interval (a, b). We have f (a) = (a − x)k g(a), f (b) = (b − x)k g(b). (3.40) We mention that g(a) and g(b) have the same sign, because g has no roots in the interval (a, b). On the other hand, a − x < 0, b − x > 0, (3.41) because x ∈ (a, b). We may write f (a)f (b) = (a − x)k (b − x)k g(a)g(b) = [(a − x)(b − x)]k g(a)g(b). (3.42) The sign of f (a)f (b) is given by [(a − x)(b − x)]k . If k is an even number (eventually 0), then [(a − x)(b − x)]k > 0, hence f (a)f (b) > 0. Analogically, if k is an odd number, then [(a − x)(b − x)]k < 0, so that f (a)f (b) < 0. Proposition 3.3 Let f be a polynomial of degree n with real coefficients and (a, b) an interval of the real axis. Let us suppose that, in the interval (a, b), f has s roots denoted by x1, x2, . . . , xs, of multiplicity orders k1, k2, . . . , ks. In these conditions, (i) if f (a)f (b) < 0, then s i=1 ki is an odd number; (ii) if f (a)f (b) > 0, then s i=1 ki is an even number (eventually 0). Demonstration. Let us suppose that the roots x1, x2, . . . , xs have been increasingly ordered on the interval (a, b), so that x1 < x2 < · · · < xs. Let x be one of these roots of multiplicity order k and h a real number. We may write f (x + h) = hk f (k) (x) k! + hk+1 f (k+1) (ξ1) (k + 1)! , (3.43) where ξ1 ∈ (x, x + h). Analogically, f (x − h) = (−1)k hk f (k) (x) k! + (−1)k+1 hk+1 f (k+1) (ξ2) (k + 1)! , (3.44) with ξ2 ∈ (x − h, x). Hence, it follows that f (x + h)f (x − h) = (−1)k hk f (k)(x) k! 2 + (−1)k h2k+1 k!(k + 1)! × f (k) (x)f (k+1) (ξ1) + (−1)k+1 h2k+1 k!(k + 1)! f (k) (x)f (k+1) (ξ2) + (−1)k+1 h2k+2 [(k + 1)!]2 f (k+1) (ξ1)f (k+1) (ξ2) (3.45)
  • 71. 62 SOLUTION OF ALGEBRAIC EQUATIONS or f (x + h)f (x − h) = (−1)k hk f (k)(x) k! 2 + h2k+1 φ(x, ξ1, ξ2), (3.46) where the notation for the function φ is obvious. We can immediately show that, for h sufficiently small, the sign of f (x + h)f (x − h) is given by the sign of (−1)k and it is +1 for k even and −1 for k odd, respectively. It follows that f has the sign of f (a) on the interval (a, x1), has the sign of (−1)k1 f (a) on the interval (x1, x2), has the sign of (−1)k1+k2 f (a) on the interval (x2, x3), . . . , and has the sign of (−1)k1+···+ks f (a) on the interval (xs, b). Hence, we can state that if f (a)f (b) < 0, then s i=1 ki is an odd number, while if f (a)f (b) > 0, then the sum is an even number (eventually 0). Theorem 3.5 (Edward Waring, 1736–1798). Let f be a polynomial with real coefficients and x1 and x2 be two consecutive roots of the polynomial (i.e., no other root of the polynomial exists between x1 and x2). Let x1 be of order of multiplicity k1, and x2 of order of multiplicity k2. Under these conditions, the polynomial g = f + λf , λ ∈ R, has, on the interval (x1, x2), a number of real roots, the sum of multiplicity orders of which is odd. Moreover, x1 and x2 are roots of the polynomial g, of multiplicity orders k1 − 1 and k2 − 1, respectively. Demonstration. Let us write the polynomial f in the form f (x) = (x − x1)k1 (x − x2)k2 h(x), (3.47) where h(x) does not change in sign in the interval (x1, x2). Hence, f (x) = k1(x − x1)k1−1 (x − x2)k2 h(x) + k2(x − x1)k1 (x − x2)k2−1 h(x) + (x − x1)k1 (x − x2)k2 h (x) = (x − x1)k1−1 (x − x2)k2−1 [k1(x − x2)h(x) + k2(x − x1)h(x) + (x − x1)(x − x2)h (x)]. (3.48) Denoting by p(x) the polynomial p(x) = k1(x − x2)h(x) + k2(x − x1)h(x) + (x − x1)(x − x2)h (x), (3.49) relation (3.48) leads to f (x) = (x − x1)k1−1 (x − x2)k2−1 p(x). (3.50) The polynomial g(x) can be written in the form g(x) = (x − x1)k1−1 (x − x2)k2−1 p(x) + λ(x − x1)k1 (x − x2)k2 h(x) = (x − x1)k1−1 (x − x2)k2−1 [p(x) + λ(x − x1)(x − x2)h(x)]. (3.51) Denoting by q(x) the polynomial q(x) = p(x) + λ(x − x1)(x − x2)h(x), (3.52) formula (3.51) leads to g(x) = (x − x1)k1−1 (x − x2)k2−1 q(x). (3.53)
Note that g(x) has the roots x_1 and x_2, of multiplicity orders k_1 − 1 and k_2 − 1, respectively (a root of multiplicity order 0 is, in fact, not a root). The roots of g(x), other than x_1 and x_2, are the roots of q(x). But

q(x_1) = p(x_1) = k_1 (x_1 − x_2) h(x_1), q(x_2) = p(x_2) = k_2 (x_2 − x_1) h(x_2), (3.54)

hence

q(x_1) q(x_2) = −k_1 k_2 (x_1 − x_2)^2 h(x_1) h(x_2). (3.55)

On the other hand, h(x_1) and h(x_2) have the same sign, because h does not change its sign on (x_1, x_2); moreover, k_1 > 0, k_2 > 0 and (x_1 − x_2)^2 > 0, so that we obtain

q(x_1) q(x_2) < 0. (3.56)

Taking into account Proposition 3.3, the theorem is proved.

Corollary 3.1. Let f be a polynomial with real coefficients, the roots of which are x_1, ..., x_s, of multiplicity orders k_1, ..., k_s, respectively.

(i) If all the roots of f are real, then all the roots of f′ are also real.
(ii) If all the roots of f are simple, then all the roots of f′ are also simple and separate the roots of f.

Demonstration. (i) We may write

Σ_{i=1}^{s} k_i = n, (3.57)

where n is the degree of f. Waring's theorem shows that x_i, i = 1, ..., s, are roots of the polynomial f′ + λf, λ ∈ R, of multiplicity orders k_i − 1, respectively. It follows that the sum of the multiplicity orders of these roots of f′ + λf is given by

Σ_{i=1}^{s} (k_i − 1) = Σ_{i=1}^{s} k_i − s = n − s. (3.58)

On the other hand, there exists at least one root of f′ + λf between x_i and x_{i+1}. The addition of these s − 1 roots to the sum (3.58) shows that the sum of the multiplicity orders of the roots of the polynomial f′ + λf is at least equal to

n − s + (s − 1) = n − 1. (3.59)

Let us note that, from Waring's theorem, from formula (3.59), and because the sum of the multiplicity orders of the roots of the polynomial f′ + λf, λ ≠ 0, is equal to n, it follows that each of the roots of f′ + λf situated between x_i and x_{i+1} is a simple root. Accordingly, it follows that the last root of f′ + λf is situated either in the interval (−∞, x_1) or in the interval (x_s, ∞), and that this root is simple. This last root cannot be complex without being real, ξ ∈ C − R, because, the polynomial f′ + λf having real coefficients, it would result that the conjugate ξ̄ of ξ is also a root; the sum of the multiplicity orders of the other roots of the polynomial f′ + λf, which are real, would then be, in accordance with Waring's theorem, equal to n − 2, in contradiction with formula (3.59). The roots of f′ are x_1, ..., x_s, of multiplicity orders k_1 − 1, ..., k_s − 1, respectively, as well as the s − 1 roots situated
between x_i and x_{i+1}, i = 1, ..., s − 1. The sum of the multiplicity orders of these roots is equal to

Σ_{i=1}^{s} (k_i − 1) + (s − 1) = n − 1 (3.60)

and, because the degree of f′ is equal to n − 1, it is sufficient to make λ = 0, obtaining thus all the roots of f′, all of which are real.

(ii) It is the particular case of point (i) for

k_1 = k_2 = ··· = k_s = 1. (3.61)

Proposition 3.4. Let a_1, ..., a_n be a finite sequence of nonzero numbers. If we leave out the intermediate terms a_2, ..., a_{n−1}, the extremes a_1 and a_n remaining unchanged, then the number of sign variations in the sequence of two elements thus obtained differs from the number of sign variations of the initial sequence by an even number (eventually 0).

Demonstration. Let us consider a sequence of three consecutive elements of the initial sequence, that is, a_i, a_{i+1}, a_{i+2}, 1 ≤ i ≤ n − 2, and let us eliminate the intermediate element a_{i+1}. To fix the ideas, let us suppose that a_i > 0. The following situations are possible:

(a) a_{i+1} > 0, a_{i+2} > 0. The number of sign variations is equal to zero in the initial sequence, and in the final one it is also equal to zero; hence the difference of the two numbers is an even number.
(b) a_{i+1} > 0, a_{i+2} < 0. The number of sign variations is equal to one in the initial sequence, and in the final one it is equal to one too; the difference of the two numbers is zero, hence an even number.
(c) a_{i+1} < 0, a_{i+2} > 0. In this case, we have two sign variations in the initial sequence, while in the final sequence we have none; the difference is equal to two, hence an even number.
(d) a_{i+1} < 0, a_{i+2} < 0. We have one variation of sign in both sequences; the difference is thus equal to zero, hence an even number.

The considered property thus holds for a sequence of three elements. In the general case, by eliminating one intermediate term at a time, from a_2 to a_{n−1}, the number of sign variations differs by two or remains the same at each step, and the proposition is proved.

Corollary 3.2. Let

f(x) = a_0 x^n + a_1 x^{n−1} + ··· + a_{n−1} x + a_n (3.62)

be a polynomial of degree n with real coefficients. The number of sign variations of the sequence of the coefficients of f has the same parity as the sum of the orders of multiplicity of the positive real roots of the equation f(x) = 0.

Demonstration. Let us suppose that a_0 > 0. There are two cases. If a_n < 0, then f(0) = a_n < 0 and lim_{x→∞} f(x) = +∞. According to Proposition 3.3, it follows that the sum of the orders of multiplicity of the positive roots of the equation f(x) = 0 is an odd number, while Proposition 3.4 shows that the number of sign variations in the sequence of the coefficients of f is also odd, the extreme coefficients being of opposite signs. If a_n > 0, then f(0) = a_n > 0 and lim_{x→∞} f(x) = +∞. In this case, Proposition 3.3 shows that the sum of the orders of multiplicity of the positive roots of the equation f(x) = 0 is an even number (eventually 0), while Proposition 3.4 shows that the number of sign variations in the sequence of the coefficients of f is even, the extreme coefficients having the same sign.
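The parity statement of Corollary 3.2, together with the rule of signs (Theorem 3.6 below), reduces computationally to counting sign variations. The following short Python sketch is our own illustration, with hypothetical helper names; it lists the admissible numbers of positive roots V, V − 2, ..., and handles negative roots through the transformation f_1(x) = (−1)^n f(−x) of Observation 3.13 below.

```python
def sign_variations(coeffs):
    """Number of sign changes in a coefficient sequence; zero terms are
    skipped, in the spirit of Proposition 3.4."""
    signs = [c > 0 for c in coeffs if c != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)

def descartes_positive_counts(coeffs):
    """Admissible numbers of positive roots: V, V - 2, ..., down to 0 or 1
    (Corollary 3.2 fixes the parity; Theorem 3.6 gives the upper bound)."""
    v = sign_variations(coeffs)
    return list(range(v, -1, -2))

def descartes_negative_counts(coeffs):
    """Same question for the negative roots, via f1(x) = (-1)^n f(-x)."""
    return descartes_positive_counts([c * (-1) ** i for i, c in enumerate(coeffs)])

# For x^3 + x - 3 (used again in Example 3.3): one variation, hence
# exactly one positive root; no variation after x -> -x, hence no
# negative roots.
print(descartes_positive_counts([1, 0, 1, -3]))   # [1]
print(descartes_negative_counts([1, 0, 1, -3]))   # [0]
```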
Corollary 3.3. Let f be a polynomial of degree n with real coefficients, the positive real roots of which are all simple. In this case, the number of sign variations in the sequence of the coefficients of f has the same parity as the number of positive roots of the equation f(x) = 0.

Demonstration. In the given conditions, the sum of the multiplicity orders of the positive roots of the equation f(x) = 0 is equal precisely to the number of these roots, and we apply Corollary 3.2.

Lemma 3.1. Let α be a nonzero positive number and let f(x) be a polynomial of degree n. Let us consider the polynomial g(x) = (x − α)f(x). Under these conditions, the number of sign variations in the sequence of the coefficients of the polynomial g differs from the number of sign variations of the coefficients of f by a positive odd number.

Demonstration. Let us consider the polynomial

f(x) = a_n x^n + a_{n−1} x^{n−1} + ··· + a_1 x + a_0, (3.63)

which we write in the form

f(x) = a_n x^n + ··· + a_i x^i − a_j x^j − ··· − a_k x^k + a_l x^l + ··· , (3.64)

where we have marked the groups of terms of the same sign. The polynomial g(x) is now written in the form

g(x) = (x − α)(a_n x^n + ··· + a_i x^i) + (x − α)(−a_j x^j − ··· − a_k x^k) + (x − α)(a_l x^l + ···) − ···
= a_n x^{n+1} + ··· + a_i x^{i+1} − α a_n x^n − ··· − α a_i x^i − a_j x^{j+1} − ··· − a_k x^{k+1} + α a_j x^j + ··· + α a_k x^k + a_l x^{l+1} + ··· − α a_l x^l − ··· (3.65)

The following situations may occur:

(a) i > j + 1. We have only one sign variation in this case.
(b) i = j + 1. We introduce the terms −α a_i and −a_j into the same group and have once more a sign variation between the first and the last term of the group.
(c) k > l + 1. We have a sign variation too.
(d) k = l + 1. The coefficient of x^{l+1} is positive and we have a sign variation in this case.

Let a_n and a_p be the coefficients of the extreme terms of the polynomial f. It follows that the extreme terms of g are a_n and −α a_p. If a_n a_p > 0, then −α a_n a_p < 0, whereas if a_n a_p < 0, then −α a_n a_p > 0. It follows that the number of sign variations in the sequence of the coefficients of g is greater than the number of sign variations in the sequence of the coefficients of f; we mention also that the difference between the two numbers is an odd number.

Theorem 3.6 (Descartes¹). Let us suppose that the equation f(x) = 0 has only simple roots, the number of its positive roots being p. In this case, p is either equal to the number of sign variations in the sequence of the coefficients of f or is smaller than this number by an even number.

Demonstration. Let x_1, ..., x_p be the p positive simple roots of the equation f(x) = 0. We may write

f(x) = (x − x_1) ··· (x − x_p) g(x), (3.66)

¹The theorem was presented by René Descartes (1596–1650) in La Géométrie (1637) and is also known as the rule of signs.
where g(x) has no positive roots. Let n_1 be the number of sign variations of the coefficients of the polynomial g(x). According to Corollary 3.3, n_1 is an even number,

n_1 = 2m, m ∈ N. (3.67)

We now apply Lemma 3.1 p times, so that each time the number of sign variations in the sequence of the coefficients of the polynomial obtained increases by an odd number. It follows that the number of sign variations in the sequence of the coefficients of the polynomial f is given by

N_v = n_1 + Σ_{i=1}^{p} (2k_i + 1) = 2(m + Σ_{i=1}^{p} k_i) + p. (3.68)

We obtain

p = N_v − 2(m + Σ_{i=1}^{p} k_i) (3.69)

and the theorem is proved.

Observation 3.13. Taking into account the polynomial f_1(x) = (−1)^n f(−x), we may apply Descartes' theorem to the negative roots of f too.

Definition 3.4. Let f be a polynomial with real coefficients, which does not have multiple roots, and let [a, b] be an interval of the real axis. A finite sequence f_0, f_1, ..., f_k of polynomials associated with the polynomial f on this interval is called a Sturm sequence if

(i) the last polynomial f_k(x) has no real roots;
(ii) two consecutive polynomials f_i(x) and f_{i+1}(x) have no common roots;
(iii) if x ∈ R and f_i(x) = 0, then f_{i−1}(x) f_{i+1}(x) < 0, i = 1, ..., k − 1;
(iv) f_i(a) ≠ 0, f_i(b) ≠ 0 for any i = 0, ..., k.

Proposition 3.5. For any polynomial f with real coefficients, without multiple roots, and for any interval [a, b] with f(a) ≠ 0, f(b) ≠ 0, there exists a Sturm sequence associated with f on [a, b].

Demonstration. Let us construct the sequence f_i so that

f_0 = f, f_1 = f′, (3.70)

while for i ≥ 2 we have

f_0 = f_1 q_1 − f_2, f_1 = f_2 q_2 − f_3, ..., f_{i−2} = f_{i−1} q_{i−1} − f_i, ... (3.71)

Because the degrees of the polynomials decrease, it follows that there exists only a finite number of such polynomials. In the following, we verify that the sequence of these polynomials f_i, i = 0, ..., k, previously defined is a Sturm sequence associated with f on [a, b].

(i) Because f_0 = f and f_1 = f′ have no common factors (f has no multiple roots), it follows that the last polynomial f_k of the sequence is a constant.
  • 76. SEPARATION OF ROOTS 67 (ii) If fi and fi−1, 1 ≤ i ≤ k, have a common root, then from relation (3.71) it would follow that fi−2 has the same root. Finally, we can show that the root is common to f0 = f and f1 = f , so that the polynomial f would then have multiple roots, which is a contradiction to the hypothesis. Therefore, fi and fi−1 have no common roots, 1 ≤ i ≤ k. (iii) Let x ∈ R be so that fi(x) = 0 for a certain index i, 1 ≤ i ≤ k − 1. From fi−1(x) = fi(x)qi(x) − fi+1(x), (3.72) we get fi−1(x) = −fi+1(x), (3.73) because fi(x) = 0; hence fi−1(x)fi+1(x) < 0. (3.74) (iv) From (ii) and (iii) it follows that fi(a) may be equal to zero only for a finite number of indices i1, i2, . . . , ip between 0 and k, as well as for any two neighboring indices ik+1 − ik > 1. We can replace the value a with the value a + ε, ε sufficiently small, so that the properties (i), (ii) and (iii) are not violated, and fi(a) = 0 for any i = 0, k . Analogically, we may also replace the value b with the value b − µ, µ sufficiently small, to get all the properties required by the Sturm sequence. Theorem 3.7 (Sturm2). Let f be a polynomial with real coefficients and without multiple roots. The number of real roots of the polynomial f in the interval [a, b] is given by Ws(a) − Ws(b), where Ws(x∗ ) is the number of the sign variations in the sequence f0(x∗ ), f1(x∗ ), . . . , fk(x∗ ). Demonstration. Let fi, 0 ≤ i ≤ k − 1, be an arbitrary term (but not the last) of the Sturm sequence and let us denote by x1, x2, . . . , xs the roots of this polynomial in the interval [a, b]. We shall show that Ws(x∗ ) remains constant for x∗ ∈ (xk, xk+1). Let us suppose per absurdum that Ws(x∗) is not constant. Then there exist two real numbers, y1 and y2, in the interval (xk, xk+1) so that fi(y1)fi(y2) < 0. It follows that there exists ξ ∈ (y1, y2), y1 < y2, so that fi(ξ) = 0. But ξ is not a root of fi because ξ ∈ (xk, xk+1) and xk and xk+1 are consecutive roots. Hence, Ws(x∗ ) is constant for x∗ ∈ (xk, xk+1). Let us consider y ∈ [a, b] and fi(y) = 0, 1 ≤ i ≤ k − 1, that is y is not a root for f . We shall show that Ws(a) = Ws(b). From property (iii) of the Sturm sequence, it follows that fi−1(y)fi+1(y) < 0, that is, fi−1(y) and fi+1(y) have opposite signs. These signs do not change if we replace y by a and b, respectively. Hence, it follows that the number of sign variations in the triples fi−1(a), fi(a), fi+1(a) and fi−1(b), fi(b), fi+1(b), respectively is every time equal to unity. We conclude that if y is not a root of f , then Ws(a) = Ws(b). Let y ∈ [a, b], y a root of f . In this case, f (a)f (b) < 0, and hence f (a) and f (b) have the same sign as f (b). It results Ws(a) − Ws(b) = 1. It follows that each root adds a unity to Ws(a) − Ws(b). Thus the theorem is proved. Theorem 3.8 (Budan3 or Budan–Fourier). Let f be a polynomial in the variable x, and a and b two real numbers, not necessarily finite. Let us denote by δf the sequence f , f , . . . , f (n) and by W(δf , p) the number of variations of sign in the sequence f (p), f (p), . . . , f (n)(p). In these conditions, if R(f, a, b) is the number of real roots of f in the interval [a, b], each root being 2 The idea is a generalization of Euclid’s algorithm in the case of polynomials and was proved in 1829 by Jacques Charles Franc¸ois Sturm (1803–1855). 3Ferdinand Franc¸ois D´esir´e Budan de Boislaurent (1761–1840) proved this theorem in 1807. 
The proof was lost and was replaced by another statement of an equivalent theorem belonging to Jean Baptiste Joseph Fourier (1768–1830), published in 1836.
counted as many times as its order of multiplicity, then W(δf, a) − W(δf, b) is at least equal to R(f, a, b), and the difference between them is an even natural number, that is,

W(δf, a) − W(δf, b) = R(f, a, b) + 2k, k ∈ N. (3.75)

Demonstration. First, let us remark that W(δf, x) may change its value only if x passes through a root x̄ of a polynomial of the sequence δf. We can find an ε > 0 so that, in the interval [x̄ − ε, x̄ + ε], no function of the sequence δf has roots other than x̄. Let us denote by m the order of multiplicity of x̄ as root of f (m = 0 if f(x̄) ≠ 0). If we show that

W(δf, x̄) = W(δf, x̄ + ε) (3.76)

and

W(δf, x̄ − ε) − [W(δf, x̄) + m] = 2k, k ∈ N, (3.77)

then the theorem is proved. Indeed, when x goes through the interval [a, b], R(f, a, x) and W(δf, x) are modified only if x becomes equal to a root x̄ of f or of one of its derivatives. At such a point, R(f, a, x) increases with the order of multiplicity of x̄ for f, while W(δf, x) decreases by the sum of m and an even natural number (Proposition 3.4). It follows therefore that the sum R(f, a, x) + W(δf, x) may be changed only at the roots x̄ of f or of its derivatives, in which case the value of the sum decreases by an even natural number. We thus obtain the above theorem, because this sum is equal to W(δf, a) for x = a.

Let us now prove relations (3.76) and (3.77). The proof is obtained by induction on the degree of f. If f is of first degree, then

W(δf, x̄ − ε) = 1, W(δf, x̄) = W(δf, x̄ + ε) = 0 (3.78)

and the induction hypothesis is verified. Let us suppose now that the degree of f is at least equal to 2 and that m is the order of multiplicity of x̄ for f. We begin by assuming that f(x̄) = 0, from which m > 0 and x̄ is a root of order of multiplicity m − 1 of f′. The induction hypothesis leads to

W(δf′, x̄) = W(δf′, x̄ + ε), W(δf′, x̄ − ε) − [W(δf′, x̄) + (m − 1)] = 2k_1, k_1 ∈ N. (3.79)

From Lagrange's mean value theorem, applied to the intervals [x̄ − ε, x̄] and [x̄, x̄ + ε], we deduce that f and f′ do not have the same sign at x̄ − ε, but have the same sign at x̄ + ε; hence

W(δf, x̄) = W(δf′, x̄) = W(δf′, x̄ + ε) = W(δf, x̄ + ε), (3.80)

W(δf, x̄ − ε) = W(δf′, x̄ − ε) + 1 ≥ W(δf′, x̄) + (m − 1) + 1 = W(δf′, x̄) + m, (3.81)

W(δf, x̄ − ε) − [W(δf, x̄) + m] = 2k, k ∈ N, (3.82)

so that the theorem is proved in this case. If f(x̄) ≠ 0, that is, m = 0, then we denote by m′ the order of multiplicity of x̄ for f′. From the induction hypothesis, we have

W(δf′, x̄) = W(δf′, x̄ + ε), (3.83)

W(δf′, x̄ − ε) − [W(δf′, x̄) + m′] = 2k_1, k_1 ∈ N. (3.84)
On the other hand, f(x̄) ≠ 0 and f′(x̄) = 0, f″(x̄) = 0, ..., f^{(m′)}(x̄) = 0, f^{(m′+1)}(x̄) ≠ 0. We may suppose that f^{(m′+1)}(x̄) > 0 (eventually multiplying f by −1). The following situations may occur:

• m′ is an even number. In this case, f′(x̄ − ε) and f′(x̄ + ε) are positive; hence, for each x of the set {x̄ − ε, x̄, x̄ + ε}, the first nonzero term of the sequence f′(x), f″(x), ..., f^{(n)}(x) is positive. If f(x̄) > 0, then W(δf, x) = W(δf′, x), while if f(x̄) < 0, then W(δf, x) = W(δf′, x) + 1, at each of the three points. The theorem is proved, because it follows that

W(δf, x̄) = W(δf, x̄ + ε), (3.85)

W(δf, x̄ − ε) − W(δf, x̄) = W(δf′, x̄ − ε) − W(δf′, x̄), (3.86)

the term on the right being greater than m′ by an even number; and because m′ is an even number, it follows that this term is also an even number.

• m′ is an odd number. We get

f′(x̄ − ε) < 0 < f′(x̄ + ε), (3.87)

hence the first nonzero term of the sequence f′(x), f″(x), ..., f^{(n)}(x) has the signs −, +, + at the points x̄ − ε, x̄, x̄ + ε, respectively. If f(x̄) > 0, then W(δf, x̄ − ε) = W(δf′, x̄ − ε) + 1, while the number of variations at the other two points remains unchanged. If f(x̄) < 0, then the number of variations of sign does not change for x̄ − ε, but increases by unity for x̄ and for x̄ + ε. We obtain

W(δf, x̄) = W(δf, x̄ + ε), (3.88)

W(δf, x̄ − ε) − W(δf, x̄) = W(δf′, x̄ − ε) − W(δf′, x̄) ± 1. (3.89)

On the other hand, W(δf′, x̄ − ε) − W(δf′, x̄) is equal to m′ plus an even natural number, that is, to an odd number, because m′ is odd. It follows therefore that, adding or subtracting 1 from this difference, we obtain an even natural number, which is precisely W(δf, x̄ − ε) − W(δf, x̄), and the theorem is proved.

Observation 3.14. Descartes' theorem is a particular case of Budan's theorem. Indeed, if

f = a_0 x^n + a_1 x^{n−1} + ··· + a_n, (3.90)

then

sgn f(0) = sgn a_n, sgn f′(0) = sgn a_{n−1}, ..., sgn f^{(n)}(0) = sgn a_0, (3.91)

sgn f(∞) = sgn f′(∞) = ··· = sgn f^{(n)}(∞) = sgn a_0 (3.92)

and from Budan's theorem, for a = 0, b = ∞, it follows that W(δf, 0) is just the number of variations of sign in the sequence a_0, a_1, ..., a_n, while W(δf, ∞) = 0. Hence we obtain Descartes' theorem.

3.3 LAGRANGE'S METHOD

Let us consider the equation⁴

f(x) = a_0 x^n + a_1 x^{n−1} + ··· + a_{n−1} x + a_n = 0, (3.93)

⁴The method was named in the honor of Joseph Louis Lagrange (Giuseppe Luigi Lagrancia or Giuseppe Luigi Lagrangia) (1736–1813), who studied the problem in 1770.
the coefficients a_i, i = 0, ..., n, of which are real numbers, and let α ∈ R be an arbitrary value. We may write Taylor's formula around α in the form

f(x) = f(α) + (x − α) f′(α)/1! + (x − α)^2 f″(α)/2! + ··· + (x − α)^n f^{(n)}(α)/n!. (3.94)

Hence, it follows that the remainder of the division of f(x) by x − α is just f(α), while the quotient is given by

Q_1(x) = f′(α)/1! + (x − α) f″(α)/2! + ··· + (x − α)^{n−1} f^{(n)}(α)/n!. (3.95)

The remainder of the division of Q_1(x) by x − α is f′(α)/1!, while the quotient becomes

Q_2(x) = f″(α)/2! + (x − α) f‴(α)/3! + ··· + (x − α)^{n−2} f^{(n)}(α)/n!. (3.96)

In general,

Q_i(x) = f^{(i)}(α)/i! + (x − α) f^{(i+1)}(α)/(i + 1)! + ··· + (x − α)^{n−i} f^{(n)}(α)/n!, (3.97)

while the remainder of the division of Q_i(x) by x − α is

R_i = f^{(i)}(α)/i!. (3.98)

Hence, we have the following relations between the coefficients a_0, ..., a_n of f and the coefficients a′_0, ..., a′_{n−1} of Q_1:

a′_0 = a_0, a′_1 = a′_0 α + a_1, ..., a′_{n−1} = a′_{n−2} α + a_{n−1}. (3.99)

Analogically, the coefficients a′_0, ..., a′_{n−1} of Q_1 and the coefficients a″_0, ..., a″_{n−2} of Q_2 are related as follows:

a″_0 = a′_0, a″_1 = a″_0 α + a′_1, ..., a″_{n−2} = a″_{n−3} α + a′_{n−2}. (3.100)

The above relations may be systematized in Table 3.1.

TABLE 3.1 The Generalized Horner's Schema

        a_0    a_1          a_2          ···    a_{n−2}      a_{n−1}      a_n
  α     a_0    a_{11}       a_{12}       ···    a_{1,n−2}    a_{1,n−1}
  α     a_0    a_{21}       a_{22}       ···    a_{2,n−2}
  ···
  α     a_0    a_{n−1,1}
  α     a_0

The first row contains the coefficients a_0, ..., a_n of f. The second row contains the coefficients of Q_1, that is, a′_0 = a_0, a′_1 = a_{11}, ..., a′_{n−1} = a_{1,n−1}, the remainder of this first division being f(α). The third row contains the coefficients of Q_2, that is, a″_0 = a_0, a″_1 = a_{21}, ..., a″_{n−2} = a_{2,n−2}, the remainder being f′(α)/1!, and so on; the nth row contains a_0 and a_{n−1,1}, while the last row, the (n + 1)th, contains only a_0 = f^{(n)}(α)/n!. This table is known as the generalized Horner's schema.
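To make the schema concrete, here is a small Python sketch (our own illustration, not the book's): repeated synthetic division by x − α produces, as successive remainders, exactly the Taylor values f(α), f′(α)/1!, ..., f^{(n)}(α)/n! of formula (3.94).

```python
def generalized_horner(a, alpha):
    """a = [a0, ..., an] are the coefficients of f in decreasing powers.
    Returns [f(alpha), f'(alpha)/1!, ..., f^(n)(alpha)/n!]; each pass of
    the loop is one row of Table 3.1."""
    coeffs = list(a)
    taylor = []
    while coeffs:
        row = [coeffs[0]]                 # one synthetic division by (x - alpha)
        for c in coeffs[1:]:
            row.append(row[-1] * alpha + c)
        taylor.append(row[-1])            # the remainder of this division
        coeffs = row[:-1]                 # continue with the quotient
    return taylor

# For f(x) = x^3 + x - 3 at alpha = 1 (the polynomial of Example 3.3):
print(generalized_horner([1, 0, 1, -3], 1))
# [-1, 4, 3, 1], i.e. f(1), f'(1)/1!, f''(1)/2!, f'''(1)/3!
```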
  • 80. LAGRANGE’S METHOD 71 Let us suppose that equation (3.93) is the one that has a positive real root x. The case of the negative root is similar to the previous one if we consider the equation g(x) = (−1)n f (−x) = 0. (3.101) Let us suppose that we have found a natural number a1, so that a1 < x < a1 + 1; (3.102) hence x becomes x = a1 + 1 x1 , (3.103) with x1 ∈ R∗ +. We then have f1(x1) = xn 1 f (a1) + xn−1 1 f (a1) 1! + xn−2 1 f (a1) 2! + · · · + f (n) (a1) n! = 0. (3.104) We now search for a natural number a2, so that x1 = a2 + 1 x2 , (3.105) and hence f2(x2) = xn 2 f (a2) + xn−1 2 f (a2) 1! + xn−2 2 f (a2) 2! + · · · + f (n) (a2) n! = 0. (3.106) The procedure continues by searching for a3, so that x2 = a3 + 1 x3 . (3.107) Finally, we obtain x = a1 + 1 a2 + 1 a3 + 1 ... , (3.108) a decomposition of x in a continued fraction. Let us denote Rn = a1 + 1 a2 + 1 a3 + 1 ... + 1 an = An Bn . (3.109) Dirichlet’s theorem shows that |x − Rn| < 1 B2 n , (3.110) thus obtaining the error of approximation in the solution x. The method presented above is called Lagrange’s method. Observation 3.15 To apply Lagrange’s method, it is necessary to have one and only one solution of equation (3.93) between a1 and a1 + 1.
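The whole procedure can be sketched in a few lines of Python; this is our own illustration, with hypothetical names, and it relies on the fact that the coefficients of the transformed polynomial in (3.104) are exactly the Taylor values f(a_1), f′(a_1)/1!, ..., f^{(n)}(a_1)/n! produced by the generalized Horner schema. As Observation 3.15 requires, it assumes that each transformed equation has exactly one root between two consecutive integers, located here by a simple sign change f(k) < 0 < f(k + 1).

```python
from fractions import Fraction

def taylor_rows(a, alpha):
    """Repeated synthetic division: [f(alpha), f'(alpha)/1!, ...], which are
    the descending coefficients of x1^n f(a1 + 1/x1) in eq. (3.104)."""
    a, out = list(a), []
    while a:
        row = [a[0]]
        for c in a[1:]:
            row.append(row[-1] * alpha + c)
        out.append(row[-1])
        a = row[:-1]
    return out

def poly(a, x):
    r = 0
    for c in a:
        r = r * x + c
    return r

def lagrange_root(a, steps):
    """Continued-fraction approximation (3.108) of a positive root."""
    parts = []
    for _ in range(steps):
        k = 1
        while not (poly(a, k) < 0 < poly(a, k + 1)):
            k += 1                      # integer part a_i of the current root
        parts.append(k)
        a = taylor_rows(a, k)           # the substitution x -> k + 1/x
        if a[0] < 0:
            a = [-c for c in a]         # keep the leading coefficient positive
    r = Fraction(parts[-1])
    for q in reversed(parts[:-1]):
        r = q + Fraction(1) / r
    return r

# For f(x) = x^3 + x - 3 this yields the partial quotients 1, 4, 1, 2, 5, ...
# and the convergent 91/75 = 1.2133..., close to the root 1.21341 found in
# Example 3.3.
print(lagrange_root([1, 0, 1, -3], 5))
```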
  • 81. 72 SOLUTION OF ALGEBRAIC EQUATIONS 3.4 THE LOBACHEVSKI–GRAEFFE METHOD Let us consider the algebraic equation5 a0xn + a1xn−1 + · · · + an−1x + an = 0, (3.111) where ai ∈ R, i = 0, n, and let us denote by xi, i = 1, n, its roots. 3.4.1 The Case of Distinct Real Roots Let us suppose that the n distinct roots are obtained as follows |x1| > |x2| > · · · > |xn|. (3.112) The corresponding Vi`ete’s relations are x1 + x2 + · · · + xn = − a1 a0 , x1x2 + · · · + x1xn + · · · + xn−1xn = a2 a0 , x1x2x3 + · · · + x1x2xn + · · · + xn−2xn−1xn = − a3 a0 , . . . , x1x2 · · · xn = (−1)n an a0 . (3.113) If |x1| |x2| |x3| · · · |xn|, (3.114) then the roots xi, i = 1, n, may be given by the approximate formulae x1 ≈ − a1 a0 , x2 ≈ − a2 a1 , x3 ≈ − a3 a2 , . . . , xn ≈ − an an−1 . (3.115) Let us see now how we can transform equation (3.111) into another one for which the roots yi, i = 1, n, satisfy condition (3.114); there exist certain relations between the roots xk, k = 1, n, and yi, i = 1, n. We now introduce the polynomial function f (x) = a0xn + a1xn−1 + · · · + an−1x + an; (3.116) we can then write f (x) = a0(x − x1)(x − x2) · · · (x − xn) (3.117) because of the supposition that the roots xi, i = 1, n, are real and distinct. On the other hand, f (−x) = (−1)n a0(x + x1)(x + x2) · · · (x + xn), (3.118) hence f (x)f (−x) = (−1)n a2 0(x2 − x2 1)(x2 − x2 2) · · · (x2 − x2 n). (3.119) From relation (3.116), we get f (−x) = (−1)n a0xn + (−1)n−1 a1xn−1 + · · · + (−1)an−1x + an, (3.120) 5This method was presented by Germinal Pierre Dandelin (1794–1847) in 1826, Karl Heinrich Graeffe (Karl Heinrich Gr¨affe) (1799–1873) in 1837, and Nikolai Ivanovich Lobachevski (1792–1856) in 1834.
and hence

f(x) f(−x) = (−1)^n [a_0^2 x^{2n} − (a_1^2 − 2a_0 a_2) x^{2n−2} + (a_2^2 − 2a_1 a_3 + 2a_0 a_4) x^{2n−4} − ··· + (−1)^n a_n^2]. (3.121)

By the transformation

y = x^2, (3.122)

the equation

f(x) f(−x) = 0 (3.123)

becomes

ā_0 y^n + ā_1 y^{n−1} + ··· + ā_{n−1} y + ā_n = 0, (3.124)

where

ā_0 = a_0^2, ā_1 = −(a_1^2 − 2a_0 a_2), ā_2 = a_2^2 − 2a_1 a_3 + 2a_0 a_4, ..., ā_n = (−1)^n a_n^2. (3.125)

We can write these relations in the form

ā_j = (−1)^j [a_j^2 + 2 Σ_{i≥1} (−1)^i a_{j−i} a_{j+i}], j = 0, ..., n, (3.126)

where a_j = 0 for j < 0 or j > n.

Observation 3.16
(i) Equation (3.123) has 2n roots, namely ±x_1, ±x_2, ..., ±x_n.
(ii) By solving equation (3.124), we obtain the roots y_1, y_2, ..., y_n. The roots x_1, x_2, ..., x_n are no longer unique, because x_i = √y_i or x_i = −√y_i, i = 1, ..., n.

The procedure described above can be repeated for equation (3.124) in y. In general, the procedure is repeated p times, obtaining thus an equation of the form

a_0^{(p)} z^n + a_1^{(p)} z^{n−1} + ··· + a_{n−1}^{(p)} z + a_n^{(p)} = 0, (3.127)

the roots of which are z_1, z_2, ..., z_n. The connection between z_i and x_i is given by

x_i = (z_i)^{1/2^p} or x_i = −(z_i)^{1/2^p}, i = 1, ..., n. (3.128)

The roots of equation (3.127) are given by formulae of the form (3.115), hence

z_1 = −a_1^{(p)}/a_0^{(p)}, z_2 = −a_2^{(p)}/a_1^{(p)}, ..., z_i = −a_i^{(p)}/a_{i−1}^{(p)}, ..., z_n = −a_n^{(p)}/a_{n−1}^{(p)}, (3.129)

so relations (3.128) may also be written in the form

x_i = (−a_i^{(p)}/a_{i−1}^{(p)})^{1/2^p} or x_i = −(−a_i^{(p)}/a_{i−1}^{(p)})^{1/2^p}, i = 1, ..., n. (3.130)

The values (3.130) must be checked in the initial equation f(x) = 0, retaining only those which verify it.
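A minimal Python sketch of the squaring step (3.125)-(3.126) and of the read-off (3.130) follows; it is our own illustration, and the residual test which filters the sign of each candidate root is a crude stand-in for "checking in the initial equation".

```python
def graeffe_step(a):
    """One root-squaring pass, eqs. (3.125)-(3.126): the coefficients of
    the equation whose roots are the squares of the roots of f."""
    n = len(a) - 1
    b = []
    for j in range(n + 1):
        s = a[j] * a[j]
        for i in range(1, min(j, n - j) + 1):
            s += 2 * (-1) ** i * a[j - i] * a[j + i]
        b.append((-1) ** j * s)
    return b

def graeffe_real_roots(a, p=7, rel_tol=1e-4):
    """Candidate real roots by formula (3.130) after p squarings."""
    c = list(a)
    for _ in range(p):
        c = graeffe_step(c)
    roots = []
    for i in range(1, len(a)):
        z = -c[i] / c[i - 1]              # z_i of eq. (3.129)
        if z <= 0:                        # sign irregularity: complex pair
            continue
        m = z ** (1.0 / 2 ** p)           # the modulus |x_i|
        for x in (m, -m):
            val = sum(t * x ** (len(a) - 1 - k) for k, t in enumerate(a))
            if abs(val) < rel_tol * max(abs(t) for t in a):
                roots.append(x)
    return roots

# x^3 + x - 3 (Example 3.3): graeffe_step reproduces the rows of Table 3.2,
# and the real root 1.2134... is recovered; the complex pair is rejected.
print(graeffe_real_roots([1, 0, 1, -3]))
```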
  • 83. 74 SOLUTION OF ALGEBRAIC EQUATIONS 3.4.2 The Case of a Pair of Complex Conjugate Roots Let us again consider equation (3.111), supposing that two of its roots, say xk and xk+1, are conjugate complex ones. We can write relation (3.112) in the form |x1| > |x2| > · · · > |xk| = |xk+1| > |xk+2| > · · · > |xn|. (3.131) We denote by r = |xk| = |xk+1|, (3.132) the modulus of the conjugate complex roots, where xk = α + iβ, xk+1 = α − iβ, r = α2 + β2. (3.133) From Vi`ete’s relation, x1 + x2 + · · · + xk + xk+1 + · · · + xn = − a1 a0 , (3.134) we easily obtain α = − a1 2a0 − 1 2 n j=1 j=k;j=k+1 xj , (3.135) by taking into account relations (3.133). Squaring equation (3.111) and proceeding as from equation (3.111), we obtain the equation a (p) 0 zn + a (p) 1 zn−1 + · · · + a (p) k−1zn−k+1 + a (p) k zn−k + a (p) k+1zn−k−1 + · · · + a (p) 1 z + a (p) 0 = 0. (3.136) The roots zk and zk+1 satisfy the relation a (p) k−1z2 + a (p) k z + a (p) k+1 = 0. (3.137) Then zkzk+1 = a (p) k+1 a (p) k−1 . (3.138) On the other hand, zkzk+1 = (xkxk+1)2p = (r2 )2p , (3.139) from which r2 = 2p a (p) k+1 a (p) k−1 . (3.140) From relations (3.135) and (3.140), we get β2 = r2 − α2 = 2p a (p) k+1 a (p) k−1 −    − a1 2a0 − 1 2 n j=1 j=k;j=k+1 xj     . (3.141) Knowing α and β, we obtain the roots xk and xk+1.
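A short sketch of this recovery step, with our own (hypothetical) helper name, reads as follows; `a` are the original coefficients, `c` the coefficients after p squarings, k the index of the pair, and `real_roots` the already-determined real roots entering the sum of relation (3.135). It assumes r^2 ≥ α^2, as relation (3.141) requires.

```python
import math

def graeffe_complex_pair(a, c, p, k, real_roots):
    """Recover x_k = alpha + i*beta and its conjugate from the Graeffe
    table, following eqs. (3.135), (3.140) and (3.141)."""
    r2 = (c[k + 1] / c[k - 1]) ** (1.0 / 2 ** p)        # eq. (3.140): r^2
    alpha = -a[1] / (2 * a[0]) - 0.5 * sum(real_roots)  # eq. (3.135)
    beta = math.sqrt(r2 - alpha ** 2)                   # eq. (3.141)
    return complex(alpha, beta), complex(alpha, -beta)

# With the step-7 coefficients of Table 3.2 (Example 3.3) and the real
# root x3 = 1.21341, this returns approximately -0.60671 +/- 1.45061i.
c7 = [1.0, -2.671664e25, 2.081974e50, -1.179018e61]
print(graeffe_complex_pair([1, 0, 1, -3], c7, 7, 1, [1.21341]))
```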
  • 84. THE LOBACHEVSKI–GRAEFFE METHOD 75 Observation 3.17 (i) If all the roots of equation (3.111) are real and distinct, then all the products of the form aj−iaj+i become negligible with respect to a2 j , hence all the coefficients a(s) j become perfect quasi-squares beginning from a certain rank. (ii) If a certain a(s) j , 1 ≤ j ≤ n − 1, does not become a perfect square, but is situated between two perfect squares a(s) j−1 and a(s) j+1, then the ratio (r2 )2s = a(s) j+1 a(s) j−1 , (3.142) where r is the modulus of the pair of conjugate complex roots or even the value of a double real root (if the imaginary part of the conjugate complex roots vanishes). (iii) More generally, if 2l − 1 coefficients a(s) k−2l+1, a(s) k−2l+2, . . . , a(s) k are situated between two perfect squares a(s) k−2l and a(s) k , then there exist l pairs of roots that have the same modulus r. 3.4.3 The Case of Two Pairs of Complex Conjugate Roots Let xk, xk+1 and xl, xl+1 be two pairs of conjugate complex roots, so that xk = α1 + iβ1, xk+1 = α1 − iβ1, xl = α2 + iβ2, xl+1 = α2 − iβ2, (3.143) with β1 = 0, β2 = 0. We may write the sequence of inequalities |x1| > |x2| > · · · > |xk−1| > |xk| = |xk+1| > |xk+2| > · · · > |xl−1| > |xl| = |xl+1| > |xl+2| > · · · > |xn|, (3.144) where x1, . . . , xn are the roots of equation (3.111), all real, except for xk, xk+1, xl, and xl+1. We obtain thus two equations of second degree, that is, a (p) k−1z2 + a (p) k z + a (p) k+1 = 0, a (p) l−1z2 + a (p) l z + a (p) l+1 = 0. (3.145) Let us denote by r1 and r2 the moduli of the two pairs of complex roots r1 = |xk| = |xk+1|, r2 = |xl| = |xl+1|. (3.146) We can write the relations r2 1 = xkxk+1 = α2 1 + β2 1, r2 2 = xlxl+1 = α2 2 + β2 2; (3.147) and from equation (3.145), we obtain r2 1 = 2p a (p) k+1 a (p) k−1 , r2 2 = 2p a (p) l+1 a (p) l−1 . (3.148) From the first Vi`ete relation for equation (3.111), we have n i=1 xi = − a1 a0 (3.149)
  • 85. 76 SOLUTION OF ALGEBRAIC EQUATIONS or x1 + x2 + · · · + xk−1 + xk+1 + xk+2 + · · · + xl−1 + xl + xl+1 + xl+2 + · · · + xn = − a1 a0 , (3.150) because xk + xk+1 = 2α1, xl + xl+1 = 2α2, (3.151) we have α1 + α2 = − 1 2       a1 a0 + n i=1 i=k;i=k+1 i=l;i=l+1 xi       . (3.152) Let us consider now the last two Vi`ete relations, x1x2 . . . xn−1 + x1x2 . . . xn−2xn + · · · + x2x3 . . . xn = (−1)n−1 an−1 a0 , (3.153) x1x2 . . . xn = (−1)n an a0 . (3.154) By division, we get 1 x1 + · · · + 1 xk−1 + 1 xk + 1 xk+1 + 1 xk+2 + · · · + 1 xl−1 + 1 xl + 1 xl+1 + 1 xl+2 + · · · + 1 xn = − an−1 an . (3.155) On the other hand, 1 xk + 1 xk+1 = xk + xk+1 xkxk+1 = 2α1 r2 1 , 1 xl + 1 xl+1 = xl + xl+1 xlxl+1 = 2α2 r2 2 , (3.156) leading to α1 r2 1 + α2 r2 2 = − 1 2       an−1 an + n i=1 i=k;i=k+1 i=l;i=l+1 1 xi       . (3.157) We obtain α1 and α2 from relations (3.152) and (3.157). Taking into account r1, r2, α1, α2, it follows that β1 = r2 1 − α2 1, β2 = r2 2 − α2 2. (3.158) 3.5 THE BERNOULLI METHOD Let us consider the equation6 f (x) = xn + a1xn−1 + · · · + an = 0, (3.159) 6Daniel Bernoulli (1700–1782) used this method for the first time in 1724.
  • 86. THE BERNOULLI METHOD 77 to which we associate the recurrence formula µn + a1µn−1 + · · · + an = 0. (3.160) If the roots of equation (3.195) are ξ1, ξ2, . . . , ξn and if equation (3.160) is considered to be a difference equation, then the solution of the latter is of the form µk = C1ξk 1 + C2ξk 2 + · · · + Cnξk n, k = 1, n, (3.161) where Ci, i = 1, n, are constants that do not depend on k, while the roots ξi, i = 1, n, are assumed to be distinct. Let us further suppose that the roots ξi, i = 1, n, are indexed such that |ξ1| > |ξ2| > · · · > |ξn|. (3.162) Writing expression (3.161) in the form µk = C1ξk 1 1 + C2 ξk 2 ξk 1 + · · · + Cn ξk n ξk 1 (3.163) and making k → k − 1, from which µk−1 = C1ξk−1 1 1 + C2 ξk−1 2 ξk−1 1 + · · · + Cn ξk−1 n ξk−1 1 , (3.164) it follows that ξ1 = lim k→∞ µk µk−1 , (3.165) supposing that µ1, µ2, . . . , µn are chosen so as not to have C1 = 0. Such a choice is given by µ1 = µ2 = · · · = µn−1 = 0, µn = −a0. (3.166) Another choice for the n values is given by µr = −(a1µr−1 + a2µr−2 + · · · + ar−1µ1 + ar ), r = 1, n, (3.167) where we suppose that µi = 0 if i ≤ 0. In the case of this choice, we obtain Ci = 1, i = 1, n, and µk = ξk 1 + ξk 2 + · · · + ξk n, k ≥ 1. (3.168) Moreover, we also obtain also the approximate relations ξ1 ≈ µk µk−1 , ξ1 ≈ k √ µk, (3.169) with k sufficiently large. If ξ1 is a complex root, then ξ2 = ξ1, |ξ1| = |ξ2|. We may write ξ1 = ζ1 + iη1 = β1eiφ1 , ξ1 = ζ1 − iη1 = β1e−iφ1 , (3.170) where β1 = ξ2 1 + η2 1 > 0. (3.171)
  • 87. 78 SOLUTION OF ALGEBRAIC EQUATIONS The sum C1ξk 1 + C2ξk 2 may be replaced by βk 1(C1 cos kφ1 + C2 sin kφ1), (3.172) where we have made the substitutions C1 → C1 − iC2 2 , C2 → C1 + iC2 2 . (3.173) Hence it follows that, for k sufficiently large, we may write µk ≈ βk 1(C1 cos kφ1 + C2 sin kφ1). (3.174) Moreover, µk must satisfy the recurrence relation µk+1 − 2µkβ1 cos φ1 + β2 1µk−1 = 0. (3.175) Making k → k − 1, we obtain the second relation of recurrence µk − 2µk−1β1 cos φ1 + β2 1µk−2 = 0. (3.176) By eliminating cos φ1 between these relations, it follows that (µ2 k−1 − µkµk−2)β2 1 = µ2 k − µk+1µk−1, (3.177) whereas by eliminating β2 1, we obtain 2(µ2 k−1 − µkµk−2) cos φ1 = µkµk−1 − µk+1µk−2. (3.178) Denoting sk = µ2 k − µk+1µk−1, tk = µkµk−1 − µk+1µk−2, (3.179) we obtain the values β2 1 ≈ sk sk−1 , 2β1 cos φ1 ≈ tk tk−1 , (3.180) for k sufficiently large and C1 = 0, C2 = 0 (a case that may be eliminated). If ξ1 is a double root, ξ1 = ξ2, then in the sum (3.161) we obtain the expression ξk 1(C1 + C2k). It follows that µk satisfies the relation µk+1 − 2µkξ1 + µk−1ξ2 1 = 0, (3.181) for k → ∞. Proceeding as above, we obtain the relation 2ξ1 ≈ tk sk−1 . (3.182)
  • 88. LIN METHODS 79 3.6 THE BIERGE–VI `ETE METHOD Let us consider the polynomial7 f (x) = xn + a1xn−1 + · · · + an (3.183) which we divide by x − ξ. It follows that f (x) = xn + a1xn−1 + · · · + an = (x − ξ)(xn−1 + b1xn−2 + · · · + bn−1) + R, (3.184) where R is the remainder. In particular, R = f (ξ). (3.185) Dividing now the quotient of relation (3.184) by x − ξ, we obtain xn−1 + b1xn−2 + · · · + bn−1 = (x − ξ)(xn−2 + c1xn−3 + · · · + cn−2) + R , (3.186) while the remainder R verifies the relation R = f (ξ). (3.187) Obviously, the procedure may continue. Between the coefficients ai, i = 1, n, and bj , j = 1, n − 1, there take place the relations a1 = b1 − ξ, a2 = b2 − ξb1, . . . , an−1 = bn−1 − ξbn−2, an = R − ξbn−1 (3.188) and similarly for bj , j = 1, n − 1, and ck, k = 1, n − 2. It follows that R = f (ξ) = bn = an + ξbn−1, (3.189) R = f (ξ) = cn−1 = bn−1 + ξcn−2. (3.190) Thus, we obtain the relation of recurrence ξ∗ = ξ − R R = ξ − an + ξbn−1 bn−1 + ξcn−2 . (3.191) As a matter of fact, the Bierge–Vi`ete method is a variant of Newton’s method, in which the computation of the functions f (ξ) and f (ξ) is avoided. 3.7 LIN METHODS The first Lin method8 derives from the Bierge–Vi`ete one, for which the relation f (ξ) = 0 is equivalent to an + ξbn−1 = 0, (3.192) the notations being those in the previous paragraph. 7 This method is the Newton–Raphson method in the case of polynomials. It was named in the honor of Franc¸ois Vi`ete (1540–1603) who stated it for the first time, in a primary form, in 1600. 8The methods were presented for the first time by Sir Leonard Bairstow (1880–1963) in 1920. They were mathe- matically developed by S. N. Lin in 1941 and 1943.
  • 89. 80 SOLUTION OF ALGEBRAIC EQUATIONS In this case, bn−1 is seen as a function of ξ, hence relation (3.192) is written in the form ξ = − an bn−1(ξ) . (3.193) We obtain thus an iterative formula in which ξ∗ = − an bn−1(ξ) , (3.194) from which ξ = ξ∗ − ξ = − an + ξbn−1(ξ) bn−1(ξ) (3.195) or, equivalently, ξ∗ = ξ − R bn−1(ξ) . (3.196) On the other hand, we have seen in the previous paragraph that bn−1(ξ) = f (ξ) − an ξ (3.197) and the recurrence relation (3.193) becomes ξ∗ = − anξ f (ξ) − an . (3.198) Hence, it follows that the first Lin method is equivalent to the application of the method of con- tractions to the function F(ξ) = − anξ f (ξ) − an ; (3.199) this method is convergent if |F (x)| = an xf (x) − f (x) + an [f (x) − an]2 < 1. (3.200) On the other hand, if ξ is a root of the equation f (x) = 0, then we may write µr = F (ξr ) = 1 + ξr an f (ξr ) = 1 + ξr f (ξr ) f (0) (3.201) and the convergence is ensured if |µr | = 1 + ξr an f ξr < 1, (3.202) that is, if the start value for the iterations sequence is sufficiently close to ξr . The second method of Lin starts from the idea of dividing the polynomial f (x) = xn + a1xn−1 + · · · + an (3.203) by the quadratic factor x2 + px + q, obtaining xn + a1xn−1 + · · · + an = (x2 + px + q)(xn−2 + b1xn−3 + · · · + bn−2) + Rx + S. (3.204) It follows therefore that x2 + px + q is a divisor of f if and only if R = 0 and S = 0.
  • 90. LIN METHODS 81 Expanding the computations in equation (3.204), we obtain the relations a1 = b1 + p, a2 = b2 + pb1 + q, a3 = b3 + pb2 + qb1, . . . , an−2 = bn−2 + pbn−3 + qbn−4, an−1 = R + pbn−2 + qbn−3, an = S + qbn−2. (3.205) Using the recurrence formula bk = ak − pbk−1 − qbk−2, k = 1, n, b0 = 1, b−1 = 0, (3.206) it follows that R and S are given by R = bn−1 = an−1 − pbn−2 − qn−3, S = bn + pbn−1 = an − qbn−2. (3.207) Using the condition R = 0, S = 0, so that x2 + px + q divides f , we obtain an−1 − pbn−2 − qbn−3 = 0, an − qbn−2 = 0. (3.208) Lin’s idea consists in applying the method of successive iterations to the sequences defined by p = an−1 − qbn−3 bn−2 , q = an bn−2 , (3.209) so that the new values p∗ , q∗ after iteration become p∗ = an−1 − qbn−3 bn−2 , q∗ = an bn−2 , (3.210) p = p∗ − p = an−1 − pbn−2 − qbn−3 bn−2 , q = q∗ − q = an − qbn−2 bn−2 (3.211) or, equivalently, p∗ = p + R bn−2 , q∗ = q + S bn−2 . (3.212) Because x1 and x2 are the roots of the equation x2 + px + q, we have Rx1 + S = f (x1), Rx2 + S = f (x2), (3.213) resulting in the expressions x1(p∗ − p) + (q∗ − q) = qf (x1) an − S , x2(p∗ − p) + (q∗ − q) = qf (x2) an − S . (3.214) Denoting the roots of the equation x2 + p∗x + q∗ = 0 by x∗ 1 , x∗ 2 , we obtain the relations (x2 − x1)(x∗ 1 − x1) = qf (x1) an − S − (x∗ 1 − x1)(x∗ 2 − x2), (x2 − x1)(x∗ 2 − x2) = − qf (x2) an − S + (x∗ 1 − x1)(x∗ 2 − x2). (3.215)
  • 91. 82 SOLUTION OF ALGEBRAIC EQUATIONS If (x∗ 1 , x∗ 2 ) is sufficiently close to (x1, x2), then Lagrange’s theorem of finite increments leads to x1 − x∗ 1 ≈ 1 + ξ1ξ2 ξ2 − ξ1 f ξ1 an (ξ1 − x1), x2 − x∗ 2 ≈ 1 − ξ1ξ2 ξ2 − ξ1 f ξ2 an (ξ2 − x2), (3.216) where ξ1 and ξ2 are the roots of equation f (x) = 0. Hence, the method is convergent if the moduli of the expressions in the brackets of relation (3.50) are strictly subunitary. Moreover, it is necessary that the start values for p and q be sufficiently close to −(ξ1 + ξ2) and ξ1ξ2, respectively. 3.8 NUMERICAL EXAMPLES Example 3.1 Let us consider the polynomial P (x) = X5 + 3X4 − 2X3 + 6X2 + 5X − 7 (3.217) for which we wish to determine the limits between which its roots can be found. Using the notation in Section 3.1, we have a = max{|3|, |−2|, |6|, |5|, |−7|} = 7, a0 = 1, (3.218) a = max{|1|, |3|, |−2|, |6|, |5|} = 6, a5 = −7, (3.219) so that the roots of the equation P (x) = 0 can be found in the interval 7 13 = 7 6 + 7 ≤ |x| ≤ 1 + 7 1 = 8. (3.220) The positive roots of the equation P (x) = 0 has as upper limit, the value L = 8. (3.221) Let us consider the equations P (−x) = 0, (3.222) x5 P 1 x = 0, (3.223) and −x5 P − 1 x = 0, (3.224) which may be written also in the forms x5 − 3x4 − 2x3 − 6x2 + 5x + 7 = 0, (3.225) 7x5 − 5x4 − 6x3 + 2x2 − 3x − 1 = 0, (3.226) and 7x5 + 5x4 − 6x3 − 2x2 − 3x + 1 = 0. (3.227)
  • 92. NUMERICAL EXAMPLES 83 The upper limits of the positive roots of these equations are given by L1 = 1 + 7 1 = 8, L2 = 1 + 6 7 = 13 7 , L3 = 1 + 6 7 = 13 7 , (3.228) so that the real roots of the equation P (x) = 0 are to be found in the set M1 = −L1, − 1 L3 1 L2 , L = −8, − 7 13 7 13 , 8 . (3.229) If we solve the problem by using the second method of determination of the upper limit of the roots of the equation, we get (i) for the equation P (x) = 0: a0 = 1, A = 7, k = 2, L = 1 + (a0/A)1/k = 1 + √ 1/7; (ii) for equation (3.225): a0 = 1, A = 6, k = 1, L1 = 1 + 1/6 = 7/6; (iii) for equation (3.226): a0 = 7, A = 5, k = 1, L2 = 1 + 7/5 = 12/5; (iv) for equation (3.227): a0 = 7, A = 6, k = 2, L3 = 1 + (7/6)1/2 = 1 + √ 7/6. In this case, the real roots of the equation P (x) = 0 have to be found in the set M2 = − 7 6 , − 1 1 + √ 7/6 5 12 , 1 + 1/7 . (3.230) Let us denote by f (x), f1(x), f2(x), and f3(x) the functions f (x) = x5 + 3x4 − 2x3 + 6x2 + 5x − 7, f1(x) = x5 − 3x4 − 2x3 − 6x2 + 5x + 7, f2(x) = 7x5 − 5x4 − 6x3 + 2x2 − 3x − 1, f3(x) = 7x5 + 5x4 − 6x3 − 2x2 − 3x + 1, (3.231) for which the derivatives are f (x) = 5x4 + 12x3 − 6x2 + 12x + 5, f (x) = 20x3 + 36x2 − 12x + 12, f (x) = 60x2 + 72x − 12, f (iv) (x) = 120x + 72, f (v) (x) = 120, (3.232) f1(x) = 5x4 − 12x3 − 6x2 − 12x + 5, f1 (x) = 20x3 − 36x2 − 12x − 12, f1 (x) = 60x2 − 72x − 12, f (iv) 1 (x) = 120x − 72, f (v) 1 (x) = 120, (3.233) f2(x) = 35x4 − 20x3 − 18x2 + 4x − 3, f2 (x) = 140x3 − 60x2 − 36x + 4, f2 (x) = 420x2 − 120x − 36, f (iv) 2 (x) = 840x − 120, f (v) 2 (x) = 840, (3.234) f3(x) = 35x4 + 20x3 − 18x2 − 4x − 3, f3 (x) = 140x3 + 60x2 − 36x − 4, f3 (x) = 420x2 + 120x − 36, f (iv) 3 (x) = 840x + 120, f (v) 3 (x) = 840. (3.235) To apply Newton’s method, we search first for a value M > 0 so that f (v)(M) > 0. Obviously, M may be any positive real number. We choose M = 0.1. We search now for a value M ≥ M so that f (iv) (M ) > 0. We choose M = M. The procedure is continued for the value M and the derivative f (x), obtaining M = M = M. Step by step, it follows that we may choose the value L = 1 for the function f (x). Analogically, we get the following values • for f1(x): L1 = 4; • for f2(x): L2 = 2; • for f3(x): L3 = 1.
  • 93. 84 SOLUTION OF ALGEBRAIC EQUATIONS It follows that the real roots of the equation f (x) = 0 have to be found in the set M3 = [−4, −1] 1 2 , 1 . (3.236) Let us solve the same problem by the method of grouping the terms. For f (x) we may make a group of the form (x5 + 3x4 − 2x3 ) + (6x2 + 5x − 7), (3.237) for which we find as upper bounds of the positive roots the values M1 = 2, M2 = 1, so that an upper bound of these roots is given by M1 = 2. In the same case of the function f (x) we may make also the group (x5 − 2x3 ) + (3x4 + 6x2 + 5x − 7), (3.238) for which the upper bounds of the positive roots are the values M3 = 2 and M4 = 1, from which we deduce that the upper bound of the positive roots is given by the value M3 = 2. In conclusion, the upper limit of the positive roots of the equation f (x) = 0 is L = max{M1, M3} = 2. By an analogous procedure, it follows that: • for f1(x) there is only one possibility of grouping (x5 − 3x4 − 2x3 − 6x2 ) + (5x + 7), (3.239) hence the value L1 = 4; • for f2(x), there is only one possibility of grouping (7x5 − 5x4 − 6x3 ) + (2x2 − 3x − 1) (3.240) to which corresponds L2 = 2; • for f3(x) the possibilities of grouping (7x5 + 5x4 − 6x3 − 2x2 − 3x) + (1), (3.241) with L3 = 1, (7x5 − 6x3 ) + (5x4 − 2x2 − 3x) + (1), (3.242) with L3 = 2, (7x5 − 6x3 − 2x2 ) + (5x4 − 3x) + (1), (3.243) with L3 = 2, (7x5 − 6x3 − 3x) + (5x4 − 2x2 ) + (1), (3.244) with L(iv) 3 = 2, (7x5 − 2x2 − 3x) + (5x4 − 6x3 ) + (1), (3.245) with L(v) 3 = 2, (7x5 − 2x2 ) + (5x4 − 6x3 − 3x) + (1), (3.246) with L(vi) 3 = 2, (7x5 − 3x) + (5x4 − 6x3 − 2x2 ) + (1), (3.247) with L(vii) 3 = 2, so that L3 = 2.
  • 94. NUMERICAL EXAMPLES 85 In conclusion, the real roots of the equation f (x) = 0 may be found in the set M4 = −4, − 1 2 1 2 , 2 . (3.248) We observe that the four methods lead to different results. Moreover, Newton’s method and the method of grouping of terms lead to sufficiently laborious expressions for the determination of the values L, L1, L2, and L3, because they imply polynomials of a great degree for which we have no formulas to calculate the roots. In the example presented here, we have preferred to determine these limits by entire numbers, although sometimes they can be found as roots of some algebraic equations of small degrees (1 or 2). The first two methods are simpler to apply, the second one having a more restricted area of values for the real roots. Example 3.2 We wish to determine, as a function of the real parameter λ, the number of negative and positive roots of the equation x4 − 2x2 − λ = 0. (3.249) To do this, we denote by f : R → R the polynomial function f (x) = x4 − 2x2 − λ, (3.250) the derivative of which is f (x) = 4x3 − 4x, (3.251) so that the first two polynomials of the Sturm sequence are f0(x) = x4 − 2x2 − λ, (3.252) f1(x) = x3 − x. (3.253) Dividing f0 by f1, we obtain the quotient x and the remainder (−x2 − λ), so that the following polynomial in the Sturm sequence reads f2(x) = x2 + λ. (3.254) Now dividing f1 by f2 results in the quotient x and the remainder −(λ + 1)x, from we get the polynomial f3(x) = (λ + 1)x. (3.255) We continue this process with the polynomials f2 and f3, for which we obtain the quotient x/(λ + 1) and the remainder λ; hence, the last polynomial of the Sturm sequence is f4(x) = −λ. (3.256) Case 1 λ ∈ (−∞, −1) We construct the following table, where ε > 0 is a sufficiently small value. f0 f1 f2 f3 f4 WS −∞ + − + + + 2 −ε + + − + + 2 ε + − − – + 2 ∞ + + + − + 2 The number of negative roots of the equation f (x) = 0 is given by WS(−∞) − WS(−ε) = 2 − 2 = 0, (3.257)
  • 95. 86 SOLUTION OF ALGEBRAIC EQUATIONS while the number of positive roots of the same equation is WS(ε) − WS(∞) = 0. (3.258) In conclusion, for λ ∈ (−∞, −1) our equation has no real roots. Case 2 λ = −1 In this case, the equation f (x) = 0 becomes x4 − 2x2 + 1 = (x2 − 1)2 = 0 (3.259) and has the double roots x1 = −1 and x2 = 1. Case 3 λ ∈ (−1, 0) We construct the following table, where ε is a sufficiently small positive value. f0 f1 f2 f3 f4 WS −∞ + − + − + 4 −ε + + − − + 2 ε + − − + + 2 ∞ + + + + + 0 It follows that the number of negative values of the equation f (x) = 0 is WS(−∞) − WS(−ε) = 4 − 2 = 2, (3.260) while the number of positive roots of the same equation is given by WS(ε) − WS(∞) = 2 − 0 = 2. (3.261) Case 4 λ = 0 The equation f (x) = 0 now takes the form x4 − 2x2 = x2 (x2 − 2) = 0 (3.262) and has the double root x1 = 0 and the simple roots x2 = − √ 2 and x3 = √ 2. Case 5 λ ∈ (0, ∞) We construct the following table, in which ε > 0 is a sufficiently small value. f0 f1 f2 f3 f4 WS −∞ + − + − + 4 −ε + + − − + 2 ε + − − + + 2 ∞ + + + + + 0 In this case, the number of negative roots of the equation f (x) = 0 is WS(−∞) − WS(−ε) = 3 − 2 = 1, (3.263) while the number of positive roots is WS(ε) − WS(∞) = 2 − 1 = 1. (3.264)
  • 96. NUMERICAL EXAMPLES 87 If we were to apply Descartes’ theorem to solve the same problem, then we would find that for λ > 0, we have only one variation of sign in the sequence of the coefficients of the polynomial x4 − 2x2 − λ, which means that equation (3.249) has only one positive root. Making x → −x, we obtain the same equation (3.249) and, analogically it follows that it has a negative root. If λ < 0, then Descartes’ theorem shows that equation (3.249) has zero or two positive roots and zero or two negative roots. The same conclusion is obtained from Budan’s theorem. Example 3.3 Let us consider the equation f (x) = x3 + x − 3 = 0, (3.265) the roots of which we wish to determine. We begin by presenting an exact method of solving the equation of third degree, that is, the method of Hudde (Johann van Waveren Hudde, 1628–1704). Let us observe that any equation of third degree, a0y3 + a1y2 + a2y + a3 = 0, a0 = 0, (3.266) may be brought to the canonical form x3 + ax + b = 0, (3.267) by dividing it by a0 and by the transformation y = x − a1 3a0 . (3.268) We search for solutions of the form x = u + v, u, v ∈ C, (3.269) for equation (3.267). It follows that (u + v)3 + a(u + v) + b = 0 (3.270) or, equivalently, u3 + v3 + 3uv(u + v) + a(u + v) + b = 0. (3.271) We shall determine u and v so that u3 + v3 = −b, uv = − a 3 . (3.272) The last relation (3.272) leads to u3 + v3 = − a3 27 , (3.273) hence u3 and v3 are solutions of second-degree equation z2 + bz − a3 27 = 0, (3.274)
  • 97. 88 SOLUTION OF ALGEBRAIC EQUATIONS from which z1,2 = −b ± b2 + 4a3 27 2 . (3.275) We get the values u = 3 −b − b2 + 4a3 27 2 , v = 3 −b + b2 + 4a3 27 2 . (3.276) Let us denote by the expression = 4a3 27 + b2 , (3.277) called the discriminant of the equation of third degree. Three situations may occur: Case 1 = 0. In this case, all the roots of equation (3.267) are real, one of them being a double; this is just the condition for such a root. Indeed, denoting by g(x) the function g(x) = x3 + ax + b, (3.278) the derivative of which is g (x) = 3x2 + a. (3.279) From the condition that g(x) and g (x) have a common root, we deduce 3x3 + 3ax + 3b = 0, 3x3 + ax = 0, (3.280) from which 2ax + 3b = 0. (3.281) Hence the common root is x = − 3b 2a . (3.282) Replacing x in equation (3.267), we get − 27b3 8a3 − 3b 2 + b = 0, (3.283) from which 27b3 4a3 + b = 0, (3.284) that is, the condition = 0. Case 2 < 0. In this situation, expressions (3.276) become u = 3 −b − i √ | | 2 , v = 3 −b + i √ | | 2 (3.285) or, taking into account the trigonometric representation of complex numbers, u = 3 A(cos θ − i sin θ), v = 3 A(cos θ + i sin θ), (3.286)
  • 98. NUMERICAL EXAMPLES 89 where A = 1 2 b2 + 2 (3.287) and θ is the argument, θ ∈ [0, 2π). We deduce the values u1 = 3 √ A cos θ 3 − i sin θ 3 , u2 = 3 √ A cos θ + 2π 3 − i sin θ + 2π 3 , u3 = 3 √ A cos θ + 4π 3 − i sin θ + 4π 3 , (3.288) v1 = 3 √ A cos θ 3 + i sin θ 3 , v2 = 3 √ A cos θ + 2π 3 + i sin θ + 2π 3 , v3 = 3 √ A cos θ + 4π 3 + i sin θ + 4π 3 , (3.289) and the roots of equation (3.267) are x1 = u1 + v1 = 2 3 √ A cos θ 3 , x2 = u2 + v2 = 2 3 √ A cos θ + 2π 3 , x3 = u3 + v3 = 2 3 √ A cos θ + 4π 3 . (3.290) All these roots are real and distinct. Case 3 > 0. In this situation, expressions (3.276) read u = 3 −b − √ 2 , v = 3 −b + √ 2 (3.291) or, equivalently, u = 3 |b + √ | 2 (cos λπ + i sin λπ), v = 3 |b − √ | 2 (cos µπ + i sin µπ), (3.292) where λ and µ are two entire parameters with the values 0 or 1, function of the sign of the expressions −b − √ and −b + √ . It follows that u1 = 3 |b + √ | 2 cos λπ 3 + i sin λπ 3 , u2 = 3 |b + √ | 2 cos λπ + 2π 3 + i sin λπ + 2π 3 , u3 = 3 |b + √ | 2 cos λπ + 4π 3 + i sin λπ + 4π 3 , (3.293) v1 = 3 |b − √ | 2 cos µπ 3 + i sin µπ 3 , v2 = 3 |b − √ | 2 cos µπ + 2π 3 + i sin µπ + 2π 3 , v3 = 3 |b − √ | 2 cos µπ + 4π 3 + i sin µπ + 4π 3 . (3.294) We obtain nine combinations for the roots of equation (3.267) from which only three lead to roots, one real and two complex conjugate.
  • 99. 90 SOLUTION OF ALGEBRAIC EQUATIONS Returning to equation (3.265), it has already been brought to the canonical form with a = 1 and b = −3. It follows that = 4a3 27 + b2 = 247 27 > 0, (3.295) and hence equation (3.265) has a real root and two complex conjugate. We have u = 3 3 − 247 27 2 , v = 3 3 + 247 27 2 , (3.296) so that u1 = −0.230806, u2 = 0.115403 + 0.199883i, u3 = 0.115403 − 0.199883i, (3.297) v1 = 1.444217, v2 = −0.722109 + 1.250729i, v3 = −0.722109 − 1.250729i. (3.298) These result in the solutions x1 = u1 + v1 = 1.213411, x2 = u2 + v2 = −0.606706 + 1.450612i, x3 = u3 + v3 = −0.606706 − 1.450612i. (3.299) Applying Descartes theorem to the function f (x), we deduce that equation (3.265) has only one positive root. Now making x → −x in equation (3.265), we deduce the equation x3 + x + 3 = 0, (3.300) so that equation (3.265) has no negative roots. In conclusion, equation (3.265) has a positive root and two complex conjugate roots. Let us apply now Lagrange’s method to determine the positive root of equation (3.265). We have f (1) = −1 < 0, f (2) = 7 > 0, hence the positive root of equation (3.265) lies between 1 and 2. We construct the following table: 1 0 1 −3 1 1 1 2 −1 1 1 2 4 1 1 3 1 1 It results in the equation f1(x) = x3 − 4x2 − 3x − 1 = 0, (3.301) while the solution reads x = 1 + 1 · · · . (3.302) As f1(4) = −13 < 0, f1(5) = 9 > 0, the equation f1(x) = 0 has a root between 4 and 5, while the solution x reads as x = 1 + 1 4 + 1 ... . (3.303)
We construct the following table:
$$\begin{array}{c|cccc}
 & 1 & -4 & -3 & -1\\
4 & 1 & 0 & -3 & -13\\
4 & 1 & 4 & 13 & \\
4 & 1 & 8 & & \\
 & 1 & & &
\end{array}$$
and obtain the equation
$$f_2(x) = 13x^3 - 13x^2 - 8x - 1 = 0, \tag{3.304}$$
for which $f_2(1) = -9 < 0$, $f_2(2) = 35 > 0$. The solution now becomes
$$x = 1 + \cfrac{1}{4 + \cfrac{1}{1 + \cfrac{1}{\cdots}}}. \tag{3.305}$$
It results in the table
$$\begin{array}{c|cccc}
 & 13 & -13 & -8 & -1\\
1 & 13 & 0 & -8 & -9\\
1 & 13 & 13 & 5 & \\
1 & 13 & 26 & & \\
 & 13 & & &
\end{array}$$
and the new equation
$$f_3(x) = 9x^3 - 5x^2 - 26x - 13 = 0, \tag{3.306}$$
for which $f_3(2) = -13 < 0$, $f_3(3) = 107 > 0$; the equation $f_3(x) = 0$ has a root between 2 and 3. Moreover, the solution $x$ takes the form
$$x = 1 + \cfrac{1}{4 + \cfrac{1}{1 + \cfrac{1}{2 + \cfrac{1}{\ddots}}}} \tag{3.307}$$
and we obtain the following table:
$$\begin{array}{c|cccc}
 & 9 & -5 & -26 & -13\\
2 & 9 & 13 & 0 & -13\\
2 & 9 & 31 & 62 & \\
2 & 9 & 49 & & \\
 & 9 & & &
\end{array}$$
It results in the equation
$$f_4(x) = 13x^3 - 62x^2 - 49x - 9 = 0, \tag{3.308}$$
for which $f_4(5) = -179 < 0$, $f_4(6) = 273 > 0$; the solution $x$ takes the form
$$x = 1 + \cfrac{1}{4 + \cfrac{1}{1 + \cfrac{1}{2 + \cfrac{1}{5 + \cfrac{1}{\ddots}}}}}. \tag{3.309}$$
We stop here and write
$$x \approx 1 + \cfrac{1}{4 + \cfrac{1}{1 + \cfrac{1}{2 + \cfrac{1}{5 + 1}}}} = \frac{108}{89} = 1.213483, \tag{3.310}$$
the precision of determination of the solution being
$$\left|x - \frac{108}{89}\right| < \frac{1}{89^2} = \frac{1}{7921}. \tag{3.311}$$
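The repeated Horner substitutions above mechanize well. The following is a Python sketch of Lagrange's expansion (exact arithmetic via `fractions`; the function name and scanning strategy are ours); each step extracts the integer part $a$ of the current root, assumed greater than 1 as in the text, applies $x \to a + 1/t$, clears denominators, and normalizes the leading sign:

```python
from fractions import Fraction
from math import comb

def lagrange_expansion(coeffs, steps):
    """Partial quotients of the continued fraction of a root > 1 of the
    integer polynomial c0*x^n + ... + cn, as in Lagrange's method."""
    def val(c, x):
        r = Fraction(0)
        for ck in c:
            r = r * x + ck
        return r
    c = [Fraction(k) for k in coeffs]
    quotients = []
    for _ in range(steps):
        a = 1
        while val(c, a) * val(c, a + 1) > 0:   # bracket the root in [a, a+1]
            a += 1
        quotients.append(a)
        n = len(c) - 1
        newc = [Fraction(0)] * (n + 1)
        for i, ck in enumerate(c):             # ck multiplies x^(n-i)
            p = n - i
            for j in range(p + 1):             # t^n * (a + 1/t)^p expansion
                newc[j] += ck * comb(p, j) * a ** (p - j)
        c = newc if newc[0] > 0 else [-k for k in newc]
    return quotients

print(lagrange_expansion([1, 0, 1, -3], 5))    # [1, 4, 1, 2, 5], as in (3.309)
```

Evaluating the convergent of [1; 4, 1, 2, 5, 1] reproduces the approximation $108/89$ of (3.310).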
Let us now solve equation (3.265) using the Lobachevski–Graeffe method. We may pass from the coefficients $a_i^{(p)}$, $i = \overline{0,3}$, at step $p$, to the coefficients $a_i^{(p+1)}$, $i = \overline{0,3}$, using the formulae
$$a_0^{(p+1)} = \left[a_0^{(p)}\right]^2, \quad a_1^{(p+1)} = -\left\{\left[a_1^{(p)}\right]^2 - 2a_0^{(p)}a_2^{(p)}\right\}, \quad a_2^{(p+1)} = \left[a_2^{(p)}\right]^2 - 2a_1^{(p)}a_3^{(p)}, \quad a_3^{(p+1)} = -\left[a_3^{(p)}\right]^2. \tag{3.312}$$
It results in Table 3.2.

TABLE 3.2 Solving of Equation (3.265) by the Lobachevski–Graeffe Method
$$\begin{array}{c|cccc}
\text{Step} & a_0 & a_1 & a_2 & a_3\\\hline
0 & 1 & 0 & 1 & -3\\
1 & 1 & 2 & 1 & -9\\
2 & 1 & -2 & 37 & -81\\
3 & 1 & 70 & 1045 & -6561\\
4 & 1 & -2810 & 2010565 & -43046721\\
5 & 1 & -3874970 & 3.800449 \times 10^{12} & -1.853020 \times 10^{15}\\
6 & 1 & -7.414495 \times 10^{12} & 1.412905 \times 10^{25} & -3.433684 \times 10^{30}\\
7 & 1 & -2.671664 \times 10^{25} & 2.081974 \times 10^{50} & -1.179018 \times 10^{61}
\end{array}$$

The changes of sign in the column of $a_1$ indicate the presence of a pair of complex roots. The real root is determined by the relation
$$x_3 = \pm\sqrt[2^7]{-\frac{a_3^{(7)}}{a_2^{(7)}}} = \pm 1.21341; \tag{3.313}$$
we observe that equation (3.265) is verified by
$$x_3 = 1.21341. \tag{3.314}$$
Searching for the complex roots of the form
$$x_1 = \alpha + i\beta, \quad x_2 = \alpha - i\beta, \tag{3.315}$$
we obtain, from the Viète relation for the sum of the roots,
$$\alpha = -\frac{a_1}{2a_0} - \frac{1}{2}x_3 = -0.60671. \tag{3.316}$$
If $r$ is the modulus of the two complex roots (3.315), then
$$r^2 = \sqrt[2^7]{\frac{a_2^{(7)}}{a_0^{(7)}}} = 2.472368, \tag{3.317}$$
hence
$$\beta^2 = r^2 - \alpha^2 = 2.104271, \quad \beta = 1.45061. \tag{3.318}$$
The required roots are
$$x_{1,2} = -0.60671 \pm 1.45061i, \quad x_3 = 1.21341. \tag{3.319}$$

We shall now use the Bernoulli method to solve equation (3.265). We compute the power sums $\mu_k$, $k \in \mathbb{N}^*$, using Newton's recurrence formula
$$\mu_k = -(a_1\mu_{k-1} + a_2\mu_{k-2} + \cdots + a_{k-1}\mu_1 + ka_k), \tag{3.320}$$
where $a_0 = 1$, $a_1 = 0$, $a_2 = 1$, $a_3 = -3$, $a_i = 0$ for $i < 0$ or $i > 3$, and $\mu_i = 0$ for $i \le 0$. Successively, we get
$$\mu_1 = x_1 + x_2 + x_3 = 0, \tag{3.321}$$
$$\mu_2 = x_1^2 + x_2^2 + x_3^2 = -2, \tag{3.322}$$
$$\mu_3 = x_1^3 + x_2^3 + x_3^3 = 9, \tag{3.323}$$
$$\mu_4 = -(a_1\mu_3 + a_2\mu_2 + a_3\mu_1) = 2, \tag{3.324}$$
$$\mu_5 = -15,\ \mu_6 = 25,\ \mu_7 = 21,\ \mu_8 = -70,\ \mu_9 = 54,\ \mu_{10} = 133,\ \mu_{11} = -264,\ \mu_{12} = 29,\ \mu_{13} = 663,\ \mu_{14} = -821,\ \mu_{15} = -576,\ \mu_{16} = 2810,\ \mu_{17} = -1887, \tag{3.325}$$
$$s_{16} = \mu_{16}^2 - \mu_{15}\mu_{17} = 6809188, \quad s_{15} = \mu_{15}^2 - \mu_{14}\mu_{16} = 2638786,$$
$$t_{16} = \mu_{16}\mu_{15} - \mu_{17}\mu_{14} = -3167787, \quad t_{15} = \mu_{15}\mu_{14} - \mu_{16}\mu_{13} = -1390134, \tag{3.326}$$
so that
$$|x_2|^2 = |x_3|^2 \approx \frac{s_{16}}{s_{15}} = 2.580424, \quad |x_2| = |x_3| \approx 1.60637, \tag{3.327}$$
$$2|x_2|\cos\varphi = 2|x_3|\cos\varphi \approx \frac{t_{16}}{t_{15}} = 2.278764, \quad \cos\varphi = 0.709. \tag{3.328}$$
Although the modulus of the two complex conjugate roots thus determined is relatively correct, their argument is obtained with $\cos\varphi > 0$, whereas in reality it has a negative value.
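The power sums (3.321)-(3.325) and the estimates (3.326)-(3.327) are easy to reproduce. A minimal Python sketch (the function name is ours; the recurrence is Newton's identity (3.320) with $a_j = 0$ for $j > n$):

```python
def power_sums(a, kmax):
    """Power sums mu_k = x_1^k + ... + x_n^k of the roots of the monic
    polynomial x^n + a[1]*x^(n-1) + ... + a[n]  (a[0] = 1)."""
    n = len(a) - 1
    mu = []
    for k in range(1, kmax + 1):
        s = 0.0
        for j in range(1, min(k - 1, n) + 1):
            s += a[j] * mu[k - j - 1]
        if k <= n:
            s += k * a[k]          # the k*a_k term of Newton's identities
        mu.append(-s)
    return mu

mu = power_sums([1, 0, 1, -3], 17)             # mu[15] = 2810, mu[16] = -1887
s16 = mu[15]**2 - mu[14] * mu[16]              # = 6809188, cf. (3.326)
s15 = mu[14]**2 - mu[13] * mu[15]              # = 2638786
print(s16 / s15)                               # ~ 2.580424, cf. (3.327)
```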
Let us now apply the Birge–Vieta method to determine the real root of equation (3.265). To do this, let $\xi$ be a real number. Dividing $f(x)$ twice by $x - \xi$, we obtain the following synthetic-division table:
$$\begin{array}{c|cccc}
 & 1 & 0 & 1 & -3\\
\xi & 1 & \xi = b_1 & \xi^2 + 1 = b_2 & \xi^3 + \xi - 3\\
\xi & 1 & 2\xi = c_1 & 3\xi^2 + 1 &
\end{array}$$
The following recurrence relation results:
$$\xi^* = \xi - \frac{-3 + \xi(\xi^2 + 1)}{\xi^2 + 1 + \xi \cdot 2\xi} \cdot \frac{\xi^2 + 1 + \xi \cdot 2\xi}{3\xi^2 + 1} = \xi - \frac{\xi^3 + \xi - 3}{3\xi^2 + 1}. \tag{3.329}$$
As a matter of fact, we have thus obtained the same Newton method, the results of which have been presented before.

We have seen that the application of the first Lin method is equivalent to the application of the contraction method to the function
$$F(\xi) = -\frac{a_n\xi}{f(\xi) - a_n} = \frac{3}{\xi^2 + 1}. \tag{3.330}$$
The method is convergent if
$$\left|1 + \frac{x_1}{a_n}f'(x_1)\right| < 1, \tag{3.331}$$
that is,
$$\left|1 - \frac{1.213411}{3}\left(3 \times 1.213411^2 + 1\right)\right| < 1, \tag{3.332}$$
which is absurd because it leads to $1.191 < 1$. The convergence is ensured in the case of the second Lin method if we have simultaneously
$$\left|1 + \frac{x_2x_3}{x_3 - x_2}\cdot\frac{f'(x_2)}{a_n}\right| < 1, \qquad \left|1 - \frac{x_2x_3}{x_3 - x_2}\cdot\frac{f'(x_3)}{a_n}\right| < 1. \tag{3.333}$$
We obtain
$$\left|1 - \frac{2.472367}{2.901224}\cdot\frac{3(-0.606706 - 1.450612i)^2 + 1}{-3}\right| < 1, \qquad \left|1 - \frac{2.472367}{2.901224}\cdot\frac{3(-0.606706 + 1.450612i)^2 + 1}{-3}\right| < 1, \tag{3.334}$$
that is,
$$|-0.195481 + 1.5i| < 1, \qquad |-0.195481 - 1.5i| < 1, \tag{3.335}$$
which is absurd; hence the second Lin method cannot be applied either.
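The Birge–Vieta scheme above (two synthetic divisions per Newton step) can be sketched in a few lines of Python (function name ours):

```python
def birge_vieta(coeffs, x0, tol=1e-12, itmax=50):
    """Newton's method on a polynomial via two synthetic divisions:
    the b-row yields f(x), the c-row yields f'(x), cf. (3.329)."""
    x = x0
    for _ in range(itmax):
        b = coeffs[0]
        c = coeffs[0]
        for ak in coeffs[1:-1]:
            b = ak + x * b        # b-row: division of f by (x - xi)
            c = b + x * c         # c-row: division of the quotient
        b = coeffs[-1] + x * b    # final remainder b = f(x)
        if abs(b) < tol:
            break
        x -= b / c                # Newton step f(x)/f'(x)
    return x

print(birge_vieta([1.0, 0.0, 1.0, -3.0], 1.0))   # ~ 1.213411
```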
3.9 APPLICATIONS

Problem 3.1 A material point of mass $m$ moves along the $Ox$-axis (Fig. 3.2), acted upon by a force
$$F = -F_0P'\!\left(\frac{x}{b}\right), \tag{3.336}$$
where $P'$ is a polynomial of $n$th degree, while $b$ is a given constant.

Figure 3.2 Problem 3.1.

Determine the displacements of extreme values, knowing the following initial conditions: at $t = 0$, $x = x_0$, $\dot{x} = v_0$.

Solution:
1. Theory
From the theorem of kinetic energy,
$$\frac{mv^2}{2} - \frac{mv_0^2}{2} = W, \tag{3.337}$$
where $v$ is the velocity of the material point, while $W$ is the work done by the force $F$,
$$W = \int_{x_0}^{x} F(x)\,\mathrm{d}x; \tag{3.338}$$
with the condition $v = 0$, we obtain the extreme values of the distance $x$ as solutions of the equation
$$-\frac{mv_0^2}{2} = \int_{x_0}^{x} F(x)\,\mathrm{d}x. \tag{3.339}$$
With the help of the notations
$$\xi = \frac{x}{b}, \qquad \xi_0 = \frac{x_0}{b}, \tag{3.340}$$
we obtain from equation (3.339) the algebraic equation
$$P(\xi) - k = 0, \tag{3.341}$$
where $P(\xi)$ is a primitive of the polynomial $P'(\xi)$, while $k$ is given by
$$k = \frac{mv_0^2}{2bF_0} + P(\xi_0). \tag{3.342}$$
Numerical application: $m = 4\ \mathrm{kg}$, $x_0 = 0$, $v_0 = 20\ \mathrm{m\,s^{-1}}$, $F_0 = 50\ \mathrm{N}$, $P'(x/b) = A_2(x/b)^2 + A_1(x/b) + A_0$, $A_0 = -2$, $A_1 = 2$, $A_2 = 3$, $b = 1\ \mathrm{m}$.

2. Numerical computation
We obtain successively
$$P'(\xi) = 3\xi^2 + 2\xi - 2, \tag{3.343}$$
$$P(\xi) = \xi^3 + \xi^2 - 2\xi, \qquad P(\xi_0) = 0; \tag{3.344}$$
with $k = (4 \cdot 20^2)/(2 \cdot 1 \cdot 50) = 16$, it results in the equation
$$\xi^3 + \xi^2 - 2\xi - 16 = 0, \tag{3.345}$$
with the solutions
$$\xi_1 = 2.459120, \quad \xi_{2,3} = -1.729560 \pm 1.874837i. \tag{3.346}$$
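A quick numerical cross-check of (3.345) and (3.346), using NumPy's general root finder rather than one of the chapter's own methods:

```python
import numpy as np

# xi^3 + xi^2 - 2*xi - 16 = 0, equation (3.345)
roots = np.roots([1.0, 1.0, -2.0, -16.0])
print(roots)   # one real root ~ 2.459120 and a complex-conjugate pair
```

Only the real root is physically meaningful here, since $\xi = x/b$ is a real displacement.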
Problem 3.2 Consider the system illustrated in Figure 3.3 (the scheme of half of an automobile), formed by two equal masses $m_1$ and a mass $m_2$. The nonlinear springs (denoted by $k_1, \varepsilon_1$) give an elastic force
$$F_e = k_1z + \varepsilon_1z^2, \tag{3.347}$$
where $z$ is the elongation.

Figure 3.3 Problem 3.2.

The system moves in a vertical plane, the rotation $\varphi$ of the bar of mass $m_2$ being considered sufficiently small to admit the approximations $\sin\varphi \approx \varphi$, $\cos\varphi \approx 1$. Let us suppose that both the nonlinear and the linear springs are compressed. Determine
• the positions of equilibrium;
• their stability as a function of the parameter $\varepsilon_1$, assuming that $k_1$, $k_2$, $m_1$, $m_2$ are known.

Numerical application: $m_1 = 50\ \mathrm{kg}$, $m_2 = 750\ \mathrm{kg}$, $l = 2\ \mathrm{m}$, $k_2 = 20000\ \mathrm{N\,m^{-1}}$, $k_1 = 10^5\ \mathrm{N\,m^{-1}}$, $J = m_2(2l)^2/12 = 1000\ \mathrm{kg\,m^2}$, $g = 9.8065\ \mathrm{m\,s^{-2}}$.

Solution:
1. Theory
1.1. Equations of equilibrium
Isolating the three bodies, we obtain the representations in Figure 3.4. The equations of equilibrium are
$$\varepsilon_1x_{10}^2 + k_1x_{10} - k_2(x_{30} + l\varphi_0 - x_{10}) - m_1g = 0, \tag{3.348}$$
$$\varepsilon_1x_{20}^2 + k_1x_{20} - k_2(x_{30} - l\varphi_0 - x_{20}) - m_1g = 0, \tag{3.349}$$
$$k_2(x_{30} + l\varphi_0 - x_{10}) + k_2(x_{30} - l\varphi_0 - x_{20}) - m_2g = 0, \tag{3.350}$$
$$k_2l(x_{30} - l\varphi_0 - x_{20}) - k_2l(x_{30} + l\varphi_0 - x_{10}) = 0, \tag{3.351}$$
where the index 0 corresponds to the position of equilibrium.
Figure 3.4 Isolation of the rigid bodies.

The above equations may be put in the form
$$\varepsilon_1x_{10}^2 + (k_1 + k_2)x_{10} - k_2x_{30} - k_2l\varphi_0 = m_1g, \tag{3.352}$$
$$\varepsilon_1x_{20}^2 + (k_1 + k_2)x_{20} - k_2x_{30} + k_2l\varphi_0 = m_1g, \tag{3.353}$$
$$-k_2x_{10} - k_2x_{20} + 2k_2x_{30} = m_2g, \tag{3.354}$$
$$x_{10} - x_{20} - 2l\varphi_0 = 0. \tag{3.355}$$

1.2. Positions of equilibrium
From relation (3.355), we obtain
$$\varphi_0 = \frac{x_{10} - x_{20}}{2l}, \tag{3.356}$$
which, replaced in relations (3.352) and (3.353), leads to
$$\varepsilon_1x_{10}^2 + (k_1 + k_2)x_{10} - k_2x_{30} - \frac{k_2}{2}(x_{10} - x_{20}) = m_1g, \tag{3.357}$$
$$\varepsilon_1x_{20}^2 + (k_1 + k_2)x_{20} - k_2x_{30} + \frac{k_2}{2}(x_{10} - x_{20}) = m_1g. \tag{3.358}$$
From equation (3.354), we get
$$x_{30} = \frac{m_2g}{2k_2} + \frac{x_{10} + x_{20}}{2}. \tag{3.359}$$
Subtracting relation (3.358) from relation (3.357), term by term, it follows that
$$\varepsilon_1(x_{10}^2 - x_{20}^2) + k_1(x_{10} - x_{20}) = 0, \tag{3.360}$$
from which
$$x_{10} = x_{20} \quad\text{or}\quad x_{10} + x_{20} = -\frac{k_1}{\varepsilon_1}. \tag{3.361}$$
If $x_{10} = x_{20}$, then from equation (3.356) we obtain $\varphi_0 = 0$, so that from equation (3.359) we get
$$x_{30} = \frac{m_2g}{2k_2} + x_{10} = \frac{m_2g}{2k_2} + x_{20}. \tag{3.362}$$
If $x_{10} + x_{20} = -k_1/\varepsilon_1$, then we may write
$$x_{10} = -\frac{k_1}{\varepsilon_1} - x_{20}, \qquad x_{20} = -\frac{k_1}{\varepsilon_1} - x_{10}. \tag{3.363}$$
Relation (3.359) leads to
$$x_{30} = \frac{m_2g}{2k_2} - \frac{k_1}{2\varepsilon_1}, \tag{3.364}$$
while from equation (3.356) we obtain
$$\varphi_0 = \frac{x_{10}}{l} + \frac{k_1}{2l\varepsilon_1} = -\frac{x_{20}}{l} - \frac{k_1}{2l\varepsilon_1}. \tag{3.365}$$
Equation (3.357) now takes the form
$$\varepsilon_1x_{10}^2 + (k_1 - k_2)x_{10} - \frac{k_1k_2}{2\varepsilon_1} - \left(m_1 + \frac{m_2}{2}\right)g = 0, \tag{3.366}$$
while equation (3.358) becomes
$$\varepsilon_1x_{20}^2 + (k_1 - k_2)x_{20} - \frac{k_1k_2}{2\varepsilon_1} - \left(m_1 + \frac{m_2}{2}\right)g = 0. \tag{3.367}$$
As a matter of fact, equations (3.366) and (3.367) are the same. The discriminant of these equations is
$$\Delta = k_1^2 + k_2^2 + 4\varepsilon_1\left(m_1 + \frac{m_2}{2}\right)g \tag{3.368}$$
and the condition $\Delta \ge 0$ leads to the inequality
$$\varepsilon_1 \ge -\frac{k_1^2 + k_2^2}{4\left(m_1 + \dfrac{m_2}{2}\right)g}. \tag{3.369}$$
The sum of the two roots of equation (3.366) (of equation (3.367) too) is
$$S = -\frac{k_1 - k_2}{\varepsilon_1} \ne -\frac{k_1}{\varepsilon_1}, \tag{3.370}$$
so that $x_{10}$ and $x_{20}$ cannot be the two distinct roots of this quadratic; the condition $x_{10} + x_{20} = -k_1/\varepsilon_1$ can then be satisfied only if both unknowns coincide with the same root, which means that the position of equilibrium (if it exists) is given by
$$x_{10} = x_{20} = \frac{k_2 - k_1 - \sqrt{\Delta}}{2\varepsilon_1} \quad\text{or}\quad x_{10} = x_{20} = \frac{k_2 - k_1 + \sqrt{\Delta}}{2\varepsilon_1}. \tag{3.371}$$
As $x_{10} > 0$, $x_{20} > 0$ (the springs are compressed), from $x_{10} + x_{20} = -k_1/\varepsilon_1$ it follows that $\varepsilon_1 < 0$. Imposing $2x_{10} = -k_1/\varepsilon_1$ on the first equality (3.371), it follows that
$$k_2 = \sqrt{\Delta}, \tag{3.372}$$
from which
$$k_1^2 + k_2^2 + 4\varepsilon_1\left(m_1 + \frac{m_2}{2}\right)g = k_2^2, \tag{3.373}$$
that is,
$$\varepsilon_1 = -\frac{k_1^2}{4\left(m_1 + \dfrac{m_2}{2}\right)g}, \tag{3.374}$$
which verifies inequality (3.369) and the condition $\varepsilon_1 < 0$. It follows that the position of equilibrium is
$$x_{10} = x_{20} = -\frac{k_1}{2\varepsilon_1} > 0, \tag{3.375}$$
$\varepsilon_1$ being given by equation (3.374). For the second equality (3.371), we would obtain
$$k_2 = -\sqrt{\Delta}, \tag{3.376}$$
which is absurd. Let us remark that equation (3.375) is a particular case of the first relation (3.361); hence, at equilibrium, $x_{10} = x_{20}$.

1.3. Equations of motion
Using the scheme in Figure 3.3, these equations are
$$m_1\ddot{x}_1 = k_2(x_3 + l\varphi - x_1) - k_1x_1 - \varepsilon_1x_1^2 + m_1g, \tag{3.377}$$
$$m_1\ddot{x}_2 = k_2(x_3 - l\varphi - x_2) - k_1x_2 - \varepsilon_1x_2^2 + m_1g, \tag{3.378}$$
$$m_2\ddot{x}_3 = k_2(-x_3 - l\varphi + x_1) + k_2(-x_3 + l\varphi + x_2) + m_2g, \tag{3.379}$$
$$J\ddot{\varphi} = k_2l(-x_3 - l\varphi + x_1) - k_2l(-x_3 + l\varphi + x_2). \tag{3.380}$$
Denoting $x_1 = \xi_1$, $x_2 = \xi_2$, $x_3 = \xi_3$, $\varphi = \xi_4$, $\dot{x}_1 = \xi_5$, $\dot{x}_2 = \xi_6$, $\dot{x}_3 = \xi_7$, $\dot{\varphi} = \xi_8$,
$$a_{10} = -\frac{\varepsilon_1}{m_1}, \quad a_{11} = -\frac{k_1 + k_2}{m_1}, \quad a_{13} = \frac{k_2}{m_1}, \quad a_{14} = \frac{k_2l}{m_1}, \tag{3.381}$$
$$a_{20} = -\frac{\varepsilon_1}{m_1}, \quad a_{22} = -\frac{k_1 + k_2}{m_1}, \quad a_{23} = \frac{k_2}{m_1}, \quad a_{24} = -\frac{k_2l}{m_1}, \tag{3.382}$$
$$a_{31} = \frac{k_2}{m_2}, \quad a_{32} = \frac{k_2}{m_2}, \quad a_{33} = -\frac{2k_2}{m_2}, \tag{3.383}$$
$$a_{41} = \frac{k_2l}{J}, \quad a_{42} = -\frac{k_2l}{J}, \quad a_{44} = -\frac{2k_2l^2}{J}, \tag{3.384}$$
we obtain the system
$$\dot{\xi}_1 = \xi_5, \quad \dot{\xi}_2 = \xi_6, \quad \dot{\xi}_3 = \xi_7, \quad \dot{\xi}_4 = \xi_8,$$
$$\dot{\xi}_5 = a_{10}\xi_1^2 + a_{11}\xi_1 + a_{13}\xi_3 + a_{14}\xi_4 + g, \qquad \dot{\xi}_6 = a_{20}\xi_2^2 + a_{22}\xi_2 + a_{23}\xi_3 + a_{24}\xi_4 + g,$$
$$\dot{\xi}_7 = a_{31}\xi_1 + a_{32}\xi_2 + a_{33}\xi_3 + g, \qquad \dot{\xi}_8 = a_{41}\xi_1 + a_{42}\xi_2 + a_{44}\xi_4. \tag{3.385}$$
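For numerical integration or for computing the Jacobian below, the right-hand side of system (3.385) can be coded directly from (3.377)-(3.380). A Python sketch with the problem's numerical data (the function name and state ordering are ours):

```python
import numpy as np

# numerical data of the problem
m1, m2, l = 50.0, 750.0, 2.0
k1, k2, J, g = 1.0e5, 20000.0, 1000.0, 9.8065

def rhs(xi, eps1):
    """Right-hand side of the first-order system (3.385);
    xi = (x1, x2, x3, phi, dx1, dx2, dx3, dphi)."""
    x1, x2, x3, phi, v1, v2, v3, om = xi
    return np.array([
        v1, v2, v3, om,
        (k2*(x3 + l*phi - x1) - k1*x1 - eps1*x1**2)/m1 + g,
        (k2*(x3 - l*phi - x2) - k1*x2 - eps1*x2**2)/m1 + g,
        (k2*(x1 - x3 - l*phi) + k2*(x2 - x3 + l*phi))/m2 + g,
        (k2*l*(x1 - x3 - l*phi) - k2*l*(x2 - x3 + l*phi))/J,
    ])
```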
1.4. Stability of the positions of equilibrium
Denoting by $f_k(\xi_1, \ldots, \xi_8)$, $k = \overline{1,8}$, the expressions on the right-hand side of relations (3.385) and by $j_{kl} = \partial f_k/\partial\xi_l$, $k, l = \overline{1,8}$, their partial derivatives, the characteristic equation is
$$\begin{vmatrix}
-\lambda & 0 & 0 & 0 & 1 & 0 & 0 & 0\\
0 & -\lambda & 0 & 0 & 0 & 1 & 0 & 0\\
0 & 0 & -\lambda & 0 & 0 & 0 & 1 & 0\\
0 & 0 & 0 & -\lambda & 0 & 0 & 0 & 1\\
j_{51} & 0 & j_{53} & j_{54} & -\lambda & 0 & 0 & 0\\
0 & j_{62} & j_{63} & j_{64} & 0 & -\lambda & 0 & 0\\
j_{71} & j_{72} & j_{73} & 0 & 0 & 0 & -\lambda & 0\\
j_{81} & j_{82} & 0 & j_{84} & 0 & 0 & 0 & -\lambda
\end{vmatrix} = 0, \tag{3.386}$$
from which
$$\begin{vmatrix}
j_{51} - \lambda^2 & 0 & j_{53} & j_{54}\\
0 & j_{62} - \lambda^2 & j_{63} & j_{64}\\
j_{71} & j_{72} & j_{73} - \lambda^2 & 0\\
j_{81} & j_{82} & 0 & j_{84} - \lambda^2
\end{vmatrix} = 0. \tag{3.387}$$
We obtain the algebraic equation of eighth degree in $\lambda$,
$$\lambda^8 + A\lambda^6 + B\lambda^4 + C\lambda^2 + D = 0, \tag{3.388}$$
where
$$A = -j_{51} - j_{62} - j_{73} - j_{84}, \tag{3.389}$$
$$B = j_{62}j_{73} + j_{62}j_{84} + j_{73}j_{84} - j_{64}j_{82} - j_{63}j_{72} + j_{51}j_{62} + j_{51}j_{73} + j_{51}j_{84} - j_{53}j_{71} - j_{54}j_{81}, \tag{3.390}$$
$$C = -j_{62}j_{73}j_{84} + j_{64}j_{73}j_{82} + j_{63}j_{72}j_{84} - j_{51}j_{62}j_{73} - j_{51}j_{62}j_{84} - j_{51}j_{73}j_{84} + j_{51}j_{64}j_{82} + j_{51}j_{63}j_{72} + j_{53}j_{62}j_{71} + j_{53}j_{71}j_{84} + j_{54}j_{62}j_{81} + j_{54}j_{73}j_{81}, \tag{3.391}$$
$$D = j_{51}j_{62}j_{73}j_{84} - j_{51}j_{64}j_{73}j_{82} - j_{51}j_{63}j_{72}j_{84} + j_{53}j_{64}j_{71}j_{82} - j_{53}j_{64}j_{72}j_{81} - j_{53}j_{62}j_{71}j_{84} - j_{54}j_{62}j_{73}j_{81} - j_{54}j_{63}j_{71}j_{82} + j_{54}j_{63}j_{72}j_{81}. \tag{3.392}$$
Equation (3.388), with the notation $u = \lambda^2$, may be written in the form
$$u^4 + Au^3 + Bu^2 + Cu + D = 0 \tag{3.393}$$
and, for a position of stable equilibrium, it is necessary and sufficient that all the roots of equation (3.393) be negative and distinct (see Section 1.10, Discussion; a numerical test is sketched after the following list). The following situations may occur:
• The roots are distinct.
• There is a double root.
• There is a triple root.
• There is a root of an order of multiplicity equal to four.
• There are two double roots.
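Before treating these cases analytically, note that (3.387) is the characteristic determinant, in $u = \lambda^2$, of a $4\times4$ matrix assembled from the Jacobian entries. A Python sketch of the resulting numerical stability test (function name and the dictionary interface are ours):

```python
import numpy as np

def equilibrium_stable(j, tol=1e-9):
    """Stability test for system (3.385): the eigenvalues of the 4x4
    matrix of (3.387) are the u = lambda^2; the equilibrium is (simply)
    stable iff they are all real, negative, and distinct.
    j is a dict of the nonzero Jacobian entries, e.g. j['51'] = j_51."""
    M = np.array([[j['51'], 0.0,     j['53'], j['54']],
                  [0.0,     j['62'], j['63'], j['64']],
                  [j['71'], j['72'], j['73'], 0.0    ],
                  [j['81'], j['82'], 0.0,     j['84']]])
    u = np.linalg.eigvals(M)
    if np.any(np.abs(u.imag) > tol) or np.any(u.real >= 0.0):
        return False
    u = np.sort(u.real)
    return bool(np.min(np.diff(u)) > tol)   # distinctness
```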
1.5. Case of distinct roots
Making $u \to -u$ in equation (3.393), we obtain
$$u^4 - Au^3 + Bu^2 - Cu + D = 0 \tag{3.394}$$
and, from Descartes' theorem, we deduce the necessary condition for the existence of four negative roots:
$$A > 0, \quad B > 0, \quad C > 0, \quad D > 0. \tag{3.395}$$
We construct the Sturm sequence associated with the polynomial
$$f(u) = u^4 + Au^3 + Bu^2 + Cu + D. \tag{3.396}$$
We choose
$$f_0(u) = u^4 + Au^3 + Bu^2 + Cu + D, \tag{3.397}$$
$$f_1(u) = u^3 + \frac{3A}{4}u^2 + \frac{B}{2}u + \frac{C}{4}. \tag{3.398}$$
Dividing $f_0$ by $f_1$, we obtain the remainder
$$R_2 = \frac{8B - 3A^2}{16}u^2 + \frac{6C - AB}{8}u + \frac{16D - AC}{16}. \tag{3.399}$$
We find that it is necessary that $8B - 3A^2 \ne 0$; in the opposite case, $R_2$ would have a degree at most equal to 1 (as would the polynomial $f_2$ of the Sturm sequence), and only four terms would result in the Sturm sequence ($f_0$, $f_1$, $f_2$, and $f_3$, the last term being a constant), so that in the sequence $f_0(-\infty)$, $f_1(-\infty)$, $f_2(-\infty)$, $f_3(-\infty)$ we would have at most three variations of sign. It would follow that equation (3.393) has at most three negative roots, which is not convenient. In conclusion, we obtain the necessary condition
$$8B - 3A^2 \ne 0. \tag{3.400}$$
Writing
$$R_2 = -\alpha_2u^2 - \beta_2u - \gamma_2, \tag{3.401}$$
we may choose the following term of Sturm's sequence in the form
$$f_2(u) = u^2 + \overline{\beta}_2u + \overline{\gamma}_2, \tag{3.402}$$
where
$$\overline{\beta}_2 = \frac{\beta_2}{\alpha_2} = \frac{2(6C - AB)}{8B - 3A^2}, \qquad \overline{\gamma}_2 = \frac{\gamma_2}{\alpha_2} = \frac{16D - AC}{8B - 3A^2}. \tag{3.403}$$
Dividing $f_1$ by $f_2$, we obtain the remainder
$$R_3 = -\beta_3u - \gamma_3, \tag{3.404}$$
where
$$\beta_3 = \overline{\gamma}_2 - \frac{B}{2} + \overline{\beta}_2\left(\frac{3A}{4} - \overline{\beta}_2\right), \qquad \gamma_3 = -\frac{C}{4} + \overline{\gamma}_2\left(\frac{3A}{4} - \overline{\beta}_2\right). \tag{3.405}$$
Similar considerations lead to the condition $\beta_3 \ne 0$, from which
$$\frac{16D - AC}{8B - 3A^2} - \frac{B}{2} + \frac{2(6C - AB)}{8B - 3A^2}\left[\frac{3A}{4} - \frac{2(6C - AB)}{8B - 3A^2}\right] \ne 0. \tag{3.406}$$
We choose
$$f_3(u) = u + \overline{\gamma}_3, \tag{3.407}$$
with
$$\overline{\gamma}_3 = \frac{\gamma_3}{\beta_3}. \tag{3.408}$$
Dividing $f_2$ by $f_3$ results in the remainder
$$R_4 = \overline{\gamma}_2 - \overline{\gamma}_3(\overline{\beta}_2 - \overline{\gamma}_3) \tag{3.409}$$
and the polynomial
$$f_4(u) = \overline{\gamma}_3(\overline{\beta}_2 - \overline{\gamma}_3) - \overline{\gamma}_2, \tag{3.410}$$
which must be nonzero (the roots are distinct!); written in terms of $A$, $B$, $C$, $D$ through relations (3.403), (3.405), and (3.408), this reads
$$\overline{\gamma}_3(\overline{\beta}_2 - \overline{\gamma}_3) - \overline{\gamma}_2 \ne 0. \tag{3.411}$$
We may now construct Table 3.3.

TABLE 3.3 Variations of Sign in the Sturm Sequence
$$\begin{array}{c|ccccc|c}
u & f_0 & f_1 & f_2 & f_3 & f_4 & W_S\\\hline
-\infty & + & - & + & - & \operatorname{sgn}f_4 & 3 \text{ or } 4\\
0 & + & + & \operatorname{sgn}\overline{\gamma}_2 & \operatorname{sgn}\overline{\gamma}_3 & \operatorname{sgn}f_4 & 0, 1, 2, \text{ or } 3
\end{array}$$

The only possibility to have four negative distinct roots is $W_S(-\infty) = 4$ and $W_S(0) = 0$, from which result the conditions
$$f_4 > 0, \quad \overline{\gamma}_2 > 0, \quad \overline{\gamma}_3 > 0. \tag{3.412}$$
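The Sturm-sequence construction and the counting of sign variations are mechanical; here is a Python sketch using NumPy's polynomial division (the function names are ours, and a large negative abscissa stands in for $-\infty$):

```python
import numpy as np

def sturm_chain(coeffs):
    """Sturm sequence f0, f1, f2, ... of a polynomial (coefficients with
    the highest power first); each new term is the negated remainder of
    the division of the two previous ones, as in (3.397)-(3.410)."""
    chain = [np.poly1d(coeffs)]
    chain.append(chain[0].deriv())
    while chain[-1].order > 0:
        _, r = np.polydiv(chain[-2].coeffs, chain[-1].coeffs)
        chain.append(np.poly1d(-r))
    return chain

def sign_variations(chain, x, eps=1e-12):
    vals = [p(x) for p in chain]
    signs = [v for v in vals if abs(v) > eps]
    return sum(1 for a, b in zip(signs, signs[1:]) if a * b < 0)

# number of negative real roots = V(-inf) - V(0)
f = [1.0, 10.0, 35.0, 50.0, 24.0]        # (u+1)(u+2)(u+3)(u+4): 4 negative roots
ch = sturm_chain(f)
print(sign_variations(ch, -1e9) - sign_variations(ch, 0.0))   # 4
```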
1.6. The case of a double root
If the polynomial $f(u)$ given by (3.396) has a double root, say $\overline{u}$, then $\overline{u}$ is also a root of the derivative $f'(u)$, that is,
$$\overline{u}^4 + A\overline{u}^3 + B\overline{u}^2 + C\overline{u} + D = 0, \qquad 4\overline{u}^3 + 3A\overline{u}^2 + 2B\overline{u} + C = 0. \tag{3.413}$$
Relations (3.413), multiplied by 4 and $-\overline{u}$, respectively, and summed, lead to
$$A\overline{u}^3 + 2B\overline{u}^2 + 3C\overline{u} + 4D = 0. \tag{3.414}$$
We multiply the second relation (3.413) by $A$, relation (3.414) by $-4$, and make the sum, obtaining
$$(3A^2 - 8B)\overline{u}^2 + (2AB - 12C)\overline{u} + AC - 16D = 0. \tag{3.415}$$
Summing relation (3.414), multiplied by $(8B - 3A^2)$, with relation (3.415), multiplied by $A\overline{u}$, we get the relation
$$(4B^2 - A^2B - 3AC)\overline{u}^2 + (6BC - 2A^2C - 4AD)\overline{u} + D(8B - 3A^2) = 0. \tag{3.416}$$
Multiplying expressions (3.415) and (3.416) by $(4B^2 - A^2B - 3AC)$ and $(8B - 3A^2)$, respectively, and summing the results thus obtained, we get a linear relation
$$E_1\overline{u} + E_2 = 0, \tag{3.417}$$
$$E_1 = 8AB^3 - 2A^3B^2 - 28A^2BC + 36AC^2 + 6A^4C - 32ABD + 12A^3D,$$
$$E_2 = 4AB^2C - A^3BC - 3A^2C^2 - 32A^2BD + 48ACD + 9A^4D,$$
together with the condition that $\overline{u}$ may be determined from it,
$$E_1 \ne 0. \tag{3.418}$$
We now construct Horner's scheme in Table 3.4. The other roots result from
$$u^2 + (A + 2\overline{u})u + 3\overline{u}^2 + 2A\overline{u} + B = 0, \tag{3.419}$$
which must have two negative roots, distinct and different from $\overline{u}$; hence the conditions
$$\overline{\Delta} = (A + 2\overline{u})^2 - 4(3\overline{u}^2 + 2A\overline{u} + B) > 0, \quad A + 2\overline{u} > 0, \quad 3\overline{u}^2 + 2A\overline{u} + B > 0, \quad \frac{-(A + 2\overline{u}) \pm \sqrt{\overline{\Delta}}}{2} \ne \overline{u}. \tag{3.420}$$
Because $\overline{u} = -E_2/E_1$ must be negative, relation (3.417) also yields the condition
$$\frac{E_2}{E_1} > 0. \tag{3.421}$$

1.7. Case of a triple root
Let us denote this root by $\overline{\overline{u}}$; then it must satisfy the conditions
$$\overline{\overline{u}}^4 + A\overline{\overline{u}}^3 + B\overline{\overline{u}}^2 + C\overline{\overline{u}} + D = 0, \qquad 4\overline{\overline{u}}^3 + 3A\overline{\overline{u}}^2 + 2B\overline{\overline{u}} + C = 0, \qquad 6\overline{\overline{u}}^2 + 3A\overline{\overline{u}} + B = 0. \tag{3.422}$$
Multiplying the second relation (3.422) by 3, the third one by $-2\overline{\overline{u}}$, and summing, we obtain the equation
$$3A\overline{\overline{u}}^2 + 4B\overline{\overline{u}} + 3C = 0. \tag{3.423}$$
Summing now the last relation (3.422), multiplied by $A$, with relation (3.423), multiplied by $-2$, it follows that
$$(3A^2 - 8B)\overline{\overline{u}} + AB - 6C = 0, \tag{3.424}$$

TABLE 3.4 Horner's Scheme for a Double Root
$$\begin{array}{c|ccccc}
 & 1 & A & B & C & D\\
\overline{u} & 1 & A + \overline{u} & \overline{u}^2 + A\overline{u} + B & \overline{u}^3 + A\overline{u}^2 + B\overline{u} + C & 0\\
\overline{u} & 1 & A + 2\overline{u} & 3\overline{u}^2 + 2A\overline{u} + B & 0 &
\end{array}$$
from which
$$\overline{\overline{u}} = \frac{6C - AB}{3A^2 - 8B} < 0, \qquad 3A^2 - 8B \ne 0. \tag{3.425}$$
We now construct Horner's scheme in Table 3.5, obtaining thus the last root
$$u^* = -A - 3\overline{\overline{u}} < 0. \tag{3.426}$$

1.8. Case of the root of order of multiplicity equal to four
Let $u_0$ be this root. It will satisfy the relations
$$u_0^4 + Au_0^3 + Bu_0^2 + Cu_0 + D = 0, \quad 4u_0^3 + 3Au_0^2 + 2Bu_0 + C = 0, \quad 6u_0^2 + 3Au_0 + B = 0, \quad 4u_0 + A = 0, \tag{3.427}$$
from which
$$u_0 = -\frac{A}{4} < 0; \tag{3.428}$$
it follows that
$$\left(u + \frac{A}{4}\right)^4 = u^4 + Au^3 + Bu^2 + Cu + D, \tag{3.429}$$
from which
$$B = \frac{3A^2}{8}, \qquad C = \frac{A^3}{16}, \qquad D = \frac{A^4}{256}. \tag{3.430}$$

1.9. Case of two double roots
Let $u' < 0$ and $u'' < 0$ be the two double roots. We may write
$$u^4 + Au^3 + Bu^2 + Cu + D = (u - u')^2(u - u'')^2, \tag{3.431}$$
from which
$$A = -2(u' + u''), \quad B = (u' + u'')^2 + 2u'u'', \quad C = -2u'u''(u' + u''), \quad D = (u'u'')^2, \quad A > 0,\ B > 0,\ C > 0,\ D > 0. \tag{3.432}$$
It follows that $u'$ and $u''$ are the solutions of the equation
$$z^2 + \frac{A}{2}z + \sqrt{D} = 0, \tag{3.433}$$
that is,
$$z_{1,2} = -\frac{A}{4} \pm \frac{1}{4}\sqrt{A^2 - 16\sqrt{D}}, \tag{3.434}$$

TABLE 3.5 Horner's Scheme for a Triple Root
$$\begin{array}{c|ccccc}
 & 1 & A & B & C & D\\
\overline{\overline{u}} & 1 & A + \overline{\overline{u}} & \overline{\overline{u}}^2 + A\overline{\overline{u}} + B & \overline{\overline{u}}^3 + A\overline{\overline{u}}^2 + B\overline{\overline{u}} + C & 0\\
\overline{\overline{u}} & 1 & A + 2\overline{\overline{u}} & 3\overline{\overline{u}}^2 + 2A\overline{\overline{u}} + B & 0 & \\
\overline{\overline{u}} & 1 & A + 3\overline{\overline{u}} & 0 & &
\end{array}$$
obtaining thus the new condition
$$\frac{A^4}{256} > D. \tag{3.435}$$
Denoting
$$u' = -\frac{A}{4} + \frac{1}{4}\sqrt{A^2 - 16\sqrt{D}}, \qquad u'' = -\frac{A}{4} - \frac{1}{4}\sqrt{A^2 - 16\sqrt{D}}, \tag{3.436}$$
it follows that
$$u' + u'' = -\frac{A}{2}, \quad u'u'' = \sqrt{D}, \quad B = (u' + u'')^2 + 2u'u'' = \frac{A^2}{4} + 2\sqrt{D}, \quad C = -2u'u''(u' + u'') = A\sqrt{D}. \tag{3.437}$$

1.10. Discussion
Let $u = \alpha + i\beta$, $\alpha \ne 0$, $\beta \ne 0$, be a root of equation (3.393), written in the trigonometric form
$$u = |u|(\cos\theta + i\sin\theta), \tag{3.438}$$
from which
$$\lambda = u^{1/2} = \sqrt{|u|}\left(\cos\frac{\theta}{2} + i\sin\frac{\theta}{2}\right) \quad\text{or}\quad \lambda = u^{1/2} = \sqrt{|u|}\left[\cos\left(\frac{\theta}{2} + \pi\right) + i\sin\left(\frac{\theta}{2} + \pi\right)\right]. \tag{3.439}$$
Let us remark that, irrespective of the value of $\theta$, we get either $\cos(\theta/2) > 0$ or $\cos(\theta/2 + \pi) > 0$; hence equation (3.388) will have at least one root with a positive real part, that is, the position of equilibrium is unstable.

Let us suppose now that a root of equation (3.393) is of the form
$$u = i\beta, \quad \beta \ne 0, \tag{3.440}$$
that is,
$$u = |\beta|\left(\cos\frac{\pi}{2} + i\sin\frac{\pi}{2}\right) \quad\text{or}\quad u = |\beta|\left(\cos\frac{3\pi}{2} + i\sin\frac{3\pi}{2}\right). \tag{3.441}$$
We deduce
$$\lambda = u^{1/2} = \sqrt{|\beta|}\left(\cos\frac{\pi}{4} + i\sin\frac{\pi}{4}\right) \quad\text{or}\quad \lambda = \sqrt{|\beta|}\left(\cos\frac{3\pi}{4} + i\sin\frac{3\pi}{4}\right) \quad\text{or}\quad \lambda = \sqrt{|\beta|}\left(\cos\frac{5\pi}{4} + i\sin\frac{5\pi}{4}\right); \tag{3.442}$$
hence at least one root of the characteristic equation (3.388) has a positive real part, so that the equilibrium is unstable.

The case $\alpha = 0$, $\beta = 0$ leads to the root $u = 0$, from which it follows that $\lambda = 0$ is a double root of the characteristic equation (3.388). The linear approximation of the motion around the position of equilibrium will then contain a term of the form $Kt$, where $K$ is a constant; hence the equilibrium is again unstable. Thus, the only possibility of stability of the equilibrium is that all the roots of equation (3.393) are negative.
Figure 3.5 The first branch of stability described by $\xi_{11}$ for $\varepsilon_1 < 0$.

If such a root $u < 0$ is double, then for the characteristic equation we obtain the double roots $\lambda_1 = i\sqrt{|u|}$, $\lambda_2 = -i\sqrt{|u|}$. Each such double root leads, in the linear approximation of the motion around the position of equilibrium, to secular terms of the form $Kt\sin(\sqrt{|u|}\,t + \varphi)$; the equilibrium is unstable in this case too. Hence, it follows that the equilibrium is stable (in fact, simply stable) if and only if the four roots of equation (3.393) are negative and distinct.

2. Numerical computation
We obtain the values
$$a_{11} = -2400, \quad a_{13} = 400, \quad a_{14} = 800, \quad a_{22} = -2400, \quad a_{23} = 400, \quad a_{24} = -800,$$
$$a_{31} = 26.667, \quad a_{32} = 26.667, \quad a_{33} = -53.333, \quad a_{41} = 40, \quad a_{42} = -40, \quad a_{44} = -160, \tag{3.443}$$
$$a_{10} = -\frac{\varepsilon_1}{50}, \qquad a_{20} = -\frac{\varepsilon_1}{50}, \tag{3.444}$$
$$j_{51} = -\frac{\varepsilon_1\xi_1}{25} - 2400, \quad j_{53} = 400, \quad j_{54} = 800, \quad j_{62} = -\frac{\varepsilon_1\xi_2}{25} - 2400 = -\frac{\varepsilon_1\xi_1}{25} - 2400,$$
$$j_{63} = 400, \quad j_{64} = -800, \quad j_{71} = 26.667, \quad j_{72} = 26.667, \quad j_{73} = -53.333, \quad j_{81} = 40, \quad j_{82} = -40, \quad j_{84} = -160. \tag{3.445}$$
The stability diagrams are plotted in Figure 3.5, Figure 3.6, and Figure 3.7. We have to consider two branches for $\varepsilon_1 < 0$. The first branch is given by
$$\xi_{11} = \frac{k_2 - k_1 + \sqrt{k_1^2 + k_2^2 + 4\varepsilon_1\left(m_1 + \dfrac{m_2}{2}\right)g}}{2\varepsilon_1} \tag{3.446}$$
Figure 3.6 (a) The second branch of stability described by $\xi_{12}$ for $\varepsilon_1 < 0$ and (b) detail of this branch.

and the second one by
$$\xi_{12} = \frac{k_2 - k_1 - \sqrt{k_1^2 + k_2^2 + 4\varepsilon_1\left(m_1 + \dfrac{m_2}{2}\right)g}}{2\varepsilon_1}. \tag{3.447}$$
They exist only if the expression under the radical is positive; the two branches start from the same point, at which the expression under the radical vanishes. The first branch may lead, for values of $\varepsilon_1$ sufficiently close to zero, to negative roots $\xi_{11}$, a fact which is not in concordance with the hypothesis that all the springs are compressed. The branch contains simply stable positions of equilibrium and is presented in Figure 3.5. The second branch leads to solutions valid for any $\varepsilon_1 < 0$; moreover, these solutions define simply stable positions of equilibrium. For $\varepsilon_1 \to 0$, we obtain $\xi_{12} \to \infty$. This branch is presented in Figure 3.6.
Figure 3.7 (a) Branch of stability described by $\xi_1$ for $\varepsilon_1 > 0$ and (b) detail of this branch.

If $\varepsilon_1 > 0$, then we have to consider only one branch, described by
$$\xi_1 = \frac{k_2 - k_1 + \sqrt{k_1^2 + k_2^2 + 4\varepsilon_1\left(m_1 + \dfrac{m_2}{2}\right)g}}{2\varepsilon_1}. \tag{3.448}$$
This branch also leads to $\xi_1 \to \infty$ for $\varepsilon_1 \to 0$. It is presented in Figure 3.7. If $\varepsilon_1 = 0$, then we obtain the linear case, described by
$$\xi_1 = \frac{\left(m_1 + \dfrac{m_2}{2}\right)g}{k_1}, \tag{3.449}$$
which is a simply stable position of equilibrium.
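Evaluating the branch formulas (3.446)-(3.448) numerically is straightforward; a Python sketch with the problem's data (function name ours; `NaN` is returned where the radicand is negative, i.e., where no branch exists):

```python
import numpy as np

m1, m2, g, k1, k2 = 50.0, 750.0, 9.8065, 1.0e5, 20000.0

def xi_branches(eps1):
    """Equilibrium branches (3.446)-(3.448) for eps1 != 0:
    returns (xi with +sqrt, xi with -sqrt)."""
    rad = k1**2 + k2**2 + 4.0 * eps1 * (m1 + 0.5 * m2) * g
    if rad < 0.0:
        return float('nan'), float('nan')
    s = np.sqrt(rad)
    return (k2 - k1 + s) / (2.0 * eps1), (k2 - k1 - s) / (2.0 * eps1)

print(xi_branches(-6.0e5))   # the two branches xi_11, xi_12 for eps1 < 0
```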
Obviously, the stability diagram in the general case is much more complicated; to draw it, we must take into consideration all the possibilities of compression or expansion of the springs. Moreover, because the function that describes the elastic force in the nonlinear springs is not an odd function, the situations to be studied cannot be obtained one from the other by simple changes of sign. The diagrams presented here are only parts of the stability diagram of the mechanical system considered.
4 LINEAR ALGEBRA

4.1 CALCULATION OF DETERMINANTS

4.1.1 Use of Definition
Let $A$ be a square matrix of order $n$, $A \in \mathcal{M}_n(\mathbb{R})$, the elements of which are $a_{ij}$, $i, j = \overline{1,n}$; hence
$$A = [a_{ij}]_{i=\overline{1,n},\,j=\overline{1,n}}. \tag{4.1}$$
The determinant of the matrix $A$, denoted by $\det A$, is given by
$$\det A = \sum_{\sigma \in \Pi_n} \operatorname{sgn}\sigma \prod_{i=1}^{n} a_{i\sigma(i)}, \tag{4.2}$$
where $\sigma$ is a permutation of the set $\{1, 2, \ldots, n\}$, $\Pi_n$ is the set of all these permutations, while $\operatorname{sgn}\sigma$ is the signature of the permutation $\sigma$, having the value 1 if $\sigma$ is an even permutation and the value $-1$ if $\sigma$ is an odd permutation.

Observation 4.1 In the calculation of the determinant $\det A$ by formula (4.2), there appear $n!$ terms.

Observation 4.2 As $n!$ is a quickly increasing function of $n$, the number of terms that must be calculated becomes very large even for small values of $n$; for each generated permutation, one must also calculate its signature. It follows that the calculation time increases considerably even for small values of $n$. For instance, $7! = 5040$, so that a determinant of seventh order will generate 5040 permutations, and it is necessary to determine the signature of every one of them.

Observation 4.3 Formula (4.2) must be applied in full even in cases in which the value of the determinant could be obtained exactly by other methods.
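Formula (4.2) translates directly into code; the following Python sketch (function name ours) makes Observation 4.2 concrete, as the cost grows like $n!$:

```python
from itertools import permutations

def det_by_definition(A):
    """Determinant straight from formula (4.2): a sum over all n!
    permutations, with the signature obtained by counting inversions.
    For illustration only."""
    n = len(A)
    total = 0.0
    for sigma in permutations(range(n)):
        inv = sum(1 for i in range(n) for j in range(i + 1, n)
                  if sigma[i] > sigma[j])
        prod = 1.0
        for i in range(n):
            prod *= A[i][sigma[i]]
        total += (-1.0)**inv * prod
    return total

print(det_by_definition([[1.0, 2.0], [3.0, 4.0]]))   # -2.0
```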
4.1.2 Use of Equivalent Matrices
This method starts from the following properties of determinants:
• if two rows (columns) of a square matrix $A$ are interchanged, then a new matrix $A'$ is obtained for which $\det A' = -\det A$;
• if a row (column) of a square matrix $A$ is multiplied by $\lambda$, then a new matrix $A'$ is obtained for which $\det A' = \lambda\det A$;
• if a row (column) of a square matrix $A$ is multiplied by $\lambda$ and added to another row (column) of $A$, then a new matrix $A'$ is obtained for which $\det A' = \det A$.
The idea of the method consists in applying such transformations to the matrix $A$ in order to obtain a new matrix $A'$ of a particular form, for which $\det A'$ is easier to calculate (directly). We must take into account the factors $\lambda_1, \ldots, \lambda_m$ that may occur because of the transformations made, so that
$$\det A = \prod_{i=1}^{m}\lambda_i\,\det A'. \tag{4.3}$$

Observation 4.4 Let us consider a permutation $\sigma$ of the set $\{1, 2, \ldots, n\}$ and let us suppose that $\sigma$ is different from the identity. Let us write this permutation in the form
$$\sigma = \begin{pmatrix} 1 & 2 & \cdots & i & \cdots & n\\ \sigma(1) & \sigma(2) & \cdots & \sigma(i) & \cdots & \sigma(n) \end{pmatrix}. \tag{4.4}$$
Then there exists an index $i \in \{1, 2, \ldots, n\}$ such that $\sigma(i) = j < i$.

Demonstration. Let us suppose that this assertion is not true. Then, for any $i \in \{1, 2, \ldots, n\}$, we have $\sigma(i) = j \ge i$. First, we take $i = n$. We deduce $\sigma(n) \ge n$, hence $\sigma(n) = n$. Let us suppose now that $i = n - 1$. It follows that $\sigma(n-1) \ge n - 1$, from which $\sigma(n-1) = n - 1$ or $\sigma(n-1) = n$. But $\sigma(n) = n$, hence $\sigma(n-1) \ne n$; we obtain $\sigma(n-1) = n - 1$. Proceeding analogously for $i = n - 2$, $i = n - 3$, \ldots, $i = 1$, it follows that $\sigma(i) = i$ for any $i$, $1 \le i \le n$. But, by hypothesis, $\sigma$ is different from the identity permutation. We have thus reached a contradiction, so the supposition made is false and the observation is proved.

The previous observation shows that any term in formula (4.2), except the one given by the identity permutation, contains a factor $a_{ij}$ with $j < i$, that is, an element situated under the principal diagonal of the matrix $A$. It follows that for a matrix $A'$ of the upper triangular form
$$A' = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1,n-1} & a_{1n}\\
0 & a_{22} & \cdots & a_{2,n-1} & a_{2n}\\
\vdots & \vdots & \ddots & \vdots & \vdots\\
0 & 0 & \cdots & a_{n-1,n-1} & a_{n-1,n}\\
0 & 0 & \cdots & 0 & a_{nn}
\end{bmatrix}, \tag{4.5}$$
the determinant is easy to calculate; it is given by
$$\det A' = \prod_{i=1}^{n} a_{ii}. \tag{4.6}$$
By this method, we try to obtain a matrix $A'$ of the form (4.5) so as to have
$$\det A = \pm\det A', \tag{4.7}$$
where we take the sign $+$ in the case of an even number of row permutations and the sign $-$ in the case of an odd number of such permutations.
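A Python sketch of this triangularization (with the partial pivoting recommended later, in Observation 4.12; the function name is ours):

```python
def det_by_elimination(A):
    """Determinant via reduction to the upper triangular form (4.5);
    each row interchange flips the sign, cf. (4.7)."""
    U = [row[:] for row in A]
    n, sign, det = len(U), 1.0, 1.0
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(U[i][k]))   # pivot row
        if U[p][k] == 0.0:
            return 0.0                                     # Observation 4.5
        if p != k:
            U[k], U[p] = U[p], U[k]
            sign = -sign
        for i in range(k + 1, n):
            m = U[i][k] / U[k][k]
            for j in range(k, n):
                U[i][j] -= m * U[k][j]
        det *= U[k][k]
    return sign * det

print(det_by_elimination([[1.0, 2.0], [3.0, 4.0]]))   # -2.0
```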
Observation 4.5 Let us suppose that at a certain transformation step we obtain $a_{ii} = 0$ and that, for any $j$, $i < j \le n$, we have $a_{ji} = 0$. In this case, $\det A = 0$ and it is no longer necessary to reach the form (4.5).

Observation 4.6 The procedure presented above may be modified to obtain a matrix $A'$ of the form
$$A' = \begin{bmatrix}
0 & 0 & \cdots & 0 & a_{1n}\\
0 & 0 & \cdots & a_{2,n-1} & a_{2n}\\
\vdots & \vdots & & \vdots & \vdots\\
0 & a_{n-1,2} & \cdots & a_{n-1,n-1} & a_{n-1,n}\\
a_{n1} & a_{n2} & \cdots & a_{n,n-1} & a_{nn}
\end{bmatrix}, \tag{4.8}$$
for which
$$\det A' = (-1)^{n(n-1)/2}\prod_{i=1}^{n} a_{i,n+1-i}. \tag{4.9}$$

4.2 CALCULATION OF THE RANK
Let $A$ be a matrix with $m$ rows and $n$ columns and real elements, $A \in \mathcal{M}_{m,n}(\mathbb{R})$, and let us suppose that $m \le n$. By definition, the rank of the matrix $A$ is the order of its greatest nonzero minor; to obtain the rank directly from this definition, we would have to consider a great number of determinants.

Observation 4.7 We have
$$\operatorname{rank}A \le \min\{m, n\} \tag{4.10}$$
for a matrix $A \in \mathcal{M}_{m,n}(\mathbb{R})$.

To calculate this rank, we use the following properties:
• The rank of the matrix $A$ is equal to the rank of its transpose $A^T$.
• The rank of the matrix $A$ is not modified by multiplying a row (column) by a nonzero number.
• The rank of the matrix $A$ does not change by interchanging two of its rows (columns).
• The rank of the matrix $A$ is not modified by multiplying one of its rows (columns) by $\lambda$ and adding the result to another row (column) of $A$.
The idea of the method consists in obtaining a matrix $A'$ of the same rank as $A$, but of the particular form ($m \le n$)
$$A' = \begin{bmatrix}
a_{11} & 0 & \cdots & 0 & 0 & \cdots & 0 & 0\\
0 & a_{22} & \cdots & 0 & 0 & \cdots & 0 & 0\\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots & \vdots\\
0 & 0 & \cdots & a_{m-1,m-1} & 0 & \cdots & 0 & 0\\
0 & 0 & \cdots & 0 & a_{mm} & \cdots & 0 & 0
\end{bmatrix}, \tag{4.11}$$
where the greatest nonzero minor is obtained by selecting the rows and columns for which $a_{ii} \ne 0$, $1 \le i \le m$. Hence, the rank of the matrix $A$ is equal to the number of nonzero elements on the principal pseudodiagonal of the matrix $A'$ in formula (4.11).

Observation 4.8 We need to continue the calculation until we obtain the form (4.11). If we stop at a merely upper triangular matrix, then we may obtain an incorrect result.
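In floating-point practice the same idea reduces to counting nonzero pivots during row reduction; a Python sketch (function name and tolerance handling ours):

```python
def rank_by_elimination(A, tol=1e-12):
    """Rank of a real matrix via row reduction: the number of pivots
    exceeding tol in modulus, cf. Section 4.2."""
    M = [row[:] for row in A]
    m, n = len(M), len(M[0])
    rank, row = 0, 0
    for col in range(n):
        p = next((i for i in range(row, m) if abs(M[i][col]) > tol), None)
        if p is None:
            continue                     # no pivot in this column
        M[row], M[p] = M[p], M[row]
        for i in range(row + 1, m):
            f = M[i][col] / M[row][col]
            for j in range(col, n):
                M[i][j] -= f * M[row][j]
        rank += 1
        row += 1
        if row == m:
            break
    return rank

print(rank_by_elimination([[1.0, 2.0, 3.0],
                           [2.0, 4.0, 6.0],
                           [1.0, 1.0, 1.0]]))   # 2
```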
4.3 NORM OF A MATRIX
Definition 4.1 Let $A$ and $B$ be two matrices with $m$ rows and $n$ columns and real elements. We say that $A \le B$ if, for any $i$ and $j$, $1 \le i \le m$, $1 \le j \le n$, we have
$$a_{ij} \le b_{ij}. \tag{4.12}$$

Definition 4.2 Let $A \in \mathcal{M}_{m,n}(\mathbb{R})$. We define the modulus of the matrix $A = [a_{ij}]$ by the relation
$$|A| = [|a_{ij}|]_{i=\overline{1,m},\,j=\overline{1,n}}. \tag{4.13}$$

Proposition 4.1 The modulus has the following properties:
(i) If $A$ and $B$ are two matrices with $m$ rows and $n$ columns and real elements, then
$$|A + B| \le |A| + |B|. \tag{4.14}$$
(ii) If $A \in \mathcal{M}_{m,n}(\mathbb{R})$ and $B \in \mathcal{M}_{n,p}(\mathbb{R})$, then
$$|AB| \le |A| \cdot |B|. \tag{4.15}$$
(iii) If $A \in \mathcal{M}_{m,n}(\mathbb{R})$ and $\alpha \in \mathbb{R}$, then
$$|\alpha A| = |\alpha| \cdot |A|. \tag{4.16}$$

Demonstration. (i) Let $A = [a_{ij}]$, $B = [b_{ij}]$, and let us denote by $C = [c_{ij}]$ the sum matrix, $C = A + B$. An element of this matrix is given by
$$c_{ij} = a_{ij} + b_{ij}, \tag{4.17}$$
hence, by applying the modulus, we obtain
$$|c_{ij}| = |a_{ij} + b_{ij}| \le |a_{ij}| + |b_{ij}|. \tag{4.18}$$
Because $i$ and $j$ have been chosen arbitrarily, $1 \le i \le m$, $1 \le j \le n$, we have the result (4.14).
(ii) Let us denote by $C = [c_{ij}]_{i=\overline{1,m},\,j=\overline{1,p}}$ the product matrix $C = A \cdot B$. An element of this matrix is given by
$$c_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj}. \tag{4.19}$$
Analogously, we denote by $D = [d_{ij}]_{i=\overline{1,m},\,j=\overline{1,p}}$ the matrix $D = |A| \cdot |B|$, an element of which is given by
$$d_{ij} = \sum_{k=1}^{n} |a_{ik}||b_{kj}|. \tag{4.20}$$
Comparing relations (4.19) and (4.20) through the triangle inequality, and taking into account that $i$ and $j$ are arbitrary, $1 \le i \le m$, $1 \le j \le p$, we obtain relation (4.15).
(iii) Let $B = [b_{ij}]$ be the matrix $B = |\alpha A|$, an element of which is
$$b_{ij} = |\alpha a_{ij}| = |\alpha||a_{ij}|. \tag{4.21}$$
We obtain relation (4.16) immediately.

Corollary 4.1 Let $A \in \mathcal{M}_n(\mathbb{R})$ be a square matrix of order $n$ with real elements. Then, for arbitrary $p \in \mathbb{N}$, one has the relation
$$|A^p| \le |A|^p. \tag{4.22}$$
Demonstration. If $p = 0$, then relation (4.22) is obvious because
$$|A^0| = I_n \quad\text{and}\quad |A|^0 = I_n. \tag{4.23}$$
Let us suppose that the relation is true for $p$ and let us prove it for $p + 1$. We have
$$|A^{p+1}| = |A^p \cdot A|. \tag{4.24}$$
Applying property (ii) of Proposition 4.1, we get
$$|A^p \cdot A| \le |A^p| \cdot |A| \tag{4.25}$$
and relation (4.24) becomes
$$|A^{p+1}| = |A^p \cdot A| \le |A^p| \cdot |A| \le |A|^p \cdot |A| = |A|^{p+1}. \tag{4.26}$$
The principle of mathematical induction thus shows that relation (4.22) is true for any $p \in \mathbb{N}$, and the corollary is proved.

Definition 4.3 Let $A \in \mathcal{M}_{m,n}(\mathbb{R})$. A real number satisfying the following properties is called a norm of the matrix $A$ and is denoted by $\|A\|$:
(i) $\|A\| \ge 0$, and $\|A\| = 0$ if and only if $A = 0_{m,n}$;
(ii) $\|\alpha A\| = |\alpha|\,\|A\|$ for any $\alpha \in \mathbb{R}$;
(iii) $\|A + B\| \le \|A\| + \|B\|$ for any matrix $B \in \mathcal{M}_{m,n}(\mathbb{R})$;
(iv) $\|AB\| \le \|A\| \cdot \|B\|$ for any matrix $B \in \mathcal{M}_{n,p}(\mathbb{R})$.

Observation 4.9
(i) If we put $\alpha = -1$ in property (ii) of Definition 4.3, it follows that
$$\|-A\| = \|A\| \quad\text{for any } A \in \mathcal{M}_{m,n}(\mathbb{R}). \tag{4.27}$$
(ii) From
$$\|A\| = \|B + A - B\| \le \|B\| + \|A - B\| \tag{4.28}$$
it follows that
$$\|A\| - \|B\| \le \|A - B\| \tag{4.29}$$
and, taking into account equation (4.27), we get
$$\|B\| - \|A\| \le \|A - B\| \tag{4.30}$$
too. The last two relations lead to
$$\|A - B\| \ge \big|\|A\| - \|B\|\big|. \tag{4.31}$$

Definition 4.4 A norm of a matrix is called canonical if, in addition to the four properties of Definition 4.3, it also fulfills the following conditions:
(i) for any matrix $A \in \mathcal{M}_{m,n}(\mathbb{R})$, $A = [a_{ij}]$, we have $|a_{ij}| \le \|A\|$ for any $i = \overline{1,m}$, $j = \overline{1,n}$;
(ii) for any $A, B \in \mathcal{M}_{m,n}(\mathbb{R})$ with $|A| \le |B|$, we have $\|A\| \le \|B\|$.

Proposition 4.2 Let $A \in \mathcal{M}_{m,n}(\mathbb{R})$ be a matrix with $m$ rows and $n$ columns and real elements. We define
$$\|A\|_{\infty} = \max_{1 \le i \le m}\sum_{j=1}^{n}|a_{ij}|, \tag{4.32}$$
$$\|A\|_1 = \max_{1 \le j \le n}\sum_{i=1}^{m}|a_{ij}|, \tag{4.33}$$
$$\|A\|_k = \sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n}a_{ij}^2}. \tag{4.34}$$
Under these conditions, $\|\cdot\|_{\infty}$, $\|\cdot\|_1$, and $\|\cdot\|_k$ are canonical norms.
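Before the verification, a quick numerical illustration of the three norms; a Python sketch (the function name is ours):

```python
import numpy as np

def canonical_norms(A):
    """The three canonical norms of Proposition 4.2 for a real matrix."""
    A = np.asarray(A, dtype=float)
    n_inf = np.max(np.sum(np.abs(A), axis=1))   # (4.32): maximal row sum
    n_one = np.max(np.sum(np.abs(A), axis=0))   # (4.33): maximal column sum
    n_k   = np.sqrt(np.sum(A**2))               # (4.34): Frobenius-type norm
    return n_inf, n_one, n_k

print(canonical_norms([[1.0, -2.0], [3.0, 4.0]]))   # (7.0, 6.0, 5.4772...)
```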
Demonstration. One must verify six properties.
(i) It is obvious that $\|A\| \ge 0$, and that $\|A\| = 0$ if and only if $A = 0_{m,n}$, for any of the three norms.
(ii) The relation $\|\alpha A\| = |\alpha|\,\|A\|$ is immediate, because the modulus of a product is the product of the moduli.
(iii) Let $B \in \mathcal{M}_{m,n}(\mathbb{R})$ be arbitrary. We may write successively
$$\|A + B\|_{\infty} = \max_{1\le i\le m}\sum_{j=1}^{n}|a_{ij} + b_{ij}| \le \max_{1\le i\le m}\left(\sum_{j=1}^{n}|a_{ij}| + \sum_{j=1}^{n}|b_{ij}|\right) \le \max_{1\le i\le m}\sum_{j=1}^{n}|a_{ij}| + \max_{1\le i\le m}\sum_{j=1}^{n}|b_{ij}| = \|A\|_{\infty} + \|B\|_{\infty}, \tag{4.35}$$
$$\|A + B\|_1 = \max_{1\le j\le n}\sum_{i=1}^{m}|a_{ij} + b_{ij}| \le \max_{1\le j\le n}\left(\sum_{i=1}^{m}|a_{ij}| + \sum_{i=1}^{m}|b_{ij}|\right) \le \|A\|_1 + \|B\|_1, \tag{4.36}$$
$$\|A + B\|_k^2 = \sum_{i=1}^{m}\sum_{j=1}^{n}(a_{ij} + b_{ij})^2 \le \sum_{i=1}^{m}\sum_{j=1}^{n}|a_{ij}|^2 + \sum_{i=1}^{m}\sum_{j=1}^{n}|b_{ij}|^2 + 2\sum_{i=1}^{m}\sum_{j=1}^{n}|a_{ij}||b_{ij}|. \tag{4.37}$$
The Cauchy–Buniakowski–Schwarz inequality states that, for any real numbers $x_i$, $y_i$, $i = \overline{1,r}$, one has
$$\left(\sum_{i=1}^{r}x_iy_i\right)^2 \le \sum_{i=1}^{r}|x_i|^2\sum_{i=1}^{r}|y_i|^2, \tag{4.38}$$
hence
$$\sum_{i=1}^{m}\sum_{j=1}^{n}|a_{ij}||b_{ij}| \le \sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n}|a_{ij}|^2}\sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n}|b_{ij}|^2}. \tag{4.39}$$
From equations (4.37) and (4.39) we obtain
$$\|A + B\|_k^2 \le \left(\sqrt{\sum_{i,j}|a_{ij}|^2} + \sqrt{\sum_{i,j}|b_{ij}|^2}\right)^2 = \left(\|A\|_k + \|B\|_k\right)^2. \tag{4.40}$$
(iv) Let $B \in \mathcal{M}_{n,p}(\mathbb{R})$ be arbitrary. We have
$$\|AB\|_{\infty} = \max_{1\le i\le m}\sum_{j=1}^{p}\left|\sum_{l=1}^{n}a_{il}b_{lj}\right| \le \max_{1\le i\le m}\sum_{j=1}^{p}\sum_{l=1}^{n}|a_{il}||b_{lj}| = \max_{1\le i\le m}\sum_{l=1}^{n}|a_{il}|\sum_{j=1}^{p}|b_{lj}|. \tag{4.41}$$
But
$$\max_{1\le i\le m}\sum_{l=1}^{n}|a_{il}| = \|A\|_{\infty} \tag{4.42}$$
and
$$\sum_{j=1}^{p}|b_{lj}| \le \|B\|_{\infty}, \tag{4.43}$$
because the latter is a sum of moduli on the row $l$. It follows that
$$\|AB\|_{\infty} \le \|A\|_{\infty}\|B\|_{\infty}. \tag{4.44}$$
Then
$$\|AB\|_1 = \max_{1\le j\le p}\sum_{i=1}^{m}\left|\sum_{l=1}^{n}a_{il}b_{lj}\right| \le \max_{1\le j\le p}\sum_{l=1}^{n}|b_{lj}|\sum_{i=1}^{m}|a_{il}| \le \|A\|_1\|B\|_1. \tag{4.45}$$
We also have
$$\|AB\|_k^2 = \sum_{i=1}^{m}\sum_{j=1}^{p}\left(\sum_{l=1}^{n}a_{il}b_{lj}\right)^2 \le \sum_{i=1}^{m}\sum_{j=1}^{p}\left(\sum_{l=1}^{n}|a_{il}||b_{lj}|\right)^2. \tag{4.46}$$
From the Cauchy–Buniakowski–Schwarz inequality, we obtain
$$\left(\sum_{l=1}^{n}|a_{il}||b_{lj}|\right)^2 \le \sum_{l=1}^{n}|a_{il}|^2 \cdot \sum_{l=1}^{n}|b_{lj}|^2 \tag{4.47}$$
and relation (4.46) becomes
$$\|AB\|_k^2 \le \sum_{i=1}^{m}\sum_{l=1}^{n}|a_{il}|^2\sum_{j=1}^{p}\sum_{l=1}^{n}|b_{lj}|^2 = \|A\|_k^2\|B\|_k^2. \tag{4.48}$$
(v) Let $a_{lc}$ be an arbitrary element of the matrix $A$. If $i$ and $j$ are the row and the column, respectively, on which the sums of moduli are maximal, we may write
$$\|A\|_{\infty} = |a_{i1}| + |a_{i2}| + \cdots + |a_{in}| \ge |a_{lc}|, \tag{4.49}$$
$$\|A\|_1 = |a_{1j}| + |a_{2j}| + \cdots + |a_{mj}| \ge |a_{lc}|, \tag{4.50}$$
$$\|A\|_k = \sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n}|a_{ij}|^2} \ge \sqrt{|a_{lc}|^2} = |a_{lc}|. \tag{4.51}$$
(vi) This is an immediate consequence of (v) and of the definitions: from $|A| \le |B|$ it follows that $\|A\| \le \|B\|$, where $\|\cdot\|$ is any of the three norms.

Definition 4.5 Let $\{A_k\}_{k\in\mathbb{N}^*}$ be a sequence of matrices of the same order, $A_k \in \mathcal{M}_{m,n}(\mathbb{R})$ for any $k \in \mathbb{N}^*$. We call the limit of the sequence of matrices $\{A_k\}_{k\in\mathbb{N}^*}$ a matrix
$$A = \lim_{k\to\infty}A_k, \quad A \in \mathcal{M}_{m,n}(\mathbb{R}), \tag{4.52}$$
for which the elements of $A = [a_{ij}]$ are given by
$$a_{ij} = \lim_{k\to\infty}a_{ij}^{(k)}, \tag{4.53}$$
where $A_k = [a_{ij}^{(k)}]$.

Proposition 4.3
(i) The necessary and sufficient condition for the sequence $\{A_k\}_{k\in\mathbb{N}}$ to be convergent to the matrix $A$ is
$$\lim_{k\to\infty}\|A - A_k\| = 0, \tag{4.54}$$
where $\|\cdot\|$ is a certain canonical norm. Moreover,
$$\lim_{k\to\infty}\|A_k\| = \|A\|. \tag{4.55}$$
(ii) The Cauchy criterion for convergence: the necessary and sufficient condition for the sequence of matrices $\{A_k\}_{k\in\mathbb{N}}$ to be convergent is that, for any $\varepsilon > 0$, there exists $N(\varepsilon) \in \mathbb{N}$ such that, for any $k > N(\varepsilon)$ and any $p \in \mathbb{N}^*$, $\|A_{k+p} - A_k\| < \varepsilon$, where $\|\cdot\|$ stands for any canonical norm.

Demonstration. (i) Necessity. If $A_k = [a_{ij}^{(k)}] \to A = [a_{ij}]$, then for any $\varepsilon > 0$ there exists $N(\varepsilon) \in \mathbb{N}$ such that, for any $k > N(\varepsilon)$,
$$|a_{ij} - a_{ij}^{(k)}| < \varepsilon \tag{4.56}$$
or, equivalently,
$$|A - A_k| < \varepsilon 1_{m,n}, \tag{4.57}$$
where $1_{m,n}$ is the matrix with $m$ rows and $n$ columns and all elements equal to 1. Passing to the norm in relation (4.57) and taking into account that $\|\cdot\|$ is a canonical norm, we have
$$\|A - A_k\| < \varepsilon\|1_{m,n}\| \tag{4.58}$$
for any $k > N(\varepsilon)$. We get the relation
$$\lim_{k\to\infty}\|A - A_k\| = 0, \tag{4.59}$$
because $\|1_{m,n}\|$ is a finite real number.
Sufficiency. From $\|A - A_k\| \to 0$ for $k \to \infty$ it follows that, for any $\varepsilon > 0$, there exists $N(\varepsilon) \in \mathbb{N}$ such that, for any $k > N(\varepsilon)$,
$$\|A - A_k\| < \varepsilon. \tag{4.60}$$
Taking into account that $\|\cdot\|$ is a canonical norm, we get
$$|a_{ij} - a_{ij}^{(k)}| \le \|A - A_k\| < \varepsilon \tag{4.61}$$
for $k > N(\varepsilon)$. Therefore, it follows that
$$\lim_{k\to\infty}a_{ij}^{(k)} = a_{ij}, \tag{4.62}$$
hence
$$\lim_{k\to\infty}A_k = A. \tag{4.63}$$
On the other hand, one has
$$\big|\|A_k\| - \|A\|\big| \le \|A - A_k\| < \varepsilon, \tag{4.64}$$
and one obtains
$$\lim_{k\to\infty}\|A_k\| = \|A\|. \tag{4.65}$$
(ii) Necessity. If $\lim_{k\to\infty}A_k = A$, then from (i) it follows that $\lim_{k\to\infty}\|A - A_k\| = 0$. In that case,
$$\|A_{k+p} - A_k\| = \|A_{k+p} - A + A - A_k\| \le \|A_{k+p} - A\| + \|A - A_k\|. \tag{4.66}$$
Let $\varepsilon > 0$. Because $\lim_{k\to\infty}\|A - A_k\| = 0$, we may choose $N(\varepsilon) \in \mathbb{N}^*$ such that
$$\|A_{k+p} - A\| < \frac{\varepsilon}{2}, \qquad \|A - A_k\| < \frac{\varepsilon}{2} \tag{4.67}$$
for any $k > N(\varepsilon)$ and any $p \in \mathbb{N}$; then relation (4.66) leads to
$$\|A_{k+p} - A_k\| < \varepsilon. \tag{4.68}$$
Sufficiency. If $\|A_{k+p} - A_k\| < \varepsilon$, then, because $\|\cdot\|$ is a canonical norm, we have
$$|a_{ij}^{(k+p)} - a_{ij}^{(k)}| < \varepsilon \tag{4.69}$$
for any $i, j$, $1 \le i \le m$, $1 \le j \le n$. It follows that each $\{a_{ij}^{(k)}\}$ is a Cauchy sequence of real numbers, hence convergent; let us denote its limit by $a_{ij}$. Taking into account (i), the proposition is proved.

Definition 4.6 Let $\{A_k\}_{k\in\mathbb{N}^*}$ be a sequence of matrices with $A_k \in \mathcal{M}_{m,n}(\mathbb{R})$ for any $k \in \mathbb{N}^*$. We define
$$\sum_{k=1}^{\infty}A_k = \lim_{N\to\infty}\sum_{k=1}^{N}A_k. \tag{4.70}$$

Definition 4.7
(i) We say that the series $S_N = \sum_{k=1}^{N}A_k$, with $A_k \in \mathcal{M}_{m,n}(\mathbb{R})$, is convergent if the sequence $\{S_N\}_{N\in\mathbb{N}^*}$ is convergent.
(ii) The series is called absolutely convergent if the series $\overline{S}_N = \sum_{k=1}^{N}|A_k|$ is convergent.
Proposition 4.4
(i) If the series $S_N = \sum_{k=1}^{N}A_k$, $A_k \in \mathcal{M}_{m,n}(\mathbb{R})$, is convergent, then $\lim_{k\to\infty}A_k = 0$.
(ii) If a series of matrices is absolutely convergent, then it is convergent.
(iii) Let $\|\cdot\|$ be a canonical norm. If the numerical series $\sum_{k=1}^{\infty}\|A_k\|$ is convergent, then the series $\sum_{k=1}^{\infty}A_k$ is absolutely convergent.
(iv) Let $r$ be the convergence radius of the power series $\sum_{k=1}^{\infty}\|A_k\|x^k$, where $\|\cdot\|$ is a canonical norm. In this case, if $\|x\| < r$, then the series $\sum_{k=1}^{\infty}A_kx^k$ and $\sum_{k=1}^{\infty}x^kA_k$ are convergent, where $x$ is a matrix chosen so that the products can be calculated ($x \in \mathcal{M}_n(\mathbb{R})$ for $\sum A_kx^k$ and $x \in \mathcal{M}_m(\mathbb{R})$ for $\sum x^kA_k$, respectively).
(v) If $\|x\| < 1$, then the series $\sum_{k=0}^{\infty}Ax^k$ and $\sum_{k=0}^{\infty}x^kA$, where the matrices $A$ and $x$ are such that the calculations can be carried out, are convergent.
(vi) Let $x \in \mathcal{M}_n(\mathbb{R})$ and $A \in \mathcal{M}_{m,n}(\mathbb{R})$, with $\|x\| < 1$. Under these conditions,
$$\sum_{k=0}^{\infty}Ax^k = A(I_n - x)^{-1}. \tag{4.71}$$
If $x \in \mathcal{M}_m(\mathbb{R})$ and $\|x\| < 1$, then
$$\sum_{k=0}^{\infty}x^kA = (I_m - x)^{-1}A. \tag{4.72}$$

Demonstration.
(i) Let $S_N = \sum_{k=1}^{N}A_k$. As the series is convergent, there exists $S \in \mathcal{M}_{m,n}(\mathbb{R})$ such that $S = \lim_{N\to\infty}S_N$. On the other hand, $A_{N+1} = S_{N+1} - S_N$, and we may pass to the limit with respect to $N$. We have
$$\lim_{N\to\infty}A_{N+1} = \lim_{N\to\infty}S_{N+1} - \lim_{N\to\infty}S_N = S - S = 0. \tag{4.73}$$
(ii) Let the series $\sum_{k=1}^{\infty}A_k$ be absolutely convergent, that is, let $\sum_{k=1}^{\infty}|A_k|$ be convergent. But
$$\sum_{k=1}^{\infty}|A_k| = \left[\sum_{k=1}^{\infty}\big|a_{ij}^{(k)}\big|\right]_{i=\overline{1,m},\,j=\overline{1,n}}. \tag{4.74}$$
It follows that every numerical series $\sum_{k=1}^{\infty}|a_{ij}^{(k)}|$ is convergent, hence every series $\sum_{k=1}^{\infty}a_{ij}^{(k)}$ is absolutely convergent; therefore the series $\sum_{k=1}^{\infty}A_k = \left[\sum_{k=1}^{\infty}a_{ij}^{(k)}\right]$ is convergent.
(iii) Let $A_k = [a_{ij}^{(k)}]$. As $\|\cdot\|$ is a canonical norm, it follows that $|a_{ij}^{(k)}| \le \|A_k\|$ for any $i, j$, $1 \le i \le m$, $1 \le j \le n$. Hence every series $\sum_{k=1}^{\infty}a_{ij}^{(k)}$ is absolutely convergent, and $\sum_{k=1}^{\infty}A_k$ is absolutely convergent too.
(iv) We may write successively
$$\left\|\sum_{k=1}^{N+p}A_kx^k - \sum_{k=1}^{N}A_kx^k\right\| = \|A_{N+1}x^{N+1} + \cdots + A_{N+p}x^{N+p}\| \le \|A_{N+1}\|\|x\|^{N+1} + \cdots + \|A_{N+p}\|\|x\|^{N+p} \tag{4.75}$$
and, as the numerical series $\sum_{k=1}^{\infty}\|A_k\|\|x\|^k$ is convergent for $\|x\| < r$,
it follows that
$$\left\|\sum_{k=1}^{N+p}A_kx^k - \sum_{k=1}^{N}A_kx^k\right\| < \varepsilon, \quad N > N(\varepsilon), \quad p \in \mathbb{N}. \tag{4.76}$$
Cauchy's criterion then states that $\sum_{k=1}^{\infty}A_kx^k$ is convergent. Analogously, one shows that $\sum_{k=1}^{\infty}x^kA_k$ is convergent.
(v) The series $\sum_{k=0}^{\infty}Ax^k$ is convergent for $\|x\| < 1$ as a consequence of (iv) with $r = 1$, the geometric series with subunitary ratio being convergent. The same follows for the series $\sum_{k=0}^{\infty}x^kA$.
(vi) Starting from the relation
$$A(I_n + x + x^2 + \cdots + x^k)(I_n - x) = A(I_n - x^{k+1}), \tag{4.77}$$
passing to the limit for $k \to \infty$ and taking into account that $x^{k+1} \to 0$ because $\|x\| < 1$, we obtain
$$\left(\sum_{k=0}^{\infty}Ax^k\right)(I_n - x) = A. \tag{4.78}$$
Let us consider the particular case $A = I_n$. One has
$$\left(\sum_{k=0}^{\infty}x^k\right)(I_n - x) = I_n, \tag{4.79}$$
hence
$$\det\left(\sum_{k=0}^{\infty}x^k\right)\det(I_n - x) = \det I_n = 1, \tag{4.80}$$
from which
$$\det(I_n - x) \ne 0, \tag{4.81}$$
the matrix $I_n - x$ being invertible. Hence, equation (4.78) leads to
$$\sum_{k=0}^{\infty}Ax^k = A(I_n - x)^{-1}. \tag{4.82}$$
The second relation is proved analogously.

Corollary 4.2
(i) If $\|x\| < 1$, $x \in \mathcal{M}_n(\mathbb{R})$, then
$$\sum_{k=0}^{\infty}x^k = (I_n - x)^{-1}. \tag{4.83}$$
(ii) Under the same conditions as in (i), we have
$$\|(I_n - x)^{-1}\| \le \|I_n\| + \frac{\|x\|}{1 - \|x\|}. \tag{4.84}$$
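Before the demonstration, a numerical illustration of (4.83); a Python sketch (function name ours) comparing the partial sums of the geometric matrix series with the exact inverse:

```python
import numpy as np

def neumann_sum(x, terms=100):
    """Partial sum of sum_k x^k, which converges to (I - x)^(-1)
    when a canonical norm of x is subunitary, cf. (4.83)."""
    n = x.shape[0]
    S = np.eye(n)
    P = np.eye(n)
    for _ in range(terms):
        P = P @ x
        S = S + P
    return S

x = np.array([[0.2, 0.1], [0.0, 0.3]])
print(np.max(np.abs(neumann_sum(x) - np.linalg.inv(np.eye(2) - x))))  # ~ 0
```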
Demonstration.
(i) This is a consequence of point (vi) of the previous proposition for $A = I_n$.
(ii) We have
$$\|(I_n - x)^{-1}\| = \left\|\sum_{k=0}^{\infty}x^k\right\| \le \|I_n\| + \sum_{k=1}^{\infty}\|x\|^k = \|I_n\| + \frac{\|x\|}{1 - \|x\|}. \tag{4.85}$$

Observation 4.10 If $\|I_n\| = 1$, then relation (4.84) becomes
$$\|(I_n - x)^{-1}\| \le \frac{1}{1 - \|x\|}. \tag{4.86}$$

Proposition 4.5 (Evaluation of the Remainder). Let us denote by $R_k$ the remainder of the series $\sum_{k=0}^{\infty}Ax^k$, $\|x\| < 1$, that is,
$$R_k = \sum_{i=k+1}^{\infty}Ax^i. \tag{4.87}$$
Under these conditions,
$$\|R_k\| \le \|A\|\frac{\|x\|^{k+1}}{1 - \|x\|}. \tag{4.88}$$
Demonstration. We may write
$$\|R_k\| = \left\|\sum_{i=k+1}^{\infty}Ax^i\right\| \le \|A\|\sum_{i=k+1}^{\infty}\|x\|^i = \|A\|\frac{\|x\|^{k+1}}{1 - \|x\|}. \tag{4.89}$$

4.4 INVERSION OF MATRICES

4.4.1 Direct Inversion
Let the matrix
$$A = [a_{ij}]_{i,j=\overline{1,n}}, \quad A \in \mathcal{M}_n(\mathbb{R}), \tag{4.90}$$
for which
$$\det A \ne 0; \tag{4.91}$$
that is, the matrix $A$ is a nonsingular square matrix of order $n$ with real elements. Under these conditions, the inverse of the matrix $A$ is given by
$$A^{-1} = \frac{1}{\det A}\left[(-1)^{i+j}\Delta_{ji}\right]_{i,j=\overline{1,n}}, \tag{4.92}$$
where $\Delta_{lk}$ is the determinant of the matrix $A_{lk}$ obtained from the matrix $A$ by eliminating the row $l$ and the column $k$, hence a square matrix of order $n - 1$,
$$A_{lk} = \begin{bmatrix}
a_{11} & \cdots & a_{1,k-1} & a_{1,k+1} & \cdots & a_{1n}\\
\vdots & & \vdots & \vdots & & \vdots\\
a_{l-1,1} & \cdots & a_{l-1,k-1} & a_{l-1,k+1} & \cdots & a_{l-1,n}\\
a_{l+1,1} & \cdots & a_{l+1,k-1} & a_{l+1,k+1} & \cdots & a_{l+1,n}\\
\vdots & & \vdots & \vdots & & \vdots\\
a_{n1} & \cdots & a_{n,k-1} & a_{n,k+1} & \cdots & a_{nn}
\end{bmatrix}. \tag{4.93}$$
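A Python sketch of formula (4.92) (function name ours; practical only for small orders, since it evaluates $n^2$ determinants of order $n - 1$):

```python
import numpy as np

def inverse_by_cofactors(A):
    """Direct inversion by (4.92): entry (i, j) of the inverse is
    (-1)^(i+j) * Delta_ji / det A, where Delta_ji deletes row j, column i."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    d = np.linalg.det(A)
    inv = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, j, axis=0), i, axis=1)
            inv[i, j] = (-1)**(i + j) * np.linalg.det(minor)
    return inv / d

A = np.array([[2.0, 1.0], [5.0, 3.0]])
print(inverse_by_cofactors(A))   # [[3, -1], [-5, 2]]
```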
4.4.2 The Gauss–Jordan Method
This method¹ is based on the following result.

Lemma 4.1 (Substitution Lemma). Let $V$ be a finite-dimensional vector space of dimension $n$ and let $B = \{v_1, v_2, \ldots, v_n\}$ be a basis of this vector space. Let $x$, $x \ne 0$, be a vector of $V$. Then $x$ may replace a vector $v_i$ of the basis $B$ if, on expressing $x$ as a linear combination of the basis vectors, the scalar $\alpha_i$ which multiplies $v_i$ is nonzero. Moreover, in this case, the set $B' = \{v_1, \ldots, v_{i-1}, x, v_{i+1}, \ldots, v_n\}$ is a basis of the vector space $V$.

Demonstration. There exist scalars $\alpha_1, \alpha_2, \ldots, \alpha_n$, not all equal to zero, such that
$$x = \sum_{i=1}^{n}\alpha_iv_i, \tag{4.94}$$
because $B$ is a basis. Let us suppose that $\alpha_1 \ne 0$. It follows that
$$v_1 = \frac{1}{\alpha_1}\left(x - \sum_{i=2}^{n}\alpha_iv_i\right). \tag{4.95}$$
Let us show that the vectors $x, v_2, \ldots, v_n$ are linearly independent. We suppose first that this is not so; in this case, there exist scalars $\beta_1, \ldots, \beta_n$, not all zero, such that
$$\beta_1x + \beta_2v_2 + \cdots + \beta_nv_n = 0. \tag{4.96}$$
Taking into account equation (4.94), we have
$$\beta_1(\alpha_1v_1 + \alpha_2v_2 + \cdots + \alpha_nv_n) + \beta_2v_2 + \cdots + \beta_nv_n = 0, \tag{4.97}$$
from which
$$\beta_1\alpha_1v_1 + (\beta_1\alpha_2 + \beta_2)v_2 + \cdots + (\beta_1\alpha_n + \beta_n)v_n = 0. \tag{4.98}$$
As $B$ is a basis, one obtains the system
$$\beta_1\alpha_1 = 0, \quad \beta_1\alpha_2 + \beta_2 = 0, \quad \ldots, \quad \beta_1\alpha_n + \beta_n = 0, \tag{4.99}$$
with the solution $\beta_i = 0$, $i = \overline{1,n}$ (because $\alpha_1 \ne 0$), contradicting the choice of the $\beta_i$. The vectors $x, v_2, \ldots, v_n$ are thus linearly independent. Let us now show that they constitute a system of generators. Let $y$ be a vector of $V$. As $B$ is a basis, there exist scalars $\gamma_i$, $i = \overline{1,n}$, such that
$$y = \gamma_1v_1 + \gamma_2v_2 + \cdots + \gamma_nv_n. \tag{4.100}$$
Taking into account equation (4.95), we get
$$y = \frac{\gamma_1}{\alpha_1}\left[x - (\alpha_2v_2 + \alpha_3v_3 + \cdots + \alpha_nv_n)\right] + \gamma_2v_2 + \cdots + \gamma_nv_n, \tag{4.101}$$
from which we have
$$y = \frac{\gamma_1}{\alpha_1}x + \left(\gamma_2 - \frac{\gamma_1\alpha_2}{\alpha_1}\right)v_2 + \left(\gamma_3 - \frac{\gamma_1\alpha_3}{\alpha_1}\right)v_3 + \cdots + \left(\gamma_n - \frac{\gamma_1\alpha_n}{\alpha_1}\right)v_n. \tag{4.102}$$
As $y$ is arbitrary, the vectors $x, v_2, \ldots, v_n$ form a system of generators; hence $B'$ is a new basis for $V$, and the lemma is proved.

¹The method is named after Carl Friedrich Gauss (1777–1855), who discovered it in 1810, and Wilhelm Jordan (1842–1899), who described it in 1887. The method was known to the Chinese (tenth–second century BC). It also appears in 1670 in a work attributed to Isaac Newton.
If the matrix $A$ is nonsingular, then its columns are linearly independent. Let us write $A$ in the form
$$A = [c_1\ c_2\ \cdots\ c_n], \tag{4.103}$$
where $c_i$ is the column $i$,
$$c_i = [a_{1i}\ a_{2i}\ \cdots\ a_{ni}]^T. \tag{4.104}$$
Let us consider now an arbitrary column vector of dimension $n$,
$$b = [b_1\ b_2\ \cdots\ b_n]^T. \tag{4.105}$$
One can associate to every column $c_i$, $i = \overline{1,n}$, a vector $v_i$ of $\mathbb{R}^n$. The vectors $v_i$, $i = \overline{1,n}$, are linearly independent too, because the $c_i$, $i = \overline{1,n}$, are linearly independent. One has $\dim\mathbb{R}^n = n$, so that the $v_i$ (respectively, the $c_i$) form a basis. Hence, the vector $b$ given by equation (4.105) will be generated by the columns of the matrix $A$:
$$b = \alpha_1c_1 + \alpha_2c_2 + \cdots + \alpha_nc_n. \tag{4.106}$$
In particular, $b$ may be a column of the unit matrix. Let us construct the table
$$\left[\begin{array}{cccc|cccc}
a_{11} & a_{12} & \cdots & a_{1n} & 1 & 0 & \cdots & 0\\
a_{21} & a_{22} & \cdots & a_{2n} & 0 & 1 & \cdots & 0\\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots\\
a_{n1} & a_{n2} & \cdots & a_{nn} & 0 & 0 & \cdots & 1
\end{array}\right]. \tag{4.107}$$
On the left side of the table are the columns of the matrix $A$, while on the right side is the unit matrix; thus, a row of the table has $2n$ elements. We multiply the rows of the table by conveniently chosen numbers, interchange them with one another, or add one to another, so as to obtain the unit matrix on the left side; we then obtain the inverse matrix $A^{-1}$ on the right. The procedure is an application of the substitution lemma because, obviously, the columns of the matrix $A$ (supposed nonsingular), as well as the columns of the unit matrix, are bases in the space $\mathbb{R}^n$.

Observation 4.11 If, at a given point, on the left side of the table, on trying to obtain the column $i$, we have $a_{ii} = 0$, $a_{i+1,i} = 0$, \ldots, $a_{ni} = 0$, then $\det A = 0$ and we cannot obtain the inverse matrix.

Observation 4.12 Usually, one brings to the position of $a_{ii}$ the element of greatest modulus among $|a_{ii}|, |a_{i+1,i}|, \ldots, |a_{ni}|$, so as to reduce the errors of calculation.
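A Python sketch of the tableau procedure (function name ours), with the partial pivoting of Observation 4.12:

```python
import numpy as np

def gauss_jordan_inverse(A):
    """Gauss-Jordan inversion on the tableau [A | I] of (4.107)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    T = np.hstack([A, np.eye(n)])
    for k in range(n):
        p = k + np.argmax(np.abs(T[k:, k]))   # pivot of greatest modulus
        if T[p, k] == 0.0:
            raise ValueError("matrix is singular (Observation 4.11)")
        T[[k, p]] = T[[p, k]]
        T[k] /= T[k, k]                        # normalize the pivot row
        for i in range(n):
            if i != k:
                T[i] -= T[i, k] * T[k]         # annihilate column k elsewhere
    return T[:, n:]

A = np.array([[2.0, 1.0], [5.0, 3.0]])
print(gauss_jordan_inverse(A))                 # [[3, -1], [-5, 2]]
```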
4.4.3 The Determination of the Inverse Matrix by its Partition
Let us consider the nonsingular square matrix of $n$th order $A \in \mathcal{M}_n(\mathbb{R})$, with real elements, and let us partition it as
$$A = \begin{bmatrix} A_1 & A_3\\ A_2 & A_4 \end{bmatrix}, \tag{4.108}$$
where $A_1 \in \mathcal{M}_p(\mathbb{R})$ is a square matrix of order $p$, $p < n$, $A_2 \in \mathcal{M}_{n-p,p}(\mathbb{R})$ is a matrix with $n - p$ rows and $p$ columns, $A_3 \in \mathcal{M}_{p,n-p}(\mathbb{R})$ is a matrix with $p$ rows and $n - p$ columns, while $A_4 \in \mathcal{M}_{n-p}(\mathbb{R})$ is a square matrix of order $n - p$. Let us denote by
$$B = \begin{bmatrix} B_1 & B_3\\ B_2 & B_4 \end{bmatrix} \tag{4.109}$$
the inverse of the matrix $A$, partitioned in the same form (4.108), where the dimensions of the matrices $B_i$ are the same as those of the matrices $A_i$, $i = \overline{1,4}$. As the matrix $B$ is the inverse of the matrix $A$, one has
$$A \cdot B = I_n, \tag{4.110}$$
where $I_n$ is the unit matrix of order $n$. Taking into account relations (4.108) and (4.109), relation (4.110) leads to
$$\begin{bmatrix} A_1 & A_3\\ A_2 & A_4 \end{bmatrix}\begin{bmatrix} B_1 & B_3\\ B_2 & B_4 \end{bmatrix} = \begin{bmatrix} I_p & 0_{p,n-p}\\ 0_{n-p,p} & I_{n-p} \end{bmatrix}, \tag{4.111}$$
from which
$$A_1B_1 + A_3B_2 = I_p, \quad A_1B_3 + A_3B_4 = 0_{p,n-p}, \quad A_2B_1 + A_4B_2 = 0_{n-p,p}, \quad A_2B_3 + A_4B_4 = I_{n-p}. \tag{4.112}$$
The second relation (4.112) leads to
$$B_3 = -A_1^{-1}A_3B_4, \tag{4.113}$$
which, replaced in the last relation (4.112), leads to
$$-A_2A_1^{-1}A_3B_4 + A_4B_4 = I_{n-p}, \tag{4.114}$$
hence
$$B_4 = \left(A_4 - A_2A_1^{-1}A_3\right)^{-1}. \tag{4.115}$$
On the other hand, one may also write the relation
$$B \cdot A = I_n, \tag{4.116}$$
from which follows the system
$$B_1A_1 + B_3A_2 = I_p, \quad B_1A_3 + B_3A_4 = 0_{p,n-p}, \quad B_2A_1 + B_4A_2 = 0_{n-p,p}, \quad B_2A_3 + B_4A_4 = I_{n-p}. \tag{4.117}$$
The third relation (4.117) leads to
$$B_2 = -B_4A_2A_1^{-1}, \tag{4.118}$$
while from the first relation (4.112) one obtains
$$B_1 = A_1^{-1} - A_1^{-1}A_3B_2. \tag{4.119}$$
Finally, formulas (4.113), (4.115), (4.118), and (4.119) lead to
$$B_4 = \left(A_4 - A_2A_1^{-1}A_3\right)^{-1}, \qquad B_3 = -A_1^{-1}A_3\left(A_4 - A_2A_1^{-1}A_3\right)^{-1},$$
$$B_2 = -\left(A_4 - A_2A_1^{-1}A_3\right)^{-1}A_2A_1^{-1}, \qquad B_1 = A_1^{-1} + A_1^{-1}A_3\left(A_4 - A_2A_1^{-1}A_3\right)^{-1}A_2A_1^{-1}. \tag{4.120}$$
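A Python sketch of formulas (4.120) (function name ours; note that, in the partition (4.108), $A_3$ is the top-right block and $A_2$ the bottom-left block):

```python
import numpy as np

def inverse_by_partition(A, p):
    """Block inversion via (4.120); requires A1 (p x p) and the
    complement A4 - A2 A1^(-1) A3 to be invertible."""
    A1, A3 = A[:p, :p], A[:p, p:]     # A3: top-right in (4.108)
    A2, A4 = A[p:, :p], A[p:, p:]     # A2: bottom-left
    A1inv = np.linalg.inv(A1)
    B4 = np.linalg.inv(A4 - A2 @ A1inv @ A3)
    B3 = -A1inv @ A3 @ B4
    B2 = -B4 @ A2 @ A1inv
    B1 = A1inv + A1inv @ A3 @ B4 @ A2 @ A1inv
    return np.block([[B1, B3], [B2, B4]])

A = np.random.default_rng(0).normal(size=(5, 5)) + 5.0 * np.eye(5)
print(np.max(np.abs(inverse_by_partition(A, 2) @ A - np.eye(5))))   # ~ 0
```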
4.4.4 Schur's Method of Inversion of Matrices
Let $A$ be a square matrix of $n$th order, which we partition in the form²
$$A = \begin{bmatrix} A_1 & A_2\\ A_3 & A_4 \end{bmatrix}, \tag{4.121}$$
where $A_1$ is a square matrix of order $p$, $p < n$, $A_2$ is a matrix with $p$ rows and $n - p$ columns, $A_3$ is a matrix with $n - p$ rows and $p$ columns, while $A_4$ is a square matrix of order $n - p$. Let us also suppose that $A_4$ and $A_1 - A_2A_4^{-1}A_3$ are invertible matrices.

Definition 4.8 The matrix $A_1 - A_2A_4^{-1}A_3$ is called the Schur complement of the matrix $A$.

Proposition 4.6 Under the above conditions, the decomposition
$$\begin{bmatrix} A_1 & A_2\\ A_3 & A_4 \end{bmatrix} = \begin{bmatrix} I_p & A_2A_4^{-1}\\ 0_{n-p,p} & I_{n-p} \end{bmatrix}\begin{bmatrix} A_1 - A_2A_4^{-1}A_3 & 0_{p,n-p}\\ 0_{n-p,p} & A_4 \end{bmatrix}\begin{bmatrix} I_p & 0_{p,n-p}\\ A_4^{-1}A_3 & I_{n-p} \end{bmatrix} \tag{4.122}$$
takes place, where $I_p$ and $I_{n-p}$ are the unit matrices of orders $p$ and $n - p$, respectively, while $0_{p,n-p}$ and $0_{n-p,p}$ denote zero matrices with $p$ rows and $n - p$ columns and with $n - p$ rows and $p$ columns, respectively.

Demonstration. The result is evident, being an elementary multiplication of matrices.

Corollary 4.3 Under the same conditions, the inverse of the matrix $A$ is
$$A^{-1} = \begin{bmatrix} I_p & 0_{p,n-p}\\ -A_4^{-1}A_3 & I_{n-p} \end{bmatrix}\begin{bmatrix} \left(A_1 - A_2A_4^{-1}A_3\right)^{-1} & 0_{p,n-p}\\ 0_{n-p,p} & A_4^{-1} \end{bmatrix}\begin{bmatrix} I_p & -A_2A_4^{-1}\\ 0_{n-p,p} & I_{n-p} \end{bmatrix}. \tag{4.123}$$
Demonstration. The result is obvious.

Observation 4.13
(i) We may instead consider that the matrix $A_1$ is invertible, in which case the Schur complement of the matrix $A$ is given by $A_4 - A_3A_1^{-1}A_2$.
(ii) If $A_1$ and $A_4 - A_3A_1^{-1}A_2$ are invertible, then we may write
$$A = \begin{bmatrix} I_p & 0_{p,n-p}\\ A_3A_1^{-1} & I_{n-p} \end{bmatrix}\begin{bmatrix} A_1 & 0_{p,n-p}\\ 0_{n-p,p} & A_4 - A_3A_1^{-1}A_2 \end{bmatrix}\begin{bmatrix} I_p & A_1^{-1}A_2\\ 0_{n-p,p} & I_{n-p} \end{bmatrix} \tag{4.124}$$
and
$$A^{-1} = \begin{bmatrix} I_p & -A_1^{-1}A_2\\ 0_{n-p,p} & I_{n-p} \end{bmatrix}\begin{bmatrix} A_1^{-1} & 0_{p,n-p}\\ 0_{n-p,p} & \left(A_4 - A_3A_1^{-1}A_2\right)^{-1} \end{bmatrix}\begin{bmatrix} I_p & 0_{p,n-p}\\ -A_3A_1^{-1} & I_{n-p} \end{bmatrix}, \tag{4.125}$$
respectively.

²The method is due to Issai Schur (1875–1941).
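A Python sketch of the factored inverse (4.123) (function name ours; note that this section's partition (4.121) places $A_2$ top-right and $A_3$ bottom-left, the opposite of Section 4.4.3):

```python
import numpy as np

def schur_inverse(A, p):
    """Inverse via the Schur-complement factorization (4.123);
    assumes A4 and A1 - A2 A4^(-1) A3 are invertible."""
    n = A.shape[0]
    A1, A2 = A[:p, :p], A[:p, p:]
    A3, A4 = A[p:, :p], A[p:, p:]
    A4inv = np.linalg.inv(A4)
    Sinv = np.linalg.inv(A1 - A2 @ A4inv @ A3)    # inverse Schur complement
    L = np.block([[np.eye(p), np.zeros((p, n - p))],
                  [-A4inv @ A3, np.eye(n - p)]])
    M = np.block([[Sinv, np.zeros((p, n - p))],
                  [np.zeros((n - p, p)), A4inv]])
    R = np.block([[np.eye(p), -A2 @ A4inv],
                  [np.zeros((n - p, p)), np.eye(n - p)]])
    return L @ M @ R
```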
Observation 4.14
(i) If $A$, $A_2$, $A_3$, and $A_4$ are invertible, while $A_1 = 0_p$, then the Schur complement is given by $-A_2A_4^{-1}A_3$ and the inverse of the matrix $A$ is given, corresponding to formula (4.123), by
$$A^{-1} = \begin{bmatrix} -A_3^{-1}A_4A_2^{-1} & A_3^{-1}\\ A_2^{-1} & 0_{n-p} \end{bmatrix}. \tag{4.126}$$
(ii) If $A$, $A_1$, $A_2$, and $A_3$ are invertible, while $A_4 = 0_{n-p}$, then the Schur complement becomes $-A_3A_1^{-1}A_2$ and relation (4.125) leads to
$$A^{-1} = \begin{bmatrix} 0_p & A_3^{-1}\\ A_2^{-1} & -A_2^{-1}A_1A_3^{-1} \end{bmatrix}. \tag{4.127}$$
In multibody dynamics, matrices of the form
$$A = \begin{bmatrix} A_1 & -A_3^T\\ A_3 & 0 \end{bmatrix}, \tag{4.128}$$
for which the decomposition corresponding to relation (4.124) is
$$A = \begin{bmatrix} I_p & 0_{p,n-p}\\ A_3A_1^{-1} & I_{n-p} \end{bmatrix}\begin{bmatrix} A_1 & 0_{p,n-p}\\ 0_{n-p,p} & A_3A_1^{-1}A_3^T \end{bmatrix}\begin{bmatrix} I_p & -A_1^{-1}A_3^T\\ 0_{n-p,p} & I_{n-p} \end{bmatrix}, \tag{4.129}$$
are of interest. Using relation (4.125), the inverse of this matrix is of the form
$$A^{-1} = \begin{bmatrix} I_p & A_1^{-1}A_3^T\\ 0_{n-p,p} & I_{n-p} \end{bmatrix}\begin{bmatrix} A_1^{-1} & 0_{p,n-p}\\ 0_{n-p,p} & \left(A_3A_1^{-1}A_3^T\right)^{-1} \end{bmatrix}\begin{bmatrix} I_p & 0_{p,n-p}\\ -A_3A_1^{-1} & I_{n-p} \end{bmatrix}. \tag{4.130}$$

4.4.5 The Iterative Method (Schulz)
Let $A \in \mathcal{M}_n(\mathbb{R})$ be nonsingular and let $B_0$ be an approximate value of $A^{-1}$. Let us consider the matrix³
$$C_0 = I_n - AB_0, \tag{4.131}$$
where $I_n$ is the unit matrix of order $n$. If $C_0 = 0$, then $B_0 = A^{-1}$ and the procedure stops. Let us suppose that $C_0 \ne 0$. We construct the sequence
$$B_k = B_{k-1} + B_{k-1}C_{k-1}, \quad k \ge 1, \tag{4.132}$$
where
$$C_{k-1} = I_n - AB_{k-1}. \tag{4.133}$$

Proposition 4.7 For the sequence $\{C_k\}_{k\ge1}$ defined by relations (4.132) and (4.133), the relation
$$C_k = C_0^{2^k}, \quad k \ge 1, \tag{4.134}$$
takes place.

³The method was published by G. Schulz in 1933.
  • 138. INVERSION OF MATRICES 129 Demonstration. We have C1 = In − A · B1 = In − A(B0 + B0C0) = In − AB0(In + C0) (4.135) for k = 1. On the other hand, AB0 = In − C0, (4.136) hence C1 = In − (In − C0)(In + C0) = C2 0. (4.137) Let us now suppose that Ck = C2k 0 and let us show that Ck+1 = C2k+1 0 . We may write Ck+1 = In − ABk+1 = In − A(Bk + BkCk) = In − ABk(In + Ck). (4.138) Then ABk = In − Ck, (4.139) corresponding to relation (4.133); relation (4.138) leads to Ck+1 = In − (In − Ck)(In + Ck) = C2 k = C2k+1 0 . (4.140) Taking into account the principle of mathematical induction, relation (4.134) is true for any k ≥ 1. Proposition 4.8 If there exists q ∈ R, 0 < q < 1, so that C0 ≤ q, then lim k→∞ Bk = A−1 . (4.141) Demonstration. We may write successively Ck = C2k 0 ≤ C0 2k ≤ q2k , (4.142) hence lim k→∞ Ck = 0. (4.143) On the other hand, Ck = In − ABk (4.144) and relation (4.143) leads to lim k→∞ Ck = lim k→∞ (In − ABk) = 0. (4.145) The last relation implies In = lim k→∞ ABk, (4.146) hence lim k→∞ Bk = A−1 . (4.147) Proposition 4.9 Taking into account the previous notations, the following relation exists: A−1 − Bk ≤ B0 In + q 1 − q q2k . (4.148)
  • 139. 130 LINEAR ALGEBRA Demonstration. The relation A−1 − Bk may be written in the form A−1 − Bk = A−1 − A−1 (ABk) = A−1 (In − ABk) ≤ A−1 In − ABk = A−1 Ck = A−1 C2k 0 ≤ A−1 C0 2k ≤ A−1 q2k . (4.149) Then A−1 = B0(In − C0)−1 , (4.150) hence A−1 ≤ B0 (In − C0)−1 = B0 In + C0 + C2 0 + · · · ≤ B0 ( In + C0 + C2 0 + · · ·) ≤ B0 ( In + q + q2 + q22 + q23 + · · ·) ≤ B0 ( In + q + q2 + q3 + · · ·) ≤ B0 In + q 1 1 − q . (4.151) It follows that A−1 − Bk ≤ B0 In + q 1 + q q2k . (4.152) Observation 4.15 If In = 1, then relation (4.148) becomes A−1 − Bk ≤ B0 1 − q q2k . (4.153) Observation 4.16 If we wish to obtain the matrix A−1 with a precision ε, then we stop at the point that B0 1 − q q2k < ε (4.154) if In = 1, or when B0 In + q 1 − q q2k < ε. (4.155) Each of the relations (4.154) and (4.155) indicates the number of necessary iteration steps. We have thus to deal with an a priori estimation of the error. Observation 4.17 One uses sometimes a stopping condition of the form Bk − Bk−1 < ε, (4.156) which results because the sequence Bk is convergent.
  • 140. INVERSION OF MATRICES 131 4.4.6 Inversion by Means of the Characteristic Polynomial Let A ∈ Mn(R) a square matrix with det A = 0 and the secular equation det[A − λIn] = 0, (4.157) which may be transcribed in the form a11 − λ a12 a13 . . . a1,n−1 a1n a21 a22 − λ a23 . . . a2,n−1 a2n . . . . . . . . . . . . . . . . . . an−1,1 an−1,2 an−1,3 . . . an−1,n−1 − λ an−1,n an1 an2 an3 . . . an,n−1 ann − λ = 0 (4.158) or, equivalently, (−1)n λn + σ1λn−1 + σ2λn−2 + · · · + σn = 0. (4.159) Replacing λ by A in the characteristic equation (4.157), one gets det[A − A] = 0 (4.160) which, obviously, is true. Hence, (−1)n An + σ1An−1 + σ2An−2 + · · · + σnIn = 0. (4.161) On multiplying relation (4.161) on the right by A−1 , we get (−1)n An−1 + σ1An−2 + · · · + σn−1In = −σnA−1 , (4.162) obtaining A−1 = − 1 σn [(−1)n An−1 + σ1An−2 + · · · + σn−1In]. (4.163) 4.4.7 The Frame–Fadeev Method The Frame–Fadeev method4 is a different reading from the previous one, where the coefficients σi, i = 1, n, of various powers of λ in the secular equation are obtained as traces of certain matrices. We multiply the characteristic equation (−1)n λn + σ1λn−1 + σ2λn−2 + · · · + σn−1λ + σn = 0 (4.164) by (−1)n to bring it in the form λn + σ∗ 1λn−1 + σ∗ 2λn−2 + · · · + σ∗ n−1λ + σ∗ n = 0. (4.165) Following sequences A1 = A, σ∗ 1 = −TrA1, B1 = A1 + σ∗ 1In, (4.166) A2 = AB1, σ∗ 2 = − 1 2 TrA2, B2 = A2 + σ∗ 2In (4.167) 4This method was published by J. S. Frame in 1949 and then in 1964, and by D. K. Fadeev (Faddeev) in 1952 in Russian and then in 1963 in English.
  • 141. 132 LINEAR ALGEBRA and, in general, Ak = ABk−1, σ∗ k = − 1 k TrAk, Bk = A + σ∗ kIn, (4.168) until An = ABn−1, σ∗ n = − 1 n TrAn, Bn = An + σ∗ nIn (4.169) are obtained. The last relation (4.169) is just the Hamilton–Cayley equation, hence Bn = An + σ∗ nIn = 0, (4.170) from which An = −σ∗ nIn. (4.171) The first formula (4.169) leads now to ABn−1 = −σ∗ nIn, (4.172) hence A−1 = − 1 σ∗ n Bn−1. (4.173) 4.5 SOLUTION OF LINEAR ALGEBRAIC SYSTEMS OF EQUATIONS 4.5.1 Cramer’s Rule Let us consider the linear system of n equations with n unknowns5    a11x1 + a12x2 + · · · + a1nxn = b1, a21x1 + a22x2 + · · · + a2nxn = b2, ... an1x1 + an2x2 + · · · + annxn = bn, (4.174) which may be written in the form Ax = b, (4.175) where A = [aij ]i,j=1,n, x = x1 x2 · · · xn T , b = b1 b2 · · · bn T , A ∈ Mn(R), x ∈ Mn,1(R), b ∈ Mn,1(R). We suppose that the system is determined compatible, that is det A = 0. In this case, the solution of the system is given by xi = i , i = 1, n, (4.176) where = a11 a12 · · · a1n a21 a22 · · · a2n · · · · · · · · · · · · an1 an2 · · · ann , (4.177) 5The method was named after Gabriel Cramer (1704–1752) who published it in 1750.
  • 142. SOLUTION OF LINEAR ALGEBRAIC SYSTEMS OF EQUATIONS 133 i = a11 a12 · · · a1,i−1 b1 a1,i+1 · · · a1n a21 a22 · · · a2,i−1 b2 a2,i+1 · · · a2n · · · · · · · · · · · · · · · · · · · · · · · · an−1,1 an−1,2 · · · an−1,i−1 bn−1 an−1,i+1 · · · an−1,n an1 an2 · · · an,i+1 bn an,i+1 · · · ann . (4.178) Formulae (4.176) form the so-called Cramer’s rule. The obvious disadvantage of this method consists in the fact that one must calculate n + 1 determinants of n + 1 distinct matrices of nth order. If det A = 0, then the system may be undetermined compatible or incompatible. Obviously, the first step consists in the calculation of = det A. If = 0, then one may apply formula (4.176); in the contrary case, the algorithm stops. 4.5.2 Gauss’s Method The idea of Gauss’s method consists in bringing the system of n linear algebraic equations with n unknowns    a11x1 + a12x2 + · · · + a1nxn = b1, a21x1 + a22x2 + · · · + a2nxn = b2, ... an1x1 + an2x2 + · · · + annxn = bn (4.179) to a canonical form in which the unknowns may easily be obtained. Such a form is the triangular one in which the system (4.179) is written in the form    a11x1 + a12x2 + a13x3 + · · · + a1,n−1xn−1 + a1nxn = b1, a22x2 + a23x3 + · · · + a1,n−1xn−1 + a2nxn = b2, a33x3 + · · · + a3,n−1xn−1 + a3nxn = b3, ... an−1,n−1xn−1 + an−1,nxn = bn−1, annxn = bn. . (4.180) The solution of this system is given by xn = 1 ann bn, xn−1 = 1 an−1,n−1 (bn−1 − an−1,nxn), . . . , xi = 1 aii  bi − n j=i+1 aij   , . . . , x1 = 1 a11  b1 − n j=2 a1j xj   (4.181) and is obtained step by step, starting from xn, continuing with xn−1, xn−2, . . . , until x1. Observation 4.18 Obviously, equation (4.180) is not the only possible form in which the solution can be immediately obtained. We may take, for instance, an inferior triangular form of the system, the determination of the unknowns beginning with x1, continuing with x2, x3, . . . , until xn. Observation 4.19 We observe from equation (4.181) that all the coefficients aii , i = 1, n, must be nonzero. In Gauss’s method, we multiply successively the first row with suitable values, adding then to the rows 2, 3, . . . , n, so as to obtain a zero value for all ai1, 2 ≤ i ≤ n, remaining with a11 = 0. If
  • 143. 134 LINEAR ALGEBRA a11 = 0 in the initial system, then we look for a nonzero coefficient between a21, a31, . . . , an1. If all ai1, i = 1, n, vanish, then the procedure stops, and the variable x1 disappears in all n equations (if not, we have a system of n equations with at most n − 1 unknowns). We continue then with the second row and multiply it (we suppose a22 = 0) with suitable values so that all elements a32, a42, . . . , an2 do vanish. Obviously, if a22 = 0, then we commute the second row with another row i for which ai2 = 0. After the first step, the system (4.181) becomes    a(1) 11 x1 + a(1) 12 x2 + a(1) 13 x3 + · · · + a(1) 1n xn = b(1) 1 , a(1) 22 x2 + a(1) 23 x3 + · · · + a(1) 2n xn = b(1) 2 , a(1) 32 x2 + a(1) 33 x3 + · · · + a(1) 3n = b(1) 3 , ... a(1) n2 x2 + a(1) n3 x3 + · · · + a(1) nn xn = b(1) n , (4.182) while, after the second step, the form of the system will be    a(2) 11 x1 + a(2) 12 x2 + a(2) 13 x3 + · · · + a(2) 1n xn = b(2) 1 , a(2) 22 x2 + a(2) 23 x3 + · · · + a(2) 2n xn = b(2) 2 , a(2) 32 x2 + a(2) 33 x3 + · · · + a(2) 3n = b(2) 3 , ... a(2) n2 x2 + a(2) n3 x3 + · · · + a(2) nn xn = b(2) n . (4.183) The procedure will continue until we obtain the form (4.180). Observation 4.20 To reduce the calculation errors at step i, i = 1, n − 1, we do not make the pivot with a(i) ii but bring to position the element among a(i) ii , a(i) i+1,i, . . . , a(i) ni that is greatest in modulus. If all these elements vanish (the maximum is equal to zero), then the algorithm stops. 4.5.3 The Gauss–Jordan Method The Gauss–Jordan method is a similar one to Gauss’s method, specifying that the row i, after multiplying it with suitable values, is added not only to the rows i + 1, i + 2, . . . , n, but to the rows 1, 2, . . . , i − 1 too. Thus, the matrix of the system becomes a diagonal one    a11x1 + 0 · x2 + 0 · x3 + · · · + 0 · xn−1 + 0 · xn = b1, a22x2 + 0 · x3 + · · · 0 · xn−1 + 0 · xn = b2, a33x3 + · · · + 0 · xn−1 + 0 · xn = b3, ... an−1,n−1xn−1 + 0 · xn−1 + 0 · xn = bn−1, ann xn = bn. . (4.184) In this case, the solution of the system (4.184) becomes xi = bi aii , aii = 0, i = 1, n. (4.185) The observations to Gauss’s method remain valid.
  • 144. SOLUTION OF LINEAR ALGEBRAIC SYSTEMS OF EQUATIONS 135 4.5.4 The LU Factorization The idea of the method6 consists in writing the matrix A of the linear system Ax = b, A ∈ Mn(R), x ∈ Mn,1(R), b ∈ Mn,1(R), in the form A = LU, (4.186) where L ∈ Mn(R) is an inferior triangular matrix L =          l11 0 0 · · · 0 0 l21 l22 0 · · · 0 0 l31 l32 l33 · · · 0 0 · · · · · · · · · ... · · · · · · ln−1,1 ln−1,2 ln−1,3 · · · ln−1,n−1 0 ln,1 ln,2 ln,3 · · · ln,n−1 ln,n          , (4.187) while U ∈ Mn(R) is a superior triangular matrix U =          u11 u12 u13 · · · u1,n−1 u1n 0 u22 u23 · · · u2,n−1 u2,n 0 0 u33 · · · u3,n−1 u3,n · · · · · · · · · ... · · · · · · 0 0 0 · · · un−1,n−1 un−1,n 0 0 0 · · · 0 un,n          . (4.188) Taking into account the previous relations, we may determine the values of the elements of the matrices L and U l11u11 = a11, l11u12 = a12, . . . , l11u1n = a1n, l21u11 = a21, l21u12 + l22u22 = a22, . . . , l21u1n + l22u2n = a2n, . . . , ln1u11 = an1, ln1u12 + ln2u22 = an2, ln1u1n + ln2u2n + . . . + lnn unn = ann. (4.189) The system Ax = b (4.190) now takes the form LUx = b. (4.191) We denote Ux = y (4.192) and we now have to solve two systems, that is, Ly = b, (4.193) with L an inferior triangular matrix, which has the solution yi = 1 lii  bi − i−1 j=1 lji yj   , i = 1, n, (4.194) 6The method was introduced by Alan Mathison Turing (1912–1954).
  • 145. 136 LINEAR ALGEBRA and the system Ux = y, (4.195) where U is a superior triangular matrix, having the solution xi = 1 uii  yi − n j=i+1 uji xj   , i = 1, n, (4.196) respectively. Observation 4.21 We observe that the system (4.189) has n2 equations and n2 + n unknowns. To be determined, n unknowns must be a priori specified. Depending on the specified unknowns, the method must be used in various readings that will be presented in the following. A. The Doolittle Method In the frame of this method,7 the values lii = 1, i = 1, n (4.197) are established and the system (4.189) becomes u11 = a11, u12 = a12, . . . , u1n = a1n, l21u11 = a21, l21u12 + u22 = a22, . . . , l21u1n + u2n = a2n, . . . , ln1u11 = an1, ln1u12 + ln2u22 = an2, . . . , ln1u1n + ln2un2 + . . . + unn = ann. (4.198) The solution of this system becomes lii = 1, i = 1, n, u1j = a1j , j = 1, n, l21 = a21 a11 , u2k = a2k − l21u1k, k = 2, n, . . . , ln1 = an1 u11 , ln2 = 1 u22 (an2 − ln1u12), . . . , unn = ann − ln1un1 − ln2un2 − · · · − ln,n−1un,n−1. (4.199) B. The Crout Method In the Crout method,8 the values uii = 1 i = 1, n, (4.200) are imposed, so that the system (4.189) becomes l11 = a11, l11u12 = a12, . . . , l11u1n = a1n, l21 = a21, l21u12 + l22 = a22, . . . , l21u1n + l22u2n = a2n, . . . , ln1 = an1, ln1u12 + ln2 = an2, . . . , ln1u1n + ln2u2n + . . . + ln,n−1un−1,n + ln,n = ann, (4.201) with the solution uii = 1, i = 1, n, lj1 = aj1, j = 1, n, u12 = a12 l11 , lk2 = ak2 − lk1u12, k = 2, n, . . . , u1n = a1n l11 , u2n = 1 l22 (a2n − l21u1n), . . . , lnn = ann − ln1u1n − · · · − ln,n−1un−1,n. (4.202) 7 The method was described by Myrick Hascall Doolittle (1830–1913). 8The method was named after Prescott Durand Crout (1907–1984).
  • 146. SOLUTION OF LINEAR ALGEBRAIC SYSTEMS OF EQUATIONS 137 C. The Cholesky Method We suppose that the matrix A is symmetric and positive definite,9 that is AT = A (4.203) and xT Ax > 0 (4.204) for any x ∈ Mn,1(R), x = 0. We may choose the matrices L and U so that U = LT , (4.205) in these conditions. The condition A = LU, written now in the form A = LLT , (4.206) or, equivalently,     a11 a12 a13 · · · a1n a21 a22 a23 · · · a2n · · · · · · · · · · · · · · · an1 an2 an3 · · · ann     =     l11 0 0 · · · 0 l21 l22 0 · · · 0 · · · · · · · · · · · · · · · ln1 ln2 ln3 · · · lnn         l11 l21 . . . ln1 0 l22 . . . ln2 . . . . . . . . . . . . 0 . . . . . . lnn     , (4.207) leads to l2 11 = a11, l11l21 = a12, . . . , l11ln1 = a1n, l21l11 = a21, l2 21 + l2 22 = a22, . . . , l21ln1 + l22ln2 = a2n, . . . , ln1l11 = an1, ln1l21 + ln2l22 = an2, . . . , l2 n1 + l2 n2 + . . . + l2 nn = ann, (4.208) the solution of which is lii = aii − i−1 j=1 l2 ji , j = 1, n, lij = 1 lii aij − i−1 k=1 lki lkj , j > i. (4.209) 4.5.5 The Schur Method of Solving Systems of Linear Equations Let us consider the linear system    a11x1 + a12x2 + · · · + a1nxn = b1, ... an1x1 + an2x2 + · · · + ann xn = bn, , (4.210) which we write in a condensed form as Ax = b. (4.211) We suppose that the system is compatible determined and that the matrix A allows a partition of the form A = A1 A2 A3 A4 , (4.212) where A1 ∈ Mp(R), A2 ∈ Mp,n−p(R), A3 ∈ Mn−p,p(R), and A4 ∈ Mn−p(R). 9The method was presented by Andr´e–Louis Cholesky (1876–1918).
  • 147. 138 LINEAR ALGEBRA We partition the column vectors x and b in the form x = x1 x2 , b = b1 b2 , (4.213) where x1 = [x1 · · · xp]T , x2 = [xp+1 · · · xn]T , b1 = [b1 · · · bp]T , b2 = [bp+1 · · · bn]T . (4.214) The system (4.211) is now written in the form A1x1 + A2x2 = b1, A3x1 + A4x2 = b2. (4.215) If the matrix A4 is invertible, then the second equation (4.213) becomes A−1 4 A3x1 + x2 = A−1 4 b2, (4.216) from which x2 = A−1 4 b2 − A−1 4 A3x1. (4.217) Substituting now relation (4.217) in the first equation (4.215), we get A1x1 + A2A−1 4 b2 − A2A−1 4 A3x1 = b1 (4.218) or, equivalently, (A1 − A2A−1 4 A3)x1 = b1 − A2A−1 4 b2. (4.219) Now, if A1 − A2A−1 4 A3 is invertible, then it follows x1 = (A1 − A2A−1 4 A3)−1 (b1 − A2A−1 4 b2). (4.220) Relations (4.220) and (4.217) give the solution of the system (4.211). The conditions of invertibility of the matrices A4 and A1 − A2A−1 4 A3 are just the Schur conditions for the determination of the matrix A−1. If the matrix A1 is invertible, then the first equation (4.215) leads to x1 = A−1 1 b1 − A−1 1 A2x2, (4.221) while from the second equation (4.215) we obtain A3A−1 1 b1 − A3A−1 1 A2x2 + A4x2 = b2, (4.222) from which, if A4 − A3A−1 1 A2 is an invertible matrix, we get x2 = (A4 − A3A−1 1 A2)−1 (b2 − A3A−1 1 b1). (4.223) In this case too, the invertibility conditions of the matrices A1 and A4 − A3A−1 1 A2 are Schur’s conditions to determine the inverse of the matrix A. Let us suppose now that the matrix A4 is invertible, while Q is a nonsingular quadratic matrix. Moreover, we verify the relations Qb2 = 0, x2 = QT λ. (4.224)
  • 148. SOLUTION OF LINEAR ALGEBRAIC SYSTEMS OF EQUATIONS 139 Under these conditions, the equation A1 A2 A3 A4 x1 x2 = b1 b2 (4.225) may be written in the form I 0 0 Q A1 A2 A3 A4 I 0 0 QT x1 λ = b1 0 , (4.226) which may be easily verified by performing the requested products and taking into account the relations (4.224). It follows that A1 A2QT QA3 QA4QT x1 λ = b1 0 , (4.227) from which A1x1 + A2QT λ = b1, QA3x1 + QA4QT λ = 0. (4.228) From the second relation (4.228) one obtains λ = −(QT )−1 A−1 4 A3x1, (4.229) which, replaced in the first relation (4.228), leads to (A1 − A2A−1 4 A3)x1 = b1. (4.230) If the expression between parentheses in equation (4.230) defines a nonsingular matrix, then relations (4.230) and (4.229) give the required solution, because QA4QT is always invertible. Let us consider now the case in which A4 is not invertible, a situation that may be frequently encountered in the mechanics of multibody systems, when A4 = 0. From the first relation (4.228), we get x1 = A−1 1 (b1 − A2QT λ), (4.231) which, replaced in the second relation (4.228), leads to (A4 − A3A−1 1 A2)QT λ = −A3A−1 1 b1. (4.232) If the expression from the parentheses in equation (4.232), as well as A1, are nonsingular matrices, then relations (4.232) and (4.231) lead to the solution of the system (4.225) with the conditions (4.224). In the particular case frequently encountered, for which A4 = 0, the relation (4.70) is simplified in the form QA3A2QT λ = −QA3A−1 1 b1, (4.233) from which λ = (A3A2QT )−1 A3A−1 1 b1, (4.234) and the relation (4.69) now leading to x1 = A−1 1 [I − A2QT (A3A2QT )−1 A3A−1 1 ]b1. (4.235)
  • 149. 140 LINEAR ALGEBRA Let us now consider the case of the system A1 A2 A3 A4 x1 x2 + c1 c2 = b1 b2 , (4.236) the relations (4.224) continuing to remain valid. Proceeding analogically, we obtain the relation A1 A2QT QA3 QA4QT x1 λ = b1 − c1 −Qc2 , (4.237) resulting in the system A1x1 + A2QT λ = b1 − c1, QA3x1 + QA4QT λ = −Qc2. (4.238) If A4 is invertible, then the last relation (4.238) leads to λ = −(QT )−1 A−1 4 (c2 + A3c1), (4.239) which, replaced in the first equation (4.238), allows to write (A1 − A2A−1 4 A3)x1 = b1 − c1 + A2A−1 4 c2. (4.240) If the expression between parentheses of equation (4.240) defines a nonsingular matrix, then for- mulae (4.240) and (4.239) give the required solution. If A1 is invertible, then the first relation (4.238) leads to x1 = A−1 1 (b1 − c1 − A2QT λ), (4.241) which, replaced in the second equation (4.238), allows to write (A4 − A3A−1 1 A2)QT λ = −[c2 + A3A−1 1 (b1 − c1)]. (4.242) If the expression between the parentheses, in the left-hand term of the formula (4.242) defines an invertible matrix, then relations (4.242) and (4.241) give the solution we require. In the particular case defined by A4 = 0, we obtain, from relations (4.242) and (4.241), the formulae λ = (A3A−1 1 A2QT )−1 [c2 + A3A−1 1 (b1 − c1)], (4.243) x1 = A−1 1 {b1 − c1 − A2(A3A−1 1 A2)−1 [c2 + A3A−1 1 (b1 − c1)]}. (4.244) Let us now modify the second condition (4.224) in the form x2 = QT λ + β. (4.245) The system (4.236) now becomes A1 A2QT QA3 QA4QT x1 λ = b1 − c1 − A2β −Qc2 − QA4β , (4.246) from which we get A1x1 + A2QT λ = b1 − c1 − A2β, QA3x1 + QA4QT λ = −Qc2 − QA4β. (4.247)
  • 150. SOLUTION OF LINEAR ALGEBRAIC SYSTEMS OF EQUATIONS 141 If A4 is invertible, then the last relation (4.247) leads to λ − (A4QT )−1 (A3x1 + c2 + A4β), (4.248) which, replaced in the first equation (4.247), allows to write (A1 − A2A−1 4 A3)x1 = b1 − c1 − A2β + A2A−1 4 (c1 + A4β). (4.249) If the expression between the parentheses on the left-hand side of this formula defines an invertible matrix, then relations (4.249) and (4.248) give the required answer. If A1 is invertible, then the first relation (4.247) leads to x1 = A−1 1 (b1 − c1 − A2β) − A−1 1 A2QT λ, (4.250) which, replaced in the last relation (4.247), allows to write (A4 − A3A−1 1 A2)QT λ = −c2 − A4β − A3A−1 1 (b1 − c1 − A2β). (4.251) If the parentheses of the left-hand side of the previous relation define a nonsingular matrix, then the relations (4.251) and (4.250) constitute the required answer. In the particular case given by A4 = 0, formulae (4.251) and (4.250) are simplified in the form λ = (QT A3A−1 1 A2)−1 [−c2 − A3A−1 1 (b1 − c1 − A2β)], (4.252) x1 = A−1 1 (b1 − c1 − A2β) − A−1 1 A2(A3A−1 1 A2)−1 [−c2 − A3A−1 1 (b1 − c1 − A2β)]. (4.253) Observation 4.22 The theory presented above remains valid also in the case in which we renounce the condition that Q be invertible. The only condition asked is that Q should be a full rank matrix. Considering now the system (4.247), if A4 is invertible, then the last equation (4.247) leads to λ = (QA4QT )−1 (−Qc2 − QA4β − QA3x1), (4.254) while the first relation (4.247) gives [A1 − A2QT (QA4QT )−1 QA3]x1 = b1 − c1 − A2β + A2QT (QA4QT )−1 (Qc1 + QA4β). (4.255) If the square brackets on the left-hand side of this formula define an invertible matrix, then formulae (4.255) and (4.254) give the allowed answer. If A1 is invertible, then, from the first relation (4.247), we get x1 = A−1 1 (b1 − c1 − A2β − A2QT λ), (4.256) which, replaced in the second formula (4.247), leads to (QA4QT + QA3A−1 1 A2QT )λ = Qc2 + QA4β + QA3(b1 − c1 − A2β). (4.257) If the parentheses on the left-hand side of equation (4.257) define a nonsingular matrix, then the formulae (4.257) and (4.256) give the searched answer.
  • 151. 142 LINEAR ALGEBRA If A4 = 0, then the relation (4.257) may be written in the form QA3A−1 1 A2QT λ = Qc2 + QA3(b1 − c1 − A2β). (4.258) If β = 0, then the relations (4.254)–(4.258) become λ = (QA4QT )−1 (−Qc2 − QA3x1), (4.259) [A1 − A2QT (QA4QT )−1 QA3]x1 = b1 − c1 + A2QT (QA4QT )−1 Qc2, (4.260) x1 = A−1 1 (b1 − c1 − A2QT λ), (4.261) (QA4QT + QA3A−1 1 A2QT )λ = Qc2 + QA3(b1 − c1), (4.262) QA3A−1 1 A2QT λ = Qc2 + QA3(b1 − c1). (4.263) If we also have c1 = 0, c2 = 0, then the last relations are simplified and, furthermore, we are led to λ = −(QA4QT )−1 QA3x1, (4.264) [A1 − A2QT (QA4QT )−1 QA3]x1 = b1, (4.265) x1 = A−1 1 (b1 − A2QT λ), (4.266) (QA4QT + QA3A−1 1 A2QT )λ = QA3b1, (4.267) QA3A−1 1 A2QT λ = QA3b1. (4.268) 4.5.6 The Iteration Method (Jacobi) Let us consider the system of linear equations10    a11x1 + a12x2 + · · · + a1nxn = b1, a21x1 + a22x2 + · · · + a2nxn = b2, ... ai1x1 + a12x2 + · · · + ainxn = bi, ... an1x1 + an2x2 + · · · + annxn = bn, (4.269) which may also be written in the matrix form Ax = b, (4.270) where A =     a11 a12 . . . a1n a21 a22 . . . a2n . . . . . . . . . . . . an1 an2 . . . ann     , b = b1 b2 · · · bn T , x = x1 x2 · · · xn T . (4.271) We suppose that aii = 0, i = 1, n, in the system (4.269) and that A is nonsingular. 10The method was named after Carl Gustav Jacob Jacobi (1804–1851).
  • 152. SOLUTION OF LINEAR ALGEBRAIC SYSTEMS OF EQUATIONS 143 If one includes the unknown xi in the equation i of the system (4.269), then one may write    x1 = b1 a11 − a12 a11 x2 − a13 a11 x3 − · · · − a1n a11 xn, x2 = b2 a22 − a21 a22 x1 − a23 a22 x3 − · · · − a2n a22 xn, ... xi = bi aii − ai1 aii x1 − ai2 aii x2 − · · · − ain aii xn, ... xn = bn ann − an1 ann x1 − an2 ann x2 − · · · − an,n−1 ann xn−1. (4.272) Let us denote βi = bi aii , i = 1, n, (4.273) αij = − aij aii , i = 1, n, i = j, αij = 0, i = j. (4.274) It follows that α =     0 α12 . . . α1,n−1 α1n α21 0 . . . α2,n−1 α2n . . . . . . . . . . . . . . . αn1 αn2 . . . αn,n−1 0     , β =      β1 β2 ... βn      , (4.275) so that the system (4.272) becomes x = β + αx. (4.276) Let x(0) ∈ Mn,1(R) be an initial solution of the system (4.276). We define the sequence of iterations x(1) = β + αx(0) , x(2) = β + αx(1) , . . . , x(k+1) = β + αx(k) , . . . , (4.277) where k ∈ N∗ . Let us suppose that the sequence x(0) , x(1) , . . . , x(k) , . . . , is convergent and let x = lim k→∞ x(k) (4.278) be its limit. It follows that x = β + αx, (4.279) hence x is the solution of the system (4.276). Proposition 4.10 A sufficient condition of convergence of the sequence of successive iterations x(k+1) = β + αx(k) , k ∈ N∗ , x(0) arbitrary, (4.280) is α < 1, where is one of the canonical norms. Demonstration. We may write x(k) = β + αx(k−1) = β + α(β + αx(k−2) ) = (In + α)β + α2 x(k−2) . (4.281)
  • 153. 144 LINEAR ALGEBRA We get, in general, x(k) = (In + α + α2 + · · · + αk−1 )β + αk x(0) , (4.282) where In is the unit matrix of nth order. On the other hand, from α < 1 and because is canonical, we also have αk ≤ α k . (4.283) It follows that αk → 0 for k → ∞, because α k → 0 for k → ∞. One obtains lim k→∞ αk = 0. (4.284) Then lim k→∞ (In + α + · · · + αk−1 ) = (In − α)−1 (4.285) and, passing to the limit in (4.282), it follows that x = (In − α)−1 β, (4.286) from which (In − α)x = β; (4.287) which is just the relation (4.279), showing that x is a solution of the system (4.276), hence of the system (4.269). Observation 4.23 Instead of the sequence of successive iterations x(0) , x(1) , . . . , x(k) , . . . , we may consider the sequence y(0) = x(0) , y(k) = x(k) − x(k−1) , k ∈ N∗ . (4.288) We get y(k+1) = x(k+1) − x(k) = β + αx(k) − β − αx(k−1) , (4.289) from which y(k+1) = α(x(k) − x(k−1) ), k ∈ N∗ , (4.290) hence y(k+1) = αy(k) , k ∈ N∗ . (4.291) On the other hand, x(k+1) = k+1 i=0 y(i) = x(0) + k+1 i=1 αi y(1) . (4.292) Observation 4.24 (i) If x(0) = β, then the sequence of successive iterations becomes a particular form x(0) = β, x(1) = β + αx(0) = (In + α)β, x(2) = β + αx(1) = β + αβ + α2 β = (In + α + α2 )β, . . ., x(k) = (In + α + α2 + · · · + αn )β, . . . (4.293)
  • 154. SOLUTION OF LINEAR ALGEBRAIC SYSTEMS OF EQUATIONS 145 (ii) For x(0) = β, relation (4.292) is written in the form x(k+1) = k+1 i=0 αi β, (4.294) where α0 = In. Proposition 4.11 (Estimation of the Error). Under the above conditions, the relation x(k) − x ≤ 1 1 − α x(k+1) − x(k) ≤ α k x(1) − x(0) 1 − α (4.295) follows. Demonstration. Let x(m+1) and x(m) be two consecutive iterations, with m ∈ N∗ . We have x(m+1) − x(m) = β + αx(m) − β − αx(m−1) = α(x(m) − x(m−1) ). (4.296) It follows that x(m+1) − x(m) = αm−k (x(k+1) − x(k) ) = αm (x(1) − x(0) ) (4.297) for any 1 ≤ k < m. Passing to the norm in the relation (4.297), it follows that x(m+1) − x(m) ≤ α m−k x(k+1) − x(k) ≤ α m x(1) − x(0) . (4.298) Let p ∈ N∗, arbitrary. We calculate x(k+p) − x(k) = x(k+p) − x(k+p−1) + x(k+p−1) − · · · − x(k+1) + x(k) ≤ x(k+p) − x(k+p−1) + x(k+p−1) − x(k+p−2) + · · · + x(k+1) − x(k) . (4.299) From (4.298), we get x(k+p) − x(k+p−1) ≤ α p−1 x(k+1) − x(k) , x(k+p−1) − x(k+p−2) ≤ α p−2 x(k+1) − x(k) , . . . , x(k+2) − x(k+1) ≤ α x(k+1) − x(k) , (4.300) so that the relation (4.298) leads to x(k+p) − x(k) ≤ α p−1 x(k+1) − x(k) + α p−2 x(k+1) − x(k) + · · · + x(k+1) − x(k) = 1 − α p 1 − α x(k+1) − x(k) ≤ 1 1 − α x(k+1) − x(k) . (4.301) Taking into account that x(k+1) − x(k) ≤ α x(k) − x(k−1) ≤ α 2 x(k−1) − x(k−2) ≤ · · · ≤ α k x(1) − x(0) , (4.302) we get x(k+p) − x(k) ≤ 1 1 − α x(k+1) − x(k) ≤ α k 1 − α x(1) − x(0) (4.303) from the formula (4.301).
  • 155. 146 LINEAR ALGEBRA We pass now to the limit for p → ∞ in the last relation and take into account lim p→∞ x(k+p) = x, obtaining the relation (4.295), which had to be proved. Corollary 4.4 If x(0) = β, then the relation (4.295) becomes x(k) − x ≤ 1 1 − α x(k+1) − x(k) ≤ α k+1 1 − α β . (4.304) Demonstration. We have x(0) = β, x(1) = (In + α)β, x(2) = (In + α + α2 )β, . . . , x(m) = (In + α + α2 + · · · + αm )β (4.305) for x(0) = β, so that x(k+1) − x(k) = αk+1 β ≤ α k+1 β (4.306) and the relation (4.304) is obvious. Observation 4.25 (i) A priori estimation of the error: The formula (4.295), written in the form x(k) − x ≤ α k x(1) − x(0) 1 − α < ε, (4.307) leads to the a priori estimation of the error in the iterations method. So, to determine the solution x with an imposed precision ε, we must make a number of iterations given by k = ln ε (1 − α ) / x(1) − x(0) ln( α ) + 1, (4.308) where the external brackets mark the entire part of the function. (ii) A posteriori estimation of the error: This estimation is given by the formula (4.295) written in the form x(k) − x ≤ 1 1 − α x(k+1) − x(k) < ε. (4.309) Hence, to determine x with an imposed precision ε, we must iterate until the difference between two successive iterations x(k) and x(k+1) verifies the relation x(k+1) − x(k) < ε(1 − α ). (4.310) Observation 4.26 A sufficient condition to have α < 1 is given by the relation |aii | > n j=1 j=i |aij |, i = 1, n. (4.311) Thus, it follows α ∞ < 1. Analogically, if |aii | > n j=1 j=i |aij |, i = 1, n, (4.312) then we get α 1 < 1.
  • 156. SOLUTION OF LINEAR ALGEBRAIC SYSTEMS OF EQUATIONS 147 4.5.7 The Gauss–Seidel Method The Gauss–Seidel method11 is a variant of the iterations method; indeed, at the step k + 1 for the determination of x(k+1) i one uses the values x(k+1) 1 , . . . , x(k+1) i−1 (obtained at this step) and the values x(k) i+1, . . . , x(k) n (determined in the preceding step). We may write x(k+1) 1 = β1 + n j=1 αij x(k) j , x(k+1) 2 = β2 + α21x(k+1) 1 n j=2 α2j x(k) j , . . . , x(k+1) i = βi + i−1 j=1 αij x(k+1) j + n j=i+1 αij − x(k) j , . . . , x(k+1) n = βn + n−1 j=1 αnj x(k) j . (4.313) Proposition 4.12 Let x = αx + β, where α ∞ < 1 be the linear system. Under these conditions, the iterative Gauss–Seidel process described by the relations (4.313) is convergent to the unique solution of the system for any choice of the initial value x(0). Demonstration. The component x(k) i is given by x(k) i = i−1 j=1 αij x(k) j + n j=i+1 αij x(k−1) j + βi, i = 1, n. (4.314) On the other hand, xi = n j=1 αij xj + βi, i = 1, n, (4.315) and, by subtracting the relation (4.314) from relation (4.315) term by term, we obtain xi − x(k) i = i−1 j=1 αij (xj − x(k) j ) + n j=i+1 αij (xj − x(k−1) j ). (4.316) We apply the modulus in the last relation and obtain the result |xi − x(k) i | ≤ i−1 j=1 |αij ||xj − x(k) j | + n j=i+1 |αij ||xj − x(k−1) j |, i = 1, n. (4.317) But |xi − x(k) i | ≤ x − x(k) ∞, (4.318) because ∞ is a canonical norm, and hence |xi − x(k) i | ≤ i−1 j=1 |αij | x − x(k) + n k=i+1 |αij | x − x(k−1) . (4.319) Let us denote by m the value of the index i = 1, n for which |xm − x(k) m | is the norm α ∞, hence |xm − x(k) m | = max 1≤i≤n |xi − x(k) i | = α ∞. (4.320) 11The method is named after Carl Friedrich Gauss (1777–1855) and Philipp Ludwig von Seidel (1821–1896).
  • 157. 148 LINEAR ALGEBRA We have x − x(k) ≤ λi x − x(k) + µi x − x(k−1) , (4.321) hence x − x(k) ≤ µi 1 − λi x − x(k−1) . (4.322) Let q = max 1≤i≤n µi 1 − λi . (4.323) Let us show that q ≤ α ∞ < 1. Now, λi + µi = n j=1 |αij | ≤ α ∞, (4.324) from which µi ≤ α ∞ − λi, i = 1, n, (4.325) with α ∞ < 1. We may also write µi 1 − λi ≤ α ∞ − λi 1 − λi ≤ α ∞ − λi α ∞ 1 − λi = α ∞ < 1, (4.326) hence q ≤ α ∞. The relation (4.322) leads now to the sequence of inequalities x − x(k) ≤ q x − x(k−1) ≤ q2 x − x(k−2) ≤ · · · ≤ qk x − x(0) (4.327) and, by passing to the limit as k → ∞, we get lim k→∞ x(k) = x (4.328) and the proposition is thus proved. Proposition 4.13 (Error Estimation). Under the above conditions, the inequalities result: x(k) − x ∞ ≤ 1 1 − q x(k+1) − x(k) ∞ ≤ qk 1 − q x(1) − x(0) ∞ (4.329) Demonstration. The proof is analogical to that of Proposition 4.11. Observation 4.27 Obviously, the formulae for error estimation are x(k) − x ∞ ≤ qk 1 − q x(1) − x(0) ∞ < ε (4.330) and x(k) − x ∞ ≤ 1 1 − q x(k+1) − x(k) ∞, (4.331) respectively.
  • 158. SOLUTION OF LINEAR ALGEBRAIC SYSTEMS OF EQUATIONS 149 4.5.8 The Relaxation Method Let the linear system be given by    a11x1 + a12x2 + · · · + a1nxn = b1, a21x1 + a22x2 + · · · + a2nxn = b2, . . . an1x1 + an2x2 + · · · + annxn = bn, (4.332) which we assume to be compatible determined and with aii = 0, i = 1, n. Dividing row i by aii , i = 1, n, one obtains the system    −x1 + γ12x2 + · · · + γ1nxn + δ1 = 0, γ21x1 − x2 + · · · + γ2nxn − δ2 = 0, . . . γn1x1 + γn2x2 + · · · − xn + δn = 0, , (4.333) where γij = − aij aii , δi = bi aii , i, j = 1, n, i = j. (4.334) Let x(0) = x(0) 1 x(0) 2 · · · x(0) n T be an approximation of the solution of the system (4.323), which we replace in that one. We thus obtain rests of the form R(0) 1 = −x(0) 1 + n j=2 γ1j x(0) j + δ1, R(0) 2 = −x(0) 2 + n j=2 γ2j x(0) j + δ2, . . . , R(0) n = −x(0) n + n j=2 γnj x(0) j + δn. (4.335) Let |R(0) k | = max{|R(0) 1 |, |R(0) 2 |, . . . , |R(0) n |}, (4.336) be the maximum of the moduli of these rests and let us give to xk the value xk + R(0) k . At this point, R(1) k = 0 and the other rests are R(1) i = R(0) i + γik R(0) k , i = 1, n, i = k. (4.337) Between the rests R(1) i , i = 1, n, one of them will be maximum in modulus, say, R(1) l . We give to xi the increment R(1) l ; it follows that R(2) l = 0 and R(2) i = R(1) i + γil R(1) l , i = 1, n, i = l. (4.338) The process may continue either until one obtains the desired precision, or until R (p) i = 0, i = 1, n, at some step. The solution of the system is given by xi = x(0) i + p k=1 R(k) i , (4.339) where p is the number of the iteration steps performed.
  • 159. 150 LINEAR ALGEBRA 4.5.9 The Monte Carlo Method Let us consider the linear system12 Ax = b, A ∈ Mn(R), x, b ∈ Mn,1(R), (4.340) which can be written in the form x = αx + β, (4.341) where α < 1, being one of the canonical norms. Let us choose the factors vij , i, j = 1, n, so that αij = pij vij , (4.342) where pij ≥ 0, with pij > 0 for αij > 0, i, j = 1, n, (4.343) n j=1 pij < 1, i = 1, n. (4.344) We construct the matrix H so that hij = pij , i, j = 1, n, hn+1,j = 0, j = 1, n, hi,n+1 = 1 − n j=1 pij , i = 1, n, hn+1,n+1 = 1, that is, H =                 p11 p12 · · · p1n p1,n+1 = 1 − n j=1 p1j p21 p22 · · · p2n p2,n+1 = 1 − n j=1 p2j · · · · · · · · · · · · · · · pn1 pn2 · · · pnn pn,n+1 = 1 − n j=1 pnj 0 0 · · · 0 1                 . (4.345) Moreover, we choose a sequence S1, S2, . . . , Sn+1 of states possible and incompatible with one another, in which Sn+1 is the frontier or the absorbent barrier. Thus, pij represents the probability of passing of a particle from the state Si to the state Sj independently of the previous states, the further states being non-definite. The state Sn+1 is a singular one and supposes the stopping of the particle, which is evidenced by pn+1,j = 0, j = 1, n. Thus, a particle starts from an initial state Si, i = 1, n, then passes into another state Sj and so on until it attains the final state Sn+1. Obviously, the number of states through which the particle passes is finite, but the number is different from simulation to simulation, that is, there are a number of paths from the initial state Si, i = 1, n, to the final one Sn+1. It appears as a simple, homogeneous Markov chain with a finite number of states. Let Si0 , i0 = 1, n, be an initial state and one such Markov chain that defines the trajectory of the particle be given by Ti = {Si0 , Si1 , . . . , Sim , Sim+1 }, (4.346) where Sim+1 = Sn+1, that is, the final state. 12The Monte Carlo method was stated in the 1940s by John von Neumann (1903–1957), Stanislaw Marcin Ulam (1909–1984), and Nicholas Constantine Metropolis (1915–1999). The name of the method comes from the famous Monte Carlo Casino.
  • 160. SOLUTION OF LINEAR ALGEBRAIC SYSTEMS OF EQUATIONS 151 We associate with this trajectory the aleatory variable Xi, the value of which is ξ(Ti) = βi0 + vi0i1 βi0 + vi1i2 βi1 + · · · + vim−1im βim . (4.347) Theorem 4.1 The mathematical expectation MXi = Ti ξ(Ti)P (Ti) = i Tij ξ(Tij )P (Tij ) = xi (4.348) is a solution of the system (4.341). Demonstration. The trajectories of Ti type may be divided into distinct classes as functions of the state through which the particle passes for the first time. We have Ti1 = {Si, S1, . . . }, Ti2 = {Si, S2, . . . }, Tin = {Si, Sn, . . . }, Tin+1 = {Si, Sn+1}. (4.349) Thus, Ti is the trajectory from one of the sets (4.349), if Ti is given by (4.346), then the associate aleatory variable Xi will have the value ξ(Tij ) = βi + vij βj + vji2 βi2 + · · · + vim−1im βim = βi + vij ξ(Tj ). (4.350) Obviously, for the trajectory Tin+1 = {Si, Sn+1}, we have ξ(Tin+1 ) = βi. (4.351) If j < n + 1, then the trajectory Ti is composed from the segment (Si, Sj ), to which we add a trajectory from the set Tj defined by (4.349). It follows that P (Tij ) = pij P (Tj ). (4.352) If j = n + 1, then P (Tin+1 ) = pi,n+1. (4.353) It follows that MXi = n i=1 Tj [βi + vij ξ(Tij )]pij P (Tj ) + βipi,n+1 (4.354) or MXi = n j=1 pij vij Tj ξ(Tj ) + βi   n j=1 pij Tj P Tj + pi,n+1   . (4.355) On the other hand, Tj ξ(Tj )P (Tj ) = MXj , j = 1, n, (4.356) Tj P (Tj ) = 1, (4.357) n j=1 pij Tj P (Tj ) + pi,n+1 = n+1 j=1 pij = 1, (4.358)
  • 161. 152 LINEAR ALGEBRA so that the formula (4.355) becomes MXi = n j=1 αij MXj + βi, i = 1, n, (4.359) and the theorem is proved. Chebyshev’s theorem ensures that the inequality xi − 1 N N k=1 ξ T (k) i < ε (4.360) is realized with a probability tending to 1 for N → ∞. Thus it follows that xi ≈ 1 N N k=1 ξ(T (k) i ). (4.361) Practically, the problem is solved in a simpler manner. One constructs the matrix H. Let us observe that if α < 1, then we may choose pij = |αij | and vij = 1 if αij > 0 or vij = −1 if αij < 0. Let us suppose that we wish to determine xi, hence we start with the state Si. Thus, a uniformly distributed aleatory number is generated in the interval (0, 1), let the number be π1. On the line i of the matrix H, an index j is required, so that j k=1 pik ≤ π1 and j+1 k=1 pik > π1. (4.362) This index defines the new state Sj through which the particle passes. Obviously, this state may also be Sn+1, the case in which the trajectory stops. If j = n + 1, then we use the row j of the matrix H, for which a new uniformly distributed aleatory number is generated in the interval (0, 1). The process continues until the final state Sn+1 is attained. Thus, ξ(T (1) i ), where the upper index (1) marks the first simulation, is calculated. The procedure is repeated N times, the approximate value of xi being given by the formula (4.361). Observation 4.28 The process gives also a possibility to calculate the inverse of the matrix A, with A < 1, because determining the inverse A−1 is equivalent to solving a system of n2 linear equations with n2 unknowns. 4.5.10 Infinite Systems of Linear Equations We have considered until now a linear system of n equations with n unknowns, where n is a finite integer. We try to generalize this for n → ∞. Let us consider the infinite system Ax = b, (4.363) where x = [xk]T k∈N, A = [ajk ]j,k∈N, b = [bj ]T j∈N. Definition 4.9 The system is called regular if the matrix A is diagonally dominant, that is, |ajj | ≥ k∈K k=j |ajk |, j ∈ N, (4.364)
  • 162. DETERMINATION OF EIGENVALUES AND EIGENVECTORS 153 and completely regular if the above inequality (4.364) is strict, that is, A is strictly diagonally dominant. The well-known theorem that asserts the existence and the uniqueness of the solution of a finite, linear algebraic system, whose associated matrix is strictly diagonally dominant can be extended to completely regular infinite systems. If the system is regular, but not completely regular, only the existence of the solution is ensured. The condition (4.364) may be written also in the form ρ = 1 − k∈N k=j |ajk ||ajj | ≥ 0. (4.365) In case of a regular system, one may use the method of sections, considering that n is a finite integer, that is, one solves a finite system formed by the first n equations with the first n unknowns, by the methods presented above. Obviously, the accuracy of the solution depends on the number n. 4.6 DETERMINATION OF EIGENVALUES AND EIGENVECTORS 4.6.1 Introduction Let A ∈ Mn(C) be a matrix with complex elements and V ⊂ Cn a vector space. The matrix A defines a linear transformation by the relation x ∈ V → Ax ∈ V. (4.366) Let us consider a subspace V1 of V and let us suppose that V1 is invariant with respect to the linear transformation induced by the matrix A, hence for any x ∈ V1 it follows that Ax ∈ V1. It follows that the subspace V1 is defined by the equation Ax = λx, (4.367) where λ is an element of the corpus that defines the product by scalars over V. Definition 4.10 Any nonzero element x that satisfies the relation (4.367) is called an eigenvector of the matrix A, while the element λ is called an eigenvalue of the matrix A. Definition 4.11 The set of all the eigenvalues of the matrix A is called the spectrum of this matrix and is denoted by SpA or Λ(A). Observation 4.29 (i) If λ is an eigenvalue of the matrix A, then the matrix A − λIn, where In is the unit matrix of order n, is a singular matrix and, conversely, if the matrix A − λIn is singular, then λ is an eigenvalue for the matrix A. (ii) The eigenvalues of the matrix A are obtained by solving the algebraic equation det[A − λIn] = 0, (4.368) called the characteristic equation or secular equation.
  • 163. 154 LINEAR ALGEBRA (iii) Equation (4.368) is an algebraic equation of nth degree, which, corresponding to the basic theorem of algebra, has n roots in C. These roots may be distinct or one may have various orders of multiplicity. Hence, it follows that to an eigenvector there corresponds only one eigenvalue, but to an eigenvalue there may correspond several eigenvectors. (iv) If A ∈ Mn(R), then the eigenvalues are real or conjugate complex. (v) If the matrix A ∈ Mn(C) has n distinct eigenvalues λi, i = 1, n, then any vector y ∈ Cn may be written in the form y = n i=1 µixi, (4.369) where µi ∈ C, i = 1, n, the formula being unique. (vi) As Axi = λixi, i = 1, n, (4.370) by multiplying the relation (4.369) on the left by Ak , we obtain Ak y = n i=1 Ak µixi = n i=1 µiAk xi = n i=1 µiAk−1 (Axi) = n i=1 µiAk−1 λixi = · · · = n i=1 λk i µixi. (4.371) (vii) Let us suppose that we have the relation |λ1| > |λi|, i = 2, n, (4.372) for the matrix A; that is, λ1 is the greatest eigenvalue in modulus. The expression (4.371) may also be written in the form Ak y = n i=1 λk i µixi = λk 1µ1x1 + λk 2µ2x2 + · · · + λk nµnxn = λk 1µ1 x1 + λk 2 λk 1 µ2 µ1 x2 + · · · + λk n λk 1 µn µ1 xn , (4.373) where we suppose that µ1 = 0. Passing to the limit after k in the last relation, we get lim k→∞ Ak y = lim k→∞ λk 1µ1x1. (4.374) (viii) Let A ∈ Mn(C) and k ∈ N∗. Under these conditions, if the eigenvalues of A are distinct λi ∈ C, i = 1, n, λi = λj for i = j, j = 1, n, then the spectrum of the matrix Ak is given by Λ(Ak ) = {λk i }, i = 1, n. (4.375) It follows that if A is idempotent (that is A2 = A), then Λ(A) = {0, 1}, and if A is nilpotent (that is there exists k ∈ N so that Ak = 0), then Λ(A) = {0}.
  • 164. DETERMINATION OF EIGENVALUES AND EIGENVECTORS 155 (ix) If x is an eigenvector of the matrix A corresponding to the eigenvalue λ, that is, if Ax = λx, (4.376) while y is a vector in Cn , which depends on the variable t ∈ R (in general, t is the time), corresponding to the law y(t) = eλt x, (4.377) then y verifies the differential equation dy dt = Ay. (4.378) Indeed, one may write dy dt = λeλt x = eλt λx = eλt Ax = Aeλt x = Ay. (4.379) It follows that the particular solution of a system of ordinary differential equations may be immediately written if one knows the eigenvectors and the eigenvalues of the matrix A. Definition 4.12 The matrices A and B of Mn(C) are said to be similar if there exists a nonsingular matrix P ∈ Mn(C), so that B = P−1 AP. (4.380) Observation 4.30 Let λ be an eigenvalue of the matrix A and x be the corresponding eigenvector. If B is a matrix similar to A, by means of the matrix P, then λ is an eigenvalue of A if and only if it is eigenvalue of B with the eigenvector P−1 x. Indeed, we obtain B(P−1 x) = P−1 APP−1 x = P−1 Ax = P−1 λx = λP−1 x (4.381) from Ax = λx. 4.6.2 Krylov’s Method Let us denote by P (λ) the characteristic polynomial13 P (λ) = det[A − λIn], (4.382) where A ∈ Mn(R), In being as usual the unit matrix of order n. We may write P (λ) = (−1)n λn + p1λn−1 + p2λn−2 + · · · + pn. (4.383) Multiplying the relation (4.383) by (−1)n we obtain a polynomial of nth degree, for which the dominant coefficient is equal to 1, P1(λ) = λn + q1λn−1 + q2λn−2 + · · · + qn, (4.384) in which qi = (−1)n pi, i = 1, n − 1. (4.385) 13The method is credited to Aleksey Nikolaevich Krylov (1863–1945) who first presented it in 1931.
  • 165. 156 LINEAR ALGEBRA The Hamilton–Cayley theorem allows to state that the matrix A equates the characteristic polynomial to zero. Hence, we obtain An + q1An−1 + q2An−2 + · · · + qnIn = 0. (4.386) Let y(0) = y(0) 1 y(0) 2 · · · y(0) n T (4.387) be a nonzero vector in Rn . Let us multiply the relation (4.386) on the right by y(0) . It results An y(0) + q1An−1 y(0) + q2An−2 y(0) + · · · + qny(0) = 0. (4.388) We denote Ak y(0) = y(k) , k = 0, n (4.389) and the relation (4.388) becomes y(n) + q1y(n−1) + q2y(n−2) + · · · + qny(0) = 0, (4.390) an equation in which the unknowns are q1, q2, . . . , qn. The relation (4.390) may be also written in the form q1y(n−1) + q2y(n−2) + · · · + qny(0) = −y(n) (4.391) or in components,      y(n−1) 1 y(n−2) 1 · · · y(0) 1 y(n−1) 2 y(n−2) 2 · · · y(0) 2 · · · · · · · · · · · · y(n−1) n y(n−2) n · · · y(0) n          q1 q2 · · · qn     = −      y(n) 1 y(n) 2 · · · y(n) n      . (4.392) The coefficients q1, q2, . . . , qn of the characteristic polynomial are determined by solving the linear system (4.392) of n equations with n unknowns. Observation 4.31 The relation (4.389) that defines the vector y(k) may also be written recursively y(0) ∈ Rn arbitrary, y(0) = 0, y(k) = Ay(k−1) , k ≥ 1. (4.393) Observation 4.32 If the roots of the characteristic polynomial are real and distinct, then Krylov’s method also leads to the corresponding eigenvectors. Indeed, the n eigenvectors x1, . . . , xn form a basis in Rn ; then any vector of Rn may be written as a linear combination of these vectors of the basis. In particular, there exist the constants c1, c2, . . . , cn, not all zero, so that y(0) = c1x1 + c2x2 + · · · + cnxn. (4.394) The relations (4.393) are transcribed now in the form y(0) = Ay(0) = A(c1x1 + · · · + cnxn) = c1λ1x1 + c2λ2x2 + · · · + cnλnxn, y(2) = c1λ2 1x1 + c2λ2 2x2 + · · · + cnλ2 nxn, . . . , y(n) = c1λn 1x1 + c2λn 2x2 + · · · + cnλn nxn. (4.395)
  • 166. DETERMINATION OF EIGENVALUES AND EIGENVECTORS 157 Let us introduce the polynomials φi(λ) = λn−1 + q1iλn−2 + · · · + qn−1,i, i = 1, n, (4.396) hence, it follows that y(n−1) + q1iy(n−2) + · · · + qn−1,iy(0) = c1φi(λ1)x1 + · · · + cnφi(λn)xn. (4.397) On the other hand, we consider φi(λ) = P1(λ) λ − λi , (4.398) so that the coefficients qij , i = 1, n, j = 1, n − 1, are given by Horner’s schema q0j = 1, qij = λj qi−1,j + qi. (4.399) Under these conditions, φi(λj ) = 0 for any i and j with i = j (4.400) and φi(λj ) = P (λj ) = 0. (4.401) We thus obtain y(n−1) + q1iy(n−2) + · · · + qn−1,iy(0) = ciφi(λi)xi, i = 1, n (4.402) and if ci = 0, then we get the eigenvectors ciφi(λi)xi, i = 1, n. 4.6.3 Danilevski’s Method Let14 P (λ) = a11 − λ a12 · · · a1n−1 a1n a21 a22 − λ · · · a2n−1 a2n · · · · · · · · · · · · · · · an−1,1 an−1,2 · · · an−1,n−1 − λ an−1,n an1 an2 · · · an,n−1 an,n − λ = (−1)n λn − n i=1 piλn−i . (4.403) be the characteristic polynomial of the matrix A ∈ Mn(R). The idea of the method consists in the transformation of the matrix A − λIn =       a11 − λ a12 · · · a1n−1 a1n a21 a22 − λ · · · a2n−1 a2n · · · · · · · · · · · · · · · an−1,1 an−1,2 · · · an−1,n−1 − λ an−1,n an1 an2 · · · an,n−1 an,n − λ       (4.404) 14The method was stated by A. M. Danilevski (Danilevsky) in Russian in 1937, and then in 1959 it was translated into English.
  • 167. 158 LINEAR ALGEBRA into the matrix B − λIn =       p1 − λ p2 p3 · · · pn−2 pn−1 pn 1 −λ 0 · · · 0 0 0 0 1 −λ · · · 0 0 0 · · · · · · · · · · · · · · · · · · · · · 0 0 0 · · · 0 1 −λ       (4.405) of a normal Frobenius form.15 On the other hand, the determinant of the matrix B − λIn, calculated by developing after the first row, is det[B − λIn] = (−1)n−1 n i=1 piλn−i − λn = P (λ). (4.406) To bring the matrix A to the Frobenius form B, we proceed as follows: • We multiply the (n − 1)th column of the matrix A by an1/an,n−1, an2/an,n−1, . . . , an,n−2/an,n−1, ann/an,n−1, respectively, and subtract it from the columns 1, 2, . . . , n − 2, n, respectively. This is equivalent to the multiplication on the right of the matrix A by the matrix M1 =         1 0 · · · 0 0 0 0 1 · · · 0 0 0 · · · · · · · · · · · · · · · · · · − an,1 an,n−1 − an,2 an,n−1 · · · − an,n−2 an,n−1 1 an,n−1 − ann an,n−1 0 0 · · · 0 0 1         . (4.407) The inverse of the matrix M1 is M−1 1 =       1 0 · · · 0 0 0 0 1 · · · 0 0 0 · · · · · · · · · · · · · · · · · · an1 an2 · · · an,n−2 an,n−1 an,n 0 0 · · · 0 0 1       . (4.408) • To obtain a similar matrix, we must consider, in the following step, the matrix A2 = M−1 1 A1M1, A1 = A. (4.409) • the procedure is repeated for the (n − 1)th row and the matrix A2 until we obtain the (n − 1)th row of the Frobenius matrix; • the procedure continues until the second row, when the Frobenius matrix directly results. Observation 4.33 If the element ai,i−1 is equal to zero (this means, on the computer, |ai,i−1| < ε, ε given a priori), then one searches on the row i for a nonzero element among ai1, ai2, . . . , ai,i−2, 15This form was introduced by Ferdinand Georg Frobenius (1849–1917).
  • 168. DETERMINATION OF EIGENVALUES AND EIGENVECTORS 159 let it be aij , j < i − 1, adding the columns i and j of the initial matrix. This means multiplication on the right by the matrix M∗ ij =           1 0 0 · · · 0 · · · 0 · · · 0 · · · 0 0 1 0 · · · 0 · · · 0 · · · 0 · · · 0 0 0 1 · · · 0 · · · 0 · · · 0 · · · 0 · · · · · · · · · · · · · · · · · · . . . · · · · · · · · · · · · 0 0 0 · · · m∗ jj = 1 · · · 0 · · · m∗ ji = 1 · · · 0 · · · · · · · · · · · · · · · · · · . . . · · · · · · · · · · · · 0 0 0 · · · 0 · · · 0 · · · 0 · · · 1           , (4.410) the inverse of which is (M∗ ij )−1 =           1 0 0 . . . 0 . . . 0 . . . 0 . . . 0 0 1 0 . . . 0 . . . 0 . . . 0 . . . 0 0 0 1 . . . 0 . . . 0 . . . 0 . . . 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0 0 0 . . . m∗ jj −1 = 1 . . . 0 . . . (m∗ ji )−1 = −1 . . . 0 0 0 0 . . . 0 . . . 0 . . . 0 . . . 1           . (4.411) Observation 4.34 (i) If y is an eigenvector of the Frobenius matrix B, then the eigenvector of the matrix A is x = M1M2 · · · Mn−1y, (4.412) where we suppose that, by passing from the matrix A to the Frobenius matrix B, addition of columns have not been necessary (if not, matrices of the form (4.410) would appear in the product (4.412)). (ii) Let us consider that the Frobenius matrix has distinct eigenvalues and let λ be one such value (which is eigenvalue for the matrix A too, the matrix A also having distinct eigenvalues). The eigenvector y, corresponding to the eigenvalue λ for the Frobenius matrix, satisfies the relation       p1 p2 . . . pn−1 pn 1 0 . . . 0 0 0 1 . . . 0 0 . . . . . . . . . . . . . . . 0 0 . . . 1 0             y1 y2 y3 . . . yn       = λ       y1 y2 y3 . . . yn       , (4.413) from which yn−j = λyn−j+1, j = 1, n − 1 (4.414) and n i=1 piyi = λy1. (4.415) The relation (4.414) leads to yn−1 = λyn, yn−2 = λyn−1 = λ2 yn, . . . , y1 = λn−1 yn (4.416)
  • 169. 160 LINEAR ALGEBRA and, by replacing in (4.415), we obtain yn λn − n i=1 piλn−i = 0; (4.417) hence, the characteristic polynomial of the matrix A is the same as that of the Frobenius matrix B. Moreover, because yn = 0 (if not, it would follow that y = 0), one obtains also the eigenvector of the Frobenius matrix B, y = [λn−1 λn−2 · · · λ 1]T , (4.418) where we have supposed that yn = 1. Observation 4.35 To reduce the errors in calculation, we usually consider as pivot not the element ai,i−1 but the greatest element in modulus from among ai1, ai2, . . . , ai,i−1. Let that element be aij , that is |aij | = max k=1,i−1 |aik |. (4.419) A commutation one into the other of the columns i and j is necessary, under these conditions; one thus uses a matrix Pij = P −1 ij =              1 0 . . . 0 . . . 0 . . . 0 0 1 · · · 0 · · · 0 · · · 0 · · · · · · · · · · · · · · · · · · · · · · · · 0 0 · · · pii = 0 · · · pij = 1 · · · 0 · · · · · · · · · · · · · · · · · · · · · · · · 0 0 · · · pji = 1 · · · pjj = 0 · · · 0 · · · · · · · · · · · · · · · · · · · · · · · · 0 0 · · · 0 · · · 0 · · · 1              . (4.420) Observation 4.36 If at a certain point, all the elements a(n+1−i) ij , j = 1, i − 1, vanish, that is, at the step n + 1 − i we will not be able to find a pivot on the row i, then the determinant of the matrix A is written, according to Laplace’s theorem, as the product of two determinants and the matrix A is decomposed into blocks. 4.6.4 The Direct Power Method Let us consider the matrix A ∈ Mn(R) for which we suppose that the eigenvalues are distinct and ordered as follows: |λ1| > |λ2| > · · · > |λn|. (4.421) The n eigenvectors x1, x2, . . . , xn, corresponding to the eigenvalues λ1, λ2, . . . , λn are linearly independent, hence they form a basis in Rn . Let y ∈ Rn be arbitrary. Under these conditions, y has a unique representation with respect to the basis’ vectors x1, . . . , xn; hence there exist the real constants a1, . . . , an, uniquely determinate, so that y = n j=1 aj xj . (4.422)
  • 170. DETERMINATION OF EIGENVALUES AND EIGENVECTORS 161 On the other hand, Ay = A n j=1 aj xj = n j=1 aj (Axj ) (4.423) and because Axj = λj xj , j = 1, n, (4.424) it results in the representation Ay = n j=1 aj λj xj . (4.425) Analogically, we obtain the relations A2 y = n j=1 aj λ2 j xj , (4.426) A3 y = n j=1 aj λ3 j xj (4.427) and, in general, An y = n j=1 aj λn j xj (4.428) for any m ∈ N∗ . Let us denote y(m) = Am y = [y(m) 1 y(m) 2 · · · y(m) n ]T ; (4.429) the relation (4.428) becomes y(m) = n j=1 aj λm j xj . (4.430) Let V be the subspace of Rn generated by the set of vectors Y = {y(1) , y(2) , . . . , y(m) , . . . } (4.431) and let B B = {e1, e2, . . . , en}. (4.432) be a basis of it or an extension of one of its bases in Rn . Observation 4.37 (i) All the previous considerations are valid for y = 0. Obviously, if y = 0, then y(m) = 0 for any m ∈ N∗ . Moreover, if A = 0, then y(m) = 0 for any m ∈ N∗ , irrespective of the y initially chosen. We will suppose that A = 0 and y = 0.
  • 171. 162 LINEAR ALGEBRA (ii) Obviously, the space Y may have a dimension less than n. As y(m) ∈ Rn for any m ∈ N∗, it follows that Y ⊂ Rn and dim Y ≤ dim Rn . If dim Y = n, then, obviously, B is given by the formula (4.432). If dim Y < n, then one can add to the basis’ vectors, let us say in terms of k, n − k vectors to form the basis B in Rn . As B is such a basis, it follows that any vector of Rn may be written in the form of a unique linear combination of vectors of B. In particular, xj = n i=1 xij ej , j = 1, n. (4.433) Under these conditions, the vector y(m) becomes y(m) = n j=1 aj λm j n i=1 xij ei = n i=1  ei   n j=1 aj xij λm j    . (4.434) But n j=1 aj xij λm j = y(m) i , (4.435) so that y(m) i = n j=1 aj xij λm j . (4.436) Writing the previous relation for m + 1, y(m+1) i = n j=1 aj xij λm+1 j , (4.437) and making the ratio of (4.436) to (4.437), we obtain y(m+1) i y(m) i = n j=1 aj xij λm+1 j n j=1 aj xij λm j . (4.438) On the other hand, y(m+1) i = n j=1 aj xij λm+1 j = a1xi1λm+1 1 + · · · + anxinλm+1 n = λm+1 1 a1xi1 + a2xi2 λ2 λ1 m+1 + · · · + anxin λn λ1 m+1 (4.439) and, analogically, y(m) i = λm 1 a1xi1 + a2xi2 λ2 λ1 m + · · · + anxin λn λ1 m . (4.440)
  • 172. DETERMINATION OF EIGENVALUES AND EIGENVECTORS 163 Taking into account the relations (4.421), (4.439), and (4.440) and making m → ∞ in the relation (4.438), we get lim m→∞ y(m+1) i y(m) i = λ1. (4.441) Observation 4.38 (i) The formula (4.441) suggests that the index i, 1 ≤ i ≤ n, chosen for the ratio y(m+1) i /y(m) i , does not matter because we obtain λ1, as the limit. The statement is erroneous. (ii) It is also possible that the limit in the relation (4.441) is infinite or does not exist, which may lead to erroneous values for the approximation of λ1. (iii) It follows from (i) and (ii) that the method is sensitive to the choice of the start vector y. (iv) Instead of the ratio (4.441), we may choose λ1 = lim m→∞ n i=1 y(m+1) i n i=1 y(m) i , (4.442) so as to obtain the approximate formula λ1 ≈ n i=1 y(m+1) i n i=1 y(m) i . (4.443) (v) The procedure may be accelerated with regard to the convergence, by using powers of 2 as values of m, so that A2 = AA, A4 = A2 A2 , . . . , A2k = A2k−1 A2k−1 . (4.444) The value of λ1 is given by the ratio λ1 = lim k→∞ y(2k) i y(2k−1) i . (4.445) (vi) The vector y(m) = Am y (4.446) is the approximate value of the eigenvector of the matrix A, associated with the eigenvalue λ1. Indeed, one may write Am y = a1λm 1 x1 + n j=2 aj λm j xj = a1λm 1    x1 + n j=2 aj a1 λj λ1 m xj    . (4.447) But lim m→∞ λj λ1 m = 0, (4.448) It follows Am y ≈ a1λm 1 x1, (4.449) hence the vector Amy differs from the eigenvector x1 only by a multiplicative factor.
  • 173. 164 LINEAR ALGEBRA (vii) One can also choose λ1 = 1 n n i=1 y(m+1) i y(m) i . (4.450) If the greatest modulus of the roots is multiple, of order of multiplicity p, that is, |λ1| = |λ2| = · · · = |λp| > |λp+1| > · · · > |λn|, (4.451) then y(m+1) i y(m) i = λm+1 1 p j=1 aj xij + n j=p+1 aj xij λm+1 j λm 1 p j=1 aj xij + n j=p+1 aj xij λm j . (4.452) Assuming that p j=1 aj xij = 0, (4.453) we obtain y(m+1) i y(m) i = λ1 1 + n j=p+1        aj xij p j=1 aj xij λj λ1 m+1        1 + n j=p+1        aj xij p j=1 aj xij λj λ1 m        . (4.454) Passing to the limit for m and taking into account that (λj /λ1)m → 0 for m → ∞, we obtain lim m→∞ y(m+1) i y(m) i = λ1. (4.455) Now, Amy = y(m) is one of the eigenvectors associated with the eigenvalue λ1. Observation 4.39 (i) Let us form the sequence of matrices A, A2 , A22 , . . . , A2k . As n i=1 λm i = Tr(Am ), m = 2k , (4.456) where Tr(·) denotes the trace of (·), we obtain the equality λm 1 + λm 2 + · · · + λm n = λm 1 1 + λ2 λ1 m + · · · + λn λ1 m = Tr(Am ) (4.457)
for the simple eigenvalue $\lambda_1$. Passing to the limit for $m \to \infty$, it follows that
$$\mathrm{Tr}(A^m) \approx \lambda_1^m, \qquad (4.458)$$
from which
$$\lambda_1 = \sqrt[m]{\mathrm{Tr}(A^m)}. \qquad (4.459)$$
(ii) If now we write
$$A^{m+1} = A^m \cdot A, \qquad (4.460)$$
then
$$\lambda_1^{m+1} + \cdots + \lambda_n^{m+1} = \mathrm{Tr}(A^{m+1}), \qquad (4.461)$$
$$\lambda_1^{m} + \cdots + \lambda_n^{m} = \mathrm{Tr}(A^{m}). \qquad (4.462)$$
Dividing the last two relations and making $m \to \infty$, it follows that
$$\lambda_1 = \lim_{m \to \infty} \frac{\mathrm{Tr}(A^{m+1})}{\mathrm{Tr}(A^{m})}. \qquad (4.463)$$

4.6.5 The Inverse Power Method

The inverse power method is used to find the smallest eigenvalue in modulus of the matrix $A \in \mathcal{M}_n(\mathbb{R})$, in the case in which $A$ is nonsingular. In this case, $\det A \neq 0$ and, in the characteristic polynomial
$$P(\lambda) = (-1)^n \lambda^n + p_{n-1} \lambda^{n-1} + \cdots + p_1 \lambda + p_0, \qquad (4.464)$$
the free term is nonzero,
$$p_0 = \det A \neq 0. \qquad (4.465)$$
Hence, $\lambda = 0$ is not an eigenvalue of the matrix $A$. Let $x$ be an eigenvector corresponding to the eigenvalue $\lambda$ of the matrix $A$. One can successively write ($\lambda \neq 0$)
$$x = \lambda^{-1} \lambda x = \lambda^{-1} A x, \qquad (4.466)$$
from which
$$A^{-1} x = \lambda^{-1} x; \qquad (4.467)$$
hence, the eigenvalues of the inverse $A^{-1}$ are the inverses of the eigenvalues of the original matrix $A$. Thus, if $\lambda_1$ is the smallest eigenvalue in modulus of the matrix $A$, that is,
$$0 < |\lambda_1| < |\lambda_2| \le \cdots \le |\lambda_n|, \qquad (4.468)$$
then $1/\lambda_1$ is the greatest eigenvalue in modulus of $A^{-1}$, and we can use the direct power method for the matrix $A^{-1}$. Obviously, all the commentaries and discussions made for the direct power method remain valid.
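The direct and inverse power iterations lend themselves to a short program. The sketch below only illustrates the procedure described above (the function names, the tolerance, and the rescaling of $y$ at each step are our choices, not the book's); it assumes the dominant eigenvalue is simple and that the start vector is not deficient in the sense of Observation 4.38.

    import numpy as np

    def power_method(A, y0=None, tol=1e-10, max_iter=500):
        """Direct power method: approximate lambda_1 by the ratio (4.441)."""
        n = A.shape[0]
        y = np.ones(n) if y0 is None else np.asarray(y0, dtype=float)
        lam_old = 0.0
        for _ in range(max_iter):
            z = A @ y
            i = int(np.argmax(np.abs(y)))   # a component with y_i != 0
            lam = z[i] / y[i]               # y_i^(m+1) / y_i^(m)
            y = z / np.linalg.norm(z)       # rescale to avoid overflow
            if abs(lam - lam_old) < tol:
                break
            lam_old = lam
        return lam, y

    def inverse_power_method(A, **kw):
        """Smallest eigenvalue in modulus: power method applied to A^{-1}."""
        mu, x = power_method(np.linalg.inv(A), **kw)
        return 1.0 / mu, x

    A = np.array([[-1., -3., -4.], [8., 12., 14.], [-4., -5., -5.]])
    print(power_method(A)[0])           # ~3.0 (cf. Example 4.8)
    print(inverse_power_method(A)[0])   # ~1.0

The acceleration of Observation 4.38(v) would replace the repeated matrix-vector products by successive squarings of $A$, at the cost of forming full matrix-matrix products.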
4.6.6 The Displacement Method

The idea of the displacement method is based on the observation that if the matrix $A$ has the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$, then the matrix $A - qI_n$, $A \in \mathcal{M}_n(\mathbb{R})$, $q \in \mathbb{R}$, has the eigenvalues $\lambda_1 - q, \lambda_2 - q, \ldots, \lambda_n - q$. Thus, one can also find eigenvalues other than the maximum or the minimum ones in modulus for the matrix $A$. Let us suppose that $\lambda_1$ is the maximum eigenvalue in modulus of the matrix $A$, while $\lambda_n$ is the minimum one. After a displacement $q$, considering the matrix $A - qI_n$, the maximum and the minimum eigenvalues in modulus of this new matrix are given by
$$\lambda_1' = \max_{1 \le i \le n} |\lambda_i - q|, \quad \lambda_n' = \min_{1 \le i \le n} |\lambda_i - q|. \qquad (4.469)$$

4.6.7 Leverrier's Method

Let $A \in \mathcal{M}_n(\mathbb{R})$, with the characteristic polynomial¹⁶
$$P(\lambda) = \det(\lambda I_n - A) = \lambda^n + p_1 \lambda^{n-1} + p_2 \lambda^{n-2} + \cdots + p_n. \qquad (4.470)$$
The roots of $P(\lambda)$ are $\lambda_1, \lambda_2, \ldots, \lambda_n$. Let us denote
$$S_k = \lambda_1^k + \lambda_2^k + \cdots + \lambda_n^k. \qquad (4.471)$$
The following Newton formulae are known:
$$S_k + S_{k-1} p_1 + \cdots + S_1 p_{k-1} = -k p_k. \qquad (4.472)$$
We obtain the relations
$$k = 1 \Rightarrow p_1 = -S_1, \quad k = 2 \Rightarrow p_2 = -\frac{1}{2}(S_2 + p_1 S_1), \quad \ldots, \quad k = n \Rightarrow p_n = -\frac{1}{n}(S_n + S_{n-1} p_1 + S_{n-2} p_2 + \cdots + S_1 p_{n-1}), \qquad (4.473)$$
for $k = 1, 2, \ldots, n$. On the other hand,
$$S_1 = \lambda_1 + \cdots + \lambda_n = \mathrm{Tr}(A), \quad S_2 = \lambda_1^2 + \cdots + \lambda_n^2 = \mathrm{Tr}(A^2), \quad \ldots, \quad S_k = \mathrm{Tr}(A^k), \quad \ldots, \quad S_n = \mathrm{Tr}(A^n). \qquad (4.474)$$
The coefficients $p_1, p_2, \ldots, p_n$ are given by the formulae (4.472) and (4.473).

4.6.8 The L–R (Left–Right) Method

This method is based on the fact that any matrix $A \in \mathcal{M}_n(\mathbb{R})$ may be decomposed as a product of two matrices,
$$A = LR, \qquad (4.475)$$
in which $L \in \mathcal{M}_n(\mathbb{R})$ is an inferior (lower) triangular matrix, while $R \in \mathcal{M}_n(\mathbb{R})$ is a superior (upper) triangular matrix. The decomposition leads to the sequence of matrices $A_1, A_2, \ldots, A_k$ in which
$$A_i = L_i R_i, \quad i = \overline{1, k}, \qquad (4.476)$$

¹⁶The method was named in the honor of Urbain Jean Joseph Le Verrier (Leverrier) (1811–1877).
  • 176. DETERMINATION OF EIGENVALUES AND EIGENVECTORS 167 where the matrices Li are inferior triangular, with the elements on the principal diagonal equal to unity l(i) jj = 1, j = 1, n, (4.477) while the matrices Ri are superior triangular. The recurrence formula of the sequence is given by Ai+1 = RiLi, A1 = A. (4.478) One obtains thus a similar transformation, because L1 = A1R−1 1 , A2 = R1L1 = R1A1R−1 1 = L−1 1 A1L1, A3 = R2L2 = R2R1A1(R2R1)−1 = (L1L2)−1 A1(L1L2), . . . , Ak = (Rk−1Rk−2 · · · R1)A1(Rk−1Rk−2 · · · R1)−1 = (L1L2 · · · Lk−1)−1 A1(L1L2 · · · Lk−1) (4.479) Moreover, all the matrices A1, . . . , Ak have the same eigenvalues. Observation 4.40 (i) The matrix Sk−1 = j=1 Lj (4.480) is an inferior triangular matrix with s(k−1) ii = 1 (4.481) and lim i→∞ Li = In, (4.482) where In is the unit matrix of order n, while the matrix Dk−1 = k−1 j=1 Rj (4.483) is a superior triangular matrix. (ii) The elements of the principal diagonal of the matrix Ri tend to the eigenvalues of the matrix A for i → ∞. (iii) The elements of the matrices Li and Ri are the solutions of the system of equations (4.476), that is, r1i = a1i, i = 1, n, rij = 0 for i > j, li1 = a1i a11 , i = 2, n, a11 = 0, rij = aij − i−1 k=1 lik rkj , i ≤ j, lij = 1 rii  aij − j−1 k=1 lik rkj   , rii = 0, i > j, lii = 1, i = 1, n, lij = 0, i < j. (4.484)
  • 177. 168 LINEAR ALGEBRA (iv) If the sequence A1, A2, . . . , Ak is convergent, then the matrix Ak is superior triangular and the elements situated on the principal diagonal are the eigenvalues of the matrix A, that is, a(k) ii = λi, i = 1, n. (4.485) (v) The condition for stopping the algorithm is given by Ak − Ak−1 < ε, (4.486) where ε is a positive error imposed a priori, while is one of the canonical norms of the matrix. 4.6.9 The Rotation Method The rotation method applies to the symmetric matrices A ∈ Mn(R) and supplies both the eigen- vectors and the eigenvalues of the matrix. The idea consists in the construction of sequence of matrices A0 = A, A1, . . . , Ap, . . . , obtained by the rule Ap+1 = R−1 ij ApRij , (4.487) the transformations being unitary and orthogonal. To do this, we choose the matrix Rij in the form of a rotation matrix Rij =                1 0 · · · 0 · · · 0 · · · 0 0 0 1 · · · 0 · · · 0 · · · 0 0 · · · · · · · · · · · · · · · · · · · · · · · · · · · 0 0 · · · cos α · · · − sin α · · · 0 0 · · · · · · · · · · · · · · · · · · · · · · · · · · · 0 0 · · · sin α · · · cos α · · · 0 0 · · · · · · · · · · · · · · · · · · · · · · · · · · · 0 0 · · · 0 · · · 0 · · · 1 0 0 0 · · · 0 · · · 0 · · · 0 1                , (4.488) that is, a unitary matrix in which the elements rii , rij , rji , rjj have been modified in the form rii = cos α, rij = − sin α, rji = sin α, rjj = cos α. Obviously, R−1 ij = RT ij . (4.489) By multiplying a matrix Ap by R−1 ij on the left and by Rij on the right, respectively, Ap+1 = R−1 ij ApRij , (4.490) we obtain a new matrix, which has the property a (p+1) ij = a (p+1) ji = 0, (4.491) that is, two new extradiagonal elements equal to zero have been created.
On the other hand, the Euclidean norm $\| \cdot \|_k$ remains unchanged under similarity transformations with rotation matrices, so that
$$\sum_{k \neq l} \left( a_{kl}^{(p+1)} \right)^2 = \sum_{k \neq l} \left( a_{kl}^{(p)} \right)^2 - 2 \left( a_{ij}^{(p)} \right)^2 + \frac{1}{2} \left[ \left( a_{jj}^{(p)} - a_{ii}^{(p)} \right) \sin 2\alpha + 2 a_{ij}^{(p)} \cos 2\alpha \right]^2. \qquad (4.492)$$
It follows therefore that the Euclidean norm of the new matrix, calculated only with the extradiagonal elements, will diminish the most if
$$\left| a_{ij}^{(p)} \right| = \max_{k \neq l} \left| a_{kl}^{(p)} \right| \qquad (4.493)$$
and
$$\tan 2\alpha = \frac{2 a_{ij}^{(p)}}{a_{ii}^{(p)} - a_{jj}^{(p)}}, \quad |\alpha| \le \frac{\pi}{4}, \qquad (4.494)$$
the latter condition annulling the square bracket in (4.492). If we denote such a norm by $\| \cdot \|_k$, then
$$\left[ \| A_{p+1} \|_k \right]^2 = \left[ \| A_p \|_k \right]^2 - 2 \left( a_{i_p j_p}^{(p)} \right)^2 \qquad (4.495)$$
and, furthermore,
$$\left( a_{i_p j_p}^{(p)} \right)^2 \ge \frac{\left[ \| A_p \|_k \right]^2}{n(n-1)}, \qquad (4.496)$$
because $a_{i_p j_p}^{(p)}$ is the maximal element in modulus out of the principal diagonal in the matrix $A_p$. This results in the sequence of inequalities
$$\left[ \| A_{p+1} \|_k \right]^2 \le \left[ \| A_p \|_k \right]^2 \left( 1 - \frac{2}{n(n-1)} \right) \le \cdots \le \left[ \| A_0 \|_k \right]^2 \left( 1 - \frac{2}{n(n-1)} \right)^{p+1}, \qquad (4.497)$$
hence
$$\lim_{p \to \infty} \| A_{p+1} \|_k = 0. \qquad (4.498)$$
One obtains thus a matrix $A^*$, all the extradiagonal elements of which are equal to zero, while on the principal diagonal it has the eigenvalues of the matrix $A$. Moreover, for a matrix $A_p$, $p \in \mathbb{N}$, the elements of the principal diagonal approximate the eigenvalues of the matrix $A$, while the columns of the matrix
$$R = R_{i_1 j_1} R_{i_2 j_2} \cdots R_{i_{p-1} j_{p-1}} \qquad (4.499)$$
approximate the eigenvectors of the matrix $A$.

4.7 QR DECOMPOSITION

Definition 4.13 Let $v \in \mathcal{M}_{n,1}(\mathbb{R})$, $v \neq 0$. The matrix
$$H = I_n - \frac{2 v v^T}{v^T v} \qquad (4.500)$$
is called the Householder¹⁷ reflexion or Householder matrix or Householder transformation, while the vector $v$ is called the Householder vector, $I_n$ being the unit matrix of order $n$.

¹⁷Introduced by Alston Scott Householder (1904–1993) in 1958.
Let a vector $x \in \mathcal{M}_{n,1}(\mathbb{R})$,
$$x = [x_1 \ x_2 \ \cdots \ x_n]^T, \qquad (4.501)$$
and let us calculate
$$Hx = \left( I_n - \frac{2 v v^T}{v^T v} \right) x = x - \frac{2 v^T x}{v^T v} v. \qquad (4.502)$$
Let $e_1$ be the first column of the unit matrix $I_n$. If $Hx$ is to lie in the vector subspace generated by $e_1$, then it follows that $v$ lies in the vector subspace generated by $x$ and $e_1$. Let us take
$$v = x + \lambda e_1, \qquad (4.503)$$
where $\lambda \in \mathbb{R}$. It follows that
$$v^T x = x^T x + \lambda x_1, \qquad (4.504)$$
$$v^T v = x^T x + 2 \lambda x_1 + \lambda^2, \qquad (4.505)$$
$$Hx = \left( 1 - \frac{2 (x^T x + \lambda x_1)}{x^T x + 2 \lambda x_1 + \lambda^2} \right) x - 2 \lambda \frac{v^T x}{v^T v} e_1; \qquad (4.506)$$
the condition that $Hx$ be in the vector subspace generated by $e_1$ leads to
$$1 - \frac{2 (x^T x + \lambda x_1)}{x^T x + 2 \lambda x_1 + \lambda^2} = 0, \qquad (4.507)$$
that is,
$$\lambda^2 = x^T x, \quad \lambda = \pm \sqrt{x^T x}. \qquad (4.508)$$

Definition 4.14 Let $A \in \mathcal{M}_{m,n}(\mathbb{R})$. We call the following expression the QR factorization of the matrix $A$:
$$A = QR, \qquad (4.509)$$
where $Q \in \mathcal{M}_m(\mathbb{R})$ is an orthogonal matrix,
$$Q^T Q = I_m, \qquad (4.510)$$
and $R \in \mathcal{M}_{m,n}(\mathbb{R})$ is an upper triangular matrix.

Let
$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ \cdots & \cdots & \cdots & \cdots & \cdots \\ a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn} \end{bmatrix}. \qquad (4.511)$$
We find a Householder matrix $H_1 \in \mathcal{M}_m(\mathbb{R})$, so that
$$H_1 A = \begin{bmatrix} a_{11}' & a_{12}' & a_{13}' & \cdots & a_{1n}' \\ 0 & a_{22}' & a_{23}' & \cdots & a_{2n}' \\ \cdots & \cdots & \cdots & \cdots & \cdots \\ 0 & a_{m2}' & a_{m3}' & \cdots & a_{mn}' \end{bmatrix}. \qquad (4.512)$$
  • 180. QR DECOMPOSITION 171 We determine now a new Householder matrix H2 ∈ Mm−1(R) with the property H2      a22 a23 ... am2      =      a22 0 ... 0      (4.513) and choose H2 = 1 O 0 H2 . (4.514) Thus, H2H1A =       a11 a12 a13 · · · a1n 0 a22 a23 · · · a2n 0 0 a33 · · · a3n · · · · · · · · · · · · · · · 0 0 am3 · · · amn       . (4.515) The procedure is continuing with the determination of the matrix H3 with the property H3      a33 a43 ... am3      =      a33 0 ... 0      (4.516) and with the choice of the matrix H3 = I2 0 0 H3 . (4.517) Thus, we determine the Householder matrices H1, H2, H3, . . . , Hp, where p = min{m, n}. Moreover, R = HpHp−1 . . . H2H1A (4.518) and Q = H1H2 . . . Hp−1Hp. (4.519) Another possibility to obtain the QR decomposition is by the use of the Givens rotation matrices. Definition 4.15 The matrix denoted by G(i, j, θ), which is different from the unit matrix In, and whose elements are given by gii = cos θ, gij = sin θ, gji = − sin θ, gjj = cos θ. (4.520) is called the Givens rotation18 matrix of order n. 18Defined by James Wallace Givens Jr. (1910–1993) in 1950.
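Before examining the action of a Givens rotation, we note that the Householder construction above condenses into a few lines of code. The sketch below is illustrative only (the routine name is ours, and the sign of $\lambda$ in (4.508) is taken equal to that of $x_1$, which avoids cancellation in $v$).

    import numpy as np

    def householder_qr(A):
        """QR factorization with Householder reflexions: returns Q (m x m
        orthogonal) and R (m x n upper triangular) with A = Q R."""
        A = np.asarray(A, dtype=float)
        m, n = A.shape
        R, Q = A.copy(), np.eye(m)
        for k in range(min(m, n)):
            x = R[k:, k]
            v = x.copy()
            v[0] += np.copysign(np.linalg.norm(x), x[0])  # v = x + lambda e1
            if v @ v == 0.0:
                continue                                  # column already reduced
            H = np.eye(m - k) - 2.0 * np.outer(v, v) / (v @ v)   # formula (4.500)
            R[k:, :] = H @ R[k:, :]
            Q[:, k:] = Q[:, k:] @ H
        return Q, R

    A = np.array([[2., 1., -1., 3.], [0., 3., 2., 5.], [2., 4., 1., 8.]])
    Q, R = householder_qr(A)
    print(np.allclose(Q @ R, A), np.allclose(np.tril(R, -1), 0.0))  # True True

The test matrix is that of Example 4.6 below; the signs of individual rows of $R$ (and of the corresponding columns of $Q$) may differ between implementations, the factorization being unique only up to such signs.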
Let $y$ be the product
$$y = [y_1 \ y_2 \ \cdots \ y_n]^T = G(i, j, \theta) \, [x_1 \ x_2 \ \cdots \ x_n]^T. \qquad (4.521)$$
It follows that
$$y_k = \begin{cases} x_i \cos\theta - x_j \sin\theta, & \text{for } k = i, \\ x_i \sin\theta + x_j \cos\theta, & \text{for } k = j, \\ x_k, & \text{otherwise}, \end{cases} \qquad (4.522)$$
so that $y_j = 0$ for
$$\cos\theta = \frac{x_i}{\sqrt{x_i^2 + x_j^2}}, \quad \sin\theta = -\frac{x_j}{\sqrt{x_i^2 + x_j^2}}. \qquad (4.523)$$
Multiplying the matrix $A$ on the left by various Givens matrices $G_1^T, G_2^T, \ldots, G_r^T$ results, in a finite number of steps, in the matrix
$$R = G_r^T G_{r-1}^T \cdots G_2^T G_1^T A, \qquad (4.524)$$
an upper triangular matrix. The matrix $Q$ is given by
$$Q = G_1 G_2 \cdots G_{r-1} G_r. \qquad (4.525)$$

4.8 THE SINGULAR VALUE DECOMPOSITION (SVD)

Definition 4.16
(i) Let $x_1, x_2, \ldots, x_p$ be vectors in $\mathbb{R}^n$, $p \le n$. We say that the vectors $x_i$, $i = \overline{1, p}$, are orthogonal if
$$x_i^T x_j = 0 \qquad (4.526)$$
for any $1 \le i, j \le p$, $i \neq j$.
(ii) If, in addition,
$$x_i^T x_i = 1 \qquad (4.527)$$
for any $1 \le i \le p$, then the vectors $x_1, x_2, \ldots, x_p$ are called orthonormal vectors.

Observation 4.41
(i) If $x_i$, $i = \overline{1, p}$, are orthogonal, then they are also linearly independent.
(ii) The system of orthogonal vectors $x_i$, $i = \overline{1, p}$, of $\mathbb{R}^n$, $p < n$, may be completed by the vectors $x_{p+1}, \ldots, x_n$, so that the new system of vectors $x_1, \ldots, x_n$ is orthogonal.
(iii) If $A_1 \in \mathcal{M}_{n,p}(\mathbb{R})$, $p < n$, has orthonormal columns, then there exists $A_2 \in \mathcal{M}_{n,n-p}(\mathbb{R})$ so that the matrix
$$A = [A_1 \ A_2] \qquad (4.528)$$
has orthonormal columns.
Theorem 4.2 (Singular Value Decomposition (SVD)¹⁹). If $A \in \mathcal{M}_{m,n}(\mathbb{R})$, then there exist orthogonal matrices $U \in \mathcal{M}_m(\mathbb{R})$ and $V \in \mathcal{M}_n(\mathbb{R})$ so that
$$U^T A V = \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \cdots & \cdots & \cdots & \cdots \\ 0 & 0 & \cdots & \sigma_p \end{bmatrix}, \qquad (4.529)$$
where $p = \min\{m, n\}$.

Demonstration. Let $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$ be two vectors of unitary norm that fulfill the relation
$$Ax = \|A\|_2 \, y = \sigma y. \qquad (4.530)$$
Taking into account the previous observation, we know that matrices $V_2 \in \mathcal{M}_{n,n-1}(\mathbb{R})$ and $U_2 \in \mathcal{M}_{m,m-1}(\mathbb{R})$ exist, so that the matrices $V = [x \ V_2] \in \mathcal{M}_n(\mathbb{R})$ and $U = [y \ U_2] \in \mathcal{M}_m(\mathbb{R})$ are orthogonal. On the other hand,
$$U^T A V = \begin{bmatrix} \sigma & w^T \\ 0 & B \end{bmatrix} = A_1. \qquad (4.531)$$
But
$$\left\| A_1 \begin{bmatrix} \sigma \\ w \end{bmatrix} \right\|_2^2 \ge (\sigma^2 + w^T w)^2 \qquad (4.532)$$
and
$$\left\| A_1 \begin{bmatrix} \sigma \\ w \end{bmatrix} \right\|_2 \le \|A_1\|_2 \left\| \begin{bmatrix} \sigma \\ w \end{bmatrix} \right\|_2. \qquad (4.533)$$
Moreover,
$$\|X\|_2 = \|X^T\|_2 \qquad (4.534)$$
for any matrix $X$, and
$$\left\| \begin{bmatrix} \sigma \\ w \end{bmatrix} \right\|_2^2 = \begin{bmatrix} \sigma & w^T \end{bmatrix} \begin{bmatrix} \sigma \\ w \end{bmatrix} = \sigma^2 + w^T w, \qquad (4.535)$$
so that, combining (4.532), (4.533), and (4.535),
$$\|A_1\|_2^2 \ge \sigma^2 + w^T w. \qquad (4.536)$$
But $U$ and $V$ are orthogonal; we have
$$\|U^T A V\|_2 = \|A\|_2 \qquad (4.537)$$
and we deduce
$$\sigma^2 = \|A\|_2^2 = \|U^T A V\|_2^2 = \|A_1\|_2^2. \qquad (4.538)$$

¹⁹The algorithm for SVD was given by Gene Howard Golub (1932–2007) and William Morton Kahan (1933–) in 1970.
Comparing relations (4.536) and (4.538), it follows that
$$w^T w = \|w\|_2^2 = 0, \qquad (4.539)$$
and hence
$$w = 0. \qquad (4.540)$$
The procedure is continued for the matrix $B \in \mathcal{M}_{m-1,n-1}(\mathbb{R})$ and so on, the theorem being proved.

In the demonstration, we used the norm $\| \cdot \|_2$ defined as follows:
• for $x \in \mathbb{R}^n$,
$$\|x\|_2 = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}; \qquad (4.541)$$
• for $A \in \mathcal{M}_{m,n}(\mathbb{R})$,
$$\|A\|_2 = \sup_{x \neq 0} \frac{\|Ax\|_2}{\|x\|_2} = \max_{\|x\|_2 = 1} \|Ax\|_2. \qquad (4.542)$$

4.9 USE OF THE LEAST SQUARES METHOD IN SOLVING THE LINEAR OVERDETERMINED SYSTEMS

We consider the linear system
$$Ax = b, \qquad (4.543)$$
where $A \in \mathcal{M}_{m,n}(\mathbb{R})$, $m \ge n$, $x \in \mathcal{M}_{n,1}(\mathbb{R})$, $b \in \mathcal{M}_{m,1}(\mathbb{R})$.

Definition 4.17 System (4.543) is called an overdetermined system.

Obviously, system (4.543) has an exact solution only in some particular cases. An idea of solving consists in finding the vector $x$ so as to minimize the expression $\|Ax - b\|$, where $\| \cdot \|$ is one of the norms of the matrix, that is,
$$\min_{x \in \mathcal{M}_{n,1}(\mathbb{R})} \|Ax - b\|. \qquad (4.544)$$
It is obvious that the answer depends on the chosen norm. Usually, we consider the norm $\| \cdot \|_2$, which leads to the least squares method.

To begin, let us consider the case in which the columns of the matrix $A$ are linearly independent. We start from the equality
$$\|A(x + \alpha z) - b\|_2^2 = \|Ax - b\|_2^2 + 2 \alpha z^T A^T (Ax - b) + \alpha^2 \|Az\|_2^2, \qquad (4.545)$$
where $\alpha$ is a real parameter, while $z \in \mathcal{M}_{n,1}(\mathbb{R})$. If $x$ is a solution of relation (4.544), then
$$A^T (Ax - b) = 0. \qquad (4.546)$$
Indeed, if relation (4.546) is not satisfied, then we choose
$$z = -A^T (Ax - b) \qquad (4.547)$$
  • 184. USE OF THE LEAST SQUARES METHOD 175 and from equation (4.545) we get A(x + αz) − b 2 2 < Ax − b 2 2, (4.548) that is, x does not minimize expression (4.544), which is absurd. It follows therefore that if the columns of the matrix A are linearly independent, then the solution of system (4.543) in the sense of the least squares, denoted by xLS, is obtained from the linear system AT AxLS = AT b. (4.549) Definition 4.18 (i) System (4.549) is called the system of normal equations. (ii) The expression rLS = b − AxLS (4.550) is called the minimum residual. If A has the QR decomposition, where Q ∈ Mm(R) is orthogonal, while R is upper triangular, then QT A = R =           r11 r12 · · · r1n 0 r22 · · · r2n · · · · · · · · · · · · 0 0 · · · rnn 0 0 · · · 0 · · · · · · · · · · · · 0 0 · · · 0           . (4.551) We also denote QT b = c d , (4.552) where c = [c1 c2 · · · cn]T , d = [d1 d2 · · · dm−n]T . (4.553) Thus, it follows that Ax − b 2 2 = QT Ax − QT b 2 2 = R1x − c 2 2 + d 2 2, (4.554) with R1 =     r11 r12 · · · 0 0 r22 · · · r2n · · · · · · · · · · · · 0 0 · · · rnn     , R1 ∈ Mn(R). (4.555) As rank(A) = rank(R1) = n, (4.556) the solution of system (4.543) in the sense of the least squares is obtained from the system R1xLS = c. (4.557)
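For the full-rank case, the normal equations (4.549) and the QR-based system (4.557) are easy to compare numerically. A minimal sketch (the data are those of Example 4.9 below; the function names are ours):

    import numpy as np

    def lstsq_normal(A, b):
        """Least squares via the normal equations A^T A x = A^T b (4.549)."""
        return np.linalg.solve(A.T @ A, A.T @ b)

    def lstsq_qr(A, b):
        """Least squares via QR: solve R1 x = c as in (4.557)."""
        Q, R = np.linalg.qr(A)          # reduced QR, R is n x n
        return np.linalg.solve(R, Q.T @ b)

    A = np.array([[2., 3., 1., 3.], [1., -2., -1., 5.], [3., 6., 1., -2.],
                  [-2., -1., 6., 4.], [1., 2., 5., -7.]])
    b = np.array([9., 3., 8., 7., 1.])
    print(lstsq_normal(A, b))   # ~[1. 1. 1. 1.]
    print(lstsq_qr(A, b))       # same solution

The QR route avoids forming $A^T A$, whose conditioning is the square of that of $A$.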
  • 185. 176 LINEAR ALGEBRA The case in which the columns of the matrix A are not linearly independent is somewhat more complicated. Let us denote by x a solution of equation (4.544) and let z ∈ null(A). It follows that x + z is also a solution of equation (4.544), hence condition (4.544) does not have a unique solution. Moreover, the set of all x ∈ Mn,1(R) for which Ax − b 2 is minimum is a convex set. We define in this set xLS as being that x for which x 2 is minimum. Let us show that xLS is unique. We denote by Q and Z two orthogonal matrices for which QT AZ = T =           t11 t12 · · · t1r 0 · · · 0 t21 t22 · · · t2r 0 · · · 0 · · · · · · · · · · · · · · · · · · · · · tr1 tr2 · · · trr 0 · · · 0 0 0 · · · 0 0 · · · 0 · · · · · · · · · · · · · · · · · · · · · 0 0 · · · 0 0 · · · 0           , (4.558) where r = rank(A). Under these conditions, Ax − b 2 2 = (QT AZ)ZT x − QT b 2 2 = T1w − c 2 2 + d 2 2, (4.559) where ZT x = w y , QT b = c d , (4.560) w = [w1 w2 · · · wr ]T , y = [y1 y2 · · · yn−r ]T , c = [c1 c2 · · · cr ]T , d = [d1 d2 · · · dn−r ]T , (4.561) T1 =     t11 t12 · · · t1r t21 t22 · · · t2r · · · · · · · · · · · · tr1 tr2 · · · trr     . (4.562) If we choose x such that equation (4.559) be minimum, then w = T−1 1 c (4.563) and xLS = Z T−1 1 c 0 . (4.564) If we use SVD for the matrix A, then xLS = r i=1 uT i b σi vi, (4.565)
where
$$U^T A V = \Sigma = \begin{bmatrix} \sigma_1 & & & & \\ & \ddots & & & \\ & & \sigma_r & & \\ & & & 0 & \\ & & & & \ddots \end{bmatrix}, \quad \Sigma \in \mathcal{M}_{m,n}(\mathbb{R}), \qquad (4.566)$$
$$U = [u_1 \ u_2 \ \cdots \ u_m], \quad V = [v_1 \ v_2 \ \cdots \ v_n]. \qquad (4.567)$$

4.10 THE PSEUDO-INVERSE OF A MATRIX

Let $A \in \mathcal{M}_{m,n}(\mathbb{R})$ for which we know the SVD,
$$U^T A V = \Sigma \in \mathcal{M}_{m,n}(\mathbb{R}), \qquad (4.568)$$
and let $r = \mathrm{rank}(A)$.

Definition 4.19 The matrix $A^+ \in \mathcal{M}_{n,m}(\mathbb{R})$ is defined by
$$A^+ = V \Sigma^+ U^T, \qquad (4.569)$$
where $\Sigma^+ \in \mathcal{M}_{n,m}(\mathbb{R})$ and
$$\Sigma^+ = \begin{bmatrix} 1/\sigma_1 & & & & \\ & \ddots & & & \\ & & 1/\sigma_r & & \\ & & & 0 & \\ & & & & \ddots \end{bmatrix}. \qquad (4.570)$$
Let us observe that
$$x_{LS} = A^+ b; \qquad (4.571)$$
hence $A^+$ is the unique solution of the problem
$$\min_{X \in \mathcal{M}_{n,m}(\mathbb{R})} \|AX - I_m\|_k. \qquad (4.572)$$
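Formula (4.569) translates directly into code. The sketch below is illustrative (the function name is ours; the tolerance deciding which singular values count as nonzero is an arbitrary choice) and reproduces the pseudo-inverse of Example 4.10 below.

    import numpy as np

    def pseudo_inverse(A, tol=1e-12):
        """Pseudo-inverse A+ = V Sigma+ U^T, formula (4.569)."""
        U, s, Vt = np.linalg.svd(A)
        m, n = A.shape
        Splus = np.zeros((n, m))
        for i, si in enumerate(s):
            if si > tol:
                Splus[i, i] = 1.0 / si    # invert only nonzero singular values
        return Vt.T @ Splus @ U.T

    A = np.array([[1., 2.], [0., 2.]])
    print(pseudo_inverse(A))                                   # [[1. -1.] [0. 0.5]]
    print(np.allclose(pseudo_inverse(A), np.linalg.pinv(A)))   # True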
4.11 SOLVING OF THE UNDERDETERMINED LINEAR SYSTEMS

Definition 4.20 The linear system
$$Ax = b, \qquad (4.573)$$
where $A \in \mathcal{M}_{m,n}(\mathbb{R})$, $b \in \mathcal{M}_{m,1}(\mathbb{R})$, $x \in \mathcal{M}_{n,1}(\mathbb{R})$, and $m < n$, is called an underdetermined linear system.

Let us consider the QR decomposition of the matrix $A^T$,
$$A^T = QR = Q \begin{bmatrix} R_1 \\ 0_{n-m,m} \end{bmatrix}, \qquad (4.574)$$
where $R_1 \in \mathcal{M}_m(\mathbb{R})$, while $0_{n-m,m}$ is a matrix of $\mathcal{M}_{n-m,m}(\mathbb{R})$ with all elements equal to zero. System (4.573) is now written in the form
$$(QR)^T x = \begin{bmatrix} R_1^T & 0_{m,n-m} \end{bmatrix} \begin{bmatrix} z_1 \\ z_2 \end{bmatrix} = b, \qquad (4.575)$$
where $z_1 \in \mathcal{M}_{m,1}(\mathbb{R})$, $z_2 \in \mathcal{M}_{n-m,1}(\mathbb{R})$, and
$$Q^T x = \begin{bmatrix} z_1 \\ z_2 \end{bmatrix}. \qquad (4.576)$$
The minimum norm solution is obtained if we impose the condition $z_2 = 0$. In general, an underdetermined system either does not have a solution or has an infinite number of solutions.

4.12 NUMERICAL EXAMPLES

Example 4.1 Let us calculate the determinant of the matrix
$$A = \begin{bmatrix} 1 & 2 & -3 \\ 5 & 0 & 4 \\ 2 & 1 & 7 \end{bmatrix}. \qquad (4.577)$$
If we calculate the determinant by means of the definition, then we have to consider $3! = 6$ permutations. These permutations, together with their signs and the corresponding products, are given below.

Permutation        Sign    Product
p1 = (1, 2, 3)      +      P1 = a11 a22 a33 = 0
p2 = (1, 3, 2)      -      P2 = a11 a23 a32 = 4
p3 = (2, 1, 3)      -      P3 = a12 a21 a33 = 70
p4 = (2, 3, 1)      +      P4 = a12 a23 a31 = 16
p5 = (3, 1, 2)      +      P5 = a13 a21 a32 = -15
p6 = (3, 2, 1)      -      P6 = a13 a22 a31 = 0

We obtain
$$\det A = P_1 - P_2 - P_3 + P_4 + P_5 - P_6 = -73. \qquad (4.578)$$
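The computation can be checked mechanically. The sketch below (routine name ours) obtains the determinant by the pivoting elimination used in the equivalent-matrices variant that follows, which costs far fewer operations than the $n!$ products of the definition.

    import numpy as np

    def det_gauss(A):
        """Determinant by Gaussian elimination with partial pivoting;
        each row swap flips the sign, as in the hand computation below."""
        U = np.asarray(A, dtype=float).copy()
        n, sign = U.shape[0], 1.0
        for k in range(n - 1):
            p = k + int(np.argmax(np.abs(U[k:, k])))   # pivot row
            if U[p, k] == 0.0:
                return 0.0
            if p != k:
                U[[k, p]] = U[[p, k]]
                sign = -sign
            for i in range(k + 1, n):
                U[i, k:] -= U[i, k] / U[k, k] * U[k, k:]
        return sign * float(np.prod(np.diag(U)))

    A = np.array([[1., 2., -3.], [5., 0., 4.], [2., 1., 7.]])
    print(det_gauss(A))   # -73.0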
The same problem may be solved by means of equivalent matrices. Let us denote by $\Delta$ the required determinant, and let us commute rows 1 and 2 of the matrix $A$ with each other in order to realize the pivoting with the maximum element in modulus of column 1. We have
$$\Delta = - \begin{vmatrix} 5 & 0 & 4 \\ 1 & 2 & -3 \\ 2 & 1 & 7 \end{vmatrix}. \qquad (4.579)$$
We multiply row 1 by $-1/5$ and $-2/5$, and we add it to rows 2 and 3, respectively, obtaining
$$\Delta = - \begin{vmatrix} 5 & 0 & 4 \\ 0 & 2 & -19/5 \\ 0 & 1 & 27/5 \end{vmatrix}. \qquad (4.580)$$
We now multiply row 2 by $-1/2$ and we add it to row 3, obtaining
$$\Delta = - \begin{vmatrix} 5 & 0 & 4 \\ 0 & 2 & -19/5 \\ 0 & 0 & 73/10 \end{vmatrix} = -73. \qquad (4.581)$$

Example 4.2 Let us calculate the rank of the matrix
$$A = \begin{bmatrix} 1 & 2 & 3 & 0 \\ 3 & 5 & 8 & 1 \\ 6 & 11 & 17 & 1 \end{bmatrix}. \qquad (4.582)$$
We observe that the minor of second order
$$\Delta_2 = \begin{vmatrix} 1 & 2 \\ 3 & 5 \end{vmatrix} = -1 \qquad (4.583)$$
has a nonzero value; hence the rank of $A$ is at least equal to two. Let us now border this minor by elements so as to obtain all the minors of order 3. As a matter of fact, we have only two such minors, that is,
$$\Delta_3^{(1)} = \begin{vmatrix} 1 & 2 & 3 \\ 3 & 5 & 8 \\ 6 & 11 & 17 \end{vmatrix} = 0, \qquad (4.584)$$
$$\Delta_3^{(2)} = \begin{vmatrix} 1 & 2 & 0 \\ 3 & 5 & 1 \\ 6 & 11 & 1 \end{vmatrix} = 0, \qquad (4.585)$$
so it follows that
$$\mathrm{rank}\, A = 2. \qquad (4.586)$$
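The bordering of minors, like the elimination variant that follows, can be checked with a short program. The sketch below (routine name and tolerance are ours) reduces the matrix to row echelon form and counts the pivots.

    import numpy as np

    def rank_elimination(A, tol=1e-10):
        """Rank by reduction to echelon form, as in Example 4.2."""
        R = np.asarray(A, dtype=float).copy()
        m, n = R.shape
        rank, col = 0, 0
        while rank < m and col < n:
            p = rank + int(np.argmax(np.abs(R[rank:, col])))
            if abs(R[p, col]) < tol:
                col += 1                 # no pivot in this column
                continue
            R[[rank, p]] = R[[p, rank]]
            R[rank] /= R[rank, col]
            for i in range(m):
                if i != rank:
                    R[i] -= R[i, col] * R[rank]
            rank, col = rank + 1, col + 1
        return rank

    A = np.array([[1., 2., 3., 0.], [3., 5., 8., 1.], [6., 11., 17., 1.]])
    print(rank_elimination(A))   # 2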
  • 189. 180 LINEAR ALGEBRA We may solve this problem by using equivalent matrices too. Thus, the rank of the matrix A is the same with the rank of the matrix obtained from the matrix A by commuting rows 1 and 3 with each other, A =   1 2 3 0 3 5 8 1 6 11 17 1   ∼   6 11 17 1 3 5 8 1 1 2 3 0   . (4.587) We now multiply, in the new matrix, row 1 by −1/2 and −1/6, and add it to rows 2 and 3, respectively, obtaining A ∼      6 11 17 1 0 − 1 2 − 1 2 1 2 0 1 6 1 6 − 1 6      . (4.588) We now multiply the rows 2 and 3 by 2 and 6, respectively, obtaining A ∼   6 11 17 1 0 −1 −1 1 0 1 1 −1   . (4.589) We multiply column 1 by −11/6, by −17/6 and by −1/6 and add it to columns 2, 3, and 4, respectively, obtaining A ∼   6 0 0 0 0 −1 −1 1 0 1 1 −1   . (4.590) We add now the second row to the third one, resulting A ∼   6 0 0 0 0 −1 −1 −1 0 0 0 0   . (4.591) The last step consists in the subtraction of the second column from the third one and by addition of the second column to the fourth one, deducing A ∼   6 0 0 0 0 −1 0 0 0 0 0 0   = B. (4.592) To determine the rank of the matrix A it is now sufficient to number the non-zero elements of the principal quasi-diagonal of the matrix B, that is, the elements b11 = 6, b22 = −1 and b33 = 0. It follows that rank A = 2. (4.593) Example 4.3 Let the matrix A =   1 2 −1 0 3 4 5 6 −2   . (4.594) We pose the problem of calculating the inverse of this matrix. The direct method supposes the calculation of the determinant det A = 25 (4.595)
  • 190. NUMERICAL EXAMPLES 181 and of the minors 11 = 3 4 6 −2 = −30, 12 = 0 4 5 −2 = −20, 13 = 0 3 5 6 = −15, 21 = 2 −1 6 −2 = 2, 22 = 1 −1 5 −2 = 3, 23 = 1 2 5 6 = −4, 31 = 2 −1 3 4 = 11, 32 = 1 −1 0 4 = 4, 33 = 1 2 0 3 = 3, (4.596) from which A−1 = 1 25   −30 −2 11 20 3 −4 −15 4 3   . (4.597) We now pass on to the Gauss–Jordan method for which we construct the table 1 2 −1 0 3 4 5 6 −2 1 0 0 0 1 0 0 0 1 . (4.598) We commute rows 1 and 3 with each other, 5 6 −2 0 3 4 1 2 −1 0 0 1 0 1 0 1 0 0 , (4.599) we divide row 1 by 5, 1 6 5 − 2 5 0 3 4 1 2 −1 0 0 1 5 0 1 0 1 0 − 1 5 . (4.600) and then we subtract row 1 from row 3, obtaining 1 6 5 − 2 5 0 3 4 0 4 5 − 3 5 0 0 1 5 0 1 0 1 0 − 1 5 . (4.601) We now divide row 2 by 3, 1 6 5 − 2 5 0 1 4 3 0 4 5 − 3 5 0 0 1 5 0 1 3 0 1 0 − 1 5 , (4.602)
  • 191. 182 LINEAR ALGEBRA and then multiply the new row 2 by −6/5 and −4/5, and add the results to rows 1 and 3, respectively, obtaining 1 0 −2 0 1 4 3 0 0 − 3 5 0 − 2 5 1 5 0 1 3 0 1 − 14 5 − 1 5 . (4.603) Further, we divide the third row by −5/3, 1 0 −2 0 1 4 3 0 0 1 0 − 2 5 1 5 0 1 3 0 − 3 5 4 25 3 25 , (4.604) and multiply the new row 3 by 2 and −4/3, and add it to the rows 1 and 2, respectively, 1 0 0 0 1 0 0 0 1 − 6 5 − 2 25 11 25 4 5 3 25 − 4 25 − 3 5 4 25 3 25 . (4.605) We have thus, in the right part of table (4.605), the searched required inverse, given before in equation (4.597). We shall solve now the same problem by the method of partitioning the matrix A. If A = A1 A3 A2 A4 , A−1 = B = B1 B3 B2 B4 , (4.606) then we have B4 = (A4 − A2A−1 1 A3)−1 , ∈ B3 = −A−1 1 A3B4, B2 = −B4A2A−1 1 , B1 = A−1 1 − A−1 1 A3B2, (4.607) with the conditions that A4 − A2A−1 1 A3 and A1 be invertible matrices. Let us choose A1 = [1], A2 = 0 5 , A3 = [2 − 1], A4 = 3 4 6 −2 , (4.608)
  • 192. NUMERICAL EXAMPLES 183 from which A−1 1 = [1], (4.609) B4 = A4 − A2A−1 1 A3 = 3 4 6 −2 − 0 5 [1] 2 −1 = 3 4 −4 3 , (4.610) (A4 − A2A−1 1 A3)−1 = 1 25 3 −4 4 3 , (4.611) B3 = −[1] 2 −1 1 25 3 −4 4 3 = 1 25 [2 11], (4.612) B2 = − 1 25 3 −4 4 3 0 5 = 1 25 20 −15 , (4.613) B1 = −[1] − [1] 2 −1 1 25 20 −15 = − 6 5 . (4.614) We obtained thus the same inverse (4.597). To determine the inverse using the iterative method (Schulz) we shall consider an approximation B0 of A−1 , given by B0 =   −1.23 −0.1 0.46 0.77 0.13 −0.15 −0.62 0.17 0.11   . (4.615) We deduce C0 = I3 − AB0 =   0.07 0.01 −0.05 0.17 −0.07 0.01 0.29 0.06 −0.18   , (4.616) C0 ∞ = 0.53, (4.617) so that we may apply Schulz’s method. There follows, successively, B1 = B0 + B0C0 =   −1.1997 −0.0777 0.4377 0.8025 0.1196 −0.1602 −0.6026 0.1595 0.1229   , (4.618) C1 = I3 − AB1 =   −0.0079 −0.002 0.0056 0.0029 0.0032 −0.011 −0.0217 −0.0101 0.0185   , (4.619) B2 = B1 + B1C1 =   −1.199946 −0.079970 0.439934 0.799983 0.119996 −0.159985 −0.600044 0.159974 0.120045   . (4.620) The procedure may, obviously, continue, the exact value of the inverse being A−1 = lim n→∞ Bn =   −1.2 −0.08 0.44 0.8 0.12 −0.16 −0.6 0.16 0.12   . (4.621)
Another possibility to determine $A^{-1}$ consists in the use of the characteristic polynomial of the matrix $A$. To do this, we calculate
$$\begin{vmatrix} 1 - \lambda & 2 & -1 \\ 0 & 3 - \lambda & 4 \\ 5 & 6 & -2 - \lambda \end{vmatrix} = -\lambda^3 + 2\lambda^2 + 24\lambda + 25, \qquad (4.622)$$
which leads, by the Cayley-Hamilton theorem, to the equation
$$-A^3 + 2A^2 + 24A + 25 I_3 = O_3, \qquad (4.623)$$
from which, multiplying by $A^{-1}$, we get
$$-A^2 + 2A + 24 I_3 = -25 A^{-1}; \qquad (4.624)$$
hence
$$A^{-1} = \frac{1}{25} (A^2 - 2A - 24 I_3). \qquad (4.625)$$
But
$$A^2 = \begin{bmatrix} 1 & 2 & -1 \\ 0 & 3 & 4 \\ 5 & 6 & -2 \end{bmatrix} \begin{bmatrix} 1 & 2 & -1 \\ 0 & 3 & 4 \\ 5 & 6 & -2 \end{bmatrix} = \begin{bmatrix} -4 & 2 & 9 \\ 20 & 33 & 4 \\ -5 & 16 & 23 \end{bmatrix} \qquad (4.626)$$
and it follows that
$$A^{-1} = \frac{1}{25} \left( \begin{bmatrix} -4 & 2 & 9 \\ 20 & 33 & 4 \\ -5 & 16 & 23 \end{bmatrix} - 2 \begin{bmatrix} 1 & 2 & -1 \\ 0 & 3 & 4 \\ 5 & 6 & -2 \end{bmatrix} - 24 \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \right) = \frac{1}{25} \begin{bmatrix} -30 & -2 & 11 \\ 20 & 3 & -4 \\ -15 & 4 & 3 \end{bmatrix}. \qquad (4.627)$$
Let us now calculate the inverse of $A$ using the Frame-Fadeev method. We have, successively,
$$A_1 = A = \begin{bmatrix} 1 & 2 & -1 \\ 0 & 3 & 4 \\ 5 & 6 & -2 \end{bmatrix}, \quad \sigma_1 = -\mathrm{Tr}(A_1) = -2, \quad B_1 = A_1 + \sigma_1 I_3 = \begin{bmatrix} -1 & 2 & -1 \\ 0 & 1 & 4 \\ 5 & 6 & -4 \end{bmatrix}, \qquad (4.628)$$
$$A_2 = A B_1 = \begin{bmatrix} -6 & -2 & 11 \\ 20 & 27 & -4 \\ -15 & 4 & 27 \end{bmatrix}, \quad \sigma_2 = -\frac{1}{2} \mathrm{Tr}(A_2) = -24, \quad B_2 = A_2 + \sigma_2 I_3 = \begin{bmatrix} -30 & -2 & 11 \\ 20 & 3 & -4 \\ -15 & 4 & 3 \end{bmatrix}, \qquad (4.629)$$
$$A_3 = A B_2 = \begin{bmatrix} 25 & 0 & 0 \\ 0 & 25 & 0 \\ 0 & 0 & 25 \end{bmatrix}, \quad \sigma_3 = -\frac{1}{3} \mathrm{Tr}(A_3) = -25, \quad B_3 = A_3 + \sigma_3 I_3 = O_3; \qquad (4.630)$$
hence
$$A^{-1} = -\frac{1}{\sigma_3} B_2 = \frac{1}{25} \begin{bmatrix} -30 & -2 & 11 \\ 20 & 3 & -4 \\ -15 & 4 & 3 \end{bmatrix}. \qquad (4.631)$$
To determine the inverse of $A$ by Schur's method, let us consider
$$A = \begin{bmatrix} A_1 & A_2 \\ A_3 & A_4 \end{bmatrix}, \qquad (4.632)$$
where
$$A_1 = [1], \quad A_2 = [2 \ -1], \quad A_3 = \begin{bmatrix} 0 \\ 5 \end{bmatrix}, \quad A_4 = \begin{bmatrix} 3 & 4 \\ 6 & -2 \end{bmatrix}. \qquad (4.633)$$
We have
$$A_4^{-1} = -\frac{1}{30} \begin{bmatrix} -2 & -4 \\ -6 & 3 \end{bmatrix}, \qquad (4.634)$$
$$A_2 A_4^{-1} = -\frac{1}{30} [2 \ -1] \begin{bmatrix} -2 & -4 \\ -6 & 3 \end{bmatrix} = -\frac{1}{30} [2 \ -11], \qquad (4.635)$$
$$A_4^{-1} A_3 = -\frac{1}{30} \begin{bmatrix} -2 & -4 \\ -6 & 3 \end{bmatrix} \begin{bmatrix} 0 \\ 5 \end{bmatrix} = -\frac{1}{30} \begin{bmatrix} -20 \\ 15 \end{bmatrix}, \qquad (4.636)$$
$$A_2 A_4^{-1} A_3 = -\frac{1}{30} [2 \ -11] \begin{bmatrix} 0 \\ 5 \end{bmatrix} = \frac{11}{6}, \qquad (4.637)$$
$$A_1 - A_2 A_4^{-1} A_3 = -\frac{5}{6}, \quad (A_1 - A_2 A_4^{-1} A_3)^{-1} = -\frac{6}{5}. \qquad (4.638)$$
We may write
$$A = \begin{bmatrix} 1 & -\frac{1}{15} & \frac{11}{30} \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} -\frac{5}{6} & 0 & 0 \\ 0 & 3 & 4 \\ 0 & 6 & -2 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ \frac{2}{3} & 1 & 0 \\ -\frac{1}{2} & 0 & 1 \end{bmatrix}, \qquad (4.639)$$
$$A^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ -\frac{2}{3} & 1 & 0 \\ \frac{1}{2} & 0 & 1 \end{bmatrix} \begin{bmatrix} -\frac{6}{5} & 0 & 0 \\ 0 & \frac{1}{15} & \frac{2}{15} \\ 0 & \frac{1}{5} & -\frac{1}{10} \end{bmatrix} \begin{bmatrix} 1 & \frac{1}{15} & -\frac{11}{30} \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} -\frac{6}{5} & -\frac{2}{25} & \frac{11}{25} \\ \frac{4}{5} & \frac{3}{25} & -\frac{4}{25} \\ -\frac{3}{5} & \frac{4}{25} & \frac{3}{25} \end{bmatrix}. \qquad (4.640)$$

Example 4.4 Let the linear system be
$$10 x_1 + 2 x_2 - x_3 = 7, \quad 2 x_1 + 8 x_2 + x_3 = -5, \quad -x_1 + x_2 + 10 x_3 = 8, \qquad (4.641)$$
the solution of which is required.

If we wish to apply Cramer's rule, then we must calculate the determinants
$$\Delta = \begin{vmatrix} 10 & 2 & -1 \\ 2 & 8 & 1 \\ -1 & 1 & 10 \end{vmatrix} = 738, \quad \Delta_1 = \begin{vmatrix} 7 & 2 & -1 \\ -5 & 8 & 1 \\ 8 & 1 & 10 \end{vmatrix} = 738, \quad \Delta_2 = \begin{vmatrix} 10 & 7 & -1 \\ 2 & -5 & 1 \\ -1 & 8 & 10 \end{vmatrix} = -738, \quad \Delta_3 = \begin{vmatrix} 10 & 2 & 7 \\ 2 & 8 & -5 \\ -1 & 1 & 8 \end{vmatrix} = 738, \qquad (4.642)$$
wherefrom
$$x_1 = \frac{\Delta_1}{\Delta} = 1, \quad x_2 = \frac{\Delta_2}{\Delta} = -1, \quad x_3 = \frac{\Delta_3}{\Delta} = 1. \qquad (4.643)$$
To solve the same problem by Gauss's method, we multiply the first equation in system (4.641) by $-1/5$ and by $1/10$, and we add it to the second and third equations, respectively, obtaining
$$10 x_1 + 2 x_2 - x_3 = 7, \quad \frac{38}{5} x_2 + \frac{6}{5} x_3 = -\frac{32}{5}, \quad \frac{6}{5} x_2 + \frac{99}{10} x_3 = \frac{87}{10}. \qquad (4.644)$$
We now multiply the second equation in system (4.644) by $-3/19$ and add it to the third equation, resulting in the system
$$10 x_1 + 2 x_2 - x_3 = 7, \quad \frac{38}{5} x_2 + \frac{6}{5} x_3 = -\frac{32}{5}, \quad \frac{369}{38} x_3 = \frac{369}{38}, \qquad (4.645)$$
with the solution
$$x_3 = 1, \quad x_2 = -1, \quad x_1 = 1. \qquad (4.646)$$
The first step in solving by the Gauss-Jordan method leads to the same system (4.644). We now multiply the second equation by $-5/19$ and by $-3/19$ and add it to the first and to the third equations of system (4.644), respectively, obtaining
$$10 x_1 - \frac{25}{19} x_3 = \frac{165}{19}, \quad \frac{38}{5} x_2 + \frac{6}{5} x_3 = -\frac{32}{5}, \quad \frac{369}{38} x_3 = \frac{369}{38}. \qquad (4.647)$$
We now multiply the third equation of system (4.647) by $50/369$ and by $-76/615$, and add it to the first and second equations, respectively, obtaining
$$10 x_1 = 10, \quad \frac{38}{5} x_2 = -\frac{38}{5}, \quad \frac{369}{38} x_3 = \frac{369}{38} \qquad (4.648)$$
and the solution
$$x_1 = 1, \quad x_2 = -1, \quad x_3 = 1. \qquad (4.649)$$
Applying the Doolittle method of LU factorization, we are led to
$$\begin{bmatrix} 1 & 0 & 0 \\ l_{21} & 1 & 0 \\ l_{31} & l_{32} & 1 \end{bmatrix} \begin{bmatrix} u_{11} & u_{12} & u_{13} \\ 0 & u_{22} & u_{23} \\ 0 & 0 & u_{33} \end{bmatrix} = \begin{bmatrix} 10 & 2 & -1 \\ 2 & 8 & 1 \\ -1 & 1 & 10 \end{bmatrix}, \qquad (4.650)$$
wherefrom we obtain the system
$$u_{11} = 10, \quad u_{12} = 2, \quad u_{13} = -1, \quad l_{21} u_{11} = 2, \quad l_{21} u_{12} + u_{22} = 8, \quad l_{21} u_{13} + u_{23} = 1, \quad l_{31} u_{11} = -1, \quad l_{31} u_{12} + l_{32} u_{22} = 1, \quad l_{31} u_{13} + l_{32} u_{23} + u_{33} = 10, \qquad (4.651)$$
with the solution
$$u_{11} = 10, \quad u_{12} = 2, \quad u_{13} = -1, \quad l_{21} = \frac{1}{5}, \quad u_{22} = \frac{38}{5}, \quad u_{23} = \frac{6}{5}, \quad l_{31} = -\frac{1}{10}, \quad l_{32} = \frac{3}{19}, \quad u_{33} = \frac{369}{38}. \qquad (4.652)$$
There results
$$L = \begin{bmatrix} 1 & 0 & 0 \\ \frac{1}{5} & 1 & 0 \\ -\frac{1}{10} & \frac{3}{19} & 1 \end{bmatrix}, \quad U = \begin{bmatrix} 10 & 2 & -1 \\ 0 & \frac{38}{5} & \frac{6}{5} \\ 0 & 0 & \frac{369}{38} \end{bmatrix}. \qquad (4.653)$$
  • 196. NUMERICAL EXAMPLES 187 with the solution u11 = 10, u12 = 2, u13 = −1, l21 = 1 5 , u22 = 38 5 , u23 = 6 5 , l31 = − 1 10 , l32 = 3 19 , u33 = 369 38 . (4.652) There results L =        1 0 0 1 5 1 0 − 1 10 3 19 1        , U =        10 2 −1 0 38 5 6 5 0 0 369 38        . (4.653) We denote Ux = y (4.654) and solve the system Ly = b, (4.655) that is y1 = 7, 1 5 y1 + y2 = −5, − 1 10 y1 + 3 19 y2 + y3 = 8, (4.656) wherefrom y1 = 7, y2 = − 32 5 , y3 = 369 38 . (4.657) Expression (4.654) leads to the system 10x1 + 2x2 − x3 = 7, 38 5 x2 + 6 5 x3 = − 32 5 , 369 38 x3 = 369 38 , (4.658) with the known solution (4.649). The Crout method leads to   l11 0 0 l21 l22 0 l31 l32 l33     1 u12 u13 0 1 u23 0 0 1   =   10 2 −1 2 8 1 −1 1 10   , (4.659) wherefrom l11 = 10, l11u12 = 2, l11u13 = −1, l21 = 2, l21u12 + l22 = 8, l21u13 + l22u23 = 1, l31 = −1, l31u12 + l32 = 1, l31u13 + l32u23 + l33 = 10, (4.660) with the solution l11 = 10, u12 = 1 5 , u13 = − 1 10 , l22 = 38 5 , u23 = 3 19 , l31 = −1; (4.661) hence L =   10 0 0 2 38 5 0 −1 6 5 369 38   , U =   1 1 5 − 1 10 0 1 3 19 0 0 1   . (4.662)
  • 197. 188 LINEAR ALGEBRA This results in the system 10y1 = 7, 2y1 + 38 5 y2 = −5, −y1 + 6 5 y2 + 369 38 y3 = 8, (4.663) with the solution y1 = 7 10 , y2 = − 16 19 , y3 = 1, (4.664) and the system x1 + 1 5 x2 − 1 10 x3 = 7 10 , x2 + 3 19 x3 = − 16 19 , x3 = 1, (4.665) with the same solution (4.649). To apply the Cholesky method, we must verify that the matrix A is symmetric (obviously!) and positive definite. We have A =   10 2 −1 2 8 1 −1 1 10   , (4.666) xT Ax = x1 x2 x3   10 2 −1 2 8 1 −1 1 10     x1 x2 x3   = (2x1 + x2)2 + (x1 − x3)2 + (x2 + x3)2 + 5x2 1 + 6x2 2 + 8x2 3 > 0, (4.667) for any x = 0. Hence, we may apply the Cholesky method in which L =   l11 0 0 l21 l22 0 l31 l32 l33   , U =   l11 l21 l31 0 l22 l32 0 0 l33   . (4.668) It results the system l2 11 = 10, l11l21 = 2, l11l31 = −1, l21l11 = 2, l2 21 + l2 22 = 8, l21l31 + l22l32 = 1, l31l11 = −1, l31l21 + l32l22 = 1, l2 31 + l2 32 + l2 33 = 10, (4.669) with the solution l11 = √ 10, l21 = 2 √ 10 , l31 = − 1 √ 10 , l22 = 38 5 , l32 = 6 √ 190 , l33 = 369 38 , (4.670) so that L =     √ 10 0 0 2√ 10 38 5 0 − 1√ 10 6√ 190 369 38     , U =      √ 10 2√ 10 − 1√ 10 0 38 5 6√ 190 0 0 369 38      . (4.671) We obtain the systems √ 10y1 = 7, 2 √ 10 y1 + 38 5 y2 = −5, − 1 √ 10 y1 + 6 √ 190 y2 + 369 38 y3 = 8, (4.672)
  • 198. NUMERICAL EXAMPLES 189 with the solution y1 = 7 √ 10 , y2 = − 32 √ 190 , y3 = 369 38 , (4.673) and √ 10x1 + 2 √ 10 x2 − 1 √ 10 x3 = 7 √ 10 , 38 5 x2 + 6 √ 190 x3 = − 32 √ 190 , 369 38 x3 = 369 38 , (4.674) respectively, wherefrom results solution (4.649). To solve system (4.641) by the iteration (Jacobi) method, we write it in the form x1 = 0.7 − 0.2x2 + 0.1x3, x2 = −0.625 − 0.25x1 − 0.125x3, x3 = 0.8 + 0.1x1 − 0.1x2, (4.675) the matrices α and β having the expressions α =   0 −0.2 0.1 −0.25 0 −0.125 0.1 −0.1 0   , β =   0.7 −0.625 0.8   . (4.676) We choose x(0) = β, (4.677) the iteration formula being x(n+1) = αx(n) + β. (4.678) Let us observe that α ∞ = 0.375 < 1, (4.679) so that the Jacobi method may be applied. We have successively x(1) = αx(0) + β =   0 −0.2 0.1 −0.25 0 −0.125 0.1 −0.1 0     0.7 −0.625 0.8   +   0.7 −0.625 0.8   =   0.905 −0.9 0.9325   , (4.680) x(2) = αx(1) + β =   0 −0.2 0.1 −0.25 0 −0.125 0.1 −0.1 0     0.905 −0.9 0.9325   +   0.7 −0.625 0.8   =   0.97325 −0.9678125 0.9805   , (4.681) x(3) = αx(2) + β =   0.9916125 −0.990875 0.99410625   , x(4) = αx(3) + β =   0.997585625 −0.997166406 0.99824875   . (4.682) The procedure may continue, so that at the limit, we obtain x = lim n→∞ x(n) =   1 −1 1   . (4.683)
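A short program makes the iteration (4.678) concrete. The sketch below is illustrative only (the function name, the tolerance, and the iteration cap are our choices, not the book's); the same loop, updating components in place as they become available, yields the Gauss-Seidel variant discussed next.

    import numpy as np

    def jacobi(alpha, beta, x0=None, tol=1e-9, max_iter=200):
        """Iterate x^(n+1) = alpha x^(n) + beta, as in (4.678); convergent
        here because ||alpha||_inf = 0.375 < 1, cf. (4.679)."""
        x = beta.copy() if x0 is None else np.asarray(x0, dtype=float)  # (4.677)
        for _ in range(max_iter):
            x_new = alpha @ x + beta
            if np.max(np.abs(x_new - x)) < tol:
                return x_new
            x = x_new
        return x

    alpha = np.array([[0., -0.2, 0.1], [-0.25, 0., -0.125], [0.1, -0.1, 0.]])
    beta = np.array([0.7, -0.625, 0.8])
    print(jacobi(alpha, beta))   # ~[1. -1. 1.]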
The solution of system (4.641) may be determined by means of the Gauss-Seidel method too. In this case, the iteration formulae read
$$x_1^{(n+1)} = 0.7 - 0.2 x_2^{(n)} + 0.1 x_3^{(n)}, \quad x_2^{(n+1)} = -0.625 - 0.25 x_1^{(n+1)} - 0.125 x_3^{(n)}, \quad x_3^{(n+1)} = 0.8 + 0.1 x_1^{(n+1)} - 0.1 x_2^{(n+1)}. \qquad (4.684)$$
It results, successively, in
$$x_1^{(1)} = 0.7 + 0.2 \times 0.625 + 0.1 \times 0.8 = 0.905, \quad x_2^{(1)} = -0.625 - 0.25 \times 0.905 - 0.125 \times 0.8 = -0.95125, \quad x_3^{(1)} = 0.8 + 0.1 \times 0.905 + 0.1 \times 0.95125 = 0.985625, \qquad (4.685)$$
$$x_1^{(2)} = 0.7 + 0.2 \times 0.95125 + 0.1 \times 0.985625 = 0.9888125, \quad x_2^{(2)} = -0.625 - 0.25 \times 0.9888125 - 0.125 \times 0.985625 = -0.99540625, \quad x_3^{(2)} = 0.8 + 0.1 \times 0.9888125 + 0.1 \times 0.99540625 = 0.998421875, \qquad (4.686)$$
$$x_1^{(3)} = 0.998923437, \quad x_2^{(3)} = -0.999533593, \quad x_3^{(3)} = 0.999845703, \qquad (4.687)$$
$$x_1^{(4)} = 0.999891288, \quad x_2^{(4)} = -0.999953534, \quad x_3^{(4)} = 0.999984482. \qquad (4.688)$$
The procedure continues, obtaining at the limit, for $n \to \infty$, solution (4.649).

If we wish to solve the problem by the relaxation method, then we write system (4.641) in the form
$$x_1 + 0.2 x_2 - 0.1 x_3 - 0.7 = 0, \quad x_2 + 0.25 x_1 + 0.125 x_3 + 0.625 = 0, \quad x_3 - 0.1 x_1 + 0.1 x_2 - 0.8 = 0. \qquad (4.689)$$
Let us replace in equation (4.689) the values given by $x^{(0)}$. It follows that
$$0.7 - 0.2 \times 0.625 - 0.1 \times 0.8 - 0.7 = -0.205 = R_1^{(0)}, \quad -0.625 + 0.25 \times 0.7 + 0.125 \times 0.8 + 0.625 = 0.275 = R_2^{(0)}, \quad 0.8 - 0.1 \times 0.7 - 0.1 \times 0.625 - 0.8 = -0.1325 = R_3^{(0)}. \qquad (4.690)$$
The greatest remainder in modulus is $R_2^{(0)}$, so that, correcting the second component by $-R_2^{(0)}$,
$$x^{(1)} = x^{(0)} + \begin{bmatrix} 0 \\ -0.275 \\ 0 \end{bmatrix} = \begin{bmatrix} 0.7 \\ -0.9 \\ 0.8 \end{bmatrix}. \qquad (4.691)$$
We now replace in system (4.689) the values given by $x^{(1)}$, obtaining the remainders
$$0.7 - 0.2 \times 0.9 - 0.1 \times 0.8 - 0.7 = -0.26 = R_1^{(1)}, \quad -0.9 + 0.25 \times 0.7 + 0.125 \times 0.8 + 0.625 = 0 = R_2^{(1)}, \quad 0.8 - 0.1 \times 0.7 - 0.1 \times 0.9 - 0.8 = -0.16 = R_3^{(1)}; \qquad (4.692)$$
  • 200. NUMERICAL EXAMPLES 191 the greatest remainder in modulus is R(1) 1 , hence x(2) = x(1) +   0.26 0 0   =   0.96 −0.9 0.8   . (4.693) Continuing the procedure, we obtain the values x(3) =   0.96 −0.9 0.986   , x(4) =   0.96 −0.98825 0.986   , x(5) =   0.99625 −0.98825 0.986   , x(6) =   0.99625 −0.98825 0.99845   , x(7) =   0.99625 −0.9988687 0.99845   , . . . (4.694) To apply Schur’s method, we write the matrix A =   10 2 −1 2 8 1 −1 1 10   (4.695) in the form A = A1 A2 A3 A4 , (4.696) where A1 = 10 2 2 8 , A2 = −1 1 , A3 = [−1 1], A4 = [10]. (4.697) The vectors x =   x1 x2 x3   , b =   7 −5 8   , (4.698) are written in the form x = x1 x2 , b = b1 b2 , (4.699) where x1 = x1 x2 , x2 = [x3], b1 = 7 −5 , b2 = [8]. (4.700) It follows that x1 = (A1 − A2A−1 4 A3)−1 (b1 − A2A−1 4 b2), (4.701) x2 = A−1 4 b2 − A−1 4 A3x1. (4.702) Effecting the necessary calculations, we obtain A−1 4 = 1 10 , (4.703)
  • 201. 192 LINEAR ALGEBRA A1 − A2A−1 4 A3 = 1 10 99 21 21 79 , (A1 − A2A−1 4 A3)−1 = 1 738 79 −21 −21 99 , (4.704) b1 − A2A−1 4 b2 = 1 10 78 −58 , (4.705) x1 = 1 7380 79 −21 −21 99 78 −58 = 1 −1 (4.706) x2 = 1 10 [8] − 1 10 [−1 1] 1 −1 = [1]. (4.707) System (4.641) may be solved by the Monte Carlo method too. To do this, we write it in the form x1 = −0.2x2 + 0.1x3 + 0.7, x2 = −0.25x1 − 0.125x3 − 0.625, x3 = 0.1x1 − 0.1x2 + 0.8 (4.708) and the matrix H becomes H =     0 0.2 0.1 0.7 0.25 0 0.125 0.625 0.1 0.1 0 0.8 0 0 0 1     . (4.709) For the initial state S1, we may write as follows: • If 0 ≤ x < 0.2, then we pass to the state S2. • If 0.2 ≤ x < 0.3, then we pass to the state S3. • If 0.3 ≤ x < 1, then we pass in the final state S4. For the initial state S2, we have the following: • If 0 ≤ x < 0.25, then we pass to the state S1. • If 0.25 ≤ x < 0.375, then we pass to the state S3. • If 0.375 ≤ x < 1, then we pass to the final state S4. Finally, for the initial state S3 we get the following: • If 0 ≤ x < 0.1, then we pass to the state S1. • If 0.1 ≤ x < 0.2, then we pass to the state S2. • If 0.2 ≤ x < 1, then we pass to the final state S4. Moreover, v11 = 0, v12 = −1, v13 = 1, v21 = −1, v22 = 0, v23 = −1, v31 = 1, v32 = −1, v33 = 0. (4.710) There have been 1000 simulations made for each unknown xi, i = 1, 3, of the following form:
  • 202. NUMERICAL EXAMPLES 193 Nr. Random number Trajectory The value of the aleatory variable X 1 0.263 0.194 0.925 S1, S3, S2, S4 0.7 − 0.8 + 0.325 − 0.625 We obtain the values x1 ≈ 0.98, x2 ≈ −1.03, x3 ≈ 1.06. (4.711) Example 4.5 Let x ∈ M2,1(R). We define the norm x 2 = x2 1 + x2 2 , (4.712) where x = x1 x2 T . (4.713) For a matrix A ∈ M2(R) we define the norm A 2 = sup x=0 Ax 2 x 2 . (4.714) Let us consider A = 1 2 0 −1 . (4.715) We wish to calculate A 2. Let us show that expression (4.712) defines a norm. First of all x 2 ≥ 0 for any x ∈ M2,1(R). Moreover, x 2 = 0 leads to x2 1 + x2 2 = 0, with the unique solution x1 = x2 = 0, hence x = 0. Let y ∈ M2,1(R), y = y1 y2 T . (4.716) We have successively x + y 2 = (x1 + y1)2 + (x2 + y2)2 = x2 1 + x2 2 + y2 1 + y2 2 + 2x1y1 + 2x2y2, (4.717) x 2 + y 2 = x2 1 + x2 2 + y2 1 + y2 2 (4.718) and the inequality x + y 2 ≤ x 2 + y 2 (4.719) is equivalent to x1y1 + x2y2 ≤ x2 1 + x2 2 y2 1 + y2 2 . (4.720) If x1y1 + x2y2 < 0, then inequality (4.720) is obviously satisfied. If x1y1 + x2y2 > 0, then we square both members of inequality (4.720) and obtain the equivalent relation 2x1x2y1y2 ≤ x2 1 y2 2 + x2 2 y2 1 , (4.721) Which is obviously true.
We also may write
$$\|\alpha x\|_2 = \sqrt{\alpha^2 x_1^2 + \alpha^2 x_2^2} = |\alpha| \, \|x\|_2, \qquad (4.722)$$
where $\alpha \in \mathbb{R}$; hence $\| \cdot \|_2$ is a norm. On the other hand,
$$\|A\|_2 = \sup_{x \neq 0} \frac{\|Ax\|_2}{\|x\|_2} = \max_{\|x\|_2 = 1} \|Ax\|_2. \qquad (4.723)$$
From $\|x\|_2 = 1$, it follows that there exists $\theta \in [0, 2\pi)$ with the property
$$x = [x_1 \ x_2]^T = [\cos\theta \ \sin\theta]^T. \qquad (4.724)$$
If
$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \qquad (4.725)$$
then
$$Ax \big|_{\|x\|_2 = 1} = \begin{bmatrix} a_{11} \cos\theta + a_{12} \sin\theta \\ a_{21} \cos\theta + a_{22} \sin\theta \end{bmatrix} \qquad (4.726)$$
and
$$\|Ax\|_2 = \left[ (a_{11}^2 + a_{21}^2) \cos^2\theta + (a_{12}^2 + a_{22}^2) \sin^2\theta + 2 (a_{11} a_{12} + a_{21} a_{22}) \sin\theta \cos\theta \right]^{1/2}. \qquad (4.727)$$
It follows that
$$\|A\|_2 = \max_{\theta \in [0, 2\pi)} \left[ \frac{a_{11}^2 + a_{21}^2 - a_{12}^2 - a_{22}^2}{2} \cos 2\theta + (a_{11} a_{12} + a_{21} a_{22}) \sin 2\theta + \frac{a_{11}^2 + a_{21}^2 + a_{12}^2 + a_{22}^2}{2} \right]^{1/2}. \qquad (4.728)$$
We verify immediately that $\| \cdot \|_2$ is a matrix norm. For the matrix $A$ given by equation (4.715), we get
$$\|A\|_2 = \max_{\theta \in [0, 2\pi)} \left[ -2 \cos 2\theta + 2 \sin 2\theta + 3 \right]^{1/2}. \qquad (4.729)$$
We denote
$$f : [0, 2\pi) \to \mathbb{R}, \quad f(\theta) = -2 \cos 2\theta + 2 \sin 2\theta + 3, \qquad (4.730)$$
and we may write
$$f'(\theta) = 4 \sin 2\theta + 4 \cos 2\theta. \qquad (4.731)$$
The equation $f'(\theta) = 0$ leads to the solution
$$\tan 2\theta = -1, \qquad (4.732)$$
wherefrom, for the maximum,
$$\sin 2\theta = \frac{\sqrt{2}}{2}, \quad \cos 2\theta = -\frac{\sqrt{2}}{2}. \qquad (4.733)$$
It follows that
$$\|A\|_2 = \sqrt{3 + 2\sqrt{2}} = 1 + \sqrt{2}. \qquad (4.734)$$

Example 4.6 Let the matrix
$$A = \begin{bmatrix} 2 & 1 & -1 & 3 \\ 0 & 3 & 2 & 5 \\ 2 & 4 & 1 & 8 \end{bmatrix}, \qquad (4.735)$$
for which we calculate the QR factorization. We have
$$x_1 = [2 \ 0 \ 2]^T, \quad \|x_1\|_2 = 2\sqrt{2} = \lambda_1 \qquad (4.736)$$
and choose
$$v_1 = x_1 + 2\sqrt{2}\, e_1 = 2 \, [1 + \sqrt{2} \ \ 0 \ \ 1]^T. \qquad (4.737)$$
It follows, successively, that
$$v_1 v_1^T = 4 \begin{bmatrix} 3 + 2\sqrt{2} & 0 & 1 + \sqrt{2} \\ 0 & 0 & 0 \\ 1 + \sqrt{2} & 0 & 1 \end{bmatrix}, \qquad (4.738)$$
$$v_1^T v_1 = 8 (2 + \sqrt{2}), \qquad (4.739)$$
$$\frac{2 v_1 v_1^T}{v_1^T v_1} = \frac{1}{2 + \sqrt{2}} \begin{bmatrix} 3 + 2\sqrt{2} & 0 & 1 + \sqrt{2} \\ 0 & 0 & 0 \\ 1 + \sqrt{2} & 0 & 1 \end{bmatrix}, \qquad (4.740)$$
$$H_1 = \frac{1}{2 + \sqrt{2}} \begin{bmatrix} -1 - \sqrt{2} & 0 & -1 - \sqrt{2} \\ 0 & 2 + \sqrt{2} & 0 \\ -1 - \sqrt{2} & 0 & 1 + \sqrt{2} \end{bmatrix}, \qquad (4.741)$$
$$H_1 A = \begin{bmatrix} -2.828427 & -3.535534 & 0 & -7.778175 \\ 0 & 3 & 2 & 5 \\ 0 & 2.121320 & 1.414215 & 3.535534 \end{bmatrix}. \qquad (4.742)$$
We also find
$$x_2 = [3 \ 2.121320]^T, \quad \|x_2\|_2 = 3.674234 = \lambda_2, \qquad (4.743)$$
$$v_2 = x_2 + 3.674234 \, e_1 = [6.674234 \ 2.121320]^T, \qquad (4.744)$$
where $e_1 \in \mathcal{M}_{2,1}(\mathbb{R})$,
$$v_2 v_2^T = \begin{bmatrix} 44.545399 & 14.158186 \\ 14.158186 & 4.5 \end{bmatrix}, \qquad (4.745)$$
$$v_2^T v_2 = 49.045399, \qquad (4.746)$$
$$\frac{2 v_2 v_2^T}{v_2^T v_2} = \begin{bmatrix} 1.816497 & 0.577350 \\ 0.577350 & 0.183503 \end{bmatrix}, \qquad (4.747)$$
$$\tilde{H}_2 = \begin{bmatrix} -0.816497 & -0.577350 \\ -0.577350 & 0.816497 \end{bmatrix}, \qquad (4.748)$$
$$H_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -0.816497 & -0.577350 \\ 0 & -0.577350 & 0.816497 \end{bmatrix}, \qquad (4.749)$$
  • 205. 196 LINEAR ALGEBRA H2H1A =   −2.828427 −3.535534 0 −7.778175 0 −3.674235 −2.449491 −6.123726 0 0 0 0   = R, (4.750) Q = H1H2 =   −0.707107 0.408248 −0.577350 0 −0.816497 −0.577350 −0.707107 −0.408248 0.577350   . (4.751) The same factorization may be found by u with the Givens matrices too. At the beginning, we equate to zero the element a31 = 2. To do this, we choose the Givens matrix G1 =   1 0 0 0 cos θ sin θ 0 − sin θ cos θ   , (4.752) such that GT 1   2 0 2   =   2 −2 sin θ 2 cos θ   . (4.753) The element 2 cos θ vanishes for θ = π/2 and we obtain G1 =   1 0 0 0 0 1 0 −1 0   , GT 1 =   1 0 0 0 0 −1 0 1 0   , (4.754) GT 1 A =   2 1 −1 3 −2 −4 −1 −8 0 3 2 5   . (4.755) We now equate to zero the element −2 of row 2 and column 1 in the matrix GT 1 A. For this, we choose G2 =   cos θ sin θ 0 − sin θ cos θ 0 0 0 1   , GT 2 =   cos θ − sin θ 0 sin θ cos θ 0 0 0 1   (4.756) and obtain GT 2   2 −2 0   =   2 cos θ + 2 sin θ 2 sin θ − 2 cos θ 0   . (4.757) The element 2 sin θ − 2 cos θ vanishes for θ = π/4 and we obtain G2 =         √ 2 2 √ 2 2 0 − √ 2 2 √ 2 2 0 0 0 1         , (4.758)
  • 206. NUMERICAL EXAMPLES 197 GT 2 GT 1 A =         √ 2 2 − √ 2 2 0 √ 2 2 √ 2 2 0 0 0 1             2 1 −1 3 −2 −4 −1 −8 0 3 2 5     =         2 √ 2 5 √ 2 2 0 11 √ 2 2 0 −3 √ 2 2 − √ 2 −5 √ 2 2 0 3 2 5         . (4.759) Obviously, the procedure may be continued, obtaining again the known factorization. Example 4.7 Let us consider the matrix A = 1 2 0 2 , (4.760) for which we want to calculate the SVD. Let u ∈ M2,1(R), u = cos θ sin θ T , θ ∈ [0, 2π), u 2 = 1. (4.761) To determine A 2 we have to calculate Au = cos θ + 2 sin θ 2 sin θ (4.762) and Au 2 = 9 2 + 2 sin 2θ − 7 2 cos 2θ. (4.763) Let f : [0, 2π) → R, f (θ) = 9 2 + 2 sin 2θ − 7 2 cos 2θ, (4.764) for which f (θ) = 4 cos 2θ + 7 sin 2θ. (4.765) The equation f (θ) = 0 leads to the solution tan 2θ = − 4 7 , sin 2θ = 4 √ 65 , cos 2θ = − 7 √ 65 , (4.766) hence A 2 = 9 2 + √ 65 2 = 2.92081 (4.767) The equation Ax = σy = A 2y (4.768) leads to 1 2 0 2 x1 x2 = σ y1 y2 (4.769) wherefrom x1 + 2x2 = σy1, 2x2 = σy2; (4.770)
moreover,
$$x_1^2 + x_2^2 = 1, \quad y_1^2 + y_2^2 = 1. \qquad (4.771)$$
Relations (4.770) and (4.771) lead to
$$(x_1 + 2 x_2)^2 + (2 x_2)^2 = \sigma^2, \qquad (4.772)$$
hence
$$x_1^2 + 4 x_1 x_2 + 8 x_2^2 = \sigma^2. \qquad (4.773)$$
It follows that
$$4 x_1 x_2 + 7 x_2^2 = \sigma^2 - 1. \qquad (4.774)$$
We obtain, successively,
$$x_1 = \frac{\sigma^2 - 1 - 7 x_2^2}{4 x_2}, \qquad (4.775)$$
$$\left( \frac{\sigma^2 - 1 - 7 x_2^2}{4 x_2} \right)^2 + x_2^2 = 1, \qquad (4.776)$$
$$65 x_2^4 - \left[ 14 (\sigma^2 - 1) + 16 \right] x_2^2 + (\sigma^2 - 1)^2 = 0, \qquad (4.777)$$
$$x_2^2 = 0.93412, \quad x_2 = \pm 0.9665. \qquad (4.778)$$
We choose
$$x_2 = 0.9665, \quad x_1 = 0.2567, \qquad (4.779)$$
wherefrom
$$y_1 = 0.7497, \quad y_2 = 0.6618. \qquad (4.780)$$
We now determine the vector $v = [v_1 \ v_2]^T$ so that $x = [x_1 \ x_2]^T$ and $v$ are orthogonal. We deduce
$$0.2567 v_1 + 0.9665 v_2 = 0 \qquad (4.781)$$
and may choose
$$v_1 = -0.9665, \quad v_2 = 0.2567, \qquad (4.782)$$
resulting in the matrix
$$V = \begin{bmatrix} 0.2567 & -0.9665 \\ 0.9665 & 0.2567 \end{bmatrix}. \qquad (4.783)$$
Analogously, we get
$$U = \begin{bmatrix} 0.7497 & -0.6618 \\ 0.6618 & 0.7497 \end{bmatrix}. \qquad (4.784)$$
Moreover,
$$U^T A V = \begin{bmatrix} 2.92 & 0 \\ 0 & 0.68 \end{bmatrix} \qquad (4.785)$$
and the problem is solved.

Example 4.8 Let the matrix
$$A = \begin{bmatrix} -1 & -3 & -4 \\ 8 & 12 & 14 \\ -4 & -5 & -5 \end{bmatrix}, \qquad (4.786)$$
for which we wish to determine the eigenvalues and eigenvectors. We begin with Krylov's method. To do this, we consider the vector
$$y^{(0)} = [1 \ 0 \ 1]^T \qquad (4.787)$$
and calculate, successively,
$$y^{(1)} = A y^{(0)} = [-5 \ 22 \ -9]^T, \qquad (4.788)$$
$$y^{(2)} = A y^{(1)} = [-25 \ 98 \ -45]^T, \qquad (4.789)$$
$$y^{(3)} = A y^{(2)} = [-89 \ 346 \ -165]^T. \qquad (4.790)$$
This results in the linear system
$$\begin{bmatrix} -25 & -5 & 1 \\ 98 & 22 & 0 \\ -45 & -9 & 1 \end{bmatrix} \begin{bmatrix} q_1 \\ q_2 \\ q_3 \end{bmatrix} = - \begin{bmatrix} -89 \\ 346 \\ -165 \end{bmatrix}, \qquad (4.791)$$
with the solution
$$q_1 = -6, \quad q_2 = 11, \quad q_3 = -6 \qquad (4.792)$$
and the characteristic polynomial
$$P(\lambda) = \lambda^3 - 6 \lambda^2 + 11 \lambda - 6. \qquad (4.793)$$
The eigenvalues of the matrix $A$ result from the equation $P(\lambda) = 0$ and are
$$\lambda_1 = 3, \quad \lambda_2 = 2, \quad \lambda_3 = 1. \qquad (4.794)$$
The polynomials $\varphi_i(\lambda)$, $i = \overline{1,3}$, are obtained by dividing $P(\lambda)$ by $\lambda - \lambda_i$; we have
$$\varphi_1(\lambda) = \lambda^2 - 3\lambda + 2, \quad \varphi_2(\lambda) = \lambda^2 - 4\lambda + 3, \quad \varphi_3(\lambda) = \lambda^2 - 5\lambda + 6. \qquad (4.795)$$
The eigenvectors are given by
$$c_i \varphi_i(\lambda_i) x_i = y^{(2)} + q_{1i} y^{(1)} + q_{2i} y^{(0)}, \quad i = \overline{1,3}, \qquad (4.796)$$
where $q_{1i}$ and $q_{2i}$ are the coefficients of $\varphi_i(\lambda) = \lambda^2 + q_{1i} \lambda + q_{2i}$, while
$$\varphi_1(\lambda_1) = 2, \quad \varphi_2(\lambda_2) = -1, \quad \varphi_3(\lambda_3) = 2. \qquad (4.797)$$
It follows that
$$2 c_1 x_1 = y^{(2)} - 3 y^{(1)} + 2 y^{(0)} = [-8 \ 32 \ -16]^T, \qquad (4.798)$$
$$-c_2 x_2 = y^{(2)} - 4 y^{(1)} + 3 y^{(0)} = [-2 \ 10 \ -6]^T, \qquad (4.799)$$
$$2 c_3 x_3 = y^{(2)} - 5 y^{(1)} + 6 y^{(0)} = [6 \ -12 \ 6]^T. \qquad (4.800)$$
To apply the Danilevski method, we must obtain the Frobenius form of the matrix $A$. We perform a similarity transformation with the matrix
$$M_1 = \begin{bmatrix} 1 & 0 & 0 \\ -\frac{4}{5} & -\frac{1}{5} & -1 \\ 0 & 0 & 1 \end{bmatrix}, \qquad (4.801)$$
the inverse of which is
$$M_1^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ -4 & -5 & -5 \\ 0 & 0 & 1 \end{bmatrix}, \qquad (4.802)$$
and obtain
$$A_2 = M_1^{-1} A M_1 = \begin{bmatrix} \frac{7}{5} & \frac{3}{5} & -1 \\ \frac{12}{5} & \frac{23}{5} & -6 \\ 0 & 1 & 0 \end{bmatrix}. \qquad (4.803)$$
We now transform with the matrix
$$M_2 = \begin{bmatrix} \frac{5}{12} & -\frac{23}{12} & \frac{5}{2} \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad (4.804)$$
the inverse of which is
$$M_2^{-1} = \begin{bmatrix} \frac{12}{5} & \frac{23}{5} & -6 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad (4.805)$$
obtaining
$$A_3 = M_2^{-1} A_2 M_2 = \begin{bmatrix} 6 & -11 & 6 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}. \qquad (4.806)$$
The matrix $A_3$ is just the required Frobenius form. The characteristic equation is
$$-\lambda^3 + 6\lambda^2 - 11\lambda + 6 = 0 \qquad (4.807)$$
and has the roots given by equation (4.794).
We obtain the eigenvectors of the Frobenius matrix in the form
$$y_i = [\lambda_i^2 \ \lambda_i \ 1]^T, \quad i = \overline{1,3}, \qquad (4.808)$$
that is,
$$y_1 = [9 \ 3 \ 1]^T, \quad y_2 = [4 \ 2 \ 1]^T, \quad y_3 = [1 \ 1 \ 1]^T. \qquad (4.809)$$
The eigenvectors of the matrix $A$ are
$$x_i = M_1 M_2 y_i, \quad i = \overline{1,3}, \qquad (4.810)$$
and it follows, successively, that
$$M_1 M_2 = \begin{bmatrix} \frac{5}{12} & -\frac{23}{12} & \frac{5}{2} \\ -\frac{1}{3} & \frac{4}{3} & -3 \\ 0 & 0 & 1 \end{bmatrix}, \qquad (4.811)$$
$$x_1 = M_1 M_2 \, [9 \ 3 \ 1]^T = \left[ \tfrac{1}{2} \ \ -2 \ \ 1 \right]^T, \qquad (4.812)$$
$$x_2 = M_1 M_2 \, [4 \ 2 \ 1]^T = \left[ \tfrac{1}{3} \ \ -\tfrac{5}{3} \ \ 1 \right]^T, \qquad (4.813)$$
$$x_3 = M_1 M_2 \, [1 \ 1 \ 1]^T = [1 \ -2 \ 1]^T. \qquad (4.814)$$
The maximum eigenvalue in modulus and the corresponding eigenvector may be determined by means of the direct power method. To do this, we use the vector $y^{(0)}$ defined by relation (4.787) and calculate, successively,
$$y^{(1)} = y^{(0)} = [1 \ 0 \ 1]^T, \qquad (4.815)$$
$$y^{(2)} = A y^{(1)} = [-5 \ 22 \ -9]^T, \qquad (4.816)$$
$$y^{(3)} = A y^{(2)} = [-25 \ 98 \ -45]^T, \qquad (4.817)$$
$$y^{(4)} = A y^{(3)} = [-89 \ 346 \ -165]^T, \qquad (4.818)$$
$$y^{(5)} = A y^{(4)} = [-289 \ 1130 \ -549]^T, \qquad (4.819)$$
$$y^{(6)} = A y^{(5)} = [-905 \ 3562 \ -1749]^T, \qquad (4.820)$$
$$y^{(7)} = A y^{(6)} = [-2785 \ 11018 \ -5445]^T, \qquad (4.821)$$
$$y^{(8)} = A y^{(7)} = [-8489 \ 33706 \ -16725]^T, \qquad (4.822)$$
$$y^{(9)} = A y^{(8)} = [-25729 \ 102410 \ -50949]^T, \qquad (4.823)$$
$$y^{(10)} = A y^{(9)} = [-77705 \ 309802 \ -154389]^T. \qquad (4.824)$$
It follows that
$$\lambda_1 \approx \frac{y_1^{(10)}}{y_1^{(9)}} = 3.020, \quad \lambda_1 \approx \frac{y_2^{(10)}}{y_2^{(9)}} = 3.025, \quad \lambda_1 \approx \frac{y_3^{(10)}}{y_3^{(9)}} = 3.030. \qquad (4.825)$$
The eigenvector is $y^{(10)}$, which, on normalization, gives
$$y^{(10)} = [-0.219 \ 0.873 \ -0.435]^T. \qquad (4.826)$$
The eigenvalue $\lambda_3 = 1$ may be obtained by using the inverse power method. We have
$$A^{-1} = \frac{1}{6} \begin{bmatrix} 10 & 5 & 6 \\ -16 & -11 & -18 \\ 8 & 7 & 12 \end{bmatrix} \qquad (4.827)$$
and, using the same vector $y^{(0)}$ given by equation (4.787), we have
$$y^{(1)} = A^{-1} y^{(0)} = \frac{1}{6} [16 \ -34 \ 20]^T, \qquad (4.828)$$
$$y^{(2)} = A^{-1} y^{(1)} = \frac{1}{6^2} [110 \ -242 \ 110]^T, \qquad (4.829)$$
$$y^{(3)} = A^{-1} y^{(2)} = \frac{1}{6^3} [550 \ -1078 \ 506]^T, \qquad (4.830)$$
$$y^{(4)} = A^{-1} y^{(3)} = \frac{1}{6^4} [3146 \ -6050 \ 2926]^T, \qquad (4.831)$$
$$y^{(5)} = A^{-1} y^{(4)} = \frac{1}{6^5} [18766 \ -36454 \ 17930]^T, \qquad (4.832)$$
$$y^{(6)} = A^{-1} y^{(5)} = \frac{1}{6^6} [112970 \ -222002 \ 110110]^T, \qquad (4.833)$$
$$y^{(7)} = A^{-1} y^{(6)} = \frac{1}{6^7} [680350 \ -1347478 \ 671066]^T, \qquad (4.834)$$
$$y^{(8)} = A^{-1} y^{(7)} = \frac{1}{6^8} [4092506 \ -8142530 \ 4063246]^T, \qquad (4.835)$$
$$y^{(9)} = A^{-1} y^{(8)} = \frac{1}{6^9} [24591886 \ -49050694 \ 24501290]^T, \qquad (4.836)$$
$$y^{(10)} = A^{-1} y^{(9)} = \frac{1}{6^{10}} [147673130 \ -294935762 \ 147395710]^T. \qquad (4.837)$$
It follows that
$$\lambda_3 \approx \frac{y_1^{(10)}}{y_1^{(9)}} = 1.0008, \quad \lambda_3 \approx \frac{y_2^{(10)}}{y_2^{(9)}} = 1.0021, \quad \lambda_3 \approx \frac{y_3^{(10)}}{y_3^{(9)}} = 1.0026, \qquad (4.838)$$
and we obtain the eigenvector $y^{(10)}$ or, when normalized,
$$y^{(10)} = [0.4088 \ -0.8164 \ 0.4080]^T. \qquad (4.839)$$
The eigenvalue $\lambda_2$ may be found by means of the displacement method. To do this, we consider the matrix
$$B = A - 1.9 I_3 = \begin{bmatrix} -2.9 & -3 & -4 \\ 8 & 10.1 & 14 \\ -4 & -5 & -6.9 \end{bmatrix}, \qquad (4.840)$$
the inverse of which is
$$B^{-1} = \begin{bmatrix} -3.131313 & 7.070707 & 16.161616 \\ 8.080808 & -40.505051 & -86.868687 \\ -4.040404 & 25.252525 & 53.434343 \end{bmatrix}. \qquad (4.841)$$
We successively calculate
$$B^{-2} = \begin{bmatrix} 1.64269 & 99.58167 & 198.75522 \\ -1.63249 & -495.85751 & -992.55170 \\ 0.81624 & 297.92876 & 596.27586 \end{bmatrix}, \qquad (4.842)$$
$$B^{-4} = \begin{bmatrix} 2.364 & 10000.159 & 19999.477 \\ -3.360 & -49997.593 & -99994.870 \\ 1.679 & 29998.797 & 59997.436 \end{bmatrix}, \qquad (4.843)$$
$$B^{-8} = \begin{bmatrix} -15.8 & 100000011.4 & 200000020.9 \\ 92.6 & -499999600.9 & -1000000199 \\ -56.3 & 300000050.5 & 600000099.5 \end{bmatrix}. \qquad (4.844)$$
It follows that the dominant eigenvalue of $B^{-1}$ is
$$\mu \approx \sqrt[8]{\mathrm{Tr}(B^{-8})} = 10.0; \qquad (4.845)$$
hence the matrix $B$ has the eigenvalue
$$\lambda' = \frac{1}{\mu} = 0.1. \qquad (4.846)$$
We deduce from equation (4.840) that the matrix $A$ has the eigenvalue
$$\lambda_2 = \lambda' + 1.9 = 2.0. \qquad (4.847)$$
The eigenvalues of the matrix $A$ may be determined by the Leverrier method too. We calculate
$$A = \begin{bmatrix} -1 & -3 & -4 \\ 8 & 12 & 14 \\ -4 & -5 & -5 \end{bmatrix}, \quad S_1 = \mathrm{Tr}(A) = 6, \qquad (4.848)$$
$$A^2 = \begin{bmatrix} -7 & -13 & -18 \\ 32 & 50 & 66 \\ -16 & -23 & -29 \end{bmatrix}, \quad S_2 = \mathrm{Tr}(A^2) = 14, \qquad (4.849)$$
$$A^3 = \begin{bmatrix} -25 & -45 & -64 \\ 104 & 174 & 242 \\ -52 & -83 & -113 \end{bmatrix}, \quad S_3 = \mathrm{Tr}(A^3) = 36, \qquad (4.850)$$
the coefficients of the characteristic polynomial being given by
$$p_1 = -S_1 = -6, \qquad (4.851)$$
$$p_2 = -\frac{1}{2}(S_2 + p_1 S_1) = 11, \qquad (4.852)$$
$$p_3 = -\frac{1}{3}(S_3 + p_1 S_2 + p_2 S_1) = -6. \qquad (4.853)$$
We obtain the characteristic equation
$$\lambda^3 - 6\lambda^2 + 11\lambda - 6 = 0, \qquad (4.854)$$
whose roots are given by equation (4.794).
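Krylov's method (used at the beginning of this example) and the Leverrier computation just carried out both produce the coefficients of the characteristic polynomial, and both fit in a few lines. The sketch below is illustrative only (the function names are ours); np.roots then returns the eigenvalues 3, 2, 1.

    import numpy as np

    def krylov_coeffs(A, y0):
        """Krylov: solve system (4.791) for q1..qn, where
        P(l) = l^n + q1 l^(n-1) + ... + qn."""
        n = A.shape[0]
        ys = [np.asarray(y0, dtype=float)]
        for _ in range(n):
            ys.append(A @ ys[-1])                 # y^(1), ..., y^(n)
        M = np.column_stack(ys[n - 1::-1])        # columns y^(n-1), ..., y^(0)
        return np.linalg.solve(M, -ys[n])

    def leverrier_coeffs(A):
        """Leverrier: p1..pn from S_k = Tr(A^k) and Newton's formulae (4.472)."""
        n = A.shape[0]
        S, Ak, p = [], np.eye(n), []
        for _ in range(n):
            Ak = Ak @ A
            S.append(np.trace(Ak))                # S_1, ..., S_n, cf. (4.474)
        for k in range(1, n + 1):
            s = S[k - 1] + sum(S[k - 2 - i] * p[i] for i in range(k - 1))
            p.append(-s / k)                      # relations (4.473)
        return p

    A = np.array([[-1., -3., -4.], [8., 12., 14.], [-4., -5., -5.]])
    print(krylov_coeffs(A, [1., 0., 1.]))         # [-6. 11. -6.]
    print(leverrier_coeffs(A))                    # [-6.0, 11.0, -6.0]
    print(np.roots([1., -6., 11., -6.]))          # [3. 2. 1.]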
Another method to determine the eigenvalues is the Left-Right (L-R) one. We write the matrix $A$ in the form
$$A = \begin{bmatrix} 1 & 0 & 0 \\ l_{21} & 1 & 0 \\ l_{31} & l_{32} & 1 \end{bmatrix} \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ 0 & r_{22} & r_{23} \\ 0 & 0 & r_{33} \end{bmatrix}; \qquad (4.855)$$
it results in the system
$$r_{11} = -1, \quad r_{12} = -3, \quad r_{13} = -4, \quad l_{21} r_{11} = 8, \quad l_{21} r_{12} + r_{22} = 12, \quad l_{21} r_{13} + r_{23} = 14, \quad l_{31} r_{11} = -4, \quad l_{31} r_{12} + l_{32} r_{22} = -5, \quad l_{31} r_{13} + l_{32} r_{23} + r_{33} = -5, \qquad (4.856)$$
with the solution
$$r_{11} = -1, \quad r_{12} = -3, \quad r_{13} = -4, \quad l_{21} = -8, \quad r_{22} = -12, \quad r_{23} = -18, \quad l_{31} = 4, \quad l_{32} = -\frac{7}{12}, \quad r_{33} = \frac{1}{2}, \qquad (4.857)$$
hence the matrices
$$L_1 = \begin{bmatrix} 1 & 0 & 0 \\ -8 & 1 & 0 \\ 4 & -\frac{7}{12} & 1 \end{bmatrix}, \quad R_1 = \begin{bmatrix} -1 & -3 & -4 \\ 0 & -12 & -18 \\ 0 & 0 & \frac{1}{2} \end{bmatrix}, \qquad (4.858)$$
$$A_2 = R_1 L_1 = \begin{bmatrix} 7 & -\frac{2}{3} & -4 \\ 24 & -\frac{3}{2} & -18 \\ 2 & -\frac{7}{24} & \frac{1}{2} \end{bmatrix}. \qquad (4.859)$$
The procedure can continue, the data obtained being given in Table 4.1. This results in the following eigenvalues:
$$\lambda_1 \approx 3.0002, \quad \lambda_2 \approx 1.9888, \quad \lambda_3 \approx 1.0056. \qquad (4.860)$$

Example 4.9 Let the linear system be
$$2 x_1 + 3 x_2 + x_3 + 3 x_4 = 9, \quad x_1 - 2 x_2 - x_3 + 5 x_4 = 3, \quad 3 x_1 + 6 x_2 + x_3 - 2 x_4 = 8, \quad -2 x_1 - x_2 + 6 x_3 + 4 x_4 = 7, \quad x_1 + 2 x_2 + 5 x_3 - 7 x_4 = 1. \qquad (4.861)$$
We wish to determine the solution of this system in the sense of the least squares method. We have
$$A = \begin{bmatrix} 2 & 3 & 1 & 3 \\ 1 & -2 & -1 & 5 \\ 3 & 6 & 1 & -2 \\ -2 & -1 & 6 & 4 \\ 1 & 2 & 5 & -7 \end{bmatrix}. \qquad (4.862)$$
We shall first determine the rank of the matrix $A$. We commute rows 1 and 2 with each other,
$$A \sim \begin{bmatrix} 1 & -2 & -1 & 5 \\ 2 & 3 & 1 & 3 \\ 3 & 6 & 1 & -2 \\ -2 & -1 & 6 & 4 \\ 1 & 2 & 5 & -7 \end{bmatrix}; \qquad (4.863)$$
  • 215. 206 LINEAR ALGEBRA TABLE 4.1 Determination of the Eigenvalues by the L–R Method Step A L R 1   −1 −3 −4 8 12 14 −4 −5 −5     1 0 0 −8 11 0 4 0.583333 1     −1 −3 −4 0 −12 −18 0 0 0.5   2   7 −0.6667 −4 24 −1.5 −18 2 0.2917 0.5     1 0 0 3.4286 1 0 0.2857 −0.1288 1     7 −0.6667 −4 0 0.7857 −4.2857 0 0 1.0909   3   3.5714 −0.1515 −4 1.4694 1.3377 −4.2857 0.3117 −0.1405 1.0909     1 0 0 0.4114 1 0 0.0873 −0.0909 1     3.5714 −0.1515 −4 0 1.4 −2.64 0 0 1.2   4   3.16 0.2121 −4 0.3456 1.64 −2.64 0.1047 −0.1091 1.2     1 0 0 0.1094 1 0 0.0331 −0.0718 1     3.16 0.2121 −4 0 1.16168 −2.2025 0 0 1.1744   5   3.0506 0.4994 −4 0.1038 1.7750 −2.2025 0.0389 −0.0843 1.1744     1 0 0 0.0348 1 0 0.0128 −0.0516 1     3.0506 0.4994 −4 0 1.7580 −2.0664 0 0 1.1188   6   3.0166 0.7058 −4 0.0335 1.8646 −2.0664 0.0143 −0.0577 1.1188     1 0 0 0.0111 1 0 0.0047 −0.0329 1     3.0166 0.7058 −4 0 1.8568 −2.0220 0 0 1.0712   7   3.0055 0.8374 −4 0.0110 1.9233 −2.0220 0.0051 −0.0352 1.0712     1 0 0 0.0037 1 0 0.0017 −0.0191 1     3.0055 0.8374 −4 0 1.9202 −2.0073 0 0 1.0396   8   3.0018 0.9137 −4 0.0037 1.9585 −2.0073 0.0018 −0.0198 1.0396     1 0 0 0.0012 1 0 0.0006 −0.0104 1     3.0018 0.9137 −4 0 1.9574 −2.0024 0 0 1.0211   9   3.0006 0.9553 −4 0.0012 1.9783 −2.0024 0.0006 −0.0106 1.0211     1 0 0 0.0004 1 0 0.0002 −0.0055 1     3.0006 0.9553 −4 0 1.9779 −2.0008 0 0 1.0110   10   3.0002 0.9772 −4 0.0004 1.9888 −2.0008 0.0002 −0.0055 1.0110     1 0 0 0.0001 1 0 0.0001 −0.0028 1     3.0002 0.9772 −4 0 1.9887 −2.0003 0 0 1.0056   then we multiply row 1 by −2, −3, 2 and −1, and add it to rows 2, 3, 4, 5, respectively, obtaining A ∼       1 −2 −1 5 0 7 3 −7 0 12 4 −17 0 −5 4 −6 0 4 6 −12       . (4.864) We multiply column 1 by 2, 1, −5 and add this to columns 2, 3, 4, respectively, to get A ∼       1 0 0 0 0 7 3 −7 0 12 4 −17 0 −5 4 −6 0 4 6 −12       ; (4.865)
  • 216. NUMERICAL EXAMPLES 207 We also multiply row 2 by −12/7, 5/7 and −4/7, and add this to rows 3, 4, 5, respectively A ∼              1 0 0 0 0 7 3 −7 0 0 − 8 7 −5 0 0 43 7 −11 0 0 30 7 −8              . (4.866) We now multiply column 2 by −3/7 and 1, and add this to columns 3, 4, respectively A ∼             1 0 0 0 0 7 0 0 0 0 − 8 7 −5 0 0 43 7 −11 0 0 30 7 −8             . (4.867) We now multiply row 3 by 43/8 and 30/8, and add this to rows 4, 5, respectively A ∼              1 0 0 0 0 7 0 0 0 0 − 8 7 −5 0 0 0 − 303 7 0 0 0 − 107 4              . (4.868) Finally, we multiply row 4 by −749/1212 and add this to row 5 A ∼             1 0 0 0 0 7 0 0 0 0 − 8 7 −5 0 0 0 − 303 7 0 0 0 0             . (4.869) It follows that rank(A) = 4, (4.870) so that we must solve the linear system AT AxLS = AT b, (4.871)
  • 217. 208 LINEAR ALGEBRA that is     2 1 3 −2 1 3 −2 6 −1 2 1 −1 1 6 5 3 5 −2 4 −7           2 3 1 3 1 −2 −1 5 3 6 1 −2 −2 −1 6 4 1 2 5 −7           x1 x2 x3 x4     =     2 1 3 −2 1 3 −2 6 −1 2 1 −1 1 6 5 3 5 −2 4 −7           9 3 8 7 1       (4.872) or, equivalently,     19 26 −3 −10 26 54 15 −31 −3 15 64 −15 −10 −31 −15 103         x1 x2 x3 x4     =     32 64 61 47     . (4.873) The solution of this system is xLS = x1 x2 x3 x4 T = 1 1 1 1 T . (4.874) Example 4.10 Let us again take the matrix A of Example 4.7 for which we have found U = 0.7497 −0.6618 0.6618 0.7497 , V = 0.2567 −0.9665 0.9665 0.2567 , (4.875) A = 1 2 0 2 , (4.876) Σ = UT AV = 2.92 0 0 0.68 . (4.877) Its pseudo-inverse (in fact, it is just the inverse) is given by A+ = VΣ+ UT = 0.2567 −0.9665 0.9665 0.2567 1 2.92 0 0 1 0.68 0.7497 0.6618 −0.6618 0.7497 = 1 −1 0 0.5 . (4.878) Example 4.11 Let the underdetermined linear system be 2x1 + 3x2 + x3 = 6, x1 + 4x2 + 3x3 = 8. (4.879) The matrix A has the expression A = 2 3 1 1 4 3 , AT =   2 1 3 4 1 3   . (4.880) We find now the QR decomposition of the matrix AT . We have x1 = 2 3 1 T , x1 2 = √ 14 = λ1 (4.881) and choose v1 = x1 + λ1e1 = 2 + √ 14 3 1 T . (4.882)
  • 218. NUMERICAL EXAMPLES 209 Then v1vT 1 =   18 + 4 √ 14 6 + 3 √ 14 2 + √ 14 6 + 3 √ 14 9 3 2 + √ 14 3 1   , (4.883) vT 1 v1 = 28 + 4 √ 14, (4.884) 2 v1vT 1 vT 1 v1 =   1.53452 0.80178 0.26726 0.80178 0.41893 0.13964 0.26726 0.13964 0.04655   , (4.885) H1 =   −0.53452 −0.80178 −0.26726 −0.80178 0.58107 −0.13964 −0.26726 −0.13964 0.95345   , (4.886) H1AT =   −0.53452 −0.80178 −0.26726 −0.80178 0.58107 −0.13964 −0.26726 −0.13964 0.95345     2 1 3 4 1 3   =   −3.74164 −4.54342 0 1.10358 0 2.03453   . (4.887) The next vector is x2 = 1.10358 2.03453 T , (4.888) for which x2 2 = 2.31456. (4.889) We choose v2 = x2 + x2 2e2 = 3.418174 2.03453 T , (4.890) for which v2vT 2 = 11, 68368 6, 95431 6, 95431 4, 13931 , (4.891) vT 2 v2 = 15, 82299, (4.892) 2 v2vT 2 vT 2 v2 = 1, 47680 0, 87901 0, 87901 0, 52320 , (4.893) H2 =   1 0 0 0 −0.47680 −0.87901 0 −0.87901 0.47680   , (4.894) H2H1AT =   −3.74164 −4.54342 0 −2.31456 0 0   = R, (4.895) Q = H1H2 =   −0.53452 0.61721 0.57735 −0.80178 −0.15431 −0.57735 −0.26726 −0.77151 0.57735   . (4.896) It results in the system −3.74164 0 0 −4.54342 −2.31456 0   z1 z2 z3   = 6 8 , (4.897)
with the solution

z1 = −1.60357,  z2 = −0.30861,  z3 = 0.   (4.898)

The vector x is given by the system

[−0.53452 −0.80178 −0.26726; 0.61721 −0.15431 −0.77151; 0.57735 −0.57735 0.57735] · [x1 x2 x3]ᵀ = [−1.60357 −0.30861 0]ᵀ   (4.899)

and it follows that

x1 = 0.66667,  x2 = 1.33334,  x3 = 0.66667,  ‖x‖₂ = 1.633.   (4.900)

If we consider z3 = 0.57735 instead, then we obtain x1 = 1, x2 = 1, x3 = 1, with ‖x‖₂ = √3 = 1.73205 > 1.633; the solution with z3 = 0 is thus the one of minimum norm.

Example 4.12 Let

A = [1 2; 2 2]   (4.901)

be the matrix for which we wish to determine the eigenvalues and eigenvectors by means of the rotation method. To do this, we construct the matrix R1 given by

R1 = [cos α  −sin α; sin α  cos α],  R1⁻¹ = R1ᵀ,   (4.902)

where

tan 2α = 2a12/(a11 − a22) = −4.   (4.903)

It follows that

α = −0.66291,   (4.904)

R1 = [0.78821 0.61541; −0.61541 0.78821]   (4.905)

and the new matrix is

A2 = R1⁻¹AR1 = [−0.56156 0; 0 3.56157].   (4.906)

We observe that the matrix A2 is diagonal, the eigenvalues of the matrix A being given by

λ1 ≈ −0.56156,  λ2 ≈ 3.56157,   (4.907)

while the eigenvectors read

v1 = [0.78821 −0.61541]ᵀ,  v2 = [0.61541 0.78821]ᵀ.   (4.908)

The exact eigenvalues of the matrix A are

λ1 = (3 − √17)/2 = −0.56155,  λ2 = (3 + √17)/2 = 3.56155.   (4.909)
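Example 4.12 is easy to reproduce numerically. The following sketch (Python with NumPy, added here for illustration and not part of the original text) builds the rotation matrix R1 from (4.903) and checks that R1ᵀAR1 is diagonal; it assumes a11 ≠ a22, so that the arctangent in (4.903) is well defined.

```python
import numpy as np

# Rotation (Jacobi) method for the 2x2 symmetric matrix of Example 4.12
A = np.array([[1.0, 2.0],
              [2.0, 2.0]])

# tan(2*alpha) = 2*a12/(a11 - a22), eq. (4.903); assumes a11 != a22
alpha = 0.5 * np.arctan(2.0 * A[0, 1] / (A[0, 0] - A[1, 1]))
c, s = np.cos(alpha), np.sin(alpha)
R1 = np.array([[c, -s],
               [s,  c]])            # eq. (4.902); orthogonal, so R1^{-1} = R1^T

A2 = R1.T @ A @ R1                  # eq. (4.906)
print(np.round(A2, 5))              # diag(-0.56155, 3.56155), off-diagonals ~ 0
print(np.round(R1, 5))              # columns = eigenvectors, cf. (4.908)
```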
4.13 APPLICATIONS

Problem 4.1 Let us show that the motion of the system in Figure 4.1 is stable if the force F is given by

F'' = −40x' − 25x,   (4.910)

the constants of the system being m = 4 kg, c = 20 N s m⁻¹, k = 41 N m⁻¹.

Figure 4.1 Problem 4.1 (mass restrained by the spring k and the damper c, subjected to the force F, with displacement x).

Solution: Differentiating the differential equation of motion

mx'' + cx' + kx = F   (4.911)

twice with respect to time and taking into account the numerical values, we obtain

4x'''' + 20x''' + 41x'' + 40x' + 25x = 0;   (4.912)

the characteristic equation is

b0r⁴ + b1r³ + b2r² + b3r + b4 = 0,   (4.913)

where b0 = 4, b1 = 20, b2 = 41, b3 = 40, b4 = 25. The motion is asymptotically stable if the solutions of equation (4.913) are either strictly negative or complex with a strictly negative real part. To this end, the conditions of the Routh–Hurwitz criterion must be fulfilled, that is,

bi > 0, i = 0, 1, …, 4,  det A1 > 0,  det A2 > 0,   (4.914)

where

A1 = [b1 b0; b3 b2],  A2 = [b1 b0 0; b3 b2 b1; 0 b4 b3]   (4.915)

or, equivalently,

A1 = [20 4; 40 41],  A2 = [20 4 0; 40 41 20; 0 25 40].   (4.916)

For the numerical application, we obtain the values det A1 = 660 and det A2 = 16,400; conditions (4.914) are fulfilled and, as a consequence, the motion is asymptotically stable.
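As a quick numerical cross-check of the Routh–Hurwitz conditions (4.914)–(4.916) — a minimal sketch in Python with NumPy, added for illustration — one can evaluate the two Hurwitz determinants and the roots of the characteristic polynomial directly:

```python
import numpy as np

# Characteristic polynomial b0*r^4 + b1*r^3 + b2*r^2 + b3*r + b4 = 0, eq. (4.913)
b0, b1, b2, b3, b4 = 4, 20, 41, 40, 25

A1 = np.array([[b1, b0],
               [b3, b2]])                 # eq. (4.915)
A2 = np.array([[b1, b0, 0],
               [b3, b2, b1],
               [0,  b4, b3]])
print(np.linalg.det(A1))                  # 660   -> det A1 > 0
print(np.linalg.det(A2))                  # 16400 -> det A2 > 0
print(np.roots([b0, b1, b2, b3, b4]))     # -2 ± i and -0.5 ± i: all Re < 0
```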
Moreover, the roots of equation (4.913) are r1,2 = −2 ± i, r3,4 = −1/2 ± i, so that the solution is obviously asymptotically stable:

x = C1 e^(−2t) cos(t + φ1) + C2 e^(−t/2) cos(t + φ2),   (4.917)

where C1, C2, φ1, φ2 are integration constants that may be determined from the initial conditions.

Problem 4.2 We consider a rigid solid acted upon by five forces of intensities Fi, i = 1, …, 5, the supports of which are the straight lines of equations

bix − aiy = 0,  z − zi = 0,  ai² + bi² = 1,  i = 1, …, 5.   (4.918)

Let us show that if the rank of the matrix

A = [a1 a2 a3 a4 a5; b1 b2 b3 b4 b5; a1z1 a2z2 a3z3 a4z4 a5z5; b1z1 b2z2 b3z3 b4z4 b5z5]   (4.919)

is equal to four, then we may determine the intensities Fi, i = 1, …, 5, so that the solid is in equilibrium.

Solution: The equations of equilibrium

Σ_{i=1}^{5} Fiai = 0,  Σ_{i=1}^{5} Fibi = 0,  Σ_{i=1}^{5} Fiaizi = 0,  Σ_{i=1}^{5} Fibizi = 0   (4.920)

form a system of homogeneous algebraic equations, which admits nontrivial solutions if rank A = 4. Let Δjklm denote the determinant of the 4 × 4 matrix formed with columns j, k, l, m of A. Because the determinant

Δ1234 = det [1 0 √2/2 3/5; 0 1 √2/2 4/5; 0 0 √2/2 −3/10; 0 1/2 √2/2 −2/5] = −13√2/40 ≠ 0,   (4.921)

system (4.920) admits the solution

F1/Δ2345 = F2/Δ3451 = F3/Δ4512 = F4/Δ5123 = F5/Δ1234   (4.922)
or, equivalently, with the numerical data,

Δ2345 = det [0 √2/2 3/5 −3/5; 1 √2/2 4/5 4/5; 0 √2/2 −3/10 3/5; 1/2 √2/2 −2/5 −4/5],
Δ3451 = det [√2/2 3/5 −3/5 1; √2/2 4/5 4/5 0; √2/2 −3/10 3/5 0; √2/2 −2/5 −4/5 0],
Δ4512 = det [3/5 −3/5 1 0; 4/5 4/5 0 1; −3/10 3/5 0 0; −2/5 −4/5 0 1/2],
Δ5123 = det [−3/5 1 0 √2/2; 4/5 0 1 √2/2; 3/5 0 0 √2/2; −4/5 0 1/2 √2/2],
Δ1234 = det [1 0 √2/2 3/5; 0 1 √2/2 4/5; 0 0 √2/2 −3/10; 0 1/2 √2/2 −2/5],   (4.923)

from which

F1/(−213√2/200) = F2/(−19√2/25) = F3/(21/25) = F4/(19√2/20) = F5/(−13√2/40).   (4.924)

Denoting now by λ (an arbitrary real number) the common value of the ratios in relations (4.924), we obtain the forces

F1 = −(213√2/200)λ,  F2 = −(19√2/25)λ,  F3 = (21/25)λ,  F4 = (19√2/20)λ,  F5 = −(13√2/40)λ.   (4.925)

Problem 4.3 Let us consider a rigid solid (Fig. 4.2), suspended by n = 6 bars with spherical hinges. Let ui be the unit vectors in the directions AiA0i, ri the position vectors of the points Ai, mi = ri × ui the corresponding moment vectors, (ai, bi, ci) and (di, ei, fi) the projections of the vectors ui and mi, i = 1, …, 6, on the axes of the OXYZ-trihedron, and let A be the matrix defined by

A = [a1 a2 a3 a4 a5 a6; b1 b2 b3 b4 b5 b6; c1 c2 c3 c4 c5 c6; d1 d2 d3 d4 d5 d6; e1 e2 e3 e4 e5 e6; f1 f2 f3 f4 f5 f6].   (4.926)

Let us show that if rank A = 6, then the equilibrium of the rigid solid is a statically determined (isostatic) problem; hence the efforts Ni in the bars AiA0i, i = 1, …, 6, may be determined for any system of forces (F, MO) that acts upon the rigid solid.
Figure 4.2 Problem 4.3 (rigid solid suspended by the bars AiA0i of unit vectors ui, acted upon by F and MO).

As a numerical application, we consider the cube in Figure 4.3 of side l = 2 m, acted upon by the force of components FX = 2000 N, FY = 2000 N, FZ = 4000 N and by the moment of projections MOX = 3000 N m, MOY = 1000 N m, MOZ = 2000 N m, the bars A1A01, A5A05 being parallel to the OX-axis, the bars A2A02, A6A06 being parallel to the OY-axis, while the bars A3A03, A4A04 are parallel to the OZ-axis.

Solution: By means of the vectors ui, mi, i = 1, …, 6, the equations of equilibrium

Σ_{i=1}^{6} Ni + F = 0,  Σ_{i=1}^{6} ri × Ni + MO = 0   (4.927)

are obtained in the form

Σ_{i=1}^{6} Niui + F = 0,  Σ_{i=1}^{6} Nimi + MO = 0.   (4.928)

Figure 4.3 Numerical application (the cube with the six suspension bars A1A01–A6A06).
If we denote by (FX, FY, FZ), (MOX, MOY, MOZ) the projections of the vectors F, MO on the axes OX, OY, OZ and by {F}, {N} the column matrices

{F} = [FX FY FZ MOX MOY MOZ]ᵀ,  {N} = [N1 N2 N3 N4 N5 N6]ᵀ,   (4.929)

then system (4.928) leads to the matrix equation

A{N} + {F} = {0},   (4.930)

which has a unique solution if rank A = 6, the problem being isostatic.

Observation 4.42 If the number of bars is n > 6, then equation (4.930) may still have a solution if rank A = 6. In this case, the problem is statically undetermined (hyperstatic); the determination of the reactions Ni, i = 1, …, n, is possible by taking into account the equations of elastic equilibrium.

In the numerical case, it follows that

A = [1 0 0 0 1 0; 0 1 0 0 0 1; 0 0 1 1 0 0; 0 0 0 2 0 0; 0 0 0 0 2 0; 0 0 0 0 0 2],   (4.931)

det A = 8 ≠ 0,  rank A = 6;   (4.932)

and because

{F} = 1000 · [2 2 4 3 1 2]ᵀ,   (4.933)

we obtain the matrix equation

[1 0 0 0 1 0; 0 1 0 0 0 1; 0 0 1 1 0 0; 0 0 0 2 0 0; 0 0 0 0 2 0; 0 0 0 0 0 2] · [N1 N2 N3 N4 N5 N6]ᵀ = [−2000 −2000 −4000 −3000 −1000 −2000]ᵀ,   (4.934)

from which the values

N1 = −1500 N, N2 = −1000 N, N3 = −2500 N, N4 = −1500 N, N5 = −500 N, N6 = −1000 N.   (4.935)

Problem 4.4 A homogeneous straight bar AB, of constant cross section, of mass m and length 2l, is moving, under the action of its own weight, in the vertical plane OXY (Fig. 4.4), with the end A on the hyperbola of equation

F(X, Y) = (X − 2l)Y − 8l² = 0.   (4.936)
  • 225. 216 LINEAR ALGEBRA O Y X B0(l,8l) B A0(3l,8l) A(XA,YA) mg θ Figure 4.4 Problem 4.4. Knowing that at the initial moment the bar is in rest and parallel to the OX-axis, the end A being of coordinates (3l, 8l), determine the reaction NA at this moment, the acceleration of the gravity center C, as well as the angular acceleration. Numerical application for m = 3 kg, l = 1 m. Solution: Denoting by (X, Y) the coordinates of the center of gravity C and by θ the angle made by the bar with the OX-axis, we may write the relations XA = X + l cos θ, YA = Y + l sin θ, ˙XA = ˙X − l˙θ sin θ, ˙YA = ˙Y + l˙θ cos θ, ¨XA = ¨X − l¨θ sin θ − l˙θ2 cos θ, ¨YA = ¨Y + l¨θ cos θ − l˙θ2 sin θ, (4.937) from which, for the initial moment (X = 3l, Y = 8l, θ = 0, ˙θ = 0, ˙X = 0, ˙Y = 0), we obtain ˙XA = 0, ˙YA = 0, ¨XA = ¨X, ¨YA = ¨Y + l¨θ. (4.938) By successive differentiation of equation (4.936) with regard to time, we get the relations ˙XAYA + (XA − 2l) ˙YA = 0, ¨XAYA + (XA − 2l) ¨YA + 2 ˙XA ˙YA = 0. (4.939) Taking into account the relations at the initial moment, it follows that 8l ¨X + l ¨Y + l2 ¨θ = 0. (4.940) The reaction NA has the components NAX = λ ∂F ∂X X=XA Y=YA , NAY = λ ∂F ∂Y X=XA Y=YA (4.941) or NAX = λYA, NAY = λ(XA − 2l), (4.942) which, at the initial moment, become NAX = 8λl, NAY = λl. (4.943)
  • 226. APPLICATIONS 217 Under these conditions, the theorem of momentum leads to the equations m ¨X = 8λl, m ¨Y = −mg + λl, (4.944) while the theorem of moment of momentum with respect to the point C allows to write ml2 3 ¨θ = λl2 . (4.945) Using the notation A =        m 0 0 −8l 0 m 0 −l 0 0 ml2 3 −l2 8l l l2 0        , (4.946) equation (4.944) and equation (4.945), equation (4.940) can be written in the matrix form A     ¨X ¨Y ¨θ λ     =     0 −mg 0 0     , (4.947) from which, by inverting the matrix A, the matrix     ¨X ¨Y ¨θ λ     = −mgA−1     0 1 0 0     (4.948) is obtained. For the numerical application, we have A =     3 0 0 −8 0 3 0 −1 0 0 1 −1 8 1 1 0     , A−1 =     0.019608 −0.039216 −0.117647 0.117647 −0.039216 0.328431 −0.014706 0.014706 −0.117647 −0.014706 0.955882 0.044118 −0.117647 −0.014706 −0.044118 0.044118    , (4.949)     ¨X ¨Y ¨θ λ     =     1.153715 −9.662276 0.432643 0.432643     , (4.950) NAX = 3.461144 N, NAY = 0.432643 N. (4.951) Problem 4.5 We consider a system of two homogeneous straight bars, of constant cross sections, lengths 2l1 and 2l2 and masses m1 and m2 (Fig. 4.5), respectively, acted upon by their own weights m1g and m2g (a double pendulum). The fixed reference system is OXY , the OX-axis being vertical. Taking as generalized coordinates the coordinates (X1, Y1), (X2, Y2) of the center of gravity C1 and C2, respectively, as well as the angles θ1 and θ2 made by the bars with the OX-axis, it is required
  • 227. 218 LINEAR ALGEBRA X Y m1g m2g θ1 O C1(X1,Y1) C2(X2,Y2) θ2 Figure 4.5 Problem 4.5. (a) to write the differential equation of motion, using the multibody method; (b) the dimensions l1 = l2 = 0.5 m and the initial conditions at t = 0: X1 = l1, Y1 = 0, θ1 = 0, X2 = 2l1, Y2 = l2, θ2 = π/2, ˙X1 = ˙X2 = ˙Y1 = ˙Y2 = 0, ˙θ1 = ˙θ2 = 0 being given, determine the accelerations ¨X1, ¨Y1, ¨θ1, ¨X2, ¨Y2, ¨θ2 and the reactions, by inverting the matrix in two cases, that is, m1 = m2 = 3 kg and m1 = 0, m2 = 3 kg. Solution: Differentiating the constraints functions with respect to time, X1 − l1cosθ1 = 0, Y1 − l1sinθ1 = 0, −X1 − l1cosθ1 + X2 − l2cosθ2 = 0, −Y1 − l1sinθ1 + Y2 − l2sinθ2 = 0, (4.952) and using the notations [B] =     1 0 l1 sin θ1 0 0 0 0 1 −l1 cos θ1 0 0 0 −1 0 l1 sin θ1 1 0 l2 sin θ2 0 −1 −l1 cos θ1 0 1 −l2 cos θ2     , (4.953) {q} = X1 Y1 θ1 X2 Y2 θ2 T , (4.954) [B] being the constraints matrix and {q} the column matrix of the generalized coordinates, we obtain the relation [B]{˙q} = {0}. (4.955) We apply Lagrange’s equations d dt ∂T ∂ ˙qk − ∂T ∂qk + ∂V ∂qk = 4 i=1 Bik λi, k = 1, 6, (4.956) where λi, i = 1, 4, are Lagrange’s multipliers, while the kinetic energy T and the potential energy V are given by the relations T = 1 2 2 i=1 mi ˙X2 i + ˙Y2 i + mil2 i 3 ˙θ2 i , (4.957) V = −g 2 i=1 miXi, (4.958) respectively.
  • 228. APPLICATIONS 219 Using the notations [M] =             m1 0 0 0 0 0 0 m1 0 0 0 0 0 0 m1l2 1 3 0 0 0 0 0 0 m2 0 0 0 0 0 0 m2 0 0 0 0 0 0 m2l2 2 3             , {F} =         m1g 0 0 m2g 0 0         , {λ} =     λ1 λ2 λ3 λ4     , (4.959) we obtain the matrix equation [M]{¨q} = {F} + [B]T {λ}. (4.960) Relation (4.960) and relation (4.955), differentiated with respect to time, are expressed together in the matrix equation of motion of the mechanical system [M] −[B]T [B] [0] {¨q} {λ} = {F} −[ ˙B]{˙q} . (4.961) We obtain {¨q} {λ} = [M]−1 − [M]−1 [B]T [[B][M]−1 [B]T ]−1 [B][M]−1 [M]−1 [B]T [[B][M]−1 [B]T ]−1 −[[B][M]−1 [B]T ]−1 [B][M]−1 [[B][M]−1 [B]T ]−1 × {F} −[ ˙B]{˙q} (4.962) if the matrix [M] is invertible. For the first numerical application, we obtain the values [M] =         3 0 0 0 0 0 0 3 0 0 0 0 0 0 0.25 0 0 0 0 0 0 3 0 0 0 0 0 0 3 0 0 0 0 0 0 0.25         , [M]−1 =                   1 3 0 0 0 0 0 0 1 3 0 0 0 0 0 0 4 0 0 0 0 0 0 1 3 0 0 0 0 0 0 1 3 0 0 0 0 0 0 4                   , (4.963) [B] =     1 0 0 0 0 0 0 1 −0.5 0 0 0 −1 0 0 1 0 0.5 0 −1 −0.5 0 1 0     , [B]T =         1 0 −1 0 0 1 0 −1 0 −0.5 0 −0.5 0 0 1 0 0 0 0 1 0 0 0.5 0         , (4.964) {F} = 29.4195 0 0 29.4195 0 0 T , (4.965) [ ˙B]{˙q} = 0 0 0 0 T , (4.966)
  • 229. 220 LINEAR ALGEBRA {¨q} {λ} = ¨X1 ¨Y1 ¨θ1 ¨X2 ¨Y2 ¨θ2 λ1 λ2 λ3 λ4 T = 0 0 0 7.354875 0 −14.709750 −36.774375 0 −7.354875 0 T (4.967) for the initial moment, where λ1, λ2 are the reactions at the hinge O, while λ3, λ4 are the reactions at the hinge O1. For the second numerical application, the matrix [M] is not invertible, so that it is necessary to proceed to the inversion of the total matrix [A] = [M] −[B]T [B] [0] . (4.968) Hence, it follows that {¨q} {λ} = ¨X1 ¨Y1 ¨θ1 ¨X2 ¨Y2 ¨θ2 λ1 λ2 λ3 λ4 T = 0 0 0 7.354875 0 −14.709750 −7.354875 0 −7.354875 0 T . (4.969) Problem 4.6 Consider a rigid solid, as illustrated in Figure 4.6, upon which a percussion P is applied at the point A. We denote by • Oxyz —the reference system rigidly connected to the solid; • m—the mass; • [JO]—the matrix of the moments of inertia [JO] =   Jx −Jxy −Jxz −Jxy Jy −Jyz −Jxz −Jyz Jz   ; (4.970) • rC —the position vector of the center of gravity C; • xC, yC, zC —the coordinates of the gravity center C; • rA —the position vector of the point A; • xA, yA, zA —the coordinates of the point A; C (xC,yC,zC) A (xA,yA,zA) rA rC P z x O y uv0 Oω0 O Figure 4.6 Problem 4.6.
  • 230. APPLICATIONS 221 • u—the unit vector of the percussion P; • a, b, c—the components of the unit vector u; • d, e, f —the projections on the axes of the vector rA · u; • P —the intensity of the percussion P; • v0 O —the velocity of the point O before the application of the percussion; • v0 Ox , v0 Oy , v0 Oz —the projections of the velocity v0 O on the axes; • ω0 —the angular velocity of the rigid solid before the application of the percussion; • ω0 x, ω0 y, ω0 z —the projections of the vector ω0 on the axes; • vO —the velocity of the point O after application of the percussion; • vOx , vOy , vOz —projections of the velocity vO on the axes; • ω—the angular velocity after percussion; • ωx, ωy, ωz —the projections of the vector ω on the axes; • {v0 O}, {ω0 }, {vO}, {ω}—the column matrices defined by {v0 O} = v0 Ox v0 Oy v0 Oz T , {ω0 } = ω0 x ω0 y ω0 z T , {vO} = vOx vOy vOz T , {ω} = ωx ωy ωz T ; (4.971) • {u}, {mu}—the column matrices defined by {u} = a b c T , {mu} = d e f T ; (4.972) • {V}, {V0}, {U}—the column matrices defined by {V} = vOx vOy vOz ωx ωy ωz T , {V0 } = v0 Ox v0 Oy v0 Oz ω0 x ω0 y ω0 z T , (4.973) {U} = a b c d e f T ; • [m], [S], [M]—the matrices defined by [m] =   m 0 0 0 m 0 0 0 m   , [S] =   0 −mzC myC mzC 0 −mxC −myC mxC 0   , [M] = [m] [S]T [S] [JO] . (4.974) Determine the velocities vOx , vOy , vOz , ωx, ωy, ωz after the application of the percussion. For the numerical application, we take m = 80, Jx = 2, Jxy = 0.8, Jxz = 0.4, Jy = 2, Jyz = 0.4, Jz = 3.2, xC = 0.05, yC = 0.05, zC = 0.025, xA = 0.2, yA = 0.2, zA = 0.1, a = 2/3, b = 1/3, c = 2/3, v0 Ox = 10, v0 Oy = 8, v0 Oz = 7, ω0 x = 4, ω0 y = 3, ω0 z = 5, P = 100 (quantities given in SI). Solution: The theorem of momentum for collisions, in matrix form, leads to [m] vO − {v0 O} + [S]T {{ω} − {ω0 }} = P {u}. (4.975) Analogically, the theorem of moment of momentum for collisions about the point O, in matrix form, reads [S]{{vO} − {v0 O}} + [JO]{{ω} − {ω0 }} = P {mu}. (4.976)
  • 231. 222 LINEAR ALGEBRA Equation (4.975) and equation (4.976) may be written together in a matrix form [M]{{V} − {V0 }} = P {U}; (4.977) inverting the matrix [M] {V} = {V0 } + P [M]−1 {U}. (4.978) For the numerical application, we obtain rA = 0.2i + 0.2j + 0.1k, u = 2 3 i + 1 3 j + 2 3 k, rA · u = i j k 0.2 0.2 0.1 2 3 1 3 2 3 = 0.1i − 0.2 3 j − 0.2 3 k, (4.979) [S] =   0 −2 4 2 0 −4 −4 4 0   , (4.980) {U} = 2 3 1 3 2 3 0.1 − 0.2 3 − 0.2 3 T , {V0 } = 10 8 7 4 3 5 T , (4.981) [M] =         80 0 0 0 2 −4 0 80 0 −2 0 4 0 0 80 4 −4 0 0 −2 4 2 −0.8 −0.4 2 0 −4 −0.8 2 −0.4 −4 4 0 −0.4 −0.4 3.2         , (4.982) [M]−1 =         −0.013620 −0.000854 −0.000532 −0.001260 −0.011898 0.016447 −0.000854 1.013620 −0.000532 0.011898 0.001260 −0.016447 −0.000532 −0.000532 −0.014628 −0.021277 0.021277 0 −0.001260 0.011898 −0.021277 0.673292 0.247760 0.098681 −0.011898 0.001260 0.021277 0.247760 0.673292 0.098681 0.016447 −0.016447 0 0.098681 0.098681 0.378289         , (4.983) from which {V} = vOx vOy vOz ωx ωy ωz T = 8.985140 8.581827 5.616983 7.317427 0.998380 3.355253 T . (4.984) Problem 4.7 The matrix of the moments of inertia of a rigid solid is [J] =   Jxx −Jxy −Jxz −Jxy Jyy −Jyz −Jxz −Jyz Jzz   =   2.178606 0.313753 −0.219693 0.313753 3.209143 0.553764 −0.219693 0.553764 3.612250   ; let us determine the principal moments of inertia Jx, Jy, Jz, as well as the principal directions.
Solution:
1. Theory
The principal moments of inertia are just the eigenvalues of the matrix [J], which are given by the third-degree equation

det[[J] − λ[I]] = 0,   (4.985)

where [I] is the unit matrix of third order; hence

Jx = λ1,  Jy = λ2,  Jz = λ3.   (4.986)

The principal directions ai, bi, ci, with ai² + bi² + ci² = 1, i = 1, 2, 3, are given by the system

(Jxx − λi)ai − Jxybi − Jxzci = 0,  −Jxyai + (Jyy − λi)bi − Jyzci = 0.   (4.987)

Using the notations

Δ1i = det [−Jxy −Jxz; Jyy − λi −Jyz],  Δ2i = det [−Jxz Jxx − λi; −Jyz −Jxy],  Δ3i = det [Jxx − λi −Jxy; −Jxy Jyy − λi],   (4.988)

we obtain the equalities

ai/Δ1i = bi/Δ2i = ci/Δ3i = µi;   (4.989)

the condition ai² + bi² + ci² = 1 leads to

µi = 1/√(Δ1i² + Δ2i² + Δ3i²),   (4.990)

so that the solution is

ai = µiΔ1i,  bi = µiΔ2i,  ci = µiΔ3i,  i = 1, 2, 3.   (4.991)

2. Numerical calculation
Solving equation (4.985), we obtain the eigenvalues

λ1 = 2,  λ2 = 3,  λ3 = 4,   (4.992)

hence relations (4.988) lead to

Δ11 = 0.439385,  Δ21 = −0.167835,  Δ31 = 0.117519,   (4.993)
Δ12 = 0.219692,  Δ22 = 0.385929,  Δ32 = −0.270230,   (4.994)
Δ13 = −0.000001,  Δ23 = 0.939693,  Δ33 = 1.342021,   (4.995)

µ1 = 2.062672,  a1 = 0.906308,  b1 = −0.346188,  c1 = 0.242404,   (4.996)
µ2 = 1.923681,  a2 = 0.422618,  b2 = 0.742405,  c2 = −0.519836,   (4.997)
µ3 = 0.610387,  a3 = 4 × 10⁻⁷,  b3 = 0.573576,  c3 = 0.819152.   (4.998)
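Since (4.985) is a symmetric eigenvalue problem, the numerical calculation above is easy to verify. The sketch below (Python with NumPy, added for illustration) recovers the principal moments and directions of Problem 4.7:

```python
import numpy as np

# Matrix of the moments of inertia of Problem 4.7 (symmetric)
J = np.array([[ 2.178606, 0.313753, -0.219693],
              [ 0.313753, 3.209143,  0.553764],
              [-0.219693, 0.553764,  3.612250]])

vals, vecs = np.linalg.eigh(J)     # eigh: symmetric case, ascending eigenvalues
print(np.round(vals, 6))           # -> [2. 3. 4.], cf. (4.992)
print(np.round(vecs, 6))           # columns = principal directions,
                                   # cf. (4.996)-(4.998); each direction is
                                   # determined only up to sign
```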
  • 233. 224 LINEAR ALGEBRA Mmin Fi F ri A (xA, yA, zA) P z x yO ui Figure 4.7 Problem 4.8. Problem 4.8 Consider a rigid solid (Fig. 4.7) in the reference frame Oxyz and the straight lines that pass through the points Ai of position vectors ri(xi, yi, zi), i = 1, 3, the unit vectors along which are ui(ai, bi, ci), i = 1, 3. Upon this solid act three forces of unknown intensities F1, F2, F3, the supports of which are the three straight lines. Let us determine the intensities F1, F2, F3 of the forces so that, at the point A of position vector rA(xA, yA, zA), the system of forces is reduced to a minimal torsor. Numerical application: x1 = 0, y1 = 0, z1 = 8a, x2 = a, y2 = 0, z2 = 0, x3 = 0, y3 = −6a, z3 = 0, a1 = 1, b1 = 0, c1 = 0, a2 = 0, b2 = 1, c2 = 0, a3 = 0, b3 = 0, c3 = 1, xA = 0, yA = 0, zA = 7a, a = 1 m. Solution: 1. Theory Reduced at O, the system of three forces is of components F = 3 i=1 Fiui, MO = 3 i=1 Firi · ui; (4.999) by reducing it at A, we obtain the components F = 3 i=1 Fiui, MA = 3 i=1 Firi · ui − rA · 3 i=1 Fiui. (4.1000) The conditions to have the minimal moment is transcribed in the relation MA = λF. (4.1001) Using the notations {F} = Fx Fy Fz T , {MA} = MAx MAy MAz T , {F} = F1 F2 F3 T , (4.1002) di = yici − zibi, ei = ziai − xici, fi = xibi − yiai, i = 1, 3, (4.1003) [U] =   a1 a2 a3 b1 b2 b3 c1 c2 c3   , [V] =   d1 d2 d3 e1 e2 e3 f1 f2 f3   , [rA] =   0 −zA yA zA 0 −xA −yA xA 0   , (4.1004) [A] = [V] − [rA][U], [B] = [U]−1 [A], (4.1005)
  • 234. APPLICATIONS 225 in a matrix form, relations (4.1000) become {F} = [U]{F}, {MA} = [A]{F} (4.1006) and condition (4.1001) reads [B]{F} = λ{F}; (4.1007) the problem becomes one of eigenvalues and eigenvectors. The eigenvalues λ1, λ2, λ3 are given by the equation det[[B] − λ[I]] = 0, (4.1008) while the intensities of the forces are given by the first two secular equations of the matrix equation (4.1007). We obtain thus three directions, hence three minimal torsors to which the considered system of forces is reduced. 2. Numerical calculation It follows, successively, that d1 = 0, e1 = 6a, f1 = 0, d2 = 0, e2 = 0, f2 = a, d3 = 6a, e3 = 0, f3 = 0, (4.1009) [U] =   1 0 0 0 1 0 0 0 1   , [V] =   0 0 6a 6a 0 0 0 a 0   , [rA] =   0 −7a 0 7a 0 0 0 0 0   , (4.1010) [A] = [B] =   0 7a −6a a 0 0 0 a 0   =   0 7 −6 1 0 0 0 1 0   , (4.1011) while equation (4.1008) is λ3 − 7λ + 6 = 0, (4.1012) with the solutions λ1 = 1, λ2 = 2, λ3 = −3. (4.1013) Equation (4.1007), written in the form   −λi 7 −6 1 −λi 0 0 1 −λi     F1 F2 F3   = {0}, (4.1014) leads to the solutions F2 = F1 λi , F3 = F1 λ2 i , (4.1015) that is, to the set of values of the components of the resultant along the axes F1 F1 F1 T , F1 F1 2 F1 4 T , F1 −F1 3 F1 9 T , (4.1016) F1 being an arbitrary value. Finally, it results in • the first minimal torsor: resultant F = F1 √ 3, minimum moment M1 min = F1a √ 3 = F1 √ 3, direction of the resultant 1/ √ 3 1/ √ 3 1/ √ 3 T ;
• the second minimal torsor: resultant F = F1√21/4, minimum moment M2min = F1a√21/2 = F1√21/2, direction of the resultant [4/√21  2/√21  1/√21]ᵀ;
• the third minimal torsor: resultant F = F1√91/9, minimum moment M3min = −F1a√91/3 = −F1√91/3, direction of the resultant [9/√91  −3/√91  1/√91]ᵀ.

Problem 4.9 To study the free vibrations of an automobile, let us consider the model in Figure 4.8. Thus, for this model (half of an automobile), let the notations be as follows:
• k1, k2 — stiffnesses of the tires;
• k3, k4 — stiffnesses of the suspension springs;
• k5, k6 — stiffnesses of the passengers' chairs;
• m1, m2 — the masses of the wheels (to which are added the masses of the pivot pins);
• m3 — half of the suspended mass of the automobile;
• m5, m6 — the masses of the chairs, to which are added 75% of the passengers' masses;
• J — the moment of inertia of the suspended mass with respect to the gravity center C.

It is required
• to determine the deflections of the springs in the state of equilibrium;
• to write the matrix equation of the free vibrations;
• to determine the eigenpulsations and the modal matrix;
• to discuss the results thus obtained.

Solution:
1. Theory
Denoting by zi0, i = 1, …, 6, the deflections of the springs in the state of equilibrium and taking into account the forces represented in Figure 4.9, we obtain the equilibrium equations

k1z10 − k3z30 = m1g,  k2z20 − k4z40 = m2g,
k3z30 + k4z40 − k5z50 − k6z60 = m3g,
k3z30l1 − k5z50ls1 − k4z40l2 + k6z60ls2 = 0,
k5z50 = m5g,  k6z60 = m6g,   (4.1017)

Figure 4.8 Problem 4.9 (half-automobile model with stiffnesses k1–k6, masses m1, m2, m3, m5, m6 and lengths l1, l2, ls1, ls2).
  • 236. APPLICATIONS 227 m6g m3g m2g m1g m5g k6z60 k6z60 k4z40 k4z40 k2z20 k1z10 k5z50 k5z50 k3 z30 k3 z30 l2 l1 ls2 ls1 l Figure 4.9 Equations of equilibrium. from which it follows that z10 = g k1l [m4l2 + m1l + m5(l1 + ls1) + m6(l2 − ls2)], z20 = g k2l [m3l1 + m2l + m5(l1 − ls1) + m6(l2 + ls2)], z30 = g k3l [(m3 + m5 + m6)l2 + m5ls1 − m6ls2], z40 = g k4l [(m3 + m5 + m6)l1 − m5ls1 + m6ls2], z50 = m5g k5 , z60 = m6g k6 . (4.1018) For an arbitrary position, denoting the displacements with respect to the position of equilibrium by z1, z2, z5, z6 for the masses m1, m2, m5, m6, the displacement of the point C by z3 and the angle of rotation of the suspended mass by φ, we obtain the forces represented in Figure 4.10. The theorem of momentum, written for the bodies of masses m1, m2, m3, m5, m6, leads to the equations m1 ¨z1 = −k1(z1 − z10) + k3(z3 + l1φ − z1 − z30) − m1g, m2 ¨z2 = −k2(z2 − z20) + k4(z3 − l2φ − z2 − z40) − m2g, m3 ¨z3 = −k3(z3 + l1φ − z1 − z30) − k4(z3 − l2φ − z2 − z40) + k5(z5 − z3 − ls1φ − z50) + k6(z6 − z3 + ls2φ − z60) − m3g, m5 ¨z5 = −k5(z5 − z3 − ls1φ − z50) − m5g, m6 ¨z6 = −k6(z6 − z3 + ls2φ − z60) − m6g, (4.1019)
  • 237. 228 LINEAR ALGEBRA m6g m3g m1gm2g m5g k6(z6−z3+ls2 ϕ−z60) k4(z4−z2−l2ϕ−z40) k3(z3−z1−l1ϕ−z30) k5(z5−z3+ls1 ϕ−z50) k2(z2−z20) k1(z1−z10) z6 z3 z2 z1 z5 C ϕ Figure 4.10 Equations of motion. while the theorem of moment of momentum with respect to the center of gravity of the body of mass m3 leads to the equation J ¨φ = k4l2(z3 − l2φ − z2 − z40) + k5ls1(z5 − z3 − ls1φ − z50) − k3l1(z3 + l1φ − z1 − z30) − k6ls2(z6 − z3 + ls2φ − z60). (4.1020) Using the matrix notations, {z} = z1 z2 z3 φ z5 z6 T , (4.1021) [M] =         m1 0 0 0 0 0 0 m2 0 0 0 0 0 0 m3 0 0 0 0 0 0 J 0 0 0 0 0 0 m5 0 0 0 0 0 0 m6         , (4.1022) [K] =        k1 + k3 0 −k3 k3l1 0 0 0 k2 + k4 −k4 k4l2 0 0 −k3 −k4 k2 + k4 + k5 + k6 k3l1 + k4l2 + k5ls1 − k6ls2 −k5 −k6 −k3l1 k4l2 k5l1 − k4l2 + k5ls1 − k6ls2 k3l2 1 + k4l2 2 + k3l2 s1 + k6l2 s2 −k5ls1 k6ls2 0 0 −k5 −k5ls1 k5 0 0 0 −k6 k6ls2 0 k6        (4.1023) and taking into account equation (4.1017), equation (4.1019), and equation (4.1020), we obtain the matrix differential equation [M]{¨z} + [K]{z} = {0}. (4.1024) The solution of this equation is of the form {z} = {a} cos(pt − φ) (4.1025)
  • 238. APPLICATIONS 229 and leads to the matrix equation [−p2 [M] + [K]]{a} = p2 {a}, (4.1026) equivalent to the equation [M]−1 [K]{a} = p2 {a}, (4.1027) which is a problem of eigenvalues and eigenvectors. Solving the equation det[[K] − p2 [M]] = 0, (4.1028) we obtain the eigenvalues p2 1, p2 2, . . . , p2 6 and the eigenpulsations p1, p2, . . . , p6. Corresponding to each eigenvalue, we obtain the eigenvectors {a(i) }, i = 1, 6, which define the modal matrix [A] = a(1) {a(2) } · · · {a(6) } . (4.1029) 2. Numerical calculation We obtain successively [M] =         30 0 0 0 0 0 0 30 0 0 0 0 0 0 450 0 0 0 0 0 0 300 0 0 0 0 0 0 60 0 0 0 0 0 0 60         , (4.1030) [M]−1 =         0.03333 0 0 0 0 0 0 0.03333 0 0 0 0 0 0 0.002222 0 0 0 0 0 0 0.003333 0 0 0 0 0 0 0.066667 0 0 0 0 0 0 0.066667         , (4.1031) [K] =         152000 0 −12000 15000 0 0 0 154000 −14000 17500 0 0 −12000 −14000 158000 −2500 −2000 −2000 15000 17500 −2500 42065 −1200 1200 0 0 −2000 −1200 2000 0 0 0 −2000 1200 0 2000         , (4.1032) [M]−1 [K] =         5066.67 0 −400 500 0 0 0 5133.33 −466.67 583.33 0 0 −26.67 −31.11 351.11 −5.56 −4.44 −4.44 50 58.33 −8.33 140.22 −4 4 0 0 −33.33 −20 33.33 0 0 0 −33.33 20 0 33.33         , (4.1033)
  • 239. 230 LINEAR ALGEBRA p1 = 6.22 s−1 , p2 = 8.04 s−1 , p3 = 13.13 s−1 , p4 = 14.26 s−1 , p5 = 71.19 s−1 , p6 = 41.69 s−1 , (4.1034) [A] =         0 0 0 0 1.0 0 0 0 0 0 0 1.0 0.5 0 0 0.2 0 0 0 −0.6 −0.2 0 0 0 0.7 −0.5 0.7 −0.7 0 0 0.6 0.7 −0.7 −0.7 0 0         . (4.1035) The first mode of vibration defined by the eigenvector for the eigenpulsation p1 corresponds to raising the vibrations of the suspended mass together with the displacement in phase of the chairs. The second and the third modes of vibration correspond to a motion of pitching, together with the motion in opposition to the phase of the chairs. The fourth mode of vibrations corresponds to a vibration of raise, together with the motion in opposition of the phase of the chairs. The last two modes of vibration correspond exclusively to the vibrations of the wheels. Problem 4.10 We consider the rectangular plate in Figure 4.11, of dimensions 2l1, 2l2, of mass m and of moments of inertia JX = ml2 2/3, JY = ml2 1/3, JZ = m(l2 1 + l2 2)/3, suspended by the springs AiBi of stiffness ki, i = 1, 4. As shown in Figure 4.11, the plate is in equilibrium under the action of the weight mg and of the deformed springs of deflections si, i = 1, 4. Considering that the deflections si are relatively great with respect to the displacements of the plate when it is vibrating, knowing the lengths Li = Ai0Bi and the angles αi, i = 1, 4, determine the following: • the matrix differential equation of the linear vibrations; • the eigenpulsations; • the modal matrix. B2 B3 B4 B1 k1 O0 2l2 2l1 Y X A20 A30 A40 A10 k2 k3 k4 α2 α3 α1 α4 Figure 4.11 Problem 4.10.
  • 240. APPLICATIONS 231 C z y x mg uC ki O Bi Ai δi ui Ai0 O0 Z Y X δ θ Figure 4.12 Small displacements of the rigid body. Solution: 1. Theory We consider a rigid solid the position of which is specified with respect to the fixed reference system O0XYZ and to a system of reference Oxyz rigidly linked to it (Fig. 4.12), so that at the position of equilibrium the mobile system coincides with the fixed one. A small displacement of an arbitrary position of the rigid solid is defined by the linear displace- ment δ of the point O and by the rotation angle θ. The rigid solid is acted upon by its own weight mg and is suspended by the springs AiBi, i = 1, n. To construct the mathematical model of the linear solutions of the rigid solid, we introduce the following notations: • (θX, θY , θZ), (δX, δY , δZ)—projections of the vectors θ and δ on the axes of the system O0XYZ; • {θ}, {δ}, {∆}—column matrices {θ} = θX θY θZ T , {δ} = δX δY δZ T , {∆} = δX δY δZ θX θY θZ T ; (4.1036) • δi —displacement Ai0Ai of the end of the spring AiBi; • ui —unit vector in the direction Ai0Ai of the spring AiBi in the position of equilibrium of the solid; • ri —position vector of the point Ai0; • xi, yi, zi —coordinates of the point Ai0 in the system O0XYZ, respectively, the coordinates of the point Ai in the system Oxyz; • [ri]—the matrix defined by [ri] =   0 −zi yi zi 0 −xi −yi xi 0   ; (4.1037) • ai, bi, ci —projections of the vector ui in the system O0XYZ;
  • 241. 232 LINEAR ALGEBRA • m∗ i —the vector defined by the relation m∗ i = ri · ui; (4.1038) • di, ei, fi —the projections of the vector m∗ i on the axes of the trihedron O0XYZ, that is, the quantities di = yici − zibi, ei = ziai − xici, fi = xibi − yiai; (4.1039) • {ui}, {m∗ i }, {Ui}—the column matrices given by the relations {ui} = ai bi ci T , {m∗ i } = di ei fi T , {Ui} = ai bi ci di ei fi T ; (4.1040) • C —the center of gravity of the rigid solid; • uC —the unit vector in the direction toward to the surface of the Earth; • xC, yC, zC —the coordinates of the center C in the system Oxyz, respectively, the coordi- nates of the point C0 in the system O0XYZ; • rC —the position vector O0C0 of the point C0; • aC, bC, cC —the projections of the vector uC in the system O0XYZ; • dC, eC, fC —parameters defined by the relations dC = yCcC − zCbC, eC = zCaC − xCcC, fC = xCbC − yCaC; (4.1041) • {UC}—the column matrix {UC} = aC bC cC dC eC fC T ; (4.1042) • δC —displacement of the point C; • li0 —the undeformed length of the spring AiBi; • [S]—the matrix defined by [S] =   0 −mzC myC mzC 0 −mxC −myC mxC 0   ; (4.1043) • [m]—the matrix [m] =   m 0 0 0 m 0 0 0 m   ; (4.1044) • [J]—the matrix of the moments of inertia [J] =   Jxx −Jxy −Jxz −Jxy Jyy −Jyz −Jxz −Jyz Jzz   ; (4.1045) • [M]—the matrix of inertia of the rigid solid [M] = [m] [S]T [S] [J] ; (4.1046)
  • 242. APPLICATIONS 233 • T , V —the kinetic energy and the potential energy, respectively; • Va, VC —the potential energy of the springs and the potential energy of the weight mg, respectively; • {∂T /∂ ˙∆}. {∂T /∂∆}, {∂V/∂∆}—the column matrices of the partial derivatives ∂T ∂ ˙∆ = ∂T ∂˙δX ∂T ∂˙δY ∂T ∂˙δZ ∂T ∂˙θX ∂T ∂˙θY ∂T ∂˙θZ T , ∂T ∂∆ = ∂T ∂δX ∂T ∂δY ∂T ∂δZ ∂T ∂θX ∂T ∂θY ∂T ∂θZ T , (4.1047) ∂V ∂∆ = ∂V ∂δX ∂V ∂δY ∂V ∂δZ ∂V ∂θX ∂V ∂θY ∂V ∂θZ T . By these notations, we may write T = 1 2 { ˙∆}[M]{ ˙∆}, Va = 1 2 n i=1 ki(AiBi − li0)2 , VC = mgδCuC, (4.1048) V = Va + VC. (4.1049) Lagrange’s equations have the matrix form d dt ∂T ∂ ˙∆ − ∂T ∂∆ + ∂V ∂∆ = {0}; (4.1050) taking into account the relation d dt ∂T ∂ ˙∆ = [M]{ ¨∆} (4.1051) and the fact that {∂T /∂∆} = {0}, because it is a function of second degree in the components of the matrix { ˙∆}, equation (4.1050) reads [M]{ ¨∆} + ∂VC ∂∆ + ∂Va ∂∆ = {0}. (4.1052) The displacements δC, δi, i = 1, n, being small, can be expressed by the relations δC = δ + θ · rC, δi = δ + θ · ri, i = 1, n, (4.1053) so that VC = mg{UC}T {∆}, (4.1054) ∂VC ∂∆ = mg{UC}. (4.1055) To calculate the column matrix {∂Va/∂∆} we express first the length AiBi, taking into account the second relation (4.1053), AiBi = (AiBi)2 = (Liui − δi)2, (4.1056) AiBi = (Liui − δ − θ · ri)2 (4.1057)
  • 243. 234 LINEAR ALGEBRA or AiBi = [(Liai − δX − θY zi + θZyi)2 + (Libi − δY − θZxi + θXzi)2 + (Lici − δZ − θXyi + θY xi)2 ]; (4.1058) by computing, it follows that ∂AiBi ∂∆ = 1 AiBi −Li Ui + [I] −[ri] [ri] −[ri]2 {∆}, (4.1059) where [I] is the unit matrix of third order. From relation (4.1057), expanding the binomial into series and neglecting the nonlinear terms, we obtain AiBi = Li 1 − 2 ui Li δ − θ · ri 1 2 = Li − {Ui}T {∆}; (4.1060) taking into account the relation si = Li − li0 (4.1061) and neglecting the nonlinear terms, it follows that AiBi − li0 AiBi = si − {Ui}T {∆} Li − {Ui}T{∆} = si Li − 1 Li Ui T {∆} 1 + 1 Li Ui T {∆} (4.1062) or AiBi − li0 AiBi = si Li − 1 − si Li {Ui}T {∆}. (4.1063) Finally, denoting by [K] the rigidity matrix [K] = n i=1 kisi Li [I] [ri]T [ri] −[ri]2 + n i=1 1 − si Li {Ui}{Ui}T (4.1064) and taking into account the equilibrium equation mg{UC} − n i=1 kisi{Ui} = {0}, (4.1065) we get, from equation (4.1062), the matrix differential equation of the linear vibrations [M]{ ¨∆} + [K]{∆} = {0}. (4.1066) 2. Numerical calculation We obtain successively JX = 5 kg m2 , JY = 3.2 kg m2 , JZ = 8.2 kg m2 , (4.1067) xC = 0, yC = 0, zC = 0, [S] =   0 0 0 0 0 0 0 0 0   , (4.1068) [M] =         60 0 0 0 0 0 0 60 0 0 0 0 0 0 60 0 0 0 0 0 0 5 0 0 0 0 0 0 3.2 0 0 0 0 0 0 8.2         , (4.1069)
  • 244. APPLICATIONS 235 x1 = 0.3, y1 = 0.5, z1 = 0, x2 = −0.3, y2 = 0.5, z2 = 0, x3 = −0.3, y3 = −0.5, z3 = 0, x4 = 0, y4 = −0.5, z4 = 0, (4.1070) [r1] =   0 0 0.5 0 0 −0.3 −0.5 0.3 0   , [r2] =   0 0 0.5 0 0 0.3 −0.5 −0.3 0   , [r3] =   0 0 −0.5 0 0 −0.3 0.5 0.3 0   , [r4] =   0 0 −0.5 0 0 −0.3 0.5 0.3 0   , (4.1071) a1 = √ 3 2 , b1 = 1 2 , c1 = 0, d1 = 0, e1 = 0, f1 = −0.28301, a2 = − √ 3 2 , b2 = 1 2 , c2 = 0, d2 = 0, e2 = 0, f2 = 0.28301, a3 = − √ 3 2 , b3 = − 1 2 , c3 = 0, d3 = 0, e3 = 0, f3 = −0.28301, a4 = √ 3 2 , b4 = − 1 2 , c4 = 0, d4 = 0, e4 = 0, f4 = 0.28301, (4.1072) [r1]2 =   −0.25 0.15 0 0.15 −0.09 0 0 0 −0.34   , [r2]2 =   −0.25 0.15 0 0.15 −0.09 0 0 0 −0.34   , [r3]2 =   −0.25 0.15 0 0.15 −0.09 0 0 0 −0.34   , [r4]2 =   −0.25 −0.15 0 −0.15 −0.09 0 0 0 −0.34   , (4.1073) {U1} = 0.86603 0.5 0 0 0 −0.28301 T , {U2} = −0.86603 0.5 0 0 0 0.28301 T , {U3} = −0.86603 −0.5 0 0 0 −0.28301 T , {U4} = 0.86603 −0.5 0 0 0 0.28301 T , (4.1074) {U1}{U1}T =         0.75 0.43301 0 0 0 −0.24510 0.43301 0.25 0 0 0 −0.14151 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 −0.24510 −0.14151 0 0 0 008010         , {U2}{U2}T =         0.75 −0.43301 0 0 0 −0.24510 −0.43301 0.25 0 0 0 0.14151 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 −0.24510 0.14151 0 0 0 0.08010         ,
  • 245. 236 LINEAR ALGEBRA {U3}{U3}T =         0.75 0.43301 0 0 0 0.24510 0.43301 0.25 0 0 0 0.14151 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.24510 0.14151 0 0 0 0.08010         , {U4}{U4}T =         0.75 −0.43301 0 0 0 0.24510 −0.43301 0.25 0 0 0 −0.14151 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.24510 −0.14151 0 0 0 0.08010         , (4.1075) k1s1 L1 [I] [r1]T [r1] −[r1]2 + k1 1 − s1 L1 {U1}{U1}T =         6206.67 2829 0 0 0 −2254.65 2829 2940 0 0 0 −532.53 0 0 1306.67 653.33 −392 0 0 0 653.33 326.67 −196 0 0 0 −392 −196 117.6 0 −2254.65 −532.53 0 0 0 967.59         , (4.1076) k2s2 L2 [I] [r2]T [r2] −[r2]2 + k2 1 − s2 L2 {U2}{U2}T =         3266.67 −1131.6 0 0 0 −1293.86 −1131.6 1960 0 0 0 −22.19 0 0 1306.67 653.33 392 0 0 0 653.33 326.67 −196 0 0 0 392 −196 117.6 0 −1293.86 −22.19 0 0 0 653.39         , (4.1077) k3s3 L3 [I] [r3]T [r3] −[r3]2 + k3 1 − s3 L3 {U3}{U3}T =         1568 678.96 0 0 0 580.32 678.96 784 0 0 0 104.29 0 0 392 −196 117.6 0 0 0 −196 98 −58.8 0 0 0 117.6 −58.8 35.28 0 580.32 104.29 0 0 0 258.88         , (4.1078) k4s4 L4 [I] [r4]T [r4] −[r4]2 + k4 1 − s4 L4 {U4}{U4}T =         1148 −436.47 0 0 0 443.06 −436.47 644 0 0 0 −25.04 0 0 392 −196 −117.6 0 0 0 −196 98 58.8 0 0 0 −117.6 58.8 35.28 0 443.06 −25.04 0 0 0 214.02         , (4.1079)
  • 246. APPLICATIONS 237 [K] = 4 i=1 kisi Li [I] [ri]T [ri] −[ri]2 + 4 i=1 ki 1 − si Li {Ui}{Ui}T =         12189.34 1939.89 0 0 0 −2525.13 1939.89 6328 0 0 0 −475.47 0 0 3397.34 914.66 0 0 0 0 914.66 849.34 −392 0 0 0 0 −392 305.76 0 −2525.13 −475.47 0 0 0 2094.08         . (4.1080) The eigenpulsations are obtained from the equation 12189.34 −60p2 1939.89 0 0 0 −2525.13 1939.89 6328 −60p2 0 0 0 −475.47 0 0 3397.34 −60p2 914.66 0 0 0 0 914.66 849.34 −5p2 −392 0 0 0 0 −392 305.76 −3.2p2 0 −2525.13 −475.47 0 0 0 2094.08 −8.2p2 = 0, (4.1081) from which 12189.34 − 60p2 1939.89 −2525.13 1939.89 6328 − 60p2 −475.47 −2525.13 −475.47 2094.08 − 8.2p2 × 3397.34 − 60p2 914.66 0 914.66 849.34 − 5p2 −392 0 −392 305.76 − 3.2p2 = 0, (4.1082) that is, −29520p6 + 16649219.28p4 − 2532108243.09p2 + 115198062272.87 = 0 (4.1083) or −960p6 + 309158.72p4 − 18112125.58p2 + 104420926.76 = 0. (4.1084) It follows that p1 = 18.751, p2 = 10.934, p3 = 9.635, p4 = 567.485, p5 = 0.763, p6 = 0.761. (4.1085) For the first three eigenpulsations, we use the system (12189.34 − 60p2 )a1 + 1939.89a2 = 2525.13, 1939.89a1 + (6328 − 60p2 )a2 = 475.47, (4.1086)
  • 247. 238 LINEAR ALGEBRA while for the last three eigenpulsations, we use the system (3397.34 − 60p2 )b1 + 914.66b2 = 0, 914.66b1 + (849.34 − 5p2 )b2 = 392. (4.1087) The modal matrix reads [A] =         −0.495 0.382 0.791 0 0 0 −0.973 0.314 −1.396 0 0 0 0 0 0 10−8 −0.17843 −0.17841 0 0 0 0.00024 0.65594 0.65591 0 0 0 1 1 1 1 1 1 0 0 0         . (4.1088) Problem 4.11 Determine the efforts in the homogeneous, articulated, straight bars, of constant cross section from which a rigid solution is suspended. Solution: 1. Theory 1.1. Generalities. Notations Consider a rigid solid, as illustrated in Figure 4.13, suspended by the elastic straight bars A0iAi, i = 1, n, of constant cross section and spherical articulated (with spherical hinges) and the notations: • O0XYZ —the dextrorsum three-axes orthogonal fixed reference system; • Oxyz —the dextrorsum three-axes orthogonal reference system, rigidly linked to the solid; • XO, YO, ZO —the coordinates of the point O in the system O0XYZ; • F, MO —the resultant and the resultant moment, respectively, of the external forces that act upon the body; O0 z x y Z X Y Ai ui MO F O (XO,YO,ZO) A0i Figure 4.13 Problem 4.11.
  • 248. APPLICATIONS 239 li +∆li ri Ai0 (Xi0 ,Yi0 ,Zi0 ) z x O (XO,YO,ZO) y Ai li Figure 4.14 Small displacements. • (Fx, Fy, Fz), (Mx, My, Mz)—projection of the vectors F, MO on the axes of the Oxyz-trihedron; • li —length of the bar A0iAi; • Ai —area of the cross section of the bar A0iAi; • Ei —the longitudinal elasticity modulus of the bar A0iAi; • ki —the stiffness of the bar A0iAi ki = EiAi li ; (4.1089) • δ—the displacement (small) of the point O (Fig. 4.14); • θ—the rotation angle (small) of the rigid solid; • (δx, δy, δz), (θx, θy, θz)—the projections of the vectors δ and θ on the axes of the Oxyz-trihedron; • δi —the displacement (small) of the point Ai; • ui —the unit vector of the direction AiA0i; • ri —the position vector of the point Ai; • xi, yi, zi —the coordinates of the point Ai in the Oxyz-system; • ai, bi, ci —projections of the unit vector ui on the axes of the Oxyz-trihedron; • di, ei, fi —projections of the vector ri · ui on the axes of the Oxyz-trihedron, that is, di = yici − zibi, ei = ziai − xici, fi = xibi − yiai; (4.1090) • Ni —intensity of the effort Ni in the bar A0iAi; • li —deformation of the bar A0iAi; • {F}, {∆}, {Ui}—column matrices defined by {F} = Fx Fy Fz Mx My Mz T , {∆} = δx δy δz θx θy θz T , (4.1091) {Ui} = ai bi ci di ei fi T . 1.2. Case in which none of the bars is deformed by the application of the external load F, MOWith the above notations, we write the obvious relation (li + li)2 = (−liui + δi)2 , (4.1092)
  • 249. 240 LINEAR ALGEBRA from which, neglecting the nonlinear terms ( li)2 , we obtain the relation li = −ui · δi. (4.1093) The displacement of the point Ai of the solid is small, so that it can be expressed by δi = δ + θ · ri; (4.1094) hence, using the mentioned notations, relation (4.1093) becomes li = −{Ui}T {∆}. (4.1095) Under these conditions, the intensities of the efforts in the bars are Ni = ki li = −ki{Ui}T {∆}, i = 1, n; (4.1096) if Ni > 0 the bars are subjected to traction and if Ni < 0 they are subjected to com- pression. The effort vector reads Ni = −ki{Ui}T {∆}ui. (4.1097) Taking into account the previous notations and the equations of equilibrium n i=1 Ni + F = 0, n i=1 ri · Ni + MO = 0, (4.1098) we obtain the matrix equation [K]{∆} = {F}, (4.1099) where [K] is the stiffness matrix given by [K] = n i=1 ki{Ui}{Ui}T = n i=1 ki         a2 i aibi aici aidi aiei aifi biai b2 i bici bidi biei bifi ciai cibi c2 i cidi ciei cifi diai dibi dici d2 i diei difi eiai eibi eici eidi e2 i eifi fiai fibi fici fidi fiei f 2 i         . (4.1100) Thus, equation (4.1099) gives the displacement {∆}, and then the efforts in the bars are given by equation (4.1096) Particular cases: (a) The bars are parallel We suppose in this case, that the bars are parallel to the Oz-axis and we get, successively ai = bi = 0, ci = 1, di = yi, ei = −xi, fi = 0, (4.1101) {Ui} = ci di ei T , {∆} = δz θx θy T , {F} = Fz Mx My T , (4.1102) [K] = n i=1 ki{Ui}{Ui}T = n i=1 ki   c2 i cidi ciei dici d2 i diei eici eidi e2 i  . (4.1103)
  • 250. APPLICATIONS 241 (b) The bars are coplanar We assume that the bars are situated in the Oxy-plane, so that ci = 0, di = ei = 0, fi = xibi − yiai, (4.1104) {Ui} = ai bi fi T , {∆} = δx δy θz T , {F} = Fx Fy Mz T , (4.1105) [K] = n i=1 ki{Ui}{Ui}T = n i=1 ki   a2 i aibi aifi biai b2 i bifi fiai fibi f 2 i  . (4.1106) (c) The bars are parallel and coplanar In this case, we assume that the bars are situated in the Oz-plane and are parallel to the Oz-axis; it follows that ai = bi = 0, ci = 1, di = fi = 0, ei = −xi, (4.1107) {Ui} = ci ei T , {∆} = δz θy T , {F} = Fz My T , (4.1108) [K] = n i=1 ki{Ui}{Ui}T = n i=1 ki c2 i ciei eici e2 i . (4.1109) (d) The bars are concurrent In this case, the solid is reduced to the concurrence point, so that θx = θy = θz, di=0 = ei = fi, (4.1110) {Ui} = ai bi ci T , { } = δx δy δz T , (4.1111) [K] = n i=1 ki{Ui}{Ui}T = n i=1 ki   a2 i aibi aici biai b2 i bici ciai cibi c2 i  . (4.1112) (e) The bars are concurrent and coplanar If the bars are situated in the Oxy-plane, we have ci = 0, (4.1113) {Ui} = ai bi T , {∆} = δx δy T , (4.1114) [K] = n i=1 ki{Ui}{Ui}T = n i=1 ki a2 i aibi biai b2 i . (4.1115) 1.3. Case in which the bars have errors of fabrication equal to li In this case, relations (4.1096) and (4.1097) become Ni = −ki[ li + {Ui}T {∆}], (4.1116) Ni = −kiui[ li + {Ui}T {∆}]. (4.1117) Using the notation {F} = −ki li{Ui}, (4.1118)
  • 251. 242 LINEAR ALGEBRA C G u5 u6 u3 u5 u4 u2 u1 A5 A6 A3 A2 A1 A4 A05 A07 A7 A06 A03 A04 A01 A05 X Z O Y Figure 4.15 Application 2.1. for where li corresponds to bars that are longer, the equation of equilibrium (4.1098) leads to the equation [K]{∆} = {F} + {F}. (4.1119) The rigidity matrix is also given by relation (4.1100). In the case of temperature variations, the deviations that appear are given by ←→ l i = liαi T, (4.1120) where αi is the coefficient of linear dilatation, while T is the temperature variation in Kelvin. 2. Numerical applications Application 2.1. We consider the rigid solid in the form of a homogeneous parallelepiped (Fig. 4.15) of weight G and dimensions 2a, 2b, 2c, suspended by seven homogeneous articulated straight bars, of the same length l and of the same stiffness k, the bars A1A01, A2A02 being parallel to the OX-axis, the bars A3A03, A4A04 being parallel to the OY -axis, while the bars A5A05, A6A06 and A7A07 are parallel to the vertical OZ-axis.Assuming that the rigid solid is acted upon only by its own weight G, let us determine the efforts in the seven bars in the following cases: (a) The bars have no fabrication errors. (b) The bars A1A01, A6A06 have the fabrication errors l1, l6. (c) The bar A4A04 is heated by T 0 . Solution of Application 2.1: (a) It follows, successively, that {U1} = 1 0 0 0 0 0 T , {U2} = 1 0 0 0 0.5 0 T , {U3} = 0 1 0 0 0 0 T , {U4} = 0 1 0 0 0 1 T , {U5} = 0 0 1 0 0 0 T , {U6} = 0 0 1 1 0 0 T , {U7} = 0 0 1 1 −1 0 T , (4.1121)
  • 252. APPLICATIONS 243 {U1}{U1}T =         1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0         , {U2}{U2}T =         1 0 0 0 0.5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.5 0 0 0 0.25 0 0 0 0 0 0 0         , {U3}{U3}T =         0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0         , {U4}{U4}T =         0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1         , {U5}{U5}T =         0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0         , {U6}{U6}T =         0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0         , {U7}{U7}T =         0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 −1 0 0 0 1 1 −1 0 0 0 −1 −1 1 0 0 0 0 0 0 0         , (4.1122) [K] = k 7 i=1 {Ui}{Ui}T = 107         2 0 0 0 0.5 0 0 2 0 0 0 1 0 0 3 2 −1 0 0 0 2 2 −1 0 0.5 0 −1 −1 1.25 0 0 1 0 0 0 1         , (4.1123) {F} = 200000 0 0 −1 −1 1 0 T , (4.1124) [K]−1 = 10−7         0.6 0 0 −0.2 −0.4 0 0 1 0 0 0 −1 0 0 1 −1 0 0 −0.2 0 −1 1.9 0.8 0 −0.4 0 0 0.8 1.6 0 0 −1 0 0 0 2         . (4.1125) {∆} = [K]−1 {F} = 0.004 0 0 −0.002 0.016 0 T . (4.1126) N1 = −k1{U1}T { } = −40,000 N, N2 = −k2{U2}T { } = −12,0000 N, N3 = −k3{U3}T { } = 0 N, N4 = −k4{U4}T { } = 0 N, N5 = −k5{U5}T { } = 0 N, N6 = −k6{U6}T { } = 20,000 N, N7 = −k7{U7}T { } = 180,000 N. (4.1127)
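The whole pipeline of case (a) — assembling [K] from the {Ui} of (4.1121), solving [K]{∆} = {F}, and evaluating the efforts as in (4.1127) — is compact enough to script. The sketch below (Python with NumPy, added for illustration) reproduces it; with the data exactly as printed, the solver enforces [K]{∆} = {F} and returns the efforts N3–N7 of (4.1127), while the X-bar efforts come out as ±40,000 N, so the printed values for ∆1, N1, N2 should be compared up to the sign conventions adopted for the unit vectors of the bars. The same few lines handle cases (b) and (c) by adding the fabrication-error term of (4.1118) to the right-hand side.

```python
import numpy as np

k = 1.0e7                          # common stiffness of the seven bars
U = np.array([                     # rows are {U_i}^T, eq. (4.1121)
    [1, 0, 0, 0, 0.0, 0],
    [1, 0, 0, 0, 0.5, 0],
    [0, 1, 0, 0, 0.0, 0],
    [0, 1, 0, 0, 0.0, 1],
    [0, 0, 1, 0, 0.0, 0],
    [0, 0, 1, 1, 0.0, 0],
    [0, 0, 1, 1, -1.0, 0],
], dtype=float)

K = k * (U.T @ U)                  # [K] = k * sum_i {U_i}{U_i}^T, eq. (4.1123)
F = 200000.0 * np.array([0, 0, -1, -1, 1, 0])   # eq. (4.1124)

delta = np.linalg.solve(K, F)      # {Delta} = [K]^{-1}{F}
N = -k * (U @ delta)               # N_i = -k {U_i}^T {Delta}, cf. (4.1127)
print(delta)                       # compare with (4.1126)
print(N)                           # compare with (4.1127)
```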
  • 253. 244 LINEAR ALGEBRA O G u3u4 u2 u1 A1 A4 A2 A3 A04 A02 A03 A01 X Z Y Figure 4.16 Application 2.2. (b) We have, in this case, {F} = −k1 l1 U1 − k6 l6{U6} = 200000 −1 0 −1 −1 0 0 T , (4.1128) {F} + { ←→ F } = 200000 −1 0 −2 −2 1 0 T , (4.1129) {∆} = [K]−1 {{F} + {F}} = −0.012 0 0 −0.016 0.008 0 T , (4.1130) N1 = −k1[ l1 + {U1}T {∆}] = −80,000 N, N2 = −k2{U2}T {∆} = 80,000 N, N3 = −k3{U3}T {∆} = 0 N, N4 = −k4{U4}T {∆} = 0 N, N5 = −k5{U5}T {∆} = 160,000 N, N6 = −k6[ l6 + {U6}T {∆}] = −40,000 N, N7 = −k7{U7}T {∆} = 240,000 N. (4.1131) (c) We obtain l4 = lα4 T = 0.012 m, (4.1132) {F} = −k4 l4{U4} = 120,000 0 −1 0 0 0 −1 T , (4.1133) {F} + {F} = 40,000 0 −3 −5 −5 5 −3 T , (4.1134) {∆} = [K]−1 {{F} + {F}} = −0.004 0 0 −0.002 0.016 −0.012 T , (4.1135) N1 = −k1{U1}T {∆} = 40,000 N, N2 = −k2{U2}T {∆} = −40,000 N, N3 = −k3{U3}T {∆} = 0 N, N4 = −k4[ l4 + {U4}T {∆}] = 0 N, N5 = −k5{U5}T {∆} = 20,000 N, N6 = −k6{U6}{∆} = −140,000 N, N7 = −k7{U7}{∆} = 180,000 N. (4.1136) Application 2.2. A square horizontal plate of side 2l and weight G is suspended by four vertical bars of elastic stiffness k1, k2, k3, k4 (Fig. 4.16). Determine the efforts in the bars in the following cases: (a) The bars have no fabrication errors. (b) The bar A1A01 has a fabrication error given by l1. Numerical data: l = 1 m, G = 200000 N, k1 = 8 × 106 Nm−1 , k2 = 2 × 106 Nm−1 , k3 = 5 × 106 Nm−1 , k4 = 6 × 106 Nm−1 , l1 = 0.02 m. Solution of Application 2.2: (a) It follows that {U1} = 1 −l −l T = 1 −1 −1 T , {U2} = 1 l −l T = 1 1 −1 T , {U3} = 1 l l T = 1 1 1 T , {U4} = 1 −l l T = 1 −1 1 T , (4.1137)
  • 254. APPLICATIONS 245 {U1}{U1}T =   1 −1 −1 −1 1 1 −1 1 1   , {U2}{U2}T =   1 1 −1 1 1 −1 −1 −1 1   , {U3}{U3}T =   1 1 1 1 1 1 1 1 1   , {U4}{U4}T =   1 −1 1 −1 1 −1 1 −1 1   , (4.1138) [K] = 4 i=1 ki{Ui}{Ui}T = 106   21 −7 1 −7 21 5 1 5 21   , (4.1139) {F} = 200,000 −1 0 0 T , (4.1140) [K]−1 = 10−6   0.054622 0.019958 −0.007353 0.019958 0.057773 −0.014706 −0.007353 −0.014706 0.051471   , (4.1141) {∆} = [K]−1 {F} = 0.010924 0.003992 −0.001471 T , (4.1142) N1 = −k1{U1}T {∆} = −67,224 N, N2 = −k2{U2}T {∆} = −32,774 N, N3 = −k3{U3}T {∆} = −67,225 N, N4 = −k4{U4}T {∆} = −32,766 N. (4.1143) (b) The rigidity matrix [K] and the matrix {F} remain the same. We calculate, successively, {F} = −k1 l1{U1} = 160,000 −1 1 1 T , (4.1144) {F} + {F} = 40,000 −9 4 4 T , (4.1145) {∆} = [K]−1 {{F} + {F}} = −0.017647 −0.000294 0.008529 T , (4.1146) N1 = −k1[ l1 + {U1}T {∆}] = 47,056 N, N2 = −k2{U2}T {∆} = 52,040 N, N3 = −k3{U3}T {∆} = 47,060 N, N4 = −k4{U4}T {∆} = 52,944 N. (4.1147) Application 2.3. Let us consider the rectangular plate in Figure 4.17 of dimensions 2a, 2b, suspended by the hinged bars A1A01, A3A03 parallel to the Ox-axis and by the bars A2A02, A4A04 parallel to the Oy-axis. O A3 A4 A1 A2 A02 A03 A04 A01 P x y 2a 2b Figure 4.17 Application 2.3.
  • 255. 246 LINEAR ALGEBRA Knowing that the plate is acted upon at the point A2 by the force P parallel to the Ox-axis and knowing the rigidities k1 = k, k2 = k3 = 2k, k4 = 3k, determine the efforts in the bars in the following cases: (a) The bars have not fabrication errors. (b) The bar A1A01 has a fabrication error equal to l1. (c) The bar A1A01 of length l1 is heated by T 0 . Numerical data: a = 0.5 m, b = 0.4 m, P = 10000 N, k = 106 N m−1 , l1 = 0.01 m, T = 100 K, α1 = 12 × 106 deg−1 , l1 = 1 m. Solution of Application 2.3: (a) We have {U1} = 1 0 b T = 1 0 0.4 T , {U2} = 0 1 a T = 0 1 0.5 T , {U3} = −1 0 b T = −1 0 0.4 T , {U4} = 0 1 −a T = 0 1 −0.5 T , (4.1148) {U1}{U1}T =   1 0 0.4 0 0 0 0.4 0 0.16   , {U2}{U2}T =   0 0 0 0 1 0.5 0 0.5 0.25   , {U3}{U3}T =   1 0 −0.4 0 0 0 −0.4 0 0.16   , {U4}{U4}T =   0 0 0 0 1 −0.5 0 −0.5 0.25   , (4.1149) [K] = 4 i=1 ki{Ui}{Ui}T = 106   3 0 −0.4 0 5 −0.5 −0.4 −0.5 1.73   , (4.1150) [K]−1 = 10−6   0.344262 0.008197 0.081967 0.008197 0.206148 0.061475 0.081967 0.061475 0.614754   , (4.1151) {F} = P 1 0 −b T = 10,000 1 0 −0.4 T , (4.1152) {∆} = δx δy θz T = [K]−1 {F} = 0.003115 0.000164 −0.001639 T , (4.1153) N1 = −k1{U1}T {∆} = −24,594 N, N2 = −k2{U2}T {∆} = 1311 N, N3 = −k3{U3}T {∆} = 75,412 N, N4 = −k4{U4}T {∆} = −24,505 N. (4.1154) (b) We obtain {F} = −k1 l1{U1} = 2000 −5 0 −2 T , (4.1155) {F} + {F} = 8000 0 0 −1 T , (4.1156) {∆} = [K]−1 {{F} + {F}} = −0.000656 −0.000492 −0.004918 T , (4.1157) N1 = −k1[ l1 + {U1}T {∆}] = −73,768 N, N2 = −k2{U2}T {∆} = 5902 N, N3 = −k3{U3}T {∆} = 2622.4 N, N4 = −k4{U4}T {∆} = 5901 N. (4.1158) (c) It follows successively that l1 = l1α1 T = 0.0012 m, (4.1159) {F} = −k1 l1{U1} = −12000 0 −4800 T , (4.1160) {F} + {F} = −2000 0 −8800 T , (4.1161)
  • 256. APPLICATIONS 247 A05 A04 A03 A02 A01 y z P O x Figure 4.18 Application 2.4. {∆} = [K]−1 {{F} + {F}} = −0.001410 −0.000557 −0.005574 T , (4.1162) N1 = −k1[ l1 + {U1}T {∆}] = 24,396 N, N2 = −k2{U2}T {∆} = 6688 N, N3 = −k3{U3}T {∆} = 1639.2 N, N4 = −k4{U4}T {∆} = −6690 N. (4.1163) Application 2.4. We consider the spatial system of articulated bars in Figure 4.18, concurrent at the articulation O, where the vertical force P situated on the Oz-axis is acting. Knowing the rigidities of the bars A0iO, i = 1, 5, k1 = 2k, k2 = 1.5k, k3 = 2k, k4 = 3k, k5 = 2.5k and the direction cosines of their directions (ai, bi, ci), i = 1, 5, determine the efforts in the bars in the following cases: (a) The bars have no fabrication errors. (b) The bar A01O has a fabrication error equal to l1. (c) The bar A01O of length l1 is heated by T 0 . Numerical data: P = 20000 N, k = 106 N m−1 , l1 = 0.02 m, l1 = 1 m, α1 = 12 × 10−6 deg−1 , (a1, b1, c1) = (3/5, 4/5, 0), (a2, b2, c2) = (2/3, 2/3, 1/3), (a3, b3, c3) = (0, 3/5, 4/5), (a4, b4, c4) = (−2/3, 2/3, 1/3), (a5, b5, c5) = (−3/5, 4/5, 0). Solution of Application 2.4: (a) We have, successively, {Ui} = ai bi ci T , i = 1, 5, (4.1164) {U1}{U1}T =        9 25 12 25 0 12 25 16 25 0 0 0 0        , {U2}{U2}T =         4 9 4 9 2 9 4 9 4 9 2 9 2 9 2 9 1 9         , {U3}{U3}T =       0 0 0 0 9 25 12 25 0 12 25 16 25       , {U4}{U4}T =         4 9 − 4 9 − 2 9 − 4 9 4 9 2 9 − 2 9 2 9 1 9         , {U5}{U5}T =        9 25 − 12 25 0 − 12 25 16 25 0 0 0 0        , (4.1165)
  • 257. 248 LINEAR ALGEBRA [K] = 5 i=1 ki{Ui}{Ui}T = 106   3.62 −0.906667 −0.333333 −0.906667 5.6 1.96 −0.333333 1.96 1.78   , (4.1166) [K]−1 = 10−6   0.324021 0.119911 −0.192715 0.119911 0.334921 −0.391245 −0.192715 −0.391245 1.028696   , (4.1167) {F} = 20,000 0 0 −1 T , (4.1168) {∆} = δx δy δz T = [K]−1 {F} = 0.003854 0.007825 −0.020574 T , (4.1169) N1 = −k1{U1}T {∆} = −17,144.8 N, N2 = −k2{U2}T {∆} = −1392 N, N3 = −k3{U3}T {∆} = 23,528.4 N, N4 = −k4{U4}T {∆} = 12,632 N, N5 = −k5{U5}T {∆} = −9869 N. (4.1170) (b) It follows that {F} = −k1 l1{U1} = 8000 −3 −4 0 T , (4.1171) {F} + {F} = 4000 −6 −8 −5 T , (4.1172) {∆} = [K]−1 {{F} + {F}} = −0.007759 −0.005770 −0.003429 T , (4.1173) N1 = −k1[ l1 + {U1}T {∆}] = −21,457.2 N, N2 = −k2{U2}T {∆} = 15,243.5 N, N3 = −k3{U3}T {∆} = 12,410.4 N, N4 = −k4{U4}T {∆} = 6309 N, N5 = −k5{U5}T {∆} = −98.5 N. (4.1174) (c) We obtain the values l1 = l1α1 T = 12 × 10−4 m, (4.1175) {F} = −k1 l1{U1} = −1440 −1920 0 T , (4.1176) {F} + {F} = −1440 −1920 −20000 T , (4.1177) {∆} = [K]−1 {{F} + {F}} = 0.003157 −0.008641 −0.021048 T , (4.1178) N1 = −k1[ l1 + {U1}T {∆}] = 7637.2 N, N2 = −k2{U2}T {∆} = 16,008 N, N3 = −k3{U3}T {∆} = 44,046 N, N4 = −k4{U4}T {∆} = 44,644 N, N5 = −k5{U5}T {∆} = 22,017.5 N. (4.1179) Problem 4.12 Let us consider the continuous beam in Figure 4.19, where the sections have lengths lk and rigidities EIk, k = 1, n − 1. The beam is acted upon by given distributed loads and by given concentrated forces and moments. It is required to determine the reactions Vk, k = 1, n, in the supports.
  • 258. APPLICATIONS 249 l1 l1 l2 l2 ln−2 ln−2 ln−1 ln−1 Figure 4.19 Problem 4.12. Q q Mk−1 Mk+1 Mk+1Mk−1 Mk Mk Mk Mk Ak−1 Ak+1 Ak+1 Ak−1 Ak Ak Ak Ak Q V ″k−1 V″kV ′k V ′k+1 q lk−1 lk (a) (b) (d)(c) Figure 4.20 Isolation of the sections Ak−1Ak and AkAk+1. Solution: 1. Theory By isolating the sections Ak−1Ak, AkAk+1, we obtain the representations in Figure 4.20a and b, where we have the notations: • q, Q—given external loads; • Mk−1, Mk, Mk+1 —bending moments; • Vk, Vk+1 —reactions at the right of each section; • Vk−1, Vk —reactions at the left of each section. Figure 4.20c and d represents the loadings (bending moments) of the conjugate beams, while the bending moments given by the external loads q, Q are represented under the reference lines Ak−1Ak, AkAk+1. Denoting by Mr k−1, Ml k+1 the resultant moments of the external loading q with respect to the points Ak−1 and Ak+1, respectively, it follows that the reactions Vk, Vk are given by Vk = 1 lk−1 (Mk−1 + Mr k−1 − Mk), Vk = 1 lk (Mk∗1 + Ml k∗1 − Mk), (4.1180) so that the reaction at the support Ak Vk = Vk + Vk (4.1181) reads Vk = Mk−1 lk−1 − Mk 1 lk−1 + 1 lk + Mk+1 lk + Mr k−1 lk−1 + Ml k+1 lk . (4.1182) Because the rotations φk, φk at the fixed support Ak for the two sections, respectively, are equal to the shearing forces divided by the rigidities EIk−1, EIk of the conjugate beams, it follows that φk = 1 EIk−1 Mklk−1 2 2lk−1 3 + Mk−1lk−1 2 lk−1 3 + Sr k−1 , (4.1183) φk = 1 EIk Mklk 2 2lk 3 + Mk+1lk+1 2 lk 3 + Sl k+1 , (4.1184)
  • 259. 250 LINEAR ALGEBRA where by Sr k−1, Sl k+1 we have denoted the static moments of the bending moments of the areas corresponding to the external loads q, Q. The indices r, l specify the loadings to the right and at the left of the supports Ak−1, Ak+1, respectively. Taking into account the relation φk + φk = 0, (4.1185) we obtain, from equation (4.1183) and equation (4.1184), the Clapeyron relation Mk−1lk−1 Ik−1 + 2Mk lk−1 Ik−1 + lk Ik + Mk+1 Ik + 6 Sr k−1 lk−1Ik−1 + Sl k+1 lkIk = 0. (4.1186) If we take into account that the moments at the supports A1, An vanish (M1 = Mn = 0) and if we use the notations [A] =           2 l1 I1 + l2 I2 l2 I2 0 0 · · · 0 0 0 l2 I2 2 l2 I2 + l3 I3 l3 I3 0 · · · 0 0 0 0 l3 I3 2 l3 I3 + l4 I4 0 · · · 0 0 0 · · · · · · · · · · · · · · · · · · · · · · · · 0 0 0 0 · · · ln−3 In−3 2 ln−3 In−3 + ln−2 In−2 ln−2 In−2 0 0 0 0 · · · 0 ln−2 In−2 2 ln−2 In−2 + ln−1 In−1           , (4.1187) {B} = −6                     Sr 1 l1I1 + Sl 2 l2I2 Sr 2 l2I2 + Sl 3 l3I3 Sr 3 l3I3 + Sl 3 l3I3 ... Sr n−2 ln−2In−2 + Sl n−1 ln−1In−1                     , (4.1188) {M} = M2 M3 · · · Mn−1 T , (4.1189) {V} = V1 V2 · · · Vn T , (4.1190) [C] =               1 l1 0 0 0 · · · − 1 l1 + 1 l2 1 l2 0 0 · · · 1 l2 − 1 l2 + 1 l3 1 l3 0 · · · · · · · · · · · · · · · · · · · · · · · · · · · 0 0 0 0 · · · 1 ln−3 − 1 ln−3 + 1 ln−2 1 ln−2 0 0 0 0 · · · 0 1 ln−2 − 1 ln−2 + 1 ln−1 0 0 0 0 · · · 0 0 1 ln−1               , (4.1191)
Figure 4.21 Numerical application.
{D} = [Ml2/l1;  Mr1/l1 + Ml3/l2;  Mr2/l2 + Ml4/l3;  ...;  Mr_{n−2}/l_{n−2} + Ml_n/l_{n−1};  Mr_{n−1}/l_{n−1}], (4.1192)
then equation (4.1186), for k = 2, n − 1, and equation (4.1182), for k = 1, n, may be written in the matrix form
[A]{M} = {B}, (4.1193)
{V} = [C]{M} + {D}, (4.1194)
from which we obtain the solution
{V} = [C][A]⁻¹{B} + {D}. (4.1195)
2. Numerical application Figure 4.21 gives n = 6, Ik = I = 256 × 10⁻⁶ m⁴, lk = l = 1 m, k = 1, 5, P = 20,000 N, q = 40,000 N m⁻¹. The reactions Vk, k = 1, 6, are required.
Solution of the numerical application: The matrices [A] and [C] are obtained directly from relations (4.1187) and (4.1191):
[A] = 3906.25 [4 1 0 0; 1 4 1 0; 0 1 4 1; 0 0 1 4],
[C] = [1 0 0 0; −2 1 0 0; 1 −2 1 0; 0 1 −2 1; 0 0 1 −2; 0 0 0 1]. (4.1196)
The matrix {B} is written first in the form
{B} = −(6/(l I)) [Sr1  0  0  Sl6]^T, (4.1197)
Figure 4.22 Section A1A2.
Figure 4.23 Section A5A6.
and from Figure 4.22b and Figure 4.23b it follows that
Sr1 = ql⁴/24, (4.1198)
Sl6 = Pl³/16, (4.1199)
{B} = [−39,062,500  0  0  −29,296,875]^T. (4.1200)
Analogously, the matrix {D} is written first in the form
{D} = [Ml2/l1  Mr1/l1  0  0  Ml6/l5  Mr5/l5]^T, (4.1201)
and, because from Figure 4.22a and Figure 4.23a we have
Mr1 = Ml2 = ql²/2, (4.1202)
Mr5 = Ml6 = Pl/2, (4.1203)
we obtain
{D} = 10,000 [2 2 0 0 1 1]^T. (4.1204)
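The whole pipeline (4.1193)–(4.1195) reduces to assembling four small matrices and one linear solve. A minimal sketch in Python with NumPy (an assumed choice — the book prescribes no language) follows; the check that the six reactions must balance the total applied load ql + P is a useful safeguard on the arithmetic:

```python
import numpy as np

l, I = 1.0, 256e-6          # span length [m], second moment of area [m^4]
P, q = 20_000.0, 40_000.0   # concentrated force [N], distributed load [N/m]

# Matrices of relations (4.1196)-(4.1204)
A = 3906.25 * np.array([[4.0, 1, 0, 0],
                        [1, 4, 1, 0],
                        [0, 1, 4, 1],
                        [0, 0, 1, 4]])
C = np.array([[ 1.0, 0, 0, 0],
              [-2, 1, 0, 0],
              [ 1, -2, 1, 0],
              [ 0, 1, -2, 1],
              [ 0, 0, 1, -2],
              [ 0, 0, 0, 1]])
B = -(6.0 / (l * I)) * np.array([q * l**4 / 24, 0, 0, P * l**3 / 16])
D = np.array([q * l / 2, q * l / 2, 0, 0, P / 2, P / 2])

M = np.linalg.solve(A, B)   # support moments M2..M5, relation (4.1193)
V = C @ M + D               # support reactions, relation (4.1194)
print(V)

# Sanity check: the six reactions must carry the whole load q*l + P
assert abs(V.sum() - (q * l + P)) < 1e-6
```

Because each column of [C] sums to zero, the equilibrium check reduces to Σ Dk = ql + P, which the assertion confirms independently of the solved moments.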
Figure 4.24 Problem 4.13.
Figure 4.25 Basic system.
In the numerical case, from relation (4.1195) it follows that
{V} = [V1 V2 V3 V4 V5 V6]^T = [−2000  9000  −27,500  −25,000  62,500  −20,000]^T. (4.1205)
Problem 4.13 Let us determine, by the method of efforts, the reactions at the built-in sections A and B in Figure 4.24, assuming that the bars AC and CB have the same rigidity EI. Numerical application: EI = 2 × 10⁸ N m², l1 = 0.5 m, l2 = 0.4 m, l3 = 0.6 m, P = 12,000 N.
Solution: 1. Theory Introducing at the built-in section A the reaction forces X1 and X2 and the reaction moment X3 (Fig. 4.25), we obtain the basic system, which is the bent beam ACB, built in at B and acted upon by the force P and by the unknown reactions X1, X2, X3. The external load P, the unit forces along the directions of X1 and X2, and the unit moment in the direction of the moment X3 produce on the basic system the diagrams of bending moments M0, m1, m2, m3 represented in Figure 4.26. By means of these diagrams we calculate the influence coefficients
δi0 = ∫ mi M0/(EI) dx, (4.1206)
δij = ∫ mi mj/(EI) dx. (4.1207)
Since the moments mi vary linearly, we can also calculate the coefficients by Vereshchagin's rule:
δi0 = Ω miC/(EI), (4.1208)
δij = Ωmi mjC/(EI), (4.1209)
Figure 4.26 Diagrams of bending moments.
where Ω is the area of the diagram M0, miC is the ordinate of the diagram mi under the center of gravity of that area, Ωmi is the area of the diagram mi, and mjC is the ordinate of the diagram mj under the center of gravity of the diagram mi, respectively.
2. Numerical application It follows successively that
δ10 = P l1 l3²/(2EI), δ20 = −P l3²(3l2 + 2l3)/(6EI), δ30 = −P l3²/(2EI), (4.1210)
δ11 = l1²(l1 + 3l2 + 3l3)/(3EI), δ12 = δ21 = −l1(l2 + l3)²/(2EI), δ13 = δ31 = −l1(l1 + 2l2 + 2l3)/(2EI), δ22 = (l2 + l3)³/(3EI), δ23 = δ32 = (l2 + l3)²/(2EI), δ33 = (l1 + l2 + l3)/EI, (4.1211)
δ10 = 3.24 × 10⁻⁶ m², δ20 = −8.64 × 10⁻⁶ m², δ30 = −6.48 × 10⁻⁶, (4.1212)
δ11 = 1.45833 × 10⁻⁹ m N⁻¹, δ12 = δ21 = −1.25 × 10⁻⁹ m N⁻¹, δ13 = δ31 = −3.125 × 10⁻⁹ N⁻¹, δ22 = 1.66667 × 10⁻⁹ m N⁻¹, δ23 = δ32 = 2.5 × 10⁻⁹ N⁻¹, δ33 = 7.5 × 10⁻⁹ N⁻¹ m⁻¹. (4.1213)
Using the notations
[δ] = [δ11 δ12 δ13; δ21 δ22 δ23; δ31 δ32 δ33] = 10⁻⁹ [1.45833 −1.25 −3.125; −1.25 1.66667 2.5; −3.125 2.5 7.5],
{δ0} = [δ10 δ20 δ30]^T = 10⁻⁶ [3.24 −8.64 −6.48]^T, {X} = [X1 X2 X3]^T, (4.1214)
we obtain the matrix equation of condition
[δ]{X} = −{δ0}, (4.1215)
from which we obtain
{X} = −[δ]⁻¹{δ0}. (4.1216)
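Relation (4.1216) is a single 3 × 3 linear solve. A brief numerical sketch (Python/NumPy again assumed) that reproduces the values (4.1217) and the reactions (4.1218) given below:

```python
import numpy as np

EI = 2e8                                   # bar rigidity [N m^2]
l1, l2, l3, P = 0.5, 0.4, 0.6, 12_000.0

# Influence coefficients (4.1213) and free terms (4.1212)
d  = np.array([[ 1.45833, -1.25   , -3.125  ],
               [-1.25   ,  1.66667,  2.5    ],
               [-3.125  ,  2.5    ,  7.5    ]]) * 1e-9
d0 = np.array([3.24, -8.64, -6.48]) * 1e-6

X = np.linalg.solve(d, -d0)                # condition (4.1215): [delta]{X} = -{delta0}
HB = X[0]                                  # reactions at B, relations (4.1218)
VB = P - X[1]
MB = P * l3 + X[0] * l1 - X[1] * (l2 + l3) - X[2]
print(X, HB, VB, MB)   # X ~ [10368.19, 10368.07, 1728.13], MB ~ 287.9 N m
```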
  • 264. APPLICATIONS 255 In our case, [δ]−1 = 1.53604 × 109   6.25003 1.5625 2.08334 1.5625 1.17185 0.26043 2.08334 0.26043 0.86805   , {X} =   10368.187 10368.071 1728.133   . (4.1217) The reactions at B are HB = X1 = 10,368.187 N, VB = P − X2 = 1631.929 N, MB = P l3 + X1l1 − X2(l2 + l3) − X3 = 287.89 N m. (4.1218) Observation 4.43 If l1 = l2 = l3 = l, then we obtain the values δ10 = P l4 2EI , δ20 = − 5P l4 6EI , δ30 = − P l2 2EI , (4.1219) δ11 = 7l3 3EI , δ12 = δ21 = − 2l3 EI , δ13 = δ31 = − 5l2 2EI , δ22 = 8l3 3EI , δ23 = δ32 = 2l2 EI , δ33 = 3l EI . (4.1220) The condition for this is given by l EI          7l2 3 −2l2 − 5l 2 −2l2 8l2 3 2l − 5l 2 2l 3            X1 X2 X3   = − P l2 EI          l2 2 − 5l2 6 − l2 2          (4.1221) or, equivalently, by         7l2 3 −2l2 − 5l 2 −2l2 8l2 3 2l − 5l 2 2l 3           X1 X2 X3   = Pl          − l2 2 5l2 6 l2 2          , (4.1222) with the solution X1 X2 X3 T = P 4 7P 16 Pl 12 T . (4.1223) P A2 A1 A5 A6 A7 A4 A9 A8A3 Figure 4.27 Problem 4.14.
  • 265. 256 LINEAR ALGEBRA Problem 4.14 Let us show that the plane frame in Figure 4.27 is with fixed knots and determine the reactions at the points A5, A6, A7, A8 by the method of displacements, knowing that the bars have the same rigidity EI and the same length, while A3A9 = A9A8 = l. Numerical application for l = 1 m, EI = 6 × 108 N m2 , P = 12,000 N. Solution: 1. Theory If we replace the elastic knots A1, A2, A3, A4 and the built-in ones A5, A6, A7, A8 by articula- tions, we obtain the structure in Figure 4.28, which has b = 8 bars and n = 8 articulations. The structure in Figure 4.27 has r = 12 external constraints (three in each built-in section A5, A6, A7, A8). It follows thus that the expression 2n − (b + r) = −4 is negative, so that the structure is with fixed knots. Isolating an arbitrary bar AhAj (Fig. 4.29), denoting by φh, φj , Mh, Mj the rotation angles and the moments at the ends of the bar, respectively, and using the method of the conjugate bar, we obtain the relations Mhj = 2EI l (2φh + φj ) + 2(Sh − 2Sj ) l2 , Mjh = 2EI l (2φj + φh) + 2(Sj − 2Sh) l2 , (4.1224) where by Sh, Sj we have denoted the static moments of the areas of the bending moments given by the external loads Q, q (Fig. 4.29). In the case of Figure 4.27, these static moments vanish for all the bars, excepting the bar A3A8 (Fig. 4.30), for which S3 = S8 = P l3 2 . (4.1225) To determine the unknown rotations φ1, φ2, φ3, φ4 at the knots A1, A2, A3, A4, we write the equilibrium equations that are obtained by isolating the knots, that is, M12 + M14 + M15 = 0, M21 + M23 = 0, M32 + M34 + M38 = 0, M41 + M43 + M46 + M47 = 0. (4.1226) A2 A3 A8 A7 A4 A6 A1 A5 Figure 4.28 Resulting structure. Ah Ajϕh ϕj q l Mhj Mjh Q Figure 4.29 Isolation of the bar AhAj .
  • 266. APPLICATIONS 257 2 Pl P A8A3 Figure 4.30 Diagram of bending moments for the bar A3A8. Vj = VhMhj Mjh Ah Vh Aj Figure 4.31 Bar without external loads. 2. Computation relations With the view to obtain the system of four equations with four unknowns from system (4.1226), we take into account that φ5 = φ6 = φ7 = φ8 = 0, obtaining thus the equalities M12 = 2EI (2φ1 + φ2) l , M14 = 2EI (2φ1 + φ4) l , M15 = EI φ1 l , M21 = 2EI (2φ2 + φ1) l , M23 = 2EI (2φ2 + φ3) l , M31 = 2EI (2φ1 + φ3) l , (4.1227) M34 = 2EI (2φ3 + φ4) l , M32 = 2EI (2φ3 + φ2) l , M38 = 2EI φ3 l − Pl 4 , M41 = 2EI (2φ4 + φ1) l , M43 = 2EI (2φ4 + φ3) l , M46 = 4EI φ4 l , M47 = 4EI φ4 l , so that system (4.1226), with the notation [A] =     6 1 0 1 1 4 1 0 0 1 5 1 1 0 1 8     , {φ} = φ1 φ2 φ3 φ4 T , {B} = 0 0 Pl2 8 0 T , (4.1228) becomes [A]{φ} = {B} (4.1229) and has the solution {φ} = [A]−1 {B}. (4.1230) The rotations φ1, φ2, φ3, φ4 being known now, from relations (4.1227), we determine the indicated moments and, moreover, the moments M51, M64, M74, M83 by the formulae M51 = 2EI φ1 l , M64 = M74 = 2EI φ4 l , M83 = Pl 8 + EI φ1 l . (4.1231)
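System (4.1229) is a 4 × 4 linear solve. A short sketch (Python/NumPy assumed) reproducing the rotations (4.1241) and the fixed-end moments computed below; the free term B3 = 0.25 × 10⁻⁶ is taken exactly as used in the numerical computation (4.1239):

```python
import numpy as np

EI, l = 6e8, 1.0
A = np.array([[6.0, 1, 0, 1],
              [1, 4, 1, 0],
              [0, 1, 5, 1],
              [1, 0, 1, 8]])
B = np.array([0, 0, 0.25e-6, 0])     # free terms as in (4.1239)

phi = np.linalg.solve(A, B)          # knot rotations, relation (4.1230)
print(phi)                           # ~ [3.62e-9, -1.449e-8, 5.435e-8, -7.25e-9] rad

# End moments from (4.1231); the term P*l/8 = 150 N m is taken as printed there
M51 = 2 * EI * phi[0] / l            # ~ 4.35 N m
M64 = M74 = 2 * EI * phi[3] / l      # ~ -8.70 N m
M83 = 150.0 + EI * phi[0] / l        # ~ 152.17 N m
print(M51, M64, M83)
```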
  • 267. 258 LINEAR ALGEBRA For bars unloaded with external loads (Fig. 4.31), we obtain the reactions Vh = Vj = Mhj + Mjh l , (4.1232) while for the bar A3A8 (Fig. 4.32) we obtain the reactions V3 = M38 + M83 − Pl 2l , V8 = M38 + M83 + Pl 2l . (4.1233) On the basis of relation (4.1232), we may determine (Fig. 4.33) the reactions H5, H6, V7, that is, H5 = M51 + M15 l , H6 = M64 + M46 l , V7 = M47 + M74 l . (4.1234) To determine the reactions V5 and V6, we isolate the parts in Figure 4.34; there result the successive relations V2 = M23 + M32 l , V1 = M14 + M41 l , V5 = −(V1 + V7), V3 = M38 + M83 − Pl 2l , V4 = M47 + M74 l , (4.1235) V6 = V1 + V2 − V3 − V4, (4.1236) P V8 V′3 M83 M38 Figure 4.32 The bar A3A8. A2 A1 A5 A3 A8 A7 A4 A6 V5 V6 V7 V8 H5 H6 H7 H8 Figure 4.33 Calculation of the reactions H5, H6, and V7. V2 V1 V5 V′3 V′ V6 V2 V1 A3 A4 A6 A2 A1 A5 Figure 4.34 Determination of the reactions V5 and V6.
  • 268. APPLICATIONS 259 H2 H4H1 H2 H3 H3 H7 A8 H8 A7 A3 A4 A2 A1 Figure 4.35 Determination of the reactions H7 and H8. while, for the determination of the reactions H7 and H8, we isolate the parts in Figure 4.35 and there result the successive relations H2 = M12 + M21 l , H3 = M34 + M43 l , H8 = −(H2 + H3), (4.1237) H1 = M15 + M51 l , H4 = M46 + M64 l , H7 = H2 + H3 − H1 − H4. (4.1238) In conclusion, we obtain the reactions: • at the point A5 —H5, V5, M51; • at the point A6 —H6, V6, M64; • at the point A7 —H7, V7, M74; • at the point A8 —H8, V8, M83. 3. Numeric computation We calculate successively {B} = 0 0 0.25 × 10−6 0 T , (4.1239) [A]−1 =     0.178744 −0.048309 0.014493 −0.024155 −0.048309 0.276570 −0.057971 0.013285 0.014493 −0.057971 0.217391 −0.028986 −0.024155 0.013285 −0.028986 0.131643     , (4.1240) {φ} = φ1 φ2 φ3 φ4 T = 10−9 3.62325 −14.49275 54.34775 −7.2465 T , (4.1241) M51 = 4.3479 N m, M64 = M74 = −8.6958 N m, M83 = 152.17395 N m, M12 = −8.6955 N m, M15 = 2.17395 N m, M14 = 0 N m, M21 = −30.4347 N m, M23 = 30.4347 N m, M31 = 73.9131 N m, M34 = 121.7388 N m, M38 = −234.7827 N m, M41 = 13.0437 N m, M43 = 47.8257 N m, M46 = −17.3916 N m, M47 = 47.8257 N m, M32 = 113.0433 N m, (4.1242) V3 = −641.3 N, V8 = 558.7 N, H5 = 6.49 N, H6 = −26.09 N, V7 = 39.13 N, V2 = 143.5 N, V1 = −13.04 N, V5 = −26.09 N, V4 = 39.13 N, V6 = 758.7 N, H2 = −39.13 N, H3 = 169.6 N, H8 = −130.5 N, H1 = 6.52 N, H4 = −26.09 N, H7 = 150 N. (4.1243)
  • 269. 260 LINEAR ALGEBRA A B C D E F G60° 60° 60° l 3l l 2l X Y 32l Figure 4.36 Problem 4.15. Problem 4.15 Let us consider the plane articulated mechanism in Figure 4.36, where the crank AB is rotating with the constant angular velocity ω1. For the position in Figure 4.36, determine the angular velocities ω2, ω3, ω4, ω5 and the angular accelerations ε2, ε3, ε4, ε5 of the bars BC, CD, EF, FG. Numerical application for ω1 = 100 s−1 , l = 0.2 m. Solution: 1. Theory In an arbitrary position and in a more general case, the mechanism is represented in Figure 4.37. Denoting by l1, l2, l∗ 2 , l3, l4, l5 the lengths of the bars AB, BC, BE, CD, EF, FG, from the vector equations AB + BC + CD = OD − OA, AB + BE + EF + FG = OG − OA, (4.1244) projected on the axes OX and OY , we obtain the scalar equations l1 cos φ1 + l2 cos φ2 + l3 cos φ3 = XD − XA, l1 sin φ1 + l2 sin φ2 + l3 sin φ3 = YD − YA, l1 cos φ1 + l∗ 2 cos φ2 + l4 cos φ4 + l5 cos φ5 = XG − XA, l1 sin φ1 + l∗ 2 sin φ2 + l4 sin φ4 + l5 sin φ5 = YG − YA. (4.1245) O ϕ5 ϕ4 D (xD,yD) ϕ3 ϕ2 ϕ1 Y X G (xG,yG) F E C B A(xA,yA) Figure 4.37 The general case.
Differentiating relations (4.1245) successively with respect to time, denoting by ωi, εi the angular velocities and accelerations, respectively,
ωi = φ̇i, εi = ω̇i, i = 1, 5, (4.1246)
and knowing that ω̇1 = 0, we obtain the systems of equations
−l2ω2 sin φ2 − l3ω3 sin φ3 = l1ω1 sin φ1,
l2ω2 cos φ2 + l3ω3 cos φ3 = −l1ω1 cos φ1,
−l∗2 ω2 sin φ2 − l4ω4 sin φ4 − l5ω5 sin φ5 = l1ω1 sin φ1, (4.1247)
l∗2 ω2 cos φ2 + l4ω4 cos φ4 + l5ω5 cos φ5 = −l1ω1 cos φ1,
−l2ε2 sin φ2 − l3ε3 sin φ3 − l2ω2² cos φ2 − l3ω3² cos φ3 = l1ω1² cos φ1,
l2ε2 cos φ2 + l3ε3 cos φ3 − l2ω2² sin φ2 − l3ω3² sin φ3 = l1ω1² sin φ1,
−l∗2 ε2 sin φ2 − l4ε4 sin φ4 − l5ε5 sin φ5 − l∗2 ω2² cos φ2 − l4ω4² cos φ4 − l5ω5² cos φ5 = l1ω1² cos φ1, (4.1248)
l∗2 ε2 cos φ2 + l4ε4 cos φ4 + l5ε5 cos φ5 − l∗2 ω2² sin φ2 − l4ω4² sin φ4 − l5ω5² sin φ5 = l1ω1² sin φ1.
By using the notations
[A] = [−l2 sin φ2, −l3 sin φ3, 0, 0; l2 cos φ2, l3 cos φ3, 0, 0; −l∗2 sin φ2, 0, −l4 sin φ4, −l5 sin φ5; l∗2 cos φ2, 0, l4 cos φ4, l5 cos φ5], (4.1249)
{B} = [sin φ1  −cos φ1  sin φ1  −cos φ1]^T, {ω} = [ω2 ω3 ω4 ω5]^T, (4.1250)
[Ap] = [−l2 cos φ2, −l3 cos φ3, 0, 0; −l2 sin φ2, −l3 sin φ3, 0, 0; −l∗2 cos φ2, 0, −l4 cos φ4, −l5 cos φ5; −l∗2 sin φ2, 0, −l4 sin φ4, −l5 sin φ5], (4.1251)
{Bp} = [cos φ1  sin φ1  cos φ1  sin φ1]^T, {ω²} = [ω2² ω3² ω4² ω5²]^T, {ε} = [ε2 ε3 ε4 ε5]^T, (4.1252)
the systems of equation (4.1247) and equation (4.1248) are written in the matrix form
[A]{ω} = l1ω1{B}, (4.1253)
[A]{ε} = l1ω1²{Bp} − [Ap]{ω²}, (4.1254)
obtaining thus the solutions
{ω} = l1ω1[A]⁻¹{B}, (4.1255)
{ε} = l1ω1²[A]⁻¹{Bp} − [A]⁻¹[Ap]{ω²}. (4.1256)
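Relations (4.1255) and (4.1256) are two linear solves with the same matrix. A compact sketch (Python/NumPy assumed) that assembles (4.1249)–(4.1252) from the data (4.1257) listed below and evaluates both; it doubles as an arithmetic check on the printed intermediate matrices:

```python
import numpy as np

l, w1 = 0.2, 100.0                        # unit length [m], crank speed [1/s]
l1, l2, l2s, l3, l4, l5 = l, 3*l, 4*l, 3*l, 2*l, 2*np.sqrt(3)*l
p1, p2, p3, p4, p5 = np.radians([0.0, 60.0, 300.0, 0.0, 270.0])  # angles (4.1257)
s, c = np.sin, np.cos

A = np.array([[-l2*s(p2), -l3*s(p3), 0, 0],
              [ l2*c(p2),  l3*c(p3), 0, 0],
              [-l2s*s(p2), 0, -l4*s(p4), -l5*s(p5)],
              [ l2s*c(p2), 0,  l4*c(p4),  l5*c(p5)]])            # (4.1249)
Ap = np.array([[-l2*c(p2), -l3*c(p3), 0, 0],
               [-l2*s(p2), -l3*s(p3), 0, 0],
               [-l2s*c(p2), 0, -l4*c(p4), -l5*c(p5)],
               [-l2s*s(p2), 0, -l4*s(p4), -l5*s(p5)]])           # (4.1251)
# note: with phi1 = 0 as in (4.1257), sin(phi1) = 0 and cos(phi1) = 1
B  = np.array([s(p1), -c(p1), s(p1), -c(p1)])                    # (4.1250)
Bp = np.array([c(p1),  s(p1), c(p1),  s(p1)])                    # (4.1252)

w   = np.linalg.solve(A, l1 * w1 * B)                   # omega_2..omega_5, (4.1255)
eps = np.linalg.solve(A, l1 * w1**2 * Bp - Ap @ w**2)   # eps_2..eps_5, (4.1256)
print(w, eps)
```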
2. Numerical calculation The following values result:
l1 = l, l2 = 3l, l∗2 = 4l, l3 = 3l, l4 = 2l, l5 = 2√3 l, XA = 0, YA = 0, XG = 5l, YG = 0, XD = 4l, YD = 0, φ1 = 0°, φ2 = 60°, φ3 = 300°, φ4 = 0°, φ5 = 270°, (4.1257)
[A] = [−0.51962 0.51962 0 0; 0.3 0.3 0 0; −0.69282 0 0 0.69282; 0.4 0 0.4 0], (4.1258)
{B} = [0.86603  −0.5  0.86603  −0.5]^T, (4.1259)
[Ap] = [−0.3 −0.3 0 0; −0.51962 0.51962 0 0; −0.4 0 −0.4 0; −0.69282 0 0 0.69282], (4.1260)
{Bp} = [0.5  0.86603  0.5  0.86603]^T, (4.1261)
[A]⁻¹ = [−0.962242 1.666667 0 0; 0.962242 1.666667 0 0; 0.962242 −1.666667 0 2.5; −0.962242 1.666667 1.44376 0], (4.1262)
{ω} = [−33.333  0  8.333  −33.333]^T, (4.1263)
{ω²} = [1111.089  0  69.439  1111.089]^T, (4.1264)
{ε} = [2566.03  5131.99  −280.61  4690.98]^T. (4.1265)
Problem 4.16 We consider a mechanical system whose motion is defined by the matrix differential equation with constant coefficients
{ẋ} = [A]{x} + [B]{u}, (4.1266)
where
• {x} = [x1 x2 ··· xn]^T is the state vector;
• {u} = [u1 u2 ··· um]^T is the command vector;
• [A] = [aij], 1 ≤ i, j ≤ n, is the matrix of coefficients;
• [B] = [bij], 1 ≤ i ≤ n, 1 ≤ j ≤ m, is the command matrix.
Knowing that the matrix [A] has either positive eigenvalues or complex ones with a positive real part, determine a command vector that makes the motion stable with the aid of a reaction matrix. Numerical application for
[A] = [1 1 0; 1 1 1; 0 1 1], [B] = [0 0 1]^T. (4.1267)
  • 272. APPLICATIONS 263 Solution: 1. Theory If the matrix [A] has all its eigenvalues either strictly negative or complex with a negative real part, then even the null command vector {u} = {0} satisfies the condition that the motion be stable. If this condition is not fulfilled, then we may determine a command vector of the form {u} = [K]{x}, (4.1268) [K] being the reaction matrix, so that the motion is stable. To do this, the eigenvalues of the matrix [A] + [B][K] must be either negative or complex with the real part negative. To determine the matrix [K] that must fulfill these conditions, we may use the method of allocation of poles, by choosing convenient eigenvalues λ1, λ2, . . . , λn and obtaining the elements of the matrix [K] by means of the equations det[[A] + [B][K] − λ[I]] = 0, λ = λ1, λ2, . . . , λn. (4.1269) 2. Numerical calculation In the numerical case considered, the eigenvalues of the matrix [A] are given by the equation 1 − λ 1 0 1 1 − λ 1 0 1 1 − λ = 0, (4.1270) That is, λ1 = 1, λ2,3 = 1 ± √ 2; (4.1271) thus, in the absence of the command, the motion is unstable. In the numerical case considered, the reaction matrix [K] is of the form [K] = α1 α2 α3 , (4.1272) hence equation (4.1269) reads 1 − λ 1 0 1 1 − λ 1 α1 1 + α2 1 + α3 − λ = 0 (4.1273) or λ3 − λ2 (3 + α3) + λ(−α2 + 2α3 + 1) + 1 − α1 + α2 = 0. (4.1274) If we allocate the poles λ1 = −1, λ2 = −2 + i, λ3 = −2 − i, (4.1275) then λ1 + λ2 + λ3 = −5, λ1λ2 + λ1λ3 + λ2λ3 = 9, λ1λ2λ3 = −5 (4.1276) and we obtain the system −5 = 3 + α3, 9 = −α2 + 2α3 + 1, −5 = α1 − α2 − 1, (4.1277) from which it follows that α1 = −28, α2 = −24, α3 = −8; (4.1278)
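These gains are easy to verify: a minimal numerical check (Python/NumPy assumed) that the closed-loop matrix [A] + [B][K] indeed has the allocated poles (4.1275):

```python
import numpy as np

A = np.array([[1.0, 1, 0],
              [1, 1, 1],
              [0, 1, 1]])
B = np.array([[0.0], [0.0], [1.0]])
K = np.array([[-28.0, -24.0, -8.0]])   # reaction matrix gains from (4.1278)

poles = np.linalg.eigvals(A + B @ K)
print(np.sort_complex(poles))          # -> -2-1j, -2+1j, -1 : the allocated poles
```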
  • 273. 264 LINEAR ALGEBRA x bb y a a O p(x) p(x) Figure 4.38 Problem 4.17. as a conclusion, the reaction matrix is [K] = −28 −24 −8 , (4.1279) so that the command becomes u = −28x1 − 24x2 − 8x3. (4.1280) Problem 4.17 Let a rectangular plate of dimensions 2a and 2b (λ = a/b, λ = 1/λ = b/a) be subjected in the middle plane by the distributed loads p(x) = p(x) = b0 + n bn cos γnx, γn = nπ a , n ∈ N, (4.1281) distributed on y = ±b, respectively, symmetrical with respect to both axes of coordinates (Fig. 4.38). The state of stress (σx and σy, normal stresses; τxy , tangential stress) may be expressed in the form σx = n (−1)n An 1(γny) cos γnx + m (−1)m Bm 2(δmx) cos δmy, σy = b0 + n (−1)n An 2(γny) cos γnx + m (−1)m Bm 1(δmx) cos δmy, (4.1282) τxy = n (−1)n An 3(γny) sin γnx + m (−1)m Bm 3(δmx) sin δmy, where it has been denoted (i = 1, 2, 3) i(γny) = i(νπζ) for ν = nλ , n ∈ N, ζ = η, i(δmx) = i(νπζ) for ν = mλ, m ∈ N, ζ = ξ, (4.1283) with ξ = x a , η = y b , δm = mπ b , m ∈ N. (4.1284) The functions i(νπζ) are defined by the relations 1(νπζ) = νπ sinh νπ [(1 − νπ coth νπ) cosh νπζ + νπζ sinh νπζ], 2(νπζ) = νπ sinh νπ [(1 + νπ coth νπ) cosh νπζ − νπζ sinh νπζ], 3(νπζ) = − νπ sinh νπ (νπ coth νπ sinh νπζ − νπζ cosh νπζ). (4.1285)
  • 274. APPLICATIONS 265 The sequences of coefficients An and Bn are given by the system of equations with a double infinity of unknowns i µ2 mi Ai + κ(mλ)Bm = 0, m, i ∈ N, l µ2 lnBl + κ (nλ )An = (−1)n bn, n, l ∈ N, (4.1286) where we have introduced the rational function µmn = 2γmδm γ2 n + δ2 m , n, m ∈ N (4.1287) and the hyperbolic functions κ(mλ) = 1(δma) = coth δma + δma sinh2 δma δma, m ∈ N, κ (nλ ) = 1(γnb) = coth γnb + γnb sinh2 γnb γnb, n ∈ N. (4.1288) To solve the system of infinite linear algebraic equations (4.1286) by approximate methods, we must prove the existence and the uniqueness of the solution, as well as its boundedness; thus we search whether the system is regular or not. The system is completely regular if the conditions i µ2 mi < κ(mλ), l µ2 ln < κ (mλ ) (4.1289) are fulfilled. Solution: Let the expansions into Fourier series be given by 3(γny) = m (−1)m µ2 mn cos δmy, 3(δmx) = n (−1)n µ2 mn cos γnx. (4.1290) In particular, we get κ(mλ) = 3(δma) = coth δma − δma sinh2 δma δma, m ∈ N, κ (nλ ) = 3(γnb) = coth γnb − γnb sinh2 γnb γnb, n ∈ N (4.1291) Relations (4.1289) and (4.1290) lead to i µ2 mi = κ(mλ) − 2 δma sinh δma 2 , l µ2 ln = κ (nλ ) − 2 γnb sinh γnb 2 . (4.1292) Thus, conditions (4.1288) become 2 δma sinh δma 2 > 0, 2 γnb sinh γnb 2 > 0. (4.1293)
  • 275. 266 LINEAR ALGEBRA We notice that these magnitudes tend to zero for m → ∞ or n → ∞. Hence, the system of equations (4.1286) is regular, but not completely regular. We have ρm = 1 − i µ2 mi κ(mλ) = 2 δma sinh δma 2 κ(mλ) = 4δma sinh 2δma + 2δma , ρn = 4γnb sinh 2γnb + 2γnb . (4.1294) Asking that the solution of the infinite system of equations, the existence of which is assured for a regular system, be bounded and obtained by successive approximations, the free terms, that is, the Fourier coefficients bn must satisfy the condition bn = bn κ2(nλ ) ≤ Kρn, (4.1295) where K is a positive constant, hence bn must be of the same order of magnitude as ρn. As a result, the type of external loads that may be taken into consideration is very restricted. The solution of a regular system, however, is not necessarily unique. To study this problem, we make the change of variable An = γ2 nAn, Bm = δ2 mBm, m, n ∈ N. (4.1296) Thus, system (4.1286) becomes i ω2 µ2 mi Ai + κ(mλ)Bm = 0, κ (mλ )An + l ω 2 µ2 lnBl = (−1)n bn, (4.1297) where we have denoted ω = m n λ, ω = 1 ω = n m λ , (4.1298) eventually taking n = i or m = l. Let the expansions into Fourier series be given by 1(γny) = 2 + m (−1)m ω 2 µ2 mn cos δmy, 1(δmx) = 2 + n (−1)n ω2 µ2 mn cos γnx. (4.1299) Relations (4.1287) and (4.1298) allow us now to write i ω2 µ2 mi = χ(mλ) − 2, l ω 2 µ2 ln = χ (nλ ) − 2. (4.1300) Thus, we get ρm = 1 − i ω2 µ2 mi κ2(mλ) = 2 κ2(mλ) = 2 δma coth δma + δma sinh2δma , ρn = 2 κ (nλ ) = 2 γnb coth γnb + γnb sinh2γnb . (4.1301)
  • 276. APPLICATIONS 267 Hence, the system of equations (4.1296) is regular too (not completely regular). Thus, the Fourier coefficients bn must be of order of magnitude 1/n2 (ρm and ρn tend to zero for m → ∞ and for n → ∞). As, by a change of variable of form (4.1295), where γn → ∞ and δm → ∞, together with n → ∞ and m → ∞, we have obtained also a regular system with bounded free terms, on the basis of a theorem of P. S. Bondarenko, we can affirm that the solution of both systems is unique. It is also sufficient for system (4.1286) to have Fourier coefficients of order of magnitude 1/n2 . Hence, we can consider any case of loading with a distributed load (we cannot make a calculation for a concentrated load; in this case, this force must be replaced by an equivalent distributed load on a certain interval). To diminish the restriction imposed on the external loads, we will try a new change of variable, namely, An = γnAn, Bm = δmBm, m, n ∈ N. (4.1302) System (4.1286) reads i ωµ2 mi Ai + κ(mλ)Bm = 0, κ (nλ )An + l ω µ2 lnBl = (−1)n γnbn, (4.1303) in this case. Taking into account i ωµ2 mi = i µmi (ωµmi ) ≤ i µ2 mi i ω2 µ2 mi = κ(mλ)[κ(mλ) − 2] < κ(mλ) (4.1304) and l ω µ2 ln ≤ κ (nλ )[κ (nλ ) − 2] < κ (nλ ), (4.1305) we may affirm that system (4.1302) is regular too, obtaining the same conclusions as above. But the evaluations thus made are not strict; we try now to bring some improvements. We notice that we may write i ωµ2 mi = 4(mλ)2 i i [i2 + (mλ)2]2 . (4.1306) On the basis of some evaluations made by P. S. Bondarenko, who considers that the series above approximates a certain definite integral, we can write i i [i2 + (mλ)2]2 ≤    f1 (mλ) , mλ ≤ 3, f2(mλ), 3 < mλ ≤ 4, f3(mλ), mλ > 4, (4.1307) where we denoted f1(mλ) = 1 [1 + (mλ)2]2 + 1 [4 + (mλ)2]2 + 21 + (mλ)2 4[9 + (mλ)2]2 + 32 + (mλ)2 4[16 + (mλ)2] , f2(mλ) = 1 8(mλ)2 + 3 + (mλ)2 4[1 + (mλ)2]2 + 2 [4 + (mλ)2]2 + 32 + (mλ)2 4[16 + (mλ)2] , (4.1308) f3(mλ) = 1 4(mλ)2 + 3 + (mλ)2 4[1 + (mλ)2]2 + 8 + (mλ)2 4[4 + (mλ)2] .
  • 277. 268 LINEAR ALGEBRA It follows that ρm = 1 − i ωµ2 mi χ(mλ) ≥ 1 − 4(mλ)2 fk(mλ) χ(mλ) = 1 − 1 π (mλ)2 fk(mλ) coth mλ + πmλ sinh2πmλ , (4.1309) for k = 1, 2, 3. The denominator of the last fraction is superunitary, being equal to the unity only for m → ∞. Hence, we may write ρm ≥ 1 − 4 π (mλ)2 fk(mλ). (4.1310) The maximum of the function (mλ)2fk(mλ) is smaller or at most equal to the sum of the maximum values of each component fraction, for every variation interval of the argument mλ. We may thus write max[(mλ)2 f1(mλ)] ≤ 1 4 + 1 9 + 5 24 + 369 2500 < 0.250 + 0.112 + 0.210 + 0.148 = 0.720, max[(mλ)2 f2(mλ)] ≤ 1 24 + 27 100 + 18 169 + 1 4 < 0.042 + 0.270 + 0.108 + 0.250 = 0.670, max[(mλ)2 f3(mλ)] ≤ 1 16 + 76 289 + 1 4 < 0.065 + 0.265 + 0.250 = 0.580. (4.1311) Thus, ρm > 1 − 4 π 0.720 > 1 − 0.920 = 0.080 > 0, (4.1312) for any m (for m → ∞ too). Analogically, we may show that ρn = 1 − l ω µ2 ln κ (nλ ) ≥ 1 − 4(nλ )2 fk(nλ ) κ (nλ ) > 0.080 > 0, (4.1313) for any n (for n → ∞ too). Hence, the infinite system (4.1302) is completely regular. Its free terms, that is, the Fourier coefficients bn must be all bounded; but we cannot consider loadings with concentrated moments (in this case, the Fourier coefficients bn must be of the order of magnitude of n, so that they cannot all be bounded). The linear system of algebraic equations may now be solved on sections (the first n equations with the first n unknowns), obtaining a result as accurate as we choose. Let us now show that, from the infinite system of linear algebraic equations, we may obtain An = 1 κ (nλ ) (−1)n bn − l µ2 lnBl , Bm = − 1 κ(mλ) l µ2 mi Ai. (4.1314)
  • 278. FURTHER READING 269 Introducing An in the first group of equations (4.1286), we obtain the system l aml Bl = cm, (4.1315) with aml = − i µ2 mi µ2 li κ (iλ ) , aml = alm, m = l, amm = κ(mλ) − i µ4 mi κ (iλ ) , cm = − k (−1)k bk µ2 mk κ (kλ ) , (4.1316) while, introducing Bm in the second group of equations (4.1286), we obtain the system i bni Bi = dn, (4.1317) with bni = − l µ2 lnµ2 li κ(lλ) , bni = bin, n = i, bnn = κ (nλ ) − l µ4 ln κ(lλ) , dn = (−1)n bn. (4.1318) We obtain that both systems are symmetric with respect to the principal diagonal. We obtain thus a system of equations for each sequence of unknown coefficients. These systems have, obviously, the same properties as system (4.1286) and may be similarly studied. FURTHER READING Acton FS (1990). Numerical Methods that Work. 4th ed. Washington: Mathematical Association of America. Ackleh AS, Allen EJ, Hearfott RB, Seshaiyer P (2009). Classical and Modern Numerical Analysis: Theory, Methods and Practice. Boca Raton: CRC Press. Atkinson KE (1989). An Introduction to Numerical Analysis. 2nd ed. New York: John Wiley & Sons, Inc. Atkinson KE (1993). Elementary Numerical Analysis. 2nd ed. New York: John Wiley & Sons, Inc. Atkinson K, Han W (2010). Theoretical Numerical Analysis: A Functional Analysis Framework. 3rd ed. New York: Springer-Verlag. Bakhvalov N (1976). M´ethodes Num´erique. Moscou: Editions Mir (in French). Berbente C, Mitran S, Zancu S (1997). Metode Numerice. Bucures¸ti: Editura Tehnic˘a (in Romanian). Bhatia R (1996). Matrix Analysis. New York: Springer-Verlag. Burden RL, Faires L (2009). Numerical Analysis. 9th ed. Boston: Brooks/Cole. Butt R (2009). Introduction to Numerical Analysis Using MATLAB. Boston: Jones and Bartlett Pub- lishers. Chapra SC (1996). Applied Numerical Methods with MATLAB for Engineers and Scientists. Boston: McGraw-Hill. Cheney EW, Kincaid DR (1997). Numerical Mathematics and Computing. 6th ed. Belmont: Thomson.
  • 279. 270 LINEAR ALGEBRA Dahlquist G, Bj¨orck ´˚A (1974). Numerical Methods. Englewood Cliffs: Prentice Hall. Den Hartog JP (1961). Strength of Materials. New York: Dover Books on Engineering. D´emidovitch B, Maron I (1973). ´El´ements de Calcul Num´erique. Moscou: Editions Mir (in French). DiBenedetto E (2010). Classical Mechanics: Theory and Mathematical Modeling. New York: Springer- Verlag. Epperson JF (2007). An Introduction to Numerical Methods and Analysis. Hoboken: John Wiley & Sons, Inc. Fung YC, Tong P (2011). Classical and Computational Solid Mechanics. Singapore: World Scientific Publishing. Golub GH, van Loan CF (1996). Matrix Computations. 3rd ed. Baltimore: John Hopkins University Press. Greenbaum A, Chartier TP (2012). Numerical Methods: Design, Analysis, and Computer Implemen- tation of Algorithms. Princeton: Princeton University Press. Hamming RW (1987). Numerical Methods for Scientists and Engineers. 2nd ed. New York: Dover Publications. Hamming RW (2012). Introduction to Applied Numerical Analysis. New York: Dover Publications. Heinbockel JH (2006). Numerical Methods for Scientific Computing. Victoria: Trafford Publishing. Hibbeler RC (2010). Mechanics of Materials. 8th ed. Englewood Cliffs: Prentice Hall. Higham NJ (2002). Accuracy and Stability of Numerical Algorithms. 2nd ed. Philadelphia: SIAM. Hildebrand FB (1987). Introduction to Numerical Analysis. 2nd ed. New York: Dover Publications. Hoffman JD (1992). Numerical Methods for Engineers and Scientists. New York: McGraw-Hill. Ionescu GM (2005). Algebr˘a Liniar˘a. Bucures¸ti: Editura Garamond (in Romanian). Jazar RN (2008). Vehicle Dynamics: Theory and Applications. New York: Springer-Verlag. Kharab A, Guenther RB (2011). An Introduction to Numerical Methods: A MATLAB Approach. 3rd ed. Boca Raton: CRC Press. Kelley CT (1987). Iterative Methods for Linear and Nonlinear Equations. Philadelphia: SIAM. Kleppner D, Kolenkow RJ (2010). An Introduction to Mechanics. Cambridge: Cambridge University Press. Kress R (1996). Numerical Analysis. New York: Springer-Verlag. Krˆılov AN (1957). Lect¸ii de Calcule prin Aproximat¸ii. Bucures¸ti: Editura Tehnic˘a (in Romanian). Kunz KS (1957). Numerical Analysis. New York: McGraw-Hill. Lange K (2010). Numerical Analysis for Statisticians. 2nd ed. New York: Springer-Verlag. Lurie AI (2002). Analytical Mechanics. New York: Springer-Verlag. Lurie AI (2005). Theory of Elasticity. New York: Springer-Verlag. Mabie HH, Reinholtz CF (1987). Mechanisms and Dynamics of Machinery. 4th ed. New York: John Wiley & Sons, Inc. Marinescu G (1974). Analiza Numeric˘a. Bucures¸ti: Editura Academiei Romˆane (in Romanian). Meriam JL, Kraige LG (2012). Engineering Mechanics: Dynamics. Hoboken: John Wiley & Sons, Inc. Otto SR, Denier JP (2005). An Introduction to Programming and Numerical Methods in MATLAB. London: Springer-Verlag. Palm WJ III (2007). Mechanical Vibrations. Hoboken: John Wiley & Sons, Inc. Pandrea N (2000). Elemente de Mecanica Solidului ˆın Coordonate Pl¨uckeriene. Bucures¸ti: Editura Academiei Romˆane (in Romanian). Pandrea N, Pˆarlac S, Popa D (2001). Modele pentru Studiul Vibrat¸iilor Automobilelor. Pites¸ti: Tiparg (in Romanian). Pandrea N, Popa D (2000). Mecanisme Teorie s¸i Aplicat¸ii CAD. Bucures¸ti: Editura Tehnic˘a (in Romanian). Pandrea N, St˘anescu ND (2002). Mecanic˘a. Bucures¸ti: Editura Didactic˘a s¸i Pedagogic˘a (in Romanian).
  • 280. FURTHER READING 271 Postolache M (2006). Modelare Numeric˘a. Teorie s¸i Aplicat¸ii. Bucures¸ti: Editura Fair Partners (in Romanian). Press WH, Teukolski SA, Vetterling WT, Flannery BP (2007). Numerical Recipes: The Art of Scientific Computing. 3rd ed. Cambridge: Cambridge University Press. Quarteroni A, Sacco R, Saleri F (2010). Numerical Mathematics. 2nd ed. Berlin: Springer-Verlag. Ralston A, Rabinowitz P (2001). A First Course in Numerical Analysis. 2nd ed. New York: Dover Publications. Reza F (1973). Spat¸ii Liniare. Bucures¸ti: Editura Didactic˘a s¸i Pedagogic˘a (in Romanian). Ridgway Scott L (2011). Numerical Analysis. Princeton: Princeton University Press. Salvadori MG, Baron ML (1962). Numerical Methods in Engineering. Englewood Cliffs: Prentice Hall. Sauer T (2011). Numerical Analysis. 2nd ed. London: Pearson. Simionescu I, Dranga M, Moise V (1995). Metode Numerice ˆın Tehnic˘a. Aplicat¸ii ˆın FORTRAN. Bucures¸ti: Editura Tehnic˘a (in Romanian). Sinha AK (2010). Vibration of Mechanical Systems. Cambridge: Cambridge University Press. St˘anescu ND (2007). Metode Numerice. Bucures¸ti: Editura Didactic˘a s¸i Pedagogic˘a (in Romanian). Stoer J, Bulirsh R (2010). Introduction to Numerical Analysis. 3rd ed. New York: Springer-Verlag. S¨uli E, Mayers D (2003). An Introduction to Numerical Analysis. Cambridge: Cambridge University Press. Trefethen LN, Bau D III (1997). Numerical Linear Algebra. Philadelphia: SIAM. Udris¸te C, Iftode V, Postolache M (1996). Metode Numerice de Calcul. Algoritmi s¸i Programe Turbo Pascal. Bucures¸ti: Editura Tehnic˘a (in Romanian). Voi´evodine V (1980). Principes Num´eriques d’Alg´ebre Lin´eare. Moscou: Editions Mir (in French). Wilkinson JH (1988). The Algebraic Eigenvalue Problem. Oxford: Oxford University Press.
5 SOLUTION OF SYSTEMS OF NONLINEAR EQUATIONS
5.1 THE ITERATION METHOD (JACOBI)
Let us consider the equation¹
F(x) = 0, (5.1)
where F : D ⊂ Rⁿ → Rⁿ, x ∈ Rⁿ. In components, we have
f1(x1, x2, ..., xn) = 0, f2(x1, x2, ..., xn) = 0, ..., fn(x1, x2, ..., xn) = 0. (5.2)
Let us now write equation (5.1) in the form
x = G(x), (5.3)
where G : D ⊂ Rⁿ → Rⁿ or, in components,
x1 = g1(x1, ..., xn), x2 = g2(x1, ..., xn), ..., xn = gn(x1, ..., xn). (5.4)
We observe that, if G is a contraction, then the sequence of successive iterations
x(0) ∈ D arbitrary, x(1) = G(x(0)), x(2) = G(x(1)), ..., x(n+1) = G(x(n)), ..., n ∈ N∗, (5.5)
where we assume that x(i) ∈ D for any i ∈ N∗, is convergent, as proved by Banach's fixed-point theorem, because Rⁿ is a Banach space with the usual Euclidean norm. The limit of this sequence is
lim_{n→∞} x(n) = x (5.6)
¹The method is a generalization, to the case of nonlinear systems, of the Jacobi method for linear systems of equations.
and satisfies the relation
x = G(x). (5.7)
Observation 5.1 If G is a contraction, then all the functions gi(x), i = 1, n, are contractions.
Indeed, if G is a contraction, then there exists q ∈ R, 0 < q < 1, so that
‖G(x) − G(y)‖ ≤ q‖x − y‖ (5.8)
for any x and y of D, ‖·‖ being the Euclidean norm on Rⁿ. Relation (5.8) may also be written in the form
√(Σ_{i=1}^{n} (gi(x) − gi(y))²) ≤ q √(Σ_{i=1}^{n} (xi − yi)²). (5.9)
On the other hand,
√(Σ_{i=1}^{n} (gi(x) − gi(y))²) ≥ |gj(x) − gj(y)| (5.10)
for any j = 1, n, and relation (5.9) leads to
|gj(x) − gj(y)| ≤ q‖x − y‖, (5.11)
that is, gj : D ⊂ Rⁿ → R is a contraction.
Observation 5.2 Let us suppose that gi : D ⊂ Rⁿ → R is a contraction for any i = 1, n; it does not follow that G : D ⊂ Rⁿ → Rⁿ is also a contraction.
Indeed, let us take n = 2 and
gi(x) = gi(x1, x2) = λx1, i = 1, 2, 0 < λ < 1. (5.12)
We have
|gi(x) − gi(y)| = λ|x1 − y1| = λ√((x1 − y1)²) ≤ λ√((x1 − y1)² + (x2 − y2)²) = λ‖x − y‖, (5.13)
so that gi, i = 1, 2, are contractions. On the other hand,
‖G(x) − G(y)‖ = √((g1(x) − g1(y))² + (g2(x) − g2(y))²) = √(λ²(x1 − y1)² + λ²(x1 − y1)²) = λ√2 |x1 − y1|. (5.14)
Let us now choose λ > 1/√2 and x and y so that
x = [x1 a]^T, y = [y1 a]^T. (5.15)
It follows that
‖x − y‖ = √((x1 − y1)²) = |x1 − y1|, (5.16)
  • 283. NEWTON’S METHOD 275 hence the condition G (x) − G(y) ≤ q x − y , 0 < q < 1, leads to q|x1 − y1| ≥ λ √ 2|x1 − y1| > |x1 − y1|, (5.17) which is absurd. Observation 5.3 Let us consider the Jacobian of G, J =            ∂g1 ∂x1 ∂g1 ∂x2 · · · ∂g1 ∂xn ∂g2 ∂x1 ∂g2 ∂x2 · · · ∂g2 ∂xn · · · · · · · · · · · · ∂gn ∂x1 ∂gn ∂x2 · · · ∂gn ∂xn            , (5.18) and one of the norms ∞ or 1. Proceeding as in the one-dimensional case, it follows that if J ∞ = max i=1,n n j=1 ∂gi ∂xj < 1 on D (5.19) or if J 1 = max j=1,n n i=1 ∂gj ∂xi < 1 on D, (5.20) respectively, then the function G is a contraction and the sequence of successive iterations is convergent. 5.2 NEWTON’S METHOD Let the equation2 be f(x) = 0, (5.21) where f : D ⊂ Rn → Rn , and let us denote by x its solution. We suppose that f = f1 f2 · · · fn T , (5.22) the functions fi, i = 1, n, being of class C1 on D. We also suppose that the determinant of Jacobi’s matrix does not vanish at x, det J = ∂f1 ∂x1 ∂f1 ∂x2 · · · ∂f1 ∂xn ∂f2 ∂x1 ∂f2 ∂x2 · · · ∂f2 ∂xn · · · · · · · · · · · · ∂fn ∂x1 ∂fn ∂x2 · · · ∂fn ∂xn x=x = 0. (5.23) 2This method is the generalization of the Newton method presented in Chapter 2.
It follows that there exists a neighborhood V of x so that det J ≠ 0 on V. Let us consider an arbitrary point x ∈ V. Under these conditions J⁻¹(x) exists, and we may define a recursive sequence
x(0) ∈ V arbitrary, x(k) = x(k−1) − J⁻¹(x(k−1)) f(x(k−1)), k ∈ N∗, (5.24)
with the condition x(i) ∈ V, i ∈ N.
Theorem 5.1 Let f : D ⊂ Rⁿ → Rⁿ and equation (5.21) with the solution x; let us suppose that det J(x) ≠ 0 too. If there exist real constants α, β, and γ so that
‖J⁻¹(x(0))‖ ≤ α, (5.25)
‖x(1) − x(0)‖ ≤ γ, (5.26)
Σ_{j=1}^{n} Σ_{l=1}^{n} |∂²fi/∂xj∂xl| < β, i = 1, n, (5.27)
2nαβγ < 1, (5.28)
then the recurrent sequence defined by relation (5.24) is convergent to the solution x of equation (5.21).
Demonstration. It is analogous to that of Theorem 2.5.
As stopping conditions for the iterative process we use
‖x(k) − x(k−1)‖ < ε, (5.29)
‖f(x(k))‖ < ε, (5.30)
where ‖·‖ is one of the canonical norms. Sometimes we use both conditions (5.29) and (5.30) together. A variant of condition (5.29) is given by
‖x(k) − x(k−1)‖ < ε, if ‖x(k)‖ ≤ 1, (5.31)
‖x(k) − x(k−1)‖/‖x(k)‖ < ε, if ‖x(k)‖ > 1. (5.32)
5.3 THE MODIFIED NEWTON METHOD
If the matrix J⁻¹ is continuous on a neighborhood of the solution x and if the start vector x(0) is sufficiently close to x, that is, it fulfills the conditions of Theorem 5.1, then we may replace the sequence of iterations
x(k+1) = x(k) − J⁻¹(x(k)) f(x(k)) (5.33)
by the sequence
x(k+1) = x(k) − J⁻¹(x(0)) f(x(k)), (5.34)
obtaining thus a variant of Newton's method³; this variant has the advantage that the inverse J⁻¹ no longer has to be calculated at each iteration step.
³It is the generalization of the modified Newton method discussed in Chapter 2.
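In code, the two variants differ only in where the Jacobian is evaluated. A generic skeleton (Python/NumPy assumed; f and J are user-supplied callables, hypothetical names) implementing (5.33) and (5.34) with the stopping rules (5.29)–(5.30) used together:

```python
import numpy as np

def newton(f, J, x0, eps=1e-10, kmax=50, modified=False):
    """Newton (5.33) or modified Newton (5.34) for f(x) = 0."""
    x = np.asarray(x0, dtype=float)
    J0 = J(x) if modified else None        # frozen Jacobian for (5.34)
    for _ in range(kmax):
        step = np.linalg.solve(J0 if modified else J(x), f(x))
        x = x - step
        # stopping conditions (5.29) and (5.30), used together
        if np.linalg.norm(step) < eps and np.linalg.norm(f(x)) < eps:
            return x
    return x
```

Solving J·step = f instead of forming J⁻¹ explicitly is the standard, numerically safer reading of (5.24).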
  • 285. THE GRADIENT METHOD 277 5.4 THE NEWTON–RAPHSON METHOD Let us consider the system of nonlinear equations4 f(x) = 0 (5.35) for which an approximation x(0) of the solution x is known. Let us now determine the variation δ(0) = δ1 δ2 · · · δn T , (5.36) so that x(0) + δ(0) be a solution of equation (5.35). Expanding the components fi, i = 1, n, of the vector function f into a Taylor series around x(0), we have fi(x0 ) + δ(0) 1 ∂fi ∂x1 x=x(0) + δ(0) 2 ∂fi ∂x2 x=x(0) + · · · + δ(0) n ∂fi ∂xn x=x(0) + · · · + = 0, i = 1, n. (5.37) We neglect the terms of higher order in relation (5.37), obtaining thus a linear system of n equations with n unknowns δ(0) 1 ∂f1 ∂x1 x=x(0) + δ(0) 2 ∂f1 ∂x2 x=x(0) + · · · + δ(0) n ∂f1 ∂xn x=x(0) = −f1(x0 ), . . . , δ(0) 1 ∂fn ∂x1 x=x(0) + δ(0) 2 ∂fn ∂x2 x=x(0) + · · · + δ(0) n ∂fn ∂xn x=x(0) = −fn(x0 ). (5.38) By solving this system, we obtain the values δ(0) i , i = 1, n. The new approximation of the solution is x(1) = x(0) + δ(0) (5.39) and the procedure continues, obtaining successively δ(1), x(2), δ(2), x(3), and so on. 5.5 THE GRADIENT METHOD Let the equation be f(x) = 0, (5.40) where f : Rn → Rn is at least of class C1 on a domain D ⊂ Rn, while x = x1 x2 · · · xn T . Equation (5.40) may also be written in the form of a system with n unknowns f1(x1, . . . , xn) = 0, f2(x1, . . . , xn) = 0, . . . , fn(x1, . . . , xn) = 0. (5.41) Let us consider the function U(x) = n i=1 f 2 i (x1, x2, . . . , xn). (5.42) 4It is easy to prove that the Newton method is equivalent to the Newton–Raphson method; moreover, they lead to the same results.
  • 286. 278 SOLUTION OF SYSTEMS OF NONLINEAR EQUATIONS We observe that the solution x of equation (5.40) is solution of the equation U(x) = 0 (5.43) too and reciprocally. We thus reduce the problem of solving equation (5.40) to an equivalent problem of determination of the absolute minimum of the function U(x). Let us denote by x(0) the first approximation of the solution of equation (5.40), or the first approximation of the absolute minimum of function (5.42). We will draw through x(0) the level hypersurface of the function U(x) = U(x(0)) (Fig. 5.1). We will go along the normal to this hyper- surface at the point P0 until it pierces another hypersurface U(x) = U(x(1)) where it meets the point P1 of coordinate x(1) with U(x(1) ) < U(x(0) ). Starting now from the point P1 along the normal to the hypersurface U(x) = U(x(1) ), we obtain the point P2 corresponding to the intersection of the normal with the hypersphere U(x) = U(x(2) ), where U(x(2) ) < U(x(1) ). Let this point of coordi- nate be x(2) . The procedure continues, obtaining thus the sequence of points P1, P2, P3, . . . , Pn for which we have the sequence of relations U(x(1) ) > U(x(2) ) > U(x(3) ) > · · · > U(x(n) ) > · · · (5.44) it follows that the sequence of points P1, P2, . . . , Pn, . . . approaches the point P , which realizes the minimum value of the function U(x). The triangle OP0P1 leads to x(1) = x(0) − λ0∇U(x(0) ), (5.45) where ∇U means the gradient of the function U ∇U(x) = ∂U ∂x1 ∂U ∂x2 · · · ∂U ∂xn T . (5.46) Let the function now be φ(λ0) = U(x(0) − λ0∇U(x(0) )). (5.47) We must search that value of the parameter λ0 for which the function φ(λ0) will be minimum, from which it follows that ∂φ ∂λ0 = 0 (5.48) P O P0 P2 P1 P3 U(x(2) ) U(x(3) ) U(x(1) ) U(x(0) ) x(3) x(1) x(0) x(2) Figure 5.1 The gradient method.
  • 287. THE GRADIENT METHOD 279 or ∂ ∂λ0 U(x(0) − λ0∇U(x(0) )) = 0, (5.49) λ0 being the smallest positive solution of equation (5.49). On the other hand, we have φ(λ0) = n i=1 f 2 i (x(0) − λ0∇U(x(0) )) = 0. (5.50) Expanding the functions fi into a Taylor series, supposing that λ0 1 and neglecting the nonlinear terms in λ0, we obtain the relation φ(λ0) = n i=1 f 2 i  x(0) − λ0 n j=1 ∂fi x(0) ∂xj ∂U(x(0) ) ∂xj   . (5.51) Condition (5.48) of minimum may be now written in the form 2 n i=1     fi x(0) − λ0 n j=1 ∂fi x(0) ∂xj ∂U(x(0) ) ∂xj   n j=1 ∂fi x(0) ∂xj ∂U(x(0) ) ∂xj    = 0, (5.52) from which it follows that λ0 = n i=1  fi x(0) n j=1 ∂fi x(0) ∂xj ∂fi(x(0) ) ∂xj   n i=1   n j=1 ∂fi x(0) ∂xj ∂fi(x(0) ) ∂xj   2 . (5.53) From the definition of the function U(x) we have ∂U ∂xj = ∂ ∂xj n i=1 f 2 i (x) = 2 n i=1 fi (x) ∂fi(x) ∂xj , (5.54) ∇U(x) = 2 n i=1 fi (x) ∂fi(x) ∂x1 · · · n i=1 fi(x) ∂fi(x) ∂xn T = 2J(x)f(x), (5.55) where we have denoted the Jacobian of the vector function f by J(x), J(x) =       ∂f1 ∂x1 ∂f1 ∂x2 · · · ∂f1 ∂xn · · · · · · · · · · · · ∂fn ∂x1 ∂fn ∂x2 · · · ∂fn ∂xn       . (5.56) We denote the scalar product by ·, · x, y = xT y, (5.57)
  • 288. 280 SOLUTION OF SYSTEMS OF NONLINEAR EQUATIONS where x = x1 · · · xn T , y = y1 · · · yn T , (5.58) so that relation (5.53) may be written in a more compact form 2λ0 = f(x(0) ), J(x(0) )JT (x(0) )f(x(0) ) J(x(0))JT(x(0))f(x(0)), J(x(0))JT(x(0))f(x(0)) . (5.59) Relation (5.45) now becomes x(1) = x(0) − 2λ0JT (x(0) )f(x(0) ). (5.60) Using the recurrence relation x(k+1) = x(k) − 2λkJT (x(k) )f(x(k) ), (5.61) we thus obtain the sequence of vectors x(1), . . . , x(k), . . . , where 2λk = f(x(k) ), J(x(k) )JT (x(k) )f(x(k) ) J(x(k))JT(x(k))f(x(k)), J(x(k))JT(x(k))f(x(k)) . (5.62) 5.6 THE METHOD OF ENTIRE SERIES Instead of solving the system of equations fk(x1, x2, . . . , xn) = 0, k = 1, n, (5.63) we will solve the system formed by the equations Fk(x1, x2, . . . , xn, λ) = 0, k = 1, n, (5.64) where Fk, k = 1, n, are analytic on a neighborhood of the solution x, while λ is a real parameter. The functions Fk(x, λ) fulfill the condition that the solution of system (5.64) is known for λ = 0, while for λ = 1 we have Fk(x, 1) = fk(x), k = 1, n. Moreover, Fk, k = 1, n, are analytic in λ. Moreover, we also suppose that for 0 ≤ λ ≤ 1 system (5.64) admits an analytic solution x(λ), while x = x(0) is an isolated solution of system (5.63). Expanding into a Taylor series the solution xj (λ) around 0, we have xj (λ) = xj (0) + λxj (0) + λ2 2! xj (0) + · · · , j = 1, n. (5.65) Differentiating relation (5.64) with respect to λ, we obtain n j=1 ∂Fk ∂xj xj (λ) + ∂Fk ∂λ = 0, k = 1, n. (5.66) If we denote by x(0) = x(0) the solution for λ = 0, then relation (5.66) leads to n j=1 ∂Fk(x(0) , 0) ∂xj xj (0) + ∂Fk(x(0) , 0) ∂λ = 0, k = 1, n, (5.67)
and if
det[∂Fk(x(0), 0)/∂xj] ≠ 0, (5.68)
then from equation (5.67) we obtain the values x′j(0), j = 1, n. Differentiating once more expression (5.66) with respect to λ, we get
Σ_{j=1}^{n} (∂Fk/∂xj) x″j(λ) + Σ_{j=1}^{n} Σ_{l=1}^{n} (∂²Fk/∂xj∂xl) x′j(λ) x′l(λ) + 2 Σ_{j=1}^{n} (∂²Fk/∂xj∂λ) x′j(λ) + ∂²Fk/∂λ² = 0, k = 1, n. (5.69)
Making now λ = 0, expression (5.69) becomes
Σ_{j=1}^{n} (∂Fk(x(0), 0)/∂xj) x″j(0) + Σ_{j=1}^{n} Σ_{l=1}^{n} (∂²Fk(x(0), 0)/∂xj∂xl) x′j(0) x′l(0) + 2 Σ_{j=1}^{n} (∂²Fk(x(0), 0)/∂xj∂λ) x′j(0) + ∂²Fk(x(0), 0)/∂λ² = 0, k = 1, n; (5.70)
because the values x′j(0), j = 1, n, are known, it follows that the x″j(0) are determined from equation (5.70). Obviously, the procedure of differentiation may be continued with relation (5.69), solving for x‴j(0), and so on. The solution of system (5.63) is thus given by expression (5.65).
5.7 NUMERICAL EXAMPLE
Example 5.1 Let us consider the nonlinear system
50x1 + x1² + x2³ = 52, 50x2 + x1³ + x2⁴ = 52, (5.71)
which has the obvious solution x1 = 1, x2 = 1. To determine the solution by Jacobi's method, we write this system in the form
x1 = 1.04 − 0.02x1² − 0.02x2³ = g1(x1, x2),
x2 = 1.04 − 0.02x1³ − 0.02x2⁴ = g2(x1, x2). (5.72)
The Jacobi matrix is given by
J = [∂g1/∂x1  ∂g1/∂x2; ∂g2/∂x1  ∂g2/∂x2] = [−0.04x1  −0.06x2²; −0.06x1²  −0.08x2³] (5.73)
and we observe that ‖J‖ < 1 on a neighborhood of the solution [1 1]^T.
Let us choose as start vector
x(0) = [1.05  0.92]^T. (5.74)
The recurrence relation reads
x1(n+1) = 1.04 − 0.02(x1(n))² − 0.02(x2(n))³, x2(n+1) = 1.04 − 0.02(x1(n))³ − 0.02(x2(n))⁴, (5.75)
the calculations being given in Table 5.1. To apply Newton's method, we write
F(x) = [f1(x1, x2); f2(x1, x2)] = [50x1 + x1² + x2³ − 52; 50x2 + x1³ + x2⁴ − 52], (5.76)
so that the Jacobian is
J(x) = [50 + 2x1  3x2²; 3x1²  50 + 4x2³], (5.77)
from which
J⁻¹(x) = (1/((50 + 2x1)(50 + 4x2³) − 9x1²x2²)) [50 + 4x2³  −3x2²; −3x1²  50 + 2x1]. (5.78)
The recurrence formula reads
x(n+1) = x(n) − J⁻¹(x(n)) F(x(n)), (5.79)
the calculation being systematized in Table 5.2. In the case of the modified Newton method, the recurrence relation reads
x(n+1) = x(n) − J⁻¹(x(0)) F(x(n)), (5.80)
where
J⁻¹(x(0)) = [0.019252  −0.000920; −0.001199  0.018884]. (5.81)
The calculations are given in Table 5.3.
TABLE 5.1 Solution of Equation (5.71) by Jacobi's Method
Step    x1(n)        x2(n)
0       1.05         0.92
1       1.002376     1.002520
2       0.999753     0.999655
3       1.000031     1.000042
4       0.999996     0.999995
5       1.000000     1.000001
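Table 5.1 can be reproduced with a few lines of code (Python assumed, as elsewhere in these sketches), iterating (5.75) from the start vector (5.74):

```python
x1, x2 = 1.05, 0.92                              # start vector (5.74)
for step in range(1, 6):
    x1, x2 = (1.04 - 0.02*x1**2 - 0.02*x2**3,    # g1, relation (5.72)
              1.04 - 0.02*x1**3 - 0.02*x2**4)    # g2
    print(step, round(x1, 6), round(x2, 6))
# step 1 gives (1.002376, 1.002520), converging to (1, 1) as in Table 5.1
```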
  • 291. NUMERICAL EXAMPLE 283 TABLE 5.2 Solution of Equation (5.71) by Newton’s Method Step x(n) 1 x(n) 2 J−1 (xn ) F(x(n) ) 0 1.05 0.92 0.019252 −0.000920 −0.001199 0.018884 2.381188 −4.125982 1 1.000361 1.000770 0.019252 −0.001075 −0.001072 0.018575 0.021084 0.042667 2 1.000001 1.000000 TABLE 5.3 Solution of Equation (5.71) by Newton’s Modified Method Step x(n) 1 x(n) 2 F(x(n) ) 0 1.05 0.92 2.381188 −4.125982 1 1.000361 1.000770 0.021084 0.042667 2 0.999994 0.999990 −0.000342 −0.000558 3 1.000000 1.000000 The problem is put to see if Newton’s method has been correctly applied, that is, if the conditions of Theorem 5.1 are fulfilled. We thus calculate successively J−1 x0 ∞ = 0.019252 −0.000920 −0.001199 0.018884 ∞ = 0.020172 = α, (5.82) x(1) − x(0) ∞ = 1.000361 −1.05 1.000770 −0.92 ∞ = 0.08077 = β, (5.83) 2 i=1 2 j=1 ∂2 fi ∂xi∂xj ∞ = 2 0 0 6x2 ∞ + 6x1 0 0 12x2 2 ∞ . (5.84) Choosing now a neighborhood of the point (1, 1), given by x − 1 1 ∞ < 0.1, (5.85) we deduce 2 i=1 2 j=1 ∂2 fi ∂xi∂xj ∞ = |6x2| + |12x2 2 | < 6 × 1.1 + 12 × 1.12 = 21.12 = γ. (5.86) It follows that the relation 2nαβγ = 2 × 2 × 0.020172 × 0.08077 × 21.12 = 0.1376 < 1; (5.87)
  • 292. 284 SOLUTION OF SYSTEMS OF NONLINEAR EQUATIONS hence, Newton’s method has been correctly applied. Let us now pass to the solving of system (5.71) by means of the Newton–Raphson method. To do this, we successively calculate ∂f1 ∂x1 = 50 + 2x1, ∂f1 ∂x2 = 3x2 2 , ∂f2 ∂x1 = 3x2 1 , ∂f2 ∂x2 = 50 + 4x3 2 , (5.88) ∂f1 ∂x1 x=x(0) = 52.1, ∂f1 ∂x2 x=x(0) = 2.5392, ∂f2 ∂x1 x=x(0) = 3.3075, ∂f2 ∂x2 x=x(0) = 53.114752, f1(x(0) ) = 2.381188, f2(x(0) ) = −4.125982 (5.89) and obtain the system 52.1δ(0) 1 + 2.5392δ(0) 2 = −2.381188, 3.3075δ(0) 1 + 53.114752δ(0) 2 = 4.125982, (5.90) with the solution δ(0) 1 = −0.049641, δ(0) 2 = 0.080772, (5.91) so that x(1) = x(0) + δ(0) 1 δ(0) 2 = 1.000359 1.000772 . (5.92) In the following step, we have ∂f1 ∂x1 x=x(0) = 52.000718, ∂f1 ∂x2 x=x(0) = 3.004634, ∂f2 ∂x1 x=x(0) = 3.002154, ∂f2 ∂x2 x=x(0) = 54.009271, f1(x(0) ) = 0.020986, f2(x(0) ) = 0.042769, (5.93) the system 52.000718δ(1) 1 + 3.004634δ(1) 2 = −0.020986, 3.002154δ(1) 1 + 54.009271δ(1) 2 = −0.042769, (5.94) and the solution δ(1) 1 = −0.000359, δ(1) 2 = −0.000772. (5.95) It follows that x(2) = x(1) + δ(1) 1 δ(1) 2 = 1.000000 1.000000 . (5.96) We observe that the Newton and Newton–Raphson methods lead to the same solution (in the limits of the calculation approximates). As a matter of fact, the two methods are equivalent. Let us now pass to the solution of system (5.71) by means of the gradient method.
  • 293. NUMERICAL EXAMPLE 285 We calculate successively J(x) = 50 + 2x1 3x2 2 3x2 1 50 + 4x3 2 , (5.97) JT (x) = 50 + 2x1 3x2 1 3x2 2 50 + 4x3 2 , (5.98) F(x) = 50x1 + x2 1 + x3 2 − 52 50x2 + x3 1 + x4 2 − 52 , (5.99) JT (x)F(x) = 50 + 2x1 (50x1 + x2 1 + x2 2 − 52) + 3x2 1 (50x2 + x3 1 + x4 2 − 52) 3x2 2 (50x1 + x2 1 + x3 2 − 52) + (50 + 4x3 2 )(50x2 + x3 1 + x4 2 − 52) , (5.100) J(x)JT (x) = 50 + 2x1 2 + 9x4 2 3x2 1 (50 + 2x1) 3x2 1 (50 + 2x1) + 3x2 2 (50 + 4x2)3 9x4 4 + (50 + 4x3 2 )2 , (5.101) The calculations are contained in Table 5.4. Let us consider the system F(x, λ) = 50x1 + λ x2 1 + x3 2 − 52 50x2 + λ(x3 1 + x4 2 ) − 52 = 0 0 , (5.102) where λ is a real parameter. For λ = 1 we obtain system (5.71), while for λ = 0, the solution of system (5.102) becomes obvious x(0) = 1.04 1.04 . (5.103) We observe that the conditions asked by the method of entire series are fulfilled. TABLE 5.4 The Solution of Equation (5.71) by the Gradient Method Step x(n) 2λn 0 1.05 0.92 0.0003957 1 1.0076065 1.0043272 0.0003189 2 1.0004987 0.9995063 0.0004063 3 1.0000230 1.0000285 0.0003117 4 1.0000002 1.0000002 0.0003126
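The gradient iteration (5.61) with the step length (5.62) is equally compact. A sketch (Python/NumPy assumed) of the loop behind Table 5.4 — small differences in the last digits with respect to the printed table are to be expected from the rounding of intermediate quantities:

```python
import numpy as np

def F(x):
    x1, x2 = x
    return np.array([50*x1 + x1**2 + x2**3 - 52,
                     50*x2 + x1**3 + x2**4 - 52])

def Jac(x):
    x1, x2 = x
    return np.array([[50 + 2*x1, 3*x2**2],
                     [3*x1**2, 50 + 4*x2**3]])

x = np.array([1.05, 0.92])
for step in range(5):
    f, J = F(x), Jac(x)
    w = J @ J.T @ f
    two_lam = (f @ w) / (w @ w)      # step length, relation (5.62)
    x = x - two_lam * (J.T @ f)      # descent step, relation (5.61)
    print(step + 1, x, two_lam)
# the iterates approach the solution (1, 1)
```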
  • 294. 286 SOLUTION OF SYSTEMS OF NONLINEAR EQUATIONS We have ∂F1 ∂x1 = 50 + 2λx1, ∂F1 ∂x2 = 3λx2 2 , ∂F2 ∂x1 = 3λx2 1 , ∂F2 ∂x2 = 50 + 4λx3 2 , (5.104) ∂F1(x(0) , 0) ∂x1 = 50, ∂F1(x(0) , 0) ∂x2 = 0, ∂F2(x(0) , 0) ∂x1 = 0, ∂F2(x(0) , 0) ∂x2 = 50, (5.105) ∂F1 ∂λ = x2 1 + x3 2 , ∂F2 ∂λ = x3 1 + x4 2 , (5.106) ∂F1(x(0) , 0) ∂λ = 1.042 + 1.043 = 2.206464, ∂F2(x(0) , 0) ∂λ = 1.043 + 1.044 = 2.29472256, (5.107) where x(0) = x(0) = 1.04 1.04 . (5.108) It follows that the system 50x1(0) + 2.206464 = 0, 50x2(0) + 2.29472256 = 0, (5.109) with the solution x1(0) = −0.04412928, x2(0) = −0.045894451. (5.110) On the other hand, ∂2 F1 ∂x2 1 = 2λ, ∂2 F1 ∂x1∂x2 = 0, ∂2 F2 ∂x1∂x2 = 0, ∂2 F2 ∂x2 2 = 12λx2 2 , (5.111) ∂2 F1 ∂x1∂λ = 2x1, ∂2 F1 ∂x2∂λ = 3x2 2 , ∂2 F2 ∂x1∂λ = 3x2 1 , ∂2 F2 ∂x2∂λ = 4x3 2 , (5.112) ∂2F1 ∂λ2 = 0, ∂2F2 ∂λ2 = 0, (5.113) ∂2 F1(x(0) , 0) ∂x2 1 = 0, ∂2 F2(x(0) , 0) ∂x2 2 = 0, (5.114) ∂2 F1(x(0) , 0) ∂x1∂λ = 2.08, ∂2 F1(x(0) , 0) ∂x2∂λ = 3.2448, ∂2F2(x(0), 0) ∂x1∂λ = 3.2448, ∂2F2(x(0), 0) ∂x2∂λ = 4.499456. (5.115) There follows the system 50x1 (0) − 0.481414433 = 0, 50x2 (0) − 0.699381501 = 0, (5.116) with the solution x1 (0) = 0.009628288, x2 (0) = 0.01398763. (5.117)
We obtain the values
x1 ≈ x1(0) + x′1(0) + (1/2)x″1(0) = 1.000684864, x2 ≈ x2(0) + x′2(0) + (1/2)x″2(0) = 1.001099364. (5.118)
5.8 APPLICATIONS
Problem 5.1 Let us consider the plane articulated mechanism in Figure 5.2, where the dimensions OA = l1, AB = l2, BC = l3, AD = l∗2, DE = l4, EF = l5, the angle α, the coordinates XC, YC, XF, YF and the initial position φi = φ°i, i = 1, 5, are known. Determine and represent graphically the functions φi(φ1), i = 2, 5.
Numerical application: l = 0.2 m, l1 = l, l2 = 3l, l3 = 3l, l∗2 = 4l, l4 = 2l, l5 = 2l√3, α = 0°, XC = 4l, YC = 0, XF = 5l, YF = 0, φ°1 = 0°, φ°2 = 60°, φ°3 = −60°, φ°4 = 0°, φ°5 = −90°, ω = 100 s⁻¹, the imposed precision being ε = 0.0001, while the increment of the angle φ1 is Δφ1 = 1°.
Solution: 1. Theory The vector equations
OA + AB + BC = OC, OA + AD + DE + EF = OF, (5.119)
projected on the axes OX, OY, with the notations
f1 = l1 cos φ1 + l2 cos φ2 + l3 cos φ3 − XC,
f2 = l1 sin φ1 + l2 sin φ2 + l3 sin φ3 − YC,
f3 = l1 cos φ1 + l∗2 cos(φ2 + α) + l4 cos φ4 + l5 cos φ5 − XF,
f4 = l1 sin φ1 + l∗2 sin(φ2 + α) + l4 sin φ4 + l5 sin φ5 − YF, (5.120)
lead to the system of nonlinear equations
fi(φ2, φ3, φ4, φ5) = 0, i = 1, 4; (5.121)
we must determine the unknowns φ2, φ3, φ4, φ5 as functions of the angle φ1.
Figure 5.2 Problem 5.1.
Denoting by [J] the Jacobian
[J] = [−l2 sin φ2, −l3 sin φ3, 0, 0; l2 cos φ2, l3 cos φ3, 0, 0; −l∗2 sin(φ2 + α), 0, −l4 sin φ4, −l5 sin φ5; l∗2 cos(φ2 + α), 0, l4 cos φ4, l5 cos φ5] (5.122)
and by {φ}, {f}, {Δφ} the column matrices
{φ} = [φ2 φ3 φ4 φ5]^T, {f} = [f1 f2 f3 f4]^T, {Δφ} = [Δφ2 Δφ3 Δφ4 Δφ5]^T, (5.123)
we obtain the equation
[J]{Δφ} = −{f}, (5.124)
from which, by means of the known initial values φ°i, i = 1, 5, we determine {Δφ}; then {φ} → {φ°} + {Δφ}, and the iteration process is continued until |Δφi| < ε, i = 2, 5, where ε is the imposed precision. After the determination of the angles φi, i = 2, 5, an increment Δφ1 = 1° is given to the angle φ1; the values known from the previous step are taken as approximate values for φi, i = 2, 5, and the iteration process starts again.
2. Numerical calculation The results of the simulation are presented in Table 5.5 and plotted in the diagrams of Figure 5.3, Figure 5.4, Figure 5.5, and Figure 5.6.
Problem 5.2 We consider the rigid solid in Figure 5.7, suspended by six bars A0iAi, i = 1, 6, with spherical articulations and having lengths variable in time,
li(t) = l0i + si(t), si(0) = 0, i = 1, 6. (5.125)
Figure 5.3 Time history φ2 = φ2(φ1).
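The inner Newton loop with continuation in φ1 translates almost line by line into code. A sketch (Python/NumPy assumed) of one sweep over φ1, warm-starting each step from the previous solution as described above:

```python
import numpy as np

l = 0.2
l1, l2, l2s, l3, l4, l5 = l, 3*l, 4*l, 3*l, 2*l, 2*l*np.sqrt(3)
alpha, XC, YC, XF, YF, eps = 0.0, 4*l, 0.0, 5*l, 0.0, 1e-4

def f(p1, p):                      # residuals (5.120), p = (phi2..phi5)
    p2, p3, p4, p5 = p
    return np.array([
        l1*np.cos(p1) + l2*np.cos(p2) + l3*np.cos(p3) - XC,
        l1*np.sin(p1) + l2*np.sin(p2) + l3*np.sin(p3) - YC,
        l1*np.cos(p1) + l2s*np.cos(p2+alpha) + l4*np.cos(p4) + l5*np.cos(p5) - XF,
        l1*np.sin(p1) + l2s*np.sin(p2+alpha) + l4*np.sin(p4) + l5*np.sin(p5) - YF])

def J(p):                          # Jacobian (5.122)
    p2, p3, p4, p5 = p
    return np.array([
        [-l2*np.sin(p2), -l3*np.sin(p3), 0, 0],
        [ l2*np.cos(p2),  l3*np.cos(p3), 0, 0],
        [-l2s*np.sin(p2+alpha), 0, -l4*np.sin(p4), -l5*np.sin(p5)],
        [ l2s*np.cos(p2+alpha), 0,  l4*np.cos(p4),  l5*np.cos(p5)]])

p = np.radians([60.0, -60.0, 0.0, -90.0])      # initial position phi2..phi5
for p1_deg in range(0, 361):                   # continuation in phi1, step 1 deg
    p1 = np.radians(p1_deg)
    while True:
        dp = np.linalg.solve(J(p), -f(p1, p))  # correction, relation (5.124)
        p += dp
        if np.max(np.abs(dp)) < eps:
            break
    # p now reproduces the corresponding row of Table 5.5 (in radians)
```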
Figure 5.4 Time history φ3 = φ3(φ1).
Figure 5.5 Time history φ4 = φ4(φ1).
In particular, this may be the mechanical model of a Stewart platform. The position of the rigid solid with respect to a fixed frame of reference O0XYZ is defined by the position of the frame of reference Oxyz rigidly linked to the body, that is, by the coordinates XO, YO, ZO of the point O and by the Bryan angles ψ, θ, φ. Knowing the coordinates xi, yi, zi of the points Ai, i = 1, 6, in the system Oxyz, the coordinates X0i, Y0i, Z0i of the points A0i, i = 1, 6, in the system O0XYZ, the functions si(t), i = 1, 6, the initial position X°O, Y°O, Z°O, ψ°, θ°, φ°, the error ε and the time step Δt, determine the functions XO(t), YO(t), ZO(t), ψ(t), θ(t), φ(t) and represent them graphically.
Figure 5.6 Time history φ5 = φ5(φ1).
Figure 5.7 Problem 5.2.
Numerical application (Fig. 5.8): l = 1 m; l0i = l, i = 1, 6; s1(t) = (l/100) sin πt; si(t) = 0, i = 2, 6; the coordinates of the points A0i, Ai are given in Table 5.6. We also take X°O = Y°O = Z°O = 0 m, ψ° = θ° = φ° = 0 rad, ε = 10⁻⁶, Δt = 0.05 s.
Solution: 1. Theory 1.1. Notations We denote by
• Xi, Yi, Zi — the coordinates of the points Ai, i = 1, 6, in the system O0XYZ;
• {Ri}, {RO}, {ri}, i = 1, 6 — the column matrices defined by the relations {Ri} = [Xi Yi Zi]^T, {RO} = [XO YO ZO]^T, {ri} = [xi yi zi]^T; (5.126)
  • 299. APPLICATIONS 291 TABLE 5.5 Results of the Simulation φ1[ ◦ ] φ2[ ◦ ] φ3[ ◦ ] φ4[ ◦ ] φ5[ ◦ ] 0.000000 60.000000 −60.000000 0.000000 −90.000000 10.000000 56.481055 −63.073225 −1.425165 −93.194507 20.000000 52.744084 −65.497920 −2.303616 −95.958155 30.000000 49.001803 −67.131160 −2.561647 −98.101199 40.000000 45.423080 −67.906509 −2.163048 −99.505742 50.000000 42.122571 −67.829804 −1.112080 −100.129729 60.000000 39.165924 −66.961696 0.551096 −99.994181 70.000000 36.582735 −65.396930 2.758832 −99.163908 80.000000 34.380507 −63.247016 5.424664 −97.729573 90.000000 32.556126 −60.628613 8.450173 −95.794641 100.000000 31.103960 −57.657128 11.729479 −93.467451 110.000000 30.020982 −54.444041 15.151752 −90.857173 120.000000 29.309571 −51.096361 18.602544 −88.072056 130.000000 28.978502 −47.716900 21.964860 −85.218490 140.000000 29.042278 −44.404335 25.120966 −82.399651 150.000000 29.518760 −41.252238 27.955835 −79.712771 160.000000 30.425042 −38.346622 30.362862 −77.244546 170.000000 31.771829 −35.762068 32.251753 −75.065015 180.000000 33.557310 −33.557310 33.557310 −73.221345 190.000000 35.762068 −31.771829 34.246675 −71.733836 200.000000 38.346622 −30.425042 34.322340 −70.596343 210.000000 41.252238 −29.518760 33.819449 −69.781847 220.000000 44.404335 −29.042278 32.798103 −69.251836 230.000000 47.716900 −28.978502 31.333094 −68.966865 240.000000 51.096361 −29.309571 29.503898 −68.895743 250.000000 54.444041 −30.020982 27.386860 −69.021894 260.000000 57.657128 −31.103960 25.050335 −69.346594 270.000000 60.628613 −32.556126 22.552622 −69.889466 280.000000 63.247016 −34.380507 19.942121 −70.686657 290.000000 65.396930 −36.582735 17.259152 −71.786823 300.000000 66.961696 −39.165924 14.539060 −73.244478 310.000000 67.829804 −42.122571 11.816457 −75.109848 320.000000 67.906509 −45.423080 9.130485 −77.414403 330.000000 67.131160 −49.001803 6.530785 −80.152359 340.000000 65.497920 −52.744084 4.083089 −83.261374 350.000000 63.073225 −56.481055 1.872281 −86.609758 360.000000 60.000000 −60.000000 0.000000 −90.000000 • [ψ], [θ], [φ]—rotation matrices [ψ] =    1 0 0 0 cos ψ − sin ψ 0 sin ψ cos ψ    , [θ] =    cos θ 0 sin θ 0 1 0 − sin θ 0 cos θ    , [φ] =    cos φ − sin φ 0 sin φ cos φ 0 0 0 1    ; (5.127)
• $[U_\psi]$, $[U_\theta]$, $[U_\varphi]$, matrices given by the relations
    $[U_\psi] = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{bmatrix}$, $[U_\theta] = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{bmatrix}$, $[U_\varphi] = \begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$;   (5.128)
• $[A]$, the rotation matrix
    $[A] = [\psi][\theta][\varphi]$;   (5.129)
• $[A_\psi]$, $[A_\theta]$, $[A_\varphi]$, the partial derivatives of the rotation matrix, which are written in the form
    $[A_\psi] = [U_\psi][A]$, $[A_\theta] = [A][\varphi]^T[U_\theta][\varphi]$, $[A_\varphi] = [A][U_\varphi]$;   (5.130)
• $f_i$, $i = \overline{1, 6}$, functions of the variables $X_O$, $Y_O$, $Z_O$, $\psi$, $\theta$, $\varphi$, defined by the relations
    $f_i = [\{R_i\}^T - \{R_{0i}\}^T]\,[\{R_i\} - \{R_{0i}\}] - (l_{0i} + s_i)^2$, $i = \overline{1, 6}$;   (5.131)
• $\{f\}$, the column matrix
    $\{f\} = \begin{bmatrix} f_1 & f_2 & f_3 & f_4 & f_5 & f_6 \end{bmatrix}^T$;   (5.132)
• $\{q\}$, $\{\Delta q\}$, the column matrices
    $\{q\} = \begin{bmatrix} X_O & Y_O & Z_O & \psi & \theta & \varphi \end{bmatrix}^T$, $\{\Delta q\} = \begin{bmatrix} \Delta X_O & \Delta Y_O & \Delta Z_O & \Delta\psi & \Delta\theta & \Delta\varphi \end{bmatrix}^T$;   (5.133)
• $[B_i]$, the matrix given by the relation
    $[B_i] = \begin{bmatrix} [A_\psi]\{r_i\} & [A_\theta]\{r_i\} & [A_\varphi]\{r_i\} \end{bmatrix}$, $i = \overline{1, 6}$.   (5.134)

1.2. Computation relations
The column matrices $\{R_i\}$, $\{R_O\}$, $\{r_i\}$ are related by
    $\{R_i\} = \{R_O\} + [A]\{r_i\}$, $i = \overline{1, 6}$.   (5.135)
The conditions
    $(A_{0i}A_i)^2 = (l_{0i} + s_i)^2$, $i = \overline{1, 6}$,   (5.136)
are transcribed into the system of nonlinear equations
    $f_i = 0$, $i = \overline{1, 6}$,   (5.137)
the solution of which leads to the equation
    $[J]\{\Delta q\} = -\{f\}$,   (5.138)
[Figure 5.8  Numerical application.]

TABLE 5.6  Coordinates of the Points $A_{0i}$, $A_i$, $i = \overline{1, 6}$

  i    X_0i    Y_0i    Z_0i    x_i    y_i    z_i
  1    2l      0       0       l      0      0
  2    2l      0       l/2     l      0      l/2
  3    0       2l      0       0      l      0
  4    l       2l      0       l      l      0
  5    0       0       3l/2    0      0      l/2
  6    0       l       3l/2    0      l      l/2

$[J]$ being the Jacobian of the system, which, with the given notations, reads
    $[J] = 2\begin{bmatrix} [\{R_1\}^T - \{R_{01}\}^T]\begin{bmatrix} [I] & [B_1] \end{bmatrix} \\ [\{R_2\}^T - \{R_{02}\}^T]\begin{bmatrix} [I] & [B_2] \end{bmatrix} \\ \cdots \\ [\{R_6\}^T - \{R_{06}\}^T]\begin{bmatrix} [I] & [B_6] \end{bmatrix} \end{bmatrix}$.   (5.139)

We calculate successively
• the values of the functions $s_i$;
• the matrices $[\psi]$, $[\theta]$, $[\varphi]$, $[A]$, $[A_\psi]$, $[A_\theta]$, $[A_\varphi]$;
• the matrices $\{R_i\}$;
• the values of the functions $f_i$, $i = \overline{1, 6}$, and the column matrix $\{f\}$;
• the matrices $[B_i]$, $i = \overline{1, 6}$;
• the Jacobian $[J]$;
• the column matrix $\{\Delta q\}$;
• the column matrix $\{q\}$, which becomes $\{q\} + \{\Delta q\}$;
• in a cyclic manner, until $|\Delta q_i| < \varepsilon$, $i = \overline{1, 6}$;
• the parameter $t$ then becomes $t + \Delta t$, and the calculation is repeated, the starting approximation of the matrix $\{q\}$ being the one obtained at the previous step. A minimal sketch of this iteration in code is given after Figure 5.10.

2. Numerical calculation
The motion is periodic with the period $T = 2\pi/\pi = 2$ s, and the results are plotted in Figure 5.9, Figure 5.10, Figure 5.11, Figure 5.12, Figure 5.13, and Figure 5.14.

[Figure 5.9  Time history $X_O(t)$.]

[Figure 5.10  Time history $Y_O(t)$.]
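The cyclic scheme above translates almost directly into code. The following Python sketch is ours, not the authors' program: it uses the data of the numerical application (Table 5.6) and, for brevity, replaces the analytic Jacobian (5.139) by a forward-difference approximation; all variable and function names are assumptions made for illustration.

import numpy as np

# Geometry of the numerical application (Table 5.6), with l = 1 m.
l = 1.0
A0 = l * np.array([[2, 0, 0], [2, 0, 0.5], [0, 2, 0],
                   [1, 2, 0], [0, 0, 1.5], [0, 1, 1.5]])   # points A_0i
r_ = l * np.array([[1, 0, 0], [1, 0, 0.5], [0, 1, 0],
                   [1, 1, 0], [0, 0, 0.5], [0, 1, 0.5]])   # points A_i
l0 = np.full(6, l)

def s(t):                        # leg elongations s_i(t)
    out = np.zeros(6)
    out[0] = (l / 100.0) * np.sin(np.pi * t)
    return out

def rot(psi, theta, phi):        # [A] = [psi][theta][phi], relations (5.127), (5.129)
    Rpsi = np.array([[1, 0, 0],
                     [0, np.cos(psi), -np.sin(psi)],
                     [0, np.sin(psi), np.cos(psi)]])
    Rth = np.array([[np.cos(theta), 0, np.sin(theta)],
                    [0, 1, 0],
                    [-np.sin(theta), 0, np.cos(theta)]])
    Rphi = np.array([[np.cos(phi), -np.sin(phi), 0],
                     [np.sin(phi), np.cos(phi), 0],
                     [0, 0, 1]])
    return Rpsi @ Rth @ Rphi

def f(q, t):                     # relations (5.131) and (5.135)
    RO, A = q[:3], rot(*q[3:])
    d = RO + r_ @ A.T - A0       # rows are {R_i} - {R_0i}
    return np.einsum('ij,ij->i', d, d) - (l0 + s(t)) ** 2

def jac(q, t, h=1e-8):           # forward-difference stand-in for (5.139)
    J = np.empty((6, 6))
    f0 = f(q, t)
    for k in range(6):
        dq = np.zeros(6)
        dq[k] = h
        J[:, k] = (f(q + dq, t) - f0) / h
    return J

q, eps, dt = np.zeros(6), 1e-6, 0.05
for step in range(1, int(4.0 / dt) + 1):   # two periods, T = 2 s
    t = step * dt
    while True:                            # Newton iteration (5.138)
        dq = np.linalg.solve(jac(q, t), -f(q, t))
        q += dq
        if np.abs(dq).max() < eps:
            break
    print(f"t = {t:4.2f}  XO = {q[0]: .6f} m")

Each converged $\{q\}$ is reused as the starting value at the next time step, exactly as the algorithm prescribes.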
[Figure 5.11  Time history $Z_O(t)$.]

[Figure 5.12  Time history $\psi(t)$.]

Problem 5.3
Let us consider the planetary gear in Figure 5.15, with an angular axial tripod coupling to the gear box and an angular coupling to the wheel in the ball joint $C$. The motion is transmitted from the tulip axle (body 1) by contacts between the ramps $B_iA_i$, $i = \overline{1, 3}$, of the tulip, which are symmetric and parallel to the rotation axis, and the arms $O_2A_i$, $i = \overline{1, 3}$, of the tripod, which are axisymmetric and normal to the axle $O_2C$.
[Figure 5.13  Time history $\theta(t)$.]

[Figure 5.14  Time history $\varphi(t)$.]

On the rotation axis of the tulip, we consider the point $O_0$, chosen so that $O_2C = O_0C = l$. The fixed reference system $O_0x_0y_0z_0$ is chosen so that the $O_0z_0$-axis coincides with the rotation axis; likewise, we choose the mobile reference system $O_0x_1y_1z_1$, rigidly linked to the tulip, so that the $O_0z_1$-axis coincides with the $O_0z_0$-axis, while the $O_0x_1$-axis is parallel to $O^*C_1$ and intersects the ramp $B_1A_1$.
[Figure 5.15  Problem 5.3.]

We denote by $\theta$ the rotation angle of the tulip (the angle between the axes $O_0x_0$ and $O_0x_1$); knowing the distances $O^*B_1 = O^*B_2 = O^*B_3 = r$, the angle $\alpha$ (the angle between the $O_0z_0$-axis and the line $O_2C$), the length $l$, and the coordinates $X_C$, $Y_C$ of the point $C$ in the system $O_0x_0y_0z_0$, determine
• the variation of the angle $\gamma$ (the angle between $O_2C$ and the $O_0z_0$-axis) as a function of the angle $\theta$;
• the variation of the coordinates $\xi$, $\eta$, $\zeta$ of the point $O_2$ in the reference system $O_0x_1y_1z_1$;
• the variation of the coordinates $\xi_0$, $\eta_0$, $\zeta_0$ of the point $O_2$ in the reference system $O_0x_0y_0z_0$ as a function of the angle $\theta$;
• the projections of the trajectory of the point $O_2$ on each of the planes $O_0x_1y_1$ and $O_0x_0y_0$.

Numerical application: $r = 0.04$ m, $l = 0.2$ m, $\alpha = 30^\circ$, $X_C = 0$ m, $Y_C = -0.1$ m.

Solution:
1. Theory
We choose the reference system $O_2x_2y_2z_2$ so that the $O_2x_2$-axis coincides with the straight line $O_2A_1$, while the $O_2z_2$-axis coincides with the straight line $O_2C$; denoting by $x_{1i}$, $y_{1i}$, $z_{1i}$ and $x_{2i}$, $y_{2i}$, $z_{2i}$ the coordinates of the points $A_i$, $i = \overline{1, 3}$, in the systems $O_0x_1y_1z_1$ and $O_2x_2y_2z_2$, respectively, we write the relations
    $\begin{Bmatrix} x_{1i} \\ y_{1i} \\ z_{1i} \end{Bmatrix} = \begin{Bmatrix} \xi \\ \eta \\ \zeta \end{Bmatrix} + [A_{21}]\begin{Bmatrix} x_{2i} \\ y_{2i} \\ z_{2i} \end{Bmatrix}$, $i = \overline{1, 3}$,   (5.140)
where $[A_{21}]$ is the rotation matrix of the system $O_2x_2y_2z_2$ with respect to the system $O_0x_1y_1z_1$,
    $[A_{21}] = \begin{bmatrix} \alpha_1 & \alpha_2 & \alpha_3 \\ \beta_1 & \beta_2 & \beta_3 \\ \gamma_1 & \gamma_2 & \gamma_3 \end{bmatrix}$.   (5.141)
Taking into account the relations
    $\begin{Bmatrix} x_{1i} \\ y_{1i} \\ z_{1i} \end{Bmatrix} = \begin{Bmatrix} r\cos\delta_i \\ r\sin\delta_i \\ z_{1i} \end{Bmatrix}$, $\begin{Bmatrix} x_{2i} \\ y_{2i} \\ z_{2i} \end{Bmatrix} = \begin{Bmatrix} \mu_i\cos\delta_i \\ \mu_i\sin\delta_i \\ 0 \end{Bmatrix}$,   (5.142)
where
    $\delta_i = \dfrac{2}{3}(i - 1)\pi$, $\mu_i = O_2A_i$, $i = \overline{1, 3}$,   (5.143)
from equation (5.140) we obtain the relations
    $r\cos\delta_i = \xi + \mu_i(\alpha_1\cos\delta_i + \alpha_2\sin\delta_i)$,   (5.144)
    $r\sin\delta_i = \eta + \mu_i(\beta_1\cos\delta_i + \beta_2\sin\delta_i)$,   (5.145)
    $z_{1i} = \zeta + \mu_i(\gamma_1\cos\delta_i + \gamma_2\sin\delta_i)$.   (5.146)
By eliminating the parameter $\mu_i$ between equation (5.144) and equation (5.145), we obtain
    $\xi(\beta_1\cos\delta_i + \beta_2\sin\delta_i) - \eta(\alpha_1\cos\delta_i + \alpha_2\sin\delta_i) = \dfrac{r}{2}[(\beta_1 - \alpha_2) + (\beta_1 + \alpha_2)\cos 2\delta_i + (\beta_2 - \alpha_1)\sin 2\delta_i]$, $i = \overline{1, 3}$,   (5.147)
and, taking into account the equalities
    $\sum_{i=1}^{3}\sin\delta_i = \sum_{i=1}^{3}\cos\delta_i = \sum_{i=1}^{3}\sin 2\delta_i = \sum_{i=1}^{3}\cos 2\delta_i = 0$,   (5.148)
by summation of relation (5.147) we obtain the condition
    $\alpha_2 = \beta_1$;   (5.149)
by adding and subtracting relation (5.147) for $i = 2, 3$, we obtain the system
    $\xi\beta_1 - \eta\alpha_1 = r\beta_1$, $\xi\beta_2 - \eta\alpha_2 = \dfrac{r}{2}(\alpha_1 - \beta_2)$,   (5.150)
from which we obtain the unknowns
    $\xi = \dfrac{r}{2\gamma_3}[-2\beta_1^2 + \alpha_1(\alpha_1 - \beta_2)]$, $\eta = \dfrac{r\beta_1}{2\gamma_3}(\alpha_1 - 3\beta_2)$.   (5.151)
By means of Euler's angles $\psi$, $\gamma$, $\varphi$, condition (5.149) becomes $\psi = -\varphi$, and the rotation matrix takes the form
    $[A_{21}] = \begin{bmatrix} \cos^2\varphi + \sin^2\varphi\cos\gamma & -\sin\varphi\cos\varphi(1 - \cos\gamma) & -\sin\varphi\sin\gamma \\ -\sin\varphi\cos\varphi(1 - \cos\gamma) & \sin^2\varphi + \cos^2\varphi\cos\gamma & -\cos\varphi\sin\gamma \\ \sin\varphi\sin\gamma & \cos\varphi\sin\gamma & \cos\gamma \end{bmatrix}$,   (5.152)
while the coordinates $\xi$, $\eta$ are given by
    $\xi = \dfrac{r(1 - \cos\gamma)}{2\cos\gamma}(\cos 3\varphi\cos\varphi + \cos\gamma\sin 3\varphi\sin\varphi)$,   (5.153)
    $\eta = \dfrac{r(1 - \cos\gamma)}{2\cos\gamma}(-\cos 3\varphi\sin\varphi + \cos\gamma\sin 3\varphi\cos\varphi)$.   (5.154)
Starting from the vector relation
    $\overrightarrow{O_0O_2} + \overrightarrow{O_2C} = \overrightarrow{O_0C}$,   (5.155)
denoting by $[\theta]$ the rotation matrix from the system $O_0x_0y_0z_0$ to the system $O_0x_1y_1z_1$,
    $[\theta] = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}$,   (5.156)
and denoting by $\beta$ the angle defined by the relations
    $\cos\beta = \dfrac{X_C}{\sqrt{X_C^2 + Y_C^2}}$, $\sin\beta = \dfrac{Y_C}{\sqrt{X_C^2 + Y_C^2}}$,   (5.157)
we obtain the matrix equation
    $\begin{Bmatrix} \xi \\ \eta \\ \zeta \end{Bmatrix} + [A_{21}]\begin{Bmatrix} 0 \\ 0 \\ l \end{Bmatrix} = [\theta]^T\begin{Bmatrix} l\sin\alpha\cos\beta \\ l\sin\alpha\sin\beta \\ l\cos\alpha \end{Bmatrix}$,   (5.158)
from which the scalar relations
    $\dfrac{r(1 - \cos\gamma)}{2\cos\gamma}(\cos 3\varphi\cos\varphi + \cos\gamma\sin 3\varphi\sin\varphi) - l\sin\gamma\sin\varphi = l\sin\alpha\cos(\theta - \beta)$,   (5.159)
    $\dfrac{r(1 - \cos\gamma)}{2\cos\gamma}(-\cos 3\varphi\sin\varphi + \cos\gamma\sin 3\varphi\cos\varphi) - l\sin\gamma\cos\varphi = l\sin\alpha\sin(\beta - \theta)$,   (5.160)
    $\zeta + l\cos\gamma = l\cos\alpha$   (5.161)
are obtained. Multiplying relations (5.159) and (5.160) by $\sin\varphi$ and $\cos\varphi$, respectively, and summing, then by $\cos\varphi$ and $-\sin\varphi$ and summing again, and using the notation
    $\lambda = \dfrac{r}{2l}$,   (5.162)
we obtain the equations
    $f_1(\varphi, \gamma) = \lambda(1 - \cos\gamma)\sin 3\varphi - \sin\gamma - \sin\alpha\sin(\varphi - \theta + \beta) = 0$,   (5.163)
    $f_2(\varphi, \gamma) = \lambda(1 - \cos\gamma)\cos 3\varphi - \sin\alpha\cos\gamma\cos(\varphi - \theta + \beta) = 0$,   (5.164)
the solution of which yields $\varphi(\theta)$ and $\gamma(\theta)$.

2. Numerical calculation
For $\theta = 0$ we obtain the approximate starting values $\gamma = \alpha$, $\varphi = 0$, and because $\beta = 3\pi/2$, from equation (5.163) and equation (5.164) we obtain, by the Newton–Raphson method, the results plotted in the diagrams in Figure 5.16 and Figure 5.17; then, from relations (5.153), (5.154), and (5.161), we obtain the results plotted in Figure 5.18, Figure 5.19, Figure 5.20, and Figure 5.21.
[Figure 5.16  Time history $\varphi = \varphi(\theta)$.]

[Figure 5.17  Time history $\gamma = \gamma(\theta)$.]

To calculate $\Delta\varphi$ and $\Delta\gamma$, we have taken into account that
    $\begin{Bmatrix} \Delta\varphi \\ \Delta\gamma \end{Bmatrix} = -\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}^{-1}\begin{Bmatrix} f_1 \\ f_2 \end{Bmatrix}$,   (5.165)
[Figure 5.18  Time history $\xi = \xi(\theta)$.]

[Figure 5.19  Time history $\eta = \eta(\theta)$.]

where
    $A_{11} = 3\lambda(1 - \cos\gamma)\cos 3\varphi - \sin\alpha\cos(\varphi - \theta + \beta)$,
    $A_{12} = \lambda\sin\gamma\sin 3\varphi - \cos\gamma$,
    $A_{21} = -3\lambda(1 - \cos\gamma)\sin 3\varphi + \sin\alpha\cos\gamma\sin(\varphi - \theta + \beta)$,
    $A_{22} = \lambda\sin\gamma\cos 3\varphi + \sin\alpha\sin\gamma\cos(\varphi - \theta + \beta)$.   (5.166)
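A compact Python sketch of this Newton–Raphson sweep (our transcription, with assumed names; the authors' own program is not reproduced here):

import numpy as np

r, l, alpha = 0.04, 0.2, np.radians(30.0)   # numerical application data
beta = 3 * np.pi / 2                        # from X_C = 0, Y_C = -0.1 m, relations (5.157)
lam = r / (2 * l)                           # notation (5.162)

def F(phi, gma, th):                        # relations (5.163) and (5.164)
    u = phi - th + beta
    return np.array([
        lam * (1 - np.cos(gma)) * np.sin(3 * phi) - np.sin(gma)
        - np.sin(alpha) * np.sin(u),
        lam * (1 - np.cos(gma)) * np.cos(3 * phi)
        - np.sin(alpha) * np.cos(gma) * np.cos(u),
    ])

def J(phi, gma, th):                        # entries A_11, ..., A_22, relations (5.166)
    u = phi - th + beta
    return np.array([
        [3 * lam * (1 - np.cos(gma)) * np.cos(3 * phi) - np.sin(alpha) * np.cos(u),
         lam * np.sin(gma) * np.sin(3 * phi) - np.cos(gma)],
        [-3 * lam * (1 - np.cos(gma)) * np.sin(3 * phi)
         + np.sin(alpha) * np.cos(gma) * np.sin(u),
         lam * np.sin(gma) * np.cos(3 * phi)
         + np.sin(alpha) * np.sin(gma) * np.cos(u)],
    ])

phi, gma = 0.0, alpha                       # approximate starting values for theta = 0
for th_deg in range(0, 361):
    th = np.radians(th_deg)
    for _ in range(50):                     # correction (5.165)
        d = -np.linalg.solve(J(phi, gma, th), F(phi, gma, th))
        phi += d[0]
        gma += d[1]
        if np.abs(d).max() < 1e-12:
            break
    # coordinates of O_2, relations (5.153), (5.154), and (5.161)
    c = r * (1 - np.cos(gma)) / (2 * np.cos(gma))
    xi = c * (np.cos(3 * phi) * np.cos(phi) + np.cos(gma) * np.sin(3 * phi) * np.sin(phi))
    eta = c * (-np.cos(3 * phi) * np.sin(phi) + np.cos(gma) * np.sin(3 * phi) * np.cos(phi))
    zeta = l * (np.cos(alpha) - np.cos(gma))

Each converged pair $(\varphi, \gamma)$ serves as the starting value for the next $\theta$, which keeps the iteration on the same solution branch.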
[Figure 5.20  Time history $\zeta = \zeta(\theta)$.]

[Figure 5.21  Variation $\eta = \eta(\xi)$.]

For the diagrams $\xi_0(\theta)$, $\eta_0(\theta)$, $\zeta_0(\theta)$ we take into account the relations
    $\xi_0 = \xi\cos\theta - \eta\sin\theta$, $\eta_0 = \xi\sin\theta + \eta\cos\theta$, $\zeta_0 = l(\cos\alpha - \cos\gamma)$;   (5.167)
the diagrams in Figure 5.22, Figure 5.23, Figure 5.24, and Figure 5.25 are thus obtained.
[Figure 5.22  Time history $\xi_0 = \xi_0(\theta)$.]

[Figure 5.23  Time history $\eta_0 = \eta_0(\theta)$.]
[Figure 5.24  Time history $\zeta_0 = \zeta_0(\theta)$.]

[Figure 5.25  Variation $\eta_0 = \eta_0(\xi_0)$.]

FURTHER READING

Acton FS (1990). Numerical Methods that Work. 4th ed. Washington: Mathematical Association of America.
Ackleh AS, Allen EJ, Hearfott RB, Seshaiyer P (2009). Classical and Modern Numerical Analysis: Theory, Methods and Practice. Boca Raton: CRC Press.
Atkinson KE (1989). An Introduction to Numerical Analysis. 2nd ed. New York: John Wiley & Sons, Inc.
Atkinson K, Han W (2010). Theoretical Numerical Analysis: A Functional Analysis Framework. 3rd ed. New York: Springer-Verlag.
Bakhvalov N (1976). Méthodes Numériques. Moscou: Éditions Mir (in French).
Berbente C, Mitran S, Zancu S (1997). Metode Numerice. București: Editura Tehnică (in Romanian).
Burden RL, Faires L (2009). Numerical Analysis. 9th ed. Boston: Brooks/Cole.
Butt R (2009). Introduction to Numerical Analysis Using MATLAB. Boston: Jones and Bartlett Publishers.
Chapra SC (1996). Applied Numerical Methods with MATLAB for Engineers and Scientists. Boston: McGraw-Hill.
Cheney EW, Kincaid DR (1997). Numerical Mathematics and Computing. 6th ed. Belmont: Thomson.
Cira O, Măruşter Ş (2008). Metode Numerice pentru Ecuații Neliniare. București: Editura Matrix Rom (in Romanian).
Dahlquist G, Björck Å (1974). Numerical Methods. Englewood Cliffs: Prentice Hall.
Démidovitch B, Maron I (1973). Éléments de Calcul Numérique. Moscou: Éditions Mir (in French).
Dennis JE Jr, Schnabel RB (1987). Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Philadelphia: SIAM.
DiBenedetto E (2010). Classical Mechanics: Theory and Mathematical Modeling. New York: Springer-Verlag.
Fung YC, Tong P (2011). Classical and Computational Solid Mechanics. Singapore: World Scientific Publishing.
Gautschi W (1997). Numerical Analysis: An Introduction. Boston: Birkhäuser.
Greenbaum A, Chartier TP (2012). Numerical Methods: Design, Analysis, and Computer Implementation of Algorithms. Princeton: Princeton University Press.
Hamming RW (1987). Numerical Methods for Scientists and Engineers. 2nd ed. New York: Dover Publications.
Heinbockel JH (2006). Numerical Methods for Scientific Computing. Victoria: Trafford Publishing.
Hildebrand FB (1987). Introduction to Numerical Analysis. 2nd ed. New York: Dover Publications.
Hoffman JD (1992). Numerical Methods for Engineers and Scientists. New York: McGraw-Hill.
Kharab A, Guenther RB (2011). An Introduction to Numerical Methods: A MATLAB Approach. 3rd ed. Boca Raton: CRC Press.
Kleppner D, Kolenkow RJ (2010). An Introduction to Mechanics. Cambridge: Cambridge University Press.
Kress R (1996). Numerical Analysis. New York: Springer-Verlag.
Krîlov AN (1957). Lecții de Calcule prin Aproximații. București: Editura Tehnică (in Romanian).
Kunz KS (1957). Numerical Analysis. New York: McGraw-Hill.
Lurie AI (2002). Analytical Mechanics. New York: Springer-Verlag.
Mabie HH, Reinholtz CF (1987). Mechanisms and Dynamics of Machinery. 4th ed. New York: John Wiley & Sons, Inc.
Meriam JL, Kraige LG (2012). Engineering Mechanics: Dynamics. Hoboken: John Wiley & Sons, Inc.
Pandrea N (2000). Elemente de Mecanica Solidului în Coordonate Plückeriene. București: Editura Academiei Române (in Romanian).
Pandrea N, Pârlac S, Popa D (2001). Modele pentru Studiul Vibrațiilor Automobilelor. Pitești: Tiparg (in Romanian).
Pandrea N, Popa D (2000). Mecanisme. Teorie și Aplicații CAD. București: Editura Tehnică (in Romanian).
Pandrea N, Stănescu ND (2002). Mecanică. București: Editura Didactică și Pedagogică (in Romanian).
Popovici P, Cira O (1992). Rezolvarea Numerică a Ecuațiilor Neliniare. Timișoara: Editura Signata (in Romanian).
Postolache M (2006). Modelare Numerică. Teorie și Aplicații. București: Editura Fair Partners (in Romanian).
Press WH, Teukolsky SA, Vetterling WT, Flannery BP (2007). Numerical Recipes: The Art of Scientific Computing. 3rd ed. Cambridge: Cambridge University Press.
Quarteroni A, Sacco R, Saleri F (2010). Numerical Mathematics. 2nd ed. Berlin: Springer-Verlag.
Ralston A, Rabinowitz P (2001). A First Course in Numerical Analysis. 2nd ed. New York: Dover Publications.
Ridgway Scott L (2011). Numerical Analysis. Princeton: Princeton University Press.
Salvadori MG, Baron ML (1962). Numerical Methods in Engineering. Englewood Cliffs: Prentice-Hall Inc.
Sauer T (2011). Numerical Analysis. 2nd ed. London: Pearson.
Simionescu I, Dranga M, Moise V (1995). Metode Numerice în Tehnică. Aplicații în FORTRAN. București: Editura Tehnică (in Romanian).
Stănescu ND (2007). Metode Numerice. București: Editura Didactică și Pedagogică (in Romanian).
Süli E, Mayers D (2003). An Introduction to Numerical Analysis. Cambridge: Cambridge University Press.
Udriște C, Iftode V, Postolache M (1996). Metode Numerice de Calcul. Algoritmi și Programe Turbo Pascal. București: Editura Tehnică (in Romanian).
6
INTERPOLATION AND APPROXIMATION OF FUNCTIONS

6.1 LAGRANGE'S INTERPOLATION POLYNOMIAL

Definition 6.1  Let $[a, b]$, $-\infty < a < b < \infty$, be an interval of the real axis and $x_0, x_1, \ldots, x_n$, $n + 1$ points of the segment $[a, b]$, with
    $a \le x_0 < x_1 < x_2 < \cdots < x_{n-1} < x_n = b$.   (6.1)
The points $x_i$, $i = \overline{0, n}$, are called interpolation knots.

Let us consider a function $f : [a, b] \to \mathbb{R}$ for which we know the values
    $y_i = f(x_i)$, $i = \overline{0, n}$.   (6.2)
We wish to construct a polynomial¹ function $L(x)$ whose values at the interpolation knots $x_i$, $i = \overline{0, n}$, coincide with the values of the function $f$ at the very same points, that is,
    $y_i = L(x_i)$, $i = \overline{0, n}$.   (6.3)

Theorem 6.1  Let $f : [a, b] \to \mathbb{R}$, the interpolation knots $x_i$, $i = \overline{0, n}$, and the values of the function $f$ at the points $x_i$, that is, $y_i = f(x_i)$, $i = \overline{0, n}$. Under these conditions, there exists a polynomial $L_n(x)$ of degree $n$ at the most, which is unique and whose values coincide with the values of the function $f$ at the interpolation knots.

¹The polynomial was discovered by Edward Waring (circa 1736–1798) in 1779, then by Leonhard Euler (1707–1783) in 1783, and published by Joseph Louis Lagrange (1736–1813) in 1795.
Demonstration.  Let us consider a polynomial $\psi_i(x)$ with the property
    $\psi_i(x_j) = \delta_{ij}$,   (6.4)
where $\delta_{ij}$ is Kronecker's symbol,
    $\delta_{ij} = \begin{cases} 1 & \text{for } i = j, \\ 0 & \text{for } i \ne j. \end{cases}$   (6.5)
It follows that the polynomial $\psi_i(x)$ may be written in the form
    $\psi_i(x) = C_i(x - x_0)(x - x_1)\cdots(x - x_{i-1})(x - x_{i+1})\cdots(x - x_n)$,   (6.6)
where $C_i$ is given by the condition
    $\psi_i(x_i) = C_i(x_i - x_0)\cdots(x_i - x_{i-1})(x_i - x_{i+1})\cdots(x_i - x_n) = 1$.   (6.7)
We obtain
    $C_i = \dfrac{1}{(x_i - x_0)\cdots(x_i - x_{i-1})(x_i - x_{i+1})\cdots(x_i - x_n)}$,   (6.8)
hence
    $\psi_i(x) = \dfrac{(x - x_0)\cdots(x - x_{i-1})(x - x_{i+1})\cdots(x - x_n)}{(x_i - x_0)\cdots(x_i - x_{i-1})(x_i - x_{i+1})\cdots(x_i - x_n)}$.   (6.9)
Let us construct the polynomial $L_n(x)$ in the form
    $L_n(x) = \sum_{i=0}^{n}\psi_i(x)y_i$.   (6.10)
We have
    $L_n(x_j) = \sum_{i=0}^{n}\psi_i(x_j)y_i = \psi_j(x_j)y_j = y_j$.   (6.11)
Because $\psi_i(x)$, $i = \overline{0, n}$, are polynomials of $n$th degree, it follows that $L_n(x)$ has a degree $n$ at the most. Formula (6.10) may also be written in the form
    $L_n(x) = \sum_{i=0}^{n}\dfrac{(x - x_0)\cdots(x - x_{i-1})(x - x_{i+1})\cdots(x - x_n)}{(x_i - x_0)\cdots(x_i - x_{i-1})(x_i - x_{i+1})\cdots(x_i - x_n)}\,y_i$.   (6.12)
We will show that $L_n(x)$ is unique. Let us suppose that there exists another polynomial $\Lambda_n(x)$ such that $\Lambda_n(x_i) = y_i$, the degree of $\Lambda_n(x)$ being $n$ at the most. Let us consider the polynomial
    $D_n(x) = L_n(x) - \Lambda_n(x)$,   (6.13)
which is of degree $n$ at the most (as a difference of two polynomials of degrees equal to $n$ at the most), and let us observe that
    $D_n(x_i) = 0$, $i = \overline{0, n}$.   (6.14)
It follows that the polynomial $D_n(x)$ of degree $n$ at the most has at least $n + 1$ real roots, $x_0, x_1, \ldots, x_n$. Hence the polynomial $D_n(x)$ vanishes identically, so that
    $L_n(x) = \Lambda_n(x)$,   (6.15)
that is, $L_n(x)$ is unique.

Definition 6.2  The polynomial $L_n(x)$ given by formula (6.12) is called Lagrange's interpolation polynomial.

Observation 6.1  Let us denote by $P_{n+1}(x)$ the polynomial
    $P_{n+1}(x) = \prod_{i=0}^{n}(x - x_i)$.   (6.16)
Under these conditions, we have
    $L_n(x) = P_{n+1}(x)\sum_{i=0}^{n}\dfrac{y_i}{(x - x_i)P'_{n+1}(x_i)}$.   (6.17)

Demonstration.  We may successively write
    $L_n(x) = \sum_{i=0}^{n}\dfrac{(x - x_0)\cdots(x - x_{i-1})(x - x_{i+1})\cdots(x - x_n)}{(x_i - x_0)\cdots(x_i - x_{i-1})(x_i - x_{i+1})\cdots(x_i - x_n)}\,\dfrac{x - x_i}{x - x_i}\,y_i = P_{n+1}(x)\sum_{i=0}^{n}\dfrac{y_i}{x - x_i}\,\dfrac{1}{(x_i - x_0)\cdots(x_i - x_{i-1})(x_i - x_{i+1})\cdots(x_i - x_n)}$.   (6.18)
On the other hand,
    $P'_{n+1}(x) = \sum_{i=0}^{n}(x - x_0)\cdots(x - x_{i-1})(x - x_{i+1})\cdots(x - x_n)$   (6.19)
and it follows that
    $P'_{n+1}(x_i) = (x_i - x_0)\cdots(x_i - x_{i-1})(x_i - x_{i+1})\cdots(x_i - x_n)$.   (6.20)
Formula (6.18), in which we replace relation (6.20), leads to relation (6.17), which had to be proved.
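As an illustration (ours, not the book's), formula (6.12) can be evaluated directly; the following Python sketch builds the fundamental polynomials $\psi_i$ of relation (6.9) on the fly:

import numpy as np

def lagrange(knots, values, x):
    # Evaluate L_n(x) = sum_i psi_i(x) * y_i, relations (6.9) and (6.10).
    total = 0.0
    for i, (xi, yi) in enumerate(zip(knots, values)):
        psi = 1.0
        for j, xj in enumerate(knots):
            if j != i:
                psi *= (x - xj) / (xi - xj)   # guarantees psi_i(x_j) = delta_ij
        total += psi * yi
    return total

# Interpolate f(x) = sin x on four knots; L_n reproduces the data exactly.
xk = np.array([0.0, 0.5, 1.0, 1.5])
yk = np.sin(xk)
assert all(abs(lagrange(xk, yk, xi) - yi) < 1e-12 for xi, yi in zip(xk, yk))
print(lagrange(xk, yk, 0.75), np.sin(0.75))   # agree to about 1e-3, cf. Theorem 6.2 below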
Observation 6.2  The polynomial $L_n(x)$ may also be written in the form
    $L_n(x) = a_nx^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0$   (6.21)
and condition (6.3) implies a system of $n + 1$ linear equations with $n + 1$ unknowns $a_0, a_1, \ldots, a_n$:
    $a_nx_0^n + a_{n-1}x_0^{n-1} + \cdots + a_1x_0 + a_0 = y_0$, $\ldots$, $a_nx_n^n + a_{n-1}x_n^{n-1} + \cdots + a_1x_n + a_0 = y_n$.   (6.22)
The determinant of the system matrix,
    $\Delta = \begin{vmatrix} x_0^n & x_0^{n-1} & \cdots & x_0 & 1 \\ x_1^n & x_1^{n-1} & \cdots & x_1 & 1 \\ \cdots & \cdots & \cdots & \cdots & \cdots \\ x_n^n & x_n^{n-1} & \cdots & x_n & 1 \end{vmatrix}$,   (6.23)
is of the Vandermonde type, its value,
    $\Delta = (x_0 - x_1)(x_0 - x_2)\cdots(x_0 - x_n)(x_1 - x_2)\cdots(x_1 - x_n)\cdots(x_{n-1} - x_n) = \prod_{i<j}(x_i - x_j)$,   (6.24)
does not vanish because $x_i \ne x_j$ for $i \ne j$. Hence, it follows that system (6.22) has a unique solution, so that Lagrange's polynomial does exist and is unique.

Theorem 6.2 (Evaluation of the Error for Lagrange's Polynomial)  Let $f : [a, b] \to \mathbb{R}$ be of class $C^{n+1}$ on $[a, b]$ and let us denote by $M$ the value
    $M = \sup_{x \in [a, b]}|f^{(n+1)}(x)|$.   (6.25)
Let $x_0, x_1, \ldots, x_n$ be the $n + 1$ interpolation knots on $[a, b]$ and let $y_i = f(x_i)$, $i = \overline{0, n}$. Lagrange's polynomial $L_n(x)$ satisfies $L_n(x_i) = y_i$, $i = \overline{0, n}$. Under these conditions, we have
    $|f(x) - L_n(x)| \le \dfrac{M}{(n + 1)!}|P_{n+1}(x)|$.   (6.26)

Demonstration.  Let us denote by $\theta : [a, b] \to \mathbb{R}$ the auxiliary function
    $\theta(x) = f(x) - L_n(x) - \lambda P_{n+1}(x)$, $\lambda \in \mathbb{R}$.   (6.27)
We observe that
    $\theta(x_i) = f(x_i) - L_n(x_i) - \lambda P_{n+1}(x_i) = 0$, $i = \overline{0, n}$;   (6.28)
hence, $\theta(x)$ has at least $n + 1$ roots in the interval $[a, b]$. Let us choose $\lambda \in \mathbb{R}$ so that $\theta(x)$ admits an $(n + 2)$th root on $[a, b]$ too, and let us denote this root by $\bar{x}$. In this case,
    $\theta(\bar{x}) = f(\bar{x}) - L_n(\bar{x}) - \lambda P_{n+1}(\bar{x}) = 0$;   (6.29)
hence
    $\lambda = \dfrac{f(\bar{x}) - L_n(\bar{x})}{P_{n+1}(\bar{x})}$.   (6.30)
Let us arrange the $n + 2$ roots of $\theta$ in increasing order. The intervals $[x_0, x_1], [x_1, x_2], \ldots, [x_j, \bar{x}], [\bar{x}, x_{j+1}], \ldots, [x_{n-1}, x_n]$ are thus obtained. The function $\theta(x)$ vanishes at the ends of each of these intervals for the value $\lambda$ given by relation (6.30). Applying Rolle's theorem on each of these intervals, it follows that the function $\theta'(x)$ has at least $n + 1$ distinct roots. Analogously, $\theta''(x)$ has at least $n$ distinct roots, $\ldots$, and the function $\theta^{(n+1)}(x)$ has at least one root $\zeta$; hence,
    $\theta^{(n+1)}(\zeta) = 0$.   (6.31)
Differentiating the function $\theta$ in relation (6.27) $n + 1$ times and taking into account that
    $L_n^{(n+1)}(x) = 0$, $P_{n+1}^{(n+1)}(x) = (n + 1)!$,   (6.32)
because $L_n(x)$ is a polynomial of degree $n$ at the most while $P_{n+1}(x)$ is a polynomial of $(n + 1)$th degree, we get
    $\theta^{(n+1)}(x) = f^{(n+1)}(x) - \lambda(n + 1)!$,   (6.33)
from which
    $f^{(n+1)}(\zeta) - \lambda(n + 1)! = 0$;   (6.34)
hence
    $\lambda = \dfrac{f^{(n+1)}(\zeta)}{(n + 1)!}$.   (6.35)
Equating relations (6.30) and (6.35), we get
    $f(\bar{x}) - L_n(\bar{x}) = \dfrac{f^{(n+1)}(\zeta)}{(n + 1)!}P_{n+1}(\bar{x})$   (6.36)
and, because $\bar{x}$ is arbitrary, it follows that
    $|f(x) - L_n(x)| = \dfrac{1}{(n + 1)!}|f^{(n+1)}(\zeta)|\,|P_{n+1}(x)|$.   (6.37)
Passing to the supremum over $\zeta$ on $[a, b]$, we obtain
    $|f(x) - L_n(x)| \le \dfrac{1}{(n + 1)!}\sup_{\zeta \in [a, b]}|f^{(n+1)}(\zeta)|\,|P_{n+1}(x)| = \dfrac{M}{(n + 1)!}|P_{n+1}(x)|$,   (6.38)
and the theorem is proved.

6.2 TAYLOR POLYNOMIALS

We recall a well-known theorem of analysis.

Theorem 6.3 (Taylor²)  Let us consider $f : I \to \mathbb{R}$, where $I$ is an interval of the real axis, and let $\bar{x}$ and $x$ be two elements of $I$. If $f$ is of class $C^{n+1}$ on $I$, then the relation
    $f(x) = f(\bar{x}) + \dfrac{(x - \bar{x})^1}{1!}f'(\bar{x}) + \cdots + \dfrac{(x - \bar{x})^n}{n!}f^{(n)}(\bar{x}) + \dfrac{(x - \bar{x})^{n+1}}{(n + 1)!}f^{(n+1)}(\zeta)$   (6.39)
holds, where $\zeta$ is a point between $\bar{x}$ and $x$.

Observation 6.3  Relation (6.39) leads to an approximate formula for the calculation of $f(x)$, that is,
    $f(x) \approx f(\bar{x}) + \dfrac{(x - \bar{x})^1}{1!}f'(\bar{x}) + \dfrac{(x - \bar{x})^2}{2!}f''(\bar{x}) + \cdots + \dfrac{(x - \bar{x})^n}{n!}f^{(n)}(\bar{x}) = \sum_{k=0}^{n}\dfrac{(x - \bar{x})^k}{k!}f^{(k)}(\bar{x})$.   (6.40)

²Brook Taylor (1685–1731) stated this theorem in 1712.
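A minimal numeric illustration of the approximation (6.40), our example, with $f = \exp$ so that every derivative is known exactly:

import math

def taylor(deriv, x_bar, x, n):
    # Partial sum (6.40): sum_{k=0}^{n} (x - x_bar)^k / k! * f^(k)(x_bar).
    return sum((x - x_bar) ** k / math.factorial(k) * deriv(k, x_bar)
               for k in range(n + 1))

exp_deriv = lambda k, x: math.exp(x)       # f^(k) = exp for every order k
for n in (1, 2, 4, 8):
    p = taylor(exp_deriv, 0.0, 1.0, n)
    print(n, p, abs(p - math.e))           # the error falls roughly like 1/(n+1)!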
6.3 FINITE DIFFERENCES: GENERALIZED POWER

Let there be a function $f : \mathbb{R} \to \mathbb{R}$.

Definition 6.3  We call step a fixed value $h = \Delta x$ of the increment of the argument of the function $f$.

Definition 6.4  The expression
    $\Delta y = \Delta f(x) = f(x + \Delta x) - f(x)$   (6.41)
is called the first difference of the function $f$. In general, the difference of order $n$ of the function $f$ is defined by
    $\Delta^n y = \Delta(\Delta^{n-1}y)$,   (6.42)
where $n \ge 2$.

Proposition 6.1  Let
    $P_n(x) = a_0x^n + a_1x^{n-1} + \cdots + a_n$   (6.43)
be a polynomial of $n$th degree, where $a_i \in \mathbb{R}$, $i = \overline{0, n}$, and let $h$ be the step. Under these conditions,
(i) $\Delta^kP_n(x)$ is a polynomial of degree $n - k$, $1 \le k \le n$, the dominant coefficient of which is given by
    $a_0^{(k)} = a_0n(n - 1)\cdots(n - k + 1)h^k$;   (6.44)
(ii) for $k = n$, we have
    $a_0^{(n)} = a_0n!h^n$;   (6.45)
(iii) if $k > n$, then
    $\Delta^kP_n(x) = 0$.   (6.46)

Demonstration
(i) Successively, we may write
    $\Delta P_n(x) = P_n(x + h) - P_n(x) = a_0(x + h)^n + a_1(x + h)^{n-1} + \cdots + a_n - a_0x^n - a_1x^{n-1} - \cdots - a_n$
    $= a_0x^n + C_n^1a_0x^{n-1}h + \cdots + a_1x^{n-1} + \cdots + a_n - a_0x^n - a_1x^{n-1} - \cdots - a_n = C_n^1a_0x^{n-1}h + \cdots$;   (6.47)
hence, $\Delta P_n(x)$ is a polynomial of degree $n - 1$, the dominant coefficient of which is
    $a_0^{(1)} = C_n^1a_0h = na_0h$.   (6.48)
Then
    $\Delta^2P_n(x) = \Delta(\Delta P_n(x)) = na_0h(x + h)^{n-1} + \cdots - na_0hx^{n-1} - \cdots = na_0hC_{n-1}^1x^{n-2}h + \cdots = n(n - 1)a_0h^2x^{n-2} + \cdots$,   (6.49)
and hence $\Delta^2P_n(x)$ is a polynomial of degree $n - 2$, the dominant coefficient of which is given by
    $a_0^{(2)} = n(n - 1)a_0h^2$.   (6.50)
We can thus show that $\Delta^kP_n(x)$ is a polynomial of degree $n - k$, $1 \le k \le n$, the dominant coefficient of which is given by (6.44).
(ii) This is the particular case $k = n$ of (i). It follows that $\Delta^nP_n(x)$ is a polynomial of degree 0 (hence a constant), its value being given by
    $\Delta^nP_n(x) = a_0n!h^n$.   (6.51)
(iii) Let $k = n + 1$. We have
    $\Delta^kP_n(x) = \Delta(\Delta^nP_n(x)) = \Delta(a_0n!h^n) = 0$   (6.52)
and, in general, the finite difference of a constant is zero; the proposition is thus proved.

Proposition 6.2  Finite differences have the following properties:
(i) If $f$ and $g$ are two functions and $a$ and $b$ two real constants, then
    $\Delta(af + bg) = a\Delta f + b\Delta g$.   (6.53)
(ii) The relation
    $\Delta^m(\Delta^ny) = \Delta^{m+n}y$   (6.54)
holds for any $m, n \in \mathbb{N}^*$ ($\mathbb{N}^* = \mathbb{N} - \{0\} = \{1, 2, 3, \ldots\}$).
(iii) If we write
    $f(x + \Delta x) = f(x) + \Delta f(x) = (1 + \Delta)f(x)$,   (6.55)
then the relation
    $f(x + n\Delta x) = (1 + \Delta)^nf(x) = \sum_{k=0}^{n}C_n^k\Delta^kf(x)$   (6.56)
holds for any $n \in \mathbb{N}^*$.

Demonstration
(i) We have
    $\Delta(af + bg) = af(x + \Delta x) + bg(x + \Delta x) - af(x) - bg(x) = a[f(x + \Delta x) - f(x)] + b[g(x + \Delta x) - g(x)] = a\Delta f + b\Delta g$.   (6.57)
(ii) Let $n \in \mathbb{N}^*$ be arbitrary but fixed. If $m = 1$, then we have
    $\Delta^m(\Delta^ny) = \Delta(\Delta^ny) = \Delta^{1+n}y = \Delta^{m+n}y$,   (6.58)
corresponding to the definition of the finite difference. Let us suppose that relation (6.54) is valid for arbitrary $n \in \mathbb{N}^*$ and $m \in \mathbb{N}^*$ and let us write it for $m + 1$. We have
    $\Delta^{m+1}(\Delta^ny) = \Delta[\Delta^m(\Delta^ny)] = \Delta(\Delta^{m+n}y) = \Delta^{m+1+n}y$   (6.59)
and, conforming to the principle of mathematical induction, property (ii) holds for any $m, n \in \mathbb{N}^*$.
(iii) For $n = 1$ we have
    $f(x + \Delta x) = f(x) + \Delta f(x) = (1 + \Delta)f(x)$,   (6.60)
while for $n = 2$ we may write
    $f(x + 2\Delta x) = (1 + \Delta)f(x + \Delta x) = (1 + \Delta)^2f(x)$.   (6.61)
Let us suppose that relation (6.56) holds for $n$ and let us show that it holds for $n + 1$ too. We have
    $f[x + (n + 1)\Delta x] = (1 + \Delta)f(x + n\Delta x) = (1 + \Delta)(1 + \Delta)^nf(x) = (1 + \Delta)^{n+1}f(x)$   (6.62)
and, conforming to the principle of mathematical induction, the property is valid for any $n \in \mathbb{N}^*$.

Corollary 6.1  We may write
    $\Delta^nf(x) = f(x + n\Delta x) - C_n^1f[x + (n - 1)\Delta x] + C_n^2f[x + (n - 2)\Delta x] + \cdots + (-1)^nf(x)$   (6.63)
for any $n \in \mathbb{N}^*$.

Demonstration.  Indeed,
    $\Delta^nf(x) = [(1 + \Delta) - 1]^nf(x) = \sum_{k=0}^{n}C_n^k(-1)^k(1 + \Delta)^{n-k}f(x) = \sum_{k=0}^{n}(-1)^kC_n^kf[x + (n - k)\Delta x]$
    $= f(x + n\Delta x) - C_n^1f[x + (n - 1)\Delta x] + C_n^2f[x + (n - 2)\Delta x] + \cdots + (-1)^nf(x)$.   (6.64)

Proposition 6.3  Let $I$ be an open interval of the real axis and $f : I \to \mathbb{R}$ of class $C^\infty$ on $I$. Let us denote the step by $h = \Delta x$. Under these conditions,
    $\Delta^nf(x) = (\Delta x)^nf^{(n)}(x + n\xi\Delta x)$,   (6.65)
where $0 < \xi < 1$.

Demonstration.  We proceed by induction on $n$. For $n = 1$ we get
    $\Delta f(x) = \Delta x\,f'(x + \xi\Delta x)$,   (6.66)
which is just Lagrange's theorem of finite increments. Let us suppose that the statement holds for $n$ and let us show that it is valid for $n + 1$ too. We have
    $\Delta^{n+1}f(x) = \Delta(\Delta^nf(x)) = \Delta^nf(x + \Delta x) - \Delta^nf(x) = (\Delta x)^n[f^{(n)}(x + \Delta x + n\xi_1\Delta x) - f^{(n)}(x + n\xi_1\Delta x)] = (\Delta x)^n(\Delta x)f^{(n+1)}(x + n\xi_1\Delta x + \lambda\Delta x)$,   (6.67)
the last relation being the result of the application of Lagrange's theorem, with $\lambda \in (0, 1)$. Let us denote
    $\xi = \dfrac{n\xi_1 + \lambda}{n + 1} \in (0, 1)$;   (6.68)
hence
    $\Delta^{n+1}f(x) = (\Delta x)^{n+1}f^{(n+1)}[x + (n + 1)\xi\Delta x]$.   (6.69)
Corresponding to the principle of mathematical induction, the property is valid for any $n \in \mathbb{N}^*$.

Corollary 6.2  Under the above conditions, there exists the relation
    $f^{(n)}(x) = \lim_{\Delta x \to 0}\dfrac{\Delta^nf(x)}{(\Delta x)^n}$.   (6.70)

Demonstration.  We pass to the limit for $\Delta x \to 0$ in the relation
    $f^{(n)}(x + n\xi\Delta x) = \dfrac{\Delta^nf(x)}{(\Delta x)^n}$,   (6.71)
with $0 < \xi < 1$, and obtain just the requested relation.

Observation 6.4
(i) Let there be a system of equidistant points $x_i$, $i = \overline{0, n}$, for which
    $\Delta x_i = x_{i+1} - x_i = h = \text{const}$, $i = \overline{0, n - 1}$,   (6.72)
and let us denote by $y_i$, $i = \overline{0, n}$, the values of the function at the points $x_i$. We may write the relations
    $\Delta y_i = y_{i+1} - y_i$, $i = \overline{0, n - 1}$,   (6.73)
    $\Delta^2y_i = \Delta y_{i+1} - \Delta y_i = y_{i+2} - 2y_{i+1} + y_i$, $i = \overline{0, n - 2}$,   (6.74)
and, in general,
    $\Delta^ky_i = \Delta^{k-1}y_{i+1} - \Delta^{k-1}y_i$, $i = \overline{0, n - k}$.   (6.75)
On the other hand,
    $y_{i+1} = y_i + \Delta y_i = (1 + \Delta)y_i$, $i = \overline{0, n - 1}$,   (6.76)
    $y_{i+2} = y_{i+1} + \Delta y_{i+1} = (1 + \Delta)y_{i+1} = (1 + \Delta)^2y_i$, $i = \overline{0, n - 2}$,   (6.77)
and, in general,
    $y_{i+k} = (1 + \Delta)^ky_i$, $i = \overline{0, n - k}$.   (6.78)
Hence, it follows that
    $y_{i+k} = \sum_{j=0}^{k}C_k^j\Delta^jy_i = y_i + C_k^1\Delta y_i + \cdots + \Delta^ky_i$.   (6.79)
(ii) We can calculate
    $\Delta^ky_i = [(1 + \Delta) - 1]^ky_i = \sum_{j=0}^{k}C_k^j(-1)^j(1 + \Delta)^{k-j}y_i = (1 + \Delta)^ky_i - C_k^1(1 + \Delta)^{k-1}y_i + C_k^2(1 + \Delta)^{k-2}y_i + \cdots + (-1)^kC_k^ky_i$   (6.80)
and, taking into account relation (6.78), we obtain
    $\Delta^ky_i = y_{i+k} - C_k^1y_{i+k-1} + C_k^2y_{i+k-2} + \cdots + (-1)^ky_i$.   (6.81)
Usually, we arrange the finite differences as, for example, in Table 6.1.

TABLE 6.1  Table of the Finite Differences

  x       y       Δy       Δ²y      ...   Δ^(n-3)y    Δ^(n-2)y    Δ^(n-1)y    Δ^n y
  x0      y0      Δy0      Δ²y0     ...   Δ^(n-3)y0   Δ^(n-2)y0   Δ^(n-1)y0   Δ^n y0
  x1      y1      Δy1      Δ²y1     ...   Δ^(n-3)y1   Δ^(n-2)y1   Δ^(n-1)y1
  x2      y2      Δy2      Δ²y2     ...   Δ^(n-3)y2   Δ^(n-2)y2
  x3      y3      Δy3      Δ²y3     ...   Δ^(n-3)y3
  ...     ...     ...      ...
  xn−2    yn−2    Δyn−2    Δ²yn−2
  xn−1    yn−1    Δyn−1
  xn      yn

Definition 6.5  We call the generalized power of order $n$ the product
    $x^{(n)} = x(x - h)(x - 2h)\cdots[x - (n - 1)h]$.   (6.82)

Proposition 6.4  The relation
    $\Delta^kx^{(n)} = n(n - 1)\cdots(n - k + 1)h^kx^{(n-k)}$   (6.83)
holds for $k \in \mathbb{N}^*$.

Demonstration.  Let us first consider $k = 1$. We have
    $\Delta x^{(n)} = (x + h)^{(n)} - x^{(n)} = (x + h)x(x - h)\cdots[x - (n - 2)h] - x(x - h)\cdots[x - (n - 2)h][x - (n - 1)h]$
    $= x(x - h)\cdots[x - (n - 2)h]\,nh = nhx^{(n-1)}$.   (6.84)
It follows that, for $k = 2$,
    $\Delta^2x^{(n)} = nh\Delta x^{(n-1)} = nh[(x + h)^{(n-1)} - x^{(n-1)}] = nh\{(x + h)x\cdots[x - (n - 3)h] - x(x - h)\cdots[x - (n - 2)h]\}$
    $= nhx(x - h)\cdots[x - (n - 3)h]h(n - 1) = n(n - 1)h^2x^{(n-2)}$.   (6.85)
Let us suppose that the relation holds for $k$ and let us show that it remains valid for $k + 1$. We have
    $\Delta^kx^{(n)} = n(n - 1)\cdots[n - (k - 1)]h^kx^{(n-k)}$,   (6.86)
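The construction of Table 6.1 is mechanical; here is a short Python sketch (ours) that also confirms points (ii) and (iii) of Proposition 6.1:

import numpy as np

def difference_table(y):
    # Columns y, Δy, Δ²y, ... ; each new column applies relation (6.75) once.
    cols = [np.asarray(y, dtype=float)]
    while len(cols[-1]) > 1:
        cols.append(np.diff(cols[-1]))     # np.diff is exactly the operator Δ
    return cols

# For y = x³ with step h = 1: Δ³y = a₀ n! hⁿ = 3! = 6 is constant and Δ⁴y = 0.
x = np.arange(6)
for k, col in enumerate(difference_table(x ** 3)):
    print(f"Δ^{k} y:", col)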
    $\Delta^{k+1}x^{(n)} = n(n - 1)\cdots[n - (k - 1)]h^k[(x + h)^{(n-k)} - x^{(n-k)}]$
    $= n(n - 1)\cdots[n - (k - 1)]h^k\{(x + h)x\cdots[x - (n - k - 2)h] - x(x - h)\cdots[x - (n - k - 1)h]\}$
    $= n(n - 1)\cdots[n - (k - 1)]h^kx(x - h)\cdots[x - (n - k - 2)h](n - k)h = n(n - 1)\cdots(n - k)h^{k+1}x^{(n-k-1)}$   (6.87)
and, conforming to the principle of mathematical induction, property (6.83) is valid for any $k \in \mathbb{N}^*$.

Observation 6.5  If $h = 0$, then the generalized power coincides with the usual power.

6.4 NEWTON'S INTERPOLATION POLYNOMIALS

Proposition 6.5  Let us consider the function $f : [a, b] \to \mathbb{R}$ and an equidistant system of knots³
    $x_i = x_0 + ih$, $i = \overline{0, n}$,   (6.88)
where $h$ is the constant interpolation step. If $y_i = f(x_i)$, $i = \overline{0, n}$, then there exists a polynomial $P_n(x)$ of degree $n$ at the most so that $P_n(x_i) = y_i$ and
    $P_n = y_0 + \dfrac{q}{1!}\Delta y_0 + \dfrac{q(q - 1)}{2!}\Delta^2y_0 + \cdots + \dfrac{q(q - 1)\cdots[q - (n - 1)]}{n!}\Delta^ny_0$,   (6.89)
where
    $q = \dfrac{x - x_0}{h}$.   (6.90)

Demonstration.  Let us seek the polynomial $P_n$ in the form
    $P_n = a_0 + a_1(x - x_0) + a_2(x - x_0)(x - x_1) + \cdots + a_n(x - x_0)\cdots(x - x_{n-1})$   (6.91)
or, equivalently,
    $P_n = a_0 + a_1(x - x_0)^{(1)} + a_2(x - x_0)^{(2)} + \cdots + a_n(x - x_0)^{(n)}$.   (6.92)
The condition $P_n(x_i) = y_i$ is equivalent to the condition
    $\Delta^kP_n(x_0) = \Delta^ky_0$, $k \ge 0$.   (6.93)
For $k = 0$, we obtain
    $P_n(x_0) = y_0$,   (6.94)
from which
    $a_0 = y_0$.   (6.95)
For $k = 1$ we have
    $\Delta P_n(x) = 1!a_1h + 2a_2h(x - x_0)^{(1)} + \cdots + na_nh(x - x_0)^{(n-1)}$,   (6.96)

³Newton's interpolation polynomials were described by Isaac Newton in a letter to Smith in 1675; in a letter to Oldenburg in 1676; in Methodus Differentialis in 1711; in Regula Differentiarum, written in 1676 and discovered in the twentieth century; and in Philosophiae Naturalis Principia Mathematica, published in 1687.
obtaining
    $\Delta P_n(x_0) = 1!a_1h$,   (6.97)
hence
    $a_1 = \dfrac{\Delta y_0}{1!h}$.   (6.98)
For $k = 2$ we have
    $\Delta^2P_n(x) = 1 \times 2 \times a_2h^2 + 2 \times 3 \times a_3h^2(x - x_0)^{(1)} + \cdots + n(n - 1)a_nh^2(x - x_0)^{(n-2)}$   (6.99)
and we get
    $\Delta^2P_n(x_0) = 2!a_2h^2$,   (6.100)
from which
    $a_2 = \dfrac{\Delta^2y_0}{2!h^2}$.   (6.101)
Step by step, we obtain
    $a_k = \dfrac{\Delta^ky_0}{k!h^k}$, $k = \overline{0, n}$,   (6.102)
and the polynomial $P_n(x)$ may now be written as
    $P_n(x) = y_0 + \dfrac{\Delta y_0}{1!h}(x - x_0)^{(1)} + \dfrac{\Delta^2y_0}{2!h^2}(x - x_0)^{(2)} + \cdots + \dfrac{\Delta^ny_0}{n!h^n}(x - x_0)^{(n)}$.   (6.103)
We now verify that $P_n(x)$ is an interpolation polynomial, that is,
    $P_n(x_k) = y_k$, $k = \overline{0, n}$.   (6.104)
Observing that
    $(x_k - x_0)^{(k+p)} = 0$   (6.105)
for any $p \in \mathbb{N}^*$, it follows that $P_n(x_k)$ may be written in the form
    $P_n(x_k) = y_0 + \dfrac{\Delta y_0}{1!h}(x_k - x_0)^{(1)} + \dfrac{\Delta^2y_0}{2!h^2}(x_k - x_0)^{(2)} + \cdots + \dfrac{\Delta^ky_0}{k!h^k}(x_k - x_0)^{(k)}$.   (6.106)
Then
    $x_k - x_0 = kh$, $x_k - x_1 = (k - 1)h$, $x_k - x_2 = (k - 2)h$, $\ldots$, $x_k - x_{k-1} = h$   (6.107)
and formula (6.106) is now written as
    $P_n(x_k) = y_0 + \dfrac{\Delta y_0}{1!h}kh + \dfrac{\Delta^2y_0}{2!h^2}k(k - 1)h^2 + \cdots + \dfrac{\Delta^ky_0}{k!h^k}h^kk(k - 1)\cdots 1$.   (6.108)
Because
    $\dfrac{k(k - 1)\cdots[k - (p - 1)]}{p!} = C_k^p$,   (6.109)
relation (6.108) becomes
    $P_n(x_k) = y_0 + C_k^1\Delta y_0 + C_k^2\Delta^2y_0 + \cdots + C_k^k\Delta^ky_0$.   (6.110)
But we know that
    $y_k = (1 + \Delta)^ky_0$   (6.111)
and it follows that
    $P_n(x_k) = (1 + \Delta)^ky_0 = y_k$.   (6.112)
We calculate
    $\dfrac{(x - x_0)^{(k)}}{h^k} = \dfrac{(x - x_0)(x - x_1)\cdots(x - x_{k-1})}{h^k} = \dfrac{x - x_0}{h}\,\dfrac{x - x_1}{h}\cdots\dfrac{x - x_{k-1}}{h}$.   (6.113)
But
    $\dfrac{x - x_0}{h} = q$, $\dfrac{x - x_1}{h} = \dfrac{x - x_0 - h}{h} = q - 1$, $\ldots$, $\dfrac{x - x_{k-1}}{h} = \dfrac{x - x_0 - (k - 1)h}{h} = q - (k - 1)$   (6.114)
and, taking into account relation (6.103), we obtain relation (6.89); hence the proposition is proved.

Definition 6.6  The polynomial $P_n(x)$ is called Newton's polynomial or Newton's forward polynomial.

Observation 6.6  Newton's formula (6.89) is inconvenient for $x$ contiguous to the value $x_n$ ($x$ situated in the inferior part of the finite difference table); therefore, another Newton polynomial, beginning with $x_n$, is necessary.

Observation 6.7  Because
    $f^{(k)}(x) = \lim_{\Delta x \to 0}\dfrac{\Delta^kf(x)}{(\Delta x)^k}$,   (6.115)
corresponding to the demonstrations in Section 6.3, and considering that
    $\lim_{h \to 0}\dfrac{\Delta^ky_0}{h^k} = y^{(k)}(x_0)$,   (6.116)
it follows that the coefficients of Newton's polynomial tend, as $h \to 0$, to
    $\dfrac{\Delta^ky_0}{k!h^k} \to \dfrac{y^{(k)}(x_0)}{k!}$,   (6.117)
so that Newton's polynomial is transformed into the formula of expansion into a Taylor series.

Proposition 6.6  Let $f : [a, b] \to \mathbb{R}$ and the equidistant interpolation knots $x_i = x_0 + ih$, $i = \overline{0, n}$. Let us denote by $y_i$ the values of the function $f$ at the points $x_i$, $y_i = f(x_i)$, $i = \overline{0, n}$. Under these conditions, the polynomial of degree $n$ at the most, given by
    $P_n(x) = y_n + \dfrac{q}{1!}\Delta y_{n-1} + \dfrac{q(q + 1)}{2!}\Delta^2y_{n-2} + \cdots + \dfrac{q(q + 1)\cdots(q + n - 1)}{n!}\Delta^ny_0$,   (6.118)
is an interpolation polynomial with
    $q = \dfrac{x - x_n}{h}$.   (6.119)

Demonstration.  We seek the polynomial $P_n(x)$ in the form
    $P_n(x) = a_0 + a_1(x - x_n) + a_2(x - x_n)(x - x_{n-1}) + \cdots + a_n(x - x_n)(x - x_{n-1})\cdots(x - x_1)$.   (6.120)
The condition $P_n(x_i) = y_i$ is equivalent to the condition
    $\Delta^iP_n(x_{n-i}) = \Delta^iy_{n-i}$.   (6.121)
Relation (6.120) may also be written in the form
    $P_n(x) = a_0 + a_1(x - x_n)^{(1)} + a_2(x - x_{n-1})^{(2)} + \cdots + a_n(x - x_1)^{(n)}$.   (6.122)
We obtain
    $P_n(x_n) = a_0$   (6.123)
for $i = 0$ in relation (6.121), from which
    $a_0 = y_n$.   (6.124)
If we take $i = 1$ in the same relation, then it follows that
    $\Delta P_n(x_{n-1}) = \Delta y_{n-1}$,   (6.125)
where
    $\Delta P_n(x_{n-1}) = 1 \times a_1 \times h$;   (6.126)
hence,
    $a_1 = \dfrac{\Delta y_{n-1}}{1!h}$.   (6.127)
On the other hand,
    $\Delta^2P_n(x) = 1 \times 2 \times a_2h^2 + 2 \times 3 \times a_3h^2(x - x_{n-2})^{(1)} + \cdots + n(n - 1)a_nh^2(x - x_1)^{(n-2)}$;   (6.128)
making $x = x_{n-2}$, we obtain
    $\Delta^2P_n(x_{n-2}) = 2!a_2h^2$.   (6.129)
But
    $\Delta^2P_n(x_{n-2}) = \Delta^2y_{n-2}$,   (6.130)
corresponding to relation (6.121) for $i = 2$, so that
    $a_2 = \dfrac{\Delta^2y_{n-2}}{2!h^2}$.   (6.131)
Step by step, we obtain
    $a_i = \dfrac{\Delta^iy_{n-i}}{i!h^i}$, $i = \overline{0, n}$.   (6.132)
Newton's polynomial becomes
    $P_n(x) = y_n + \dfrac{\Delta y_{n-1}}{1!h}(x - x_n)^{(1)} + \dfrac{\Delta^2y_{n-2}}{2!h^2}(x - x_{n-1})^{(2)} + \cdots + \dfrac{\Delta^ny_0}{n!h^n}(x - x_1)^{(n)}$.   (6.133)
$P_n(x)$ is an interpolation polynomial, that is,
    $P_n(x_i) = y_i$, $i = \overline{0, n}$.   (6.134)
First, let us observe that
    $(x_i - x_{n-k})^{(k+p)} = 0$   (6.135)
for any $p \in \mathbb{N}^*$; hence,
    $P_n(x_i) = y_n + \dfrac{\Delta y_{n-1}}{1!h}(x_i - x_n)^{(1)} + \dfrac{\Delta^2y_{n-2}}{2!h^2}(x_i - x_{n-1})^{(2)} + \cdots + \dfrac{\Delta^{n-i}y_i}{(n - i)!h^{n-i}}(x_i - x_{i+1})^{(n-i)}$.   (6.136)
Then
    $x_i - x_n = (i - n)h$, $x_i - x_{n-1} = (i - n + 1)h$, $\ldots$, $x_i - x_{i+1} = -h$   (6.137)
and relation (6.136) reads
    $P_n(x_i) = y_n + \dfrac{(i - n)h}{1!h}\Delta y_{n-1} + \dfrac{(i - n)(i - n + 1)h^2}{2!h^2}\Delta^2y_{n-2} + \cdots + \dfrac{(i - n)(i - n + 1)\cdots(-1)h^{n-i}}{(n - i)!h^{n-i}}\Delta^{n-i}y_i$.   (6.138)
On the other hand,
    $\dfrac{i - n}{1!} = -\dfrac{n - i}{1!} = -C_{n-i}^1$, $\dfrac{(i - n)(i - n + 1)}{2!} = \dfrac{(n - i)(n - i - 1)}{2!} = C_{n-i}^2$, $\ldots$, $\dfrac{(i - n)(i - n + 1)\cdots(-1)}{(n - i)!} = (-1)^{n-i}\dfrac{(n - i)!}{(n - i)!} = (-1)^{n-i}C_{n-i}^{n-i}$   (6.139)
and relation (6.138) leads to
    $P_n(x_i) = y_n - C_{n-i}^1\Delta y_{n-1} + C_{n-i}^2\Delta^2y_{n-2} + \cdots + (-1)^{n-i}C_{n-i}^{n-i}\Delta^{n-i}y_i = y_i$,   (6.140)
corresponding to Section 6.3. We have
    $\dfrac{x - x_n}{h} = q$, $\dfrac{x - x_{n-1}}{h} = \dfrac{x - x_n + h}{h} = q + 1$, $\dfrac{x - x_{n-2}}{h} = \dfrac{x - x_n + 2h}{h} = q + 2$, $\ldots$, $\dfrac{x - x_1}{h} = \dfrac{x - x_n + (n - 1)h}{h} = q + (n - 1)$   (6.141)
and relation (6.133) leads to relation (6.118), which had to be proved.

Definition 6.7  The polynomial $P_n(x)$ is called Newton's polynomial or Newton's backward polynomial.

Observation 6.8  Newton's formula (6.118) is used for values contiguous to $x_n$ (situated in the inferior part of the finite difference table).

Observation 6.9
(i) We know that the Lagrange interpolation polynomial is unique; hence, Newton's polynomials are in fact Lagrange polynomials written differently.
(ii) The error in the case of the Lagrange polynomial is given by
    $|f(x) - L_n(x)| = \dfrac{|f^{(n+1)}(\zeta)|}{(n + 1)!}|P_{n+1}(x)|$,   (6.142)
where $\zeta$ is a point situated in the interval $[a, b]$, while
    $P_{n+1}(x) = (x - x_0)(x - x_1)\cdots(x - x_n)$.   (6.143)
Considering that
    $P_{n+1}(x) = qh(q - 1)h\cdots(q - n)h = q(q - 1)\cdots(q - n)h^{n+1}$,   (6.144)
where we used Newton's forward polynomial, and the relation
    $f^{(n+1)}(\zeta) = \lim_{h \to 0}\dfrac{\Delta^{n+1}f(\zeta)}{h^{n+1}}$,   (6.145)
relation (6.142) becomes
    $|f(x) - P_n(x)| \approx \left|\dfrac{\Delta^{n+1}f(\zeta)}{(n + 1)!h^{n+1}}q(q - 1)\cdots(q - n)h^{n+1}\right| \approx \left|\dfrac{\Delta^{n+1}y_0\,q(q - 1)\cdots(q - n)}{(n + 1)!}\right|$.   (6.146)
Analogously, for Newton's backward polynomial we have
    $P_{n+1} = qh(q + 1)h\cdots(q + n)h = q(q + 1)\cdots(q + n)h^{n+1}$   (6.147)
and it follows that
    $|f(x) - P_n(x)| \approx \left|\dfrac{\Delta^{n+1}f(\zeta)}{(n + 1)!h^{n+1}}q(q + 1)\cdots(q + n)h^{n+1}\right| \approx \left|\dfrac{\Delta^{n+1}y_0\,q(q + 1)\cdots(q + n)}{(n + 1)!}\right|$.   (6.148)
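Newton's forward polynomial (6.89) evaluates cheaply once the leading differences $\Delta^ky_0$ are known; a Python sketch (ours, not the book's program):

import numpy as np

def newton_forward(x0, h, y, x):
    # Evaluate relation (6.89) at x, with q = (x - x0)/h from (6.90).
    q = (x - x0) / h
    diffs, col = [], np.asarray(y, dtype=float)
    while col.size > 0:
        diffs.append(col[0])               # leading difference Δ^k y_0
        col = np.diff(col)
    total, term = 0.0, 1.0                 # term = q(q-1)...(q-k+1)/k!
    for k, dk in enumerate(diffs):
        total += term * dk
        term *= (q - k) / (k + 1)
    return total

x0, h = 0.0, 0.5
y = [(x0 + i * h) ** 2 for i in range(3)]  # f(x) = x² on three knots
print(newton_forward(x0, h, y, 0.8))       # 0.64, exact: the data is quadratic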
6.5 CENTRAL DIFFERENCES: GAUSS'S FORMULAE, STIRLING'S FORMULA, BESSEL'S FORMULA, EVERETT'S FORMULAE

Let us consider the function $f : [a, b] \to \mathbb{R}$ and $2n + 1$ equidistant points in the interval $[a, b]$. We denote these points by $x_{-n}, x_{-n+1}, \ldots, x_{-1}, x_0, x_1, \ldots, x_{n-1}, x_n$ and denote by $h$ the step
    $h = x_{i+1} - x_i = \text{const}$, $i = \overline{-n, n - 1}$.   (6.149)

Theorem 6.4 (Gauss's first formula⁴)  Under the above conditions and denoting
    $q = \dfrac{x - x_0}{h}$   (6.150)
and $y_i = f(x_i)$, $i = \overline{-n, n}$, there exists a unique interpolation polynomial of degree $2n$ at the most, the expression of which is

⁴Carl Friedrich Gauss (1777–1855) gave these formulae in 1812, in a lecture on interpolation.
    $P(x) = y_0 + q\Delta y_0 + \dfrac{q(q - 1)}{2!}\Delta^2y_{-1} + \dfrac{(q + 1)q(q - 1)}{3!}\Delta^3y_{-1} + \dfrac{(q + 1)q(q - 1)(q - 2)}{4!}\Delta^4y_{-2} + \dfrac{(q + 2)(q + 1)q(q - 1)(q - 2)}{5!}\Delta^5y_{-2} + \cdots + \dfrac{(q + n - 1)\cdots(q + 1)q(q - 1)\cdots(q - n)}{(2n)!}\Delta^{2n}y_{-n}$.   (6.151)

Demonstration.  In the case of Gauss's polynomial, the conditions are
    $\Delta^kP(x_i) = \Delta^ky_i$, $i = \overline{-n, n}$, $k = \overline{0, 2n}$.   (6.152)
We seek the polynomial in the form
    $P(x) = a_0 + a_1(x - x_0)^{(1)} + a_2(x - x_0)^{(2)} + a_3(x - x_{-1})^{(3)} + a_4(x - x_{-1})^{(4)} + \cdots + a_{2n-1}(x - x_{-n+1})^{(2n-1)} + a_{2n}(x - x_{-n+1})^{(2n)}$.   (6.153)
Proceeding as with Newton's polynomials, conditions (6.152) lead to
    $a_0 = y_0$, $a_1 = \dfrac{\Delta y_0}{1!h}$, $a_2 = \dfrac{\Delta^2y_{-1}}{2!h^2}$, $a_3 = \dfrac{\Delta^3y_{-1}}{3!h^3}$, $a_4 = \dfrac{\Delta^4y_{-2}}{4!h^4}$, $\ldots$, $a_{2n} = \dfrac{\Delta^{2n}y_{-n}}{(2n)!h^{2n}}$.   (6.154)
Taking into account equation (6.150) and equation (6.154) and replacing in relation (6.153), we get formula (6.151), which had to be proved. As for Newton's polynomials, we may show that $P(x)$ is an interpolation polynomial.

Observation 6.10  The first Gauss formula may also be written in the form
    $P(x) = y_0 + q^{(1)}\Delta y_0 + \dfrac{q^{(2)}}{2!}\Delta^2y_{-1} + \dfrac{(q + 1)^{(3)}}{3!}\Delta^3y_{-1} + \dfrac{(q + 1)^{(4)}}{4!}\Delta^4y_{-2} + \cdots + \dfrac{(q + n - 1)^{(2n)}}{(2n)!}\Delta^{2n}y_{-n}$,   (6.155)
the generalized powers being taken with step 1.

Definition 6.8  The finite differences $\Delta y_{-1}$, $\Delta y_0$, and $\Delta^2y_{-1}$ are called central differences. For an arbitrary $i$ between $-n + 1$ and 0, we call central differences the finite differences $\Delta y_{i-1}$, $\Delta y_i$, and $\Delta^2y_{i-1}$.

Theorem 6.5 (Gauss's Second Formula)  Under the conditions of Theorem 6.4, the interpolation polynomial may be written in the form
    $P(x) = y_0 + q^{(1)}\Delta y_{-1} + \dfrac{(q + 1)^{(2)}}{2!}\Delta^2y_{-1} + \dfrac{(q + 1)^{(3)}}{3!}\Delta^3y_{-2} + \dfrac{(q + 2)^{(4)}}{4!}\Delta^4y_{-2} + \cdots + \dfrac{(q + n)^{(2n)}}{(2n)!}\Delta^{2n}y_{-n}$.   (6.156)

Demonstration.  It is analogous to the demonstrations of the first Gauss formula and of the Newton polynomials.
Corollary 6.3 (The Stirling Formula⁵)  Under the conditions of Theorem 6.4, the interpolation polynomial reads
    $P(x) = y_0 + q\dfrac{\Delta y_{-1} + \Delta y_0}{2} + \dfrac{q^2}{2}\Delta^2y_{-1} + \dfrac{q(q^2 - 1)}{3!}\,\dfrac{\Delta^3y_{-2} + \Delta^3y_{-1}}{2} + \dfrac{q^2(q^2 - 1)}{4!}\Delta^4y_{-2} + \dfrac{q(q^2 - 1^2)(q^2 - 2^2)}{5!}\,\dfrac{\Delta^5y_{-3} + \Delta^5y_{-2}}{2} + \cdots + \dfrac{q^2(q^2 - 1^2)\cdots[q^2 - (n - 1)^2]}{(2n)!}\Delta^{2n}y_{-n}$.   (6.157)

Demonstration.  Formula (6.157) is the arithmetic mean of relations (6.151) and (6.156).

For Bessel's formulae⁶ we start from Gauss's second formula, in which we take as initial values $x_1$ and, correspondingly, $y_1 = f(x_1)$. We have
    $\dfrac{x - x_1}{h} = q - 1$   (6.158)
and, replacing $q$ by $q - 1$, we obtain
    $P(x) = y_1 + (q - 1)\Delta y_0 + \dfrac{q(q - 1)}{2!}\Delta^2y_0 + \dfrac{q(q - 1)(q - 2)}{3!}\Delta^3y_{-1} + \dfrac{(q + 1)q(q - 1)(q - 2)}{4!}\Delta^4y_{-1} + \dfrac{(q + 1)q(q - 1)(q - 2)(q - 3)}{5!}\Delta^5y_{-2} + \cdots + \dfrac{(q + n - 2)(q + n - 3)\cdots(q - n)}{(2n - 1)!}\Delta^{2n-1}y_{-(n-1)} + \dfrac{(q + n - 1)(q + n - 2)\cdots(q - n)}{(2n)!}\Delta^{2n}y_{-(n-1)}$.   (6.159)

⁵In 1719, James Stirling (1692–1770) discussed some of Newton's interpolation formulae in Methodus Differentialis. In 1730, Stirling published a more elaborate booklet on the topic.
⁶Friedrich Wilhelm Bessel (1784–1846) published these formulae in 1824.
To obtain the first interpolation formula of Bessel, we take the arithmetic mean of relation (6.159) and the first interpolation formula of Gauss, resulting in
    $P(x) = \dfrac{y_0 + y_1}{2} + \left(q - \dfrac{1}{2}\right)\Delta y_0 + \dfrac{q(q - 1)}{2!}\,\dfrac{\Delta^2y_{-1} + \Delta^2y_0}{2} + \left(q - \dfrac{1}{2}\right)\dfrac{q(q - 1)}{3!}\Delta^3y_{-1} + \dfrac{q(q - 1)(q + 1)(q - 2)}{4!}\,\dfrac{\Delta^4y_{-2} + \Delta^4y_{-1}}{2} + \left(q - \dfrac{1}{2}\right)\dfrac{q(q - 1)(q + 1)(q - 2)}{5!}\Delta^5y_{-2} + \dfrac{q(q - 1)(q + 1)(q - 2)(q + 2)(q - 3)}{6!}\,\dfrac{\Delta^6y_{-3} + \Delta^6y_{-2}}{2} + \cdots + \dfrac{q(q - 1)(q + 1)(q - 2)(q + 2)\cdots(q - n)(q + n - 1)}{(2n)!}\,\dfrac{\Delta^{2n}y_{-n} + \Delta^{2n}y_{-n+1}}{2} + \left(q - \dfrac{1}{2}\right)\dfrac{q(q - 1)(q + 1)(q - 2)(q + 2)\cdots(q - n)(q + n - 1)}{(2n + 1)!}\Delta^{2n+1}y_{-n}$,   (6.160)
where
    $q = \dfrac{x - x_0}{h}$.   (6.161)
The polynomial $P(x)$ in formula (6.160) coincides with $f(x)$ at the points $x_{-n}, x_{-n+1}, \ldots, x_n, x_{n+1}$, that is, at $2n + 2$ points. If we consider the particular case $n = 1$, then we obtain the quadratic interpolation formula of Bessel
    $P(x) = y_0 + q\Delta y_0 + \dfrac{q(q - 1)}{4}(\Delta y_1 - \Delta y_{-1})$.   (6.162)
Let us observe that in Bessel's formula (6.160) all the terms containing differences of odd order have the factor $(q - 1/2)$. If we choose $q = 1/2$, then we obtain Bessel's dichotomy formula
    $P\left(\dfrac{x_0 + x_1}{2}\right) = \dfrac{y_0 + y_1}{2} - \dfrac{1}{8}\,\dfrac{\Delta^2y_{-1} + \Delta^2y_0}{2} + \dfrac{3}{128}\,\dfrac{\Delta^4y_{-2} + \Delta^4y_{-1}}{2} - \dfrac{5}{1024}\,\dfrac{\Delta^6y_{-3} + \Delta^6y_{-2}}{2} + \cdots + (-1)^n\dfrac{[1 \times 3 \times 5 \times \cdots \times (2n - 1)]^2}{2^{2n}(2n)!}\,\dfrac{\Delta^{2n}y_{-n} + \Delta^{2n}y_{-n+1}}{2}$.   (6.163)
If we denote
    $q_1 = q - \dfrac{1}{2}$,   (6.164)
then Bessel's formula reads
    $P(x) = \dfrac{y_0 + y_1}{2} + q_1\Delta y_0 + \dfrac{q_1^2 - \frac{1}{4}}{2!}\,\dfrac{\Delta^2y_{-1} + \Delta^2y_0}{2} + \dfrac{q_1\left(q_1^2 - \frac{1}{4}\right)}{3!}\Delta^3y_{-1} + \dfrac{\left(q_1^2 - \frac{1}{4}\right)\left(q_1^2 - \frac{9}{4}\right)}{4!}\,\dfrac{\Delta^4y_{-2} + \Delta^4y_{-1}}{2} + \dfrac{q_1\left(q_1^2 - \frac{1}{4}\right)\left(q_1^2 - \frac{9}{4}\right)}{5!}\Delta^5y_{-2} + \dfrac{\left(q_1^2 - \frac{1}{4}\right)\left(q_1^2 - \frac{9}{4}\right)\left(q_1^2 - \frac{25}{4}\right)}{6!}\,\dfrac{\Delta^6y_{-3} + \Delta^6y_{-2}}{2} + \cdots + \dfrac{\left(q_1^2 - \frac{1}{4}\right)\left(q_1^2 - \frac{9}{4}\right)\cdots\left(q_1^2 - \frac{(2n-1)^2}{4}\right)}{(2n)!}\,\dfrac{\Delta^{2n}y_{-n} + \Delta^{2n}y_{-n+1}}{2} + \dfrac{q_1\left(q_1^2 - \frac{1}{4}\right)\left(q_1^2 - \frac{9}{4}\right)\cdots\left(q_1^2 - \frac{(2n-1)^2}{4}\right)}{(2n + 1)!}\Delta^{2n+1}y_{-n}$,   (6.165)
where
    $q_1 = \dfrac{x - \dfrac{x_0 + x_1}{2}}{h}$.   (6.166)

Definition 6.9  We define the operator $\delta$ by the relations
    $\delta f(x) = f\left(x + \dfrac{h}{2}\right) - f\left(x - \dfrac{h}{2}\right)$, $\delta^{k+1}f(x) = \delta^kf\left(x + \dfrac{h}{2}\right) - \delta^kf\left(x - \dfrac{h}{2}\right)$,   (6.167)
where $k \ge 1$, $k \in \mathbb{N}$.
Observation 6.11
(i) Calculating $\delta^2f(x)$, we obtain
    $\delta^2f(x) = \delta f\left(x + \dfrac{h}{2}\right) - \delta f\left(x - \dfrac{h}{2}\right) = f(x + h) - 2f(x) + f(x - h)$.   (6.168)
(ii) Proceeding by induction, it follows immediately that if $k$ is an even number, then the calculation of $\delta^ky_p$ introduces no supplementary intermediate points. Indeed, for $k = 2$ we have seen above that the affirmation is true. Let us suppose that the affirmation is true for $k = 2l$ and let us show that it remains true for $k = 2l + 2$, $l \in \mathbb{N}$, $l \ge 1$. We have
    $\delta^{2l+2}y_p = \delta^{2l}y_{p+1} - 2\delta^{2l}y_p + \delta^{2l}y_{p-1}$   (6.169)
and, because none of the terms on the right-hand side introduces new points besides the given ones $x_{-n}, x_{-n+1}, \ldots, x_n$, the affirmation is proved.

Starting from the first formula of Gauss and writing all the finite differences as functions of $\delta^ky_0$ and $\delta^ky_1$, we obtain the first Everett formula⁷
    $P(x) = (1 - q)y_0 - \dfrac{q(q - 1)(q - 2)}{3!}\delta^2y_0 - \dfrac{(q + 1)q(q - 1)(q - 2)(q - 3)}{5!}\delta^4y_0 - \cdots - \dfrac{(q + n - 1)(q + n - 2)\cdots(q - n - 1)}{(2n + 1)!}\delta^{2n}y_0 + qy_1 + \dfrac{(q + 1)q(q - 1)}{3!}\delta^2y_1 + \dfrac{(q + 2)(q + 1)q(q - 1)(q - 2)}{5!}\delta^4y_1 + \cdots + \dfrac{(q + n)(q + n - 1)\cdots(q - n)}{(2n + 1)!}\delta^{2n}y_1$.   (6.170)

Observation 6.12
(i) The expression $\delta y_{p+1/2}$ reads
    $\delta y_{p+\frac{1}{2}} = f(x_p + h) - f(x_p) = y_{p+1} - y_p$.   (6.171)
(ii) Proceeding as in Observation 6.11, we deduce that $\delta^ky_{p+1/2}$ does not introduce supplementary points if $k$ is an odd natural number.

The first Gauss formula may also be written in the form
    $P(x) = y_0 + \dfrac{(q + 1)q}{2!}\delta y_{\frac{1}{2}} + \dfrac{(q + 2)(q + 1)q(q - 1)}{4!}\delta^3y_{\frac{1}{2}} + \cdots + \dfrac{(q + n + 1)(q + n)\cdots(q - n)}{(2n + 2)!}\delta^{2n+1}y_{\frac{1}{2}} - \dfrac{q(q - 1)}{2!}\delta y_{-\frac{1}{2}} - \dfrac{(q + 1)q(q - 1)(q - 2)}{4!}\delta^3y_{-\frac{1}{2}} - \cdots - \dfrac{(q + n)(q + n - 1)\cdots(q - n - 1)}{(2n + 2)!}\delta^{2n+1}y_{-\frac{1}{2}}$,   (6.172)
called the second interpolation formula of Everett or the interpolation formula of Steffensen.⁸

⁷Joseph David Everett (1831–1904) published his formulae in 1900.
⁸The formula is named after Johan Frederik Steffensen (1873–1961), who presented it in 1950.
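The central-difference formulae of this section share one computational skeleton; as an example (ours, not the book's), Gauss's first formula (6.151) can be coded as:

import numpy as np

def gauss_first(knots, y, x):
    # Evaluate Gauss's first formula (6.151); knots are the 2n+1 equidistant
    # points x_{-n}, ..., x_n and y the values at them.
    y = np.asarray(y, dtype=float)
    n = len(y) // 2
    h = knots[1] - knots[0]
    q = (x - knots[n]) / h                 # relation (6.150)
    table = [y]                            # table[k][i] holds Δ^k y_{i-n}
    for _ in range(2 * n):
        table.append(np.diff(table[-1]))
    total, term = 0.0, 1.0
    for k in range(2 * n + 1):
        total += term * table[k][n - k // 2]   # central difference Δ^k y_{-⌊k/2⌋}
        k1 = k + 1                             # next factor of the coefficient
        term *= (q - k1 // 2) / k1 if k1 % 2 == 0 else (q + k1 // 2) / k1
    return total

xs = np.arange(-2.0, 3.0)                  # five knots, h = 1, n = 2
print(gauss_first(xs, xs ** 2, 0.3))       # 0.09: exact for a polynomial of degree 2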
6.6 DIVIDED DIFFERENCES

Definition 6.10  Let there be $f : I \subset \mathbb{R} \to \mathbb{R}$, $I$ an interval of the real axis, and the division points $x_1, x_2, \ldots, x_n$. The values of the function at these points are $y_i = f(x_i)$, $i = \overline{1, n}$. We define the divided differences by the relations
    $[x_i, x_j] = f(x_i; x_j) = \dfrac{f(x_j) - f(x_i)}{x_j - x_i}$,   (6.173)
    $[x_i, x_j, x_k] = f(x_i; x_j; x_k) = \dfrac{f(x_j; x_k) - f(x_i; x_j)}{x_k - x_i}$   (6.174)
and, in general, by
    $[x_{i_1}, x_{i_2}, \ldots, x_{i_{k+1}}] = f(x_{i_1}; x_{i_2}; \ldots; x_{i_{k+1}}) = \dfrac{f(x_{i_2}; \ldots; x_{i_{k+1}}) - f(x_{i_1}; \ldots; x_{i_k})}{x_{i_{k+1}} - x_{i_1}}$,   (6.175)
where $i_l \in \{1, 2, \ldots, n\}$, $l = \overline{1, k + 1}$.

Theorem 6.6  There exists the relation
    $f(x_1; \ldots; x_k) = \sum_{j=1}^{k}\dfrac{f(x_j)}{\prod_{i \ne j}(x_j - x_i)}$.   (6.176)

Demonstration.  We proceed by induction. For $k = 1$, we have
    $f(x_1) = f(x_1)$,   (6.177)
which is true. For $k = 2$, we obtain
    $f(x_1; x_2) = \dfrac{f(x_2)}{x_2 - x_1} + \dfrac{f(x_1)}{x_1 - x_2} = \dfrac{f(x_2) - f(x_1)}{x_2 - x_1}$,   (6.178)
which is the definition of the divided differences. Let us now suppose that the affirmation is valid for any $i \le k$ and let us show that it holds for $k + 1$. We have
    $f(x_1; \ldots; x_{k+1}) = \dfrac{f(x_2; \ldots; x_{k+1}) - f(x_1; \ldots; x_k)}{x_{k+1} - x_1} = \dfrac{1}{x_{k+1} - x_1}\left[\sum_{j=2}^{k+1}\dfrac{f(x_j)}{\prod_{\substack{2 \le i \le k+1 \\ i \ne j}}(x_j - x_i)} - \sum_{j=1}^{k}\dfrac{f(x_j)}{\prod_{\substack{1 \le i \le k \\ i \ne j}}(x_j - x_i)}\right]$,   (6.179)
corresponding to the induction hypothesis.
We calculate the coefficient of $f(x_j)$, that is,
    $c_j = \dfrac{1}{x_{k+1} - x_1}\left[\dfrac{1}{\prod_{\substack{2 \le i \le k+1 \\ i \ne j}}(x_j - x_i)} - \dfrac{1}{\prod_{\substack{1 \le i \le k \\ i \ne j}}(x_j - x_i)}\right] = \dfrac{(x_j - x_1) - (x_j - x_{k+1})}{(x_{k+1} - x_1)\prod_{\substack{1 \le i \le k+1 \\ i \ne j}}(x_j - x_i)} = \dfrac{1}{\prod_{\substack{1 \le i \le k+1 \\ i \ne j}}(x_j - x_i)}$   (6.180)
and the theorem is proved.

Observation 6.13
(i) The divided differences are linear operators, that is,
    $\left(\sum_{i=1}^{l}\alpha_if_i\right)(x_1; \ldots; x_k) = \sum_{i=1}^{l}\alpha_if_i(x_1; \ldots; x_k)$.   (6.181)
(ii) A divided difference is a symmetric function of its arguments.

We may construct the table of divided differences (Table 6.2) in the following form.

TABLE 6.2  Table of Divided Differences

  x1   f(x1)
                 f(x1; x2)
  x2   f(x2)                 f(x1; x2; x3)
                 f(x2; x3)                    f(x1; x2; x3; x4)
  x3   f(x3)                 f(x2; x3; x4)                       f(x1; x2; x3; x4; x5) ...
                 f(x3; x4)                    f(x2; x3; x4; x5)
  x4   f(x4)                 f(x3; x4; x5)
  ...  ...       ...         ...
  xn   f(xn)

Observation 6.14
(i) If $x_2 = x_1 + \varepsilon$, then
    $f(x_1; x_2) = \dfrac{f(x_1 + \varepsilon) - f(x_1)}{\varepsilon}$   (6.182)
and it follows that
    $f(x; x) = \lim_{\varepsilon \to 0}\dfrac{f(x + \varepsilon) - f(x)}{\varepsilon} = f'(x)$.   (6.183)
(ii) In general,
    $f(x; x; \ldots; x) = \dfrac{1}{k!}f^{(k)}(x)$,   (6.184)
where $x$ appears $k + 1$ times on the left-hand side of formula (6.184).
The demonstration is made by induction. For $k = 1$, the affirmation has been given at point (i). Let us suppose that the affirmation holds for $k$ and let us show that it remains valid for $k + 1$. We may write
    $f(\underbrace{x; \ldots; x}_{k+2\ \text{times}}) = \lim_{\varepsilon \to 0}f(x; x + \varepsilon; \ldots; x + (k + 1)\varepsilon) = \lim_{\varepsilon \to 0}\dfrac{f(x + \varepsilon; \ldots; x + (k + 1)\varepsilon) - f(x; \ldots; x + k\varepsilon)}{x + (k + 1)\varepsilon - x} = \dfrac{1}{k + 1}\,\dfrac{f^{(k+1)}(x)}{k!} = \dfrac{f^{(k+1)}(x)}{(k + 1)!}$,   (6.185)
the affirmation thus being proved.
(iii) There exists the relation
    $\dfrac{\mathrm{d}}{\mathrm{d}x}f(x_1; \ldots; x_n; x) = f(x_1; \ldots; x_n; x; x)$.   (6.186)
(iv) If $u_1, \ldots, u_p$ are differentiable functions of $x$, then
    $\dfrac{\mathrm{d}}{\mathrm{d}x}f(x_1; \ldots; x_n; u_1; \ldots; u_p) = \sum_{i=1}^{p}f(x_1; \ldots; x_n; u_1; \ldots; u_p; u_i)\dfrac{\mathrm{d}u_i}{\mathrm{d}x}$.   (6.187)
(v) We may write
    $\dfrac{1}{r!}\,\dfrac{\mathrm{d}^r}{\mathrm{d}x^r}f(x_1; \ldots; x_n; x) = f(x_1; \ldots; x_n; \underbrace{x; \ldots; x}_{r+1\ \text{times}})$,   (6.188)
where $x$ appears $r + 1$ times on the right-hand side.

Theorem 6.7  Let $x_0, x_1, \ldots, x_n$ be distinct interior points of a connected domain $D$ included in the complex plane and $f : D \to \mathbb{C}$ holomorphic. Under these conditions,
    $[x_0; x_1; \ldots; x_n] = \dfrac{1}{2\pi i}\oint_C\dfrac{f(z)\,\mathrm{d}z}{(z - x_0)\cdots(z - x_n)}$,   (6.189)
where $C$ is a rectifiable contour in the complex plane, contained in $D$, which contains in its interior the points $x_0, x_1, \ldots, x_n$.

Demonstration.  Let
    $I = \dfrac{1}{2\pi i}\oint_C\dfrac{f(z)\,\mathrm{d}z}{(z - x_0)\cdots(z - x_n)}$,   (6.190)
where $C$ is traversed in the positive sense. We apply the residue theorem, knowing that the function under the integral admits the poles of first order $x_0, x_1, \ldots, x_n$; it follows that
    $I = \sum_{k=0}^{n}\dfrac{f(x_k)}{\prod_{\substack{i=0 \\ i \ne k}}^{n}(x_k - x_i)}$,   (6.191)
the last expression being $[x_0; x_1; \ldots; x_n]$, in conformity with Theorem 6.6.
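Numerically, the whole of Table 6.2 follows from the recurrence (6.175); a Python sketch (ours) that returns its top descending row:

def divided_differences(xs, ys):
    # Top row [x0], [x0,x1], ..., [x0,...,xn] of Table 6.2 via relation (6.175).
    n = len(xs)
    col = list(ys)                 # zeroth column: the values f(x_i)
    top = [col[0]]
    for k in range(1, n):
        col = [(col[i + 1] - col[i]) / (xs[i + k] - xs[i])
               for i in range(n - k)]
        top.append(col[0])
    return top

# For f(x) = x² the second divided differences are constant (= 1) and all
# higher ones vanish, as expected for a polynomial of second degree.
print(divided_differences([0.0, 1.0, 3.0, 4.0], [0.0, 1.0, 9.0, 16.0]))
# [0.0, 1.0, 1.0, 0.0]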
Observation 6.15
(i) It follows that Theorem 6.7 is true on the whole domain of holomorphy of the function $f(z)$; hence, the representation remains valid irrespective of the choice of the points $x_i$ in the domain bounded by the curve $C$; in particular, it holds when some of these points coincide.
(ii) If we denote by $L$ the length of the curve $C$, then we have
    $|[x_0; x_1; \ldots; x_n]| \le \dfrac{L}{2\pi}\,\dfrac{\max_{z \in C}|f(z)|}{\min_{z \in C}|(z - x_0)\cdots(z - x_n)|}$.   (6.192)

Theorem 6.8 (Hermite)  Let $f : D \to \mathbb{C}$ be analytic, $D$ connected, with the interpolation knots $z_k$, $k = \overline{1, \nu}$, of multiplicity orders $p_k$, $\sum_{k=1}^{\nu}p_k = n + 1$. Under these conditions, we have
    $f(x) = \sum_{k=1}^{\nu}\sum_{m=0}^{p_k-1}\sum_{s=0}^{m}\dfrac{f^{(m)}(z_k)}{(p_k - m - 1)!(m - s)!}\left\{\dfrac{\mathrm{d}^{m-s}}{\mathrm{d}z^{m-s}}\left[\dfrac{(z - z_k)^{p_k}}{Q(z)}\right]\right\}_{z=z_k}\dfrac{Q(x)}{(x - z_k)^{s+1}} + Q(x)[x_0; x_1; \ldots; x_n]$,   (6.193)
where $x_0, x_1, \ldots, x_n$ are the interpolation knots $z_1, \ldots, z_\nu$, each counted as many times as indicated by its multiplicity order, $x = x_0$, while $Q$ will be specified later.

Demonstration.  From Theorem 6.7, we have
    $[x_0; x_1; \ldots; x_n] = \dfrac{1}{2\pi i}\oint_C\dfrac{f(z)\,\mathrm{d}z}{(z - z_0)^{p_0}\cdots(z - z_\nu)^{p_\nu}}$,   (6.194)
with $z_0 = x$ and $p_0 = 1$. Let us choose the curves $C_k$ in the form of circles of radii $r_k$, sufficiently small, centered at $z_k$ and interior to the domain bounded by the curve $C$. It follows that formula (6.194) may be written in the form
    $[x_0; x_1; \ldots; x_n] = \dfrac{1}{2\pi i}\sum_{k=0}^{\nu}\oint_{C_k}\dfrac{f(z)\,\mathrm{d}z}{(z - z_0)^{p_0}\cdots(z - z_\nu)^{p_\nu}}$.   (6.195)
We denote
    $q(z) = \prod_{k=0}^{\nu}(z - z_k)^{p_k}$,   (6.196)
    $I_k = \dfrac{1}{2\pi i}\oint_{C_k}\dfrac{(z - z_k)^{p_k}}{q(z)}f(z)\,\dfrac{\mathrm{d}z}{(z - z_k)^{p_k}}$.   (6.197)
The function $(z - z_k)^{p_k}f(z)/q(z)$ is holomorphic in the disk bounded by $C_k$. From Cauchy's theorem, we have
    $I_k = \dfrac{1}{(p_k - 1)!}\left\{\dfrac{\mathrm{d}^{p_k-1}}{\mathrm{d}z^{p_k-1}}\left[\dfrac{(z - z_k)^{p_k}}{q(z)}f(z)\right]\right\}_{z=z_k}$.   (6.198)
Applying now Leibniz's formula for the differentiation of a product of functions, it follows that
    $I = \sum_{k=0}^{\nu}\sum_{m=0}^{p_k-1}\dfrac{f^{(m)}(z_k)}{(p_k - m - 1)!}\left\{\dfrac{\mathrm{d}^m}{\mathrm{d}z^m}\left[\dfrac{(z - z_k)^{p_k}}{q(z)}\right]\right\}_{z=z_k}$.   (6.199)
We denote
    $Q(z) = \dfrac{q(z)}{z - x} = \prod_{k=1}^{\nu}(z - z_k)^{p_k}$   (6.200)
and have
    $I = \sum_{k=0}^{\nu}\sum_{m=0}^{p_k-1}\dfrac{f^{(m)}(z_k)}{(p_k - m - 1)!}\left\{\dfrac{\mathrm{d}^m}{\mathrm{d}z^m}\left[\dfrac{(z - z_k)^{p_k}}{Q(z)}\,\dfrac{1}{z - x}\right]\right\}_{z=z_k}$.   (6.201)
We separate the term corresponding to $k = 0$ and apply once more Leibniz's formula to relation (6.201), obtaining
    $I = \dfrac{f(x)}{Q(x)} - \sum_{k=1}^{\nu}\sum_{m=0}^{p_k-1}\sum_{s=0}^{m}\dfrac{f^{(m)}(z_k)}{(p_k - m - 1)!(m - s)!}\left\{\dfrac{\mathrm{d}^{m-s}}{\mathrm{d}z^{m-s}}\left[\dfrac{(z - z_k)^{p_k}}{Q(z)}\right]\right\}_{z=z_k}\dfrac{1}{(x - z_k)^{s+1}} = [x_0; x_1; \ldots; x_n]$,   (6.202)
that is, Hermite's formula; the theorem is thus proved.

6.7 NEWTON-TYPE FORMULA WITH DIVIDED DIFFERENCES

Lemma 6.1  If $P(x)$ is a polynomial of $n$th degree, then its divided difference of $(n + 1)$th order satisfies the relation
    $P(x; x_0; x_1; \ldots; x_n) = 0$,   (6.203)
where the knots $x_i$, $i = \overline{0, n}$, are distinct.

Demonstration.  From the definition, we have
    $P(x; x_0) = \dfrac{P(x) - P(x_0)}{x - x_0}$,   (6.204)
which is a polynomial of $(n - 1)$th degree. Further,
    $P(x; x_0; x_1) = \dfrac{P(x; x_0) - P(x_0; x_1)}{x - x_1}$   (6.205)
is a polynomial of $(n - 2)$th degree; moreover, it follows that $x - x_1$ divides $P(x; x_0) - P(x_0; x_1)$. Proceeding step by step, we obtain $P(x; x_0; \ldots; x_{n-1})$, which is a polynomial of zeroth degree, that is, a constant, which will be denoted by $C$. Finally,
    $P(x; x_0; x_1; \ldots; x_n) = \dfrac{C - C}{x - x_n} = 0$,   (6.206)
hence the lemma is proved.

A consequence for the Lagrange interpolation polynomial is immediately obtained. Indeed, if $P(x)$ is a Lagrange interpolation polynomial for which $P(x_i) = y_i$, $i = \overline{0, n}$, then
    $P(x; x_0; x_1; \ldots; x_n) = 0$.   (6.207)
On the other hand,
    $P(x) = P(x_0) + (x - x_0)\dfrac{P(x) - P(x_0)}{x - x_0} = P(x_0) + P(x; x_0)(x - x_0)$.   (6.208)
Proceeding step by step, it follows that
    $P(x) = P(x_0) + P(x; x_0)(x - x_0) = P(x_0) + P(x_0; x_1)(x - x_0) + P(x; x_0; x_1)(x - x_0)(x - x_1)$
    $= P(x_0) + P(x_0; x_1)(x - x_0) + P(x_0; x_1; x_2)(x - x_0)(x - x_1) + P(x; x_0; x_1; x_2)(x - x_0)(x - x_1)(x - x_2) = \cdots$
    $= P(x_0) + P(x_0; x_1)(x - x_0) + P(x_0; x_1; x_2)(x - x_0)(x - x_1) + \cdots + P(x_0; x_1; \ldots; x_n)(x - x_0)\cdots(x - x_{n-1}) + P(x; x_0; x_1; \ldots; x_n)(x - x_0)\cdots(x - x_{n-1})(x - x_n)$,   (6.209)
where we have written the last term too, even though it is equal to zero.

Definition 6.11  The expression
    $P(x) = y_0 + [x_0, x_1](x - x_0) + [x_0, x_1, x_2](x - x_0)(x - x_1) + \cdots + [x_0, x_1, \ldots, x_n](x - x_0)\cdots(x - x_{n-1})$   (6.210)
is called the Newton-type formula with divided differences.

6.8 INVERSE INTERPOLATION

Inverse interpolation deals with the determination of the value $x$ for which the function takes a given value $y$. Two cases may occur:
• the division points are equidistant;
• the division points are arbitrary.

Let us begin with the first case. Newton's forward interpolation polynomial leads to
    $y = y_0 + \dfrac{q}{1!}\Delta y_0 + \dfrac{q(q - 1)}{2!}\Delta^2y_0 + \cdots + \dfrac{q(q - 1)\cdots(q - n + 1)}{n!}\Delta^ny_0$,   (6.211)
that is,
    $y = y(q)$.   (6.212)
The problem consists in solving equation (6.211), because if we know $q$ and the relation
    $q = \dfrac{x - x_0}{h}$,   (6.213)
$h$ being the interpolation step, then the required value of $x$ results automatically. We start with an initial approximation of the solution, customarily taking
    $q_0 = \dfrac{y - y_0}{\Delta y_0}$,   (6.214)
the solution obtained from equation (6.211) by neglecting the nonlinear terms. If $f$ is of class $C^{n+1}([a, b])$, $[a, b]$ being the interval that contains the division points, while $f$ is the function that connects the values $x_i$ and $y_i = f(x_i)$, $i = \overline{0, n}$, then the iterative sequence given by the relation
    $q_{p+1} = \dfrac{1}{\Delta y_0}\left[y - y_0 - \dfrac{q_p(q_p - 1)}{2!}\Delta^2y_0 - \cdots - \dfrac{q_p(q_p - 1)\cdots(q_p - n + 1)}{n!}\Delta^ny_0\right]$, $p \in \mathbb{N}$,   (6.215)
where $q_0$ is defined by equation (6.214), is convergent to $q$, the solution of equation (6.211), and the problem is thus solved.

If the knots are arbitrary, then instead of constructing the Lagrange polynomial that gives $y$ as a function of $x$, we construct the Lagrange polynomial that gives $x$ as a function of $y$, that is,
    $x = \sum_{i=0}^{n}\dfrac{(y - y_0)\cdots(y - y_{i-1})(y - y_{i+1})\cdots(y - y_n)}{(y_i - y_0)\cdots(y_i - y_{i-1})(y_i - y_{i+1})\cdots(y_i - y_n)}\,x_i$   (6.216)
or
    $x = x_0 + [y_0, y_1](y - y_0) + [y_0, y_1, y_2](y - y_0)(y - y_1) + \cdots + [y_0, y_1, \ldots, y_n](y - y_0)(y - y_1)\cdots(y - y_{n-1})$,   (6.217)
the problem being solved by a simple numerical substitution. Obviously, this method may be applied in the case of equidistant knots as well.

6.9 DETERMINATION OF THE ROOTS OF AN EQUATION BY INVERSE INTERPOLATION

The method of determination of the roots of an equation by inverse interpolation is an application of the preceding section. The idea consists in constructing a table of values, with knots that are equidistant or not, and in finding the value $x$ for which $f(x) = 0$ on a certain interval.

One application is the determination of the eigenvalues of a matrix. Let us consider the characteristic equation written in the form
    $D(\lambda) = \begin{vmatrix} a_{11} - \lambda & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} - \lambda & \cdots & a_{2n} \\ \cdots & \cdots & \cdots & \cdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} - \lambda \end{vmatrix} = 0$   (6.218)
and let us give $\lambda$ the values $0, 1, 2, \ldots, n$, resulting in $D(0), D(1), \ldots, D(n)$. By using Newton's forward formula, we obtain
    $D(\lambda) = D(0) + \lambda\Delta D(0) + \dfrac{\lambda(\lambda - 1)}{2!}\Delta^2D(0) + \cdots + \dfrac{\lambda(\lambda - 1)\cdots(\lambda - n + 1)}{n!}\Delta^nD(0)$.   (6.219)
On the other hand,
    $\dfrac{\lambda(\lambda - 1)\cdots(\lambda - r + 1)}{r!} = \sum_{p=1}^{r}c_{pr}\lambda^p$, $r = \overline{1, n}$,   (6.220)
so that expression (6.219) reads
    $D(\lambda) = D(0) + \sum_{p=1}^{n}\left[\lambda^p\sum_{i=p}^{n}c_{pi}\Delta^iD(0)\right]$,   (6.221)
thus obtaining Markoff's formula.
  • 342. 334 INTERPOLATION AND APPROXIMATION OF FUNCTIONS If, instead of the values 0, 1, . . . , n, we choose the values a, a + h, . . . , a + nh, then Markoff’s formula takes the form D(λ) = D(a) + n p=1  (λ − a)p n i=p cpi hi i D(a)  . (6.222) Let us consider, for example, that the matrix A is given by A =   1 0 3 1 2 −1 0 3 1   ; (6.223) then D(0) = 1 0 3 1 2 −1 0 3 1 = 14, D(1) = 0 0 3 1 1 −1 0 3 0 = 9, D(2) = −1 0 3 1 0 −1 0 3 −1 = 6, D(3) = −2 0 3 1 −1 −1 0 3 −2 = −1, (6.224) λ 1! = λ, λ(λ − 1) 2! = λ2 2 − λ 2 , λ(λ − 1)(λ − 2) 3! = λ3 6 − λ2 2 + λ 3 , (6.225) c11 = 1, c12 = − 1 2 , c22 = 1 2 , c13 = 1 3 , c23 = − 1 2 , c33 = 1 6 . (6.226) We thus construct Table 6.3, the table of finite differences. We obtain D(λ) = D(0) + λ(c11 D(0) + c12 2 D(0) + c13 3 D(0)) + λ2 (c22 2 D(0) + c23 3 D(0)) + λ3 c33 3 D(0) = 14 − 8λ + 4λ2 − λ3 . (6.227) Let the function f : R → R, f (λ) = −λ3 + 4λ2 − 8λ + 14, the derivative of which is f (λ) = −3λ2 + 8λ − 8. The equation f (λ) = 0 has no real roots, and hence the function f (λ) is strictly decreasing on R. It follows that the equation f (λ) = 0 has a single real root; because D(2) > 0, D(3) < 0, we may state that this root is between 2 and 3. Refining this interval a little, we find that the root is between 2.7 and 3, a situation for which Table 6.4 of finite differences has been created. TABLE 6.3 The Table of Finite Differences λ D D 2 D 3 D 0 14 −5 2 −6 1 9 −3 −4 2 6 −7 3 −1
  • 343. INTERPOLATION BY SPLINE FUNCTIONS 335 TABLE 6.4 Table of Finite Differences λ f (λ) f 2 f 3 f 2.7 1.877 −0.869 −0.088 −0.006 2.8 1.008 −0.957 −0.094 2.9 0.051 −1.051 3.0 −1 We choose λ0 = 2.9, which corresponds to q0 = 2. We have q1 = 0 − 1.877 −0.869 − 2 × 1 2! × (−0.869) × (−0.088) − 2 × 1 × 0 3! × (−0.869) × (−0.006) = 2.05869, q2 = 2.04945, q3 = 2.05093, q4 = 2.05069, q5 = 2.05073, (6.228) from which we obtain the root of the equation f (λ) = 0, that is, λ ≈ 2.7 + 0.1q5 = 2.905, (6.229) for which f (λ) = 0.00073. (6.230) 6.10 INTERPOLATION BY SPLINE FUNCTIONS Let us consider a function f : [a, b] → R and an approximation of the same by an interpolation polynomial P such that P (xi) = f (xi) = yi, i = 0, n, xi being the interpolation knots. For higher values of n, there is a better chance for the degree of the interpolation polynomial to increase (obviously, remaining n at the most). But, a polynomial of a higher degree has a deep oscillatory character as can be seen in Figure 6.1. Because of this oscillation property, interpolation polynomials of high degree are avoided. An alternative used to obtain interpolation functions is to divide the interval [a, b] in a finite set of subintervals, using for each subinterval another interpolation polynomial. We thus obtain a piecewise interpolation. Let us observe that such a method does not guarantee the differentiability of the approximation function at the ends of the subintervals. Usually, it is required that the approx- imation function be of the same class of differentiability as the original function. Practically, if the approximation function is of class C2 on [a, b], then it is sufficient for most situations. Usually, we use on each subinterval polynomial functions of third degree; hence, we realize a cubical spline interpolation. Definition 6.12 Let f : [a, b] → R and the interpolation knots be a = x0 < x1 < · · · < x = b. (6.231) A cubical spline for the function f is a function S that satisfies the following conditions: (a) Sj = S|[xj ,xj+1] is a polynomial of degree at the most 3 for each j = 0, n − 1; (b) S(xj ) = f (xj ) for any j = 0, n; (c) Sj+1(xj+1) = Sj (xj+1) for any j = 0, n − 2; (d) S (xj+1) = Sj (xj+1) for j = 0, n − 2;
  • 344. 336 INTERPOLATION AND APPROXIMATION OF FUNCTIONS f O P y Ai(xi,yi) Ai+1(xi+1,yi+1) Ai−1(xi−1,yi−1) xxi +1 yi+1 yi−1 yi xi−1 xi Figure 6.1 The oscillatory character of polynomials of high degree. (e) Sj+1(xj+1) = Sj (xj+1) for j = 0, n − 2; (f) the following boundary conditions are satisfied: • or S (x0) = S (xn) = 0 (the so-called condition of free boundary), • or S (x0) = f (x0) and S (xn) = f (xn) (the so-called condition of imposed boundary). Observation 6.16 We have to determine n polynomials of third degree Sj , j = 0, n − 1. As any polynomial of third degree has four coefficients, it follows that the interpolation by spline functions is equivalent to the determination of 4n coefficients. Condition (b) of Definition 6.12 leads to n + 1 equations, the condition (c) leads to n − 1 equations, condition (d) implies n − 1 equations, while the condition (e) leads to n − 1 equations. We thus have 4n − 2 equations to which are added the two equations of point (f) for free or imposed frontier. A system of 4n equations with 4n unknowns are thus obtained. Observation 6.17 Let us choose the polynomials Sj , j = 0, n − 1, in the form Sj (x) = aj + bj (x − xj ) + cj (x − xj )2 + dj (x − xj )3 . (6.232) Immediately, we notice that Sj (xj ) = S(xj ) = f (xj ) = aj , j = 0, n − 1. (6.233) On the other hand, aj+1 = Sj+1(xj+1) = Sj (xj+1), (6.234) hence aj+1 = aj + bj (xj+1 − xj ) + cj (xj+1 − xj )2 + dj (xj+1 − xj )3 , j = 0, n − 1, (6.235) where we have assumed that an = f (xn). (6.236)
  • 345. INTERPOLATION BY SPLINE FUNCTIONS 337 Defining bn = S (xn) (6.237) and observing that Sj (x) = bj + 2cj (x − xj ) + 3dj (x − xj ), (6.238) from which S (xj ) = bj , j = 0, n − 1, (6.239) we obtain bj+1 = bj + 2cj (xj+1 − xj ) + 3dj (xj+1 − xj )2 , j = 0, n − 1, (6.240) from condition (d). Finally, defining cn = S (xn) 2 (6.241) and applying the condition (e), we obtain the relation cj+1 = cj + 3dj (xj+1 − xj ). (6.242) Relation (6.242) leads to dj = cj+1 − cj 3(xj+1 − xj ) ; (6.243) replacing in relations (6.235) and (6.240), we obtain aj+1 = aj + bj (xj+1 − xj ) + (xj+1 − xj )2 3 (2cj + cj+1), (6.244) bj+1 = bj + (xj+1 − x)(cj + cj+1), (6.245) for j = 0, n − 1. Eliminating bj between the last two relations, it follows that the system (xj − xj−1)cj−1 + 2(xj+1 − xj−1)cj + (xj+1 − xj )cj+1 = 3 (xj+1 − xj ) (aj+1 − aj ) − 3 xj − xj−1 (aj − aj−1), j = 1, n − 1 (6.246) the unknowns being cj , j = 0, n; this system is a linear one. Theorem 6.9 If f : [a, b] → R, then f has a unique natural interpolation spline, which is a unique interpolation spline that satisfies the free boundary conditions S (a) = S (b) = 0. Demonstration. The free boundary conditions imply cn = S (xn) 2 = 0, (6.247) 0 = S (x0) = 2c0 + 6d0(x0 − x0), c0 = 0. (6.248)
  • 346. 338 INTERPOLATION AND APPROXIMATION OF FUNCTIONS System (6.246) determines the matrix         1 0 0 · · · 0 0 x1 − x0 2 x2 − x0 x1 − x0 · · · 0 0 0 x2 − x1 2(x3 − x1) · · · 0 0 · · · · · · · · · · · · · · · · · · 0 0 0 · · · 2(xn − xn−2) xn − xn−1 0 0 0 · · · 0 1         , (6.249) the determinant of which is nonzero. Observation 6.18 We can describe an algorithm for the determination of a natural spline interpo- lation function as follows: – for i = 1, n − 1, calculate αi = 3[f (xi+1)(xi − xi−1) − f (xi)(xi+1 − xi−1) + f (xi−1) (xi+1 − xi)]/[(xi+1 − xi)(xi − xi−1)]; – set β0 = 1, γ0 = 0, δ0 = 0; – for i = 1, n − 1, calculate βi = 2(xi+1 − xi−1) − (xi − xi−1)γi−1, γi = (1/βi)(xi+1 − xi), δi = (1/βi)[αi − (xi − xi−1)δi−1]; – set βn = 1, δn = 0, cn = δn; – for j = n − 1, 0, calculate cj = δj − γj cj+1, bj = [f (xj+1) − f (xj )]/(xj+1 − xj ) −[(xj+1 − xj )(cj+1 + 2cj )]/3, dj = (cj+1 − cj )/3(xj+1 − xj ); – the natural spline interpolation function reads Sj (x) = f (xj ) + bj (x − xj ) + cj (x − xj )2 + dj (x − x)3 , j = 0, n − 1. Theorem 6.10 If f : [a, b] → R, then f admits a unique spline interpolation function that satisfies the imposed boundary conditions S (a) = f (a) and S (b) = f (b). Demonstration. Because S (a) = S (x0) = b0, (6.250) equation (6.244), written for j = 0, implies f (a) = a1 − a0 x1 − x0 − x1 − x0 3 (2c0 + c1), (6.251) from which 2(x1 − x0)c0 + (x1 − x0)c1 = 3 x1 − x0 (a1 − a0) − 3f (a). (6.252) Analogically, f (b) = bn = bn−1 + (xn − xn−1)(cn−1 + cn) (6.253) and equation (6.244), written for j = n − 1, leads to f (b) = an − an−1 xn − xn−1 − xn − xn−1 3 (2cn−1 + cn) + (xn − xn−1)(cn−1 + cn) = an − an−1 xn − xn−1 + xn − xn−1 3 (cn−1 + 2cn), (6.254) from which (xn − xn−1)cn−1 + 2(xn − xn−1)cn = 3f (b) − 3 xn − xn−1 (an − an−1). (6.255)
  • 347. HERMITE’S INTERPOLATION 339 The system formed by equation (6.246), equation (6.252), and equation (6.255) is a linear system, the matrix of which is         2 x1 − x0 x1 − x0 0 · · · 0 0 0 2(x2 − x0) x1 − x0 · · · 0 0 0 x2 − x1 2(x3 − x1) · · · · · · · · · · · · · · · · · · · · · · · · · · · 0 0 0 · · · 2(xn − xn−2) xn − xn−1 0 0 0 · · · xn−1 − xn−2 2(xn − xn−1)         . (6.256) The determinant of this matrix does not vanish, hence its solution is unique. Observation 6.19 In this case too, we may give an algorithm to determine the cubical spline interpolation function with the imposed boundary conditions as follows: – set α0 = [3(f (x1) − f (x0))]/x1 − x0 − 3f (x0), αn = 3f (xn) − [3(f (xn) − f (xn−1))]/xn − xn−1; – for i = 1, n − 1, calculate αi = 3[f (xi+1)(xi − xi−1) − f (xi)(xi+1 − xi−1) + f (xi−1)(xi+1 − xi)]/[(xi+1 − xi)(xi − xi−1)]; – set β0 = 2(x1 − x0), γ0 = 1/2, δ0 = α0/2(x1 − x0), b0 = f (x0); – for i = 1, n − 1, calculate βi = 2(xi+1 − xi−1) − (xi − xi−1)γi−1, γi = (1/βi)(xi+1 − xi), δi = (1/βi)[αi − (xi − xi−1)δi−1]; – set βn = (xn − xn−1)(2 − γn−1), δn = (1/βn)[αn − (xn − xn−1)δn−1], cn = δn; – for j = n − 1, 0, calculate cj = δj − γj cj + 1, bj = {[f (xj+1) − f (xj )]/xj+1 − xj } − {[(xj+1 − xj )(cj+1 + 2cj )]/3}, dj = (cj+1 − cj )/3(xj+1 − xj ); – the cubical spline interpolation function is given by Sj (x) = f (xj ) + bj (x − xj ) + cj (x − xj )2 + dj (x − xj )3 , j = 0, n − 1. 6.11 HERMITE’S INTERPOLATION Definition 6.13 Let [a, b] be an interval of the real axis, with n + 1 distinct points in this interval x0, x1, . . . , xn and mi, i = 0, n, n + 1 integers associated to the points xi. We denote by m the value m = max 0≤i≤n mi. (6.257) Let a function f : [a, b] → R, f at least of class Cm on the interval [a, b]. The polynomial P of minimum degree, which satisfies dk P (xi) dxk = dk f (xi) dxk (6.258) for any i = 0, n and k = 0, mi is called approximation osculating polynomial of the function f on the interval [a, b]. Observation 6.20 The degree of the approximation osculating polynomial P will be at the most M = n i=0 mi + n, (6.259) because the number of conditions that must be satisfied is n i=0 mi + n + 1 and a polynomial of degree M has M + 1 coefficients that are deduced from these conditions.
  • 348. 340 INTERPOLATION AND APPROXIMATION OF FUNCTIONS Observation 6.21 (i) If n = 0, then the approximation osculating polynomial P becomes just the Taylor polyno- mial of degree m0 for f at x0. (ii) If mi = 0 for i = 0, n, then the approximation osculating polynomial P coincides with Lagrange’s interpolation polynomial at the interpolation knots x0, x1, . . . , xn. Theorem 6.11 If f ∈ C1([a, b]), f : [a, b] → R and x0, x1, . . . , xn are n + 1 distinct points in [a, b], then the unique polynomial9 of minimum degree, which coincides with f at the knots xi, i = 0, n, and the derivative of which coincides with f at the very same points xi is given by H2n+1(x) = n j=0 f (xj )Hn,j (x) + n j=0 f (xj )Hn,j (x), (6.260) where Hn,j (x) = [1 − 2(x − xj )Ln,j (xj )]L2 n,j (x) (6.261) and Hn,j (x) = (x − xj )L2 n,j (x), (6.262) while Ln,j represents the polynomial coefficient of degree n and orderj, that is, Ln,j = (x − x0) · · · (x − xj−1)(x − xj+1) · · · (x − xn) (xj − x0) · · · (xj − xj−1)(xj − xj+1) · · · (xj − xn) . (6.263) If f ∈ C2n+2 ([a, b]), then the following expression of the approximation osculating polynomial error f (x) − H2n+1(x) = (x − x0)2 · · · (x − xn)2 (2n + 2)! f (2n+2) (ξ), (6.264) where ξ is a point situated between a and b, exists. Demonstration. It is similar to the proof of the existence and uniqueness of the Lagrange polynomial, formula (6.264) being obtained in an analogous way as the formula of the error in case of the Lagrange polynomial. 6.12 CHEBYSHEV’S POLYNOMIALS Definition 6.14 Let f : [a, b] → R be a real function of real variable. We call deviation from zero of the function f (x) on the segment [a, b] the greatest value of the modulus of the function f on the very same interval. Lemma 6.2 Let x ∈ [−1, 1] and Tn(x) = cos(n arccos x). (6.265) Under these conditions, 9The name of the polynomial is given in honor of Charles Hermite (1822–1901).
  • 349. CHEBYSHEV’S POLYNOMIALS 341 (i) Tn(x) represents a polynomial10 of degree n in x, the dominant coefficient of which is equal to 2n−1 ; (ii) all the roots of the equation Tn(x) = 0 are distinct and in the interval [−1, 1]; (iii) the maximal value of the polynomial Tn(x) on the interval [−1, 1] is equal to 1 and exists for xk = cos 2kπ n , k = 0, 1, . . . , n 2 + 1; (6.266) (iv) the minimal value of the polynomial Tn(x) on the interval [−1, 1] is equal to −1 and exists for xl = cos (2l + 1)π n , l = 0, 1, . . . , n − 1 2 + 1. (6.267) Demonstration. From Moivre’s formula (cos α + i sin α)n = cos nα + i sin nα, n ∈ N∗ ; (6.268) considering (cos α + i sin α)n = cosn α + iC1 ncosn−1 α sin α − C2 ncosn−2 αsin2 α + · · · + in sinn α, (6.269) we obtain cos nα = cosn α − C2 ncosn−2 αsin2 α + C4 ncosn−4 αsin4 α − · · · (6.270) Choosing now α = arccos x, (6.271) hence cos α = x, sin α = 1 − x2, (6.272) formula (6.270) leads to Tn(x) = cos(n arccos x) = xn − C2 nxn−2 (1 − x2 ) + C4 nxn−4 (1 − x2 )2 − · · · (6.273) It follows that Tn is a polynomial of degree n at the most. (i) On the other hand, the coefficient of xn is given by 1 + C2 n + C4 n + · · · = 2n−1 , (6.274) so that the point (i) of the lemma is proved. (ii) The following equation cos φ = 0 (6.275) leads to the solutions φ = 2k − 1 2 π, k ∈ Z. (6.276) It follows that Tn(x) = cos(n arccos x) = 0 (6.277) 10The polynomials are named after Pafnuty Lvovich Chebysev (1821–1894) who introduced them in 1854.
  • 350. 342 INTERPOLATION AND APPROXIMATION OF FUNCTIONS if and only if n arccos x = 2k − 1 2 π, x = cos 2k − 1 2n π , k ∈ Z. (6.278) Giving the values 1, 2, 3, . . . , n to k, we get n distinct roots of the equation Tn(x) = 0, that is, x1 = cos π 2n , x2 = cos 3π 2n , x3 = cos 5π 2n , . . . , xn = cos 2n − 1 2n π . (6.279) (iii) From (6.265) it follows that −1 ≤ Tn(x) ≤ 1. (6.280) The condition Tn(x) = 1 leads to n arccos x = 2kπ, k ∈ Z, (6.281) obtaining immediately relation (6.266). (iv) It is analogous to point (iii), the condition Tn(x) = −1 leading to n arccos x = (2k + 1)π, k ∈ Z. (6.282) Definition 6.15 The polynomials Kn(x) = 21−n Tn(x), x ∈ [−1, 1], are called Chebyshev’s poly- nomials. Theorem 6.12 (Chebyshev) (i) The deviation from zero of the polynomial Q(x) = xn + a1xn−1 + a2xn−2 + · · · + an−1x + an (6.283) cannot be less then 21−n on the interval [−1, 1] and it is equal to 21−n only for Chebyshev’s polynomial Kn(x). (ii) There exists a unique polynomial of degree n with the dominant coefficient equal to 1, the deviation of which on the segment [−1, 1] is equal to 21−n , this polynomial being, obviously, Kn(x). Demonstration (i) Let us suppose, per absurdum, that there would exist a polynomial Q(x) of the form (6.283) for which the deviation from zero would be less than 21−n . This means that for any x ∈ [−1, 1], we have − 1 2n+1 < Q(x) < 1 2n+1 (6.284) or, equivalently, Q(x) − 1 2n+1 < 0, Q(x) + 1 2n+1 > 0. (6.285)
  • 351. CHEBYSHEV’S POLYNOMIALS 343 Let us consider the polynomial P (x) = Q(x) − Kn(x). (6.286) Because the coefficients of the terms of maximal degree are equal to 1 both for Q(x) and forKn(x), it follows that P (x) is a polynomial of degree n − 1 at the most. On the other hand, from formulae (6.266) and (6.267) it follows that P (1) = Q(1) − Kn(1) < 0, P cos π n = Q cos π n − Kn cos π n > 0, P cos 2π n = Q cos 2π n − Kn cos 2π n < 0, P cos 3π n = Q cos 3π n − Kn cos 3π n > 0, . . . (6.287) This means that for x = 1, x = cos(2π/n), x = cos(4π/n), . . . , the polynomial P (x) is negative, while for x = cos(π/n), x = cos(3π/n), . . . , the polynomial P (x) is positive. It follows that the polynomial P (x) has at least one root between 1 and cos(π/n), at least one root between x = cos(π/n) and x = cos(2π/n), . . . , at least one root between x = cos[(n − 1)π/n] and x = cos π = 1. Hence, the polynomial P (x) has at least n roots. But P (x) is of degree n − 1 at the most. That means that P (x) = 0, hence Q(x) = Kn(x). (ii) Let us assume now, per absurdum too, that there exists a polynomial Q(x) of degree n at the most, the dominant coefficient of which is equal to 1 and for which the deviation from zero on the segment [−1, 1] is equal to 21−n . Let P (x) = Q(x) − Kn(x), (6.288) which obviously is a polynomial of degree n − 1 at the most. For the polynomial P (x) we may state that it has nonpositive values at the points x = 1, x = cos(2π/n), x = cos(4π/n), . . . , while at the points x = cos(π/n), x = cos(3π/n), . . . it has nonnegative ones. It follows that on each interval [−1, cos((n − 1)π/n)], [cos((n − 1)π/n), cos((n − 2)π/n)], . . . , [cos(3π/n), cos(2π/n)], [cos(2π/n), cos(π/n)], [cos(π/n), 1] the equation P (x) = 0 has at least one root. But, although we have n intervals, the number of roots of the equation P (x) = 0 may be less than n because a root may be the common extremity of two neighboring intervals. Let us now consider such a case, for example, the case in which the root is x = cos(π/n). This means that in the interval [cos(2π/n), 1] the equation P (x) = 0 has a single root, that is, x. Because of this, it follows that the curve y = P (x) is tangential to the Ox-axis at the point x = cos(π/n). If not, then the curve y = P (x) pierces the Ox-axis at the point x and P (x) becomes positive either on the interval (cos(2π/n), cos(π/n)) or on the interval (cos(π/n), 1). But P (x) is a continuous function and P (cos(2π/n)) < 0, P (1) < 0, and hence the equation P (x) = 0 has the second root on the interval [cos(2π/n), 1], which is a contradiction, from which the curve y = P (x) is tangential to the Ox-axis at the point x. This means that x is a double root of the equationP (x) = 0. Let us suppose now that x is not a double root of the equation P (x) = 0. Hence, the equation may be written in the form x − cos π n P1(x) = 0, (6.289) where the polynomial P1(x) is of degree n − 2 at the most and P1(x) = 0. But P1(x) is a continuous function so that it has a constant sign in a neighborhood V of x. But the polynomial P (x) = x − cos π n P1(x) (6.290)
  • 352. 344 INTERPOLATION AND APPROXIMATION OF FUNCTIONS changes the sign on V , together with the factor x − cos(π/n); it means that the curve y = P (x) pierces the axis Ox at the point x, which is not possible. Hence, if x = cos(π/n) is a root of the equation P (x) = 0, then it is at least a double one. It follows that on each interval of the form [cos(kπ/n), cos((k − 1)π/n)], k = 1, n − 1, we have at least one root of the equation P (x) = 0 (if it is in the interior of the interval, then it is at least a single one; if it is at one of the frontiers (excepting the ends −1 and 1 where one has no roots) it is at least a double one). The equation P (x) = 0 will thus have at least n roots (distinct or not), P (x) being a polynomial of degree n − 1 at the most. It follows that P (x) is an identical zero polynomial and the point (ii) of the theorem is proved. 6.13 MINI–MAX APPROXIMATION OF FUNCTIONS Let the function f : [a, b] → R and its approximate g : [a, b] → R. We suppose that both f and g are at least of class C0 on the interval [a, b]. The mini–max principle requires that the approximation function g satisfies the condition max x∈[a,b] |f (x) − g(x)| = minimum. (6.291) Observation 6.22 Condition (6.291) is incomplete at least for one reason: the kind of function we require for the approximate gis not specified. Usually, g is required in the set of polynomial functions. Let us consider on the interval [a, b] a division formed by the points x0, x1, . . . , xn so that xi < xi+1, i = 0, n − 1, and let g : [a, b] → R the approximate of the function f , which we require in the form of a polynomial Pn(x) of degree n at the most. The mini–max principle given by relation (6.291) is thus written in the form max x∈[a,b] |f (x) − Pn(x)| = minimum. (6.292) In this case, the required polynomialPn(x) will have the smallest deviation from the function f on the interval [a, b]. We also require that the polynomial Pn(x) pass through the interpolation knots xi, that is, Pn(xi) = yi, yi = f (xi), i = 0, n. (6.293) In contrast to the interpolations considered until now, the interpolation knots are not known. We minimize error (6.292) by an adequate choice of knots. Lagrange’s interpolation leads to |f (x) − Pn(x)| = |f (n+1)(ξ)| (n + 1)! |(x − x0)(x − x1) · · · (x − xn)|, (6.294) where ξ is a point situated between a and b, while f is at least of class Cn+1 on [a, b]. Let us consider the product Rn+1(x) = (x − x0)(x − x1) · · · (x − xn) (6.295) and let us make the change of variable x = b − a 2 u + b + a 2 , (6.296) so that the interval [a, b] is transformed into the interval [−1, 1]. It follows that Rn+1(u) = b − a 2 n+1 (u − u0)(u − u1) · · · (u − un). (6.297)
  • 353. ALMOST MINI–MAX APPROXIMATION OF FUNCTIONS 345 As we know from Chebyshev’s polynomials, the minimum of the product Rn+1(u), which is a polynomial of (n + 1)th degree in u, is realized if ui, i = 0, n, are just the zeros of Chebyshev’s polynomial Kn+1(u). We may write Rn+1(u) ≥ b − a 2 n+1 1 2n , (6.298) and formula (6.294) leads to |f (x) − Pn(x)| ≥ |f (n+1) (ξ)| (n + 1)! b − a 2 n+1 1 2n . (6.299) On the other hand, the roots of Chebyshev’s polynomial Kn+1(u) are u0 = cos 2 (n + 1) − 1 2(n + 1) π , u1 = cos 2n − 1 2 (n + 1) π , . . . , un = cos π 2 (n + 1) , (6.300) so that the interpolation knots will be xi = b − a 2 ui + b + a 2 . (6.301) Hence, it follows that among all the polynomials of degree n at the most, the one that minimizes error (6.292) is the one constructed with the abscissas of the knots given by the roots of Chebyshev’s polynomial Kn+1(x), of degree n + 1. 6.14 ALMOST MINI–MAX APPROXIMATION OF FUNCTIONS Let us give a somewhat new formulation to the mini–max optimization criterion. Instead of max x∈[a,b] |f (x) − Pn(x)| = minimum, (6.302) where f is a real function defined on [a, b], at least of class C0 on [a, b], while Pn(x) is a polynomial of degree n at the most, we will require max x∈[a,b] |f (x) − Pn(x)| ≤ ε, (6.303) where ε is a positive error a priori imposed. We reduce the problem to the interval [−1, 1], with its generality not being changed. We also suppose that f is analytic on [−1, 1], that is, f may be expanded into a convergent power series f (x) = ∞ k=0 bkxk . (6.304) Lemma 6.3 The Chebyshev polynomials constitute the basis for the vector space of real polynomials. Demonstration. The idea consists in showing that every polynomial P (x) may be written as a linear combination of the polynomials Kn(x), n ∈ N. The demonstration is made by induction after n. The affirmation is true for n = 0, because 1 is Chebyshev’s polynomial K0(x). Let us suppose that
  • 354. 346 INTERPOLATION AND APPROXIMATION OF FUNCTIONS the affirmation holds for any polynomial xk , k ≤ n, and let us state it for xn+1 . The polynomial xn+1 − Kn+1(x) is of degree n at the most for which we can write xn+1 − Kn+1(x) = α0K0(x) + α1K1(x) + · · · + αnKn(x), (6.305) with αi ∈ R, i = 0, n. It follows that xn+1 can also be written as a combination of Chebyshev polynomials and, by mathematical induction, the lemma is proved. Taking into account Lemma 6.3, it follows that relation (6.304) may be written by means of the Chebyshev polynomials as follows: f (x) = ∞ k=0 akKk(x). (6.306) Truncating series (6.306) at k = n, we get Pn(x) = n k=0 akKk(x) (6.307) and criterion (6.303) leads to |f (x) − Pn(x)| = ∞ k=n+1 akKk (x) < ∞ k=n+1 |ak||Kk(x)| ≤ ∞ k=n+1 |ak| 1 2k−1 < ∞ k=n+1 |ak| < ε. (6.308) Instead of the infinite sum ∞ k=n+1|ak| we usually consider the approximation N k=n+1|ak| so that condition (6.303) now reads N k=n+1 |ak| < ε. (6.309) Definition 6.16 The polynomial Pn(x) thus obtained is called an almost mini–max polynomial for the function f . Observation 6.23 (i) The almost mini–max polynomial Pn(x) of the function f may be different from the mini–max polynomial constructed in Section 6.13. (ii) We know that the mini–max polynomial minimizes the error, but this minimal error is not known. Using the almost mini–max polynomial, the error is less than ε > 0 imposed a priori. 6.15 APPROXIMATION OF FUNCTIONS BY TRIGONOMETRIC FUNCTIONS (FOURIER) Definition 6.17 (i) Let H be a fixed Hilbert space. We call basis in H a system B = {ei}i∈I linearly independent of elements in H for which the Hilbert subspace generated by it is dense in H. (ii) We call orthonormal basis in H (total or complete orthonormal system) any basis B of H for which we have for any two elements ei and ej of B,
  • 355. APPROXIMATION OF FUNCTIONS BY TRIGONOMETRIC FUNCTIONS (FOURIER) 347 ei, ej = δij , (6.310) where ·, · is the scalar product on H, while δij is Kronecker’s symbol δij = 1 for i = j, 0 otherwise. (6.311) (iii) Let H be a Hilbert space with an orthonormal basis B = {en}n≥1. For any arbitrary u ∈ H, we call generalized Fourier coefficients of u relative to B the numbers cn = u, en , n ≥ 1, (6.312) while the series n>1cnen is called generalized Fourier series11 of u relative to B. Theorem 6.13 (Generalization of Dirichlet’s Theorem). Let H be a Hilbert space with an orthonormal basis B = {e}n≥1. For any u ∈ H, its generalized Fourier series relative to B is convergent in H, its sum being equal to u. The numerical series u≥1|cn|2 is convergent, its sum being equal to u 2 . Demonstration. We must show that lim n→∞ u − n i=1 ciei = 0, (6.313) lim n→∞ u 2 − n i=1 |ci|2 = 0, (6.314) respectively. Let un = n i=1 ciei, n ≥ 1, (6.315) where ci given by equation (6.312) are the Fourier coefficients of u relative to the basis B. Let k ∈ N, 1 ≤ k ≤ n, arbitrary. We may write un, ek = n i=1 ci ei, ek = n i=1 ciδij = ck = u, ek , (6.316) that is, un − u, ek = 0. (6.317) Let n ≥ 1, arbitrary but fixed, and let us denote by Hn the vector subspace of H, generated by the elements e1, e2, . . . , en. It follows that un − u ∈ H⊥ n for any n ≥ 1. But Hn is a subspace of finite dimension (dim Hn = n), hence a closed set in H. Moreover, un is the projection of u on Hn. Because u − un 2 + un − v 2 = u − v 2 , (6.318) 11The series is called after Jean Baptiste Joseph Fourier (1768–1830) who published his results in M´emoire sur la propagation de la chaleur dans les corps solides in 1807 and then in Th´eorie analytique de la chaleur in 1822. The first steps in this field were made by Leonhard Euler (1707–1783), Jean-Baptiste le Rond d’Alembert (1717–1783), and Daniel Bernoulli (1700–1782).
  • 356. 348 INTERPOLATION AND APPROXIMATION OF FUNCTIONS corresponding to Pythagoras’s theorem, it follows that u − un ≤ u − v . (6.319) Let ε > 0 be fixed. Because the subspace generated by B in H is dense, it follows that there exists v ∈ H, a finite linear combination of elements of B such that u − v < ε. (6.320) It follows that there exists a natural number N(ε) such that v ∈ Hn for any n ≥ N(ε), and from (6.319) and (6.320) we obtain u − un < ε (6.321) too for any n ≥ N(ε). We have shown that un → u in H, that is, ∞ i=1 ciei = u. (6.322) On the other hand, u 2 = un, un = n i=1 ciei, n j=1 cj ej = n i=1 n j=1 cicj δij = n i=1 |ci|2 , (6.323) a relation valid for any n ≥ 1. Making n → ∞ and considering that un → u , it follows that ∞ i=1 |ci|2 = u 2 . (6.324) Definition 6.18 Relation (6.324) is called the relation or the equality of Parseval. Corollary 6.4 (i) If the basis B is fixed and u ∈ H, then the Fourier expansion of u is unique. (ii) For any n ≥ 1 we have Bessel’s inequality n i=1 |ci|2 ≤ u 2 (6.325) and lim n→∞ cn = 0. (6.326) (iii) Let H = L2 [−π,π], that is, the space of real square integrable functions, on which the scalar product f, g = π −π f (x)g(x)dx (6.327) has been defined, and let us consider as orthonormal basis in H the sequence e1 = 1 √ 2π , e2 = 1 √ π cos x, e3 = 1 √ π sin x, e4 = 1 √ π cos 2x, e5 = 1 √ π sin 2x, . . . (6.328)
  • 357. APPROXIMATION OF FUNCTIONS BY TRIGONOMETRIC FUNCTIONS (FOURIER) 349 Under these conditions, for u : [−π, π] → R, u ∈ H, the generalized Fourier coefficients of u relative to the orthonormal basis B = {e}n≥1 are c1 = π 2 a0, c2 = a1 √ π, c3 = b1 √ π, c4 = a2 √ π, c5 = b2 √ π, . . . , (6.329) where an = 1 π π −π u(x) cos(nx)dx, bn = 1 π π −π u(x) sin(nx)dx, n ≥ 0. (6.330) Parseval’s equality now reads a2 0 2 + ∞ i=1 (a2 i + b2 i ) = 1 π π π u2 (x)dx. (6.331) (iv) (Dirichlet’s theorem) If the periodic function f (x) of period 2π satisfies Dirichlet’s condi- tions in the interval (−π, π), that is, (a) f is uniformly bounded on (−π, π), that is, there exists M > 0 and finite such that |f (x)| ≤ M for any x ∈ (−π, π), and (b) f has a finite number of strict extremes, then, at each point of continuity x ∈ (−π, π), the function f (x) may be expanded into a trigonometric Fourier series f (x) = a0 2 + ∞ i=1 [ai cos(ix) + bi sin(ix)], (6.332) where the Fourier coefficients ai and bi are given by ai = 1 π π −π f (x) cos(ix)dx, i = 0, 1, 2, . . . , (6.333) bi = 1 π π −π f (x) sin(ix)dx, i = 1, 2, . . . , (6.334) respectively. If x ∈ (−π, π) is a point of discontinuity for the function f (x), then the sum S(x) of the Fourier series (6.332) attached to f reads S(x) = f (x − 0) + f (x + 0) 2 . (6.335) At the ends, we have S(−π) = S(π) = f (−π + 0) + f (π + 0) 2 . (6.336) Demonstration (i) Let us suppose, per absurdum, that the expansion is not unique, that is, u = ∞ i=1 ciei and u = ∞ i=1 diei, (6.337) where there exists at least i ∈ N∗ such that ci = di. Let vn = n i=1 diei. It follows that vn, ei = di for any i ≤ n, making n → ∞; because vn → u it also follows that u, ei = di, that is, di = ci for any i ≥ 1.
  • 358. 350 INTERPOLATION AND APPROXIMATION OF FUNCTIONS (ii) The relations are obvious, taking into account Parseval’s equality. (iii) We successively have c1 = u, e1 = π −π u(x) 1 √ 2π dx = π 2 a0, (6.338) c2 = u, e2 = π −π u(x) 1 √ π cos xdx = √ πa1, (6.339) c3 = u, e3 = π −π u(x) 1 √ π sin xdx = √ πb1 (6.340) and, in general, all the requested relations are satisfied. Parseval’s equality becomes π −π u2 (x)dx = n i=1 |ci|2 = π 2 a2 0 + π ∞ i=1 (a2 i + b2 i ) = π a2 0 2 + ∞ i=1 a2 i + b2 i , (6.341) that is, relation (6.331). (iv) Obviously, a function f that satisfies Dirichlet’s conditions is a function of L2 [−π,π] and the theorem is proved. At the points of discontinuity, the Fourier series is replaced by relations (6.335) and (6.336), respectively, which may or not satisfy equality (6.332). Observation 6.24 (i) If the function f (x) is even, f (x) = f (−x), then bi = 0, for any i ∈ N∗ and the Fourier series becomes f (x) = a0 2 + ∞ i=1 ai cos(ix), ai = 2 π π 0 f (x) cos(ix)dx, i ∈ N. (6.342) (ii) If the function f (x) is odd f (−x) = −f (x), then ai = 0, i ∈ N, and the Fourier series reads f (x) = ∞ i=1 bi sin(ix), bi = 2 π π 0 f (x) sin(ix)dx, i ∈ N. (6.343) (iii) If the function f (x) satisfies Dirichlet’s conditions on the interval (−l, l), then we have the expansion f (x) = a0 2 + ∞ i=1 ai cos πi l x + bi sin πi l x , (6.344) where ai = 1 l l −l f (x) cos πi l x dx, i = 0, 1, 2, . . . , (6.345) bi = 1 l l −l f (x) sin πi l x dx, i = 1, 2, 3, . . . (6.346)
  • 359. APPROXIMATION OF FUNCTIONS BY TRIGONOMETRIC FUNCTIONS (FOURIER) 351 (iv) If the function f (x) satisfies Dirichlet’s conditions on a finite interval (a, b), then we make the change of variable x = αz + β, (6.347) so that a = −απ + β, b = απ + β, (6.348) from which β = a + b 2 , α = b − a 2π . (6.349) Transformation (6.347) may be written as x = b − a 2π z + b + a 2 . (6.350) Let us consider now the case in which the function f is given numerically, that is, we know the values yi = f (xi), (6.351) with xi, i = 0, n, division knots, xi ∈ [−π, π]. We denote by S(x) the series S(x) = a0 2 + m k=1 ak cos(kx) + m k=1 bk sin(kx). (6.352) The coefficients ai, i = 0, n, and bi, i = 1, n, are determined by the condition of minimal error εf = n i=0 [yi − S(xi)]2 = minimum. (6.353) There result the conditions ∂εf ∂aj = 0, j = 0, m; ∂εf ∂bj = 0, j = 1, m. (6.354) Taking into account that ∂S(xi) ∂a0 = 1 2 , ∂S(xi) ∂aj = cos(jxi), ∂S(xi) ∂bj = sin(jxi), j = 1, m, (6.355) Equation (6.353) and equation (6.354) lead to the system n i=0 yi = n i=0 S(xi), n i=0 yi cos(jxi) = n i=0 S(xi) cos(jxi), n i=0 yi sin(jxi) = n i=0 S(xi) sin(jxi), j = 1, m. (6.356) The system is compatible if n + 1 ≥ 2m + 1. (6.357)
  • 360. 352 INTERPOLATION AND APPROXIMATION OF FUNCTIONS O y x (x2,y2) (x1,y1) (x0,y0) (x3,y3) (xi,yi) (xN,yN) Figure 6.2 Discrete approximation by the least squares. 6.16 APPROXIMATION OF FUNCTIONS BY THE LEAST SQUARES An idea to consider the approximation function g(x) for a given function f (x) is that of writing the approximate in the form of a finite linear combination of certain functions12 = {φi}i=1,n that satisfy certain properties. Under these conditions, the approximate g(x) will be of the form g(x) = n i=1 ciφi(x), (6.358) where ci, i = 1, n are real constants. Thus, once the set is chosen, the problem is reduced to the determination of the constants ci, i = 1, n. These constants result from the condition that the graph of the approximate g(x) be sufficiently near the set M = {(xi, yi), i = 1, N}. The nearness of the approximate g(x) to the set M is calculated by means of a norm, which usually is f 2 = b a f 2 (x)dx (6.359) for f ∈ C0([a, b]) and f 2 = n i=0 |f (xi)|2 (6.360) for the discrete case, respectively. The problem of approximation of a given function f by a linear combination of the functions of the set may be seen as a problem of determination of the constants ci, i = 1, n, which minimize the expression f − n i=1 ciφi = minimum. (6.361) Definition 6.19 If the norm in relation (6.361) is one of norms (6.359) or (6.360), then the approximation of the function f (x) by g(x) = n i=1 ciφi(x) (6.362) is called approximation by the least square. 12The first description of the least squares method was given by Carl Friedrich Gauss (1777–1855) in Theoria motus corporum coelestium in sectionibus conicis Solem ambientum in 1809.
  • 361. APPROXIMATION OF FUNCTIONS BY THE LEAST SQUARES 353 Let us suppose, at the beginning, that we have a sequence of values (xi, yi), i = 0, N, as a result of the application of an unknown function f (x) on the distinct values xi, i = 0, N (Fig. 6.2). We require a straight line that realizes the best approximation. The problem is thus reduced to the minimization of the function E(a, b) = N i=0 [yi − (axi + b)]2 , (6.363) where a and b are the parameters of the straight line (d) : y = ax + b. (6.364) For minimizing expression (6.363), it is necessary that ∂E(a, b) ∂a = 0, ∂E(a, b) ∂b = 0 (6.365) or, otherwise, ∂ ∂a N i=0 [yi − (axi + b)]2 = 0, ∂ ∂b N i=0 [yi − (axi + b)]2 = 0. (6.366) System (6.366) is equivalent with a N i=0 x2 i + b N i=0 xi = N i=0 xiyi, a N i=0 xi + b(N + 1) = N i=0 yi (6.367) and has the solution a = (N + 1) N i=0 xiyi − N i=0 xi N i=0 yi (N + 1) N i=0 x2 i − N i=0 xi 2 , b = N i=0 x2 i N i=0 yi − N i=0 xiyi N i=0 xi (N + 1) N i=0 x2 i − N i=0 xi 2 . (6.368) Considering that d2 E(a, b) is everywhere positive definite, it follows that the function E(a, b) is convex; hence, the previous critical point given by relation (6.368) is a point of global minimum. Let us pass now to the general case in which the approximate g is a polynomial of nth degree g(x) = a0 + a1x + a2x2 + · · · + anxn , (6.369) with n < N. The problem is obviously reduced to the determination of the coefficients a0, a1, . . . , an, which minimize the expression E(al) = N i=0 [yi − g(xi)]2 = N i=0 y2 i − 2 N i=0   n j=0 aj x j i   yi + N i=0   n j=0 aj x j i   2 = N i=0 y2 i − 2 n j=0 aj N i=0 yix j i + n j=0 n k=0 aj ak N i=0 x j+k i . (6.370)
  • 362. 354 INTERPOLATION AND APPROXIMATION OF FUNCTIONS To minimize E(al) it is necessary that ∂E ∂al = 0, for l = 0, n. (6.371) There result the equations −2 N i=0 yix j i + 2 n k=0 ak N i=0 x j+k i = ∂E ∂aj = 0, j = 0, n. (6.372) We obtain the determined compatible system a0 N i=0 x0 i + a1 N i=0 x1 i + a2 N i=0 x2 i + · · · + an N i=0 xn i = N i=0 yix0 i , a0 N i=0 x1 i + a1 N i=0 x2 i + a2 N i=0 x3 i + · · · + an N i=0 xn+1 i = N i=0 yix1 i , . . . , a0 N i=0 xn i + a1 N i=0 xn+1 i + a2 N i=0 xn+2 i + · · · + an N i=0 x2n i = N i=0 yixn i . (6.373) Because the error is a convex function, it follows that the solution of system (6.373) is a point of global minimum. 6.17 OTHER METHODS OF INTERPOLATION 6.17.1 Interpolation with Rational Functions The interpolation with polynomials has at least one disadvantage, that is, for x → ±∞ the values of the polynomials become infinite too. Many times, we know, practically, some information about the real function, concerning its behavior at ±∞, as, for instance, it has a certain oblique asymptote, it is bounded, and so on. For this reason, we may choose as approximate function a rational one R(x) = P (x) Q(x) , (6.374) where P and Q are polynomials of mth and nth degree, respectively. We may write R(x) = a0xm + a1xm−1 + · · · + am b0xn + b1xn−1 + · · · + bn . (6.375) Because b0 may be a common factor, we may choose b0 = 1 such that expression (6.375) takes the form R(x) = a0xm + a1xm−1 + · · · + am xn + b1xn−1 + · · · + bn . (6.376) We have m + n + 1 unknown coefficients (a0, . . . , am, b1, . . . , bn) to determine in relation (6.376) so that m + n + 1 division points are necessary. If, for instance, we know that the function has an oblique asymptote of the form y = cx + d, then we obtain the values m = n + 1, a0 = c, a1 − b1c = d, the number of division points necessary to determine the coefficients thus being reduced.
  • 363. OTHER METHODS OF INTERPOLATION 355 6.17.2 The Method of Least Squares with Rational Functions We may also give in this case a criterion of optimization, that is, N i=1 yi − P xi Q(xi) 2 = minimum. (6.377) Proceeding as with the method of least squares, it follows that the coefficients of the polynomials P (x) and Q(x) will be determined by equations of the form N i=1    yi − P xi Q(xi) ∂P (xi) ∂aj Q(xi)    = 0, N i=1 yi − P xi Q(xi) P (xi) Q2(xi) ∂Q(xi) ∂bk = 0, (6.378) where j = 0, m, k = 1, n, while N is the number of the division points at which we know the values of the function. Unfortunately, system (6.378) is a nonlinear one so that the calculation of the coefficients of the polynomials P (x) and Q(x) become difficult. 6.17.3 Interpolation with Exponentials We may require an approximate of the function f (x) in the form g(x) = C1eα1x + C2eα2x + · · · + Cpeαpx , (6.379) thus introducing 2p unknowns Ci, αi, i = 1, p. These unknowns are deduced by the conditions f (xi) = yi = g(xi), i = 0, 2p − 1. (6.380) Two cases may occur: (i) The exponents are known, that is, we know the values αi, i = 1, p. In this case, because the exponentials are linearly independent, we obtain a linear system of p equations with p unknowns Ci, i = 1, p, compatible determined, of the form C1eα1x1 + C2eα2x1 + · · · + Cpeαpx1 = y1, C1eα1x2 + C2eα2x2 + · · · + Cpeαpx2 = y2, . . . , C1eα1xp + C2eα2xp + · · · + Cpeαpxp = yp. (6.381) (ii) The exponents are unknown. If the division points are equidistant, then we apply Prony’s method.13 To do this, we observe that the exponential eαij = (eαi )j = ρ j i (6.382) satisfies, for any i = 0, k − 1, a relation of the form y(j + k) + Ck−1y(j + k − 1) + Ck−2y(j + k − 2) + · · · + C0y(j) = 0, (6.383) 13The method was introduced by Gaspard Clair Franc¸ois Marie Riche de Prony (1755–1839) in 1795.
  • 364. 356 INTERPOLATION AND APPROXIMATION OF FUNCTIONS where we have supposed that the division points are xj = j − 1; this may be always made, eventually, in a scalar way on the Ox-axis. In relation (6.383), the coefficients Ci, i = 0, k − 1, are constant real numbers. The characteristic equation is of the form ρk + Ck−1ρk−1 + Ck−2ρk−2 + · · · + C0 = 0. (6.384) We remark that the original function f (x) satisfies equation (6.383), that is, f (j + k) + Ck−1f (j + k − 1) + Ck−2f (j + k − 2) + · · · + C0f (j) = 0, j = 1, n − k. (6.385) From the last relation, there result the values of the constants C0, . . . , Ck−1, while from relation (6.384) we obtain the roots ρ0, . . . , ρk−1, the interpolation exponentials being of the form g(x) = C0eρ0x + C1eρ1x + · · · + Ck−1eρk−1x . (6.386) If certain parameters are imposed, for example, we know α0, then the number of unknowns diminishes so that equation (6.384) now has an imposed root ρ0 = α0. 6.18 NUMERICAL EXAMPLES Example 6.1 Let us consider the following table of data. xi yi = f (xi) 0 −2 1 −3 2 −16 3 −35 4 −30 We solve the problem to determine the Lagrange interpolation polynomial. From the relation L4(x) = 4 i=0     4 j=0 j=i x − xj xi − xj     yi, (6.387) we deduce L4(x) = (x − 1)(x − 2)(x − 3)(x − 4) (0 − 1)(0 − 2)(0 − 3)(0 − 4) (−2) + x(x − 2)(x − 3)(x − 4) (1 − 0)(1 − 2)(1 − 3)(1 − 4) (−3) + x(x − 1)(x − 3)(x − 4) (2 − 0)(2 − 1)(2 − 3)(2 − 4) (−16) + x(x − 1)(x − 2)(x − 4) (3 − 0)(3 − 1)(3 − 2)(3 − 4) (−35) + x(x − 1)(x − 2)(x − 3) (4 − 0)(4 − 1)(4 − 2)(4 − 3) (−30) = x4 − 5x3 + 2x2 + x − 2. (6.388) Example 6.2 Let the function f : [−1, ∞) → [0, ∞), f (x) = √ x + 1. (6.389)
  • 365. NUMERICAL EXAMPLES 357 We wish to determine approximations of √ 1.1 and √ 0.89, by means of the expansions into a Taylor series of the function f . Because f (x) = (x + 1)− 1 2 2 , f (x) = − (x + 1)− 3 2 22 , f (x) = 1 × 3 23 (x + 1)− 5 2 , . . . , f (n) (x) = (−1)n+1 (2n − 3)!! 2n (x + 1) 1−2n 2 , n ≥ 2, (6.390) we deduce f (0) = 1 2 , f (0) = − 1 22 , f (0) = 1 × 3 23 , . . . , f (n) (0) = (−1)n+1(2n − 3)!! 2n , n ≥ 2, (6.391) obtaining the expansion into a Taylor series around the origin f (x) = f (0) + 1 2 x 1! + n k=2 xk k! (−1)k+1(2k − 3)!! 2k + xn+1 (n + 1)! (−1)n+2(2n − 1)!! 2n+1 (1 + ξ)− 1+2n 2 , (6.392) where ξ is a point situated between 0 and x. For an approximate calculation of √ 1.1 we have x = 0.1 and it follows that f (0.1) ≈ f (0) = 1, (6.393) f (0.1) ≈ f (0) + 0.1 2 × 1! = 1.05, (6.394) f (0.1) ≈ f (0) + 0.1 2 × 1! − 0.12 22 × 2! = 1.04875, (6.395) f (0.1) ≈ f (0) + 0.1 2 × 1! − 0.12 22 × 2! + 0.13 × 3 23 × 3! = 1.0488125, (6.396) f (0.1) ≈ f (0) + 0.1 2 × 1! − 0.12 22 × 2! + 0.13 × 3 23 × 3! − 0.14 × 3 × 5 24 × 4! = 1.048808594. (6.397) The exact value is √ 1.1 = 1.048808848, (6.398) so that approximation (6.397) gives six exact decimal digits. For √ 0.89 we must take x = −0.11, and we obtain f (−0.11) ≈ f (0) = 1, (6.399)
  • 366. 358 INTERPOLATION AND APPROXIMATION OF FUNCTIONS f (−0.11) ≈ f (0) − 0.11 2 × 1! = 0.945, (6.400) f (−0.11) ≈ f (0) − 0.11 2 × 1! − 0.112 22 × 2! = 0.9434875, (6.401) f (−0.11) ≈ f (0) − 0.11 2 × 1! − 0.112 22 × 2! − 0.113 × 3 23 × 3! = 0.943404312, (6.402) f (−0.11) ≈ f (0) − 0.11 2 × 1! − 0.112 22 × 2! − 0.113 × 3 23 × 3! − 0.114 × 3 × 5 24 × 4! = 0.943398593. (6.403) On the other hand, √ 0.89 = 0.943398113, (6.404) and hence approximation (6.403) that uses the first four derivatives of the function f leads to six exact decimal digits. Example 6.3 For the function f : [−1, 3] → R we know the following values. xi yi = f (xi) −1 6 0 3 1 −2 2 9 3 78 We wish to determine approximate values for f (−0.9) and f (2.8) using forward and backward Newton’s interpolation polynomials, respectively. To do this, we construct Table 6.5 of finite differences. In the case of forward Newton’s polynomial, the value of q is given by q = x − x0 h = −0.9 + 1 1 = 0.1 (6.405) and we have P (q) = y0 + q 1! y0 + q(q − 1) 2! 2 y0 + q(q − 1)(q − 2) 3! 3 y0 + q(q − 1)(q − 2)(q − 3) 4! 4 y0, (6.406) TABLE 6.5 Table of Finite Differences xi yi yi 2yi 3yi 4yi −1 6 −3 −2 18 24 0 3 −5 16 42 1 −2 11 58 2 9 69 3 78
  • 367. NUMERICAL EXAMPLES 359 from which f (−0.9) ≈ P (0.1) = 5.8071. (6.407) For the backward Newton’s polynomial we may write q = x − xn h = 2.8 − 3 1 = −0.2, (6.408) P (q) = y4 + q 1! y3 + q(q + 1) 2! 2 y2 + q(q + 1)(q + 2) 3! 3 y1 + q(q + 1)(q + 2)(q + 3) 4! 4 y0, (6.409) hence f (2.8) ≈ P (−0.2) = 56.7376. (6.410) Example 6.4 Let the function f : [−3, 3] → R be given by the following table of values. xi yi = f (xi) −3 68 −2 42 −1 18 0 2 1 0 2 18 3 62 We wish to have an approximate value for f (0.5). We construct Table 6.6 of finite differences. We have x0 = 0, x−1 = −1, x−2 = −2, x−3 = −3, x1 = 1, x2 = 2, x3 = 3, (6.411) h = 1, q = x − x0 h = 0.5. (6.412) TABLE 6.6 Table of Finite Differences xi yi = f (xi) yi 2 yi 3 yi 4 yi 5 yi 6 yi −3 68 −26 2 6 0 0 0 −2 42 −24 8 6 0 0 −1 18 −16 14 6 0 0 2 −2 20 6 1 0 18 26 2 18 44 3 62
  • 368. 360 INTERPOLATION AND APPROXIMATION OF FUNCTIONS If we apply Gauss’s first formula, then we obtain f (0.5) ≈ y0 + q 1! y0 + q(q − 1) 2! 2 y−1 + (q + 1)q(q − 1) 3! 3 y−1 + (q + 1)q(q − 1)(q − 2) 4! 4 y−2 + (q + 2)(q + 1)q(q − 1)(q − 2) 5! 5 y−2 + (q + 2)(q + 1)q(q − 1)(q − 2)(q − 3) 6! 6 y−3 = −1.125. (6.413) The use of the second Gauss’s formula leads to the relation f (0.5) ≈ y0 + q(1) 1! y−1 + (q + 1)(2) 2! 2 y−1 + (q + 1)(3) 3! 3 y−2 + (q + 2)(4) 4! 4 y−2 + (q + 2)(5) 5! 5 y−3 + (q + 3)(6) 6! 6 y−3 = −1.125. (6.414) Analogically, we may use the formulae of Stirling, Bessel, or Everrett. Example 6.5 Let us consider the function f : [0, 1] → R, f (x) = ex , (6.415) as well as the intermediary points x0 = 0, x1 = 0.5, x2 = 1. (6.416) The values of the function f at these points are f (0) = 1, f (0.5) = 1.64872, f (1) = 2.71828. (6.417) If we wish to determine the natural cubic spline interpolation polynomial, then we shall calculate successively α1 = 3[f (x2)(x1 − x0) − f (x1)(x2 − x0) + f (x0)(x2 − x1)] (x2 − x1)(x1 − x0) = 2.52504, (6.418) β0 = 1, γ0 = 0, δ0 = 0, (6.419) β1 = 2(x2 − x0) − (x1 − x2)γ0 = 2, (6.420) γ1 = 1 β1 (x2 − x1) = 0.25, (6.421) δ1 = 1 β1 [α1 − (x1 − x0)δ0] = 1.26252, (6.422) β2 = 1, δ2 = 0, c2 = 0, (6.423)
  • 369. NUMERICAL EXAMPLES 361 c1 = δ1 − γ1c2 = 0, (6.424) b1 = f (x2) − f (x1) x2 − x1 − (x2 − x1)(c2 + 2c1) 3 = 1.71828, (6.425) d1 = c2 − c1 3(x2 − x1) = −0.84168, (6.426) b0 = f (x1) − f (x0) x1 − x0 − (x1 − x0)(c1 + 2c0) 3 = 1.08702, (6.427) d0 = c1 − c0 3(x1 − x0) = 0.84168. (6.428) We obtain the natural cubic spline interpolation polynomial in the form S(x) =    1 + 1.08702x + 0.84168x3 , for x ∈ [0, 0.5] 1.64872 + 1.71828(x − 0.5) + 1.26252(x − 0.5)2 −0.84168(x − 0.5)3 , for x ∈ [0.5, 1] . (6.429) If we wish to determine the cubic spline interpolation polynomial with an imposed frontier, then we must take into account that f (0) = 1, f (0.5) = 1.64872, f (1) = 2.71828, (6.430) obtaining the answer S(x) =    1 + x + 0.48895x2 for x ∈ [0, 0.5] 1.64872 + 1.64785(x − 0.5) + 0.80677(x − 0.5)2 +0.35155(x − 0.5)3 for x ∈ [0.5, 1] . (6.431) Example 6.6 Let us consider the function f : [0, 4] → R, f (x) = sin x 3 + x + sin x (6.432) and the interpolation knots x0 = 0, x1 = 1, x2 = 2, x3 = 3, x4 = 4. (6.433) If we realize the interpolation of this function by interpolation polynomials, then the limit to infinite of any such polynomial will be ±∞, in contradiction to lim x→±∞ f (x) = 0. (6.434) We realize the interpolation by rational functions and let R(x) = P (x) Q(x) (6.435) be such a function. From relation (6.434), we deduce degP < degQ (6.436)
  • 370. 362 INTERPOLATION AND APPROXIMATION OF FUNCTIONS and, because we have five interpolation points, we may take P (x) = a1x + a0, Q(x) = b2x2 + b1x + b0, (6.437) with b2 = 0, ai ∈ R, i = 0, 1, bi ∈ R, i = 0, 2. It follows that the linear system a0 b0 = f (0) = 0, a1 + a0 b2 + b1 + b0 = f (1) = 0.17380, 2a1 + a0 4b2 + 2b1 + b0 = f (2) = 0.15388, 3a1 + a0 9b2 + 3b1 + b0 = f (3) = 0.02298, 4a1 + a0 16b2 + 4b1 + b0 = f (4) = −0.12122 (6.438) which is equivalent to, a0 = 0, a1 − 0.17380b0 − 0.17380b1 − 0.17380b2 = 0, 2a1 − 0.15388b0 − 0.30776b1 − 0.61552b2 = 0, 3a1 − 0.02298b0 − 0.06894b1 − 0.20682b2 = 0, 4a1 + 0.12122b0 + 0.48488b1 + 1.93952b2 = 0. (6.439) System (6.439) is compatible indeterminate. We shall determine its general solution. To do this, we consider that a1 = λ, where λ is a real parameter. It follows that the system 0.15388b0 + 0.30776b1 + 0.61552b2 = 2λ, 0.02298b0 + 0.06894b1 + 0.20682b2 = 3λ, 0.12122b0 + 0.48488b1 + 1.93952b2 = −4λ, (6.440) with the solution b0 = −1065.4λ, b1 = 820.29λ, b2 = −140.55λ. (6.441) We deduce R(x) = λx −140.55λx2 + 820.29λx − 1065λ = x −140.55x2 + 820.29x − 1065 , λ = 0. (6.442) Example 6.7 Let us consider the function f : [−1, 1] → R, f (x) = 1 1 + x2 , (6.443) called the Runge function, for which let us choose two systems of knots of interpolation. The first system will contain four equidistant interpolation knots, that is, x0 = −1, x1 = − 1 3 , x2 = 1 3 , x3 = 1, (6.444) while the second system will have as interpolation knots the roots of the Chebyshev polynomial K4(x), that is, x0 = − 2 + √ 2 4 , x1 = − 2 − √ 2 4 , x2 = 2 − √ 2 4 , x3 = 2 + √ 2 4 . (6.445) 14The function was presented by Carl David Tolm´e Runge (1856–1927) in 1901.
  • 371. APPLICATIONS 363 TABLE 6.7 The Values of the Interpolation Knots and of Function (6.443) at These Knots x f (x) x f (x) x0 = −1 0.5 x0 = −0.9238795 0.5395043 x1 = −0.3333333 0.9 x1 = −0.3826834 0.8722604 x2 = 0.3333333 0.9 x2 = 0.3826834 0.8722604 x3 = 1 0.5 x3 = 0.9238795 0.5395043 We shall construct interpolation polynomials corresponding to each system of knots and shall verify that the deviation is minimum in the case of the second system of interpolation knots for various numbers of interpolation knots. The Lagrange polynomial that passes through the interpolation knots zi, i = 0, 3, reads L3(x) = (x − z1)(x − z2)(x − z3) (z0 − z1)(z0 − z2)(z0 − z3) y0 + (x − z0)(x − z2)(x − z3) (z1 − z0)(z1 − z2)(z1 − z3) y1 + (x − z0)(x − z1)(x − z3) (z2 − z0)(z2 − z1)(z2 − z3) y2 + (x − z0)(x − z1)(x − z2) (z3 − z0)(z3 − z1)(z3 − z2) y3, (6.446) where yi = f (zi), i = 0, 3. We construct Table 6.7 with the values of the interpolation knots and of function (6.443) at these knots. The Lagrange polynomial for the first system of interpolation knots reads L(1) 3 (x) = −0.45x2 + 0.95. (6.447) The Lagrange polynomial for the second set of interpolation knots is L(2) 3 (x) = −0.4705883x2 + 0.9411765. (6.448) In general, calculating the values of the function f and of the polynomials L(1) n (x) and L(2) n (x) on the interval [−1, 1] with the step x = 0.001, we have determined the values in Table 6.8. We have denoted by εeq the maximum deviation for equidistant points, by Peq the points at which this deviation takes place, by εCh the maximum deviation with Chebyshev knots, and by PCh the points at which the maximum deviation with Chebyshev knots takes place. We observe that for the interpolation knots given by the roots of the Chebyshev polynomial the error is stable at values of order 10−15 ; for equidistant interpolation knots, the error is unbounded; thus, the oscillatory character of the polynomials of higher degree is established. 6.19 APPLICATIONS Problem 6.1 Let us consider the planar linkage in Figure 6.3, where OA = d, OC = c, AB = a, BC = b, and CM = λBC; the polynomials of first, second, and third degree, that approximate, in the sense of the least squares, the trajectory of the point M if the positions of the points Mi, specified by the angles φi = − 3π 4 + (i − 1) π 4 , i = 1, 7, (6.449) are known, have to be determined.
  • 372. 364 INTERPOLATION AND APPROXIMATION OF FUNCTIONS TABLE 6.8 Deviation n εeq Peq εCh PCh 4 0.058359 ±0.701 0.058824 0 5 0.022282 ±0.827 0.012195 ±1 6 0.014091 ±0.851 0.10101 0 7 0.006873 ±0.894 0.002092 ±1 8 0.004273 ±0.905 0.001733 0 9 0.002258 ±0.925 0.000359 ±1 10 0.001425 ±0.931 0.000297 0 11 0.000791 ±0.943 0.00062 ±1 12 0.000501 ±0.947 0.00051 0 13 0.00029 ±0.954 0.00001 ±1 14 0.0001815 ±0.957 88 × 10−7 0 15 0.0001061 ±0.962 18 × 10−7 ±1 16 6.73 × 10−5 ±0.964 1.5 × 10−6 0 17 3.99 × 10−5 ±0.968 3.11 × 10−7 ±1 18 2.54 × 10−5 ±0.969 2.58 × 10−7 0 19 1.52 × 10−5 ±0.972 5.34 × 10−8 ±1 20 9.67 × 10−6 ±0.973 4.42 × 10−8 0 25 8.84 × 10−7 ±0.980 2.70 × 10−10 ±1 30 8.79 × 10−8 ±0.984 6.57 × 10−12 0 35 1.92 × 10−8 ±0.979 4.02 × 10−14 ±0.964 40 4.13 × 10−7 ±0.989 1.78 × 10−15 ±0.082 45 9.37 × 10−6 ±0.991 1.22 × 10−15 ±0.052 50 0.0003145 ±0.988 1.22 × 10−15 ±0.319 60 0.365949 ±0.994 1.67 × 10−15 ±0.163 70 218.546 ±0.990 1.67 × 10−15 ±0.035 80 171416 ±0.995 1.67 × 10−15 ±0.056 90 2.03 × 108 ±0.996 1.55 × 10−15 ±0.753 100 1.47 × 1011 ±0.998 2 × 10−15 ±0.054 200 1.42 × 1041 ±0.998 2.78 × 10−15 ±0.544 300 3.95 × 1070 ±0.999 2.66 × 10−15 ±0.043 400 4.67 × 10100 ±0.999 3.33 × 10−15 ±0.320 500 4.23 × 10130 ±0.999 3.66 × 10−15 ±0.445 Solution: 1. Theory Denoting by XC, YC the coordinates of the point C, as well as OC2 = c2, CB2 = b2, we obtain the equations X2 C + Y2 C = c2 , (6.450) [XC − (d + a cos φ)]2 + (YC − a sin φ)2 = b2 , (6.451) from which, by subtracting and using the notation f = c2 + a2 + d2 + 2ad cos φ − b2 2 , (6.452) we get the equation of first degree XC(d + a cos φ) + YCa sin φ = f. (6.453)
  • 373. APPLICATIONS 365 B ϕ Mi M2 M1 MY X O C A Figure 6.3 Problem 6.1. Further, using the notations h = fa sin φ a2 + d2 + 2ad cos φ , k = c2 (d + a cos φ)2 − f 2 a2 + d2 + 2ad cos φ , (6.454) equation (6.450) and equation (6.453) lead to the equation Y2 C − 2hYC − k = 0, (6.455) the solution of which is YC = h + h2 + k; (6.456) also, from equation (6.453) we obtain XC = f − YCa sin φ d + a cos φ . (6.457) Denoting then by X, Y the coordinates of the point M, there result X = (1 − λ)XC + λ(d + a cos φ), (6.458) Y = (1 − λ)YC + λa sin φ. (6.459) Numerical application for a = l, b = c = 3l, d = 2l, l = 1, λ = 1/3 (with a positive value for λ, it follows, on the basis of a known relation in the affine geometry, that the point M is between C and B). 2. Numerical calculation Relations (6.449), (6.452), (6.454), (6.456), (6.457), (6.458), and (6.459) lead to the values in Table 6.9. Successively, the polynomials Y = 2.405819 − 0.496319X, (6.460) Y = 2.220796 + 0.377282X − 0.390308X2 , (6.461) Y = 2.209666 + 0.773455X − 0.888467X2 + 0.147989X3 (6.462) are obtained (Fig. 6.4). Problem 6.2 Let there be a mechanism with the plane follower of translation as shown in Figure 6.5; the mechanism is used for admission and evacuation of gas at heat engines.
  • 374. 366 INTERPOLATION AND APPROXIMATION OF FUNCTIONS −0.5 0 0.5 1 1.5 2 2.5 0.8 1 1.2 1.4 1.6 1.8 2 2.2 2.4 2.6 X Y −0.5 0 0.5 1 1.5 2 2.5 1 2 0.8 1.2 1.4 1.6 1.8 2.2 2.4 2.6 X Y −0.5 0 0.5 1 1.5 2 2.5 0.8 1 1.2 1.4 1.6 1.8 2 2.2 2.4 2.6 X Y (a) (b) (c) Figure 6.4 The trajectory of the point M and the polynomials of approximations by the least square method: (a) of first degree, (b) of second degree, (c) of third degree (the dashed line represents the original function).
  • 375. APPLICATIONS 367 TABLE 6.9 Numerical Results i φi XCi YCi Xi Yi 1 −2.356194 2.041879 2.197892 1.792217 1.229559 2 −1.570796 2.244990 1.989980 2.163327 0.993320 3 −0.785398 2.024246 2.214143 2.251866 1.240393 4 0.000000 1.500000 2.598076 2.000000 1.732051 5 0.785398 0.682861 2.921250 1.357610 2.183202 6 1.570796 −0.244990 2.989980 0.503340 2.326653 7 2.356194 −0.748985 2.904999 −0.068359 2.172368 Follower s Cam ϕ Figure 6.5 Problem 6.2. ϕ1 ϕ2 ϕ s2 s1 O s Figure 6.6 The displacement of the follower versus the rotation angle of the cam. If the motion law s = s(φ) of the follower, where s is the displacement and φ is the rotation angle of the cam, is piecewise polynomial, then the cam is called polydine (Fig. 6.6). Let us determine, on the interval [φ1, φ2], the Hermitic polynomial of minimal degree, which verifies the conditions si(φi) = s(φi), vi = ds dφ |φ=φi , ai = d2 s dφ2 |φ=φi , i = 1, 2. (6.463) Solution: 1. Theory Because there are six conditions, it means that the polynomial is of fifth degree and may be written in the form s = b0 + b1 + b2 2 + b3 3 + b4 4 + b5 5 , (6.464) where = φ − φ1 φ − φ0 , ∈ [0, 1]. (6.465)
  • 376. 368 INTERPOLATION AND APPROXIMATION OF FUNCTIONS Moreover, taking into account conditions (6.463), polynomial (6.464) reads s = s1P1( ) + s2P2( ) + (φ2 − φ1)[v1P3( ) + v2P4( )](φ2 − φ1)2 [a1P5( ) + a2P6( )], (6.466) where Pi( ), i = 1, 6, are polynomials of fifth degree in , which satisfy the conditions P1(0) = 1, P1(1) = 1, P1 (0) = P1 (1) = 0, P1 (0) = P1 (1) = 0, P2(0) = 0, P2(1) = 1, P2 (0) = P2 (1) = 0, P2 (0) = P2 (1) = 0, P3(0) = P3(1) = 0, P3 (0) = 1, P3 (1) = 1, P3 (0) = P3 (1) = 0, P4(0) = P4(1) = 0, P4 (0) = 0, P4 (1) = 1, P4 (0) = P4 (1) = 0, P5(0) = P5(1) = 0, P5 (0) = P5 (1) = 0, P5 (0) = 1, P5 (1) = 0, P6(0) = P6(1) = 0, P6 (0) = P6 (1) = 0, P6 (0) = 0, P6 (1) = 1. (6.467) If we express the polynomial Pi( ) and its derivatives in the form Pi( ) = c0i + c1i + c2i 2 + · · · + c5i 5 , Pi ( ) = c1i + 2c2i + 3c3i 2 + 4c4i 3 + 5c5i 4 , Pi ( ) = 2c2i + 6c3i + 12c4i 2 + 20c5i 3 , i = 1, 6, (6.468) then conditions (6.467) lead to the system c3i + c4i + c5i = αi, 3c3i + 4c4i + 5c5i = βi, 6c3i + 12c4i + 20c5i = γi, i = 1, 6, (6.469) where the constants αi, βi, γi and c0i, c1i, c2i, determined for each case, are given in Table 6.10. The solution c3i = 20αi − 8βi + γi 2 , c4i = −15αi + 7βi − γi, c5i = 12αi − 6βi + γi 2 (6.470) is obtained from system (6.469), using the data of Table 6.10; numerical results are given in Table 6.11. Thus, the six polynomials read P1( ) = 1 − 10 3 + 15 4 − 6 5 , P2( ) = 10 3 − 15 4 + 6 5 , P3( ) = − 6 3 + 8 4 − 3 5 , P4( ) = −4 3 + 7 4 − 3 5 , P5( ) = 1 2 2 − 3 2 3 + 3 2 4 − 1 2 5 , P6( ) = 1 2 3 − 4 + 1 2 5 . (6.471) Particular case: φ1 = 0 rad, s1 = 0 mm, φ2 = 1 rad, s2 = h = 7 mm, v1 = v2 = 0, a1 = a2 = 0. 2. Particular case The answer s = hP2( ) = h 10 φ3 φ3 2 − 15 φ4 φ4 2 + 6 φ5 φ5 2 = 7(10φ3 − 15φ4 + 6φ5 ) (6.472) is obtained and the diagram is shown in Figure 6.7. Problem 6.3 Let us consider the quadrangular mechanism in Figure 6.8, where AB = a, OA = b, BC = CM = OC = c. It is required to determine the distance Y0 so that the straight line Y − Y0 = 0 approximates the trajectory of the point M on the interval φ ∈ [−π/2, π/2] in the sense of the mini–max method.
Problem 6.3
Let us consider the four-bar mechanism in Figure 6.8, where AB = a, OA = b, BC = CM = OC = c. It is required to determine the distance Y_0 so that the straight line Y − Y_0 = 0 approximates the trajectory of the point M on the interval φ ∈ [−π/2, π/2] in the sense of the mini–max method.

[Figure 6.8 Problem 6.3: the mechanism and the point M(X, Y).]

Solution:
1. Theoretical aspects
Let us consider a function y = y(x) whose graph on the interval [−a, a] is symmetric with respect to the Oy-axis (Fig. 6.9). We wish to determine the straight line y − y_0 = 0 which approximates this curve in the sense of the mini–max method.

[Figure 6.9 Theoretical aspects.]

Let us choose, for example,

y_{0i} = y(0) + δy_i;  (6.473)
we then calculate

y_{i max} = max_i |y(x_i) − y_{0i}|.  (6.474)

For each trial value δy_i we construct a table of the deviations |y(x_i) − y_{0i}| at the knots x_i ∈ [0, a]; the table below has been created for δy_i = 0.2.

x       |y(x) − y_{0i}|
0        0.2
0.01     0.3
0.02     0.5
...      ...
a        0.01

We thus obtain a sequence of data of the following form.

δy_i    y_{i max}
0        0.5
0.01     0.8
...      ...
0.5      0.125
...      ...
The minimum in this table is obtained (in the case considered by us) for δy_i = 0.5 and has the value

y_{i max} = 0.125 = minimum.  (6.475)

We deduce the required straight line, of equation

y_0 − y(0) = 0.125.  (6.476)

Sometimes the problem may also be solved analytically. Let there be the function (Fig. 6.10)

y = 2x²,  x ∈ [−1, 1],  (6.477)

[Figure 6.10 Function (6.477).]

for which we consider

y_0 < f(1) = 2.  (6.478)

It follows immediately that

g(x) = |y_0 − 2x²| = { y_0 − 2x²  for |x| ≤ √(y_0/2);  2x² − y_0  for |x| > √(y_0/2) }.  (6.479)

In the first case of formula (6.479) we deduce g_max = y_0, while in the second case we have g_max = 2 − y_0. The maximum deviation is smallest when the two values are equal, that is, for y_0 = 2 − y_0; it follows that the required straight line is given by

y_0 = 1,  y − 1 = 0.  (6.480)

Let us return to the problem in Figure 6.8. The triangle OBM has a right angle at O, so that

OM = √(BM² − OB²) = √(4c² − (a² + b² + 2ab cos φ)).  (6.481)
There also result the relations

cos β = (b + a cos φ)/√(a² + b² + 2ab cos φ),  sin β = a sin φ/√(a² + b² + 2ab cos φ);  (6.482)

hence

X_M = OM cos(π/2 + β) = −a sin φ √(4c² − (a² + b² + 2ab cos φ))/√(a² + b² + 2ab cos φ),  (6.483)

Y_M = OM sin(π/2 + β) = (b + a cos φ)√(4c² − (a² + b² + 2ab cos φ))/√(a² + b² + 2ab cos φ).  (6.484)

Because X_M(−φ) = −X_M(φ), Y_M(−φ) = Y_M(φ), it follows that the trajectory of the point M is symmetric with respect to the OY-axis.

2. Numerical calculation
For a = 0.1 m, b = 0.2 m, c = 0.25 m, expressions (6.483) and (6.484) become

X_M = −0.1 sin φ √(0.2 − 0.04 cos φ)/√(0.05 + 0.04 cos φ),  (6.485)

Y_M = (0.2 + 0.1 cos φ)√(0.2 − 0.04 cos φ)/√(0.05 + 0.04 cos φ).  (6.486)

Denoting now

φ = (π/2)φ*,  φ* ∈ [−1, 1],  (6.487)

we obtain the following table of values.

φ*       X_M          Y_M
−1        0.200000     0.400000
−0.9      0.183292     0.400183
−0.8      0.164973     0.400529
−0.7      0.145533     0.400825
−0.6      0.125354     0.400968
−0.5      0.104726     0.400934
−0.4      0.083858     0.400758
−0.3      0.062893     0.400505
−0.2      0.041912     0.400251
−0.1      0.020947     0.400067
0         0.000000     0.400000
0.1      −0.020947     0.400067
0.2      −0.041912     0.400251
0.3      −0.062893     0.400505
0.4      −0.083858     0.400758
0.5      −0.104726     0.400934
0.6      −0.125354     0.400968
0.7      −0.145533     0.400825
0.8      −0.164973     0.400529
0.9      −0.183292     0.400183
1        −0.200000     0.400000
We consider now the step

ΔY = 10⁻⁶ m  (6.488)

and the interval 0.4 m ≤ Y ≤ 0.401 m. For each Y we have constructed a table of the following form (here for Y = 0.4 m).

X_M^i        Y_M^i       |Y_M^i − Y|
 0.200000    0.400000    0.000000
 0.183292    0.400183    0.000183
 0.164973    0.400529    0.000529
 0.145533    0.400825    0.000825
 0.125354    0.400968    0.000968
 0.104726    0.400934    0.000934
 0.083858    0.400758    0.000758
 0.062893    0.400505    0.000505
 0.041912    0.400251    0.000251
 0.020947    0.400067    0.000067
 0.000000    0.400000    0.000000
−0.020947    0.400067    0.000067
−0.041912    0.400251    0.000251
−0.062893    0.400505    0.000505
−0.083858    0.400758    0.000758
−0.104726    0.400934    0.000934
−0.125354    0.400968    0.000968
−0.145533    0.400825    0.000825
−0.164973    0.400529    0.000529
−0.183292    0.400183    0.000183
−0.200000    0.400000    0.000000

From the above table it follows that

max_i |Y_M^i − Y| = 0.000968.  (6.489)

Analyzing each such table, we deduce the value

min_Y max_i |Y_M^i − Y| = 0.000484,  (6.490)

obtained for

Y_0 = 0.400484 m;  (6.491)

hence, the equation of the required straight line is

Y − 0.400484 = 0.  (6.492)

In Figure 6.11 the trajectory of the point M has been drawn with a continuous line, together with the straight line (6.492) (broken line).

[Figure 6.11 Trajectory of the point M (continuous line) and its approximation by the straight line (6.492) (broken line).]
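The scan described above is easily mechanized; the following Python sketch (the names are ours) evaluates Y_M from (6.486) at the 21 knots of the table and searches the candidate ordinates with the step (6.488).

    import numpy as np

    def Y_M(phi, a=0.1, b=0.2, c=0.25):
        """Ordinate of the point M, eq. (6.486)."""
        d = a**2 + b**2 + 2*a*b*np.cos(phi)
        return (b + a*np.cos(phi)) * np.sqrt(4*c**2 - d) / np.sqrt(d)

    phi = np.linspace(-np.pi/2, np.pi/2, 21)
    Y = Y_M(phi)
    candidates = np.arange(0.400, 0.401, 1e-6)              # step (6.488)
    deviations = [np.max(np.abs(Y - Y0)) for Y0 in candidates]
    print(candidates[int(np.argmin(deviations))])           # ≈ 0.400484 m, eq. (6.491)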
FURTHER READING

Acton FS (1990). Numerical Methods that Work. 4th ed. Washington: Mathematical Association of America.
Ackleh AS, Allen EJ, Kearfott RB, Seshaiyer P (2009). Classical and Modern Numerical Analysis: Theory, Methods and Practice. Boca Raton: CRC Press.
Atkinson KE (1989). An Introduction to Numerical Analysis. 2nd ed. New York: John Wiley & Sons, Inc.
Atkinson KE (1993). Elementary Numerical Analysis. 2nd ed. New York: John Wiley & Sons, Inc.
Atkinson K, Han W (2010). Theoretical Numerical Analysis: A Functional Analysis Framework. 3rd ed. New York: Springer-Verlag.
Bakhvalov N (1976). Méthodes Numériques. Moscou: Éditions Mir (in French).
Berbente C, Mitran S, Zancu S (1997). Metode Numerice. București: Editura Tehnică (in Romanian).
Bloch S (1951). Angenäherte Synthese von Mechanismen. Berlin: Verlag Technik (in German).
Burden RL, Faires L (2009). Numerical Analysis. 9th ed. Boston: Brooks/Cole.
Butt R (2009). Introduction to Numerical Analysis Using MATLAB. Boston: Jones and Bartlett Publishers.
Chapra SC (1996). Applied Numerical Methods with MATLAB for Engineers and Scientists. Boston: McGraw-Hill.
Cheney EW, Kincaid DR (1997). Numerical Mathematics and Computing. 6th ed. Belmont: Thomson.
Dahlquist G, Björck Å (1974). Numerical Methods. Englewood Cliffs: Prentice Hall.
Démidovitch B, Maron I (1973). Éléments de Calcul Numérique. Moscou: Éditions Mir (in French).
DiBenedetto E (2010). Classical Mechanics: Theory and Mathematical Modeling. New York: Springer-Verlag.
Epperson JF (2007). An Introduction to Numerical Methods and Analysis. Hoboken: John Wiley & Sons, Inc.
Fung YC, Tong P (2011). Classical and Computational Solid Mechanics. Singapore: World Scientific Publishing.
Golub GH, van Loan CF (1996). Matrix Computations. 3rd ed. Baltimore: Johns Hopkins University Press.
Greenbaum A, Chartier TP (2012). Numerical Methods: Design, Analysis, and Computer Implementation of Algorithms. Princeton: Princeton University Press.
Hamming RW (1987). Numerical Methods for Scientists and Engineers. 2nd ed. New York: Dover Publications.
Hamming RW (2012). Introduction to Applied Numerical Analysis. New York: Dover Publications.
Heinbockel JH (2006). Numerical Methods for Scientific Computing. Victoria: Trafford Publishing.
Hildebrand FB (1987). Introduction to Numerical Analysis. 2nd ed. New York: Dover Publications.
Hoffman JD (1992). Numerical Methods for Engineers and Scientists. New York: McGraw-Hill.
Kharab A, Guenther RB (2011). An Introduction to Numerical Methods: A MATLAB Approach. 3rd ed. Boca Raton: CRC Press.
Kleppner D, Kolenkow RJ (2010). An Introduction to Mechanics. Cambridge: Cambridge University Press.
Kress R (1996). Numerical Analysis. New York: Springer-Verlag.
Krîlov AN (1957). Lecții de Calcule prin Aproximații. București: Editura Tehnică (in Romanian).
Kunz KS (1957). Numerical Analysis. New York: McGraw-Hill.
Lange K (2010). Numerical Analysis for Statisticians. 2nd ed. New York: Springer-Verlag.
Lurie AI (2002). Analytical Mechanics. New York: Springer-Verlag.
Mabie HH, Reinholtz CF (1987). Mechanisms and Dynamics of Machinery. 4th ed. New York: John Wiley & Sons, Inc.
Marciuk GI (1983). Metode de Analiză Numerică. București: Editura Academiei Române (in Romanian).
Marciuk GI, Șaidurov VV (1981). Creșterea Preciziei Soluțiilor în Scheme cu Diferențe. București: Editura Academiei Române (in Romanian).
Meriam JL, Kraige LG (2012). Engineering Mechanics: Dynamics. Hoboken: John Wiley & Sons, Inc.
Otto SR, Denier JP (2005). An Introduction to Programming and Numerical Methods in MATLAB. London: Springer-Verlag.
Pandrea N (2000). Elemente de Mecanica Solidului în Coordonate Plückeriene. București: Editura Academiei Române (in Romanian).
Pandrea N, Pârlac S, Popa D (2001). Modele pentru Studiul Vibrațiilor Automobilelor. Pitești: Tiparg (in Romanian).
Pandrea N, Popa D (2000). Mecanisme. Teorie și Aplicații CAD. București: Editura Tehnică (in Romanian).
Pandrea N, Stănescu ND (2002). Mecanică. București: Editura Didactică și Pedagogică (in Romanian).
Postolache M (2006). Modelare Numerică. Teorie și Aplicații. București: Editura Fair Partners (in Romanian).
Press WH, Teukolsky SA, Vetterling WT, Flannery BP (2007). Numerical Recipes: The Art of Scientific Computing. 3rd ed. Cambridge: Cambridge University Press.
Quarteroni A, Sacco R, Saleri F (2010). Numerical Mathematics. 2nd ed. Berlin: Springer-Verlag.
Ralston A, Rabinowitz P (2001). A First Course in Numerical Analysis. 2nd ed. New York: Dover Publications.
Reza F (1973). Spații Liniare. București: Editura Didactică și Pedagogică (in Romanian).
Ridgway Scott L (2011). Numerical Analysis. Princeton: Princeton University Press.
Rivière B (2008). Discontinuous Galerkin Methods for Solving Elliptic and Parabolic Equations: Theory and Implementation. Philadelphia: SIAM.
Salvadori MG, Baron ML (1962). Numerical Methods in Engineering. Englewood Cliffs: Prentice Hall.
Sauer T (2011). Numerical Analysis. 2nd ed. London: Pearson.
Simionescu I, Dranga M, Moise V (1995). Metode Numerice în Tehnică. Aplicații în FORTRAN. București: Editura Tehnică (in Romanian).
Stănescu ND (2007). Metode Numerice. București: Editura Didactică și Pedagogică (in Romanian).
Stoer J, Bulirsch R (2010). Introduction to Numerical Analysis. 3rd ed. New York: Springer-Verlag.
Süli E, Mayers D (2003). An Introduction to Numerical Analysis. Cambridge: Cambridge University Press.
Udriște C, Iftode V, Postolache M (1996). Metode Numerice de Calcul. Algoritmi și Programe Turbo Pascal. București: Editura Tehnică (in Romanian).
7 NUMERICAL DIFFERENTIATION AND INTEGRATION

7.1 INTRODUCTION

Numerical differentiation is used if the function to be differentiated is defined numerically by its values y_i at the knots x_i,

y_i = f(x_i),  i = 0, …, n,  (7.1)

with f : D ⊂ R → R, or if the expression of the function is very complicated and difficult to use, or if the function is the solution of an equation or of a system of equations. The operation of differentiation is, in general, avoided, because it amplifies small errors. Such an example is given in Figure 7.1, where the function f has been drawn with a continuous line, while its approximation f̃ has been drawn with a broken one. The function and its approximation pass through the points A_{i−1}(x_{i−1}, y_{i−1}), A_i(x_i, y_i), A_{i+1}(x_{i+1}, y_{i+1}). The straight line (τ) is tangent to the graph of the function f at the point A_{i−1}(x_{i−1}, y_{i−1}), while the straight line (τ_1) is tangent to the graph of the approximation f̃ at the very same point. Thus we obtain

tan α = f′(x_{i−1}),  tan α_1 = f̃′(x_{i−1})  (7.2)

and, in the figure, we observe that the error is |tan α − tan α_1|.

[Figure 7.1 Numerical differentiation.]

7.2 NUMERICAL DIFFERENTIATION BY MEANS OF AN EXPANSION INTO A TAYLOR SERIES

Let f : [a, b] → R be of class C³([a, b]) and let

a = x_0 < x_1 < x_2 < ⋯ < x_n = b  (7.3)
be a division of the interval [a, b]. Let us denote by h the magnitude

h = x_i − x_{i−1}  (7.4)

and by h_1 the magnitude

h_1 = x_{i+1} − x_i.  (7.5)

In the general case h ≠ h_1, and we may write

h_1 = hα,  (7.6)

where α ∈ R*₊. Let us consider now the expansion into a Taylor series of the function f around the point x_i,

f(x) = f(x_i) + ((x − x_i)/1!) f′(x_i) + ((x − x_i)²/2!) f″(x_i) + ((x − x_i)³/3!) f‴(ξ),  (7.7)

where ξ is a point situated between x and x_i. We may also write

ξ = x_i + θ(x − x_i),  (7.8)

where θ ∈ (0, 1). It follows that

f(x) = f(x_i) + ((x − x_i)/1!) f′(x_i) + ((x − x_i)²/2!) f″(x_i) + ((x − x_i)³/3!) f‴[x_i + θ(x − x_i)].  (7.9)

Let us now consider the values x = x_{i−1} and x = x_{i+1}, i = 1, …, n−1. We thus obtain

f(x_{i+1}) = f(x_i) + (αh/1!) f′(x_i) + ((αh)²/2!) f″(x_i) + ((αh)³/3!) f‴(ξ_i),  (7.10)

with ξ_i situated between x_i and x_{i+1}, and

f(x_{i−1}) = f(x_i) − (h/1!) f′(x_i) + (h²/2!) f″(x_i) − (h³/3!) f‴(ζ_i),  (7.11)

where ζ_i is situated between x_{i−1} and x_i.
We now subtract the last two relations one from the other, obtaining

f(x_{i+1}) − f(x_{i−1}) = (α + 1)h f′(x_i) + (h²(α² − 1)/2!) f″(x_i) + ((αh)³/3!) f‴(ξ_i) + (h³/3!) f‴(ζ_i),  (7.12)

from which

f′(x_i) = (1/((α + 1)h))(f(x_{i+1}) − f(x_{i−1})) + ((1 − α)h/2!) f″(x_i) − (h²/(3!(α + 1)))(α³ f‴(ξ_i) + f‴(ζ_i)).  (7.13)

Observation 7.1 If f : [a, b] → R is at least of class C³([a, b]), then we may consider

f′(x_i) ≈ (1/((α + 1)h))(f(x_{i+1}) − f(x_{i−1})).  (7.14)

We now add relation (7.10) to relation (7.11) multiplied by α. It follows that

f(x_{i+1}) + αf(x_{i−1}) = (1 + α)f(x_i) + (αh²/2!)(1 + α) f″(x_i) + ((αh)³/3!) f‴(ξ_i) − (αh³/3!) f‴(ζ_i),  (7.15)

from which

f″(x_i) = (2/(α(α + 1)h²))[f(x_{i+1}) + αf(x_{i−1}) − (1 + α)f(x_i) − ((αh)³/3!) f‴(ξ_i) + (αh³/3!) f‴(ζ_i)].  (7.16)

Observation 7.2 In the same conditions as in Observation 7.1, we can use the approximate formula

f″(x_i) ≈ (2/(α(α + 1)h²))[αf(x_{i−1}) − (1 + α)f(x_i) + f(x_{i+1})].  (7.17)

Proposition 7.1 Let f : [a, b] → R be at least of class C³([a, b]). Under these conditions:
(i) the approximation error of f′(x_i) obtained by using formula (7.14) is

ε_{f′} = ((1 − α)h/2!) f″(x_i) − (h²/(3!(1 + α)))(α³ f‴(ξ_i) + f‴(ζ_i));  (7.18)

(ii) the approximation error of f″(x_i) obtained by using formula (7.17) is

ε_{f″} = (h/(3(α + 1)))(α² f‴(ξ_i) − f‴(ζ_i)).  (7.19)

Demonstration. It is immediate, using formulae (7.13) and (7.16), respectively.
Corollary 7.1 If the knots are equidistant (α = 1), then:
(i) formula (7.14) of approximation of the first-order derivative f′(x_i) takes the form

f′(x_i) ≈ (f(x_{i+1}) − f(x_{i−1}))/(2h),  (7.20)

the error being

ε_{f′} = −(h²/12)(f‴(ξ_i) + f‴(ζ_i));  (7.21)

(ii) formula (7.17) of approximation of the second-order derivative f″(x_i) reads

f″(x_i) ≈ (f(x_{i−1}) − 2f(x_i) + f(x_{i+1}))/h²,  (7.22)

the error being

ε_{f″} = (h/6)(f‴(ξ_i) − f‴(ζ_i)).  (7.23)

Corollary 7.2 If f : [a, b] → R, f ∈ C³([a, b]), and the interpolation knots are equidistant, then, denoting

M = sup_{x∈[a,b]} f‴(x),  m = inf_{x∈[a,b]} f‴(x),  (7.24)

we have

|ε_{f′}| ≤ (h²/6) max{|M|, |m|},  (7.25)
|ε_{f″}| ≤ (h/6)|M − m|.  (7.26)

Observation 7.3 For the endpoints x_0 and x_n we use the approximate formulae

f′(x_0) ≈ (−3f(x_0) + 4f(x_1) − f(x_2))/(x_2 − x_0),  (7.27)
f′(x_n) ≈ (3f(x_n) − 4f(x_{n−1}) + f(x_{n−2}))/(x_n − x_{n−2}).  (7.28)
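The equidistant formulas are immediate to program; a short Python sketch (ours, not the book's), applying (7.20) and (7.22) at the interior knots and (7.27), (7.28) at the ends:

    import numpy as np

    def first_derivative(y, h):
        """f' at all knots from equidistant samples y_i = f(x_i)."""
        d = np.empty_like(y, dtype=float)
        d[1:-1] = (y[2:] - y[:-2]) / (2*h)             # eq. (7.20)
        d[0]  = (-3*y[0] + 4*y[1] - y[2]) / (2*h)      # eq. (7.27), x2 - x0 = 2h
        d[-1] = (3*y[-1] - 4*y[-2] + y[-3]) / (2*h)    # eq. (7.28)
        return d

    def second_derivative(y, h):
        """f'' at the interior knots, eq. (7.22)."""
        return (y[:-2] - 2*y[1:-1] + y[2:]) / h**2

    x = np.linspace(0.0, np.pi, 21); h = x[1] - x[0]
    print(np.max(np.abs(first_derivative(np.sin(x), h) - np.cos(x))))  # O(h^2)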
7.3 NUMERICAL DIFFERENTIATION BY MEANS OF INTERPOLATION POLYNOMIALS

Let the function be f : [a, b] → R and the equidistant interpolation knots x_i, i = 0, …, n, so that

x_{i+1} − x_i = h = const,  i = 0, …, n−1.  (7.29)

We also denote by P(q) Newton's interpolation polynomial, where q = (x − x_0)/h for x in the superior part of the finite-difference table and q = (x − x_n)/h for x in the inferior part of the table. We approximate the derivative f′(x) = df/dx by the derivative of Newton's polynomial at the very same point,

f′(x) ≈ dP/dx.  (7.30)

We mention that we may write

dP/dx = (dP/dq)(dq/dx) = (1/h)(dP/dq),  d²P/dx² = (1/h²)(d²P/dq²),  …,  d^kP/dx^k = (1/h^k)(d^kP/dq^k), ….  (7.31)

Lemma 7.1 Let x^{(n)} be the generalized power of nth order. Under these conditions,

(d^k/dx^k) x^{(n)} = n(n − 1)⋯(n − k + 1) x^{(n−k)}.  (7.32)

Demonstration. We have

Δx^{(n)} = nh x^{(n−1)}  (7.33)

and

(d/dx) x^{(n)} = lim_{h→0} (Δx^{(n)}/h) = n x^{(n−1)}.  (7.34)

Step by step, we obtain formula (7.32).

Let P(q) be Newton's forward polynomial

P(q) = y_0 + (q^{(1)}/1!)Δy_0 + (q^{(2)}/2!)Δ²y_0 + ⋯ + (q^{(n)}/n!)Δⁿy_0.  (7.35)

Under these conditions, assuming that q = (x − x_0)/h, Lemma 7.1 leads to

dP/dx = (1/h)[Δy_0 + (2q^{(1)}/2!)Δ²y_0 + (3q^{(2)}/3!)Δ³y_0 + ⋯ + (nq^{(n−1)}/n!)Δⁿy_0]
      = (1/h)[Δy_0 + (q^{(1)}/1!)Δ²y_0 + (q^{(2)}/2!)Δ³y_0 + ⋯ + (q^{(n−1)}/(n − 1)!)Δⁿy_0],  (7.36)

d²P/dx² = (1/h²)[Δ²y_0 + (q^{(1)}/1!)Δ³y_0 + (q^{(2)}/2!)Δ⁴y_0 + ⋯ + (q^{(n−2)}/(n − 2)!)Δⁿy_0].  (7.37)

In general, we may write

d^kP/dx^k = (1/h^k)[Δ^k y_0 + (q^{(1)}/1!)Δ^{k+1}y_0 + (q^{(2)}/2!)Δ^{k+2}y_0 + ⋯ + (q^{(n−k)}/(n − k)!)Δⁿy_0].  (7.38)
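Differentiating Newton's forward polynomial exactly at x = x_0 gives the well-known series h f′(x_0) = Δy_0 − Δ²y_0/2 + Δ³y_0/3 − ⋯ (it is listed later, in (7.159)). A Python sketch (ours), built on the finite-difference table:

    import numpy as np

    def derivative_at_x0(y, h, terms=4):
        """f'(x0) from h f'(x0) = Dy0 - D^2 y0/2 + D^3 y0/3 - ..., cf. (7.159);
        the forward differences are read off the top of the difference table."""
        col = np.asarray(y, dtype=float)
        s = 0.0
        for k in range(1, terms + 1):
            col = np.diff(col)                # next column of the table
            s += (-1)**(k - 1) * col[0] / k   # Delta^k y0 / k, alternating sign
        return s / h

    x = np.linspace(1.0, 2.0, 11)
    print(derivative_at_x0(np.log(x), x[1] - x[0]))   # ≈ 1 = (ln x)' at x = 1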
Let us consider now Newton's backward polynomial

P(q) = y_n + (q^{(1)}/1!)Δy_{n−1} + ((q + 1)^{(2)}/2!)Δ²y_{n−2} + ⋯ + ((q + n − 1)^{(n)}/n!)Δⁿy_0.  (7.39)

Applying again Lemma 7.1 with q = (x − x_n)/h, we have

dP/dx = (1/h)[Δy_{n−1} + ((q + 1)^{(1)}/1!)Δ²y_{n−2} + ⋯ + ((q + n − 1)^{(n−1)}/(n − 1)!)Δⁿy_0],  (7.40)

d²P/dx² = (1/h²)[Δ²y_{n−2} + ((q + 2)^{(1)}/1!)Δ³y_{n−3} + ⋯ + ((q + n − 1)^{(n−2)}/(n − 2)!)Δⁿy_0]  (7.41)

and, in general,

d^kP/dx^k = (1/h^k)[Δ^k y_{n−k} + ((q + k)^{(1)}/1!)Δ^{k+1}y_{n−k−1} + ⋯ + ((q + n − 1)^{(n−k)}/(n − k)!)Δⁿy_0].  (7.42)

7.4 INTRODUCTION TO NUMERICAL INTEGRATION

We want to calculate integrals

I = ∫_a^b f(x)dx,  (7.43)

where −∞ ≤ a < b ≤ ∞, f being integrable on [a, b]. In general, two situations exist. The first case is that of a proper integral (7.43), which will be considered here. The second case assumes that the integral (7.43) is an improper one; several techniques exist to transform an improper integral into a proper one, or to approximate the value of the improper integral with an imposed precision. If the interval [a, b] is an infinite one, that is, if the integral (7.43) has one of the forms

I = ∫_{−∞}^b f(x)dx,  I = ∫_a^∞ f(x)dx,  I = ∫_{−∞}^∞ f(x)dx,  (7.44)

then we may use the following techniques to calculate the improper integrals:

• the change of variable, which may lead to the transformation of the infinite interval (−∞, b], [a, ∞) or (−∞, ∞) into an interval of finite length;
• the separation of the integral into integrals of the form

∫_{−∞}^b f(x)dx = ∫_{−∞}^{b_1} f(x)dx + ∫_{b_1}^b f(x)dx,  ∫_a^∞ f(x)dx = ∫_a^{a_1} f(x)dx + ∫_{a_1}^∞ f(x)dx,  (7.45)
∫_{−∞}^∞ f(x)dx = ∫_{−∞}^{a_2} f(x)dx + ∫_{a_2}^{b_2} f(x)dx + ∫_{b_2}^∞ f(x)dx.

The idea is that if |a_i|, |b_i|, i = 1, 2, are sufficiently large, then the improper integrals ∫_{−∞}^{b_1} f(x)dx, ∫_{a_1}^∞ f(x)dx, ∫_{−∞}^{a_2} f(x)dx and ∫_{b_2}^∞ f(x)dx may be neglected, the values of the integrals in formula (7.45) being given by

∫_{−∞}^b f(x)dx ≈ ∫_{b_1}^b f(x)dx,  ∫_a^∞ f(x)dx ≈ ∫_a^{a_1} f(x)dx,  ∫_{−∞}^∞ f(x)dx ≈ ∫_{a_2}^{b_2} f(x)dx.  (7.46)

A question arises: what do we understand by |a_i|, |b_i|, i = 1, 2, sufficiently large? In general, the answer is based on the following considerations: either we show analytically that the neglected improper integrals ∫_{−∞}^{b_1} f(x)dx, ∫_{a_1}^∞ f(x)dx, ∫_{−∞}^{a_2} f(x)dx and ∫_{b_2}^∞ f(x)dx may be made less than an ε given a priori for |a_i|, |b_i|, i = 1, 2, sufficiently large in modulus, or we calculate the integrals

∫_{b_1}^b f(x)dx,  ∫_{d_1}^{b_1} f(x)dx,  d_1 ≪ b_1,  (7.47)
∫_a^{a_1} f(x)dx,  ∫_{a_1}^{c_1} f(x)dx,  c_1 ≫ a_1,  (7.48)
∫_{a_2}^{b_2} f(x)dx,  ∫_{c_2}^{a_2} f(x)dx,  ∫_{b_2}^{d_2} f(x)dx,  c_2 ≪ a_2,  d_2 ≫ b_2,  (7.49)

and we show that

|∫_{d_1}^{b_1} f(x)dx| / |∫_{b_1}^b f(x)dx| ≪ 1,  |∫_{a_1}^{c_1} f(x)dx| / |∫_a^{a_1} f(x)dx| ≪ 1,
(|∫_{c_2}^{a_2} f(x)dx| + |∫_{b_2}^{d_2} f(x)dx|) / |∫_{a_2}^{b_2} f(x)dx| ≪ 1;  (7.50)

• if the asymptotic behavior of f(x) is known, that is, if we know functions g_1(x) and g_2(x) so that

lim_{x→∞} f(x)/g_1(x) = µ_1,  lim_{x→−∞} f(x)/g_2(x) = µ_2,  (7.51)

where µ_1 and µ_2 are two finite real values, then we may write the approximate relations

∫_{−∞}^a f(x)dx ≈ µ_2 ∫_{−∞}^{a_1} g_2(x)dx + ∫_{a_1}^a f(x)dx,  ∫_b^∞ f(x)dx ≈ ∫_b^{b_1} f(x)dx + µ_1 ∫_{b_1}^∞ g_1(x)dx,  (7.52)
∫_{−∞}^∞ f(x)dx ≈ µ_2 ∫_{−∞}^{a_2} g_2(x)dx + ∫_{a_2}^{b_2} f(x)dx + µ_1 ∫_{b_2}^∞ g_1(x)dx;
• a last method to solve the problem of the improper integral on an infinite interval is a change of variable which transforms the infinite limit into a finite one; in many cases, however, this technique introduces a singularity.

The last situation which may appear for the integral (7.43) is that in which the interval [a, b] is bounded, but

lim_{x→a} f(x) = ±∞  or  lim_{x→b} f(x) = ±∞.  (7.53)

There are several methods to avoid the singularities, that is:

• their elimination, by using integration by parts, a change of variable, etc.;
• the use of certain Gauss-type quadrature formulae which eliminate some types of singularities, using polynomials other than the Legendre ones;
• the use of Gauss-type quadrature formulae with Legendre polynomials, because the calculation of the values of the function f at the points a and b is not necessary;
• the division of the integral into several integrals of the form

∫_a^b f(x)dx = ∫_a^{a+ε_1} f(x)dx + ∫_{a+ε_1}^{b−ε_2} f(x)dx + ∫_{b−ε_2}^b f(x)dx,  (7.54)

using a very small integration step for the first and the last integral on the right side, which leads to a very long calculation time;
• the transformation of the finite interval into an infinite one by a certain change of variable, the new integral thus obtained being easier to calculate.

7.5 THE NEWTON–CÔTES QUADRATURE FORMULAE

We begin with a definition.

Definition 7.1 A quadrature formula is a numerical procedure by which the value of a definite integral is approximated by using information about the integrand only at certain points at which it is defined.

Let N be a nonzero natural number and consider the integral

I = ∫_0^N f(x)dx.  (7.55)

Observation 7.4 Any integral of the form

I = ∫_a^b g(u)du,  (7.56)

with −∞ < a < b < ∞, may be brought to form (7.55) by using the change of variable

u = a + ((b − a)/N)x,  du = ((b − a)/N)dx.  (7.57)

Indeed,

I = ∫_a^b g(u)du = ∫_0^N g(a + ((b − a)/N)x)·((b − a)/N)dx,  (7.58)
where

f(x) = g(a + ((b − a)/N)x)·((b − a)/N).  (7.59)

Let us further denote by

y_i = f(i),  i = 0, …, N,  (7.60)

the values of the function f of equation (7.55) at the points i, and by L_N(x) the Lagrange polynomial corresponding to the function f on the interval [0, N] and to the division points x_i = i, i = 0, …, N. We replace the integral (7.55) by the approximate value

I ≈ ∫_0^N L_N(x)dx.  (7.61)

On the other hand, we have

L_N(x) = Σ_{i=0}^N [(x − 0)(x − 1)⋯(x − i + 1)(x − i − 1)⋯(x − N)] / [(i − 0)(i − 1)⋯(i − i + 1)(i − i − 1)⋯(i − N)] · y_i  (7.62)

or, equivalently,

L_N(x) = Σ_{i=0}^N φ_i(x) y_i,  (7.63)

where the notations are obvious. Replacing relation (7.63) in formula (7.61), we get

I ≈ ∫_0^N Σ_{i=0}^N φ_i(x) y_i dx = Σ_{i=0}^N y_i ∫_0^N φ_i(x)dx = Σ_{i=0}^N c_i^{(N)} y_i,  (7.64)

where

c_i^{(N)} = ∫_0^N φ_i(x)dx.  (7.65)

Definition 7.2 The formula

I ≈ Σ_{i=0}^N c_i^{(N)} y_i  (7.66)

is called the Newton–Côtes quadrature formula.¹

Proposition 7.2 (Error in the Newton–Côtes Quadrature Formula). If the function f is of class C^{N+1} and if we denote

M = sup_{x∈[0,N]} |f^{(N+1)}(x)|,  (7.67)

then the inequality

|I − Σ_{i=0}^N c_i^{(N)} y_i| ≤ (M/(N + 1)!) ∫_0^N |x(x − 1)⋯(x − N)|dx  (7.68)

takes place.

¹The formula is named after Sir Isaac Newton (1642–1727) and Roger Cotes (1682–1716).
Demonstration. From the error formula of Lagrange's polynomial,

|f(x) − L_N(x)| ≤ (M/(N + 1)!) |x(x − 1)⋯(x − N)|,  (7.69)

we pass to integration:

|I − Σ_{i=0}^N c_i^{(N)} y_i| = |∫_0^N f(x)dx − ∫_0^N L_N(x)dx| = |∫_0^N (f(x) − L_N(x))dx|
≤ ∫_0^N |f(x) − L_N(x)|dx ≤ (M/(N + 1)!) ∫_0^N |x(x − 1)⋯(x − N)|dx  (7.70)

and the proposition is proved.

Observation 7.5 We can also write the exact formula

I − Σ_{i=0}^N c_i^{(N)} y_i = (f^{(N+1)}(ξ)/(N + 1)!) ∫_0^N x(x − 1)⋯(x − N)dx,  (7.71)

obtained analogously to equation (7.68), taking into account the expression of the remainder of Lagrange's polynomial, ξ being a point between 0 and N.

7.6 THE TRAPEZOID FORMULA

This formula is a particular case of the Newton–Côtes quadrature formula for N = 1. Let the integral be

I_i = ∫_{x_i}^{x_{i+1}} f(x)dx,  (7.72)

where f : [x_i, x_{i+1}] → R, x_i ≠ x_{i+1}, f at least of class C⁰ on [x_i, x_{i+1}]. We make the change of variable

x = x_i + (x_{i+1} − x_i)u,  dx = (x_{i+1} − x_i)du  (7.73)

and the integral (7.72) now reads

I_i = ∫_0^1 F(u)du,  (7.74)

with

F(u) = f[x_i + (x_{i+1} − x_i)u](x_{i+1} − x_i).  (7.75)

Taking into account the discussion in Section 7.5, we have

I_i ≈ c_0^{(1)} y_0 + c_1^{(1)} y_1,  (7.76)
where

y_0 = F(0) = f(x_i)(x_{i+1} − x_i),  y_1 = F(1) = f(x_{i+1})(x_{i+1} − x_i),  (7.77)

c_0^{(1)} = ∫_0^1 (x − 1)/(0 − 1) dx = ∫_0^1 (1 − x)dx = [x − x²/2]_0^1 = 1/2,
c_1^{(1)} = ∫_0^1 (x − 0)/(1 − 0) dx = ∫_0^1 x dx = [x²/2]_0^1 = 1/2.  (7.78)

It follows that

I_i ≈ ((x_{i+1} − x_i)/2)(f(x_i) + f(x_{i+1})).  (7.79)

Definition 7.3 Relation (7.79) is called the trapezoid formula.

Observation 7.6 Relation (7.79) means that the area under the curve y = f(x), equal to the integral I_i, is approximated by the area of the trapezium hatched in Figure 7.2.

[Figure 7.2 The trapezoid formula.]

Let f : [a, b] → R be of class C² on [a, b] and let us assume that the interval [a, b] is divided into n equal parts, so that

a = x_0 < x_1 < x_2 < ⋯ < x_n = b,  x_{j+1} − x_j = h = (b − a)/n,  j = 0, …, n−1.  (7.80)

Applying the trapezoid formula on each interval [x_j, x_{j+1}] and summing, we obtain

I = ∫_a^b f(x)dx = Σ_{j=0}^{n−1} ∫_{x_j}^{x_{j+1}} f(x)dx ≈ Σ_{j=0}^{n−1} ((x_{j+1} − x_j)/2)(f(x_j) + f(x_{j+1}))
= (h/2)[(f(a) + f(x_1)) + (f(x_1) + f(x_2)) + ⋯ + (f(x_{n−1}) + f(b))],  (7.81)

that is,

I ≈ (h/2)[f(a) + f(b) + 2 Σ_{j=1}^{n−1} f(x_j)].  (7.82)

Definition 7.4 Formula (7.82) is called the generalized trapezoid formula.
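Formula (7.82) translates directly into code; a minimal Python sketch (ours):

    import numpy as np

    def trapezoid(f, a, b, n):
        """Composite trapezoid rule on n equal subintervals, eq. (7.82)."""
        x = np.linspace(a, b, n + 1)
        y = f(x)
        h = (b - a) / n
        return h/2 * (y[0] + y[-1] + 2*np.sum(y[1:-1]))

    print(trapezoid(np.sin, 0.0, np.pi, 100))   # ≈ 2; error O(1/n^2) by (7.83)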
Proposition 7.3 (The Error in the Generalized Trapezoid Formula). If f : [a, b] → R is of class C² on [a, b], then the relation

∫_a^b f(x)dx − (h/2)[f(a) + f(b) + 2 Σ_{j=1}^{n−1} f(x_j)] = −((b − a)³/(12n²)) f″(ξ)  (7.83)

holds, where ξ is a point situated between a and b, while x_j, j = 0, …, n, is an equidistant division of the interval [a, b], with x_0 = a, x_n = b and x_{j+1} − x_j = h = (b − a)/n.

Demonstration. Let us calculate the error on each interval of the form [x_j, x_{j+1}], j = 0, …, n−1. Taking into account Observation 7.4, we have

ε_j(f(x)) = ε_j(F(u)) = (F″(ζ)/2!) ∫_0^1 x(x − 1)dx,  (7.84)

where ζ ∈ [0, 1], while

∫_0^1 x(x − 1)dx = [x³/3 − x²/2]_0^1 = −1/6.  (7.85)

Relation (7.84) now becomes

ε_j(f(x)) = −F″(ζ)/12.  (7.86)

Formula (7.75) leads to

F″(u) = (x_{j+1} − x_j)³ f″[x_j + (x_{j+1} − x_j)u]  (7.87)

and, taking into account that x_{j+1} − x_j = h, relation (7.86) reads

ε_j(f(x)) = −(h³/12) f″(ξ_j),  (7.88)

where ξ_j is a point in the interval [x_j, x_{j+1}]. On the entire interval [a, b] we have

ε_{[a,b]}(f(x)) = Σ_{j=0}^{n−1} ε_j(f(x)) = −(h³/12) Σ_{j=0}^{n−1} f″(ξ_j).  (7.89)

Because f ∈ C²([a, b]), there exists ξ ∈ [a, b] so that

f″(ξ) = (1/n) Σ_{j=0}^{n−1} f″(ξ_j)  (7.90)

and relation (7.89) becomes

ε_{[a,b]}(f(x)) = −((b − a)³/(12n²)) f″(ξ),  (7.91)

that is, relation (7.83), which had to be demonstrated.
Corollary 7.3 In the conditions of Proposition 7.3, denoting

M = sup_{x∈[a,b]} |f″(x)|,  (7.92)

there exists the inequality

|ε_{[a,b]}(f(x))| ≤ (M/(12n²))(b − a)³.  (7.93)

Demonstration. From relation (7.91) we obtain immediately

|ε_{[a,b]}(f(x))| = ((b − a)³/(12n²)) |f″(ξ)| ≤ ((b − a)³/(12n²)) sup_{ξ∈[a,b]} |f″(ξ)| = (M/(12n²))(b − a)³.  (7.94)

Observation 7.7 We observe that, by increasing the number of division points (increasing n), the error ε_{[a,b]}(f(x)) decreases proportionally to 1/n². This way of increasing the precision cannot always be used, because the growth of n leads to an increase of the calculation time.

7.7 SIMPSON'S FORMULA

This formula is a particular case of the Newton–Côtes formula² for N = 2. Let f : [a, b] → R be at least of class C⁰ on [a, b], and let there be a division of the interval [a, b] so that

a = x_0 < x_1 < ⋯ < x_{2n} = b,  x_{i+1} − x_i = h = (b − a)/(2n),  i = 0, …, 2n−1.  (7.95)

Let us consider the integral

I_{2i} = ∫_{x_{2i}}^{x_{2i+2}} f(x)dx  (7.96)

and let us make the change of variable

x = x_{2i} + ((x_{2i+2} − x_{2i})/2)u,  dx = ((x_{2i+2} − x_{2i})/2)du.  (7.97)

The integral (7.96) now reads

I_{2i} = ∫_0^2 F(u)du,  (7.98)

where

F(u) = f[x_{2i} + ((x_{2i+2} − x_{2i})/2)u]·((x_{2i+2} − x_{2i})/2).  (7.99)

Corresponding to Section 7.5, we have

I_{2i} ≈ c_0^{(2)} y_0 + c_1^{(2)} y_1 + c_2^{(2)} y_2,  (7.100)

²The method was introduced by Thomas Simpson (1710–1761) in 1750. The method was already known to Bonaventura Francesco Cavalieri (1598–1647) since 1639, to Johannes Kepler (1571–1630) since 1609, and to James Gregory (1638–1675) since 1668 in the book The Universal Part of Geometry.
where

y_0 = F(0) = hf(x_{2i}),  y_1 = F(1) = hf(x_{2i+1}),  y_2 = F(2) = hf(x_{2i+2}),  (7.101)

c_0^{(2)} = ∫_0^2 (x − 1)(x − 2)/((0 − 1)(0 − 2)) dx = ∫_0^2 (x² − 3x + 2)/2 dx = [x³/6 − 3x²/4 + x]_0^2 = 1/3,
c_1^{(2)} = ∫_0^2 x(x − 2)/((1 − 0)(1 − 2)) dx = ∫_0^2 (2x − x²) dx = [x² − x³/3]_0^2 = 4/3,  (7.102)
c_2^{(2)} = ∫_0^2 x(x − 1)/((2 − 0)(2 − 1)) dx = ∫_0^2 (x² − x)/2 dx = [x³/6 − x²/4]_0^2 = 1/3.

We thus obtain

I_{2i} ≈ (h/3)(f(x_{2i}) + 4f(x_{2i+1}) + f(x_{2i+2})).  (7.103)

Definition 7.5 Formula (7.103) is called Simpson's formula.

Observation 7.8 Geometrically, relation (7.103) shows that the integral I_{2i}, equal to the area under the curve f(x), is approximated by the hatched area in Figure 7.3, which lies under the parabola L_2(x).

[Figure 7.3 The Simpson formula.]

Applying Simpson's formula on each interval [x_{2j}, x_{2j+2}], with j = 0, …, n−1, and summing, we obtain

I = ∫_a^b f(x)dx ≈ Σ_{j=0}^{n−1} I_{2j} = (h/3) Σ_{j=0}^{n−1} (y_{2j} + 4y_{2j+1} + y_{2j+2})
= (h/3)[y_0 + y_{2n} + 4(y_1 + y_3 + ⋯ + y_{2n−1}) + 2(y_2 + y_4 + ⋯ + y_{2n−2})].  (7.104)

Definition 7.6 Formula (7.104) is called the generalized Simpson formula.
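As with the trapezoid rule, (7.104) is a few lines of code; a Python sketch (ours) on 2n equal subintervals:

    import numpy as np

    def simpson(f, a, b, n):
        """Composite Simpson rule on 2n equal subintervals, eq. (7.104)."""
        x = np.linspace(a, b, 2*n + 1)
        y = f(x)
        h = (b - a) / (2*n)
        return h/3 * (y[0] + y[-1] + 4*np.sum(y[1:-1:2]) + 2*np.sum(y[2:-1:2]))

    print(simpson(np.sin, 0.0, np.pi, 50))   # ≈ 2; error O(1/n^4) by (7.105)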
Proposition 7.4 (The Error in the Generalized Simpson Formula). If f : [a, b] → R is of class C⁴ on [a, b], while x_j, j = 0, …, 2n, is an equidistant division of the interval [a, b], with x_0 = a, x_{2n} = b and x_{j+1} − x_j = h = (b − a)/(2n), then the relation

∫_a^b f(x)dx − (h/3)[y_0 + y_{2n} + 4(y_1 + ⋯ + y_{2n−1}) + 2(y_2 + ⋯ + y_{2n−2})] = −((b − a)⁵/(2880n⁴)) y⁽⁴⁾(ξ)  (7.105)

takes place, where ξ ∈ [a, b].

Demonstration. Let us consider the interval [x_{2j}, x_{2j+2}], for which the error is

ε_{2j}(f(x)) = ∫_{x_{2j}}^{x_{2j+2}} f(x)dx − (h/3)(y_{2j} + 4y_{2j+1} + y_{2j+2})  (7.106)

or, equivalently, writing for brevity x* = x_{2j+1},

ε_{2j}(h) = ∫_{x*−h}^{x*+h} y(x)dx − (h/3)[y(x* − h) + 4y(x*) + y(x* + h)],  (7.107)

ε_{2j} being regarded as a function of h. We have

dε_{2j}/dh = y(x* + h) + y(x* − h) − (1/3)[y(x* − h) + 4y(x*) + y(x* + h)] − (h/3)[y′(x* + h) − y′(x* − h)]  (7.108)

and it follows that

dε_{2j}/dh = (2/3)[y(x* + h) + y(x* − h)] − (4/3)y(x*) − (h/3)[y′(x* + h) − y′(x* − h)].  (7.109)

Further,

d²ε_{2j}/dh² = (2/3)[y′(x* + h) − y′(x* − h)] − (1/3)[y′(x* + h) − y′(x* − h)] − (h/3)[y″(x* + h) + y″(x* − h)],  (7.110)

that is,

d²ε_{2j}/dh² = (1/3)[y′(x* + h) − y′(x* − h)] − (h/3)[y″(x* + h) + y″(x* − h)].  (7.111)

Analogously,

d³ε_{2j}/dh³ = (1/3)[y″(x* + h) + y″(x* − h)] − (1/3)[y″(x* + h) + y″(x* − h)] − (h/3)[y‴(x* + h) − y‴(x* − h)]
= −(h/3)[y‴(x* + h) − y‴(x* − h)].  (7.112)
Applying Lagrange's finite increments formula to the function y‴ on the interval [x_{2j+1} − h, x_{2j+1} + h], it follows that there exists an intermediate point ξ_{2j} ∈ (x_{2j+1} − h, x_{2j+1} + h) so that

y‴(x_{2j+1} + h) − y‴(x_{2j+1} − h) = 2h y⁽⁴⁾(ξ_{2j}),  (7.113)

hence

d³ε_{2j}/dh³ = −(2h²/3) y⁽⁴⁾(ξ_{2j}).  (7.114)

On the other hand, we have

ε_{2j}(0) = 0,  dε_{2j}(0)/dh = 0,  d²ε_{2j}(0)/dh² = 0  (7.115)

and, by successive integration of formula (7.114) between 0 and h, we obtain

d²ε_{2j}(h)/dh² = d²ε_{2j}(0)/dh² + ∫_0^h (d³ε_{2j}(τ)/dτ³)dτ = −(2/3) y⁽⁴⁾(ξ_{2j}) ∫_0^h τ² dτ = −(2/9) h³ y⁽⁴⁾(ξ_{2j}),  (7.116)

dε_{2j}(h)/dh = dε_{2j}(0)/dh + ∫_0^h (d²ε_{2j}(τ)/dτ²)dτ = −(2/9) y⁽⁴⁾(ξ_{2j}) ∫_0^h τ³ dτ = −(1/18) h⁴ y⁽⁴⁾(ξ_{2j}),  (7.117)

ε_{2j}(h) = ε_{2j}(0) + ∫_0^h (dε_{2j}(τ)/dτ)dτ = −(1/18) y⁽⁴⁾(ξ_{2j}) ∫_0^h τ⁴ dτ = −(1/90) h⁵ y⁽⁴⁾(ξ_{2j}).  (7.118)

It follows that

ε_{2j}(h) = −(h⁵/90) y⁽⁴⁾(ξ_{2j}),  (7.119)

where ξ_{2j} ∈ (x_{2j}, x_{2j+2}). Summing on the entire interval [a, b], we obtain the error

ε_{[a,b]}(f(x)) = −(h⁵/90) Σ_{j=0}^{n−1} y⁽⁴⁾(ξ_{2j}).  (7.120)

Because f is of class C⁴ on [a, b], there exists ξ ∈ [a, b] so that

(1/n) Σ_{j=0}^{n−1} y⁽⁴⁾(ξ_{2j}) = y⁽⁴⁾(ξ)  (7.121)

and expression (7.120) of the error reads

ε_{[a,b]}(f(x)) = −(nh⁵/90) y⁽⁴⁾(ξ).  (7.122)

Taking into account that h = (b − a)/(2n), the last formula leads to

ε_{[a,b]}(f(x)) = −(n/90)((b − a)⁵/(32n⁵)) y⁽⁴⁾(ξ) = −((b − a)⁵/(2880n⁴)) y⁽⁴⁾(ξ),  (7.123)

that is, relation (7.105), which had to be proved.
Corollary 7.4 In the conditions of Proposition 7.4, denoting

M = sup_{x∈[a,b]} |f⁽⁴⁾(x)|,  (7.124)

the relation

|ε_{[a,b]}(f(x))| ≤ (M/(2880n⁴))(b − a)⁵  (7.125)

is valid.

Demonstration. From equation (7.123) it follows that

|ε_{[a,b]}(f(x))| = ((b − a)⁵/(2880n⁴)) |y⁽⁴⁾(ξ)| ≤ ((b − a)⁵/(2880n⁴)) sup_{ξ∈[a,b]} |f⁽⁴⁾(ξ)| = M(b − a)⁵/(2880n⁴).  (7.126)

Observation 7.9 If the number n of division points increases, then the error decreases proportionally to 1/n⁴. But the growth of n cannot be as large as we wish, because the calculation time may increase too much.

7.8 EULER'S AND GREGORY'S FORMULAE

Definition 7.7 We define the operators ∇, E, D, J, called the operator of backward differentiation, the operator of shifting, the operator of differentiation, and the operator of integration, respectively, by the formulae

∇f(x) = f(x) − f(x − h),  (7.127)
Ef(x) = f(x + h),  (7.128)
Df(x) = f′(x),  (7.129)
Jf(x) = ∫_x^{x+h} f(t)dt,  (7.130)

where h is the division step.

Observation 7.10
(i) There exist the immediate relations

E^p f(x) = f(x + ph),  p ∈ N,  (7.131)
DJf(x) = JDf(x),  (7.132)
DJ = JD = Δ,  (7.133)
D⁻¹f(x) = F(x) + C,  (7.134)

where F(x) is a primitive of f(x), while C is a constant,

ΔD⁻¹f(x) = Jf(x),  (7.135)
Δ = E − 1,  ∇ = 1 − E⁻¹,  (7.136)
where 1 is the identity operator, 1f(x) = f(x),

DJ = JD = E − 1,  (7.137)
Δ^p = E^p ∇^p = (E − 1)^p = E^p − C_p^1 E^{p−1} + C_p^2 E^{p−2} − ⋯ + (−1)^{p−1} C_p^{p−1} E + (−1)^p,  (7.138)
∇^p y_k = y_k − C_p^1 y_{k−1} + C_p^2 y_{k−2} − ⋯ + (−1)^{p−1} C_p^{p−1} y_{k−p+1} + (−1)^p y_{k−p},  (7.139)
(1 − ∇)(1 + ∇ + ∇² + ⋯ + ∇^p) = 1 − ∇^{p+1},  (7.140)
(1 − ∇)⁻¹ = 1 + ∇ + ∇² + ⋯ + ∇^p + ⋯ = Σ_{i=0}^∞ ∇^i.  (7.141)

(ii) If the function f is a polynomial of nth degree, then

(1 − ∇)⁻¹ = 1 + ∇ + ∇² + ⋯ + ∇^n.  (7.142)

Let us consider the sum

Σ_{l=m}^{k−1} f(x_0 + lh) = f(x_m) + f(x_{m+1}) + ⋯ + f(x_{k−1}) = y_m + y_{m+1} + ⋯ + y_{k−1},  (7.143)

where y_i = f(x_i), i ∈ N. The problem is connected to finding a function F(x) with the property ΔF(x) = f(x). Indeed, if we find such a function F(x), then

Σ_{l=m}^{k−1} f(x_0 + lh) = F(x_{m+1}) − F(x_m) + F(x_{m+2}) − F(x_{m+1}) + ⋯ + F(x_k) − F(x_{k−1}) = F(x_k) − F(x_m).  (7.144)

Writing F(x) = Δ⁻¹f(x), we have

Δ⁻¹f(x_k) = C + Σ_{l=m}^{k−1} f(x_l),  (7.145)
Σ_{l=l_0}^{k−1} f(x_l) = Δ⁻¹f(x_k) − Δ⁻¹f(x_{l_0}),  (7.146)

where C is a constant, while l_0 is an integer for which m ≤ l_0 ≤ k. If f is a polynomial, then

Σ_{l=0}^{p−1} f(x_l) = (1 + E + E² + ⋯ + E^{p−1})f(x_0) = ((E^p − 1)/(E − 1)) f(x_0) = (((1 + Δ)^p − 1)/Δ) f(x_0)
= [p + (p(p − 1)/2!)Δ + (p(p − 1)(p − 2)/3!)Δ² + ⋯ + (p(p − 1)⋯(p − n)/(n + 1)!)Δ^n] f(x_0),  (7.147)

where n is its degree.
Let us remark that the formula is useful for n small in comparison with p. Taking into account the identity

DJΔ⁻¹ = 1,  (7.148)

obtained from equation (7.133), it follows that

hf(x) = (hD/(e^{hD} − 1)) Jf(x).  (7.149)

Definition 7.8 The coefficients B_i of the expansion

t/(e^t − 1) = Σ_{i=0}^∞ (B_i/i!) t^i  (7.150)

are called Bernoulli's numbers.³

Bernoulli's numbers verify the property

B_1 = −1/2,  B_{2p+1} = 0,  p ∈ N,  p ≠ 0.  (7.151)

Hence it follows that expression (7.149) now becomes

hf(x) = Σ_{i=0}^∞ (B_i/i!) h^i D^i Jf(x)  (7.152)

or

hf(x) = ∫_x^{x+h} f(t)dt + Σ_{i=1}^∞ (B_i/i!) h^i D^i Jf(x).  (7.153)

If we take into account that

D^i Jf(x) = D^{i−1}(f(x + h) − f(x)),  (7.154)

then relation (7.153) becomes

f(x) = (1/h) ∫_x^{x+h} f(t)dt + Σ_{i=1}^∞ (B_i/i!) h^{i−1} (f^{(i−1)}(x + h) − f^{(i−1)}(x))  (7.155)

or, equivalently,

Σ_{l=0}^{p−1} f(x_l) = (1/h) ∫_{x_0}^{x_p} f(t)dt + Σ_{i=1}^∞ (B_i/i!) h^{i−1} (f^{(i−1)}(x_p) − f^{(i−1)}(x_0)),  (7.156)

called the first Euler formula or the first Euler–Maclaurin formula.⁴

³The numbers are named after Jacob Bernoulli (1654–1705), who used them in the book Ars Conjectandi, published in 1713. The numbers were also known to Seki Takakazu (Seki Kōwa) (1642–1708).
⁴The formulae are named after Leonhard Euler (1707–1783) and Colin Maclaurin (1698–1746), who discovered them in 1735.
If we take into account equation (7.151), then relation (7.156) reads

Σ_{l=0}^{p−1} f(x_l) = (1/h) ∫_{x_0}^{x_p} f(t)dt − (1/2)(f(x_p) − f(x_0)) + Σ_{i=1}^∞ (B_{2i}/(2i)!) h^{2i−1} (f^{(2i−1)}(x_p) − f^{(2i−1)}(x_0)).  (7.157)

Obviously, if f is a polynomial, then the infinite sum on the right side becomes a finite one. Analogously we obtain the second Euler formula or the second Euler–Maclaurin formula, in the form

Σ_{l=0}^{p−1} f(x_l + h/2) = (1/h) ∫_{x_0}^{x_p} f(t)dt − Σ_{i=1}^∞ ((1 − 2^{1−2i}) B_{2i}/(2i)!) h^{2i−1} (f^{(2i−1)}(x_p) − f^{(2i−1)}(x_0)).  (7.158)

If, in the first Euler formula, we express the derivatives at the point x_0 by forward differences and the derivatives at the point x_p by backward differences, in the form

hf′(x_0) = Δy_0 − (1/2)Δ²y_0 + (1/3)Δ³y_0 − (1/4)Δ⁴y_0 + (1/5)Δ⁵y_0 − ⋯,
hf′(x_p) = ∇y_p + (1/2)∇²y_p + (1/3)∇³y_p + (1/4)∇⁴y_p + (1/5)∇⁵y_p + ⋯,
h³f‴(x_0) = Δ³y_0 − (3/2)Δ⁴y_0 + (7/4)Δ⁵y_0 − ⋯,
h³f‴(x_p) = ∇³y_p + (3/2)∇⁴y_p + (7/4)∇⁵y_p + ⋯,  (7.159)

then we obtain Gregory's formula⁵

∫_{x_0}^{x_p} f(t)dt = h[(1/2)y_0 + y_1 + y_2 + ⋯ + y_{p−1} + (1/2)y_p] − (h/12)(∇y_p − Δy_0) − (h/24)(∇²y_p + Δ²y_0)
− (19h/720)(∇³y_p − Δ³y_0) − (3h/160)(∇⁴y_p + Δ⁴y_0) − (863h/60480)(∇⁵y_p − Δ⁵y_0) − ⋯  (7.160)

⁵The formula was discovered by James Gregory (1638–1675) in 1670.
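A Python sketch (ours) of Gregory's formula (7.160), truncated after the third-difference correction; y holds equidistant samples of f:

    import numpy as np

    def gregory(y, h):
        """Trapezoid sum of (7.160) plus its first three end corrections."""
        total = h * (y[0]/2 + np.sum(y[1:-1]) + y[-1]/2)
        coeff = [1/12, 1/24, 19/720]           # coefficients of (7.160)
        col = np.asarray(y, dtype=float)
        for k, c in enumerate(coeff):
            col = np.diff(col)                 # next difference column
            delta0, nabla_p = col[0], col[-1]  # forward at x0, backward at xp
            sign = -1 if k % 2 == 0 else 1     # pattern (nabla-Delta), (nabla+Delta), ...
            total -= h * c * (nabla_p + sign * delta0)
        return total

    x = np.linspace(0.0, 1.0, 11)
    print(gregory(np.exp(x), x[1] - x[0]), np.e - 1)   # far better than the bare trapezoid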
7.9 ROMBERG'S FORMULA

Let us suppose that the error in the calculation of the integral

I = ∫_a^b f(x)dx  (7.161)

may be written in the form

E = Ch^p f^{(p)}(ξ),  (7.162)

where h is the integration step, C is a positive constant that does not depend on h, p is a nonzero natural number, while ξ ∈ (a, b). If we calculate the integral (7.161) with the integration steps h_1 and h_2, then the errors are

E_1 = I − I_1 = Ch_1^p f^{(p)}(ξ_1),  (7.163)
E_2 = I − I_2 = Ch_2^p f^{(p)}(ξ_2).  (7.164)

Let us remark that, in general, ξ_1 ≠ ξ_2. Let us suppose that f^{(p)}(ξ_1) ≈ f^{(p)}(ξ_2). Under these conditions, the integral I may be approximated by Richardson's extrapolation formula⁶

I = (h_1^p I_2 − h_2^p I_1)/(h_1^p − h_2^p) = I_2 + (I_2 − I_1)/((h_1/h_2)^p − 1).  (7.165)

If, for example, h_2 = λh_1, then

I = (I_2 − λ^p I_1)/(1 − λ^p) = I_2 + (I_2 − I_1)/((1/λ)^p − 1).  (7.166)

Usually we take h_2 = h_1/2, and it follows that

I = (2^p I_2 − I_1)/(2^p − 1) = I_2 + (I_2 − I_1)/(2^p − 1).  (7.167)

On the other hand, the error in the trapezoid formula may be put in the form

E = C_1 h² + C_2 h⁴ + ⋯ + C_p h^{2p} + (b − a) h^{2p+2} (B_{2p+2}/(2p + 2)!) f^{(2p+2)}(ξ),  (7.168)

where B_{2k} are Bernoulli's numbers. Suppose now that the integration step is chosen of the form

h_n = (b − a)/2^n,  (7.169)

and let us denote by I_n^{(0)} the value of the integral calculated with the step h_n. We apply Richardson's extrapolation formula, in which I_{n+1}^{(0)} is the value of the same integral with a halved step. We obtain the approximation

I_n^{(1)} = (4 I_{n+1}^{(0)} − I_n^{(0)})/(4 − 1).  (7.170)

The procedure may continue, and we obtain the general recurrence formulae

I_n^{(p)} = (4^p I_{n+1}^{(p−1)} − I_n^{(p−1)})/(4^p − 1),  (7.171)
I_0^{(p)} = (4^p I_1^{(p−1)} − I_0^{(p−1)})/(4^p − 1).  (7.172)

⁶The formula was published by Lewis Fry Richardson (1881–1953) in 1910.
TABLE 7.1 Table of the Romberg Procedure

I_0^{(0)}
I_1^{(0)}   I_0^{(1)}
I_2^{(0)}   I_1^{(1)}   I_0^{(2)}
I_3^{(0)}   I_2^{(1)}   I_1^{(2)}   I_0^{(3)}
⋮           ⋮           ⋮           ⋮

Using these formulae, the approximation I_1^{(p)} has an error of the order h^{2p+2}, so that, for example, in expression (7.168) of the error for I_1^{(1)}, the term C_1h² no longer appears. This procedure is called the Romberg procedure.⁷ Usually we work in a table form, where the integrals are arranged as shown in Table 7.1.

⁷Werner Romberg (1909–2003) published the procedure in 1955. In fact, the procedure is an application of the Richardson extrapolation to the trapezoid formula.
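The whole procedure fits in a few lines; a Python sketch (ours) filling Table 7.1 row by row: column 0 holds trapezoid values with the step (7.169), and the recurrence (7.171) produces the remaining columns.

    import numpy as np

    def romberg(f, a, b, levels=5):
        """Return the Romberg triangle R[n][p] = I_n^{(p)} of Table 7.1."""
        R = [[0.0]*(n + 1) for n in range(levels)]
        for n in range(levels):
            m = 2**n
            x = np.linspace(a, b, m + 1)
            h = (b - a) / m
            R[n][0] = h*(f(x[0])/2 + np.sum(f(x[1:-1])) + f(x[-1])/2)
            for p in range(1, n + 1):
                R[n][p] = (4**p * R[n][p-1] - R[n-1][p-1]) / (4**p - 1)  # eq. (7.171)
        return R

    print(romberg(np.sin, 0.0, np.pi)[-1][-1])   # ≈ 2 to many digits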
7.10 CHEBYSHEV'S QUADRATURE FORMULAE

In the Newton–Côtes formulae the division knots have been arbitrarily chosen, the only condition imposed being that of their equidistance. If this condition is dropped and certain special points are chosen as division knots, then we obtain Chebyshev's quadrature formulae.⁸ Let us consider the integral

I = ∫_{−1}^1 f(x)dx  (7.173)

and let us write the relation

I ≈ Σ_{i=1}^n A_i f(x_i),  (7.174)

where A_i are certain constants and x_i are the division knots. Obviously, relation (7.174) is an equality only in certain cases. In the case of Chebyshev's quadrature formulae, the following conditions are imposed:

(a) the constants A_i, i = 1, …, n, are equal, that is,

A_1 = A_2 = ⋯ = A_n = A;  (7.175)

(b) the quadrature formula (7.174) is exact for any polynomial up to degree n inclusive.

Observation 7.11
(i) Let us write the quadrature formula (7.174) for the polynomial f(x) = 1. We obtain

I = ∫_{−1}^1 dx = 2;  (7.176)

taking into account condition (a), it follows that

I = A_1 + A_2 + ⋯ + A_n = nA,  (7.177)

from which

A_1 = A_2 = ⋯ = A_n = A = 2/n.  (7.178)

(ii) Because the polynomials 1, x, x², …, x^n form a basis for the vector space of polynomials of degree at most n, it follows that we must verify condition (b) for these polynomials only. But

∫_{−1}^1 x^k dx = [x^{k+1}/(k + 1)]_{−1}^1 = (1 − (−1)^{k+1})/(k + 1)  (7.179)

and we obtain the system

x_1 + x_2 + ⋯ + x_n = 0,
x_1² + x_2² + ⋯ + x_n² = (2/3)·(n/2) = n/3,
x_1³ + x_2³ + ⋯ + x_n³ = 0, …,
x_1^k + x_2^k + ⋯ + x_n^k = ((1 − (−1)^{k+1})/(k + 1))·(n/2), …,
x_1^n + x_2^n + ⋯ + x_n^n = ((1 − (−1)^{n+1})/(n + 1))·(n/2).  (7.180)

Solving system (7.180) in the unknowns x_1, x_2, …, x_n is equivalent to solving an algebraic equation of degree n. A question arises: are the solutions of system (7.180) real and contained in the interval [−1, 1]? The answer to this question is positive only for n ≤ 7 and n = 9.⁹ It has been shown that for n = 8 and n ≥ 10 system (7.180) no longer has only real roots; hence Chebyshev's method cannot be applied.

Observation 7.12 Let the integral be

J = ∫_a^b F(u)du,  (7.181)

for which we make the change of variable

u = (b + a)/2 + ((b − a)/2)x,  du = ((b − a)/2)dx.  (7.182)

It follows that

J = ∫_a^b F(u)du = ∫_{−1}^1 F((b + a)/2 + ((b − a)/2)x)·((b − a)/2)dx;  (7.183)

denoting

f(x) = F((b + a)/2 + ((b − a)/2)x)·((b − a)/2),  (7.184)

we obtain form (7.173). The quadrature formula now reads

∫_a^b F(u)du ≈ (2/n)·((b − a)/2) Σ_{i=1}^n F(u_i) = ((b − a)/n) Σ_{i=1}^n F(u_i),  (7.185)

where

u_i = (b + a)/2 + ((b − a)/2)x_i.  (7.186)

⁸The formula is named in honor of Pafnuty Lvovich Chebyshev (1821–1894).
⁹This result belongs to Francis Begnaud Hildebrand (1915–2002), who published it in Introduction to Numerical Analysis in 1956.
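For n = 3, system (7.180) gives the knots 0 and ±1/√2 (one checks that Σx_i = 0, Σx_i² = n/3 = 1, Σx_i³ = 0). A Python sketch (ours) of the resulting equal-weight formula (7.185):

    import numpy as np

    def chebyshev_quadrature(F, a, b):
        """Equal-weight 3-point Chebyshev quadrature on [a, b], eqs. (7.185)-(7.186)."""
        x = np.array([-1/np.sqrt(2), 0.0, 1/np.sqrt(2)])   # knots for n = 3
        u = (b + a)/2 + (b - a)/2 * x                      # eq. (7.186)
        return (b - a)/3 * np.sum(F(u))                    # (b - a)/n with n = 3

    print(chebyshev_quadrature(np.exp, 0.0, 1.0))          # ≈ e - 1 ≈ 1.71828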
7.11 LEGENDRE'S POLYNOMIALS

Let us consider an interval [a, b] ⊂ R and let f and g be two functions of class at least C^n on [a, b]. Under these conditions, the obvious relation

∫_a^b f(x)g^{(n)}(x)dx = [f(x)g^{(n−1)}(x)]_a^b − [f′(x)g^{(n−2)}(x)]_a^b + [f″(x)g^{(n−3)}(x)]_a^b − ⋯ + (−1)^{n−1}[f^{(n−1)}(x)g(x)]_a^b + (−1)^n ∫_a^b f^{(n)}(x)g(x)dx  (7.187)

takes place. We will particularize relation (7.187), taking for f(x) an arbitrary polynomial Q(x) of degree at most n − 1 and for g(x) the polynomial A_n(x − a)^n(x − b)^n, A_n ∈ R. Because the degree of Q(x) is at most n − 1, we get

Q^{(n)}(x) = 0,  ∫_a^b Q^{(n)}(x)g(x)dx = 0.  (7.188)

From

g(x) = A_n(x − a)^n(x − b)^n  (7.189)

we obtain

g(a) = g′(a) = g″(a) = ⋯ = g^{(n−1)}(a) = 0,  g(b) = g′(b) = g″(b) = ⋯ = g^{(n−1)}(b) = 0,  (7.190)

and relation (7.187) reduces now to

A_n ∫_a^b Q(x) (d^n/dx^n)[(x − a)^n(x − b)^n]dx = 0.  (7.191)

Let us now denote by P_n the polynomial of degree n given by

P_n(x) = A_n (d^n/dx^n)[(x − a)^n(x − b)^n].  (7.192)

On the other hand, Q(x) is an arbitrary polynomial of degree at most n − 1, so that for Q(x) we may take the polynomials of a basis of the vector space of polynomials of degree at most n − 1, that is, the polynomials 1, x, x², …, x^{n−1}. We may write

∫_a^b P_n(x)dx = 0,  ∫_a^b xP_n(x)dx = 0, …, ∫_a^b x^{n−1}P_n(x)dx = 0.  (7.193)

We observe that we may also write the relation

∫_a^b P_m(x)P_n(x)dx = 0,  m ≠ n.  (7.194)

Indeed, let us suppose that m < n; then we may consider P_m(x) as one of the polynomials Q(x) of degree at most n − 1.
Observation 7.13 Relation (7.194) means that the sequence {P_n(x)}_{n∈N} is a sequence of orthogonal polynomials on [a, b].

Observation 7.14 The polynomials P_n are unique up to a multiplicative constant. Indeed, let us suppose that there also exists a sequence {Π_n(x)}_{n∈N} of orthogonal polynomials. We may write the relations

∫_a^b Q(x)Π_n(x)dx = 0,  ∫_a^b Q(x)P_n(x)dx = 0,  ∫_a^b Q(x)C_nP_n(x)dx = 0,  (7.195)

where C_n is an arbitrary constant, while Q(x) is an arbitrary polynomial of degree at most n − 1. From the first and the third relation (7.195) we obtain

∫_a^b [C_nP_n(x) − Π_n(x)]Q(x)dx = 0.  (7.196)

We choose the constant C_n so that the polynomial C_nP_n(x) − Π_n(x) has degree at most n − 1, and we take

Q(x) = C_nP_n(x) − Π_n(x).  (7.197)

We obtain the expression

∫_a^b [C_nP_n(x) − Π_n(x)]² dx = 0,  (7.198)

hence

C_nP_n(x) − Π_n(x) = 0,  (7.199)

that is, the polynomials {P_n(x)}_{n∈N} are uniquely determined except for a multiplicative constant.

Definition 7.9 The sequence of polynomials¹⁰

P_n(x) = (1/(2^n n!)) (d^n/dx^n)[(x² − 1)^n]  (7.200)

is called the sequence of Legendre polynomials.

Theorem 7.1 Let {P_n(x)}_{n∈N} be the sequence of Legendre polynomials and let R_n(x) be the polynomials

R_n(x) = 2^n n! P_n(x).  (7.201)

¹⁰These polynomials were introduced by Adrien-Marie Legendre (1752–1833) in Recherches sur la figure des planètes, published in 1784.
Under these conditions, the following affirmations hold:

(i) for any n ∈ N,

P_n(1) = 1;  (7.202)

(ii) for any n ∈ N,

P_n(−1) = (−1)^n;  (7.203)

(iii) all the real roots of Legendre's polynomial P_n(x) are in the interval (−1, 1) for any n ∈ N;

(iv) for any n ∈ N we have

(x² − 1)R_n′(x) = nxR_n(x) − 2n²R_{n−1}(x);  (7.204)

(v) for any n ∈ N we have

R_{n+1}(x) = 2(2n + 1)xR_n(x) − 4n²R_{n−1}(x);  (7.205)

(vi) the sequence of the polynomials R_n(x) forms a Sturm sequence.

Demonstration.
(i) We rewrite the Legendre polynomial (7.200) by means of the Leibniz formula

(d^n/dx^n)(uv) = u^{(n)}v + C_n^1 u^{(n−1)}v′ + C_n^2 u^{(n−2)}v″ + ⋯ + u v^{(n)},  (7.206)

assuming

u = (x − 1)^n,  v = (x + 1)^n.  (7.207)

It follows that

P_n(x) = (1/(2^n n!)) {[(x − 1)^n]^{(n)}(x + 1)^n + C_n^1 [(x − 1)^n]^{(n−1)}[(x + 1)^n]′ + C_n^2 [(x − 1)^n]^{(n−2)}[(x + 1)^n]″ + ⋯ + (x − 1)^n [(x + 1)^n]^{(n)}}.  (7.208)

But

[(x − 1)^n]^{(k)}|_{x=1} = 0,  k = 0, …, n−1,  [(x − 1)^n]^{(n)} = n!  (7.209)

and

[(x + 1)^n]^{(k)}|_{x=−1} = 0,  k = 0, …, n−1,  [(x + 1)^n]^{(n)} = n!.  (7.210)

Relation (7.208) leads to

P_n(1) = (1/(2^n n!)) n!(1 + 1)^n = 1.  (7.211)

(ii) From equation (7.208) we get

P_n(−1) = (1/(2^n n!)) n!(−1 − 1)^n = (−1)^n.  (7.212)
(iii) Let us observe that the polynomial (x² − 1)^n and its first n − 1 successive derivatives vanish at the points x = −1 and x = 1. Taking into account Rolle's theorem, under these conditions the first derivative has a real root in the interval (−1, 1). The first derivative thus vanishes at three points, x = −1, x = 1 and a point between −1 and 1, and it follows that the second derivative has two distinct roots in the interval (−1, 1). Applying Rolle's theorem step by step, it follows that the nth derivative has n distinct roots in the interval (−1, 1); hence P_n(x) has n distinct roots in the interval (−1, 1).

(iv) Let us write

R_n(x) = [(x² − 1)^{n−1}(x² − 1)]^{(n)},  (7.213)

a relation to which we apply Leibniz's formula (7.206) with

u = (x² − 1)^{n−1},  v = x² − 1.  (7.214)

It follows that

R_n(x) = (x² − 1)[(x² − 1)^{n−1}]^{(n)} + 2nx[(x² − 1)^{n−1}]^{(n−1)} + n(n − 1)[(x² − 1)^{n−1}]^{(n−2)}.  (7.215)

Now we write

R_n(x) = [(x² − 1)^n]^{(n)} = 2n[(x² − 1)^{n−1}x]^{(n−1)}  (7.216)

and apply again Leibniz's formula (7.206) with

u = (x² − 1)^{n−1},  v = x,  (7.217)

obtaining

R_n(x) = 2nx[(x² − 1)^{n−1}]^{(n−1)} + 2n(n − 1)[(x² − 1)^{n−1}]^{(n−2)}.  (7.218)

Multiplying relation (7.215) by 2 and subtracting relation (7.218), we get

R_n(x) = 2(x² − 1)R′_{n−1}(x) + 2nxR_{n−1}(x).  (7.219)

On the other hand,

R_n′(x) = [(x² − 1)^n]^{(n+1)} = 2n[(x² − 1)^{n−1}x]^{(n)},  (7.220)

and we may again apply Leibniz's formula (7.206) with

u = (x² − 1)^{n−1},  v = x,  (7.221)

resulting in

R_n′(x) = 2nxR′_{n−1}(x) + 2n²R_{n−1}(x).  (7.222)

Multiplying relation (7.219) by nx and relation (7.222) by x² − 1, and subtracting the results thus obtained one from the other, we obtain

(x² − 1)R_n′(x) = nxR_n(x) − 2n²R_{n−1}(x),  (7.223)

that is, relation (7.204), which had to be proved.
(v) Making n → n + 1 in relation (7.219), it follows that

R_{n+1}(x) = 2(x² − 1)R_n′(x) + 2(n + 1)xR_n(x)  (7.224)

or, equivalently,

2(x² − 1)R_n′(x) = R_{n+1}(x) − 2(n + 1)xR_n(x).  (7.225)

We multiply relation (7.223) by 2 and subtract expression (7.225) from the result thus obtained, that is,

0 = 2nxR_n(x) + 2(n + 1)xR_n(x) − R_{n+1}(x) − 4n²R_{n−1}(x)  (7.226)

or

R_{n+1}(x) = 2(2n + 1)xR_n(x) − 4n²R_{n−1}(x),  (7.227)

that is, relation (7.205), which had to be proved.

(vi) The last polynomial of the sequence (i.e., R_0(x)) preserves a constant sign, because it is a constant. Two neighboring polynomials R_k(x) and R_{k+1}(x) cannot vanish simultaneously because, taking into account equation (7.227), R_{k−1}(x) would vanish too, and, step by step, R_0(x) would also vanish, which is absurd. If R_n(x_0) = 0, then from equation (7.227) we obtain

R_{n+1}(x_0) = −4n²R_{n−1}(x_0),  (7.228)

hence

R_{n+1}(x_0)R_{n−1}(x_0) < 0.  (7.229)

Let x_0 be a root of R_n(x). From equation (7.223) we obtain

(x_0² − 1)R_n′(x_0) = nx_0R_n(x_0) − 2n²R_{n−1}(x_0)  (7.230)

and, because R_n(x_0) = 0, it follows that

(1 − x_0²)R_n′(x_0) = 2n²R_{n−1}(x_0).  (7.231)

But x_0 ∈ (−1, 1), because the roots of Legendre's polynomial

P_n(x) = (1/(2^n n!))R_n(x)  (7.232)

are in the interval (−1, 1); hence

1 − x_0² > 0.  (7.233)

From equation (7.231) and equation (7.233) it follows that R_n′(x_0) and R_{n−1}(x_0) have the same sign. It follows that the sequence R_n(x) forms a Sturm sequence.
7.12 GAUSS'S QUADRATURE FORMULAE

Let f : [−1, 1] → R and the quadrature formula¹¹ be

I = ∫_{−1}^1 f(x)dx ≈ Σ_{i=1}^n A_i f(x_i).  (7.234)

We wish formula (7.234) to be exact for polynomials of a maximum possible degree N. Because we have 2n unknowns, that is, the constants A_1, A_2, …, A_n and the knots x_1, x_2, …, x_n of the division, it follows that

N = 2n − 1,  (7.235)

because a polynomial of degree 2n − 1 has 2n coefficients. Proceeding as for Chebyshev's quadrature formulae, it follows that it is sufficient to satisfy relation (7.234) only for the polynomials 1, x, x², x³, …, x^{2n−1}, because they form a basis in the vector space of polynomials of degree at most 2n − 1. On the other hand,

∫_{−1}^1 x^k dx = (1 − (−1)^{k+1})/(k + 1),  (7.236)

and it follows the system

A_1 + A_2 + ⋯ + A_n = ∫_{−1}^1 dx = 2,
A_1x_1 + A_2x_2 + ⋯ + A_nx_n = ∫_{−1}^1 x dx = 0,
A_1x_1² + A_2x_2² + ⋯ + A_nx_n² = ∫_{−1}^1 x² dx = 2/3, …,
A_1x_1^k + A_2x_2^k + ⋯ + A_nx_n^k = ∫_{−1}^1 x^k dx = (1 − (−1)^{k+1})/(k + 1), …,
A_1x_1^{2n−1} + A_2x_2^{2n−1} + ⋯ + A_nx_n^{2n−1} = ∫_{−1}^1 x^{2n−1} dx = 0.  (7.237)

Let us consider now

f(x) = x^k P_n(x),  k = 0, …, n−1,  (7.238)

where P_n(x) is Legendre's polynomial of degree n. Taking into account the properties of the Legendre polynomial, we have

∫_{−1}^1 x^k P_n(x)dx = 0,  k = 0, …, n−1,  (7.239)

and from formula (7.234) we get

∫_{−1}^1 x^k P_n(x)dx = Σ_{i=1}^n A_i x_i^k P_n(x_i),  k = 0, …, n−1.  (7.240)

Equating the last two relations, it follows that we may take for x_i the roots of Legendre's polynomial of nth degree, all these roots being real, distinct, and situated in the interval (−1, 1).

¹¹The method was developed by Carl Friedrich Gauss (1777–1855) in Methodus nova integralium valores per approximationem inveniendi in 1815. The method is also known as Gauss–Legendre quadrature.
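The coefficients A_i are obtained in the sequel; numerically, neither the knots (the roots of P_n) nor the coefficients need to be recomputed by hand, since NumPy tabulates both. A Python sketch (ours) for n = 5:

    import numpy as np

    x, A = np.polynomial.legendre.leggauss(5)   # knots and coefficients of (7.234)

    f = lambda t: 1.0 / (1.0 + t**2)
    print(np.sum(A * f(x)), 2*np.arctan(1.0))   # sum A_i f(x_i) vs exact value pi/2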
We select now the first n equations from system (7.237); they form a linear system of n equations with n unknowns, namely the coefficients A_1, A_2, …, A_n. The determinant of this system is a Vandermonde one,

Δ = Π_{1≤i<j≤n} (x_i − x_j) ≠ 0,  (7.241)

because the roots x_i of Legendre's polynomial P_n(x) are distinct. It thus follows that the system has a unique solution.

Observation 7.15 If we have to calculate

J = ∫_a^b F(u)du,  (7.242)

then, by the change of variable

u = (b + a)/2 + ((b − a)/2)x,  du = ((b − a)/2)dx,  (7.243)

we obtain

J = ∫_{−1}^1 F((b + a)/2 + ((b − a)/2)x)·((b − a)/2)dx;  (7.244)

denoting

f(x) = F((b + a)/2 + ((b − a)/2)x)·((b − a)/2),  (7.245)

we obtain form (7.234) of the integral,

∫_a^b F(u)du = ((b − a)/2) Σ_{i=1}^n A_i F((b + a)/2 + ((b − a)/2)x_i),  (7.246)

where x_i are the roots of the Legendre polynomial.

7.13 ORTHOGONAL POLYNOMIALS

Let us denote by R[X] the set of polynomials with real coefficients in the indeterminate X. We define the scalar product on R[X] by

⟨P(x), Q(x)⟩ = ∫_a^b P(x)Q(x)ρ(x)dx,  (7.247)

where ρ(x) is a weight function.

Definition 7.10 We say that the polynomials P and Q are orthogonal if and only if ⟨P, Q⟩ = 0, where the scalar product ⟨·,·⟩ has been defined by relation (7.247).

Observation 7.16 Starting from the sequence of polynomials 1, x, x², …, we construct a sequence of orthogonal polynomials P_0, P_1, …, P_n with the help of the Gram–Schmidt procedure. Thus, we have

P_0 = 1,  P_1 = x − (⟨x, P_0⟩/‖P_0‖²)P_0, …,  P_n = x^n − Σ_{i=0}^{n−1} (⟨x^n, P_i⟩/‖P_i‖²)P_i, …,  (7.248)
where ‖·‖ marks the norm defined by

‖P‖ = √⟨P, P⟩.  (7.249)

We may thus construct various sequences of orthogonal polynomials.

7.13.1 Legendre Polynomials

In the case of Legendre's polynomials we choose a = −1, b = 1, ρ(x) = 1. It follows that

P_0 = 1,  (7.250)
P_1 = x − (⟨x, 1⟩/‖P_0‖²)·1 = x,  (7.251)
P_2 = x² − (⟨x², 1⟩/‖P_0‖²)·1 − (⟨x², x⟩/‖P_1‖²)·x = x² − 1/3,  (7.252)
P_3 = x³ − (⟨x³, 1⟩/‖P_0‖²)·1 − (⟨x³, x⟩/‖P_1‖²)·x − (⟨x³, x² − 1/3⟩/‖P_2‖²)·(x² − 1/3) = x³ − (3/5)x, ….  (7.253)

7.13.2 Chebyshev Polynomials

We define the Chebyshev polynomials by a = −1, b = 1, ρ(x) = 1/√(1 − x²). Because

I_k = ∫_{−1}^1 (x^k/√(1 − x²))dx = [−x^{k−1}√(1 − x²)]_{−1}^1 + (k − 1) ∫_{−1}^1 x^{k−2}√(1 − x²)dx
= (k − 1) ∫_{−1}^1 (x^{k−2}/√(1 − x²))dx − (k − 1) ∫_{−1}^1 (x^k/√(1 − x²))dx = (k − 1)I_{k−2} − (k − 1)I_k,  (7.254)

we obtain

kI_k = (k − 1)I_{k−2},  (7.255)

that is,

I_k = ((k − 1)/k)I_{k−2}.  (7.256)
On the other hand,

I_0 = ∫_{−1}^1 dx/√(1 − x²) = π,  (7.257)
I_1 = ∫_{−1}^1 x dx/√(1 − x²) = 0,  (7.258)

hence

I_{2p+1} = 0,  p ∈ N,  (7.259)
I_2 = (1/2)I_0 = π/2,  I_4 = (3/4)I_2 = 3π/8, …,  I_{2p} = ((2p − 1)/(2p))I_{2p−2} = ((2p − 1)!!/(2p)!!)π.  (7.260)

We obtain the Chebyshev polynomials in the form

P_0 = 1,  (7.261)
P_1 = x − (⟨x, 1⟩/‖P_0‖²)·1 = x,  (7.262)
P_2 = x² − (⟨x², 1⟩/‖P_0‖²)·1 − (⟨x², x⟩/‖P_1‖²)·x = x² − 1/2,  (7.263)
P_3 = x³ − (⟨x³, 1⟩/‖P_0‖²)·1 − (⟨x³, x⟩/‖P_1‖²)·x − (⟨x³, x² − 1/2⟩/‖P_2‖²)·(x² − 1/2) = x³ − (3/4)x, ….  (7.264)

7.13.3 Jacobi Polynomials

In the case of the Jacobi polynomials,¹² a = −1, b = 1, ρ(x) = (1 − x)^α(1 + x)^β, α > −1, β > −1, α and β integers. We observe that we obtain various sequences of orthogonal polynomials, depending on the choice of the parameters α and β. If α = β = 0, then we get Legendre's polynomials. Let us take α = β = 1. We have

P_0 = 1,  (7.265)
P_1 = x − (⟨x, 1⟩/‖P_0‖²)·1 = x,  (7.266)
P_2 = x² − (⟨x², 1⟩/‖P_0‖²)·1 − (⟨x², x⟩/‖P_1‖²)·x = x² − 1/5,  (7.267)
P_3 = x³ − (⟨x³, 1⟩/‖P_0‖²)·1 − (⟨x³, x⟩/‖P_1‖²)·x − (⟨x³, x² − 1/5⟩/‖P_2‖²)·(x² − 1/5) = x³ − (3/7)x, ….  (7.268)

7.13.4 Hermite Polynomials

In the case of the Hermite polynomials,¹³ we have a = −∞, b = ∞, ρ(x) = exp(−x²). We may write

I_k = ∫_{−∞}^∞ x^k e^{−x²}dx = [−(x^{k−1}/2)e^{−x²}]_{−∞}^∞ + ((k − 1)/2) ∫_{−∞}^∞ x^{k−2}e^{−x²}dx = ((k − 1)/2)I_{k−2}.  (7.269)

On the other hand,

I_0 = ∫_{−∞}^∞ e^{−x²}dx = √π,  (7.270)
I_1 = ∫_{−∞}^∞ xe^{−x²}dx = 0;  (7.271)

¹²These polynomials were introduced by Carl Gustav Jacob Jacobi (1804–1851).
¹³These polynomials were named in honor of Charles Hermite (1822–1901), who studied them in Sur un nouveau développement en série de fonctions in 1864. They were also studied by Pierre-Simon Laplace (1749–1827) in a memoir from 1810 and by Chebyshev in Sur le développement des fonctions à une seule variable in 1859.
hence, it follows that

$I_{2p+1} = 0,\quad p\in\mathbb{N},$ (7.272)

$I_0 = \sqrt{\pi},\quad I_2 = \frac{1}{2}I_0 = \frac{\sqrt{\pi}}{2},\quad I_4 = \frac{3}{4}\sqrt{\pi},\ \ldots,\quad I_{2p} = \frac{2p-1}{2}I_{2p-2} = \frac{(2p-1)!!}{2^p}\sqrt{\pi},\ \ldots$ (7.273)

We obtain the Hermite polynomials

$P_0 = 1,$ (7.274)

$P_1 = x - \frac{\langle x, 1\rangle}{\|P_0\|^2}\times 1 = x,$ (7.275)

$P_2 = x^2 - \frac{\langle x^2, 1\rangle}{\|P_0\|^2}\times 1 - \frac{\langle x^2, x\rangle}{\|P_1\|^2}\times x = x^2 - \frac{1}{2},$ (7.276)

$P_3 = x^3 - \frac{\langle x^3, 1\rangle}{\|P_0\|^2}\times 1 - \frac{\langle x^3, x\rangle}{\|P_1\|^2}\times x - \frac{\langle x^3, x^2 - \frac{1}{2}\rangle}{\|P_2\|^2}\times\left(x^2 - \frac{1}{2}\right) = x^3 - \frac{3}{2}x,\ \ldots$ (7.277)

7.13.5 Laguerre Polynomials

The Laguerre polynomials14 are defined by a = 0, b = ∞, ρ(x) = e^{−x}x^α, α integer. Obviously, we obtain various sequences of Laguerre polynomials, depending on the exponent α. We may consider the case α = 1. Taking into account that

$I_k = \int_0^{\infty} x^k\,x e^{-x}\,\mathrm{d}x = \left.(-x^{k+1}e^{-x})\right|_0^{\infty} + (k+1)\int_0^{\infty} x^{k-1}\,x e^{-x}\,\mathrm{d}x = (k+1)I_{k-1},$ (7.278)

$I_0 = \int_0^{\infty} x e^{-x}\,\mathrm{d}x = \left.(-x e^{-x})\right|_0^{\infty} + \int_0^{\infty} e^{-x}\,\mathrm{d}x = 1,$ (7.279)

we get

$I_1 = 2I_0 = 2,\quad I_2 = 3I_1 = 6,\ \ldots,\quad I_k = (k+1)I_{k-1} = (k+1)!.$ (7.280)

We thus obtain Laguerre's polynomials

$P_0 = 1,$ (7.281)

$P_1 = x - \frac{\langle x, 1\rangle}{\|P_0\|^2}\times 1 = x - 2,$ (7.282)

$P_2 = x^2 - \frac{\langle x^2, 1\rangle}{\|P_0\|^2}\times 1 - \frac{\langle x^2, x-2\rangle}{\|P_1\|^2}\times(x-2) = x^2 - 6x + 6,$ (7.283)

$P_3 = x^3 - \frac{\langle x^3, 1\rangle}{\|P_0\|^2}\times 1 - \frac{\langle x^3, x-2\rangle}{\|P_1\|^2}\times(x-2) - \frac{\langle x^3, x^2-6x+6\rangle}{\|P_2\|^2}\times(x^2 - 6x + 6) = x^3 - 12x^2 + 36x - 24.$ (7.284)

14 They are named after Edmond Nicolas Laguerre (1834–1886), who studied them in 1879.
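The Gram–Schmidt construction (7.248) is easy to mechanize with a computer algebra system. The following sketch (Python with sympy; the function name gram_schmidt is ours, not from the text) reproduces the monic sequences obtained above for several of the weights considered:

```python
import sympy as sp

x = sp.Symbol('x')

def gram_schmidt(n, a, b, rho):
    """Monic orthogonal polynomials P_0..P_n for the scalar product
    <P, Q> = integral_a^b P(x) Q(x) rho(x) dx, following (7.248)."""
    inner = lambda p, q: sp.integrate(sp.expand(p * q) * rho, (x, a, b))
    P = [sp.Integer(1)]
    for k in range(1, n + 1):
        # subtract from x^k its projections on P_0, ..., P_{k-1}
        pk = x**k - sum(inner(x**k, q) / inner(q, q) * q for q in P)
        P.append(sp.expand(pk))
    return P

# Legendre (7.250)-(7.253): [1, x, x**2 - 1/3, x**3 - 3*x/5]
print(gram_schmidt(3, -1, 1, sp.Integer(1)))
# Jacobi, alpha = beta = 1, (7.265)-(7.268): [1, x, x**2 - 1/5, x**3 - 3*x/7]
print(gram_schmidt(3, -1, 1, (1 - x) * (1 + x)))
# Laguerre, alpha = 1, (7.281)-(7.284): [1, x - 2, x**2 - 6*x + 6, ...]
print(gram_schmidt(3, 0, sp.oo, x * sp.exp(-x)))
```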
7.13.6 General Properties of the Orthogonal Polynomials

Let us begin with a remark.

Observation 7.17
(i) The complex roots λ1, λ2, ..., λn of the polynomials Pj, j = 1, m, given by formulae (7.248), satisfy the relation

$\lambda_k = \frac{\langle xQ_k, Q_k\rangle}{\|Q_k\|^2},\quad k = 1, n,$ (7.285)

in which

$Q_i(x) = \prod_{\substack{l=1 \\ l\neq i}}^{n}(x - \lambda_l),\quad i = 1, n.$ (7.286)

Indeed, if λk is a root of Pn(x), then

$P_n(x) = (x - \lambda_k)Q_k(x);$ (7.287)

from the orthogonality condition

$\langle Q_k, P_n\rangle = 0$ (7.288)

we get

$0 = \langle Q_k, P_n\rangle = \langle Q_k, xQ_k\rangle - \langle Q_k, \lambda_k Q_k\rangle,$ (7.289)

that is, a relation equivalent to equation (7.285).
(ii) The scalar product defined by relation (7.247) has the property of symmetry, that is, we have the relation

$\langle xP, Q\rangle = \langle P, xQ\rangle.$ (7.290)

Proposition 7.5 If the scalar product defined by relation (7.247) satisfies the symmetry condition (7.290), then the polynomials P0, P1, ..., Pm verify the relations

$P_0(x) = 1,\quad P_1(x) = x - \alpha_0,\ \ldots,\quad P_{i+1}(x) = (x - \alpha_i)P_i(x) - \beta_i P_{i-1}(x),\quad i = 1, m-1,$ (7.291)

where

$\alpha_i = \frac{\langle xP_i, P_i\rangle}{\|P_i\|^2},\quad i = 0, m-1,$ (7.292)

$\beta_i = \frac{\|P_i\|^2}{\|P_{i-1}\|^2},\quad i = 1, m-1.$ (7.293)

Demonstration. The first relations (7.291) result directly from formulae (7.248). Let now m ≥ 2 and, for any i = 1, m − 1, let us consider

$Q_{i+1}(x) = (x - \alpha_i)P_i(x) - \beta_i P_{i-1}(x),$ (7.294)

with Pi−1 and Pi given by relation (7.248). Because Pi−1 and Pi are orthogonal, we get

$\langle Q_{i+1}, P_i\rangle = \langle xP_i, P_i\rangle - \alpha_i\|P_i\|^2 = 0.$ (7.295)

Moreover,

$\langle Q_{i+1}, P_{i-1}\rangle = \langle P_i, xP_{i-1}\rangle - \beta_i\|P_{i-1}\|^2 = \langle P_i, xP_{i-1} - P_i\rangle = 0,$ (7.296)

because $x^i$ does not appear in the difference $xP_{i-1} - P_i$, while Pi is orthogonal to the polynomials 1, x, x², ..., x^{i−1}. On the other hand, for any k = 0, i − 2, the polynomial Pi is orthogonal to the polynomials Pk and xPk; hence

$\langle Q_{i+1}, P_k\rangle = \langle P_i, xP_k\rangle - \alpha_i\langle P_i, P_k\rangle - \beta_i\langle P_{i-1}, P_k\rangle = 0.$ (7.297)

We thus deduce that the polynomial Qi+1 is orthogonal to all the polynomials of degree at most equal to i and is of the form

$Q_{i+1}(x) = x^{i+1} + R(x),$ (7.298)

where the degree of R is at most equal to i. On the other hand, the polynomials P0, P1, ..., Pi form an orthogonal basis for the space of polynomials of degree at most equal to i, so that R(x) may be written in the form

$R(x) = \sum_{k=0}^{i}\frac{\langle R, P_k\rangle}{\|P_k\|^2}P_k.$ (7.299)

From the relation

$\langle x^{i+1}, P_k\rangle + \langle R, P_k\rangle = \langle Q_{i+1}, P_k\rangle = 0,\quad k = 0, i,$ (7.300)

we deduce

$\langle R, P_k\rangle = -\langle x^{i+1}, P_k\rangle;$ (7.301)

hence, Qi+1 = Pi+1, which proves the proposition.

Theorem 7.2 If the scalar product (7.247) satisfies the condition of symmetry (7.290), then the roots of the polynomial Pn constructed with relation (7.248) and denoted by λ1, λ2, ..., λn are real, distinct, and verify the relations

$\lambda_i = \frac{\langle xL_i, L_i\rangle}{\|L_i\|^2},\quad i = 1, n,$ (7.302)

in which

$L_i(x) = \prod_{\substack{k=1 \\ k\neq i}}^{n}\frac{x - \lambda_k}{\lambda_i - \lambda_k}.$ (7.303)

Demonstration. Because

$\langle xQ_j, Q_j\rangle = \langle Q_j, xQ_j\rangle = \overline{\langle xQ_j, Q_j\rangle},$ (7.304)

where the upper bar marks the complex conjugate, taking into account Proposition 7.5 we deduce that the roots are real and distinct.
  • 420. 412 NUMERICAL DIFFERENTIATION AND INTEGRATION If the coefficients of the polynomials Pn are real numbers, then the complex roots of these polynomials are conjugate two by two, which means that the polynomial Pn is written in the form Pn(x) = [(x − a)2 + b2 ]R(x), (7.305) where a and b are real numbers, while R is a polynomial with real coefficients of degree n − 2. We may write successively 0 = Pn, R = [(x − a)2 + b2 ]R, R = (x − a)2 R, R + b2 R, R = (x − a)R 2 + b2 R 2 > 0, (7.306) which is absurd. If the polynomial P would have a multiple real root a, then Pn(x) = (x − a)2 R(x), (7.307) where R is a polynomial of degree n − 2, which may have a as root. We have 0 = Pn, R = (x − a)2 R, R = (x − a)R 2 > 0 (7.308) obtaining again a contradiction. Formula (7.302) is a consequence of Proposition 7.5. 7.14 QUADRATURE FORMULAE OF GAUSS TYPE OBTAINED BY ORTHOGONAL POLYNOMIALS We have calculated in the previous paragraph various orthogonal polynomials till the third degree. Let P be such a polynomial of degree 3, and denote by x1, x2 and x3 its real and distinct roots. We search a quadrature formula of the form b a f (x)dx ≈ A1f (x1) + A2f (x2) + A3f (x3), (7.309) where A1, A2 and A3 are constants; the formula is exact for polynomials of maximum possible degree. We have b a dx = b − a, b a x dx = b2 − a2 2 , b a x2 dx = b3 − a3 3 , (7.310) obtaining thus a linear system of three equations with three unknowns A1 + A2 + A3 = b − a, A1x1 + A2x2 + A3x3 = b2 − a2 2 , (7.311) A1x2 1 + A2x2 2 + A3x2 3 = b3 − a3 3 . We deduce the values A1, A2, and A3 from system (7.311).
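System (7.311), and its n-point analog given next, is a linear system with a Vandermonde matrix and may be solved directly. A minimal numerical sketch (the helper name gauss_weights is ours):

```python
import numpy as np

def gauss_weights(nodes, a, b):
    """Solve the moment system for the coefficients A_1..A_n, given the
    roots x_1..x_n of the orthogonal polynomial P_n, as in (7.311)."""
    n = len(nodes)
    # Vandermonde matrix: row k holds x_i^k, k = 0..n-1.
    V = np.vander(nodes, n, increasing=True).T
    # Right-hand side: the moments (b^{k+1} - a^{k+1})/(k+1).
    m = np.array([(b**(k + 1) - a**(k + 1)) / (k + 1) for k in range(n)])
    return np.linalg.solve(V, m)

# Legendre roots of P_3 on [-1, 1]: recovers A = (5/9, 8/9, 5/9).
r = np.sqrt(3.0 / 5.0)
print(gauss_weights(np.array([-r, 0.0, r]), -1.0, 1.0))
```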
  • 421. QUADRATURE FORMULAE OF GAUSS TYPE OBTAINED BY ORTHOGONAL POLYNOMIALS 413 Obviously, if we wish to have a quadrature formula at n points, then we consider the polynomial Pn with the roots x1, x2, . . . , xn; it follows that the system A1 + A2 + · · · + An = b − a, A1x1 + A2x2 + · · · + Anxn = b2 − a2 2 , . . . , A1xn−1 1 + A2xn−1 2 + · · · + Anxn−1 n = bn − an n . (7.312) 7.14.1 Gauss–Jacobi Quadrature Formulae The Jacobi polynomial of second degree is given (the case α = β = 1) by P2(x) = x2 − 1 5 ; (7.313) it has the roots x1 = − 1 5 , x2 = 1 5 (7.314) and it follows that the system A1 + A2 = 2, −A1 1 5 + A2 1 5 = 0, (7.315) with the solution A1 = A2 = 1. We obtain the Gauss–Jacobi quadrature formula 1 −1 f (x)dx ≈ f − 1 5 + f 1 5 . (7.316) Considering now the Jacobi polynomial of third degree (the case α = β = 1) P3(x) = x3 − 3 7 x, (7.317) we obtain the roots x1 = − 3 7 , x2 = 0, x3 = 3 7 (7.318) and the system A1 + A2 + A3 = 2, −A1 3 7 + A3 3 7 = 0, 3 7 A1 + 3 7 A3 = 2 3 , (7.319) with the solution A1 = 7 9 , A2 = 4 9 , A3 = 7 9 . (7.320) It follows that the Gauss–Jacobi quadrature formula 1 −1 f (x)dx ≈ 7 9 f − 3 7 + 4 9 f (0) + 7 9 f 3 7 . (7.321)
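Let us remark that, since the nodes in (7.316) and (7.321) are roots of Jacobi rather than Legendre polynomials while the integrand carries the weight 1, these rules match fewer moments than the Gauss–Legendre rules with the same number of points. A quick check of the moments (a sketch; the helper names are ours):

```python
import numpy as np

# Two- and three-point Gauss-Jacobi rules (7.316) and (7.321).
def jacobi2(f):
    r = 1.0 / np.sqrt(5.0)
    return f(-r) + f(r)

def jacobi3(f):
    r = np.sqrt(3.0 / 7.0)
    return 7.0 / 9.0 * f(-r) + 4.0 / 9.0 * f(0.0) + 7.0 / 9.0 * f(r)

# Moments integral_{-1}^{1} x^k dx: exact values 2, 0, 2/3, 0, 2/5.
for k in range(5):
    exact = 2.0 / (k + 1) if k % 2 == 0 else 0.0
    print(k, exact, jacobi2(lambda x: x**k), jacobi3(lambda x: x**k))
```

The output shows that the two-point rule already fails on x², and the three-point rule on x⁴, which is consistent with the modest accuracy observed for these formulae in Section 7.16.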
7.14.2 Gauss–Hermite Quadrature Formulae

A formula of the form

$\int_{-\infty}^{\infty} e^{-x^2} f(x)\,\mathrm{d}x \approx \sum_{i=1}^{n} A_i f(x_i)$ (7.322)

is searched; this one must be exact for f a polynomial of the maximum possible degree. The Hermite polynomial P1(x) = x has the root x1 = 0, so that formula (7.322) becomes

$\int_{-\infty}^{\infty} e^{-x^2} f(x)\,\mathrm{d}x \approx A_1 f(0).$ (7.323)

Choosing f(x) = 1, we obtain

$\int_{-\infty}^{\infty} e^{-x^2}\,\mathrm{d}x = \sqrt{\pi} = A_1$ (7.324)

and the first Gauss–Hermite quadrature formula reads

$\int_{-\infty}^{\infty} e^{-x^2} f(x)\,\mathrm{d}x \approx \sqrt{\pi}\,f(0).$ (7.325)

Let us consider now the Hermite polynomial P2(x) = x² − 1/2, with the roots

$x_1 = -\frac{1}{\sqrt{2}},\qquad x_2 = \frac{1}{\sqrt{2}};$ (7.326)

the quadrature formula is now of the form

$\int_{-\infty}^{\infty} e^{-x^2} f(x)\,\mathrm{d}x \approx A_1 f\left(-\frac{1}{\sqrt{2}}\right) + A_2 f\left(\frac{1}{\sqrt{2}}\right).$ (7.327)

Taking f(x) = 1 and f(x) = x, we obtain the linear algebraic system

$\int_{-\infty}^{\infty} e^{-x^2}\,\mathrm{d}x = \sqrt{\pi} = A_1 + A_2,\qquad \int_{-\infty}^{\infty} x e^{-x^2}\,\mathrm{d}x = 0 = -A_1\frac{1}{\sqrt{2}} + A_2\frac{1}{\sqrt{2}},$ (7.328)

with the solution

$A_1 = A_2 = \frac{\sqrt{\pi}}{2};$ (7.329)

it follows that the second Gauss–Hermite quadrature formula

$\int_{-\infty}^{\infty} e^{-x^2} f(x)\,\mathrm{d}x \approx \frac{\sqrt{\pi}}{2} f\left(-\frac{1}{\sqrt{2}}\right) + \frac{\sqrt{\pi}}{2} f\left(\frac{1}{\sqrt{2}}\right).$ (7.330)

For a Gauss–Hermite quadrature formula in three points, one starts from the Hermite polynomial P3(x) = x³ − 3x/2, the roots of which are

$x_1 = -\sqrt{\frac{3}{2}},\qquad x_2 = 0,\qquad x_3 = \sqrt{\frac{3}{2}}.$ (7.331)

From

$\int_{-\infty}^{\infty} e^{-x^2} f(x)\,\mathrm{d}x \approx A_1 f(x_1) + A_2 f(x_2) + A_3 f(x_3),$ (7.332)
  • 423. QUADRATURE FORMULAE OF GAUSS TYPE OBTAINED BY ORTHOGONAL POLYNOMIALS 415 choosing f (x) = 1, f (x) = x, and f (x) = x2 , we obtain the linear algebraic system ∞ −∞ e−x2 dx = √ π = A1 + A2 + A3, ∞ −∞ xe−x2 dx = 0 = −A1 3 2 + A3 3 2 , (7.333) ∞ −∞ x2 e−x2 dx = √ π 2 = 3 2 A1 + 3 2 A3, with the solution A1 = √ π 6 , A2 = 2 √ π 3 , A3 = √ π 6 ; (7.334) it thus results the Gauss–Hermite quadrature formula ∞ −∞ e−x2 f (x)dx ≈ √ π 6 f − 3 2 + 2 √ π 3 f (0) + √ π 6 f 3 2 . (7.335) 7.14.3 Gauss–Laguerre Quadrature Formulae We take the quadrature formulae of the form (for α = 1) ∞ 0 xe−x f (x)dx ≈ n i=1 Aif (xi). (7.336) For the Laguerre polynomial P1(x) = x − 2 we find the root x1 = 2 and formula (7.336) becomes ∞ 0 xe−x f (x)dx ≈ A1f (2). (7.337) Choosing f (x) = 1, it follows that the equation ∞ 0 xe−x dx = 1 = A1; (7.338) thus we obtain the Gauss–Laguerre quadrature formula ∞ 0 xe−x f (x)dx ≈ f (2). (7.339) In the case of the Laguerre polynomial P2(x) = x2 − 6x + 6, the roots being x1 = 3 − √ 3, x2 = 3 + √ 3, (7.340) we obtain the relation ∞ 0 xe−x f (x)dx ≈ A1f (3 − √ 3) + A2f (3 + √ 3). (7.341)
  • 424. 416 NUMERICAL DIFFERENTIATION AND INTEGRATION Taking now f (x) = 1 and f (x) = x, it follows that the linear algebraic system ∞ 0 xe−x dx = 1 = A1 + A2, ∞ 0 x2 e−x dx = 2 = A1(3 − √ 3) + A2(3 + √ 3), (7.342) with the solution A1 = 1 + √ 3 2 √ 3 , A2 = √ 3 − 1 2 √ 3 . (7.343) We obtain the Gauss–Laguerre quadrature formula ∞ 0 xe−x f (x)dx ≈ 1 + √ 3 2 √ 3 f (3 − √ 3) + √ 3 − 1 2 √ 3 f (3 + √ 3). (7.344) Let the Laguerre polynomial now be P3(x) = x3 − 12x2 + 36x − 24, the roots of which are x1 ≈ 0.9358, x2 ≈ 3.3050, x3 ≈ 7.7598. (7.345) The quadrature formula reads ∞ 0 xe−x f (x)dx ≈ A1f (x1) + A2f (x2) + A3f (x3). (7.346) Choosing f (x) = 1, f (x) = x, and f (x) = x2 , it follows that the linear algebraic system ∞ 0 xe−x dx = 1 = A1 + A2 + A3, ∞ 0 x2 e−x dx = 2 = A1x1 + A2x2 + A3x3, ∞ 0 x3 e−x dx = 6 = A1x2 1 + A2x2 2 + A3x2 3 , (7.347) from which we obtain the values A1 = 0.589, A2 = 0.391, A3 = 0.020. (7.348) The Gauss–Laguerre quadrature formula reads ∞ 0 xe−x f (x)dx = 0.589f (0.9358) + 0.391f (3.3050) + 0.020f (7.7598). (7.349)
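Both families of rules may be checked against library implementations. A sketch assuming numpy and scipy (hermgauss generates the rule for the weight e^{−x²}; roots_genlaguerre(n, 1) the rule for the weight x e^{−x}):

```python
import numpy as np
from scipy.special import roots_genlaguerre

# Gauss-Hermite: numpy reproduces the three-point rule (7.335).
xh, Ah = np.polynomial.hermite.hermgauss(3)
print(xh)   # [-sqrt(3/2), 0, sqrt(3/2)]
print(Ah)   # [sqrt(pi)/6, 2*sqrt(pi)/3, sqrt(pi)/6]

# Generalized Gauss-Laguerre, weight x e^{-x}: the rule (7.349).
xl, Al = roots_genlaguerre(3, 1.0)
print(xl)   # close to the roots (7.345)
print(Al)   # close to the coefficients (7.348)

# Example: integral_0^inf x e^{-2x} dx = 1/4, i.e. f(x) = e^{-x}.
print(np.sum(Al * np.exp(-xl)))   # ~0.2455 (exact 0.25; n = 3 is coarse)
```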
7.15 OTHER QUADRATURE FORMULAE

7.15.1 Gauss Formulae with Imposed Points

We present now the theory in the case in which a point of division is imposed, so that

$\int_{-1}^{1} f(x)\,\mathrm{d}x = C_0 f(x_0) + \sum_{i=1}^{n} C_i f(x_i),$ (7.350)

where the division point x0 is the imposed point. Let us remark that 2n + 1 parameters remain to be determined, that is, the points xi, i = 1, n, and the coefficients C0, C1, ..., Cn. Proceeding as in the Gauss method, we have

$\int_{-1}^{1}\mathrm{d}x = 2 = C_0 + \sum_{i=1}^{n} C_i,\qquad \int_{-1}^{1} x\,\mathrm{d}x = 0 = C_0 x_0 + \sum_{i=1}^{n} C_i x_i,$
$\int_{-1}^{1} x^2\,\mathrm{d}x = \frac{2}{3} = C_0 x_0^2 + \sum_{i=1}^{n} C_i x_i^2,\ \ldots,\qquad \int_{-1}^{1} x^{2n-1}\,\mathrm{d}x = 0 = C_0 x_0^{2n-1} + \sum_{i=1}^{n} C_i x_i^{2n-1},$
$\int_{-1}^{1} x^{2n}\,\mathrm{d}x = \frac{2}{2n+1} = C_0 x_0^{2n} + \sum_{i=1}^{n} C_i x_i^{2n}.$ (7.351)

Multiplying each such relation (except the last one) by x0 and subtracting it from the following one, we obtain

$-2x_0 = \sum_{i=1}^{n} C_i(x_i - x_0) = \sum_{i=1}^{n} C_i x_i - x_0\sum_{i=1}^{n} C_i,$
$\frac{2}{3} = \sum_{i=1}^{n} C_i x_i(x_i - x_0) = \sum_{i=1}^{n} C_i x_i^2 - x_0\sum_{i=1}^{n} C_i x_i,$
$-\frac{2}{3}x_0 = \sum_{i=1}^{n} C_i x_i^2(x_i - x_0) = \sum_{i=1}^{n} C_i x_i^3 - x_0\sum_{i=1}^{n} C_i x_i^2,\ \ldots,$
$\frac{2}{2n+1} = \sum_{i=1}^{n} C_i x_i^{2n-1}(x_i - x_0) = \sum_{i=1}^{n} C_i x_i^{2n} - x_0\sum_{i=1}^{n} C_i x_i^{2n-1}.$ (7.352)

From the first relation (7.352), we find

$\sum_{i=1}^{n} C_i x_i = -2x_0 + x_0\sum_{i=1}^{n} C_i,$ (7.353)

which, replaced in the second relation (7.352), leads to

$\sum_{i=1}^{n} C_i x_i^2 = \frac{2}{3} - 2x_0^2 + x_0^2\sum_{i=1}^{n} C_i.$ (7.354)
  • 426. 418 NUMERICAL DIFFERENTIATION AND INTEGRATION Step by step, we deduce n i=1 Cixk i = Pk(x0) + xk 0 n i=1 Ci, (7.355) where Pk is a polynomial of (k − 1)th degree. On the other hand, from the first relation (7.351), we obtain n i=1 Ci = 2 − C0, (7.356) so that expression (7.355) becomes n i=1 Cixk i = Pk(x0) + (2 − C0)xk 0 . (7.357) The problem has been reduced to Gauss quadrature in which the terms that define the sums n i=1 Cixk i are no more equal to 1 −1 xkdx, but to the expressions at the right of relation (7.357). We find the same interpolation knots, but the constants C0, C1, . . . , Cn are other ones now. Similarly, we discuss the case in which more points are imposed. 7.15.2 Gauss Formulae in which the Derivatives of the Function Also Appear A formula in which the derivatives of the function also appear is of the form 1 −1 f (x)dx = C1f (x1) + · · · + Cpf (xp) + D1f (x1) + · · · + Dr f (xr ) + E1f (x1 ) + · · · + Esf (xs ) + · · · (7.358) Such a relation may or may not have certain imposed points, but we must be careful because the system which is obtained may be without solutions. As a first example, let us consider a Gauss formula of the form 1 −1 f (x)dx = Cf (y) + Df (y), (7.359) where the unknowns are C, D, and y. We have 1 −1 dx = 2 = C, 1 −1 x dx = 0 = Cy + D, 1 −1 x2 dx = 2 3 = Cy2 + 2Dy. (7.360) From the first relation (7.360) it follows that C = 2, and from the second one we get D = −Cy = −2y, which replaced in the last expression (7.360), leads to 2 3 = 2y2 − 2y2 = 0, (7.361) which is absurd; hence, we cannot have such a Gauss formula. Let us now search a Gauss formula of the form 1 −1 f (x)dx = Cf (−1) + Df (1) + Ef (y), (7.362)
  • 427. OTHER QUADRATURE FORMULAE 419 in which the unknowns are C, D, E, and y. We have 1 −1 dx = 2 = C + D, 1 −1 x dx = 0 = −C + D + E, 1 −1 x2 dx = 2 3 = C + D + 2Ey, 1 −1 x3 dx = 0 = −C + D + 3Ey2 . (7.363) It follows that successively C = 2 − D, E = C − D = 2 − 2D, (7.364) 2 3 = 2 + 2(2 − 2D)y, 2D − 2 + 3(2 − 2D)y2 = 0, (7.365) from which y = 1 3(D − 1) , y2 = 1 9(D − 1)2 , (7.366) y2 = 2 − 2D 3(2 − 2D) = 1 3 . (7.367) For y = 1/ √ 3, we obtain the values (D − 1)2 = 1 3 , D = 1 + 1 √ 3 , or D = 1 − 1 √ 3 , (7.368) E = − 2 √ 3 or E = 2 √ 3 , (7.369) C = 1 − 1 √ 3 or C = 1 + 1 √ 3 (7.370) as well as the quadrature formulae 1 −1 f (x)dx = 1 − 1 √ 3 f (−1) + 1 + 1 √ 3 f (1) − 2 √ 3 f 1 √ 3 , (7.371) 1 −1 f (x)dx = 1 + 1 √ 3 f (−1) + 1 − 1 √ 3 f (1) + 2 √ 3 f 1 √ 3 . (7.372) For y = −1/ √ 3, the formulae read 1 −1 f (x)dx = 1 − 1 √ 3 f (−1) + 1 + 1 √ 3 f (1) − 2 √ 3 f − 1 √ 3 , (7.373) 1 −1 f (x)dx = 1 + 1 √ 3 f (−1) + 1 − 1 √ 3 f (1) + 2 √ 3 f − 1 √ 3 . (7.374)
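Let us also note that each root y pairs with one determination of D in (7.366), so that, of the four combinations written above, the pairings (7.371) and (7.374) are the ones satisfying the complete system (7.363). A short numerical verification of these two rules on the monomials 1, x, x², x³ (a sketch; the rule names are ours):

```python
import numpy as np

s = 1.0 / np.sqrt(3.0)

def rule_7_371(f, fp):   # y = +1/sqrt(3): C = 1 - s, D = 1 + s, E = -2/sqrt(3)
    return (1 - s) * f(-1.0) + (1 + s) * f(1.0) - 2.0 / np.sqrt(3.0) * fp(s)

def rule_7_374(f, fp):   # y = -1/sqrt(3): C = 1 + s, D = 1 - s, E = +2/sqrt(3)
    return (1 + s) * f(-1.0) + (1 - s) * f(1.0) + 2.0 / np.sqrt(3.0) * fp(-s)

# Exact integrals over [-1, 1] of x^k, k = 0..3: 2, 0, 2/3, 0.
for k in range(4):
    f = lambda x, k=k: x**k
    fp = lambda x, k=k: k * x**(k - 1) if k else 0.0
    print(k, rule_7_371(f, fp), rule_7_374(f, fp))
```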
  • 428. 420 NUMERICAL DIFFERENTIATION AND INTEGRATION 7.16 CALCULATION OF IMPROPER INTEGRALS We will exemplify, in this paragraph, the methods described in Section 7.4 for the calculation of the improper integrals. We consider the integral I = ∞ 0 dx (x + 2) √ x + 1 . (7.375) The integral may be written in the form I = ∞ 0 xe−x dx xe−x(x + 2) √ x + 1 ; (7.376) we may apply the Gauss–Laguerre quadrature formula I ≈ 0.589f (0.9358) + 0.391f (3.3050) + 0.020f (7.7598), (7.377) where f (x) = ex x(x + 2) √ x + 1 . (7.378) It follows that f (0.9358) ≈ 0.667, f (3.3050) ≈ 0.749, f (7.7598) ≈ 10.459, (7.379) I ≈ 0.895. (7.380) By the change of variable x = u − 2, dx = du, (7.381) it follows that I = ∞ 2 du u √ u − 1 . (7.382) By a new change of variable u = 1 v , du = − 1 v2 dv, (7.383) the integral takes the form I = 0 1 2 − 1 v2 dv 1 v 1 v − 1 = 1 2 0 dv √ v(v − 1) . (7.384) By a new change of variable v = w + 1 4 , dv = 1 4 dw, (7.385) it follows that I = 1 −1 dw √ w + 1 √ 3 − w . (7.386)
  • 429. CALCULATION OF IMPROPER INTEGRALS 421 We may apply the Gauss quadrature formula in three points, obtaining I ≈ 5 9 f − 3 5 + 8 9 f (0) + 5 9 f 3 5 , (7.387) where f (w) = 1 √ w + 1 √ 3 − w , (7.388) f − 3 5 ≈ 0.9734, f (0) ≈ 0.5774, f 3 5 ≈ 0.5032, (7.389) I ≈ 1.3336. (7.390) If we wish to apply the Gauss–Jacobi quadrature formula in three points, we calculate successively f − 3 7 ≈ 1.5147, f (0) ≈ 0.5774, f 3 7 ≈ 0.3946, (7.391) I ≈ 7 9 f − 3 7 + 4 9 f (0) + 7 9 f 3 7 ≈ 1.7416. (7.392) Returning to relation (7.382) of the integral, we observe that the asymptotic behavior of the function f (u) = 1 u √ u − 1 (7.393) is given by the function g(u) = 1 u √ u = u− 3 2 . (7.394) Calculating (a > 0) ∞ a g(u)du = −2u− 1 2 ∞ a = 2 √ a , (7.395) we observe that the integral (7.395) may be made as small as we wish by conveniently choosing a. For example, let, a = 100. We may write ∞ 2 du u √ u − 1 = 100 2 du u √ u − 1 + ∞ 100 du u √ u − 1 ≈ 100 2 du u √ u − 1 + ∞ 100 du u 3 2 = 0.2 + 100 2 du u √ u − 1 . (7.396) By the change of variable u = 49v + 51, du = 49dv, (7.397) the last integral (7.396) becomes 100 2 du u √ u − 1 = 1 −1 49dv (49v + 51) √ 49v + 50 . (7.398)
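The slow approach to the exact value π/2 noticed above may be illustrated by increasing the number of Gauss–Laguerre points. A sketch assuming scipy:

```python
import numpy as np
from scipy.special import roots_genlaguerre

# I = integral_0^inf dx / ((x + 2) sqrt(x + 1)) = pi/2, written as in
# (7.376)-(7.378) with the weight x e^{-x} factored out of the integrand.
f = lambda x: np.exp(x) / (x * (x + 2.0) * np.sqrt(x + 1.0))

for n in (3, 5, 10, 20):
    x, A = roots_genlaguerre(n, 1.0)
    print(n, np.sum(A * f(x)))
print("exact:", np.pi / 2)
```

The convergence is slow because the transformed integrand decays only algebraically, which is exactly the difficulty discussed in this paragraph.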
  • 430. 422 NUMERICAL DIFFERENTIATION AND INTEGRATION Applying the Gauss quadrature formula in three points to the last integral f (v) = 49 (49v + 51) √ 49v + 50 , (7.399) f − 3 5 ≈ 1.0823, f (0) ≈ 0.1359, f 3 5 ≈ 0.0587, (7.400) we get 1 −1 49dv (49v + 51) √ 49v + 50 ≈ 0.7455, (7.401) I ≈ 0.9455. (7.402) In form (7.384), this integral may be easily calculated; it has the value I = (arcsin(2v))| 1 2 0 = π 2 ≈ 1.5708. (7.403) We remark that the values thus obtained are sensibly different from the exact value (7.403). The precision may be improved by using Gauss quadrature formulae in several points; but we are thus led to an increased calculation time. 7.17 KANTOROVICH’S METHOD The idea of this method15 consists in writing I = b a f (x)dx (7.404) in the form I = b a g(x)dx + b a (f (x) − g(x))dx, (7.405) where the first integral is directly calculated, while the second one is calculated by numerical formulae. Let us return, by exemplifying, to the example of the preceding paragraph written in the form I = 1 2 0 dx √ x √ 1 − x . (7.406) The function f (x) = 1 √ x √ 1 − x (7.407) is not defined for x = 0. We expand into series the function φ(x) = 1 √ 1 − x (7.408) 15The method was described by Leonid Vitaliyevich Kantorovich (1912–1986).
around x = 0; it follows that

$\varphi(x) = 1 + \frac{1}{2}x + \frac{3}{8}x^2 + \frac{5}{16}x^3 + \frac{35}{128}x^4 + \cdots$ (7.409)

We get

$I = \int_0^{1/2} x^{-1/2}\,\mathrm{d}x + \frac{1}{2}\int_0^{1/2} x^{1/2}\,\mathrm{d}x + \frac{3}{8}\int_0^{1/2} x^{3/2}\,\mathrm{d}x + \frac{5}{16}\int_0^{1/2} x^{5/2}\,\mathrm{d}x + \frac{35}{128}\int_0^{1/2} x^{7/2}\,\mathrm{d}x + J = 1.5691585 + J,$ (7.410)

where J is the integral

$J = \int_0^{1/2}\left[\frac{1}{\sqrt{1-x}} - \left(1 + \frac{1}{2}x + \frac{3}{8}x^2 + \frac{5}{16}x^3 + \frac{35}{128}x^4\right)\right]\mathrm{d}x.$ (7.411)

This last integral is no longer an improper one and may be calculated as usual, for example, by the trapezoid formula with the step h = 0.1. Denoting

$\psi(x) = \frac{1}{\sqrt{1-x}} - \left(1 + \frac{1}{2}x + \frac{3}{8}x^2 + \frac{5}{16}x^3 + \frac{35}{128}x^4\right),$ (7.412)

we have

$\psi(0) = 0,\quad \psi(0.1) = 2.7\times10^{-6},\quad \psi(0.2) = 9.65\times10^{-5},$
$\psi(0.3) = 8.263\times10^{-4},\quad \psi(0.4) = 0.0039944,\quad \psi(0.5) = 0.0143112,$ (7.413)

$J \approx \frac{0.1}{2}\left[\psi(0) + 2(\psi(0.1) + \psi(0.2) + \psi(0.3) + \psi(0.4)) + \psi(0.5)\right] = 0.001208.$ (7.414)

It follows that

$I \approx 1.56916 + 0.00121 = 1.57037,$ (7.415)

which is a value very close to the exact one, I = π/2.

7.18 THE MONTE CARLO METHOD FOR CALCULATION OF DEFINITE INTEGRALS

Hereafter, we consider firstly the one-dimensional case, generalizing then to the multidimensional case.

7.18.1 The One-Dimensional Case

Let us suppose that we must calculate the integral

$I = \int_a^b f(x)\,\mathrm{d}x,$ (7.416)

where a and b are two finite real numbers, a < b, while f is continuous and positive on [a, b]. With the change of variable

$x = a + (b-a)t,\qquad \mathrm{d}x = (b-a)\,\mathrm{d}t,$ (7.417)
Figure 7.4 The Monte Carlo method in the one-dimensional case.

the integral I reads

$I = \int_0^1 F(t)\,\mathrm{d}t,$ (7.418)

where

$F(t) = (b-a)f(a + (b-a)t).$ (7.419)

Let

$M = \max_{t\in[0,1]} F(t),$ (7.420)

so that the integral I may be put in the form

$I = M\int_0^1\frac{F(t)}{M}\,\mathrm{d}t = M\int_0^1 G(t)\,\mathrm{d}t.$ (7.421)

The function G is defined on the interval [0, 1] and takes values in the same interval. Graphically, this is shown in Figure 7.4. Denoting by A the hatched area in Figure 7.4, it follows that the integral I has the form

$I = MA.$ (7.422)

Obviously, if the value M given by relation (7.420) is difficult to determine, then we may take a covering value (an upper bound) for M. Hence, it follows that the determination of the value of the integral has been reduced to the determination of the area A. To do this, we generate pairs (x, y) of aleatory numbers, uniformly distributed in the interval [0, 1], resulting in the points P1(x1, y1), P2(x2, y2), ..., Pn(xn, yn). We initialize the integer variable s at 0. If the point Pi(xi, yi) is in the interior of the hatched area (the case of the point P1 in Fig. 7.4), then the variable s is incremented by a unit; in the opposite case (the case of the point P2 in Fig. 7.4), the variable s remains unchanged. Finally, the area A is approximated by the formula

$A \approx \frac{s}{n},$ (7.423)

where n is the number of generations. Obviously,

$A = \lim_{n\to\infty}\frac{s}{n}.$ (7.424)
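The hit-or-miss procedure is immediate to program; a minimal sketch (the function name monte_carlo_1d is ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def monte_carlo_1d(f, a, b, M, n=100_000):
    """Hit-or-miss estimate of integral_a^b f(x) dx with 0 <= f <= M:
    the fraction s/n of points under the graph estimates the area (7.423)."""
    t = a + (b - a) * rng.random(n)   # abscissae, uniform in [a, b]
    y = M * rng.random(n)             # ordinates, uniform in [0, M]
    s = np.count_nonzero(y < f(t))    # points below the graph of f
    return M * (b - a) * s / n

# Example: integral_1^2 x sin x dx = 1.4404...; max of x sin x on [1, 2]
# is attained at x = 2, so M = 2 sin 2 is a valid covering value.
f = lambda x: x * np.sin(x)
print(monte_carlo_1d(f, 1.0, 2.0, M=2.0 * np.sin(2.0)))
```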
  • 433. THE MONTE CARLO METHOD FOR CALCULATION OF DEFINITE INTEGRALS 425 Observation 7.18 (i) If the function f changes of sign in the interval [a, b], then we divide the interval [a, b] in subintervals on which the function f keeps a constant sign; thus we apply the described method on each such interval. (ii) If F(t) is negative on the interval [0, 1], then we choose M = min t∈[0,1] F(t) (7.425) and it follows that G : [0, 1] → [0, 1]; the procedure may be applied. 7.18.2 The Multidimensional Case Let the function be y = f (x1, x2, . . . , xn), (7.426) continuous on the closed domain D of Rn and the integral I = D · · · f (x1, x2, . . . , xn)dx1 dx2 · · · dxn. (7.427) The domain D may be included in the n-dimensional hyperparallelepiped [a1, b1] × [a2, b2] × · · · × [an, bn] ⊇ D (7.428) We make the change of variable xi = ai + (bi − ai)ξi, i = 1, n, (7.429) from which D(x1, x2, . . . , xn) D(ξ1, ξ2, . . . , ξn) = b1 − a1 0 · · · 0 0 b2 − a2 · · · 0 · · · · · · · · · · · · 0 0 · · · bn − an = n i=1 (bi − ai); (7.430) the integral I becomes I = E · · · F(ξ1, ξ2, . . . , ξn)dξ1 dξ2 · · · dξn, (7.431) where E marks the n-dimensional hypercube E = [0, 1] × [0, 1] × · · · × [0, 1], (7.432) while F(ξ1, ξ2, . . . , ξn) = n i=1 (bi − ai)f (a1 + (b1 − a1)ξ1, . . . , an + (bn − an)ξn). (7.433)
We generate groups of n aleatory numbers uniformly distributed in the interval [0, 1]. Let g = (g1, g2, ..., gn) be such a group. The point P(g1, g2, ..., gn) may fall in the interior of the transform of the domain D under the change of variables (7.429), in which case it is taken into consideration with the value F(g1, g2, ..., gn). Let us denote by S the set of all the points of this kind obtained by N generations of groups of uniformly distributed aleatory numbers. We define the value

$y_{\mathrm{med}} = \frac{1}{|S|}\sum_{g\in S} F(g),$ (7.434)

where |S| is the cardinal number of S, F(g) = F(g1, g2, ..., gn), while g = (g1, g2, ..., gn) is the group of n uniformly distributed aleatory numbers. For the integral I, it follows the approximate value

$I \approx \frac{1}{N}\sum_{g\in S} F(g).$ (7.435)

If the function F(ξ1, ξ2, ..., ξn) is positive, then we may consider the integral (7.431) as defining the volume of the body in a (n + 1)-dimensional space given by

$0 \le \xi_i \le 1,\quad i = 1, n,\qquad 0 \le y \le F(\xi).$ (7.436)

We may find a real positive number B for which 0 ≤ F(ξ) ≤ B. We introduce the variable

$\eta = \frac{1}{B}y,$ (7.437)

so that the integral I now becomes

$I = B\int\cdots\int\mathrm{d}\xi_1\,\mathrm{d}\xi_2\cdots\mathrm{d}\xi_n\,\mathrm{d}\eta,$ (7.438)

the integral being taken over the volume of a hypercylinder interior to the (n + 1)-dimensional hypercube given by E × [0, 1]. Now, we also generate groups of uniformly distributed aleatory numbers in the interval [0, 1]; but, in this case, a group will contain n + 1 uniformly distributed aleatory numbers. If we denote by S the set of groups which define points in the interior of the hypercylinder, then

$I \approx B\frac{|S|}{N},$ (7.439)

where N is the number of generations of such groups.

Observation 7.19
(i) If the generation of a group of aleatory numbers yields a point on the frontier of the domain, then this point, which is defined by such a group, may be considered valid or, on the contrary, may be left out of consideration. Obviously, no matter how we treat such a point, passing to the limit for the number of generations N → ∞, we obtain the searched value of the integral.
(ii) The method supposes that we may determine whether a group g belongs or not to the set S. If the frontier of the domain D is described by complicated expressions, then the validation of a group g may take considerable time, so that the method is quite slow.
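For the mean-value estimate (7.435), a sketch in the same spirit (the helper monte_carlo_nd and its arguments are ours):

```python
import numpy as np

rng = np.random.default_rng(1)

def monte_carlo_nd(f, bounds, inside, N=200_000):
    """Mean-value Monte Carlo following (7.429)-(7.435): sample the bounding
    box of D, keep the points with inside(X) True, sum f there and scale
    by the box volume (the Jacobian factor of (7.430))."""
    bounds = np.asarray(bounds, dtype=float)     # shape (n, 2)
    lo, hi = bounds[:, 0], bounds[:, 1]
    X = lo + (hi - lo) * rng.random((N, len(bounds)))
    mask = inside(X)
    vol = np.prod(hi - lo)
    return vol * np.sum(f(X[mask])) / N

# Example: integral of 1 over the unit disk, exact value pi.
f = lambda X: np.ones(len(X))
inside = lambda X: X[:, 0]**2 + X[:, 1]**2 <= 1.0
print(monte_carlo_nd(f, [(-1, 1), (-1, 1)], inside))  # ~3.14
```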
7.19 NUMERICAL EXAMPLES

Example 7.1 Let us consider the function f : [0, 3] → R,

$f(x) = e^x(\sin x + \cos^2 x),$ (7.440)

for which the values in the following table are known.

xi     yi = f(xi)
0      1
0.5    2.060204
1.2    3.530421
1.8    6.203722
2.3    11.865576
3.0    22.520007

We wish to determine approximations for the values of the derivatives f'(x) and f''(x) in the interior division knots and to compare these values with the real values. The derivatives of the function f are given by

$f'(x) = e^x(\sin x + \cos x + \cos^2 x - \sin 2x),$ (7.441)

$f''(x) = e^x(2\cos x + \cos^2 x - 2\sin 2x - 2\cos 2x).$ (7.442)

For the knot x1 = 0.5 we have

$h = x_1 - x_0 = 0.5,\qquad h_1 = x_2 - x_1 = 0.7 = 1.4h = \alpha h$ (7.443)

and it follows that

$f'(x_1) \approx \frac{1}{(\alpha+1)h}(f(x_2) - f(x_0)) = \frac{1}{(1.4+1)\times 0.5}(3.530421 - 1) = 2.10868,$ (7.444)

$f''(x_1) \approx \frac{2}{\alpha(\alpha+1)h}(f(x_2) + \alpha f(x_0) - (1+\alpha)f(x_1)) = \frac{2}{1.4\times 2.4\times 0.5}(3.530421 + 1.4\times 1 - 2.4\times 2.060204) = -0.01675.$ (7.445)

The exact values are

$f'(x_1) = 2.11974,\qquad f''(x_1) = -0.39278.$ (7.446)

The calculations are given in the following table.

xi     yi          approx. f'(xi)   exact f'(xi)   approx. f''(xi)   exact f''(xi)
0      1                            2                                1
0.5    2.060204    2.10868          2.11974        -0.01675          -0.39278
1.2    3.530421    3.18732          2.49087        2.53636           3.25332
1.8    6.203722    9.09290          7.50632        7.49259           13.76763
2.3    11.865576   13.59690         15.13127       3.24742           13.19643
3.0    22.520007                    8.24769                          -47.43018
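The approximations (7.444) and (7.445) for non-equidistant knots may be wrapped in a small routine (a sketch; the function name d1_d2 is ours):

```python
def d1_d2(x0, x1, x2, f0, f1, f2):
    """Derivative estimates at the middle knot of a non-equidistant triple,
    following (7.444)-(7.445), with h = x1 - x0 and alpha = (x2 - x1)/h."""
    h = x1 - x0
    alpha = (x2 - x1) / h
    d1 = (f2 - f0) / ((alpha + 1.0) * h)
    d2 = 2.0 * (f2 + alpha * f0 - (1.0 + alpha) * f1) / (alpha * (alpha + 1.0) * h)
    return d1, d2

# The worked computation at x1 = 0.5 of Example 7.1:
print(d1_d2(0.0, 0.5, 1.2, 1.0, 2.060204, 3.530421))  # ~(2.10868, -0.01675)
```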
Example 7.2 Let f : [0, 4] → R,

$f(x) = \frac{\sin x}{1 + \cos^2 x}$ (7.447)

and the equidistant division knots x0 = 0, x1 = 1, x2 = 2, x3 = 3, x4 = 4. Approximate values of the derivatives f'(xi), f''(xi), i = 1, 3, as well as of the derivatives f'(0.5), f''(0.4), f'(3.7), f''(3.73) are asked. We construct the table of finite differences.

xi   yi = f(xi)   Δyi         Δ²yi        Δ³yi        Δ⁴yi
0    0            0.651330    -0.527588   -0.299956   1.229384
1    0.651330     0.123742    -0.827544   0.929824
2    0.775072     -0.703802   0.102280
3    0.071270     -0.601522
4    -0.530252

If we use an expansion into a Taylor series, then we obtain the following results:

$f'(1) \approx \frac{f(2) - f(0)}{2} = 0.387536,\qquad f'(2) \approx \frac{f(3) - f(1)}{2} = -0.290030,\qquad f'(3) \approx \frac{f(4) - f(2)}{2} = -0.652662,$ (7.448)

$f''(1) \approx \frac{f(0) + f(2) - 2f(1)}{1^2} = -0.527588,\qquad f''(2) \approx \frac{f(1) + f(3) - 2f(2)}{1^2} = -0.827544,\qquad f''(3) \approx \frac{f(2) + f(4) - 2f(3)}{1^2} = 0.102280.$ (7.449)

The Newton forward and backward interpolation polynomials read

$P_1(q_1) = y_0 + \frac{q_1^{(1)}}{1!}\Delta y_0 + \frac{q_1^{(2)}}{2!}\Delta^2 y_0 + \frac{q_1^{(3)}}{3!}\Delta^3 y_0 + \frac{q_1^{(4)}}{4!}\Delta^4 y_0,$ (7.450)

$P_2(q_2) = y_4 + \frac{q_2^{(1)}}{1!}\Delta y_3 + \frac{(q_2+1)^{(2)}}{2!}\Delta^2 y_2 + \frac{(q_2+2)^{(3)}}{3!}\Delta^3 y_1 + \frac{(q_2+3)^{(4)}}{4!}\Delta^4 y_0,$ (7.451)

respectively, where

$q_1 = \frac{x - x_0}{h},\qquad q_2 = \frac{x - x_n}{h}.$ (7.452)

We deduce the following values:

• for x = 0.5:

$q_1 = \frac{0.5 - 0}{1} = 0.5,$ (7.453)

$f'(0.5) \approx \frac{1}{h}\left[\Delta y_0 + \frac{q_1^{(1)}}{1!}\Delta^2 y_0 + \frac{q_1^{(2)}}{2!}\Delta^3 y_0 + \frac{q_1^{(3)}}{3!}\Delta^4 y_0\right] = 0.501867;$ (7.454)
• for x = 0.4:

$q_1 = \frac{0.4 - 0}{1} = 0.4,$ (7.455)

$f''(0.4) \approx \frac{1}{h^2}\left[\Delta^2 y_0 + \frac{q_1^{(1)}}{1!}\Delta^3 y_0 + \frac{q_1^{(2)}}{2!}\Delta^4 y_0\right] = -0.801243;$ (7.456)

• for x = 3.7:

$q_2 = \frac{3.7 - 4}{1} = -0.3,$ (7.457)

$f'(3.7) \approx \frac{1}{h}\left[\Delta y_3 + \frac{(q_2+1)^{(1)}}{1!}\Delta^2 y_2 + \frac{(q_2+2)^{(2)}}{2!}\Delta^3 y_1 + \frac{(q_2+3)^{(3)}}{3!}\Delta^4 y_0\right] = 0.681654;$ (7.458)

• for x = 3.73:

$q_2 = \frac{3.73 - 4}{1} = -0.27,$ (7.459)

$f''(3.73) \approx \frac{1}{h^2}\left[\Delta^2 y_2 + \frac{(q_2+2)^{(1)}}{1!}\Delta^3 y_1 + \frac{(q_2+3)^{(2)}}{2!}\Delta^4 y_0\right] = 4.614004.$ (7.460)

On the other hand,

$f'(x) = \frac{\cos x\,(2 + \sin^2 x)}{(1 + \cos^2 x)^2},$ (7.461)

$f''(x) = \frac{\sin x\,(1 + 7\cos^2 x - 4\sin^2 x)}{(1 + \cos^2 x)^3}$ (7.462)

and the exact values of the function and of its first two derivatives are given in the following table.

x      f(x)         f'(x)        f''(x)
0      0            0.5          0
0.4    0.2102684    0.876641     0.422211
0.5    0.270839     0.624515     0.534146
1      0.651330     0.876641     0.405069
2      0.775072     -0.854707    -0.294121
3      0.071270     -0.510032    0.142858
3.7    -0.308174    -0.654380    -0.596319
3.73   -0.328049    -0.670677    -0.633525
4      -0.530252    -0.8255541   -0.697247

These two examples show: (i) the method that uses the expansion into a Taylor series is more precise than the one which uses interpolation polynomials; (ii) the derivative of first order is calculated more precisely than that of second order; (iii) numerical differentiation does not offer a good precision.
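The finite-difference table and the central-difference estimates (7.448) and (7.449) of Example 7.2 can be reproduced in a few lines (a sketch):

```python
import numpy as np

y = np.array([0.0, 0.651330, 0.775072, 0.071270, -0.530252])

# Forward finite differences: entry k holds Delta^k y_0, Delta^k y_1, ...
cols = [y]
for _ in range(4):
    cols.append(np.diff(cols[-1]))
for k, c in enumerate(cols):
    print(k, np.round(c, 6))

# Central-difference estimates (7.448)-(7.449) at the interior knots (h = 1):
print((y[2:] - y[:-2]) / 2.0)          # f'(1), f'(2), f'(3)
print(y[:-2] - 2 * y[1:-1] + y[2:])    # f''(1), f''(2), f''(3)
```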
Example 7.3 Let

$I = \int_1^2 x\sin x\,\mathrm{d}x.$ (7.463)

We shall give approximations of the integral I using various numerical methods. The integral I may be directly calculated, obtaining the value

$I = \left.(-x\cos x + \sin x)\right|_1^2 = 1.4404224.$ (7.464)

To apply the trapezium method, we take the division step h = 0.1, obtaining the following data.

xi    yi = f(xi)
1     0.8414710
1.1   0.9803281
1.2   1.1184469
1.3   1.2526256
1.4   1.3796296
1.5   1.4962425
1.6   1.5993178
1.7   1.6858302
1.8   1.7529257
1.9   1.7979702
2.0   1.8185949

It follows that

$I \approx \frac{0.1}{2}\left(f(1) + 2(f(1.1) + f(1.2) + \cdots + f(1.9)) + f(2)\right) = 1.4393350.$ (7.465)

The same problem may be solved by using Simpson's formula, obtaining

$I \approx \frac{0.1}{3}\left(f(1) + 2(f(1.2) + f(1.4) + f(1.6) + f(1.8)) + 4(f(1.1) + f(1.3) + f(1.5) + f(1.7) + f(1.9)) + f(2)\right) = 1.4404233.$ (7.466)

Let us consider the transformation

$x = \frac{1}{2}y + \frac{3}{2},\qquad \mathrm{d}x = \frac{1}{2}\,\mathrm{d}y.$ (7.467)

Now, the integral I reads

$I = \int_{-1}^{1}\frac{y+3}{2}\sin\left(\frac{y+3}{2}\right)\frac{\mathrm{d}y}{2}.$ (7.468)

We shall determine the Chebyshev quadrature formulae for the cases n = 2, n = 3, and n = 4, applying them to the integral (7.468). In the case n = 2 we obtain

$A_1 = A_2 = 1$ (7.469)
  • 439. NUMERICAL EXAMPLES 431 and the system x1 + x2 = 0, x2 1 + x2 2 = 2 3 , (7.470) which it results in Chebyshev’s formula I = 1 −1 f (x)dx ≈ f − 1 √ 3 + f 1 3 (7.471) and, numerically, I ≈ 1.440144. (7.472) If n = 3, then we deduce the values A1 = A2 = A3 = 2 3 (7.473) and the system x1 + x2 + x3 = 0, x2 1 + x2 2 + x2 3 = 1, x3 1 + x3 2 + x3 3 = 0, (7.474) with the solution x1 = − 1 2 , x2 = 0, x3 = 1 2 . (7.475) Chebyshev’s formula reads I = 1 −1 f (x)dx ≈ 2 3 f − 1 2 + f (0) + f 1 2 , (7.476) leading to the value I ≈ 1.440318. (7.477) Finally, in the case n = 4 we obtain the values A1 = A2 = A3 = A4 = 1 2 (7.478) and the system x1 + x2 + x3 + x4 = 0, x2 1 + x2 2 + x2 3 + x2 4 = 4 3 , x3 1 + x3 2 + x3 3 + x3 4 = 0, x4 1 + x4 2 + x4 3 + x4 4 = 4 5 , (7.479) with the solution x1 = −0.79466, x2 = −0.18759, x3 = 0.18759, x4 = 0.79466. (7.480) The integral I will have the value I = 1 −1 f (x)dx ≈ 0.5(f (−0.79466) + f (−0.18759) + f (0.18759) + f (0.79466)), (7.481)
hence

$I \approx 1.440422.$ (7.482)

The same integral I of equation (7.468) may be calculated by quadrature formulae of Gauss type. To do this, we determine firstly the Legendre polynomials:

$P_0(x) = 1,$ (7.483)

$P_1(x) = \frac{1}{2\times 1!}\frac{\mathrm{d}(x^2-1)}{\mathrm{d}x} = x,$ (7.484)

$P_2(x) = \frac{1}{2^2\times 2!}\frac{\mathrm{d}^2[(x^2-1)^2]}{\mathrm{d}x^2} = \frac{1}{2}(3x^2 - 1),$ (7.485)

$P_3(x) = \frac{1}{2^3\times 3!}\frac{\mathrm{d}^3[(x^2-1)^3]}{\mathrm{d}x^3} = \frac{1}{2}(5x^3 - 3x),$ (7.486)

$P_4(x) = \frac{1}{2^4\times 4!}\frac{\mathrm{d}^4[(x^2-1)^4]}{\mathrm{d}x^4} = \frac{1}{8}(35x^4 - 30x^2 + 3),\ \ldots$ (7.487)

The roots of these polynomials are

• for P1(x):

$x_1 = 0;$ (7.488)

• for P2(x):

$x_1 = -\frac{1}{\sqrt{3}},\qquad x_2 = \frac{1}{\sqrt{3}};$ (7.489)

• for P3(x):

$x_1 = -\sqrt{\frac{3}{5}},\qquad x_2 = 0,\qquad x_3 = \sqrt{\frac{3}{5}};$ (7.490)

• for P4(x):

$x_1 = -\sqrt{\frac{30+\sqrt{480}}{70}},\quad x_2 = -\sqrt{\frac{30-\sqrt{480}}{70}},\quad x_3 = \sqrt{\frac{30-\sqrt{480}}{70}},\quad x_4 = \sqrt{\frac{30+\sqrt{480}}{70}}.$ (7.491)

In the case n = 2 we obtain the system

$A_1 + A_2 = 2,\qquad -A_1\frac{1}{\sqrt{3}} + A_2\frac{1}{\sqrt{3}} = 0,$ (7.492)

with the solution

$A_1 = 1,\qquad A_2 = 1;$ (7.493)

we obtain the quadrature formula

$I = \int_{-1}^{1} f(x)\,\mathrm{d}x \approx f\left(-\frac{1}{\sqrt{3}}\right) + f\left(\frac{1}{\sqrt{3}}\right),$ (7.494)

which is Chebyshev's quadrature formula (7.471), leading to the same value (7.472) for I.
  • 441. NUMERICAL EXAMPLES 433 In the case n = 3 we obtain the system A1 + A2 + A3 = 2, − 3 5 A1 + 3 5 A3 = 0, 3 5 A1 + 2 5 A3 = 2 3 , (7.495) with the solution A1 = 5 9 , A2 = 8 9 , A3 = 5 9 (7.496) and the formula I = 1 −1 f (x)dx ≈ 5 9 f − 3 5 + 8 9 f (0) + 5 9 f 3 5 , (7.497) from which I ≈ 1.440423. (7.498) For n = 4 we obtain the system A1 + A2 + A3 + A4 = 2, A1x1 + A2x2 + A3x3 + A4x4 = 0, A1x2 1 + A2x2 2 + A3x2 3 + A4x2 4 = 2 3 , A1x3 1 + A2x3 2 + A3x3 3 + A4x3 4 = 0, (7.499) with x1, x2, x3, and x4 given by equation (7.491). The solution of this system is A1 = A4 = x2 3 − 1 3 x2 3 − x2 4 = 0.3478548, A2 = A3 = x2 4 − 1 3 x2 4 − x2 3 = 0.6521452, (7.500) leading to the quadrature formula I = 1 −1 f (x)dx ≈ 0.3478548f (x1) + 0.6521452f (x2) + 0.6521452f (x3) + 0.3478548f (x4) = 0.3478548[f (−0.8611363) + f (0.8611363)] + 0.6521452[f (−0.3399810) + f (0.3399810)], (7.501) from which I ≈ 1.440422. (7.502) Another possibility of determination of the integral (7.463) is by the use of the Monte Carlo method. To do this, we denote by f (x) the function f : [1, 2] → R, f (x) = x sin x, (7.503) the derivative of which is f (x) = sin x − x cos x. (7.504) The equation f (x) = 0 leads to x = tan x, (7.505) without solution in the interval [1, 2].
  • 442. 434 NUMERICAL DIFFERENTIATION AND INTEGRATION Moreover, f (x) > 0 for any x ∈ [1, 2]. We deduce that the maximum value of the function f takes place at the point 2, while the minimum value of the same function takes place at the point 1; we may write max f = f (2) = 1.818595, min f = f (1) = 0.841471. (7.506) We shall generate pairs of aleatory numbers (a, b), where a is an aleatory number uniformly distributed in the interval [1, 2], while b is an aleatory number uniformly distributed in the interval [0, 2]. The value b is then compared with f (a). If b < f (a), then the pair (a, b) is taken into consideration; otherwise it is eliminated. We have made 1000 generations of the following form. Step a b f (a) Counter 1 1.644 1.958 1.639597 0 2 1.064 1.622 0.930259 0 3 1.622 1.414 1.619874 1 4 1.521 0.606 1.519115 2 5 1.212 0.600 1.134820 3 6 1.303 1.086 1.256556 3 7 1.856 0.872 1.781026 4 8 1.648 1.648 1.643091 4 9 1.713 0.702 1.695709 5 10 1.000 1.288 0.841471 5 We obtained the result I ≈ 1.456. (7.507) To apply Euler’s or Gregory’s formulae, we may calculate first Bernoulli’s numbers. Writing et − 1 = t + t2 2! + t3 3! + t4 4! + t5 5! + · · · (7.508) and t et − 1 = B0 + B1t + B2 2! t2 + B3 3! t3 + B4 4! t4 + B5 5! t5 + · · · , (7.509) it follows that t = B0 + B1t + B2 2 t2 + B3 6 t3 + B4 24 t4 + B5 120 t5 + · · · × t + t2 2 + t3 6 + t4 24 + t5 120 + · · · , (7.510) hence B0 = 1, B0 2 + B1 = 0, B0 6 + B1 2 + B2 2 = 0, B0 24 + B1 6 + B2 4 + B3 6 = 0, B0 120 + B1 24 + B2 12 + B3 12 + B4 24 = 0, . . . , (7.511) from which B0 = 1, B1 = − 1 2 , B2 = 1 6 , B3 = 0, B4 = − 1 30 , . . . (7.512)
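The recursion (7.511) is easily mechanized. A sketch using exact rational arithmetic (written in the equivalent form Σ C(m+1, k) B_k = 0, m ≥ 1, obtained by multiplying the relations (7.511) by the appropriate factorials):

```python
from fractions import Fraction
from math import comb

def bernoulli(n):
    """Bernoulli numbers B_0..B_n from the recursion (7.511),
    with the convention B_1 = -1/2 used in (7.512)."""
    B = [Fraction(1)]
    for m in range(1, n + 1):
        # sum_{k=0}^{m} C(m+1, k) B_k = 0, solved for B_m
        s = sum(comb(m + 1, k) * B[k] for k in range(m))
        B.append(-s / (m + 1))
    return B

print(bernoulli(4))   # [1, -1/2, 1/6, 0, -1/30], as in (7.512)
```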
  • 443. APPLICATIONS 435 On the other hand, f (x) = x cos x + sin x, (7.513) f (x) = −x sin x + 2 cos x, (7.514) f (x) = −x cos x − 3 sin x. (7.515) The first formula of Euler leads to 2 1 f (x)dx ≈ h 9 i=0 f (xi) − h 4 i=1 Bi i! hi−1 [f (i−1) (2) + f (i−1) (1)], (7.516) where f (2) = 1.8185949, f (1) = 0.8414710, (7.517) f (2) = 0.077004, f (1) = 1.381773, (7.518) f (2) = −2.650889, f (1) = 0.239134, (7.519) f (2) = −1.895599, f (1) = −3.064715. (7.520) It follows that I ≈ 1.38428. (7.521) Analogically, we may use the second formula of Euler or Gregory’s formula too. We have seen that the value of the considered integral, calculated by the trapezium method is I2 ≈ 1.4393350. (7.522) If we would use only the points 1.0, 1.2, 1.4, 1.6, 1.8, and 2.0, then the value of the integral, calculated by the trapezium method too, would be I1 ≈ 0.2 2 (f (1) + 2(f (1.2) + f (1.4) + f (1.6) + f (1.8)) + f (2)) = 1.4360706. (7.523) The Richardson extrapolation formula leads to the value I = I2 + I2 − I1 22 − 1 = 1.440423. (7.524) 7.20 APPLICATIONS Problem 7.1 Let us consider the forward eccentric with pusher rod (Fig. 7.5) of a heat engine; the motion law of the valve is given by s = s(φ), where φ is the rotation angle of the cam. Let us determine • the law of motion of the follower; • the parametric equations of the cam; • the variation of the curvature radius of the cam, in numerical values.
  • 444. 436 NUMERICAL DIFFERENTIATION AND INTEGRATION C (XC,YC) D B A Y y XO x r0 ∼s sl1 l Valve ϕ ϕFollower Pusher rod Figure 7.5 Distribution mechanism. We know s =    0 for φ ∈ 0, π 2 h(1 + cos 2φ) for φ ∈ π 2 , 3π 2 0 for φ ∈ 3π 2 , 2π , (7.525) h = 4 mm, CD = a = 3 mm, CB = b = 20 mm, AB = l = 70 mm, l1 = 30.72 mm, r0 = 10 mm, XC = 30 mm, YC = 110 mm. Solution: 1. Theory Denoting by θ the rotation angle of the rocker BD, we obtain the relation θ = arcsin s a . (7.526) The coordinates of the points A, B in the OXY -system (Fig. 7.5) read XA = 0, YA = r0 + l1 + s, XB = XC − b cos θ, YB = YC + b sin θ; (7.527) under these conditions, taking into account the relations (XC − b)2 + (YC − r0 − l1)2 − l2 = 0 (7.528) and using the notations α = b sin θ + YC − r0 − l1, (7.529) β = 2b[(YC − r0 − l1) sin θ + XC(1 − cos θ)], (7.530) the relation (XB − XA)2 + (YB − YA)2 − l2 = 0 (7.531) leads to the equation s2 − 2αs + β = 0, (7.532)
  • 445. APPLICATIONS 437 s 2 1 M P Y y X O x r0 γ ω ϕ ϕ Figure 7.6 Parametric equations of the cam. the solution of which s = α − α2 − β (7.533) represents the law of motion of the follower. The numerical solution is obtained by giving to the angle φ values from degree to degree and by calculating the values of the parameters θi, αi, βi, si, i = 0, 360, by means of relations (7.526), (7.529), (7.530), and (7.533). To establish the parametric equations of the cam in the proper system of axes (Fig. 7.6), the relation between the absolute velocity vM2 , the transportation velocity vM1 , and the relative velocity vM2M1 of the point M2 is written in the form vM2 = vM1 + vM2M1 ; (7.534) projecting on the Oy-axis, we obtain ω ds dφ = ωOM sin γ (7.535) or MP = ds dφ , (7.536) where ω is the angular velocity of the cam.Under these conditions, the coordinates x, y of the point M are x = −(r0 + s) sin φ − ds dφ cos φ, (7.537) y = (r0 + s) cos φ − ds dφ sin φ, (7.538) while the curvature radius R = dx dφ 2 + dy dφ 2 3 2 d2 x dφ2 dy dφ − d2 y dφ2 dx dφ (7.539) becomes R = r + s + d2s dφ2 . (7.540)
  • 446. 438 NUMERICAL DIFFERENTIATION AND INTEGRATION −10 −8 −6 −4 −2 0 2 4 6 8 10 −12 −10 −8 −6 −4 −2 0 2 4 6 8 10 x y Figure 7.7 Representation of the cam. 2. Numerical calculation For a numerical calculation, we give to the angle φi, i = 0, 360, values from degree in degree; thus we calculate successively the parameters si, θi, αi, βi, si, xi, yi, Ri by means of relations (7.525), (7.526), (7.529), (7.530), (7.533), (7.537), (7.538), and (7.540), where for the derivatives ds/dφ, d2s/dφ2 we use finite differences ds dφ |φ=φi = si+1 − si−1 2 180 π , (7.541) d2 s dφ2 |φ=φi = (si+1 − 2si + si−1) 180 π 2 . (7.542) The results obtained for φ = 0◦ , φ = 10◦ , . . . , φ = 360◦ are given in Table 7.2. The representation of the cam is given in Figure 7.7. If the radius r0 of the basis circle is small, then it is possible that the curvature radius becomes negative; the cam is no more useful from a technical point of view in this case. To avoid this situation, we increase the radius r0 of the basis circle. Problem 7.2 Let the equation of nondamped free nonlinear vibrations be ¨x + f (x) = 0, (7.543) where f (x) is an odd function f (x) =    p2 xn ln+1 if x ≥ 0 p2 ln+1 (−1)n−1 xn if x < 0 . (7.544) It is asked to show that the period of vibrations is given by Tn = 4 p (n + 1)ln−1 2An−1 In, (7.545)
  • 447. APPLICATIONS 439 TABLE 7.2 Numerical Results φi si θi αi βi si ds dφ d2s dφ2 xi yi Ri 0 0.000 0.000 69.280 0.000 0.000 0.000 0.000 0.000 10.000 10.000 10 0.000 0.000 69.280 0.000 0.000 0.000 0.000 −1.736 9.848 10.000 20 0.000 0.000 69.280 0.000 0.000 0.000 0.000 −3.420 9.397 10.000 30 0.000 0.000 69.280 0.000 0.000 0.000 0.000 −5.000 8.660 10.000 40 0.000 0.000 69.280 0.000 0.000 0.000 0.000 −6.428 7.660 10.000 50 0.000 0.000 69.280 0.000 0.000 0.000 0.000 −7.660 6.428 10.000 60 0.000 0.000 69.280 0.000 0.000 0.000 0.000 −8.660 5.000 10.000 70 0.000 0.000 69.280 0.000 0.000 0.000 0.000 −9.397 3.420 10.000 80 0.000 0.000 69.280 0.000 0.000 0.000 0.000 −9.848 1.736 10.000 90 0.000 0.000 69.280 0.000 0.000 0.008 0.931 −10.000 −0.008 10.931 100 0.241 0.080 69.308 3.890 0.028 0.318 1.750 −9.820 −2.055 11.778 110 0.936 0.312 69.389 15.105 0.109 0.599 1.430 −9.295 −4.020 11.539 120 2.000 0.667 69.513 32.326 0.233 0.807 0.937 −8.458 −5.816 11.170 130 3.305 1.102 69.665 53.512 0.385 0.919 0.330 −7.365 −7.379 10.715 140 4.695 1.565 69.826 76.135 0.547 0.920 −0.318 −6.075 −8.671 10.229 150 6.000 2.000 69.978 97.464 0.700 0.810 −0.931 −4.648 −9.671 9.769 160 7.064 2.355 70.102 114.904 0.824 0.602 −1.432 −3.137 −10.377 9.393 170 7.759 2.587 70.183 126.311 0.906 0.320 −1.760 −1.578 −10.796 9.146 180 8.000 2.668 70.211 130.278 0.934 0.000 −1.874 0.000 −10.934 9.060 190 7.759 2.587 70.183 126.311 0.906 −0.320 −1.760 1.578 −10.796 9.146 200 7.064 2.355 70.102 114.904 0.824 −0.602 −1.432 3.137 −10.377 9.393 210 6.000 2.000 69.978 97.464 0.700 −0.810 −0.931 4.648 −9.671 9.769 220 4.695 1.565 69.826 76.135 0.547 −0.920 −0.318 6.075 −8.671 10.229 230 3.305 1.102 69.665 53.512 0.385 −0.919 0.330 7.365 −7.379 10.715 240 2.000 0.667 69.513 32.326 0.233 −0.807 0.937 8.458 −5.816 11.170 250 0.936 0.312 69.389 15.105 0.109 −0.599 1.430 9.295 −4.020 11.539 260 0.241 0.080 69.308 3.890 0.028 −0.318 1.750 9.820 −2.055 11.778 270 0.000 0.000 69.280 0.000 0.000 −0.008 0.931 10.000 −0.008 10.931 280 0.000 0.000 69.280 0.000 0.000 0.000 0.000 9.848 1.736 10.000 290 0.000 0.000 69.280 0.000 0.000 0.000 0.000 9.397 3.420 10.000 300 0.000 0.000 69.280 0.000 0.000 0.000 0.000 8.660 5.000 10.000 310 0.000 0.000 69.280 0.000 0.000 0.000 0.000 7.660 6.428 10.000 320 0.000 0.000 69.280 0.000 0.000 0.000 0.000 6.428 7.660 10.000 330 0.000 0.000 69.280 0.000 0.000 0.000 0.000 5.000 8.660 10.000 340 0.000 0.000 69.280 0.000 0.000 0.000 0.000 3.420 9.397 10.000 350 0.000 0.000 69.280 0.000 0.000 0.000 0.000 1.736 9.848 10.000 360 0.000 0.000 69.280 0.000 0.000 0.000 0.000 0.000 10.000 10.000 where In = 1 0 dβ 1 − βn+1 , (7.546) for the initial conditions are t = 0, x = A, ˙x = 0. (7.547) Determine numerically the periods T1, T2, T3, T4, T5 for A = l/λ, λ positive.
  • 448. 440 NUMERICAL DIFFERENTIATION AND INTEGRATION Solution: 1. Theory The differential equation (7.543), written in the form ˙xd(˙x) + f (x)dx = 0, (7.548) is integrated in the form ˙x2 2 + x 0 f (ξ)dξ = C1, (7.549) the integration constant C1 being specified by the initial conditions in the form C1 = A 0 f (ξ)dξ, (7.550) from which relation (7.549) becomes ˙x2 = 2 A x f (ξ)dξ. (7.551) From the very beginning, the velocity ˙x is negative, hence ˙x = − 2 A x f (ξ)dξ, (7.552) so that dt = − dx 2 A x f (ξ)dξ , (7.553) hence t = − x 0 dη 2 A η f (ξ)dξ + C2. (7.554) Taking into account the initial given conditions, it follows that C2 = A 0 dη 2 A η f (ξ)dξ , (7.555) from which the relation becomes t = A x dη 2 A η f (ξ)dξ . (7.556)
  • 449. APPLICATIONS 441 For x = 0 in equation (7.556), we obtain the time T /4 (a quarter of the period), therefore T = 4 A 0 dη 2 A η f (ξ)dξ ; (7.557) replacing f (ξ) by (p2 /ln+1 )ξn , we obtain T = 4 p (n + 1)ln+1 2 A 0 dη An+1 − ηn+1 , (7.558) so that the substitution η = Aβ leads to T = 4 p (n + 1)ln+1 2An+1 In, (7.559) where In is the integral (7.546). 2. Numerical results Numerically, we obtain the values: • with Gauss formula in two points I1 = 1.328412, T1 = 5.3137 λ p , I2 = 1.202903, T2 = 5.8930 λ 3 2 p , I3 = 1.139060, T3 = 6.3977 λ2 p , (7.560) I4 = 1.099923, T4 = 6.9565 λ 5 2 p , I5 = 1.073808, T5 = 7.4603 λ3 p ; • with Gauss formula in three points I1 = 1.395058, T1 = 5.5802 λ p , I2 = 1.259053, T2 = 6.1681 λ 3 2 p , I3 = 1.187340, T3 = 6.7166 λ2 p , (7.561) I4 = 1.143415, T4 = 7.2316 λ 5 2 p , I5 = 1.113941, T5 = 7.7176 λ3 p ; • with Gauss formula in four points I1 = 1.434062, T1 = 5.7362 λ p , I2 = 1.290703, T2 = 6.3231 λ 3 2 p , I3 = 1.214628, T3 = 6.8710 λ2 p , (7.562) I4 = 1.167633, T4 = 7.3848 λ 5 2 p , I5 = 1.135837, T5 = 7.8693 λ3 p .
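The integrals In of (7.546) are obtained numerically by mapping [0, 1] onto [−1, 1]; a sketch of this computation follows (values close to the ones tabulated above; small deviations may come from rounding in the original):

```python
import numpy as np

def I_n(n, pts):
    """Gauss-Legendre approximation of I_n = integral_0^1 db / sqrt(1 - b^{n+1}),
    formula (7.546), after the affine map from [0, 1] to [-1, 1]."""
    x, A = np.polynomial.legendre.leggauss(pts)
    b = 0.5 * (x + 1.0)
    return 0.5 * np.sum(A / np.sqrt(1.0 - b**(n + 1)))

# Columns: Gauss formulae in two, three, and four points, as in (7.560)-(7.562).
for n in range(1, 6):
    print(n, I_n(n, 2), I_n(n, 3), I_n(n, 4))
```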
  • 450. 442 NUMERICAL DIFFERENTIATION AND INTEGRATION Problem 7.3 We consider the equation of nondamped free vibrations ¨x + f (x) = 0, (7.563) where f (x) = f1 (x) , if x ≤ 0, f2(x), if x > 0, (7.564) and f1(0) = f2(0) = 0, f1(x) ≤ 0. (7.565) Show that the period is given by T = 2 0 A1 dη 2 A1 η f1(ξ)dξ + 2 A2 0 dη 2 A2 η f2(ξ)dξ , (7.566) where the distance A1 is specified by the equation 0 −A1 f (x)dx + A2 0 f (x)dx = 0, (7.567) for the initial conditions t = 0, x = A2, ˙x = 0, A2 > 0. (7.568) Numerical application for A2 = 0.25 and f (x) = −6x2 if x ≤ 0, 6x + 64x3 if x > 0. (7.569) Solution: 1. Theory Applying the theorem of kinetic energy and work on the interval BC (Fig. 7.8) and observing that the kinetic energy at the points B and C vanishes, we obtain relation (7.567). Starting from the point B (Fig. 7.8), the particle travels through the direction BO in the time interval t2 given by the relation (7.556) of Problem 7.2, where x is replaced by 0, f (x) by f2(x) and A by A2, so that t2 = A2 0 dη 2 A2 η f2(ξ)dξ . (7.570) C O B x A1 A2 Figure 7.8 Problem 7.3.
  • 451. APPLICATIONS 443 In the study of the motion from the point C toward the point O, we obtain ˙x2 2 + x 0 f1(ξ)dξ = C1, (7.571) the initial conditions t = 0, x = −A1, ˙x = 0 leading to C1 = A1 0 f1(ξ)dξ, (7.572) so that ˙x2 = 2 −A1 x f1(ξ)dξ; (7.573) because the velocity is ˙x > 0, it follows that ˙x = 2 −A1 x f1(ξ)dξ, (7.574) from which t = x 0 dη 2 −A1 η f1(ξ)dξ + C2. (7.575) The initial conditions lead, successively, to C2 = − −A1 0 dη 2 −A1 η f1(ξ)dξ , (7.576) t = x −A1 dη 2 −A1 η f1(ξ)dξ , (7.577) obtaining the time of traveling through the distance CO (equal to the time corresponding to the distance OC) t1 = 0 −A1 dη 2 −A1 η f1(ξ)dξ . (7.578) Summing the times t1 and t2 given by relations (7.578) and (7.570), we obtain half of the period (T /2), hence the relation (7.566). 2. Numerical calculation Relations (7.567) and (7.569) lead to 0 −A1 (−6x2 )dx + A2 0 (6x + 64x3 )dx = 0, (7.579) −2A3 1 + 3A2 2 + 16A4 2 = 0; (7.580)
  • 452. 444 NUMERICAL DIFFERENTIATION AND INTEGRATION because A2 = 0.25, it follows that A1 = 0.5, so that we obtain successively −A1 η f1(ξ)dξ = 2(η3 + 0.125), (7.581) A2 η f2(ξ)dξ = 0.25 − 3η2 − 16η4 , (7.582) T = 2 0 −0.5 dη 4(η3 + 0.125) + 2 0.25 0 dη 2(0.25 − 3η2 − 16η4) , (7.583) so that T = 2.668799 s, (7.584) where, for the calculation of the integrals we use Gauss formula in four points. Problem 7.4 Let us consider the crankshaft mechanism in Figure 7.9, where: • the crank OA has the length r, while the moment of inertia with respect to the point O is equal to J1; • the shaft AB is a homogeneous bar of length l, of constant cross section, of mass m2 and moment of inertia with respect to the center of gravity J2 = m2l2 /12; • the rocker in B has the mass m3. The crank OA is acted by a moment M M = M0 if 0 ≤ φ ≤ π, −M0 if π < φ ≤ 2π, (7.585) and the motion of the mechanism is in a phased regimen, the mean angular velocity of the crank being ωm. We ask to determine • the variation of the angular velocity ω of the crank OA as function of the angle φ; • the irregularity degree δ0 of the motion; • the moment of inertia Jv of a fly wheel rigidly linked to the crank OA, so that the irregularity degree δ be equal to δ0/4. O B C2 (X2,Y2) A X Y 3 2 1 M ω ϕ ψ Figure 7.9 Problem 7.4.
  • 453. APPLICATIONS 445 Numerical application: ωm = 100 rad s−1 , r = 0.04 m, l = 0.2 m, J1 = 0.00016 kg m2 , m2 = 1.2 kg, J2 = 0.004 kg m2, m3 = 0.8 kg, M0 = 4 N m. Solution: 1. Theory Denoting by X2, Y2 the coordinates of the point C2 and by X3 the distance OB, the kinetic energy of the mechanism is T = 1 2 J1ω2 + 1 2 J2 dψ dφ 2 ω2 + 1 2 m2 dX2 dφ 2 + dY2 dφ 2 ω2 + 1 2 m3 dX3 dφ 2 ω2 (7.586) or with the notation Jred(φ) = J1 + J2 dψ dφ 2 + m2 dX2 dφ 2 + dY2 dφ 2 + m3 dX3 dφ 2 , (7.587) T = 1 2 Jred(φ)ω2 . (7.588) The numerical computation of the moment of inertia Jred(φ) is made by the successive relations ψ = arcsin r sin φ l , (7.589) dψ dφ = r cos φ l cos ψ , (7.590) dX2 dφ = −r sin φ − l 2 dψ dφ sin ψ, (7.591) dY2 dφ = l 2 dψ dφ cos ψ, (7.592) dX3 dφ = −r sin φ − l dψ dφ sin ψ. (7.593) Applying the theorem of the kinetic energy between the position in which φ = 0, Jred(0) = J1 + m2r2/3, ω(0) = ω0 and an arbitrary position, we obtain the equality Jred(φ)ω2 − J0ω2 0 = 2L(φ), (7.594) where L(φ) = φ 0 M dφ = M0φ if 0 ≤ φ ≤ π, M0 (2π − φ) if π < φ ≤ 2π. (7.595) The motion is periodic, because L(2π) = 0, L(2π) = L(0), the period being φd = 2π. From equation (7.594), we deduce ω(φ) = 2L(φ) + J0ω2 0 Jred(φ) , (7.596) while the mean angular velocity is given by ωm = 1 2π 2π 0 2L(φ) + J0ω2 0 Jred(φ) dφ. (7.597)
  • 454. 446 NUMERICAL DIFFERENTIATION AND INTEGRATION From equation (7.597), we obtain the unknown ω0. We take as approximate value of start ω = ωm, and with the notation F(ω0) = 1 2π 2π 0 2L(φ) + J0ω2 0 Jred(φ) dφ − ωm, (7.598) applying Newton’s method, it follows that ω0 = − 2π 0 2L(φ) + J0ω2 0 Jred(φ) dφ − 2πωm 2π 0 Jred(φ) 2L(φ) + J0ω2 0 J0ω0 Jred(φ) dφ (7.599) and the iterative process continues till | ω0| < 0.01. From the graphic representation of the function ω(φ), we obtain the values ωmin, ωmax and it follows that δ = ωmax − ωmin ωm . (7.600) Adding the fly wheel of moment of inertia Jv, relation (7.598) becomes F(ω0) = 1 2π 2π 0 2L(φ) + (J0 + Jv)ω2 0 Jred(φ) + Jv dφ − ωm. (7.601) We consider Jv = J0/10 and we calculate ω0, ωmin, ωmax, δ for the set of values Jv, 2 Jv, . . . , comparing δ with δ0/4. The function δ(Jv) is decreasing. 0 50 100 150 200 250 300 350 400 60 80 100 120 140 160 180 200 ϕ (°) ω(rads–1) Figure 7.10 Diagram ω = ω(φ).
  • 455. FURTHER READING 447 0 1 2 3 4 5 6 7 8 × 10−3 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 δ Jv (kg m2 ) Figure 7.11 Diagram δ = δ(Jv); dashed lines mark δ0 and δ0/4. 2. Numerical calculation We obtain the results plotted in the diagrams in Figure 7.10 and Figure 7.11. It follows that ωmin = 67.2455 rad s−1 , ωmax = 195.8535 rad s−1 , δ0 = 1.2861, Jv ≈ 4.5 kg m2 . (7.602) FURTHER READING Acton FS (1990). Numerical Methods that Work. 4th ed. Washington: Mathematical Association of America. Ackleh AS, Allen EJ, Hearfott RB, Seshaiyer P (2009). Classical and Modern Numerical Analysis: Theory, Methods and Practice. Boca Raton: CRC Press. Atkinson KE (1989). An Introduction to Numerical Analysis. 2nd ed. New York: John Wiley & Sons, Inc. Atkinson KE (2003). Elementary Numerical Analysis. 2nd ed. New York: John Wiley & Sons, Inc. Bakhvalov N (1976). M´ethodes Num´erique. Moscou: Editions Mir (in French). Berbente C, Mitran S, Zancu S (1997). Metode Numerice. Bucures¸ti: Editura Tehnic˘a (in Romanian). Burden RL, Faires L (2009). Numerical Analysis. 9th ed. Boston: Brooks/Cole. Butt R (2009). Introduction to Numerical Analysis Using MATLAB. Boston: Jones and Bartlett Publishers. Chapra SC (1996). Applied Numerical Methods with MATLAB for Engineers and Scientists. Boston: McGraw-Hill. Cheney EW, Kincaid DR (1997). Numerical Mathematics and Computing. 6th ed. Belmont: Thomson. Dahlquist G, Bj¨orck ´˚A (1974). Numerical Methods. Englewood Cliffs: Prentice Hall. Davis JD, Rabinowitz P (2007). Methods of Numerical Integration. 2nd ed. New York: Dover Publications. D´emidovitch B, Maron I (1973). ´El´ements de Calcul Num´erique. Moscou: Editions Mir (in French).
  • 456. 448 NUMERICAL DIFFERENTIATION AND INTEGRATION DiBenedetto E (2010). Classical Mechanics: Theory and Mathematical Modeling. New York: Springer- Verlag. Epperson JF (2007). An Introduction to Numerical Methods and Analysis. Hoboken: John Wiley & Sons, Inc. Fung YC, Tong P (2011). Classical and Computational Solid Mechanics. Singapore: World Scientific Publishing. Gautschi W (1997). Numerical Analysis: An Introduction. Boston: Birkh¨auser. Greenbaum A, Chartier TP (2012). Numerical Methods: Design, Analysis, and Computer Implemen- tation of Algorithms. Princeton: Princeton University Press. Hamming RW (1987). Numerical Methods for Scientists and Engineers. 2nd ed. New York: Dover Publications. Hamming RW (2012). Introduction to Applied Numerical Analysis. New York: Dover Publications. Heinbockel JH (2006). Numerical Methods for Scientific Computing. Victoria: Trafford Publishing. Hildebrand FB (1987). Introduction to Numerical Analysis. 2nd ed. New York: Dover Publications. Hoffman JD (1992). Numerical Methods for Engineers and Scientists. New York: McGraw-Hill. Kharab A, Guenther RB (2011). An Introduction to Numerical Methods: A MATLAB Approach. 3rd ed. Boca Raton: CRC Press. Kleppner D, Kolenkow RJ (2010). An Introduction to Mechanics. Cambridge: Cambridge University Press. Kress R (1996). Numerical Analysis. New York: Springer-Verlag. Krˆılov AN (1957). Lect¸ii de Calcule prin Aproximat¸ii. Bucures¸ti: Editura Tehnic˘a (in Romanian). Kunz KS (1957). Numerical Analysis. New York: McGraw-Hill. Lange K (2010). Numerical Analysis for Statisticians. 2nd ed. New York: Springer-Verlag. Lurie AI (2002). Analytical Mechanics. New York: Springer-Verlag. Mabie HH, Reinholtz CF (1987). Mechanisms and Dynamics of Machinery. 4th ed. New York: John Wiley & Sons, Inc. Marciuk GI (1983). Metode de Analiz˘a Numeric˘a. Bucures¸ti: Editura Academiei Romˆane (in Roma- nian). Meriam JL, Kraige LG (2012). Engineering Mechanics: Dynamics. Hoboken: John Wiley & Sons, Inc. Otto SR, Denier JP (2005). An Introduction to Programming and Numerical Methods in MATLAB. London: Springer-Verlag. Palm WJ III (2007). Mechanical Vibrations. Hoboken: John Wiley & Sons, Inc. Pandrea N, Popa D (2000). Mecanisme. Teorie s¸i Aplicat¸ii CAD. Bucures¸ti: Editura Tehnic˘a (in Romanian). Pandrea N (2000). Elemente de Mecanica Solidului ˆın Coordonate Pl¨uckeriene. Bucures¸ti: Editura Academiei Romˆane (in Romanian). Pandrea N, St˘anescu ND (2002). Mecanic˘a. Bucures¸ti: Editura Didactic˘a s¸i Pedagogic˘a (in Romanian). Postolache M (2006). Modelare Numeric˘a. Teorie s¸i Aplicat¸ii. Bucures¸ti: Editura Fair Partners (in Romanian). Press WH, Teukolski SA, Vetterling WT, Flannery BP (2007). Numerical Recipes: The Art of Scientific Computing. 3rd ed. Cambridge: Cambridge University Press. Quarteroni A, Sacco R, Saleri F (2010). Numerical Mathematics. 2nd ed. Berlin: Springer-Verlag. Ralston A, Rabinowitz P (2001). A First Course in Numerical Analysis. 2nd ed. New York: Dover Publications. Ridgway Scott L (2011). Numerical Analysis. Princeton: Princeton University Press. Salvadori MG, Baron ML (1962). Numerical Methods in Engineering. Englewood Cliffs: Prentice Hall. Sauer T (2011). Numerical Analysis. 2nd ed. London: Pearson.
  • 457. FURTHER READING 449 Simionescu I, Dranga M, Moise V (1995). Metode Numerice ˆın Tehnic˘a. Aplicat¸ii ˆın FORTRAN. Bucures¸ti: Editura Tehnic˘a (in Romanian). Sinha AK (2010). Vibration of Mechanical Systems. Cambridge: Cambridge University Press. St˘anescu ND (2007). Metode Numerice. Bucures¸ti: Editura Didactic˘a s¸i Pedagogic˘a (in Romanian). Stoer J, Bulirsh R (2010). Introduction to Numerical Analysis. 3rd ed. New York: Springer-Verlag. S¨uli E, Mayers D (2003). An Introduction to Numerical Analysis. Cambridge: Cambridge University Press. Udris¸te C, Iftode V, Postolache M (1996). Metode Numerice de Calcul. Algoritmi s¸i Programe Turbo Pascal. Bucures¸ti: Editura Tehnic˘a (in Romanian).
8

INTEGRATION OF ORDINARY DIFFERENTIAL EQUATIONS AND OF SYSTEMS OF ORDINARY DIFFERENTIAL EQUATIONS

This chapter presents the numerical methods for the integration of ordinary differential equations and of systems of differential equations. We thus present Euler's method, Taylor's method, the Runge–Kutta methods, the multistep methods, and the predictor–corrector methods. Finally, we close the chapter with some applications.

8.1 STATE OF THE PROBLEM

Let us consider the ordinary differential equation

$\frac{\mathrm{d}x}{\mathrm{d}t} = f(t, x),$ (8.1)

where x ∈ Rn, f : Rn+1 → Rn, and t ∈ I, where I is an interval on the real axis. We shall attach to equation (8.1) the initial condition

$x(t_0) = x^0.$ (8.2)

Relations (8.1) and (8.2) form the so-called Cauchy problem or the problem with initial values, which can be written in detail as

$\frac{\mathrm{d}x_1}{\mathrm{d}t} = f_1(t, x_1, x_2, \ldots, x_n),\quad \frac{\mathrm{d}x_2}{\mathrm{d}t} = f_2(t, x_1, x_2, \ldots, x_n),\ \ldots,\quad \frac{\mathrm{d}x_n}{\mathrm{d}t} = f_n(t, x_1, x_2, \ldots, x_n),$ (8.3)

to which we add

$x_1(t_0) = x_1^0,\quad x_2(t_0) = x_2^0,\ \ldots,\quad x_n(t_0) = x_n^0.$ (8.4)

Equation (8.1), to which we added the initial condition (8.2), is equivalent to the differential equation system (8.3), to which we add the initial conditions (8.4). It follows that we can thus treat the most general case of Cauchy problems (8.1) and (8.2).
The first question is to find the conditions under which Cauchy problems (8.1) and (8.2) have solutions, and especially solutions that are unique.

Theorem 8.1 (Of Existence and Uniqueness; Cauchy–Lipschitz1). Let f : I × G ⊂ R × Rn → Rn be continuous and Lipschitzian. Under these conditions, for any t0 ∈ I and x0 ∈ G, fixed, there exists a neighborhood I0 × J0 × G0 ∈ V_{Rn+2}(t0, t0, x0) (i.e., a neighborhood in Rn+2 of (t0, t0, x0)) with the property that I0 × J0 × G0 ⊂ I × I × G, and there exists a unique function α ∈ C0(I0 × J0 × G0) with the properties

$\frac{\mathrm{d}\alpha(t, \tau, \xi^0)}{\mathrm{d}t} = f(t, \alpha(t, \tau, \xi^0)),$ (8.5)

for any t ∈ I0, and

$\alpha(\tau, \tau, \xi^0) = \xi^0,$ (8.6)

for any (τ, ξ0) ∈ J0 × G0.

Definition 8.1 We say that Cauchy problems (8.1) and (8.2) are correctly stated if
(i) there exists a unique solution x = x(t) of problems (8.1) and (8.2);
(ii) there exists ε > 0 with the property that the problem

$\frac{\mathrm{d}z}{\mathrm{d}t} = f(t, z) + \delta(t),\qquad z(0) = z^0 = x^0 + \varepsilon^0$ (8.7)

admits a unique solution z = z(t) for any ε0 with ‖ε0‖ < ε and ‖δ(t)‖ < ε;
(iii) there exists a constant K > 0 such that

$\|z(t) - x(t)\| < K\varepsilon$ (8.8)

for any t ∈ I.

Definition 8.2 Cauchy problem (8.7) is named the perturbed problem associated to Cauchy problems (8.1) and (8.2).

Corollary 8.1 Cauchy problems (8.1) and (8.2) are correctly stated problems under the conditions of the Cauchy–Lipschitz theorem.

Demonstration. The corollary is obvious, considering ε, ε0 such that we do not leave the domain I × G.

If we abandon the Lipschitz condition in the Cauchy–Lipschitz theorem, then we can prove only the existence of the solution of the Cauchy problem.

Theorem 8.2 (Of Existence; Peano2). Let f : I × G ⊂ R × Rn → Rn be continuous in I × G. Under these conditions, for any (t0, x0) ∈ I × G there exists a solution of Cauchy problems (8.1) and (8.2).

1 The theorem is also known as the Picard–Lindelöf theorem. It is named after Charles Émile Picard (1856–1941), Ernst Leonard Lindelöf (1870–1946), Rudolf Otto Sigismund Lipschitz (1832–1903), and Augustin-Louis Cauchy (1789–1857).
2 Giuseppe Peano (1858–1932) proved this theorem in 1886.
Observation 8.1
(i) The Cauchy–Lipschitz and Peano theorems assure the existence and uniqueness, or only the existence, respectively, of the solution of the Cauchy problem in a neighborhood of the initial conditions. In general, the solution can be extended without problems to sufficiently long intervals, but there is no general rule in this sense.
(ii) If we consider the ordinary differential equation

\frac{d^n y}{dt^n} = f\left(t, y, \frac{dy}{dt}, \frac{d^2 y}{dt^2}, \ldots, \frac{d^{n-1} y}{dt^{n-1}}\right), \qquad (8.9)

with the conditions

y(0) = y_0, \quad \frac{dy}{dt}(0) = y'_0, \quad \ldots, \quad \frac{d^{n-1} y}{dt^{n-1}}(0) = y_0^{(n-1)}, \qquad (8.10)

then, using the notations

x_1 = y, \quad x_2 = \frac{dy}{dt}, \quad \ldots, \quad x_n = \frac{d^{n-1} y}{dt^{n-1}}, \qquad (8.11)

we obtain the system

\frac{dx_1}{dt} = x_2, \quad \frac{dx_2}{dt} = x_3, \quad \ldots, \quad \frac{dx_{n-1}}{dt} = x_n, \quad \frac{dx_n}{dt} = f(t, x_1, x_2, \ldots, x_n), \qquad (8.12)

for which the initial conditions are

x_1(0) = x_1^0 = y_0, \quad x_2(0) = x_2^0 = y'_0, \quad \ldots, \quad x_n(0) = x_n^0 = y_0^{(n-1)}. \qquad (8.13)

It thus follows that equation (8.9) is not a special case and that it can be considered in the frame of the general Cauchy problems (8.1) and (8.2).
(iii) The Cauchy–Lipschitz and Peano theorems give us sufficient conditions for the existence and uniqueness, or only for the existence, respectively, of the solution of Cauchy problems (8.1) and (8.2). Therefore, if the hypotheses of these theorems are not satisfied, it does not follow that the Cauchy problem has no solution or that the solution is not unique. Let us consider, for instance, the problem of a ball that falls on the surface of the Earth, the restitution coefficient being k. The mechanical problem is simple: if we denote by h_0 the initial height of the ball, then it collides with the Earth at the speed v_0 = \sqrt{2gh_0}; after the collision, it has the speed v'_0 = v_1 = k v_0 (Fig. 8.1). The new height reached by the ball is h_1 = v_1^2/(2g) = k^2 h_0, and the process can continue, the ball jumping less and less. During the time when the ball is in the air, the mathematical problem is simple, the equation of motion being

\ddot{x} = -g. \qquad (8.14)

The inconveniences appear at the collision between the ball and the Earth, when the velocity vector presents discontinuities in both modulus and sense. Obviously, none of the previous theorems can be applied, although the problem has a unique solution.
(iv) As we observed, the Cauchy–Lipschitz or Peano theorems can be applied on some subintervals (the time in which the ball is in the air), the solution being obtained piecewise.
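The piecewise character of this solution is easy to see numerically. The following minimal Python sketch (the values of g, h_0, and k are illustrative choices, not data from the text) follows the ball over a few bounces and checks the relation h_j = k^{2j} h_0:

import math

g, h0, k = 9.81, 1.0, 0.8        # illustrative values
v = math.sqrt(2 * g * h0)        # impact speed v_0 = sqrt(2 g h_0)
for bounce in range(1, 6):
    v *= k                       # after the collision: v_{j} = k v_{j-1}
    h = v * v / (2 * g)          # new apex height between two collisions
    print(bounce, round(h, 6), round(k ** (2 * bounce) * h0, 6))  # both columns agree

Between two impacts the motion obeys (8.14) exactly, so the only numerical work in a full simulation would be locating the impact times; this is the piecewise application of the theorems mentioned in (iv).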
[Figure 8.1 The collision between a ball and the surface of the Earth.]

8.2 EULER'S METHOD

The goal of the method3 is to obtain an approximation of the solution of the Cauchy problem

\frac{dx}{dt} = f(t, x), \quad t \in [t_0, t_f], \qquad x(t_0) = x_0, \qquad (8.15)

considered as a correctly stated problem. Let the interval [t_0, t_f] be divided by N + 1 equidistant points (including the limit ones),

t_i = t_0 + ih, \qquad h = \frac{t_f - t_0}{N}, \qquad i = 0, 1, \ldots, N. \qquad (8.16)

We shall assume that the unique solution x = x(t) is at least of class C^2 on the interval [t_0, t_f], and we shall use the Taylor theorem

x(t_{i+1}) = x(t_i + h) = x(t_i) + h \frac{dx(t_i)}{dt} + \frac{h^2}{2} \frac{d^2 x(\xi_i)}{dt^2}, \qquad (8.17)

where \xi_i \in (t_i, t_{i+1}). Relation (8.17) holds for all i = 0, 1, \ldots, N - 1. Writing relation (8.15) for t = t_i,

\frac{dx(t_i)}{dt} = f(t_i, x(t_i)), \qquad (8.18)

and replacing expression (8.18) in equation (8.17), we obtain

x(t_{i+1}) = x(t_i + h) = x(t_i) + h f(t_i, x(t_i)) + \frac{h^2}{2} \frac{d^2 x(\xi_i)}{dt^2}. \qquad (8.19)

There results the equation

\frac{x(t_{i+1}) - x(t_i)}{h} = f(t_i, x(t_i)) + \frac{h}{2} \frac{d^2 x(\xi_i)}{dt^2}. \qquad (8.20)

Because x is of class C^2 on the interval [t_0, t_f], we deduce that, for a small h, the term (h/2) d^2 x(\xi_i)/dt^2 is small enough to be neglected in relation (8.20); hence, we obtain

\frac{x(t_{i+1}) - x(t_i)}{h} \approx f(t_i, x(t_i)). \qquad (8.21)

3 Leonhard Euler (1707–1783) published this method in Institutionum calculi integralis in 1768–1770.
  • 462. EULER’S METHOD 455 Denoting w0 = x(t0), wi+1 = wi + hf (ti, wi), (8.22) we get wi ≈ x(ti) (8.23) for all i = 0, 1, . . . , N. Definition 8.3 Expression (8.22) is named the equation with finite differences associated to Euler’s method. Observation 8.2 Euler’s method can be easily generalized to the n-dimensional case, resulting in the following algorithm: – inputs N, t0, tf, x(t0) = x(0), w(0) = x(0); – calculate h = tf − t0 N ; – for i from 1 to N – calculate ti = t0 + ih; – calculate w(i) = w(i−1) + hf(ti−1, w(i−1) ). Lemma 8.1 Let x ∈ R, x ≥ −1, and m ∈ N∗ arbitrary. Under these conditions exists the inequality 0 ≤ (1 + x)m ≤ emx . (8.24) Demonstration. The first relation (8.24) is evident. For the second one, we shall proceed by induc- tion. For m = 1, the relation becomes (the case m = 0 is evident) 1 + x ≤ ex . (8.25) Let us consider the function g : [−1, ∞) → R, g(x) = ex − x − 1, (8.26) for which g (x) = ex − 1, g (x) = ex > 0. (8.27) The equation g (x) = 0 has the unique solution x = 0, which is a point of minimum and g(0) = 0, such that the relation (8.25) is true for any x ∈ [−1, ∞). Let us assume that expression (8.24) is true for m ∈ N and let us prove it for m + 1. We have (1 + x)m+1 = (1 + x)(1 + x)m ≤ (1 + x)emx ≤ ex emx = e(m+1)x . (8.28) Taking into account the principle of mathematical induction, it follows that equation (8.24) is true for any m ∈ N. Lemma 8.2 If m and n are two real positive numbers and {ai}i=0,k is a finite set of real numbers with a0 ≥ 0, which satisfies the relation ai+1 ≤ (1 + m)ai + n, i = 0, k − 1, (8.29)
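Before turning to the error analysis, note that the algorithm of Observation 8.2 translates almost line for line into code. The following minimal Python sketch (the function name euler_system and the array-based interface are illustrative choices) implements the finite-difference equation (8.22) for a system:

import numpy as np

def euler_system(f, t0, tf, x0, N):
    # Algorithm of Observation 8.2: equidistant nodes t_i = t0 + i h, cf. (8.16),
    # and the finite-difference equation w_{i+1} = w_i + h f(t_i, w_i), cf. (8.22).
    h = (tf - t0) / N
    t = np.linspace(t0, tf, N + 1)
    w = np.empty((N + 1, np.size(x0)))
    w[0] = x0
    for i in range(N):
        w[i + 1] = w[i] + h * np.asarray(f(t[i], w[i]))
    return t, w

# Example: dx/dt = x, x(0) = 1, whose exact solution is e^t.
t, w = euler_system(lambda t, x: x, 0.0, 1.0, 100)
print(w[-1, 0])   # about 2.7048, versus e = 2.71828...; the error is O(h), cf. (8.38)

Halving h roughly halves the final error, which anticipates the linear bound of Theorem 8.3 below.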
  • 463. 456 INTEGRATION OF ORDINARY DIFFERENTIAL EQUATIONS then ai+1 ≤ e(i+1)m n m + a0 − n m , i = 0, k − 1. (8.30) Demonstration. We shall use the induction after i. For i = 0, we have a1 ≤ (1 + m)a0 + n; (8.31) applying Lemma 8.1, we obtain a1 ≤ em a0 + n. (8.32) We shall prove that em a0 + n ≤ em n m + a0 − n m . (8.33) The last relation reads equivalently in the form n ≤ em n m − n m , 1 + m ≤ em , (8.34) obviously true from Lemma 8.1. Let us suppose that the affirmation is true for i and let us prove it for i + 1. We can write ai+1 ≤ (1 + m)ai + n ≤ (1 + m)eim n m + a0 − n m . (8.35) We shall prove that (1 + m)eim n m + a0 − n m ≤ e(i+1)m n m + a0 − n m , (8.36) meaning that 1 + m ≤ em is obviously true. The lemma is completely proved. Theorem 8.3 (Determination of the Error in Euler’s Method). Let x(t) be the unique solution of Cauchy problem (8.15) and wi, i = 0, N, the approximations of the values of the solution obtained using Euler’s method for a certain N > 0, N ∈ N. If x is defined in a convex set D, if it is Lipschitzian in D, of constant L, and if there is M ∈ R, M > 0, such that d2 x dt2 ≤ M; (∀)t ∈ [t0, tf ], (8.37) then |x(ti) − wi| ≤ hM 2L [eL(ti−t0) − 1] (8.38) for i = 0, N. Demonstration. For i = 0 we obtain 0 = |x(t0) − w0| ≤ hM 2L (eL·0 − 1) = 0 (8.39) and the theorem is true.
  • 464. TAYLOR METHOD 457 On the other hand, x(ti+1) = x(ti) + hf (ti, x(ti)) + h2 2 d2x(ti + θih) dt2 , (8.40) where θi ∈ (0, 1), i = 0, N − 1 and wi+1 = wi + hf (ti, wi), i = 0, N − 1. (8.41) It successively results in |x(ti+1) − wi+1| = x ti − wi + hf (ti, x(ti)) − hf (ti, wi) + h2 2 d2 x(ti + θih) dt2 ≤ |x(ti) − wi| + h|f (ti, x(ti)) − f (ti, wi)| + h2 2 d2x ti + θih dt2 ≤ |x(ti) − wi|(1 + hL) + h2 M 2 . (8.42) Now applying Lemma 8.2 with aj = |x(tj ) − wj |, j = 0, N, m = hL, n = h2 M/2, expression (8.42) leads to |x(ti+1) − wi+1| ≤ e(i+1)hL x t0 − w0 + hM 2L − hM 2L . (8.43) Taking into account that x(t0) = w0 = x0 and (i + 1)h = ti+1 − t0, relation (8.43) leads us to expres- sion (8.38) that we had to prove. Observation 8.3 Relation (8.38) shows that the bound of the error depends linearly on the size of the division step h. In conclusion, a better approximation of the solution is obtained by decreasing the division step. 8.3 TAYLOR METHOD We shall consider the Cauchy problem dx dt = f (t, x(t)), t ∈ [t0, tf ], x(t0) = x0, (8.44) considered as a correct stated one, and we shall assume that the function x = x(t), the solution of the problem, is at least of class Cn+1 in the interval [t0, tf ]. Using the expansion of the function x = x(t) into a Taylor series, we can write the relation x(ti+1) = x(ti) + h dx(ti) dt + h2 2 d2 x(ti) dt2 + · · · + hn n! dn x(ti) dtn + hn+1 (n + 1)! dn+1 x(ξi) dtn , (8.45) in which ξi is an intermediary point between ti and ti+1, ξi ∈ (ti, ti+1), ti are the nodes of an equidistant division of the interval [t0, tf ], h is the step of the division, h = (tf − t0)/N, ti = t0 + ih, i = 1, N, and N is the number of points of the division. On the other hand, we have dx dt = f (t, x(t)), (8.46)
  • 465. 458 INTEGRATION OF ORDINARY DIFFERENTIAL EQUATIONS d2 x dt2 = ∂f (t, x(t)) ∂t + ∂f (t, x(t)) ∂x dx dt = ∂f (t, x(t)) ∂t + ∂f (t, x(t)) ∂x f (t, x(t)) = df (t, x(t)) dt = f (t, x(t)), (8.47) d3 x dt3 = d dt df (t, x (t)) dt = ∂f (t, x(t)) ∂t + ∂f (t, x(t)) ∂x dx(t) dt = ∂f (t, x(t)) ∂t + ∂f (t, x(t)) ∂x f (t, x(t)) = d2 f (t, x(t)) dt2 = f (t, x(t)). (8.48) and, in general, dk x(t) dtk = dx(k−1) (t) dt = df (k−2) (t, x(t)) dt = f (k−1) (t, x(t)). (8.49) Replacing these derivatives in equation (8.45), it follows that x(ti+1) = x(ti) + hf (ti, x(ti)) + h2 2 f (ti, x(ti)) + · · · + hn n! f (n−1) (ti, x(ti)) + hn+1 (n + 1)! f (n) (ξi, x(ξi)). (8.50) Renouncing to the remainder, we obtain the equation with finite differences w0 = x(t0) = x0, wi+1 = wi + hT (n) (ti, wi), i = 0, N − 1, (8.51) where T (n) (ti, wi) = f (ti, wi) + h 2 f (ti, wi) + · · · + hn−1 n! f (n−1) (ti, wi). (8.52) Definition 8.4 Relation (8.51) is called the equation with differences associated to the nth-order Taylor’s method. Observation 8.4 Euler’s method is in fact the first-order Taylor’s method. 8.4 THE RUNGE–KUTTA METHODS The Runge–Kutta method4 implies the obtaining of the values c1, α1, and β1, such that c1f (t + α1, x + β1) approximates T (2) (t, x) = f (t, x) + (h/2)f (t, x) with an error at most equal to O(h2 ), which is the truncation error for the second-order Taylor’s method. On the other hand, f (t, x(t)) = df dt (t, x(t)) = ∂f ∂t (t, x(t)) + ∂f ∂x (t, x(t))x (t), (8.53) where x (t) = f (t, x(t)), (8.54) 4The methods were developed by Carl David Tolm´e Runge (1856–1927) and Martin Wilhelm Kutta (1867–1944) in 1901.
hence

T^{(2)}(t, x(t)) = f(t, x(t)) + \frac{h}{2} \frac{\partial f}{\partial t}(t, x(t)) + \frac{h}{2} \frac{\partial f}{\partial x}(t, x(t)) f(t, x(t)). \qquad (8.55)

Expanding c_1 f(t + \alpha_1, x + \beta_1) into a Taylor series around (t, x), we obtain

c_1 f(t + \alpha_1, x + \beta_1) = c_1 f(t, x) + c_1 \alpha_1 \frac{\partial f}{\partial t}(t, x) + c_1 \beta_1 \frac{\partial f}{\partial x}(t, x) + c_1 R_2(t + \alpha_1, x + \beta_1), \qquad (8.56)

where the remainder R_2(t + \alpha_1, x + \beta_1) reads

R_2(t + \alpha_1, x + \beta_1) = \frac{\alpha_1^2}{2} \frac{\partial^2 f}{\partial t^2}(\tau, \xi) + \alpha_1 \beta_1 \frac{\partial^2 f}{\partial t \partial x}(\tau, \xi) + \frac{\beta_1^2}{2} \frac{\partial^2 f}{\partial x^2}(\tau, \xi). \qquad (8.57)

Identifying the coefficients of f and of its derivatives in formulae (8.55) and (8.56), we find the system

c_1 = 1, \qquad c_1 \alpha_1 = \frac{h}{2}, \qquad c_1 \beta_1 = \frac{h}{2} f(t, x), \qquad (8.58)

the solution of which is

c_1 = 1, \qquad \alpha_1 = \frac{h}{2}, \qquad \beta_1 = \frac{h}{2} f(t, x). \qquad (8.59)

Therefore, it follows that

T^{(2)}(t, x) = f\left(t + \frac{h}{2}, x + \frac{h}{2} f(t, x)\right) - R_2\left(t + \frac{h}{2}, x + \frac{h}{2} f(t, x)\right), \qquad (8.60)

where

R_2\left(t + \frac{h}{2}, x + \frac{h}{2} f(t, x)\right) = \frac{h^2}{8} \frac{\partial^2 f}{\partial t^2}(\tau, \xi) + \frac{h^2}{4} f(t, x) \frac{\partial^2 f}{\partial t \partial x}(\tau, \xi) + \frac{h^2}{8} [f(t, x)]^2 \frac{\partial^2 f}{\partial x^2}(\tau, \xi). \qquad (8.61)

Observation 8.5 If all second-order partial derivatives of f are bounded, then R_2[t + (h/2), x + (h/2) f(t, x)] will be of order O(h^2).

Definition 8.5 The method with differences obtained from Taylor's method by replacing T^{(2)}(t, x) is called the Runge–Kutta method of the mean point. The mean point method is given by the relations

w_0 = x(t_0) = x_0, \qquad w_{i+1} = w_i + h f\left(t_i + \frac{h}{2}, w_i + \frac{h}{2} f(t_i, w_i)\right), \qquad i = 0, 1, \ldots, N - 1. \qquad (8.62)

Definition 8.6
(i) If we approximate

T^{(2)}(t, x(t)) = f(t, x(t)) + \frac{h}{2} f'(t, x(t)) \qquad (8.63)

by an expression of the form

T^{(2)}(t, x(t)) \approx c_1 f(t, x(t)) + c_2 f(t + \alpha_2, x + \delta_2 f(t, x(t))) \qquad (8.64)
so that the error is of order O(h^2), and if we choose the parameters

c_1 = c_2 = \frac{1}{2}, \qquad \alpha_2 = \delta_2 = h, \qquad (8.65)

then we obtain the modified Euler method, for which the equation with differences reads

w_0 = x(t_0) = x_0, \qquad w_{i+1} = w_i + \frac{h}{2}\left[f(t_i, w_i) + f(t_{i+1}, w_i + h f(t_i, w_i))\right], \qquad i = 0, 1, \ldots, N - 1. \qquad (8.66)

(ii) Under the same conditions, choosing

c_1 = \frac{1}{4}, \qquad c_2 = \frac{3}{4}, \qquad \alpha_2 = \delta_2 = \frac{2}{3} h, \qquad (8.67)

we obtain Heun's method,5 for which the equation with differences is of the form

w_0 = x(t_0) = x_0, \qquad w_{i+1} = w_i + \frac{h}{4}\left[f(t_i, w_i) + 3 f\left(t_i + \frac{2}{3} h, w_i + \frac{2}{3} h f(t_i, w_i)\right)\right], \qquad i = 0, 1, \ldots, N - 1. \qquad (8.68)

Analogically, the higher order Runge–Kutta formulae are established:
– the third-order Runge–Kutta method, for which the equation with differences is

w_0 = x(t_0) = x_0, \quad K_1 = h f(t_i, w_i), \quad K_2 = h f\left(t_i + \frac{h}{2}, w_i + \frac{K_1}{2}\right), \quad K_3 = h f(t_i + h, w_i + 2K_2 - K_1), \quad w_{i+1} = w_i + \frac{1}{6}(K_1 + 4K_2 + K_3); \qquad (8.69)

– the fourth-order Runge–Kutta method, for which the equation with differences reads

w_0 = x(t_0) = x_0, \quad K_1 = h f(t_i, w_i), \quad K_2 = h f\left(t_i + \frac{h}{2}, w_i + \frac{K_1}{2}\right), \quad K_3 = h f\left(t_i + \frac{h}{2}, w_i + \frac{K_2}{2}\right), \quad K_4 = h f(t_i + h, w_i + K_3), \quad w_{i+1} = w_i + \frac{1}{6}(K_1 + 2K_2 + 2K_3 + K_4); \qquad (8.70)

– the sixth-order Runge–Kutta method, for which the equation with differences has the form

w_0 = x(t_0) = x_0, \quad K_1 = h f(t_i, w_i), \quad K_2 = h f\left(t_i + \frac{h}{3}, w_i + \frac{K_1}{3}\right), \quad K_3 = h f\left(t_i + \frac{2h}{5}, w_i + \frac{1}{25}(6K_2 + 4K_1)\right), \quad K_4 = h f\left(t_i + h, w_i + \frac{1}{4}(15K_3 - 12K_2 + K_1)\right), \quad K_5 = h f\left(t_i + \frac{2h}{3}, w_i + \frac{2}{81}(4K_4 - 25K_3 + 45K_2 + 3K_1)\right), \quad K_6 = h f\left(t_i + \frac{4h}{5}, w_i + \frac{1}{75}(8K_4 + 10K_3 + 36K_2 + 6K_1)\right), \quad w_{i+1} = w_i + \frac{1}{192}(23K_1 + 125K_3 - 81K_5 + 125K_6). \qquad (8.71)

5 After Karl L. W. M. Heun (1859–1929), who published it in 1900.
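Among relations (8.69)–(8.71), the fourth-order scheme (8.70) is the one most often coded in practice. A minimal Python sketch (the function name and interface are illustrative choices) reads:

import numpy as np

def rk4(f, t0, tf, x0, N):
    # Fourth-order Runge–Kutta method, cf. (8.70); works for scalar or vector x.
    h = (tf - t0) / N
    t = np.linspace(t0, tf, N + 1)
    w = [np.asarray(x0, dtype=float)]
    for i in range(N):
        K1 = h * np.asarray(f(t[i], w[i]))
        K2 = h * np.asarray(f(t[i] + h / 2, w[i] + K1 / 2))
        K3 = h * np.asarray(f(t[i] + h / 2, w[i] + K2 / 2))
        K4 = h * np.asarray(f(t[i] + h, w[i] + K3))
        w.append(w[i] + (K1 + 2 * K2 + 2 * K3 + K4) / 6)
    return t, np.array(w)

With h = 0.1 on the test problem of Section 8.12, this routine reproduces the values listed in Table 8.6, the deviation from the exact solution being only a few units in the fifth decimal place at t = 2.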
Definition 8.7 The local error is the absolute value of the difference between the approximation at a division point and the exact solution, at the same point, of the Cauchy problem that has as initial value the approximation at the previous division point.

Observation 8.6 If y(t) is the solution of the Cauchy problem

\dot{y}(t) = f(t, y), \quad t_0 \le t \le t_f, \qquad y(t_i) = w_i, \qquad (8.72)

where w_i is the approximate value obtained using the method with differences, then the local error at the point t_{i+1} has the expression

\varepsilon_{i+1}(h) = |y(t_{i+1}) - w_{i+1}|. \qquad (8.73)

In various problems, we can apply methods that also exert some control on the error. One of these methods is the Runge–Kutta–Fehlberg method,6 for which the algorithm is the following:
– inputs: ε > 0, t_0, t_f, w_0 = x(t_0) = x_0, an initial step h;
– i = 0;
– while t_i + h ≤ t_f
  – calculate
    K_1 = h f(t_i, w_i),
    K_2 = h f(t_i + h/4, w_i + (1/4) K_1),
    K_3 = h f(t_i + 3h/8, w_i + (3/32) K_1 + (9/32) K_2),
    K_4 = h f(t_i + (12/13) h, w_i + (1932/2197) K_1 − (7200/2197) K_2 + (7296/2197) K_3),
    K_5 = h f(t_i + h, w_i + (439/216) K_1 − 8 K_2 + (3680/513) K_3 − (845/4104) K_4),
    K_6 = h f(t_i + h/2, w_i − (8/27) K_1 + 2 K_2 − (3544/2565) K_3 + (1859/4104) K_4 − (11/40) K_5);
  – calculate the fourth-order and fifth-order approximations
    w_{i+1} = w_i + (25/216) K_1 + (1408/2565) K_3 + (2197/4104) K_4 − (1/5) K_5,
    w̃_{i+1} = w_i + (16/135) K_1 + (6656/12825) K_3 + (28561/56430) K_4 − (9/50) K_5 + (2/55) K_6;
  – calculate
    r_{i+1} = (1/h) |w̃_{i+1} − w_{i+1}|, δ = 0.84 (ε/r_{i+1})^{1/4};
  – if δ ≤ 0.1, then h := 0.1h;
  – if δ ≥ 4, then h := 4h;
  – if 0.1 < δ < 4, then h := δh;
  – if r_{i+1} ≤ ε, then the step is accepted and i := i + 1.

In this case, w_i approximates x(t_i) with a local error of at most ε.

6 The algorithm was presented by Erwin Fehlberg in Classical fifth-, sixth-, seventh-, and eighth-order Runge–Kutta formulae with stepsize control in 1968.
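The algorithm above condenses into a short routine. The sketch below is a scalar version (the interface, the initial step h0, and the guard for a vanishing error estimate are our choices); the fourth-order value is propagated, while the fifth-order one serves only for the error estimate r_{i+1}:

def rkf45(f, t0, tf, x0, eps, h0):
    # Runge–Kutta–Fehlberg with step-size control, following the algorithm above.
    t, w, h = t0, x0, h0
    out = [(t, w)]
    while t < tf:
        h = min(h, tf - t)
        K1 = h * f(t, w)
        K2 = h * f(t + h / 4, w + K1 / 4)
        K3 = h * f(t + 3 * h / 8, w + 3 * K1 / 32 + 9 * K2 / 32)
        K4 = h * f(t + 12 * h / 13,
                   w + 1932 * K1 / 2197 - 7200 * K2 / 2197 + 7296 * K3 / 2197)
        K5 = h * f(t + h, w + 439 * K1 / 216 - 8 * K2 + 3680 * K3 / 513 - 845 * K4 / 4104)
        K6 = h * f(t + h / 2, w - 8 * K1 / 27 + 2 * K2 - 3544 * K3 / 2565
                   + 1859 * K4 / 4104 - 11 * K5 / 40)
        w4 = w + 25 * K1 / 216 + 1408 * K3 / 2565 + 2197 * K4 / 4104 - K5 / 5   # order 4
        w5 = (w + 16 * K1 / 135 + 6656 * K3 / 12825 + 28561 * K4 / 56430
              - 9 * K5 / 50 + 2 * K6 / 55)                                      # order 5
        r = abs(w5 - w4) / h                   # local error estimate r_{i+1}
        if r <= eps:                           # accept the step and advance
            t, w = t + h, w4
            out.append((t, w))
        delta = 0.84 * (eps / r) ** 0.25 if r > 0 else 4.0
        h *= min(max(delta, 0.1), 4.0)         # keep the new step within [0.1h, 4h]
    return out

Rejected steps are simply recomputed with the reduced h, which is why the accepted nodes in Table 8.8 are not equidistant.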
  • 469. 462 INTEGRATION OF ORDINARY DIFFERENTIAL EQUATIONS 8.5 MULTISTEP METHODS The methods presented before required only the knowledge of the value xi at the point ti to determine numerically the value xi+1 at the point ti+1. Therefore, it was necessary to return only one step to determine the new value; it means that we discussed one-step methods. The following methods use approximations of the solutions in more previous points to determine the approximate solution at the present division point. Definition 8.8 It is called the multistep method for the determination of the approximate wi of the solution of the Cauchy problem ˙x(t) = f(t, x), t0 ≤ t ≤ tf , x(t0) = x0, (8.74) at the division point ti+1, by using the equations with finite differences, which can be represented in the form wi+1 = am−1wi + am−2wi−1 + · · · + a0wi+1−m + h[bmf(ti+1, wi+1) + bm−1(ti+1, wi+1) + · · · + b0f(ti+1−m, wi+1−m)], i = m − 1, . . . , N − 1, (8.75) where N is the number of the division steps of the interval [t0, tf ], h is the division step of the same interval, h = (tf − t0)/N, m > 1, and in addition w0 = x(t0) = x0, w1 = x(t1) = x1, . . . , wm−1 = x(tm−1) = xm−1. (8.76) Definition 8.9 (i) If bm = 0, then the method is called explicit or open because relation (8.75) is an explicit equation to determine wi+1. (ii) If bm = 0, then the method is called implicit or closed because wi+1 appears in both members of expression (8.75). Observation 8.7 The start values w0, w1, . . . , wm−1 must be specified according to formula (8.76); that is, they must be the exact values of the function x = x(t) at the points t0, t1 = t0 + h, . . . , tm−1 = t0 + (m − 1)h, or they can be determined using a one-step method starting from the value w0 = x(t0) = x0. The most used technique to obtain multistep methods starts from the evident equality x(ti+1) = x(ti) + ti+1 ti f(t, x(t))dt. (8.77) Owing to the fact that the integral at the right hand part of relation (8.77) cannot be calculated because the solution x(t) is not known, we replace f(t, x(t)) by an interpolation polynomial P(t) that is determined as a function of the known values (t0, w0), (t1, w1), . . . , (ti, wi), where wj = x(tj ), j = 0, i. Relation (8.77) now becomes x(ti+1) ≈ x(ti) + ti+1 ti P(t)dt. (8.78)
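The mechanics of (8.77) and (8.78) can be checked symbolically on the simplest case: a linear interpolation polynomial through (t_{i-1}, f_{i-1}) and (t_i, f_i). The short sketch below (using the sympy library; the symbol names are our choices) integrates this P(t) over [t_i, t_{i+1}] and recovers the coefficients 3/2 and −1/2 of the two-step Adams–Bashforth formula derived in Section 8.7:

import sympy as sp

t, h, fi, fim1 = sp.symbols('t h f_i f_im1')
# Linear interpolant with the node t_i placed at 0 and t_{i-1} at -h,
# so that P(0) = f_i and P(-h) = f_{i-1}.
P = fim1 + (fi - fim1) * (t + h) / h
# Integrating over one step [t_i, t_{i+1}] = [0, h], cf. (8.78):
print(sp.simplify(sp.integrate(P, (t, 0, h))))   # prints h*(3*f_i - f_im1)/2

Repeating the computation with higher-degree interpolants produces exactly the Adams–Bashforth coefficients of the next section.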
  • 470. ADAMS’S METHOD 463 8.6 ADAMS’S METHOD In the equation7 dx dt = f (t, x), (8.79) we replace the function f (t, x) by the first five terms of Newton’s polynomial N5(q) = x0 + q 1! x0 + q(q − 1) 2! 2 x0 + q(q − 1)(q − 2) 3! 3 x0 + q(q − 1)(q − 2)(q − 3) 4! 4 x0, (8.80) in which q = (t − t0)/h, h = (tf − t0)/N, N being the number of the division points in the interval t0, tf , t = t0 + qh, dt = hdq. Integrating, it follows that x1 − x0 = x0+h x0 f (t, x)dt = h 1 0 f (t, x)dq = h x0 + 1 2 x0 − 1 12 2 x0 − 1 24 3 x0 − 19 720 4 x0 , (8.81) x2 − x0 = x0+2h x0 f (t, x)dt = h 2 0 f (t, x)dq = h 2x0 + 2 x0 + 1 3 2 x0 − 1 90 4 x0 , (8.82) x3 − x0 = x0+3h x0 f (t, x)dt = h 3 0 f (t, x)dq = h 3x0 + 9 2 x0 + 9 4 2 x0 + 3 8 3 x0 − 3 80 4 x0 , (8.83) x4 − x0 = x0+4h x0 f (t, x)dt = h 4 0 f (t, x)dq = h 4x0 + 8 x0 + 20 3 2 x0 + 8 3 3 x0 + 14 45 4 x0 . (8.84) The calculation involves successive approximations: – approximation 1: x(1) 1 = x0 + f (t0, x0), f (t0, x0) = f (t1, x(1) 1 ) − f (t0, x0); (8.85) 7 The method was presented by John Couch Adams (1819–1892). It appears for the first time in a letter written by F. Bashforth in 1855.
  • 471. 464 INTEGRATION OF ORDINARY DIFFERENTIAL EQUATIONS – approximation 2: x(2) 1 = x0 + hf (t0, x0) + 1 2 h f (t0, x0), x(1) 2 = x0 + 2hf (t0, x0) + 2h f (t0, x0), f (t0, x0) = f (t1, x(2) 1 ) − f (t0, x0), 2 f (t0, x0) = f (t2, x(1) 2 ) − 2f (t1, x(2) 1 ) + f (t0, x0); (8.86) – approximation 3: x(3) 1 = x0 + hf (t0, x0) + 1 2 h f (t0, x0) − 1 12 h 2 f (t0, x0), x(2) 2 = x0 + 2hf (t0, x0) + 2h f (t0, x0) + 1 3 h 2 f (t0, x0), x(1) 3 = x0 + 3hf (t0, x0) + 9 2 h f (t0, x0) + 9 4 h 2 f (t0, x0), f (t0, x0) = f (t1, x(3) 1 ) − f (t0, x0), 2 f (t0, x0) = f (t2, x(2) 2 ) − 2f (t1, x(3) 1 ) + 3f (t0, x0), 3 f (t0, x0) = f (t3, x(1) 3 ) − 3f (t2, x(2) 2 ) + 3f (t1, x(3) 1 ) − f (t0, x0); (8.87) – approximation 4: x(4) 1 = x0 + hf (t0, x0) + 1 2 h f (t0, x0) − 1 12 h 2 f (t0, x0) + 1 24 h 3 f (t0, x0), x(3) 2 = x0 + 2hf (t0, x0) + 2h f (t0, x0) + 1 3 h 2 f (t0, x0), x(2) 3 = x0 + 3hf (t0, x0) + 9 2 h f (t0, x0) + 9 4 h 2 f (t0, x0) + 1 8 h 3 f (t0, x0), x(1) 4 = x0 + 4hf (t0, x0) + 8h f (t0, x0) + 20 3 h 2 f (t0, x0) + 8 3 h 3 f (t0, x0), f (t0, x0) = f (t1, x(4) 1 ) − f (t0, x0), 2 f (t0, x0) = f (t2, x(3) 2 ) − 2f (t1, x(4) 1 ) + f (t0, x0), 3 f (t0, x0) = f (t3, x(2) 3 ) − 3f (t2, x(3) 2 ) + 3f (t1, x(4) 1 ) − f (t0, x0), 4 f (t0, x0) = f (t4, x(1) 4 ) − 4f (t3, x(2) 3 ) + 6f (t2, x(3) 2 ) − 4f (t1, x(4) 1 ) + f (t0, x0); (8.88) – approximation 5: x(5) 1 = x0 + hf (t0, x0) + 1 2 h f (t0, x0) − 1 12 h 2 f (t0, x0) + 1 24 h 3 f (t0, x0) − 19 720 h 4 f (t0, x0), x(4) 2 = x0 + 2hf (t0, x0) + 2h f (t0, x0) + 1 3 h 2 f (t0, x0) − 1 90 h 4 f (t0, x0)
  • 472. THE ADAMS–BASHFORTH METHODS 465 x(3) 3 = x0 + 3hf (t0, x0) + 9 2 h f (t0, x0) + 9 4 h 2 f (t0, x0) + 3 8 h 3 f (t0, x0) − 3 80 h 4 f (t0, x0), x(2) 4 = x0 + 4hf (t0, x0) + 8h f (t0, x0) + 20 3 h 2 f (t0, x0) + 8 3 h 3 f (t0, x0) + 14 45 h 4 f (t0, x0), f (t0, x0) = f (t1, x(5) 1 ) − f (t0, x0), 2 f (t0, x0) = f (t2, x(4) 2 ) − 2f (t1, x(5) 1 ) + f (t0, x0), 3 f (t0, x0) = f (t3, x(3) 3 ) − 3f (t2, x(4) 2 ) + 3f (t1, x(5) 1 ) − f (t0, x0), 4 f (t0, x0) = f (t4, x(2) 4 ) − 4f (t3, x(3) 3 ) + 6f (t2, x(4) 2 ) − 4(t1, x(5) 1 ) + f (t0, x0). (8.89) The values x1, x2, x3, x4 are calculated repeatedly according to formula (8.86), formula (8.87), formula (8.88), and formula (8.89) until the difference between two successive iterations decreases under an imposed value. We now replace the function f (t, x) by Newton’s polynomial N∗ 5 (q) = f (ti, xi) + q 1! f (ti−1, xi−1) + q(q + 1) 2! 2 f (ti−2, xi−2) + q(q + 1)(q − 2) 3! 3 f (ti−3, xi−3) + q(q + 1)(q − 2)(q − 3) 4! 4 f (ti−4, xi−4), (8.90) where q = (t − ti)/h. Thus, it follows that ti+1 ti f (t, x)dt = h 1 0 f (t, x)dq. (8.91) Integrating, we deduce Adams’s formula xi+1 = xi + hf (ti, xi) + 1 2 h f (ti−1, xi−1) + 5 12 h 2 f (ti−2, xi−2) + 3 8 h 3 f (ti−3, xi−3) + 251 720 h 4 f (ti−4, xi−4), i = 4, 5, . . . (8.92) 8.7 THE ADAMS–BASHFORTH METHODS To deduce the recurrent formula of the Adams–Bashforth method,8 we shall start from the relation f (ti + qh) = f (ti) + q 1! f (ti−1) + q(q + 1) 2! 2 f (ti−2) + q(q − 1)(q − 2) 3! 3 f (ti−3) + · · · (8.93) 8The methods were published by John Couch Adams (1819–1892) and Francis Bashforth (1819–1912) in An Attempt to Test the Theories of Capillary Action by Comparing the Theoretical and Measured Forms of Drops of Fluid, with an Explanation of the Method of Integration Employed in Constructing the Tables which Give the Theoretical Forms of Such Drops in 1882.
  • 473. 466 INTEGRATION OF ORDINARY DIFFERENTIAL EQUATIONS It follows that ti+1 ti P (t)dt = h 1 0 f (ti + qh)dq; (8.94) using expression (8.93), we obtain ti+1 ti P (t)dt = h 1 0 f (ti)dq + h 1 f (ti−1) 1 0 qdq + h 2! 2 f (ti−2) 1 0 q(q + 1)dq + · · · + h r! r f (ti−r ) 1 0 q(q + 1) · · · (q + r − 1)dq + · · · (8.95) Calculating the integrals and limiting ourselves to the terms up to r f (ti−r ), we deduce the expression ti+1 ti P (t)dt = hf (ti) + h 2 f (ti−1) + 5h 12 2 f (ti−2) + 3h 8 3 f (ti−3) + 251h 720 4 f (ti−4) + · · · (8.96) Thus, it results in the recurrent relation for the Adams–Bashforth method xi+1 = xi + hf (ti) + h 2 f (ti−1) + 5h 12 2 f (ti−2) + 3h 8 3 f (ti−3) + 251h 720 4 f (ti−4) + · · · (8.97) Depending on the degree r of the interpolation polynomial, we deduce different Adams–Bashforth formulae: – for r = 1: xi+1 = xi + h 2 [3f (ti, xi) − f (ti−1, xi−1)]; (8.98) – for r = 2: xi+1 = xi + h 12 [23f (ti, xi) − 16f (ti−1, xi−1) + 5f (ti−2, xi−2)]; (8.99) – for r = 3: xi+1 = xi + h 24 [55f (ti, xi) − 59f (ti−1, xi−1) + 37f (ti−2, xi−2) − 9f (ti−3, xi−3)]; (8.100) – for r = 4: xi+1 = xi + h 720 [1901f (ti, xi) − 2774f (ti−1, xi−1) + 2616f (ti−2, xi−2) − 1274f (ti−3, xi−3) + 251f (ti−4, xi−4)]. (8.101) The most used methods are those of the third, fourth, and fifth order, for which the recurrent relations read as follows:
  • 474. THE ADAMS–MOULTON METHODS 467 – the third-order Adams–Bashforth method: w0 = x(t0) = x0, w1 = x(t1) = x1, w2 = x(t2) = x2, wi+1 = wi + h 12 [23f(ti, wi) − 16f(ti−1, wi−1) + 5f(ti−2, wi−2)]; (8.102) – the fourth-order Adams–Bashforth method: w0 = x(t0) = x0, w1 = x(t1) = x1, w2 = x(t2) = x2, w3 = x(t3) = x3, wi+1 = wi + h 24 [55f (ti, wi) − 59f (ti−1, wi−1) + 37f (ti−2, wi−2) − 9f (ti−3, wi−3)]; (8.103) – the fifth-order Adams–Bashforth method: w0 = x(t0) = x0, w1 = x(t1) = x1, w2 = x(t2) = x2, w3 = x(t3) = x3, w4 = x(t4) = x4, wi+1 = wi + h 720 [1901f (ti, wi) − 2774f (ti−1, wi−1) + 2616f (ti−2, wi−2) − 1274f (ti−3, wi−3) + 251f (ti−4, wi−4)]. (8.104) Observation 8.8 The start values w0, w1, . . . are obtained using a one-step method. 8.8 THE ADAMS–MOULTON METHODS Writing the interpolation polynomial P (t) in the form P (t) = f (ti+1 ) + q − 1 1! f (ti) + (q − 1)q 2! 2 f (ti−1) + (q − 1)q(q + 1) 3! 3 f (ti−2) + (q − 1)q(q + 1)(q + 2) 4! 4 f (ti−3) + · · · + (q − 1)q(q + 1) . . . (q + r − 2) r! r f (ti−r+1), (8.105) it results, by integration, in ti+1 ti P (t)dt = hf (ti+1 ) − h 2 f (ti) − h 12 2 f (ti−1) − h 24 3 f (ti−2) + 19h 720 4 f (ti−3) − · · · (8.106) Limiting the number of terms in the right-hand side of formula (8.106), we obtain the following particular expressions: – for r = 1: xi+1 = xi + 0.5h[f(ti+1, xi+1) + f(ti, xi)]; (8.107) – for r = 2: xi+1 = xi + h 12 [5f(ti+1, xi+1) + 8f(ti, xi) − f(ti−1, xi−1)]; (8.108) – for r = 3: xi+1 = xi + h 24 [9f(ti+1, xi+1) + 19f(ti, xi) − 5f(ti−1, xi−1) + f(ti−2, xi−2)]; (8.109)
– for r = 4:

x_{i+1} = x_i + \frac{h}{720}\left[251 f(t_{i+1}, x_{i+1}) + 646 f(t_i, x_i) - 264 f(t_{i-1}, x_{i-1}) + 106 f(t_{i-2}, x_{i-2}) - 19 f(t_{i-3}, x_{i-3})\right]. \qquad (8.110)

The most used Adams–Moulton methods9 are those of the third, fourth, and fifth order, for which the equations with differences read as follows:
– the third-order Adams–Moulton method:

w_0 = x(t_0) = x_0, \quad w_1 = x(t_1) = x_1, \qquad w_{i+1} = w_i + \frac{h}{12}\left[5 f(t_{i+1}, w_{i+1}) + 8 f(t_i, w_i) - f(t_{i-1}, w_{i-1})\right]; \qquad (8.111)

– the fourth-order Adams–Moulton method:

w_0 = x(t_0) = x_0, \quad w_1 = x(t_1) = x_1, \quad w_2 = x(t_2) = x_2, \qquad w_{i+1} = w_i + \frac{h}{24}\left[9 f(t_{i+1}, w_{i+1}) + 19 f(t_i, w_i) - 5 f(t_{i-1}, w_{i-1}) + f(t_{i-2}, w_{i-2})\right]; \qquad (8.112)

– the fifth-order Adams–Moulton method:

w_0 = x(t_0) = x_0, \quad w_1 = x(t_1) = x_1, \quad w_2 = x(t_2) = x_2, \quad w_3 = x(t_3) = x_3, \qquad w_{i+1} = w_i + \frac{h}{720}\left[251 f(t_{i+1}, w_{i+1}) + 646 f(t_i, w_i) - 264 f(t_{i-1}, w_{i-1}) + 106 f(t_{i-2}, w_{i-2}) - 19 f(t_{i-3}, w_{i-3})\right]. \qquad (8.113)

Observation 8.9
(i) Unlike the Adams–Bashforth methods, in which the required value w_{i+1} appears only on the left side of the equality, in the Adams–Moulton formulae it appears on both sides of the equal sign. It follows that, at each step, it is necessary to solve an equation of the form

w_{i+1} = w_i + h\left[c_0 f(t_{i+1}, w_{i+1}) + c_1 f(t_i, w_i) + \cdots\right], \qquad (8.114)

where c_0, c_1, \ldots are the coefficients that appear in the respective Adams–Moulton formula.
(ii) Equation (8.114) is solved by successive approximations, using the recurrent formula

w_{i+1}^{(k)} = w_i + h\left[c_0 f(t_{i+1}, w_{i+1}^{(k-1)}) + c_1 f(t_i, w_i) + \cdots\right], \qquad (8.115)

an expression that can also be written in the form

w_{i+1}^{(k)} = w_{i+1} + h c_0 f(t_{i+1}, w_{i+1}^{(k-1)}) - h c_0 f(t_{i+1}, w_{i+1}), \qquad (8.116)

obtained by subtraction of equation (8.114) from equation (8.115).

9 Forest Ray Moulton (1872–1952) published these methods in New Methods in Exterior Ballistics in 1926.
(iii) If the function f is Lipschitzian in the second variable, that is, if there exists L > 0 such that, for any y and z,

|f(t, y) - f(t, z)| \le L |y - z|, \qquad (8.117)

then expression (8.116) leads to

|w_{i+1}^{(k)} - w_{i+1}| \le h c_0 L |w_{i+1}^{(k-1)} - w_{i+1}|. \qquad (8.118)

The last formula offers us the sufficient condition for the convergence of the iterative procedure,

h c_0 L < 1 \quad \text{or} \quad h < \frac{1}{c_0 L}. \qquad (8.119)

8.9 PREDICTOR–CORRECTOR METHODS

Definition 8.10 A predictor–corrector method is a combination of an explicit multistep method and an implicit multistep one, the first realizing a predetermination of the value x_{i+1} as a function of the previous values x_i, x_{i-1}, \ldots, and the second realizing a more accurate evaluation of the value x_{i+1}.

Observation 8.10 The corrector formula can be applied several times, until the difference between two successive iterations x_{i+1}^{(k)} and x_{i+1}^{(k+1)} becomes less than an imposed value ε, that is,

|x_{i+1}^{(k+1)} - x_{i+1}^{(k)}| < \varepsilon. \qquad (8.120)

We shall now present a few of the most used predictor–corrector methods.

8.9.1 Euler's Predictor–Corrector Method

In this case, the formula with differences reads

w_0 = x(t_0) = x_0, \qquad w_{i+1}^{\text{pred}} = w_i + h f(t_i, w_i), \qquad w_{i+1}^{\text{cor}} = w_i + \frac{h}{2}\left[f(t_i, w_i) + f(t_{i+1}, w_{i+1}^{\text{pred}})\right]. \qquad (8.121)

8.9.2 Adams's Predictor–Corrector Methods

These methods consist of an Adams–Bashforth method with the role of predictor for w_{i+1} and of an Adams–Moulton method with the role of corrector, both methods having the same order. We obtain:
– the third-order Adams predictor–corrector algorithm, for which the equations with differences read

w_0 = x(t_0) = x_0, \quad w_1 = x(t_1) = x_1, \quad w_2 = x(t_2) = x_2, \qquad w_{i+1}^{\text{pred}} = w_i + \frac{h}{12}\left[23 f(t_i, w_i) - 16 f(t_{i-1}, w_{i-1}) + 5 f(t_{i-2}, w_{i-2})\right], \qquad w_{i+1}^{\text{cor}} = w_i + \frac{h}{12}\left[5 f(t_{i+1}, w_{i+1}^{\text{pred}}) + 8 f(t_i, w_i) - f(t_{i-1}, w_{i-1})\right]; \qquad (8.122)
– the fourth-order Adams predictor–corrector algorithm, for which the equations with differences are

w_0 = x(t_0) = x_0, \quad w_1 = x(t_1) = x_1, \quad w_2 = x(t_2) = x_2, \quad w_3 = x(t_3) = x_3, \qquad w_{i+1}^{\text{pred}} = w_i + \frac{h}{24}\left[55 f(t_i, w_i) - 59 f(t_{i-1}, w_{i-1}) + 37 f(t_{i-2}, w_{i-2}) - 9 f(t_{i-3}, w_{i-3})\right], \qquad w_{i+1}^{\text{cor}} = w_i + \frac{h}{24}\left[9 f(t_{i+1}, w_{i+1}^{\text{pred}}) + 19 f(t_i, w_i) - 5 f(t_{i-1}, w_{i-1}) + f(t_{i-2}, w_{i-2})\right]; \qquad (8.123)

– the fifth-order Adams predictor–corrector algorithm, for which the equations with differences have the expressions

w_0 = x(t_0) = x_0, \quad w_1 = x(t_1) = x_1, \quad w_2 = x(t_2) = x_2, \quad w_3 = x(t_3) = x_3, \quad w_4 = x(t_4) = x_4, \qquad w_{i+1}^{\text{pred}} = w_i + \frac{h}{720}\left[1901 f(t_i, w_i) - 2774 f(t_{i-1}, w_{i-1}) + 2616 f(t_{i-2}, w_{i-2}) - 1274 f(t_{i-3}, w_{i-3}) + 251 f(t_{i-4}, w_{i-4})\right], \qquad w_{i+1}^{\text{cor}} = w_i + \frac{h}{720}\left[251 f(t_{i+1}, w_{i+1}^{\text{pred}}) + 646 f(t_i, w_i) - 264 f(t_{i-1}, w_{i-1}) + 106 f(t_{i-2}, w_{i-2}) - 19 f(t_{i-3}, w_{i-3})\right]. \qquad (8.124)

The most used is the fourth-order predictor–corrector algorithm.

8.9.3 Milne's Fourth-Order Predictor–Corrector Method

For this method,10 the equations with differences read

w_0 = x(t_0) = x_0, \quad w_1 = x(t_1) = x_1, \quad w_2 = x(t_2) = x_2, \quad w_3 = x(t_3) = x_3, \qquad w_{i+1}^{\text{pred}} = w_{i-3} + \frac{4}{3} h \left[2 f(t_i, w_i) - f(t_{i-1}, w_{i-1}) + 2 f(t_{i-2}, w_{i-2})\right], \qquad w_{i+1}^{\text{cor}} = w_{i-1} + \frac{h}{3}\left[f(t_{i+1}, w_{i+1}^{\text{pred}}) + 4 f(t_i, w_i) + f(t_{i-1}, w_{i-1})\right]. \qquad (8.125)

8.9.4 Hamming's Predictor–Corrector Method

The equations with differences are, in this case,11

w_0 = x(t_0) = x_0, \quad w_1 = x(t_1) = x_1, \quad w_2 = x(t_2) = x_2, \quad w_3 = x(t_3) = x_3, \qquad w_{i+1}^{\text{pred}} = w_{i-3} + \frac{4}{3} h \left[2 f(t_i, w_i) - f(t_{i-1}, w_{i-1}) + 2 f(t_{i-2}, w_{i-2})\right], \qquad w_{i+1}^{\text{cor}} = \frac{9}{8} w_i - \frac{1}{8} w_{i-2} + \frac{3h}{8}\left[f(t_{i+1}, w_{i+1}^{\text{pred}}) + 2 f(t_i, w_i) - f(t_{i-1}, w_{i-1})\right]. \qquad (8.126)

10 The method was presented by William Edmund Milne (1890–1971) in Numerical Calculus in 1949.
11 The method was described by Richard Wesley Hamming (1915–1998) in Numerical Methods for Scientists and Engineers in 1962.
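As an illustration, the fourth-order Adams predictor–corrector (8.123) can be coded as in the minimal Python sketch below (our code; the start values are generated by one-step RK4 steps, cf. Observation 8.7, and the corrector is applied twice, cf. Observation 8.10):

def abm4(f, t0, tf, x0, N):
    # Fourth-order Adams predictor-corrector, cf. (8.123), for a scalar problem.
    h = (tf - t0) / N
    t = [t0 + i * h for i in range(N + 1)]
    w = [float(x0)]
    for i in range(3):                 # start values w1..w3 by RK4, cf. Observation 8.7
        K1 = h * f(t[i], w[i])
        K2 = h * f(t[i] + h / 2, w[i] + K1 / 2)
        K3 = h * f(t[i] + h / 2, w[i] + K2 / 2)
        K4 = h * f(t[i] + h, w[i] + K3)
        w.append(w[i] + (K1 + 2 * K2 + 2 * K3 + K4) / 6)
    F = [f(t[i], w[i]) for i in range(4)]
    for i in range(3, N):
        # Adams-Bashforth predictor, cf. (8.103):
        pred = w[i] + h * (55 * F[i] - 59 * F[i - 1] + 37 * F[i - 2] - 9 * F[i - 3]) / 24
        corr = pred
        for _ in range(2):             # Adams-Moulton corrector (8.112), iterated per
                                       # (8.115); it converges when h*c0*L < 1, cf. (8.119)
            corr = w[i] + h * (9 * f(t[i + 1], corr) + 19 * F[i]
                               - 5 * F[i - 1] + F[i - 2]) / 24
        w.append(corr)
        F.append(f(t[i + 1], corr))
    return t, w

# Usage example: t, w = abm4(lambda t, x: -x, 0.0, 1.0, 10)

Applied to the test problem of Section 8.12 with h = 0.1, this produces values comparable to those of Table 8.18 (small differences arise from the number of corrector passes).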
  • 478. THE LINEAR EQUIVALENCE METHOD (LEM) 471 8.10 THE LINEAR EQUIVALENCE METHOD (LEM) The linear equivalence method (LEM) was introduced by Ileana Toma to study the nonlinear ordinary differential systems depending on the parameters in a classical linear frame. The method is presented only for homogeneous nonlinear differential operators with constant coefficients, although it can be—and was—applied in more general cases. Consider, therefore, the system F(y) = ˙y − f(y) = 0, f(y) = [fj (y)]j=1,n, fj (y) = ∞ |µ|=1 fjµyµ , fjµ ∈ R, (8.127) to which are associated the arbitrary Cauchy conditions y(t0) = y0, t0 ∈ R. (8.128) The main idea of LEM consists of an exponential mapping depending on n parameters— ξ = (ξ1, ξ2, . . . , ξn) ∈ Rn —namely, ν(x, ξ) ≡ e ξ,y . (8.129) Multiplying equation (8.127) by ν, and then differentiating it with respect to t and replacing the derivatives ˙yj from the nonlinear system gives (a) the first LEM equivalent: Lν(x, ξ) ≡ ∂ν ∂t − ξ, f(D) ν = 0, (8.130) a linear partial differential equation, always of first order with respect to x, accompanied by the obvious condition ν(t0, ξ) = e ξ,y0 , ξ ∈ Rn . (8.131) The usual notation fj (Dξ) stands for the formal operator fj (Dξ) = ∞ |µ|=1 fµ ∂|µ| ∂ξµ . (8.132) The formal scalar product in (8.130) is expressed as n j=1 ξj fj (Dξ) ≡ ξ, f(D) . (8.133) Searching now for the unknown function ν in the class of analytic with respect to ξ functions, ν(t, ξ) = 1 + ∞ |γ|=1 νγ(t) ξγ γ! (8.134) is obtained.
  • 479. 472 INTEGRATION OF ORDINARY DIFFERENTIAL EQUATIONS (b) the second LEM equivalent: δV ≡ dV dt − AV = 0, V = (Vj )j∈N, Vj = (νγ)|γ|=j , (8.135) which must be solved under the Cauchy conditions V(t0) = (y γ 0)|γ|∈N. (8.136) The LEM matrix A is always column-finite; in the case of a polynomial operator, A is also row-finite. The cells Ass on the main diagonal are square, of s + 1 rows and columns, and are generated by those fjµ, for which |µ| = 1. The other cells Ak,k+s contain only those fjµ with |µ| = s + 1. More precisely, the diagonal cells contain the coefficients of the linear part; on the next upper diagonal we find cells containing the coefficients of the second degree in y, and so on. In the case of polynomial operators of degree m, the associated LEM matrix is band-diagonal, the band being made up of m lines. We can express the LEM matrix as A(t) =     A11 A12 A13 · · · A1m A1,m+1 · · · 0 A22 A23 · · · A2m A2,m+1 · · · 0 0 A33 · · · A3m A3,m+1 · · · · · · · · · · · · · · · · · · · · · · · ·     . (8.137) It should be mentioned that this particular form of the LEM matrix is also conserved if the method is applied to nonhomogeneous ordinary differential systems with variable coefficients. This form permits the calculus by block partitioning, which represents a considerable simplification. It was proved that any analytic with respect to ξ solution of linear problems (8.130) and (8.131) is of the exponential form (8.129), with y solution of the nonlinear initial problems (8.127) and (8.128). Starting from this essential fact, we can establish various representations of the solution of nonlinear ordinary differential systems. Theorem 8.4 The solution of the nonlinear initial problem (i) coincides with the first n components of the infinite vector V(t) = eA(t−t0) V0, (8.138) where the exponential matrix eA(t−t0) = I + (t − t0) 1! A + (t − t0)2 2! A2 + · · · + (t − t0)n n! An + · · · (8.139) can be computed by block partitioning, each step involving finite sums; (ii) coincides with the series yj (t) = yj0 + ∞ l=1 |γ|=l ujγ(t)y γ 0 , j = 1, n, (8.140) where ujγ(t) are solutions of the finite linear ordinary differential systems dUk dt = AT 1kU1 + AT 2kU2 + · · · + AT kk Uk, k = 1, l, Us(t) = [uγ(t)]|γ|=s. (8.141)
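To make Theorem 8.4(i) concrete on the simplest nonlinear example, the sketch below builds a truncated LEM matrix for the scalar problem y' = y^2, y(0) = y_0 (our example; the truncation order M is our choice, since the exact LEM matrix is infinite and only column-finite) and compares the first component of e^{At} V_0 with the exact solution:

import numpy as np
from scipy.linalg import expm

# For f(y) = y^2 one has V = (y, y^2, y^3, ...) and
# d(y^k)/dt = k y^(k-1) y' = k y^(k+1), so the LEM matrix carries the single
# entry A[k, k+1] = k on the upper diagonal, in agreement with the band
# structure (8.137) for a polynomial operator of degree 2.
M = 40                                    # truncation order (an assumption)
A = np.zeros((M, M))
for k in range(1, M):
    A[k - 1, k] = k
y0, t = 0.5, 0.8
V0 = y0 ** np.arange(1, M + 1)            # Cauchy data (8.136): V(0) = (y0^k)
y_lem = (expm(A * t) @ V0)[0]             # first component of e^{A t} V0, cf. (8.138)
print(y_lem, y0 / (1 - y0 * t))           # exact solution y0/(1 - y0 t) = 0.8333...

For |y_0 t| < 1 the neglected terms are of order (y_0 t)^M, so the two printed values agree to machine precision; the band structure of A is what makes the block-partitioned computation mentioned above practical.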
  • 480. CONSIDERATIONS ABOUT THE ERRORS 473 which satisfy the Cauchy conditions U1(t0) = ej , Us(t0) = 0, s = 2, l. (8.142) T standing for transposed. The above theorem generalizes a similar one, stated for polynomial ordinary differential systems. The corresponding result is very much alike a solution of a linear ordinary differential system with constant coefficients. There is more: the computation is even easier because of the fact that the eigen values of the diagonal cells are always known. The generalized representation (8.140) is the normal LEM representation and it was used in many applications requiring the qualitative behavior of the solution. 8.11 CONSIDERATIONS ABOUT THE ERRORS The integration error is obviously of the order O(h) for Euler’s method. Taylor’s method has the advantage that the order of the error is O(hn ), but it has the disadvantage that it needs the calculus of the derivatives of the function f(t, x(t)). In the case of the Runge–Kutta type methods the error is of the order O(hp+1 ), where p is the order of the method. Butcher stated that between the number of evaluations of the function f at each step and the truncation error’s order, there is a link of the following form: – for two evaluations of the function f, the truncation error is of the order O(h2 ); – for three evaluations, the truncation error is of the order O(h3 ); – for four or five evaluations, the truncation error is of the order O(h4 ); – for six evaluations, the truncation error is of the order O(h5); – for seven evaluations, the truncation error is of the order O(h6 ); – for eight or more evaluations of the function f, the truncation error is of the order O(hn−2 ), where n is the number of evaluations. Proceeding as with the evaluation of the error in the case of Lagrange’s interpolation polynomials, we obtain the following estimations of the errors in the case of multistep methods: – for the second-order Adams–Bashforth method, εx = 5h3 12 M2, M2 = sup ξ∈[t0,tf ] |f (ξ, x(ξ))|; (8.143) – for the third-order Adams–Bashforth method, εx = 3h4 8 M3, M3 = sup ξ∈[t0,tf ] |f (3) (ξ, x(ξ))|; (8.144) – for the fourth-order Adams–Bashforth method, εx = 251h5 720 M4, M4 = sup ξ∈[t0,tf ] |f (4) (ξ, x(ξ))|; (8.145)
  • 481. 474 INTEGRATION OF ORDINARY DIFFERENTIAL EQUATIONS – for the fifth-order Adams–Bashforth method, εx = 85h6 288 M5, M5 = sup ξ∈[t0,tf ] |f (5) (ξ, x(ξ))|; (8.146) – for the second-order Adams–Moulton method, εx = h3 12 M2, M2 = sup ξ∈[t0,tf ] |f (ξ, x(ξ))|; (8.147) – for the third-order Adams–Moulton method, εx = h4 24 M3, M3 = sup ξ∈[t0,tf ] |f (3) (ξ, x(ξ))|; (8.148) – for the fourth-order Adams–Moulton method, εx = 19h5 720 M4, M4 = sup ξ∈[t0,tf ] |f (4) (ξ, x(ξ))|; (8.149) – for the fifth-order Adams–Moulton method, εx = 3h6 16 M5, M5 = sup ξ∈[t0,tf ] |f (5) (ξ, x(ξ))|. (8.150) We can easily observe that the Adams–Moulton methods are more precise than the same-order Adams–Bashforth methods. 8.12 NUMERICAL EXAMPLE Example Let us consider the Cauchy problem ˙x = dx dt = x + et (2 cos 2t − sin t), t ∈ [0, 2], x(0) = 1, (8.151) the solution of which, obviously, is x(t) = et (sin 2t + cos t). (8.152) We shall determine the numerical solution of this Cauchy problem by various methods, with the step h = 0.1. In the case of Euler’s method, the calculation relation is w(i) = w(i−1) + hf (ti−1, w(i−1) ), i = 1, 20, (8.153) where f (t, w) = w + et (2 cos 2t − sin t). (8.154) It results in Table 8.1.
  • 482. NUMERICAL EXAMPLE 475 TABLE 8.1 Solution of Problem (8.151) with Euler’s Method Step ti xi = x(ti) f (ti−1, wi−1) wi 0 0.0 1.000000 – 1.000000 1 0.1 1.319213 3.000000 1.300000 2 0.2 1.672693 3.355949 1.635595 3 0.3 2.051757 3.642913 1.999886 4 0.4 2.444231 3.829149 2.382801 5 0.5 2.834240 3.880586 2.770860 6 0.6 3.202145 3.762036 3.147063 7 0.7 3.524655 3.438735 3.490937 8 0.8 3.775141 2.878185 3.778755 9 0.9 3.924192 2.052280 3.983983 10 1.0 3.940421 0.939656 4.077949 11 1.1 3.791535 −0.471815 4.030767 12 1.2 3.445687 −2.182478 3.812520 13 1.3 2.873060 −4.178426 3.394677 14 1.4 2.047695 −6.429262 2.751751 15 1.5 0.949478 −8.886245 1.863126 16 1.6 −0.433755 −11.481013 0.715025 17 1.7 −2.104107 −14.125068 −0.697482 18 1.8 −4.051585 −16.710208 −2.368502 19 1.9 −6.252297 −19.110082 −4.279511 20 2.0 −8.666988 −21.183026 −6.397813 Another possibility to solve problem (8.151) is the use of Taylor’s method. We shall use Taylor’s method of second order, for which we have T (2) (ti, wi) = f (ti, w) + h 2 f (ti, wi), i = 0, 19, (8.155) f (t, x) = df (t, x) dt = x + et (4 cos 2t − 4 sin 2t − 2 sin t − cos t), (8.156) wi+1 = wi + hT (2) (ti, wi), i = 0, 19. (8.157) The numerical results are given in Table 8.2. If we solve the same Cauchy problem by Euler’s modified method, then we have the relation wi+1 = wi + h 2 [f (ti, wi) + f (ti+1, wi + hf (ti, wi))], i = 0, 19, (8.158) resulting in Table 8.3. The solution of Cauchy problems (8.151) and (8.152) by Heun’s method leads to the relation wi = wi + h 4 f ti, wi + 3f ti + 2 3 h, wi + 2 3 hf ti, wi (8.159) and to the data in Table 8.4. Another way to treat Cauchy problems (8.151) and (8.152) is that of the Runge–Kutta method.
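The computation behind Table 8.1 is straightforward to reproduce. The following minimal Python sketch (variable names are our choices) implements the Euler step (8.153) with the right-hand side (8.154) and prints, for each step, t_i, the exact value (8.152), and w_i:

import math

def f(t, x):                       # right-hand side of (8.151), cf. (8.154)
    return x + math.exp(t) * (2 * math.cos(2 * t) - math.sin(t))

def exact(t):                      # closed-form solution (8.152)
    return math.exp(t) * (math.sin(2 * t) + math.cos(t))

h, w, t = 0.1, 1.0, 0.0
print(0, t, exact(t), w)
for i in range(1, 21):
    w = w + h * f(t, w)            # Euler step (8.153)
    t = round(i * h, 10)
    print(i, t, round(exact(t), 6), round(w, 6))
# Step 1 gives w = 1.300000 and step 2 gives w = 1.635595, matching Table 8.1.

The same driver, with the step replaced by the corresponding formula, reproduces the remaining tables of this section.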
  • 483. 476 INTEGRATION OF ORDINARY DIFFERENTIAL EQUATIONS TABLE 8.2 Solution of Problem (8.151) with Taylor’s Second-Order Method Step ti f (ti, wi) f (ti, wi) T (2) (ti, wi) wi 0 0.0 3.000000 4.000000 3.200000 1.000000 1 0.1 3.375949 3.453994 3.548649 1.320000 2 0.2 3.682182 2.589898 3.811677 1.674865 3 0.3 3.885295 1.376238 3.954107 2.056033 4 0.4 3.949228 −0.207727 3.938842 2.451443 5 0.5 3.836504 −2.168613 3.728074 2.845327 6 0.6 3.509807 −4.495524 3.285031 3.218135 7 0.7 2.933886 −7.156876 2.576043 3.546638 8 0.8 2.077767 −10.097625 1.572886 3.804242 9 0.9 0.917204 −13.237152 0.255346 3.961531 10 1.0 −0.562699 −16.468063 −1.386102 3.987065 11 1.1 −2.364790 −19.656143 −3.347597 3.848455 12 1.2 −4.477250 −22.641732 −5.609337 3.513696 13 1.3 −6.871177 −25.242758 −8.133315 2.952762 14 1.4 −9.498565 −27.259588 −10.861545 2.139430 15 1.5 −12.290864 −28.481849 −13.714956 1.053276 16 1.6 −15.158313 −28.697264 −16.593176 −0.318220 17 1.7 −17.990263 −27.702427 −19.375385 −1.977537 18 1.8 −20.656655 −25.315371 −21.922424 −3.915076 19 1.9 −23.010834 −21.389601 −24.080314 −6.107318 20 2.0 −24.893818 −15.829130 −25.685275 −8.515350 TABLE 8.3 Solution of Problem (8.151) with the Modified Euler Method Step ti f (ti, wi) + f (ti+1, wi + hf (ti, wi)) wi 0 0.0 6.355949 1.000000 1 0.1 6.355949 1.317797 2 0.2 7.036236 1.669609 3 0.3 7.543491 2.046784 4 0.4 7.808220 2.437195 5 0.5 7.756849 2.825037 6 0.6 7.314545 3.190765 7 0.7 6.408693 3.511199 8 0.8 4.973017 3.759850 9 0.9 2.952235 3.907462 10 1.0 0.307146 3.922819 11 1.1 −2.980065 3.773816 12 1.2 −6.900502 3.428791 13 1.3 −11.413518 2.858115 14 1.4 −16.442287 2.036000 15 1.5 −21.870334 0.942484 16 1.6 −27.539431 −0.434488 17 1.7 −33.249253 −2.096950 18 1.8 −38.759174 −4.034909 19 1.9 −43.792562 −6.224537 20 2.0 −48.043864 −8.626730
  • 484. NUMERICAL EXAMPLE 477 TABLE 8.4 Solution of Equation (8.151) by Heun’s Method Step ti xi wi 0 0.0 1.000000 1.000000 1 0.1 1.3192132 1.3185770 2 0.2 1.6726927 1.6714575 3 0.3 2.0517570 2.0500182 4 0.4 2.4442311 2.4421527 5 0.5 2.8342401 2.8320649 6 0.6 3.2021455 3.2002036 7 0.7 3.5246551 3.5233706 8 0.8 3.7751413 3.7750360 9 0.9 3.9241925 3.9258857 10 1.0 3.9404206 3.9446252 11 1.1 3.7915355 3.7990486 12 1.2 3.4456868 3.4573757 13 1.3 2.8730600 2.8898418 14 1.4 2.0476947 2.0705108 15 1.5 0.9494781 0.9792638 16 1.6 −0.4337552 −0.3961070 17 1.7 −2.1041065 −2.0577838 18 1.8 −4.0515853 −3.9958985 19 1.9 −6.2522972 −6.1867206 20 2.0 −8.6669884 −8.5911995 Thus, for the Runge–Kutta method of third order we apply the relations K1 = hf (ti, wi), K2 = hf ti + h 2 , wi + K1 2 , K3 = hf (ti + h, wi + 2K2 + K1), (8.160) wi+1 = wi + 1 6 (K1 + 4K2 + K3), (8.161) the results being given in Table 8.5. Analogically, for the Runge–Kutta method of fourth order we have the relations K1 = hf (ti, wi), K2 = hf ti + h 2 , wi + K1 2 , K3 = hf ti + h 2 , wi + K2 2 , K4 = hf (ti + h, wi + K3), (8.162) wi+1 = wi + 1 6 (K1 + 2K2 + 2K3 + K4), (8.163) while for the Runge–Kutta method of sixth order we may write K1 = hf (ti, wi), K2 = hf ti + h 2 , wi + K1 3 , K3 = hf ti + 2h 5 , wi + 1 25 6K2 + 4K3 , K4 = hf ti + h, wi + 1 4 15K3 − 12K2 + K1 ,
  • 485. 478 INTEGRATION OF ORDINARY DIFFERENTIAL EQUATIONS TABLE 8.5 Solution of Equation (8.151) by the Runge–Kutta Method of Third Order Step xi wi 0 1.0000000 1.0000000 1 1.3192132 1.3291972 2 1.6726927 1.6949971 3 2.0517570 2.0887387 4 2.4442311 2.4981579 5 2.8342401 2.9071609 6 3.2021455 3.2957409 7 3.5246551 3.6400730 8 3.7751413 3.9128210 9 3.9241925 4.0836843 10 3.9404206 4.1202083 11 3.7915355 3.9888704 12 3.4456868 3.6564447 13 2.8730600 3.0916307 14 2.0476947 2.2669173 15 0.9494781 1.1606331 16 −0.4337552 −0.2408872 17 −2.1041065 −1.9411070 18 −4.0515853 −3.9311771 19 −6.2522972 −6.1880339 20 −8.6669884 −8.6728551 K5 = hf ti + 2 3 h, wi + 2 81 4K4 − 25K3 + 45K2 + 3K1 , K6 = hf ti + 4 5 h, wi + 1 75 8K4 + 10K3 + 36K2 + 6K1 , (8.164) wi+1 = wi + 1 192 (23K1 + 125K3 − 8K5 + 12K6). (8.165) The results are given in Table 8.6 and Table 8.7. The solution of Cauchy problems (8.151) and (8.152) by the Runge–Kutta–Fehlberg method leads to the data in Table 8.8. We may study the problem by using the multistep methods too. Thus, Adams method leads to the results in Table 8.9. For Adams–Bashforth methods of the third, fourth, and fifth order we obtain the data in Table 8.10, Table 8.11, and Table 8.12, respectively. The use of the Adams–Moulton methods of third, fourth, and fifth order leads to the results in Table 8.13, Table 8.14, and Table 8.15, respectively. If we use the predictor–corrector methods, then it results – for Euler’s predictor–corrector method the data in Table 8.16; – for Adams’s predictor–corrector method the data in Table 8.17, Table 8.18, and Table 8.19; – for Milne’s predictor–corrector method of fourth order the data in Table 8.20; – for Hamming’s predictor–corrector method the data in Table 8.21.
  • 486. NUMERICAL EXAMPLE 479 TABLE 8.6 Solution of Equation (8.151) by the Runge–Kutta Method of Fourth Order Step xi wi 0 1.0000000 1.0000000 1 1.3192132 1.3192130 2 1.6726927 1.6726923 3 2.0517570 2.0517565 4 2.4442311 2.4442305 5 2.8342401 2.8342396 6 3.2021455 3.2021451 7 3.5246551 3.5246551 8 3.7751413 3.7751417 9 3.9241925 3.9241937 10 3.9404206 3.9404228 11 3.7915355 3.7915390 12 3.4456868 3.4456919 13 2.8730600 2.8730670 14 2.0476947 2.0477038 15 0.9494781 0.9494898 16 −0.4337552 −0.4337406 17 −2.1041065 −2.1040888 18 −4.0515853 −4.0515641 19 −6.2522972 −6.2522725 20 −8.6669884 −8.6669599 TABLE 8.7 Solution of Equation (8.151) by the Runge–Kutta Method of Sixth Order Step xi wi 0 1.0000000 1.0000000 1 1.3192132 1.3192132 2 1.6726927 1.6726927 3 2.0517570 2.0517570 4 2.4442311 2.4442311 5 2.8342401 2.8342402 6 3.2021455 3.2021455 7 3.5246551 3.5246551 8 3.7751413 3.7751413 9 3.9241925 3.9241926 10 3.9404206 3.9404208 11 3.7915355 3.7915357 12 3.4456868 3.4456871 13 2.8730600 2.8730603 14 2.0476947 2.0476951 15 0.9494781 0.9494786 16 −0.4337552 −0.4337547 17 −2.1041065 −2.1041059 18 −4.0515853 −4.0515846 19 −6.2522972 −6.2522964 20 −8.6669884 −8.6669876
  • 487. 480 INTEGRATION OF ORDINARY DIFFERENTIAL EQUATIONS TABLE 8.8 Solution of Equation (8.151) by the Runge–Kutta–Fehlberg Method t w x 0.0000000 1.0000000 1.0000000 0.0316228 1.0968461 1.0968461 0.1469862 1.4814837 1.4814836 0.2666438 1.9231489 1.9231488 0.4162211 2.5081341 2.5081341 0.5210182 2.9141284 2.9141283 0.6250856 3.2883041 3.2883041 0.7170764 3.5733095 3.5733096 0.8010931 3.7773744 3.7773745 0.8799701 3.9039488 3.9039489 0.9551846 3.9516312 3.9516314 1.0276254 3.9171714 3.9171716 1.0979024 3.7965716 3.7965719 1.1664726 3.5854933 3.5854937 1.2337016 3.2794283 3.2794287 1.2998982 2.8737670 2.8737676 1.3653372 2.3638101 2.3638107 1.4302756 1.7447325 1.7447331 1.4949669 1.0114991 1.0114998 1.5596750 0.1587110 0.1587118 1.6246911 −0.8196561 −0.8196552 1.6903571 −1.9307042 −1.9307032 1.7571028 −3.1834262 −3.1834251 1.8255128 −4.5902733 −4.5902721 1.8964618 −6.1705045 −6.1705032 1.9714434 −7.9585369 −7.9585354 8.13 APPLICATIONS Problem 8.1 Study the motion of a rigid solid with a point constrained to move without friction on a given curve (Fig. 8.2). As numerical application, let us consider a body formed (Fig. 8.3) from a homogeneous cube ABDEA B D E of mass m and edge l and a bar OG of length l and negligible mass, G being the center of the square ABDE. The point O moves without friction on the cylindrical curve of equations X0 = l cos ξ1, Y0 = l sin ξ1, Z0 = klξ1. (8.166) Knowing that the mass m, the length l, and the parameter k have the values m = 12 kg, l = 0.1 m, k = 0.1, (8.167) and that the initial conditions of the attached Cauchy problem are (for t = 0) ξ1 = 0 m, ξ5 = 0 m s−1 , ψ = 0 rad, θ = 0.001 rad, φ = 0 rad, ωx = 0 rad s−1 , ωy = 0 rad s−1 , ωz = 0 rad s−1 , (8.168) draw the variables ξi(t), i = 1, 8.
  • 488. APPLICATIONS 481 TABLE 8.9 Solution of Equation (8.151) by the Adams Method Step t x f (ti, wi) f 2 f 3 f 4 f w 0 0.0 1.00000 3.00000 0.37516 −0.07031 −0.03352 −0.00265 1.00000 1 0.1 1.31921 3.37516 0.30485 −0.10384 −0.03617 −0.00145 1.31921 2 0.2 1.67269 3.68001 0.20101 −0.14001 −0.03762 0.00024 1.67269 3 0.3 2.05176 3.88102 0.06100 −0.17764 −0.03739 0.00208 2.05176 4 0.4 2.44423 3.94202 −0.11664 −0.21502 −0.03530 0.00429 2.44423 5 0.5 2.83424 3.82538 −0.33166 −0.25033 −0.03102 0.00678 2.83420 6 0.6 3.20215 3.49371 −0.58199 −0.28134 −0.02424 0.00940 3.20204 7 0.7 3.52466 2.91173 −0.86333 −0.30558 −0.01484 0.01209 3.52448 8 0.8 3.77514 2.04839 −1.16892 −0.32042 −0.00275 0.01473 3.77487 9 0.9 3.92419 0.87948 −1.48934 −0.32317 0.01198 0.01714 3.92381 10 1.0 3.94042 −0.60986 −1.81251 −0.31120 0.02912 0.01916 3.93990 11 1.1 3.79154 −2.42238 −2.12371 −0.28208 0.04827 0.02060 3.79087 12 1.2 3.44569 −4.54609 −2.40579 −0.23381 0.06887 0.02125 3.44486 13 1.3 2.87306 −6.95188 −2.63960 −0.16494 0.09012 0.02092 2.87206 14 1.4 2.04769 −9.59148 −2.80454 −0.07482 0.11104 0.01942 2.04652 15 1.5 0.94948 −12.39601 −2.87935 0.03622 0.13046 0.01658 0.94813 16 1.6 −0.43376 −15.27536 −2.84313 0.16668 0.14704 0.01229 −0.43527 17 1.7 −2.10411 −18.11850 −2.67646 0.31372 0.15933 18 1.8 −4.05159 −20.79496 −2.36274 0.47305 19 1.9 −6.25230 −23.15770 −1.88970 20 2.0 −8.66699 −25.04739 TABLE 8.10 Solution of Equation (8.151) by the Adams–Bashforth Method of Third Order Step xi wi 0 1.00000 1.00000 1 1.31921 1.31921 2 1.67269 1.67269 3 2.05176 2.05301 4 2.44423 2.44707 5 2.83424 2.83887 6 3.20215 3.20874 7 3.52466 3.53335 8 3.77514 3.78599 9 3.92419 3.93717 10 3.94042 3.95539 11 3.79154 3.80823 12 3.44569 3.46373 13 2.87306 2.89191 14 2.04769 2.06669 15 0.94948 0.96783 16 −0.43376 −0.41696 17 −2.10411 −2.08986 18 −4.05159 −4.04092 19 −6.25230 −6.24627 20 −8.66699 −8.66659
  • 489. 482 INTEGRATION OF ORDINARY DIFFERENTIAL EQUATIONS TABLE 8.11 Solution of Equation (8.151) by the Adams–Bashforth Method of Fourth Order Step xi wi 0 1.00000 1.00000 1 1.31921 1.31921 2 1.67269 1.67269 3 2.05176 2.05176 4 2.44423 2.44433 5 2.83424 2.83441 6 3.20215 3.20233 7 3.52466 3.52478 8 3.77514 3.77513 9 3.92419 3.92394 10 3.94042 3.93980 11 3.79154 3.79041 12 3.44569 3.44391 13 2.87306 2.87047 14 2.04769 2.04414 15 0.94948 0.94479 16 −0.43376 −0.43971 17 −2.10411 −2.11146 18 −4.05159 −4.06043 19 −6.25230 −6.26269 20 −8.66699 −8.67894 TABLE 8.12 Solution of Equation (8.151) by the Adams–Bashforth Method of Fifth Order Step xi wi 0 1.00000 1.00000 1 1.31921 1.31921 2 1.67269 1.67269 3 2.05176 2.05176 4 2.44423 2.44423 5 2.83424 2.83420 6 3.20215 3.20204 7 3.52466 3.52448 8 3.77514 3.77487 9 3.92419 3.92381 10 3.94042 3.93990 11 3.79154 3.79087 12 3.44569 3.44486 13 2.87306 2.87206 14 2.04769 2.04652 15 0.94948 0.94813 16 −0.43376 −0.43527 17 −2.10411 −2.10577 18 −4.05159 −4.05338 19 −6.25230 −6.25418 20 −8.66699 −8.66892
  • 490. APPLICATIONS 483 TABLE 8.13 Solution of Equation (8.151) by the Adams–Moulton Method of Third Order Step xi wi 0 1.00000 1.00000 1 1.31921 1.31921 2 1.67269 1.67255 3 2.05176 2.05145 4 2.44423 2.44372 5 2.83424 2.83351 6 3.20215 3.20118 7 3.52466 3.52345 8 3.77514 3.77369 9 3.92419 3.92250 10 3.94042 3.93852 11 3.79154 3.78946 12 3.44569 3.44349 13 2.87306 2.87081 14 2.04769 2.04548 15 0.94948 0.94739 16 −0.43376 −0.43561 17 −2.10411 −2.10562 18 −4.05159 −4.05264 19 −6.25230 −6.25278 20 −8.66699 −8.66680 TABLE 8.14 Solution of Equation (8.151) by the Adams–Moulton Method of Fourth Order Step xi wi 0 1.00000 1.00000 1 1.31921 1.31921 2 1.67269 1.67269 3 2.05176 2.05175 4 2.44423 2.44422 5 2.83424 2.83422 6 3.20215 3.20213 7 3.52466 3.52465 8 3.77514 3.77515 9 3.92419 3.92423 10 3.94042 3.94049 11 3.79154 3.79165 12 3.44569 3.44586 13 2.87306 2.87330 14 2.04769 2.04802 15 0.94948 0.94990 16 −0.43376 −0.43323 17 −2.10411 −2.10347 18 −4.05159 −4.05084 19 −6.25230 −6.25143 20 −8.66699 −8.66601
  • 491. 484 INTEGRATION OF ORDINARY DIFFERENTIAL EQUATIONS TABLE 8.15 Solution of Equation (8.151) by the Adams–Moulton Method of Fifth Order Step xi wi 0 1.00000 1.00000 1 1.31921 1.31921 2 1.67269 1.67269 3 2.05176 2.05176 4 2.44423 2.44423 5 2.83424 2.83425 6 3.20215 3.20216 7 3.52466 3.52467 8 3.77514 3.77516 9 3.92419 3.92422 10 3.94042 3.94046 11 3.79154 3.79158 12 3.44569 3.44574 13 2.87306 2.87313 14 2.04769 2.04777 15 0.94948 0.94956 16 −0.43376 −0.43366 17 −2.10411 −2.10400 18 −4.05159 −4.05148 19 −6.25230 −6.25219 20 −8.66699 −8.66688 TABLE 8.16 Solution of Equation (8.151) by Euler’s Predictor–Corrector Method Step xi w pred i wcorr i 0 1.000000000 1.000000000 1.000000000 1 1.319213234 1.300000000 1.317797459 2 1.672692659 1.655172121 1.669609276 3 2.051756990 2.037301965 2.046783846 4 2.444231072 2.434388485 2.437194823 5 2.834240148 2.830692770 2.825037271 6 3.202145482 3.206658671 3.190764509 7 3.524655087 3.539008168 3.511199171 8 3.775141261 3.801043935 3.759850010 9 3.924192475 3.963187530 3.907461783 10 3.940420612 3.993775235 3.922819068 11 3.791535483 3.860124570 3.773815798 12 3.445686849 3.529872878 3.428790709 13 2.873060026 2.972575236 2.858114788 14 2.047694688 2.161532373 2.036000413 15 0.949478141 1.075800882 0.942483717 16 −0.433755206 −0.297681859 −0.434487826 17 −2.104106532 −1.961945934 −2.096950472 18 −4.051585254 −3.907918108 −4.034909166 19 −6.252297191 −6.112558023 −6.224537284 20 −8.666988414 −8.537342587 −8.626730488
  • 492. APPLICATIONS 485 TABLE 8.17 Solution of Equation (8.151) by Adams’s Predictor–Corrector Method of Third Order Step xi w pred i wcorr i 0 1.000000000 1.000000000 1.000000000 1 1.319213234 1.319213234 1.319213234 2 1.672692659 1.672692659 1.672692659 3 2.051756990 2.053006306 2.051661525 4 2.444231072 2.445469036 2.444025281 5 2.834240148 2.835416070 2.833912943 6 3.202145482 3.203185641 3.201689577 7 3.524655087 3.525480687 3.524068322 8 3.775141261 3.775667889 3.774427807 9 3.924192475 3.924333295 3.923363951 10 3.940420612 3.940090450 3.939497056 11 3.791535483 3.790655697 3.790546062 12 3.445686849 3.444190896 3.444670252 13 2.873060026 2.870899572 2.872064424 14 2.047694688 2.044846047 2.046777180 15 0.949478141 0.945948761 0.948703618 16 −0.433755206 −0.437920555 −0.434315861 17 −2.104106532 −2.108820265 −2.104378926 18 −4.051585254 −4.056712629 −4.051494673 19 −6.252297191 −6.257653609 −6.251772462 20 −8.666988414 −8.672338860 −8.665966364 TABLE 8.18 Solution of Equation (8.151) by Adams’s Predictor–Corrector Method of Fourth Order Step xi w pred i wcorr i 0 1.000000000 1.000000000 1.000000000 1 1.319213234 1.319213234 1.319213234 2 1.672692659 1.672692659 1.672692659 3 2.051756990 2.051756990 2.051756990 4 2.444231072 2.444325646 2.444229802 5 2.834240148 2.834290471 2.834239769 6 3.202145482 3.202142622 3.202148906 7 3.524655087 3.524590325 3.524665906 8 3.775141261 3.775008126 3.775163759 9 3.924192475 3.923986977 3.924231579 10 3.940420612 3.940142130 3.940481805 11 3.791535483 3.791187569 3.791624671 12 3.445686849 3.445277980 3.445810180 13 2.873060026 2.872604252 2.873223668 14 2.047694688 2.047212125 2.047904558 15 0.949478141 0.948995244 0.949739603 16 −0.433755206 −0.434205656 −0.433437678 17 −2.104106532 −2.104485777 −2.103729696 18 −4.051585254 −4.051849311 −4.051147446 19 −6.252297191 −6.252398041 −6.251798643 20 −8.666988414 −8.666875667 −8.666431527
  • 493. 486 INTEGRATION OF ORDINARY DIFFERENTIAL EQUATIONS TABLE 8.19 Solution of Equation (8.151) by Adams’s Predictor–Corrector Method of Fifth Order Step xi w pred i wcorr i 0 1.000000000 1.000000000 1.000000000 1 1.319213234 1.319213234 1.319213234 2 1.672692659 1.672692659 1.672692659 3 2.051756990 2.051756990 2.051756990 4 2.444231072 2.444231072 2.444231072 5 2.834240148 2.834199636 2.834241554 6 3.202145482 3.202095487 3.202148720 7 3.524655087 3.524595924 3.524660533 8 3.775141261 3.775074728 3.775149304 9 3.924192475 3.924120543 3.924203459 10 3.940420612 3.940346215 3.940434826 11 3.791535483 3.791462325 3.791553128 12 3.445686849 3.445619382 3.445708016 13 2.873060026 2.873003382 2.873084668 14 2.047694688 2.047654547 2.047722600 15 0.949478141 0.949460545 0.949508941 16 −0.433755206 −0.433744107 −0.433722093 17 −2.104106532 −2.104060787 −2.104071874 18 −4.051585254 −4.051499477 −4.051550007 19 −6.252297191 −6.252166980 −6.252262480 20 −8.666988414 −8.666810798 −8.666955499 TABLE 8.20 Solution of Equation (8.151) by Milne’s Predictor–Corrector Method of Fourth Order Step xi w pred i wcorr i 0 1.000000000 1.000000000 1.000000000 1 1.319213234 1.319213234 1.319213234 2 1.672692659 1.672692659 1.672692659 3 2.051756990 2.051756990 2.051756990 4 2.444231072 2.444313815 2.444232221 5 2.834240148 2.834284533 2.834241933 6 3.202145482 3.202140594 3.202149027 7 3.524655087 3.524591299 3.524660029 8 3.775141261 3.775009983 3.775148704 9 3.924192475 3.923986136 3.924202128 10 3.940420612 3.940134740 3.940433506 11 3.791535483 3.791168274 3.791551327 12 3.445686849 3.445241097 3.445706520 13 2.873060026 2.872542757 2.873083141 14 2.047694688 2.047118841 2.047721888 15 0.949478141 0.948862029 0.949508855 16 −0.433755206 −0.434386680 −0.433720655 17 −2.104106532 −2.104722840 −2.104068978 18 −4.051585254 −4.052149724 −4.051544739 19 −6.252297191 −6.252768644 −6.252254840 20 −8.666988414 −8.667321559 −8.666944624
  • 494. APPLICATIONS 487 TABLE 8.21 Solution of Equation (8.151) by Hamming’s Predictor–Corrector Method of Fourth Order Step xi w pred i wcorr i 0 1.000000000 1.000000000 1.000000000 1 1.319213234 1.319213234 1.319213234 2 1.672692659 1.672692659 1.672692659 3 2.051756990 2.051756990 2.051756990 4 2.444231072 2.444313815 2.444229732 5 2.834240148 2.834283869 2.834239485 6 3.202145482 3.202140273 3.202148436 7 3.524655087 3.524590804 3.524665679 8 3.775141261 3.775008426 3.775164256 9 3.924192475 3.923986924 3.924233399 10 3.940420612 3.940141921 3.940485639 11 3.791535483 3.791187804 3.791631306 12 3.445686849 3.445279364 3.445820481 13 2.873060026 2.872607655 2.873238554 14 2.047694688 2.047218551 2.047924973 15 0.949478141 0.949005833 0.949766481 16 −0.433755206 −0.434189654 −0.433403459 17 −2.104106532 −2.104463036 18 −4.051585254 −4.051818467 −4.051096382 19 −6.252297191 −6.252357753 −6.251738441 20 −8.666988414 −8.666824674 −8.666362041 Solution: 1. Theory Let us consider the rigid solid in Figure 8.2, in which its point O moves on the curve , and let O0X0Y0Z0 be a fixed reference system, Oxyz the movable system of the principal axes of inertia, and C the weight center of the rigid solid. Further on, we use the notations: • XO, YO, ZO, the co-ordinates of the point O in the fixed reference system; • rC, the vector OC; • xC, yC, zC, the co-ordinates of the point C in the system Oxyz; • m, the mass of the rigid solid; • Jx, Jy, Jz, the principal moments of inertia; • the parametric equations of the curve , given by XO = f1(λ), YO = f2(λ), ZO = f3(λ), where λ ∈ R; (8.169) • Ox0y0z0, a reference system with the origin at O and with the axes parallel to those of the system O0X0Y0Z0, respectively; • ψ, θ, φ, the Euler angles, which define the position of the system Oxyz relative to the system Ox0y0z0; • F, the resultant of the forces that act upon the rigid solid; • MO, the resultant moment of the given forces at O. Considering that the parameters ψ, θ, φ, λ and their derivatives ˙ψ, ˙θ, ˙φ, ˙λ, the inertial parameters m, Jx, Jy, Jz, xC, yC, zC, and also the torsor of the forces {F, MO} at the moment t = 0 are known,
Figure 8.2 The rigid solid, a point of which is constrained to move without friction on a given curve.
Figure 8.3 Numerical application.

We must determine the motion, that is, the functions of time ψ = ψ(t), θ = θ(t), φ = φ(t), λ = λ(t), $X_O(t)$, $Y_O(t)$, $Z_O(t)$.

The theorem of momentum can be written in the vector form

$$m[\mathbf a_O + \boldsymbol\varepsilon\times\mathbf r_C + \boldsymbol\omega\times(\boldsymbol\omega\times\mathbf r_C)] = \mathbf F + N_1\boldsymbol\nu + N_2\boldsymbol\beta, \qquad (8.170)$$

where
• $\mathbf a_O$ is the acceleration of the point O;
• $\boldsymbol\varepsilon$ is the angular acceleration of the rigid solid;
• $\boldsymbol\omega$ is the angular velocity of the rigid solid;
• $\boldsymbol\nu$, $\boldsymbol\beta$ are the unit vectors of the principal normal and of the binormal, respectively, to the curve Γ;
• $N_1$, $N_2$ are the reactions in the direction of the principal normal and in the direction of the binormal, respectively, to the curve Γ.

The theorem of moment of momentum relative to the point O, in vector form, reads

$$\mathbf r_C\times m\mathbf a_O + [J_x\varepsilon_x − (J_y − J_z)\omega_y\omega_z]\mathbf i + [J_y\varepsilon_y − (J_z − J_x)\omega_z\omega_x]\mathbf j + [J_z\varepsilon_z − (J_x − J_y)\omega_x\omega_y]\mathbf k = \mathbf M_O, \qquad (8.171)$$

where
• $\omega_x$, $\omega_y$, $\omega_z$ are the projections of the vector ω onto the axes of the system Oxyz;
• $\varepsilon_x$, $\varepsilon_y$, $\varepsilon_z$ are the projections of the vector ε onto the axes of the system Oxyz.

If $\mathbf T_1$ is a tangent vector at the point O to the curve Γ, then, taking the dot product of both members of relation (8.170) with $\mathbf T_1$, we can eliminate the reactions $N_1$ and $N_2$, obtaining

$$m\{\mathbf T_1\cdot\mathbf a_O + \mathbf T_1\cdot(\boldsymbol\varepsilon\times\mathbf r_C) + \mathbf T_1\cdot[\boldsymbol\omega\times(\boldsymbol\omega\times\mathbf r_C)]\} = \mathbf T_1\cdot\mathbf F. \qquad (8.172)$$

As we shall see shortly, the system consisting of equations (8.171) and (8.172) can be transformed into a system of eight first-order differential equations, from which the parameters ψ, θ, φ, λ are finally deduced. To pass from the system $O_0X_0Y_0Z_0$ to the system $Oxyz$, the rotation matrix [R] is written in the form

$$[R] = [\varphi][\theta][\psi], \qquad (8.173)$$

where

$$[\varphi] = \begin{bmatrix}\cos\varphi&\sin\varphi&0\\ −\sin\varphi&\cos\varphi&0\\ 0&0&1\end{bmatrix},\quad [\theta] = \begin{bmatrix}1&0&0\\ 0&\cos\theta&\sin\theta\\ 0&−\sin\theta&\cos\theta\end{bmatrix},\quad [\psi] = \begin{bmatrix}\cos\psi&\sin\psi&0\\ −\sin\psi&\cos\psi&0\\ 0&0&1\end{bmatrix}. \qquad (8.174)$$

In the system $O_0X_0Y_0Z_0$, the vector $\mathbf T_1$, tangent to the curve, and the acceleration $\mathbf a_O$ have the matrix expressions

$$\{T_1\} = [T_{1x}\ T_{1y}\ T_{1z}]^T = [f_1'(\lambda)\ f_2'(\lambda)\ f_3'(\lambda)]^T, \qquad (8.175)$$
$$\{a_O\} = [a_{Ox}\ a_{Oy}\ a_{Oz}]^T = \ddot\lambda\{T_1\} + \dot\lambda^2\{T_2\}, \qquad (8.176)$$

where

$$\{T_2\} = [T_{2x}\ T_{2y}\ T_{2z}]^T = [f_1''(\lambda)\ f_2''(\lambda)\ f_3''(\lambda)]^T. \qquad (8.177)$$

On the basis of these notations, we calculate the dot product $m\mathbf T_1\cdot\mathbf a_O$ and obtain

$$m\mathbf T_1\cdot\mathbf a_O = \ddot\lambda A_{14} + \dot\lambda^2 A_{15}, \qquad (8.178)$$

where

$$A_{14} = m(T_{1x}^2 + T_{1y}^2 + T_{1z}^2),\qquad A_{15} = m(T_{1x}T_{2x} + T_{1y}T_{2y} + T_{1z}T_{2z}). \qquad (8.179)$$
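A direct transcription of relations (8.173) and (8.174) is sketched below; Python with NumPy is used throughout this chapter's sketches only as an illustration, and the function names are ours:

```python
import numpy as np

def rot_z(angle):
    """Elementary matrix of the form used in (8.174) for [phi] and [psi]."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_x(angle):
    """Elementary matrix of the form used in (8.174) for [theta]."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, s], [0.0, -s, c]])

def rotation_matrix(psi, theta, phi):
    """Rotation matrix [R] = [phi][theta][psi] of relation (8.173)."""
    return rot_z(phi) @ rot_x(theta) @ rot_z(psi)
```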
Further on, the calculation is made in the system Oxyz because the vectors ε, ω, $\mathbf r_C$, F, $\mathbf M_O$ are represented in this system. Hence, we calculate successively

$$\{T_1^*\} = [R]\{T_1\},\qquad \{T_2^*\} = [R]\{T_2\}, \qquad (8.180)$$
$$\{a_O^*\} = \ddot\lambda\{T_1^*\} + \dot\lambda^2\{T_2^*\}. \qquad (8.181)$$

The components $\omega_x$, $\omega_y$, $\omega_z$ of the angular velocity are given by the relations

$$\omega_x = \dot\psi\sin\theta\sin\varphi + \dot\theta\cos\varphi,\quad \omega_y = \dot\psi\sin\theta\cos\varphi − \dot\theta\sin\varphi,\quad \omega_z = \dot\psi\cos\theta + \dot\varphi, \qquad (8.182)$$

from which it follows that

$$\dot\psi = \frac{1}{\sin\theta}(\omega_x\sin\varphi + \omega_y\cos\varphi),\quad \dot\theta = \omega_x\cos\varphi − \omega_y\sin\varphi,\quad \dot\varphi = \omega_z − \frac{\cos\theta}{\sin\theta}(\omega_x\sin\varphi + \omega_y\cos\varphi). \qquad (8.183)$$

Further on, using the matrix notations

$$\{r_C\} = [x_C\ y_C\ z_C]^T, \qquad (8.184)$$
$$[\omega] = \begin{bmatrix}0&−\omega_z&\omega_y\\ \omega_z&0&−\omega_x\\ −\omega_y&\omega_x&0\end{bmatrix}, \qquad (8.185)$$
$$\{F\} = [F_x\ F_y\ F_z]^T,\qquad \{M_O\} = [M_{Ox}\ M_{Oy}\ M_{Oz}]^T, \qquad (8.186)$$

and the scalar notations

$$A_{11} = m(y_CT_{1z}^* − z_CT_{1y}^*),\quad A_{12} = m(z_CT_{1x}^* − x_CT_{1z}^*),\quad A_{13} = m(x_CT_{1y}^* − y_CT_{1x}^*), \qquad (8.187)$$
$$B_1 = −A_{15}\dot\lambda^2 − m\{T_1^*\}^T[\omega]^2\{r_C\} + \{T_1^*\}^T\{F\}, \qquad (8.188)$$

we obtain from equation (8.172) the relation

$$A_{11}\varepsilon_x + A_{12}\varepsilon_y + A_{13}\varepsilon_z + A_{14}\ddot\lambda = B_1. \qquad (8.189)$$

Taking into account relations (8.180) and (8.181), equation (8.171) takes the matrix formulation

$$\begin{bmatrix} m\ddot\lambda(y_CT_{1z}^* − z_CT_{1y}^*) + m\dot\lambda^2(y_CT_{2z}^* − z_CT_{2y}^*) + J_x\varepsilon_x − (J_y − J_z)\omega_y\omega_z\\ m\ddot\lambda(z_CT_{1x}^* − x_CT_{1z}^*) + m\dot\lambda^2(z_CT_{2x}^* − x_CT_{2z}^*) + J_y\varepsilon_y − (J_z − J_x)\omega_z\omega_x\\ m\ddot\lambda(x_CT_{1y}^* − y_CT_{1x}^*) + m\dot\lambda^2(x_CT_{2y}^* − y_CT_{2x}^*) + J_z\varepsilon_z − (J_x − J_y)\omega_x\omega_y \end{bmatrix} = \begin{bmatrix}M_{Ox}\\ M_{Oy}\\ M_{Oz}\end{bmatrix}; \qquad (8.190)$$

using the scalar notations

$$B_2 = M_{Ox} − m\dot\lambda^2(y_CT_{2z}^* − z_CT_{2y}^*) + (J_y − J_z)\omega_y\omega_z,\\ B_3 = M_{Oy} − m\dot\lambda^2(z_CT_{2x}^* − x_CT_{2z}^*) + (J_z − J_x)\omega_z\omega_x,\\ B_4 = M_{Oz} − m\dot\lambda^2(x_CT_{2y}^* − y_CT_{2x}^*) + (J_x − J_y)\omega_x\omega_y, \qquad (8.191)$$
we get the system

$$A_{11}\ddot\lambda + J_x\varepsilon_x = B_2,\qquad A_{12}\ddot\lambda + J_y\varepsilon_y = B_3,\qquad A_{13}\ddot\lambda + J_z\varepsilon_z = B_4. \qquad (8.192)$$

Equations (8.189) and (8.192) form a linear system of four equations with the four unknowns $\ddot\lambda$, $\varepsilon_x$, $\varepsilon_y$, $\varepsilon_z$. Finally, if we denote

$$C = \frac{B_1 − \dfrac{A_{11}B_2}{J_x} − \dfrac{A_{12}B_3}{J_y} − \dfrac{A_{13}B_4}{J_z}}{A_{14} − \dfrac{A_{11}^2}{J_x} − \dfrac{A_{12}^2}{J_y} − \dfrac{A_{13}^2}{J_z}}, \qquad (8.193)$$

then we obtain, from equations (8.189) and (8.192), the system of four differential equations

$$\ddot\lambda = C,\quad \varepsilon_x = \frac{1}{J_x}(B_2 − A_{11}C),\quad \varepsilon_y = \frac{1}{J_y}(B_3 − A_{12}C),\quad \varepsilon_z = \frac{1}{J_z}(B_4 − A_{13}C). \qquad (8.194)$$

To determine the parameters involved in the problem, we have to couple the kinematic equations (8.183) with the equations of system (8.194). This results in a system of seven differential equations of first and second order. To apply the fourth-order Runge–Kutta method, the system must contain only first-order differential equations. With the notations

$$\lambda = \xi_1,\ \psi = \xi_2,\ \theta = \xi_3,\ \varphi = \xi_4,\ \dot\lambda = \xi_5,\ \omega_x = \xi_6,\ \omega_y = \xi_7,\ \omega_z = \xi_8, \qquad (8.195)$$

we obtain, from relations (8.183) and (8.194), the following system of eight first-order differential equations:

$$\dot\xi_1 = \xi_5,\quad \dot\xi_2 = \frac{1}{\sin\xi_3}(\xi_6\sin\xi_4 + \xi_7\cos\xi_4),\quad \dot\xi_3 = \xi_6\cos\xi_4 − \xi_7\sin\xi_4,\quad \dot\xi_4 = \xi_8 − \frac{\cos\xi_3}{\sin\xi_3}(\xi_6\sin\xi_4 + \xi_7\cos\xi_4),\\ \dot\xi_5 = C,\quad \dot\xi_6 = \frac{1}{J_x}(B_2 − A_{11}C),\quad \dot\xi_7 = \frac{1}{J_y}(B_3 − A_{12}C),\quad \dot\xi_8 = \frac{1}{J_z}(B_4 − A_{13}C). \qquad (8.196)$$

Taking into account that the initial conditions are known (or can be deduced), we choose the integration step Δt and apply the fourth-order Runge–Kutta method to determine the numerical results. At each step of the method, we evaluate, in order (a driver sketch is given after this list):
• the matrices {T1} and {T2}, with relations (8.175) and (8.177);
• the parameters A14 and A15, with relations (8.179);
• the rotation matrix, with relations (8.173) and (8.174);
• the matrices {T1*} and {T2*}, with relations (8.180);
• the matrix [ω], with relation (8.185);
• the expression B1, with relation (8.188);
• the parameters B2, B3, B4, with relations (8.191);
• the parameter C, with relation (8.193).
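A minimal, generic Runge–Kutta driver for this evaluation chain is sketched below; the right-hand side function f is assumed to implement relations (8.175)–(8.193) and return the derivatives of system (8.196):

```python
import numpy as np

def rk4_step(f, t, xi, dt):
    """One step of the classical fourth-order Runge-Kutta method for
    xi' = f(t, xi); here f(t, xi) evaluates the right-hand side of
    system (8.196) through the chain (8.175) -> (8.179) -> (8.173),
    (8.174) -> (8.180) -> (8.185) -> (8.188) -> (8.191) -> (8.193)."""
    k1 = f(t, xi)
    k2 = f(t + dt / 2, xi + dt * k1 / 2)
    k3 = f(t + dt / 2, xi + dt * k2 / 2)
    k4 = f(t + dt, xi + dt * k3)
    return xi + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
```

The same stepper is reused, with a different right-hand side, in the subsequent problems of this section.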
2. Numerical calculation

The principal axes of inertia determine the system Oxyz, where Ox ∥ BD, Oy ∥ AB, Oz ∥ AA′. In this reference frame, the co-ordinates of the gravity center C are

$$x_C = 0,\qquad y_C = 0,\qquad z_C = −\frac{3}{2}l. \qquad (8.197)$$

The principal moments of inertia read

$$J_x = mz_C^2 + \frac{ml^2}{6} = \frac{29}{12}ml^2,\qquad J_y = mz_C^2 + \frac{ml^2}{6} = \frac{29}{12}ml^2,\qquad J_z = \frac{ml^2}{6}. \qquad (8.198)$$

The force F is given by the weight G, which, in the system $O_0X_0Y_0Z_0$, has the expression

$$\{F_0\} = mg[0\ \ 0\ \ −1]^T. \qquad (8.199)$$

The rotation matrix reads (c and s marking the cosine and sine functions)

$$[R] = \begin{bmatrix} c\xi_2c\xi_4 − s\xi_2c\xi_3s\xi_4 & s\xi_2c\xi_4 + c\xi_2c\xi_3s\xi_4 & s\xi_3s\xi_4\\ −c\xi_2s\xi_4 − s\xi_2c\xi_3c\xi_4 & −s\xi_2s\xi_4 + c\xi_2c\xi_3c\xi_4 & s\xi_3c\xi_4\\ s\xi_2s\xi_3 & −c\xi_2s\xi_3 & c\xi_3 \end{bmatrix}, \qquad (8.200)$$

from which, taking into account the relation

$$\{F\} = [R]\{F_0\}, \qquad (8.201)$$

we obtain the expression

$$\{F\} = −mg[\sin\xi_3\sin\xi_4\ \ \sin\xi_3\cos\xi_4\ \ \cos\xi_3]^T. \qquad (8.202)$$

For the moment $\mathbf M_O = \overrightarrow{OC}\times\mathbf F$, we obtain the matrix representation

$$\{M_O\} = \frac{3}{2}mgl\sin\xi_3[−\cos\xi_4\ \ \sin\xi_4\ \ 0]^T. \qquad (8.203)$$

The graphic results obtained after the simulation are captured in Figure 8.4. This problem may also be solved by a method of multibody type, as will be seen in Problem 8.4; in that case an algebraic-differential system of equations must be solved, with the advantage of obtaining the reactions at the same time.

Problem 8.2
Study the motion of a rigid solid having a point constrained to move without friction on a given surface (Fig. 8.5). As numerical application, let us consider the body formed (Fig. 8.6) by a homogeneous cube ABDEA′B′D′E′ of mass m and edge l and a bar OG of length l and negligible mass, G being the center of the square ABDE. The point O moves without friction on the plane of equations

$$X_0 = \xi_1,\qquad Y_0 = \xi_2,\qquad Z_0 = l − \xi_1 − \xi_2. \qquad (8.204)$$
Figure 8.4 Results of the simulation: time histories, in panels (a)–(h), of ξ1(=λ) [m], ξ2(=ψ), ξ3(=θ), ξ4(=φ) [rad], ξ5(=λ̇) [m s⁻¹], and ξ6(=ωx), ξ7(=ωy), ξ8(=ωz) [rad s⁻¹].
Figure 8.5 The rigid solid with a point constrained to move without friction on a given surface.
Figure 8.6 Numerical application.

Knowing that

$$m = 12\ \text{kg},\qquad l = 0.1\ \text{m}, \qquad (8.205)$$

and that the initial conditions are given by (for t = 0)

$$\xi_1 = 0\ \text{m},\ \xi_2 = 0\ \text{m},\ \dot\xi_1 = 0\ \text{m s}^{-1},\ \dot\xi_2 = 0\ \text{m s}^{-1},\ \psi = 0\ \text{rad},\ \theta = 0.001\ \text{rad},\ \varphi = 0\ \text{rad},\\ \omega_x = 0\ \text{rad s}^{-1},\ \omega_y = 0\ \text{rad s}^{-1},\ \omega_z = 0\ \text{rad s}^{-1}, \qquad (8.206)$$

we seek the graphical representation of the variations of the variables ξi(t), i = 1, …, 10.
Solution:
1. Theory

Let us consider a rigid solid (Fig. 8.5), the point O of which is constrained to move on the surface Σ. We shall consider
• the three-orthogonal system $O_0XYZ$;
• the three-orthogonal system $Oxyz$ of the principal axes of inertia, relative to the point O of the rigid solid;
• the three-orthogonal system $Ox'y'z'$, having the axes parallel to those of the system $O_0XYZ$.

The following are known:
– the equations of the surface Σ,
$$X = X(\xi_1, \xi_2),\qquad Y = Y(\xi_1, \xi_2),\qquad Z = Z(\xi_1, \xi_2), \qquad (8.207)$$
where ξ1 and ξ2 are two real parameters;
– the mass m and the principal moments of inertia $J_x$, $J_y$, $J_z$ of the rigid solid;
– the resultant $\mathbf F(F_x, F_y, F_z)$ of the given forces and the resultant moment $\mathbf M(M_x, M_y, M_z)$ of the given forces;
– the position vector $\mathbf r_C(x_C, y_C, z_C)$ of the gravity center.

In addition, we shall define the Euler angles

$$\psi = \xi_3,\qquad \theta = \xi_4,\qquad \varphi = \xi_5. \qquad (8.208)$$

We wish to determine
• the motion, that is, the functions of time ξi = ξi(t), i = 1, 2, …, 5;
• the normal reaction N = N(t).

Applying the theorem of momentum in the form of the theorem of motion of the gravity center, we obtain the vector relation

$$m[\mathbf a_O + \boldsymbol\varepsilon\times\mathbf r_C + \boldsymbol\omega\times(\boldsymbol\omega\times\mathbf r_C)] = \mathbf F + \mathbf N. \qquad (8.209)$$

The theorem of the moment of momentum leads to

$$m\mathbf r_C\times\mathbf a_O + \mathbf J\boldsymbol\varepsilon + \boldsymbol\omega\times\mathbf J\boldsymbol\omega = \mathbf M. \qquad (8.210)$$

The passage from the fixed system $O_0XYZ$ to the movable system $Oxyz$, rigidly linked to the rigid solid, is made by the matrix

$$[P] = [\varphi][\theta][\psi] = [\xi_5][\xi_4][\xi_3], \qquad (8.211)$$

where

$$[\varphi] = \begin{bmatrix}\cos\varphi&\sin\varphi&0\\ −\sin\varphi&\cos\varphi&0\\ 0&0&1\end{bmatrix},\quad [\theta] = \begin{bmatrix}1&0&0\\ 0&\cos\theta&\sin\theta\\ 0&−\sin\theta&\cos\theta\end{bmatrix},\quad [\psi] = \begin{bmatrix}\cos\psi&\sin\psi&0\\ −\sin\psi&\cos\psi&0\\ 0&0&1\end{bmatrix}. \qquad (8.212)$$
Making the calculation, we find

$$[P] = \begin{bmatrix} c\xi_3c\xi_5 − s\xi_3c\xi_4s\xi_5 & s\xi_3c\xi_5 + c\xi_3c\xi_4s\xi_5 & s\xi_4s\xi_5\\ −c\xi_3s\xi_5 − s\xi_3c\xi_4c\xi_5 & −s\xi_3s\xi_5 + c\xi_3c\xi_4c\xi_5 & s\xi_4c\xi_5\\ s\xi_3s\xi_4 & −c\xi_3s\xi_4 & c\xi_4 \end{bmatrix}, \qquad (8.213)$$

where the functions cosine and sine are marked by c and s, respectively. We make the following notations:

$$\dot\xi_1 = \xi_6,\quad \dot\xi_2 = \xi_7,\quad \omega_x = \xi_8,\quad \omega_y = \xi_9,\quad \omega_z = \xi_{10}, \qquad (8.214)$$

$$\{r\} = \begin{bmatrix}X\\Y\\Z\end{bmatrix},\quad \{r_1\} = \begin{bmatrix}\partial X/\partial\xi_1\\ \partial Y/\partial\xi_1\\ \partial Z/\partial\xi_1\end{bmatrix},\quad \{r_2\} = \begin{bmatrix}\partial X/\partial\xi_2\\ \partial Y/\partial\xi_2\\ \partial Z/\partial\xi_2\end{bmatrix}, \qquad (8.215)$$

$$\{r_{11}\} = \begin{bmatrix}\partial^2X/\partial\xi_1^2\\ \partial^2Y/\partial\xi_1^2\\ \partial^2Z/\partial\xi_1^2\end{bmatrix},\quad \{r_{12}\} = \begin{bmatrix}\partial^2X/\partial\xi_1\partial\xi_2\\ \partial^2Y/\partial\xi_1\partial\xi_2\\ \partial^2Z/\partial\xi_1\partial\xi_2\end{bmatrix},\quad \{r_{22}\} = \begin{bmatrix}\partial^2X/\partial\xi_2^2\\ \partial^2Y/\partial\xi_2^2\\ \partial^2Z/\partial\xi_2^2\end{bmatrix}, \qquad (8.216)$$

$$\{R_1\} = [P]\{r_1\},\quad \{R_2\} = [P]\{r_2\},\quad \{R_{11}\} = [P]\{r_{11}\},\quad \{R_{12}\} = [P]\{r_{12}\},\quad \{R_{22}\} = [P]\{r_{22}\}, \qquad (8.217)$$

$$\{r_C\} = [x_C\ y_C\ z_C]^T,\quad \{\omega\} = [\xi_8\ \xi_9\ \xi_{10}]^T,\quad \{\varepsilon\} = [\dot\xi_8\ \dot\xi_9\ \dot\xi_{10}]^T, \qquad (8.218)$$

$$[r_C] = \begin{bmatrix}0&−z_C&y_C\\ z_C&0&−x_C\\ −y_C&x_C&0\end{bmatrix},\quad [\omega] = \begin{bmatrix}0&−\xi_{10}&\xi_9\\ \xi_{10}&0&−\xi_8\\ −\xi_9&\xi_8&0\end{bmatrix},\quad [J] = \begin{bmatrix}J_x&0&0\\ 0&J_y&0\\ 0&0&J_z\end{bmatrix}, \qquad (8.219)$$

$$\{a_O\} = [a_{Ox}\ a_{Oy}\ a_{Oz}]^T,\qquad \{A_O\} = [P]\{a_O\}. \qquad (8.220)$$

Considering that $\{r_1\}\perp\mathbf N$, $\{r_2\}\perp\mathbf N$, that $\mathbf a_O$ is expressed in the system $O_0XYZ$, and that ε, $\mathbf r_C$, ω, F are expressed in the system Oxyz, from equation (8.209) there result the matrix relations

$$m\{r_1\}^T\{a_O\} + m\{\varepsilon\}^T[r_C]\{R_1\} = \{R_1\}^T\{F\} − m\{R_1\}^T[\omega]^2\{r_C\},\\ m\{r_2\}^T\{a_O\} + m\{\varepsilon\}^T[r_C]\{R_2\} = \{R_2\}^T\{F\} − m\{R_2\}^T[\omega]^2\{r_C\}, \qquad (8.221)$$

where

$$\{a_O\} = \{r_1\}\ddot\xi_1 + \{r_2\}\ddot\xi_2 + \{r_{11}\}\dot\xi_1^2 + \{r_{22}\}\dot\xi_2^2 + 2\{r_{12}\}\dot\xi_1\dot\xi_2 \qquad (8.222)$$

or

$$\{a_O\} = \{r_1\}\dot\xi_6 + \{r_2\}\dot\xi_7 + \{r_{11}\}\xi_6^2 + \{r_{22}\}\xi_7^2 + 2\{r_{12}\}\xi_6\xi_7. \qquad (8.223)$$

It follows that

$$\{A_O\} = \{R_1\}\dot\xi_6 + \{R_2\}\dot\xi_7 + \{R_{11}\}\xi_6^2 + \{R_{22}\}\xi_7^2 + 2\{R_{12}\}\xi_6\xi_7 \qquad (8.224)$$

as well.
We denote

$$A_{11} = m\{r_1\}^T\{r_1\},\ A_{12} = m\{r_1\}^T\{r_2\},\ A_{13} = m(y_CR_{1z} − z_CR_{1y}),\ A_{14} = m(z_CR_{1x} − x_CR_{1z}),\ A_{15} = m(x_CR_{1y} − y_CR_{1x}),\\ A_{21} = m\{r_2\}^T\{r_1\},\ A_{22} = m\{r_2\}^T\{r_2\},\ A_{23} = m(y_CR_{2z} − z_CR_{2y}),\ A_{24} = m(z_CR_{2x} − x_CR_{2z}),\ A_{25} = m(x_CR_{2y} − y_CR_{2x}), \qquad (8.225)$$

$$B_1 = \{R_1\}^T\{F\} − m\{R_1\}^T[\omega]^2\{r_C\} − m\{r_1\}^T\{r_{11}\}\xi_6^2 − m\{r_1\}^T\{r_{22}\}\xi_7^2 − 2m\{r_1\}^T\{r_{12}\}\xi_6\xi_7,\\ B_2 = \{R_2\}^T\{F\} − m\{R_2\}^T[\omega]^2\{r_C\} − m\{r_2\}^T\{r_{11}\}\xi_6^2 − m\{r_2\}^T\{r_{22}\}\xi_7^2 − 2m\{r_2\}^T\{r_{12}\}\xi_6\xi_7. \qquad (8.226)$$

From equation (8.209) we obtain the equations

$$A_{11}\dot\xi_6 + A_{12}\dot\xi_7 + A_{13}\dot\xi_8 + A_{14}\dot\xi_9 + A_{15}\dot\xi_{10} = B_1,\\ A_{21}\dot\xi_6 + A_{22}\dot\xi_7 + A_{23}\dot\xi_8 + A_{24}\dot\xi_9 + A_{25}\dot\xi_{10} = B_2. \qquad (8.227)$$

In matrix form, relation (8.210) reads

$$m[r_C]\{A_O\} + [J]\{\varepsilon\} + [\omega][J]\{\omega\} = \{M\} \qquad (8.228)$$

or

$$m[r_C]\{R_1\}\dot\xi_6 + m[r_C]\{R_2\}\dot\xi_7 + [J]\{\varepsilon\} = \{M\} − [\omega][J]\{\omega\} − m[r_C]\{R_{11}\}\xi_6^2 − m[r_C]\{R_{22}\}\xi_7^2 − 2m[r_C]\{R_{12}\}\xi_6\xi_7. \qquad (8.229)$$

If we denote

$$B_3 = M_x + (J_y − J_z)\xi_9\xi_{10} − m(y_CR_{11z} − z_CR_{11y})\xi_6^2 − m(y_CR_{22z} − z_CR_{22y})\xi_7^2 − 2m(y_CR_{12z} − z_CR_{12y})\xi_6\xi_7,\\ B_4 = M_y + (J_z − J_x)\xi_{10}\xi_8 − m(z_CR_{11x} − x_CR_{11z})\xi_6^2 − m(z_CR_{22x} − x_CR_{22z})\xi_7^2 − 2m(z_CR_{12x} − x_CR_{12z})\xi_6\xi_7,\\ B_5 = M_z + (J_x − J_y)\xi_8\xi_9 − m(x_CR_{11y} − y_CR_{11x})\xi_6^2 − m(x_CR_{22y} − y_CR_{22x})\xi_7^2 − 2m(x_CR_{12y} − y_CR_{12x})\xi_6\xi_7, \qquad (8.230)$$

then we obtain the system

$$A_{13}\dot\xi_6 + A_{23}\dot\xi_7 + J_x\dot\xi_8 = B_3,\qquad A_{14}\dot\xi_6 + A_{24}\dot\xi_7 + J_y\dot\xi_9 = B_4,\qquad A_{15}\dot\xi_6 + A_{25}\dot\xi_7 + J_z\dot\xi_{10} = B_5. \qquad (8.231)$$

Solving the linear system formed by equations (8.227) and (8.231), it follows that

$$\dot\xi_i = D_i,\qquad i = 6, 7, \ldots, 10. \qquad (8.232)$$

From the known relations

$$\omega_x = \dot\psi\sin\theta\sin\varphi + \dot\theta\cos\varphi,\quad \omega_y = \dot\psi\sin\theta\cos\varphi − \dot\theta\sin\varphi,\quad \omega_z = \dot\psi\cos\theta + \dot\varphi, \qquad (8.233)$$

which form a system of three equations with the unknowns $\dot\psi$, $\dot\theta$, $\dot\varphi$, it follows that

$$\dot\psi = \frac{1}{\sin\theta}(\omega_x\sin\varphi + \omega_y\cos\varphi),\quad \dot\theta = \omega_x\cos\varphi − \omega_y\sin\varphi,\quad \dot\varphi = \omega_z − \frac{\cos\theta}{\sin\theta}(\omega_x\sin\varphi + \omega_y\cos\varphi). \qquad (8.234)$$
With the notations

$$D_1 = \xi_6,\quad D_2 = \xi_7,\quad D_3 = \frac{1}{\sin\xi_4}(\xi_8\sin\xi_5 + \xi_9\cos\xi_5),\quad D_4 = \xi_8\cos\xi_5 − \xi_9\sin\xi_5,\quad D_5 = \xi_{10} − \frac{\cos\xi_4}{\sin\xi_4}(\xi_8\sin\xi_5 + \xi_9\cos\xi_5), \qquad (8.235)$$

we arrive at the system of first-order differential equations

$$\dot\xi_i = D_i,\qquad i = 1, 2, \ldots, 10. \qquad (8.236)$$

To apply the fourth-order Runge–Kutta method, at each step we execute the following calculations (the linear solve involved is sketched after this section):
• the rotation matrix, with relation (8.213);
• {r1}, {r2}, {r11}, {r12}, {r22}, with relations (8.215) and (8.216);
• {R1}, {R2}, {R11}, {R12}, {R22}, with relations (8.217);
• {rC}, {ω}, [rC], [ω], with relations (8.218) and (8.219);
• A11, A12, A13, A14, A15, A21, A22, A23, A24, A25, B1, B2, with relations (8.225) and (8.226);
• B3, B4, B5, with relations (8.230);
• the solution of the linear system formed by equations (8.227) and (8.231), obtaining the parameters Di, i = 6, 7, …, 10;
• Di, i = 1, 2, …, 5, with relations (8.235).

2. Numerical calculation

Proceeding as in the previous application, we get
– the co-ordinates of the gravity center C of the body,
$$x_C = 0,\qquad y_C = 0,\qquad z_C = −\frac{3}{2}l; \qquad (8.237)$$
– the principal moments of inertia,
$$J_x = mz_C^2 + \frac{ml^2}{6} = \frac{29}{12}ml^2,\qquad J_y = mz_C^2 + \frac{ml^2}{6} = \frac{29}{12}ml^2,\qquad J_z = \frac{ml^2}{6}; \qquad (8.238)$$
– the rotation matrix,
$$[P] = \begin{bmatrix} c\xi_3c\xi_5 − s\xi_3c\xi_4s\xi_5 & s\xi_3c\xi_5 + c\xi_3c\xi_4s\xi_5 & s\xi_4s\xi_5\\ −c\xi_3s\xi_5 − s\xi_3c\xi_4c\xi_5 & −s\xi_3s\xi_5 + c\xi_3c\xi_4c\xi_5 & s\xi_4c\xi_5\\ s\xi_3s\xi_4 & −c\xi_3s\xi_4 & c\xi_4 \end{bmatrix}; \qquad (8.239)$$
– the matrix expression of the force F in the system Oxyz,
$$\{F\} = −mg[\sin\xi_4\sin\xi_5\ \ \sin\xi_4\cos\xi_5\ \ \cos\xi_4]^T; \qquad (8.240)$$
– the matrix expression of the moment $\mathbf M_O = \overrightarrow{OC}\times\mathbf F$,
$$\{M_O\} = \frac{3}{2}mgl\sin\xi_4[−\cos\xi_5\ \ \sin\xi_5\ \ 0]^T. \qquad (8.241)$$

Integrating the obtained system of differential equations by the fourth-order Runge–Kutta method, we get the numerical results plotted in the diagrams of Figure 8.7.
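The solve of equations (8.227) and (8.231) at each stage is a plain 5 × 5 linear system; a minimal sketch, assuming the coefficients have already been evaluated by the relations above (the function name is ours):

```python
import numpy as np

def qdot_second_order(A11, A12, A13, A14, A15,
                      A21, A22, A23, A24, A25,
                      Jx, Jy, Jz, B1, B2, B3, B4, B5):
    """Solve the linear system (8.227) + (8.231) for the derivatives
    D6..D10 = (xi6', ..., xi10') of relation (8.232)."""
    A = np.array([
        [A11, A12, A13, A14, A15],   # first equation of (8.227)
        [A21, A22, A23, A24, A25],   # second equation of (8.227)
        [A13, A23, Jx, 0.0, 0.0],    # system (8.231)
        [A14, A24, 0.0, Jy, 0.0],
        [A15, A25, 0.0, 0.0, Jz],
    ])
    b = np.array([B1, B2, B3, B4, B5])
    return np.linalg.solve(A, b)
```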
This problem may be solved by a multibody-type method too, as seen in Problem 8.3; in that case we also have to solve an algebraic-differential system of equations, with the advantage of obtaining the reactions at the same time.

Figure 8.7 Results of the simulation: time histories, in panels (a)–(j), of ξ1, ξ2 [m], ξ3(=ψ), ξ4(=θ), ξ5(=φ) [rad], ξ6(=ξ̇1), ξ7(=ξ̇2) [m s⁻¹], and ξ8(=ωx), ξ9(=ωy), ξ10(=ωz) [rad s⁻¹].

Problem 8.3
We consider the parallelepiped ABCDA′B′C′D′ (Fig. 8.8), of dimensions AD = 2a, AB = 2b, BB′ = 2c and of mass m, with the vertex A situated without friction on the cylindrical surface

$$Z = 1 − X^2. \qquad (8.242)$$

Knowing that the parallelepiped is acted on only by its own weight mg, while the $O_0Z$-axis is vertical, with the initial conditions t = 0, $X_O = X_O^0$, $Y_O = Y_O^0$, $Z_O = Z_O^0$, $\psi = \psi^0$, $\theta = \theta^0$, $\varphi = \varphi^0$, O being the gravity center and ψ, θ, φ being Bryan's angles, let us determine
• the trajectory of the point A;
• the trajectory of the point O;
• the reaction at A.

Numerical application for a = 0.3 m, b = 0.2 m, c = 0.1 m, $X_O^0 = 0.1$ m, $Y_O^0 = 0.2$ m, $Z_O^0 = 0.74$ m, m = 100 kg, $\psi^0 = 0$ rad, $\theta^0 = 0$ rad, $\varphi^0 = 0$ rad, $\dot\psi^0 = 0$ rad s⁻¹, $\dot\theta^0 = 0$ rad s⁻¹, $\dot\varphi^0 = 0$ rad s⁻¹.

Solution:
1. Theory
1.1. Kinematic relations

We consider the frame of reference Oxyz rigidly linked to the parallelepiped, the axes Ox, Oy, Oz being parallel to AD, AB, BB′, respectively, and the frame of reference Ox′y′z′ with the axes Ox′, Oy′, Oz′ parallel to the axes $O_0X$, $O_0Y$, $O_0Z$. From the position Ox′y′z′ we attain the position Oxyz by successive rotations of angles ψ, θ, φ, specified in the schema

$$Ox'y'z' \xrightarrow[\text{angle }\psi]{\text{axis }Ox'} Ox''y''z'' \xrightarrow[\text{angle }\theta]{\text{axis }Oy''} Ox'''y'''z''' \xrightarrow[\text{angle }\varphi]{\text{axis }Oz'''} Oxyz,$$
Figure 8.8 Problem 8.3.

where ψ, θ, φ are Bryan's angles; the partial rotation matrices are

$$[\psi] = \begin{bmatrix}1&0&0\\ 0&\cos\psi&−\sin\psi\\ 0&\sin\psi&\cos\psi\end{bmatrix},\quad [\theta] = \begin{bmatrix}\cos\theta&0&\sin\theta\\ 0&1&0\\ −\sin\theta&0&\cos\theta\end{bmatrix},\quad [\varphi] = \begin{bmatrix}\cos\varphi&−\sin\varphi&0\\ \sin\varphi&\cos\varphi&0\\ 0&0&1\end{bmatrix}, \qquad (8.243)$$

while the rotation matrix of the system Oxyz with respect to the system $O_0XYZ$ is

$$[A] = [\psi][\theta][\varphi]. \qquad (8.244)$$

Associating to the matrices [ψ], [θ], [φ] the antisymmetric matrices

$$[U_\psi] = \begin{bmatrix}0&0&0\\ 0&0&−1\\ 0&1&0\end{bmatrix},\quad [U_\theta] = \begin{bmatrix}0&0&1\\ 0&0&0\\ −1&0&0\end{bmatrix},\quad [U_\varphi] = \begin{bmatrix}0&−1&0\\ 1&0&0\\ 0&0&0\end{bmatrix}, \qquad (8.245)$$

we obtain the derivatives $[\psi_p]$, $[\theta_p]$, $[\varphi_p]$ from the relations

$$[\psi_p] = [U_\psi][\psi] = [\psi][U_\psi],\quad [\theta_p] = [U_\theta][\theta] = [\theta][U_\theta],\quad [\varphi_p] = [U_\varphi][\varphi] = [\varphi][U_\varphi]; \qquad (8.246)$$

thus, the partial derivatives $[A_\psi]$, $[A_\theta]$, $[A_\varphi]$ of the matrix [A] are

$$[A_\psi] = [U_\psi][A],\quad [A_\theta] = [A][\varphi]^T[U_\theta][\varphi],\quad [A_\varphi] = [A][U_\varphi], \qquad (8.247)$$

while the derivative with respect to time of the matrix [A] is

$$[\dot A] = \dot\psi[A_\psi] + \dot\theta[A_\theta] + \dot\varphi[A_\varphi]. \qquad (8.248)$$
The square matrix [ω] of the angular velocity with respect to the frame Oxyz is antisymmetric, and we deduce the relation

$$[\omega] = [A]^T[\dot A] = \begin{bmatrix}0&−\omega_z&\omega_y\\ \omega_z&0&−\omega_x\\ −\omega_y&\omega_x&0\end{bmatrix}, \qquad (8.249)$$

from which it follows that

$$\{\omega\} = [Q]\{\dot\beta\}, \qquad (8.250)$$

where

$$\{\omega\} = [\omega_x\ \omega_y\ \omega_z]^T,\qquad \{\dot\beta\} = [\dot\psi\ \dot\theta\ \dot\varphi]^T, \qquad (8.251)$$

$$[Q] = \begin{bmatrix}\cos\theta\cos\varphi&\sin\varphi&0\\ −\cos\theta\sin\varphi&\cos\varphi&0\\ \sin\theta&0&1\end{bmatrix}. \qquad (8.252)$$

Moreover, we obtain

$$[Q_\theta] = \begin{bmatrix}−\sin\theta\cos\varphi&0&0\\ \sin\theta\sin\varphi&0&0\\ \cos\theta&0&0\end{bmatrix},\quad [Q_\varphi] = \begin{bmatrix}−\cos\theta\sin\varphi&\cos\varphi&0\\ −\cos\theta\cos\varphi&−\sin\varphi&0\\ 0&0&0\end{bmatrix}, \qquad (8.253)$$

$$[\dot Q] = \dot\theta[Q_\theta] + \dot\varphi[Q_\varphi]. \qquad (8.254)$$

1.2. The constraints matrix

In the frame of reference Oxyz, the point A has the co-ordinates a, −b, c; denoting by $X_A$, $Y_A$, $Z_A$ the co-ordinates of the same point in the frame $O_0XYZ$, we obtain the matrix equation

$$\begin{bmatrix}X_A\\ Y_A\\ Z_A\end{bmatrix} = \begin{bmatrix}X_O\\ Y_O\\ Z_O\end{bmatrix} + [A]\begin{bmatrix}a\\ −b\\ c\end{bmatrix} \qquad (8.255)$$

or

$$\{R_A\} = \{R_O\} + [A]\{r_A\}, \qquad (8.256)$$

where

$$\{R_A\} = [X_A\ Y_A\ Z_A]^T,\quad \{R_O\} = [X_O\ Y_O\ Z_O]^T,\quad \{r_A\} = [a\ −b\ c]^T. \qquad (8.257)$$

Writing equation (8.242) in the general form

$$f(X, Y, Z) = 0, \qquad (8.258)$$

we must verify the relation

$$f(X_A, Y_A, Z_A) = 0. \qquad (8.259)$$

Differentiating equation (8.259) with respect to time, it follows that

$$\{f_p\}^T[\dot X_A\ \dot Y_A\ \dot Z_A]^T = 0, \qquad (8.260)$$
where

$$\{f_p\} = \left[\frac{\partial f}{\partial X}\ \ \frac{\partial f}{\partial Y}\ \ \frac{\partial f}{\partial Z}\right]^T. \qquad (8.261)$$

Differentiating relation (8.256) with respect to time and taking into account the successive relations

$$[\dot A]\{r_A\} = [A][\omega]\{r_A\} = [A][r_A]^T\{\omega\}, \qquad (8.262)$$

where

$$[r_A] = \begin{bmatrix}0&−c&−b\\ c&0&−a\\ b&a&0\end{bmatrix}, \qquad (8.263)$$

we obtain

$$\{\dot R_A\} = \{\dot R_O\} + [A][r_A]^T[Q]\{\dot\beta\}; \qquad (8.264)$$

with the notations

$$[B] = \{f_p\}^T\begin{bmatrix}[I]&[A][r_A]^T[Q]\end{bmatrix}, \qquad (8.265)$$
$$\{q\} = [X_O\ Y_O\ Z_O\ \psi\ \theta\ \varphi]^T, \qquad (8.266)$$

equation (8.260) becomes

$$[B]\{\dot q\} = 0, \qquad (8.267)$$

where [B] is the constraints matrix.

1.3. The matrix differential equation of the motion

The kinetic energy T of the rigid solid reads

$$T = \frac{1}{2}m\{\dot R_O\}^T\{\dot R_O\} + \frac{1}{2}\{\omega\}^T[J]\{\omega\}, \qquad (8.268)$$

where [J] is the matrix of the moments of inertia with respect to the axes Ox, Oy, Oz,

$$[J] = \begin{bmatrix}J_{xx}&−J_{xy}&−J_{xz}\\ −J_{yx}&J_{yy}&−J_{yz}\\ −J_{zx}&−J_{zy}&J_{zz}\end{bmatrix}. \qquad (8.269)$$

In the considered case $J_{xy} = J_{xz} = J_{yz} = 0$ and

$$J_{xx} = \frac{m}{3}(b^2 + c^2),\qquad J_{yy} = \frac{m}{3}(a^2 + c^2),\qquad J_{zz} = \frac{m}{3}(a^2 + b^2). \qquad (8.270)$$

Applying Lagrange's equations and using the notations

$$[m] = \begin{bmatrix}m&0&0\\ 0&m&0\\ 0&0&m\end{bmatrix},\quad [M] = \begin{bmatrix}[m]&[0]\\ [0]&[Q]^T[J][Q]\end{bmatrix},\quad \{F\} = [0\ 0\ −mg\ 0\ 0\ 0]^T,\\ [\Delta] = \begin{bmatrix}\{\dot\beta\}^T[Q_\psi]^T[J][Q]\\ \{\dot\beta\}^T[Q_\theta]^T[J][Q]\\ \{\dot\beta\}^T[Q_\varphi]^T[J][Q]\end{bmatrix},\quad \{F_\beta\} = \left[[\dot Q]^T[J][Q] + [Q]^T[J][\dot Q] + [\Delta]\right]\{\dot\beta\},\quad \{\tilde F\} = [0\ 0\ 0\ \{F_\beta\}^T]^T, \qquad (8.271)$$
we obtain the matrix differential equation

$$[M]\{\ddot q\} = \{F\} + \{\tilde F\} + [B]^T\lambda. \qquad (8.272)$$

Equation (8.272), together with equation (8.267) differentiated with respect to time, forms the equation

$$\begin{bmatrix}[M]&−[B]^T\\ [B]&0\end{bmatrix}\begin{Bmatrix}\{\ddot q\}\\ \lambda\end{Bmatrix} = \begin{Bmatrix}\{F\} + \{\tilde F\}\\ −[\dot B]\{\dot q\}\end{Bmatrix}, \qquad (8.273)$$

from which we obtain $\{\ddot q\}$ and λ; then, by the Runge–Kutta method, we get the new values of {q} and $\{\dot q\}$.

2. Numerical calculation

With the initial values we calculate, successively, [ψ], [θ], [φ], [Aψ], [Aθ], [Aφ], [Ȧ], [Q], [Qψ], [Qθ], [Qφ], [Q̇] by relations (8.243)–(8.248) and (8.252)–(8.254), then the co-ordinates $X_A$, $Y_A$, $Z_A$ by relation (8.255) and

$$\{f_p\} = [2X_A\ 0\ 1]^T, \qquad (8.274)$$

as well as the matrix [B] by relation (8.265). Hereafter, from equation (8.264) we obtain $\dot X_A$, $\dot Y_A$, $\dot Z_A$; we may thus calculate

$$\{\dot f_p\} = [2\dot X_A\ 0\ 0]^T \qquad (8.275)$$

and

$$[\dot B] = \{\dot f_p\}^T\begin{bmatrix}[I]&[A][r_A]^T[Q]\end{bmatrix} + \{f_p\}^T\begin{bmatrix}[0]&[\dot A][r_A]^T[Q] + [A][r_A]^T[\dot Q]\end{bmatrix}, \qquad (8.276)$$

and then the matrices [Δ], {Fβ}, $\{\tilde F\}$ by relation (8.271), where [Qψ] = [0]. Finally, from equation (8.273) we calculate $\{\ddot q\}$ and λ; by the Runge–Kutta method we then determine the new values {q}, $\{\dot q\}$, and the iteration process is taken up again. We obtain the diagrams in Figure 8.9. For the reaction, it follows that

$$\{N_A\} = \lambda\{f_p\} = \lambda[2X_A\ 0\ 1]^T, \qquad (8.277)$$

hence

$$N_A = \lambda\sqrt{4X_A^2 + 1}. \qquad (8.278)$$

The graphic is drawn in Figure 8.10.
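The kernel of this multibody algorithm is the solve of the augmented system (8.273) at every stage of the Runge–Kutta method; a minimal sketch, assuming the blocks have been assembled by the relations above (the function name is ours):

```python
import numpy as np

def constrained_accelerations(M, B, F, F_tilde, Bdot, qdot):
    """Solve the augmented linear system (8.273) for the accelerations
    {q"} and the Lagrange multiplier(s). M is n x n, B is m x n (here
    n = 6 and m = 1); F and F_tilde are length-n vectors."""
    m, n = B.shape[0], M.shape[0]
    lhs = np.block([[M, -B.T], [B, np.zeros((m, m))]])
    rhs = np.concatenate([F + F_tilde, -Bdot @ qdot])
    sol = np.linalg.solve(lhs, rhs)
    return sol[:n], sol[n:]   # accelerations, multipliers
```

The same routine also covers the two-constraint case of Problem 8.4 (m = 2) and, later, the torque converter and the toroidal wheel.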
Figure 8.9 Variation diagrams: XO, YO, ZO, XA, YA, ZA as functions of t, together with the trajectory of the point A (panels (a)–(g)).

Problem 8.4
Let ABCDA′B′C′D′ in Figure 8.8 be the parallelepiped discussed in Problem 8.3, where the point A is situated without friction on the curve of equations

$$X^2 + Z − 1 = 0,\qquad X^2 + (Y − 1)^2 − 1 = 0. \qquad (8.279)$$

Assuming the same data as in Problem 8.3 and the initial conditions $X_O^0 = −0.3$ m, $Y_O^0 = 2.2$ m, $Z_O^0 = 0.9$ m, $\psi^0 = 0$ rad, $\theta^0 = 0$ rad, $\varphi^0 = 0$ rad, $\dot\psi^0 = \dot\theta^0 = \dot\varphi^0 = 0$ rad s⁻¹, let us determine
• the trajectory of the point O;
• the reaction at A.
Figure 8.10 The diagram NA = NA(t).

Solution:
1. Theory

In this case, the calculation algorithm remains, in principle, the same as that of the previous problem; the constraints matrix becomes

$$[B] = \begin{bmatrix}\{f_{1p}\}^T\begin{bmatrix}[I]&[A][r_A]^T[Q]\end{bmatrix}\\ \{f_{2p}\}^T\begin{bmatrix}[I]&[A][r_A]^T[Q]\end{bmatrix}\end{bmatrix}, \qquad (8.280)$$

where

$$\{f_{1p}\} = [2X_A\ 0\ 1]^T,\qquad \{f_{2p}\} = [2X_A\ \ 2Y_A − 2\ \ 0]^T, \qquad (8.281)$$
$$\{\dot f_{1p}\} = [2\dot X_A\ 0\ 0]^T,\qquad \{\dot f_{2p}\} = [2\dot X_A\ \ 2\dot Y_A\ \ 0]^T. \qquad (8.282)$$

The calculation algorithm is the following:
– we determine the matrices [ψ], [θ], [φ], [A], [Aψ], [Aθ], [Aφ], [Ȧ], [Q], [Qψ], [Qθ], [Qφ];
– we determine the matrices {RA}, {ṘA},
$$\{R_A\} = \{R_O\} + [A]\{r_A\},\qquad \{\dot R_A\} = \{\dot R_O\} + [A][r_A]^T[Q]\{\dot\beta\}; \qquad (8.283)$$
– we determine the constraints matrix by relation (8.280) and its derivative by the relation
$$[\dot B] = \begin{bmatrix}\{\dot f_{1p}\}^T\begin{bmatrix}[I]&[A][r_A]^T[Q]\end{bmatrix} + \{f_{1p}\}^T\begin{bmatrix}[0]&[\dot A][r_A]^T[Q] + [A][r_A]^T[\dot Q]\end{bmatrix}\\ \{\dot f_{2p}\}^T\begin{bmatrix}[I]&[A][r_A]^T[Q]\end{bmatrix} + \{f_{2p}\}^T\begin{bmatrix}[0]&[\dot A][r_A]^T[Q] + [A][r_A]^T[\dot Q]\end{bmatrix}\end{bmatrix}; \qquad (8.284)$$
– we calculate the matrices [M], $\{\tilde F\}$ by the relations
$$[m] = \begin{bmatrix}m&0&0\\ 0&m&0\\ 0&0&m\end{bmatrix},\quad [M] = \begin{bmatrix}[m]&[0]\\ [0]&[Q]^T[J][Q]\end{bmatrix},\quad [\Delta] = \begin{bmatrix}\{\dot\beta\}^T[Q_\psi]^T[J][Q]\\ \{\dot\beta\}^T[Q_\theta]^T[J][Q]\\ \{\dot\beta\}^T[Q_\varphi]^T[J][Q]\end{bmatrix},\\ \{F_\beta\} = \left[[\dot Q]^T[J][Q] + [Q]^T[J][\dot Q] + [\Delta]\right]\{\dot\beta\},\qquad \{\tilde F\} = [0\ 0\ 0\ \{F_\beta\}^T]^T; \qquad (8.285)$$
– we calculate $\{\ddot q\}$, λ1, λ2 from the equation

$$\begin{bmatrix}[M]&−[B]^T\\ [B]&[0]\end{bmatrix}\begin{Bmatrix}\{\ddot q\}\\ \lambda_1\\ \lambda_2\end{Bmatrix} = \begin{Bmatrix}\{F\} + \{\tilde F\}\\ −[\dot B]\{\dot q\}\end{Bmatrix}, \qquad (8.286)$$

and then the new values of the matrices {q}, $\{\dot q\}$ by means of the Runge–Kutta method. The reaction $N_A$ reads

$$\{N_A\} = \lambda_1\{f_{1p}\} + \lambda_2\{f_{2p}\}, \qquad (8.287)$$
$$N_A = \sqrt{\lambda_1^2\{f_{1p}\}^T\{f_{1p}\} + \lambda_2^2\{f_{2p}\}^T\{f_{2p}\} + 2\lambda_1\lambda_2\{f_{1p}\}^T\{f_{2p}\}}. \qquad (8.288)$$

2. Numerical calculation

We obtain the numerical results plotted in the diagrams in Figure 8.11 and Figure 8.12.

Problem 8.5
Let us consider the system formed by n bodies, hung in a vertical plane and linked to one another in series (Fig. 8.13). Study the motion of this system. As numerical application, consider the system formed by four bodies (Fig. 8.14), for which

n = 4, m1 = 10 kg, m2 = 8 kg, m3 = 50 kg, m4 = 16 kg, l1 = 4 m, l2 = 0.5 m, l3 = 0.5 m, l4 = 0.7 m, r1 = 2 m, r2 = 0.25 m, r3 = 0.25 m, r4 = 0.35 m, J1 = 13.3333 kg m², J2 = 0.1666 kg m², J3 = 1.0416 kg m², J4 = 0.6533 kg m². (8.289)

The initial conditions are (for t = 0)

$$\theta_1^0 = 0\ \text{rad},\ \theta_2^0 = 1\ \text{rad},\ \theta_3^0 = 3.12414\ \text{rad},\ \theta_4^0 = 3.12414\ \text{rad},\\ \dot\theta_1^0 = 0\ \text{rad s}^{-1},\ \dot\theta_2^0 = 0.25\ \text{rad s}^{-1},\ \dot\theta_3^0 = 0\ \text{rad s}^{-1},\ \dot\theta_4^0 = 0\ \text{rad s}^{-1}. \qquad (8.290)$$

Solution:
1. Theory

The following are known:
• the masses of the n bodies, mi, i = 1, …, n;
• the moments of inertia Ji, i = 1, …, n, relative to the gravity centers Ci of the bodies, calculated with respect to an axis perpendicular to the plane of the motion;
• the lengths li, i = 1, …, n, of the bodies, measured from the link point to the previous body to the link point to the next body;
• the distances ri, i = 1, …, n, from the link point to the previous body to the gravity center.

We are required to
• establish the equations of motion of the bodies;
• integrate these equations numerically.

To establish the equations of motion, we use the second-order Lagrange equations, which, in the general case of holonomic constraints and assuming that the forces derive from a function of force, read

$$\frac{d}{dt}\frac{\partial T}{\partial\dot q_i} − \frac{\partial T}{\partial q_i} + \frac{\partial V}{\partial q_i} = 0, \qquad (8.291)$$

where T denotes the kinetic energy of the system, V represents the potential energy, and qi, i = 1, …, n, is a generalized co-ordinate of the system.
Figure 8.11 Variation diagrams: XO, YO, ZO, XA, YA, ZA as functions of t, together with the trajectory of the point A (panels (a)–(f)).
Figure 8.12 The diagram NA = NA(t).
Figure 8.13 Problem 8.5.
Figure 8.14 Numerical application.

In this case, the kinetic energy is given by the relation

$$T = \sum_{i=1}^n T_i, \qquad (8.292)$$
where Ti, i = 1, 2, …, n, are the kinetic energies of the component bodies of the system. These read

$$T_i = \frac{1}{2}m_iv_{C_i}^2 + \frac{1}{2}J_i\dot\theta_i^2, \qquad (8.293)$$

where $v_{C_i}$ is the velocity of the gravity center of the body i, given by the relation

$$v_{C_i}^2 = \dot x_{C_i}^2 + \dot y_{C_i}^2. \qquad (8.294)$$

We obtain

$$T = \frac{1}{2}\sum_{i=1}^n\left\{m_i\left[\sum_{j=1}^{i-1}l_j^2\dot\theta_j^2 + r_i^2\dot\theta_i^2 + 2\sum_{j=1}^{i-2}\sum_{k=j+1}^{i-1}l_jl_k\dot\theta_j\dot\theta_k\cos(\theta_k − \theta_j) + 2\sum_{j=1}^{i-1}l_jr_i\dot\theta_j\dot\theta_i\cos(\theta_i − \theta_j)\right] + J_i\dot\theta_i^2\right\}. \qquad (8.295)$$

Taking into account that the only forces that act are the weights of the bodies, the potential energy of the system takes the form

$$V = −m_1gr_1\cos\theta_1 − m_2g(l_1\cos\theta_1 + r_2\cos\theta_2) − m_3g(l_1\cos\theta_1 + l_2\cos\theta_2 + r_3\cos\theta_3) − \cdots\\ − m_ng(l_1\cos\theta_1 + l_2\cos\theta_2 + l_3\cos\theta_3 + \cdots + l_{n-1}\cos\theta_{n-1} + r_n\cos\theta_n). \qquad (8.296)$$

With the notations

$$J_{ii} = J_i + m_ir_i^2 + \left(\sum_{j=i+1}^n m_j\right)l_i^2, \qquad (8.297)$$
$$a_i = m_ir_i + \left(\sum_{j=i+1}^n m_j\right)l_i, \qquad (8.298)$$
$$[J] = \begin{bmatrix}J_{11}&a_2l_1\cos(\theta_1 − \theta_2)&\cdots&a_nl_1\cos(\theta_1 − \theta_n)\\ a_2l_1\cos(\theta_1 − \theta_2)&J_{22}&\cdots&a_nl_2\cos(\theta_2 − \theta_n)\\ \cdots&\cdots&\cdots&\cdots\\ a_nl_1\cos(\theta_1 − \theta_n)&a_nl_2\cos(\theta_2 − \theta_n)&\cdots&J_{nn}\end{bmatrix}, \qquad (8.299)$$
$$[A] = \begin{bmatrix}0&a_2l_1\sin(\theta_1 − \theta_2)&\cdots&a_nl_1\sin(\theta_1 − \theta_n)\\ −a_2l_1\sin(\theta_1 − \theta_2)&0&\cdots&a_nl_2\sin(\theta_2 − \theta_n)\\ \cdots&\cdots&\cdots&\cdots\\ −a_nl_1\sin(\theta_1 − \theta_n)&−a_nl_2\sin(\theta_2 − \theta_n)&\cdots&0\end{bmatrix}, \qquad (8.300)$$
$$[K] = \operatorname{diag}(ga_1, ga_2, \ldots, ga_n), \qquad (8.301)$$
$$\{\theta\} = [\theta_1\ \cdots\ \theta_n]^T,\quad \{\ddot\theta\} = [\ddot\theta_1\ \cdots\ \ddot\theta_n]^T,\quad \{\dot\theta^2\} = [\dot\theta_1^2\ \cdots\ \dot\theta_n^2]^T,\quad \{\sin\theta\} = [\sin\theta_1\ \cdots\ \sin\theta_n]^T, \qquad (8.302)$$
where the elements of the matrices [J], [A], and [K] are given by the formulae

$$J_{pq} = \begin{cases}J_{pp}&\text{for } p = q,\\ a_ql_p\cos(\theta_p − \theta_q)&\text{for } p < q,\\ J_{qp}&\text{for } p > q,\end{cases} \qquad (8.303)$$
$$A_{pq} = \begin{cases}0&\text{for } p = q,\\ a_ql_p\sin(\theta_p − \theta_q)&\text{for } p < q,\\ −A_{qp}&\text{for } p > q,\end{cases} \qquad (8.304)$$
$$K_{pq} = \begin{cases}ga_p&\text{for } p = q,\\ 0&\text{for } p \ne q,\end{cases} \qquad (8.305)$$

respectively, and the system of equations of motion reads

$$[J]\{\ddot\theta\} + [A]\{\dot\theta^2\} + [K]\{\sin\theta\} = \{0\}. \qquad (8.306)$$

Relation (8.306) can be written in the form

$$\{\ddot\theta\} = −[J]^{-1}[A]\{\dot\theta^2\} − [J]^{-1}[K]\{\sin\theta\}. \qquad (8.307)$$

With the notations

$$\theta_1 = \xi_1,\ \ldots,\ \theta_n = \xi_n,\qquad \dot\theta_1 = \xi_{n+1},\ \ldots,\ \dot\theta_n = \xi_{2n}, \qquad (8.308)$$
$$[B] = [J]^{-1}[A], \qquad (8.309)$$
$$[L] = [J]^{-1}[K], \qquad (8.310)$$

we obtain the system

$$\frac{d\xi_i}{dt} = \begin{cases}\xi_{n+i}&\text{for } i \le n,\\ −\displaystyle\sum_{j=1}^n B_{i-n,j}\xi_{n+j}^2 − \sum_{j=1}^n L_{i-n,j}\sin\xi_j&\text{for } i > n.\end{cases} \qquad (8.311)$$

2. Numerical calculation

In the case of the numerical application, we obtain, with the aid of the fourth-order Runge–Kutta method, the numerical results plotted in the diagrams in Figure 8.15 (a sketch of the right-hand side of system (8.311) is given below).

Figure 8.15 Results of the simulation: θ1, θ2, θ3, θ4 as functions of t (panels (a)–(d)).
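A minimal sketch of the right-hand side of system (8.311), assembling [J], [A], [K] from relations (8.297)–(8.305); the function name and the value g = 9.81 m s⁻² are our assumptions, and the stepper rk4_step given earlier can be used to advance the state:

```python
import numpy as np

g = 9.81  # m s^-2, assumed value of the gravitational acceleration

def pendulum_rhs(xi, m, l, r, J):
    """Right-hand side of system (8.311) for the n-body chain of
    Problem 8.5; m, l, r, J are length-n arrays, e.g. the data (8.289)."""
    n = len(m)
    theta, theta_dot = xi[:n], xi[n:]
    tail = np.array([m[i + 1:].sum() for i in range(n)])  # sum_{j>i} m_j
    a = m * r + tail * l                                  # (8.298)
    Jii = J + m * r**2 + tail * l**2                      # (8.297)
    Jmat = np.empty((n, n))
    Amat = np.empty((n, n))
    for p in range(n):
        for q in range(n):
            if p == q:
                Jmat[p, q], Amat[p, q] = Jii[p], 0.0
            elif p < q:
                Jmat[p, q] = a[q] * l[p] * np.cos(theta[p] - theta[q])  # (8.303)
                Amat[p, q] = a[q] * l[p] * np.sin(theta[p] - theta[q])  # (8.304)
            else:
                Jmat[p, q] = Jmat[q, p]
                Amat[p, q] = -Amat[q, p]
    K = np.diag(g * a)                                    # (8.305)
    theta_ddot = -np.linalg.solve(Jmat, Amat @ theta_dot**2 + K @ np.sin(theta))  # (8.307)
    return np.concatenate([theta_dot, theta_ddot])
```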
Problem 8.6
Let the kinematic schema in Figure 8.16 be that of a torque converter of G. Constantinescu (after George "Gogu" Constantinescu (1881–1965), who created the theory of sonics in A Treatise of Transmission of Power by Vibrations, 1918; this torque converter is one of his inventions). It is composed of the principal axle 1, the floating lever 2, the connection bars 3, 3′, and the bars 4, 4′. The principal axle is articulated to the floating lever at the point A, the lever driving the connection bars through the multiple articulation B. The connection bars drive the bars 4, 4′ through the articulations D, D′. The bars 4, 4′ are hinged at the fixed point E. Thus, the rotation of the principal axle 1 is transformed into an oscillatory plane-parallel motion of the lever 2, and this, by means of a coupling system, is transformed into a rotation, always in the same sense, of the secondary axle 5. In Figure 8.16, the simplest coupling system, formed by the ratchet wheel 5 and the ratchets 6, 6′, has been chosen.

Figure 8.16 Torque converter of G. Constantinescu.

The following are known:
– the distances OA = a, AB = b, AC = l, BD = BD′ = c, ED = ED′ = d, and xE, yE;
– the moment of inertia J1 and the mass m;
– the motor torque
$$M_1 = M_0 − k\omega_1^2; \qquad (8.312)$$
– the resistant torque
$$M_5 = \begin{cases}\bar M_5&\text{if } \dot\psi \ge 0,\\ −\bar M_5&\text{if } \dot\psi < 0;\end{cases} \qquad (8.313)$$
– the initial conditions (which have to be consistent with the position and velocity constraint equations; see below): t = 0, φ = φ0, θ = θ0, γ = γ0, ψ = ψ0, $\dot\varphi = \dot\theta = \dot\gamma = \dot\psi = 0$.

It is required to determine and represent graphically
• ω1(t) and ω5(t) = |ψ̇|;
• the trajectory of the point B.

Numerical application: l = 0.3 m, a = 0.015 m, b = 0.15 m, c = 0.25 m, $d = \sqrt{b^2 − a^2}$, $x_E = \sqrt{c^2 − d^2}$, $y_E = d$, m = 3 kg, J1 = 0.1 kg m², M0 = 3.2 N m, k = 2 × 10⁻⁵ N m s², $\bar M_5$ = 20 N m, φ0 = −π/2, θ0 = arctan(a/d), γ0 = arctan(d/√(c² − d²)), ψ0 = 0 rad, $\dot\varphi = \dot\theta = \dot\gamma = \dot\psi = 0$ rad s⁻¹.

Solution:
1. Theory

The chosen mechanical model is one in which the bodies 3, 3′, 4, 4′ have no mass, while the one-directional system formed by these bars leads (approximately) to a symmetry of the motion of the bars 4, 4′. Under these conditions, we study the motion of the mechanism with two degrees of freedom formed by the elements 1, 2, 3, 4, 5, the bar 4 being acted on by the torque M5 given by relation (8.313). We obtain the equations of constraints

$$a\sin\varphi + b\sin\theta + c\cos\gamma − d\sin\psi = X_E,\qquad −a\cos\varphi + b\cos\theta − c\sin\gamma + d\cos\psi = Y_E; \qquad (8.314)$$

by differentiation with respect to time, denoting by [B] the matrix of constraints,

$$[B] = \begin{bmatrix}a\cos\varphi&b\cos\theta&−c\sin\gamma&−d\cos\psi\\ a\sin\varphi&−b\sin\theta&−c\cos\gamma&−d\sin\psi\end{bmatrix}, \qquad (8.315)$$

and by {q} the column matrix of the generalized co-ordinates,

$$\{q\} = [\varphi\ \theta\ \gamma\ \psi]^T, \qquad (8.316)$$

we obtain the equation of constraints

$$[B]\{\dot q\} = \{0\}. \qquad (8.317)$$

The kinetic energy T of the system reads

$$T = \frac{1}{2}\left[J_1\dot\varphi^2 + m(\dot X_C^2 + \dot Y_C^2)\right] \qquad (8.318)$$

or

$$T = \frac{1}{2}\left[(J_1 + ma^2)\dot\varphi^2 + ml^2\dot\theta^2 + 2mal\dot\varphi\dot\theta\cos(\varphi + \theta)\right]. \qquad (8.319)$$

Using Lagrange's equations, we write successively the relations

$$\frac{d}{dt}\frac{\partial T}{\partial\dot\varphi} = (J_1 + ma^2)\ddot\varphi + mal\ddot\theta\cos(\varphi + \theta) − mal\dot\theta(\dot\varphi + \dot\theta)\sin(\varphi + \theta), \qquad (8.320)$$
$$\frac{\partial T}{\partial\varphi} = −mal\dot\varphi\dot\theta\sin(\varphi + \theta), \qquad (8.321)$$
$$\frac{d}{dt}\frac{\partial T}{\partial\dot\theta} = ml^2\ddot\theta + mal\ddot\varphi\cos(\varphi + \theta) − mal\dot\varphi(\dot\varphi + \dot\theta)\sin(\varphi + \theta), \qquad (8.322)$$
$$\frac{\partial T}{\partial\theta} = −mal\dot\varphi\dot\theta\sin(\varphi + \theta), \qquad (8.323)$$
$$\frac{d}{dt}\frac{\partial T}{\partial\dot\gamma} = \frac{d}{dt}\frac{\partial T}{\partial\dot\psi} = 0,\qquad \frac{\partial T}{\partial\gamma} = \frac{\partial T}{\partial\psi} = 0; \qquad (8.324)$$

because the generalized forces are

$$Q_\varphi = M_1 + mga\sin\varphi,\qquad Q_\theta = −mgl\sin\theta,\qquad Q_\gamma = 0,\qquad Q_\psi = −M_5, \qquad (8.325)$$

Lagrange's equations, which are of the form

$$\frac{d}{dt}\frac{\partial T}{\partial\dot q_k} − \frac{\partial T}{\partial q_k} = Q_k + B_{1k}\lambda_1 + B_{2k}\lambda_2, \qquad (8.326)$$

$B_{1k}$, $B_{2k}$ being the elements of the matrix [B] and λ1, λ2 being Lagrange's multipliers, are written in the matrix form

$$[M]\{\ddot q\} = \{F\} + \{\tilde F\} + [B]^T\{\lambda\}, \qquad (8.327)$$

where

$$[M] = \begin{bmatrix}J_1 + ma^2&mal\cos(\varphi + \theta)&0&0\\ mal\cos(\varphi + \theta)&ml^2&0&0\\ 0&0&0&0\\ 0&0&0&0\end{bmatrix}, \qquad (8.328)$$
$$\{F\} = [Q_\varphi\ Q_\theta\ 0\ Q_\psi]^T, \qquad (8.329)$$
$$\{\tilde F\} = mal\,[\dot\theta^2\ \ \dot\varphi^2\ \ 0\ \ 0]^T\sin(\varphi + \theta), \qquad (8.330)$$
$$\{\lambda\} = [\lambda_1\ \lambda_2]^T. \qquad (8.331)$$

If to the differential equation (8.327) we add equation (8.317), differentiated with respect to time, we obtain the matrix differential equation

$$\begin{bmatrix}[M]&−[B]^T\\ [B]&[0]\end{bmatrix}\begin{Bmatrix}\{\ddot q\}\\ \{\lambda\}\end{Bmatrix} = \begin{Bmatrix}\{F\} + \{\tilde F\}\\ −[\dot B]\{\dot q\}\end{Bmatrix}, \qquad (8.332)$$

where

$$[\dot B] = \begin{bmatrix}−a\dot\varphi\sin\varphi&−b\dot\theta\sin\theta&−c\dot\gamma\cos\gamma&d\dot\psi\sin\psi\\ a\dot\varphi\cos\varphi&−b\dot\theta\cos\theta&c\dot\gamma\sin\gamma&−d\dot\psi\cos\psi\end{bmatrix}. \qquad (8.333)$$

For the given initial conditions, from equation (8.332) we determine the matrices $\{\ddot q\}$, {λ}; then, by the Runge–Kutta numerical method, we determine the new values of the matrices {q}, $\{\dot q\}$, which become the initial conditions for the next integration step. This problem is a typical one in the class of problems requiring drift control and constraint stabilization.
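A minimal assembly-and-solve sketch for one evaluation of equation (8.332), under the relations as reconstructed above (the function name, the parameter dictionary, and the use of g are our assumptions):

```python
import numpy as np

def converter_rhs(q, qdot, p):
    """Assemble and solve (8.332) for the Constantinescu converter.
    q = (phi, theta, gamma, psi); p holds a, b, c, d, l, m, J1, M0, k,
    M5bar, g, e.g. the data of the numerical application."""
    phi, theta, gamma, psi = q
    a, b, c, d, l = p["a"], p["b"], p["c"], p["d"], p["l"]
    m, J1, g = p["m"], p["J1"], p["g"]
    M1 = p["M0"] - p["k"] * qdot[0] ** 2                    # (8.312)
    M5 = p["M5bar"] if qdot[3] >= 0 else -p["M5bar"]        # (8.313)
    M = np.zeros((4, 4))                                    # (8.328)
    M[0, 0] = J1 + m * a**2
    M[0, 1] = M[1, 0] = m * a * l * np.cos(phi + theta)
    M[1, 1] = m * l**2
    F = np.array([M1 + m * g * a * np.sin(phi),
                  -m * g * l * np.sin(theta), 0.0, -M5])    # (8.325), (8.329)
    Ft = m * a * l * np.sin(phi + theta) * np.array(
        [qdot[1] ** 2, qdot[0] ** 2, 0.0, 0.0])             # (8.330)
    B = np.array([[a*np.cos(phi),  b*np.cos(theta), -c*np.sin(gamma), -d*np.cos(psi)],
                  [a*np.sin(phi), -b*np.sin(theta), -c*np.cos(gamma), -d*np.sin(psi)]])  # (8.315)
    Bdot = np.array([[-a*qdot[0]*np.sin(phi), -b*qdot[1]*np.sin(theta),
                      -c*qdot[2]*np.cos(gamma),  d*qdot[3]*np.sin(psi)],
                     [ a*qdot[0]*np.cos(phi), -b*qdot[1]*np.cos(theta),
                       c*qdot[2]*np.sin(gamma), -d*qdot[3]*np.cos(psi)]])  # (8.333)
    lhs = np.block([[M, -B.T], [B, np.zeros((2, 2))]])
    rhs = np.concatenate([F + Ft, -Bdot @ qdot])
    sol = np.linalg.solve(lhs, rhs)
    return sol[:4], sol[4:]   # accelerations, multipliers
```

Note that [M] is singular on its own (the bars γ and ψ carry no mass), yet the augmented matrix of (8.332) is generically invertible because the constraint rows complete it.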
Figure 8.17 Variation of ω1 = ω1(t).
Figure 8.18 Variation of ω5 = ω5(t).

2. Numerical calculation

On the basis of the calculation algorithm constructed by means of relations (8.312), (8.313), (8.315), (8.316), (8.323), (8.329), (8.330), (8.331), (8.333), and (8.332), as well as of the relations

$$X_B = a\sin\varphi + b\sin\theta,\qquad Y_B = −a\cos\varphi + b\cos\theta, \qquad (8.334)$$

the results plotted in the diagrams in Figure 8.17, Figure 8.18, and Figure 8.19 have been obtained.

Problem 8.7
We consider the toroidal wheel of radius r0 and balloon radius r which, under the influence of the weight mg, rolls without sliding on a horizontal plane. Knowing that, at the initial moment, the wheel axis is inclined by the angle θ0 with respect to the vertical and that the angular velocity is parallel to the rotation axis of the wheel and has the value ω0, let us determine
Figure 8.19 Variation of YB = YB(XB).

• the variation in time of the inclination angle of the wheel axis with respect to the vertical;
• the trajectory of the wheel–plane contact point;
• the variation in time of the wheel–plane contact forces.

Numerical application: r0 = 0.3 m, r = 0.05 m, m = 20 kg, Jx = Jy = 0.9 kg m², Jz = 1.8 kg m², θ0 = 5π/12 rad.

Solution:
1. Theory
1.1. Equations of the torus

We consider the circle of radius r situated in the plane Oy′z′ (Fig. 8.20), its center C being chosen so that $y'_C = −r_0$. The Oy′z′-plane is obtained by rotating the Oyz-plane through the angle η around the Oz-axis. With the notations of Figure 8.20, the co-ordinates of a point of the circle in the system Ox′y′z′ are

$$x' = 0,\qquad y' = −(r_0 + r\cos\xi),\qquad z' = r\sin\xi. \qquad (8.335)$$

By rotating the circle we obtain the torus, whose parametric equations are obtained in the Oxyz-frame from the relation

$$\begin{bmatrix}x\\y\\z\end{bmatrix} = \begin{bmatrix}\cos\eta&−\sin\eta&0\\ \sin\eta&\cos\eta&0\\ 0&0&1\end{bmatrix}\begin{bmatrix}0\\ −(r_0 + r\cos\xi)\\ r\sin\xi\end{bmatrix}; \qquad (8.336)$$

it follows that

$$x = (r_0 + r\cos\xi)\sin\eta,\qquad y = −(r_0 + r\cos\xi)\cos\eta,\qquad z = r\sin\xi. \qquad (8.337)$$
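A direct transcription of the parametrization (8.337) and of its partial derivatives, used below in the tangency conditions (8.340)–(8.341); the function names are ours:

```python
import numpy as np

def torus_point(xi, eta, r0, r):
    """Parametric point of the torus, relation (8.337)."""
    return np.array([(r0 + r * np.cos(xi)) * np.sin(eta),
                     -(r0 + r * np.cos(xi)) * np.cos(eta),
                     r * np.sin(xi)])

def torus_tangents(xi, eta, r0, r):
    """Partial derivatives {r_xi}, {r_eta} of relation (8.340)."""
    r_xi = np.array([-r * np.sin(xi) * np.sin(eta),
                     r * np.sin(xi) * np.cos(eta),
                     r * np.cos(xi)])
    r_eta = np.array([(r0 + r * np.cos(xi)) * np.cos(eta),
                      (r0 + r * np.cos(xi)) * np.sin(eta),
                      0.0])
    return r_xi, r_eta
```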
Figure 8.20 Equations of the torus.
Figure 8.21 Conditions of tangency of the torus with the plane.

1.2. Conditions of tangency of the torus with the plane

We take as rolling plane the horizontal $O_0XY$-plane (Fig. 8.21) and choose as rotation angles Euler's angles ψ, θ, φ, to which correspond the partial rotation matrices

$$[\psi] = \begin{bmatrix}\cos\psi&−\sin\psi&0\\ \sin\psi&\cos\psi&0\\ 0&0&1\end{bmatrix},\quad [\theta] = \begin{bmatrix}1&0&0\\ 0&\cos\theta&−\sin\theta\\ 0&\sin\theta&\cos\theta\end{bmatrix},\quad [\varphi] = \begin{bmatrix}\cos\varphi&−\sin\varphi&0\\ \sin\varphi&\cos\varphi&0\\ 0&0&1\end{bmatrix}, \qquad (8.338)$$

and the rotation matrix [A] of the frame Oxyz with respect to the frame $O_0XYZ$,

$$[A] = [\psi][\theta][\varphi]. \qquad (8.339)$$

Denoting by {r}, {rξ}, {rη} the matrices

$$\{r\} = \begin{bmatrix}(r_0 + r\cos\xi)\sin\eta\\ −(r_0 + r\cos\xi)\cos\eta\\ r\sin\xi\end{bmatrix},\quad \{r_\xi\} = \begin{bmatrix}−r\sin\xi\sin\eta\\ r\sin\xi\cos\eta\\ r\cos\xi\end{bmatrix},\quad \{r_\eta\} = \begin{bmatrix}(r_0 + r\cos\xi)\cos\eta\\ (r_0 + r\cos\xi)\sin\eta\\ 0\end{bmatrix}, \qquad (8.340)$$

the tangency conditions at the point M are written in the form

$$[0\ 0\ 1][A]\{r_\xi\} = 0,\qquad [0\ 0\ 1][A]\{r_\eta\} = 0; \qquad (8.341)$$
hence, we obtain the equations

$$\sin\theta\sin(\varphi + \eta) = 0,\qquad \sin\theta\sin\xi\cos(\varphi + \eta) + \cos\theta\cos\xi = 0, \qquad (8.342)$$

from which it follows that

$$\eta = −\varphi,\qquad \xi = \theta − \frac{\pi}{2}. \qquad (8.343)$$

1.3. Initial conditions

If we choose the frame of reference $O_0XYZ$ so that the contact point at the initial moment is $O_0$, the Ox-axis is parallel to the $O_0Y$-axis, and the Oz-axis is normal to the $O_0Y$-axis, then, at the initial moment, the conditions

$$\psi = \frac{\pi}{2},\quad \theta = \theta_0,\quad \varphi = 0,\quad [A] = \begin{bmatrix}0&−1&0\\ 1&0&0\\ 0&0&1\end{bmatrix}\begin{bmatrix}1&0&0\\ 0&\cos\theta_0&−\sin\theta_0\\ 0&\sin\theta_0&\cos\theta_0\end{bmatrix},\quad \{r\} = \begin{bmatrix}0\\ −(r_0 + r\sin\theta_0)\\ −r\cos\theta_0\end{bmatrix} \qquad (8.344)$$

are fulfilled; also, from the contact equation at $O_0$,

$$\begin{bmatrix}0\\0\\0\end{bmatrix} = \begin{bmatrix}X_O\\Y_O\\Z_O\end{bmatrix} + [A]\{r\}, \qquad (8.345)$$

we obtain the initial conditions

$$X_O = −r_0\cos\theta_0,\qquad Y_O = 0,\qquad Z_O = r_0\sin\theta_0 + r. \qquad (8.346)$$

From the conditions specified in the enunciation, it also follows that at the initial moment

$$\dot\psi = \dot\theta = 0,\qquad \dot\varphi = \omega_0, \qquad (8.347)$$

while, from the condition of rolling without sliding, we get

$$\begin{bmatrix}\dot X_O\\ \dot Y_O\\ \dot Z_O\end{bmatrix} + [A][r]^T[Q]\begin{bmatrix}\dot\psi\\ \dot\theta\\ \dot\varphi\end{bmatrix} = \{0\}; \qquad (8.348)$$

knowing that

$$[r] = \begin{bmatrix}0&r\cos\theta_0&−(r_0 + r\sin\theta_0)\\ −r\cos\theta_0&0&0\\ r_0 + r\sin\theta_0&0&0\end{bmatrix},\quad [Q] = \begin{bmatrix}\sin\varphi\sin\theta&\cos\varphi&0\\ \cos\varphi\sin\theta&−\sin\varphi&0\\ \cos\theta&0&1\end{bmatrix} = \begin{bmatrix}0&1&0\\ \sin\theta_0&0&0\\ \cos\theta_0&0&1\end{bmatrix}, \qquad (8.349)$$

we obtain the initial conditions

$$\dot X_O = \dot Y_O = 0,\qquad \dot Z_O = −(r_0 + r\sin\theta_0)\omega_0. \qquad (8.350)$$
1.4. The constraints matrix

Taking into account relation (8.343), from the last relation (8.340) we get

$$\{r\} = \begin{bmatrix}−(r_0 + r\sin\theta)\sin\varphi\\ −(r_0 + r\sin\theta)\cos\varphi\\ −r\cos\theta\end{bmatrix}; \qquad (8.351)$$

with the notations

$$[r] = \begin{bmatrix}0&r\cos\theta&−(r_0 + r\sin\theta)\cos\varphi\\ −r\cos\theta&0&(r_0 + r\sin\theta)\sin\varphi\\ (r_0 + r\sin\theta)\cos\varphi&−(r_0 + r\sin\theta)\sin\varphi&0\end{bmatrix}, \qquad (8.352)$$

from equation (8.348) we obtain the constraints matrix

$$[B] = \begin{bmatrix}[I]&[A][r]^T[Q]\end{bmatrix}. \qquad (8.353)$$

The derivative with respect to time of the constraints matrix is

$$[\dot B] = \begin{bmatrix}[0]&[\dot A][r]^T[Q] + [A][\dot r]^T[Q] + [A][r]^T[\dot Q]\end{bmatrix}, \qquad (8.354)$$

where

$$[\dot r] = \begin{bmatrix}0&−\dot z&\dot y\\ \dot z&0&−\dot x\\ −\dot y&\dot x&0\end{bmatrix}, \qquad (8.355)$$

$$\dot x = −r\dot\theta\cos\theta\sin\varphi − \dot\varphi(r_0 + r\sin\theta)\cos\varphi,\quad \dot y = −r\dot\theta\cos\theta\cos\varphi + \dot\varphi(r_0 + r\sin\theta)\sin\varphi,\quad \dot z = r\dot\theta\sin\theta. \qquad (8.356)$$

2. Numerical calculation

As has been shown in Problem 8.6, the matrix differential equation of the motion is

$$\begin{bmatrix}[M]&−[B]^T\\ [B]&[0]\end{bmatrix}\begin{Bmatrix}\{\ddot q\}\\ \{\lambda\}\end{Bmatrix} = \begin{Bmatrix}\{F\} + \{\tilde F\}\\ −[\dot B]\{\dot q\}\end{Bmatrix}, \qquad (8.357)$$

where

$$[m] = \operatorname{diag}(m, m, m),\qquad [J] = \operatorname{diag}(J_x, J_y, J_z), \qquad (8.358)$$
$$[Q] = \begin{bmatrix}\sin\varphi\sin\theta&\cos\varphi&0\\ \cos\varphi\sin\theta&−\sin\varphi&0\\ \cos\theta&0&1\end{bmatrix}, \qquad (8.359)$$
$$[M] = \begin{bmatrix}[m]&[0]\\ [0]&[Q]^T[J][Q]\end{bmatrix}, \qquad (8.360)$$
$$\{q\} = [X_O\ Y_O\ Z_O\ \psi\ \theta\ \varphi]^T,\qquad \{\lambda\} = [\lambda_1\ \lambda_2\ \lambda_3]^T, \qquad (8.361)$$
$$\{F\} = [0\ 0\ −mg\ 0\ 0\ 0]^T, \qquad (8.362)$$
$$\{\beta\} = [\psi\ \theta\ \varphi]^T, \qquad (8.363)$$
$$[U_\psi] = [U_\varphi] = \begin{bmatrix}0&−1&0\\ 1&0&0\\ 0&0&0\end{bmatrix},\qquad [U_\theta] = \begin{bmatrix}0&0&0\\ 0&0&−1\\ 0&1&0\end{bmatrix}, \qquad (8.364)$$
$$[Q_\psi] = [0],\quad [Q_\theta] = \begin{bmatrix}\sin\varphi\cos\theta&0&0\\ \cos\varphi\cos\theta&0&0\\ −\sin\theta&0&0\end{bmatrix},\quad [Q_\varphi] = \begin{bmatrix}\cos\varphi\sin\theta&−\sin\varphi&0\\ −\sin\varphi\sin\theta&−\cos\varphi&0\\ 0&0&0\end{bmatrix}, \qquad (8.365)$$
$$[\Delta] = \begin{bmatrix}\{\dot\beta\}^T[Q_\psi]^T[J][Q]\\ \{\dot\beta\}^T[Q_\theta]^T[J][Q]\\ \{\dot\beta\}^T[Q_\varphi]^T[J][Q]\end{bmatrix}, \qquad (8.366)$$
$$[\dot Q] = \dot\theta[Q_\theta] + \dot\varphi[Q_\varphi],\quad [A_\psi] = [U_\psi][A],\quad [A_\theta] = [A][\varphi]^T[U_\theta][\varphi],\quad [A_\varphi] = [A][U_\varphi], \qquad (8.367)$$
$$\{F_\beta\} = \left[[\dot Q]^T[J][Q] + [Q]^T[J][\dot Q] + [\Delta]\right]\{\dot\beta\},\qquad \{\tilde F\} = [0\ 0\ 0\ \{F_\beta\}^T]^T. \qquad (8.368)$$

By solving equation (8.357), we determine the functions $X_O(t)$, $Y_O(t)$, $Z_O(t)$, ψ(t), θ(t), φ(t). The variation of the inclination angle θ is given in Figure 8.22. The trajectory of the contact point is obtained by means of the co-ordinates X, Y, Z = 0, which result from the relation

$$\begin{bmatrix}X\\Y\\Z\end{bmatrix} = \begin{bmatrix}X_O\\Y_O\\Z_O\end{bmatrix} + [A]\{r\}; \qquad (8.369)$$

the resulting trajectory is shown in Figure 8.23. The reaction of contact has the components along the axes $O_0X$, $O_0Y$, $O_0Z$

$$R_X = \lambda_1,\qquad R_Y = \lambda_2,\qquad R_Z = \lambda_3; \qquad (8.370)$$

thus, the force tangent to the wheel is

$$F_t = \frac{\dot X\lambda_1 + \dot Y\lambda_2}{\sqrt{\dot X^2 + \dot Y^2}}, \qquad (8.371)$$

while the force in the plane of contact, normal to the tangent to the wheel, is

$$F_n = \frac{\dot Y\lambda_1 − \dot X\lambda_2}{\sqrt{\dot X^2 + \dot Y^2}}. \qquad (8.372)$$

The variation in time of the forces RZ, Ft, Fn is given in Figure 8.24, Figure 8.25, and Figure 8.26.
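A short sketch of the post-processing step, decomposing the multipliers into the contact-force components of relations (8.370)–(8.372); the function name is ours:

```python
import numpy as np

def contact_forces(lam, Xdot, Ydot):
    """Contact-force components (8.370)-(8.372) from the Lagrange
    multipliers lam = (lambda1, lambda2, lambda3) and the contact-point
    velocity components (Xdot, Ydot)."""
    v = np.hypot(Xdot, Ydot)
    Ft = (Xdot * lam[0] + Ydot * lam[1]) / v   # (8.371)
    Fn = (Ydot * lam[0] - Xdot * lam[1]) / v   # (8.372)
    return lam[2], Ft, Fn                      # RZ, Ft, Fn
```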
Figure 8.22 The variation θ = θ(t).
Figure 8.23 Trajectory of the contact point.

Problem 8.8 (Postcritical behavior of the cantilever beam)
Let us consider a cantilever beam of length l, acted upon by the constant axial force P (Fig. 8.27). The mathematical model of the problem may be expressed in the nonlinear general form

$$\frac{dy}{ds} = \sin\theta,\qquad \frac{d\theta}{ds} = \alpha^2(f − y),\qquad \alpha^2 = \frac{P}{EI}, \qquad (8.373)$$

where $ds = \sqrt{(dx)^2 + (dy)^2}$, Ox is the direction along the bar axis, O corresponds to the left end of the bar, Oy is the transverse axis, θ is the rotation of the bar cross section, f is the transverse displacement of the free end, and EI is the constant bending rigidity of the bar (E is the modulus of longitudinal elasticity, I is the moment of inertia of the cross section with respect to the neutral axis). The solution must be found under null Cauchy conditions

$$y(0) = 0,\qquad \theta(0) = 0. \qquad (8.374)$$

We first perform the change of function

$$\bar y(x) = y(x) − f \qquad (8.375)$$

and then apply the LEM mapping, which in this case depends on two parameters,

$$\nu(x, \sigma, \xi) = e^{\sigma\bar y(x) + \xi\theta(x)}. \qquad (8.376)$$
Figure 8.24 The variation RZ = RZ(t).
Figure 8.25 The variation Ft = Ft(t).

This leads to the first linear partial differential equation, equivalent to equation (8.373) — the first LEM equivalent:

$$\frac{\partial\nu}{\partial x} = \sigma\sin(D_\xi)\nu − \alpha^2\xi\frac{\partial\nu}{\partial\sigma}. \qquad (8.377)$$

By $\sin D_\xi$ we mean the operator obtained by formally replacing the powers of θ with derivatives with respect to ξ of the same order in the expansion of sin θ. Considering for ν a series expansion in σ and ξ, we get the second LEM equivalent

$$\frac{d\nu_{ij}}{ds} = i\sum_{k=1}^{\infty}\frac{(−1)^{k+1}}{(2k − 1)!}\nu_{i-1,\,j+2k-1} − j\alpha^2\nu_{i+1,\,j-1}. \qquad (8.378)$$
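A short check of this formal rule (in our notation) may help: since each differentiation of the exponential mapping (8.376) with respect to ξ brings down a factor θ,

$$D_\xi^n\,e^{\sigma\bar y + \xi\theta} = \theta^n e^{\sigma\bar y + \xi\theta} \quad\Longrightarrow\quad \sin(D_\xi)\,e^{\sigma\bar y + \xi\theta} = \left(\theta − \frac{\theta^3}{3!} + \cdots\right)e^{\sigma\bar y + \xi\theta} = \sin\theta\,e^{\sigma\bar y + \xi\theta},$$

so that $\partial\nu/\partial x = (\sigma\,d\bar y/dx + \xi\,d\theta/dx)\nu$ reduces exactly to relation (8.377) once the derivatives are replaced by means of equations (8.373) and (8.375).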
Figure 8.26 The variation Fn = Fn(t).
Figure 8.27 Problem 8.8.

Applying Theorem 8.4, we get the following normal LEM representation:

$$y(x) \equiv −f(\cos\alpha x − 1) − f^2\alpha^2\Phi(\alpha x) − f^4\alpha^4\Psi(\alpha x), \qquad (8.379)$$

where Ψ(αx) is analytic in αx and Φ(αx) is given by

$$\Phi(\alpha x) = \frac{1}{16}\left[\frac{1}{4}(\cos 3\alpha x − \cos\alpha x) + \alpha x\sin\alpha x\right]. \qquad (8.380)$$

To equation (8.379) we apply the condition y(l) = f, meaning that the bar length remains l if the shortening is neglected in the postcritical behavior. This gives

$$\cos\alpha l + (\alpha f)^2\Phi(\alpha l) \cong 0, \qquad (8.381)$$

in fact an approximate relationship between the parameters f and α. From equation (8.381), by elementary computation we obtain

$$\frac{f}{l} \cong \frac{4}{\alpha l}\sqrt{\frac{2\cot\alpha l}{\sin 2\alpha l − 2\alpha l}},\qquad \frac{\pi}{2} < \alpha l < \pi, \qquad (8.382)$$

which is, in fact, a direct LEM representation of the postcritical values of f/l as a function of the supraunitary ratio P/Pcr (Pcr = π²EI/(4l²) is the critical force). It will be marked LEM.
Considering for α the expansion

$$\alpha = \alpha_0 + \alpha_1 f + \alpha_2\frac{f^2}{2!} + \cdots \equiv \sum_{j=0}^{\infty}\alpha_j\frac{f^j}{j!} \qquad (8.383)$$

and introducing it in equation (8.381), a power series in f appears that must vanish identically. Determining the coefficients αj up to j = 2, we obtain

$$\frac{\alpha}{\alpha_0} \cong 1 + \frac{\alpha_0^2 f^2}{16}, \qquad (8.384)$$

from which another approximate LEM formula for the postcritical values of f/l, marked LEM1, is finally deduced:

$$\frac{f}{l} \cong \frac{8}{\pi}\sqrt{\frac{P}{P_{cr}} − 1}. \qquad (8.385)$$

We can also relate the dimensionless quantities αl and αf by taking

$$(\alpha f)^2 = \sum_{j=0}^{\infty}p_j\frac{(\alpha − \alpha_0)^j l^j}{j!}; \qquad (8.386)$$

introducing this in formula (8.381) again leads to a series in αl, whose coefficients must vanish. Going as far as j = 1, we obtain the following approximating value for αf,

$$(\alpha f)^2 \cong 16(\alpha − \alpha_0)l, \qquad (8.387)$$

and from equation (8.381) we get a third formula for the postcritical cantilever bar, marked LEM2,

$$\frac{f}{l} \cong \frac{8}{\pi}\sqrt{\frac{P_{cr}}{P}}\sqrt{\frac{P}{P_{cr}} − 1}, \qquad (8.388)$$

which coincides with Schneider's formula. The form of these formulae suggests a comparison with Grashof's formula (marked G),

$$\frac{f}{l} \cong \frac{8}{\pi}\sqrt{\frac{P}{P_{cr}}}\sqrt{\frac{P}{P_{cr}} − 1}, \qquad (8.389)$$

established from the well-known form of the solution of the cantilever problem by using elliptic integrals. The LEM representation for y was also used to get good postcritical formulae for other quantities of interest, such as δ/l, where δ = l − x(l) is the displacement of the bar end along its straight axis, and θ(l). In Table 8.22 the values of the ratio f/l expressed by elliptic integrals (exact solution),

$$\frac{f}{l} = \frac{2k}{K(k)},\qquad k = \sin\frac{\theta_l}{2},\qquad K(k) = \frac{\pi}{2}\sqrt{\frac{P}{P_{cr}}}, \qquad (8.390)$$

are compared with LEM, LEM1, LEM2, and G.
TABLE 8.22 The Values of the Ratio f/l Computed Comparatively by Using Three LEM Variants, Grashof's Formula, and Elliptic Integrals

P/Pcr            | 1.004 | 1.015 | 1.035 | 1.063 | 1.102 | 1.152 | 1.215 | 1.293
Exact solution   | 0.110 | 0.220 | 0.324 | 0.422 | 0.514 | 0.594 | 0.662 | 0.720
LEM              | 0.110 | 0.220 | 0.324 | 0.422 | 0.516 | 0.601 | 0.676 | 0.741
LEM1             | 0.116 | 0.220 | 0.329 | 0.435 | 0.541 | 0.642 | 0.738 | 0.829
LEM2 (Schneider) | 0.114 | 0.220 | 0.335 | 0.448 | 0.563 | 0.689 | 0.814 | 0.942
G                | 0.114 | 0.221 | 0.341 | 0.462 | 0.596 | 0.740 | 0.898 | 1.072

This comparison is emphasized for 1 < P/Pcr < 1.3, the formulae approximating the postcritical behavior of the cantilever bar being thus ordered with respect to their accuracy. The mean square errors with respect to the exact solution are 0.24% for LEM, 1.36% for LEM1, 2.67% for LEM2 (Schneider), and 4.22% for G. These results point out that LEM leads to quite simple formulae, which give very good approximations of the ratio f/l, and that it is, in any case, much better than Grashof's formula. Similar conclusions can be drawn for the ratio δ/l and for θ(l).

We can conclude that the method presented here provides direct approximate formulae for f/l, δ/l, and θ(l) in the case of the cantilever bar, as well as critical values for the loads, under various hypotheses. It must also be mentioned that this method, based on LEM, does not depend on a particular mechanical interpretation. Using the same pattern, we can obtain similar results for various cases of loading and support.
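The "exact solution" row can be checked directly from relations (8.390): solve K(k) = (π/2)√(P/Pcr) for the modulus k and evaluate f/l = 2k/K(k). A minimal sketch (SciPy's ellipk takes the parameter m = k²; the function name is ours):

```python
import numpy as np
from scipy.special import ellipk
from scipy.optimize import brentq

def f_over_l_exact(P_ratio):
    """Exact postcritical tip deflection f/l from relations (8.390)."""
    K_target = 0.5 * np.pi * np.sqrt(P_ratio)
    k = brentq(lambda k: ellipk(k * k) - K_target, 1e-9, 1.0 - 1e-9)
    return 2.0 * k / K_target

# close to the "Exact solution" row of Table 8.22; the tabulated
# P/Pcr values are themselves rounded, so the first entries may
# differ in the last digits:
for u in (1.004, 1.015, 1.035, 1.063, 1.102, 1.152, 1.215, 1.293):
    print(u, round(f_over_l_exact(u), 3))
```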
9 INTEGRATION OF PARTIAL DIFFERENTIAL EQUATIONS AND OF SYSTEMS OF PARTIAL DIFFERENTIAL EQUATIONS

9.1 INTRODUCTION

Many problems of science and technology lead to partial differential equations. The mathematical theories of such equations, especially of the nonlinear ones, are very intricate, so that their numerical study becomes inevitable. Partial differential equations may be classified by various criteria, that is,
• considering the order of the derivatives, we have equations of first order, second order, or nth order;
• considering the linearity character, we have linear, quasilinear, or nonlinear equations;
• considering the influence of the integration domain at a point, we have equations of elliptic, parabolic, or hyperbolic type;
• considering the types of limit conditions, we get Dirichlet, Neumann, or mixed problems.
The partial differential equations dealt with in what follows are mostly the usual ones, the existence and the uniqueness of the solution being assured.

9.2 PARTIAL DIFFERENTIAL EQUATIONS OF FIRST ORDER

The partial differential equations of first order have the general form
$$\sum_{i=1}^{n} a_i(x_1, x_2, \dots, x_n, u)\,\frac{\partial u}{\partial x_i} = b(x_1, x_2, \dots, x_n, u), \qquad (9.1)$$
where $u$ is the unknown function, $x_i$, $i = \overline{1,n}$, are the independent variables, while $a_i$, $i = \overline{1,n}$, and $b$ are functions that do not depend on the partial derivatives $\partial u/\partial x_i$, $i = \overline{1,n}$.
Definition 9.1
(i) If the functions $a_i$, $i=\overline{1,n}$, and $b$ do not depend on the unknown function $u$, then the equation is linear.
(ii) If the function $b$ is identically zero, $b \equiv 0$, then the equation is called homogeneous.

The solution of equation (9.1) reduces to the solution of a system of $n$ ordinary differential equations,
$$\frac{dx_1}{a_1(x_1,\dots,x_n,u)} = \frac{dx_2}{a_2(x_1,\dots,x_n,u)} = \cdots = \frac{dx_n}{a_n(x_1,\dots,x_n,u)} = \frac{du}{b(x_1,\dots,x_n,u)}. \qquad (9.2)$$

Definition 9.2 System (9.2) is called a characteristic system.

In general, the solution of equation (9.1) is an $n$-dimensional hypersurface in a domain $D_{n+1} \subset \mathbb{R}^{n+1}$, the solution being of the form $F(x_1,\dots,x_n,u) = 0$, or of the form $u = f(x_1,\dots,x_n)$. In the case of the Cauchy problem, the $n$-dimensional integral hypersurface pierces an $(n-1)$-dimensional hypersurface $\Gamma$, contained in the $(n+1)$-dimensional domain of definition, $\Gamma$ being the intersection of two $n$-dimensional hypersurfaces,
$$F_1(x_1,\dots,x_n,u) = 0, \qquad F_2(x_1,\dots,x_n,u) = 0. \qquad (9.3)$$
The solution of system (9.2) depends on $n$ arbitrary constants $C_i$, $i=\overline{1,n}$, and is of the form
$$\varphi_i(x_1,\dots,x_n,u) = C_i, \quad i=\overline{1,n}. \qquad (9.4)$$

Definition 9.3 The hypersurfaces $\varphi_i(x_1,\dots,x_n,u) = C_i$, $i=\overline{1,n}$, are called characteristic hypersurfaces and depend on one parameter.

Relations (9.3) and (9.4) form a system of $n+2$ equations from which the $n+1$ variables $x_1, x_2, \dots, x_n, u$ are expressed as functions of $C_i$, $i=\overline{1,n}$; introducing these in the last equation, we obtain
$$\Phi(C_1,\dots,C_n) = 0. \qquad (9.5)$$
From equations (9.4) and (9.5) we get the solution
$$\Phi(C_1,\dots,C_n) = \Phi(\varphi_1,\dots,\varphi_n) \equiv F(x_1,\dots,x_n,u) = 0. \qquad (9.6)$$

To solve the problem numerically, we proceed as follows. We seek the solution in the domain $D_{n+1} \subset \mathbb{R}^{n+1}$, which contains the hypersurface of equation (9.3). We divide the hypersurface $\Gamma$ conveniently, observing that the values at the knots represent initial conditions for the system of differential equations (9.2). If $b \equiv 0$, then system (9.2) is simpler and reads
$$\frac{dx_1}{a_1(x_1,\dots,x_n,u_0)} = \cdots = \frac{dx_n}{a_n(x_1,\dots,x_n,u_0)}, \qquad (9.7)$$
where $u = u_0 = \text{const}$ is a first integral of the system. There are two possibilities to tackle the numerical solution: the first implies the use of explicit schemata, while the second implies the use of implicit schemata.
9.2.1 Numerical Integration by Means of Explicit Schemata

The first step, in this case, consists of the discretization of the partial differential equation, that is, dividing the domain by means of a calculation net and replacing the partial differential equation by a new and simpler one. The simplest method is based on finite differences. Let us apply this method to a simple problem, the partial differential equation of first order with two independent variables,
$$a_1(x_1,x_2,u)\frac{\partial u}{\partial x_1} + a_2(x_1,x_2,u)\frac{\partial u}{\partial x_2} = b(x_1,x_2,u), \quad x_1 \in [0,l_1],\ x_2 \in [0,l_2]. \qquad (9.8)$$
To solve equation (9.8), initial conditions of the form
$$u(x_1, 0) = f(x_1) \qquad (9.9)$$
are necessary. Sometimes, limit conditions of the form
$$u(0, x_2) = g_0(x_2), \qquad u(l_1, x_2) = g_1(x_2) \qquad (9.10)$$
are imposed as well, where the functions $f$, $g_0$, and $g_1$ are known.

The numerical solution of equation (9.8) implies the division of the rectangular domain $[0,l_1]\times[0,l_2]$ by means of a net with equal steps on each axis, denoted by $h$ and $k$ for the variables $x_1$ and $x_2$, respectively (Fig. 9.1). Using the expansion of the function $u(x_1,x_2)$ into a Taylor series around the point $A(x_1^i, x_2^j)$, we get
$$u(x_1^{i-1}, x_2^j) = u(x_1^i, x_2^j) - h\,\frac{\partial u(x_1^i, x_2^j)}{\partial x_1} + O(h^2), \qquad (9.11)$$
$$u(x_1^i, x_2^{j+1}) = u(x_1^i, x_2^j) + k\,\frac{\partial u(x_1^i, x_2^j)}{\partial x_2} + O(k^2), \qquad (9.12)$$

Figure 9.1 The calculation net for equation (9.8).
where $x_1^i = ih$, $i=\overline{0,I}$, $x_2^j = jk$, $j=\overline{0,J}$, $h = l_1/I$, $k = l_2/J$. It follows that
$$\frac{\partial u(x_1^i,x_2^j)}{\partial x_1} = \frac{u(x_1^i,x_2^j) - u(x_1^{i-1},x_2^j)}{h} + O(h), \qquad (9.13)$$
$$\frac{\partial u(x_1^i,x_2^j)}{\partial x_2} = \frac{u(x_1^i,x_2^{j+1}) - u(x_1^i,x_2^j)}{k} + O(k). \qquad (9.14)$$
Neglecting $O(h)$ and $O(k)$ in equations (9.13) and (9.14), we obtain the equation with finite differences
$$a_1(x_1^i,x_2^j,u(x_1^i,x_2^j))\,\frac{u(x_1^i,x_2^j)-u(x_1^{i-1},x_2^j)}{h} + a_2(x_1^i,x_2^j,u(x_1^i,x_2^j))\,\frac{u(x_1^i,x_2^{j+1})-u(x_1^i,x_2^j)}{k} = b(x_1^i,x_2^j,u(x_1^i,x_2^j)). \qquad (9.15)$$

Let us now consider the wave propagation equation
$$\frac{\partial u}{\partial t} + a\frac{\partial u}{\partial x} = 0, \quad x\in[0,1],\ t\in[0,T], \qquad (9.16)$$
where $a$ is a positive constant. Applying the previous theory, we obtain the equation in finite differences
$$V(x^i,t^{j+1}) = V(x^i,t^j) + c\,[V(x^{i-1},t^j) - V(x^i,t^j)], \quad i=\overline{1,I},\ j=\overline{1,J}, \qquad (9.17)$$
where $V(x^i,t^j)$ denotes the approximate value of the function $u(x^i,t^j)$, $x^i = ih$, $t^j = jk$, $h = 1/I$, $k = T/J$.

Definition 9.4 The number $c$ in relation (9.17), with the expression
$$c = \frac{ak}{h}, \qquad (9.18)$$
is called the Courant number.¹

Equation (9.16) is equivalent to the system
$$\frac{dt}{1} = \frac{dx}{a}, \qquad (9.19)$$
which leads to the first integral
$$x - at = C_1, \qquad (9.20)$$
where $C_1$ is a constant; hence, the exact solution of the problem is
$$u = \varphi(x - at), \qquad (9.21)$$
where $\varphi$ is an arbitrary function.

¹The number appears in the Courant–Friedrichs–Lewy condition of convergence, named after Richard Courant (1888–1972), Kurt O. Friedrichs (1901–1982), and Hans Lewy (1904–1988), who published it in 1928.
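Before examining particular Courant numbers, a minimal sketch of the explicit schema (9.17) may be useful; the grid sizes, the Gaussian profile $\varphi$, and the inflow treatment below are illustrative assumptions of this sketch, not data from the text.

```python
# Sketch of the explicit schema (9.17) for u_t + a u_x = 0 (assumed test data).
import numpy as np

a, T, I, J = 1.0, 0.5, 100, 80
h, k = 1.0 / I, T / J
c = a * k / h                                     # Courant number (9.18); 0.625 <= 1

phi = lambda s: np.exp(-200.0 * (s - 0.25) ** 2)  # arbitrary smooth initial profile
x = np.linspace(0.0, 1.0, I + 1)
V = phi(x)                                        # V(x^i, t^0) = u(x^i, 0)
for j in range(J):
    V[1:] = V[1:] + c * (V[:-1] - V[1:])          # relation (9.17), vectorized
    V[0] = phi(-a * (j + 1) * k)                  # inflow from the exact solution (9.21)

print("max error:", np.abs(V - phi(x - a * T)).max())
```

Running the sketch with $c$ slightly above 1 makes the error grow without bound, which anticipates the stability discussion that follows.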
If $c = 1$, then the schema becomes
$$V(x^i, t^{j+1}) = V(x^{i-1}, t^j). \qquad (9.22)$$

Definition 9.5 We say that a method with finite differences is convergent if the solution obtained by means of the equation with differences converges to the exact solution when the norm of the net tends to zero.

Observation 9.1
(i) The schema above is not unconditionally stable.
(ii) The schema given in the previous example is stable for $0 < c \le 1$.
(iii) A better approximation of the derivative $\partial u(x^i,t^j)/\partial x$ by using central differences,
$$\frac{\partial u(x^i,t^j)}{\partial x} = \frac{u(x^{i+1},t^j) - u(x^{i-1},t^j)}{2h} + O(h^2), \qquad (9.23)$$
leads to an unstable schema for any Courant number $c$.

An often used explicit schema is the Lax–Wendroff² schema, for which, in the case of the previous example, the equation with differences reads
$$V(x^i,t^{j+1}) = (1-c^2)V(x^i,t^j) - \frac{c}{2}(1-c)V(x^{i+1},t^j) + \frac{c}{2}(1+c)V(x^{i-1},t^j), \qquad (9.24)$$
its order of accuracy being $O(h^2)$. Let us note that for $c = 1$ the Lax–Wendroff schema leads to the exact solution $V(x^i,t^{j+1}) = V(x^{i-1},t^j)$.

9.2.2 Numerical Integration by Means of Implicit Schemata

The implicit schemata avoid the disadvantage of the conditional convergence that appears in the case of the explicit schemata. In the case of implicit schemata, the space derivative is approximated by using the approximate values $V(x^i,t^{j+1})$ and not the $V(x^i,t^j)$ ones. Thus, we may write
$$\frac{\partial u(x^i,t^{j+1})}{\partial x} = \frac{u(x^{i+1},t^{j+1}) - u(x^i,t^{j+1})}{h} + O(h). \qquad (9.25)$$
In our example, the equation with finite differences takes the form
$$V(x^i,t^{j+1}) = \frac{c\,V(x^{i+1},t^{j+1}) + V(x^i,t^j)}{1+c}, \quad i = 1, 2, \dots, \qquad (9.26)$$
which is unconditionally convergent. Another schema often used in the case of the considered example is that of Wendroff, for which the equation with differences reads
$$V(x^i,t^{j+1}) = V(x^{i-1},t^j) + \frac{1-c}{1+c}\,[V(x^i,t^j) - V(x^{i-1},t^j)]. \qquad (9.27)$$

²After Peter David Lax (1926–) and Burton Wendroff (1930–), who presented the method in 1960.
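For comparison with the first-order schema, here is a sketch of the Lax–Wendroff schema (9.24) on the same assumed test data; the simple upwind step at the outflow end is an assumption of this sketch.

```python
# Sketch of the Lax-Wendroff schema (9.24) for u_t + a u_x = 0 (assumed test data).
import numpy as np

a, T, I, J = 1.0, 0.5, 100, 80
h, k = 1.0 / I, T / J
c = a * k / h                                   # 0 < c <= 1 for stability

phi = lambda s: np.exp(-200.0 * (s - 0.25) ** 2)
x = np.linspace(0.0, 1.0, I + 1)
V = phi(x)
for j in range(J):
    Vn = V.copy()
    V[1:-1] = ((1 - c ** 2) * Vn[1:-1]
               - 0.5 * c * (1 - c) * Vn[2:]
               + 0.5 * c * (1 + c) * Vn[:-2])   # relation (9.24)
    V[0] = phi(-a * (j + 1) * k)                # exact inflow value
    V[-1] = Vn[-1] + c * (Vn[-2] - Vn[-1])      # upwind step at the outflow end

print("max error:", np.abs(V - phi(x - a * T)).max())
```

On this smooth profile the printed error is roughly two orders of magnitude smaller than that of the first-order schema, consistent with the $O(h^2)$ accuracy stated above.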
9.3 PARTIAL DIFFERENTIAL EQUATIONS OF SECOND ORDER

Let us consider the quasilinear partial differential equation of second order
$$\sum_{i=1}^n a_i(x_1,\dots,x_n,u)\frac{\partial^2 u}{\partial x_i^2} + \sum_{i=1}^n b_i(x_1,\dots,x_n,u)\frac{\partial u}{\partial x_i} + c(x_1,\dots,x_n,u) = 0, \qquad (9.28)$$
written in a canonical form (it does not contain mixed partial derivatives). Equation (9.28) is
• of elliptic type if all the coefficients $a_i(x_1,\dots,x_n,u)$, $i=\overline{1,n}$, have the same sign;
• of parabolic type if there exists an index $j$, $1 \le j \le n$, such that $a_j(x_1,\dots,x_n,u) = 0$, $a_i(x_1,\dots,x_n,u) \ne 0$ for $i \ne j$, $1 \le i \le n$, and $b_j(x_1,\dots,x_n,u) \ne 0$;
• of hyperbolic type if all the coefficients $a_i(x_1,\dots,x_n,u)$ have the same sign, excepting one, which is of opposite sign.

Observation 9.2
(i) In the case of an equation of elliptic type, an arbitrary point of the domain is influenced by all the points of any of its neighborhoods. Because of this reciprocal influence, a problem of elliptic type is solved numerically simultaneously for all the points of the domain. Moreover, the limit conditions are conditions on closed frontiers.
(ii) If the equation is of parabolic type, then we can advance numerically in the direction $x_j$ for which $a_j(x_1,\dots,x_n,u) = 0$. Equation (9.28) is now written in the form
$$b_j(x_1,\dots,x_n,u)\frac{\partial u}{\partial x_j} = F\!\left(x_1,\dots,x_n,u,\frac{\partial u}{\partial x_i},\frac{\partial^2 u}{\partial x_i^2}\right), \quad i=\overline{1,n},\ i \ne j. \qquad (9.29)$$
The problem is now solved only for the points situated on the hypersurfaces $x_j = \text{const}$ and not for all the points of the domain.
(iii) In the case of hyperbolic equations, there exist points that do not influence one another; the numerical solution must take this fact into account. Moreover, there exist several distinct characteristic directions along which we may advance starting from a certain initial state. For these equations we may have not only initial conditions but boundary conditions too.
9.4 PARTIAL DIFFERENTIAL EQUATIONS OF SECOND ORDER OF ELLIPTIC TYPE

We consider Poisson's equation³
$$\nabla^2 u(x,y) = \frac{\partial^2 u}{\partial x^2}(x,y) + \frac{\partial^2 u}{\partial y^2}(x,y) = f(x,y), \qquad (9.30)$$
where $(x,y) \in D$, $D$ being the rectangular domain
$$D = \{(x,y)\ |\ a < x < b,\ c < y < d\}, \qquad (9.31)$$
with the boundary condition
$$u(x,y) = g(x,y), \quad (x,y) \in \partial D. \qquad (9.32)$$

Observation 9.3 If $f(x,y)$ and $g(x,y)$ are continuous, then problem (9.30) with the boundary conditions (9.32) has a unique solution.

We divide the interval $[a,b]$ into $n$ equal subintervals of length $h$ and the interval $[c,d]$ into $m$ equal subintervals of length $k$, so that
$$h = \frac{b-a}{n}, \qquad k = \frac{d-c}{m}. \qquad (9.33)$$
Thus, the rectangle $D$ will be covered by a net with vertical and horizontal lines passing through the points $x_i$, $i=\overline{0,n}$, and $y_j$, $j=\overline{0,m}$, where
$$x_i = a + ih, \quad i=\overline{0,n}, \qquad (9.34)$$
$$y_j = c + jk, \quad j=\overline{0,m}. \qquad (9.35)$$
Let $A_{ij}(x_i,y_j)$, $i=\overline{1,n-1}$, $j=\overline{1,m-1}$, be a knot in the interior of the net. Expanding the function $u(x,y)$ into a Taylor series in the $x$-variable around $x_i$, we obtain
$$\frac{\partial^2 u}{\partial x^2}(x_i,y_j) = \frac{u(x_{i+1},y_j) - 2u(x_i,y_j) + u(x_{i-1},y_j)}{h^2} - \frac{h^2}{12}\frac{\partial^4 u}{\partial x^4}(\xi_i,y_j), \qquad (9.36)$$
where $\xi_i$ is an intermediary value between $x_{i-1}$ and $x_{i+1}$. Analogously, expanding the function $u(x,y)$ into a Taylor series in the $y$-variable around $y_j$, it follows that
$$\frac{\partial^2 u}{\partial y^2}(x_i,y_j) = \frac{u(x_i,y_{j+1}) - 2u(x_i,y_j) + u(x_i,y_{j-1})}{k^2} - \frac{k^2}{12}\frac{\partial^4 u}{\partial y^4}(x_i,\eta_j), \qquad (9.37)$$
with $\eta_j$, in this case, an intermediary point between $y_{j-1}$ and $y_{j+1}$.

³The equation was studied by Siméon Denis Poisson (1781–1840) in 1818.
By means of formulae (9.36) and (9.37), problems (9.30) and (9.32) become
$$\frac{u(x_{i+1},y_j) - 2u(x_i,y_j) + u(x_{i-1},y_j)}{h^2} + \frac{u(x_i,y_{j+1}) - 2u(x_i,y_j) + u(x_i,y_{j-1})}{k^2} = f(x_i,y_j) + \frac{h^2}{12}\frac{\partial^4 u}{\partial x^4}(\xi_i,y_j) + \frac{k^2}{12}\frac{\partial^4 u}{\partial y^4}(x_i,\eta_j), \quad i=\overline{1,n-1},\ j=\overline{1,m-1}, \qquad (9.38)$$
$$u(x_0,y_j) = g(x_0,y_j), \quad j=\overline{0,m}, \qquad (9.39)$$
$$u(x_n,y_j) = g(x_n,y_j), \quad j=\overline{0,m}, \qquad (9.40)$$
$$u(x_i,y_0) = g(x_i,y_0), \quad i=\overline{1,n-1}, \qquad (9.41)$$
$$u(x_i,y_m) = g(x_i,y_m), \quad i=\overline{1,n-1}. \qquad (9.42)$$

Observation 9.4 The local truncation error is of order $O(h^2 + k^2)$.

We use the notation
$$w_{ij} = u(x_i,y_j), \quad i=\overline{0,n},\ j=\overline{0,m}, \qquad (9.43)$$
and take into account that $h$ and $k$ are sufficiently small to rewrite formulae (9.38)–(9.42) in the form
$$2\left[\left(\frac{h}{k}\right)^2 + 1\right]w_{ij} - (w_{i+1,j} + w_{i-1,j}) - \left(\frac{h}{k}\right)^2(w_{i,j+1} + w_{i,j-1}) = -h^2 f(x_i,y_j), \qquad (9.44)$$
$$w_{0,j} = g(x_0,y_j), \quad j=\overline{0,m}, \qquad (9.45)$$
$$w_{n,j} = g(x_n,y_j), \quad j=\overline{0,m}, \qquad (9.46)$$
$$w_{i,0} = g(x_i,y_0), \quad i=\overline{1,n-1}, \qquad (9.47)$$
$$w_{i,m} = g(x_i,y_m), \quad i=\overline{1,n-1}. \qquad (9.48)$$
Equations (9.44)–(9.48) lead to a system of $(n-1)(m-1)$ linear equations with the $(n-1)(m-1)$ unknowns $w_{i,j} = u(x_i,y_j)$, $i=\overline{1,n-1}$, $j=\overline{1,m-1}$. Renumbering the knots of the net so that
$$A_{i,j} = A_l, \qquad (9.49)$$
where
$$l = i + (m-1-j)(n-1), \quad i=\overline{1,n-1},\ j=\overline{1,m-1}, \qquad (9.50)$$
and noting
$$w_{i,j} = w_l, \qquad (9.51)$$
we may write the system of $(n-1)(m-1)$ equations with $(n-1)(m-1)$ unknowns in matrix form.

Figure 9.2 The numbering of the internal knots of the net.
Observation 9.5 The renumbering orders the internal knots of the net in succession, from the upper left part of the net to the lower right one, as shown in Figure 9.2.

The algorithm of finite differences for problems (9.30) and (9.32) reads
– given m, n, a, b, c, d, ε, g(x, y), f(x, y);
– calculate h = (b − a)/n, k = (d − c)/m;
– for i from 0 to n calculate $x_i = a + ih$;
– for j from 0 to m calculate $y_j = c + jk$;
– for i from 1 to n − 1 do
  – for j from 1 to m − 1 do
    – calculate $w_{i,j}^{(0)} = 0$;
– calculate $\lambda = h^2/k^2$;
– set l = 1;
– repeat
  – calculate $w_{1,m-1}^{(l)} = \frac{1}{2(\lambda+1)}\big[-h^2 f(x_1,y_{m-1}) + g(x_0,y_{m-1}) + \lambda g(x_1,y_m) + \lambda w_{1,m-2}^{(l-1)} + w_{2,m-1}^{(l-1)}\big]$;
  – for i from 2 to n − 2 calculate $w_{i,m-1}^{(l)} = \frac{1}{2(\lambda+1)}\big[-h^2 f(x_i,y_{m-1}) + \lambda g(x_i,y_m) + w_{i-1,m-1}^{(l)} + w_{i+1,m-1}^{(l-1)} + \lambda w_{i,m-2}^{(l-1)}\big]$;
  – calculate $w_{n-1,m-1}^{(l)} = \frac{1}{2(\lambda+1)}\big[-h^2 f(x_{n-1},y_{m-1}) + g(x_n,y_{m-1}) + \lambda g(x_{n-1},y_m) + w_{n-2,m-1}^{(l)} + \lambda w_{n-1,m-2}^{(l-1)}\big]$;
  – for j from m − 2 downto 2 do
    – calculate $w_{1,j}^{(l)} = \frac{1}{2(\lambda+1)}\big[-h^2 f(x_1,y_j) + g(x_0,y_j) + \lambda w_{1,j+1}^{(l)} + \lambda w_{1,j-1}^{(l-1)} + w_{2,j}^{(l-1)}\big]$;
    – for i from 2 to n − 2 do
      – calculate $w_{i,j}^{(l)} = \frac{1}{2(\lambda+1)}\big[-h^2 f(x_i,y_j) + w_{i-1,j}^{(l)} + \lambda w_{i,j+1}^{(l)} + w_{i+1,j}^{(l-1)} + \lambda w_{i,j-1}^{(l-1)}\big]$;
    – calculate $w_{n-1,j}^{(l)} = \frac{1}{2(\lambda+1)}\big[-h^2 f(x_{n-1},y_j) + g(x_n,y_j) + w_{n-2,j}^{(l)} + \lambda w_{n-1,j+1}^{(l)} + \lambda w_{n-1,j-1}^{(l-1)}\big]$;
  – calculate $w_{1,1}^{(l)} = \frac{1}{2(\lambda+1)}\big[-h^2 f(x_1,y_1) + g(x_0,y_1) + \lambda g(x_1,y_0) + \lambda w_{1,2}^{(l)} + w_{2,1}^{(l-1)}\big]$;
  – for i from 2 to n − 2 do
    – calculate $w_{i,1}^{(l)} = \frac{1}{2(\lambda+1)}\big[-h^2 f(x_i,y_1) + \lambda g(x_i,y_0) + w_{i-1,1}^{(l)} + \lambda w_{i,2}^{(l)} + w_{i+1,1}^{(l-1)}\big]$;
  – calculate $w_{n-1,1}^{(l)} = \frac{1}{2(\lambda+1)}\big[-h^2 f(x_{n-1},y_1) + g(x_n,y_1) + \lambda g(x_{n-1},y_0) + w_{n-2,1}^{(l)} + \lambda w_{n-1,2}^{(l)}\big]$;
  – set b = true;
  – for i from 1 to n − 1 do
    – for j from 1 to m − 1 do
      – if $|w_{i,j}^{(l)} - w_{i,j}^{(l-1)}| \ge \varepsilon$ then b = false;
  – if b = false then l = l + 1;
– until b = true.
At the end, $w_{i,j}$ approximates $u(x_i,y_j)$ for $i=\overline{1,n-1}$, $j=\overline{1,m-1}$.

Observation 9.6 The linear system has been solved above by the Gauss–Seidel method.
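A minimal Python sketch of this Gauss–Seidel iteration follows; the Laplace test problem with boundary data g(x, y) = xy (whose exact solution is u = xy) and the 20 × 20 net are assumptions made for illustration, and the sweep order differs slightly from the algorithm above without affecting the limit.

```python
# Sketch of the finite-difference/Gauss-Seidel algorithm for (9.30)-(9.32).
import numpy as np

a, b, c0, d, n, m, eps = 0.0, 1.0, 0.0, 1.0, 20, 20, 1e-10
f = lambda x, y: 0.0              # Laplace equation as an assumed test case
g = lambda x, y: x * y            # boundary data; exact solution u = x*y
h, k = (b - a) / n, (d - c0) / m
lam = h ** 2 / k ** 2

x = a + h * np.arange(n + 1)
y = c0 + k * np.arange(m + 1)
w = np.zeros((n + 1, m + 1))
w[:, 0], w[:, m] = g(x, y[0]), g(x, y[m])     # conditions (9.47)-(9.48)
w[0, :], w[n, :] = g(x[0], y), g(x[n], y)     # conditions (9.45)-(9.46)

done = False
while not done:                                # the "repeat ... until" loop
    done = True
    for i in range(1, n):
        for j in range(1, m):
            new = (-h ** 2 * f(x[i], y[j]) + w[i - 1, j] + w[i + 1, j]
                   + lam * (w[i, j - 1] + w[i, j + 1])) / (2.0 * (lam + 1.0))
            if abs(new - w[i, j]) >= eps:
                done = False
            w[i, j] = new                      # Gauss-Seidel: overwrite in place

print("max error:", np.abs(w - np.outer(x, y)).max())
```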
9.5 PARTIAL DIFFERENTIAL EQUATIONS OF SECOND ORDER OF PARABOLIC TYPE

We consider the partial differential equation of second order of parabolic type of the form
$$\frac{\partial u}{\partial t}(x,t) - \alpha^2\frac{\partial^2 u}{\partial x^2}(x,t) = 0, \quad 0 \le x \le l,\ t > 0, \qquad (9.52)$$
with the initial and frontier conditions
$$u(x,0) = f(x), \quad 0 \le x \le l, \qquad (9.53)$$
$$u(0,t) = u(l,t) = 0, \quad t > 0. \qquad (9.54)$$
We begin by choosing two net constants $h$ and $k$, where
$$h = \frac{l}{m}, \quad m \in \mathbb{N}. \qquad (9.55)$$
Thus, the knots of the net are $(x_i,t_j)$, where
$$x_i = ih, \quad i=\overline{0,m}, \qquad (9.56)$$
$$t_j = jk, \quad j = 0, 1, \dots \qquad (9.57)$$
Expanding into a Taylor series, we obtain the formulae with differences
$$\frac{\partial u}{\partial t}(x_i,t_j) = \frac{u(x_i,t_j+k) - u(x_i,t_j)}{k} - \frac{k}{2}\frac{\partial^2 u}{\partial t^2}(x_i,\tau_j), \qquad (9.58)$$
where $\tau_j \in (t_j,t_{j+1})$, and
$$\frac{\partial^2 u}{\partial x^2}(x_i,t_j) = \frac{u(x_i+h,t_j) - 2u(x_i,t_j) + u(x_i-h,t_j)}{h^2} - \frac{h^2}{12}\frac{\partial^4 u}{\partial x^4}(\xi_i,t_j), \qquad (9.59)$$
where $\xi_i$ is a point between $x_{i-1}$ and $x_{i+1}$. Replacing expressions (9.58) and (9.59) in equation (9.52), we obtain the linear system
$$\frac{w_{i,j+1} - w_{i,j}}{k} - \alpha^2\frac{w_{i+1,j} - 2w_{i,j} + w_{i-1,j}}{h^2} = 0, \quad i=\overline{1,m-1},\ j = 1, 2, \dots, \qquad (9.60)$$
where $w_{ij}$ is the approximate value of $u(x_i,t_j)$.
Observation 9.7 The truncation error is of the order $O(k + h^2)$.

From equation (9.60) we get
$$w_{i,j+1} = \left(1 - \frac{2\alpha^2 k}{h^2}\right)w_{i,j} + \frac{\alpha^2 k}{h^2}(w_{i+1,j} + w_{i-1,j}), \quad i=\overline{1,m-1},\ j = 1, 2, \dots \qquad (9.61)$$
Condition (9.53) leads to
$$w_{i,0} = f(x_i), \quad i=\overline{0,m}. \qquad (9.62)$$
With these values, we can determine $w_{i,1}$, $i=\overline{1,m-1}$. From the frontier condition (9.54) we obtain
$$w_{0,1} = w_{m,1} = 0. \qquad (9.63)$$
Applying the described procedure now with the known values $w_{i,1}$, we can determine, step by step, the values $w_{i,2}, w_{i,3}, \dots$ The tridiagonal quadratic matrix of order $m-1$ associated with the linear system has the form
$$A = \begin{pmatrix} 1-2\lambda & \lambda & 0 & \cdots & 0 & 0\\ \lambda & 1-2\lambda & \lambda & \cdots & 0 & 0\\ \vdots & & \ddots & \ddots & & \vdots\\ 0 & 0 & \cdots & \lambda & 1-2\lambda & \lambda\\ 0 & 0 & \cdots & 0 & \lambda & 1-2\lambda \end{pmatrix}, \qquad (9.64)$$
where
$$\lambda = \frac{\alpha^2 k}{h^2}. \qquad (9.65)$$
If we now denote
$$w^{(j)} = \begin{pmatrix} w_{1,j}\\ w_{2,j}\\ \vdots\\ w_{m-1,j}\end{pmatrix}, \quad j = 1, 2, \dots, \qquad (9.66)$$
$$w^{(0)} = \begin{pmatrix} f(x_1)\\ f(x_2)\\ \vdots\\ f(x_{m-1})\end{pmatrix}, \qquad (9.67)$$
then the approximate solution of problems (9.52)–(9.54) is given by the matrix equation
$$w^{(j)} = A\,w^{(j-1)}, \quad j = 1, 2, \dots \qquad (9.68)$$

Definition 9.6 The technique just presented is called the forward-difference method.
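A minimal sketch of one run of the forward-difference method follows; the data f(x) = sin x, α = 1, l = π (for which the exact solution is e^(−t) sin x, compare Example 9.7) and the step choice enforcing λ ≤ 1/2 (see (9.76) below) are assumptions made for illustration.

```python
# Sketch of the forward-difference schema (9.61) for (9.52)-(9.54).
import numpy as np

alpha, l, m = 1.0, np.pi, 20
h = l / m
k = 0.4 * h ** 2 / alpha ** 2      # keeps lambda = alpha^2 k / h^2 <= 1/2
lam = alpha ** 2 * k / h ** 2

x = h * np.arange(m + 1)
w = np.sin(x)                      # w_{i,0} = f(x_i), relation (9.62)
t, T = 0.0, 0.5
while t < T:
    w[1:-1] = (1 - 2 * lam) * w[1:-1] + lam * (w[2:] + w[:-2])   # (9.61)
    w[0] = w[m] = 0.0              # frontier condition (9.54)
    t += k

print("max error:", np.abs(w - np.exp(-t) * np.sin(x)).max())
```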
If we denote the error in the representation of the initial data $w^{(0)}$ by $\varepsilon^{(0)}$, then $w^{(1)}$ reads
$$w^{(1)} = A(w^{(0)} + \varepsilon^{(0)}) = A\,w^{(0)} + A\,\varepsilon^{(0)}, \qquad (9.69)$$
so that the representation error of $w^{(1)}$ is given by $A\varepsilon^{(0)}$. Step by step, we obtain the representation error $A^n\varepsilon^{(0)}$ of $w^{(n)}$. Hence, the method is stable if and only if
$$\|A^n\varepsilon^{(0)}\| \le \|\varepsilon^{(0)}\|, \quad n = 1, 2, \dots \qquad (9.70)$$
This implies $\|A^n\| \le 1$, where $\|\cdot\|$ denotes any of the canonical norms; it follows that the spectral radius of the matrix $A^n$ must be at most equal to unity,
$$\rho(A^n) = [\rho(A)]^n \le 1. \qquad (9.71)$$
This happens if all the eigenvalues of the matrix $A$ are at most equal to unity in modulus. On the other hand, the eigenvalues of the matrix $A$ are given by
$$\mu_i = 1 - 4\lambda\sin^2\frac{\pi i}{2m}, \quad i=\overline{1,m-1}, \qquad (9.72)$$
while the stability condition
$$\left|1 - 4\lambda\sin^2\frac{\pi i}{2m}\right| \le 1, \quad i=\overline{1,m-1}, \qquad (9.73)$$
leads to
$$0 \le \lambda\sin^2\frac{\pi i}{2m} \le \frac{1}{2}, \quad i=\overline{1,m-1}. \qquad (9.74)$$
Making now $m \to \infty$ (or, equivalently, $h \to 0$), we get
$$\lim_{m\to\infty}\sin^2\left[(m-1)\frac{\pi}{2m}\right] = 1, \qquad (9.75)$$
hence the searched condition is
$$0 \le \lambda = \frac{\alpha^2 k}{h^2} \le \frac{1}{2}. \qquad (9.76)$$
The schema presented above is thus conditionally stable.

An unconditionally stable schema starts from the relation
$$\frac{\partial u}{\partial t}(x_i,t_j) = \frac{u(x_i,t_j) - u(x_i,t_{j-1})}{k} + \frac{k}{2}\frac{\partial^2 u}{\partial t^2}(x_i,\tau_j), \qquad (9.77)$$
where $\tau_j$ is a point between $t_{j-1}$ and $t_j$, as well as from formula (9.59). We obtain
$$\frac{w_{i,j} - w_{i,j-1}}{k} - \alpha^2\frac{w_{i+1,j} - 2w_{i,j} + w_{i-1,j}}{h^2} = 0, \quad w_{ij} \approx u(x_i,t_j). \qquad (9.78)$$

Definition 9.7 The method presented above is called the backward-difference method.

Equation (9.78) is written in the form
$$(1+2\lambda)w_{i,j} - \lambda w_{i+1,j} - \lambda w_{i-1,j} = w_{i,j-1}, \quad i=\overline{1,m-1},\ j = 1, 2, \dots \qquad (9.79)$$
Because $w_{i,0} = f(x_i)$, $i=\overline{1,m-1}$, and $w_{0,j} = w_{m,j} = 0$, $j = 1, 2, \dots$, the linear system takes the matrix form
$$A\,w^{(j)} = w^{(j-1)}, \qquad (9.80)$$
where the matrix $A$ is
$$A = \begin{pmatrix} 1+2\lambda & -\lambda & 0 & \cdots & 0 & 0\\ -\lambda & 1+2\lambda & -\lambda & \cdots & 0 & 0\\ \vdots & & \ddots & \ddots & & \vdots\\ 0 & 0 & \cdots & -\lambda & 1+2\lambda & -\lambda\\ 0 & 0 & \cdots & 0 & -\lambda & 1+2\lambda\end{pmatrix}. \qquad (9.81)$$
The solving algorithm for problems (9.52)–(9.54) is as follows:
– given m > 0, k > 0, N > 0, T = kN, l;
– calculate h = l/m;
– for i from 0 to m do
  – calculate $x_i = ih$;
– calculate $\lambda = \alpha^2 k/h^2$;
– for i from 1 to m − 1 do
  – calculate $w_{i,0} = f(x_i)$;
– for j from 1 to N do
  – calculate $w_{0,j} = w_{m,j} = 0$;
– calculate $l_1 = 1 + 2\lambda$, $u_1 = -\lambda/l_1$;
– for n from 2 to m − 2 do
  – calculate $l_n = 1 + 2\lambda + \lambda u_{n-1}$, $u_n = -\lambda/l_n$;
– calculate $l_{m-1} = 1 + 2\lambda + \lambda u_{m-2}$;
– for j from 0 to N − 1 do
  – calculate $z_1 = w_{1,j}/l_1$;
  – for n from 2 to m − 1 do
    – calculate $z_n = (w_{n,j} + \lambda z_{n-1})/l_n$;
  – calculate $w_{m-1,j+1} = z_{m-1}$;
  – for n from m − 2 downto 1 do
    – calculate $w_{n,j+1} = z_n - u_n w_{n+1,j+1}$.
The values $w_{i,j}$ approximate $u(x_i,t_j)$, $i=\overline{0,m}$, $j=\overline{0,N}$.

In the case of the above algorithm, the matrix $A$ has the eigenvalues
$$\mu_i = 1 + 4\lambda\sin^2\frac{i\pi}{2m}, \quad i=\overline{1,m-1}, \qquad (9.82)$$
all of them being positive and greater than unity. Thus, the eigenvalues of the matrix $A^{-1}$ are positive and less than unity; hence the backward-difference method is unconditionally stable.
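A compact cross-check of the backward-difference method in the matrix form (9.80)–(9.81) follows; for brevity a dense solve replaces the factorization of the algorithm above, and the heat-equation data of Example 9.8 below are used as an assumed test case.

```python
# Sketch of the backward-difference method, A w^(j) = w^(j-1), (9.80)-(9.81).
import numpy as np

alpha, l, m, k, N = 1.0, np.pi, 20, 0.1, 5      # T = kN = 0.5
h = l / m
lam = alpha ** 2 * k / h ** 2                   # no restriction on lambda

x = h * np.arange(m + 1)
off = np.diag(np.ones(m - 2), 1) + np.diag(np.ones(m - 2), -1)
A = (1 + 2 * lam) * np.eye(m - 1) - lam * off   # matrix (9.81)

w = np.sin(x[1:-1])                             # interior initial values f(x_i)
for j in range(N):
    w = np.linalg.solve(A, w)                   # one implicit step

print("max error:", np.abs(w - np.exp(-k * N) * np.sin(x[1:-1])).max())
```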
Using for $\partial u(x_i,t_j)/\partial t$ the formula with central differences,
$$\frac{\partial u}{\partial t}(x_i,t_j) = \frac{u(x_i,t_{j+1}) - u(x_i,t_{j-1})}{2k} - \frac{k^2}{6}\frac{\partial^3 u}{\partial t^3}(x_i,\tau_j), \qquad (9.83)$$
where $\tau_j$ is between $t_{j-1}$ and $t_{j+1}$, and for $\partial^2 u(x_i,t_j)/\partial x^2$ formula (9.59), we obtain the approximating system
$$\frac{w_{i,j+1} - w_{i,j-1}}{2k} - \alpha^2\frac{w_{i+1,j} - 2w_{i,j} + w_{i-1,j}}{h^2} = 0, \quad w_{ij} \approx u(x_i,t_j). \qquad (9.84)$$

Definition 9.8 The method put in evidence by relation (9.84) bears the name of Richardson.⁴

Observation 9.8
(i) The truncation error of Richardson's method is of order $O(h^2 + k^2)$.
(ii) Richardson's method is nevertheless unstable for any ratio $\lambda = \alpha^2 k/h^2$, so it cannot be used as it stands.

An unconditionally stable method leads to the equation with finite differences
$$\frac{w_{i,j+1} - w_{i,j}}{k} - \frac{\alpha^2}{2}\left[\frac{w_{i+1,j} - 2w_{i,j} + w_{i-1,j}}{h^2} + \frac{w_{i+1,j+1} - 2w_{i,j+1} + w_{i-1,j+1}}{h^2}\right] = 0. \qquad (9.85)$$

Definition 9.9 The method given by formula (9.85) is called the Crank–Nicolson method.

Observation 9.9 The truncation error in the Crank–Nicolson method is of order $O(h^2 + k^2)$.

The Crank–Nicolson system may be written in the matrix form
$$A\,w^{(j+1)} = B\,w^{(j)}, \quad j = 0, 1, 2, \dots, \qquad (9.86)$$
the matrices $A$ and $B$ being given by
$$A = \begin{pmatrix} 1+\lambda & -\frac{\lambda}{2} & \cdots & 0\\ -\frac{\lambda}{2} & 1+\lambda & \ddots & \vdots\\ \vdots & \ddots & \ddots & -\frac{\lambda}{2}\\ 0 & \cdots & -\frac{\lambda}{2} & 1+\lambda\end{pmatrix}, \qquad (9.87)$$
$$B = \begin{pmatrix} 1-\lambda & \frac{\lambda}{2} & \cdots & 0\\ \frac{\lambda}{2} & 1-\lambda & \ddots & \vdots\\ \vdots & \ddots & \ddots & \frac{\lambda}{2}\\ 0 & \cdots & \frac{\lambda}{2} & 1-\lambda\end{pmatrix}. \qquad (9.88)$$
The Crank–Nicolson algorithm⁵ for solving problems (9.52)–(9.54) is as follows:
– given: m > 0, k > 0, N > 0, T = kN, l;
– calculate h = l/m;
– for i from 0 to m do
  – calculate $x_i = ih$;
– for j from 0 to N do
  – calculate $t_j = jk$;
– calculate $\lambda = \alpha^2 k/h^2$;
– for i from 1 to m − 1 do
  – calculate $w_{i,0} = f(x_i)$;
– for j from 1 to N do
  – calculate $w_{0,j} = w_{m,j} = 0$;
– calculate $l_1 = 1 + \lambda$, $u_1 = -\lambda/(2l_1)$;
– for n from 2 to m − 2 do
  – calculate $l_n = 1 + \lambda + \lambda u_{n-1}/2$, $u_n = -\lambda/(2l_n)$;
– calculate $l_{m-1} = 1 + \lambda + \lambda u_{m-2}/2$;
– for j from 0 to N − 1 do
  – calculate $z_1 = [(1-\lambda)w_{1,j} + (\lambda/2)w_{2,j}]/l_1$;
  – for n from 2 to m − 1 do
    – calculate $z_n = [(1-\lambda)w_{n,j} + (\lambda/2)w_{n+1,j} + (\lambda/2)w_{n-1,j} + (\lambda/2)z_{n-1}]/l_n$;
  – calculate $w_{m-1,j+1} = z_{m-1}$;
  – for n from m − 2 downto 1 do
    – calculate $w_{n,j+1} = z_n - u_n w_{n+1,j+1}$.
Finally, $w_{i,j}$ approximates $u(x_i,t_j)$, $i=\overline{0,m}$, $j=\overline{0,N}$.

⁴After Lewis Fry Richardson (1881–1953), who presented it in 1922.
⁵John Crank (1916–2006) and Phyllis Nicolson (1917–1968) published this algorithm in "A Practical Method for Numerical Evaluation of Solutions of Partial Differential Equations of the Heat Conduction Type" in 1947.
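The same assumed test problem treated by the Crank–Nicolson matrices (9.87)–(9.88), again with a dense solve standing in for the factorization of the algorithm above; compare Table 9.5 in Example 9.9.

```python
# Sketch of the Crank-Nicolson method, A w^(j+1) = B w^(j), (9.86)-(9.88).
import numpy as np

alpha, l, m, k, N = 1.0, np.pi, 20, 0.1, 5
h = l / m
lam = alpha ** 2 * k / h ** 2

x = h * np.arange(m + 1)
off = np.diag(np.ones(m - 2), 1) + np.diag(np.ones(m - 2), -1)
A = (1 + lam) * np.eye(m - 1) - 0.5 * lam * off   # matrix (9.87)
B = (1 - lam) * np.eye(m - 1) + 0.5 * lam * off   # matrix (9.88)

w = np.sin(x[1:-1])                               # initial values f(x_i)
for j in range(N):
    w = np.linalg.solve(A, B @ w)                 # one Crank-Nicolson step

print("max error:", np.abs(w - np.exp(-k * N) * np.sin(x[1:-1])).max())
```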
9.6 PARTIAL DIFFERENTIAL EQUATIONS OF SECOND ORDER OF HYPERBOLIC TYPE

We start from the equation
$$\frac{\partial^2 u}{\partial t^2}(x,t) - \alpha^2\frac{\partial^2 u}{\partial x^2}(x,t) = 0, \quad 0 < x < l,\ t > 0, \qquad (9.89)$$
to which the conditions
$$u(0,t) = u(l,t) = 0, \quad t > 0, \qquad (9.90)$$
$$u(x,0) = f(x), \quad 0 \le x \le l, \qquad (9.91)$$
$$\frac{\partial u}{\partial t}(x,0) = g(x), \quad 0 \le x \le l, \qquad (9.92)$$
are added; $\alpha$ is a real constant in equation (9.89). Let us choose a nonzero natural number $m$ and a time step $k > 0$ and denote
$$h = \frac{l}{m}. \qquad (9.93)$$
Thus, the knots $(x_i,t_j)$ of the net are given by
$$x_i = ih, \quad i=\overline{0,m}, \qquad (9.94)$$
$$t_j = jk, \quad j = 0, 1, \dots \qquad (9.95)$$
Let $A_{i,j}(x_i,t_j)$ be an interior point of the net. We can write the relation
$$\frac{\partial^2 u}{\partial t^2}(x_i,t_j) - \alpha^2\frac{\partial^2 u}{\partial x^2}(x_i,t_j) = 0 \qquad (9.96)$$
at this point. Using the central differences of second order, we can write
$$\frac{\partial^2 u}{\partial t^2}(x_i,t_j) = \frac{u(x_i,t_{j+1}) - 2u(x_i,t_j) + u(x_i,t_{j-1})}{k^2} - \frac{k^2}{12}\frac{\partial^4 u}{\partial t^4}(x_i,\tau_j), \qquad (9.97)$$
where $\tau_j$ is an intermediary value between $t_{j-1}$ and $t_{j+1}$, and
$$\frac{\partial^2 u}{\partial x^2}(x_i,t_j) = \frac{u(x_{i+1},t_j) - 2u(x_i,t_j) + u(x_{i-1},t_j)}{h^2} - \frac{h^2}{12}\frac{\partial^4 u}{\partial x^4}(\xi_i,t_j), \qquad (9.98)$$
where $\xi_i \in (x_{i-1},x_{i+1})$. It follows that
$$\frac{u(x_i,t_{j+1}) - 2u(x_i,t_j) + u(x_i,t_{j-1})}{k^2} - \alpha^2\frac{u(x_{i+1},t_j) - 2u(x_i,t_j) + u(x_{i-1},t_j)}{h^2} = \frac{1}{12}\left[k^2\frac{\partial^4 u}{\partial t^4}(x_i,\tau_j) - \alpha^2 h^2\frac{\partial^4 u}{\partial x^4}(\xi_i,t_j)\right], \qquad (9.99)$$
which will be approximated by
$$\frac{w_{i,j+1} - 2w_{i,j} + w_{i,j-1}}{k^2} - \alpha^2\frac{w_{i+1,j} - 2w_{i,j} + w_{i-1,j}}{h^2} = 0. \qquad (9.100)$$
Denoting
$$\lambda = \frac{\alpha k}{h}, \qquad (9.101)$$
we obtain
$$w_{i,j+1} - 2w_{i,j} + w_{i,j-1} - \lambda^2 w_{i+1,j} + 2\lambda^2 w_{i,j} - \lambda^2 w_{i-1,j} = 0 \qquad (9.102)$$
from equation (9.100), or equivalently,
$$w_{i,j+1} = 2(1-\lambda^2)w_{i,j} + \lambda^2(w_{i+1,j} + w_{i-1,j}) - w_{i,j-1}, \quad i=\overline{1,m-1},\ j = 1, 2, \dots \qquad (9.103)$$
The frontier conditions (9.90) give
$$w_{0,j} = w_{m,j} = 0, \quad j = 1, 2, \dots, \qquad (9.104)$$
while the initial condition (9.91) leads to
$$w_{i,0} = f(x_i), \quad i=\overline{1,m-1}. \qquad (9.105)$$
We obtain the matrix equation
$$w^{(j+1)} = A\,w^{(j)} - w^{(j-1)}, \qquad (9.106)$$
where
$$w^{(k)} = \begin{pmatrix} w_{1,k}\\ w_{2,k}\\ \vdots\\ w_{m-1,k}\end{pmatrix}, \qquad (9.107)$$
$$A = \begin{pmatrix} 2(1-\lambda^2) & \lambda^2 & \cdots & 0\\ \lambda^2 & 2(1-\lambda^2) & \ddots & \vdots\\ \vdots & \ddots & \ddots & \lambda^2\\ 0 & \cdots & \lambda^2 & 2(1-\lambda^2)\end{pmatrix}. \qquad (9.108)$$
Observation 9.10 We notice that the determination of $w^{(j+1)}$ requires both $w^{(j)}$ and $w^{(j-1)}$, which creates difficulties for $j = 0$, because the values $w_{i,1}$, $i=\overline{1,m-1}$, must be determined from condition (9.92).

Usually, $\partial u/\partial t$ is replaced by the difference expression
$$\frac{\partial u}{\partial t}(x_i,0) = \frac{u(x_i,t_1) - u(x_i,t_0)}{k} - \frac{k}{2}\frac{\partial^2 u}{\partial t^2}(x_i,\tau_i), \qquad (9.109)$$
where $\tau_i \in (0,k)$. Thus, it follows that
$$w_{i,1} = w_{i,0} + k\,g(x_i), \quad i=\overline{1,m-1}, \qquad (9.110)$$
which leads to an error $O(k)$ in the initial data. On the other hand, the local truncation error for equation (9.103) is of order $O(h^2 + k^2)$; we wish to have an error of order $O(k^2)$ for the initial data too. We have
$$\frac{u(x_i,t_1) - u(x_i,t_0)}{k} = \frac{\partial u}{\partial t}(x_i,0) + \frac{k}{2}\frac{\partial^2 u}{\partial t^2}(x_i,0) + \frac{k^2}{6}\frac{\partial^3 u}{\partial t^3}(x_i,\tau_i), \qquad (9.111)$$
where $\tau_i \in (0,k)$. Supposing that equation (9.89) holds on the initial line too, that is,
$$\frac{\partial^2 u}{\partial t^2}(x_i,0) - \alpha^2\frac{\partial^2 u}{\partial x^2}(x_i,0) = 0, \quad i=\overline{0,m}, \qquad (9.112)$$
and that $f''(x)$ exists, then we may write
$$\frac{\partial^2 u}{\partial t^2}(x_i,0) = \alpha^2\frac{\partial^2 u}{\partial x^2}(x_i,0) = \alpha^2\frac{d^2 f(x_i)}{dx^2} = \alpha^2 f''(x_i). \qquad (9.113)$$
But
$$f''(x_i) = \frac{f(x_{i+1}) - 2f(x_i) + f(x_{i-1})}{h^2} - \frac{h^2}{12}f^{(4)}(\xi_i), \qquad (9.114)$$
where $\xi_i$ is between $x_{i-1}$ and $x_{i+1}$, provided that $f \in C^4([0,l])$, and we obtain
$$\frac{u(x_i,t_1) - u(x_i,0)}{k} = g(x_i) + \frac{k\alpha^2}{2h^2}[f(x_{i+1}) - 2f(x_i) + f(x_{i-1})] + O(h^2 + k^2). \qquad (9.115)$$
We get successively
$$u(x_i,t_1) = u(x_i,0) + k\,g(x_i) + \frac{\lambda^2}{2}[f(x_{i+1}) - 2f(x_i) + f(x_{i-1})] + O(k^3 + h^2k^2) = (1-\lambda^2)f(x_i) + \frac{\lambda^2}{2}f(x_{i+1}) + \frac{\lambda^2}{2}f(x_{i-1}) + k\,g(x_i) + O(k^3 + h^2k^2). \qquad (9.116)$$
It follows that the determination of the values $w_{i,1}$, $i=\overline{1,m-1}$, can be made by means of the relation
$$w_{i,1} = (1-\lambda^2)f(x_i) + \frac{\lambda^2}{2}f(x_{i+1}) + \frac{\lambda^2}{2}f(x_{i-1}) + k\,g(x_i). \qquad (9.117)$$
Thus, the algorithm with finite differences used to solve problems (9.89)–(9.92) is
– given: m, N > 0, k > 0, l, α, f(x), g(x);
– calculate h = l/m, T = kN, λ = αk/h;
– for i from 0 to m do
  – calculate $x_i = ih$;
– for j from 0 to N do
  – calculate $t_j = jk$;
– for j from 1 to N do
  – calculate $w_{0,j} = w_{m,j} = 0$;
– for i from 0 to m do
  – calculate $w_{i,0} = f(x_i)$;
– for i from 1 to m − 1 do
  – calculate $w_{i,1} = (1-\lambda^2)f(x_i) + (\lambda^2/2)(f(x_{i+1}) + f(x_{i-1})) + k\,g(x_i)$;
– for j from 1 to N − 1 do
  – for i from 1 to m − 1 do
    – calculate $w_{i,j+1} = 2(1-\lambda^2)w_{i,j} + \lambda^2(w_{i+1,j} + w_{i-1,j}) - w_{i,j-1}$.
Thus, $w_{i,j}$ approximates $u(x_i,t_j)$, $i=\overline{0,m}$, $j=\overline{0,N}$.
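A minimal sketch of this algorithm, on the data of Example 9.10 below (f(x) = sin πx, g ≡ 0, with the step sizes used there), is the following.

```python
# Sketch of the finite-difference algorithm for the wave problem (9.89)-(9.92).
import numpy as np

alpha, l, m, k, N = 1.0, 1.0, 20, 0.01, 60      # h = 0.05, T = kN = 0.6
h = l / m
lam = alpha * k / h                             # lambda = alpha k / h, (9.101)

x = h * np.arange(m + 1)
fx = np.sin(np.pi * x)                          # f(x_i)
gx = np.zeros_like(x)                           # g(x_i)

w_prev = fx.copy()                              # w_{i,0}, relation (9.105)
w = np.zeros_like(fx)
w[1:-1] = ((1 - lam ** 2) * fx[1:-1]
           + 0.5 * lam ** 2 * (fx[2:] + fx[:-2])
           + k * gx[1:-1])                      # starting values (9.117)
for j in range(1, N):
    w_next = np.zeros_like(w)                   # frontier values stay zero, (9.104)
    w_next[1:-1] = (2 * (1 - lam ** 2) * w[1:-1]
                    + lam ** 2 * (w[2:] + w[:-2]) - w_prev[1:-1])   # (9.103)
    w_prev, w = w, w_next

exact = np.sin(np.pi * x) * np.cos(np.pi * k * N)
print("max error:", np.abs(w - exact).max())
```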
9.7 POINT MATCHING METHOD

This method⁶ was developed in the middle of the twentieth century. We present it in the two-dimensional case for partial differential equations of elliptic type, particularly for biharmonic equations (it may be used in the same way for polyharmonic equations too). The method fits well the plane problem of the theory of elasticity, formulated for a plane domain D.

Some methods of calculation (e.g., the variational methods) allow one to obtain, with an approximation as good as we wish, the searched function (the solution of the partial differential equation) and its derivatives at any point of the domain D. Other methods (the finite-difference method, the net method, the relaxation method, etc.) give an approximate value of the searched function at a finite number of points in the interior of the domain, satisfying the boundary conditions also at a finite number of points. We can imagine a method of calculation that uses ideas from both types of methods.

The method consists in searching for an analytic function of a form as simple as possible, which verifies the partial differential equation at every point of D except on the boundary, where the equation is satisfied only at a finite number of points. We thus search for a biharmonic function
$$F(x,y) = \sum_{i=2}^n P_i(x,y), \qquad (9.118)$$
where $P_i(x,y)$ are biharmonic polynomials ($\Delta\Delta P_i = 0$) of $i$th degree, $i = 2, 3, \dots$ We notice that such a polynomial implies four arbitrary constants, except $P_2(x,y)$, which contains only three such constants. Thus, $F(x,y)$ contains $4n-5$ arbitrary constants.

At a point of the boundary we may impose two conditions, that is, one for the function F (or for its tangential derivative ∂F/∂s) and one for the normal derivative ∂F/∂n. Hence, for a point of the contour we get two conditions on the constants to be determined. If we put boundary conditions at 2n − 3 points of the contour, we find a system of 4n − 6 equations with 4n − 5 unknowns, which will determine the coefficients of the biharmonic polynomials; one of these constants must be taken arbitrarily.

Figure 9.3 Point matching method.

Let B₁ and B₂ be the distributions of the real boundary conditions and B′₁ and B′₂ the boundary conditions obtained after calculation (Fig. 9.3). The differences ΔB₁ = B₁ − B′₁, ΔB₂ = B₂ − B′₂ must be as small as possible, so that the error in the determination of the biharmonic function will also be as small as possible. The estimation of the error may be made from case to case from the physical point of view.

As an advantage, we mention that the contour can be a complicated one and that one gets an analytical expression for the solution. Besides elementary representations (biharmonic polynomials), we may also use other functions, adequate for some particular problems. We have to solve a system of linear algebraic equations, so that various methods of calculation can be used. In fact, the method considered above is a collocation method.

9.8 VARIATIONAL METHODS

Let us consider the functional
$$I(y) = \int_{x_0}^{x_1} f(x, y, y')\,dx, \qquad (9.119)$$
where $f$ is continuous, together with its derivatives up to the second order inclusive, in a domain of $\mathbb{R}^3$, $y = y(x)$ is continuous with continuous derivative $y' = dy/dx$, and $y(x_0) = y_0$, $y(x_1) = y_1$. It follows that the minimizing function $y$ verifies Euler's equation
$$\frac{\partial f}{\partial y} - \frac{d}{dx}\frac{\partial f}{\partial y'} = 0. \qquad (9.120)$$
If the functional is of the form
$$I(y) = \int\cdots\int f\!\left(x_1, x_2, \dots, x_n, y, \frac{\partial y}{\partial x_1}, \frac{\partial y}{\partial x_2}, \dots, \frac{\partial y}{\partial x_n}\right)dx_1\,dx_2\cdots dx_n, \qquad (9.121)$$
then Euler's equation reads
$$\frac{\partial f}{\partial y} - \frac{d}{dx_1}\frac{\partial f}{\partial(\partial y/\partial x_1)} - \frac{d}{dx_2}\frac{\partial f}{\partial(\partial y/\partial x_2)} - \cdots - \frac{d}{dx_n}\frac{\partial f}{\partial(\partial y/\partial x_n)} = 0. \qquad (9.122)$$

⁶Also known as the collocation method. It was introduced by Leonid Vitaliyevich Kantorovich (1912–1986) in 1934.
In the general case, we consider the equation
$$Lu = f, \qquad (9.123)$$
where $L$ is a self-adjoint positive linear operator, with domain of definition D in the Hilbert space H with scalar product $\langle\cdot,\cdot\rangle$, taking values in H, $u \in D$, while $f \in H$.

Proposition 9.1 If the solution of problem (9.123) exists, then it assures the minimum of the functional
$$I(u) = \langle Lu, u\rangle - 2\langle u, f\rangle. \qquad (9.124)$$

Proof. Let $u$ be the solution of problem (9.123) and $v \in D$ arbitrary and nonzero. If $c$ is a real nonzero number, then we consider
$$v_c = u + cv \qquad (9.125)$$
and we may write
$$I(v_c) = \langle L(u+cv), u+cv\rangle - 2\langle u+cv, f\rangle. \qquad (9.126)$$
Because $L$ is self-adjoint, we have
$$I(v_c) = I(u) + 2c\langle Lu - f, v\rangle + c^2\langle Lv, v\rangle, \qquad (9.127)$$
and, since $Lu = f$,
$$I(v_c) = I(u) + c^2\langle Lv, v\rangle; \qquad (9.128)$$
because $L$ is positive, it follows that
$$I(v_c) > I(u) \qquad (9.129)$$
for any $c \ne 0$. Hence $u$ minimizes the functional $I(u)$.

Proposition 9.2 If $u \in H$ minimizes the functional $I(u)$ and $u \in D$, then $Lu = f$.

Proof. Let $v \in D$ be arbitrary. Because $u + cv \in D$ for any real constant $c$, and the functional $I(u)$ attains its minimum at $u$, we get
$$I(u + cv) \ge I(u). \qquad (9.130)$$
Because $L$ is symmetric, from equation (9.130) we obtain
$$2c\langle Lu - f, v\rangle + c^2\langle Lv, v\rangle \ge 0 \qquad (9.131)$$
for any real $c$; this necessarily requires
$$\langle Lu - f, v\rangle = 0, \qquad (9.132)$$
that is, $Lu - f$ is orthogonal to every element of D; hence
$$Lu - f = 0. \qquad (9.133)$$
9.8.1 Ritz's Method

In the frame of this method,⁷ we consider the problem
$$Lu = f \qquad (9.134)$$
in the Hilbert space H, with the scalar product $\langle\cdot,\cdot\rangle$; D is the domain of definition of L, considered dense in H, while L is a positive definite self-adjoint operator. The problem is equivalent to finding the element $u \in D$ which minimizes the functional
$$I(u) = \langle Lu, u\rangle - 2\langle f, u\rangle. \qquad (9.135)$$
To ensure the existence of the solution, we consider a new scalar product in H, defined by
$$\langle u, v\rangle_L = \langle Lu, v\rangle, \quad u, v \in D, \qquad (9.136)$$
the norm being given by
$$\|u\|_L = \sqrt{\langle u, u\rangle_L}. \qquad (9.137)$$

Definition 9.10 We call the energetic space defined by the operator L the space obtained by completing D in the norm $\|\cdot\|_L$. We denote this space by $H_L$.

We may write
$$I(u) = \langle u, u\rangle_L - 2\langle f, u\rangle, \quad u \in D. \qquad (9.138)$$
Because L is positive definite, that is,
$$\langle Lu, u\rangle = \langle u, u\rangle_L \ge c^2\|u\|^2, \quad u \in D, \qquad (9.139)$$
with $c > 0$ a constant, by completing D to $H_L$ it follows that $\langle u, u\rangle_L \ge c^2\|u\|^2$ for any $u \in H_L$. On the other hand,
$$|\langle u, f\rangle| \le \|u\|\,\|f\| \le \frac{1}{c}\|u\|_L\|f\| = B\|u\|_L, \qquad (9.140)$$
so that $\langle u, f\rangle$ is a bounded linear functional of $u$, and we may apply Riesz's representation theorem. It follows that there exists $u_0 \in H_L$ such that for any $u \in H_L$ we have
$$\langle u, f\rangle = \langle u, u_0\rangle_L. \qquad (9.141)$$
Thus, the functional reads
$$I(u) = \langle u, u\rangle_L - 2\langle f, u\rangle = \langle u, u\rangle_L - 2\langle u, u_0\rangle_L = \|u - u_0\|_L^2 - \|u_0\|_L^2, \qquad (9.142)$$
with $u \in H_L$; hence it attains its minimum for $u = u_0$.

Definition 9.11 The element $u_0 \in H_L$ bears the name of generalized solution of the equation $Lu = f$.

Observation 9.11 If $u_0 \in D$, then $u_0$ is the classical solution of problem (9.134).

⁷After Walther Ritz (1878–1909), who published this method in 1909.
We now consider a sequence of finite-dimensional subspaces $H_k \subseteq H_L$, indexed by the parameters $k_1, k_2, \dots$, with $k_i \to 0$ for $i \to \infty$.

Definition 9.12 We say that the sequence $\{H_k\}$ is complete in $H_L$ if for any $u \in H_L$ and $\varepsilon > 0$ there exists $\bar{k} = \bar{k}(u,\varepsilon) > 0$ such that
$$\inf_{v\in H_k}\|u - v\|_L < \varepsilon \qquad (9.143)$$
for any $k < \bar{k}$.

From the previous definition we deduce that if $\{H_k\}$ is complete, then any element $u \in H_L$ may be approximated to any desired precision by elements of $H_k$. We ask to determine the element $u^k \in H_k$ which minimizes the functional $I(u)$ in $H_k$.

Proposition 9.3 Under the above conditions, the sequence $\{u^k\}$ of Ritz approximations of the solution of the equation $Lu = f$ converges to the generalized solution of this problem.

Proof. For $v \in H_k$ we have
$$\|u_0 - u^k\|_L^2 = I(u^k) - I(u_0) \le I(v) - I(u_0) = \|u_0 - v\|_L^2. \qquad (9.144)$$
Because $v$ is arbitrary, we may write
$$\|u_0 - u^k\|_L^2 \le \inf_{v\in H_k}\|u_0 - v\|_L^2 \longrightarrow 0 \quad (k \to 0). \qquad (9.145)$$

If a basis of the space $H_k$ formed by the functions $\varphi_1^k, \varphi_2^k, \dots, \varphi_{n_k}^k$ ($n_k$ being the dimension of the space $H_k$) is known, then the determination of $u^k \in H_k$ is equivalent to the determination of the coefficients $c_1, c_2, \dots, c_{n_k}$ in the expansion
$$u^k = c_1\varphi_1^k + c_2\varphi_2^k + \cdots + c_{n_k}\varphi_{n_k}^k. \qquad (9.146)$$
We obtain the system
$$Ac = g, \qquad (9.147)$$
where
$$c = \begin{pmatrix} c_1 & \cdots & c_{n_k}\end{pmatrix}^T, \qquad (9.148)$$
$$g = \begin{pmatrix} g_1 & \cdots & g_{n_k}\end{pmatrix}^T, \quad g_i = \langle f, \varphi_i^k\rangle, \quad i=\overline{1,n_k}, \qquad (9.149)$$
$$A = [a_{ij}]_{i,j=\overline{1,n_k}}, \quad a_{ij} = \langle\varphi_i^k, \varphi_j^k\rangle_L, \quad i,j=\overline{1,n_k}. \qquad (9.150)$$
If $\varphi_i^k \in D$, $i=\overline{1,n_k}$, then we may also write
$$a_{ij} = \langle L\varphi_i^k, \varphi_j^k\rangle, \quad i,j=\overline{1,n_k}. \qquad (9.151)$$
Let us remark that the matrix A is symmetric and positive definite, because
$$\langle Av, v\rangle = \sum_{i=1}^{n_k}\sum_{j=1}^{n_k} a_{ij}v_iv_j = \left\langle\sum_{i=1}^{n_k}v_i\varphi_i^k,\ \sum_{j=1}^{n_k}v_j\varphi_j^k\right\rangle_L \ge c^2\left\|\sum_{i=1}^{n_k}v_i\varphi_i^k\right\|^2 \ge 0. \qquad (9.152)$$
Observation 9.12 It is possible that the functions $\varphi_1^k, \varphi_2^k, \dots, \varphi_{n_k}^k$ do not verify the limit conditions imposed on problem (9.134). This is due to the completion of the space to $H_L$.

Definition 9.13
(i) The limit conditions that are necessarily satisfied by the functions of the domain D, but not necessarily by the functions of the energetic space $H_L$, are called natural conditions for the operator L.
(ii) The limit conditions that are necessarily satisfied by the functions of the energetic space $H_L$ are called essential conditions.

Observation 9.13 In the frame of Ritz's method we choose bases in the energetic space; it follows that the functions $\varphi_i^k$, $i=\overline{1,n_k}$, need not be subjected to the natural conditions.

9.8.2 Galerkin's Method

In the frame of Ritz's method it has been required that the operator L be self-adjoint and positive definite, which represents a limitation of the method. In the case of Galerkin's method⁸ we solve the operational equation
$$Lu = f \qquad (9.153)$$
in a Hilbert space H, $f \in H$, the domain D of definition of L being dense in H. We write L in the form
$$L = L_0 + K, \qquad (9.154)$$
where $L_0$ is a positive definite symmetric operator with $L_0^{-1}$ totally continuous in H, while the domain $D_K$ of definition of K satisfies the relation $D_K \supseteq D_{L_0}$, where $D_{L_0}$ is the domain of definition of $L_0$. We also introduce the energetic space $H_{L_0}$ of the operator $L_0$, with the scalar product $\langle u,v\rangle_{L_0}$ and the norm $\|u\|_{L_0}^2 = \langle u,u\rangle_{L_0}$.

Let us take the scalar product of relation (9.153) with an arbitrary function $v \in D_{L_0}$. We obtain
$$\langle L_0u, v\rangle + \langle Ku, v\rangle = \langle f, v\rangle \qquad (9.155)$$
or
$$\langle u, v\rangle_{L_0} + \langle Ku, v\rangle = \langle f, v\rangle. \qquad (9.156)$$

Definition 9.14 We call a generalized solution of equation (9.153) a function $u_0 \in H_{L_0}$ which satisfies relation (9.156) for any $v \in H_{L_0}$.

Observation 9.14 If $u_0 \in D_{L_0}$, then, because
$$\langle u, v\rangle_{L_0} = \langle L_0u, v\rangle, \qquad (9.157)$$
it follows that
$$\langle L_0u_0 + Ku_0 - f, v\rangle = 0, \qquad (9.158)$$
and because $D_{L_0}$ is dense in H, we deduce that $u_0$ satisfies equation (9.153).

⁸Boris Grigoryevich Galerkin (1871–1945) described the method in 1915.
We now construct the spaces $H_k \subseteq H_{L_0}$ with the bases $\varphi_1^k, \varphi_2^k, \dots, \varphi_{n_k}^k$, the approximation of the solution being
$$u^k = \sum_{i=1}^{n_k} c_i\varphi_i^k, \qquad (9.159)$$
where the coefficients $c_i$, $i=\overline{1,n_k}$, are chosen so that $u^k$ verifies relation (9.156) for any $v \in H_k$. On the other hand, because $v \in H_k$, we deduce that $v$ is written in the form
$$v = \sum_{i=1}^{n_k} b_i\varphi_i^k; \qquad (9.160)$$
hence, $u^k$ is determined by the system of equations
$$\langle u^k, \varphi_i^k\rangle_{L_0} + \langle Ku^k, \varphi_i^k\rangle = \langle f, \varphi_i^k\rangle, \quad i=\overline{1,n_k}. \qquad (9.161)$$
The last system may be put in the form
$$Ac = g, \qquad (9.162)$$
where
$$A = [a_{ij}]_{i,j=\overline{1,n_k}}, \quad a_{ij} = \langle\varphi_i^k,\varphi_j^k\rangle_{L_0} + \langle K\varphi_i^k,\varphi_j^k\rangle, \quad i,j=\overline{1,n_k}, \qquad (9.163)$$
$$g = \begin{pmatrix} g_1 & \cdots & g_{n_k}\end{pmatrix}^T, \quad g_i = \langle f, \varphi_i^k\rangle, \quad i=\overline{1,n_k}, \qquad (9.164)$$
$$c = \begin{pmatrix} c_1 & \cdots & c_{n_k}\end{pmatrix}^T. \qquad (9.165)$$

Observation 9.15 If K = 0, then Galerkin's method becomes Ritz's method.

Observation 9.16 We assume that the operator $L_0^{-1}$ exists, bounded and defined on the whole space H. Equation (9.153) is then equivalent to
$$u + L_0^{-1}Ku = L_0^{-1}f. \qquad (9.166)$$
We denote by $H_1$ the Hilbert space with the scalar product
$$\langle u, v\rangle_1 = \langle L_0u, L_0v\rangle \qquad (9.167)$$
and the norm
$$\|u\|_1 = \|L_0u\|. \qquad (9.168)$$
We again construct the subspaces $H_k$, finite dimensional and included in $H_1$, with bases $\psi_i^k$, $i=\overline{1,n_k}$, and search for the approximate solution in the form
$$u^k = \sum_{i=1}^{n_k} c_i\psi_i^k, \qquad (9.169)$$
where $c_i$, $i=\overline{1,n_k}$, are obtained from the system
$$\langle u^k, \psi_i^k\rangle_1 + \langle L_0^{-1}Ku^k, \psi_i^k\rangle_1 = \langle L_0^{-1}f, \psi_i^k\rangle_1, \quad i=\overline{1,n_k}. \qquad (9.170)$$
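As a concrete illustration, here is a minimal sketch of the Ritz–Galerkin system for the model problem Lu = −u″ = f on (0, 1) with u(0) = u(1) = 0 and the basis φᵢ = sin(iπx); the test right-hand side f(x) = x (exact solution u = (x − x³)/6), the number of basis functions, and the quadrature grid are assumptions for illustration. Since K = 0 here, the system is (9.147) and (9.161) at once.

```python
# Sketch of the Ritz/Galerkin system (9.147) for -u'' = f, u(0) = u(1) = 0.
import numpy as np

nk = 8
f = lambda s: s                                  # assumed test right-hand side
xq = np.linspace(0.0, 1.0, 2001)                 # quadrature grid
trap = lambda y: float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(xq)))

phi = lambda i, s: np.sin(i * np.pi * s)
dphi = lambda i, s: i * np.pi * np.cos(i * np.pi * s)

# a_ij = <phi_i, phi_j>_L = <phi_i', phi_j'> after integration by parts, (9.150)
A = np.array([[trap(dphi(i, xq) * dphi(j, xq)) for j in range(1, nk + 1)]
              for i in range(1, nk + 1)])
g = np.array([trap(f(xq) * phi(i, xq)) for i in range(1, nk + 1)])   # (9.149)
c = np.linalg.solve(A, g)                        # system (9.147)

u_k = sum(c[i - 1] * phi(i, xq) for i in range(1, nk + 1))           # (9.146)
print("max error:", np.abs(u_k - (xq - xq ** 3) / 6.0).max())
```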
9.8.3 Method of the Least Squares

Let the operational equation be
$$Lu = f \qquad (9.171)$$
in the Hilbert space H, and let $H_k$ be finite-dimensional subspaces of H with the bases $\varphi_i^k$, $i=\overline{1,n_k}$, and with $H_k \subseteq D$. Starting from the relations
$$\frac{\partial}{\partial c_i}\|Lu^k - f\|^2 = 0, \quad i=\overline{1,n_k}, \qquad (9.172)$$
we obtain system (9.147), in which
$$A = [a_{ij}]_{i,j=\overline{1,n_k}}, \quad a_{ij} = \langle L\varphi_i^k, L\varphi_j^k\rangle, \quad i,j=\overline{1,n_k}, \qquad (9.173)$$
$$g = \begin{pmatrix} g_1 & \cdots & g_{n_k}\end{pmatrix}^T, \quad g_i = \langle f, L\varphi_i^k\rangle, \quad i=\overline{1,n_k}, \qquad (9.174)$$
$$c = \begin{pmatrix} c_1 & \cdots & c_{n_k}\end{pmatrix}^T, \qquad (9.175)$$
the approximate solution being
$$u^k = \sum_{i=1}^{n_k} c_i\varphi_i^k. \qquad (9.176)$$
The approximate solution $u^k$ converges to the exact solution of equation (9.171) if that equation has a unique solution, the sequence of subspaces $LH_k$ is complete in D, and the operator $L^{-1}$ exists and is bounded.

Observation 9.17 The question arises whether the approximate solution verifies the limit conditions of problem (9.171). There are two possibilities of tackling this problem:
(i) we impose on the functions of the space $H_k$ the verification of the limit conditions; but then the method is difficult to apply;
(ii) if $Lu = f$ in D and $L_iu = f_i$ on $\partial D_i$, $i=\overline{1,p}$, are the problem and its limit conditions, then we consider the functional
$$I_k(u) = \|Lu - f\|^2 + \sum_{i=1}^{p} c_i(k)\|L_iu - f_i\|^2, \qquad (9.177)$$
where $c_i(k)$, $i=\overline{1,p}$, are positive functions of the parameter k. If the solution is smooth, then
$$c_i(k) = k^{-2\left(2m - m_i - \frac{1}{2}\right)}, \quad i=\overline{1,p}, \qquad (9.178)$$
where 2m is the order of the partial differential equation $Lu = f$, while $m_i$ is the order of the highest-order derivative in the operator $L_i$, $i=\overline{1,p}$. We then search for the approximations $u^k$ as solutions of the variational problem
$$\inf_{v\in H_k} I_k(v) = I_k(u^k). \qquad (9.179)$$
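On the same assumed test problem as the previous sketch, the least-squares system (9.173)–(9.174) differs only in the inner products, where Lφᵢ replaces φᵢ; a minimal sketch:

```python
# Sketch of the least-squares system (9.173)-(9.174) for -u'' = f.
import numpy as np

nk = 8
f = lambda s: s
xq = np.linspace(0.0, 1.0, 2001)
trap = lambda y: float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(xq)))

Lphi = lambda i, s: (i * np.pi) ** 2 * np.sin(i * np.pi * s)   # L phi_i = -phi_i''

A = np.array([[trap(Lphi(i, xq) * Lphi(j, xq)) for j in range(1, nk + 1)]
              for i in range(1, nk + 1)])                      # (9.173)
g = np.array([trap(f(xq) * Lphi(i, xq)) for i in range(1, nk + 1)])   # (9.174)
c = np.linalg.solve(A, g)

u_k = sum(c[i - 1] * np.sin(i * np.pi * xq) for i in range(1, nk + 1))
print("max error:", np.abs(u_k - (xq - xq ** 3) / 6.0).max())
```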
9.9 NUMERICAL EXAMPLES

Example 9.1 Let us consider the equation of wave propagation
$$\frac{\partial u}{\partial t} + a\frac{\partial u}{\partial x} = 0, \quad x\in[0,1],\ t\in[0,T], \qquad (9.180)$$
where $a$ is a positive constant. Applying the theory of the numerical integration of partial differential equations of first order by explicit schemata, we obtain the equation with finite differences
$$V(x^i,t^{j+1}) = V(x^i,t^j) + c[V(x^{i-1},t^j) - V(x^i,t^j)], \quad i=\overline{1,I},\ j=\overline{1,J}, \qquad (9.181)$$
where $V(x^i,t^j)$ denotes the approximate value of the function $u(x^i,t^j)$, $x^i = ih$, $t^j = jk$, $h = 1/I$, $k = T/J$. Equation (9.180) is equivalent to the system
$$\frac{dt}{1} = \frac{dx}{a}, \qquad (9.182)$$
which leads to the first integral
$$x - at = C_1, \qquad (9.183)$$
where $C_1$ is a constant; hence the exact solution of the problem is
$$u = \varphi(x - at), \qquad (9.184)$$
where $\varphi$ is an arbitrary function. If $c = 1$, then the schema becomes
$$V(x^i,t^{j+1}) = V(x^{i-1},t^j). \qquad (9.185)$$

Example 9.2 Let the partial differential equation be
$$\frac{\partial u}{\partial t} + \frac{\partial u}{\partial x} = 0, \qquad (9.186)$$
for which the initial and boundary conditions are
$$u(x,0) = 0,\ 0 < x \le 1, \qquad u(0,t) = 1,\ t \ge 0. \qquad (9.187)$$
At the initial moment $t = 0$ the function $u$ is identically null for all the values of $x$ in the domain, excepting $x = 0$, for which $u = 1$. We wish to obtain the equation with differences for problem (9.186), $t \le 1$, with the steps $h = 0.1$, $k = 0.1$ (hence $c = 1$). We apply relation (9.185) from Example 9.1. It follows that
$$V(x^i,t^0) = 0, \quad i > 0, \qquad (9.188)$$
$$V(x^0,t^j) = 1, \quad j \ge 0, \qquad (9.189)$$
$$V(x^i,t^{j+1}) = V(x^{i-1},t^j), \quad i \ge 1,\ j \ge 0,\ i \le 10,\ j \le 9, \qquad (9.190)$$
and the solution
$$V(x^i,t^j) = \begin{cases} 1 & \text{for } i \le j,\\ 0 & \text{otherwise.}\end{cases} \qquad (9.191)$$
Graphically, the situation is shown in Figure 9.4, wherein the points where $V(x^i,t^j) = 1$ have been marked by a star, while the points where $V(x^i,t^j) = 0$ have been marked by a circle. Let us observe that for $c = 1$ the Lax–Wendroff schema leads to the exact solution $V(x^i,t^{j+1}) = V(x^{i-1},t^j)$, as in this example.

Figure 9.4 Numerical solution of problem (9.186).

Example 9.3 The equations with finite differences for Example 9.1 are now of the form (using implicit schemata)
$$V(x^i,t^{j+1}) = \frac{cV(x^{i+1},t^{j+1}) + V(x^i,t^j)}{1+c}, \quad i = 1, 2, \dots, \qquad (9.192)$$
which is unconditionally convergent. Another schema often used in the case of Example 9.1 is the Wendroff schema, for which the equation with differences reads
$$V(x^i,t^{j+1}) = V(x^{i-1},t^j) + \frac{1-c}{1+c}[V(x^i,t^j) - V(x^{i-1},t^j)]. \qquad (9.193)$$

Example 9.4 Returning to Example 9.2 and using the implicit schemata (9.192) and (9.193) from Example 9.3 for $c = 1$, we obtain the same results as in Figure 9.4.
Example 9.5 Let the problem of elliptic type be
$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0, \quad 0 < x < 1,\ 0 < y < 1, \qquad (9.194)$$
with the boundary conditions
$$u(x,0) = 0, \quad u(x,1) = x, \quad 0 \le x \le 1, \qquad (9.195)$$
$$u(0,y) = 0, \quad u(1,y) = y, \quad 0 \le y \le 1, \qquad (9.196)$$
the exact solution of which is
$$u(x,y) = xy. \qquad (9.197)$$
Using a net with $n = 5$, $m = 5$, we will determine the numerical solution of the problem. In the case of our problem,
$$h = \frac{1-0}{5} = 0.2, \quad k = \frac{1-0}{5} = 0.2, \quad \frac{h}{k} = 1, \qquad (9.198)$$
and the linear approximating system is
$$4w_{i,j} - w_{i+1,j} - w_{i-1,j} - w_{i,j+1} - w_{i,j-1} = 0, \quad i=\overline{1,4},\ j=\overline{1,4}, \qquad (9.199)$$
$$w_{0,j} = 0, \quad j=\overline{0,5}, \qquad (9.200)$$
$$w_{5,j} = 0.2j, \quad j=\overline{0,5}, \qquad (9.201)$$
$$w_{i,0} = 0, \quad i=\overline{1,4}, \qquad (9.202)$$
$$w_{i,5} = 0.2i, \quad i=\overline{1,4}. \qquad (9.203)$$
Renumbering the knots as in Figure 9.5, we obtain the linear system
$$\begin{aligned}
&4w_{13} - w_{14} - w_9 = w_{0,1} + w_{1,0} = 0, &&4w_9 - w_{10} - w_5 - w_{13} = w_{0,2} = 0,\\
&4w_5 - w_6 - w_1 - w_9 = w_{0,3} = 0, &&4w_1 - w_2 - w_5 = w_{0,4} + w_{1,5} = 0 + 0.2 = 0.2,\\
&4w_{14} - w_{15} - w_{13} - w_{10} = w_{2,0} = 0, &&4w_{10} - w_{11} - w_9 - w_6 - w_{14} = 0,\\
&4w_6 - w_7 - w_5 - w_2 - w_{10} = 0, &&4w_2 - w_3 - w_1 - w_6 = w_{2,5} = 0.4,\\
&4w_{15} - w_{16} - w_{14} - w_{11} = w_{3,0} = 0, &&4w_{11} - w_{12} - w_{10} - w_7 - w_{15} = 0,\\
&4w_7 - w_8 - w_6 - w_3 - w_{11} = 0, &&4w_3 - w_4 - w_2 - w_7 = w_{3,5} = 0.6,\\
&4w_{16} - w_{15} - w_{12} = w_{5,1} + w_{4,0} = 0.2 + 0 = 0.2, &&4w_{12} - w_{11} - w_8 - w_{16} = w_{5,2} = 0.4,\\
&4w_8 - w_7 - w_4 - w_{12} = w_{5,3} = 0.6, &&4w_4 - w_3 - w_8 = w_{5,4} + w_{4,5} = 0.8 + 0.8 = 1.6.
\end{aligned} \qquad (9.204)$$
The solution of this system is
$$\begin{aligned}
&w_1 = 0.16, &&w_2 = 0.32, &&w_3 = 0.48, &&w_4 = 0.64,\\
&w_5 = 0.12, &&w_6 = 0.24, &&w_7 = 0.36, &&w_8 = 0.48,\\
&w_9 = 0.08, &&w_{10} = 0.16, &&w_{11} = 0.24, &&w_{12} = 0.32,\\
&w_{13} = 0.04, &&w_{14} = 0.08, &&w_{15} = 0.12, &&w_{16} = 0.16.
\end{aligned} \qquad (9.205)$$
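A short cross-check of this example: the sixteen equations can be assembled mechanically from the five-point formula (9.199); the row ordering below is the natural one rather than the numbering of Figure 9.5, which does not change the solution values.

```python
# Cross-check of Example 9.5: assemble (9.199) and compare with u = xy.
import numpy as np

n = m = 5
h = 1.0 / n
g = lambda x, y: x * y                 # boundary values of the exact solution

knots = [(i, j) for i in range(1, n) for j in range(1, m)]
idx = {p: r for r, p in enumerate(knots)}
A = np.zeros((len(knots), len(knots)))
rhs = np.zeros(len(knots))
for (i, j), r in idx.items():
    A[r, r] = 4.0
    for p, q in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
        if (p, q) in idx:
            A[r, idx[(p, q)]] = -1.0
        else:
            rhs[r] += g(p * h, q * h)  # boundary value moved to the right side
w = np.linalg.solve(A, rhs)

exact = np.array([(i * h) * (j * h) for (i, j) in knots])
print("max deviation from u = xy:", np.abs(w - exact).max())
```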
Figure 9.5 The numbering of the knots for problem (9.194).

We observe that the numerical solution coincides with the exact one, because
$$\frac{\partial^4 u}{\partial x^4} = 0, \quad \frac{\partial^4 u}{\partial y^4} = 0; \qquad (9.206)$$
hence the truncation error vanishes at each step.

Example 9.6 Let the problem of elliptic type be
$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0, \quad 0 < x < 1,\ 0 < y < 1, \qquad (9.207)$$
with the boundary conditions
$$u(x,0) = 0, \quad u(x,1) = \sin(\pi x)\sinh\pi, \quad 0 \le x \le 1, \qquad u(0,y) = 0, \quad u(1,y) = 0, \quad 0 \le y \le 1, \qquad (9.208)$$
the exact solution of which is
$$u(x,y) = \sin(\pi x)\sinh(\pi y). \qquad (9.209)$$
Using the algorithm presented for the differential equations of elliptic type with $n = 6$, $m = 6$, and the stopping condition given by $\varepsilon = 10^{-10}$, we will determine the approximate numerical solution of the problem, as well as the error with respect to the exact solution, $|u(x_i,y_j) - w^{(l)}_{i,j}|$, $i=\overline{1,n-1}$, $j=\overline{1,m-1}$, where $l$ is given by the algorithm. We have
$$f(x,y) = 0 \quad\text{for } (x,y)\in[0,1]\times[0,1], \qquad (9.210)$$
$$g(x,y) = \begin{cases} 0 & \text{for } y = 0,\ x = 0,\ \text{or } x = 1,\\ \sin(\pi x)\sinh(\pi) & \text{for } y = 1,\end{cases} \qquad (9.211)$$
or, written at the knots,
$$g(x_i,y_j) = \begin{cases} 0 & \text{for } j = 0,\ i = 0,\ \text{or } i = n,\\ \sin(\pi x_i)\sinh(\pi) & \text{for } j = m.\end{cases} \qquad (9.212)$$
The results of the program are given in Table 9.1, in which $l = 80$.
TABLE 9.1 Numerical Solution of Problems (9.207) and (9.208)

i  j  x_i     y_j     w^(80)_{i,j}  u(x_i,y_j)  |u(x_i,y_j) − w^(80)_{i,j}|
1  1  0.1667  0.1667  0.28665      0.27393     0.01272
1  2  0.1667  0.3333  0.65011      0.62468     0.02542
1  3  0.1667  0.5000  1.18776      1.15065     0.03711
1  4  0.1667  0.6667  2.04367      1.99935     0.04433
1  5  0.1667  0.8333  3.44719      3.40881     0.03837
2  1  0.3333  0.1667  0.49649      0.47446     0.02204
2  2  0.3333  0.3333  1.12602      1.08198     0.04404
2  3  0.3333  0.5000  2.05726      1.99298     0.06428
2  4  0.3333  0.6667  3.53975      3.46297     0.07678
2  5  0.3333  0.8333  5.97070      5.90423     0.06647
3  1  0.5000  0.1667  0.57330      0.54785     0.02545
3  2  0.5000  0.3333  1.30021      1.24937     0.05085
3  3  0.5000  0.5000  2.37552      2.30130     0.07422
3  4  0.5000  0.6667  4.08735      3.99869     0.08865
3  5  0.5000  0.8333  6.89437      6.81762     0.07675
4  1  0.6667  0.1667  0.49649      0.47446     0.02204
4  2  0.6667  0.3333  1.12602      1.08198     0.04404
4  3  0.6667  0.5000  2.05726      1.99298     0.06428
4  4  0.6667  0.6667  3.53975      3.46297     0.07678
4  5  0.6667  0.8333  5.97070      5.90423     0.06647
5  1  0.8333  0.1667  0.28665      0.27393     0.01272
5  2  0.8333  0.3333  0.65011      0.62468     0.02542
5  3  0.8333  0.5000  1.18776      1.15065     0.03711
5  4  0.8333  0.6667  2.04367      1.99935     0.04433
5  5  0.8333  0.8333  3.44719      3.40881     0.03837

Example 9.7 Let the problem of parabolic type be
$$\frac{\partial u}{\partial t} - \frac{\partial^2 u}{\partial x^2} = 0, \quad 0 < x < \pi,\ t > 0, \qquad (9.213)$$
with the initial and boundary conditions
$$u(x,0) = \sin x, \qquad (9.214)$$
$$u(0,t) = u(\pi,t) = 0, \qquad (9.215)$$
the exact solution of which is
$$u(x,t) = e^{-t}\sin x. \qquad (9.216)$$
Considering $m = 20$, from which $h = \pi/20$, and $k = 0.01$, we seek the approximate solution at $t = 0.5$, which will be compared with the exact solution. We shall then solve the same problem for $h = \pi/20$ and $k = 0.1$. The results for the first case are given in Table 9.2, and the numerical and exact solutions in the second case are given in Table 9.3. We observe that the method presented is not stable in the second case.
TABLE 9.2 Solution of Equation (9.213) in the First Case

i   x_i          u(x_i, 0.5)  w_{i,50}     |u(x_i, 0.5) − w_{i,50}|
0   0            0            0            0
1   0.157079633  0.094882299  0.094742054  0.000140245
2   0.314159265  0.187428281  0.187151245  0.000277037
3   0.471238898  0.275359157  0.274952150  0.000407007
4   0.628318531  0.356509777  0.355982821  0.000526955
5   0.785398163  0.428881942  0.428248014  0.000633928
6   0.942477796  0.490693611  0.489968319  0.000725292
7   1.099557429  0.540422775  0.539623979  0.000798796
8   1.256637061  0.576844936  0.575992305  0.000852632
9   1.413716694  0.599063261  0.598177788  0.000885473
10  1.570796327  0.606530660  0.605634150  0.000896510
11  1.727875959  0.599063261  0.598177788  0.000885473
12  1.884955592  0.576844936  0.575992305  0.000852632
13  2.042035225  0.540422775  0.539623979  0.000798796
14  2.199114858  0.490693611  0.489968319  0.000725292
15  2.356194490  0.428881942  0.428248014  0.000633928
16  2.513274123  0.356509777  0.355982821  0.000526955
17  2.670353756  0.275359157  0.274952150  0.000407007
18  2.827433388  0.187428281  0.187151245  0.000277037
19  2.984513021  0.094882299  0.094742054  0.000140245
20  3.141592654  0            0            0

TABLE 9.3 Solution of Equation (9.213) in the Second Case

i   x_i          u(x_i, 0.5)  w_{i,5}      |u(x_i, 0.5) − w_{i,5}|
0   0            0            0            0
1   0.157079633  0.094882299  0.092478468  0.002403832
2   0.314159265  0.187428281  0.182679809  0.004748473
3   0.471238898  0.275359157  0.268382966  0.006976191
4   0.628318531  0.356509777  0.347477645  0.009032132
5   0.785398163  0.428881942  0.418016274  0.010865672
6   0.942477796  0.490693611  0.478261948  0.012431663
7   1.099557429  0.540422775  0.526731229  0.013691545
8   1.256637061  0.576844936  0.562230640  0.014614296
9   1.413716694  0.599063261  0.582886066  0.015177195
10  1.570796327  0.606530660  0.591164279  0.015366381
11  1.727875959  0.599063261  0.582886066  0.015177195
12  1.884955592  0.576844936  0.562230640  0.014614296
13  2.042035225  0.540422775  0.526731229  0.013691545
14  2.199114858  0.490693611  0.478261948  0.012431663
15  2.356194490  0.428881942  0.418016270  0.010865672
16  2.513274123  0.356509777  0.347477645  0.009032132
17  2.670353756  0.275359157  0.268382966  0.006976191
18  2.827433388  0.187428281  0.182679809  0.004748473
19  2.984513021  0.094882299  0.092478468  0.002403832
20  3.141592654  0            0            0
Example 9.8 Let the problem of parabolic type be
$$\frac{\partial u}{\partial t} - \frac{\partial^2 u}{\partial x^2} = 0, \quad 0 < x < \pi,\ t > 0, \qquad (9.217)$$
with the initial and boundary conditions
$$u(x,0) = \sin x, \qquad (9.218)$$
$$u(0,t) = u(\pi,t) = 0, \qquad (9.219)$$
the exact solution of which is
$$u(x,t) = e^{-t}\sin x. \qquad (9.220)$$
Considering $m = 20$, from which $h = \pi/20$, and $k = 0.1$, we will determine the approximate solution of the problem at $t = 0.5$ by the backward-difference method and compare it with the exact solution. By means of the presented algorithm, the results are given in Table 9.4.

TABLE 9.4 Solution of Problem (9.217)

i   x_i          u(x_i, 0.5)  w_{i,5}      |u(x_i, 0.5) − w_{i,5}|
0   0            0            0            0
1   0.157079633  0.094882299  0.097224254  0.002341955
2   0.314159265  0.187428281  0.192054525  0.004626243
3   0.471238898  0.275359157  0.282155775  0.006796618
4   0.628318531  0.356509777  0.365309415  0.008799638
5   0.785398163  0.428881942  0.439467923  0.010585981
6   0.942477796  0.490693611  0.502805274  0.012111662
7   1.099557429  0.540422775  0.553761889  0.013339114
8   1.256637061  0.576844936  0.591083049  0.014238113
9   1.413716694  0.599063261  0.613849783  0.014786522
10  1.570796327  0.606530660  0.621501498  0.014970838
11  1.727875959  0.599063261  0.613849783  0.014786522
12  1.884955592  0.576844936  0.591083049  0.014238113
13  2.042035225  0.540422775  0.553761889  0.013339114
14  2.199114858  0.490693611  0.502805274  0.012111662
15  2.356194490  0.428881942  0.439467923  0.010585981
16  2.513274123  0.356509777  0.365309415  0.008799638
17  2.670353756  0.275359157  0.282155775  0.006796618
18  2.827433388  0.187428281  0.192054525  0.004626243
19  2.984513021  0.094882299  0.097224254  0.002341955
20  3.141592654  0            0            0

Example 9.9 Let the problem of parabolic type be
$$\frac{\partial u}{\partial t} - \frac{\partial^2 u}{\partial x^2} = 0, \quad 0 < x < \pi,\ t > 0, \qquad (9.221)$$
with the initial and boundary conditions
$$u(x,0) = \sin x, \qquad (9.222)$$
$$u(0,t) = u(\pi,t) = 0, \qquad (9.223)$$
the exact solution of which is
$$u(x,t) = e^{-t}\sin x. \qquad (9.224)$$
Considering $m = 20$, where $h = \pi/20$ and $k = 0.1$, we will determine the approximate solution of the problem at $t = 0.5$ by the Crank–Nicolson method and compare it with the exact solution. The results are given in Table 9.5.

TABLE 9.5 Solution of Problem (9.221)

i   x_i          u(x_i, 0.5)  w_{i,5}      |u(x_i, 0.5) − w_{i,5}|
0   0            0            0            0
1   0.157079633  0.094882299  0.094940434  0.000058135
2   0.314159265  0.187428281  0.187543119  0.000114838
3   0.471238898  0.275359157  0.275527871  0.000168713
4   0.628318531  0.356509777  0.356728211  0.000218434
5   0.785398163  0.428881942  0.429144720  0.000262777
6   0.942477796  0.490693611  0.490994261  0.000300649
7   1.099557429  0.540422775  0.540753893  0.000331118
8   1.256637061  0.576844936  0.577198371  0.000353434
9   1.413716694  0.599063261  0.599430308  0.000367048
10  1.570796327  0.606530660  0.606902283  0.000371623
11  1.727875959  0.599063261  0.599430308  0.000367048
12  1.884955592  0.576844936  0.577198371  0.000353434
13  2.042035225  0.540422775  0.540753893  0.000331118
14  2.199114858  0.490693611  0.490994261  0.000300649
15  2.356194490  0.428881942  0.429144720  0.000262777
16  2.513274123  0.356509777  0.356728211  0.000218434
17  2.670353756  0.275359157  0.275527871  0.000168713
18  2.827433388  0.187428281  0.187543119  0.000114838
19  2.984513021  0.094882299  0.094940434  0.000058135
20  3.141592654  0            0            0

Example 9.10 Let the problem of hyperbolic type be
$$\frac{\partial^2 u}{\partial t^2} - \frac{\partial^2 u}{\partial x^2} = 0, \quad 0 < x < 1,\ t > 0, \qquad (9.225)$$
with the conditions
$$u(0,t) = u(1,t) = 0, \quad t > 0, \qquad (9.226)$$
$$u(x,0) = \sin(\pi x), \quad 0 \le x \le 1, \qquad (9.227)$$
$$\frac{\partial u}{\partial t}(x,0) = 0, \quad 0 \le x \le 1; \qquad (9.228)$$
the exact solution is
$$u(x,t) = \sin(\pi x)\cos(\pi t). \qquad (9.229)$$
Using the algorithm of finite differences for $h = 0.05$, $k = 0.01$, $T = 0.6$, we will determine the approximate solution, which will be compared with the exact solution. The results are given in Table 9.6.

TABLE 9.6 Solution of Equation (9.225)

i   x_i   u(x_i, 0.6)   w_{i,60}      |u(x_i, 0.6) − w_{i,60}|
0   0.00   0             0             0
1   0.05  −0.048340908  −0.051663969  0.003323061
2   0.10  −0.095491503  −0.102101248  0.006609746
3   0.15  −0.140290780  −0.150138925  0.009848145
4   0.20  −0.181635632  −0.193803147  0.012167515
5   0.25  −0.218508012  −0.234218363  0.015710350
6   0.30  −0.250000000  −0.266551849  0.016551849
7   0.35  −0.275336158  −0.292401548  0.017065390
8   0.40  −0.293892626  −0.311103275  0.017210649
9   0.45  −0.305212482  −0.313895800  0.008683318
10  0.50  −0.309016994  −0.299780167  0.009236827
11  0.55  −0.305212482  −0.278282952  0.026929531
12  0.60  −0.293892626  −0.259112488  0.034780138
13  0.65  −0.275336158  −0.241810622  0.033525536
14  0.70  −0.250000000  −0.218502651  0.031497349
15  0.75  −0.218508012  −0.189734816  0.028773196
16  0.80  −0.181635632  −0.158609575  0.023026057
17  0.85  −0.140290780  −0.122055771  0.018235009
18  0.90  −0.095491503  −0.083173084  0.012318419
19  0.95  −0.048340908  −0.042127931  0.006212977
20  1.00   0             0             0

9.10 APPLICATIONS

Problem 9.1 Consider a square deep beam of side 2a, acted upon on the upper side by a uniformly distributed normal load and supported by reactions that act as tangential loadings, parabolically distributed on the lateral sides (Fig. 9.6a). Calculate the corresponding state of stress.

Solution: We decompose the loading into two cases, using the properties of symmetry with respect to the Ox-axis. We have thus to solve the problem in Figure 9.6b, with properties of skew symmetry with respect to Ox; the case in Figure 9.6c is symmetric with respect to Ox and represents a simple compression, for which the state of stress is given by ($\sigma_x$, $\sigma_y$ being normal stresses, $\tau_{xy}$ a tangential stress)
$$\sigma_x = 0, \quad \sigma_y = -\frac{p}{2}, \quad \tau_{xy} = 0. \qquad (9.230)$$
For the first case, we use the Airy biharmonic function $F(x,y)$, whose second derivatives give the state of stress in the form
$$\sigma_x = \frac{\partial^2 F}{\partial y^2}, \quad \sigma_y = \frac{\partial^2 F}{\partial x^2}, \quad \tau_{xy} = -\frac{\partial^2 F}{\partial x\,\partial y}; \qquad (9.231)$$
9.10 APPLICATIONS

Problem 9.1 Let a square deep beam of side 2a be acted upon on its upper side by a uniformly distributed normal load and supported by reactions which act as parabolically distributed tangential loadings (Fig. 9.6a). We are asked to calculate the corresponding state of stress.

Solution: We decompose the loading into two cases, using the properties of symmetry with respect to the Ox-axis. We thus have to solve the problem in Figure 9.6b, with properties of skew symmetry with respect to Ox; the case in Figure 9.6c is symmetric with respect to Ox and represents a simple compression, for which the state of stress ($\sigma_x$, $\sigma_y$ – normal stresses, $\tau_{xy}$ – tangential stress) is given by
$$\sigma_x = 0, \quad \sigma_y = -\frac{p}{2}, \quad \tau_{xy} = 0. \quad (9.230)$$
For the first case, we use the biharmonic Airy function F(x, y), the second derivatives of which give the state of stress in the form
$$\sigma_x = \frac{\partial^2 F}{\partial y^2}, \quad \sigma_y = \frac{\partial^2 F}{\partial x^2}, \quad \tau_{xy} = -\frac{\partial^2 F}{\partial x \partial y}; \quad (9.231)$$
we notice that F(x, y) must be even with respect to x and odd with respect to y, so that we take the function in the form (the polynomials are obtained from the general form by imposing the condition of biharmonicity)
$$\begin{aligned} F(x, y) = {} & P_3(x, y) + P_5(x, y) + P_7(x, y) + P_9(x, y) + P_{11}(x, y) \\ = {} & \gamma_3 x^2 y + \delta_3 y^3 + \gamma_5 (x^4 y - x^2 y^3) + \delta_5 (y^5 - 5x^2 y^3) + \gamma_7 \left( x^6 y - \tfrac{10}{3} x^4 y^3 + x^2 y^5 \right) \\ & + \delta_7 \left( y^7 - 14 x^2 y^5 + \tfrac{35}{3} x^4 y^3 \right) + \gamma_9 (x^8 y - 7x^6 y^3 + 7x^4 y^5 - x^2 y^7) \\ & + \delta_9 (y^9 - 27x^2 y^7 + 63 x^4 y^5 - 21 x^6 y^3) + \gamma_{11} \left( x^{10} y - 12 x^8 y^3 + \tfrac{126}{5} x^6 y^5 - 12 x^4 y^7 + x^2 y^9 \right); \end{aligned} \quad (9.232)$$
hence the state of stress is given by
$$\begin{aligned} \sigma_x = {} & 6\delta_3 y - 6\gamma_5 x^2 y + \delta_5 (20y^3 - 30x^2 y) + \gamma_7 (-20x^4 y + 20x^2 y^3) + \delta_7 (42y^5 - 280x^2 y^3 + 70x^4 y) \\ & + \gamma_9 (-42x^6 y + 140x^4 y^3 - 42x^2 y^5) + \delta_9 (72y^7 - 1134x^2 y^5 + 1260x^4 y^3 - 126x^6 y) \\ & + \gamma_{11} (-72x^8 y + 504x^6 y^3 - 504x^4 y^5 + 72x^2 y^7), \\ \sigma_y = {} & 2\gamma_3 y + \gamma_5 (12x^2 y - 2y^3) - 10\delta_5 y^3 + \gamma_7 (30x^4 y - 40x^2 y^3 + 2y^5) + \delta_7 (-28y^5 + 140x^2 y^3) \\ & + \gamma_9 (56x^6 y - 210x^4 y^3 + 84x^2 y^5 - 2y^7) + \delta_9 (-54y^7 + 756x^2 y^5 - 630x^4 y^3) \\ & + \gamma_{11} (90x^8 y - 672x^6 y^3 + 756x^4 y^5 - 144x^2 y^7 + 2y^9), \\ \tau_{xy} = {} & -2\gamma_3 x + \gamma_5 (-4x^3 + 6xy^2) + 30\delta_5 xy^2 + \gamma_7 (-6x^5 + 40x^3 y^2 - 10xy^4) + \delta_7 (140xy^4 - 140x^3 y^2) \\ & + \gamma_9 (-8x^7 + 126x^5 y^2 - 140x^3 y^4 + 14xy^6) + \delta_9 (378xy^6 - 1260x^3 y^4 + 378x^5 y^2) \\ & + \gamma_{11} (-10x^9 + 288x^7 y^2 - 756x^5 y^4 + 336x^3 y^6 - 18xy^8). \end{aligned} \quad (9.233)$$
We impose conditions at 16 points of the contour. Because of the symmetry, there remain five distinct points (Fig. 9.6b). The conditions
$$\sigma_x(a, 0) = 0, \quad \tau_{xy}(0, a) = 0 \quad (9.234)$$
are identically satisfied. We then have ($\tau_{yx} = \tau_{xy}$)
$$\begin{gathered} \sigma_y(0, a) = \sigma_y\!\left( \tfrac{a}{2}, a \right) = \sigma_y(a, a) = -0.5p, \quad \tau_{yx}\!\left( \tfrac{a}{2}, a \right) = \tau_{yx}(a, a) = 0, \\ \sigma_x\!\left( a, \tfrac{a}{2} \right) = \sigma_x(a, a) = 0, \quad \tau_{xy}(a, 0) = 0.75p, \quad \tau_{xy}\!\left( a, \tfrac{a}{2} \right) = 0.5625p; \end{gathered} \quad (9.235)$$
we notice that at the point (a, a) three conditions must be satisfied, because of the symmetry of the stress tensor, hence of the tangential stresses.
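The stresses (9.233) follow from (9.232) by elementary differentiation. As a hedged illustration (SymPy is our choice here, not part of the original text), one can generate them symbolically and verify that each polynomial in (9.232) is indeed biharmonic:

```python
import sympy as sp

# Build the Airy polynomial (9.232) with symbolic coefficients, derive the
# stresses (9.231), and check that F is biharmonic (nabla^4 F = 0).
x, y = sp.symbols('x y')
g3, d3, g5, d5, g7, d7, g9, d9, g11 = sp.symbols(
    'gamma3 delta3 gamma5 delta5 gamma7 delta7 gamma9 delta9 gamma11')

F = (g3*x**2*y + d3*y**3
     + g5*(x**4*y - x**2*y**3) + d5*(y**5 - 5*x**2*y**3)
     + g7*(x**6*y - sp.Rational(10, 3)*x**4*y**3 + x**2*y**5)
     + d7*(y**7 - 14*x**2*y**5 + sp.Rational(35, 3)*x**4*y**3)
     + g9*(x**8*y - 7*x**6*y**3 + 7*x**4*y**5 - x**2*y**7)
     + d9*(y**9 - 27*x**2*y**7 + 63*x**4*y**5 - 21*x**6*y**3)
     + g11*(x**10*y - 12*x**8*y**3 + sp.Rational(126, 5)*x**6*y**5
            - 12*x**4*y**7 + x**2*y**9))

sx  = sp.diff(F, y, 2)            # sigma_x, reproduces the first line of (9.233)
sy  = sp.diff(F, x, 2)            # sigma_y
txy = -sp.diff(F, x, 1, y, 1)     # tau_xy

biharm = sp.diff(F, x, 4) + 2*sp.diff(F, x, 2, y, 2) + sp.diff(F, y, 4)
print(sp.simplify(biharm))        # prints 0: every polynomial is biharmonic
```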
Figure 9.6 Square deep beam: (a) the given loading; (b) the skew-symmetric loading case; (c) the symmetric case (simple compression).
We thus find the following system of nine linear equations for the nine arbitrary parameters $\alpha_1 = \gamma_3 a$, $\alpha_2 = \delta_3 a$, $\alpha_3 = \gamma_5 a^3$, $\alpha_4 = \delta_5 a^3$, $\alpha_5 = \gamma_7 a^5$, $\alpha_6 = \delta_7 a^5$, $\alpha_7 = \gamma_9 a^7$, $\alpha_8 = \delta_9 a^7$, $\alpha_9 = \gamma_{11} a^9$:
$$\begin{gathered} \alpha_1 - \alpha_3 - 5\alpha_4 + \alpha_5 - 14\alpha_6 - \alpha_7 - 27\alpha_8 + \alpha_9 = -0.25p, \\ 2\alpha_1 + \alpha_3 - 10\alpha_4 - 6.125\alpha_5 + 7\alpha_6 + 6.75\alpha_7 + 95.625\alpha_8 + 3.1016\alpha_9 = -0.5p, \\ \alpha_1 + 5\alpha_3 - 5\alpha_4 - 4\alpha_5 + 56\alpha_6 - 36\alpha_7 + 36\alpha_8 + 16\alpha_9 = -0.25p, \\ \alpha_1 - 2.5\alpha_3 - 15\alpha_4 + 0.1875\alpha_5 - 52.5\alpha_6 + 6.625\alpha_7 - 13.3125\alpha_8 - 11.6055\alpha_9 = 0, \\ \alpha_1 - \alpha_3 - 15\alpha_4 - 12\alpha_5 + 4\alpha_7 + 252\alpha_8 + 80\alpha_9 = 0, \\ 3\alpha_2 - 3\alpha_3 - 12.5\alpha_4 - 7.5\alpha_5 + 1.3125\alpha_6 - 4.8125\alpha_7 + 59.625\alpha_8 + 11.8125\alpha_9 = 0, \\ 3\alpha_2 - 3\alpha_3 - 5\alpha_4 - 84\alpha_6 + 28\alpha_7 + 36\alpha_8 = 0, \\ \alpha_1 + 2\alpha_3 + 3\alpha_5 + 4\alpha_7 + 5\alpha_9 = -0.375p, \\ 2\alpha_1 + 2.5\alpha_3 - 7.5\alpha_4 - 3.375\alpha_5 + 26.25\alpha_6 - 14.9688\alpha_7 - 21.6563\alpha_8 - 19.9297\alpha_9 = -0.5625p. \end{gathered} \quad (9.236)$$
By solving the system (we use one of the methods presented in Section 4.5), we get
$$\begin{gathered} \gamma_3 = -0.347100 \frac{p}{a}, \quad \delta_3 = -0.083952 \frac{p}{a}, \quad \gamma_5 = 0.009407 \frac{p}{a^3}, \quad \delta_5 = -0.014571 \frac{p}{a^3}, \\ \gamma_7 = -0.009264 \frac{p}{a^5}, \quad \delta_7 = -0.003585 \frac{p}{a^5}, \quad \gamma_9 = -0.003837 \frac{p}{a^7}, \quad \delta_9 = 0.000376 \frac{p}{a^7}, \quad \gamma_{11} = -0.000654 \frac{p}{a^9}, \end{gathered} \quad (9.237)$$
the function F(x, y) being thus completely determined. Taking into account the state of stress (9.230) and formulae (9.233), we finally get (ξ = x/a, η = y/a)
$$\begin{aligned} \sigma_x = {} & [(-0.504 + 0.380\xi^2 - 0.064\xi^4 + 0.113\xi^6 + 0.047\xi^8)\eta + (-0.291 + 0.819\xi^2 - 0.064\xi^4 - 0.329\xi^6)\eta^3 \\ & + (-0.151 - 0.265\xi^2 + 0.329\xi^4)\eta^5 + (0.027 - 0.047\xi^2)\eta^7]p, \\ \sigma_y = {} & [-0.500 + (-0.695 + 0.113\xi^2 - 0.278\xi^4 - 0.215\xi^6 - 0.059\xi^8)\eta + (0.127 - 0.132\xi^2 + 0.570\xi^4 + 0.439\xi^6)\eta^3 \\ & + (0.082 - 0.038\xi^2 - 0.494\xi^4)\eta^5 + (-0.013 + 0.094\xi^2)\eta^7 - 0.001\eta^9]p, \\ \tau_{xy} = {} & [0.695\xi - 0.638\xi^3 + 0.056\xi^5 + 0.031\xi^7 + 0.006\xi^9 + (-0.381\xi + 0.131\xi^3 - 0.338\xi^5 - 0.189\xi^7)\eta^2 \\ & + (-0.409\xi + 0.063\xi^3 + 0.494\xi^5)\eta^4 + (0.088\xi - 0.221\xi^3)\eta^6 + 0.012\xi\eta^8]p. \end{aligned} \quad (9.238)$$
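As a check, the system (9.236) is small enough to solve directly; the following NumPy sketch uses the coefficients as transcribed above (working in units of p) and, if the transcription is faithful, should reproduce (9.237) up to round-off:

```python
import numpy as np

# Solve the 9x9 system (9.236) for alpha_1..alpha_9 (in units of p).
A = np.array([
    [1, 0, -1.0,  -5.0,   1.0,    -14.0,    -1.0,     -27.0,      1.0],
    [2, 0,  1.0, -10.0,  -6.125,    7.0,     6.75,     95.625,    3.1016],
    [1, 0,  5.0,  -5.0,  -4.0,     56.0,   -36.0,      36.0,     16.0],
    [1, 0, -2.5, -15.0,   0.1875, -52.5,     6.625,   -13.3125, -11.6055],
    [1, 0, -1.0, -15.0, -12.0,      0.0,     4.0,     252.0,     80.0],
    [0, 3, -3.0, -12.5,  -7.5,      1.3125, -4.8125,   59.625,   11.8125],
    [0, 3, -3.0,  -5.0,   0.0,    -84.0,    28.0,      36.0,      0.0],
    [1, 0,  2.0,   0.0,   3.0,      0.0,     4.0,       0.0,      5.0],
    [2, 0,  2.5,  -7.5,  -3.375,   26.25,  -14.9688,  -21.6563, -19.9297],
])
b = np.array([-0.25, -0.5, -0.25, 0.0, 0.0, 0.0, 0.0, -0.375, -0.5625])
alpha = np.linalg.solve(A, b)
print(alpha)   # expect alpha_1 = gamma_3 a ~ -0.3471, alpha_2 = delta_3 a ~ -0.0840, ...
```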
We thus obtain on the contour a distribution of stresses from which we subtract the distribution of the external loading; it follows that
• on the sides ξ = ±1:
$$\begin{gathered} \sigma_x(\pm 1, \eta) = (-0.028\eta + 0.135\eta^3 - 0.087\eta^5 - 0.020\eta^7)p = -0.02\eta(1 - \eta^2)(0.25 - \eta^2)(5.6 + \eta^2)p, \\ \tau_{xy}(\pm 1, \eta) = \mp(0.027\eta^2 - 0.148\eta^4 + 0.133\eta^6 - 0.012\eta^8)p \cong \mp 0.012\eta^2 (1 - \eta^2)(0.25 - \eta^2)(9.7 - \eta^2)p; \end{gathered} \quad (9.239)$$
• on the sides η = ±1:
$$\begin{gathered} \sigma_y(\xi, \pm 1) = \pm(0.037\xi^2 - 0.202\xi^4 + 0.224\xi^6 - 0.059\xi^8)p \cong \pm 0.05\xi^2 (1 - \xi^2)(0.25 - \xi^2)(2.55 - \xi^2)p, \\ \tau_{yx}(\xi, \pm 1) = (0.005\xi - 0.065\xi^3 + 0.212\xi^5 - 0.158\xi^7 + 0.006\xi^9)p \cong 0.006\xi(1 - \xi^2)(0.25 - \xi^2)(0.15 - \xi^2)(25 - \xi^2)p. \end{gathered} \quad (9.240)$$
These parasitic stresses are represented in Figure 9.7. Although Saint-Venant's principle cannot be applied, because the deep beam has equal dimensions, a negligible state of stress takes place in the interior (the stresses are very small with respect to the loading). We can make an elementary verification, approximating the loading by parabolically distributed loads and using methods of strength of materials.

Figure 9.7 Parasitic stresses on the boundary.
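A quick numerical check of the smallness of the parasitic stresses (9.239)–(9.240) can be made by simply evaluating the polynomials over the contour (a sketch in Python/NumPy; the maxima, in units of p, are what Figure 9.7 displays graphically):

```python
import numpy as np

# Evaluate the parasitic boundary stresses (9.239)-(9.240), in units of p.
t = np.linspace(-1.0, 1.0, 2001)      # eta on the sides xi = +/-1, xi on eta = +/-1

sig_x  = -0.028*t + 0.135*t**3 - 0.087*t**5 - 0.020*t**7
tau_xy = -(0.027*t**2 - 0.148*t**4 + 0.133*t**6 - 0.012*t**8)
sig_y  = 0.037*t**2 - 0.202*t**4 + 0.224*t**6 - 0.059*t**8
tau_yx = 0.005*t - 0.065*t**3 + 0.212*t**5 - 0.158*t**7 + 0.006*t**9

for name, s in [("sigma_x", sig_x), ("tau_xy", tau_xy),
                ("sigma_y", sig_y), ("tau_yx", tau_yx)]:
    print(name, np.max(np.abs(s)))    # all maxima are of the order of 0.01 p
```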
The bending moments at the vertical cross sections 2–2 and 1–1 are (as covering values)
$$M_{2-2} = -2 \cdot \frac{2}{3} \cdot 0.01p \cdot \frac{a}{2} \cdot \frac{a}{4} = -\frac{1}{6} \cdot 0.01pa^2, \quad M_{1-1} = -2 \cdot \frac{2}{3} \cdot 0.01p \cdot \frac{a}{2} \cdot \frac{3a}{4} + 2 \cdot \frac{2}{3} \cdot 0.02p \cdot \frac{a}{2} \cdot \frac{a}{4} = -\frac{1}{6} \cdot 0.01pa^2; \quad (9.241)$$
hence, with the section modulus $W = \frac{1}{6}(2a)^2 = \frac{2a^2}{3}$, we get
$$\sigma_{\max} = \mp 0.0025p. \quad (9.242)$$
We can thus see that the error is not greater than 1.7% of the maximum external load, and it takes place at four points of the contour. We may consider that relations (9.238) give the sought state of stress, which we represent in Figure 9.8a and Figure 9.8b. The broken line in Figure 9.8a corresponds to the linear distribution obtained in strength of materials (Navier's formula).

Figure 9.8 State of stress: (a) σx; (b) σy, τxy.
Figure 9.9 Problem 9.2.

Problem 9.2 We consider a thread of length l and density ρ, the cross section of which is constant, of area A. The thread is fixed at A (Fig. 9.9) and passes over a small pulley at B, the other end of the thread holding a weight G. The partial differential equation of the free transverse vibrations of the thread is
$$\frac{\partial^2 w}{\partial x^2} - \frac{1}{c^2} \frac{\partial^2 w}{\partial t^2} = 0, \quad (9.243)$$
where w(x, t) is the deflection, while c is a constant,
$$c = \sqrt{\frac{G}{\rho A}}; \quad (9.244)$$
knowing the initial conditions
$$t = 0, \quad w(x, 0) = 4h_0\left( \frac{x}{l} - \frac{x^2}{l^2} \right), \quad \frac{\partial w(x, 0)}{\partial t} = 0, \quad (9.245)$$
determine
• the exact solution, integrating the equation by Fourier's method;
• a numerical solution, integrating with finite differences, and compare the results.
Numerical application: A = 10⁻⁶ m², ρ = 10⁴ kg m⁻³, l = 2 m, h₀ = 2 × 10⁻² m, G = 10⁻² N.

Solution:
1. Solution by the Fourier method
We consider a solution of the form
$$w(x, t) = Y(x) \cos(pt - \varphi) \quad (9.246)$$
and expression (9.243) leads to the differential equation
$$Y'' + \frac{p^2}{c^2} Y = 0, \quad (9.247)$$
from which we obtain
$$Y(x) = B \cos\frac{p}{c}x + D \sin\frac{p}{c}x; \quad (9.248)$$
taking into account the boundary conditions
$$w(0, t) = w(l, t) = 0, \quad (9.249)$$
we obtain
$$\sin\frac{p}{c}l = 0, \quad (9.250)$$
which leads to the eigenvalues
$$p_k = \frac{k\pi c}{l}, \quad k = 1, 2, \ldots \quad (9.251)$$
Under these conditions, the general solution takes the form
$$w(x, t) = \sum_{k=1}^{\infty} D_k \sin\frac{k\pi x}{l} \cos(p_k t - \varphi_k), \quad (9.252)$$
the constants $D_k$, $\varphi_k$ being given by
$$D_k \cos\varphi_k = \frac{2}{l} \int_0^l w(x, 0) \sin\frac{k\pi x}{l}\,\mathrm{d}x, \quad D_k \sin\varphi_k = \frac{2}{l p_k} \int_0^l \frac{\partial w(x, 0)}{\partial t} \sin\frac{k\pi x}{l}\,\mathrm{d}x. \quad (9.253)$$
We obtain the results
$$\varphi_k = 0, \quad D_k = \frac{16h_0}{k^3\pi^3}(1 - \cos k\pi), \quad (9.254)$$
from which the solution is
$$w(x, t) = \frac{32h_0}{\pi^3} \sum_{i=1}^{\infty} \frac{\sin\!\left[ (2i - 1)\frac{\pi x}{l} \right] \cos(p_{2i-1} t)}{(2i - 1)^3}. \quad (9.255)$$
2. Numerical calculation
We apply the theory presented for partial differential equations of second order of hyperbolic type for
$$\alpha = c, \quad f(x) = 4h_0\left( \frac{x}{l} - \frac{x^2}{l^2} \right), \quad g(x) = 0. \quad (9.256)$$
The results for x = l/2 are plotted in Figure 9.10.
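For the numerical application, the constant (9.244) gives c = 1 m/s, and the series (9.255) is easy to evaluate; a small Python sketch, truncating at 20 terms as in Figure 9.10a (an illustration, not the authors' code):

```python
import numpy as np

# Fourier solution (9.255) of the string problem, evaluated at x = l/2.
# Data: A = 1e-6 m^2, rho = 1e4 kg/m^3, l = 2 m, h0 = 2e-2 m, G = 1e-2 N.
A, rho, l, h0, G = 1e-6, 1e4, 2.0, 2e-2, 1e-2
c = np.sqrt(G / (rho * A))              # = 1 m/s, by (9.244)

def w_fourier(x, t, terms=20):
    s = 0.0
    for i in range(1, terms + 1):
        k = 2 * i - 1
        pk = k * np.pi * c / l          # eigenvalue (9.251)
        s += np.sin(k * np.pi * x / l) * np.cos(pk * t) / k**3
    return 32 * h0 / np.pi**3 * s

print(w_fourier(l / 2, 0.0))            # ~0.02 = h0, the initial mid-span deflection
```

The same evaluation at the grid times of the finite-difference computation gives the comparison of Figure 9.10, with errors of the order of 10⁻⁶ m.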
Figure 9.10 (a) The analytic w(l/2) calculated with 20 terms and the numerical w(l/2) versus time; (b) the error.

Problem 9.3 Let us consider the bar BC of length l (Fig. 9.11), of density ρ, of modulus of longitudinal elasticity E, having a constant area A of the cross section; the bar is built-in at B, the end C being free.

Figure 9.11 Problem 9.3.

The partial differential equation of the free transverse vibrations of the bar reads
$$\frac{\partial^4 w}{\partial x^4} + \frac{A\rho}{EI} \frac{\partial^2 w}{\partial t^2} = 0, \quad (9.257)$$
where w(x, t) is the deflection (Fig. 9.11), and I is the principal moment of inertia of the cross section of the bar with respect to the neutral axis (normal to Bx and Bw). Being given the initial conditions
$$t = 0, \quad w(x, 0) = h_0 \frac{f_1(\beta_1) f_4\!\left( \beta_1 \frac{x}{l} \right) - f_2(\beta_1) f_3\!\left( \beta_1 \frac{x}{l} \right)}{\left| f_1(\beta_1) f_4(\beta_1) - f_2(\beta_1) f_3(\beta_1) \right|}, \quad \frac{\partial w(x, 0)}{\partial t} = 0, \quad (9.258)$$
where f₁, f₂, f₃, f₄ are Krylov's functions
$$f_1(z) = \frac{\cosh z + \cos z}{2}, \quad f_2(z) = \frac{\sinh z + \sin z}{2}, \quad f_3(z) = \frac{\cosh z - \cos z}{2}, \quad f_4(z) = \frac{\sinh z - \sin z}{2}, \quad (9.259)$$
while β₁ is the smallest positive solution of the equation
$$\cosh\beta \cos\beta + 1 = 0, \quad (9.260)$$
determine:
• the exact solution, integrating the equation by Fourier's method;
• a numerical solution, integrating by means of finite differences, and compare the results for x = l/2.
Numerical application: ρ = 7800 kg m⁻³, l = 1 m, A = 6 × 10⁻⁴ m², I = 5 × 10⁻⁹ m⁴, E = 2 × 10¹¹ N m⁻², h₀ = 0.02 m.

Solution:
1. Solution by Fourier's method
Let us consider a solution of the form
$$w(x, t) = Y(x) \cos(pt - \varphi); \quad (9.261)$$
from equation (9.257) we obtain the differential equation
$$Y^{(\mathrm{iv})} - \alpha^4 Y = 0, \quad (9.262)$$
where
$$\alpha^4 = \frac{p^2 \rho A}{EI}. \quad (9.263)$$
The solution of equation (9.262) and its derivatives Y′, Y″, Y‴ satisfy the matrix equation
$$\begin{pmatrix} Y(x) \\ Y'(x)/\alpha \\ Y''(x)/\alpha^2 \\ Y'''(x)/\alpha^3 \end{pmatrix} = \begin{pmatrix} f_1(\alpha x) & f_2(\alpha x) & f_3(\alpha x) & f_4(\alpha x) \\ f_4(\alpha x) & f_1(\alpha x) & f_2(\alpha x) & f_3(\alpha x) \\ f_3(\alpha x) & f_4(\alpha x) & f_1(\alpha x) & f_2(\alpha x) \\ f_2(\alpha x) & f_3(\alpha x) & f_4(\alpha x) & f_1(\alpha x) \end{pmatrix} \begin{pmatrix} Y(0) \\ Y'(0)/\alpha \\ Y''(0)/\alpha^2 \\ Y'''(0)/\alpha^3 \end{pmatrix}. \quad (9.264)$$
Observing from Figure 9.11 that the conditions at the ends of the bar are
$$Y(0) = Y'(0) = 0, \quad Y''(l) = Y'''(l) = 0, \quad (9.265)$$
we obtain from expression (9.264) the homogeneous equations in Y″(0), Y‴(0)
$$\alpha f_1(\alpha l) Y''(0) + f_2(\alpha l) Y'''(0) = 0, \quad \alpha f_4(\alpha l) Y''(0) + f_1(\alpha l) Y'''(0) = 0. \quad (9.266)$$
The system (9.266) admits a nontrivial solution if
$$f_1^2(\beta) - f_2(\beta) f_4(\beta) = 0, \quad (9.267)$$
where
$$\beta = \alpha l. \quad (9.268)$$
Taking into account equation (9.259), equation (9.267) becomes
$$\cosh\beta \cos\beta + 1 = 0, \quad (9.269)$$
with the solutions β₁, β₂, …, βₙ, …, so that, from equation (9.263) and equation (9.268), we deduce the eigenpulsations
$$p_n = \frac{\beta_n^2}{l^2} \sqrt{\frac{EI}{\rho A}}. \quad (9.270)$$
Taking into account relations (9.264), (9.266), and (9.270), the functions Yₙ(x) read
$$Y_n(x) = D_n \Phi_n(x), \quad (9.271)$$
where Dₙ are constants, while Φₙ(x) are the eigenfunctions
$$\Phi_n(x) = f_1(\beta_n) f_4\!\left( \beta_n \frac{x}{l} \right) - f_2(\beta_n) f_3\!\left( \beta_n \frac{x}{l} \right), \quad (9.272)$$
with the property of orthogonality
$$\int_0^l \Phi_n(x) \Phi_m(x)\,\mathrm{d}x = 0 \quad \text{if } m \neq n. \quad (9.273)$$
Under these conditions, the general solution is
$$w(x, t) = \sum_{n=1}^{\infty} D_n \Phi_n(x) \cos(p_n t - \varphi_n), \quad (9.274)$$
where Dₙ and φₙ are given by
$$D_n \cos\varphi_n = \frac{\int_0^l w(x, 0)\,\Phi_n(x)\,\mathrm{d}x}{\int_0^l \Phi_n^2(x)\,\mathrm{d}x}, \quad D_n \sin\varphi_n = \frac{\int_0^l \frac{\partial w(x, 0)}{\partial t}\,\Phi_n(x)\,\mathrm{d}x}{p_n \int_0^l \Phi_n^2(x)\,\mathrm{d}x}. \quad (9.275)$$
In the considered case, with the conditions (9.258), it follows that
$$\varphi_n = 0, \; n \geq 1, \quad D_n = 0, \; n \geq 2, \quad (9.276)$$
$$D_1 = \frac{h_0}{\left| f_1(\beta_1) f_4(\beta_1) - f_2(\beta_1) f_3(\beta_1) \right|}, \quad (9.277)$$
where β₁ ≈ 1.875 and $p_1 = \beta_1^2 \sqrt{EI/(\rho A)}/l^2$; hence
$$w(x, t) = h_0 \frac{f_1(\beta_1) f_4\!\left( \beta_1 \frac{x}{l} \right) - f_2(\beta_1) f_3\!\left( \beta_1 \frac{x}{l} \right)}{\left| f_1(\beta_1) f_4(\beta_1) - f_2(\beta_1) f_3(\beta_1) \right|} \cos p_1 t. \quad (9.278)$$
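The value β₁ ≈ 1.875 is easily recovered numerically; a bisection sketch for equation (9.269), followed by the first eigenpulsation (9.270) with the data of the problem:

```python
import numpy as np

# Smallest root of cosh(beta) cos(beta) + 1 = 0 by bisection, then p_1 by (9.270).
def g(b):
    return np.cosh(b) * np.cos(b) + 1.0

a, b = 1.0, 3.0                       # g(1) > 0 > g(3): a sign change on (1, 3)
for _ in range(60):
    m = 0.5 * (a + b)
    if g(a) * g(m) <= 0.0:
        b = m
    else:
        a = m
beta1 = 0.5 * (a + b)

rho, l, A, I, E = 7800.0, 1.0, 6e-4, 5e-9, 2e11
p1 = beta1**2 / l**2 * np.sqrt(E * I / (rho * A))
print(beta1, p1)                      # beta1 ~ 1.8751, p1 ~ 514 rad/s
```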
2. Numerical calculation
We consider the domain
$$[0, l] \times [0, T] \subset \mathbb{R}^2, \quad (9.279)$$
the number of division intervals being m and n, respectively. We may write
$$h = \frac{l}{m}, \quad k = \frac{T}{n}. \quad (9.280)$$
From the relation
$$w(x, k) = w(x, 0) + k \frac{\partial w(x, 0)}{\partial t} + O(k^2) \quad (9.281)$$
we obtain
$$w_{i,1} \approx w_{i,0} + k \frac{\partial w(x_i, 0)}{\partial t}, \quad i = 0, 1, \ldots, m. \quad (9.282)$$
On the other hand, the conditions
$$Y''(l) = Y'''(l) = 0 \quad (9.283)$$
must be imposed; taking into account that
$$Y(l - h) = Y(l) - hY'(l) + O(h^4), \quad Y(l - 2h) = Y(l) - 2hY'(l) + O(h^4), \quad Y(l - 2h) = Y(l - h) - hY'(l - h) + O(h^4), \quad (9.284)$$
from which
$$Y'(l) = Y'(l - h) = Y'(l - 2h), \quad (9.285)$$
and that
$$Y'(l) \approx \frac{w_{m,j} - w_{m-1,j}}{h}, \quad (9.286)$$
$$Y'(l - h) \approx \frac{w_{m-1,j} - w_{m-2,j}}{h}, \quad (9.287)$$
$$Y'(l - 2h) \approx \frac{w_{m-2,j} - w_{m-3,j}}{h}, \quad (9.288)$$
we are led to
$$w_{m-1,j} = 2w_{m-2,j} - w_{m-3,j}, \quad w_{m,j} = 2w_{m-1,j} - w_{m-2,j}. \quad (9.289)$$
On the other hand,
$$\frac{\partial^4 w}{\partial x^4} \approx \frac{w_{i+2,j} - 4w_{i+1,j} + 6w_{i,j} - 4w_{i-1,j} + w_{i-2,j}}{h^4}, \quad (9.290)$$
$$\frac{\partial^2 w}{\partial t^2} \approx \frac{w_{i,j+1} - 2w_{i,j} + w_{i,j-1}}{k^2}, \quad (9.291)$$
so that equation (9.257) takes, in finite differences, the form
$$w_{i,j+1} = 2w_{i,j} - w_{i,j-1} - \lambda^2 (w_{i+2,j} - 4w_{i+1,j} + 6w_{i,j} - 4w_{i-1,j} + w_{i-2,j}), \quad (9.292)$$
where
$$\lambda^2 = \frac{EI}{A\rho} \frac{k^2}{h^4}. \quad (9.293)$$
By formula (9.292) we may calculate the values of w at the points A, B, and C marked in Figure 9.12. The values of w at the points of type D or E cannot be calculated by this formula; applying relations (9.289) at these points, we obtain:
• for the point D,
$$w_{m-1,j+1} = 2w_{m-2,j+1} - w_{m-3,j+1} \quad (9.294)$$
or
$$w_D = 2w_{C_1} - w_{C_2}; \quad (9.295)$$
• for the point E,
$$w_{m,j+1} = 2w_{m-1,j+1} - w_{m-2,j+1} \quad (9.296)$$
or
$$w_E = 2w_D - w_{C_1}. \quad (9.297)$$
The results obtained for x = l/2 are plotted in Figure 9.13.

Figure 9.12 Working schema.

Figure 9.13 The analytic w(l/2) (continuous line) and the numerical w(l/2) (dashed line) versus time.
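A sketch of the scheme (9.292) with the free-end relations (9.294)–(9.297) follows; the text does not specify the built-in end treatment nor h and k, so here we assume w₀ = 0 with a ghost value enforcing w′(0) = 0 at B, and step sizes chosen so that λ² stays small:

```python
import numpy as np

# Explicit scheme (9.292) for the cantilever beam; m, k below are illustrative.
rho, l, A, I, E = 7800.0, 1.0, 6e-4, 5e-9, 2e11
h0, beta1 = 0.02, 1.8751
m, k, nsteps = 20, 5e-5, 200
h = l / m
lam2 = E * I / (rho * A) * k**2 / h**4            # ~0.085 with these choices

f1 = lambda z: (np.cosh(z) + np.cos(z)) / 2        # Krylov functions (9.259)
f2 = lambda z: (np.sinh(z) + np.sin(z)) / 2
f3 = lambda z: (np.cosh(z) - np.cos(z)) / 2
f4 = lambda z: (np.sinh(z) - np.sin(z)) / 2

x = np.linspace(0.0, l, m + 1)
D = abs(f1(beta1)*f4(beta1) - f2(beta1)*f3(beta1))
w_prev = h0 * (f1(beta1)*f4(beta1*x/l) - f2(beta1)*f3(beta1*x/l)) / D   # (9.258)
w = w_prev.copy()            # dw/dt(x,0) = 0 gives w_{i,1} = w_{i,0}, cf. (9.282)

for _ in range(nsteps):
    w_next = np.zeros(m + 1)
    for i in range(1, m - 1):
        wm2 = w[i - 2] if i >= 2 else w[1]         # ghost node w_{-1} = w_1 at B
        w_next[i] = (2*w[i] - w_prev[i]
                     - lam2*(w[i+2] - 4*w[i+1] + 6*w[i] - 4*w[i-1] + wm2))
    w_next[m-1] = 2*w_next[m-2] - w_next[m-3]      # point D, relation (9.294)
    w_next[m]   = 2*w_next[m-1] - w_next[m-2]      # point E, relation (9.296)
    w_prev, w = w, w_next

print(w[m // 2])             # mid-span deflection at t = nsteps * k = 0.01 s
```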
FURTHER READING

Acton FS (1990). Numerical Methods that Work. 4th ed. Washington: Mathematical Association of America.
Atkinson K, Han W (2010). Theoretical Numerical Analysis: A Functional Analysis Framework. 3rd ed. New York: Springer-Verlag.
Babuška I, Práger M, Vitásek E (1966). Numerical Processes in Differential Equations. Prague: SNTI.
Bakhvalov N (1976). Méthodes Numériques. Moscou: Éditions Mir (in French).
Boyce WE, DiPrima RC (2008). Elementary Differential Equations and Boundary Value Problems. 9th ed. Hoboken: John Wiley & Sons, Inc.
Burden RL, Faires L (2009). Numerical Analysis. 9th ed. Boston: Brooks/Cole.
Cheney EW, Kincaid DR (1997). Numerical Mathematics and Computing. 6th ed. Belmont: Thomson.
Dahlquist G, Björck Å (1974). Numerical Methods. Englewood Cliffs: Prentice Hall.
Den Hartog JP (1961). Strength of Materials. New York: Dover Books on Engineering.
Epperson JF (2007). An Introduction to Numerical Methods and Analysis. Hoboken: John Wiley & Sons, Inc.
Farlow SJ (1982). Partial Differential Equations for Scientists and Engineers. New York: John Wiley & Sons, Inc.
Gockenbach MS (2010). Partial Differential Equations: Analytical and Numerical Methods. 2nd ed. Philadelphia: SIAM.
Godunov SK, Reabenki VS (1977). Scheme de Calcul cu Diferențe Finite. București: Editura Tehnică (in Romanian).
Golub GH, van Loan CF (1996). Matrix Computations. 3rd ed. Baltimore: Johns Hopkins University Press.
Greenbaum A, Chartier TP (2012). Numerical Methods: Design, Analysis, and Computer Implementation of Algorithms. Princeton: Princeton University Press.
Grossmann C, Roos HG, Stynes M (2007). Numerical Treatment of Partial Differential Equations. Berlin: Springer-Verlag.
Heinbockel JH (2006). Numerical Methods for Scientific Computing. Victoria: Trafford Publishing.
Hibbeler RC (2010). Mechanics of Materials. 8th ed. Englewood Cliffs: Prentice Hall.
Hoffman JD (1992). Numerical Methods for Engineers and Scientists. New York: McGraw-Hill.
Iserles A (2008). A First Course in the Numerical Analysis of Differential Equations. 2nd ed. Cambridge: Cambridge University Press.
Ixaru LG (1979). Metode Numerice pentru Ecuații Diferențiale cu Aplicații. București: Editura Academiei Române (in Romanian).
Kharab A, Guenther RB (2011). An Introduction to Numerical Methods: A MATLAB Approach. 3rd ed. Boca Raton: CRC Press.
Kunz KS (1957). Numerical Analysis. New York: McGraw-Hill.
Lurie AI (2005). Theory of Elasticity. New York: Springer-Verlag.
Marciuk GI (1983). Metode de Analiză Numerică. București: Editura Academiei Române (in Romanian).
Marciuk GI, Șaidurov VV (1981). Creșterea Preciziei Soluțiilor în Scheme cu Diferențe. București: Editura Academiei Române (in Romanian).
Marinescu G (1974). Analiza Numerică. București: Editura Academiei Române (in Romanian).
Palm WJ III (2007). Mechanical Vibrations. Hoboken: John Wiley & Sons, Inc.
Pandrea N, Pârlac S (2000). Vibrații Mecanice: Teorie și Aplicații din Domeniile Autovehiculelor Rutiere și din Domeniul Prelucrărilor Mecanice. Pitești: Editura Universității din Pitești (in Romanian).
Press WH, Teukolsky SA, Vetterling WT, Flannery BP (2007). Numerical Recipes: The Art of Scientific Computing. 3rd ed. Cambridge: Cambridge University Press.
Quarteroni A, Sacco R, Saleri F (2010). Numerical Mathematics. 2nd ed. Berlin: Springer-Verlag.
Rivière B (2008). Discontinuous Galerkin Methods for Solving Elliptic and Parabolic Equations: Theory and Implementation. Philadelphia: SIAM.
Salvadori MG, Baron ML (1962). Numerical Methods in Engineering. Englewood Cliffs: Prentice Hall.
Samarski A, Andréev V (1978). Méthodes aux Différences pour Équations Elliptiques. Moscou: Éditions Mir (in French).
Sauer T (2011). Numerical Analysis. 2nd ed. London: Pearson.
Shabana AA (2011). Computational Continuum Mechanics. 2nd ed. Cambridge: Cambridge University Press.
Simionescu I, Dranga M, Moise V (1995). Metode Numerice în Tehnică. Aplicații în FORTRAN. București: Editura Tehnică (in Romanian).
Sinha AK (2010). Vibration of Mechanical Systems. Cambridge: Cambridge University Press.
Smith GD (1986). Numerical Solution of Partial Differential Equations: Finite Difference Methods. 3rd ed. Oxford: Oxford University Press.
Stănescu ND (2007). Metode Numerice. București: Editura Didactică și Pedagogică (in Romanian).
Stănescu ND, Munteanu L, Chiroiu V, Pandrea N (2007). Sisteme Dinamice: Teorie și Aplicații. Volume 1. București: Editura Academiei Române (in Romanian).
Stănescu ND, Munteanu L, Chiroiu V, Pandrea N (2011). Sisteme Dinamice: Teorie și Aplicații. Volume 2. București: Editura Academiei Române (in Romanian).
Teodorescu PP, Nicorovici NAP (2010). Applications of the Theory of Groups in Mechanics and Physics. Dordrecht: Kluwer Academic Publishers.
Teodorescu PP (2008). Mechanical Systems: Classical Models. Volume 2: Mechanics of Discrete and Continuous Systems. Dordrecht: Springer-Verlag.
Teodorescu PP (2009). Mechanical Systems: Classical Models. Volume 3: Analytical Mechanics. Dordrecht: Springer-Verlag.
Udriște C, Iftode V, Postolache M (1996). Metode Numerice de Calcul. Algoritmi și Programe Turbo Pascal. București: Editura Tehnică (in Romanian).
10 OPTIMIZATIONS

10.1 INTRODUCTION

Definition 10.1 A method of optimization solves the problem of determining the minimum (maximum) of an objective (purpose) function U, where U : D ⊂ ℝⁿ → ℝ.

Observation 10.1 Because the determination of the maximum of the objective function U is equivalent to the determination of the minimum of the function −U, we may limit ourselves to the determination of the minimum of the objective function.

In general, in optimization problems the global minimum is of interest. Such a point of global minimum will be found among the points of local minimum; it can be unique or multiple (i.e., there exists only one point at which the function U takes its least value in D, or there are several such points, possibly even an infinity). For a local minimum $\bar{x}$ of the function U we can write
$$\nabla U(\bar{x}) = 0, \quad \nabla^2 U(\bar{x}) > 0, \quad (10.1)$$
where ∇U is the gradient of U, that is,
$$\nabla U(\bar{x}) = \left. \frac{\partial U}{\partial x_1} \right|_{x=\bar{x}} \mathbf{i}_1 + \cdots + \left. \frac{\partial U}{\partial x_n} \right|_{x=\bar{x}} \mathbf{i}_n, \quad (10.2)$$
where $x = (x_1, \ldots, x_n)^T$ is a point of D ⊂ ℝⁿ and $\mathbf{i}_1, \ldots, \mathbf{i}_n$ are the unit vectors of the coordinate axes in ℝⁿ, while ∇²U is the Hessian matrix
$$\nabla^2 U(\bar{x}) = \begin{pmatrix} \dfrac{\partial^2 U}{\partial x_1^2} & \dfrac{\partial^2 U}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 U}{\partial x_1 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial^2 U}{\partial x_n \partial x_1} & \dfrac{\partial^2 U}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 U}{\partial x_n^2} \end{pmatrix}_{x=\bar{x}}. \quad (10.3)$$
Definition 10.2 Conditions (10.1) are called optimality conditions.

Observation 10.2
(i) The optimality conditions are sufficient for $\bar{x}$ to be a local minimum of the function U, but they are not necessary.
(ii) The condition $\nabla^2 U(\bar{x}) > 0$ requires that the Hessian matrix be positive definite at the point $\bar{x}$.

To determine the global minimum of the function U, we can proceed intuitively in two ways:
• we start with different points $x^{(0)}$, determining in each case a minimum of the function U; the point $\bar{x}$ is the one that leads to the least value among the minima previously obtained;
• we determine a local minimum; if, after a perturbation, the algorithm returns us to the same point, then that point is a serious candidate for the global minimum.

The optimization methods can be classified according to several criteria:
• from the point of view of the restrictions imposed on the variables, we have optimization problems with or without restrictions;
• from the point of view of the objective function, we may have linear optimization problems, for which both the objective function and the restrictions are linear, and nonlinear optimization problems in the opposite case;
• from the point of view of the calculation of the derivatives, we encounter (i) optimization methods of Newton type, where the Hessian matrix ∇²U(x) and the gradient vector ∇U are calculated, (ii) optimization methods of quasi-Newton type and optimization methods with conjugate gradients, where only the partial derivatives of first order are calculated, and (iii) optimization methods where no partial derivatives are calculated.

The optimization methods are iterative: they determine the value $\bar{x}$ as the limit of a sequence $x^{(0)}, x^{(1)}, \ldots, x^{(k)}, \ldots$ defined iteratively by the relation
$$x^{(k+1)} = x^{(k)} + \alpha_k p^{(k)}, \quad k = 0, 1, \ldots, \quad (10.4)$$
where $p^{(k)}$ is a direction of decrease of the objective function U at step k, while $\alpha_k$ is a positive real number such that
$$U(x^{(k+1)}) < U(x^{(k)}), \quad k = 0, 1, \ldots \quad (10.5)$$
The point $x^{(0)} \in D$ is necessary to start the algorithm.
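The structure (10.4)–(10.5) already admits a minimal computational sketch; here, purely as an illustration, we take a fixed step αₖ and the direction p⁽ᵏ⁾ = −∇U(x⁽ᵏ⁾) on an assumed quadratic test function (the methods of the following sections refine both choices):

```python
import numpy as np

# Minimal sketch of the iteration (10.4) on U(x) = x1^2 + 10 x2^2 (assumed example).
def U(x):
    return x[0]**2 + 10.0 * x[1]**2

def gradU(x):
    return np.array([2.0 * x[0], 20.0 * x[1]])

x = np.array([1.0, 1.0])          # starting point x^(0)
for k in range(200):
    p = -gradU(x)                 # a decreasing direction
    alpha = 0.04                  # fixed step, small enough that (10.5) holds here
    x = x + alpha * p
print(x, U(x))                    # approaches the minimum at (0, 0)
```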
10.2 MINIMIZATION ALONG A DIRECTION

Let us consider a function f : ℝ → ℝ, the minimum of which we wish to determine. Two situations can appear:
• the derivative f′ may be determined analytically. In this case, we have to solve the equation f′(x) = 0 and to verify which of its solutions are local minima. The global minimum will, obviously, be the smallest of these local minima and will correspond to one or several points at which f′(x) = 0;
• the derivative f′ cannot be determined analytically. In this case, we have to go through two steps:
(a) localization of the minimum, that is, the determination of an interval (a, b) of the real axis that contains the point of minimum;
(b) reduction of the length of the interval (a, b) until it becomes strictly smaller than an imposed value ε:
$$|b - a| < \varepsilon. \quad (10.6)$$

Observation 10.3 Let us denote by εₘ the representation error of the numbers in the computer, that is, the minimal distance between two numbers representable in the computer for which the two representations differ. Under these conditions, ε must fulfill the relation
$$\varepsilon \geq \sqrt{\varepsilon_m}. \quad (10.7)$$
Indeed, let a be a point sufficiently near to the point of minimum, so that
$$f'(a) \approx 0. \quad (10.8)$$
Taylor's relation around the point a leads to
$$f(b) \approx f(a) + \frac{(b - a)^2}{2!} f''(a). \quad (10.9)$$
The values a and b must satisfy the relation
$$|f(b) - f(a)| > \varepsilon_m |f(a)|, \quad (10.10)$$
so that the representations of f(a) and f(b) be different. We thus deduce
$$|b - a| \approx \sqrt{\frac{2\varepsilon_m |f(a)|}{|f''(a)|}} = |a|\sqrt{\varepsilon_m} \sqrt{\frac{2|f(a)|}{a^2 |f''(a)|}}. \quad (10.11)$$
Moreover, if $2|f(a)|/(a^2 f''(a))$ is of order O(1), then |b − a| is of order $O(|a|\sqrt{\varepsilon_m})$, and the condition
$$|b - a| < \varepsilon |a| \quad (10.12)$$
leads to equation (10.7).

10.2.1 Localization of the Minimum

To localize the minimum of a function f : ℝ → ℝ, at least three points are necessary. Considering three points a, b, and c such that a < b < c, the minimum xₘ is situated in the interval (a, c) if f(a) > f(b) and f(b) < f(c). If we have two values a and b, with a < b and f(a) > f(b), we use the following algorithm for the localization of the minimum:
– given: a, b, a < b, f(a) > f(b);
– calculate fa = f(a), fb = f(b);
– repeat
– calculate c = b + k(b − a), fc = f(c);
– if fc > fb then xₘ ∈ (a, c); stop; else
– calculate a = b, b = c, fa = fb, fb = fc;
until false.
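A direct transcription of the localization algorithm above (a sketch with a constant expansion factor k; see Observation 10.4 for the usual growing step):

```python
# Localization of the minimum: starting from a < b with f(a) > f(b),
# step beyond b until the function increases again.
def bracket_minimum(f, a, b, k=1.0, max_iter=100):
    fa, fb = f(a), f(b)
    assert a < b and fa > fb
    for _ in range(max_iter):
        c = b + k * (b - a)
        fc = f(c)
        if fc > fb:
            return a, b, c            # f(a) > f(b), f(b) < f(c): minimum in (a, c)
        a, b, fa, fb = b, c, fb, fc
    raise RuntimeError("no bracket found")

print(bracket_minimum(lambda x: (x - 3.0)**2, 0.0, 1.0))   # (2.0, 3.0, 4.0)
```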
Observation 10.4
(i) Usually, the searching step is not taken constant (k = 1), but increases from one step to another, so that the localization of the minimum takes place as fast as possible:
$$h_{j+1} = k h_j, \quad k > 1. \quad (10.13)$$
(ii) The algorithm may be improved by using a parabolic interpolation. Thus, a parabola passes through the points A(a, f(a)), B(b, f(b)), and C(c, f(c)), whose equation is
$$g(x) = \frac{(x - b)(x - c)}{(a - b)(a - c)} f(a) + \frac{(x - a)(x - c)}{(b - a)(b - c)} f(b) + \frac{(x - a)(x - b)}{(c - a)(c - b)} f(c) = d_2 x^2 + d_1 x + d_0. \quad (10.14)$$
Let us denote the point of minimum of this parabola by
$$x^* = -\frac{d_1}{2d_2}. \quad (10.15)$$
The following situations may occur:
• x* > c. In this case we require that x* not be very far from the point c, that is, |x* − c| < λ|c − b|, where we may take, for example, λ = 50;
• x* < a. The situation is similar to the previous one, replacing the point c by the point a;
• x* ∈ (b, c), f(b) > f(x*), f(x*) < f(c). It follows that the minimum of the function is between the points b and c;
• x* ∈ (a, b), f(a) > f(x*), f(x*) < f(b). The case is analogous to the preceding one, the minimum of the function f now taking place between a and b;
• x* ∈ (b, c), f(b) ≤ f(x*) or f(x*) ≥ f(c). The algorithm fails;
• x* ∈ (a, b), f(a) ≤ f(x*) or f(x*) ≥ f(b). The algorithm fails.

10.2.2 Determination of the Minimum

There are two ways to solve the problem. The first method supposes the reduction of the interval in which the minimum has been localized by successive steps, until the point of minimum is obtained with the desired accuracy. The method has the advantage of reliability (the point of minimum is correctly determined), but also the disadvantage of a slow convergence.
A second method consists in replacing the function f(x) by another function g(x) which passes through certain points common with f(x), so that g(xᵢ) = f(xᵢ) for certain xᵢ of the interval in which the minimum takes place; the minimum of the function g(x) is then searched for. The method has the advantage of a faster convergence than the previous one, but also the disadvantage of possibly leading to great errors if the point of minimum of the function g(x) is not in the considered interval. Usually, we take a parabola for g(x), because only three points are necessary to determine it.
In connection with the first method, let us present the golden section algorithm¹:
– given: a < b < c, f(a) > f(b), f(b) < f(c), ε > √εₘ, w = 0.38197;

¹The algorithm was presented by Jack Carl Kiefer (1924–1981) in 1953.
– calculate w₁ = 1 − w, x₀ = a, x₃ = c, f₀ = f(a), f₃ = f(c);
– if |c − a| > |b − a| then x₁ = b, x₂ = b + w|c − b|; else x₂ = b, x₁ = b − w|b − a|;
– calculate f₁ = f(x₁), f₂ = f(x₂);
– while |x₃ − x₀| > ε|x₁ + x₂| do
– if f₂ < f₁ then x₀ = x₁, x₁ = x₂, x₂ = w₁x₁ + wx₃, f₀ = f₁, f₁ = f₂, f₂ = f(x₂); else x₃ = x₂, x₂ = x₁, x₁ = w₁x₂ + wx₀, f₃ = f₂, f₂ = f₁, f₁ = f(x₁);
– if f₁ < f₂ then xmin = x₁, fmin = f₁; else xmin = x₂, fmin = f₂.

The idea of the golden section algorithm is based on the following considerations. Let us consider three points a, b, and c with
$$a < b < c, \quad f_a = f(a) > f(b) = f_b, \quad f_b < f_c = f(c). \quad (10.16)$$
Let
$$w = \frac{b - a}{c - a}, \quad 1 - w = \frac{c - b}{c - a}. \quad (10.17)$$
We shall try to find a point x ∈ (a, c) so as to diminish the interval in which the minimum will be determined. We suppose also that (b, c) is an interval of length greater than that of (a, b) and that x is in (b, c). Let us denote
$$z = \frac{x - b}{c - a}. \quad (10.18)$$
The point of minimum will be either in the interval (a, x) or in the interval (b, c). We may write
$$\frac{x - a}{c - a} = w + z, \quad \frac{c - b}{c - a} = 1 - w. \quad (10.19)$$
Imposing the condition of equality of the two ratios of (10.19) (the most unfavorable case), it follows that
$$z = 1 - 2w. \quad (10.20)$$
But the same method has been used for the determination of the point b at the previous step,
$$\frac{x - b}{c - b} = \frac{b - a}{c - a} = w, \quad (10.21)$$
from which we may successively deduce
$$x - b = w(c - b) = z(c - a), \quad 1 - w = \frac{c - b}{c - a} = \frac{z}{w}. \quad (10.22)$$
We have thus obtained the equation
$$w^2 - 3w + 1 = 0, \quad (10.23)$$
which has the solution (w must be in the interval (0, 1))
$$w = \frac{3 - \sqrt{5}}{2} \approx 0.38197; \quad (10.24)$$
hence, it follows that the position of the point x is
$$x = b + w(c - b) = c - (1 - w)(c - b). \quad (10.25)$$
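A direct transcription of the golden section algorithm above into Python (a sketch; the bracket a < b < c is assumed to satisfy (10.16)):

```python
import math

# Golden section search: shrink the bracket (a, c) by a constant factor per step.
def golden_section(f, a, b, c, eps=1e-8):
    w = (3.0 - math.sqrt(5.0)) / 2.0            # ~0.38197, cf. (10.24)
    x0, x3 = a, c
    if abs(c - b) > abs(b - a):                  # place the interior points, cf. (10.25)
        x1, x2 = b, b + w * (c - b)
    else:
        x2, x1 = b, b - w * (b - a)
    f1, f2 = f(x1), f(x2)
    while abs(x3 - x0) > eps * (abs(x1) + abs(x2)):
        if f2 < f1:
            x0, x1, f1 = x1, x2, f2
            x2 = x1 + w * (x3 - x1); f2 = f(x2)
        else:
            x3, x2, f2 = x2, x1, f1
            x1 = x2 - w * (x2 - x0); f1 = f(x1)
    return (x1, f1) if f1 < f2 else (x2, f2)

print(golden_section(lambda x: (x - 3.0)**2, 2.0, 2.5, 4.0))   # ~ (3.0, 0.0)
```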
We now present Brent's algorithm² for the second method:
– given: a, c, f(a), f(c), nmax, w = 0.381966, ε;
– calculate b = c, fb = fc, u = b, fu = fb;
– if fb < fa then t = b, ft = fb, v = a, fv = fa; else t = a, ft = fa, v = b, fv = fb;
– set i = 1, δu = 0, δx = b − a;
– calculate x = 0.5(b + a), fx = f(x);
– while (b − a) > ε(2|x| + 1) and i ≤ nmax do
– calculate xₘ = 0.5(b + a);
– if |δx| > 0.5δu or u − a < ε(2|x| + 1) or b − u < ε(2|x| + 1) then
– if x > xₘ then δx = w(a − x); else δx = w(b − x), δu = max(|b − x|, |a − x|);
else r = (x − t)(fx − fv), q = (x − v)(fx − ft), p = (x − v)q − (x − t)r, δx = −0.5p/(q − r), δu = |δx|;
– calculate fu = f(u), u = x + δx;
– if fu ≤ fx then
– if u ≥ x then a = x; else b = x;
– calculate v = t, t = x, x = u, fv = ft, ft = fx, fx = fu;
else
– if u < x then a = u; else b = u;
– if fu ≤ ft or t = x then v = t, t = u, fv = ft, ft = fu; else
– if fu ≤ fv or v = x or x = t then v = u, fv = fu;
– set i = i + 1.

Brent's algorithm uses six points a, b, u, v, t, x, not necessarily distinct, with the following meanings: a and b are the limits of the interval which contains the minimum; x is the point at which the function f takes its smallest value so far; t is the value previous to x; v is the value previous to t; u is the point at which the function f has been calculated last. The parabolic interpolation is made through the points (x, f(x)), (t, f(t)), and (v, f(v)). Brent's algorithm combines the reliability of the first method with the speed of the parabolic interpolation. To do this, we must take certain precautions so that the parabolic interpolation can be accepted, that is:
• the calculated minimum must be in the interval (a, b);

²Richard Pierce Brent (1946–) published this algorithm (also known as Brent's method) in 1973.
• the displacement with respect to the last value which approximates the minimum of f must be at most equal to half of the previous displacement, to be sure that we have a convergent process;
• the calculated point of minimum u must not be very near to another previously calculated value, that is, |u − p| > εp.

10.3 CONJUGATE DIRECTIONS

A method to determine the minimum of a function U : ℝⁿ → ℝ may be conceived as a repetition of the method of one-dimensional search along the directions i₁, i₂, …, iₙ, not necessarily in this order. We thus determine a partial minimum of the function U, realizing the minimization of this function along the direction $i_{j_1}$; let U₁ be this minimum. We then minimize along the direction $i_{j_2}$, resulting in the minimum U₂, and so on until $i_{j_n}$, obtaining the minimum Uₙ. In the above procedure, we have $j_k \in \{1, 2, \ldots, n\}$ and $i_{j_k} \neq i_{j_l}$ for $j_k \neq j_l$, with k, l = 1, …, n. Moreover, there holds the sequence of inequalities
$$U_1 \geq U_2 \geq \cdots \geq U_n. \quad (10.26)$$
The algorithm is as follows:
– given: x⁽⁰⁾, U(x);
– for j from 1 to n do determine x⁽ʲ⁾ so that $U(x^{(j)}) = \min_{\alpha \in \mathbb{R}} U(x^{(j-1)} + \alpha i_j)$.

Definition 10.3 The method considered above is called the method of one-dimensional search.

Observation 10.5 The method is very simple, but has the disadvantage that either the minimum is not found or the running time of the algorithm is too great for it to be efficient. The problem is thus to determine other, more efficient displacement directions.

Definition 10.4 The decreasing directions for which the method of one-dimensional search converges are called conjugate directions.

Let us suppose that U(x) is twice differentiable with continuous derivatives. We may define the quadratic form
$$\varphi(x) = U(x^{(k)}) + \begin{pmatrix} x_1 - x_1^{(k)} & \cdots & x_n - x_n^{(k)} \end{pmatrix} \begin{pmatrix} \frac{\partial U}{\partial x_1} \\ \vdots \\ \frac{\partial U}{\partial x_n} \end{pmatrix}_{x=x^{(k)}} + \frac{1}{2} \begin{pmatrix} x_1 - x_1^{(k)} & \cdots & x_n - x_n^{(k)} \end{pmatrix} \begin{pmatrix} \frac{\partial^2 U}{\partial x_1^2} & \cdots & \frac{\partial^2 U}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 U}{\partial x_n \partial x_1} & \cdots & \frac{\partial^2 U}{\partial x_n^2} \end{pmatrix}_{x=x^{(k)}} \begin{pmatrix} x_1 - x_1^{(k)} \\ \vdots \\ x_n - x_n^{(k)} \end{pmatrix}. \quad (10.27)$$
We observe that the quadratic form φ coincides with the first three terms of the expansion of the function U(x) into a Taylor series about x⁽ᵏ⁾. The previous expression may also be written in the form
$$\varphi(x) = U(x^{(k)}) + (x - x^{(k)})^T \, \nabla U(x)\big|_{x=x^{(k)}} + \frac{1}{2} (x - x^{(k)})^T \, \nabla^2 U(x)\big|_{x=x^{(k)}} \, (x - x^{(k)}). \quad (10.28)$$
Moreover,
$$\nabla\varphi(x) = \nabla U(x)\big|_{x=x^{(k)}} + \nabla^2 U(x)\big|_{x=x^{(k)}} (x - x^{(k)}). \quad (10.29)$$
Let us denote by p⁽ᵏ⁾ the conjugate directions. The point x⁽ᵏ⁾ is the point which minimizes the function φ(x⁽ᵏ⁻¹⁾ + αp⁽ᵏ⁻¹⁾); hence ∇U(x)|ₓ₌ₓ₍ₖ₎ must be normal to the direction p⁽ᵏ⁻¹⁾, which is written in the form
$$[p^{(k-1)}]^T \, \nabla U(x)\big|_{x=x^{(k)}} = 0. \quad (10.30)$$
Moreover, the gradient of the function U(x), calculated at x = x⁽ᵏ⁺¹⁾, must be normal to the direction p⁽ᵏ⁻¹⁾, otherwise p⁽ᵏ⁻¹⁾ would not be a conjugate direction of minimization. Hence,
$$[p^{(k-1)}]^T \, \nabla U(x)\big|_{x=x^{(k+1)}} = 0, \quad (10.31)$$
and equation (10.29) leads to
$$\nabla\varphi(x) = \nabla U(x)\big|_{x=x^{(k+1)}} + \nabla^2 U(x)\big|_{x=x^{(k+1)}} (x - x^{(k+1)}). \quad (10.32)$$
Subtracting relations (10.32) and (10.29) one from the other, we get
$$\nabla U(x)\big|_{x=x^{(k+1)}} - \nabla U(x)\big|_{x=x^{(k)}} + \left[ \nabla^2 U(x)\big|_{x=x^{(k+1)}} - \nabla^2 U(x)\big|_{x=x^{(k)}} \right](x - x^{(k+1)}) + \nabla^2 U(x)\big|_{x=x^{(k)}} (x^{(k)} - x^{(k+1)}) = 0. \quad (10.33)$$
Taking now into account that x⁽ᵏ⁺¹⁾ has been determined by the displacement along the conjugate direction p⁽ᵏ⁾, it follows that
$$\nabla U(x)\big|_{x=x^{(k+1)}} = \nabla U(x)\big|_{x=x^{(k)}} + \nabla^2 U(x)\big|_{x=x^{(k)}} (x^{(k+1)} - x^{(k)}) = \nabla U(x)\big|_{x=x^{(k)}} + \alpha_k \nabla^2 U(x)\big|_{x=x^{(k)}} \, p^{(k)}, \quad (10.34)$$
with αₖ ∈ ℝ. Taking into account formulae (10.29) and (10.30), the product of the last relation with [p⁽ᵏ⁻¹⁾]ᵀ leads to
$$[p^{(k-1)}]^T \, \nabla^2 U(x)\big|_{x=x^{(k)}} \, p^{(k)} = 0. \quad (10.35)$$

Definition 10.5 Two directions which satisfy condition (10.35) are called G-conjugate directions.

Observation 10.6
(i) If φ is a quadratic form, then its minimum is obtained after n displacements along n conjugate directions defined by relation (10.35). Therefore, at each stage of minimization of φ along the direction p⁽ᵏ⁾, the minimum must be determined so that
$$[p^{(k)}]^T \, \nabla U(x)\big|_{x=x^{(k)}} = 0. \quad (10.36)$$
(ii) If the function U is not a quadratic form, then its minimum is not obtained after n displacements, but we arrive sufficiently near to it.
10.4 POWELL'S ALGORITHM

Powell's algorithm³ gives a procedure to determine n conjugate directions without using the matrix ∇²U(x) and is as follows:
– given: x⁽⁰⁾, U(x), ε, n, iter;
– for l from 1 to iter do
– for j from 1 to n do
– set p⁽ʲ⁾ = iⱼ;
– for k from 1 to n − 1 do
– for i from 1 to n do
– determine x⁽ⁱ⁾ so that $U(x^{(i)}) = \min_{\alpha \in \mathbb{R}} U(x^{(i-1)} + \alpha p^{(i)})$;
– for i from 1 to n − 1 do p⁽ⁱ⁾ = p⁽ⁱ⁺¹⁾;
– set p⁽ⁿ⁾ = x⁽ⁿ⁾ − x⁽⁰⁾;
– determine x⁽⁰⁾ so that $U(x^{(0)}) = \min_{\alpha \in \mathbb{R}} U(x^{(n)} + \alpha p^{(n)})$;
– if |U − U₀| < ε(1 + |U|) then stop (the minimum has been determined).

Powell showed that, for a quadratic form φ, k iterations lead to a set of directions p⁽ⁱ⁾ of which the last k are G-conjugate, provided the minimizations along the directions p⁽ⁱ⁾ have been made exactly. In the frame of the algorithm, an iteration means n + 1 minimizations, made along the directions p⁽¹⁾, p⁽²⁾, …, p⁽ⁿ⁾ and x⁽ⁿ⁾ − x⁽⁰⁾. Powell's algorithm has the tendency to lead to linearly dependent directions. To avoid this phenomenon, we have two possibilities:
• either we use new initial positions for the directions p⁽ʲ⁾ = iⱼ after n + 1 iterations;
• or we discard the direction p⁽ʲ⁾ which has produced the greatest decrease of the function U(x).

³Michael James David Powell (1936–) proposed this method in 1964.
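A compact sketch of Powell's cycle follows (an illustration, not the book's reference implementation); SciPy's one-dimensional minimizer stands in for the line searches, and the test function is an assumed quadratic:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Powell's cycle: n line minimizations along the current directions, then
# the directions are shifted and p^(n) = x^(n) - x^(0) is appended.
def powell(U, x0, iters=20):
    n = len(x0)
    P = [np.eye(n)[j] for j in range(n)]       # start with the coordinate directions
    x = np.asarray(x0, float)
    for _ in range(iters):
        x_start = x.copy()
        for p in P:
            a = minimize_scalar(lambda t: U(x + t * p)).x   # one-dimensional search
            x = x + a * p
        P = P[1:] + [x - x_start]              # drop the first direction, add the new one
        a = minimize_scalar(lambda t: U(x + t * P[-1])).x
        x = x + a * P[-1]
    return x

U = lambda x: (x[0] - 1.0)**2 + 10.0 * (x[0] - x[1])**2     # assumed test function
print(powell(U, [-2.0, 3.0]))                  # approaches [1, 1]
```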
10.5 METHODS OF GRADIENT TYPE

The methods of gradient type are characterized by the use of the gradient of the function to be optimized, ∇U(x).

10.5.1 The Gradient Method

This method arises from the observation that the (n − 1)-dimensional hypersurfaces of equations
$$U(x) = C_i = \text{const}, \quad i = 1, 2, \ldots, \quad (10.37)$$
are disposed so that the constants Cᵢ take greater and greater values when we advance along the positive direction of the gradient.

Definition 10.6 The hypersurfaces defined by relation (10.37) bear the name of level surfaces of the function U.

The gradient method supposes the construction of the sequence of iterations
$$x^{(0)} \text{ arbitrary}, \quad x^{(k+1)} = x^{(k)} - \alpha_k \nabla U(x^{(k)}), \quad (10.38)$$
where
$$U(x^{(k)}) > U(x^{(k+1)}). \quad (10.39)$$
Let us notice that the direction p⁽ᵏ⁾ = −∇U(x⁽ᵏ⁾) is a direction of decrease of the value of the function U(x) at the point x⁽ᵏ⁾ (as a matter of fact, it is the direction of maximum decrease of the function U(x) at the point x⁽ᵏ⁾). The real value αₖ is determined by using one of the methods previously emphasized. Moreover, if the value αₖ is exactly determined, then between the gradients at the points x⁽ᵏ⁾ and x⁽ᵏ⁺¹⁾ there exists the relation
$$\nabla U(x^{(k)}) \perp \nabla U(x^{(k+1)}) \;\Rightarrow\; [\nabla U(x^{(k)})]^T \nabla U(x^{(k+1)}) = 0. \quad (10.40)$$

Definition 10.7 If the value of the scalar αₖ is exactly determined at each step k, then we say that the gradient method uses an optimal step or a Cauchy step.

Any algorithm which uses the gradient of the objective function U(x) has the following structure:
– given: x⁽⁰⁾, U(x), ∇U(x), ε, iter;
– set x = x⁽⁰⁾, Uₖ = U(x⁽⁰⁾), ∇U(x⁽ᵏ⁾) = ∇U(x⁽⁰⁾), p = −∇U(x⁽ᵏ⁾);
– for i from 1 to iter do
– determine x so that $U(x) = \min_{\alpha \in \mathbb{R}} U(x^{(k)} + \alpha p)$;
– set Uₖ₊₁ = U(x), ∇U(x⁽ᵏ⁺¹⁾) = ∇U(x);
– if Uₖ₊₁ ≥ Uₖ then the algorithm failed; stop; else perform the test of convergence; update the decreasing direction p;
– set Uₖ = Uₖ₊₁.

Observation 10.7
(i) Any one-dimensional minimization method may be chosen, for example, Brent's method.
(ii) The gradient method does not require an exact calculation of the one-dimensional minimization. Therefore, we must specify a certain sufficiency criterion to determine the one-dimensional minimum. An idea is that of using the directional derivative in the form
$$\left| [p^{(k)}]^T \nabla U[x^{(k)} + \alpha_k p^{(k)}] \right| \leq \eta \left| [p^{(k)}]^T \nabla U(x^{(k)}) \right|, \quad 0 \leq \eta \leq 1. \quad (10.41)$$
Thus, for η = 0 it follows that [p⁽ᵏ⁾]ᵀ∇U(x⁽ᵏ⁺¹⁾) = 0, hence the one-dimensional minimization has been made exactly. We may also impose a condition of sufficient decrease in the form
$$U(x^{(k+1)}) - U(x^{(k)}) \leq \mu \alpha_k [\nabla U(x^{(k)})]^T p^{(k)}. \quad (10.42)$$
In general, we take
$$10^{-5} \leq \mu \leq 10^{-1}, \quad \mu < \eta < 1. \quad (10.43)$$
(iii) Concerning the convergence test, we may use several criteria. One of them is defined by the relation
$$\| x^{(k+1)} - x^{(k)} \| \leq \varepsilon (1 + \| x^{(k+1)} \|). \quad (10.44)$$
A second criterion reads
$$| U(x^{(k+1)}) - U(x^{(k)}) | \leq \varepsilon (1 + | U(x^{(k+1)}) |). \quad (10.45)$$
Sometimes one uses a criterion of the form
$$\| \nabla U(x^{(k+1)}) \| \leq \varepsilon, \quad (10.46)$$
but its fulfillment does not necessarily mean that U has a minimum at that point (it can be a point of maximum or a mini–max one).

10.5.2 The Conjugate Gradient Method

Let us consider the quadratic form
$$\varphi(x) = U(x^{(k)}) + [x - x^{(k)}]^T \nabla U(x^{(k)}) + \frac{1}{2} [x - x^{(k)}]^T \nabla^2 U(x^{(k)}) [x - x^{(k)}] \quad (10.47)$$
and a point x⁽ᵏ⁺¹⁾ for which we can write
$$\nabla\varphi(x^{(k+1)}) = \nabla U(x^{(k)}) + \nabla^2 U(x^{(k)}) [x^{(k+1)} - x^{(k)}] = \nabla U(x^{(k)}) + \alpha_k \nabla^2 U(x^{(k)}) p^{(k)}, \quad (10.48)$$
where
$$x^{(k+1)} = x^{(k)} + \alpha_k p^{(k)}, \quad (10.49)$$
while the decreasing directions are given by
$$p^{(k+1)} = -\nabla U(x^{(k+1)}) + \beta_k p^{(k)}. \quad (10.50)$$
Imposing the condition that the directions p⁽ᵏ⁾ and p⁽ᵏ⁺¹⁾ be G-conjugate,
$$[p^{(k+1)}]^T \nabla^2 U(x^{(k)}) p^{(k)} = 0, \quad (10.51)$$
transposing relation (10.50),
$$[p^{(k+1)}]^T = -[\nabla U(x^{(k+1)})]^T + \beta_k [p^{(k)}]^T, \quad (10.52)$$
and multiplying it at the right by ∇²U(x⁽ᵏ⁾)p⁽ᵏ⁾, we get
$$\beta_k = \frac{[\nabla U(x^{(k+1)})]^T \nabla^2 U(x^{(k)}) p^{(k)}}{[p^{(k)}]^T \nabla^2 U(x^{(k)}) p^{(k)}}. \quad (10.53)$$
Multiplying relation (10.52) by ∇²U(x⁽ᵏ⁾)p⁽ᵏ⁺¹⁾, it now follows that
$$[p^{(k+1)}]^T \nabla^2 U(x^{(k)}) p^{(k+1)} = -[\nabla U(x^{(k+1)})]^T \nabla^2 U(x^{(k)}) p^{(k+1)}, \quad (10.54)$$
where we take into account relation (10.51).
On the other hand, formula (10.48) leads to
$$\nabla^2 U(x^{(k)}) p^{(k)} = \frac{\nabla U(x^{(k+1)}) - \nabla U(x^{(k)})}{\alpha_k}, \quad (10.55)$$
a relation which holds if ∇U(x⁽ᵏ⁺¹⁾) and ∇U(x⁽ᵏ⁾) are normal to each other, hence
$$[\nabla U(x^{(k+1)})]^T \nabla U(x^{(k)}) = 0. \quad (10.56)$$
Relation (10.53) now leads to
$$\beta_k = -\frac{[\nabla U(x^{(k+1)})]^T \nabla^2 U(x^{(k)}) p^{(k)}}{[\nabla U(x^{(k)})]^T \nabla^2 U(x^{(k)}) p^{(k)}} = \frac{[\nabla U(x^{(k+1)})]^T \nabla U(x^{(k+1)})}{[\nabla U(x^{(k)})]^T \nabla U(x^{(k)})}. \quad (10.57)$$
Multiplying relation (10.48) by [∇U(x⁽ᵏ⁺¹⁾)]ᵀ and by [∇U(x⁽ᵏ⁾)]ᵀ and imposing the condition (10.56) of perpendicularity of the vectors ∇U(x⁽ᵏ⁾) and ∇U(x⁽ᵏ⁺¹⁾), we obtain
$$\alpha_k = -\frac{[\nabla U(x^{(k)})]^T \nabla U(x^{(k)})}{[\nabla U(x^{(k)})]^T \nabla^2 U(x^{(k)}) p^{(k)}} = \frac{[\nabla U(x^{(k+1)})]^T \nabla U(x^{(k+1)})}{[\nabla U(x^{(k+1)})]^T \nabla^2 U(x^{(k)}) p^{(k)}}. \quad (10.58)$$
On the other hand, the value αₖ of equation (10.48) is the value obtained from the minimization $\min_{\alpha \in \mathbb{R}} U[x^{(k)} + \alpha p^{(k)}]$. Indeed, it is sufficient to show that the vectors p⁽ᵏ⁾ and ∇U(x⁽ᵏ⁺¹⁾) are normal to each other,
$$[p^{(k)}]^T \nabla U(x^{(k+1)}) = 0. \quad (10.59)$$
But from equation (10.48), equation (10.50), and equation (10.54) it follows that
$$[p^{(k)}]^T \nabla U(x^{(k+1)}) = \beta_{k-1} [p^{(k-1)}]^T \nabla U(x^{(k)}). \quad (10.60)$$
We thus deduce that if at the previous step the one-dimensional search has been made exactly, that is, αₖ₋₁ has been determined so that p⁽ᵏ⁻¹⁾ and ∇U(x⁽ᵏ⁾) be normal to each other, then relation (10.59) holds too.

Observation 10.8 We have thus obtained the G-conjugate directions p⁽ᵏ⁾, for which it has not been necessary to know the Hessian matrix, but for which it is necessary that the weights αₖ be exactly calculated.

We use several variants to determine βₖ, namely:
• the Fletcher–Reeves method⁴, for which
$$\beta_k = \frac{[\nabla U(x^{(k+1)})]^T \nabla U(x^{(k+1)})}{[\nabla U(x^{(k)})]^T \nabla U(x^{(k)})}; \quad (10.61)$$
• the Polak–Ribière method⁵, given by
$$\beta_k = \frac{[\nabla U(x^{(k+1)})]^T y^{(k)}}{[\nabla U(x^{(k)})]^T \nabla U(x^{(k)})}, \quad y^{(k)} = \nabla U(x^{(k+1)}) - \nabla U(x^{(k)}); \quad (10.62)$$

⁴Roger Fletcher and C. M. Reeves published it in 1964.
⁵The method was presented by E. Polak and G. Ribière in 1969.
• the Hestenes–Stiefel method⁶, characterized by
$$\beta_k = \frac{[\nabla U(x^{(k+1)})]^T y^{(k)}}{[\nabla U(x^{(k)})]^T p^{(k)}}, \quad y^{(k)} = \nabla U(x^{(k+1)}) - \nabla U(x^{(k)}). \quad (10.63)$$
The most robust of these three methods is the Polak–Ribière method.

10.5.3 Solution of Systems of Linear Equations by Means of Methods of Gradient Type

Let the linear system be
$$\mathbf{A}x = b, \quad (10.64)$$
where A is a positive definite symmetric matrix,
$$\mathbf{A}^T = \mathbf{A}, \quad x^T \mathbf{A} x > 0 \quad \forall\, x \neq 0. \quad (10.65)$$
The solution of system (10.64) is equivalent to the minimization of the quadratic form
$$U(x) = \langle x, \mathbf{A}x \rangle - 2\langle x, b \rangle, \quad (10.66)$$
where ⟨·, ·⟩ denotes the dot product given by
$$\langle y, z \rangle = y^T z. \quad (10.67)$$
The gradient of U(x) is expressed, for the symmetric matrix A, by
$$\nabla U(x) = -2(b - \mathbf{A}x), \quad (10.68)$$
while the Hessian reads
$$\nabla^2 U(x) = 2\mathbf{A}. \quad (10.69)$$
If we denote by $\bar{x}$ the solution of system (10.64), then
$$\nabla U(\bar{x}) = 0, \quad \nabla^2 U(\bar{x}) = 2\mathbf{A}, \quad (10.70)$$
hence the function U has a minimum at $\bar{x}$. Moreover, if p is a decreasing direction, then we also have
$$U(x + \alpha p) = \langle x + \alpha p, \mathbf{A}(x + \alpha p) \rangle - 2\langle x + \alpha p, b \rangle = U(x) + 2\alpha \langle p, \mathbf{A}x - b \rangle + \alpha^2 \langle p, \mathbf{A}p \rangle. \quad (10.71)$$
On the other hand,
$$\langle p, \mathbf{A}p \rangle > 0, \quad (10.72)$$
because A is a positive definite matrix; hence U(x + αp) has a minimum for α = ᾱ, obtained from
$$\frac{\mathrm{d}U(x + \alpha p)}{\mathrm{d}\alpha} = 0, \quad (10.73)$$

⁶Magnus Rudolph Hestenes (1906–1991) and Eduard L. Stiefel (1909–1978) published the method in 1952.
that is,
$$2\langle p, \mathbf{A}x - b \rangle + 2\alpha \langle p, \mathbf{A}p \rangle = 0, \quad (10.74)$$
from which
$$\bar{\alpha} = \frac{\langle p, b - \mathbf{A}x \rangle}{\langle p, \mathbf{A}p \rangle}. \quad (10.75)$$
For α = ᾱ it follows that the minimum of the function U(x + αp) along the direction p is
$$U(x + \bar{\alpha} p) = U(x) + \bar{\alpha}\left[ 2\langle p, \mathbf{A}x - b \rangle + \bar{\alpha}\langle p, \mathbf{A}p \rangle \right] = U(x) - \frac{\langle p, b - \mathbf{A}x \rangle^2}{\langle p, \mathbf{A}p \rangle}. \quad (10.76)$$

Observation 10.9
(i) Using the gradient method, for which the decreasing direction is
$$p = -\nabla U(x), \quad (10.77)$$
we obtain the following algorithm:
– given: x⁽⁰⁾, A, b, iter, ε;
– set i = 1, norm = 1, x = x⁽⁰⁾;
– while norm > ε and i ≤ iter do
– calculate p = b − Ax, norm = √⟨p, p⟩, α = norm²/⟨p, Ap⟩, x = x + αp, i = i + 1.
(ii) If we apply the Fletcher–Reeves method, then we obtain the algorithm:
– given: x⁽⁰⁾, A, b, iter, ε, δ;
– set r⁽⁰⁾ = b − Ax⁽⁰⁾, p⁽⁰⁾ = r⁽⁰⁾;
– for k from 0 to iter − 1 do
– if ⟨p⁽ᵏ⁾, p⁽ᵏ⁾⟩ < δ then stop;
– calculate αₖ = ⟨r⁽ᵏ⁾, r⁽ᵏ⁾⟩/⟨p⁽ᵏ⁾, Ap⁽ᵏ⁾⟩, x⁽ᵏ⁺¹⁾ = x⁽ᵏ⁾ + αₖp⁽ᵏ⁾, r⁽ᵏ⁺¹⁾ = r⁽ᵏ⁾ − αₖAp⁽ᵏ⁾;
– if ⟨r⁽ᵏ⁺¹⁾, r⁽ᵏ⁺¹⁾⟩ < ε then stop;
– calculate βₖ = ⟨r⁽ᵏ⁺¹⁾, r⁽ᵏ⁺¹⁾⟩/⟨r⁽ᵏ⁾, r⁽ᵏ⁾⟩, p⁽ᵏ⁺¹⁾ = r⁽ᵏ⁺¹⁾ + βₖp⁽ᵏ⁾.
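The algorithm of Observation 10.9 (ii) translates directly into code; a Python/NumPy sketch on a small symmetric positive definite system:

```python
import numpy as np

# Conjugate gradient for A x = b, A symmetric positive definite,
# following Observation 10.9 (ii).
def conjugate_gradient(A, b, x0, eps=1e-12, iters=None):
    x = np.asarray(x0, float)
    r = b - A @ x
    p = r.copy()
    for _ in range(iters or len(b)):
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)
        x = x + alpha * p
        r_new = r - alpha * Ap
        if r_new @ r_new < eps:
            break
        beta = (r_new @ r_new) / (r @ r)       # Fletcher-Reeves ratio
        p = r_new + beta * p
        r = r_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b, np.zeros(2)))   # ~ [0.0909, 0.6364]
```

In exact arithmetic the method terminates in at most n steps, n being the dimension of the system.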
10.6 METHODS OF NEWTON TYPE

The methods of Newton type use the Hessian matrix ∇²U(x).

10.6.1 Newton's Method

The Newton method approximates the objective function U(x), at an arbitrary iteration k, by a quadratic form
$$\varphi_k(x) = U(x^{(k)}) + [x - x^{(k)}]^T \nabla U(x^{(k)}) + \frac{1}{2}[x - x^{(k)}]^T \nabla^2 U(x^{(k)}) [x - x^{(k)}]. \quad (10.78)$$
If the Hessian matrix ∇²U(x⁽ᵏ⁾) is positive definite, then the quadratic form φₖ(x) has a minimum at x = x̄, hence
$$\varphi_k(x) - \varphi_k(\bar{x}) > 0 \quad (10.79)$$
in a neighborhood of x̄. Moreover, the point of minimum x̄ is a stationary point, hence the gradient of φₖ(x) vanishes at this point:
$$\nabla\varphi_k(\bar{x}) = 0. \quad (10.80)$$
We may write the approximate relation
$$\varphi_k(x) - \varphi_k(\bar{x}) \approx \frac{1}{2}[x - \bar{x}]^T \nabla^2 U(x^{(k)}) [x - \bar{x}]. \quad (10.81)$$
Equation (10.80) may be solved using Newton's method, which leads to the definition of the iterative sequence
$$x^{(0)} \text{ arbitrary}, \quad x^{(k+1)} = x^{(k)} - [\nabla^2 U(x^{(k)})]^{-1} \nabla U(x^{(k)}). \quad (10.82)$$

Definition 10.8 The decreasing direction p⁽ᵏ⁾, defined by
$$p^{(k)} = -[\nabla^2 U(x^{(k)})]^{-1} \nabla U(x^{(k)}), \quad [\nabla U(x^{(k)})]^T p^{(k)} < 0, \quad (10.83)$$
bears the name of Newton direction.

Observation 10.10
(i) The statement "x⁽⁰⁾ arbitrary" in relation (10.82) must be understood as x⁽⁰⁾ being an arbitrary point in a sufficiently small neighborhood of the exact solution; this is valid in any Newton method.
(ii) If the Hessian matrix ∇²U(x⁽ᵏ⁾) is not positive definite, then it may happen that ∇U(x⁽ᵏ⁺¹⁾) be greater in norm than ∇U(x⁽ᵏ⁾), that is, the direction p⁽ᵏ⁾ is no longer a decreasing direction.
(iii) If U(x) has flat zones, in other words, if it can be approximated by a hyperplane, then in these zones the Hessian matrix ∇²U(x) vanishes and the method cannot be applied. For these zones it would be necessary to determine, instead of the Hessian ∇²U(x), another positive definite matrix so as to continue the procedure.
Various algorithms have been conceived to eliminate such inconveniences; one of them, the trust-region algorithm, is presented after the sketch below.
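The basic iteration (10.82) itself is a few lines of code; a sketch on an assumed test function with an explicit gradient and Hessian (in practice one solves the linear system rather than inverting the Hessian):

```python
import numpy as np

# Newton iteration (10.82) on U(x) = (x1 - 1)^4 + (x2 + 2)^2 (assumed example).
def gradU(x):
    return np.array([4.0 * (x[0] - 1.0)**3, 2.0 * (x[1] + 2.0)])

def hessU(x):
    return np.array([[12.0 * (x[0] - 1.0)**2, 0.0],
                     [0.0, 2.0]])

x = np.array([3.0, 3.0])            # x^(0) in a neighborhood of the solution (1, -2)
for _ in range(30):
    p = np.linalg.solve(hessU(x), -gradU(x))   # Newton direction (10.83)
    x = x + p
print(x)                             # approaches (1, -2)
```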
The trust-region algorithm is as follows:
– given: x⁽⁰⁾, U(x), ∇U(x), ∇²U(x), µ, η, γ₁, γ₂, δ₀, λ₀, ε, εₚ, iter, nₚ;
– set x = x⁽⁰⁾, δ = δ₀, λ = λ₀, Uₖ = U(x⁽⁰⁾), ∇U(x) = ∇U(x⁽⁰⁾), ∇²U(x⁽ᵏ⁾) = ∇²U(x⁽⁰⁾), φₖ(x) = Uₖ;
– for k from 1 to iter do
– set d = 1, iₚ = 1;
– while |d| > εₚ|λ| + 10⁻⁵ and iₚ < nₚ do
– calculate the Cholesky factorization ∇²U(x⁽ᵏ⁾) + λI = RᵀR;
– solve the system RᵀRp⁽ᵏ⁾ = −∇U(x⁽ᵏ⁾);
– solve the system Rᵀq = −p⁽ᵏ⁾;
– calculate d = (‖p⁽ᵏ⁾‖/‖q‖)²((‖p⁽ᵏ⁾‖/δ) − 1), λ = λ + d, iₚ = iₚ + 1;
– calculate x⁽ᵏ⁺¹⁾ = x⁽ᵏ⁾ + p⁽ᵏ⁾, Uₖ₊₁ = U(x⁽ᵏ⁺¹⁾), φₖ₊₁ = Uₖ + [p⁽ᵏ⁾]ᵀ∇U(x⁽ᵏ⁾) + (1/2)[p⁽ᵏ⁾]ᵀ∇²U(x⁽ᵏ⁾)p⁽ᵏ⁾, d = Uₖ₊₁ − Uₖ;
– if |d| < ε|Uₖ₊₁| then stop (the minimum has been found);
– calculate rₖ = d/(φ(x⁽ᵏ⁺¹⁾) − φ(x⁽ᵏ⁾));
– if rₖ ≤ µ then x⁽ᵏ⁺¹⁾ = x⁽ᵏ⁾ and δ = γ₁δ; else
– if rₖ > η then δ = γ₂δ.

Observation 10.11
(i) The usual values for the parameters µ, η, γ₁, and γ₂ are
$$\mu = 0.25, \quad \eta = 0.75, \quad \gamma_1 = 0.5, \quad \gamma_2 = 2. \quad (10.84)$$
(ii) The algorithm establishes a trust region of the model, that is, a region in which U(x) may be well approximated by the quadratic form φₖ(x). This zone is a hypersphere of center x⁽ᵏ⁾ and radius δₖ; we search for the point of minimum of φₖ(x) in this hypersphere. This minimum is not taken into consideration if it does not belong to the interior of the hypersphere.
(iii) The radius of the hypersphere which defines the trust zone at step k + 1 is calculated as a function of its previous value and of the ratio rₖ between the effective reduction and the planned one,
$$r_k = \frac{U(x^{(k+1)}) - U(x^{(k)})}{\varphi(x^{(k+1)}) - \varphi(x^{(k)})}. \quad (10.85)$$
If rₖ is small, then δₖ₊₁ < δₖ; otherwise we take δₖ₊₁ > δₖ.
(iv) The searching Newton direction p⁽ᵏ⁾ is determined by the relation
$$\left[ \nabla^2 U(x^{(k)}) + \lambda \mathbf{I} \right] p^{(k)} = -\nabla U(x^{(k)}), \quad (10.86)$$
where λ is a parameter which ensures that the matrix ∇²U(x⁽ᵏ⁾) + λI be positive definite, so as to avoid the cases in Observation 10.10.
(v) The Cholesky decomposition is not imperatively necessary, but it speeds up the solution of system (10.86) when the matrix ∇²U(x⁽ᵏ⁾) + λI is positive definite.

10.6.2 Quasi-Newton Method

The quasi-Newton method approximates the Hessian matrix ∇²U(x) by a positive definite symmetric matrix B. The equation which determines the decreasing direction p⁽ᵏ⁾ is now written in the form
$$\mathbf{B}_k p^{(k)} = -\nabla U(x^{(k)}), \quad (10.87)$$
while x⁽ᵏ⁺¹⁾ is determined by the relation
$$x^{(k+1)} = x^{(k)} + \alpha_k p^{(k)}, \quad (10.88)$$
where αₖ results from the condition of minimum of the function of one variable U[x⁽ᵏ⁾ + αₖp⁽ᵏ⁾]. It remains to solve the problem of updating the matrix Bₖ into the matrix Bₖ₊₁. There exist several methods, the best known being:
• the Davidon–Fletcher–Powell method⁷, for which
$$\mathbf{B}_{k+1} = \mathbf{B}_k + \frac{z^{(k)}[y^{(k)}]^T + y^{(k)}[z^{(k)}]^T}{[y^{(k)}]^T [x^{(k+1)} - x^{(k)}]} - \frac{[z^{(k)}]^T [x^{(k+1)} - x^{(k)}]}{\left\{ [y^{(k)}]^T [x^{(k+1)} - x^{(k)}] \right\}^2}\, y^{(k)}[y^{(k)}]^T,$$
$$z^{(k)} = y^{(k)} + \alpha_k \nabla U(x^{(k)}), \quad y^{(k)} = \nabla U(x^{(k+1)}) - \nabla U(x^{(k)}); \quad (10.89)$$
• the Broyden–Fletcher–Goldfarb–Shanno method⁸, in which
$$\mathbf{B}_{k+1} = \mathbf{B}_k + \frac{y^{(k)}[y^{(k)}]^T}{[y^{(k)}]^T [x^{(k+1)} - x^{(k)}]} - \frac{\mathbf{B}_k [x^{(k+1)} - x^{(k)}] [x^{(k+1)} - x^{(k)}]^T \mathbf{B}_k}{[x^{(k+1)} - x^{(k)}]^T \mathbf{B}_k [x^{(k+1)} - x^{(k)}]}, \quad y^{(k)} = \nabla U(x^{(k+1)}) - \nabla U(x^{(k)}). \quad (10.90)$$
We may also write
$$x^{(k+1)} = x^{(k)} - \alpha_k \mathbf{B}_k^{-1} \nabla U(x^{(k)}), \quad (10.91)$$
while formulae (10.89) and (10.90) also give the inverse $\mathbf{B}_{k+1}^{-1}$ as a function of $\mathbf{B}_k^{-1}$. Thus:
• the Davidon–Fletcher–Powell method gives
$$\mathbf{B}_{k+1}^{-1} = \mathbf{B}_k^{-1} + \frac{[x^{(k+1)} - x^{(k)}][x^{(k+1)} - x^{(k)}]^T}{[y^{(k)}]^T [x^{(k+1)} - x^{(k)}]} - \frac{\mathbf{B}_k^{-1} y^{(k)} [y^{(k)}]^T \mathbf{B}_k^{-1}}{[y^{(k)}]^T \mathbf{B}_k^{-1} y^{(k)}}; \quad (10.92)$$
• the Broyden–Fletcher–Goldfarb–Shanno method leads to
$$\begin{aligned} \mathbf{B}_{k+1}^{-1} = {} & \mathbf{B}_k^{-1} - \frac{\mathbf{B}_k^{-1} y^{(k)} [x^{(k+1)} - x^{(k)}]^T + [x^{(k+1)} - x^{(k)}][y^{(k)}]^T \mathbf{B}_k^{-1}}{[y^{(k)}]^T [x^{(k+1)} - x^{(k)}]} \\ & + \frac{[x^{(k+1)} - x^{(k)}][y^{(k)}]^T \mathbf{B}_k^{-1} y^{(k)} [x^{(k+1)} - x^{(k)}]^T}{\left\{ [y^{(k)}]^T [x^{(k+1)} - x^{(k)}] \right\}^2} + \frac{[x^{(k+1)} - x^{(k)}][x^{(k+1)} - x^{(k)}]^T}{[y^{(k)}]^T [x^{(k+1)} - x^{(k)}]}. \end{aligned} \quad (10.93)$$

⁷William C. Davidon, Roger Fletcher, and Michael James David Powell published the method in 1958 and 1964.
⁸Charles George Broyden (1933–2011), Roger Fletcher, Donald Goldfarb, and David Shanno published the method in 1970.
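A sketch of the quasi-Newton iteration with the BFGS inverse update (10.93), so that no linear system has to be solved (the crude line search on the gradient norm is our own simplification, not part of the method as presented above):

```python
import numpy as np

# Quasi-Newton iteration with the BFGS inverse update (10.93);
# H approximates the inverse Hessian, cf. (10.91) with B_k^{-1} = H.
def bfgs(gradU, x0, iters=50):
    x = np.asarray(x0, float)
    H = np.eye(len(x))
    g = gradU(x)
    for _ in range(iters):
        p = -H @ g                           # decreasing direction
        alpha = 1.0                          # crude backtracking on ||grad||
        while np.linalg.norm(gradU(x + alpha * p)) > np.linalg.norm(g) and alpha > 1e-6:
            alpha *= 0.5
        x_new = x + alpha * p
        g_new = gradU(x_new)
        s, y = x_new - x, g_new - g
        sy = s @ y
        if abs(sy) < 1e-14:
            break
        Hy = H @ y                           # update (10.93), with s = x^{k+1} - x^k
        H = (H - (np.outer(Hy, s) + np.outer(s, Hy)) / sy
               + np.outer(s, s) * (1.0 + (y @ Hy) / sy) / sy)
        x, g = x_new, g_new
    return x

gradU = lambda x: np.array([2.0 * (x[0] - 1.0), 8.0 * x[1]])  # U = (x1-1)^2 + 4 x2^2
print(bfgs(gradU, [0.0, 1.0]))               # approaches [1, 0]
```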
10.7 LINEAR PROGRAMMING: THE SIMPLEX ALGORITHM

10.7.1 Introduction

Let a linear system of m equations with n unknowns be
$$\mathbf{A}x = b. \quad (10.94)$$

Definition 10.9 Two linear systems are called equivalent if any solution of the first system is a solution of the second system too, and conversely.

Definition 10.10 We call an elementary transformation applied to a linear system any one of the following:
• the multiplication of an equation by a nonzero number;
• the change of the order of two equations;
• the multiplication of an equation by a nonzero number, the addition of the result to another equation, and the replacement of the latter equation by the equation thus obtained.

Observation 10.12
(i) Each of the above operations determines an operation on the extended (augmented) matrix of the system. These transformations are equivalent to the multiplication of the extended matrix at the left by certain matrices. Thus, considering the matrix
$$\mathbf{M}_1 = \begin{pmatrix} 1 & \cdots & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & & \vdots \\ 0 & \cdots & \alpha & \cdots & 0 \\ \vdots & & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & \cdots & 1 \end{pmatrix}, \quad \mathbf{M}_1 \in \mathcal{M}_m(\mathbb{R}), \quad (10.95)$$
which differs from the unit matrix only by the element α ≠ 0 situated at the position (i, i), the multiplication of the extended matrix at the left by M₁ has as effect the multiplication of row i of the extended matrix by α. If we multiply the extended matrix at the left by the matrix M₂ given by
$$\mathbf{M}_2 = \begin{pmatrix} 1 & \cdots & 0 & \cdots & 0 & \cdots & 0 \\ \vdots & & \vdots & & \vdots & & \vdots \\ 0 & \cdots & 0 & \cdots & 1 & \cdots & 0 \\ \vdots & & \vdots & & \vdots & & \vdots \\ 0 & \cdots & 1 & \cdots & 0 & \cdots & 0 \\ \vdots & & \vdots & & \vdots & & \vdots \\ 0 & \cdots & 0 & \cdots & 0 & \cdots & 1 \end{pmatrix}, \quad \mathbf{M}_2 \in \mathcal{M}_m(\mathbb{R}), \quad (10.96)$$
which differs from the unit matrix of order m by the elements at the positions (i, i) and (j, j), replaced by 0, and by the elements at the positions (i, j) and (j, i), replaced by 1, then the product M₂A, where A is the extended matrix, has as effect the interchange of rows i and j of the extended matrix A. Let us now consider the matrix
$$\mathbf{M}_3 = \begin{pmatrix} 1 & \cdots & 0 & \cdots & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & & \vdots & & \vdots \\ 0 & \cdots & 1 & \cdots & \alpha & \cdots & 0 \\ \vdots & & \vdots & \ddots & \vdots & & \vdots \\ 0 & \cdots & 0 & \cdots & 1 & \cdots & 0 \\ \vdots & & \vdots & & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & \cdots & 0 & \cdots & 1 \end{pmatrix}, \quad \mathbf{M}_3 \in \mathcal{M}_m(\mathbb{R}), \quad (10.97)$$
which differs from the unit matrix by the element α ≠ 0 at the position (i, j); the product M₃A has as effect the multiplication of row j by α and its addition to row i.
(ii) The elementary operations lead, obviously, to equivalent systems.

Definition 10.11 A system is called explicit if the matrix of the system contains all the columns of the unit matrix of order m (the number of equations of the system).
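A small numerical illustration of Observation 10.12 (a sketch with assumed data, m = 3): the three elementary transformations realized as left multiplications by M₁, M₂, M₃.

```python
import numpy as np

# Extended (augmented) matrix of an assumed 3x3 system [A | b].
Ab = np.array([[2.0, 1.0, 1.0,  5.0],
               [1.0, 3.0, 2.0, 10.0],
               [1.0, 0.0, 2.0,  5.0]])

M1 = np.eye(3); M1[0, 0] = 0.5            # multiply row 1 by alpha = 1/2, cf. (10.95)
M2 = np.eye(3)[[1, 0, 2]]                 # interchange rows 1 and 2, cf. (10.96)
M3 = np.eye(3); M3[1, 0] = -2.0           # add (-2) * row 1 to row 2, cf. (10.97)

print(M1 @ Ab)
print(M2 @ Ab)
print(M3 @ Ab)
```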
Observation 10.13
(i) The columns of the unit matrix may be found at any position in the matrix A of the system.
(ii) An explicit linear system has the number of unknowns at least equal to the number of equations, that is, m ≤ n.

Definition 10.12 The variables whose coefficients form the columns of the unit matrix are called principal or basic variables. The other variables of the system are called secondary or nonbasic variables.

Observation 10.14 A compatible system may be brought to explicit form, so as to have exactly m columns of the unit matrix. To do this, it is sufficient to perform the elementary transformations presented above in a certain order.

Definition 10.13
(i) A solution of system (10.94) in which the n − m secondary variables vanish is called a basic solution.
(ii) A basic solution is called nondegenerate if it has exactly m nonzero components (the principal variables have nonzero values) and degenerate in the opposite case.

10.7.2 Formulation of the Problem of Linear Programming

Definition 10.14
(i) A problem of linear programming is a problem which requires the minimization (maximization) of the function
$$f(x_1, x_2, \ldots, x_n) = \text{minimum (maximum)} \quad (10.98)$$
subject to
$$f_i(x_1, \ldots, x_n) \leq b_i, \quad i = 1, \ldots, p, \qquad f_j(x_1, \ldots, x_n) \geq b_j, \quad j = p + 1, \ldots, q, \qquad f_k(x_1, \ldots, x_n) = b_k, \quad k = q + 1, \ldots, r, \quad (10.99)$$
and
$$x_l \geq 0, \quad l = 1, \ldots, m_1, \qquad x_h \leq 0, \quad h = m_1 + 1, \ldots, m_2, \qquad x_t \text{ arbitrary}, \quad t = m_2 + 1, \ldots, m, \quad (10.100)$$
the functions f, fᵢ, fⱼ, and fₖ being linear.
(ii) Conditions (10.99) are called the restrictions of the problem, while a vector x = [x₁ … xₙ]ᵀ which verifies the system of restrictions is called a possible solution of the linear programming problem.
(iii) A possible solution x which also verifies conditions (10.100) is called an admissible solution of the linear programming problem.
(iv) The admissible solution which realizes the extremum of function (10.98) is called the optimal solution or optimal program.

The linear program may be written in matrix form,
$$\mathbf{A}x \;\mathcal{S}\; b, \quad x \geq 0, \quad f = \mathbf{C}^T x = \text{minimum (maximum)}, \quad (10.101)$$
in which
$$\mathbf{A} = [a_{ij}]_{i=1,\ldots,m;\; j=1,\ldots,n}, \quad b = \begin{pmatrix} b_1 & \cdots & b_m \end{pmatrix}^T, \quad \mathbf{C} = \begin{pmatrix} c_1 & \cdots & c_n \end{pmatrix}^T, \quad (10.102)$$
and where 𝒮 takes the place of one of the signs ≤, =, or ≥. Let us observe that the second relation (10.101) imposes the condition that all variables be non-negative. This can always be obtained, as will be seen later.

Definition 10.15
(i) A problem of linear programming is of standard form if all the restrictions are equations and if conditions of non-negativeness are imposed on all variables.
(ii) A problem of linear programming is of canonical form if all the restrictions are inequalities of the same sense and if conditions of non-negativeness are imposed on all variables.

Observation 10.15
(i) A program of standard form reads
$$\mathbf{A}x = b, \quad x \geq 0, \quad f = \mathbf{C}^T x. \quad (10.103)$$
(ii) A program of canonical form is written
$$\mathbf{A}x \geq b, \quad x \geq 0, \quad \mathbf{C}^T x = \text{minimum} \quad (10.104)$$
or
$$\mathbf{A}x \leq b, \quad x \geq 0, \quad \mathbf{C}^T x = \text{maximum}. \quad (10.105)$$
(iii) A program may be brought to a standard or to a canonical form by using the following elementary transformations:
• an inequality of a certain sense may be transformed into one of the opposite sense by multiplication by −1;
• a negative variable may be transformed into a positive one by its multiplication by −1;
• a variable, let us say xₖ ∈ ℝ, is written in the form
$$x_k = x_k^{(1)} - x_k^{(2)}, \quad (10.106)$$
where $x_k^{(1)} \geq 0$, $x_k^{(2)} \geq 0$;
• an equality is expressed by means of two inequalities; so
$$a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{in}x_n = b_i \quad (10.107)$$
is written in the form
$$a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{in}x_n \leq b_i, \quad a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{in}x_n \geq b_i; \quad (10.108)$$
• the inequalities are transformed into equalities by means of the compensation variables; thus,

a_{i1}x_1 + a_{i2}x_2 + ... + a_{in}x_n ≤ b_i   (10.109)

becomes

a_{i1}x_1 + a_{i2}x_2 + ... + a_{in}x_n + y = b_i,   y ≥ 0,   (10.110)

while

a_{i1}x_1 + a_{i2}x_2 + ... + a_{in}x_n ≥ b_i   (10.111)

is transformed into

a_{i1}x_1 + a_{i2}x_2 + ... + a_{in}x_n − y = b_i,   y ≥ 0.   (10.112)

10.7.3 Geometrical Interpretation

In the space R^n, an equality of the restrictions system defines a hyperplane, while an inequality defines a half-space. The restrictions thus define a convex polyhedron in the space R^n; if the optimum is unique, then it is attained at one of the vertices of this polyhedron. The objective function, written in the form

f(x) = c_1 x_1 + c_2 x_2 + ... + c_n x_n = λ,   λ ∈ R,   (10.113)

defines a pencil of hyperplanes; for λ = 0 we obtain a hyperplane which passes through the origin.

Definition 10.16
(i) The hyperplanes of the pencil (10.113) are called level hyperplanes.
(ii) In R^2 the hyperplanes become straight lines, called level straight lines.

Observation 10.16 The objective function has the same value at all the points situated on the same level hyperplane.

10.7.4 The Primal Simplex Algorithm

Definition 10.17 A linear program is said to be in primal admissible form if it is given by the relations

maximum (minimum) f(x) = f_0 + Σ_{k∈K} a_{0k} x_k,   (10.114)

x_i + Σ_{k∈K} a_{ik} x_k = b_i,   i ∈ I,   (10.115)

x_k ≥ 0,   x_i ≥ 0,   k ∈ K, i ∈ I,   (10.116)

where K is the set of indices of the secondary variables, while I is the set of indices of the principal variables.
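The passage to the primal admissible form is purely mechanical, so it can be sketched in a few lines of code. The following Python fragment is only an illustration of the transformations above (the function name and the data are ours, not from the text): for a program in the canonical form (10.105), it appends one compensation variable per restriction, so that the identity columns required by Definition 10.17 appear explicitly.

```python
import numpy as np

def to_primal_admissible(A, b, c):
    """Turn the canonical program  A x <= b, x >= 0  into the primal
    admissible form (10.114)-(10.116): one compensation variable per
    restriction; the appended identity columns give the initial set
    of principal (basic) variables."""
    m, n = A.shape
    A_ext = np.hstack([A, np.eye(m)])          # [A | I]
    c_ext = np.concatenate([c, np.zeros(m)])   # compensation variables cost 0
    basis = list(range(n, n + m))              # indices i of the set I
    return A_ext, b.copy(), c_ext, basis

# a small program: maximize 2x1 - x2 with x1 + x2 <= 5, -x1 + x2 <= 4
A = np.array([[1.0, 1.0], [-1.0, 1.0]])
b = np.array([5.0, 4.0])
c = np.array([2.0, -1.0])
A_ext, b_ext, c_ext, basis = to_primal_admissible(A, b, c)
```

With this form at hand, an admissible basic solution is read off immediately: the principal variables take the values b_i, while the secondary ones vanish.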
Observation 10.17 Obviously, any linear program may be brought to the primal admissible form by means of the elementary transformations presented above.

Let a program be in the primal admissible form and let an admissible basic solution correspond to this form. The Simplex algorithm⁹ performs a partial examination of the list of the basic solutions of the system of restrictions, its aim being to find an optimal basic solution or to prove that such a solution does not exist.

Let us assume that after r steps the program takes the primal admissible form

f = f_0^{(r)} + Σ_{k∈K^{(r)}} a_{0k} x_k,   (10.117)

x_i + Σ_{k∈K^{(r)}} a_{ik} x_k = b_i,   i ∈ I^{(r)},   (10.118)

x_k ≥ 0,   x_i ≥ 0,   k ∈ K^{(r)}, i ∈ I^{(r)},   (10.119)

where the upper index r marks the iteration step. There are four operations to perform:
• application of the optimality criterion. If a_{0k} ≥ 0 for all k ∈ K^{(r)}, then the linear program has the basic solution obtained at the step r and the algorithm stops; in the opposite case, we pass to the following stage;
• application of the entrance criterion. At this stage, we determine the secondary unknown x_h which becomes a principal variable; it is given by

a_{0h} = min_{k∈K^{(r)}} a_{0k} < 0.   (10.120)

If all a_{ih} ≤ 0, i ∈ I^{(r)}, then the program does not have an optimal solution and the algorithm stops; in the opposite case, we pass to the following stage;
• application of the exit criterion. We determine the principal variable x_j which becomes secondary by the relation

b_j / a_{jh} = min_{i∈I^{(r)}, a_{ih}>0} b_i / a_{ih};   (10.121)

• we make a pivoting with the pivot a_{jh} to obtain a column of the unit matrix on the column h.

Usually, we work in tables of the form

        x_1 ... x_i ... x_m   x_{m+1}   ...  x_k    ...  x_n    |
        0   ... 0   ... 0     a_{0,m+1} ...  a_{0k} ...  a_{0n} | −f_0
x_1     1   ... 0   ... 0     a_{1,m+1} ...  a_{1k} ...  a_{1n} | b_1
...
x_i     0   ... 1   ... 0     a_{i,m+1} ...  a_{ik} ...  a_{in} | b_i
...
x_m     0   ... 0   ... 1     a_{m,m+1} ...  a_{mk} ...  a_{mn} | b_m

⁹The algorithm was proposed by George Bernard Dantzig (1914–2005) in 1947.
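The four operations above translate almost directly into code. The following Python sketch is a minimal dense-tableau version of the primal Simplex iteration, under the convention of the table above (row 0 stores the coefficients a_{0k} and, in the last position, −f_0; rows 1, ..., m store a_{ik} and b_i). It is an illustration only: anti-cycling rules for degenerate programs are deliberately omitted.

```python
import numpy as np

def primal_simplex(T, basis):
    """Primal Simplex iterations on a dense tableau T (float array).
    Row 0: [a_01 ... a_0n | -f0]; row i: [a_i1 ... a_in | b_i].
    basis[i-1] is the index of the principal variable of row i."""
    while True:
        cost = T[0, :-1]
        if np.all(cost >= 0):                     # optimality criterion
            return T, basis
        h = int(np.argmin(cost))                  # entrance criterion (10.120)
        col = T[1:, h]
        if np.all(col <= 0):                      # no optimal solution exists
            raise ValueError("the program has no optimal solution")
        safe = np.where(col > 0, col, 1.0)
        ratio = np.where(col > 0, T[1:, -1] / safe, np.inf)
        j = 1 + int(np.argmin(ratio))             # exit criterion (10.121)
        T[j] /= T[j, h]                           # pivoting on a_jh, so that
        for r in range(T.shape[0]):               # column h becomes a column
            if r != j:                            # of the unit matrix
                T[r] = T[r] - T[r, h] * T[j]
        basis[j - 1] = h
```

In this convention, the last entry of row 0 of the final tableau holds −f at the optimum, while basis lists the indices of the final principal variables.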
10.7.5 The Dual Simplex Algorithm

Definition 10.18 Let the problem of linear programming in canonical form be

a_{11}x_1 + a_{12}x_2 + ... + a_{1n}x_n ≥ b_1,   a_{21}x_1 + a_{22}x_2 + ... + a_{2n}x_n ≥ b_2,   ...,   a_{m1}x_1 + a_{m2}x_2 + ... + a_{mn}x_n ≥ b_m,   (10.122)

x_1 ≥ 0, x_2 ≥ 0, ..., x_n ≥ 0,   (10.123)

minimum f = c_1 x_1 + c_2 x_2 + ... + c_n x_n.   (10.124)

By definition, the dual of this problem is

a_{11}y_1 + a_{21}y_2 + ... + a_{m1}y_m ≤ c_1,   a_{12}y_1 + a_{22}y_2 + ... + a_{m2}y_m ≤ c_2,   ...,   a_{1n}y_1 + a_{2n}y_2 + ... + a_{mn}y_m ≤ c_n,   (10.125)

y_1 ≥ 0, y_2 ≥ 0, ..., y_m ≥ 0,   (10.126)

maximum g = b_1 y_1 + ... + b_m y_m.   (10.127)

Observation 10.18 The dual problem is obtained from the primal problem as follows:
• to each restriction of the system of restrictions (10.122) we associate a dual variable y_i;
• the variable y_i has no sign restriction if the corresponding restriction of (10.122) is an equality, and has a sign restriction in the case of an inequality; thus, to ≥ corresponds y_i ≥ 0, to ≤ corresponds y_i ≤ 0, and to = corresponds y_i arbitrary;
• to each variable x_i we associate a restriction in which the coefficients of the variables y_1, ..., y_m are the coefficients of the variable x_i in system (10.122), while the free terms are c_i;
• the dual restriction associated to x_i is ≤ if x_i ≥ 0, is ≥ if x_i ≤ 0, and is = if x_i is arbitrary;
• the minimum of the objective function of the primal problem is transformed into the maximum of the objective function of the dual problem;
• the objective function of the dual problem is built by means of the free terms of the initial restrictions (10.122).

Definition 10.19 A linear program is in an explicit dual admissible form if

x_i + Σ_{k∈K} a_{ik} x_k = b_i,   i ∈ I,   (10.128)

x_i ≥ 0,   x_k ≥ 0,   i ∈ I, k ∈ K,   (10.129)

f_0 + Σ_{k∈K} a_{0k} x_k = minimum.   (10.130)

Let us suppose that at the step r the linear program is expressed by the relations

x_i + Σ_{k∈K^{(r)}} a_{ik} x_k = b_i,   i ∈ I^{(r)},   (10.131)
x_i ≥ 0,   x_k ≥ 0,   i ∈ I^{(r)}, k ∈ K^{(r)},   (10.132)

f_0 + Σ_{k∈K^{(r)}} a_{0k} x_k = minimum.   (10.133)

For the step r + 1, we have to pass through the following stages:
• application of the optimality criterion. At this stage we establish whether b_i ≥ 0 for all i ∈ I^{(r)}. If the answer is yes, then the solution is optimal; in the opposite case, we pass to the following stage;
• application of the exit criterion. We determine the unknown x_j, which becomes secondary, by the condition

b_j = min_{i∈I^{(r)}} b_i,   (10.134)

and we verify whether all the elements a_{jk} ≥ 0, k ∈ K^{(r)}. If yes, then the problem does not have an admissible solution; in the opposite case, we pass to the following stage;
• application of the entrance criterion. We determine the unknown x_h, which becomes a principal variable, from the condition

a_{0h} / a_{jh} = min_{k∈K^{(r)}, a_{jk}<0} a_{0k} / a_{jk};   (10.135)

• we effect the pivoting with the pivot a_{jh}.

10.8 CONVEX PROGRAMMING

Definition 10.20 Let X be a convex set and f a function, f : X → R. We say that the function f is convex (or convex in Jensen's sense) if for any α ∈ (0, 1) and any x_1, x_2 of X we have

f(αx_1 + (1 − α)x_2) ≤ αf(x_1) + (1 − α)f(x_2).   (10.136)

Observation 10.19
(i) If f is differentiable, then, instead of relation (10.136), we may consider the inequality

f(x) ≥ f(x*) + ⟨∇f(x*), x − x*⟩,   (10.137)

where ⟨·, ·⟩ marks the scalar product.
(ii) If f : I ⊂ R → R, I being an interval, let us consider the expansion into a Taylor series

f(x) = f(x*) + f′(x*)(x − x*) + (1/2)f″(ξ)(x − x*)²,   (10.138)

where ξ is a point between x and x*. The convexity condition of f leads to the inequality f″(x) ≥ 0 for any x ∈ I.
(iii) In the case f : D ⊂ R^n → R, convexity requires that the matrix of the derivatives of second order be positive semidefinite, that is,
[x_1 x_2 ... x_n] [ ∂²f/∂x_1²      ∂²f/∂x_1∂x_2   ...   ∂²f/∂x_1∂x_n
                    ∂²f/∂x_2∂x_1   ∂²f/∂x_2²      ...   ∂²f/∂x_2∂x_n
                    ...            ...            ...   ...
                    ∂²f/∂x_n∂x_1   ∂²f/∂x_n∂x_2   ...   ∂²f/∂x_n²    ] [x_1 x_2 ... x_n]^T ≥ 0,   (10.139)

for any x ∈ D.

The problem of convex programming requires determining the minimum of f(x), f : D ⊂ R^n → R, under restrictions of the form g_i(x) ≤ 0, i = 1, m. If we denote by E the admissible set

E = {x ∈ D | g_i(x) ≤ 0, i = 1, m},   (10.140)

then the problem of convex programming requires the determination of the value inf_{x∈E} f(x).

We define Lagrange's function by

L(x, λ) = f(x) + Σ_{i=1}^m λ_i g_i(x),   (10.141)

where

λ = [λ_1 λ_2 ... λ_m]^T   (10.142)

is a vector of non-negative components, λ_i ≥ 0, i = 1, m. We suppose that the condition of regularity is also fulfilled, in the sense that there exists at least one point ξ ∈ E at which the inequalities g_i(x) ≤ 0 become strict, that is,

g_i(ξ) < 0,   i = 1, m.   (10.143)

The Kuhn–Tucker theorem states that for x* to realize the minimum of the function f it is sufficient (and, if the condition of regularity is fulfilled, also necessary) that there exist a vector λ* = [λ_1* λ_2* ... λ_m*]^T such that

L(x*, λ) ≤ L(x*, λ*) ≤ L(x, λ*)   (10.144)

for all x ∈ D and all λ ≥ 0 (λ_1 ≥ 0, λ_2 ≥ 0, ..., λ_m ≥ 0). The point (x*, λ*) ∈ D × R_+^m is called a saddle point of Lagrange's function and fulfills the condition

λ_i* g_i(x*) = 0,   i = 1, m.   (10.145)

Moreover,

L(x*, λ*) = f(x*).   (10.146)

Let us suppose that Lagrange's function has the saddle point (x*, λ*) and let us consider the function

φ(λ) = inf_{x∈D} L(x, λ),   λ ≥ 0,   (10.147)
for which φ(λ*) = f(x*). Now, let the function f̄ : R^n → R be given by

f̄(x) = +∞ if x ∉ E,   f̄(x) = f(x) if x ∈ E.   (10.148)

Definition 10.21 The dual of the convex programming problem

inf_{x∈E} f(x),   (10.149)

that is, of the problem

inf_{x∈D} f̄(x),   (10.150)

is the problem max_{λ≥0} φ(λ). We have

f(x*) = min_{x∈D} f̄(x) = max_{λ≥0} φ(λ);   (10.151)

hence, instead of searching for f(x*) = min_{x∈D} f̄(x), we may determine max_{λ≥0} φ(λ).

10.9 NUMERICAL METHODS FOR PROBLEMS OF CONVEX PROGRAMMING

We present hereafter some methods of convex programming.

10.9.1 Method of the Conditional Gradient

For a point x̄ of the admissible set E, we consider the problem

min_{x∈E} [f(x̄) + ⟨∇f(x̄), x − x̄⟩].   (10.152)

If x⁰ is the solution of this problem, then, on the segment of line which joins the points x̄ and x⁰, that is, for the points

x = (1 − α)x̄ + αx⁰,   (10.153)

we search for the point of minimum of f; that is, we solve the problem

min_{α∈[0,1]} f(x̄ + α(x⁰ − x̄));   (10.154)

let us suppose that this minimum is attained for α = ᾱ. Under these conditions, we continue the procedure with the point

x¹ = x̄ + ᾱ(x⁰ − x̄).   (10.155)

10.9.2 Method of the Gradient's Projection

The idea of the method of the gradient's projection consists in a displacement along the antigradient direction −∇f(x̄), with a step chosen so as not to leave the domain of admissible solutions. If h is the length of the step (which may change from one iteration to another), then we calculate

x⁰ = x̄ − h∇f(x̄);   (10.156)

we then solve the problem

min_{x∈E} (1/2)⟨x − x⁰, x − x⁰⟩,   (10.157)

continuing the procedure with the point of minimum thus obtained.
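The two steps (10.156) and (10.157) combine into a short loop. The Python sketch below is ours and is only an illustration: the admissible set E is taken as the unit Euclidean ball, for which the projection (10.157) has a closed form, and the step length h is kept constant.

```python
import numpy as np

def projected_gradient(grad_f, project, x, h=0.1, iters=100, tol=1e-8):
    """Gradient projection: move along -grad f with step h, then
    project back onto E, i.e., solve (10.157) for the trial point."""
    for _ in range(iters):
        x0 = x - h * grad_f(x)           # displacement (10.156)
        x_new = project(x0)              # nearest point of E, problem (10.157)
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# example: minimize f(x) = |x - a|^2 on the unit ball E = {|x| <= 1}
a = np.array([2.0, 1.0])
grad = lambda x: 2.0 * (x - a)
proj_ball = lambda y: y / max(1.0, np.linalg.norm(y))  # closed-form projection
x_star = projected_gradient(grad, proj_ball, np.zeros(2))
# x_star is close to a/|a|, the point of the ball nearest to a
```

For a more general set E, the projection is itself a convex programming problem, which is why the method is practical mainly when E is simple (balls, boxes, half-spaces).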
10.9.3 Method of Possible Directions

Let x̄ be an admissible point and let us define the set of active restrictions at the point x̄, denoted by S(x̄), as the set of all indices i for which g_i(x̄) = 0. At the point x̄, we search for a direction δx which makes an obtuse angle with ∇f(x̄) as well as with the external normals to the active restrictions, ∇g_i(x̄), i ∈ S(x̄). This choice leads to the diminishing of the function to be minimized and ensures that we remain inside E if we impose the conditions

⟨∇f(x̄), δx⟩ + β‖∇f(x̄)‖ ≤ 0,   (10.158)

⟨∇g_i(x̄), δx⟩ + β‖∇g_i(x̄)‖ ≤ 0,   i ∈ S(x̄),   (10.159)

where the factor β > 0 is to be taken as large as possible. Usually, we also introduce a normalization condition of the form

⟨δx, δx⟩ ≤ 1   (10.160)

or

−1 ≤ δx_j ≤ 1,   j = 1, n.   (10.161)

Once the direction δx is determined, we pass to the solving of the problem of one-dimensional minimization

min_β f(x̄ + βδx),   with g_i(x̄ + βδx) ≤ 0, i ∈ S(x̄).   (10.162)

10.9.4 Method of Penalizing Functions

In the frame of the penalizing functions method, we introduce in the function to be minimized a term which penalizes the violation of a restriction. Let us consider

Φ(x) = Σ_{i=1}^m [max{g_i(x), 0}]²   (10.163)

and let us search for the minimum of the function f(x) + rΦ(x), where r is a sufficiently great positive number.

10.10 QUADRATIC PROGRAMMING

Let us consider the programming problem

min f(x) = min [ (1/2) Σ_{j=1}^n Σ_{k=1}^n c_{jk} x_j x_k + Σ_{j=1}^n d_j x_j ],   (10.164)

Σ_{j=1}^n a_{ij} x_j ≥ b_i,   i = 1, m,   (10.165)

or, in matrix form,

min f(x) = min [ (1/2)⟨x, Cx⟩ + ⟨D, x⟩ ],   (10.166)

Ax ≥ b,   (10.167)
where C ∈ M_n(R) is symmetric and positive definite, A ∈ M_{m,n}(R), D ∈ M_{n,1}(R), b ∈ M_{m,1}(R). Lagrange's function is of the form

L = (1/2)⟨x, Cx⟩ + ⟨D, x⟩ + ⟨λ, b − Ax⟩,   (10.168)

the saddle point being searched for λ ≥ 0. The optimality criterion

L(x*, λ*) ≤ L(x, λ*)   (10.169)

leads to

∂L(x, λ*)/∂x |_{x=x*} = 0   (10.170)

or

Cx* + D − A^T λ* = 0.   (10.171)

The inequality

L(x*, λ) ≤ L(x*, λ*)   (10.172)

leads to

Ax* ≥ b,   (10.173)

λ_i*(b_i − (Ax*)_i) = 0,   i = 1, m.   (10.174)

Moreover,

λ_i* ≥ 0,   i = 1, m.   (10.175)

It follows that if the pair (x*, λ*) satisfies conditions (10.171), (10.173), (10.174), and (10.175), then x* is the solution of problem (10.166), (10.167), while λ* is the solution of the dual problem. We suppose that the rows of the matrix A are linearly independent (which means that the restrictions (10.165) are independent). In relation (10.174), we have denoted by (Ax*)_i the element on the row i and column 1 of the product Ax*.

The system of restrictions (10.167) defines a polyhedral set with faces of various dimensions. Each face contains only admissible points, which satisfy a system of equations

A_I x = b_I,   (10.176)

where A_I is the matrix obtained from the matrix A by retaining only the rows of the set I, that is, the matrix of rows (A)_i, i ∈ I, I = {i_1, i_2, ..., i_s}; analogously for the column matrix b_I. On the other hand, the minimum x* is found on a face of the polyhedron, in particular on an edge or at a vertex.

Let us suppose that there exists an admissible point x̄ for which we have the set I = {i_1, i_2, ..., i_s}, the rows of the matrix A_I are independent, while x̄ satisfies relation (10.176). There may occur two situations.

(i) The point x̄ gives the minimum of the function f with the restrictions (10.176). It follows that there exist the factors λ_i, i ∈ I, for which

∂/∂x [f(x) + ⟨λ_I, b_I − A_I x⟩]_{x=x̄} = 0,   (10.177)

that is,

Cx̄ + D − A_I^T λ_I = 0.   (10.178)
From relation (10.178), we determine the vector λ_I; if all its components λ_i, i ∈ I, are non-negative, then the algorithm stops, because the searched solution x* = x̄ has been found. But if there exists an index i_k ∈ I for which λ_{i_k} < 0, then i_k is eliminated from the set I, resulting in

I = {i_1, ..., i_{k−1}, i_{k+1}, ..., i_s},   (10.179)

that is, we pass to a new face of the polyhedral set.

(ii) The function f attains its minimum on the set of solutions of (10.176) at a point x⁰ ≠ x̄. In this case, we write

z = x⁰ − x̄,   (10.180)

g_i = −(Az)_i = −Σ_{j=1}^n a_{ij} z_j,   i ∉ I,   (10.181)

Δ_i = (Ax̄)_i − b_i = Σ_{j=1}^n a_{ij} x̄_j − b_i,   i ∉ I,   (10.182)

and determine

ε₀ = min_{g_i>0} Δ_i / g_i.   (10.183)

We choose

ε = min{ε₀, 1}.   (10.184)

If ε = 1, then the displacement has been made to the point x⁰, the set I being preserved. If ε < 1, then this minimum has been attained for an index i′ which did not belong to the set I, and the set I must be brought up to date by adding this index too,

I = {i_1, i_2, ..., i_s, i′};   (10.185)

we thus replace the point x̄ by the point x̄ + εz.

Let us notice that for the determination of x^{(0)}, that is, of the start point, we must solve the linear system

Cx + D − A_I^T λ_I = 0,   A_I x = b_I.   (10.186)

10.11 DYNAMIC PROGRAMMING

Let us consider the optimal control problem for the system¹⁰

dϕ/dt = f(ϕ, u),   0 ≤ t ≤ T,   ϕ(0) = ϕ₀,   (10.187)

where

ϕ = [φ_1 φ_2 ... φ_n]^T,   f = [f_1 f_2 ... f_n]^T,   u = [u_1 u_2 ... u_m]^T,   n ∈ N, m ∈ N, n ≥ 1, m ≥ 1.   (10.188)

¹⁰The concept of dynamic programming was introduced by Richard E. Bellman (1920–1984) in 1953.
The admissible commands are given by u = u(t) and are piecewise continuous, u(t) ∈ U, where U is a closed set. In the class of the admissible commands we must find a command u(t), to which corresponds the solution ϕ(t) of problem (10.187), for which the functional

F(u) = ∫₀^T f₀(ϕ, u) dt   (10.189)

is minimum. To do this, we apply Bellman's principle¹¹, which states that the optimal command, at any moment, does not depend on the previous history of the system, but is determined only by the goal of the command and by the state of the system at that moment.

Denoting

Q(ϕ, t) = min_{u∈U} ∫_t^T f₀(ϕ(τ), u(τ)) dτ,   (10.190)

Bellman's optimality principle leads to the relation

Q(ϕ(t), t) = min_u { ∫_t^{t+Δt} f₀(ϕ(τ), u(τ)) dτ + min_u ∫_{t+Δt}^T f₀(ϕ(τ), u(τ)) dτ }.   (10.191)

But

min_u ∫_{t+Δt}^T f₀(ϕ(τ), u(τ)) dτ = Q(ϕ + Δξ, t + Δt),   (10.192)

where

Δξ = ∫_t^{t+Δt} f(ϕ, u) dτ.   (10.193)

Let us suppose that both terms between brackets in relation (10.191) may be expanded into a Taylor series and let us make Δt → 0. It follows Bellman's equation

−∂Q/∂t = min_{u∈U} [ f₀(ϕ, u) + ⟨f(ϕ, u), ∂Q/∂ϕ⟩ ],   (10.194)

Q(ϕ, T) = 0.   (10.195)

If the minimum in the right side of relation (10.194) is attained at only one point u*, then u* is a function of ϕ and ∂Q/∂ϕ, that is,

u* = u*(ϕ, ∂Q/∂ϕ).   (10.196)

Introducing this result in relation (10.194), we obtain a nonlinear system of the form

−∂Q/∂t = f₀(ϕ, u*(ϕ, ∂Q/∂ϕ)) + ⟨f(ϕ, u*(ϕ, ∂Q/∂ϕ)), ∂Q/∂ϕ⟩.   (10.197)

¹¹Richard E. Bellman (1920–1984) stated this principle in 1952.
If u* is a function of ϕ and t, then system (10.197) is a hyperbolic one, with the characteristics oriented from t = 0 to t = T.

Let us consider a process described by a system of difference equations

ϕ_{i+1} = g(ϕ_i, u_i),   i = 0, N − 1.   (10.198)

We must minimize the functional

F(u) = Σ_{i=0}^{N−1} f₀(ϕ_i, u_i),   (10.199)

the solution of which depends on the initial state ϕ₀ and on the number of steps N. If we denote the searched optimal value by Q_N(ϕ₀), then the problem of minimum leads to the relation

Q_N(ϕ₀) = min_{u₀} { min_{[u₁, u₂, ..., u_{N−1}]} [ f₀(ϕ₀, u₀) + Σ_{i=1}^{N−1} f₀(ϕ_i, u_i) ] }.   (10.200)

But

min_{[u₁, ..., u_{N−1}]} Σ_{i=1}^{N−1} f₀(ϕ_i, u_i) = Q_{N−1}(ϕ₁),   (10.201)

obtaining thus

Q_N(ϕ₀) = min_{u₀} [ f₀(ϕ₀, u₀) + Q_{N−1}(ϕ₁) ].   (10.202)

Step by step, we get the recurrence relations

Q_{N−j}(ϕ_j) = min_{u_j∈U} [ f₀(ϕ_j, u_j) + Q_{N−j−1}(ϕ_{j+1}) ],   j = 0, N − 2,   ϕ_{j+1} = g(ϕ_j, u_j),

Q_1(ϕ_{N−1}) = min_{u_{N−1}∈U} f₀(ϕ_{N−1}, u_{N−1}),   ϕ_{N−1} = g(ϕ_{N−2}, u_{N−2}).   (10.203)

If ϕ₀ is known, then from relations (10.203) we get u₀, ..., u_{N−1} and Q_N(ϕ₀).

10.12 PONTRYAGIN'S PRINCIPLE OF MAXIMUM

Let us consider the system of ordinary differential equations

dϕ/dt = f(ϕ, u),   0 ≤ t ≤ T,   (10.204)

where

ϕ = [φ_1 φ_2 ... φ_n]^T,   f = [f_1 f_2 ... f_n]^T,   u = [u_1 u_2 ... u_n]^T,   (10.205)

and to which we add the limit conditions

ϕ(0) ∈ S₀,   ϕ(T) ∈ S₁,   (10.206)

where S₀ and S₁ are given manifolds in the Euclidean space E_n.
The problem requires that, being given a closed set U ⊂ E_n, we determine a moment T and a piecewise continuous command u = u(t) ∈ U for which the trajectory ϕ = ϕ(t, u) satisfies the conditions (10.204) and (10.206), as well as

F(u) = ∫₀^T f₀(ϕ, u) dt = minimum.   (10.207)

We will suppose that:
• the functions f(ϕ, u) are defined and continuous in the pair (ϕ, u), together with the partial derivatives ∂f_i/∂φ_j, i, j = 1, n;
• the manifolds S₀ and S₁ are given by the relations

S₀ = {ϕ | φ_i(0) = φ_i⁰; i = 1, n},   (10.208)

S₁ = {ϕ | h_k(ϕ(T)) = 0; k = 1, l, l ≤ n},   (10.209)

where h_k(x) are functions with continuous partial derivatives; in addition, the gradients ∇h_k(x), k = 1, l, are linearly independent for any x ∈ S₁.

Let us remark that if l = n, then we get the optimal control problem (10.204), (10.206), (10.207) with a fixed right end. Condition (10.208) means the fixation of the left end. If S₁ = E_n, then we have an optimal control problem with a free right end, while if 0 < l < n, then we have a problem with a mobile right end. Whether the right end is fixed, free, or mobile, the dimension of the manifold S₁ is equal to n − l.

Theorem 10.1 (Pontryagin¹²). Let the system of ordinary differential equations be

dϕ/dt = f(ϕ, u),   u ∈ U,   S₀ = {ϕ(0) = ϕ⁰},   S₁ = {h_k(ϕ(T)) = 0, k = 1, l},   (10.210)

for which the above conditions are fulfilled. Let {ϕ(t), u(t)}, 0 ≤ t ≤ T, be the optimal process that leads the system from the state ϕ⁰ to the state ϕ¹ ∈ S₁, and let us introduce Hamilton's function

H(ϕ, ψ, u) = Σ_{i=0}^n ψ_i f_i(ϕ, u).   (10.211)

Under these conditions, there exists the nontrivial vector function

ψ(t) = [ψ_1(t) ψ_2(t) ... ψ_n(t)]^T,   ψ₀ = const ≤ 0,   (10.212)

which satisfies the system of equations

dψ_i/dt = −∂H(ϕ(t), ψ, u(t))/∂φ_i,   i = 1, n,   (10.213)

with the limit conditions

ψ_i(T) = Σ_{k=1}^l γ_k ∂h_k(ϕ(T))/∂φ_i,   i = 1, n,   (10.214)

¹²Lev Semenovich Pontryagin (1908–1988) formulated this principle in 1956.
where γ_1, ..., γ_l are numbers such that at any moment 0 ≤ t ≤ T the condition of maximum

H(ϕ(t), ψ(t), u(t)) = max_{u∈U} H(ϕ(t), ψ(t), u)   (10.215)

is verified. If the moment T is not fixed, then the following relation takes place:

H_T = H(ϕ(T), ψ(T), u(T)) = 0.   (10.216)

The classical problem of the variational calculus, namely the minimization of the functional

F = ∫₀^T f₀(φ, dφ/dt, t) dt   (10.217)

in the class of piecewise smooth functions which satisfy the limit conditions φ(0) ∈ S₀, φ(T) ∈ S₁, is a particular case of problem (10.204), (10.206), (10.207): it amounts to finding the minimum of the functional

F = ∫_{t₀}^T f₀(φ, u, t) dt,   (10.218)

with the condition

dφ/dt = u.   (10.219)

10.13 PROBLEMS OF EXTREMUM

Hereafter, we denote by H a Hilbert space over the field of real numbers and by ⟨·, ·⟩_H and ‖·‖_H the scalar product and the norm in H, respectively. Let π(u, v) be a symmetric and continuous bilinear form and L(u) a continuous linear form on H. We also denote by D ⊂ H a convex and closed set. We define the quadratic functional

F(v) = π(v, v) − 2L(v),   (10.220)

where π(v, v) is positive definite on H, that is, there exists c > 0 with the property

π(v, v) ≥ c‖v‖²_H   (10.221)

for any v ∈ H. Under these conditions there exists a uniquely determined element u ∈ D which is the solution of the problem

F(u) = inf_{v∈D} F(v).   (10.222)

Theorem 10.2 If the above conditions are fulfilled, then u ∈ D is a solution of problem (10.222) if and only if for any v ∈ D we have

π(u, v − u) ≥ L(v − u).   (10.223)
Demonstration. The necessity results from the following considerations. If u is a solution of problem (10.222), then for any v ∈ D and θ ∈ (0, 1) we have

F(u) ≤ F((1 − θ)u + θv),   (10.224)

where we take into account that D is convex. From relation (10.224), we obtain

[F(u + θ(v − u)) − F(u)] / θ ≥ 0   (10.225)

and, passing to the limit for θ → 0, it follows that

lim_{θ→0} [F(u + θ(v − u)) − F(u)]/θ = lim_{θ→0} 2[π(u, v − u) − L(v − u)] + lim_{θ→0} θπ(v − u, v − u) = 2[π(u, v − u) − L(v − u)] ≥ 0   (10.226)

for any v ∈ D.

Let us now show the sufficiency. Because F(u) is convex, for any v ∈ D and any θ ∈ (0, 1) the inequality

F((1 − θ)u + θv) ≤ (1 − θ)F(u) + θF(v)   (10.227)

subsists, from which it follows that

F(v) − F(u) ≥ [F((1 − θ)u + θv) − F(u)] / θ.   (10.228)

We pass to the limit for θ → 0 and, taking (10.223) and (10.226) into account, it follows that

F(u) ≤ F(v)   (10.229)

for any v ∈ D; hence the theorem is proved.

Observation 10.20 If D is a linear subspace (in particular, D = H), then, writing v = u ± φ with φ ∈ D arbitrary, we obtain

π(u, φ) ≥ L(φ),   −π(u, φ) ≥ −L(φ);   (10.230)

hence u is a solution of problem (10.222) if and only if, for any φ ∈ D, we have

π(u, φ) = L(φ),   (10.231)

that is, Euler's equation for the variational problem

F(u) = inf_{v∈D} F(v).   (10.232)
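In the finite-dimensional case the abstract statement becomes quite concrete: take H = R^n, π(u, v) = ⟨Au, v⟩ with A symmetric and positive definite, L(v) = ⟨b, v⟩, and D = H; then F(v) = ⟨Av, v⟩ − 2⟨b, v⟩ and Euler's equation (10.231) is the linear system Au = b. A minimal numerical check follows (the data are ours, chosen only for illustration):

```python
import numpy as np

# H = R^2, pi(u, v) = <Au, v>, L(v) = <b, v>; (10.231) reads Au = b
A = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric, positive definite
b = np.array([1.0, 2.0])

u = np.linalg.solve(A, b)                # solution of Euler's equation

# check that u minimizes F(v) = <Av, v> - 2<b, v> against random points
F = lambda v: v @ A @ v - 2.0 * (b @ v)
rng = np.random.default_rng(0)
assert all(F(u) <= F(u + rng.normal(size=2)) for _ in range(100))
```

This is exactly the mechanism exploited by the variational methods (Ritz, Galerkin) treated elsewhere in the book: the bilinear form π comes from the operator of the problem, and (10.231) becomes a linear system for the coefficients of the approximation.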
10.14 NUMERICAL EXAMPLES

Example 10.1 Let the function be f : [0, 2] → R,

f(x) = x⁵/5 − x.   (10.233)

We wish to localize the minimum of this function knowing a = 0, b = 0.8, c = 2. First of all we use the linear algorithm of localization of the minimum and have

a = 0 < 0.8 = b,   (10.234)

f(a) = 0,   f(b) = 0.8⁵/5 − 0.8 = −0.734464,   f(a) > f(b).   (10.235)

Let

k = 1.1   (10.236)

and calculate

c₁ = b + k(b − a) = 1.68,   (10.237)

f(c₁) = 1.68⁵/5 − 1.68 = 0.99656 > 0,   f(c₁) > f(b).   (10.238)

It follows that the point of minimum is in the interval [0, 1.68]. On the other hand, the parabola which passes through the points A(0, 0), B(0.8, −0.734464), C(2, 4.4) has the equation

L₂(x) = 2.5984x² − 2.9968x   (10.239)

and attains its minimum at the point

x* = 2.9968/(2 × 2.5984) = 0.576663.   (10.240)

Moreover,

f(x*) = −0.563909 < f(2) = 4.4,   (10.241)

f(x*) = −0.563909 > f(0.8) = −0.734464;   (10.242)

hence, the point of minimum of the function f is in the interval [0.8, 2]. To determine the minimum, we may use the algorithm of the golden section, the results being given in Table 10.1. We may also use the Brent algorithm, the calculations being given in Table 10.2. In both cases the precision is

ε = 10⁻³.   (10.243)

Example 10.2 Let us consider the function U : D ⊂ R³ → R,

U(x) = U(x, y, z) = 2x² + 5y² + 5z² + 2xy − 4xz − 4yz,   (10.244)

where

D = {(x, y, z) ∈ R³ | x² + 2y² + z² ≤ 2}.   (10.245)

Let p^{(1)} be the direction given by

p^{(1)} = [1 2 3]^T.   (10.246)
TABLE 10.1 Determination of the Minimum of Function (10.233) by Means of the Algorithm of the Golden Section

Step   x0      x1      x2      x3      f(x0)    f(x1)    f(x2)    f(x3)
0      0.000   0.800   1.258   2.000    0.000   −0.735   −0.627    4.400
1      0.000   0.494   0.800   1.258    0.000   −0.489   −0.734   −0.627
2      0.494   0.800   0.975   1.258   −0.489   −0.734   −0.799   −0.627
3      0.800   0.975   1.083   1.258   −0.734   −0.799   −0.785   −0.627
4      0.800   0.908   0.975   1.083   −0.734   −0.785   −0.799   −0.785
5      0.908   0.975   1.016   1.083   −0.785   −0.799   −0.799   −0.785
6      0.975   1.016   1.042   1.083   −0.799   −0.799   −0.796   −0.785
7      0.975   1.001   1.016   1.042   −0.799   −0.800   −0.799   −0.796
8      0.975   0.991   1.001   1.016   −0.799   −0.800   −0.800   −0.799
10     0.991   1.001   1.007   1.016   −0.800   −0.800   −0.800   −0.799
11     0.991   0.997   1.001   1.007   −0.800   −0.800   −0.800   −0.800

TABLE 10.2 Determination of the Minimum of Function (10.233) by Brent's Algorithm

Step   a       b       u       v       t       x       fa      fb      fu       fv       ft       fx
0      0.000   2.000   2.000   2.000   0.000   1.000   0.000   4.400    4.400    4.400    0.000   −0.800
1      0.000   1.382   1.382   1.382   0.000   1.000   0.000   4.400    4.400    4.400    0.000   −0.800
2      0.618   1.382   0.618   0.000   0.618   1.000   0.000   4.400   −0.374    0.000   −0.374   −0.800
3      0.618   1.146   1.146   0.618   1.146   1.000   0.000   4.400   −0.600   −0.374   −0.600   −0.800
4      0.854   1.146   0.854   1.146   0.854   1.000   0.000   4.400   −0.751   −0.600   −0.751   −0.800
5      0.854   1.056   1.056   0.854   1.056   1.000   0.000   4.400   −0.763   −0.751   −0.763   −0.800
6      0.944   1.056   0.944   1.056   0.944   1.000   0.000   4.400   −0.793   −0.763   −0.793   −0.800
7      0.944   1.021   1.021   0.944   1.021   1.000   0.000   4.400   −0.794   −0.793   −0.794   −0.800
8      0.979   1.021   0.979   1.021   0.979   1.000   0.000   4.400   −0.799   −0.794   −0.799   −0.800
9      0.979   1.008   1.008   0.979   1.008   1.000   0.000   4.400   −0.799   −0.799   −0.799   −0.800

We wish to determine the other G-conjugate directions too, as well as the minimum of the function U. To do this, we calculate the Hessian matrix

∇²U(x) = [ ∂²U/∂x²   ∂²U/∂x∂y   ∂²U/∂x∂z
           ∂²U/∂x∂y  ∂²U/∂y²    ∂²U/∂y∂z
           ∂²U/∂x∂z  ∂²U/∂y∂z   ∂²U/∂z²  ] = [ 4 2 −4; 2 10 −4; −4 −4 10 ].   (10.247)

The second G-conjugate direction is determined by the relation

[1 2 3] [ 4 2 −4; 2 10 −4; −4 −4 10 ] [p_{21} p_{22} p_{23}]^T = 0,   (10.248)

which leads to the equation

−4p_{21} + 10p_{22} + 18p_{23} = 0;   (10.249)
we choose

p_{21} = 2,   p_{22} = −1,   p_{23} = 1.   (10.250)

We have obtained

p^{(2)} = [2 −1 1]^T.   (10.251)

The last G-conjugate direction is given by the relation

[2 −1 1] [ 4 2 −4; 2 10 −4; −4 −4 10 ] [p_{31} p_{32} p_{33}]^T = 0,   (10.252)

from which

2p_{31} − 10p_{32} + 6p_{33} = 0.   (10.253)

We choose

p_{31} = 2,   p_{32} = 1,   p_{33} = 1,   (10.254)

hence

p^{(3)} = [2 1 1]^T.   (10.255)

We take as start point the value

x^{(0)} = [1 0 1]^T.   (10.256)

The expression

U(x^{(0)} + αp^{(1)}) = U(1 + α, 2α, 1 + 3α) = 35α² + 14α + 3   (10.257)

becomes minimum for

α = −1/5   (10.258)

and it follows that

x^{(1)} = x^{(0)} + αp^{(1)} = [4/5 −2/5 2/5]^T.   (10.259)

We calculate

U(x^{(1)} + αp^{(2)}) = U(4/5 + 2α, −2/5 − α, 2/5 + α) = 10α² + 8α + 8/5.   (10.260)

The minimum of this expression is attained for

α = −2/5,   (10.261)

from which

x^{(2)} = x^{(1)} + αp^{(2)} = [0 0 0]^T.   (10.262)
Finally, the expression

U(x^{(2)} + αp^{(3)}) = U(2α, α, α) = 10α²   (10.263)

attains its minimum for

α = 0   (10.264)

and it follows that

x^{(3)} = x^{(2)} + αp^{(3)} = [0 0 0]^T = x^{(2)}.   (10.265)

The point of minimum of the function U is given by x^{(3)}, while the minimum value of U is

U_min = U(x^{(3)}) = 0.   (10.266)

Example 10.3 Let us consider the function U : R³ → R,

U(x) = U(x, y, z) = e^{x²}(y² + z²),   (10.267)

for which we wish to calculate the minimum by Powell's algorithm. We know

ε = 10⁻²,   iter = 3,   (10.268)

x^{(0)} = [2 1 −3]^T.   (10.269)

We have

U(x^{(k−1)} + αp^{(k)}) = U(x^{(k−1)} + αp_1^{(k)}, y^{(k−1)} + αp_2^{(k)}, z^{(k−1)} + αp_3^{(k)}) = e^{(x^{(k−1)} + αp_1^{(k)})²}[(y^{(k−1)} + αp_2^{(k)})² + (z^{(k−1)} + αp_3^{(k)})²],   (10.270)

dU(x^{(k−1)} + αp^{(k)})/dα = e^{(x^{(k−1)} + αp_1^{(k)})²} {2(x^{(k−1)} + αp_1^{(k)})p_1^{(k)}[(y^{(k−1)} + αp_2^{(k)})² + (z^{(k−1)} + αp_3^{(k)})²] + 2p_2^{(k)}(y^{(k−1)} + αp_2^{(k)}) + 2p_3^{(k)}(z^{(k−1)} + αp_3^{(k)})} = e^{(x^{(k−1)} + αp_1^{(k)})²} F(α).   (10.271)

The value α_min which minimizes expression (10.270) is obtained by solving the equation

F(α) = 0.   (10.272)

The directions p^{(1)}, p^{(2)}, and p^{(3)} are

p^{(1)} = [1 0 0]^T,   p^{(2)} = [0 1 0]^T,   p^{(3)} = [0 0 1]^T.   (10.273)
We have

U(x^{(0)} + αp^{(1)}) = U(2 + α, 1, −3) = 10e^{(2+α)²},   (10.274)

dU(x^{(0)} + αp^{(1)})/dα = 20(2 + α)e^{(2+α)²},   (10.275)

from which

α_min = −2,   (10.276)

x^{(1)} = x^{(0)} − 2p^{(1)} = [0 1 −3]^T.   (10.277)

We calculate now

U(x^{(1)} + αp^{(2)}) = U(0, 1 + α, −3) = α² + 2α + 10,   (10.278)

dU(x^{(1)} + αp^{(2)})/dα = 2α + 2,   (10.279)

such that

α_min = −1,   (10.280)

x^{(2)} = x^{(1)} − p^{(2)} = [0 0 −3]^T.   (10.281)

Finally, we also find

U(x^{(2)} + αp^{(3)}) = U(0, 0, −3 + α) = α² − 6α + 9,   (10.282)

so that

α_min = 3,   (10.283)

x^{(3)} = x^{(2)} + 3p^{(3)} = [0 0 0]^T.   (10.284)

On the other hand, the new direction p^{(3)} is given by

p^{(3)} = x^{(3)} − x^{(2)} = [0 0 3]^T;   (10.285)

we have

U(x^{(3)} + αp^{(3)}) = U(0, 0, 3α) = 9α²,   (10.286)

from which

α_min = 0,   (10.287)

x^{(4)} = x^{(3)} = [0 0 0]^T.   (10.288)
But

‖x^{(4)} − x^{(3)}‖ = 0 < ε,   (10.289)

such that the point of minimum is determined as

x_min = [0 0 0]^T,   (10.290)

the minimum value of the function U being

U_min = U(x_min) = 0.   (10.291)

Example 10.4 Let us consider again the function U of Example 10.3, for which we will calculate the minimum using gradient type methods. We begin with the gradient method. Therefore, we calculate

∇U(x) = [2xe^{x²}(y² + z²)   2ye^{x²}   2ze^{x²}]^T   (10.292)

and it follows that

∇U(x^{(0)}) = [40e^4 2e^4 −6e^4]^T,   (10.293)

this being the first direction p^{(1)}. The scalar α₀ minimizes the expression

U(x^{(0)} + α₀p^{(1)}) = U(2 + 40α₀e^4, 1 + 2α₀e^4, −3 − 6α₀e^4) = e^{(2+40α₀e^4)²}(10 + 40α₀e^4 + 40α₀²e^8).   (10.294)

But

U′(α₀) = e^{(2+40α₀e^4)²}(3200e^{12}α₀² + 3280α₀e^8 + 840e^4)   (10.295)

and the equation U′(α₀) = 0 leads to

α₀₁ = −21/(40e^4)   or   α₀₂ = −1/(2e^4).   (10.296)

Then

U(α₀₁) = e^{361}/40,   U(α₀₂) = 0,   (10.297)

so that we choose α₀ = α₀₂. It follows that

x^{(1)} = x^{(0)} − (1/(2e^4))p^{(1)} = [2 1 −3]^T − [20 1 −3]^T = [−18 0 0]^T.   (10.298)

We calculate

∇U(x^{(1)}) = [0 0 0]^T;   (10.299)

hence, the sequence x^{(k)} becomes constant, x^{(k)} = x^{(1)}, k ≥ 2.
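This first gradient step is easy to verify numerically. The following fragment is ours, not part of the original text: it reproduces x^{(1)} = [−18 0 0]^T for the step α₀₂ = −1/(2e^4) chosen above and confirms that the gradient vanishes there.

```python
import numpy as np

U     = lambda v: np.exp(v[0]**2) * (v[1]**2 + v[2]**2)
gradU = lambda v: np.exp(v[0]**2) * np.array([
    2.0 * v[0] * (v[1]**2 + v[2]**2),   # dU/dx
    2.0 * v[1],                          # dU/dy
    2.0 * v[2],                          # dU/dz
])

x0 = np.array([2.0, 1.0, -3.0])
p1 = gradU(x0)                           # = e^4 (40, 2, -6), as in (10.293)
alpha = -1.0 / (2.0 * np.exp(4.0))       # the value alpha_02 chosen above
x1 = x0 + alpha * p1

print(x1)                                # [-18.  0.  0.]
print(U(x1), gradU(x1))                  # 0.0 and (approximately) [0, 0, 0]
```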
If we wish to solve the problem by methods of conjugate gradient type, then we calculate:
• for the Fletcher–Reeves method:

β₁ = [∇U(x^{(1)})]^T[∇U(x^{(1)})] / [∇U(x^{(0)})]^T[∇U(x^{(0)})] = 0;   (10.300)

• for the Polak–Ribière method:

y^{(0)} = ∇U(x^{(1)}) − ∇U(x^{(0)}) = [−40e^4 −2e^4 6e^4]^T,   (10.301)

β₀ = [∇U(x^{(1)})]^T y^{(0)} / [∇U(x^{(0)})]^T[∇U(x^{(0)})] = 0;   (10.302)

• for the Hestenes–Stiefel method:

β₀ = [∇U(x^{(1)})]^T y^{(0)} / [y^{(0)}]^T p^{(0)} = 0,   (10.303)

p^{(1)} = −∇U(x^{(1)}) − β₀p^{(0)} = [0 0 0]^T.   (10.304)

We observe that in all cases we obtain the same constant sequence x^{(k)} = x^{(1)}, k ≥ 2; hence U_min = 0.

Comparing Example 10.3 and Example 10.4, we see that we do not obtain the same points of minimum. This is explained by the fact that the function U has an infinity of points of minimum, characterized by x ∈ R arbitrary, y = 0, z = 0.

Example 10.5 We wish to solve the linear system

5x₁ + 2x₂ + 2x₃ = 11,   2x₁ + 5x₂ + 2x₃ = 14,   2x₁ + 2x₂ + 5x₃ = 11,   (10.305)

using methods of gradient type and starting with

x^{(0)} = [−1 0 1]^T.   (10.306)

We know the values

ε = 10⁻³,   δ = 10⁻¹,   iter = 10.   (10.307)

We have

A = [ 5 2 2; 2 5 2; 2 2 5 ],   b = [11 14 11]^T.   (10.308)

The matrix A is positive definite because

[x₁ x₂ x₃] A [x₁ x₂ x₃]^T = (x₁ + 2x₂)² + (x₂ + 2x₃)² + (x₃ + 2x₁)².   (10.309)

The data are given in Table 10.3.
  • 624. 618 OPTIMIZATIONS TABLE 10.3 Solution of System (10.305) by the Gradient Method Step x p p, p (p, Ap) α 0   −1 0 1     14.00000 14.00000 8.00000   456.00000 3960.00000 0.11515 1   0.61212 1.61212 1.92121     0.87273 0.87273 −3.05455   10.85355 35.98810 0.30159 2   0.87532 1.87532 1.00000     0.87273 0.87273 0.49870   1.77201 15.38850 0.11515 3   0.97582 1.97582 1.05743     0.05440 0.05440 −0.19041   0.04218 0.13985 0.30159 4   0.99223 1.99223 1.00000     0.05440 0.05440 0.03109   0.00689 0.05980 0.11515 5   0.99849 1.99849 1.00358     0.00339 0.00339 −0.01187   0.00016 0.00054 0.30159 6   0.99952 1.99952 1.00000     0.00339 0.00339 0.00194   0.00003 0.00023 0.11515 7   0.99991 1.99991 1.00000     0.00021 0.00021 −0.00074   0.00000 0.00000 0.30159 8   0.99997 1.99997 1.00000   – – – – If we apply the Fletcher–Reeves method, then we obtain the data given in Table 10.4. Example 10.6 Let the function be U : R3 → R, U(x) = U(x, y, z) = 5x2 + 5y2 + 5z2 + 4xy + 4yz + 4xz, (10.310) for which we wish to determine the minimum, using Newton type methods. We know ε = 10−2 , (10.311) B0 =   1 0 0 0 1 0 0 0 1   = I3, (10.312) while the start vector is x(0) = 1 −1 1 T . (10.313)
  • 625. NUMERICAL EXAMPLES 619 TABLE 10.4 Solution of System (10.305) by the Fletcher–Reeves Method Step x p r p, p α β 0   −1.00000 0.00000 1.00000     14.00000 14.00000 8.00000     14.00000 14.00000 8.00000   456.00000 0.11515 0.00274 1   0.61212 1.61212 1.92121     0.91110 0.91110 −3.03262     0.87273 0.87273 −3.05455   10.85698 0.30572 0.03963 2   0.89067 1.89067 0.99407     0.81331 0.81331 0.34682     0.77720 0.77720 0.46700   1.44323 0.11769 0.00126 3   0.98638 1.98638 1.03488     0.02660 0.02660 −0.11950     0.02557 0.02557 -0.11994   0.01570 0.28084 0.06347 We calculate ∇U(x) =   10x + 4y + 4z 10y + 4x + 4z 10z + 4x + 4y   , (10.314) ∇U2 (x) =   10 4 4 4 10 4 4 4 10   . (10.315) The matrix ∇U2 (x) is positive definite because x1 x2 x3   10 4 4 4 10 4 4 4 10     x1 x2 x3   = 2[(x + 2y)2 + (y + 2z)2 + (x + 2x)2 ]. (10.316) Moreover, [∇2 U(x)]−1 = 1 648   84 −24 −24 −24 84 −24 −24 −24 84   . (10.317) In the case of Newton’s method we obtain the sequence of iterations x(k+1) = x(k) − [∇2 U(x)]−1 ∇U(x(k) ) =         − 8 27 x(k) 1 + 4 27 x(k) 2 + 4 27 x(k) 3 4 27 x(k) 1 − 8 27 x(k) 2 + 4 27 x(k) 3 4 27 x(k) 1 + 4 27 x(k) 2 − 8 27 x(k) 3         . (10.318) The calculations are given in Table 10.5.
  • 626. 620 OPTIMIZATIONS TABLE 10.5 Determination of the Minimum of the Function U by Newton’s Method Step x1 x2 x3 0 1.000000 −1.000000 1.000000 1 −0.296296 0.592593 −0.296296 2 0.131687 −0.263374 0.131687 3 −0.058528 0.117055 −0.058528 4 0.026012 −0.052025 0.026012 5 −0.011561 0.023122 −0.011561 6 0.005138 −0.010276 0.005138 7 −0.002284 0.004567 −0.002284 8 0.001015 −0.002030 0.001015 9 −0.000451 0.000902 −0.000451 10 0.000200 −0.000401 0.000200 11 −0.000089 0.000178 −0.000089 12 0.000040 −0.000079 0.000040 13 −0.000018 0.000035 −0.000018 14 0.000008 −0.000016 0.000008 15 −0.000003 0.000007 −0.000003 16 0.000002 −0.000003 0.000002 17 0.000001 0.000001 −0.000001 18 0.000000 −0.000001 0.000000 19 −0.000000 0.000000 −0.000000 20 0.000000 −0.000000 0.000000 In the case of Davidon–Fletcher–Powell method we have successively B0p(0) = −∇U(x(0) ),   1 0 0 0 1 0 0 0 1        p(0) 1 p(0) 2 p(0) 3      = −   10 −2 10   , (10.319) p0 = −10 2 −10 T , (10.320) U(x(0) + αp(0) ) = U     1 − 10α −1 + 2α 1 − 10α     = 1260α2 − 204α + 11. (10.321) This expression is minimized for α0 = 17 210 (10.322) and it follows that x(1) = x(0) + α0p(0) = 4 21 − 88 105 4 21 , (10.323) y(0) = ∇U(x(1) ) − ∇U(x(0) ) =         − 374 35 − 34 7 − 374 35         , (10.324)
z^{(0)} = y^{(0)} + α₀∇U(x^{(0)}) = [−374/35 −34/7 −374/35]^T + (17/10)[10 −2 10]^T = [221/35 −289/35 221/35]^T,   (10.325)

B₁ = B₀ + ( z^{(0)}[y^{(0)}]^T + y^{(0)}[z^{(0)}]^T ) / ( [y^{(0)}]^T(x^{(1)} − x^{(0)}) ) − ( [z^{(0)}]^T(x^{(1)} − x^{(0)}) / {[y^{(0)}]^T(x^{(1)} − x^{(0)})}² ) y^{(0)}[y^{(0)}]^T
   = [ −7.171836 4.971392 −8.171836; 4.971392 1.971425 4.971392; 8.171021 0.114249 9.171021 ].   (10.326)

Obviously, the procedure may continue. The application of the Broyden–Fletcher–Goldfarb–Shanno method is completely similar. The minimum of the function U(x) is obtained for

x_min = 0,   U(x_min) = 0.   (10.327)

Example 10.7 Let the problem of linear programming be

maximum (2x₁ − x₂) = ?,   (10.328)

with the restrictions

x₁ + x₂ ≤ 5,   x₂ − x₁ ≤ 4,   x₂ − x₁ ≥ −3,   x₂ + (4/3)x₁ ≥ 4.   (10.329)

Having only two variables x₁ and x₂, we can associate the straight lines

d₁: x₁ + x₂ − 5 = 0,   d₂: x₂ − x₁ − 4 = 0,   d₃: x₂ − x₁ + 3 = 0,   d₄: x₂ + (4/3)x₁ − 4 = 0,   (10.330)

represented in Figure 10.1.

[Figure 10.1 Geometric solution of the problem of linear programming (10.328) and (10.329).]
These lines define the quadrilateral ABCD, its vertices having the coordinates

A(0.5, 4.5),   B(0, 4),   C(3, 0),   D(4, 1).   (10.331)

The function f : R² → R,

f(x₁, x₂) = 2x₁ − x₂,   (10.332)

has at these points the values

f(A) = −3.5,   f(B) = −4,   f(C) = 6,   f(D) = 7,   (10.333)

the maximum value being attained at the point D. It follows that the solution of problem (10.328) and (10.329) is

maximum (2x₁ − x₂) = 7.   (10.334)

The same problem (10.328) and (10.329), to which we add the conditions x_i ≥ 0, i = 1, 2, may also be solved by the primal Simplex algorithm. Thus, the system of restrictions (10.329) is replaced by the system

x₁ + x₂ + x₃ = 5,   x₂ − x₁ + x₄ = 4,   x₁ − x₂ + x₅ = 3,   (4/3)x₁ + x₂ − x₆ + x₇ = 4,   x_i ≥ 0, i = 1, 7,   (10.335)

while problem (10.328) is replaced by

minimum f(x) = minimum (x₂ − 2x₁) = ?.   (10.336)

We construct the Simplex table.

       x₁     x₂    x₃   x₄   x₅   x₆    x₇   |
       −2     1     0    0    0    0     0    |  0
x₃     1      1     1    0    0    0     0    |  5
x₄     −1     1     0    1    0    0     0    |  4
x₅     1      −1    0    0    1    0     0    |  3
x₇     4/3    1     0    0    0    −1    1    |  4

A basic solution is

x₁ = 0, x₂ = 0, x₃ = 5, x₄ = 4, x₅ = 3, x₆ = 0, x₇ = 4.   (10.337)

At the first iteration, x₁ enters the basis and x₅ exits from the basis (the exit of x₇ would also be possible, because 3/1 = 4/(4/3)!). It follows the new table.

       x₁    x₂     x₃   x₄   x₅      x₆    x₇   |
       0     −1     0    0    2       0     0    |  6
x₃     0     2      1    0    −1      0     0    |  2
x₄     0     0      0    1    1       0     0    |  7
x₁     1     −1     0    0    1       0     0    |  3
x₇     0     7/3    0    0    −4/3    −1    1    |  0
The new basic solution reads

x₁ = 3, x₂ = 0, x₃ = 2, x₄ = 7, x₅ = 0, x₆ = 0, x₇ = 0.   (10.338)

In the next step, x₂ enters the basis instead of x₃ and we obtain the new Simplex table.

       x₁    x₂    x₃      x₄   x₅      x₆    x₇   |
       0     0     1/2     0    3/2     0     0    |  7
x₂     0     1     1/2     0    −1/2    0     0    |  1
x₄     0     0     0       1    1       0     0    |  7
x₁     1     0     1/2     0    1/2     0     0    |  4
x₇     0     0     −7/6    0    −1/6    −1    1    |  −7/3

It follows the solution

x₁ = 4, x₂ = 1, x₃ = 0, x₄ = 7, x₅ = 0, x₆ = 7/3, x₇ = 0.   (10.339)

We observe that the anomaly which appears in the last line of the Simplex table, that is, the solution x₆ = 0, x₇ = −7/3, is due to the way in which the last relation (10.329) was transformed into the last equality (10.335). Indeed, we would obtain

(4/3)x₁ + x₂ ≥ 4,   (4/3)x₁ + x₂ − x₆ = 4,   (10.340)

but not the unit column corresponding to x₆. In this situation, we have written x₆ → x₆ − x₇ to obtain the unit column for the variable x₇; in fact, this has been only a trick to start the Simplex algorithm. Analogously, we can use the dual Simplex algorithm, obviously after the transformation of problem (10.335) and (10.336) into the dual problem.

10.15 APPLICATIONS

Problem 10.1 Let us consider the model of half of an automotive vehicle in Figure 10.2, formed of the bar AB of length l₁ + l₂ and of the nonlinear springs 1 and 2. The forces in the two springs are given by f₁(z) and f₂(z), respectively, where z is the elongation, the functions f₁ and f₂ being odd in the variable z. The weight of the bar is G, its center of gravity C is at the distance l₁ with respect to A, while its moment of inertia with respect to this center is J. We suppose that the rotation θ of the bar AB is small and that the springs have the same length in the nondeformed state. Determine the positions of equilibrium of the bar.

Numerical application for G = 5000 N, f₁(z) = f₂(z) = f(z), f(z) = kz^p, p = 1 or p = 3, k = 25000 N/m^p, l₁ = 1.5 m, l₂ = 2.5 m.

Solution:
1. Theoretical aspects
The system has two degrees of freedom: the displacement x of the center of gravity C and the rotation θ of the bar. We have denoted the position of the bar in the absence of any deformation by A₀B₀.
[Figure 10.2 Theoretical model.]

There result the displacements x₁ and x₂ of the ends A and B, respectively, in the form

x₁ = x − l₁θ,   x₂ = x + l₂θ.   (10.341)

The theorem of momentum leads to the equation

G + f₁(x − l₁θ) + f₂(x + l₂θ) = 0,   (10.342)

while the theorem of moment of momentum with respect to the center of gravity C allows us to write

f₁(x − l₁θ)l₁ − f₂(x + l₂θ)l₂ = 0.   (10.343)

The two equations (10.342) and (10.343) may be put together in the equation

U(x) = U(x, θ) = 0,   (10.344)

where

U(x, θ) = [G + f₁(x − l₁θ) + f₂(x + l₂θ)]² + [l₁f₁(x − l₁θ) − l₂f₂(x + l₂θ)]².   (10.345)

If the system formed by equations (10.342) and (10.343) has a solution, then equation (10.344) has a solution too, and reciprocally. The determination of the solution of equation (10.344) is equivalent, in this case, to the determination of the minimum of the function U given by expression (10.345).

2. Numerical case
For p = 1, we have successively

f₁(x − l₁θ) = 25000(x − 1.5θ),   (10.346)

f₂(x + l₂θ) = 25000(x + 2.5θ),   (10.347)
the function U being of the form

U(x, θ) = [5000 + 25000(x − 1.5θ) + 25000(x + 2.5θ)]² + {1.5[25000(x − 1.5θ)] − 2.5[25000(x + 2.5θ)]}²   (10.348)

or, equivalently,

U(x, θ) = (5000 + 50000x + 25000θ)² + (−25000x − 212500θ)².   (10.349)

It follows that

U(x, θ) = 3.125 × 10⁹x² + 4.578125 × 10¹⁰θ² + 1.3125 × 10¹⁰xθ + 5 × 10⁸x + 2.5 × 10⁸θ + 2.5 × 10⁷,   (10.350)

with the solution

θ = 0.0125 rad,   x = −0.10625 m.   (10.351)

For p = 3, we obtain

U(x, θ) = [5000 + 25000(x − 1.5θ)³ + 25000(x + 2.5θ)³]² + {1.5[25000(x − 1.5θ)³] − 2.5[25000(x + 2.5θ)³]}²,   (10.352)

with the solution

θ = 0.0196 rad,   x = −0.47064 m.   (10.353)

Problem 10.2 Let us consider the linear program

min c^T x   (10.354)

with the restrictions

Bx = b,   x ≥ 0,   (10.355)

where x ∈ M_{n,1}(R), b ∈ M_{m,1}(R), c ∈ M_{n,1}(R), B ∈ M_{m,n}(R). Let us solve this program in the case m = n − 1, B being a full rank matrix.

Solution: Because B is a full rank matrix, the components of the vector x may be written as functions of only one component, say x₁, that is,

x₂ = α₂x₁ + β₂,   ...,   x_n = α_n x₁ + β_n.   (10.356)

The function

f(x) = c^T x   (10.357)

now takes the form

f(x) = c₁x₁ + ... + c_n x_n = ax₁ + b,   (10.358)

that is, it becomes a linear function in the single unknown x₁. If a ≥ 0, then obviously

min f = f(0) = b.   (10.359)
If a < 0, then one considers relation (10.356). If all the coefficients α_i, i = 2, n, are positive, then expression (10.356) does not introduce any restriction on the variable x₁ and the program does not have an optimal solution. If there exist negative coefficients α_j, then from

x_j = α_j x₁ + β_j,   x_j ≥ 0,   (10.360)

we deduce

α_j x₁ + β_j ≥ 0,   (10.361)

hence

x₁ ≤ −β_j/α_j.   (10.362)

If, moreover, there exists at least one strictly negative β_j, then x₁ would result strictly negative and the linear program does not have an optimal solution. It follows that in the case a < 0 the necessary and sufficient condition for the program to have an optimal solution is that there exist at least one expression (10.356) with α_j < 0 and that all the expressions of this form have strictly positive coefficients β_j.

Let us remark that if relation (10.359) takes place, then the linear program has an optimal solution if and only if all the values β_i ≥ 0, i = 2, n.

FURTHER READING

Ackleh AS, Allen EJ, Hearfott RB, Seshaiyer P (2009). Classical and Modern Numerical Analysis: Theory, Methods and Practice. Boca Raton: CRC Press.
Atkinson K, Han W (2010). Theoretical Numerical Analysis: A Functional Analysis Framework. 3rd ed. New York: Springer-Verlag.
Baldick R (2009). Applied Optimization: Formulation and Algorithms for Engineering Systems. Cambridge: Cambridge University Press.
Berbente C, Mitran S, Zancu S (1997). Metode Numerice. București: Editura Tehnică (in Romanian).
Boyd S, Vandenberghe L (2004). Convex Optimization. Cambridge: Cambridge University Press.
Cheney EW, Kincaid DR (1997). Numerical Mathematics and Computing. 6th ed. Belmont: Thomson.
Chong EKP, Żak SH (2008). An Introduction to Optimization. 3rd ed. Hoboken: John Wiley & Sons, Inc.
Dahlquist G, Björck Å (1974). Numerical Methods. Englewood Cliffs: Prentice Hall.
Dennis JE Jr, Schnabel RB (1987). Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Philadelphia: SIAM.
Diwekar U (2010). Introduction to Applied Optimization. 2nd ed. New York: Springer-Verlag.
Fletcher R (2000). Practical Methods of Optimization. 2nd ed. New York: John Wiley & Sons, Inc.
Golub GH, van Loan CF (1996). Matrix Computations. 3rd ed. Baltimore: Johns Hopkins University Press.
Griva I, Nash SG, Sofer A (2008). Linear and Nonlinear Optimization. 2nd ed. Philadelphia: SIAM.
Hamming RW (2012). Introduction to Applied Numerical Analysis. New York: Dover Publications.
Hoffman JD (1992). Numerical Methods for Engineers and Scientists. New York: McGraw-Hill.
Jazar RN (2008). Vehicle Dynamics: Theory and Applications. New York: Springer-Verlag.
Kharab A, Guenther RB (2011). An Introduction to Numerical Methods: A MATLAB Approach. 3rd ed. Boca Raton: CRC Press.
Kleppner D, Kolenkow RJ (2010). An Introduction to Mechanics. Cambridge: Cambridge University Press.
Lanczos C (1949). The Variational Principles of Mechanics. Toronto: University of Toronto Press.
Lange K (2010a). Numerical Analysis for Statisticians. 2nd ed. New York: Springer-Verlag.
Lange K (2010b). Optimization. New York: Springer-Verlag.
Lawden DF (2006). Analytical Methods of Optimization. 2nd ed. New York: Dover Publications.
Luenberger DG (1997). Optimization by Vector Space Methods. New York: John Wiley & Sons, Inc.
Lurie AI (2002). Analytical Mechanics. New York: Springer-Verlag.
Marciuk GI (1983). Metode de Analiză Numerică. București: Editura Academiei Române (in Romanian).
Meriam JL, Kraige LG (2012). Engineering Mechanics: Dynamics. Hoboken: John Wiley & Sons, Inc.
Nocedal J, Wright SJ (2006). Numerical Optimization. 2nd ed. New York: Springer-Verlag.
Pandrea N, Pârlac S, Popa D (2001). Modele pentru Studiul Vibrațiilor Automobilelor. Pitești: Tiparg (in Romanian).
Rao SS (2009). Engineering Optimization: Theory and Practice. 3rd ed. Hoboken: John Wiley & Sons, Inc.
Ridgway Scott L (2011). Numerical Analysis. Princeton: Princeton University Press.
Ruszczyński A (2006). Nonlinear Optimization. Princeton: Princeton University Press.
Sauer T (2011). Numerical Analysis. 2nd ed. London: Pearson.
Stănescu ND (2007). Metode Numerice. București: Editura Didactică și Pedagogică (in Romanian).
Süli E, Mayers D (2003). An Introduction to Numerical Analysis. Cambridge: Cambridge University Press.
Venkataraman P (2009). Applied Optimization with MATLAB Programming. 2nd ed. Hoboken: John Wiley & Sons, Inc.
INDEX

Adams method, 463 Adams predictor-corrector method, 469 fifth-order, 470 fourth-order, 470 third-order, 469 Adams–Bashforth methods, 465 fifth-order, 467 fourth-order, 467 third-order, 467 Adams–Moulton methods, 468 fifth-order, 468 fourth-order, 468 third-order, 468 sufficient condition for convergence, 469 aleatory variable, 151 almost mini–max approximation, 345 polynomial, 346 approximate osculating polynomial, 339 approximation of functions by trigonometric functions, 346 Banach theorem, 38 base of a vector space, 124 Bellman equation, 606 principle, 606 Bernoulli method, 76 numbers, 395 Bessel formulae of interpolation, 324 dichotomy, 325 first, 324 quadratic, 325 inequality, 348 Bierge–Viète method, 79 biharmonic function, 546 polynomials, 546 bipartition method, 17 a posteriori estimation of the error, 19 a priori estimation of the error, 19 convergence, 18 bisection method see bipartition method Brent algorithm, 582 Broyden–Fletcher–Goldfarb–Shanno method, 593 Budan theorem, 67 Budan–Fourier theorem see Budan theorem calculation process, 1 stability, 1 Cauchy criterion of convergence for a sequence of matrices, 119 problem, 452 correct stated, 452 perturbed, 452 Cauchy–Buniakowski–Schwarz inequality, 117 Cauchy–Lipschitz theorem, 452 characteristic equation, 131, 153, 156 polynomial, 131
  • 635. 630 INDEX Chebyshev interpolation polynomials, 340, 407 polynomials, 420 quadrature formulae, 398 theorem for aleatory variable, 152 for polynomials, 342 Cholesky method, 137 chord method, 20 a posteriori estimation of the error, 23 a priori estimation of the error, 24 convergence, 20 complete sequence, 550 conditions essential, 551 natural, 551 conjugate directions, 583 algorithm, 583 G-conjugate directions, 584 Constantinescu torque converter, 511 contraction method, 37 a posteriori estimation of the error, 42 a priori estimation of the error, 41 control problem, 605 convex programming, 600 problem, 601 dual problem, 602 Courant number, 532 Cramer rule, 133 Crank–Nicholson algorithm, 542 method, 542 Crout method, 136 Danilevski method, 157 Davidon–Fletcher–Powell method, 593 Descartes theorem, 65 determinant calculation using definition, 111 using equivalent matrices, 112 definition, 111 determination of limits of the roots of polynomials, 55, 58 determination of the minimum, 580 diagonal form of a matrix, 134 Dirichlet conditions, 349 generalized theorem, 347 theorem, 349 direct power method, 160 accelerated convergence, 163 discriminant of the equation of third degree, 88 displacement method, 166 divided differences, 327 Doolittle method, 136 dynamic programming, 605 eigenvalue, 153 eigenvector, 153 elementary transformations, 593 energetic space, 549 equivalent systems, 593 errors absolute, 3 approximation, 2 enter data, 1 in integration of ordinary differential equations, 473 propagation, 3 addition, 3 computation of functions, 8 division of two numbers, 7 inversion of a number, 7 multiplication, 5 raising to a negative entire power, 7 subtraction, 8 taking root of pth order, 7 relative, 3 round-off, 3 Euler formulae of integration first, 395 second, 396 method, 454 algorithm, 455 determination of the error, 456 modified, 460 predictor-corrector method, 469 variational equation, 547 problem, 610 Euler–Maclaurin formulae of integration see Euler formulae of integration Everett formulae of interpolation, 326 first, 326 second, 326 explicit system, 594 extremum, 609 finite differences, 312 Fletcher–Reeves algorithm, 590 method, 588 Fourier approximation of functions see approximation of functions by trigonometric functions generalized coefficients, 347 generalized series, 347 method, 568, 571 Frame–Fadeev method, 131 Frobenius form of a matrix, 158 full rank matrix, 141 Galerkin method, 551 Gauss formulae of interpolation, 322 first, 322 second, 323 method, 133 quadrature formulae, 405
  • 636. INDEX 631 Gauss–Jordan method for inversion of matrices, 124 for linear systems, 134 Gauss–Legendre quadrature formulae see Gauss quadrature formulae Gauss–Seidel method, 147 convergence, 147 error estimation, 148 Gauss type quadrature formulae, 412 Gauss–Hermite, 414 Gauss–Jacobi, 413 Gauss–Laguerre, 415 in which appear the derivatives, 418 with imposed points, 417 generalized power, 316 generalized solution, 549, 551 Givens rotation matrices, 171 golden section algorithm, 580 gradient conditional gradient method, 602 conjugate gradient method, 587 gradient type methods in optimizations, 585 algorithm, 586 convergence, 587 gradient projection method, 602 method for linear systems, 589 algorithm, 590 method for nonlinear systems, 277 Gramm–Schmidt procedure, 406 Grashof formula, 524 Hamilton–Cayley equation see characteristic equation Hamming predictor-corrector method, 470 Hermite formula, 331 interpolation, 339 interpolation polynomial, 340 interpolation theorem, 340 polynomials, 408 theorem, 330 Hessian matrix, 577 Hestenes–Stiefel method, 589 Heun method, 460 Horner generalized schema, 70 Householder matrix, 169 reflexion, 169 vector, 169 Hudde method, 87 improper integrals, 382 calculation, 420 infinite systems of linear equations, 152 completely regular, 152 regular, 152 interpolation, 307 knots, 307 with exponentials, 355 with rational functions, 355 inverse interpolation, 332 determination of the roots of an equation, 333 with arbitrary division points, 333 with equidistant division points, 332 inverse power method, 165 inversion of matrices, 123 by means of the characteristic polynomial, 131 by partition, 125 direct, 123 iterative methods for inversion of the matrices, 128 a priori estimation of the error, 130 convergence, 129 for linear systems, 142 a posteriori estimation of the error, 146 a priori estimation of the error convergence for nonlinear systems, 273 Jacobi method see iteration method polynomials, 408 Jacobian, 275 Kantorovich method, 422 Krylov method, 155 Lagrange function, 601 saddle point, 601 interpolation polynomial, 307 evaluation of the error, 310 existence and uniqueness, 307 method, 69 Laguerre polynomials, 409 Lax–Wendorff schema, 533 least square method for approximation of functions, 352 for overdetermined systems, 174 for partial differential equations, 355 Legendre polynomials, 400, 407 Leverrier method, 166 Lin methods, 79 first method, 79 second method, 80 linear equivalence method (LEM), 471 first LEM equivalent, 471 second LEM equivalent, 472 linear programming admissible solution, 595 canonical form formulation of the problem, 595 geometrical interpretation, 597 optimal solution (program), 595 possible solution, 595 restrictions, 595 standard form, 596 linear transformation, 153 Lobacevski–Graeffe method, 72 case of a pair of complex conjugate roots, 74 case of distinct real roots, 72
  • 637. 632 INDEX Lobacevski–Graeffe method (Continued) case of two pairs of complex conjugate roots, 75 localization of the minimum, 579 algorithm, 579 L–R method, 166 LU factorization, 135 Markoff formula, 333 Markov chain, 150 mathematical expectation, 151 matrix symmetric, 137 positive definite, 137 method of entire series, 280 of one-dimensional search, 583 of penalizing functions, 603 of possible direction, 603 of terms grouping, 59 modulus of a matrix, 114 Milne fourth-order predictor-corrector method, 470 minimization along a direction, 578 minimum, 577 global, 577 local, 577 minimum residual, 175 mini–max approximations of functions, 344 principle, 344 Moivre formula, 341 Monte Carlo method for definite integrals, 423 for linear systems, 150 multibody dynamics, 128, 492, 499, 504 multistep methods, 462 explicit (open), 462 implicit (closed), 462 Newton direction, 590 formula with divided differences, 331 formulae, 166 interpolation polynomials, 317 backward, 319 error, 322 forward, 317 method for one dimensional case see tangent method for systems of nonlinear equations, 275 convergence, 276 modified, 276 stopping condition, 276 simplified method, 33 a posteriori estimation of the error, 35 a priori estimation of the error, 35 convergence, 33 theorem, 59 Newton type methods, 590 quasi Newton method, 593 Newton method, 590 Newton–Cˆotes error in quadrature formula, 385 quadrature, 384 quadrature formula, 385 Newton–Kantorovich method, 42 a posteriori estimation of the error, 45 a priori estimation of the error, 45 convergence, 42 Newton–Raphson method, 277 norm of a matrix canonical, 116 definition, 115 1 norm, 116 2 norm, 173, 193 k norm, 116, 169 ∞ norm, 116 numerical differentiation, 377 by means of expansion into a Taylor series, 377 approximation error, 379 by means of interpolation polynomials, 380 numerical integration, 382 optimality conditions, 578 optimizations, 577 orthogonal matrix, 170 polynomials, 406 properties, 410 overdetermined systems, 174 Parseval relation (equality), 348 partial differential equations of first order, 529 characteristic hypersurfaces, 530 characteristic system, 530 homogeneous, 530 numerical solution with explicit schemata, 530 numerical solution with implicit schemata, 530, 533 partial differential equations of second order, 534 of elliptic type, 534 of hyperbolic type, 543 algorithm, 546 of parabolic type, 538 method with differences backward, 540 method with differences forward, 539 Peano theorem, 452 point matching method, 546 Poisson equation, 534 algorithm, 537 Polak–Ribi`ere method, 588 polydine cam, 367 Pontryagin principle of maximum, 607 Powell algorithm, 585 predictor-corrector methods, 469 Prony method, 355 pseudo-inverse of a matrix, 177 QR decomposition, 169 QR factorization, 170
  • 638. INDEX 633 quadratic programming, 603 optimality criterion, 604 quadrature formula, 384 rank of a matrix, 113 definition, 113 calculation, 113 relaxation method, 149 remainders of series of matrices, 123 Richardson formula of extrapolation, 396 method of integration, 542 Ritz method, 549 Romberg formula of integration, 396 procedure, 398 rotation method, 168 Runge function, 362 Runge–Kutta methods, 458 of fourth-order, 460 of the sixth-order, 460 of the mean point, 459 of the third-order, 460 Runge–Kutta–Fehlberg method, 461 Schneider formula, 524 Schultz conditions to determine the inverse of a matrix, 138 method for inversion of the matrices see iterative method for inversion of the matrices, for solving systems of linear equations, 137 Schur complement, 127 method of inversion of the matrices, 127 secant method see chord method separation of roots, 60 sequence of matrices, 119 convergence, 119 limit, 119 series of matrices, 120 absolute convergence, 120 convergence, 120 similar matrices, 155 simplex algorithm, 597 dual, 599 primal, 597 Simpson error for the formula of quadrature, 389 formula of quadrature, 389 generalized formula of quadrature, 389 singular value decomposition (SVD), 172 theorem, 173 solution basic, 595 nondegenerate, 595 spectrum of a matrix, 153 spline cubical spline function with free boundary, 336 algorithm, 338 uniqueness, 337 with imposed boundary, 336 algorithm, 339 uniqueness, 338 functions, 335 interpolation, 335 Steffensen formula of interpolation, 326 Stirling formula of interpolation, 324 Sturm sequence, 66 theorem, 67 substitution lemma, 124 system of normal equations, 175 tangent method, 26 a posteriori estimation of the error, 29 a priori estimation of the error, 29 convergence, 27 procedure of choice of the start point, 32 Taylor method, 457 polynomials, 311 theorem, 311 trapezoid error for the formula of quadrature, 387 formula of quadrature, 386 generalized formula of quadrature, 388 triangular form of a linear system, 133 trust region, 591 algorithm, 591 underdetermined linear systems, 178 variable principal (basic), 595 secondary (nonbasic), 595 variational methods for partial differential equations, 547 vector orthogonal, 172 orthonormal, 172 space, 124 Vi`ete relations, 72 Waring theorem, 62 wave propagation equation, 554 Wendorff schema, 555