A Tutorial on Elliptic PDE Solvers and Their Parallelization
Author(s): Craig C. Douglas, Gundolf Haase, Ulrich Langer
ISBN(s): 9780898715415, 0898715415
Edition: illustrated edition
File Details: PDF, 13.82 MB
Year: 2003
Language: English
A Tutorial on Elliptic PDE Solvers and Their Parallelization
SOFTWARE • ENVIRONMENTS • TOOLS
The series includes handbooks and software guides as well as monographs
on practical implementation of computational methods, environments, and tools.
The focus is on making recent developments available in a practical format
to researchers and other users of these methods and tools.
Editor-in-Chief
Jack J. Dongarra
University of Tennessee and Oak Ridge National Laboratory
Editorial Board
James W. Demmel, University of California, Berkeley
Dennis Gannon, Indiana University
Eric Grosse, AT&T Bell Laboratories
Ken Kennedy, Rice University
Jorge J. Moré, Argonne National Laboratory
Software, Environments, and Tools
Craig C. Douglas, Gundolf Haase, and Ulrich Langer, A Tutorial on Elliptic PDE Solvers and Their Parallelization
Louis Komzsik, The Lanczos Method: Evolution and Application
Bard Ermentrout, Simulating, Analyzing, and Animating Dynamical Systems: A Guide to XPPAUT for Researchers
and Students
V. A. Barker, L. S. Blackford, J. Dongarra, J. Du Croz, S. Hammarling, M. Marinova, J. Wasniewski, and
P. Yalamov, LAPACK95 Users' Guide
Stefan Goedecker and Adolfy Hoisie, Performance Optimization of Numerically Intensive Codes
Zhaojun Bai, James Demmel, Jack Dongarra, Axel Ruhe, and Henk van der Vorst,
Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide
Lloyd N. Trefethen, Spectral Methods in MATLAB
E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz,
A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen, LAPACK Users' Guide, Third Edition
Michael W. Berry and Murray Browne, Understanding Search Engines: Mathematical Modeling and Text Retrieval
Jack J. Dongarra, Iain S. Duff, Danny C. Sorensen, and Henk A. van der Vorst, Numerical Linear Algebra for
High-Performance Computers
R. B. Lehoucq, D. C. Sorensen, and C. Yang, ARPACK Users' Guide: Solution of Large-Scale Eigenvalue
Problems with Implicitly Restarted Arnoldi Methods
Randolph E. Bank, PLTMG: A Software Package for Solving Elliptic Partial Differential Equations, Users' Guide 8.0
L. S. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling,
G. Henry, A. Petitet, K. Stanley, D. Walker, and R. C. Whaley, ScaLAPACK Users' Guide
Greg Astfalk, editor, Applications on Advanced Architecture Computers
Françoise Chaitin-Chatelin and Valérie Frayssé, Lectures on Finite Precision Computations
Roger W. Hockney, The Science of Computer Benchmarking
Richard Barrett, Michael Berry, Tony F. Chan, James Demmel, June Donato, Jack Dongarra, Victor Eijkhout, Roldan
Pozo, Charles Romine, and Henk van der Vorst, Templates for the Solution of Linear Systems: Building
Blocks for Iterative Methods
E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling,
A. McKenney, S. Ostrouchov, and D. Sorensen, LAPACK Users' Guide, Second Edition
Jack J. Dongarra, Iain S. Duff, Danny C. Sorensen, and Henk van der Vorst, Solving Linear Systems on Vector
and Shared Memory Computers
J. J. Dongarra, J. R. Bunch, C. B. Moler, and G. W. Stewart, LINPACK Users' Guide
Craig C. Douglas
University of Kentucky
Lexington, Kentucky
and
Yale University
New Haven, Connecticut
Gundolf Haase
Johannes Kepler University
Linz, Austria
Ulrich Langer
Johannes Kepler University
Linz, Austria
A Tutorial on Elliptic
PDE Solvers and
Their Parallelization
Society for Industrial and Applied Mathematics
Philadelphia
Contents
List of Figures ix
List of Algorithms xi
Abbreviations and Notation xiii
Preface xvii
1 Introduction 1
2 A Simple Example 5
2.1 The Poisson equation and its finite difference discretization 5
2.2 Sequential solving 8
2.2.1 Direct methods 8
2.2.2 Iterative methods 9
2.3 Parallel solving by means of DD 11
2.4 Some other discretization methods 13
3 Introduction to Parallelism 15
3.1 Classifications of parallel computers 15
3.1.1 Classification by Flynn 15
3.1.2 Classification by memory access 17
3.1.3 Communication topologies 19
3.2 Specialties of parallel algorithms 20
3.2.1 Synchronization 20
3.2.2 Message passing 21
3.2.3 Deadlock 22
3.2.4 Data coherency 23
3.2.5 Parallel extensions of operating systems and program-
ming languages 23
3.3 Basic global operations 24
3.3.1 SEND and RECV 24
3.3.2 EXCHANGE 25
3.3.3 Gather-scatter operations 25
3.3.4 Broadcast 26
3.3.5 Reduce and reduce-all operations 27
3.3.6 Synchronization by barriers 27
3.3.7 Some remarks on portability 27
3.4 Performance evaluation of parallel algorithms 28
3.4.1 Speedup and scaleup 28
3.4.2 Efficiency 31
3.4.3 Communication expenditure 33
Exercises 33
4 Galerkin Finite Element Discretization of Elliptic Partial Differential Equations 35
4.1 Variational formulation of elliptic BVPs 35
4.2 Galerkin finite element discretization 40
4.2.1 The Galerkin method 41
4.2.2 The simplest finite element schemes 42
4.2.3 Analysis of the Galerkin FEM 62
4.2.4 Iterative solution of the Galerkin system 64
Exercises 68
5 Basic Numerical Routines in Parallel 71
5.1 Storage of sparse matrices 71
5.2 DD by nonoverlapping elements 72
5.3 Vector-vector operations 75
5.4 Matrix-vector operations 76
Exercises 79
6 Classical Solvers 83
6.1 Direct methods 84
6.1.1 LU factorization 84
6.1.2 ILU factorization 85
6.2 Smoothers 90
6.2.1 ω-Jacobi iteration 91
6.2.2 Gauss–Seidel iteration 93
6.2.3 ADI methods 97
6.3 Roughers 103
6.3.1 CG method 103
6.3.2 GMRES solver 105
6.3.3 BICGSTAB solver 107
6.4 Preconditioners 108
Exercises 108
7 Multigrid Methods 111
7.1 Multigrid methods 111
7.2 The multigrid algorithm 112
7.2.1 Sequential algorithm 112
7.2.2 Parallel components of multigrid 113
7.2.3 Parallel algorithm 116
Exercises 117
8 Problems Not Addressed in This Book 119
Appendix Internet Addresses 121
Bibliography 125
Index 133
List of Figures
2.1 Rectangular domain with equidistant grid points 6
2.2 Structure of matrix and load vector. 7
2.3 Two subdomains with five-point stencil discretization 11
3.1 Types of parallel processes 16
3.2 Undefined status 20
3.3 Three possible outcomes 21
3.4 Blocking communication 21
3.5 Nonblocking communication 22
3.6 Deadlock in blocking communication 23
3.7 Nonblocking EXCHANGE 25
3.8 Blocking EXCHANGE 26
3.9 SCATTER and GATHER 26
3.10 BROADCAST operation 27
3.11 Speedup with respect to problem size 30
3.12 System times for Gaussian elimination, CG, PCG 31
4.1 Computational domain consisting of two different materials 39
4.2 Galerkin scheme 42
4.3 Courant's basis function 43
4.4 Examples of inadmissible meshes 45
4.5 Change in the type of the BCs 45
4.6 Capturing interfaces by triangulation 46
4.7 Triangle with an obtuse angle 46
4.8 Local numbering of the nodes in a triangle 47
4.9 Triangulation of the model problem CHIP. 48
4.10 Mapping between an arbitrary element and the master element 49
4.11 Definition of the finite element basis functions 50
4.12 One element matrix is assembled into the global matrix 57
4.13 Two element matrices are added into the global matrix 57
4.14 Mapping of a triangular edge onto the unit interval [0, 1] 60
5.1 Nonoverlapping DD 72
5.2 Nonoverlapping elements 73
5.3 Nonoverlapping elements with a revised discretization 77
5.4 Four subdomains in local numbering with local discretization nx = ny = 4 and global discretization Nx = Ny = 8 81
6.1 Illustration of the rank-r modification 84
6.2 Scattered distribution of a matrix 85
6.3 Data flow of Jacobi iteration 91
6.4 Data flow of Gauss–Seidel forward iteration 94
6.5 Data flow of red-black Gauss–Seidel forward iteration 95
6.6 DD of unit square 95
6.7 ADI damping factor parameter space 99
6.8 Decomposition in four strips; • denotes an edge node 100
7.1 V- and W-cycles 112
7.2 Nonoverlapping element distribution on two nested grids 114
List of Algorithms
2.1 A simple parallel DD solver (P = 2 for our example) 12
6.1 Rank-r modification of LU factorization—sequential 84
6.2 Block variant of rank-r modification 85
6.3 Block-wise rank-r modification of LU factorization—parallel 86
6.4 Sequential block ILU factorization 87
6.5 Sequential block-wise substitution steps for LUw = r 87
6.6 Parallelized factorization 88
6.7 Parallel forward and backward substitution step for m = r 88
6.8 Parallel IUL factorization 89
6.9 Parallel backward and forward substitution step for m = r 89
6.10 Sequential Jacobi iteration 91
6.11 Parallel Jacobi iteration: JACOBI(K, u°, f) 92
6.12 Sequential Gauss–Seidel forward iteration 93
6.13 Update step in red-black Gauss–Seidel forward iteration 94
6.14 Parallel Gauss–Seidel ω-Jacobi forward iteration 97
6.15 Sequential ADI in two dimensions 98
6.16 Parallel ADI in two dimensions: first try. 101
6.17 Parallel ADI in two dimensions: final version 102
6.18 Gauss–Seidel iteration for solving 103
6.19 Sequential CG with preconditioning 104
6.20 Parallelized CG 104
6.21 Sequential GMRES with preconditioning 105
6.22 Parallel GMRES—no restart 106
6.23 Sequential BICGSTAB 107
6.24 Parallel BICGSTAB 108
7.1 Sequential multigrid: MGM(K_q, u_q, f_q, q) 113
7.2 Parallel multigrid: PMGM(K_q, u_q, f_q, q) 116
Abbreviations and Notation
Nobody can say what a variable is.
—Hermann Weyl (1885-1955)
ADI Alternating Direction Implicit
a.e. almost everywhere
AMG Algebraic Multigrid
BC Boundary Condition
BICGSTAB BiConjugate Gradient STABilized (method, algorithm, ...)
BPX Bramble, Pasciak, and Xu Preconditioner
BVP Boundary Value Problem
CFD Computational Fluid Dynamics
CG Conjugate Gradient (method, algorithm,...)
COMA Cache Only Memory Access
CRS Compressed Row Storage
DD Domain Decomposition
DSM Distributed Shared Memory
FDM Finite Difference Method
FEM Finite Element Method
FVM Finite Volume Method
FFT Fast Fourier Transformation
GMRES Generalized Minimum RESidual (method, algorithm,...)
HPF High Performance Fortran
IC Incomplete Cholesky
ILU Incomplete LU factorization
LU LU factorization
MIC Modified Incomplete Cholesky factorization
MILU Modified Incomplete LU factorization
MIMD Multiple Instructions on Multiple Data
MISD Multiple Instructions on Single Data
MP Material Property
MPI Message-Passing Interface
NUMA Nonuniform Memory Access
PCG Preconditioned Conjugate Gradient
PDE Partial Differential Equation
QMR Quasi-Minimal Residual
SIMD Single Instruction on Multiple Data
SISD Single Instruction on Single Data
SMP Symmetric Multiprocessing
SOR Successive Overrelaxation
SPMD Single Program on Multiple Data
SSOR Symmetric Successive Overrelaxation
SPD Symmetric Positive Definite
UMA Uniform Memory Access
(·, ·)_V inner (scalar) product in some Hilbert space V
(·, ·)_1 inner product in H1(Ω)
||·||_V norm in some normed space V; ||u||²_V = (u, u)_V if V is a Hilbert space
||·||_1 norm in the function space H1(Ω)
F continuous linear functional; see (4.3)
a(·, ·) bilinear form; see (4.3)
(u, v) Euclidean inner product in R^{Nh}
||u||_B B-energy norm: ||u||_B := (Bu, u)^{1/2}, where B is an Nh × Nh SPD matrix
R real numbers
R^m m-dimensional Euclidean space
Ω computational domain
Ω_i subdomain
Γ = ∂Ω boundary of a domain
Γ_D Dirichlet boundary (section 4.1)
Γ_N Neumann boundary (section 4.1)
Γ_R Robin boundary (section 4.1)
L2(Ω) function space of square-integrable functions
L∞(Ω) function space of essentially bounded functions
C(Ω) space of continuous functions
C^m(Ω) space of m times continuously differentiable functions
X general function space for classical solutions
H1(Ω) Sobolev space
H^{1/2}(Γ) trace space
V Hilbert space
V0 subspace of V
Vg linear manifold in V generated by a given g ∈ V and V0
V0* dual space of all linear and continuous functionals on V0
Vh, V0h finite dimensional subspaces: Vh ⊂ V, V0h ⊂ V0
Vgh finite dimensional submanifold: Vgh ⊂ Vg
Nh dimension of space Vh (section 4.2.1)
ω̄h, ωh, γh sets of nodes (or their indices) from the discretized domain, its interior, and its boundary
ess sup essential supremum
u classical solution u ∈ X or weak solution u ∈ Vg
uh finite element (Galerkin) solution uh ∈ Vgh
u_h discrete solution (vector of coefficients) u_h ∈ R^{Nh}
supp(φ) supp(φ) := {x ∈ Ω̄ : φ(x) ≠ 0}, support of φ(x)
Th triangulation
Rh set of finite elements
δ_r arbitrary triangular element
Δ master (= reference) element
Kh, K stiffness matrix
Ch, C preconditioning matrix
κ(K) condition number of a matrix K
δij Kronecker symbol, δij = 1 for i = j and 0 otherwise
O(N^p) some quantity that behaves proportional to the pth power of N for N → ∞
O(h^p) some quantity that behaves proportional to the pth power of h for h → 0
Preface
Computing has become a third branch of research, joining the traditional practices of theo-
rization and laboratory experimentation and verification. Due to the expense and complexity
of actually performing experiments in many situations, simulation must be done first. In
addition, computation in some fields leads to new directions in theory. The three practices
have formed a symbiotic relationship that is now known as computational science. In the
best situations, all three areas seamlessly interact.
The "computational" supplement is due to the wide use during the last decade of
computational methods in almost all classical and new sciences, including life and business
sciences.
Most of the models behind these computer simulations are based on partial differential
equations (PDEs). The first step toward computer simulation consists of the discretization
of the PDE model. This usually results in a very large-scale linear system of algebraic
equations or even in a sequence of such systems. The fast solution of these systems of
linear algebraic equations is crucial for the overall efficiency of the computer simulation.
The growing complexity of the models increasingly requires the use of parallel computers.
Besides expensive parallel computers, clusters of personal computers or workstations are
now very popular as an inexpensive alternative parallel hardware configuration for the
computer simulation of complex problems (and are even used as home parallel systems).
This tutorial serves as an introduction to the basic concepts of solving PDEs using
parallel numerical methods. The ability to understand, develop, and implement parallel
PDE solvers requires not only some basic knowledge about PDEs, discretization methods,
and solution techniques, but also some knowledge about parallel computers, parallel pro-
gramming, and the run-time behavior of parallel algorithms. Our tutorial provides this
knowledge in just eight short chapters. We kept the examples simple so that the parallelization
strategies are not dominated by technical details. The practical course for the tutorial
can be downloaded from the internet (see Chapter 1 for the internet addresses).
This tutorial is intended for advanced undergraduate and graduate students in computational
sciences and engineering. However, our book can be helpful to many people who
use PDE-based parallel computer simulations in their professions. It is important to know
at least something about the possible errors and bottlenecks in parallel scientific computing.
We are indebted to the reviewers, who contributed a lot to the improvement of our
manuscript. In particular, we would like to thank Michael J. Holst and David E. Keyes for
many helpful hints and words of advice. We would like to acknowledge the Austrian Science
Fund for supporting our cooperation within the Special Research Program on "Numerical
and Symbolic Scientific Computing" under the grant SFB F013 and the NSF for grants
DMS-9707040, CCR-9902022, CCR-9988165, and ACR-9721388.
Finally, we would like to thank our families, friends, and Gassners' Most & Jausn
(http://www.mostundjausn.at), who put up with us, nourished us, and allowed
us to write this book.
Greenwich, Connecticut (U.S.A.)
Linz (Austria)
Craig C. Douglas
Gundolf Haase and Ulrich Langer
September 2002
Chapter 1
Introduction
A couple months in the laboratory can save a couple hours
in the library.
—Frank H. Westheimer's Discovery
The computer simulation of physical phenomena and technical processes has become a
powerful tool in developing new products and technologies. The computational supplement
to many classical and new sciences, such as computational engineering, computational
physics, computational chemistry, computational biology, computational medicine, and
computational finance, is due to the wide use during the last decade of computational science
methods in almost all sciences, including life and business sciences. Most of the models
behind these computer simulations are based on partial differential equations (PDEs). After
this modeling phase, which can be very complex and time consuming, the first step toward
computer simulation consists of the discretization of the PDE model. This usually results in
a large-scale linear system of algebraic equations or even in a sequence of such systems, e.g.,
in nonlinear and/or time-dependent problems. In the latter case, the fast solution of these
systems of linear algebraic equations is crucial for the overall efficiency of the computer
simulation. The growing complexity of the models increasingly requires the use of parallel
computers. Clusters of workstations or even PCs are now very popular as the basic hardware
configuration for the computer simulation of complex problems, and even as home parallel
systems. Soon, with the forthcoming multiprocessor on a chip systems, multiprocessor
laptops will make small-scale parallel processing commonplace, even on airplanes.
The correct handling of computer simulations requires interdisciplinary abilities and
knowledge in different areas of applied mathematics and computer sciences and, of course,
in the concrete field to which the application belongs. More precisely, besides some under-
standing of modeling, it requires at least basic knowledge of PDEs, discretization methods,
numerical linear algebra, and, last but not least, computer science. The aim of this tutorial
is to provide such basic knowledge not only to students in applied mathematics and com-
puter science but also to students in various computational sciences and to people who use
computational methods in their own research or applications.
1
Chapter 2 provides the first ideas about what goes on in a computer simulation based
on PDEs. We look at the Poisson equation, which is certainly one of the most important
PDEs, not only as the most widely used model problem for second-order elliptic PDEs, but
also from a practical point of view. Fast Poisson solvers are needed in many applications,
such as heat conduction (see also Example 4.3), electrical field computation (potential
equation) [107], and pressure correction in computational fluid dynamics (CFD) simulation
[36]. The simplest discretization method for the Poisson equation is certainly the classical
finite difference method (FDM). Replacing the computational domain with some uniformly
spaced grid, substituting second-order derivatives of the Laplace operator with second-order
differences, and taking into account the boundary conditions (BCs), we arrive at a large-
scale, sparse system of algebraic equations. As simple as the FDM is, many problems
and difficulties arise in the case of more complicated computational domains, differential
operators, and BCs. These difficulties can be overcome by the finite element method (FEM),
which is discussed at length in Chapter 4. Nevertheless, due to its simple structure, the FDM
has some advantages in some applications. The efficiency of the solution process is then
mainly determined by the efficiency of the method for solving the corresponding linear
system of algebraic equations, which is nothing more than the discrete representation of
our PDE in the computer. In principle, there are two different classes of solvers: direct
methods and iterative solvers. We will examine some representatives from each class in
their sequential version in order to see their strengths and weaknesses with respect to our
systems of finite difference equations. Finally, we move from a sequential to a parallel
algorithm using elementary domain decomposition (DD) techniques.
Before using a parallel solver, the reader should be familiar with some basics of parallel
computing. Chapter 3 starts with a rough but systematic view of the rapidly developing
parallel computer hardware and how its characteristics have to be taken into account in
program and algorithm design. The presentation focuses mainly on distributed memory
computers as available in clusters of workstations and PCs. Only a few basic communication
routines are introduced in a general way and they are sufficient to implement parallel solvers
for PDEs. The interested reader can first try parallel programs just by using the standard
message-passing interface (MPI) library and can start to create a personal library of routines
necessary for the Exercises of the following chapters. The methodology of the parallel
approach in this tutorial, together with the very portable MPI library, allows the development
and running of these parallel programs on a simple PC or workstation because several parallel
processes can run on one CPU. As soon as the parallel program is free of obvious errors, the
codes can be recompiled and run on a real parallel computer, i.e., an expensive supercomputer
or a much cheaper cluster of PCs (such as the Beowulf Project; see http://beowulf.org). If a
programmer writes a parallel program, then it will not take long before someone else claims
that a newer code is superior. Therefore, a few points on code efficiency in the context of
parallel numerical software are given to assist the reader.
The FEM is nowadays certainly the most powerful discretization technique on Earth
for elliptic boundary value problems (BVPs). Due to its great flexibility the FEM is more
widely used in applications than the FDM considered in our introductory chapter for the
Poisson equation. Chapter 4 gives a brief introduction to the basic mathematical knowledge
that is necessary for the implementation of a simple finite element code from a mathematical
viewpoint. It is not so important to know a lot about Sobolev spaces, but it is important
to be able to derive the variational, or weak, formulation of the BVP that we are going to
2
solve from its classical statement. The basic mathematical tool for this is the formula of
integration by parts. In contrast to the FDM, where we derive the finite difference equations
directly from the PDE by replacing the derivatives with differences, the FEM starts with
the variational formulation of the BVP and looks for an approximation to its exact solution
in the form of some linear combination of a finite number of basis functions with local
support. The second part of Chapter 4 provides a precise description of the procedure for
deriving the finite element equations in the case of linear triangular finite elements. This
procedure can easily be generalized to other types of finite elements. After reading this
part, the reader should be able to write a first finite element code that generates the finite
element equations approximating the BVP. The final and usually the most time-consuming
step in finite element analysis consists of the solution of the finite element equations, which
is nothing more than a large, sparse system of algebraic equations. The parallelization of
the solver of the finite element equations is the most important part in the parallelization of
the entire finite element code. The parallelization of the other parts of a finite element code
is more or less straightforward. The solution methods and their parallelization are mainly
discussed in Chapters 5 and 6. We will see in Chapter 2 that iterative solvers are preferred
for really large scale systems, especially if we are thinking of their parallel solution. That
is why we focus our attention on iterative solvers. Some theoretical aspects that are directly
connected with the iterative solution of the finite element equations are considered at the end
of Chapter 4. There is also a subsection (section 4.2.3) that discusses briefly the analysisof
the FEM. This subsection can be skipped by the reader who is only interested in the practical
aspects of the FEM.
Before turning to the parallel numerical algorithms for solving the discrete problems, Chapter 5
investigates basic numerical features needed in all of the appropriate solution algorithms
with respect to their adaptation to parallel computers. Here we concentrate on a data
decomposition that is naturally based on the FEM discretization. We see that numerical
primitives such as inner product and matrix-vector multiplication are amazingly easy to
parallelize. The Exercises refer to Chapter 3 and they will lead the reader directly to a
personal parallel implementation of data manipulations on parallel computers.
Conventional classical direct and iterative solution methods for solving large systems
of algebraic equations are analyzed for their parallelization properties in Chapter 6. Here the
detailed investigations of basic numerical routines from the previous chapter simplify the
parallelization significantly, especially for iterative solvers. Again, the reader can implement
a personal parallel solver guided by the Exercises.
Classical solution methods suffer from the fact that the solution time increases much
faster than the number of unknowns in the system of equations; e.g., it takes the solver 100–
1000 times as long when the number of unknowns is increased by a factor of 10. Chapter 7
introduces briefly a multigrid solver that is an optimal solver in the sense that it is 10 times as
expensive with respect to both memory requirements and solution time for 10 times as many
unknowns. Thanks to the previous chapters, the analysis of the parallelization properties is
rather simple for this multigrid algorithm. If the reader has followed the practical course up
to this stage, then it will be easy to implement a first parallel multigrid algorithm.
Chapter 8 addresses some problems that are not discussed in the book but that are
closely related to the topics of this tutorial. The references given there should help the
reader to generalize the parallelization techniques presented in this tutorial to other classes
of PDE problems and applications. We also refer the reader to [45], [104], and [53] for
further studies in parallel scientific computing.
A list of abbreviations and notation is provided on pages xiii–xiv. Finally, the Ap-
pendix provides a guide to the internet.
The practical course for the tutorial can be downloaded from
http://www.numa.uni-linz.ac.at/books#Douglas-Haase-Langer
or
http://www.mgnet.org/mgnet-books.html#Douglas-Haase-Langer.
Theory cannot replace practice, but theory can greatly enhance the understanding of
what is going on in the computations. Do not believe the computational results blindly
without knowing something about possible errors, such as modeling errors, discretization
errors, round-off errors, iteration errors, and, last but not least, programming errors.
Chapter 2
A Simple Example
The things of this world cannot be made known without a
knowledge of mathematics.
—Roger Bacon (1214–1294)
2.1 The Poisson equation and its finite difference
discretization
Let us start with a simple, but at the same time very important, example, namely, the
Dirichlet boundary value problem (BVP) for the Poisson equation in the rectangular domain
Ω = (0,2) × (0,1) with the boundary Γ = ∂Ω: Given some real function f in Ω and
some real function g on Γ, find a real function u defined on Ω̄ := Ω ∪ Γ such that

    −Δu = f in Ω,    u = g on Γ.    (2.1)

The given functions as well as the solution are supposed to be sufficiently smooth. The
differential operator Δ is called the Laplace operator. For simplicity, we consider only
homogeneous Dirichlet boundary conditions (BCs); i.e., g = 0 on Γ.
The Poisson equation (2.1) is certainly the most prominent representative of second-
order elliptic partial differential equations (PDEs), not only from a practical point of view,
but also as the most frequently used model problem for testing numerical algorithms. Indeed,
the Poisson equation is the model problem for elliptic PDEs, much like the heat and wave
equations are for parabolic and hyperbolic PDEs.
The Poisson equation can be solved analytically in special cases, such as rectangular
domains with just the right BCs, e.g., with homogeneous Dirichlet BCs, as imposed above.
Due to the simple structure of the differential operator and the simple domain in (2.1), a
considerable body of analysis is known that can be used to derive or verify solution methods
for Poisson's equation and other more complicated equations.
5
Figure 2.1. Rectangular domain with equidistant grid points.
If we split the intervals in both the x and y directions into Nx and Ny subintervals
of length h each (i.e., Nx = 2Ny), then we obtain a grid (or mesh) of nodes like the one
presented in Fig. 2.1. We define the set of subscripts for all nodes by ω̄h := {(i, j) : i =
0, 1, ..., Nx; j = 0, 1, ..., Ny}, the set of subscripts belonging to the interior nodes by ωh := {(i, j) : i =
1, ..., Nx − 1; j = 1, ..., Ny − 1}, and the corresponding set of boundary nodes by γh := ω̄h \ ωh.
Furthermore, we set fi,j := f(xi, yj), and we denote the approximate values of the solution
u(xi, yj) of (2.1) at the grid points (xi, yj) := (ih, jh) by the values ui,j = uh(xi, yj) of
some grid function uh : ω̄h → R. Here and in the following we associate the set of indices
with the corresponding set of grid points; i.e., ω̄h ∋ (i, j) ↔ (xi, yj) ∈ Ω̄h. Replacing both
second derivatives in (2.1) with second-order finite differences at the grid points,

    ∂²u/∂x²(xi, yj) ≈ (u_{i−1,j} − 2u_{i,j} + u_{i+1,j}) / h²,
    ∂²u/∂y²(xi, yj) ≈ (u_{i,j−1} − 2u_{i,j} + u_{i,j+1}) / h²,

we immediately arrive at the following five-point stencil finite difference scheme that rep-
resents the discrete approximation to (2.1) on the grid ω̄h: Find the values ui,j of the grid
function uh : ω̄h → R at all grid points (xi, yj), with (i, j) ∈ ω̄h, such that

    (4u_{i,j} − u_{i−1,j} − u_{i+1,j} − u_{i,j−1} − u_{i,j+1}) / h² = fi,j   for all (i, j) ∈ ωh,
    u_{i,j} = 0   for all (i, j) ∈ γh.    (2.4)

Arranging the interior (unknown) values ui,j of the grid function uh in a proper way in
some vector u_h, e.g., along the vertical (or horizontal) grid lines, and taking into account the
BCs on γh, we observe that the finite difference scheme (2.4) is equivalent to the following
system of linear algebraic equations: Find u_h ∈ R^N, N = (Nx − 1) · (Ny − 1), such that

    Kh u_h = f_h.    (2.5)
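For concreteness, the assembly of the system (2.5) can be sketched as follows (a minimal Python/NumPy sketch, not the book's code; the grid sizes and the right-hand side f ≡ 1 are illustrative assumptions):

```python
import numpy as np

def assemble_poisson(Nx, Ny, h):
    """Assemble the N x N five-point stencil matrix K_h of (2.5) with
    homogeneous Dirichlet BCs.  Unknowns are numbered along the vertical
    grid lines (j runs fastest), which gives the small bandwidth Ny - 1."""
    nx, ny = Nx - 1, Ny - 1                      # interior nodes per direction
    K = np.zeros((nx * ny, nx * ny))
    idx = lambda i, j: (i - 1) * ny + (j - 1)    # vertical-line numbering
    for i in range(1, Nx):
        for j in range(1, Ny):
            r = idx(i, j)
            K[r, r] = 4.0 / h**2
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ii, jj = i + di, j + dj
                if 1 <= ii <= nx and 1 <= jj <= ny:  # boundary values are zero
                    K[r, idx(ii, jj)] = -1.0 / h**2
    return K

Ny = 4; Nx = 2 * Ny; h = 1.0 / Ny                # illustrative grid sizes
K = assemble_poisson(Nx, Ny, h)
f = np.ones(K.shape[0])                          # illustrative right-hand side
u = np.linalg.solve(K, f)                        # direct solution of (2.5)
```

The resulting matrix is symmetric and has at most five nonzero entries per row, as discussed below.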
where the N × N system matrix Kh and the right-hand side vector f_h can be rewritten
from (2.4) in an explicit form. This is illustrated in Fig. 2.2.

Figure 2.2. Structure of matrix and load vector.
The (band as well as profile) structure of the matrix Kh heavily depends on the ar-
rangement of the unknowns, i.e., on the numbering of the grid points. In our example, we
prefer the numbering along the vertical grid lines because exactly this numbering gives us
the smallest bandwidth in the matrix. Keeping the bandwidth as small as possible is very
important for the efficiency of direct solvers.
Before reviewing some solution methods, we first look at further algebraic and analytic
properties of the system matrix Kh that may have some impact on the efficiency of solvers,
especially for large-scale systems. The finer the discretization, the larger the system is. More
precisely, the dimension N of the system grows like O(h^{−m}), where m is the dimension
of the computational domain Ω; i.e., m = 2 for our model problem (2.1). Fortunately, the
matrix Kh is sparse. A sparse matrix is one with only a few nonzero elements per matrix
row and column, independent of its dimension. Our matrix Kh in (2.5) has at most five
nonzero entries per row and per column, independent of the fineness of the discretization.
This property is certainly the most important one with respect to the efficiency of iterative
as well as direct solvers.
A smart discretization technique should preserve the inherent properties of the differ-
ential operator involved in the BVP. In our case, the matrix Kh is symmetric (Kh = Kh^T) and
positive definite ((Kh uh, uh) > 0 for all uh ≠ 0). These properties result from the symmetry
(formal self-adjointness) and uniform ellipticity of the Laplace operator. Symmetric posi-
tive definite (SPD) matrices are regular. Regular matrices are invertible. Thus, our system
of finite difference equations (2.4) has a unique solution.
Unfortunately, the matrix Kh is badly conditioned. The spectral condition number
κ(Kh) := λmax(Kh)/λmin(Kh), defined by the ratio of the maximal eigenvalue λmax(Kh)
and the minimal eigenvalue λmin(Kh) of the matrix Kh, behaves like O(h^{−2}) if h tends to
0. That behavior affects the convergence rate of all classical iterative methods, and it can
deteriorate the accuracy of the solution obtained by some direct method due to accumulated
round-off errors, especially on fine grids. These properties are not only typical features of
matrices arising from the finite difference discretization, but also characteristic of matrices
arising from the finite element discretization discussed in Chapter 4.
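The O(h^{−2}) growth of the condition number can be observed numerically. The following sketch uses the 1D three-point stencil analogue for brevity (an assumption made here; the 2D matrix has the same asymptotics):

```python
import numpy as np

def lap1d(n, h):
    """1D three-point stencil matrix (dimension n) with Dirichlet BCs."""
    return (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

kappas = []
for Ny in (4, 8, 16, 32):
    h = 1.0 / Ny
    lam = np.linalg.eigvalsh(lap1d(Ny - 1, h))   # eigenvalues, ascending
    kappas.append(lam[-1] / lam[0])              # spectral condition number

# halving h roughly quadruples the condition number: kappa = O(h^{-2})
```

Each refinement step halves h and multiplies the condition number by roughly four.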
2.2 Sequential solving
There exist two vastly different approaches for solving (2.5): direct methods and iterative
methods. For notational purposes we consider an additive splitting of the matrix K =
E + D + F into its strictly lower part E, its diagonal part D = diag(K), and its strictly
upper part F. Here and in the following we omit the subscript h.
2.2.1 Direct methods
Direct methods produce the exact solution of systems of algebraic equations just like (2.5) in
a finite number of arithmetical operations, provided that exact arithmetic without round-off
errors is used.
The classical direct method is the Gauss elimination process, which successively
replaces all entries in E with 0 and updates in each step the remaining part of the matrix
and the right-hand side f [46]. This step is called the elimination step. The resulting
transformed system of equations is triangular and can easily be solved by computing the
unknowns from the bottom to the top. This step is called backward substitution.
The Gauss elimination produces in fact the so-called LU factorization K = LU of
the matrix K into a lower triangular matrix L, where all main diagonal entries are equal to
1, and an upper triangular matrix U. After this factorization step the solution of our system
(2.5) reduces to the following forward and backward substitution steps for the triangular
systems:

    L w = f    and    U u = w,    (2.7)

respectively. If the matrix K is symmetric and regular, then the LU factorization can be
put into the form K = L D L^T, extracting some (block) diagonal matrix D. In the SPD
case, the Cholesky factorization K = U^T U is quite popular [46, 80]. There exist several
modifications of the Gaussian elimination method and the factorization techniques, but all
of them use basically the same principles [46, 80, 94].
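A minimal sketch of the factorization and the two substitution steps in (2.7) (illustrative Python only; no pivoting is performed, which is acceptable for SPD matrices such as Kh):

```python
import numpy as np

def lu_factor(K):
    """Doolittle LU factorization K = L U without pivoting (sufficient
    for SPD matrices).  L (unit diagonal) is stored below the diagonal."""
    A = K.astype(float).copy()
    n = A.shape[0]
    for k in range(n - 1):
        A[k+1:, k] /= A[k, k]                              # multipliers -> L
        A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])  # update trailing block
    return A

def lu_solve(A, f):
    """Forward substitution L w = f, then backward substitution U u = w,
    i.e., the two steps in (2.7)."""
    w = f.astype(float).copy()
    n = A.shape[0]
    for i in range(n):                    # forward: unit lower triangle of A
        w[i] -= A[i, :i] @ w[:i]
    for i in range(n - 1, -1, -1):        # backward: upper triangle of A
        w[i] = (w[i] - A[i, i+1:] @ w[i+1:]) / A[i, i]
    return w

K = 2.0 * np.eye(6) - np.eye(6, k=1) - np.eye(6, k=-1)   # small SPD example
u = lu_solve(lu_factor(K), np.ones(6))
```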
Each of these direct methods suffers from so-called fill-in during the elimination,
or factorization, step. After the factorization the triangular factors become much denser
than the original matrix. Nonzero entries appear at places within the band or profile of the
triangular factors where zero entries were found in the original matrix. The sparsity of the
original matrix gets lost. The fill-in phenomenon causes a superlinear growth of complexity.
More precisely, the number of arithmetical operations and the storage requirements grow
like O(h^{−3m+2}) and O(h^{−2m+1}), respectively, as the discretization parameter h tends to 0.
In addition to this, one must be aware of the loss of about log10(κ(K)) valid digits in the
solution due to round-off errors.
The cost of computing the LU factorization and the substitutions in (2.7) is always
less than the cost of computing K^{−1} f [65]. Therefore, K^{−1} should never be explicitly
computed.
2.2.2 Iterative methods
Iterative methods produce a sequence {u^k} of iterates that should converge to the exact
solution u of our system (2.5) for an arbitrarily chosen initial guess u^0 ∈ R^N as k tends to
infinity, where the superscript k is the iteration index.
In this introductory section to iterative methods we only consider some classical
iterative methods as special cases of stationary iterative methods of the form

    C (u^{k+1} − u^k)/τ + K u^k = f,   k = 0, 1, ...,    (2.8)

where τ is some properly chosen iteration (relaxation) parameter and C is a nonsingular
matrix (sometimes called a preconditioner) that can be inverted easily and hopefully improves
the convergence rate. If K and C are SPD, then the iteration process (2.8) converges for
an arbitrarily chosen initial guess u^0 ∈ R^N provided that τ was picked from the interval
(0, 2/λmax(C^{−1}K)), where λmax(C^{−1}K) denotes the maximal eigenvalue of the matrix
C^{−1}K (see also section 4.2.4).
Setting C := I results in the classical Richardson iteration

    (u^{k+1} − u^k)/τ + K u^k = f.    (2.9)

An improvement consists of the choice C := D and leads to the ω-Jacobi iteration (τ = ω)

    D (u^{k+1} − u^k)/ω + K u^k = f,    (2.10)

which is called the Jacobi iteration for ω = 1.
Choosing C = D + E (the lower triangular part of matrix K, which can be easily
inverted by forward substitution) and τ = 1 yields the forward Gauss–Seidel iteration

    (D + E) u^{k+1} = f − F u^k.    (2.11)

Replacing C = D + E with C = D + F (i.e., changing the places of E and F in (2.11))
gives the backward Gauss–Seidel iteration.
The slightly different choice C = D + ωE and τ = ω results in the successive
overrelaxation (SOR) iteration

    (D + ωE) u^{k+1} = ω f + ((1 − ω) D − ω F) u^k.    (2.12)

Changing again the places of E and F in (2.12) gives the backward version of the SOR
iteration in analogy to the backward version of the Gauss–Seidel iteration. The forward
SOR step (2.12) followed by a backward SOR step gives the so-called symmetric successive
overrelaxation (SSOR) iteration. The SSOR iteration corresponds to our basic iteration
scheme (2.8) with the SSOR preconditioner

    C = (D + ωE) D^{−1} (D + ωF)

and the relaxation parameter τ = ω(2 − ω). The correct choice of the relaxation param-
eter ω ∈ (0, 2) is certainly one crucial point for obtaining reasonable convergence rates.
The interested reader can find more information about these basic iterative methods and,
especially, about the correct choice of the relaxation parameters, e.g., in [92, pp. 95–116].
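The iterations above differ only in the choice of C and τ in (2.8). A compact sketch (illustrative Python; a small 1D three-point stencil matrix stands in for Kh, and the iteration counts and ω value are illustrative):

```python
import numpy as np

def stationary_solve(K, f, C_inv, tau=1.0, iters=1000):
    """Stationary iteration (2.8): u^{k+1} = u^k + tau * C^{-1}(f - K u^k),
    where C_inv(r) applies C^{-1} to a residual vector r."""
    u = np.zeros_like(f)
    for _ in range(iters):
        u = u + tau * C_inv(f - K @ u)
    return u

n = 15                                                # illustrative 1D model matrix
K = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
f = np.ones(n)
D = np.diag(np.diag(K))
E = np.tril(K, -1)                                    # strictly lower part
F = np.triu(K, 1)                                     # strictly upper part (backward variants use D + F)

u_jac = stationary_solve(K, f, lambda r: r / np.diag(K))              # Jacobi: C = D
u_gs  = stationary_solve(K, f, lambda r: np.linalg.solve(D + E, r))   # Gauss-Seidel: C = D + E
omega = 1.6
u_sor = stationary_solve(K, f, lambda r: np.linalg.solve(D + omega * E, r),
                         tau=omega)                   # SOR: C = D + omega*E, tau = omega
```

All three iterates converge to the solution of K u = f, with Gauss–Seidel and SOR converging noticeably faster than Jacobi.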
The Laplace operator in (2.1) is the sum of two one-dimensional (1D) differential
operators. Both have been discretized separately by a three-point stencil, resulting in regular
N × N matrices Kx and Ky. Therefore, we can express the matrix in (2.5) as the sum
K = Kx + Ky. Each row and column in the discretization (Fig. 2.1) corresponds to one
block in the block diagonal matrices Kx and Ky. All these blocks are tridiagonal after a
temporary and local reordering of the unknowns. We rewrite (2.5) as

    (Kx + Ky) u = f,

giving us the equation

    Kx u = f − Ky u,

which can easily be converted into the iteration form

    Kx u^{k+1} = f − Ky u^k.

This results in an iteration in the x direction that fits into scheme (2.8) with C = Kx and τ = 1.
A similar iteration can be derived for the y direction. Combining both iterations as half-steps
in one iteration, just as we combined the forward and backward SOR iterations, defines the
alternating direction implicit iterative method, known as the ADI method:

    (τ^{−1} I + Kx) u^{k+1/2} = (τ^{−1} I − Ky) u^k + f,
    (τ^{−1} I + Ky) u^{k+1} = (τ^{−1} I − Kx) u^{k+1/2} + f.

We refer to section 6.2.3 and to [92, pp. 116–118] for more information about the ADI
methods.
The classical iteration methods suffer from the bad conditioning of the matrices arising
from finite difference, or finite element, discretization. An appropriate preconditioning and
the acceleration by Krylov space methods can be very helpful. On the other hand, the
classical (possibly properly damped) iteration methods usually have good smoothing
properties; i.e., the high frequencies in a Fourier decomposition of the iteration error are
damped out much faster than the low frequencies. This smoothing property combined with
the coarse grid approximation of the smooth parts leads us directly to multigrid methods,
which are discussed in Chapter 7. The parallelization of the classical and the advanced
iteration methods (e.g., the multigrid methods, where we need the classical iteration methods
as smoothers, and the direct methods as coarse grid solvers) is certainly the most important
step toward the efficient solution of really large-scale problems. Therefore, in the next
section, we briefly consider the domain decomposition (DD) method, which is nowadays
the basic tool for constructing parallel solution methods.

Figure 2.3. Two subdomains with five-point stencil discretization.
2.3 Parallel solving by means of DD
The simplest nonoverlapping DD for our rectangle consists of a decomposition of Ω̄ into the
two unit squares Ω̄1 = [0, 1] × [0, 1] and Ω̄2 = [1, 2] × [0, 1]; see Fig. 2.3. The interface
between the two subdomains is ΓC. This implies two classes of nodes. The coupling nodes
on the interface are denoted by subscript "C" and the interior nodes by subscript "I." These
classes of nodes yield the block structure (2.15) in system (2.5):

    [ KC   KCI ] [ uC ]   [ fC ]
    [ KIC  KI  ] [ uI ] = [ fI ].    (2.15)

There are no matrix entries between interior nodes of different subdomains because of the
locality of the five-point stencil used in our finite difference discretization.
A block Gaussian elimination of the upper-right entries of the matrix (which corre-
sponds to the elimination of the interior unknowns) results in the block triangular system of
equations

    [ SC   0  ] [ uC ]   [ gC ]
    [ KIC  KI ] [ uI ] = [ fI ]

with the Schur complement

    SC = KC − KCI KI^{−1} KIC

and a modified right-hand side

    gC = fC − KCI KI^{−1} fI.

Therein, KI = blockdiag{KI,1, KI,2}. The local Schur complements KC,s − KCI,s KI,s^{−1} KIC,s are completely
dense in general.
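The elimination of the interior unknowns can be sketched as follows (illustrative Python; the dense blocks and the generic SPD test matrix are assumptions made for brevity, not the actual finite difference blocks):

```python
import numpy as np

def schur_solve(K_C, K_CI, K_IC, K_I, f_C, f_I):
    """Solve the block system (2.15): eliminate the interior unknowns,
    form the Schur complement S_C and modified right-hand side g_C,
    solve the interface problem first, then back-substitute."""
    S_C = K_C - K_CI @ np.linalg.solve(K_I, K_IC)     # Schur complement
    g_C = f_C - K_CI @ np.linalg.solve(K_I, f_I)      # modified right-hand side
    u_C = np.linalg.solve(S_C, g_C)                   # coupling unknowns
    u_I = np.linalg.solve(K_I, f_I - K_IC @ u_C)      # interior unknowns
    return u_C, u_I

# Generic SPD test system, partitioned into 3 coupling and 5 interior unknowns
rng = np.random.default_rng(0)
M = rng.standard_normal((8, 8))
A = M @ M.T + 8.0 * np.eye(8)
f = rng.standard_normal(8)
u_C, u_I = schur_solve(A[:3, :3], A[:3, 3:], A[3:, :3], A[3:, 3:], f[:3], f[3:])
```

In the two-subdomain case, K_I is block diagonal, so the two interior solves inside `schur_solve` can be performed in parallel.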
Algorithm 2.1: A simple parallel DD solver (P = 2 for our example).
This procedure of eliminating the interior unknowns can obviously be extended to the
more general case of a decomposition into P subdomains. This generalization is depicted in
Algorithm 2.1 (P = 2 for our example of the decomposition of Ω into the two subdomains
Ω1 and Ω2). The main problem in Algorithm 2.1 consists of forming and solving the Schur
complement system in step II. In our simple example, we can calculate SC explicitly and
solve it directly. This is exactly the same approach that was used in the classical finite
element substructuring technique. In general, the forming and the direct solution of the
Schur complement system in step II of Algorithm 2.1 is too expensive. The iterative solution
of the Schur complement system (sometimes also called the iterative substructuring method)
requires only the matrix-by-vector multiplication SC · u^k_C and possibly a preconditioning
operation wC = CC^{−1} d^k_C as basic operations, where d^k_C = gC − SC · u^k_C denotes the defect
after k iterations (see section 2.2.2). The matrix-by-vector multiplication SC · u^k_C involves
the solution of small systems with the matrices KI,s that can be carried out completely in
parallel. The construction of a really good Schur complement preconditioner is a challenging
task. Nowadays, optimal, or at least almost optimal, Schur complement preconditioners are
available [97]. In our simple example, we can use a preconditioner proposed by M. Dryja
[33]. Dryja's preconditioner replaces the Schur complement with the square root of the
discretized 1D (one-dimensional) Laplacian KY along the interface (see section 2.2.2), using
the fact that KY has the discrete eigenvectors μ^l = [√(2h) sin(lπih)]_{i=0,...,Ny} and the
eigenvalues λl(KY) = (4/h²) sin²(lπh/2), with l = 1, 2, ..., Ny − 1. This allows us to express
the defect dC and the solution wC of the preconditioning equation (we omit the iteration
index k for simplicity) as linear combinations of the eigenvectors.
Now the preconditioning operation wC = CC^{−1} dC can be rewritten as follows:
1. Express dC in terms of eigenfrequencies and calculate the Fourier coefficients
(Fourier analysis).
2. Divide these Fourier coefficients by the square roots of the eigenvalues of KY.
3. Calculate the preconditioned defect wC by Fourier synthesis.
If we now denote the Fourier transformation by the square matrix F = [√(2h) sin(lπih)]_{l,i=1,...,Ny−1},
which with this normalization is symmetric and orthogonal, and set Λ = diag{λ1(KY), ..., λ_{Ny−1}(KY)},
then CC = F^T Λ^{1/2} F, and we can solve the system CC wC = F^T Λ^{1/2} F wC = dC in three simple steps:

    Fourier analysis:   d̂C = F dC,
    scaling:            ŵC = Λ^{−1/2} d̂C,
    Fourier synthesis:  wC = F ŵC.
Usually, the Fourier transformations in the Fourier analysis and the Fourier synthesis require
a considerable amount of computing power, but this can be dramatically accelerated by the
fast Fourier transform (FFT), especially if Ny is a power of 2, i.e., Ny = 2^p. We refer
to the original paper [23] and the book by W. Briggs and V. Henson [15] for more details.
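The three-step preconditioning operation can be sketched as follows (illustrative Python; for clarity the sine transform is applied as an explicit matrix F rather than by an FFT, and the normalization √(2h) is an assumption chosen so that F is symmetric and orthogonal):

```python
import numpy as np

def dryja_apply(d_C, Ny):
    """Apply Dryja's preconditioner: w_C = F Lambda^{-1/2} F d_C, i.e.,
    Fourier analysis, scaling, and Fourier synthesis.  F is the discrete
    sine transform matrix; a fast sine transform would replace it for
    large Ny."""
    h = 1.0 / Ny
    l = np.arange(1, Ny)                                   # frequencies l = 1..Ny-1
    F = np.sqrt(2.0 * h) * np.sin(np.pi * h * np.outer(l, l))
    lam = (4.0 / h**2) * np.sin(np.pi * l * h / 2.0) ** 2  # eigenvalues of K_Y
    y = F @ d_C                                            # Fourier analysis
    y = y / np.sqrt(lam)                                   # scaling
    return F @ y                                           # Fourier synthesis
```

With this normalization F equals its own inverse, so applying C_C = F Λ^{1/2} F to the result recovers the original defect.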
2.4 Some other discretization methods
Throughout the remainder of this book we emphasize the finite element method (FEM).
Before the FEM is defined in complete detail in Chapter 4, we want to show the reader one
of the primary differences between it and the finite difference method (FDM). Both FDM
and FEM offer radically different ways of evaluating approximate solutions to the original
PDE at arbitrary points in the domain.
The FDM provides solution values at the grid points, namely, {ui,j}. Hence, the solution
lies in a vector space, not a function space.
The FEM computes coefficients for basis functions and produces a solution function
in a function space, not a vector space. The FEM solution uh can be written as

    uh(x, y) = Σ_{(i,j)} u^{(i,j)} φ^{(i,j)}(x, y),    (2.19)

where {u^{(i,j)}} is computed by the FEM for a given set of basis functions {φ^{(i,j)}}.
To get the solution at a random point (x, y) in the domain using the FDM solution
data {ui,j}, there are two possibilities:
• If (x, y) lies on the grid (i.e., x = xi and y = yj for some i, j), then uh(x, y) = ui,j.
• Otherwise, interpolation must be used. This leads to another error term, which may
be larger than the truncation error of the FDM and lead to a much less accurate
approximate solution than is hoped for.
For the FEM, the basis functions usually have very compact support. Hence, only
a fraction of the {φ^{(i,j)}} are nonzero at any given point. A bookkeeping procedure identifies the nonzero φ^{(i,j)}
and evaluates (2.19) locally. For orthonormal basis sets, the evaluation can be particularly
inexpensive. There is no interpolation error involved, so whatever the error is in the FEM
is all that is in the evaluation anywhere in the domain.
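The interpolation step needed for the FDM can be sketched as follows (illustrative Python; bilinear interpolation on the uniform grid is one common choice, and the function name is hypothetical):

```python
import numpy as np

def fd_eval(U, h, x, y):
    """Evaluate an FDM grid solution U[i, j] ~ u(i*h, j*h) at an arbitrary
    point (x, y) by bilinear interpolation; this is exact at the grid
    points but introduces an additional interpolation error elsewhere."""
    i, j = int(x / h), int(y / h)
    i = min(i, U.shape[0] - 2)                   # clamp to the last cell
    j = min(j, U.shape[1] - 2)
    s, t = x / h - i, y / h - j                  # local coordinates in [0, 1]
    return ((1 - s) * (1 - t) * U[i, j] + s * (1 - t) * U[i + 1, j]
            + (1 - s) * t * U[i, j + 1] + s * t * U[i + 1, j + 1])
```

For example, sampling a linear function on the grid and evaluating off the grid points reproduces the function exactly, since bilinear interpolation is exact for linear data.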
We do not treat the finite volume methods (FVM) in this book. There are two classes
of FVM:
• ones that are the box scheme composed with an FDM [62], and
• ones that are the box scheme composed with an FEM [6, 59].
In addition, we do not consider the boundary element method, which is also widely
used in some applications [113].
Chapter 3

Introduction to Parallelism
The question of whether computers can think is just like the
question of whether submarines can swim.
—Edsger W. Dijkstra (1930–2002)
3.1 Classifications of parallel computers
3.1.1 Classification by Flynn
In 1966, Michael Flynn [37] categorized parallel computer architectures according to how
the data stream and instruction stream are organized, as shown in Table 3.1.

                        Instruction Stream
                        Single      Multiple
  Data       Single     SISD        MISD
  Stream     Multiple   SIMD        MIMD

Table 3.1. Classification by Flynn (Flynn's taxonomy).
The multiple instruction, single data (MISD) class describes an empty set. The single
instruction, single data (SISD) class contains the normal single-processor computer with
potential internal parallel features. The single instruction, multiple data (SIMD) class has
parallelism at the instruction level. This class contains computers with vector units such
as the Cray T-90, and systolic array computers such as Thinking Machines Corporation
(TMC) CM2 and machines by MasPar. These parallel computers execute one program in
equal steps and are not in the main scope of our investigations. TMC and MasPar expired in
the mid-1990s. Cray was absorbed by SGI in the late 1990s. Like a phoenix, Cray rose from
its ashes in 2001. Besides the recent processors by NEC and Fujitsu, even state-of-the-art
PC processors by Intel and AMD contain a minor vector unit.
Definition 3.1 (MIMD). MIMD means multiple instructions on multiple data, and it char-
acterizes parallelism at the level of program execution, where each process runs its own
code.
Usually, these MIMD codes do not run fully independently, so that we have to distinguish
between competitive processes that have to use shared resources (center part of Fig. 3.1)
and communicating processes, where each process possesses its own data stream, which requires
data exchange at certain points in the program (right part of Fig. 3.1).
Figure 3.1. Types of parallel processes.
Normally each process runs the same code on different data sets. This allows us to
introduce a subclass of MIMD, namely, the class single program on multiple data (SPMD).
Definition 3.2 (SPMD). The single program on multiple data programming model is the
main class used in parallel programming today, especially if many processors are used and
an inherent data parallelism is definable.
There is no need for a synchronous execution of the codes at instruction level; thus an
SPMD machine is not an SIMD machine. However, there are usually some synchronization
points in a code to update data that are needed by other processors in order to preserve data
consistency in implemented algorithms.
We distinguish between multiple processes and multiple processors. Processes are
individual programs. There may be one or more processes running on one individual
processor. In each process, there may be one or more threads. Threads are "light-weight"
processes in the sense that they share all of the memory. While almost all of the algorithms
described in this tutorial can be implemented using threads, we normally only assume
parallelism at the processor level.
3.1.2 Classification by memory access
Definition 3.3 (Shared Memory). Shared memory is memory that is accessed by several
competitive processes "at the same time." Multiple memory requests are handled by
hardware or software protocols.
A shared memory parallel computer has the following advantage and disadvantages:
• Each process has access to all of the data. Therefore, a sequential code can be easily
ported to parallel computers with shared memory and usually leads to a first increase
in performance with a small number of processors (2, ..., 8).
• As the number of processors increases, the number of memory bank conflicts and other
access conflicts usually rises. Thus, scalability, i.e., performance proportional to the
number of processors (i.e., wall clock time inversely proportional to the number
of processors), cannot be guaranteed unless there is a lot of memory and memory
bandwidth.
• Very efficient access administration and bus systems are necessary to decrease access
conflicts. This is one reason these memory subsystems are more expensive.
We briefly introduce three different models used to construct shared memory systems [64].
Definition 3.4 (UMA). In the uniform memory access model for shared memory, all pro-
cessors have equal access time to the whole memory, which is uniformly shared by all
processors.
UMA is what vendors of many small shared memory computers (e.g., IBM SP3, two- to
four-processor Linux servers) try to implement. IBM and commodity Linux shared memory
machines had not yet expired as of 2002.
Definition 3.5 (NUMA). In the nonuniform memory access model for shared memory, the
access time to the shared memory varies with the location of the processor.
The NUMA model has become the standard model used in shared memory supercomputers
today. Examples are those made today by HP, IBM, and SGI.
Definition 3.6 (COMA). In the cache only memory access model for shared memory, all
processors use only their local cache memory, so that this memory model is a special case
of the NUMA model.
The Kendall Square Research KSR-1 and KSR-2 had such a COMA memory model. Kendall
Square expired in the mid-1990s.
Definition 3.7 (Distributed Memory). Distributed memory is a collection of memory pieces
where each of them can be accessed by only one processor. If one processor requires data
stored in the memory of another processor, then communication between these processors
(communicating processes) is necessary.
We have to take into account the following items for programming on distributed memory
computers:
• There are no access conflicts between processors since data are locally stored.
• The hardware is relatively inexpensive.
• The code is potentially nearly optimally scalable.
• There is no direct access to data stored on other processors, and so communication
via special channels (links) is necessary. Hence, a sequential code does not run, and
special parallel algorithms are required.
• The ratio between arithmetic work and communication is one criterion for the quality
of a parallel algorithm. The time needed for communication is underestimated quite
frequently.
• Bandwidth and transfer rate of the network between processors are of extreme im-
portance and are always overrated.
Recent parallel computer systems by vendors such as IBM, SUN, SGI, HP, NEC, and
Fujitsu are no longer machines with purely distributed or purely shared memory. They
usually combine 2–16 processors into one computing node with shared memory for these
processors. The memory of different computing nodes behaves again as distributed memory.
Definition 3.8 (DSM). The distributed shared memory model is a compromise between the
shared and distributed memory models. The memory is distributed over all processors, but
the program can handle the memory as shared. Therefore, this model is also called the
virtual shared memory model.
In DSM, the distributed memory is combined with an operating system based on a
message-passing system (see section 3.2.2) which simulates the presence of a global shared
memory, e.g., a "sea of addresses" by the "interconnection fabric" of KSR and SGI.
The great advantage of DSM is that a sequential code runs immediately on this memory
model. If the algorithms take advantage of the locality properties of the data, i.e., most data
accesses of a process can be served from its own local memory, then good scalability
can be achieved. On the other hand, the parallel computer can also be handled as a
pure distributed memory computer.
For example, the SGI parallel machines Origin 2000 and Origin 3000 have symmetric
multiprocessing (SMP). Each processor (or a small group of processors) has its own local
memory. However, the parallel machine handles the whole memory as one huge shared
memory. This realization was made possible by using a very fast crossbar switch (CrayLink).
SGI had not expired as of 2002.
Besides hardware solutions to DSM, there are a number of software systems that
simulate DSM by managing the entire memory using sophisticated database techniques.
Definition 3.9 (Terascale GRID Computers). Several computer centers cooperate with
huge systems, connected by very, very fast networks capable of moving a terabyte in a few
seconds. Each node is a forest of possibly thousands of local DSMs. Making one of these
work is a trial (comic relief?).
3.1.3 Communication topologies
How can we connect the processors of a parallel computer? We are particularly interested
in computers with distributed memory since clusters of fast PCs are really easy to construct
and to use. A more detailed discussion of topologies can be found in [75].
Definition 3.10 (Link). A link is a connection between two processors. We distinguish
between a unidirectional link, which can be used only in one direction at a time, and a
bidirectional link, which can be used in both directions at any time. Two unidirectional links
are not a bidirectional link, however, even if the function is identical.
Definition 3.11 (Topology). An interconnection network of the processes is called a
topology.
Definition 3.12 (Physical Topology). A physical topology is the interconnection network
of the processors (nodes of the graph) given in hardware by the manufacturer.
This network can be configured by changing the cable connections (such as IBM SP)
or by a software reconfiguration of the hardware (e.g., Xplorer, MultiCluster-I). Recent
parallel computers usually have a fixed connection of the processor nodes, but it is also
possible to reconfigure the network for commonly used topologies by means of a crossbar
switch that allows potentially all connections.
As an example, the now ancient Transputer T805 was a very specialized processor
for parallel computing in the early 1990s. It had four hardware links so that it could handle
a 2D (two-dimensional) grid topology (and a 4D hypercube) directly by hardware. Using
one processor per grid point of the finite difference discretization of Poisson's equation in
the square (2.4) in Chapter 2 resulted in a very convenient parallel implementation on this
special parallel computer.
Definition 3.13 (Logical Topology). The logical topology refers to how processes (or
processor nodes) are connected to each other. This may be given by the user or the operating
system. Typically it is derived from the data relations or communication structure of the
program.
The mapping from the logical to the physical topology is done via a parallel operating
system or by parallel extensions to the operating system (e.g., see section 3.2.5). For
example, a four-process program might use physical processors 1, 89, 126, and 1023 in a
1024-processor system. Logically, the parallel program would assume it is using processors
0 to 3. On most shared memory computers, the physical processors can change during the
course of a long computation.
Definition 3.14 (Diameter). The diameter of a topology is the maximal number of links
that have to be used by a message sent from an arbitrary node p to another node q.
The diameter is sometimes difficult to measure precisely. On some machines there is
more than one path possible when sending a message between nodes p and q. The paths
do not necessarily have the same number of links and are dependent on the network traffic
load. In fact, on some machines, a message can fail by not being delivered in a set maximum
period of time.
3.2 Specialties of parallel algorithms
3.2.1 Synchronization
Remark 3.15. Sequential programming is only the expression of our inability to transfer
the natural parallelism of the world to a machine.
Administrating parallel processes provides several challenges that are not typical on
single-processor systems. Synchronizing certain operations so that conflicts do not occur
is a particularly hard problem that cannot be done with software alone.
Definition 3.16 (Undefined Status). An undefined status occurs when the result of a data
manipulation cannot be predicted in advance. It happens mostly when several processes
have to access restricted resources, i.e., shared memory.
Figure 3.2. Undefined status.
For example, consider what happens when the two processes in Fig. 3.2, running simul-
taneously on different processors, modify the same variable. The machine code instruction
INC increments a register variable and DEC decrements it. The value of N depends on the
execution speed of processes A and B, and therefore its value is not predictable; i.e., the status
of N is undefined, as shown in Fig. 3.3. In order to avoid programs that can suffer from data
with an undefined status, we have to treat operations A and B as atomic operations; i.e., they
cannot be split any further. This requires exclusive access to N for one process during the
time needed to finish the operation. This exclusive access is handled by synchronization.
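The undefined status of N and its cure can be illustrated with threads (a Python sketch, not the book's example; the lock makes each increment atomic, so the final value of N is well defined):

```python
import threading

N = 0
lock = threading.Lock()

def worker(increments):
    """Each update of N is a read-modify-write; making it atomic with a
    lock gives a well-defined final value of N."""
    global N
    for _ in range(increments):
        with lock:              # without the lock, N's final value is undefined
            N += 1

threads = [threading.Thread(target=worker, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(N)   # always 40000 with the lock
```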
Definition 3.17 (Synchronization). Synchronization prevents an undefined status of a
variable by means of a mechanism that allows processes to access that variable in a well-
defined way.

Figure 3.3. Three possible outcomes.
Synchronization is handled by a semaphore mechanism (see [25, 41]) on shared mem-
ory machines and by message passing on distributed memory machines.
Definition 3.18 (Barrier). The barrier is a special synchronization mechanism consisting
of a certain point in a program that must be passed by all processes (or a group of processes)
before the execution continues. This guarantees that each single process has to wait until
all remaining processes have reached that program point.
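A barrier can be sketched as follows (illustrative Python using threading.Barrier; the phase names and worker function are hypothetical):

```python
import threading

barrier = threading.Barrier(3)     # all 3 threads must reach the barrier
order = []

def phase_worker(name):
    order.append(("before", name)) # phase 1 work
    barrier.wait()                 # nobody continues until all have arrived
    order.append(("after", name))  # phase 2 work

threads = [threading.Thread(target=phase_worker, args=(k,)) for k in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# every "before" entry is guaranteed to precede every "after" entry
```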
3.2.2 Message passing
Message passing is a mechanism for transferring data directly from one process to another.
We distinguish between blocking and nonblocking communication (Figs. 3.4 and 3.5).
Figure 3.4. Blocking communication.
Blocking communication makes all (often two) involved processes wait until all pro-
cesses signal their readiness for the data/message exchange. Usually the processes wait
idly during this time.
47. one of these medals as much as £500 has been given by a firm of
London coin-dealers, so rare is the piece.
The punishment meted out to coiners and clippers of coins in this
reign was incredibly barbarous. In those so-called “good old times” in
one day seven men were hanged and a woman burned for clipping
and counterfeiting the current coin.
A Coinage Act was passed by Parliament in 1696, and under its
provisions all the old hammered money was called in, melted in
furnaces near Whitehall, and sent in ingots to the Tower, to reappear
in the new milled form. That wonderful man, Sir Isaac Newton, was
made Master of the Tower Mint, and the number of mills being
increased by his advice, in a few months, owing to his energy, a time
of great commercial prosperity ensued. In 1810 the new Office of the
Mint was opened on Little Tower Hill, where it still remains.
The Beauchamp Tower
The following is taken from Mr Hocking’s article on the Tower Mint:—
“On the morning of December 20th, 1798, James Turnbull, one
Dalton, and two other men were engaged in the press-room
swinging the fly of the screw-press, while Mr Finch, one of the
manager’s apprentices, fed the press with gold blank pieces, which
were struck into guineas. At nine o’clock Mr Finch sent the men to
their breakfast. They all four went out; but Dalton and Turnbull
returned almost directly. And while the former held the door, Turnbull
drew a pistol and advanced upon Mr Finch, demanding the key of
the closet where the newly-coined guineas were kept. Finch,
paralyzed with fear and surprise, yielded it up. An old gentleman who
was in the room expostulated; but both were forced into a sort of
passage or large cupboard and locked in. Turnbull then helped
himself to the guineas, and managed to get off with no less than
2308. For nine days he effectually concealed himself in the
neighbourhood, and then, while endeavouring to escape to France,
was apprehended. He was tried, convicted, and sentenced to death.
In his defence he cleared Dalton from any willing complicity in the
crime.” Turnbull was executed at the Old Bailey.
CHAPTER XVIII
GEORGE I.
With George the First the Whigs came into power, and soon after
the new King’s accession, Robert Harley, Earl of Oxford, and the
former Lord Treasurer, was sent to the Tower on the charge of
having advised the French King as to the best means for capturing
the town of Tournai. Harley had resigned his Treasurer’s staff three
days before Queen Anne’s death, and on the 10th of June 1715, he
was impeached by the Commons, of whom only a short time before
he had been the idol, and committed to the Tower. His courage never
wavered, although he was left to languish for two years in the
fortress, and at length, on petitioning to be tried, he was acquitted in
July 1717. He died seven years later, aged sixty-two. Lord Powis and
Sir William Wyndham soon followed Lord Oxford to the Tower, but
the latter was very shortly after set at liberty without even undergoing
a trial. Wyndham was member for the county of Somerset from 1708
until his death in 1740; he had been Secretary of State for War and
Chancellor of the Exchequer to Queen Anne, as well as Master of
the Buckhounds. His talents and his eloquence made him one of the
foremost men of that brilliant age, and Pope sang his praises:
“How can I Pult’ney, Chesterfield forget,
While Roman spirit charms, and Attic wit;
Or Wyndham, just to freedom and the throne,
The Master of his passions and our own?”
Another distinguished prisoner at this time in the Tower was George
Granville, Lord Lansdowne of Bideford. Descended from that race of
heroes, the Grenvilles of the West, of whom Admiral Sir Richard of
the Revenge was the most famous, and grandson of Sir Bevil
Grenville, killed at the Battle of Lansdowne, George Granville
belonged by race and conviction to the party of the Stuarts, and, too
proud to seek safety in flight, as did so many of his contemporaries
at the accession of the House of Hanover, he remained in England,
and even protested from his place in the House of Lords against the
Bill for attainting Ormonde and Bolingbroke. Strongly suspected of
favouring the cause of James Stuart, Lansdowne was accused of
having taken part in a plot for raising an insurrection in the West
Country, where his name was a pillar of strength, “being possessed,”
as Lord Bolingbroke said, “now with the same political phrenzy for
the Pretender as he had in his youth for his father.” The plot was
discovered, and at the close of September 1715, Lansdowne and his
wife were committed to the Tower and kept there in close
confinement until all danger of insurrection had passed away, and
until the rising in the North had been crushed. In Queen Anne’s time,
Lansdowne had been sung as
“Trevanion and Granville as sound as a bell
For the Queen and the Church and Sacheverell.”
View of the Tower in the time of George I.
In 1710 he had succeeded Walpole as Minister for War, but he
prided himself more upon his literary gifts than upon those of his
birth and rank, or upon his political eminence. He wrote poetry, sad
stuff, and plays which were worse than his poems, for in these he
out-Wycherlyed Wycherley. The plays of the days of the Restoration
not excepted, there is nothing more indecent in theatrical literature
than Granville’s “The Old Gallant.”
The famous rising in Scotland in 1715 in favour of the son of
James II., the Chevalier de St George, or, as his adherents called
him, James the Third, brought many of the leaders of that ill-starred
rebellion to the Tower, and some to the block. Of the latter the young
Earl of Derwentwater was the most conspicuous. James Radcliffe,
Lord Derwentwater, was the only Englishman of high birth who took
up arms for the Jacobite cause in this rebellion of 1715. He appears
to have been a youth of high merit, and was only twenty-six when he
was persuaded to throw life and fortune on the side of the Chevalier.
One who knew him writes “that he was a man formed by nature to be
generally beloved.” His connection with the Stuarts was possibly
brought about by the fact that his mother, Mary Tudor, was a natural
daughter of Charles II., and also that he was a Catholic by birth. He
was a very wealthy landowner, with vast estates, which, after his
execution, were given to Greenwich Hospital. They brought him in,
including the mines, between thirty and forty thousand pounds a
year, a great fortune in those days. His home, from which he derived
his title, was situated in the most beautiful of the English lakes, the
lovely Lake of Derwentwater in Cumberland, and was called Lord’s
Island.
Derwentwater had been taken prisoner at Preston, with six Scotch
noblemen, William Maxwell, Earl of Nithsdale, Robert Dalziel, Earl of
Carnwath, George Seton, Lord Wintoun, William Gordon, Lord
Kenmure, William Murray, Earl of Nairn, and William Widdrington,
Lord Widdrington. They were brought up to London with their arms
tied behind them, their horses led by soldiers, and preceded by
drums and music, in a kind of trumpery triumph, and imprisoned in
the Tower. Much interest was made on their behalf in both Houses of
Parliament; in the Commons, Richard Steele pleaded for them, and
in the Lords, a motion for reading the petition presented to both
Houses, praying the King to show mercy to the prisoners, had only
been carried against the Ministry by a majority of nine. An address
was presented to George the First, praying him to “reprieve such of
the condemned lords as deserved mercy.” To this petition George, or
rather, his Prime Minister, Robert Walpole, answered that the King
would act as he thought most consistent for the dignity of the Crown
and the safety of the people, thus virtually rejecting the address.
Many of those who had places in the Government and had voted
against the Ministry were dismissed from their offices.
Window in the Cradle Tower
The trial of the Jacobite lords commenced on the 9th of February,
and lasted ten days. Wintoun, the only one of the prisoners who
pleaded “not guilty,” was the only one pardoned; the others were
condemned to death, Lord Cowper, the Lord High Steward,
pronouncing sentence on the 29th of February as follows:—“And
now, my Lords, nothing remains but that I pronounce upon you, and
sorry I am that it falls to my lot to do it, that terrible sentence of the
law, which must be the same that is usually given against the
meanest offenders in the like kind. The most ignominious and painful
parts of it are usually remitted by the grace of the Crown, to persons
of your quality; but the law in this case being deaf to all distinction of
persons, requires that I should pronounce, and accordingly it is
adjudged by this Court, ‘That you, James, Earl of Derwentwater,
William, Lord Widdrington, William, Earl of Nithsdale, Robert, Earl of
Carnwath, William, Viscount Kenmure, and William, Lord Nairne, and
every of you, return to the prison of the Tower, from whence you
came, and thence you must be drawn to the place of execution;
when you come there, you must be hang’d by the neck, but not till
you be dead; for you must be cut down alive; then your bowels must
be taken out, and burnt before your face; then your heads must be
severed from your bodies, and your bodies divided each into four
quarters; and these must be at the King’s disposal. And God
Almighty be merciful to your souls!’”
Widdrington and Carnwath were released by the Act of Grace in
1717, and Lord Nairne was subsequently pardoned, the four
remaining noblemen being left to die.
At ten o’clock in the morning of the 24th of February,
Derwentwater and Kenmure were brought out of the Tower in a
coach and were driven to a house known as the Transport Office, on
Tower Hill, facing the scaffold, which was draped in black cloth; there
they remained whilst the final preparations for their execution were
being carried out.
The first to be led out was young Lord Derwentwater; as he
mounted the scaffold steps his face was seen to be blanched, but
beyond this he showed no other sign of emotion in that supreme
moment, and when he spoke to the people it was with a firm voice
and a composed manner. After praying for some time he rose from
his knees and read a paper in which he declared himself a faithful
subject of the Chevalier St George, whom he said he regarded as
his rightful King. There was some roughness upon the surface of the
block, which Derwentwater perceiving, he bade the executioner
plane it smooth with the axe. He then took off his coat and waistcoat,
telling the headsman to look afterwards in the pockets, where he
should find some money for himself to pay him for his trouble, adding
that the signal for the blow would be when for the third time he
repeated the words, “Lord Jesus, receive my soul,” by stretching out
his arms. He was killed at one stroke. Thus perished in his twenty-
eighth year a man who was loved by all who knew him, rich and
poor, and whose memory still lingers in his beautiful northern lake
country in many an old song and ballad.
There is a curious legend connected with Derwentwater’s death to
the effect that after his execution, the peasantry rose and drove Lady
Derwentwater from Lord’s Island, believing that it was at her
instigation that her husband had joined the Jacobite rising; a ravine
near their old home, through which Lady Derwentwater is supposed
to have fled, still goes by the name of the “Lady’s Rake.” On the
night of his execution a brilliant “aurora borealis” lighted the northern
skies of Derwentwater, which the people in that district interpreted as
being a signal of Heaven’s displeasure at the death of the popular
young Earl; and the aurora is still called in the North, “Lord
Derwentwater’s Lights.”
The Earl of Derwentwater.
(From a Contemporary Engraving.)
After the scaffold had been cleaned, and every mark of the first
execution removed, Lord Kenmure was brought out from the house
in which he had waited whilst Derwentwater was being put to death,
and came on the scaffold accompanied by his son, two clergymen,
and some other friends. Kenmure, unlike Derwentwater, belonged to
the Church of England. He made no formal speech, but expressed
his sorrow at having pleaded guilty. He told the executioner that he
should give him no signal, but that he was to strike the second time
he placed his head upon the block. It required two blows of the axe
to kill him.
Kenmure had married the sister of Robert, Earl of Carnwath, who
was one of his fellow-prisoners, but who was respited and pardoned.
By judicious management, Lady Kenmure was able to save a
remnant out of the forfeited estates of her husband, and, later on,
George the First returned part of the family estates to her and her
children.
Some of the crowd who had gone to Tower Hill that morning in the
hope of seeing three of the Jacobite lords beheaded, must have
been surprised when only two appeared; the third doomed man,
Lord Nithsdale, had made his escape from the Tower a few hours
before his fellow-captives were led out to die.
Lord Nithsdale’s escape on the eve of his execution reads more
like a romance than sober history. But it was his wife who made the
name famous for all time by her devotion and undaunted courage.
All hope seemed lost after the Address for Mercy had been rejected
by the King, and all idea of respite had indeed been abandoned
except by the brave Lady Nithsdale, who was the daughter of
William, Marquis of Powis, and was born about the year 1690. On
hearing of the capture of her husband at Preston, Lady Nithsdale
had ridden up to London from their home, Torreglas, in
Dumfriesshire, through the bitter winter weather, and, although not a
strong woman, had endured all the hardships of the long journey and
the anguish of anxiety regarding her husband, with heroic courage.
Before leaving Torreglas she had buried all the most important
family records in the garden. Accompanied by her faithful Welsh
maid, Evans, and a groom, she rode to Newcastle, and thence by
public stage to York, where the snow lay so thick that no mail-coach
could leave the city for the south. Nothing daunted, Lady Nithsdale
rode all the way to London. On her arrival in the capital, her first
object was to intercede for her husband with the King. She went to
St James’s Palace, where George was holding a drawing-room, and
sat waiting for him in the long corridor on the first floor, through which
the King would pass after leaving his room before entering the state
rooms. Lady Nithsdale had never seen George the First, and in order
to make no mistake, she had brought a friend, a Mrs Morgan, who
knew the King by sight. When George appeared, “I threw myself,”
Lady Nithsdale writes, “at his feet, and told him in French that I was
the unfortunate Countess of Nithsdale, that he might not pretend to
be ignorant of my person. But seeing that he wanted to go off without
taking my petition, I caught hold of the skirt of his coat, that he might
stop and hear me. He endeavoured to escape out of my hands, but I
kept such strong hold, that he dragged me on my knees, from the
middle of the room to the very door of the drawing-room. At last one
of the Blue Ribands who attended his Majesty took me round the
waist, while another wrested the coat from my hands. The petition,
which I had endeavoured to thrust into his pocket, fell to the ground
in the scuffle, and I almost fainted away from grief and
disappointment.”
There was no time to be lost, and after this last chance of
obtaining a hearing from King George had failed, Lady Nithsdale
knew that she, and she alone, could save her husband’s life. To this
almost hopeless task she now devoted all her mind and all her
courage.
Returning to the Tower, where she had already been on several
occasions, she pretended to be the bearer of good news. On this
occasion she only remained long enough to tell Lord Nithsdale the
plan she had formed for effecting his deliverance, after which she
returned to her lodgings in Drury Lane. There she confided her plan
to her landlady, a worthy soul, named Mills, and prevailed upon her
to accompany her to the Tower, together with Mrs Morgan, after
some arrangement had been made in their costumes, to which “their
surprise and astonishment made them consent,” writes Lady
Nithsdale, “without thinking of the consequences.” On their way to
the fortress Lady Nithsdale entered into the details of her plan. Mrs
Morgan was to wear a dress belonging to Mrs Mills over her own
clothes, and in this dress Lady Nithsdale would disguise her
husband, and so transformed, he could make his way out of the
Tower. It was a bold scheme, and was admirably carried out in every
detail.
On arriving at the Governor’s, now the King’s, House, where Lord
Nithsdale was imprisoned, Lady Nithsdale was only allowed to bring
one friend in at a time, and first introduced Mrs Morgan, a friend, she
said, of her husband, who had come to bid him farewell. Mrs
Morgan, when she had come into the prisoner’s room, took off the
outer dress she was wearing over her own, and into this Lord
Nithsdale was duly introduced. Then Lady Nithsdale asked Mrs
Morgan to go out and bring in her maid Evans. “I despatched her
safe,” she writes, “and went partly downstairs to meet Mrs Mills, who
held her handkerchief to her face, as was natural for a person going
to take a last leave of a friend before his execution; and I desired her
to do this that my lord might go out in the same manner. Her
eyebrows were inclined to be sandy, and as my lord’s were dark and
thick, I had prepared some paint to disguise him. I had also got an
artificial headdress of the same coloured hair as hers, and rouged
his face and cheeks, to conceal his beard which he had not had time
to shave. All this provision I had before left in the Tower. The poor
guards, whom my slight liberality the day before had endeared me
to, let me go out quietly with my company, and were not so strictly on
the watch as they usually had been, and the more so, as they were
persuaded, from what I had told them the day before, that the
prisoners would obtain their pardon. I made Mrs Mills take off her
own hood, and put on that which I had brought for her. I then took
her by the hand, and led her out of my lord’s chamber; and in
passing through the next room, in which were several people, with all
the concern imaginable, I said, ‘My dear Mrs Catherine, go in all
haste, and send me my waiting-maid; she certainly cannot reflect
how late it is. I am to present my petition to-night, and if I let slip this
opportunity, I am undone, for to-morrow it is too late. Hasten her as
much as possible, for I shall be on thorns till she comes.’ Everybody
in the room, who were chiefly the guards’ wives and daughters,
seemed to compassionate me exceedingly, and the sentinel
officiously opened me the door. When I had seen her safe out, I
returned to my lord, and finished dressing him. I had taken care that
Mrs Mills did not go out crying, as she came in, that my lord might
better pass for the lady who came in crying and afflicted; and the
more so that as he had the same dress that she wore. When I had
almost finished dressing my lord, I perceived it was growing dark,
and was afraid that the light of the candle might betray us, so I
resolved to set off. I went out leading him by the hand, whilst he held
his handkerchief to his eyes. I spoke to him in the most piteous and
afflicted tone, bewailing the negligence of my maid Evans, who had
ruined me by her delay. Then I said, ‘My dear Mrs Betty, for the love
of God, run quickly and bring her with you; you know my lodging,
and if ever you made despatch in your life, do it at present; I am
almost distracted with this disappointment.’ The guards opened the
door, and I went downstairs with him, still conjuring him to make all
possible dispatch. As soon as he had cleared the door, I made him
walk before me, for fear the sentinel should take notice of his walk,
but I continued to press him to make all the despatch he possibly
could. At the bottom of the stairs I met my dear Evans, into whose
hands I confided him. I had before engaged Mr Mills to be in
readiness before the Tower, to conduct him to some place of safety
in case we succeeded. He looked upon the affair as so very
improbable to succeed, that his astonishment, when he saw us,
threw him into such a consternation, that he was almost out of
himself, which, Evans perceiving, with the greatest presence of
mind, without telling Lord Nithsdale anything, lest he should mistrust
them, conducted him to some of her own friends on whom she could
rely, and so secured him, without which we certainly should have
been undone. When she had conducted him, and left him with them,
she returned to Mr Mills, who by this time recovered himself from his
astonishment. They went home together, and having found a place
of security, brought Lord Nithsdale to it. In the meantime, as I had
pretended to have sent the young lady on a message, I was obliged
to return upstairs and go back to my lord’s room in the same feigned
anxiety of being too late, so that everybody seemed sincerely to
sympathise in my distress. When I was in the room I talked as if he
had been really present. I answered my own questions in my lord’s
voice, as nearly as I could imitate it, and walked up and down as if
we were conversing together, till I thought they had time enough
thoroughly to clear themselves of the guards. I then thought proper
to make off also. I opened the door and stood half in it, that those in
the outward chamber might hear what I said, but held it so close that
they could not look in. I bade my lord formal farewell for the night,
and added, that something more than usual must have happened to
make Evans negligent, on this important occasion, who had always
been so punctual in the smallest trifles; that I saw no other remedy
than to go in person; that if the Tower was then open, when I had
finished my business, I would return that night; but that he might be
assured I would be with him as early in the morning as I could gain
admittance into the Tower, and I flattered myself I should bring more
favourable news. Then, before I shut the door, I pulled through the
string of the latch, so that it could only be opened on the inside. I
then shut it with some degree of force, that I might be sure of its
being well shut. I said to the servant, as I passed by (who was
ignorant of the whole transaction), that he need not carry in candles
to his master, till my lord sent for them, as he desired to finish some
prayers first.” What an admirable wife was Lady Nithsdale, and what
a devoted maid to her was her “dear Evans.”
The Tower from Tower Hill
Lord Nithsdale got safely out of London in the suite of the Venetian
Ambassador,—whose coach and six were sent some days after his
escape to Dover,—disguised in the livery of one of the Ambassador’s
footmen. From Dover he succeeded in getting to Calais, and later on
to Rome.
Although Lady Nithsdale had succeeded in rescuing her lord from
the scaffold, her self-devotion did not end there, her task, she
thought, was still incomplete. In spite of the personal peril she herself
ran if found in England or over the border, for the King was mightily
annoyed at the ruse by which she had snatched her husband from
the jaws of death, Lady Nithsdale determined to protect her son’s
estates, which, owing to the attainder of his father, were now
Government property. Her first step was to recover the papers she
had hidden in the garden at Torreglas. “As I had hazarded my life for
the father,” she writes, “I would not do less than hazard it for the
son.” Attended by the faithful Evans and her groom, who had
accompanied her upon the memorable ride from York to London,
Lady Nithsdale returned to Dumfriesshire. Having arrived safely at
Torreglas, she put a brave face upon her errand, and invited her
neighbours to come and see her as if she had been sent by the
Government itself. On the night before these invitations were due,
this most astute and courageous lady dug up the family papers in the
garden, sending them off at once to a place of safety in the charge of
a trusty retainer. Before day broke she had again started on her
return journey to the south, and while the Dumfries justices were
laying their wise heads together, and consulting whether they should
or should not give orders for the seizure of Lady Nithsdale, she had
put many miles between herself and them. When the good folk of
Dumfries arrived at Torreglas, they found that the lady they sought in
the name of the law had given them the slip.
It is pleasant to picture the impotent rage of George Rex when he
heard of this second defiance of his kingly authority; he declared that
Lady Nithsdale did whatever she pleased in spite of him, and that
she had given him more trouble than any other woman in the whole
of Europe.
Lady Nithsdale joined her husband in Rome, where they lived
many years together, he dying in 1749, and his devoted wife
following him to the grave soon afterwards. She rests in the beautiful
Fitzalan Chapel, near Arundel Castle. One hopes that the faithful
Welsh maid, Evans, was with them till the end. According to Lord de
Ros, Lady Nithsdale’s portrait, painted by Godfrey Kneller, still hangs
in her Scottish home. “Her hair,” he says, “is bright brown, slightly
powdered; with large soft eyes, regular features, and a fair
complexion. Her soft expression and delicate appearance give little
indication of the strength of mind and courage she displayed. Her
dress is blue silk, with a border of cambric, and over it a cloak of
brown silk.”
Another of the Jacobite lords, Wintoun, also escaped from the
Tower. Little is known regarding the manner in which he broke his
prison and thus cheated the headsman, but it is supposed that he
managed to saw through the bars of his window, having previously
bribed his gaoler to let him be free and undisturbed in his work of
filing the iron. In his case there were no romantic details, or, if there
were any, they have not come down to us. Of Lord Wintoun’s
escape, Lord de Ros writes: “Being well seconded by friends of the
cause in London, he was conveyed safely to the Continent.”
Another large batch of prisoners who were suspected of being
Jacobites came into the Tower in the year 1722, the most notable of
them being Francis Atterbury, Bishop of Rochester, Thomas, Duke of
Norfolk, Lords North, Orrery, and Grey, Thomas Layer Corkran,
Christopher Layer, and an Irish clergyman named Kelly. Of these,
the last was the only one executed on the charge of high treason.
The plot in which these persons of varying degrees were accused
of being implicated, was to seize the Tower, and raise a rebellion in
favour of the Chevalier, an idea which goes to show that the old
fortress was even as late as the days of our first Hanoverian
sovereign regarded as an essential to the assumption of the
supreme power in the country. Atterbury was attainted and banished,
after undergoing a strict imprisonment, which he endured with much
patience from the 24th of August 1722, until the 18th of January in
the following year.
“How pleasing Atterbury’s softer hour;
How shines his soul unconquered in the Tower,”
as Pope has sung it. Atterbury never returned to England, dying after
eight years of exile in France.
In 1724 the Earl of Suffolk was committed to the Tower “for
granting protection in breach of the standing orders of the House of
Lords,” whatever that crime may have been, and in the following
year Lord Chancellor Macclesfield was imprisoned there “for venality
and corruption in the discharge of his office.”