numerical methods for civil engineering for every one
Numerical Methods
for Engineers
SEVENTH EDITION
Steven C. Chapra
Berger Chair in Computing and Engineering
Tufts University
Raymond P. Canale
Professor Emeritus of Civil Engineering
University of Michigan
NUMERICAL METHODS FOR ENGINEERS, SEVENTH EDITION
Published by McGraw-Hill Education, 2 Penn Plaza, New York, NY 10121. Copyright © 2015 by McGraw-Hill Education.
All rights reserved. Printed in the United States of America. Previous editions © 2010, 2006, and 2002. No part of this
publication may be reproduced or distributed in any form or by any means, or stored in a database or retrieval system,
without the prior written consent of McGraw-Hill Education, including, but not limited to, in any network or other
electronic storage or transmission, or broadcast for distance learning.
Some ancillaries, including electronic and print components, may not be available to customers outside the United States.
This book is printed on acid-free paper.
1 2 3 4 5 6 7 8 9 0 DOC/DOC 1 0 9 8 7 6 5 4
ISBN 978–0–07–339792–4
MHID 0–07–339792–x
Senior Vice President, Products & Markets: Kurt L. Strand
Vice President, General Manager, Products & Markets: Marty Lange
Vice President, Content Production & Technology Services: Kimberly Meriwether David
Executive Brand Manager: Bill Stenquist
Managing Director: Thomas Timp
Global Publisher: Raghothaman Srinivasan
Developmental Editor: Lorraine Buczek
Marketing Manager: Heather Wagner
Director, Content Production: Terri Schiesl
Senior Content Project Manager: Melissa M. Leick
Buyer: Jennifer Pickel
Cover Designer: Studio Montage, St. Louis, MO
Cover Image: Peak towering above clouds: Royalty-Free/CORBIS; Skysurfers: Getty Images/Digital Vision/RF
Media Project Manager: Sandra M. Schnee
Compositor: Aptara®, Inc.
Typeface: 10/12 Times Roman
Printer: R. R. Donnelley
All credits appearing on page or at the end of the book are considered to be an extension of the copyright page.
Library of Congress Cataloging-in-Publication Data
Chapra, Steven C.
Numerical methods for engineers / Steven C. Chapra, Berger chair in
computing and engineering, Tufts University, Raymond P. Canale, professor
emeritus of civil engineering, University of Michigan. — Seventh edition.
pages cm
Includes bibliographical references and index.
ISBN 978-0-07-339792-4 (alk. paper) — ISBN 0-07-339792-X (alk. paper)
1. Engineering mathematics—Data processing. 2. Numerical calculations—Data processing
3. Microcomputers—Programming. I. Canale, Raymond P. II. Title.
TA345.C47 2015
518.024’62—dc23 2013041704
To
Margaret and Gabriel Chapra
Helen and Chester Canale
CONTENTS
PREFACE xiv
ABOUT THE AUTHORS xvi
PART ONE
MODELING, COMPUTERS, AND ERROR ANALYSIS 3
PT1.1 Motivation 3
PT1.2 Mathematical Background 5
PT1.3 Orientation 8
CHAPTER 1
Mathematical Modeling and Engineering Problem Solving 11
1.1 A Simple Mathematical Model 11
1.2 Conservation Laws and Engineering 18
Problems 21
CHAPTER 2
Programming and Software 27
2.1 Packages and Programming 27
2.2 Structured Programming 28
2.3 Modular Programming 37
2.4 Excel 39
2.5 MATLAB 43
2.6 Mathcad 47
2.7 Other Languages and Libraries 48
Problems 49
CHAPTER 3
Approximations and Round-Off Errors 55
3.1 Significant Figures 56
3.2 Accuracy and Precision 58
3.3 Error Definitions 59
3.4 Round-Off Errors 65
Problems 79
CHAPTER 4
Truncation Errors and the Taylor Series 81
4.1 The Taylor Series 81
4.2 Error Propagation 97
4.3 Total Numerical Error 101
4.4 Blunders, Formulation Errors, and Data Uncertainty 106
Problems 108
EPILOGUE: PART ONE 110
PT1.4 Trade-Offs 110
PT1.5 Important Relationships and Formulas 113
PT1.6 Advanced Methods and Additional References 113
PART TWO
ROOTS OF EQUATIONS 117
PT2.1 Motivation 117
PT2.2 Mathematical Background 119
PT2.3 Orientation 120
CHAPTER 5
Bracketing Methods 123
5.1 Graphical Methods 123
5.2 The Bisection Method 127
5.3 The False-Position Method 135
5.4 Incremental Searches and Determining Initial Guesses 141
Problems 142
CHAPTER 6
Open Methods 145
6.1 Simple Fixed-Point Iteration 146
6.2 The Newton-Raphson Method 151
6.3 The Secant Method 157
6.4 Brent’s Method 162
6.5 Multiple Roots 166
6.6 Systems of Nonlinear Equations 169
Problems 173
CHAPTER 7
Roots of Polynomials 176
7.1 Polynomials in Engineering and Science 176
7.2 Computing with Polynomials 179
7.3 Conventional Methods 182
7.4 Müller’s Method 183
7.5 Bairstow’s Method 187
7.6 Other Methods 192
7.7 Root Location with Software Packages 192
Problems 202
CHAPTER 8
Case Studies: Roots of Equations 204
8.1 Ideal and Nonideal Gas Laws (Chemical/Bio Engineering) 204
8.2 Greenhouse Gases and Rainwater (Civil/Environmental Engineering) 207
8.3 Design of an Electric Circuit (Electrical Engineering) 209
8.4 Pipe Friction (Mechanical/Aerospace Engineering) 212
Problems 215
EPILOGUE: PART TWO 226
PT2.4 Trade-Offs 226
PT2.5 Important Relationships and Formulas 227
PT2.6 Advanced Methods and Additional References 227
PART THREE
LINEAR ALGEBRAIC EQUATIONS 231
PT3.1 Motivation 231
PT3.2 Mathematical Background 233
PT3.3 Orientation 241
CHAPTER 9
Gauss Elimination 245
9.1 Solving Small Numbers of Equations 245
9.2 Naive Gauss Elimination 252
9.3 Pitfalls of Elimination Methods 258
9.4 Techniques for Improving Solutions 264
9.5 Complex Systems 271
9.6 Nonlinear Systems of Equations 271
9.7 Gauss-Jordan 273
9.8 Summary 275
Problems 275
CHAPTER 10
LU Decomposition and Matrix Inversion 278
10.1 LU Decomposition 278
10.2 The Matrix Inverse 287
10.3 Error Analysis and System Condition 291
Problems 297
CHAPTER 11
Special Matrices and Gauss-Seidel 300
11.1 Special Matrices 300
11.2 Gauss-Seidel 304
11.3 Linear Algebraic Equations with Software Packages 311
Problems 316
CHAPTER 12
Case Studies: Linear Algebraic Equations 319
12.1 Steady-State Analysis of a System of Reactors (Chemical/Bio Engineering) 319
12.2 Analysis of a Statically Determinate Truss (Civil/Environmental Engineering) 322
12.3 Currents and Voltages in Resistor Circuits (Electrical Engineering) 326
12.4 Spring-Mass Systems (Mechanical/Aerospace Engineering) 328
Problems 331
EPILOGUE: PART THREE 341
PT3.4 Trade-Offs 341
PT3.5 Important Relationships and Formulas 342
PT3.6 Advanced Methods and Additional References 342
PART FOUR
OPTIMIZATION 345
PT4.1 Motivation 345
PT4.2 Mathematical Background 350
PT4.3 Orientation 351
CHAPTER 13
One-Dimensional Unconstrained Optimization 355
13.1 Golden-Section Search 356
13.2 Parabolic Interpolation 363
13.3 Newton’s Method 365
13.4 Brent’s Method 366
Problems 368
CHAPTER 14
Multidimensional Unconstrained Optimization 370
14.1 Direct Methods 371
14.2 Gradient Methods 375
Problems 388
CHAPTER 15
Constrained Optimization 390
15.1 Linear Programming 390
15.2 Nonlinear Constrained Optimization 401
15.3 Optimization with Software Packages 402
Problems 413
CHAPTER 16
Case Studies: Optimization 416
16.1 Least-Cost Design of a Tank (Chemical/Bio Engineering) 416
16.2 Least-Cost Treatment of Wastewater (Civil/Environmental Engineering) 421
16.3 Maximum Power Transfer for a Circuit (Electrical Engineering) 425
16.4 Equilibrium and Minimum Potential Energy (Mechanical/Aerospace Engineering) 429
Problems 431
EPILOGUE: PART FOUR 438
PT4.4 Trade-Offs 438
PT4.5 Additional References 439
PART FIVE
CURVE FITTING 441
PT5.1 Motivation 441
PT5.2 Mathematical Background 443
PT5.3 Orientation 452
CHAPTER 17
Least-Squares Regression 456
17.1 Linear Regression 456
17.2 Polynomial Regression 472
17.3 Multiple Linear Regression 476
17.4 General Linear Least Squares 479
17.5 Nonlinear Regression 483
Problems 487
CHAPTER 18
Interpolation 490
18.1 Newton’s Divided-Difference Interpolating Polynomials 491
18.2 Lagrange Interpolating Polynomials 502
18.3 Coefficients of an Interpolating Polynomial 507
18.4 Inverse Interpolation 507
18.5 Additional Comments 508
18.6 Spline Interpolation 511
18.7 Multidimensional Interpolation 521
Problems 524
CHAPTER 19
Fourier Approximation 526
19.1 Curve Fitting with Sinusoidal Functions 527
19.2 Continuous Fourier Series 533
19.3 Frequency and Time Domains 536
19.4 Fourier Integral and Transform 540
19.5 Discrete Fourier Transform (DFT) 542
19.6 Fast Fourier Transform (FFT) 544
19.7 The Power Spectrum 551
19.8 Curve Fitting with Software Packages 552
Problems 561
CHAPTER 20
Case Studies: Curve Fitting 563
20.1 Linear Regression and Population Models (Chemical/Bio Engineering) 563
20.2 Use of Splines to Estimate Heat Transfer (Civil/Environmental Engineering) 567
20.3 Fourier Analysis (Electrical Engineering) 569
20.4 Analysis of Experimental Data (Mechanical/Aerospace Engineering) 570
Problems 572
EPILOGUE: PART FIVE 582
PT5.4 Trade-Offs 582
PT5.5 Important Relationships and Formulas 583
PT5.6 Advanced Methods and Additional References 584
PART SIX
NUMERICAL DIFFERENTIATION AND INTEGRATION 587
PT6.1 Motivation 587
PT6.2 Mathematical Background 597
PT6.3 Orientation 599
CHAPTER 21
Newton-Cotes Integration Formulas 603
21.1 The Trapezoidal Rule 605
21.2 Simpson’s Rules 615
21.3 Integration with Unequal Segments 624
21.4 Open Integration Formulas 627
21.5 Multiple Integrals 627
Problems 629
CHAPTER 22
Integration of Equations 633
22.1 Newton-Cotes Algorithms for Equations 633
22.2 Romberg Integration 634
22.3 Adaptive Quadrature 640
22.4 Gauss Quadrature 642
22.5 Improper Integrals 650
Problems 653
CHAPTER 23
Numerical Differentiation 655
23.1 High-Accuracy Differentiation Formulas 655
23.2 Richardson Extrapolation 658
23.3 Derivatives of Unequally Spaced Data 660
23.4 Derivatives and Integrals for Data with Errors 661
23.5 Partial Derivatives 662
23.6 Numerical Integration/Differentiation with Software Packages 663
Problems 670
CHAPTER 24
Case Studies: Numerical Integration and Differentiation 673
24.1 Integration to Determine the Total Quantity of Heat (Chemical/Bio Engineering) 673
24.2 Effective Force on the Mast of a Racing Sailboat (Civil/Environmental Engineering) 675
24.3 Root-Mean-Square Current by Numerical Integration (Electrical Engineering) 677
24.4 Numerical Integration to Compute Work (Mechanical/Aerospace Engineering) 680
Problems 684
EPILOGUE: PART SIX 694
PT6.4 Trade-Offs 694
PT6.5 Important Relationships and Formulas 695
PT6.6 Advanced Methods and Additional References 695
PART SEVEN
ORDINARY DIFFERENTIAL EQUATIONS 699
PT7.1 Motivation 699
PT7.2 Mathematical Background 703
PT7.3 Orientation 705
CHAPTER 25
Runge-Kutta Methods 709
25.1 Euler’s Method 710
25.2 Improvements of Euler’s Method 721
25.3 Runge-Kutta Methods 729
25.4 Systems of Equations 739
25.5 Adaptive Runge-Kutta Methods 744
Problems 752
CHAPTER 26
Stiffness and Multistep Methods 755
26.1 Stiffness 755
26.2 Multistep Methods 759
Problems 779
CHAPTER 27
Boundary-Value and Eigenvalue Problems 781
27.1 General Methods for Boundary-Value Problems 782
27.2 Eigenvalue Problems 789
27.3 ODEs and Eigenvalues with Software Packages 801
Problems 808
CHAPTER 28
Case Studies: Ordinary Differential Equations 811
28.1 Using ODEs to Analyze the Transient Response of a Reactor (Chemical/Bio Engineering) 811
28.2 Predator-Prey Models and Chaos (Civil/Environmental Engineering) 818
28.3 Simulating Transient Current for an Electric Circuit (Electrical Engineering) 822
28.4 The Swinging Pendulum (Mechanical/Aerospace Engineering) 827
Problems 831
EPILOGUE: PART SEVEN 841
PT7.4 Trade-Offs 841
PT7.5 Important Relationships and Formulas 842
PT7.6 Advanced Methods and Additional References 842
PART EIGHT
PARTIAL DIFFERENTIAL EQUATIONS 845
PT8.1 Motivation 845
PT8.2 Orientation 848
CHAPTER 29
Finite Difference: Elliptic Equations 852
29.1 The Laplace Equation 852
29.2 Solution Technique 854
29.3 Boundary Conditions 860
29.4 The Control-Volume Approach 866
29.5 Software to Solve Elliptic Equations 869
Problems 870
CHAPTER 30
Finite Difference: Parabolic Equations 873
30.1 The Heat-Conduction Equation 873
30.2 Explicit Methods 874
30.3 A Simple Implicit Method 878
30.4 The Crank-Nicolson Method 882
30.5 Parabolic Equations in Two Spatial Dimensions 885
Problems 888
CHAPTER 31
Finite-Element Method 890
31.1 The General Approach 891
31.2 Finite-Element Application in One Dimension 895
31.3 Two-Dimensional Problems 904
31.4 Solving PDEs with Software Packages 908
Problems 912
CHAPTER 32
Case Studies: Partial Differential Equations 915
32.1 One-Dimensional Mass Balance of a Reactor (Chemical/Bio Engineering) 915
32.2 Deflections of a Plate (Civil/Environmental Engineering) 919
32.3 Two-Dimensional Electrostatic Field Problems (Electrical Engineering) 921
32.4 Finite-Element Solution of a Series of Springs (Mechanical/Aerospace Engineering) 924
Problems 928
EPILOGUE: PART EIGHT 931
PT8.3 Trade-Offs 931
PT8.4 Important Relationships and Formulas 931
PT8.5 Advanced Methods and Additional References 932
APPENDIX A: THE FOURIER SERIES 933
APPENDIX B: GETTING STARTED WITH MATLAB 935
APPENDIX C: GETTING STARTED WITH MATHCAD 943
BIBLIOGRAPHY 954
INDEX 957
PREFACE
It has been over twenty years since we published the first edition of this book. Over that
period, our original contention that numerical methods and computers would figure more
prominently in the engineering curriculum, particularly in the early parts, has been dramatically
borne out. Many universities now offer freshman, sophomore, and junior courses in
both introductory computing and numerical methods. In addition, many of our colleagues are
integrating computer-oriented problems into other courses at all levels of the curriculum. Thus,
this new edition is still founded on the basic premise that student engineers should be provided
with a strong and early introduction to numerical methods. Consequently, although we have
expanded our coverage in the new edition, we have tried to maintain many of the features that
made the first edition accessible to both lower- and upper-level undergraduates. These include:
• Problem Orientation. Engineering students learn best when they are motivated by
problems. This is particularly true for mathematics and computing. Consequently, we
have approached numerical methods from a problem-solving perspective.
• Student-Oriented Pedagogy. We have developed a number of features to make this
book as student-friendly as possible. These include the overall organization, the use
of introductions and epilogues to consolidate major topics, and the extensive use of
worked examples and case studies from all areas of engineering. We have also endeavored
to keep our explanations straightforward and practically oriented.
• Computational Tools. We empower our students by helping them utilize the standard
“point-and-shoot” numerical problem-solving capabilities of packages like Excel,
MATLAB, and Mathcad software. However, students are also shown how to develop
simple, well-structured programs to extend the base capabilities of those environments.
This knowledge carries over to standard programming languages such as Visual
Basic, Fortran 90, and C/C++. We believe that the current flight from computer
programming represents something of a “dumbing down” of the engineering curriculum.
The bottom line is that as long as engineers are not content to be tool limited,
they will have to write code. Only now the programs may be called “macros” or “M-files.”
This book is designed to empower them to do that.
Beyond these original principles, the seventh edition has new and expanded problem
sets. Most of the problems have been modified so that they yield different numerical solutions
from previous editions. In addition, a variety of new problems have been included.
The seventh edition also includes McGraw-Hill’s Connect® Engineering. This online
homework management tool allows assignment of algorithmic problems for homework,
quizzes, and tests. It connects students with the tools and resources they’ll need to achieve
success. To learn more, visit www.mcgrawhillconnect.com.
McGraw-Hill LearnSmart™ is also available as an integrated feature of McGraw-Hill
Connect® Engineering. It is an adaptive learning system designed to help students learn faster,
study more efficiently, and retain more knowledge for greater success. LearnSmart assesses
a student’s knowledge of course content through a series of adaptive questions. It pinpoints
concepts the student does not understand and maps out a personalized study plan for success.
For a demonstration, visit www.mhlearnsmart.com.
As always, our primary intent in writing this book is to provide students with a sound
introduction to numerical methods. We believe that motivated students who enjoy numerical
methods, computers, and mathematics will, in the end, make better engineers. If our
book fosters an enthusiasm for these subjects, we will consider our efforts a success.
Acknowledgments. We would like to thank our friends at McGraw-Hill, in particular
Lorraine Buczek and Bill Stenquist, who provided a positive and supportive atmosphere for
creating this edition. As usual, Beatrice Sussman did a masterful job of copyediting the manuscript,
and Arpana Kumari of Aptara did an outstanding job in the book’s final production
phase. As in past editions, David Clough (University of Colorado), Mike Gustafson (Duke),
and Jerry Stedinger (Cornell University) generously shared their insights and suggestions. Useful
suggestions were also made by Bill Philpot (Cornell University), Jim Guilkey (University
of Utah), Dong-Il Seo (Chungnam National University, Korea), Niall Broekhuizen (NIWA,
New Zealand), and Raymundo Cordero and Karim Muci (ITESM, Mexico). The present edition
has also benefited from the reviews and suggestions by the following colleagues:
Betty Barr, University of Houston
Jalal Behzadi, Shahid Chamran University
Jordan Berg, Texas Tech University
Jacob Bishop, Utah State University
Estelle M. Eke, California State University, Sacramento
Yazan A. Hussain, Jordan University of Science & Technology
Yogesh Jaluria, Rutgers University
S. Graham Kelly, The University of Akron
Subha Kumpaty, Milwaukee School of Engineering
Eckart Meiburg, University of California-Santa Barbara
Prashant Mhaskar, McMaster University
Luke Olson, University of Illinois at Urbana-Champaign
Richard Pates Jr., Old Dominion University
Joseph H. Pierluissi, University of Texas at El Paso
Juan Perán, Universidad Nacional de Educación a Distancia (UNED)
Scott A. Socolofsky, Texas A&M University
It should be stressed that although we received useful advice from the aforementioned
individuals, we are responsible for any inaccuracies or mistakes you may detect in this edition.
Please contact Steve Chapra via e-mail if you detect any errors.
Finally, we would like to thank our family, friends, and students for their enduring
patience and support. In particular, Cynthia Chapra, Danielle Husley, and Claire Canale
are always there providing understanding, perspective, and love.
Steven C. Chapra
Medford, Massachusetts
steven.chapra@tufts.edu
Raymond P. Canale
Lake Leelanau, Michigan
ABOUT THE AUTHORS
Steve Chapra teaches in the Civil and Environmental Engineering Department at Tufts
University where he holds the Louis Berger Chair in Computing and Engineering. His
other books include Surface Water-Quality Modeling and Applied Numerical Methods
with MATLAB.
Dr. Chapra received engineering degrees from Manhattan College and the University
of Michigan. Before joining the faculty at Tufts, he worked for the Environmental Protection
Agency and the National Oceanic and Atmospheric Administration, and taught at
Texas A&M University and the University of Colorado. His general research interests
focus on surface water-quality modeling and advanced computer applications in environmental
engineering.
He is a Fellow of the ASCE and has received a number of awards for his scholarly
contributions, including the Rudolph Hering Medal (ASCE) and the Meriam-Wiley
Distinguished Author Award (American Society for Engineering Education). He has also
been recognized as the outstanding teacher among the engineering faculties at Texas
A&M University, the University of Colorado, and Tufts University.
Raymond P. Canale is an emeritus professor at the University of Michigan. During
his career of more than 20 years at the university, he taught numerous courses in the areas of
computers, numerical methods, and environmental engineering. He also directed extensive research
programs in the area of mathematical and computer modeling of aquatic ecosystems. He
has authored or coauthored several books and has published over 100 scientific papers and
reports. He has also designed and developed personal computer software to facilitate engineering
education and the solution of engineering problems. He has been given the
Meriam-Wiley Distinguished Author Award by the American Society for Engineering
Education for his books and software and several awards for his technical publications.
Professor Canale is now devoting his energies to applied problems, where he works
with engineering firms and industry and governmental agencies as a consultant and expert
witness.
PART ONE
PT1.1 MOTIVATION
Numerical methods are techniques by which mathematical problems are formulated so
that they can be solved with arithmetic operations. Although there are many kinds of
numerical methods, they have one common characteristic: they invariably involve large
numbers of tedious arithmetic calculations. It is little wonder that with the development
of fast, efficient digital computers, the role of numerical methods in engineering problem
solving has increased dramatically in recent years.
PT1.1.1 Noncomputer Methods
Beyond providing increased computational firepower, the widespread availability of computers
(especially personal computers) and their partnership with numerical methods have
had a significant influence on the actual engineering problem-solving process. In the
precomputer era there were generally three different ways in which engineers approached
problem solving:
1. Solutions were derived for some problems using analytical, or exact, methods. These
solutions were often useful and provided excellent insight into the behavior of some
systems. However, analytical solutions can be derived for only a limited class of
problems. These include those that can be approximated with linear models and
those that have simple geometry and low dimensionality. Consequently, analytical
solutions are of limited practical value because most real problems are nonlinear and
involve complex shapes and processes.
2. Graphical solutions were used to characterize the behavior of systems. These
graphical solutions usually took the form of plots or nomographs. Although graphical
techniques can often be used to solve complex problems, the results are not very
precise. Furthermore, graphical solutions (without the aid of computers) are extremely
tedious and awkward to implement. Finally, graphical techniques are often limited
to problems that can be described using three or fewer dimensions.
3. Calculators and slide rules were used to implement numerical methods manually.
Although in theory such approaches should be perfectly adequate for solving complex
problems, in actuality several difficulties are encountered. Manual calculations are
slow and tedious. Furthermore, consistent results are elusive because of simple
blunders that arise when numerous manual tasks are performed.
During the precomputer era, significant amounts of energy were expended on the
solution technique itself, rather than on problem definition and interpretation (Fig. PT1.1a).
This unfortunate situation existed because so much time and drudgery were required to
obtain numerical answers using precomputer techniques.
MODELING, COMPUTERS,
AND ERROR ANALYSIS
Today, computers and numerical methods provide an alternative for such complicated
calculations. Using computer power to obtain solutions directly, you can approach
these calculations without recourse to simplifying assumptions or time-intensive techniques.
Although analytical solutions are still extremely valuable both for problem
solving and for providing insight, numerical methods represent alternatives that greatly
enlarge your capabilities to confront and solve problems. As a result, more time is
available for the use of your creative skills. Thus, more emphasis can be placed on
problem formulation and solution interpretation and the incorporation of total system,
or “holistic,” awareness (Fig. PT1.1b).
FIGURE PT1.1
The three phases of engineering problem solving in (a) the precomputer and (b) the
computer era. The sizes of the boxes indicate the level of emphasis directed toward each
phase. Computers facilitate the implementation of solution techniques and thus allow more
emphasis to be placed on the creative aspects of problem formulation and interpretation
of results.
(a) FORMULATION: Fundamental laws explained briefly. SOLUTION: Elaborate and often
complicated method to make problem tractable. INTERPRETATION: In-depth analysis limited
by time-consuming solution.
(b) FORMULATION: In-depth exposition of relationship of problem to fundamental laws.
SOLUTION: Easy-to-use computer method. INTERPRETATION: Ease of calculation allows
holistic thoughts and intuition to develop; system sensitivity and behavior can be studied.
PT1.1.2 Numerical Methods and Engineering Practice
Since the late 1940s the widespread availability of digital computers has led to a veritable
explosion in the use and development of numerical methods. At first, this growth
was somewhat limited by the cost of access to large mainframe computers, and, consequently,
many engineers continued to use simple analytical approaches in a significant
portion of their work. Needless to say, the recent evolution of inexpensive personal
computers has given us ready access to powerful computational capabilities. There are
several additional reasons why you should study numerical methods:
1. Numerical methods are extremely powerful problem-solving tools. They are capable
of handling large systems of equations, nonlinearities, and complicated geometries
that are not uncommon in engineering practice and that are often impossible to solve
analytically. As such, they greatly enhance your problem-solving skills.
2. During your careers, you may often have occasion to use commercially available
prepackaged, or “canned,” computer programs that involve numerical methods. The
intelligent use of these programs is often predicated on knowledge of the basic
theory underlying the methods.
3. Many problems cannot be approached using canned programs. If you are conversant
with numerical methods and are adept at computer programming, you can design
your own programs to solve problems without having to buy or commission expensive
software.
4. Numerical methods are an efficient vehicle for learning to use computers. It is well
known that an effective way to learn programming is to actually write computer
programs. Because numerical methods are for the most part designed for
implementation on computers, they are ideal for this purpose. Further, they are
especially well-suited to illustrate the power and the limitations of computers. When
you successfully implement numerical methods on a computer and then apply them
to solve otherwise intractable problems, you will be provided with a dramatic
demonstration of how computers can serve your professional development. At the
same time, you will also learn to acknowledge and control the errors of approximation
that are part and parcel of large-scale numerical calculations.
5. Numerical methods provide a vehicle for you to reinforce your understanding of
mathematics. Because one function of numerical methods is to reduce higher
mathematics to basic arithmetic operations, they get at the “nuts and bolts” of some
otherwise obscure topics. Enhanced understanding and insight can result from this
alternative perspective.
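As a small illustration of item 5, the following Python sketch reduces a definite integral to additions and multiplications using the trapezoidal rule (the topic of Section 21.1 in the table of contents). It is not taken from the text: the function name `trapezoid`, the choice of Python, and the test integrand are assumptions made for this example only.

```python
import math


def trapezoid(f, a, b, n):
    """Approximate the integral of f over [a, b] using n equal-width trapezoids.

    Illustrative helper (hypothetical name), not code from the book.
    """
    h = (b - a) / n                  # width of each segment
    total = 0.5 * (f(a) + f(b))      # end points carry half weight
    for i in range(1, n):
        total += f(a + i * h)        # interior points carry full weight
    return h * total


if __name__ == "__main__":
    exact = 2.0  # the integral of sin(x) from 0 to pi
    for n in (4, 16, 64):
        approx = trapezoid(math.sin, 0.0, math.pi, n)
        print(n, approx, abs(exact - approx))
```

Running the loop with increasing `n` shows the approximation error shrinking as more arithmetic is spent, which is the trade-off between effort and accuracy that the list above alludes to.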
PT1.2 MATHEMATICAL BACKGROUND
Every part in this book requires some mathematical background. Consequently, the introductory
material for each part includes a section, such as the one you are reading, on
mathematical background. Because Part One itself is devoted to background material on
mathematics and computers, this section does not involve a review of a specific mathematical
topic. Rather, we take this opportunity to introduce you to the types of mathematical
subject areas covered in this book. As summarized in Fig. PT1.2, these are
1. Roots of Equations (Fig. PT1.2a). These problems are concerned with the value of
a variable or a parameter that satisfies a single nonlinear equation. These problems
are especially valuable in engineering design contexts where it is often impossible
to explicitly solve design equations for parameters.
2. Systems of Linear Algebraic Equations (Fig. PT1.2b). These problems are similar in
spirit to roots of equations in the sense that they are concerned with values that
satisfy equations. However, in contrast to satisfying a single equation, a set of values
is sought that simultaneously satisfies a set of linear algebraic equations. Such
equations arise in a variety of problem contexts and in all disciplines of engineering.
In particular, they originate in the mathematical modeling of large systems of
interconnected elements such as structures, electric circuits, and fluid networks.
However, they are also encountered in other areas of numerical methods such as
curve fitting and differential equations.
FIGURE PT1.2
Summary of the numerical methods covered in this book.
(a) Part 2: Roots of equations. Solve $f(x) = 0$ for $x$.
(b) Part 3: Linear algebraic equations. Given the $a$'s and the $c$'s, solve
$a_{11}x_1 + a_{12}x_2 = c_1$ and $a_{21}x_1 + a_{22}x_2 = c_2$ for the $x$'s.
(c) Part 4: Optimization. Determine $x$ that gives optimum $f(x)$.
(d) Part 5: Curve fitting. Regression and interpolation of $f(x)$ versus $x$.
(e) Part 6: Integration. $I = \int_a^b f(x)\,dx$. Find the area under the curve.
3. Optimization (Fig. PT1.2c). These problems involve determining a value or values
of an independent variable that correspond to a “best” or optimal value of a function.
Thus, as in Fig. PT1.2c, optimization involves identifying maxima and minima. Such
problems occur routinely in engineering design contexts. They also arise in a number
of other numerical methods. We address both single- and multi-variable unconstrained
optimization. We also describe constrained optimization with particular emphasis on
linear programming.
4. Curve Fitting (Fig. PT1.2d). You will often have occasion to fit curves to data points.
The techniques developed for this purpose can be divided into two general categories:
regression and interpolation. Regression is employed where there is a significant
degree of error associated with the data. Experimental results are often of this kind.
For these situations, the strategy is to derive a single curve that represents the general
trend of the data without necessarily matching any individual points. In contrast,
interpolation is used where the objective is to determine intermediate values between
relatively error-free data points. Such is usually the case for tabulated information.
For these situations, the strategy is to fit a curve directly through the data points and
use the curve to predict the intermediate values.
5. Integration (Fig. PT1.2e). As depicted, a physical interpretation of numerical
integration is the determination of the area under a curve. Integration has many
applications in engineering practice, ranging from the determination of the centroids
of oddly shaped objects to the calculation of total quantities based on sets of discrete
measurements. In addition, numerical integration formulas play an important role in
the solution of differential equations.
FIGURE PT1.2 (concluded)
(f) Part 7: Ordinary differential equations. Given
$\frac{dy}{dt} \simeq \frac{\Delta y}{\Delta t} = f(t, y)$, solve for $y$ as a function of $t$:
$y_{i+1} = y_i + f(t_i, y_i)\,\Delta t$, where the slope is $f(t_i, y_i)$.
(g) Part 8: Partial differential equations. Given
$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = f(x, y)$, solve for $u$ as a
function of $x$ and $y$.
6. Ordinary Differential Equations (Fig. PT1.2f ). Ordinary differential equations are of
great significance in engineering practice. This is because many physical laws are
couched in terms of the rate of change of a quantity rather than the magnitude of
the quantity itself. Examples range from population-forecasting models (rate of
change of population) to the acceleration of a falling body (rate of change of velocity).
Two types of problems are addressed: initial-value and boundary-value problems. In
addition, the computation of eigenvalues is covered.
7. Partial Differential Equations (Fig. PT1.2g). Partial differential equations are used
to characterize engineering systems where the behavior of a physical quantity is
couched in terms of its rate of change with respect to two or more independent
variables. Examples include the steady-state distribution of temperature on a heated
plate (two spatial dimensions) or the time-variable temperature of a heated rod (time
and one spatial dimension). Two fundamentally different approaches are employed
to solve partial differential equations numerically. In the present text, we will
emphasize finite-difference methods that approximate the solution in a pointwise
fashion (Fig. PT1.2g). However, we will also present an introduction to finite-element
methods, which use a piecewise approach.
PT1.3 ORIENTATION
Some orientation might be helpful before proceeding with our introduction to nu-
merical methods. The following is intended as an overview of the material in Part One.
In addition, some objectives have been included to focus your efforts when studying
the material.
PT1.3.1 Scope and Preview
Figure PT1.3 is a schematic representation of the material in Part One. We have designed
this diagram to provide you with a global overview of this part of the book. We believe
that a sense of the “big picture” is critical to developing insight into numerical methods.
When reading a text, it is often possible to become lost in technical details. Whenever
you feel that you are losing the big picture, refer back to Fig. PT1.3 to reorient yourself.
Every part of this book includes a similar figure.
Figure PT1.3 also serves as a brief preview of the material covered in Part One.
Chapter 1 is designed to orient you to numerical methods and to provide motivation by
demonstrating how these techniques can be used in the engineering modeling process.
Chapter 2 is an introduction and review of computer-related aspects of numerical meth-
ods and suggests the level of computer skills you should acquire to efficiently apply
succeeding information. Chapters 3 and 4 deal with the important topic of error analysis,
which must be understood for the effective use of numerical methods. In addition, an
epilogue is included that introduces the trade-offs that have such great significance for
the effective implementation of numerical methods.
FIGURE PT1.3
Schematic of the organization of the material in Part One: Modeling, Computers, and Error Analysis.
PART 1 Modeling, Computers, and Error Analysis: PT 1.1 Motivation; PT 1.2 Mathematical background; PT 1.3 Orientation
CHAPTER 1 Mathematical Modeling and Engineering Problem Solving: 1.1 A simple model; 1.2 Conservation laws
CHAPTER 2 Programming and Software: 2.1 Packages and programming; 2.2 Structured programming; 2.3 Modular programming; 2.4 Excel; 2.5 MATLAB; 2.6 Mathcad; 2.7 Languages and libraries
CHAPTER 3 Approximations and Round-Off Errors: 3.1 Significant figures; 3.2 Accuracy and precision; 3.3 Error definitions; 3.4 Round-off errors
CHAPTER 4 Truncation Errors and the Taylor Series: 4.1 Taylor series; 4.2 Error propagation; 4.3 Total numerical error; 4.4 Miscellaneous errors
EPILOGUE: PT 1.4 Trade-offs; PT 1.5 Important formulas; PT 1.6 Advanced methods
TABLE PT1.1 Specific study objectives for Part One.
1. Recognize the difference between analytical and numerical solutions.
2. Understand how conservation laws are employed to develop mathematical models of physical
systems.
3. Define top-down and modular design.
4. Delineate the rules that underlie structured programming.
5. Be capable of composing structured and modular programs in a high-level computer language.
6. Know how to translate structured flowcharts and pseudocode into code in a high-level language.
7. Start to familiarize yourself with any software packages that you will be using in conjunction with
this text.
8. Recognize the distinction between truncation and round-off errors.
9. Understand the concepts of significant figures, accuracy, and precision.
10. Recognize the difference between true relative error $\varepsilon_t$, approximate relative error $\varepsilon_a$, and
acceptable error $\varepsilon_s$, and understand how $\varepsilon_a$ and $\varepsilon_s$ are used to terminate an iterative computation.
11. Understand how numbers are represented in digital computers and how this representation induces
round-off error. In particular, know the difference between single and extended precision.
12. Recognize how computer arithmetic can introduce and amplify round-off errors in calculations. In
particular, appreciate the problem of subtractive cancellation.
13. Understand how the Taylor series and its remainder are employed to represent continuous functions.
14. Know the relationship between finite divided differences and derivatives.
15. Be able to analyze how errors are propagated through functional relationships.
16. Be familiar with the concepts of stability and condition.
17. Familiarize yourself with the trade-offs outlined in the Epilogue of Part One.
PT1.3.2 Goals and Objectives
Study Objectives. Upon completing Part One, you should be adequately prepared to
embark on your studies of numerical methods. In general, you should have gained a
fundamental understanding of the importance of computers and the role of approxima-
tions and errors in the implementation and development of numerical methods. In addi-
tion to these general goals, you should have mastered each of the specific study objectives
listed in Table PT1.1.
Computer Objectives. Upon completing Part One, you should have mastered sufficient
computer skills to develop your own software for the numerical methods in this text. You
should be able to develop well-structured and reliable computer programs on the basis
of pseudocode, flowcharts, or other forms of algorithms. You should have developed the
capability to document your programs so that they may be effectively employed by users.
Finally, in addition to your own programs, you may be using software packages along
with this book. Packages like Excel, Mathcad, or The MathWorks, Inc. MATLAB® program
are examples of such software. You should become familiar with these packages,
so that you will be comfortable using them to solve numerical problems later in the text.
CHAPTER 1
Mathematical Modeling and
Engineering Problem Solving
Knowledge and understanding are prerequisites for the effective implementation of any
tool. No matter how impressive your tool chest, you will be hard-pressed to repair a car
if you do not understand how it works.
This is particularly true when using computers to solve engineering problems.
Although they have great potential utility, computers are practically useless without a
fundamental understanding of how engineering systems work.
This understanding is initially gained by empirical means—that is, by observation
and experiment. However, while such empirically derived information is essential, it is
only half the story. Over years and years of observation and experiment, engineers and
scientists have noticed that certain aspects of their empirical studies occur repeatedly.
Such general behavior can then be expressed as fundamental laws that essentially embody
the cumulative wisdom of past experience. Thus, most engineering problem solving
employs the two-pronged approach of empiricism and theoretical analysis (Fig. 1.1).
It must be stressed that the two prongs are closely coupled. As new measurements are
taken, the generalizations may be modified or new ones developed. Similarly, the general-
izations can have a strong influence on the experiments and observations. In particular,
generalizations can serve as organizing principles that can be employed to synthesize ob-
servations and experimental results into a coherent and comprehensive framework from
which conclusions can be drawn. From an engineering problem-solving perspective, such
a framework is most useful when it is expressed in the form of a mathematical model.
The primary objective of this chapter is to introduce you to mathematical modeling
and its role in engineering problem solving. We will also illustrate how numerical meth-
ods figure in the process.
1.1 A SIMPLE MATHEMATICAL MODEL
A mathematical model can be broadly defined as a formulation or equation that expresses
the essential features of a physical system or process in mathematical terms. In a very
general sense, it can be represented as a functional relationship of the form
$$\text{Dependent variable} = f(\text{independent variables}, \text{parameters}, \text{forcing functions}) \tag{1.1}$$
where the dependent variable is a characteristic that usually reflects the behavior or state
of the system; the independent variables are usually dimensions, such as time and space,
along which the system’s behavior is being determined; the parameters are reflective of
the system’s properties or composition; and the forcing functions are external influences
acting upon the system.
The actual mathematical expression of Eq. (1.1) can range from a simple algebraic
relationship to large complicated sets of differential equations. For example, on the
basis of his observations, Newton formulated his second law of motion, which states
that the time rate of change of momentum of a body is equal to the resultant force
acting on it. The mathematical expression, or model, of the second law is the well-
known equation
$$F = ma \tag{1.2}$$
where $F$ = net force acting on the body (N, or kg m/s²), $m$ = mass of the object (kg),
and $a$ = its acceleration (m/s²).
FIGURE 1.1
The engineering problem-solving process. (Boxes in the diagram: problem definition; mathematical model, informed by theory and data; problem-solving tools: computers, statistics, numerical methods, graphics, etc.; numeric or graphic results; societal interfaces: scheduling, optimization, communication, public interaction, etc.; implementation.)
The second law can be recast in the format of Eq. (1.1) by merely dividing both
sides by $m$ to give
$$a = \frac{F}{m} \tag{1.3}$$
where $a$ = the dependent variable reflecting the system’s behavior, $F$ = the forcing
function, and $m$ = a parameter representing a property of the system. Note that for this
simple case there is no independent variable because we are not yet predicting how
acceleration varies in time or space.
Equation (1.3) has several characteristics that are typical of mathematical models of
the physical world:
1. It describes a natural process or system in mathematical terms.
2. It represents an idealization and simplification of reality. That is, the model ignores
negligible details of the natural process and focuses on its essential manifestations.
Thus, the second law does not include the effects of relativity that are of minimal
importance when applied to objects and forces that interact on or about the earth’s
surface at velocities and on scales visible to humans.
3. Finally, it yields reproducible results and, consequently, can be used for predictive
purposes. For example, if the force on an object and the mass of an object are known,
Eq. (1.3) can be used to compute acceleration.
Because of its simple algebraic form, the solution of Eq. (1.2) can be obtained eas-
ily. However, other mathematical models of physical phenomena may be much more
complex, and either cannot be solved exactly or require more sophisticated mathematical
techniques than simple algebra for their solution. To illustrate a more complex model of
this kind, Newton’s second law can be used to determine the terminal velocity of a free-
falling body near the earth’s surface. Our falling body will be a parachutist (Fig. 1.2). A
model for this case can be derived by expressing the acceleration as the time rate of
change of the velocity ($dv/dt$) and substituting it into Eq. (1.3) to yield
$$\frac{dv}{dt} = \frac{F}{m} \tag{1.4}$$
where $v$ is velocity (m/s) and $t$ is time (s). Thus, the mass multiplied by the rate of
change of the velocity is equal to the net force acting on the body. If the net force is
positive, the object will accelerate. If it is negative, the object will decelerate. If the net
force is zero, the object’s velocity will remain at a constant level.
Next, we will express the net force in terms of measurable variables and parameters. For
a body falling within the vicinity of the earth (Fig. 1.2), the net force is composed of two
opposing forces: the downward pull of gravity $F_D$ and the upward force of air resistance $F_U$:
$$F = F_D + F_U \tag{1.5}$$
If the downward force is assigned a positive sign, the second law can be used to formulate
the force due to gravity, as
$$F_D = mg \tag{1.6}$$
where $g$ = the gravitational constant, or the acceleration due to gravity, which is approximately
equal to 9.81 m/s².
FIGURE 1.2
Schematic diagram of the forces acting on a falling parachutist. $F_D$ is the downward
force due to gravity. $F_U$ is the upward force due to air resistance.
Air resistance can be formulated in a variety of ways. A simple approach is to assume
that it is linearly proportional to velocity¹ and acts in an upward direction, as in
$$F_U = -cv \tag{1.7}$$
where $c$ = a proportionality constant called the drag coefficient (kg/s). Thus, the greater
the fall velocity, the greater the upward force due to air resistance. The parameter c
accounts for properties of the falling object, such as shape or surface roughness, that
affect air resistance. For the present case, c might be a function of the type of jumpsuit
or the orientation used by the parachutist during free-fall.
The net force is the difference between the downward and upward force. Therefore,
Eqs. (1.4) through (1.7) can be combined to yield
$$\frac{dv}{dt} = \frac{mg - cv}{m} \tag{1.8}$$
or simplifying the right side,
$$\frac{dv}{dt} = g - \frac{c}{m} v \tag{1.9}$$
Equation (1.9) is a model that relates the acceleration of a falling object to the forces
acting on it. It is a differential equation because it is written in terms of the differential
rate of change ($dv/dt$) of the variable that we are interested in predicting. However, in
contrast to the solution of Newton’s second law in Eq. (1.3), the exact solution of
Eq. (1.9) for the velocity of the falling parachutist cannot be obtained using simple
algebraic manipulation. Rather, more advanced techniques, such as those of calculus,
must be applied to obtain an exact or analytical solution. For example, if the parachutist
is initially at rest ($v = 0$ at $t = 0$), calculus can be used to solve Eq. (1.9) for
$$v(t) = \frac{gm}{c}\left(1 - e^{-(c/m)t}\right) \tag{1.10}$$
Note that Eq. (1.10) is cast in the general form of Eq. (1.1), where $v(t)$ = the dependent
variable, $t$ = the independent variable, $c$ and $m$ = parameters, and $g$ = the forcing function.
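The calculus step is not spelled out in the text. As a sketch, one route from Eq. (1.9) to Eq. (1.10) is separation of variables, assuming that $g$, $c$, and $m$ are constant and that $g - (c/m)v > 0$ throughout the fall. The integration constant $K$ is a symbol introduced here purely for illustration.

```latex
\begin{align*}
\frac{dv}{dt} &= g - \frac{c}{m}v
  && \text{Eq. (1.9)} \\
\int \frac{dv}{g - (c/m)v} &= \int dt
  && \text{separate variables} \\
-\frac{m}{c}\ln\!\left(g - \frac{c}{m}v\right) &= t + K
  && \text{integrate both sides} \\
K &= -\frac{m}{c}\ln g
  && \text{apply } v = 0 \text{ at } t = 0 \\
\ln\!\left(\frac{g - (c/m)v}{g}\right) &= -\frac{c}{m}t
  && \text{combine the logarithms} \\
v(t) &= \frac{gm}{c}\left(1 - e^{-(c/m)t}\right)
  && \text{Eq. (1.10)}
\end{align*}
```

Differentiating the last line and substituting it into Eq. (1.9) is a quick way to confirm that it satisfies the differential equation.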
EXAMPLE 1.1 Analytical Solution to the Falling Parachutist Problem
Problem Statement. A parachutist of mass 68.1 kg jumps out of a stationary hot air
balloon. Use Eq. (1.10) to compute velocity prior to opening the chute. The drag coefficient
is equal to 12.5 kg/s.
Solution. Inserting the parameters into Eq. (1.10) yields
$$v(t) = \frac{9.81(68.1)}{12.5}\left(1 - e^{-(12.5/68.1)t}\right) = 53.44\left(1 - e^{-0.18355t}\right)$$
which can be used to compute
t, s    v, m/s
0       0.00
2       16.42
4       27.80
6       35.68
8       41.14
10      44.92
12      47.54
∞       53.44
¹ In fact, the relationship is actually nonlinear and might better be represented by a power relationship such as
$F_U = -cv^2$. We will explore how such nonlinearities affect the model in problems at the end of this chapter.
According to the model, the parachutist accelerates rapidly (Fig. 1.3). A velocity of 44.92
m/s is attained after 10 s. Note also that after a sufficiently long time, a constant veloc-
ity, called the terminal velocity, of 53.44 m/s is reached. This velocity is constant because,
eventually, the force of gravity will be in balance with the air resistance. Thus, the net
force is zero and acceleration has ceased.
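The table in Example 1.1 can be reproduced with a few lines of code. The following is a minimal Python sketch, not part of the original text; the function name `analytical_velocity` and its default arguments are illustrative choices, with the parameter values taken from the example.

```python
import math


def analytical_velocity(t, m=68.1, c=12.5, g=9.81):
    """Eq. (1.10): velocity (m/s) at time t (s) for a jumper starting from rest.

    m is mass (kg), c is the linear drag coefficient (kg/s),
    and g is the acceleration due to gravity (m/s^2).
    """
    return (g * m / c) * (1.0 - math.exp(-(c / m) * t))


# Tabulate velocity every 2 s up to t = 12 s, as in Example 1.1.
print("t, s    v, m/s")
for t in range(0, 13, 2):
    print(f"{t:<8d}{analytical_velocity(t):.2f}")

# As t grows without bound the exponential vanishes, leaving g*m/c.
print(f"inf     {9.81 * 68.1 / 12.5:.2f}")
```

The last line evaluates the limiting value $gm/c$ directly, since the exponential term cannot be evaluated at infinite time.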
Equation (1.10) is called an analytical, or exact, solution because it exactly satisfies
the original differential equation. Unfortunately, there are many mathematical models
that cannot be solved exactly. In many of these cases, the only alternative is to develop
a numerical solution that approximates the exact solution.
FIGURE 1.3
The analytical solution to the falling parachutist problem as computed in Example 1.1.
Velocity increases with time and asymptotically approaches a terminal velocity.
(Axes: $t$, s versus $v$, m/s.)
As mentioned previously, numerical methods are those in which the mathematical
problem is reformulated so it can be solved by arithmetic operations. This can be illustrated
for Newton’s second law by realizing that the time rate of change of velocity can be
approximated by (Fig. 1.4):
$$\frac{dv}{dt} \cong \frac{\Delta v}{\Delta t} = \frac{v(t_{i+1}) - v(t_i)}{t_{i+1} - t_i} \tag{1.11}$$
where $\Delta v$ and $\Delta t$ = differences in velocity and time, respectively, computed over finite
intervals, $v(t_i)$ = velocity at an initial time $t_i$, and $v(t_{i+1})$ = velocity at some later time $t_{i+1}$.
Note that $dv/dt \cong \Delta v/\Delta t$ is approximate because $\Delta t$ is finite. Remember from calculus that
$$\frac{dv}{dt} = \lim_{\Delta t \to 0} \frac{\Delta v}{\Delta t}$$
Equation (1.11) represents the reverse process.
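To see what "approximate because the time step is finite" means in practice, the finite difference of Eq. (1.11) can be compared with the true slope for the parachutist. The Python sketch below is illustrative only. It assumes the parameter values of Example 1.1, uses the analytical solution of Eq. (1.10) as the function being differenced, and obtains the true slope $g e^{-(c/m)t}$ by differentiating Eq. (1.10). All function names are hypothetical.

```python
import math

G, M, C = 9.81, 68.1, 12.5  # m/s^2, kg, kg/s (values from Example 1.1)


def v_exact(t):
    """Analytical velocity, Eq. (1.10)."""
    return (G * M / C) * (1.0 - math.exp(-(C / M) * t))


def dvdt_exact(t):
    """True slope, obtained by differentiating Eq. (1.10)."""
    return G * math.exp(-(C / M) * t)


def forward_difference(f, t, dt):
    """Finite divided difference of Eq. (1.11): [f(t + dt) - f(t)] / dt."""
    return (f(t + dt) - f(t)) / dt


t_i = 2.0
print("dt, s    approx slope    true slope    error")
for dt in (2.0, 1.0, 0.5, 0.1, 0.01):
    approx = forward_difference(v_exact, t_i, dt)
    error = abs(approx - dvdt_exact(t_i))
    print(f"{dt:<9}{approx:<16.4f}{dvdt_exact(t_i):<14.4f}{error:.4f}")
```

The error column should shrink as the step is reduced, which is the limit statement above read in reverse.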
FIGURE 1.4
The use of a finite difference to approximate the first derivative of $v$ with respect to $t$.
The true slope is $dv/dt$; the approximate slope is $\Delta v/\Delta t = [v(t_{i+1}) - v(t_i)]/(t_{i+1} - t_i)$.
Equation (1.11) is called a finite divided difference approximation of the derivative
at time $t_i$. It can be substituted into Eq. (1.9) to give
$$\frac{v(t_{i+1}) - v(t_i)}{t_{i+1} - t_i} = g - \frac{c}{m} v(t_i)$$
This equation can then be rearranged to yield
$$v(t_{i+1}) = v(t_i) + \left[g - \frac{c}{m} v(t_i)\right](t_{i+1} - t_i) \tag{1.12}$$
Notice that the term in brackets is the right-hand side of the differential equation
itself [Eq. (1.9)]. That is, it provides a means to compute the rate of change or slope of $v$.
Thus, the differential equation has been transformed into an equation that can be used
to determine the velocity algebraically at $t_{i+1}$ using the slope and previous values of
$v$ and $t$. If you are given an initial value for velocity at some time $t_i$, you can easily compute
velocity at a later time $t_{i+1}$. This new value of velocity at $t_{i+1}$ can in turn be employed
to extend the computation to velocity at $t_{i+2}$ and so on. Thus, at any time along the way,
New value = old value + slope × step size
Note that this approach is formally called Euler’s method.
EXAMPLE 1.2 Numerical Solution to the Falling Parachutist Problem
Problem Statement. Perform the same computation as in Example 1.1 but use Eq. (1.12)
to compute the velocity. Employ a step size of 2 s for the calculation.
Solution. At the start of the computation ($t_i = 0$), the velocity of the parachutist is
zero. Using this information and the parameter values from Example 1.1, Eq. (1.12) can
be used to compute velocity at $t_{i+1} = 2$ s:
$$v = 0 + \left[9.81 - \frac{12.5}{68.1}(0)\right] 2 = 19.62 \text{ m/s}$$
For the next interval (from $t = 2$ to 4 s), the computation is repeated, with the result
$$v = 19.62 + \left[9.81 - \frac{12.5}{68.1}(19.62)\right] 2 = 32.04 \text{ m/s}$$
The calculation is continued in a similar fashion to obtain additional values:
t, s    v, m/s
0       0.00
2       19.62
4       32.04
6       39.90
8       44.87
10      48.02
12      50.01
∞       53.44
The results are plotted in Fig. 1.5 along with the exact solution. It can be seen that
the numerical method captures the essential features of the exact solution. However, be-
cause we have employed straight-line segments to approximate a continuously curving
function, there is some discrepancy between the two results. One way to minimize such
discrepancies is to use a smaller step size. For example, applying Eq. (1.12) at 1-s intervals
results in a smaller error, as the straight-line segments track closer to the true solution.
Using hand calculations, the effort associated with using smaller and smaller step sizes
would make such numerical solutions impractical. However, with the aid of the computer,
large numbers of calculations can be performed easily. Thus, you can accurately model the
velocity of the falling parachutist without having to solve the differential equation exactly.
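The hand calculation of Example 1.2 translates directly into a short loop. What follows is a minimal Python sketch of Eq. (1.12) under the assumptions of the example (linear drag, constant step size, jumper initially at rest). The function name `euler_velocity` and its argument names are illustrative and do not come from the text.

```python
def euler_velocity(t_end, dt, m=68.1, c=12.5, g=9.81, v0=0.0):
    """Step Eq. (1.12) from t = 0 to t_end and return a list of (t, v) pairs.

    Each pass applies: new value = old value + slope * step size,
    where the slope is the right-hand side of Eq. (1.9).
    """
    n_steps = round(t_end / dt)
    t, v = 0.0, v0
    history = [(t, v)]
    for _ in range(n_steps):
        slope = g - (c / m) * v   # dv/dt evaluated at the start of the step
        v = v + slope * dt        # Eq. (1.12)
        t = t + dt
        history.append((t, v))
    return history


# Reproduce the table of Example 1.2 (step size of 2 s).
print("t, s    v, m/s")
for t, v in euler_velocity(t_end=12.0, dt=2.0):
    print(f"{t:<8.0f}{v:.2f}")
```

Calling `euler_velocity(12.0, 1.0)` instead gives the 1-s results mentioned above; only the `dt` argument changes.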
As in the previous example, a computational price must be paid for a more accurate
numerical result. Each halving of the step size to attain more accuracy leads to a doubling
of the number of computations. Thus, we see that there is a trade-off between accuracy
and computational effort. Such trade-offs figure prominently in numerical methods and
constitute an important theme of this book. Consequently, we have devoted the Epilogue
of Part One to an introduction to more of these trade-offs.
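The trade-off can be made concrete by counting steps and measuring the error at a fixed time as the step size is halved. The Python sketch below is illustrative and rests on assumptions not in the text: it uses the parameters of Example 1.1, compares at $t = 10$ s, and the function names are hypothetical.

```python
import math

G, M, C = 9.81, 68.1, 12.5  # m/s^2, kg, kg/s (values from Example 1.1)


def v_exact(t):
    """Analytical velocity, Eq. (1.10)."""
    return (G * M / C) * (1.0 - math.exp(-(C / M) * t))


def euler_final(t_end, dt):
    """Apply Eq. (1.12) repeatedly; return (velocity at t_end, steps taken)."""
    n_steps = round(t_end / dt)
    v = 0.0
    for _ in range(n_steps):
        v += (G - (C / M) * v) * dt
    return v, n_steps


t_end = 10.0
print("dt, s    steps    v, m/s    abs error, m/s")
for dt in (2.0, 1.0, 0.5, 0.25):
    v, n = euler_final(t_end, dt)
    print(f"{dt:<9}{n:<9d}{v:<10.2f}{abs(v - v_exact(t_end)):.2f}")
```

Each row should take twice as many steps as the row above it while showing a smaller error, which is the accuracy versus effort exchange described here.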
1.2 CONSERVATION LAWS AND ENGINEERING
Aside from Newton’s second law, there are other major organizing principles in engineering.
Among the most important of these are the conservation laws. Although they form the
basis for a variety of complicated and powerful mathematical models, the great conserva-
tion laws of science and engineering are conceptually easy to understand. They all boil
down to
$$\text{Change} = \text{increases} - \text{decreases} \tag{1.13}$$
This is precisely the format that we employed when using Newton’s law to develop a
force balance for the falling parachutist [Eq. (1.8)].
Although simple, Eq. (1.13) embodies one of the most fundamental ways in which
conservation laws are used in engineering—that is, to predict changes with respect to
time. We give Eq. (1.13) the special name time-variable (or transient) computation.
Aside from predicting changes, another way in which conservation laws are applied
is for cases where change is nonexistent. If change is zero, Eq. (1.13) becomes
$$\text{Change} = 0 = \text{increases} - \text{decreases}$$
or
$$\text{Increases} = \text{decreases} \tag{1.14}$$
FIGURE 1.5
Comparison of the numerical and analytical solutions for the falling parachutist problem.
(Axes: $t$, s versus $v$, m/s. Curves: the exact, analytical solution and the approximate, numerical solution, with the terminal velocity marked.)
Thus, if no change occurs, the increases and decreases must be in balance. This case,
which is also given a special name—the steady-state computation—has many applica-
tions in engineering. For example, for steady-state incompressible fluid flow in pipes, the
flow into a junction must be balanced by flow going out, as in
$$\text{Flow in} = \text{flow out}$$
For the junction in Fig. 1.6, the balance can be used to compute that the flow out of the
fourth pipe must be 60.
For the falling parachutist, steady-state conditions would correspond to the case
where the net force was zero, or [Eq. (1.8) with $dv/dt = 0$]
$$mg = cv \tag{1.15}$$
Thus, at steady state, the downward and upward forces are in balance, and Eq. (1.15)
can be solved for the terminal velocity
$$v = \frac{mg}{c}$$
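Both steady-state balances above reduce to one line of arithmetic. The following Python sketch is illustrative only: the function names are hypothetical, the pipe flows are those shown in Fig. 1.6, and the parachutist parameters are taken from Example 1.1.

```python
def unknown_outflow(inflows, known_outflows):
    """Steady-state balance, Eq. (1.14): flow in = flow out.

    Returns the single outflow needed to close the balance.
    """
    return sum(inflows) - sum(known_outflows)


def terminal_velocity(m, c, g=9.81):
    """Eq. (1.15) solved for v: the velocity at which mg = cv."""
    return m * g / c


# Junction of Fig. 1.6: pipes 1 and 2 flow in, pipe 3 flows out, pipe 4 unknown.
print(unknown_outflow(inflows=[100, 80], known_outflows=[120]))  # prints 60

# Parachutist of Example 1.1.
print(round(terminal_velocity(m=68.1, c=12.5), 2))  # prints 53.44
```

The second result matches the limiting value in the table of Example 1.1, as expected, since the analytical solution approaches $mg/c$ for large $t$.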
Although Eqs. (1.13) and (1.14) might appear trivially simple, they embody the two
fundamental ways that conservation laws are employed in engineering. As such, they will
form an important part of our efforts in subsequent chapters to illustrate the connection
between numerical methods and engineering. Our primary vehicles for making this con-
nection are the engineering applications that appear at the end of each part of this book.
Table 1.1 summarizes some of the simple engineering models and associated conserva-
tion laws that will form the basis for many of these engineering applications. Most of the
chemical engineering applications will focus on mass balances for reactors. The mass balance
is derived from the conservation of mass. It specifies that the change of mass of a chemical
in the reactor depends on the amount of mass flowing in minus the mass flowing out.
Both the civil and mechanical engineering applications will focus on models devel-
oped from the conservation of momentum. For civil engineering, force balances are
utilized to analyze structures such as the simple truss in Table 1.1. The same principles
are employed for the mechanical engineering applications to analyze the transient
up-and-down motion or vibrations of an automobile.
FIGURE 1.6
A flow balance for steady-state incompressible fluid flow at the junction of pipes.
(Pipe 1: flow in = 100; Pipe 2: flow in = 80; Pipe 3: flow out = 120; Pipe 4: flow out = ?)
TABLE 1.1 Devices and types of balances that are commonly used in the four major areas of engineering. For each
case, the conservation law upon which the balance is based is specified.
Field | Device | Organizing Principle | Mathematical Expression
Chemical engineering | Reactors | Conservation of mass | Mass balance: over a unit of time period, $\Delta$mass = inputs − outputs
Civil engineering | Structure | Conservation of momentum | Force balance: at each node, $\sum$ horizontal forces ($F_H$) = 0 and $\sum$ vertical forces ($F_V$) = 0
Mechanical engineering | Machine | Conservation of momentum | Force balance: $m\,d^2x/dt^2$ = downward force − upward force
Electrical engineering | Circuit | Conservation of charge | Current balance: for each node, $\sum$ current ($i$) = 0
Electrical engineering | Circuit | Conservation of energy | Voltage balance: around each loop, $\sum$ emf’s − $\sum$ voltage drops for resistors = 0, that is, $\sum \xi - \sum iR = 0$
TABLE 1.2 Some practical issues that will be explored in the engineering applications
at the end of each part of this book.
1. Nonlinear versus linear. Much of classical engineering depends on linearization to permit analytical
solutions. Although this is often appropriate, expanded insight can often be gained if nonlinear
problems are examined.
2. Large versus small systems. Without a computer, it is often not feasible to examine systems with over
three interacting components. With computers and numerical methods, more realistic multicomponent
systems can be examined.
3. Nonideal versus ideal. Idealized laws abound in engineering. Often there are nonidealized
alternatives that are more realistic but more computationally demanding. Approximate numerical
approaches can facilitate the application of these nonideal relationships.
4. Sensitivity analysis. Because they are so involved, many manual calculations require a great deal of
time and effort for successful implementation. This sometimes discourages the analyst from
implementing the multiple computations that are necessary to examine how a system responds under
different conditions. Such sensitivity analyses are facilitated when numerical methods allow the
computer to assume the computational burden.
5. Design. It is often a straightforward proposition to determine the performance of a system as a
function of its parameters. It is usually more difficult to solve the inverse problem—that is, determining
the parameters when the required performance is specified. Numerical methods and computers often
permit this task to be implemented in an efficient manner.
Finally, the electrical engineering applications employ both current and energy bal-
ances to model electric circuits. The current balance, which results from the conservation
of charge, is similar in spirit to the flow balance depicted in Fig. 1.6. Just as flow must
balance at the junction of pipes, electric current must balance at the junction of electric
wires. The energy balance specifies that the changes of voltage around any loop of the
circuit must add up to zero. The engineering applications are designed to illustrate how
numerical methods are actually employed in the engineering problem-solving process.
As such, they will permit us to explore practical issues (Table 1.2) that arise in real-world
applications. Making these connections between mathematical techniques such as nu-
merical methods and engineering practice is a critical step in tapping their true potential.
Careful examination of the engineering applications will help you to take this step.
PROBLEMS
1.1 Use calculus to solve Eq. (1.9) for the case where the initial
velocity, $v(0)$, is nonzero.
1.2 Repeat Example 1.2. Compute the velocity to $t = 8$ s, with a
step size of (a) 1 and (b) 0.5 s. Can you make any statement regarding
the errors of the calculation based on the results?
1.3 Rather than the linear relationship of Eq. (1.7), you might
choose to model the upward force on the parachutist as a second-order
relationship,
$$F_U = -c'v^2$$
where $c'$ = a bulk second-order drag coefficient (kg/m).
(a) Using calculus, obtain the closed-form solution for the case
where the jumper is initially at rest ($v = 0$ at $t = 0$).
(b) Repeat the numerical calculation in Example 1.2 with the same
initial condition and parameter values, but with second-order
drag. Use a value of 0.22 kg/m for $c'$.
1.4 For the free-falling parachutist with linear drag, assume a first
jumper is 70 kg and has a drag coefficient of 12 kg/s. If a second jumper
has a drag coefficient of 15 kg/s and a mass of 80 kg, how long will it
take him to reach the same velocity the first jumper reached in 9 s?
1.5 Compute the velocity of a free-falling parachutist using Euler’s
method for the case where $m = 80$ kg and $c = 10$ kg/s. Perform the
calculation from $t = 0$ to 20 s with a step size of 1 s. Use an initial
condition that the parachutist has an upward velocity of 20 m/s at
$t = 0$. At $t = 10$ s, assume that the chute is instantaneously deployed
so that the drag coefficient jumps to 60 kg/s.
1.6 The following information is available for a bank account:
Date    Deposits    Withdrawals    Interest    Balance
5/1                                            1522.33
        220.13      327.26
6/1
        216.80      378.51
7/1
        450.35      106.80
8/1
        127.31      350.61
9/1
Note that the money earns interest which is computed as
$$\text{Interest} = i B_i$$
where $i$ = the interest rate expressed as a fraction per month, and $B_i$ =
the initial balance at the beginning of the month.
(a) Use the conservation of cash to compute the balance on 6/1,
7/1, 8/1, and 9/1 if the interest rate is 1% per month ($i$ =
0.01/month). Show each step in the computation.
(b) Write a differential equation for the cash balance in the form
$$\frac{dB}{dt} = f(D(t), W(t), i)$$
where $t$ = time (months), $D(t)$ = deposits as a function of time
($/month), $W(t)$ = withdrawals as a function of time ($/month).
For this case, assume that interest is compounded continuously;
that is, interest = $iB$.
(c) Use Euler’s method with a time step of 0.5 month to simulate
the balance. Assume that the deposits and withdrawals are applied
uniformly over the month.
(d) Develop a plot of balance versus time for (a) and (c).
1.7 The amount of a uniformly distributed radioactive contaminant
contained in a closed reactor is measured by its concentration c
(becquerel/liter or Bq/L). The contaminant decreases at a decay
rate proportional to its concentration—that is,
$$\text{decay rate} = -kc$$
where $k$ is a constant with units of day$^{-1}$. Therefore, according to
Eq. (1.13), a mass balance for the reactor can be written as
$$\frac{dc}{dt} = -kc$$
(change in mass) = (decrease by decay)
(a) Use Euler’s method to solve this equation from $t = 0$ to 1 d
with $k = 0.175$ d$^{-1}$. Employ a step size of $\Delta t = 0.1$. The concentration
at $t = 0$ is 100 Bq/L.
(b) Plot the solution on a semilog graph (i.e., ln c versus t) and
determine the slope. Interpret your results.
1.8 A group of 35 students attend a class in a room that measures
11 m by 8 m by 3 m. Each student takes up about 0.075 m³ and
gives out about 80 W of heat (1 W = 1 J/s). Calculate the air
temperature rise during the first 20 minutes of the class if the room is
completely sealed and insulated. Assume the heat capacity, C_v, for
air is 0.718 kJ/(kg K). Assume air is an ideal gas at 20°C and
101.325 kPa. Note that the heat absorbed by the air Q is related to
the mass of the air m, the heat capacity, and the change in
temperature by the following relationship:
$$Q = m \int_{T_1}^{T_2} C_v \, dT = m C_v (T_2 - T_1)$$
The mass of air can be obtained from the ideal gas law:
$$PV = \frac{m}{\text{Mwt}} RT$$
where P is the gas pressure, V is the volume of the gas, Mwt is the
molecular weight of the gas (for air, 28.97 kg/kmol), and R is the
ideal gas constant [8.314 kPa m³/(kmol K)].
1.9 A storage tank contains a liquid at depth y, where y = 0 when
the tank is half full. Liquid is withdrawn at a constant flow rate Q to
meet demands. The contents are resupplied at a sinusoidal rate
3Q sin²(t).
FIGURE P1.9
Equation (1.13) can be written for this system as
$$\frac{d(Ay)}{dt} = 3Q \sin^2(t) - Q$$
$$(\text{change in volume}) = (\text{inflow}) - (\text{outflow})$$
or, since the surface area A is constant,
$$\frac{dy}{dt} = 3\frac{Q}{A}\sin^2(t) - \frac{Q}{A}$$
Use Euler’s method to solve for the depth y from t = 0 to 10 d with
a step size of 0.5 d. The parameter values are A = 1250 m² and
Q = 450 m³/d. Assume that the initial condition is y = 0.
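One way to organize the Euler calculation for the tank is sketched below in Python. The function name and the list-of-tuples output are illustrative assumptions; the parameter defaults are the values given in the problem.

```python
import math


def tank_depth(A=1250.0, Q=450.0, y0=0.0, dt=0.5, t_end=10.0):
    """Euler's method for dy/dt = 3*(Q/A)*sin(t)**2 - Q/A."""
    n = round(t_end / dt)
    t, y = 0.0, y0
    history = [(t, y)]
    for i in range(n):
        dydt = 3.0 * (Q / A) * math.sin(t) ** 2 - Q / A
        y += dydt * dt          # slope is evaluated at the start of the step
        t = (i + 1) * dt
        history.append((t, y))
    return history


for t, y in tank_depth():
    print(f"t = {t:4.1f} d   y = {y:8.4f} m")
```

At t = 0 the inflow term vanishes, so the first step lowers the depth by (Q/A)Δt = 0.18 m.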
1.10 For the same storage tank described in Prob. 1.9, suppose that
the outflow is not constant but rather depends on the depth. For this
case, the differential equation for depth can be written as
$$\frac{dy}{dt} = 3\frac{Q}{A}\sin^2(t) - \frac{\alpha(1 + y)^{1.5}}{A}$$
Use Euler’s method to solve for the depth y from t = 0 to 10 d with a step
size of 0.5 d. The parameter values are A = 1250 m², Q = 450 m³/d,
and α = 150. Assume that the initial condition is y = 0.
1.11 Apply the conservation of volume (see Prob. 1.9) to simulate
the level of liquid in a conical storage tank (Fig. P1.11). The liquid
flows in at a sinusoidal rate of Q_in = 3 sin²(t) and flows out
according to
$$Q_\text{out} = \begin{cases} 3(y - y_\text{out})^{1.5} & y > y_\text{out} \\ 0 & y \le y_\text{out} \end{cases}$$
where flow has units of m³/d and y = the elevation of the water
surface above the bottom of the tank (m). Use Euler’s method to solve
for the depth y from t = 0 to 10 d with a step size of 0.5 d. The
parameter values are r_top = 2.5 m, y_top = 4 m, and y_out = 1 m. Assume
that the level is initially below the outlet pipe with y(0) = 0.8 m.
FIGURE P1.11 (labels: y_top, y, y_out, 0, Q_in, Q_out, s, 1, r_top)
1.12 In our example of the free-falling parachutist, we assumed that
the acceleration due to gravity was a constant value. Although this is
a decent approximation when we are examining falling objects near
the surface of the earth, the gravitational force decreases as we move
above sea level. A more general representation based on Newton’s
inverse square law of gravitational attraction can be written as
$$g(x) = g(0)\frac{R^2}{(R + x)^2}$$
where g(x) = gravitational acceleration at altitude x (in m)
measured upward from the earth’s surface (m/s²), g(0) = gravitational
acceleration at the earth’s surface (≅ 9.81 m/s²), and R = the earth’s
radius (≅ 6.37 × 10⁶ m).
(a) In a fashion similar to the derivation of Eq. (1.9), use a force
balance to derive a differential equation for velocity as a
function of time that utilizes this more complete representation of
gravitation. However, for this derivation, assume that upward
velocity is positive.
(b) For the case where drag is negligible, use the chain rule to
express the differential equation as a function of altitude rather
than time. Recall that the chain rule is
$$\frac{dv}{dt} = \frac{dv}{dx}\frac{dx}{dt}$$
(c) Use calculus to obtain the closed form solution where v = v_0 at
x = 0.
(d) Use Euler’s method to obtain a numerical solution from x = 0
to 100,000 m using a step of 10,000 m where the initial velocity
is 1500 m/s upward. Compare your result with the analytical
solution.
1.13 Suppose that a spherical droplet of liquid evaporates at a rate
that is proportional to its surface area,
$$\frac{dV}{dt} = -kA$$
where V = volume (mm³), t = time (min), k = the evaporation rate
(mm/min), and A = surface area (mm²). Use Euler’s method to
compute the volume of the droplet from t = 0 to 10 min using a step
size of 0.25 min. Assume that k = 0.08 mm/min and that the droplet
initially has a radius of 2.5 mm. Assess the validity of your results
by determining the radius of your final computed volume and
verifying that it is consistent with the evaporation rate.
1.14 Newton’s law of cooling says that the temperature of a body
changes at a rate proportional to the difference between its
temperature and that of the surrounding medium (the ambient
temperature),
$$\frac{dT}{dt} = -k(T - T_a)$$
where T = the temperature of the body (°C), t = time (min),
k = the proportionality constant (per minute), and T_a = the
ambient temperature (°C). Suppose that a cup of coffee originally has
a temperature of 70°C. Use Euler’s method to compute the
temperature from t = 0 to 10 min using a step size of 2 min if
T_a = 20°C and k = 0.019/min.
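The cooling calculation can be sketched in Python as follows. The function name is hypothetical. The comparison value uses the standard analytical solution of this linear equation, T = T_a + (T_0 − T_a)e^(−kt), which is not stated in the problem and is included here only as a check.

```python
import math


def euler_cooling(T0, Ta, k, dt, t_end):
    """Euler's method for dT/dt = -k*(T - Ta); returns the list of T values."""
    n = round(t_end / dt)
    temps = [T0]
    for _ in range(n):
        T = temps[-1]
        temps.append(T + (-k * (T - Ta)) * dt)
    return temps


temps = euler_cooling(T0=70.0, Ta=20.0, k=0.019, dt=2.0, t_end=10.0)

# Analytical value at t = 10 min, used only for comparison.
exact = 20.0 + 50.0 * math.exp(-0.019 * 10.0)

for step, T in enumerate(temps):
    print(f"t = {2 * step:2d} min   T = {T:.4f} C")
print(f"analytical T(10) = {exact:.4f} C")
```

With this step size the Euler temperatures fall slightly faster than the analytical curve, because each step extrapolates along the steeper slope at the start of the interval.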
1.15 As depicted in Fig. P1.15, an RLC circuit consists of three
elements: a resistor (R), an inductor (L), and a capacitor (C). The
flow of current across each element induces a voltage drop.
FIGURE P1.17 (flow network with flows Q1 through Q10)
1.18 The velocity is equal to the rate of change of distance x (m),
$$\frac{dx}{dt} = v(t) \qquad \text{(P1.18)}$$
(a) Substitute Eq. (1.10) and develop an analytical solution for
distance as a function of time. Assume that x(0) = 0.
(b) Use Euler’s method to numerically integrate Eqs. (P1.18) and
(1.9) in order to determine both the velocity and distance fallen
as a function of time for the first 10 s of free-fall using the same
parameters as in Example 1.2.
(c) Develop a plot of your numerical results together with the ana-
lytical solution.
1.19 You are working as a crime-scene investigator and must
predict the temperature of a homicide victim over a 5-hr period. You
know that the room where the victim was found was at 10°C when
the body was discovered.
(a) Use Newton’s law of cooling (Prob. 1.14) and Euler’s method
to compute the victim’s body temperature for the 5-hr period
using values of k = 0.12/hr and Δt = 0.5 hr. Assume that the
victim’s body temperature at the time of death was 37°C, and
that the room temperature was at a constant value of 10°C over
the 5-hr period.
(b) Further investigation reveals that the room temperature had
actually dropped linearly from 20 to 10°C over the 5-hr period.
Repeat the same calculation as in (a) but incorporate this new
information.
(c) Compare the results from (a) and (b) by plotting them on the
same graph.
1.20 Suppose that a parachutist with linear drag (m = 70 kg,
c = 12.5 kg/s) jumps from an airplane flying at an altitude of a
kilometer with a horizontal velocity of 180 m/s relative to the ground.
(a) Write a system of four differential equations for x, y, v_x = dx/dt,
and v_y = dy/dt.
Kirchhoff’s second voltage law states that the algebraic sum of
these voltage drops around a closed circuit is zero,
$$iR + L\frac{di}{dt} + \frac{q}{C} = 0$$
where i = current, R = resistance, L = inductance, t = time, q = charge,
and C = capacitance. In addition, the current is related to charge as in
$$\frac{dq}{dt} = i$$
(a) If the initial values are i(0) = 0 and q(0) = 1 C, use Euler’s
method to solve this pair of differential equations from t = 0 to
0.1 s using a step size of Δt = 0.01 s. Employ the following
parameters for your calculation: R = 200 Ω, L = 5 H, and
C = 10⁻⁴ F.
(b) Develop a plot of i and q versus t.
FIGURE P1.15 (voltage drops: resistor, iR; inductor, L di/dt; capacitor, q/C)
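For a pair of coupled equations, Euler's method advances both unknowns from the same set of old values. The Python sketch below illustrates this for the circuit, after solving the voltage balance for di/dt = −(iR + q/C)/L. The function name and argument order are illustrative assumptions.

```python
def euler_rlc(R, L, C, i0, q0, dt, n_steps):
    """Euler's method for di/dt = -(R*i + q/C)/L and dq/dt = i."""
    i, q = [i0], [q0]
    for _ in range(n_steps):
        # Both slopes use the values at the start of the step.
        didt = -(R * i[-1] + q[-1] / C) / L
        dqdt = i[-1]
        i.append(i[-1] + didt * dt)
        q.append(q[-1] + dqdt * dt)
    return i, q


current, charge = euler_rlc(R=200.0, L=5.0, C=1e-4,
                            i0=0.0, q0=1.0, dt=0.01, n_steps=10)
for n, (i_n, q_n) in enumerate(zip(current, charge)):
    print(f"t = {n * 0.01:.2f} s   i = {i_n:10.4f} A   q = {q_n:8.4f} C")
```

Note that the charge does not change in the first step, because the initial current is zero; the current responds first and the charge follows one step later.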
1.16 Cancer cells grow exponentially with a doubling time of 20 h
when they have an unlimited nutrient supply. However, as the cells
start to form a solid spherical tumor without a blood supply, growth
at the center of the tumor becomes limited, and eventually cells
start to die.
(a) Exponential growth of cell number N can be expressed as
shown, where μ is the growth rate of the cells. For cancer cells,
find the value of μ.
$$\frac{dN}{dt} = \mu N$$
(b) Write an equation that will describe the rate of change of tumor
volume during exponential growth given that the diameter of
an individual cell is 20 microns.
(c) After a particular type of tumor exceeds 500 microns in diam-
eter, the cells at the center of the tumor die (but continue to take
up space in the tumor). Determine how long it will take for the
tumor to exceed this critical size.
1.17 A fluid is pumped into the network shown in Fig. P1.17. If
Q_2 = 0.6, Q_3 = 0.4, Q_7 = 0.2, and Q_8 = 0.3 m³/s, determine the
other flows.
(b) At steady-state, use this equation to solve for the particle’s
terminal velocity.
(c) Employ the result of (b) to compute the particle’s terminal
velocity in m/s for a spherical silt particle settling in water:
d = 10 μm, ρ = 1 g/cm³, ρ_s = 2.65 g/cm³, and μ = 0.014 g/(cm·s).
(d) Check whether flow is laminar.
(e) Use Euler’s method to compute the velocity from t = 0 to 2⁻¹⁵ s
with Δt = 2⁻¹⁸ s using the parameters given previously along
with the initial condition: v(0) = 0.
FIGURE P1.22 (forces on the particle of diameter d: F_G, F_D, F_B)
1.23 As described in Prob. 1.22, in addition to the downward force
of gravity (weight) and drag, an object falling through a fluid is also
subject to a buoyancy force that is proportional to the displaced
volume. For example, for a sphere with diameter d (m), the sphere’s
volume is V = πd³/6 and its projected area is A = πd²/4. The
buoyancy force can then be computed as F_b = −ρVg. We neglected
buoyancy in our derivation of Eq. (1.9) because it is relatively small
for an object like a parachutist moving through air. However, for a
denser fluid like water, it becomes more prominent.
(a) Derive a differential equation in the same fashion as Eq. (1.9),
but include the buoyancy force and represent the drag force as
described in Prob. 1.21.
(b) Rewrite the differential equation from (a) for the special case
of a sphere.
(c) Use the equation developed in (b) to compute the terminal
velocity (i.e., for the steady-state case). Use the following
parameter values for a sphere falling through water: sphere
diameter = 1 cm, sphere density = 2700 kg/m³, water density =
1000 kg/m³, and C_d = 0.47.
(d) Use Euler’s method with a step size of Δt = 0.03125 s to
numerically solve for the velocity from t = 0 to 0.25 s with an
initial velocity of zero.
1.24 As depicted in Fig. P1.24, the downward deflection y (m) of a
cantilever beam with a uniform load w (N/m) can be computed as
$$y = \frac{w}{24EI}(x^4 - 4Lx^3 + 6L^2x^2)$$
where x = distance (m), E = the modulus of elasticity = 2 × 10¹¹
Pa, I = moment of inertia = 3.25 × 10⁻⁴ m⁴, w = 10,000 N/m, and
(b) If the initial horizontal position is defined as x = 0, use Euler’s
method with Δt = 1 s to compute the jumper’s position over
the first 10 s.
(c) Develop plots of y versus t and y versus x. Use the plot to
graphically estimate when and where the jumper would hit the
ground if the chute failed to open.
1.21 As noted in Prob. 1.3, drag is more accurately represented as
depending on the square of velocity. A more fundamental
representation of the drag force, which assumes turbulent conditions (i.e., a
high Reynolds number), can be formulated as
$$F_d = -\frac{1}{2}\rho A C_d v|v|$$
where F_d = the drag force (N), ρ = fluid density (kg/m³), A = the
frontal area of the object on a plane perpendicular to the direction of motion
(m²), v = velocity (m/s), and C_d = a dimensionless drag coefficient.
(a) Write the pair of differential equations for velocity and position
(see Prob. 1.18) to describe the vertical motion of a sphere with
diameter d (m) and a density of ρ_s (kg/m³). The differential equation
for velocity should be written as a function of the sphere’s diameter.
(b) Use Euler’s method with a step size of Δt = 2 s to compute the
position and velocity of a sphere over the first 14 s. Employ the
following parameters in your calculation: d = 120 cm, ρ = 1.3 kg/m³,
ρ_s = 2700 kg/m³, and C_d = 0.47. Assume that the sphere has
the initial conditions: x(0) = 100 m and v(0) = −40 m/s.
(c) Develop a plot of your results (i.e., position and velocity versus t) and use it
to graphically estimate when the sphere would hit the ground.
(d) Compute the value for the bulk second-order drag coefficient
c_d′ (kg/m). Note that, as described in Prob. 1.3, the bulk
second-order drag coefficient is the term in the final differential
equation for velocity that multiplies the term v|v|.
1.22 As depicted in Fig. P1.22, a spherical particle settling through a
quiescent fluid is subject to three forces: the downward force of gravity
(F_G), and the upward forces of buoyancy (F_B) and drag (F_D). Both the
gravity and buoyancy forces can be computed with Newton’s second
law with the latter equal to the weight of the displaced fluid. For
laminar flow, the drag force can be computed with Stokes’s law,
$$F_D = 3\pi\mu d v$$
where μ = the dynamic viscosity of the fluid (N s/m²), d = the
particle diameter (m), and v = the particle’s settling velocity (m/s).
Note that the mass of the particle can be expressed as the product of
the particle’s volume and density ρ_s (kg/m³) and the mass of the
displaced fluid can be computed as the product of the particle’s volume
and the fluid’s density ρ (kg/m³). The volume of a sphere is πd³/6. In
addition, laminar flow corresponds to the case where the dimensionless
Reynolds number, Re, is less than 1, where Re = ρdv/μ.
(a) Use a force balance for the particle to develop the differential
equation for dv/dt as a function of d, ρ, ρ_s, and μ.
1.26 Beyond fluids, Archimedes’ principle has proven useful in
geology when applied to solids on the earth’s crust. Figure P1.26
depicts one such case where a lighter conical granite mountain
“floats on” a denser basalt layer at the earth’s surface. Note that the
part of the cone below the surface is formally referred to as a
frustum. Develop a steady-state force balance for this case in terms of
the following parameters: basalt’s density (ρ_b), granite’s density
(ρ_g), the cone’s bottom radius (r), and the height above (h_1) and
below (h_2) the earth’s surface.
FIGURE P1.26 (labels: H, Basalt, Granite, h_1, h_2, r_1, r_2)
L = length = 4 m. This equation can be differentiated to yield the
slope of the downward deflection as a function of x:
$$\frac{dy}{dx} = \frac{w}{24EI}(4x^3 - 12Lx^2 + 12L^2x)$$
If y = 0 at x = 0, use this equation with Euler’s method (Δx = 0.125 m)
to compute the deflection from x = 0 to L. Develop a plot of your results
along with the analytical solution computed with the first equation.
FIGURE P1.24
A cantilever beam.
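Here Euler's method marches in x rather than in time. A minimal Python sketch follows, using the parameter values from the problem; the function and constant names are illustrative choices.

```python
W = 10_000.0      # uniform load, N/m
E = 2e11          # modulus of elasticity, Pa
I = 3.25e-4       # moment of inertia, m^4
L = 4.0           # beam length, m


def slope(x):
    """dy/dx from the differentiated deflection formula."""
    return W / (24 * E * I) * (4 * x**3 - 12 * L * x**2 + 12 * L**2 * x)


def deflection_exact(x):
    """Analytical deflection y(x)."""
    return W / (24 * E * I) * (x**4 - 4 * L * x**3 + 6 * L**2 * x**2)


def deflection_euler(dx=0.125):
    """Euler's method in x, starting from y = 0 at x = 0."""
    n = round(L / dx)
    xs, ys = [0.0], [0.0]
    for k in range(n):
        ys.append(ys[-1] + slope(xs[-1]) * dx)
        xs.append((k + 1) * dx)
    return xs, ys


xs, ys = deflection_euler()
print(f"Euler y(L) = {ys[-1]:.6f} m, analytical y(L) = {deflection_exact(L):.6f} m")
```

Because the slope increases monotonically from x = 0 to L, evaluating it at the start of each step makes the Euler deflection fall slightly short of the analytical value.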
1.25 Use Archimedes’ principle to develop a steady-state force
balance for a spherical ball of ice floating in seawater (Fig. P1.25). The
force balance should be expressed as a third-order polynomial (cubic)
in terms of height of the cap above the water line (h), the seawater’s
density (ρ_f), the ball’s density (ρ_s), and the ball’s radius (r).
FIGURE P1.25 (labels: h, r)
CHAPTER 2
Programming and Software
In Chap. 1, we used a net force to develop a mathematical model to predict the fall
velocity of a parachutist. This model took the form of a differential equation,
$$\frac{dv}{dt} = g - \frac{c}{m}v$$
We also learned that a solution to this equation could be obtained by a simple numerical
approach called Euler’s method,
$$v_{i+1} = v_i + \frac{dv_i}{dt}\Delta t$$
Given an initial condition, this equation can be implemented repeatedly to compute
the velocity as a function of time. However, to obtain good accuracy, many small steps
must be taken. This would be extremely laborious and time-consuming to implement by
hand. With the aid of the computer, however, such calculations can be performed easily.
So our next task is to figure out how to do this. The present chapter will introduce
you to how the computer is used as a tool to obtain such solutions.
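To make the repeated calculation concrete, here is a minimal Python sketch of Euler's method applied to the parachutist equation. Python is used purely for illustration. The parameter values (g = 9.81 m/s², c = 12.5 kg/s, m = 68.1 kg) and the function names are assumptions chosen for the example, and the analytical expression is the standard exact solution of this linear equation for zero initial velocity.

```python
import math


def euler_velocity(g, c, m, v0, dt, t_end):
    """Repeatedly apply v_new = v + (g - (c/m)*v)*dt; returns (t, v) pairs."""
    n = round(t_end / dt)
    v = v0
    out = [(0.0, v)]
    for i in range(n):
        dvdt = g - (c / m) * v
        v = v + dvdt * dt
        out.append(((i + 1) * dt, v))
    return out


def analytic_velocity(g, c, m, t):
    """Exact solution of dv/dt = g - (c/m)*v with v(0) = 0."""
    return g * m / c * (1.0 - math.exp(-(c / m) * t))


coarse = euler_velocity(9.81, 12.5, 68.1, 0.0, dt=2.0, t_end=10.0)
fine = euler_velocity(9.81, 12.5, 68.1, 0.0, dt=0.01, t_end=10.0)
print(f"dt = 2 s:    v(10) = {coarse[-1][1]:.3f} m/s")
print(f"dt = 0.01 s: v(10) = {fine[-1][1]:.3f} m/s")
print(f"analytical:  v(10) = {analytic_velocity(9.81, 12.5, 68.1, 10.0):.3f} m/s")
```

The coarse run needs only 5 steps, the fine run 1000. The fine run lands much closer to the analytical value, which is the trade-off between effort and accuracy described above.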
2.1 PACKAGES AND PROGRAMMING
Today, there are two types of software users. On one hand, there are those who take what
they are given. That is, they limit themselves to the capabilities found in the software’s
standard mode of operation. For example, it is a straightforward proposition to solve a
system of linear equations or to generate a plot of x-y values with either Excel or MATLAB
software. Because this usually involves a minimum of effort, most users tend to adopt this
“vanilla” mode of operation. In addition, since the designers of these packages anticipate
most typical user needs, many meaningful problems can be solved in this way.
But what happens when problems arise that are beyond the standard capability of
the tool? Unfortunately, throwing up your hands and saying, “Sorry boss, no can do!” is
not acceptable in most engineering circles. In such cases, you have two alternatives.
First, you can look for a different package and see if it is capable of solving the
problem. That is one of the reasons we have chosen to cover both Excel and MATLAB
in this book. As you will see, neither one is all encompassing and each has different
strengths. By being conversant with both, you will greatly increase the range of problems
you can address.
Second, you can grow and become a “power user” by learning to write Excel VBA¹
macros or MATLAB M-files. And what are these? They are nothing more than computer
programs that allow you to extend the capabilities of these tools. Because engineers should
never be content to be tool limited, they will do whatever is necessary to solve their prob-
lems. A powerful way to do this is to learn to write programs in the Excel and MATLAB
environments. Furthermore, the programming skills required for macros and M-files are the
same as those needed to effectively develop programs in languages like Fortran 90 or C.
The major goal of the present chapter is to show you how this can be done. However,
we do assume that you have been exposed to the rudiments of computer programming.
Therefore, our emphasis here is on facets of programming that directly affect its use in
engineering problem solving.
2.1.1 Computer Programs
Computer programs are merely a set of instructions that direct the computer to perform
a certain task. Since many individuals write programs for a broad range of applications,
most high-level computer languages, like Fortran 90 and C, have rich capabilities.
Although some engineers might need to tap the full range of these capabilities, most
merely require the ability to perform engineering-oriented numerical calculations.
Looked at from this perspective, we can narrow down the complexity to a few
programming topics. These are:
• Simple information representation (constants, variables, and type declarations).
• Advanced information representation (data structure, arrays, and records).
• Mathematical formulas (assignment, priority rules, and intrinsic functions).
• Input/output.
• Logical representation (sequence, selection, and repetition).
• Modular programming (functions and subroutines).
Because we assume that you have had some prior exposure to programming, we will
not spend time on the first four of these areas. At best, we offer them as a checklist that
covers what you will need to know to implement the programs that follow.
However, we will devote some time to the last two topics. We emphasize logical
representation because it is the single area that most influences an algorithm’s coherence
and understandability. We include modular programming because it also contributes
greatly to a program’s organization. In addition, modules provide a means to archive
useful algorithms in a convenient format for subsequent applications.
2.2 STRUCTURED PROGRAMMING
In the early days of computers, programmers usually did not pay much attention to
whether their programs were clear and easy to understand. Today, it is recognized that
there are many benefits to writing organized, well-structured code. Aside from the obvious
benefit of making software much easier to share, it also makes program development much
more efficient. That is, well-structured algorithms are invariably easier to debug
and test, resulting in programs that take a shorter time to develop, test, and update.
¹ VBA is the acronym for Visual Basic for Applications.
Computer scientists have systematically studied the factors and procedures needed
to develop high-quality software of this kind. In essence, structured programming is a
set of rules that prescribe good style habits for the programmer. Although structured
programming is flexible enough to allow considerable creativity and personal expression,
its rules impose enough constraints to render the resulting codes far superior to unstruc-
tured versions. In particular, the finished product is more elegant and easier to understand.
A key idea behind structured programming is that any numerical algorithm can be
composed using the three fundamental control structures: sequence, selection, and rep-
etition. By limiting ourselves to these structures, the resulting computer code will be
clearer and easier to follow.
In the following paragraphs, we will describe each of these structures. To keep this
description generic, we will employ flowcharts and pseudocode. A flowchart is a visual
or graphical representation of an algorithm. The flowchart employs a series of blocks and
arrows, each of which represents a particular operation or step in the algorithm (Fig. 2.1).
The arrows represent the sequence in which the operations are implemented.
Not everyone involved with computer programming agrees that flowcharting is a
productive endeavor. In fact, some experienced programmers do not advocate flow-
charts. However, we feel that there are three good reasons for studying them. First, they
are still used for expressing and communicating algorithms. Second, even if they are
not employed routinely, there will be times when they will prove useful in planning,
unraveling, or communicating the logic of your own or someone else’s program. Finally,
and most important for our purposes, they are excellent pedagogical tools. From a
teaching perspective, they are ideal vehicles for visualizing some of the fundamental
control structures employed in computer programming.
FIGURE 2.1
Symbols used in flowcharts.
NAME                     FUNCTION
Terminal                 Represents the beginning or end of a program.
Flowlines                Represents the flow of logic. The humps on the horizontal arrow indicate that
                         it passes over and does not connect with the vertical flowlines.
Process                  Represents calculations or data manipulations.
Input/output             Represents inputs or outputs of data and information.
Decision                 Represents a comparison, question, or decision that determines alternative
                         paths to be followed.
Junction                 Represents the confluence of flowlines.
Off-page connector       Represents a break that is continued on another page.
Count-controlled loop    Used for loops which repeat a prespecified number of iterations.
An alternative approach to express an algorithm that bridges the gap between flow-
charts and computer code is called pseudocode. This technique uses code-like statements
in place of the graphical symbols of the flowchart. We have adopted some style conventions
for the pseudocode in this book. Keywords such as IF, DO, INPUT, etc., are capitalized,
whereas the conditions, processing steps, and tasks are in lowercase. Additionally, the
processing steps are indented. Thus the keywords form a “sandwich” around the steps
to visually define the extent of each control structure.
One advantage of pseudocode is that it is easier to develop a program with it than
with a flowchart. The pseudocode is also easier to modify and share with others. However,
because of their graphic form, flowcharts sometimes are better suited for visualizing
complex algorithms. In the present text, we will use flowcharts for pedagogical purposes.
Pseudocode will be our principal vehicle for communicating algorithms related to
numerical methods.
2.2.1 Logical Representation
Sequence. The sequence structure expresses the trivial idea that unless you direct it
otherwise, the computer code is to be implemented one instruction at a time. As in Fig. 2.2,
the structure can be expressed generically as a flowchart or as pseudocode.
Selection. In contrast to the step-by-step sequence structure, selection provides a means
to split the program’s flow into branches based on the outcome of a logical condition.
Figure 2.3 shows the two most fundamental ways for doing this.
The single-alternative decision, or IF/THEN structure (Fig. 2.3a), allows for a detour
in the program flow if a logical condition is true. If it is false, nothing happens and the
program moves directly to the next statement following the ENDIF. The double-alternative
decision, or IF/THEN/ELSE structure (Fig. 2.3b), behaves in the same manner for a true
condition. However, if the condition is false, the program implements the code between
the ELSE and the ENDIF.
FIGURE 2.2
(a) Flowchart and (b) pseudocode for the sequence structure.
    Instruction1
    Instruction2
    Instruction3
    Instruction4
Although the IF/THEN and the IF/THEN/ELSE constructs are sufficient to construct
any numerical algorithm, two other variants are commonly used. Suppose that the ELSE
clause of an IF/THEN/ELSE contains another IF/THEN. For such cases, the ELSE and
the IF can be combined in the IF/THEN/ELSEIF structure shown in Fig. 2.4a.
Notice how in Fig. 2.4a there is a chain or “cascade” of decisions. The first one is
the IF statement, and each successive decision is an ELSEIF statement. Going down the
chain, the first condition encountered that tests true will cause a branch to its correspond-
ing code block followed by an exit of the structure. At the end of the chain of conditions,
if all the conditions have tested false, an optional ELSE block can be included.
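The cascade maps directly onto the if/elif/else chain found in many languages. The Python sketch below is only an illustration; the function name and the three categories are made up for the example.

```python
def classify_discriminant(d):
    """Return a description of the roots implied by a discriminant d."""
    if d > 0:                       # IF condition1 THEN
        kind = "two distinct real roots"
    elif d == 0:                    # ELSEIF condition2
        kind = "one repeated real root"
    else:                           # optional ELSE block
        kind = "two complex roots"
    return kind


for d in (9.0, 0.0, -16.0):
    print(d, "->", classify_discriminant(d))
```

Only the first condition that tests true runs its block; the remaining conditions are never evaluated.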
The CASE structure is a variant on this type of decision making (Fig. 2.4b). Rather
than testing individual conditions, the branching is based on the value of a single test
expression. Depending on its value, different blocks of code will be implemented. In
addition, an optional block can be implemented if the expression takes on none of the
prescribed values (CASE ELSE).
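Not every language has a dedicated CASE statement. As one hedged illustration, the Python sketch below imitates the structure with a dictionary lookup keyed on the test expression, with a fallback playing the role of CASE ELSE. The function name and the choice of time units are assumptions made for the example. Recent Python versions also provide a match statement; the dictionary form is used here only because it runs on older interpreters as well.

```python
def to_seconds(value, unit):
    """Convert a duration to seconds, branching on the value of `unit`."""
    factors = {          # each key plays the role of one CASE value
        "s": 1.0,
        "min": 60.0,
        "hr": 3600.0,
        "d": 86400.0,
    }
    if unit in factors:
        return value * factors[unit]
    return None          # CASE ELSE: no prescribed value matched


print(to_seconds(2, "min"))
print(to_seconds(1, "week"))
```

The branching depends on the value of a single expression (`unit`), rather than on a chain of separate logical conditions.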
Repetition. Repetition provides a means to implement instructions repeatedly. The
resulting constructs, called loops, come in two “flavors” distinguished by how they are
terminated.
FIGURE 2.3
Flowchart and pseudocode for simple selection constructs. (a) Single-alternative selection
(IF/THEN) and (b) double-alternative selection (IF/THEN/ELSE).
(a) Single-alternative structure (IF/THEN)
    IF condition THEN
        True block
    ENDIF
(b) Double-alternative structure (IF/THEN/ELSE)
    IF condition THEN
        True block
    ELSE
        False block
    ENDIF
The first and most fundamental type is called a decision loop because it terminates
based on the result of a logical condition. Figure 2.5 shows the most generic type of
decision loop, the DOEXIT construct, also called a break loop. This structure repeats
until a logical condition is true.
It is not necessary to have two blocks in this structure. If the first block is not
included, the structure is sometimes called a pretest loop because the logical test is
performed before anything occurs. Alternatively, if the second block is omitted, it is
called a posttest loop. Because both blocks are included, the general case in Fig. 2.5 is
sometimes called a midtest loop.
FIGURE 2.4
Flowchart and pseudocode for supplementary selection or branching constructs. (a) Multiple-
alternative selection (IF/THEN/ELSEIF) and (b) CASE construct.
(a) Multialternative structure (IF/THEN/ELSEIF)
    IF condition1 THEN
        Block1
    ELSEIF condition2
        Block2
    ELSEIF condition3
        Block3
    ELSE
        Block4
    ENDIF
(b) CASE structure (SELECT or SWITCH)
    SELECT CASE Test Expression
        CASE Value1
            Block1
        CASE Value2
            Block2
        CASE Value3
            Block3
        CASE ELSE
            Block4
    END SELECT
It should be noted that the DOEXIT loop was introduced in Fortran 90 in an effort
to simplify decision loops. This control construct is a standard part of the Excel VBA
macro language but is not standard in C or MATLAB, which use the so-called WHILE
structure. Because we believe that the DOEXIT is superior, we have adopted it as our
decision loop structure throughout this book. In order to ensure that our algorithms are
directly implemented in both MATLAB and Excel, we will show how the break loop
can be simulated with the WHILE structure later in this chapter (see Sec. 2.5).
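For readers who want to see the midtest pattern in a language without a DOEXIT statement, one common idiom is an unconditional loop with an explicit break. The Python sketch below is an illustration only; the series summation is an arbitrary workload chosen for the example, and the function name, tolerance, and term limit are assumptions.

```python
def exp_series(x, tol=1e-12, max_terms=200):
    """Sum the series 1 + x + x**2/2! + ... until a term is negligible."""
    total, term, n = 0.0, 1.0, 0
    while True:                                  # DO
        total += term                            #   Block1
        if abs(term) < tol or n >= max_terms:    #   IF condition EXIT
            break
        n += 1                                   #   Block2
        term *= x / n
    return total, n                              # after ENDDO


value, terms = exp_series(1.0)
print(f"sum = {value:.12f} after {terms} additional terms")
```

Deleting the statements before the test gives a pretest loop, and deleting those after it gives a posttest loop.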
The break loop in Fig. 2.5 is called a logical loop because it terminates on a logical
condition. In contrast, a count-controlled or DOFOR loop (Fig. 2.6) performs a specified
number of repetitions, or iterations.
FIGURE 2.5
The DOEXIT or break loop.
    DO
        Block1
        IF condition EXIT
        Block2
    ENDDO
FIGURE 2.6
The count-controlled or DOFOR construct.
    DOFOR i = start, finish, step
        Block
    ENDDO
The count-controlled loop works as follows. The index (represented as i in Fig. 2.6)
is a variable that is set at an initial value of start. The program then tests whether the
index is less than or equal to the final value, finish. If so, it executes the body of the
loop, and then cycles back to the DO statement. Every time the ENDDO statement is
encountered, the index is automatically increased by the step. Thus the index acts as a
counter. Then, when the index is greater than the final value (finish), the computer
automatically exits the loop and transfers control to the line following the ENDDO statement.
Note that for nearly all computer languages, including those of Excel and MATLAB, if
the step is omitted, the computer assumes it is equal to 1.²
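The same counting logic can be sketched in Python, whose built-in range excludes its stop value, so the inclusive finish of the DOFOR loop needs a small adjustment. The helper name below is hypothetical, and the sketch is limited to integer indices.

```python
def dofor_indices(start, finish, step=1):
    """Index values visited by DOFOR i = start, finish, step (integers only)."""
    if step == 0:
        raise ValueError("step must be nonzero")
    # range() stops before its second argument, so move it one past finish.
    stop = finish + 1 if step > 0 else finish - 1
    return list(range(start, stop, step))


print(dofor_indices(1, 10, 3))    # positive step, finish included
print(dofor_indices(5, 1, -2))    # negative step counts down to finish
print(dofor_indices(1, 4))        # omitted step defaults to 1
```

If start already lies beyond finish for the given step direction, the list is empty and the loop body would never run.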
The numerical algorithms outlined in the following pages will be developed exclu-
sively from the structures outlined in Figs. 2.2 through 2.6. The following example
illustrates the basic approach by developing an algorithm to determine the roots of a
quadratic equation.
EXAMPLE 2.1 Algorithm for Roots of a Quadratic
Problem Statement. The roots of a quadratic equation
$$ax^2 + bx + c = 0$$
can be determined with the quadratic formula,
$$\left.\begin{array}{c} x_1 \\ x_2 \end{array}\right\} = \frac{-b \pm \sqrt{|b^2 - 4ac|}}{2a} \qquad \text{(E2.1.1)}$$
Develop an algorithm that does the following:
Step 1: Prompts the user for the coefficients, a, b, and c.
Step 2: Implements the quadratic formula, guarding against all eventualities (for example,
avoiding division by zero and allowing for complex roots).
Step 3: Displays the solution, that is, the values for x.
Step 4: Allows the user the option to return to step 1 and repeat the process.
Solution. We will use a top-down approach to develop our algorithm. That is, we will
successively refine the algorithm rather than trying to work out all the details the first
time around.
To do this, let us assume for the present that the quadratic formula is foolproof
regardless of the values of the coefficients (obviously not true, but good enough for now).
A structured algorithm to implement the scheme is
DO
    INPUT a, b, c
    r1 = (-b + SQRT(b^2 - 4ac))/(2a)
    r2 = (-b - SQRT(b^2 - 4ac))/(2a)
    DISPLAY r1, r2
    DISPLAY 'Try again? Answer yes or no'
    INPUT response
    IF response = 'no' EXIT
ENDDO
² A negative step can be used. In such cases, the loop terminates when the index is less than the final value.
A DOEXIT construct is used to implement the quadratic formula repeatedly as long as
the condition is false. The condition depends on the value of the character variable response.
If response is equal to ‘yes’, the calculation is implemented. If not, that is, response = ‘no’,
the loop terminates. Thus, the user controls termination by inputting a value for response.
Now although the above algorithm works for certain cases, it is not foolproof. Depend-
ing on the values of the coefficients, the algorithm might not work. Here is what can happen:
• If a = 0, an immediate problem arises because of division by zero. In fact, close
  inspection of Eq. (E2.1.1) indicates that two different cases can arise. That is,
    If b ≠ 0, the equation reduces to a linear equation with one real root, −c/b.
    If b = 0, then no solution exists. That is, the problem is trivial.
• If a ≠ 0, two possible cases occur depending on the value of the discriminant,
  d = b² − 4ac. That is,
    If d ≥ 0, two real roots occur.
    If d < 0, two complex roots occur.
Notice how we have used indentation to highlight the decisional structure that underlies
the mathematics. This structure then readily translates to a set of coupled IF/THEN/ELSE
structures that can be inserted in place of the two statements that compute r1 and r2
(shaded in the original) in the previous code to give the final algorithm:
DO
  INPUT a, b, c
  r1 = 0: r2 = 0: i1 = 0: i2 = 0
  IF a = 0 THEN
    IF b ≠ 0 THEN
      r1 = -c/b
    ELSE
      DISPLAY "Trivial solution"
    ENDIF
  ELSE
    discr = b^2 - 4 * a * c
    IF discr ≥ 0 THEN
      r1 = (-b + Sqrt(discr))/(2 * a)
      r2 = (-b - Sqrt(discr))/(2 * a)
    ELSE
      r1 = -b/(2 * a)
      r2 = r1
      i1 = Sqrt(Abs(discr))/(2 * a)
      i2 = -i1
    ENDIF
  ENDIF
  DISPLAY r1, r2, i1, i2
  DISPLAY 'Try again? Answer yes or no'
  INPUT response
  IF response = 'no' EXIT
ENDDO
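As a hedged illustration, the decision structure of the final algorithm can be sketched in Python. The function name quadratic_roots and the choice to return None for the a = 0, b = 0 case are assumptions made for this sketch, not part of the original algorithm, and the interactive "Try again?" loop is left out so that the logic can be tested directly.

```python
import math

def quadratic_roots(a, b, c):
    """Return (r1, r2, i1, i2): real and imaginary parts of the two roots.

    Follows the nested IF/THEN/ELSE structure of the pseudocode.
    Returns None when a = 0 and b = 0 (the "trivial" case), which is
    an assumption of this sketch rather than part of the pseudocode.
    """
    r1 = r2 = i1 = i2 = 0.0
    if a == 0:
        if b != 0:
            r1 = -c / b              # linear equation: one real root, r2 stays 0
        else:
            return None              # no x left to solve for
    else:
        discr = b**2 - 4 * a * c
        if discr >= 0:               # two real roots
            r1 = (-b + math.sqrt(discr)) / (2 * a)
            r2 = (-b - math.sqrt(discr)) / (2 * a)
        else:                        # complex conjugate pair
            r1 = r2 = -b / (2 * a)
            i1 = math.sqrt(abs(discr)) / (2 * a)
            i2 = -i1
    return r1, r2, i1, i2

print(quadratic_roots(1, -3, 2))     # real roots
print(quadratic_roots(1, 2, 5))      # complex roots
print(quadratic_roots(0, 2, -4))     # linear case
```

Keeping the input and output outside the function means the same routine could be called from an interactive loop or from a test script.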
The approach in the foregoing example can be employed to develop an algorithm
for the parachutist problem. Recall that, given an initial condition for time and velocity,
the problem involved iteratively solving the formula
$$v_{i+1} = v_i + \frac{dv_i}{dt}\,\Delta t \qquad (2.1)$$
Now also remember that if we desired to attain good accuracy, we would need to employ
small steps. Therefore, we would probably want to apply the formula repeatedly from
the initial time to the final time. Consequently, an algorithm to solve the problem would
be based on a loop.
For example, suppose that we started the computation at t = 0 and wanted to predict
the velocity at t = 4 s using a time step of Δt = 0.5 s. We would, therefore, need to
apply Eq. (2.1) eight times, that is,
$$n = \frac{4}{0.5} = 8$$
where n = the number of iterations of the loop. Because this result is exact, that is, the
ratio is an integer, we can use a count-controlled loop as the basis for the algorithm.
Here is an example of the pseudocode:
g = 9.81
INPUT cd, m
INPUT ti, vi, tf, dt
t = ti
v = vi
n = (tf - ti) / dt
DOFOR i = 1 TO n
  dvdt = g - (cd / m) * v
  v = v + dvdt * dt
  t = t + dt
ENDDO
DISPLAY v
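A minimal Python sketch of the count-controlled version follows. The function name euler_fixed_steps and the use of round() to obtain the step count are assumptions of this sketch. Like the pseudocode, it assumes that the interval is a whole number of steps.

```python
def euler_fixed_steps(cd, m, ti, vi, tf, dt, g=9.81):
    """Apply Eq. (2.1) n times with a constant step dt.

    Assumes (tf - ti) is an integer multiple of dt. round() is used
    rather than int() because a quotient such as 0.3/0.1 evaluates to
    slightly less than 3 in binary floating point and would be
    truncated to 2.
    """
    n = round((tf - ti) / dt)
    t, v = ti, vi
    for _ in range(n):
        dvdt = g - (cd / m) * v      # slope at the start of the step
        v = v + dvdt * dt            # Euler update
        t = t + dt
    return v

# Eight steps of 0.5 s from t = 0 to t = 4 s
print(euler_fixed_steps(cd=12.5, m=68.1, ti=0.0, vi=0.0, tf=4.0, dt=0.5))
```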
Although this scheme is simple to program, it is not foolproof. In particular, it will
work only if the computation interval is evenly divisible by the time step.³ In order to
cover cases where it is not, a decision loop can be substituted in place of the
count-controlled loop (the shaded area in the original) in the previous pseudocode.
The final result is
g = 9.81
INPUT cd, m
INPUT ti, vi, tf, dt
t = ti
v = vi
h = dt
DO
  IF t + dt > tf THEN
    h = tf - t
  ENDIF
  dvdt = g - (cd / m) * v
  v = v + dvdt * h
  t = t + h
  IF t ≥ tf EXIT
ENDDO
DISPLAY v
³ This problem is compounded by the fact that computers use base-2 number representation for their internal
math. Consequently, some apparently evenly divisible numbers do not yield integers when the division is
implemented on a computer. We will cover this in Chap. 3.
As soon as we enter the loop, we use an IF/THEN structure to test whether adding
dt to t will take us beyond the end of the interval. If it does not, which would usually
be the case at first, we do nothing. If it does, we would need to shorten the interval by
setting the variable step h to tf − t. By doing this, we guarantee that the next step falls
exactly on tf. After we implement this final step, the loop will terminate because the
condition t ≥ tf will test true.
Notice that before entering the loop, we assign the value of the time step, dt, to
another variable, h. We create this dummy variable so that our routine does not change
the given value of dt if and when we shorten the time step. We do this in anticipation
that we might need to use the original value of dt somewhere else in the event that this
code is integrated within a larger program.
It should be noted that the algorithm is still not foolproof. For example, the user
could have mistakenly entered a step size greater than the calculation interval, for
example, tf − ti = 5 and dt = 20. Thus, you might want to include error traps in your
code to catch such errors and to then allow the user to correct the mistake.
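The decision loop and one possible form of error trap can be sketched in Python as follows. The function name euler_to_tf and the use of ValueError are assumptions of this sketch. A calling program could catch the exception and prompt the user again.

```python
def euler_to_tf(cd, m, ti, vi, tf, dt, g=9.81):
    """Euler's method with a shortened final step so that t ends on tf."""
    # Error traps (an assumed design; the text only suggests adding some).
    if dt <= 0 or tf <= ti:
        raise ValueError("need dt > 0 and tf > ti")
    if dt > tf - ti:
        raise ValueError("step size exceeds the calculation interval")

    t, v = ti, vi
    h = dt                           # working step; dt itself is left unchanged
    while True:
        if t + dt > tf:              # next full step would overshoot tf
            h = tf - t
        dvdt = g - (cd / m) * v
        v = v + dvdt * h
        t = t + h
        if t >= tf:
            break
    return t, v

# The interval 0 to 2 s is not evenly divisible by 0.3 s
print(euler_to_tf(cd=12.5, m=68.1, ti=0.0, vi=0.0, tf=2.0, dt=0.3))
```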
2.3 MODULAR PROGRAMMING
Imagine how difficult it would be to study a textbook that had no chapters, sections, or
paragraphs. Breaking complicated tasks or subjects into more manageable parts is one
way to make them easier to handle. In the same spirit, computer programs can be divided
into small subprograms, or modules, that can be developed and tested separately. This
approach is called modular programming.
The most important attribute of modules is that they be as independent and self-
contained as possible. In addition, they are typically designed to perform a specific,
well-defined function and have one entry and one exit point. As such, they are usually
short (generally 50 to 100 instructions in length) and highly focused.
In standard high-level languages such as Fortran 90 or C, the primary programming
element used to represent each module is the procedure. A procedure is a series of com-
puter instructions that together perform a given task. Two types of procedures are com-
monly employed: functions and subroutines. The former usually returns a single result,
whereas the latter returns several.
In addition, it should be mentioned that much of the programming related to software
packages like Excel and MATLAB involves the development of subprograms. Hence,
Excel macros and MATLAB functions are designed to receive some information, perform
a calculation, and return results. Thus, modular thinking is also consistent with how
programming is implemented in package environments.
Modular programming has a number of advantages. The use of small, self-contained
units makes the underlying logic easier to devise and to understand for both the developer
and the user. Development is facilitated because each module can be perfected in isolation.
In fact, for large projects, different programmers can work on individual parts. Modular
design also increases the ease with which a program can be debugged and tested because
errors can be more easily isolated. Finally, program maintenance and modification are
facilitated. This is primarily due to the fact that new modules can be developed to perform
additional tasks and then easily incorporated into the already coherent and organized scheme.
While all these attributes are reason enough to use modules, the most important
reason related to numerical engineering problem solving is that they allow you to main-
tain your own library of useful modules for later use in other programs. This will be the
philosophy of this book: All the algorithms will be presented as modules.
This approach is illustrated in Fig. 2.7, which shows a function developed to implement
Euler’s method. Notice that this function and the previous versions differ in how they
handle input/output. In the previous versions, input comes directly from the user (via
INPUT statements) and output goes directly to the user (via DISPLAY statements). In the
function, the inputs are passed into the FUNCTION via its argument list
Function Euler(dt, ti, tf, yi)
and the output is returned via the assignment statement
y = Euler(dt, ti, tf, yi)
In addition, recognize how generic the routine has become. There are no references
to the specifics of the parachutist problem. For example, rather than calling the dependent
variable v for velocity, the more generic label, y, is used within the function. Further,
notice that the derivative is not computed within the function by an explicit equation.
Rather, another function, dy, must be invoked to compute it. This acknowledges the fact
that we might want to use this function for many different problems beyond solving for
the parachutist’s velocity.
FUNCTION Euler(dt, ti, tf, yi)
  t = ti
  y = yi
  h = dt
  DO
    IF t + dt > tf THEN
      h = tf - t
    ENDIF
    dydt = dy(t, y)
    y = y + dydt * h
    t = t + h
    IF t ≥ tf EXIT
  ENDDO
  Euler = y
END Euler
FIGURE 2.7
Pseudocode for a function that solves a differential equation using Euler’s method.
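To make the modular idea concrete, here is a hedged Python sketch of the generic routine. Passing the derivative function dy in as an argument is a design choice of this sketch (the pseudocode simply calls a separately defined dy), and the parachutist function with its default parameter values is included only as an example client.

```python
def euler(dy, dt, ti, tf, yi):
    """Generic Euler integrator: dy is any function dy(t, y) -> slope."""
    t, y, h = ti, yi, dt
    while True:
        if t + dt > tf:              # shorten the last step to land on tf
            h = tf - t
        y = y + dy(t, y) * h
        t = t + h
        if t >= tf:
            break
    return y

def parachutist(t, v, g=9.81, cd=12.5, m=68.1):
    """Example derivative: dv/dt for the falling parachutist."""
    return g - (cd / m) * v

# The integrator knows nothing about parachutists; any dy(t, y) will do.
print(euler(parachutist, dt=0.1, ti=0.0, tf=2.0, yi=0.0))
print(euler(lambda t, y: 1.0, dt=0.25, ti=0.0, tf=1.0, yi=0.0))
```

Because the integrator depends only on the signature dy(t, y), it can be kept in a personal library and reused for other first-order differential equations.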
2.4 EXCEL
Excel is the spreadsheet produced by Microsoft, Inc. Spreadsheets are a special type of
mathematical software that allow the user to enter and perform calculations on rows and
columns of data. As such, they are a computerized version of a large accounting work-
sheet on which large interconnected calculations can be implemented and displayed.
Because the entire calculation is updated when any value on the sheet is changed, spread-
sheets are ideal for “what if?” sorts of analysis.
Excel has some built-in numerical capabilities including equation solving, curve
fitting, and optimization. It also includes VBA as a macro language that can be used to
implement numerical calculations. Finally, it has several visualization tools, such as
graphs and three-dimensional surface plots, that serve as valuable adjuncts for numerical
analysis. In the present section, we will show how these capabilities can be used to solve
the parachutist problem.
To do this, let us first set up a simple spreadsheet. As shown below, the first step
involves entering labels and numbers into the spreadsheet cells.
Before we write a macro program to calculate the numerical value, we can make
our subsequent work easier by attaching names to the parameter values. To do this, select
cells A3:B5 (the easiest way to do this is by moving the mouse to A3, holding down the
left mouse button and dragging down to B5). Next, go to the Formulas tab and in the
Defined Names group, click Create from Selection. This will open the Create Names
from Selection dialog box, where the Left column box should be automatically selected.
Then click OK to create the names. To verify that this has worked properly, select cell B3
and check that the label “m” appears in the name box (located on the left side of the
sheet just below the menu bars).
Move to cell C8 and enter the analytical solution (Eq. 1.9),
=9.81*m/cd*(1-exp(-cd/m*A8))
When this formula is entered, the value 0 should appear in cell C8. Then copy the for-
mula down to cell C9 to give a value of 16.422 m/s.
All the above is typical of the standard use of Excel. For example, at this point you
could change parameter values and see how the analytical solution changes.
Now, we will illustrate how VBA macros can be used to extend the standard capa-
bilities. Figure 2.8 lists pseudocode alongside Excel VBA code for all the control struc-
tures described in Sec. 2.2 (Figs. 2.3 through 2.6). Notice how, although the details
differ, the structure of the pseudocode and the VBA code are identical.
We can now use some of the constructs from Fig. 2.8 to write a macro function to
numerically compute velocity. Open VBA by selecting⁴
Tools → Macro → Visual Basic Editor
Once inside the Visual Basic Editor (VBE), select
Insert → Module
and a new code window will open up. The following VBA function can be developed
directly from the pseudocode in Fig. 2.7. Type it into the code window.
Option Explicit
Function Euler(dt, ti, tf, yi, m, cd)
Dim h As Double, t As Double, y As Double, dydt As Double
t = ti
y = yi
h = dt
Do
  ' Shorten the final step so that it lands exactly on tf
  If t + dt > tf Then
    h = tf - t
  End If
  dydt = dy(t, y, m, cd)
  y = y + dydt * h
  t = t + h
  If t >= tf Then Exit Do
Loop
Euler = y
End Function
Compare this macro with the pseudocode from Fig. 2.7 and recognize how similar
they are. Also, see how we have expanded the function’s argument list to include the
necessary parameters for the parachutist velocity model. The resulting velocity, y, is then
passed back to the spreadsheet via the function name.
⁴ The hot key combination Alt-F11 is even quicker!
(a) Pseudocode                      (b) Excel VBA

IF/THEN:
IF condition THEN                   If b <> 0 Then
  True block                          r1 = -c / b
ENDIF                               End If

IF/THEN/ELSE:
IF condition THEN                   If a < 0 Then
  True block                          b = Sqr(Abs(a))
ELSE                                Else
  False block                         b = Sqr(a)
ENDIF                               End If

IF/THEN/ELSEIF:
IF condition1 THEN                  If class = 1 Then
  Block1                              x = x + 8
ELSEIF condition2                   ElseIf class < 1 Then
  Block2                              x = x - 8
ELSEIF condition3                   ElseIf class < 10 Then
  Block3                              x = x - 32
ELSE                                Else
  Block4                              x = x - 64
ENDIF                               End If

CASE:
SELECT CASE Test Expression         Select Case a + b
  CASE Value1                         Case Is < -50
    Block1                              x = -5
  CASE Value2                         Case Is < 0
    Block2                              x = -5 - (a + b) / 10
  CASE Value3                         Case Is < 50
    Block3                              x = (a + b) / 10
  CASE ELSE                           Case Else
    Block4                              x = 5
END SELECT                          End Select

DOEXIT:
DO                                  Do
  Block1                              i = i + 1
  IF condition EXIT                   If i >= 10 Then Exit Do
  Block2                              j = i*x
ENDDO                               Loop

COUNT-CONTROLLED LOOP:
DOFOR i = start, finish, step       For i = 1 To 10 Step 2
  Block                               x = x + i
ENDDO                               Next i

FIGURE 2.8
The fundamental control structures in (a) pseudocode and (b) Excel VBA.
Also notice how we have included another function to compute the derivative. This
can be entered in the same module by typing it directly below the Euler function,
Function dy(t, v, m, cd)
  Const g As Double = 9.81
  dy = g - (cd / m) * v
End Function
The final step is to return to the spreadsheet and invoke the function by entering the
following formula in cell B9
=Euler(dt,A8,A9,B8,m,cd)
The result of the numerical integration, 16.548, will appear in cell B9.
You should appreciate what has happened here. When you enter the function into
the spreadsheet cell, the parameters are passed into the VBA program where the calcula-
tion is performed and the result is then passed back and displayed in the cell. In effect,
the VBA macro language allows you to use Excel as your input/output mechanism. All
sorts of benefits arise from this fact.
For example, now that you have set up the calculation, you can play with it. Suppose
that the jumper was much heavier, say, m = 100 kg (about 220 lb). Enter 100 into cell B3
and the spreadsheet will update immediately to show a value of 17.456 in cell B9. Change
the mass back to 68.1 kg and the previous result, 16.548, automatically reappears in cell B9.
Now let us take the process one step further by filling in some additional numbers for
the time. Enter the numbers 4, 6, . . . 16 in cells A10 through A16. Then copy the formu-
las from cells B9:C9 down to rows 10 through 16. Notice how the VBA program calculates
the numerical result correctly for each new row. (To verify this, change dt to 2 and compare
with the results previously computed by hand in Example 1.2.) An additional embellish-
ment would be to develop an x-y plot of the results using the Excel Chart Wizard.
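The row-by-row calculation that the spreadsheet performs can be mimicked outside Excel. The Python sketch below is only an illustration under stated assumptions: the parameter values m = 68.1 kg, cd = 12.5 kg/s, and dt = 0.1 s are assumed to match the sheet, and the names euler, dy, and analytical are chosen for this sketch. Each row integrates from the previous time to the current one, starting from the previous row's velocity, just as the copied formula does.

```python
import math

G = 9.81                       # gravitational acceleration, m/s^2
M, CD, DT = 68.1, 12.5, 0.1    # assumed sheet values: mass, drag coefficient, step

def dy(t, v, m, cd):
    """Derivative for the parachutist model."""
    return G - (cd / m) * v

def euler(dt, ti, tf, yi, m, cd):
    """Integrate from ti to tf, shortening the last step to land on tf."""
    t, y, h = ti, yi, dt
    while True:
        if t + dt > tf:
            h = tf - t
        y = y + dy(t, y, m, cd) * h
        t = t + h
        if t >= tf:
            break
    return y

def analytical(t, m, cd):
    """Closed-form velocity used in column C of the sheet."""
    return G * m / cd * (1 - math.exp(-cd / m * t))

times = list(range(0, 17, 2))           # 0, 2, ..., 16 s (column A)
numerical = [0.0]                       # initial velocity (first row)
for t_prev, t_next in zip(times, times[1:]):
    # Each row restarts from the previous row's result.
    numerical.append(euler(DT, t_prev, t_next, numerical[-1], M, CD))
exact = [analytical(t, M, CD) for t in times]

print(f"{'t, s':>5} {'Euler':>10} {'exact':>10}")
for t, vn, ve in zip(times, numerical, exact):
    print(f"{t:5d} {vn:10.4f} {ve:10.4f}")
```

Changing DT to 2 reproduces the coarse-step comparison suggested in the text.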
The final spreadsheet is shown below. We now have created a pretty nice problem-
solving tool. You can perform sensitivity analyses by changing the values for each of
the parameters. As each new value is entered, the computation and the graph would be
automatically updated. It is this interactive nature that makes Excel so powerful. How-
ever, recognize that the ability to solve this problem hinges on being able to write the
macro with VBA.
It is the combination of the Excel environment with the VBA programming language
that truly opens up a world of possibilities for engineering problem solving. In the com-
ing chapters, we will illustrate how this is accomplished.
2.5 MATLAB
MATLAB is the flagship software product of The MathWorks, Inc., which was cofounded
by the numerical analysts Cleve Moler and John N. Little. As the name implies, MATLAB
was originally developed as a matrix laboratory. To this day, the major element of MAT-
LAB is still the matrix. Mathematical manipulations of matrices are very conveniently
implemented in an easy-to-use, interactive environment. To these matrix manipulations,
MATLAB has added a variety of numerical functions, symbolic computations, and visu-
alization tools. As a consequence, the present version represents a fairly comprehensive
technical computing environment.
MATLAB has a variety of functions and operators that allow convenient implemen-
tation of many of the numerical methods developed in this book. These will be described
in detail in the individual chapters that follow. In addition, programs can be written as
so-called M-files that can be used to implement numerical calculations. Let us explore
how this is done.
First, you should recognize that normal MATLAB use is closely related to program-
ming. For example, suppose that we wanted to determine the analytical solution to the
parachutist problem. This could be done with the following series of MATLAB commands
>> g=9.81;
>> m=68.1;
>> cd=12.5;
>> tf=2;
>> v=g*m/cd*(1-exp(-cd/m*tf))
with the result being displayed as
v =
16.4217
Thus, the sequence of commands is just like the sequence of instructions in a typical
programming language.
Now what if you want to deviate from the sequential structure? Although there are
some neat ways to inject some nonsequential capabilities in the standard command mode,
the inclusion of decisions and loops is best done by creating a MATLAB document called
an M-file. To do this, make the menu selection
File → New → Script
and a new window will open with a heading “MATLAB Editor/Debugger.” In this
window, you can type and edit MATLAB programs. Type the following code there:
g=9.81;
m=68.1;
cd=12.5;
tf=2;
v=g*m/cd*(1-exp(-cd/m*tf))
Notice how the commands are written in exactly the same way as they would be written
in the front end of MATLAB. Save the program with the name: analpara. MATLAB will
automatically attach the extension .m to denote it as an M-file: analpara.m.
To run the program, you must go back to the command mode. The most direct way
to do this is to click on the “MATLAB Command Window” button on the task bar (which
is usually at the bottom of the screen).
The program can now be run by typing the name of the M-file, analpara, which
should look like
>> analpara
If you have done everything correctly, MATLAB should respond with the correct answer:
v =
16.4217
Now one problem with the foregoing is that it is set up to compute one case only. You
can make it more flexible by having the user input some of the variables. For example,
suppose that you wanted to assess the impact of mass on the velocity at 2 s. The M-file
could be rewritten as the following to accomplish this:
g=9.81;
m=input('mass (kg): ');
cd=12.5;
tf=2;
v=g*m/cd*(1-exp(-cd/m*tf))
Save this as analpara2.m. If you typed analpara2 while being in command mode, the
prompt would show
mass (kg):
The user could then enter a value like 100, and the result will be displayed as
v =
17.3597
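For readers who want to check these values outside MATLAB, the same analytical formula can be wrapped in a small Python function. This is a sketch only. The name velocity_at and its default arguments are assumptions, and the mass is passed as an argument instead of being read from a prompt.

```python
import math

def velocity_at(m, tf=2.0, cd=12.5, g=9.81):
    """Analytical parachutist velocity at time tf for a jumper of mass m."""
    return g * m / cd * (1 - math.exp(-cd / m * tf))

for mass in (68.1, 100.0):
    print(f"mass = {mass:6.1f} kg   v = {velocity_at(mass):.4f} m/s")
```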
Now it should be pretty clear how we can program a numerical solution with an
M-file. In order to do this, we must first understand how MATLAB handles logical and
looping structures. Figure 2.9 lists pseudocode alongside MATLAB code for all the
control structures from Sec. 2.2. Although the structures of the pseudocode and the
MATLAB code are very similar, there are some slight differences that should be noted.

(a) Pseudocode                      (b) MATLAB

IF/THEN:
IF condition THEN                   if b ~= 0
  True block                          r1 = -c / b;
ENDIF                               end

IF/THEN/ELSE:
IF condition THEN                   if a < 0
  True block                          b = sqrt(abs(a));
ELSE                                else
  False block                         b = sqrt(a);
ENDIF                               end

IF/THEN/ELSEIF:
IF condition1 THEN                  if class == 1
  Block1                              x = x + 8;
ELSEIF condition2                   elseif class < 1
  Block2                              x = x - 8;
ELSEIF condition3                   elseif class < 10
  Block3                              x = x - 32;
ELSE                                else
  Block4                              x = x - 64;
ENDIF                               end

CASE:
SELECT CASE Test Expression         switch a + b
  CASE Value1                         case 1
    Block1                              x = -25;
  CASE Value2                         case 2
    Block2                              x = -5 - (a + b) / 10;
  CASE Value3                         case 3
    Block3                              x = (a + b) / 10;
  CASE ELSE                           otherwise
    Block4                              x = 5;
END SELECT                          end

DOEXIT:
DO                                  while (1)
  Block1                              i = i + 1;
  IF condition EXIT                   if i >= 10, break, end
  Block2                              j = i*x;
ENDDO                               end

COUNT-CONTROLLED LOOP:
DOFOR i = start, finish, step       for i = 1:2:10
  Block                               x = x + i;
ENDDO                               end

FIGURE 2.9
The fundamental control structures in (a) pseudocode and (b) the MATLAB programming language.
In particular, look at how we have represented the DOEXIT structure. In place of
the DO, we use the statement while (1). Because MATLAB interprets the number 1 as
corresponding to “true,” this statement will repeat infinitely in the same manner as the
DO statement. The loop is terminated with a break command. This command transfers
control to the statement following the end statement that terminates the loop.
Also notice that the parameters of the count-controlled loop are ordered differently. For
the pseudocode, the loop parameters are specified as start,finish,step. For MAT-
LAB, the parameters are ordered as start:step:finish.
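The same two points carry over to other languages. As a hedged comparison (Python is not one of the tools covered in this chapter, and the variable names are chosen for this sketch), the DOEXIT structure becomes a while True loop with break, and the count-controlled loop uses range, whose arguments are ordered start, stop, step with the stop value excluded.

```python
# DOEXIT: repeat until the exit condition in the middle of the loop is met
i = 0
x = 2.0
while True:
    i = i + 1
    if i >= 10:
        break                # control passes to the statement after the loop
    j = i * x

# Count-controlled loop: DOFOR k = 1, 10, 2
total = 0
for k in range(1, 10 + 1, 2):    # stop is exclusive, so use finish + 1
    total = total + k

print(i, j, total)
```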
The following MATLAB M-file can now be developed directly from the pseudocode
in Fig. 2.7. Type it into the MATLAB Editor/Debugger:
g=9.81;
m=input('mass (kg): ');
cd=12.5;
ti=0;
tf=2;
vi=0;
dt=0.1;
t = ti;
v = vi;
h = dt;
while (1)
  if t + dt > tf
    h = tf - t;
  end
  dvdt = g - (cd / m) * v;
  v = v + dvdt * h;
  t = t + h;
  if t >= tf, break, end
end
disp('velocity (m/s):')
disp(v)
Save this file as numpara.m and return to the command mode and run it by entering:
numpara. The following output should result:
mass (kg): 100
velocity (m/s):
17.4559
As a final step in this development, let us take the above M-file and convert it into
a proper function. This can be done in the following M-file based on the pseudocode
from Fig. 2.7
function yy = euler(dt,ti,tf,yi,m,cd)
t = ti;
y = yi;
h = dt;
while (1)
  if t + dt > tf
    h = tf - t;
  end
  dydt = dy(t, y, m, cd);
  y = y + dydt * h;
  t = t + h;
  if t >= tf, break, end
end
yy = y;
Save this file as euler.m and then create another M-file to compute the derivative,
function dydt = dy(t, v, m, cd)
g = 9.81;
dydt = g - (cd / m) * v;
Save this file as dy.m and return to the command mode. In order to invoke the function
and see the result, you can type in the following commands
>> m=68.1;
>> cd=12.5;
>> ti=0;
>> tf=2.;
>> vi=0;
>> dt=0.1;
>> euler(dt,ti,tf,vi,m,cd)
When the last command is entered, the answer will be displayed as
ans =
16.5478
It is the combination of the MATLAB environment with the M-file programming
language that truly opens up a world of possibilities for engineering problem solving. In
the coming chapters we will illustrate how this is accomplished.
2.6 MATHCAD
Mathcad attempts to bridge the gap between spreadsheets like Excel and notepads. It
was originally developed by Allen Razdow of MIT who cofounded Mathsoft, Inc., which
published the first commercial version in 1986. Today, Mathsoft is part of Parametric
Technology Corporation (PTC) and Mathcad is in version 15.
Mathcad is essentially an interactive notepad that allows engineers and scientists to
perform a number of common mathematical, data-handling, and graphical tasks. Informa-
tion and equations are input to a “whiteboard” design environment that is similar in spirit
to a page of paper. Unlike a programming tool or spreadsheet, Mathcad’s interface
accepts and displays natural mathematical notation using keystrokes or menu palette
clicks—with no programming required. Because the worksheets contain live calculations,
a single keystroke that changes an input or equation instantly returns an updated result.
Mathcad can perform tasks in either numeric or symbolic mode. In numeric mode,
Mathcad functions and operators give numerical responses, whereas in symbolic mode results
are given as general expressions or equations. Maple V, a comprehensive symbolic math
package, is the basis of the symbolic mode and was incorporated into Mathcad in 1993.
Mathcad has a variety of functions and operators that allow convenient implementa-
tion of many of the numerical methods developed in this book. These will be described
in detail in succeeding chapters. In the event that you are unfamiliar with Mathcad,
Appendix C also provides a primer on using this powerful software.
2.7 OTHER LANGUAGES AND LIBRARIES
In Secs. 2.4 and 2.5, we showed how Excel and MATLAB function procedures for
Euler’s method could be developed from an algorithm expressed as pseudocode. You
should recognize that similar functions can be written in high-level languages like Fortran
90 and C++. For example, a Fortran 90 function for Euler’s method is
Function Euler(dt, ti, tf, yi, m, cd)
REAL dt, ti, tf, yi, m, cd
Real h, t, y, dydt
t = ti
y = yi
h = dt
Do
  If (t + dt > tf) Then
    h = tf - t
  End If
  dydt = dy(t, y, m, cd)
  y = y + dydt * h
  t = t + h
  If (t >= tf) Exit
End Do
Euler = y
End Function
For C, the result would look quite similar to the MATLAB function. The point is
that once a well-structured algorithm is developed in pseudocode form, it can be readily
implemented in a variety of programming environments.
In this book, our approach will be to provide you with well-structured procedures
written as pseudocode. This collection of algorithms then constitutes a numerical library
that can be accessed to perform specific numerical tasks in a range of software tools and
programming languages.
Beyond your own programs, you should be aware that commercial programming
libraries contain many useful numerical procedures. For example, the Numerical Recipes
library includes a large range of algorithms written in Fortran and C.⁵ These procedures
are described in both book (for example, Press et al. 2007) and electronic form.
⁵ Numerical Recipes procedures are also available in book and electronic format for Pascal, MS BASIC, and
MATLAB. Information on all the Numerical Recipes products can be found at http://www.nr.com/.
PROBLEMS 49
2.4 The sine function can be evaluated by the following infinite series:
sinx 5 x 2
x3
3!
1
x5
5!
2
x7
7!
1 p
Write an algorithm to implement this formula so that it computes
and prints out the values of sin x as each term in the series is added.
In other words, compute and print in sequence the values for
sinx 5 x
sinx 5 x 2
x3
3!
sinx 5 x 2
x3
3!
1
x5
5!
up to the order term n of your choosing. For each of the preceding,
compute and display the percent relative error as
% error 5
true 2 series approximation
true
3 100%
Write the algorithm as (a) a structured flowchart and (b) pseudocode.
2.5 Develop, debug, and document a program for Prob. 2.4 in either a
high-level language or a macro language of your choice. Employ the
library function for the sine in your computer to determine the true
value. Have the program print out the series approximation and the error
at each step.As a test case, employ the program to compute sin(1.5) for
up to and including the term x15
/15!. Interpret your results.
2.6 The following algorithm is designed to determine a grade for a
course that consists of quizzes, homework, and a final exam:
Step 1: Input course number and name.
Step 2: Input weighting factors for quizzes (WQ), homework
(WH), and the final exam (WF).
Step 3: Input quiz grades and determine an average quiz grade (AQ).
Step 4: Input homework grades and determine an average home-
work grade (AH).
Step 5: If this course has a final grade, continue to step 6. If not, go
to step 9.
Step 6: Input final exam grade (FE).
Step 7: Determine average grade AG according to
AG 5
WQ 3 AQ 1 WH 3 AH 1 WF 3 FE
WQ 1 WH 1 WF
3 100%
Step 8: Go to step 10.
Step 9: Determine average grade AG according to
AG 5
WQ 3 AQ 1 WH 3 AH
WQ 1 WH
3 100%
2.1 Write pseudocode to implement the flowchart depicted in
Fig. P2.1. Make sure that proper indentation is included to make
the structure clear.
F
F
F
T
T
T
x = 75 x = 0
x = x – 50
x ≤ 500
x  50
x  100
FIGURE P2.1
2.2 Rewrite the following pseudocode using proper indentation
DO
j 5 j 1 1
x 5 x 1 5
IF x . 5 THEN
y 5 x
ELSE
y 5 0
ENDIF
z 5 x 1 y
IF z . 50 EXIT
ENDDO
2.3 Develop, debug, and document a program to determine the
roots of a quadratic equation, ax2
1 bx 1 c, in either a high-level
language or a macro language of your choice. Use a subroutine
procedure to compute the roots (either real or complex). Perform
test runs for the cases (a) a 5 1, b 5 6, c 5 2; (b) a 5 0, b 5 24,
c 5 1.6; (c) a 5 3, b 5 2.5, c 5 7.
PROBLEMS
50 PROGRAMMING AND SOFTWARE
2.8 An amount of money P is invested in an account where interest
is compounded at the end of the period. The future worth F yielded
at an interest rate i after n periods may be determined from the
following formula:
F 5 P(1 1 i)n
Write a program that will calculate the future worth of an investment
for each year from 1 through n. The input to the function should
include the initial investment P, the interest rate i (as a decimal),
and the number of years n for which the future worth is to be calcu-
lated. The output should consist of a table with headings and
columns for n and F. Run the program for P 5 $100,000, i 5 0.04,
and n 5 11 years.
2.9 Economic formulas are available to compute annual payments
for loans. Suppose that you borrow an amount of money P and
agree to repay it in n annual payments at an interest rate of i. The
formula to compute the annual payment A is
A 5 P
i(1 1 i)n
(1 1 i)n
2 1
Write a program to compute A. Test it with P 5 $55,000 and an
interest rate of 6.6% (i 5 0.066). Compute results for n 5 1, 2, 3, 4,
and 5 and display the results as a table with headings and columns
for n and A.
2.10 The average daily temperature for an area can be approxi-
mated by the following function,
T 5 Tmean 1 (Tpeak 2 Tmean) cos (v(t 2 tpeak))
where Tmean 5 the average annual temperature, Tpeak 5 the peak
temperature, v 5 the frequency of the annual variation (5 2p/365),
and tpeak 5 day of the peak temperature (˘ 205 d). Develop a
program that computes the average temperature between two days
of the year for a particular city. Test it for (a) January–February
(t 5 0 to 59) in Miami, Florida (Tmean 5 22.18C; Tpeak 5 28.38C),
and (b) July–August (t 5 180 to 242) in Boston, Massachusetts
(Tmean 5 10.78C; Tpeak 5 22.98C).
2.11 Develop, debug, and test a program in either a high-level
language or a macro language of your choice to compute the
velocity of the falling parachutist as outlined in Example 1.2.
Design the program so that it allows the user to input values for
the drag coefficient and mass. Test the program by duplicating
the results from Example 1.2. Repeat the computation but em-
ploy step sizes of 1 and 0.5 s. Compare your results with the
analytical solution obtained previously in Example 1.1. Does a
smaller step size make the results better or worse? Explain your
results.
2.12 The bubble sort is an inefficient, but easy-to-program,
sorting technique. The idea behind the sort is to move down
through an array comparing adjacent pairs and swapping the
Step 10: Print out course number, name, and average grade.
Step 11: Terminate computation.
(a) Write well-structured pseudocode to implement this algorithm.
(b) Write, debug, and document a structured computer program
based on this algorithm. Test it using the following data to
calculate a grade without the final exam and a grade with the
final exam: WQ 5 30; WH 5 40; WF 5 30; quizzes 5 98, 95,
90, 60, 99; homework 5 98, 95, 86, 100, 100, 77; and final
exam 5 91.
2.7 The “divide and average” method, an old-time method for
approximating the square root of any positive number a can be
formulated as
x 5
x 1 ayx
2
(a) Write well-structured pseudocode to implement this algorithm
as depicted in Fig. P2.7. Use proper indentation so that the
structure is clear.
(b) Develop, debug, and document a program to implement this
equation in either a high-level language or a macro language of
your choice. Structure your code according to Fig. P2.7.
[Flowchart: if a > 0, set tol = 10^-6 and x = a/2, then repeat y = (x + a/x)/2, e = |(y − x)/y|, x = y until e < tol and return SquareRoot = x; otherwise return SquareRoot = 0.]
FIGURE P2.7
decisional control structures (like If/Then, ElseIf, Else, End If).
Design the function so that it returns the volume for all cases
where the depth is less than 3R. Return an error message
(“Overtop”) if you overtop the tank, that is, d > 3R. Test it with
the following data:
R 1 1 1 1
d 0.5 1.2 3.0 3.1
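[Diagram: cylindrical tank with a conical base; dimension labels 2R, R, and d.]
FIGURE P2.13
[Diagram: x–y axes with quadrants I–IV and a point located by radius r and angle θ.]
FIGURE P2.14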
2.14 Two distances are required to specify the location of a point
relative to an origin in two-dimensional space (Fig. P2.14):
• The horizontal and vertical distances (x, y) in Cartesian
coordinates
• The radius and angle (r, θ) in radial coordinates.
values if they are out of order. For this method to sort the array
completely, it may need to pass through it many times. As the
passes proceed for an ascending-order sort, the smaller elements
in the array appear to rise toward the top like bubbles. Eventu-
ally, there will be a pass through the array where no swaps are
required. Then, the array is sorted. After the first pass, the larg-
est value in the array drops directly to the bottom. Consequently,
the second pass only has to proceed to the second-to-last value,
and so on. Develop a program to set up an array of 20 random
numbers and sort them in ascending order with the bubble sort
(Fig. P2.12).
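[Flowchart: start; m = n − 1; switch = false; i = 1; while i ≤ m, compare a_i with a_{i+1}: if a_i > a_{i+1}, swap a_i and a_{i+1} and set switch = true; then i = i + 1. When i > m: if not switch, end; otherwise m = m − 1 and begin another pass.]
FIGURE P2.12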
2.13 Figure P2.13 shows a cylindrical tank with a conical base.
If the liquid level is quite low in the conical part, the volume is
simply the conical volume of liquid. If the liquid level is mid-
range in the cylindrical part, the total volume of liquid includes
the filled conical part and the partially filled cylindrical part.
Write a well-structured function procedure to compute the
tank’s volume as a function of given values of R and d. Use
Letter   Criteria
A        90 ≤ numeric grade ≤ 100
B        80 ≤ numeric grade < 90
C        70 ≤ numeric grade < 80
D        60 ≤ numeric grade < 70
F        numeric grade < 60
2.16 Develop well-structured function procedures to determine
(a) the factorial; (b) the minimum value in a vector; and (c) the
average of the values in a vector.
2.17 Develop well-structured programs to (a) determine the square
root of the sum of the squares of the elements of a two-dimensional
array (i.e., a matrix) and (b) normalize a matrix by dividing each
row by the maximum absolute value in the row so that the maxi-
mum element in each row is 1.
2.18 Piecewise functions are sometimes useful when the relation-
ship between a dependent and an independent variable cannot be
adequately represented by a single equation. For example, the
velocity of a rocket might be described by
$$v(t) = \begin{cases}
11t^2 - 5t & 0 \le t \le 10 \\
1100 - 5t & 10 \le t \le 20 \\
50t + 2(t - 20)^2 & 20 \le t \le 30 \\
1520e^{-0.2(t - 30)} & t > 30 \\
0 & \text{otherwise}
\end{cases}$$
Develop a well-structured function to compute v as a function of t.
Then use this function to generate a table of v versus t for t = −5
to 50 at increments of 0.5.
2.19 Develop a well-structured function to determine the elapsed
days in a year. The function should be passed three values: mo = the
month (1–12), da = the day (1–31), and leap = (0 for non–leap
year and 1 for leap year). Test it for January 1, 1999; February 29,
2000; March 1, 2001; June 21, 2002; and December 31, 2004.
Hint: a nice way to do this combines the for and the switch
structures.
2.20 Develop a well-structured function to determine the elapsed
days in a year. The first line of the function should be set up as
function nd = days(mo, da, year)
where mo = the month (1–12), da = the day (1–31), and year = the
year. Test it for January 1, 1999; February 29, 2000; March 1, 2001;
June 21, 2002; and December 31, 2004.
2.21 Manning’s equation can be used to compute the velocity of
water in a rectangular open channel,
$$U = \frac{\sqrt{S}}{n}\left(\frac{BH}{B + 2H}\right)^{2/3}$$
It is relatively straightforward to compute Cartesian coordinates
(x, y) on the basis of polar coordinates (r, θ). The reverse process
is not so simple. The radius can be computed by the following
formula:
$$r = \sqrt{x^2 + y^2}$$
If the coordinates lie within the first and fourth quadrants (i.e.,
x > 0), then a simple formula can be used to compute θ:
$$\theta = \tan^{-1}\left(\frac{y}{x}\right)$$
The difficulty arises for the other cases. The following table summarizes
the possibilities:
x     y     θ
<0    >0    tan⁻¹(y/x) + π
<0    <0    tan⁻¹(y/x) − π
<0    =0    π
=0    >0    π/2
=0    <0    −π/2
=0    =0    0
(a) Write a well-structured flowchart for a subroutine procedure to
calculate r and θ as a function of x and y. Express the final
results for θ in degrees.
(b) Write a well-structured function procedure based on your
flowchart. Test your program by using it to fill out the following
table:
x     y     r     θ
1     0
1     1
0     1
−1    1
−1    0
−1    −1
0     −1
1     −1
0     0
2.15 Develop a well-structured function procedure that is passed a
numeric grade from 0 to 100 and returns a letter grade according to
the scheme:
2.23 The volume V of liquid in a hollow horizontal cylinder
of radius r and length L is related to the depth of the liquid h by
$$V = \left[r^2 \cos^{-1}\left(\frac{r - h}{r}\right) - (r - h)\sqrt{2rh - h^2}\right]L$$
Develop a well-structured function to create a plot of volume versus
depth. Test the program for r = 2 m and L = 5 m.
2.24 Develop a well-structured program to compute the ve-
locity of a parachutist as a function of time using Euler’s
method. Test your program for the case where m = 80 kg and
c = 10 kg/s. Perform the calculation from t = 0 to 20 s with a
step size of 2 s. Use an initial condition that the parachutist
has an upward velocity of 20 m/s at t = 0. At t = 10 s, assume
that the parachute is instantaneously deployed so that the drag
coefficient jumps to 50 kg/s.
2.25 The pseudocode in Fig. P2.25 computes the factorial. Express
this algorithm as a well-structured function in the language of your
choice. Test it by computing 0! and 5!. In addition, test the error
trap by trying to evaluate −2!.
FUNCTION fac(n)
  IF n ≥ 0 THEN
    x = 1
    DOFOR i = 1, n
      x = x · i
    END DO
    fac = x
  ELSE
    display error message
    terminate
  ENDIF
END fac
FIGURE P2.25
2.26 The height of a small rocket y can be calculated as a function
of time after blastoff with the following piecewise function:
$$y = \begin{cases}
38.1454t + 0.13743t^3 & 0 \le t < 15 \\
1036 + 130.909(t - 15) + 6.18425(t - 15)^2 - 0.428(t - 15)^3 & 15 \le t < 33 \\
2900 - 62.468(t - 33) - 16.9274(t - 33)^2 + 0.41796(t - 33)^3 & t > 33
\end{cases}$$
where U = velocity (m/s), S = channel slope, n = roughness coefficient,
B = width (m), and H = depth (m). The following data are
available for five channels:
n S B H
0.035 0.0001 10 2
0.020 0.0002 8 1
0.015 0.0010 20 1.5
0.030 0.0007 24 3
0.022 0.0003 15 2.5
Write a well-structured program that computes the velocity for
each of these channels. Have the program display the input data
along with the computed velocity in tabular form where velocity
is the fifth column. Include headings on the table to label the
columns.
2.22 A simply supported beam is loaded as shown in Fig. P2.22.
Using singularity functions, the displacement along the beam can
be expressed by the equation:
$$u_y(x) = \frac{-5}{6}\left[\langle x - 0\rangle^4 - \langle x - 5\rangle^4\right] + \frac{15}{6}\langle x - 8\rangle^3 + 75\langle x - 7\rangle^2 + \frac{57}{6}x^3 - 238.25x$$
By definition, the singularity function can be expressed as
follows:
$$\langle x - a\rangle^n = \begin{cases}
(x - a)^n & \text{when } x > a \\
0 & \text{when } x \le a
\end{cases}$$
Develop a program that creates a plot of displacement versus
distance along the beam x. Note that x = 0 at the left end of the
beam.
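[Diagram: simply supported beam carrying a 20 kips/ft distributed load, a 150 kip-ft moment, and a 15 kips point load; segment lengths 5′, 2′, 1′, 2′.]
FIGURE P2.22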
Develop a well-structured pseudocode function to compute y as a
function of t. Note that if the user enters a negative value of t or if
the rocket has hit the ground (y ≤ 0) then return a value of zero
for y. Also, the function should be invoked in the calling program
as height(t). Write the algorithm as (a) pseudocode, or (b) in
the high-level language of your choice.
2.27 As depicted in Fig. P2.27, a water tank consists of a
cylinder topped by the frustum of a cone. Develop a well-
structured function in the high-level language or macro lan-
guage of your choice to compute the volume given the water
level h (m) above the tank’s bottom. Design the function so
that it returns a value of zero for negative h’s and the value of
the maximum filled volume for h’s greater than the tank’s maximum
depth. Given the following parameters, H1 = 10 m, r1 = 4 m,
H2 = 5 m, and r2 = 6.5 m, test your function by using it to
compute the volumes and generate a graph of the volume as a
function of level from h = −1 to 16 m.
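[Diagram: water tank made of a cylinder topped by the frustum of a cone; dimension labels h, H1, H2, r1, and r2.]
FIGURE P2.27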
CHAPTER 3
Approximations and Round-Off Errors
Because so many of the methods in this book are straightforward in description and
application, it would be very tempting at this point for us to proceed directly to the main
body of the text and teach you how to use these techniques. However, understanding the
concept of error is so important to the effective use of numerical methods that we have
chosen to devote the next two chapters to this topic.
The importance of error was introduced in our discussion of the falling parachutist
in Chap. 1. Recall that we determined the velocity of a falling parachutist by both ana-
lytical and numerical methods. Although the numerical technique yielded estimates that
were close to the exact analytical solution, there was a discrepancy, or error, because the
numerical method involved an approximation. Actually, we were fortunate in that case
because the availability of an analytical solution allowed us to compute the error exactly.
For many applied engineering problems, we cannot obtain analytical solutions. Therefore,
we cannot compute exactly the errors associated with our numerical methods. In these
cases, we must settle for approximations or estimates of the errors.
Such errors are characteristic of most of the techniques described in this book. This
statement might at first seem contrary to what one normally conceives of as sound
engineering. Students and practicing engineers constantly strive to limit errors in their
work. When taking examinations or doing homework problems, you are penalized, not
rewarded, for your errors. In professional practice, errors can be costly and sometimes
catastrophic. If a structure or device fails, lives can be lost.
Although perfection is a laudable goal, it is rarely, if ever, attained. For example, despite
the fact that the model developed from Newton’s second law is an excellent approximation,
it would never in practice exactly predict the parachutist’s fall. A variety of factors such as
winds and slight variations in air resistance would result in deviations from the prediction. If
these deviations are systematically high or low, then we might need to develop a new model.
However, if they are randomly distributed and tightly grouped around the prediction, then the
deviations might be considered negligible and the model deemed adequate. Numerical
approximations also introduce similar discrepancies into the analysis. Again, the question
is: How much error is present in our calculations and is it tolerable?
This chapter and Chap. 4 cover basic topics related to the identification, quan-
tification, and minimization of these errors. In this chapter, general information con-
cerned with the quantification of error is reviewed in the first sections. This is
followed by a section on one of the two major forms of numerical error: round-off
error. Round-off error is due to the fact that computers can represent only quantities
with a finite number of digits. Then Chap. 4 deals with the other major form: trun-
cation error. Truncation error is the discrepancy introduced by the fact that numeri-
cal methods may employ approximations to represent exact mathematical operations
and quantities. Finally, we briefly discuss errors not directly connected with the
numerical methods themselves. These include blunders, formulation or model errors,
and data uncertainty.
3.1 SIGNIFICANT FIGURES
This book deals extensively with approximations connected with the manipulation of
numbers. Consequently, before discussing the errors associated with numerical methods,
it is useful to review basic concepts related to approximate representation of the numbers
themselves.
Whenever we employ a number in a computation, we must have assurance that it
can be used with confidence. For example, Fig. 3.1 depicts a speedometer and odom-
eter from an automobile. Visual inspection of the speedometer indicates that the car is
traveling between 48 and 49 km/h. Because the indicator is higher than the midpoint
between the markers on the gauge, we can say with assurance that the car is traveling
at approximately 49 km/h. We have confidence in this result because two or more rea-
sonable individuals reading this gauge would arrive at the same conclusion. However,
let us say that we insist that the speed be estimated to one decimal place. For this case,
[Speedometer dial marked from 0 to 120 km/h in steps of 20, with an odometer reading 87324.45.]
FIGURE 3.1
An automobile speedometer and odometer illustrating the concept of a significant figure.
one person might say 48.8, whereas another might say 48.9 km/h. Therefore, because of
the limits of this instrument, only the first two digits can be used with confidence. Estimates
of the third digit (or higher) must be viewed as approximations. It would be ludicrous to
claim, on the basis of this speedometer, that the automobile is traveling at 48.8642138 km/h.
In contrast, the odometer provides up to six certain digits. From Fig. 3.1, we can conclude
that the car has traveled slightly less than 87,324.5 km during its lifetime. In this case, the
seventh digit (and higher) is uncertain.
The concept of a significant figure, or digit, has been developed to formally designate
the reliability of a numerical value. The significant digits of a number are those that can
be used with confidence. They correspond to the number of certain digits plus one esti-
mated digit. For example, the speedometer and the odometer in Fig. 3.1 yield readings
of three and seven significant figures, respectively. For the speedometer, the two certain
digits are 48. It is conventional to set the estimated digit at one-half of the smallest scale
division on the measurement device. Thus the speedometer reading would consist of the
three significant figures: 48.5. In a similar fashion, the odometer would yield a seven-
significant-figure reading of 87,324.45.
Although it is usually a straightforward procedure to ascertain the significant figures
of a number, some cases can lead to confusion. For example, zeros are not always sig-
nificant figures because they may be necessary just to locate a decimal point. The num-
bers 0.00001845, 0.0001845, and 0.001845 all have four significant figures. Similarly,
when trailing zeros are used in large numbers, it is not clear how many, if any, of the
zeros are significant. For example, at face value the number 45,300 may have three, four,
or five significant digits, depending on whether the zeros are known with confidence. Such
uncertainty can be resolved by using scientific notation, where 4.53 × 10^4, 4.530 × 10^4, and
4.5300 × 10^4 designate that the number is known to three, four, and five significant figures,
respectively.
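The way scientific notation pins down the number of significant figures can be sketched in a few lines of Python. This is only an illustration: the helper name to_sig_figs is hypothetical and not part of the text, and it relies on Python's standard "e" format specifier.

```python
def to_sig_figs(value: float, n: int) -> str:
    """Format value in scientific notation with n significant figures.

    Hypothetical helper for illustration only.
    """
    # n significant figures = one leading digit + (n - 1) decimal places
    return f"{value:.{n - 1}e}"


# The same quantity, 45,300, stated to three, four, and five significant figures
for n in (3, 4, 5):
    print(n, to_sig_figs(45300, n))
```

The mantissa carries exactly the digits that are claimed to be known, so the ambiguity of the trailing zeros in 45,300 disappears.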
The concept of significant figures has two important implications for our study of
numerical methods:
1. As introduced in the falling parachutist problem, numerical methods yield approxi-
mate results. We must, therefore, develop criteria to specify how confident we are in
our approximate result. One way to do this is in terms of significant figures. For
example, we might decide that our approximation is acceptable if it is correct to four
significant figures.
2. Although quantities such as π, e, or √7 represent specific quantities, they cannot be
expressed exactly by a limited number of digits. For example,
π = 3.141592653589793238462643 …
ad infinitum. Because computers retain only a finite number of significant figures,
such numbers can never be represented exactly. The omission of the remaining
significant figures is called round-off error.
Both round-off error and the use of significant figures to express our confidence in
a numerical result will be explored in detail in subsequent sections. In addition, the
concept of significant figures will have relevance to our definition of accuracy and preci-
sion in the next section.
3.2 ACCURACY AND PRECISION
The errors associated with both calculations and measurements can be characterized with
regard to their accuracy and precision. Accuracy refers to how closely a computed or
measured value agrees with the true value. Precision refers to how closely individual
computed or measured values agree with each other.
These concepts can be illustrated graphically using an analogy from target practice.
The bullet holes on each target in Fig. 3.2 can be thought of as the predictions of a nu-
merical technique, whereas the bull’s-eye represents the truth. Inaccuracy (also called bias)
is defined as systematic deviation from the truth. Thus, although the shots in Fig. 3.2c are
more tightly grouped than those in Fig. 3.2a, the two cases are equally biased because
they are both centered on the upper left quadrant of the target. Imprecision (also called
uncertainty), on the other hand, refers to the magnitude of the scatter. Therefore, although
Fig. 3.2b and d are equally accurate (that is, centered on the bull’s-eye), the latter is
more precise because the shots are tightly grouped.
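As a rough numerical counterpart to the target analogy, bias can be estimated as the mean deviation from the true value and imprecision as the scatter about the mean. The sketch below assumes one-dimensional measurements and made-up sample data; the function name bias_and_scatter and the choice of the population standard deviation as the scatter measure are illustrative assumptions, not definitions from the text.

```python
from statistics import mean, pstdev


def bias_and_scatter(values, truth):
    """Return (bias, scatter) for a set of repeated results.

    bias    : mean of the values minus the true value (inaccuracy)
    scatter : population standard deviation of the values (imprecision)
    """
    return mean(values) - truth, pstdev(values)


truth = 10.0
inaccurate_precise = [12.0, 12.1, 11.9]  # hypothetical data: tight group, off target
accurate_imprecise = [8.0, 10.0, 12.0]   # hypothetical data: centered, widely scattered

for name, data in (("inaccurate/precise", inaccurate_precise),
                   ("accurate/imprecise", accurate_imprecise)):
    bias, scatter = bias_and_scatter(data, truth)
    print(f"{name}: bias = {bias:.3f}, scatter = {scatter:.3f}")
```

The first data set plays the role of a tightly grouped but off-center target; the second is centered on the truth but widely scattered.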
Numerical methods should be sufficiently accurate or unbiased to meet the require-
ments of a particular engineering problem. They also should be precise enough for adequate
[Four targets, panels (a) through (d), arranged along axes labeled "Increasing accuracy" and "Increasing precision".]
FIGURE 3.2
An example from marksmanship illustrating the concepts of accuracy and precision. (a) Inaccurate
and imprecise; (b) accurate and imprecise; (c) inaccurate and precise; (d) accurate and precise.
engineering design. In this book, we will use the collective term error to represent both
the inaccuracy and the imprecision of our predictions. With these concepts as background,
we can now discuss the factors that contribute to the error of numerical computations.
3.3 ERROR DEFINITIONS
Numerical errors arise from the use of approximations to represent exact mathematical
operations and quantities. These include truncation errors, which result when approxima-
tions are used to represent exact mathematical procedures, and round-off errors, which
result when numbers having limited significant figures are used to represent exact num-
bers. For both types, the relationship between the exact, or true, result and the approxi-
mation can be formulated as
$$\text{True value} = \text{approximation} + \text{error} \tag{3.1}$$
By rearranging Eq. (3.1), we find that the numerical error is equal to the discrepancy
between the truth and the approximation, as in
$$E_t = \text{true value} - \text{approximation} \tag{3.2}$$
where Et is used to designate the exact value of the error. The subscript t is included to
designate that this is the “true” error. This is in contrast to other cases, as described
shortly, where an “approximate” estimate of the error must be employed.
A shortcoming of this definition is that it takes no account of the order of magnitude
of the value under examination. For example, an error of a centimeter is much more sig-
nificant if we are measuring a rivet rather than a bridge. One way to account for the mag-
nitudes of the quantities being evaluated is to normalize the error to the true value, as in
$$\text{True fractional relative error} = \frac{\text{true error}}{\text{true value}}$$
where, as specified by Eq. (3.2), error = true value − approximation. The relative error
can also be multiplied by 100 percent to express it as
$$\varepsilon_t = \frac{\text{true error}}{\text{true value}}\,100\% \tag{3.3}$$
where ε_t designates the true percent relative error.
EXAMPLE 3.1 Calculation of Errors
Problem Statement. Suppose that you have the task of measuring the lengths of a
bridge and a rivet and come up with 9999 and 9 cm, respectively. If the true values are
10,000 and 10 cm, respectively, compute (a) the true error and (b) the true percent rela-
tive error for each case.
Solution.
(a) The error for measuring the bridge is [Eq. (3.2)]
$$E_t = 10{,}000 - 9999 = 1 \text{ cm}$$
and for the rivet it is
$$E_t = 10 - 9 = 1 \text{ cm}$$
(b) The percent relative error for the bridge is [Eq. (3.3)]
$$\varepsilon_t = \frac{1}{10{,}000}\,100\% = 0.01\%$$
and for the rivet it is
$$\varepsilon_t = \frac{1}{10}\,100\% = 10\%$$
Thus, although both measurements have an error of 1 cm, the relative error for the rivet
is much greater. We would conclude that we have done an adequate job of measuring
the bridge, whereas our estimate for the rivet leaves something to be desired.
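The arithmetic of Example 3.1 can be checked with a short Python sketch of Eqs. (3.2) and (3.3). The function names are illustrative choices, not names used in the text.

```python
def true_error(true_value: float, approximation: float) -> float:
    """Eq. (3.2): true error = true value - approximation."""
    return true_value - approximation


def true_percent_relative_error(true_value: float, approximation: float) -> float:
    """Eq. (3.3): true error normalized to the true value, in percent."""
    return true_error(true_value, approximation) / true_value * 100.0


# Measurements from Example 3.1, in cm: (true value, measured value)
cases = {"bridge": (10_000, 9_999), "rivet": (10, 9)}
for name, (true_value, measured) in cases.items():
    e_t = true_error(true_value, measured)
    eps_t = true_percent_relative_error(true_value, measured)
    print(f"{name}: E_t = {e_t} cm, eps_t = {eps_t:.2f}%")
```

Both cases give the same absolute error, while the relative errors differ by three orders of magnitude, which is the point of the example.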
Notice that for Eqs. (3.2) and (3.3), E and ε are subscripted with a t to signify that
the error is normalized to the true value. In Example 3.1, we were provided with this
value. However, in actual situations such information is rarely available. For numerical
methods, the true value will be known only when we deal with functions that can be
solved analytically. Such will typically be the case when we investigate the theoretical
behavior of a particular technique for simple systems. However, in real-world applications,
we will obviously not know the true answer a priori. For these situations, an alternative
is to normalize the error using the best available estimate of the true value, that is, to the
approximation itself, as in
$$\varepsilon_a = \frac{\text{approximate error}}{\text{approximation}}\,100\% \tag{3.4}$$
where the subscript a signifies that the error is normalized to an approximate value. Note
also that for real-world applications, Eq. (3.2) cannot be used to calculate the error term
for Eq. (3.4). One of the challenges of numerical methods is to determine error estimates
in the absence of knowledge regarding the true value. For example, certain numerical
methods use an iterative approach to compute answers. In such an approach, a present
approximation is made on the basis of a previous approximation. This process is performed
repeatedly, or iteratively, to successively compute (we hope) better and better approxima-
tions. For such cases, the error is often estimated as the difference between previous and
current approximations. Thus, percent relative error is determined according to
$$\varepsilon_a = \frac{\text{current approximation} - \text{previous approximation}}{\text{current approximation}}\,100\% \tag{3.5}$$
This and other approaches for expressing errors will be elaborated on in subsequent chapters.
The signs of Eqs. (3.2) through (3.5) may be either positive or negative. If the
approximation is greater than the true value (or the previous approximation is greater
than the current approximation), the error is negative; if the approximation is less than
the true value, the error is positive. Also, for Eqs. (3.3) to (3.5), the denominator may
be less than zero, which can also lead to a negative error. Often, when performing
computations, we may not be concerned with the sign of the error, but we are interested
in whether the percent absolute value is lower than a prespecified percent tolerance ε_s.
Therefore, it is often useful to employ the absolute value of Eqs. (3.2) through (3.5).
For such cases, the computation is repeated until
$$|\varepsilon_a| < \varepsilon_s \tag{3.6}$$
If this relationship holds, our result is assumed to be within the prespecified acceptable
level ε_s. Note that for the remainder of this text, we will almost exclusively employ
absolute values when we use relative errors.
It is also convenient to relate these errors to the number of significant figures in the
approximation. It can be shown (Scarborough, 1966) that if the following criterion is
met, we can be assured that the result is correct to at least n significant figures.
$$\varepsilon_s = (0.5 \times 10^{2-n})\% \tag{3.7}$$
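As a quick illustration of Eq. (3.7), the stopping tolerance for a desired number of significant figures can be tabulated in Python. The function name stopping_criterion is a hypothetical choice made for this sketch.

```python
def stopping_criterion(n: int) -> float:
    """Eq. (3.7): percent tolerance that assures at least n significant figures."""
    return 0.5 * 10.0 ** (2 - n)


# Tolerances (in percent) for one through six significant figures
for n in range(1, 7):
    print(f"n = {n}: eps_s = {stopping_criterion(n):g}%")
```

Each additional significant figure tightens the tolerance by a factor of 10.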
EXAMPLE 3.2 Error Estimates for Iterative Methods
Problem Statement. In mathematics, functions can often be represented by infinite
series. For example, the exponential function can be computed using
$$e^x = 1 + x + \frac{x^2}{2} + \frac{x^3}{3!} + \cdots + \frac{x^n}{n!} \tag{E3.2.1}$$
Thus, as more terms are added in sequence, the approximation becomes a better and better
estimate of the true value of e^x. Equation (E3.2.1) is called a Maclaurin series expansion.
Starting with the simplest version, e^x = 1, add terms one at a time to estimate e^0.5.
After each new term is added, compute the true and approximate percent relative errors
with Eqs. (3.3) and (3.5), respectively. Note that the true value is e^0.5 = 1.648721 . . . .
Add terms until the absolute value of the approximate error estimate ε_a falls below a
prespecified error criterion ε_s conforming to three significant figures.
Solution. First, Eq. (3.7) can be employed to determine the error criterion that ensures
a result is correct to at least three significant figures:
$$\varepsilon_s = (0.5 \times 10^{2-3})\% = 0.05\%$$
Thus, we will add terms to the series until ε_a falls below this level.
The first estimate is simply equal to Eq. (E3.2.1) with a single term. Thus, the first es-
timate is equal to 1. The second estimate is then generated by adding the second term, as in
$$e^x = 1 + x$$
or for x = 0.5,
$$e^{0.5} = 1 + 0.5 = 1.5$$
This represents a true percent relative error of [Eq. (3.3)]
$$\varepsilon_t = \frac{1.648721 - 1.5}{1.648721}\,100\% = 9.02\%$$
Equation (3.5) can be used to determine an approximate estimate of the error, as in
$$\varepsilon_a = \frac{1.5 - 1}{1.5}\,100\% = 33.3\%$$
Because ε_a is not less than the required value of ε_s, we would continue the computation
by adding another term, x^2/2!, and repeating the error calculations. The process is continued
until ε_a < ε_s. The entire computation can be summarized as
Terms   Result         ε_t (%)    ε_a (%)
1       1              39.3
2       1.5            9.02       33.3
3       1.625          1.44       7.69
4       1.645833333    0.175      1.27
5       1.648437500    0.0172     0.158
6       1.648697917    0.00142    0.0158
Thus, after six terms are included, the approximate error falls below ε_s = 0.05% and the
computation is terminated. However, notice that, rather than three significant figures, the
result is accurate to five! This is because, for this case, both Eqs. (3.5) and (3.7) are con-
servative. That is, they ensure that the result is at least as good as they specify. Although,
as discussed in Chap. 6, this is not always the case for Eq. (3.5), it is true most of the time.
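The table in Example 3.2 can be regenerated with a few lines of Python. The sketch below is one possible arrangement, not the text's own program: the function name maclaurin_exp_table and the max_terms safety limit are assumptions introduced here, and math.exp stands in for the true value.

```python
import math


def maclaurin_exp_table(x: float, es: float, max_terms: int = 50):
    """Add Maclaurin terms for e**x until the approximate error drops below es.

    Returns rows of (terms, result, true % error, approximate % error).
    The approximate error is None for the first row, where no previous
    estimate exists. max_terms is a safety limit assumed for this sketch.
    """
    true_value = math.exp(x)
    result = 1.0  # one-term estimate
    rows = [(1, result, abs((true_value - result) / true_value) * 100, None)]
    for n in range(1, max_terms):
        previous = result
        result += x ** n / math.factorial(n)                    # add the next term
        et = abs((true_value - result) / true_value) * 100      # Eq. (3.3)
        ea = abs((result - previous) / result) * 100            # Eq. (3.5)
        rows.append((n + 1, result, et, ea))
        if ea < es:                                             # Eq. (3.6)
            break
    return rows


for terms, result, et, ea in maclaurin_exp_table(0.5, 0.05):
    ea_text = "" if ea is None else f"{ea:.3g}"
    print(f"{terms:5d}  {result:.9f}  {et:8.3g}  {ea_text}")
```

Calling the function with x = 0.5 and a tolerance of 0.05% stops after six terms, in line with the table above.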
3.3.1 Computer Algorithm for Iterative Calculations
Many of the numerical methods described in the remainder of this text involve iterative cal-
culations of the sort illustrated in Example 3.2. These all entail solving a mathematical
problem by computing successive approximations to the solution starting from an initial guess.
The computer implementation of such iterative solutions involves loops. As we saw
in Sec. 2.1.1, these come in two basic flavors: count-controlled and decision loops. Most
iterative solutions use decision loops. Thus, rather than employing a prespecified number
of iterations, the process typically is repeated until an approximate error estimate falls
below a stopping criterion, as in Example 3.2.
A pseudocode for a generic iterative calculation is presented in Fig. 3.3. The function
is passed a value (val) along with a stopping error criterion (es) and a maximum al-
lowable number of iterations (maxit). The value is typically either (1) an initial value
or (2) the value for which the iterative calculation is to be made.
The function first initializes three variables. These include (1) a variable iter that
keeps track of the number of iterations, (2) a variable sol that holds the current estimate
of the solution, and (3) a variable ea that holds the approximate percent relative error.
Note that ea is initially set to a value of 100 to ensure that the loop executes at least once.
These initializations are followed by the decision loop that actually implements the
iterative calculation. Prior to generating a new solution, sol is first assigned to solold.
Then a new value of sol is computed and the iteration counter is incremented. If the
new value of sol is nonzero, the percent relative error ea is determined. The stopping
criteria are then tested. If both are false, the loop repeats. If either is true, the loop
terminates and the final solution is sent back to the function call. The following example
illustrates how the generic algorithm can be applied to a specific iterative calculation.
EXAMPLE 3.3 Computer Implementation of an Iterative Calculation
Problem Statement. Develop a computer program based on the pseudocode from
Fig. 3.3 to implement the calculation from Example 3.2.
Solution. A function to implement the Maclaurin series expansion for e^x can be based on
the general scheme in Fig. 3.3. To do this, we first formulate the series expansion as a formula:
$$e^x \cong \sum_{i=0}^{n} \frac{x^i}{i!}$$
Figure 3.4 shows functions to implement this series written in VBA and MATLAB software.
Similar codes could be developed in other languages such as C++ or Fortran 95. Notice
that whereas MATLAB has a built-in factorial function, it is necessary to compute the
factorial as part of the VBA implementation with a simple product accumulator fac.
When the programs are run, they generate an estimate for the exponential function.
For the MATLAB version, the answer is returned along with the approximate error and
the number of iterations. For example, e^1 can be evaluated as
>> format long
>> [val, ea, iter] = IterMeth(1,1e-6,100)
val =
   2.718281826198493
ea =
   9.216155641522974e-007
iter =
   12
FUNCTION IterMeth(val, es, maxit)
  iter = 1
  sol = val
  ea = 100
  DO
    solold = sol
    sol = ...
    iter = iter + 1
    IF sol ≠ 0 ea = abs((sol − solold)/sol)*100
    IF ea ≤ es OR iter ≥ maxit EXIT
  END DO
  IterMeth = sol
END IterMeth
FIGURE 3.3
Pseudocode for a generic iterative calculation.
We can see that after 12 iterations, we obtain a result of 2.7182818 with an approximate
error estimate of 9.2162 × 10^−7 %. The result can be verified by using the built-in
exp function to directly calculate the exact value and the true percent relative error,
>> trueval=exp(1)
trueval =
   2.718281828459046
>> et=abs((trueval-val)/trueval)*100
et =
   8.316108397236229e-008
As was the case with Example 3.2, we obtain the desirable outcome that the true error
is less than the approximate error.
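For readers working outside VBA or MATLAB, the same loop can be sketched in Python. This is an illustrative port under the assumption that it mirrors the structure of the pseudocode in Fig. 3.3; the name iter_meth and the default argument values are choices made here, not part of the text.

```python
import math


def iter_meth(x: float, es: float = 1e-6, maxit: int = 100):
    """Estimate e**x with a Maclaurin series, following the Fig. 3.3 loop.

    Returns (solution, approximate percent relative error, iteration count).
    """
    iteration = 1
    sol = 1.0
    ea = 100.0  # start at 100% so the stopping test cannot pass prematurely
    while True:
        sol_old = sol
        sol += x ** iteration / math.factorial(iteration)
        iteration += 1
        if sol != 0:
            ea = abs((sol - sol_old) / sol) * 100
        # Stop when the error estimate is small enough or the limit is reached
        if ea <= es or iteration >= maxit:
            break
    return sol, ea, iteration


val, ea, iterations = iter_meth(1.0, 1e-6, 100)
true_val = math.exp(1.0)
et = abs((true_val - val) / true_val) * 100
print(val, ea, iterations, et)
```

As in the MATLAB session, the loop exits through the error test long before the iteration limit is reached, and the true error computed from math.exp is smaller than the approximate estimate.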
With the preceding definitions as background, we can now proceed to the two types
of error connected directly with numerical methods: round-off errors and truncation
errors.
(b) MATLAB
function [v,ea,iter] = IterMeth(x,es,maxit)
% initialization
iter = 1;
sol = 1;
ea = 100;
% iterative calculation
while (1)
  solold = sol;
  sol = sol + x ^ iter / factorial(iter);
  iter = iter + 1;
  if sol~=0
    ea=abs((sol - solold)/sol)*100;
  end
  if ea<=es | iter>=maxit,break,end
end
v = sol;
end
(a) VBA/Excel
Function IterMeth(x, es, maxit)
' initialization
iter = 1
sol = 1
ea = 100
fac = 1
' iterative calculation
Do
  solold = sol
  fac = fac * iter
  sol = sol + x ^ iter / fac
  iter = iter + 1
  If sol <> 0 Then
    ea = Abs((sol - solold) / sol) * 100
  End If
  If ea <= es Or iter >= maxit Then Exit Do
Loop
IterMeth = sol
End Function
FIGURE 3.4
(a) VBA/Excel and (b) MATLAB functions based on the pseudocode from Fig. 3.3.
3.4 ROUND-OFF ERRORS
As mentioned previously, round-off errors originate from the fact that computers retain
only a fixed number of significant figures during a calculation. Numbers such as π, e,
or √7 cannot be expressed by a fixed number of significant figures. Therefore, they
cannot be represented exactly by the computer. In addition, because computers use a
base-2 representation, they cannot precisely represent certain exact base-10 numbers. The
discrepancy introduced by this omission of significant figures is called round-off error.
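Both effects can be observed directly in Python, whose float type is a binary floating-point number. The short sketch below is illustrative only; it uses the standard decimal and math modules, and the variable names are assumptions made here.

```python
from decimal import Decimal
import math

# The decimal value 0.1 has no exact base-2 representation. Decimal(0.1)
# shows the exact value that is actually stored for the literal 0.1.
stored_tenth = Decimal(0.1)
print(stored_tenth)

# Because of that discrepancy, simple decimal arithmetic can fail an
# exact equality test.
sum_matches = (0.1 + 0.2 == 0.3)
print(sum_matches)  # False

# pi is irrational, so only a finite number of its digits is retained.
print(repr(math.pi))
```

The stored value of 0.1 differs slightly from one-tenth, which is why accumulated sums of such numbers drift from their exact decimal counterparts.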
3.4.1 Computer Representation of Numbers
Numerical round-off errors are directly related to the manner in which numbers are stored
in a computer. The fundamental unit whereby information is represented is called a word.
This is an entity that consists of a string of binary digits, or bits. Numbers are typically
stored in one or more words. To understand how this is accomplished, we must first
review some material related to number systems.
Number Systems. A number system is merely a convention for representing quantities.
Because we have 10 fingers and 10 toes, the number system that we are most familiar
with is the decimal, or base-10, number system. A base is the number used as the refer-
ence for constructing the system. The base-10 system uses the 10 digits—0, 1, 2, 3, 4,
5, 6, 7, 8, 9—to represent numbers. By themselves, these digits are satisfactory for
counting from 0 to 9.
For larger quantities, combinations of these basic digits are used, with the position
or place value specifying the magnitude. The right-most digit in a whole number repre-
sents a number from 0 to 9. The second digit from the right represents a multiple of 10.
The third digit from the right represents a multiple of 100 and so on. For example, if
we have the number 86,409 then we have eight groups of 10,000, six groups of 1000,
four groups of 100, zero groups of 10, and nine more units, or
$(8 \times 10^4) + (6 \times 10^3) + (4 \times 10^2) + (0 \times 10^1) + (9 \times 10^0) = 86{,}409$
Figure 3.5a provides a visual representation of how a number is formulated in the
base-10 system. This type of representation is called positional notation.
Because the decimal system is so familiar, it is not commonly realized that there are
alternatives. For example, if human beings happened to have had eight fingers and eight
toes, we would undoubtedly have developed an octal, or base-8, representation. In the
same sense, our friend the computer is like a two-fingered animal who is limited to two
states—either 0 or 1. This relates to the fact that the primary logic units of digital com-
puters are on/off electronic components. Hence, numbers on the computer are represented
with a binary, or base-2, system. Just as with the decimal system, quantities can be
represented using positional notation. For example, the binary number 11 is equivalent
to $(1 \times 2^1) + (1 \times 2^0) = 2 + 1 = 3$ in the decimal system. Figure 3.5b illustrates a
more complicated example.
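As a minimal sketch of positional notation, the following Python function multiplies each digit by the base raised to its position. The function name positional_value is an illustrative choice and does not appear in the text.

```python
def positional_value(digits: str, base: int) -> int:
    """Evaluate a string of digits in the given base using positional notation."""
    total = 0
    # Walk from the right-most digit, whose place value is base**0.
    for position, char in enumerate(reversed(digits)):
        total += int(char, base) * base ** position
    return total


print(positional_value("86409", 10))    # decimal example from the text
print(positional_value("11", 2))        # binary 11
print(positional_value("10101101", 2))  # binary number shown in Fig. 3.5b
```

The same loop handles any base from 2 to 36, so it can also be used to check octal conversions.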
Integer Representation. Now that we have reviewed how base-10 numbers can be
represented in binary form, it is simple to conceive of how integers are represented on
a computer. The most straightforward approach, called the signed magnitude method,
employs the first bit of a word to indicate the sign, with a 0 for positive and a 1 for
negative. The remaining bits are used to store the number. For example, the integer value
of −173 would be stored on a 16-bit computer, as in Fig. 3.6.
EXAMPLE 3.4 Range of Integers
Problem Statement. Determine the range of integers in base-10 that can be represented
on a 16-bit computer.
FIGURE 3.5
How the (a) decimal (base-10) and the (b) binary (base-2) systems work. In (b), the binary
number 10101101 is equivalent to the decimal number 173.
(a) 86,409 = (8 × 10,000) + (6 × 1,000) + (4 × 100) + (0 × 10) + (9 × 1)
(b) 10101101 = (1 × 128) + (0 × 64) + (1 × 32) + (0 × 16) + (1 × 8) + (1 × 4) + (0 × 2) + (1 × 1) = 173
FIGURE 3.6
The representation of the decimal integer −173 on a 16-bit computer using the signed
magnitude method.
Sign: 1
Number: 0 0 0 0 0 0 0 1 0 1 0 1 1 0 1
Solution. Of the 16 bits, the first bit holds the sign. The remaining 15 bits can hold
binary numbers from 0 to 111111111111111. The upper limit can be converted to a
decimal integer, as in
$(1 \times 2^{14}) + (1 \times 2^{13}) + \cdots + (1 \times 2^1) + (1 \times 2^0)$
which equals 32,767 (note that this expression can be simply evaluated as $2^{15} - 1$). Thus,
a 16-bit computer word can store decimal integers ranging from −32,767 to 32,767. In
addition, because zero is already defined as 0000000000000000, it is redundant to use
the number 1000000000000000 to define a “minus zero.” Therefore, it is usually employed
to represent an additional negative number: −32,768, and the range is from
−32,768 to 32,767.
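The counting argument of Example 3.4 can be sketched in a few lines of Python. The function names below are illustrative assumptions; the second function simply models the reuse of the "minus zero" pattern described above.

```python
def signed_magnitude_range(bits: int) -> tuple[int, int]:
    """Range when one bit holds the sign and the rest hold the magnitude."""
    largest = 2 ** (bits - 1) - 1
    return -largest, largest


def extended_range(bits: int) -> tuple[int, int]:
    """Range when the 'minus zero' pattern is reused for one more negative value."""
    low, high = signed_magnitude_range(bits)
    return low - 1, high


# Summing the 15 place values gives the same upper limit as 2**15 - 1.
print(sum(2 ** k for k in range(15)))
print(signed_magnitude_range(16))
print(extended_range(16))
```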
Note that the signed magnitude method described above is not used to represent
integers on conventional computers. A preferred approach called the 2’s complement
technique directly incorporates the sign into the number’s magnitude rather than provid-
ing a separate bit to represent plus or minus (see Chapra and Canale 1994). However,
Example 3.4 still serves to illustrate how all digital computers are limited in their capa-
bility to represent integers. That is, numbers above or below the range cannot be repre-
sented. A more serious limitation is encountered in the storage and manipulation of
fractional quantities as described next.
Floating-Point Representation. Fractional quantities are typically represented in
computers using floating-point form. In this approach, the number is expressed as a fractional
part, called a mantissa or significand, and an integer part, called an exponent or
characteristic, as in
$m \cdot b^e$
where m = the mantissa, b = the base of the number system being used, and e = the
exponent. For instance, the number 156.78 could be represented as $0.15678 \times 10^3$ in a
floating-point base-10 system.
Figure 3.7 shows one way that a floating-point number could be stored in a word.
The first bit is reserved for the sign, the next series of bits for the signed exponent, and
the last bits for the mantissa.
FIGURE 3.7
The manner in which a floating-point number is stored in a word.
Fields, left to right: Sign | Signed exponent | Mantissa
Note that the mantissa is usually normalized if it has leading zero digits. For example,
suppose the quantity 1/34 = 0.029411765 . . . was stored in a floating-point base-10
system that allowed only four decimal places to be stored. Thus, 1/34 would be stored
as
$0.0294 \times 10^0$
However, in the process of doing this, the inclusion of the useless zero to the right of
the decimal forces us to drop the digit 1 in the fifth decimal place. The number can be
normalized to remove the leading zero by multiplying the mantissa by 10 and lowering
the exponent by 1 to give
$0.2941 \times 10^{-1}$
Thus, we retain an additional significant figure when the number is stored.
The consequence of normalization is that the absolute value of m is limited. That is,
$\frac{1}{b} \le m < 1$ (3.8)
where b = the base. For example, for a base-10 system, m would range between 0.1 and 1,
and for a base-2 system, between 0.5 and 1.
Floating-point representation allows both fractions and very large numbers to
be expressed on the computer. However, it has some disadvantages. For example,
floating-point numbers take up more room and take longer to process than integer
numbers. More significantly, however, their use introduces a source of error because
the mantissa holds only a finite number of significant figures. Thus, a round-off
error is introduced.
EXAMPLE 3.5 Hypothetical Set of Floating-Point Numbers
Problem Statement. Create a hypothetical floating-point number set for a machine that
stores information using 7-bit words. Employ the first bit for the sign of the number, the
next three for the sign and the magnitude of the exponent, and the last three for the
magnitude of the mantissa (Fig. 3.8).
FIGURE 3.8
The smallest possible positive floating-point number from Example 3.5.
Bit pattern: 0 | 1 | 1 1 | 1 0 0
Fields: sign of number | sign of exponent | magnitude of exponent (place values 2^1, 2^0) | magnitude of mantissa (place values 2^−1, 2^−2, 2^−3)
Solution. The smallest possible positive number is depicted in Fig. 3.8. The initial 0
indicates that the quantity is positive. The 1 in the second place designates that the
exponent has a negative sign. The 1’s in the third and fourth places give a maximum
value to the exponent of
$1 \times 2^1 + 1 \times 2^0 = 3$
Therefore, the exponent will be −3. Finally, the mantissa is specified by the 100 in the
last three places, which conforms to
$1 \times 2^{-1} + 0 \times 2^{-2} + 0 \times 2^{-3} = 0.5$
Although a smaller mantissa is possible (e.g., 000, 001, 010, 011), the value of 100 is used
because of the limit imposed by normalization [Eq. (3.8)]. Thus, the smallest possible
positive number for this system is $+0.5 \times 2^{-3}$, which is equal to 0.0625 in the base-10
system. The next highest numbers are developed by increasing the mantissa, as in
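$0111101 = (1 \times 2^{-1} + 0 \times 2^{-2} + 1 \times 2^{-3}) \times 2^{-3} = (0.078125)_{10}$
$0111110 = (1 \times 2^{-1} + 1 \times 2^{-2} + 0 \times 2^{-3}) \times 2^{-3} = (0.093750)_{10}$
$0111111 = (1 \times 2^{-1} + 1 \times 2^{-2} + 1 \times 2^{-3}) \times 2^{-3} = (0.109375)_{10}$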
Notice that the base-10 equivalents are spaced evenly with an interval of 0.015625.
At this point, to continue increasing, we must decrease the magnitude of the exponent to
binary 10, which gives a value of
$1 \times 2^1 + 0 \times 2^0 = 2$
The mantissa is decreased back to its smallest value of 100. Therefore, the next num-
ber is
$0110100 = (1 \times 2^{-1} + 0 \times 2^{-2} + 0 \times 2^{-3}) \times 2^{-2} = (0.125000)_{10}$
This still represents a gap of 0.125000 − 0.109375 = 0.015625. However, now when
higher numbers are generated by increasing the mantissa, the gap is lengthened to
0.03125,
$0110101 = (1 \times 2^{-1} + 0 \times 2^{-2} + 1 \times 2^{-3}) \times 2^{-2} = (0.156250)_{10}$
$0110110 = (1 \times 2^{-1} + 1 \times 2^{-2} + 0 \times 2^{-3}) \times 2^{-2} = (0.187500)_{10}$
$0110111 = (1 \times 2^{-1} + 1 \times 2^{-2} + 1 \times 2^{-3}) \times 2^{-2} = (0.218750)_{10}$
This pattern is repeated as each larger quantity is formulated until a maximum number
is reached,
$0011111 = (1 \times 2^{-1} + 1 \times 2^{-2} + 1 \times 2^{-3}) \times 2^{3} = (7)_{10}$
The final number set is depicted graphically in Fig. 3.9.
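The number set of Example 3.5 is small enough to enumerate directly. The following Python sketch is one way to do so; the function name is illustrative, and it assumes that the two encodings of a zero exponent (+0 and −0) yield the same values, so duplicates are merged.

```python
from itertools import product


def toy_float_values() -> list[float]:
    """Positive values of the 7-bit system: 1 sign bit for the exponent,
    2 bits of exponent magnitude, 3 bits of normalized mantissa."""
    values = set()
    for exp_sign, exp_mag in product((+1, -1), range(4)):
        exponent = exp_sign * exp_mag
        # Normalization forces the leading mantissa bit to 1: 100, 101, 110, 111.
        for mantissa_bits in range(0b100, 0b1000):
            mantissa = mantissa_bits / 8   # three bits weighted 2^-1, 2^-2, 2^-3
            values.add(mantissa * 2.0 ** exponent)
    return sorted(values)


values = toy_float_values()
print(values[:5])    # smallest positive numbers
print(values[-1])    # largest number
print(len(values))   # how many distinct positive values exist
```

Printing the differences between successive entries shows the spacing doubling each time the exponent increases by one.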
Figure 3.9 manifests several aspects of floating-point representation that have
significance regarding computer round-off errors:
1. There Is a Limited Range of Quantities That May Be Represented. Just as for the
integer case, there are large positive and negative numbers that cannot be represented.
Attempts to employ numbers outside the acceptable range will result in what is called
an overflow error. However, in addition to large quantities, the floating-point repre-
sentation has the added limitation that very small numbers cannot be represented. This
is illustrated by the underflow “hole” between zero and the first positive number in
Fig. 3.9. It should be noted that this hole is enlarged because of the normalization
constraint of Eq. (3.8).
2. There Are Only a Finite Number of Quantities That Can Be Represented within the
Range. Thus, the degree of precision is limited. Obviously, irrational numbers cannot
be represented exactly. Furthermore, rational numbers that do not exactly match one
of the values in the set also cannot be represented precisely. The errors introduced by
approximating both these cases are referred to as quantizing errors. The actual
approximation is accomplished in either of two ways: chopping or rounding. For
example, suppose that the value of π = 3.14159265358 . . . is to be stored on a base-10
number system carrying seven significant figures. One method of approximation
would be to merely omit, or “chop off,” the eighth and higher terms, as in π =
3.141592, with the introduction of an associated error of [Eq. (3.2)]
$E_t = 0.00000065\ldots$
This technique of retaining only the significant terms was originally dubbed
“truncation” in computer jargon. We prefer to call it chopping to distinguish it from
the truncation errors discussed in Chap. 4. Note that for the base-2 number system
in Fig. 3.9, chopping means that any quantity falling within an interval of length Δx
will be stored as the quantity at the lower end of the interval. Thus, the upper error
bound for chopping is Δx. Additionally, a bias is introduced because all errors are
positive. The shortcomings of chopping are attributable to the fact that the higher terms
in the complete decimal representation have no impact on the shortened version. For
instance, in our example of π, the first discarded digit is 6. Thus, the last retained digit
should be rounded up to yield 3.141593. Such rounding reduces the error to
$E_t = -0.00000035\ldots$
Consequently, rounding yields a lower absolute error than chopping. Note that for the
base-2 number system in Fig. 3.9, rounding means that any quantity falling within an
interval of length Δx will be represented as the nearest allowable number. Thus, the upper
error bound for rounding is Δx/2. Additionally, no bias is introduced because some errors
are positive and some are negative. Some computers employ rounding. However, this
adds to the computational overhead, and, consequently, many machines use simple
chopping. This approach is justified under the supposition that the number of significant
figures is large enough that resulting round-off error is usually negligible.
FIGURE 3.9
The hypothetical number system developed in Example 3.5. Each value is indicated by a tick
mark. Only the positive numbers are shown. An identical set would also extend in the negative
direction.
[Figure: number line from 0 to 7 marking the underflow “hole” at zero and overflow beyond 7, with insets for chopping (interval Δx) and rounding (intervals of Δx/2 on either side of x).]
3. The Interval between Numbers, Δx, Increases as the Numbers Grow in Magnitude.
It is this characteristic, of course, that allows floating-point representation to preserve
significant digits. However, it also means that quantizing errors will be proportional
to the magnitude of the number being represented. For normalized floating-point
numbers, this proportionality can be expressed, for cases where chopping is employed,
as
$\frac{|\Delta x|}{|x|} \le \varepsilon$ (3.9)
and, for cases where rounding is employed, as
$\frac{|\Delta x|}{|x|} \le \frac{\varepsilon}{2}$ (3.10)
where ε is referred to as the machine epsilon, which can be computed as
$\varepsilon = b^{1-t}$ (3.11)
where b is the number base and t is the number of significant digits in the mantissa.
Notice that the inequalities in Eqs. (3.9) and (3.10) signify that these are error bounds.
That is, they specify the worst cases.
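The chopping and rounding of π described in item 2, together with the bounds of Eqs. (3.9) through (3.11), can be checked with a short sketch. It uses Python's decimal module so that the base-10 digits are handled exactly; the helper name store and the choice of rounding modes are assumptions made for illustration.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

PI = Decimal("3.14159265358")  # digits quoted in the text


def store(value: Decimal, rounding: str) -> Decimal:
    """Keep seven significant figures of a number between 1 and 10
    (one digit before the decimal point, six after)."""
    return value.quantize(Decimal("0.000001"), rounding=rounding)


chopped = store(PI, ROUND_DOWN)
rounded = store(PI, ROUND_HALF_UP)
epsilon = Decimal(10) ** (1 - 7)   # Eq. (3.11) with b = 10 and t = 7

print(chopped, PI - chopped)   # chopping leaves a positive error
print(rounded, PI - rounded)   # rounding leaves a smaller, negative error
print((PI - chopped) / PI <= epsilon)           # bound of Eq. (3.9)
print(abs(PI - rounded) / PI <= epsilon / 2)    # bound of Eq. (3.10)
```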
EXAMPLE 3.6 Machine Epsilon
Problem Statement. Determine the machine epsilon and verify its effectiveness in
characterizing the errors of the number system from Example 3.5. Assume that chopping is used.
Solution. The hypothetical floating-point system from Example 3.5 employed values
of the base b = 2, and the number of mantissa bits t = 3. Therefore, the machine
epsilon would be [Eq. (3.11)]
$\varepsilon = 2^{1-3} = 0.25$
Consequently, the relative quantizing error should be bounded by 0.25 for chopping. The
largest relative errors should occur for those quantities that fall just below the upper
bound of the first interval between successive equispaced numbers (Fig. 3.10). Those
numbers falling in the succeeding higher intervals would have the same value of Δx but
a greater value of x and, hence, would have a lower relative error. An example of a
maximum error would be a value falling just below the upper bound of the interval
between $(0.125000)_{10}$ and $(0.156250)_{10}$. For this case, the error would be less than
$\frac{0.03125}{0.125000} = 0.25$
Thus, the error is as predicted by Eq. (3.9).
FIGURE 3.10
The largest quantizing error will occur for those values falling just below the upper bound of the
first of a series of equispaced intervals.
[Figure label: Largest relative error]
The magnitude dependence of quantizing errors has a number of practical applica-
tions in numerical methods. Most of these relate to the commonly employed operation
of testing whether two numbers are equal. This occurs when testing convergence of
quantities as well as in the stopping mechanism for iterative processes (recall Example
3.2). For these cases, it should be clear that, rather than test whether the two quantities
are equal, it is advisable to test whether their difference is less than an acceptably small
tolerance. Further, it should also be evident that normalized rather than absolute differ-
ence should be compared, particularly when dealing with numbers of large magnitude.
In addition, the machine epsilon can be employed in formulating stopping or convergence
criteria. This ensures that programs are portable—that is, they are not dependent on the
computer on which they are implemented. Figure 3.11 lists pseudocode to automatically
determine the machine epsilon of a binary computer.
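A minimal Python sketch of these two ideas follows. It assumes Python's float type is an IEEE double, which is the case on common platforms; the function names and the default tolerance of 1e-9 are illustrative choices, not values taken from the text.

```python
import sys


def machine_epsilon() -> float:
    """Halve epsilon until adding it to 1 no longer changes the result."""
    epsilon = 1.0
    while epsilon + 1.0 > 1.0:
        epsilon /= 2.0
    return 2.0 * epsilon


def nearly_equal(a: float, b: float, tol: float = 1e-9) -> bool:
    """Compare a normalized (relative) difference against a tolerance."""
    scale = max(abs(a), abs(b))
    return scale == 0.0 or abs(a - b) / scale <= tol


eps = machine_epsilon()
print(eps, sys.float_info.epsilon)   # computed value and the library's value
print(0.1 + 0.2 == 0.3)              # exact equality test
print(nearly_equal(0.1 + 0.2, 0.3))  # tolerance-based test
```

Because the tolerance is applied to a relative difference, the same test behaves sensibly for both very large and very small operands.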
Extended Precision. It should be noted at this point that, although round-off errors
can be important in contexts such as testing convergence, the number of significant
digits carried on most computers allows most engineering computations to be performed
with more than acceptable precision. For example, the hypothetical number system in
Fig. 3.9 is a gross exaggeration that was employed for illustrative purposes. Commercial
computers use much larger words and, consequently, allow numbers to be expressed with
more than adequate precision. For example, computers that use IEEE format allow
24 bits to be used for the mantissa, which translates into about seven significant base-10
digits of precision¹ with a range of about $10^{-38}$ to $10^{39}$.
FIGURE 3.11
Pseudocode to determine machine epsilon for a binary computer.
epsilon = 1
DO
  IF (epsilon+1 ≤ 1) EXIT
  epsilon = epsilon/2
END DO
epsilon = 2 × epsilon
¹ Note that only 23 bits are actually used to store the mantissa. However, because of normalization, the first bit
of the mantissa is always 1 and is, therefore, not stored. Thus, this first bit together with the 23 stored bits
gives the 24 total bits of precision for the mantissa.
With this acknowledged, there are still cases where round-off error becomes critical.
For this reason most computers allow the specification of extended precision. The most
common of these is double precision, in which the number of words used to store
floating-point numbers is doubled. It provides about 15 to 16 decimal digits of precision
and a range of approximately $10^{-308}$ to $10^{308}$.
In many cases, the use of double-precision quantities can greatly mitigate the effect
of round-off errors. However, a price is paid for such remedies in that they also require
more memory and execution time. The difference in execution time for a small calcula-
tion might seem insignificant. However, as your programs become larger and more com-
plicated, the added execution time could become considerable and have a negative impact
on your effectiveness as a problem solver. Therefore, extended precision should not be
used frivolously. Rather, it should be selectively employed where it will yield the maxi-
mum benefit at the least cost in terms of execution time. In the following sections, we
will look closer at how round-off errors affect computations, and in so doing provide a
foundation of understanding to guide your use of the double-precision capability.
Before proceeding, it should be noted that some of the commonly used software pack-
ages (for example, Excel, Mathcad) routinely use double precision to represent numerical
quantities. Thus, the developers of these packages decided that mitigating round-off errors
would take precedence over any loss of speed incurred by using extended precision. Others,
like MATLAB software, allow you to use extended precision, if you desire.
3.4.2 Arithmetic Manipulations of Computer Numbers
Aside from the limitations of a computer’s number system, the actual arithmetic manipula-
tions involving these numbers can also result in round-off error. In the following section, we
will first illustrate how common arithmetic operations affect round-off errors. Then we will
investigate a number of particular manipulations that are especially prone to round-off errors.
Common Arithmetic Operations. Because of their familiarity, normalized base-10
numbers will be employed to illustrate the effect of round-off errors on simple addition,
subtraction, multiplication, and division. Other number bases would behave in a similar
fashion. To simplify the discussion, we will employ a hypothetical decimal computer
with a 4-digit mantissa and a 1-digit exponent. In addition, chopping is used. Rounding
would lead to similar though less dramatic errors.
When two floating-point numbers are added, the mantissa of the number with the
smaller exponent is modified so that the exponents are the same. This has the effect of
aligning the decimal points. For example, suppose we want to add
$0.1557 \cdot 10^1 + 0.4381 \cdot 10^{-1}$.
The decimal of the mantissa of the second number is shifted to the left a number of
places equal to the difference of the exponents [1 − (−1) = 2], as in
$0.4381 \cdot 10^{-1} \rightarrow 0.004381 \cdot 10^1$
Now the numbers can be added,
$\begin{array}{r} 0.1557 \cdot 10^1 \\ 0.004381 \cdot 10^1 \\ \hline 0.160081 \cdot 10^1 \end{array}$
and the result chopped to $0.1600 \cdot 10^1$. Notice how the last two digits of the second
number that were shifted to the right have essentially been lost from the computation.
Subtraction is performed identically to addition except that the sign of the subtrahend
is reversed. For example, suppose that we are subtracting 26.86 from 36.41. That is,
$\begin{array}{r} 0.3641 \cdot 10^2 \\ -0.2686 \cdot 10^2 \\ \hline 0.0955 \cdot 10^2 \end{array}$
For this case the result is not normalized, and so we must shift the decimal one place
to the right to give $0.9550 \cdot 10^1 = 9.550$. Notice that the zero added to the end of the
mantissa is not significant but is merely appended to fill the empty space created by the shift.
Even more dramatic results would be obtained when the numbers are very close, as in
$\begin{array}{r} 0.7642 \cdot 10^3 \\ -0.7641 \cdot 10^3 \\ \hline 0.0001 \cdot 10^3 \end{array}$
which would be converted to $0.1000 \cdot 10^0 = 0.1000$. Thus, for this case, three
nonsignificant zeros are appended. This introduces a substantial computational error because
subsequent manipulations would act as if these zeros were significant. As we will see in
a later section, the loss of significance during the subtraction of nearly equal numbers is
among the greatest sources of round-off error in numerical methods.
Multiplication and division are somewhat more straightforward than addition or
subtraction. The exponents are added and the mantissas multiplied. Because multiplication
of two n-digit mantissas will yield a 2n-digit result, most computers hold intermediate
results in a double-length register. For example,
$0.1363 \cdot 10^3 \times 0.6423 \cdot 10^{-1} = 0.08754549 \cdot 10^2$
If, as in this case, a leading zero is introduced, the result is normalized,
$0.08754549 \cdot 10^2 \rightarrow 0.8754549 \cdot 10^1$
and chopped to give
$0.8754 \cdot 10^1$
Division is performed in a similar manner, but the mantissas are divided and the
exponents are subtracted. Then the results are normalized and chopped.
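The hypothetical decimal computer can be approximated with Python's decimal module, as in the sketch below. It assumes a context with four significant digits and chopping (ROUND_DOWN); the 1-digit exponent limit is not modeled, and the name CHOP4 is illustrative.

```python
from decimal import Context, Decimal, ROUND_DOWN

# Assumed model: 4 significant digits with chopping; exponent range is not restricted.
CHOP4 = Context(prec=4, rounding=ROUND_DOWN)

total = CHOP4.add(Decimal("1.557"), Decimal("0.04381"))
difference = CHOP4.subtract(Decimal("764.2"), Decimal("764.1"))
product = CHOP4.multiply(Decimal("136.3"), Decimal("0.06423"))

print(total)       # exact sum 1.60081 chopped to four digits
print(difference)  # only one significant digit survives the subtraction
print(product)     # exact product 8.754549 chopped to four digits
```

Each operation is computed exactly and then chopped, which mirrors the double-length register described above. Unlike the hypothetical machine, the decimal module does not pad the difference with nonsignificant zeros.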
Large Computations. Certain methods require extremely large numbers of arithmetic
manipulations to arrive at their final results. In addition, these computations are often
interdependent. That is, the later calculations are dependent on the results of earlier ones.
Consequently, even though an individual round-off error could be small, the cumulative
effect over the course of a large computation can be significant.
EXAMPLE 3.7 Large Numbers of Interdependent Computations
Problem Statement. Investigate the effect of round-off error on large numbers of in-
terdependent computations. Develop a program to sum a number 100,000 times. Sum
the number 1 in single precision, and 0.00001 in single and double precision.
Solution. Figure 3.12 shows a Fortran 90 program that performs the summation. Whereas
the single-precision summation of 1 yields the expected result, the single-precision
summation of 0.00001 yields a large discrepancy. This error is reduced significantly when
0.00001 is summed in double precision.
Quantizing errors are the source of the discrepancies. Because the integer 1 can be
represented exactly within the computer, it can be summed exactly. In contrast, 0.00001
cannot be represented exactly and is quantized by a value that is slightly different from
its true value. Whereas this very slight discrepancy would be negligible for a small com-
putation, it accumulates after repeated summations. The problem still occurs in double
precision but is greatly mitigated because the quantizing error is much smaller.
PROGRAM fig0312
IMPLICIT none
INTEGER::i
REAL::sum1, sum2, x1, x2
DOUBLE PRECISION::sum3, x3
! initialize the running totals and the increments
sum1=0.
sum2=0.
sum3=0.
x1=1.
x2=1.e-5
x3=1.d-5
! add each increment 100,000 times
DO i=1,100000
  sum1=sum1+x1
  sum2=sum2+x2
  sum3=sum3+x3
END DO
PRINT *, sum1
PRINT *, sum2
PRINT *, sum3
END
output:
100000.000000
1.000990
9.999999999980838E-001
FIGURE 3.12
Fortran 90 program to sum a number $10^5$ times. The case sums the number 1
in single precision and the number $10^{-5}$ in single and double precision.
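The experiment of Example 3.7 can also be sketched in Python. Python has no native single-precision type, so the helper f32 below (an illustrative name) rounds each intermediate result to single precision by packing it into four bytes with the standard struct module. This is an assumed emulation, and the printed digits need not match the Fortran output exactly.

```python
import struct


def f32(value: float) -> float:
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def repeated_sum(x: float, n: int, single: bool) -> float:
    """Add x to a running total n times, optionally rounding every step to single precision."""
    if single:
        x = f32(x)
    total = 0.0
    for _ in range(n):
        total = f32(total + x) if single else total + x
    return total


print(repeated_sum(1.0, 100_000, single=True))    # exact in single precision
print(repeated_sum(1e-5, 100_000, single=True))   # visible discrepancy from 1
print(repeated_sum(1e-5, 100_000, single=False))  # much smaller discrepancy
```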
Note that the type of error illustrated by the previous example is somewhat atypical
in that all the errors in the repeated operation are of the same sign. In most cases the
errors of a long computation alternate sign in a random fashion and, thus, often cancel
out. However, there are also instances where such errors do not cancel but, in fact, lead
to a spurious final result. The following sections are intended to provide insight into ways
in which this may occur.
Adding a Large and a Small Number. Suppose we add a small number, 0.0010, to
a large number, 4000, using a hypothetical computer with the 4-digit mantissa and the
1-digit exponent. We modify the smaller number so that its exponent matches the larger,
$\begin{array}{r} 0.4000 \cdot 10^4 \\ 0.0000001 \cdot 10^4 \\ \hline 0.4000001 \cdot 10^4 \end{array}$
which is chopped to $0.4000 \cdot 10^4$. Thus, we might as well have not performed the
addition!
This type of error can occur in the computation of an infinite series. The initial terms
in such series are often relatively large in comparison with the later terms. Thus, after a few
terms have been added, we are in the situation of adding a small quantity to a large quantity.
One way to mitigate this type of error is to sum the series in reverse order—that is,
in ascending rather than descending order. In this way, each new term will be of com-
parable magnitude to the accumulated sum (see Prob. 3.5).
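The effect of summation order can be seen with the same hypothetical 4-digit chopping arithmetic. The sketch below, which again assumes Python's decimal module as a stand-in for that machine, adds one large term and a thousand small ones in both orders; the list of terms is an illustrative choice.

```python
from decimal import Context, Decimal, ROUND_DOWN

CHOP4 = Context(prec=4, rounding=ROUND_DOWN)  # assumed 4-digit chopping model


def chopped_sum(terms):
    """Accumulate terms left to right, chopping after every addition."""
    total = Decimal(0)
    for term in terms:
        total = CHOP4.add(total, term)
    return total


terms = [Decimal("4000")] + [Decimal("0.0010")] * 1000
print(chopped_sum(terms))          # large term first: every small term is lost
print(chopped_sum(sorted(terms)))  # ascending order: small terms accumulate first
```

The exact sum is 4001, which only the ascending order recovers.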
Subtractive Cancellation. This term refers to the round-off induced when subtracting
two nearly equal floating-point numbers.
One common instance where this can occur involves finding the roots of a quadratic
equation or parabola with the quadratic formula,
$\begin{matrix} x_1 \\ x_2 \end{matrix} = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$ (3.12)
For cases where $b^2 \gg 4ac$, the difference in the numerator can be very small. In such
cases, double precision can mitigate the problem. In addition, an alternative formulation
can be used to minimize subtractive cancellation,
$\begin{matrix} x_1 \\ x_2 \end{matrix} = \frac{-2c}{b \pm \sqrt{b^2 - 4ac}}$ (3.13)
An illustration of the problem and the use of this alternative formula are provided in the
following example.
EXAMPLE 3.8 Subtractive Cancellation
Problem Statement. Compute the values of the roots of a quadratic equation with a = 1,
b = 3000.001, and c = 3. Check the computed values versus the true roots of $x_1 = -0.001$
and $x_2 = -3000$.
Solution. Figure 3.13 shows an Excel/VBA program that computes the roots $x_1$ and
$x_2$ on the basis of the quadratic formula [Eq. (3.12)]. Note that both single- and
double-precision versions are given. Whereas the results for $x_2$ are adequate, the
percent relative errors for $x_1$ are poor for the single-precision version, $\varepsilon_t = 2.4\%$.
This level could be inadequate for many applied engineering problems. This result
is particularly surprising because we are employing an analytical formula to obtain
our solution!
The loss of significance occurs in the line of both programs where two relatively
large numbers are subtracted. Similar problems do not occur when the same numbers
are added.
On the basis of the above, we can draw the general conclusion that the quadratic
formula will be susceptible to subtractive cancellation whenever $b^2 \gg 4ac$. One way to
circumvent this problem is to use double precision. Another is to recast the quadratic
formula in the format of Eq. (3.13). As in the program output, both options give a much
smaller error because the subtractive cancellation is minimized or avoided.
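A rough Python counterpart of this experiment is sketched below. Single precision is emulated by rounding every intermediate result with the illustrative helper f32 (built on the standard struct module), which is an assumption about how the arithmetic proceeds rather than a reproduction of the Excel/VBA program.

```python
import math
import struct


def f32(value: float) -> float:
    """Round a double to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def small_root(a: float, b: float, c: float, rnd, alternative: bool) -> float:
    """Root of smaller magnitude for b > 0, rounding every operation with rnd."""
    disc = rnd(rnd(b * b) - rnd(rnd(4.0 * a) * c))
    d = rnd(math.sqrt(disc))
    if alternative:
        return rnd(rnd(-2.0 * c) / rnd(b + d))   # Eq. (3.13): no cancellation
    return rnd(rnd(-b + d) / rnd(2.0 * a))       # Eq. (3.12): nearly equal terms cancel


def percent_error(x: float, true: float = -0.001) -> float:
    """True percent relative error against the known root."""
    return abs((true - x) / true) * 100.0


a, b, c = 1.0, 3000.001, 3.0
single = small_root(f32(a), f32(b), f32(c), f32, alternative=False)
double = small_root(a, b, c, float, alternative=False)
recast = small_root(f32(a), f32(b), f32(c), f32, alternative=True)
for name, x in (("single", single), ("double", double), ("recast", recast)):
    print(name, x, percent_error(x))
```

Passing float as the rounding function leaves the values in double precision, so one routine serves both cases.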
Option Explicit
Sub fig0313()
Dim a As Single, b As Single
Dim c As Single, d As Single
Dim x1 As Single, x2 As Single
Dim x1r As Single
Dim aa As Double, bb As Double
Dim cc As Double, dd As Double
Dim x11 As Double, x22 As Double
'Single precision:
a = 1: b = 3000.001: c = 3
d = Sqr(b * b - 4 * a * c)
x1 = (-b + d) / (2 * a)
x2 = (-b - d) / (2 * a)
'Double precision:
aa = 1: bb = 3000.001: cc = 3
dd = Sqr(bb * bb - 4 * aa * cc)
x11 = (-bb + dd) / (2 * aa)
x22 = (-bb - dd) / (2 * aa)
'Modified formula for first root
'single precision:
x1r = -2 * c / (b + d)
'Display results
Sheets("sheet1").Select
Range("b2").Select
ActiveCell.Value = x1
ActiveCell.Offset(1, 0).Select
ActiveCell.Value = x2
ActiveCell.Offset(2, 0).Select
ActiveCell.Value = x11
ActiveCell.Offset(1, 0).Select
ActiveCell.Value = x22
ActiveCell.Offset(2, 0).Select
ActiveCell.Value = x1r
End Sub
OUTPUT:
[Program output not reproduced.]
FIGURE 3.13
Excel/VBA program to determine the roots of a quadratic.
Note that, as in the foregoing example, there are times when subtractive cancellation
can be circumvented by using a transformation. However, the only general remedy is to
employ extended precision.
Smearing. Smearing occurs whenever the individual terms in a summation are larger
than the summation itself. As in the following example, one case where this occurs is in
series of mixed signs.
EXAMPLE 3.9 Evaluation of $e^x$ using Infinite Series
Problem Statement. The exponential function $y = e^x$ is given by the infinite series
$y = 1 + x + \frac{x^2}{2} + \frac{x^3}{3!} + \cdots$
Evaluate this function for x = 10 and x = −10, and be attentive to the problems of
round-off error.
Solution. Figure 3.14a gives an Excel/VBA program that uses the infinite series to
evaluate $e^x$. The variable i is the number of terms in the series, term is the value of the
current term added to the series, and sum is the accumulative value of the series. The
variable test is the preceding accumulative value of the series prior to adding term. The
series is terminated when the computer cannot detect the difference between test and sum.
Figure 3.14b shows the results of running the program for x = 10. Note that this
case is completely satisfactory. The final result is achieved in 31 terms with the series
identical to the library function value within seven significant figures.
Figure 3.14c shows similar results for x = −10. However, for this case, the results of
the series calculation are not even the same sign as the true result. As a matter of fact, the
negative results are open to serious question because $e^x$ can never be less than zero. The
problem here is caused by round-off error. Note that many of the terms that make up the
sum are much larger than the final result of the sum. Furthermore, unlike the previous case,
the individual terms vary in sign. Thus, in effect we are adding and subtracting large
numbers (each with some small error) and placing great significance on the differences, that
is, subtractive cancellation. Thus, we can see that the culprit behind this example of
smearing is, in fact, subtractive cancellation. For such cases it is appropriate to seek some other
computational strategy. For example, one might try to compute $y = e^{-10}$ as $y = (e^{-1})^{10}$.
Other than such a reformulation, the only general recourse is extended precision.
(a) Program
Option Explicit
Sub fig0314()
Dim term As Single, test As Single
Dim sum As Single, x As Single
Dim i As Integer
i = 0: term = 1#: sum = 1#: test = 0#
Sheets("sheet1").Select
Range("b1").Select
x = ActiveCell.Value
Range("a3:c1003").ClearContents
Range("a3").Select
Do
  If sum = test Then Exit Do
  ActiveCell.Value = i
  ActiveCell.Offset(0, 1).Select
  ActiveCell.Value = term
  ActiveCell.Offset(0, 1).Select
  ActiveCell.Value = sum
  ActiveCell.Offset(1, -2).Select
  i = i + 1
  test = sum
  term = x ^ i / _
    Application.WorksheetFunction.Fact(i)
  sum = sum + term
Loop
ActiveCell.Offset(0, 1).Select
ActiveCell.Value = "Exact value = "
ActiveCell.Offset(0, 1).Select
ActiveCell.Value = Exp(x)
End Sub
(b) Evaluation of $e^{10}$
[Program output not reproduced.]
(c) Evaluation of $e^{-10}$
[Program output not reproduced.]
FIGURE 3.14
(a) An Excel/VBA program to evaluate $e^x$ using an infinite series. (b) Evaluation of $e^x$.
(c) Evaluation of $e^{-x}$.
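The contrast between the direct series for a negative argument and a reformulated calculation can be sketched in Python. The sketch makes two assumptions that differ from the Excel/VBA program: single precision is emulated with the illustrative helper f32, and each term is built from the previous one rather than from a factorial. The reformulation shown is the reciprocal of the series evaluated at x = 10, the identity used in Prob. 3.6.

```python
import math
import struct


def f32(value: float) -> float:
    """Round a double to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def exp_series_single(x: float, max_terms: int = 200) -> float:
    """Sum 1 + x + x^2/2! + ... in emulated single precision until the sum stops changing."""
    x = f32(x)
    term = total = 1.0
    for i in range(1, max_terms + 1):
        term = f32(f32(term * x) / i)      # next term from the previous one
        previous, total = total, f32(total + term)
        if total == previous:              # the new term no longer registers
            break
    return total


true_value = math.exp(-10.0)
direct = exp_series_single(-10.0)               # alternating series: smearing
reciprocal = f32(1.0 / exp_series_single(10.0)) # all-positive series, then invert
err_direct = abs(direct - true_value) / true_value
err_recip = abs(reciprocal - true_value) / true_value
print(direct, err_direct)
print(reciprocal, err_recip)
```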
Inner Products. As should be clear from the last sections, some infinite series are
particularly prone to round-off error. Fortunately, the calculation of series is not one of
the more common operations in numerical methods. A far more ubiquitous manipulation
is the calculation of inner products, as in
$$\sum_{i=1}^{n} x_i y_i = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n$$
This operation is very common, particularly in the solution of simultaneous linear alge-
braic equations. Such summations are prone to round-off error. Consequently, it is often
desirable to compute such summations in extended precision.
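As a rough illustration of why extended precision helps, the sketch below accumulates the same inner product twice: once with every product and partial sum rounded to single precision, and once with the single-precision inputs accumulated in double precision. The test vectors (10,000 entries equal to 0.1) and the helper names are assumptions chosen only to make the drift visible.

```python
import struct
from fractions import Fraction

def to_single(value):
    """Round a double-precision Python float to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]

def dot_single(xs, ys):
    """Inner product with every product and partial sum rounded to single precision."""
    total = 0.0
    for x, y in zip(xs, ys):
        total = to_single(total + to_single(x * y))
    return total

def dot_extended(xs, ys):
    """Same inputs, but products and partial sums are kept in double precision."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total

n = 10_000
xs = [to_single(0.1)] * n
ys = [to_single(0.1)] * n

# Exact rational value of the inner product of the stored single-precision inputs.
exact = Fraction(xs[0]) * Fraction(ys[0]) * n

single_error = abs(float(Fraction(dot_single(xs, ys)) - exact))
extended_error = abs(float(Fraction(dot_extended(xs, ys)) - exact))
print(f"single-precision accumulation error:   {single_error:.3e}")
print(f"extended-precision accumulation error: {extended_error:.3e}")
```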
Although the foregoing sections should provide rules of thumb to mitigate round-off
error, they do not provide a direct means beyond trial and error to actually determine
the effect of such errors on a computation. In Chap. 4, we will introduce the Taylor
series, which will provide a mathematical approach for estimating these effects.
PROBLEMS
3.1 Convert the following base-2 numbers to base-10: (a) 101101,
(b) 101.011, and (c) 0.01101.
3.2 Convert the following base-8 numbers to base-10: 71,263 and
3.147.
3.3 Compose your own program based on Fig. 3.11 and use it to
determine your computer’s machine epsilon.
3.4 In a fashion similar to that in Fig. 3.11, write a short program
to determine the smallest number, xmin, used on the computer you
will be employing along with this book. Note that your computer
will be unable to reliably distinguish between zero and a quantity
that is smaller than this number.
3.5 The infinite series
$$f(n) = \sum_{i=1}^{n} \frac{1}{i^4}$$
converges on a value of $f(n) = \pi^4/90$ as n approaches infinity.
Write a program in single precision to calculate f(n) for n = 10,000
by computing the sum from i = 1 to 10,000. Then repeat the calculation
but in reverse order, that is, from i = 10,000 to 1 using increments
of $-1$. In each case, compute the true percent relative error.
Explain the results.
3.6 Evaluate $e^{-5}$ using two approaches
$$e^{-x} = 1 - x + \frac{x^2}{2} - \frac{x^3}{3!} + \cdots$$
and
$$e^{-x} = \frac{1}{e^x} = \frac{1}{1 + x + \dfrac{x^2}{2} + \dfrac{x^3}{3!} + \cdots}$$
and compare with the true value of $6.737947 \times 10^{-3}$. Use 20 terms
to evaluate each series and compute true and approximate relative
errors as terms are added.
3.7 The derivative of $f(x) = 1/(1 - 3x^2)$ is given by
$$\frac{6x}{(1 - 3x^2)^2}$$
Do you expect to have difficulties evaluating this function at
$x = 0.577$? Try it using 3- and 4-digit arithmetic with chopping.
3.8 (a) Evaluate the polynomial
$$y = x^3 - 5x^2 + 6x + 0.55$$
at $x = 1.37$. Use 3-digit arithmetic with chopping. Evaluate the
percent relative error.
(b) Repeat (a) but express y as
$$y = ((x - 5)x + 6)x + 0.55$$
Evaluate the error and compare with part (a).
3.9 Calculate the random access memory (RAM) in megabytes
necessary to store a multidimensional array that is 20 × 40 × 120.
This array is double precision, and each value requires a 64-bit
word. Recall that a 64-bit word = 8 bytes and 1 kilobyte = $2^{10}$
bytes. Assume that the index starts at 1.
3.10 Determine the number of terms necessary to approximate cos x
to 8 significant figures using the Maclaurin series approximation
$$\cos x = 1 - \frac{x^2}{2} + \frac{x^4}{4!} - \frac{x^6}{6!} + \frac{x^8}{8!} - \cdots$$
Calculate the approximation using a value of $x = 0.3\pi$. Write a
program to determine your result.
3.11 Use 5-digit arithmetic with chopping to determine the roots of
the following equation with Eqs. (3.12) and (3.13)
$$x^2 - 5000.002x + 10$$
Compute percent relative errors for your results.
3.12 How can the machine epsilon be employed to formulate a
stopping criterion $\varepsilon_s$ for your programs? Provide an example.
3.13 The “divide and average” method, an old-time method for
approximating the square root of any positive number a, can be
formulated as
$$x = \frac{x + a/x}{2}$$
Write a well-structured function to implement this algorithm based
on the algorithm outlined in Fig. 3.3.
CHAPTER 4
Truncation Errors and the Taylor Series
Truncation errors are those that result from using an approximation in place of an
exact mathematical procedure. For example, in Chap. 1 we approximated the deriva-
tive of velocity of a falling parachutist by a finite-divided-difference equation of the
form [Eq. (1.11)]
$$\frac{dv}{dt} \cong \frac{\Delta v}{\Delta t} = \frac{v(t_{i+1}) - v(t_i)}{t_{i+1} - t_i} \tag{4.1}$$
A truncation error was introduced into the numerical solution because the difference
equation only approximates the true value of the derivative (recall Fig. 1.4). In order to
gain insight into the properties of such errors, we now turn to a mathematical formulation
that is used widely in numerical methods to express functions in an approximate fashion—
the Taylor series.
4.1 THE TAYLOR SERIES
Taylor’s theorem (Box 4.1) and its associated formula, the Taylor series, are of great
value in the study of numerical methods. In essence, the Taylor series provides a means
to predict a function value at one point in terms of the function value and its deriva-
tives at another point. In particular, the theorem states that any smooth function can
be approximated as a polynomial.
A useful way to gain insight into the Taylor series is to build it term by term. For
example, the first term in the series is
$$f(x_{i+1}) \cong f(x_i) \tag{4.2}$$
This relationship, called the zero-order approximation, indicates that the value of f at the
new point is the same as its value at the old point. This result makes intuitive sense
because if $x_i$ and $x_{i+1}$ are close to each other, the new value is likely to be
similar to the old value.
Equation (4.2) provides a perfect estimate if the function being approximated is, in
fact, a constant. However, if the function changes at all over the interval, additional terms
Box 4.1 Taylor’s Theorem
Taylor’s Theorem
If the function f and its first n + 1 derivatives are continuous on an interval
containing a and x, then the value of the function at x is given by
$$f(x) = f(a) + f'(a)(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \frac{f^{(3)}(a)}{3!}(x - a)^3 + \cdots + \frac{f^{(n)}(a)}{n!}(x - a)^n + R_n \tag{B4.1.1}$$
where the remainder $R_n$ is defined as
$$R_n = \int_a^x \frac{(x - t)^n}{n!} f^{(n+1)}(t)\,dt \tag{B4.1.2}$$
where t is a dummy variable. Equation (B4.1.1) is called the Taylor
series or Taylor’s formula. If the remainder is omitted, the right side
of Eq. (B4.1.1) is the Taylor polynomial approximation to f(x). In
essence, the theorem states that any smooth function can be approximated
as a polynomial.
Equation (B4.1.2) is but one way, called the integral form, by
which the remainder can be expressed. An alternative formulation
can be derived on the basis of the integral mean-value theorem.
First Theorem of Mean for Integrals
If the function g is continuous and integrable on an interval containing
a and x, then there exists a point $\xi$ between a and x such that
$$\int_a^x g(t)\,dt = g(\xi)(x - a) \tag{B4.1.3}$$
In other words, this theorem states that the integral can be represented
by an average value for the function $g(\xi)$ times the interval
length $x - a$. Because the average must occur between the minimum
and maximum values for the interval, there is a point $x = \xi$ at
which the function takes on the average value.
The first theorem is in fact a special case of a second mean-
value theorem for integrals.
Second Theorem of Mean for Integrals
If the functions g and h are continuous and integrable on an interval
containing a and x, and h does not change sign in the interval, then
there exists a point $\xi$ between a and x such that
$$\int_a^x g(t)h(t)\,dt = g(\xi) \int_a^x h(t)\,dt \tag{B4.1.4}$$
Thus, Eq. (B4.1.3) is equivalent to Eq. (B4.1.4) with $h(t) = 1$.
The second theorem can be applied to Eq. (B4.1.2) with
$$g(t) = f^{(n+1)}(t) \qquad h(t) = \frac{(x - t)^n}{n!}$$
As t varies from a to x, h(t) is continuous and does not change sign.
Therefore, if $f^{(n+1)}(t)$ is continuous, then the integral mean-value
theorem holds and, because $\int_a^x (x - t)^n/n!\,dt = (x - a)^{n+1}/(n + 1)!$,
$$R_n = \frac{f^{(n+1)}(\xi)}{(n + 1)!}(x - a)^{n+1}$$
This equation is referred to as the derivative or Lagrange form of
the remainder.
of the Taylor series are required to provide a better estimate. For example, the first-order
approximation is developed by adding another term to yield
$$f(x_{i+1}) \cong f(x_i) + f'(x_i)(x_{i+1} - x_i) \tag{4.3}$$
The additional first-order term consists of a slope $f'(x_i)$ multiplied by the distance between
$x_i$ and $x_{i+1}$. Thus, the expression is now in the form of a straight line and is capable of
predicting an increase or decrease of the function between $x_i$ and $x_{i+1}$.
Although Eq. (4.3) can predict a change, it is exact only for a straight-line, or linear,
trend. Therefore, a second-order term is added to the series to capture some of the cur-
vature that the function might exhibit:
$$f(x_{i+1}) \cong f(x_i) + f'(x_i)(x_{i+1} - x_i) + \frac{f''(x_i)}{2!}(x_{i+1} - x_i)^2 \tag{4.4}$$
In a similar manner, additional terms can be included to develop the complete Taylor
series expansion:
$$f(x_{i+1}) = f(x_i) + f'(x_i)(x_{i+1} - x_i) + \frac{f''(x_i)}{2!}(x_{i+1} - x_i)^2 + \frac{f^{(3)}(x_i)}{3!}(x_{i+1} - x_i)^3 + \cdots + \frac{f^{(n)}(x_i)}{n!}(x_{i+1} - x_i)^n + R_n \tag{4.5}$$
Note that because Eq. (4.5) is an infinite series, an equal sign replaces the approximate
sign that was used in Eqs. (4.2) through (4.4). A remainder term is included to account
for all terms from n + 1 to infinity:
$$R_n = \frac{f^{(n+1)}(\xi)}{(n + 1)!}(x_{i+1} - x_i)^{n+1} \tag{4.6}$$
where the subscript n connotes that this is the remainder for the nth-order approximation
and $\xi$ is a value of x that lies somewhere between $x_i$ and $x_{i+1}$. The introduction of $\xi$
is so important that we will devote an entire section (Sec. 4.1.1) to its derivation. For
the time being, it is sufficient to recognize that there is such a value that provides an
exact determination of the error.
It is often convenient to simplify the Taylor series by defining a step size $h = x_{i+1} - x_i$
and expressing Eq. (4.5) as
$$f(x_{i+1}) = f(x_i) + f'(x_i)h + \frac{f''(x_i)}{2!}h^2 + \frac{f^{(3)}(x_i)}{3!}h^3 + \cdots + \frac{f^{(n)}(x_i)}{n!}h^n + R_n \tag{4.7}$$
where the remainder term is now
$$R_n = \frac{f^{(n+1)}(\xi)}{(n + 1)!}h^{n+1} \tag{4.8}$$
EXAMPLE 4.1 Taylor Series Approximation of a Polynomial
Problem Statement. Use zero- through fourth-order Taylor series expansions to approxi-
mate the function
$$f(x) = -0.1x^4 - 0.15x^3 - 0.5x^2 - 0.25x + 1.2$$
from $x_i = 0$ with $h = 1$. That is, predict the function’s value at $x_{i+1} = 1$.
Solution. Because we are dealing with a known function, we can compute values for
f(x) between 0 and 1. The results (Fig. 4.1) indicate that the function starts at $f(0) = 1.2$
and then curves downward to $f(1) = 0.2$. Thus, the true value that we are trying to predict
is 0.2.
The Taylor series approximation with n = 0 is [Eq. (4.2)]
$$f(x_{i+1}) \cong 1.2$$
Thus, as in Fig. 4.1, the zero-order approximation is a constant. Using this formulation
results in a truncation error [recall Eq. (3.2)] of
$$E_t = 0.2 - 1.2 = -1.0$$
at x = 1.
For n = 1, the first derivative must be determined and evaluated at x = 0:
$$f'(0) = -0.4(0.0)^3 - 0.45(0.0)^2 - 1.0(0.0) - 0.25 = -0.25$$
Therefore, the first-order approximation is [Eq. (4.3)]
$$f(x_{i+1}) \cong 1.2 - 0.25h$$
which can be used to compute $f(1) = 0.95$. Consequently, the approximation begins to
capture the downward trajectory of the function in the form of a sloping straight line
(Fig. 4.1). This results in a reduction of the truncation error to
$$E_t = 0.2 - 0.95 = -0.75$$
For n = 2, the second derivative is evaluated at x = 0:
$$f''(0) = -1.2(0.0)^2 - 0.9(0.0) - 1.0 = -1.0$$
Therefore, according to Eq. (4.4),
$$f(x_{i+1}) \cong 1.2 - 0.25h - 0.5h^2$$
and substituting h = 1, $f(1) = 0.45$. The inclusion of the second derivative now adds
some downward curvature resulting in an improved estimate, as seen in Fig. 4.1. The
truncation error is reduced further to $0.2 - 0.45 = -0.25$.
FIGURE 4.1
The approximation of $f(x) = -0.1x^4 - 0.15x^3 - 0.5x^2 - 0.25x + 1.2$ at x = 1 by zero-order,
first-order, and second-order Taylor series expansions. [Plot of f(x) versus x from $x_i = 0$ to
$x_{i+1} = 1$ showing the true curve together with the zero-order prediction $f(x_{i+1}) \cong f(x_i)$,
the first-order prediction $f(x_{i+1}) \cong f(x_i) + f'(x_i)h$, and the second-order prediction
$f(x_{i+1}) \cong f(x_i) + f'(x_i)h + \frac{f''(x_i)}{2!}h^2$.]
Additional terms would improve the approximation even more. In fact, the inclusion
of the third and the fourth derivatives results in exactly the same equation we started with:
$$f(x) = 1.2 - 0.25h - 0.5h^2 - 0.15h^3 - 0.1h^4$$
where the remainder term is
$$R_4 = \frac{f^{(5)}(\xi)}{5!}h^5 = 0$$
because the fifth derivative of a fourth-order polynomial is zero. Consequently, the Taylor
series expansion to the fourth derivative yields an exact estimate at $x_{i+1} = 1$:
$$f(1) = 1.2 - 0.25(1) - 0.5(1)^2 - 0.15(1)^3 - 0.1(1)^4 = 0.2$$
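The term-by-term build-up of Example 4.1 can be reproduced with a few lines of Python. This is only a sketch: the function name `taylor_estimate` is hypothetical, and the third and fourth derivative values at x = 0 (−0.9 and −2.4) were obtained here by differentiating the polynomial, consistent with the $-0.15h^3$ and $-0.1h^4$ terms quoted above.

```python
import math

# f(0), f'(0), f''(0), f'''(0), f''''(0) for
# f(x) = -0.1x^4 - 0.15x^3 - 0.5x^2 - 0.25x + 1.2
derivatives_at_0 = [1.2, -0.25, -1.0, -0.9, -2.4]

def taylor_estimate(derivatives, h, order):
    """Sum the Taylor series terms f^(k)(x_i) h^k / k! for k = 0..order."""
    return sum(derivatives[k] * h**k / math.factorial(k) for k in range(order + 1))

true_value = 0.2
h = 1.0
estimates = [taylor_estimate(derivatives_at_0, h, n) for n in range(5)]
for n, estimate in enumerate(estimates):
    # E_t is the true value minus the approximation, as in the example
    print(f"order {n}: f(1) ~ {estimate:.4f}, E_t = {true_value - estimate:+.4f}")
```

Each added order removes the contribution of one more derivative from the error, and the fourth-order estimate matches the true value to within floating-point round-off.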
In general, the nth-order Taylor series expansion will be exact for an nth-order
polynomial. For other differentiable and continuous functions, such as exponentials and
sinusoids, a finite number of terms will not yield an exact estimate. Each additional term
will contribute some improvement, however slight, to the approximation. This behavior
will be demonstrated in Example 4.2. Only if an infinite number of terms are added will
the series yield an exact result.
Although the above is true, the practical value of Taylor series expansions is that,
in most cases, the inclusion of only a few terms will result in an approximation that is
close enough to the true value for practical purposes. The assessment of how many terms
are required to get “close enough” is based on the remainder term of the expansion.
Recall that the remainder term is of the general form of Eq. (4.8). This relationship has
two major drawbacks. First, $\xi$ is not known exactly but merely lies somewhere between
$x_i$ and $x_{i+1}$. Second, to evaluate Eq. (4.8), we need to determine the (n + 1)th derivative
of f(x). To do this, we need to know f(x). However, if we knew f(x), there would be no
need to perform the Taylor series expansion in the present context!
Despite this dilemma, Eq. (4.8) is still useful for gaining insight into truncation errors.
This is because we do have control over the term h in the equation. In other words, we
can choose how far away from x we want to evaluate f(x), and we can control the number
of terms we include in the expansion. Consequently, Eq. (4.8) is usually expressed as
$$R_n = O(h^{n+1})$$
where the nomenclature $O(h^{n+1})$ means that the truncation error is of the order of $h^{n+1}$. That
is, the error is proportional to the step size h raised to the (n + 1)th power. Although this
approximation implies nothing regarding the magnitude of the derivatives that multiply $h^{n+1}$,
it is extremely useful in judging the comparative error of numerical methods based on Taylor
series expansions. For example, if the error is O(h), halving the step size will halve the error.
On the other hand, if the error is $O(h^2)$, halving the step size will quarter the error.
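The halving rule follows from a one-line calculation. The sketch below assumes that h is small enough for the remainder to behave like a constant times $h^{n+1}$; the constant C is a hypothetical stand-in for the derivative factor in Eq. (4.8), which is treated as unchanged when the step is halved.

```latex
R_n(h) \approx C\,h^{n+1}
\quad\Longrightarrow\quad
\frac{R_n(h/2)}{R_n(h)} \approx \frac{C\,(h/2)^{n+1}}{C\,h^{n+1}} = \frac{1}{2^{\,n+1}}
```

With n = 0 the ratio is 1/2 (an O(h) error is halved), and with n = 1 it is 1/4 (an $O(h^2)$ error is quartered).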
In general, we can usually assume that the truncation error is decreased by the ad-
dition of terms to the Taylor series. In many cases, if h is sufficiently small, the first- and
other lower-order terms usually account for a disproportionately high percent of the error.
Thus, only a few terms are required to obtain an adequate estimate. This property is
illustrated by the following example.
EXAMPLE 4.2 Use of Taylor Series Expansion to Approximate a Function with an Infinite
Number of Derivatives
Problem Statement. Use Taylor series expansions with n = 0 to 6 to approximate
$f(x) = \cos x$ at $x_{i+1} = \pi/3$ on the basis of the value of f(x) and its derivatives at
$x_i = \pi/4$. Note that this means that $h = \pi/3 - \pi/4 = \pi/12$.
Solution. As with Example 4.1, our knowledge of the true function means that we can
determine the correct value $f(\pi/3) = 0.5$.
The zero-order approximation is [Eq. (4.2)]
$$f\left(\frac{\pi}{3}\right) \cong \cos\left(\frac{\pi}{4}\right) = 0.707106781$$
which represents a percent relative error of
$$\varepsilon_t = \frac{0.5 - 0.707106781}{0.5} \times 100\% = -41.4\%$$
For the first-order approximation, we add the first derivative term where $f'(x) = -\sin x$:
$$f\left(\frac{\pi}{3}\right) \cong \cos\left(\frac{\pi}{4}\right) - \sin\left(\frac{\pi}{4}\right)\left(\frac{\pi}{12}\right) = 0.521986659$$
which has $\varepsilon_t = -4.40$ percent.
For the second-order approximation, we add the second derivative term where
$f''(x) = -\cos x$:
$$f\left(\frac{\pi}{3}\right) \cong \cos\left(\frac{\pi}{4}\right) - \sin\left(\frac{\pi}{4}\right)\left(\frac{\pi}{12}\right) - \frac{\cos(\pi/4)}{2}\left(\frac{\pi}{12}\right)^2 = 0.497754491$$
with $\varepsilon_t = 0.449$ percent. Thus, the inclusion of additional terms results in an improved
estimate.
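The same build-up can be carried to any order with a short Python sketch. The function name `taylor_cos` is hypothetical; it relies only on the fact that the derivatives of cos x repeat with period four (cos, −sin, −cos, sin).

```python
import math

def taylor_cos(x_base, h, order):
    """Taylor series of cos about x_base, evaluated at x_base + h, through the given order."""
    # Derivatives of cos x cycle with period 4: cos, -sin, -cos, sin.
    cycle = [math.cos(x_base), -math.sin(x_base), -math.cos(x_base), math.sin(x_base)]
    return sum(cycle[k % 4] * h**k / math.factorial(k) for k in range(order + 1))

x_base = math.pi / 4
h = math.pi / 12
true_value = math.cos(math.pi / 3)

approximations = [taylor_cos(x_base, h, n) for n in range(7)]
for n, approximation in enumerate(approximations):
    percent_error = (true_value - approximation) / true_value * 100
    print(f"n = {n}: f(pi/3) ~ {approximation:.9f}, error = {percent_error:.3g}%")
```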
The process can be continued and the results listed, as in Table 4.1. Notice that the
derivatives never go to zero, as was the case with the polynomial in Example 4.1. Therefore,
each additional term results in some improvement in the estimate. However, also
notice how most of the improvement comes with the initial terms. For this case, by the
time we have added the third-order term, the error is reduced to $2.62 \times 10^{-2}$ percent,
which means that we have attained 99.9738 percent of the true value. Consequently,
although the addition of more terms will reduce the error further, the improvement
becomes negligible.

TABLE 4.1 Taylor series approximation of $f(x) = \cos x$ at $x_{i+1} = \pi/3$ using a base
point of $\pi/4$. Values are shown for various orders (n) of approximation.

Order n   f^(n)(x)   f(π/3)        ε_t (%)
0         cos x      0.707106781   -41.4
1         -sin x     0.521986659   -4.4
2         -cos x     0.497754491   0.449
3         sin x      0.499869147   2.62 × 10^-2
4         cos x      0.500007551   -1.51 × 10^-3
5         -sin x     0.500000304   -6.08 × 10^-5
6         -cos x     0.499999988   2.44 × 10^-6

4.1.1 The Remainder for the Taylor Series Expansion
Before demonstrating how the Taylor series is actually used to estimate numerical errors,
we must explain why we included the argument $\xi$ in Eq. (4.8). A mathematical derivation
is presented in Box 4.1. We will now develop an alternative exposition based on a somewhat
more visual interpretation. Then we can extend this specific case to the more
general formulation.
Suppose that we truncated the Taylor series expansion [Eq. (4.7)] after the zero-order
term to yield
$$f(x_{i+1}) \cong f(x_i)$$
A visual depiction of this zero-order prediction is shown in Fig. 4.2. The remainder, or
error, of this prediction, which is also shown in the illustration, consists of the infinite
series of terms that were truncated:
$$R_0 = f'(x_i)h + \frac{f''(x_i)}{2!}h^2 + \frac{f^{(3)}(x_i)}{3!}h^3 + \cdots$$
FIGURE 4.2
Graphical depiction of a zero-order Taylor series prediction and remainder. [Plot of f(x) versus x
showing the zero-order prediction $f(x_i)$, the exact value at $x_{i+1}$, the step h, and the remainder $R_0$.]
It is obviously inconvenient to deal with the remainder in this infinite series format.
One simplification might be to truncate the remainder itself, as in
$$R_0 \cong f'(x_i)h \tag{4.9}$$
Although, as stated in the previous section, lower-order derivatives usually account for
a greater share of the remainder than the higher-order terms, this result is still inexact
because of the neglected second- and higher-order terms. This “inexactness” is implied
by the approximate equality symbol ($\cong$) employed in Eq. (4.9).
An alternative simplification that transforms the approximation into an equivalence
is based on a graphical insight. As in Fig. 4.3, the derivative mean-value theorem states
that if a function f(x) and its first derivative are continuous over an interval from $x_i$ to
$x_{i+1}$, then there exists at least one point on the function that has a slope, designated by
$f'(\xi)$, that is parallel to the line joining $f(x_i)$ and $f(x_{i+1})$. The parameter $\xi$ marks the x
value where this slope occurs (Fig. 4.3). A physical illustration of this theorem is that,
if you travel between two points with an average velocity, there will be at least one moment
during the course of the trip when you will be moving at that average velocity.
By invoking this theorem it is simple to realize that, as illustrated in Fig. 4.3, the
slope $f'(\xi)$ is equal to the rise $R_0$ divided by the run h, or
$$f'(\xi) = \frac{R_0}{h}$$
which can be rearranged to give
$$R_0 = f'(\xi)h \tag{4.10}$$
Thus, we have derived the zero-order version of Eq. (4.8). The higher-order versions are merely
a logical extension of the reasoning used to derive Eq. (4.10). The first-order version is
$$R_1 = \frac{f''(\xi)}{2!}h^2 \tag{4.11}$$
FIGURE 4.3
Graphical depiction of the derivative mean-value theorem. [Plot of f(x) versus x between $x_i$ and
$x_{i+1}$: the tangent at $\xi$, with slope $f'(\xi)$, is parallel to the chord, whose slope is $R_0/h$.]
For this case, the value of $\xi$ conforms to the x value corresponding to the second derivative
that makes Eq. (4.11) exact. Similar higher-order versions can be developed from
Eq. (4.8).
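To make Eq. (4.10) concrete, the sketch below locates $\xi$ numerically for the polynomial of Example 4.1 with $x_i = 0$ and h = 1. The use of bisection is an assumption of this illustration, not a method from the text; it works here because $f'(x)$ decreases steadily on the interval, so the point where $f'(\xi) = R_0/h$ is bracketed.

```python
def f(x):
    return -0.1 * x**4 - 0.15 * x**3 - 0.5 * x**2 - 0.25 * x + 1.2

def f_prime(x):
    return -0.4 * x**3 - 0.45 * x**2 - 1.0 * x - 0.25

x_i, h = 0.0, 1.0
remainder_0 = f(x_i + h) - f(x_i)   # R0: true value minus the zero-order prediction
target_slope = remainder_0 / h      # slope of the chord joining f(x_i) and f(x_i + h)

# Bisection on f'(x) - R0/h over [x_i, x_i + h].
low, high = x_i, x_i + h
for _ in range(60):
    mid = 0.5 * (low + high)
    if f_prime(mid) > target_slope:
        low = mid    # slope still too shallow: xi lies to the right
    else:
        high = mid   # slope too steep: xi lies to the left
xi = 0.5 * (low + high)

print(f"R0 = {remainder_0:.4f}, xi = {xi:.6f}, f'(xi) * h = {f_prime(xi) * h:.4f}")
```

The value of $\xi$ found this way lies strictly between $x_i$ and $x_{i+1}$, and $f'(\xi)h$ reproduces the zero-order remainder.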
4.1.2 Using the Taylor Series to Estimate Truncation Errors
Although the Taylor series will be extremely useful in estimating truncation errors
throughout this book, it may not be clear to you how the expansion can actually be
applied to numerical methods. In fact, we have already done so in our example of the
falling parachutist. Recall that the objective of both Examples 1.1 and 1.2 was to predict
velocity as a function of time. That is, we were interested in determining $v(t)$. As
specified by Eq. (4.5), $v(t)$ can be expanded in a Taylor series:
$$v(t_{i+1}) = v(t_i) + v'(t_i)(t_{i+1} - t_i) + \frac{v''(t_i)}{2!}(t_{i+1} - t_i)^2 + \cdots + R_n \tag{4.12}$$
Now let us truncate the series after the first derivative term:
$$v(t_{i+1}) = v(t_i) + v'(t_i)(t_{i+1} - t_i) + R_1 \tag{4.13}$$
Equation (4.13) can be solved for
$$v'(t_i) = \underbrace{\frac{v(t_{i+1}) - v(t_i)}{t_{i+1} - t_i}}_{\text{First-order approximation}} - \underbrace{\frac{R_1}{t_{i+1} - t_i}}_{\text{Truncation error}} \tag{4.14}$$
The first part of Eq. (4.14) is exactly the same relationship that was used to approximate
the derivative in Example 1.2 [Eq. (1.11)]. However, because of the Taylor series approach,
we have now obtained an estimate of the truncation error associated with this
approximation of the derivative. Using Eqs. (4.6) and (4.14) yields
$$\frac{R_1}{t_{i+1} - t_i} = \frac{v''(\xi)}{2!}(t_{i+1} - t_i) \tag{4.15}$$
or
$$\frac{R_1}{t_{i+1} - t_i} = O(t_{i+1} - t_i) \tag{4.16}$$
Thus, the estimate of the derivative [Eq. (1.11) or the first part of Eq. (4.14)] has a truncation
error of order $t_{i+1} - t_i$. In other words, the error of our derivative approximation
should be proportional to the step size. Consequently, if we halve the step size, we would
expect to halve the error of the derivative.
EXAMPLE 4.3 The Effect of Nonlinearity and Step Size on the Taylor Series Approximation
Problem Statement. Figure 4.4 is a plot of the function
$$f(x) = x^m \tag{E4.3.1}$$
for m = 1, 2, 3, and 4 over the range from x = 1 to 2. Notice that for m = 1 the function
is linear, and as m increases, more curvature or nonlinearity is introduced into the function.
Employ the first-order Taylor series to approximate this function for various values of the
exponent m and the step size h.
FIGURE 4.4
Plot of the function $f(x) = x^m$ for m = 1, 2, 3, and 4. Notice that the function becomes more
nonlinear as m increases.
Solution. Equation (E4.3.1) can be approximated by a first-order Taylor series expansion,
as in
$$f(x_{i+1}) = f(x_i) + m x_i^{m-1} h \tag{E4.3.2}$$
which has a remainder
$$R_1 = \frac{f''(x_i)}{2!}h^2 + \frac{f^{(3)}(x_i)}{3!}h^3 + \frac{f^{(4)}(x_i)}{4!}h^4 + \cdots$$
First, we can examine how the approximation performs as m increases, that is, as the function
becomes more nonlinear. For m = 1, the actual value of the function at x = 2 is 2.
The Taylor series yields
$$f(2) = 1 + 1(1) = 2$$
and
$$R_1 = 0$$
The remainder is zero because the second and higher derivatives of a linear function
are zero. Thus, as expected, the first-order Taylor series expansion is perfect when the
underlying function is linear.
For m = 2, the actual value is $f(2) = 2^2 = 4$. The first-order Taylor series
approximation is
$$f(2) = 1 + 2(1) = 3$$
and
$$R_1 = \frac{2}{2}(1)^2 + 0 + 0 + \cdots = 1$$
Thus, because the function is a parabola, the straight-line approximation results in a
discrepancy. Note that the remainder is determined exactly.
For m = 3, the actual value is $f(2) = 2^3 = 8$. The Taylor series approximation is
$$f(2) = 1 + 3(1)^2(1) = 4$$
and
$$R_1 = \frac{6}{2}(1)^2 + \frac{6}{6}(1)^3 + 0 + 0 + \cdots = 4$$
Again, there is a discrepancy that can be determined exactly from the Taylor series.
For m = 4, the actual value is $f(2) = 2^4 = 16$. The Taylor series approximation is
$$f(2) = 1 + 4(1)^3(1) = 5$$
and
$$R_1 = \frac{12}{2}(1)^2 + \frac{24}{6}(1)^3 + \frac{24}{24}(1)^4 + 0 + 0 + \cdots = 11$$
On the basis of these four cases, we observe that $R_1$ increases as the function becomes
more nonlinear. Furthermore, $R_1$ accounts exactly for the discrepancy. This is
because Eq. (E4.3.1) is a simple monomial with a finite number of derivatives. This
permits a complete determination of the Taylor series remainder.
Next, we will examine Eq. (E4.3.2) for the case m = 4 and observe how $R_1$ changes
as the step size h is varied. For m = 4, Eq. (E4.3.2) is
$$f(x + h) = f(x) + 4x^3 h$$
If x = 1, f(1) = 1 and this equation can be expressed as
$$f(1 + h) = 1 + 4h$$
with a remainder of
$$R_1 = 6h^2 + 4h^3 + h^4$$
This leads to the conclusion that the discrepancy will decrease as h is reduced. Also, at
sufficiently small values of h, the error should become proportional to $h^2$. That is, as h is
halved, the error will be quartered. This behavior is confirmed by Table 4.2 and Fig. 4.5.
Thus, we conclude that the error of the first-order Taylor series approximation
decreases as m approaches 1 and as h decreases. Intuitively, this means that the Taylor
series becomes more accurate when the function we are approximating becomes more
like a straight line over the interval of interest. This can be accomplished either by reducing
the size of the interval or by “straightening” the function by reducing m. Obviously,
the latter option is usually not available in the real world because the functions we analyze
are typically dictated by the physical problem context. Consequently, we do not have
control of their lack of linearity, and our only recourse is reducing the step size or including
additional terms in the Taylor series expansion.

FIGURE 4.5
Log-log plot of the remainder $R_1$ of the first-order Taylor series approximation of the function
$f(x) = x^4$ versus step size h. A line with a slope of 2 is also shown to indicate that as h decreases,
the error becomes proportional to $h^2$.

TABLE 4.2 Comparison of the exact value of the function $f(x) = x^4$ with the first-order
Taylor series approximation. Both the function and the approximation are
evaluated at x + h, where x = 1.

h          True       First-Order Approximation   R1
1          16         5                           11
0.5        5.0625     3                           2.0625
0.25       2.441406   2                           0.441406
0.125      1.601807   1.5                         0.101807
0.0625     1.274429   1.25                        0.024429
0.03125    1.130982   1.125                       0.005982
0.015625   1.063980   1.0625                      0.001480
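The entries of Table 4.2 can be regenerated with the short Python sketch below, which also prints the ratio of successive remainders. The function name `remainder_first_order` is hypothetical.

```python
def remainder_first_order(h, x=1.0, m=4):
    """R1 = true value of (x + h)^m minus the first-order estimate x^m + m x^(m-1) h."""
    true_value = (x + h) ** m
    first_order = x**m + m * x ** (m - 1) * h
    return true_value - first_order

rows = []
h = 1.0
for _ in range(7):
    rows.append((h, (1.0 + h) ** 4, 1.0 + 4.0 * h, remainder_first_order(h)))
    h /= 2

# Ratio of the remainder at step h to the remainder at step h/2.
ratios = [rows[k][3] / rows[k + 1][3] for k in range(len(rows) - 1)]

for step, true_value, estimate, remainder in rows:
    print(f"h = {step:<9} true = {true_value:<10.6f} first order = {estimate:<7} R1 = {remainder:.6f}")
print("R1(h) / R1(h/2):", [round(ratio, 3) for ratio in ratios])
```

The ratios start above 5 for the largest steps and approach 4 as h shrinks, which is the quartering expected when the $6h^2$ term dominates the remainder.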
4.1.3 Numerical Differentiation
Equation (4.14) is given a formal label in numerical methods—it is called a finite divided
difference. It can be represented generally as
$$f'(x_i) = \frac{f(x_{i+1}) - f(x_i)}{x_{i+1} - x_i} + O(x_{i+1} - x_i) \tag{4.17}$$
or
$$f'(x_i) = \frac{\Delta f_i}{h} + O(h) \tag{4.18}$$
where $\Delta f_i$ is referred to as the first forward difference and h is called the step size, that
is, the length of the interval over which the approximation is made. It is termed a “forward”
difference because it utilizes data at i and i + 1 to estimate the derivative (Fig. 4.6a). The
entire term $\Delta f_i/h$ is referred to as a first finite divided difference.
This forward divided difference is but one of many that can be developed from the
Taylor series to approximate derivatives numerically. For example, backward and centered
difference approximations of the first derivative can be developed in a fashion similar to
the derivation of Eq. (4.14). The former utilizes values at $x_{i-1}$ and $x_i$ (Fig. 4.6b), whereas
the latter uses values that are equally spaced around the point at which the derivative is
estimated (Fig. 4.6c). More accurate approximations of the first derivative can be devel-
oped by including higher-order terms of the Taylor series. Finally, all the above versions
can also be developed for second, third, and higher derivatives. The following sections
provide brief summaries illustrating how some of these cases are derived.
Backward Difference Approximation of the First Derivative. The Taylor series can
be expanded backward to calculate a previous value on the basis of a present value, as in
$$f(x_{i-1}) = f(x_i) - f'(x_i)h + \frac{f''(x_i)}{2!}h^2 - \cdots \tag{4.19}$$
Truncating this equation after the first derivative and rearranging yields
$$f'(x_i) \cong \frac{f(x_i) - f(x_{i-1})}{h} = \frac{\nabla f_i}{h} \tag{4.20}$$
where the error is O(h), and $\nabla f_i$ is referred to as the first backward difference. See Fig. 4.6b
for a graphical representation.
FIGURE 4.6
Graphical depiction of (a) forward, (b) backward, and (c) centered finite-divided-difference
approximations of the first derivative. [Each panel plots f(x) versus x and compares the true
derivative with the approximation: (a) over the interval h from $x_i$ to $x_{i+1}$, (b) over h from
$x_{i-1}$ to $x_i$, and (c) over 2h from $x_{i-1}$ to $x_{i+1}$.]
Centered Difference Approximation of the First Derivative. A third way to approxi-
mate the first derivative is to subtract Eq. (4.19) from the forward Taylor series expansion:
$$f(x_{i+1}) = f(x_i) + f'(x_i)h + \frac{f''(x_i)}{2!}h^2 + \cdots \tag{4.21}$$
to yield
$$f(x_{i+1}) = f(x_{i-1}) + 2f'(x_i)h + \frac{2f^{(3)}(x_i)}{3!}h^3 + \cdots$$
which can be solved for
$$f'(x_i) = \frac{f(x_{i+1}) - f(x_{i-1})}{2h} - \frac{f^{(3)}(x_i)}{6}h^2 - \cdots$$
or
$$f'(x_i) = \frac{f(x_{i+1}) - f(x_{i-1})}{2h} - O(h^2) \tag{4.22}$$
Equation (4.22) is a centered difference representation of the first derivative. Notice that
the truncation error is of the order of $h^2$ in contrast to the forward and backward
approximations that were of the order of h. Consequently, the Taylor series analysis
yields the practical information that the centered difference is a more accurate represen-
tation of the derivative (Fig. 4.6c). For example, if we halve the step size using a forward
or backward difference, we would approximately halve the truncation error, whereas for
the central difference, the error would be quartered.
EXAMPLE 4.4 Finite-Divided-Difference Approximations of Derivatives
Problem Statement. Use forward and backward difference approximations of O(h) and
a centered difference approximation of $O(h^2)$ to estimate the first derivative of
$$f(x) = -0.1x^4 - 0.15x^3 - 0.5x^2 - 0.25x + 1.2$$
at x = 0.5 using a step size h = 0.5. Repeat the computation using h = 0.25. Note that
the derivative can be calculated directly as
$$f'(x) = -0.4x^3 - 0.45x^2 - 1.0x - 0.25$$
and can be used to compute the true value as $f'(0.5) = -0.9125$.
Solution. For h = 0.5, the function can be employed to determine
$x_{i-1} = 0$      $f(x_{i-1}) = 1.2$
$x_i = 0.5$        $f(x_i) = 0.925$
$x_{i+1} = 1.0$    $f(x_{i+1}) = 0.2$
These values can be used to compute the forward divided difference [Eq. (4.17)],
$$f'(0.5) \cong \frac{0.2 - 0.925}{0.5} = -1.45 \qquad |\varepsilon_t| = 58.9\%$$
the backward divided difference [Eq. (4.20)],
$$f'(0.5) \cong \frac{0.925 - 1.2}{0.5} = -0.55 \qquad |\varepsilon_t| = 39.7\%$$
and the centered divided difference [Eq. (4.22)],
$$f'(0.5) \cong \frac{0.2 - 1.2}{1.0} = -1.0 \qquad |\varepsilon_t| = 9.6\%$$
For h = 0.25,
$x_{i-1} = 0.25$   $f(x_{i-1}) = 1.10351563$
$x_i = 0.5$        $f(x_i) = 0.925$
$x_{i+1} = 0.75$   $f(x_{i+1}) = 0.63632813$
which can be used to compute the forward divided difference,
$$f'(0.5) \cong \frac{0.63632813 - 0.925}{0.25} = -1.155 \qquad |\varepsilon_t| = 26.5\%$$
the backward divided difference,
$$f'(0.5) \cong \frac{0.925 - 1.10351563}{0.25} = -0.714 \qquad |\varepsilon_t| = 21.7\%$$
and the centered divided difference,
$$f'(0.5) \cong \frac{0.63632813 - 1.10351563}{0.5} = -0.934 \qquad |\varepsilon_t| = 2.4\%$$
For both step sizes, the centered difference approximation is more accurate than
forward or backward differences. Also, as predicted by the Taylor series analysis, halving
the step size approximately halves the error of the backward and forward differences and
quarters the error of the centered difference.
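The computations of Example 4.4 can be checked with a short Python sketch. The function names are hypothetical; the polynomial uses the constant term 1.2, which matches the tabulated value f(0) = 1.2 and in any case does not affect the derivative.

```python
def f(x):
    return -0.1 * x**4 - 0.15 * x**3 - 0.5 * x**2 - 0.25 * x + 1.2

def forward_difference(func, x, h):
    """Eq. (4.17): uses the points x and x + h; truncation error O(h)."""
    return (func(x + h) - func(x)) / h

def backward_difference(func, x, h):
    """Eq. (4.20): uses the points x - h and x; truncation error O(h)."""
    return (func(x) - func(x - h)) / h

def centered_difference(func, x, h):
    """Eq. (4.22): uses the points x - h and x + h; truncation error O(h^2)."""
    return (func(x + h) - func(x - h)) / (2 * h)

true_derivative = -0.9125
rules = (
    ("forward", forward_difference),
    ("backward", backward_difference),
    ("centered", centered_difference),
)
for h in (0.5, 0.25):
    for name, rule in rules:
        estimate = rule(f, 0.5, h)
        percent_error = abs((true_derivative - estimate) / true_derivative) * 100
        print(f"h = {h:<5} {name:<9} f'(0.5) ~ {estimate:.4f}  |error| = {percent_error:.1f}%")
```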
Finite Difference Approximations of Higher Derivatives. Besides first derivatives,
the Taylor series expansion can be used to derive numerical estimates of higher derivatives.
To do this, we write a forward Taylor series expansion for $f(x_{i+2})$ in terms of $f(x_i)$:
$$f(x_{i+2}) = f(x_i) + f'(x_i)(2h) + \frac{f''(x_i)}{2!}(2h)^2 + \cdots \tag{4.23}$$
Equation (4.21) can be multiplied by 2 and subtracted from Eq. (4.23) to give
$$f(x_{i+2}) - 2f(x_{i+1}) = -f(x_i) + f''(x_i)h^2 + \cdots$$
which can be solved for
$$f''(x_i) = \frac{f(x_{i+2}) - 2f(x_{i+1}) + f(x_i)}{h^2} + O(h) \tag{4.24}$$
This relationship is called the second forward finite divided difference. Similar manipulations
can be employed to derive a backward version
$$f''(x_i) = \frac{f(x_i) - 2f(x_{i-1}) + f(x_{i-2})}{h^2} + O(h)$$
and a centered version
$$f''(x_i) = \frac{f(x_{i+1}) - 2f(x_i) + f(x_{i-1})}{h^2} + O(h^2)$$
As was the case with the first-derivative approximations, the centered case is more accurate.
Notice also that the centered version can be alternatively expressed as
$$f''(x_i) \cong \frac{\dfrac{f(x_{i+1}) - f(x_i)}{h} - \dfrac{f(x_i) - f(x_{i-1})}{h}}{h}$$
Thus, just as the second derivative is a derivative of a derivative, the second divided
difference approximation is a difference of two first divided differences.
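As an illustration, the three second-difference formulas can be applied to the polynomial used in Example 4.4. This is a sketch under our own assumptions: the point $x = 0.5$, the step $h = 0.25$, and all variable names are chosen here for illustration and do not come from the text. The analytical second derivative, $f''(x) = -1.2x^2 - 0.9x - 1$, follows from differentiating the polynomial twice.

```matlab
% Sketch: second finite divided differences for the polynomial of Example 4.4.
f = @(x) -0.1*x.^4 - 0.15*x.^3 - 0.5*x.^2 - 0.25*x + 1.2;
x = 0.5;  h = 0.25;
d2_true = -1.2*x^2 - 0.9*x - 1;                       % analytical f''(0.5) = -1.75
d2_fwd  = (f(x+2*h) - 2*f(x+h) + f(x))     / h^2;     % forward,  O(h)
d2_bwd  = (f(x)     - 2*f(x-h) + f(x-2*h)) / h^2;     % backward, O(h)
d2_ctr  = (f(x+h)   - 2*f(x)   + f(x-h))   / h^2;     % centered, O(h^2)
% Centered version rewritten as a difference of two first divided differences
d2_alt  = ((f(x+h) - f(x))/h - (f(x) - f(x-h))/h) / h;
fprintf('true %.4f  fwd %.4f  bwd %.4f  ctr %.4f  alt %.4f\n', ...
        d2_true, d2_fwd, d2_bwd, d2_ctr, d2_alt);
```

For this case the centered estimate is much closer to the analytical value than the one-sided estimates, and the two centered forms agree to within round-off.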
We will return to the topic of numerical differentiation in Chap. 23. We have intro-
duced you to the topic at this point because it is a very good example of why the Taylor
series is important in numerical methods. In addition, several of the formulas introduced
in this section will be employed prior to Chap. 23.
4.2 ERROR PROPAGATION
The purpose of this section is to study how errors in numbers can propagate through
mathematical functions. For example, if we multiply two numbers that have errors, we
would like to estimate the error in the product.
4.2.1 Functions of a Single Variable
Suppose that we have a function f(x) that is dependent on a single independent variable x.
Assume that x̃ is an approximation of x. We, therefore, would like to assess the effect
of the discrepancy between x and x̃ on the value of the function. That is, we would like
to estimate
$$\Delta f(\tilde{x}) = |f(x) - f(\tilde{x})|$$
The problem with evaluating $\Delta f(\tilde{x})$ is that $f(x)$ is unknown because $x$ is unknown. We can
overcome this difficulty if $\tilde{x}$ is close to $x$ and $f(\tilde{x})$ is continuous and differentiable. If these
conditions hold, a Taylor series can be employed to compute $f(x)$ near $f(\tilde{x})$, as in
$$f(x) = f(\tilde{x}) + f'(\tilde{x})(x - \tilde{x}) + \frac{f''(\tilde{x})}{2}(x - \tilde{x})^2 + \cdots$$
Dropping the second- and higher-order terms and rearranging yields
$$f(x) - f(\tilde{x}) \cong f'(\tilde{x})(x - \tilde{x})$$
or
$$\Delta f(\tilde{x}) = |f'(\tilde{x})|\,\Delta\tilde{x} \tag{4.25}$$
where $\Delta f(\tilde{x}) = |f(x) - f(\tilde{x})|$ represents an estimate of the error of the function and
$\Delta\tilde{x} = |x - \tilde{x}|$ represents an estimate of the error of $x$. Equation (4.25) provides the capability
to approximate the error in $f(x)$ given the derivative of a function and an estimate of the
error in the independent variable. Figure 4.7 is a graphical illustration of the operation.
EXAMPLE 4.5 Error Propagation in a Function of a Single Variable
Problem Statement. Given a value of $\tilde{x} = 2.5$ with an error of $\Delta\tilde{x} = 0.01$, estimate
the resulting error in the function $f(x) = x^3$.
Solution. Using Eq. (4.25),
$$\Delta f(\tilde{x}) \cong 3(2.5)^2(0.01) = 0.1875$$
Because $f(2.5) = 15.625$, we predict that
$$f(2.5) = 15.625 \pm 0.1875$$
or that the true value lies between 15.4375 and 15.8125. In fact, if $x$ were actually 2.49,
the function could be evaluated as 15.4382, and if $x$ were 2.51, it would be 15.8133. For
this case, the first-order error analysis provides a fairly close estimate of the true error.
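The comparison in Example 4.5 can be reproduced numerically. The sketch below is ours, not the authors'; the names `df_est` and `true_err` are hypothetical, and the "true" error is taken as the larger of the two deviations obtained by evaluating the function at the ends of the interval.

```matlab
% Sketch: first-order error estimate, Eq. (4.25), versus the actual deviation.
f  = @(x) x.^3;
df = @(x) 3*x.^2;                      % analytical derivative of x^3
xt = 2.5;  dx = 0.01;                  % approximate value and its error
df_est   = abs(df(xt)) * dx;           % first-order estimate of the error in f
true_err = max(abs(f(xt) - f(xt - dx)), abs(f(xt + dx) - f(xt)));
fprintf('estimated error %.6f, largest actual deviation %.6f\n', df_est, true_err);
```

The small difference between the two numbers is the contribution of the second- and higher-order terms that were dropped from the Taylor series.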
FIGURE 4.7
Graphical depiction of first-order error propagation. [Plot of $f(x)$ versus $x$ comparing the true error with the estimated error $|f'(\tilde{x})|\Delta\tilde{x}$ for an error $\Delta\tilde{x}$ in $x$.]
4.2.2 Functions of More than One Variable
The foregoing approach can be generalized to functions that are dependent on more
than one independent variable. This is accomplished with a multivariable version of the
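Taylor series. For example, if we have a function of two independent variables $u$ and
$v$, the Taylor series can be written as
$$\begin{aligned}
f(u_{i+1}, v_{i+1}) = {} & f(u_i, v_i) + \frac{\partial f}{\partial u}(u_{i+1} - u_i) + \frac{\partial f}{\partial v}(v_{i+1} - v_i) \\
& + \frac{1}{2!}\left[ \frac{\partial^2 f}{\partial u^2}(u_{i+1} - u_i)^2 + 2\frac{\partial^2 f}{\partial u\,\partial v}(u_{i+1} - u_i)(v_{i+1} - v_i) + \frac{\partial^2 f}{\partial v^2}(v_{i+1} - v_i)^2 \right] + \cdots
\end{aligned} \tag{4.26}$$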
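where all partial derivatives are evaluated at the base point $i$. If all second-order and
higher terms are dropped, Eq. (4.26) can be solved for
$$\Delta f(\tilde{u}, \tilde{v}) = \left|\frac{\partial f}{\partial u}\right|\Delta\tilde{u} + \left|\frac{\partial f}{\partial v}\right|\Delta\tilde{v}$$
where $\Delta\tilde{u}$ and $\Delta\tilde{v}$ = estimates of the errors in $u$ and $v$, respectively.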
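For $n$ independent variables $\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_n$ having errors $\Delta\tilde{x}_1, \Delta\tilde{x}_2, \ldots, \Delta\tilde{x}_n$, the
following general relationship holds:
$$\Delta f(\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_n) \cong \left|\frac{\partial f}{\partial x_1}\right|\Delta\tilde{x}_1 + \left|\frac{\partial f}{\partial x_2}\right|\Delta\tilde{x}_2 + \cdots + \left|\frac{\partial f}{\partial x_n}\right|\Delta\tilde{x}_n \tag{4.27}$$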
EXAMPLE 4.6 Error Propagation in a Multivariable Function
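Problem Statement. The deflection $y$ of the top of a sailboat mast is
$$y = \frac{FL^4}{8EI}$$
where $F$ = a uniform side loading (N/m), $L$ = height (m), $E$ = the modulus of elasticity
(N/m$^2$), and $I$ = the moment of inertia (m$^4$). Estimate the error in $y$ given the following data:
$$\begin{aligned}
\tilde{F} &= 750 \text{ N/m} & \Delta\tilde{F} &= 30 \text{ N/m} \\
\tilde{L} &= 9 \text{ m} & \Delta\tilde{L} &= 0.03 \text{ m} \\
\tilde{E} &= 7.5 \times 10^9 \text{ N/m}^2 & \Delta\tilde{E} &= 5 \times 10^7 \text{ N/m}^2 \\
\tilde{I} &= 0.0005 \text{ m}^4 & \Delta\tilde{I} &= 0.000005 \text{ m}^4
\end{aligned}$$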
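Solution. Employing Eq. (4.27) gives
$$\Delta y(\tilde{F}, \tilde{L}, \tilde{E}, \tilde{I}) = \left|\frac{\partial y}{\partial F}\right|\Delta\tilde{F} + \left|\frac{\partial y}{\partial L}\right|\Delta\tilde{L} + \left|\frac{\partial y}{\partial E}\right|\Delta\tilde{E} + \left|\frac{\partial y}{\partial I}\right|\Delta\tilde{I}$$
or
$$\Delta y(\tilde{F}, \tilde{L}, \tilde{E}, \tilde{I}) \cong \frac{\tilde{L}^4}{8\tilde{E}\tilde{I}}\Delta\tilde{F} + \frac{\tilde{F}\tilde{L}^3}{2\tilde{E}\tilde{I}}\Delta\tilde{L} + \frac{\tilde{F}\tilde{L}^4}{8\tilde{E}^2\tilde{I}}\Delta\tilde{E} + \frac{\tilde{F}\tilde{L}^4}{8\tilde{E}\tilde{I}^2}\Delta\tilde{I}$$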
Substituting the appropriate values gives
$$\Delta y = 0.006561 + 0.002187 + 0.001094 + 0.00164 = 0.011482$$
Therefore, $y = 0.164025 \pm 0.011482$. In other words, $y$ is between 0.152543 and
0.175507 m. The validity of these estimates can be verified by substituting the extreme
values for the variables into the equation to generate an exact minimum of
$$y_{min} = \frac{720(8.97)^4}{8(7.55 \times 10^9)0.000505} = 0.152818$$
and
$$y_{max} = \frac{780(9.03)^4}{8(7.45 \times 10^9)0.000495} = 0.175790$$
Thus, the first-order estimates are reasonably close to the exact values.
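The calculation in Example 4.6 lends itself to a short script. The following is a sketch only; the variable names (`terms`, `dy`, `ymin`, `ymax`) are our own, and the partial derivatives are simply the analytical expressions written out in the solution.

```matlab
% Sketch: first-order error propagation, Eq. (4.27), for y = F*L^4/(8*E*I).
F = 750;  L = 9;     E = 7.5e9;  I = 0.0005;      % approximate values
dF = 30;  dL = 0.03; dE = 5e7;   dI = 0.000005;   % estimated errors
y = F*L^4 / (8*E*I);                              % nominal deflection (m)
terms = [ L^4/(8*E*I)     * dF, ...               % |dy/dF| * dF
          F*L^3/(2*E*I)   * dL, ...               % |dy/dL| * dL
          F*L^4/(8*E^2*I) * dE, ...               % |dy/dE| * dE
          F*L^4/(8*E*I^2) * dI ];                 % |dy/dI| * dI
dy = sum(terms);                                  % first-order error estimate
% Exact extremes: increasing F and L raises y, increasing E and I lowers it
ymin = (F-dF)*(L-dL)^4 / (8*(E+dE)*(I+dI));
ymax = (F+dF)*(L+dL)^4 / (8*(E-dE)*(I-dI));
fprintf('y = %.6f +/- %.6f, exact range [%.6f, %.6f]\n', y, dy, ymin, ymax);
```

Inspecting the individual entries of `terms` also shows which input contributes most to the uncertainty; here it is the loading $F$.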
Equation (4.27) can be employed to define error propagation relationships for
common mathematical operations. The results are summarized in Table 4.3. We will
leave the derivation of these formulas as a homework exercise.
4.2.3 Stability and Condition
The condition of a mathematical problem relates to its sensitivity to changes in its input
values. We say that a computation is numerically unstable if the uncertainty of the input
values is grossly magnified by the numerical method.
These ideas can be studied using a first-order Taylor series
$$f(x) = f(\tilde{x}) + f'(\tilde{x})(x - \tilde{x})$$
This relationship can be employed to estimate the relative error of $f(x)$ as in
$$\frac{f(x) - f(\tilde{x})}{f(x)} \cong \frac{f'(\tilde{x})(x - \tilde{x})}{f(\tilde{x})}$$
The relative error of $x$ is given by
$$\frac{x - \tilde{x}}{\tilde{x}}$$
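TABLE 4.3 Estimated error bounds associated with common
mathematical operations using inexact numbers $\tilde{u}$ and $\tilde{v}$.

Operation        Quantity                                Estimated Error
Addition         $\Delta(\tilde{u} + \tilde{v})$         $\Delta\tilde{u} + \Delta\tilde{v}$
Subtraction      $\Delta(\tilde{u} - \tilde{v})$         $\Delta\tilde{u} + \Delta\tilde{v}$
Multiplication   $\Delta(\tilde{u} \times \tilde{v})$    $|\tilde{u}|\Delta\tilde{v} + |\tilde{v}|\Delta\tilde{u}$
Division         $\Delta(\tilde{u}/\tilde{v})$           $\dfrac{|\tilde{u}|\Delta\tilde{v} + |\tilde{v}|\Delta\tilde{u}}{|\tilde{v}|^2}$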
A condition number can be defined as the ratio of these relative errors
$$\text{Condition number} = \frac{\tilde{x}\, f'(\tilde{x})}{f(\tilde{x})} \tag{4.28}$$
The condition number provides a measure of the extent to which an uncertainty in x is
magnified by f(x). A value of 1 tells us that the function’s relative error is identical to the
relative error in x. A value greater than 1 tells us that the relative error is amplified, whereas
a value less than 1 tells us that it is attenuated. Functions with very large condition numbers
are said to be ill-conditioned. Any combination of factors in Eq. (4.28) that increases the numerical
value of the condition number will tend to magnify uncertainties in the computation of $f(x)$.
EXAMPLE 4.7 Condition Number
Problem Statement. Compute and interpret the condition number for
$$f(x) = \tan x \quad \text{for } \tilde{x} = \frac{\pi}{2} + 0.1\left(\frac{\pi}{2}\right)$$
$$f(x) = \tan x \quad \text{for } \tilde{x} = \frac{\pi}{2} + 0.01\left(\frac{\pi}{2}\right)$$
Solution. The condition number is computed as
$$\text{Condition number} = \frac{\tilde{x}(1/\cos^2 \tilde{x})}{\tan \tilde{x}}$$
For $\tilde{x} = \pi/2 + 0.1(\pi/2)$,
$$\text{Condition number} = \frac{1.7279(40.86)}{-6.314} = -11.2$$
Thus, the function is ill-conditioned. For $\tilde{x} = \pi/2 + 0.01(\pi/2)$, the situation is even
worse:
$$\text{Condition number} = \frac{1.5865(4053)}{-63.66} = -101$$
For this case, the major cause of ill conditioning appears to be the derivative. This makes sense
because in the vicinity of $\pi/2$, the tangent approaches both positive and negative infinity.
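Equation (4.28) is easy to evaluate numerically. The sketch below applies it to the two points of Example 4.7; the handle name `cond_num` and the variables `c1` and `c2` are hypothetical names introduced here for illustration.

```matlab
% Sketch: condition number of tan(x) from Eq. (4.28), using d/dx tan(x) = 1/cos(x)^2.
cond_num = @(x) x .* (1 ./ cos(x).^2) ./ tan(x);
x1 = pi/2 + 0.1*(pi/2);
x2 = pi/2 + 0.01*(pi/2);
c1 = cond_num(x1);        % about -11.2
c2 = cond_num(x2);        % about -101
fprintf('x = %.4f: condition number %.1f\n', x1, c1);
fprintf('x = %.4f: condition number %.1f\n', x2, c2);
```

Moving the evaluation point ten times closer to $\pi/2$ raises the magnitude of the condition number by roughly a factor of 10, which is consistent with the tangent's behavior near its singularity.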
4.3 TOTAL NUMERICAL ERROR
The total numerical error is the summation of the truncation and round-off errors. In
general, the only way to minimize round-off errors is to increase the number of significant
figures of the computer. Further, we have noted that round-off error will increase due to
subtractive cancellation or due to an increase in the number of computations in an analy-
sis. In contrast, Example 4.4 demonstrated that the truncation error can be reduced by
decreasing the step size. Because a decrease in step size can lead to subtractive cancella-
tion or to an increase in computations, the truncation errors are decreased as the round-off
errors are increased. Therefore, we are faced by the following dilemma: The strategy for
decreasing one component of the total error leads to an increase of the other component.
In a computation, we could conceivably decrease the step size to minimize truncation
errors only to discover that in doing so, the round-off error begins to dominate the solu-
tion and the total error grows! Thus, our remedy becomes our problem (Fig. 4.8). One
challenge that we face is to determine an appropriate step size for a particular computation.
We would like to choose a large step size in order to decrease the amount of calculations
and round-off errors without incurring the penalty of a large truncation error. If the total
error is as shown in Fig. 4.8, the challenge is to identify the point of diminishing returns
where round-off error begins to negate the benefits of step-size reduction.
In actual cases, however, such situations are relatively uncommon because most com-
puters carry enough significant figures that round-off errors do not predominate. Neverthe-
less, they sometimes do occur and suggest a sort of “numerical uncertainty principle” that
places an absolute limit on the accuracy that may be obtained using certain computerized
numerical methods. We explore such a case in the following section.
4.3.1 Error Analysis of Numerical Differentiation
As described in Sec. 4.1.3, a centered difference approximation of the first derivative
can be written as [Eq. (4.22)]:
$$\underbrace{f'(x_i)}_{\text{True value}} = \underbrace{\frac{f(x_{i+1}) - f(x_{i-1})}{2h}}_{\text{Finite-difference approximation}} - \underbrace{\frac{f^{(3)}(\xi)}{6}h^2}_{\text{Truncation error}} \tag{4.29}$$
FIGURE 4.8
A graphical depiction of the trade-off between round-off and truncation error that sometimes
comes into play in the course of a numerical method. The point of diminishing returns is shown,
where round-off error begins to negate the benefits of step-size reduction.
[Plot of log error versus log step size showing the truncation error, round-off error, and total error curves, with the point of diminishing returns marked.]
Thus, if the two function values in the numerator of the finite-difference approximation
have no round-off error, the only error is due to truncation.
However, because we are using digital computers, the function values do include
round-off error as in
$$f(x_{i-1}) = \tilde{f}(x_{i-1}) + e_{i-1}$$
$$f(x_{i+1}) = \tilde{f}(x_{i+1}) + e_{i+1}$$
where the $\tilde{f}$'s are the rounded function values and the $e$'s are the associated round-off
errors. Substituting these values into Eq. (4.29) gives
$$\underbrace{f'(x_i)}_{\text{True value}} = \underbrace{\frac{\tilde{f}(x_{i+1}) - \tilde{f}(x_{i-1})}{2h}}_{\text{Finite-difference approximation}} + \underbrace{\frac{e_{i+1} - e_{i-1}}{2h}}_{\text{Round-off error}} - \underbrace{\frac{f^{(3)}(\xi)}{6}h^2}_{\text{Truncation error}}$$
We can see that the total error of the finite-difference approximation consists of a round-off
error that grows as the step size is reduced (it is proportional to $1/h$) and a truncation error
that shrinks as the step size is reduced (it is proportional to $h^2$).
Assuming that the absolute value of each component of the round-off error has an
upper bound of $\varepsilon$, the maximum possible value of the difference $e_{i+1} - e_{i-1}$ will be $2\varepsilon$.
Further, assume that the third derivative has a maximum absolute value of $M$. An upper
bound on the absolute value of the total error can therefore be represented as
$$\text{Total error} = \left| f'(x_i) - \frac{\tilde{f}(x_{i+1}) - \tilde{f}(x_{i-1})}{2h} \right| \le \frac{\varepsilon}{h} + \frac{h^2 M}{6} \tag{4.30}$$
An optimal step size can be determined by differentiating Eq. (4.30), setting the result
equal to zero and solving for
$$h_{opt} = \sqrt[3]{\frac{3\varepsilon}{M}} \tag{4.31}$$
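The intermediate steps are brief and may be written out as follows. This is our own sketch of the differentiation, with $E(h)$ introduced here as a shorthand for the right-hand side of Eq. (4.30):

$$\begin{aligned}
E(h) &= \frac{\varepsilon}{h} + \frac{h^2 M}{6} \\
\frac{dE}{dh} &= -\frac{\varepsilon}{h^2} + \frac{hM}{3} = 0 \\
h^3 &= \frac{3\varepsilon}{M}
\end{aligned}$$

Because $d^2E/dh^2 = 2\varepsilon/h^3 + M/3$ is positive for $h > 0$, this stationary point is a minimum of the error bound.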
EXAMPLE 4.8 Round-off and Truncation Errors in Numerical Differentiation
Problem Statement. In Example 4.4, we used a centered difference approximation of
$O(h^2)$ to estimate the first derivative of the following function at $x = 0.5$,
$$f(x) = -0.1x^4 - 0.15x^3 - 0.5x^2 - 0.25x + 1.2$$
Perform the same computation starting with $h = 1$. Then progressively divide the step
size by a factor of 10 to demonstrate how round-off becomes dominant as the step size
is reduced. Relate your results to Eq. (4.31). Recall that the true value of the derivative
is $-0.9125$.
Solution. We can develop a program to perform the computations and plot the results.
For the present example, we have done this with a MATLAB software M-file. Notice
that we pass both the function and its analytical derivative as arguments. In addition, the
function generates a plot of the results.
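function diffex(func,dfunc,x,n)
% DIFFEX  Centered-difference error study: divide h by 10 at each of n steps.
%   func  = function handle, dfunc = handle to its analytical derivative,
%   x     = evaluation point, n = number of step sizes (starting at h = 1).
format long
dftrue=dfunc(x);                       % true derivative used for comparison
h=1;
H(1)=h;
D(1)=(func(x+h)-func(x-h))/(2*h);      % centered difference, Eq. (4.22)
E(1)=abs(dftrue-D(1));                 % true absolute error
for i=2:n
  h=h/10;
  H(i)=h;
  D(i)=(func(x+h)-func(x-h))/(2*h);
  E(i)=abs(dftrue-D(i));
end
L=[H' D' E']';                         % one column per step size for fprintf
fprintf('   step size   finite difference     true error\n');
fprintf('%14.10f %16.14f %16.13f\n',L);
loglog(H,E),xlabel('Step Size'),ylabel('Error')
title('Plot of Error Versus Step Size')
format short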
The M-file can then be run using the following commands:
>> ff=@(x) -0.1*x^4-0.15*x^3-0.5*x^2-0.25*x+1.2;
>> df=@(x) -0.4*x^3-0.45*x^2-x-0.25;
>> diffex(ff,df,0.5,11)
When the function is run, the following numeric output is generated along with the plot
(Fig. 4.9):
   step size   finite difference     true error
  1.0000000000 -1.26250000000000  0.3500000000000
  0.1000000000 -0.91600000000000  0.0035000000000
  0.0100000000 -0.91253500000000  0.0000350000000
  0.0010000000 -0.91250035000001  0.0000003500000
  0.0001000000 -0.91250000349985  0.0000000034998
  0.0000100000 -0.91250000003318  0.0000000000332
  0.0000010000 -0.91250000000542  0.0000000000054
  0.0000001000 -0.91249999945031  0.0000000005497
  0.0000000100 -0.91250000333609  0.0000000033361
  0.0000000010 -0.91250001998944  0.0000000199894
  0.0000000001 -0.91250007550059  0.0000000755006
The results are as expected. At first, round-off is minimal and the estimate is dominated
by truncation error. Hence, as in Eq. (4.30), the total error drops by a factor of 100 each
time we divide the step by 10. However, starting at $h = 0.0001$, we see round-off error
begin to creep in and erode the rate at which the error diminishes. A minimum error is
reached at $h = 10^{-6}$. Beyond this point, the error increases as round-off dominates.
Because we are dealing with an easily differentiable function, we can also investigate
whether these results are consistent with Eq. (4.31). First, we can estimate $M$ by evaluating
the function's third derivative as
$$M = |f^{(3)}(0.5)| = |-2.4(0.5) - 0.9| = 2.1$$
Because MATLAB has a precision of about 15 to 16 base-10 digits, a rough estimate of
the upper bound on round-off would be about $\varepsilon = 0.5 \times 10^{-16}$. Substituting these values
into Eq. (4.31) gives
$$h_{opt} = \sqrt[3]{\frac{3(0.5 \times 10^{-16})}{2.1}} \approx 4.1 \times 10^{-6}$$
which is on the same order as the result of $1 \times 10^{-6}$ obtained with our computer program.
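The estimate can also be evaluated directly. The following sketch uses the same rough round-off bound assumed in the text; the names `e_bound`, `h_opt`, and `bound` are ours and are introduced only for illustration.

```matlab
% Sketch: evaluate Eq. (4.31) and the error bound of Eq. (4.30).
M       = abs(-2.4*0.5 - 0.9);             % |f'''(0.5)| = 2.1
e_bound = 0.5e-16;                         % rough round-off bound assumed in the text
h_opt   = (3*e_bound/M)^(1/3);             % Eq. (4.31), roughly 4.1e-6
bound   = @(h) e_bound./h + h.^2*M/6;      % Eq. (4.30) as a function of step size
fprintf('h_opt = %.3e, bound at h_opt = %.3e\n', h_opt, bound(h_opt));
fprintf('bound at 10*h_opt = %.3e, at h_opt/10 = %.3e\n', ...
        bound(10*h_opt), bound(h_opt/10));
```

The bound is larger on either side of `h_opt`, which mirrors the rise in true error on both sides of the minimum in the computed table. Because Eq. (4.30) is an upper bound built on a rough value of the round-off limit, only order-of-magnitude agreement with the observed minimum should be expected.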
4.3.2 Control of Numerical Errors
For most practical cases, we do not know the exact error associated with numerical meth-
ods. The exception, of course, is when we have obtained the exact solution that makes
our numerical approximations unnecessary. Therefore, for most engineering applications
we must settle for some estimate of the error in our calculations.
There are no systematic and general approaches to evaluating numerical errors for
all problems. In many cases, error estimates are based on the experience and judgment
of the engineer.
FIGURE 4.9
Plot of error versus step size. [Log-log plot titled "Plot of error versus step size", with error ($10^{-12}$ to $10^{0}$) on the vertical axis and step size ($10^{-10}$ to $10^{0}$) on the horizontal axis.]
Although error analysis is to a certain extent an art, there are several practical programming
guidelines we can suggest. First and foremost, avoid subtracting two nearly equal
numbers. Loss of significance almost always occurs when this is done. Sometimes you can
rearrange or reformulate the problem to avoid subtractive cancellation. If this is not possible,
you may want to use extended-precision arithmetic. Furthermore, when adding and
subtracting numbers, it is best to sort the numbers and work with the smallest numbers
first. This helps limit loss of significance.
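The benefit of adding the smallest numbers first can be seen in a small experiment. This is an illustrative sketch with values of our own choosing: one term equal to 1 and 10,000 terms of $10^{-16}$, each of which is too small to change the value 1 in double precision when added on its own. Explicit loops are used so that the order of the additions is unambiguous.

```matlab
% Sketch: the same numbers summed in two different orders (double precision).
big   = 1;
small = 1e-16 * ones(1, 10000);       % each term is below half of eps(1)

s_large_first = big;                  % start with the largest number
for k = 1:numel(small)
    s_large_first = s_large_first + small(k);   % each addition rounds back to 1
end

s_small_first = 0;                    % accumulate the small numbers first
for k = 1:numel(small)
    s_small_first = s_small_first + small(k);
end
s_small_first = s_small_first + big;  % then add the largest number

fprintf('largest first : %.16f\n', s_large_first);
fprintf('smallest first: %.16f\n', s_small_first);
```

When the large number is added first, the small terms are lost entirely; when they are accumulated first, their combined contribution of about $10^{-12}$ survives in the result.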
Beyond these computational hints, one can attempt to predict total numerical errors
using theoretical formulations. The Taylor series is our primary tool for analysis of both
truncation and round-off errors. Several examples have been presented in this chapter.
Prediction of total numerical error is very complicated for even moderately sized problems
and tends to be pessimistic. Therefore, it is usually attempted for only small-scale tasks.
The tendency is to push forward with the numerical computations and try to estimate
the accuracy of your results. This can sometimes be done by seeing if the results satisfy
some condition or equation as a check. Or it may be possible to substitute the results
back into the original equation to check that it is actually satisfied.
Finally, you should be prepared to perform numerical experiments to increase your
awareness of computational errors and possible ill-conditioned problems. Such experiments
may involve repeating the computations with a different step size or method and
comparing the results. We may employ sensitivity analysis to see how our solution changes
when we change model parameters or input values. We may want to try different numerical
algorithms that have different theoretical foundations, are based on different computational
strategies, or have different convergence properties and stability characteristics.
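One simple experiment of this kind is to repeat a calculation with a halved step size and see how much the answer moves. The sketch below does so for the centered difference of the polynomial from Example 4.4; the step sizes 0.1 and 0.05 and the names `centered`, `D1`, `D2`, and `change` are our own choices for illustration.

```matlab
% Sketch: repeat a centered-difference estimate with a halved step size.
f = @(x) -0.1*x.^4 - 0.15*x.^3 - 0.5*x.^2 - 0.25*x + 1.2;
centered = @(h) (f(0.5 + h) - f(0.5 - h)) ./ (2*h);   % estimate of f'(0.5)
D1 = centered(0.1);
D2 = centered(0.05);
change   = abs(D2 - D1);            % how much the answer moved
true_err = abs(-0.9125 - D2);       % available here only because f' is known
fprintf('D(0.1) = %.6f, D(0.05) = %.6f, change = %.6f, true error = %.6f\n', ...
        D1, D2, change, true_err);
```

For an $O(h^2)$ formula in the truncation-dominated range, halving the step reduces the error to about one quarter, so the observed change is roughly three times the error that remains. The change between runs therefore gives a usable error indicator even when the true value is unknown.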
When the results of numerical computations are extremely critical and may involve
loss of human life or have severe economic ramifications, it is appropriate to take special
precautions. This may involve the use of two or more independent groups to solve the
same problem so that their results can be compared.
The roles of errors will be a topic of concern and analysis in all sections of this
book. We will leave these investigations to specific sections.
4.4 BLUNDERS, FORMULATION ERRORS, AND DATA UNCERTAINTY
Although the following sources of error are not directly connected with most of the
numerical methods in this book, they can sometimes have great impact on the success
of a modeling effort. Thus, they must always be kept in mind when applying numerical
techniques in the context of real-world problems.
4.4.1 Blunders
We are all familiar with gross errors, or blunders. In the early years of computers,
erroneous numerical results could sometimes be attributed to malfunctions of the computer
itself. Today, this source of error is highly unlikely, and most blunders must be attributed
to human imperfection.
Blunders can occur at any stage of the mathematical modeling process and can
contribute to all the other components of error. They can be avoided only by sound
knowledge of fundamental principles and by the care with which you approach and
design your solution to a problem.
Blunders are usually disregarded in discussions of numerical methods. This is no
doubt due to the fact that, try as we may, mistakes are to a certain extent unavoidable.
However, we believe that there are a number of ways in which their occurrence can be
minimized. In particular, the good programming habits that were outlined in Chap. 2 are
extremely useful for mitigating programming blunders. In addition, there are usually
simple ways to check whether a particular numerical method is working properly.
Throughout this book, we discuss ways to check the results of numerical calculations.
4.4.2 Formulation Errors
Formulation, or model, errors relate to bias that can be ascribed to incomplete mathematical
models. An example of a negligible formulation error is the fact that Newton's
second law does not account for relativistic effects. This does not detract from the adequacy
of the solution in Example 1.1 because these errors are minimal on the time and
space scales associated with the falling parachutist problem.
However, suppose that air resistance is not linearly proportional to fall velocity, as
in Eq. (1.7), but is a function of the square of velocity. If this were the case, both the
analytical and numerical solutions obtained in Chap. 1 would be erroneous because
of formulation error. Further consideration of formulation error is included in some of
the engineering applications in the remainder of the book. You should be cognizant of
these problems and realize that, if you are working with a poorly conceived model, no
numerical method will provide adequate results.
4.4.3 Data Uncertainty
Errors sometimes enter into an analysis because of uncertainty in the physical data upon
which a model is based. For instance, suppose we wanted to test the falling parachutist
model by having an individual make repeated jumps and then measuring his or her
velocity after a specified time interval. Uncertainty would undoubtedly be associated
with these measurements, since the parachutist would fall faster during some jumps than
during others. These errors can exhibit both inaccuracy and imprecision. If our instru-
ments consistently underestimate or overestimate the velocity, we are dealing with an
inaccurate, or biased, device. On the other hand, if the measurements are randomly high
and low, we are dealing with a question of precision.
Measurement errors can be quantified by summarizing the data with one or more
well-chosen statistics that convey as much information as possible regarding specific
characteristics of the data. These descriptive statistics are most often selected to represent
(1) the location of the center of the distribution of the data and (2) the degree of spread
of the data. As such, they provide a measure of the bias and imprecision, respectively.
We will return to the topic of characterizing data uncertainty in Part Five.
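As a brief illustration of these two statistics, the sketch below summarizes a handful of velocity readings. All numbers are hypothetical values invented for this illustration, including the reference velocity `v_ref` against which bias is judged; they are not measurements from the text.

```matlab
% Sketch: location and spread of a set of hypothetical velocity readings (m/s).
v      = [44.2 45.1 43.8 44.9 44.5];   % hypothetical repeated measurements
v_ref  = 44.9;                         % hypothetical reference ("true") velocity
v_mean = mean(v);                      % location of the center of the data
v_std  = std(v);                       % sample standard deviation (spread)
bias   = v_mean - v_ref;               % systematic offset, relates to inaccuracy
fprintf('mean = %.2f, std = %.3f, bias = %.2f\n', v_mean, v_std, bias);
```

A mean that sits away from the reference value points to inaccuracy (bias), whereas a large standard deviation points to imprecision.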
Although you must be cognizant of blunders, formulation errors, and uncertain data,
the numerical methods used for building models can be studied, for the most part, inde-
pendently of these errors. Therefore, for most of this book, we will assume that we have
not made gross errors, we have a sound model, and we are dealing with error-free mea-
surements. Under these conditions, we can study numerical errors without complicating
factors.
PROBLEMS
4.1 The following infinite series can be used to approximate $e^x$:
$$e^x = 1 + x + \frac{x^2}{2} + \frac{x^3}{3!} + \cdots + \frac{x^n}{n!}$$
(a) Prove that this Maclaurin series expansion is a special case of
the Taylor series expansion [Eq. (4.7)] with $x_i = 0$ and $h = x$.
(b) Use the Taylor series to estimate $f(x) = e^{-x}$ at $x_{i+1} = 1$ for
$x_i = 0.2$. Employ the zero-, first-, second-, and third-order
versions and compute the $|\varepsilon_t|$ for each case.
4.2 The Maclaurin series expansion for $\cos x$ is
$$\cos x = 1 - \frac{x^2}{2} + \frac{x^4}{4!} - \frac{x^6}{6!} + \frac{x^8}{8!} - \cdots$$
Starting with the simplest version, $\cos x = 1$, add terms one at a
time to estimate $\cos(\pi/3)$. After each new term is added, compute
the true and approximate percent relative errors. Use your pocket
calculator to determine the true value. Add terms until the absolute
value of the approximate error estimate falls below an error criterion
conforming to two significant figures.
4.3 Perform the same computation as in Prob. 4.2, but use the
Maclaurin series expansion for $\sin x$ to estimate $\sin(\pi/3)$.
$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots$$
4.4 The Maclaurin series expansion for the arctangent of $x$ is defined
for $|x| \le 1$ as
$$\arctan x = \sum_{n=0}^{\infty} \frac{(-1)^n}{2n + 1} x^{2n+1}$$
(a) Write out the first four terms ($n = 0, \ldots, 3$).
(b) Starting with the simplest version, $\arctan x = x$, add terms one
at a time to estimate $\arctan(\pi/6)$. After each new term is added,
compute the true and approximate percent relative errors. Use
your calculator to determine the true value. Add terms until the
absolute value of the approximate error estimate falls below an
error criterion conforming to two significant figures.
4.5 Use zero- through third-order Taylor series expansions to
predict $f(3)$ for
$$f(x) = 25x^3 - 6x^2 + 7x - 88$$
using a base point at $x = 1$. Compute the true percent relative error
$\varepsilon_t$ for each approximation.
4.6 Use zero- through fourth-order Taylor series expansions to predict
$f(2.5)$ for $f(x) = \ln x$ using a base point at $x = 1$. Compute the
true percent relative error $\varepsilon_t$ for each approximation. Discuss the
meaning of the results.
4.7 Use forward and backward difference approximations of $O(h)$
and a centered difference approximation of $O(h^2)$ to estimate the
first derivative of the function examined in Prob. 4.5. Evaluate the
derivative at $x = 2$ using a step size of $h = 0.2$. Compare your results
with the true value of the derivative. Interpret your results on the
basis of the remainder term of the Taylor series expansion.
4.8 Use a centered difference approximation of $O(h^2)$ to estimate
the second derivative of the function examined in Prob. 4.5. Perform
the evaluation at $x = 2$ using step sizes of $h = 0.25$ and 0.125.
Compare your estimates with the true value of the second derivative.
Interpret your results on the basis of the remainder term of the
Taylor series expansion.
4.9 The Stefan-Boltzmann law can be employed to estimate the
rate of radiation of energy $H$ from a surface, as in
$$H = Ae\sigma T^4$$
where $H$ is in watts, $A$ = the surface area (m$^2$), $e$ = the emissivity
that characterizes the emitting properties of the surface (dimensionless),
$\sigma$ = a universal constant called the Stefan-Boltzmann constant
($= 5.67 \times 10^{-8}$ W m$^{-2}$ K$^{-4}$), and $T$ = absolute temperature
(K). Determine the error of $H$ for a steel plate with $A = 0.15$ m$^2$,
$e = 0.90$, and $T = 650 \pm 20$. Compare your results with the exact
error. Repeat the computation but with $T = 650 \pm 40$. Interpret
your results.
4.10 Repeat Prob. 4.9 but for a copper sphere with
radius $= 0.15 \pm 0.01$ m, $e = 0.90 \pm 0.05$, and $T = 550 \pm 20$.
4.11 Recall that the velocity of the falling parachutist can be computed
by [Eq. (1.10)],
$$v(t) = \frac{gm}{c}\left(1 - e^{-(c/m)t}\right)$$
Use a first-order error analysis to estimate the error of $v$ at $t = 6$, if
$g = 9.81$ and $m = 50$ but $c = 12.5 \pm 1.5$.
4.12 Repeat Prob. 4.11 with $g = 9.81$, $t = 6$, $c = 12.5 \pm 1.5$, and
$m = 50 \pm 2$.
4.13 Evaluate and interpret the condition numbers for
(a) $f(x) = \sqrt{|x - 1|} + 1$ for $x = 1.00001$
(b) $f(x) = e^{-x}$ for $x = 10$
(c) $f(x) = \sqrt{x^2 + 1} - x$ for $x = 300$
(d) $f(x) = \dfrac{e^{-x} - 1}{x}$ for $x = 0.001$
(e) $f(x) = \dfrac{\sin x}{1 + \cos x}$ for $x = 1.0001\pi$
4.14 Employing ideas from Sec. 4.2, derive the relationships from
Table 4.3.
4.15 Prove that Eq. (4.4) is exact for all values of $x$ if $f(x) = ax^2 + bx + c$.
4.16 Manning's formula for a rectangular channel can be written
as
$$Q = \frac{1}{n}\frac{(BH)^{5/3}}{(B + 2H)^{2/3}}\sqrt{S}$$
where $Q$ = flow (m$^3$/s), $n$ = a roughness coefficient, $B$ = width (m),
$H$ = depth (m), and $S$ = slope. You are applying this formula to a
stream where you know that the width = 20 m and the depth = 0.3 m.
Unfortunately, you know the roughness and the slope to only a $\pm 10\%$
precision. That is, you know that the roughness is about 0.03 with a
range from 0.027 to 0.033 and the slope is 0.0003 with a range from
0.00027 to 0.00033. Use a first-order error analysis to determine the
sensitivity of the flow prediction to each of these two factors. Which
one should you attempt to measure with more precision?
4.17 If $|x| < 1$, it is known that
$$\frac{1}{1 - x} = 1 + x + x^2 + x^3 + \cdots$$
Repeat Prob. 4.1 for this series for $x = 0.1$.
4.18 A missile leaves the ground with an initial velocity $v_0$ forming
an angle $\phi_0$ with the vertical as shown in Fig. P4.18. The maximum
desired altitude is $\alpha R$ where $R$ is the radius of the earth. The
laws of mechanics can be used to show that
$$\sin \phi_0 = (1 + \alpha)\sqrt{1 - \frac{\alpha}{1 + \alpha}\left(\frac{v_e}{v_0}\right)^2}$$
where $v_e$ = the escape velocity of the missile. It is desired to fire the
missile and reach the design maximum altitude within an accuracy of
$\pm 2\%$. Determine the range of values for $\phi_0$ if $v_e/v_0 = 2$ and $\alpha = 0.25$.
4.19 To calculate a planet's space coordinates, we have to solve the
function
$$f(x) = x - 1 - 0.5 \sin x$$
Let the base point be $a = x_i = \pi/2$ on the interval $[0, \pi]$. Determine
the highest-order Taylor series expansion resulting in a maximum
error of 0.015 on the specified interval. The error is equal to the
absolute value of the difference between the given function and the
specific Taylor series expansion. (Hint: Solve graphically.)
4.20 Consider the function $f(x) = x^3 - 2x + 4$ on the interval $[-2, 2]$
with $h = 0.25$. Use the forward, backward, and centered finite difference
approximations for the first and second derivatives so as to
graphically illustrate which approximation is most accurate. Graph all
three first derivative finite difference approximations along with the
theoretical, and do the same for the second derivative as well.
4.21 Derive Eq. (4.31).
4.22 Repeat Example 4.8, but for $f(x) = \cos(x)$ at $x = \pi/6$.
4.23 Repeat Example 4.8, but for the forward divided difference
(Eq. 4.17).
4.24 Develop a well-structured program to compute the Maclaurin
series expansion for the cosine function as described in Prob. 4.2.
The function should have the following features:
• Iterate until the relative error falls below a stopping criterion
(es) or exceeds a maximum number of iterations (maxit).
Allow the user to specify values for these parameters.
• Include default values of es (= 0.000001) and maxit (= 100)
in the event that they are not specified by the user.
• Return the estimate of cos(x), the approximate relative error, the
number of iterations, and the true relative error (that you can
calculate based on the built-in cosine function).
FIGURE P4.18
[Sketch for Prob. 4.18 labeling the earth's radius $R$, the initial velocity $v_0$, and the launch angle $\phi_0$.]
EPILOGUE: PART ONE
PT1.4 TRADE-OFFS
Numerical methods are scientific in the sense that they represent systematic techniques
for solving mathematical problems. However, there is a certain degree of art, subjective
judgment, and compromise associated with their effective use in engineering practice.
For each problem, you may be confronted with several alternative numerical methods
and many different types of computers. Thus, the elegance and efficiency of different
approaches to problems is highly individualistic and correlated with your ability to
choose wisely among options. Unfortunately, as with any intuitive process, the factors
influencing this choice are difficult to communicate. Only by experience can these skills
be fully comprehended and honed. However, because these skills play such a prominent
role in the effective implementation of the methods, we have included this section as an
introduction to some of the trade-offs that you must consider when selecting a numerical
method and the tools for implementing the method. It is hoped that the discussion that
follows will influence your orientation when approaching subsequent material. Also, it
is hoped that you will refer back to this material when you are confronted with choices
and trade-offs in the remainder of the book.
1. Type of Mathematical Problem. As delineated previously in Fig. PT1.2, several types
of mathematical problems are discussed in this book:
(a) Roots of equations.
(b) Systems of simultaneous linear algebraic equations.
(c) Optimization.
(d) Curve fitting.
(e) Numerical integration.
(f) Ordinary differential equations.
(g) Partial differential equations.
You will probably be introduced to the applied aspects of numerical methods by confront-
ing a problem in one of the above areas. Numerical methods will be required because
the problem cannot be solved efficiently using analytical techniques. You should be
cognizant of the fact that your professional activities will eventually involve problems in
all the above areas. Thus, the study of numerical methods and the selection of automatic
computation equipment should, at the minimum, consider these basic types of problems.
More advanced problems may require capabilities of handling areas such as functional
approximation, integral equations, etc. These areas typically demand greater computation
power or advanced methods not covered in this text. Other references such as Carnahan,
Luther, and Wilkes (1969); Hamming (1973); Ralston and Rabinowitz (1978); Burden
and Faires (2005); and Moler (2004) should be consulted for problems beyond the scope
of this book. In addition, at the end of each part of this text, we include a brief summary
and references for advanced methods to provide you with avenues for pursuing further
studies of numerical methods.
2. Type, Availability, Precision, Cost, and Speed of Computer. You may have the option
of working with a variety of computation tools. These range from pocket calculators
to large mainframe computers. Of course, any of the tools can be used to implement
any numerical method (including simple paper and pencil). It is usually not a question
of ultimate capability but rather of cost, convenience, speed, dependability, repeatability,
and precision. Although each of the tools will continue to have utility, the recent rapid
advances in the performance of personal computers have already had a major impact
on the engineering profession. We expect this revolution will spread as technological
improvements continue because personal computers offer an excellent compromise in
convenience, cost, precision, speed, and storage capacity. Furthermore, they can be
readily applied to most practical engineering problems.
3. Program Development Cost versus Software Cost versus Run-Time Cost. Once the
types of mathematical problems to be solved have been identified and the computer
system has been selected, it is appropriate to consider software and run-time costs.
Software development may represent a substantial effort in many engineering projects
and may therefore be a significant cost. In this regard, it is particularly important that
you be very well acquainted with the theoretical and practical aspects of the relevant
numerical methods. In addition, you should be familiar with professionally developed
software. Low-cost software is widely available to implement numerical methods that
may be readily adapted to a broad variety of problems.
4. Characteristics of the Numerical Method. When computer hardware and software
costs are high, or if computer availability is limited (for example, on some timeshare
systems), it pays to choose carefully the numerical method to suit the situation. On
the other hand, if the problem is still at the exploratory stage and computer access
and cost are not problems, it may be appropriate for you to select a numerical method
that always works but may not be the most computationally efficient. The numerical
methods available to solve any particular type of problem involve the types of trade-
offs just discussed and others:
(a) Number of Initial Guesses or Starting Points. Some of the numerical methods for
finding roots of equations or solving differential equations require the user to
specify initial guesses or starting points. Simple methods usually require one
value, whereas complicated methods may require more than one value. The
advantages of complicated methods that are computationally efficient may be
offset by the requirement for multiple starting points. You must use your experience
and judgment to assess the trade-offs for each particular problem.
(b) Rate of Convergence. Certain numerical methods converge more rapidly than
others. However, this rapid convergence may require more refined initial guesses
and more complex programming than a method with slower convergence. Again,
you must use your judgment in selecting a method. Faster is not always better.
(c) Stability. Some numerical methods for finding roots of equations or solutions for
systems of linear equations may diverge rather than converge on the correct answer
for certain problems. Why would you tolerate this possibility when confronted
with design or planning problems? The answer is that these methods may be
highly efficient when they work. Thus, trade-offs again emerge. You must decide
if your problem requirements justify the effort needed to apply a method that may
not always converge.
(d) Accuracy and Precision. Some numerical methods are simply more accurate or
precise than others. Good examples are the various equations available for
numerical integration. Usually, the performance of low-accuracy methods can be
improved by decreasing the step size or increasing the number of applications
over a given interval. Is it better to use a low-accuracy method with small step
sizes or a high-accuracy method with large step sizes? This question must be
addressed on a case-by-case basis taking into consideration the additional factors
such as cost and ease of programming. In addition, you must also be concerned
with round-off errors when you are using multiple applications of low-accuracy
methods and when the number of computations becomes large. Here the number
of significant figures handled by the computer may be the deciding factor.
(e) Breadth of Application. Some numerical methods can be applied to only a
limited class of problems or to problems that satisfy certain mathematical
restrictions. Other methods are not affected by such limitations. You must
evaluate whether it is worth your effort to develop programs that employ
techniques that are appropriate for only a limited number of problems. The
fact that such techniques may be widely used suggests that they have
advantages that will often outweigh their disadvantages. Obviously, trade-offs
are occurring.
(f) Special Requirements. Some numerical techniques attempt to increase accuracy
and rate of convergence using additional or special information. An example
would be to use estimated or theoretical values of errors to improve accuracy.
However, these improvements are generally not achieved without some
inconvenience in terms of added computing costs or increased program
complexity.
(g) Programming Effort Required. Efforts to improve rates of convergence, stability,
and accuracy can be creative and ingenious. When improvements can be made
without increasing the programming complexity, they may be considered elegant
and will probably find immediate use in the engineering profession. However, if
they require more complicated programs, you are again faced with a trade-off
situation that may or may not favor the new method.
It is clear that the above discussion concerning a choice of numerical methods
reduces to one of cost and accuracy. The costs are those involved with computer time
and program development. Appropriate accuracy is a question of professional judg-
ment and ethics.
5. Mathematical Behavior of the Function, Equation, or Data. In selecting a particular
numerical method, type of computer, and type of software, you must consider the
complexity of your functions, equations, or data. Simple equations and smooth data
may be appropriately handled by simple numerical algorithms and inexpensive
computers. The opposite is true for complicated equations and data exhibiting
discontinuities.
6. Ease of Application (User-Friendly?). Some numerical methods are easy to apply;
others are difficult. This may be a consideration when choosing one method over
another. This same idea applies to decisions regarding program development costs
versus professionally developed software. It may take considerable effort to convert
a difficult program to one that is user-friendly. Ways to do this were introduced in
Chap. 2 and are elaborated throughout the book.
7. Maintenance. Programs for solving engineering problems require maintenance because
during application, difficulties invariably occur. Maintenance may require changing
the program code or expanding the documentation. Simple programs and numerical
algorithms are simpler to maintain.
The chapters that follow involve the development of various types of numerical methods
for various types of mathematical problems. Several alternative methods will be given
in each chapter. These various methods (rather than a single method chosen by the au-
thors) are presented because there is no single “best” method. There is no best method
because there are many trade-offs that must be considered when applying the methods
to practical problems. A table that highlights the trade-offs involved in each method will
be found at the end of each part of the book. This table should assist you in selecting
the appropriate numerical procedure for your particular problem context.
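The accuracy question raised in item 4(d) can be made concrete with a short Python sketch. It is only an illustration under stated assumptions: the integrand (e^x on 0 to 1), the segment counts, and the function names `trapezoid` and `simpson` are arbitrary choices, and the two rules are standard numerical integration formulas used here as stand-ins for a "low-accuracy" and a "high-accuracy" method.

```python
import math

def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n equal segments (lower-order method)."""
    h = (b - a) / n
    interior = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * f(a) + interior + 0.5 * f(b))

def simpson(f, a, b, n):
    """Composite Simpson's 1/3 rule with n equal segments (n must be even)."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    odd = sum(f(a + i * h) for i in range(1, n, 2))
    even = sum(f(a + i * h) for i in range(2, n, 2))
    return h / 3.0 * (f(a) + 4.0 * odd + 2.0 * even + f(b))

# Assumed test problem: the integral of e^x from 0 to 1 is e - 1.
exact = math.e - 1.0
trap_err = abs(trapezoid(math.exp, 0.0, 1.0, 16) - exact)  # small steps, low order
simp_err = abs(simpson(math.exp, 0.0, 1.0, 4) - exact)     # large steps, high order

print(f"trapezoid, 16 segments: error = {trap_err:.2e}")
print(f"Simpson,    4 segments: error = {simp_err:.2e}")
```

For this smooth integrand the higher-order rule with four segments is more accurate than the lower-order rule with sixteen, at a fraction of the function evaluations. That outcome is problem dependent, which is why the choice must be made case by case.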
PT1.5 IMPORTANT RELATIONSHIPS AND FORMULAS
Table PT1.2 summarizes important information that was presented in Part One. The table
can be consulted to quickly access important relationships and formulas. The epilogue
of each part of the book will contain such a summary.
PT1.6 ADVANCED METHODS AND ADDITIONAL REFERENCES
The epilogue of each part of the book will also include a section designed to facilitate
and encourage further studies of numerical methods. This section will reference other
books on the subject as well as material related to more advanced methods.¹
To extend the background provided in Part One, numerous manuals on computer
programming are available. It would be difficult to reference all the excellent books and
manuals pertaining to specific languages and computers. In addition, you probably already
have material from your previous exposure to programming. However, if this is your first
experience with computers, your instructor and fellow students should also be able to
advise you regarding good reference books for the machines and languages available at
your school.
As for error analysis, any good introductory calculus book will include supplemen-
tary material related to subjects such as the Taylor series expansion. Texts by Swokowski
(1979), Thomas and Finney (1979), and Simmons (1985) provide very readable discus-
sions of these subjects. In addition, Taylor (1982) presents a nice introduction to error
analysis.
Finally, although we hope that our book serves you well, it is always good to consult
other sources when trying to master a new subject. Burden and Faires (2005); Ralston
and Rabinowitz (1978); Hoffman (1992); and Carnahan, Luther, and Wilkes (1969) provide
comprehensive discussions of most numerical methods. Other enjoyable books on
the subject are Gerald and Wheatley (2004), and Cheney and Kincaid (2008). In addition,
Press et al. (2007) include algorithms to implement a variety of methods, and Moler
(2004) and Chapra (2007) are devoted to numerical methods with MATLAB software.

¹ Books are referenced only by author here; a complete bibliography will be found at the back of this text.

TABLE PT1.2 Summary of important information presented in Part One.

Error Definitions

True error
$$E_t = \text{true value} - \text{approximation}$$

True percent relative error
$$\varepsilon_t = \frac{\text{true value} - \text{approximation}}{\text{true value}} \, 100\%$$

Approximate percent relative error
$$\varepsilon_a = \frac{\text{present approximation} - \text{previous approximation}}{\text{present approximation}} \, 100\%$$

Stopping criterion
Terminate computation when
$$\varepsilon_a < \varepsilon_s$$
where $\varepsilon_s$ is the desired percent relative error

Taylor Series

Taylor series expansion
$$f(x_{i+1}) = f(x_i) + f'(x_i)h + \frac{f''(x_i)}{2!}h^2 + \frac{f'''(x_i)}{3!}h^3 + \cdots + \frac{f^{(n)}(x_i)}{n!}h^n + R_n$$
where

Remainder
$$R_n = \frac{f^{(n+1)}(\xi)}{(n+1)!}h^{n+1}$$
or
$$R_n = O(h^{n+1})$$

Numerical Differentiation

First forward finite divided difference
$$f'(x_i) = \frac{f(x_{i+1}) - f(x_i)}{h} + O(h)$$
(Other divided differences are summarized in Chaps. 4 and 23.)

Error Propagation

For n independent variables $x_1, x_2, \ldots, x_n$ having errors $\Delta\tilde{x}_1, \Delta\tilde{x}_2, \ldots, \Delta\tilde{x}_n$, the error in the function
f can be estimated via
$$\Delta f = \left|\frac{\partial f}{\partial x_1}\right| \Delta\tilde{x}_1 + \left|\frac{\partial f}{\partial x_2}\right| \Delta\tilde{x}_2 + \cdots + \left|\frac{\partial f}{\partial x_n}\right| \Delta\tilde{x}_n$$
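As an informal illustration of two entries in Table PT1.2, the following Python sketch applies the stopping criterion to the Maclaurin series for e^x and checks the O(h) behavior of the first forward divided difference. The function names, the test value x = 0.5, and the tolerance of 0.05% are assumptions made for the example, and the absolute value taken in the relative error is a practical safeguard rather than part of the tabulated formula.

```python
import math

def exp_maclaurin(x, es):
    """Sum the Maclaurin series for e**x until ea < es (both in percent)."""
    total, term, n = 1.0, 1.0, 0
    ea = float("inf")
    while ea >= es:
        n += 1
        term *= x / n                      # next term, x**n / n!
        previous, total = total, total + term
        # approximate percent relative error (absolute value is an assumption)
        ea = abs((total - previous) / total) * 100.0
    return total, ea, n + 1                # n + 1 terms were summed

def forward_difference(f, x, h):
    """First forward finite divided difference, accurate to O(h)."""
    return (f(x + h) - f(x)) / h

approx, ea, terms = exp_maclaurin(0.5, 0.05)
et = abs((math.exp(0.5) - approx) / math.exp(0.5)) * 100.0   # true percent relative error
print(f"terms = {terms}, estimate = {approx:.9f}, ea = {ea:.4f}%, et = {et:.5f}%")

# Halving h should roughly halve the error of an O(h) formula (true derivative of e^x at 0 is 1).
err_h = abs(forward_difference(math.exp, 0.0, 0.1) - 1.0)
err_half = abs(forward_difference(math.exp, 0.0, 0.05) - 1.0)
print(f"error ratio when h is halved = {err_h / err_half:.2f}")
```

The run illustrates that the approximate error, which needs no knowledge of the true value, is a conservative stand-in for the true error in this case.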
PART TWO
ROOTS OF EQUATIONS
PT2.1 MOTIVATION
Years ago, you learned to use the quadratic formula
$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \tag{PT2.1}$$
to solve
$$f(x) = ax^2 + bx + c = 0 \tag{PT2.2}$$
The values calculated with Eq. (PT2.1) are called the “roots” of Eq. (PT2.2). They represent
the values of x that make Eq. (PT2.2) equal to zero. Thus, we can define the root
of an equation as the value of x that makes f(x) = 0. For this reason, roots are sometimes
called the zeros of the equation.
Although the quadratic formula is handy for solving Eq. (PT2.2), there are many other
functions for which the root cannot be determined so easily. For these cases, the numerical
methods described in Chaps. 5, 6, and 7 provide efficient means to obtain the answer.
PT2.1.1 Noncomputer Methods for Determining Roots
Before the advent of digital computers, there were several ways to solve for roots of
algebraic and transcendental equations. For some cases, the roots could be obtained by
direct methods, as was done with Eq. (PT2.1). Although there were equations like this
that could be solved directly, there were many more that could not. For example, even
an apparently simple function such as $f(x) = e^{-x} - x$ cannot be solved analytically. In
such instances, the only alternative is an approximate solution technique.
One method to obtain an approximate solution is to plot the function and determine
where it crosses the x axis. This point, which represents the x value for which f(x) = 0,
is the root. Graphical techniques are discussed at the beginning of Chaps. 5 and 6.
Although graphical methods are useful for obtaining rough estimates of roots, they
are limited because of their lack of precision. An alternative approach is to use trial and
error. This “technique” consists of guessing a value of x and evaluating whether f(x) is
zero. If not (as is almost always the case), another guess is made, and f(x) is again
evaluated to determine whether the new value provides a better estimate of the root. The
process is repeated until a guess is obtained that results in an f(x) that is close to zero.
Such haphazard methods are obviously inefficient and inadequate for the require-
ments of engineering practice. The techniques described in Part Two represent alterna-
tives that are also approximate but employ systematic strategies to home in on the true
root. As elaborated on in the following pages, the combination of these systematic meth-
ods and computers makes the solution of most applied roots-of-equations problems a
simple and efficient task.
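The trial-and-error "technique" described above can be mimicked in a few lines of Python for the function $f(x) = e^{-x} - x$ mentioned earlier. This is only a sketch: the list of guesses is an assumed, hand-picked sequence, which is precisely the haphazard element that the systematic methods of Part Two replace.

```python
import math

def f(x):
    """The example function f(x) = e^(-x) - x."""
    return math.exp(-x) - x

# Hand-picked guesses (an assumption for illustration); each new guess is chosen
# by looking at the sign and size of the previous results.
guesses = [0.0, 1.0, 0.5, 0.6, 0.57, 0.567]
for x in guesses:
    print(f"x = {x:<6} f(x) = {f(x): .6f}")
```

Each evaluation tells us only whether the guess was too low (f positive) or too high (f negative); nothing in the procedure says how to pick the next guess.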
PT2.1.2 Roots of Equations and Engineering Practice
Although they arise in other problem contexts, roots of equations frequently occur in the
area of engineering design. Table PT2.1 lists several fundamental principles that are
routinely used in design work. As introduced in Chap. 1, mathematical equations or
models derived from these principles are employed to predict dependent variables as a
function of independent variables, forcing functions, and parameters. Note that in each
case, the dependent variables reflect the state or performance of the system, whereas the
parameters represent its properties or composition.
An example of such a model is the equation, derived from Newton’s second law,
used in Chap. 1 for the parachutist’s velocity:
$$v = \frac{gm}{c}\left(1 - e^{-(c/m)t}\right) \tag{PT2.3}$$
where velocity v = the dependent variable, time t = the independent variable, the gravitational
constant g = the forcing function, and the drag coefficient c and mass m =
parameters. If the parameters are known, Eq. (PT2.3) can be used to predict the parachutist’s
velocity as a function of time. Such computations can be performed directly because
v is expressed explicitly as a function of time. That is, it is isolated on one side of the
equal sign.
TABLE PT2.1 Fundamental principles used in engineering design problems.

| Fundamental Principle | Dependent Variable | Independent Variable | Parameters |
|---|---|---|---|
| Heat balance | Temperature | Time and position | Thermal properties of material and geometry of system |
| Mass balance | Concentration or quantity of mass | Time and position | Chemical behavior of material, mass transfer coefficients, and geometry of system |
| Force balance | Magnitude and direction of forces | Time and position | Strength of material, structural properties, and geometry of system |
| Energy balance | Changes in the kinetic- and potential-energy states of the system | Time and position | Thermal properties, mass of material, and system geometry |
| Newton’s laws of motion | Acceleration, velocity, or location | Time and position | Mass of material, system geometry, and dissipative parameters such as friction or drag |
| Kirchhoff’s laws | Currents and voltages in electric circuits | Time | Electrical properties of systems such as resistance, capacitance, and inductance |
However, suppose we had to determine the drag coefficient for a parachutist of a
given mass to attain a prescribed velocity in a set time period. Although Eq. (PT2.3)
provides a mathematical representation of the interrelationship among the model vari-
ables and parameters, it cannot be solved explicitly for the drag coefficient. Try it. There
is no way to rearrange the equation so that c is isolated on one side of the equal sign.
In such cases, c is said to be implicit.
This represents a real dilemma, because many engineering design problems involve
specifying the properties or composition of a system (as represented by its parameters)
to ensure that it performs in a desired manner (as represented by its variables). Thus,
these problems often require the determination of implicit parameters.
The solution to the dilemma is provided by numerical methods for roots of equations.
To solve the problem using numerical methods, it is conventional to reexpress Eq. (PT2.3).
This is done by subtracting the dependent variable v from both sides of the equation to give
$$f(c) = \frac{gm}{c}\left(1 - e^{-(c/m)t}\right) - v \tag{PT2.4}$$
The value of c that makes f(c) = 0 is, therefore, the root of the equation. This value
also represents the drag coefficient that solves the design problem.
Part Two of this book deals with a variety of numerical and graphical methods for deter-
mining roots of relationships such as Eq. (PT2.4). These techniques can be applied to engi-
neering design problems that are based on the fundamental principles outlined in Table PT2.1
as well as to many other problems confronted routinely in engineering practice.
PT2.2 MATHEMATICAL BACKGROUND
For most of the subject areas in this book, there is usually some prerequisite mathematical
background needed to successfully master the topic. For example, the concepts of error
estimation and the Taylor series expansion discussed in Chaps. 3 and 4 have direct relevance
to our discussion of roots of equations. Additionally, prior to this point we have mentioned
the terms “algebraic” and “transcendental” equations. It might be helpful to formally define
these terms and discuss how they relate to the scope of this part of the book.
By definition, a function given by y = f(x) is algebraic if it can be expressed in the
form
$$f_n y^n + f_{n-1} y^{n-1} + \cdots + f_1 y + f_0 = 0 \tag{PT2.5}$$
where $f_i$ = an ith-order polynomial in x. Polynomials are a simple class of algebraic
functions that are represented generally by
$$f_n(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n \tag{PT2.6}$$
where n = the order of the polynomial and the a’s = constants. Some specific examples
are
$$f_2(x) = 1 - 2.37x + 7.5x^2 \tag{PT2.7}$$
and
$$f_6(x) = 5x^2 - x^3 + 7x^6 \tag{PT2.8}$$
A transcendental function is one that is nonalgebraic. These include trigonometric,
exponential, logarithmic, and other, less familiar, functions. Examples are
$$f(x) = \ln x^2 - 1 \tag{PT2.9}$$
and
$$f(x) = e^{-0.2x} \sin(3x - 0.5) \tag{PT2.10}$$
Roots of equations may be either real or complex. Although there are cases where com-
plex roots of nonpolynomials are of interest, such situations are less common than for
polynomials. As a consequence, the standard methods for locating roots typically fall
into two somewhat related but primarily distinct problem areas:
1. The determination of the real roots of algebraic and transcendental equations. These
techniques are usually designed to determine the value of a single real root on the
basis of foreknowledge of its approximate location.
2. The determination of all real and complex roots of polynomials. These methods are
specifically designed for polynomials. They systematically determine all the roots of
the polynomial rather than determining a single real root given an approximate location.
In this book we discuss both. Chapters 5 and 6 are devoted to the first category.
Chapter 7 deals with polynomials.
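That polynomial roots may be complex can be seen directly from the second-order example of Eq. (PT2.7). The Python sketch below applies the quadratic formula of Eq. (PT2.1) using complex arithmetic; the function name `quadratic_roots` is hypothetical and the code is offered only as an illustration, not as one of the methods developed in Chap. 7.

```python
import cmath

def quadratic_roots(a, b, c):
    """Roots of a*x**2 + b*x + c = 0 from the quadratic formula, Eq. (PT2.1)."""
    if a == 0:
        raise ValueError("a must be nonzero for a second-order polynomial")
    disc = cmath.sqrt(b * b - 4.0 * a * c)   # complex square root handles a negative discriminant
    return (-b + disc) / (2.0 * a), (-b - disc) / (2.0 * a)

# Eq. (PT2.7): f2(x) = 1 - 2.37x + 7.5x^2, so a = 7.5, b = -2.37, c = 1
r1, r2 = quadratic_roots(7.5, -2.37, 1.0)
print("roots:", r1, r2)
print("residual at first root:", abs(7.5 * r1**2 - 2.37 * r1 + 1.0))
```

Because the discriminant b² − 4ac is negative for these coefficients, the two roots form a complex conjugate pair, so a plot of f2(x) never crosses the x axis.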
PT2.3 ORIENTATION
Some orientation is helpful before proceeding to the numerical methods for determining
roots of equations. The following is intended to give you an overview of the material in
Part Two. In addition, some objectives have been included to help you focus your efforts
when studying the material.
PT2.3.1 Scope and Preview
Figure PT2.1 is a schematic representation of the organization of Part Two. Examine this
figure carefully, starting at the top and working clockwise.
After the present introduction, Chap. 5 is devoted to bracketing methods for finding
roots. These methods start with guesses that bracket, or contain, the root and then sys-
tematically reduce the width of the bracket. Two specific methods are covered: bisection
and false position. Graphical methods are used to provide visual insight into the tech-
niques. Error formulations are developed to help you determine how much computational
effort is required to estimate the root to a prespecified level of precision.
Chapter 6 covers open methods. These methods also involve systematic trial-and-
error iterations but do not require that the initial guesses bracket the root. We will dis-
cover that these methods are usually more computationally efficient than bracketing
methods but that they do not always work. One-point iteration, Newton-Raphson, and
secant methods are described. Graphical methods are used to provide geometric insight
into cases where the open methods do not work. Formulas are developed that provide
an idea of how fast open methods home in on the root. An advanced approach, Brent’s
method, that combines the reliability of bracketing with the speed of open methods is
described. In addition, an approach to extend the Newton-Raphson method to systems of
nonlinear equations is explained.
Chapter 7 is devoted to finding the roots of polynomials. After background sections
on polynomials, the use of conventional methods (in particular the open methods from
Chap. 6) is discussed. Then two special methods for locating polynomial roots are
described: Müller’s and Bairstow’s methods. The chapter ends with information related
to finding roots with Excel, MATLAB software, and Mathcad.

FIGURE PT2.1
Schematic of the organization of the material in Part Two: Roots of Equations.
[Diagram: PT2.1 Motivation, PT2.2 Mathematical background, PT2.3 Orientation; Chapter 5 Bracketing Methods (5.1 to 5.4); Chapter 6 Open Methods (6.1 to 6.6); Chapter 7 Roots of Polynomials (7.1 to 7.7); Chapter 8 Engineering Case Studies (8.1 to 8.4); Epilogue (PT2.4 Trade-offs, PT2.5 Important formulas, PT2.6 Advanced methods).]
Chapter 8 extends the above concepts to actual engineering problems. Engineering case
studies are used to illustrate the strengths and weaknesses of each method and to provide
insight into the application of the techniques in professional practice. The applications also
highlight the trade-offs (as discussed in Part One) associated with the various methods.
An epilogue is included at the end of Part Two. It contains a detailed comparison
of the methods discussed in Chaps. 5, 6, and 7. This comparison includes a description
of trade-offs related to the proper use of each technique. This section also provides a
summary of important formulas, along with references for some numerical methods that
are beyond the scope of this text.
PT2.3.2 Goals and Objectives
Study Objectives. After completing Part Two, you should have sufficient information
to successfully approach a wide variety of engineering problems dealing with roots of
equations. In general, you should have mastered the techniques, have learned to assess
their reliability, and be capable of choosing the best method (or methods) for any par-
ticular problem. In addition to these general goals, the specific concepts in Table PT2.2
should be assimilated for a comprehensive understanding of the material in Part Two.
Computer Objectives. The book provides you with software and simple computer algo-
rithms to implement the techniques discussed in Part Two. All have utility as learning tools.
Pseudocodes for several methods are also supplied directly in the text. This information
will allow you to expand your software library beyond the bisection method. For
example, you may want to have your own software for the false-position,
Newton-Raphson, and secant techniques, which are often more efficient than the
bisection method.
Finally, packages such as Excel, MATLAB, and Mathcad have powerful capabilities for
locating roots. You can use this part of the book to become familiar with these capabilities.
TABLE PT2.2 Specific study objectives for Part Two.
1. Understand the graphical interpretation of a root
2. Know the graphical interpretation of the false-position method and why it is usually superior to the
bisection method
3. Understand the difference between bracketing and open methods for root location
4. Understand the concepts of convergence and divergence; use the two-curve graphical method to
provide a visual manifestation of the concepts
5. Know why bracketing methods always converge, whereas open methods may sometimes diverge
6. Realize that convergence of open methods is more likely if the initial guess is close to the true root
7. Understand the concepts of linear and quadratic convergence and their implications for the
efficiencies of the fixed-point-iteration and Newton-Raphson methods
8. Know the fundamental difference between the false-position and secant methods and how it relates
to convergence
9. Understand how Brent’s method combines the reliability of bisection with the speed of open methods
10. Understand the problems posed by multiple roots and the modifications available to mitigate them
11. Know how to extend the single-equation Newton-Raphson approach to solve systems of nonlinear
equations
CHAPTER 5
Bracketing Methods
This chapter on roots of equations deals with methods that exploit the fact that a function
typically changes sign in the vicinity of a root. These techniques are called bracketing
methods because two initial guesses for the root are required. As the name implies, these
guesses must “bracket,” or be on either side of, the root. The particular methods described
herein employ different strategies to systematically reduce the width of the bracket and,
hence, home in on the correct answer.
As a prelude to these techniques, we will briefly discuss graphical methods for
depicting functions and their roots. Beyond their utility for providing rough guesses,
graphical techniques are also useful for visualizing the properties of the functions and
the behavior of the various numerical methods.
5.1 GRAPHICAL METHODS
A simple method for obtaining an estimate of the root of the equation f(x) = 0 is to
make a plot of the function and observe where it crosses the x axis. This point, which
represents the x value for which f(x) = 0, provides a rough approximation of the root.
EXAMPLE 5.1 The Graphical Approach
Problem Statement. Use the graphical approach to determine the drag coefficient c
needed for a parachutist of mass m = 68.1 kg to have a velocity of 40 m/s after free-falling
for time t = 10 s. Note: The acceleration due to gravity is 9.81 m/s².
Solution. This problem can be solved by determining the root of Eq. (PT2.4) using the
parameters t = 10, g = 9.81, v = 40, and m = 68.1:
$$f(c) = \frac{9.81(68.1)}{c}\left(1 - e^{-(c/68.1)10}\right) - 40$$
or
$$f(c) = \frac{668.06}{c}\left(1 - e^{-0.146843c}\right) - 40 \tag{E5.1.1}$$
Various values of c can be substituted into the right-hand side of this equation to compute

| c | f(c) |
|---|---|
| 4 | 34.190 |
| 8 | 17.712 |
| 12 | 6.114 |
| 16 | −2.230 |
| 20 | −8.368 |

These points are plotted in Fig. 5.1. The resulting curve crosses the c axis between 12 and
16. Visual inspection of the plot provides a rough estimate of the root of 14.75. The validity
of the graphical estimate can be checked by substituting it into Eq. (E5.1.1) to yield
$$f(14.75) = \frac{668.06}{14.75}\left(1 - e^{-0.146843(14.75)}\right) - 40 = 0.100$$
which is close to zero. It can also be checked by substituting it into Eq. (PT2.3) along
with the parameter values from this example to give
$$v = \frac{9.81(68.1)}{14.75}\left(1 - e^{-(14.75/68.1)10}\right) = 40.100$$
which is very close to the desired fall velocity of 40 m/s.

FIGURE 5.1
The graphical approach for determining the roots of an equation.
[Plot of f(c) versus c; the curve crosses the c axis at the root, between c = 12 and c = 16.]
Graphical techniques are of limited practical value because they are not precise. However,
graphical methods can be utilized to obtain rough estimates of roots. These estimates can be
employed as starting guesses for numerical methods discussed in this and the next chapter.
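The tabulation in Example 5.1 is easy to reproduce by machine. The following Python sketch evaluates Eq. (PT2.4) with the parameter values of the example; the function name and default arguments are assumptions made for illustration, and because the code carries full precision rather than the rounded constants 668.06 and 0.146843, the last printed digit may differ slightly from the hand calculation.

```python
import math

def f(c, m=68.1, t=10.0, v=40.0, g=9.81):
    """Eq. (PT2.4): residual whose root is the drag coefficient c."""
    return g * m / c * (1.0 - math.exp(-(c / m) * t)) - v

# Tabulate f(c) at the values used in Example 5.1.
for c in (4, 8, 12, 16, 20):
    print(f"{c:>4} {f(c):>8.3f}")

# Check the graphical estimate of the root.
print(f"f(14.75) = {f(14.75):.3f}")
```

The sign change between c = 12 and c = 16 is what locates the root, and it supplies the pair of starting guesses required by the bracketing methods that follow.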
Aside from providing rough estimates of the root, graphical interpretations are im-
portant tools for understanding the properties of the functions and anticipating the pitfalls
of the numerical methods. For example, Fig. 5.2 shows a number of ways in which roots
can occur (or be absent) in an interval prescribed by a lower bound $x_l$ and an upper
bound $x_u$. Figure 5.2b depicts the case where a single root is bracketed by negative and
positive values of f(x). However, Fig. 5.2d, where $f(x_l)$ and $f(x_u)$ are also on opposite
sides of the x axis, shows three roots occurring within the interval. In general, if $f(x_l)$
and $f(x_u)$ have opposite signs, there are an odd number of roots in the interval. As indicated
by Fig. 5.2a and c, if $f(x_l)$ and $f(x_u)$ have the same sign, there are either no roots
or an even number of roots between the values.
Although these generalizations are usually true, there are cases where they do not
hold. For example, functions that are tangential to the x axis (Fig. 5.3a) and discontinu-
ous functions (Fig. 5.3b) can violate these principles. An example of a function that is
tangential to the axis is the cubic equation f(x) = (x − 2)(x − 2)(x − 4). Notice that
x = 2 makes two terms in this polynomial equal to zero. Mathematically, x = 2 is called
a multiple root. At the end of Chap. 6, we will present techniques that are expressly
designed to locate multiple roots.
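A quick numerical check shows how the cubic above defeats the sign rules. The sketch below is only illustrative; the helper name `opposite_signs` and the two trial intervals are assumptions chosen to expose the behavior.

```python
def f(x):
    """The cubic with a double root at x = 2 and a single root at x = 4."""
    return (x - 2.0) * (x - 2.0) * (x - 4.0)

def opposite_signs(f, xl, xu):
    """True when f changes sign between xl and xu (the bracketing test)."""
    return f(xl) * f(xu) < 0

# Interval [1, 3] contains the root x = 2, yet the end values share a sign,
# because the curve only touches the axis there.
print(f(1.0), f(3.0), opposite_signs(f, 1.0, 3.0))

# Interval [1, 5] has end values of opposite sign, yet the curve meets
# the axis at two locations (x = 2 and x = 4), an even number.
print(f(1.0), f(5.0), opposite_signs(f, 1.0, 5.0))
```

A sign test alone would therefore miss the root at x = 2 entirely if the interval did not also contain x = 4.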
The existence of cases of the type depicted in Fig. 5.3 makes it difficult to develop
general computer algorithms guaranteed to locate all the roots in an interval. However,
when used in conjunction with graphical approaches, the methods described in the
following sections are extremely useful for solving many roots of equations problems
confronted routinely by engineers and applied mathematicians.

FIGURE 5.2
Illustration of a number of general ways that a root may occur in an interval prescribed
by a lower bound $x_l$ and an upper bound $x_u$. Parts (a) and (c) indicate that if both $f(x_l)$ and
$f(x_u)$ have the same sign, either there will be no roots or there will be an even number of roots
within the interval. Parts (b) and (d) indicate that if the function has different signs at the end
points, there will be an odd number of roots in the interval.
[Four panels, (a) through (d), each plotting f(x) versus x between $x_l$ and $x_u$.]

FIGURE 5.3
Illustration of some exceptions to the general cases depicted in Fig. 5.2. (a) Multiple root
that occurs when the function is tangential to the x axis. For this case, although the end points
are of opposite signs, there are an even number of axis intersections for the interval.
(b) Discontinuous function where end points of opposite sign bracket an even number of roots.
Special strategies are required for determining the roots for these cases.
[Two panels, (a) and (b), each plotting f(x) versus x between $x_l$ and $x_u$.]
EXAMPLE 5.2 Use of Computer Graphics to Locate Roots
Problem Statement. Computer graphics can expedite and improve your efforts to locate
roots of equations. The function
f(x) 5 sin10x 1 cos3x
has several roots over the range x 5 0 to x 5 5. Use computer graphics to gain insight
into the behavior of this function.
Solution. Packages such as Excel and MATLAB software can be used to generate plots.
Figure 5.4a is a plot of $f(x)$ from $x = 0$ to $x = 5$. This plot suggests the presence of
several roots, including a possible double root at about $x = 4.2$ where $f(x)$ appears to be
tangent to the x axis. A more detailed picture of the behavior of $f(x)$ is obtained by
changing the plotting range from $x = 3$ to $x = 5$, as shown in Fig. 5.4b. Finally, in Fig. 5.4c,
the vertical scale is narrowed further to $f(x) = -0.15$ to $f(x) = 0.15$ and the horizontal scale
is narrowed to $x = 4.2$ to $x = 4.3$. This plot shows clearly that a double root does not exist
in this region and that in fact there are two distinct roots at about $x = 4.23$ and $x = 4.26$.
Computer graphics will have great utility in your studies of numerical methods. This
capability will also find many applications in your other classes and professional
activities.
FIGURE 5.4
The progressive enlargement of $f(x) = \sin 10x + \cos 3x$ by the computer. Such interactive graphics
permits the analyst to determine that two distinct roots exist between $x = 4.2$ and $x = 4.3$.
Panels: (a) X from 0 to 5, Y from -2 to 2; (b) X from 3 to 5, Y from -2 to 2; (c) X from 4.2 to 4.3,
Y from -0.15 to 0.15.
FIGURE 5.5
Step 1: Choose lower $x_l$ and upper $x_u$ guesses for the root such that the function changes sign
over the interval. This can be checked by ensuring that $f(x_l) f(x_u) < 0$.
Step 2: An estimate of the root $x_r$ is determined by
$$x_r = \frac{x_l + x_u}{2}$$
Step 3: Make the following evaluations to determine in which subinterval the root lies:
(a) If $f(x_l) f(x_r) < 0$, the root lies in the lower subinterval. Therefore, set $x_u = x_r$ and return
to step 2.
(b) If $f(x_l) f(x_r) > 0$, the root lies in the upper subinterval. Therefore, set $x_l = x_r$ and return
to step 2.
(c) If $f(x_l) f(x_r) = 0$, the root equals $x_r$; terminate the computation.
5.2 THE BISECTION METHOD
When applying the graphical technique in Example 5.1, you observed (Fig. 5.1)
that $f(x)$ changed sign on opposite sides of the root. In general, if $f(x)$ is real and
continuous in the interval from $x_l$ to $x_u$ and $f(x_l)$ and $f(x_u)$ have opposite signs, that is,
$$f(x_l)\, f(x_u) < 0 \tag{5.1}$$
then there is at least one real root between $x_l$ and $x_u$.
Incremental search methods capitalize on this observation by locating an interval
where the function changes sign. Then the location of the sign change (and consequently,
the root) is identified more precisely by dividing the interval into a number of subinter-
vals. Each of these subintervals is searched to locate the sign change. The process is
repeated and the root estimate refined by dividing the subintervals into finer increments.
We will return to the general topic of incremental searches in Sec. 5.4.
The bisection method, which is alternatively called binary chopping, interval halving,
or Bolzano’s method, is one type of incremental search method in which the interval is
always divided in half. If a function changes sign over an interval, the function value at
the midpoint is evaluated. The location of the root is then determined as lying at the
midpoint of the subinterval within which the sign change occurs. The process is repeated
to obtain refined estimates. A simple algorithm for the bisection calculation is listed in
Fig. 5.5, and a graphical depiction of the method is provided in Fig. 5.6. The following
example goes through the actual computations involved in the method.
EXAMPLE 5.3 Bisection
Problem Statement. Use bisection to solve the same problem approached graphically
in Example 5.1.
Solution. The first step in bisection is to guess two values of the unknown (in the
present problem, c) that give values for f(c) with different signs. From Fig. 5.1, we can
see that the function changes sign between values of 12 and 16. Therefore, the initial
estimate of the root $x_r$ lies at the midpoint of the interval
$$x_r = \frac{12 + 16}{2} = 14$$
This estimate represents a true percent relative error of $\varepsilon_t = 5.4\%$ (note that the true
value of the root is 14.8011). Next we compute the product of the function value at the
lower bound and at the midpoint:
$$f(12)\, f(14) = 6.114\,(1.611) = 9.850$$
which is greater than zero, and hence no sign change occurs between the lower bound
and the midpoint. Consequently, the root must be located between 14 and 16. Therefore,
we create a new interval by redefining the lower bound as 14 and determining a revised
root estimate as
$$x_r = \frac{14 + 16}{2} = 15$$
which represents a true percent error of $\varepsilon_t = 1.3\%$. The process can be repeated to obtain
refined estimates. For example,
$$f(14)\, f(15) = 1.611\,(-0.384) = -0.619$$
Therefore, the root is between 14 and 15. The upper bound is redefined as 15, and the
root estimate for the third iteration is calculated as
$$x_r = \frac{14 + 15}{2} = 14.5$$
which represents a true percent relative error of $\varepsilon_t = 2.0\%$. The method can be repeated until
the result is accurate enough to satisfy your needs.
FIGURE 5.6
A graphical depiction of the bisection method. This plot conforms to the first three
iterations from Example 5.3.
In the previous example, you may have noticed that the true error does not decrease
with each iteration. However, the interval within which the root is located is halved with
each step in the process. As discussed in the next section, the interval width provides an
exact estimate of the upper bound of the error for the bisection method.
5.2.1 Termination Criteria and Error Estimates
We ended Example 5.3 with the statement that the method could be continued to obtain
a refined estimate of the root. We must now develop an objective criterion for deciding
when to terminate the method.
An initial suggestion might be to end the calculation when the true error falls
below some prespecified level. For instance, in Example 5.3, the relative error dropped
to 2.0 percent during the course of the computation. We might decide that we should
terminate when the error drops below, say, 0.1 percent. This strategy is flawed because
the error estimates in the example were based on knowledge of the true root of the
function. This would not be the case in an actual situation because there would be no
point in using the method if we already knew the root.
Therefore, we require an error estimate that is not contingent on foreknowledge of
the root. As developed previously in Sec. 3.3, an approximate percent relative error $\varepsilon_a$
can be calculated, as in [recall Eq. (3.5)]
$$\varepsilon_a = \left| \frac{x_r^{\text{new}} - x_r^{\text{old}}}{x_r^{\text{new}}} \right| 100\% \tag{5.2}$$
where $x_r^{\text{new}}$ is the root for the present iteration and $x_r^{\text{old}}$ is the root from the previous
iteration. The absolute value is used because we are usually concerned with the magnitude
of $\varepsilon_a$ rather than with its sign. When $\varepsilon_a$ becomes less than a prespecified stopping
criterion $\varepsilon_s$, the computation is terminated.
EXAMPLE 5.4 Error Estimates for Bisection
Problem Statement. Continue Example 5.3 until the approximate error falls below a
stopping criterion of $\varepsilon_s = 0.5\%$. Use Eq. (5.2) to compute the errors.
Solution. The results of the first two iterations for Example 5.3 were 14 and 15.
Substituting these values into Eq. (5.2) yields
$$|\varepsilon_a| = \left| \frac{15 - 14}{15} \right| 100\% = 6.667\%$$
Recall that the true percent relative error for the root estimate of 15 was 1.3%. Therefore,
$\varepsilon_a$ is greater than $\varepsilon_t$. This behavior is manifested for the other iterations:
Iteration   x_l      x_u      x_r       ε_a (%)   ε_t (%)
1           12       16       14                  5.413
2           14       16       15        6.667     1.344
3           14       15       14.5      3.448     2.035
4           14.5     15       14.75     1.695     0.345
5           14.75    15       14.875    0.840     0.499
6           14.75    14.875   14.8125   0.422     0.077
Thus, after six iterations $\varepsilon_a$ finally falls below $\varepsilon_s = 0.5\%$, and the computation can
be terminated.
These results are summarized in Fig. 5.7. The “ragged” nature of the true error is due
to the fact that, for bisection, the true root can lie anywhere within the bracketing interval.
The true and approximate errors are far apart when the interval happens to be centered on
the true root. They are close when the true root falls at either end of the interval.
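The iterations tabulated in Example 5.4 can be reproduced with a few lines of code. The following is only a sketch: the function `f` is an assumed form of the Example 5.1 equation, which is not restated in this section (a parachutist with m = 68.1 kg, t = 10 s, v = 40 m/s, and g = 9.81 m/s^2), and the name `bisection_table` and its arguments are hypothetical.

```python
import math


def f(c):
    # Assumed Example 5.1 function (not restated in this section):
    # velocity residual for m = 68.1 kg, t = 10 s, v = 40 m/s, g = 9.81 m/s^2.
    return 9.81 * 68.1 / c * (1.0 - math.exp(-(c / 68.1) * 10.0)) - 40.0


def bisection_table(f, xl, xu, es, true_root, max_iter=50):
    """Return rows (iter, xl, xu, xr, ea, et); stop once ea < es (percent)."""
    rows = []
    xr_old = None
    for it in range(1, max_iter + 1):
        xr = (xl + xu) / 2.0
        # Eq. (5.2); no estimate is possible on the first iteration.
        ea = None if xr_old is None else abs((xr - xr_old) / xr) * 100.0
        et = abs((true_root - xr) / true_root) * 100.0
        rows.append((it, xl, xu, xr, ea, et))
        if ea is not None and ea < es:
            break
        test = f(xl) * f(xr)
        if test < 0:
            xu = xr          # root lies in the lower subinterval
        elif test > 0:
            xl = xr          # root lies in the upper subinterval
        else:
            break            # xr is an exact root
        xr_old = xr
    return rows


# True root 14.8011 is the value quoted in Example 5.3.
rows = bisection_table(f, 12.0, 16.0, es=0.5, true_root=14.8011)
for it, xl, xu, xr, ea, et in rows:
    ea_txt = "" if ea is None else f"{ea:.3f}"
    print(f"{it:2d} {xl:8.4f} {xu:8.4f} {xr:8.4f} {ea_txt:>7} {et:7.3f}")
```

Because the true error is computed from the rounded root 14.8011, the last digit of the true-error column may differ slightly from the table.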
FIGURE 5.7
Errors for the bisection method. True and estimated errors are plotted versus the number of
iterations. Axes: percent relative error (logarithmic, 0.1 to 10) versus iterations (0 to 6);
curves: True, Approximate.
Although the approximate error does not provide an exact estimate of the true error,
Fig. 5.7 suggests that $\varepsilon_a$ captures the general downward trend of $\varepsilon_t$. In addition, the plot
exhibits the extremely attractive characteristic that $\varepsilon_a$ is always greater than $\varepsilon_t$. Thus,
when $\varepsilon_a$ falls below $\varepsilon_s$, the computation could be terminated with confidence that the
root is known to be at least as accurate as the prespecified acceptable level.
Although it is always dangerous to draw general conclusions from a single example,
it can be demonstrated that $\varepsilon_a$ will always be greater than $\varepsilon_t$ for the bisection method. This
is because each time an approximate root is located using bisection as $x_r = (x_l + x_u)/2$,
we know that the true root lies somewhere within an interval of $(x_u - x_l)/2 = \Delta x/2$.
Therefore, the root must lie within $\pm\Delta x/2$ of our estimate (Fig. 5.8). For instance, when
Example 5.3 was terminated, we could make the definitive statement that
$$x_r = 14.5 \pm 0.5$$
Because $\Delta x/2 = x_r^{\text{new}} - x_r^{\text{old}}$ (Fig. 5.9), Eq. (5.2) provides an exact upper bound on
the true error. For this bound to be exceeded, the true root would have to fall outside
the bracketing interval, which, by definition, could never occur for the bisection method.
As illustrated in a subsequent example (Example 5.6), other root-locating techniques do
not always behave as nicely. Although bisection is generally slower than other methods,
the neatness of its error analysis is certainly a positive aspect that could make it
attractive for certain engineering applications.
FIGURE 5.8
Three ways in which the interval may bracket the root. In (a) the true value lies at the
center of the interval, whereas in (b) and (c) the true value lies near the extreme. Notice
that the discrepancy between the true value and the midpoint of the interval never exceeds
half the interval length, or $\Delta x/2$.
FIGURE 5.9
Graphical depiction of why the error estimate for bisection ($\Delta x/2$) is equivalent to the
root estimate for the present iteration ($x_r^{\text{new}}$) minus the root estimate for the previous
iteration ($x_r^{\text{old}}$).
Before proceeding to the computer program for bisection, we should note that the
relationships (Fig. 5.9)
$$x_r^{\text{new}} - x_r^{\text{old}} = \frac{x_u - x_l}{2}$$
and
$$x_r^{\text{new}} = \frac{x_l + x_u}{2}$$
can be substituted into Eq. (5.2) to develop an alternative formulation for the approximate
percent relative error
$$\varepsilon_a = \left| \frac{x_u - x_l}{x_u + x_l} \right| 100\% \tag{5.3}$$
This equation yields identical results to Eq. (5.2) for bisection. In addition, it allows us to
calculate an error estimate on the basis of our initial guesses, that is, on our first iteration.
For instance, on the first iteration of Example 5.3, an approximate error can be computed as
$$\varepsilon_a = \left| \frac{16 - 12}{16 + 12} \right| 100\% = 14.29\%$$
Another benefit of the bisection method is that the number of iterations required to
attain an absolute error can be computed a priori, that is, before starting the iterations.
This can be seen by recognizing that before starting the technique, the absolute error is
$$E_a^0 = x_u^0 - x_l^0 = \Delta x^0$$
where the superscript designates the iteration. Hence, before starting the method, we are
at the “zero iteration.” After the first iteration, the error becomes
$$E_a^1 = \frac{\Delta x^0}{2}$$
Because each succeeding iteration halves the error, a general formula relating the error
and the number of iterations n is
$$E_a^n = \frac{\Delta x^0}{2^n} \tag{5.4}$$
If $E_{a,d}$ is the desired error, this equation can be solved for
$$n = \frac{\log(\Delta x^0 / E_{a,d})}{\log 2} = \log_2\left(\frac{\Delta x^0}{E_{a,d}}\right) \tag{5.5}$$
Let us test the formula. For Example 5.4, the initial interval was $\Delta x^0 = 16 - 12 = 4$.
After six iterations, the absolute error was
$$E_a = \frac{|14.875 - 14.75|}{2} = 0.0625$$
We can substitute these values into Eq. (5.5) to give
$$n = \frac{\log(4/0.0625)}{\log 2} = 6$$
Thus, if we knew beforehand that an error of less than 0.0625 was acceptable, the
formula tells us that six iterations would yield the desired result.
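Equation (5.5) translates directly into code. In this minimal sketch the function name `bisection_iterations` is hypothetical, and rounding up to a whole number of iterations is an added assumption for cases where the logarithm is not an integer.

```python
import math


def bisection_iterations(dx0, ead):
    """Whole number of bisection iterations n so that dx0 / 2**n <= ead (Eq. 5.5)."""
    # Round up because a fractional iteration cannot be performed (assumption).
    return max(0, math.ceil(math.log2(dx0 / ead)))


# Values from the check of Example 5.4: initial interval 4, desired error 0.0625.
n = bisection_iterations(4.0, 0.0625)
print(n)
```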
Although we have emphasized the use of relative errors for obvious reasons, there will
be cases where (usually through knowledge of the problem context) you will be able to
specify an absolute error. For these cases, bisection along with Eq. (5.5) can provide a useful
root-location algorithm. We will explore such applications in the end-of-chapter problems.
5.2.2 Bisection Algorithm
The algorithm in Fig. 5.5 can now be expanded to include the error check (Fig. 5.10).
The algorithm employs user-defined functions to make root location and function evalu-
ation more efficient. In addition, an upper limit is placed on the number of iterations.
Finally, an error check is included to avoid division by zero during the error evaluation.
Such would be the case when the bracketing interval is centered on zero. For this situ-
ation, Eq. (5.2) becomes infinite. If this occurs, the program skips over the error evalu-
ation for that iteration.
The algorithm in Fig. 5.10 is not user-friendly; it is designed strictly to come up
with the answer. In Prob. 5.14 at the end of this chapter, you will have the task of mak-
ing it easier to use and understand.
FUNCTION Bisect(xl, xu, es, imax, xr, iter, ea)
  iter = 0
  DO
    xrold = xr
    xr = (xl + xu) / 2
    iter = iter + 1
    IF xr ≠ 0 THEN
      ea = ABS((xr - xrold) / xr) * 100
    END IF
    test = f(xl) * f(xr)
    IF test < 0 THEN
      xu = xr
    ELSE IF test > 0 THEN
      xl = xr
    ELSE
      ea = 0
    END IF
    IF ea < es OR iter ≥ imax EXIT
  END DO
  Bisect = xr
END Bisect
FIGURE 5.10
Pseudocode for function to implement bisection.
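For readers who want to run the algorithm, the pseudocode of Fig. 5.10 might be translated into Python roughly as follows. This is a sketch, not the book's program: passing `f` as an argument, returning a tuple, and starting `xr` at `xl` (the pseudocode leaves the incoming `xr` to the caller) are all assumptions. The test function $f(x) = x^{10} - 1$ on the interval from 0 to 1.3 is borrowed from Example 5.6.

```python
import math


def bisect(f, xl, xu, es=0.01, imax=100):
    """Bisection after Fig. 5.10. Returns (xr, iterations, ea), ea in percent."""
    xr = xl            # assumption: starting value used for the first xr_old
    ea = math.inf      # no error estimate is available yet
    it = 0
    while True:
        xr_old = xr
        xr = (xl + xu) / 2.0
        it += 1
        if xr != 0:    # guard against division by zero in Eq. (5.2)
            ea = abs((xr - xr_old) / xr) * 100.0
        test = f(xl) * f(xr)
        if test < 0:
            xu = xr
        elif test > 0:
            xl = xr
        else:
            ea = 0.0   # xr is an exact root
        if ea < es or it >= imax:
            break
    return xr, it, ea


root, iterations, ea = bisect(lambda x: x**10 - 1, 0.0, 1.3, es=0.01)
print(root, iterations, ea)
```

With `xr` started at `xl`, the first-iteration estimate equals the value given by Eq. (5.3), since $|x_r - x_l|/x_r = (x_u - x_l)/(x_u + x_l)$.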
5.2.3 Minimizing Function Evaluations
The bisection algorithm in Fig. 5.10 is just fine if you are performing a single root
evaluation for a function that is easy to evaluate. However, there are many instances
in engineering when this is not the case. For example, suppose that you develop a
computer program that must locate a root numerous times. In such cases you could
call the algorithm from Fig. 5.10 thousands and even millions of times in the course
of a single run.
Further, in its most general sense, a univariate function is merely an entity that re-
turns a single value in return for a single value you send to it. Perceived in this sense,
functions are not always simple formulas like the one-line equations solved in the pre-
ceding examples in this chapter. For example, a function might consist of many lines of
code that could take a significant amount of execution time to evaluate. In some cases,
the function might even represent an independent computer program.
Because of both these factors, it is imperative that numerical algorithms minimize
function evaluations. In this light, the algorithm from Fig. 5.10 is deficient. In particular,
notice that in making two function evaluations per iteration, it recalculates one of the
functions that was determined on the previous iteration.
Figure 5.11 provides a modified algorithm that does not have this deficiency. The lines
that differ from Fig. 5.10 are fl = f(xl), fr = f(xr), test = fl * fr, and fl = fr. In this
case, only the new function value at the root estimate is calculated. Previously calculated
values are saved and merely reassigned as the bracket shrinks. Thus, n + 1 function
evaluations are performed, rather than 2n.
FUNCTION Bisect(xl, xu, es, imax, xr, iter, ea)
  iter = 0
  fl = f(xl)
  DO
    xrold = xr
    xr = (xl + xu) / 2
    fr = f(xr)
    iter = iter + 1
    IF xr ≠ 0 THEN
      ea = ABS((xr - xrold) / xr) * 100
    END IF
    test = fl * fr
    IF test < 0 THEN
      xu = xr
    ELSE IF test > 0 THEN
      xl = xr
      fl = fr
    ELSE
      ea = 0
    END IF
    IF ea < es OR iter ≥ imax EXIT
  END DO
  Bisect = xr
END Bisect
FIGURE 5.11
Pseudocode for bisection subprogram which minimizes function evaluations.
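The n + 1 count can be checked numerically. The sketch below translates Fig. 5.11 into Python and wraps the test function in a call counter; `CountingFunction` is a hypothetical helper introduced only for this illustration, and starting `xr` at `xl` is again an assumption.

```python
def bisect_min_evals(f, xl, xu, es=0.01, imax=100):
    """Bisection after Fig. 5.11: one new function evaluation per iteration."""
    xr = xl                 # assumption: starting value for the first xr_old
    ea = float("inf")
    it = 0
    fl = f(xl)              # evaluated once, then carried along
    while True:
        xr_old = xr
        xr = (xl + xu) / 2.0
        fr = f(xr)          # the only new evaluation in this iteration
        it += 1
        if xr != 0:
            ea = abs((xr - xr_old) / xr) * 100.0
        test = fl * fr
        if test < 0:
            xu = xr
        elif test > 0:
            xl = xr
            fl = fr         # reuse the value just computed
        else:
            ea = 0.0
        if ea < es or it >= imax:
            break
    return xr, it, ea


class CountingFunction:
    """Hypothetical helper that counts calls to the wrapped function."""

    def __init__(self, func):
        self.func = func
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return self.func(x)


g = CountingFunction(lambda x: x**10 - 1)   # function from Example 5.6
root, iterations, ea = bisect_min_evals(g, 0.0, 1.3, es=0.01)
print(iterations, g.calls)
```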
5.3 THE FALSE-POSITION METHOD
Although bisection is a perfectly valid technique for determining roots, its “brute-force”
approach is relatively inefficient. False position is an alternative based on a graphical insight.
A shortcoming of the bisection method is that, in dividing the interval from $x_l$ to $x_u$
into equal halves, no account is taken of the magnitudes of $f(x_l)$ and $f(x_u)$. For example,
if $f(x_l)$ is much closer to zero than $f(x_u)$, it is likely that the root is closer to $x_l$ than to
$x_u$ (Fig. 5.12). An alternative method that exploits this graphical insight is to join $f(x_l)$
and $f(x_u)$ by a straight line. The intersection of this line with the x axis represents an
improved estimate of the root. The fact that the replacement of the curve by a straight
line gives a “false position” of the root is the origin of the name, method of false position,
or in Latin, regula falsi. It is also called the linear interpolation method.
Using similar triangles (Fig. 5.12), the intersection of the straight line with the
x axis can be estimated as
$$\frac{f(x_l)}{x_r - x_l} = \frac{f(x_u)}{x_r - x_u} \tag{5.6}$$
which can be solved for (see Box 5.1 for details).
$$x_r = x_u - \frac{f(x_u)(x_l - x_u)}{f(x_l) - f(x_u)} \tag{5.7}$$
FIGURE 5.12
A graphical depiction of the method of false position. Similar triangles used to derive the
formula for the method are shaded.
This is the false-position formula. The value of $x_r$ computed with Eq. (5.7) then replaces
whichever of the two initial guesses, $x_l$ or $x_u$, yields a function value with the same sign
as $f(x_r)$. In this way, the values of $x_l$ and $x_u$ always bracket the true root. The process is
repeated until the root is estimated adequately. The algorithm is identical to the one for
bisection (Fig. 5.5) with the exception that Eq. (5.7) is used for step 2. In addition, the
same stopping criterion [Eq. (5.2)] is used to terminate the computation.
Box 5.1 Derivation of the Method of False Position
Cross-multiply Eq. (5.6) to yield
$$f(x_l)(x_r - x_u) = f(x_u)(x_r - x_l)$$
Collect terms and rearrange:
$$x_r [f(x_l) - f(x_u)] = x_u f(x_l) - x_l f(x_u)$$
Divide by $f(x_l) - f(x_u)$:
$$x_r = \frac{x_u f(x_l) - x_l f(x_u)}{f(x_l) - f(x_u)} \tag{B5.1.1}$$
This is one form of the method of false position. Note that it allows the computation of
the root $x_r$ as a function of the lower and upper guesses $x_l$ and $x_u$. It can be put in an
alternative form by expanding it:
$$x_r = \frac{x_u f(x_l)}{f(x_l) - f(x_u)} - \frac{x_l f(x_u)}{f(x_l) - f(x_u)}$$
then adding and subtracting $x_u$ on the right-hand side:
$$x_r = x_u + \frac{x_u f(x_l)}{f(x_l) - f(x_u)} - x_u - \frac{x_l f(x_u)}{f(x_l) - f(x_u)}$$
Collecting terms yields
$$x_r = x_u + \frac{x_u f(x_u)}{f(x_l) - f(x_u)} - \frac{x_l f(x_u)}{f(x_l) - f(x_u)}$$
or
$$x_r = x_u - \frac{f(x_u)(x_l - x_u)}{f(x_l) - f(x_u)}$$
which is the same as Eq. (5.7). We use this form because it involves one less function
evaluation and one less multiplication than Eq. (B5.1.1). In addition, it is directly
comparable with the secant method, which will be discussed in Chap. 6.
EXAMPLE 5.5 False Position
Problem Statement. Use the false-position method to determine the root of the same
equation investigated in Example 5.1 [Eq. (E5.1.1)].
Solution. As in Example 5.3, initiate the computation with guesses of $x_l = 12$ and
$x_u = 16$.
First iteration:
$x_l = 12$    $f(x_l) = 6.1139$
$x_u = 16$    $f(x_u) = -2.2303$
$$x_r = 16 - \frac{-2.2303(12 - 16)}{6.1139 - (-2.2303)} = 14.9309$$
which has a true relative error of 0.88 percent.
Second iteration:
$$f(x_l)\, f(x_r) = -1.5376$$
Therefore, the root lies in the first subinterval, and $x_r$ becomes the upper limit for the
next iteration, $x_u = 14.9309$:
$x_l = 12$         $f(x_l) = 6.1139$
$x_u = 14.9309$    $f(x_u) = -0.2515$
$$x_r = 14.9309 - \frac{-0.2515(12 - 14.9309)}{6.1139 - (-0.2515)} = 14.8151$$
which has true and approximate relative errors of 0.09 and 0.78 percent. Additional
iterations can be performed to refine the estimate of the root.
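The two iterations of Example 5.5 can be checked with a short script. As a sketch under stated assumptions: `f` is an assumed form of the Example 5.1 equation, which is not restated in this section (m = 68.1 kg, t = 10 s, v = 40 m/s, g = 9.81 m/s^2), and `false_position_estimates` is a hypothetical name.

```python
import math


def f(c):
    # Assumed Example 5.1 function (not restated in this section):
    # velocity residual for m = 68.1 kg, t = 10 s, v = 40 m/s, g = 9.81 m/s^2.
    return 9.81 * 68.1 / c * (1.0 - math.exp(-(c / 68.1) * 10.0)) - 40.0


def false_position_estimates(f, xl, xu, n_iter):
    """Return up to n_iter successive root estimates computed with Eq. (5.7)."""
    fl, fu = f(xl), f(xu)
    estimates = []
    for _ in range(n_iter):
        xr = xu - fu * (xl - xu) / (fl - fu)   # Eq. (5.7)
        fr = f(xr)
        estimates.append(xr)
        test = fl * fr
        if test < 0:
            xu, fu = xr, fr    # xr replaces the upper guess
        elif test > 0:
            xl, fl = xr, fr    # xr replaces the lower guess
        else:
            break              # xr is an exact root
    return estimates


est = false_position_estimates(f, 12.0, 16.0, 2)
print([round(x, 4) for x in est])
```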
FIGURE 5.13
Comparison of the relative errors of the bisection and the false-position methods.
Axes: true percent relative error (logarithmic, 10^-4 to 10) versus iterations (0 to 6);
curves: Bisection, False position.
A feeling for the relative efficiency of the bisection and false-position methods can
be appreciated by referring to Fig. 5.13, where we have plotted the true percent relative
errors for Examples 5.4 and 5.5. Note how the error for false position decreases much
faster than for bisection because of the more efficient scheme for root location in the
false-position method.
Recall in the bisection method that the interval between $x_l$ and $x_u$ grew smaller during
the course of a computation. The interval, as defined by $\Delta x/2 = |x_u - x_l|/2$ for the first
iteration, therefore provided a measure of the error for this approach. This is not the case
for the method of false position because one of the initial guesses may stay fixed
throughout the computation as the other guess converges on the root. For instance, in
Example 5.5 the lower guess $x_l$ remained at 12 while $x_u$ converged on the root. For such
cases, the interval does not shrink but rather approaches a constant value.
Example 5.5 suggests that Eq. (5.2) represents a very conservative error criterion.
In fact, Eq. (5.2) actually constitutes an approximation of the discrepancy of the previous
iteration. This is because for a case such as Example 5.5, where the method is converging
quickly (for example, the error is being reduced nearly an order of magnitude per
iteration), the root for the present iteration $x_r^{\text{new}}$ is a much better estimate of the true value
than the result of the previous iteration $x_r^{\text{old}}$. Thus, the quantity in the numerator of Eq. (5.2)
actually represents the discrepancy of the previous iteration. Consequently, we are assured
that satisfaction of Eq. (5.2) ensures that the root will be known with greater accuracy
than the prescribed tolerance. However, as described in the next section, there are cases
where false position converges slowly. For these cases, Eq. (5.2) becomes unreliable, and
an alternative stopping criterion must be developed.
5.3.1 Pitfalls of the False-Position Method
Although the false-position method would seem to always be the bracketing method of
preference, there are cases where it performs poorly. In fact, as in the following example,
there are certain cases where bisection yields superior results.
EXAMPLE 5.6 A Case Where Bisection Is Preferable to False Position
Problem Statement. Use bisection and false position to locate the root of
$$f(x) = x^{10} - 1$$
between $x = 0$ and 1.3.
Solution. Using bisection, the results can be summarized as
Iteration   x_l       x_u       x_r        ε_a (%)   ε_t (%)
1           0         1.3       0.65       100.0     35
2           0.65      1.3       0.975      33.3      2.5
3           0.975     1.3       1.1375     14.3      13.8
4           0.975     1.1375    1.05625    7.7       5.6
5           0.975     1.05625   1.015625   4.0       1.6
Thus, after five iterations, the true error is reduced to less than 2 percent. For false
position, a very different outcome is obtained:
Iteration   x_l       x_u       x_r        ε_a (%)   ε_t (%)
1           0         1.3       0.09430              90.6
2           0.09430   1.3       0.18176    48.1      81.8
3           0.18176   1.3       0.26287    30.9      73.7
4           0.26287   1.3       0.33811    22.3      66.2
5           0.33811   1.3       0.40788    17.1      59.2
After five iterations, the true error has only been reduced to about 59 percent. In
addition, note that $\varepsilon_a < \varepsilon_t$. Thus, the approximate error is misleading. Insight into these
results can be gained by examining a plot of the function. As in Fig. 5.14, the curve
violates the premise upon which false position was based, that is, if $f(x_l)$ is much closer
to zero than $f(x_u)$, then the root is closer to $x_l$ than to $x_u$ (recall Fig. 5.12). Because of
the shape of the present function, the opposite is true.
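The two tables of Example 5.6 can be regenerated side by side with a single routine that switches between the midpoint and Eq. (5.7). The sketch below is illustrative only; the name `bracket_estimates` and its `method` argument are hypothetical.

```python
def f(x):
    return x**10 - 1.0          # function from Example 5.6; true root is x = 1


def bracket_estimates(f, xl, xu, n_iter, method):
    """Root estimates from n_iter iterations of 'bisection' or 'false_position'."""
    fl, fu = f(xl), f(xu)
    estimates = []
    for _ in range(n_iter):
        if method == "bisection":
            xr = (xl + xu) / 2.0
        else:
            xr = xu - fu * (xl - xu) / (fl - fu)   # Eq. (5.7)
        fr = f(xr)
        estimates.append(xr)
        if fl * fr < 0:
            xu, fu = xr, fr
        elif fl * fr > 0:
            xl, fl = xr, fr
        else:
            break                                  # exact root found
    return estimates


for method in ("bisection", "false_position"):
    xs = bracket_estimates(f, 0.0, 1.3, 5, method)
    true_errors = [abs(1.0 - x) * 100.0 for x in xs]
    print(method)
    for x, et in zip(xs, true_errors):
        print(f"  xr = {x:.5f}   true error = {et:.1f}%")
```

In the false-position run the upper guess never moves from 1.3, which is the one-sidedness the example is meant to expose.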
FIGURE 5.14
Plot of $f(x) = x^{10} - 1$, illustrating slow convergence of the false-position method.
The foregoing example illustrates that blanket generalizations regarding root-location
methods are usually not possible. Although a method such as false position is often
superior to bisection, there are invariably cases that violate this general conclusion. Therefore,
in addition to using Eq. (5.2), the results should always be checked by substituting the root
estimate into the original equation and determining whether the result is close to zero. Such
a check should be incorporated into all computer programs for root location.
The example also illustrates a major weakness of the false-position method: its
one-sidedness. That is, as iterations are proceeding, one of the bracketing points will tend to
stay fixed. This can lead to poor convergence, particularly for functions with significant
curvature. The following section provides a remedy.
5.3.2 Modified False Position
One way to mitigate the “one-sided” nature of false position is to have the algorithm
detect when one of the bounds is stuck. If this occurs, the function value at the stagnant
bound can be divided in half. This is called the modified false-position method.
The algorithm in Fig. 5.15 implements this strategy. Notice how counters are used
to determine when one of the bounds stays fixed for two iterations. If this occurs, the
function value at this stagnant bound is halved.
The effectiveness of this algorithm can be demonstrated by applying it to Example 5.6.
If a stopping criterion of 0.01% is used, the bisection and standard false-position
methods would converge in 14 and 39 iterations, respectively. In contrast, the modified
false-position method would converge in 12 iterations. Thus, for this example, it is
somewhat more efficient than bisection and is vastly superior to the unmodified
false-position method.
FUNCTION ModFalsePos(xl, xu, es, imax, xr, iter, ea)
  iter = 0
  fl = f(xl)
  fu = f(xu)
  DO
    xrold = xr
    xr = xu - fu * (xl - xu) / (fl - fu)
    fr = f(xr)
    iter = iter + 1
    IF xr <> 0 THEN
      ea = ABS((xr - xrold) / xr) * 100
    END IF
    test = fl * fr
    IF test < 0 THEN
      xu = xr
      fu = f(xu)
      iu = 0
      il = il + 1
      IF il ≥ 2 THEN fl = fl / 2
    ELSE IF test > 0 THEN
      xl = xr
      fl = f(xl)
      il = 0
      iu = iu + 1
      IF iu ≥ 2 THEN fu = fu / 2
    ELSE
      ea = 0
    END IF
    IF ea < es OR iter ≥ imax THEN EXIT
  END DO
  ModFalsePos = xr
END ModFalsePos
FIGURE 5.15
Pseudocode for the modified false-position method.
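A Python rendering of Fig. 5.15 makes it easy to compare the standard and modified schemes on the function from Example 5.6. This is a sketch with several assumptions that are not in the figure: the `modified` flag (used to switch the halving off), the counters `il` and `iu` starting at zero, `xr` starting at `xl`, and the reuse of `fr` where the figure calls `f` again.

```python
def false_position(f, xl, xu, es=0.01, imax=100, modified=True):
    """False position after Fig. 5.15; modified=False skips the halving step."""
    xr = xl                  # assumption: starting value for the first xr_old
    ea = float("inf")
    it = 0
    il = iu = 0              # assumption: stagnation counters start at zero
    fl, fu = f(xl), f(xu)
    while True:
        xr_old = xr
        xr = xu - fu * (xl - xu) / (fl - fu)
        fr = f(xr)
        it += 1
        if xr != 0:
            ea = abs((xr - xr_old) / xr) * 100.0
        test = fl * fr
        if test < 0:
            xu, fu = xr, fr  # reuse fr instead of re-evaluating f(xu)
            iu = 0
            il += 1
            if modified and il >= 2:
                fl /= 2.0    # lower bound is stuck: halve its function value
        elif test > 0:
            xl, fl = xr, fr
            il = 0
            iu += 1
            if modified and iu >= 2:
                fu /= 2.0    # upper bound is stuck: halve its function value
        else:
            ea = 0.0
        if ea < es or it >= imax:
            break
    return xr, it, ea


def f(x):
    return x**10 - 1.0       # function from Example 5.6


root_std, it_std, _ = false_position(f, 0.0, 1.3, es=0.01, modified=False)
root_mod, it_mod, _ = false_position(f, 0.0, 1.3, es=0.01, modified=True)
print("standard:", it_std, "iterations; modified:", it_mod, "iterations")
```

Halving a stored function value changes only its magnitude, not its sign, so the sign test that keeps the root bracketed is unaffected.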
5.4 INCREMENTAL SEARCHES AND DETERMINING INITIAL GUESSES
Besides checking an individual answer, you must determine whether all possible roots
have been located. As mentioned previously, a plot of the function is usually very useful
in guiding you in this task. Another option is to incorporate an incremental search at the
beginning of the computer program. This consists of starting at one end of the region of
interest and then making function evaluations at small increments across the region.
When the function changes sign, it is assumed that a root falls within the increment. The
x values at the beginning and the end of the increment can then serve as the initial guesses
for one of the bracketing techniques described in this chapter.
A potential problem with an incremental search is the choice of the increment length.
If the length is too small, the search can be very time consuming. On the other hand, if
the length is too great, there is a possibility that closely spaced roots might be missed
(Fig. 5.16). The problem is compounded by the possible existence of multiple roots. A
partial remedy for such cases is to compute the first derivative of the function f'(x) at
the beginning and the end of each interval. If the derivative changes sign, it suggests that
a minimum or maximum may have occurred and that the interval should be examined
more closely for the existence of a possible root.
Although such modifications or the employment of a very fine increment can allevi-
ate the problem, it should be clear that brute-force methods such as incremental search
are not foolproof. You would be wise to supplement such automatic techniques with any
other information that provides insight into the location of the roots. Such information
can be found in plotting and in understanding the physical problem from which the
equation originated.
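An incremental search of the kind described in this section might be sketched as follows. The function name `incremental_search` and its arguments are hypothetical, and the test function is the one from Example 5.2, restricted to the range between 4.2 and 4.3 where two closely spaced roots occur.

```python
import math


def incremental_search(f, xmin, xmax, n_steps):
    """Return (x_lo, x_hi) for each increment across which f changes sign."""
    dx = (xmax - xmin) / n_steps
    brackets = []
    x_lo, f_lo = xmin, f(xmin)
    for i in range(1, n_steps + 1):
        x_hi = xmin + i * dx
        f_hi = f(x_hi)
        if f_lo * f_hi < 0:            # sign change: a root is assumed inside
            brackets.append((x_lo, x_hi))
        x_lo, f_lo = x_hi, f_hi        # reuse the value at the shared end point
    return brackets


def f(x):
    return math.sin(10 * x) + math.cos(3 * x)   # function from Example 5.2


coarse = incremental_search(f, 4.2, 4.3, 1)     # one increment of length 0.1
fine = incremental_search(f, 4.2, 4.3, 10)      # ten increments of length 0.01
print(len(coarse), len(fine))
print(fine)
```

The coarse search sees the same sign at both ends of the increment and misses both roots, while the finer increment isolates each one. Either bracket returned by the fine search can serve as the initial guesses for bisection or false position.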
FIGURE 5.16
Cases where roots could be missed because the increment length of the search procedure
is too large. Note that the last root on the right is multiple and would be missed
regardless of increment length. The x axis is marked at the search points $x_0$ through $x_6$.
PROBLEMS
5.1 Determine the real roots of f(x) 5 20.5x2
1 2.5x 1 4.5:
(a) Graphically.
(b) Using the quadratic formula.
(c) Using three iterations of the bisection method to determine the
highest root. Employ initial guesses of xl 5 5 and xu 5 10.
Compute the estimated error ea and the true error et after each
iteration.
5.2 Determine the real root of f(x) 5 5x3
2 5x2
1 6x 2 2:
(a) Graphically.
(b) Using bisection to locate the root. Employ initial guesses of
xl 5 0 and xu 5 1 and iterate until the estimated error ea falls
below a level of es 5 10%.
5.3 Determine the real root of f(x) 5 225 1 82x 2 90x2
1
44x3
2 8x4
1 0.7x5
:
(a) Graphically.
(b) Using bisection to determine the root to es 5 10%. Employ
initial guesses of xl 5 0.5 and xu 5 1.0.
(c) Perform the same computation as in (b) but use the false-
position method and es 5 0.2%.
5.4 (a) Determine the roots of f(x) 5 212 2 21x 1 18x2
2
2.75x3
graphically. In addition, determine the first root of the function
with (b) bisection, and (c) false position. For (b) and (c) use initial
guesses of xl 5 21 and xu 5 0, and a stopping criterion of 1%.
5.5 Locate the first nontrivial root of sin x 5 x2
where x is in radi-
ans. Use a graphical technique and bisection with the initial interval
from 0.5 to 1. Perform the computation until ea is less than es 5 2%.
Also perform an error check by substituting your final answer into
the original equation.
5.6 Determine the positive real root of ln (x2
) 5 0.7 (a) graphi-
cally, (b) using three iterations of the bisection method, with initial
guesses of xl 5 0.5 and xu 5 2, and (c) using three iterations of the
false-position method, with the same initial guesses as in (b).
5.7 Determine the real root of f(x) 5 (0.8 2 0.3x)yx:
(a) Analytically.
(b) Graphically.
(c) Using three iterations of the false-position method and initial
guesses of 1 and 3. Compute the approximate error ea and
the true error et after each iteration. Is there a problem with
the result?
5.8 Find the positive square root of 18 using the false-position
method to within es 5 0.5%. Employ initial guesses of xl 5 4 and
xu 5 5.
5.9 Find the smallest positive root of the function (x is in radians)
x2
Zcos 1xZ 5 5 using the false-position method. To locate the re-
gion in which the root lies, first plot this function for values of x
between 0 and 5. Perform the computation until ea falls below
es 5 1%. Check your final answer by substituting it into the orig-
inal function.
5.10 Find the positive real root of f(x) 5 x4
2 8x3
2 35x2
1
450x 2 1001 using the false-position method. Use initial guesses
of xl 5 4.5 and xu 5 6 and perform five iterations. Compute both
the true and approximate errors based on the fact that the root is
5.60979. Use a plot to explain your results and perform the compu-
tation to within es 5 1.0%.
5.11 Determine the real root of x3.5
5 80: (a) analytically and
(b) with the false-position method to within es 5 2.5%. Use initial
guesses of 2.0 and 5.0.
5.12 Given
f(x) 5 22x6
2 1.5x4
1 10x 1 2
Use bisection to determine the maximum of this function. Employ
initial guesses of xl 5 0 and xu 5 1, and perform iterations until
the approximate relative error falls below 5%.
5.13 The velocity y of a falling parachutist is given by
y 5
gm
c
(1 2 e2(cym)t
)
where g 5 9.81mys2
. For a parachutist with a drag coefficient
c 5 15 kg/s, compute the mass m so that the velocity is y 5 36 m/s
at t 5 10 s. Use the false-position method to determine m to a level
of es 5 0.1%.
5.14 Use bisection to determine the drag coefficient needed so that
an 82-kg parachutist has a velocity of 36 m/s after 4 s of free fall.
Note: The acceleration of gravity is 9.81 m/s2
. Start with initial
guesses of xl 5 3 and xu 5 5 and iterate until the approximate
relative error falls below 2%. Also perform an error check by sub-
stituting your final answer into the original equation.
5.15 As depicted in Fig. P5.15, the velocity of water, $v$ (m/s),
discharged from a cylindrical tank through a long pipe can be
computed as
$$v = \sqrt{2gH} \tanh\left(\frac{\sqrt{2gH}}{2L}\, t\right)$$
FIGURE P5.15 (labels: $H$, $L$, $v$)
your answer. Determine the approximate relative error after each
iteration. Employ initial guesses of 0 and R.
5.18 The saturation concentration of dissolved oxygen in freshwater
can be calculated with the equation (APHA, 1992)
$$\ln o_{sf} = -139.34411 + \frac{1.575701 \times 10^5}{T_a} - \frac{6.642308 \times 10^7}{T_a^2} + \frac{1.243800 \times 10^{10}}{T_a^3} - \frac{8.621949 \times 10^{11}}{T_a^4}$$
where $o_{sf}$ = the saturation concentration of dissolved oxygen in
freshwater at 1 atm (mg/L) and $T_a$ = absolute temperature (K).
Remember that $T_a = T + 273.15$, where $T$ = temperature (°C).
According to this equation, saturation decreases with increasing
temperature. For typical natural waters in temperate climates, the
equation can be used to determine that oxygen concentration ranges
from 14.621 mg/L at 0°C to 6.413 mg/L at 40°C. Given a value of
oxygen concentration, this formula and the bisection method can be
used to solve for temperature in °C.
(a) If the initial guesses are set as 0 and 40°C, how many bisection
iterations would be required to determine temperature to an
absolute error of 0.05°C?
(b) Develop and test a bisection program to determine $T$ as a function
of a given oxygen concentration to a prespecified absolute
error as in (a). Given initial guesses of 0 and 40°C, test your
program for an absolute error = 0.05°C and the following
cases: $o_{sf}$ = 8, 10, and 12 mg/L. Check your results.
5.19 According to Archimedes' principle, the buoyancy force is equal
to the weight of fluid displaced by the submerged portion of an
object. For the sphere depicted in Fig. P5.19, use bisection to determine
the height $h$ of the portion that is above water. Employ the following
values for your computation: $r = 1$ m, $\rho_s$ = density of sphere =
200 kg/m$^3$, and $\rho_w$ = density of water = 1000 kg/m$^3$. Note that the
volume of the above-water portion of the sphere can be computed with
$$V = \frac{\pi h^2}{3}(3r - h)$$
FIGURE P5.19 (labels: $h$, $r$)
where $g = 9.81$ m/s$^2$, $H$ = initial head (m), $L$ = pipe length (m),
and $t$ = elapsed time (s). Determine the head needed to achieve
$v = 5$ m/s in 2.5 s for a 4-m-long pipe (a) graphically, (b) by
bisection, and (c) with false position. Employ initial guesses of
$x_l = 0$ and $x_u = 2$ m with a stopping criterion of $\varepsilon_s = 1\%$. Check
your results.
5.16 Water is flowing in a trapezoidal channel at a rate of $Q = 20$ m$^3$/s.
The critical depth $y$ for such a channel must satisfy the equation
$$0 = 1 - \frac{Q^2}{gA_c^3} B$$
where $g = 9.81$ m/s$^2$, $A_c$ = the cross-sectional area (m$^2$), and $B$ =
the width of the channel at the surface (m). For this case, the width
and the cross-sectional area can be related to depth $y$ by
$$B = 3 + y \quad \text{and} \quad A_c = 3y + \frac{y^2}{2}$$
Solve for the critical depth using (a) the graphical method, (b) bisection,
and (c) false position. For (b) and (c) use initial guesses of
$x_l = 0.5$ and $x_u = 2.5$, and iterate until the approximate error falls
below 1% or the number of iterations exceeds 10. Discuss your results.
5.17 You are designing a spherical tank (Fig. P5.17) to hold water
for a small village in a developing country. The volume of liquid it
can hold can be computed as
$$V = \pi h^2 \frac{[3R - h]}{3}$$
where $V$ = volume (m$^3$), $h$ = depth of water in tank (m), and $R$ =
the tank radius (m).
FIGURE P5.17 (labels: $h$, $V$, $R$)
If $R = 3$ m, to what depth must the tank be filled so that it holds
30 m$^3$? Use three iterations of the false-position method to determine
(c) Add an answer check that substitutes the root estimate into the
original function to verify whether the final result is close to
zero.
(d) Test the subprogram by duplicating the computations from
Examples 5.3 and 5.4.
5.22 Develop a subprogram for the bisection method that mini-
mizes function evaluations based on the pseudocode from Fig. 5.11.
Determine the number of function evaluations (n) per total itera-
tions. Test the program by duplicating Example 5.6.
5.23 Develop a user-friendly program for the false-position
method. The structure of your program should be similar to the
bisection algorithm outlined in Fig. 5.10. Test the program by
duplicating Example 5.5.
5.24 Develop a subprogram for the false-position method that min-
imizes function evaluations in a fashion similar to Fig. 5.11. Deter-
mine the number of function evaluations (n) per total iterations.
Test the program by duplicating Example 5.6.
5.25 Develop a user-friendly subprogram for the modified false-
position method based on Fig. 5.15. Test the program by deter-
mining the root of the function described in Example 5.6.
Perform a number of runs until the true percent relative error
falls below 0.01%. Plot the true and approximate percent relative
errors versus number of iterations on semilog paper. Interpret
your results.
5.26 Develop a function for bisection in a similar fashion to Fig. 5.10.
However, rather than using the maximum iterations and Eq. (5.2),
employ Eq. (5.5) as your stopping criterion. Make sure to round the
result of Eq. (5.5) up to the next highest integer. Test your function by
solving Example 5.3 using $E_{a,d} = 0.0001$.
5.20 Perform the same computation as in Prob. 5.19, but for the
frustum of a cone, as depicted in Fig. P5.20. Employ the following
values for your computation: $r_1 = 0.5$ m, $r_2 = 1$ m, $h = 1$ m, $\rho_f$ =
frustum density = 200 kg/m$^3$, and $\rho_w$ = water density = 1000 kg/m$^3$.
Note that the volume of a frustum is given by
$$V = \frac{\pi h}{3}\left(r_1^2 + r_2^2 + r_1 r_2\right)$$
FIGURE P5.20 (labels: $h$, $h_1$, $r_2$, $r_1$)
5.21 Integrate the algorithm outlined in Fig. 5.10 into a complete,
user-friendly bisection subprogram. Among other things:
(a) Place documentation statements throughout the subprogram to
identify what each section is intended to accomplish.
(b) Label the input and output.
CHAPTER 6
Open Methods
For the bracketing methods in Chap. 5, the root is located within an interval prescribed
by a lower and an upper bound. Repeated application of these methods always results
in closer estimates of the true value of the root. Such methods are said to be convergent
because they move closer to the truth as the computation progresses (Fig. 6.1a).
In contrast, the open methods described in this chapter are based on formulas
that require only a single starting value of $x$ or two starting values that do not
necessarily bracket the root. As such, they sometimes diverge or move away from
the true root as the computation progresses (Fig. 6.1b). However, when the open
methods converge (Fig. 6.1c), they usually do so much more quickly than the bracketing
methods. We will begin our discussion of open techniques with a simple version
that is useful for illustrating their general form and also for demonstrating the concept
of convergence.
FIGURE 6.1
Graphical depiction of the fundamental difference between the (a) bracketing and (b) and
(c) open methods for root location. In (a), which is the bisection method, the root is
constrained within the interval prescribed by $x_l$ and $x_u$. In contrast, for the open method
depicted in (b) and (c), a formula is used to project from $x_i$ to $x_{i+1}$ in an iterative fashion.
Thus, the method can either (b) diverge or (c) converge rapidly, depending on the value of the
initial guess.
6.1 SIMPLE FIXED-POINT ITERATION
As mentioned above, open methods employ a formula to predict the root. Such a formula
can be developed for simple fixed-point iteration (or, as it is also called, one-point iteration
or successive substitution) by rearranging the function $f(x) = 0$ so that $x$ is on
the left-hand side of the equation:
$$x = g(x) \tag{6.1}$$
This transformation can be accomplished either by algebraic manipulation or by simply
adding $x$ to both sides of the original equation. For example,
$$x^2 - 2x + 3 = 0$$
can be simply manipulated to yield
$$x = \frac{x^2 + 3}{2}$$
whereas $\sin x = 0$ could be put into the form of Eq. (6.1) by adding $x$ to both sides
to yield
$$x = \sin x + x$$
The utility of Eq. (6.1) is that it provides a formula to predict a new value of $x$ as
a function of an old value of $x$. Thus, given an initial guess at the root $x_i$, Eq. (6.1) can
be used to compute a new estimate $x_{i+1}$ as expressed by the iterative formula
$$x_{i+1} = g(x_i) \tag{6.2}$$
As with other iterative formulas in this book, the approximate error for this equation can
be determined using the error estimator [Eq. (3.5)]:
$$\varepsilon_a = \left| \frac{x_{i+1} - x_i}{x_{i+1}} \right| 100\%$$
EXAMPLE 6.1 Simple Fixed-Point Iteration
Problem Statement. Use simple fixed-point iteration to locate the root of $f(x) = e^{-x} - x$.
Solution. The function can be separated directly and expressed in the form of Eq. (6.2) as
$$x_{i+1} = e^{-x_i}$$
Starting with an initial guess of $x_0 = 0$, this iterative equation can be applied to compute
 i    $x_i$       $\varepsilon_a$ (%)   $\varepsilon_t$ (%)
 0    0                        100.0
 1    1.000000    100.0        76.3
 2    0.367879    171.8        35.1
 3    0.692201    46.9         22.1
 4    0.500473    38.3         11.8
 5    0.606244    17.4         6.89
 6    0.545396    11.2         3.83
 7    0.579612    5.90         2.20
 8    0.560115    3.48         1.24
 9    0.571143    1.93         0.705
10    0.564879    1.11         0.399
Thus, each iteration brings the estimate closer to the true value of the root: 0.56714329.
6.1.1 Convergence
Notice that the true percent relative error for each iteration of Example 6.1 is roughly
proportional (by a factor of about 0.5 to 0.6) to the error from the previous iteration.
This property, called linear convergence, is characteristic of fixed-point iteration.
Aside from the “rate” of convergence, we must comment at this point about the
“possibility” of convergence. The concepts of convergence and divergence can be depicted
graphically. Recall that in Sec. 5.1, we graphed a function to visualize its structure
and behavior (Example 5.1). Such an approach is employed in Fig. 6.2a for the function
$f(x) = e^{-x} - x$. An alternative graphical approach is to separate the equation into two
component parts, as in
$$f_1(x) = f_2(x)$$
Then the two equations
$$y_1 = f_1(x) \tag{6.3}$$
and
$$y_2 = f_2(x) \tag{6.4}$$
can be plotted separately (Fig. 6.2b). The $x$ values corresponding to the intersections of
these functions represent the roots of $f(x) = 0$.
EXAMPLE 6.2 The Two-Curve Graphical Method
Problem Statement. Separate the equation $e^{-x} - x = 0$ into two parts and determine
its root graphically.
Solution. Reformulate the equation as $y_1 = x$ and $y_2 = e^{-x}$. The following values can
be computed:
 $x$    $y_1$   $y_2$
 0.0    0.0     1.000
 0.2    0.2     0.819
 0.4    0.4     0.670
 0.6    0.6     0.549
 0.8    0.8     0.449
 1.0    1.0     0.368
These points are plotted in Fig. 6.2b. The intersection of the two curves indicates a root
estimate of approximately $x = 0.57$, which corresponds to the point where the single
curve in Fig. 6.2a crosses the $x$ axis.
FIGURE 6.2
Two alternative graphical methods for determining the root of $f(x) = e^{-x} - x$. (a) Root at
the point where it crosses the $x$ axis; (b) root at the intersection of the component functions
$f_1(x) = x$ and $f_2(x) = e^{-x}$.
The two-curve method can now be used to illustrate the convergence and divergence
of fixed-point iteration. First, Eq. (6.1) can be reexpressed as a pair of equations $y_1 = x$
and $y_2 = g(x)$. These two equations can then be plotted separately. As was the case with
Eqs. (6.3) and (6.4), the roots of $f(x) = 0$ correspond to the abscissa value at the intersection
of the two curves. The function $y_1 = x$ and four different shapes for $y_2 = g(x)$
are plotted in Fig. 6.3.
For the first case (Fig. 6.3a), the initial guess of $x_0$ is used to determine the corresponding
point on the $y_2$ curve $[x_0, g(x_0)]$. The point $(x_1, x_1)$ is located by moving left
horizontally to the $y_1$ curve. These movements are equivalent to the first iteration in the
fixed-point method:
$$x_1 = g(x_0)$$
Thus, in both the equation and in the plot, a starting value of $x_0$ is used to obtain an
estimate of $x_1$. The next iteration consists of moving to $[x_1, g(x_1)]$ and then to $(x_2, x_2)$.
This iteration is equivalent to the equation
$$x_2 = g(x_1)$$
FIGURE 6.3
Iteration cobwebs depicting convergence (a and b) and divergence (c and d) of simple
fixed-point iteration. Graphs (a) and (c) are called monotone patterns, whereas (b) and (d)
are called oscillating or spiral patterns. Note that convergence occurs when $|g'(x)| < 1$.
Each panel plots $y_1 = x$ and $y_2 = g(x)$ against $x$, starting from the initial guess $x_0$.
The solution in Fig. 6.3a is convergent because the estimates of $x$ move closer to the
root with each iteration. The same is true for Fig. 6.3b. However, this is not the case
for Fig. 6.3c and d, where the iterations diverge from the root. Notice that convergence
seems to occur only when the absolute value of the slope of $y_2 = g(x)$ is less than
the slope of $y_1 = x$, that is, when $|g'(x)| < 1$. Box 6.1 provides a theoretical derivation
of this result.
6.1.2 Algorithm for Fixed-Point Iteration
The computer algorithm for fixed-point iteration is extremely simple. It consists of a
loop to iteratively compute new estimates until the termination criterion has been met.
Figure 6.4 presents pseudocode for the algorithm. Other open methods can be pro-
grammed in a similar way, the major modification being to change the iterative formula
that is used to compute the new root estimate.
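The loop described above can be sketched in Python as follows. This is a minimal illustration, not the book's program: the name `fixed_point`, its return tuple, and the default tolerance are assumptions made for this sketch; `es` and `ea` are percent relative errors as in the text.

```python
import math


def fixed_point(g, x0, es=1e-6, imax=50):
    """Iterate x = g(x) until the approximate percent relative error ea
    drops below es, or until imax iterations have been performed."""
    xr = x0
    ea = float("inf")  # remains infinite until a nonzero estimate is produced
    iterations = 0
    for iterations in range(1, imax + 1):
        xr_old = xr
        xr = g(xr_old)
        if xr != 0:
            ea = abs((xr - xr_old) / xr) * 100
        if ea < es:
            break
    return xr, iterations, ea


# Example 6.1: g(x) = exp(-x), x0 = 0, ten iterations.
# Setting es = 0 disables the tolerance test so that exactly imax steps run.
x10, n, ea10 = fixed_point(lambda x: math.exp(-x), 0.0, es=0.0, imax=10)
print(f"x after {n} iterations = {x10:.6f}, ea = {ea10:.2f}%")
```

Other open methods fit the same skeleton; only the line that computes the new estimate changes.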
Box 6.1 Convergence of Fixed-Point Iteration
From studying Fig. 6.3, it should be clear that fixed-point iteration
converges if, in the region of interest, $|g'(x)| < 1$. In other words,
convergence occurs if the magnitude of the slope of $g(x)$ is less than
the slope of the line $f(x) = x$. This observation can be demonstrated
theoretically. Recall that the iterative equation is
$$x_{i+1} = g(x_i)$$
Suppose that the true solution is
$$x_r = g(x_r)$$
Subtracting these equations yields
$$x_r - x_{i+1} = g(x_r) - g(x_i) \tag{B6.1.1}$$
The derivative mean-value theorem (recall Sec. 4.1.1) states that if
a function $g(x)$ and its first derivative are continuous over an interval
$a \le x \le b$, then there exists at least one value of $x = \xi$ within
the interval such that
$$g'(\xi) = \frac{g(b) - g(a)}{b - a} \tag{B6.1.2}$$
The right-hand side of this equation is the slope of the line joining
$g(a)$ and $g(b)$. Thus, the mean-value theorem states that there is at
least one point between $a$ and $b$ that has a slope, designated by $g'(\xi)$,
which is parallel to the line joining $g(a)$ and $g(b)$ (recall Fig. 4.3).
Now, if we let $a = x_i$ and $b = x_r$, the right-hand side of Eq.
(B6.1.1) can be expressed as
$$g(x_r) - g(x_i) = (x_r - x_i)g'(\xi)$$
where $\xi$ is somewhere between $x_i$ and $x_r$. This result can then be
substituted into Eq. (B6.1.1) to yield
$$x_r - x_{i+1} = (x_r - x_i)g'(\xi) \tag{B6.1.3}$$
If the true error for iteration $i$ is defined as
$$E_{t,i} = x_r - x_i$$
then Eq. (B6.1.3) becomes
$$E_{t,i+1} = g'(\xi)E_{t,i}$$
Consequently, if $|g'(x)| < 1$, the errors decrease with each iteration.
For $|g'(x)| > 1$, the errors grow. Notice also that if the derivative is
positive, the errors will be positive, and hence, the iterative solution
will be monotonic (Fig. 6.3a and c). If the derivative is negative, the
errors will oscillate (Fig. 6.3b and d).
An offshoot of the analysis is that it also demonstrates that when
the method converges, the error is roughly proportional to and less
than the error of the previous step. For this reason, simple fixed-point
iteration is said to be linearly convergent.
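The relation $E_{t,i+1} = g'(\xi)E_{t,i}$ can be checked numerically for the iteration of Example 6.1, where $g(x) = e^{-x}$. The sketch below is only illustrative; the variable names are assumptions, and the root value is the one quoted in Example 6.1.

```python
import math

x_true = 0.56714329           # root quoted in Example 6.1


def g(x):
    return math.exp(-x)


# True errors E_t,i = x_r - x_i for the first ten iterates starting from x0 = 0
x = 0.0
errors = []
for _ in range(10):
    x = g(x)
    errors.append(x_true - x)

# Ratio of successive true errors; Box 6.1 says each ratio equals g'(xi)
# for some xi between the current iterate and the root.
ratios = [errors[i + 1] / errors[i] for i in range(len(errors) - 1)]
slope_at_root = -math.exp(-x_true)   # g'(x_r) for g(x) = exp(-x)

for i, r in enumerate(ratios, start=1):
    print(f"E[{i + 1}]/E[{i}] = {r:+.4f}")
print(f"g'(x_r)       = {slope_at_root:+.4f}")
```

The ratios are negative, which corresponds to the oscillating pattern of Fig. 6.3b, and their magnitudes settle near $|g'(x_r)|$, consistent with the factor of about 0.5 to 0.6 noted in Sec. 6.1.1.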
6.2 THE NEWTON-RAPHSON METHOD
Perhaps the most widely used of all root-locating formulas is the Newton-Raphson equa-
tion (Fig. 6.5). If the initial guess at the root is xi, a tangent can be extended from the
point [xi, f(xi)]. The point where this tangent crosses the x axis usually represents an
improved estimate of the root.
FUNCTION Fixpt(x0, es, imax, iter, ea)
  xr = x0
  iter = 0
  DO
    xrold = xr
    xr = g(xrold)
    iter = iter + 1
    IF xr ≠ 0 THEN
      ea = |(xr − xrold) / xr| · 100
    END IF
    IF ea < es OR iter ≥ imax EXIT
  END DO
  Fixpt = xr
END Fixpt
FIGURE 6.4
Pseudocode for fixed-point iteration. Note that other open methods can be cast in this
general format.
FIGURE 6.5
Graphical depiction of the Newton-Raphson method. A tangent to the function of $x_i$
[that is, $f'(x_i)$] is extrapolated down to the $x$ axis to provide an estimate of the root at $x_{i+1}$.
The tangent has slope $f'(x_i)$, rise $f(x_i) - 0$, and run $x_i - x_{i+1}$.
The Newton-Raphson method can be derived on the basis of this geometrical interpretation
(an alternative method based on the Taylor series is described in Box 6.2). As
in Fig. 6.5, the first derivative at $x$ is equivalent to the slope:
$$f'(x_i) = \frac{f(x_i) - 0}{x_i - x_{i+1}} \tag{6.5}$$
which can be rearranged to yield
$$x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)} \tag{6.6}$$
which is called the Newton-Raphson formula.
EXAMPLE 6.3 Newton-Raphson Method
Problem Statement. Use the Newton-Raphson method to estimate the root of $f(x) = e^{-x} - x$,
employing an initial guess of $x_0 = 0$.
Solution. The first derivative of the function can be evaluated as
$$f'(x) = -e^{-x} - 1$$
which can be substituted along with the original function into Eq. (6.6) to give
$$x_{i+1} = x_i - \frac{e^{-x_i} - x_i}{-e^{-x_i} - 1}$$
Starting with an initial guess of $x_0 = 0$, this iterative equation can be applied to compute
 i    $x_i$          $\varepsilon_t$ (%)
 0    0              100
 1    0.500000000    11.8
 2    0.566311003    0.147
 3    0.567143165    0.0000220
 4    0.567143290    $< 10^{-8}$
Thus, the approach rapidly converges on the true root. Notice that the true percent relative
error at each iteration decreases much faster than it does in simple fixed-point iteration
(compare with Example 6.1).
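The iterates tabulated in Example 6.3 can be regenerated with a few lines of Python. This is a bare sketch of Eq. (6.6) with no safeguards; the names `f`, `df`, and `iterates` are chosen here for illustration.

```python
import math


def f(x):
    return math.exp(-x) - x


def df(x):
    return -math.exp(-x) - 1


x_true = 0.56714329   # root quoted in the text
x = 0.0               # initial guess x0
iterates = [x]
for _ in range(4):
    x = x - f(x) / df(x)      # Eq. (6.6)
    iterates.append(x)

for i, xi in enumerate(iterates):
    et = abs((x_true - xi) / x_true) * 100   # true percent relative error
    print(f"{i}  {xi:.9f}  {et:.3g}")
```

Because `x_true` carries only eight decimals, the printed error for the last iterate is limited by that rounding rather than by the method.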
6.2.1 Termination Criteria and Error Estimates
As with other root-location methods, Eq. (3.5) can be used as a termination criterion. In
addition, however, the Taylor series derivation of the method (Box 6.2) provides theoretical
insight regarding the rate of convergence as expressed by $E_{i+1} = O(E_i^2)$. Thus the
error should be roughly proportional to the square of the previous error. In other words,
the number of significant figures of accuracy approximately doubles with each iteration.
This behavior is examined in the following example.
EXAMPLE 6.4 Error Analysis of Newton-Raphson Method
Problem Statement. As derived in Box 6.2, the Newton-Raphson method is quadratically
convergent. That is, the error is roughly proportional to the square of the previous
error, as in
$$E_{t,i+1} \cong \frac{-f''(x_r)}{2f'(x_r)} E_{t,i}^2 \tag{E6.4.1}$$
Examine this formula and see if it applies to the results of Example 6.3.
Solution. The first derivative of $f(x) = e^{-x} - x$ is
$$f'(x) = -e^{-x} - 1$$
Box 6.2 Derivation and Error Analysis of the Newton-Raphson Method
Aside from the geometric derivation [Eqs. (6.5) and (6.6)], the
Newton-Raphson method may also be developed from the Taylor
series expansion. This alternative derivation is useful in that it also
provides insight into the rate of convergence of the method.
Recall from Chap. 4 that the Taylor series expansion can be
represented as
$$f(x_{i+1}) = f(x_i) + f'(x_i)(x_{i+1} - x_i) + \frac{f''(\xi)}{2!}(x_{i+1} - x_i)^2 \tag{B6.2.1}$$
where $\xi$ lies somewhere in the interval from $x_i$ to $x_{i+1}$. An approximate
version is obtainable by truncating the series after the first
derivative term:
$$f(x_{i+1}) \cong f(x_i) + f'(x_i)(x_{i+1} - x_i)$$
At the intersection with the $x$ axis, $f(x_{i+1})$ would be equal to
zero, or
$$0 = f(x_i) + f'(x_i)(x_{i+1} - x_i) \tag{B6.2.2}$$
which can be solved for
$$x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)}$$
which is identical to Eq. (6.6). Thus, we have derived the Newton-Raphson
formula using a Taylor series.
Aside from the derivation, the Taylor series can also be used to
estimate the error of the formula. This can be done by realizing that
if the complete Taylor series were employed, an exact result would
be obtained. For this situation $x_{i+1} = x_r$, where $x_r$ is the true value
of the root. Substituting this value along with $f(x_r) = 0$ into
Eq. (B6.2.1) yields
$$0 = f(x_i) + f'(x_i)(x_r - x_i) + \frac{f''(\xi)}{2!}(x_r - x_i)^2 \tag{B6.2.3}$$
Equation (B6.2.2) can be subtracted from Eq. (B6.2.3) to give
$$0 = f'(x_i)(x_r - x_{i+1}) + \frac{f''(\xi)}{2!}(x_r - x_i)^2 \tag{B6.2.4}$$
Now, realize that the error is equal to the discrepancy between $x_{i+1}$
and the true value $x_r$, as in
$$E_{t,i+1} = x_r - x_{i+1}$$
and Eq. (B6.2.4) can be expressed as
$$0 = f'(x_i)E_{t,i+1} + \frac{f''(\xi)}{2!}E_{t,i}^2 \tag{B6.2.5}$$
If we assume convergence, both $x_i$ and $\xi$ should eventually be approximated
by the root $x_r$, and Eq. (B6.2.5) can be rearranged to yield
$$E_{t,i+1} = \frac{-f''(x_r)}{2f'(x_r)}E_{t,i}^2 \tag{B6.2.6}$$
According to Eq. (B6.2.6), the error is roughly proportional to the
square of the previous error. This means that the number of correct
decimal places approximately doubles with each iteration. Such
behavior is referred to as quadratic convergence. Example 6.4
manifests this property.
which can be evaluated at $x_r = 0.56714329$ as $f'(0.56714329) = -1.56714329$. The
second derivative is
$$f''(x) = e^{-x}$$
which can be evaluated as $f''(0.56714329) = 0.56714329$. These results can be substituted
into Eq. (E6.4.1) to yield
$$E_{t,i+1} \cong -\frac{0.56714329}{2(-1.56714329)}E_{t,i}^2 = 0.18095E_{t,i}^2$$
From Example 6.3, the initial error was $E_{t,0} = 0.56714329$, which can be substituted
into the error equation to predict
$$E_{t,1} \cong 0.18095(0.56714329)^2 = 0.0582$$
which is close to the true error of 0.06714329. For the next iteration,
$$E_{t,2} \cong 0.18095(0.06714329)^2 = 0.0008158$$
which also compares favorably with the true error of 0.0008323. For the third iteration,
$$E_{t,3} \cong 0.18095(0.0008323)^2 = 0.000000125$$
which is the error obtained in Example 6.3. The error estimate improves in this manner
because, as we come closer to the root, $x_i$ and $\xi$ are better approximated by $x_r$ [recall our
assumption in going from Eq. (B6.2.5) to Eq. (B6.2.6) in Box 6.2]. Finally,
$$E_{t,4} \cong 0.18095(0.000000125)^2 = 2.83 \times 10^{-15}$$
Thus, this example illustrates that the error of the Newton-Raphson method for this case
is, in fact, roughly proportional (by a factor of 0.18095) to the square of the error of the
previous iteration.
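The comparison in Example 6.4 can be reproduced numerically. The sketch below evaluates the proportionality factor of Eq. (E6.4.1) at the quoted root and sets each predicted error beside the true error of the next Newton-Raphson iterate; the names `d2f`, `factor`, and `rows` are illustrative assumptions.

```python
import math

x_true = 0.56714329   # root quoted in the text


def f(x):
    return math.exp(-x) - x


def df(x):
    return -math.exp(-x) - 1


def d2f(x):
    return math.exp(-x)


# Proportionality factor -f''(x_r) / (2 f'(x_r)) from Eq. (E6.4.1)
factor = -d2f(x_true) / (2 * df(x_true))

x = 0.0
rows = []   # (iteration, predicted error, true error)
for i in range(1, 4):
    predicted = factor * (x_true - x) ** 2   # prediction from the previous error
    x = x - f(x) / df(x)                     # Newton-Raphson step, Eq. (6.6)
    rows.append((i, predicted, x_true - x))

print(f"factor = {factor:.5f}")
for i, predicted, actual in rows:
    print(f"E_t,{i}: predicted {predicted:.3e}, true {actual:.3e}")
```

Only three iterations are compared because the eight-decimal value of `x_true` cannot resolve the error of the fourth iterate.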
6.2.2 Pitfalls of the Newton-Raphson Method
Although the Newton-Raphson method is often very efficient, there are situations where
it performs poorly. A special case—multiple roots—will be addressed later in this chapter.
However, even when dealing with simple roots, difficulties can also arise, as in the fol-
lowing example.
EXAMPLE 6.5 Example of a Slowly Converging Function with Newton-Raphson
Problem Statement. Determine the positive root of $f(x) = x^{10} - 1$ using the Newton-Raphson
method and an initial guess of $x = 0.5$.
Solution. The Newton-Raphson formula for this case is
$$x_{i+1} = x_i - \frac{x_i^{10} - 1}{10x_i^9}$$
which can be used to compute
 Iteration    $x$
 0            0.5
 1            51.65
 2            46.485
 3            41.8365
 4            37.65285
 5            33.887565
 .
 .
 .
 $\infty$     1.0000000
Thus, after the first poor prediction, the technique is converging on the true root of 1,
but at a very slow rate.
Aside from slow convergence due to the nature of the function, other difficulties
can arise, as illustrated in Fig. 6.6. For example, Fig. 6.6a depicts the case where
an inflection point [that is, $f''(x) = 0$] occurs in the vicinity of a root. Notice that
iterations beginning at $x_0$ progressively diverge from the root. Figure 6.6b illustrates
the tendency of the Newton-Raphson technique to oscillate around a local maximum
or minimum. Such oscillations may persist, or as in Fig. 6.6b, a near-zero slope is
reached, whereupon the solution is sent far from the area of interest. Figure 6.6c
shows how an initial guess that is close to one root can jump to a location several
roots away. This tendency to move away from the area of interest is because near-zero
slopes are encountered. Obviously, a zero slope [$f'(x) = 0$] is truly a disaster
because it causes division by zero in the Newton-Raphson formula [Eq. (6.6)].
Graphically (see Fig. 6.6d), it means that the solution shoots off horizontally and
never hits the $x$ axis.
Thus, there is no general convergence criterion for Newton-Raphson. Its convergence
depends on the nature of the function and on the accuracy of the initial guess. The only
remedy is to have an initial guess that is “sufficiently” close to the root. And for some
functions, no guess will work! Good guesses are usually predicated on knowledge of the
physical problem setting or on devices such as graphs that provide insight into the behavior
of the solution. The lack of a general convergence criterion also suggests that
good computer software should be designed to recognize slow convergence or divergence.
The next section addresses some of these issues.
6.2.3 Algorithm for Newton-Raphson
An algorithm for the Newton-Raphson method is readily obtained by substituting Eq. (6.6)
for the predictive formula [Eq. (6.2)] in Fig. 6.4. Note, however, that the program must
also be modified to compute the first derivative. This can be simply accomplished by the
inclusion of a user-defined function.
Additionally, in light of the foregoing discussion of potential problems of the Newton-Raphson
method, the program would be improved by incorporating several additional
features:
1. A plotting routine should be included in the program.
2. At the end of the computation, the final root estimate should always be substituted
into the original function to compute whether the result is close to zero. This check
partially guards against those cases where slow or oscillating convergence may lead
to a small value of $\varepsilon_a$ while the solution is still far from a root.
3. The program should always include an upper limit on the number of iterations to guard
against oscillating, slowly convergent, or divergent solutions that could persist interminably.
4. The program should alert the user and take account of the possibility that $f'(x)$ might
equal zero at any time during the computation.
FIGURE 6.6
Four cases, panels (a) through (d), where the Newton-Raphson method exhibits poor convergence.
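Features 2 through 4 can be sketched in Python as shown below (plotting, feature 1, is omitted). This is one possible arrangement under stated assumptions, not the book's program: the function name `newton_raphson`, the returned tuple, and the choice to raise `ZeroDivisionError` on a zero slope are all illustrative.

```python
def newton_raphson(f, df, x0, es=1e-8, imax=100):
    """Newton-Raphson with an iteration cap, a zero-slope check,
    and a final residual f(root) for the answer check.
    Returns (root, iterations, ea, residual, converged)."""
    x = x0
    ea = float("inf")
    iterations = 0
    for iterations in range(1, imax + 1):
        slope = df(x)
        if slope == 0:
            # Feature 4: alert the user instead of dividing by zero
            raise ZeroDivisionError(f"f'(x) = 0 at x = {x}")
        x_new = x - f(x) / slope          # Eq. (6.6)
        if x_new != 0:
            ea = abs((x_new - x) / x_new) * 100
        x = x_new
        if ea < es:
            break
    # Feature 2: residual; feature 3: the loop above never exceeds imax
    return x, iterations, ea, f(x), ea < es


# Example 6.5: f(x) = x**10 - 1 starting from x0 = 0.5
def f10(x):
    return x**10 - 1


def df10(x):
    return 10 * x**9


first, *_ = newton_raphson(f10, df10, 0.5, imax=1)
root, n, ea, residual, ok = newton_raphson(f10, df10, 0.5)
print(f"first iterate = {first}")
print(f"root = {root:.7f} after {n} iterations, residual = {residual:.1e}")
```

Returning the residual and a convergence flag leaves the decision about what counts as close enough to zero with the caller.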
6.3 THE SECANT METHOD
A potential problem in implementing the Newton-Raphson method is the evaluation of
the derivative. Although this is not inconvenient for polynomials and many other functions,
there are certain functions whose derivatives may be extremely difficult or inconvenient
to evaluate. For these cases, the derivative can be approximated by a backward
finite divided difference, as in (Fig. 6.7)
$$f'(x_i) \cong \frac{f(x_{i-1}) - f(x_i)}{x_{i-1} - x_i}$$
This approximation can be substituted into Eq. (6.6) to yield the following iterative
equation:
$$x_{i+1} = x_i - \frac{f(x_i)(x_{i-1} - x_i)}{f(x_{i-1}) - f(x_i)} \tag{6.7}$$
FIGURE 6.7
Graphical depiction of the secant method. This technique is similar to the Newton-Raphson
technique (Fig. 6.5) in the sense that an estimate of the root is predicted by extrapolating a
tangent of the function to the $x$ axis. However, the secant method uses a difference rather
than a derivative to estimate the slope.
Equation (6.7) is the formula for the secant method. Notice that the approach requires
two initial estimates of x. However, because f(x) is not required to change signs between
the estimates, it is not classified as a bracketing method.
EXAMPLE 6.6 The Secant Method
Problem Statement. Use the secant method to estimate the root of $f(x) = e^{-x} - x$. Start
with initial estimates of $x_{-1} = 0$ and $x_0 = 1.0$.
Solution. Recall that the true root is 0.56714329. . . .
First iteration:
$x_{-1} = 0$    $f(x_{-1}) = 1.00000$
$x_0 = 1$    $f(x_0) = -0.63212$
$$x_1 = 1 - \frac{-0.63212(0 - 1)}{1 - (-0.63212)} = 0.61270 \qquad \varepsilon_t = 8.0\%$$
Second iteration:
$x_0 = 1$    $f(x_0) = -0.63212$
$x_1 = 0.61270$    $f(x_1) = -0.07081$
(Note that both estimates are now on the same side of the root.)
$$x_2 = 0.61270 - \frac{-0.07081(1 - 0.61270)}{-0.63212 - (-0.07081)} = 0.56384 \qquad \varepsilon_t = 0.58\%$$
Third iteration:
$x_1 = 0.61270$    $f(x_1) = -0.07081$
$x_2 = 0.56384$    $f(x_2) = 0.00518$
$$x_3 = 0.56384 - \frac{0.00518(0.61270 - 0.56384)}{-0.07081 - (0.00518)} = 0.56717 \qquad \varepsilon_t = 0.0048\%$$
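The three iterations of Example 6.6 follow directly from Eq. (6.7). The Python sketch below is illustrative only; the function name `secant` and the decision to return the list of successive estimates are assumptions.

```python
import math


def secant(f, x_prev, x_curr, n_iter):
    """Apply Eq. (6.7) n_iter times and return the successive estimates."""
    estimates = []
    for _ in range(n_iter):
        f_prev, f_curr = f(x_prev), f(x_curr)
        x_next = x_curr - f_curr * (x_prev - x_curr) / (f_prev - f_curr)
        estimates.append(x_next)
        # Strict sequential replacement: no attempt is made to keep a bracket
        x_prev, x_curr = x_curr, x_next
    return estimates


history = secant(lambda x: math.exp(-x) - x, 0.0, 1.0, 3)
for i, xi in enumerate(history, start=1):
    print(f"x{i} = {xi:.5f}")
```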
6.3.1 The Difference Between the Secant and False-Position Methods
Note the similarity between the secant method and the false-position method. For example,
Eqs. (6.7) and (5.7) are identical on a term-by-term basis. Both use two initial estimates to
compute an approximation of the slope of the function that is used to project to the x axis
for a new estimate of the root. However, a critical difference between the methods is how
one of the initial values is replaced by the new estimate. Recall that in the false-position
method the latest estimate of the root replaces whichever of the original values yielded a
function value with the same sign as f(xr). Consequently, the two estimates always bracket
the root. Therefore, for all practical purposes, the method always converges because the root
is kept within the bracket. In contrast, the secant method replaces the values in strict sequence,
with the new value xi11 replacing xi and xi replacing xi21. As a result, the two values can
sometimes lie on the same side of the root. For certain cases, this can lead to divergence.
EXAMPLE 6.7 Comparison of Convergence of the Secant and False-Position Techniques
Problem Statement. Use the false-position and secant methods to estimate the root of
$f(x) = \ln x$. Start the computation with values of $x_l = x_{i-1} = 0.5$ and $x_u = x_i = 5.0$.
Solution. For the false-position method, the use of Eq. (5.7) and the bracketing criterion
for replacing estimates results in the following iterations:
 Iteration    $x_l$    $x_u$     $x_r$
 1            0.5      5.0       1.8546
 2            0.5      1.8546    1.2163
 3            0.5      1.2163    1.0585
As can be seen (Fig. 6.8a and c), the estimates are converging on the true root which is
equal to 1.
FIGURE 6.8
Comparison of the false-position and the secant methods. The first iterations (a) and (b) for both
techniques are identical. However, for the second iterations (c) and (d), the points used differ. As
a consequence, the secant method can diverge, as indicated in (d). Panels (a) and (c) show
false position; panels (b) and (d) show the secant method.
For the secant method, using Eq. (6.7) and the sequential criterion for replacing
estimates results in
 Iteration    $x_{i-1}$    $x_i$     $x_{i+1}$
 1            0.5          5.0       1.8546
 2            5.0          1.8546    −0.10438
As in Fig. 6.8d, the approach is divergent.
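The two replacement rules of Example 6.7 can be set side by side in code. In this sketch the function names `false_position_steps` and `secant_steps` are hypothetical; both use the same projection formula and differ only in which old point is discarded.

```python
import math


def false_position_steps(f, xl, xu, n_iter):
    """False position: the new estimate replaces the bound whose function
    value has the same sign, so the root stays bracketed."""
    estimates = []
    for _ in range(n_iter):
        fl, fu = f(xl), f(xu)
        xr = xu - fu * (xl - xu) / (fl - fu)
        estimates.append(xr)
        if fl * f(xr) < 0:
            xu = xr
        else:
            xl = xr
    return estimates


def secant_steps(f, x_prev, x_curr, n_iter):
    """Secant: the points are replaced in strict sequence, Eq. (6.7)."""
    estimates = []
    for _ in range(n_iter):
        f_prev, f_curr = f(x_prev), f(x_curr)
        x_next = x_curr - f_curr * (x_prev - x_curr) / (f_prev - f_curr)
        estimates.append(x_next)
        x_prev, x_curr = x_curr, x_next
    return estimates


fp = false_position_steps(math.log, 0.5, 5.0, 3)
sec = secant_steps(math.log, 0.5, 5.0, 2)
print("false position:", [round(x, 4) for x in fp])
print("secant:        ", [round(x, 4) for x in sec])
```

The second secant estimate is negative, so a third secant iteration would ask `math.log` for the logarithm of a negative number and fail with a `ValueError`; that is the numerical counterpart of the divergence shown in Fig. 6.8d.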
Although the secant method may be divergent, when it converges it usually does so
at a quicker rate than the false-position method. For instance, Fig. 6.9 demonstrates the
superiority of the secant method in this regard. The inferiority of the false-position
method is due to one end staying fixed to maintain the bracketing of the root. This
property, which is an advantage in that it prevents divergence, is a shortcoming with
regard to the rate of convergence; it makes the finite-difference estimate a less-accurate
approximation of the derivative.
FIGURE 6.9
Comparison of the true percent relative errors $\varepsilon_t$ for the methods to determine the roots of
$f(x) = e^{-x} - x$. Semilog plot of true percent relative error (from $10^{-6}$ to 10) versus
iterations (up to 20), with curves for bisection, false position, secant, and Newton-Raphson.
6.3.2 Algorithm for the Secant Method
As with the other open methods, an algorithm for the secant method is obtained simply
by modifying Fig. 6.4 so that two initial guesses are input and by using Eq. (6.7) to
calculate the root. In addition, the options suggested in Sec. 6.2.3 for the Newton-Raphson
method can also be applied to good advantage for the secant program.
6.3.3 Modified Secant Method
Rather than using two arbitrary values to estimate the derivative, an alternative approach
involves a fractional perturbation of the independent variable to estimate $f'(x)$,
$$f'(x_i) \cong \frac{f(x_i + \delta x_i) - f(x_i)}{\delta x_i}$$
where $\delta$ = a small perturbation fraction. This approximation can be substituted into Eq. (6.6)
to yield the following iterative equation:
$$x_{i+1} = x_i - \frac{\delta x_i f(x_i)}{f(x_i + \delta x_i) - f(x_i)} \tag{6.8}$$
EXAMPLE 6.8 Modified Secant Method
Problem Statement. Use the modified secant method to estimate the root of $f(x) = e^{-x} - x$. Use a value of 0.01 for $\delta$ and start with $x_0 = 1.0$. Recall that the true root is
0.56714329. . . .
Solution.
First iteration:
$x_0 = 1$    $f(x_0) = -0.63212$
$x_0 + \delta x_0 = 1.01$    $f(x_0 + \delta x_0) = -0.64578$
$$x_1 = 1 - \frac{0.01(-0.63212)}{-0.64578 - (-0.63212)} = 0.537263 \qquad |\varepsilon_t| = 5.3\%$$
Second iteration:
$x_0 = 0.537263$    $f(x_0) = 0.047083$
$x_0 + \delta x_0 = 0.542635$    $f(x_0 + \delta x_0) = 0.038579$
$$x_1 = 0.537263 - \frac{0.005373(0.047083)}{0.038579 - 0.047083} = 0.56701 \qquad |\varepsilon_t| = 0.0236\%$$
Third iteration:
$x_0 = 0.56701$    $f(x_0) = 0.000209$
$x_0 + \delta x_0 = 0.572680$    $f(x_0 + \delta x_0) = -0.00867$
$$x_1 = 0.56701 - \frac{0.00567(0.000209)}{-0.00867 - 0.000209} = 0.567143 \qquad |\varepsilon_t| = 2.365 \times 10^{-5}\%$$
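The iterations of this example can be reproduced with a few lines of Python. This is only a sketch of Eq. (6.8): the function name `modified_secant` and its arguments are assumptions made for illustration, and because the code carries full precision, trailing digits may differ slightly from the hand calculation.

```python
import math

def modified_secant(f, x0, delta=0.01, n_iter=3):
    """Return the list of iterates [x0, x1, ...] produced by Eq. (6.8)."""
    xs = [x0]
    x = x0
    for _ in range(n_iter):
        dx = delta * x                       # fractional perturbation, delta * x_i
        x = x - dx * f(x) / (f(x + dx) - f(x))
        xs.append(x)
    return xs

# Example 6.8: f(x) = exp(-x) - x, delta = 0.01, x0 = 1.0
iterates = modified_secant(lambda x: math.exp(-x) - x, 1.0)
for i, x in enumerate(iterates):
    print(i, x)
```

Note that the perturbation is proportional to the current estimate, so an estimate of exactly zero would give a zero perturbation and a division by zero; a production routine would need to guard against that case.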
The choice of a proper value for $\delta$ is not automatic. If $\delta$ is too small, the method
can be swamped by round-off error caused by subtractive cancellation in the denominator
of Eq. (6.8). If it is too big, the technique can become inefficient and even divergent.
However, if chosen correctly, it provides a nice alternative for cases where evaluating
the derivative is difficult and developing two initial guesses is inconvenient.
6.4 BRENT’S METHOD
Wouldn’t it be nice to have a hybrid approach that combined the reliability of bracketing
with the speed of the open methods? Brent’s root-location method is a clever algorithm
that does just that by applying a speedy open method wherever possible, but reverting
to a reliable bracketing method if necessary. The approach was developed by Richard
Brent (1973) based on an earlier algorithm of Theodorus Dekker (1969).
The bracketing technique is the trusty bisection method (Sec. 5.2) whereas two differ-
ent open methods are employed. The first is the secant method described in Sec. 6.3. As
explained next, the second is inverse quadratic interpolation.
6.4.1 Inverse Quadratic Interpolation
Inverse quadratic interpolation is similar in spirit to the secant method. As in Fig. 6.10a,
the secant method is based on computing a straight line that goes through two guesses.
The intersection of this straight line with the x axis represents the new root estimate. For
this reason, it is sometimes referred to as a linear interpolation method.
Now suppose that we had three points. In that case, we could determine a quadratic
function of x that goes through the three points (Fig. 6.10b). Just as with the linear secant
method, the intersection of this parabola with the x axis would represent the new root
estimate. And as illustrated in Fig. 6.10b, using a curve rather than a straight line often
yields a better estimate.
Although this would seem to represent a great improvement, the approach has a
fundamental flaw: It is possible that the parabola might not intersect the x axis! Such
would be the case when the resulting parabola had complex roots. This is illustrated by
the parabola, $y = f(x)$, in Fig. 6.11.
FIGURE 6.10
Comparison of (a) the secant method and (b) inverse quadratic interpolation. Note that the dark parabola passing through the three points in (b) is called “inverse” because it is written in y rather than in x. [Two panels, (a) and (b), each plotting f(x) versus x.]
The difficulty can be rectified by employing inverse quadratic interpolation. That is,
rather than using a parabola in x, we can fit the points with a parabola in y. This amounts
to reversing the axes and creating a “sideways” parabola [the curve, $x = f(y)$, in Fig. 6.11].
If the three points are designated as $(x_{i-2}, y_{i-2})$, $(x_{i-1}, y_{i-1})$, and $(x_i, y_i)$, a quadratic
function of y that passes through the points can be generated as
$$g(y) = \frac{(y - y_{i-1})(y - y_i)}{(y_{i-2} - y_{i-1})(y_{i-2} - y_i)} x_{i-2} + \frac{(y - y_{i-2})(y - y_i)}{(y_{i-1} - y_{i-2})(y_{i-1} - y_i)} x_{i-1} + \frac{(y - y_{i-2})(y - y_{i-1})}{(y_i - y_{i-2})(y_i - y_{i-1})} x_i \tag{6.9}$$
As we will learn in Sec. 18.2, this form is called a Lagrange polynomial. The root, $x_{i+1}$,
corresponds to $y = 0$, which when substituted into Eq. (6.9) yields
$$x_{i+1} = \frac{y_{i-1} y_i}{(y_{i-2} - y_{i-1})(y_{i-2} - y_i)} x_{i-2} + \frac{y_{i-2} y_i}{(y_{i-1} - y_{i-2})(y_{i-1} - y_i)} x_{i-1} + \frac{y_{i-2} y_{i-1}}{(y_i - y_{i-2})(y_i - y_{i-1})} x_i \tag{6.10}$$
As shown in Fig. 6.11, such a “sideways” parabola always intersects the x axis.
EXAMPLE 6.9 Inverse Quadratic Interpolation
Problem Statement. Develop quadratic equations in both x and y for the data points
depicted in Fig. 6.11: (1, 2), (2, 1), and (4, 5). For the first, $y = f(x)$, employ the quadratic
formula to illustrate that the roots are complex. For the latter, $x = g(y)$, use inverse
quadratic interpolation (Eq. 6.10) to determine the root estimate.
FIGURE 6.11
Two parabolas fit to three points. The parabola written as a function of x, $y = f(x)$, has complex roots and hence does not intersect the x axis. In contrast, if the variables are reversed, and the parabola developed as $x = f(y)$, the function does intersect the x axis. [Plot of y versus x showing the three points, the two parabolas, and the root where the sideways parabola meets the x axis.]
Solution. By reversing the x’s and y’s, Eq. (6.9) can be used to generate a quadratic in x as
$$f(x) = \frac{(x - 2)(x - 4)}{(1 - 2)(1 - 4)} 2 + \frac{(x - 1)(x - 4)}{(2 - 1)(2 - 4)} 1 + \frac{(x - 1)(x - 2)}{(4 - 1)(4 - 2)} 5$$
or collecting terms
$$f(x) = x^2 - 4x + 5$$
This equation was used to generate the parabola, $y = f(x)$, in Fig. 6.11. The quadratic
formula can be used to determine that the roots for this case are complex,
$$x = \frac{4 \pm \sqrt{(-4)^2 - 4(1)(5)}}{2} = 2 \pm i$$
Equation (6.9) can be used to generate the quadratic in y as
$$g(y) = \frac{(y - 1)(y - 5)}{(2 - 1)(2 - 5)} 1 + \frac{(y - 2)(y - 5)}{(1 - 2)(1 - 5)} 2 + \frac{(y - 2)(y - 1)}{(5 - 2)(5 - 1)} 4$$
or collecting terms
$$g(y) = 0.5y^2 - 2.5y + 4$$
Finally, Eq. (6.10) can be used to determine the root as
$$x_{i+1} = \frac{-1(-5)}{(2 - 1)(2 - 5)} 1 + \frac{-2(-5)}{(1 - 2)(1 - 5)} 2 + \frac{-2(-1)}{(5 - 2)(5 - 1)} 4 = 4$$
Before proceeding to Brent’s algorithm, we need to mention one more case where
inverse quadratic interpolation does not work. If the three y values are not distinct (that
is, $y_{i-2} = y_{i-1}$ or $y_{i-1} = y_i$), an inverse quadratic function does not exist. So this is where
the secant method comes into play. If we arrive at a situation where the y values are not
distinct, we can always revert to the less efficient secant method to generate a root using
two of the points. If $y_{i-2} = y_{i-1}$, we use the secant method with $x_{i-1}$ and $x_i$. If $y_{i-1} = y_i$,
we use $x_{i-2}$ and $x_{i-1}$.
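The following Python sketch evaluates Eq. (6.10) for three points and falls back to a secant step in the two non-distinct cases just described. The function names `secant_step` and `inverse_quadratic_step` are hypothetical, and the sketch does not handle the remaining case in which the first and third y values coincide.

```python
def secant_step(pa, pb):
    """x intercept of the straight line through points pa and pb."""
    (xa, ya), (xb, yb) = pa, pb
    return xb - yb * (xa - xb) / (ya - yb)

def inverse_quadratic_step(p2, p1, p0):
    """Root estimate from points i-2, i-1, and i, each given as (x, y)."""
    (x2, y2), (x1, y1), (x0, y0) = p2, p1, p0
    if y2 == y1:                       # y_{i-2} = y_{i-1}: use x_{i-1} and x_i
        return secant_step(p1, p0)
    if y1 == y0:                       # y_{i-1} = y_i: use x_{i-2} and x_{i-1}
        return secant_step(p2, p1)
    # Eq. (6.10): Lagrange form of x = g(y) evaluated at y = 0
    return (y1 * y0 / ((y2 - y1) * (y2 - y0)) * x2
            + y2 * y0 / ((y1 - y2) * (y1 - y0)) * x1
            + y2 * y1 / ((y0 - y2) * (y0 - y1)) * x0)

# Data points of Example 6.9
estimate = inverse_quadratic_step((1.0, 2.0), (2.0, 1.0), (4.0, 5.0))
print(estimate)
```

Passing the points as (x, y) pairs keeps the ordering i-2, i-1, i explicit, which matters because the fallback rule depends on which pair of y values coincides.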
6.4.2 Brent’s Method Algorithm
The general idea behind Brent’s root-finding method is to use one of the quick open
methods whenever possible. In the event that these generate an unacceptable result
(i.e., a root estimate that falls outside the bracket), the algorithm reverts to the more
conservative bisection method. Although bisection may be slower, it generates an
estimate guaranteed to fall within the bracket. This process is then repeated until the
root is located to within an acceptable tolerance. As might be expected, bisection
typically dominates at first but as the root is approached, the technique shifts to the
faster open methods.
Figure 6.12 presents pseudocode for the algorithm based on a MATLAB software
M-file developed by Cleve Moler (2005). It represents a stripped down version of
Function fzerosimp(xl, xu)
eps = 2.22044604925031E-16
tol = 0.000001
a = xl: b = xu: fa = f(a): fb = f(b)
c = a: fc = fa: d = b − c: e = d
DO
  IF fb = 0 EXIT
  IF Sgn(fa) = Sgn(fb) THEN          (If necessary, rearrange points)
    a = c: fa = fc: d = b − c: e = d
  ENDIF
  IF |fa| < |fb| THEN
    c = b: b = a: a = c
    fc = fb: fb = fa: fa = fc
  ENDIF
  m = 0.5 * (a − b)                  (Termination test and possible exit)
  tol = 2 * eps * max(|b|, 1)
  IF |m| ≤ tol Or fb = 0. THEN
    EXIT
  ENDIF
  (Choose open methods or bisection)
  IF |e| ≥ tol And |fc| > |fb| THEN
    s = fb / fc
    IF a = c THEN                    (Secant method)
      p = 2 * m * s
      q = 1 − s
    ELSE                             (Inverse quadratic interpolation)
      q = fc / fa: r = fb / fa
      p = s * (2 * m * q * (q − r) − (b − c) * (r − 1))
      q = (q − 1) * (r − 1) * (s − 1)
    ENDIF
    IF p > 0 THEN q = −q ELSE p = −p
    IF 2 * p < 3 * m * q − |tol * q| AND p < |0.5 * e * q| THEN
      e = d: d = p / q
    ELSE
      d = m: e = m
    ENDIF
  ELSE                               (Bisection)
    d = m: e = m
  ENDIF
  c = b: fc = fb
  IF |d| > tol THEN b = b + d Else b = b − Sgn(b − a) * tol
  fb = f(b)
ENDDO
fzerosimp = b
END fzerosimp
FIGURE 6.12
Pseudocode for Brent’s root
finding algorithm based on a
MATLAB m-file developed by
Cleve Moler (2005).
the fzero function, which is the professional root location function employed in MATLAB.
For that reason, we call the simplified version fzerosimp. Note that it requires
another function, f, that holds the equation for which the root is being evaluated.
The fzerosimp function is passed two initial guesses that must bracket the root.
After assigning values for machine epsilon and a tolerance, the three variables defining
the search interval (a, b, c) are initialized, and f is evaluated at the endpoints.
A main loop is then implemented. If necessary, the three points are rearranged to
satisfy the conditions required for the algorithm to work effectively. At this point, if the
stopping criteria are met, the loop is terminated. Otherwise, a decision structure chooses
among the three methods and checks whether the outcome is acceptable. A final section
then evaluates f at the new point and the loop is repeated. Once the stopping criteria
are met, the loop terminates and the final root estimate is returned.
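For readers who prefer running code, the steps just described can be sketched in Python as a line-by-line transcription of the pseudocode in Fig. 6.12. The function to be solved is passed in as an argument rather than referenced globally, and the check that the initial guesses bracket the root is an addition made for this sketch; neither choice comes from the figure.

```python
import math

EPS = 2.220446049250313e-16   # machine epsilon for IEEE double precision

def sgn(v):
    """Sign of v as -1, 0, or 1."""
    return (v > 0) - (v < 0)

def fzerosimp(f, xl, xu):
    a, b = xl, xu
    fa, fb = f(a), f(b)
    if sgn(fa) == sgn(fb):
        # Guard added in this sketch; Fig. 6.12 assumes a valid bracket.
        raise ValueError("initial guesses must bracket the root")
    c, fc = a, fa
    d = b - c
    e = d
    while fb != 0.0:
        if sgn(fa) == sgn(fb):            # if necessary, rearrange points
            a, fa = c, fc
            d = b - c
            e = d
        if abs(fa) < abs(fb):             # keep b as the best estimate
            a, b, c = b, a, b
            fa, fb, fc = fb, fa, fb
        m = 0.5 * (a - b)                 # termination test and possible exit
        tol = 2.0 * EPS * max(abs(b), 1.0)
        if abs(m) <= tol or fb == 0.0:
            break
        if abs(e) >= tol and abs(fc) > abs(fb):
            s = fb / fc
            if a == c:                    # secant method
                p = 2.0 * m * s
                q = 1.0 - s
            else:                         # inverse quadratic interpolation
                q = fc / fa
                r = fb / fa
                p = s * (2.0 * m * q * (q - r) - (b - c) * (r - 1.0))
                q = (q - 1.0) * (r - 1.0) * (s - 1.0)
            if p > 0.0:
                q = -q
            else:
                p = -p
            # Accept the open-method step only if it stays inside the bracket
            if 2.0 * p < 3.0 * m * q - abs(tol * q) and p < abs(0.5 * e * q):
                e = d
                d = p / q
            else:                         # otherwise fall back to bisection
                d = m
                e = m
        else:                             # bisection
            d = m
            e = m
        c, fc = b, fb
        if abs(d) > tol:
            b = b + d
        else:
            b = b - sgn(b - a) * tol
        fb = f(b)
    return b

root = fzerosimp(lambda x: math.exp(-x) - x, 0.0, 1.0)
print(root)
```

The initial assignment tol = 0.000001 in the figure is omitted here because the tolerance is recomputed from machine epsilon on every pass through the loop before it is first used.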
Note that Sec. 7.7.2 presents an application of Brent’s method where we illustrate
how MATLAB’s fzero function works. In addition, it is employed in Case Study
8.4 to determine the friction factor for air flow through a tube.
6.5 MULTIPLE ROOTS
A multiple root corresponds to a point where a function is tangent to the x axis. For
example, a double root results from
$$f(x) = (x - 3)(x - 1)(x - 1) \tag{6.11}$$
or, multiplying terms, $f(x) = x^3 - 5x^2 + 7x - 3$. The equation has a double root because
one value of x makes two terms in Eq. (6.11) equal to zero. Graphically, this corresponds
to the curve touching the x axis tangentially at the double root. Examine Fig. 6.13a at
$x = 1$. Notice that the function touches the axis but does not cross it at the root.
A triple root corresponds to the case where one x value makes three terms in an
equation equal to zero, as in
$$f(x) = (x - 3)(x - 1)(x - 1)(x - 1)$$
or, multiplying terms, $f(x) = x^4 - 6x^3 + 12x^2 - 10x + 3$. Notice that the graphical
depiction (Fig. 6.13b) again indicates that the function is tangent to the axis at the root,
but that for this case the axis is crossed. In general, odd multiple roots cross the axis,
whereas even ones do not. For example, the quadruple root in Fig. 6.13c does not cross
the axis.
Multiple roots pose some difficulties for many of the numerical methods described
in Part Two:
1. The fact that the function does not change sign at even multiple roots precludes
the use of the reliable bracketing methods that were discussed in Chap. 5. Thus,
of the methods covered in this book, you are limited to the open methods that
may diverge.
2. Another possible problem is related to the fact that not only $f(x)$ but also $f'(x)$ goes
to zero at the root. This poses problems for both the Newton-Raphson and secant
methods, which both contain the derivative (or its estimate) in the denominator of
their respective formulas. This could result in division by zero when the solution
converges very close to the root. A simple way to circumvent these problems is based
on the fact that it can be demonstrated theoretically (Ralston and Rabinowitz, 1978)
that $f(x)$ will always reach zero before $f'(x)$. Therefore, if a zero check for $f(x)$ is
incorporated into the computer program, the computation can be terminated before
$f'(x)$ reaches zero.
3. It can be demonstrated that the Newton-Raphson and secant methods are linearly,
rather than quadratically, convergent for multiple roots (Ralston and Rabinowitz,
1978). Modifications have been proposed to alleviate this problem. Ralston and
Rabinowitz (1978) have indicated that a slight change in the formulation returns it to
quadratic convergence, as in
$$x_{i+1} = x_i - m \frac{f(x_i)}{f'(x_i)} \tag{6.12}$$
where m is the multiplicity of the root (that is, $m = 2$ for a double root, $m = 3$ for
a triple root, etc.). Of course, this may be an unsatisfactory alternative because it
hinges on foreknowledge of the multiplicity of the root.
Another alternative, also suggested by Ralston and Rabinowitz (1978), is to define
a new function $u(x)$, that is, the ratio of the function to its derivative, as in
$$u(x) = \frac{f(x)}{f'(x)} \tag{6.13}$$
It can be shown that this function has roots at all the same locations as the original
function. Therefore, Eq. (6.13) can be substituted into Eq. (6.6) to develop an alternative
form of the Newton-Raphson method:
$$x_{i+1} = x_i - \frac{u(x_i)}{u'(x_i)} \tag{6.14}$$
Equation (6.13) can be differentiated to give
$$u'(x) = \frac{f'(x) f'(x) - f(x) f''(x)}{[f'(x)]^2} \tag{6.15}$$
Equations (6.13) and (6.15) can be substituted into Eq. (6.14) and the result simplified
to yield
$$x_{i+1} = x_i - \frac{f(x_i) f'(x_i)}{[f'(x_i)]^2 - f(x_i) f''(x_i)} \tag{6.16}$$
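For readers who want the intermediate step, the simplification can be written out as follows. This is only a worked restatement that uses Eqs. (6.13) through (6.15); no new assumptions are introduced.

```latex
x_{i+1} = x_i - \frac{u(x_i)}{u'(x_i)}
        = x_i - \frac{f(x_i)}{f'(x_i)} \cdot
          \frac{[f'(x_i)]^2}{[f'(x_i)]^2 - f(x_i)\, f''(x_i)}
        = x_i - \frac{f(x_i)\, f'(x_i)}{[f'(x_i)]^2 - f(x_i)\, f''(x_i)}
```

One factor of $f'(x_i)$ cancels between the numerator of the second fraction and the denominator of the first, which is why the final formula stays well behaved as $f'(x_i)$ approaches zero at a multiple root.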
EXAMPLE 6.10 Modified Newton-Raphson Method for Multiple Roots
Problem Statement. Use both the standard and modified Newton-Raphson methods to
evaluate the multiple root of Eq. (6.11), with an initial guess of $x_0 = 0$.
FIGURE 6.13
Examples of multiple roots that are tangential to the x axis. Notice that the function does not cross the axis on either side of even multiple roots (a) and (c), whereas it crosses the axis for odd cases (b). [Three panels plotting f(x) versus x: (a) double root, (b) triple root, (c) quadruple root, each at x = 1 with a further root at x = 3.]
Solution. The first derivative of Eq. (6.11) is $f'(x) = 3x^2 - 10x + 7$, and therefore,
the standard Newton-Raphson method for this problem is [Eq. (6.6)]
$$x_{i+1} = x_i - \frac{x_i^3 - 5x_i^2 + 7x_i - 3}{3x_i^2 - 10x_i + 7}$$
which can be solved iteratively for
i    $x_i$         $\varepsilon_t$ (%)
0    0           100
1    0.4285714   57
2    0.6857143   31
3    0.8328654   17
4    0.9133290   8.7
5    0.9557833   4.4
6    0.9776551   2.2
As anticipated, the method is linearly convergent toward the true value of 1.0.
For the modified method, the second derivative is $f''(x) = 6x - 10$, and the iterative
relationship is [Eq. (6.16)]
$$x_{i+1} = x_i - \frac{(x_i^3 - 5x_i^2 + 7x_i - 3)(3x_i^2 - 10x_i + 7)}{(3x_i^2 - 10x_i + 7)^2 - (x_i^3 - 5x_i^2 + 7x_i - 3)(6x_i - 10)}$$
which can be solved for
i    $x_i$        $\varepsilon_t$ (%)
0    0          100
1    1.105263   11
2    1.003082   0.31
3    1.000002   0.00024
Thus, the modified formula is quadratically convergent. We can also use both methods
to search for the single root at $x = 3$. Using an initial guess of $x_0 = 4$ gives the following
results:
i    Standard   $\varepsilon_t$ (%)            Modified   $\varepsilon_t$ (%)
0    4          33                 4          33
1    3.4        13                 2.636364   12
2    3.1        3.3                2.820225   6.0
3    3.008696   0.29               2.961728   1.3
4    3.000075   0.0025             2.998479   0.051
5    3.000000   $2 \times 10^{-7}$      2.999998   $7.7 \times 10^{-5}$
Thus, both methods converge quickly, with the standard method being somewhat more
efficient.
The preceding example illustrates the trade-offs involved in opting for the modified
Newton-Raphson method. Although it is preferable for multiple roots, it is somewhat
less efficient and requires more computational effort than the standard method for simple
roots.
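The trade-off can be checked numerically with a short Python sketch that applies the standard formula, Eq. (6.6), and the modified formula, Eq. (6.16), to the function of Eq. (6.11). The function names are illustrative assumptions, and the derivatives are coded by hand for this one polynomial.

```python
def f(x):
    return x**3 - 5*x**2 + 7*x - 3       # Eq. (6.11), expanded

def df(x):
    return 3*x**2 - 10*x + 7             # first derivative

def d2f(x):
    return 6*x - 10                      # second derivative

def newton_standard(x, n_iter):
    """n_iter steps of the standard Newton-Raphson formula, Eq. (6.6)."""
    for _ in range(n_iter):
        x = x - f(x) / df(x)
    return x

def newton_modified(x, n_iter):
    """n_iter steps of the modified formula, Eq. (6.16)."""
    for _ in range(n_iter):
        x = x - f(x) * df(x) / (df(x)**2 - f(x) * d2f(x))
    return x

# Double root at x = 1, starting from x0 = 0
print(newton_standard(0.0, 6), newton_modified(0.0, 3))
# Simple root at x = 3, starting from x0 = 4
print(newton_standard(4.0, 5), newton_modified(4.0, 5))
```

Each modified step needs three function evaluations (f, its first derivative, and its second derivative) against two for the standard step, which is the extra computational effort referred to above.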
It should be noted that a modified version of the secant method suited for multiple
roots can also be developed by substituting Eq. (6.13) into Eq. (6.7). The resulting
formula is (Ralston and Rabinowitz, 1978)
$$x_{i+1} = x_i - \frac{u(x_i)(x_{i-1} - x_i)}{u(x_{i-1}) - u(x_i)}$$
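A hedged Python sketch of this formula, again using the double root of Eq. (6.11), is given below. The name `secant_multiple`, the absolute step tolerance `tol`, and the starting guesses of 0 and 0.5 are assumptions chosen for illustration. The check for f(x) = 0 before forming u(x) follows the zero-check idea discussed earlier in this section.

```python
def f(x):
    return x**3 - 5*x**2 + 7*x - 3       # Eq. (6.11), double root at x = 1

def df(x):
    return 3*x**2 - 10*x + 7

def u(x):
    return f(x) / df(x)                  # Eq. (6.13)

def secant_multiple(x_prev, x_curr, tol=1e-6, max_it=50):
    """Secant iteration applied to u(x) = f(x)/f'(x)."""
    for _ in range(max_it):
        if f(x_curr) == 0.0:             # landed on the root: stop before dividing
            return x_curr
        u_prev, u_curr = u(x_prev), u(x_curr)
        denom = u_prev - u_curr
        if denom == 0.0:                 # no further progress is possible
            break
        x_new = x_curr - u_curr * (x_prev - x_curr) / denom
        x_prev, x_curr = x_curr, x_new
        if abs(x_curr - x_prev) < tol:   # step size below tolerance
            break
    return x_curr

root = secant_multiple(0.0, 0.5)
print(root)
```

Because f(x) is evaluated in expanded form, subtractive cancellation limits how accurately a double root can be resolved, so a very tight tolerance would not be meaningful here.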
6.6 SYSTEMS OF NONLINEAR EQUATIONS
To this point, we have focused on the determination of the roots of a single equation. A
related problem is to locate the roots of a set of simultaneous equations,
$$\begin{aligned} f_1(x_1, x_2, \ldots, x_n) &= 0 \\ f_2(x_1, x_2, \ldots, x_n) &= 0 \\ &\;\;\vdots \\ f_n(x_1, x_2, \ldots, x_n) &= 0 \end{aligned} \tag{6.17}$$
The solution of this system consists of a set of x values that simultaneously result in all
the equations equaling zero.
In Part Three, we will present methods for the case where the simultaneous equations
are linear—that is, they can be expressed in the general form
$$f(x) = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n - b = 0 \tag{6.18}$$
where the b and the a’s are constants. Algebraic and transcendental equations that do not
fit this format are called nonlinear equations. For example,
$$x^2 + xy = 10$$
and
$$y + 3xy^2 = 57$$
are two simultaneous nonlinear equations with two unknowns, x and y. They can be
expressed in the form of Eq. (6.17) as
$$u(x, y) = x^2 + xy - 10 = 0 \tag{6.19a}$$
$$v(x, y) = y + 3xy^2 - 57 = 0 \tag{6.19b}$$
Thus, the solution would be the values of x and y that make the functions $u(x, y)$ and
$v(x, y)$ equal to zero. Most approaches for determining such solutions are extensions of
the open methods for solving single equations. In this section, we will investigate two
of these: fixed-point iteration and Newton-Raphson.
6.6.1 Fixed-Point Iteration
The fixed-point-iteration approach (Sec. 6.1) can be modified to solve two simultaneous,
nonlinear equations. This approach will be illustrated in the following example.
EXAMPLE 6.11 Fixed-Point Iteration for a Nonlinear System
Problem Statement. Use fixed-point iteration to determine the roots of Eq. (6.19). Note
that a correct pair of roots is $x = 2$ and $y = 3$. Initiate the computation with guesses of
$x = 1.5$ and $y = 3.5$.
Solution. Equation (6.19a) can be solved for
$$x_{i+1} = \frac{10 - x_i^2}{y_i} \tag{E6.11.1}$$
and Eq. (6.19b) can be solved for
$$y_{i+1} = 57 - 3x_i y_i^2 \tag{E6.11.2}$$
Note that we will drop the subscripts for the remainder of the example.
On the basis of the initial guesses, Eq. (E6.11.1) can be used to determine a new
value of x:
$$x = \frac{10 - (1.5)^2}{3.5} = 2.21429$$
This result and the initial value of $y = 3.5$ can be substituted into Eq. (E6.11.2) to
determine a new value of y:
$$y = 57 - 3(2.21429)(3.5)^2 = -24.37516$$
Thus, the approach seems to be diverging. This behavior is even more pronounced on
the second iteration:
$$x = \frac{10 - (2.21429)^2}{-24.37516} = -0.20910$$
$$y = 57 - 3(-0.20910)(-24.37516)^2 = 429.709$$
Obviously, the approach is deteriorating.
Now we will repeat the computation but with the original equations set up in a
different format. For example, an alternative formulation of Eq. (6.19a) is
$$x = \sqrt{10 - xy}$$
and of Eq. (6.19b) is
$$y = \sqrt{\frac{57 - y}{3x}}$$
Now the results are more satisfactory:
$$x = \sqrt{10 - 1.5(3.5)} = 2.17945$$
$$y = \sqrt{\frac{57 - 3.5}{3(2.17945)}} = 2.86051$$
$$x = \sqrt{10 - 2.17945(2.86051)} = 1.94053$$
$$y = \sqrt{\frac{57 - 2.86051}{3(1.94053)}} = 3.04955$$
Thus, the approach is converging on the true values of $x = 2$ and $y = 3$.
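Both formulations of this example can be compared with a short Python sketch. The helper name `fixed_point_system` is hypothetical; as in the hand calculation, each new x is used immediately when the new y is computed. Because the code carries full precision, the trailing digits can differ slightly from the values above, which were computed from rounded intermediate results.

```python
import math

def fixed_point_system(gx, gy, x, y, n_iter):
    """Iterate x = gx(x, y), then y = gy(x, y) using the updated x."""
    history = [(x, y)]
    for _ in range(n_iter):
        x = gx(x, y)
        y = gy(x, y)          # uses the x just computed, as in the example
        history.append((x, y))
    return history

# Formulation of Eqs. (E6.11.1) and (E6.11.2): moves away from (2, 3)
diverging = fixed_point_system(lambda x, y: (10 - x**2) / y,
                               lambda x, y: 57 - 3 * x * y**2,
                               1.5, 3.5, 2)

# Alternative square-root formulation: approaches (2, 3)
converging = fixed_point_system(lambda x, y: math.sqrt(10 - x * y),
                                lambda x, y: math.sqrt((57 - y) / (3 * x)),
                                1.5, 3.5, 30)

print(diverging[-1])
print(converging[-1])
```

The same driver serves both cases; only the rearranged equations passed in as `gx` and `gy` change, which underlines that convergence here is a property of the formulation rather than of the iteration loop.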
The previous example illustrates the most serious shortcoming of simple fixed-point
iteration—that is, convergence often depends on the manner in which the equations are
formulated. Additionally, even in those instances where convergence is possible, diver-
gence can occur if the initial guesses are insufficiently close to the true solution. Using
reasoning similar to that in Box 6.1, it can be demonstrated that sufficient conditions for
convergence for the two-equation case are
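$$\left| \frac{\partial u}{\partial x} \right| + \left| \frac{\partial u}{\partial y} \right| < 1$$
and
$$\left| \frac{\partial v}{\partial x} \right| + \left| \frac{\partial v}{\partial y} \right| < 1$$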
These criteria are so restrictive that fixed-point iteration has limited utility for solving
nonlinear systems. However, as we will describe later in the book, it can be very useful
for solving linear systems.
6.6.2 Newton-Raphson
Recall that the Newton-Raphson method was predicated on employing the derivative (that
is, the slope) of a function to estimate its intercept with the axis of the independent
variable—that is, the root (Fig. 6.5). This estimate was based on a first-order Taylor
series expansion (recall Box 6.2),
$$f(x_{i+1}) = f(x_i) + (x_{i+1} - x_i) f'(x_i) \tag{6.20}$$
where $x_i$ is the initial guess at the root and $x_{i+1}$ is the point at which the slope intercepts
the x axis. At this intercept, $f(x_{i+1})$ by definition equals zero and Eq. (6.20) can be rearranged
to yield
$$x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)} \tag{6.21}$$
which is the single-equation form of the Newton-Raphson method.
The multiequation form is derived in an identical fashion. However, a multivariable
Taylor series must be used to account for the fact that more than one independent
variable contributes to the determination of the root. For the two-variable case, a first-order
Taylor series can be written [recall Eq. (4.26)] for each nonlinear equation as
$$u_{i+1} = u_i + (x_{i+1} - x_i) \frac{\partial u_i}{\partial x} + (y_{i+1} - y_i) \frac{\partial u_i}{\partial y} \tag{6.22a}$$
and
$$v_{i+1} = v_i + (x_{i+1} - x_i) \frac{\partial v_i}{\partial x} + (y_{i+1} - y_i) \frac{\partial v_i}{\partial y} \tag{6.22b}$$
Just as for the single-equation version, the root estimate corresponds to the values of x and
y, where $u_{i+1}$ and $v_{i+1}$ equal zero. For this situation, Eq. (6.22) can be rearranged to give
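$$\frac{\partial u_i}{\partial x} x_{i+1} + \frac{\partial u_i}{\partial y} y_{i+1} = -u_i + x_i \frac{\partial u_i}{\partial x} + y_i \frac{\partial u_i}{\partial y} \tag{6.23a}$$
$$\frac{\partial v_i}{\partial x} x_{i+1} + \frac{\partial v_i}{\partial y} y_{i+1} = -v_i + x_i \frac{\partial v_i}{\partial x} + y_i \frac{\partial v_i}{\partial y} \tag{6.23b}$$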
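Because all values subscripted with i’s are known (they correspond to the latest guess
or approximation), the only unknowns are $x_{i+1}$ and $y_{i+1}$. Thus, Eq. (6.23) is a set of two
linear equations with two unknowns [compare with Eq. (6.18)]. Consequently, algebraic
manipulations (for example, Cramer’s rule) can be employed to solve for
$$x_{i+1} = x_i - \frac{u_i \dfrac{\partial v_i}{\partial y} - v_i \dfrac{\partial u_i}{\partial y}}{\dfrac{\partial u_i}{\partial x} \dfrac{\partial v_i}{\partial y} - \dfrac{\partial u_i}{\partial y} \dfrac{\partial v_i}{\partial x}} \tag{6.24a}$$
$$y_{i+1} = y_i - \frac{v_i \dfrac{\partial u_i}{\partial x} - u_i \dfrac{\partial v_i}{\partial x}}{\dfrac{\partial u_i}{\partial x} \dfrac{\partial v_i}{\partial y} - \dfrac{\partial u_i}{\partial y} \dfrac{\partial v_i}{\partial x}} \tag{6.24b}$$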
The denominator of each of these equations is formally referred to as the determinant
of the Jacobian of the system.
Equation (6.24) is the two-equation version of the Newton-Raphson method. As in
the following example, it can be employed iteratively to home in on the roots of two
simultaneous equations.
EXAMPLE 6.12 Newton-Raphson for a Nonlinear System
Problem Statement. Use the multiple-equation Newton-Raphson method to determine
roots of Eq. (6.19). Note that a correct pair of roots is $x = 2$ and $y = 3$. Initiate the
computation with guesses of $x = 1.5$ and $y = 3.5$.
Solution. First compute the partial derivatives and evaluate them at the initial guesses
of x and y:
$$\frac{\partial u_0}{\partial x} = 2x + y = 2(1.5) + 3.5 = 6.5 \qquad \frac{\partial u_0}{\partial y} = x = 1.5$$
$$\frac{\partial v_0}{\partial x} = 3y^2 = 3(3.5)^2 = 36.75 \qquad \frac{\partial v_0}{\partial y} = 1 + 6xy = 1 + 6(1.5)(3.5) = 32.5$$
Thus, the determinant of the Jacobian for the first iteration is
$$6.5(32.5) - 1.5(36.75) = 156.125$$
The values of the functions can be evaluated at the initial guesses as
$$u_0 = (1.5)^2 + 1.5(3.5) - 10 = -2.5$$
$$v_0 = 3.5 + 3(1.5)(3.5)^2 - 57 = 1.625$$
These values can be substituted into Eq. (6.24) to give
$$x = 1.5 - \frac{-2.5(32.5) - 1.625(1.5)}{156.125} = 2.03603$$
$$y = 3.5 - \frac{1.625(6.5) - (-2.5)(36.75)}{156.125} = 2.84388$$
Thus, the results are converging to the true values of $x = 2$ and $y = 3$. The computation
can be repeated until an acceptable accuracy is obtained.
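The repeated computation can be sketched in Python as follows. The partial derivatives are coded by hand for Eq. (6.19), and the names `newton_step` and `newton_system` as well as the fixed iteration count are illustrative assumptions rather than a prescribed program structure.

```python
def u(x, y):
    return x**2 + x*y - 10          # Eq. (6.19a)

def v(x, y):
    return y + 3*x*y**2 - 57        # Eq. (6.19b)

def du_dx(x, y): return 2*x + y
def du_dy(x, y): return x
def dv_dx(x, y): return 3*y**2
def dv_dy(x, y): return 1 + 6*x*y

def newton_step(x, y):
    """One application of Eq. (6.24)."""
    det = du_dx(x, y) * dv_dy(x, y) - du_dy(x, y) * dv_dx(x, y)  # Jacobian determinant
    x_new = x - (u(x, y) * dv_dy(x, y) - v(x, y) * du_dy(x, y)) / det
    y_new = y - (v(x, y) * du_dx(x, y) - u(x, y) * dv_dx(x, y)) / det
    return x_new, y_new

def newton_system(x, y, n_iter=10):
    """Repeat the two-equation Newton-Raphson step n_iter times."""
    for _ in range(n_iter):
        x, y = newton_step(x, y)
    return x, y

print(newton_step(1.5, 3.5))        # first iteration of Example 6.12
print(newton_system(1.5, 3.5))      # after further iterations
```

A practical program would replace the fixed iteration count with a test on the approximate relative errors of both unknowns and would check that the determinant is not zero before dividing.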
Just as with fixed-point iteration, the Newton-Raphson approach will often diverge if
the initial guesses are not sufficiently close to the true roots. Whereas graphical methods
could be employed to derive good guesses for the single-equation case, no such simple
procedure is available for the multiequation version. Although there are some advanced
approaches for obtaining acceptable first estimates, often the initial guesses must be ob-
tained on the basis of trial and error and knowledge of the physical system being modeled.
The two-equation Newton-Raphson approach can be generalized to solve n simulta-
neous equations. Because the most efficient way to do this involves matrix algebra and
the solution of simultaneous linear equations, we will defer discussion of the general
approach to Part Three.
PROBLEMS
6.1 Use simple fixed-point iteration to locate the root of
$$f(x) = \sin(\sqrt{x}) - x$$
Use an initial guess of $x_0 = 0.5$ and iterate until $\varepsilon_a \le 0.01\%$. Verify
that the process is linearly convergent as described in Box 6.1.
6.2 Determine the highest real root of
$$f(x) = 2x^3 - 11.7x^2 + 17.7x - 5$$
(a) Graphically.
(b) Fixed-point iteration method (three iterations, $x_0 = 3$). Note: Make
certain that you develop a solution that converges on the root.
(c) Newton-Raphson method (three iterations, $x_0 = 3$).
(d) Secant method (three iterations, $x_{-1} = 3$, $x_0 = 4$).
(e) Modified secant method (three iterations, $x_0 = 3$, $\delta = 0.01$).
Compute the approximate percent relative errors for your solutions.
6.3 Use (a) fixed-point iteration and (b) the Newton-Raphson
method to determine a root of $f(x) = -0.9x^2 + 1.7x + 2.5$ using
$x_0 = 5$. Perform the computation until $\varepsilon_a$ is less than $\varepsilon_s = 0.01\%$.
Also perform an error check of your final answer.
6.4 Determine the real roots of $f(x) = -1 + 5.5x - 4x^2 + 0.5x^3$:
(a) graphically and (b) using the Newton-Raphson method to
within $\varepsilon_s = 0.01\%$.
6.5 Employ the Newton-Raphson method to determine a real root for
$f(x) = -1 + 5.5x - 4x^2 + 0.5x^3$ using initial guesses of (a) 4.52
and (b) 4.54. Discuss and use graphical and analytical methods to explain
any peculiarities in your results.
6.6 Determine the lowest real root of $f(x) = -12 - 21x + 18x^2 - 2.4x^3$:
(a) graphically and (b) using the secant method to a
value of $\varepsilon_s$ corresponding to three significant figures.
6.7 Locate the first positive root of
$$f(x) = \sin x + \cos(1 + x^2) - 1$$
where x is in radians. Use four iterations of the secant method with
initial guesses of (a) $x_{i-1} = 1.0$ and $x_i = 3.0$; (b) $x_{i-1} = 1.5$ and
$x_i = 2.5$, and (c) $x_{i-1} = 1.5$ and $x_i = 2.25$ to locate the root. (d) Use
the graphical method to explain your results.
6.8 Determine the real root of $x^{3.5} = 80$, with the modified secant
method to within $\varepsilon_s = 0.1\%$ using an initial guess of $x_0 = 3.5$ and
$\delta = 0.01$.
6.9 Determine the highest real root of $f(x) = x^3 - 6x^2 + 11x - 6.1$:
(a) Graphically.
(b) Using the Newton-Raphson method (three iterations, $x_i = 3.5$).
(c) Using the secant method (three iterations, $x_{i-1} = 2.5$ and
$x_i = 3.5$).
(d) Using the modified secant method (three iterations, $x_i = 3.5$,
$\delta = 0.01$).
6.10 Determine the lowest positive root of $f(x) = 7 \sin(x) e^{-x} - 1$:
(a) Graphically.
(b) Using the Newton-Raphson method (three iterations, $x_i = 0.3$).
(c) Using the secant method (five iterations, $x_{i-1} = 0.5$ and
$x_i = 0.4$).
(d) Using the modified secant method (three iterations, $x_i = 0.3$,
$\delta = 0.01$).
6.11 Use the Newton-Raphson method to find the root of
$$f(x) = e^{-0.5x}(4 - x) - 2$$
Employ initial guesses of (a) 2, (b) 6, and (c) 8. Explain your results.
6.12 Given
$$f(x) = -2x^6 - 1.5x^4 + 10x + 2$$
Use a root location technique to determine the maximum of this
function. Perform iterations until the approximate relative error
falls below 5%. If you use a bracketing method, use initial guesses
of $x_l = 0$ and $x_u = 1$. If you use the Newton-Raphson or the modified
secant method, use an initial guess of $x_i = 1$. If you use the
secant method, use initial guesses of $x_{i-1} = 0$ and $x_i = 1$. Assuming
that convergence is not an issue, choose the technique that is best
suited to this problem. Justify your choice.
6.13 You must determine the root of the following easily differentiable
function,
$$e^{0.5x} = 5 - 5x$$
Pick the best numerical technique, justify your choice and then
use that technique to determine the root. Note that it is known
that for positive initial guesses, all techniques except fixed-point
iteration will eventually converge. Perform iterations until the
approximate relative error falls below 2%. If you use a bracketing
method, use initial guesses of $x_l = 0$ and $x_u = 2$. If you use
the Newton-Raphson or the modified secant method, use an initial
guess of $x_i = 0.7$. If you use the secant method, use initial
guesses of $x_{i-1} = 0$ and $x_i = 2$.
6.14 Use (a) the Newton-Raphson method and (b) the modified
secant method ($\delta = 0.05$) to determine a root of
$f(x) = x^5 - 16.05x^4 + 88.75x^3 - 192.0375x^2 + 116.35x + 31.6875$ using an initial guess
of $x = 0.5825$ and $\varepsilon_s = 0.01\%$. Explain your results.
6.15 The “divide and average” method, an old-time method for
approximating the square root of any positive number a, can be
formulated as
$$x = \frac{x + a/x}{2}$$
Prove that this is equivalent to the Newton-Raphson algorithm.
6.16 (a) Apply the Newton-Raphson method to the function
$f(x) = \tanh(x^2 - 9)$ to evaluate its known real root at $x = 3$. Use an initial
guess of $x_0 = 3.2$ and take a minimum of four iterations. (b) Did the
method exhibit convergence onto its real root? Sketch the plot with
the results for each iteration shown.
6.17 The polynomial $f(x) = 0.0074x^4 - 0.284x^3 + 3.355x^2 - 12.183x + 5$
has a real root between 15 and 20. Apply the Newton-Raphson
method to this function using an initial guess of $x_0 = 16.15$.
Explain your results.
6.18 Use the secant method on the circle function
$(x + 1)^2 + (y - 2)^2 = 16$ to find a positive real root. Set your initial guess to
$x_i = 3$ and $x_{i-1} = 0.5$. Approach the solution from the first and
fourth quadrants. When solving for $f(x)$ in the fourth quadrant, be
sure to take the negative value of the square root. Why does your
solution diverge?
6.19 You are designing a spherical tank (Fig. P6.19) to hold water
for a small village in a developing country. The volume of liquid it
can hold can be computed as
$$V = \pi h^2 \frac{[3R - h]}{3}$$
where V = volume (m$^3$), h = depth of water in tank (m), and R =
the tank radius (m). If R = 3 m, what depth must the tank be filled
to so that it holds 30 m$^3$? Use three iterations of the Newton-Raphson
method to determine your answer. Determine the approximate
relative error after each iteration. Note that an initial
guess of R will always converge.
6.20 The Manning equation can be written for a rectangular open
channel as
$$Q = \frac{\sqrt{S}\,(BH)^{5/3}}{n(B + 2H)^{2/3}}$$
where Q = flow [m$^3$/s], S = slope [m/m], H = depth [m], and n =
the Manning roughness coefficient. Develop a fixed-point iteration
scheme to solve this equation for H given Q = 5, S = 0.0002, B = 20,
and n = 0.03. Prove that your scheme converges for all initial guesses
greater than or equal to zero.
6.21 The function $x^3 - 2x^2 - 4x + 8$ has a double root at $x = 2$.
Use (a) the standard Newton-Raphson [Eq. (6.6)], (b) the modified
Newton-Raphson [Eq. (6.12)], and (c) the modified Newton-Raphson
[Eq. (6.16)] to solve for the root at $x = 2$. Compare and
discuss the rate of convergence using an initial guess of $x_0 = 1.2$.
6.22 Determine the roots of the following simultaneous nonlinear
equations using (a) fixed-point iteration and (b) the Newton-Raphson
method:
$$y = -x^2 + x + 0.75$$
$$y + 5xy = x^2$$
Employ initial guesses of $x = y = 1.2$ and discuss the results.
6.23 Determine the roots of the simultaneous nonlinear equations
$$(x - 4)^2 + (y - 4)^2 = 5$$
$$x^2 + y^2 = 16$$
Use a graphical approach to obtain your initial guesses. Determine
refined estimates with the two-equation Newton-Raphson method
described in Sec. 6.6.2.
6.24 Repeat Prob. 6.23 except determine the positive root of
$$y = x^2 + 1$$
$$y = 2 \cos x$$
6.25 A mass balance for a pollutant in a well-mixed lake can be
written as
$$V \frac{dc}{dt} = W - Qc - kV\sqrt{c}$$
Given the parameter values $V = 1 \times 10^6$ m$^3$, $Q = 1 \times 10^5$ m$^3$/yr,
$W = 1 \times 10^6$ g/yr, and $k = 0.25$ m$^{0.5}$/g$^{0.5}$/yr, use the modified secant
method to solve for the steady-state concentration. Employ an initial
guess of $c = 4$ g/m$^3$ and $\delta = 0.5$. Perform three iterations and
determine the percent relative error after the third iteration.
6.26 For Prob. 6.25, the root can be located with fixed-point
iteration as
$$c = \left( \frac{W - Qc}{kV} \right)^2$$
or as
$$c = \frac{W - kV\sqrt{c}}{Q}$$
Only one will converge for initial guesses of $2 < c < 6$. Select the
correct one and demonstrate why it will always work.
6.27 Develop a user-friendly program for the Newton-Raphson
method based on Fig. 6.4 and Sec. 6.2.3. Test it by duplicating the
computation from Example 6.3.
6.28 Develop a user-friendly program for the secant method based
on Fig. 6.4 and Sec. 6.3.2. Test it by duplicating the computation
from Example 6.6.
6.29 Develop a user-friendly program for the modified secant
method based on Fig. 6.4 and Sec. 6.3.2. Test it by duplicating the
computation from Example 6.8.
6.30 Develop a user-friendly program for Brent’s root location
method based on Fig. 6.12. Test it by solving Prob. 6.6.
6.31 Develop a user-friendly program for the two-equation
Newton-Raphson method based on Sec. 6.6.2. Test it by solving
Example 6.12.
6.32 Use the program you developed in Prob. 6.31 to solve Probs.
6.22 and 6.23 to within a tolerance of es 5 0.01%.
FIGURE P6.19
[Sketch of a spherical tank labeled with radius R, liquid volume V, and depth h.]
CHAPTER 7
Roots of Polynomials
In this chapter, we will discuss methods to find the roots of polynomial equations of the
general form
$$f_n(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n \tag{7.1}$$
where n = the order of the polynomial and the a’s = constant coefficients. Although the
coefficients can be complex numbers, we will limit our discussion to cases where they
are real. For such cases, the roots can be real and/or complex.
The roots of such polynomials follow these rules:
1. For an nth-order equation, there are n real or complex roots. It should be noted that
these roots will not necessarily be distinct.
2. If n is odd, there is at least one real root.
3. If complex roots exist, they exist in conjugate pairs (that is, $\lambda + \mu i$ and $\lambda - \mu i$),
where $i = \sqrt{-1}$.
Before describing the techniques for locating the roots of polynomials, we will provide
some background. The first section offers some motivation for studying the techniques;
the second deals with some fundamental computer manipulations involving polynomials.
7.1 POLYNOMIALS IN ENGINEERING AND SCIENCE
Polynomials have many applications in engineering and science. For example, they are used
extensively in curve-fitting. However, we believe that one of their most interesting and
powerful applications is in characterizing dynamic systems and, in particular, linear systems.
Examples include mechanical devices, structures, and electrical circuits. We will be exploring
specific examples throughout the remainder of this text. In particular, they will be the
focus of several of the engineering applications.
For the time being, we will keep the discussion simple and general by focusing on
a simple second-order system defined by the following linear ordinary differential equation
(or ODE):
$$a_2 \frac{d^2 y}{dt^2} + a_1 \frac{dy}{dt} + a_0 y = F(t) \tag{7.2}$$
where y and t are the dependent and independent variables, respectively, the a’s are
constant coefficients, and F(t) is the forcing function.
In addition, it should be noted that Eq. (7.2) can be alternatively expressed as a pair
of first-order ODEs by defining a new variable z,
$$z = \frac{dy}{dt} \tag{7.3}$$
Equation (7.3) can be substituted along with its derivative into Eq. (7.2) to remove the
second-derivative term. This reduces the problem to solving
$$\frac{dz}{dt} = \frac{F(t) - a_1 z - a_0 y}{a_2} \tag{7.4}$$
$$\frac{dy}{dt} = z \tag{7.5}$$
In a similar fashion, an nth-order linear ODE can always be expressed as a system of n
first-order ODEs.
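The pair of first-order equations can be written as a single function that returns both derivatives. The following is a minimal Python sketch; the function name `second_order_rhs` and its argument order are illustrative assumptions, not part of the text.

```python
def second_order_rhs(t, y, z, a2, a1, a0, forcing):
    """Right-hand sides of Eqs. (7.5) and (7.4): returns (dy/dt, dz/dt).

    `forcing` is a callable F(t); all names here are hypothetical.
    """
    dydt = z                                      # Eq. (7.5)
    dzdt = (forcing(t) - a1 * z - a0 * y) / a2    # Eq. (7.4)
    return dydt, dzdt


# Unforced example: y'' + 3y' + 2y = 0, evaluated at y = 1, z = 0
dydt, dzdt = second_order_rhs(0.0, 1.0, 0.0, 1.0, 3.0, 2.0, lambda t: 0.0)
```

A function of this shape is what a numerical ODE integrator would typically be handed.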
Now let’s look at the solution. The forcing function represents the effect of the
external world on the system. The homogeneous or general solution of the equation deals
with the case when the forcing function is set to zero,
$$a_2 \frac{d^2 y}{dt^2} + a_1 \frac{dy}{dt} + a_0 y = 0 \tag{7.6}$$
Thus, as the name implies, the general solution should tell us something very fundamental
about the system being simulated—that is, how the system responds in the absence of
external stimuli.
Now, the general solution to all unforced linear systems is of the form $y = e^{rt}$. If
this function is differentiated and substituted into Eq. (7.6), the result is
$$a_2 r^2 e^{rt} + a_1 r e^{rt} + a_0 e^{rt} = 0$$
or canceling the exponential terms,
$$a_2 r^2 + a_1 r + a_0 = 0 \tag{7.7}$$
Notice that the result is a polynomial called the characteristic equation. The roots
of this polynomial are the values of r that satisfy Eq. (7.7). These r’s are referred to as
the system’s characteristic values, or eigenvalues.
So, here is the connection between roots of polynomials and engineering and
science. The eigenvalue tells us something fundamental about the system we are modeling,
and finding the eigenvalues involves finding the roots of polynomials. And, whereas
finding the root of a second-order equation is easy with the quadratic formula, finding
roots of higher-order systems (and hence, higher-order polynomials) is arduous analyti-
cally. Thus, the best general approach requires numerical methods of the type described
in this chapter.
Before proceeding to these methods, let us take our analysis a bit farther by investigating
what specific values of the eigenvalues might imply about the behavior of
physical systems. First, let us evaluate the roots of Eq. (7.7) with the quadratic
formula,
$$r_1, r_2 = \frac{-a_1 \pm \sqrt{a_1^2 - 4a_2 a_0}}{2a_2}$$
Thus, we get two roots. If the discriminant ($a_1^2 - 4a_2 a_0$) is positive, the roots are real
and the general solution can be represented as
$$y = c_1 e^{r_1 t} + c_2 e^{r_2 t} \tag{7.8}$$
where the $c$’s = constants that can be determined from the initial conditions. This is
called the overdamped case.
If the discriminant is zero, a single real root results, and the general solution can be
formulated as
$$y = (c_1 + c_2 t)e^{\lambda t} \tag{7.9}$$
This is called the critically damped case.
If the discriminant is negative, the roots will be complex conjugate numbers,
$$r_1, r_2 = \lambda \pm \mu i$$
and the general solution can be formulated as
$$y = c_1 e^{(\lambda + \mu i)t} + c_2 e^{(\lambda - \mu i)t}$$
FIGURE 7.1
The general solution for linear ODEs can be composed of (a) exponential and (b) sinusoidal
components. The combination of the two shapes results in the damped sinusoid shown in (c).
[Panels (a), (b), and (c) each plot y versus t.]
The physical behavior of this solution can be elucidated by using Euler’s formula
$$e^{\mu i t} = \cos \mu t + i \sin \mu t$$
to reformulate the general solution as (see Boyce and DiPrima, 1992, for details of the
derivation)
$$y = c_1 e^{\lambda t} \cos \mu t + c_2 e^{\lambda t} \sin \mu t \tag{7.10}$$
This is called the underdamped case.
Equations (7.8), (7.9), and (7.10) express the possible ways that linear systems re-
spond dynamically. The exponential terms mean that the solutions are capable of decay-
ing (negative real part) or growing (positive real part) exponentially with time (Fig. 7.1a).
The sinusoidal terms (imaginary part) mean that the solutions can oscillate (Fig. 7.1b).
If the eigenvalue has both real and imaginary parts, the exponential and sinusoidal shapes
are combined (Fig. 7.1c). Because such knowledge is a key element in understanding,
designing, and controlling the behavior of a physical system, characteristic polynomials
are very important in engineering and many branches of science. We will explore the
dynamics of several engineering systems in the applications covered in Chap. 8.
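The three cases above can be told apart by the sign of the discriminant of the characteristic equation. The following Python sketch classifies a second-order system and returns its eigenvalues; the function name `classify_response` is a hypothetical choice, and the exact comparison with zero is a simplification that would normally be replaced by a tolerance in floating-point work.

```python
import cmath


def classify_response(a2, a1, a0):
    """Classify a2*r**2 + a1*r + a0 = 0 (real coefficients, a2 != 0).

    Returns (case, r1, r2), with the roots given as complex numbers.
    """
    disc = a1 * a1 - 4.0 * a2 * a0
    root = cmath.sqrt(disc)              # handles a negative discriminant
    r1 = (-a1 + root) / (2.0 * a2)
    r2 = (-a1 - root) / (2.0 * a2)
    if disc > 0:
        case = "overdamped"              # two distinct real roots, Eq. (7.8)
    elif disc == 0:
        case = "critically damped"       # one repeated real root, Eq. (7.9)
    else:
        case = "underdamped"             # complex conjugate pair, Eq. (7.10)
    return case, r1, r2
```

For instance, coefficients 1, 2, 5 give a negative discriminant, so the roots form a conjugate pair whose real part sets the decay rate and whose imaginary part sets the oscillation frequency.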
7.2 COMPUTING WITH POLYNOMIALS
Before describing root-location methods, we will discuss some fundamental computer
operations involving polynomials. These have utility in their own right as well as provid-
ing support for root finding.
7.2.1 Polynomial Evaluation and Differentiation
Although it is the most common format, Eq. (7.1) provides a poor means for determin-
ing the value of a polynomial for a particular value of x. For example, evaluating a
third-order polynomial as
$$f_3(x) = a_3 x^3 + a_2 x^2 + a_1 x + a_0 \tag{7.11}$$
involves six multiplications and three additions. In general, for an nth-order polynomial,
this approach requires $n(n + 1)/2$ multiplications and $n$ additions.
In contrast, a nested format,
$$f_3(x) = ((a_3 x + a_2)x + a_1)x + a_0 \tag{7.12}$$
involves three multiplications and three additions. For an nth-order polynomial, this approach
requires $n$ multiplications and $n$ additions. Because the nested format minimizes
the number of operations, it also tends to minimize round-off errors. Note that, depending
on your preference, the order of nesting can be reversed:
$$f_3(x) = a_0 + x(a_1 + x(a_2 + x a_3)) \tag{7.13}$$
Succinct pseudocode to implement the nested form can be written simply as
  DOFOR j = n, 0, -1
    p = p * x + a(j)
  END DO
where p, which is initialized to zero before the loop, holds the value of the polynomial
(defined by its coefficients, the a’s) evaluated at x.
There are cases (such as in the Newton-Raphson method) where you might want to
evaluate both the function and its derivative. This evaluation can also be neatly included
by adding a single line to the preceding pseudocode,
  DOFOR j = n, 0, -1
    df = df * x + p
    p = p * x + a(j)
  END DO
where df holds the first derivative of the polynomial (both p and df are initialized to zero
before the loop).
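As a concrete counterpart to the pseudocode, the nested evaluation of a polynomial and its first derivative might look as follows in Python. This is a sketch under the assumption that the coefficients are stored in ascending order (`a[0]` is the constant term); the function name `horner` is illustrative.

```python
def horner(a, x):
    """Evaluate p(x) and p'(x) for a[0] + a[1]*x + ... + a[n]*x**n."""
    p = 0.0
    df = 0.0
    for coeff in reversed(a):   # from a[n] down to a[0]
        df = df * x + p         # derivative update uses the previous p
        p = p * x + coeff       # nested (Horner) update
    return p, df


# f(x) = x**3 - 13x - 12 evaluated at x = 5
value, slope = horner([-12.0, -13.0, 0.0, 1.0], 5.0)
```

The derivative line has to come before the update of `p`, because it relies on the value of `p` from the previous pass.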
7.2.2 Polynomial Deflation
Suppose that you determine a single root of an nth-order polynomial. If you repeat your
root location procedure, you might find the same root. Therefore, it would be nice to
remove the found root before proceeding. This removal process is referred to as polyno-
mial deflation.
Before we show how this is done, some orientation might be useful. Polynomials
are typically represented in the format of Eq. (7.1). For example, a fifth-order polynomial
could be written as
$$f_5(x) = -120 - 46x + 79x^2 - 3x^3 - 7x^4 + x^5 \tag{7.14}$$
Although this is a familiar format, it is not necessarily the best expression to understand
the polynomial’s mathematical behavior. For example, this fifth-order polynomial might
be expressed alternatively as
$$f_5(x) = (x + 1)(x - 4)(x - 5)(x + 3)(x - 2) \tag{7.15}$$
This is called the factored form of the polynomial. If multiplication is completed
and like terms collected, Eq. (7.14) would be obtained. However, the format of Eq. (7.15)
has the advantage that it clearly indicates the function’s roots. Thus, it is apparent that
$x = -1$, 4, 5, $-3$, and 2 are all roots because each causes an individual term in Eq. (7.15)
to become zero.
Now, suppose that we divide this fifth-order polynomial by any of its factors, for
example, $x + 3$. For this case, the result would be a fourth-order polynomial
$$f_4(x) = (x + 1)(x - 4)(x - 5)(x - 2) = -40 - 2x + 27x^2 - 10x^3 + x^4 \tag{7.16}$$
with a remainder of zero.
In the distant past, you probably learned to divide polynomials using the approach
called synthetic division. Several computer algorithms (based on both synthetic division
and other methods) are available for performing the operation. One simple scheme is
provided by the following pseudocode, which divides an nth-order polynomial by a
monomial factor $x - t$:
  r = a(n)
  a(n) = 0
  DOFOR i = n-1, 0, -1
    s = a(i)
    a(i) = r
    r = s + r * t
  END DO
If $x - t$ is a factor of the polynomial (that is, if t is a root), the remainder r will be zero,
and the coefficients of the quotient will be stored in a at the end of the loop.
EXAMPLE 7.1 Polynomial Deflation
Problem Statement. Divide the second-order polynomial,
$$f(x) = (x - 4)(x + 6) = x^2 + 2x - 24$$
by the factor $x - 4$.
Solution. Using the approach outlined in the above pseudocode, the parameters are
$n = 2$, $a_0 = -24$, $a_1 = 2$, $a_2 = 1$, and $t = 4$. These can be used to compute
$$r = a_2 = 1$$
$$a_2 = 0$$
The loop is then iterated from $i = 2 - 1 = 1$ to 0. For $i = 1$,
$$s = a_1 = 2$$
$$a_1 = r = 1$$
$$r = s + rt = 2 + 1(4) = 6$$
For $i = 0$,
$$s = a_0 = -24$$
$$a_0 = r = 6$$
$$r = -24 + 6(4) = 0$$
Thus, the result is as expected: the quotient is $a_0 + a_1 x = 6 + x$, with a remainder of zero.
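The same deflation can be sketched in Python. Unlike the pseudocode, which overwrites the coefficient array in place, this version returns the quotient as a new list; that choice, the function name `deflate`, and the ascending coefficient order are assumptions made for illustration.

```python
def deflate(a, t):
    """Divide a[0] + a[1]*x + ... + a[n]*x**n by the factor (x - t).

    Returns (q, r): quotient coefficients in ascending order and the remainder.
    """
    n = len(a) - 1
    q = [0.0] * n
    r = a[n]
    for i in range(n - 1, -1, -1):
        q[i] = r            # coefficient of x**i in the quotient
        r = a[i] + r * t    # running remainder
    return q, r


# Example 7.1: (x**2 + 2x - 24) / (x - 4)
quotient, remainder = deflate([-24.0, 2.0, 1.0], 4.0)
```

Applied to Eq. (7.14) with t = -3, the same routine should reproduce the coefficients of Eq. (7.16).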
It is also possible to divide by polynomials of higher order. As we will see later in
this chapter, the most common task involves dividing by a second-order polynomial or
parabola. The subroutine in Fig. 7.2 addresses the more general problem of dividing an
nth-order polynomial a by an mth-order polynomial d. The result is an $(n - m)$th-order
polynomial q, with an $(m - 1)$th-order polynomial as the remainder.
Because each calculated root is known only approximately, it should be noted that
deflation is sensitive to round-off errors. In some cases, round-off error can grow to the
point that the results can become meaningless.
Some general strategies can be applied to minimize this problem. For example, round-off
error is affected by the order in which the terms are evaluated. Forward deflation refers to the
case where new polynomial coefficients are in order of descending powers of x (that is, from
the highest-order to the zero-order term). For this case, it is preferable to divide by the roots
of smallest absolute value first. Conversely, for backward deflation (that is, from the zero-order
to the highest-order term), it is preferable to divide by the roots of largest absolute value first.
Another way to reduce round-off errors is to consider each successive root estimate
obtained during deflation as a good first guess. These can then be used as a starting
guess, and the root determined again with the original nondeflated polynomial. This is
referred to as root polishing.
Finally, a problem arises when two deflated roots are inaccurate enough that they
both converge on the same undeflated root. In that case, you might be erroneously led
to believe that the polynomial has a multiple root (recall Sec. 6.5). One way to detect
this problem is to compare each polished root with those that were located previously.
Press et al. (2007) discuss this problem in more detail.
7.3 CONVENTIONAL METHODS
Now that we have covered some background material on polynomials, we can begin to
describe methods to locate their roots. The obvious first step would be to investigate the
viability of the bracketing and open approaches described in Chaps. 5 and 6.
The efficacy of these approaches depends on whether the problem being solved involves
complex roots. If only real roots exist, any of the previously described methods could have
utility. However, the problem of finding good initial guesses complicates both the bracketing
and the open methods, whereas the open methods could be susceptible to divergence.
SUB poldiv(a, n, d, m, q, r)
  DOFOR j = 0, n
    r(j) = a(j)
    q(j) = 0
  END DO
  DOFOR k = n-m, 0, -1
    q(k+1) = r(m+k) / d(m)
    DOFOR j = m+k-1, k, -1
      r(j) = r(j) - q(k+1) * d(j-k)
    END DO
  END DO
  DOFOR j = m, n
    r(j) = 0
  END DO
  n = n-m
  DOFOR i = 0, n
    a(i) = q(i+1)
  END DO
END SUB
FIGURE 7.2
Algorithm to divide a polynomial (defined by its coefficients a) by a lower-order polynomial d.
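A Python sketch of the same long-division idea is given below. It is not a line-by-line translation of Fig. 7.2: it uses zero-based quotient indices and returns the quotient and remainder as new lists instead of overwriting the inputs. The function name and the ascending coefficient order are assumptions.

```python
def poldiv(a, d):
    """Divide polynomial a by polynomial d (ascending coefficient lists).

    Returns (q, r) such that a = q*d + r with deg(r) < deg(d).
    Assumes d[-1] != 0 and len(a) >= len(d).
    """
    n, m = len(a) - 1, len(d) - 1
    r = list(a)                          # working copy; ends up holding the remainder
    q = [0.0] * (n - m + 1)
    for k in range(n - m, -1, -1):
        q[k] = r[m + k] / d[m]           # next quotient coefficient
        for j in range(m + k - 1, k - 1, -1):
            r[j] -= q[k] * d[j - k]      # subtract q[k] * x**k * d(x)
    return q, r[:m]                      # only the low-order m terms remain


# (x**4 - 10x**3 + 27x**2 - 2x - 40) / (x**2 - x - 2)
q, rem = poldiv([-40.0, -2.0, 27.0, -10.0, 1.0], [-2.0, -1.0, 1.0])
```

Here the divisor is (x + 1)(x - 2), so the quotient should be (x - 4)(x - 5) with a zero remainder.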
When complex roots are possible, the bracketing methods cannot be used because
of the obvious problem that the criterion for defining a bracket (that is, sign change)
does not translate to complex guesses.
Of the open methods, the conventional Newton-Raphson method would provide a
viable approach. In particular, concise code including deflation can be developed. If a
language that accommodates complex variables (like Fortran) is used, such an algorithm
will locate both real and complex roots. However, as might be expected, it would be
susceptible to convergence problems. For this reason, special methods have been devel-
oped to find the real and complex roots of polynomials. We describe two—the Müller
and Bairstow methods—in the following sections. As you will see, both are related to
the more conventional open approaches described in Chap. 6.
7.4 MÜLLER’S METHOD
Recall that the secant method obtains a root estimate by projecting a straight line to the
x axis through two function values (Fig. 7.3a). Müller’s method takes a similar approach,
but projects a parabola through three points (Fig. 7.3b).
The method consists of deriving the coefficients of the parabola that goes through
the three points. These coefficients can then be substituted into the quadratic formula to
obtain the point where the parabola intercepts the x axis—that is, the root estimate. The
approach is facilitated by writing the parabolic equation in a convenient form,
$$f_2(x) = a(x - x_2)^2 + b(x - x_2) + c \tag{7.17}$$
We want this parabola to intersect the three points $[x_0, f(x_0)]$, $[x_1, f(x_1)]$, and $[x_2, f(x_2)]$. The
coefficients of Eq. (7.17) can be evaluated by substituting each of the three points to give
$$f(x_0) = a(x_0 - x_2)^2 + b(x_0 - x_2) + c \tag{7.18}$$
$$f(x_1) = a(x_1 - x_2)^2 + b(x_1 - x_2) + c \tag{7.19}$$
$$f(x_2) = a(x_2 - x_2)^2 + b(x_2 - x_2) + c \tag{7.20}$$
FIGURE 7.3
A comparison of two related approaches for locating roots: (a) the secant method and
(b) Müller’s method. [Panel (a): f(x) versus x, with a straight line through the points at
x1 and x0 projected to the root estimate. Panel (b): f(x) versus x, with a parabola through
the points at x2, x1, and x0 projected to the root estimate.]
Note that we have dropped the subscript “2” from the function for conciseness. Because
we have three equations, we can solve for the three unknown coefficients, a, b, and c.
Because two of the terms in Eq. (7.20) are zero, it can be immediately solved for
$c = f(x_2)$. Thus, the coefficient $c$ is merely equal to the function value evaluated at the
third guess, $x_2$. This result can then be substituted into Eqs. (7.18) and (7.19) to yield
two equations with two unknowns:
$$f(x_0) - f(x_2) = a(x_0 - x_2)^2 + b(x_0 - x_2) \tag{7.21}$$
$$f(x_1) - f(x_2) = a(x_1 - x_2)^2 + b(x_1 - x_2) \tag{7.22}$$
Algebraic manipulation can then be used to solve for the remaining coefficients,
a and b. One way to do this involves defining a number of differences,
$$h_0 = x_1 - x_0 \qquad h_1 = x_2 - x_1$$
$$\delta_0 = \frac{f(x_1) - f(x_0)}{x_1 - x_0} \qquad \delta_1 = \frac{f(x_2) - f(x_1)}{x_2 - x_1} \tag{7.23}$$
These can be substituted into Eqs. (7.21) and (7.22) to give
$$(h_0 + h_1)b - (h_0 + h_1)^2 a = h_0 \delta_0 + h_1 \delta_1$$
$$h_1 b - h_1^2 a = h_1 \delta_1$$
which can be solved for a and b. The results can be summarized as
$$a = \frac{\delta_1 - \delta_0}{h_1 + h_0} \tag{7.24}$$
$$b = a h_1 + \delta_1 \tag{7.25}$$
$$c = f(x_2) \tag{7.26}$$
To find the root, we apply the quadratic formula to Eq. (7.17). However, because of
potential round-off error, rather than using the conventional form, we use the alternative
formulation [Eq. (3.13)] to yield
$$x_3 - x_2 = \frac{-2c}{b \pm \sqrt{b^2 - 4ac}} \tag{7.27a}$$
or isolating the unknown $x_3$ on the left side of the equal sign,
$$x_3 = x_2 + \frac{-2c}{b \pm \sqrt{b^2 - 4ac}} \tag{7.27b}$$
Note that the use of the quadratic formula means that both real and complex roots can
be located. This is a major benefit of the method.
In addition, Eq. (7.27a) provides a neat means to determine the approximate error.
Because the left side represents the difference between the present ($x_3$) and the previous
($x_2$) root estimate, the error can be calculated as
$$\varepsilon_a = \left| \frac{x_3 - x_2}{x_3} \right| 100\%$$
Now, a problem with Eq. (7.27a) is that it yields two roots, corresponding to the
± term in the denominator. In Müller’s method, the sign is chosen to agree with the sign
of $b$. This choice will result in the largest denominator, and hence, will give the root
estimate that is closest to $x_2$.
Once $x_3$ is determined, the process is repeated. This brings up the issue of which
point is discarded. Two general strategies are typically used:
1. If only real roots are being located, we choose the two original points that are nearest
the new root estimate, $x_3$.
2. If both real and complex roots are being evaluated, a sequential approach is employed.
That is, just like the secant method, $x_1$, $x_2$, and $x_3$ take the place of $x_0$, $x_1$, and $x_2$.
EXAMPLE 7.2 Müller’s Method
Problem Statement. Use Müller’s method with guesses of $x_0$, $x_1$, and $x_2$ = 4.5, 5.5,
and 5, respectively, to determine a root of the equation
$$f(x) = x^3 - 13x - 12$$
Note that the roots of this equation are $-3$, $-1$, and 4.
Solution. First, we evaluate the function at the guesses
$$f(4.5) = 20.625 \qquad f(5.5) = 82.875 \qquad f(5) = 48$$
which can be used to calculate
$$h_0 = 5.5 - 4.5 = 1 \qquad h_1 = 5 - 5.5 = -0.5$$
$$\delta_0 = \frac{82.875 - 20.625}{5.5 - 4.5} = 62.25 \qquad \delta_1 = \frac{48 - 82.875}{5 - 5.5} = 69.75$$
These values in turn can be substituted into Eqs. (7.24) through (7.26) to compute
$$a = \frac{69.75 - 62.25}{-0.5 + 1} = 15 \qquad b = 15(-0.5) + 69.75 = 62.25 \qquad c = 48$$
The square root of the discriminant can be evaluated as
$$\sqrt{62.25^2 - 4(15)48} = 31.54461$$
Then, because $|62.25 + 31.54461| > |62.25 - 31.54461|$, a positive sign is employed in
the denominator of Eq. (7.27b), and the new root estimate can be determined as
$$x_3 = 5 + \frac{-2(48)}{62.25 + 31.54461} = 3.976487$$
and develop the error estimate
$$\varepsilon_a = \left| \frac{-1.023513}{3.976487} \right| 100\% = 25.74\%$$
Because the error is large, new guesses are assigned; $x_0$ is replaced by $x_1$, $x_1$ is replaced
by $x_2$, and $x_2$ is replaced by $x_3$. Therefore, for the new iteration,
$$x_0 = 5.5 \qquad x_1 = 5 \qquad x_2 = 3.976487$$
and the calculation is repeated. The results, tabulated below, show that the method converges
rapidly on the root, $x_r = 4$:

i    x_r         ε_a (%)
0    5
1    3.976487    25.74
2    4.00105     0.6139
3    4           0.0262
4    4           0.0000119

Pseudocode to implement Müller’s method for real roots is presented in Fig. 7.4.
Notice that this routine is set up to take a single initial nonzero guess that is then
perturbed to develop the other two guesses. Of course, the algorithm can also be
programmed to accommodate three guesses. For languages like Fortran, the code will
find complex roots if the proper variables are declared as complex.

FIGURE 7.4
Pseudocode for Müller’s method.
SUB Muller(xr, h, eps, maxit)
  x2 = xr
  x1 = xr + h*xr
  x0 = xr - h*xr
  iter = 0
  DO
    iter = iter + 1
    h0 = x1 - x0
    h1 = x2 - x1
    d0 = (f(x1) - f(x0)) / h0
    d1 = (f(x2) - f(x1)) / h1
    a = (d1 - d0) / (h1 + h0)
    b = a*h1 + d1
    c = f(x2)
    rad = SQRT(b*b - 4*a*c)
    IF |b+rad| > |b-rad| THEN
      den = b + rad
    ELSE
      den = b - rad
    END IF
    dxr = -2*c / den
    xr = x2 + dxr
    PRINT iter, xr
    IF (|dxr| < eps*|xr| OR iter >= maxit) EXIT
    x0 = x1
    x1 = x2
    x2 = xr
  END DO
END Muller
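For comparison, a minimal Python sketch of the same routine is shown below. It works in complex arithmetic throughout (via the standard `cmath` module), so it follows the sequential replacement strategy and can reach complex roots. The default values for `h`, `eps`, and `maxit` are arbitrary assumptions, and the sketch does not guard against a zero denominator.

```python
import cmath


def muller(f, xr, h=0.1, eps=1e-8, maxit=50):
    """Muller's method started from a single nonzero guess xr.

    The other two guesses are xr*(1 + h) and xr*(1 - h).
    Returns the last root estimate as a complex number.
    """
    x2 = complex(xr)
    x1 = x2 + h * x2
    x0 = x2 - h * x2
    for _ in range(maxit):
        h0 = x1 - x0
        h1 = x2 - x1
        d0 = (f(x1) - f(x0)) / h0
        d1 = (f(x2) - f(x1)) / h1
        a = (d1 - d0) / (h1 + h0)        # Eq. (7.24)
        b = a * h1 + d1                  # Eq. (7.25)
        c = f(x2)                        # Eq. (7.26)
        rad = cmath.sqrt(b * b - 4.0 * a * c)
        # choose the sign that gives the larger denominator
        den = b + rad if abs(b + rad) > abs(b - rad) else b - rad
        dxr = -2.0 * c / den             # Eq. (7.27a)
        x3 = x2 + dxr
        if abs(dxr) <= eps * abs(x3):
            return x3
        x0, x1, x2 = x1, x2, x3          # sequential replacement
    return x2


# Same starting points as Example 7.2: 4.5, 5.5, and 5
root = muller(lambda x: x**3 - 13 * x - 12, 5.0)
```

With `xr = 5` and `h = 0.1`, the three starting points coincide with the guesses of Example 7.2, so the first iterate should match the hand calculation.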
7.5 BAIRSTOW’S METHOD
Bairstow’s method is an iterative approach related loosely to both the Müller and Newton-
Raphson methods. Before launching into a mathematical description of the technique,
recall the factored form of the polynomial,
$$f_5(x) = (x + 1)(x - 4)(x - 5)(x + 3)(x - 2) \tag{7.28}$$
If we divided by a factor that is not a root (for example, $x + 6$), the quotient would be
a fourth-order polynomial. However, for this case, a remainder would result.
On the basis of the above, we can elaborate on an algorithm for determining a root of
a polynomial: (1) guess a value for the root $x = t$, (2) divide the polynomial by the factor
$x - t$, and (3) determine whether there is a remainder. If not, the guess was perfect and
the root is equal to t. If there is a remainder, the guess can be systematically adjusted and
the procedure repeated until the remainder disappears and a root is located. After this is
accomplished, the entire procedure can be repeated for the quotient to locate another root.
Bairstow’s method is generally based on this approach. Consequently, it hinges on
the mathematical process of dividing a polynomial by a factor. Recall from our discussion
of polynomial deflation (Sec. 7.2.2) that synthetic division involves dividing a polynomial
by a factor $x - t$. For example, the general polynomial [Eq. (7.1)]
$$f_n(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n \tag{7.29}$$
can be divided by the factor $x - t$ to yield a second polynomial that is one order lower,
$$f_{n-1}(x) = b_1 + b_2 x + b_3 x^2 + \cdots + b_n x^{n-1} \tag{7.30}$$
with a remainder $R = b_0$, where the coefficients can be calculated by the recurrence
relationship
$$b_n = a_n$$
$$b_i = a_i + b_{i+1} t \qquad \text{for } i = n - 1 \text{ to } 0$$
Note that if $t$ were a root of the original polynomial, the remainder $b_0$ would equal zero.
To permit the evaluation of complex roots, Bairstow’s method divides the polynomial
by a quadratic factor $x^2 - rx - s$. If this is done to Eq. (7.29), the result is a new
polynomial
$$f_{n-2}(x) = b_2 + b_3 x + \cdots + b_{n-1} x^{n-3} + b_n x^{n-2}$$
with a remainder
$$R = b_1(x - r) + b_0 \tag{7.31}$$
As with normal synthetic division, a simple recurrence relationship can be used to perform
the division by the quadratic factor:
$$b_n = a_n \tag{7.32a}$$
$$b_{n-1} = a_{n-1} + r b_n \tag{7.32b}$$
$$b_i = a_i + r b_{i+1} + s b_{i+2} \qquad \text{for } i = n - 2 \text{ to } 0 \tag{7.32c}$$
The quadratic factor is introduced to allow the determination of complex roots.
This relates to the fact that, if the coefficients of the original polynomial are real, the
complex roots occur in conjugate pairs. If $x^2 - rx - s$ is an exact divisor of the
polynomial, complex roots can be determined by the quadratic formula. Thus, the
method reduces to determining the values of r and s that make the quadratic factor
an exact divisor. In other words, we seek the values that make the remainder term
equal to zero.
Inspection of Eq. (7.31) leads us to conclude that for the remainder to be zero, $b_0$
and $b_1$ must be zero. Because it is unlikely that our initial guesses at the values of $r$ and $s$
will lead to this result, we must determine a systematic way to modify our guesses so
that $b_0$ and $b_1$ approach zero. To do this, Bairstow’s method uses a strategy similar to
the Newton-Raphson approach. Because both $b_0$ and $b_1$ are functions of both $r$ and $s$,
they can be expanded using a Taylor series, as in [recall Eq. (4.26)]
$$b_1(r + \Delta r, s + \Delta s) = b_1 + \frac{\partial b_1}{\partial r} \Delta r + \frac{\partial b_1}{\partial s} \Delta s$$
$$b_0(r + \Delta r, s + \Delta s) = b_0 + \frac{\partial b_0}{\partial r} \Delta r + \frac{\partial b_0}{\partial s} \Delta s \tag{7.33}$$
where the values on the right-hand side are all evaluated at $r$ and $s$. Notice that second-
and higher-order terms have been neglected. This represents an implicit assumption that
$\Delta r$ and $\Delta s$ are small enough that the higher-order terms are negligible. Another way of
expressing this assumption is to say that the initial guesses are adequately close to the
values of $r$ and $s$ at the roots.
The changes, $\Delta r$ and $\Delta s$, needed to improve our guesses can be estimated by setting
Eq. (7.33) equal to zero to give
$$\frac{\partial b_1}{\partial r} \Delta r + \frac{\partial b_1}{\partial s} \Delta s = -b_1 \tag{7.34}$$
$$\frac{\partial b_0}{\partial r} \Delta r + \frac{\partial b_0}{\partial s} \Delta s = -b_0 \tag{7.35}$$
If the partial derivatives of the b’s can be determined, these are a system of two equations
that can be solved simultaneously for the two unknowns, $\Delta r$ and $\Delta s$. Bairstow
showed that the partial derivatives can be obtained by a synthetic division of the b’s in
a fashion similar to the way in which the b’s themselves were derived:
$$c_n = b_n \tag{7.36a}$$
$$c_{n-1} = b_{n-1} + r c_n \tag{7.36b}$$
$$c_i = b_i + r c_{i+1} + s c_{i+2} \qquad \text{for } i = n - 2 \text{ to } 1 \tag{7.36c}$$
where $\partial b_0/\partial r = c_1$, $\partial b_0/\partial s = \partial b_1/\partial r = c_2$, and $\partial b_1/\partial s = c_3$. Thus, the partial derivatives
are obtained by synthetic division of the b’s. Then the partial derivatives can be substituted
into Eqs. (7.34) and (7.35) along with the b’s to give
$$c_2 \Delta r + c_3 \Delta s = -b_1$$
$$c_1 \Delta r + c_2 \Delta s = -b_0$$
These equations can be solved for $\Delta r$ and $\Delta s$, which can in turn be employed to improve
the initial guesses of $r$ and $s$. At each step, an approximate error in $r$ and $s$ can be
estimated, as in
$$|\varepsilon_{a,r}| = \left| \frac{\Delta r}{r} \right| 100\% \tag{7.37}$$
and
$$|\varepsilon_{a,s}| = \left| \frac{\Delta s}{s} \right| 100\% \tag{7.38}$$
When both of these error estimates fall below a prespecified stopping criterion $\varepsilon_s$, the
values of the roots can be determined by
$$x = \frac{r \pm \sqrt{r^2 + 4s}}{2} \tag{7.39}$$
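One pass of the update described above, that is, the recurrences of Eqs. (7.32) and (7.36) followed by the solution of the two simultaneous equations, can be sketched in Python as follows. The function name `bairstow_step` and the ascending coefficient order are illustrative assumptions, and the sketch assumes the determinant is nonzero.

```python
def bairstow_step(a, r, s):
    """One Bairstow iteration for a[0] + a[1]*x + ... + a[n]*x**n, n >= 3.

    Returns (dr, ds, b), where b holds the synthetic-division coefficients.
    """
    n = len(a) - 1
    b = [0.0] * (n + 1)
    c = [0.0] * (n + 1)
    b[n] = a[n]                                        # Eq. (7.32a)
    b[n - 1] = a[n - 1] + r * b[n]                     # Eq. (7.32b)
    c[n] = b[n]                                        # Eq. (7.36a)
    c[n - 1] = b[n - 1] + r * c[n]                     # Eq. (7.36b)
    for i in range(n - 2, -1, -1):
        b[i] = a[i] + r * b[i + 1] + s * b[i + 2]      # Eq. (7.32c)
        c[i] = b[i] + r * c[i + 1] + s * c[i + 2]      # Eq. (7.36c); c[0] is unused
    det = c[2] * c[2] - c[3] * c[1]                    # assumed nonzero in this sketch
    dr = (-b[1] * c[2] + b[0] * c[3]) / det            # Cramer's rule
    ds = (-b[0] * c[2] + b[1] * c[1]) / det
    return dr, ds, b


# x**5 - 3.5x**4 + 2.75x**3 + 2.125x**2 - 3.875x + 1.25 with r = s = -1
dr, ds, b = bairstow_step([1.25, -3.875, 2.125, 2.75, -3.5, 1.0], -1.0, -1.0)
```

The corrections are then added to the current guesses, and the step is repeated until the error estimates of Eqs. (7.37) and (7.38) are small enough.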
At this point, three possibilities exist:
1. The quotient is a third-order polynomial or greater. For this case, Bairstow’s method
would be applied to the quotient to evaluate new values for r and s. The previous
values of r and s can serve as the starting guesses for this application.
2. The quotient is a quadratic. For this case, the remaining two roots could be evaluated
directly with Eq. (7.39).
3. The quotient is a first-order polynomial. For this case, the remaining single root can
be evaluated simply as
$$x = -\frac{s}{r} \tag{7.40}$$
EXAMPLE 7.3 Bairstow’s Method
Problem Statement. Employ Bairstow’s method to determine the roots of the polynomial
$$f_5(x) = x^5 - 3.5x^4 + 2.75x^3 + 2.125x^2 - 3.875x + 1.25$$
Use initial guesses of $r = s = -1$ and iterate to a level of $\varepsilon_s = 1\%$.
Solution. Equations (7.32) and (7.36) can be applied to compute
$$b_5 = 1 \quad b_4 = -4.5 \quad b_3 = 6.25 \quad b_2 = 0.375 \quad b_1 = -10.5 \quad b_0 = 11.375$$
$$c_5 = 1 \quad c_4 = -5.5 \quad c_3 = 10.75 \quad c_2 = -4.875 \quad c_1 = -16.375$$
Thus, the simultaneous equations to solve for $\Delta r$ and $\Delta s$ are
$$-4.875\Delta r + 10.75\Delta s = 10.5$$
$$-16.375\Delta r - 4.875\Delta s = -11.375$$
which can be solved for $\Delta r = 0.3558$ and $\Delta s = 1.1381$. Therefore, our original guesses
can be corrected to
$$r = -1 + 0.3558 = -0.6442$$
$$s = -1 + 1.1381 = 0.1381$$
and the approximate errors can be evaluated by Eqs. (7.37) and (7.38),
$$|\varepsilon_{a,r}| = \left| \frac{0.3558}{-0.6442} \right| 100\% = 55.23\% \qquad |\varepsilon_{a,s}| = \left| \frac{1.1381}{0.1381} \right| 100\% = 824.1\%$$
Next, the computation is repeated using the revised values for $r$ and $s$. Applying Eqs. (7.32)
and (7.36) yields
$$b_5 = 1 \quad b_4 = -4.1442 \quad b_3 = 5.5578 \quad b_2 = -2.0276 \quad b_1 = -1.8013 \quad b_0 = 2.1304$$
$$c_5 = 1 \quad c_4 = -4.7884 \quad c_3 = 8.7806 \quad c_2 = -8.3454 \quad c_1 = 4.7874$$
Therefore, we must solve
$$-8.3454\Delta r + 8.7806\Delta s = 1.8013$$
$$4.7874\Delta r - 8.3454\Delta s = -2.1304$$
for $\Delta r = 0.1331$ and $\Delta s = 0.3316$, which can be used to correct the root estimates as
$$r = -0.6442 + 0.1331 = -0.5111 \qquad |\varepsilon_{a,r}| = 26.0\%$$
$$s = 0.1381 + 0.3316 = 0.4697 \qquad |\varepsilon_{a,s}| = 70.6\%$$
The computation can be continued, with the result that after four iterations the
method converges on values of $r = -0.5$ ($|\varepsilon_{a,r}| = 0.063\%$) and $s = 0.5$ ($|\varepsilon_{a,s}| = 0.040\%$).
Equation (7.39) can then be employed to evaluate the roots as
$$x = \frac{-0.5 \pm \sqrt{(-0.5)^2 + 4(0.5)}}{2} = 0.5, -1.0$$
At this point, the quotient is the cubic equation
$$f(x) = x^3 - 4x^2 + 5.25x - 2.5$$
Bairstow’s method can be applied to this polynomial using the results of the previous
step, $r = -0.5$ and $s = 0.5$, as starting guesses. Five iterations yield estimates of $r = 2$
and $s = -1.249$, which can be used to compute
$$x = \frac{2 \pm \sqrt{2^2 + 4(-1.249)}}{2} = 1 \pm 0.499i$$
At this point, the quotient is a first-order polynomial that can be directly evaluated
by Eq. (7.40) to determine the fifth root: 2.
Note that the heart of Bairstow’s method is the evaluation of the b’s and c’s via
Eqs. (7.32) and (7.36). One of the primary strengths of the method is the concise way
in which these recurrence relationships can be programmed.
Figure 7.5 lists pseudocode to implement Bairstow’s method. The heart of the algorithm
consists of the loop to evaluate the b’s and c’s. Also notice that the code to solve
the simultaneous equations checks for a zero determinant to prevent division by zero. If
the determinant is zero, the values of r and s are perturbed slightly and the procedure is
begun again. In addition, the algorithm places a user-defined upper limit on the number
of iterations (maxit) and should be designed to avoid division by zero while calculating
the error estimates. Finally, the algorithm requires initial guesses for r and s (rr and ss in
the code). If no prior knowledge of the roots exists, they can be set to zero in the calling
program.
(a) Bairstow Algorithm
SUB Bairstow (a,nn,es,rr,ss,maxit,re,im,ier)
DIMENSION b(nn), c(nn)
r = rr; s = ss; n = nn
ier = 0; ea1 = 1; ea2 = 1; iter = 0
DO
  IF n < 3 OR iter ≥ maxit EXIT
  iter = 0
  DO
    iter = iter + 1
    b(n) = a(n)
    b(n - 1) = a(n - 1) + r * b(n)
    c(n) = b(n)
    c(n - 1) = b(n - 1) + r * c(n)
    DO i = n - 2, 0, -1
      b(i) = a(i) + r * b(i + 1) + s * b(i + 2)
      c(i) = b(i) + r * c(i + 1) + s * c(i + 2)
    END DO
    det = c(2) * c(2) - c(3) * c(1)
    IF det ≠ 0 THEN
      dr = (-b(1) * c(2) + b(0) * c(3))/det
      ds = (-b(0) * c(2) + b(1) * c(1))/det
      r = r + dr
      s = s + ds
      IF r ≠ 0 THEN ea1 = ABS(dr/r) * 100
      IF s ≠ 0 THEN ea2 = ABS(ds/s) * 100
    ELSE
      r = r + 1
      s = s + 1
      iter = 0
    END IF
    IF ea1 ≤ es AND ea2 ≤ es OR iter ≥ maxit EXIT
  END DO
  CALL Quadroot(r,s,r1,i1,r2,i2)
  re(n) = r1
  im(n) = i1
  re(n - 1) = r2
  im(n - 1) = i2
  n = n - 2
  DO i = 0, n
    a(i) = b(i + 2)
  END DO
END DO
IF iter < maxit THEN
  IF n = 2 THEN
    r = -a(1)/a(2)
    s = -a(0)/a(2)
    CALL Quadroot(r,s,r1,i1,r2,i2)
    re(n) = r1
    im(n) = i1
    re(n - 1) = r2
    im(n - 1) = i2
  ELSE
    re(n) = -a(0)/a(1)
    im(n) = 0
  END IF
ELSE
  ier = 1
END IF
END Bairstow
(b) Roots of Quadratic Algorithm
SUB Quadroot(r,s,r1,i1,r2,i2)
disc = r ^ 2 + 4 * s
IF disc > 0 THEN
  r1 = (r + SQRT(disc))/2
  r2 = (r - SQRT(disc))/2
  i1 = 0
  i2 = 0
ELSE
  r1 = r/2
  r2 = r1
  i1 = SQRT(ABS(disc))/2
  i2 = -i1
END IF
END QuadRoot
FIGURE 7.5
(a) Algorithm for implementing Bairstow’s method, along with (b) an algorithm to determine the roots of a quadratic.
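A compact Python sketch along the lines of Fig. 7.5 is given below. It departs from the pseudocode in a few ways that should be read as assumptions: roots are returned as a list of complex numbers, the tolerance `es` is a fraction, not a percentage, non-convergence raises an exception in place of an error flag, and the default tolerance and iteration limit are arbitrary.

```python
import cmath


def quadroot(r, s):
    """Roots of x**2 - r*x - s = 0, Eq. (7.39), as complex numbers."""
    root = cmath.sqrt(r * r + 4.0 * s)
    return (r + root) / 2.0, (r - root) / 2.0


def bairstow(a, r=0.0, s=0.0, es=1e-10, maxit=100):
    """All roots of a[0] + a[1]*x + ... + a[n]*x**n (real coefficients)."""
    a = [float(v) for v in a]
    roots = []
    n = len(a) - 1
    while n >= 3:
        for _ in range(maxit):
            b = [0.0] * (n + 1)
            c = [0.0] * (n + 1)
            b[n] = a[n]
            b[n - 1] = a[n - 1] + r * b[n]
            c[n] = b[n]
            c[n - 1] = b[n - 1] + r * c[n]
            for i in range(n - 2, -1, -1):
                b[i] = a[i] + r * b[i + 1] + s * b[i + 2]
                c[i] = b[i] + r * c[i + 1] + s * c[i + 2]
            det = c[2] * c[2] - c[3] * c[1]
            if det == 0.0:
                r += 1.0            # perturb the guesses and try again
                s += 1.0
                continue
            dr = (-b[1] * c[2] + b[0] * c[3]) / det
            ds = (-b[0] * c[2] + b[1] * c[1]) / det
            r += dr
            s += ds
            if abs(dr) <= es * abs(r) and abs(ds) <= es * abs(s):
                break
        else:
            raise RuntimeError("Bairstow iteration did not converge")
        roots.extend(quadroot(r, s))
        a = b[2:]                   # quotient; b was computed just before the final update
        n -= 2
    if n == 2:
        roots.extend(quadroot(-a[1] / a[2], -a[0] / a[2]))
    elif n == 1:
        roots.append(complex(-a[0] / a[1]))
    return roots


# Example 7.3, starting from r = s = -1
roots = bairstow([1.25, -3.875, 2.125, 2.75, -3.5, 1.0], r=-1.0, s=-1.0)
```

As in the pseudocode, the values of r and s from one quadratic factor are carried over as starting guesses for the next.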
7.6 OTHER METHODS
Other methods are available to locate the roots of polynomials. The Jenkins-Traub method
is commonly used in software libraries. It is fairly complicated, and a good starting point
to understanding it is found in Ralston and Rabinowitz (1978).
Laguerre’s method, which approximates both real and complex roots and has cubic
convergence, is among the best approaches. A complete discussion can be found in
Householder (1970). In addition, Press et al. (2007) present a nice algorithm to imple-
ment the method.
7.7 ROOT LOCATION WITH SOFTWARE PACKAGES
Software packages have great capabilities for locating roots. In this section, we will give
you a taste of some of the more useful ones.
7.7.1 Excel
A spreadsheet like Excel can be used to locate a root by trial and error. For example,
if you want to find a root of
$$f(x) = x - \cos x$$
first, you can enter a value for x in a cell. Then set up another cell for f(x) that would
obtain its value for x from the first cell. You can then vary the x cell until the f(x) cell
approaches zero. This process can be further enhanced by using Excel’s plotting capa-
bilities to obtain a good initial guess (Fig. 7.6).
Although Excel does facilitate a trial-and-error approach, it also has two standard
tools that can be employed for root location: Goal Seek and Solver. Both these tools can
be employed to systematically adjust the initial guesses. Goal Seek is expressly used to
drive an equation to a value (in our case, zero) by varying a single parameter.
FIGURE 7.6
A spreadsheet set up to determine the root of $f(x) = x - \cos x$ by trial and
error. The plot is used to obtain a good initial guess.
EXAMPLE 7.4 Using Excel’s Goal Seek Tool to Locate a Single Root
Problem Statement. Employ Goal Seek to determine the root of the transcendental function
$$f(x) = x - \cos x$$
Solution. As in Fig. 7.6, the key to solving a single equation with Excel is creating a cell to
hold the value of the function in question and then making the value dependent on another cell.
Once this is done, the selection Goal Seek is chosen from the What-If Analysis button on your
Data ribbon. At this point a dialogue box will be displayed, asking you to set a cell to a value by
changing another cell. For the example, suppose that as in Fig. 7.6 your guess is entered in cell
A11 and your function result in cell B11. The Goal Seek dialogue box would be filled out as
Set cell: B11
To value: 0
By changing cell: A11
When the OK button is selected, a message box displays the results,
The cells on the spreadsheet would also be modified to the new values (as shown in Fig. 7.6).
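The idea behind Goal Seek (vary one input until a dependent value hits a target) can be imitated in a few lines of Python. The following is a minimal sketch, not Excel's actual algorithm: the function name `goal_seek`, the use of plain bisection, and the bracket [0, 1] (as might be read off a plot such as Fig. 7.6) are all assumptions made for illustration.

```python
import math


def f(x):
    """Function from Example 7.4: f(x) = x - cos(x)."""
    return x - math.cos(x)


def goal_seek(func, lower, upper, target=0.0, tol=1e-10, max_iter=200):
    """Adjust x inside [lower, upper] until func(x) is within tol of target.

    Plain bisection is an assumption made for this sketch; it is not
    claimed to be the algorithm Excel uses internally.
    """
    g_lower = func(lower) - target
    g_upper = func(upper) - target
    if g_lower * g_upper > 0:
        raise ValueError("func(x) - target must change sign on the interval")
    for _ in range(max_iter):
        mid = 0.5 * (lower + upper)
        g_mid = func(mid) - target
        if abs(g_mid) < tol or 0.5 * (upper - lower) < tol:
            return mid
        if g_lower * g_mid < 0:
            upper = mid                  # sign change is in the lower half
        else:
            lower, g_lower = mid, g_mid  # sign change is in the upper half
    return 0.5 * (lower + upper)


root = goal_seek(f, 0.0, 1.0)  # hypothetical bracket taken from a plot
print(f"x = {root:.6f}, f(x) = {f(root):.2e}")
```

Here the "changing cell" corresponds to `x` and the "set cell" to `func(x)`; the `target` argument plays the role of the To value field.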
The Solver tool is more sophisticated than Goal Seek in that (1) it can vary several
cells simultaneously and (2) along with driving a target cell to a value, it can minimize
and maximize its value. The next example illustrates how it can be used to solve a system
of nonlinear equations.
EXAMPLE 7.5 Using Excel’s Solver for a Nonlinear System
Problem Statement. Recall that in Sec. 6.6 we obtained the solution of the following
set of simultaneous equations,
$$u(x, y) = x^2 + xy - 10 = 0$$
$$v(x, y) = y + 3xy^2 - 57 = 0$$
Note that a correct pair of roots is $x = 2$ and $y = 3$. Use Solver to determine the roots
using initial guesses of $x = 1$ and $y = 3.5$.
Solution. As shown below, two cells (B1 and B2) can be created to hold the guesses for x and
y. The function values themselves, u(x, y) and v(x, y), can then be entered into two other cells
(B3 and B4). As can be seen, the initial guesses result in function values that are far from zero.
Next, another cell can be created that contains a single value reflecting how close both
functions are to zero. One way to do this is to sum the squares of the function values. This
is done and the result entered in cell B6. If both functions are at zero, this sum should
also be at zero. Further, summing the squares rather than the function values themselves
avoids the possibility that both functions could have the same nonzero magnitude but
opposite signs. In that case, a plain sum in the target cell (B6) would be zero, but the
roots would be incorrect.
Once the spreadsheet is created, the selection Solver is chosen from the Data ribbon.[1]
At this point a dialogue box will be displayed, querying you for pertinent information.
The pertinent cells of the Solver dialogue box would be filled out as
[1] Note that you may have to install Solver by choosing Office, Excel Options, Add-Ins. Select Excel Add-Ins
from the Manage drop-down box at the bottom of the Excel options menu and click Go. Then, check the
Solver box. The Solver then should be installed and a button to access it should appear on your Data ribbon.
When the OK button is selected, a dialogue box will open with a report on the success
of the operation. For the present case, the Solver obtains the correct solution:
It should be noted that the Solver can fail. Its success depends on (1) the condition
of the system of equations and/or (2) the quality of the initial guesses. Thus, the suc-
cessful outcome of the previous example is not guaranteed. Despite this, we have found
Solver useful enough to make it a feasible option for quickly obtaining roots in a wide
range of engineering applications.
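For readers without Excel, the same system can be explored with a short Python sketch. Note the substitution: Solver minimizes the target cell, whereas the code below uses a two-equation Newton-Raphson iteration (the approach of Sec. 6.6) and only monitors the sum of squares as its stopping measure. The function names and the tolerance are assumptions chosen for illustration.

```python
def u(x, y):
    """First equation of Example 7.5."""
    return x**2 + x * y - 10.0


def v(x, y):
    """Second equation of Example 7.5."""
    return y + 3.0 * x * y**2 - 57.0


def sum_of_squares(x, y):
    """Single measure of closeness to a root, like cell B6 in the example."""
    return u(x, y) ** 2 + v(x, y) ** 2


def newton_system(x, y, tol=1e-18, max_iter=50):
    """Two-equation Newton-Raphson; stops when the sum of squares < tol."""
    for _ in range(max_iter):
        if sum_of_squares(x, y) < tol:
            return x, y
        u_val, v_val = u(x, y), v(x, y)
        # Analytical partial derivatives (the Jacobian entries)
        du_dx, du_dy = 2.0 * x + y, x
        dv_dx, dv_dy = 3.0 * y**2, 1.0 + 6.0 * x * y
        det = du_dx * dv_dy - du_dy * dv_dx
        if det == 0.0:
            raise ZeroDivisionError("singular Jacobian")
        x, y = (x - (u_val * dv_dy - v_val * du_dy) / det,
                y - (v_val * du_dx - u_val * dv_dx) / det)
    raise RuntimeError("no convergence; try different initial guesses")


x_root, y_root = newton_system(1.0, 3.5)  # initial guesses from the example
print(f"x = {x_root:.6f}, y = {y_root:.6f}, "
      f"sum of squares = {sum_of_squares(x_root, y_root):.2e}")
```

As with Solver, success is not guaranteed: a poor initial guess or an ill-conditioned system can make the iteration diverge, which the sketch reports by raising an exception.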
7.7.2 MATLAB
As summarized in Table 7.1, MATLAB software is capable of locating roots of single
algebraic and transcendental equations. It is superb at manipulating and locating the roots
of polynomials.
The fzero function is designed to locate one root of a single function. A simplified
representation of its syntax is
fzero(f,x0,options)
where f is the function you are analyzing, x0 is the initial guess, and options are the
optimization parameters (these are changed using the function optimset). If options
are omitted, default values are employed. Note that one or two guesses can be employed.
If two guesses are employed, they are assumed to bracket a root. The following example
illustrates how fzero can be used.
TABLE 7.1 Common functions in MATLAB related to root
location and polynomial manipulation.
Function Description
fzero Root of single function.
roots Find polynomial roots.
poly Construct polynomial with specified roots.
polyval Evaluate polynomial.
polyvalm Evaluate polynomial with matrix argument.
residue Partial-fraction expansion (residues).
polyder Differentiate polynomial.
conv Multiply polynomials.
deconv Divide polynomials.
EXAMPLE 7.6 Using MATLAB for Root Location
Problem Statement. Use the MATLAB function fzero to find the roots of
$$f(x) = x^{10} - 1$$
within the interval $x_l = 0$ and $x_u = 4$. Obviously two roots occur at −1 and 1. Recall
that in Example 5.6, we used the false-position method with initial guesses of 0 and 1.3
to determine the positive root.
Solution. Using the same initial conditions as in Example 5.6, we can use MATLAB
to determine the positive root as in
>> x0=[0 1.3];
>> x=fzero(@(x) x^10-1,x0)
x =
1
In a similar fashion, we can use initial guesses of −1.3 and 0 to determine the negative
root,
>> x0=[-1.3 0];
>> x=fzero(@(x) x^10-1,x0)
x =
-1
We can also employ a single guess. An interesting case would be to use an initial
guess of 0,
>> x0=0;
>> x=fzero(@(x) x^10-1,x0)
x =
-1
Thus, for this guess, the underlying algorithm happens to home in on the negative root.
The use of optimset can be illustrated by using it to display the actual iterations
as the solution progresses:
>> x0=0;
>> option=optimset('DISP','ITER');
>> x=fzero(@(x) x^10-1,x0,option)
Func-count x f(x) Procedure
1 0 -1 initial
2 -0.0282843 -1 search
3 0.0282843 -1 search
4 -0.04 -1 search
•
•
•
21 0.64 -0.988471 search
22 -0.905097 -0.631065 search
23 0.905097 -0.631065 search
24 -1.28 10.8059 search
Looking for a zero in the interval [-1.28, 0.9051]
25 0.784528 -0.911674 interpolation
26 -0.247736 -0.999999 bisection
27 -0.763868 -0.932363 bisection
28 -1.02193 0.242305 bisection
29 -0.968701 -0.27239 interpolation
30 -0.996873 -0.0308299 interpolation
31 -0.999702 -0.00297526 interpolation
32 -1 5.53132e-006 interpolation
33 -1 -7.41965e-009 interpolation
34 -1 -1.88738e-014 interpolation
35 -1 0 interpolation
Zero found in the interval: [-1.28, 0.9051].
x =
-1
These results illustrate the strategy used by fzero when it is provided with a
single guess. First, it searches in the vicinity of the guess until it detects a sign change.
Then it uses a combination of bisection and interpolation to home in on the root. The
interpolation involves both the secant method and inverse quadratic interpolation (recall
Sec. 7.4). It should be noted that the fzero algorithm has more to it than this basic
description might imply. You can consult Press et al. (2007) for additional details.
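The two-phase strategy described above can be sketched in Python. This is an illustration only, not the fzero source: the initial step of 1/50 and the growth factor of the square root of 2 are assumptions inferred from the step sizes printed in the iteration log (0.0282843, 0.04, ...), and plain bisection is swapped in for fzero's combination of bisection and interpolation.

```python
import math


def f(x):
    """Function from Example 7.6."""
    return x**10 - 1.0


def search_for_sign_change(func, x0, max_steps=200):
    """Expand alternately left and right of x0 until func changes sign.

    Step size and growth factor are assumptions inferred from the
    printed iterations, not taken from the fzero source.
    """
    fx0 = func(x0)
    if fx0 == 0.0:
        return x0, x0
    dx = x0 / 50.0 if x0 != 0 else 1.0 / 50.0
    a, fa = x0, fx0
    b, fb = x0, fx0
    for _ in range(max_steps):
        dx *= math.sqrt(2.0)
        a = x0 - dx
        fa = func(a)
        if (fa > 0) != (fb > 0):
            return a, b
        b = x0 + dx
        fb = func(b)
        if (fa > 0) != (fb > 0):
            return a, b
    raise RuntimeError("no sign change found near the initial guess")


def bisect(func, a, b, tol=1e-12):
    """Plain bisection on a bracket [a, b] whose endpoints differ in sign."""
    fa = func(a)
    while abs(b - a) > tol:
        mid = 0.5 * (a + b)
        f_mid = func(mid)
        if f_mid == 0.0:
            return mid
        if (f_mid > 0) == (fa > 0):
            a, fa = mid, f_mid
        else:
            b = mid
    return 0.5 * (a + b)


a, b = search_for_sign_change(f, 0.0)
root = bisect(f, a, b)
print(f"bracket = [{a:.4f}, {b:.4f}], root = {root:.6f}")
```

Because the search probes the left side first at each step size, a guess of 0 finds the sign change at −1.28 before it reaches +1.28, which is consistent with the log above homing in on the negative root.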
EXAMPLE 7.7 Using MATLAB to Manipulate and Determine the Roots of Polynomials
Problem Statement. Explore how MATLAB can be employed to manipulate and determine
the roots of polynomials. Use the following equation from Example 7.3,
$$f_5(x) = x^5 - 3.5x^4 + 2.75x^3 + 2.125x^2 - 3.875x + 1.25 \tag{E7.7.1}$$
which has three real roots: 0.5, −1.0, and 2, and one pair of complex roots: $1 \pm 0.5i$.
Solution. Polynomials are entered into MATLAB by storing the coefficients as a vector.
For example, at the MATLAB prompt (>>), typing and entering the following line stores
the coefficients in the vector a,
>> a=[1 -3.5 2.75 2.125 -3.875 1.25];
We can then proceed to manipulate the polynomial. For example, we can evaluate it at
$x = 1$ by typing
>> polyval(a,1)
with the result $1(1)^5 - 3.5(1)^4 + 2.75(1)^3 + 2.125(1)^2 - 3.875(1) + 1.25 = -0.25$,
ans =
-0.2500
We can evaluate the derivative $f'(x) = 5x^4 - 14x^3 + 8.25x^2 + 4.25x - 3.875$ by
>> polyder(a)
ans =
5.0000 -14.0000 8.2500 4.2500 -3.8750
Next, let us create a quadratic polynomial that has roots corresponding to two of the original
roots of Eq. (E7.7.1): 0.5 and −1. This quadratic is $(x - 0.5)(x + 1) = x^2 + 0.5x - 0.5$
and can be entered into MATLAB as the vector b,
>> b=[1 0.5 -0.5];
We can divide this polynomial into the original polynomial by
>> [d,e]=deconv(a,b)
with the result being a quotient (a third-order polynomial d) and a remainder (e),
d =
1.0000 -4.0000 5.2500 -2.5000
e =
0 0 0 0 0 0
Because the polynomial is a perfect divisor, the remainder polynomial has zero coefficients.
Now, the roots of the quotient polynomial can be determined as
>> roots(d)
with the expected result that the remaining roots of the original polynomial (E7.7.1) are found,
ans =
2.0000
1.0000 + 0.5000i
1.0000 - 0.5000i
We can now multiply d by b to come up with the original polynomial,
>> conv(d,b)
ans =
1.0000 -3.5000 2.7500 2.1250 -3.8750 1.2500
Finally, we can determine all the roots of the original polynomial by
>> r=roots(a)
r =
-1.0000
2.0000
1.0000 + 0.5000i
1.0000 - 0.5000i
0.5000
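To make the operations in this example less of a black box, here is a minimal pure-Python sketch of what evaluation, differentiation, multiplication, and division of coefficient vectors involve. The helper names deliberately echo the MATLAB functions, but these are hypothetical stand-ins written for illustration, not the MATLAB implementations; coefficients are stored highest power first, as in the example.

```python
def polyval(coeffs, x):
    """Evaluate a polynomial by Horner's rule (works for complex x too)."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


def polyder(coeffs):
    """Coefficients of the derivative polynomial."""
    n = len(coeffs) - 1
    return [c * (n - i) for i, c in enumerate(coeffs[:-1])]


def conv(p, q):
    """Multiply two polynomials (discrete convolution of coefficients)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pc in enumerate(p):
        for j, qc in enumerate(q):
            out[i + j] += pc * qc
    return out


def deconv(num, den):
    """Polynomial long division; returns (quotient, remainder)."""
    rem = list(num)
    quot = []
    for i in range(len(num) - len(den) + 1):
        factor = rem[i] / den[0]
        quot.append(factor)
        for j, dc in enumerate(den):
            rem[i + j] -= factor * dc
    return quot, rem


a = [1.0, -3.5, 2.75, 2.125, -3.875, 1.25]  # Eq. (E7.7.1)
b = [1.0, 0.5, -0.5]                        # (x - 0.5)(x + 1)

value_at_1 = polyval(a, 1.0)
derivative = polyder(a)
d, e = deconv(a, b)
rebuilt = conv(d, b)

print(value_at_1, derivative, d, e, rebuilt, sep="\n")
```

Finding the roots themselves (MATLAB's roots) is not reproduced here; the stated roots can instead be checked by substituting them into polyval and confirming that the result is essentially zero.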
7.7.3 Mathcad
Mathcad has a numeric mode function called root that can be used to solve an equation of a
single variable. The method requires that you supply a function f(x) and either an initial guess
or a bracket. When a single guess value is used, root uses the Secant and Müller methods. In
the case where two guesses that bracket a root are supplied, it uses a combination of the
Ridder method (a variation of false position) and Brent’s method. It iterates until the magnitude
of f(x) at the proposed root is less than the predefined value of TOL. The Mathcad
implementation has advantages and disadvantages similar to those of conventional root-location
methods, such as sensitivity to the quality of the initial guess and the rate of convergence.
Mathcad can find all the real or complex roots of polynomials with polyroots. This nu-
meric or symbolic mode function is based on the Laguerre method. This function does not
require initial guesses, and all the roots are returned at the same time.
Mathcad contains a numeric mode function called Find that can be used to solve up to
50 simultaneous nonlinear algebraic equations. The Find function chooses an appropriate
method from a group of available methods, depending on whether the problem is linear or
nonlinear, and other attributes. Acceptable values for the solution may be unconstrained or
constrained to fall within specified limits. If Find fails to locate a solution that satisfies the
equations and constraints, it returns the error message “did not find solution.” However, Mathcad
also contains a similar function called Minerr. This function gives solution results that mini-
mize the errors in the constraints even when exact solutions cannot be found. Thus, the prob-
lem of solving for the roots of nonlinear equations is closely related to both optimization and
nonlinear least squares. These areas and Minerr are covered in detail in Parts Four and Five.
Figure 7.7 shows a typical Mathcad worksheet. The menus at the top provide quick
access to common arithmetic operators and functions, various two- and three-dimensional
FIGURE 7.7
Mathcad screen to find the root
of a single equation.
plot types, and the environment to create subprograms. Equations, text, data, or graphs
can be placed anywhere on the screen. You can use a variety of fonts, colors, and styles
to construct worksheets with almost any design and format that pleases you. Consult the
summary of the Mathcad User’s manual in Appendix C or the full manual available from
MathSoft. Note that in all our Mathcad examples, we have tried to fit the entire Mathcad
session onto a single screen. You should realize that the graph would have to be placed
below the commands to work properly.
Let’s start with an example that solves for the root of $f(x) = x - \cos x$. The first
step is to enter the function. This is done by typing f(x): which is automatically converted
to f(x):= by Mathcad. The := is called the definition symbol. Next an initial guess is
input in a similar manner using the definition symbol. Now, soln is defined as root(f(x), x),
which invokes the secant method with a starting value of 1.0. Iteration is continued until
the magnitude of f(x) evaluated at the proposed root is less than TOL. The value of TOL
is set from the Math/Options pull down menu. Finally the value of soln is displayed using
a normal equal sign (=). The number of significant figures is set from the Format/Number pull
down menu. The text labels and equation definitions can be placed anywhere on the
screen in a number of different fonts, styles, sizes, and colors. The graph can be placed
anywhere on the worksheet by clicking to the desired location. This places a red cross
hair at that location. Then use the Insert/Graph/X-Y Plot pull down menu to place an
empty plot on the worksheet with place-holders for the expressions to be graphed and
for the ranges of the x and y axes. Simply type f(z) in the placeholder on the y axis and
−10 and 10 for the z-axis range. Mathcad does all the rest to produce the graph shown
in Fig. 7.7. Once the graph has been created you can use the Format/Graph/X-Y Plot
pull down menu to vary the type of graph; change the color, type, and weight of the
trace of the function; and add titles, labels and other features.
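The stopping rule just described (iterate until the magnitude of f at the proposed root falls below TOL) can be sketched in Python with a basic secant iteration. This is not Mathcad's root function: the way the second starting point is generated (a 1% perturbation of the guess) and the tolerance value passed in are assumptions made so that the example is self-contained.

```python
import math


def f(x):
    """Function used in the worksheet: f(x) = x - cos(x)."""
    return x - math.cos(x)


def secant_root(func, guess, tol=1e-3, max_iter=50):
    """Secant iteration that stops when |func(x)| < tol.

    The second starting point is a small perturbation of the guess;
    that choice is hypothetical and made only for this sketch.
    """
    x_prev = guess
    x_curr = guess * 1.01 if guess != 0 else 0.01
    for _ in range(max_iter):
        f_curr = func(x_curr)
        if abs(f_curr) < tol:
            return x_curr
        f_prev = func(x_prev)
        denom = f_curr - f_prev
        if denom == 0.0:
            raise ZeroDivisionError("flat secant line; try another guess")
        x_prev, x_curr = x_curr, x_curr - f_curr * (x_curr - x_prev) / denom
    raise RuntimeError("no convergence within max_iter iterations")


soln = secant_root(f, 1.0, tol=1e-6)  # starting value of 1.0, as in the text
print(f"soln = {soln:.6f}, f(soln) = {f(soln):.2e}")
```

Note that the test is on the residual f(x), not on the change in x, so a loose tolerance can accept a point that is still visibly away from the root when the function is flat there.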
Figure 7.8 shows how Mathcad can be used to find the roots of a polynomial using
the polyroots function. First, p(x) and v are input using the := definition symbol. Note
that v is a vector that contains the coefficients of the polynomial starting with zero-order
term and ending in this case with the third-order term. Next, r is defined (using :=) as
polyroots(v), which invokes the Laguerre method. The roots contained in r are displayed
as $r^T$ using a normal equal sign (=). Next, a plot is constructed in a manner similar to the
above, except that now two range variables, x and j, are used to define the range of the x
axis and the location of the roots. The range variable for x is constructed by typing x and
then “:” (which appears as :=) and then −4, and then “,” and then −3.99, and then “;”
(which is transformed into .. by Mathcad), and finally 4. This creates a vector of values of
x ranging from −4 to 4 with an increment of 0.01 for the x axis with corresponding values
for p(x) on the y axis. The j range variable is used to create three values for r and p(r) that
are plotted as individual small circles. Note that again, in our effort to fit the entire Mathcad
session onto a single screen, we have placed the graph above the commands. You should
realize that the graph would have to be below the commands to work properly.
The last example shows the solution of a system of nonlinear equations using a
Mathcad Solve Block (Fig. 7.9). The process begins with using the definition symbol to
create initial guesses for x and y. The word Given then alerts Mathcad that what follows
is a system of equations. Then come the equations and inequalities (not used here). Note
that for this application Mathcad requires the use of a symbolic equal sign typed as
[Ctrl]= or < and > to separate the left and right sides of an equation. Now, the variable
vec is defined as Find(x,y) and the value of vec is shown using an equal sign.
FIGURE 7.8
Mathcad screen to solve for
roots of polynomial.
FIGURE 7.9
Mathcad screen to solve a
system of nonlinear equations.
PROBLEMS
7.1 Divide a polynomial $f(x) = x^4 - 7.5x^3 + 14.5x^2 + 3x - 20$
by the monomial factor $x - 2$. Is $x = 2$ a root?
7.2 Divide a polynomial $f(x) = x^5 - 5x^4 + x^3 - 6x^2 - 7x + 10$ by
the monomial factor $x - 2$.
7.3 Use Müller’s method to determine the positive real root of
(a) $f(x) = x^3 + x^2 - 4x - 4$
(b) $f(x) = x^3 - 0.5x^2 + 4x - 2$
7.4 Use Müller’s method or MATLAB to determine the real and
complex roots of
(a) $f(x) = x^3 - x^2 + 2x - 2$
(b) $f(x) = 2x^4 + 6x^2 + 8$
(c) $f(x) = x^4 - 2x^3 + 6x^2 - 2x + 5$
7.5 Use Bairstow’s method to determine the roots of
(a) $f(x) = -2 + 6.2x - 4x^2 + 0.7x^3$
(b) $f(x) = 9.34 - 21.97x + 16.3x^2 - 3.704x^3$
(c) $f(x) = x^4 - 2x^3 + 6x^2 - 2x + 5$
7.6 Develop a program to implement Müller’s method. Test it by
duplicating Example 7.2.
7.7 Use the program developed in Prob. 7.6 to determine the real
roots of Prob. 7.4a. Construct a graph (by hand or with a software
package) to develop suitable starting guesses.
7.8 Develop a program to implement Bairstow’s method. Test it by
duplicating Example 7.3.
7.9 Use the program developed in Prob. 7.8 to determine the roots
of the equations in Prob. 7.5.
7.10 Determine the real root of $x^{3.5} = 80$ with Excel, MATLAB or
Mathcad.
7.11 The velocity of a falling parachutist is given by
$$v = \frac{gm}{c}\left(1 - e^{-(c/m)t}\right)$$
where $g = 9.81$ m/s$^2$. For a parachutist with a drag coefficient $c = 15$ kg/s,
compute the mass m so that the velocity is $v = 35$ m/s at
$t = 8$ s. Use Excel, MATLAB or Mathcad to determine m.
7.12 Determine the roots of the simultaneous nonlinear equations
$$y = -x^2 + x + 0.75$$
$$y + 5xy = x^2$$
Employ initial guesses of $x = y = 1.2$ and use the Solver tool from
Excel or a software package of your choice.
7.13 Determine the roots of the simultaneous nonlinear equations
$$(x - 4)^2 + (y - 4)^4 = 5$$
$$x^2 + y^2 = 16$$
Use a graphical approach to obtain your initial guesses. Determine
refined estimates with the Solver tool from Excel or a software
package of your choice.
7.14 Perform the identical MATLAB operations as those in
Example 7.7 or use a software package of your choice to find all the
roots of the polynomial
$$f(x) = (x + 2)(x + 5)(x - 6)(x - 4)(x - 8)$$
Note that the poly function can be used to convert the roots to a
polynomial.
7.15 Use MATLAB or Mathcad to determine the roots for the
equations in Prob. 7.5.
7.16 A two-dimensional circular cylinder is placed in a high-speed
uniform flow. Vortices shed from the cylinder at a constant
frequency, and pressure sensors on the rear surface of the cylinder
detect this frequency by calculating how often the pressure oscil-
lates. Given three data points, use Müller’s method to find the time
where the pressure was zero.
Time 0.60 0.62 0.64
Pressure 20 50 60
7.17 When trying to find the acidity of a solution of magnesium
hydroxide in hydrochloric acid, we obtain the following
equation
$$A(x) = x^3 + 3.5x^2 - 40$$
where x is the hydronium ion concentration. Find the hydronium
ion concentration for a saturated solution (acidity equals zero)
using two different methods in MATLAB (for example, graphically
and the roots function).
7.18 Consider the following system with three unknowns a, u,
and v:
$$u^2 - 2v^2 = a^2$$
$$u + v = 2$$
$$a^2 - 2a - u = 0$$
Solve for the real values of the unknowns using: (a) the Excel
Solver and (b) a symbolic manipulator software package.
7.19 In control systems analysis, transfer functions are developed
that mathematically relate the dynamics of a system’s input to its
output. A transfer function for a robotic positioning system is
given by
$$G(s) = \frac{C(s)}{N(s)} = \frac{s^3 + 9s^2 + 26s + 24}{s^4 + 15s^3 + 77s^2 + 153s + 90}$$
where G(s) = system gain, C(s) = system output, N(s) = system
input, and s = Laplace transform complex frequency. Use a
numerical technique to find the roots of the numerator and denominator
and factor these into the form
$$G(s) = \frac{(s + a_1)(s + a_2)(s + a_3)}{(s + b_1)(s + b_2)(s + b_3)(s + b_4)}$$
where $a_i$ and $b_i$ = the roots of the numerator and denominator,
respectively.
7.20 Develop an M-file function for bisection in a similar fashion
to Fig. 5.10. Test the function by duplicating the computations from
Examples 5.3 and 5.4.
7.21 Develop an M-file function for the false-position method. The
structure of your function should be similar to the bisection
algorithm outlined in Fig. 5.10. Test the program by duplicating
Example 5.5.
7.22 Develop an M-file function for the Newton-Raphson method
based on Fig. 6.4 and Sec. 6.2.3. Along with the initial guess, pass
the function and its derivative as arguments. Test it by duplicating
the computation from Example 6.3.
7.23 Develop an M-file function for the secant method based on
Fig. 6.4 and Sec. 6.3.2. Along with the two initial guesses, pass the
function as an argument. Test it by duplicating the computation
from Example 6.6.
7.24 Develop an M-file function for the modified secant method
based on Fig. 6.4 and Sec. 6.3.2. Along with the initial guess and
the perturbation fraction, pass the function as an argument. Test it
by duplicating the computation from Example 6.8.
CHAPTER 8
Case Studies: Roots of Equations
The purpose of this chapter is to use the numerical procedures discussed in Chaps. 5, 6,
and 7 to solve actual engineering problems. Numerical techniques are important for
practical applications because engineers frequently encounter problems that cannot be
approached using analytical techniques. For example, simple mathematical models that
can be solved analytically may not be applicable when real problems are involved. Thus,
more complicated models must be employed. For these cases, it is appropriate to imple-
ment a numerical solution on a computer. In other situations, engineering design prob-
lems may require solutions for implicit variables in complicated equations.
The following case studies are typical of those that are routinely encountered during
upper-class courses and graduate studies. Furthermore, they are representative of prob-
lems you will address professionally. The problems are drawn from the four major
disciplines of engineering: chemical, civil, electrical, and mechanical. These applications
also serve to illustrate the trade-offs among the various numerical techniques.
The first application, taken from chemical engineering, provides an excellent example
of how root-location methods allow you to use realistic formulas in engineering practice.
In addition, it also demonstrates how the efficiency of the Newton-Raphson technique is
used to advantage when a large number of root-location computations is required.
The following engineering design problems are taken from civil, electrical, and mechan-
ical engineering. Section 8.2 uses bisection to determine changes in rainwater chemistry due
to increases in atmospheric carbon dioxide. Section 8.3 shows how the roots of transcendental
equations can be used in the design of an electrical circuit. Sections 8.2 and 8.3 also illustrate
how graphical methods provide insight into the root-location process. Finally, Sec. 8.4 uses a
variety of numerical methods to compute the friction factor for fluid flow in a pipe.
8.1 IDEAL AND NONIDEAL GAS LAWS
(CHEMICAL/BIO ENGINEERING)
Background. The ideal gas law is given by
$$pV = nRT \tag{8.1}$$
where p is the absolute pressure, V is the volume, n is the number of moles, R is the
universal gas constant, and T is the absolute temperature. Although this equation is
widely used by engineers and scientists, it is accurate over only a limited range of pres-
sure and temperature. Furthermore, Eq. (8.1) is more appropriate for some gases than
for others.
An alternative equation of state for gases is given by
$$\left(p + \frac{a}{v^2}\right)(v - b) = RT \tag{8.2}$$
known as the van der Waals equation, where $v = V/n$ is the molal volume and a and b
are empirical constants that depend on the particular gas.
A chemical engineering design project requires that you accurately estimate the molal
volume (v) of both carbon dioxide and oxygen for a number of different temperature and
pressure combinations so that appropriate containment vessels can be selected. It is also
of interest to examine how well each gas conforms to the ideal gas law by comparing the
molal volume as calculated by Eqs. (8.1) and (8.2). The following data are provided:
R = 0.082054 L atm/(mol K)
Carbon dioxide: a = 3.592, b = 0.04267
Oxygen: a = 1.360, b = 0.03183
The design pressures of interest are 1, 10, and 100 atm for temperature combinations of
300, 500, and 700 K.
Solution. Molal volumes for both gases are calculated using the ideal gas law, with $n = 1$.
For example, if $p = 1$ atm and $T = 300$ K,
$$v = \frac{V}{n} = \frac{RT}{p} = 0.082054\ \frac{\text{L atm}}{\text{mol K}} \cdot \frac{300\ \text{K}}{1\ \text{atm}} = 24.6162\ \text{L/mol}$$
These calculations are repeated for all temperature and pressure combinations and
presented in Table 8.1.
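The repeated ideal-gas calculation is easy to script. The following Python sketch fills in the ideal-gas column for every temperature and pressure combination listed in the problem; the dictionary layout and variable names are choices made for this illustration.

```python
R = 0.082054  # universal gas constant, L atm/(mol K)

temperatures = (300.0, 500.0, 700.0)  # K
pressures = (1.0, 10.0, 100.0)        # atm

# Ideal gas law with n = 1: molal volume v = RT/p
ideal = {(T, p): R * T / p for T in temperatures for p in pressures}

for (T, p), v in ideal.items():
    print(f"T = {T:5.0f} K, p = {p:5.0f} atm, v = {v:8.4f} L/mol")
```

Because the ideal gas law contains no gas-specific constants, these nine values apply to both carbon dioxide and oxygen.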
TABLE 8.1 Computations of molal volume.
                               Molal Volume       Molal Volume       Molal Volume
                               (Ideal Gas Law),   (van der Waals)    (van der Waals)
Temperature,    Pressure,                         Carbon Dioxide,    Oxygen,
K               atm            L/mol              L/mol              L/mol
300             1              24.6162            24.5126            24.5928
                10              2.4616             2.3545             2.4384
                100             0.2462             0.0795             0.2264
500             1              41.0270            40.9821            41.0259
                10              4.1027             4.0578             4.1016
                100             0.4103             0.3663             0.4116
700             1              57.4378            57.4179            57.4460
                10              5.7438             5.7242             5.7521
                100             0.5744             0.5575             0.5842
The computation of molal volume from the van der Waals equation can be accom-
plished using any of the numerical methods for finding roots of equations discussed in
Chaps. 5, 6, and 7, with
$$f(v) = \left(p + \frac{a}{v^2}\right)(v - b) - RT \tag{8.3}$$
In this case, the derivative of f(v) is easy to determine and the Newton-Raphson method is
convenient and efficient to implement. The derivative of f(v) with respect to v is given by
$$f'(v) = p - \frac{a}{v^2} + \frac{2ab}{v^3} \tag{8.4}$$
The Newton-Raphson method is described by Eq. (6.6):
$$v_{i+1} = v_i - \frac{f(v_i)}{f'(v_i)}$$
which can be used to estimate the root. For example, using the initial guess of 24.6162,
the molal volume of carbon dioxide at 300 K and 1 atm is computed as 24.5126 L/mol.
This result was obtained after just two iterations and has an $\varepsilon_a$ of less than 0.001 percent.
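A minimal Python sketch of this calculation follows, using Eqs. (8.3) and (8.4) with the ideal-gas value as the starting guess. The function names, the stopping tolerance of 0.001 percent on the approximate relative error, and the restriction to the 300 K, 1 atm case are choices made for illustration.

```python
R = 0.082054  # L atm/(mol K)

# van der Waals constants (a, b) from the problem statement
GASES = {"carbon dioxide": (3.592, 0.04267), "oxygen": (1.360, 0.03183)}


def f(v, p, T, a, b):
    """Eq. (8.3): van der Waals residual as a function of molal volume v."""
    return (p + a / v**2) * (v - b) - R * T


def df(v, p, a, b):
    """Eq. (8.4): derivative of f with respect to v."""
    return p - a / v**2 + 2.0 * a * b / v**3


def molal_volume(p, T, a, b, tol_percent=0.001, max_iter=50):
    """Newton-Raphson started from the ideal gas law estimate RT/p."""
    v = R * T / p
    for iteration in range(1, max_iter + 1):
        v_new = v - f(v, p, T, a, b) / df(v, p, a, b)
        approx_error = abs((v_new - v) / v_new) * 100.0  # percent
        v = v_new
        if approx_error < tol_percent:
            return v, iteration
    raise RuntimeError("Newton-Raphson did not converge")


results = {name: molal_volume(1.0, 300.0, a, b)
           for name, (a, b) in GASES.items()}
for name, (v, iterations) in results.items():
    print(f"{name}: v = {v:.4f} L/mol after {iterations} iterations")
```

The same function can be called for the other pressure and temperature combinations, although at high pressure the ideal-gas guess is farther from the root and convergence should be checked rather than assumed.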
Similar computations for all combinations of pressure and temperature for both gases
are presented in Table 8.1. It is seen that the results for the ideal gas law differ from
those for the van der Waals equation for both gases, depending on specific values for p
and T. Furthermore, because some of these results are significantly different, your design
of the containment vessels would be quite different, depending on which equation of
state was used.
In this case, a complicated equation of state was examined using the Newton-Raphson
method. The results varied significantly from the ideal gas law for several cases. From
a practical standpoint, the Newton-Raphson method was appropriate for this application
because $f'(v)$ was easy to calculate. Thus, the rapid convergence properties of the
Newton-Raphson method could be exploited.
In addition to demonstrating its power for a single computation, the present design
problem also illustrates how the Newton-Raphson method is especially attractive when
numerous computations are required. Because of the speed of digital computers, the
efficiency of various numerical methods for most roots of equations is indistinguishable
for a single computation. Even a 1-s difference between the crude bisection approach
and the efficient Newton-Raphson does not amount to a significant time loss when only
one computation is performed. However, suppose that millions of root evaluations are
required to solve a problem. In this case, the efficiency of the method could be a decid-
ing factor in the choice of a technique.
For example, suppose that you are called upon to design an automatic computerized
control system for a chemical production process. This system requires accurate estimates
of molal volumes on an essentially continuous basis to properly manufacture the final
product. Gauges are installed that provide instantaneous readings of pressure and
temperature. Evaluations of v must be obtained for a variety of gases that are used in the process.
For such an application, bracketing methods such as bisection or false position would
probably be too time-consuming. In addition, the two initial guesses that are required for
these approaches may also interject a critical delay in the procedure. This shortcoming
is relevant to the secant method, which also needs two initial estimates.
In contrast, the Newton-Raphson method requires only one guess for the root. The
ideal gas law could be employed to obtain this guess at the initiation of the process.
Then, assuming that the time frame is short enough so that pressure and temperature do
not vary wildly between computations, the previous root solution would provide a good
guess for the next application. Thus, the close guess that is often a prerequisite for con-
vergence of the Newton-Raphson method would automatically be available. All the above
considerations would greatly favor the Newton-Raphson technique for such problems.
8.2 GREENHOUSE GASES AND RAINWATER
(CIVIL/ENVIRONMENTAL ENGINEERING)
Background. Civil engineering is a broad field that includes such diverse areas as structural,
geotechnical, transportation, water-resources, and environmental engineering. The last area has
traditionally dealt with pollution control. However, in recent years, environmental engineers
(as well as chemical engineers) have addressed broader problems such as climate change.
It is well documented that the atmospheric levels of several greenhouse gases have
been increasing over the past 50 years. For example, Fig. 8.1 shows data for the partial
pressure of carbon dioxide (CO2) collected at Mauna Loa, Hawaii, from 1958 through
2003. The trend in the data can be nicely fit with a quadratic polynomial (in Part Five, we
will learn how to determine such polynomials),
$$p_{CO_2} = 0.011825(t - 1980.5)^2 + 1.356975(t - 1980.5) + 339$$
where $p_{CO_2}$ = the partial pressure of CO2 in the atmosphere [ppm]. The data indicate that
levels have increased over 19% during the period from 315 to 376 ppm.
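As a quick check on the fitted trend, the quadratic can be evaluated at the two ends of the record. The Python sketch below does this; the function name `p_co2` is a label chosen here, and the values it produces come from the fitted curve rather than from the measurements themselves.

```python
def p_co2(year):
    """Quadratic fit to the Mauna Loa record, in ppm."""
    dt = year - 1980.5
    return 0.011825 * dt**2 + 1.356975 * dt + 339.0


p_1958 = p_co2(1958)
p_2003 = p_co2(2003)
increase_percent = (p_2003 - p_1958) / p_1958 * 100.0

print(f"1958: {p_1958:.1f} ppm, 2003: {p_2003:.1f} ppm, "
      f"increase: {increase_percent:.1f}%")
```

The fitted endpoints differ slightly from the rounded 315 and 376 ppm quoted in the text, which is expected because the polynomial smooths the measured data.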
FIGURE 8.1
Average annual partial pressures of atmospheric carbon dioxide (ppm) measured at Mauna Loa,
Hawaii. [Plot: pCO2 (ppm), 310 to 370, versus year, 1950 to 2010.]
Aside from global warming, greenhouse gases can also influence atmospheric chemistry.
One question that we can address is how the carbon dioxide trend is affecting the pH of
rainwater. Outside of urban and industrial areas, it is well documented that carbon dioxide is
the primary determinant of the pH of the rain. pH is the measure of the activity of hydrogen
ions and, therefore, of the acidity of a solution. For dilute aqueous solutions, it can be computed as
$$\mathrm{pH} = -\log_{10}[\mathrm{H}^+] \tag{8.5}$$
where $[\mathrm{H}^+]$ is the molar concentration of hydrogen ions.
The following system of five nonlinear equations governs the chemistry of rainwater,
$$K_1 = 10^6 \frac{[\mathrm{H}^+][\mathrm{HCO}_3^-]}{K_H\, p_{CO_2}} \tag{8.6}$$
$$K_2 = \frac{[\mathrm{H}^+][\mathrm{CO}_3^{2-}]}{[\mathrm{HCO}_3^-]} \tag{8.7}$$
$$K_w = [\mathrm{H}^+][\mathrm{OH}^-] \tag{8.8}$$
$$c_T = \frac{K_H\, p_{CO_2}}{10^6} + [\mathrm{HCO}_3^-] + [\mathrm{CO}_3^{2-}] \tag{8.9}$$
$$0 = [\mathrm{HCO}_3^-] + 2[\mathrm{CO}_3^{2-}] + [\mathrm{OH}^-] - [\mathrm{H}^+] \tag{8.10}$$
where $K_H$ = Henry’s constant, and $K_1$, $K_2$, and $K_w$ are equilibrium coefficients. The five
unknowns in this system of five nonlinear equations are $c_T$ = total inorganic carbon,
$[\mathrm{HCO}_3^-]$ = bicarbonate, $[\mathrm{CO}_3^{2-}]$ = carbonate, $[\mathrm{H}^+]$ = hydrogen ion, and $[\mathrm{OH}^-]$ =
hydroxyl ion. Notice how the partial pressure of CO2 shows up in Eqs. (8.6) and (8.9).
Use these equations to compute the pH of rainwater given that $K_H = 10^{-1.46}$,
$K_1 = 10^{-6.3}$, $K_2 = 10^{-10.3}$, and $K_w = 10^{-14}$. Compare the results in 1958 when the $p_{CO_2}$
was 315 and in 2003 when it was 375 ppm. When selecting a numerical method for your
computation, consider the following:
• You know with certainty that the pH of rain in pristine areas always falls between
2 and 12.
• You also know that your measurement devices can only measure pH to two places of
decimal precision.
Solution. There are a variety of ways to solve this nonlinear system of five equations.
One way is to eliminate unknowns by combining them to produce a single function that
only depends on $[\mathrm{H}^+]$. To do this, first solve Eqs. (8.6) and (8.7) for
$$[\mathrm{HCO}_3^-] = \frac{K_1}{10^6 [\mathrm{H}^+]} K_H\, p_{CO_2} \tag{8.11}$$
$$[\mathrm{CO}_3^{2-}] = \frac{K_2 [\mathrm{HCO}_3^-]}{[\mathrm{H}^+]} \tag{8.12}$$
Substitute Eq. (8.11) into (8.12)
$$[\mathrm{CO}_3^{2-}] = \frac{K_2 K_1}{10^6 [\mathrm{H}^+]^2} K_H\, p_{CO_2} \tag{8.13}$$
Equations (8.11) and (8.13) can be substituted along with Eq. (8.8) into Eq. (8.10) to give
$$0 = \frac{K_1}{10^6 [\mathrm{H}^+]} K_H\, p_{CO_2} + 2\frac{K_2 K_1}{10^6 [\mathrm{H}^+]^2} K_H\, p_{CO_2} + \frac{K_w}{[\mathrm{H}^+]} - [\mathrm{H}^+] \tag{8.14}$$
Although it might not be apparent, this result is a third-order polynomial in $[\mathrm{H}^+]$. Thus,
its root can be used to compute the pH of the rainwater.
Now we must decide which numerical method to employ to obtain the solution.
There are two reasons why bisection would be a good choice. First, the fact that the pH
always falls within the range from 2 to 12 provides us with two good initial guesses.
Second, because the pH can only be measured to two decimal places of precision, we
will be satisfied with an absolute error of $E_{a,d} = 0.005$. Remember that given an initial
bracket and the desired absolute error, we can compute the number of iterations a priori.
Using Eq. (5.5) with the bracket width of 12 − 2 = 10, the result is
$n = \log_2(10/0.005) = 10.9658$. Thus, eleven iterations of
bisection will produce the desired precision.
If this is done, the result for 1958 will be a pH of 5.6279 with a relative error of
0.0868%. We can be confident that the rounded result of 5.63 is correct to two decimal
places. This can be verified by performing another run with more iterations. For example,
if we perform 35 iterations, a result of 5.6304 is obtained with an approximate relative
error of $\varepsilon_a = 5.17 \times 10^{-9}\%$. The same calculation can be repeated for the 2003
conditions to give pH = 5.59 with $\varepsilon_a = 0.0874\%$.
Interestingly, these results indicate that the 19% rise in atmospheric CO2 levels has
produced only a 0.67% drop in pH. Although this is certainly true, remember that the
pH represents a logarithmic scale as defined by Eq. (8.5). Consequently, a unit drop in
pH represents a 10-fold increase in hydrogen ion. The concentration can be computed
as $[\mathrm{H^+}] = 10^{-\mathrm{pH}}$ and the resulting percent change can be calculated as 9.1%. Therefore,
the hydrogen ion concentration has increased about 9%.
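The rainwater calculation can be sketched in a few lines of code. The following is a minimal illustration in Python (chosen here only for convenience; it is not the book's own program), assuming that bisection is applied directly to pH over the bracket from 2 to 12 and that Eq. (8.14) is evaluated with $[\mathrm{H^+}] = 10^{-\mathrm{pH}}$. The names `charge_balance` and `bisect_ph` are hypothetical.

```python
import math

# Equilibrium constants from the problem statement.
K_H, K_1, K_2, K_W = 10**-1.46, 10**-6.3, 10**-10.3, 10**-14


def charge_balance(ph, p_co2):
    """Right-hand side of Eq. (8.14), with [H+] = 10**(-pH) and pCO2 in ppm."""
    h = 10.0 ** (-ph)
    bicarbonate = K_1 * K_H * p_co2 / (1e6 * h)   # Eq. (8.11)
    carbonate = K_2 * bicarbonate / h             # Eq. (8.12)
    return bicarbonate + 2.0 * carbonate + K_W / h - h


def bisect_ph(p_co2, lo=2.0, hi=12.0, n_iter=11):
    """Return the midpoint estimate after n_iter bisection steps on pH."""
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if charge_balance(lo, p_co2) * charge_balance(mid, p_co2) < 0.0:
            hi = mid   # sign change lies in the lower half
        else:
            lo = mid   # otherwise the root is in the upper half
    return mid


# Number of iterations needed for an absolute error of 0.005 on a bracket of width 10.
n_needed = math.ceil(math.log2((12.0 - 2.0) / 0.005))
ph_1958 = bisect_ph(315.0, n_iter=n_needed)
ph_2003 = bisect_ph(375.0, n_iter=n_needed)

# Tighter estimates (40 steps) for the hydrogen-ion comparison.
h_1958 = 10.0 ** (-bisect_ph(315.0, n_iter=40))
h_2003 = 10.0 ** (-bisect_ph(375.0, n_iter=40))
h_rise_percent = 100.0 * (h_2003 - h_1958) / h_1958

print(n_needed, round(ph_1958, 4), round(ph_2003, 4), round(h_rise_percent, 1))
```

The percent change in hydrogen ion is computed from the tightly converged pH values rather than from the two-decimal roundings, since rounding pH first would distort a quantity that depends exponentially on it.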
There is quite a lot of controversy related to the true significance of the greenhouse gas
trends. However, regardless of the ultimate implications, it is quite sobering to realize that
something as large as our atmosphere has changed so much over a relatively short time
period. This case study illustrates how numerical methods can be employed to analyze and
interpret such trends. Over the coming years, engineers and scientists can hopefully use such
tools to gain increased understanding and help rationalize the debate over their ramifications.
8.3 DESIGN OF AN ELECTRIC CIRCUIT
(ELECTRICAL ENGINEERING)
Background. Electrical engineers often use Kirchhoff’s laws to study the steady-state
(not time-varying) behavior of electric circuits. Such steady-state behavior will be exam-
ined in Sec. 12.3. Another important problem involves circuits that are transient in nature
where sudden temporal changes take place. Such a situation occurs following the closing
of the switch in Fig. 8.2. In this case, there will be a period of adjustment following the
closing of the switch as a new steady state is reached. The length of this adjustment
period is closely related to the storage properties of the capacitor and the inductor. Energy
storage may oscillate between these two elements during a transient period. However,
resistance in the circuit will dissipate the magnitude of the oscillations.
The flow of current through the resistor causes a voltage drop ($V_R$) given by
$$V_R = iR$$
where $i$ = the current and $R$ = the resistance of the resistor. When $R$ and $i$ have units
of ohms and amperes, respectively, $V_R$ has units of volts.
Similarly, an inductor resists changes in current, such that the voltage drop $V_L$ across
it is
$$V_L = L\frac{di}{dt}$$
where $L$ = the inductance. When $L$ and $i$ have units of henrys and amperes, respectively,
$V_L$ has units of volts and $t$ has units of seconds.
The voltage drop across the capacitor ($V_C$) depends on the charge ($q$) on it:
$$V_C = \frac{q}{C} \tag{8.15}$$
where $C$ = the capacitance. When the charge is expressed in units of coulombs, the unit
of $C$ is the farad.
Kirchhoff’s second law states that the algebraic sum of voltage drops around a closed
circuit is zero. After the switch is closed we have
$$L\frac{di}{dt} + Ri + \frac{q}{C} = 0 \tag{8.16}$$
However, the current is related to the charge according to
$$i = \frac{dq}{dt} \tag{8.17}$$
Therefore,
$$L\frac{d^2q}{dt^2} + R\frac{dq}{dt} + \frac{1}{C}q = 0 \tag{8.18}$$
This is a second-order linear ordinary differential equation that can be solved using the
methods of calculus. This solution is given by
$$q(t) = q_0 e^{-Rt/(2L)}\cos\left[\sqrt{\frac{1}{LC} - \left(\frac{R}{2L}\right)^2}\;t\right] \tag{8.19}$$
FIGURE 8.2
An electric circuit. When the switch is closed, the current will undergo a series of
oscillations until a new steady state is reached.
[Circuit diagram labels: switch, resistor, capacitor, inductor, battery V0, current i.]
where at $t = 0$, $q = q_0 = V_0C$, and $V_0$ = the voltage from the charging battery. Equation
(8.19) describes the time variation of the charge on the capacitor. The solution q(t) is
plotted in Fig. 8.3.
A typical electrical engineering design problem might involve determining the proper
resistor to dissipate energy at a specified rate, with known values for L and C. For this
problem, assume the charge must be dissipated to 1 percent of its original value ($q/q_0 = 0.01$)
in $t = 0.05$ s, with $L = 5$ H and $C = 10^{-4}$ F.
Solution. It is necessary to solve Eq. (8.19) for R, with known values of $q$, $q_0$, $L$, and
$C$. However, a numerical approximation technique must be employed because R is an
implicit variable in Eq. (8.19). The bisection method will be used for this purpose. The
other methods discussed in Chaps. 5 and 6 are also appropriate, although the
Newton-Raphson method might be deemed inconvenient because the derivative of Eq. (8.19) is
a little cumbersome. Rearranging Eq. (8.19),
$$f(R) = e^{-Rt/(2L)}\cos\left[\sqrt{\frac{1}{LC} - \left(\frac{R}{2L}\right)^2}\;t\right] - \frac{q}{q_0}$$
or using the numerical values given,
$$f(R) = e^{-0.005R}\cos\left[\sqrt{2000 - 0.01R^2}\,(0.05)\right] - 0.01 \tag{8.20}$$
Examination of this equation suggests that a reasonable initial range for R is 0 to 400 Ω
(because $2000 - 0.01R^2$ must be greater than zero). Figure 8.4, a plot of Eq. (8.20),
confirms this. Twenty-one iterations of the bisection method give R = 328.1515 Ω, with
an error of less than 0.0001 percent.
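As an illustration of the resistor calculation, the sketch below applies 21 bisection steps to Eq. (8.20) over the bracket from 0 to 400 Ω. It is written in Python for convenience and is only one possible arrangement; the function names `f` and `bisect` and the constant names are assumptions made for this example.

```python
import math

# Values from the problem statement: L in henrys, C in farads, t in seconds.
L_H, C_F, T_S, Q_RATIO = 5.0, 1e-4, 0.05, 0.01


def f(r_ohm):
    """Eq. (8.19) rearranged as f(R) = q(t)/q0 - 0.01 at t = 0.05 s."""
    decay = math.exp(-r_ohm * T_S / (2.0 * L_H))
    omega = math.sqrt(1.0 / (L_H * C_F) - (r_ohm / (2.0 * L_H)) ** 2)
    return decay * math.cos(omega * T_S) - Q_RATIO


def bisect(func, lo, hi, n_iter):
    """Plain bisection: return the last midpoint after n_iter halvings."""
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if func(lo) * func(mid) < 0.0:
            hi = mid
        else:
            lo = mid
    return mid


r_root = bisect(f, 0.0, 400.0, n_iter=21)
print(round(r_root, 4), f(r_root))
```

Keeping the upper bracket at 400 Ω keeps the argument of the square root positive, which is the reason the text gives for the initial range.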
FIGURE 8.3
The charge on a capacitor as a function of time following the closing of the switch in
Fig. 8.2.
[Plot of q(t) versus time; the vertical axis is marked at q0.]
FIGURE 8.4
Plot of Eq. (8.20) used to obtain initial guesses for R that bracket the root.
[Plot of f(R) versus R for R from 0 to 400; the root is marked at approximately 325.]
Thus, you can specify a resistor with this rating for the circuit shown in Fig. 8.2
and expect to achieve a dissipation performance that is consistent with the requirements
of the problem. This design problem could not be solved efficiently without using the
numerical methods in Chaps. 5 and 6.
8.4 PIPE FRICTION (MECHANICAL/AEROSPACE ENGINEERING)
Background. Determining fluid flow through pipes and tubes has great relevance in
many areas of engineering and science. In mechanical and aerospace engineering, typical
applications include the flow of liquids and gases through cooling systems.
The resistance to flow in such conduits is parameterized by a dimensionless number
called the friction factor. For turbulent flow, the Colebrook equation provides a means
to calculate the friction factor,
$$0 = \frac{1}{\sqrt{f}} + 2.0\log\left(\frac{\varepsilon}{3.7D} + \frac{2.51}{\mathrm{Re}\sqrt{f}}\right) \tag{8.21}$$
where $\varepsilon$ = the roughness (m), $D$ = diameter (m), and Re = the Reynolds number,
$$\mathrm{Re} = \frac{\rho VD}{\mu}$$
where $\rho$ = the fluid’s density (kg/m³), $V$ = its velocity (m/s), and $\mu$ = dynamic
viscosity (N · s/m²). In addition to appearing in Eq. (8.21), the Reynolds number also serves
as the criterion for whether flow is turbulent (Re > 4000).
In the present case study, we will illustrate how the numerical methods covered in this
part of the book can be employed to determine f for air flow through a smooth, thin tube.
For this case, the parameters are $\rho = 1.23$ kg/m³, $\mu = 1.79 \times 10^{-5}$ N · s/m², $D = 0.005$ m,
$V = 40$ m/s, and $\varepsilon = 0.0015$ mm. Note that friction factors range from about 0.008 to 0.08.
In addition, an explicit formulation called the Swamee-Jain equation provides an
approximate estimate,
$$f = \frac{1.325}{\left[\ln\left(\dfrac{\varepsilon}{3.7D} + \dfrac{5.74}{\mathrm{Re}^{0.9}}\right)\right]^2} \tag{8.22}$$
Solution. The Reynolds number can be computed as
$$\mathrm{Re} = \frac{\rho VD}{\mu} = \frac{1.23(40)0.005}{1.79 \times 10^{-5}} = 13{,}743$$
This value along with the other parameters can be substituted into Eq. (8.21) to give
$$g(f) = \frac{1}{\sqrt{f}} + 2.0\log\left(\frac{0.0000015}{3.7(0.005)} + \frac{2.51}{13{,}743\sqrt{f}}\right)$$
Before determining the root, it is advisable to plot the function to estimate initial
guesses and to anticipate possible difficulties. This can be done easily with tools such
as MATLAB software, Excel, or Mathcad. For example, a plot of the function can be
generated with the following MATLAB commands
>> rho=1.23;mu=1.79e-5;D=0.005;V=40;e=0.0015/1000;
>> Re=rho*V*D/mu;
>> g=@(f) 1/sqrt(f)+2*log10(e/(3.7*D)+2.51/(Re*sqrt(f)));
>> fplot(g,[0.008 0.08]),grid,xlabel('f'),ylabel('g(f)')
As in Fig. 8.5, the root is located at about 0.03.
Because we are supplied initial guesses ($x_l = 0.008$ and $x_u = 0.08$), either of the
bracketing methods from Chap. 5 could be used. For example, bisection gives a value
of $f = 0.0289678$ with a percent relative error of $5.926 \times 10^{-5}$ in 22 iterations.
False position yields a result of similar precision in 26 iterations. Thus, although they
produce the correct result, they are somewhat inefficient. This would not be important
for a single application, but could become prohibitive if many evaluations were made.
We could try to attain improved performance by turning to an open method. Because
Eq. (8.21) is relatively straightforward to differentiate, the Newton-Raphson method is a good
candidate. For example, using an initial guess at the lower end of the range ($x_0 = 0.008$),
Newton-Raphson converges quickly to 0.0289678 with an approximate error of $6.87 \times 10^{-6}\%$
in only 6 iterations. However, when the initial guess is set at the upper end of the range
($x_0 = 0.08$), the routine diverges!
As can be seen by inspecting Fig. 8.5, this occurs because the function’s slope at
the initial guess causes the first iteration to jump to a negative value. Further runs
demonstrate that for this case, convergence only occurs when the initial guess is below
about 0.066.
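The sensitivity of Newton-Raphson to the starting value can be reproduced with a short script. The sketch below is in Python rather than MATLAB and is only an illustration: the derivative `dg` was obtained by differentiating g(f) by hand for this example, and the names `A`, `B`, `g`, `dg`, and `newton` are assumptions. A non-positive iterate is treated as failure because the square root in g(f) is then undefined for real f.

```python
import math

# Parameters from the case study (epsilon converted from mm to m).
RHO, MU, D, V, EPS = 1.23, 1.79e-5, 0.005, 40.0, 0.0015e-3
RE = RHO * V * D / MU
A = EPS / (3.7 * D)   # roughness term inside the logarithm
B = 2.51 / RE         # coefficient of 1/sqrt(f) inside the logarithm


def g(f):
    """Colebrook residual, Eq. (8.21), using the base-10 logarithm."""
    return 1.0 / math.sqrt(f) + 2.0 * math.log10(A + B / math.sqrt(f))


def dg(f):
    """Analytical derivative of g with respect to f."""
    u = A + B / math.sqrt(f)
    return -0.5 * f**-1.5 * (1.0 + 2.0 * B / (math.log(10.0) * u))


def newton(f0, tol=1e-10, max_iter=50):
    """Return the converged root, or None if an iterate becomes non-positive."""
    f = f0
    for _ in range(max_iter):
        f_new = f - g(f) / dg(f)
        if f_new <= 0.0:
            return None   # first step overshoots into f <= 0
        if abs(f_new - f) <= tol * abs(f_new):
            return f_new
        f = f_new
    return None


print(newton(0.008), newton(0.08))
```

Starting from 0.008 the iteration settles on the root, while starting from 0.08 the first step lands on a negative value and the sketch reports failure, in line with the behavior described above.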
FIGURE 8.5
[Plot of g(f) versus f for f from 0.01 to 0.08; the g(f) axis runs from −3 to 6.]
So we can see that although the Newton-Raphson method is very efficient, it requires good
initial guesses. For the Colebrook equation, a good strategy might be to employ the
Swamee-Jain equation (Eq. 8.22) to provide the initial guess as in
$$f = \frac{1.325}{\left[\ln\left(\dfrac{0.0000015}{3.7(0.005)} + \dfrac{5.74}{13{,}743^{0.9}}\right)\right]^2} = 0.029031$$
For this case, Newton-Raphson converges in only 3 iterations to 0.0289678 with
an approximate error of $8.51 \times 10^{-10}\%$.
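To show how close the explicit estimate already is, the following Python sketch evaluates the Swamee-Jain formula of Eq. (8.22) and then the Colebrook residual of Eq. (8.21) at that estimate. It is a minimal illustration only; the helper names `swamee_jain` and `colebrook_residual` are hypothetical.

```python
import math

# Parameters from the case study (epsilon converted from mm to m).
RHO, MU, D, V, EPS = 1.23, 1.79e-5, 0.005, 40.0, 0.0015e-3
RE = RHO * V * D / MU


def swamee_jain(re, eps, d):
    """Explicit friction-factor estimate, Eq. (8.22), with the natural logarithm."""
    return 1.325 / math.log(eps / (3.7 * d) + 5.74 / re**0.9) ** 2


def colebrook_residual(f, re, eps, d):
    """Left-hand side of Eq. (8.21); zero at the exact friction factor."""
    return 1.0 / math.sqrt(f) + 2.0 * math.log10(eps / (3.7 * d) + 2.51 / (re * math.sqrt(f)))


f_guess = swamee_jain(RE, EPS, D)
print(round(f_guess, 6), colebrook_residual(f_guess, RE, EPS, D))
```

A small residual at `f_guess` indicates that the estimate sits close to the root, which is why an open method started there needs only a few corrections.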
Aside from our homemade functions, we can also use professional root finders like
MATLAB’s built-in fzero function. However, just as with the Newton-Raphson method,
divergence also occurs when the fzero function is used with a single guess. In
this case, though, it is guesses at the lower end of the range that cause problems. For example,
>> rho=1.23;mu=1.79e-5;D=0.005;V=40;e=0.0015/1000;
>> Re=rho*V*D/mu
>> g=@(f) 1/sqrt(f)+2*log10(e/(3.7*D)+2.51/(Re*sqrt(f)));
>> fzero(g,0.008)
Exiting fzero: aborting search for an interval containing a
sign change because complex function value encountered
during search. (Function value at -0.0028 is -4.92028-20.2423i.)
Check function or try again with a different starting value.
ans =
NaN
If the iterations are displayed using optimset (recall Sec. 7.7.2), it is revealed that a
negative value occurs during the search phase before a sign change is detected and the
routine aborts. However, for single initial guesses above about 0.016, the routine works
nicely. For example, for the guess of 0.08 that caused problems for Newton-Raphson,
fzero does just fine,
>> fzero(g,0.08)
ans =
0.02896781017144
As a final note, let’s see whether convergence is possible for simple fixed-point iteration.
The easiest and most straightforward version involves solving for the first f in Eq. (8.21),
$$f_{i+1} = \frac{0.25}{\left[\log\left(\dfrac{\varepsilon}{3.7D} + \dfrac{2.51}{\mathrm{Re}\sqrt{f_i}}\right)\right]^2} \tag{8.23}$$
The cobweb display of this function indicates a surprising result (Fig. 8.6).
Recall that fixed-point iteration converges when the $y_2$ curve has a relatively flat slope
(i.e., $|g'(\xi)| < 1$). As indicated by Fig. 8.6, the fact that the $y_2$ curve is quite flat in the
range from $f = 0.008$ to 0.08 means that not only does fixed-point iteration converge,
but it converges fairly rapidly! In fact, for initial guesses anywhere between 0.008 and 0.08,
fixed-point iteration yields predictions with percent relative errors less than 0.008% in
six or fewer iterations. Thus, this simple approach that requires only one guess and no
derivative estimates performs really well for this particular case.
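The fixed-point scheme of Eq. (8.23) takes only a few lines to try out. The Python sketch below is an illustration under the case-study parameters; the function name `fixed_point` and the default of six iterations are choices made for this example.

```python
import math

# Parameters from the case study (epsilon converted from mm to m).
RHO, MU, D, V, EPS = 1.23, 1.79e-5, 0.005, 40.0, 0.0015e-3
RE = RHO * V * D / MU
A = EPS / (3.7 * D)
B = 2.51 / RE


def fixed_point(f0, n_iter=6):
    """Apply Eq. (8.23) n_iter times, starting from the guess f0."""
    f = f0
    for _ in range(n_iter):
        f = 0.25 / math.log10(A + B / math.sqrt(f)) ** 2
    return f


for start in (0.008, 0.03, 0.08):
    print(start, fixed_point(start))
```

Because each update only evaluates the right-hand side of Eq. (8.23), no derivative and no bracketing interval are needed, and every iterate stays positive for a positive starting guess.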
The take-home message from this case study is that even great, professionally-
developed software like MATLAB is not always foolproof. Further, there is usually no
single method that works best for all problems. Sophisticated users understand the
strengths and weaknesses of the available numerical techniques. In addition, they under-
stand enough of the underlying theory so that they can effectively deal with situations
where a method breaks down.
FIGURE 8.6
[Cobweb plot of $y_1 = x$ and $y_2 = g(x)$ for x from 0 to 0.08 and y from 0 to 0.05.]
PROBLEMS
Chemical/Bio Engineering
8.1 Perform the same computation as in Sec. 8.1, but for ethyl
alcohol ($a = 12.02$ and $b = 0.08407$) at a temperature of 375 K and
p of 2.0 atm. Compare your results with the ideal gas law. Use any
of the numerical methods discussed in Chaps. 5 and 6 to perform
the computation. Justify your choice of technique.
8.2 In chemical engineering, plug flow reactors (that is, those in
which fluid flows from one end to the other with minimal mixing
along the longitudinal axis) are often used to convert reactants into
products. It has been determined that the efficiency of the conversion
can sometimes be improved by recycling a portion of the
product stream so that it returns to the entrance for an additional
pass through the reactor (Fig. P8.2). The recycle rate is defined as
$$R = \frac{\text{volume of fluid returned to entrance}}{\text{volume leaving the system}}$$
Suppose that we are processing a chemical A to generate a product B.
For the case where A forms B according to an autocatalytic reaction
(that is, in which one of the products acts as a catalyst or
stimulus for the reaction), it can be shown that an optimal recycle
rate must satisfy
moles of C that are produced. Conservation of mass can be used to
reformulate the equilibrium relationship as
$$K = \frac{c_{c,0} + x}{(c_{a,0} - 2x)^2(c_{b,0} - x)}$$
where the subscript 0 designates the initial concentration of each
constituent. If $K = 0.015$, $c_{a,0} = 42$, $c_{b,0} = 30$, and $c_{c,0} = 4$, determine
the value of x. (a) Obtain the solution graphically. (b) On the basis of
(a), solve for the root with initial guesses of $x_l = 0$ and $x_u = 20$ to
$\varepsilon_s = 0.5\%$. Choose either bisection or false position to obtain your
solution. Justify your choice.
8.6 The following chemical reactions take place in a closed system
$$2A + B \rightleftharpoons C$$
$$A + D \rightleftharpoons C$$
At equilibrium, they can be characterized by
$$K_1 = \frac{c_c}{c_a^2 c_b} \qquad K_2 = \frac{c_c}{c_a c_d}$$
where the nomenclature $c_i$ represents the concentration of constituent
i. If $x_1$ and $x_2$ are the number of moles of C that are produced due to
the first and second reactions, respectively, use an approach similar
to that of Prob. 8.5 to reformulate the equilibrium relationships in
terms of the initial concentrations of the constituents. Then, use the
Newton-Raphson method to solve the pair of simultaneous nonlinear
equations for $x_1$ and $x_2$ if $K_1 = 4 \times 10^{-4}$, $K_2 = 3.7 \times 10^{-2}$,
$c_{a,0} = 50$, $c_{b,0} = 20$, $c_{c,0} = 5$, and $c_{d,0} = 10$. Use a graphical
approach to develop your initial guesses.
8.7 The Redlich-Kwong equation of state is given by
$$p = \frac{RT}{v - b} - \frac{a}{v(v + b)\sqrt{T}}$$
where $R$ = the universal gas constant [= 0.518 kJ/(kg K)],
$T$ = absolute temperature (K), $p$ = absolute pressure (kPa), and
$v$ = the volume of a kg of gas (m³/kg). The parameters a and b
are calculated by
$$a = 0.427\frac{R^2T_c^{2.5}}{p_c} \qquad b = 0.0866R\frac{T_c}{p_c}$$
where $p_c$ = critical pressure (kPa) and $T_c$ = critical temperature (K).
As a chemical engineer, you are asked to determine the amount of
methane fuel ($p_c = 4600$ kPa and $T_c = 191$ K) that can be held in a
3-m³ tank at a temperature of −40 °C with a pressure of 65,000 kPa.
Use a root-locating method of your choice to calculate $v$ and then
determine the mass of methane contained in the tank.
$$\ln\frac{1 + R(1 - X_{Af})}{R(1 - X_{Af})} = \frac{R + 1}{R[1 + R(1 - X_{Af})]}$$
where $X_{Af}$ = the fraction of reactant A that is converted to product
B. The optimal recycle rate corresponds to the minimum-sized
reactor needed to attain the desired level of conversion. Use a
numerical method to determine the recycle ratios needed to minimize
reactor size for a fractional conversion of $X_{Af} = 0.9$.
8.3 In a chemical engineering process, water vapor (H2O) is heated to
sufficiently high temperatures that a significant portion of the water
dissociates, or splits apart, to form oxygen (O2) and hydrogen (H2):
$$\mathrm{H_2O} \rightleftharpoons \mathrm{H_2} + \tfrac{1}{2}\mathrm{O_2}$$
If it is assumed that this is the only reaction involved, the mole
fraction x of H2O that dissociates can be represented by
$$K = \frac{x}{1 - x}\sqrt{\frac{2p_t}{2 + x}} \tag{P8.3.1}$$
where $K$ = the reaction equilibrium constant and $p_t$ = the total
pressure of the mixture. If $p_t = 3$ atm and $K = 0.05$, determine the
value of x that satisfies Eq. (P8.3.1).
8.4 The following equation pertains to the concentration of a
chemical in a completely mixed reactor:
$$c = c_{in}(1 - e^{-0.04t}) + c_0e^{-0.04t}$$
If the initial concentration $c_0 = 4$ and the inflow concentration $c_{in} = 10$,
compute the time required for c to be 93 percent of $c_{in}$.
8.5 A reversible chemical reaction
$$2A + B \rightleftharpoons C$$
can be characterized by the equilibrium relationship
$$K = \frac{c_c}{c_a^2 c_b}$$
where the nomenclature $c_i$ represents the concentration of constituent i.
Suppose that we define a variable x as representing the number of
FIGURE P8.2
Schematic representation of a plug flow reactor with recycle.
[Diagram labels: feed, plug flow reactor, recycle, product.]
Given the parameter values listed below, find the void fraction $\varepsilon$ of
the bed.
$$\frac{D_pG_o}{\mu} = 1000 \qquad \frac{\Delta P\,\rho D_p}{G_o^2L} = 10$$
8.13 The pressure drop in a section of pipe can be calculated as
$$\Delta p = f\frac{L\rho V^2}{2D}$$
where $\Delta p$ = the pressure drop (Pa), $f$ = the friction factor, $L$ = the
length of pipe [m], $\rho$ = density (kg/m³), $V$ = velocity (m/s), and
$D$ = diameter (m). For turbulent flow, the Colebrook equation provides
a means to calculate the friction factor,
$$\frac{1}{\sqrt{f}} = -2.0\log\left(\frac{\varepsilon}{3.7D} + \frac{2.51}{\mathrm{Re}\sqrt{f}}\right)$$
where $\varepsilon$ = the roughness (m), and Re = the Reynolds number,
$$\mathrm{Re} = \frac{\rho VD}{\mu}$$
where $\mu$ = dynamic viscosity (N · s/m²).
(a) Determine $\Delta p$ for a 0.2-m-long horizontal stretch of smooth
drawn tubing given $\rho = 1.23$ kg/m³, $\mu = 1.79 \times 10^{-5}$ N · s/m²,
$D = 0.005$ m, $V = 40$ m/s, and $\varepsilon = 0.0015$ mm. Use a numerical
method to determine the friction factor. Note that for smooth pipes
with Re < $10^5$, a good initial guess can be obtained using the
Blasius formula, $f = 0.316/\mathrm{Re}^{0.25}$.
(b) Repeat the computation but for a rougher commercial steel
pipe ($\varepsilon = 0.045$ mm).
Civil and Environmental Engineering
8.14 In structural engineering, the secant formula defines the force
per unit area, $P/A$, that causes a maximum stress $\sigma_m$ in a column of
given slenderness ratio $L/k$:
$$\frac{P}{A} = \frac{\sigma_m}{1 + (ec/k^2)\sec\left[0.5\sqrt{P/(EA)}\,(L/k)\right]}$$
where $ec/k^2$ = the eccentricity ratio and $E$ = the modulus of elasticity.
If for a steel beam, $E = 200{,}000$ MPa, $ec/k^2 = 0.2$, and $\sigma_m = 250$ MPa,
compute $P/A$ for $L/k = 100$. Recall that $\sec x = 1/\cos x$.
8.15 In environmental engineering (a specialty area in civil
engineering), the following equation can be used to compute the oxygen
level c (mg/L) in a river downstream from a sewage discharge:
$$c = 10 - 20(e^{-0.2x} - e^{-0.75x})$$
8.8 The volume V of liquid in a hollow horizontal cylinder of
radius r and length L is related to the depth of the liquid h by
$$V = \left[r^2\cos^{-1}\left(\frac{r - h}{r}\right) - (r - h)\sqrt{2rh - h^2}\right]L$$
Determine h given $r = 2$ m, $L = 5$ m, and $V = 8$ m³. Note that if you
are using a programming language or software tool that is not rich in
trigonometric functions, the arc cosine can be computed with
$$\cos^{-1}x = \frac{\pi}{2} - \tan^{-1}\left(\frac{x}{\sqrt{1 - x^2}}\right)$$
8.9 The volume V of liquid in a spherical tank of radius r is related
to the depth h of the liquid by
$$V = \frac{\pi h^2(3r - h)}{3}$$
Determine h given $r = 1$ m and $V = 0.5$ m³.
8.10 For the spherical tank in Prob. 8.9, it is possible to develop the
following two fixed-point formulas:
$$h = \sqrt{\frac{h^3 + (3V/\pi)}{3r}}$$
and
$$h = \sqrt[3]{3\left(rh^2 - \frac{V}{\pi}\right)}$$
If $r = 1$ m and $V = 0.5$ m³, determine whether either of these is
stable, and the range of initial guesses for which they are stable.
8.11 The operation of a constant density plug flow reactor for the
production of a substance via an enzymatic reaction is described by
the equation below, where V is the volume of the reactor, F is the
flow rate of reactant C, $C_{in}$ and $C_{out}$ are the concentrations of reactant
entering and leaving the reactor, respectively, and $K$ and $k_{max}$
are constants. For a 100-L reactor, with an inlet concentration of
$C_{in} = 0.2$ M, an inlet flow rate of 80 L/s, $k_{max} = 10^{-2}$ s⁻¹, and
$K = 0.1$ M, find the concentration of C at the outlet of the reactor.
$$\frac{V}{F} = -\int_{C_{in}}^{C_{out}}\left(\frac{K}{k_{max}C} + \frac{1}{k_{max}}\right)dC$$
8.12 The Ergun equation, shown below, is used to describe the
flow of a fluid through a packed bed. $\Delta P$ is the pressure drop, $\rho$ is
the density of the fluid, $G_o$ is the mass velocity (mass flow rate divided
by cross-sectional area), $D_p$ is the diameter of the particles
within the bed, $\mu$ is the fluid viscosity, $L$ is the length of the bed,
and $\varepsilon$ is the void fraction of the bed.
$$\frac{\Delta P\,\rho}{G_o^2}\,\frac{D_p}{L}\,\frac{\varepsilon^3}{1 - \varepsilon} = 150\frac{1 - \varepsilon}{(D_pG_o/\mu)} + 1.75$$
where the hyperbolic cosine can be computed by
$$\cosh x = \frac{1}{2}(e^x + e^{-x})$$
Use a numerical method to calculate a value for the parameter $T_A$
given values for the parameters $w = 10$ and $y_0 = 5$, such that the
cable has a height of $y = 15$ at $x = 50$.
8.18 Figure P8.18a shows a uniform beam subject to a linearly
increasing distributed load. The equation for the resulting elastic
curve is (see Fig. P8.18b)
$$y = \frac{w_0}{120EIL}(-x^5 + 2L^2x^3 - L^4x) \tag{P8.18.1}$$
Use bisection to determine the point of maximum deflection (that is,
the value of x where $dy/dx = 0$). Then substitute this value into
Eq. (P8.18.1) to determine the value of the maximum deflection.
Use the following parameter values in your computation: $L = 450$
cm, $E = 50{,}000$ kN/cm², $I = 30{,}000$ cm⁴, and $w_0 = 1.75$ kN/cm.
8.19 The displacement of a structure is defined by the following
equation for a damped oscillation:
$$y = 8e^{-kt}\cos\omega t$$
where $k = 0.5$ and $\omega = 3$.
(a) Use the graphical method to make an initial estimate of the
time required for the displacement to decrease to 4.
(b) Use the Newton-Raphson method to determine the root to
$\varepsilon_s = 0.01\%$.
(c) Use the secant method to determine the root to $\varepsilon_s = 0.01\%$.
8.20 The Manning equation can be written for a rectangular open
channel as
$$Q = \frac{\sqrt{S}\,(BH)^{5/3}}{n(B + 2H)^{2/3}}$$
where x is the distance downstream in kilometers.
(a) Determine the distance downstream where the oxygen level
first falls to a reading of 5 mg/L. (Hint: It is within 2 km of the
discharge.) Determine your answer to a 1% error. Note that
levels of oxygen below 5 mg/L are generally harmful to game-
fish such as trout and salmon.
(b) Determine the distance downstream at which the oxygen is at a
minimum. What is the concentration at that location?
8.16 The concentration of pollutant bacteria c in a lake decreases
according to
$$c = 70e^{-1.5t} + 25e^{-0.075t}$$
Determine the time required for the bacteria concentration to be
reduced to 9 using (a) the graphical method and (b) using the
Newton-Raphson method with an initial guess of t 5 10 and a
stopping criterion of 0.5%. Check your result.
8.17 A catenary cable is one that is hung between two points not in
the same vertical line. As depicted in Fig. P8.17a, it is subject to no
loads other than its own weight. Thus, its weight (N/m) acts as a
uniform load per unit length along the cable. A free-body diagram
of a section AB is depicted in Fig. P8.17b, where $T_A$ and $T_B$ are the
tension forces at the end. Based on horizontal and vertical force
balances, the following differential equation model of the cable can
be derived:
$$\frac{d^2y}{dx^2} = \frac{w}{T_A}\sqrt{1 + \left(\frac{dy}{dx}\right)^2}$$
Calculus can be employed to solve this equation for the height y of
the cable as a function of distance x,
$$y = \frac{T_A}{w}\cosh\left(\frac{w}{T_A}x\right) + y_0 - \frac{T_A}{w}$$
FIGURE P8.17
(a) Forces acting on a section AB of a flexible hanging cable. The load is uniform along
the cable (but not uniform per the horizontal distance x). (b) A free-body diagram of
section AB.
[Diagram labels: A, B, $T_A$, $T_B$, W = ws, w, $y_0$, θ; axes x and y.]
formula relating present worth P, annual payments A, number of
years n, and interest rate i is
$$A = P\frac{i(1 + i)^n}{(1 + i)^n - 1}$$
8.23 Many fields of engineering require accurate population estimates.
For example, transportation engineers might find it necessary
to determine separately the population growth trends of a city
and adjacent suburb. The population of the urban area is declining
with time according to
$$P_u(t) = P_{u,\max}e^{-k_ut} + P_{u,\min}$$
while the suburban population is growing, as in
$$P_s(t) = \frac{P_{s,\max}}{1 + [P_{s,\max}/P_0 - 1]e^{-k_st}}$$
where $P_{u,\max}$, $k_u$, $P_{s,\max}$, $P_0$, and $k_s$ = empirically derived parameters.
Determine the time and corresponding values of $P_u(t)$ and $P_s(t)$
when the suburbs are 20% larger than the city. The parameter
values are $P_{u,\max} = 75{,}000$, $k_u = 0.045$/yr, $P_{u,\min} = 100{,}000$ people,
$P_{s,\max} = 300{,}000$ people, $P_0 = 10{,}000$ people, $k_s = 0.08$/yr. To
obtain your solutions, use (a) graphical, (b) false-position, and
(c) modified secant methods.
8.24 A simply supported beam is loaded as shown in Fig. P8.24.
Using singularity functions, the shear along the beam can be
expressed by the equation:
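$$V(x) = 20[\langle x - 0\rangle^1 - \langle x - 5\rangle^1] - 15\langle x - 8\rangle^0 - 57$$
By definition, the singularity function can be expressed as follows:
$$\langle x - a\rangle^n = \begin{cases}(x - a)^n & \text{when } x > a\\ 0 & \text{when } x \le a\end{cases}$$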
Use a numerical method to find the point(s) where the shear equals
zero.
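The singularity-function definition translates directly into a conditional expression. The Python sketch below is only an aid for setting up the problem: the names `macaulay` and `shear` are hypothetical, and the function is merely sampled at a few points; locating the root is left to the numerical method of your choice.

```python
def macaulay(x, a, n):
    """Singularity function <x - a>^n: (x - a)**n when x > a, otherwise 0."""
    return (x - a) ** n if x > a else 0.0


def shear(x):
    """Shear along the beam as given in the problem statement for Prob. 8.24."""
    return (20.0 * (macaulay(x, 0.0, 1) - macaulay(x, 5.0, 1))
            - 15.0 * macaulay(x, 8.0, 0) - 57.0)


# Sample the shear at a few stations along the beam.
for station in (0.0, 6.0, 10.0):
    print(station, shear(station))
```

Because the definition uses a strict inequality, the zeroth-power term switches on only for x greater than 8, so the function is piecewise and a plot is a sensible first step before choosing a bracket.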
where $Q$ = flow [m³/s], $S$ = slope [m/m], $H$ = depth [m], and
$n$ = the Manning roughness coefficient. Develop a fixed-point
iteration scheme to solve this equation for H given $Q = 5$,
$S = 0.0002$, $B = 20$, and $n = 0.03$. Prove that your scheme converges
for all initial guesses greater than or equal to zero.
8.21 In ocean engineering, the equation for a reflected standing
wave in a harbor is given by $\lambda = 16$, $t = 12$, $v = 48$:
$$h = h_0\left[\sin\left(\frac{2\pi x}{\lambda}\right)\cos\left(\frac{2\pi tv}{\lambda}\right) + e^{-x}\right]$$
Solve for the lowest positive value of x if $h = 0.4h_0$.
8.22 You buy a $20,000 piece of equipment for nothing down and
$4000 per year for 6 years. What interest rate are you paying? The
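FIGURE P8.18
[(a) Uniform beam of length L under a linearly increasing distributed load with peak
intensity $w_0$. (b) Elastic curve between (x = 0, y = 0) and (x = L, y = 0).]
FIGURE P8.24
[Simply supported beam carrying a 20 kips/ft distributed load, a 150 kip-ft moment, and a
15 kips point load; segment lengths 5 ft, 2 ft, 1 ft, and 2 ft.]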
Electrical Engineering
8.29 Perform the same computation as in Sec. 8.3, but determine
the value of L required for the circuit to dissipate to 1% of its original
value in $t = 0.05$ s, given $R = 280$ Ω, and $C = 10^{-4}$ F. Use
(a) a graphical approach, (b) bisection, and (c) root location software
such as the Excel Solver, the MATLAB function fzero, or
the Mathcad function root.
8.30 An oscillating current in an electric circuit is described by
$i = 9e^{-t}\sin(2\pi t)$, where t is in seconds. Determine the lowest
value of t such that $i = 3.5$.
8.31 The resistivity $\rho$ of doped silicon is based on the charge
q on an electron, the electron density n, and the electron mobility
$\mu$. The electron density is given in terms of the doping density
N and the intrinsic carrier density $n_i$. The electron mobility is
described by the temperature T, the reference temperature $T_0$,
and the reference mobility $\mu_0$. The equations required to compute
the resistivity are
$$\rho = \frac{1}{qn\mu}$$
where
$$n = \frac{1}{2}\left(N + \sqrt{N^2 + 4n_i^2}\right) \quad\text{and}\quad \mu = \mu_0\left(\frac{T}{T_0}\right)^{-2.42}$$
Determine N, given $T_0 = 300$ K, $T = 1000$ K, $\mu_0 = 1300$ cm² (V s)⁻¹,
$q = 1.6 \times 10^{-19}$ C, $n_i = 6.21 \times 10^9$ cm⁻³, and a desired
$\rho = 6 \times 10^6$ V s cm/C. Use (a) bisection and (b) the modified
secant method.
8.32 A total charge Q is uniformly distributed around a ring-shaped
conductor with radius a. A charge q is located at a distance x from
the center of the ring (Fig. P8.32). The force exerted on the charge
by the ring is given by
$$F = \frac{1}{4\pi e_0}\,\frac{qQx}{(x^2 + a^2)^{3/2}}$$
where $e_0 = 8.85 \times 10^{-12}$ C²/(N m²). Find the distance x where the
force is 1 N if q and Q are $2 \times 10^{-5}$ C for a ring with a radius of
0.9 m.
8.25 Using the simply supported beam from Prob. 8.24, the moment
along the beam, M(x), is given by:
$$M(x) = -10[\langle x - 0\rangle^2 - \langle x - 5\rangle^2] + 15\langle x - 8\rangle^1 + 150\langle x - 7\rangle^0 + 57x$$
Use a numerical method to find the point(s) where the moment
equals zero.
8.26 Using the simply supported beam from Prob. 8.24, the slope
along the beam is given by:
$$\frac{du_y}{dx}(x) = \frac{-10}{3}[\langle x - 0\rangle^3 - \langle x - 5\rangle^3] + \frac{15}{2}\langle x - 8\rangle^2 + 150\langle x - 7\rangle^1 + \frac{57}{2}x^2 - 238.25$$
Use a numerical method to find the point(s) where the slope equals
zero.
8.27 Using the simply supported beam from Prob. 8.24, the displacement
along the beam is given by:
$$u_y(x) = \frac{-5}{6}[\langle x - 0\rangle^4 - \langle x - 5\rangle^4] + \frac{15}{6}\langle x - 8\rangle^3 + 75\langle x - 7\rangle^2 + \frac{57}{6}x^3 - 238.25x$$
(a) Find the point(s) where the displacement equals zero.
(b) How would you use a root location technique to determine the
location of the minimum displacement?
8.28 Although we did not mention it in Sec. 8.2, Eq. (8.10) is actually
an expression of electroneutrality; that is, that positive and
negative charges must balance. This can be seen more clearly by
expressing it as
$$[\mathrm{H^+}] = [\mathrm{HCO_3^-}] + 2[\mathrm{CO_3^{2-}}] + [\mathrm{OH^-}]$$
In other words, the positive charges must equal the negative
charges. Thus, when you compute the pH of a natural water body
such as a lake, you must also account for other ions that may be
present. For the case where these ions originate from nonreactive
salts, the net negative minus positive charges due to these ions are
lumped together in a quantity called alkalinity, and the equation is
reformulated as
$$\mathrm{Alk} + [\mathrm{H^+}] = [\mathrm{HCO_3^-}] + 2[\mathrm{CO_3^{2-}}] + [\mathrm{OH^-}] \tag{P8.28}$$
where Alk = alkalinity (eq/L). For example, the alkalinity of Lake
Superior is approximately $0.4 \times 10^{-3}$ eq/L. Perform the same
calculations as in Sec. 8.2 to compute the pH of Lake Superior in
2008. Assume that just like the raindrops, the lake is in equilibrium
with atmospheric CO2, but account for the alkalinity as in
Eq. (P8.28).
FIGURE P8.32
[Diagram labels: ring of radius a carrying charge Q; charge q at distance x from the
center of the ring.]
8.36 Mechanical engineers, as well as most other engineers, use
thermodynamics extensively in their work. The following polynomial
can be used to relate the zero-pressure specific heat of dry air,
$c_p$ kJ/(kg K), to temperature (K):
$$c_p = 0.99403 + 1.671 \times 10^{-4}T + 9.7215 \times 10^{-8}T^2 - 9.5838 \times 10^{-11}T^3 + 1.9520 \times 10^{-14}T^4$$
Determine the temperature that corresponds to a specific heat of
1.2 kJ/(kg K).
8.37 Aerospace engineers sometimes compute the trajectories of
projectiles like rockets. A related problem deals with the trajectory of a
thrown ball. The trajectory of a ball is defined by the (x, y) coordinates,
as displayed in Fig. P8.37. The trajectory can be modeled as
$$y = (\tan\theta_0)x - \frac{g}{2v_0^2\cos^2\theta_0}x^2 + y_0$$
Find the appropriate initial angle $\theta_0$, if the initial velocity $v_0 = 20$ m/s
and the distance to the catcher x is 40 m. Note that the ball leaves the
thrower’s hand at an elevation of $y_0 = 1.8$ m and the catcher receives
it at 1 m. Express the final result in degrees. Use a value of 9.81 m/s²
for g and employ the graphical method to develop your initial guesses.
8.33 Figure P8.33 shows a circuit with a resistor, an inductor, and
a capacitor in parallel. Kirchhoff’s rules can be used to express the
impedance of the system as
$$\frac{1}{Z} = \sqrt{\frac{1}{R^2} + \left(\omega C - \frac{1}{\omega L}\right)^2}$$
where $Z$ = impedance (Ω) and $\omega$ = the angular frequency. Find the
$\omega$ that results in an impedance of 75 Ω using both bisection and
false position with initial guesses of 1 and 1000 for the following
parameters: $R = 225$ Ω, $C = 0.6 \times 10^{-6}$ F, and $L = 0.5$ H. Determine
how many iterations of each technique are necessary to determine
the answer to $\varepsilon_s = 0.1\%$. Use the graphical approach to
explain any difficulties that arise.
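FIGURE P8.33
[Circuit diagram labels: R, L, and C in parallel with an AC source.]
FIGURE P8.35
[Panels (a) and (b): mass released a distance h above the spring; spring deflection d;
total drop h + d.]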
Mechanical and Aerospace Engineering
8.34 Beyond the Colebrook equation, other relationships, such as
the Fanning friction factor f, are available to estimate friction in
pipes. The Fanning friction factor is dependent on a number of parameters
related to the size of the pipe and the fluid, which can all be
represented by another dimensionless quantity, the Reynolds number
Re. A formula that predicts f given Re is the von Karman equation,
$$\frac{1}{\sqrt{f}} = 4\log_{10}(\mathrm{Re}\sqrt{f}) - 0.4$$
Typical values for the Reynolds number for turbulent flow are 10,000
to 500,000 and for the Fanning friction factor are 0.001 to 0.01. Develop
a function that uses bisection to solve for f given a user-supplied
value of Re between 2500 and 1,000,000. Design the function so that
it ensures that the absolute error in the result is $E_{a,d} < 0.000005$.
8.35 Real mechanical systems may involve the deflection of nonlinear
springs. In Fig. P8.35, a mass m is released a distance h above a
nonlinear spring. The resistance force F of the spring is given by
$$F = -\left(k_1 d + k_2 d^{3/2}\right)$$
Conservation of energy can be used to show that
$$0 = \frac{2 k_2 d^{5/2}}{5} + \frac{1}{2} k_1 d^2 - m g d - m g h$$
Solve for d, given the following parameter values: $k_1 = 40{,}000$ g/s²,
$k_2 = 40$ g/(s² m^0.5), m = 95 g, g = 9.81 m/s², and h = 0.43 m.
FIGURE P8.37 (ball trajectory in the x-y plane; labels θ0 and v0)
As a mechanical engineer, you would like to know if there are cases
where $\phi = \omega/2 - 1$. Use the other parameters from the section to
set up the equation as a roots problem and solve for $\omega$.
8.41 Two fluids at different temperatures enter a mixer and
come out at the same temperature. The heat capacity of fluid A
is given by:
$$c_p = 3.381 + 1.804 \times 10^{-2}\,T - 4.300 \times 10^{-6}\,T^2$$
and the heat capacity of fluid B is given by:
$$c_p = 8.592 + 1.290 \times 10^{-1}\,T - 4.078 \times 10^{-5}\,T^2$$
where $c_p$ is in units of cal/mol K, and T is in units of K. Note that
$$\Delta H = \int_{T_1}^{T_2} c_p\,dT$$
A enters the mixer at 400°C. B enters the mixer at 600°C. There is
twice as much A as there is B entering into the mixer. At what
temperature do the two fluids exit the mixer?
8.42 A compressor is operating at compression ratio $R_c$ of 3.0 (the
pressure of gas at the outlet is three times greater than the pressure
of the gas at the inlet). The power requirements of the compressor
HP can be determined from the equation below. Assuming that
the power requirements of the compressor are exactly equal to
$zRT_1/MW$, find the polytropic efficiency n of the compressor. The
parameter z is compressibility of the gas under operating conditions
of the compressor, R is the gas constant, $T_1$ is the temperature
of the gas at the compressor inlet, and MW is the molecular weight
of the gas.
$$HP = \frac{zRT_1}{MW}\,\frac{n}{n - 1}\left(R_c^{(n-1)/n} - 1\right)$$
8.43 In the thermos shown in Fig. P8.43, the innermost compartment
is separated from the middle container by a vacuum. There
is a final shell around the thermos. This final shell is separated
from the middle layer by a thin layer of air. The outside of the
final shell comes in contact with room air. Heat transfer from the
inner compartment to the next layer $q_1$ is by radiation only (since
the space is evacuated). Heat transfer between the middle layer
and outside shell $q_2$ is by convection in a small space. Heat transfer
from the outside shell to the air $q_3$ is by natural convection.
The heat flux from each region of the thermos must be equal; that
is, $q_1 = q_2 = q_3$. Find the temperatures $T_1$ and $T_2$ at steady
state. $T_0$ is 500°C and $T_3$ = 25°C.
$$q_1 = 10^{-9}\left[(T_0 + 273)^4 - (T_1 + 273)^4\right]$$
$$q_2 = 4(T_1 - T_2)$$
$$q_3 = 1.3(T_2 - T_3)^{4/3}$$
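8.38 The general form for a three-dimensional stress field is
given by
$$\begin{bmatrix} \sigma_{xx} & \sigma_{xy} & \sigma_{xz} \\ \sigma_{xy} & \sigma_{yy} & \sigma_{yz} \\ \sigma_{xz} & \sigma_{yz} & \sigma_{zz} \end{bmatrix}$$
where the diagonal terms represent tensile or compressive stresses
and the off-diagonal terms represent shear stresses. A stress field
(in MPa) is given by
$$\begin{bmatrix} 10 & 14 & 25 \\ 14 & 7 & 15 \\ 25 & 15 & 16 \end{bmatrix}$$
To solve for the principal stresses, it is necessary to construct the
following matrix (again in MPa):
$$\begin{bmatrix} 10 - \sigma & 14 & 25 \\ 14 & 7 - \sigma & 15 \\ 25 & 15 & 16 - \sigma \end{bmatrix}$$
$\sigma_1$, $\sigma_2$, and $\sigma_3$ can be solved from the equation
$$\sigma^3 - I\sigma^2 + II\sigma - III = 0$$
where
$$I = \sigma_{xx} + \sigma_{yy} + \sigma_{zz}$$
$$II = \sigma_{xx}\sigma_{yy} + \sigma_{xx}\sigma_{zz} + \sigma_{yy}\sigma_{zz} - \sigma_{xy}^2 - \sigma_{xz}^2 - \sigma_{yz}^2$$
$$III = \sigma_{xx}\sigma_{yy}\sigma_{zz} - \sigma_{xx}\sigma_{yz}^2 - \sigma_{yy}\sigma_{xz}^2 - \sigma_{zz}\sigma_{xy}^2 + 2\sigma_{xy}\sigma_{xz}\sigma_{yz}$$
I, II, and III are known as the stress invariants. Find $\sigma_1$, $\sigma_2$, and $\sigma_3$
using a root-finding technique.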
8.39 The upward velocity of a rocket can be computed by the
following formula:
$$v = u \ln\frac{m_0}{m_0 - qt} - gt$$
where v = upward velocity, u = the velocity at which fuel is
expelled relative to the rocket, $m_0$ = the initial mass of the rocket at
time t = 0, q = the fuel consumption rate, and g = the downward
acceleration of gravity (assumed constant = 9.81 m/s²). If u = 2200 m/s,
$m_0$ = 160,000 kg, and q = 2680 kg/s, compute the time at which
v = 1000 m/s. (Hint: t is somewhere between 10 and 50 s.) Determine
your result so that it is within 1% of the true value. Check your
answer.
8.40 The phase angle $\phi$ between the forced vibration caused by the
rough road and the motion of the car is given by
$$\tan\phi = \frac{2(c/c_c)(\omega/p)}{1 - (\omega/p)^2}$$
8.45 A fluid is pumped into the network of pipes shown in Fig. P8.45.
At steady state, the following flow balances must hold,
$$Q_1 = Q_2 + Q_3$$
$$Q_3 = Q_4 + Q_5$$
$$Q_5 = Q_6 + Q_7$$
where $Q_i$ = flow in pipe i (m³/s). In addition, the pressure drops
around the three right-hand loops must equal zero. The pressure
drop in each circular pipe length can be computed with
$$\Delta P = \frac{16}{\pi^2}\,\frac{f L \rho}{2 D^5}\,Q^2$$
where $\Delta P$ = the pressure drop (Pa), f = the friction factor
(dimensionless), L = the pipe length (m), $\rho$ = the fluid density (kg/m³),
and D = pipe diameter (m). Write a program (or develop an
algorithm in a mathematics software package) that will allow you to
compute the flow in every pipe length given that $Q_1$ = 1 m³/s and
$\rho$ = 1.23 kg/m³. All the pipes have D = 500 mm and f = 0.005.
The pipe lengths are: $L_3 = L_5 = L_8 = L_9$ = 2 m; $L_2 = L_4 = L_6$ = 4 m;
and $L_7$ = 8 m.
8.46 Repeat Prob. 8.45, but incorporate the fact that the friction
factor can be computed with the von Karman equation,
$$\frac{1}{\sqrt{f}} = 4 \log_{10}\left(\mathrm{Re}\sqrt{f}\right) - 0.4$$
where Re = the Reynolds number
$$\mathrm{Re} = \frac{\rho V D}{\mu}$$
where V = the velocity of the fluid in the pipe (m/s) and $\mu$ =
dynamic viscosity (N·s/m²). Note that for a circular pipe
$V = 4Q/(\pi D^2)$. Also, assume that the fluid has a viscosity of
$1.79 \times 10^{-5}$ N·s/m².
8.44 Figure P8.44 shows three reservoirs connected by circular pipes.
The pipes, which are made of asphalt-dipped cast iron ($\varepsilon$ = 0.0012 m),
have the following characteristics:
Pipe          |    1 |    2 |    3
Length, m     | 1800 |  500 | 1400
Diameter, m   |  0.4 | 0.25 |  0.2
Flow, m³/s    |    ? |  0.1 |    ?
If the water surface elevations in Reservoirs A and C are 200 and
172.5 m, respectively, determine the elevation in Reservoir B and
the flows in pipes 1 and 3. Note that the kinematic viscosity of
water is $1 \times 10^{-6}$ m²/s and use the Colebrook equation to
determine the friction factor (recall Prob. 8.13).
FIGURE P8.43 (thermos cross section; labels T0, T1, T2, T3)
FIGURE P8.44 (reservoirs A, B, C; elevations h1, h2, h3; pipes 1, 2, 3; flows Q1, Q2, Q3)
FIGURE P8.45 (pipe network; flows Q1 through Q10)
8.47 The space shuttle, at liftoff from the launch pad, has four
forces acting on it, which are shown on the free-body diagram
(Fig. P8.47). The combined weight of the two solid rocket boosters
and external fuel tank is $W_B = 1.663 \times 10^6$ lb. The weight of
the orbiter with a full payload is $W_S = 0.23 \times 10^6$ lb. The
combined thrust of the two solid rocket boosters is $T_B = 5.30 \times 10^6$ lb.
The combined thrust of the three liquid fuel orbiter engines is
$T_S = 1.125 \times 10^6$ lb.
At liftoff, the orbiter engine thrust is directed at angle $\theta$ to
make the resultant moment acting on the entire craft assembly
(external tank, solid rocket boosters, and orbiter) equal to zero.
With the resultant moment equal to zero, the craft will not rotate
about its mass center G at liftoff. With these forces, the craft will
have a resultant force with components in both the vertical and
horizontal direction. The vertical resultant force component is
what allows the craft to lift off from the launch pad and fly
vertically. The horizontal resultant force component causes the craft to
fly horizontally. The resultant moment acting on the craft will be
zero when $\theta$ is adjusted to the proper value. If this angle is not
adjusted properly, and there is some resultant moment acting on
the craft, the craft will tend to rotate about its mass center.
(a) Resolve the orbiter thrust $T_S$ into horizontal and vertical
components, and then sum moments about point G, the craft mass
center. Set the resulting moment equation equal to zero. This
equation can now be solved for the value of $\theta$ required for
liftoff.
(b) Derive an equation for the resultant moment acting on the craft
in terms of the angle $\theta$. Plot the resultant moment as a function
of the angle $\theta$ over a range of −5 radians to +5 radians.
(c) Write a computer program to solve for the angle $\theta$ using
Newton’s method to find the root of the resultant moment
equation. Make an initial first guess at the root of interest using the
plot. Terminate your iterations when the value of $\theta$ has better
than five significant figures.
(d) Repeat the program for the minimum payload weight of the
orbiter of $W_S$ = 195,000 lb.
FIGURE P8.47 (free-body diagram; labels: External tank, Solid rocket
booster, Orbiter, WB, WS, TS, TB, θ, G; dimensions 38', 4', 28')
8.48 Determining the velocity of particles settling through fluids is
of great importance in many areas of engineering and science. Such
calculations depend on the flow regime as represented by the
dimensionless Reynolds number,
$$\mathrm{Re} = \frac{\rho d v}{\mu}$$ (P8.48.1)
where $\rho$ = the fluid’s density (kg/m³), d = the particle diameter
(m), v = the particle’s settling velocity (m/s), and $\mu$ = the fluid’s
dynamic viscosity (N s/m²). Under laminar conditions (Re < 0.1),
the settling velocity of a spherical particle can be computed with
the following formula based on Stokes law,
$$v = \frac{g}{18}\left(\frac{\rho_s - \rho}{\mu}\right) d^2$$ (P8.48.2)
where g = the gravitational constant (= 9.81 m/s²), and $\rho_s$ = the
particle’s density (kg/m³). For turbulent conditions (i.e., higher
Reynolds numbers), an alternative approach can be used based on
the following formula:
$$v = \sqrt{\frac{4 g (\rho_s - \rho) d}{3 C_D \rho}}$$ (P8.48.3)
where $C_D$ = the drag coefficient, which depends on the Reynolds
number as in
$$C_D = \frac{24}{\mathrm{Re}} + \frac{3}{\sqrt{\mathrm{Re}}} + 0.34$$ (P8.48.4)
(a) Combine Eqs. (P8.48.2), (P8.48.3), and (P8.48.4) to express
the determination of v as a roots of equations problem. That is,
express the combined formula in the format f(v) = 0.
(b) Use the modified secant method with $\delta = 10^{-3}$ and $\varepsilon_s = 0.05\%$
to determine v for a spherical iron particle settling in water, where
d = 200 μm, $\rho$ = 1 g/cm³, $\rho_s$ = 7.874 g/cm³, and $\mu$ = 0.014
g/(cm·s). Employ Eq. (P8.48.2) to generate your initial guess.
(c) Based on the result of (b), compute the Reynolds number and
the drag coefficient, and use the latter to confirm that the flow
regime is not laminar.
(d) Develop a fixed-point iteration solution for the conditions
outlined in (b).
(e) Use a graphical approach to illustrate that the formulation
developed in (d) will converge for any positive guess.
PT2.4 TRADE-OFFS
Table PT2.3 provides a summary of the trade-offs involved in solving for roots of alge-
braic and transcendental equations. Although graphical methods are time-consuming,
they provide insight into the behavior of the function and are useful in identifying initial
guesses and potential problems such as multiple roots. Therefore, if time permits, a quick
sketch (or better yet, a computerized graph) can yield valuable information regarding the
behavior of the function.
The numerical methods themselves are divided into two general categories: bracket-
ing and open methods. The former requires two initial guesses that are on either side of
a root. This “bracketing” is maintained as the solution proceeds, and thus, these tech-
niques are always convergent. However, a price is paid for this property in that the rate
of convergence is relatively slow.
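The bracketing idea can be illustrated with a minimal Python sketch of bisection. The function name `bisect`, its arguments, and the test function are illustrative assumptions and are not taken from the text; the stopping test uses the percent approximate relative error.

```python
def bisect(f, xl, xu, es=0.01, max_iter=100):
    """Bisection sketch: return (root estimate, iterations used).

    es is the stopping tolerance in percent (approximate relative error).
    """
    if f(xl) * f(xu) > 0:
        raise ValueError("initial guesses do not bracket a root")
    xr = xl
    for i in range(1, max_iter + 1):
        xr_old = xr
        xr = (xl + xu) / 2.0          # midpoint of the current bracket
        test = f(xl) * f(xr)
        if test < 0:                  # root lies in the lower half
            xu = xr
        elif test > 0:                # root lies in the upper half
            xl = xr
        else:                         # landed exactly on the root
            return xr, i
        if xr != 0 and abs((xr - xr_old) / xr) * 100 <= es:
            return xr, i
    return xr, max_iter


# Hypothetical example: the positive root of x^2 - 2 bracketed by 1 and 2.
root, iterations = bisect(lambda x: x * x - 2.0, 1.0, 2.0)
```

Because the bracket is halved on every pass, the iteration count grows only with the logarithm of the required accuracy, but it cannot be reduced by a better-behaved function.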
TABLE PT2.3 Comparison of the characteristics of alternative methods for finding roots of algebraic and
transcendental equations. The comparisons are based on general experience and do not account for the
behavior of specific functions.
Method | Type | Guesses | Convergence | Stability | Programming | Comments
Direct | Analytical | — | — | — | |
Graphical | Visual | — | — | — | — | Imprecise
Bisection | Bracketing | 2 | Slow | Always | Easy |
False-position | Bracketing | 2 | Slow/medium | Always | Easy |
Modified FP | Bracketing | 2 | Medium | Always | Easy |
Fixed-point iteration | Open | 1 | Slow | Possibly divergent | Easy |
Newton-Raphson | Open | 1 | Fast | Possibly divergent | Easy | Requires evaluation of f'(x)
Modified Newton-Raphson | Open | 1 | Fast (multiple), medium (single) | Possibly divergent | Easy | Requires evaluation of f'(x) and f''(x)
Secant | Open | 2 | Medium/fast | Possibly divergent | Easy | Initial guesses do not have to bracket the root
Modified secant | Open | 1 | Medium/fast | Possibly divergent | Easy |
Brent | Hybrid | 1 or 2 | Medium | Always (for 2 guesses) | Moderate | Robust
Müller | Polynomials | 2 | Medium/fast | Possibly divergent | Moderate |
Bairstow | Polynomials | 2 | Fast | Possibly divergent | Moderate |
EPILOGUE: PART TWO
Open techniques differ from bracketing methods in that they use information at a
single point (or two values that need not bracket the root) to extrapolate to a new root
estimate. This property is a double-edged sword. Although it leads to quicker
convergence, it also allows the possibility that the solution may diverge. In general, the
convergence of open techniques is partially dependent on the quality of the initial guess and
the nature of the function. The closer the guess is to the true root, the more likely the
methods will converge.
Of the open techniques, the standard Newton-Raphson method is often used because
of its property of quadratic convergence. However, its major shortcoming is that it re-
quires the derivative of the function be obtained analytically. For some functions this is
impractical. In these cases, the secant method, which employs a finite-difference repre-
sentation of the derivative, provides a viable alternative. Because of the finite-difference
approximation, the rate of convergence of the secant method is initially slower than for
the Newton-Raphson method. However, as the root estimate is refined, the difference
approximation becomes a better representation of the true derivative, and convergence
accelerates rapidly. The modified Newton-Raphson technique can be used to attain rapid
convergence for multiple roots. However, this technique requires an analytical expression
for both the first and second derivatives.
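The contrast between the two open methods can be sketched in Python as follows. The function names, tolerances, and the test function $f(x) = e^{-x} - x$ are assumptions chosen for illustration; Newton-Raphson needs the analytical derivative, whereas the secant version replaces it with a finite difference through the two most recent estimates.

```python
import math


def newton_raphson(f, dfdx, x0, es=1e-6, max_iter=50):
    """Newton-Raphson sketch; es is the tolerance in percent."""
    x = x0
    for i in range(1, max_iter + 1):
        x_new = x - f(x) / dfdx(x)            # tangent-line extrapolation
        if x_new != 0 and abs((x_new - x) / x_new) * 100 <= es:
            return x_new, i
        x = x_new
    raise RuntimeError("Newton-Raphson did not converge")


def secant(f, x_prev, x_curr, es=1e-6, max_iter=50):
    """Secant sketch: the derivative is replaced by a finite difference."""
    for i in range(1, max_iter + 1):
        f_prev, f_curr = f(x_prev), f(x_curr)
        x_new = x_curr - f_curr * (x_prev - x_curr) / (f_prev - f_curr)
        if x_new != 0 and abs((x_new - x_curr) / x_new) * 100 <= es:
            return x_new, i
        x_prev, x_curr = x_curr, x_new
    raise RuntimeError("secant method did not converge")


def f(x):
    return math.exp(-x) - x


def dfdx(x):
    return -math.exp(-x) - 1.0


root_nr, n_nr = newton_raphson(f, dfdx, 0.0)
root_sec, n_sec = secant(f, 0.0, 1.0)
```

Neither routine checks for a zero denominator or for divergence beyond the iteration limit, so both should be read as sketches rather than production code.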
Of particular interest are hybrid methods that combine the reliability of bracketing
with the speed of open methods. Brent’s method does this by combining bisection with
several open methods. All the methods are easy-to-moderate to program on computers
and require minimal time to determine a single root. On this basis, you might conclude
that simple methods such as bisection would be good enough for practical purposes.
This would be true if you were exclusively interested in determining the root of an
equation once. However, there are many cases in engineering where numerous root
locations are required and where speed becomes important. For these cases, slow meth-
ods are very time-consuming and, hence, costly. On the other hand, the fast open meth-
ods may diverge, and the accompanying delays can also be costly. Some computer
algorithms attempt to capitalize on the strong points of both classes of techniques by
initially employing a bracketing method to approach the root, then switching to an open
method to rapidly refine the estimate. Whether a single approach or a combination is
used, the trade-offs between convergence and speed are at the heart of the choice of a
root-location technique.
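The two-stage strategy described above can be sketched as a few bisection steps followed by Newton-Raphson refinement. This is a simplified illustration under stated assumptions, not Brent's method; the function name `hybrid_root`, the fixed number of bisection steps, and the test polynomial are all hypothetical.

```python
def hybrid_root(f, dfdx, xl, xu, n_bisect=5, tol=1e-10, max_iter=50):
    """Approach the root by bisection, then refine with Newton-Raphson."""
    if f(xl) * f(xu) > 0:
        raise ValueError("initial guesses do not bracket a root")
    # Stage 1: a fixed number of bisection steps shrinks the bracket.
    for _ in range(n_bisect):
        xr = (xl + xu) / 2.0
        if f(xl) * f(xr) <= 0:
            xu = xr
        else:
            xl = xr
    # Stage 2: Newton-Raphson started from the middle of the small bracket.
    x = (xl + xu) / 2.0
    for _ in range(max_iter):
        step = f(x) / dfdx(x)
        x -= step
        if abs(step) <= tol:
            return x
    raise RuntimeError("refinement stage did not converge")


# Hypothetical example: x^3 - 2x - 5 = 0 has a real root between 2 and 3.
root = hybrid_root(lambda x: x**3 - 2.0 * x - 5.0,
                   lambda x: 3.0 * x**2 - 2.0,
                   2.0, 3.0)
```

A more careful implementation would fall back to bisection whenever a Newton step leaves the bracket, which is the kind of safeguard that hybrid algorithms add.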
PT2.5 IMPORTANT RELATIONSHIPS AND FORMULAS
Table PT2.4 summarizes important information that was presented in Part Two. This table
can be consulted to quickly access important relationships and formulas.
PT2.6 ADVANCED METHODS AND ADDITIONAL REFERENCES
The methods in this text have focused on determining a single real root of an algebraic
or transcendental equation based on foreknowledge of its approximate location. In ad-
dition, we have also described methods expressly designed to determine both the real
and complex roots of polynomials. Additional references on the subject are Ralston and
Rabinowitz (1978) and Carnahan, Luther, and Wilkes (1969).
In addition to Müller’s and Bairstow’s methods, several techniques are available to
determine all the roots of polynomials. In particular, the quotient difference (QD)
algorithm (Henrici, 1964, and Gerald and Wheatley, 2004) determines all roots without
initial guesses. Ralston and Rabinowitz (1978) and Carnahan, Luther, and Wilkes (1969)
contain discussions of this method as well as of other techniques for locating roots of
polynomials. As discussed in the text, the Jenkins-Traub and Laguerre’s methods are
widely employed.
TABLE PT2.4 Summary of important information presented in Part Two.
Bracketing methods:
Bisection
  Formulation: $x_r = \dfrac{x_l + x_u}{2}$
    If $f(x_l) f(x_r) < 0$, $x_u = x_r$; if $f(x_l) f(x_r) > 0$, $x_l = x_r$
  Graphical interpretation: [figure: f(x) versus x with bracket $x_l$, $x_u$ around the root; interval L halved to L/2 and L/4]
  Stopping criterion: $\left| \dfrac{x_r^{new} - x_r^{old}}{x_r^{new}} \right| 100\% \le \varepsilon_s$
False position
  Formulation: $x_r = x_u - \dfrac{f(x_u)(x_l - x_u)}{f(x_l) - f(x_u)}$
    If $f(x_l) f(x_r) < 0$, $x_u = x_r$; if $f(x_l) f(x_r) > 0$, $x_l = x_r$
  Graphical interpretation: [figure: f(x) versus x; chord between $x_l$ and $x_u$ crossing the axis at $x_r$]
  Stopping criterion: $\left| \dfrac{x_r^{new} - x_r^{old}}{x_r^{new}} \right| 100\% \le \varepsilon_s$
Newton-Raphson
  Formulation: $x_{i+1} = x_i - \dfrac{f(x_i)}{f'(x_i)}$
  Graphical interpretation: [figure: f(x) versus x; tangent at $x_i$ crossing the axis at $x_{i+1}$]
  Stopping criterion: $\left| \dfrac{x_{i+1} - x_i}{x_{i+1}} \right| 100\% \le \varepsilon_s$
  Error: $E_{i+1} = O(E_i^2)$
Secant
  Formulation: $x_{i+1} = x_i - \dfrac{f(x_i)(x_{i-1} - x_i)}{f(x_{i-1}) - f(x_i)}$
  Graphical interpretation: [figure: f(x) versus x; line through $x_{i-1}$ and $x_i$ crossing the axis at $x_{i+1}$]
  Stopping criterion: $\left| \dfrac{x_{i+1} - x_i}{x_{i+1}} \right| 100\% \le \varepsilon_s$
In summary, the foregoing is intended to provide you with avenues for deeper
exploration of the subject. Additionally, all the above references provide descrip-
tions of the basic techniques covered in Part Two. We urge you to consult these
alternative sources to broaden your understanding of numerical methods for root
location.¹
¹ Books are referenced only by author here; a complete bibliography will be found at the back of this text.
PART THREE
PT3.1 MOTIVATION
In Part Two, we determined the value x that satisfied a single equation, f(x) = 0. Now,
we deal with the case of determining the values $x_1, x_2, \ldots, x_n$ that simultaneously
satisfy a set of equations
$$\begin{aligned} f_1(x_1, x_2, \ldots, x_n) &= 0 \\ f_2(x_1, x_2, \ldots, x_n) &= 0 \\ &\;\;\vdots \\ f_n(x_1, x_2, \ldots, x_n) &= 0 \end{aligned}$$
Such systems can be either linear or nonlinear. In Part Three, we deal with linear
algebraic equations that are of the general form
$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\ &\;\;\vdots \\ a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= b_n \end{aligned}$$ (PT3.1)
where the a’s are constant coefficients, the b’s are constants, and n is the number of
equations. All other equations are nonlinear. Nonlinear systems were discussed in Chap. 6 and
will be covered briefly again in Chap. 9.
PT3.1.1 Noncomputer Methods for Solving Systems of Equations
For small numbers of equations (n ≤ 3), linear (and sometimes nonlinear) equations can
be solved readily by simple techniques. Some of these methods will be reviewed at the
beginning of Chap. 9. However, for four or more equations, solutions become arduous
and computers must be utilized. Historically, the inability to solve all but the smallest sets
of equations by hand has limited the scope of problems addressed in many engineering
applications.
Before computers, techniques to solve linear algebraic equations were time-consum-
ing and awkward. These approaches placed a constraint on creativity because the methods
were often difficult to implement and understand. Consequently, the techniques were
sometimes overemphasized at the expense of other aspects of the problem-solving process
such as formulation and interpretation (recall Fig. PT1.1 and accompanying discussion).
LINEAR ALGEBRAIC
EQUATIONS
The advent of easily accessible computers makes it possible and practical for you
to solve large sets of simultaneous linear algebraic equations. Thus, you can approach
more complex and realistic examples and problems. Furthermore, you will have more
time to test your creative skills because you will be able to place more emphasis on
problem formulation and solution interpretation.
PT3.1.2 Linear Algebraic Equations and Engineering Practice
Many of the fundamental equations of engineering are based on conservation laws (recall
Table 1.1). Some familiar quantities that conform to such laws are mass, energy, and
momentum. In mathematical terms, these principles lead to balance or continuity equa-
tions that relate system behavior as represented by the levels or response of the quantity
being modeled to the properties or characteristics of the system and the external stimuli
or forcing functions acting on the system.
As an example, the principle of mass conservation can be used to formulate a model
for a series of chemical reactors (Fig. PT3.1a). For this case, the quantity being modeled
is the mass of the chemical in each reactor. The system properties are the reaction char-
acteristics of the chemical and the reactors’ sizes and flow rates. The forcing functions
are the feed rates of the chemical into the system.
In Part Two, you saw how single-component systems result in a single equation that
can be solved using root-location techniques. Multicomponent systems result in a coupled
set of mathematical equations that must be solved simultaneously. The equations are
coupled because the individual parts of the system are influenced by other parts. For
example, in Fig. PT3.1a, reactor 4 receives chemical inputs from reactors 2 and 3.
Consequently, its response is dependent on the quantity of chemical in these other reactors.
FIGURE PT3.1
Two types of systems that can be modeled using linear algebraic equations: (a) lumped
variable system that involves coupled finite components and (b) distributed variable system that
involves a continuum.
[Panel (a): feed streams entering a network of reactors x1 through x5. Panel (b): feed entering a reactor divided into segments x1 through xn.]
When these dependencies are expressed mathematically, the resulting equations are
often of the linear algebraic form of Eq. (PT3.1). The x’s are usually measures of the
magnitudes of the responses of the individual components. Using Fig. PT3.1a as an
example, x1 might quantify the amount of mass in the first reactor, x2 might quantify the
amount in the second, and so forth. The a’s typically represent the properties and char-
acteristics that bear on the interactions between components. For instance, the a’s for
Fig. PT3.1a might be reflective of the flow rates of mass between the reactors. Finally,
the b’s usually represent the forcing functions acting on the system, such as the feed rate
in Fig. PT3.1a. The applications in Chap. 12 provide other examples of such equations
derived from engineering practice.
Multicomponent problems of the above types arise from both lumped (macro-) and
distributed (micro-) variable mathematical models (Fig. PT3.1). Lumped variable
lems involve coupled finite components. Examples include trusses (Sec. 12.2), reactors
(Fig. PT3.1a and Sec. 12.1), and electric circuits (Sec. 12.3). These types of problems
use models that provide little or no spatial detail.
Conversely, distributed variable problems attempt to describe spatial detail of sys-
tems on a continuous or semicontinuous basis. The distribution of chemicals along the
length of an elongated, rectangular reactor (Fig. PT3.1b) is an example of a continuous
variable model. Differential equations derived from the conservation laws specify the
distribution of the dependent variable for such systems. These differential equations can
be solved numerically by converting them to an equivalent system of simultaneous alge-
braic equations. The solution of such sets of equations represents a major engineering
application area for the methods in the following chapters. These equations are coupled
because the variables at one location are dependent on the variables in adjoining regions.
For example, the concentration at the middle of the reactor is a function of the concen-
tration in adjoining regions. Similar examples could be developed for the spatial distribu-
tion of temperature or momentum. We will address such problems when we discuss
differential equations later in the book.
Aside from physical systems, simultaneous linear algebraic equations also arise in
a variety of mathematical problem contexts. These result when mathematical functions
are required to satisfy several conditions simultaneously. Each condition results in an
equation that contains known coefficients and unknown variables. The techniques dis-
cussed in this part can be used to solve for the unknowns when the equations are linear
and algebraic. Some widely used numerical techniques that employ simultaneous equa-
tions are regression analysis (Chap. 17) and spline interpolation (Chap. 18).
PT3.2 MATHEMATICAL BACKGROUND
All parts of this book require some mathematical background. For Part Three, matrix
notation and algebra are useful because they provide a concise way to represent and
manipulate linear algebraic equations. If you are already familiar with matrices, feel free
to skip to Sec. PT3.3. For those who are unfamiliar or require a review, the following
material provides a brief introduction to the subject.
PT3.2.1 Matrix Notation
A matrix consists of a rectangular array of elements represented by a single symbol. As
depicted in Fig. PT3.2, [A] is the shorthand notation for the matrix and aij designates an
individual element of the matrix.
A horizontal set of elements is called a row and a vertical set is called a column.
The first subscript i always designates the number of the row in which the element lies.
The second subscript j designates the column. For example, element a23 is in row 2 and
column 3.
The matrix in Fig. PT3.2 has n rows and m columns and is said to have a dimension
of n by m (or n × m). It is referred to as an n by m matrix.
Matrices with row dimension n = 1, such as
$$[B] = [\,b_1 \;\; b_2 \;\; \cdots \;\; b_m\,]$$
are called row vectors. Note that for simplicity, the first subscript of each element is
dropped. Also, it should be mentioned that there are times when it is desirable to employ
a special shorthand notation to distinguish a row matrix from other types of matrices.
One way to accomplish this is to employ special open-topped brackets, as in $\lfloor B \rfloor$.
Matrices with column dimension m = 1, such as
$$[C] = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}$$
are referred to as column vectors. For simplicity, the second subscript is dropped. As
with the row vector, there are occasions when it is desirable to employ a special
shorthand notation to distinguish a column matrix from other types of matrices. One way to
accomplish this is to employ special brackets, as in {C}.
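FIGURE PT3.2
A matrix. Row 2 and column 3 intersect at element $a_{23}$.
$$[A] = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1m} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2m} \\ \vdots & \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nm} \end{bmatrix}$$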
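Matrices where n = m are called square matrices. For example, a 4 by 4 matrix is
$$[A] = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix}$$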
The diagonal consisting of the elements a11, a22, a33, and a44 is termed the principal or
main diagonal of the matrix.
Square matrices are particularly important when solving sets of simultaneous
linear equations. For such systems, the number of equations (corresponding to rows)
and the number of unknowns (corresponding to columns) must be equal for a unique
solution to be possible. Consequently, square matrices of coefficients are encountered
when dealing with such systems. Some special types of square matrices are described
in Box PT3.1.
Box PT3.1 Special Types of Square Matrices
There are a number of special forms of square matrices that are
important and should be noted:
A symmetric matrix is one where $a_{ij} = a_{ji}$ for all i’s and j’s. For
example,
$$[A] = \begin{bmatrix} 5 & 1 & 2 \\ 1 & 3 & 7 \\ 2 & 7 & 8 \end{bmatrix}$$
is a 3 by 3 symmetric matrix.
A diagonal matrix is a square matrix where all elements off the
main diagonal are equal to zero, as in
$$[A] = \begin{bmatrix} a_{11} & & & \\ & a_{22} & & \\ & & a_{33} & \\ & & & a_{44} \end{bmatrix}$$
Note that where large blocks of elements are zero, they are left
blank.
An identity matrix is a diagonal matrix where all elements on
the main diagonal are equal to 1, as in
$$[I] = \begin{bmatrix} 1 & & & \\ & 1 & & \\ & & 1 & \\ & & & 1 \end{bmatrix}$$
The symbol [I] is used to denote the identity matrix. The identity
matrix has properties similar to unity.
An upper triangular matrix is one where all the elements below
the main diagonal are zero, as in
$$[A] = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ & a_{22} & a_{23} & a_{24} \\ & & a_{33} & a_{34} \\ & & & a_{44} \end{bmatrix}$$
A lower triangular matrix is one where all elements above the
main diagonal are zero, as in
$$[A] = \begin{bmatrix} a_{11} & & & \\ a_{21} & a_{22} & & \\ a_{31} & a_{32} & a_{33} & \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix}$$
A banded matrix has all elements equal to zero, with the
exception of a band centered on the main diagonal:
$$[A] = \begin{bmatrix} a_{11} & a_{12} & & \\ a_{21} & a_{22} & a_{23} & \\ & a_{32} & a_{33} & a_{34} \\ & & a_{43} & a_{44} \end{bmatrix}$$
The above matrix has a bandwidth of 3 and is given a special
name: the tridiagonal matrix.
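The definitions of the special square matrices translate directly into element-wise tests. The following Python sketch stores a matrix as a list of rows; the helper names and the sample tridiagonal matrix are assumptions made for illustration.

```python
def is_symmetric(a):
    """True if a[i][j] == a[j][i] for every pair of indices."""
    n = len(a)
    return all(a[i][j] == a[j][i] for i in range(n) for j in range(i))


def is_upper_triangular(a):
    """True if every element below the main diagonal is zero."""
    return all(a[i][j] == 0 for i in range(len(a)) for j in range(i))


def is_lower_triangular(a):
    """True if every element above the main diagonal is zero."""
    n = len(a)
    return all(a[i][j] == 0 for i in range(n) for j in range(i + 1, n))


def is_diagonal(a):
    """A diagonal matrix is both upper and lower triangular."""
    return is_upper_triangular(a) and is_lower_triangular(a)


def bandwidth(a):
    """Width of the band of nonzero elements centered on the diagonal."""
    n = len(a)
    half = max((abs(i - j) for i in range(n) for j in range(n)
                if a[i][j] != 0), default=0)
    return 2 * half + 1


def identity(n):
    """n by n identity matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


a_sym = [[5, 1, 2], [1, 3, 7], [2, 7, 8]]           # symmetric example from the box
a_tri = [[2, 1, 0, 0], [1, 2, 1, 0],
         [0, 1, 2, 1], [0, 0, 1, 2]]                # hypothetical tridiagonal matrix
```

With these helpers, `bandwidth(a_tri)` evaluates the band of the sample matrix and `is_diagonal(identity(4))` confirms that the identity matrix is a special case of a diagonal matrix.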
PT3.2.2 Matrix Operating Rules
Now that we have specified what we mean by a matrix, we can define some operating rules
that govern its use. Two n by m matrices are equal if, and only if, every element in the first
is equal to the corresponding element in the second, that is, [A] = [B] if $a_{ij} = b_{ij}$ for all i and j.
Addition of two matrices, say, [A] and [B], is accomplished by adding corresponding
terms in each matrix. The elements of the resulting matrix [C] are computed,
$$c_{ij} = a_{ij} + b_{ij}$$
for i = 1, 2, . . . , n and j = 1, 2, . . . , m. Similarly, the subtraction of two matrices,
say, [E] minus [F], is obtained by subtracting corresponding terms, as in
$$d_{ij} = e_{ij} - f_{ij}$$
for i = 1, 2, . . . , n and j = 1, 2, . . . , m. It follows directly from the above definitions
that addition and subtraction can be performed only between matrices having the same
dimensions.
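Addition is commutative:
$$[A] + [B] = [B] + [A]$$
Addition is also associative, that is,
$$([A] + [B]) + [C] = [A] + ([B] + [C])$$
Subtraction can be treated as the addition of a matrix multiplied by −1, but the order
of the operands cannot be exchanged: [A] − [B] is the negative of [B] − [A].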
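Element-by-element addition and subtraction can be sketched in Python as follows. The function names `mat_add` and `mat_sub`, the list-of-rows storage, and the sample matrices are illustrative assumptions; the dimension check enforces the rule that both operands must have the same shape.

```python
def _check_same_shape(a, b):
    """Raise an error unless a and b have identical dimensions."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same dimensions")


def mat_add(a, b):
    """Return [C] with c_ij = a_ij + b_ij."""
    _check_same_shape(a, b)
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mat_sub(e, f):
    """Return [D] with d_ij = e_ij - f_ij."""
    _check_same_shape(e, f)
    return [[x - y for x, y in zip(re, rf)] for re, rf in zip(e, f)]


A = [[1, 2, 3], [4, 5, 6]]
B = [[6, 5, 4], [3, 2, 1]]
C = [[1, 1, 1], [2, 2, 2]]

sum_ab = mat_add(A, B)
diff_ab = mat_sub(A, B)
```

For these sample matrices, `mat_add(A, B)` and `mat_add(B, A)` give the same result, and grouping the additions of A, B, and C differently does not change the sum.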
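The multiplication of a matrix [A] by a scalar g is obtained by multiplying every element
of [A] by g, as in
$$[D] = g[A] = \begin{bmatrix} ga_{11} & ga_{12} & \cdots & ga_{1m} \\ ga_{21} & ga_{22} & \cdots & ga_{2m} \\ \vdots & \vdots & & \vdots \\ ga_{n1} & ga_{n2} & \cdots & ga_{nm} \end{bmatrix}$$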
The product of two matrices is represented as [C] = [A][B], where the elements of [C]
are defined as (see Box PT3.2 for a simple way to conceptualize matrix multiplication)
$$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$$ (PT3.2)
where n = the column dimension of [A] and the row dimension of [B]. That is, the $c_{ij}$
element is obtained by adding the product of individual elements from the ith row of the
first matrix, in this case [A], by the jth column of the second matrix [B].
According to this definition, multiplication of two matrices can be performed only
if the first matrix has as many columns as the number of rows in the second matrix.
Thus, if [A] is an n by m matrix, [B] could be an m by l matrix. For this case, the result-
ing [C] matrix would have the dimension of n by l. However, if [B] were an l by m
matrix, the multiplication could not be performed. Figure PT3.3 provides an easy way
to check whether two matrices can be multiplied.
PT3.2 MATHEMATICAL BACKGROUND 237
FIGURE PT3.3
Box PT3.2 A Simple Method for Multiplying Two Matrices
Although Eq. (PT3.2) is well suited for implementation on a
computer, it is not the simplest means for visualizing the mechanics
of multiplying two matrices. What follows gives more tangible
expression to the operation.
Suppose that we want to multiply [X] by [Y] to yield [Z],
[Z] 5 [X][Y] 5 £
3 1
8 6
0 4
§ c
5 9
7 2
d
A simple way to visualize the computation of [Z] is to raise [Y],
as in
A
c
5 9
7 2
d d [Y]
[X] S £
3 1
8 6
0 4
§ £ ? § d [Z]
Now the answer [Z] can be computed in the space vacated by [Y].
This format has utility because it aligns the appropriate rows
and columns that are to be multiplied. For example, according to
Eq. (PT3.2), the element z11 is obtained by multiplying the first
row of [X] by the first column of [Y]. This amounts to adding the
product of x11 and y11 to the product of x12 and y21, as in
c
5 9
7 2
d
T
£
3 1
8 6
0 4
§
S
£
3 3 5 1 1 3 7 5 22
§
Thus, z11 is equal to 22. Element z21 can be computed in a similar
fashion, as in
c
5 9
7 2
d
T
£
3 1
8 6
0 4
§ S £
22
8 3 5 1 6 3 7 5 82 §
The computation can be continued in this way, following the
alignment of the rows and columns, to yield the result
[Z] 5 £
22 29
82 84
28 8
§
Note how this simple method makes it clear why it is impossible
to multiply two matrices if the number of columns of the first ma-
trix does not equal the number of rows in the second matrix. Also,
note how it demonstrates that the order of multiplication matters
(that is, matrix multiplication is not commutative).
Figure PT3.3 schematic: $[A]_{n \times m}\,[B]_{m \times l} = [C]_{n \times l}$. The interior dimensions
(m and m) are equal, so multiplication is possible; the exterior dimensions (n and l) define
the dimensions of the result.
If the dimensions of the matrices are suitable, matrix multiplication is associative,
$$([A][B])[C] = [A]([B][C])$$
and distributive,
$$[A]([B] + [C]) = [A][B] + [A][C]$$
or
$$([A] + [B])[C] = [A][C] + [B][C]$$
However, multiplication is not generally commutative:
$$[A][B] \neq [B][A]$$
That is, the order of multiplication is important.
Figure PT3.4 shows pseudocode to multiply an n by m matrix [A] by an m by
l matrix [B] and store the result in an n by l matrix [C]. Notice that, instead of the
inner product being directly accumulated in [C], it is collected in a temporary
variable, sum. This is done for two reasons. First, it is a bit more efficient, because the
computer need determine the location of $c_{i,j}$ only n × l times rather than n × l × m
times. Second, the precision of the multiplication can be greatly improved by
declaring sum as a double precision variable (recall the discussion of inner products in
Sec. 3.4.2).
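As a rough illustration of the pseudocode in Fig. PT3.4, the following Python sketch multiplies two matrices stored as lists of rows. The function name `mat_mult` and the dimension check are assumptions made for this example, not part of the original figure. Python floats are already double precision, so the temporary accumulator `total` plays the role of the double precision variable sum.

```python
def mat_mult(a, b):
    """Multiply an n-by-m matrix a by an m-by-l matrix b (lists of rows)."""
    n, m, l = len(a), len(b), len(b[0])
    # Interior dimensions must agree: columns of a == rows of b.
    if any(len(row) != m for row in a):
        raise ValueError("columns of the first matrix must equal rows of the second")
    c = [[0.0] * l for _ in range(n)]
    for i in range(n):
        for j in range(l):
            total = 0.0                      # temporary accumulator, as in Fig. PT3.4
            for k in range(m):
                total += a[i][k] * b[k][j]
            c[i][j] = total                  # c[i][j] is located only once per element
    return c


# The matrices from Box PT3.2
X = [[3, 1], [8, 6], [0, 4]]
Y = [[5, 9], [7, 2]]
Z = mat_mult(X, Y)
print(Z)
```

Calling `mat_mult(Y, X)` raises a `ValueError`, because [Y] has two columns whereas [X] has three rows; this mirrors the dimension rule illustrated in Fig. PT3.3.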
Although multiplication is possible, matrix division is not a defined operation.
However, if a matrix [A] is square and nonsingular, there is another matrix $[A]^{-1}$, called the
inverse of [A], for which
$$[A][A]^{-1} = [A]^{-1}[A] = [I] \tag{PT3.3}$$
Thus, the multiplication of a matrix by the inverse is analogous to division, in the sense
that a number divided by itself is equal to 1. That is, multiplication of a matrix by its
inverse leads to the identity matrix (recall Box PT3.1).
The inverse of a two-dimensional square matrix can be represented simply by
$$[A]^{-1} = \frac{1}{a_{11}a_{22} - a_{12}a_{21}} \begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix} \tag{PT3.4}$$
SUBROUTINE Mmult (a, b, c, m, n, l)
DOFOR i = 1, n
  DOFOR j = 1, l
    sum = 0.
    DOFOR k = 1, m
      sum = sum + a(i,k) · b(k,j)
    END DO
    c(i,j) = sum
  END DO
END DO
FIGURE PT3.4
Similar formulas for higher-dimensional matrices are much more involved. Sections in
Chaps. 10 and 11 will be devoted to techniques for using numerical methods and the
computer to calculate the inverse for such systems.
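The 2 × 2 formula of Eq. (PT3.4) can be sketched in a few lines of Python. The function name `inverse_2x2` and the sample matrix are chosen only for illustration; exact fractions are used so that the product with the original matrix reproduces the identity matrix of Eq. (PT3.3) without round-off.

```python
from fractions import Fraction


def inverse_2x2(a):
    """Inverse of a 2-by-2 matrix (list of two rows) following Eq. (PT3.4)."""
    (a11, a12), (a21, a22) = a
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ZeroDivisionError("matrix is singular; no inverse exists")
    return [[Fraction(a22) / det, Fraction(-a12) / det],
            [Fraction(-a21) / det, Fraction(a11) / det]]


A = [[3, 2], [-1, 2]]          # illustrative matrix with determinant 8
A_inv = inverse_2x2(A)

# Check Eq. (PT3.3): [A][A]^-1 should equal the identity matrix.
product = [[sum(A[i][k] * A_inv[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]
print(A_inv)
print(product)
```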
Two other matrix manipulations that will have utility in our discussion are the trans-
pose and the trace of a matrix. The transpose of a matrix involves transforming its rows
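into columns and its columns into rows. For example, for the 4 × 4 matrix,
$$[A] = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix}$$
the transpose, designated $[A]^T$, is defined as
$$[A]^T = \begin{bmatrix} a_{11} & a_{21} & a_{31} & a_{41} \\ a_{12} & a_{22} & a_{32} & a_{42} \\ a_{13} & a_{23} & a_{33} & a_{43} \\ a_{14} & a_{24} & a_{34} & a_{44} \end{bmatrix}$$
In other words, the element $a_{ij}$ of the transpose is equal to the $a_{ji}$ element of the original
matrix.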
The transpose has a variety of functions in matrix algebra. One simple advantage is
that it allows a column vector to be written as a row. For example, if
$$\{C\} = \begin{Bmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{Bmatrix}$$
then
$$\{C\}^T = \lfloor c_1 \;\; c_2 \;\; c_3 \;\; c_4 \rfloor$$
where the superscript T designates the transpose. For example, this can save space when
writing a column vector in a manuscript. In addition, the transpose has numerous math-
ematical applications.
The trace of a matrix is the sum of the elements on its principal diagonal. It is
designated as tr [A] and is computed as
$$\mathrm{tr}\,[A] = \sum_{i=1}^{n} a_{ii}$$
The trace will be used in our discussion of eigenvalues in Chap. 27.
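Both manipulations are short enough to sketch directly. In the Python fragment below, the helper names `transpose` and `trace` and the sample matrix are illustrative assumptions; matrices are stored as lists of rows.

```python
def transpose(a):
    """Swap rows and columns: element (i, j) of the result is a[j][i]."""
    return [list(col) for col in zip(*a)]


def trace(a):
    """Sum of the elements on the principal diagonal of a square matrix."""
    if any(len(row) != len(a) for row in a):
        raise ValueError("the trace is defined only for square matrices")
    return sum(a[i][i] for i in range(len(a)))


A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
print(transpose(A))   # rows become columns
print(trace(A))       # 1 + 5 + 9

# A column vector written as a row, as in {C}^T
C = [[1], [2], [3], [4]]
print(transpose(C))
```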
The final matrix manipulation that will have utility in our discussion is augmentation.
A matrix is augmented by the addition of a column (or columns) to the original matrix.
For example, suppose we have a matrix of coefficients:
$$[A] = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$$
We might wish to augment this matrix [A] with an identity matrix (recall Box PT3.1) to
yield a 3-by-6-dimensional matrix:
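$$[A] = \left[\begin{array}{ccc|ccc} a_{11} & a_{12} & a_{13} & 1 & 0 & 0 \\ a_{21} & a_{22} & a_{23} & 0 & 1 & 0 \\ a_{31} & a_{32} & a_{33} & 0 & 0 & 1 \end{array}\right]$$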
Such an expression has utility when we must perform a set of identical operations on
two matrices. Thus, we can perform the operations on the single augmented matrix rather
than on the two individual matrices.
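A minimal sketch of augmentation, assuming matrices stored as lists of rows, is given below. The helper names `identity` and `augment` and the numerical matrix are hypothetical. The final statement shows the practical point of the preceding paragraph: a single row operation on the augmented matrix acts on both blocks at once.

```python
def identity(n):
    """n-by-n identity matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def augment(a, b):
    """Append the columns of b to the right of a (same number of rows required)."""
    if len(a) != len(b):
        raise ValueError("both matrices need the same number of rows")
    return [list(row_a) + list(row_b) for row_a, row_b in zip(a, b)]


A = [[2, 1, 0],
     [1, 3, 1],
     [0, 1, 4]]
aug = augment(A, identity(3))        # 3-by-6 augmented matrix

# One row operation (row 2 minus 0.5 times row 1) updates both blocks together.
aug[1] = [v - 0.5 * p for v, p in zip(aug[1], aug[0])]
print(aug)
```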
PT3.2.3 Representing Linear Algebraic Equations in Matrix Form
It should be clear that matrices provide a concise notation for representing simultaneous
linear equations. For example, Eq. (PT3.1) can be expressed as
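$$[A]\{X\} = \{B\} \tag{PT3.5}$$
where [A] is the n by n square matrix of coefficients,
$$[A] = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}$$
{B} is the n by 1 column vector of constants,
$$\{B\}^T = \lfloor b_1 \;\; b_2 \;\; \cdots \;\; b_n \rfloor$$
and {X} is the n by 1 column vector of unknowns:
$$\{X\}^T = \lfloor x_1 \;\; x_2 \;\; \cdots \;\; x_n \rfloor$$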
Recall the definition of matrix multiplication [Eq. (PT3.2) or Box PT3.2] to convince
yourself that Eqs. (PT3.1) and (PT3.5) are equivalent. Also, realize that Eq. (PT3.5) is
a valid matrix multiplication because the number of columns, n, of the first matrix [A]
is equal to the number of rows, n, of the second matrix {X}.
This part of the book is devoted to solving Eq. (PT3.5) for {X}. A formal way to
obtain a solution using matrix algebra is to multiply each side of the equation by the
inverse of [A] to yield
$$[A]^{-1}[A]\{X\} = [A]^{-1}\{B\}$$
Because $[A]^{-1}[A]$ equals the identity matrix, the equation becomes
$$\{X\} = [A]^{-1}\{B\} \tag{PT3.6}$$
Therefore, the equation has been solved for {X}. This is another example of how the
inverse plays a role in matrix algebra that is similar to division. It should be noted that
this is not a very efficient way to solve a system of equations. Thus, other approaches
are employed in numerical algorithms. However, as discussed in Chap. 10, the matrix
inverse itself has great value in the engineering analyses of such systems.
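Finally, we will sometimes find it useful to augment [A] with {B}. For example, if
n = 3, this results in a 3-by-4-dimensional matrix:
$$[A] = \left[\begin{array}{ccc|c} a_{11} & a_{12} & a_{13} & b_1 \\ a_{21} & a_{22} & a_{23} & b_2 \\ a_{31} & a_{32} & a_{33} & b_3 \end{array}\right] \tag{PT3.7}$$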
Expressing the equations in this form is useful because several of the techniques for
solving linear systems perform identical operations on a row of coefficients and the cor-
responding right-hand-side constant. As expressed in Eq. (PT3.7), we can perform the
manipulation once on an individual row of the augmented matrix rather than separately
on the coefficient matrix and the right-hand-side vector.
PT3.3 ORIENTATION
Before proceeding to the numerical methods, some further orientation might be helpful.
The following is intended as an overview of the material discussed in Part Three. In
addition, we have formulated some objectives to help focus your efforts when studying
the material.
PT3.3.1 Scope and Preview
Figure PT3.5 provides an overview for Part Three. Chapter 9 is devoted to the most
fundamental technique for solving linear algebraic systems: Gauss elimination. Before
launching into a detailed discussion of this technique, a preliminary section deals with
simple methods for solving small systems. These approaches are presented to provide
you with visual insight and because one of the methods—the elimination of unknowns—
represents the basis for Gauss elimination.
After the preliminary material, “naive” Gauss elimination is discussed. We start with
this “stripped-down” version because it allows the fundamental technique to be elabo-
rated on without complicating details. Then, in subsequent sections, we discuss potential
problems of the naive approach and present a number of modifications to minimize and
circumvent these problems. The focus of this discussion will be the process of switching
rows, or partial pivoting.
Chapter 10 begins by illustrating how Gauss elimination can be formulated as an
LU decomposition solution. Such solution techniques are valuable for cases where many
right-hand-side vectors need to be evaluated. It is shown how this attribute allows
efficient calculation of the matrix inverse, which has tremendous utility in engineering
practice. Finally, the chapter ends with a discussion of matrix condition. The condition
number is introduced as a measure of the loss of significant digits of accuracy that can
result when solving ill-conditioned matrices.
The beginning of Chap. 11 focuses on special types of systems of equations that have
broad engineering application. In particular, efficient techniques for solving tridiagonal
systems are presented. Then, the remainder of the chapter focuses on an alternative to
elimination methods called the Gauss-Seidel method. This technique is similar in spirit to
the approximate methods for roots of equations that were discussed in Chap. 6. That
is, the technique involves guessing a solution and then iterating to obtain a refined
estimate. The chapter ends with information related to solving linear algebraic equations
with software packages.
FIGURE PT3.5
Schematic of the organization of the material in Part Three: Linear Algebraic Equations.
[Flowchart for Part 3, Linear Algebraic Equations:
PT3.1 Motivation; PT3.2 Mathematical background; PT3.3 Orientation.
Chapter 9, Gauss Elimination: 9.1 Small systems; 9.2 Naive Gauss elimination; 9.3 Pitfalls;
9.4 Remedies; 9.5 Complex systems; 9.6 Nonlinear systems; 9.7 Gauss-Jordan.
Chapter 10, LU Decomposition and Matrix Inversion: 10.1 LU decomposition; 10.2 Matrix inverse;
10.3 System condition.
Chapter 11, Special Matrices and Gauss-Seidel: 11.1 Special matrices; 11.2 Gauss-Seidel;
11.3 Software.
Chapter 12, Engineering Case Studies: 12.1 Chemical engineering; 12.2 Civil engineering;
12.3 Electrical engineering; 12.4 Mechanical engineering.
Epilogue: PT3.4 Trade-offs; PT3.5 Important formulas; PT3.6 Advanced methods.]
Chapter 12 demonstrates how the methods can actually be applied for problem solv-
ing. As with other parts of the book, applications are drawn from all fields of engineering.
Finally, an epilogue is included at the end of Part Three. This review includes dis-
cussion of trade-offs that are relevant to implementation of the methods in engineering
practice. This section also summarizes the important formulas and advanced methods
related to linear algebraic equations. As such, it can be used before exams or as a
refresher after you have graduated and must return to linear algebraic equations as a
professional.
PT3.3.2 Goals and Objectives
Study Objectives. After completing Part Three, you should be able to solve problems
involving linear algebraic equations and appreciate the application of these equations in
many fields of engineering. You should strive to master several techniques and assess
their reliability. You should understand the trade-offs involved in selecting the “best”
method (or methods) for any particular problem. In addition to these general objectives,
the specific concepts listed in Table PT3.1 should be assimilated and mastered.
TABLE PT3.1 Specific study objectives for Part Three.
1. Understand the graphical interpretation of ill-conditioned systems and how it relates to the
   determinant.
2. Be familiar with terminology: forward elimination, back substitution, pivot equation, and pivot
   coefficient.
3. Understand the problems of division by zero, round-off error, and ill-conditioning.
4. Know how to compute the determinant using Gauss elimination.
5. Understand the advantages of pivoting; realize the difference between partial and complete
   pivoting.
6. Know the fundamental difference between Gauss elimination and the Gauss-Jordan method and
   which is more efficient.
7. Recognize how Gauss elimination can be formulated as an LU decomposition.
8. Know how to incorporate pivoting and matrix inversion into an LU decomposition algorithm.
9. Know how to interpret the elements of the matrix inverse in evaluating stimulus response
   computations in engineering.
10. Realize how to use the inverse and matrix norms to evaluate system condition.
11. Understand how banded and symmetric systems can be decomposed and solved efficiently.
12. Understand why the Gauss-Seidel method is particularly well suited for large, sparse systems of
    equations.
13. Know how to assess diagonal dominance of a system of equations and how it relates to whether
    the system can be solved with the Gauss-Seidel method.
14. Understand the rationale behind relaxation; know where underrelaxation and overrelaxation are
    appropriate.
Computer Objectives. Your most fundamental computer objectives are to be able to
solve a system of linear algebraic equations and to evaluate the matrix inverse. You will
want to have subprograms developed for LU decomposition of both full and tridiagonal
matrices. You may also want to have your own software to implement the Gauss-Seidel
method.
You should know how to use packages to solve linear algebraic equations and
find the matrix inverse. You should become familiar with how the same evaluations
can be implemented on popular software packages such as Excel, MATLAB software,
and Mathcad.
CHAPTER 9
Gauss Elimination
This chapter deals with simultaneous linear algebraic equations that can be represented
generally as
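$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\ &\;\;\vdots \\ a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= b_n \end{aligned} \tag{9.1}$$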
where the a’s are constant coefficients and the b’s are constants.
The technique described in this chapter is called Gauss elimination because it involves
combining equations to eliminate unknowns. Although it is one of the earliest methods
for solving simultaneous equations, it remains among the most important algorithms in
use today and is the basis for linear equation solving on many popular software packages.
9.1 SOLVING SMALL NUMBERS OF EQUATIONS
Before proceeding to the computer methods, we will describe several methods that are
appropriate for solving small (n ≤ 3) sets of simultaneous equations and that do not
require a computer. These are the graphical method, Cramer’s rule, and the elimination
of unknowns.
9.1.1 The Graphical Method
A graphical solution is obtainable for two equations by plotting them on Cartesian co-
ordinates with one axis corresponding to x1 and the other to x2. Because we are dealing
with linear systems, each equation is a straight line. This can be easily illustrated for the
general equations
$$a_{11}x_1 + a_{12}x_2 = b_1$$
$$a_{21}x_1 + a_{22}x_2 = b_2$$
Both equations can be solved for x2:
$$x_2 = -\left(\frac{a_{11}}{a_{12}}\right)x_1 + \frac{b_1}{a_{12}}$$
$$x_2 = -\left(\frac{a_{21}}{a_{22}}\right)x_1 + \frac{b_2}{a_{22}}$$
Thus, the equations are now in the form of straight lines; that is, x2 = (slope) x1 + intercept.
These lines can be graphed on Cartesian coordinates with x2 as the ordinate and x1
as the abscissa. The values of x1 and x2 at the intersection of the lines represent the solution.
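The slope-intercept rearrangement can also be checked numerically rather than by reading a plot. The sketch below is only an illustration under stated assumptions: the function names `line_form` and `intersection` are hypothetical, integer coefficients are assumed, and exact fractions are used. The two lines are those of the system 3x1 + 2x2 = 18 and −x1 + 2x2 = 2.

```python
from fractions import Fraction


def line_form(a1, a2, b):
    """Rewrite a1*x1 + a2*x2 = b as x2 = slope*x1 + intercept (needs a2 != 0)."""
    return Fraction(-a1, a2), Fraction(b, a2)


def intersection(line_a, line_b):
    """Point (x1, x2) where two slope-intercept lines cross."""
    (s1, c1), (s2, c2) = line_a, line_b
    if s1 == s2:
        # Parallel or coincident lines: the system is singular.
        raise ValueError("equal slopes: no unique intersection")
    x1 = (c2 - c1) / (s1 - s2)
    return x1, s1 * x1 + c1


first = line_form(3, 2, 18)     # slope -3/2, intercept 9
second = line_form(-1, 2, 2)    # slope 1/2, intercept 1
print(first, second)
print(intersection(first, second))
```

Equal slopes raise an error here, which corresponds to the case in which the plotted lines never cross at a single point.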
EXAMPLE 9.1 The Graphical Method for Two Equations
Problem Statement. Use the graphical method to solve
$$3x_1 + 2x_2 = 18 \tag{E9.1.1}$$
$$-x_1 + 2x_2 = 2 \tag{E9.1.2}$$
Solution. Let x1 be the abscissa. Solve Eq. (E9.1.1) for x2:
$$x_2 = -\frac{3}{2}x_1 + 9$$
which, when plotted on Fig. 9.1, is a straight line with an intercept of 9 and a slope of −3/2.
Equation (E9.1.2) can also be solved for x2:
$$x_2 = \frac{1}{2}x_1 + 1$$
which is also plotted on Fig. 9.1. The solution is the intersection of the two lines at x1 = 4
and x2 = 3. This result can be checked by substituting these values into the original
equations to yield
$$3(4) + 2(3) = 18$$
$$-(4) + 2(3) = 2$$
Thus, the results are equivalent to the right-hand sides of the original equations.
FIGURE 9.1
Graphical solution of a set of two simultaneous linear algebraic equations. The intersection of the
lines represents the solution.
[Plot of x2 versus x1 showing the lines 3x1 + 2x2 = 18 and −x1 + 2x2 = 2, which intersect at the
solution x1 = 4, x2 = 3.]
For three simultaneous equations, each equation would be represented by a plane in
a three-dimensional coordinate system. The point where the three planes intersect would
represent the solution. Beyond three equations, graphical methods break down and,
consequently, have little practical value for solving simultaneous equations. However, they
sometimes prove useful in visualizing properties of the solutions. For example, Fig. 9.2
depicts three cases that can pose problems when solving sets of linear equations. Figure
9.2a shows the case where the two equations represent parallel lines. For such situations,
there is no solution because the lines never cross. Figure 9.2b depicts the case where the
two lines are coincident. For such situations there is an infinite number of solutions. Both
types of systems are said to be singular. In addition, systems that are very close to being
singular (Fig. 9.2c) can also cause problems. These systems are said to be ill-conditioned.
Graphically, this corresponds to the fact that it is difficult to identify the exact point at
which the lines intersect. Ill-conditioned systems will also pose problems when they are
encountered during the numerical solution of linear equations. This is because they will
be extremely sensitive to round-off error (recall Sec. 4.2.3).
FIGURE 9.2
Graphical depiction of singular and ill-conditioned systems: (a) no solution, (b) infinite solutions,
and (c) ill-conditioned system where the slopes are so close that the point of intersection is
difficult to detect visually.
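[Three plots of x2 versus x1: (a) the parallel lines −(1/2)x1 + x2 = 1 and −(1/2)x1 + x2 = 1/2;
(b) the coincident lines −(1/2)x1 + x2 = 1 and −x1 + 2x2 = 2; (c) the nearly parallel lines
−(1/2)x1 + x2 = 1 and −(2.3/5)x1 + x2 = 1.1.]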
9.1.2 Determinants and Cramer’s Rule
Cramer’s rule is another solution technique that is best suited to small numbers of equa-
tions. Before describing this method, we will briefly introduce the concept of the deter-
minant, which is used to implement Cramer’s rule. In addition, the determinant has
relevance to the evaluation of the ill-conditioning of a matrix.
Determinants. The determinant can be illustrated for a set of three equations:
$$[A]\{X\} = \{B\}$$
where [A] is the coefficient matrix:
$$[A] = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$$
The determinant D of this system is formed from the coefficients of the equation, as in
$$D = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} \tag{9.2}$$
Although the determinant D and the coefficient matrix [A] are composed of the same
elements, they are completely different mathematical concepts. That is why they are
distinguished visually by using brackets to enclose the matrix and straight lines to enclose
the determinant. In contrast to a matrix, the determinant is a single number. For example,
the value of the second-order determinant
$$D = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}$$
is calculated by
$$D = a_{11}a_{22} - a_{12}a_{21} \tag{9.3}$$
For the third-order case [Eq. (9.2)], a single numerical value for the determinant can be
computed as
$$D = a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12}\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13}\begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix} \tag{9.4}$$
where the 2 by 2 determinants are called minors.
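Equations (9.3) and (9.4) translate almost line for line into code. The following Python sketch is illustrative only; the names `det2` and `det3` and the 3 × 3 test matrix are assumptions introduced here, while the 2 × 2 matrix holds the coefficients of the system plotted in Fig. 9.1.

```python
def det2(m):
    """Second-order determinant, Eq. (9.3)."""
    (a11, a12), (a21, a22) = m
    return a11 * a22 - a12 * a21


def det3(m):
    """Third-order determinant by expansion in minors, Eq. (9.4)."""
    (a11, a12, a13), (a21, a22, a23), (a31, a32, a33) = m
    return (a11 * det2([[a22, a23], [a32, a33]])
            - a12 * det2([[a21, a23], [a31, a33]])
            + a13 * det2([[a21, a22], [a31, a32]]))


# Coefficients of 3x1 + 2x2 = 18 and -x1 + 2x2 = 2 (Fig. 9.1)
print(det2([[3, 2], [-1, 2]]))

# A hypothetical 3-by-3 matrix
print(det3([[1, 2, 3], [4, 5, 6], [7, 8, 10]]))
```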
EXAMPLE 9.2 Determinants
Problem Statement. Compute values for the determinants of the systems represented
in Figs. 9.1 and 9.2.
Solution. For Fig. 9.1:
$$D = \begin{vmatrix} 3 & 2 \\ -1 & 2 \end{vmatrix} = 3(2) - 2(-1) = 8$$
For Fig. 9.2a:
$$D = \begin{vmatrix} -1/2 & 1 \\ -1/2 & 1 \end{vmatrix} = \frac{-1}{2}(1) - 1\left(\frac{-1}{2}\right) = 0$$
For Fig. 9.2b:
$$D = \begin{vmatrix} -1/2 & 1 \\ -1 & 2 \end{vmatrix} = \frac{-1}{2}(2) - 1(-1) = 0$$
For Fig. 9.2c:
$$D = \begin{vmatrix} -1/2 & 1 \\ -2.3/5 & 1 \end{vmatrix} = \frac{-1}{2}(1) - 1\left(\frac{-2.3}{5}\right) = -0.04$$
In the foregoing example, the singular systems had zero determinants. Additionally,
the results suggest that the system that is almost singular (Fig. 9.2c) has a determinant
that is close to zero. These ideas will be pursued further in our subsequent discussion of
ill-conditioning (Sec. 9.3.3).
Cramer’s Rule. This rule states that each unknown in a system of linear algebraic
equations may be expressed as a fraction of two determinants with denominator D and with
the numerator obtained from D by replacing the column of coefficients of the unknown
in question by the constants b1, b2, . . . , bn. For example, x1 would be computed as
$$x_1 = \frac{\begin{vmatrix} b_1 & a_{12} & a_{13} \\ b_2 & a_{22} & a_{23} \\ b_3 & a_{32} & a_{33} \end{vmatrix}}{D} \tag{9.5}$$
EXAMPLE 9.3 Cramer’s Rule
Problem Statement. Use Cramer’s rule to solve
$$0.3x_1 + 0.52x_2 + x_3 = -0.01$$
$$0.5x_1 + x_2 + 1.9x_3 = 0.67$$
$$0.1x_1 + 0.3x_2 + 0.5x_3 = -0.44$$
Solution. The determinant D can be written as [Eq. (9.2)]
$$D = \begin{vmatrix} 0.3 & 0.52 & 1 \\ 0.5 & 1 & 1.9 \\ 0.1 & 0.3 & 0.5 \end{vmatrix}$$
The minors are [Eq. (9.3)]
$$A_1 = \begin{vmatrix} 1 & 1.9 \\ 0.3 & 0.5 \end{vmatrix} = 1(0.5) - 1.9(0.3) = -0.07$$
$$A_2 = \begin{vmatrix} 0.5 & 1.9 \\ 0.1 & 0.5 \end{vmatrix} = 0.5(0.5) - 1.9(0.1) = 0.06$$
$$A_3 = \begin{vmatrix} 0.5 & 1 \\ 0.1 & 0.3 \end{vmatrix} = 0.5(0.3) - 1(0.1) = 0.05$$
These can be used to evaluate the determinant, as in [Eq. (9.4)]
$$D = 0.3(-0.07) - 0.52(0.06) + 1(0.05) = -0.0022$$
Applying Eq. (9.5), the solution is
$$x_1 = \frac{\begin{vmatrix} -0.01 & 0.52 & 1 \\ 0.67 & 1 & 1.9 \\ -0.44 & 0.3 & 0.5 \end{vmatrix}}{-0.0022} = \frac{0.03278}{-0.0022} = -14.9$$
$$x_2 = \frac{\begin{vmatrix} 0.3 & -0.01 & 1 \\ 0.5 & 0.67 & 1.9 \\ 0.1 & -0.44 & 0.5 \end{vmatrix}}{-0.0022} = \frac{0.0649}{-0.0022} = -29.5$$
$$x_3 = \frac{\begin{vmatrix} 0.3 & 0.52 & -0.01 \\ 0.5 & 1 & 0.67 \\ 0.1 & 0.3 & -0.44 \end{vmatrix}}{-0.0022} = \frac{-0.04356}{-0.0022} = 19.8$$
For more than three equations, Cramer’s rule becomes impractical because, as the
number of equations increases, the determinants are time consuming to evaluate by hand
(or by computer). Consequently, more efficient alternatives are used. Some of these al-
ternatives are based on the last noncomputer solution technique covered in the next
section—the elimination of unknowns.
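For small systems, Cramer’s rule is nevertheless easy to sketch. The Python fragment below is one possible illustration, not a recommended solver: the function names `det` and `cramer` are assumptions, the determinant is evaluated by cofactor expansion (whose cost grows very quickly with n, in line with the remark above), and exact fractions are used so that the arithmetic of Example 9.3 is reproduced without round-off.

```python
from fractions import Fraction


def det(m):
    """Determinant by cofactor expansion along the first row (small n only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, element in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * element * det(minor)
    return total


def cramer(a, b):
    """Solve [A]{X} = {B} by replacing one column of [A] at a time with {B}."""
    d = det(a)
    if d == 0:
        raise ZeroDivisionError("singular system: the determinant is zero")
    x = []
    for col in range(len(a)):
        replaced = [row[:col] + [b[i]] + row[col + 1:] for i, row in enumerate(a)]
        x.append(det(replaced) / d)
    return x


F = Fraction
A = [[F("0.3"), F("0.52"), F("1")],
     [F("0.5"), F("1"), F("1.9")],
     [F("0.1"), F("0.3"), F("0.5")]]
B = [F("-0.01"), F("0.67"), F("-0.44")]
solution = cramer(A, B)
print(float(det(A)))
print([float(v) for v in solution])
```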
9.1.3 The Elimination of Unknowns
The elimination of unknowns by combining equations is an algebraic approach that can
be illustrated for a set of two equations:
$$a_{11}x_1 + a_{12}x_2 = b_1 \tag{9.6}$$
$$a_{21}x_1 + a_{22}x_2 = b_2 \tag{9.7}$$
The basic strategy is to multiply the equations by constants so that one of the unknowns
will be eliminated when the two equations are combined. The result is a single equation
that can be solved for the remaining unknown. This value can then be substituted into
either of the original equations to compute the other variable.
For example, Eq. (9.6) might be multiplied by a21 and Eq. (9.7) by a11 to give
$$a_{11}a_{21}x_1 + a_{12}a_{21}x_2 = b_1a_{21} \tag{9.8}$$
$$a_{21}a_{11}x_1 + a_{22}a_{11}x_2 = b_2a_{11} \tag{9.9}$$
Subtracting Eq. (9.8) from Eq. (9.9) will, therefore, eliminate the x1 term from the
equations to yield
$$a_{22}a_{11}x_2 - a_{12}a_{21}x_2 = b_2a_{11} - b_1a_{21}$$
which can be solved for
$$x_2 = \frac{a_{11}b_2 - a_{21}b_1}{a_{11}a_{22} - a_{12}a_{21}} \tag{9.10}$$
Equation (9.10) can then be substituted into Eq. (9.6), which can be solved for
$$x_1 = \frac{a_{22}b_1 - a_{12}b_2}{a_{11}a_{22} - a_{12}a_{21}} \tag{9.11}$$
Notice that Eqs. (9.10) and (9.11) follow directly from Cramer’s rule, which states
$$x_1 = \frac{\begin{vmatrix} b_1 & a_{12} \\ b_2 & a_{22} \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}} = \frac{b_1a_{22} - a_{12}b_2}{a_{11}a_{22} - a_{12}a_{21}}$$
$$x_2 = \frac{\begin{vmatrix} a_{11} & b_1 \\ a_{21} & b_2 \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}} = \frac{a_{11}b_2 - b_1a_{21}}{a_{11}a_{22} - a_{12}a_{21}}$$
EXAMPLE 9.4 Elimination of Unknowns
Problem Statement. Use the elimination of unknowns to solve (recall Example 9.1)
$$3x_1 + 2x_2 = 18$$
$$-x_1 + 2x_2 = 2$$
Solution. Using Eqs. (9.11) and (9.10),
$$x_1 = \frac{2(18) - 2(2)}{3(2) - 2(-1)} = 4$$
$$x_2 = \frac{3(2) - (-1)18}{3(2) - 2(-1)} = 3$$
which is consistent with our graphical solution (Fig. 9.1).
The elimination of unknowns can be extended to systems with more than two or
three equations. However, the numerous calculations that are required for larger systems
make the method extremely tedious to implement by hand. Fortunately, as described in the
next section, the technique can be formalized and readily programmed for the computer.
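Before the general algorithm is developed, the two-equation case can be sketched in Python. The function name `eliminate_2x2` and its argument order are assumptions made for this illustration; integer coefficients and a nonzero a11 are also assumed. The steps follow Eqs. (9.8) through (9.10), after which x2 is substituted back into Eq. (9.6).

```python
from fractions import Fraction


def eliminate_2x2(a11, a12, b1, a21, a22, b2):
    """Solve two equations by elimination of unknowns (integer coefficients)."""
    # Eq. (9.8) is Eq. (9.6) times a21; Eq. (9.9) is Eq. (9.7) times a11.
    # Subtracting (9.8) from (9.9) removes x1 and leaves one equation in x2.
    coeff_x2 = a22 * a11 - a12 * a21
    rhs = b2 * a11 - b1 * a21
    if coeff_x2 == 0:
        raise ZeroDivisionError("singular system: x2 cannot be isolated")
    x2 = Fraction(rhs, coeff_x2)        # Eq. (9.10)
    x1 = (b1 - a12 * x2) / a11          # back-substitute into Eq. (9.6); assumes a11 != 0
    return x1, x2


# System of Example 9.4: 3x1 + 2x2 = 18 and -x1 + 2x2 = 2
x1, x2 = eliminate_2x2(3, 2, 18, -1, 2, 2)
print(x1, x2)
```

The back-substituted value of x1 agrees with the closed form of Eq. (9.11), as expected.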
9.2 NAIVE GAUSS ELIMINATION
In the previous section, the elimination of unknowns was used to solve a pair of simul-
taneous equations. The procedure consisted of two steps:
1. The equations were manipulated to eliminate one of the unknowns from the equations.
The result of this elimination step was that we had one equation with one unknown.
2. Consequently, this equation could be solved directly and the result back-substituted
into one of the original equations to solve for the remaining unknown.
This basic approach can be extended to large sets of equations by developing a
systematic scheme or algorithm to eliminate unknowns and to back-substitute. Gauss
elimination is the most basic of these schemes.
This section includes the systematic techniques for forward elimination and back sub-
stitution that comprise Gauss elimination. Although these techniques are ideally suited for
implementation on computers, some modifications will be required to obtain a reliable algo-
rithm. In particular, the computer program must avoid division by zero. The following method
is called “naive” Gauss elimination because it does not avoid this problem. Subsequent
sections will deal with the additional features required for an effective computer program.
The approach is designed to solve a general set of n equations:
$$a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1n}x_n = b_1 \tag{9.12a}$$
$$a_{21}x_1 + a_{22}x_2 + a_{23}x_3 + \cdots + a_{2n}x_n = b_2 \tag{9.12b}$$
$$\vdots$$
$$a_{n1}x_1 + a_{n2}x_2 + a_{n3}x_3 + \cdots + a_{nn}x_n = b_n \tag{9.12c}$$
As was the case with the solution of two equations, the technique for n equations consists
of two phases: elimination of unknowns and solution through back substitution.
Forward Elimination of Unknowns. The first phase is designed to reduce the set of
equations to an upper triangular system (Fig. 9.3). The initial step will be to eliminate
the first unknown, x1, from the second through the nth equations. To do this, multiply
Eq. (9.12a) by a21/a11 to give
$$a_{21}x_1 + \frac{a_{21}}{a_{11}}a_{12}x_2 + \cdots + \frac{a_{21}}{a_{11}}a_{1n}x_n = \frac{a_{21}}{a_{11}}b_1 \tag{9.13}$$
Now, this equation can be subtracted from Eq. (9.12b) to give
$$\left(a_{22} - \frac{a_{21}}{a_{11}}a_{12}\right)x_2 + \cdots + \left(a_{2n} - \frac{a_{21}}{a_{11}}a_{1n}\right)x_n = b_2 - \frac{a_{21}}{a_{11}}b_1$$
or
$$a'_{22}x_2 + \cdots + a'_{2n}x_n = b'_2$$
where the prime indicates that the elements have been changed from their original values.
The procedure is then repeated for the remaining equations. For instance, Eq. (9.12a)
can be multiplied by a31/a11 and the result subtracted from the third equation. Repeating
the procedure for the remaining equations results in the following modified system:
$$a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1n}x_n = b_1 \tag{9.14a}$$
$$a'_{22}x_2 + a'_{23}x_3 + \cdots + a'_{2n}x_n = b'_2 \tag{9.14b}$$
$$a'_{32}x_2 + a'_{33}x_3 + \cdots + a'_{3n}x_n = b'_3 \tag{9.14c}$$
$$\vdots$$
$$a'_{n2}x_2 + a'_{n3}x_3 + \cdots + a'_{nn}x_n = b'_n \tag{9.14d}$$
For the foregoing steps, Eq. (9.12a) is called the pivot equation and a11 is called the
pivot coefficient or element. Note that the process of multiplying the first row by a21/a11
is equivalent to dividing it by a11 and multiplying it by a21. Sometimes the division
operation is referred to as normalization. We make this distinction because a zero pivot
element can interfere with normalization by causing a division by zero. We will return
to this important issue after we complete our description of naive Gauss elimination.
Now repeat the above to eliminate the second unknown from Eqs. (9.14c) through
(9.14d). To do this, multiply Eq. (9.14b) by $a'_{32}/a'_{22}$ and subtract the result from Eq.
(9.14c). Perform a similar elimination for the remaining equations to yield
$$a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1n}x_n = b_1$$
$$a'_{22}x_2 + a'_{23}x_3 + \cdots + a'_{2n}x_n = b'_2$$
$$a''_{33}x_3 + \cdots + a''_{3n}x_n = b''_3$$
$$\vdots$$
$$a''_{n3}x_3 + \cdots + a''_{nn}x_n = b''_n$$
where the double prime indicates that the elements have been modified twice.
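FIGURE 9.3
The two phases of Gauss elimination: forward elimination and back substitution. The
primes indicate the number of times that the coefficients and constants have been modified.
Forward elimination:
$$\left[\begin{array}{ccc|c} a_{11} & a_{12} & a_{13} & b_1 \\ a_{21} & a_{22} & a_{23} & b_2 \\ a_{31} & a_{32} & a_{33} & b_3 \end{array}\right] \;\Rightarrow\; \left[\begin{array}{ccc|c} a_{11} & a_{12} & a_{13} & b_1 \\ & a'_{22} & a'_{23} & b'_2 \\ & & a''_{33} & b''_3 \end{array}\right]$$
Back substitution:
$$x_3 = b''_3 / a''_{33}$$
$$x_2 = (b'_2 - a'_{23}x_3) / a'_{22}$$
$$x_1 = (b_1 - a_{12}x_2 - a_{13}x_3) / a_{11}$$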
The procedure can be continued using the remaining pivot equations. The final
manipulation in the sequence is to use the (n − 1)th equation to eliminate the $x_{n-1}$ term
from the nth equation. At this point, the system will have been transformed to an upper
triangular system (recall Box PT3.1):
$$a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1n}x_n = b_1 \tag{9.15a}$$
$$a'_{22}x_2 + a'_{23}x_3 + \cdots + a'_{2n}x_n = b'_2 \tag{9.15b}$$
$$a''_{33}x_3 + \cdots + a''_{3n}x_n = b''_3 \tag{9.15c}$$
$$\vdots$$
$$a^{(n-1)}_{nn}x_n = b^{(n-1)}_n \tag{9.15d}$$
Pseudocode to implement forward elimination is presented in Fig. 9.4a. Notice that three
nested loops provide a concise representation of the process. The outer loop moves down the
matrix from one pivot row to the next. The middle loop moves below the pivot row to each
of the subsequent rows where elimination is to take place. Finally, the innermost loop pro-
gresses across the columns to eliminate or transform the elements of a particular row.
Back Substitution. Equation (9.15d) can now be solved for xn:
$$x_n = \frac{b^{(n-1)}_n}{a^{(n-1)}_{nn}} \tag{9.16}$$
This result can be back-substituted into the (n − 1)th equation to solve for $x_{n-1}$. The procedure,
which is repeated to evaluate the remaining x’s, can be represented by the following formula:
$$x_i = \frac{b^{(i-1)}_i - \displaystyle\sum_{j=i+1}^{n} a^{(i-1)}_{ij}x_j}{a^{(i-1)}_{ii}} \qquad \text{for } i = n-1, n-2, \ldots, 1 \tag{9.17}$$
(a) DOFOR k = 1, n − 1
      DOFOR i = k + 1, n
        factor = a(i,k) / a(k,k)
        DOFOR j = k + 1 to n
          a(i,j) = a(i,j) − factor · a(k,j)
        END DO
        b(i) = b(i) − factor · b(k)
      END DO
    END DO
(b) x(n) = b(n) / a(n,n)
    DOFOR i = n − 1, 1, −1
      sum = b(i)
      DOFOR j = i + 1, n
        sum = sum − a(i,j) · x(j)
      END DO
      x(i) = sum / a(i,i)
    END DO
FIGURE 9.4
Pseudocode to perform (a) forward elimination and (b) back substitution.
Pseudocode to implement Eqs. (9.16) and (9.17) is presented in Fig. 9.4b. Notice
the similarity between this pseudocode and that in Fig. PT3.4 for matrix multiplication.
As with Fig. PT3.4, a temporary variable, sum, is used to accumulate the summation
from Eq. (9.17). This results in a somewhat faster execution time than if the summation
were accumulated in bi. More importantly, it allows efficient improvement in precision
if the variable, sum, is declared in double precision.
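The two pieces of pseudocode in Fig. 9.4 can be combined into a short Python sketch. The function name `naive_gauss` is an assumption, and the inputs are copied so that the caller's data are left unchanged (a design choice of this example, not something the figure specifies). As in the naive algorithm, there is no pivoting, so a zero pivot raises a `ZeroDivisionError`.

```python
def naive_gauss(a, b):
    """Solve [A]{X} = {B} by naive Gauss elimination (no pivoting)."""
    n = len(b)
    a = [[float(v) for v in row] for row in a]   # work on copies
    b = [float(v) for v in b]

    # (a) Forward elimination: reduce to an upper triangular system.
    for k in range(n - 1):                 # pivot row
        for i in range(k + 1, n):          # rows below the pivot row
            factor = a[i][k] / a[k][k]     # fails if the pivot element is zero
            for j in range(k + 1, n):      # columns to the right of the pivot
                a[i][j] -= factor * a[k][j]
            b[i] -= factor * b[k]

    # (b) Back substitution, Eqs. (9.16) and (9.17).
    x = [0.0] * n
    x[n - 1] = b[n - 1] / a[n - 1][n - 1]
    for i in range(n - 2, -1, -1):
        total = b[i]                       # temporary accumulator
        for j in range(i + 1, n):
            total -= a[i][j] * x[j]
        x[i] = total / a[i][i]
    return x


# A hypothetical test system whose exact solution is x = (3, -2.5, 7)
A = [[3, -0.1, -0.2],
     [0.1, 7, -0.3],
     [0.3, -0.2, 10]]
B = [7.85, -19.3, 71.4]
print(naive_gauss(A, B))
```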
EXAMPLE 9.5 Naive Gauss Elimination
Problem Statement. Use Gauss elimination to solve
$$3x_1 - 0.1x_2 - 0.2x_3 = 7.85 \tag{E9.5.1}$$
$$0.1x_1 + 7x_2 - 0.3x_3 = -19.3 \tag{E9.5.2}$$
$$0.3x_1 - 0.2x_2 + 10x_3 = 71.4 \tag{E9.5.3}$$
Carry six significant figures during the computation.
Solution. The first part of the procedure is forward elimination. Multiply Eq. (E9.5.1)
by (0.1)/3 and subtract the result from Eq. (E9.5.2) to give
$$7.00333x_2 - 0.293333x_3 = -19.5617$$
Then multiply Eq. (E9.5.1) by (0.3)/3 and subtract it from Eq. (E9.5.3) to eliminate x1.
After these operations, the set of equations is
$$3x_1 - 0.1x_2 - 0.2x_3 = 7.85 \tag{E9.5.4}$$
$$7.00333x_2 - 0.293333x_3 = -19.5617 \tag{E9.5.5}$$
$$-0.190000x_2 + 10.0200x_3 = 70.6150 \tag{E9.5.6}$$
To complete the forward elimination, x2 must be removed from Eq. (E9.5.6). To
accomplish this, multiply Eq. (E9.5.5) by −0.190000/7.00333 and subtract the result from
Eq. (E9.5.6). This eliminates x2 from the third equation and reduces the system to an
upper triangular form, as in
$$3x_1 - 0.1x_2 - 0.2x_3 = 7.85 \tag{E9.5.7}$$
$$7.00333x_2 - 0.293333x_3 = -19.5617 \tag{E9.5.8}$$
$$10.0120x_3 = 70.0843 \tag{E9.5.9}$$
We can now solve these equations by back substitution. First, Eq. (E9.5.9) can be solved
for
$$x_3 = \frac{70.0843}{10.0120} = 7.0000 \tag{E9.5.10}$$
This result can be back-substituted into Eq. (E9.5.8):
$$7.00333x_2 - 0.293333(7.0000) = -19.5617$$
which can be solved for
$$x_2 = \frac{-19.5617 + 0.293333(7.0000)}{7.00333} = -2.50000 \tag{E9.5.11}$$
Finally, Eqs. (E9.5.10) and (E9.5.11) can be substituted into Eq. (E9.5.4):
$$3x_1 - 0.1(-2.50000) - 0.2(7.0000) = 7.85$$
which can be solved for
$$x_1 = \frac{7.85 + 0.1(-2.50000) + 0.2(7.0000)}{3} = 3.00000$$
The results are identical to the exact solution of x1 = 3, x2 = −2.5, and x3 = 7. This
can be verified by substituting the results into the original equation set
$$3(3) - 0.1(-2.5) - 0.2(7) = 7.85$$
$$0.1(3) + 7(-2.5) - 0.3(7) = -19.3$$
$$0.3(3) - 0.2(-2.5) + 10(7) = 71.4$$
9.2.1 Operation Counting
The execution time of Gauss elimination depends on the number of floating-point
operations (or flops) involved in the algorithm. On modern computers using math copro-
cessors, the time consumed to perform addition/subtraction and multiplication/division
is about the same. Therefore, totaling up these operations provides insight into which
parts of the algorithm are most time consuming and how computation time increases as
the system gets larger.
Before analyzing naive Gauss elimination, we will first define some quantities that
facilitate operation counting:
$$\sum_{i=1}^{m} c\,f(i) = c\sum_{i=1}^{m} f(i) \qquad \sum_{i=1}^{m} [f(i) + g(i)] = \sum_{i=1}^{m} f(i) + \sum_{i=1}^{m} g(i) \tag{9.18a,b}$$
$$\sum_{i=1}^{m} 1 = 1 + 1 + 1 + \cdots + 1 = m \qquad \sum_{i=k}^{m} 1 = m - k + 1 \tag{9.18c,d}$$
$$\sum_{i=1}^{m} i = 1 + 2 + 3 + \cdots + m = \frac{m(m+1)}{2} = \frac{m^2}{2} + O(m) \tag{9.18e}$$
$$\sum_{i=1}^{m} i^2 = 1^2 + 2^2 + 3^2 + \cdots + m^2 = \frac{m(m+1)(2m+1)}{6} = \frac{m^3}{3} + O(m^2) \tag{9.18f}$$
where $O(m^n)$ means “terms of order $m^n$ and lower.”
Now let us examine the naive Gauss elimination algorithm (Fig. 9.4a) in detail. We
will first count the flops in the elimination stage. On the first pass through the outer loop,
$k = 1$. Therefore, the limits on the middle loop are from $i = 2$ to $n$. According to Eq.
(9.18d), this means that the number of iterations of the middle loop will be
\[ \sum_{i=2}^{n} 1 = n - 2 + 1 = n - 1 \tag{9.19} \]
For every one of these iterations, there is one division to define the factor. The interior loop
then performs a single multiplication and subtraction for each iteration from $j = 2$ to $n$.
Finally, there is one additional multiplication and subtraction for the right-hand-side value.
Thus, for every iteration of the middle loop, the number of multiplications is
\[ 1 + [n - 2 + 1] + 1 = 1 + n \tag{9.20} \]
The total multiplications for the first pass through the outer loop is therefore obtained
by multiplying Eq. (9.19) by (9.20) to give $[n - 1](1 + n)$. In like fashion, the number
of subtractions is computed as $[n - 1](n)$.
Similar reasoning can be used to estimate the flops for the subsequent iterations of
the outer loop. These can be summarized as

Outer Loop   Middle Loop   Addition/Subtraction   Multiplication/Division
k            i             flops                  flops
1            2, n          (n - 1)(n)             (n - 1)(n + 1)
2            3, n          (n - 2)(n - 1)         (n - 2)(n)
.            .             .                      .
k            k + 1, n      (n - k)(n + 1 - k)     (n - k)(n + 2 - k)
.            .             .                      .
n - 1        n, n          (1)(2)                 (1)(3)

Therefore, the total addition/subtraction flops for elimination can be computed as
\[ \sum_{k=1}^{n-1} (n - k)(n + 1 - k) = \sum_{k=1}^{n-1} [n(n + 1) - k(2n + 1) + k^2] \]
or
\[ n(n + 1) \sum_{k=1}^{n-1} 1 - (2n + 1) \sum_{k=1}^{n-1} k + \sum_{k=1}^{n-1} k^2 \]
Applying some of the relationships from Eq. (9.18) yields
\[ [n^3 + O(n)] - [n^3 + O(n^2)] + \left[ \frac{1}{3} n^3 + O(n^2) \right] = \frac{n^3}{3} + O(n) \tag{9.21} \]
A similar analysis for the multiplication/division flops yields
\[ [n^3 + O(n^2)] - [n^3 + O(n)] + \left[ \frac{1}{3} n^3 + O(n^2) \right] = \frac{n^3}{3} + O(n^2) \tag{9.22} \]
Summing these results gives
\[ \frac{2n^3}{3} + O(n^2) \]
Thus, the total number of flops is equal to $2n^3/3$ plus an additional component
proportional to terms of order $n^2$ and lower. The result is written in this way because as
$n$ gets large, the $O(n^2)$ and lower terms become negligible. We are therefore justified in
concluding that for large $n$, the effort involved in forward elimination converges on $2n^3/3$.
Because only a single loop is used, back substitution is much simpler to evaluate.
The number of addition/subtraction flops is equal to $n(n - 1)/2$. Because of the extra
division prior to the loop, the number of multiplication/division flops is $n(n + 1)/2$.
These can be added to arrive at a total of
\[ n^2 + O(n) \]
Thus, the total effort in naive Gauss elimination can be represented as
\[ \underbrace{\frac{2n^3}{3} + O(n^2)}_{\text{Forward elimination}} + \underbrace{n^2 + O(n)}_{\text{Backward substitution}} \;\xrightarrow{\text{as } n \text{ increases}}\; \frac{2n^3}{3} + O(n^2) \tag{9.23} \]
Two useful general conclusions can be drawn from this analysis:
1. As the system gets larger, the computation time increases greatly. As in Table 9.1,
the number of flops increases nearly three orders of magnitude for every order of
magnitude increase in the dimension.
2. Most of the effort is incurred in the elimination step. Thus, efforts to make the method
more efficient should probably focus on this step.
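The operation counts can also be obtained by simply walking through the loop structure described above and tallying flops. The following Python sketch does this. The function name `gauss_flops` is an illustrative assumption, and the tallies follow the counting conventions of this section (one division per factor, one multiplication and one subtraction per inner-loop pass and per right-hand-side update, and one division per unknown in back substitution).

```python
def gauss_flops(n):
    """Return (elimination_flops, back_substitution_flops) for an n x n system."""
    elimination = 0
    for k in range(1, n):                  # outer loop: k = 1, ..., n - 1
        for i in range(k + 1, n + 1):      # middle loop: i = k + 1, ..., n
            elimination += 1               # division that defines the factor
            elimination += 2 * (n - k)     # inner loop: one multiply and one subtract per j
            elimination += 2               # multiply and subtract for the right-hand side
    back = 1                               # x_n = b_n / a_nn
    for i in range(n - 1, 0, -1):          # i = n - 1, ..., 1
        back += 2 * (n - i)                # one multiply and one subtract per term of the sum
        back += 1                          # division by a_ii
    return elimination, back


for n in (10, 100, 1000):
    elim, back = gauss_flops(n)
    total = elim + back
    print(n, elim, back, total, round(2 * n**3 / 3), f"{100 * elim / total:.2f}%")
```

For $n = 10$ and $n = 100$ the tallies are 705 and 671550 elimination flops and 100 and 10000 back-substitution flops, which matches the entries of Table 9.1.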
9.3 PITFALLS OF ELIMINATION METHODS
Whereas there are many systems of equations that can be solved with naive Gauss elimina-
tion, there are some pitfalls that must be explored before writing a general computer
program to implement the method. Although the following material relates directly to naive
Gauss elimination, the information is relevant for other elimination techniques as well.
9.3.1 Division by Zero
The primary reason that the foregoing technique is called “naive” is that during both the
elimination and the back-substitution phases, it is possible that a division by zero can
occur. For example, if we use naive Gauss elimination to solve
\[ 2x_2 + 3x_3 = 8 \]
\[ 4x_1 + 6x_2 + 7x_3 = -3 \]
\[ 2x_1 + x_2 + 6x_3 = 5 \]
the normalization of the first row would involve division by $a_{11} = 0$. Problems also can
arise when a coefficient is very close to zero. The technique of pivoting has been developed
to partially avoid these problems. It will be described in Sec. 9.4.2.
TABLE 9.1 Number of Flops for Gauss Elimination.

                      Back           Total                       Percent Due
n      Elimination    Substitution   Flops         2n^3/3        to Elimination
10     705            100            805           667           87.58%
100    671550         10000          681550        666667        98.53%
1000   6.67 × 10^8    1 × 10^6       6.68 × 10^8   6.67 × 10^8   99.85%

9.3.2 Round-Off Errors
Even though the solution in Example 9.5 was close to the true answer, there was a slight
discrepancy in the result for x3 [Eq. (E9.5.10)]. This discrepancy, which amounted to a
relative error of −0.00043 percent, was due to our use of six significant figures during
the computation. If we had used more significant figures, the error in the results would
be reduced further. If we had used fractions instead of decimals (and consequently
avoided round-off altogether), the answers would have been exact. However, because
computers carry only a limited number of significant figures (recall Sec. 3.4.1), round-off
errors can occur and must be considered when evaluating the results.
The problem of round-off error can become particularly important when large num-
bers of equations are to be solved. This is due to the fact that every result is dependent
on previous results. Consequently, an error in the early steps will tend to propagate—that
is, it will cause errors in subsequent steps.
Specifying the system size where round-off error becomes significant is complicated
by the fact that the type of computer and the properties of the equations are determining
factors. A rough rule of thumb is that round-off error may be important when dealing
with 100 or more equations. In any event, you should always substitute your answers
back into the original equations to check whether a substantial error has occurred. How-
ever, as discussed below, the magnitudes of the coefficients themselves can influence
whether such an error check ensures a reliable result.
9.3.3 Ill-Conditioned Systems
The adequacy of the solution depends on the condition of the system. In Sec. 9.1.1, a graph-
ical depiction of system condition was developed. As discussed in Sec. 4.2.3, well-conditioned
systems are those where a small change in one or more of the coefficients results in a simi-
lar small change in the solution. Ill-conditioned systems are those where small changes in
coefficients result in large changes in the solution. An alternative interpretation of ill-condi-
tioning is that a wide range of answers can approximately satisfy the equations. Because
round-off errors can induce small changes in the coefficients, these artificial changes can lead
to large solution errors for ill-conditioned systems, as illustrated in the following example.
EXAMPLE 9.6 Ill-Conditioned Systems
Problem Statement. Solve the following system:
\[ x_1 + 2x_2 = 10 \tag{E9.6.1} \]
\[ 1.1x_1 + 2x_2 = 10.4 \tag{E9.6.2} \]
Then, solve it again, but with the coefficient of x1 in the second equation modified slightly
to 1.05.
Solution. Using Eqs. (9.10) and (9.11), the solution is
\[ x_1 = \frac{2(10) - 2(10.4)}{1(2) - 2(1.1)} = 4 \]
\[ x_2 = \frac{1(10.4) - 1.1(10)}{1(2) - 2(1.1)} = 3 \]
However, with the slight change of the coefficient $a_{21}$ from 1.1 to 1.05, the result is
changed dramatically to
\[ x_1 = \frac{2(10) - 2(10.4)}{1(2) - 2(1.05)} = 8 \]
\[ x_2 = \frac{1(10.4) - 1.05(10)}{1(2) - 2(1.05)} = 1 \]
Notice that the primary reason for the discrepancy between the two results is that
the denominator represents the difference of two almost-equal numbers. As illustrated
previously in Sec. 3.4.2, such differences are highly sensitive to slight variations in the
numbers being manipulated.
At this point, you might suggest that substitution of the results into the original
equations would alert you to the problem. Unfortunately, for ill-conditioned systems this
is often not the case. Substitution of the erroneous values of $x_1 = 8$ and $x_2 = 1$ into Eqs.
(E9.6.1) and (E9.6.2) yields
\[ 8 + 2(1) = 10 = 10 \]
\[ 1.1(8) + 2(1) = 10.8 \cong 10.4 \]
Therefore, although $x_1 = 8$ and $x_2 = 1$ is not the true solution to the original problem,
the error check is close enough to possibly mislead you into believing that your solutions
are adequate.
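The sensitivity described in this example is easy to reproduce numerically. The following Python sketch solves the two-equation system with Cramer's rule for the original and the perturbed coefficient, and then substitutes the perturbed solution back into the original equations. The helper name `cramer_2x2` is an illustrative assumption.

```python
def cramer_2x2(a11, a12, a21, a22, b1, b2):
    """Solve a 2 x 2 linear system by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    x1 = (a22 * b1 - a12 * b2) / det
    x2 = (a11 * b2 - a21 * b1) / det
    return x1, x2


# Original system and the system with a21 changed from 1.1 to 1.05
x_orig = cramer_2x2(1.0, 2.0, 1.1, 2.0, 10.0, 10.4)
x_pert = cramer_2x2(1.0, 2.0, 1.05, 2.0, 10.0, 10.4)

# Residuals of the perturbed solution in the ORIGINAL equations
r1 = 10.0 - (1.0 * x_pert[0] + 2.0 * x_pert[1])
r2 = 10.4 - (1.1 * x_pert[0] + 2.0 * x_pert[1])

print("original :", x_orig)
print("perturbed:", x_pert)
print("residuals:", r1, r2)
```

A change of less than 5 percent in one coefficient moves the solution from approximately (4, 3) to approximately (8, 1), while the residuals of the wrong answer in the original equations stay small (about 0 and −0.4).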
As was done previously in the section on graphical methods, a visual representation
of ill-conditioning can be developed by plotting Eqs. (E9.6.1) and (E9.6.2) (recall Fig. 9.2).
Because the slopes of the lines are almost equal, it is visually difficult to see exactly where
they intersect. This visual difficulty is reflected quantitatively in the nebulous results of
Example 9.6. We can mathematically characterize this situation by writing the two equa-
tions in general form:
\[ a_{11}x_1 + a_{12}x_2 = b_1 \tag{9.24} \]
\[ a_{21}x_1 + a_{22}x_2 = b_2 \tag{9.25} \]
Dividing Eq. (9.24) by $a_{12}$ and Eq. (9.25) by $a_{22}$ and rearranging yields alternative
versions that are in the format of straight lines [$x_2$ = (slope) $x_1$ + intercept]:
\[ x_2 = -\frac{a_{11}}{a_{12}} x_1 + \frac{b_1}{a_{12}} \]
\[ x_2 = -\frac{a_{21}}{a_{22}} x_1 + \frac{b_2}{a_{22}} \]
Consequently, if the slopes are nearly equal,
\[ \frac{a_{11}}{a_{12}} \cong \frac{a_{21}}{a_{22}} \]
or, cross-multiplying,
\[ a_{11}a_{22} \cong a_{12}a_{21} \]
which can be also expressed as
\[ a_{11}a_{22} - a_{12}a_{21} \cong 0 \tag{9.26} \]
Now, recalling that $a_{11}a_{22} - a_{12}a_{21}$ is the determinant of a two-dimensional system
[Eq. (9.3)], we arrive at the general conclusion that an ill-conditioned system is one with
a determinant close to zero. In fact, if the determinant is exactly zero, the two slopes are
identical, which connotes either no solution or an infinite number of solutions, as is the
case for the singular systems depicted in Fig. 9.2a and b.
It is difficult to specify how close to zero the determinant must be to indicate ill-
conditioning. This is complicated by the fact that the determinant can be changed by
multiplying one or more of the equations by a scale factor without changing the solution.
Consequently, the determinant is a relative value that is influenced by the magnitude of
the coefficients.
EXAMPLE 9.7 Effect of Scale on the Determinant
Problem Statement. Evaluate the determinant of the following systems:
(a) From Example 9.1:
\[ 3x_1 + 2x_2 = 18 \tag{E9.7.1} \]
\[ -x_1 + 2x_2 = 2 \tag{E9.7.2} \]
(b) From Example 9.6:
\[ x_1 + 2x_2 = 10 \tag{E9.7.3} \]
\[ 1.1x_1 + 2x_2 = 10.4 \tag{E9.7.4} \]
(c) Repeat (b) but with the equations multiplied by 10.
Solution.
(a) The determinant of Eqs. (E9.7.1) and (E9.7.2), which are well-conditioned, is
\[ D = 3(2) - 2(-1) = 8 \]
(b) The determinant of Eqs. (E9.7.3) and (E9.7.4), which are ill-conditioned, is
\[ D = 1(2) - 2(1.1) = -0.2 \]
(c) The results of (a) and (b) seem to bear out the contention that ill-conditioned systems
have near-zero determinants. However, suppose that the ill-conditioned system in (b)
is multiplied by 10 to give
\[ 10x_1 + 20x_2 = 100 \]
\[ 11x_1 + 20x_2 = 104 \]
The multiplication of an equation by a constant has no effect on its solution. In
addition, it is still ill-conditioned. This can be verified by the fact that multiplying by
a constant has no effect on the graphical solution. However, the determinant is
dramatically affected:
\[ D = 10(20) - 20(11) = -20 \]
Not only has it been raised two orders of magnitude, but it is now over twice as
large as the determinant of the well-conditioned system in (a).
As illustrated by the previous example, the magnitude of the coefficients interjects
a scale effect that complicates the relationship between system condition and determinant
size. One way to partially circumvent this difficulty is to scale the equations so that the
maximum element in any row is equal to 1.
EXAMPLE 9.8 Scaling
Problem Statement. Scale the systems of equations in Example 9.7 to a maximum
value of 1 and recompute their determinants.
Solution.
(a) For the well-conditioned system, scaling results in
\[ x_1 + 0.667x_2 = 6 \]
\[ -0.5x_1 + x_2 = 1 \]
for which the determinant is
\[ D = 1(1) - 0.667(-0.5) = 1.333 \]
(b) For the ill-conditioned system, scaling gives
\[ 0.5x_1 + x_2 = 5 \]
\[ 0.55x_1 + x_2 = 5.2 \]
for which the determinant is
\[ D = 0.5(1) - 1(0.55) = -0.05 \]
(c) For the last case, scaling changes the system to the same form as in (b) and the
determinant is also −0.05. Thus, the scale effect is removed.
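The scale effect on the determinant, and its removal by row scaling, can be checked with a few lines of Python. This is a sketch under stated assumptions: the helper names `det2` and `scale_rows` are illustrative, and only the coefficient matrices of Examples 9.7 and 9.8 are used (the right-hand sides do not enter the determinant).

```python
def det2(rows):
    """Determinant of a 2 x 2 coefficient matrix given as two rows."""
    (a11, a12), (a21, a22) = rows
    return a11 * a22 - a12 * a21


def scale_rows(rows):
    """Divide every row by its largest-magnitude coefficient."""
    return [[value / max(abs(c) for c in row) for value in row] for row in rows]


well = [[3.0, 2.0], [-1.0, 2.0]]                 # well-conditioned system
ill = [[1.0, 2.0], [1.1, 2.0]]                   # ill-conditioned system
ill_x10 = [[10.0 * value for value in row] for row in ill]

for name, system in (("well", well), ("ill", ill), ("ill x 10", ill_x10)):
    print(name, det2(system), det2(scale_rows(system)))
```

Before scaling, the three determinants are approximately 8, −0.2, and −20. After scaling, they are approximately 1.333, −0.05, and −0.05, so the two versions of the ill-conditioned system become indistinguishable.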
In a previous section (Sec. 9.1.2), we suggested that the determinant is difficult to
compute for more than three simultaneous equations. Therefore, it might seem that it
does not provide a practical means for evaluating system condition. However, as de-
scribed in Box 9.1, there is a simple algorithm that results from Gauss elimination that
can be used to evaluate the determinant.
Aside from the approach used in the previous example, there are a variety of other
ways to evaluate system condition. For example, there are alternative methods for nor-
malizing the elements (see Stark, 1970). In addition, as described in the next chapter
(Sec. 10.3), the matrix inverse and matrix norms can be employed to evaluate system
condition. Finally, a simple (but time-consuming) test is to modify the coefficients
slightly and repeat the solution. If such modifications lead to drastically different results,
the system is likely to be ill-conditioned.
As you might gather from the foregoing discussion, ill-conditioned systems are prob-
lematic. Fortunately, most linear algebraic equations derived from engineering-problem
settings are naturally well-conditioned. In addition, some of the techniques outlined in
Sec. 9.4 help to alleviate the problem.
9.3.4 Singular Systems
In the previous section, we learned that one way in which a system of equations can be
ill-conditioned is when two or more of the equations are nearly identical. Obviously, it is
even worse when the two are identical. In such cases, we would lose one degree of freedom,
and would be dealing with the impossible case of $n - 1$ equations with $n$ unknowns. Such
cases might not be obvious to you, particularly when dealing with large equation sets.
Consequently, it would be nice to have some way of automatically detecting singularity.
The answer to this problem is neatly offered by the fact that the determinant of a
singular system is zero. This idea can, in turn, be connected to Gauss elimination by
recognizing that after the elimination step, the determinant can be evaluated as the prod-
uct of the diagonal elements (recall Box 9.1). Thus, a computer algorithm can test to
discern whether a zero diagonal element is created during the elimination stage. If one
is discovered, the calculation can be immediately terminated and a message displayed
alerting the user. We will show the details of how this is done when we present a full
algorithm for Gauss elimination later in this chapter.
Box 9.1 Determinant Evaluation Using Gauss Elimination
In Sec. 9.1.2, we stated that determinant evaluation by expansion of
minors was impractical for large sets of equations. Thus, we concluded
that Cramer’s rule would be applicable only to small systems.
However, as mentioned in Sec. 9.3.3, the determinant has
value in assessing system condition. It would, therefore, be useful
to have a practical method for computing this quantity.
Fortunately, Gauss elimination provides a simple way to do
this. The method is based on the fact that the determinant of a
triangular matrix can be simply computed as the product of its
diagonal elements:
\[ D = a_{11}a_{22}a_{33} \cdots a_{nn} \tag{B9.1.1} \]
The validity of this formulation can be illustrated for a 3 by 3 system:
\[ D = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & 0 & a_{33} \end{vmatrix} \]
where the determinant can be evaluated as [recall Eq. (9.4)]
\[ D = a_{11} \begin{vmatrix} a_{22} & a_{23} \\ 0 & a_{33} \end{vmatrix} - a_{12} \begin{vmatrix} 0 & a_{23} \\ 0 & a_{33} \end{vmatrix} + a_{13} \begin{vmatrix} 0 & a_{22} \\ 0 & 0 \end{vmatrix} \]
or, by evaluating the minors (that is, the 2 by 2 determinants),
\[ D = a_{11}a_{22}a_{33} - a_{12}(0) + a_{13}(0) = a_{11}a_{22}a_{33} \]
Recall that the forward-elimination step of Gauss elimination
results in an upper triangular system. Because the value of the
determinant is not changed by the forward-elimination process, the
determinant can be simply evaluated at the end of this step via
\[ D = a_{11}a'_{22}a''_{33} \cdots a^{(n-1)}_{nn} \tag{B9.1.2} \]
where the superscripts signify the number of times that the elements
have been modified by the elimination process. Thus, we can
capitalize on the effort that has already been expended in reducing
the system to triangular form and, in the bargain, come up with a
simple estimate of the determinant.
There is a slight modification to the above approach when the
program employs partial pivoting (Sec. 9.4.2). For this case, the
determinant changes sign every time a row is pivoted. One way to
represent this is to modify Eq. (B9.1.2):
\[ D = a_{11}a'_{22}a''_{33} \cdots a^{(n-1)}_{nn} (-1)^p \tag{B9.1.3} \]
where $p$ represents the number of times that rows are pivoted.
This modification can be incorporated simply into a program;
merely keep track of the number of pivots that take place during
the course of the computation and then use Eq. (B9.1.3) to evaluate
the determinant.
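As an illustration of Eq. (B9.1.3) and of the singularity test described above, the following Python sketch evaluates the determinant as the signed product of the diagonal elements left by forward elimination with partial pivoting. The function name `det_by_elimination` and the tolerance `tol` are illustrative assumptions. Returning zero for a near-zero pivot is one possible way to flag a singular system, not the only one.

```python
def det_by_elimination(a, tol=1e-12):
    """Determinant from forward elimination with partial pivoting, Eq. (B9.1.3)."""
    n = len(a)
    a = [row[:] for row in a]              # work on a copy
    sign = 1.0                             # tracks (-1)**p, p = number of row swaps
    for k in range(n - 1):
        # Row with the largest-magnitude entry in column k, on or below the diagonal
        p = max(range(k, n), key=lambda r: abs(a[r][k]))
        if abs(a[p][k]) < tol:
            return 0.0                     # no usable pivot: singular to within tol
        if p != k:
            a[k], a[p] = a[p], a[k]
            sign = -sign                   # each row swap flips the sign
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
    det = sign
    for k in range(n):
        det *= a[k][k]                     # product of the diagonal elements
    return det


d_needs_pivot = det_by_elimination([[0.0, 2.0, 3.0], [4.0, 6.0, 7.0], [2.0, 1.0, 6.0]])
d_singular = det_by_elimination([[1.0, 2.0], [2.0, 4.0]])
print(d_needs_pivot, d_singular)
```

The first matrix has $a_{11} = 0$, so one row swap is required and the sign correction matters; expansion by minors gives −44 for it. The second matrix has two proportional rows, so a zero appears on the diagonal after elimination and the determinant vanishes.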
9.4 TECHNIQUES FOR IMPROVING SOLUTIONS
The following techniques can be incorporated into the naive Gauss elimination algorithm
to circumvent some of the pitfalls discussed in the previous section.
9.4.1 Use of More Significant Figures
The simplest remedy for ill-conditioning is to use more significant figures in the compu-
tation. If your application can be extended to handle larger word size, such a feature will
greatly reduce the problem. However, a price must be paid in the form of the computa-
tional and memory overhead connected with using extended precision (recall Sec. 3.4.1).
9.4.2 Pivoting
As mentioned at the beginning of Sec. 9.3, obvious problems occur when a pivot element
is zero because the normalization step leads to division by zero. Problems may also arise
when the pivot element is close to, rather than exactly equal to, zero because if the
magnitude of the pivot element is small compared to the other elements, then round-off
errors can be introduced.
Therefore, before each row is normalized, it is advantageous to determine the larg-
est available coefficient in the column below the pivot element. The rows can then be
switched so that the largest element is the pivot element. This is called partial pivoting.
If columns as well as rows are searched for the largest element and then switched, the
procedure is called complete pivoting. Complete pivoting is rarely used because switch-
ing columns changes the order of the x’s and, consequently, adds significant and usually
unjustified complexity to the computer program. The following example illustrates the
advantages of partial pivoting. Aside from avoiding division by zero, pivoting also min-
imizes round-off error. As such, it also serves as a partial remedy for ill-conditioning.
EXAMPLE 9.9 Partial Pivoting
Problem Statement. Use Gauss elimination to solve
\[ 0.0003x_1 + 3.0000x_2 = 2.0001 \]
\[ 1.0000x_1 + 1.0000x_2 = 1.0000 \]
Note that in this form the first pivot element, $a_{11} = 0.0003$, is very close to zero. Then
repeat the computation, but partial pivot by reversing the order of the equations. The
exact solution is $x_1 = 1/3$ and $x_2 = 2/3$.
Solution. Multiplying the first equation by $1/(0.0003)$ yields
\[ x_1 + 10{,}000x_2 = 6667 \]
which can be used to eliminate $x_1$ from the second equation:
\[ -9999x_2 = -6666 \]
which can be solved for
\[ x_2 = \frac{2}{3} \]
This result can be substituted back into the first equation to evaluate $x_1$:
\[ x_1 = \frac{2.0001 - 3(2/3)}{0.0003} \tag{E9.9.1} \]
However, due to subtractive cancellation, the result is very sensitive to the number of
significant figures carried in the computation:

Significant                                Absolute Value of Percent
Figures       x2           x1              Relative Error for x1
3             0.667        -3.33           1099
4             0.6667       0.0000          100
5             0.66667      0.30000         10
6             0.666667     0.330000        1
7             0.6666667    0.3330000       0.1

Note how the solution for $x_1$ is highly dependent on the number of significant figures.
This is because in Eq. (E9.9.1), we are subtracting two almost-equal numbers. On the
other hand, if the equations are solved in reverse order, the row with the larger pivot
element is normalized. The equations are
\[ 1.0000x_1 + 1.0000x_2 = 1.0000 \]
\[ 0.0003x_1 + 3.0000x_2 = 2.0001 \]
Elimination and substitution yield $x_2 = 2/3$. For different numbers of significant figures,
$x_1$ can be computed from the first equation, as in
\[ x_1 = \frac{1 - (2/3)}{1} \tag{E9.9.2} \]
This case is much less sensitive to the number of significant figures in the computation:

Significant                                Absolute Value of Percent
Figures       x2           x1              Relative Error for x1
3             0.667        0.333           0.1
4             0.6667       0.3333          0.01
5             0.66667      0.33333         0.001
6             0.666667     0.333333        0.0001
7             0.6666667    0.3333333       0.00001

Thus, a pivot strategy is much more satisfactory.
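The contrast between Eqs. (E9.9.1) and (E9.9.2) can be explored with the following Python sketch. It rounds only $x_2$ to a given number of significant figures and then evaluates both formulas in double precision, so individual entries may differ from the tables wherever the hand calculation also rounded other intermediate quantities. The helper names `round_sig`, `x1_without_pivoting`, and `x1_with_pivoting` are illustrative assumptions.

```python
def round_sig(value, figures):
    """Round value to the given number of significant figures."""
    return float(f"{value:.{figures - 1}e}")


def x1_without_pivoting(x2):
    """Eq. (E9.9.1): back substitution into the equation with the tiny pivot."""
    return (2.0001 - 3.0 * x2) / 0.0003


def x1_with_pivoting(x2):
    """Eq. (E9.9.2): back substitution into the equation with the larger pivot."""
    return (1.0 - x2) / 1.0


exact = 1.0 / 3.0
for figures in range(3, 8):
    x2 = round_sig(2.0 / 3.0, figures)
    for label, x1 in (("no pivot", x1_without_pivoting(x2)),
                      ("pivot   ", x1_with_pivoting(x2))):
        error = abs((exact - x1) / exact) * 100.0
        print(figures, label, x2, x1, f"{error:.5g}%")
```

Dividing by 0.0003 magnifies the rounding error in $x_2$ by a factor of 10,000 in the unpivoted formula, whereas the pivoted formula passes the error through essentially unchanged.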
General-purpose computer programs must include a pivot strategy. Figure 9.5
provides a simple algorithm to implement such a strategy. Notice that the algorithm
consists of two major loops. After storing the current pivot element and its row
number as the variables, big and p, the first loop compares the pivot element with
the elements below it to check whether any of these is larger than the pivot element.
If so, the new largest element and its row number are stored in big and p. Then,
the second loop switches the original pivot row with the one with the largest ele-
ment so that the latter becomes the new pivot row. This pseudocode can be inte-
grated into a program based on the other elements of Gauss elimination outlined in
Fig. 9.4. The best way to do this is to employ a modular approach and write Fig.
9.5 as a subroutine (or procedure) that would be called directly after the beginning
of the first loop in Fig. 9.4a.
Note that the second IF/THEN construct in Fig. 9.5 physically interchanges the rows.
For large matrices, this can become quite time consuming. Consequently, most codes do
not actually exchange rows but rather keep track of the pivot rows by storing the ap-
propriate subscripts in a vector. This vector then provides a basis for specifying the
proper row ordering during the forward-elimination and back-substitution operations.
Thus, the operations are said to be implemented in place.
9.4.3 Scaling
In Sec. 9.3.3, we proposed that scaling had value in standardizing the size of the deter-
minant. Beyond this application, it has utility in minimizing round-off errors for those
cases where some of the equations in a system have much larger coefficients than others.
Such situations are frequently encountered in engineering practice when widely different
units are used in the development of simultaneous equations. For instance, in electric-
circuit problems, the unknown voltages can be expressed in units ranging from microvolts
to kilovolts. Similar examples can arise in all fields of engineering. As long as each
equation is consistent, the system will be technically correct and solvable. However, the
use of widely differing units can lead to coefficients of widely differing magnitudes. This,
in turn, can have an impact on round-off error as it affects pivoting, as illustrated by the
following example.
EXAMPLE 9.10 Effect of Scaling on Pivoting and Round-Off
Problem Statement.
(a) Solve the following set of equations using Gauss elimination and a pivoting strategy:
\[ 2x_1 + 100{,}000x_2 = 100{,}000 \]
\[ x_1 + x_2 = 2 \]
(b) Repeat the solution after scaling the equations so that the maximum coefficient in
each row is 1.
(c) Finally, use the scaled coefficients to determine whether pivoting is necessary.
However, actually solve the equations with the original coefficient values. For all cases,
retain only three significant figures. Note that the correct answers are $x_1 = 1.00002$
and $x_2 = 0.99998$ or, for three significant figures, $x_1 = x_2 = 1.00$.
p = k
big = |a_{k,k}|
DOFOR ii = k+1, n
  dummy = |a_{ii,k}|
  IF (dummy > big)
    big = dummy
    p = ii
  END IF
END DO
IF (p ≠ k)
  DOFOR jj = k, n
    dummy = a_{p,jj}
    a_{p,jj} = a_{k,jj}
    a_{k,jj} = dummy
  END DO
  dummy = b_p
  b_p = b_k
  b_k = dummy
END IF
FIGURE 9.5
Pseudocode to implement partial pivoting.
Solution.
(a) Without scaling, forward elimination is applied to give
\[ 2x_1 + 100{,}000x_2 = 100{,}000 \]
\[ -50{,}000x_2 = -50{,}000 \]
which can be solved by back substitution for
\[ x_2 = 1.00 \]
\[ x_1 = 0.00 \]
Although $x_2$ is correct, $x_1$ is 100 percent in error because of round-off.
(b) Scaling transforms the original equations to
\[ 0.00002x_1 + x_2 = 1 \]
\[ x_1 + x_2 = 2 \]
Therefore, the rows should be pivoted to put the greatest value on the diagonal.
\[ x_1 + x_2 = 2 \]
\[ 0.00002x_1 + x_2 = 1 \]
Forward elimination yields
\[ x_1 + x_2 = 2 \]
\[ x_2 = 1.00 \]
which can be solved for
\[ x_1 = x_2 = 1 \]
Thus, scaling leads to the correct answer.
(c) The scaled coefficients indicate that pivoting is necessary. We therefore pivot but
retain the original coefficients to give
\[ x_1 + x_2 = 2 \]
\[ 2x_1 + 100{,}000x_2 = 100{,}000 \]
Forward elimination yields
\[ x_1 + x_2 = 2 \]
\[ 100{,}000x_2 = 100{,}000 \]
which can be solved for the correct answer: $x_1 = x_2 = 1$. Thus, scaling was useful
in determining whether pivoting was necessary, but the equations themselves did not
require scaling to arrive at a correct result.
FIGURE 9.6
Pseudocode to implement Gauss elimination with partial pivoting.
SUB Gauss (a, b, n, x, tol, er)
  DIMENSION s(n)
  er = 0
  DOFOR i = 1, n
    s_i = ABS(a_{i,1})
    DOFOR j = 2, n
      IF ABS(a_{i,j}) > s_i THEN s_i = ABS(a_{i,j})
    END DO
  END DO
  CALL Eliminate(a, s, n, b, tol, er)
  IF er ≠ -1 THEN
    CALL Substitute(a, n, b, x)
  END IF
END Gauss
SUB Eliminate (a, s, n, b, tol, er)
  DOFOR k = 1, n - 1
    CALL Pivot (a, b, s, n, k)
    IF ABS(a_{k,k}/s_k) < tol THEN
      er = -1
      EXIT DO
    END IF
    DOFOR i = k + 1, n
      factor = a_{i,k}/a_{k,k}
      DOFOR j = k + 1, n
        a_{i,j} = a_{i,j} - factor * a_{k,j}
      END DO
      b_i = b_i - factor * b_k
    END DO
  END DO
  IF ABS(a_{n,n}/s_n) < tol THEN er = -1
END Eliminate
SUB Pivot (a, b, s, n, k)
  p = k
  big = ABS(a_{k,k}/s_k)
  DOFOR ii = k + 1, n
    dummy = ABS(a_{ii,k}/s_{ii})
    IF dummy > big THEN
      big = dummy
      p = ii
    END IF
  END DO
  IF p ≠ k THEN
    DOFOR jj = k, n
      dummy = a_{p,jj}
      a_{p,jj} = a_{k,jj}
      a_{k,jj} = dummy
    END DO
    dummy = b_p
    b_p = b_k
    b_k = dummy
    dummy = s_p
    s_p = s_k
    s_k = dummy
  END IF
END Pivot
SUB Substitute (a, n, b, x)
  x_n = b_n/a_{n,n}
  DOFOR i = n - 1, 1, -1
    sum = 0
    DOFOR j = i + 1, n
      sum = sum + a_{i,j} * x_j
    END DO
    x_i = (b_i - sum) / a_{i,i}
  END DO
END Substitute
As in the previous example, scaling has utility in minimizing round-off. However, it
should be noted that scaling itself also leads to round-off. For example, given the equation
\[ 2x_1 + 300{,}000x_2 = 1 \]
and using three significant figures, scaling leads to
\[ 0.00000667x_1 + x_2 = 0.00000333 \]
Thus, scaling introduces a round-off error to the first coefficient and the right-hand-side
constant. For this reason, it is sometimes suggested that scaling should be employed only
as in part (c) of the preceding example. That is, it is used to calculate scaled values for
the coefficients solely as a criterion for pivoting, but the original coefficient values are
retained for the actual elimination and substitution computations. This involves a trade-
off if the determinant is being calculated as part of the program. That is, the resulting
determinant will be unscaled. However, because many applications of Gauss elimination
do not require determinant evaluation, it is the most common approach and will be used
in the algorithm in the next section.
9.4.4 Computer Algorithm for Gauss Elimination
The algorithms from Figs. 9.4 and 9.5 can now be combined into a larger algorithm to
implement the entire Gauss elimination algorithm. Figure 9.6 shows an algorithm for a
general subroutine to implement Gauss elimination.
Note that the program includes modules for the three primary operations of the
Gauss elimination algorithm: forward elimination, back substitution, and pivoting. In
addition, there are several aspects of the code that differ and represent improvements
over the pseudocodes from Figs. 9.4 and 9.5. These are:
• The equations are not scaled, but scaled values of the elements are used to determine
  whether pivoting is to be implemented.
• The diagonal term is monitored during the pivoting phase to detect near-zero occurrences
  in order to flag singular systems. If it passes back a value of er = −1, a singular
  matrix has been detected and the computation should be terminated. A parameter tol
  is set by the user to a small number in order to detect near-zero occurrences.
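For readers working in a modern scripting language, the algorithm can be sketched in Python as follows. This is an illustrative translation under several assumptions rather than a line-by-line copy of Fig. 9.6: the name `gauss`, the zero-based indexing, and the use of a `ValueError` in place of the error flag er are choices made here, and every row is assumed to contain at least one nonzero coefficient so that the scale factors are nonzero.

```python
def gauss(a, b, tol=1e-12):
    """Gauss elimination with scaled partial pivoting; returns the solution list."""
    n = len(b)
    a = [row[:] for row in a]                      # work on copies
    b = b[:]
    s = [max(abs(v) for v in row) for row in a]    # largest coefficient in each row
    for k in range(n - 1):
        # Pivot: choose the row whose scaled entry in column k is largest
        p = max(range(k, n), key=lambda r: abs(a[r][k] / s[r]))
        if p != k:
            a[k], a[p] = a[p], a[k]
            b[k], b[p] = b[p], b[k]
            s[k], s[p] = s[p], s[k]
        if abs(a[k][k] / s[k]) < tol:
            raise ValueError("singular or near-singular system")
        # Eliminate with the original (unscaled) coefficients
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            for j in range(k + 1, n):
                a[i][j] -= factor * a[k][j]
            b[i] -= factor * b[k]
    if abs(a[n - 1][n - 1] / s[n - 1]) < tol:
        raise ValueError("singular or near-singular system")
    # Back substitution
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        total = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - total) / a[i][i]
    return x


# The system of Example 9.10, solved with the original coefficients
x_scaled_pivot = gauss([[2.0, 100000.0], [1.0, 1.0]], [100000.0, 2.0])
print(x_scaled_pivot)
```

As in the pseudocode, the scale factors are used only to choose the pivot row, and they are swapped along with the rows so that they stay attached to the correct equations.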
EXAMPLE 9.11 Solution of Linear Algebraic Equations Using the Computer
Problem Statement. A computer program to solve linear algebraic equations such
as one based on Fig. 9.6 can be used to solve a problem associated with the falling
parachutist example discussed in Chap. 1. Suppose that a team of three parachutists
is connected by a weightless cord while free-falling at a velocity of 5 m/s (Fig. 9.7).
Calculate the tension in each section of cord and the acceleration of the team, given
the following:

Parachutist   Mass, kg   Drag Coefficient, kg/s
1             70         10
2             60         14
3             40         17

FIGURE 9.7
Three parachutists free-falling while connected by weightless cords.
FIGURE 9.8
Free-body diagrams for each of the three falling parachutists.
Solution. Free-body diagrams for each of the parachutists are depicted in Fig. 9.8.
Summing the forces in the vertical direction and using Newton’s second law gives a set
of three simultaneous linear equations:
\[ m_1 g - T - c_1 v = m_1 a \]
\[ m_2 g + T - c_2 v - R = m_2 a \]
\[ m_3 g - c_3 v + R = m_3 a \]
These equations have three unknowns: $a$, $T$, and $R$. After substituting the known values,
the equations can be expressed in matrix form as ($g = 9.81$ m/s$^2$),
\[ \begin{bmatrix} 70 & 1 & 0 \\ 60 & -1 & 1 \\ 40 & 0 & -1 \end{bmatrix} \begin{Bmatrix} a \\ T \\ R \end{Bmatrix} = \begin{Bmatrix} 636.7 \\ 518.6 \\ 307.4 \end{Bmatrix} \]
This system can be solved using your own software. The result is $a = 8.6041$ m/s$^2$;
$T = 34.4118$ N; and $R = 36.7647$ N.
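For this particular problem the result can be cross-checked without a general solver. Adding the three force balances cancels the internal cord tensions $T$ and $R$, which leaves a single equation for the acceleration. The tensions then follow from the first and third balances. The Python sketch below takes this shortcut; the variable names are illustrative assumptions, and the approach is a check on the matrix solution rather than a replacement for it.

```python
g = 9.81                         # gravitational acceleration, m/s^2
v = 5.0                          # velocity of the team, m/s
masses = [70.0, 60.0, 40.0]      # kg, parachutists 1 to 3
drags = [10.0, 14.0, 17.0]       # kg/s, parachutists 1 to 3

# Right-hand sides m*g - c*v of the three force balances
rhs = [m * g - c * v for m, c in zip(masses, drags)]

# Summing the three equations cancels T and R: (m1 + m2 + m3) a = sum of rhs
a = sum(rhs) / sum(masses)

# First balance:  m1 a + T = rhs[0];  third balance:  m3 a - R = rhs[2]
T = rhs[0] - masses[0] * a
R = masses[2] * a - rhs[2]

print(f"a = {a:.4f} m/s^2, T = {T:.4f} N, R = {R:.4f} N")
```

The second force balance, which was not used to compute $T$ and $R$, provides an independent check: $60a - T + R$ should reproduce 518.6.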
9.5 COMPLEX SYSTEMS
In some problems, it is possible to obtain a complex system of equations
\[ [C]\{Z\} = \{W\} \tag{9.27} \]
where
\[ [C] = [A] + i[B] \]
\[ \{Z\} = \{X\} + i\{Y\} \]
\[ \{W\} = \{U\} + i\{V\} \tag{9.28} \]
where $i = \sqrt{-1}$.
The most straightforward way to solve such a system is to employ one of the algo-
rithms described in this part of the book, but replace all real operations with complex
ones. Of course, this is only possible for those languages, such as Fortran, that allow
complex variables.
For languages that do not permit the declaration of complex variables, it is possible
to write a code to convert real to complex operations. However, this is not a trivial task.
An alternative is to convert the complex system into an equivalent one dealing with real
variables. This can be done by substituting Eq. (9.28) into Eq. (9.27) and equating real
and imaginary parts of the resulting equation to yield
\[ [A]\{X\} - [B]\{Y\} = \{U\} \tag{9.29} \]
and
\[ [B]\{X\} + [A]\{Y\} = \{V\} \tag{9.30} \]
Thus, the system of n complex equations is converted to a set of 2n real ones. This
means that storage and execution time will be increased significantly. Consequently, a
trade-off exists regarding this option. If you evaluate complex systems infrequently, it is
preferable to use Eqs. (9.29) and (9.30) because of their convenience. However, if you
use them often and desire to employ a language that does not allow complex data types,
it may be worth the up-front programming effort to write a customized equation solver
that converts real to complex operations.
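Both options can be compared in a language with a built-in complex type. The Python sketch below solves a small complex system directly and then again through the $2n$ real equations of Eqs. (9.29) and (9.30). The solver `solve`, the helper `solve_complex_as_real`, and the 2 by 2 test system (constructed so that its solution is $1 + i$ and $2 - i$) are illustrative assumptions.

```python
def solve(a, b):
    """Gauss elimination with partial pivoting; entries may be real or complex."""
    n = len(b)
    a = [row[:] for row in a]
    b = b[:]
    for k in range(n - 1):
        p = max(range(k, n), key=lambda r: abs(a[r][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
            b[i] -= factor * b[k]
    x = [0] * n
    for i in range(n - 1, -1, -1):
        total = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - total) / a[i][i]
    return x


def solve_complex_as_real(c, w):
    """Solve [C]{Z} = {W} through the 2n real equations (9.29) and (9.30)."""
    n = len(w)
    # Rows of [A  -B] (Eq. 9.29) followed by rows of [B  A] (Eq. 9.30)
    top = [[c[i][j].real for j in range(n)] + [-c[i][j].imag for j in range(n)]
           for i in range(n)]
    bottom = [[c[i][j].imag for j in range(n)] + [c[i][j].real for j in range(n)]
              for i in range(n)]
    rhs = [value.real for value in w] + [value.imag for value in w]   # {U} then {V}
    xy = solve(top + bottom, rhs)                                     # {X} then {Y}
    return [complex(xy[i], xy[n + i]) for i in range(n)]


C = [[3 + 2j, 4 + 0j], [0 - 1j, 1 + 0j]]
W = [9 + 1j, 3 - 2j]
z_direct = solve(C, W)                # complex arithmetic throughout
z_real = solve_complex_as_real(C, W)  # real arithmetic on a system twice the size
print(z_direct, z_real)
```

The two results should agree to round-off. The real formulation works on a matrix with four times as many entries, which illustrates the storage and execution-time penalty noted above.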
9.6 NONLINEAR SYSTEMS OF EQUATIONS
Recall that at the end of Chap. 6 we presented an approach to solve two nonlinear equa-
tions with two unknowns. This approach can be extended to the general case of solving
n simultaneous nonlinear equations.
\[ \begin{aligned} f_1(x_1, x_2, \ldots, x_n) &= 0 \\ f_2(x_1, x_2, \ldots, x_n) &= 0 \\ &\;\;\vdots \\ f_n(x_1, x_2, \ldots, x_n) &= 0 \end{aligned} \tag{9.31} \]
The solution of this system consists of the set of x values that simultaneously result in
all the equations equaling zero.
As described in Sec. 6.5.2, one approach to solving such systems is based on a
multidimensional version of the Newton-Raphson method. Thus, a Taylor series expan-
sion is written for each equation. For example, for the kth equation,
$$f_{k,i+1} = f_{k,i} + (x_{1,i+1} - x_{1,i})\frac{\partial f_{k,i}}{\partial x_1} + (x_{2,i+1} - x_{2,i})\frac{\partial f_{k,i}}{\partial x_2} + \cdots + (x_{n,i+1} - x_{n,i})\frac{\partial f_{k,i}}{\partial x_n} \tag{9.32}$$
where the first subscript, k, represents the equation or unknown and the second subscript
denotes whether the value or function in question is at the present value (i) or at the next
value (i + 1).
Equations of the form of (9.32) are written for each of the original nonlinear equations.
Then, as was done in deriving Eq. (6.20) from (6.19), all $f_{k,i+1}$ terms are set to
zero as would be the case at the root, and Eq. (9.32) can be written as
$$-f_{k,i} + x_{1,i}\frac{\partial f_{k,i}}{\partial x_1} + x_{2,i}\frac{\partial f_{k,i}}{\partial x_2} + \cdots + x_{n,i}\frac{\partial f_{k,i}}{\partial x_n} = x_{1,i+1}\frac{\partial f_{k,i}}{\partial x_1} + x_{2,i+1}\frac{\partial f_{k,i}}{\partial x_2} + \cdots + x_{n,i+1}\frac{\partial f_{k,i}}{\partial x_n} \tag{9.33}$$
Notice that the only unknowns in Eq. (9.33) are the $x_{k,i+1}$ terms on the right-hand side.
All other quantities are located at the present value (i) and, thus, are known at any
iteration. Consequently, the set of equations generally represented by Eq. (9.33) (that is,
with k = 1, 2, . . . , n) constitutes a set of linear simultaneous equations that can be
solved by methods elaborated in this part of the book.
Matrix notation can be employed to express Eq. (9.33) concisely. The partial
derivatives can be expressed as
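$$
[Z] =
\begin{bmatrix}
\dfrac{\partial f_{1,i}}{\partial x_1} & \dfrac{\partial f_{1,i}}{\partial x_2} & \cdots & \dfrac{\partial f_{1,i}}{\partial x_n} \\
\dfrac{\partial f_{2,i}}{\partial x_1} & \dfrac{\partial f_{2,i}}{\partial x_2} & \cdots & \dfrac{\partial f_{2,i}}{\partial x_n} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial f_{n,i}}{\partial x_1} & \dfrac{\partial f_{n,i}}{\partial x_2} & \cdots & \dfrac{\partial f_{n,i}}{\partial x_n}
\end{bmatrix}
\tag{9.34}
$$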
The initial and final values can be expressed in vector form as
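$$\{X_i\}^T = \lfloor x_{1,i} \;\; x_{2,i} \;\; \cdots \;\; x_{n,i} \rfloor$$
and
$$\{X_{i+1}\}^T = \lfloor x_{1,i+1} \;\; x_{2,i+1} \;\; \cdots \;\; x_{n,i+1} \rfloor$$
Finally, the function values at i can be expressed as
$$\{F_i\}^T = \lfloor f_{1,i} \;\; f_{2,i} \;\; \cdots \;\; f_{n,i} \rfloor$$
Using these relationships, Eq. (9.33) can be represented concisely as
$$[Z]\{X_{i+1}\} = -\{F_i\} + [Z]\{X_i\} \tag{9.35}$$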
Equation (9.35) can be solved using a technique such as Gauss elimination. This process
can be repeated iteratively to obtain refined estimates in a fashion similar to the two-
equation case in Sec. 6.5.2.
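The iteration built on Eq. (9.35) can be sketched as follows. This is a minimal illustration under stated assumptions: the names `gauss_solve`, `newton_raphson_system`, `f`, and `jac` are hypothetical, the stopping test on the largest change in x is one of several reasonable choices, and the two-equation test system is simply an illustrative one whose root (2, 3) can be confirmed by substitution. At each pass the code evaluates {F} and [Z] at the present estimate, forms the right-hand side of Eq. (9.35), and solves the resulting linear system by Gauss elimination.

```python
def gauss_solve(a, b):
    """Solve the linear system a x = b by Gauss elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(m[r][k]))
        if m[p][k] == 0:
            raise ValueError("matrix is singular")
        m[k], m[p] = m[p], m[k]
        for i in range(k + 1, n):
            factor = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= factor * m[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def newton_raphson_system(f, jac, x0, tol=1e-10, max_iter=50):
    """Iterate Eq. (9.35): [Z]{X_new} = -{F} + [Z]{X}."""
    x = list(x0)
    n = len(x)
    for _ in range(max_iter):
        fx = f(x)    # {F_i}
        z = jac(x)   # [Z], Eq. (9.34)
        rhs = [-fx[k] + sum(z[k][j] * x[j] for j in range(n)) for k in range(n)]
        x_new = gauss_solve(z, rhs)
        if max(abs(x_new[j] - x[j]) for j in range(n)) < tol:
            return x_new
        x = x_new
    raise RuntimeError("Newton-Raphson did not converge")


# Illustrative two-equation system with a root at x = 2, y = 3.
def f(x):
    return [x[0] ** 2 + x[0] * x[1] - 10.0,
            x[1] + 3.0 * x[0] * x[1] ** 2 - 57.0]


def jac(x):
    return [[2.0 * x[0] + x[1], x[0]],
            [3.0 * x[1] ** 2, 1.0 + 6.0 * x[0] * x[1]]]


root = newton_raphson_system(f, jac, [1.5, 3.5])
```

Solving for the full vector of new estimates, as Eq. (9.35) is written, is algebraically the same as solving [Z]{dX} = -{F} for the correction and adding it to the present estimate; the latter form is often preferred in practice.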
It should be noted that there are two major shortcomings to the foregoing approach.
First, Eq. (9.34) is often inconvenient to evaluate. Therefore, variations of the Newton-
Raphson approach have been developed to circumvent this dilemma. As might be ex-
pected, most are based on using finite-difference approximations for the partial derivatives
that comprise [Z].
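A finite-difference approximation of [Z] might look like the sketch below. It is one simple variant (a forward difference with a relative step), not a specific published method; the name `fd_jacobian`, the step-size rule, and the test functions are assumptions for illustration. Each column of [Z] is estimated by perturbing one unknown and differencing the function values.

```python
def fd_jacobian(f, x, rel_step=1e-6):
    """Approximate z[k][j] = d f_k / d x_j with forward differences."""
    fx = f(x)
    n = len(x)
    z = [[0.0] * n for _ in range(len(fx))]
    for j in range(n):
        h = rel_step * max(abs(x[j]), 1.0)  # perturbation for unknown j
        xp = list(x)
        xp[j] += h
        fp = f(xp)
        for k in range(len(fx)):
            z[k][j] = (fp[k] - fx[k]) / h
    return z


# Illustrative functions with a known analytical Jacobian for comparison.
def f(x):
    return [x[0] ** 2 + x[0] * x[1] - 10.0,
            x[1] + 3.0 * x[0] * x[1] ** 2 - 57.0]


z_approx = fd_jacobian(f, [1.5, 3.5])
z_exact = [[6.5, 1.5], [36.75, 32.5]]  # analytical partials at (1.5, 3.5)
```

The approximation costs n extra function evaluations per iteration, which is the price paid for not deriving the partial derivatives by hand.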
The second shortcoming of the multiequation Newton-Raphson method is that excel-
lent initial guesses are usually required to ensure convergence. Because these are often
difficult to obtain, alternative approaches that are slower than Newton-Raphson but which
have better convergence behavior have been developed. One common approach is to
reformulate the nonlinear system as a single function
$$F(x) = \sum_{i=1}^{n} \left[ f_i(x_1, x_2, \ldots, x_n) \right]^2 \tag{9.36}$$
where $f_i(x_1, x_2, \ldots, x_n)$ is the ith member of the original system of Eq. (9.31). The
values of x that minimize this function also represent the solution of the nonlinear system.
As we will see in Chap. 17, this reformulation belongs to a class of problems called
nonlinear regression. As such, it can be approached with a number of optimization tech-
niques such as the ones described later in this text (Part Four and specifically Chap. 14).
9.7 GAUSS-JORDAN
The Gauss-Jordan method is a variation of Gauss elimination. The major difference is that
when an unknown is eliminated in the Gauss-Jordan method, it is eliminated from all
other equations rather than just the subsequent ones. In addition, all rows are normalized
by dividing them by their pivot elements. Thus, the elimination step results in an identity
matrix rather than a triangular matrix (Fig. 9.9). Consequently, it is not necessary to em-
ploy back substitution to obtain the solution. The method is best illustrated by an example.
EXAMPLE 9.12 Gauss-Jordan Method
Problem Statement. Use the Gauss-Jordan technique to solve the same system as in
Example 9.5:
$$
\begin{aligned}
3x_1 - 0.1x_2 - 0.2x_3 &= 7.85 \\
0.1x_1 + 7x_2 - 0.3x_3 &= -19.3 \\
0.3x_1 - 0.2x_2 + 10x_3 &= 71.4
\end{aligned}
$$
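$$
\left[\begin{array}{ccc|c}
a_{11} & a_{12} & a_{13} & b_1 \\
a_{21} & a_{22} & a_{23} & b_2 \\
a_{31} & a_{32} & a_{33} & b_3
\end{array}\right]
\;\longrightarrow\;
\left[\begin{array}{ccc|c}
1 & 0 & 0 & b_1^{(n)} \\
0 & 1 & 0 & b_2^{(n)} \\
0 & 0 & 1 & b_3^{(n)}
\end{array}\right]
\;\longrightarrow\;
\begin{array}{l}
x_1 = b_1^{(n)} \\
x_2 = b_2^{(n)} \\
x_3 = b_3^{(n)}
\end{array}
$$
FIGURE 9.9
Graphical depiction of the Gauss-Jordan method. Compare with Fig. 9.3 to elucidate the
differences between this technique and Gauss elimination. The superscript (n) means that
the elements of the right-hand-side vector have been modified n times (for this case, n = 3).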
Solution. First, express the coefficients and the right-hand side as an augmented matrix:
$$
\left[\begin{array}{rrr|r}
3 & -0.1 & -0.2 & 7.85 \\
0.1 & 7 & -0.3 & -19.3 \\
0.3 & -0.2 & 10 & 71.4
\end{array}\right]
$$
Then normalize the first row by dividing it by the pivot element, 3, to yield
$$
\left[\begin{array}{rrr|r}
1 & -0.0333333 & -0.066667 & 2.61667 \\
0.1 & 7 & -0.3 & -19.3 \\
0.3 & -0.2 & 10 & 71.4
\end{array}\right]
$$
The x1 term can be eliminated from the second row by subtracting 0.1 times the first row
from the second row. Similarly, subtracting 0.3 times the first row from the third row will
eliminate the x1 term from the third row:
$$
\left[\begin{array}{rrr|r}
1 & -0.0333333 & -0.066667 & 2.61667 \\
0 & 7.00333 & -0.293333 & -19.5617 \\
0 & -0.190000 & 10.0200 & 70.6150
\end{array}\right]
$$
Next, normalize the second row by dividing it by 7.00333:
$$
\left[\begin{array}{rrr|r}
1 & -0.0333333 & -0.066667 & 2.61667 \\
0 & 1 & -0.0418848 & -2.79320 \\
0 & -0.190000 & 10.0200 & 70.6150
\end{array}\right]
$$
Reduction of the x2 terms from the first and third equations gives
$$
\left[\begin{array}{rrr|r}
1 & 0 & -0.0680629 & 2.52356 \\
0 & 1 & -0.0418848 & -2.79320 \\
0 & 0 & 10.01200 & 70.0843
\end{array}\right]
$$
The third row is then normalized by dividing it by 10.0120:
$$
\left[\begin{array}{rrr|r}
1 & 0 & -0.0680629 & 2.52356 \\
0 & 1 & -0.0418848 & -2.79320 \\
0 & 0 & 1 & 7.0000
\end{array}\right]
$$
Finally, the x3 terms can be reduced from the first and the second equations to give
$$
\left[\begin{array}{rrr|r}
1 & 0 & 0 & 3.0000 \\
0 & 1 & 0 & -2.5000 \\
0 & 0 & 1 & 7.0000
\end{array}\right]
$$
Thus, as depicted in Fig. 9.9, the coefficient matrix has been transformed to the identity
matrix, and the solution is obtained in the right-hand-side vector. Notice that no back
substitution was required to obtain the solution.
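The steps of Example 9.12 can be sketched in code as follows. This is a naive version without pivoting, written only to mirror the hand calculation; the function name `gauss_jordan` is an assumption for this sketch. Each pivot row is normalized, and the pivot column is then eliminated from every other row, above as well as below.

```python
def gauss_jordan(a, b):
    """Naive Gauss-Jordan (no pivoting): reduce [a | b] to [I | x] and return x."""
    n = len(b)
    m = [[float(v) for v in row] + [float(bi)] for row, bi in zip(a, b)]
    for k in range(n):
        pivot = m[k][k]
        if pivot == 0.0:
            raise ZeroDivisionError("zero pivot; pivoting would be required")
        m[k] = [v / pivot for v in m[k]]          # normalize the pivot row
        for i in range(n):
            if i != k:                            # eliminate from ALL other rows
                factor = m[i][k]
                m[i] = [vi - factor * vk for vi, vk in zip(m[i], m[k])]
    return [row[n] for row in m]                  # last column holds the solution


a = [[3.0, -0.1, -0.2],
     [0.1, 7.0, -0.3],
     [0.3, -0.2, 10.0]]
b = [7.85, -19.3, 71.4]
x = gauss_jordan(a, b)   # approximately [3, -2.5, 7]
```

Because the working array ends as an identity matrix augmented with the solution, no back-substitution loop appears anywhere in the function.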
All the material in this chapter regarding the pitfalls and improvements in Gauss
elimination also applies to the Gauss-Jordan method. For example, a similar pivoting
strategy can be used to avoid division by zero and to reduce round-off error.
Although the Gauss-Jordan technique and Gauss elimination might appear almost
identical, the former requires more work. Using a similar approach to Sec. 9.2.1, it can
be determined that the number of flops involved in naive Gauss-Jordan is
$$n^3 + n^2 - n \;\xrightarrow{\;\text{as } n \text{ increases}\;}\; n^3 + O(n^2) \tag{9.37}$$
Thus, Gauss-Jordan involves approximately 50 percent more operations than Gauss elim-
ination [compare with Eq. (9.23)]. Therefore, Gauss elimination is the simple elimination
method of preference for obtaining solutions of linear algebraic equations. One of the
primary reasons that we have introduced the Gauss-Jordan, however, is that it is still used
in engineering as well as in some numerical algorithms.
9.8 SUMMARY
In summary, we have devoted most of this chapter to Gauss elimination, the most fun-
damental method for solving simultaneous linear algebraic equations. Although it is one
of the earliest techniques developed for this purpose, it is nevertheless an extremely
effective algorithm for obtaining solutions for many engineering problems. Aside from
this practical utility, this chapter also provided a context for our discussion of general
issues such as round-off, scaling, and conditioning. In addition, we briefly presented
material on the Gauss-Jordan method, as well as complex and nonlinear systems.
Answers obtained using Gauss elimination may be checked by substituting them into
the original equations. However, this does not always represent a reliable check for ill-
conditioned systems. Therefore, some measure of condition, such as the determinant of
the scaled system, should be computed if round-off error is suspected. Using partial
pivoting and more significant figures in the computation are two options for mitigating
round-off error. In the next chapter, we will return to the topic of system condition when
we discuss the matrix inverse.
PROBLEMS
9.1
(a) Write the following set of equations in matrix form:
$$
\begin{aligned}
8 &= 6x_3 + 2x_2 \\
2 - x_1 &= x_3 \\
5x_2 + 8x_1 &= 13
\end{aligned}
$$
(b) Multiply the matrix of coefficients by its transpose; i.e., $[A][A]^T$.
9.2 A number of matrices are defined as
$$
[A] = \begin{bmatrix} 4 & 7 \\ 1 & 2 \\ 5 & 6 \end{bmatrix} \quad
[B] = \begin{bmatrix} 4 & 3 & 7 \\ 1 & 2 & 7 \\ 2 & 0 & 4 \end{bmatrix} \quad
\{C\} = \begin{Bmatrix} 3 \\ 6 \\ 1 \end{Bmatrix} \quad
[D] = \begin{bmatrix} 9 & 4 & 3 & -6 \\ 2 & -1 & 7 & 5 \end{bmatrix}
$$
$$
[E] = \begin{bmatrix} 1 & 5 & 8 \\ 7 & 2 & 3 \\ 4 & 0 & 6 \end{bmatrix} \quad
[F] = \begin{bmatrix} 3 & 0 & 1 \\ 1 & 7 & 3 \end{bmatrix} \quad
\lfloor G \rfloor = \lfloor 7 \;\; 6 \;\; 4 \rfloor
$$
Answer the following questions regarding these matrices:
(a) What are the dimensions of the matrices?
(b) Identify the square, column, and row matrices.
(c) What are the values of the elements: a12, b23, d32, e22, f12, g12?
(d) Perform the following operations:
(1) [E] + [B]        (7) [B] × [A]
(2) [A] × [F]        (8) $[D]^T$
(3) [B] − [E]        (9) [A] × {C}
(4) 7 × [B]          (10) [I] × [B]
(5) [E] × [B]        (11) $[E]^T[E]$
(6) $\{C\}^T$          (12) $\{C\}^T\{C\}$
9.3 Three matrices are defined as
$$
[A] = \begin{bmatrix} 1 & 6 \\ 3 & 10 \\ 7 & 4 \end{bmatrix} \quad
[B] = \begin{bmatrix} 1 & 3 \\ 0.5 & 2 \end{bmatrix} \quad
[C] = \begin{bmatrix} 2 & -2 \\ -3 & 1 \end{bmatrix}
$$
(a) Perform all possible multiplications that can be computed between pairs of these matrices.
(b) Use the method in Box PT3.2 to justify why the remaining pairs cannot be multiplied.
(c) Use the results of (a) to illustrate why the order of multiplication is important.
9.4 Use the graphical method to solve
$$
\begin{aligned}
4x_1 - 8x_2 &= -24 \\
-x_1 + 6x_2 &= 34
\end{aligned}
$$
Check your results by substituting them back into the equations.
9.5 Given the system of equations
$$
\begin{aligned}
-1.1x_1 + 10x_2 &= 120 \\
-2x_1 + 17.4x_2 &= 174
\end{aligned}
$$
(a) Solve graphically and check your results by substituting them back into the equations.
(b) On the basis of the graphical solution, what do you expect regarding the condition of the system?
(c) Compute the determinant.
(d) Solve by the elimination of unknowns.
9.6 For the set of equations
$$
\begin{aligned}
2x_2 + 5x_3 &= 9 \\
2x_1 + x_2 + x_3 &= 9 \\
3x_1 + x_2 &= 10
\end{aligned}
$$
(a) Compute the determinant.
(b) Use Cramer's rule to solve for the x's.
(c) Substitute your results back into the original equation to check your results.
9.7 Given the equations
$$
\begin{aligned}
0.5x_1 - x_2 &= -9.5 \\
1.02x_1 - 2x_2 &= -18.8
\end{aligned}
$$
(a) Solve graphically.
(b) Compute the determinant.
(c) On the basis of (a) and (b), what would you expect regarding the system's condition?
(d) Solve by the elimination of unknowns.
(e) Solve again, but with a11 modified slightly to 0.52. Interpret your results.
9.8 Given the equations
$$
\begin{aligned}
10x_1 + 2x_2 - x_3 &= 27 \\
-3x_1 - 6x_2 + 2x_3 &= -61.5 \\
x_1 + x_2 + 5x_3 &= -21.5
\end{aligned}
$$
(a) Solve by naive Gauss elimination. Show all steps of the computation.
(b) Substitute your results into the original equations to check your answers.
9.9 Use Gauss elimination to solve:
$$
\begin{aligned}
8x_1 + 2x_2 - 2x_3 &= -2 \\
10x_1 + 2x_2 + 4x_3 &= 4 \\
12x_1 + 2x_2 + 2x_3 &= 6
\end{aligned}
$$
Employ partial pivoting and check your answers by substituting them into the original equations.
9.10 Given the system of equations
$$
\begin{aligned}
-3x_2 + 7x_3 &= 2 \\
x_1 + 2x_2 - x_3 &= 3 \\
5x_1 - 2x_2 &= 2
\end{aligned}
$$
(a) Compute the determinant.
(b) Use Cramer's rule to solve for the x's.
(c) Use Gauss elimination with partial pivoting to solve for the x's.
(d) Substitute your results back into the original equations to check your solution.
9.11 Given the equations
$$
\begin{aligned}
2x_1 - 6x_2 - x_3 &= -38 \\
-3x_1 - x_2 + 7x_3 &= -34 \\
-8x_1 + x_2 - 2x_3 &= -20
\end{aligned}
$$
(a) Solve by Gauss elimination with partial pivoting. Show all steps of the computation.
(b) Substitute your results into the original equations to check your answers.
9.12 Use Gauss-Jordan elimination to solve:
$$
\begin{aligned}
2x_1 + x_2 - x_3 &= 1 \\
5x_1 + 2x_2 + 2x_3 &= -4 \\
3x_1 + x_2 + x_3 &= 5
\end{aligned}
$$
Do not employ pivoting. Check your answers by substituting them into the original equations.
9.13 Solve:
$$
\begin{aligned}
x_1 + x_2 - x_3 &= -3 \\
6x_1 + 2x_2 + 2x_3 &= 2 \\
-3x_1 + 4x_2 + x_3 &= 1
\end{aligned}
$$
with (a) naive Gauss elimination, (b) Gauss elimination with partial pivoting, and (c) Gauss-Jordan without partial pivoting.
9.14 Perform the same computation as in Example 9.11, but use five parachutists with the following characteristics:
Parachutist    Mass, kg    Drag Coefficient, kg/s
1              55          10
2              75          12
3              60          15
4              75          16
5              90          10
The parachutists have a velocity of 9 m/s.
9.15 Solve
$$
\begin{bmatrix} 3 + 2i & 4 \\ -i & 1 \end{bmatrix}
\begin{Bmatrix} z_1 \\ z_2 \end{Bmatrix} =
\begin{Bmatrix} 2 + i \\ 3 \end{Bmatrix}
$$
9.16 Develop, debug, and test a program in either a high-level language or macro language of your choice to multiply two matrices, that is, [X] = [Y][Z], where [Y] is m by n and [Z] is n by p. Test the program using the matrices from Prob. 9.3.
9.17 Develop, debug, and test a program in either a high-level language or macro language of your choice to generate the transpose of a matrix. Test it on the matrices from Prob. 9.3.
9.18 Develop, debug, and test a program in either a high-level language or macro language of your choice to solve a system of equations with Gauss elimination with partial pivoting. Base the program on the pseudocode from Fig. 9.6. Test the program using the following system (which has an answer of x1 = x2 = x3 = 1),
$$
\begin{aligned}
x_1 + 2x_2 - x_3 &= 2 \\
5x_1 + 2x_2 + 2x_3 &= 9 \\
-3x_1 + 5x_2 - x_3 &= 1
\end{aligned}
$$
9.19 Three masses are suspended vertically by a series of identical springs where mass 1 is at the top and mass 3 is at the bottom. If g = 9.81 m/s^2, m1 = 2 kg, m2 = 3 kg, m3 = 2.5 kg, and the k's = 10 kg/s^2, solve for the displacements x.
9.20 Develop, debug, and test a program in either a high-level language or macro language of your choice to solve a system of n simultaneous nonlinear equations based on Sec. 9.6. Test the program by solving Prob. 7.12.
9.21 Recall from Sec. 8.2 that the chemistry of water exposed to atmospheric CO2 can be determined by solving five nonlinear equations (Eqs. 8.6 through 8.10) for five unknowns: $c_T$, $[\mathrm{HCO_3^-}]$, $[\mathrm{CO_3^{2-}}]$, $[\mathrm{H^+}]$, and $[\mathrm{OH^-}]$. Employing the parameters from Sec. 8.2 and the program developed in Prob. 9.20, solve this system for conditions in 1958 when the partial pressure of CO2 was 315 ppm. Use your results to compute the pH.
CHAPTER 10
LU Decomposition and Matrix Inversion
This chapter deals with a class of elimination methods called LU decomposition tech-
niques. The primary appeal of LU decomposition is that the time-consuming elimination
step can be formulated so that it involves only operations on the matrix of coefficients,
[A]. Thus, it is well suited for those situations where many right-hand-side vectors {B}
must be evaluated for a single value of [A]. Although there are a variety of ways in which
this is done, we will focus on showing how the Gauss elimination method can be imple-
mented as an LU decomposition.
One motive for introducing LU decomposition is that it provides an efficient means
to compute the matrix inverse. The inverse has a number of valuable applications in
engineering practice. It also provides a means for evaluating system condition.
10.1 LU DECOMPOSITION
As described in Chap. 9, Gauss elimination is designed to solve systems of linear alge-
braic equations,
$$[A]\{X\} = \{B\} \tag{10.1}$$
Although it certainly represents a sound way to solve such systems, it becomes inefficient
when solving equations with the same coefficients [A], but with different right-hand-side
constants (the b’s).
Recall that Gauss elimination involves two steps: forward elimination and back-
substitution (Fig. 9.3). Of these, the forward-elimination step comprises the bulk of the
computational effort (recall Table 9.1). This is particularly true for large systems of
equations.
LU decomposition methods separate the time-consuming elimination of the matrix
[A] from the manipulations of the right-hand side {B}. Thus, once [A] has been “decom-
posed,” multiple right-hand-side vectors can be evaluated in an efficient manner.
Interestingly, Gauss elimination itself can be expressed as an LU decomposition.
Before showing how this can be done, let us first provide a mathematical overview of
the decomposition strategy.
10.1.1 Overview of LU Decomposition
Just as was the case with Gauss elimination, LU decomposition requires pivoting to avoid
division by zero. However, to simplify the following description, we will defer the issue
of pivoting until after the fundamental approach is elaborated. In addition, the following
explanation is limited to a set of three simultaneous equations. The results can be directly
extended to n-dimensional systems.
Equation (10.1) can be rearranged to give
$$[A]\{X\} - \{B\} = 0 \tag{10.2}$$
Suppose that Eq. (10.2) could be expressed as an upper triangular system:
$$
\begin{bmatrix} u_{11} & u_{12} & u_{13} \\ 0 & u_{22} & u_{23} \\ 0 & 0 & u_{33} \end{bmatrix}
\begin{Bmatrix} x_1 \\ x_2 \\ x_3 \end{Bmatrix} =
\begin{Bmatrix} d_1 \\ d_2 \\ d_3 \end{Bmatrix}
\tag{10.3}
$$
Recognize that this is similar to the manipulation that occurs in the first step of Gauss
elimination. That is, elimination is used to reduce the system to upper triangular form.
Equation (10.3) can also be expressed in matrix notation and rearranged to give
$$[U]\{X\} - \{D\} = 0 \tag{10.4}$$
Now, assume that there is a lower triangular matrix with 1's on the diagonal,
$$
[L] = \begin{bmatrix} 1 & 0 & 0 \\ l_{21} & 1 & 0 \\ l_{31} & l_{32} & 1 \end{bmatrix}
\tag{10.5}
$$
that has the property that when Eq. (10.4) is premultiplied by it, Eq. (10.2) is the result.
That is,
$$[L]\{[U]\{X\} - \{D\}\} = [A]\{X\} - \{B\} \tag{10.6}$$
If this equation holds, it follows from the rules for matrix multiplication that
$$[L][U] = [A] \tag{10.7}$$
and
$$[L]\{D\} = \{B\} \tag{10.8}$$
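One way to see why Eqs. (10.7) and (10.8) follow from Eq. (10.6) is to distribute [L] over the left-hand side and compare terms. The short derivation below is a sketch of that reasoning; it assumes Eq. (10.6) is required to hold for every vector {X}.

```latex
\begin{aligned}
[L]\bigl\{[U]\{X\} - \{D\}\bigr\} &= [L][U]\{X\} - [L]\{D\} \\
                                  &= [A]\{X\} - \{B\}
\end{aligned}
% Matching the terms that multiply {X}:   [L][U] = [A]      (10.7)
% Matching the terms independent of {X}:  [L]{D} = {B}      (10.8)
```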
A two-step strategy (see Fig. 10.1) for obtaining solutions can be based on Eqs. (10.4),
(10.7), and (10.8):
1. LU decomposition step. [A] is factored or “decomposed” into lower [L] and upper
[U] triangular matrices.
2. Substitution step. [L] and [U] are used to determine a solution {X} for a right-hand-
side {B}. This step itself consists of two steps. First, Eq. (10.8) is used to generate
an intermediate vector {D} by forward substitution. Then, the result is substituted
into Eq. (10.4), which can be solved by back substitution for {X}.
Now, let us show how Gauss elimination can be implemented in this way.
10.1.2 LU Decomposition Version of Gauss Elimination
Although it might appear at face value to be unrelated to LU decomposition, Gauss
elimination can be used to decompose [A] into [L] and [U]. This can be easily seen for
[U], which is a direct product of the forward elimination. Recall that the forward-
elimination step is intended to reduce the original coefficient matrix [A] to the form
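$$
[U] = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a'_{22} & a'_{23} \\ 0 & 0 & a''_{33} \end{bmatrix}
\tag{10.9}
$$
which is in the desired upper triangular format.
Though it might not be as apparent, the matrix [L] is also produced during the step.
This can be readily illustrated for a three-equation system,
$$
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
\begin{Bmatrix} x_1 \\ x_2 \\ x_3 \end{Bmatrix} =
\begin{Bmatrix} b_1 \\ b_2 \\ b_3 \end{Bmatrix}
$$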
The first step in Gauss elimination is to multiply row 1 by the factor [recall Eq. (9.13)]
$$f_{21} = \frac{a_{21}}{a_{11}}$$
and subtract the result from the second row to eliminate $a_{21}$. Similarly, row 1 is multiplied by
$$f_{31} = \frac{a_{31}}{a_{11}}$$
and the result subtracted from the third row to eliminate $a_{31}$. The final step is to multiply
the modified second row by
$$f_{32} = \frac{a'_{32}}{a'_{22}}$$
and subtract the result from the third row to eliminate $a'_{32}$.
FIGURE 10.1
The steps in LU decomposition: (a) decomposition, in which [A] is factored into [L] and [U];
(b) forward substitution, in which [L]{D} = {B} is solved for {D}; and (c) backward
substitution, in which [U]{X} = {D} is solved for {X}.
Now suppose that we merely perform all these manipulations on the matrix [A].
Clearly, if we do not want to change the equation, we also have to do the same to the
right-hand side {B}. But there is absolutely no reason that we have to perform the ma-
nipulations simultaneously. Thus, we could save the f’s and manipulate {B} later.
Where do we store the factors f21, f31, and f32? Recall that the whole idea behind the
elimination was to create zeros in a21, a31, and a32. Thus, we can store f21 in a21, f31 in
a31, and f32 in a32. After elimination, the [A] matrix can therefore be written as
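$$
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ f_{21} & a'_{22} & a'_{23} \\ f_{31} & f_{32} & a''_{33} \end{bmatrix}
\tag{10.10}
$$
This matrix, in fact, represents an efficient storage of the LU decomposition of [A],
$$[A] \rightarrow [L][U] \tag{10.11}$$
where
$$
[U] = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a'_{22} & a'_{23} \\ 0 & 0 & a''_{33} \end{bmatrix}
$$
and
$$
[L] = \begin{bmatrix} 1 & 0 & 0 \\ f_{21} & 1 & 0 \\ f_{31} & f_{32} & 1 \end{bmatrix}
$$
The following example confirms that [A] = [L][U].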
EXAMPLE 10.1 LU Decomposition with Gauss Elimination
Problem Statement. Derive an LU decomposition based on the Gauss elimination per-
formed in Example 9.5.
Solution. In Example 9.5, we solved the matrix
$$
[A] = \begin{bmatrix} 3 & -0.1 & -0.2 \\ 0.1 & 7 & -0.3 \\ 0.3 & -0.2 & 10 \end{bmatrix}
$$
After forward elimination, the following upper triangular matrix was obtained:
$$
[U] = \begin{bmatrix} 3 & -0.1 & -0.2 \\ 0 & 7.00333 & -0.293333 \\ 0 & 0 & 10.0120 \end{bmatrix}
$$
The factors employed to obtain the upper triangular matrix can be assembled into a
lower triangular matrix. The elements $a_{21}$ and $a_{31}$ were eliminated by using the factors
$$f_{21} = \frac{0.1}{3} = 0.03333333 \qquad f_{31} = \frac{0.3}{3} = 0.1000000$$
and the element $a'_{32}$ was eliminated by using the factor
$$f_{32} = \frac{-0.19}{7.00333} = -0.0271300$$
Thus, the lower triangular matrix is
$$
[L] = \begin{bmatrix} 1 & 0 & 0 \\ 0.0333333 & 1 & 0 \\ 0.100000 & -0.0271300 & 1 \end{bmatrix}
$$
Consequently, the LU decomposition is
$$
[A] = [L][U] =
\begin{bmatrix} 1 & 0 & 0 \\ 0.0333333 & 1 & 0 \\ 0.100000 & -0.0271300 & 1 \end{bmatrix}
\begin{bmatrix} 3 & -0.1 & -0.2 \\ 0 & 7.00333 & -0.293333 \\ 0 & 0 & 10.0120 \end{bmatrix}
$$
This result can be verified by performing the multiplication of [L][U] to give
$$
[L][U] = \begin{bmatrix} 3 & -0.1 & -0.2 \\ 0.0999999 & 7 & -0.3 \\ 0.3 & -0.2 & 9.99996 \end{bmatrix}
$$
where the minor discrepancies are due to round-off.
The following is pseudocode for a subroutine to implement the decomposition phase:
SUB Decompose (a, n)
  DOFOR k = 1, n - 1
    DOFOR i = k + 1, n
      factor = a(i,k) / a(k,k)
      a(i,k) = factor
      DOFOR j = k + 1, n
        a(i,j) = a(i,j) - factor * a(k,j)
      END DO
    END DO
  END DO
END Decompose
Notice that this algorithm is “naive” in the sense that pivoting is not included. This
feature will be added later when we develop the full algorithm for LU decomposition.
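For readers who want something runnable, the naive decomposition pseudocode might be transcribed as in the sketch below. The names `decompose` and `split_lu` are assumptions for this illustration, indices are zero-based, and, like the pseudocode, no pivoting is performed. The factors overwrite the lower part of the array, as in Eq. (10.10); `split_lu` then unpacks [L] and [U] only so the result can be inspected.

```python
def decompose(a):
    """Naive in-place LU decomposition (no pivoting); factors stored below the diagonal."""
    n = len(a)
    for k in range(n - 1):
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            a[i][k] = factor                     # store f_ik where the zero would go
            for j in range(k + 1, n):
                a[i][j] -= factor * a[k][j]
    return a


def split_lu(a):
    """Unpack the compact storage of Eq. (10.10) into separate [L] and [U]."""
    n = len(a)
    lower = [[a[i][j] if j < i else (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    upper = [[a[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return lower, upper


original = [[3.0, -0.1, -0.2],
            [0.1, 7.0, -0.3],
            [0.3, -0.2, 10.0]]
packed = decompose([row[:] for row in original])   # work on a copy
lower, upper = split_lu(packed)

# Multiply [L][U] back together to confirm that it reproduces [A].
product = [[sum(lower[i][k] * upper[k][j] for k in range(3)) for j in range(3)]
           for i in range(3)]
```

In double precision the product agrees with [A] far more closely than the seven-digit hand calculation of Example 10.1, which illustrates that the discrepancies there are purely round-off.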
After the matrix is decomposed, a solution can be generated for a particular right-
hand-side vector {B}. This is done in two steps. First, a forward-substitution step is
executed by solving Eq. (10.8) for {D}. It is important to recognize that this merely
amounts to performing the elimination manipulations on {B}. Thus, at the end of this
step, the right-hand side will be in the same state that it would have been had we per-
formed forward manipulation on [A] and {B} simultaneously.
The forward-substitution step can be represented concisely as
$$d_i = b_i - \sum_{j=1}^{i-1} a_{ij} d_j \quad \text{for } i = 2, 3, \ldots, n \tag{10.12}$$
The second step then merely amounts to implementing back substitution, as in Eq.
(10.4). Again, it is important to recognize that this is identical to the back-substitution
phase of conventional Gauss elimination. Thus, in a fashion similar to Eqs. (9.16) and
(9.17), the back-substitution step can be represented concisely as
$$x_n = d_n / a_{nn} \tag{10.13}$$
$$x_i = \frac{d_i - \displaystyle\sum_{j=i+1}^{n} a_{ij} x_j}{a_{ii}} \quad \text{for } i = n-1, n-2, \ldots, 1 \tag{10.14}$$
EXAMPLE 10.2 The Substitution Steps
Problem Statement. Complete the problem initiated in Example 10.1 by generating
the final solution with forward and back substitution.
Solution. As stated above, the intent of forward substitution is to impose the elimination
manipulations, that we had formerly applied to [A], on the right-hand-side vector {B}.
Recall that the system being solved in Example 9.5 was
$$
\begin{bmatrix} 3 & -0.1 & -0.2 \\ 0.1 & 7 & -0.3 \\ 0.3 & -0.2 & 10 \end{bmatrix}
\begin{Bmatrix} x_1 \\ x_2 \\ x_3 \end{Bmatrix} =
\begin{Bmatrix} 7.85 \\ -19.3 \\ 71.4 \end{Bmatrix}
$$
and that the forward-elimination phase of conventional Gauss elimination resulted in
$$
\begin{bmatrix} 3 & -0.1 & -0.2 \\ 0 & 7.00333 & -0.293333 \\ 0 & 0 & 10.0120 \end{bmatrix}
\begin{Bmatrix} x_1 \\ x_2 \\ x_3 \end{Bmatrix} =
\begin{Bmatrix} 7.85 \\ -19.5617 \\ 70.0843 \end{Bmatrix}
\tag{E10.2.1}
$$
The forward-substitution phase is implemented by applying Eq. (10.8) to our problem,
$$
\begin{bmatrix} 1 & 0 & 0 \\ 0.0333333 & 1 & 0 \\ 0.100000 & -0.0271300 & 1 \end{bmatrix}
\begin{Bmatrix} d_1 \\ d_2 \\ d_3 \end{Bmatrix} =
\begin{Bmatrix} 7.85 \\ -19.3 \\ 71.4 \end{Bmatrix}
$$
or multiplying out the left-hand side,
$$
\begin{aligned}
d_1 &= 7.85 \\
0.0333333d_1 + d_2 &= -19.3 \\
0.1d_1 - 0.02713d_2 + d_3 &= 71.4
\end{aligned}
$$
We can solve the first equation for $d_1$,
$$d_1 = 7.85$$
which can be substituted into the second equation to solve for
$$d_2 = -19.3 - 0.0333333(7.85) = -19.5617$$
Both $d_1$ and $d_2$ can be substituted into the third equation to give
$$d_3 = 71.4 - 0.1(7.85) + 0.02713(-19.5617) = 70.0843$$
Thus,
$$
\{D\} = \begin{Bmatrix} 7.85 \\ -19.5617 \\ 70.0843 \end{Bmatrix}
$$
which is identical to the right-hand side of Eq. (E10.2.1).
This result can then be substituted into Eq. (10.4), [U]{X} = {D}, to give
$$
\begin{bmatrix} 3 & -0.1 & -0.2 \\ 0 & 7.00333 & -0.293333 \\ 0 & 0 & 10.0120 \end{bmatrix}
\begin{Bmatrix} x_1 \\ x_2 \\ x_3 \end{Bmatrix} =
\begin{Bmatrix} 7.85 \\ -19.5617 \\ 70.0843 \end{Bmatrix}
$$
which can be solved by back substitution (see Example 9.5 for details) for the final solution,
$$
\{X\} = \begin{Bmatrix} 3 \\ -2.5 \\ 7.00003 \end{Bmatrix}
$$
The following is pseudocode for a subroutine to implement both substitution phases:
SUB Substitute (a, n, b, x)
  'forward substitution
  DOFOR i = 2, n
    sum = b(i)
    DOFOR j = 1, i - 1
      sum = sum - a(i,j) * b(j)
    END DO
    b(i) = sum
  END DO
  'back substitution
  x(n) = b(n) / a(n,n)
  DOFOR i = n - 1, 1, -1
    sum = 0
    DOFOR j = i + 1, n
      sum = sum + a(i,j) * x(j)
    END DO
    x(i) = (b(i) - sum) / a(i,i)
  END DO
END Substitute
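A runnable counterpart of the substitution pseudocode is sketched below, applied to the rounded factors of Example 10.1. The function name `substitute` and the decision to return {D} alongside {X} are assumptions made for illustration (the pseudocode overwrites b instead). The array `packed` holds [L] below the diagonal and [U] on and above it, as in Eq. (10.10).

```python
def substitute(a, b):
    """Forward then back substitution using the compact LU storage in a."""
    n = len(b)
    d = list(b)
    # Forward substitution, Eq. (10.12): the diagonal of [L] is 1, so no division.
    for i in range(1, n):
        d[i] -= sum(a[i][j] * d[j] for j in range(i))
    # Back substitution, Eqs. (10.13) and (10.14).
    x = [0.0] * n
    x[n - 1] = d[n - 1] / a[n - 1][n - 1]
    for i in range(n - 2, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (d[i] - s) / a[i][i]
    return d, x


# Rounded [L] and [U] from Example 10.1, packed into one array.
packed = [[3.0, -0.1, -0.2],
          [0.0333333, 7.00333, -0.293333],
          [0.100000, -0.0271300, 10.0120]]
d, x = substitute(packed, [7.85, -19.3, 71.4])
```

Because the rounded factors are used, the computed x3 carries the same small round-off seen in Example 10.2 rather than being exactly 7.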
The LU decomposition algorithm requires the same total multiply/divide flops as for
Gauss elimination. The only difference is that a little less effort is expended in the de-
composition phase since the operations are not applied to the right-hand side. Thus, the
number of multiply/divide flops involved in the decomposition phase can be calculated
as
$$\frac{n^3}{3} - \frac{n}{3} \;\xrightarrow{\;\text{as } n \text{ increases}\;}\; \frac{n^3}{3} + O(n) \tag{10.15}$$
Conversely, the substitution phase takes a little more effort. Thus, the number of
flops for forward and back substitution is $n^2$. The total effort is therefore identical to
Gauss elimination
$$\frac{n^3}{3} - \frac{n}{3} + n^2 \;\xrightarrow{\;\text{as } n \text{ increases}\;}\; \frac{n^3}{3} + O(n^2) \tag{10.16}$$
10.1.3 LU Decomposition Algorithm
An algorithm to implement an LU decomposition expression of Gauss elimination is
listed in Fig. 10.2. Four features of this algorithm bear mention:
• The factors generated during the elimination phase are stored in the lower part of the
  matrix. This can be done because these are converted to zeros anyway and are
  unnecessary for the final solution. This storage saves space.
• This algorithm keeps track of pivoting by using an order vector o. This greatly speeds
  up the algorithm because only the order vector (as opposed to the whole row) is pivoted.
• The equations are not scaled, but scaled values of the elements are used to determine
  whether pivoting is to be implemented.
• The diagonal term is monitored during the pivoting phase to detect near-zero
  occurrences in order to flag singular systems. If it passes back a value of er = -1,
  a singular matrix has been detected and the computation should be terminated. A
  parameter tol is set by the user to a small number in order to detect near-zero
  occurrences.
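The four features listed above can be seen together in the following sketch. It is an illustrative translation under stated assumptions, not the book's code: the names `lu_decompose` and `lu_substitute` are hypothetical, indices are zero-based, a `ValueError` is raised in place of the er = -1 flag, and the 3 x 3 test matrix (which has a zero in the first pivot position) is made up for the demonstration.

```python
def lu_decompose(a, tol=1e-12):
    """In-place LU decomposition with scaled partial pivoting via an order vector."""
    n = len(a)
    o = list(range(n))                              # order vector
    s = [max(abs(v) for v in row) for row in a]     # scale factor for each row
    if any(v == 0.0 for v in s):
        raise ValueError("matrix has a zero row")
    for k in range(n - 1):
        # Pivot on the largest scaled element in column k; swap order entries only.
        p = max(range(k, n), key=lambda i: abs(a[o[i]][k]) / s[o[i]])
        o[k], o[p] = o[p], o[k]
        if abs(a[o[k]][k]) / s[o[k]] < tol:
            raise ValueError("singular or near-singular matrix")
        for i in range(k + 1, n):
            factor = a[o[i]][k] / a[o[k]][k]
            a[o[i]][k] = factor                     # factor stored in the lower part
            for j in range(k + 1, n):
                a[o[i]][j] -= factor * a[o[k]][j]
    if abs(a[o[n - 1]][n - 1]) / s[o[n - 1]] < tol:
        raise ValueError("singular or near-singular matrix")
    return o


def lu_substitute(a, o, b):
    """Forward and back substitution through the order vector o."""
    n = len(b)
    b = list(b)                                     # leave the caller's vector intact
    for i in range(1, n):
        b[o[i]] -= sum(a[o[i]][j] * b[o[j]] for j in range(i))
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        acc = sum(a[o[i]][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[o[i]] - acc) / a[o[i]][i]
    return x


# Made-up system whose first pivot is zero, so pivoting is essential.
a = [[0.0, 2.0, 1.0],
     [1.0, 1.0, 1.0],
     [2.0, 1.0, -1.0]]
o = lu_decompose(a)                                 # decompose once...
x1 = lu_substitute(a, o, [7.0, 6.0, 1.0])           # ...then reuse for several
x2 = lu_substitute(a, o, [3.0, 3.0, 2.0])           # right-hand-side vectors
```

The two calls to `lu_substitute` reuse a single decomposition, which is the situation in which LU decomposition pays off.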
10.1.4 Crout Decomposition
Notice that for the LU decomposition implementation of Gauss elimination, the [L] matrix
has 1’s on the diagonal. This is formally referred to as a Doolittle decomposition, or fac-
torization. An alternative approach involves a [U] matrix with 1’s on the diagonal. This is
called Crout decomposition. Although there are some differences between the approaches
(Atkinson, 1978; Ralston and Rabinowitz, 1978), their performance is comparable.
The Crout decomposition approach generates [U] and [L] by sweeping through the
matrix by columns and rows, as depicted in Fig. 10.3. It can be implemented by the
following concise series of formulas:
$$l_{i,1} = a_{i,1} \quad \text{for } i = 1, 2, \ldots, n \tag{10.17}$$
$$u_{1j} = \frac{a_{1j}}{l_{11}} \quad \text{for } j = 2, 3, \ldots, n \tag{10.18}$$
For j = 2, 3, . . . , n − 1
$$l_{ij} = a_{ij} - \sum_{k=1}^{j-1} l_{ik} u_{kj} \quad \text{for } i = j, j+1, \ldots, n \tag{10.19}$$
$$u_{jk} = \frac{a_{jk} - \displaystyle\sum_{i=1}^{j-1} l_{ji} u_{ik}}{l_{jj}} \quad \text{for } k = j+1, j+2, \ldots, n \tag{10.20}$$
and
$$l_{nn} = a_{nn} - \sum_{k=1}^{n-1} l_{nk} u_{kn} \tag{10.21}$$
SUB Ludecomp (a, b, n, tol, x, er)
  DIM o(n), s(n)
  er = 0
  CALL Decompose(a, n, tol, o, s, er)
  IF er <> -1 THEN
    CALL Substitute(a, o, n, b, x)
  END IF
END Ludecomp

SUB Decompose (a, n, tol, o, s, er)
  DOFOR i = 1, n
    o(i) = i
    s(i) = ABS(a(i,1))
    DOFOR j = 2, n
      IF ABS(a(i,j)) > s(i) THEN s(i) = ABS(a(i,j))
    END DO
  END DO
  DOFOR k = 1, n - 1
    CALL Pivot(a, o, s, n, k)
    IF ABS(a(o(k),k) / s(o(k))) < tol THEN
      er = -1
      PRINT a(o(k),k) / s(o(k))
      EXIT DO
    END IF
    DOFOR i = k + 1, n
      factor = a(o(i),k) / a(o(k),k)
      a(o(i),k) = factor
      DOFOR j = k + 1, n
        a(o(i),j) = a(o(i),j) - factor * a(o(k),j)
      END DO
    END DO
  END DO
  IF ABS(a(o(k),k) / s(o(k))) < tol THEN
    er = -1
    PRINT a(o(k),k) / s(o(k))
  END IF
END Decompose

SUB Pivot (a, o, s, n, k)
  p = k
  big = ABS(a(o(k),k) / s(o(k)))
  DOFOR ii = k + 1, n
    dummy = ABS(a(o(ii),k) / s(o(ii)))
    IF dummy > big THEN
      big = dummy
      p = ii
    END IF
  END DO
  dummy = o(p)
  o(p) = o(k)
  o(k) = dummy
END Pivot

SUB Substitute (a, o, n, b, x)
  DOFOR i = 2, n
    sum = b(o(i))
    DOFOR j = 1, i - 1
      sum = sum - a(o(i),j) * b(o(j))
    END DO
    b(o(i)) = sum
  END DO
  x(n) = b(o(n)) / a(o(n),n)
  DOFOR i = n - 1, 1, -1
    sum = 0
    DOFOR j = i + 1, n
      sum = sum + a(o(i),j) * x(j)
    END DO
    x(i) = (b(o(i)) - sum) / a(o(i),i)
  END DO
END Substitute
FIGURE 10.2
Pseudocode for an LU decomposition algorithm.
FIGURE 10.3
A schematic depicting the evaluations involved in Crout LU decomposition
(panels a through d).
Aside from the fact that it consists of a few concise loops, the foregoing approach also
has the benefit that storage space can be economized. There is no need to store the 1’s on
the diagonal of [U] or the 0’s for [L] or [U] because they are givens in the method. Con-
sequently, the values of [U] can be stored in the zero space of [L]. Further, close examina-
tion of the foregoing derivation makes it clear that after each element of [A] is employed
once, it is never used again. Therefore, as each element of [L] and [U] is computed, it can
be substituted for the corresponding element (as designated by its subscripts) of [A].
Pseudocode to accomplish this is presented in Fig. 10.4. Notice that Eq. (10.17) is
not included in the pseudocode because the first column of [L] is already stored in [A].
Otherwise, the algorithm directly follows from Eqs. (10.18) through (10.21).
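A compact, runnable sketch of the in-place Crout procedure described by Eqs. (10.18) through (10.21) is given below. The names `crout` and `unpack_crout` are assumptions for illustration, indices are zero-based, and no pivoting is included. After the call, the array holds [L] on and below the diagonal and the off-diagonal part of [U] above it; the 1's on the diagonal of [U] are implied rather than stored.

```python
def crout(a):
    """In-place Crout decomposition (no pivoting): [L] with a general diagonal,
    [U] with an implied unit diagonal, both stored in a."""
    n = len(a)
    # Eq. (10.18): first row of [U]. The first column of [L] is already in place.
    for j in range(1, n):
        a[0][j] /= a[0][0]
    for j in range(1, n - 1):
        # Eq. (10.19): column j of [L]
        for i in range(j, n):
            a[i][j] -= sum(a[i][k] * a[k][j] for k in range(j))
        # Eq. (10.20): row j of [U]
        for k in range(j + 1, n):
            a[j][k] = (a[j][k] - sum(a[j][i] * a[i][k] for i in range(j))) / a[j][j]
    # Eq. (10.21): last diagonal element of [L]
    a[n - 1][n - 1] -= sum(a[n - 1][k] * a[k][n - 1] for k in range(n - 1))
    return a


def unpack_crout(a):
    """Separate the compact storage into explicit [L] and [U] for checking."""
    n = len(a)
    lower = [[a[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]
    upper = [[a[i][j] if j > i else (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    return lower, upper


original = [[3.0, -0.1, -0.2],
            [0.1, 7.0, -0.3],
            [0.3, -0.2, 10.0]]
lower, upper = unpack_crout(crout([row[:] for row in original]))
product = [[sum(lower[i][k] * upper[k][j] for k in range(3)) for j in range(3)]
           for i in range(3)]
```

Multiplying the unpacked factors back together gives a quick check that the sweep through columns and rows has reproduced [A].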
10.2 THE MATRIX INVERSE
In our discussion of matrix operations (Sec. PT3.2.2), we introduced the notion that if a
matrix [A] is square, there is another matrix, $[A]^{-1}$, called the inverse of [A], for which
[Eq. (PT3.3)]
$$[A][A]^{-1} = [A]^{-1}[A] = [I]$$
DOFOR j = 2, n
  a(1,j) = a(1,j) / a(1,1)
END DO
DOFOR j = 2, n - 1
  DOFOR i = j, n
    sum = 0
    DOFOR k = 1, j - 1
      sum = sum + a(i,k) * a(k,j)
    END DO
    a(i,j) = a(i,j) - sum
  END DO
  DOFOR k = j + 1, n
    sum = 0
    DOFOR i = 1, j - 1
      sum = sum + a(j,i) * a(i,k)
    END DO
    a(j,k) = (a(j,k) - sum) / a(j,j)
  END DO
END DO
sum = 0
DOFOR k = 1, n - 1
  sum = sum + a(n,k) * a(k,n)
END DO
a(n,n) = a(n,n) - sum
FIGURE 10.4
Pseudocode for Crout's LU decomposition algorithm.
Now we will focus on how the inverse can be computed numerically. Then we will
explore how it can be used for engineering analysis.
10.2.1 Calculating the Inverse
The inverse can be computed in a column-by-column fashion by generating solutions
with unit vectors as the right-hand-side constants. For example, if the right-hand-side
constant has a 1 in the first position and zeros elsewhere,
$$\{b\} = \begin{Bmatrix} 1 \\ 0 \\ 0 \end{Bmatrix}$$
the resulting solution will be the first column of the matrix inverse. Similarly, if a unit
vector with a 1 at the second row is used
$$\{b\} = \begin{Bmatrix} 0 \\ 1 \\ 0 \end{Bmatrix}$$
the result will be the second column of the matrix inverse.
The best way to implement such a calculation is with the LU decomposition algorithm
described at the beginning of this chapter. Recall that one of the great strengths of LU
decomposition is that it provides a very efficient means to evaluate multiple right-
hand-side vectors. Thus, it is ideal for evaluating the multiple unit vectors needed to
compute the inverse.
EXAMPLE 10.3 Matrix Inversion
Problem Statement. Employ LU decomposition to determine the matrix inverse for the
system from Example 10.2.
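$$
[A] = \begin{bmatrix} 3 & -0.1 & -0.2 \\ 0.1 & 7 & -0.3 \\ 0.3 & -0.2 & 10 \end{bmatrix}
$$
Recall that the decomposition resulted in the following lower and upper triangular matrices:
$$
[U] = \begin{bmatrix} 3 & -0.1 & -0.2 \\ 0 & 7.00333 & -0.293333 \\ 0 & 0 & 10.0120 \end{bmatrix}
\qquad
[L] = \begin{bmatrix} 1 & 0 & 0 \\ 0.0333333 & 1 & 0 \\ 0.100000 & -0.0271300 & 1 \end{bmatrix}
$$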
Solution. The first column of the matrix inverse can be determined by performing the
forward-substitution solution procedure with a unit vector (with 1 in the first row) as the
right-hand-side vector. Thus, Eq. (10.8), the lower-triangular system, can be set up as
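$$
\begin{bmatrix} 1 & 0 & 0 \\ 0.0333333 & 1 & 0 \\ 0.100000 & -0.0271300 & 1 \end{bmatrix}
\begin{Bmatrix} d_1 \\ d_2 \\ d_3 \end{Bmatrix} =
\begin{Bmatrix} 1 \\ 0 \\ 0 \end{Bmatrix}
$$
and solved with forward substitution for $\{D\}^T = \lfloor 1 \;\; -0.03333 \;\; -0.1009 \rfloor$. This vector
can then be used as the right-hand side of Eq. (10.3),
$$
\begin{bmatrix} 3 & -0.1 & -0.2 \\ 0 & 7.00333 & -0.293333 \\ 0 & 0 & 10.0120 \end{bmatrix}
\begin{Bmatrix} x_1 \\ x_2 \\ x_3 \end{Bmatrix} =
\begin{Bmatrix} 1 \\ -0.03333 \\ -0.1009 \end{Bmatrix}
$$
which can be solved by back substitution for $\{X\}^T = \lfloor 0.33249 \;\; -0.00518 \;\; -0.01008 \rfloor$,
which is the first column of the matrix,
$$
[A]^{-1} = \begin{bmatrix} 0.33249 & 0 & 0 \\ -0.00518 & 0 & 0 \\ -0.01008 & 0 & 0 \end{bmatrix}
$$
To determine the second column, Eq. (10.8) is formulated as
$$
\begin{bmatrix} 1 & 0 & 0 \\ 0.0333333 & 1 & 0 \\ 0.100000 & -0.0271300 & 1 \end{bmatrix}
\begin{Bmatrix} d_1 \\ d_2 \\ d_3 \end{Bmatrix} =
\begin{Bmatrix} 0 \\ 1 \\ 0 \end{Bmatrix}
$$
This can be solved for {D}, and the results are used with Eq. (10.3) to determine
$\{X\}^T = \lfloor 0.004944 \;\; 0.142903 \;\; 0.00271 \rfloor$, which is the second column of the matrix,
$$
[A]^{-1} = \begin{bmatrix} 0.33249 & 0.004944 & 0 \\ -0.00518 & 0.142903 & 0 \\ -0.01008 & 0.00271 & 0 \end{bmatrix}
$$
Finally, the forward- and back-substitution procedures can be implemented with
$\{B\}^T = \lfloor 0 \;\; 0 \;\; 1 \rfloor$ to solve for $\{X\}^T = \lfloor 0.006798 \;\; 0.004183 \;\; 0.09988 \rfloor$, which is the
final column of the matrix,
$$
[A]^{-1} = \begin{bmatrix} 0.33249 & 0.004944 & 0.006798 \\ -0.00518 & 0.142903 & 0.004183 \\ -0.01008 & 0.00271 & 0.09988 \end{bmatrix}
$$
The validity of this result can be checked by verifying that $[A][A]^{-1} = [I]$.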
Pseudocode to generate the matrix inverse is shown in Fig. 10.5. Notice how the
decomposition subroutine from Fig. 10.2 is called once to perform the decomposition, and the
inverse is then generated by repeatedly calling the substitution algorithm with unit vectors.
The effort required for this algorithm is simply computed as
\[ \underbrace{\frac{n^3}{3} - \frac{n}{3}}_{\text{decomposition}} + \underbrace{n(n^2)}_{n \times \text{substitutions}} = \frac{4n^3}{3} - \frac{n}{3} \tag{10.22} \]
where from Sec. 10.1.2, the decomposition is defined by Eq. (10.15) and every
right-hand-side evaluation involves \(n^2\) multiply/divide flops.
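The procedure just described (decompose once, then substitute with each unit vector) can be sketched in Python. This is a minimal illustration rather than a transcription of Figs. 10.2 and 10.5: the function names `lu_decompose`, `lu_substitute`, and `invert` are hypothetical, pivoting is tracked with a simple order vector, and the tolerance test is an unscaled absolute check chosen for brevity.

```python
def lu_decompose(a, tol=1e-12):
    """LU decomposition with partial pivoting (illustrative sketch).

    Returns (lu, order). lu stores U on and above the diagonal and the
    multipliers of L below it; order records the row permutation.
    """
    n = len(a)
    lu = [row[:] for row in a]              # work on a copy of the input
    order = list(range(n))
    for k in range(n - 1):
        # pick the row with the largest pivot candidate in column k
        p = max(range(k, n), key=lambda i: abs(lu[order[i]][k]))
        order[k], order[p] = order[p], order[k]
        pivot = lu[order[k]][k]
        if abs(pivot) < tol:
            raise ValueError("singular or ill-conditioned system")
        for i in range(k + 1, n):
            factor = lu[order[i]][k] / pivot
            lu[order[i]][k] = factor        # store the multiplier of [L]
            for j in range(k + 1, n):
                lu[order[i]][j] -= factor * lu[order[k]][j]
    if abs(lu[order[n - 1]][n - 1]) < tol:
        raise ValueError("singular or ill-conditioned system")
    return lu, order


def lu_substitute(lu, order, b):
    """Forward then back substitution for one right-hand side."""
    n = len(b)
    d = [0.0] * n
    for i in range(n):                      # forward: [L]{D} = {B}
        d[i] = b[order[i]] - sum(lu[order[i]][j] * d[j] for j in range(i))
    x = [0.0] * n
    for i in range(n - 1, -1, -1):          # back: [U]{X} = {D}
        s = sum(lu[order[i]][j] * x[j] for j in range(i + 1, n))
        x[i] = (d[i] - s) / lu[order[i]][i]
    return x


def invert(a):
    """Build the inverse column by column from unit vectors."""
    n = len(a)
    lu, order = lu_decompose(a)             # decomposition is done only once
    inv = [[0.0] * n for _ in range(n)]
    for i in range(n):
        unit = [1.0 if j == i else 0.0 for j in range(n)]
        col = lu_substitute(lu, order, unit)
        for j in range(n):
            inv[j][i] = col[j]              # solution is column i of the inverse
    return inv


# Matrix from Example 10.3
a = [[3.0, -0.1, -0.2], [0.1, 7.0, -0.3], [0.3, -0.2, 10.0]]
a_inv = invert(a)
```

Because the decomposition is reused for every unit vector, only the inexpensive substitution step is repeated n times, which is the source of the n(n^2) term in Eq. (10.22).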
10.2.2 Stimulus-Response Computations
As discussed in Sec. PT3.1.2, many of the linear systems of equations confronted in engineering
practice are derived from conservation laws. The mathematical expression of these
laws is some form of balance equation to ensure that a particular property (mass, force,
heat, momentum, or other) is conserved. For a force balance on a structure, the properties
might be horizontal or vertical components of the forces acting on each node of the structure
(see Sec. 12.2). For a mass balance, the properties might be the mass in each reactor of a
chemical process (see Sec. 12.1). Other fields of engineering would yield similar examples.
A single balance equation can be written for each part of the system, resulting in a
set of equations defining the behavior of the property for the entire system. These equations
are interrelated, or coupled, in that each equation may include one or more of the
variables from the other equations. For many cases, these systems are linear and, therefore,
of the exact form dealt with in this chapter:
\[ [A]\{X\} = \{B\} \tag{10.23} \]
Now, for balance equations, the terms of Eq. (10.23) have a definite physical interpretation.
For example, the elements of {X} are the levels of the property being balanced for each
part of the system. In a force balance of a structure, they represent the horizontal and vertical
forces in each member. For the mass balance, they are the mass of chemical in each reactor.
In either case, they represent the system’s state or response, which we are trying to determine.
The right-hand-side vector {B} contains those elements of the balance that are independent
of the behavior of the system; that is, they are constants. As such, they often
represent the external forces or stimuli that drive the system.
FIGURE 10.5
Driver program that uses some of the subprograms from Fig. 10.2 to generate a matrix inverse.
CALL Decompose (a, n, tol, o, s, er)
IF er = 0 THEN
  DOFOR i = 1, n
    DOFOR j = 1, n
      IF i = j THEN
        b(j) = 1
      ELSE
        b(j) = 0
      END IF
    END DO
    CALL Substitute (a, o, n, b, x)
    DOFOR j = 1, n
      ai(j, i) = x(j)
    END DO
  END DO
  Output ai, if desired
ELSE
  PRINT "ill-conditioned system"
END IF
Finally, the matrix of coefficients [A] usually contains the parameters that express
how the parts of the system interact or are coupled. Consequently, Eq. (10.23) might be
reexpressed as
\[ [\text{Interactions}]\{\text{response}\} = \{\text{stimuli}\} \]
Thus, Eq. (10.23) can be seen as an expression of the fundamental mathematical model
that we formulated previously as a single equation in Chap. 1 [recall Eq. (1.1)]. We can
now see that Eq. (10.23) represents a version that is designed for coupled systems involving
several dependent variables {X}.
As we know from this chapter and Chap. 9, there are a variety of ways to solve
Eq. (10.23). However, using the matrix inverse yields a particularly interesting result.
The formal solution can be expressed as
\[ \{X\} = [A]^{-1}\{B\} \]
or (recalling our definition of matrix multiplication from Box PT3.2)
\[ \begin{aligned}
x_1 &= a^{-1}_{11} b_1 + a^{-1}_{12} b_2 + a^{-1}_{13} b_3 \\
x_2 &= a^{-1}_{21} b_1 + a^{-1}_{22} b_2 + a^{-1}_{23} b_3 \\
x_3 &= a^{-1}_{31} b_1 + a^{-1}_{32} b_2 + a^{-1}_{33} b_3
\end{aligned} \]
Thus, we find that the inverted matrix itself, aside from providing a solution, has extremely
useful properties. That is, each of its elements represents the response of a
single part of the system to a unit stimulus of any other part of the system.
Notice that these formulations are linear and, therefore, superposition and proportionality
hold. Superposition means that if a system is subject to several different stimuli
(the b’s), the responses can be computed individually and the results summed to obtain
a total response. Proportionality means that multiplying the stimuli by a quantity results
in the response to those stimuli being multiplied by the same quantity. Thus, the coefficient
\(a^{-1}_{11}\) is a proportionality constant that gives the value of \(x_1\) due to a unit level of
\(b_1\). This result is independent of the effects of \(b_2\) and \(b_3\) on \(x_1\), which are reflected in the
coefficients \(a^{-1}_{12}\) and \(a^{-1}_{13}\), respectively. Therefore, we can draw the general conclusion
that the element \(a^{-1}_{ij}\) of the inverted matrix represents the value of \(x_i\) due to a unit quantity
of \(b_j\). Using the example of the structure, element \(a^{-1}_{ij}\) of the matrix inverse would
represent the force in member i due to a unit external force at node j. Even for small
systems, such behavior of individual stimulus-response interactions would not be intuitively
obvious. As such, the matrix inverse provides a powerful technique for understanding
the interrelationships of component parts of complicated systems. This power will
be demonstrated in Secs. 12.1 and 12.2.
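Superposition and proportionality can be checked numerically with the inverse obtained in Example 10.3. The sketch below is only an illustration: the helper name `response` and the stimulus vectors `b_first`, `b_second`, and `b_both` are hypothetical values chosen for the demonstration, not data from the text.

```python
# Inverse from Example 10.3 (rounded values as printed in the example)
A_INV = [
    [0.33249, 0.004944, 0.006798],
    [-0.00518, 0.142903, 0.004183],
    [-0.01008, 0.00271, 0.09988],
]


def response(a_inv, b):
    """Return {X} = [A]^-1 {B} for a given stimulus vector {B}."""
    n = len(b)
    return [sum(a_inv[i][j] * b[j] for j in range(n)) for i in range(n)]


# Hypothetical stimuli: one acting on part 1, one acting on part 2
b_first = [10.0, 0.0, 0.0]
b_second = [0.0, -4.0, 0.0]
b_both = [10.0, -4.0, 0.0]

# Superposition: the response to both stimuli equals the sum of the
# responses to each stimulus applied on its own
separate = [p + q for p, q in zip(response(A_INV, b_first),
                                  response(A_INV, b_second))]
combined = response(A_INV, b_both)

# Proportionality: a unit stimulus at part 1 returns column 1 of the
# inverse, and doubling the stimulus doubles every response
unit_response = response(A_INV, [1.0, 0.0, 0.0])
double_response = response(A_INV, [2.0, 0.0, 0.0])
```

Here `unit_response` reproduces the first column of the inverse, which is the statement that element (i, j) gives the value of x_i due to a unit quantity of b_j.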
10.3 ERROR ANALYSIS AND SYSTEM CONDITION
Aside from its engineering applications, the inverse also provides a means to discern
whether systems are ill-conditioned. Three methods are available for this purpose:
1. Scale the matrix of coefficients [A] so that the largest element in each row is 1. Invert
the scaled matrix and if there are elements of \([A]^{-1}\) that are several orders of magnitude
greater than one, it is likely that the system is ill-conditioned (see Box 10.1).
2. Multiply the inverse by the original coefficient matrix and assess whether the result
is close to the identity matrix. If not, it indicates ill-conditioning.
3. Invert the inverted matrix and assess whether the result is sufficiently close to the
original coefficient matrix. If not, it again indicates that the system is ill-conditioned.
Although these methods can indicate ill-conditioning, it would be preferable to obtain
a single number (such as the condition number from Sec. 4.2.3) that could serve as
an indicator of the problem. Attempts to formulate such a matrix condition number are
based on the mathematical concept of the norm.
10.3.1 Vector and Matrix Norms
A norm is a real-valued function that provides a measure of the size or “length” of
multicomponent mathematical entities such as vectors and matrices (see Box 10.2).
A simple example is a vector in three-dimensional Euclidean space (Fig. 10.6) that
can be represented as
\[ \lfloor F \rfloor = \lfloor a \;\; b \;\; c \rfloor \]
where a, b, and c are the distances along the x, y, and z axes, respectively. The length
of this vector (that is, the distance from the coordinate (0, 0, 0) to (a, b, c)) can be
simply computed as
\[ \|F\|_e = \sqrt{a^2 + b^2 + c^2} \]
where the nomenclature \(\|F\|_e\) indicates that this length is referred to as the Euclidean
norm of [F].
Box 10.1 Interpreting the Elements of the Matrix Inverse as a Measure
of Ill-Conditioning
One method for assessing a system’s condition is to scale [A] so
that the largest element in each row is 1 and then compute \([A]^{-1}\). If
elements of \([A]^{-1}\) are several orders of magnitude greater than the
elements of the original scaled matrix, it is likely that the system is
ill-conditioned.
Insight into this approach can be gained by recalling that a way
to check whether an approximate solution \(\{\tilde{X}\}\) is acceptable is to
substitute it into the original equations and see whether the original
right-hand-side constants result. This is equivalent to
\[ \{R\} = \{B\} - [A]\{\tilde{X}\} \tag{B10.1.1} \]
where {R} is the residual between the right-hand-side constants and
the values computed with the solution \(\{\tilde{X}\}\). If {R} is small, we
might conclude that the \(\{\tilde{X}\}\) values are adequate. However, suppose
that {X} is the exact solution that yields a zero residual, as in
\[ \{0\} = \{B\} - [A]\{X\} \tag{B10.1.2} \]
Subtracting Eq. (B10.1.2) from (B10.1.1) yields
\[ \{R\} = [A]\left(\{X\} - \{\tilde{X}\}\right) \]
Multiplying both sides of this equation by \([A]^{-1}\) gives
\[ \{X\} - \{\tilde{X}\} = [A]^{-1}\{R\} \]
This result indicates why checking a solution by substitution can
be misleading. For cases where elements of \([A]^{-1}\) are large, a
small discrepancy in the right-hand-side residual {R} could correspond
to a large error \(\{X\} - \{\tilde{X}\}\) in the calculated value of the
unknowns. In other words, a small residual does not guarantee an
accurate solution. However, we can conclude that if the largest
element of \([A]^{-1}\) is on the order of magnitude of unity, the system
can be considered to be well-conditioned. Conversely, if \([A]^{-1}\)
includes elements much larger than unity, we conclude that the
system is ill-conditioned.
Box 10.2 Matrix Norms
As developed in this section, Euclidean norms can be employed to
quantify the size of a vector,
\[ \|X\|_e = \sqrt{\sum_{i=1}^{n} x_i^2} \]
or matrix,
\[ \|A\|_e = \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{n} a_{i,j}^2} \]
For vectors, there are alternatives called p norms that can be
represented generally by
\[ \|X\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p} \]
We can also see that the Euclidean norm and the 2 norm, \(\|X\|_2\), are
identical for vectors.
Other important examples are
\[ \|X\|_1 = \sum_{i=1}^{n} |x_i| \]
which represents the norm as the sum of the absolute values of the
elements. Another is the maximum-magnitude or uniform-vector
norm,
\[ \|X\|_\infty = \max_{1 \le i \le n} |x_i| \]
which defines the norm as the element with the largest absolute
value.
Using a similar approach, norms can be developed for matrices.
For example,
\[ \|A\|_1 = \max_{1 \le j \le n} \sum_{i=1}^{n} |a_{ij}| \]
That is, a summation of the absolute values of the coefficients is
performed for each column, and the largest of these summations is
taken as the norm. This is called the column-sum norm.
A similar determination can be made for the rows, resulting in a
uniform-matrix or row-sum norm,
\[ \|A\|_\infty = \max_{1 \le i \le n} \sum_{j=1}^{n} |a_{ij}| \]
It should be noted that, in contrast to vectors, the 2 norm and the
Euclidean norm for a matrix are not the same. Whereas the Euclidean
norm \(\|A\|_e\) can be easily determined by Eq. (10.24), the matrix
2 norm \(\|A\|_2\) is calculated as
\[ \|A\|_2 = (\mu_{\max})^{1/2} \]
where \(\mu_{\max}\) is the largest eigenvalue of \([A]^T[A]\). In Chap. 27, we
will learn more about eigenvalues. For the time being, the important
point is that the \(\|A\|_2\), or spectral, norm is the minimum norm
and, therefore, provides the tightest measure of size (Ortega 1972).
FIGURE 10.6
Graphical depiction of a vector \(\lfloor F \rfloor = \lfloor a \;\; b \;\; c \rfloor\) in Euclidean space,
with length \(\|F\|_e = \sqrt{a^2 + b^2 + c^2}\). [Axes: x, y, z]
Similarly, for an n-dimensional vector \(\lfloor X \rfloor = \lfloor x_1 \;\; x_2 \;\; \cdots \;\; x_n \rfloor\), a Euclidean norm
would be computed as
\[ \|X\|_e = \sqrt{\sum_{i=1}^{n} x_i^2} \]
The concept can be extended further to a matrix [A], as in
\[ \|A\|_e = \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{n} a_{i,j}^2} \tag{10.24} \]
which is given a special name: the Frobenius norm. However, as with the other vector
norms, it provides a single value to quantify the “size” of [A].
It should be noted that there are alternatives to the Euclidean and Frobenius norms
(see Box 10.2). For example, a uniform vector norm is defined as
\[ \|X\|_\infty = \max_{1 \le i \le n} |x_i| \]
That is, the element with the largest absolute value is taken as the measure of the vector’s
size. Similarly, a uniform matrix norm or row-sum norm is defined as
\[ \|A\|_\infty = \max_{1 \le i \le n} \sum_{j=1}^{n} |a_{ij}| \tag{10.25} \]
In this case, the sum of the absolute value of the elements is computed for each row,
and the largest of these is taken as the norm.
Although there are theoretical benefits for using certain of the norms, the choice is
sometimes influenced by practical considerations. For example, the uniform-row norm is
widely used because of the ease with which it can be calculated and the fact that it usually
provides an adequate measure of matrix size.
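The norms defined in this section and in Box 10.2 translate directly into short functions. The following Python sketch is illustrative only; the function names are hypothetical, and the matrix from Example 10.3 is reused simply as convenient test data. The spectral norm is omitted because it requires an eigenvalue computation.

```python
import math


def vector_p_norm(x, p):
    """p norm of a vector: (sum |x_i|^p)^(1/p)."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


def vector_uniform_norm(x):
    """Uniform (maximum-magnitude) vector norm."""
    return max(abs(v) for v in x)


def frobenius_norm(a):
    """Eq. (10.24): square root of the sum of all squared elements."""
    return math.sqrt(sum(v * v for row in a for v in row))


def column_sum_norm(a):
    """Largest sum of absolute values taken down any column."""
    return max(sum(abs(row[j]) for row in a) for j in range(len(a[0])))


def row_sum_norm(a):
    """Eq. (10.25): largest sum of absolute values across any row."""
    return max(sum(abs(v) for v in row) for row in a)


# Matrix from Example 10.3, reused here as sample data
a = [[3.0, -0.1, -0.2], [0.1, 7.0, -0.3], [0.3, -0.2, 10.0]]
norms = {
    "frobenius": frobenius_norm(a),
    "column_sum": column_sum_norm(a),
    "row_sum": row_sum_norm(a),
}
```

For this matrix the row-sum and column-sum norms happen to coincide, since the third row and the third column both sum to 10.5 in absolute value; in general the two differ.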
10.3.2 Matrix Condition Number
Now that we have introduced the concept of the norm, we can use it to define
\[ \text{Cond}\,[A] = \|A\| \cdot \|A^{-1}\| \tag{10.26} \]
where Cond [A] is called the matrix condition number. Note that for a matrix [A], this
number will be greater than or equal to 1. It can be shown (Ralston and Rabinowitz,
1978; Gerald and Wheatley, 2004) that
\[ \frac{\|\Delta X\|}{\|X\|} \le \text{Cond}\,[A] \, \frac{\|\Delta A\|}{\|A\|} \]
That is, the relative error of the norm of the computed solution can be as large as the
relative error of the norm of the coefficients of [A] multiplied by the condition number.
For example, if the coefficients of [A] are known to t-digit precision (that is, rounding
errors are on the order of \(10^{-t}\)) and Cond [A] = \(10^{c}\), the solution {X} may be valid to
only t − c digits (rounding errors \(\sim 10^{c-t}\)).
EXAMPLE 10.4 Matrix Condition Evaluation
Problem Statement. The Hilbert matrix, which is notoriously ill-conditioned, can be
represented generally as
\[ \begin{bmatrix}
1 & 1/2 & 1/3 & \cdots & 1/n \\
1/2 & 1/3 & 1/4 & \cdots & 1/(n+1) \\
\vdots & \vdots & \vdots & & \vdots \\
1/n & 1/(n+1) & 1/(n+2) & \cdots & 1/(2n-1)
\end{bmatrix} \]
Use the row-sum norm to estimate the matrix condition number for the 3 × 3 Hilbert
matrix,
\[ [A] = \begin{bmatrix} 1 & 1/2 & 1/3 \\ 1/2 & 1/3 & 1/4 \\ 1/3 & 1/4 & 1/5 \end{bmatrix} \]
Solution. First, the matrix can be normalized so that the maximum element in each
row is 1,
\[ [A] = \begin{bmatrix} 1 & 1/2 & 1/3 \\ 1 & 2/3 & 1/2 \\ 1 & 3/4 & 3/5 \end{bmatrix} \]
Summing each of the rows gives 1.833, 2.1667, and 2.35. Thus, the third row has the
largest sum and the row-sum norm is
\[ \|A\|_\infty = 1 + \frac{3}{4} + \frac{3}{5} = 2.35 \]
The inverse of the scaled matrix can be computed as
\[ [A]^{-1} = \begin{bmatrix} 9 & -18 & 10 \\ -36 & 96 & -60 \\ 30 & -90 & 60 \end{bmatrix} \]
Note that the elements of this matrix are larger than those of the original matrix. This is also
reflected in its row-sum norm, which is computed as
\[ \|A^{-1}\|_\infty = |-36| + |96| + |-60| = 192 \]
Thus, the condition number can be calculated as
\[ \text{Cond}\,[A] = 2.35(192) = 451.2 \]
The fact that the condition number is considerably greater than unity suggests that
the system is ill-conditioned. The extent of the ill-conditioning can be quantified by
calculating c = log 451.2 = 2.65. Computers using IEEE floating-point representation
have approximately \(t = -\log 2^{-24} = 7.2\) significant base-10 digits (recall Sec. 3.4.1).
Therefore, the solution could exhibit rounding errors of up to \(10^{(2.65 - 7.2)} = 3 \times 10^{-5}\).
Note that such estimates almost always overpredict the actual error. However, they are
useful in alerting you to the possibility that round-off errors may be significant.
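The numbers in Example 10.4 can be reproduced with a short script. The sketch below is an illustration under stated assumptions: it uses exact rational arithmetic (Python's `fractions` module) and a plain Gauss-Jordan inversion rather than the LU-based inversion of Sec. 10.2, and the function names `inverse` and `row_sum_norm` are hypothetical.

```python
from fractions import Fraction
import math


def row_sum_norm(a):
    """Row-sum (uniform) matrix norm, Eq. (10.25)."""
    return max(sum(abs(v) for v in row) for row in a)


def inverse(a):
    """Gauss-Jordan inverse in exact rational arithmetic (assumes [A] is nonsingular)."""
    n = len(a)
    # augment [A] with the identity matrix
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for k in range(n):
        # find a row with a nonzero pivot and move it into position
        p = next(i for i in range(k, n) if aug[i][k] != 0)
        aug[k], aug[p] = aug[p], aug[k]
        pivot = aug[k][k]
        aug[k] = [v / pivot for v in aug[k]]
        for i in range(n):
            if i != k:
                f = aug[i][k]
                aug[i] = [vi - f * vk for vi, vk in zip(aug[i], aug[k])]
    return [row[n:] for row in aug]


n = 3
hilbert = [[Fraction(1, i + j + 1) for j in range(n)] for i in range(n)]
# scale so that the maximum element in each row is 1 (all entries are positive)
scaled = [[v / max(row) for v in row] for row in hilbert]
scaled_inv = inverse(scaled)

cond = row_sum_norm(scaled) * row_sum_norm(scaled_inv)   # Eq. (10.26)
digits_lost = math.log10(float(cond))                    # the exponent c
```

Exact fractions are used so that the computed inverse is not itself contaminated by the round-off the condition number is meant to diagnose; with floating-point arithmetic the same steps apply but the result is approximate.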
Practically speaking, the problem with implementing Eq. (10.26) is the computational
price required to obtain \(\|A^{-1}\|\). Rice (1983) outlines some possible strategies to
mitigate this problem. Further, he suggests an alternative way to assess system condition:
run the same solution on two different compilers. Because the resulting codes will
likely implement the arithmetic differently, the effect of ill-conditioning should be evident
from such an experiment. Finally, it should be mentioned that software packages
such as MATLAB software and Mathcad have the capability to conveniently compute
matrix condition. We will review these capabilities when we review such packages at
the end of Chap. 11.
10.3.3 Iterative Refinement
In some cases, round-off errors can be reduced by the following procedure. Suppose that
we are solving the following set of equations:
\[ \begin{aligned}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 &= b_1 \\
a_{21}x_1 + a_{22}x_2 + a_{23}x_3 &= b_2 \\
a_{31}x_1 + a_{32}x_2 + a_{33}x_3 &= b_3
\end{aligned} \tag{10.27} \]
For conciseness, we will limit the following discussion to this small (3 × 3) system.
However, the approach is generally applicable to larger sets of linear equations.
Suppose an approximate solution vector is given by \(\{\tilde{X}\}^T = \lfloor \tilde{x}_1 \;\; \tilde{x}_2 \;\; \tilde{x}_3 \rfloor\). This solution
can be substituted into Eq. (10.27) to give
\[ \begin{aligned}
a_{11}\tilde{x}_1 + a_{12}\tilde{x}_2 + a_{13}\tilde{x}_3 &= \tilde{b}_1 \\
a_{21}\tilde{x}_1 + a_{22}\tilde{x}_2 + a_{23}\tilde{x}_3 &= \tilde{b}_2 \\
a_{31}\tilde{x}_1 + a_{32}\tilde{x}_2 + a_{33}\tilde{x}_3 &= \tilde{b}_3
\end{aligned} \tag{10.28} \]
Now, suppose that the exact solution {X} is expressed as a function of the approximate
solution and a vector of correction factors \(\{\Delta X\}\), where
\[ \begin{aligned}
x_1 &= \tilde{x}_1 + \Delta x_1 \\
x_2 &= \tilde{x}_2 + \Delta x_2 \\
x_3 &= \tilde{x}_3 + \Delta x_3
\end{aligned} \tag{10.29} \]
If these results are substituted into Eq. (10.27), the following system results:
\[ \begin{aligned}
a_{11}(\tilde{x}_1 + \Delta x_1) + a_{12}(\tilde{x}_2 + \Delta x_2) + a_{13}(\tilde{x}_3 + \Delta x_3) &= b_1 \\
a_{21}(\tilde{x}_1 + \Delta x_1) + a_{22}(\tilde{x}_2 + \Delta x_2) + a_{23}(\tilde{x}_3 + \Delta x_3) &= b_2 \\
a_{31}(\tilde{x}_1 + \Delta x_1) + a_{32}(\tilde{x}_2 + \Delta x_2) + a_{33}(\tilde{x}_3 + \Delta x_3) &= b_3
\end{aligned} \tag{10.30} \]
Now, Eq. (10.28) can be subtracted from Eq. (10.30) to yield
\[ \begin{aligned}
a_{11}\Delta x_1 + a_{12}\Delta x_2 + a_{13}\Delta x_3 &= b_1 - \tilde{b}_1 = E_1 \\
a_{21}\Delta x_1 + a_{22}\Delta x_2 + a_{23}\Delta x_3 &= b_2 - \tilde{b}_2 = E_2 \\
a_{31}\Delta x_1 + a_{32}\Delta x_2 + a_{33}\Delta x_3 &= b_3 - \tilde{b}_3 = E_3
\end{aligned} \tag{10.31} \]
This system itself is a set of simultaneous linear equations that can be solved to obtain
the correction factors. The factors can then be applied to improve the solution, as specified
by Eq. (10.29).
It is relatively straightforward to integrate an iterative refinement procedure into computer
programs for elimination methods. It is especially effective for the LU decomposition
approaches described earlier, which are designed to evaluate different right-hand-side vectors
efficiently. Note that to be effective for correcting ill-conditioned systems, the E’s in
Eq. (10.31) must be expressed in double precision.
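One refinement pass, as laid out in Eqs. (10.28) through (10.31), can be sketched as follows. This is a minimal illustration with several assumptions: the helper `solve` is a plain Gauss elimination with partial pivoting standing in for an LU-based substitution, the right-hand side `b` and the rough starting vector `x_rough` are illustrative values, and Python floats are already double precision, so the residuals automatically meet the precision requirement noted above.

```python
def solve(a, b):
    """Gauss elimination with partial pivoting (stand-in for an LU solve)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        m[k], m[p] = m[p], m[k]
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                   # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def refine(a, b, x_approx):
    """Apply one correction step to an approximate solution."""
    n = len(b)
    # residuals E_i = b_i - b~_i, the right-hand side of Eq. (10.31)
    e = [b[i] - sum(a[i][j] * x_approx[j] for j in range(n)) for i in range(n)]
    dx = solve(a, e)                                 # correction factors
    return [x_approx[i] + dx[i] for i in range(n)]   # Eq. (10.29)


# Coefficient matrix from Example 10.3 with an illustrative right-hand side
a = [[3.0, -0.1, -0.2], [0.1, 7.0, -0.3], [0.3, -0.2, 10.0]]
b = [7.85, -19.3, 71.4]
x_rough = [2.9, -2.4, 7.1]      # hypothetical approximate solution
x_better = refine(a, b, x_rough)
```

In a program built on LU decomposition, `solve` would be replaced by the substitution step alone, because the factors of [A] are already available from the original solution.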
PROBLEMS
10.1 Use the rules of matrix multiplication to prove that Eqs. (10.7)
and (10.8) follow from Eq. (10.6).
10.2 (a) Use naive Gauss elimination to decompose the following
system according to the description in Sec. 10.1.2.
\[ \begin{aligned}
10x_1 + 2x_2 - x_3 &= 27 \\
-3x_1 - 6x_2 + 2x_3 &= -61.5 \\
x_1 + x_2 + 5x_3 &= -21.5
\end{aligned} \]
Then, multiply the resulting [L] and [U] matrices to determine that
[A] is produced. (b) Use LU decomposition to solve the system.
Show all the steps in the computation. (c) Also solve the system for
an alternative right-hand-side vector: \(\{B\}^T = \lfloor 12 \;\; 18 \;\; -6 \rfloor\).
10.3
(a) Solve the following system of equations by LU decomposition
without pivoting
\[ \begin{aligned}
8x_1 + 4x_2 - x_3 &= 11 \\
-2x_1 + 5x_2 + x_3 &= 4 \\
2x_1 - x_2 + 6x_3 &= 7
\end{aligned} \]
(b) Determine the matrix inverse. Check your results by verifying
that \([A][A]^{-1} = [I]\).
10.4 Solve the following system of equations using LU decomposition
with partial pivoting:
\[ \begin{aligned}
2x_1 - 6x_2 - x_3 &= -38 \\
-3x_1 - x_2 + 7x_3 &= -34 \\
-8x_1 + x_2 - 2x_3 &= -20
\end{aligned} \]
10.5 Determine the total flops as a function of the number of
equations n for the (a) decomposition, (b) forward-substitution,
and (c) back-substitution phases of the LU decomposition version
of Gauss elimination.
10.6 Use LU decomposition to determine the matrix inverse for the
following system. Do not use a pivoting strategy, and check your
results by verifying that \([A][A]^{-1} = [I]\).
\[ \begin{aligned}
10x_1 + 2x_2 - x_3 &= 27 \\
-3x_1 - 6x_2 + 2x_3 &= -61.5 \\
x_1 + x_2 + 5x_3 &= -21.5
\end{aligned} \]
10.7 Perform Crout decomposition on
\[ \begin{aligned}
2x_1 - 5x_2 + x_3 &= 12 \\
-x_1 + 3x_2 - x_3 &= -8 \\
3x_1 - 4x_2 + 2x_3 &= 16
\end{aligned} \]
Then, multiply the resulting [L] and [U] matrices to determine that
[A] is produced.
10.8 The following system of equations is designed to determine
concentrations (the c’s in g/m^3) in a series of coupled reactors as a
function of the amount of mass input to each reactor (the right-hand
sides in g/day),
\[ \begin{aligned}
15c_1 - 3c_2 - c_3 &= 3800 \\
-3c_1 + 18c_2 - 6c_3 &= 1200 \\
-4c_1 - c_2 + 12c_3 &= 2350
\end{aligned} \]
(a) Determine the matrix inverse.
(b) Use the inverse to determine the solution.
(c) Determine how much the rate of mass input to reactor 3 must be
increased to induce a 10 g/m^3 rise in the concentration of reactor 1.
(d) How much will the concentration in reactor 3 be reduced if the
rate of mass input to reactors 1 and 2 is reduced by 500 and
250 g/day, respectively?
10.9 Solve the following set of equations with LU decomposition:
\[ \begin{aligned}
3x_1 - 2x_2 + x_3 &= -10 \\
2x_1 + 6x_2 - 4x_3 &= 44 \\
-x_1 - 2x_2 + 5x_3 &= -26
\end{aligned} \]
10.10 (a) Determine the LU decomposition without pivoting by
hand for the following matrix and check your results by validating
that [L][U] = [A].
\[ \begin{bmatrix} 8 & 2 & 1 \\ 3 & 7 & 2 \\ 2 & 3 & 9 \end{bmatrix} \]
(b) Employ the result of (a) to compute the determinant.
(c) Repeat (a) and (b) using MATLAB.
10.11 Use the following LU decomposition to (a) compute the determinant
and (b) solve [A]{x} = {b} with \(\{b\}^T = \lfloor -10 \;\; 44 \;\; -26 \rfloor\).
\[ [A] = [L][U] = \begin{bmatrix} 1 & & \\ 0.6667 & 1 & \\ -0.3333 & -0.3636 & 1 \end{bmatrix} \begin{bmatrix} 3 & -2 & 1 \\ & 7.3333 & -4.6667 \\ & & 3.6364 \end{bmatrix} \]
10.12 Determine \(\|A\|_e\), \(\|A\|_1\), and \(\|A\|_\infty\) for
\[ [A] = \begin{bmatrix} 8 & 2 & -10 \\ -9 & 1 & 3 \\ 15 & -1 & 6 \end{bmatrix} \]
Scale the matrix by making the maximum element in each row
equal to one.
10.13 Determine the Frobenius and the row-sum norms for the
systems in Probs. 10.3 and 10.4. Scale the matrices by making the
maximum element in each row equal to one.
10.14 A matrix [A] is defined as
\[ [A] = \begin{bmatrix}
0.125 & 0.25 & 0.5 & 1 \\
0.015625 & 0.625 & 0.25 & 1 \\
0.00463 & 0.02777 & 0.16667 & 1 \\
0.001953 & 0.015625 & 0.125 & 1
\end{bmatrix} \]
Using the column-sum norm, compute the condition number and
how many suspect digits would be generated by this matrix.
10.15 (a) Determine the condition number for the following
system using the row-sum norm. Do not normalize the system.
\[ \begin{bmatrix}
1 & 4 & 9 & 16 & 25 \\
4 & 9 & 16 & 25 & 36 \\
9 & 16 & 25 & 36 & 49 \\
16 & 25 & 36 & 49 & 64 \\
25 & 36 & 49 & 64 & 81
\end{bmatrix} \]
How many digits of precision will be lost due to ill-conditioning?
(b) Repeat (a), but scale the matrix by making the maximum element
in each row equal to one.
10.16 Determine the condition number based on the row-sum
norm for the normalized 5 × 5 Hilbert matrix. How many significant
digits of precision will be lost due to ill-conditioning?
10.17 Besides the Hilbert matrix, there are other matrices that are
inherently ill-conditioned. One such case is the Vandermonde
matrix, which has the following form:
\[ \begin{bmatrix} x_1^2 & x_1 & 1 \\ x_2^2 & x_2 & 1 \\ x_3^2 & x_3 & 1 \end{bmatrix} \]
(a) Determine the condition number based on the row-sum norm
for the case where \(x_1 = 4\), \(x_2 = 2\), and \(x_3 = 7\).
(b) Use MATLAB or Mathcad software to compute the spectral
and Frobenius condition numbers.
10.18 Develop a user-friendly program for LU decomposition
based on the pseudocode from Fig. 10.2.
10.19 Develop a user-friendly program for LU decomposition, including
the capability to evaluate the matrix inverse. Base the program
on Figs. 10.2 and 10.5.
10.20 Use iterative refinement techniques to improve \(x_1 = 2\),
\(x_2 = -3\), and \(x_3 = 8\), which are approximate solutions of
\[ \begin{aligned}
2x_1 + 5x_2 + x_3 &= -5 \\
5x_1 + 2x_2 + x_3 &= 12 \\
x_1 + 2x_2 + x_3 &= 3
\end{aligned} \]
10.21 Consider vectors:
\[ \begin{aligned}
\vec{A} &= 2\vec{i} - 3\vec{j} + a\vec{k} \\
\vec{B} &= b\vec{i} + \vec{j} - 4\vec{k} \\
\vec{C} &= 3\vec{i} + c\vec{j} + 2\vec{k}
\end{aligned} \]
Vector \(\vec{A}\) is perpendicular to \(\vec{B}\) as well as to \(\vec{C}\). It is also known
that \(\vec{B} \cdot \vec{C} = 2\). Use any method studied in this chapter to solve for
the three unknowns, a, b, and c.
10.22 Consider the following vectors:
\[ \begin{aligned}
\vec{A} &= a\vec{i} + b\vec{j} + c\vec{k} \\
\vec{B} &= -2\vec{i} + \vec{j} - 4\vec{k} \\
\vec{C} &= \vec{i} + 3\vec{j} + 2\vec{k}
\end{aligned} \]
where \(\vec{A}\) is an unknown vector. If
\[ (\vec{A} \times \vec{B}) + (\vec{A} \times \vec{C}) = (5a + 6)\vec{i} + (3b - 2)\vec{j} + (-4c + 1)\vec{k} \]
use any method learned in this chapter to solve for the three unknowns,
a, b, and c.
10.23 Let the function be defined on the interval [0, 2] as follows:
\[ f(x) = \begin{cases} ax + b, & 0 \le x \le 1 \\ cx + d, & 1 \le x \le 2 \end{cases} \]
Determine the constants a, b, c, and d so that the function f satisfies
the following:
f(0) = f(2) = 1.
f is continuous on the entire interval.
a + b = 4.
Derive and solve a system of linear algebraic equations with a matrix
form identical to Eq. (10.1).
10.24
(a) Create a 3 × 3 Hilbert matrix. This will be your matrix [A].
Multiply the matrix by the column vector \(\{x\} = [1, 1, 1]^T\). The
solution of [A]{x} will be another column vector {b}. Using
any numerical package and Gauss elimination, find the solution
to [A]{x} = {b} using the Hilbert matrix and the vector {b} that
you calculated. Compare the result to your known {x} vector.
Use sufficient precision in displaying results to allow you to
detect imprecision.
(b) Repeat part (a) using a 7 × 7 Hilbert matrix.
(c) Repeat part (a) using a 10 × 10 Hilbert matrix.
10.25 Polynomial interpolation consists of determining the unique
(n − 1)th-order polynomial that fits n data points. Such polynomials
have the general form,
\[ f(x) = p_1 x^{n-1} + p_2 x^{n-2} + \cdots + p_{n-1} x + p_n \tag{P10.25} \]
where the p’s are constant coefficients. A straightforward way for
computing the coefficients is to generate n linear algebraic equations
that we can solve simultaneously for the coefficients. Suppose that
we want to determine the coefficients of the fourth-order polynomial
\(f(x) = p_1 x^4 + p_2 x^3 + p_3 x^2 + p_4 x + p_5\) that passes through the
following five points: (200, 0.746), (250, 0.675), (300, 0.616), (400,
0.525), and (500, 0.457). Each of these pairs can be substituted into
Eq. (P10.25) to yield a system of five equations with five unknowns
(the p’s). Use this approach to solve for the coefficients. In addition,
determine and interpret the condition number.
CHAPTER 11
Special Matrices and Gauss-Seidel
Certain matrices have a particular structure that can be exploited to develop efficient
solution schemes. The first part of this chapter is devoted to two such systems: banded
and symmetric matrices. Efficient elimination methods are described for both.
The second part of the chapter turns to an alternative to elimination methods, that
is, approximate, iterative methods. The focus is on the Gauss-Seidel method, which
employs initial guesses and then iterates to obtain refined estimates of the solution. The
Gauss-Seidel method is particularly well suited for large numbers of equations. In these
cases, elimination methods can be subject to round-off errors. Because the error of the
Gauss-Seidel method is controlled by the number of iterations, round-off error is not an
issue of concern with this method. However, there are certain instances where the Gauss-
Seidel technique will not converge on the correct answer. These and other trade-offs
between elimination and iterative methods will be discussed in subsequent pages.
11.1 SPECIAL MATRICES
As mentioned in Box PT3.1, a banded matrix is a square matrix that has all elements
equal to zero, with the exception of a band centered on the main diagonal. Banded systems
are frequently encountered in engineering and scientific practice. For example, they
typically occur in the solution of differential equations. In addition, other numerical
methods such as cubic splines (Sec. 18.5) involve the solution of banded systems.
The dimensions of a banded system can be quantified by two parameters: the bandwidth
BW and the half-bandwidth HBW (Fig. 11.1). These two values are related by
BW = 2HBW + 1. In general, then, a banded system is one for which \(a_{ij} = 0\) if
\(|i - j| > \text{HBW}\).
Although Gauss elimination or conventional LU decomposition can be employed to
solve banded equations, they are inefficient, because if pivoting is unnecessary none of
the elements outside the band would change from their original values of zero. Thus,
unnecessary space and time would be expended on the storage and manipulation of these
useless zeros. If it is known beforehand that pivoting is unnecessary, very efficient algorithms
can be developed that do not involve the zero elements outside the band. Because
many problems involving banded systems do not require pivoting, these alternative algorithms,
as described next, are the methods of choice.
11.1.1 Tridiagonal Systems
A tridiagonal system (that is, one with a bandwidth of 3) can be expressed generally as
\[ \begin{bmatrix}
f_1 & g_1 & & & & \\
e_2 & f_2 & g_2 & & & \\
 & e_3 & f_3 & g_3 & & \\
 & & \ddots & \ddots & \ddots & \\
 & & & e_{n-1} & f_{n-1} & g_{n-1} \\
 & & & & e_n & f_n
\end{bmatrix}
\begin{Bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_{n-1} \\ x_n \end{Bmatrix}
=
\begin{Bmatrix} r_1 \\ r_2 \\ r_3 \\ \vdots \\ r_{n-1} \\ r_n \end{Bmatrix} \tag{11.1} \]
Notice that we have changed our notation for the coefficients from a’s and b’s to e’s, f’s,
g’s, and r’s. This was done to avoid storing large numbers of useless zeros in the square
matrix of a’s. This space-saving modification is advantageous because the resulting algorithm
requires less computer memory.
Figure 11.2 shows pseudocode for an efficient method, called the Thomas algorithm,
to solve Eq. (11.1). As with conventional LU decomposition, the algorithm consists of
three steps: decomposition and forward and back substitution. Thus, all the advantages
of LU decomposition, such as convenient evaluation of multiple right-hand-side vectors
and the matrix inverse, can be accomplished by proper application of this algorithm.
EXAMPLE 11.1 Tridiagonal Solution with the Thomas Algorithm
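Problem Statement. Solve the following tridiagonal system with the Thomas algorithm.
\[ \begin{bmatrix}
2.04 & -1 & & \\
-1 & 2.04 & -1 & \\
 & -1 & 2.04 & -1 \\
 & & -1 & 2.04
\end{bmatrix}
\begin{Bmatrix} T_1 \\ T_2 \\ T_3 \\ T_4 \end{Bmatrix}
=
\begin{Bmatrix} 40.8 \\ 0.8 \\ 0.8 \\ 200.8 \end{Bmatrix} \]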
FIGURE 11.1
Parameters used to quantify the dimensions of a banded system. BW and HBW designate the
bandwidth and the half-bandwidth, respectively.
FIGURE 11.2
Pseudocode to implement the Thomas algorithm, an LU decomposition method for
tridiagonal systems.
(a) Decomposition
DOFOR k = 2, n
  e_k = e_k / f_(k-1)
  f_k = f_k - e_k * g_(k-1)
END DO
(b) Forward substitution
DOFOR k = 2, n
  r_k = r_k - e_k * r_(k-1)
END DO
(c) Back substitution
x_n = r_n / f_n
DOFOR k = n - 1, 1, -1
  x_k = (r_k - g_k * x_(k+1)) / f_k
END DO
Solution. First, the decomposition is implemented as
\[ \begin{aligned}
e_2 &= -1/2.04 = -0.49 \\
f_2 &= 2.04 - (-0.49)(-1) = 1.550 \\
e_3 &= -1/1.550 = -0.645 \\
f_3 &= 2.04 - (-0.645)(-1) = 1.395 \\
e_4 &= -1/1.395 = -0.717 \\
f_4 &= 2.04 - (-0.717)(-1) = 1.323
\end{aligned} \]
Thus, the matrix has been transformed to
\[ \begin{bmatrix}
2.04 & -1 & & \\
-0.49 & 1.550 & -1 & \\
 & -0.645 & 1.395 & -1 \\
 & & -0.717 & 1.323
\end{bmatrix} \]
and the LU decomposition is
\[ [A] = [L][U] = \begin{bmatrix}
1 & & & \\
-0.49 & 1 & & \\
 & -0.645 & 1 & \\
 & & -0.717 & 1
\end{bmatrix}
\begin{bmatrix}
2.04 & -1 & & \\
 & 1.550 & -1 & \\
 & & 1.395 & -1 \\
 & & & 1.323
\end{bmatrix} \]
You can verify that this is correct by multiplying [L][U] to yield [A].
The forward substitution is implemented as
\[ \begin{aligned}
r_2 &= 0.8 - (-0.49)40.8 = 20.8 \\
r_3 &= 0.8 - (-0.645)20.8 = 14.221 \\
r_4 &= 200.8 - (-0.717)14.221 = 210.996
\end{aligned} \]
Thus, the right-hand-side vector has been modified to
\[ \begin{Bmatrix} 40.8 \\ 20.8 \\ 14.221 \\ 210.996 \end{Bmatrix} \]
which then can be used in conjunction with the [U] matrix to perform back substitution
and obtain the solution
\[ \begin{aligned}
T_4 &= 210.996/1.323 = 159.480 \\
T_3 &= [14.221 - (-1)159.48]/1.395 = 124.538 \\
T_2 &= [20.800 - (-1)124.538]/1.550 = 93.778 \\
T_1 &= [40.800 - (-1)93.778]/2.040 = 65.970
\end{aligned} \]
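The three phases of the Thomas algorithm in Fig. 11.2 can be sketched in Python as shown below. This is an illustrative version, not the book's program: the function name `thomas` is hypothetical, indices are zero-based, and the unused entries `e[0]` and `g[-1]` are padding so that all four lists have length n. No pivoting is performed, so a zero on the diagonal would cause a division error.

```python
def thomas(e, f, g, r):
    """Solve a tridiagonal system with sub-, main, and super-diagonals e, f, g.

    e[0] and g[-1] are unused padding. The inputs are copied so the
    caller's lists are left unchanged.
    """
    n = len(f)
    e, f, r = e[:], f[:], r[:]
    # (a) decomposition: store the multipliers in e, update the diagonal f
    for k in range(1, n):
        e[k] = e[k] / f[k - 1]
        f[k] = f[k] - e[k] * g[k - 1]
    # (b) forward substitution on the right-hand side
    for k in range(1, n):
        r[k] = r[k] - e[k] * r[k - 1]
    # (c) back substitution
    x = [0.0] * n
    x[n - 1] = r[n - 1] / f[n - 1]
    for k in range(n - 2, -1, -1):
        x[k] = (r[k] - g[k] * x[k + 1]) / f[k]
    return x


# System from Example 11.1
temps = thomas(
    [0.0, -1.0, -1.0, -1.0],        # e: subdiagonal (first entry unused)
    [2.04, 2.04, 2.04, 2.04],       # f: main diagonal
    [-1.0, -1.0, -1.0, 0.0],        # g: superdiagonal (last entry unused)
    [40.8, 0.8, 0.8, 200.8],        # r: right-hand side
)
```

Keeping the decomposition loop separate from the forward-substitution loop mirrors Fig. 11.2 and makes it possible to reuse the factored diagonals for additional right-hand-side vectors, although this sketch recomputes them on each call for simplicity.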
11.1.2 Cholesky Decomposition
Recall from Box PT3.1 that a symmetric matrix is one where \(a_{ij} = a_{ji}\) for all i and j. In
other words, \([A] = [A]^T\). Such systems occur commonly in both mathematical and
engineering problem contexts. They offer computational advantages because only half
the storage is needed and, in most cases, only half the computation time is required for
their solution.
One of the most popular approaches involves Cholesky decomposition. This algorithm
is based on the fact that a symmetric matrix can be decomposed, as in
\[ [A] = [L][L]^T \tag{11.2} \]
That is, the resulting triangular factors are the transpose of each other.
The terms of Eq. (11.2) can be multiplied out and set equal to each other. The result
can be expressed simply by recurrence relations. For the kth row,
\[ l_{ki} = \frac{a_{ki} - \sum_{j=1}^{i-1} l_{ij} l_{kj}}{l_{ii}} \qquad \text{for } i = 1, 2, \ldots, k - 1 \tag{11.3} \]
and
\[ l_{kk} = \sqrt{a_{kk} - \sum_{j=1}^{k-1} l_{kj}^2} \tag{11.4} \]
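The recurrence relations (11.3) and (11.4) can be sketched directly in Python. The version below is illustrative: the function name `cholesky` is hypothetical, indices are zero-based, the factor is returned as a separate lower-triangular matrix instead of overwriting [A], and the explicit check on the quantity under the square root is an added safeguard for matrices that are not positive definite.

```python
import math


def cholesky(a):
    """Return lower-triangular [L] such that [A] = [L][L]^T."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for k in range(n):
        # Eq. (11.3): off-diagonal terms of row k
        for i in range(k):
            s = sum(l[i][j] * l[k][j] for j in range(i))
            l[k][i] = (a[k][i] - s) / l[i][i]
        # Eq. (11.4): diagonal term of row k
        s = sum(l[k][j] ** 2 for j in range(k))
        radicand = a[k][k] - s
        if radicand <= 0.0:
            raise ValueError("matrix is not positive definite")
        l[k][k] = math.sqrt(radicand)
    return l


# Symmetric matrix from Example 11.2
a = [[6.0, 15.0, 55.0], [15.0, 55.0, 225.0], [55.0, 225.0, 979.0]]
l_factor = cholesky(a)
```

Only the lower triangle of [A] is ever read, which reflects the storage saving available for symmetric systems.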
EXAMPLE 11.2 Cholesky Decomposition
Problem Statement. Apply Cholesky decomposition to the symmetric matrix
\[ [A] = \begin{bmatrix} 6 & 15 & 55 \\ 15 & 55 & 225 \\ 55 & 225 & 979 \end{bmatrix} \]
Solution. For the first row (k = 1), Eq. (11.3) is skipped and Eq. (11.4) is employed
to compute
\[ l_{11} = \sqrt{a_{11}} = \sqrt{6} = 2.4495 \]
For the second row (k = 2), Eq. (11.3) gives
\[ l_{21} = \frac{a_{21}}{l_{11}} = \frac{15}{2.4495} = 6.1237 \]
and Eq. (11.4) yields
\[ l_{22} = \sqrt{a_{22} - l_{21}^2} = \sqrt{55 - (6.1237)^2} = 4.1833 \]
For the third row (k = 3), Eq. (11.3) gives (i = 1)
\[ l_{31} = \frac{a_{31}}{l_{11}} = \frac{55}{2.4495} = 22.454 \]
and (i = 2)
\[ l_{32} = \frac{a_{32} - l_{21} l_{31}}{l_{22}} = \frac{225 - 6.1237(22.454)}{4.1833} = 20.917 \]
and Eq. (11.4) yields
\[ l_{33} = \sqrt{a_{33} - l_{31}^2 - l_{32}^2} = \sqrt{979 - (22.454)^2 - (20.917)^2} = 6.1101 \]
Thus, the Cholesky decomposition yields
\[ [L] = \begin{bmatrix} 2.4495 & & \\ 6.1237 & 4.1833 & \\ 22.454 & 20.917 & 6.1101 \end{bmatrix} \]
The validity of this decomposition can be verified by substituting it and its transpose
into Eq. (11.2) to see if their product yields the original matrix [A]. This is left for an
exercise.
Figure 11.3 presents pseudocode for implementing the Cholesky decomposition algorithm.
It should be noted that the algorithm in Fig. 11.3 could result in an execution
error if the evaluation of \(a_{kk}\) involves taking the square root of a negative number. However,
for cases where the matrix is positive definite,¹ this will never occur. Because many
symmetric matrices dealt with in engineering are, in fact, positive definite, the Cholesky
algorithm has wide application. Another benefit of dealing with positive definite symmetric
matrices is that pivoting is not required to avoid division by zero. Thus, we can
implement the algorithm in Fig. 11.3 without the complication of pivoting.
¹ A positive definite matrix is one for which the product \(\{X\}^T[A]\{X\}\) is greater than zero for all nonzero
vectors {X}.
FIGURE 11.3
Pseudocode for Cholesky’s LU decomposition algorithm.
DOFOR k = 1, n
  DOFOR i = 1, k - 1
    sum = 0.
    DOFOR j = 1, i - 1
      sum = sum + a_ij * a_kj
    END DO
    a_ki = (a_ki - sum) / a_ii
  END DO
  sum = 0.
  DOFOR j = 1, k - 1
    sum = sum + (a_kj)^2
  END DO
  a_kk = SQRT(a_kk - sum)
END DO
11.2 GAUSS-SEIDEL
Iterative or approximate methods provide an alternative to the elimination methods described
to this point. Such approaches are similar to the techniques we developed to
obtain the roots of a single equation in Chap. 6. Those approaches consisted of guessing
a value and then using a systematic method to obtain a refined estimate of the root.
Because the present part of the book deals with a similar problem (obtaining the values
that simultaneously satisfy a set of equations), we might suspect that such approximate
methods could be useful in this context.
The Gauss-Seidel method is the most commonly used iterative method. Assume that
we are given a set of n equations:
\[ [A]\{X\} = \{B\} \]
Suppose that for conciseness we limit ourselves to a 3 × 3 set of equations. If the diagonal
elements are all nonzero, the first equation can be solved for \(x_1\), the second for
\(x_2\), and the third for \(x_3\) to yield
\[ x_1 = \frac{b_1 - a_{12}x_2 - a_{13}x_3}{a_{11}} \tag{11.5a} \]
\[ x_2 = \frac{b_2 - a_{21}x_1 - a_{23}x_3}{a_{22}} \tag{11.5b} \]
\[ x_3 = \frac{b_3 - a_{31}x_1 - a_{32}x_2}{a_{33}} \tag{11.5c} \]
Now, we can start the solution process by choosing guesses for the x’s. A simple
way to obtain initial guesses is to assume that they are all zero. These zeros can be
substituted into Eq. (11.5a), which can be used to calculate a new value for $x_1 = b_1/a_{11}$.
Then, we substitute this new value of x1 along with the previous guess of zero for x3
into Eq. (11.5b) to compute a new value for x2. The process is repeated for Eq. (11.5c)
to calculate a new estimate for x3. Then we return to the first equation and repeat the
entire procedure until our solution converges closely enough to the true values. Conver-
gence can be checked using the criterion [recall Eq. (3.5)]
$$|\varepsilon_{a,i}| = \left| \frac{x_i^{j} - x_i^{j-1}}{x_i^{j}} \right| 100\% < \varepsilon_s \tag{11.6}$$
for all $i$, where $j$ and $j - 1$ are the present and previous iterations.
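The procedure just described, together with the stopping test of Eq. (11.6), can be sketched as follows. This is an illustrative Python version under stated assumptions: the name `gauss_seidel`, the default tolerance `es` (in percent), and the iteration cap `max_iter` are choices made here, not part of the text. The system used is the one from Example 11.3.

```python
def gauss_seidel(a, b, es=1e-4, max_iter=50):
    """Gauss-Seidel iteration from zero initial guesses.

    Stops when every approximate percent relative error, Eq. (11.6),
    falls below `es`, or after `max_iter` sweeps. Returns (x, sweeps).
    """
    n = len(b)
    x = [0.0] * n
    for it in range(1, max_iter + 1):
        ea_max = 0.0
        for i in range(n):
            old = x[i]
            # Solve equation i for x_i using the latest available values
            s = b[i] - sum(a[i][j] * x[j] for j in range(n) if j != i)
            x[i] = s / a[i][i]
            if x[i] != 0.0:
                ea_max = max(ea_max, abs((x[i] - old) / x[i]) * 100.0)
        if ea_max < es:
            return x, it
    return x, max_iter


A = [[3.0, -0.1, -0.2],
     [0.1, 7.0, -0.3],
     [0.3, -0.2, 10.0]]
b = [7.85, -19.3, 71.4]

first, _ = gauss_seidel(A, b, max_iter=1)   # values after one sweep
x, sweeps = gauss_seidel(A, b)              # iterate to the tolerance
print(first, x, sweeps)
```

Limiting the run to a single sweep is a simple way to compare an implementation against hand calculations before letting it iterate to convergence.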
EXAMPLE 11.3 Gauss-Seidel Method
Problem Statement. Use the Gauss-Seidel method to obtain the solution of the same
system used in Example 10.2:
$$\begin{aligned} 3x_1 - 0.1x_2 - 0.2x_3 &= 7.85 \\ 0.1x_1 + 7x_2 - 0.3x_3 &= -19.3 \\ 0.3x_1 - 0.2x_2 + 10x_3 &= 71.4 \end{aligned}$$
Recall that the true solution is $x_1 = 3$, $x_2 = -2.5$, and $x_3 = 7$.
Solution. First, solve each of the equations for its unknown on the diagonal.
$$x_1 = \frac{7.85 + 0.1x_2 + 0.2x_3}{3} \tag{E11.3.1}$$
$$x_2 = \frac{-19.3 - 0.1x_1 + 0.3x_3}{7} \tag{E11.3.2}$$
$$x_3 = \frac{71.4 - 0.3x_1 + 0.2x_2}{10} \tag{E11.3.3}$$
By assuming that $x_2$ and $x_3$ are zero, Eq. (E11.3.1) can be used to compute
$$x_1 = \frac{7.85 + 0 + 0}{3} = 2.616667$$
This value, along with the assumed value of $x_3 = 0$, can be substituted into Eq. (E11.3.2) to calculate
$$x_2 = \frac{-19.3 - 0.1(2.616667) + 0}{7} = -2.794524$$
The first iteration is completed by substituting the calculated values for $x_1$ and $x_2$ into Eq. (E11.3.3) to yield
$$x_3 = \frac{71.4 - 0.3(2.616667) + 0.2(-2.794524)}{10} = 7.005610$$
For the second iteration, the same process is repeated to compute
$$x_1 = \frac{7.85 + 0.1(-2.794524) + 0.2(7.005610)}{3} = 2.990557 \qquad |\varepsilon_t| = 0.31\%$$
$$x_2 = \frac{-19.3 - 0.1(2.990557) + 0.3(7.005610)}{7} = -2.499625 \qquad |\varepsilon_t| = 0.015\%$$
$$x_3 = \frac{71.4 - 0.3(2.990557) + 0.2(-2.499625)}{10} = 7.000291 \qquad |\varepsilon_t| = 0.0042\%$$
The method is, therefore, converging on the true solution. Additional iterations could be
applied to improve the answers. However, in an actual problem, we would not know the
true answer a priori. Consequently, Eq. (11.6) provides a means to estimate the error.
For example, for $x_1$,
$$|\varepsilon_{a,1}| = \left| \frac{2.990557 - 2.616667}{2.990557} \right| 100\% = 12.5\%$$
For $x_2$ and $x_3$, the error estimates are $|\varepsilon_{a,2}| = 11.8\%$ and $|\varepsilon_{a,3}| = 0.076\%$. Note that, as was the case when determining roots of a single equation, formulations such as Eq. (11.6) usually provide a conservative appraisal of convergence. Thus, when they are met, they ensure that the result is known to at least the tolerance specified by $\varepsilon_s$.
As each new x value is computed for the Gauss-Seidel method, it is immediately
used in the next equation to determine another x value. Thus, if the solution is converg-
ing, the best available estimates will be employed. An alternative approach, called Jacobi
iteration, utilizes a somewhat different tactic. Rather than using the latest available x’s,
this technique uses Eq. (11.5) to compute a set of new x’s on the basis of a set of old
x’s. Thus, as new values are generated, they are not immediately used but rather are
retained for the next iteration.
The difference between the Gauss-Seidel method and Jacobi iteration is depicted in
Fig. 11.4. Although there are certain cases where the Jacobi method is useful, Gauss-
Seidel’s utilization of the best available estimates usually makes it the method of preference.
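The distinction between the two methods comes down to whether a sweep updates the solution vector in place. The sketch below contrasts single sweeps of each method; the function names `jacobi_sweep` and `gauss_seidel_sweep` are hypothetical, and the system is again the one from Example 11.3.

```python
def jacobi_sweep(a, b, x):
    """One Jacobi sweep: every new value is computed from the old vector x."""
    n = len(b)
    return [
        (b[i] - sum(a[i][j] * x[j] for j in range(n) if j != i)) / a[i][i]
        for i in range(n)
    ]


def gauss_seidel_sweep(a, b, x):
    """One Gauss-Seidel sweep: each new value is used as soon as it is computed."""
    x = list(x)  # work on a copy so the caller's vector is untouched
    n = len(b)
    for i in range(n):
        x[i] = (b[i] - sum(a[i][j] * x[j] for j in range(n) if j != i)) / a[i][i]
    return x


A = [[3.0, -0.1, -0.2],
     [0.1, 7.0, -0.3],
     [0.3, -0.2, 10.0]]
b = [7.85, -19.3, 71.4]
x0 = [0.0, 0.0, 0.0]

print(jacobi_sweep(A, b, x0))        # x2 and x3 still see x1 = 0
print(gauss_seidel_sweep(A, b, x0))  # x2 and x3 see the updated x1
```

Starting from zeros, the two sweeps agree on the first unknown and differ on the later ones, because only Gauss-Seidel feeds the new first value into the remaining equations.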
11.2.1 Convergence Criterion for the Gauss-Seidel Method
Note that the Gauss-Seidel method is similar in spirit to the technique of simple fixed-
point iteration that was used in Sec. 6.1 to solve for the roots of a single equation.
Recall that simple fixed-point iteration had two fundamental problems: (1) it was some-
times nonconvergent and (2) when it converged, it often did so very slowly. The Gauss-
Seidel method can also exhibit these shortcomings.
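Convergence criteria can be developed by recalling from Sec. 6.5.1 that sufficient conditions for convergence of two nonlinear equations, $u(x, y)$ and $v(x, y)$, are
$$\left| \frac{\partial u}{\partial x} \right| + \left| \frac{\partial u}{\partial y} \right| < 1 \tag{11.7a}$$
and
$$\left| \frac{\partial v}{\partial x} \right| + \left| \frac{\partial v}{\partial y} \right| < 1 \tag{11.7b}$$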
FIGURE 11.4
Graphical depiction of the difference between (a) the Gauss-Seidel and (b) the Jacobi iterative methods for solving simultaneous linear algebraic equations.
[Figure: panels (a) and (b) each list Eqs. (11.5a) through (11.5c) for a first and a second iteration, with arrows tracing where each newly computed x is used.]
These criteria also apply to linear equations of the sort we are solving with the Gauss-Seidel method. For example, in the case of two simultaneous equations, the Gauss-Seidel algorithm [Eq. (11.5)] can be expressed as
$$u(x_1, x_2) = \frac{b_1}{a_{11}} - \frac{a_{12}}{a_{11}} x_2 \tag{11.8a}$$
and
$$v(x_1, x_2) = \frac{b_2}{a_{22}} - \frac{a_{21}}{a_{22}} x_1 \tag{11.8b}$$
The partial derivatives of these equations can be evaluated with respect to each of the unknowns as
$$\frac{\partial u}{\partial x_1} = 0 \qquad \frac{\partial u}{\partial x_2} = -\frac{a_{12}}{a_{11}}$$
and
$$\frac{\partial v}{\partial x_1} = -\frac{a_{21}}{a_{22}} \qquad \frac{\partial v}{\partial x_2} = 0$$
which can be substituted into Eq. (11.7) to give
$$\left| \frac{a_{12}}{a_{11}} \right| < 1 \tag{11.9a}$$
and
$$\left| \frac{a_{21}}{a_{22}} \right| < 1 \tag{11.9b}$$
In other words, the absolute values of the slopes of Eq. (11.8) must be less than unity to ensure convergence. This is displayed graphically in Fig. 11.5. Equation (11.9) can also be reformulated as
$$|a_{11}| > |a_{12}|$$
and
$$|a_{22}| > |a_{21}|$$
That is, the diagonal element must be greater than the off-diagonal element for each row.
FIGURE 11.5
Iteration cobwebs illustrating (a) convergence and (b) divergence of the Gauss-Seidel method. Notice that the same functions are plotted in both cases ($u$: $11x_1 + 13x_2 = 286$; $v$: $11x_1 - 9x_2 = 99$). Thus, the order in which the equations are implemented (as depicted by the direction of the first arrow from the origin) dictates whether the computation converges.
[Figure: panels (a) and (b) each plot the lines u and v on axes $x_1$ and $x_2$.]
The extension of the above to n equations is straightforward and can be expressed as
$$|a_{ii}| > \sum_{\substack{j=1 \\ j \ne i}}^{n} |a_{ij}| \tag{11.10}$$
That is, the diagonal coefficient in each of the equations must be larger than the sum of
the absolute values of the other coefficients in the equation. This criterion is sufficient
but not necessary for convergence. That is, although the method may sometimes work if
Eq. (11.10) is not met, convergence is guaranteed if the condition is satisfied. Systems
where Eq. (11.10) holds are called diagonally dominant. Fortunately, many engineering
problems of practical importance fulfill this requirement.
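Equation (11.10) is easy to test before starting an iteration. The following sketch (the function name `is_diagonally_dominant` is an assumption made here) applies the test row by row. It is tried on the system of Example 11.3 and on the two equations quoted in the caption of Fig. 11.5, entered in both possible orders.

```python
def is_diagonally_dominant(a):
    """True if |a_ii| exceeds the sum of |a_ij|, j != i, in every row (Eq. 11.10)."""
    return all(
        abs(row[i]) > sum(abs(v) for j, v in enumerate(row) if j != i)
        for i, row in enumerate(a)
    )


example_11_3 = [[3.0, -0.1, -0.2],
                [0.1, 7.0, -0.3],
                [0.3, -0.2, 10.0]]

# The equations u: 11x1 + 13x2 = 286 and v: 11x1 - 9x2 = 99 in both orders
u_then_v = [[11.0, 13.0], [11.0, -9.0]]
v_then_u = [[11.0, -9.0], [11.0, 13.0]]

print(is_diagonally_dominant(example_11_3))  # True
print(is_diagonally_dominant(u_then_v))      # False
print(is_diagonally_dominant(v_then_u))      # True
```

Because the criterion is sufficient but not necessary, a `False` result means only that convergence is not guaranteed, not that the iteration must diverge.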
11.2.2 Improvement of Convergence Using Relaxation
Relaxation represents a slight modification of the Gauss-Seidel method and is designed
to enhance convergence. After each new value of x is computed using Eq. (11.5), that
value is modified by a weighted average of the results of the previous and the present
iterations:
$$x_i^{\text{new}} = \lambda x_i^{\text{new}} + (1 - \lambda) x_i^{\text{old}} \tag{11.11}$$
where $\lambda$ is a weighting factor that is assigned a value between 0 and 2.
If $\lambda = 1$, $(1 - \lambda)$ is equal to 0 and the result is unmodified. However, if $\lambda$ is set at a value between 0 and 1, the result is a weighted average of the present and the previous results. This type of modification is called underrelaxation. It is typically employed to make a nonconvergent system converge or to hasten convergence by dampening out oscillations.
For values of $\lambda$ from 1 to 2, extra weight is placed on the present value. In this instance, there is an implicit assumption that the new value is moving in the correct direction toward the true solution but at too slow a rate. Thus, the added weight of $\lambda$ is intended to improve the estimate by pushing it closer to the truth. Hence, this type of modification, which is called overrelaxation, is designed to accelerate the convergence of an already convergent system. The approach is also called successive or simultaneous overrelaxation, or SOR.
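As a rough illustration of Eq. (11.11), the sketch below applies the weighted average to each new value inside a Gauss-Seidel sweep. The function name `relaxed_sweep`, the fixed count of 50 sweeps, and the choice of 1.2 for the weighting factor are assumptions for demonstration only; the system is the one from Example 11.3.

```python
def relaxed_sweep(a, b, x, lam):
    """One Gauss-Seidel sweep with relaxation, Eq. (11.11), applied to each value."""
    x = list(x)  # work on a copy
    n = len(b)
    for i in range(n):
        old = x[i]
        new = (b[i] - sum(a[i][j] * x[j] for j in range(n) if j != i)) / a[i][i]
        # lam = 1 leaves the Gauss-Seidel value unchanged;
        # lam < 1 underrelaxes and lam > 1 overrelaxes.
        x[i] = lam * new + (1.0 - lam) * old
    return x


A = [[3.0, -0.1, -0.2],
     [0.1, 7.0, -0.3],
     [0.3, -0.2, 10.0]]
b = [7.85, -19.3, 71.4]

x = [0.0, 0.0, 0.0]
for _ in range(50):
    x = relaxed_sweep(A, b, x, lam=1.2)
print(x)
```

For a system this strongly diagonally dominant, plain Gauss-Seidel already converges quickly, so relaxation offers little benefit here; the example only shows where the weighting enters the sweep.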
The choice of a proper value for $\lambda$ is highly problem-specific and is often determined empirically. For a single solution of a set of equations it is often unnecessary. However, if the system under study is to be solved repeatedly, the efficiency introduced by a wise choice of $\lambda$ can be extremely important. Good examples are the very large systems of
partial differential equations that often arise when modeling continuous variations of
variables (recall the distributed system depicted in Fig. PT3.1b). We will return to this
topic in Part Eight.
11.2.3 Algorithm for Gauss-Seidel
An algorithm for the Gauss-Seidel method, with relaxation, is depicted in Fig. 11.6. Note
that this algorithm is not guaranteed to converge if the equations are not input in a
diagonally dominant form.
The pseudocode has two features that bear mentioning. First, there is an initial set of
nested loops to divide each equation by its diagonal element. This reduces the total num-
ber of operations in the algorithm. Second, notice that the error check is designated by a
variable called sentinel. If any of the equations has an approximate error greater than the stopping criterion ($\varepsilon_s$), then the iterations are allowed to continue. The use of the sentinel allows us to circumvent unnecessary calculations of error estimates once one of the equations exceeds the criterion.
11.2.4 Problem Contexts for the Gauss-Seidel Method
Aside from circumventing the round-off dilemma, the Gauss-Seidel technique has a num-
ber of other advantages that make it particularly attractive in the context of certain en-
gineering problems. For example, when the matrix in question is very large and very
sparse (that is, most of the elements are zero), elimination methods waste large amounts
of computer memory by storing zeros.
FIGURE 11.6
Pseudocode for Gauss-Seidel with relaxation.
SUBROUTINE Gseid (a,b,n,x,imax,es,lambda)
  DOFOR i = 1,n
    dummy = a_i,i
    DOFOR j = 1,n
      a_i,j = a_i,j/dummy
    END DO
    b_i = b_i/dummy
  END DO
  DOFOR i = 1, n
    sum = b_i
    DOFOR j = 1, n
      IF i ≠ j THEN sum = sum - a_i,j*x_j
    END DO
    x_i = sum
  END DO
  iter = 1
  DO
    sentinel = 1
    DOFOR i = 1,n
      old = x_i
      sum = b_i
      DOFOR j = 1,n
        IF i ≠ j THEN sum = sum - a_i,j*x_j
      END DO
      x_i = lambda*sum + (1. - lambda)*old
      IF sentinel = 1 AND x_i ≠ 0. THEN
        ea = ABS((x_i - old)/x_i)*100.
        IF ea > es THEN sentinel = 0
      END IF
    END DO
    iter = iter + 1
    IF sentinel = 1 OR (iter ≥ imax) EXIT
  END DO
END Gseid
At the beginning of this chapter, we saw how this shortcoming could be circum-
vented if the coefficient matrix is banded. For nonbanded systems, there is usually no
simple way to avoid large memory requirements when using elimination methods. Be-
cause all computers have a finite amount of memory, this inefficiency can place a con-
straint on the size of systems for which elimination methods are practical.
Although a general algorithm such as the one in Fig. 11.6 is prone to the same
constraint, the structure of the Gauss-Seidel equations [Eq. (11.5)] permits concise pro-
grams to be developed for specific systems. Because only nonzero coefficients need be
included in Eq. (11.5), large savings of computer memory are possible. Although this
entails more up-front investment in software development, the long-term advantages are
substantial when dealing with large systems for which many simulations are to be per-
formed. Both lumped- and distributed-variable systems can result in large, sparse matri-
ces for which the Gauss-Seidel method has utility.
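To illustrate how a system-specific program can avoid storing zeros, the sketch below specializes Gauss-Seidel to a tridiagonal system held as three vectors. The names `e`, `f`, and `g` for the sub-, main, and superdiagonals and `r` for the right-hand side are assumed here by analogy with the banded storage discussed earlier in the chapter, and the 4 × 4 test system is made up for the demonstration (its exact solution is 1, 2, 3, 4).

```python
def gauss_seidel_tridiagonal(e, f, g, r, es=1e-6, max_iter=500):
    """Gauss-Seidel for a tridiagonal system stored as three diagonals.

    e: subdiagonal (e[0] unused), f: main diagonal, g: superdiagonal
    (g[-1] unused), r: right-hand side. `es` is a percent tolerance.
    Only 3n coefficients are stored instead of n*n.
    """
    n = len(f)
    x = [0.0] * n
    for it in range(1, max_iter + 1):
        ea_max = 0.0
        for i in range(n):
            old = x[i]
            s = r[i]
            if i > 0:
                s -= e[i] * x[i - 1]   # only nonzero neighbors contribute
            if i < n - 1:
                s -= g[i] * x[i + 1]
            x[i] = s / f[i]
            if x[i] != 0.0:
                ea_max = max(ea_max, abs((x[i] - old) / x[i]) * 100.0)
        if ea_max < es:
            return x, it
    return x, max_iter


# Hypothetical diagonally dominant test system with solution [1, 2, 3, 4]
e = [0.0, -1.0, -1.0, -1.0]
f = [4.0, 4.0, 4.0, 4.0]
g = [-1.0, -1.0, -1.0, 0.0]
r = [2.0, 4.0, 6.0, 13.0]

x, sweeps = gauss_seidel_tridiagonal(e, f, g, r)
print(x, sweeps)
```

The same idea extends to other sparsity patterns: each equation's update loop visits only the coefficients known to be nonzero.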
11.3 LINEAR ALGEBRAIC EQUATIONS WITH SOFTWARE PACKAGES
Software packages have great capabilities for solving systems of linear algebraic equa-
tions. Before describing these tools, we should mention that the approaches described in
Chap. 7 for solving nonlinear systems can be applied to linear systems. However, in this
section, we will focus on the approaches that are expressly designed for linear equations.
11.3.1 Excel
There are two ways to solve linear algebraic equations with Excel: (1) using the Solver
tool or (2) using matrix inversion and multiplication functions.
Recall that one way to determine the solution of linear algebraic equations is
$$\{X\} = [A]^{-1}\{B\} \tag{11.12}$$
Excel has built-in functions for both matrix inversion and multiplication that can be used
to implement this formula.
EXAMPLE 11.4 Using Excel to Solve Linear Systems
Problem Statement. Recall that in Chap. 10 we introduced the Hilbert matrix. The
following system is based on the Hilbert matrix. Note that it is scaled, as was done
previously in Example 10.3, so that the maximum coefficient in each row is unity.
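$$\begin{bmatrix} 1 & 1/2 & 1/3 \\ 1 & 2/3 & 1/2 \\ 1 & 3/4 & 3/5 \end{bmatrix} \begin{Bmatrix} x_1 \\ x_2 \\ x_3 \end{Bmatrix} = \begin{Bmatrix} 1.833333 \\ 2.166667 \\ 2.35 \end{Bmatrix}$$
The solution to this system is $\{X\}^T = \lfloor 1 \;\; 1 \;\; 1 \rfloor$. Use Excel to obtain this solution.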
Solution. The spreadsheet to solve this problem is displayed in Fig. 11.7. First, the
matrix [A] and the right-hand-side constants {B} are entered into the spreadsheet cells.
Then, a set of cells of the proper dimensions (in our example 3 × 3) is highlighted by either clicking and dragging the mouse or by using the arrow keys while depressing the shift key. As in Fig. 11.7, we highlight the range: B5..D7.
Next, a formula invoking the matrix inverse function is entered,
=minverse(B1..D3)
Note that the argument is the range holding the elements of [A]. The Ctrl and Shift keys
are held down while the Enter key is depressed. The resulting inverse of [A] will be
calculated by Excel and displayed in the range B5..D7 as shown in Fig. 11.7.
A similar approach is used to multiply the inverse by the right-hand-side vector. For this case, the range from F5..F7 is highlighted and the following formula is entered
=mmult(B5..D7,F1..F3)
where the first range is the first matrix to be multiplied, $[A]^{-1}$, and the second range is the second matrix to be multiplied, {B}. By again using the Ctrl-Shift-Enter combination, the solution {X} will be calculated by Excel and displayed in the range F5..F7, as shown in Fig. 11.7. As can be seen, the correct answer results.
FIGURE 11.7
Notice that we deliberately reformatted the results in Example 11.4 to show 15
digits. We did this because Excel uses double-precision to store numerical values. Thus,
we see that round-off error occurs in the last two digits. This implies a condition number
on the order of 100, which agrees with the result of 451.2 originally calculated in
Example 10.3. Excel does not have the capability to calculate a condition number. In
most cases, particularly because it employs double-precision numbers, this does not rep-
resent a problem. However, for cases where you suspect that the system is ill-conditioned,
determination of the condition number is useful. MATLAB and Mathcad software are
capable of computing this quantity.
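Where no built-in condition number is available, the row-sum value quoted above can be checked directly from its definition as the product of the row-sum norms of [A] and its inverse. The sketch below is one way to do so in Python using exact rational arithmetic; the helper names `inverse` and `row_sum_norm` are hypothetical, and the Gauss-Jordan inversion is a simple illustrative choice rather than a recommended general-purpose routine.

```python
from fractions import Fraction


def inverse(a):
    """Invert a square matrix by Gauss-Jordan elimination in exact arithmetic.

    Raises StopIteration if no nonzero pivot can be found (singular matrix).
    """
    n = len(a)
    # Augment [A] with the identity matrix
    m = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        p = next(r for r in range(c, n) if m[r][c] != 0)  # first usable pivot row
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for r in range(n):
            if r != c and m[r][c] != 0:
                factor = m[r][c]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


def row_sum_norm(a):
    """Largest sum of absolute values taken across any row."""
    return max(sum(abs(v) for v in row) for row in a)


# Scaled 3 x 3 Hilbert matrix of Example 11.4
A = [[Fraction(1), Fraction(1, 2), Fraction(1, 3)],
     [Fraction(1), Fraction(2, 3), Fraction(1, 2)],
     [Fraction(1), Fraction(3, 4), Fraction(3, 5)]]

A_inv = inverse(A)
cond = row_sum_norm(A) * row_sum_norm(A_inv)
print(float(cond))  # 451.2
```

Exact fractions are used here only so that the check is free of the round-off being discussed; a floating-point version would follow the same steps.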
11.3.2 MATLAB
As the name implies, MATLAB (short for MATrix LABoratory) was designed to facili-
tate matrix manipulations. Thus, as might be expected, its capabilities in this area are
excellent. Some of the key MATLAB functions related to matrix operations are listed in
Table 11.1. The following example illustrates a few of these capabilities.
EXAMPLE 11.5 Using MATLAB to Manipulate Linear Algebraic Equations
Problem Statement. Explore how MATLAB can be employed to solve and analyze
linear algebraic equations. Use the same system as in Example 11.4.
Solution. First, we can enter the [A] matrix and the {B} vector,
>> A = [ 1 1/2 1/3 ; 1 2/3 1/2 ; 1 3/4 3/5 ]
A =
    1.0000    0.5000    0.3333
    1.0000    0.6667    0.5000
    1.0000    0.7500    0.6000
>> B=[1+1/2+1/3;1+2/3+2/4;1+3/4+3/5]
B =
    1.8333
    2.1667
    2.3500
Next, we can determine the condition number for [A], as in
>> cond(A)
ans =
  366.3503
TABLE 11.1 MATLAB functions to implement matrix analysis and numerical linear algebra.

Matrix Analysis
Function    Description
cond        Matrix condition number
norm        Matrix or vector norm
rcond       LINPACK reciprocal condition estimator
rank        Number of linearly independent rows or columns
det         Determinant
trace       Sum of diagonal elements
null        Null space
orth        Orthogonalization
rref        Reduced row echelon form

Linear Equations
Function    Description
\ and /     Linear equation solution; use "help slash"
chol        Cholesky factorization
lu          Factors from Gauss elimination
inv         Matrix inverse
qr          Orthogonal-triangular decomposition
qrdelete    Delete a column from the QR factorization
qrinsert    Insert a column in the QR factorization
nnls        Nonnegative least squares
pinv        Pseudoinverse
lscov       Least squares in the presence of known covariance
This result is based on the spectral, or $\|A\|_2$, norm discussed in Box 10.2. Note that it is of the same order of magnitude as the condition number = 451.2 based on the row-sum norm in Example 10.3. Both results imply that between two and three digits of precision could be lost.
Now we can solve the system of equations in two different ways. The most direct and efficient way is to employ backslash, or "left division":
>> X=A\B
X =
    1.0000
    1.0000
    1.0000
For cases such as ours, MATLAB uses Gauss elimination to solve such systems.
As an alternative, we can implement Eq. (PT3.6) directly, as in
>> X=inv(A)*B
X =
    1.0000
    1.0000
    1.0000
This approach actually determines the matrix inverse first and then performs the matrix multiplication. Hence, it is more time consuming than using the backslash approach.
11.3.3 Mathcad
Mathcad contains many special functions that manipulate vectors and matrices. These
include common operations such as the dot product, matrix transpose, matrix addition,
and matrix multiplication. In addition, it allows calculation of the matrix inverse, deter-
minant, trace, various types of norms, and condition numbers based on different norms.
It also has several functions that decompose matrices.
Systems of linear equations can be solved in two ways by Mathcad. First, it is pos-
sible to use matrix inversion and subsequent multiplication by the right-hand-side as
discussed in Chap. 10. In addition, Mathcad has a special function called lsolve(A,b)
that is specifically designed to solve linear equations. You can use other built-in functions
to evaluate the condition of A to determine if A is nearly singular and thus possibly
subject to round-off errors.
As an example, let’s use lsolve to solve a system of linear equations. As shown in
Fig. 11.8, the first step is to enter the coefficients of the A matrix using the definition
symbol and the Insert/Matrix pull down menu. This gives a box that allows you to
specify the dimensions of the matrix. For our case, we will select a dimension of 4 × 4,
and Mathcad places a blank 4-by-4-size matrix on screen. Now, simply click the
appropriate cell location and enter values. Repeat similar operations to create the right-
hand-side b vector. Now the vector x is defined as lsolve(A,b) and the value of x is
displayed with the equal sign.
We can also solve the same system using the matrix inverse. The inverse can be
simply computed by raising A to the exponent −1. The result is shown on the
right side of Fig. 11.8. The solution is then generated as the product of the inverse
times b.
Next, let’s use Mathcad to find the inverse and the condition number of the Hilbert
matrix. As in Fig. 11.9, the scaled matrix can be entered using the definition symbol and
the Insert/Matrix pull down menu. The inverse can again be computed by simply raising
H to the exponent −1. The result is shown in Fig. 11.9. We can then use some other
Mathcad functions to determine condition numbers by using the definition symbol to
define variables c1, c2, ce, and ci as the condition number based on the column-sum
(cond1), spectral (cond2), the Euclidean (conde), and the row-sum (condi) norms, re-
spectively. The resulting values are shown at the bottom of Fig. 11.9. As expected, the
spectral norm provides the smallest measure of magnitude.
FIGURE 11.8
Mathcad screen to solve a system of linear algebraic equations.
FIGURE 11.9
Mathcad screen to determine the matrix inverse and condition numbers of a scaled 3 × 3 Hilbert matrix.
PROBLEMS
11.1 Perform the same calculations as in (a) Example 11.1, and
(b) Example 11.3, but for the tridiagonal system,
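$$\begin{bmatrix} 0.8 & -0.4 & \\ -0.4 & 0.8 & -0.4 \\ & -0.4 & 0.8 \end{bmatrix} \begin{Bmatrix} x_1 \\ x_2 \\ x_3 \end{Bmatrix} = \begin{Bmatrix} 41 \\ 25 \\ 105 \end{Bmatrix}$$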
11.2 Determine the matrix inverse for Example 11.1 based on the
LU decomposition and unit vectors.
11.3 The following tridiagonal system must be solved as part of a
larger algorithm (Crank-Nicolson) for solving partial differential
equations:
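$$\begin{bmatrix} 2.01475 & -0.020875 & & \\ -0.020875 & 2.01475 & -0.020875 & \\ & -0.020875 & 2.01475 & -0.020875 \\ & & -0.020875 & 2.01475 \end{bmatrix} \begin{Bmatrix} T_1 \\ T_2 \\ T_3 \\ T_4 \end{Bmatrix} = \begin{Bmatrix} 4.175 \\ 0 \\ 0 \\ 2.0875 \end{Bmatrix}$$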
Use the Thomas algorithm to obtain a solution.
11.4 Confirm the validity of the Cholesky decomposition of
Example 11.2 by substituting the results into Eq. (11.2) to see
if the product of [L] and $[L]^T$ yields [A].
11.13 Use the Gauss-Seidel method (a) without relaxation and
(b) with relaxation ($\lambda = 1.2$) to solve the following system to a tolerance of $\varepsilon_s = 5\%$. If necessary, rearrange the equations to achieve convergence.
$$\begin{aligned} 2x_1 - 6x_2 - x_3 &= -38 \\ -3x_1 - x_2 + 7x_3 &= -34 \\ -8x_1 + x_2 - 2x_3 &= -20 \end{aligned}$$
11.14 Redraw Fig. 11.5 for the case where the slopes of the equations are 1 and −1. What is the result of applying Gauss-Seidel to such a system?
11.15 Of the following three sets of linear equations, identify the
set(s) that you could not solve using an iterative method such as
Gauss-Seidel. Show using any number of iterations that is neces-
sary that your solution does not converge. Clearly state your con-
vergence criteria (how you know it is not converging).
Set One              Set Two             Set Three
8x + 3y + z = 12     x + y + 5z = 7      −x + 3y + 5z = 7
−6x + 7z = 1         x + 4y − z = 4      −2x + 4y − 5z = −3
2x + 4y − z = 5      3x + y − z = 4      2y − z = 1
11.16 Use the software package of your choice to obtain a solu-
tion, calculate the inverse, and determine the condition number
(without scaling) based on the row-sum norm for
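(a)
$$\begin{bmatrix} 1 & 4 & 9 \\ 4 & 9 & 16 \\ 9 & 16 & 25 \end{bmatrix} \begin{Bmatrix} x_1 \\ x_2 \\ x_3 \end{Bmatrix} = \begin{Bmatrix} 14 \\ 29 \\ 50 \end{Bmatrix}$$
(b)
$$\begin{bmatrix} 1 & 4 & 9 & 16 \\ 4 & 9 & 16 & 25 \\ 9 & 16 & 25 & 36 \\ 16 & 25 & 36 & 49 \end{bmatrix} \begin{Bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{Bmatrix} = \begin{Bmatrix} 30 \\ 54 \\ 86 \\ 126 \end{Bmatrix}$$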
In both cases, the answers for all the x’s should be 1.
11.17 Given the pair of nonlinear simultaneous equations:
$$f(x, y) = 4 - y - 2x^2$$
$$g(x, y) = 8 - y^2 - 4x$$
(a) Use the Excel Solver to determine the two pairs of values of x and y that satisfy these equations.
(b) Using a range of initial guesses ($x = -6$ to 6 and $y = -6$ to 6), determine which initial guesses yield each of the solutions.
11.5 Perform the same calculations as in Example 11.2, but for the
symmetric system,
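$$\begin{bmatrix} 6 & 15 & 55 \\ 15 & 55 & 225 \\ 55 & 225 & 979 \end{bmatrix} \begin{Bmatrix} a_0 \\ a_1 \\ a_2 \end{Bmatrix} = \begin{Bmatrix} 152.6 \\ 585.6 \\ 2488.8 \end{Bmatrix}$$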
In addition to solving for the Cholesky decomposition, employ it to
solve for the a’s.
11.6 Perform a Cholesky decomposition of the following symmet-
ric system by hand,
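$$\begin{bmatrix} 8 & 20 & 15 \\ 20 & 80 & 50 \\ 15 & 50 & 60 \end{bmatrix} \begin{Bmatrix} x_1 \\ x_2 \\ x_3 \end{Bmatrix} = \begin{Bmatrix} 50 \\ 250 \\ 100 \end{Bmatrix}$$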
11.7 Compute the Cholesky decomposition of
$$[A] = \begin{bmatrix} 9 & 0 & 0 \\ 0 & 25 & 0 \\ 0 & 0 & 4 \end{bmatrix}$$
Do your results make sense in terms of Eqs. (11.3) and (11.4)?
11.8 Use the Gauss-Seidel method to solve the tridiagonal system
from Prob. 11.1 ($\varepsilon_s = 5\%$). Use overrelaxation with $\lambda = 1.2$.
11.9 Recall from Prob. 10.8, that the following system of equations is designed to determine concentrations (the c's in g/m³) in a series of coupled reactors as a function of amount of mass input to each reactor (the right-hand sides in g/d),
$$\begin{aligned} 15c_1 - 3c_2 - c_3 &= 3800 \\ -3c_1 + 18c_2 - 6c_3 &= 1200 \\ -4c_1 - c_2 + 12c_3 &= 2350 \end{aligned}$$
Solve this problem with the Gauss-Seidel method to $\varepsilon_s = 5\%$.
11.10 Repeat Prob. 11.9, but use Jacobi iteration.
11.11 Use the Gauss-Seidel method to solve the following system
until the percent relative error falls below $\varepsilon_s = 5\%$,
$$\begin{aligned} 10x_1 + 2x_2 - x_3 &= 27 \\ -3x_1 - 6x_2 + 2x_3 &= -61.5 \\ x_1 + x_2 + 5x_3 &= -21.5 \end{aligned}$$
11.12 Use the Gauss-Seidel method (a) without relaxation and (b) with relaxation ($\lambda = 0.95$) to solve the following system to a tolerance of $\varepsilon_s = 5\%$. If necessary, rearrange the equations to achieve convergence.
$$\begin{aligned} -3x_1 + x_2 + 12x_3 &= 50 \\ 6x_1 - x_2 - x_3 &= 3 \\ 6x_1 + 9x_2 + x_3 &= 40 \end{aligned}$$
11.24 Develop a user-friendly program in either a high-level or
macro language of your choice to obtain a solution for a tridiagonal
system with the Thomas algorithm (Fig. 11.2). Test your program
by duplicating the results of Example 11.1.
11.25 Develop a user-friendly program in either a high-level or
macro language of your choice for Cholesky decomposition based
on Fig. 11.3. Test your program by duplicating the results of
Example 11.2.
11.26 Develop a user-friendly program in either a high-level or
macro language of your choice for the Gauss-Seidel method based
on Fig. 11.6. Test your program by duplicating the results of
Example 11.3.
11.27 As described in Sec. PT3.1.2, linear algebraic equations can
arise in the solution of differential equations. For example, the
following differential equation results from a steady-state mass
balance for a chemical in a one-dimensional canal,
$$0 = D\frac{d^2c}{dx^2} - U\frac{dc}{dx} - kc$$
where c = concentration, t = time, x = distance, D = diffusion coefficient, U = fluid velocity, and k = a first-order decay rate. Convert this differential equation to an equivalent system of simultaneous algebraic equations. Given D = 2, U = 1, k = 0.2, c(0) = 80 and c(10) = 20, solve these equations from x = 0 to 10 with Δx = 2, and develop a plot of concentration versus distance.
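11.28 A pentadiagonal system with a bandwidth of five can be expressed generally as
$$\begin{bmatrix}
f_1 & g_1 & h_1 & & & & \\
e_2 & f_2 & g_2 & h_2 & & & \\
d_3 & e_3 & f_3 & g_3 & h_3 & & \\
 & \ddots & \ddots & \ddots & \ddots & \ddots & \\
 & & & d_{n-1} & e_{n-1} & f_{n-1} & g_{n-1} \\
 & & & & d_n & e_n & f_n
\end{bmatrix}
\begin{Bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_{n-1} \\ x_n \end{Bmatrix} =
\begin{Bmatrix} r_1 \\ r_2 \\ r_3 \\ \vdots \\ r_{n-1} \\ r_n \end{Bmatrix}$$
Develop a program to efficiently solve such systems without pivoting in a similar fashion to the algorithm used for tridiagonal matrices in Sec. 11.1.1. Test it for the following case:
$$\begin{bmatrix} 8 & -2 & -1 & 0 & 0 \\ -2 & 9 & -4 & -1 & 0 \\ -1 & -3 & 7 & -1 & -2 \\ 0 & -4 & -2 & 12 & -5 \\ 0 & 0 & -7 & -3 & 15 \end{bmatrix} \begin{Bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{Bmatrix} = \begin{Bmatrix} 5 \\ 2 \\ 0 \\ 1 \\ 5 \end{Bmatrix}$$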
11.18 An electronics company produces transistors, resistors, and
computer chips. Each transistor requires four units of copper, one
unit of zinc, and two units of glass. Each resistor requires three,
three, and one units of the three materials, respectively, and each
computer chip requires two, one, and three units of these materials,
respectively. Putting this information into table form, we get:
Component Copper Zinc Glass
Transistors 4 1 2
Resistors 3 3 1
Computer chips 2 1 3
Supplies of these materials vary from week to week, so the com-
pany needs to determine a different production run each week. For
example, one week the total amounts of materials available are 960
units of copper, 510 units of zinc, and 610 units of glass. Set up the
system of equations modeling the production run, and use Excel,
MATLAB, or Mathcad, to solve for the number of transistors, resis-
tors, and computer chips to be manufactured this week.
11.19 Use MATLAB or Mathcad software to determine the spectral
condition number for a 10-dimensional Hilbert matrix. How many
digits of precision are expected to be lost due to ill-conditioning?
Determine the solution for this system for the case where each ele-
ment of the right-hand-side vector {b} consists of the summation of
the coefficients in its row. In other words, solve for the case where
all the unknowns should be exactly one. Compare the resulting er-
rors with those expected based on the condition number.
11.20 Repeat Prob. 11.19, but for the case of a six-dimensional
Vandermonde matrix (see Prob. 10.17) where $x_1 = 4$, $x_2 = 2$, $x_3 = 7$, $x_4 = 10$, $x_5 = 3$, and $x_6 = 5$.
11.21 Given a square matrix [A], write a single line MATLAB
command that will create a new matrix [Aug] that consists of the
original matrix [A] augmented by an identity matrix [I].
11.22 Write the following set of equations in matrix form:
$$\begin{aligned} 50 &= 5x_3 - 7x_2 \\ 4x_2 + 7x_3 + 30 &= 0 \\ x_1 - 7x_3 &= 40 - 3x_2 + 5x_1 \end{aligned}$$
Use Excel, MATLAB, or Mathcad to solve for the unknowns. In
addition, compute the transpose and the inverse of the coefficient
matrix.
11.23 In Sec. 9.2.1, we determined the number of operations re-
quired for Gauss elimination without partial pivoting. Make a simi-
lar determination for the Thomas algorithm (Fig. 11.2). Develop a
plot of operations versus n (from 2 to 20) for both techniques.
CHAPTER 12
Case Studies: Linear Algebraic Equations
The purpose of this chapter is to use the numerical procedures discussed in Chaps. 9, 10,
and 11 to solve systems of linear algebraic equations for some engineering case studies.
These systematic numerical techniques have practical significance because engineers fre-
quently encounter problems involving systems of equations that are too large to solve by
hand. The numerical algorithms in these applications are particularly convenient to imple-
ment on personal computers.
Section 12.1 shows how a mass balance can be employed to model a system of
reactors. Section 12.2 places special emphasis on the use of the matrix inverse to
determine the complex cause-effect interactions between forces in the members of a
truss. Section 12.3 is an example of the use of Kirchhoff’s laws to compute the cur-
rents and voltages in a resistor circuit. Finally, Sec. 12.4 is an illustration of how
linear equations are employed to determine the steady-state configuration of a mass-
spring system.
12.1 STEADY-STATE ANALYSIS OF A SYSTEM OF REACTORS
(CHEMICAL/BIO ENGINEERING)
Background. One of the most important organizing principles in chemical engineer-
ing is the conservation of mass (recall Table 1.1). In quantitative terms, the principle is
expressed as a mass balance that accounts for all sources and sinks of a material that
pass in and out of a volume (Fig. 12.1). Over a finite period of time, this can be
expressed as
$$\text{Accumulation} = \text{inputs} - \text{outputs} \tag{12.1}$$
The mass balance represents a bookkeeping exercise for the particular substance
being modeled. For the period of the computation, if the inputs are greater than the
outputs, the mass of the substance within the volume increases. If the outputs are greater
than the inputs, the mass decreases. If inputs are equal to the outputs, accumulation is
zero and mass remains constant. For this stable condition, or steady state, Eq. (12.1) can
be expressed as
$$\text{Inputs} = \text{outputs} \tag{12.2}$$
Employ the conservation of mass to determine the steady-state concentrations of a system
of coupled reactors.
Solution. The mass balance can be used for engineering problem solving by expressing
the inputs and outputs in terms of measurable variables and parameters. For example, if
we were performing a mass balance for a conservative substance (that is, one that does
not increase or decrease due to chemical transformations) in a reactor (Fig. 12.2), we
would have to quantify the rate at which mass flows into the reactor through the two
inflow pipes and out of the reactor through the outflow pipe. This can be done by taking
the product of the flow rate Q (in cubic meters per minute) and the concentration c (in
milligrams per cubic meter) for each pipe. For example, for pipe 1 in Fig. 12.2, $Q_1 = 2\ \text{m}^3/\text{min}$ and $c_1 = 25\ \text{mg/m}^3$; therefore, the rate at which mass flows into the reactor
through pipe 1 is $Q_1 c_1 = (2\ \text{m}^3/\text{min})(25\ \text{mg/m}^3) = 50$ mg/min. Thus, 50 mg of chemical
flows into the reactor through this pipe each minute. Similarly, for pipe 2 the mass
inflow rate can be calculated as $Q_2 c_2 = (1.5\ \text{m}^3/\text{min})(10\ \text{mg/m}^3) = 15$ mg/min.
Notice that the concentration out of the reactor through pipe 3 is not specified by
Fig. 12.2. This is because we already have sufficient information to calculate it on the
basis of the conservation of mass. Because the reactor is at steady state, Eq. (12.2) holds
and the inputs should be in balance with the outputs, as in
$$Q_1 c_1 + Q_2 c_2 = Q_3 c_3$$
Substituting the given values into this equation yields
$$50 + 15 = 3.5 c_3$$
which can be solved for $c_3 = 18.6\ \text{mg/m}^3$. Thus, we have determined the concentration
in the third pipe. However, the computation yields an additional bonus. Because the
reactor is well mixed (as represented by the propeller in Fig. 12.2), the concentration
will be uniform, or homogeneous, throughout the tank. Therefore the concentration in
pipe 3 should be identical to the concentration throughout the reactor. Consequently, the
mass balance has allowed us to compute both the concentration in the reactor and in the
outflow pipe. Such information is of great utility to chemical and petroleum engineers
who must design reactors to yield mixtures of a specified concentration.

FIGURE 12.1
A schematic representation of mass balance. (Diagram labels: input, volume with accumulation, output.)
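The single-reactor balance can be sketched in a few lines of Python. This is only an illustration: the function name `outflow_concentration` is ours, and it assumes a conservative substance and an outflow equal to the sum of the inflows, as in Fig. 12.2.

```python
def outflow_concentration(inflows):
    """Steady-state concentration of a well-mixed reactor.

    inflows: list of (Q, c) pairs, Q in m^3/min and c in mg/m^3.
    Assumes a conservative substance and Q_out = sum of the inflows.
    """
    mass_in = sum(q * c for q, c in inflows)   # mg/min entering
    q_out = sum(q for q, _ in inflows)         # m^3/min leaving
    return mass_in / q_out                     # mg/m^3

# Pipes 1 and 2 of Fig. 12.2
c3 = outflow_concentration([(2.0, 25.0), (1.5, 10.0)])
print(round(c3, 1))
```

Because the reactor is well mixed, the same value applies to the tank contents and to the outflow pipe.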
Because simple algebra was used to determine the concentration for the single reac-
tor in Fig. 12.2, it might not be obvious how computers figure in mass-balance calcula-
tions. Figure 12.3 shows a problem setting where computers are not only useful but are
a practical necessity. Because there are five interconnected, or coupled, reactors, five
simultaneous mass-balance equations are needed to characterize the system. For reactor 1,
the rate of mass flow in is
$$5(10) + Q_{31}c_3$$
and the rate of mass flow out is
$$Q_{12}c_1 + Q_{15}c_1$$

FIGURE 12.2
A steady-state, completely mixed reactor with two inflow pipes and one outflow pipe. The flows Q are in cubic meters per minute, and the concentrations c are in milligrams per cubic meter. (Diagram labels: Q1 = 2 m^3/min, c1 = 25 mg/m^3; Q2 = 1.5 m^3/min, c2 = 10 mg/m^3; Q3 = 3.5 m^3/min, c3 = ?)

FIGURE 12.3
Five reactors linked by pipes. (Diagram labels: Q01 = 5, c01 = 10; Q03 = 8, c03 = 20; Q12 = 3; Q15 = 3; Q23 = 1; Q24 = 1; Q25 = 1; Q31 = 1; Q34 = 8; Q44 = 11; Q54 = 2; Q55 = 2; reactor concentrations c1 through c5.)

Because the system is at steady state, the inflows and outflows must be equal:
$$5(10) + Q_{31}c_3 = Q_{12}c_1 + Q_{15}c_1$$
or, substituting the values for flow from Fig. 12.3,
$$6c_1 - c_3 = 50$$
Similar equations can be developed for the other reactors:
$$
\begin{aligned}
-3c_1 + 3c_2 &= 0 \\
-c_2 + 9c_3 &= 160 \\
-c_2 - 8c_3 + 11c_4 - 2c_5 &= 0 \\
-3c_1 - c_2 + 4c_5 &= 0
\end{aligned}
$$
A numerical method can be used to solve these five equations for the five unknown
concentrations:
$$\{C\}^T = \lfloor 11.51 \quad 11.51 \quad 19.06 \quad 17.00 \quad 11.51 \rfloor$$
In addition, the matrix inverse can be computed as
$$
[A]^{-1} =
\begin{bmatrix}
0.16981 & 0.00629 & 0.01887 & 0 & 0 \\
0.16981 & 0.33962 & 0.01887 & 0 & 0 \\
0.01887 & 0.03774 & 0.11321 & 0 & 0 \\
0.06003 & 0.07461 & 0.08748 & 0.09091 & 0.04545 \\
0.16981 & 0.08962 & 0.01887 & 0 & 0.25000
\end{bmatrix}
$$
Each of the elements $a^{-1}_{ij}$ of the inverse signifies the change in concentration of reactor i due to a unit
change in loading to reactor j. Thus, the zeros in column 4 indicate that a loading to
reactor 4 will have no impact on reactors 1, 2, 3, and 5. This is consistent with the
system configuration (Fig. 12.3), which indicates that flow out of reactor 4 does not feed
back into any of the other reactors. In contrast, loadings to any of the first three reactors
will affect the entire system as indicated by the lack of zeros in the first three columns.
Such information is of great utility to engineers who design and manage such systems.
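The five-reactor computation can be sketched as follows. This is a minimal illustration rather than the book's own program: the helper name `gauss_solve` is ours, and it implements Gauss elimination with partial pivoting (Chap. 9). The inverse is built one column at a time by solving for unit loadings.

```python
def gauss_solve(a, b):
    """Solve a x = b by Gauss elimination with partial pivoting (inputs are copied)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]        # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(m[r][k]))  # pivot row
        m[k], m[p] = m[p], m[k]
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                        # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

# Mass balances for reactors 1 to 5 (unknowns c1..c5)
A = [[ 6,  0, -1,  0,  0],
     [-3,  3,  0,  0,  0],
     [ 0, -1,  9,  0,  0],
     [ 0, -1, -8, 11, -2],
     [-3, -1,  0,  0,  4]]
b = [50, 0, 160, 0, 0]

c = gauss_solve(A, b)

# Column j of the inverse is the response to a unit loading on reactor j + 1.
n = len(A)
cols = [gauss_solve(A, [1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
A_inv = [[cols[j][i] for j in range(n)] for i in range(n)]

print([round(v, 2) for v in c])
```

Reading down column 4 of `A_inv` shows that only the fourth entry is nonzero, which is the numerical counterpart of the observation that reactor 4 does not feed back into the others.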
12.2 ANALYSIS OF A STATICALLY DETERMINATE TRUSS
(CIVIL/ENVIRONMENTAL ENGINEERING)
Background. An important problem in structural engineering is that of finding the
forces and reactions associated with a statically determinate truss. Figure 12.4 shows an
example of such a truss.
The forces (F) represent either tension or compression on the members of the truss.
External reactions (H2, V2, and V3) are forces that characterize how the truss interacts with the
supporting surface. The hinge at node 2 can transmit both horizontal and vertical forces to the
surface, whereas the roller at node 3 transmits only vertical forces. It is observed that the ef-
fect of the external loading of 1000 lb is distributed among the various members of the truss.
Solution. This type of structure can be described as a system of coupled linear alge-
braic equations. Free-body force diagrams are shown for each node in Fig. 12.5. The
sum of the forces in both horizontal and vertical directions must be zero at each node,
because the system is at rest. Therefore, for node 1,
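$$\sum F_H = 0 = -F_1 \cos 30^\circ + F_3 \cos 60^\circ + F_{1,h} \tag{12.3}$$
$$\sum F_V = 0 = -F_1 \sin 30^\circ - F_3 \sin 60^\circ + F_{1,v} \tag{12.4}$$
for node 2,
$$\sum F_H = 0 = F_2 + F_1 \cos 30^\circ + F_{2,h} + H_2 \tag{12.5}$$
$$\sum F_V = 0 = F_1 \sin 30^\circ + F_{2,v} + V_2 \tag{12.6}$$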
FIGURE 12.4
Forces on a statically determinate truss. (Diagram labels: a 1000-lb load; nodes 1, 2, and 3; member forces F1, F2, F3; reactions H2, V2, V3; angles of 30°, 60°, and 90°.)

FIGURE 12.5
Free-body force diagrams for the nodes of a statically determinate truss.

for node 3,
$$\sum F_H = 0 = -F_2 - F_3 \cos 60^\circ + F_{3,h} \tag{12.7}$$
$$\sum F_V = 0 = F_3 \sin 60^\circ + F_{3,v} + V_3 \tag{12.8}$$
where $F_{i,h}$ is the external horizontal force applied to node i (where a positive force is
from left to right) and $F_{i,v}$ is the external vertical force applied to node i (where a
positive force is upward). Thus, in this problem, the 1000-lb downward force on node 1
corresponds to $F_{1,v} = -1000$. For this case all other $F_{i,v}$’s and $F_{i,h}$’s are zero. Note that
the directions of the internal forces and reactions are unknown. Proper application of
Newton’s laws requires only consistent assumptions regarding direction. Solutions are
negative if the directions are assumed incorrectly. Also note that in this problem, the
forces in all members are assumed to be in tension and act to pull adjoining nodes to-
gether. A negative solution therefore corresponds to compression. This problem can be
written as the following system of six equations and six unknowns:
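$$
\begin{bmatrix}
0.866 & 0 & -0.5 & 0 & 0 & 0 \\
0.5 & 0 & 0.866 & 0 & 0 & 0 \\
-0.866 & -1 & 0 & -1 & 0 & 0 \\
-0.5 & 0 & 0 & 0 & -1 & 0 \\
0 & 1 & 0.5 & 0 & 0 & 0 \\
0 & 0 & -0.866 & 0 & 0 & -1
\end{bmatrix}
\begin{Bmatrix} F_1 \\ F_2 \\ F_3 \\ H_2 \\ V_2 \\ V_3 \end{Bmatrix}
=
\begin{Bmatrix} 0 \\ -1000 \\ 0 \\ 0 \\ 0 \\ 0 \end{Bmatrix}
\tag{12.9}
$$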
Notice that, as formulated in Eq. (12.9), partial pivoting is required to avoid division
by zero diagonal elements. Employing a pivot strategy, the system can be solved using
any of the elimination techniques discussed in Chap. 9 or 10. However, because this
problem is an ideal case study for demonstrating the utility of the matrix inverse, the LU
decomposition can be used to compute
$$
\begin{aligned}
F_1 &= -500 & F_2 &= 433 & F_3 &= -866 \\
H_2 &= 0 & V_2 &= 250 & V_3 &= 750
\end{aligned}
$$
and the matrix inverse is
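$$
[A]^{-1} =
\begin{bmatrix}
0.866 & 0.5 & 0 & 0 & 0 & 0 \\
0.25 & -0.433 & 0 & 0 & 1 & 0 \\
-0.5 & 0.866 & 0 & 0 & 0 & 0 \\
-1 & 0 & -1 & 0 & -1 & 0 \\
-0.433 & -0.25 & 0 & -1 & 0 & 0 \\
0.433 & -0.75 & 0 & 0 & 0 & -1
\end{bmatrix}
$$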
Now, realize that the right-hand-side vector represents the externally applied horizontal
and vertical forces on each node, as in
$$\{F\}^T = \lfloor F_{1,h} \quad F_{1,v} \quad F_{2,h} \quad F_{2,v} \quad F_{3,h} \quad F_{3,v} \rfloor \tag{12.10}$$
Because the external forces have no effect on the LU decomposition, the method need
not be implemented over and over again to study the effect of different external forces on
the truss. Rather, all that we have to do is perform the forward- and backward-substitution
steps for each right-hand-side vector to efficiently obtain alternative solutions. For example,
we might want to study the effect of horizontal forces induced by a wind blowing from
left to right. If the wind force can be idealized as two point forces of 1000 lb on nodes
1 and 2 (Fig. 12.6a), the right-hand-side vector is
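$$\{F\}^T = \lfloor 1000 \quad 0 \quad 1000 \quad 0 \quad 0 \quad 0 \rfloor$$
which can be used to compute
$$
\begin{aligned}
F_1 &= 866 & F_2 &= 250 & F_3 &= -500 \\
H_2 &= -2000 & V_2 &= -433 & V_3 &= 433
\end{aligned}
$$
For a wind from the right (Fig. 12.6b), $F_{1,h} = -1000$, $F_{3,h} = -1000$, and all other
external forces are zero, with the result that
$$
\begin{aligned}
F_1 &= -866 & F_2 &= -1250 & F_3 &= 500 \\
H_2 &= 2000 & V_2 &= 433 & V_3 &= -433
\end{aligned}
$$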
The results indicate that the winds have markedly different effects on the structure. Both
cases are depicted in Fig. 12.6.
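The idea of factoring once and reusing the factors for several load cases can be sketched as below. This is an illustrative implementation, not the book's code: the function names `lu_decompose` and `lu_solve` are ours, the factorization is a Doolittle LU with partial pivoting, and exact trigonometric values are used in place of the rounded 0.866 and 0.5, so results agree with the text only to the digits shown there. The wind-from-the-left case applies +1000 lb horizontally at nodes 1 and 2, following the sign convention that forces from left to right are positive.

```python
import math

def lu_decompose(a):
    """Doolittle LU with partial pivoting. Returns (lu, perm), with L (unit
    diagonal, below) and U (on and above the diagonal) packed into lu."""
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(lu[r][k]))
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]                 # multiplier stored in L
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm

def lu_solve(lu, perm, b):
    """Forward and back substitution for one right-hand side."""
    n = len(b)
    y = [float(b[p]) for p in perm]              # apply the row permutation
    for i in range(n):                           # forward substitution
        y[i] -= sum(lu[i][j] * y[j] for j in range(i))
    for i in range(n - 1, -1, -1):               # back substitution
        y[i] = (y[i] - sum(lu[i][j] * y[j] for j in range(i + 1, n))) / lu[i][i]
    return y

c30, s30 = math.cos(math.radians(30)), math.sin(math.radians(30))
c60, s60 = math.cos(math.radians(60)), math.sin(math.radians(60))

# Coefficient matrix of Eq. (12.9); unknowns ordered F1, F2, F3, H2, V2, V3.
A = [[ c30,  0, -c60,  0,  0,  0],
     [ s30,  0,  s60,  0,  0,  0],
     [-c30, -1,    0, -1,  0,  0],
     [-s30,  0,    0,  0, -1,  0],
     [   0,  1,  c60,  0,  0,  0],
     [   0,  0, -s60,  0,  0, -1]]

lu, perm = lu_decompose(A)                       # factor once

# Right-hand sides ordered F1h, F1v, F2h, F2v, F3h, F3v.
load_cases = {
    "1000-lb vertical load": [0, -1000, 0, 0, 0, 0],
    "wind from the left":    [1000, 0, 1000, 0, 0, 0],
    "wind from the right":   [-1000, 0, 0, 0, -1000, 0],
}
results = {name: lu_solve(lu, perm, rhs) for name, rhs in load_cases.items()}
for name, x in results.items():
    print(name, [round(v) for v in x])
```

Only `lu_solve` is repeated for each load case; the more expensive `lu_decompose` step runs once.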
The individual elements of the inverted matrix also have direct utility in elucidating
stimulus-response interactions for the structure. Each element represents the change of
one of the unknown variables to a unit change of one of the external stimuli. For example,
element $a^{-1}_{32}$ indicates that the third unknown ($F_3$) will change 0.866 due to a unit
change of the second external stimulus ($F_{1,v}$). Thus, if the vertical load at the first node
were increased by 1, $F_3$ would increase by 0.866. The fact that elements are 0 indicates
that certain unknowns are unaffected by some of the external stimuli. For instance,
$a^{-1}_{13} = 0$ means that $F_1$ is unaffected by changes in $F_{2,h}$. This ability to isolate interactions
has a number of engineering applications, including the identification of those compo-
nents that are most sensitive to external stimuli and, as a consequence, most prone to
failure. In addition, it can be used to determine components that may be unnecessary
(see Prob. 12.18).
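As a quick consistency check (a worked illustration added here, using only values quoted above), the response to the original 1000-lb downward load can be read from column 2 of the inverse, because $F_{1,v} = -1000$ is the only nonzero stimulus in that case:

```latex
\begin{aligned}
F_1 &= a^{-1}_{12}\,F_{1,v} = (0.5)(-1000) = -500 \\
F_3 &= a^{-1}_{32}\,F_{1,v} = (0.866)(-1000) = -866 \\
V_3 &= a^{-1}_{62}\,F_{1,v} = (-0.75)(-1000) = 750
\end{aligned}
```

These agree with the member forces and reaction computed for that load case.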
FIGURE 12.6
Two test cases showing (a) winds from the left and (b) winds from the right.

The foregoing approach becomes particularly useful when applied to large complex
structures. In engineering practice, it may be necessary to solve trusses with hundreds
or even thousands of structural members. Linear equations provide one powerful ap-
proach for gaining insight into the behavior of these structures.
12.3 CURRENTS AND VOLTAGES IN RESISTOR CIRCUITS
(ELECTRICAL ENGINEERING)
Background. A common problem in electrical engineering involves determining the
currents and voltages at various locations in resistor circuits. These problems are solved
using Kirchhoff’s current and voltage rules. The current (or point) rule states that the
algebraic sum of all currents entering a node must be zero (see Fig. 12.7a), or
$$\sum i = 0 \tag{12.11}$$
where all current entering the node is considered positive in sign. The current rule is an
application of the principle of conservation of charge (recall Table 1.1).
The voltage (or loop) rule specifies that the algebraic sum of the potential differences
(that is, voltage changes) in any loop must equal zero. For a resistor circuit, this is ex-
pressed as
$$\sum \xi - \sum iR = 0 \tag{12.12}$$
where $\xi$ is the emf (electromotive force) of the voltage sources and R is the resistance of
any resistors on the loop. Note that the second term derives from Ohm’s law (Fig. 12.7b),
which states that the voltage drop across an ideal resistor is equal to the product of the
current and the resistance. Kirchhoff’s voltage rule is an expression of the conservation
of energy.
Solution. Application of these rules results in systems of simultaneous linear algebraic
equations because the various loops within a circuit are coupled. For example, consider
the circuit shown in Fig. 12.8. The currents associated with this circuit are unknown both
in magnitude and direction. This presents no great difficulty because one simply assumes
a direction for each current. If the resultant solution from Kirchhoff’s laws is negative,
then the assumed direction was incorrect. For example, Fig. 12.9 shows some assumed
currents.
FIGURE 12.7
Schematic representations of (a) Kirchhoff’s current rule and (b) Ohm’s law.

FIGURE 12.8
A resistor circuit to be solved using simultaneous linear algebraic equations. (Diagram labels: nodes 1 through 6; V1 = 200 V, V6 = 0 V; resistors of 5, 10, 10, 15, 5, and 20 Ω.)

Given these assumptions, Kirchhoff’s current rule is applied at each node to yield
$$
\begin{aligned}
i_{12} + i_{52} + i_{32} &= 0 \\
i_{65} - i_{52} - i_{54} &= 0 \\
i_{43} - i_{32} &= 0 \\
i_{54} - i_{43} &= 0
\end{aligned}
$$
Application of the voltage rule to each of the two loops gives
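$$
\begin{aligned}
-i_{54}R_{54} - i_{43}R_{43} - i_{32}R_{32} + i_{52}R_{52} &= 0 \\
-i_{65}R_{65} - i_{52}R_{52} + i_{12}R_{12} - 200 &= 0
\end{aligned}
$$
or, substituting the resistances from Fig. 12.8 and bringing constants to the right-hand side,
$$
\begin{aligned}
-15i_{54} - 5i_{43} - 10i_{32} + 10i_{52} &= 0 \\
-20i_{65} - 10i_{52} + 5i_{12} &= 200
\end{aligned}
$$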
Therefore, the problem amounts to solving the following set of six equations with six
unknown currents:
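$$
\begin{bmatrix}
1 & 1 & 1 & 0 & 0 & 0 \\
0 & -1 & 0 & 1 & -1 & 0 \\
0 & 0 & -1 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & -1 \\
0 & 10 & -10 & 0 & -15 & -5 \\
5 & -10 & 0 & -20 & 0 & 0
\end{bmatrix}
\begin{Bmatrix} i_{12} \\ i_{52} \\ i_{32} \\ i_{65} \\ i_{54} \\ i_{43} \end{Bmatrix}
=
\begin{Bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 200 \end{Bmatrix}
$$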
Although impractical to solve by hand, this system is easily handled using an elimination
method. Proceeding in this manner, the solution is
$$
\begin{aligned}
i_{12} &= 6.1538 & i_{52} &= -4.6154 & i_{32} &= -1.5385 \\
i_{65} &= -6.1538 & i_{54} &= -1.5385 & i_{43} &= -1.5385
\end{aligned}
$$
Thus, with proper interpretation of the signs of the result, the circuit currents and volt-
ages are as shown in Fig. 12.10. The advantages of using numerical algorithms and
computers for problems of this type should be evident.
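The six circuit equations can be solved with a short elimination routine. The sketch below is illustrative only: the function name `solve_exact` is ours, and it uses Gauss-Jordan elimination in exact rational arithmetic (Python's `fractions` module) rather than the floating-point elimination the text has in mind, so the currents come out as exact fractions. The last lines assume that $i_{12}$ denotes current flowing from node 1 toward node 2 through the 5-Ω resistor, which is consistent with the voltages reported in Fig. 12.10.

```python
from fractions import Fraction

def solve_exact(a, b):
    """Gauss-Jordan elimination with row swaps, in exact rational arithmetic."""
    n = len(b)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for k in range(n):
        p = next(r for r in range(k, n) if m[r][k] != 0)   # first nonzero pivot
        m[k], m[p] = m[p], m[k]
        pivot = m[k][k]
        m[k] = [v / pivot for v in m[k]]                   # normalize pivot row
        for i in range(n):
            if i != k and m[i][k] != 0:
                f = m[i][k]
                m[i] = [vi - f * vk for vi, vk in zip(m[i], m[k])]
    return [row[n] for row in m]

# Unknowns ordered i12, i52, i32, i65, i54, i43.
A = [[1,   1,   1,   0,   0,  0],
     [0,  -1,   0,   1,  -1,  0],
     [0,   0,  -1,   0,   0,  1],
     [0,   0,   0,   0,   1, -1],   # zero on the diagonal: a row swap is needed
     [0,  10, -10,   0, -15, -5],
     [5, -10,   0, -20,   0,  0]]
b = [0, 0, 0, 0, 0, 200]

names = ["i12", "i52", "i32", "i65", "i54", "i43"]
currents = dict(zip(names, solve_exact(A, b)))
for name in names:
    print(name, currents[name], round(float(currents[name]), 4))

# Voltage at node 2, assuming i12 flows from node 1 (200 V) to node 2.
v2 = 200 - 5 * currents["i12"]
print("V2 =", round(float(v2), 2))
```

A negative current simply means that the direction assumed in Fig. 12.9 was the reverse of the actual one.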
FIGURE 12.9
Assumed currents. (Diagram labels: i12, i52, i32, i65, i54, and i43 on the circuit with nodes 1 through 6.)

12.4 SPRING-MASS SYSTEMS (MECHANICAL/AEROSPACE
ENGINEERING)
Background. Idealized spring-mass systems play an important role in mechanical and
other engineering problems. Figure 12.11 shows such a system. After they are released,
the masses are pulled downward by the force of gravity. Notice that the resulting dis-
placement of each spring in Fig. 12.11b is measured along local coordinates referenced
to its initial position in Fig. 12.11a.
As introduced in Chap. 1, Newton’s second law can be employed in conjunction
with force balances to develop a mathematical model of the system. For each mass, the
second law can be expressed as
$$m \frac{d^2 x}{dt^2} = F_D - F_U \tag{12.13}$$
To simplify the analysis, we will assume that all the springs are identical and follow
Hooke’s law. A free-body diagram for the first mass is depicted in Fig. 12.12a. The
upward force is merely a direct expression of Hooke’s law:
$$F_U = kx_1 \tag{12.14}$$
The downward component consists of the two spring forces along with the action of
gravity on the mass,
$$F_D = k(x_2 - x_1) + k(x_2 - x_1) + m_1 g \tag{12.15}$$
Note how the force component of the two springs is proportional to the displacement of
the second mass, x2, corrected for the displacement of the first mass, x1.
Equations (12.14) and (12.15) can be substituted into Eq. (12.13) to give
$$m_1 \frac{d^2 x_1}{dt^2} = 2k(x_2 - x_1) + m_1 g - kx_1 \tag{12.16}$$
Thus, we have derived a second-order ordinary differential equation to describe the dis-
placement of the first mass with respect to time. However, notice that the solution cannot
be obtained because the model includes a second dependent variable, x2. Consequently,
free-body diagrams must be developed for the second and the third masses (Fig. 12.12b
and c) that can be employed to derive
$$m_2 \frac{d^2 x_2}{dt^2} = k(x_3 - x_2) + m_2 g - 2k(x_2 - x_1) \tag{12.17}$$
and
$$m_3 \frac{d^2 x_3}{dt^2} = m_3 g - k(x_3 - x_2) \tag{12.18}$$

FIGURE 12.10
The solution for currents and voltages obtained using an elimination method. (Diagram labels: V = 200, V = 169.23, V = 153.85, V = 146.15, V = 123.08, V = 0; i = 6.1538, i = 1.5385.)

FIGURE 12.11
A system composed of three masses suspended vertically by a series of springs. (a) The system
before release, that is, prior to extension or compression of the springs. (b) The system after
release. Note that the positions of the masses are referenced to local coordinates with origins at
their position before release.

FIGURE 12.12
Free-body diagrams for the three masses from Fig. 12.11. (a) Mass m1: kx1 upward; k(x2 – x1), k(x2 – x1), and m1g downward. (b) Mass m2: k(x2 – x1) and k(x2 – x1) upward; k(x3 – x2) and m2g downward. (c) Mass m3: k(x3 – x2) upward; m3g downward.

Equations (12.16), (12.17), and (12.18) form a system of three differential equations
with three unknowns. With the appropriate initial conditions, they could be used to solve
for the displacements of the masses as a function of time (that is, their oscillations). We
will discuss numerical methods for obtaining such solutions in Part Seven. For the pres-
ent, we can obtain the displacements that occur when the system eventually comes to
rest, that is, to the steady state. To do this, the derivatives in Eqs. (12.16), (12.17), and
(12.18) are set to zero to give
$$
\begin{aligned}
3kx_1 - 2kx_2 &= m_1 g \\
-2kx_1 + 3kx_2 - kx_3 &= m_2 g \\
-kx_2 + kx_3 &= m_3 g
\end{aligned}
$$
or, in matrix form,
$$[K]\{X\} = \{W\}$$
where [K], called the stiffness matrix, is
$$
[K] =
\begin{bmatrix}
3k & -2k & 0 \\
-2k & 3k & -k \\
0 & -k & k
\end{bmatrix}
$$
and {X} and {W} are the column vectors of the unknowns X and the weights mg,
respectively.
Solution. At this point, numerical methods can be employed to obtain a solution. If $m_1 = 2$ kg,
$m_2 = 3$ kg, $m_3 = 2.5$ kg, and the k’s $= 10\ \text{kg/s}^2$, use LU decomposition to solve
for the displacements and generate the inverse of [K].
Substituting the model parameters with g = 9.81 gives
$$
[K] =
\begin{bmatrix}
30 & -20 & 0 \\
-20 & 30 & -10 \\
0 & -10 & 10
\end{bmatrix}
\qquad
\{W\} =
\begin{Bmatrix} 19.62 \\ 29.43 \\ 24.525 \end{Bmatrix}
$$
LU decomposition can be employed to solve for $x_1 = 7.36$, $x_2 = 10.06$, and $x_3 = 12.51$.
These displacements were used to construct Fig. 12.11b. The inverse of the stiffness
matrix is computed as
$$
[K]^{-1} =
\begin{bmatrix}
0.1 & 0.1 & 0.1 \\
0.1 & 0.15 & 0.15 \\
0.1 & 0.15 & 0.25
\end{bmatrix}
$$
Each element of this matrix $k^{-1}_{ij}$ tells us the displacement of mass i due to a unit
force imposed on mass j. Thus, the values of 0.1 in column 1 tell us that a downward
unit load to the first mass will displace all of the masses 0.1 m downward. The other
elements can be interpreted in a similar fashion. Therefore, the inverse of the stiffness
matrix provides a fundamental summary of how the system’s components respond to
externally applied forces.
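Because [K] is tridiagonal, the steady-state displacements can also be obtained with the Thomas algorithm of Chap. 11 instead of a general LU decomposition. The sketch below makes that substitution explicitly; the function name `thomas` and the variable names are ours, and no pivoting is performed, which is adequate for this stiffness matrix but is not a general guarantee.

```python
def thomas(e, f, g, r):
    """Solve a tridiagonal system.

    e: sub-diagonal (e[0] unused), f: main diagonal,
    g: super-diagonal (g[-1] unused), r: right-hand side.
    The inputs are copied, not modified. No pivoting is used.
    """
    n = len(r)
    f, r = f[:], r[:]
    for k in range(1, n):                  # forward elimination
        factor = e[k] / f[k - 1]
        f[k] -= factor * g[k - 1]
        r[k] -= factor * r[k - 1]
    x = [0.0] * n
    x[-1] = r[-1] / f[-1]
    for k in range(n - 2, -1, -1):         # back substitution
        x[k] = (r[k] - g[k] * x[k + 1]) / f[k]
    return x

k_spring, g_acc = 10.0, 9.81               # kg/s^2 and m/s^2
masses = [2.0, 3.0, 2.5]                   # kg

sub  = [0.0, -2 * k_spring, -k_spring]
diag = [3 * k_spring, 3 * k_spring, k_spring]
sup  = [-2 * k_spring, -k_spring, 0.0]
W = [m * g_acc for m in masses]            # weights

x = thomas(sub, diag, sup, W)
print([round(v, 2) for v in x])

# Inverse of [K], one column per unit force applied to masses 1, 2, 3.
cols = [thomas(sub, diag, sup, [1.0 if i == j else 0.0 for i in range(3)])
        for j in range(3)]
K_inv = [[cols[j][i] for j in range(3)] for i in range(3)]
```

Each column of `K_inv` is the displacement pattern produced by a unit force on one mass, so the full response to any set of weights is a weighted sum of those columns.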
PROBLEMS
Chemical/Bio Engineering
12.1 Perform the same computation as in Sec. 12.1, but change c01
to 20 and c03 to 6. Also change the following flows: Q01 = 6, Q12 = 4,
Q24 = 2, and Q44 = 12.
12.2 If the input to reactor 3 in Sec. 12.1 is decreased 25 percent,
use the matrix inverse to compute the percent change in the concentration
of reactors 2 and 4.
12.3 Because the system shown in Fig. 12.3 is at steady state, what
can be said regarding the four flows: Q01, Q03, Q44, and Q55?
12.4 Recompute the concentrations for the five reactors shown in
Fig. 12.3, if the flows are changed to
Q01 = 5    Q31 = 3    Q25 = 2    Q23 = 2
Q15 = 4    Q55 = 3    Q54 = 3    Q34 = 7
Q12 = 4    Q03 = 8    Q24 = 0    Q44 = 10
12.5 Solve the same system as specified in Prob. 12.4, but set
Q12 = Q54 = 0 and Q15 = Q34 = 3. Assume that the inflows (Q01,
Q03) and outflows (Q44, Q55) are the same. Use conservation of flow
to recompute the values for the other flows.
12.6 Figure P12.6 shows three reactors linked by pipes. As indicated,
the rate of transfer of chemicals through each pipe is equal to a flow
rate (Q, with units of cubic meters per second) multiplied by the con-
centration of the reactor from which the flow originates (c, with units
of milligrams per cubic meter). If the system is at a steady state, the
transfer into each reactor will balance the transfer out. Develop mass-
balance equations for the reactors and solve the three simultaneous
linear algebraic equations for their concentrations.
12.7 Employing the same basic approach as in Sec. 12.1, deter-
mine the concentration of chloride in each of the Great Lakes using
the information shown in Fig. P12.7.
12.8 The Lower Colorado River consists of a series of four reser-
voirs as shown in Fig. P12.8. Mass balances can be written for each
reservoir and the following set of simultaneous linear algebraic
equations results:
$$
\begin{bmatrix}
13.442 & 0 & 0 & 0 \\
-13.442 & 12.252 & 0 & 0 \\
0 & -12.252 & 12.377 & 0 \\
0 & 0 & -12.377 & 11.797
\end{bmatrix}
\begin{Bmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{Bmatrix}
=
\begin{Bmatrix} 750.5 \\ 300 \\ 102 \\ 30 \end{Bmatrix}
$$
where the right-hand-side vector consists of the loadings of chloride
to each of the four lakes and c1, c2, c3, and c4 = the resulting
chloride concentrations for Lakes Powell, Mead, Mohave, and
Havasu, respectively.
(a) Use the matrix inverse to solve for the concentrations in each of
the four lakes.
(b) How much must the loading to Lake Powell be reduced in or-
der for the chloride concentration of Lake Havasu to be 75?
(c) Using the column-sum norm, compute the condition number
and how many suspect digits would be generated by solving
this system.
FIGURE P12.6
Three reactors linked by pipes. The rate of mass transfer through each pipe is equal to the product of flow Q and concentration c of the reactor from which the flow originates. (Diagram labels: reactors 1, 2, and 3; Q12 = 80, Q13 = 40, Q21 = 20, Q23 = 60, Q33 = 120; transfers Q12c1, Q13c1, Q21c2, Q23c2, Q33c3; loadings of 400 mg/s and 200 mg/s.)

FIGURE P12.7
A chloride balance for the Great Lakes. Numbered arrows are direct inputs. (Diagram labels: Superior, Michigan, Huron, Erie, Ontario; QSH = 67, QMH = 36, QHE = 161, QEO = 182, QOO = 212; transfers QSHcS, QMHcM, QHEcH, QEOcE, QOOcO; direct inputs of 3850, 4720, 740, 180, and 710.)

FIGURE P12.8
The Lower Colorado River. (Diagram labels: Upper Colorado River; Lake Powell, c1; Lake Mead, c2; Lake Mohave, c3; Lake Havasu, c4.)

12.9 A stage extraction process is depicted in Fig. P12.9. In such
systems, a stream containing a weight fraction Yin of a chemical
enters from the left at a mass flow rate of F1. Simultaneously, a
solvent carrying a weight fraction Xin of the same chemical enters
from the right at a flow rate of F2. Thus, for stage i, a mass balance
can be represented as
$$F_1 Y_{i-1} + F_2 X_{i+1} = F_1 Y_i + F_2 X_i \tag{P12.9.1}$$
At each stage, an equilibrium is assumed to be established between
Yi and Xi as in
$$K = \frac{X_i}{Y_i} \tag{P12.9.2}$$
where K is called a distribution coefficient. Equation (P12.9.2) can
be solved for Xi and substituted into Eq. (P12.9.1) to yield
$$Y_{i-1} - \left(1 + \frac{F_2}{F_1}K\right)Y_i + \left(\frac{F_2}{F_1}K\right)Y_{i+1} = 0 \tag{P12.9.3}$$
If F1 = 400 kg/h, Yin = 0.1, F2 = 800 kg/h, Xin = 0, and K = 5,
determine the values of Yout and Xout if a five-stage reactor is used.
Note that Eq. (P12.9.3) must be modified to account for the inflow
weight fractions when applied to the first and last stages.

FIGURE P12.9
A stage extraction process. (Diagram labels: stages 1, 2, ..., i, ..., n – 1, n; Flow = F1 with weight fractions yin, y1, y2, ..., yout; Flow = F2 with weight fractions xin, xn, ..., x2, xout.)

12.10 An irreversible, first-order reaction takes place in four well-mixed
reactors (Fig. P12.10),
$$A \xrightarrow{k} B$$
Thus, the rate at which A is transformed to B can be represented as
$$R_{ab} = kVc$$
The reactors have different volumes, and because they are operated
at different temperatures, each has a different reaction rate:

Reactor    V, L    k, h^-1
1          25      0.05
2          75      0.1
3          100     0.5
4          25      0.1

Determine the concentration of A and B in each of the reactors at
steady state.

FIGURE P12.10
(Diagram labels: reactors 1, 2, 3, and 4; Qin = 10, cA,in = 1; Q32 = 5; Q43 = 3.)

12.11 A peristaltic pump delivers a unit flow (Q1) of a highly
viscous fluid. The network is depicted in Fig. P12.11. Every pipe
section has the same length and diameter. The mass and mechanical
energy balance can be simplified to obtain the flows in every pipe.
Solve the following system of equations to obtain the flow in every
pipe.
$$
\begin{aligned}
Q_3 + 2Q_4 - 2Q_2 &= 0 \\
Q_5 + 2Q_6 - 2Q_4 &= 0 \\
3Q_7 - 2Q_6 &= 0 \\
Q_1 &= Q_2 + Q_3 \\
Q_3 &= Q_4 + Q_5 \\
Q_5 &= Q_6 + Q_7
\end{aligned}
$$

FIGURE P12.11
(Diagram labels: Q1, Q2, Q3, Q4, Q5, Q6, Q7.)

12.12 Figure P12.12 depicts a chemical exchange process consisting
of a series of reactors in which a gas flowing from left to right
is passed over a liquid flowing from right to left. The transfer of a
chemical from the gas into the liquid occurs at a rate that is proportional
to the difference between the gas and liquid concentrations in
each reactor. At steady state, a mass balance for the first reactor can
be written for the gas as
$$Q_G c_{G0} - Q_G c_{G1} + D(c_{L1} - c_{G1}) = 0$$
and for the liquid as
$$Q_L c_{L2} - Q_L c_{L1} + D(c_{G1} - c_{L1}) = 0$$
where QG and QL are the gas and liquid flow rates, respectively, and
D = the gas-liquid exchange rate. Similar balances can be written
for the other reactors. Solve for the concentrations given the following
values: QG = 2, QL = 1, D = 0.8, cG0 = 100, cL6 = 20.

FIGURE P12.12
(Diagram labels: gas flow QG with concentrations cG0 through cG5; liquid flow QL with concentrations cL1 through cL6; exchange rate D.)

Civil/Environmental Engineering
12.13 A civil engineer involved in construction requires 4800, 5810,
and 5690 m^3 of sand, fine gravel, and coarse gravel, respectively, for
a building project. There are three pits from which these materials
can be obtained. The composition of these pits is

         Sand    Fine Gravel    Coarse Gravel
         %       %              %
Pit 1    52      30             18
Pit 2    20      50             30
Pit 3    25      20             55

How many cubic meters must be hauled from each pit in order to
meet the engineer’s needs?
12.14 Perform the same computation as in Sec. 12.2, but for the
truss depicted in Fig. P12.14.
12.15 Perform the same computation as in Sec. 12.2, but for the
truss depicted in Fig. P12.15.
12.16 Calculate the forces and reactions for the truss in Fig. 12.4 if
a downward force of 2500 kg and a horizontal force to the right of
2000 kg are applied at node 1.
12.17 In the example for Fig. 12.4, where a 1000-lb downward
force is applied at node 1, the external reactions V2 and V3 were
calculated. But if the lengths of the truss members had been
given, we could have calculated V2 and V3 by utilizing the fact
that V2 + V3 must equal 1000 and by summing moments around
node 2. However, because we do know V2 and V3, we can work
backward to solve for the lengths of the truss members. Note that
because there are three unknown lengths and only two equations,
we can solve for only the relationship between lengths. Solve for
this relationship.
12.18 Employing the same methods as used to analyze Fig. 12.4,
determine the forces and reactions for the truss shown in
Fig. P12.18.
12.19 Solve for the forces and reaction for the truss in Fig. P12.19.
Determine the matrix inverse for the system. Does the vertical-member
force in the middle member seem reasonable? Why?

FIGURE P12.14

FIGURE P12.15

FIGURE P12.18

FIGURE P12.19
PROBLEMS 335
12.22 A truss is loaded as shown in Fig. P12.22. Using the follow-
ing set of equations, solve for the 10 unknowns: AB, BC, AD, BD,
CD, DE, CE, Ax, Ay, and Ey.
12.20 As the name implies, indoor air pollution deals with air con-
tamination in enclosed spaces such as homes, offices, work areas,
etc. Suppose that you are designing a ventilation system for a res-
taurant as shown in Fig. P12.20. The restaurant serving area con-
sists of two square rooms and one elongated room. Room 1 and
room 3 have sources of carbon monoxide from smokers and a
faulty grill, respectively. Steady-state mass balances can be written
for each room. For example, for the smoking section (room 1), the
balance can be written as
0 5 Wsmoker 1 Qaca 2 Qac1 1 E13(c3 2 c1)
(load) 1 (inflow) 2 (outflow) 1 (mixing)
or substituting the parameters
225c1 2 25c3 5 2400
Similar balances can be written for the other rooms.
(a) Solve for the steady-state concentration of carbon monoxide in
each room.
(b) Determine what percent of the carbon monoxide in the kids’
section is due to (i) the smokers, (ii) the grill, and (iii) the air in
the intake vents.
(c) If the smoker and grill loads are increased to 2000 and 5000
mg/hr, respectively, use the matrix inverse to determine the in-
crease in the concentration in the kids’ section.
(d) How does the concentration in the kids’ area change if a screen
is constructed so that the mixing between areas 2 and 4 is de-
creased to 5 m3
/hr?
12.21 An upward force of 20 kN is applied at the top of a tripod as
depicted in Fig. P12.21. Determine the forces in the legs of the
tripod.
Qc = 150 m3
/hr
2
(Kids' section)
1
(Smoking section)
Grill load
(2000 mg/hr)
Smoker load
(1000 mg/hr)
4
25 m3
/hr
25 m3
/hr
3
Qb = 50 m3
/hr
cb = 2 mg/m3
Qa = 200 m3
/hr
ca = 2 mg/m3
Qd = 100 m3
/hr
50
m
3
/hr
FIGURE P12.20
Overhead view of rooms in a
restaurant. The one-way arrows
represent volumetric airflows,
whereas the two-way arrows
represent diffusive mixing. The
smoker and grill loads add
carbon monoxide mass to the
system but negligible airflow.
D
B
C
A
x
y
0.6 m
2.4 m
0.8 m
0.8
m
1 m
FIGURE P12.21
336 CASE STUDIES: LINEAR ALGEBRAIC EQUATIONS
Metal, Plastic, Rubber,
Component g/component g/component g/component
1 15 0.30 1.0
2 17 0.40 1.2
3 19 0.55 1.5
If totals of 3.89, 0.095, and 0.282 kg of metal, plastic, and rubber,
respectively, are available each day, how many components can be
produced per day?
12.27 Determine the currents for the circuit in Fig. P12.27.
12.28 Determine the currents for the circuit in Fig. P12.28.
12.29 The following system of equations was generated by applying
the mesh current law to the circuit in Fig. P12.29:
55I1 2 25I4 5 2200
237I3 2 4I4 5 2250
225I1 2 4I3 1 29I4 5 100
Solve for I1, I3, and I4.
Ax 1 AD 5 0
Ay 1 AB 5 0
74 1 BC 1 (3y5)BD 5 0
2AB 2 (4y5)BD 5 0
2BC 1 (3y5)CE 5 0
224 2 CD 2 (4y5)CE 5 0
2AD 1 DE 2 (3y5)BD 5 0
CD 1 (4y5)BD 5 0
2DE 2 (3y5)CE 5 0
Ey 1 (4y5)CE 5 0
Electrical Engineering
12.23 Perform the same computation as in Sec. 12.3, but for the
circuit depicted in Fig. P12.23.
12.24 Perform the same computation as in Sec. 12.3, but for the
circuit depicted in Fig. P12.24.
12.25 Solve the circuit in Fig. P12.25 for the currents in each wire.
Use Gauss elimination with pivoting.
12.26 An electrical engineer supervises the production of three
types of electrical components. Three kinds of material—metal,
plastic, and rubber—are required for production. The amounts
needed to produce each component are
FIGURE P12.22
3 m 3 m
4 m
D
A E
C
B
54 kN
24 kN
FIGURE P12.23
R = 2 ⍀ R = 5 ⍀
R = 20 ⍀
3 2 1
4 5 6
R = 5 ⍀
R = 10 ⍀
V1 = 200 volts
V6 = 0 volts
R = 25 ⍀
FIGURE P12.24
R
= 7 ⍀
R = 5 ⍀ R = 10 ⍀
R = 30 ⍀
3 2 1
4 5 6
R = 18 ⍀
R = 35 ⍀
V1 = 10 volts
V6 = 200 volts
R = 5 ⍀
FIGURE P12.25
20 ⍀
5 ⍀
10 ⍀
10 ⍀
20 ⍀
5 ⍀
5 ⍀
60 ⍀
0 ⍀
4 7 9
2
1
8
3 6 15 ⍀
5
V2 = 40
V1 = 110
PROBLEMS 337
Mechanical/Aerospace Engineering
12.31 Perform the same computation as in Sec. 12.4, but add a
third spring between masses 1 and 2 and triple k for all springs.
12.32 Perform the same computation as in Sec. 12.4, but change
the masses from 2, 3, and 2.5 kg to 10, 3.5, and 2 kg, respectively.
12.33 Idealized spring-mass systems have numerous applications
throughout engineering. Figure P12.33 shows an arrangement of
four springs in series being depressed with a force of 2000 kg. At
equilibrium, force-balance equations can be developed defining the
interrelationships between the springs,
k2(x2 2 x1) 5 k1x1
k3(x3 2 x2) 5 k2(x2 2 x1)
k4(x4 2 x3) 5 k3(x3 2 x2)
F 5 k4(x4 2 x3)
where the k’s are spring constants. If k1 through k4 are 150, 50, 75,
and 225 N/m, respectively, compute the x’s.
12.34 Three blocks are connected by a weightless cord and rest on
an inclined plane (Fig. P12.34a). Employing a procedure similar to
the one used in the analysis of the falling parachutists in Example
12.30 The following system of equations was generated by apply-
ing the mesh current law to the circuit in Fig. P12.30:
$$60I_1 - 40I_2 = 200$$
$$-40I_1 + 150I_2 - 100I_3 = 0$$
$$-100I_2 + 130I_3 = 230$$
Solve for I1, I2, and I3.
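FIGURE P12.27 Circuit with 50 V and 80 V sources and resistors of 15, 25, 5, 10, and 20 Ω. (Diagram not reproduced.)
FIGURE P12.28 Circuit with a 20 V source, resistors of 8, 4, 5, 2, and 6 Ω, and labeled loop currents. (Diagram not reproduced.)
FIGURE P12.29 Circuit with a 100 V source, a 10 A current source, resistors of 25, 25, 8, 4, 10, and 20 Ω, and mesh currents I1 through I4. (Diagram not reproduced.)
FIGURE P12.30 Circuit with 200 V and 80 V sources, a 10 A current source, resistors of 20, 40, 10, 100, and 30 Ω, and mesh currents I1 through I4. (Diagram not reproduced.)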
9.11 yields the following set of simultaneous equations (free-body
diagrams are shown in Fig. P12.34b):
$$100a + T = 519.72$$
$$50a - T + R = 216.55$$
$$25a - R = 108.28$$
Solve for acceleration a and the tensions T and R in the two ropes.
12.35 Perform a computation similar to that called for in Prob. 12.34,
but for the system shown in Fig. P12.35.
12.36 Perform the same computation as in Prob. 12.34, but for the
system depicted in Fig. P12.36 (angles are 45°).
12.37 Consider the three mass-four spring system in Fig. P12.37.
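Determining the equations of motion from ΣFx = ma for each
mass using its free-body diagram results in the following differential
equations:
$$\ddot{x}_1 + \left(\frac{k_1 + k_2}{m_1}\right)x_1 - \left(\frac{k_2}{m_1}\right)x_2 = 0$$
$$\ddot{x}_2 - \left(\frac{k_2}{m_2}\right)x_1 + \left(\frac{k_2 + k_3}{m_2}\right)x_2 - \left(\frac{k_3}{m_2}\right)x_3 = 0$$
$$\ddot{x}_3 - \left(\frac{k_3}{m_3}\right)x_2 + \left(\frac{k_3 + k_4}{m_3}\right)x_3 = 0$$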
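FIGURE P12.33 Four springs in series (k1 through k4, with displacements x1 through x4 measured from x = 0) compressed by a force F. (Diagram not reproduced.)
FIGURE P12.34 (a) Three blocks of 100 kg, 50 kg, and 25 kg connected by cords on a 45° incline, with acceleration a. (b) Free-body diagrams showing the tensions T and R, weights of 100 × 9.8 = 980, 50 × 9.8 = 490, and 25 × 9.8 = 245, and friction forces of 692.96 × 0.25 = 173.24, 346.48 × 0.375 = 129.93, and 173.24 × 0.375 = 64.97. (Diagram not reproduced.)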
where T = temperature (°C), x = distance along the rod (m), h′ =
a heat transfer coefficient between the rod and the ambient air
(m⁻²), and Ta = the temperature of the surrounding air (°C). This
equation can be transformed into a set of linear algebraic equations
by using a finite divided difference approximation for the second
derivative (recall Section 4.1.3),
$$\frac{d^2T}{dx^2} = \frac{T_{i+1} - 2T_i + T_{i-1}}{\Delta x^2}$$
where Ti designates the temperature at node i. This approximation
can be substituted into Eq. (P12.38.1) to give
$$-T_{i-1} + (2 + h'\Delta x^2)T_i - T_{i+1} = h'\Delta x^2 T_a$$
This equation can be written for each of the interior nodes of the
rod resulting in a tridiagonal system of equations. The first and last
nodes at the rod’s ends are fixed by boundary conditions.
(a) Develop an analytical solution for Eq. (P12.38.1) for a
10-m rod with Ta = 20, T(x = 0) = 40, T(x = 10) = 200,
and h′ = 0.02.
(b) Develop a numerical solution for the same parameter values
employed in (a) using a finite-difference solution with four
interior nodes as shown in Fig. P12.38 (Δx = 2 m).
12.39 The steady-state distribution of temperature on a heated
plate can be modeled by the Laplace equation,
$$0 = \frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2}$$
If the plate is represented by a series of nodes (Fig. P12.39), cen-
tered finite-divided differences can be substituted for the second
derivatives, which results in a system of linear algebraic equations.
Use the Gauss-Seidel method to solve for the temperatures of the
nodes in Fig. P12.39.
where k1 = k4 = 10 N/m, k2 = k3 = 30 N/m, and m1 = m2 = m3 =
2 kg. Write the three equations in matrix form:
0 = [Acceleration vector] + [k/m matrix][displacement vector x]
At a specific time when x1 = 0.05 m, x2 = 0.04 m, and x3 = 0.03 m,
this forms a tridiagonal matrix. Solve for the acceleration of
each mass.
12.38 Linear algebraic equations can arise in the solution of
differential equations. For example, the following differential equa-
tion derives from a heat balance for a long, thin rod (Fig. P12.38):
$$\frac{d^2T}{dx^2} + h'(T_a - T) = 0 \tag{P12.38.1}$$
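FIGURE P12.35 Blocks of 40 kg, 50 kg, and 10 kg on inclines of 30° and 60°, with friction coefficients of 0.5, 0.3, and 0.2. (Diagram not reproduced.)
FIGURE P12.38
A noninsulated uniform rod positioned between two walls of
constant but different temperature. The finite difference
representation employs four interior nodes. Labels: T0 = 40 at x = 0, T5 = 200 at x = 10, Ta = 10, and node spacing Δx. (Diagram not reproduced.)
FIGURE P12.36 Blocks of 8 kg, 10 kg, 15 kg, and 5 kg, with friction coefficients of 0.8 and 0.2. (Diagram not reproduced.)
FIGURE P12.37 Three masses (m1, m2, m3) with displacements x1, x2, and x3, connected by four springs (k1 through k4). (Diagram not reproduced.)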
12.40 A rod on a ball and socket joint is attached to cables A and B,
as in Fig. P12.40.
(a) If a 50-N force is exerted on the massless rod at G, what is the
tensile force at cables A and B?
(b) Solve for the reactant forces at the base of the rod. Call the base
point P.
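FIGURE P12.39 Heated plate with interior nodes T11, T12, T21, and T22 and fixed boundary temperatures of 200°C, 0°C, 75°C, and 25°C. (Diagram not reproduced.)
FIGURE P12.40 Rod on a ball and socket joint in x-y-z coordinates, held by cables A and B, with a 50-N force applied; labeled dimensions of 2 m, 2 m, 2 m, 1 m, 2 m, and 1 m. (Diagram not reproduced.)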
PT3.4 TRADE-OFFS
Table PT3.2 provides a summary of the trade-offs involved in solving simultaneous
linear algebraic equations. Two methods—graphical and Cramer’s rule—are limited to
small (≤ 3) numbers of equations and thus have little utility for practical problem solving.
However, these techniques are useful didactic tools for understanding the behavior
of linear systems in general.
The numerical methods themselves are divided into two general categories: exact
and approximate methods. As the name implies, the former are intended to yield exact
answers. However, because they are affected by round-off errors, they sometimes yield
imprecise results. The magnitude of the round-off error varies from system to system
and is dependent on a number of factors. These include the system’s dimensions, its
condition, and whether the matrix of coefficients is sparse or full. In addition, computer
precision will affect round-off error.
It is recommended that a pivoting strategy be employed in any computer program
implementing exact elimination methods. The inclusion of such a strategy minimizes
round-off error and avoids problems such as division by zero. All other things being
equal, LU decomposition–based algorithms are the methods of choice because of their
efficiency and flexibility.
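The role of pivoting can be sketched in a few lines of Python. The function name `gauss_solve` and the 3 × 3 test system below are assumptions made for this illustration and do not come from the text. The first equation deliberately has a zero in the pivot position, so elimination without row swaps would divide by zero.

```python
def gauss_solve(a, b):
    """Solve a x = b by Gauss elimination with partial pivoting (illustrative sketch)."""
    n = len(b)
    # Work on an augmented copy so the caller's lists are left untouched.
    aug = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for k in range(n - 1):
        # Partial pivoting: move the largest-magnitude entry of column k into the pivot row.
        p = max(range(k, n), key=lambda i: abs(aug[i][k]))
        if aug[p][k] == 0.0:
            raise ValueError("matrix is singular")
        aug[k], aug[p] = aug[p], aug[k]
        # Forward elimination of column k below the pivot.
        for i in range(k + 1, n):
            factor = aug[i][k] / aug[k][k]
            for j in range(k, n + 1):
                aug[i][j] -= factor * aug[k][j]
    if aug[n - 1][n - 1] == 0.0:
        raise ValueError("matrix is singular")
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


# Hypothetical test system with a zero in the first pivot position.
solution = gauss_solve([[0, 2, 1], [1, 1, 1], [2, 1, -1]], [7, 6, 1])
print(solution)
```

Choosing the largest available pivot also keeps the elimination factors at or below one in magnitude, which is the usual argument for why the strategy limits the growth of round-off error.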
TABLE PT3.2 Comparison of the characteristics of alternative methods for finding solutions
of simultaneous linear algebraic equations.
| Method | Stability | Precision | Breadth of Application | Programming Effort | Comments |
| Graphical | - | Poor | Limited | - | May take more time than the numerical method, but can be useful for visualization |
| Cramer’s rule | - | Affected by round-off error | Limited | - | Excessive computational effort required for more than three equations |
| Gauss elimination (with partial pivoting) | - | Affected by round-off error | General | Moderate | |
| LU decomposition | - | Affected by round-off error | General | Moderate | Preferred elimination method; allows computation of matrix inverse |
| Gauss-Seidel | May not converge if not diagonally dominant | Excellent | Appropriate only for diagonally dominant systems | Easy | |
EPILOGUE: PART THREE
Although elimination methods have great utility, their use of the entire matrix of
coefficients can be somewhat limiting when dealing with very large, sparse systems, because
large portions of computer memory would be devoted to storing
meaningless zeros. For banded systems, techniques are available to implement elimination
methods without having to store the entire coefficient matrix.
The approximate technique described in this book is called the Gauss-Seidel
method. It differs from the exact techniques in that it employs an iterative scheme to
obtain progressively closer estimates of the solution. Thus, the effect of round-off is a
moot point with the Gauss-Seidel method because the iterations can be continued as
long as is necessary to obtain the desired precision. In addition, versions of the Gauss-
Seidel method can be developed to make efficient use of computer storage
for sparse systems. Consequently, the Gauss-Seidel technique has utility for large systems
of equations where storage requirements would pose significant problems for the
exact techniques.
The disadvantage of the Gauss-Seidel method is that it does not always converge or
sometimes converges slowly on the true solution. It is strictly reliable only for those
systems that are diagonally dominant. However, relaxation methods are available that
sometimes offset these disadvantages. In addition, because many sets of linear algebraic
equations originating from physical systems exhibit diagonal dominance, the Gauss-
Seidel method has great utility for engineering problem solving.
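The iterative scheme, its stopping criterion, and relaxation can be sketched as follows. This is a minimal Python illustration, not the book's own program: the function name `gauss_seidel`, the zero starting guess, and the default tolerance are assumptions. The test system is the one stated in Prob. 12.30, used here only because it happens to be diagonally dominant.

```python
def gauss_seidel(a, c, es=1e-6, lam=1.0, max_iter=500):
    """Gauss-Seidel iteration with relaxation factor lam (lam = 1 means no relaxation).

    Iterates until the percent relative change of every unknown falls below es.
    Returns the solution estimate and the number of iterations used.
    """
    n = len(c)
    x = [0.0] * n  # assumed starting guess
    for iteration in range(1, max_iter + 1):
        max_err = 0.0
        for i in range(n):
            old = x[i]
            # Use the most recent values of the other unknowns.
            s = sum(a[i][j] * x[j] for j in range(n) if j != i)
            new = (c[i] - s) / a[i][i]
            # Relaxation: weighted average of the new and previous values.
            x[i] = lam * new + (1.0 - lam) * old
            if x[i] != 0.0:
                err = abs((x[i] - old) / x[i]) * 100.0
            else:
                err = 0.0 if old == 0.0 else float("inf")
            max_err = max(max_err, err)
        if max_err < es:
            return x, iteration
    raise RuntimeError("Gauss-Seidel did not converge within max_iter iterations")


# Mesh-current system from Prob. 12.30 (diagonally dominant).
a = [[60.0, -40.0, 0.0], [-40.0, 150.0, -100.0], [0.0, -100.0, 130.0]]
c = [200.0, 0.0, 230.0]
currents, iterations = gauss_seidel(a, c)
print(currents, iterations)
```

Only the nonzero coefficients take part in each update, which is why sparse-storage versions of the method are straightforward to build.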
In summary, a variety of factors will bear on your choice of a technique for a par-
ticular problem involving linear algebraic equations. However, as outlined above, the size
and sparseness of the system are particularly important factors in determining your choice.
PT3.5 IMPORTANT RELATIONSHIPS AND FORMULAS
Every part of this book includes a section that summarizes important formulas. Although
Part Three does not really deal with single formulas, we have used Table PT3.3 to sum-
marize the algorithms that were covered. The table provides an overview that should be
helpful for review and in elucidating the major differences between the methods.
PT3.6 ADVANCED METHODS AND ADDITIONAL REFERENCES
General references on the solution of simultaneous linear equations can be found in
Fadeev and Fadeeva (1963), Stewart (1973), Varga (1962), and Young (1971). Ralston
and Rabinowitz (1978) provide a general summary.
Many advanced techniques are available to increase the savings in time and/or space
when solving linear algebraic equations. Most of these focus on exploiting properties of
the equations such as symmetry and bandedness. In particular, algorithms are available
to operate on sparse matrices to convert them to a minimum banded format. Jacobs
(1977) and Tewarson (1973) include information on this area. Once they are in a mini-
mum banded format, there are a variety of efficient solution strategies that are employed
such as the active column storage approach of Bathe and Wilson (1976).
Aside from n × n sets of equations, there are other systems where the number of
equations, m, and number of unknowns, n, are not equal. Systems where m < n are
called underdetermined. In such cases, there can be either no solution or else more than
one. Systems where m > n are called overdetermined. For such situations, there is in
general no exact solution. However, it is often possible to develop a compromise solution
that attempts to determine answers that come “closest” to satisfying all the equations
simultaneously. A common approach is to solve the equation in a “least-squares” sense
(Lawson and Hanson, 1974; Wilkinson and Reinsch, 1971). Alternatively, linear programming
methods can be used where the equations are solved in an “optimal” sense by
minimizing some objective function (Dantzig, 1963; Luenberger, 1984; and Rabinowitz,
1968). We describe this approach in detail in Chap. 15.
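As a small illustration of the least-squares idea, the sketch below solves an overdetermined system with two unknowns by forming the normal equations. This is only one simple route, chosen here for brevity; the cited references generally favor more robust orthogonalization-based algorithms. The function name `least_squares_2` and the three-equation test system are assumptions made for this example.

```python
def least_squares_2(a, b):
    """Least-squares solution of an m x 2 system a x = b via the normal equations.

    Forms (a^T a) x = a^T b and solves the resulting 2 x 2 system in closed form.
    """
    s11 = sum(row[0] * row[0] for row in a)
    s12 = sum(row[0] * row[1] for row in a)
    s22 = sum(row[1] * row[1] for row in a)
    t1 = sum(row[0] * rhs for row, rhs in zip(a, b))
    t2 = sum(row[1] * rhs for row, rhs in zip(a, b))
    det = s11 * s22 - s12 * s12
    if det == 0.0:
        raise ValueError("columns of a are linearly dependent")
    x1 = (t1 * s22 - s12 * t2) / det
    x2 = (s11 * t2 - s12 * t1) / det
    return x1, x2


# Hypothetical overdetermined system: three equations, two unknowns.
# x1 + 0*x2 = 1,  x1 + 1*x2 = 2,  x1 + 2*x2 = 2
a = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
b = [1.0, 2.0, 2.0]
print(least_squares_2(a, b))
```

No choice of x1 and x2 satisfies all three equations exactly; the result minimizes the sum of the squared residuals instead.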
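TABLE PT3.3 Summary of important information presented in Part Three.

Gauss elimination
Procedure:
$$\left[\begin{array}{ccc|c} a_{11} & a_{12} & a_{13} & c_1 \\ a_{21} & a_{22} & a_{23} & c_2 \\ a_{31} & a_{32} & a_{33} & c_3 \end{array}\right] \Rightarrow \left[\begin{array}{ccc|c} a_{11} & a_{12} & a_{13} & c_1 \\ & a'_{22} & a'_{23} & c'_2 \\ & & a''_{33} & c''_3 \end{array}\right] \Rightarrow \begin{array}{l} x_3 = c''_3 / a''_{33} \\ x_2 = (c'_2 - a'_{23} x_3) / a'_{22} \\ x_1 = (c_1 - a_{12} x_2 - a_{13} x_3) / a_{11} \end{array}$$
Potential problems: ill conditioning, round-off, division by zero.
Remedies: higher precision, partial pivoting.

LU decomposition
Procedure (the three arrows correspond to decomposition, forward substitution, and back substitution):
$$\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \Rightarrow \begin{bmatrix} 1 & 0 & 0 \\ l_{21} & 1 & 0 \\ l_{31} & l_{32} & 1 \end{bmatrix} \begin{Bmatrix} d_1 \\ d_2 \\ d_3 \end{Bmatrix} = \begin{Bmatrix} c_1 \\ c_2 \\ c_3 \end{Bmatrix} \Rightarrow \begin{bmatrix} u_{11} & u_{12} & u_{13} \\ 0 & u_{22} & u_{23} \\ 0 & 0 & u_{33} \end{bmatrix} \begin{Bmatrix} x_1 \\ x_2 \\ x_3 \end{Bmatrix} = \begin{Bmatrix} d_1 \\ d_2 \\ d_3 \end{Bmatrix} \Rightarrow \begin{Bmatrix} x_1 \\ x_2 \\ x_3 \end{Bmatrix}$$
Potential problems: ill conditioning, round-off, division by zero.
Remedies: higher precision, partial pivoting.

Gauss-Seidel method
Procedure:
$$x_1^i = (c_1 - a_{12} x_2^{i-1} - a_{13} x_3^{i-1}) / a_{11}$$
$$x_2^i = (c_2 - a_{21} x_1^i - a_{23} x_3^{i-1}) / a_{22}$$
$$x_3^i = (c_3 - a_{31} x_1^i - a_{32} x_2^i) / a_{33}$$
continue iteratively until
$$\left| \frac{x_i^i - x_i^{i-1}}{x_i^i} \right| 100\% < \varepsilon_s$$
for all xi’s.
Potential problems: divergent or converges slowly.
Remedies: diagonal dominance, relaxation.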
PART FOUR
PT4.1 MOTIVATION
Root location (Part 2) and optimization are related in the sense that both involve guessing
and searching for a point on a function. The fundamental difference between the two types
of problems is illustrated in Fig. PT4.1. Root location involves searching for zeros of a
function or functions. In contrast, optimization involves searching for either the minimum
or the maximum.
The optimum is the point where the curve is flat. In mathematical terms, this corresponds
to the x value where the derivative f′(x) is equal to zero. Additionally, the second
derivative, f″(x), indicates whether the optimum is a minimum or a maximum: if f″(x) < 0,
the point is a maximum; if f″(x) > 0, the point is a minimum.
Now, understanding the relationship between roots and optima would suggest a possible
strategy for finding the latter. That is, you can differentiate the function and locate
the root (that is, the zero) of the new function. In fact, some optimization methods seek
to find an optimum by solving the root problem: f′(x) = 0. It should be noted that such
searches are often complicated because f′(x) is not available analytically. Thus, one must
sometimes use finite-difference approximations to estimate the derivative.
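The strategy just described can be sketched in Python. The derivative is estimated with centered finite differences, its root is bracketed and located by bisection, and the sign of the estimated second derivative classifies the result. The function names, step sizes, bracket, and the test function f(x) = 2 sin x − x²/10 are all assumptions chosen for illustration.

```python
import math


def d1(f, x, h=1e-5):
    """Centered finite-difference estimate of the first derivative."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def d2(f, x, h=1e-4):
    """Centered finite-difference estimate of the second derivative."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


def optimum_by_root(f, lo, hi, tol=1e-8):
    """Locate an optimum of f in [lo, hi] by bisecting on the estimated f'(x)."""
    g_lo, g_hi = d1(f, lo), d1(f, hi)
    if g_lo * g_hi > 0.0:
        raise ValueError("f'(x) does not change sign on the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = d1(f, mid)
        if g_lo * g_mid <= 0.0:
            hi = mid              # sign change lies in the lower half
        else:
            lo, g_lo = mid, g_mid  # sign change lies in the upper half
    x = 0.5 * (lo + hi)
    kind = "maximum" if d2(f, x) < 0.0 else "minimum"
    return x, kind


def f(x):
    # Illustrative test function (an assumption for this sketch).
    return 2.0 * math.sin(x) - x ** 2 / 10.0


x_opt, kind = optimum_by_root(f, 0.0, 4.0)
print(x_opt, kind, f(x_opt))
```

Bisection is used here only because it is simple and reliable once a sign change in the derivative has been bracketed; any root-location method from Part Two could take its place.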
Beyond viewing optimization as a roots problem, it should be noted that the task of
locating optima is aided by some extra mathematical structure that is not part of simple
root finding. This tends to make optimization a more tractable task, particularly for
multidimensional cases.
OPTIMIZATION
FIGURE PT4.1
A function of a single variable illustrating the difference between roots and optima.
Labels: roots where f(x) = 0; maximum where f′(x) = 0 and f″(x) < 0; minimum where f′(x) = 0 and f″(x) > 0. (Plot not reproduced.)
PT4.1.1 Noncomputer Methods and History
As mentioned above, differential calculus methods are still used to determine optimum solu-
tions. All engineering and science students recall working maxima-minima problems by
determining first derivatives of functions in their calculus courses. Bernoulli, Euler, Lagrange,
and others laid the foundations of the calculus of variations, which deals with the minimiza-
tion of functions. The Lagrange multiplier method was developed to optimize constrained
problems, that is, optimization problems where the variables are bounded in some way.
The first major advances in numerical approaches occurred only with the develop-
ment of digital computers after World War II. Koopmans in the United Kingdom and
Kantorovich in the former Soviet Union independently worked on the general problem
of least-cost distribution of supplies and products. In 1947, Koopmans’s student Dantzig
invented the simplex procedure for solving linear programming problems. This approach
paved the way for other methods of constrained optimization by a number of investiga-
tors, notably Charnes and his coworkers. Approaches for unconstrained optimization also
developed rapidly following the widespread availability of computers.
PT4.1.2 Optimization and Engineering Practice
Most of the mathematical models we have dealt with to this point have been descriptive
models. That is, they have been derived to simulate the behavior of an engineering device
or system. In contrast, optimization typically deals with finding the “best result,” or opti-
mum solution, of a problem. Thus, in the context of modeling, they are often termed
prescriptive models since they can be used to prescribe a course of action or the best design.
Engineers must continuously design devices and products that perform tasks in an
efficient fashion. In doing so, they are constrained by the limitations of the physical
world. Further, they must keep costs down. Thus, they are always confronting optimiza-
tion problems that balance performance and limitations. Some common instances are
listed in Table PT4.1. The following example has been developed to help you get a feel
for the way in which such problems might be formulated.
TABLE PT4.1 Some common examples of optimization problems in engineering.
• Design aircraft for minimum weight and maximum strength.
• Optimal trajectories of space vehicles.
• Design civil engineering structures for minimum cost.
• Design water-resource projects like dams to mitigate flood damage while yielding maximum hydropower.
• Predict structural behavior by minimizing potential energy.
• Material-cutting strategy for minimum cost.
• Design pump and heat transfer equipment for maximum efficiency.
• Maximize power output of electrical networks and machinery while minimizing heat generation.
• Shortest route of salesperson visiting various cities during one sales trip.
• Optimal planning and scheduling.
• Statistical analysis and models with minimum error.
• Optimal pipeline networks.
• Inventory control.
• Maintenance planning to minimize cost.
• Minimize waiting and idling times.
• Design waste treatment systems to meet water-quality standards at least cost.
EXAMPLE PT4.1 Optimization of Parachute Cost
Problem Statement. Throughout the rest of the book, we have used the falling para-
chutist to illustrate the basic problem areas of numerical methods. You may have noticed
that none of these examples concentrate on what happens after the chute opens. In this
example, we will examine a case where the chute has opened and we are interested in
predicting impact velocity at the ground.
You are an engineer working for an agency planning to airlift supplies to refugees
in a war zone. The supplies will be dropped at low altitude (500 m) so that the drop is
not detected and the supplies fall as close as possible to the refugee camp. The chutes
open immediately upon leaving the plane. To reduce damage, the vertical velocity on
impact must be below a critical value of vc = 20 m/s.
The parachute used for the drop is depicted in Fig. PT4.2. The cross-sectional area
of the chute is that of a half sphere,
$$A = 2\pi r^2 \tag{PT4.1}$$
The length of each of the 16 cords connecting the chute to the mass is related to the
chute radius by
$$\ell = \sqrt{2}\,r \tag{PT4.2}$$
You know that the drag force for the chute is a linear function of its cross-sectional area
described by the following formula
$$c = k_c A \tag{PT4.3}$$
where c = drag coefficient (kg/s) and kc = a proportionality constant parameterizing the
effect of area on drag [kg/(s · m²)].
Also, you can divide the payload into as many parcels as you like. That is, the mass
of each individual parcel can be calculated as
$$m = \frac{M_t}{n}$$
FIGURE PT4.2
A deployed parachute. Labels: chute radius r, cord length ℓ, and parcel mass m. (Diagram not reproduced.)
where m = mass of an individual parcel (kg), Mt = total load being dropped (kg), and
n = total number of parcels.
Finally, the cost of each chute is related to chute size in a nonlinear fashion,
$$\text{Cost per chute} = c_0 + c_1 \ell + c_2 A^2 \tag{PT4.4}$$
where c0, c1, and c2 5 cost coefficients. The constant term, c0, is the base price for the
chutes. The nonlinear relationship between cost and area exists because larger chutes are
much more difficult to construct than small chutes.
Determine the size (r) and number of chutes (n) that result in minimum cost while
at the same time meeting the requirement of having a sufficiently small impact velocity.
Solution. The objective here is to determine the number and size of parachutes to
minimize the cost of the airlift. The problem is constrained because the parcels must
have an impact velocity less than a critical value.
The cost can be computed by multiplying the cost of the individual parachute
[Eq. (PT4.4)] by the number of parachutes (n). Thus, the function you wish to minimize,
which is formally called the objective function, is written as
$$\text{Minimize } C = n(c_0 + c_1 \ell + c_2 A^2) \tag{PT4.5}$$
where C = cost ($) and A and ℓ are calculated by Eqs. (PT4.1) and (PT4.2), respectively.
Next, we must specify the constraints. For this problem there are two constraints.
First, the impact velocity must be equal to or less than the critical velocity,
$$v \le v_c \tag{PT4.6}$$
Second, the number of parcels must be an integer and greater than or equal to 1,
$$n \ge 1 \tag{PT4.7}$$
where n is an integer.
At this point, the optimization problem has been formulated. As can be seen, it is a
nonlinear constrained problem.
Although the problem has been broadly formulated, one more issue must be
addressed: How do we determine the impact velocity v? Recall from Chap. 1 that the
velocity of a falling object can be computed with
$$v = \frac{gm}{c}\left(1 - e^{-(c/m)t}\right) \tag{1.10}$$
where v = velocity (m/s), g = acceleration of gravity (m/s²), m = mass (kg), and t =
time (s).
Although Eq. (1.10) provides a relationship between v and t, we need to know how long
the mass falls. Therefore, we need a relationship between the drop distance z and the time
of fall t. The drop distance can be calculated from the velocity in Eq. (1.10) by integration
$$z = \int_0^t \frac{gm}{c}\left(1 - e^{-(c/m)t}\right) dt \tag{PT4.8}$$
This integral can be evaluated to yield
$$z = z_0 - \frac{gm}{c}t + \frac{gm^2}{c^2}\left(1 - e^{-(c/m)t}\right) \tag{PT4.9}$$
where z0 = initial height (m). This function, as plotted in Fig. PT4.3, provides a way to
predict z given knowledge of t.
However, we do not need z as a function of t to solve this problem. Rather, we need
to compute the time required for the parcel to fall the distance z0. Thus, we recognize
that we must reformulate Eq. (PT4.9) as a root-finding problem. That is, we must solve
for the time at which z goes to zero,
$$f(t) = 0 = z_0 - \frac{gm}{c}t + \frac{gm^2}{c^2}\left(1 - e^{-(c/m)t}\right) \tag{PT4.10}$$
Once the time to impact is computed, we can substitute it into Eq. (1.10) to solve for
the impact velocity.
The final specification of the problem, therefore, would be
$$\text{Minimize } C = n(c_0 + c_1 \ell + c_2 A^2) \tag{PT4.11}$$
subject to
$$v \le v_c \tag{PT4.12}$$
$$n \ge 1 \tag{PT4.13}$$
where
$$A = 2\pi r^2 \tag{PT4.14}$$
$$\ell = \sqrt{2}\,r \tag{PT4.15}$$
$$c = k_c A \tag{PT4.16}$$
$$m = \frac{M_t}{n} \tag{PT4.17}$$
FIGURE PT4.3
The height z and velocity v of a deployed parachute as it falls to earth (z = 0).
Axes: time t (s) from 0 to 15; z (m) and v (m/s) from 0 to 600, with the impact point marked. (Plot not reproduced.)
$$t = \text{root}\left[z_0 - \frac{gm}{c}t + \frac{gm^2}{c^2}\left(1 - e^{-(c/m)t}\right)\right] \tag{PT4.18}$$
$$v = \frac{gm}{c}\left(1 - e^{-(c/m)t}\right) \tag{PT4.19}$$
We will solve this problem in Example 15.4 in Chap. 15. For the time being recog-
nize that it has most of the fundamental elements of other optimization problems you
will routinely confront in engineering practice. These are
• The problem will involve an objective function that embodies your goal.
• There will be a number of design variables. These variables can be real numbers or
they can be integers. In our example, these are r (real) and n (integer).
• The problem will include constraints that reflect the limitations you are working under.
We should make one more point before proceeding. Although the objective function
and constraints may superficially appear to be simple equations [e.g., Eq. (PT4.12)], they
may in fact be the “tip of the iceberg.” That is, they may be underlain by complex de-
pendencies and models. For instance, as in our example, they may involve other numeri-
cal methods [Eq. (PT4.18)]. This means that the functional relationships you will be using
could actually represent large and complicated calculations. Thus, techniques that can find
the optimal solution, while minimizing function evaluations, can be extremely valuable.
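To make the "tip of the iceberg" point concrete, the following Python sketch evaluates the cost and the impact velocity for one candidate design (r, n). A single evaluation already hides a root-location step for Eq. (PT4.18). The function names, the use of bisection, g = 9.81 m/s², and every numerical parameter value other than z0 = 500 m are assumptions made for illustration; they are not given in this section.

```python
import math

G = 9.81  # acceleration of gravity (m/s^2), assumed value


def impact_time(z0, m, c, tol=1e-10):
    """Time at which the height of Eq. (PT4.10) reaches zero, found by bisection."""
    def height(t):
        return z0 - G * m / c * t + G * m ** 2 / c ** 2 * (1.0 - math.exp(-c / m * t))

    lo, hi = 0.0, 1.0
    while height(hi) > 0.0:  # expand the bracket until the parcel is below ground
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if height(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def evaluate_design(r, n, total_mass, kc, z0, c0, c1, c2):
    """Return (cost, impact velocity) for chute radius r and n parcels."""
    area = 2.0 * math.pi * r ** 2        # Eq. (PT4.14)
    cord = math.sqrt(2.0) * r            # Eq. (PT4.15)
    drag = kc * area                     # Eq. (PT4.16)
    mass = total_mass / n                # Eq. (PT4.17)
    t = impact_time(z0, mass, drag)      # Eq. (PT4.18)
    v = G * mass / drag * (1.0 - math.exp(-drag / mass * t))  # Eq. (PT4.19)
    cost = n * (c0 + c1 * cord + c2 * area ** 2)              # Eq. (PT4.11)
    return cost, v


# Hypothetical parameter values, chosen only to exercise the functions.
params = dict(total_mass=2000.0, kc=3.0, z0=500.0, c0=200.0, c1=56.0, c2=0.1)
cost, v = evaluate_design(3.0, 6, **params)
print(cost, v, "feasible" if v <= 20.0 else "infeasible")
```

An optimizer would call a function like `evaluate_design` many times while varying r and n, which is why methods that need few function evaluations are attractive.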
PT4.2 MATHEMATICAL BACKGROUND
There are a number of mathematical concepts and operations that underlie optimization.
Because we believe that they will be more relevant to you in context, we will defer
discussion of specific mathematical prerequisites until they are needed. For example, we
will discuss the important concepts of the gradient and Hessians at the beginning of
Chap. 14 on multivariate unconstrained optimization. In the meantime, we will limit
ourselves here to the more general topic of how optimization problems are classified.
An optimization or mathematical programming problem generally can be stated as:
Find x, which minimizes or maximizes f(x)
subject to
$$d_i(x) \le a_i \qquad i = 1, 2, \ldots, m \tag{PT4.20}$$
$$e_i(x) = b_i \qquad i = 1, 2, \ldots, p \tag{PT4.21}$$
where x is an n-dimensional design vector, f (x) is the objective function, di(x) are inequal-
ity constraints, ei(x) are equality constraints, and ai and bi are constants.
Optimization problems can be classified on the basis of the form of f(x):
• If f(x) and the constraints are linear, we have linear programming.
• If f(x) is quadratic and the constraints are linear, we have quadratic programming.
• If f(x) is not linear or quadratic and/or the constraints are nonlinear, we have nonlinear
programming.
Further, when Eqs. (PT4.20) and (PT4.21) are included, we have a constrained optimiza-
tion problem; otherwise, it is an unconstrained optimization problem.
Note that for constrained problems, the degrees of freedom are given by n − p − m.
Generally, to obtain a solution, p + m must be ≤ n. If p + m > n, the problem is said
to be overconstrained.
Another way in which optimization problems are classified is by dimensionality.
This is most commonly done by dividing them into one-dimensional and multidimensional
problems. As the name implies, one-dimensional problems involve functions that
depend on a single independent variable. As in Fig. PT4.4a, the search then consists of
climbing or descending one-dimensional peaks and valleys. Multidimensional problems
involve functions that depend on two or more independent variables. In the same spirit, a
two-dimensional optimization can again be visualized as searching out peaks and valleys
(Fig. PT4.4b). However, just as in real hiking, we are not constrained to walk in a single
direction; instead, the topography is examined to efficiently reach the goal.
Finally, the process of finding a maximum versus finding a minimum is essentially
identical because the same value, x*, both minimizes f(x) and maximizes −f(x). This
equivalence is illustrated graphically for a one-dimensional function in Fig. PT4.4a.
PT4.3 ORIENTATION
Some orientation is helpful before proceeding to the numerical methods for optimization.
The following is intended to provide an overview of the material in Part Four. In addi-
tion, some objectives have been included to help you focus your efforts when studying
the material.
FIGURE PT4.4
(a) One-dimensional optimization. This figure also illustrates how minimization of f(x) is equivalent
to the maximization of −f(x). (b) Two-dimensional optimization. Note that this figure can be
taken to represent either a maximization (contours increase in elevation up to the maximum like a
mountain) or a minimization (contours decrease in elevation down to the minimum like a valley).
(Plots not reproduced.)
PT4.3.1 Scope and Preview
Figure PT4.5 is a schematic representation of the organization of Part Four. Examine this
figure carefully, starting at the top and working clockwise.
After the present introduction, Chap. 13 is devoted to one-dimensional unconstrained
optimization. Methods are presented to find the minimum or maximum of a function of
a single variable. Three methods are covered: golden-section search, parabolic interpola-
tion, and Newton’s method. An advanced hybrid approach, Brent’s method, that combines
the reliability of the golden-section search with the speed of parabolic interpolation is
also described.
Chapter 14 covers two general types of methods to solve multidimensional uncon-
strained optimization problems. Direct methods such as random searches, univariate
searches, and pattern searches do not require the evaluation of the function’s derivatives.
On the other hand, gradient methods use first and sometimes second derivatives
to find the optimum. The chapter introduces the gradient and the Hessian, which are
multidimensional representations of the first and second derivatives. The method of steep-
est ascent/descent is then covered in some detail. This is followed by descriptions of
some advanced methods: conjugate gradient, Newton’s method, Marquardt’s method, and
quasi-Newton methods.
Chapter 15 is devoted to constrained optimization. Linear programming is described
in detail using both a graphical representation and the simplex method. The detailed
analysis of nonlinear constrained optimization is beyond this book’s scope, but we pro-
vide an overview of the major approaches. In addition, we illustrate how such problems
(along with the problems covered in Chaps. 13 and 14) can be solved with software
packages such as Excel, MATLAB, and Mathcad.
Chapter 16 extends the above concepts to actual engineering problems. Engineering
applications are used to illustrate how optimization problems are formulated and provide
insight into the application of the solution techniques in professional practice.
An epilogue is included at the end of Part Four. It contains an overview of the
methods discussed in Chaps. 13, 14, and 15. This overview includes a description of
trade-offs related to the proper use of each technique. This section also provides refer-
ences for some numerical methods that are beyond the scope of this text.
PT4.3.2 Goals and Objectives
Study Objectives. After completing Part Four, you should have sufficient information
to successfully approach a wide variety of engineering problems dealing with optimiza-
tion. In general, you should have mastered the techniques, have learned to assess their
reliability, and be capable of analyzing alternative methods for any particular problem.
In addition to these general goals, the specific concepts in Table PT4.2 should be as-
similated for a comprehensive understanding of the material in Part Four.
Computer Objectives. You should be able to write a subprogram to implement a simple
one-dimensional (like golden-section search or parabolic interpolation) and multidimen-
sional (like the random-search method) search. In addition, software packages such as Excel,
MATLAB, or Mathcad have varying capabilities for optimization. You can use this part of
the book to become familiar with these capabilities.
FIGURE PT4.5
Schematic of the organization of the material in Part Four: Optimization.
(Schematic not reproduced; its contents are listed below.)
PART 4 Optimization: PT 4.1 Motivation; PT 4.2 Mathematical background; PT 4.3 Orientation
CHAPTER 13 One-Dimensional Unconstrained Optimization: 13.1 Golden-section search; 13.2 Parabolic interpolation; 13.3 Newton's method; 13.4 Brent's method
CHAPTER 14 Multidimensional Unconstrained Optimization: 14.1 Direct methods; 14.2 Gradient methods
CHAPTER 15 Constrained Optimization: 15.1 Linear programming; 15.2 Nonlinear constrained; 15.3 Software packages
CHAPTER 16 Case Studies: 16.1 Chemical engineering; 16.2 Civil engineering; 16.3 Electrical engineering; 16.4 Mechanical engineering
EPILOGUE: PT 4.4 Trade-offs; PT 4.5 Additional references
TABLE PT4.2 Specific study objectives for Part Four.
1. Understand why and where optimization occurs in engineering problem solving.
2. Understand the major elements of the general optimization problem: objective function, decision
variables, and constraints.
3. Be able to distinguish between linear and nonlinear optimization, and between constrained and
unconstrained problems.
4. Be able to define the golden ratio and understand how it makes one-dimensional optimization
efficient.
5. Locate the optimum of a single variable function with the golden-section search, parabolic
interpolation, and Newton’s method. Also, recognize the trade-offs among these approaches, with
particular attention to initial guesses and convergence.
6. Understand how Brent’s optimization method combines the reliability of the golden-section search
with the speed of parabolic interpolation.
7. Be capable of writing a program and solving for the optimum of a multivariable function using
random searching.
8. Understand the ideas behind pattern searches, conjugate directions, and Powell’s method.
9. Be able to define and evaluate the gradient and Hessian of a multivariable function both
analytically and numerically.
10. Compute by hand the optimum of a two-variable function using the method of steepest ascent/
descent.
11. Understand the basic ideas behind the conjugate gradient, Newton’s, Marquardt’s, and quasi-
Newton methods. In particular, understand the trade-offs among the approaches and recognize how
each improves on the steepest ascent/descent.
12. Be capable of recognizing and setting up a linear programming problem to represent applicable
engineering problems.
13. Be able to solve a two-dimensional linear programming problem with both the graphical and simplex
methods.
14. Understand the four possible outcomes of a linear programming problem.
15. Be able to set up and solve nonlinear constrained optimization problems using a software package.
CHAPTER 13
One-Dimensional Unconstrained Optimization
This chapter describes techniques to find the minimum or maximum of a function of a single variable, f(x). A useful image in this regard is the one-dimensional, “roller coaster”-like function depicted in Fig. 13.1. Recall from Part Two that root location was complicated by the fact that several roots can occur for a single function. Similarly, both local and global optima can occur in optimization. Such cases are called multimodal. In almost all instances, we will be interested in finding the absolute highest or lowest value of a function. Thus, we must take care that we do not mistake a local result for the global optimum.
Distinguishing a global from a local extremum can be a very difficult problem for the general case. There are three usual ways to approach this problem. First, insight into the behavior of low-dimensional functions can sometimes be obtained graphically. Second, optima can be found from widely varying and perhaps randomly generated starting guesses, with the largest of these selected as global. Finally, the starting point associated with a local optimum can be perturbed to see whether the routine returns a better point or always returns to the same point. Although all these approaches can have utility, the fact is that in some problems (usually the large ones), there may be no practical way to ensure that you have located a global optimum. However, although you should always be sensitive to the issue, it is fortunate that there are numerous engineering problems where you can locate the global optimum in an unambiguous fashion.
FIGURE 13.1
A function that asymptotically approaches zero at plus and minus ∞ and has two maximum and two minimum points in the vicinity of the origin. The two points to the right are local optima, whereas the two to the left are global.
Just as in root location, optimization in one dimension can be divided into bracketing and open methods. As described in the next section, the golden-section search is an example of a bracketing method that depends on initial guesses that bracket a single optimum. This is followed by an alternative approach, parabolic interpolation, which often converges faster than the golden-section search, but sometimes diverges.
Another method described in this chapter is an open method based on the idea from calculus that the minimum or maximum can be found by solving $f'(x) = 0$. This reduces the optimization problem to finding the root of $f'(x)$ using techniques of the sort described in Part Two. We will demonstrate one version of this approach, Newton’s method.
Finally, an advanced hybrid approach, Brent’s method, is described. This approach combines the reliability of the golden-section search with the speed of parabolic interpolation.
13.1 GOLDEN-SECTION SEARCH
In solving for the root of a single nonlinear equation, the goal was to find the value of the
variable x that yields a zero of the function f(x). Single-variable optimization has the goal
of finding the value of x that yields an extremum, either a maximum or minimum of f(x).
The golden-section search is a simple, general-purpose, single-variable search technique. It is similar in spirit to the bisection approach for locating roots in Chap. 5. Recall that bisection hinged on defining an interval, specified by a lower guess ($x_l$) and an upper guess ($x_u$), that bracketed a single root. The presence of a root between these bounds was verified by determining that $f(x_l)$ and $f(x_u)$ had different signs. The root was then estimated as the midpoint of this interval,
$$x_r = \frac{x_l + x_u}{2}$$
The final step in a bisection iteration involved determining a new smaller bracket. This was done by replacing whichever of the bounds $x_l$ or $x_u$ had a function value with the same sign as $f(x_r)$. One advantage of this approach was that the new value $x_r$ replaced one of the old bounds.
Now we can develop a similar approach for locating the optimum of a one-dimensional function. For simplicity, we will focus on the problem of finding a maximum. When we discuss the computer algorithm, we will describe the minor modifications needed to simulate a minimum.
As with bisection, we can start by defining an interval that contains a single answer. That is, the interval should contain a single maximum, and hence is called unimodal. We can adopt the same nomenclature as for bisection, where $x_l$ and $x_u$ defined the lower and upper bounds, respectively, of such an interval. However, in contrast to bisection, we need a new strategy for finding a maximum within the interval. Rather than using only two function values (which are sufficient to detect a sign change, and hence a zero), we would need three function values to detect whether a maximum occurred. Thus, an additional point within the interval has to be chosen. Next, we have to pick a fourth point. Then the test for the maximum could be applied to discern whether the maximum occurred within the first three or the last three points.
The key to making this approach efficient is the wise choice of the intermediate
points. As in bisection, the goal is to minimize function evaluations by replacing old
values with new values. This goal can be achieved by specifying that the following two
conditions hold (Fig. 13.2):
$$\ell_0 = \ell_1 + \ell_2 \tag{13.1}$$
$$\frac{\ell_1}{\ell_0} = \frac{\ell_2}{\ell_1} \tag{13.2}$$
The first condition specifies that the sum of the two sublengths $\ell_1$ and $\ell_2$ must equal the original interval length. The second says that the ratio of the lengths must be equal. Equation (13.1) can be substituted into Eq. (13.2),
$$\frac{\ell_1}{\ell_1 + \ell_2} = \frac{\ell_2}{\ell_1} \tag{13.3}$$
If the reciprocal is taken and $R = \ell_2/\ell_1$, we arrive at
$$1 + R = \frac{1}{R} \tag{13.4}$$
or
$$R^2 + R - 1 = 0 \tag{13.5}$$
which can be solved for the positive root
$$R = \frac{-1 + \sqrt{1 - 4(-1)}}{2} = \frac{\sqrt{5} - 1}{2} = 0.61803\ldots \tag{13.6}$$
FIGURE 13.2
The initial step of the golden-section search algorithm involves choosing two interior points according to the golden ratio. (Plot of f(x) versus x between $x_l$ and $x_u$, showing the lengths $\ell_0$, $\ell_1$, and $\ell_2$ for the first and second iterations.)
This value, which has been known since antiquity, is called the golden ratio (see Box 13.1). Because it allows optima to be found efficiently, it is the key element of the golden-section method we have been developing conceptually. Now let us derive an algorithm to implement this approach on the computer.
As mentioned above and as depicted in Fig. 13.4, the method starts with two initial guesses, $x_l$ and $x_u$, that bracket one local extremum of f(x). Next, two interior points $x_1$ and $x_2$ are chosen according to the golden ratio,
$$d = \frac{\sqrt{5} - 1}{2}(x_u - x_l)$$
$$x_1 = x_l + d$$
$$x_2 = x_u - d$$
The function is evaluated at these two interior points. Two results can occur:
1. If, as is the case in Fig. 13.4, $f(x_1) > f(x_2)$, then the domain of x to the left of $x_2$, from $x_l$ to $x_2$, can be eliminated because it does not contain the maximum. For this case, $x_2$ becomes the new $x_l$ for the next round.
2. If $f(x_2) > f(x_1)$, then the domain of x to the right of $x_1$, from $x_1$ to $x_u$, would have been eliminated. In this case, $x_1$ becomes the new $x_u$ for the next round.
Box 13.1 The Golden Ratio and Fibonacci Numbers
In many cultures, certain numbers are ascribed qualities. For example,
we in the West are all familiar with “Lucky 7” and “Friday the 13th.”
Ancient Greeks called the following number the “golden ratio:”
$$\frac{\sqrt{5} - 1}{2} = 0.61803\ldots$$
This ratio was employed for a number of purposes, including the development of the rectangle in Fig. 13.3. These proportions were considered aesthetically pleasing by the Greeks. Among other things, many of their temples followed this shape.
The golden ratio is related to an important mathematical series known as the Fibonacci numbers, which are
0, 1, 1, 2, 3, 5, 8, 13, 21, 34, …
Thus, each number after the first two represents the sum of the preceding two. This sequence pops up in many diverse areas of science and engineering. In the context of the present discussion, an interesting property of the Fibonacci sequence relates to the ratio of consecutive numbers in the sequence; that is, 0/1 = 0, 1/1 = 1, 1/2 = 0.5, 2/3 = 0.667, 3/5 = 0.6, 5/8 = 0.625, 8/13 = 0.615, and so on. As one proceeds, the ratio of consecutive numbers approaches the golden ratio!
FIGURE 13.3
The Parthenon in Athens, Greece, was constructed in the 5th century B.C. Its front dimensions can be fit almost exactly within a golden rectangle (sides in the ratio 0.61803 to 1).
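The Fibonacci property described in Box 13.1 is easy to check numerically. The short Python sketch below is an illustration only; the function name `fibonacci_ratios` and the choice of 30 terms are assumptions made here, not part of the text.

```python
def fibonacci_ratios(n):
    """Return the ratios F(k)/F(k+1) for the first n consecutive Fibonacci pairs."""
    a, b = 0, 1  # the sequence starts 0, 1, 1, 2, 3, ...
    ratios = []
    for _ in range(n):
        ratios.append(a / b)
        a, b = b, a + b
    return ratios


# Golden ratio as defined in Eq. (13.6)
golden_ratio = (5 ** 0.5 - 1) / 2

ratios = fibonacci_ratios(30)
for k, r in enumerate(ratios[:8]):
    print(k, round(r, 3))
print("last ratio:", ratios[-1], "golden ratio:", golden_ratio)
```

The first few printed ratios match those listed in the box (0, 1, 0.5, 0.667, 0.6, 0.625, 0.615), and later ratios settle on 0.61803.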
Now, here is the real benefit from the use of the golden ratio. Because the original $x_1$ and $x_2$ were chosen using the golden ratio, we do not have to recalculate all the function values for the next iteration. For example, for the case illustrated in Fig. 13.4, the old $x_1$ becomes the new $x_2$. This means that we already have the value for the new $f(x_2)$, since it is the same as the function value at the old $x_1$.
To complete the algorithm, we now only need to determine the new $x_1$. This is done with the same proportionality as before,
$$x_1 = x_l + \frac{\sqrt{5} - 1}{2}(x_u - x_l)$$
A similar approach would be used for the alternate case where the optimum fell in the left subinterval.
As the iterations are repeated, the interval containing the extremum is reduced rapidly. In fact, each round the interval is reduced by a factor of the golden ratio (about 61.8%). That means that after 10 rounds, the interval is shrunk to about $0.618^{10}$ or 0.008 or 0.8% of its initial length. After 20 rounds, it is about 0.0066%. This is not quite as good as the reduction achieved with bisection, but this is a harder problem.
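The interval-reduction figures quoted above can be reproduced with a few lines of Python. This is a sketch for illustration; the helper names `interval_fraction` and `rounds_needed` are assumptions introduced here.

```python
import math

# Golden ratio as defined in Eq. (13.6)
R = (math.sqrt(5) - 1) / 2


def interval_fraction(n):
    """Fraction of the initial bracket that remains after n golden-section rounds."""
    return R ** n


def rounds_needed(fraction):
    """Smallest number of rounds that shrinks the bracket to at most `fraction`."""
    return math.ceil(math.log(fraction) / math.log(R))


print(interval_fraction(10))   # about 0.008, i.e., 0.8% of the initial length
print(interval_fraction(20))   # about 0.000066, i.e., 0.0066%
print(0.5 ** 10)               # bisection of a root bracket, for comparison
print(rounds_needed(0.01))     # rounds needed to reach 1% of the initial length
```

For comparison, ten bisection steps leave about 0.1% of a root bracket, which is the sense in which the golden-section reduction is "not quite as good."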
FIGURE 13.4
(a) The initial step of the golden-section search algorithm involves choosing two interior points according to the golden ratio. (b) The second step involves defining a new interval that includes the optimum.
EXAMPLE 13.1 Golden-Section Search
Problem Statement. Use the golden-section search to find the maximum of
$$f(x) = 2\sin x - \frac{x^2}{10}$$
within the interval $x_l = 0$ and $x_u = 4$.
Solution. First, the golden ratio is used to create the two interior points
$$d = \frac{\sqrt{5} - 1}{2}(4 - 0) = 2.472$$
$$x_1 = 0 + 2.472 = 2.472$$
$$x_2 = 4 - 2.472 = 1.528$$
The function can be evaluated at the interior points
$$f(x_2) = f(1.528) = 2\sin(1.528) - \frac{1.528^2}{10} = 1.765$$
$$f(x_1) = f(2.472) = 0.63$$
Because $f(x_2) > f(x_1)$, the maximum is in the interval defined by $x_l$, $x_2$, and $x_1$. Thus, for the new interval, the lower bound remains $x_l = 0$, and $x_1$ becomes the upper bound, that is, $x_u = 2.472$. In addition, the former $x_2$ value becomes the new $x_1$, that is, $x_1 = 1.528$. Further, we do not have to recalculate $f(x_1)$ because it was determined on the previous iteration as f(1.528) = 1.765.
All that remains is to compute the new values of d and $x_2$,
$$d = \frac{\sqrt{5} - 1}{2}(2.472 - 0) = 1.528$$
$$x_2 = 2.472 - 1.528 = 0.944$$
The function evaluation at $x_2$ is f(0.944) = 1.531. Since this value is less than the function value at $x_1$, the maximum is in the interval prescribed by $x_2$, $x_1$, and $x_u$.
The process can be repeated, with the results tabulated below:
i   x_l      f(x_l)   x_2      f(x_2)   x_1      f(x_1)   x_u      f(x_u)    d
1   0        0        1.5279   1.7647   2.4721   0.6300   4.0000   -3.1136   2.4721
2   0        0        0.9443   1.5310   1.5279   1.7647   2.4721    0.6300   1.5279
3   0.9443   1.5310   1.5279   1.7647   1.8885   1.5432   2.4721    0.6300   0.9443
4   0.9443   1.5310   1.3050   1.7595   1.5279   1.7647   1.8885    1.5432   0.5836
5   1.3050   1.7595   1.5279   1.7647   1.6656   1.7136   1.8885    1.5432   0.3607
6   1.3050   1.7595   1.4427   1.7755   1.5279   1.7647   1.6656    1.7136   0.2229
7   1.3050   1.7595   1.3901   1.7742   1.4427   1.7755   1.5279    1.7647   0.1378
8   1.3901   1.7742   1.4427   1.7755   1.4752   1.7732   1.5279    1.7647   0.0851
Note that the current maximum at each iteration is the larger of $f(x_2)$ and $f(x_1)$. After the eighth iteration, the maximum occurs at x = 1.4427 with a function value of 1.7755. Thus, the result is converging on the true value of 1.7757 at x = 1.4276.
Recall that for bisection (Sec. 5.2.1), an exact upper bound for the error can be calculated at each iteration. Using similar reasoning, an upper bound for golden-section search can be derived as follows: Once an iteration is complete, the optimum will fall in one of two intervals. If $f(x_2)$ is the larger function value, the optimum will be in the lower interval ($x_l$, $x_2$, $x_1$). If $f(x_1)$ is the larger function value, it will be in the upper interval ($x_2$, $x_1$, $x_u$). Because the interior points are symmetrical, either case can be used to define the error.
Looking at the upper interval, if the true value were at the far left, the maximum distance from the estimate would be
$$\begin{aligned}
\Delta x_a &= x_1 - x_2 \\
&= x_l + R(x_u - x_l) - x_u + R(x_u - x_l) \\
&= (x_l - x_u) + 2R(x_u - x_l) \\
&= (2R - 1)(x_u - x_l)
\end{aligned}$$
or $0.236(x_u - x_l)$.
If the true value were at the far right, the maximum distance from the estimate would be
$$\begin{aligned}
\Delta x_b &= x_u - x_1 \\
&= x_u - x_l - R(x_u - x_l) \\
&= (1 - R)(x_u - x_l)
\end{aligned}$$
or $0.382(x_u - x_l)$. Therefore, this case would represent the maximum error. This result can then be normalized to the optimal value for that iteration, $x_{\text{opt}}$, to yield
$$\varepsilon_a = (1 - R)\left|\frac{x_u - x_l}{x_{\text{opt}}}\right| 100\%$$
This estimate provides a basis for terminating the iterations.
Pseudocode for the golden-section-search algorithm for maximization is presented in
Fig. 13.5a. The minor modifications to convert the algorithm to minimization are listed
in Fig. 13.5b. In both versions the x value for the optimum is returned as the function
value (gold). In addition, the value of f(x) at the optimum is returned as the variable (fx).
You may be wondering why we have stressed the reduced function evaluations of
the golden-section search. Of course, for solving a single optimization, the speed savings
would be negligible. However, there are two important contexts where minimizing the
number of function evaluations can be important. These are
1. Many evaluations. There are cases where the golden-section-search algorithm may be a part of a much larger calculation. In such cases, it may be called many times. Therefore, keeping function evaluations to a minimum could pay great dividends for such cases.
2. Time-consuming evaluation. For pedagogical reasons, we use simple functions in most of our examples. You should understand that a function can be very complex and time-consuming to evaluate. For example, in a later part of this book, we will describe how optimization can be used to estimate the parameters of a model consisting of a system of differential equations. For such cases, the “function” involves time-consuming model integration. Any method that minimizes such evaluations would be advantageous.

FUNCTION Gold (xlow, xhigh, maxit, es, fx)
  R = (5^0.5 − 1)/2
  xl = xlow; xu = xhigh
  iter = 1
  d = R * (xu − xl)
  x1 = xl + d; x2 = xu − d
  f1 = f(x1)
  f2 = f(x2)
  IF f1 > f2 THEN                  (b) minimization: IF f1 < f2 THEN
    xopt = x1
    fx = f1
  ELSE
    xopt = x2
    fx = f2
  END IF
  DO
    d = R*d; xint = xu − xl
    IF f1 > f2 THEN                (b) minimization: IF f1 < f2 THEN
      xl = x2
      x2 = x1
      x1 = xl + d
      f2 = f1
      f1 = f(x1)
    ELSE
      xu = x1
      x1 = x2
      x2 = xu − d
      f1 = f2
      f2 = f(x2)
    END IF
    iter = iter + 1
    IF f1 > f2 THEN                (b) minimization: IF f1 < f2 THEN
      xopt = x1
      fx = f1
    ELSE
      xopt = x2
      fx = f2
    END IF
    IF xopt ≠ 0. THEN
      ea = (1. − R) * ABS(xint/xopt) * 100.
    END IF
    IF ea ≤ es OR iter ≥ maxit EXIT
  END DO
  Gold = xopt
END Gold
FIGURE 13.5
Algorithm for the golden-section search: (a) maximization, with (b) the three comparisons that change for minimization noted alongside.
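As one possible realization of the algorithm in Fig. 13.5, the following Python sketch implements the maximization version and reuses one function value per iteration. It is not a line-by-line translation: the name `golden_max`, the default tolerances, the returned tuple, and the decision to evaluate the error estimate on the current bracket before shrinking it are all assumptions made for this illustration.

```python
import math


def golden_max(f, xlow, xhigh, es=1e-4, maxit=100):
    """Golden-section search for a maximum of f on [xlow, xhigh].

    es is the stopping criterion in percent. Returns (xopt, fopt, ea, iterations).
    """
    R = (math.sqrt(5) - 1) / 2
    xl, xu = xlow, xhigh
    d = R * (xu - xl)
    x1, x2 = xl + d, xu - d
    f1, f2 = f(x1), f(x2)
    ea = float("inf")
    it = 0
    while True:
        it += 1
        # The current estimate is the better of the two interior points.
        if f1 > f2:
            xopt, fopt = x1, f1
        else:
            xopt, fopt = x2, f2
        if xopt != 0:
            ea = (1 - R) * abs((xu - xl) / xopt) * 100
        if ea <= es or it >= maxit:
            break
        # Shrink the bracket; only one new function evaluation is needed.
        d = R * d
        if f1 > f2:
            xl = x2
            x2, f2 = x1, f1
            x1 = xl + d
            f1 = f(x1)
        else:
            xu = x1
            x1, f1 = x2, f2
            x2 = xu - d
            f2 = f(x2)
    return xopt, fopt, ea, it


def example_f(x):
    """Function from Example 13.1."""
    return 2 * math.sin(x) - x ** 2 / 10


xopt, fopt, ea, it = golden_max(example_f, 0.0, 4.0, es=0.01)
print(xopt, fopt, ea, it)
```

To locate a minimum with the same routine, one option is to pass the negated function, for example `golden_max(lambda x: -g(x), a, b)`, and negate the returned function value.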
13.2 PARABOLIC INTERPOLATION
Parabolic interpolation takes advantage of the fact that a second-order polynomial often
provides a good approximation to the shape of f(x) near an optimum (Fig. 13.6).
Just as there is only one straight line connecting two points, there is only one quadratic polynomial or parabola connecting three points. Thus, if we have three points that jointly bracket an optimum, we can fit a parabola to the points. Then we can differentiate it, set the result equal to zero, and solve for an estimate of the optimal x. It can be shown through some algebraic manipulations that the result is
$$x_3 = \frac{f(x_0)(x_1^2 - x_2^2) + f(x_1)(x_2^2 - x_0^2) + f(x_2)(x_0^2 - x_1^2)}{2f(x_0)(x_1 - x_2) + 2f(x_1)(x_2 - x_0) + 2f(x_2)(x_0 - x_1)} \tag{13.7}$$
where $x_0$, $x_1$, and $x_2$ are the initial guesses, and $x_3$ is the value of x that corresponds to the maximum value of the parabolic fit to the guesses. After generating the new point, there are two strategies for selecting the points for the next iteration. The simplest approach, which is similar to the secant method, is to merely assign the new points sequentially. That is, for the new iteration, $x_0 = x_1$, $x_1 = x_2$, and $x_2 = x_3$. Alternatively, as illustrated in the following example, a bracketing approach, similar to bisection or the golden-section search, can be employed.
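Equation (13.7) translates directly into code. The Python sketch below evaluates it for three points; the function name `parabolic_vertex` and the zero-denominator check are assumptions added for this illustration.

```python
import math


def parabolic_vertex(f, x0, x1, x2):
    """Return x3 from Eq. (13.7): the stationary point of the parabola through three points."""
    f0, f1, f2 = f(x0), f(x1), f(x2)
    num = f0 * (x1**2 - x2**2) + f1 * (x2**2 - x0**2) + f2 * (x0**2 - x1**2)
    den = 2 * f0 * (x1 - x2) + 2 * f1 * (x2 - x0) + 2 * f2 * (x0 - x1)
    if den == 0:
        # The three points are collinear (or repeated), so no parabola vertex exists.
        raise ZeroDivisionError("no unique parabola through the given points")
    return num / den


# For an exact parabola the formula recovers the vertex in one step.
print(parabolic_vertex(lambda x: -(x - 3) ** 2, 0.0, 1.0, 5.0))

# For a general function it gives an estimate that can be refined iteratively.
print(parabolic_vertex(lambda x: 2 * math.sin(x) - x**2 / 10, 0.0, 1.0, 4.0))
```

The second call uses the function and guesses of the example that follows in the text.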
FIGURE 13.6
Graphical description of parabolic interpolation. (Plot of f(x) versus x showing the true function and its true maximum, the parabolic function fitted through $x_0$, $x_1$, and $x_2$, and the parabolic approximation of the maximum at $x_3$.)
EXAMPLE 13.2 Parabolic Interpolation
Problem Statement. Use parabolic interpolation to approximate the maximum of
$$f(x) = 2\sin x - \frac{x^2}{10}$$
with initial guesses of $x_0 = 0$, $x_1 = 1$, and $x_2 = 4$.
Solution. The function values at the three guesses can be evaluated,
$x_0 = 0$    $f(x_0) = 0$
$x_1 = 1$    $f(x_1) = 1.5829$
$x_2 = 4$    $f(x_2) = -3.1136$
and substituted into Eq. (13.7) to give
$$x_3 = \frac{0(1^2 - 4^2) + 1.5829(4^2 - 0^2) + (-3.1136)(0^2 - 1^2)}{2(0)(1 - 4) + 2(1.5829)(4 - 0) + 2(-3.1136)(0 - 1)} = 1.5055$$
which has a function value of f(1.5055) = 1.7691.
Next, a strategy similar to the golden-section search can be employed to determine which point should be discarded. Because the function value for the new point is higher than for the intermediate point ($x_1$) and the new x value is to the right of the intermediate point, the lower guess ($x_0$) is discarded. Therefore, for the next iteration,
$x_0 = 1$         $f(x_0) = 1.5829$
$x_1 = 1.5055$    $f(x_1) = 1.7691$
$x_2 = 4$         $f(x_2) = -3.1136$
which can be substituted into Eq. (13.7) to give
$$x_3 = \frac{1.5829(1.5055^2 - 4^2) + 1.7691(4^2 - 1^2) + (-3.1136)(1^2 - 1.5055^2)}{2(1.5829)(1.5055 - 4) + 2(1.7691)(4 - 1) + 2(-3.1136)(1 - 1.5055)} = 1.4903$$
which has a function value of f(1.4903) = 1.7714.
The process can be repeated, with the results tabulated below:
i   x_0      f(x_0)   x_1      f(x_1)   x_2      f(x_2)    x_3      f(x_3)
1   0.0000   0.0000   1.0000   1.5829   4.0000   -3.1136   1.5055   1.7691
2   1.0000   1.5829   1.5055   1.7691   4.0000   -3.1136   1.4903   1.7714
3   1.0000   1.5829   1.4903   1.7714   1.5055    1.7691   1.4256   1.7757
4   1.0000   1.5829   1.4256   1.7757   1.4903    1.7714   1.4266   1.7757
5   1.4256   1.7757   1.4266   1.7757   1.4903    1.7714   1.4275   1.7757
Thus, within five iterations, the result is converging rapidly on the true value of 1.7757 at x = 1.4276.
We should mention that just like the false-position method, parabolic interpolation
can get hung up with just one end of the interval converging. Thus, convergence can
be slow. For example, notice that in our example, 1.0000 was an endpoint for most of
the iterations.
This method, as well as others using third-order polynomials, can be formulated into
algorithms that contain convergence tests, careful selection strategies for the points to
retain on each iteration, and attempts to minimize round-off error accumulation.
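The bracketing strategy used in Example 13.2 can be sketched in Python as follows. This is a minimal illustration, not a production algorithm: the name `parabolic_max`, the fixed iteration count, and the simple guard against a zero denominator are assumptions, and the convergence tests and round-off safeguards mentioned above are omitted.

```python
import math


def parabolic_max(f, x0, x1, x2, iterations=5):
    """Parabolic interpolation for a maximum, keeping three points that bracket it.

    Returns the list of (x3, f(x3)) estimates, one per iteration.
    """
    f0, f1, f2 = f(x0), f(x1), f(x2)
    history = []
    for _ in range(iterations):
        num = f0 * (x1**2 - x2**2) + f1 * (x2**2 - x0**2) + f2 * (x0**2 - x1**2)
        den = 2 * f0 * (x1 - x2) + 2 * f1 * (x2 - x0) + 2 * f2 * (x0 - x1)
        if den == 0:
            break  # degenerate points; stop rather than divide by zero
        x3 = num / den  # Eq. (13.7)
        f3 = f(x3)
        history.append((x3, f3))
        if f3 > f1:
            # New point is the best so far: the old middle point becomes a bound.
            if x3 > x1:
                x0, f0 = x1, f1
            else:
                x2, f2 = x1, f1
            x1, f1 = x3, f3
        else:
            # Old middle point is still best: the new point becomes a bound.
            if x3 > x1:
                x2, f2 = x3, f3
            else:
                x0, f0 = x3, f3
    return history


history = parabolic_max(lambda x: 2 * math.sin(x) - x**2 / 10, 0.0, 1.0, 4.0)
for i, (x3, f3) in enumerate(history, start=1):
    print(i, round(x3, 4), round(f3, 4))
```

Run on the function of Example 13.2, the printed estimates can be compared with the last two columns of the table in that example.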
13.3 NEWTON’S METHOD
Recall that the Newton-Raphson method of Chap. 6 is an open method that finds the root x of a function such that f(x) = 0. The method is summarized as
$$x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)}$$
A similar open approach can be used to find an optimum of f(x) by defining a new function, $g(x) = f'(x)$. Thus, because the same optimal value $x^*$ satisfies both
$$f'(x^*) = g(x^*) = 0$$
we can use the following,
$$x_{i+1} = x_i - \frac{f'(x_i)}{f''(x_i)} \tag{13.8}$$
as a technique to find the minimum or maximum of f(x). It should be noted that this equation can also be derived by writing a second-order Taylor series for f(x) and setting the derivative of the series equal to zero. Newton’s method is an open method similar to Newton-Raphson because it does not require initial guesses that bracket the optimum. In addition, it also shares the disadvantage that it may be divergent. Finally, it is usually a good idea to check that the second derivative has the correct sign to confirm that the technique is converging on the result you desire.
EXAMPLE 13.3 Newton’s Method
Problem Statement. Use Newton’s method to find the maximum of
$$f(x) = 2\sin x - \frac{x^2}{10}$$
with an initial guess of $x_0 = 2.5$.
Solution. The first and second derivatives of the function can be evaluated as
$$f'(x) = 2\cos x - \frac{x}{5}$$
$$f''(x) = -2\sin x - \frac{1}{5}$$
which can be substituted into Eq. (13.8) to give
$$x_{i+1} = x_i - \frac{2\cos x_i - x_i/5}{-2\sin x_i - 1/5}$$
Substituting the initial guess yields
$$x_1 = 2.5 - \frac{2\cos 2.5 - 2.5/5}{-2\sin 2.5 - 1/5} = 0.99508$$
which has a function value of 1.57859. The second iteration gives
$$x_2 = 0.995 - \frac{2\cos 0.995 - 0.995/5}{-2\sin 0.995 - 1/5} = 1.46901$$
which has a function value of 1.77385.
The process can be repeated, with the results tabulated below:
i   x         f(x)      f'(x)      f''(x)
0   2.5       0.57194   -2.10229   -1.39694
1   0.99508   1.57859    0.88985   -1.87761
2   1.46901   1.77385   -0.09058   -2.18965
3   1.42764   1.77573   -0.00020   -2.17954
4   1.42755   1.77573    0.00000   -2.17952
Thus, within four iterations, the result converges rapidly on the true value.
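The iteration of Eq. (13.8) used in Example 13.3 can be sketched in Python as below. The function name `newton_opt`, the percent-relative-change stopping test, and the returned label for the kind of stationary point are assumptions made for this illustration.

```python
import math


def newton_opt(df, d2f, x0, es=1e-8, maxit=50):
    """Newton's method for an optimum: x <- x - f'(x)/f''(x), Eq. (13.8).

    es is a stopping criterion on the percent relative change in x.
    Returns (x, kind), where kind reports the sign check on f''(x).
    """
    x = x0
    for _ in range(maxit):
        x_new = x - df(x) / d2f(x)
        ea = abs((x_new - x) / x_new) * 100 if x_new != 0 else float("inf")
        x = x_new
        if ea <= es:
            break
    curvature = d2f(x)
    if curvature < 0:
        kind = "maximum"
    elif curvature > 0:
        kind = "minimum"
    else:
        kind = "undetermined"
    return x, kind


# Derivatives from Example 13.3
def df(x):
    return 2 * math.cos(x) - x / 5


def d2f(x):
    return -2 * math.sin(x) - 1 / 5


x_first, _ = newton_opt(df, d2f, 2.5, maxit=1)  # one iteration only
x_star, kind = newton_opt(df, d2f, 2.5)
print(x_first, x_star, kind)
```

The sign check at the end reflects the advice in the text to confirm that the second derivative has the sign expected for the optimum being sought.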
Although Newton’s method works well in some cases, it is impractical for cases
where the derivatives cannot be conveniently evaluated. For these cases, other approaches
that do not involve derivative evaluation are available. For example, a secant-like version
of Newton’s method can be developed by using finite-difference approximations for the
derivative evaluations.
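A derivative-free variant along these lines might look like the following Python sketch, which replaces both derivatives with centered finite differences. The name `newton_opt_fd` and the fixed absolute step `h` are assumptions made here; a step proportional to x is another common choice.

```python
import math


def newton_opt_fd(f, x0, h=1e-4, es=1e-6, maxit=50):
    """Newton's method for an optimum using centered finite-difference derivatives."""
    x = x0
    for _ in range(maxit):
        fp, fm, fc = f(x + h), f(x - h), f(x)
        d1 = (fp - fm) / (2 * h)          # approximates f'(x)
        d2 = (fp - 2 * fc + fm) / h**2    # approximates f''(x)
        if d2 == 0:
            break  # flat curvature estimate; the Newton step is undefined
        x_new = x - d1 / d2
        ea = abs((x_new - x) / x_new) * 100 if x_new != 0 else float("inf")
        x = x_new
        if ea <= es:
            break
    return x


x_fd = newton_opt_fd(lambda x: 2 * math.sin(x) - x**2 / 10, 2.5)
print(x_fd)
```

Each iteration costs three function evaluations instead of two derivative evaluations, which is the price paid for not needing analytical derivatives.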
A bigger reservation regarding the approach is that it may diverge based on the
nature of the function and the quality of the initial guess. Thus, it is usually employed
only when we are close to the optimum. As described next, hybrid techniques that use
bracketing approaches far from the optimum and open methods near the optimum attempt
to exploit the strong points of both approaches.
13.4 BRENT’S METHOD
Recall that in Sec. 6.4, we described Brent’s method for root location. This hybrid
method combined several root-finding methods into a single algorithm that balanced
reliability with efficiency.
Brent also developed a similar approach for one-dimensional minimization. It combines the slow, dependable golden-section search with the faster, but possibly unreliable, parabolic interpolation. It first attempts parabolic interpolation and keeps applying it as long as acceptable results are obtained. If not, it uses the golden-section search to get matters in hand.
Figure 13.7 presents pseudocode for the algorithm based on a MATLAB software M-file developed by Cleve Moler (2005). It represents a stripped-down version of the fminbnd function, which is the professional minimization function employed in MATLAB. For that reason, we call the simplified version fminsimp. Note that it requires another function f that holds the equation for which the minimum is being evaluated.

Function fminsimp(xl, xu)
  tol = 0.000001; phi = (1 + 5^0.5)/2; rho = 2 − phi
  u = xl + rho*(xu − xl); v = u; w = u; x = u
  fu = f(u); fv = fu; fw = fu; fx = fu
  xm = 0.5*(xl + xu); d = 0; e = 0
  DO
    IF |x − xm| ≤ tol EXIT
    para = |e| > tol
    IF para THEN                              (Try parabolic fit)
      r = (x − w)*(fx − fv); q = (x − v)*(fx − fw)
      p = (x − v)*q − (x − w)*r; s = 2*(q − r)
      IF s > 0 THEN p = −p
      s = |s|
      ' Is the parabola acceptable?
      para = |p| < |0.5*s*e| And p > s*(xl − x) And p < s*(xu − x)
      IF para THEN
        e = d; d = p/s                        (Parabolic interpolation step)
      ENDIF
    ENDIF
    IF Not para THEN
      IF x ≥ xm THEN                          (Golden-section search step)
        e = xl − x
      ELSE
        e = xu − x
      ENDIF
      d = rho*e
    ENDIF
    u = x + d; fu = f(u)
    IF fu ≤ fx THEN                           (Update xl, xu, x, v, w, xm)
      IF u ≥ x THEN
        xl = x
      ELSE
        xu = x
      ENDIF
      v = w; fv = fw; w = x; fw = fx; x = u; fx = fu
    ELSE
      IF u < x THEN
        xl = u
      ELSE
        xu = u
      ENDIF
      IF fu ≤ fw Or w = x THEN
        v = w; fv = fw; w = u; fw = fu
      ELSEIF fu ≤ fv Or v = x Or v = w THEN
        v = u; fv = fu
      ENDIF
    ENDIF
    xm = 0.5*(xl + xu)
  ENDDO
  fminsimp = fu
END fminsimp
FIGURE 13.7
Pseudocode for Brent’s minimum-finding algorithm based on a MATLAB M-file developed by Cleve Moler (2005).
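For readers who want to experiment, the pseudocode of Fig. 13.7 can be transcribed into Python roughly as follows. This is a sketch under stated assumptions rather than a reference implementation: the objective is passed in as an argument, an iteration cap `maxit` is added as a safeguard that the figure does not contain, and the routine returns the best point and its function value (x, fx) instead of fu.

```python
import math


def fminsimp(f, xl, xu, tol=1e-6, maxit=500):
    """Simplified Brent minimization on [xl, xu], transcribed from Fig. 13.7.

    maxit is an added safeguard (an assumption, not in the figure).
    Returns (x, fx), the best point found and its function value.
    """
    phi = (1 + math.sqrt(5)) / 2
    rho = 2 - phi
    u = xl + rho * (xu - xl)
    v = w = x = u
    fu = f(u)
    fv = fw = fx = fu
    xm = 0.5 * (xl + xu)
    d = e = 0.0
    for _ in range(maxit):
        if abs(x - xm) <= tol:
            break
        para = abs(e) > tol
        if para:
            # Try a parabolic fit through x, w, and v.
            r = (x - w) * (fx - fv)
            q = (x - v) * (fx - fw)
            p = (x - v) * q - (x - w) * r
            s = 2 * (q - r)
            if s > 0:
                p = -p
            s = abs(s)
            # Accept the parabola only if the step is small enough and stays inside.
            para = abs(p) < abs(0.5 * s * e) and s * (xl - x) < p < s * (xu - x)
            if para:
                e, d = d, p / s
        if not para:
            # Golden-section step into the larger part of the bracket.
            e = (xl - x) if x >= xm else (xu - x)
            d = rho * e
        u = x + d
        fu = f(u)
        if fu <= fx:
            if u >= x:
                xl = x
            else:
                xu = x
            v, fv = w, fw
            w, fw = x, fx
            x, fx = u, fu
        else:
            if u < x:
                xl = u
            else:
                xu = u
            if fu <= fw or w == x:
                v, fv = w, fw
                w, fw = u, fu
            elif fu <= fv or v == x or v == w:
                v, fv = u, fu
        xm = 0.5 * (xl + xu)
    return x, fx


# Maximizing f(x) = 2 sin x - x^2/10 is the same as minimizing its negative.
xmin, fmin = fminsimp(lambda x: -(2 * math.sin(x) - x**2 / 10), 0.0, 4.0)
print(xmin, -fmin)
```

Because the routine minimizes, the example negates the function used throughout this chapter so that the result can be compared with the maximum found by the other methods.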
This concludes our treatment of methods to solve the optima of functions of a
single variable. Some engineering examples are presented in Chap. 16. In addition, the
techniques described here are an important element of some procedures to optimize
multivariable functions, as discussed in Chap. 14.
PROBLEMS
13.1 Given the formula
$$f(x) = -x^2 + 8x - 12$$
(a) Determine the maximum and the corresponding value of x for this function analytically (i.e., using differentiation).
(b) Verify that Eq. (13.7) yields the same results based on initial guesses of $x_0 = 0$, $x_1 = 2$, and $x_2 = 6$.
13.2 Given
$$f(x) = -1.5x^6 - 2x^4 + 12x$$
(a) Plot the function.
(b) Use analytical methods to prove that the function is concave for all values of x.
(c) Differentiate the function and then use a root-location method to solve for the maximum f(x) and the corresponding value of x.
13.3 Solve for the value of x that maximizes f(x) in Prob. 13.2 using the golden-section search. Employ initial guesses of $x_l = 0$ and $x_u = 2$ and perform three iterations.
13.4 Repeat Prob. 13.3, except use parabolic interpolation in the same fashion as Example 13.2. Employ initial guesses of $x_0 = 0$, $x_1 = 1$, and $x_2 = 2$ and perform three iterations.
13.5 Repeat Prob. 13.3 but use Newton’s method. Employ an initial guess of $x_0 = 2$ and perform three iterations.
13.6 Employ the following methods to find the maximum of
$$f(x) = 4x - 1.8x^2 + 1.2x^3 - 0.3x^4$$
(a) Golden-section search ($x_l = -2$, $x_u = 4$, $\varepsilon_s = 1\%$).
(b) Parabolic interpolation ($x_0 = 1.75$, $x_1 = 2$, $x_2 = 2.5$, iterations = 4). Select new points sequentially as in the secant method.
(c) Newton’s method ($x_0 = 3$, $\varepsilon_s = 1\%$).
13.7 Consider the following function:
$$f(x) = -x^4 - 2x^3 - 8x^2 - 5x$$
Use analytical and graphical methods to show the function has a maximum for some value of x in the range $-2 \le x \le 1$.
13.8 Employ the following methods to find the maximum of the function from Prob. 13.7:
(a) Golden-section search ($x_l = -2$, $x_u = 1$, $\varepsilon_s = 1\%$).
(b) Parabolic interpolation ($x_0 = -2$, $x_1 = -1$, $x_2 = 1$, iterations = 4). Select new points sequentially as in the secant method.
(c) Newton’s method ($x_0 = -1$, $\varepsilon_s = 1\%$).
13.9 Consider the following function:
$$f(x) = 2x + \frac{3}{x}$$
Perform 10 iterations of parabolic interpolation to locate the minimum. Select new points in the same fashion as in Example 13.2. Comment on the convergence of your results. ($x_0 = 0.1$, $x_1 = 0.5$, $x_2 = 5$)
13.10 Consider the following function:
$$f(x) = 3 + 6x + 5x^2 + 3x^3 + 4x^4$$
Locate the minimum by finding the root of the derivative of this function. Use bisection with initial guesses of $x_l = -2$ and $x_u = 1$.
13.11 Determine the minimum of the function from Prob. 13.10 with the following methods:
(a) Newton’s method ($x_0 = -1$, $\varepsilon_s = 1\%$).
(b) Newton’s method, but using a finite difference approximation for the derivative estimates.
$$f'(x) = \frac{f(x_i + \delta x_i) - f(x_i - \delta x_i)}{2\delta x_i}$$
$$f''(x) = \frac{f(x_i + \delta x_i) - 2f(x_i) + f(x_i - \delta x_i)}{(\delta x_i)^2}$$
where $\delta$ = a perturbation fraction (= 0.01). Use an initial guess of $x_0 = -1$ and iterate to $\varepsilon_s = 1\%$.
13.12 Develop a program using a programming or macro language
to implement the golden-section search algorithm. Design the pro-
gram so that it is expressly designed to locate a maximum. The
subroutine should have the following features:
PROBLEMS 369
Given that L 5 600 cm, E 5 50,000 kN/cm2
, I 5 30,000 cm4
, and
w0 5 2.5 kN/cm, determine the point of maximum deflection (a)
graphically, (b) using the golden-section search until the approximate
error falls below es 5 1% with initial guesses of xl 5 0 and xu 5 L.
13.19 An object with a mass of 100 kg is projected upward from the
surface of the earth at a velocity of 50 m/s. If the object is subject to
linear drag (c 5 15 kg/s), use the golden-section search to determine
the maximum height the object attains. Hint: recall Sec. PT4.1.2.
13.20 The normal distribution is a bell-shaped curve defined by
y 5 e2x2
Use the golden-section search to determine the location of the
inflection point of this curve for positive x.
13.21 An object can be projected upward at a specified velocity. If
it is subject to linear drag, its altitude as a function of time can be
computed as
z 5 z0 1
m
c
ay0 1
mg
c
b (1 2 e2(cym)t
) 2
mg
c
t
where z 5 altitude (m) above the earth’s surface (defined as z 5 0),
z0 5 the initial altitude (m), m 5 mass (kg), c 5 a linear drag coef-
ficient (kg/s), v0 5 initial velocity (m/s), and t 5 time (s). Note that
for this formulation, positive velocity is considered to be in the up-
ward direction. Given the following parameter values: g 5 9.81 m/s2
,
z0 5 100 m, v0 5 55 m/s, m 5 80 kg, and c 5 15 kg/s, the equation
can be used to calculate the jumper’s altitude. Determine the time and
altitude of the peak elevation (a) graphically, (b) analytically, and (c)
with the golden-section search until the approximate error falls be-
low es 5 1% with initial guesses of tl 5 0 and tu 5 10 s.
13.22 Use the golden-section search to determine the length of the
shortest ladder that reaches from the ground over the fence to touch the
building’s wall (Fig. P13.22). Test it for the case where h 5 d 5 4 m.
• Iterate until the relative error falls below a stopping criterion or
exceeds a maximum number of iterations.
• Return both the optimal x and f(x).
• Minimize the number of function evaluations.
Test your program with the same problem as Example 13.1.
13.13 Develop a program as described in Prob. 13.12, but make it
perform minimization or maximization depending on the user’s
preference.
13.14 Develop a program using a programming or macro language
to implement the parabolic interpolation algorithm. Design the pro-
gram so that it is expressly designed to locate a maximum and se-
lects new points as in Example 13.2. The subroutine should have
the following features:
• Base it on two initial guesses, and have the program generate the
third initial value at the midpoint of the interval.
• Check whether the guesses bracket a maximum. If not, the sub-
routine should not implement the algorithm, but should return an
error message.
• Iterate until the relative error falls below a stopping criterion or
exceeds a maximum number of iterations.
• Return both the optimal x and f(x).
• Minimize the number of function evaluations.
Test your program with the same problem as Example 13.2.
13.15 Develop a program using a programming or macro language
to implement Newton’s method. The subroutine should have the
following features:
• Iterate until the relative error falls below a stopping criterion or
exceeds a maximum number of iterations.
• Returns both the optimal x and f(x).
Test your program with the same problem as Example 13.3.
13.16 Pressure measurements are taken at certain points behind an
airfoil over time. These data best fit the curve y 5 6 cos x 2 1.5 sin x
from x 5 0 to 6 s. Use four iterations of the golden-search method
to find the minimum pressure. Set xl 5 2 and xu 5 4.
13.17 The trajectory of a ball can be computed with
y 5 (tan u0)x 2
g
2y2
0 cos2
u0
x2
1 y0
where y 5 the height (m), u0 5 the initial angle (radians), y0 5 the
initial velocity (m/s), g 5 the gravitational constant 5 9.81 m/s2
,
and y0 5 the initial height (m). Use the golden-section search to
determine the maximum height given y0 5 1 m, y0 5 25 m/s and
u0 5 508. Iterate until the approximate error falls below es 5 1%
using initial guesses of xl 5 0 and xu 5 60 m.
13.18 The deflection of a uniform beam subject to a linearly increasing
distributed load can be computed as
$$y = \frac{w_0}{120EIL}\left(-x^5 + 2L^2x^3 - L^4x\right)$$
FIGURE P13.22
A ladder leaning against a fence and just touching a wall. (Labeled dimensions: d and h.)
CHAPTER 14
Multidimensional Unconstrained Optimization
This chapter describes techniques to find the minimum or maximum of a function of
several variables. Recall from Chap. 13 that our visual image of a one-dimensional search
was like a roller coaster. For two-dimensional cases, the image becomes that of moun-
tains and valleys (Fig. 14.1). For higher-dimensional problems, convenient images are
not possible.
We have chosen to limit this chapter to the two-dimensional case. We have adopted
this approach because the essential features of multidimensional searches are often best
communicated visually.
Techniques for multidimensional unconstrained optimization can be classified in a
number of ways. For purposes of the present discussion, we will divide them depending
on whether they require derivative evaluation. The approaches that do not require de-
rivative evaluation are called nongradient, or direct, methods. Those that require deriva-
tives are called gradient, or descent (or ascent), methods.
FIGURE 14.1
The most tangible way to visualize two-dimensional searches is in the context of ascending a
mountain (maximization) or descending into a valley (minimization). (a) A 2-D topographic map,
drawn as lines of constant f in the x-y plane, that corresponds to the 3-D mountain in (b).
14.1 DIRECT METHODS
These methods vary from simple brute force approaches to more elegant techniques that
attempt to exploit the nature of the function. We will start our discussion with a brute
force approach.
14.1.1 Random Search
A simple example of a brute force approach is the random search method. As the name
implies, this method repeatedly evaluates the function at randomly selected values of the
independent variables. If a sufficient number of samples are conducted, the optimum will
eventually be located.
EXAMPLE 14.1 Random Search Method
Problem Statement. Use a random number generator to locate the maximum of
$$f(x, y) = y - x - 2x^2 - 2xy - y^2 \tag{E14.1.1}$$
in the domain bounded by $x = -2$ to 2 and $y = 1$ to 3. The domain is depicted in Fig. 14.2.
Notice that a single maximum of 1.25 occurs at $x = -1$ and $y = 1.5$.
Solution. Random number generators typically generate values between 0 and 1. If we
designate such a number as r, the following formula can be used to generate x values
randomly within a range between $x_l$ and $x_u$:
$$x = x_l + (x_u - x_l)r$$
For the present application, $x_l = -2$ and $x_u = 2$, and the formula is
$$x = -2 + (2 - (-2))r = -2 + 4r$$
This can be tested by substituting 0 and 1 to yield $-2$ and 2, respectively.
FIGURE 14.2
Equation (E14.1.1) showing the maximum at $x = -1$ and $y = 1.5$. (Surface plot of f over x from $-2$ to 2 and y from 1 to 3.)
Similarly for y, a formula for the present example could be developed as
$$y = y_l + (y_u - y_l)r = 1 + (3 - 1)r = 1 + 2r$$
The following Excel VBA macrocode uses the VBA random number function Rnd
to generate (x, y) pairs. These are then substituted into Eq. (E14.1.1). The maximum
value from among these random trials is stored in the variable maxf, and the corresponding
x and y values in maxx and maxy, respectively.
' n = number of random trials, set before the loop
maxf = -1E9                          ' start below any attainable f
For j = 1 To n
    x = -2 + 4 * Rnd                 ' random x between -2 and 2
    y = 1 + 2 * Rnd                  ' random y between 1 and 3
    fn = y - x - 2 * x ^ 2 - 2 * x * y - y ^ 2
    If fn > maxf Then                ' keep the best trial so far
        maxf = fn
        maxx = x
        maxy = y
    End If
Next j
A number of iterations yields

Iterations        x         y     f(x, y)
   1000       -0.9886    1.4282    1.2462
   2000       -1.0040    1.4724    1.2490
   3000       -1.0040    1.4724    1.2490
   4000       -1.0040    1.4724    1.2490
   5000       -1.0040    1.4724    1.2490
   6000       -0.9837    1.4936    1.2496
   7000       -0.9960    1.5079    1.2498
   8000       -0.9960    1.5079    1.2498
   9000       -0.9960    1.5079    1.2498
  10000       -0.9978    1.5039    1.2500

The results indicate that the technique homes in on the true maximum.
This simple brute force approach works even for discontinuous and nondifferentiable
functions. Furthermore, given enough samples, it locates the global optimum rather than
becoming trapped at a local optimum. Its major shortcoming is that as the number of
independent variables grows, the implementation effort required can become onerous. In
addition, it is not efficient because it takes no account of the behavior of the underlying
function. The remainder of the approaches described in this chapter do take function
behavior into account as well as the results of previous trials to improve the speed of
convergence. Thus, although the random search can certainly prove useful in specific
problem contexts, the following methods have more general utility and almost always lead
to more efficient convergence.
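For readers working outside Excel, the same random search can be sketched in Python. This is an illustrative translation rather than part of the original example: the function name random_search, the seed argument, and the use of Python's random module are our own assumptions.

```python
import random

def f(x, y):
    """Eq. (E14.1.1)."""
    return y - x - 2 * x**2 - 2 * x * y - y**2

def random_search(func, xl, xu, yl, yu, n, seed=None):
    """Evaluate func at n random points in the box and keep the best one."""
    rng = random.Random(seed)          # seed is optional, for repeatability
    best_f, best_x, best_y = float("-inf"), None, None
    for _ in range(n):
        x = xl + (xu - xl) * rng.random()   # x = xl + (xu - xl) r
        y = yl + (yu - yl) * rng.random()   # y = yl + (yu - yl) r
        fn = func(x, y)
        if fn > best_f:                     # keep the best trial so far
            best_f, best_x, best_y = fn, x, y
    return best_x, best_y, best_f

best_x, best_y, best_f = random_search(f, -2.0, 2.0, 1.0, 3.0, n=10000, seed=1)
print(best_x, best_y, best_f)
```

Because the samples are random, the reported point changes with the seed, but with 10,000 trials the best value should lie close to the true maximum of 1.25.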
It should be noted that more sophisticated search techniques are available. These are
heuristic approaches that were developed to handle nonlinear and/or discontinuous
problems that classical optimization cannot usually handle well, if at all. Simulated
annealing, tabu search, artificial neural networks, and genetic algorithms are a few. The
most widely applied is the genetic algorithm, with a number of commercial packages
available. Holland (1975) pioneered the genetic algorithm approach and Davis (1991)
and Goldberg (1989) provide good overviews of the theory and application of the method.
14.1.2 Univariate and Pattern Searches
It is very appealing to have an efficient optimization approach that does not require
evaluation of derivatives. The random search method described above does not require
derivative evaluation, but it is not very efficient. This section describes an approach, the
univariate search method, that is more efficient and still does not require derivative
evaluation.
The basic strategy underlying the univariate search method is to change one variable
at a time to improve the approximation while the other variables are held constant. Since
only one variable is changed, the problem reduces to a sequence of one-dimensional
searches that can be solved using a variety of methods (including those described in
Chap. 13).
Let us perform a univariate search graphically, as shown in Fig. 14.3. Start at point 1,
and move along the x axis with y constant to the maximum at point 2. You can see that
point 2 is a maximum by noticing that the trajectory along the x axis just touches a
contour line at the point. Next, move along the y axis with x constant to point 3. Continue
this process generating points 4, 5, 6, etc.
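The one-variable-at-a-time strategy can be sketched in Python as follows. This is a minimal illustration, not a formal algorithm from the text: the helper names golden_max and univariate_search, the search bounds, and the test function (a concave quadratic with its maximum at x = 2, y = 1) are all assumptions chosen for the demonstration, and each one-dimensional search is assumed to be unimodal within its bounds.

```python
import math

def golden_max(g, lo, hi, tol=1e-8):
    """Golden-section search for the maximum of a unimodal g on [lo, hi]."""
    r = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    x1 = a + r * (b - a)          # upper interior point
    x2 = b - r * (b - a)          # lower interior point
    g1, g2 = g(x1), g(x2)
    while b - a > tol:
        if g1 > g2:               # maximum lies in [x2, b]
            a = x2
            x2, g2 = x1, g1
            x1 = a + r * (b - a)
            g1 = g(x1)
        else:                     # maximum lies in [a, x1]
            b = x1
            x1, g1 = x2, g2
            x2 = b - r * (b - a)
            g2 = g(x2)
    return (a + b) / 2

def univariate_search(f, x, y, bounds, sweeps=100, tol=1e-6):
    """Alternate 1-D maximizations along x (y fixed) and y (x fixed)."""
    (xl, xu), (yl, yu) = bounds
    for _ in range(sweeps):
        x_old, y_old = x, y
        x = golden_max(lambda t: f(t, y), xl, xu)   # move along x
        y = golden_max(lambda t: f(x, t), yl, yu)   # move along y
        if abs(x - x_old) < tol and abs(y - y_old) < tol:
            break
    return x, y, f(x, y)

def f(x, y):
    # illustrative test function only
    return 2 * x * y + 2 * x - x**2 - 2 * y**2

x_opt, y_opt, f_opt = univariate_search(f, 0.0, 0.0, ((-5.0, 5.0), (-5.0, 5.0)))
print(x_opt, y_opt, f_opt)
```

For this function each sweep only halves the remaining error in y, which illustrates why the method slows down as it zigzags along a ridge.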
FIGURE 14.3
A graphical depiction of how a univariate search is conducted. (Search points 1 through 6 plotted on contours in the x-y plane.)
Although we are gradually moving toward the maximum, the search becomes less
efficient as we move along the narrow ridge toward the maximum. However, also note
that lines joining alternate points such as 1-3, 3-5 or 2-4, 4-6 point in the general direc-
tion of the maximum. These trajectories present an opportunity to shoot directly along
the ridge toward the maximum. Such trajectories are called pattern directions.
Formal algorithms are available that capitalize on the idea of pattern directions to
find optimum values efficiently. The best known of these algorithms is called Powell’s
method. It is based on the observation (see Fig. 14.4) that if points 1 and 2 are obtained
by one-dimensional searches in the same direction but from different starting points, then
the line formed by 1 and 2 will be directed toward the maximum. Such lines are called
conjugate directions.
In fact, it can be proved that if f(x, y) is a quadratic function, sequential searches
along conjugate directions will converge exactly in a finite number of steps regardless
of the starting point. Since a general nonlinear function can often be reasonably ap-
proximated by a quadratic function, methods based on conjugate directions are usually
quite efficient and are in fact quadratically convergent as they approach the optimum.
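The observation behind conjugate directions can be checked numerically. The sketch below is an illustration under our own assumptions: the quadratic test function (maximum at x = 2, y = 1), the two starting points, and the helper line_max, which locates the maximum along a line with a single parabolic fit and is therefore exact only for quadratics.

```python
def f(x, y):
    # illustrative quadratic with its maximum at (2, 1)
    return 2 * x * y + 2 * x - x**2 - 2 * y**2

def line_max(f, p, d):
    """Maximize f along p + t*d using one parabolic fit through t = -1, 0, 1."""
    g = lambda t: f(p[0] + t * d[0], p[1] + t * d[1])
    gm, g0, gp = g(-1.0), g(0.0), g(1.0)
    t = (gm - gp) / (2.0 * (gp - 2.0 * g0 + gm))   # vertex of the parabola
    return (p[0] + t * d[0], p[1] + t * d[1])

d = (1.0, 0.0)                       # same search direction for both searches
p1 = line_max(f, (0.0, 0.0), d)      # first starting point
p2 = line_max(f, (0.0, 3.0), d)      # second starting point
conj = (p2[0] - p1[0], p2[1] - p1[1])   # line joining the two maxima
peak = line_max(f, p1, conj)            # search along that line
print(p1, p2, peak)
```

The two line maxima fall at about (1, 0) and (4, 3), and a single search along the line joining them lands on the maximum at (2, 1), as the text asserts for quadratic functions.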
Let us graphically implement a simplified version of Powell’s method to find the
maximum of
$$f(x, y) = c - (x - a)^2 - (y - b)^2$$
where a, b, and c are positive constants. This equation results in circular contours in the
x, y plane, as shown in Fig. 14.5.
Initiate the search at point 0 with starting directions $h_1$ and $h_2$. Note that $h_1$ and $h_2$ are
not necessarily conjugate directions. From point 0, move along $h_1$ until a maximum is located
at point 1. Then search from point 1 along direction $h_2$ to find point 2. Next, form a new
search direction $h_3$ through points 0 and 2. Search along this direction until the maximum
at point 3 is located. Then search from point 3 in the $h_2$ direction until the maximum at
point 4 is located. From point 4 arrive at point 5 by again searching along $h_3$. Now, observe
that both points 5 and 3 have been located by searching in the $h_3$ direction from two different
points. Powell has shown that $h_4$ (formed by points 3 and 5) and $h_3$ are conjugate
directions. Thus, searching from point 5 along $h_4$ brings us directly to the maximum.
FIGURE 14.4
Conjugate directions. (Points 1 and 2 plotted on contours in the x-y plane.)
Powell’s method can be refined to make it more efficient, but the formal algorithms
are beyond the scope of this text. However, it is an efficient method that is quadratically
convergent without requiring derivative evaluation.
14.2 GRADIENT METHODS
As the name implies, gradient methods explicitly use derivative information to generate
efficient algorithms to locate optima. Before describing specific approaches, we must first
review some key mathematical concepts and operations.
14.2.1 Gradients and Hessians
Recall from calculus that the first derivative of a one-dimensional function provides a
slope or tangent to the function being differentiated. From the standpoint of optimization,
this is useful information. For example, if the slope is positive, it tells us that increasing
the independent variable will lead to a higher value of the function we are exploring.
From calculus, also recall that the first derivative may tell us when we have reached
an optimal value since this is the point at which the derivative goes to zero. Further, the sign
of the second derivative can tell us whether we have reached a minimum (positive second
derivative) or a maximum (negative second derivative).
FIGURE 14.5
Powell’s method. (Search path through points 0 to 5 along directions $h_1$, $h_2$, $h_3$, and $h_4$ on circular contours in the x-y plane.)
These ideas were useful to us in the one-dimensional search algorithms we explored
in Chap. 13. However, to fully understand multidimensional searches, we must first
understand how the first and second derivatives are expressed in a multidimensional
context.
The Gradient. Suppose we have a two-dimensional function f(x, y). An example might
be your elevation on a mountain as a function of your position. Suppose that you are at a
specific location on the mountain (a, b) and you want to know the slope in an arbitrary
direction. One way to define the direction is along a new axis h that forms an angle $\theta$ with
the x axis (Fig. 14.6). The elevation along this new axis can be thought of as a new function
g(h). If you define your position as being the origin of this axis (that is, $h = 0$), the
slope in this direction would be designated as $g'(0)$. This slope, which is called the directional
derivative, can be calculated from the partial derivatives along the x and y axes by
$$g'(0) = \frac{\partial f}{\partial x}\cos\theta + \frac{\partial f}{\partial y}\sin\theta \tag{14.1}$$
where the partial derivatives are evaluated at $x = a$ and $y = b$.
Assuming that your goal is to gain the most elevation with the next step, the next
logical question would be: what direction is the steepest ascent? The answer to this
question is provided very neatly by what is referred to mathematically as the gradient,
which is defined as
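$$\nabla f = \frac{\partial f}{\partial x}\mathbf{i} + \frac{\partial f}{\partial y}\mathbf{j} \tag{14.2}$$
This vector is also referred to as “del f.” Evaluated at the point $x = a$ and $y = b$, it points
in the direction of steepest ascent of f(x, y), and its magnitude is the directional derivative
in that direction.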
FIGURE 14.6
The directional derivative is defined along an axis h that forms an angle $\theta$ with the x axis. (The origin $h = 0$ of the h axis is at $x = a$, $y = b$.)
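Vector notation provides a concise means to generalize the gradient to n dimensions, as
$$\nabla f(\mathbf{x}) = \begin{Bmatrix} \dfrac{\partial f}{\partial x_1}(\mathbf{x}) \\ \dfrac{\partial f}{\partial x_2}(\mathbf{x}) \\ \vdots \\ \dfrac{\partial f}{\partial x_n}(\mathbf{x}) \end{Bmatrix}$$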
How do we use the gradient? For the mountain-climbing problem, if we are inter-
ested in gaining elevation as quickly as possible, the gradient tells us what direction to
move locally and how much we will gain by taking it. Note, however, that this strategy
does not necessarily take us on a direct path to the summit! We will discuss these ideas
in more depth later in this chapter.
EXAMPLE 14.2 Using the Gradient to Evaluate the Path of Steepest Ascent
Problem Statement. Employ the gradient to evaluate the steepest ascent direction for
the function
$$f(x, y) = xy^2$$
at the point (2, 2). Assume that positive x is pointed east and positive y is pointed north.
Solution. First, our elevation can be determined as
$$f(2, 2) = 2(2)^2 = 8$$
Next, the partial derivatives can be evaluated,
$$\frac{\partial f}{\partial x} = y^2 = 2^2 = 4$$
$$\frac{\partial f}{\partial y} = 2xy = 2(2)(2) = 8$$
which can be used to determine the gradient as
$$\nabla f = 4\mathbf{i} + 8\mathbf{j}$$
This vector can be sketched on a topographical map of the function, as in Fig. 14.7. This
immediately tells us that the direction we must take is
$$\theta = \tan^{-1}\left(\frac{8}{4}\right) = 1.107 \text{ radians } (= 63.4^\circ)$$
relative to the x axis. The slope in this direction, which is the magnitude of $\nabla f$, can be
calculated as
$$\sqrt{4^2 + 8^2} = 8.944$$
Thus, during our first step, we will initially gain 8.944 units of elevation rise for a unit
distance advanced along this steepest path. Observe that Eq. (14.1) yields the same result,
$$g'(0) = 4\cos(1.107) + 8\sin(1.107) = 8.944$$
Note that for any other direction, say $\theta = 1.107/2 = 0.5535$, $g'(0) = 4\cos(0.5535) +
8\sin(0.5535) = 7.608$, which is smaller.
As we move forward, both the direction and magnitude of the steepest path will
change. These changes can be quantified at each step using the gradient, and your climb-
ing direction modified accordingly.
A final insight can be gained by inspecting Fig. 14.7. As indicated, the direction of
steepest ascent is perpendicular, or orthogonal, to the elevation contour at the coordinate
(2, 2). This is a general characteristic of the gradient.
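The numbers in Example 14.2 can be reproduced with a few lines of Python. This is a small check under our own naming assumptions (grad_f and directional_derivative are hypothetical helper names); the gradient is entered analytically and Eq. (14.1) supplies the slope in any direction.

```python
import math

def grad_f(x, y):
    """Analytical gradient of f(x, y) = x*y**2."""
    return (y**2, 2 * x * y)

def directional_derivative(gx, gy, theta):
    """Eq. (14.1): slope along an axis at angle theta to the x axis."""
    return gx * math.cos(theta) + gy * math.sin(theta)

gx, gy = grad_f(2.0, 2.0)            # partial derivatives at (2, 2)
theta = math.atan2(gy, gx)           # steepest ascent direction, radians
slope = math.hypot(gx, gy)           # magnitude of the gradient

print(theta, math.degrees(theta), slope)
print(directional_derivative(gx, gy, theta))        # matches the magnitude
print(directional_derivative(gx, gy, theta / 2))    # smaller slope
```

Evaluating Eq. (14.1) at the steepest ascent angle returns the gradient magnitude, while any other angle, such as half of it, returns a smaller slope.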
FIGURE 14.7
The arrow follows the direction of steepest ascent calculated with the gradient. (Contours f = 8, 24, and 40 plotted for x and y from 0 to 4.)
Aside from defining a steepest path, the first derivative can also be used to discern
whether an optimum has been reached. As is the case for a one-dimensional function, if
the partial derivatives with respect to both x and y are zero, a two-dimensional optimum
has been reached.
The Hessian. For one-dimensional problems, both the first and second derivatives provide
valuable information for searching out optima. The first derivative (a) provides a
steepest trajectory of the function and (b) tells us that we have reached an optimum.
Once at an optimum, the second derivative tells us whether we are at a maximum [negative
$f''(x)$] or a minimum [positive $f''(x)$]. In the previous paragraphs, we illustrated how the
gradient provides best local trajectories for multidimensional problems. Now, we will
examine how the second derivative is used in such contexts.
You might expect that if the partial second derivatives with respect to both x and y
are both negative, then you have reached a maximum (and, likewise, a minimum if both
are positive). Figure 14.8 shows a function where this is not true. The point (a, b) of this
graph appears to be a minimum when observed along either the x dimension or the y
dimension. In both instances, the second partial derivatives are positive. However, if the
function is observed along the line $y = x$, it can be seen that a maximum occurs at the
same point. This shape is called a saddle, and clearly, neither a maximum nor a minimum
occurs at the point.
Whether a maximum or a minimum occurs involves not only the second partials with respect
to x and y but also the mixed second partial with respect to x and y. Assuming that the partial
derivatives are continuous at and near the point being evaluated, the following quantity
can be computed:
$$|H| = \frac{\partial^2 f}{\partial x^2}\frac{\partial^2 f}{\partial y^2} - \left(\frac{\partial^2 f}{\partial x\,\partial y}\right)^2 \tag{14.3}$$
Three cases can occur:
If $|H| > 0$ and $\partial^2 f/\partial x^2 > 0$, then f(x, y) has a local minimum.
If $|H| > 0$ and $\partial^2 f/\partial x^2 < 0$, then f(x, y) has a local maximum.
If $|H| < 0$, then f(x, y) has a saddle point.
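The three cases translate directly into a small classification routine. The Python sketch below is illustrative: the function name classify_stationary_point and the "inconclusive" label for $|H| = 0$ (a case the three rules do not cover) are our own assumptions, and the saddle example $f = x^2 + y^2 - 4xy$ is a hypothetical function chosen because it curves upward along both axes yet downward along $y = x$.

```python
def classify_stationary_point(fxx, fyy, fxy):
    """Apply the |H| test of Eq. (14.3) to second partials at a stationary point."""
    det_h = fxx * fyy - fxy**2
    if det_h > 0 and fxx > 0:
        return "local minimum"
    if det_h > 0 and fxx < 0:
        return "local maximum"
    if det_h < 0:
        return "saddle point"
    return "inconclusive"          # |H| = 0: the test gives no answer

# f = x**2 + y**2 - 4*x*y at (0, 0): fxx = 2, fyy = 2, fxy = -4
print(classify_stationary_point(2.0, 2.0, -4.0))
# both pure second partials negative with a small mixed partial
print(classify_stationary_point(-2.0, -4.0, 2.0))
```

The first call shows that positive second partials along x and y are not enough to guarantee a minimum once the mixed partial is large.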
FIGURE 14.8
A saddle point ($x = a$ and $y = b$). Notice that when the curve is viewed along the x and y
directions, the function appears to go through a minimum (positive second derivative), whereas
when viewed along an axis $x = y$, it is concave downward (negative second derivative).
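The quantity $|H|$ is equal to the determinant of a matrix made up of the second
derivatives,¹
$$H = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x\,\partial y} \\ \dfrac{\partial^2 f}{\partial y\,\partial x} & \dfrac{\partial^2 f}{\partial y^2} \end{bmatrix} \tag{14.4}$$
where this matrix is formally referred to as the Hessian of f.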
Besides providing a way to discern whether a multidimensional function has reached
an optimum, the Hessian has other uses in optimization (for example, for the multidi-
mensional form of Newton’s method). In particular, it allows searches to include second-
order curvature to attain superior results.
Finite-Difference Approximations. It should be mentioned that, for cases where they
are difficult or inconvenient to compute analytically, both the gradient and the determi-
nant of the Hessian can be evaluated numerically. In most cases, the approach introduced
in Sec. 6.3.3 for the modified secant method is employed. That is, the independent
variables can be perturbed slightly to generate the required partial derivatives. For ex-
ample, if a centered-difference approach is adopted, they can be computed as
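$$\frac{\partial f}{\partial x} = \frac{f(x + \delta x, y) - f(x - \delta x, y)}{2\,\delta x} \tag{14.5}$$
$$\frac{\partial f}{\partial y} = \frac{f(x, y + \delta y) - f(x, y - \delta y)}{2\,\delta y} \tag{14.6}$$
$$\frac{\partial^2 f}{\partial x^2} = \frac{f(x + \delta x, y) - 2f(x, y) + f(x - \delta x, y)}{\delta x^2} \tag{14.7}$$
$$\frac{\partial^2 f}{\partial y^2} = \frac{f(x, y + \delta y) - 2f(x, y) + f(x, y - \delta y)}{\delta y^2} \tag{14.8}$$
$$\frac{\partial^2 f}{\partial x\,\partial y} = \frac{f(x + \delta x, y + \delta y) - f(x + \delta x, y - \delta y) - f(x - \delta x, y + \delta y) + f(x - \delta x, y - \delta y)}{4\,\delta x\,\delta y} \tag{14.9}$$
where $\delta$ is some small fractional value.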
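Equations (14.5) through (14.9) can be coded almost verbatim. In the Python sketch below, the function names fd_gradient and fd_hessian, the default step of 0.001, and the quadratic test function are assumptions made for illustration; for a quadratic the centered differences are exact apart from round-off.

```python
def fd_gradient(f, x, y, dx=1e-3, dy=1e-3):
    """Centered-difference first partials, Eqs. (14.5) and (14.6)."""
    fx = (f(x + dx, y) - f(x - dx, y)) / (2 * dx)
    fy = (f(x, y + dy) - f(x, y - dy)) / (2 * dy)
    return fx, fy

def fd_hessian(f, x, y, dx=1e-3, dy=1e-3):
    """Centered-difference second partials, Eqs. (14.7) through (14.9)."""
    f0 = f(x, y)
    fxx = (f(x + dx, y) - 2 * f0 + f(x - dx, y)) / dx**2
    fyy = (f(x, y + dy) - 2 * f0 + f(x, y - dy)) / dy**2
    fxy = (f(x + dx, y + dy) - f(x + dx, y - dy)
           - f(x - dx, y + dy) + f(x - dx, y - dy)) / (4 * dx * dy)
    return fxx, fyy, fxy

def f(x, y):
    # illustrative test function
    return 2 * x * y + 2 * x - x**2 - 2 * y**2

fx, fy = fd_gradient(f, -1.0, 1.0)
fxx, fyy, fxy = fd_hessian(f, -1.0, 1.0)
print(fx, fy)
print(fxx, fyy, fxy, fxx * fyy - fxy**2)
```

Note the cost: the gradient alone takes four function evaluations and the three second partials take nine more, which is why analytical derivatives can pay off when f is expensive.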
Note that the methods employed in commercial software packages also use forward
differences. In addition, they are usually more complicated than the approximations listed
in Eqs. (14.5) through (14.9). Dennis and Schnabel (1996) provide more detail on such
approaches.
Regardless of how the approximation is implemented, the important point is that
you may have the option of evaluating the gradient and/or the Hessian analytically. This
can sometimes be an arduous task, but the performance of the algorithm may benefit
enough to make your effort worthwhile. The closed-form derivatives will be exact, but
more importantly, you will reduce the number of function evaluations. This latter point
can have a critical impact on the execution time.
¹ Note that $\partial^2 f/(\partial x\,\partial y) = \partial^2 f/(\partial y\,\partial x)$.
On the other hand, you will often exercise the option of having the quantities com-
puted internally using numerical approaches. In many cases, the performance will be
quite adequate and you will be saved the difficulty of numerous partial differentiations.
Such would be the case for the optimizers used in certain spreadsheets and mathematical
software packages (for example, Excel). In such cases, you may not even be given the
option of entering an analytically derived gradient and Hessian. However, for small to
moderately sized problems, this is usually not a major shortcoming.
14.2.2 Steepest Ascent Method
An obvious strategy for climbing a hill would be to determine the maximum slope at
your starting position and then start walking in that direction. But clearly, another prob-
lem arises almost immediately. Unless you were really lucky and started on a ridge that
pointed directly to the summit, as soon as you moved, your path would diverge from the
steepest ascent direction.
Recognizing this fact, you might adopt the following strategy. You could walk a
short distance along the gradient direction. Then you could stop, reevaluate the gradient
and walk another short distance. By repeating the process you would eventually get to
the top of the hill.
Although this strategy seems superficially sound, it is not very practical. In particular,
the continuous reevaluation of the gradient can be computationally demanding.
A preferred approach involves moving along a fixed path in the initial gradient direction
until f(x, y) stops increasing, that is, becomes level along your direction of travel. This
stopping point becomes the starting point where $\nabla f$ is reevaluated and a new direction
followed. The process is repeated until the summit is reached. This approach is called the
steepest ascent method.² It is the most straightforward of the gradient search techniques.
The basic idea behind the approach is depicted in Fig. 14.9.
We start at an initial point (x0, y0) labeled “0” in the figure. At this point, we deter-
mine the direction of steepest ascent, that is, the gradient. We then search along the
direction of the gradient, h0, until we find a maximum, which is labeled “1” in the figure.
The process is then repeated.
Thus, the problem boils down to two parts: (1) determining the “best” direction to
search and (2) determining the “best value” along that search direction. As we will see,
the effectiveness of the various algorithms described in the coming pages depends on
how clever we are at both parts.
For the time being, the steepest ascent method uses the gradient approach as its
choice for the “best” direction. We have already shown how the gradient is evaluated in
Example 14.2. Now, before examining how the algorithm goes about locating the maximum
along the steepest direction, we must pause to explore how to transform a function
of x and y into a function of h along the gradient direction.
² Because of our emphasis on maximization here, we use the terminology steepest ascent. The same approach
can also be used for minimization, in which case the terminology steepest descent is used.
Starting at $(x_0, y_0)$, the coordinates of any point in the gradient direction can be
expressed as
$$x = x_0 + \frac{\partial f}{\partial x}h \tag{14.10}$$
$$y = y_0 + \frac{\partial f}{\partial y}h \tag{14.11}$$
where h is distance along the h axis. For example, suppose $x_0 = 1$ and $y_0 = 2$ and
$\nabla f = 3\mathbf{i} + 4\mathbf{j}$, as shown in Fig. 14.10. The coordinates of any point along the h axis are
given by
$$x = 1 + 3h \tag{14.12}$$
$$y = 2 + 4h \tag{14.13}$$
The following example illustrates how we can use these transformations to convert a
two-dimensional function of x and y into a one-dimensional function in h.
FIGURE 14.9
A graphical depiction of the method of steepest ascent. (Search path through points 0, 1, and 2 along directions $h_0$, $h_1$, and $h_2$ in the x-y plane.)
FIGURE 14.10
The relationship between an arbitrary direction h and x and y coordinates. (With $\nabla f = 3\mathbf{i} + 4\mathbf{j}$, the values h = 0, 1, and 2 correspond to the points (1, 2), (4, 6), and (7, 10).)
EXAMPLE 14.3 Developing a 1-D Function Along the Gradient Direction
Problem Statement. Suppose we have the following two-dimensional function:
$$f(x, y) = 2xy + 2x - x^2 - 2y^2$$
Develop a one-dimensional version of this equation along the gradient direction at point
$x = -1$ and $y = 1$.
Solution. The partial derivatives can be evaluated at $(-1, 1)$,
$$\frac{\partial f}{\partial x} = 2y + 2 - 2x = 2(1) + 2 - 2(-1) = 6$$
$$\frac{\partial f}{\partial y} = 2x - 4y = 2(-1) - 4(1) = -6$$
Therefore, the gradient vector is
$$\nabla f = 6\mathbf{i} - 6\mathbf{j}$$
To find the maximum, we could search along the gradient direction, that is, along an h axis
running along the direction of this vector. The function can be expressed along this axis as
$$f\left(x_0 + \frac{\partial f}{\partial x}h,\; y_0 + \frac{\partial f}{\partial y}h\right) = f(-1 + 6h,\, 1 - 6h)$$
$$= 2(-1 + 6h)(1 - 6h) + 2(-1 + 6h) - (-1 + 6h)^2 - 2(1 - 6h)^2$$
where the partial derivatives are evaluated at $x = -1$ and $y = 1$.
By combining terms, we develop a one-dimensional function g(h) that maps f(x, y)
along the h axis,
$$g(h) = -180h^2 + 72h - 7$$
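The algebra of Example 14.3 can be avoided in a program by building g(h) as a function that simply evaluates f along the gradient line. The Python sketch below is illustrative; along_gradient is a hypothetical helper name, and the gradient is supplied analytically.

```python
def f(x, y):
    return 2 * x * y + 2 * x - x**2 - 2 * y**2

def grad_f(x, y):
    """Analytical partial derivatives of f."""
    return (2 * y + 2 - 2 * x, 2 * x - 4 * y)

def along_gradient(f, grad, x0, y0):
    """Return g(h) = f(x0 + fx*h, y0 + fy*h), per Eqs. (14.10) and (14.11)."""
    gx, gy = grad(x0, y0)          # gradient is frozen at the starting point
    return lambda h: f(x0 + gx * h, y0 + gy * h)

g = along_gradient(f, grad_f, -1.0, 1.0)
for h in (0.0, 0.1, 0.2, 0.5):
    # compare against the closed form derived in Example 14.3
    print(h, g(h), -180 * h**2 + 72 * h - 7)
```

The two printed columns agree, confirming the combined form $g(h) = -180h^2 + 72h - 7$.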
Now that we have developed a function along the path of steepest ascent, we can
explore how to answer the second question. That is, how far along this path do we travel?
One approach might be to move along this path until we find the maximum of this func-
tion. We will call the location of this maximum h*. This is the value of the step that
maximizes g (and hence, f ) in the gradient direction. This problem is equivalent to find-
ing the maximum of a function of a single variable h. This can be done using different
one-dimensional search techniques like the ones we discussed in Chap. 13. Thus, we
convert from finding the optimum of a two-dimensional function to performing a
one-dimensional search along the gradient direction.
This method is called steepest ascent when an arbitrary step size h is used. If a value
of a single step h* is found that brings us directly to the maximum along the gradient
direction, the method is called the optimal steepest ascent.
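A minimal Python sketch of optimal steepest ascent follows, applied to the function of Example 14.3 from the same starting point. Several choices here are our own assumptions rather than the text's: the one-dimensional search is a golden-section search on $0 \le h \le h_{max}$ (so g(h) is assumed unimodal there), iteration stops when the gradient magnitude falls below a tolerance, and the names golden_max and optimal_steepest_ascent are hypothetical.

```python
import math

def f(x, y):
    return 2 * x * y + 2 * x - x**2 - 2 * y**2

def grad_f(x, y):
    return (2 * y + 2 - 2 * x, 2 * x - 4 * y)

def golden_max(g, lo, hi, tol=1e-10):
    """Golden-section search for the maximum of a unimodal g on [lo, hi]."""
    r = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    x1, x2 = a + r * (b - a), b - r * (b - a)
    g1, g2 = g(x1), g(x2)
    while b - a > tol:
        if g1 > g2:                       # maximum lies in [x2, b]
            a, x2, g2 = x2, x1, g1
            x1 = a + r * (b - a)
            g1 = g(x1)
        else:                             # maximum lies in [a, x1]
            b, x1, g1 = x1, x2, g2
            x2 = b - r * (b - a)
            g2 = g(x2)
    return (a + b) / 2

def optimal_steepest_ascent(f, grad, x, y, h_max=10.0, tol=1e-5, max_iter=200):
    path = [(x, y)]
    for _ in range(max_iter):
        gx, gy = grad(x, y)
        if math.hypot(gx, gy) < tol:      # gradient ~ 0: optimum reached
            break
        # h* maximizes g(h) = f(x + gx*h, y + gy*h) along the gradient
        h = golden_max(lambda t: f(x + gx * t, y + gy * t), 0.0, h_max)
        x, y = x + gx * h, y + gy * h
        path.append((x, y))
    return x, y, path

x_opt, y_opt, path = optimal_steepest_ascent(f, grad_f, -1.0, 1.0)
print(path[1], path[2], (x_opt, y_opt), len(path) - 1)
```

The iteration count printed at the end gives a feel for how many gradient evaluations the zigzag path requires for this mildly elongated function.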
EXAMPLE 14.4 Optimal Steepest Ascent
Problem Statement. Maximize the following function:
$$f(x, y) = 2xy + 2x - x^2 - 2y^2$$
using initial guesses, $x = -1$ and $y = 1$.
Solution. Because this function is so simple, we can first generate an analytical solution.
To do this, the partial derivatives can be evaluated as
$$\frac{\partial f}{\partial x} = 2y + 2 - 2x = 0$$
$$\frac{\partial f}{\partial y} = 2x - 4y = 0$$
This pair of equations can be solved for the optimum, $x = 2$ and $y = 1$. The second
partial derivatives can also be determined and evaluated at the optimum,
$$\frac{\partial^2 f}{\partial x^2} = -2$$
$$\frac{\partial^2 f}{\partial y^2} = -4$$
$$\frac{\partial^2 f}{\partial x\,\partial y} = \frac{\partial^2 f}{\partial y\,\partial x} = 2$$
and the determinant of the Hessian is computed [Eq. (14.3)],
$$|H| = -2(-4) - 2^2 = 4$$
Therefore, because $|H| > 0$ and $\partial^2 f/\partial x^2 < 0$, function value f(2, 1) is a maximum.
Now let us implement steepest ascent. Recall that, at the end of Example 14.3, we
had already implemented the initial steps of the problem by generating
$$g(h) = -180h^2 + 72h - 7$$
Now, because this is a simple parabola, we can directly locate the maximum (that is, $h = h^*$)
by solving the problem,
$$g'(h^*) = 0$$
$$-360h^* + 72 = 0$$
$$h^* = 0.2$$
This means that if we travel along the h axis, g(h) reaches a maximum value when $h =
h^* = 0.2$. This result can be placed back into Eqs. (14.10) and (14.11) to solve for the
(x, y) coordinates corresponding to this point,
$$x = -1 + 6(0.2) = 0.2$$
$$y = 1 - 6(0.2) = -0.2$$
This step is depicted in Fig. 14.11 as the move from point 0 to 1.
The second step is merely implemented by repeating the procedure. First, the partial
derivatives can be evaluated at the new starting point $(0.2, -0.2)$ to give
$$\frac{\partial f}{\partial x} = 2(-0.2) + 2 - 2(0.2) = 1.2$$
$$\frac{\partial f}{\partial y} = 2(0.2) - 4(-0.2) = 1.2$$
Therefore, the gradient vector is
$$\nabla f = 1.2\mathbf{i} + 1.2\mathbf{j}$$
This means that the steepest direction is now pointed up and to the right at a $45^\circ$ angle with
the x axis (see Fig. 14.11). The coordinates along this new h axis can now be expressed as
$$x = 0.2 + 1.2h$$
$$y = -0.2 + 1.2h$$
Substituting these values into the function yields
$$f(0.2 + 1.2h,\, -0.2 + 1.2h) = g(h) = -1.44h^2 + 2.88h + 0.2$$
The step $h^*$ to take us to the maximum along the search direction can then be directly
computed as
$$g'(h^*) = -2.88h^* + 2.88 = 0$$
$$h^* = 1$$
This result can be placed back into Eqs. (14.10) and (14.11) to solve for the (x, y)
coordinates corresponding to this new point,
$$x = 0.2 + 1.2(1) = 1.4$$
$$y = -0.2 + 1.2(1) = 1$$
As depicted in Fig. 14.11, we move to the new coordinates, labeled point 2 in the plot,
and in so doing move closer to the maximum. The approach can be repeated with the
final result converging on the analytical solution, $x = 2$ and $y = 1$.
FIGURE 14.11
The method of optimal steepest ascent. (Contour plot in the x-y plane showing the search path through points 0, 1, and 2 toward the maximum.)
It can be shown that the method of steepest descent is linearly convergent. Further,
it tends to move very slowly along long, narrow ridges. This is because the new gradient
at each maximum point will be perpendicular to the original direction. Thus, the technique
takes many small steps criss-crossing the direct route to the summit. Hence, although it
is reliable, there are other approaches that converge much more rapidly, particularly in
the vicinity of an optimum. The remainder of the section is devoted to such methods.
14.2.3 Advanced Gradient Approaches
Conjugate Gradient Method (Fletcher-Reeves). In Sec. 14.1.2, we have seen how
conjugate directions in Powell’s method greatly improved the efficiency of a univariate
search. In a similar manner, we can also improve the linearly convergent steepest ascent
using conjugate gradients. In fact, an optimization method that makes use of conjugate
gradients to define search directions can be shown to be quadratically convergent. This
also ensures that the method will optimize a quadratic function exactly in a finite number
of steps regardless of the starting point. Since most well-behaved functions can be
approximated reasonably well by a quadratic in the vicinity of an optimum, quadratically
convergent approaches are often very efficient near an optimum.
We have seen how starting with two arbitrary search directions, Powell’s method
produced new conjugate search directions. This method is quadratically convergent and
does not require gradient information. On the other hand, if evaluation of derivatives is
practical, we can devise algorithms that combine the ideas of steepest descent and
conjugate directions to achieve robust initial performance and rapid convergence as the
technique gravitates toward the optimum. The Fletcher-Reeves conjugate gradient
algorithm modifies the steepest-ascent method by imposing the condition that successive
gradient search directions be mutually conjugate. The proof and algorithm are beyond
the scope of the text but are described by Rao (1996).
Newton’s Method. Newton’s method for a single variable (recall Sec. 13.3) can be
extended to multivariate cases. Write a second-order Taylor series for $f(\mathbf{x})$ near $\mathbf{x} = \mathbf{x}_i$,
$$f(\mathbf{x}) = f(\mathbf{x}_i) + \nabla f^T(\mathbf{x}_i)(\mathbf{x} - \mathbf{x}_i) + \frac{1}{2}(\mathbf{x} - \mathbf{x}_i)^T H_i (\mathbf{x} - \mathbf{x}_i)$$
where $H_i$ is the Hessian matrix. At the minimum,
$$\frac{\partial f(\mathbf{x})}{\partial x_j} = 0 \quad \text{for } j = 1, 2, \ldots, n$$
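Thus,
$$\nabla f = \nabla f(\mathbf{x}_i) + H_i(\mathbf{x} - \mathbf{x}_i) = 0$$
If $H_i$ is nonsingular,
$$\mathbf{x}_{i+1} = \mathbf{x}_i - H_i^{-1}\nabla f(\mathbf{x}_i) \tag{14.14}$$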
which can be shown to converge quadratically near the optimum. This method again
performs better than the steepest ascent method (see Fig. 14.12). However, note that the
method requires both the computation of second derivatives and matrix inversion at each
iteration. Thus, the method is not very useful in practice for functions with large numbers
of variables. Furthermore, Newton’s method may not converge if the starting point is not
close to the optimum.
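For two variables, Eq. (14.14) can be sketched in Python without any matrix library by inverting the 2-by-2 Hessian in closed form. The helper name newton_step is hypothetical, and the gradient and Hessian are those of the function used in Example 14.4, entered analytically.

```python
def grad_f(x, y):
    """Gradient of f(x, y) = 2xy + 2x - x**2 - 2y**2."""
    return (2 * y + 2 - 2 * x, 2 * x - 4 * y)

def hess_f(x, y):
    """Hessian of the same f (constant, because f is quadratic)."""
    return ((-2.0, 2.0), (2.0, -4.0))

def newton_step(x, y, grad, hess):
    """One application of Eq. (14.14): x_new = x - H^-1 * grad."""
    gx, gy = grad(x, y)
    (a, b), (c, d) = hess(x, y)
    det = a * d - b * c
    if det == 0:
        raise ValueError("Hessian is singular; Newton step undefined")
    # delta = H^-1 * grad, with H^-1 = (1/det) * [[d, -b], [-c, a]]
    dx = (d * gx - b * gy) / det
    dy = (-c * gx + a * gy) / det
    return x - dx, y - dy

x1, y1 = newton_step(-1.0, 1.0, grad_f, hess_f)
print(x1, y1)
```

Because this f is exactly quadratic, its second-order Taylor series is exact and a single Newton step from (-1, 1) reaches the optimum at (2, 1); a general nonlinear function would need repeated steps.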
FIGURE 14.12
When the starting point is close to the optimal point, following the gradient can be inefficient.
Newton methods attempt to search along a direct path to the optimum (solid line).
Marquardt Method. We know that the method of steepest ascent increases the function
value even if the starting point is far from an optimum. On the other hand, we have
just described Newton’s method, which converges rapidly near the maximum. Marquardt’s
method uses the steepest descent method when $\mathbf{x}$ is far from $\mathbf{x}^*$, and Newton’s method
when $\mathbf{x}$ closes in on an optimum. This is accomplished by modifying the diagonal of the
Hessian in Eq. (14.14),
$$\tilde{H}_i = H_i + \alpha_i I$$
where $\alpha_i$ is a positive constant and $I$ is the identity matrix. At the start of the procedure,
$\alpha_i$ is assumed to be large and
$$\tilde{H}_i^{-1} \approx \frac{1}{\alpha_i} I$$
which reduces Eq. (14.14) to the steepest ascent method. As the iterations proceed, $\alpha_i$
approaches zero and the method becomes Newton’s method.
Thus, Marquardt’s method offers the best of both worlds: it plods along reliably
from poor initial starting values yet accelerates rapidly when it approaches the optimum.
Unfortunately, the method still requires Hessian evaluation and matrix inversion at each
step. It should be noted that the Marquardt method is primarily used for nonlinear least-
squares problems.
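The blending of the two methods can be sketched in Python for a two-variable problem. Several details below are our own assumptions, since the text does not prescribe them: the problem is posed as minimization of $q = -f$ (so that the Hessian is positive definite and adding $\alpha_i I$ keeps it invertible), $\alpha$ starts at 100 and is divided by 10 after each successful step and multiplied by 10 otherwise, and the name marquardt is hypothetical.

```python
def q(x, y):
    # q = -f for f = 2xy + 2x - x**2 - 2y**2, so minimizing q maximizes f
    return x**2 + 2 * y**2 - 2 * x * y - 2 * x

def grad_q(x, y):
    return (2 * x - 2 * y - 2, 4 * y - 2 * x)

def hess_q(x, y):
    return ((2.0, -2.0), (-2.0, 4.0))

def marquardt(q, grad, hess, x, y, alpha=100.0, tol=1e-8, max_iter=100):
    for _ in range(max_iter):
        gx, gy = grad(x, y)
        if (gx * gx + gy * gy) ** 0.5 < tol:
            break
        (a, b), (c, d) = hess(x, y)
        a, d = a + alpha, d + alpha          # H~ = H + alpha * I
        det = a * d - b * c
        # step = -(H~)^-1 * grad, using the closed-form 2x2 inverse
        dx = -(d * gx - b * gy) / det
        dy = -(-c * gx + a * gy) / det
        if q(x + dx, y + dy) < q(x, y):      # success: accept, trust Newton more
            x, y = x + dx, y + dy
            alpha /= 10.0
        else:                                # failure: lean toward the gradient
            alpha *= 10.0
    return x, y

x_opt, y_opt = marquardt(q, grad_q, hess_q, -1.0, 1.0)
print(x_opt, y_opt, q(x_opt, y_opt))
```

The first few steps are short gradient-like moves while $\alpha$ is large; once $\alpha$ has shrunk, the steps become essentially Newton steps and the iterate settles at (2, 1).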
Quasi-Newton Methods. Quasi-Newton, or variable metric, methods seek to estimate
the direct path to the optimum in a manner similar to Newton’s method. However, notice
that the Hessian matrix in Eq. (14.14) is composed of the second derivatives of f that
vary from step to step. Quasi-Newton methods attempt to avoid these difficulties by
approximating H with another matrix A using only first partial derivatives of f. The
approach involves starting with an initial approximation of $H^{-1}$ and updating and improving
it with each iteration. The methods are called quasi-Newton because we do not use the
true Hessian, rather an approximation. Thus, we have two approximations at work simul-
taneously: (1) the original Taylor-series approximation and (2) the Hessian approximation.
There are two primary methods of this type: the Davidon-Fletcher-Powell (DFP) and
the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithms. They are similar except for
details concerning how they handle round-off error and convergence issues. BFGS is
generally recognized as being superior in most cases. Rao (1996) provides details and
formal statements of both the DFP and the BFGS algorithms.
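The text defers the update formulas to Rao (1996). As a hedged illustration only, the sketch below implements the BFGS update of the inverse-Hessian approximation in the form commonly stated in the optimization literature; it is not taken from this chapter. The names `bfgs_inverse_update`, `s` (the change in x between iterations), and `y` (the change in the gradient) are assumptions, as are the sample numbers.

```python
def bfgs_inverse_update(H, s, y):
    """Update a symmetric inverse-Hessian approximation H from step s and gradient change y."""
    n = len(s)
    ys = sum(yi * si for yi, si in zip(y, s))
    if ys <= 0:
        raise ValueError("curvature condition y.s > 0 is violated")
    rho = 1.0 / ys
    Hy = [sum(H[i][j] * y[j] for j in range(n)) for i in range(n)]
    yHy = sum(y[i] * Hy[i] for i in range(n))
    # Expanded form of (I - rho*s*y^T) H (I - rho*y*s^T) + rho*s*s^T for symmetric H.
    return [[H[i][j]
             + (1.0 + rho * yHy) * rho * s[i] * s[j]
             - rho * (Hy[i] * s[j] + s[i] * Hy[j])
             for j in range(n)]
            for i in range(n)]


H0 = [[1.0, 0.0], [0.0, 1.0]]      # initial approximation: identity matrix
s = [1.0, 0.5]                     # hypothetical change in x
y = [2.0, 2.0]                     # hypothetical change in the gradient
H1 = bfgs_inverse_update(H0, s, y)
H1_times_y = [sum(H1[i][j] * y[j] for j in range(2)) for i in range(2)]
```

The updated matrix uses only first-derivative information, stays symmetric, and reproduces the observed step when applied to the observed gradient change (the secant condition), which is what allows it to stand in for the true inverse Hessian.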
PROBLEMS
14.1 Find the directional derivative of
$$f(x, y) = x^2 + 2y^2$$
at $x = 2$ and $y = 2$ in the direction of $\mathbf{h} = 2\mathbf{i} + 3\mathbf{j}$.
14.2 Repeat Example 14.2 for the following function at the point
(0.8, 1.2).
$$f(x, y) = 2xy + 1.5y - 1.25x^2 - 2y^2 + 5$$
14.3 Given
$$f(x, y) = 2.25xy + 1.75y - 1.5x^2 - 2y^2$$
Construct and solve a system of linear algebraic equations that
maximizes f(x, y). Note that this is done by setting the partial derivatives
of f with respect to both x and y to zero.
14.4
(a) Start with an initial guess of $x = 1$ and $y = 1$ and apply two applications
of the steepest ascent method to f(x, y) from Prob. 14.3.
(b) Construct a plot from the results of (a) showing the path of the
search.
14.5 Find the gradient vector and Hessian matrix for each of the
following functions:
(a) $f(x, y) = 2xy^2 + 3e^{xy}$
(b) $f(x, y, z) = x^2 + y^2 + 2z^2$
(c) $f(x, y) = \ln(x^2 + 2xy + 3y^2)$
14.6 Find the minimum value of
$$f(x, y) = (x - 3)^2 + (y - 2)^2$$
starting at $x = 1$ and $y = 1$, using the steepest descent method with
a stopping criterion of $\varepsilon_s = 1\%$. Explain your results.
14.7 Perform one iteration of the steepest ascent method to locate
the maximum of
$$f(x, y) = 4x + 2y + x^2 - 2x^4 + 2xy - 3y^2$$
using initial guesses $x = 0$ and $y = 0$. Employ bisection to find the
optimal step size in the gradient search direction.
14.8 Perform one iteration of the optimal gradient steepest descent
method to locate the minimum of
$$f(x, y) = -8x + x^2 + 12y + 4y^2 - 2xy$$
using initial guesses $x = 0$ and $y = 0$.
14.9 Develop a program using a programming or macro language
to implement the random search method. Design the subprogram so
that it is expressly designed to locate a maximum. Test the program
with f(x, y) from Prob. 14.7. Use a range of −2 to 2 for both x and y.
14.10 The grid search is another brute force approach to optimiza-
tion. The two-dimensional version is depicted in Fig. P14.10. The x
and y dimensions are divided into increments to create a grid. The
function is then evaluated at each node of the grid. The denser the
grid, the more likely it would be to locate the optimum.
Develop a program using a programming or macro language to
implement the grid search method. Design the program so that it is
expressly designed to locate a maximum. Test it with the same
problem as Example 14.1.
14.11 Develop a one-dimensional equation in the pressure gradient
direction at the point (4, 2). The pressure function is
$$f(x, y) = 6x^2y - 9y^2 - 8x^2$$
14.12 A temperature function is
$$f(x, y) = 2x^3y^2 - 7xy - x^2 + 3y$$
Develop a one-dimensional function in the temperature gradient
direction at the point (1, 1).
FIGURE P14.10
The grid search.
[Contour plot in the x-y plane with contour levels from 0 to −25 and the maximum marked.]
CHAPTER 15
Constrained Optimization
This chapter deals with optimization problems where constraints come into play. We first
discuss problems where both the objective function and the constraints are linear. For
such cases, special methods are available that exploit the linearity of the underlying
functions. Called linear programming methods, the resulting algorithms solve very large
problems with thousands of variables and constraints with great efficiency. They are used
in a wide range of problems in engineering and management.
Then we will turn briefly to the more general problem of nonlinear constrained
optimization. Finally, we provide an overview of how software packages can be employed
for optimization.
15.1 LINEAR PROGRAMMING
Linear programming (LP) is an optimization approach that deals with meeting a desired
objective such as maximizing profit or minimizing cost in the presence of constraints
such as limited resources. The term linear connotes that the mathematical functions
representing both the objective and the constraints are linear. The term programming
does not mean “computer programming,” but rather, connotes “scheduling” or “setting
an agenda” (Revelle et al., 1997).
15.1.1 Standard Form
The basic linear programming problem consists of two major parts: the objective function
and a set of constraints. For a maximization problem, the objective function is generally
expressed as
$$\text{Maximize } Z = c_1x_1 + c_2x_2 + \cdots + c_nx_n \tag{15.1}$$
where $c_j$ = payoff of each unit of the jth activity that is undertaken and $x_j$ = magnitude
of the jth activity. Thus, the value of the objective function, Z, is the total payoff due to
the total number of activities, n.
The constraints can be represented generally as
$$a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{in}x_n \le b_i \tag{15.2}$$
where $a_{ij}$ = amount of the ith resource that is consumed for each unit of the jth activity
and $b_i$ = amount of the ith resource that is available. That is, the resources are limited.
The second general type of constraint specifies that all activities must have a nonnegative
value,
$$x_i \ge 0 \tag{15.3}$$
In the present context, this expresses the realistic notion that, for some problems,
negative activity is physically impossible (for example, we cannot produce negative
goods).
Together, the objective function and the constraints specify the linear programming
problem. They say that we are trying to maximize the payoff for a number of activities
under the constraint that these activities utilize finite amounts of resources. Before show-
ing how this result can be obtained, we will first develop an example.
EXAMPLE 15.1 Setting Up the LP Problem
Problem Statement. The following problem is developed from the area of chemical or
petroleum engineering. However, it is relevant to all areas of engineering that deal with
producing products with limited resources.
Suppose that a gas-processing plant receives a fixed amount of raw gas each week.
The raw gas is processed into two grades of heating gas, regular and premium quality.
These grades of gas are in high demand (that is, they are guaranteed to sell) and yield
different profits to the company. However, their production involves both time and on-site
storage constraints. For example, only one of the grades can be produced at a time, and
the facility is open for only 80 hr/week. Further, there is limited on-site storage for each
of the products. All these factors are listed below (note that a metric ton, or tonne, is
equal to 1000 kg):
                            Product
Resource           Regular        Premium         Resource Availability
Raw gas            7 m³/tonne     11 m³/tonne     77 m³/week
Production time    10 hr/tonne    8 hr/tonne      80 hr/week
Storage            9 tonnes       6 tonnes
Profit             150/tonne      175/tonne
Develop a linear programming formulation to maximize the profits for this operation.
Solution. The engineer operating this plant must decide how much of each gas to
produce to maximize profits. If the amounts of regular and premium produced weekly
are designated as x1 and x2, respectively, the total weekly profit can be calculated as
$$\text{Total profit} = 150x_1 + 175x_2$$
or written as a linear programming objective function,
$$\text{Maximize } Z = 150x_1 + 175x_2$$
The constraints can be developed in a similar fashion. For example, the total raw
gas used can be computed as
$$\text{Total gas used} = 7x_1 + 11x_2$$
This total cannot exceed the available supply of 77 m³/week, so the constraint can be
represented as
$$7x_1 + 11x_2 \le 77$$
The remaining constraints can be developed in a similar fashion, with the resulting
total LP formulation given by
$$\text{Maximize } Z = 150x_1 + 175x_2 \qquad \text{(maximize profit)}$$
subject to
$$\begin{array}{ll}
7x_1 + 11x_2 \le 77 & \text{(material constraint)}\\
10x_1 + 8x_2 \le 80 & \text{(time constraint)}\\
x_1 \le 9 & \text{(“regular” storage constraint)}\\
x_2 \le 6 & \text{(“premium” storage constraint)}\\
x_1, x_2 \ge 0 & \text{(positivity constraints)}
\end{array}$$
Note that the above set of equations constitutes the total LP formulation. The parenthetical
explanations at the right have been appended to clarify the meaning of each term.
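To make the formulation concrete, the coefficients of Example 15.1 can be stored as plain lists and used to test candidate production plans. The following is a minimal sketch; the names `c`, `A`, `b`, `profit`, and `is_feasible` are illustrative assumptions rather than notation from the text.

```python
# Data for the gas-processing LP of Example 15.1.
c = [150, 175]            # profit per tonne: regular, premium
A = [[7, 11],             # raw gas used, m^3 per tonne
     [10, 8],             # production time, hr per tonne
     [1, 0],              # storage of regular gas
     [0, 1]]              # storage of premium gas
b = [77, 80, 9, 6]        # availability of each resource


def profit(x):
    """Objective function Z = c1*x1 + c2*x2."""
    return sum(cj * xj for cj, xj in zip(c, x))


def is_feasible(x, tol=1e-9):
    """True if x is nonnegative and satisfies every resource constraint."""
    if any(xj < -tol for xj in x):
        return False
    return all(sum(aij * xj for aij, xj in zip(row, x)) <= bi + tol
               for row, bi in zip(A, b))
```

For instance, producing 8 tonnes of regular gas and no premium gas satisfies every constraint and yields a profit of 1200, whereas 9 tonnes of regular gas violates the 80 hr/week time limit.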
15.1.2 Graphical Solution
Because they are limited to two or three dimensions, graphical solutions have limited
practical utility. However, they are very useful for demonstrating some basic concepts
that underlie the general algebraic techniques used to solve higher-dimensional problems
with the computer.
For a two-dimensional problem, such as the one in Example 15.1, the solution space
is defined as a plane with x1 measured along the abscissa and x2 along the ordinate. Because
they are linear, the constraints can be plotted on this plane as straight lines. If the LP prob-
lem was formulated properly (that is, it has a solution), these constraint lines will delineate
a region, called the feasible solution space, encompassing all possible combinations of x1
and x2 that obey the constraints and hence represent feasible solutions. The objective func-
tion for a particular value of Z can then be plotted as another straight line and superimposed
on this space. The value of Z can then be adjusted until it is at the maximum value while
still touching the feasible space. This value of Z represents the optimal solution. The cor-
responding values of x1 and x2, where Z touches the feasible solution space, represent the
optimal values for the activities. The following example should help clarify the approach.
EXAMPLE 15.2 Graphical Solution
Problem Statement. Develop a graphical solution for the gas-processing problem pre-
viously derived in Example 15.1:
$$\text{Maximize } Z = 150x_1 + 175x_2$$
subject to
$$\begin{array}{ll}
7x_1 + 11x_2 \le 77 & (1)\\
10x_1 + 8x_2 \le 80 & (2)\\
x_1 \le 9 & (3)\\
x_2 \le 6 & (4)\\
x_1 \ge 0 & (5)\\
x_2 \ge 0 & (6)
\end{array}$$
We have numbered the constraints to identify them in the following graphical solution.
Solution. First, the constraints can be plotted on the solution space. For example, the
first constraint can be reformulated as a line by replacing the inequality by an equal sign
and solving for x2:
$$x_2 = -\frac{7}{11}x_1 + 7$$
Thus, as in Fig. 15.1a, the possible values of x1 and x2 that obey this constraint fall below
this line (the direction designated in the plot by the small arrow). The other constraints can
be evaluated similarly, as superimposed on Fig. 15.1a. Notice how they encompass a region
where they are all met. This is the feasible solution space (the area ABCDE in the plot).
Aside from defining the feasible space, Fig. 15.1a also provides additional insight.
In particular, we can see that constraint 3 (storage of regular gas) is “redundant.” That
is, the feasible solution space would be unaffected if it were deleted.
FIGURE 15.1
Graphical solution of a linear programming problem. (a) The constraints define a feasible
solution space. (b) The objective function can be increased until it reaches the highest value
that obeys all constraints. Graphically, the function moves up and to the right until it touches
the feasible space at a single optimal point.
[Panel (a): constraint lines 1 through 6 in the x1-x2 plane bounding the region ABCDE, with point F outside it and constraint 3 marked as redundant. Panel (b): objective-function lines for Z = 0, 600, and 1400.]
Next, the objective function can be added to the plot. To do this, a value of Z must
be chosen. For example, for Z 5 0, the objective function becomes
$$0 = 150x_1 + 175x_2$$
or, solving for $x_2$, we derive the line
$$x_2 = -\frac{150}{175}x_1$$
As displayed in Fig. 15.1b, this represents a dashed line intersecting the origin. Now,
since we are interested in maximizing Z, we can increase it to, say, 600, and the objective
function is
$$x_2 = \frac{600}{175} - \frac{150}{175}x_1$$
Thus, increasing the value of the objective function moves the line away from the origin.
Because the line still falls within the solution space, our result is still feasible. For the
same reason, however, there is still room for improvement. Hence, Z can keep increasing
until a further increase will take the objective beyond the feasible region. As shown in
Fig. 15.1b, the maximum value of Z corresponds to approximately 1400. At this point,
x1 and x2 are equal to approximately 4.9 and 3.9, respectively. Thus, the graphical solu-
tion tells us that if we produce these quantities of regular and premium, we will reap a
maximum profit of about 1400.
Aside from determining optimal values, the graphical approach provides further
insights into the problem. This can be appreciated by substituting the answers back into
the constraint equations,
$$\begin{aligned}
7(4.9) + 11(3.9) &\cong 77\\
10(4.9) + 8(3.9) &\cong 80\\
4.9 &\le 9\\
3.9 &\le 6
\end{aligned}$$
Consequently, as is also clear from the plot, producing at the optimal amount of each
product brings us right to the point where we just meet the resource (1) and time con-
straints (2). Such constraints are said to be binding. Further, as is also evident graphically,
neither of the storage constraints [(3) and (4)] acts as a limitation. Such constraints are
called nonbinding. This leads to the practical conclusion that, for this case, we can increase
profits by either increasing our resource supply (the raw gas) or increasing our production
time. Further, it indicates that increasing storage would have no impact on profit.
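Because the two binding constraints hold as equalities at the optimum, the values read from the graph can be sharpened by solving those two equations simultaneously. The sketch below does so with Cramer's rule in exact rational arithmetic; the helper name `intersect` is an assumption made for illustration.

```python
from fractions import Fraction


def intersect(row1, rhs1, row2, rhs2):
    """Solve two linear equations in two unknowns by Cramer's rule; None if parallel."""
    a, b = row1
    c, d = row2
    det = a * d - b * c
    if det == 0:
        return None
    return (Fraction(rhs1 * d - b * rhs2, det),
            Fraction(a * rhs2 - c * rhs1, det))


# Binding constraints (1) and (2): 7*x1 + 11*x2 = 77 and 10*x1 + 8*x2 = 80
x1, x2 = intersect((7, 11), 77, (10, 8), 80)
Z = 150 * x1 + 175 * x2
```

The result is x1 = 44/9 and x2 = 35/9 with Z = 12725/9, that is, about 4.889, 3.889, and 1413.889, consistent with the approximate graphical values of 4.9, 3.9, and 1400.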
The result obtained in the previous example is one of four possible outcomes that
can be generally obtained in a linear programming problem. These are
1. Unique solution. As in the example, the maximum objective function intersects a
single point.
2. Alternate solutions. Suppose that the objective function in the example had coefficients
so that it was precisely parallel to one of the constraints. In our example problem,
one way in which this would occur would be if the profits were changed to $140/
tonne and $220/tonne. Then, rather than a single point, the problem would have an
infinite number of optima corresponding to a line segment (Fig. 15.2a).
3. No feasible solution. As in Fig. 15.2b, it is possible that the problem is set up so that
there is no feasible solution. This can be due to dealing with an unsolvable problem
or due to errors in setting up the problem. The latter can result if the problem is
over-constrained to the point that no solution can satisfy all the constraints.
4. Unbounded problems. As in Fig. 15.2c, this usually means that the problem is under-
constrained and therefore open-ended. As with the no-feasible-solution case, it can
often arise from errors committed during problem specification.
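The alternate-solutions case (outcome 2) can be checked numerically. With profits of 140 and 220 per tonne, the objective function is parallel to constraint (1), so every point on the edge between corners C and D should give the same payoff. In the sketch below the corner coordinates are computed from the constraint equations of Example 15.2 (D is where constraints 1 and 4 meet, C is where constraints 1 and 2 meet); the variable names are illustrative assumptions.

```python
from fractions import Fraction


def Z(point, c1=140, c2=220):
    """Objective function with the modified profits of 140 and 220 per tonne."""
    x1, x2 = point
    return c1 * x1 + c2 * x2


D = (Fraction(11, 7), Fraction(6))          # 7*x1 + 11*x2 = 77 with x2 = 6
C = (Fraction(44, 9), Fraction(35, 9))      # 7*x1 + 11*x2 = 77 with 10*x1 + 8*x2 = 80
midpoint = tuple((p + q) / 2 for p, q in zip(C, D))

values = [Z(D), Z(C), Z(midpoint)]
```

All three evaluations return 1540, which illustrates that the optimum is attained along the whole line segment rather than at a single corner.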
Now let us suppose that our problem involves a unique solution. The graphical
approach might suggest an enumerative strategy for hunting down the maximum. From
Fig. 15.1, it should be clear that the optimum always occurs at one of the corner points
where two constraints meet. Such a point is known formally as an extreme point. Thus,
out of the infinite number of possibilities in the decision space, focusing on extreme
points clearly narrows down the possible options.
Further, we can recognize that not every extreme point is feasible, that is, satisfying all
constraints. For example, notice that point F in Fig. 15.1a is an extreme point but is not
feasible. Limiting ourselves to feasible extreme points narrows the field down still further.
Finally, once all feasible extreme points are identified, the one yielding the best value
of the objective function represents the optimum solution. Finding this optimal solution
could be done by exhaustively (and inefficiently) evaluating the value of the objective
function at every feasible extreme point. The following section discusses the simplex
method, which offers a preferable strategy that charts a selective course through a sequence
of feasible extreme points to arrive at the optimum in an extremely efficient manner.
FIGURE 15.2
Aside from a single optimal solution (for example, Fig. 15.1b), there are three other possible
outcomes of a linear programming problem: (a) alternative optima, (b) no feasible solution,
and (c) an unbounded result.
15.1.3 The Simplex Method
The simplex method is predicated on the assumption that the optimal solution will be
an extreme point. Thus, the approach must be able to discern whether during problem
solution an extreme point occurs. To do this, the constraint equations are reformulated
as equalities by introducing what are called slack variables.
Slack Variables. As the name implies, a slack variable measures how much of a
constrained resource is available, that is, how much “slack” of the resource is available.
For example, recall the resource constraint used in Examples 15.1 and 15.2,
$$7x_1 + 11x_2 \le 77$$
We can define a slack variable S1 as the amount of raw gas that is not used for a particular
production level (x1, x2). If this quantity is added to the left side of the constraint, it makes
the relationship exact,
$$7x_1 + 11x_2 + S_1 = 77$$
Now recognize what the slack variable tells us. If it is positive, it means that we
have some “slack” for this constraint. That is, we have some surplus resource that is not
being fully utilized. If it is negative, it tells us that we have exceeded the constraint.
Finally, if it is zero, we exactly meet the constraint. That is, we have used up all the
allowable resource. Since this is exactly the condition where constraint lines intersect,
the slack variable provides a means to detect extreme points.
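The interpretation of the slack values can be illustrated with a short sketch that evaluates one slack per constraint of the gas-processing problem at several points. The function name `slacks` is an assumption; the test points are the origin, the corner where constraints (1) and (2) intersect (computed from the two equations), and the point (11, 0), where the first constraint line crosses the x1 axis.

```python
from fractions import Fraction

A = [[7, 11], [10, 8], [1, 0], [0, 1]]   # constraint coefficients
b = [77, 80, 9, 6]                       # resource availability


def slacks(x):
    """Return S_i = b_i - (a_i1*x1 + a_i2*x2) for every constraint."""
    return [bi - sum(aij * xj for aij, xj in zip(row, x))
            for row, bi in zip(A, b)]


at_origin = slacks((0, 0))                               # nothing is used
at_corner = slacks((Fraction(44, 9), Fraction(35, 9)))   # constraints 1 and 2 binding
outside = slacks((11, 0))                                # violates the time constraint
```

At the origin every slack equals the full resource amount. At the corner the first two slacks are exactly zero. At (11, 0) the second slack is negative, which signals that the point exceeds the time constraint.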
A different slack variable is developed for each constraint equation, resulting in what
is called the fully augmented version,
$$\text{Maximize } Z = 150x_1 + 175x_2$$
subject to
$$\begin{array}{rrrrrrrl}
7x_1 & {}+ 11x_2 & {}+ S_1 & & & & = 77 & \quad (15.4a)\\
10x_1 & {}+ 8x_2 & & {}+ S_2 & & & = 80 & \quad (15.4b)\\
x_1 & & & & {}+ S_3 & & = 9 & \quad (15.4c)\\
 & x_2 & & & & {}+ S_4 & = 6 & \quad (15.4d)
\end{array}$$
$$x_1, x_2, S_1, S_2, S_3, S_4 \ge 0$$
Notice how we have set up the four equality equations so that the unknowns are
aligned in columns. We did this to underscore that we are now dealing with a system of
linear algebraic equations (recall Part Three). In the following section, we will show how
these equations can be used to determine extreme points algebraically.
Algebraic Solution. In contrast to Part Three, where we had n equations with n unknowns,
our example system [Eqs. (15.4)] is underspecified or underdetermined, that is,
it has more unknowns than equations. In general terms, there are n structural variables
(that is, the original unknowns), m surplus or slack variables (one per constraint), and
n + m total variables (structural plus surplus). For the gas production problem we have
2 structural variables, 4 slack variables, and 6 total variables. Thus, the problem involves
solving 4 equations with 6 unknowns.
The difference between the number of unknowns and the number of equations (equal
to 2 for our problem) is directly related to how we can distinguish a feasible extreme
point. Specifically, every feasible extreme point has 2 variables out of 6 equal to zero. For example,
the five corner points of the area ABCDE have the following zero values:
Extreme Point    Zero Variables
A                x1, x2
B                x2, S2
C                S1, S2
D                S1, S4
E                x1, S4
This observation leads to the conclusion that the extreme points can be determined
from the standard form by setting two of the variables equal to zero. In our example,
this reduces the problem to a solvable form of 4 equations with 4 unknowns. For example,
for point E, setting $x_1 = S_4 = 0$ reduces the standard form to
$$\begin{aligned}
11x_2 + S_1 &= 77\\
8x_2 + S_2 &= 80\\
S_3 &= 9\\
x_2 &= 6
\end{aligned}$$
which can be solved for $x_2 = 6$, $S_1 = 11$, $S_2 = 32$, and $S_3 = 9$. Together with $x_1 = S_4 = 0$,
these values define point E.
To generalize, a basic solution for m linear equations with n unknowns is developed
by setting n − m variables to zero, and solving the m equations for the m remaining
unknowns. The zero variables are formally referred to as nonbasic variables,
whereas the remaining m variables are called basic variables. If all the basic variables
are nonnegative, the result is called a basic feasible solution. The optimum will be one
of these.
Now a direct approach to determining the optimal solution would be to calculate all
the basic solutions, determine which were feasible, and among those, which had the
highest value of Z. There are two reasons why this is not a wise approach.
First, for even moderately sized problems, the approach can involve solving a great
number of equations. For m equations with n unknowns, this results in solving
$$C^n_m = \frac{n!}{m!(n - m)!}$$
simultaneous equations. For example, if there are 10 equations ($m = 10$) with 16 unknowns
($n = 16$), you would have 8008 [$= 16!/(10!\,6!)$] $10 \times 10$ systems of equations
to solve!
Second, a significant portion of these may be infeasible. For example, in the present
problem, out of $C^6_4 = 15$ extreme points, only 5 are feasible. Clearly, if we could avoid
solving all these unnecessary systems, a more efficient algorithm would be developed.
Such an approach is described next.
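Before turning to that approach, the brute-force strategy just criticized can be sketched for the gas-processing problem. The code below enumerates every choice of 4 basic variables out of 6, solves the resulting system in exact arithmetic, and keeps the nonnegative solutions. The small Gauss-Jordan solver and all names (`solve`, `basic`, `feasible`, `best`) are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

# Augmented system of Eqs. (15.4); columns are x1, x2, S1, S2, S3, S4.
A = [[7, 11, 1, 0, 0, 0],
     [10, 8, 0, 1, 0, 0],
     [1, 0, 0, 0, 1, 0],
     [0, 1, 0, 0, 0, 1]]
b = [77, 80, 9, 6]
c = [150, 175, 0, 0, 0, 0]


def solve(M, rhs):
    """Gauss-Jordan elimination in exact arithmetic; returns None if M is singular."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


m, n = len(A), len(A[0])
basic, feasible = [], []
for basis in combinations(range(n), m):       # choose the m basic variables
    M = [[A[i][j] for j in basis] for i in range(m)]
    values = solve(M, b)
    if values is None:                        # this choice has no unique solution
        continue
    x = [Fraction(0)] * n                     # nonbasic variables stay at zero
    for j, v in zip(basis, values):
        x[j] = v
    basic.append(x)
    if all(v >= 0 for v in x):
        feasible.append(x)


def payoff(x):
    return sum(cj * xj for cj, xj in zip(c, x))


best = max(feasible, key=payoff)
n_systems_small = comb(6, 4)       # systems to examine for this problem
n_systems_large = comb(16, 10)     # the 10-equation, 16-unknown example
```

In this sketch, 13 of the 15 combinations yield a basic solution (the remaining two pair up parallel constraint lines and are skipped as singular), and 5 of those are feasible. The best of them is x1 = 44/9 and x2 = 35/9 with Z = 12725/9. Even for this small problem most of the work is spent on points that are discarded.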
Simplex Method Implementation. The simplex method avoids inefficiencies outlined
in the previous section. It does this by starting with a basic feasible solution. Then it
moves through a sequence of other basic feasible solutions that successively improve the
value of the objective function. Eventually, the optimal value is reached and the method
is terminated.
We will illustrate the approach using the gas-processing problem from Examples 15.1
and 15.2. The first step is to start at a basic feasible solution (that is, at an extreme
corner point of the feasible space). For cases like ours, an obvious starting point would
be point A; that is, $x_1 = x_2 = 0$. The original 4 equations with 6 unknowns become
$$\begin{aligned}
S_1 &= 77\\
S_2 &= 80\\
S_3 &= 9\\
S_4 &= 6
\end{aligned}$$
Thus, the starting values for the basic variables are given automatically as being equal
to the right-hand sides of the constraints.
Before proceeding to the next step, the beginning information can now be sum-
marized in a convenient tabular format called a tableau. As shown below, the tableau
provides a concise summary of the key information constituting the linear programming
problem.
Basic   Z    x1      x2      S1   S2   S3   S4   Solution   Intercept
Z       1    -150    -175    0    0    0    0    0
S1      0    7       11      1    0    0    0    77         11
S2      0    10      8       0    1    0    0    80         8
S3      0    1       0       0    0    1    0    9          9
S4      0    0       1       0    0    0    1    6          ∞
Notice that for the purposes of the tableau, the objective function is expressed as
$$Z - 150x_1 - 175x_2 - 0S_1 - 0S_2 - 0S_3 - 0S_4 = 0 \tag{15.5}$$
The next step involves moving to a new basic feasible solution that leads to an
improvement of the objective function. This is accomplished by increasing a current
nonbasic variable (at this point, x1 or x2) above zero so that Z increases. Recall that, for
the present example, extreme points must have 2 zero values. Therefore, one of the cur-
rent basic variables (S1, S2, S3, or S4) must also be set to zero.
To summarize this important step: one of the current nonbasic variables must be
made basic (nonzero). This variable is called the entering variable. In the process, one
of the current basic variables is made nonbasic (zero). This variable is called the leaving
variable.
Now, let us develop a mathematical approach for choosing the entering and leaving
variables. Because of the convention by which the objective function is written
[Eq. (15.5)], the entering variable can be any variable in the objective function having
a negative coefficient (because this will make Z bigger). The variable with the most
negative coefficient is conventionally chosen because it usually leads to the largest increase
in Z. For our case, $x_2$ would be the entering variable since its coefficient, −175, is
more negative than the coefficient of $x_1$, −150.
At this point the graphical solution can be consulted for insight. As in Fig. 15.3, we
start at the initial point A. Based on its coefficient, x2 should be chosen to enter. However,
to keep the present example brief, we choose x1 since we can see from the graph that
this will bring us to the maximum more quickly.
Next, we must choose the leaving variable from among the current basic variables—
S1, S2, S3, or S4. Graphically, we can see that there are two possibilities. Moving to point
B will drive S2 to zero, whereas moving to point F will drive S1 to zero. However, the
graph also makes it clear that F is not possible because it lies outside the feasible solu-
tion space. Thus, we decide to move from A to B.
How is the same result detected mathematically? One way is to calculate the values
at which the constraint lines intersect the axis or line corresponding to the entering
variable (in our case, the x1 axis). We can calculate this value as the ratio of the right-
hand side of the constraint (the “Solution” column of the tableau) to the corresponding
coefficient of x1. For example, for the first constraint’s slack variable S1, the result is
$$\text{Intercept} = \frac{77}{7} = 11$$
The remaining intercepts can be calculated and listed as the last column of the tableau.
Because 8 is the smallest positive intercept, it means that the second constraint line will
be reached first as x1 is increased. Hence, S2 should be the leaving variable.
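The intercept calculation can be sketched in a few lines. The dictionaries below copy the “Solution” column and the x1 column of the initial tableau; the names `solution`, `x1_coeff`, and `intercepts` are assumptions, and a zero coefficient is treated as an infinite intercept, as in the tableau.

```python
import math

solution = {"S1": 77, "S2": 80, "S3": 9, "S4": 6}    # right-hand sides
x1_coeff = {"S1": 7, "S2": 10, "S3": 1, "S4": 0}     # coefficients of the entering variable


def intercepts(solution, coeff):
    """Ratio of each right-hand side to the entering-variable coefficient."""
    return {name: (solution[name] / coeff[name] if coeff[name] != 0 else math.inf)
            for name in solution}


ratios = intercepts(solution, x1_coeff)
# Only finite, positive intercepts are candidates for the leaving variable.
candidates = {name: r for name, r in ratios.items() if 0 < r < math.inf}
leaving = min(candidates, key=candidates.get)
```

The ratios come out as 11, 8, 9, and infinity, so the sketch selects S2 as the leaving variable, in agreement with the graphical argument.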
FIGURE 15.3
Graphical depiction of how the simplex method successively moves through feasible basic solu-
tions to arrive at the optimum in an efficient manner.
At this point, we have moved to point B ($x_2 = S_2 = 0$), and the new basic solution
becomes
$$\begin{aligned}
7x_1 + S_1 &= 77\\
10x_1 &= 80\\
x_1 + S_3 &= 9\\
S_4 &= 6
\end{aligned}$$
The solution of this system of equations effectively defines the values of the basic variables
at point B: $x_1 = 8$, $S_1 = 21$, $S_3 = 1$, and $S_4 = 6$.
The tableau can be used to make the same calculation by employing the Gauss-
Jordan method. Recall that the basic strategy behind Gauss-Jordan involved converting
the pivot element to 1 and then eliminating the coefficients in the same column above
and below the pivot element (recall Sec. 9.7).
For this example, the pivot row is S2 (the leaving variable) and the pivot element is 10
(the coefficient of the entering variable, x1). Dividing the row by 10 and replacing S2 by x1
gives
Basic   Z    x1      x2      S1   S2    S3   S4   Solution   Intercept
Z       1    -150    -175    0    0     0    0    0
S1      0    7       11      1    0     0    0    77
x1      0    1       0.8     0    0.1   0    0    8
S3      0    1       0       0    0     1    0    9
S4      0    0       1       0    0     0    1    6
Next, the x1 coefficients in the other rows can be eliminated. For example, for the objective
function row, the pivot row is multiplied by −150 and the result subtracted from the first
row to give
Z     x1         x2         S1    S2        S3   S4   Solution
1     -150       -175       0     0         0    0    0
-0    -(-150)    -(-120)    -0    -(-15)    0    0    -(-1200)
1     0          -55        0     15        0    0    1200
Similar operations can be performed on the remaining rows to give the new tableau,
Basic   Z    x1   x2      S1   S2      S3   S4   Solution   Intercept
Z       1    0    -55     0    15      0    0    1200
S1      0    0    5.4     1    -0.7    0    0    21         3.889
x1      0    1    0.8     0    0.1     0    0    8          10
S3      0    0    -0.8    0    -0.1    1    0    1          -1.25
S4      0    0    1       0    0       0    1    6          6
Thus, the new tableau summarizes all the information for point B. This includes the fact
that the move has increased the objective function to Z = 1200.
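The Gauss-Jordan step that takes the initial tableau to the tableau for point B can be sketched as follows. Exact fractions are used so that the entries can be compared with the tabulated decimals; the function name `pivot` and the dictionary layout are assumptions made for illustration.

```python
from fractions import Fraction

# Initial tableau; columns are Z, x1, x2, S1, S2, S3, S4, Solution.
tableau = {
    "Z":  [1, -150, -175, 0, 0, 0, 0, 0],
    "S1": [0, 7, 11, 1, 0, 0, 0, 77],
    "S2": [0, 10, 8, 0, 1, 0, 0, 80],
    "S3": [0, 1, 0, 0, 0, 1, 0, 9],
    "S4": [0, 0, 1, 0, 0, 0, 1, 6],
}


def pivot(tableau, leaving, entering, col):
    """Normalize the pivot row, then eliminate the pivot column from all other rows."""
    rows = {name: [Fraction(v) for v in row] for name, row in tableau.items()}
    p = rows[leaving][col]
    pivot_row = [v / p for v in rows[leaving]]
    new = {}
    for name, row in rows.items():
        if name == leaving:
            new[entering] = pivot_row          # the entering variable takes over this row
        else:
            factor = row[col]
            new[name] = [v - factor * pv for v, pv in zip(row, pivot_row)]
    return new


# x1 enters (column 1) and S2 leaves, which moves the solution from point A to point B.
at_B = pivot(tableau, leaving="S2", entering="x1", col=1)
```

The objective row of the result is 1, 0, −55, 0, 15, 0, 0, 1200, and the S1 row contains 27/5 and −7/10 (that is, 5.4 and −0.7), matching the tableau above.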
This tableau can then be used to chart our next, and in this case final, step. Only
one more variable, x2, has a negative value in the objective function, and it is therefore
chosen as the entering variable. According to the intercept values (now calculated as the
solution column over the coefficients in the x2 column), the first constraint has the smallest
positive value, and therefore, S1 is selected as the leaving variable. Thus, the simplex
method moves us from point B to point C in Fig. 15.3. Finally, Gauss-Jordan elimination
can be implemented to solve the simultaneous equations. The result is the final tableau,
Basic   Z    x1   x2   S1         S2         S3   S4   Solution
Z       1    0    0    10.1852    7.8704     0    0    1413.889
x2      0    0    1    0.1852     -0.1296    0    0    3.889
x1      0    1    0    -0.1481    0.2037     0    0    4.889
S3      0    0    0    0.1481     -0.2037    1    0    4.111
S4      0    0    0    -0.1852    0.1296     0    1    2.111
We know that the result is final because there are no negative coefficients remaining in the
objective function row. The final solution is tabulated as $x_1 = 4.889$ and $x_2 = 3.889$, which
give a maximum objective function of $Z = 1413.889$. Further, because $S_3$ and $S_4$ are still
in the basis, we know that the solution is limited by the first and second constraints.
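The complete procedure can be collected into a compact routine. The sketch below is one possible tableau implementation under simplifying assumptions: a maximization problem, an initial basis made up of the slack variables, and no safeguards against cycling. Unlike the hand calculation above, it always applies the conventional rule of entering the variable with the most negative coefficient, so it passes through corner points A, E, D, and C rather than A, B, and C. The names `simplex` and `COLUMNS` are illustrative.

```python
from fractions import Fraction

COLUMNS = ["Z", "x1", "x2", "S1", "S2", "S3", "S4", "Solution"]


def simplex(tableau, basis):
    """Tableau simplex for maximization; row 0 is the objective row."""
    T = [[Fraction(v) for v in row] for row in tableau]
    basis = list(basis)                      # column index of the basic variable in each row
    while True:
        # Entering variable: most negative objective coefficient (Z and Solution columns excluded).
        col = min(range(1, len(T[0]) - 1), key=lambda j: T[0][j])
        if T[0][col] >= 0:
            return T, basis                  # no negative coefficients remain: optimal
        # Leaving variable: smallest intercept among rows with a positive coefficient.
        rows = [i for i in range(1, len(T)) if T[i][col] > 0]
        if not rows:
            raise ValueError("problem is unbounded")
        row = min(rows, key=lambda i: T[i][-1] / T[i][col])
        # Gauss-Jordan pivot on T[row][col].
        p = T[row][col]
        T[row] = [v / p for v in T[row]]
        for i in range(len(T)):
            if i != row and T[i][col] != 0:
                f = T[i][col]
                T[i] = [v - f * pv for v, pv in zip(T[i], T[row])]
        basis[row - 1] = col


initial = [
    [1, -150, -175, 0, 0, 0, 0, 0],
    [0, 7, 11, 1, 0, 0, 0, 77],
    [0, 10, 8, 0, 1, 0, 0, 80],
    [0, 1, 0, 0, 0, 1, 0, 9],
    [0, 0, 1, 0, 0, 0, 1, 6],
]
final, basis = simplex(initial, basis=[3, 4, 5, 6])   # S1..S4 start in the basis
Z = final[0][-1]
values = {COLUMNS[j]: final[i + 1][-1] for i, j in enumerate(basis)}
```

The routine ends with Z = 12725/9, x1 = 44/9, and x2 = 35/9, with S3 and S4 remaining in the basis, which reproduces the final tableau even though a different sequence of corner points is visited.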
15.2 NONLINEAR CONSTRAINED OPTIMIZATION
There are a number of approaches for handling nonlinear optimization problems in the
presence of constraints. These can generally be divided into indirect and direct ap-
proaches (Rao, 1996). A typical indirect approach uses so-called penalty functions. These
involve placing additional expressions to make the objective function less optimal as the
solution approaches a constraint. Thus, the solution will be discouraged from violating
constraints. Although such methods can be useful in some problems, they can become
arduous when the problem involves many constraints.
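As a hedged illustration of the idea, the sketch below builds a penalized objective for a maximization problem. It uses one common variant, the exterior quadratic penalty, which subtracts a term proportional to the squared constraint violation; this penalizes violation itself rather than proximity to the constraint, and it is not necessarily the formulation Rao (1996) presents. The function name `penalized`, the one-variable test problem, and the penalty weight `r` are all assumptions.

```python
def penalized(f, constraints, r):
    """Build P(x) = f(x) - r * sum(max(0, g(x))**2) for constraints written as g(x) <= 0."""
    def P(x):
        violation = sum(max(0.0, g(x)) ** 2 for g in constraints)
        return f(x) - r * violation
    return P


# Hypothetical problem: maximize f(x) = -(x - 3)^2 subject to x <= 2.
f = lambda x: -(x - 3.0) ** 2
g = lambda x: x - 2.0            # g(x) <= 0 is equivalent to x <= 2

P = penalized(f, [g], r=1.0)
```

Inside the feasible region P coincides with f, while beyond x = 2 the penalty lowers the objective. For r = 1 the penalized maximum sits at x = 2.5, and increasing r pushes it toward the constraint boundary at x = 2, which is the constrained optimum of this test problem.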
The generalized reduced gradient (GRG) search method is one of the more popular
of the direct methods (for details, see Fylstra et al., 1998; Lasdon et al., 1978; Lasdon
and Smith, 1992). It is, in fact, the nonlinear method used within the Excel Solver.
It first “reduces” the problem to an unconstrained optimization problem. It does this
by solving a set of nonlinear equations for the basic variables in terms of the nonbasic
variables. Then, the unconstrained problem is solved using approaches similar to those
described in Chap. 14. First, a search direction is chosen along which an improvement in
the objective function is sought. The default choice is a quasi-Newton approach (BFGS)
that, as described in Chap. 14, requires storage of an approximation of the Hessian matrix.
This approach performs very well for most cases. The conjugate gradient approach is also
available in Excel as an alternative for large problems. The Excel Solver has the nice
feature that it automatically switches to the conjugate gradient method, depending on
available storage. Once the search direction is established, a one-dimensional search is
carried out along that direction using a variable step-size approach.
15.3 OPTIMIZATION WITH SOFTWARE PACKAGES
Software packages have great capabilities for optimization. In this section, we will give
you an introduction to some of the more useful ones.
15.3.1 Excel for Linear Programming
There are a variety of software packages expressly designed to implement linear program-
ming. However, because of its broad availability, we will focus on the Excel spreadsheet.
This involves using the Solver option previously employed in Chap. 7 for root location.
The manner in which Solver is used for linear programming is similar to our previ-
ous applications in that these data are entered into spreadsheet cells. The basic strategy
is to arrive at a single cell that is to be optimized as a function of variations of other
cells on the spreadsheet. The following example illustrates how this can be done for the
gas-processing problem.
EXAMPLE 15.3 Using Excel’s Solver for a Linear Programming Problem
Problem Statement. Use Excel to solve the gas-processing problem we have been
examining in this chapter.
Solution. An Excel worksheet set up to calculate the pertinent values in the gas-
processing problem is shown in Fig. 15.4. The unshaded cells are those containing
numeric and labeling data. The shaded cells involve quantities that are calculated based
on other cells. Recognize that the cell to be maximized is D12, which contains the total
profit. The cells to be varied are B4:C4, which hold the amounts of regular and premium
gas produced.
FIGURE 15.4
Excel spreadsheet set up to use the Solver for linear programming.
Once the spreadsheet is created, Solver is chosen from the Data tab (recall Sec. 7.7.1).
At this point a dialogue box will be displayed, querying you for pertinent information.
The pertinent cells of the Solver dialogue box are filled out as
The constraints must be added one by one by selecting the “Add” button. This will
open up a dialogue box that looks like
As shown, the constraint that the total raw gas (cell D6) must be less than or equal
to the available supply (E6) can be added as shown. After adding each constraint, the
“Add” button can be selected. When all four constraints have been entered, the OK but-
ton is selected to return to the Solver dialogue box.
Now, before execution, the Solver options button should be selected and the box
labeled “Assume linear model” should be checked. This will make Excel employ a
version of the simplex algorithm (rather than the more general nonlinear solver it usually
uses) that will speed up your application.
After selecting this option, return to the Solver menu. When the OK button is se-
lected, a dialogue box will open with a report on the success of the operation. For the
present case, the Solver obtains the correct solution (Fig. 15.5).
Beyond obtaining the solution, the Solver also provides some useful summary reports.
We will explore these in the engineering application described in Sec. 16.2.
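The Solver result can be cross-checked outside Excel. The following is a minimal Python sketch, not the Solver's algorithm: it enumerates the corner points of the feasible region and keeps the most profitable one, which works for a two-variable problem because an optimum of a linear program lies at a vertex. The coefficients are assumed to be those of the gas-processing formulation given earlier in the chapter (Example 15.1) and should be checked against that example; the names `constraints`, `intersection`, and `feasible` are illustrative only.

```python
from itertools import combinations

# Assumed gas-processing data: maximize Z = 150*x1 + 175*x2
profit = (150.0, 175.0)

# Each row (a1, a2, b) means a1*x1 + a2*x2 <= b.
constraints = [
    (7.0, 11.0, 77.0),    # raw gas supply
    (10.0, 8.0, 80.0),    # production time
    (1.0, 0.0, 9.0),      # regular storage
    (0.0, 1.0, 6.0),      # premium storage
    (-1.0, 0.0, 0.0),     # x1 >= 0
    (0.0, -1.0, 0.0),     # x2 >= 0
]

def intersection(c1, c2):
    """Point where two constraint boundaries cross (None if parallel)."""
    a1, b1, r1 = c1
    a2, b2, r2 = c2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        return None
    return ((r1 * b2 - r2 * b1) / det, (a1 * r2 - a2 * r1) / det)

def feasible(point, tol=1e-9):
    """True if the point satisfies every constraint."""
    return all(a * point[0] + b * point[1] <= r + tol for a, b, r in constraints)

# Collect the feasible corner points.
vertices = []
for c1, c2 in combinations(constraints, 2):
    point = intersection(c1, c2)
    if point is not None and feasible(point):
        vertices.append(point)

def total_profit(point):
    return profit[0] * point[0] + profit[1] * point[1]

best = max(vertices, key=total_profit)
best_profit = total_profit(best)
print(best, best_profit)
```

Vertex enumeration grows combinatorially with the number of constraints, so it is only a checking tool for small problems; the simplex method visits vertices selectively.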
15.3.2 Excel for Nonlinear Optimization
The manner in which Solver is used for nonlinear optimization is similar to our previous
applications in that these data are entered into spreadsheet cells. Once again, the basic strategy
is to arrive at a single cell that is to be optimized as a function of variations of other cells on
the spreadsheet. The following example illustrates how this can be done for the parachutist
problem we set up in the introduction to this part of the book (recall Example PT4.1).
EXAMPLE 15.4 Using Excel's Solver for Nonlinear Constrained Optimization
Problem Statement. Recall from Example PT4.1 that we developed a nonlinear con-
strained optimization to minimize the cost for a parachute drop into a refugee camp.
Parameters for this problem are
Parameter                     Symbol   Value   Unit
Total mass                    Mt       2000    kg
Acceleration of gravity       g        9.8     m/s^2
Cost coefficient (constant)   c0       200     $
Cost coefficient (length)     c1       56      $/m
Cost coefficient (area)       c2       0.1     $/m^2
Critical impact velocity      vc       20      m/s
Area effect on drag           kc       3       kg/(s·m^2)
Initial drop height           z0       500     m
FIGURE 15.5
Excel spreadsheet showing solution to linear programming problem.
Substituting these values into Eqs. (PT4.11) through (PT4.19) gives
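\[ \text{Minimize } C = n(200 + 56\ell + 0.1A^2) \]
subject to
\[ v \le 20 \]
\[ n \ge 1 \]
where n is an integer and all other variables are real. In addition, the following quantities
are defined as
\[ A = 2\pi r^2 \]
\[ \ell = \sqrt{2}\,r \]
\[ c = 3A \]
\[ m = \frac{M_t}{n} \qquad \text{(E15.4.1)} \]
\[ t = \operatorname{root}\left[ 500 - \frac{9.8m}{c}\,t + \frac{9.8m^2}{c^2}\left(1 - e^{-(c/m)t}\right) \right] \qquad \text{(E15.4.2)} \]
\[ v = \frac{9.8m}{c}\left(1 - e^{-(c/m)t}\right) \]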
Use Excel to solve this problem for the design variables r and n that minimize cost C.
Solution. Before implementation of this problem on Excel, we must first deal with the
problem of determining the root in the above formulation [Eq. (E15.4.2)]. One method might
be to develop a macro to implement a root-location method such as bisection or the secant
method. (Note that we will illustrate how this is done in the next chapter in Sec. 16.3.)
For the time being, an easier approach is possible by developing the following fixed-
point iteration solution to Eq. (E15.4.2),
\[ t_{i+1} = \left[ 500 + \frac{9.8m^2}{c^2}\left(1 - e^{-(c/m)t_i}\right) \right] \frac{c}{9.8m} \qquad \text{(E15.4.3)} \]
Thus, t can be adjusted until Eq. (E15.4.3) is satisfied. It can be shown that for the range
of parameters used in the present problem, this formula always converges.
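The iteration of Eq. (E15.4.3) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the spreadsheet's own procedure: the trial design values r = 2.944 m and n = 6 are the ones reported as the solution of this example, and the starting guess of 10 s and the stopping tolerance are arbitrary choices.

```python
import math

g, z0 = 9.8, 500.0            # gravity (m/s^2) and drop height (m)
r, n = 2.944, 6               # trial design values (assumed for illustration)
m = 2000.0 / n                # mass per parcel, Eq. (E15.4.1)
c = 3.0 * (2.0 * math.pi * r**2)   # drag coefficient c = 3A with A = 2*pi*r^2

def rhs(t):
    """Right-hand side of Eq. (E15.4.3)."""
    return (z0 + g * m**2 / c**2 * (1.0 - math.exp(-(c / m) * t))) * c / (g * m)

t = 10.0                      # arbitrary starting guess (s)
for _ in range(100):
    t_new = rhs(t)
    done = abs(t_new - t) < 1e-12
    t = t_new
    if done:
        break

# Residual of Eq. (E15.4.2) and the impact velocity at the converged time
residual = z0 - g * m / c * t + g * m**2 / c**2 * (1.0 - math.exp(-(c / m) * t))
v = g * m / c * (1.0 - math.exp(-(c / m) * t))
print(t, residual, v)
```

The derivative of the right-hand side with respect to t is e^{-(c/m)t}, which is less than 1 for t > 0, so the iteration contracts; this is consistent with the convergence claim above.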
Now, how can this equation be solved on a spreadsheet? As shown below, two cells
can be set up to hold a value for t and for the right-hand side of Eq. (E15.4.3) [that is, f(t)].
You can type Eq. (E15.4.3) into cell B21 so that it gets its time value from cell B20 and
the other parameter values from cells elsewhere on the sheet (see below for how we set
up the whole sheet). Then go to cell B20 and point its value to cell B21.
Once you enter these formulations, you will immediately get the error message:
“Cannot resolve circular references” because B20 depends on B21 and vice versa. Now,
go to the Tools/Options selections from the menu and select Calculation. In the
Calculation dialogue box, check “Iteration” and hit “OK.” The spreadsheet
will immediately iterate these cells, and the result will come out as
FIGURE 15.6
Excel spreadsheet set up for the nonlinear parachute optimization problem.
Thus, the cells will converge on the root. If you want to make it more precise, just strike
the F9 key to make it iterate some more (the default is 100 iterations, which you can
change if you wish).
An Excel worksheet to calculate the pertinent values can then be set up as shown
in Fig. 15.6. The unshaded cells are those containing numeric and labeling data. The
shaded cells involve quantities that are calculated based on other cells. For example, the
mass in B17 was computed with Eq. (E15.4.1) based on the values for Mt (B4) and n
(E5). Note also that some cells are redundant. For example, cell E11 points back to cell
E5. The information is repeated in cell E11 so that the structure of the constraints is
evident from the sheet. Finally, recognize that the cell to be minimized is E15, which
contains the total cost. The cells to be varied are E4:E5, which hold the radius and the
number of parachutes.
Once the spreadsheet is created, the selection Solver is chosen from the Data tab.
At this point a dialogue box will be displayed, querying you for pertinent information.
The pertinent cells of the Solver dialogue box would be filled out as
The constraints must be added one by one by selecting the “Add” button. This will
open up a dialogue box that looks like
As shown, the constraint that the actual impact velocity (cell E10) must be less than
or equal to the required velocity (G10) can be added as shown. After adding each
constraint, the “Add” button can be selected. Note that the down arrow allows you to choose
among several types of constraints (<=, >=, =, and integer). Thus, we can force the
number of parachutes (E5) to be an integer.
When all three constraints have been entered, the “OK” button is selected to return
to the Solver dialogue box. After selecting this option return to the Solver menu. When
the “OK” button is selected, a dialogue box will open with a report on the success of
the operation. For the present case, the Solver obtains the correct solution as in Fig. 15.7.
Thus, we determine that the minimum cost of $4377.26 will occur if we break the
load up into six parcels with a chute radius of 2.944 m. Beyond obtaining the solution,
the Solver also provides some useful summary reports. We will explore these in the
engineering application described in Sec. 16.2.
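The reported optimum can be checked independently with a short Python sketch. It does not reproduce the Solver's algorithm. Instead it relies on two observations that are assumptions of this sketch: for a fixed n, cost increases with r while impact velocity decreases with r, so the cheapest feasible radius is the one where v just reaches 20 m/s. The code finds that radius by bisection for n = 1 to 10 and keeps the cheapest combination. The bracket of 1 to 10 m and the range of n are illustrative choices.

```python
import math

G, MT, Z0 = 9.8, 2000.0, 500.0          # gravity, total mass, drop height
C0, C1, C2, KC, VC = 200.0, 56.0, 0.1, 3.0, 20.0

def impact_velocity(r, n, tol=1e-10, max_iter=10000):
    """Impact velocity; fall time from the fixed-point form of Eq. (E15.4.3)."""
    tau = (MT / n) / (KC * 2.0 * math.pi * r**2)    # m/c
    t = Z0 / (G * tau)                              # guess: fall at terminal velocity
    for _ in range(max_iter):
        t_new = (Z0 + G * tau**2 * (1.0 - math.exp(-t / tau))) / (G * tau)
        done = abs(t_new - t) < tol
        t = t_new
        if done:
            break
    return G * tau * (1.0 - math.exp(-t / tau))

def cost(r, n):
    """Total cost C = n(c0 + c1*l + c2*A^2) with l = sqrt(2)*r and A = 2*pi*r^2."""
    area = 2.0 * math.pi * r**2
    return n * (C0 + C1 * math.sqrt(2.0) * r + C2 * area**2)

def min_radius(n, lo=1.0, hi=10.0):
    """Smallest radius in [lo, hi] that keeps the impact velocity at or below VC."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if impact_velocity(mid, n) > VC:
            lo = mid
        else:
            hi = mid
    return hi

best_cost, best_n, best_r = float("inf"), None, None
for n in range(1, 11):
    r = min_radius(n)
    c_total = cost(r, n)
    if c_total < best_cost:
        best_cost, best_n, best_r = c_total, n, r

print(best_n, best_r, best_cost)
```

Treating the integer variable by enumeration is practical here because n takes only a few plausible values; for larger integer problems a dedicated mixed-integer method would be needed.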
FIGURE 15.7
Excel spreadsheet showing the solution for the nonlinear parachute optimization problem.
15.3.3 MATLAB
As summarized in Table 15.1, MATLAB software has a variety of built-in functions to
perform optimization. The following examples illustrate how they can be used.
TABLE 15.1 MATLAB functions to implement optimization.
Function Description
fminbnd Minimize function of one variable with bound constraints
fminsearch Minimize function of several variables
EXAMPLE 15.5 Using MATLAB for One-Dimensional Optimization
Problem Statement. Use the MATLAB fminbnd function to find the maximum of
\[ f(x) = 2\sin x - \frac{x^2}{10} \]
within the interval xl = 0 and xu = 4. Recall that in Chap. 13, we used several methods
to solve this problem for x = 1.4276 and f(x) = 1.7757.
Solution. First, we must create an M-file to hold the function.
function f=fx(x)
f = -(2*sin(x)-x^2/10)
Because we are interested in maximization, we enter the negative of the function. Then,
we invoke the fminbnd function with
>> x=fminbnd('fx',0,4)
The result is
f =
-1.7757
x =
1.4275
Note that additional arguments can be included. One useful addition is to set optimiza-
tion options such as error tolerance or maximum iterations. This is done with the optimset
function, which was used previously in Example 7.6 and has the general format,
optimset('param1',value1,'param2',value2,...)
where parami is a parameter specifying the type of option and valuei is the value
assigned to that option. For example, if you wanted to set the tolerance at 1 × 10^-2,
optimset('TolX',1e-2)
Thus, solving the present problem to a tolerance of 1 × 10^-2 can be generated with
>> fminbnd('fx',0,4,optimset('TolX',1e-2))
with the result
f =
-1.7757
ans =
1.4270
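For comparison, the same one-dimensional maximization can be sketched in Python with the golden-section search from Chap. 13. This is not how fminbnd is implemented internally; it is a stand-in that uses only function values on a bracketing interval. The function name `golden_max` and the tolerance are illustrative, and the tolerance plays a role similar to the TolX option above.

```python
import math

def f(x):
    return 2.0 * math.sin(x) - x**2 / 10.0

def golden_max(func, xl, xu, tol=1e-6):
    """Golden-section search for the maximum of a unimodal function on [xl, xu]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    x1 = xl + ratio * (xu - xl)          # upper interior point
    x2 = xu - ratio * (xu - xl)          # lower interior point
    f1, f2 = func(x1), func(x2)
    while xu - xl > tol:
        if f1 > f2:
            # Maximum lies in [x2, xu]; the old x1 becomes the new x2.
            xl = x2
            x2, f2 = x1, f1
            x1 = xl + ratio * (xu - xl)
            f1 = func(x1)
        else:
            # Maximum lies in [xl, x1]; the old x2 becomes the new x1.
            xu = x1
            x1, f1 = x2, f2
            x2 = xu - ratio * (xu - xl)
            f2 = func(x2)
    x_opt = 0.5 * (xl + xu)
    return x_opt, func(x_opt)

x_opt, f_opt = golden_max(f, 0.0, 4.0)
print(x_opt, f_opt)
```

Because the search maximizes directly, there is no need to negate the function as was done for fminbnd.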
A complete set of parameters can be found by invoking Help as in
>> help optimset
MATLAB has a variety of capabilities for dealing with multidimensional functions.
Recall from Chap. 13 that our visual image of a one-dimensional search was like a roller
coaster. For two-dimensional cases, the image becomes that of mountains and valleys.
As in the following example, MATLAB’s graphic capabilities provide a handy means to
visualize such functions.
EXAMPLE 15.6 Visualizing a Two-Dimensional Function
Problem Statement. Use MATLAB's graphical capabilities to display the following
function and visually estimate its minimum in the range -2 <= x1 <= 0 and 0 <= x2 <= 3:
\[ f(x_1, x_2) = 2 + x_1 - x_2 + 2x_1^2 + 2x_1x_2 + x_2^2 \]
Solution. The following script generates contour and mesh plots of the function:
x=linspace(-2,0,40);y=linspace(0,3,40);
[X,Y] = meshgrid(x,y);
Z=2+X-Y+2*X.^2+2*X.*Y+Y.^2;
subplot(1,2,1);
cs=contour(X,Y,Z);clabel(cs);
xlabel('x_1');ylabel('x_2');
title('(a) Contour plot');grid;
subplot(1,2,2);
cs=surfc(X,Y,Z);
zmin=floor(min(Z(:)));   % Z(:) gives scalar extremes over the whole grid
zmax=ceil(max(Z(:)));
xlabel('x_1');ylabel('x_2');zlabel('f(x_1,x_2)');
title('(b) Mesh plot');
As displayed in Fig. 15.8, both plots indicate that the function has a minimum value of about
f(x1, x2) = 0 to 1 located at about x1 = -1 and x2 = 1.5.
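The visual estimate can be backed up numerically by evaluating the function on a grid and picking the smallest value. The Python sketch below assumes 41 points per axis (rather than the 40 used in the script above) so that x1 = -1 and x2 = 1.5 fall exactly on grid nodes; that choice is for illustration only.

```python
def f(x1, x2):
    return 2 + x1 - x2 + 2 * x1**2 + 2 * x1 * x2 + x2**2

n = 41                                             # grid points per axis (assumed)
x1_grid = [-2 + 2 * i / (n - 1) for i in range(n)]     # -2 <= x1 <= 0
x2_grid = [3 * j / (n - 1) for j in range(n)]          #  0 <= x2 <= 3

# Smallest function value on the grid and where it occurs
f_min, x1_min, x2_min = min((f(a, b), a, b) for a in x1_grid for b in x2_grid)
print(f_min, x1_min, x2_min)
```

A grid search only locates the minimum to within the grid spacing, which is why a dedicated optimizer is normally used after the plot has supplied a starting point.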
Standard MATLAB has a function fminsearch that can be used to determine the
minimum of a multidimensional function. It is based on the Nelder-Mead method, which
is a direct-search method that uses only function values (does not require derivatives)
and handles nonsmooth objective functions. A simple expression of its syntax is
[xmin, fval] = fminsearch(function,x0)
where xmin and fval are the location and value of the minimum, function is the
name of the function being evaluated, and x0 is a vector of initial guesses from which
the search starts. Unlike fminbnd, fminsearch is unconstrained and does not take bounds.
FIGURE 15.8
(a) Contour and (b) mesh plots of a two-dimensional function.
EXAMPLE 15.7 Using MATLAB for Multidimensional Optimization
Problem Statement. Use the MATLAB fminsearch function to find the minimum
for the simple function we just graphed in Example 15.6.
\[ f(x_1, x_2) = 2 + x_1 - x_2 + 2x_1^2 + 2x_1x_2 + x_2^2 \]
Employ initial guesses of x = -0.5 and y = 0.5.
Solution. We can invoke the fminsearch function with
>> f=@(x) 2+x(1)-x(2)+2*x(1)^2+2*x(1)*x(2)+x(2)^2;
>> [x,fval]=fminsearch(f,[-0.5,0.5])
x =
-1.0000 1.5000
fval =
0.7500
Just as with fminbnd, arguments can be included in order to specify additional
parameters of the optimization process. For example, the optimset function can be used to
limit the maximum number of iterations
>> [x,fval]=fminsearch(f,[-0.5,0.5],optimset('MaxIter',2))
with the result
Exiting: Maximum number of iterations has been exceeded
- increase MaxIter option.
Current function value: 1.225625
x =
-0.5000 0.5250
fval =
1.2256
Thus, because we have set a very stringent limit on the iterations, the optimization
terminates well before the minimum is reached.
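To show what a derivative-free simplex search involves, the following Python sketch implements a basic Nelder-Mead iteration (reflection, expansion, contraction, shrink) and applies it to the same function and initial guess. It is a simplified textbook variant written for illustration; the initial simplex size, coefficients, and stopping rule are assumptions and differ from MATLAB's fminsearch, so iteration counts will not match.

```python
def f(x):
    return 2 + x[0] - x[1] + 2 * x[0]**2 + 2 * x[0] * x[1] + x[1]**2

def nelder_mead(func, x0, step=0.1, tol=1e-12, max_iter=2000):
    """Basic Nelder-Mead simplex search; returns (x_min, f_min)."""
    n = len(x0)
    simplex = [list(x0)]
    for i in range(n):                       # initial simplex: perturb each coordinate
        vertex = list(x0)
        vertex[i] += step
        simplex.append(vertex)
    for _ in range(max_iter):
        simplex.sort(key=func)
        best, second_worst, worst = simplex[0], simplex[-2], simplex[-1]
        size = max(abs(a - b) for v in simplex[1:] for a, b in zip(v, best))
        if func(worst) - func(best) < tol and size < 1e-6:
            break
        centroid = [sum(v[i] for v in simplex[:-1]) / n for i in range(n)]

        def move(coef):
            # Point on the line through the worst vertex and the centroid
            return [c + coef * (c - w) for c, w in zip(centroid, worst)]

        reflected = move(1.0)
        if func(reflected) < func(best):
            expanded = move(2.0)
            simplex[-1] = expanded if func(expanded) < func(reflected) else reflected
        elif func(reflected) < func(second_worst):
            simplex[-1] = reflected
        else:
            # Outside contraction if the reflection improved on the worst, else inside
            contracted = move(0.5) if func(reflected) < func(worst) else move(-0.5)
            if func(contracted) < min(func(reflected), func(worst)):
                simplex[-1] = contracted
            else:
                # Shrink every vertex halfway toward the best one
                simplex = [best] + [[b + 0.5 * (a - b) for a, b in zip(v, best)]
                                    for v in simplex[1:]]
    simplex.sort(key=func)
    return simplex[0], func(simplex[0])

x_min, f_min = nelder_mead(f, [-0.5, 0.5])
print(x_min, f_min)
```

Lowering `max_iter` to a very small number reproduces the behavior seen with the MaxIter option: the search stops near the starting point with a function value well above the minimum.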
15.3.4 Mathcad
Mathcad contains a numeric mode function called Find that can be used to solve up to
50 simultaneous nonlinear algebraic equations with inequality constraints. The use of
this function for unconstrained applications was described in Part Two. If Find fails to
locate a solution that satisfies the equations and constraints, it returns the error message
“did not find solution.” However, Mathcad also contains a similar function called Minerr.
This function gives solution results that minimize the errors in the constraints even when
exact solutions cannot be found. This function solves equations and accommodates sev-
eral constraints using the Levenberg-Marquardt method taken from the public-domain
MINPACK algorithms developed and published by the Argonne National Laboratory.
Let's develop an example where Find is used to solve a system of nonlinear equations
with constraints. Initial guesses of x = -1 and y = 1 are input using the definition
symbol as shown in Fig. 15.9. The word Given then alerts Mathcad that what follows
is a system of equations. Then we can enter the equations and the inequality constraint.
Note that for this application, Mathcad requires the use of a symbolic equal sign (typed
as [Ctrl]=) and > to separate the left and right sides of an equation. Now the vector
consisting of xval and yval is computed using Find (x,y) and the values are shown using
an equal sign.
A graph that displays the equations and constraints as well as the solution can be
placed on the worksheet by clicking to the desired location. This places a red crosshair
at that location. Then use the Insert/Graph/X-Y Plot pull-down menu to place an empty
plot on the worksheet with placeholders for the expressions to be graphed and for the
ranges of the x and y axes. Four variables are plotted on the y axis as shown: the top
and bottom halves of the equation for the circle, the linear function, and a vertical line
to represent the x > 2 constraint. In addition, the solution is included as a point. Once
the graph has been created, you can use the Format/Graph/X-Y Plot pull-down menu to
vary the type of graph; change the color, type, and weight of the trace of the function;
and add titles, labels, and other features. The graph and the numerical values for xval
and yval nicely portray the solution as the intersection of the circle and the line in the
region where x > 2.
FIGURE 15.9
Mathcad screen for a nonlinear constrained optimization problem.
PROBLEMS
15.1 A company makes two types of products, A and B. These
products are produced during a 40-hr work week and then shipped
out at the end of the week. They require 20 and 5 kg of raw material
per kg of product, respectively, and the company has access to 9500 kg
of raw material per week. Only one product can be created at a time
with production times for each of 0.04 and 0.12 hr, respectively.
The plant can only store 550 kg of total product per week. Finally,
the company makes profits of $45 and $20 on each unit of A and B,
respectively. Each unit of product is equivalent to a kg.
(a) Set up the linear programming problem to maximize profit.
(b) Solve the linear programming problem graphically.
(c) Solve the linear programming problem with the simplex method.
(d) Solve the problem with a software package.
(e) Evaluate which of the following options will raise profits the
most: increasing raw material, storage, or production time.
15.2 Suppose that for Example 15.1, the gas-processing plant
decides to produce a third grade of product with the following
characteristics:
                   Supreme
Raw gas            15 m^3/tonne
Production time    12 hr/tonne
Storage            5 tonnes
Profit             $250/tonne
In addition, suppose that a new source of raw gas has been discovered
so that the total available is doubled to 154 m^3/week.
(a) Set up the linear programming problem to maximize profit.
(b) Solve the linear programming problem with the simplex method.
(c) Solve the problem with a software package.
(d) Evaluate which of the following options will raise profits the
most: increasing raw material, storage, or production time.
15.3 Consider the linear programming problem:
Maximize \( f(x, y) = 1.75x + 1.25y \)
subject to
\( 1.2x + 2.25y \le 14 \)
\( x + 1.1y \le 8 \)
\( 2.5x + y \le 9 \)
\( x \ge 0 \)
\( y \ge 0 \)
Obtain the solution:
(a) Graphically.
(b) Using the simplex method.
(c) Using an appropriate software package (for example, Excel,
MATLAB, or Mathcad).
15.4 Consider the linear programming problem:
Maximize \( f(x, y) = 6x + 8y \)
subject to
\( 5x + 2y \le 40 \)
\( 6x + 6y \le 60 \)
\( 2x + 4y \le 32 \)
\( x \ge 0 \)
\( y \ge 0 \)
Obtain the solution:
(a) Graphically.
(b) Using the simplex method.
(c) Using an appropriate software package (for example, Excel).
15.5 Use a software package (for example, Excel, MATLAB,
Mathcad) to solve the following constrained nonlinear optimization
problem:
Maximize \( f(x, y) = 1.2x + 2y - y^3 \)
subject to
\( 2x + y \le 2 \)
\( x \ge 0 \)
\( y \ge 0 \)
15.6 Use a software package (for example, Excel, MATLAB,
Mathcad) to solve the following constrained nonlinear optimization
problem:
Maximize \( f(x, y) = 15x + 15y \)
subject to
\( x^2 + y^2 \le 1 \)
\( x + 2y \le 2.1 \)
\( x \ge 0 \)
\( y \ge 0 \)
15.7 Consider the following constrained nonlinear optimization
problem:
Minimize \( f(x, y) = (x - 3)^2 + (y - 3)^2 \)
subject to
\( x + 2y = 4 \)
(a) Use a graphical approach to estimate the solution.
(b) Use a software package (for example, Excel) to obtain a more
accurate estimate.
15.8 Use a software package to determine the maximum of
\( f(x, y) = 2.25xy + 1.75y - 1.5x^2 - 2y^2 \)
15.9 Use a software package to determine the maximum of
\( f(x, y) = 4x + 2y + x^2 - 2x^4 + 2xy - 3y^2 \)
15.10 Given the following function,
\( f(x, y) = -8x + x^2 + 12y + 4y^2 - 2xy \)
use a software package to determine the minimum:
(a) Graphically.
(b) Numerically.
(c) Substitute the result of (b) back into the function to determine
the minimum f(x, y).
(d) Determine the Hessian and its determinant, and substitute the
result of part (b) back into the latter to verify that a minimum
has been detected.
15.11 You are asked to design a covered conical pit to store 50 m^3
of waste liquid. Assume excavation costs at $100/m^3, side lining
costs at $50/m^2, and cover cost at $25/m^2. Determine the dimensions
of the pit that minimize cost (a) if the side slope is unconstrained
and (b) if the side slope must be less than 45°.
15.12 An automobile company has two versions of the same model
car for sale, a two-door coupe and the full-size four door.
(a) Graphically solve how many cars of each design should be
produced to maximize profit and what that profit is.
(b) Solve the same problem with Excel.
                   Two Door      Four Door     Availability
Profit             $13,500/car   $15,000/car
Production time    15 h/car      20 h/car      8000 h/year
Storage            400 cars      350 cars
Consumer demand    700/car       500/car       240,000 cars
15.13 Og is the leader of the surprisingly mathematically advanced,
though technologically run-of-the-mill, Calm Waters caveman
tribe. He must decide on the number of stone clubs and stone
axes to be produced for the upcoming battle against the neighboring
Peaceful Sunset tribe. Experience has taught him that each club is
good for, on the average, 0.45 kills and 0.65 maims, while each axe
produces 0.70 kills and 0.35 maims. Production of a club requires
5.1 lb of stone and 2.1 man-hours of labor while an axe requires 3.2 lb
of stone and 4.3 man-hours of labor. Og’s tribe has 240 lb of stone
available for weapons production, and a total of 200 man-hours of
labor available before the expected time of this battle (that Og is sure
will end war for all time). Og values a kill as worth two maims in
quantifying the damage inflicted on the enemy, and he wishes to
produce that mix of weapons that will maximize damage.
(a) Formulate this as a linear programming problem. Make sure to
define your decision variables.
(b) Represent this problem graphically, making sure to identify all
the feasible corner points and the infeasible corner points.
(c) Solve the problem graphically.
(d) Solve the problem using the computer.
15.14 Develop an M-file that is expressly designed to locate a
maximum with the golden-section search algorithm. In other
words, set it up so that it directly finds the maximum rather than
finding the minimum of -f(x). Test your program with the same
problem as Example 13.1. The function should have the following
features:
• Iterate until the relative error falls below a stopping criterion or
exceeds a maximum number of iterations.
• Return both the optimal x and f(x).
15.15 Develop an M-file to locate a minimum with the golden-
section search. Rather than using the standard stopping criteria (as
in Fig. 13.5), determine the number of iterations needed to attain a
desired tolerance.
15.16 Develop an M-file to implement parabolic interpolation to
locate a minimum. Test your program with the same problem as
Example 13.2. The function should have the following features:
• Base it on two initial guesses, and have the program generate the
third initial value at the midpoint of the interval.
• Check whether the guesses bracket a maximum. If not, the function
should not implement the algorithm, but should return an
error message.
• Iterate until the relative error falls below a stopping criterion or
exceeds a maximum number of iterations.
• Return both the optimal x and f(x).
• Use a bracketing approach (as in Example 13.2) to replace old
values with new values.
15.17 The length of the longest ladder that can negotiate the corner
depicted in Fig. P15.17 can be determined by computing the value
of θ that minimizes the following function:
\[ L(\theta) = \frac{w_1}{\sin\theta} + \frac{w_2}{\sin(\pi - \alpha - \theta)} \]
For the case where w1 = w2 = 2 m, use a numerical method (including
software) to develop a plot of L versus a range of α’s from
45° to 135°.
FIGURE P15.17
A ladder negotiating a corner formed by two hallways.
CHAPTER 16
Case Studies: Optimization
The purpose of this chapter is to use the numerical procedures discussed in Chaps. 13
through 15 to solve actual engineering problems involving optimization. These prob-
lems are important because engineers are often called upon to come up with the “best”
solution to a problem. Because many of these cases involve complex systems and
interactions, numerical methods and computers are often a necessity for developing
optimal solutions.
The following applications are typical of those that are routinely encountered during
upper-class and graduate studies. Furthermore, they are representative of problems you
will address professionally. The problems are drawn from the major discipline areas of
engineering: chemical/bio, civil/environmental, electrical, and mechanical/aerospace.
The first application, taken from chemical/bio engineering, deals with using nonlin-
ear constrained optimization to design an optimal cylindrical tank. The Excel Solver is
used to develop the solution.
Next, we use linear programming to assess a problem from civil/environmental en-
gineering: minimizing the cost of waste treatment to meet water-quality objectives in a
river. In this example, we introduce the notion of shadow prices and their use in assess-
ing the sensitivity of a linear programming solution.
The third application, taken from electrical engineering, involves maximizing the
power across a potentiometer in an electric circuit. The solution involves one-dimensional
unconstrained optimization. Aside from solving the problem, we illustrate how the Visual
Basic macro language allows access to the golden-section search algorithm within the
context of the Excel environment.
Finally, the fourth application, taken from mechanical/aerospace engineering,
involves determining the equilibrium position of a multi-spring system based on the
minimum potential energy.
16.1 LEAST-COST DESIGN OF A TANK
(CHEMICAL/BIO ENGINEERING)
Background. Chemical engineers (as well as other specialists such as mechanical and
civil engineers) often encounter the general problem of designing containers to transport
liquids and gases. Suppose that you are asked to determine the dimensions of a small
cylindrical tank to transport toxic waste that is to be mounted on the back of a pickup
truck. Your overall objective will be to minimize the cost of the tank. However, aside
from cost, you must ensure that it holds the required amount of liquid and that it does
not exceed the dimensions of the truck’s bed. Note that because the tank will be carrying
toxic waste, the tank thickness is specified by regulations.
A schematic of the tank and bed is shown in Fig. 16.1. As can be seen, the tank
consists of a cylinder with a plate welded on each end.
The cost of the tank involves two components: (1) material expense, which is based
on weight, and (2) welding expense based on length of weld. Note that the latter involves
welding both the interior and the exterior seams where the plates connect with the
cylinder. The data needed for the problem are summarized in Table 16.1.
Solution. The objective here is to construct a tank for a minimum cost. The cost is
related to the design variables (length and diameter) as they affect the mass of the tank
and the welding lengths. Further, the problem is constrained because the tank must
(1) fit within the truck bed and (2) carry the required volume of material.
FIGURE 16.1
Parameters for determining the optimal dimensions of a cylindrical tank.
TABLE 16.1 Parameters for determining the optimal dimensions of a cylindrical tank used
to transport toxic wastes.
Parameter         Symbol   Value   Units
Required volume   Vo       0.8     m^3
Thickness         t        3       cm
Density           ρ        8000    kg/m^3
Bed length        Lmax     2       m
Bed width         Dmax     1       m
Material cost     cm       4.5     $/kg
Welding cost      cw       20      $/m
The cost consists of tank material and welding costs. Therefore, the objective
function can be formulated as minimizing
\[ C = c_m m + c_w \ell_w \qquad (16.1) \]
where \(C\) = cost ($), \(m\) = mass (kg), \(\ell_w\) = weld length (m), and \(c_m\) and \(c_w\) = cost factors
for mass ($/kg) and weld length ($/m), respectively.
Next, we will formulate how the mass and weld lengths are related to the dimensions
of the drum. First, the mass can be calculated as the volume of material times its density.
The volume of the material used to create the side walls (that is, the cylinder) can be
computed as
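\[ V_{\text{cylinder}} = L\pi\left[\left(\frac{D}{2} + t\right)^2 - \left(\frac{D}{2}\right)^2\right] \]
For each circular end plate, it is
\[ V_{\text{plate}} = \pi\left(\frac{D}{2} + t\right)^2 t \]
Thus, the mass is computed by
\[ m = \rho\left\{ L\pi\left[\left(\frac{D}{2} + t\right)^2 - \left(\frac{D}{2}\right)^2\right] + 2\pi\left(\frac{D}{2} + t\right)^2 t \right\} \qquad (16.2) \]
where \(\rho\) = density (kg/m^3).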
The weld length for attaching each plate is equal to the cylinder’s inside and outside
circumference. For the two plates, the total weld length would be
\[ \ell_w = 2\left[ 2\pi\left(\frac{D}{2} + t\right) + 2\pi\frac{D}{2} \right] = 4\pi(D + t) \qquad (16.3) \]
Given values for D and L (remember, thickness t is fixed by regulations), Eqs. (16.1) through
(16.3) provide a means to compute cost. Also recognize that when Eqs. (16.2) and (16.3)
are substituted into Eq. (16.1), the resulting objective function is nonlinear in the unknowns.
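Equations (16.1) through (16.3) translate directly into a small cost function. The Python sketch below uses the parameter values from Table 16.1 (with the thickness converted to meters) and evaluates the cost for a tank that fills the truck bed, D = 1 m and L = 2 m; the function name `tank_cost` is illustrative.

```python
import math

RHO = 8000.0     # density (kg/m^3)
T = 0.03         # wall thickness (m), 3 cm from Table 16.1
CM = 4.5         # material cost ($/kg)
CW = 20.0        # welding cost ($/m)

def tank_cost(D, L):
    """Cost from Eq. (16.1), with mass from Eq. (16.2) and weld length from Eq. (16.3)."""
    outer_sq = (D / 2.0 + T)**2
    v_cylinder = L * math.pi * (outer_sq - (D / 2.0)**2)
    v_plates = 2.0 * math.pi * outer_sq * T
    mass = RHO * (v_cylinder + v_plates)
    weld_length = 4.0 * math.pi * (D + T)
    return CM * mass + CW * weld_length

cost_at_limits = tank_cost(1.0, 2.0)
print(cost_at_limits)
```

Expanding the bracket in the cylinder volume gives L·π·(D·t + t²), so the product of L and D makes the objective nonlinear in the unknowns, as noted above.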
Next, we can formulate the constraints. First, we must compute how much volume
can be held within the finished tank,
\[ V = \frac{\pi D^2}{4} L \]
This value must be equal to the desired volume. Thus, one constraint is
\[ \frac{\pi D^2 L}{4} = V_o \]
where \(V_o\) is the desired volume (m^3).
The remaining constraints deal with ensuring that the tank will fit within the dimen-
sions of the truck bed,
\[ L \le L_{\max} \]
\[ D \le D_{\max} \]
The problem is now specified. Substituting the values from Table 16.1, it can be
summarized as
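\[ \text{Minimize } C = 4.5m + 20\ell_w \]
subject to
\[ \frac{\pi D^2 L}{4} = 0.8 \]
\[ L \le 2 \]
\[ D \le 1 \]
where
\[ m = 8000\left\{ L\pi\left[\left(\frac{D}{2} + 0.03\right)^2 - \left(\frac{D}{2}\right)^2\right] + 2\pi\left(\frac{D}{2} + 0.03\right)^2 0.03 \right\} \]
and
\[ \ell_w = 4\pi(D + 0.03) \]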
The problem can now be solved in a number of ways. However, the simplest approach
for a problem of this magnitude is to use a tool like the Excel Solver. The spreadsheet to
accomplish this is shown in Fig. 16.2.
For the case shown, we enter the upper limits for D and L. For this case, the volume
is more than required (1.57 > 0.8).
FIGURE 16.2
Excel spreadsheet set up to
evaluate the cost of a tank
subject to a volume requirement
and size constraints.
Once the spreadsheet is created, the selection Solver is chosen from the Data tab.
At this point a dialogue box will be displayed, querying you for pertinent information.
The pertinent cells of the Solver dialogue box would be filled out as
When the OK button is selected, a dialogue box will open with a report on the success
of the operation. For the present case, the Solver obtains the correct solution, which is
shown in Fig. 16.3. Notice that the optimal diameter is nudging up against the constraint
of 1 m. Thus, if the required capacity of the tank were increased, we would run up against
this constraint and the problem would reduce to a one-dimensional search for length.
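The Solver result can be cross-checked with a one-dimensional search. The Python sketch below is an alternative to the Solver, not a description of it: it uses the volume constraint to eliminate L, so that cost depends on D alone, and then applies a golden-section search between the smallest diameter that keeps L within 2 m and the bed width of 1 m. It assumes the cost is unimodal in D on that interval; the helper names are illustrative.

```python
import math

RHO, T, CM, CW = 8000.0, 0.03, 4.5, 20.0    # Table 16.1, thickness in m
V0, L_MAX, D_MAX = 0.8, 2.0, 1.0

def length_for(D):
    """Length that satisfies the volume constraint pi*D^2*L/4 = V0."""
    return 4.0 * V0 / (math.pi * D**2)

def cost(D):
    """Tank cost, Eqs. (16.1)-(16.3), with L eliminated by the volume constraint."""
    L = length_for(D)
    outer_sq = (D / 2.0 + T)**2
    mass = RHO * (L * math.pi * (outer_sq - (D / 2.0)**2) + 2.0 * math.pi * outer_sq * T)
    return CM * mass + CW * 4.0 * math.pi * (D + T)

def golden_min(func, xl, xu, tol=1e-6):
    """Golden-section search for the minimum of a unimodal function on [xl, xu]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    x1 = xl + ratio * (xu - xl)
    x2 = xu - ratio * (xu - xl)
    f1, f2 = func(x1), func(x2)
    while xu - xl > tol:
        if f1 < f2:
            xl = x2                      # minimum lies in [x2, xu]
            x2, f2 = x1, f1
            x1 = xl + ratio * (xu - xl)
            f1 = func(x1)
        else:
            xu = x1                      # minimum lies in [xl, x1]
            x1, f1 = x2, f2
            x2 = xu - ratio * (xu - xl)
            f2 = func(x2)
    return 0.5 * (xl + xu)

D_lo = math.sqrt(4.0 * V0 / (math.pi * L_MAX))   # below this, L would exceed L_MAX
D_opt = golden_min(cost, D_lo, D_MAX)
L_opt = length_for(D_opt)
C_opt = cost(D_opt)
print(D_opt, L_opt, C_opt)
```

If the required volume were raised until the unconstrained optimum exceeded 1 m, the search would return the upper end of the interval, which is the situation described above where the diameter constraint becomes binding.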
FIGURE 16.3
Results of minimization. The
price is reduced from $9154 to
$5723 because of the smaller
volume using dimensions of
D = 0.98 m and L = 1.05 m.
16.2 LEAST-COST TREATMENT OF WASTEWATER
(CIVIL/ENVIRONMENTAL ENGINEERING)
Background. Wastewater discharges from big cities are often a major cause of river
pollution. Figure 16.4 illustrates the type of system that an environmental engineer might
confront. Several cities are located on a river and its tributary. Each generates pollution
at a loading rate P that has units of milligrams per day (mg/d). The pollution loading is
subject to waste treatment that results in a fractional removal x. Thus, the amount
discharged to the river is the excess not removed by treatment,
\[ W_i = (1 - x_i)P_i \qquad (16.4) \]
where \(W_i\) = waste discharge from the ith city.
When the waste discharge enters the stream, it mixes with pollution from upstream
sources. If complete mixing is assumed at the discharge point, the resulting concentration
at the discharge point can be calculated by a simple mass balance,
\[ c_i = \frac{W_i + Q_u c_u}{Q_i} \qquad (16.5) \]
where \(Q_u\) = flow (L/d), \(c_u\) = concentration (mg/L) in the river immediately upstream of
the discharge, and \(Q_i\) = flow downstream of the discharge point (L/d).
After the concentration at the mixing point is established, chemical and biological
decomposition processes can remove some of the pollution as it flows downstream. For
the present case, we will assume that this removal can be represented by a simple frac-
tional reduction factor R.
Assuming that the headwaters (that is, the river above cities 1 and 2) are pollution-
free, the concentrations at the four nodes can be computed as
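\[ c_1 = \frac{(1 - x_1)P_1}{Q_{13}} \]
\[ c_2 = \frac{(1 - x_2)P_2}{Q_{23}} \]
\[ c_3 = \frac{R_{13}Q_{13}c_1 + R_{23}Q_{23}c_2 + (1 - x_3)P_3}{Q_{34}} \qquad (16.6) \]
\[ c_4 = \frac{R_{34}Q_{34}c_3 + (1 - x_4)P_4}{Q_{45}} \]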
FIGURE 16.4
Four wastewater treatment
plants discharging pollution to a
river system. The river segments
between the cities are labeled
with circled numbers.
Next, it is recognized that the waste treatment costs a different amount, di ($1000/mg
removed), at each of the facilities. Thus, the total cost of treatment (on a daily basis) can
be calculated as
\[ Z = d_1P_1x_1 + d_2P_2x_2 + d_3P_3x_3 + d_4P_4x_4 \qquad (16.7) \]
where Z is total daily cost of treatment ($1000/d).
The final piece in the “decision puzzle” involves environmental regulations. To pro-
tect the beneficial uses of the river (for example, boating, fisheries, bathing), regulations
say that the river concentration must not exceed a water-quality standard of cs.
Parameters for the river system in Fig. 16.4 are summarized in Table 16.2. Notice
that there is a difference in treatment cost between the upstream (1 and 2) and the down-
stream cities (3 and 4) because of the outmoded nature of the downstream plants.
The concentration can be calculated with Eq. (16.6) and the result listed in the shaded
column for the case where no waste treatment is implemented (that is, all the x's = 0).
Notice that the standard of 20 mg/L is being violated at all mixing points.
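The zero-treatment concentrations can be reproduced with a short Python sketch of Eq. (16.6). The loadings, flows, and removal factors are the values listed in Table 16.2; the dictionary keys and the function name `concentrations` are illustrative.

```python
P = {1: 1.00e9, 2: 2.00e9, 3: 4.00e9, 4: 2.50e9}               # loadings (mg/d)
Q = {"13": 1.00e7, "23": 5.00e7, "34": 1.10e8, "45": 2.50e8}   # segment flows (L/d)
R = {"13": 0.5, "23": 0.35, "34": 0.6}                         # fraction remaining
STANDARD = 20.0                                                # cs (mg/L)

def concentrations(x):
    """Concentrations c1..c4 (mg/L) from Eq. (16.6) for removal fractions x1..x4."""
    x1, x2, x3, x4 = x
    c1 = (1 - x1) * P[1] / Q["13"]
    c2 = (1 - x2) * P[2] / Q["23"]
    c3 = (R["13"] * Q["13"] * c1 + R["23"] * Q["23"] * c2 + (1 - x3) * P[3]) / Q["34"]
    c4 = (R["34"] * Q["34"] * c3 + (1 - x4) * P[4]) / Q["45"]
    return c1, c2, c3, c4

c_no_treatment = concentrations((0.0, 0.0, 0.0, 0.0))
violations = [c > STANDARD for c in c_no_treatment]
print(c_no_treatment, violations)
```

Because each concentration is a linear function of the removal fractions, the same function can be used to spot-check any candidate treatment plan against the standards.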
Use linear programming to determine the treatment levels that meet the water-quality
standards for the minimum cost. Also, evaluate the impact of making the standard more
stringent below city 3. That is, redo the exercise, but with the standards for segments
3–4 and 4–5 lowered to 10 mg/L.
Solution. All the factors outlined above can be combined into the following linear
programming problem:
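$$\text{Minimize } Z = d_1P_1x_1 + d_2P_2x_2 + d_3P_3x_3 + d_4P_4x_4 \tag{16.8}$$
subject to the following constraints
$$\frac{(1 - x_1)P_1}{Q_{13}} \le c_{s1}$$
$$\frac{(1 - x_2)P_2}{Q_{23}} \le c_{s2}$$
$$\frac{R_{13}Q_{13}c_1 + R_{23}Q_{23}c_2 + (1 - x_3)P_3}{Q_{34}} \le c_{s3} \tag{16.9}$$
$$\frac{R_{34}Q_{34}c_3 + (1 - x_4)P_4}{Q_{45}} \le c_{s4}$$
$$0 \le x_1, x_2, x_3, x_4 \le 1 \tag{16.10}$$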
TABLE 16.2 Parameters for four wastewater treatment plants discharging pollution to a river system,
along with the resulting concentration (ci) for zero treatment. Flow, removal, and standards
for the river segments are also listed.
City   Pi (mg/d)     di ($10⁻⁶/mg)   ci (mg/L)   Segment   Q (L/d)       R      cs (mg/L)
1      1.00 × 10⁹    2               100         1–3       1.00 × 10⁷    0.5    20
2      2.00 × 10⁹    2               40          2–3       5.00 × 10⁷    0.35   20
3      4.00 × 10⁹    4               47.3        3–4       1.10 × 10⁸    0.6    20
4      2.50 × 10⁹    4               22.5        4–5       2.50 × 10⁸           20
Thus, the objective function is to minimize treatment cost [Eq. (16.8)] subject to the
constraint that water-quality standards must be met for all parts of the system [Eq. (16.9)].
In addition, treatment cannot be negative or greater than 100% removal [Eq. (16.10)].
The problem can be solved using a variety of packages. For the present application,
we use the Excel spreadsheet. As seen in Fig. 16.5, these data along with the concentra-
tion calculations can be set up nicely in the spreadsheet cells.
Once the spreadsheet is created, the selection Solver is chosen from the Data tab.
At this point a dialogue box will be displayed, querying you for pertinent information.
The pertinent cells of the Solver dialogue box would be filled out as
FIGURE 16.5
Excel spreadsheet set up to
evaluate the cost of waste
treatment on a regulated river
system. Column F contains the
calculation of concentration
according to Eq. (16.6). Cells
F4 and H4 are highlighted to
show the formulas used to
calculate c1 and treatment cost
for city 1. In addition,
highlighted cell H9 shows the
formula (Eq. 16.8) for total
cost that is to be minimized.
Notice that not all the constraints are shown, because the dialogue box displays only six
constraints at a time.
When the OK button is selected, a dialogue box will open with a report on the success of
the operation. For the present case, the Solver obtains the correct solution, which is shown
in Fig. 16.6. Before accepting the solution (by selecting the OK button on the Solver
Reports box), notice that three reports can be generated: Answer, Sensitivity, and Limits.
Select the Sensitivity Report and then hit the OK button to accept the solution. The Solver
will automatically generate a Sensitivity Report, as in Fig. 16.7.
Now let us examine the solution (Fig. 16.6). Notice that the standard will be met at
all the mixing points. In fact, the concentration at city 4 will actually be less than the
standard (15.28 mg/L versus 20 mg/L), even though no treatment would be required for city 4.
As a final exercise, we can lower the standards for reaches 3–4 and 4–5 to 10 mg/L.
Before doing this, we can examine the Sensitivity Report. For the present case, the key
column of Fig. 16.7 is the Lagrange Multiplier (aka the “shadow price”). The shadow
price is a value that expresses the sensitivity of the objective function (in our case, cost)
to a unit change of one of the constraints (water-quality standards). It therefore represents
the additional cost that will be incurred by making the standards more stringent. For our
example, it is revealing that the largest shadow price, −$440/Δcs3, occurs for one of the
standard changes (that is, downstream from city 3) that we are contemplating. This tips
us off that our modification will be costly.
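The size of this shadow price can be recovered by hand. The following is a hedged sketch rather than the Solver's own calculation: it assumes that x1 and x2 stay at their optimal values (so c1 and c2 are fixed), that only x3 responds to a change in the standard cs3, and that di carries the $10⁻⁶/mg units of Table 16.2.

```latex
% Third constraint of Eq. (16.9) held as an equality and solved for x_3:
\begin{align*}
x_3 &= 1 - \frac{c_{s3}Q_{34} - R_{13}Q_{13}c_1 - R_{23}Q_{23}c_2}{P_3} \\
% Only the d_3 P_3 x_3 term of Z in Eq. (16.8) depends on c_{s3}:
\frac{\partial Z}{\partial c_{s3}}
    &= d_3 P_3 \frac{\partial x_3}{\partial c_{s3}} = -d_3 Q_{34} \\
    &= -(4 \times 10^{-6}\ \$/\text{mg})(1.10 \times 10^{8}\ \text{L/d})
     = -440\ \$/\text{d per mg/L}
\end{align*}
```

Under these assumptions, each 1 mg/L tightening of cs3 adds about $440/day to the treatment cost.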
This is confirmed when we rerun Solver with the new standards (that is, we lower
cells G6 and G7 to 10). As seen in Table 16.3, the result is that treatment cost is increased
from $12,600/day to $19,640/day. In addition, reducing the standard concentrations for
the lower reaches means that city 4 must begin to treat its waste and city 3 must upgrade
its treatment. Notice also that the treatment of the upstream cities is unaffected.
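The two scenarios of Table 16.3 can be cross-checked by substituting the reported treatment levels into Eqs. (16.6) and (16.8). The sketch below only verifies the tabulated solutions; it does not solve the linear program. The helper name `evaluate` and the small feasibility tolerance are illustrative assumptions.

```python
# Parameters from Table 16.2 (d in $/mg, i.e. the tabulated value times 1e-6)
P = [1.00e9, 2.00e9, 4.00e9, 2.50e9]
d = [2e-6, 2e-6, 4e-6, 4e-6]
Q13, Q23, Q34, Q45 = 1.00e7, 5.00e7, 1.10e8, 2.50e8
R13, R23, R34 = 0.5, 0.35, 0.6


def evaluate(x):
    """Return (concentrations, daily cost in $) for treatment fractions x."""
    c1 = (1 - x[0]) * P[0] / Q13
    c2 = (1 - x[1]) * P[1] / Q23
    c3 = (R13 * Q13 * c1 + R23 * Q23 * c2 + (1 - x[2]) * P[2]) / Q34
    c4 = (R34 * Q34 * c3 + (1 - x[3]) * P[3]) / Q45
    cost = sum(di * Pi * xi for di, Pi, xi in zip(d, P, x))
    return [c1, c2, c3, c4], cost


# Treatment levels and standards as reported in Table 16.3
scenarios = {
    "All cs = 20": ([0.8, 0.5, 0.5625, 0.0], [20, 20, 20, 20]),
    "Downstream cs = 10": ([0.8, 0.5, 0.8375, 0.264], [20, 20, 10, 10]),
}
for name, (x, cs) in scenarios.items():
    c, cost = evaluate(x)
    feasible = all(ci <= csi + 1e-6 for ci, csi in zip(c, cs))
    print(name, [round(ci, 2) for ci in c], f"cost = ${cost:,.0f}/day", feasible)
```

Both sets of treatment levels satisfy their standards and reproduce the costs quoted in the text.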
FIGURE 16.6
Results of minimization. The water-quality standards are met at a cost of $12,600/day. Notice
that although no treatment is required for city 4, the concentration at its mixing point
is actually below the standard.
16.3 MAXIMUM POWER TRANSFER FOR A CIRCUIT
(ELECTRICAL ENGINEERING)
Background. The simple resistor circuit in Fig. 16.8 contains three fixed resistors and
one adjustable resistor. Adjustable resistors are called potentiometers. The values for the
parameters are V = 80 V, R1 = 8 Ω, R2 = 12 Ω, and R3 = 10 Ω. (a) Find the value of the
adjustable resistance Ra that maximizes the power transfer across terminals 1 and 2. (b)
Perform a sensitivity analysis to determine how the maximum power and the corresponding
setting of the potentiometer (Ra) vary as V is varied over the range from 45 to 105 V.
FIGURE 16.7
Sensitivity Report for spread-
sheet set up to evaluate the cost
of waste treatment on a regu-
lated river system.
TABLE 16.3 Comparison of two scenarios involving the impact of different regulations
on treatment costs.
Scenario 1: All cs = 20          Scenario 2: Downstream cs = 10
City   x        c                City   x        c
1      0.8      20               1      0.8      20
2      0.5      20               2      0.5      20
3      0.5625   20               3      0.8375   10
4      0        15.28            4      0.264    10
Cost = $12,600                   Cost = $19,640
Solution. An expression for power for the circuit can be derived from Kirchhoff’s laws as
$$P(R_a) = \frac{\left[\dfrac{V R_3 R_a}{R_1(R_a + R_2 + R_3) + R_3 R_a + R_3 R_2}\right]^2}{R_a} \tag{16.11}$$
Substituting the parameter values gives the plot shown in Fig. 16.9. Notice that a maximum
power transfer occurs at a resistance of about 16 Ω.
We will solve this problem in two ways with the Excel spreadsheet. First, we will
employ trial-and-error and the Solver option. Then, we will develop a Visual Basic macro
program to perform the sensitivity analysis.
(a) An Excel spreadsheet to implement Eq. (16.11) is shown in Fig. 16.10. As indicated,
Eq. (16.11) can be entered into cell B9. Then the value of Ra (cell B8) can be
varied in a trial-and-error fashion until the maximum power is found. For this example,
the result is a power of 30.03 W and a potentiometer setting of Ra = 16.44 Ω.
A superior approach involves using the Solver option from the spreadsheet’s Data
tab. At this point a dialogue box will be displayed, querying you for pertinent informa-
tion. The pertinent cells of the Solver dialogue box would be filled out as
FIGURE 16.8
A resistor circuit with an adjustable resistor, or potentiometer.
FIGURE 16.9
A plot of power transfer across terminals 1-2 from Fig. 16.8 as a function of the
potentiometer resistance Ra. The maximum power is marked on the curve.
Set target cell: B9
Equal to ● max ❍ min ❍ equal to 0
By changing cells B8
When the OK button is selected, a dialogue box will open with a report on the success
of the operation. For the present case, the Solver obtains the same correct solution shown
in Fig. 16.10.
(b) Now, although the foregoing approach is excellent for a single evaluation, it is
not convenient for cases where multiple optimizations would be employed. Such would
be the case for the second part of this application, where we are interested in determin-
ing how the maximum power varies for different voltage settings. Of course, the Solver
could be invoked multiple times for different parameter values, but this would be inef-
ficient. A preferable course would involve developing a macro function to come up with
the optimum.
Such a function is listed in Fig. 16.11. Notice how closely it resembles the golden-
section-search pseudocode previously presented in Fig. 13.5. In addition, notice that a
function must also be defined to compute power according to Eq. (16.11).
An Excel spreadsheet utilizing this macro to evaluate the sensitivity of the solution
to voltage is given in Fig. 16.12. A column of values is set up that spans the range of
V’s (that is, from 45 to 105 V). A function call to the macro is written in cell B9 that
references the adjacent value of V (the 45 in A9). In addition, the other parameters in
the function argument are also included. Notice that, whereas the reference to V is rela-
tive, the references to the lower and upper guesses and the resistances are absolute (that
is, including leading $). This was done so that when the formula is copied down, the
absolute references stay fixed, whereas the relative reference corresponds to the voltage
in the same row. A similar strategy is used to place Eq. (16.11) in cell C9.
When the formulas are copied downward, the result is as shown in Fig. 16.12. The
maximum power can be plotted to visualize the impact of voltage variations. As seen in
Fig. 16.13, the power increases with V.
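The same sensitivity analysis can be sketched outside the spreadsheet. The Python below is an illustrative port, not the book's macro: it uses a golden-section search with an absolute interval tolerance instead of the percent-error stopping rule of Fig. 16.11, and the names `power` and `golden_max` are assumptions. The search bounds 0.1 and 100 Ω are the Rmin and Rmax of Fig. 16.12.

```python
import math


def power(Ra, V, R1=8.0, R2=12.0, R3=10.0):
    """Power across the potentiometer according to Eq. (16.11)."""
    ratio = V * R3 * Ra / (R1 * (Ra + R2 + R3) + R3 * Ra + R3 * R2)
    return ratio ** 2 / Ra


def golden_max(f, xl, xu, tol=1e-5, maxit=200):
    """Golden-section search for the maximum of a unimodal f on [xl, xu]."""
    R = (math.sqrt(5.0) - 1.0) / 2.0           # golden ratio, about 0.618
    x1 = xl + R * (xu - xl)
    x2 = xu - R * (xu - xl)
    f1, f2 = f(x1), f(x2)
    for _ in range(maxit):
        if f1 > f2:                            # maximum lies in [x2, xu]
            xl, x2, f2 = x2, x1, f1
            x1 = xl + R * (xu - xl)
            f1 = f(x1)
        else:                                  # maximum lies in [xl, x1]
            xu, x1, f1 = x1, x2, f2
            x2 = xu - R * (xu - xl)
            f2 = f(x2)
        if xu - xl < tol:
            break
    return (x1, f1) if f1 > f2 else (x2, f2)


# Sweep the voltages used in Fig. 16.12
for V in (45, 60, 75, 90, 105):
    Ra, P = golden_max(lambda r: power(r, V), 0.1, 100.0)
    print(f"V = {V:5.1f} V   Ra = {Ra:.4f} ohm   P = {P:.4f} W")
```

Each interior point is reused from one iteration to the next, so only one new function evaluation is needed per pass, which is the main attraction of the golden-section scheme.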
The results for the corresponding potentiometer settings (Ra) are more interesting.
The spreadsheet indicates that the same setting, 16.44 Ω, results in maximum power.
Such a result might be difficult to intuit based on casual inspection of Eq. (16.11).
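A short derivation makes the voltage independence explicit. This is a sketch added for illustration; the grouping constants a and b are shorthand introduced here and do not appear in the original text.

```latex
% Group the denominator of Eq. (16.11) as a R_a + b, with
%   a = R_1 + R_3   and   b = R_1 (R_2 + R_3) + R_2 R_3
\begin{align*}
P(R_a) &= \frac{V^2 R_3^2 \, R_a}{(a R_a + b)^2} \\
\frac{dP}{dR_a} &= V^2 R_3^2 \, \frac{b - a R_a}{(a R_a + b)^3} = 0
  \quad\Longrightarrow\quad
  R_a^{*} = \frac{b}{a} = \frac{R_1 (R_2 + R_3) + R_2 R_3}{R_1 + R_3} \\
R_a^{*} &= \frac{8(12 + 10) + (12)(10)}{8 + 10} = \frac{296}{18} \approx 16.44\ \Omega
\end{align*}
```

Because V enters only through the multiplying factor V², it scales the power curve without shifting the location of its maximum.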
FIGURE 16.10
Excel determination of maximum power across a potentiometer using trial-and-error.
Option Explicit

Function Golden(xlow, xhigh, R1, R2, R3, V)
' Golden-section search for the Ra between xlow and xhigh that maximizes f
Dim iter As Integer, maxit As Integer, ea As Double, es As Double
Dim fx As Double, xL As Double, xU As Double, d As Double, x1 As Double
Dim x2 As Double, f1 As Double, f2 As Double, xopt As Double
Const R As Double = (5 ^ 0.5 - 1) / 2   ' golden ratio
maxit = 50      ' maximum number of iterations
es = 0.001      ' stopping criterion (%)
xL = xlow
xU = xhigh
iter = 1
d = R * (xU - xL)
x1 = xL + d
x2 = xU - d
f1 = f(x1, R1, R2, R3, V)
f2 = f(x2, R1, R2, R3, V)
If f1 > f2 Then
  xopt = x1
  fx = f1
Else
  xopt = x2
  fx = f2
End If
Do
  d = R * d
  If f1 > f2 Then
    ' maximum lies to the right of x2
    xL = x2
    x2 = x1
    x1 = xL + d
    f2 = f1
    f1 = f(x1, R1, R2, R3, V)
  Else
    ' maximum lies to the left of x1
    xU = x1
    x1 = x2
    x2 = xU - d
    f1 = f2
    f2 = f(x2, R1, R2, R3, V)
  End If
  iter = iter + 1
  If f1 > f2 Then
    xopt = x1
    fx = f1
  Else
    xopt = x2
    fx = f2
  End If
  ' approximate percent relative error of the current estimate
  If xopt <> 0 Then ea = (1 - R) * Abs((xU - xL) / xopt) * 100
  If ea <= es Or iter >= maxit Then Exit Do
Loop
Golden = xopt
End Function

Function f(Ra, R1, R2, R3, V)
' Power across the potentiometer, Eq. (16.11)
f = (V * R3 * Ra / (R1 * (Ra + R2 + R3) + R3 * Ra + R3 * R2)) ^ 2 / Ra
End Function
FIGURE 16.11
Excel macro written in Visual
Basic to determine a maximum
with the golden-section search.
FIGURE 16.13
Results of sensitivity analysis of the effect of voltage variations on maximum power.
The plot shows P (W) and Ra (Ω) versus V (V) over the range from 45 to 105 V.
16.4 EQUILIBRIUM AND MINIMUM POTENTIAL ENERGY
(MECHANICAL/AEROSPACE ENGINEERING)
Background. As in Figure 16.14a, an unloaded spring can be attached to a wall mount.
When a horizontal force is applied the spring stretches. The displacement is related to
the force by Hooke's law, F = kx. The potential energy of the deformed state consists
of the difference between the strain energy of the spring and the work done by the force,
$$PE(x) = 0.5kx^2 - Fx \tag{16.12}$$
Equation (16.12) defines a parabola. Since the potential energy will be at a minimum at
equilibrium, the solution for displacement can be viewed as a one-dimensional optimization
FIGURE 16.12
Excel spreadsheet to implement a sensitivity analysis of the maximum power to variations of
voltage. This routine accesses the macro program for golden-section search from Fig. 16.11.
Cell B9 (call to Visual Basic macro function): =Golden($B$6,$B$7,$B$3,$B$4,$B$5,A9)
Cell C9 (power calculation): =(A9*$B$5*B9/($B$3*(B9+$B$4+$B$5)+$B$5*B9+$B$5*$B$4))^2/B9
      A       B           C           D
1     Maximum Power Transfer
2
3     R1      8
4     R2      12
5     R3      10
6     Rmin    0.1
7     Rmax    100
8     V       Ra          P(Ra)
9     45      16.44444    9.501689
10    60      16.44444    16.89189
11    75      16.44444    26.39358
12    90      16.44444    38.00676
13    105     16.44444    51.73142
problem. Because this equation is so easy to differentiate, we can solve for the displacement
as x = F/k. For example, if k = 2 N/cm and F = 5 N, x = 5 N/(2 N/cm) = 2.5 cm.
A more interesting two-dimensional case is shown in Figure 16.15. In this system,
there are two degrees of freedom in that the system can move both horizontally and
vertically. In the same way that we approached the one-dimensional system, the equilib-
rium deformations are the values of x1 and x2 that minimize the potential energy,
$$PE(x_1, x_2) = 0.5k_a\left(\sqrt{x_1^2 + (L_a - x_2)^2} - L_a\right)^2 + 0.5k_b\left(\sqrt{x_1^2 + (L_b + x_2)^2} - L_b\right)^2 - F_1x_1 - F_2x_2 \tag{16.13}$$
If the parameters are ka = 9 N/cm, kb = 2 N/cm, La = 10 cm, Lb = 10 cm, F1 = 2 N,
and F2 = 4 N, solve for the displacements and the potential energy.
Solution. We can use a variety of software tools to solve this problem. For example,
using MATLAB, an M-file can be developed to hold the potential energy function,
function p=PE(x,ka,kb,La,Lb,F1,F2)
PEa=0.5*ka*(sqrt(x(1)^2+(La-x(2))^2)-La)^2;
PEb=0.5*kb*(sqrt(x(1)^2+(Lb+x(2))^2)-Lb)^2;
W=F1*x(1)+F2*x(2);
p=PEa+PEb-W;
The solution can then be obtained with the fminsearch function,
>> ka=9;kb=2;La=10;Lb=10;F1=2;F2=4;
>> [x,f]=fminsearch(@PE,[-0.5,0.5],[],ka,kb,La,Lb,F1,F2)
x =
4.9523 1.2769
f =
-9.6422
Thus, at equilibrium, the potential energy is −9.6422 N·cm. The connecting point is
located 4.9523 cm to the right and 1.2769 cm above its original position.
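For readers without MATLAB, the minimization can be sketched in plain Python. Note the swapped-in technique: instead of the Nelder-Mead simplex used by fminsearch, this illustration uses gradient descent with a backtracking line search and the analytical gradient of Eq. (16.13). The function names, tolerances, and iteration limit are assumptions chosen for the sketch.

```python
import math

KA, KB = 9.0, 2.0      # spring constants (N/cm)
LA, LB = 10.0, 10.0    # unloaded spring lengths (cm)
F1, F2 = 2.0, 4.0      # applied forces (N)


def pe(x1, x2):
    """Potential energy of the two-spring system, Eq. (16.13)."""
    da = math.hypot(x1, LA - x2)    # deformed length of spring a
    db = math.hypot(x1, LB + x2)    # deformed length of spring b
    return 0.5 * KA * (da - LA) ** 2 + 0.5 * KB * (db - LB) ** 2 - F1 * x1 - F2 * x2


def grad(x1, x2):
    """Analytical gradient of pe with respect to (x1, x2)."""
    da = math.hypot(x1, LA - x2)
    db = math.hypot(x1, LB + x2)
    ta = KA * (da - LA) / da
    tb = KB * (db - LB) / db
    return ta * x1 + tb * x1 - F1, -ta * (LA - x2) + tb * (LB + x2) - F2


def minimize(x1, x2, tol=1e-6, max_iter=10000):
    """Gradient descent with a backtracking (Armijo) line search."""
    for _ in range(max_iter):
        g1, g2 = grad(x1, x2)
        g_sq = g1 * g1 + g2 * g2
        if math.sqrt(g_sq) < tol:
            break
        f0 = pe(x1, x2)
        step = 1.0
        # Halve the step until the energy decreases sufficiently
        while step > 1e-12 and pe(x1 - step * g1, x2 - step * g2) > f0 - 1e-4 * step * g_sq:
            step *= 0.5
        x1, x2 = x1 - step * g1, x2 - step * g2
    return x1, x2, pe(x1, x2)


# Same initial guess as the fminsearch call
x1, x2, f = minimize(-0.5, 0.5)
print(f"x1 = {x1:.4f} cm, x2 = {x2:.4f} cm, PE = {f:.4f} N cm")
```

Started from the same initial guess, the iteration should settle close to the displacements and potential energy reported above.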
FIGURE 16.14
(a) An unloaded spring attached to a wall mount. (b) Application of a horizontal force
stretches the spring where the relationship between force and displacement is described by
Hooke's law.
FIGURE 16.15
A two-spring system: (a) unloaded, and (b) loaded.
PROBLEMS
Chemical/Bio Engineering
16.1 Design the optimal cylindrical container (Fig. P16.1) that is
open at one end and has walls of negligible thickness. The container
is to hold 0.5 m³. Design it so that the areas of its bottom and
sides are minimized.
16.2 (a) Design the optimal conical container (Fig. P16.2) that has
a cover and has walls of negligible thickness. The container is to
hold 0.5 m³. Design it so that the areas of its top and sides are
minimized. (b) Repeat (a) but for a conical container without a cover.
16.3 Design the optimal cylindrical tank with dished ends
(Fig. P16.3). The container is to hold 0.5 m³ and has walls of
negligible thickness. Note that the area and volume of each of the dished
ends can be computed with
$$A = \pi(h^2 + r^2) \qquad V = \frac{\pi h(h^2 + 3r^2)}{6}$$
(a) Design the tank so that its surface area is minimized. Interpret
the result.
(b) Repeat part (a), but add the constraint L ≥ 2h.
FIGURE P16.1
A cylindrical container with no lid.
FIGURE P16.2
A conical container with a lid.
FIGURE P16.3
(Tank with dished ends; labeled dimensions L, h, and r.)
16.4 The specific growth rate of a yeast that produces an antibiotic
is a function of the food concentration c,
$$g = \frac{2c}{4 + 0.8c + c^2 + 0.2c^3}$$
As depicted in Fig. P16.4, growth goes to zero at very low concen-
trations due to food limitation. It also goes to zero at high concen-
trations due to toxicity effects. Find the value of c at which growth
is a maximum.
16.5 A chemical plant makes three major products on a weekly
basis. Each of these products requires a certain quantity of raw
chemical and different production times, and yields different
profits. The pertinent information is in Table P16.5. Note that
there is sufficient warehouse space at the plant to store a total of
450 kg/week.
FIGURE P16.4
The specific growth rate g (d⁻¹) of a yeast that produces an antibiotic
versus the food concentration c (mg/L).
that the initial cost of the system is a function of the conversion xA.
Find the conversion that will result in the lowest cost system. C is a
proportionality constant.
$$\text{Cost} = C\left[\left(\frac{1}{(1 - x_A)^2}\right)^{0.6} + 5\left(\frac{1}{x_A}\right)^{0.6}\right]$$
16.9 In problem 16.8, only one reactor is used. If two reactors are
used in series, the governing equation for the system changes. Find
the conversions for both reactors (xA1 and xA2) such that the total
cost of the system is minimized.
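$$\text{Cost} = C\left[\left(\frac{x_{A1}}{x_{A2}(1 - x_{A1})^2}\right)^{0.6} + \left(\frac{1 - \dfrac{x_{A1}}{x_{A2}}}{(1 - x_{A2})^2}\right)^{0.6} + 5\left(\frac{1}{x_{A2}}\right)^{0.6}\right]$$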
16.10 For the reaction:
$$2A + B \rightleftharpoons C$$
equilibrium can be expressed as:
$$K = \frac{[C]}{[A]^2[B]} = \frac{[C]}{[A_0 - 2C]^2[B_0 - C]}$$
If K = 2 M⁻¹, the initial concentration of A (A0) can be varied. The
initial concentration of B is fixed by the process, B0 = 100. A costs
$1/M and C sells for $10/M. What would be the optimum initial
concentration of A to use such that the profits would be maximized?
16.11 A chemical plant requires 10⁶ L/day of a solution. Three sources
are available at different prices and supply rates. Each source also has
a different concentration of an impurity that must be kept below a
minimum level to prevent interference with the chemical. The data for
the three sources are summarized in the following table. Determine the
amount from each source to meet the requirements at the least cost.
                        Source 1   Source 2   Source 3   Required
Cost ($/L)              0.50       1.00       1.20       minimize
Supply (10⁵ L/day)      20         10         5          ≥10
Concentration (mg/L)    135        100        75         ≤100
(a) Set up a linear programming problem to maximize profit.
(b) Solve the linear programming problem with the simplex method.
(c) Solve the problem with a software package.
(d) Evaluate which of the following options will raise profits the
most: increasing raw chemical, production time, or storage.
16.6 Recently chemical engineers have become involved in the
area known as waste minimization. This involves the operation of a
chemical plant so that impacts on the environment are minimized.
Suppose a refinery develops a product Z1 made from two raw
materials X and Y. The production of 1 metric tonne of the product
involves 1 tonne of X and 2.5 tonnes of Y and produces 1 tonne of a
liquid waste W. The engineers have come up with three alternative
ways to handle the waste:
• Produce a tonne of a secondary product Z2 by adding an addi-
tional tonne of X to each tonne of W.
• Produce a tonne of another secondary product Z3 by adding an
additional tonne of Y to each tonne of W.
• Treat the waste so that it is permissible to discharge it.
The products yield profits of $2500, −$50, and $200/tonne for Z1,
Z2, and Z3, respectively. Note that producing Z2 actually creates a
loss. The treatment process costs $300/tonne. In addition, the com-
pany has access to a limit of 7500 and 10,000 tonnes of X and Y,
respectively, during the production period. Determine how much of
the products and waste must be created in order to maximize profit.
16.7 A mixture of benzene and toluene is to be separated in a flash
tank. At what temperature should the tank be operated to get the
highest purity toluene in the liquid phase (maximizing xT)? The pressure
in the flash tank is 800 mm Hg. The units for Antoine's equation
are mm Hg and °C for pressure and temperature, respectively.
$$x_B P_{\text{sat}B} + x_T P_{\text{sat}T} = P$$
$$\log_{10}(P_{\text{sat}B}) = 6.905 - \frac{1211}{T + 221}$$
$$\log_{10}(P_{\text{sat}T}) = 6.953 - \frac{1344}{T + 219}$$
16.8 A compound A will be converted into B in a stirred tank reactor.
The product B and unreacted A are purified in a separation unit.
Unreacted A is recycled to the reactor. A process engineer has found
TABLE P16.5
                   Product 1    Product 2   Product 3   Resource Availability
Raw chemical       7 kg/kg      5 kg/kg     13 kg/kg    3000 kg
Production time    0.05 hr/kg   0.1 hr/kg   0.2 hr/kg   55 hr/week
Profit             $30/kg       $30/kg      $35/kg
16.16 Suppose that you are asked to design a column to support
a compressive load P, as shown in Fig. P16.16a. The column
has a cross-section shaped as a thin-walled pipe as shown in
Fig. P16.16b.
The design variables are the mean pipe diameter d and the wall
thickness t. The cost of the pipe is computed by
$$\text{Cost} = f(t, d) = c_1 W + c_2 d$$
where c1 = 4 and c2 = 2 are cost factors and W = weight of the pipe,
$$W = \pi d t H \rho$$
where ρ = density of the pipe material = 0.0025 kg/cm³. The column
must support the load under compressive stress and not buckle.
Therefore,
Actual stress (σ) ≤ maximum compressive yield stress = σy = 550 kg/cm²
Actual stress ≤ buckling stress
16.12 You must design a triangular open channel to carry a waste
stream from a chemical plant to a waste stabilization pond
(Fig. P16.12). The mean velocity increases with the hydraulic
radius Rh = A/p, where A is the cross-sectional area and p equals
the wetted perimeter. Because the maximum flow rate corresponds
to the maximum velocity, the optimal design amounts to minimizing
the wetted perimeter. Determine the dimensions to minimize
the wetted perimeter for a given cross-sectional area. Are the relative
dimensions universal?
16.13 As an agricultural engineer, you must design a trapezoidal
open channel to carry irrigation water (Fig. P16.13). Determine
the optimal dimensions to minimize the wetted perimeter
for a cross-sectional area of 100 m². Are the relative dimensions
universal?
16.14 Find the optimal dimensions for a heated cylindrical tank
designed to hold 10 m³ of fluid. The ends and sides cost $200/m²
and $100/m², respectively. In addition, a coating is applied to the
entire tank area at a cost of $50/m².
Civil/Environmental Engineering
16.15 A finite-element model of a cantilever beam subject to load-
ing and moments (Fig. P16.15) is given by optimizing
$$f(x, y) = 5x^2 - 5xy + 2.5y^2 - x - 1.5y$$
where x 5 end displacement and y 5 end moment. Find the values
of x and y that minimize f(x, y).
FIGURE P16.12
(Triangular channel; labeled dimensions w, d, and angle θ.)
FIGURE P16.13
(Trapezoidal channel; labeled dimensions w, d, and angle θ.)
FIGURE P16.15
A cantilever beam.
FIGURE P16.16
(a) A column supporting a compressive load P. (b) The column
has a cross section shaped as a thin-walled pipe.
below the point discharge. This point is called “critical” because it
represents the location where biota that depend on oxygen (like
fish) would be the most stressed. Determine the critical travel time
and concentration, given the following values:
os = 10 mg/L      kd = 0.1 d⁻¹      ka = 0.6 d⁻¹
ks = 0.05 d⁻¹     Lo = 50 mg/L      Sb = 1 mg/L/d
16.18 The two-dimensional distribution of pollutant concentration
in a channel can be described by
$$c(x, y) = 7.9 + 0.13x + 0.21y - 0.05x^2 - 0.016y^2 - 0.007xy$$
Determine the exact location of the peak concentration given the
function and the knowledge that the peak lies within the bounds
−10 ≤ x ≤ 10 and 0 ≤ y ≤ 20.
16.19 The flow Q (m³/s) in an open channel can be predicted with
the Manning equation
$$Q = \frac{1}{n} A_c R^{2/3} S^{1/2}$$
where n = Manning roughness coefficient (a dimensionless number
used to parameterize the channel friction), Ac = cross-sectional
area of the channel (m²), S = channel slope (dimensionless, meters
drop per meter length), and R = hydraulic radius (m), which is
related to more fundamental parameters by R = Ac/P, where P =
wetted perimeter (m). As the name implies, the wetted perimeter is
the length of the channel sides and bottom that is under water. For
example, for a rectangular channel, it is defined as P = B + 2H,
where H = depth (m). Suppose that you are using this formula to
design a lined canal (note that farmers line canals to minimize
leakage losses).
(a) Given the parameters n = 0.035, S = 0.003, and Q = 1 m³/s,
determine the values of B and H that minimize the wetted
perimeter. Note that such a calculation would minimize cost if
lining costs were much larger than excavation costs.
(b) Repeat part (a), but include the cost of excavation. To do this
minimize the following cost function,
$$C = c_1 A_c + c_2 P$$
where c1 is a cost factor for excavation = $100/m² and c2 is a
cost factor for lining = $50/m.
(c) Discuss the implications of your results.
16.20 A cylindrical beam carries a compression load P 5 3000 kN.
To prevent the beam from buckling, this load must be less than a
critical load,
$$P_c = \frac{\pi^2 EI}{L^2}$$
The actual stress is given by
$$\sigma = \frac{P}{A} = \frac{P}{\pi d t}$$
The buckling stress can be shown to be
$$\sigma_b = \frac{\pi EI}{H^2 d t}$$
where E = modulus of elasticity and I = second moment of the
area of the cross section. Calculus can be used to show that
$$I = \frac{\pi}{8} d t (d^2 + t^2)$$
Finally, diameters of available pipes are between d1 and d2 and
thicknesses between t1 and t2. Develop and solve this problem by
determining the values of d and t that minimize the cost. Note that
H = 275 cm, P = 2000 kg, E = 900,000 kg/cm², d1 = 1 cm, d2 =
10 cm, t1 = 0.1 cm, and t2 = 1 cm.
16.17 The Streeter-Phelps model can be used to compute the
dissolved oxygen concentration in a river below a point discharge
of sewage (Fig. P16.17),
$$o = o_s - \frac{k_d L_o}{k_d + k_s - k_a}\left(e^{-k_a t} - e^{-(k_d + k_s)t}\right) - \frac{S_b}{k_a}\left(1 - e^{-k_a t}\right) \tag{P16.17}$$
where o = dissolved oxygen concentration (mg/L), os = oxygen
saturation concentration (mg/L), t = travel time (d), Lo = biochemical
oxygen demand (BOD) concentration at the mixing point
(mg/L), kd = rate of decomposition of BOD (d⁻¹), ks = rate of
settling of BOD (d⁻¹), ka = reaeration rate (d⁻¹), and Sb = sediment
oxygen demand (mg/L/d).
As indicated in Fig. P16.17, Eq. (P16.17) produces an oxygen
“sag” that reaches a critical minimum level oc some travel time tc
FIGURE P16.17
A dissolved oxygen “sag” below a point discharge of sewage
into a river. The plot shows o (mg/L) versus t (d), with os, oc, and tc marked.
16.24 A system consists of two power plants that must deliver
loads over a transmission network. The costs of generating power
at plants 1 and 2 are given by
$$F_1 = 2p_1 + 2$$
$$F_2 = 10p_2$$
where p1 and p2 = power produced by each plant. The losses of
power due to transmission L are given by
$$L_1 = 0.2p_1 + 0.1p_2$$
$$L_2 = 0.2p_1 + 0.5p_2$$
The total demand for power is 30 and p1 must not exceed 42.
Determine the power generation needed to meet demands while
minimizing cost using an optimization routine such as those
found in, for example, Excel, MATLAB, or Mathcad software.
16.25 The torque transmitted to an induction motor is a function of
the slip between the rotation of the stator field and the rotor speed s
where slip is defined as
$$s = \frac{n - n_R}{n}$$
where n = revolutions per second of rotating stator speed and nR =
rotor speed. Kirchhoff's laws can be used to show that the torque
(expressed in dimensionless form) and slip are related by
$$T = \frac{15(s - s^2)}{(1 - s)(4s^2 - 3s + 4)}$$
Figure P16.25 shows this function. Use a numerical method to
determine the slip at which the maximum torque occurs.
16.26
(a) A computer equipment manufacturer produces scanners and
printers. The resources needed for producing these devices and
the corresponding profits are
Device     Capital ($/unit)   Labor (hr/unit)   Profit ($/unit)
Scanner    300                20                500
Printer    400                10                400
where E = Young's modulus = 200 × 10⁹ N/m², I = πr⁴/4 (the
area moment of inertia for a cylindrical beam of radius r), and L
is the beam length. If the volume of beam V cannot exceed 0.075 m³,
find the largest height L that can be utilized and the corresponding
radius.
16.21 The Splash River has a flow rate of 2 × 10⁶ m³/d, of which
up to 70% can be diverted into two channels where it flows through
Splish County. These channels are used for transportation, irrigation,
and electric power generation, with the latter two being
sources of revenue. The transportation use requires a minimum
diverted flow rate of 0.3 × 10⁶ m³/d for Channel 1 and 0.2 × 10⁶ m³/d
for Channel 2. For political reasons it has been decided that the
absolute difference between the flow rates in the two channels cannot
exceed 40% of the total flow diverted into the channels. The
Splish County Water Management Board has also limited maintenance
costs for the channel system to be no more than $1.8 × 10⁶
per year. Annual maintenance costs are estimated based on the
daily flow rate. Channel 1 costs per year are estimated by multiplying
$1.1 times the m³/d of flow, while for Channel 2 the multiplication
factor is $1.4 per m³/d. Electric power production revenue
is also estimated based on daily flow rate. For Channel 1 this is
$4.0 per m³/d, while for Channel 2 it is $3.0 per m³/d. Annual
revenue from irrigation is also estimated based on daily flow
rate, but the flow rates must first be corrected for water loss in
the channels previous to delivery for irrigation. This loss is 30%
in Channel 1 and 20% in Channel 2. In both channels the revenue
is $3.2 per m³/d. Determine the flows in the channels that
maximize profit.
16.22 Determine the beam cross-sectional areas that result in the
minimum weight for the truss we studied in Sec. 12.2 (Fig. 12.4).
The critical buckling and maximum tensile strengths of compres-
sion and tension members are 10 and 20 ksi, respectively. The
truss is to be constructed of steel (density = 3.5 lb/ft-in²). Note
that the length of the horizontal member (2) is 50 ft. Also, recall
that the stress in each member is equal to the force divided
by cross-sectional area. Set up the problem as a linear program-
ming problem. Obtain the solution graphically and with the
Excel Solver.
Electrical Engineering
16.23 A total charge Q is uniformly distributed around a ring-
shaped conductor with radius a. A charge q is located at a distance
x from the center of the ring (Fig. P16.23). The force exerted on the
charge by the ring is given by
$$F = \frac{1}{4\pi e_0}\frac{qQx}{(x^2 + a^2)^{3/2}}$$
where e0 = 8.85 × 10⁻¹² C²/(N m²), q = Q = 2 × 10⁻⁵ C, and a =
0.9 m. Determine the distance x where the force is a maximum.
FIGURE P16.23
(Ring of radius a carrying charge Q, with charge q at distance x from its center.)
where D = drag, σ = ratio of air density between the flight altitude
and sea level, W = weight, and V = velocity. As seen in Fig. P16.28,
the two factors contributing to drag are affected differently as
velocity increases. Whereas friction drag increases with velocity, the
drag due to lift decreases. The combination of the two factors leads
to a minimum drag.
(a) If σ = 0.5 and W = 15,000, determine the minimum drag and
the velocity at which it occurs.
(b) In addition, develop a sensitivity analysis to determine how this
optimum varies in response to a range of W = 12,000 to 18,000
with σ = 0.5.
16.29 Roller bearings are subject to fatigue failure caused by large
contact loads F (Fig. P16.29).
If there are $127,000 worth of capital and 4270 hr of labor
available each day, how many of each device should be pro-
duced per day to maximize profit?
(b) Repeat the problem, but now assume that the profit for each
printer sold Pp depends on the number of printers produced
Xp, as in
Pp 5 400 2 Xp
16.27 A manufacturer provides specialized microchips. During the
next 3 months, its sales, costs, and available time are
                               Month 1   Month 2   Month 3
Chips required                 1000      2500      2200
Cost regular time ($/chip)     100       100       120
Cost overtime ($/chip)         110       120       130
Regular operation time (hr)    2400      2400      2400
Overtime (hr)                  720       720       720
There are no chips in stock at the beginning of the first month. It
takes 1.5 hr of production time to produce a chip and costs $5 to
store a chip from one month to the next. Determine a production
schedule that meets the demand requirements, does not exceed the
monthly production time limitations, and minimizes cost. Note that
no chips should be in stock at the end of the 3 months.
Mechanical/Aerospace Engineering
16.28 The total drag on an airfoil can be estimated by
$$D = \underbrace{0.01\sigma V^2}_{\text{friction}} + \underbrace{\frac{0.95}{\sigma}\left(\frac{W}{V}\right)^2}_{\text{lift}}$$
FIGURE P16.25
Torque transmitted to an inductor as a function of slip (T versus s).
FIGURE P16.28
Plot of drag versus velocity for an airfoil. The curves show the friction, lift, and
total drag D versus V, with the minimum of the total marked.
FIGURE P16.29
Roller bearings.
and testing of mountain bikes (Fig. P16.33a). Suppose that you are
given the task of predicting the horizontal and vertical displace-
ment of a bike bracketing system in response to a force. Assume
the forces you must analyze can be simplified as depicted in
Fig. P16.33b. You are interested in testing the response of the truss
to a force exerted in any number of directions designated by the
angle θ. The parameters for the problem are E = Young's modulus =
2 × 10¹¹ Pa, A = cross-sectional area = 0.0001 m², w = width =
0.44 m, ℓ = length = 0.56 m, and h = height = 0.5 m. The
displacements x and y can be solved by determining the values that
yield a minimum potential energy. Determine the displacements
for a force of 10,000 N and a range of θ's from 0° (horizontal) to
90° (vertical).
The problem of finding the location of the maximum stress
along the x axis can be shown to be equivalent to maximizing the
function
$$f(x) = \frac{0.4}{\sqrt{1 + x^2}} - \sqrt{1 + x^2}\left(1 - \frac{0.4}{1 + x^2}\right) + x$$
Find the x that maximizes f(x).
16.30 An aerospace company is developing a new fuel additive
for commercial airliners. The additive is composed of three ingre-
dients: X, Y, and Z. For peak performance, the total amount of
additive must be at least 6 mL/L of fuel. For safety reasons, the
sum of the highly flammable X and Y ingredients must not exceed
2.5 mL/L. In addition, the amount of the X ingredient must always
be equal to or greater than the Y, and the Z must be greater than
half the Y. If the cost per mL for the ingredients X, Y, and Z is
0.05, 0.025, and 0.15, respectively, determine the minimum cost
mixture for each liter of fuel.
16.31 A manufacturing firm produces four types of automobile
parts. Each is first fabricated and then finished. The required worker
hours and profit for each part are
                                    Part A   Part B   Part C   Part D
Fabrication time (hr/100 units)      2.5      1.5      2.75     2
Finishing time (hr/100 units)        3.5      3        3        2
Profit ($/100 units)                 375      275      475      325
The capacities of the fabrication and finishing shops over the next
month are 640 and 960 hours, respectively. Determine how many of
each part should be produced in order to maximize profit.
16.32 In a similar fashion to the case study described in Sec.
16.4, develop the potential energy function for the system de-
picted in Fig. P16.32. Develop contour and surface plots in
MATLAB. Minimize the potential energy function in order to
determine the equilibrium displacements $x_1$ and $x_2$ given the
forcing function $F = 100$ N, and the parameters $k_a = 20$ and $k_b =
15$ N/m.
16.33 Recent interest in competitive and recreational cycling has
meant that engineers have directed their skills toward the design
FIGURE P16.32
Two frictionless masses connected to a wall by a pair of linear
elastic springs. [Diagram: springs ka and kb, masses 1 and 2 with displacements x1 and x2,
and the applied force F.]
FIGURE P16.33
(a) A mountain bike along with (b) a free-body diagram for a
part of the frame. [Diagram labels: x, y, F, h, ℓ, w, θ.]
EPILOGUE: PART FOUR
The epilogues of other parts of this book contain a discussion and a tabular summary of
the trade-offs among various methods as well as important formulas and relationships.
Most of the methods of this part are quite complicated and, consequently, cannot be
summarized with simple formulas and tabular summaries. Therefore, we deviate some-
what here by providing the following narrative discussion of trade-offs and further refer-
ences.
PT4.4 TRADE-OFFS
Chapter 13 dealt with finding the optimum of an unconstrained function of a single vari-
able. The golden-section search method is a bracketing method requiring that an interval
containing a single optimum be known. It has the advantage that it minimizes function
evaluations and always converges. Parabolic interpolation also works best when imple-
mented as a bracketing method, although it can also be programmed as an open method.
However, in such cases, it may diverge. Neither the golden-section search nor parabolic
interpolation requires derivative evaluations. Thus, both are appropriate methods
when the bracket can be readily defined and function evaluations are costly.
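As a reminder of how little the golden-section search needs (a bracket and function values only), the
following Python fragment is a minimal sketch rather than a definitive implementation. The function
name `golden_section_max`, the tolerance, and the test function are our own illustrative choices and
are assumptions, not part of this epilogue.

```python
import math

def golden_section_max(f, xl, xu, tol=1e-8, max_iter=200):
    """Locate a maximum of f on [xl, xu], assuming a single optimum inside."""
    r = (math.sqrt(5.0) - 1.0) / 2.0          # golden ratio, about 0.618
    d = r * (xu - xl)
    x1, x2 = xl + d, xu - d                   # interior points, x1 > x2
    f1, f2 = f(x1), f(x2)
    for _ in range(max_iter):
        if xu - xl < tol:
            break
        if f1 > f2:
            # The maximum lies in [x2, xu]; the old x1 becomes the new x2.
            xl = x2
            x2, f2 = x1, f1
            x1 = xl + r * (xu - xl)
            f1 = f(x1)                        # only one new evaluation per pass
        else:
            # The maximum lies in [xl, x1]; the old x2 becomes the new x1.
            xu = x1
            x1, f1 = x2, f2
            x2 = xu - r * (xu - xl)
            f2 = f(x2)
    return 0.5 * (xl + xu)

# Illustrative (hypothetical) test function with one maximum on [0, 4].
x_opt = golden_section_max(lambda x: 2.0 * math.sin(x) - x**2 / 10.0, 0.0, 4.0)
print(f"maximum near x = {x_opt:.4f}")
```

Note that each pass reuses one of the two interior points, which is why the method is economical
when function evaluations are costly.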
Newton’s method is an open method not requiring that an optimum be bracketed. It
can be implemented in a closed-form representation when first and second derivatives can
be determined analytically. It can also be implemented in a fashion similar to the secant
method with finite-difference representations of the derivatives. Although Newton’s method
converges rapidly near the optimum, it is often divergent for poor guesses. Convergence is
also dependent on the nature of the function.
Finally, hybrid approaches are available that orchestrate various methods to attain
both reliability and efficiency. Brent’s method does this by combining the reliable golden-
section search with speedy parabolic interpolation.
Chapter 14 covered two general types of methods to solve multidimensional unconstrained
optimization problems. Direct methods such as random searches and univariate
searches do not require the evaluation of the function's derivatives but are often inefficient.
However, they also provide a tool to find global rather than local optima. Pattern
search methods like Powell's method can be very efficient and also do not require
derivative evaluation.
Gradient methods use first and sometimes second derivatives to find the optimum.
The method of steepest ascent/descent provides a reliable but sometimes slow
approach. In contrast, Newton's method often converges rapidly when in the vicinity of
an optimum, but sometimes suffers from divergence. The Marquardt method uses the steepest
descent method at the starting location far away from the optimum and switches to
Newton's method near the optimum in an attempt to take advantage of the strengths of
each method.
The Newton method can be computationally costly because it requires computation
of both the gradient vector and the Hessian matrix. Quasi-Newton approaches attempt
to circumvent these problems by using approximations to reduce the number of matrix
evaluations (particularly the evaluation, storage, and inversion of the Hessian).
Research investigations continue today that explore the characteristics and respective
strengths of various hybrid and tandem methods. Some examples are the Fletcher-Reeves
conjugate gradient method and Davidon-Fletcher-Powell quasi-Newton methods.
Chapter 15 was devoted to constrained optimization. For linear problems, linear pro-
gramming based on the simplex method provides an efficient means to obtain solutions.
Approaches such as the GRG method are available to solve nonlinear constrained problems.
Software packages include a wide variety of optimization capabilities. As described
in Chap. 15, Excel, MATLAB software, and Mathcad all have built-in search capabilities
that can be used for both one-dimensional and multidimensional problems routinely
encountered in engineering and science.
PT4.5 ADDITIONAL REFERENCES
General overviews of optimization including some algorithms can be found in Press et
al. (2007) and Moler (2004). For multidimensional problems, additional information can
be found in Dennis and Schnabel (1996), Fletcher (1980, 1981), Gill et al. (1981), and
Luenberger (1984).
In addition, there are a number of advanced methods that are well suited for specific
problem contexts. For example, genetic algorithms use strategies inspired by evolutionary
biology such as inheritance, mutation, and selection. Because they do not make assump-
tions regarding the underlying search space, such evolutionary algorithms are often use-
ful for large problems with many local optima. Related techniques include simulated
annealing and Tabu search. Hillier and Lieberman (2005) provide overviews of these and
a number of other advanced techniques.
PART FIVE
PT5.1 MOTIVATION
Data are often given for discrete values along a continuum. However, you may require
estimates at points between the discrete values. The present part of this book describes
techniques to fit curves to such data to obtain intermediate estimates. In addition, you
may require a simplified version of a complicated function. One way to do this is to
compute values of the function at a number of discrete values along the range of interest.
Then, a simpler function may be derived to fit these values. Both of these applications
are known as curve fitting.
There are two general approaches for curve fitting that are distinguished from each
other on the basis of the amount of error associated with these data. First, where these
data exhibit a significant degree of error or “noise,” the strategy is to derive a single
curve that represents the general trend of these data. Because any individual data point
may be incorrect, we make no effort to intersect every point. Rather, the curve is designed
to follow the pattern of the points taken as a group. One approach of this nature is called
least-squares regression (Fig. PT5.1a).
Second, where these data are known to be very precise, the basic approach is to fit
a curve or a series of curves that pass directly through each of the points. Such data
usually originate from tables. Examples are values for the density of water or for the
heat capacity of gases as a function of temperature. The estimation of values between
well-known discrete points is called interpolation (Fig. PT5.1b and c).
PT5.1.1 Noncomputer Methods for Curve Fitting
The simplest method for fitting a curve to data is to plot the points and then sketch a
line that visually conforms to these data. Although this is a valid option when quick
estimates are required, the results are dependent on the subjective viewpoint of the per-
son sketching the curve.
For example, Fig. PT5.1 shows sketches developed from the same set of data by
three engineers. The first did not attempt to connect the points, but rather, characterized
the general upward trend of these data with a straight line (Fig. PT5.1a). The second
engineer used straight-line segments or linear interpolation to connect the points
(Fig. PT5.1b). This is a very common practice in engineering. If the values are truly
close to being linear or are spaced closely, such an approximation provides estimates
that are adequate for many engineering calculations. However, where the underlying
relationship is highly curvilinear or these data are widely spaced, significant errors can
be introduced by such linear interpolation. The third engineer used curves to try to
capture the meanderings suggested by these data (Fig. PT5.1c). A fourth or fifth engineer
would likely develop alternative fits. Obviously, our goal here is to develop systematic
and objective methods for the purpose of deriving such curves.
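The second engineer's straight-line segments are easy to state objectively. The following Python
fragment is a minimal sketch of piecewise linear interpolation; the function name
`linear_interpolate` and the five data points are hypothetical and serve only as an illustration.

```python
from bisect import bisect_right

def linear_interpolate(xs, ys, x):
    """Estimate y at x by joining neighboring data points with straight lines.

    xs must be sorted in increasing order; x must lie inside the data range.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x lies outside the data range (that would be extrapolation)")
    # Index of the segment [xs[i], xs[i + 1]] that contains x.
    i = min(bisect_right(xs, x) - 1, len(xs) - 2)
    slope = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
    return ys[i] + slope * (x - xs[i])

# Hypothetical data, not taken from Fig. PT5.1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 3.0, 2.5, 4.0, 4.5]
print(linear_interpolate(xs, ys, 2.5))
```

The estimate is only as good as the assumption that the data are nearly linear between neighboring
points, which is exactly the limitation noted above for widely spaced or highly curvilinear data.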
CURVE FITTING
PT5.1.2 Curve Fitting and Engineering Practice
Your first exposure to curve fitting may have been to determine intermediate values from
tabulated data—for instance, from interest tables for engineering economics or from
steam tables for thermodynamics. Throughout the remainder of your career, you will
have frequent occasion to estimate intermediate values from such tables.
Although many of the widely used engineering properties have been tabulated, there
are a great many more that are not available in this convenient form. Special cases and
new problem contexts often require that you measure your own data and develop your
own predictive relationships. Two types of applications are generally encountered when
fitting experimental data: trend analysis and hypothesis testing.
Trend analysis represents the process of using the pattern of these data to make
predictions. For cases where these data are measured with high precision, you might
utilize interpolating polynomials. Imprecise data are often analyzed with least-squares
regression.
FIGURE PT5.1
Three attempts to fit a “best” curve through five data points. (a) Least-squares regression, (b) linear
interpolation, and (c) curvilinear interpolation. [Each panel plots f(x) versus x.]
Trend analysis may be used to predict or forecast values of the dependent variable.
This can involve extrapolation beyond the limits of the observed data or interpolation
within the range of the data. All fields of engineering commonly involve problems of
this type.
A second engineering application of experimental curve fitting is hypothesis testing.
Here, an existing mathematical model is compared with measured data. If the model
coefficients are unknown, it may be necessary to determine values that best fit the ob-
served data. On the other hand, if estimates of the model coefficients are already avail-
able, it may be appropriate to compare predicted values of the model with observed
values to test the adequacy of the model. Often, alternative models are compared and
the “best” one is selected on the basis of empirical observations.
In addition to the above engineering applications, curve fitting is important in other
numerical methods such as integration and the approximate solution of differential equa-
tions. Finally, curve-fitting techniques can be used to derive simple functions to ap-
proximate complicated functions.
PT5.2 MATHEMATICAL BACKGROUND
The prerequisite mathematical background for interpolation is found in the material
on Taylor series expansions and finite divided differences introduced in Chap. 4.
Least-squares regression requires additional information from the field of statistics. If
you are familiar with the concepts of the mean, standard deviation, residual sum of
the squares, normal distribution, and confidence intervals, feel free to skip the follow-
ing pages and proceed directly to PT5.3. If you are unfamiliar with these concepts or
are in need of a review, the following material is designed as a brief introduction to
these topics.
PT5.2.1 Simple Statistics
Suppose that in the course of an engineering study, several measurements were made
of a particular quantity. For example, Table PT5.1 contains 24 readings of the coefficient
of thermal expansion of a structural steel. Taken at face value, these data provide a
limited amount of information—that is, that the values range from a minimum of 6.395
to a maximum of 6.775. Additional insight can be gained by summarizing these data
in one or more well-chosen statistics that convey as much information as possible about
specific characteristics of the data set. These descriptive statistics are most often selected
to represent (1) the location of the center of the distribution of these data and (2) the
degree of spread of the data set.
TABLE PT5.1 Measurements of the coefficient of thermal expansion of structural steel
[$\times 10^{-6}$ in/(in $\cdot$ °F)].
6.495 6.595 6.615 6.635 6.485 6.555
6.665 6.505 6.435 6.625 6.715 6.655
6.755 6.625 6.715 6.575 6.655 6.605
6.565 6.515 6.555 6.395 6.775 6.685
The most common location statistic is the arithmetic mean. The arithmetic mean ($\bar{y}$)
of a sample is defined as the sum of the individual data points ($y_i$) divided by the number
of points ($n$), or
$$\bar{y} = \frac{\sum y_i}{n}$$ (PT5.1)
where the summation (and all the succeeding summations in this introduction) is from
$i = 1$ through $n$.
The most common measure of spread for a sample is the standard deviation ($s_y$)
about the mean,
$$s_y = \sqrt{\frac{S_t}{n - 1}}$$ (PT5.2)
where $S_t$ is the total sum of the squares of the residuals between the data points and the
mean, or
$$S_t = \sum (y_i - \bar{y})^2$$ (PT5.3)
Thus, if the individual measurements are spread out widely around the mean, $S_t$ (and,
consequently, $s_y$) will be large. If they are grouped tightly, the standard deviation will be
small. The spread can also be represented by the square of the standard deviation, which
is called the variance:
$$s_y^2 = \frac{\sum (y_i - \bar{y})^2}{n - 1}$$ (PT5.4)
Note that the denominator in both Eqs. (PT5.2) and (PT5.4) is $n - 1$. The quantity $n - 1$
is referred to as the degrees of freedom. Hence $S_t$ and $s_y$ are said to be based on $n - 1$
degrees of freedom. This nomenclature derives from the fact that the sum of the quantities
upon which $S_t$ is based (that is, $\bar{y} - y_1, \bar{y} - y_2, \ldots, \bar{y} - y_n$) is zero. Consequently, if
$\bar{y}$ is known and $n - 1$ of the values are specified, the remaining value is fixed. Thus,
only $n - 1$ of the values are said to be freely determined. Another justification for dividing
by $n - 1$ is the fact that there is no such thing as the spread of a single data point.
For the case where $n = 1$, Eqs. (PT5.2) and (PT5.4) yield a meaningless result of infinity.
It should be noted that an alternative, more convenient formula is available to compute
the variance (and hence the standard deviation),
$$s_y^2 = \frac{\sum y_i^2 - \left(\sum y_i\right)^2 / n}{n - 1}$$
This version does not require precomputation of $\bar{y}$ and yields a result identical to that of
Eq. (PT5.4).
A final statistic that has utility in quantifying the spread of data is the coefficient of
variation (c.v.). This statistic is the ratio of the standard deviation to the mean. As such,
it provides a normalized measure of the spread. It is often multiplied by 100 so that it
can be expressed in the form of a percent:
$$\text{c.v.} = \frac{s_y}{\bar{y}}\, 100\%$$ (PT5.5)
Notice that the coefficient of variation is similar in spirit to the percent relative error ($\varepsilon_t$)
discussed in Sec. 3.3. That is, it is the ratio of a measure of error ($s_y$) to an estimate of
the true value ($\bar{y}$).
EXAMPLE PT5.1 Simple Statistics of a Sample
Problem Statement. Compute the mean, variance, standard deviation, and coefficient
of variation for the data in Table PT5.1.
TABLE PT5.2 Computations for statistics for the readings of the coefficient of thermal
expansion. The frequencies and bounds are developed to construct the
histogram in Fig. PT5.2. Frequency and bounds are shown on the first reading of each interval.
                                                    Interval
 i    y_i     (y_i − ȳ)²    Frequency    Lower Bound    Upper Bound
 1   6.395    0.042025         1            6.36           6.40
 2   6.435    0.027225         1            6.40           6.44
 3   6.485    0.013225         4            6.48           6.52
 4   6.495    0.011025
 5   6.505    0.009025
 6   6.515    0.007225
 7   6.555    0.002025         2            6.52           6.56
 8   6.555    0.002025
 9   6.565    0.001225         3            6.56           6.60
10   6.575    0.000625
11   6.595    0.000025
12   6.605    0.000025         5            6.60           6.64
13   6.615    0.000225
14   6.625    0.000625
15   6.625    0.000625
16   6.635    0.001225
17   6.655    0.003025         3            6.64           6.68
18   6.655    0.003025
19   6.665    0.004225
20   6.685    0.007225         3            6.68           6.72
21   6.715    0.013225
22   6.715    0.013225
23   6.755    0.024025         1            6.72           6.76
24   6.775    0.030625         1            6.76           6.80
 Σ   158.4    0.217000
Solution. These data are added (Table PT5.2), and the results are used to compute
[Eq. (PT5.1)]
$$\bar{y} = \frac{158.4}{24} = 6.6$$
As in Table PT5.2, the sum of the squares of the residuals is 0.217000, which can be
used to compute the standard deviation [Eq. (PT5.2)]:
$$s_y = \sqrt{\frac{0.217000}{24 - 1}} = 0.097133$$
the variance [Eq. (PT5.4)]:
$$s_y^2 = 0.009435$$
and the coefficient of variation [Eq. (PT5.5)]:
$$\text{c.v.} = \frac{0.097133}{6.6}\, 100\% = 1.47\%$$
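The arithmetic of this example can be checked with a few lines of code. The following Python
fragment is a minimal sketch that applies Eqs. (PT5.1) through (PT5.5) to the readings of Table PT5.1;
the function names `describe` and `variance_shortcut` are our own and are assumptions made for
illustration.

```python
import math

# Readings from Table PT5.1, in units of 1e-6 in/(in . degF).
readings = [
    6.495, 6.595, 6.615, 6.635, 6.485, 6.555,
    6.665, 6.505, 6.435, 6.625, 6.715, 6.655,
    6.755, 6.625, 6.715, 6.575, 6.655, 6.605,
    6.565, 6.515, 6.555, 6.395, 6.775, 6.685,
]

def describe(sample):
    """Return mean, S_t, variance, standard deviation, and c.v. (percent)."""
    n = len(sample)
    mean = sum(sample) / n                        # Eq. (PT5.1)
    s_t = sum((y - mean) ** 2 for y in sample)    # Eq. (PT5.3)
    variance = s_t / (n - 1)                      # Eq. (PT5.4)
    std = math.sqrt(variance)                     # Eq. (PT5.2)
    cv = std / mean * 100.0                       # Eq. (PT5.5)
    return mean, s_t, variance, std, cv

def variance_shortcut(sample):
    """Variance from the sums of y and y**2, without precomputing the mean."""
    n = len(sample)
    return (sum(y * y for y in sample) - sum(sample) ** 2 / n) / (n - 1)

mean, s_t, variance, std, cv = describe(readings)
print(f"mean = {mean:.4f}, St = {s_t:.6f}, variance = {variance:.6f}")
print(f"std = {std:.6f}, c.v. = {cv:.2f}%")
```

The shortcut formula subtracts two nearly equal sums, so for data with a large mean and a small
spread it can lose precision in floating-point arithmetic; the two-pass form of Eq. (PT5.4) is the
safer default.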
PT5.2.2 The Normal Distribution
Another characteristic that bears on the present discussion is the data distribution—that is,
the shape with which these data are spread around the mean. A histogram provides a
simple visual representation of the distribution. As seen in Table PT5.2, the histogram is
constructed by sorting the measurements into intervals. The units of measurement are plot-
ted on the abscissa and the frequency of occurrence of each interval is plotted on the or-
dinate. Thus, five of the measurements fall between 6.60 and 6.64. As in Fig. PT5.2, the
histogram suggests that most of these data are grouped close to the mean value of 6.6.
If we have a very large set of data, the histogram often can be approximated by a
smooth curve. The symmetric, bell-shaped curve superimposed on Fig. PT5.2 is one such
characteristic shape—the normal distribution. Given enough additional measurements,
the histogram for this particular case could eventually approach the normal distribution.
The concepts of the mean, standard deviation, residual sum of the squares, and
normal distribution all have great relevance to engineering practice. A very simple example
is their use to quantify the confidence that can be ascribed to a particular measurement.
If a quantity is normally distributed, the range defined by $\bar{y} - s_y$ to $\bar{y} + s_y$ will
encompass approximately 68 percent of the total measurements. Similarly, the range
defined by $\bar{y} - 2s_y$ to $\bar{y} + 2s_y$ will encompass approximately 95 percent.
For example, for the data in Table PT5.1 ($\bar{y} = 6.6$ and $s_y = 0.097133$), we can make
the statement that approximately 95 percent of the readings should fall between 6.405734 and
6.794266. If someone told us that they had measured a value of 7.35, we would suspect that
the measurement might be erroneous. The following section elaborates on such evaluations.
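Both the histogram frequencies and the 68 and 95 percent statements can be checked directly against
the data. The following Python fragment is a minimal sketch; the helper names `histogram_counts`
and `fraction_within` are hypothetical, and the bins (6.36 to 6.80 in steps of 0.04) are those listed
in Table PT5.2.

```python
import math
import statistics

# Readings from Table PT5.1, in units of 1e-6 in/(in . degF).
readings = [
    6.495, 6.595, 6.615, 6.635, 6.485, 6.555,
    6.665, 6.505, 6.435, 6.625, 6.715, 6.655,
    6.755, 6.625, 6.715, 6.575, 6.655, 6.605,
    6.565, 6.515, 6.555, 6.395, 6.775, 6.685,
]

def histogram_counts(sample, lower, width, n_bins):
    """Count how many values fall in each of n_bins equal-width intervals."""
    counts = [0] * n_bins
    for y in sample:
        k = math.floor((y - lower) / width)   # index of the interval containing y
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts

def fraction_within(sample, k):
    """Fraction of the sample lying within k sample standard deviations of the mean."""
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)              # uses n - 1, as in Eq. (PT5.2)
    return sum(1 for y in sample if abs(y - mean) <= k * s) / len(sample)

counts = histogram_counts(readings, lower=6.36, width=0.04, n_bins=11)
print("frequencies:", counts)
print(f"within 1 s_y: {fraction_within(readings, 1):.1%}")
print(f"within 2 s_y: {fraction_within(readings, 2):.1%}")
```

For these 24 readings the fractions are 16/24 and 23/24, close to the 68 and 95 percent expected
for a normal distribution.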
PT5.2.3 Estimation of Confidence Intervals
As should be clear from the previous sections, one of the primary aims of statistics is
to estimate the properties of a population based on a limited sample drawn from that
population. Clearly, it is impossible to measure the coefficient of thermal expansion for
every piece of structural steel that has ever been produced. Consequently, as seen in
Tables PT5.1 and PT5.2, we can randomly make a number of measurements and, on the
basis of the sample, attempt to characterize the properties of the entire population.
Because we “infer” properties of the unknown population from a limited sample,
the endeavor is called statistical inference. Because the results are often reported as
estimates of the population parameters, the process is also referred to as estimation.
We have already shown how we estimate the central tendency (sample mean, $\bar{y}$) and
spread (sample standard deviation and variance) of a limited sample. Now, we will briefly
describe how we can attach probabilistic statements to the quality of these estimates. In
particular, we will discuss how we can define a confidence interval around our estimate
of the mean. We have chosen this particular topic because of its direct relevance to the
regression models we will be describing in Chap. 17.
Note that in the following discussion, the nomenclature $\bar{y}$ and $s_y$ refer to the sample
mean and standard deviation, respectively. The nomenclature $\mu$ and $\sigma$ refer to the population
mean and standard deviation, respectively. The former are sometimes referred to
as the “estimated” mean and standard deviation, whereas the latter are sometimes called
the “true” mean and standard deviation.
An interval estimator gives the range of values within which the parameter is ex-
pected to lie with a given probability. Such intervals are described as being one-sided or
two-sided. As the name implies, a one-sided interval expresses our confidence that the
parameter estimate is less than or greater than the true value. In contrast, the two-sided
interval deals with the more general proposition that the estimate agrees with the truth
with no consideration to the sign of the discrepancy. Because it is more general, we will
focus on the two-sided interval.
FIGURE PT5.2
A histogram used to depict the distribution of data. As the number of data points increases, the
histogram could approach the smooth, bell-shaped curve called the normal distribution.
[Histogram of frequency (0 to 5) versus the measured coefficient (6.4 to 6.8), with a superimposed
bell-shaped curve.]
A two-sided interval can be described by the statement
$$P\{L \le \mu \le U\} = 1 - \alpha$$
which reads, “the probability that the true mean of $y$, $\mu$, falls within the bound from
$L$ to $U$ is $1 - \alpha$.” The quantity $\alpha$ is called the significance level. So the problem of
defining a confidence interval reduces to estimating $L$ and $U$. Although it is not absolutely
necessary, it is customary to view the two-sided interval with the $\alpha$ probability
distributed evenly as $\alpha/2$ in each tail of the distribution, as in Fig. PT5.3.
If the true variance of the distribution of $y$, $\sigma^2$, is known (which is not usually the
case), statistical theory states that the sample mean $\bar{y}$ comes from a normal distribution
with mean $\mu$ and variance $\sigma^2/n$ (Box PT5.1). In the case illustrated in Fig. PT5.3, we
really do not know $\mu$. Therefore, we do not know where the normal curve is exactly
located with respect to $\bar{y}$. To circumvent this dilemma, we compute a new quantity, the
standard normal estimate
$$z = \frac{\bar{y} - \mu}{\sigma/\sqrt{n}}$$ (PT5.6)
which represents the normalized distance between $\bar{y}$ and $\mu$. According to statistical theory,
this quantity should be normally distributed with a mean of 0 and a variance of 1.
Furthermore, the probability that $z$ would fall within the unshaded region of Fig. PT5.3
should be $1 - \alpha$. Therefore, the statement can be made that
$$\frac{\bar{y} - \mu}{\sigma/\sqrt{n}} < -z_{\alpha/2} \quad \text{or} \quad \frac{\bar{y} - \mu}{\sigma/\sqrt{n}} > z_{\alpha/2}$$
with a probability of $\alpha$.
FIGURE PT5.3
A two-sided confidence interval. The abscissa scale in (a) is written in the natural units of the random
variable y. The normalized version of the abscissa in (b) has the mean at the origin and
scales the axis so that the standard deviation corresponds to a unit value.
[Panel (a): distribution of means of y, with L, μ, and U marked on the abscissa, a central area of 1 − α,
and an area of α/2 in each tail. Panel (b): the normalized abscissa with −z(α/2), −1, 0, 1, and z(α/2) marked.]
The quantity $z_{\alpha/2}$ is a standard normal random variable. This is the distance measured
along the normalized axis above and below the mean that encompasses $1 - \alpha$ probability
(Fig. PT5.3b). Values of $z_{\alpha/2}$ are tabulated in statistics books (for example, Milton and
Arnold, 2002). They can also be calculated using functions on software packages like
Excel, MATLAB, and Mathcad. As an example, for $\alpha = 0.05$ (in other words, defining
an interval encompassing 95%), $z_{\alpha/2}$ is equal to about 1.96. This means that an interval
around the mean of width $\pm 1.96$ times the standard deviation will encompass approximately
95% of the distribution.
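As an illustration of such a calculation outside the packages named above, the following Python
fragment is a minimal sketch that uses the standard library's `statistics.NormalDist`; the wrapper
name `z_critical` is our own and is an assumption made for illustration.

```python
from statistics import NormalDist

def z_critical(alpha):
    """Return z_{alpha/2}: the point with probability alpha/2 in the upper tail."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

z = z_critical(0.05)
# Probability enclosed between -z and +z on the normalized axis.
coverage = NormalDist().cdf(z) - NormalDist().cdf(-z)
print(f"z = {z:.4f}, coverage = {coverage:.4f}")
```

For a significance level of 0.05 this reproduces the value of about 1.96 quoted above.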
These results can be rearranged to yield
$$L \le \mu \le U$$
with a probability of $1 - \alpha$, where
$$L = \bar{y} - \frac{\sigma}{\sqrt{n}} z_{\alpha/2} \qquad U = \bar{y} + \frac{\sigma}{\sqrt{n}} z_{\alpha/2}$$ (PT5.7)
Box PT5.1 A Little Statistics
Most engineers take several courses to become proficient at statistics.
Because you may not have taken such a course yet, we would
like to mention a few ideas that might make this present section
more coherent.
As we have stated, the “game” of inferential statistics assumes
that the random variable you are sampling, $y$, has a true mean ($\mu$)
and variance ($\sigma^2$). Further, in the present discussion, we also assume
that it has a particular distribution: the normal distribution.
The variance of this normal distribution has a finite value that specifies
the “spread” of the normal distribution. If the variance is large,
the distribution is broad. Conversely, if the variance is small, the
distribution is narrow. Thus, the true variance quantifies the intrinsic
uncertainty of the random variable.
In the game of statistics, we take a limited number of measurements
of this quantity called a sample. From this sample, we can
compute an estimated mean ($\bar{y}$) and variance ($s_y^2$). The more measurements
we take, the better the estimates approximate the true
values. That is, as $n \to \infty$, $\bar{y} \to \mu$ and $s_y^2 \to \sigma^2$.
Suppose that we take $n$ measurements and compute an estimated mean
$\bar{y}_1$. Then, we take another $n$ measurements and compute another, $\bar{y}_2$. We
can keep repeating this process until we have generated a sample of
means: $\bar{y}_1, \bar{y}_2, \bar{y}_3, \ldots, \bar{y}_m$, where $m$ is large. We can then develop a
histogram of these means and determine a “distribution of the
means,” as well as a “mean of the means” and a “standard deviation
of the means.” Now the question arises: does this new distribution
of means and its statistics behave in a predictable fashion?
There is an extremely important theorem known as the Central
Limit Theorem that speaks directly to this question. It can be stated
as
Let $y_1, y_2, \ldots, y_n$ be a random sample of size $n$ from a distribution
with mean $\mu$ and variance $\sigma^2$. Then, for large $n$, $\bar{y}$ is approximately
normal with mean $\mu$ and variance $\sigma^2/n$. Furthermore, for
large $n$, the random variable $(\bar{y} - \mu)/(\sigma/\sqrt{n})$ is approximately
standard normal.
Thus, the theorem states the remarkable result that, for sufficiently
large $n$, the distribution of means will be approximately normal regardless
of the underlying distribution of the random variables! It also
yields the expected result that given a sufficiently large sample,
the mean of the means should converge on the true population
mean $\mu$.
Further, the theorem says that as the sample size gets larger, the
variance of the means should approach zero. This makes sense,
because if $n$ is small, our individual estimates of the mean should
be poor and the variance of the means should be large. As $n$ increases,
our estimates of the mean will improve and hence their
spread should shrink. The Central Limit Theorem neatly defines
exactly how this shrinkage relates to both the true variance and the
sample size, that is, as $\sigma^2/n$.
Finally, the theorem states the important result that we have
given as Eq. (PT5.6). As is shown in this section, this result is the
basis for constructing confidence intervals for the mean.
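The behavior described in Box PT5.1 can be observed numerically. The following Python fragment is a
minimal sketch, under assumptions of our own choosing: the underlying variable is exponential with
$\mu = 1$ and $\sigma^2 = 1$ (a deliberately non-normal choice), each mean is based on $n = 30$
values, and $m = 5000$ means are generated. The function name `sample_means` is hypothetical.

```python
import random
import statistics

def sample_means(draw, n, m, seed=0):
    """Generate m sample means, each computed from n values produced by draw(rng)."""
    rng = random.Random(seed)                 # fixed seed so the run is repeatable
    return [sum(draw(rng) for _ in range(n)) / n for _ in range(m)]

n, m = 30, 5000
# Exponential distribution with rate 1: true mean 1 and true variance 1.
means = sample_means(lambda rng: rng.expovariate(1.0), n, m)

mean_of_means = statistics.mean(means)
var_of_means = statistics.variance(means)
print(f"mean of the means     = {mean_of_means:.4f} (true mean is 1)")
print(f"variance of the means = {var_of_means:.5f} (sigma^2/n is {1.0 / n:.5f})")
```

The mean of the means should lie close to the true mean, and the variance of the means should lie
close to $\sigma^2/n$, even though the individual values are far from normally distributed.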
Now, although the foregoing provides an estimate of $L$ and $U$, it is based on knowledge
of the true standard deviation $\sigma$. For our case, we know only the estimated standard
deviation $s_y$. A straightforward alternative would be to develop a version of Eq. (PT5.6) based on $s_y$,
$$t = \frac{\bar{y} - \mu}{s_y/\sqrt{n}}$$ (PT5.8)
Even when we sample from a normal distribution, this fraction will not be normally
distributed, particularly when $n$ is small. It was found by W. S. Gosset that the random
variable defined by Eq. (PT5.8) follows the so-called Student-t, or simply, t distribution.
For this case,
$$L = \bar{y} - \frac{s_y}{\sqrt{n}} t_{\alpha/2,\, n-1} \qquad U = \bar{y} + \frac{s_y}{\sqrt{n}} t_{\alpha/2,\, n-1}$$ (PT5.9)
where $t_{\alpha/2,\, n-1}$ is the standard random variable for the t distribution for a probability of
$\alpha/2$. As was the case for $z_{\alpha/2}$, values are tabulated in statistics books and can also be
calculated using software packages and libraries. For example, if $\alpha = 0.05$ and $n = 20$
(that is, 19 degrees of freedom), $t_{\alpha/2,\, n-1} = 2.093$.
The t distribution can be thought of as a modification of the normal distribution that
accounts for the fact that we have an imperfect estimate of the standard deviation. When
$n$ is small, it tends to be flatter than the normal (see Fig. PT5.4). Therefore, for small
numbers of measurements, it yields wider and hence more conservative confidence intervals.
As $n$ grows larger, the t distribution converges on the normal.
FIGURE PT5.4
Comparison of the normal distribution with the t distribution for n = 3 and n = 6. Notice how
the t distribution is generally flatter. [Curves labeled Normal, t(n = 6), and t(n = 3) plotted
against Z or t from −3 to 3.]
EXAMPLE PT5.2 Confidence Interval on the Mean
Problem Statement. Determine the mean and the corresponding 95% confidence interval
for the data from Table PT5.1. Perform three estimates based on (a) the first 8, (b) the first
16, and (c) all 24 measurements.
Solution. (a) The mean and standard deviation for the first 8 points are
$$\bar{y} = \frac{52.72}{8} = 6.59 \qquad s_y = \sqrt{\frac{347.4814 - (52.72)^2/8}{8 - 1}} = 0.089921$$
The appropriate t statistic can be calculated as
$$t_{0.05/2,\, 8-1} = t_{0.025,\, 7} = 2.364623$$
which can be used to compute the interval
$$L = 6.59 - \frac{0.089921}{\sqrt{8}}\, 2.364623 = 6.5148$$
$$U = 6.59 + \frac{0.089921}{\sqrt{8}}\, 2.364623 = 6.6652$$
or
$$6.5148 \le \mu \le 6.6652$$
FIGURE PT5.5
Estimates of the mean and 95% confidence intervals for different sample sizes.
[Intervals for n = 8, n = 16, and n = 24 plotted against the coefficient of thermal expansion,
$\times 10^{-6}$ in/(in $\cdot$ °F), on an axis running from 6.50 to 6.70, with the sample mean marked on each.]
Thus, based on the first eight measurements, we conclude that there is a 95% probability
that the true mean falls within the range 6.5148 to 6.6652.
The two other cases for (b) 16 points and (c) 24 points can be calculated in a
similar fashion and the results tabulated along with case (a) as
 n     ȳ         s_y         t(α/2, n−1)    L         U
 8    6.5900    0.089921    2.364623      6.5148    6.6652
16    6.5794    0.095845    2.131451      6.5283    6.6304
24    6.6000    0.097133    2.068655      6.5590    6.6410
These results, which are also summarized in Fig. PT5.5, indicate the expected outcome
that the confidence interval becomes narrower as $n$ increases. Thus, the more measurements
we take, the more refined our estimate of the true value becomes.
The above is just one simple example of how statistics can be used to make judg-
ments regarding uncertain data. These concepts will also have direct relevance to our
discussion of regression models. You can consult any basic statistics book (for example,
Milton and Arnold, 2002) to obtain additional information on the subject.
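The three intervals of Example PT5.2 can be reproduced with Eq. (PT5.9). The following Python
fragment is a minimal sketch with two assumptions that should be stated plainly. First, the
readings are listed by reading Table PT5.1 down its columns, because that ordering reproduces the
sum of 52.72 quoted for the first 8 points. Second, the t values are simply copied from the example
rather than computed, since Python's standard library has no t distribution. The function name
`confidence_interval` is our own.

```python
import math

# Table PT5.1 read column by column (assumed ordering; see the note above).
readings = [
    6.495, 6.665, 6.755, 6.565,
    6.595, 6.505, 6.625, 6.515,
    6.615, 6.435, 6.715, 6.555,
    6.635, 6.625, 6.575, 6.395,
    6.485, 6.715, 6.655, 6.775,
    6.555, 6.655, 6.605, 6.685,
]

# t_{0.025, n-1} for each sample size, copied from Example PT5.2.
T_CRIT = {8: 2.364623, 16: 2.131451, 24: 2.068655}

def confidence_interval(sample, t_crit):
    """Two-sided interval on the mean from Eq. (PT5.9), given the t value."""
    n = len(sample)
    mean = sum(sample) / n
    s_y = math.sqrt(sum((y - mean) ** 2 for y in sample) / (n - 1))
    half_width = t_crit * s_y / math.sqrt(n)
    return mean - half_width, mean + half_width

intervals = {n: confidence_interval(readings[:n], t) for n, t in T_CRIT.items()}
for n, (lower, upper) in intervals.items():
    print(f"n = {n:2d}: {lower:.4f} <= mu <= {upper:.4f}")
```

If a statistics library is available, the tabulated t values could be replaced by a call to its
inverse t distribution function; that step is omitted here to keep the sketch self-contained.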
PT5.3 ORIENTATION
Before we proceed to numerical methods for curve fitting, some orientation might be
helpful. The following is intended as an overview of the material discussed in Part Five.
In addition, we have formulated some objectives to help focus your efforts when study-
ing the material.
PT5.3.1 Scope and Preview
Figure PT5.6 provides a visual overview of the material to be covered in Part Five.
Chapter 17 is devoted to least-squares regression. We will first learn how to fit the
“best” straight line through a set of uncertain data points. This technique is called lin-
ear regression. Besides discussing how to calculate the slope and intercept of this
straight line, we also present quantitative and visual methods for evaluating the validity
of the results.
In addition to fitting a straight line, we also present a general technique for fitting
a “best’’ polynomial. Thus, you will learn to derive a parabolic, cubic, or higher-order
polynomial that optimally fits uncertain data. Linear regression is a subset of this more
general approach, which is called polynomial regression.
The next topic covered in Chap. 17 is multiple linear regression. It is designed for
the case where the dependent variable y is a linear function of two or more independent
variables x1, x2, . . . , xm. This approach has special utility for evaluating experimental
data where the variable of interest is dependent on a number of different factors.
After multiple regression, we illustrate how polynomial and multiple regression are
both subsets of a general linear least-squares model. Among other things, this will allow
us to introduce a concise matrix representation of regression and discuss its general
statistical properties.
FIGURE PT5.6
Schematic of the organization of the material in Part Five: Curve Fitting.
[Schematic: Part 5, Curve Fitting, is organized as follows.
PT 5.1 Motivation; PT 5.2 Mathematical background; PT 5.3 Orientation.
Chapter 17, Least-Squares Regression: 17.1 Linear regression; 17.2 Polynomial regression;
17.3 Multiple regression; 17.4 General linear least squares; 17.5 Nonlinear regression.
Chapter 18, Interpolation: 18.1 Newton polynomial; 18.2 Lagrange polynomial; 18.3 Polynomial
coefficients; 18.4 Inverse interpolation; 18.5 Additional comments; 18.6 Splines;
18.7 Multidimensional interpolation.
Chapter 19, Fourier Approximation: 19.1 Sinusoids; 19.2 Continuous Fourier series; 19.3 Frequency
and time domains; 19.4 Fourier transform; 19.5 Discrete Fourier transform; 19.6 Fast Fourier
transform; 19.7 Power spectrum; 19.8 Software packages.
Chapter 20, Case Studies: 20.1 Chemical engineering; 20.2 Civil engineering; 20.3 Electrical
engineering; 20.4 Mechanical engineering.
Epilogue: PT 5.4 Trade-offs; PT 5.5 Important formulas; PT 5.6 Advanced methods.]
Finally, the last sections of Chap. 17 are devoted to nonlinear regression. This ap-
proach is designed to compute a least-squares fit of a nonlinear equation to data.
In Chap. 18, the alternative curve-fitting technique called interpolation is de-
scribed. As discussed previously, interpolation is used for estimating intermediate
values between precise data points. In Chap. 18, polynomials are derived for this
purpose. We introduce the basic concept of polynomial interpolation by using straight
lines and parabolas to connect points. Then, we develop a generalized procedure for
fitting an nth-order polynomial. Two formats are presented for expressing these poly-
nomials in equation form. The first, called Newton’s interpolating polynomial, is pref-
erable when the appropriate order of the polynomial is unknown. The second, called
the Lagrange interpolating polynomial, has advantages when the proper order is
known beforehand.
The next section of Chap. 18 presents an alternative technique for fitting precise data
points. This technique, called spline interpolation, fits polynomials to data but in a piece-
wise fashion. As such, it is particularly well-suited for fitting data that are generally
smooth but exhibit abrupt local changes. Finally, we provide a brief introduction to
multidimensional interpolation.
Chapter 19 deals with the Fourier transform approach to curve fitting where periodic
functions are fit to data. Our emphasis in this section will be on the fast Fourier trans-
form. At the end of this chapter, we also include an overview of several software pack-
ages that can be used for curve fitting. These are Excel, MATLAB, and Mathcad.
Chapter 20 is devoted to engineering applications that illustrate the utility of the
numerical methods in engineering problem contexts. Examples are drawn from the four
major specialty areas of chemical, civil, electrical, and mechanical engineering. In addi-
tion, some of the applications illustrate how software packages can be applied for engi-
neering problem solving.
Finally, an epilogue is included at the end of Part Five. It contains a summary of
the important formulas and concepts related to curve fitting as well as a discussion of
trade-offs among the techniques and suggestions for future study.
PT5.3.2 Goals and Objectives
Study Objectives. After completing Part Five, you should have greatly enhanced your
capability to fit curves to data. In general, you should have mastered the techniques, have
learned to assess the reliability of the answers, and be capable of choosing the preferred
method (or methods) for any particular problem. In addition to these general goals, the
specific concepts in Table PT5.3 should be assimilated and mastered.
Computer Objectives. You have been provided with simple computer algorithms to
implement the techniques discussed in Part Five. You may also have access to software
packages and libraries. All have utility as learning tools.
Pseudocode algorithms are provided for most of the methods in Part Five. This
information will allow you to expand your software library to include techniques beyond
polynomial regression. For example, you may find it useful from a professional view-
point to have software to implement multiple linear regression, Newton’s interpolating
polynomial, cubic spline interpolation, and the fast Fourier transform.
In addition, one of your most important goals should be to master several of the
general-purpose software packages that are widely available. In particular, you should
become adept at using these tools to implement numerical methods for engineering
problem solving.
TABLE PT5.3 Specific study objectives for Part Five.
1. Understand the fundamental difference between regression and interpolation and realize why
confusing the two could lead to serious problems
2. Understand the derivation of linear least-squares regression and be able to assess the reliability of
the fit using graphical and quantitative assessments
3. Know how to linearize data by transformation
4. Understand situations where polynomial, multiple, and nonlinear regression are appropriate
5. Be able to recognize general linear models, understand the general matrix formulation of linear least
squares, and know how to compute confidence intervals for parameters
6. Understand that there is one and only one polynomial of degree n or less that passes exactly
through n + 1 points
7. Know how to derive the first-order Newton’s interpolating polynomial
8. Understand the analogy between Newton’s polynomial and the Taylor series expansion and how it
relates to the truncation error
9. Recognize that the Newton and Lagrange equations are merely different formulations of the same
interpolating polynomial and understand their respective advantages and disadvantages
10. Realize that more accurate results are generally obtained if data used for interpolation are centered
around and close to the unknown point
11. Realize that data points do not have to be equally spaced nor in any particular order for either the
Newton or Lagrange polynomials
12. Know why equispaced interpolation formulas have utility
13. Recognize the liabilities and risks associated with extrapolation
14. Understand why spline functions have utility for data with local areas of abrupt change
15. Understand how interpolating polynomials can be applied in two dimensions
16. Recognize how the Fourier series is used to fit data with periodic functions
17. Understand the difference between the frequency and time domains
CHAPTER 17
Least-Squares Regression
Where substantial error is associated with data, polynomial interpolation is inappropriate
and may yield unsatisfactory results when used to predict intermediate values. Experi-
mental data are often of this type. For example, Fig. 17.1a shows seven experimentally
derived data points exhibiting significant variability. Visual inspection of these data sug-
gests a positive relationship between y and x. That is, the overall trend indicates that
higher values of y are associated with higher values of x. Now, if a sixth-order interpo-
lating polynomial is fitted to these data (Fig. 17.1b), it will pass exactly through all of
the points. However, because of the variability in these data, the curve oscillates widely
in the interval between the points. In particular, the interpolated values at x = 1.5 and
x = 6.5 appear to be well beyond the range suggested by these data.
A more appropriate strategy for such cases is to derive an approximating function
that fits the shape or general trend of the data without necessarily matching the indi-
vidual points. Figure 17.1c illustrates how a straight line can be used to generally char-
acterize the trend of these data without passing through any particular point.
One way to determine the line in Fig. 17.1c is to visually inspect the plotted data
and then sketch a “best” line through the points. Although such “eyeball” approaches
have commonsense appeal and are valid for “back-of-the-envelope” calculations, they are
deficient because they are arbitrary. That is, unless the points define a perfect straight
line (in which case, interpolation would be appropriate), different analysts would draw
different lines.
To remove this subjectivity, some criterion must be devised to establish a basis for
the fit. One way to do this is to derive a curve that minimizes the discrepancy between
the data points and the curve. A technique for accomplishing this objective, called least-
squares regression, will be discussed in the present chapter.
17.1 LINEAR REGRESSION
The simplest example of a least-squares approximation is fitting a straight line to a set
of paired observations: (x1, y1), (x2, y2), . . . , (xn, yn). The mathematical expression for
the straight line is
$$y = a_0 + a_1 x + e \tag{17.1}$$
where $a_0$ and $a_1$ are coefficients representing the intercept and the slope, respectively,
and e is the error, or residual, between the model and the observations, which can be
represented by rearranging Eq. (17.1) as
$$e = y - a_0 - a_1 x$$
Thus, the error, or residual, is the discrepancy between the true value of y and the
approximate value, $a_0 + a_1 x$, predicted by the linear equation.
FIGURE 17.1
(a) Data exhibiting significant error. (b) Polynomial fit oscillating beyond the range of
the data. (c) More satisfactory result using the least-squares fit.
17.1.1 Criteria for a “Best” Fit
One strategy for fitting a “best” line through the data would be to minimize the sum of
the residual errors for all the available data, as in
$$\sum_{i=1}^{n} e_i = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i) \tag{17.2}$$
where n = total number of points. However, this is an inadequate criterion, as illustrated
by Fig. 17.2a, which depicts the fit of a straight line to two points. Obviously, the best
fit is the line connecting the points. However, any straight line passing through the
midpoint of the connecting line (except a perfectly vertical line) results in a minimum value
of Eq. (17.2) equal to zero because the errors cancel.
FIGURE 17.2
Examples of some criteria for “best fit” that are inadequate for regression: (a) minimizes the sum
of the residuals, (b) minimizes the sum of the absolute values of the residuals, and (c) minimizes
the maximum error of any individual point.
Therefore, another logical criterion might be to minimize the sum of the absolute
values of the discrepancies, as in
$$\sum_{i=1}^{n} |e_i| = \sum_{i=1}^{n} |y_i - a_0 - a_1 x_i|$$
Figure 17.2b demonstrates why this criterion is also inadequate. For the four points
shown, any straight line falling within the dashed lines will minimize the sum of the
absolute values. Thus, this criterion also does not yield a unique best fit.
A third strategy for fitting a best line is the minimax criterion. In this technique,
the line is chosen that minimizes the maximum distance that an individual point
falls from the line. As depicted in Fig. 17.2c, this strategy is ill-suited for regres-
sion because it gives undue influence to an outlier, that is, a single point with a
large error. It should be noted that the minimax principle is sometimes well-suited
for fitting a simple function to a complicated function (Carnahan, Luther, and
Wilkes, 1969).
A strategy that overcomes the shortcomings of the aforementioned approaches is to
minimize the sum of the squares of the residuals between the measured y and the y
calculated with the linear model
$$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_{i,\text{measured}} - y_{i,\text{model}})^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2 \tag{17.3}$$
This criterion has a number of advantages, including the fact that it yields a unique line
for a given set of data. Before discussing these properties, we will present a technique
for determining the values of a0 and a1 that minimize Eq. (17.3).
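The contrast between the first criterion and the sum-of-squares criterion can be checked numerically. The sketch below is illustrative only: the two data points and the candidate slopes are hypothetical values chosen for the demonstration, not data from the text. Every line through the midpoint gives a zero sum of residuals, whereas the sum of the squares is zero only for the line that connects the points.

```python
def sum_residuals(x, y, a0, a1):
    """Sum of the residuals, as in Eq. (17.2)."""
    return sum(yi - a0 - a1 * xi for xi, yi in zip(x, y))

def sum_squared_residuals(x, y, a0, a1):
    """Sum of the squares of the residuals, as in Eq. (17.3)."""
    return sum((yi - a0 - a1 * xi) ** 2 for xi, yi in zip(x, y))

# Two hypothetical points and their midpoint
x = [1.0, 3.0]
y = [1.0, 5.0]
x_mid, y_mid = 2.0, 3.0

# Candidate lines through the midpoint: y = y_mid + slope * (x - x_mid)
results = []
for slope in [-1.0, 0.0, 2.0, 5.0]:
    a0 = y_mid - slope * x_mid
    results.append((slope,
                    sum_residuals(x, y, a0, slope),
                    sum_squared_residuals(x, y, a0, slope)))

for slope, s1, s2 in results:
    print(f"slope = {slope:5.1f}: sum of residuals = {s1:.1f}, sum of squares = {s2:.1f}")
```

The slope of 2 corresponds to the line connecting the two points, which is the only candidate that the sum-of-squares criterion singles out.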
17.1.2 Least-Squares Fit of a Straight Line
To determine values for $a_0$ and $a_1$, Eq. (17.3) is differentiated with respect to each
coefficient:
$$\frac{\partial S_r}{\partial a_0} = -2 \sum (y_i - a_0 - a_1 x_i)$$
$$\frac{\partial S_r}{\partial a_1} = -2 \sum [(y_i - a_0 - a_1 x_i)\,x_i]$$
Note that we have simplified the summation symbols; unless otherwise indicated, all
summations are from i = 1 to n. Setting these derivatives equal to zero will result in a
minimum $S_r$. If this is done, the equations can be expressed as
$$0 = \sum y_i - \sum a_0 - \sum a_1 x_i$$
$$0 = \sum y_i x_i - \sum a_0 x_i - \sum a_1 x_i^2$$
Now, realizing that $\sum a_0 = n a_0$, we can express the equations as a set of two
simultaneous linear equations with two unknowns ($a_0$ and $a_1$):
$$n a_0 + \left(\sum x_i\right) a_1 = \sum y_i \tag{17.4}$$
$$\left(\sum x_i\right) a_0 + \left(\sum x_i^2\right) a_1 = \sum x_i y_i \tag{17.5}$$
These are called the normal equations. They can be solved simultaneously for
$$a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2} \tag{17.6}$$
This result can then be used in conjunction with Eq. (17.4) to solve for
$$a_0 = \bar{y} - a_1 \bar{x} \tag{17.7}$$
where $\bar{y}$ and $\bar{x}$ are the means of y and x, respectively.
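As a check on the derivation, the normal equations can be assembled and solved directly as a 2 × 2 linear system. The following is a minimal sketch, assuming Cramer's rule as the solution method (the text does not prescribe one); the function name and the four data points are hypothetical.

```python
def solve_normal_equations(x, y):
    """Assemble Eqs. (17.4) and (17.5) and solve them by Cramer's rule."""
    n = len(x)
    sx = sum(x)
    sy = sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    # [ n    sx  ] [a0]   [ sy  ]
    # [ sx   sxx ] [a1] = [ sxy ]
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    a0 = (sy * sxx - sx * sxy) / det
    a1 = (n * sxy - sx * sy) / det   # identical to Eq. (17.6)
    return a0, a1

# Hypothetical data scattered about y = 1 + 2x
x = [0.0, 1.0, 2.0, 3.0]
y = [1.1, 2.9, 5.2, 6.8]
a0, a1 = solve_normal_equations(x, y)

# Eq. (17.7) should give the same intercept
a0_from_means = sum(y) / len(y) - a1 * sum(x) / len(x)
print(a0, a1, a0_from_means)
```

The determinant of the system is the denominator of Eq. (17.6), which is why the closed-form slope and the solved system agree.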
EXAMPLE 17.1 Linear Regression
Problem Statement. Fit a straight line to the x and y values in the first two columns
of Table 17.1.
Solution. The following quantities can be computed:
$$n = 7 \qquad \sum x_i y_i = 119.5 \qquad \sum x_i^2 = 140$$
$$\sum x_i = 28 \qquad \bar{x} = \frac{28}{7} = 4$$
$$\sum y_i = 24 \qquad \bar{y} = \frac{24}{7} = 3.428571$$
Using Eqs. (17.6) and (17.7),
$$a_1 = \frac{7(119.5) - 28(24)}{7(140) - (28)^2} = 0.8392857$$
$$a_0 = 3.428571 - 0.8392857(4) = 0.07142857$$
Therefore, the least-squares fit is
$$y = 0.07142857 + 0.8392857x$$
The line, along with the data, is shown in Fig. 17.1c.
TABLE 17.1 Computations for an error analysis of the linear fit.
xi    yi     (yi − ȳ)^2    (yi − a0 − a1xi)^2
1     0.5    8.5765        0.1687
2     2.5    0.8622        0.5625
3     2.0    2.0408        0.3473
4     4.0    0.3265        0.3265
5     3.5    0.0051        0.5896
6     6.0    6.6122        0.7972
7     5.5    4.2908        0.1993
Σ     24.0   22.7143       2.9911
17.1.3 Quantification of Error of Linear Regression
Any line other than the one computed in Example 17.1 results in a larger sum of the
squares of the residuals. Thus, the line is unique and in terms of our chosen criterion is
a “best” line through the points. A number of additional properties of this fit can be
elucidated by examining more closely the way in which residuals were computed. Recall
that the sum of the squares is defined as [Eq. (17.3)]
$$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2 \tag{17.8}$$
Notice the similarity between Eqs. (PT5.3) and (17.8). In the former case, the square
of the residual represented the square of the discrepancy between the data and a single
estimate of the measure of central tendency: the mean. In Eq. (17.8), the square of the
residual represents the square of the vertical distance between the data and another
measure of central tendency: the straight line (Fig. 17.3).
FIGURE 17.3
The residual in linear regression represents the vertical distance between a data point and the
straight line.
[Plot of y versus x marking a measurement (xi, yi), the regression-line value a0 + a1xi, and the
residual yi − a0 − a1xi between them.]
The analogy can be extended further for cases where (1) the spread of the points
around the line is of similar magnitude along the entire range of the data and (2) the
distribution of these points about the line is normal. It can be demonstrated that if these
criteria are met, least-squares regression will provide the best (that is, the most likely)
estimates of $a_0$ and $a_1$ (Draper and Smith, 1981). This is called the maximum likelihood
principle in statistics. In addition, if these criteria are met, a “standard deviation” for the
regression line can be determined as [compare with Eq. (PT5.2)]
$$s_{y/x} = \sqrt{\frac{S_r}{n - 2}} \tag{17.9}$$
where $s_{y/x}$ is called the standard error of the estimate. The subscript notation “y/x”
designates that the error is for a predicted value of y corresponding to a particular value of x.
Also, notice that we now divide by n − 2 because two data-derived estimates, $a_0$ and
$a_1$, were used to compute $S_r$; thus, we have lost two degrees of freedom. As with our
discussion of the standard deviation in PT5.2.1, another justification for dividing by n − 2
is that there is no such thing as the “spread of data” around a straight line connecting two
points. Thus, for the case where n = 2, Eq. (17.9) yields a meaningless result of infinity.
Just as was the case with the standard deviation, the standard error of the estimate
quantifies the spread of the data. However, $s_{y/x}$ quantifies the spread around the regression
line as shown in Fig. 17.4b in contrast to the original standard deviation $s_y$ that quantified
the spread around the mean (Fig. 17.4a).
FIGURE 17.4
Regression data showing (a) the spread of the data around the mean of the dependent variable
and (b) the spread of the data around the best-fit line. The reduction in the spread in going from
(a) to (b), as indicated by the bell-shaped curves at the right, represents the improvement due to
linear regression.
The above concepts can be used to quantify the “goodness” of our fit. This is
particularly useful for comparison of several regressions (Fig. 17.5). To do this, we return
to the original data and determine the total sum of the squares around the mean for the
dependent variable (in our case, y). As was the case for Eq. (PT5.3), this quantity is
designated $S_t$. This is the magnitude of the residual error associated with the dependent
variable prior to regression. After performing the regression, we can compute $S_r$, the sum
of the squares of the residuals around the regression line. This characterizes the residual
error that remains after the regression. It is, therefore, sometimes called the unexplained
sum of the squares. The difference between the two quantities, $S_t - S_r$, quantifies the
improvement or error reduction due to describing the data in terms of a straight line rather
than as an average value. Because the magnitude of this quantity is scale-dependent, the
difference is normalized to $S_t$ to yield
$$r^2 = \frac{S_t - S_r}{S_t} \tag{17.10}$$
where $r^2$ is called the coefficient of determination and r is the correlation coefficient
($= \sqrt{r^2}$). For a perfect fit, $S_r = 0$ and $r = r^2 = 1$, signifying that the line explains 100
percent of the variability of the data. For $r = r^2 = 0$, $S_r = S_t$ and the fit represents no
improvement. An alternative formulation for r that is more convenient for computer
implementation is
$$r = \frac{n \sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{\sqrt{n \sum x_i^2 - \left(\sum x_i\right)^2}\,\sqrt{n \sum y_i^2 - \left(\sum y_i\right)^2}} \tag{17.11}$$
FIGURE 17.5
Examples of linear regression with (a) small and (b) large residual errors.
EXAMPLE 17.2 Estimation of Errors for the Linear Least-Squares Fit
Problem Statement. Compute the total standard deviation, the standard error of the
estimate, and the correlation coefficient for the data in Example 17.1.
Solution. The summations are performed and presented in Table 17.1. The standard
deviation is [Eq. (PT5.2)]
$$s_y = \sqrt{\frac{22.7143}{7 - 1}} = 1.9457$$
and the standard error of the estimate is [Eq. (17.9)]
$$s_{y/x} = \sqrt{\frac{2.9911}{7 - 2}} = 0.7735$$
Thus, because $s_{y/x} < s_y$, the linear regression model has merit. The extent of the
improvement is quantified by [Eq. (17.10)]
$$r^2 = \frac{22.7143 - 2.9911}{22.7143} = 0.868$$
or
$$r = \sqrt{0.868} = 0.932$$
These results indicate that 86.8 percent of the original uncertainty has been explained by
the linear model.
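The error measures of this example can be reproduced with a short script. The sketch below is a hedged illustration rather than the book's program: it reuses the data and coefficients of Example 17.1, the variable names are our own, and it also evaluates the alternative formula for r, Eq. (17.11), to confirm that it agrees with the square root of the coefficient of determination.

```python
import math

x = [1, 2, 3, 4, 5, 6, 7]
y = [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5]
a0, a1 = 0.07142857, 0.8392857   # coefficients from Example 17.1

n = len(x)
y_mean = sum(y) / n
St = sum((yi - y_mean) ** 2 for yi in y)                      # spread about the mean
Sr = sum((yi - a0 - a1 * xi) ** 2 for xi, yi in zip(x, y))    # spread about the line

s_y = math.sqrt(St / (n - 1))     # total standard deviation
s_yx = math.sqrt(Sr / (n - 2))    # standard error of the estimate, Eq. (17.9)
r2 = (St - Sr) / St               # coefficient of determination, Eq. (17.10)

# Correlation coefficient from the summation form, Eq. (17.11)
sx, sy = sum(x), sum(y)
sxy = sum(xi * yi for xi, yi in zip(x, y))
sxx = sum(xi * xi for xi in x)
syy = sum(yi * yi for yi in y)
r = (n * sxy - sx * sy) / (math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2))

print(f"St = {St:.4f}, Sr = {Sr:.4f}")
print(f"s_y = {s_y:.4f}, s_y/x = {s_yx:.4f}, r2 = {r2:.3f}, r = {r:.3f}")
```

Note that Eq. (17.11) carries the sign of the slope, whereas taking the square root of $r^2$ does not; for these data the slope is positive, so the two agree.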
Before proceeding to the computer program for linear regression, a word of caution
is in order. Although the correlation coefficient provides a handy measure of goodness-
of-fit, you should be careful not to ascribe more meaning to it than is warranted. Just
because r is “close” to 1 does not mean that the fit is necessarily “good.” For example,
it is possible to obtain a relatively high value of r when the underlying relationship
between y and x is not even linear. Draper and Smith (1981) provide guidance and ad-
ditional material regarding assessment of results for linear regression. In addition, at the
minimum, you should always inspect a plot of the data along with your regression curve.
As described in the next section, software packages include such a capability.
17.1.4 Computer Program for Linear Regression
It is a relatively trivial matter to develop a pseudocode for linear regression (Fig. 17.6).
As mentioned above, a plotting option is critical to the effective use and interpretation
of regression. Such capabilities are included in popular packages like MATLAB software
and Excel. If your computer language has plotting capabilities, we recommend that you
expand your program to include a plot of y versus x, showing both the data and the
regression line. The inclusion of the capability will greatly enhance the utility of the
program in problem-solving contexts.
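For readers working in a general-purpose language, the linear regression algorithm of Fig. 17.6 might be transcribed into Python as follows. This is a sketch under our own assumptions: the function returns its results as a tuple instead of modifying arguments, and the guard on the number of points is an addition that the pseudocode does not contain.

```python
import math

def regress(x, y):
    """Least-squares straight line; returns (a1, a0, syx, r2)."""
    n = len(x)
    if n < 3:
        # With n = 2 the divisor n - 2 in Eq. (17.9) is zero (added guard)
        raise ValueError("at least three points are needed to compute syx")
    sumx = sum(x)
    sumy = sum(y)
    sumxy = sum(xi * yi for xi, yi in zip(x, y))
    sumx2 = sum(xi * xi for xi in x)
    xm = sumx / n
    ym = sumy / n
    a1 = (n * sumxy - sumx * sumy) / (n * sumx2 - sumx * sumx)   # Eq. (17.6)
    a0 = ym - a1 * xm                                            # Eq. (17.7)
    st = sum((yi - ym) ** 2 for yi in y)
    sr = sum((yi - a1 * xi - a0) ** 2 for xi, yi in zip(x, y))
    syx = math.sqrt(sr / (n - 2))                                # Eq. (17.9)
    r2 = (st - sr) / st                                          # Eq. (17.10)
    return a1, a0, syx, r2

# Data of Example 17.1
x = [1, 2, 3, 4, 5, 6, 7]
y = [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5]
a1, a0, syx, r2 = regress(x, y)
print(f"a1 = {a1:.7f}, a0 = {a0:.8f}, syx = {syx:.4f}, r2 = {r2:.3f}")
```

Applied to the data of Example 17.1, the function should reproduce the slope, intercept, standard error of the estimate, and coefficient of determination reported in Examples 17.1 and 17.2.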
SUB Regress(x, y, n, a1, a0, syx, r2)
  sumx = 0: sumxy = 0: st = 0
  sumy = 0: sumx2 = 0: sr = 0
  DOFOR i = 1, n
    sumx = sumx + xi
    sumy = sumy + yi
    sumxy = sumxy + xi*yi
    sumx2 = sumx2 + xi*xi
  END DO
  xm = sumx/n
  ym = sumy/n
  a1 = (n*sumxy - sumx*sumy)/(n*sumx2 - sumx*sumx)
  a0 = ym - a1*xm
  DOFOR i = 1, n
    st = st + (yi - ym)^2
    sr = sr + (yi - a1*xi - a0)^2
  END DO
  syx = (sr/(n - 2))^0.5
  r2 = (st - sr)/st
END Regress
FIGURE 17.6
Algorithm for linear regression.
EXAMPLE 17.3 Linear Regression Using the Computer
Problem Statement. We can use software based on Fig. 17.6 to solve a hypothesis-testing
problem associated with the falling parachutist discussed in Chap. 1. A theoretical
mathematical model for the velocity of the parachutist was given as the following
[Eq. (1.10)]:
$$v(t) = \frac{gm}{c}\left(1 - e^{-(c/m)t}\right)$$
where v = velocity (m/s), g = gravitational constant (9.8 m/s^2), m = mass of the
parachutist equal to 68.1 kg, and c = drag coefficient of 12.5 kg/s. The model predicts the
velocity of the parachutist as a function of time, as described in Example 1.1.
An alternative empirical model for the velocity of the parachutist is given by
$$v(t) = \frac{gm}{c}\left(\frac{t}{3.75 + t}\right) \tag{E17.3.1}$$
Suppose that you would like to test and compare the adequacy of these two mathematical
models. This might be accomplished by measuring the actual velocity of the
parachutist at known values of time and comparing these results with the predicted
velocities according to each model.
Such an experimental-data-collection program was implemented, and the results are
listed in column (a) of Table 17.2. Computed velocities for each model are listed in
columns (b) and (c).
Solution. The adequacy of the models can be tested by plotting the model-calculated
velocity versus the measured velocity. Linear regression can be used to calculate the
slope and the intercept of the plot. This line will have a slope of 1, an intercept of 0,
and an $r^2 = 1$ if the model matches the data perfectly. A significant deviation from these
values can be used as an indication of the inadequacy of the model.
Figure 17.7a and b are plots of the line and data for the regressions of columns (b)
and (c), respectively, versus column (a). For the first model [Eq. (1.10) as depicted in
Fig. 17.7a],
$$v_{\text{model}} = -0.859 + 1.032\,v_{\text{measure}}$$
and for the second model [Eq. (E17.3.1) as depicted in Fig. 17.7b],
$$v_{\text{model}} = 5.776 + 0.752\,v_{\text{measure}}$$
These plots indicate that the linear regression between these data and each of the models
is highly significant. Both models match the data with a correlation coefficient of greater
than 0.99.
However, the model described by Eq. (1.10) conforms to our hypothesis test criteria
much better than that described by Eq. (E17.3.1) because the slope and intercept are
more nearly equal to 1 and 0. Thus, although each plot is well described by a straight
line, Eq. (1.10) appears to be a better model than Eq. (E17.3.1).
TABLE 17.2 Measured and calculated velocities for the falling parachutist.
          Measured v,   Model-calculated v,   Model-calculated v,
          m/s           m/s [Eq. (1.10)]      m/s [Eq. (E17.3.1)]
Time, s   (a)           (b)                   (c)
1         10.00         8.953                 11.240
2         16.30         16.405                18.570
3         23.00         22.607                23.729
4         27.50         27.769                27.556
5         31.00         32.065                30.509
6         35.60         35.641                32.855
7         39.00         38.617                34.766
8         41.50         41.095                36.351
9         42.90         43.156                37.687
10        45.00         44.872                38.829
11        46.00         46.301                39.816
12        45.50         47.490                40.678
13        46.00         48.479                41.437
14        49.00         49.303                42.110
15        50.00         49.988                42.712
Model testing and selection are common and extremely important activities per-
formed in all fields of engineering. The background material provided in this chapter,
together with your software, should allow you to address many practical problems of
this type.
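The comparison in Example 17.3 can be sketched in Python as shown below. This is an illustration under stated assumptions: the helper `fit_line` is our own minimal implementation of Eqs. (17.6) and (17.7), the measured velocities are copied from column (a) of Table 17.2, and the model velocities are recomputed from the two equations rather than read from the table.

```python
import math

G, M, C = 9.8, 68.1, 12.5   # g (m/s^2), m (kg), c (kg/s) as given in Example 17.3

def v_theoretical(t):
    """Theoretical model, Eq. (1.10)."""
    return G * M / C * (1.0 - math.exp(-(C / M) * t))

def v_empirical(t):
    """Empirical model, Eq. (E17.3.1)."""
    return G * M / C * (t / (3.75 + t))

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line of y versus x."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return sy / n - slope * sx / n, slope

times = range(1, 16)
measured = [10.00, 16.30, 23.00, 27.50, 31.00, 35.60, 39.00, 41.50,
            42.90, 45.00, 46.00, 45.50, 46.00, 49.00, 50.00]
theo = [v_theoretical(t) for t in times]
emp = [v_empirical(t) for t in times]

# Regress model-calculated velocity (ordinate) on measured velocity (abscissa)
a0_theo, a1_theo = fit_line(measured, theo)
a0_emp, a1_emp = fit_line(measured, emp)
print(f"Eq. (1.10):    v_model = {a0_theo:.3f} + {a1_theo:.3f} v_measure")
print(f"Eq. (E17.3.1): v_model = {a0_emp:.3f} + {a1_emp:.3f} v_measure")
```

A model that matched the measurements perfectly would give an intercept of 0 and a slope of 1, so the printed coefficients can be judged against those targets.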
FIGURE 17.7
(a) Results using linear regression to compare predictions computed with the theoretical model
[Eq. (1.10)] versus measured values. (b) Results using linear regression to compare predictions
computed with the empirical model [Eq. (E17.3.1)] versus measured values.
There is one shortcoming with the analysis in Example 17.3. The example was
unambiguous because the empirical model [Eq. (E17.3.1)] was clearly inferior to Eq. (1.10).
The slope and intercept for Eq. (1.10) were so much closer to the desired result of
1 and 0 that it was obvious which model was superior.
However, suppose that the slope were 0.85 and the intercept were 2. Obviously this
would make the conclusion that the slope and intercept were 1 and 0 open to debate.
Clearly, rather than relying on a subjective judgment, it would be preferable to base such
a conclusion on a quantitative criterion.
This can be done by computing confidence intervals for the model parameters in the
same way that we developed confidence intervals for the mean in Sec. PT5.2.3. We will
return to this topic at the end of this chapter.
17.1.5 Linearization of Nonlinear Relationships
Linear regression provides a powerful technique for fitting a best line to data. However,
it is predicated on the assumption that the relationship between the dependent and
independent variables is linear. This is not always the case, and the first step in any regression
analysis should be to plot and visually inspect the data to ascertain whether a linear
model applies. For example, Fig. 17.8 shows some data that are obviously curvilinear. In
some cases, techniques such as polynomial regression, which is described in Sec. 17.2,
are appropriate. For others, transformations can be used to express the data in a form
that is compatible with linear regression.
FIGURE 17.8
(a) Data that are ill-suited for linear least-squares regression. (b) Indication that a parabola is
preferable.
One example is the exponential model
$$y = \alpha_1 e^{\beta_1 x} \tag{17.12}$$
where $\alpha_1$ and $\beta_1$ are constants. This model is used in many fields of engineering to
characterize quantities that increase (positive $\beta_1$) or decrease (negative $\beta_1$) at a rate that
is directly proportional to their own magnitude. For example, population growth or
radioactive decay can exhibit such behavior. As depicted in Fig. 17.9a, the equation
represents a nonlinear relationship (for $\beta_1 \ne 0$) between y and x.
Another example of a nonlinear model is the simple power equation
$$y = \alpha_2 x^{\beta_2} \tag{17.13}$$
where $\alpha_2$ and $\beta_2$ are constant coefficients. This model has wide applicability in all fields
of engineering. As depicted in Fig. 17.9b, the equation (for $\beta_2 \ne 0$ or 1) is nonlinear.
A third example of a nonlinear model is the saturation-growth-rate equation [recall
Eq. (E17.3.1)]
$$y = \alpha_3 \frac{x}{\beta_3 + x} \tag{17.14}$$
where $\alpha_3$ and $\beta_3$ are constant coefficients. This model, which is particularly well-suited for
characterizing population growth rate under limiting conditions, also represents a nonlinear
relationship between y and x (Fig. 17.9c) that levels off, or “saturates,” as x increases.
FIGURE 17.9
(a) The exponential equation, (b) the power equation, and (c) the saturation-growth-rate
equation. Parts (d), (e), and (f) are linearized versions of these equations that result
from simple transformations.
[Panels (a) to (c): y versus x for each equation. Panel (d): ln y versus x, slope = β1,
intercept = ln α1. Panel (e): log y versus log x, slope = β2, intercept = log α2.
Panel (f): 1/y versus 1/x, slope = β3/α3, intercept = 1/α3.]
Nonlinear regression techniques are available to fit these equations to experimental
data directly. (Note that we will discuss nonlinear regression in Sec. 17.5.) However, a
simpler alternative is to use mathematical manipulations to transform the equations into
a linear form. Then, simple linear regression can be employed to fit the equations to data.
For example, Eq. (17.12) can be linearized by taking its natural logarithm to yield
$$\ln y = \ln \alpha_1 + \beta_1 x \ln e$$
But because ln e = 1,
$$\ln y = \ln \alpha_1 + \beta_1 x \tag{17.15}$$
Thus, a plot of ln y versus x will yield a straight line with a slope of $\beta_1$ and an intercept
of $\ln \alpha_1$ (Fig. 17.9d).
Equation (17.13) is linearized by taking its base-10 logarithm to give
$$\log y = \beta_2 \log x + \log \alpha_2 \tag{17.16}$$
Thus, a plot of log y versus log x will yield a straight line with a slope of $\beta_2$ and an
intercept of $\log \alpha_2$ (Fig. 17.9e).
Equation (17.14) is linearized by inverting it to give
$$\frac{1}{y} = \frac{\beta_3}{\alpha_3}\,\frac{1}{x} + \frac{1}{\alpha_3} \tag{17.17}$$
Thus, a plot of 1/y versus 1/x will be linear, with a slope of $\beta_3/\alpha_3$ and an intercept of
$1/\alpha_3$ (Fig. 17.9f).
In their transformed forms, these models can use linear regression to evaluate the
constant coefficients. They could then be transformed back to their original state and
used for predictive purposes. Example 17.4 illustrates this procedure for Eq. (17.13). In
addition, Sec. 20.1 provides an engineering example of the same sort of computation.
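The three transformations can be sketched in Python as follows. This is a minimal illustration, not a library interface: all function names are hypothetical, `fit_line` is our own implementation of Eqs. (17.6) and (17.7), the power-equation data are those of Table 17.3, and the exponential and saturation-growth-rate checks use noise-free synthetic data generated from parameter values we chose ourselves.

```python
import math

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line of y versus x."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return sy / n - slope * sx / n, slope

def fit_exponential(x, y):
    """y = alpha * exp(beta * x): regress ln y on x, Eq. (17.15)."""
    intercept, slope = fit_line(x, [math.log(v) for v in y])
    return math.exp(intercept), slope

def fit_power(x, y):
    """y = alpha * x**beta: regress log10 y on log10 x, Eq. (17.16)."""
    intercept, slope = fit_line([math.log10(v) for v in x],
                                [math.log10(v) for v in y])
    return 10.0 ** intercept, slope

def fit_saturation(x, y):
    """y = alpha * x / (beta + x): regress 1/y on 1/x, Eq. (17.17)."""
    intercept, slope = fit_line([1.0 / v for v in x], [1.0 / v for v in y])
    alpha = 1.0 / intercept          # intercept = 1/alpha
    return alpha, slope * alpha      # slope = beta/alpha

# Power equation fitted to the data of Table 17.3
alpha2, beta2 = fit_power([1, 2, 3, 4, 5], [0.5, 1.7, 3.4, 5.7, 8.4])

# Noise-free synthetic data (hypothetical parameters) for the other two models
xs = [1.0, 2.0, 3.0, 4.0]
alpha1, beta1 = fit_exponential(xs, [2.0 * math.exp(0.5 * v) for v in xs])
alpha3, beta3 = fit_saturation(xs, [4.0 * v / (1.5 + v) for v in xs])

print(f"power:       alpha2 = {alpha2:.3f}, beta2 = {beta2:.3f}")
print(f"exponential: alpha1 = {alpha1:.3f}, beta1 = {beta1:.3f}")
print(f"saturation:  alpha3 = {alpha3:.3f}, beta3 = {beta3:.3f}")
```

One caveat worth keeping in mind: the regression minimizes the squared residuals of the transformed variables, so the resulting coefficients are not in general the same as those from a nonlinear least-squares fit of the original equation. All three transformations also require strictly positive data where a logarithm or reciprocal is taken.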
EXAMPLE 17.4 Linearization of a Power Equation
Problem Statement. Fit Eq. (17.13) to the data in Table 17.3 using a logarithmic
transformation of the data.
Solution. Figure 17.10a is a plot of the original data in its untransformed state. Figure
17.10b shows the plot of the transformed data. A linear regression of the log-transformed
data yields the result
$$\log y = 1.75 \log x - 0.300$$
Thus, the intercept, $\log \alpha_2$, equals −0.300, and therefore, by taking the antilogarithm,
$\alpha_2 = 10^{-0.3} = 0.5$. The slope is $\beta_2 = 1.75$. Consequently, the power equation is
$$y = 0.5x^{1.75}$$
This curve, as plotted in Fig. 17.10a, indicates a good fit.
TABLE 17.3 Data to be fit to the power equation.
x    y     log x    log y
1    0.5   0        −0.301
2    1.7   0.301    0.226
3    3.4   0.477    0.534
4    5.7   0.602    0.753
5    8.4   0.699    0.922
FIGURE 17.10
(a) Plot of untransformed data with the power equation that fits these data. (b) Plot of transformed
data used to determine the coefficients of the power equation.
17.1.6 General Comments on Linear Regression
Before proceeding to curvilinear and multiple linear regression, we must emphasize the
introductory nature of the foregoing material on linear regression. We have focused on
the simple derivation and practical use of equations to fit data. You should be cognizant
of the fact that there are theoretical aspects of regression that are of practical importance
but are beyond the scope of this book. For example, some statistical assumptions that
are inherent in the linear least-squares procedures are
1. Each x has a fixed value; it is not random and is known without error.
2. The y values are independent random variables and all have the same variance.
3. The y values for a given x must be normally distributed.
Such assumptions are relevant to the proper derivation and use of regression. For
example, the first assumption means that (1) the x values must be error-free and (2) the
regression of y versus x is not the same as x versus y (try Prob. 17.4 at the end of the
chapter). You are urged to consult other references such as Draper and Smith (1981) to
appreciate aspects and nuances of regression that are beyond the scope of this book.
17.2 POLYNOMIAL REGRESSION
In Sec. 17.1, a procedure was developed to derive the equation of a straight line using
the least-squares criterion. Some engineering data, although exhibiting a marked pattern
such as seen in Fig. 17.8, are poorly represented by a straight line. For these cases, a curve
would be better suited to fit these data. As discussed in the previous section, one method
to accomplish this objective is to use transformations. Another alternative is to fit
polynomials to the data using polynomial regression.
The least-squares procedure can be readily extended to fit the data to a higher-order
polynomial. For example, suppose that we fit a second-order polynomial or quadratic:
$$y = a_0 + a_1 x + a_2 x^2 + e$$
For this case the sum of the squares of the residuals is [compare with Eq. (17.3)]
$$S_r = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i - a_2 x_i^2)^2 \tag{17.18}$$
Following the procedure of the previous section, we take the derivative of Eq. (17.18)
with respect to each of the unknown coefficients of the polynomial, as in
$$\frac{\partial S_r}{\partial a_0} = -2 \sum (y_i - a_0 - a_1 x_i - a_2 x_i^2)$$
$$\frac{\partial S_r}{\partial a_1} = -2 \sum x_i (y_i - a_0 - a_1 x_i - a_2 x_i^2)$$
$$\frac{\partial S_r}{\partial a_2} = -2 \sum x_i^2 (y_i - a_0 - a_1 x_i - a_2 x_i^2)$$
These equations can be set equal to zero and rearranged to develop the following set of
normal equations:
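$$\begin{aligned}
(n)\,a_0 + \left(\sum x_i\right) a_1 + \left(\sum x_i^2\right) a_2 &= \sum y_i \\
\left(\sum x_i\right) a_0 + \left(\sum x_i^2\right) a_1 + \left(\sum x_i^3\right) a_2 &= \sum x_i y_i \\
\left(\sum x_i^2\right) a_0 + \left(\sum x_i^3\right) a_1 + \left(\sum x_i^4\right) a_2 &= \sum x_i^2 y_i
\end{aligned} \tag{17.19}$$
where all summations are from $i = 1$ through $n$. Note that the above three equations are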
linear and have three unknowns: a0, a1, and a2. The coefficients of the unknowns can be
calculated directly from the observed data.
For this case, we see that the problem of determining a least-squares second-order
polynomial is equivalent to solving a system of three simultaneous linear equations.
Techniques to solve such equations were discussed in Part Three.
The two-dimensional case can be easily extended to an mth-order polynomial as
$$y = a_0 + a_1 x + a_2 x^2 + \cdots + a_m x^m + e$$
The foregoing analysis can be easily extended to this more general case. Thus, we can
recognize that determining the coefficients of an mth-order polynomial is equivalent to
solving a system of $m + 1$ simultaneous linear equations. For this case, the standard
error is formulated as
$$s_{y/x} = \sqrt{\frac{S_r}{n - (m + 1)}} \tag{17.20}$$
This quantity is divided by $n - (m + 1)$ because $(m + 1)$ data-derived coefficients
($a_0, a_1, \ldots, a_m$) were used to compute $S_r$; thus, we have lost $m + 1$ degrees of freedom.
In addition to the standard error, a coefficient of determination can also be
computed for polynomial regression with Eq. (17.10).
EXAMPLE 17.5 Polynomial Regression
Problem Statement. Fit a second-order polynomial to the data in the first two columns
of Table 17.4.
Solution. From the given data,
$$\begin{array}{lll}
m = 2 & \sum x_i = 15 & \sum x_i^4 = 979 \\
n = 6 & \sum y_i = 152.6 & \sum x_i y_i = 585.6 \\
\bar{x} = 2.5 & \sum x_i^2 = 55 & \sum x_i^2 y_i = 2488.8 \\
\bar{y} = 25.433 & \sum x_i^3 = 225 &
\end{array}$$
Therefore, the simultaneous linear equations are
$$\begin{bmatrix} 6 & 15 & 55 \\ 15 & 55 & 225 \\ 55 & 225 & 979 \end{bmatrix}
\begin{Bmatrix} a_0 \\ a_1 \\ a_2 \end{Bmatrix} =
\begin{Bmatrix} 152.6 \\ 585.6 \\ 2488.8 \end{Bmatrix}$$
Solving these equations through a technique such as Gauss elimination gives $a_0 = 2.47857$,
$a_1 = 2.35929$, and $a_2 = 1.86071$. Therefore, the least-squares quadratic equation for this case is
$$y = 2.47857 + 2.35929x + 1.86071x^2$$
The standard error of the estimate based on the regression polynomial is [Eq. (17.20)]
$$s_{y/x} = \sqrt{\frac{3.74657}{6 - 3}} = 1.12$$
TABLE 17.4 Computations for an error analysis of the quadratic least-squares fit.
x_i    y_i      (y_i - ybar)^2    (y_i - a0 - a1*x_i - a2*x_i^2)^2
0      2.1      544.44            0.14332
1      7.7      314.47            1.00286
2      13.6     140.03            1.08158
3      27.2     3.12              0.80491
4      40.9     239.22            0.61951
5      61.1     1272.11           0.09439
Sum    152.6    2513.39           3.74657
FIGURE 17.11
Fit of a second-order polynomial (y versus x, data with the least-squares parabola).
The coefficient of determination is
$$r^2 = \frac{2513.39 - 3.74657}{2513.39} = 0.99851$$
and the correlation coefficient is $r = 0.99925$.
These results indicate that 99.851 percent of the original uncertainty has been
explained by the model. This result supports the conclusion that the quadratic equation
represents an excellent fit, as is also evident from Fig. 17.11.
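The calculation in Example 17.5 can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions, not the book's program: the helper names `gauss_solve` and `polyfit_normal` are hypothetical, and the solver is a plain Gauss elimination with partial pivoting written out so the block is self-contained.

```python
from math import sqrt


def gauss_solve(a, b):
    """Solve a x = b by Gauss elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented copy
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(m[r][k]))
        m[k], m[p] = m[p], m[k]
        for r in range(k + 1, n):
            f = m[r][k] / m[k][k]
            for c in range(k, n + 1):
                m[r][c] -= f * m[k][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def polyfit_normal(x, y, order):
    """Coefficients a0..am of the least-squares polynomial via Eq. (17.19)."""
    a = [[sum(xi ** (i + j) for xi in x) for j in range(order + 1)]
         for i in range(order + 1)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(order + 1)]
    return gauss_solve(a, b)


# Data from the first two columns of Table 17.4
x = [0, 1, 2, 3, 4, 5]
y = [2.1, 7.7, 13.6, 27.2, 40.9, 61.1]
coef = polyfit_normal(x, y, 2)

y_mean = sum(y) / len(y)
s_t = sum((yi - y_mean) ** 2 for yi in y)
s_r = sum((yi - sum(c * xi ** k for k, c in enumerate(coef))) ** 2
          for xi, yi in zip(x, y))
s_yx = sqrt(s_r / (len(x) - (2 + 1)))   # Eq. (17.20) with m = 2
r2 = (s_t - s_r) / s_t
print(coef, s_yx, r2)
```

The computed coefficients, standard error, and coefficient of determination should agree with the values quoted in the example to the digits shown there.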
17.2.1 Algorithm for Polynomial Regression
An algorithm for polynomial regression is delineated in Fig. 17.12. Note that the primary
task is the generation of the coefficients of the normal equations [Eq. (17.19)]. (Pseudocode
for accomplishing this is presented in Fig. 17.13.) Then, techniques from Part Three can
be applied to solve these simultaneous equations for the coefficients.
A potential problem associated with implementing polynomial regression on the
computer is that the normal equations tend to be ill-conditioned. This is particularly
true for higher-order versions. For these cases, the computed coefficients may be highly
susceptible to round-off error, and consequently, the results can be inaccurate. Among
other things, this problem is related to the structure of the normal equations and to the
fact that for higher-order polynomials the normal equations can have very large and
very small coefficients. This is because the coefficients are summations of the data
raised to powers.
Although the strategies for mitigating round-off error discussed in Part Three, such as
pivoting, can help to partially remedy this problem, a simpler alternative is to use a com-
puter with higher precision. Fortunately, most practical problems are limited to lower-order
polynomials for which round-off is usually negligible. In situations where higher-order
versions are required, other alternatives are available for certain types of data. However,
these techniques (such as orthogonal polynomials) are beyond the scope of this book. The
reader should consult texts on regression, such as Draper and Smith (1981), for additional
information regarding the problem and possible alternatives.
FIGURE 17.12
Algorithm for implementation of polynomial and multiple linear regression.
Step 1: Input order of polynomial to be fit, m.
Step 2: Input number of data points, n.
Step 3: If n < m + 1, print out an error message that regression is impossible and terminate
the process. If n ≥ m + 1, continue.
Step 4: Compute the elements of the normal equation in the form of an augmented matrix.
Step 5: Solve the augmented matrix for the coefficients a0, a1, a2, . . . , am, using an
elimination method.
Step 6: Print out the coefficients.
17.3 MULTIPLE LINEAR REGRESSION
A useful extension of linear regression is the case where y is a linear function of two or
more independent variables. For example, y might be a linear function of x1 and x2, as in
$$y = a_0 + a_1 x_1 + a_2 x_2 + e$$
Such an equation is particularly useful when fitting experimental data, where the variable
being studied is often a function of two other variables. For this two-dimensional case,
the regression “line” becomes a “plane” (Fig. 17.14).
DOFOR i = 1, order + 1
  DOFOR j = 1, i
    k = i + j - 2
    sum = 0
    DOFOR ℓ = 1, n
      sum = sum + x_ℓ^k
    END DO
    a_{i,j} = sum
    a_{j,i} = sum
  END DO
  sum = 0
  DOFOR ℓ = 1, n
    sum = sum + y_ℓ · x_ℓ^(i-1)
  END DO
  a_{i,order+2} = sum
END DO
FIGURE 17.13
Pseudocode to assemble the elements of the normal equations for polynomial regression.
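A direct Python rendering of the pseudocode in Fig. 17.13 might look like the sketch below. The function name `normal_equations_poly` is an assumption, and the indices are shifted to Python's 0-based convention, so the power $k = i + j - 2$ of the 1-based pseudocode becomes `i + j` here.

```python
def normal_equations_poly(x, y, order):
    """Return the (order+1) x (order+2) augmented matrix of the normal equations."""
    n = len(x)
    size = order + 1
    a = [[0.0] * (size + 1) for _ in range(size)]
    for i in range(size):              # 0-based counterpart of i = 1..order+1
        for j in range(i + 1):         # lower triangle, mirrored by symmetry
            k = i + j                  # power of x (i + j - 2 in 1-based form)
            total = sum(x[l] ** k for l in range(n))
            a[i][j] = total
            a[j][i] = total
        # right-hand side stored in the extra column
        a[i][size] = sum(y[l] * x[l] ** i for l in range(n))
    return a


# Data from Table 17.4, second-order fit
aug = normal_equations_poly([0, 1, 2, 3, 4, 5],
                            [2.1, 7.7, 13.6, 27.2, 40.9, 61.1], 2)
for row in aug:
    print(row)
```

For the data of Table 17.4 the rows reproduce the coefficient matrix and right-hand side of Example 17.5; the augmented matrix can then be passed to any elimination routine.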
FIGURE 17.14
Graphical depiction of multiple linear regression where y is a linear function of x1 and x2.
As with the previous cases, the “best” values of the coefficients are determined by
setting up the sum of the squares of the residuals,
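$$S_r = \sum_{i=1}^{n} \left(y_i - a_0 - a_1 x_{1i} - a_2 x_{2i}\right)^2 \tag{17.21}$$
and differentiating with respect to each of the unknown coefficients,
$$\frac{\partial S_r}{\partial a_0} = -2 \sum \left(y_i - a_0 - a_1 x_{1i} - a_2 x_{2i}\right)$$
$$\frac{\partial S_r}{\partial a_1} = -2 \sum x_{1i} \left(y_i - a_0 - a_1 x_{1i} - a_2 x_{2i}\right)$$
$$\frac{\partial S_r}{\partial a_2} = -2 \sum x_{2i} \left(y_i - a_0 - a_1 x_{1i} - a_2 x_{2i}\right)$$
The coefficients yielding the minimum sum of the squares of the residuals are obtained
by setting the partial derivatives equal to zero and expressing the result in matrix form as
$$\begin{bmatrix}
n & \sum x_{1i} & \sum x_{2i} \\
\sum x_{1i} & \sum x_{1i}^2 & \sum x_{1i} x_{2i} \\
\sum x_{2i} & \sum x_{1i} x_{2i} & \sum x_{2i}^2
\end{bmatrix}
\begin{Bmatrix} a_0 \\ a_1 \\ a_2 \end{Bmatrix} =
\begin{Bmatrix} \sum y_i \\ \sum x_{1i} y_i \\ \sum x_{2i} y_i \end{Bmatrix} \tag{17.22}$$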
EXAMPLE 17.6 Multiple Linear Regression
Problem Statement. The following data were calculated from the equation
$y = 5 + 4x_1 - 3x_2$:
x1     x2    y
0      0     5
2      1     10
2.5    2     9
1      3     0
4      6     3
7      2     27
Use multiple linear regression to fit these data.
Solution. The summations required to develop Eq. (17.22) are computed in Table 17.5.
The result is
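$$\begin{bmatrix} 6 & 16.5 & 14 \\ 16.5 & 76.25 & 48 \\ 14 & 48 & 54 \end{bmatrix}
\begin{Bmatrix} a_0 \\ a_1 \\ a_2 \end{Bmatrix} =
\begin{Bmatrix} 54 \\ 243.5 \\ 100 \end{Bmatrix}$$
which can be solved using a method such as Gauss elimination for
$$a_0 = 5 \qquad a_1 = 4 \qquad a_2 = -3$$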
which is consistent with the original equation from which these data were derived.
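Example 17.6 can be reproduced with a short Python sketch. The names `gauss_solve` and `multiple_regression` are hypothetical; the approach stores a column of 1's as $x_0$ so the intercept is handled like any other coefficient, and then forms and solves the normal equations of Eq. (17.22).

```python
def gauss_solve(a, b):
    """Solve a x = b by Gauss elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented copy
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(m[r][k]))
        m[k], m[p] = m[p], m[k]
        for r in range(k + 1, n):
            f = m[r][k] / m[k][k]
            for c in range(k, n + 1):
                m[r][c] -= f * m[k][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def multiple_regression(columns, y):
    """columns = [x1, x2, ...]; returns (matrix, rhs, [a0, a1, ...])."""
    n = len(y)
    z = [[1.0] * n] + [list(c) for c in columns]   # x0 = 1 carries the intercept
    a = [[sum(zi[l] * zj[l] for l in range(n)) for zj in z] for zi in z]
    b = [sum(zi[l] * y[l] for l in range(n)) for zi in z]
    return a, b, gauss_solve(a, b)


# Data of Example 17.6
x1 = [0, 2, 2.5, 1, 4, 7]
x2 = [0, 1, 2, 3, 6, 2]
y = [5, 10, 9, 0, 3, 27]
a_mat, b_vec, coef = multiple_regression([x1, x2], y)
print(a_mat, b_vec, coef)
```

Because the data were generated without noise, the fit recovers the generating coefficients to within round-off.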
The foregoing two-dimensional case can be easily extended to m dimensions, as in
$$y = a_0 + a_1 x_1 + a_2 x_2 + \cdots + a_m x_m + e$$
where the standard error is formulated as
$$s_{y/x} = \sqrt{\frac{S_r}{n - (m + 1)}}$$
and the coefficient of determination is computed as in Eq. (17.10). An algorithm to set
up the normal equations is listed in Fig. 17.15.
Although there may be certain cases where a variable is linearly related to two or
more other variables, multiple linear regression has additional utility in the derivation of
power equations of the general form
$$y = a_0 x_1^{a_1} x_2^{a_2} \cdots x_m^{a_m}$$
TABLE 17.5 Computations required to develop the normal equations for Example 17.6.
       y     x1      x2    x1^2     x2^2    x1*x2    x1*y     x2*y
       5     0       0     0        0       0        0        0
       10    2       1     4        1       2        20       10
       9     2.5     2     6.25     4       5        22.5     18
       0     1       3     1        9       3        0        0
       3     4       6     16       36      24       12       18
       27    7       2     49       4       14       189      54
Sum    54    16.5    14    76.25    54      48       243.5    100
DOFOR i = 1, order + 1
  DOFOR j = 1, i
    sum = 0
    DOFOR ℓ = 1, n
      sum = sum + x_{i-1,ℓ} · x_{j-1,ℓ}
    END DO
    a_{i,j} = sum
    a_{j,i} = sum
  END DO
  sum = 0
  DOFOR ℓ = 1, n
    sum = sum + y_ℓ · x_{i-1,ℓ}
  END DO
  a_{i,order+2} = sum
END DO
FIGURE 17.15
Pseudocode to assemble the elements of the normal equations for multiple regression. Note that
aside from storing the independent variables in x_{1,i}, x_{2,i}, etc., 1's must be stored in x_{0,i} for this
algorithm to work.
Such equations are extremely useful when fitting experimental data. To use multiple
linear regression, the equation is transformed by taking its logarithm to yield
$$\log y = \log a_0 + a_1 \log x_1 + a_2 \log x_2 + \cdots + a_m \log x_m$$
This transformation is similar in spirit to the one used in Sec. 17.1.5 and Example 17.4
to fit a power equation when y was a function of a single variable x. Section 20.4 provides
an example of such an application for two independent variables.
17.4 GENERAL LINEAR LEAST SQUARES
To this point, we have focused on the mechanics of obtaining least-squares fits of some
simple functions to data. Before turning to nonlinear regression, there are several issues
that we would like to discuss to enrich your understanding of the preceding material.
17.4.1 General Matrix Formulation for Linear Least Squares
In the preceding pages, we have introduced three types of regression: simple linear,
polynomial, and multiple linear. In fact, all three belong to the following general linear
least-squares model:
$$y = a_0 z_0 + a_1 z_1 + a_2 z_2 + \cdots + a_m z_m + e \tag{17.23}$$
where $z_0, z_1, \ldots, z_m$ are $m + 1$ basis functions. It can easily be seen how simple and
multiple linear regression fall within this model; that is, $z_0 = 1$, $z_1 = x_1$, $z_2 = x_2$, . . . ,
$z_m = x_m$. Further, polynomial regression is also included if the basis functions are simple
monomials as in $z_0 = x^0 = 1$, $z_1 = x$, $z_2 = x^2$, . . . , $z_m = x^m$.
Note that the terminology “linear” refers only to the model’s dependence on its
parameters, that is, the a’s. As in the case of polynomial regression, the functions
themselves can be highly nonlinear. For example, the z’s can be sinusoids, as in
$$y = a_0 + a_1 \cos(\omega t) + a_2 \sin(\omega t)$$
Such a format is the basis of Fourier analysis described in Chap. 19.
On the other hand, a simple-looking model like
$$f(x) = a_0\left(1 - e^{-a_1 x}\right)$$
is truly nonlinear because it cannot be manipulated into the format of Eq. (17.23). We
will turn to such models at the end of this chapter.
For the time being, Eq. (17.23) can be expressed in matrix notation as
$$\{Y\} = [Z]\{A\} + \{E\} \tag{17.24}$$
where [Z] is a matrix of the calculated values of the basis functions at the measured
values of the independent variables,
$$[Z] = \begin{bmatrix}
z_{01} & z_{11} & \cdots & z_{m1} \\
z_{02} & z_{12} & \cdots & z_{m2} \\
\vdots & \vdots & & \vdots \\
z_{0n} & z_{1n} & \cdots & z_{mn}
\end{bmatrix}$$
where m is the number of variables in the model and n is the number of data points.
Because $n \ge m + 1$, you should recognize that most of the time, [Z] is not a square matrix.
The column vector {Y} contains the observed values of the dependent variable
$$\{Y\}^T = \lfloor\, y_1 \;\; y_2 \;\; \cdots \;\; y_n \,\rfloor$$
The column vector {A} contains the unknown coefficients
$$\{A\}^T = \lfloor\, a_0 \;\; a_1 \;\; \cdots \;\; a_m \,\rfloor$$
and the column vector {E} contains the residuals
$$\{E\}^T = \lfloor\, e_1 \;\; e_2 \;\; \cdots \;\; e_n \,\rfloor$$
As was done throughout this chapter, the sum of the squares of the residuals for this
model can be defined as
$$S_r = \sum_{i=1}^{n} \left( y_i - \sum_{j=0}^{m} a_j z_{ji} \right)^2$$
This quantity can be minimized by taking its partial derivative with respect to each of
the coefficients and setting the resulting equation equal to zero. The outcome of this
process is the normal equations that can be expressed concisely in matrix form as
$$\left[[Z]^T[Z]\right]\{A\} = \left\{[Z]^T\{Y\}\right\} \tag{17.25}$$
It can be shown that Eq. (17.25) is, in fact, equivalent to the normal equations developed
previously for simple linear, polynomial, and multiple linear regression.
Our primary motivation for the foregoing has been to illustrate the unity among the
three approaches and to show how they can all be expressed simply in the same matrix
notation. The matrix notation will also have relevance when we turn to nonlinear regres-
sion in the last section of this chapter.
From Eq. (PT3.6), recall that the matrix inverse can be employed to solve Eq. (17.25),
as in
$$\{A\} = \left[[Z]^T[Z]\right]^{-1}\left\{[Z]^T\{Y\}\right\} \tag{17.26}$$
As we have learned in Part Three, this is an inefficient approach for solving a set of
simultaneous equations. However, from a statistical perspective, there are a number of
reasons why we might be interested in obtaining the inverse and examining its coeffi-
cients. These reasons will be discussed next.
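To make the general formulation concrete, the sketch below builds [Z] from a list of basis functions and solves the normal equations of Eq. (17.25) by elimination rather than by forming the inverse. Everything specific here is an assumption for illustration: the function names, the angular frequency `omega`, and the synthetic noise-free data generated from $a_0 = 1.7$, $a_1 = 0.5$, $a_2 = -0.9$ for the sinusoidal model mentioned earlier in this section.

```python
from math import cos, sin, pi


def gauss_solve(a, b):
    """Solve a x = b by Gauss elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented copy
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(m[r][k]))
        m[k], m[p] = m[p], m[k]
        for r in range(k + 1, n):
            f = m[r][k] / m[k][k]
            for c in range(k, n + 1):
                m[r][c] -= f * m[k][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def general_lls(basis, t, y):
    """Solve [Z^T Z]{A} = {Z^T Y} for an arbitrary list of basis functions."""
    z = [[f(ti) for f in basis] for ti in t]       # n x (m+1) matrix [Z]
    cols = range(len(basis))
    ztz = [[sum(row[i] * row[j] for row in z) for j in cols] for i in cols]
    zty = [sum(row[i] * yi for row, yi in zip(z, y)) for i in cols]
    return gauss_solve(ztz, zty)


omega = 2 * pi                                     # assumed known frequency
basis = [lambda t: 1.0,
         lambda t: cos(omega * t),
         lambda t: sin(omega * t)]
t_data = [0.1 * k for k in range(10)]              # hypothetical sample times
y_data = [1.7 + 0.5 * cos(omega * t) - 0.9 * sin(omega * t) for t in t_data]
coef = general_lls(basis, t_data, y_data)
print(coef)
```

Swapping the `basis` list for `[1, x, x**2]` or `[1, x1, x2]` style functions would give polynomial or multiple linear regression from the same routine, which is the unity the matrix notation is meant to show.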
17.4.2 Statistical Aspects of Least-Squares Theory
In Sec. PT5.2.1, we reviewed a number of descriptive statistics that can be used to describe
a sample. These included the arithmetic mean, the standard deviation, and the variance.
Aside from yielding a solution for the regression coefficients, the matrix formulation
of Eq. (17.26) provides estimates of their statistics. It can be shown (Draper and
Smith, 1981) that the diagonal and off-diagonal terms of the matrix $[[Z]^T[Z]]^{-1}$ give,
respectively, the variances and the covariances¹ of the a’s. If the diagonal elements of
$[[Z]^T[Z]]^{-1}$ are designated as $z^{-1}_{i,i}$,
$$\mathrm{var}(a_{i-1}) = z^{-1}_{i,i}\, s^2_{y/x} \tag{17.27}$$
and
$$\mathrm{cov}(a_{i-1}, a_{j-1}) = z^{-1}_{i,j}\, s^2_{y/x} \tag{17.28}$$
¹ The covariance is a statistic that measures the dependency of one variable on another. Thus, cov(x, y) indicates
the dependency of x and y. For example, cov(x, y) = 0 would indicate that x and y are uncorrelated, that is, they
have no linear dependence.
These statistics have a number of important applications. For our present purposes,
we will illustrate how they can be used to develop confidence intervals for the intercept
and slope.
Using an approach similar to that in Sec. PT5.2.3, it can be shown that lower and upper
bounds on the intercept can be formulated as (see Milton and Arnold, 2002, for details)
$$L = a_0 - t_{\alpha/2,\,n-2}\, s(a_0) \qquad U = a_0 + t_{\alpha/2,\,n-2}\, s(a_0) \tag{17.29}$$
where $s(a_j)$ = the standard error of coefficient $a_j$ = $\sqrt{\mathrm{var}(a_j)}$. In a similar manner, lower
and upper bounds on the slope can be formulated as
$$L = a_1 - t_{\alpha/2,\,n-2}\, s(a_1) \qquad U = a_1 + t_{\alpha/2,\,n-2}\, s(a_1) \tag{17.30}$$
The following example illustrates how these intervals can be used to make quantitative
inferences related to linear regression.
EXAMPLE 17.7 Confidence Intervals for Linear Regression
Problem Statement. In Example 17.3, we used regression to develop the following
relationship between measurements and model predictions:
$$y = -0.859 + 1.032x$$
where y = the model predictions and x = the measurements. We concluded that there was
a good agreement between the two because the intercept was approximately equal to 0 and
the slope approximately equal to 1. Recompute the regression but use the matrix approach
to estimate standard errors for the parameters. Then employ these errors to develop confidence
intervals, and use these to make a probabilistic statement regarding the goodness of fit.
Solution. These data can be written in matrix format for simple linear regression as:
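$$[Z] = \begin{bmatrix} 1 & 10 \\ 1 & 16.3 \\ 1 & 23 \\ \vdots & \vdots \\ 1 & 50 \end{bmatrix}
\qquad
\{Y\} = \begin{Bmatrix} 8.953 \\ 16.405 \\ 22.607 \\ \vdots \\ 49.988 \end{Bmatrix}$$
Matrix transposition and multiplication can then be used to generate the normal equations as
$$\left[[Z]^T[Z]\right]\{A\} = \left\{[Z]^T\{Y\}\right\}$$
$$\begin{bmatrix} 15 & 548.3 \\ 548.3 & 22191.21 \end{bmatrix}
\begin{Bmatrix} a_0 \\ a_1 \end{Bmatrix} =
\begin{Bmatrix} 552.741 \\ 22421.43 \end{Bmatrix}$$
Matrix inversion can be used to obtain the slope and intercept as
$$\{A\} = \left[[Z]^T[Z]\right]^{-1}\left\{[Z]^T\{Y\}\right\}
= \begin{bmatrix} 0.688414 & -0.01701 \\ -0.01701 & 0.000465 \end{bmatrix}
\begin{Bmatrix} 552.741 \\ 22421.43 \end{Bmatrix}
= \begin{Bmatrix} -0.85872 \\ 1.031592 \end{Bmatrix}$$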
Thus, the intercept and the slope are determined as $a_0 = -0.85872$ and $a_1 = 1.031592$,
respectively. These values in turn can be used to compute the standard error of the estimate
as $s_{y/x} = 0.863403$. This value can be used along with the diagonal elements of the
matrix inverse to calculate the standard errors of the coefficients,
$$s(a_0) = \sqrt{z^{-1}_{11}\, s^2_{y/x}} = \sqrt{0.688414(0.863403)^2} = 0.716372$$
$$s(a_1) = \sqrt{z^{-1}_{22}\, s^2_{y/x}} = \sqrt{0.000465(0.863403)^2} = 0.018625$$
The statistic $t_{\alpha/2,\,n-2}$ needed for a 95% confidence interval with $n - 2 = 15 - 2 = 13$
degrees of freedom can be determined from a statistics table or using software. We used
an Excel function, TINV, to come up with the proper value, as in
= TINV(0.05, 13)
which yielded a value of 2.160368. Equations (17.29) and (17.30) can then be used to
compute the confidence intervals as
$$a_0 = -0.85872 \pm 2.160368(0.716372) = -0.85872 \pm 1.547627 = [-2.40634,\ 0.688912]$$
$$a_1 = 1.031592 \pm 2.160368(0.018625) = 1.031592 \pm 0.040237 = [0.991355,\ 1.071828]$$
Notice that the desired values (0 for the intercept and 1 for the slope) fall
within the intervals. On the basis of this analysis we could make the following statement
regarding the slope: We have strong grounds for believing that the slope of the true regres-
sion line lies within the interval from 0.991355 to 1.071828. Because 1 falls within this
interval, we also have strong grounds for believing that the result supports the agreement
between the measurements and the model. Because zero falls within the intercept interval,
a similar statement can be made regarding the intercept.
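The arithmetic of Example 17.7 can be checked with the short Python sketch below. It starts from the normal-equation matrix, right-hand side, standard error, and t value quoted in the example, since the full data set of Example 17.3 is not repeated here; the variable names are assumptions. Because those quoted inputs are rounded, the last digits of the computed coefficients may differ slightly from the values in the example.

```python
from math import sqrt

# Values quoted in Example 17.7
ztz = [[15.0, 548.3], [548.3, 22191.21]]   # [Z]^T [Z]
zty = [552.741, 22421.43]                  # [Z]^T {Y}
s_yx = 0.863403                            # standard error of the estimate
t_crit = 2.160368                          # TINV(0.05, 13)

# Inverse of the 2 x 2 matrix
det = ztz[0][0] * ztz[1][1] - ztz[0][1] * ztz[1][0]
inv = [[ztz[1][1] / det, -ztz[0][1] / det],
       [-ztz[1][0] / det, ztz[0][0] / det]]

# Coefficients from Eq. (17.26) and standard errors from Eq. (17.27)
a = [inv[i][0] * zty[0] + inv[i][1] * zty[1] for i in range(2)]
se = [sqrt(inv[i][i] * s_yx ** 2) for i in range(2)]

# Confidence intervals from Eqs. (17.29) and (17.30)
intervals = [(a[i] - t_crit * se[i], a[i] + t_crit * se[i]) for i in range(2)]
for name, coef, err, (lo, hi) in zip(("a0", "a1"), a, se, intervals):
    print(name, coef, err, lo, hi)
```

Forming the explicit inverse is acceptable for this 2 x 2 case because its diagonal elements are needed for the standard errors anyway.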
As mentioned previously in Sec. 17.2.1, the normal equations are notoriously ill-
conditioned. Hence, if solved with conventional techniques such as LU decomposition,
the computed coefficients can be highly susceptible to round-off error. As a conse-
quence, more sophisticated orthogonalization algorithms, such as QR factorization, are
available to circumvent the problem. Because these techniques are beyond the scope of
this book, the reader should consult texts on regression, such as Draper and Smith
(1981), for additional information regarding the problem and possible alternatives.
Moler (2004) also provides a nice discussion of the topic with emphasis on the nu-
merical methods.
The foregoing is a limited introduction to the rich topic of statistical inference and
its relationship to regression. There are many subtleties that are beyond the scope of this
book. Our primary motivation has been to illustrate the power of the matrix approach to
general linear least squares. In addition, it should be noted that software packages such
as Excel, MATLAB, and Mathcad can generate least-squares regression fits along with
information relevant to inferential statistics. We will explore some of these capabilities
when we describe these packages at the end of Chap. 19.
17.5 NONLINEAR REGRESSION
There are many cases in engineering where nonlinear models must be fit to data. In the
present context, these models are defined as those that have a nonlinear dependence on
their parameters. For example,
$$f(x) = a_0\left(1 - e^{-a_1 x}\right) + e \tag{17.31}$$
This equation cannot be manipulated so that it conforms to the general form of Eq. (17.23).
As with linear least squares, nonlinear regression is based on determining the values
of the parameters that minimize the sum of the squares of the residuals. However, for
the nonlinear case, the solution must proceed in an iterative fashion.
The Gauss-Newton method is one algorithm for minimizing the sum of the squares
of the residuals between data and nonlinear equations. The key concept underlying the
technique is that a Taylor series expansion is used to express the original nonlinear equa-
tion in an approximate, linear form. Then, least-squares theory can be used to obtain new
estimates of the parameters that move in the direction of minimizing the residual.
To illustrate how this is done, first the relationship between the nonlinear equation
and the data can be expressed generally as
$$y_i = f(x_i; a_0, a_1, \ldots, a_m) + e_i$$
where $y_i$ = a measured value of the dependent variable, $f(x_i; a_0, a_1, \ldots, a_m)$ = the
equation that is a function of the independent variable $x_i$ and a nonlinear function of the
parameters $a_0, a_1, \ldots, a_m$, and $e_i$ = a random error. For convenience, this model can be
expressed in abbreviated form by omitting the parameters,
$$y_i = f(x_i) + e_i \tag{17.32}$$
The nonlinear model can be expanded in a Taylor series around the parameter values
and curtailed after the first derivative. For example, for a two-parameter case,
$$f(x_i)_{j+1} = f(x_i)_j + \frac{\partial f(x_i)_j}{\partial a_0}\,\Delta a_0 + \frac{\partial f(x_i)_j}{\partial a_1}\,\Delta a_1 \tag{17.33}$$
where j = the initial guess, j + 1 = the prediction, $\Delta a_0 = a_{0,j+1} - a_{0,j}$, and
$\Delta a_1 = a_{1,j+1} - a_{1,j}$. Thus, we have linearized the original model with respect to the parameters.
Equation (17.33) can be substituted into Eq. (17.32) to yield
$$y_i - f(x_i)_j = \frac{\partial f(x_i)_j}{\partial a_0}\,\Delta a_0 + \frac{\partial f(x_i)_j}{\partial a_1}\,\Delta a_1 + e_i$$
or in matrix form [compare with Eq. (17.24)],
$$\{D\} = [Z_j]\{\Delta A\} + \{E\} \tag{17.34}$$
where $[Z_j]$ is the matrix of partial derivatives of the function evaluated at the initial guess j,
$$[Z_j] = \begin{bmatrix}
\partial f_1/\partial a_0 & \partial f_1/\partial a_1 \\
\partial f_2/\partial a_0 & \partial f_2/\partial a_1 \\
\vdots & \vdots \\
\partial f_n/\partial a_0 & \partial f_n/\partial a_1
\end{bmatrix}$$
where n = the number of data points and $\partial f_i/\partial a_k$ = the partial derivative of the function
with respect to the kth parameter evaluated at the ith data point. The vector {D} contains
the differences between the measurements and the function values,
$$\{D\} = \begin{Bmatrix} y_1 - f(x_1) \\ y_2 - f(x_2) \\ \vdots \\ y_n - f(x_n) \end{Bmatrix}$$
and the vector $\{\Delta A\}$ contains the changes in the parameter values,
$$\{\Delta A\} = \begin{Bmatrix} \Delta a_0 \\ \Delta a_1 \\ \vdots \\ \Delta a_m \end{Bmatrix}$$
Applying linear least-squares theory to Eq. (17.34) results in the following normal
equations [recall Eq. (17.25)]:
$$\left[[Z_j]^T[Z_j]\right]\{\Delta A\} = \left\{[Z_j]^T\{D\}\right\} \tag{17.35}$$
Thus, the approach consists of solving Eq. (17.35) for $\{\Delta A\}$, which can be employed to
compute improved values for the parameters, as in
$$a_{0,j+1} = a_{0,j} + \Delta a_0$$
and
$$a_{1,j+1} = a_{1,j} + \Delta a_1$$
This procedure is repeated until the solution converges, that is, until
$$|\varepsilon_a|_k = \left| \frac{a_{k,j+1} - a_{k,j}}{a_{k,j+1}} \right| 100\% \tag{17.36}$$
falls below an acceptable stopping criterion.
EXAMPLE 17.8 Gauss-Newton Method
Problem Statement. Fit the function $f(x; a_0, a_1) = a_0(1 - e^{-a_1 x})$ to the data:
x    0.25    0.75    1.25    1.75    2.25
y    0.28    0.57    0.68    0.74    0.79
Use initial guesses of $a_0 = 1.0$ and $a_1 = 1.0$ for the parameters. Note that for these
guesses, the initial sum of the squares of the residuals is 0.0248.
Solution. The partial derivatives of the function with respect to the parameters are
$$\frac{\partial f}{\partial a_0} = 1 - e^{-a_1 x} \tag{E17.8.1}$$
and
$$\frac{\partial f}{\partial a_1} = a_0 x e^{-a_1 x} \tag{E17.8.2}$$
Equations (E17.8.1) and (E17.8.2) can be used to evaluate the matrix
$$[Z_0] = \begin{bmatrix}
0.2212 & 0.1947 \\
0.5276 & 0.3543 \\
0.7135 & 0.3581 \\
0.8262 & 0.3041 \\
0.8946 & 0.2371
\end{bmatrix}$$
Premultiplying this matrix by its transpose results in
$$[Z_0]^T[Z_0] = \begin{bmatrix} 2.3193 & 0.9489 \\ 0.9489 & 0.4404 \end{bmatrix}$$
which in turn can be inverted to yield
$$\left[[Z_0]^T[Z_0]\right]^{-1} = \begin{bmatrix} 3.6397 & -7.8421 \\ -7.8421 & 19.1678 \end{bmatrix}$$
The vector {D} consists of the differences between the measurements and the model
predictions,
$$\{D\} = \begin{Bmatrix}
0.28 - 0.2212 \\ 0.57 - 0.5276 \\ 0.68 - 0.7135 \\ 0.74 - 0.8262 \\ 0.79 - 0.8946
\end{Bmatrix} = \begin{Bmatrix}
0.0588 \\ 0.0424 \\ -0.0335 \\ -0.0862 \\ -0.1046
\end{Bmatrix}$$
It is multiplied by $[Z_0]^T$ to give
$$[Z_0]^T\{D\} = \begin{Bmatrix} -0.1533 \\ -0.0365 \end{Bmatrix}$$
The vector $\{\Delta A\}$ is then calculated by solving Eq. (17.35) for
$$\{\Delta A\} = \begin{Bmatrix} -0.2714 \\ 0.5019 \end{Bmatrix}$$
which can be added to the initial parameter guesses to yield
$$\begin{Bmatrix} a_0 \\ a_1 \end{Bmatrix} = \begin{Bmatrix} 1.0 \\ 1.0 \end{Bmatrix} + \begin{Bmatrix} -0.2714 \\ 0.5019 \end{Bmatrix} = \begin{Bmatrix} 0.7286 \\ 1.5019 \end{Bmatrix}$$
Thus, the improved estimates of the parameters are $a_0 = 0.7286$ and $a_1 = 1.5019$. The
new parameters result in a sum of the squares of the residuals equal to 0.0242. Equation
(17.36) can be used to compute $|\varepsilon_a|_0$ and $|\varepsilon_a|_1$ equal to 37 and 33 percent, respectively. The
computation would then be repeated until these values fell below the prescribed stopping
criterion. The final result is $a_0 = 0.79186$ and $a_1 = 1.6751$. These coefficients give a
sum of the squares of the residuals of 0.000662.
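The iteration of Example 17.8 can be sketched in Python as follows. This is an illustrative implementation for the two-parameter model only: the function name `gauss_newton`, the default tolerance, and the iteration cap are assumptions, and the 2 x 2 normal equations of Eq. (17.35) are solved with Cramer's rule to keep the block short.

```python
from math import exp


def gauss_newton(x, y, a0, a1, tol_percent=0.01, max_iter=50):
    """Fit f(x) = a0*(1 - exp(-a1*x)); returns (a0, a1, list of steps)."""
    history = []
    for _ in range(max_iter):
        # Columns of [Zj]: partial derivatives (E17.8.1) and (E17.8.2)
        z0 = [1.0 - exp(-a1 * xi) for xi in x]
        z1 = [a0 * xi * exp(-a1 * xi) for xi in x]
        # {D}: measurements minus current model predictions
        d = [yi - a0 * (1.0 - exp(-a1 * xi)) for xi, yi in zip(x, y)]
        # Elements of [Z^T Z] and {Z^T D}
        s00 = sum(v * v for v in z0)
        s01 = sum(u * v for u, v in zip(z0, z1))
        s11 = sum(v * v for v in z1)
        r0 = sum(u * v for u, v in zip(z0, d))
        r1 = sum(u * v for u, v in zip(z1, d))
        # Solve the 2 x 2 system of Eq. (17.35) by Cramer's rule
        det = s00 * s11 - s01 * s01
        da0 = (r0 * s11 - s01 * r1) / det
        da1 = (s00 * r1 - s01 * r0) / det
        a0, a1 = a0 + da0, a1 + da1
        history.append((da0, da1))
        # Stopping test of Eq. (17.36), in percent
        if max(abs(da0 / a0), abs(da1 / a1)) * 100 < tol_percent:
            break
    return a0, a1, history


x_data = [0.25, 0.75, 1.25, 1.75, 2.25]
y_data = [0.28, 0.57, 0.68, 0.74, 0.79]
a0_fit, a1_fit, steps = gauss_newton(x_data, y_data, 1.0, 1.0)
s_r = sum((yi - a0_fit * (1.0 - exp(-a1_fit * xi))) ** 2
          for xi, yi in zip(x_data, y_data))
print(a0_fit, a1_fit, s_r, steps[0])
```

The first entry of `steps` corresponds to the $\{\Delta A\}$ computed by hand in the example, and the converged parameters should match the final result quoted there. The sketch has no safeguard against divergence, so a poor initial guess may fail.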
A potential problem with the Gauss-Newton method as developed to this point is
that the partial derivatives of the function may be difficult to evaluate. Consequently,
many computer programs use difference equations to approximate the partial derivatives.
One method is
$$\frac{\partial f_i}{\partial a_k} \cong \frac{f(x_i; a_0, \ldots, a_k + \delta a_k, \ldots, a_m) - f(x_i; a_0, \ldots, a_k, \ldots, a_m)}{\delta a_k} \tag{17.37}$$
where $\delta$ = a small fractional perturbation.
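A minimal sketch of Eq. (17.37) in Python is given below, checked against the analytical partial derivatives of Example 17.8 at the initial guesses. The names `model` and `fd_partials` and the default value of `delta` are assumptions; the fractional step also presumes that no parameter is exactly zero.

```python
from math import exp


def model(x, a):
    """Model of Eq. (17.31) without the error term."""
    return a[0] * (1.0 - exp(-a[1] * x))


def fd_partials(f, x, a, delta=1e-6):
    """Forward-difference estimate of df/da_k at x for each parameter."""
    base = f(x, a)
    grads = []
    for k in range(len(a)):
        step = delta * a[k]        # fractional perturbation; assumes a[k] != 0
        shifted = list(a)
        shifted[k] += step
        grads.append((f(x, shifted) - base) / step)
    return grads


# First data point of Example 17.8 with a0 = a1 = 1.0
approx = fd_partials(model, 0.25, [1.0, 1.0])
exact = [1.0 - exp(-0.25), 1.0 * 0.25 * exp(-0.25)]
print(approx, exact)
```

The two results correspond to the first row of $[Z_0]$ in Example 17.8. The choice of `delta` trades truncation error against round-off, so a value that is too small can degrade the estimate.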
The Gauss-Newton method has a number of other possible shortcomings:
1. It may converge slowly.
2. It may oscillate widely, that is, continually change directions.
3. It may not converge at all.
Modifications of the method (Booth and Peterson, 1958; Hartley, 1961) have been de-
veloped to remedy the shortcomings.
In addition, although there are several approaches expressly designed for regres-
sion, a more general approach is to use nonlinear optimization routines as described
in Part Four. To do this, a guess for the parameters is made, and the sum of the
squares of the residuals is computed. For example, for Eq. (17.31) it would be computed as
$$S_r = \sum_{i=1}^{n} \left[y_i - a_0\left(1 - e^{-a_1 x_i}\right)\right]^2 \tag{17.38}$$
Then, the parameters would be adjusted systematically to minimize Sr using search tech-
niques of the type described previously in Chap. 14. We will illustrate how this is done
when we describe software applications at the end of Chap. 19.
PROBLEMS
17.1 Given these data
8.8 9.5 9.8 9.4 10.0
9.4 10.1 9.2 11.3 9.4
10.0 10.4 7.9 10.4 9.8
9.8 9.5 8.9 8.8 10.6
10.1 9.5 9.6 10.2 8.9
Determine (a) the mean, (b) the standard deviation, (c) the vari-
ance, (d) the coefficient of variation, and (e) the 95% confidence
interval for the mean. (f) construct a histogram using a range from
7.5 to 11.5 with intervals of 0.5.
17.2 Given these data
29.65 28.55 28.65 30.15 29.35 29.75 29.25
30.65 28.15 29.85 29.05 30.25 30.85 28.75
29.65 30.45 29.15 30.45 33.65 29.35 29.75
31.25 29.45 30.15 29.65 30.55 29.65 29.25
Determine (a) the mean, (b) the standard deviation, (c) the vari-
ance, (d) the coefficient of variation, and (e) the 90% confidence
interval for the mean. (f) Construct a histogram. Use a range from
28 to 34 with increments of 0.4. (g) Assuming that the distribution
is normal and that your estimate of the standard deviation is valid,
compute the range (that is, the lower and the upper values) that
encompasses 68% of the readings. Determine whether this is a
valid estimate for the data in this problem.
17.3 Use least-squares regression to fit a straight line to
x 0 2 4 6 9 11 12 15 17 19
y 5 6 7 6 9 8 7 10 12 12
Along with the slope and intercept, compute the standard error of
the estimate and the correlation coefficient. Plot the data and the
regression line. Then repeat the problem, but regress x versus y—
that is, switch the variables. Interpret your results.
17.4 Use least-squares regression to fit a straight line to
x 6 7 11 15 17 21 23 29 29 37 39
y 29 21 29 14 21 15 7 7 13 0 3
Along with the slope and the intercept, compute the standard error of
the estimate and the correlation coefficient. Plot the data and the
regression line. If someone made an additional measurement of x = 10,
y = 10, would you suspect, based on a visual assessment and the
standard error, that the measurement was valid or faulty? Justify your
conclusion.
17.5 Using the same approach as was employed to derive Eqs. (17.15)
and (17.16), derive the least-squares fit of the following model:
$$y = a_1 x + e$$
That is, determine the slope that results in the least-squares fit for a
straight line with a zero intercept. Fit the following data with this
model and display the result graphically:
x 2 4 6 7 10 11 14 17 20
y 1 2 5 2 8 7 6 9 12
17.6 Use least-squares regression to fit a straight line to
x 1 2 3 4 5 6 7 8 9
y 1 1.5 2 3 4 5 8 10 13
(a) Along with the slope and intercept, compute the standard error
of the estimate and the correlation coefficient. Plot the data and
the straight line. Assess the fit.
(b) Recompute (a), but use polynomial regression to fit a parabola
to the data. Compare the results with those of (a).
17.7 Fit the following data with (a) a saturation-growth-rate model,
(b) a power equation, and (c) a parabola. In each case, plot the data
and the equation.
x 0.75 2 3 4 6 8 8.5
y 1.2 1.95 2 2.4 2.4 2.7 2.6
17.8 Fit the following data with the power model ($y = ax^b$). Use
the resulting power equation to predict y at x = 9:
x 2.5 3.5 5 6 7.5 10 12.5 15 17.5 20
y 13 11 8.5 8.2 7 6.2 5.2 4.8 4.6 4.3
17.9 Fit an exponential model to
x 0.4 0.8 1.2 1.6 2 2.3
y 800 975 1500 1950 2900 3600
Plot the data and the equation on both standard and semi-logarithmic
graph paper.
17.10 Rather than using the base-e exponential model (Eq. 17.12),
a common alternative is to use a base-10 model,
$$y = \alpha_5 10^{\beta_5 x}$$
When used for curve fitting, this equation yields identical results
to the base-e version, but the value of the exponent parameter ($\beta_5$)
will differ from that estimated with Eq. 17.12 ($\beta_1$). Use the base-10
version to solve Prob. 17.9. In addition, develop a formulation to
relate $\beta_1$ to $\beta_5$.
17.11 Beyond the examples in Fig. 17.9, there are other models
that can be linearized using transformations. For example,
$$y = \alpha_4 x e^{\beta_4 x}$$
Determine the coefficients by setting up and solving Eq. (17.25).
17.16 Given these data
x 5 10 15 20 25 30 35 40 45 50
y 17 24 31 33 37 37 40 40 42 41
use least-squares regression to fit (a) a straight line, (b) a power
equation, (c) a saturation-growth-rate equation, and (d) a parabola.
Plot the data along with all the curves. Is any one of the curves
superior? If so, justify.
17.17 Fit a cubic equation to the following data:
x 3 4 5 7 8 9 11 12
y 1.6 3.6 4.4 3.4 2.2 2.8 3.8 4.6
Along with the coefficients, determine $r^2$ and $s_{y/x}$.
17.18 Use multiple linear regression to fit
x1 0 1 1 2 2 3 3 4 4
x2 0 1 2 1 2 1 2 1 2
y 15.1 17.9 12.7 25.6 20.5 35.1 29.7 45.4 40.2
Compute the coefficients, the standard error of the estimate, and the
correlation coefficient.
17.19 Use multiple linear regression to fit
x1 0 0 1 2 0 1 2 2 1
x2 0 2 2 4 4 6 6 2 1
y 14 21 11 12 23 23 14 6 11
Compute the coefficients, the standard error of the estimate, and the
correlation coefficient.
17.20 Use nonlinear regression to fit a parabola to the following
data:
x 0.2 0.5 0.8 1.2 1.7 2 2.3
y 500 700 1000 1200 2200 2650 3750
17.21 Use nonlinear regression to fit a saturation-growth-rate
equation to the data in Prob. 17.16.
17.22 Recompute the regression fits from Probs. (a) 17.3 and (b)
17.17, using the matrix approach. Estimate the standard errors and
develop 90% confidence intervals for the coefficients.
17.23 Develop, debug, and test a program in either a high-level
language or macro language of your choice to implement linear
regression. Among other things: (a) include statements to docu-
ment the code, and (b) determine the standard error and the coeffi-
cient of determination.
17.24 A material is tested for cyclic fatigue failure whereby a
stress, in MPa, is applied to the material and the number of cycles
needed to cause failure is measured. The results are in the table
below. When a log-log plot of stress versus cycles is generated, the
Linearize this model and use it to estimate a4 and b4 based on the
following data. Develop a plot of your fit along with the data.
x 0.1 0.2 0.4 0.6 0.9 1.3 1.5 1.7 1.8
y 0.75 1.25 1.45 1.25 0.85 0.55 0.35 0.28 0.18
17.12 An investigator has reported the data tabulated below for an
experiment to determine the growth rate of bacteria k (per d), as a
function of oxygen concentration c (mg/L). It is known that such
data can be modeled by the following equation:
$$k = \frac{k_{\max} c^2}{c_s + c^2}$$
where $c_s$ and $k_{\max}$ are parameters. Use a transformation to linearize
this equation. Then use linear regression to estimate $c_s$ and $k_{\max}$ and
predict the growth rate at c = 2 mg/L.
c 0.5 0.8 1.5 2.5 4
k 1.1 2.4 5.3 7.6 8.9
17.13 An investigator has reported the data tabulated below. It is
known that such data can be modeled by the following equation
$$x = e^{(y - b)/a}$$
where a and b are parameters. Use a transformation to linearize this
equation and then employ linear regression to determine a and b.
Based on your analysis predict y at x = 2.6.
x 1 2 3 4 5
y 0.5 2 2.9 3.5 4
17.14 It is known that the data tabulated below can be modeled by
the following equation
$$y = \left(\frac{a + \sqrt{x}}{b\sqrt{x}}\right)^2$$
Use a transformation to linearize this equation and then employ
linear regression to determine the parameters a and b. Based on
your analysis predict y at x = 1.6.
x 0.5 1 2 3 4
y 10.4 5.8 3.3 2.4 2
17.15 The following data are provided
x 1 2 3 4 5
y 2.2 2.8 3.6 4.5 5.5
You want to use least-squares regression to fit these data with the
following model,
$$y = a + bx + \frac{c}{x}$$
at which the concentration will reach 200 CFU/100 mL. Note that
your choice of model should be consistent with the fact that negative
concentrations are impossible and that the bacteria concentration
always decreases with time.
17.28 An object is suspended in a wind tunnel and the force mea-
sured for various levels of wind velocity. The results are tabulated
below.
v, m/s 10 20 30 40 50 60 70 80
F, N 25 70 380 550 610 1220 830 1450
Use least-squares regression to fit these data with (a) a straight line,
(b) a power equation based on log transformations, and (c) a power
model based on nonlinear regression. Display the results graphically.
17.29 Fit a power model to the data from Prob. 17.28, but use
natural logarithms to perform the transformations.
17.30 Derive the least-squares fit of the following model:
$$y = a_1 x + a_2 x^2 + e$$
That is, determine the coefficients that result in the least-squares fit
for a second-order polynomial with a zero intercept. Test the approach
by using it to fit the data from Prob. 17.28.
17.31 In Prob. 17.11 we used transformations to linearize and fit
the following model:
$$y = a_4 x e^{b_4 x}$$
Use nonlinear regression to estimate $a_4$ and $b_4$ based on the following
data. Develop a plot of your fit along with the data.
x 0.1 0.2 0.4 0.6 0.9 1.3 1.5 1.7 1.8
y 0.75 1.25 1.45 1.25 0.85 0.55 0.35 0.28 0.18
data trend shows a linear relationship. Use least-squares regression
to determine a best-fit equation for these data.
N, cycles 1 10 100 1000 10,000 100,000 1,000,000
Stress, MPa 1100 1000 925 800 625 550 420
17.25 The following data show the relationship between the viscosity
of SAE 70 oil and temperature. After taking the log of the
data, use linear regression to find the equation of the line that best
fits the data and the $r^2$ value.
Temperature, °C 26.67 93.33 148.89 315.56
Viscosity, μ, N·s/m² 1.35 0.085 0.012 0.00075
17.26 The data below represent the bacterial growth in a liquid
culture over a number of days.
Day 0 4 8 12 16 20
Amount × 10^6 67 84 98 125 149 185
Find a best-fit equation to the data trend. Try several possibilities—
linear, parabolic, and exponential. Use the software package of
your choice to find the best equation to predict the amount of bac-
teria after 40 days.
17.27 The concentration of E. coli bacteria in a swimming area is
monitored after a storm:
t (hr) 4 8 12 16 20 24
c (CFU/100 mL) 1600 1320 1000 890 650 560
The time is measured in hours following the end of the storm and
the unit CFU is a “colony forming unit.” Use these data to estimate
(a) the concentration at the end of the storm (t = 0) and (b) the time
CHAPTER 18
Interpolation
You will frequently have occasion to estimate intermediate values between precise data
points. The most common method used for this purpose is polynomial interpolation.
Recall that the general formula for an nth-order polynomial is
$$f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n \tag{18.1}$$
For n + 1 data points, there is one and only one polynomial of order n that passes
through all the points. For example, there is only one straight line (that is, a first-order
polynomial) that connects two points (Fig. 18.1a). Similarly, only one parabola connects
a set of three points (Fig. 18.1b). Polynomial interpolation consists of determining the
unique nth-order polynomial that fits n + 1 data points. This polynomial then provides
a formula to compute intermediate values.
Although there is one and only one nth-order polynomial that fits n + 1 points, there
are a variety of mathematical formats in which this polynomial can be expressed. In this
chapter, we will describe two alternatives that are well-suited for computer implementation:
the Newton and the Lagrange polynomials.
FIGURE 18.1
Examples of interpolating polynomials: (a) first-order (linear) connecting two points, (b) second-
order (quadratic or parabolic) connecting three points, and (c) third-order (cubic) connecting
four points.
18.1 NEWTON’S DIVIDED-DIFFERENCE INTERPOLATING
POLYNOMIALS
As stated above, there are a variety of alternative forms for expressing an interpolating
polynomial. Newton’s divided-difference interpolating polynomial is among the most
popular and useful forms. Before presenting the general equation, we will introduce the
first- and second-order versions because of their simple visual interpretation.
18.1.1 Linear Interpolation
The simplest form of interpolation is to connect two data points with a straight line. This
technique, called linear interpolation, is depicted graphically in Fig. 18.2. Using similar triangles,
$$\frac{f_1(x) - f(x_0)}{x - x_0} = \frac{f(x_1) - f(x_0)}{x_1 - x_0}$$
which can be rearranged to yield
$$f_1(x) = f(x_0) + \frac{f(x_1) - f(x_0)}{x_1 - x_0}(x - x_0) \tag{18.2}$$
which is a linear-interpolation formula. The notation $f_1(x)$ designates that this is a
first-order interpolating polynomial. Notice that besides representing the slope of the line
connecting the points, the term $[f(x_1) - f(x_0)]/(x_1 - x_0)$ is a finite-divided-difference
FIGURE 18.2
Graphical depiction of linear interpolation. The shaded areas indicate the similar triangles used
to derive the linear-interpolation formula [Eq. (18.2)].
approximation of the first derivative [recall Eq. (4.17)]. In general, the smaller the inter-
val between the data points, the better the approximation. This is due to the fact that, as
the interval decreases, a continuous function will be better approximated by a straight
line. This characteristic is demonstrated in the following example.
EXAMPLE 18.1 Linear Interpolation
Problem Statement. Estimate the natural logarithm of 2 using linear interpolation.
First, perform the computation by interpolating between ln 1 = 0 and ln 6 = 1.791759.
Then, repeat the procedure, but use a smaller interval from ln 1 to ln 4 (1.386294). Note
that the true value of ln 2 is 0.6931472.
Solution. We use Eq. (18.2) and a linear interpolation for ln(2) from $x_0 = 1$ to
$x_1 = 6$ to give
$$f_1(2) = 0 + \frac{1.791759 - 0}{6 - 1}(2 - 1) = 0.3583519$$
which represents an error of $\varepsilon_t = 48.3\%$. Using the smaller interval from $x_0 = 1$ to
$x_1 = 4$ yields
$$f_1(2) = 0 + \frac{1.386294 - 0}{4 - 1}(2 - 1) = 0.4620981$$
Thus, using the shorter interval reduces the percent relative error to $\varepsilon_t = 33.3\%$. Both
interpolations are shown in Fig. 18.3, along with the true function.
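The two computations of Example 18.1 can be reproduced with a few lines of code. The following is a minimal Python sketch of Eq. (18.2); the function name `linear_interp` and its argument order are our own choices for illustration, not part of the text.

```python
import math

def linear_interp(x0, f0, x1, f1, x):
    """First-order (linear) interpolation between (x0, f0) and (x1, f1), Eq. (18.2)."""
    return f0 + (f1 - f0) / (x1 - x0) * (x - x0)

true_value = math.log(2.0)
for x1 in (6.0, 4.0):
    # Interpolate ln x between x0 = 1 and the chosen x1, then evaluate at x = 2.
    estimate = linear_interp(1.0, math.log(1.0), x1, math.log(x1), 2.0)
    rel_err = abs(true_value - estimate) / true_value * 100.0
    print(f"x1 = {x1}: f1(2) = {estimate:.7f}, percent relative error = {rel_err:.1f}")
```

Shrinking the interval from [1, 6] to [1, 4] lowers the error, as the example states.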
FIGURE 18.3
Two linear interpolations to estimate ln 2. Note how the smaller interval provides a better
estimate.
18.1.2 Quadratic Interpolation
The error in Example 18.1 resulted from our approximating a curve with a straight line.
Consequently, a strategy for improving the estimate is to introduce some curvature into
the line connecting the points. If three data points are available, this can be accomplished
with a second-order polynomial (also called a quadratic polynomial or a parabola). A
particularly convenient form for this purpose is
$$f_2(x) = b_0 + b_1(x - x_0) + b_2(x - x_0)(x - x_1) \tag{18.3}$$
Note that although Eq. (18.3) might seem to differ from the general polynomial [Eq. (18.1)],
the two equations are equivalent. This can be shown by multiplying the terms in
Eq. (18.3) to yield
$$f_2(x) = b_0 + b_1 x - b_1 x_0 + b_2 x^2 + b_2 x_0 x_1 - b_2 x x_0 - b_2 x x_1$$
or, collecting terms,
$$f_2(x) = a_0 + a_1 x + a_2 x^2$$
where
$$a_0 = b_0 - b_1 x_0 + b_2 x_0 x_1$$
$$a_1 = b_1 - b_2 x_0 - b_2 x_1$$
$$a_2 = b_2$$
Thus, Eqs. (18.1) and (18.3) are alternative, equivalent formulations of the unique second-order
polynomial joining the three points.
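The equivalence of the two forms can be checked numerically. The sketch below evaluates Eq. (18.3) directly and then through the collected coefficients $a_0$, $a_1$, and $a_2$. The coefficient values and base points are hypothetical, chosen only for illustration, and the function names are our own.

```python
def newton_form(b, x0, x1, x):
    """Evaluate the quadratic in the form of Eq. (18.3)."""
    b0, b1, b2 = b
    return b0 + b1 * (x - x0) + b2 * (x - x0) * (x - x1)

def power_form(b, x0, x1, x):
    """Evaluate the same quadratic as a0 + a1*x + a2*x**2, Eq. (18.1)."""
    b0, b1, b2 = b
    a0 = b0 - b1 * x0 + b2 * x0 * x1
    a1 = b1 - b2 * x0 - b2 * x1
    a2 = b2
    return a0 + a1 * x + a2 * x**2

# Hypothetical coefficients and base points (not taken from the text).
b = (2.0, -1.5, 0.25)
x0, x1 = 1.0, 4.0

# Largest disagreement between the two forms over a few sample points.
max_gap = max(abs(newton_form(b, x0, x1, x) - power_form(b, x0, x1, x))
              for x in (-2.0, 0.0, 1.0, 2.5, 4.0, 7.0))
print(max_gap)
```

Any disagreement should be at the level of round-off, since both functions describe the same parabola.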
A simple procedure can be used to determine the values of the coefficients. For $b_0$,
Eq. (18.3) with $x = x_0$ can be used to compute
$$b_0 = f(x_0) \tag{18.4}$$
Equation (18.4) can be substituted into Eq. (18.3), which can be evaluated at $x = x_1$ for
$$b_1 = \frac{f(x_1) - f(x_0)}{x_1 - x_0} \tag{18.5}$$
Finally, Eqs. (18.4) and (18.5) can be substituted into Eq. (18.3), which can be evaluated
at $x = x_2$ and solved (after some algebraic manipulations) for
$$b_2 = \frac{\dfrac{f(x_2) - f(x_1)}{x_2 - x_1} - \dfrac{f(x_1) - f(x_0)}{x_1 - x_0}}{x_2 - x_0} \tag{18.6}$$
Notice that, as was the case with linear interpolation, $b_1$ still represents the slope of
the line connecting points $x_0$ and $x_1$. Thus, the first two terms of Eq. (18.3) are equivalent
to linear interpolation from $x_0$ to $x_1$, as specified previously in Eq. (18.2). The last term,
$b_2(x - x_0)(x - x_1)$, introduces the second-order curvature into the formula.
Before illustrating how to use Eq. (18.3), we should examine the form of the coefficient
$b_2$. It is very similar to the finite-divided-difference approximation of the second
derivative introduced previously in Eq. (4.24). Thus, Eq. (18.3) is beginning to manifest
a structure that is very similar to the Taylor series expansion. This observation will be
explored further when we relate Newton’s interpolating polynomials to the Taylor series
in Sec. 18.1.4. But first, we will do an example that shows how Eq. (18.3) is used to
interpolate among three points.
EXAMPLE 18.2 Quadratic Interpolation
Problem Statement. Fit a second-order polynomial to the three points used in Example 18.1:
$x_0 = 1$   $f(x_0) = 0$
$x_1 = 4$   $f(x_1) = 1.386294$
$x_2 = 6$   $f(x_2) = 1.791759$
Use the polynomial to evaluate ln 2.
Solution. Applying Eq. (18.4) yields
$$b_0 = 0$$
Equation (18.5) yields
$$b_1 = \frac{1.386294 - 0}{4 - 1} = 0.4620981$$
and Eq. (18.6) gives
$$b_2 = \frac{\dfrac{1.791759 - 1.386294}{6 - 4} - 0.4620981}{6 - 1} = -0.0518731$$
FIGURE 18.4
The use of quadratic interpolation to estimate ln 2. The linear interpolation from x = 1 to 4 is
also included for comparison.
Substituting these values into Eq. (18.3) yields the quadratic formula
$$f_2(x) = 0 + 0.4620981(x - 1) - 0.0518731(x - 1)(x - 4)$$
which can be evaluated at x = 2 for
$$f_2(2) = 0.5658444$$
which represents a relative error of $\varepsilon_t = 18.4\%$. Thus, the curvature introduced by the
quadratic formula (Fig. 18.4) improves the interpolation compared with the result obtained
using straight lines in Example 18.1 and Fig. 18.3.
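The computation in Example 18.2 can be sketched in Python as follows. The helper names `quadratic_coeffs` and `eval_quadratic` are assumptions made for this illustration; the formulas are Eqs. (18.4) through (18.6) and Eq. (18.3).

```python
import math

def quadratic_coeffs(xs, fs):
    """Return (b0, b1, b2) of Eq. (18.3) from three points, via Eqs. (18.4)-(18.6)."""
    x0, x1, x2 = xs
    f0, f1, f2 = fs
    b0 = f0
    b1 = (f1 - f0) / (x1 - x0)
    b2 = ((f2 - f1) / (x2 - x1) - b1) / (x2 - x0)
    return b0, b1, b2

def eval_quadratic(coeffs, xs, x):
    """Evaluate Eq. (18.3) at x using the first two base points."""
    b0, b1, b2 = coeffs
    return b0 + b1 * (x - xs[0]) + b2 * (x - xs[0]) * (x - xs[1])

xs = (1.0, 4.0, 6.0)
fs = tuple(math.log(v) for v in xs)
coeffs = quadratic_coeffs(xs, fs)
f2_at_2 = eval_quadratic(coeffs, xs, 2.0)
rel_err = abs(math.log(2.0) - f2_at_2) / math.log(2.0) * 100.0
print(coeffs, f2_at_2, rel_err)
```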
18.1.3 General Form of Newton’s Interpolating Polynomials
The preceding analysis can be generalized to fit an nth-order polynomial to n + 1 data
points. The nth-order polynomial is
$$f_n(x) = b_0 + b_1(x - x_0) + \cdots + b_n(x - x_0)(x - x_1)\cdots(x - x_{n-1}) \tag{18.7}$$
As was done previously with the linear and quadratic interpolations, data points can be
used to evaluate the coefficients $b_0, b_1, \ldots, b_n$. For an nth-order polynomial, n + 1 data
points are required: $[x_0, f(x_0)], [x_1, f(x_1)], \ldots, [x_n, f(x_n)]$. We use these data points and
the following equations to evaluate the coefficients:
$$b_0 = f(x_0) \tag{18.8}$$
$$b_1 = f[x_1, x_0] \tag{18.9}$$
$$b_2 = f[x_2, x_1, x_0] \tag{18.10}$$
$$\vdots$$
$$b_n = f[x_n, x_{n-1}, \ldots, x_1, x_0] \tag{18.11}$$
where the bracketed function evaluations are finite divided differences. For example, the
first finite divided difference is represented generally as
$$f[x_i, x_j] = \frac{f(x_i) - f(x_j)}{x_i - x_j} \tag{18.12}$$
The second finite divided difference, which represents the difference of two first divided
differences, is expressed generally as
$$f[x_i, x_j, x_k] = \frac{f[x_i, x_j] - f[x_j, x_k]}{x_i - x_k} \tag{18.13}$$
Similarly, the nth finite divided difference is
$$f[x_n, x_{n-1}, \ldots, x_1, x_0] = \frac{f[x_n, x_{n-1}, \ldots, x_1] - f[x_{n-1}, x_{n-2}, \ldots, x_0]}{x_n - x_0} \tag{18.14}$$
These differences can be used to evaluate the coefficients in Eqs. (18.8) through
(18.11), which can then be substituted into Eq. (18.7) to yield the interpolating
polynomial
$$f_n(x) = f(x_0) + (x - x_0) f[x_1, x_0] + (x - x_0)(x - x_1) f[x_2, x_1, x_0] + \cdots + (x - x_0)(x - x_1)\cdots(x - x_{n-1}) f[x_n, x_{n-1}, \ldots, x_0] \tag{18.15}$$
which is called Newton’s divided-difference interpolating polynomial. It should be noted
that it is not necessary that the data points used in Eq. (18.15) be equally spaced or that
the abscissa values necessarily be in ascending order, as illustrated in the following
example. Also, notice how Eqs. (18.12) through (18.14) are recursive—that is, higher-
order differences are computed by taking differences of lower-order differences (Fig. 18.5).
This property will be exploited when we develop an efficient computer program in
Sec. 18.1.5 to implement the method.
EXAMPLE 18.3 Newton’s Divided-Difference Interpolating Polynomials
Problem Statement. In Example 18.2, data points at $x_0 = 1$, $x_1 = 4$, and $x_2 = 6$ were
used to estimate ln 2 with a parabola. Now, adding a fourth point [$x_3 = 5$; $f(x_3) = 1.609438$],
estimate ln 2 with a third-order Newton’s interpolating polynomial.
Solution. The third-order polynomial, Eq. (18.7) with n = 3, is
$$f_3(x) = b_0 + b_1(x - x_0) + b_2(x - x_0)(x - x_1) + b_3(x - x_0)(x - x_1)(x - x_2)$$
The first divided differences for the problem are [Eq. (18.12)]
$$f[x_1, x_0] = \frac{1.386294 - 0}{4 - 1} = 0.4620981$$
$$f[x_2, x_1] = \frac{1.791759 - 1.386294}{6 - 4} = 0.2027326$$
$$f[x_3, x_2] = \frac{1.609438 - 1.791759}{5 - 6} = 0.1823216$$
FIGURE 18.5
Graphical depiction of the recursive nature of finite divided differences.
i xi f(xi) First Second Third
0 x0 f(x0) f[x1, x0] f[x2, x1, x0] f[x3, x2, x1, x0]
1 x1 f(x1) f[x2, x1] f[x3, x2, x1]
2 x2 f(x2) f[x3, x2]
3 x3 f(x3)
The second divided differences are [Eq. (18.13)]
$$f[x_2, x_1, x_0] = \frac{0.2027326 - 0.4620981}{6 - 1} = -0.05187311$$
$$f[x_3, x_2, x_1] = \frac{0.1823216 - 0.2027326}{5 - 4} = -0.02041100$$
The third divided difference is [Eq. (18.14) with n = 3]
$$f[x_3, x_2, x_1, x_0] = \frac{-0.02041100 - (-0.05187311)}{5 - 1} = 0.007865529$$
The results for $f[x_1, x_0]$, $f[x_2, x_1, x_0]$, and $f[x_3, x_2, x_1, x_0]$ represent the coefficients $b_1$, $b_2$,
and $b_3$, respectively, of Eq. (18.7). Along with $b_0 = f(x_0) = 0.0$, Eq. (18.7) is
$$f_3(x) = 0 + 0.4620981(x - 1) - 0.05187311(x - 1)(x - 4) + 0.007865529(x - 1)(x - 4)(x - 6)$$
which can be used to evaluate $f_3(2) = 0.6287686$, which represents a relative error of
$\varepsilon_t = 9.3\%$. The complete cubic polynomial is shown in Fig. 18.6.
FIGURE 18.6
The use of cubic interpolation to estimate ln 2.
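The divided-difference table of Fig. 18.5 and the cubic estimate of Example 18.3 can be sketched in Python as shown below. The function names are our own, and the nested-multiplication evaluation is one convenient way to evaluate Eq. (18.7), not something the example prescribes.

```python
import math

def divided_differences(xs, fs):
    """Build the table of Fig. 18.5: fdd[i][j] = f[x_{i+j}, ..., x_i]."""
    n = len(xs)
    fdd = [[0.0] * n for _ in range(n)]
    for i in range(n):
        fdd[i][0] = fs[i]
    for j in range(1, n):
        for i in range(n - j):
            # Higher-order differences come from differences of lower-order ones.
            fdd[i][j] = (fdd[i + 1][j - 1] - fdd[i][j - 1]) / (xs[i + j] - xs[i])
    return fdd

def newton_eval(xs, coeffs, x):
    """Evaluate Eq. (18.7) by nested multiplication, starting from the last coefficient."""
    result = coeffs[-1]
    for k in range(len(coeffs) - 2, -1, -1):
        result = result * (x - xs[k]) + coeffs[k]
    return result

xs = [1.0, 4.0, 6.0, 5.0]
fs = [math.log(v) for v in xs]
table = divided_differences(xs, fs)
b = table[0]  # the top row holds b0, b1, b2, b3
f3_at_2 = newton_eval(xs, b, 2.0)
print(b, f3_at_2)
```

Note that the base points are not in ascending order, which Eq. (18.15) permits.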
18.1.4 Errors of Newton’s Interpolating Polynomials
Notice that the structure of Eq. (18.15) is similar to the Taylor series expansion in the
sense that terms are added sequentially to capture the higher-order behavior of the
underlying function. These terms are finite divided differences and, thus, represent
approximations of the higher-order derivatives. Consequently, as with the Taylor series,
if the true underlying function is an nth-order polynomial, the nth-order interpolating
polynomial based on n + 1 data points will yield exact results.
Also, as was the case with the Taylor series, a formulation for the truncation error
can be obtained. Recall from Eq. (4.6) that the truncation error for the Taylor series could
be expressed generally as
$$R_n = \frac{f^{(n+1)}(\xi)}{(n+1)!}(x_{i+1} - x_i)^{n+1} \tag{4.6}$$
where $\xi$ is somewhere in the interval $x_i$ to $x_{i+1}$. For an nth-order interpolating polynomial,
an analogous relationship for the error is
$$R_n = \frac{f^{(n+1)}(\xi)}{(n+1)!}(x - x_0)(x - x_1)\cdots(x - x_n) \tag{18.16}$$
where $\xi$ is somewhere in the interval containing the unknown and the data. For this
formula to be of use, the function in question must be known and differentiable. This is
not usually the case. Fortunately, an alternative formulation is available that does not
require prior knowledge of the function. Rather, it uses a finite divided difference to
approximate the (n + 1)th derivative,
$$R_n = f[x, x_n, x_{n-1}, \ldots, x_0](x - x_0)(x - x_1)\cdots(x - x_n) \tag{18.17}$$
where $f[x, x_n, x_{n-1}, \ldots, x_0]$ is the (n + 1)th finite divided difference. Because Eq. (18.17)
contains the unknown f(x), it cannot be solved for the error. However, if an additional
data point $f(x_{n+1})$ is available, Eq. (18.17) can be used to estimate the error, as in
$$R_n \cong f[x_{n+1}, x_n, x_{n-1}, \ldots, x_0](x - x_0)(x - x_1)\cdots(x - x_n) \tag{18.18}$$
EXAMPLE 18.4 Error Estimation for Newton’s Polynomial
Problem Statement. Use Eq. (18.18) to estimate the error for the second-order polynomial
interpolation of Example 18.2. Use the additional data point $f(x_3) = f(5) = 1.609438$
to obtain your results.
Solution. Recall that in Example 18.2, the second-order interpolating polynomial provided
an estimate of $f_2(2) = 0.5658444$, which represents an error of 0.6931472 − 0.5658444 =
0.1273028. If we had not known the true value, as is most usually the case, Eq. (18.18),
along with the additional value at $x_3$, could have been used to estimate the error, as in
$$R_2 = f[x_3, x_2, x_1, x_0](x - x_0)(x - x_1)(x - x_2)$$
or
$$R_2 = 0.007865529(x - 1)(x - 4)(x - 6)$$
where the value for the third-order finite divided difference is as computed previously in
Example 18.3. This relationship can be evaluated at x = 2 for
$$R_2 = 0.007865529(2 - 1)(2 - 4)(2 - 6) = 0.0629242$$
which is of the same order of magnitude as the true error.
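The error estimate of Example 18.4 can be sketched in Python. Here the divided difference is computed with a short recursive helper that follows Eq. (18.14) literally; this is an assumption made for compactness, and it repeats work that a table-based scheme would reuse.

```python
import math

def divided_difference(xs, fs):
    """Recursive finite divided difference f[x_n, ..., x_0], following Eq. (18.14)."""
    if len(xs) == 1:
        return fs[0]
    upper = divided_difference(xs[1:], fs[1:])    # f[x_n, ..., x_1]
    lower = divided_difference(xs[:-1], fs[:-1])  # f[x_{n-1}, ..., x_0]
    return (upper - lower) / (xs[-1] - xs[0])

xs = [1.0, 4.0, 6.0, 5.0]  # the last entry is the additional point x3
fs = [math.log(v) for v in xs]
x = 2.0

# Second-order prediction from the first three points, Eq. (18.3).
f2 = (fs[0]
      + divided_difference(xs[:2], fs[:2]) * (x - xs[0])
      + divided_difference(xs[:3], fs[:3]) * (x - xs[0]) * (x - xs[1]))

# Error estimate from Eq. (18.18), using the additional point.
r2 = divided_difference(xs, fs) * (x - xs[0]) * (x - xs[1]) * (x - xs[2])
true_error = math.log(2.0) - f2
print(f2, r2, true_error)
```

The estimate and the true error share the same sign and order of magnitude, although the estimate is roughly half the true error in this case.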
From the previous example and from Eq. (18.18), it should be clear that the error estimate
for the nth-order polynomial is equivalent to the difference between the (n + 1)th-order
and the nth-order prediction. That is,
$$R_n = f_{n+1}(x) - f_n(x) \tag{18.19}$$
In other words, the increment that is added to the nth-order case to create the (n + 1)th-order
case [that is, Eq. (18.18)] is interpreted as an estimate of the nth-order error. This
can be clearly seen by rearranging Eq. (18.19) to give
$$f_{n+1}(x) = f_n(x) + R_n$$
The validity of this approach is predicated on the fact that the series is strongly convergent.
For such a situation, the (n + 1)th-order prediction should be much closer to
the true value than the nth-order prediction. Consequently, Eq. (18.19) conforms to our
standard definition of error as representing the difference between the truth and an
approximation. However, note that whereas all other error estimates for iterative
approaches introduced up to this point have been determined as a present prediction
minus a previous one, Eq. (18.19) represents a future prediction minus a present one.
This means that for a series that is converging rapidly, the error estimate of Eq. (18.19)
could be less than the true error. This would represent a highly unattractive quality if
the error estimate were being employed as a stopping criterion. However, as will be
described in the following section, higher-order interpolating polynomials are highly
sensitive to data errors—that is, they are very ill-conditioned. When employed for in-
terpolation, they often yield predictions that diverge significantly from the true value.
By “looking ahead” to sense errors, Eq. (18.19) is more sensitive to such divergence.
As such, it is more valuable for the sort of exploratory data analysis for which Newton’s
polynomial is best-suited.
18.1.5 Computer Algorithm for Newton’s Interpolating Polynomial
Three properties make Newton’s interpolating polynomials extremely attractive for com-
puter applications:
1. As in Eq. (18.7), higher-order versions can be developed sequentially by adding a
single term to the next lower-order equation. This facilitates the evaluation of several
different-order versions in the same program. Such a capability is especially valuable
when the order of the polynomial is not known a priori. By adding new terms se-
quentially, we can determine when a point of diminishing returns is reached—that is,
when addition of higher-order terms no longer significantly improves the estimate or
in certain situations actually detracts from it. The error equations discussed below in (3)
are useful in devising an objective criterion for identifying this point of diminishing
terms.
2. The finite divided differences that constitute the coefficients of the polynomial [Eqs. (18.8)
through (18.11)] can be computed efficiently. That is, as in Eq. (18.14) and Fig. 18.5,
lower-order differences are used to compute higher-order differences. By utilizing this
previously determined information, the coefficients can be computed efficiently. The
algorithm in Fig. 18.7 contains such a scheme.
3. The error estimate [Eq. (18.18)] can be very simply incorporated into a computer
algorithm because of the sequential way in which the prediction is built.
All the above characteristics can be exploited and incorporated into a general algo-
rithm for implementing Newton’s polynomial (Fig. 18.7). Note that the algorithm consists
of two parts: The first determines the coefficients from Eq. (18.7), and the second deter-
mines the predictions and their associated error. The utility of this algorithm is demon-
strated in the following example.
EXAMPLE 18.5 Error Estimates to Determine the Appropriate Order of Interpolation
Problem Statement. After incorporating the error [Eq. (18.18)], utilize the computer
algorithm given in Fig. 18.7 and the following information to evaluate f(x) = ln x
at x = 2:
x f(x) = ln x
1 0
4 1.3862944
6 1.7917595
5 1.6094379
3 1.0986123
1.5 0.4054641
2.5 0.9162907
3.5 1.2527630
SUBROUTINE NewtInt (x, y, n, xi, yint, ea)
  LOCAL fdd(n,n)
  DOFOR i = 0, n
    fdd(i,0) = y(i)
  END DO
  DOFOR j = 1, n
    DOFOR i = 0, n - j
      fdd(i,j) = (fdd(i+1,j-1) - fdd(i,j-1))/(x(i+j) - x(i))
    END DO
  END DO
  xterm = 1
  yint(0) = fdd(0,0)
  DOFOR order = 1, n
    xterm = xterm * (xi - x(order-1))
    yint2 = yint(order-1) + fdd(0,order) * xterm
    ea(order-1) = yint2 - yint(order-1)
    yint(order) = yint2
  END order
END NewtInt
FIGURE 18.7
An algorithm for Newton’s interpolating polynomial written in pseudocode.
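For readers who want to run the algorithm, the following is one possible Python rendering of the pseudocode in Fig. 18.7. It is a sketch rather than a definitive implementation: the name `newt_int`, the use of Python lists, and the choice to return the predictions and error estimates (instead of filling arrays passed as arguments) are our own assumptions.

```python
import math

def newt_int(x, y, xi):
    """Newton divided-difference interpolation at xi for every order 0..n.

    Returns (yint, ea): yint[k] is the order-k prediction and ea[k] is the
    Eq. (18.18) error estimate for order k, available for k = 0..n-1.
    """
    n = len(x) - 1
    fdd = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        fdd[i][0] = y[i]
    for j in range(1, n + 1):
        for i in range(n - j + 1):
            fdd[i][j] = (fdd[i + 1][j - 1] - fdd[i][j - 1]) / (x[i + j] - x[i])
    xterm = 1.0
    yint = [fdd[0][0]]
    ea = []
    for order in range(1, n + 1):
        xterm *= xi - x[order - 1]
        increment = fdd[0][order] * xterm  # term added to reach this order
        ea.append(increment)               # doubles as the error estimate of the previous order
        yint.append(yint[-1] + increment)
    return yint, ea

# Data of Example 18.5, generated here from math.log rather than typed in.
x = [1.0, 4.0, 6.0, 5.0, 3.0, 1.5, 2.5, 3.5]
y = [math.log(v) for v in x]
yint, ea = newt_int(x, y, 2.0)
for order, estimate in enumerate(yint):
    err = f"{ea[order]: .6f}" if order < len(ea) else ""
    print(f"{order}  {estimate:.6f}  {err}")
```

Because each error estimate is simply the next term of the series, no extra divided differences are needed beyond those already in the table.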
Solution. The results of employing the algorithm in Fig. 18.7 to obtain a solution are
shown in Fig. 18.8. The error estimates, along with the true error (based on the fact that
ln 2 = 0.6931472), are depicted in Fig. 18.9. Note that the estimated error and the true
error are similar and that their agreement improves as the order increases. From these
results, it can be concluded that the fifth-order version yields a good estimate and that
higher-order terms do not significantly enhance the prediction.
This exercise also illustrates the importance of the positioning and ordering of the
points. For example, up through the third-order estimate, the rate of improvement is slow
because the points that are added (at x = 4, 6, and 5) are distant and on one side of the
point in question at x = 2. The fourth-order estimate shows a somewhat greater improvement
because the new point at x = 3 is closer to the unknown. However, the most
dramatic decrease in the error is associated with the inclusion of the fifth-order term
using the data point at x = 1.5. Not only is this point close to the unknown but it is also
positioned on the opposite side from most of the other points. As a consequence, the
error is reduced by almost an order of magnitude.
The significance of the position and sequence of these data can also be illustrated
by using the same data to obtain an estimate for ln 2 but considering the points in a
different sequence. Figure 18.9 shows results for the case of reversing the order of the
original data, that is, $x_0 = 3.5$, $x_1 = 2.5$, $x_2 = 1.5$, and so forth. Because the initial points
for this case are closer to and spaced on either side of x = 2, the error decreases much
more rapidly than for the original situation. By the second-order term, the error has been
reduced to less than $\varepsilon_t = 2\%$. Other combinations could be employed to obtain different
rates of convergence.
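The effect of point ordering can be explored with a short script. The sketch below computes the sequence of Newton predictions for the original ordering and for the reversed ordering. The function name `newton_estimates` and the in-place update of the coefficient list are our own choices for illustration.

```python
import math

def newton_estimates(x, y, xi):
    """Return the Newton-polynomial predictions at xi for orders 0..n."""
    n = len(x)
    coef = list(y)
    # In-place divided differences: afterward coef[k] = f[x_k, ..., x_0].
    for j in range(1, n):
        for i in range(n - 1, j - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (x[i] - x[i - j])
    estimates, total, xterm = [], 0.0, 1.0
    for k in range(n):
        total += coef[k] * xterm
        estimates.append(total)
        xterm *= xi - x[k]
    return estimates

x = [1.0, 4.0, 6.0, 5.0, 3.0, 1.5, 2.5, 3.5]
y = [math.log(v) for v in x]
true_value = math.log(2.0)

original = newton_estimates(x, y, 2.0)
reversed_order = newton_estimates(x[::-1], y[::-1], 2.0)

for order, (a, b) in enumerate(zip(original, reversed_order)):
    err_a = abs(true_value - a) / true_value * 100.0
    err_b = abs(true_value - b) / true_value * 100.0
    print(f"order {order}: original {err_a:8.3f}%   reversed {err_b:8.3f}%")
```

Both orderings use the same eight points, so the final seventh-order predictions coincide up to round-off; only the path taken to reach them differs.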
NUMBER OF POINTS? 8
X( 0 ), y( 0 ) = ? 1,0
X( 1 ), y( 1 ) = ? 4,1.3862944
X( 2 ), y( 2 ) = ? 6,1.7917595
X( 3 ), y( 3 ) = ? 5,1.6094379
X( 4 ), y( 4 ) = ? 3,1.0986123
X( 5 ), y( 5 ) = ? 1.5,0.40546411
X( 6 ), y( 6 ) = ? 2.5,0.91629073
X( 7 ), y( 7 ) = ? 3.5,1.2527630
INTERPOLATION AT X = 2
ORDER F(X) ERROR
0 0.000000 0.462098
1 0.462098 0.103746
2 0.565844 0.062924
3 0.628769 0.046953
4 0.675722 0.021792
5 0.697514 -0.003616
6 0.693898 -0.000459
7 0.693439
FIGURE 18.8
The output of a program, based on the algorithm from Fig. 18.7 to evaluate ln 2.
The foregoing example illustrates the importance of the choice of base points. As
should be intuitively obvious, the points should be centered around and as close as possible
to the unknown. This observation is also supported by direct examination of the
error equation [Eq. (18.17)]. If we assume that the finite divided difference does not vary
markedly along the range of these data, the error is proportional to the product:
$(x - x_0)(x - x_1)\cdots(x - x_n)$. Obviously, the closer the base points are to x, the smaller
the magnitude of this product.
18.2 LAGRANGE INTERPOLATING POLYNOMIALS
The Lagrange interpolating polynomial is simply a reformulation of the Newton polyno-
mial that avoids the computation of divided differences. It can be represented concisely as
$$f_n(x) = \sum_{i=0}^{n} L_i(x) f(x_i) \tag{18.20}$$
FIGURE 18.9
Percent relative errors for the prediction of ln 2 as a function of the order of the interpolating
polynomial.
[Plot for Fig. 18.9: error versus order, with curves for the true error (original), the estimated error (original), and the estimated error (reversed).]
where
$$L_i(x) = \prod_{j=0,\; j \ne i}^{n} \frac{x - x_j}{x_i - x_j} \tag{18.21}$$
where $\Pi$ designates the “product of.” For example, the linear version (n = 1) is
$$f_1(x) = \frac{x - x_1}{x_0 - x_1} f(x_0) + \frac{x - x_0}{x_1 - x_0} f(x_1) \tag{18.22}$$
and the second-order version is
$$f_2(x) = \frac{(x - x_1)(x - x_2)}{(x_0 - x_1)(x_0 - x_2)} f(x_0) + \frac{(x - x_0)(x - x_2)}{(x_1 - x_0)(x_1 - x_2)} f(x_1) + \frac{(x - x_0)(x - x_1)}{(x_2 - x_0)(x_2 - x_1)} f(x_2) \tag{18.23}$$
Equation (18.20) can be derived directly from Newton’s polynomial (Box 18.1).
However, the rationale underlying the Lagrange formulation can be grasped directly by
realizing that each term $L_i(x)$ will be 1 at $x = x_i$ and 0 at all other sample points
(Fig. 18.10). Thus, each product $L_i(x) f(x_i)$ takes on the value of $f(x_i)$ at the sample point $x_i$.
Consequently, the summation of all the products designated by Eq. (18.20) is the unique
nth-order polynomial that passes exactly through all n + 1 data points.
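The property that each $L_i(x)$ equals 1 at its own sample point and 0 at the others is easy to confirm numerically. The following Python sketch evaluates Eq. (18.21) for the three base points x = 1, 4, and 6; the function name `lagrange_basis` is an assumption made for this illustration.

```python
def lagrange_basis(xs, i, x):
    """Return L_i(x) from Eq. (18.21) for the base points xs."""
    value = 1.0
    for j, xj in enumerate(xs):
        if j != i:
            value *= (x - xj) / (xs[i] - xj)
    return value

xs = [1.0, 4.0, 6.0]

# Row i holds L_i evaluated at each base point; the result is an identity pattern.
table = [[lagrange_basis(xs, i, xj) for xj in xs] for i in range(len(xs))]
for row in table:
    print(row)

# Away from the base points the weights are neither 0 nor 1, but they still sum to 1.
weights_at_2 = [lagrange_basis(xs, i, 2.0) for i in range(len(xs))]
print(weights_at_2, sum(weights_at_2))
```

The weights summing to 1 reflects the fact that a constant function is reproduced exactly by the interpolating polynomial.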
EXAMPLE 18.6 Lagrange Interpolating Polynomials
Problem Statement. Use a Lagrange interpolating polynomial of the first and second
order to evaluate ln 2 on the basis of the data given in Example 18.2:
$x_0 = 1$   $f(x_0) = 0$
$x_1 = 4$   $f(x_1) = 1.386294$
$x_2 = 6$   $f(x_2) = 1.791759$
Solution. The first-order polynomial [Eq. (18.22)] can be used to obtain the estimate
at x = 2,
$$f_1(2) = \frac{2 - 4}{1 - 4}\,0 + \frac{2 - 1}{4 - 1}\,1.386294 = 0.4620981$$
In a similar fashion, the second-order polynomial is developed as [Eq. (18.23)]
$$f_2(2) = \frac{(2 - 4)(2 - 6)}{(1 - 4)(1 - 6)}\,0 + \frac{(2 - 1)(2 - 6)}{(4 - 1)(4 - 6)}\,1.386294 + \frac{(2 - 1)(2 - 4)}{(6 - 1)(6 - 4)}\,1.791759 = 0.5658444$$
As expected, both these results agree with those previously obtained using Newton’s
interpolating polynomial.
Box 18.1 Derivation of the Lagrange Form Directly from Newton’s Interpolating
Polynomial
The Lagrange interpolating polynomial can be derived directly
from Newton’s formulation. We will do this for the first-order case
only [Eq. (18.2)]. To derive the Lagrange form, we reformulate the
divided differences. For example, the first divided difference,
$$f[x_1, x_0] = \frac{f(x_1) - f(x_0)}{x_1 - x_0} \tag{B18.1.1}$$
can be reformulated as
$$f[x_1, x_0] = \frac{f(x_1)}{x_1 - x_0} + \frac{f(x_0)}{x_0 - x_1} \tag{B18.1.2}$$
which is referred to as the symmetric form. Substituting Eq.
(B18.1.2) into Eq. (18.2) yields
$$f_1(x) = f(x_0) + \frac{x - x_0}{x_1 - x_0} f(x_1) + \frac{x - x_0}{x_0 - x_1} f(x_0)$$
Finally, grouping similar terms and simplifying yields the Lagrange
form,
$$f_1(x) = \frac{x - x_1}{x_0 - x_1} f(x_0) + \frac{x - x_0}{x_1 - x_0} f(x_1)$$
FIGURE 18.10
A visual depiction of the rationale behind the Lagrange polynomial. This figure shows
a second-order case. Each of the three terms in Eq. (18.23) passes through one of the data
points and is zero at the other two. The summation of the three terms must, therefore, be the
unique second-order polynomial f2(x) that passes exactly through the three points.
Note that, as with Newton’s method, the Lagrange version has an estimated error of
[Eq. (18.17)]
$$R_n = f[x, x_n, x_{n-1}, \ldots, x_0] \prod_{i=0}^{n} (x - x_i)$$
Thus, if an additional point is available at $x = x_{n+1}$, an error estimate can be obtained.
However, because the finite divided differences are not employed as part of the Lagrange
algorithm, this is rarely done.
Equations (18.20) and (18.21) can be very simply programmed for implementation
on a computer. Figure 18.11 shows pseudocode that can be employed for this purpose.
In summary, for cases where the order of the polynomial is unknown, the Newton
method has advantages because of the insight it provides into the behavior of the
different-order formulas. In addition, the error estimate represented by Eq. (18.18) can
usually be integrated easily into the Newton computation because the estimate employs
a finite difference (Example 18.5). Thus, for exploratory computations, Newton’s method
is often preferable.
When only one interpolation is to be performed, the Lagrange and Newton formula-
tions require comparable computational effort. However, the Lagrange version is some-
what easier to program. Because it does not require computation and storage of divided
differences, the Lagrange form is often used when the order of the polynomial is known
a priori.
EXAMPLE 18.7 Lagrange Interpolation Using the Computer
Problem Statement. We can use the algorithm from Fig. 18.11 to study a trend analysis
problem associated with our now-familiar falling parachutist. Assume that we have
FUNCTION Lagrng(x, y, n, xx)
  sum = 0
  DOFOR i = 0, n
    product = y(i)
    DOFOR j = 0, n
      IF i ≠ j THEN
        product = product*(xx - x(j))/(x(i) - x(j))
      ENDIF
    END DO
    sum = sum + product
  END DO
  Lagrng = sum
END Lagrng
FIGURE 18.11
Pseudocode to implement Lagrange interpolation. This algorithm is set up to compute a single
nth-order prediction, where n 1 1 is the number of data points.
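The pseudocode of Fig. 18.11 translates almost line for line into Python. The sketch below is one such rendering, checked against the data of Example 18.6. The name `lagrng` mirrors the pseudocode, and inferring n from the length of the input lists is our own simplification.

```python
import math

def lagrng(x, y, xx):
    """Lagrange interpolation at xx through the points (x[i], y[i]).

    The order of the polynomial is len(x) - 1, so all supplied points are used.
    """
    total = 0.0
    for i in range(len(x)):
        product = y[i]
        for j in range(len(x)):
            if i != j:
                product *= (xx - x[j]) / (x[i] - x[j])
        total += product
    return total

x = [1.0, 4.0, 6.0]
y = [math.log(v) for v in x]

first_order = lagrng(x[:2], y[:2], 2.0)   # uses the points at x = 1 and 4
second_order = lagrng(x, y, 2.0)          # uses all three points
print(first_order, second_order)
```

Unlike the Newton version, every change of order here requires recomputing the whole sum, which is why this form suits cases where the order is fixed in advance.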
developed instrumentation to measure the velocity of the parachutist. The measured data
obtained for a particular test case are
Time, s    Measured Velocity v, cm/s
 1           800
 3          2310
 5          3090
 7          3940
13          4755
Our problem is to estimate the velocity of the parachutist at t = 10 s to fill in the large
gap in the measurements between t = 7 and t = 13 s. We are aware that the behavior of
interpolating polynomials can be unexpected. Therefore, we will construct polynomials
of orders 4, 3, 2, and 1 and compare the results.
Solution. The Lagrange algorithm can be used to construct fourth-, third-, second-, and
first-order interpolating polynomials.
The fourth-order polynomial and the input data can be plotted as shown in Fig. 18.12a.
It is evident from this plot that the estimated value of y at x = 10 is higher than the
overall trend of these data.
Figure 18.12b through d shows plots of the results of the computations for third-,
second-, and first-order interpolating polynomials, respectively. It is noted that the lower
the order, the lower the estimated value of the velocity at t = 10 s. The plots of the
interpolating polynomials indicate that the higher-order polynomials tend to overshoot the
trend of these data. This suggests that the first- or second-order versions are most
appropriate for this particular trend analysis. It should be remembered, however, that
because we are dealing with uncertain data, regression would actually be more appropriate.
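The comparison in this example can be reproduced with a short script. It is a sketch under one stated assumption: the text does not say which points each lower-order polynomial uses, so the code takes the last k + 1 data points (those nearest t = 10 s) for the kth-order fit. The helper name `lagrange` is illustrative.

```python
from math import prod


def lagrange(x, y, xx):
    """Lagrange interpolating polynomial through (x[i], y[i]), evaluated at xx."""
    return sum(
        y[i] * prod((xx - x[j]) / (x[i] - x[j]) for j in range(len(x)) if j != i)
        for i in range(len(x))
    )


t = [1.0, 3.0, 5.0, 7.0, 13.0]                 # time, s
v = [800.0, 2310.0, 3090.0, 3940.0, 4755.0]    # measured velocity, cm/s

# Assumption: the kth-order polynomial is built from the last k + 1 points.
estimates = {}
for order in (4, 3, 2, 1):
    estimates[order] = lagrange(t[-(order + 1):], v[-(order + 1):], 10.0)
    print(f"order {order}: v(10) = {estimates[order]:.2f} cm/s")
```

Under that assumption the estimates are roughly 5430, 4875, 4673, and 4348 cm/s for orders 4 through 1, so the predicted velocity falls as the order is reduced, in line with the trend described in the solution.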
FIGURE 18.12
Plots showing (a) fourth-order, (b) third-order, (c) second-order, and (d) first-order
interpolations. Each panel plots v (cm/s) versus t (s).
The preceding example illustrates that higher-order polynomials tend to be ill-
conditioned, that is, they tend to be highly sensitive to round-off error. The same problem
applies to higher-order polynomial regression. Double-precision arithmetic sometimes
helps mitigate the problem. However, as the order increases, there will come a point at
which round-off error will interfere with the ability to interpolate using the simple
approaches covered to this point.
18.3 COEFFICIENTS OF AN INTERPOLATING POLYNOMIAL
Although both the Newton and the Lagrange polynomials are well-suited for determining
intermediate values between points, they do not provide a convenient polynomial of the
conventional form
$$f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n \tag{18.24}$$
A straightforward method for computing the coefficients of this polynomial is based
on the fact that n + 1 data points are required to determine the n + 1 coefficients. Thus,
simultaneous linear algebraic equations can be used to calculate the a’s. For example,
suppose that you desired to compute the coefficients of the parabola
$$f(x) = a_0 + a_1 x + a_2 x^2 \tag{18.25}$$
Three data points are required: [x0, f(x0)], [x1, f(x1)], and [x2, f(x2)]. Each can be substi-
tuted into Eq. (18.25) to give
$$\begin{aligned}
f(x_0) &= a_0 + a_1 x_0 + a_2 x_0^2 \\
f(x_1) &= a_0 + a_1 x_1 + a_2 x_1^2 \\
f(x_2) &= a_0 + a_1 x_2 + a_2 x_2^2
\end{aligned} \tag{18.26}$$
Thus, for this case, the x’s are the knowns and the a’s are the unknowns. Because there
are the same number of equations as unknowns, Eq. (18.26) could be solved by an
elimination method from Part Three.
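As a concrete illustration of Eq. (18.26), the sketch below assembles the simultaneous equations for a parabola and solves them by Gauss elimination with partial pivoting. The function names and the sample points, which are taken from f(x) = 1/x, are assumptions chosen for illustration only.

```python
def solve_linear(a, b):
    """Solve a small dense system a x = b by Gauss elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left unchanged.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for k in range(n):
        # Partial pivoting: move the largest entry of column k onto the diagonal.
        p = max(range(k, n), key=lambda r: abs(m[r][k]))
        m[k], m[p] = m[p], m[k]
        for r in range(k + 1, n):
            factor = m[r][k] / m[k][k]
            for c in range(k, n + 1):
                m[r][c] -= factor * m[k][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def poly_coefficients(xs, fs):
    """Return [a0, a1, ..., an] of the polynomial passing through (xs[i], fs[i])."""
    n = len(xs)
    # Row i of the system is [1, x_i, x_i**2, ..., x_i**(n - 1)].
    a = [[xi ** k for k in range(n)] for xi in xs]
    return solve_linear(a, fs)


coeffs = poly_coefficients([2.0, 3.0, 4.0], [1 / 2, 1 / 3, 1 / 4])
print(coeffs)
```

For these three points the coefficients come out close to a0 = 1.08333, a1 = -0.375, and a2 = 0.041667. For a larger number of points the same code still runs, but the computed coefficients can become much less reliable.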
It should be noted that the foregoing approach is not the most efficient method that
is available to determine the coefficients of an interpolating polynomial. Press et al.
(2007) provide a discussion and computer codes for more efficient approaches. Whatever
technique is employed, a word of caution is in order. Systems such as Eq. (18.26) are
notoriously ill-conditioned. Whether they are solved with an elimination method or with
a more efficient algorithm, the resulting coefficients can be highly inaccurate, particularly
for large n. When used for a subsequent interpolation, they often yield erroneous results.
In summary, if you are interested in determining an intermediate point, employ
Newton or Lagrange interpolation. If you must determine an equation of the form of
Eq. (18.24), limit yourself to lower-order polynomials and check your results carefully.
18.4 INVERSE INTERPOLATION
As the nomenclature implies, the f(x) and x values in most interpolation contexts are the
dependent and independent variables, respectively. As a consequence, the values of the x’s
are typically uniformly spaced. A simple example is a table of values derived for the
function f(x) = 1/x,
x      1    2    3       4     5    6       7
f(x)   1    0.5  0.3333  0.25  0.2  0.1667  0.1429
Now suppose that you must use the same data, but you are given a value for f(x)
and must determine the corresponding value of x. For instance, for the data above, suppose
that you were asked to determine the value of x that corresponded to f(x) = 0.3.
For this case, because the function is available and easy to manipulate, the correct answer
can be determined directly as x = 1/0.3 = 3.3333.
Such a problem is called inverse interpolation. For a more complicated case, you
might be tempted to switch the f(x) and x values [that is, merely plot x versus f(x)] and
use an approach like Lagrange interpolation to determine the result. Unfortunately, when
you reverse the variables, there is no guarantee that the values along the new abscissa
[the f(x)’s] will be evenly spaced. In fact, in many cases, the values will be “telescoped.”
That is, they will have the appearance of a logarithmic scale with some adjacent points
bunched together and others spread out widely. For example, for f(x) = 1/x the result is
f(x)   0.1429  0.1667  0.2  0.25  0.3333  0.5  1
x      7       6       5    4     3       2    1
Such nonuniform spacing on the abscissa often leads to oscillations in the resulting
interpolating polynomial. This can occur even for lower-order polynomials.
An alternative strategy is to fit an nth-order interpolating polynomial, $f_n(x)$, to the
original data [that is, with f(x) versus x]. In most cases, because the x’s are evenly spaced,
this polynomial will not be ill-conditioned. The answer to your problem then amounts
to finding the value of x that makes this polynomial equal to the given f(x). Thus, the
interpolation problem reduces to a roots problem!
For example, for the problem outlined above, a simple approach would be to fit a quadratic
polynomial to the three points: (2, 0.5), (3, 0.3333), and (4, 0.25). The result would be
$$f_2(x) = 1.08333 - 0.375x + 0.041667x^2$$
The answer to the inverse interpolation problem of finding the x corresponding to f(x) = 0.3
would therefore involve determining the root of
$$0.3 = 1.08333 - 0.375x + 0.041667x^2$$
For this simple case, the quadratic formula can be used to calculate
$$x = \frac{0.375 \pm \sqrt{(-0.375)^2 - 4(0.041667)(0.78333)}}{2(0.041667)} = \begin{cases} 5.704158 \\ 3.295842 \end{cases}$$
Thus, the second root, 3.296, is a good approximation of the true value of 3.333. If
additional accuracy were desired, a third- or fourth-order polynomial along with one of
the root location methods from Part Two could be employed.
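When the interpolating polynomial is of higher order, the quadratic formula no longer applies and a numerical root finder takes its place. The sketch below applies bisection to the quadratic given above. The choice of bisection, the bracket [3, 4], the tolerance, and the function names are assumptions for illustration, since the text leaves the root-location method open.

```python
def f2(x):
    """Quadratic fitted to (2, 0.5), (3, 0.3333), and (4, 0.25), as given in the text."""
    return 1.08333 - 0.375 * x + 0.041667 * x ** 2


def bisect(func, lo, hi, tol=1e-8, max_iter=200):
    """Locate a root of func in [lo, hi] by bisection; the root must be bracketed."""
    f_lo = func(lo)
    if f_lo * func(hi) > 0:
        raise ValueError("root is not bracketed by [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if f_lo * f_mid <= 0:
            hi = mid                 # sign change lies in the lower half
        else:
            lo, f_lo = mid, f_mid    # sign change lies in the upper half
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


target = 0.3
x_est = bisect(lambda x: f2(x) - target, 3.0, 4.0)
print(f"x = {x_est:.4f}")
```

The bracket is chosen from the table, where f(3) lies above 0.3 and f(4) lies below it, so the search converges on the root near 3.296 rather than the spurious root near 5.704.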
18.5 ADDITIONAL COMMENTS
Before proceeding to the next section, we must mention two additional topics: interpola-
tion with equally spaced data and extrapolation.
Because both the Newton and Lagrange polynomials are compatible with arbitrarily
spaced data, you might wonder why we address the special case of equally spaced data
(Box 18.2). Prior to the advent of digital computers, these techniques had great utility
for interpolation from tables with equally spaced arguments. In fact, a computational
framework known as a divided-difference table was developed to facilitate the imple-
mentation of these techniques. (Figure 18.5 is an example of such a table.)
However, because the formulas are subsets of the computer-compatible Newton and
Lagrange schemes and because many tabular functions are available as library subroutines,
the need for the equispaced versions has waned. In spite of this, we have included them
at this point because of their relevance to later parts of this book. In particular, they are
needed to derive numerical integration formulas that typically employ equispaced data
(Chap. 21). Because the numerical integration formulas have relevance to the solution of
ordinary differential equations, the material in Box 18.2 also has significance to Part Seven.
Extrapolation is the process of estimating a value of f(x) that lies outside the range
of the known base points, x0, x1, . . . , xn (Fig. 18.13). In a previous section, we mentioned
that the most accurate interpolation is usually obtained when the unknown lies near the
center of the base points. Obviously, this is violated when the unknown lies outside the
range, and consequently, the error in extrapolation can be very large. As depicted in
Fig. 18.13, the open-ended nature of extrapolation represents a step into the unknown
because the process extends the curve beyond the known region. As such, the true curve
could easily diverge from the prediction. Extreme care should, therefore, be exercised
whenever a case arises where one must extrapolate.
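A small numerical experiment makes the danger concrete. The sketch below reuses the quadratic from Sec. 18.4, which was fitted to f(x) = 1/x at x = 2, 3, and 4, and evaluates it both inside and outside that range. The evaluation points and the function names are assumptions chosen for illustration.

```python
def f2(x):
    """Quadratic from Sec. 18.4, fitted to f(x) = 1/x at x = 2, 3, and 4."""
    return 1.08333 - 0.375 * x + 0.041667 * x ** 2


def f_true(x):
    """Function that generated the data."""
    return 1.0 / x


for x in (3.5, 5.0, 7.0):
    kind = "interpolation" if 2.0 <= x <= 4.0 else "extrapolation"
    error = abs(f2(x) - f_true(x))
    print(f"x = {x}: estimate {f2(x):.4f}, true {f_true(x):.4f}, "
          f"error {error:.4f} ({kind})")
```

Inside the range, at x = 3.5, the estimate of about 0.281 is close to the true value of 0.286. At x = 7 the parabola has turned upward and returns about 0.5, whereas the true value is 0.143.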
FIGURE 18.13
Illustration of the possible divergence of an extrapolated prediction. The extrapolation is based
on fitting a parabola through the first three known points. [Plot of f(x) versus x with base
points x0, x1, and x2, showing the true curve and the extrapolation of the interpolating
polynomial across the interpolation and extrapolation regions.]
Box 18.2 Interpolation with Equally Spaced Data
If data are equally spaced and in ascending order, then the independent
variable assumes values of
$$\begin{aligned}
x_1 &= x_0 + h \\
x_2 &= x_0 + 2h \\
&\;\;\vdots \\
x_n &= x_0 + nh
\end{aligned}$$
where h is the interval, or step size, between these data. On this
basis, the finite divided differences can be expressed in concise
form. For example, the second forward divided difference is
$$f[x_0, x_1, x_2] = \frac{\dfrac{f(x_2) - f(x_1)}{x_2 - x_1} - \dfrac{f(x_1) - f(x_0)}{x_1 - x_0}}{x_2 - x_0}$$
which can be expressed as
$$f[x_0, x_1, x_2] = \frac{f(x_2) - 2f(x_1) + f(x_0)}{2h^2} \tag{B18.2.1}$$
because $x_1 - x_0 = x_2 - x_1 = (x_2 - x_0)/2 = h$. Now recall that the
second forward difference is equal to [numerator of Eq. (4.24)]
$$\Delta^2 f(x_0) = f(x_2) - 2f(x_1) + f(x_0)$$
Therefore, Eq. (B18.2.1) can be represented as
$$f[x_0, x_1, x_2] = \frac{\Delta^2 f(x_0)}{2!\,h^2}$$
or, in general,
$$f[x_0, x_1, \ldots, x_n] = \frac{\Delta^n f(x_0)}{n!\,h^n} \tag{B18.2.2}$$
Using Eq. (B18.2.2), we can express Newton’s interpolating poly-
nomial [Eq. (18.15)] for the case of equispaced data as
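$$\begin{aligned}
f_n(x) = {} & f(x_0) + \frac{\Delta f(x_0)}{h}(x - x_0) + \frac{\Delta^2 f(x_0)}{2!\,h^2}(x - x_0)(x - x_0 - h) \\
& + \cdots + \frac{\Delta^n f(x_0)}{n!\,h^n}(x - x_0)(x - x_0 - h) \cdots [x - x_0 - (n - 1)h] + R_n
\end{aligned} \tag{B18.2.3}$$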
where the remainder is the same as Eq. (18.16). This equation is
known as Newton’s formula, or the Newton-Gregory forward formula.
It can be simplified further by defining a new quantity, $\alpha$:
$$\alpha = \frac{x - x_0}{h}$$
This definition can be used to develop the following simplified
expressions for the terms in Eq. (B18.2.3):
$$\begin{aligned}
x - x_0 &= \alpha h \\
x - x_0 - h &= \alpha h - h = h(\alpha - 1) \\
&\;\;\vdots \\
x - x_0 - (n - 1)h &= \alpha h - (n - 1)h = h(\alpha - n + 1)
\end{aligned}$$
which can be substituted into Eq. (B18.2.3) to give
$$f_n(x) = f(x_0) + \Delta f(x_0)\,\alpha + \frac{\Delta^2 f(x_0)}{2!}\,\alpha(\alpha - 1) + \cdots + \frac{\Delta^n f(x_0)}{n!}\,\alpha(\alpha - 1) \cdots (\alpha - n + 1) + R_n \tag{B18.2.4}$$
where
$$R_n = \frac{f^{(n+1)}(\xi)}{(n + 1)!}\,h^{n+1}\,\alpha(\alpha - 1)(\alpha - 2) \cdots (\alpha - n)$$
This concise notation will have utility in our derivation and error
analyses of the integration formulas in Chap. 21.
In addition to the forward formula, backward and central
Newton-Gregory formulas are also available. Carnahan, Luther,
and Wilkes (1969) can be consulted for further information regard-
ing interpolation for equally spaced data.
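Equation (B18.2.4) is straightforward to evaluate once the forward differences at x0 are available. The following Python sketch omits the remainder term; the function names and the cubic test data are assumptions chosen for illustration.

```python
from math import factorial


def forward_differences(f):
    """Return [f(x0), D f(x0), D^2 f(x0), ..., D^n f(x0)] for equally spaced samples f."""
    diffs = [f[0]]
    row = list(f)
    while len(row) > 1:
        # Each pass forms the next-order differences; the first entry belongs to x0.
        row = [row[i + 1] - row[i] for i in range(len(row) - 1)]
        diffs.append(row[0])
    return diffs


def newton_gregory_forward(x0, h, f, x):
    """Evaluate Eq. (B18.2.4), without the remainder R_n, at x."""
    alpha = (x - x0) / h
    total = 0.0
    term = 1.0   # running product alpha (alpha - 1) ... (alpha - k + 1)
    for k, delta in enumerate(forward_differences(f)):
        total += delta * term / factorial(k)
        term *= alpha - k
    return total


# Illustrative data: f(x) = x**3 sampled at x = 0, 1, 2, 3 (h = 1).
samples = [0.0, 1.0, 8.0, 27.0]
print(newton_gregory_forward(0.0, 1.0, samples, 2.5))
```

Because four samples of a cubic determine it uniquely, the third-order formula reproduces 2.5³ = 15.625 and the remainder vanishes for this test case.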
18.6 SPLINE INTERPOLATION
In the previous sections, nth-order polynomials were used to interpolate between n + 1
data points. For example, for eight points, we can derive a perfect seventh-order polynomial.
This curve would capture all the meanderings (at least up to and including
seventh derivatives) suggested by the points. However, there are cases where these functions
can lead to erroneous results because of round-off error and overshoot. An alternative
approach is to apply lower-order polynomials to subsets of data points. Such
connecting polynomials are called spline functions.
For example, third-order curves employed to connect each pair of data points are
called cubic splines. These functions can be constructed so that the connections between
adjacent cubic equations are visually smooth. On the surface, it would seem that the
third-order approximation of the splines would be inferior to the seventh-order expres-
sion. You might wonder why a spline would ever be preferable.
Figure 18.14 illustrates a situation where a spline performs better than a higher-
order polynomial. This is the case where a function is generally smooth but undergoes
an abrupt change somewhere along the region of interest. The step increase depicted in
Fig. 18.14 is an extreme example of such a change and serves to illustrate the point.
Figure 18.14a through c illustrates how higher-order polynomials tend to swing
through wild oscillations in the vicinity of an abrupt change. In contrast, the spline also
connects the points, but because it is limited to lower-order changes, the oscillations are
kept to a minimum. As such, the spline usually provides a superior approximation of the
behavior of functions that have local, abrupt changes.
The concept of the spline originated from the drafting technique of using a thin,
flexible strip (called a spline) to draw smooth curves through a set of points. The process
is depicted in Fig. 18.15 for a series of five pins (data points). In this technique, the
drafter places paper over a wooden board and hammers nails or pins into the paper (and
board) at the location of the data points. A smooth cubic curve results from interweaving
the strip between the pins. Hence, the name “cubic spline” has been adopted for poly-
nomials of this type.
In this section, simple linear functions will first be used to introduce some basic
concepts and problems associated with spline interpolation. Then we derive an algorithm
for fitting quadratic splines to data. Finally, we present material on the cubic spline,
which is the most common and useful version in engineering practice.
18.6.1 Linear Splines
The simplest connection between two points is a straight line. The first-order splines for
a group of ordered data points can be defined as a set of linear functions,
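$$\begin{aligned}
f(x) &= f(x_0) + m_0(x - x_0) &\qquad x_0 \le x \le x_1 \\
f(x) &= f(x_1) + m_1(x - x_1) &\qquad x_1 \le x \le x_2 \\
&\;\;\vdots \\
f(x) &= f(x_{n-1}) + m_{n-1}(x - x_{n-1}) &\qquad x_{n-1} \le x \le x_n
\end{aligned}$$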
where $m_i$ is the slope of the straight line connecting the points:
$$m_i = \frac{f(x_{i+1}) - f(x_i)}{x_{i+1} - x_i} \tag{18.27}$$
These equations can be used to evaluate the function at any point between $x_0$ and $x_n$
by first locating the interval within which the point lies. Then the appropriate equation
is used to determine the function value within the interval. The method is obviously
identical to linear interpolation.
FIGURE 18.14
A visual representation of a situation where the splines are superior to higher-order interpolating
polynomials. The function to be fit undergoes an abrupt increase at x = 0. Parts (a) through
(c) indicate that the abrupt change induces oscillations in interpolating polynomials. In contrast,
because it is limited to third-order curves with smooth transitions, a linear spline (d) provides a
much more acceptable approximation. [Panels (a) through (d) each plot f(x) versus x.]
EXAMPLE 18.8 First-Order Splines
Problem Statement. Fit the data in Table 18.1 with first-order splines. Evaluate the
function at x = 5.
Solution. These data can be used to determine the slopes between points. For example,
for the interval x = 4.5 to x = 7 the slope can be computed using Eq. (18.27):
$$m = \frac{2.5 - 1}{7 - 4.5} = 0.60$$
The slopes for the other intervals can be computed, and the resulting first-order splines
are plotted in Fig. 18.16a. The value at x = 5 is 1.3.
TABLE 18.1
Data to be fit with spline functions.
x      f(x)
3.0    2.5
4.5    1.0
7.0    2.5
9.0    0.5
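The procedure of Example 18.8 (locate the interval, then apply the corresponding linear equation) can be sketched in a few lines of Python. The function name `linear_spline` and the use of the standard-library `bisect` module for the interval search are assumptions made for illustration.

```python
from bisect import bisect_right


def linear_spline(xs, fs, x):
    """Evaluate the first-order spline through (xs[i], fs[i]) at x; xs must be ascending."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x lies outside the range of the data")
    # Locate i such that xs[i] <= x <= xs[i + 1]; clamp so x == xs[-1] uses the last interval.
    i = min(bisect_right(xs, x) - 1, len(xs) - 2)
    slope = (fs[i + 1] - fs[i]) / (xs[i + 1] - xs[i])   # Eq. (18.27)
    return fs[i] + slope * (x - xs[i])


# Data from Table 18.1
xs = [3.0, 4.5, 7.0, 9.0]
fs = [2.5, 1.0, 2.5, 0.5]
print(linear_spline(xs, fs, 5.0))
```

For x = 5 the search selects the interval from 4.5 to 7, whose slope is 0.60, and the result agrees with the value of 1.3 obtained in the example.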
FIGURE 18.15
The drafting technique of using a spline to draw smooth curves through a series of points. Notice
how, at the end points, the spline straightens out. This is called a “natural” spline.
Visual inspection of Fig. 18.16a indicates that the primary disadvantage of first-
order splines is that they are not smooth. In essence, at the data points where two splines
meet (called a knot), the slope changes abruptly. In formal terms, the first derivative of
the function is discontinuous at these points. This deficiency is overcome by using higher-
order polynomial splines that ensure smoothness at the knots by equating derivatives at
these points, as discussed in the next section.
18.6.2 Quadratic Splines
To ensure that the mth derivatives are continuous at the knots, a spline of at least order
m + 1 must be used. Third-order polynomials or cubic splines that ensure continuous first
and second derivatives are most frequently used in practice. Although third and higher
derivatives could be discontinuous when using cubic splines, they usually cannot be
detected visually and consequently are ignored.
FIGURE 18.16
Spline fits of a set of four points. (a) Linear spline, (b) quadratic spline, and (c) cubic spline, with
a cubic interpolating polynomial also plotted. Each panel plots f(x) versus x.
Because the derivation of cubic splines is somewhat involved, we have chosen to
include them in a subsequent section. We have decided to first illustrate the concept of
spline interpolation using second-order polynomials. These “quadratic splines” have con-
tinuous first derivatives at the knots. Although quadratic splines do not ensure equal
second derivatives at the knots, they serve nicely to demonstrate the general procedure
for developing higher-order splines.
The objective in quadratic splines is to derive a second-order polynomial for each interval
between data points. The polynomial for each interval can be represented generally as
$$f_i(x) = a_i x^2 + b_i x + c_i \tag{18.28}$$
Figure 18.17 has been included to help clarify the notation. For n + 1 data points (i = 0, 1,
2, . . . , n), there are n intervals and, consequently, 3n unknown constants (the a’s, b’s,
and c’s) to evaluate. Therefore, 3n equa
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one
numerical methods for civil engineering for every one


numerical methods for civil engineering for every one

  • 2. Numerical Methods for Engineers SEVENTH EDITION Steven C. Chapra Berger Chair in Computing and Engineering Tufts University Raymond P. Canale Professor Emeritus of Civil Engineering University of Michigan
  • 3. NUMERICAL METHODS FOR ENGINEERS, SEVENTH EDITION
    Published by McGraw-Hill Education, 2 Penn Plaza, New York, NY 10121. Copyright © 2015 by McGraw-Hill Education. All rights reserved. Printed in the United States of America. Previous editions © 2010, 2006, and 2002. No part of this publication may be reproduced or distributed in any form or by any means, or stored in a database or retrieval system, without the prior written consent of McGraw-Hill Education, including, but not limited to, in any network or other electronic storage or transmission, or broadcast for distance learning.
    Some ancillaries, including electronic and print components, may not be available to customers outside the United States.
    This book is printed on acid-free paper.
    1 2 3 4 5 6 7 8 9 0 DOC/DOC 1 0 9 8 7 6 5 4
    ISBN 978–0–07–339792–4
    MHID 0–07–339792–x
    Senior Vice President, Products & Markets: Kurt L. Strand
    Vice President, General Manager, Products & Markets: Marty Lange
    Vice President, Content Production & Technology Services: Kimberly Meriwether David
    Executive Brand Manager: Bill Stenquist
    Managing Director: Thomas Timp
    Global Publisher: Raghothaman Srinivasan
    Developmental Editor: Lorraine Buczek
    Marketing Manager: Heather Wagner
    Director, Content Production: Terri Schiesl
    Senior Content Project Manager: Melissa M. Leick
    Buyer: Jennifer Pickel
    Cover Designer: Studio Montage, St. Louis, MO
    Cover Image: Peak towering above clouds: Royalty-Free/CORBIS; Skysurfers: Getty Images/Digital Vision/RF
    Media Project Manager: Sandra M. Schnee
    Compositor: Aptara®, Inc.
    Typeface: 10/12 Time Roman
    Printer: R. R. Donnelley
    All credits appearing on page or at the end of the book are considered to be an extension of the copyright page.
    Library of Congress Cataloging-in-Publication Data
    Chapra, Steven C.
    Numerical methods for engineers / Steven C. Chapra, Berger chair in computing and engineering, Tufts University, Raymond P. Canale, professor emeritus of civil engineering, University of Michigan. — Seventh edition.
    pages cm
    Includes bibliographical references and index.
    ISBN 978-0-07-339792-4 (alk. paper) — ISBN 0-07-339792-X (alk. paper)
    1. Engineering mathematics—Data processing. 2. Numerical calculations—Data processing. 3. Microcomputers—Programming. I. Canale, Raymond P. II. Title.
    TA345.C47 2015
    518.024’62—dc23 2013041704
  • 4. To Margaret and Gabriel Chapra Helen and Chester Canale
  • 5. CONTENTS
    PREFACE
    ABOUT THE AUTHORS
    PART ONE: MODELING, COMPUTERS, AND ERROR ANALYSIS
      PT1.1 Motivation
      PT1.2 Mathematical Background
      PT1.3 Orientation
    CHAPTER 1: Mathematical Modeling and Engineering Problem Solving
      1.1 A Simple Mathematical Model
      1.2 Conservation Laws and Engineering
      Problems
    CHAPTER 2: Programming and Software
      2.1 Packages and Programming
      2.2 Structured Programming
      2.3 Modular Programming
      2.4 Excel
      2.5 MATLAB
      2.6 Mathcad
      2.7 Other Languages and Libraries
      Problems
    CHAPTER 3: Approximations and Round-Off Errors
      3.1 Significant Figures
      3.2 Accuracy and Precision
      3.3 Error Definitions
      3.4 Round-Off Errors
      Problems
  • 6. CONTENTS
    CHAPTER 4: Truncation Errors and the Taylor Series
      4.1 The Taylor Series
      4.2 Error Propagation
      4.3 Total Numerical Error
      4.4 Blunders, Formulation Errors, and Data Uncertainty
      Problems
    EPILOGUE: PART ONE
      PT1.4 Trade-Offs
      PT1.5 Important Relationships and Formulas
      PT1.6 Advanced Methods and Additional References
    PART TWO: ROOTS OF EQUATIONS
      PT2.1 Motivation
      PT2.2 Mathematical Background
      PT2.3 Orientation
    CHAPTER 5: Bracketing Methods
      5.1 Graphical Methods
      5.2 The Bisection Method
      5.3 The False-Position Method
      5.4 Incremental Searches and Determining Initial Guesses
      Problems
    CHAPTER 6: Open Methods
      6.1 Simple Fixed-Point Iteration
      6.2 The Newton-Raphson Method
      6.3 The Secant Method
      6.4 Brent’s Method
      6.5 Multiple Roots
      6.6 Systems of Nonlinear Equations
      Problems
    CHAPTER 7: Roots of Polynomials
      7.1 Polynomials in Engineering and Science
      7.2 Computing with Polynomials
      7.3 Conventional Methods
  • 7. CONTENTS
      7.4 Müller’s Method
      7.5 Bairstow’s Method
      7.6 Other Methods
      7.7 Root Location with Software Packages
      Problems
    CHAPTER 8: Case Studies: Roots of Equations
      8.1 Ideal and Nonideal Gas Laws (Chemical/Bio Engineering)
      8.2 Greenhouse Gases and Rainwater (Civil/Environmental Engineering)
      8.3 Design of an Electric Circuit (Electrical Engineering)
      8.4 Pipe Friction (Mechanical/Aerospace Engineering)
      Problems
    EPILOGUE: PART TWO
      PT2.4 Trade-Offs
      PT2.5 Important Relationships and Formulas
      PT2.6 Advanced Methods and Additional References
    PART THREE: LINEAR ALGEBRAIC EQUATIONS
      PT3.1 Motivation
      PT3.2 Mathematical Background
      PT3.3 Orientation
    CHAPTER 9: Gauss Elimination
      9.1 Solving Small Numbers of Equations
      9.2 Naive Gauss Elimination
      9.3 Pitfalls of Elimination Methods
      9.4 Techniques for Improving Solutions
      9.5 Complex Systems
      9.6 Nonlinear Systems of Equations
      9.7 Gauss-Jordan
      9.8 Summary
      Problems
    CHAPTER 10: LU Decomposition and Matrix Inversion
      10.1 LU Decomposition
      10.2 The Matrix Inverse
      10.3 Error Analysis and System Condition
      Problems
  • 8. CONTENTS
    CHAPTER 11: Special Matrices and Gauss-Seidel
      11.1 Special Matrices
      11.2 Gauss-Seidel
      11.3 Linear Algebraic Equations with Software Packages
      Problems
    CHAPTER 12: Case Studies: Linear Algebraic Equations
      12.1 Steady-State Analysis of a System of Reactors (Chemical/Bio Engineering)
      12.2 Analysis of a Statically Determinate Truss (Civil/Environmental Engineering)
      12.3 Currents and Voltages in Resistor Circuits (Electrical Engineering)
      12.4 Spring-Mass Systems (Mechanical/Aerospace Engineering)
      Problems
    EPILOGUE: PART THREE
      PT3.4 Trade-Offs
      PT3.5 Important Relationships and Formulas
      PT3.6 Advanced Methods and Additional References
    PART FOUR: OPTIMIZATION
      PT4.1 Motivation
      PT4.2 Mathematical Background
      PT4.3 Orientation
    CHAPTER 13: One-Dimensional Unconstrained Optimization
      13.1 Golden-Section Search
      13.2 Parabolic Interpolation
      13.3 Newton’s Method
      13.4 Brent’s Method
      Problems
    CHAPTER 14: Multidimensional Unconstrained Optimization
      14.1 Direct Methods
      14.2 Gradient Methods
      Problems
  • 9. CONTENTS
    CHAPTER 15: Constrained Optimization
      15.1 Linear Programming
      15.2 Nonlinear Constrained Optimization
      15.3 Optimization with Software Packages
      Problems
    CHAPTER 16: Case Studies: Optimization
      16.1 Least-Cost Design of a Tank (Chemical/Bio Engineering)
      16.2 Least-Cost Treatment of Wastewater (Civil/Environmental Engineering)
      16.3 Maximum Power Transfer for a Circuit (Electrical Engineering)
      16.4 Equilibrium and Minimum Potential Energy (Mechanical/Aerospace Engineering)
      Problems
    EPILOGUE: PART FOUR
      PT4.4 Trade-Offs
      PT4.5 Additional References
    PART FIVE: CURVE FITTING
      PT5.1 Motivation
      PT5.2 Mathematical Background
      PT5.3 Orientation
    CHAPTER 17: Least-Squares Regression
      17.1 Linear Regression
      17.2 Polynomial Regression
      17.3 Multiple Linear Regression
      17.4 General Linear Least Squares
      17.5 Nonlinear Regression
      Problems
    CHAPTER 18: Interpolation
      18.1 Newton’s Divided-Difference Interpolating Polynomials
      18.2 Lagrange Interpolating Polynomials
      18.3 Coefficients of an Interpolating Polynomial
      18.4 Inverse Interpolation
      18.5 Additional Comments
      18.6 Spline Interpolation
      18.7 Multidimensional Interpolation
      Problems
  • 10. CONTENTS
    CHAPTER 19: Fourier Approximation
      19.1 Curve Fitting with Sinusoidal Functions
      19.2 Continuous Fourier Series
      19.3 Frequency and Time Domains
      19.4 Fourier Integral and Transform
      19.5 Discrete Fourier Transform (DFT)
      19.6 Fast Fourier Transform (FFT)
      19.7 The Power Spectrum
      19.8 Curve Fitting with Software Packages
      Problems
    CHAPTER 20: Case Studies: Curve Fitting
      20.1 Linear Regression and Population Models (Chemical/Bio Engineering)
      20.2 Use of Splines to Estimate Heat Transfer (Civil/Environmental Engineering)
      20.3 Fourier Analysis (Electrical Engineering)
      20.4 Analysis of Experimental Data (Mechanical/Aerospace Engineering)
      Problems
    EPILOGUE: PART FIVE
      PT5.4 Trade-Offs
      PT5.5 Important Relationships and Formulas
      PT5.6 Advanced Methods and Additional References
    PART SIX: NUMERICAL DIFFERENTIATION AND INTEGRATION
      PT6.1 Motivation
      PT6.2 Mathematical Background
      PT6.3 Orientation
    CHAPTER 21: Newton-Cotes Integration Formulas
      21.1 The Trapezoidal Rule
      21.2 Simpson’s Rules
      21.3 Integration with Unequal Segments
      21.4 Open Integration Formulas
      21.5 Multiple Integrals
      Problems
  • 11. CONTENTS
    CHAPTER 22: Integration of Equations
      22.1 Newton-Cotes Algorithms for Equations
      22.2 Romberg Integration
      22.3 Adaptive Quadrature
      22.4 Gauss Quadrature
      22.5 Improper Integrals
      Problems
    CHAPTER 23: Numerical Differentiation
      23.1 High-Accuracy Differentiation Formulas
      23.2 Richardson Extrapolation
      23.3 Derivatives of Unequally Spaced Data
      23.4 Derivatives and Integrals for Data with Errors
      23.5 Partial Derivatives
      23.6 Numerical Integration/Differentiation with Software Packages
      Problems
    CHAPTER 24: Case Studies: Numerical Integration and Differentiation
      24.1 Integration to Determine the Total Quantity of Heat (Chemical/Bio Engineering)
      24.2 Effective Force on the Mast of a Racing Sailboat (Civil/Environmental Engineering)
      24.3 Root-Mean-Square Current by Numerical Integration (Electrical Engineering)
      24.4 Numerical Integration to Compute Work (Mechanical/Aerospace Engineering)
      Problems
    EPILOGUE: PART SIX
      PT6.4 Trade-Offs
      PT6.5 Important Relationships and Formulas
      PT6.6 Advanced Methods and Additional References
    PART SEVEN: ORDINARY DIFFERENTIAL EQUATIONS
      PT7.1 Motivation
      PT7.2 Mathematical Background
      PT7.3 Orientation
  • 12. CONTENTS
    CHAPTER 25: Runge-Kutta Methods
      25.1 Euler’s Method
      25.2 Improvements of Euler’s Method
      25.3 Runge-Kutta Methods
      25.4 Systems of Equations
      25.5 Adaptive Runge-Kutta Methods
      Problems
    CHAPTER 26: Stiffness and Multistep Methods
      26.1 Stiffness
      26.2 Multistep Methods
      Problems
    CHAPTER 27: Boundary-Value and Eigenvalue Problems
      27.1 General Methods for Boundary-Value Problems
      27.2 Eigenvalue Problems
      27.3 ODEs and Eigenvalues with Software Packages
      Problems
    CHAPTER 28: Case Studies: Ordinary Differential Equations
      28.1 Using ODEs to Analyze the Transient Response of a Reactor (Chemical/Bio Engineering)
      28.2 Predator-Prey Models and Chaos (Civil/Environmental Engineering)
      28.3 Simulating Transient Current for an Electric Circuit (Electrical Engineering)
      28.4 The Swinging Pendulum (Mechanical/Aerospace Engineering)
      Problems
    EPILOGUE: PART SEVEN
      PT7.4 Trade-Offs
      PT7.5 Important Relationships and Formulas
      PT7.6 Advanced Methods and Additional References
    PART EIGHT: PARTIAL DIFFERENTIAL EQUATIONS
      PT8.1 Motivation
      PT8.2 Orientation
  • 13. CONTENTS
    CHAPTER 29: Finite Difference: Elliptic Equations
      29.1 The Laplace Equation
      29.2 Solution Technique
      29.3 Boundary Conditions
      29.4 The Control-Volume Approach
      29.5 Software to Solve Elliptic Equations
      Problems
    CHAPTER 30: Finite Difference: Parabolic Equations
      30.1 The Heat-Conduction Equation
      30.2 Explicit Methods
      30.3 A Simple Implicit Method
      30.4 The Crank-Nicolson Method
      30.5 Parabolic Equations in Two Spatial Dimensions
      Problems
    CHAPTER 31: Finite-Element Method
      31.1 The General Approach
      31.2 Finite-Element Application in One Dimension
      31.3 Two-Dimensional Problems
      31.4 Solving PDEs with Software Packages
      Problems
    CHAPTER 32: Case Studies: Partial Differential Equations
      32.1 One-Dimensional Mass Balance of a Reactor (Chemical/Bio Engineering)
      32.2 Deflections of a Plate (Civil/Environmental Engineering)
      32.3 Two-Dimensional Electrostatic Field Problems (Electrical Engineering)
      32.4 Finite-Element Solution of a Series of Springs (Mechanical/Aerospace Engineering)
      Problems
    EPILOGUE: PART EIGHT
      PT8.3 Trade-Offs
      PT8.4 Important Relationships and Formulas
      PT8.5 Advanced Methods and Additional References
  • 14. CONTENTS
    APPENDIX A: THE FOURIER SERIES
    APPENDIX B: GETTING STARTED WITH MATLAB
    APPENDIX C: GETTING STARTED WITH MATHCAD
    BIBLIOGRAPHY
    INDEX
  • 15. PREFACE
    It has been over twenty years since we published the first edition of this book. Over that period, our original contention that numerical methods and computers would figure more prominently in the engineering curriculum, particularly in the early parts, has been dramatically borne out. Many universities now offer freshman, sophomore, and junior courses in both introductory computing and numerical methods. In addition, many of our colleagues are integrating computer-oriented problems into other courses at all levels of the curriculum. Thus, this new edition is still founded on the basic premise that student engineers should be provided with a strong and early introduction to numerical methods.
    Consequently, although we have expanded our coverage in the new edition, we have tried to maintain many of the features that made the first edition accessible to both lower- and upper-level undergraduates. These include:
      • Problem Orientation. Engineering students learn best when they are motivated by problems. This is particularly true for mathematics and computing. Consequently, we have approached numerical methods from a problem-solving perspective.
      • Student-Oriented Pedagogy. We have developed a number of features to make this book as student-friendly as possible. These include the overall organization, the use of introductions and epilogues to consolidate major topics, and the extensive use of worked examples and case studies from all areas of engineering. We have also endeavored to keep our explanations straightforward and practically oriented.
      • Computational Tools. We empower our students by helping them utilize the standard “point-and-shoot” numerical problem-solving capabilities of packages like Excel, MATLAB, and Mathcad software. However, students are also shown how to develop simple, well-structured programs to extend the base capabilities of those environments. This knowledge carries over to standard programming languages such as Visual Basic, Fortran 90, and C/C++. We believe that the current flight from computer programming represents something of a “dumbing down” of the engineering curriculum. The bottom line is that as long as engineers are not content to be tool limited, they will have to write code. Only now they may be called “macros” or “M-files.” This book is designed to empower them to do that.
    Beyond these five original principles, the seventh edition has new and expanded problem sets. Most of the problems have been modified so that they yield different numerical solutions from previous editions. In addition, a variety of new problems have been included.
    The seventh edition also includes McGraw-Hill’s Connect® Engineering. This online homework management tool allows assignment of algorithmic problems for homework, quizzes, and tests. It connects students with the tools and resources they’ll need to achieve success. To learn more, visit www.mcgrawhillconnect.com.
    McGraw-Hill LearnSmart™ is also available as an integrated feature of McGraw-Hill Connect® Engineering. It is an adaptive learning system designed to help students learn faster, study more efficiently, and retain more knowledge for greater success. LearnSmart assesses
  • 16. a student’s knowledge of course content through a series of adaptive questions. It pinpoints concepts the student does not understand and maps out a personalized study plan for success. Visit the following site for a demonstration: www.mhlearnsmart.com
    As always, our primary intent in writing this book is to provide students with a sound introduction to numerical methods. We believe that motivated students who enjoy numerical methods, computers, and mathematics will, in the end, make better engineers. If our book fosters an enthusiasm for these subjects, we will consider our efforts a success.
    Acknowledgments. We would like to thank our friends at McGraw-Hill. In particular, Lorraine Buczek and Bill Stenquist provided a positive and supportive atmosphere for creating this edition. As usual, Beatrice Sussman did a masterful job of copyediting the manuscript, and Arpana Kumari of Aptara also did an outstanding job in the book’s final production phase. As in past editions, David Clough (University of Colorado), Mike Gustafson (Duke), and Jerry Stedinger (Cornell University) generously shared their insights and suggestions. Useful suggestions were also made by Bill Philpot (Cornell University), Jim Guilkey (University of Utah), Dong-Il Seo (Chungnam National University, Korea), Niall Broekhuizen (NIWA, New Zealand), and Raymundo Cordero and Karim Muci (ITESM, Mexico). The present edition has also benefited from the reviews and suggestions by the following colleagues:
      Betty Barr, University of Houston
      Jalal Behzadi, Shahid Chamran University
      Jordan Berg, Texas Tech University
      Jacob Bishop, Utah State University
      Estelle M. Eke, California State University, Sacramento
      Yazan A. Hussain, Jordan University of Science & Technology
      Yogesh Jaluria, Rutgers University
      S. Graham Kelly, The University of Akron
      Subha Kumpaty, Milwaukee School of Engineering
      Eckart Meiburg, University of California-Santa Barbara
      Prashant Mhaskar, McMaster University
      Luke Olson, University of Illinois at Urbana-Champaign
      Richard Pates Jr., Old Dominion University
      Joseph H. Pierluissi, University of Texas at El Paso
      Juan Perán, Universidad Nacional de Educación a Distancia (UNED)
      Scott A. Socolofsky, Texas A&M University
    It should be stressed that although we received useful advice from the aforementioned individuals, we are responsible for any inaccuracies or mistakes you may detect in this edition. Please contact Steve Chapra via e-mail if you should detect any errors in this edition. Finally, we would like to thank our family, friends, and students for their enduring patience and support. In particular, Cynthia Chapra, Danielle Husley, and Claire Canale are always there providing understanding, perspective, and love.
    Steven C. Chapra, Medford, Massachusetts, steven.chapra@tufts.edu
    Raymond P. Canale, Lake Leelanau, Michigan
  • 17. ABOUT THE AUTHORS Steve Chapra teaches in the Civil and Environmental Engineering Department at Tufts University where he holds the Louis Berger Chair in Computing and Engineering. His other books include Surface Water-Quality Modeling and Applied Numerical Methods with MATLAB. Dr. Chapra received engineering degrees from Manhattan College and the University of Michigan. Before joining the faculty at Tufts, he worked for the Environmental Protection Agency and the National Oceanic and Atmospheric Administration, and taught at Texas A&M University and the University of Colorado. His general research interests focus on surface water-quality modeling and advanced computer applications in environmental engineering. He is a Fellow of the ASCE, and has received a number of awards for his scholarly contributions, including the Rudolph Hering Medal (ASCE), and the Meriam-Wiley Distinguished Author Award (American Society for Engineering Education). He has also been recognized as the outstanding teacher among the engineering faculties at Texas A&M University, the University of Colorado, and Tufts University.
    Raymond P. Canale is an emeritus professor at the University of Michigan. During his over 20-year career at the university, he taught numerous courses in the area of computers, numerical methods, and environmental engineering. He also directed extensive research programs in the area of mathematical and computer modeling of aquatic ecosystems. He has authored or coauthored several books and has published over 100 scientific papers and reports. He has also designed and developed personal computer software to facilitate engineering education and the solution of engineering problems. He has been given the Meriam-Wiley Distinguished Author Award by the American Society for Engineering Education for his books and software and several awards for his technical publications. 
Professor Canale is now devoting his energies to applied problems, where he works with engineering firms and industry and governmental agencies as a consultant and expert witness.
  • 20. PT1.1 MOTIVATION Numerical methods are techniques by which mathematical problems are formulated so that they can be solved with arithmetic operations. Although there are many kinds of numerical methods, they have one common characteristic: they invariably involve large numbers of tedious arithmetic calculations. It is little wonder that with the development of fast, efficient digital computers, the role of numerical methods in engineering problem solving has increased dramatically in recent years.
    PT1.1.1 Noncomputer Methods Beyond providing increased computational firepower, the widespread availability of computers (especially personal computers) and their partnership with numerical methods has had a significant influence on the actual engineering problem-solving process. In the precomputer era there were generally three different ways in which engineers approached problem solving:
    1. Solutions were derived for some problems using analytical, or exact, methods. These solutions were often useful and provided excellent insight into the behavior of some systems. However, analytical solutions can be derived for only a limited class of problems. These include those that can be approximated with linear models and those that have simple geometry and low dimensionality. Consequently, analytical solutions are of limited practical value because most real problems are nonlinear and involve complex shapes and processes.
    2. Graphical solutions were used to characterize the behavior of systems. These graphical solutions usually took the form of plots or nomographs. Although graphical techniques can often be used to solve complex problems, the results are not very precise. Furthermore, graphical solutions (without the aid of computers) are extremely tedious and awkward to implement. Finally, graphical techniques are often limited to problems that can be described using three or fewer dimensions.
    3. Calculators and slide rules were used to implement numerical methods manually. Although in theory such approaches should be perfectly adequate for solving complex problems, in actuality several difficulties are encountered. Manual calculations are slow and tedious. Furthermore, consistent results are elusive because of simple blunders that arise when numerous manual tasks are performed.
    During the precomputer era, significant amounts of energy were expended on the solution technique itself, rather than on problem definition and interpretation (Fig. PT1.1a). This unfortunate situation existed because so much time and drudgery were required to obtain numerical answers using precomputer techniques.
    MODELING, COMPUTERS, AND ERROR ANALYSIS
  • 21. Today, computers and numerical methods provide an alternative for such complicated calculations. Using computer power to obtain solutions directly, you can approach these calculations without recourse to simplifying assumptions or time-intensive techniques. Although analytical solutions are still extremely valuable both for problem solving and for providing insight, numerical methods represent alternatives that greatly enlarge your capabilities to confront and solve problems. As a result, more time is available for the use of your creative skills. Thus, more emphasis can be placed on problem formulation and solution interpretation and the incorporation of total system, or “holistic,” awareness (Fig. PT1.1b).
    PT1.1.2 Numerical Methods and Engineering Practice Since the late 1940s the widespread availability of digital computers has led to a veritable explosion in the use and development of numerical methods. At first, this growth was somewhat limited by the cost of access to large mainframe computers, and, consequently, many engineers continued to use simple analytical approaches in a significant portion of their work. Needless to say, the recent evolution of inexpensive personal
    [FIGURE PT1.1 The three phases of engineering problem solving in (a) the precomputer and (b) the computer era. The sizes of the boxes indicate the level of emphasis directed toward each phase. Computers facilitate the implementation of solution techniques and thus allow more emphasis to be placed on the creative aspects of problem formulation and interpretation of results.
    Panel (a): FORMULATION: Fundamental laws explained briefly. SOLUTION: Elaborate and often complicated method to make problem tractable. INTERPRETATION: In-depth analysis limited by time-consuming solution.
    Panel (b): FORMULATION: In-depth exposition of relationship of problem to fundamental laws. SOLUTION: Easy-to-use computer method. INTERPRETATION: Ease of calculation allows holistic thoughts and intuition to develop; system sensitivity and behavior can be studied.]
  • 22. computers has given us ready access to powerful computational capabilities. There are several additional reasons why you should study numerical methods:
    1. Numerical methods are extremely powerful problem-solving tools. They are capable of handling large systems of equations, nonlinearities, and complicated geometries that are not uncommon in engineering practice and that are often impossible to solve analytically. As such, they greatly enhance your problem-solving skills.
    2. During your careers, you may often have occasion to use commercially available prepackaged, or “canned,” computer programs that involve numerical methods. The intelligent use of these programs is often predicated on knowledge of the basic theory underlying the methods.
    3. Many problems cannot be approached using canned programs. If you are conversant with numerical methods and are adept at computer programming, you can design your own programs to solve problems without having to buy or commission expensive software.
    4. Numerical methods are an efficient vehicle for learning to use computers. It is well known that an effective way to learn programming is to actually write computer programs. Because numerical methods are for the most part designed for implementation on computers, they are ideal for this purpose. Further, they are especially well-suited to illustrate the power and the limitations of computers. When you successfully implement numerical methods on a computer and then apply them to solve otherwise intractable problems, you will be provided with a dramatic demonstration of how computers can serve your professional development. At the same time, you will also learn to acknowledge and control the errors of approximation that are part and parcel of large-scale numerical calculations.
    5. Numerical methods provide a vehicle for you to reinforce your understanding of mathematics. Because one function of numerical methods is to reduce higher mathematics to basic arithmetic operations, they get at the “nuts and bolts” of some otherwise obscure topics. Enhanced understanding and insight can result from this alternative perspective.
    PT1.2 MATHEMATICAL BACKGROUND Every part in this book requires some mathematical background. Consequently, the introductory material for each part includes a section, such as the one you are reading, on mathematical background. Because Part One itself is devoted to background material on mathematics and computers, this section does not involve a review of a specific mathematical topic. Rather, we take this opportunity to introduce you to the types of mathematical subject areas covered in this book. As summarized in Fig. PT1.2, these are
    1. Roots of Equations (Fig. PT1.2a). These problems are concerned with the value of a variable or a parameter that satisfies a single nonlinear equation. These problems are especially valuable in engineering design contexts where it is often impossible to explicitly solve design equations for parameters.
    2. Systems of Linear Algebraic Equations (Fig. PT1.2b). These problems are similar in spirit to roots of equations in the sense that they are concerned with values that
  • 23. [FIGURE PT1.2 Summary of the numerical methods covered in this book. Panels: (a) Part 2: Roots of equations. Solve $f(x) = 0$ for $x$ (plot of $f(x)$ versus $x$ with the root marked). (b) Part 3: Linear algebraic equations. Given the $a$’s and the $c$’s, solve $a_{11}x_1 + a_{12}x_2 = c_1$ and $a_{21}x_1 + a_{22}x_2 = c_2$ for the $x$’s (plot of $x_2$ versus $x_1$ with the solution marked). (c) Part 4: Optimization. Determine $x$ that gives optimum $f(x)$ (plot of $f(x)$ versus $x$ with the minimum marked). (d) Part 5: Curve fitting (plots of $f(x)$ versus $x$ for regression and interpolation). (e) Part 6: Integration. $I = \int_a^b f(x)\,dx$; find the area under the curve.]
  • 24. satisfy equations. However, in contrast to satisfying a single equation, a set of values is sought that simultaneously satisfies a set of linear algebraic equations. Such equations arise in a variety of problem contexts and in all disciplines of engineering. In particular, they originate in the mathematical modeling of large systems of interconnected elements such as structures, electric circuits, and fluid networks. However, they are also encountered in other areas of numerical methods such as curve fitting and differential equations.
    3. Optimization (Fig. PT1.2c). These problems involve determining a value or values of an independent variable that correspond to a “best” or optimal value of a function. Thus, as in Fig. PT1.2c, optimization involves identifying maxima and minima. Such problems occur routinely in engineering design contexts. They also arise in a number of other numerical methods. We address both single- and multi-variable unconstrained optimization. We also describe constrained optimization with particular emphasis on linear programming.
    4. Curve Fitting (Fig. PT1.2d). You will often have occasion to fit curves to data points. The techniques developed for this purpose can be divided into two general categories: regression and interpolation. Regression is employed where there is a significant degree of error associated with the data. Experimental results are often of this kind. For these situations, the strategy is to derive a single curve that represents the general trend of the data without necessarily matching any individual points. In contrast, interpolation is used where the objective is to determine intermediate values between relatively error-free data points. Such is usually the case for tabulated information. For these situations, the strategy is to fit a curve directly through the data points and use the curve to predict the intermediate values.
    5. Integration (Fig. PT1.2e). As depicted, a physical interpretation of numerical integration is the determination of the area under a curve. Integration has many
    [FIGURE PT1.2 (concluded) Panels: (f) Part 7: Ordinary differential equations. Given $\frac{dy}{dt} \cong \frac{\Delta y}{\Delta t} = f(t, y)$, solve for $y$ as a function of $t$, using $y_{i+1} = y_i + f(t_i, y_i)\,\Delta t$ (plot of $y$ versus $t$ showing the slope $f(t_i, y_i)$ over the step $\Delta t$ from $t_i$ to $t_{i+1}$). (g) Part 8: Partial differential equations. Given $\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = f(x, y)$, solve for $u$ as a function of $x$ and $y$.]
  • 25. applications in engineering practice, ranging from the determination of the centroids of oddly shaped objects to the calculation of total quantities based on sets of discrete measurements. In addition, numerical integration formulas play an important role in the solution of differential equations.
    6. Ordinary Differential Equations (Fig. PT1.2f). Ordinary differential equations are of great significance in engineering practice. This is because many physical laws are couched in terms of the rate of change of a quantity rather than the magnitude of the quantity itself. Examples range from population-forecasting models (rate of change of population) to the acceleration of a falling body (rate of change of velocity). Two types of problems are addressed: initial-value and boundary-value problems. In addition, the computation of eigenvalues is covered.
    7. Partial Differential Equations (Fig. PT1.2g). Partial differential equations are used to characterize engineering systems where the behavior of a physical quantity is couched in terms of its rate of change with respect to two or more independent variables. Examples include the steady-state distribution of temperature on a heated plate (two spatial dimensions) or the time-variable temperature of a heated rod (time and one spatial dimension). Two fundamentally different approaches are employed to solve partial differential equations numerically. In the present text, we will emphasize finite-difference methods that approximate the solution in a pointwise fashion (Fig. PT1.2g). However, we will also present an introduction to finite-element methods, which use a piecewise approach.
    PT1.3 ORIENTATION Some orientation might be helpful before proceeding with our introduction to numerical methods. The following is intended as an overview of the material in Part One. In addition, some objectives have been included to focus your efforts when studying the material.
    PT1.3.1 Scope and Preview Figure PT1.3 is a schematic representation of the material in Part One. We have designed this diagram to provide you with a global overview of this part of the book. We believe that a sense of the “big picture” is critical to developing insight into numerical methods. When reading a text, it is often possible to become lost in technical details. Whenever you feel that you are losing the big picture, refer back to Fig. PT1.3 to reorient yourself. Every part of this book includes a similar figure. Figure PT1.3 also serves as a brief preview of the material covered in Part One. Chapter 1 is designed to orient you to numerical methods and to provide motivation by demonstrating how these techniques can be used in the engineering modeling process. Chapter 2 is an introduction and review of computer-related aspects of numerical methods and suggests the level of computer skills you should acquire to efficiently apply succeeding information. Chapters 3 and 4 deal with the important topic of error analysis, which must be understood for the effective use of numerical methods. In addition, an epilogue is included that introduces the trade-offs that have such great significance for the effective implementation of numerical methods.
  • 26. [FIGURE PT1.3 Schematic of the organization of the material in Part One: Modeling, Computers, and Error Analysis. Diagram contents:
    PART 1 Modeling, Computers, and Error Analysis: PT 1.1 Motivation; PT 1.2 Mathematical background; PT 1.3 Orientation
    CHAPTER 1 Mathematical Modeling and Engineering Problem Solving: 1.1 A simple model; 1.2 Conservation laws
    CHAPTER 2 Programming and Software: 2.1 Packages and programming; 2.2 Structured programming; 2.3 Modular programming; 2.4 Excel; 2.5 MATLAB; 2.6 Mathcad; 2.7 Languages and libraries
    CHAPTER 3 Approximations and Round-Off Errors: 3.1 Significant figures; 3.2 Accuracy and precision; 3.3 Error definitions; 3.4 Round-off errors
    CHAPTER 4 Truncation Errors and the Taylor Series: 4.1 Taylor series; 4.2 Error propagation; 4.3 Total numerical error; 4.4 Miscellaneous errors
    EPILOGUE: PT 1.4 Trade-offs; PT 1.5 Important formulas; PT 1.6 Advanced methods]
  • 27. TABLE PT1.1 Specific study objectives for Part One.
    1. Recognize the difference between analytical and numerical solutions.
    2. Understand how conservation laws are employed to develop mathematical models of physical systems.
    3. Define top-down and modular design.
    4. Delineate the rules that underlie structured programming.
    5. Be capable of composing structured and modular programs in a high-level computer language.
    6. Know how to translate structured flowcharts and pseudocode into code in a high-level language.
    7. Start to familiarize yourself with any software packages that you will be using in conjunction with this text.
    8. Recognize the distinction between truncation and round-off errors.
    9. Understand the concepts of significant figures, accuracy, and precision.
    10. Recognize the difference between true relative error $\varepsilon_t$, approximate relative error $\varepsilon_a$, and acceptable error $\varepsilon_s$, and understand how $\varepsilon_a$ and $\varepsilon_s$ are used to terminate an iterative computation.
    11. Understand how numbers are represented in digital computers and how this representation induces round-off error. In particular, know the difference between single and extended precision.
    12. Recognize how computer arithmetic can introduce and amplify round-off errors in calculations. In particular, appreciate the problem of subtractive cancellation.
    13. Understand how the Taylor series and its remainder are employed to represent continuous functions.
    14. Know the relationship between finite divided differences and derivatives.
    15. Be able to analyze how errors are propagated through functional relationships.
    16. Be familiar with the concepts of stability and condition.
    17. Familiarize yourself with the trade-offs outlined in the Epilogue of Part One.
    PT1.3.2 Goals and Objectives Study Objectives. Upon completing Part One, you should be adequately prepared to embark on your studies of numerical methods. In general, you should have gained a fundamental understanding of the importance of computers and the role of approximations and errors in the implementation and development of numerical methods. In addition to these general goals, you should have mastered each of the specific study objectives listed in Table PT1.1.
    Computer Objectives. Upon completing Part One, you should have mastered sufficient computer skills to develop your own software for the numerical methods in this text. You should be able to develop well-structured and reliable computer programs on the basis of pseudocode, flowcharts, or other forms of algorithms. You should have developed the capability to document your programs so that they may be effectively employed by users. Finally, in addition to your own programs, you may be using software packages along with this book. Packages like Excel, Mathcad, or The MathWorks, Inc. MATLAB® program are examples of such software. You should become familiar with these packages, so that you will be comfortable using them to solve numerical problems later in the text.
  • 28. CHAPTER 1 Mathematical Modeling and Engineering Problem Solving
    Knowledge and understanding are prerequisites for the effective implementation of any tool. No matter how impressive your tool chest, you will be hard-pressed to repair a car if you do not understand how it works. This is particularly true when using computers to solve engineering problems. Although they have great potential utility, computers are practically useless without a fundamental understanding of how engineering systems work. This understanding is initially gained by empirical means, that is, by observation and experiment. However, while such empirically derived information is essential, it is only half the story. Over years and years of observation and experiment, engineers and scientists have noticed that certain aspects of their empirical studies occur repeatedly. Such general behavior can then be expressed as fundamental laws that essentially embody the cumulative wisdom of past experience. Thus, most engineering problem solving employs the two-pronged approach of empiricism and theoretical analysis (Fig. 1.1). It must be stressed that the two prongs are closely coupled. As new measurements are taken, the generalizations may be modified or new ones developed. Similarly, the generalizations can have a strong influence on the experiments and observations. In particular, generalizations can serve as organizing principles that can be employed to synthesize observations and experimental results into a coherent and comprehensive framework from which conclusions can be drawn. From an engineering problem-solving perspective, such a framework is most useful when it is expressed in the form of a mathematical model. The primary objective of this chapter is to introduce you to mathematical modeling and its role in engineering problem solving. We will also illustrate how numerical methods figure in the process.
    1.1 A SIMPLE MATHEMATICAL MODEL A mathematical model can be broadly defined as a formulation or equation that expresses the essential features of a physical system or process in mathematical terms. In a very general sense, it can be represented as a functional relationship of the form
    $$\text{Dependent variable} = f\left(\text{independent variables},\ \text{parameters},\ \text{forcing functions}\right) \tag{1.1}$$
  • 29. where the dependent variable is a characteristic that usually reflects the behavior or state of the system; the independent variables are usually dimensions, such as time and space, along which the system’s behavior is being determined; the parameters are reflective of the system’s properties or composition; and the forcing functions are external influences acting upon the system. The actual mathematical expression of Eq. (1.1) can range from a simple algebraic relationship to large complicated sets of differential equations. For example, on the basis of his observations, Newton formulated his second law of motion, which states that the time rate of change of momentum of a body is equal to the resultant force acting on it. The mathematical expression, or model, of the second law is the well-known equation
    $$F = ma \tag{1.2}$$
    where $F$ = net force acting on the body (N, or kg·m/s²), $m$ = mass of the object (kg), and $a$ = its acceleration (m/s²).
    [FIGURE 1.1 The engineering problem-solving process. Diagram elements: Problem definition; Mathematical model; THEORY; DATA; Problem-solving tools: computers, statistics, numerical methods, graphics, etc.; Numeric or graphic results; Societal interfaces: scheduling, optimization, communication, public interaction, etc.; Implementation.]
  • 30. The second law can be recast in the format of Eq. (1.1) by merely dividing both sides by $m$ to give
    $$a = \frac{F}{m} \tag{1.3}$$
    where $a$ = the dependent variable reflecting the system’s behavior, $F$ = the forcing function, and $m$ = a parameter representing a property of the system. Note that for this simple case there is no independent variable because we are not yet predicting how acceleration varies in time or space. Equation (1.3) has several characteristics that are typical of mathematical models of the physical world:
    1. It describes a natural process or system in mathematical terms.
    2. It represents an idealization and simplification of reality. That is, the model ignores negligible details of the natural process and focuses on its essential manifestations. Thus, the second law does not include the effects of relativity that are of minimal importance when applied to objects and forces that interact on or about the earth’s surface at velocities and on scales visible to humans.
    3. Finally, it yields reproducible results and, consequently, can be used for predictive purposes. For example, if the force on an object and the mass of an object are known, Eq. (1.3) can be used to compute acceleration.
    Because of its simple algebraic form, the solution of Eq. (1.2) can be obtained easily. However, other mathematical models of physical phenomena may be much more complex, and either cannot be solved exactly or require more sophisticated mathematical techniques than simple algebra for their solution. To illustrate a more complex model of this kind, Newton’s second law can be used to determine the terminal velocity of a free-falling body near the earth’s surface. Our falling body will be a parachutist (Fig. 1.2). A model for this case can be derived by expressing the acceleration as the time rate of change of the velocity ($dv/dt$) and substituting it into Eq. (1.3) to yield
    $$\frac{dv}{dt} = \frac{F}{m} \tag{1.4}$$
    where $v$ is velocity (m/s) and $t$ is time (s). Thus, the mass multiplied by the rate of change of the velocity is equal to the net force acting on the body. If the net force is positive, the object will accelerate. If it is negative, the object will decelerate. If the net force is zero, the object’s velocity will remain at a constant level. Next, we will express the net force in terms of measurable variables and parameters. For a body falling within the vicinity of the earth (Fig. 1.2), the net force is composed of two opposing forces: the downward pull of gravity $F_D$ and the upward force of air resistance $F_U$:
    $$F = F_D + F_U \tag{1.5}$$
    If the downward force is assigned a positive sign, the second law can be used to formulate the force due to gravity, as
    $$F_D = mg \tag{1.6}$$
    where $g$ = the gravitational constant, or the acceleration due to gravity, which is approximately equal to 9.81 m/s².
    [FIGURE 1.2 Schematic diagram of the forces acting on a falling parachutist. $F_D$ is the downward force due to gravity. $F_U$ is the upward force due to air resistance.]
  • 31. 14 MATHEMATICAL MODELING AND ENGINEERING PROBLEM SOLVING Air resistance can be formulated in a variety of ways. A simple approach is to as- sume that it is linearly proportional to velocity1 and acts in an upward direction, as in FU 5 2cy (1.7) where c 5 a proportionality constant called the drag coefficient (kg/s). Thus, the greater the fall velocity, the greater the upward force due to air resistance. The parameter c accounts for properties of the falling object, such as shape or surface roughness, that affect air resistance. For the present case, c might be a function of the type of jumpsuit or the orientation used by the parachutist during free-fall. The net force is the difference between the downward and upward force. Therefore, Eqs. (1.4) through (1.7) can be combined to yield dy dt 5 mg 2 cy m (1.8) or simplifying the right side, dy dt 5 g 2 c m y (1.9) Equation (1.9) is a model that relates the acceleration of a falling object to the forces acting on it. It is a differential equation because it is written in terms of the differential rate of change (dy兾dt) of the variable that we are interested in predicting. However, in contrast to the solution of Newton’s second law in Eq. (1.3), the exact solution of Eq. (1.9) for the velocity of the falling parachutist cannot be obtained using simple algebraic manipulation. Rather, more advanced techniques, such as those of calculus, must be applied to obtain an exact or analytical solution. For example, if the parachutist is initially at rest (y 5 0 at t 5 0), calculus can be used to solve Eq. (1.9) for y(t) 5 gm c (1 2 e2(cym)t ) (1.10) Note that Eq. (1.10) is cast in the general form of Eq. (1.1), where y(t) 5 the dependent variable, t 5 the independent variable, c and m 5 parameters, and g 5 the forcing function. EXAMPLE 1.1 Analytical Solution to the Falling Parachutist Problem Problem Statement. A parachutist of mass 68.1 kg jumps out of a stationary hot air balloon. Use Eq. 
(1.10) to compute velocity prior to opening the chute. The drag coefficient is equal to 12.5 kg/s.

Solution. Inserting the parameters into Eq. (1.10) yields

$$v(t) = \frac{9.81(68.1)}{12.5}\left(1 - e^{-(12.5/68.1)t}\right) = 53.44\left(1 - e^{-0.18355t}\right)$$

which can be used to compute

¹ In fact, the relationship is actually nonlinear and might better be represented by a power relationship such as $F_U = -cv^2$. We will explore how such nonlinearities affect the model in problems at the end of this chapter.
  • 32.
t, s    v, m/s
0       0.00
2       16.42
4       27.80
6       35.68
8       41.14
10      44.92
12      47.54
∞       53.44

According to the model, the parachutist accelerates rapidly (Fig. 1.3). A velocity of 44.92 m/s is attained after 10 s. Note also that after a sufficiently long time, a constant velocity, called the terminal velocity, of 53.44 m/s is reached. This velocity is constant because, eventually, the force of gravity will be in balance with the air resistance. Thus, the net force is zero and acceleration has ceased.

FIGURE 1.3 The analytical solution to the falling parachutist problem as computed in Example 1.1. Velocity increases with time and asymptotically approaches a terminal velocity. (Axes: v, m/s versus t, s.)

Equation (1.10) is called an analytical, or exact, solution because it exactly satisfies the original differential equation. Unfortunately, there are many mathematical models that cannot be solved exactly. In many of these cases, the only alternative is to develop a numerical solution that approximates the exact solution.

As mentioned previously, numerical methods are those in which the mathematical problem is reformulated so it can be solved by arithmetic operations. This can be illustrated
  • 33. for Newton’s second law by realizing that the time rate of change of velocity can be approximated by (Fig. 1.4):

$$\frac{dv}{dt} \cong \frac{\Delta v}{\Delta t} = \frac{v(t_{i+1}) - v(t_i)}{t_{i+1} - t_i} \tag{1.11}$$

where $\Delta v$ and $\Delta t$ = differences in velocity and time, respectively, computed over finite intervals, $v(t_i)$ = velocity at an initial time $t_i$, and $v(t_{i+1})$ = velocity at some later time $t_{i+1}$. Note that $dv/dt \cong \Delta v/\Delta t$ is approximate because $\Delta t$ is finite. Remember from calculus that

$$\frac{dv}{dt} = \lim_{\Delta t \to 0}\frac{\Delta v}{\Delta t}$$

Equation (1.11) represents the reverse process.

FIGURE 1.4 The use of a finite difference to approximate the first derivative of v with respect to t. (The figure contrasts the true slope $dv/dt$ with the approximate slope $\Delta v/\Delta t$ between $t_i$ and $t_{i+1}$.)

Equation (1.11) is called a finite divided difference approximation of the derivative at time $t_i$. It can be substituted into Eq. (1.9) to give

$$\frac{v(t_{i+1}) - v(t_i)}{t_{i+1} - t_i} = g - \frac{c}{m}v(t_i)$$

This equation can then be rearranged to yield

$$v(t_{i+1}) = v(t_i) + \left[g - \frac{c}{m}v(t_i)\right](t_{i+1} - t_i) \tag{1.12}$$

Notice that the term in brackets is the right-hand side of the differential equation itself [Eq. (1.9)]. That is, it provides a means to compute the rate of change or slope of $v$. Thus, the differential equation has been transformed into an equation that can be used to determine the velocity algebraically at $t_{i+1}$ using the slope and previous values of
  • 34. $v$ and $t$. If you are given an initial value for velocity at some time $t_i$, you can easily compute velocity at a later time $t_{i+1}$. This new value of velocity at $t_{i+1}$ can in turn be employed to extend the computation to velocity at $t_{i+2}$ and so on. Thus, at any time along the way,

New value = old value + slope × step size

Note that this approach is formally called Euler’s method.

EXAMPLE 1.2 Numerical Solution to the Falling Parachutist Problem

Problem Statement. Perform the same computation as in Example 1.1 but use Eq. (1.12) to compute the velocity. Employ a step size of 2 s for the calculation.

Solution. At the start of the computation ($t_i = 0$), the velocity of the parachutist is zero. Using this information and the parameter values from Example 1.1, Eq. (1.12) can be used to compute velocity at $t_{i+1} = 2$ s:

$$v = 0 + \left[9.81 - \frac{12.5}{68.1}(0)\right]2 = 19.62 \text{ m/s}$$

For the next interval (from $t = 2$ to 4 s), the computation is repeated, with the result

$$v = 19.62 + \left[9.81 - \frac{12.5}{68.1}(19.62)\right]2 = 32.04 \text{ m/s}$$

The calculation is continued in a similar fashion to obtain additional values:

t, s    v, m/s
0       0.00
2       19.62
4       32.04
6       39.90
8       44.87
10      48.02
12      50.01
∞       53.44

The results are plotted in Fig. 1.5 along with the exact solution. It can be seen that the numerical method captures the essential features of the exact solution. However, because we have employed straight-line segments to approximate a continuously curving function, there is some discrepancy between the two results. One way to minimize such discrepancies is to use a smaller step size. For example, applying Eq. (1.12) at 1-s intervals results in a smaller error, as the straight-line segments track closer to the true solution. Using hand calculations, the effort associated with using smaller and smaller step sizes would make such numerical solutions impractical. However, with the aid of the computer, large numbers of calculations can be performed easily.
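As an illustration of how little code this takes, the following is a minimal Python sketch of Eq. (1.12), applied with the parameter values of Example 1.1. Python is used here only as a convenient stand-in and is not one of the tools covered in this book; the function names `euler_velocity` and `exact_velocity` are hypothetical names chosen for this sketch.

```python
import math


def euler_velocity(m, c, dt, t_end, v0=0.0, g=9.81):
    """Integrate dv/dt = g - (c/m) v with Euler's method, Eq. (1.12).

    Returns a list of (t, v) pairs, starting with the initial condition.
    """
    n_steps = round(t_end / dt)
    t, v = 0.0, v0
    history = [(t, v)]
    for i in range(1, n_steps + 1):
        slope = g - (c / m) * v   # right-hand side of Eq. (1.9)
        v = v + slope * dt        # new value = old value + slope * step size
        t = i * dt
        history.append((t, v))
    return history


def exact_velocity(t, m, c, g=9.81):
    """Analytical solution, Eq. (1.10), for a jumper initially at rest."""
    return (g * m / c) * (1.0 - math.exp(-(c / m) * t))


if __name__ == "__main__":
    m, c = 68.1, 12.5  # mass (kg) and drag coefficient (kg/s) from Example 1.1
    for dt in (2.0, 1.0, 0.5):
        t, v = euler_velocity(m, c, dt, t_end=10.0)[-1]
        err = v - exact_velocity(t, m, c)
        print(f"dt = {dt:3.1f} s: v({t:.0f}) = {v:.2f} m/s, error = {err:.2f} m/s")
```

Running the sketch with successively smaller values of `dt` gives one way to see the discrepancy at a fixed time shrink as the step size is reduced, at the cost of more loop iterations.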
Thus, you can accurately model the velocity of the falling parachutist without having to solve the differential equation exactly. As in the previous example, a computational price must be paid for a more accurate numerical result. Each halving of the step size to attain more accuracy leads to a doubling
  • 35. of the number of computations. Thus, we see that there is a trade-off between accuracy and computational effort. Such trade-offs figure prominently in numerical methods and constitute an important theme of this book. Consequently, we have devoted the Epilogue of Part One to an introduction to more of these trade-offs.

FIGURE 1.5 Comparison of the numerical and analytical solutions for the falling parachutist problem. (Axes: v, m/s versus t, s; the exact, analytical solution and the approximate, numerical solution are plotted together with the terminal velocity.)

1.2 CONSERVATION LAWS AND ENGINEERING

Aside from Newton’s second law, there are other major organizing principles in engineering. Among the most important of these are the conservation laws. Although they form the basis for a variety of complicated and powerful mathematical models, the great conservation laws of science and engineering are conceptually easy to understand. They all boil down to

$$\text{Change} = \text{increases} - \text{decreases} \tag{1.13}$$

This is precisely the format that we employed when using Newton’s law to develop a force balance for the falling parachutist [Eq. (1.8)].

Although simple, Eq. (1.13) embodies one of the most fundamental ways in which conservation laws are used in engineering, that is, to predict changes with respect to time. We give Eq. (1.13) the special name time-variable (or transient) computation.

Aside from predicting changes, another way in which conservation laws are applied is for cases where change is nonexistent. If change is zero, Eq. (1.13) becomes

$$\text{Change} = 0 = \text{increases} - \text{decreases}$$

or

$$\text{Increases} = \text{decreases} \tag{1.14}$$
  • 36. Thus, if no change occurs, the increases and decreases must be in balance. This case, which is also given a special name, the steady-state computation, has many applications in engineering. For example, for steady-state incompressible fluid flow in pipes, the flow into a junction must be balanced by flow going out, as in

Flow in = flow out

For the junction in Fig. 1.6, the balance can be used to compute that the flow out of the fourth pipe must be 60.

For the falling parachutist, steady-state conditions would correspond to the case where the net force was zero, or [Eq. (1.8) with $dv/dt = 0$]

$$mg = cv \tag{1.15}$$

Thus, at steady state, the downward and upward forces are in balance, and Eq. (1.15) can be solved for the terminal velocity

$$v = \frac{mg}{c}$$

Although Eqs. (1.13) and (1.14) might appear trivially simple, they embody the two fundamental ways that conservation laws are employed in engineering. As such, they will form an important part of our efforts in subsequent chapters to illustrate the connection between numerical methods and engineering. Our primary vehicles for making this connection are the engineering applications that appear at the end of each part of this book.

Table 1.1 summarizes some of the simple engineering models and associated conservation laws that will form the basis for many of these engineering applications. Most of the chemical engineering applications will focus on mass balances for reactors. The mass balance is derived from the conservation of mass. It specifies that the change of mass of a chemical in the reactor depends on the amount of mass flowing in minus the mass flowing out.

Both the civil and mechanical engineering applications will focus on models developed from the conservation of momentum. For civil engineering, force balances are utilized to analyze structures such as the simple truss in Table 1.1.
The same principles are employed for the mechanical engineering applications to analyze the transient up-and-down motion or vibrations of an automobile.

FIGURE 1.6 A flow balance for steady-state incompressible fluid flow at the junction of pipes. (Pipe 1: flow in = 100; Pipe 2: flow in = 80; Pipe 3: flow out = 120; Pipe 4: flow out = ?)
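The two steady-state balances discussed in this section, the pipe junction of Fig. 1.6 and the terminal velocity from Eq. (1.15), reduce to one-line calculations. The sketch below shows them in Python purely for illustration; the function names `unknown_outflow` and `terminal_velocity` are hypothetical, and the flows are taken to be in the same (unspecified) units used in Fig. 1.6.

```python
def unknown_outflow(inflows, known_outflows):
    """Steady-state balance, Eq. (1.14): flow in = flow out.

    Returns the single outflow that is not yet known.
    """
    return sum(inflows) - sum(known_outflows)


def terminal_velocity(m, c, g=9.81):
    """Steady state of Eq. (1.8): mg = cv, Eq. (1.15), solved for v."""
    return m * g / c


if __name__ == "__main__":
    # Junction of Fig. 1.6: pipes 1 and 2 flow in, pipe 3 flows out.
    print("Pipe 4 flow out:", unknown_outflow([100.0, 80.0], [120.0]))
    # Parachutist of Example 1.1: m = 68.1 kg, c = 12.5 kg/s.
    print("Terminal velocity (m/s):", round(terminal_velocity(68.1, 12.5), 2))
```

Neither calculation involves time, which is the practical signature of a steady-state computation: the differential equation collapses to an algebraic one.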
  • 37. TABLE 1.1 Devices and types of balances that are commonly used in the four major areas of engineering. For each case, the conservation law upon which the balance is based is specified.

Field: Chemical engineering
  Device: Reactors (input, output)
  Organizing principle: Conservation of mass
  Mathematical expression: Mass balance. Over a unit of time period, $\Delta\text{mass} = \text{inputs} - \text{outputs}$

Field: Civil engineering
  Device: Structure
  Organizing principle: Conservation of momentum
  Mathematical expression: Force balance. At each node, $\sum$ horizontal forces ($F_H$) = 0 and $\sum$ vertical forces ($F_V$) = 0

Field: Mechanical engineering
  Device: Machine
  Organizing principle: Conservation of momentum
  Mathematical expression: Force balance. $m\,\dfrac{d^2x}{dt^2} = \text{downward force} - \text{upward force}$

Field: Electrical engineering
  Device: Circuit
  Organizing principle: Conservation of charge
  Mathematical expression: Current balance. For each node, $\sum$ current ($i$) = 0
  Organizing principle: Conservation of energy
  Mathematical expression: Voltage balance. Around each loop, $\sum$ emf’s $-$ $\sum$ voltage drops for resistors = 0, that is, $\sum \xi - \sum iR = 0$
  • 38. TABLE 1.2 Some practical issues that will be explored in the engineering applications at the end of each part of this book.

1. Nonlinear versus linear. Much of classical engineering depends on linearization to permit analytical solutions. Although this is often appropriate, expanded insight can often be gained if nonlinear problems are examined.
2. Large versus small systems. Without a computer, it is often not feasible to examine systems with over three interacting components. With computers and numerical methods, more realistic multicomponent systems can be examined.
3. Nonideal versus ideal. Idealized laws abound in engineering. Often there are nonidealized alternatives that are more realistic but more computationally demanding. Approximate numerical approaches can facilitate the application of these nonideal relationships.
4. Sensitivity analysis. Because they are so involved, many manual calculations require a great deal of time and effort for successful implementation. This sometimes discourages the analyst from implementing the multiple computations that are necessary to examine how a system responds under different conditions. Such sensitivity analyses are facilitated when numerical methods allow the computer to assume the computational burden.
5. Design. It is often a straightforward proposition to determine the performance of a system as a function of its parameters. It is usually more difficult to solve the inverse problem, that is, determining the parameters when the required performance is specified. Numerical methods and computers often permit this task to be implemented in an efficient manner.

Finally, the electrical engineering applications employ both current and energy balances to model electric circuits. The current balance, which results from the conservation of charge, is similar in spirit to the flow balance depicted in Fig. 1.6.
Just as flow must balance at the junction of pipes, electric current must balance at the junction of electric wires. The energy balance specifies that the changes of voltage around any loop of the circuit must add up to zero.

The engineering applications are designed to illustrate how numerical methods are actually employed in the engineering problem-solving process. As such, they will permit us to explore practical issues (Table 1.2) that arise in real-world applications. Making these connections between mathematical techniques such as numerical methods and engineering practice is a critical step in tapping their true potential. Careful examination of the engineering applications will help you to take this step.

PROBLEMS

1.1 Use calculus to solve Eq. (1.9) for the case where the initial velocity, $v(0)$, is nonzero.

1.2 Repeat Example 1.2. Compute the velocity to $t = 8$ s, with a step size of (a) 1 and (b) 0.5 s. Can you make any statement regarding the errors of the calculation based on the results?

1.3 Rather than the linear relationship of Eq. (1.7), you might choose to model the upward force on the parachutist as a second-order relationship,

$$F_U = -c'v^2$$

where $c'$ = a bulk second-order drag coefficient (kg/m).
(a) Using calculus, obtain the closed-form solution for the case where the jumper is initially at rest ($v = 0$ at $t = 0$).
(b) Repeat the numerical calculation in Example 1.2 with the same initial condition and parameter values, but with second-order drag. Use a value of 0.22 kg/m for $c'$.

1.4 For the free-falling parachutist with linear drag, assume a first jumper is 70 kg and has a drag coefficient of 12 kg/s. If a second jumper has a drag coefficient of 15 kg/s and a mass of 80 kg, how long will it take him to reach the same velocity the first jumper reached in 9 s?

1.5 Compute the velocity of a free-falling parachutist using Euler’s method for the case where $m = 80$ kg and $c = 10$ kg/s. Perform the calculation from $t = 0$ to 20 s with a step size of 1 s. Use an initial condition that the parachutist has an upward velocity of 20 m/s at $t = 0$. At $t = 10$ s, assume that the chute is instantaneously deployed so that the drag coefficient jumps to 60 kg/s.
  • 39. 1.6 The following information is available for a bank account:

Date    Deposits    Withdrawals    Interest    Balance
5/1                                            1522.33
        220.13      327.26
6/1
        216.80      378.51
7/1
        450.35      106.80
8/1
        127.31      350.61
9/1

Note that the money earns interest which is computed as

$$\text{Interest} = iB_i$$

where $i$ = the interest rate expressed as a fraction per month, and $B_i$ = the initial balance at the beginning of the month.
(a) Use the conservation of cash to compute the balance on 6/1, 7/1, 8/1, and 9/1 if the interest rate is 1% per month ($i$ = 0.01/month). Show each step in the computation.
(b) Write a differential equation for the cash balance in the form

$$\frac{dB}{dt} = f(D(t), W(t), i)$$

where $t$ = time (months), $D(t)$ = deposits as a function of time (\$/month), $W(t)$ = withdrawals as a function of time (\$/month). For this case, assume that interest is compounded continuously; that is, interest = $iB$.
(c) Use Euler’s method with a time step of 0.5 month to simulate the balance. Assume that the deposits and withdrawals are applied uniformly over the month.
(d) Develop a plot of balance versus time for (a) and (c).

1.7 The amount of a uniformly distributed radioactive contaminant contained in a closed reactor is measured by its concentration $c$ (becquerel/liter or Bq/L). The contaminant decreases at a decay rate proportional to its concentration; that is,

decay rate = $-kc$

where $k$ is a constant with units of day⁻¹. Therefore, according to Eq. (1.13), a mass balance for the reactor can be written as

$$\frac{dc}{dt} = -kc$$

(change in mass) = (decrease by decay)

(a) Use Euler’s method to solve this equation from $t = 0$ to 1 d with $k$ = 0.175 d⁻¹. Employ a step size of $\Delta t = 0.1$. The concentration at $t = 0$ is 100 Bq/L.
(b) Plot the solution on a semilog graph (i.e., ln $c$ versus $t$) and determine the slope. Interpret your results.

1.8 A group of 35 students attend a class in a room that measures 11 m by 8 m by 3 m.
Each student takes up about 0.075 m³ and gives out about 80 W of heat (1 W = 1 J/s). Calculate the air temperature rise during the first 20 minutes of the class if the room is completely sealed and insulated. Assume the heat capacity, $C_v$, for air is 0.718 kJ/(kg K). Assume air is an ideal gas at 20 °C and 101.325 kPa. Note that the heat absorbed by the air $Q$ is related to the mass of the air $m$, the heat capacity, and the change in temperature by the following relationship:

$$Q = m\int_{T_1}^{T_2} C_v\,dT = mC_v(T_2 - T_1)$$

The mass of air can be obtained from the ideal gas law:

$$PV = \frac{m}{\text{Mwt}}RT$$

where $P$ is the gas pressure, $V$ is the volume of the gas, Mwt is the molecular weight of the gas (for air, 28.97 kg/kmol), and $R$ is the ideal gas constant [8.314 kPa m³/(kmol K)].

1.9 A storage tank contains a liquid at depth $y$, where $y = 0$ when the tank is half full. Liquid is withdrawn at a constant flow rate $Q$ to meet demands. The contents are resupplied at a sinusoidal rate $3Q\sin^2(t)$.

FIGURE P1.9

Equation (1.13) can be written for this system as

$$\frac{d(Ay)}{dt} = 3Q\sin^2(t) - Q$$

(change in volume) = (inflow) − (outflow)

or, since the surface area $A$ is constant,

$$\frac{dy}{dt} = 3\frac{Q}{A}\sin^2(t) - \frac{Q}{A}$$
  • 40. Use Euler’s method to solve for the depth $y$ from $t = 0$ to 10 d with a step size of 0.5 d. The parameter values are $A$ = 1250 m² and $Q$ = 450 m³/d. Assume that the initial condition is $y = 0$.

1.10 For the same storage tank described in Prob. 1.9, suppose that the outflow is not constant but rather depends on the depth. For this case, the differential equation for depth can be written as

$$\frac{dy}{dt} = 3\frac{Q}{A}\sin^2(t) - \frac{\alpha(1 + y)^{1.5}}{A}$$

Use Euler’s method to solve for the depth $y$ from $t = 0$ to 10 d with a step size of 0.5 d. The parameter values are $A$ = 1250 m², $Q$ = 450 m³/d, and $\alpha$ = 150. Assume that the initial condition is $y = 0$.

1.11 Apply the conservation of volume (see Prob. 1.9) to simulate the level of liquid in a conical storage tank (Fig. P1.11). The liquid flows in at a sinusoidal rate of $Q_{in} = 3\sin^2(t)$ and flows out according to

$$Q_{out} = 3(y - y_{out})^{1.5} \qquad y > y_{out}$$
$$Q_{out} = 0 \qquad y \le y_{out}$$

where flow has units of m³/d and $y$ = the elevation of the water surface above the bottom of the tank (m). Use Euler’s method to solve for the depth $y$ from $t = 0$ to 10 d with a step size of 0.5 d. The parameter values are $r_{top}$ = 2.5 m, $y_{top}$ = 4 m, and $y_{out}$ = 1 m. Assume that the level is initially below the outlet pipe with $y(0)$ = 0.8 m.

FIGURE P1.11

1.12 In our example of the free-falling parachutist, we assumed that the acceleration due to gravity was a constant value. Although this is a decent approximation when we are examining falling objects near the surface of the earth, the gravitational force decreases as we move above sea level. A more general representation based on Newton’s inverse square law of gravitational attraction can be written as

$$g(x) = g(0)\frac{R^2}{(R + x)^2}$$

where $g(x)$ = gravitational acceleration at altitude $x$ (in m) measured upward from the earth’s surface (m/s²), $g(0)$ = gravitational acceleration at the earth’s surface (≅ 9.81 m/s²), and $R$ = the earth’s radius (≅ 6.37 × 10⁶ m).
(a) In a fashion similar to the derivation of Eq. (1.9), use a force balance to derive a differential equation for velocity as a function of time that utilizes this more complete representation of gravitation. However, for this derivation, assume that upward velocity is positive.
(b) For the case where drag is negligible, use the chain rule to express the differential equation as a function of altitude rather than time. Recall that the chain rule is

$$\frac{dv}{dt} = \frac{dv}{dx}\frac{dx}{dt}$$

(c) Use calculus to obtain the closed form solution where $v = v_0$ at $x = 0$.
(d) Use Euler’s method to obtain a numerical solution from $x = 0$ to 100,000 m using a step of 10,000 m where the initial velocity is 1500 m/s upward. Compare your result with the analytical solution.

1.13 Suppose that a spherical droplet of liquid evaporates at a rate that is proportional to its surface area.

$$\frac{dV}{dt} = -kA$$

where $V$ = volume (mm³), $t$ = time (min), $k$ = the evaporation rate (mm/min), and $A$ = surface area (mm²). Use Euler’s method to compute the volume of the droplet from $t = 0$ to 10 min using a step size of 0.25 min. Assume that $k$ = 0.08 mm/min and that the droplet initially has a radius of 2.5 mm. Assess the validity of your results by determining the radius of your final computed volume and verifying that it is consistent with the evaporation rate.

1.14 Newton’s law of cooling says that the temperature of a body changes at a rate proportional to the difference between its temperature and that of the surrounding medium (the ambient temperature),

$$\frac{dT}{dt} = -k(T - T_a)$$

where $T$ = the temperature of the body (°C), $t$ = time (min), $k$ = the proportionality constant (per minute), and $T_a$ = the ambient temperature (°C). Suppose that a cup of coffee originally has a temperature of 70 °C. Use Euler’s method to compute the temperature from $t = 0$ to 10 min using a step size of 2 min if $T_a$ = 20 °C and $k$ = 0.019/min.

1.15 As depicted in Fig. P1.15, an RLC circuit consists of three elements: a resistor ($R$), an inductor ($L$), and a capacitor ($C$). The flow of current across each element induces a voltage drop.
  • 41. Kirchhoff’s second voltage law states that the algebraic sum of these voltage drops around a closed circuit is zero,

$$iR + L\frac{di}{dt} + \frac{q}{C} = 0$$

where $i$ = current, $R$ = resistance, $L$ = inductance, $t$ = time, $q$ = charge, and $C$ = capacitance. In addition, the current is related to charge as in

$$\frac{dq}{dt} = i$$

(a) If the initial values are $i(0) = 0$ and $q(0)$ = 1 C, use Euler’s method to solve this pair of differential equations from $t = 0$ to 0.1 s using a step size of $\Delta t$ = 0.01 s. Employ the following parameters for your calculation: $R$ = 200 Ω, $L$ = 5 H, and $C$ = 10⁻⁴ F.
(b) Develop a plot of $i$ and $q$ versus $t$.

FIGURE P1.15

1.16 Cancer cells grow exponentially with a doubling time of 20 h when they have an unlimited nutrient supply. However, as the cells start to form a solid spherical tumor without a blood supply, growth at the center of the tumor becomes limited, and eventually cells start to die.
(a) Exponential growth of cell number $N$ can be expressed as shown, where $\mu$ is the growth rate of the cells. For cancer cells, find the value of $\mu$.

$$\frac{dN}{dt} = \mu N$$

(b) Write an equation that will describe the rate of change of tumor volume during exponential growth given that the diameter of an individual cell is 20 microns.
(c) After a particular type of tumor exceeds 500 microns in diameter, the cells at the center of the tumor die (but continue to take up space in the tumor). Determine how long it will take for the tumor to exceed this critical size.

1.17 A fluid is pumped into the network shown in Fig. P1.17. If $Q_2$ = 0.6, $Q_3$ = 0.4, $Q_7$ = 0.2, and $Q_8$ = 0.3 m³/s, determine the other flows.

FIGURE P1.17

1.18 The velocity is equal to the rate of change of distance $x$ (m),

$$\frac{dx}{dt} = v(t) \tag{P1.18}$$

(a) Substitute Eq. (1.10) and develop an analytical solution for distance as a function of time. Assume that $x(0) = 0$.
(b) Use Euler’s method to numerically integrate Eqs. (P1.18) and (1.9) in order to determine both the velocity and distance fallen as a function of time for the first 10 s of free-fall using the same parameters as in Example 1.2.
(c) Develop a plot of your numerical results together with the analytical solution.

1.19 You are working as a crime-scene investigator and must predict the temperature of a homicide victim over a 5-hr period. You know that the room where the victim was found was at 10 °C when the body was discovered.
(a) Use Newton’s law of cooling (Prob. 1.14) and Euler’s method to compute the victim’s body temperature for the 5-hr period using values of $k$ = 0.12/hr and $\Delta t$ = 0.5 hr. Assume that the victim’s body temperature at the time of death was 37 °C, and that the room temperature was at a constant value of 10 °C over the 5-hr period.
(b) Further investigation reveals that the room temperature had actually dropped linearly from 20 to 10 °C over the 5-hr period. Repeat the same calculation as in (a) but incorporate this new information.
(c) Compare the results from (a) and (b) by plotting them on the same graph.

1.20 Suppose that a parachutist with linear drag ($m$ = 70 kg, $c$ = 12.5 kg/s) jumps from an airplane flying at an altitude of a kilometer with a horizontal velocity of 180 m/s relative to the ground.
(a) Write a system of four differential equations for $x$, $y$, $v_x = dx/dt$ and $v_y = dy/dt$.
  • 42. (b) If the initial horizontal position is defined as $x = 0$, use Euler’s method with $\Delta t$ = 1 s to compute the jumper’s position over the first 10 s.
(c) Develop plots of $y$ versus $t$ and $y$ versus $x$. Use the plot to graphically estimate when and where the jumper would hit the ground if the chute failed to open.

1.21 As noted in Prob. 1.3, drag is more accurately represented as depending on the square of velocity. A more fundamental representation of the drag force, which assumes turbulent conditions (i.e., a high Reynolds number), can be formulated as

$$F_d = -\frac{1}{2}\rho A C_d v|v|$$

where $F_d$ = the drag force (N), $\rho$ = fluid density (kg/m³), $A$ = the frontal area of the object on a plane perpendicular to the direction of motion (m²), $v$ = velocity (m/s), and $C_d$ = a dimensionless drag coefficient.
(a) Write the pair of differential equations for velocity and position (see Prob. 1.18) to describe the vertical motion of a sphere with diameter $d$ (m) and a density of $\rho_s$ (kg/m³). The differential equation for velocity should be written as a function of the sphere’s diameter.
(b) Use Euler’s method with a step size of $\Delta t$ = 2 s to compute the position and velocity of a sphere over the first 14 s. Employ the following parameters in your calculation: $d$ = 120 cm, $\rho$ = 1.3 kg/m³, $\rho_s$ = 2700 kg/m³, and $C_d$ = 0.47. Assume that the sphere has the initial conditions: $x(0)$ = 100 m and $v(0)$ = −40 m/s.
(c) Develop a plot of your results (i.e., $x$ and $v$ versus $t$) and use it to graphically estimate when the sphere would hit the ground.
(d) Compute the value for the bulk second-order drag coefficient $c_d'$ (kg/m). Note that, as described in Prob. 1.3, the bulk second-order drag coefficient is the term in the final differential equation for velocity that multiplies the term $v|v|$.

1.22 As depicted in Fig. P1.22, a spherical particle settling through a quiescent fluid is subject to three forces: the downward force of gravity ($F_G$), and the upward forces of buoyancy ($F_B$) and drag ($F_D$). Both the gravity and buoyancy forces can be computed with Newton’s second law with the latter equal to the weight of the displaced fluid. For laminar flow, the drag force can be computed with Stokes’s law,

$$F_D = 3\pi\mu d v$$

where $\mu$ = the dynamic viscosity of the fluid (N s/m²), $d$ = the particle diameter (m), and $v$ = the particle’s settling velocity (m/s). Note that the mass of the particle can be expressed as the product of the particle’s volume and density $\rho_s$ (kg/m³) and the mass of the displaced fluid can be computed as the product of the particle’s volume and the fluid’s density $\rho$ (kg/m³). The volume of a sphere is $\pi d^3/6$. In addition, laminar flow corresponds to the case where the dimensionless Reynolds number, Re, is less than 1, where Re = $\rho d v/\mu$.
(a) Use a force balance for the particle to develop the differential equation for $dv/dt$ as a function of $d$, $\rho$, $\rho_s$, and $\mu$.
(b) At steady-state, use this equation to solve for the particle’s terminal velocity.
(c) Employ the result of (b) to compute the particle’s terminal velocity in m/s for a spherical silt particle settling in water: $d$ = 10 μm, $\rho$ = 1 g/cm³, $\rho_s$ = 2.65 g/cm³, and $\mu$ = 0.014 g/(cm·s).
(d) Check whether flow is laminar.
(e) Use Euler’s method to compute the velocity from $t = 0$ to $2^{-15}$ s with $\Delta t = 2^{-18}$ s given the parameters given previously along with the initial condition: $v(0) = 0$.

FIGURE P1.22

1.23 As described in Prob. 1.22, in addition to the downward force of gravity (weight) and drag, an object falling through a fluid is also subject to a buoyancy force that is proportional to the displaced volume. For example, for a sphere with diameter $d$ (m), the sphere’s volume is $V = \pi d^3/6$ and its projected area is $A = \pi d^2/4$. The buoyancy force can then be computed as $F_b = -\rho V g$. We neglected buoyancy in our derivation of Eq. (1.9) because it is relatively small for an object like a parachutist moving through air. However, for a more dense fluid like water, it becomes more prominent.
(a) Derive a differential equation in the same fashion as Eq. (1.9), but include the buoyancy force and represent the drag force as described in Prob. 1.21.
(b) Rewrite the differential equation from (a) for the special case of a sphere.
(c) Use the equation developed in (b) to compute the terminal velocity (i.e., for the steady-state case). Use the following parameter values for a sphere falling through water: sphere diameter = 1 cm, sphere density = 2700 kg/m³, water density = 1000 kg/m³, and $C_d$ = 0.47.
(d) Use Euler’s method with a step size of $\Delta t$ = 0.03125 s to numerically solve for the velocity from $t = 0$ to 0.25 s with an initial velocity of zero.

1.24 As depicted in Fig. P1.24, the downward deflection $y$ (m) of a cantilever beam with a uniform load $w$ (N/m) can be computed as

$$y = \frac{w}{24EI}\left(x^4 - 4Lx^3 + 6L^2x^2\right)$$

where $x$ = distance (m), $E$ = the modulus of elasticity = 2 × 10¹¹ Pa, $I$ = moment of inertia = 3.25 × 10⁻⁴ m⁴, $w$ = 10,000 N/m, and
  • 43. $L$ = length = 4 m. This equation can be differentiated to yield the slope of the downward deflection as a function of $x$:

$$\frac{dy}{dx} = \frac{w}{24EI}\left(4x^3 - 12Lx^2 + 12L^2x\right)$$

If $y = 0$ at $x = 0$, use this equation with Euler’s method ($\Delta x$ = 0.125 m) to compute the deflection from $x = 0$ to $L$. Develop a plot of your results along with the analytical solution computed with the first equation.

FIGURE P1.24 A cantilever beam.

1.25 Use Archimedes’ principle to develop a steady-state force balance for a spherical ball of ice floating in seawater (Fig. P1.25). The force balance should be expressed as a third-order polynomial (cubic) in terms of height of the cap above the water line ($h$), the seawater’s density ($\rho_f$), the ball’s density ($\rho_s$), and the ball’s radius ($r$).

FIGURE P1.25

1.26 Beyond fluids, Archimedes’ principle has proven useful in geology when applied to solids on the earth’s crust. Figure P1.26 depicts one such case where a lighter conical granite mountain “floats on” a denser basalt layer at the earth’s surface. Note that the part of the cone below the surface is formally referred to as a frustum. Develop a steady-state force balance for this case in terms of the following parameters: basalt’s density ($\rho_b$), granite’s density ($\rho_g$), the cone’s bottom radius ($r$), and the height above ($h_1$) and below ($h_2$) the earth’s surface.

FIGURE P1.26
  • 44. CHAPTER 2
Programming and Software

In Chap. 1, we used a net force to develop a mathematical model to predict the fall velocity of a parachutist. This model took the form of a differential equation,

$$\frac{dv}{dt} = g - \frac{c}{m}v$$

We also learned that a solution to this equation could be obtained by a simple numerical approach called Euler’s method,

$$v_{i+1} = v_i + \frac{dv_i}{dt}\Delta t$$

Given an initial condition, this equation can be implemented repeatedly to compute the velocity as a function of time. However, to obtain good accuracy, many small steps must be taken. This would be extremely laborious and time-consuming to implement by hand. However, with the aid of the computer, such calculations can be performed easily.

So our next task is to figure out how to do this. The present chapter will introduce you to how the computer is used as a tool to obtain such solutions.

2.1 PACKAGES AND PROGRAMMING

Today, there are two types of software users. On one hand, there are those who take what they are given. That is, they limit themselves to the capabilities found in the software’s standard mode of operation. For example, it is a straightforward proposition to solve a system of linear equations or to generate a plot of x-y values with either Excel or MATLAB software. Because this usually involves a minimum of effort, most users tend to adopt this “vanilla” mode of operation. In addition, since the designers of these packages anticipate most typical user needs, many meaningful problems can be solved in this way.

But what happens when problems arise that are beyond the standard capability of the tool? Unfortunately, throwing up your hands and saying, “Sorry boss, no can do!” is not acceptable in most engineering circles. In such cases, you have two alternatives.

First, you can look for a different package and see if it is capable of solving the problem. That is one of the reasons we have chosen to cover both Excel and MATLAB in this book.
As you will see, neither one is all-encompassing, and each has different
  • 45. strengths. By being conversant with both, you will greatly increase the range of problems you can address.
Second, you can grow and become a “power user” by learning to write Excel VBA¹ macros or MATLAB M-files. And what are these? They are nothing more than computer programs that allow you to extend the capabilities of these tools. Because engineers should never be content to be tool limited, they will do whatever is necessary to solve their problems. A powerful way to do this is to learn to write programs in the Excel and MATLAB environments. Furthermore, the programming skills required for macros and M-files are the same as those needed to effectively develop programs in languages like Fortran 90 or C.
The major goal of the present chapter is to show you how this can be done. However, we do assume that you have been exposed to the rudiments of computer programming. Therefore, our emphasis here is on facets of programming that directly affect its use in engineering problem solving.
2.1.1 Computer Programs
Computer programs are merely a set of instructions that direct the computer to perform a certain task. Since many individuals write programs for a broad range of applications, most high-level computer languages, like Fortran 90 and C, have rich capabilities. Although some engineers might need to tap the full range of these capabilities, most merely require the ability to perform engineering-oriented numerical calculations. Looked at from this perspective, we can narrow down the complexity to a few programming topics. These are:
• Simple information representation (constants, variables, and type declarations).
• Advanced information representation (data structure, arrays, and records).
• Mathematical formulas (assignment, priority rules, and intrinsic functions).
• Input/output.
• Logical representation (sequence, selection, and repetition).
• Modular programming (functions and subroutines).
Because we assume that you have had some prior exposure to programming, we will not spend time on the first four of these areas. At best, we offer them as a checklist that covers what you will need to know to implement the programs that follow. However, we will devote some time to the last two topics. We emphasize logical representation because it is the single area that most influences an algorithm’s coherence and understandability. We include modular programming because it also contributes greatly to a program’s organization. In addition, modules provide a means to archive useful algorithms in a convenient format for subsequent applications.
¹ VBA is the acronym for Visual Basic for Applications.
2.2 STRUCTURED PROGRAMMING
In the early days of computers, programmers usually did not pay much attention to whether their programs were clear and easy to understand. Today, it is recognized that there are many benefits to writing organized, well-structured code. Aside from the obvious benefit of making software much easier to share, it also helps generate much more efficient
  • 46. program development. That is, well-structured algorithms are invariably easier to debug and test, resulting in programs that take a shorter time to develop, test, and update.
Computer scientists have systematically studied the factors and procedures needed to develop high-quality software of this kind. In essence, structured programming is a set of rules that prescribe good style habits for the programmer. Although structured programming is flexible enough to allow considerable creativity and personal expression, its rules impose enough constraints to render the resulting codes far superior to unstructured versions. In particular, the finished product is more elegant and easier to understand.
A key idea behind structured programming is that any numerical algorithm can be composed using the three fundamental control structures: sequence, selection, and repetition. By limiting ourselves to these structures, the resulting computer code will be clearer and easier to follow.
In the following paragraphs, we will describe each of these structures. To keep this description generic, we will employ flowcharts and pseudocode. A flowchart is a visual or graphical representation of an algorithm. The flowchart employs a series of blocks and arrows, each of which represents a particular operation or step in the algorithm (Fig. 2.1). The arrows represent the sequence in which the operations are implemented.
Not everyone involved with computer programming agrees that flowcharting is a productive endeavor. In fact, some experienced programmers do not advocate flowcharts. However, we feel that there are three good reasons for studying them. First, they are still used for expressing and communicating algorithms. Second, even if they are not employed routinely, there will be times when they will prove useful in planning, unraveling, or communicating the logic of your own or someone else’s program.
Finally, and most important for our purposes, they are excellent pedagogical tools.
FIGURE 2.1 Symbols used in flowcharts. (The symbol drawings are not reproduced; each entry gives the symbol name and its function.)
Terminal: Represents the beginning or end of a program.
Flowlines: Represents the flow of logic. The humps on the horizontal arrow indicate that it passes over and does not connect with the vertical flowlines.
Process: Represents calculations or data manipulations.
Input/output: Represents inputs or outputs of data and information.
Decision: Represents a comparison, question, or decision that determines alternative paths to be followed.
Junction: Represents the confluence of flowlines.
Off-page connector: Represents a break that is continued on another page.
Count-controlled loop: Used for loops which repeat a prespecified number of iterations.
From a
  • 47. teaching perspective, they are ideal vehicles for visualizing some of the fundamental control structures employed in computer programming.
An alternative approach to express an algorithm that bridges the gap between flowcharts and computer code is called pseudocode. This technique uses code-like statements in place of the graphical symbols of the flowchart. We have adopted some style conventions for the pseudocode in this book. Keywords such as IF, DO, INPUT, etc., are capitalized, whereas the conditions, processing steps, and tasks are in lowercase. Additionally, the processing steps are indented. Thus the keywords form a “sandwich” around the steps to visually define the extent of each control structure.
One advantage of pseudocode is that it is easier to develop a program with it than with a flowchart. The pseudocode is also easier to modify and share with others. However, because of their graphic form, flowcharts sometimes are better suited for visualizing complex algorithms. In the present text, we will use flowcharts for pedagogical purposes. Pseudocode will be our principal vehicle for communicating algorithms related to numerical methods.
2.2.1 Logical Representation
Sequence. The sequence structure expresses the trivial idea that unless you direct it otherwise, the computer code is to be implemented one instruction at a time. As in Fig. 2.2, the structure can be expressed generically as a flowchart or as pseudocode.
Selection. In contrast to the step-by-step sequence structure, selection provides a means to split the program’s flow into branches based on the outcome of a logical condition. Figure 2.3 shows the two most fundamental ways for doing this.
The single-alternative decision, or IF/THEN structure (Fig. 2.3a), allows for a detour in the program flow if a logical condition is true. If it is false, nothing happens and the program moves directly to the next statement following the ENDIF.
The double-alternative decision, or IF/THEN/ELSE structure (Fig. 2.3b), behaves in the same manner for a true condition. However, if the condition is false, the program implements the code between the ELSE and the ENDIF.
FIGURE 2.2 (a) Flowchart and (b) pseudocode for the sequence structure. (The flowchart shows four instruction blocks connected top to bottom; the pseudocode is listed here.)
    Instruction1
    Instruction2
    Instruction3
    Instruction4
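As a rough illustration outside the book’s pseudocode, the sequence, IF/THEN, and IF/THEN/ELSE structures might look as follows in Python. This is only a sketch: Python is not one of the languages used in the text, and the function names clamp_nonnegative and safe_sqrt are hypothetical.

```python
import math


def clamp_nonnegative(x):
    """Single-alternative selection (IF/THEN)."""
    if x < 0:            # true block; skipped entirely when x >= 0
        x = 0.0
    return x


def safe_sqrt(a):
    """Double-alternative selection (IF/THEN/ELSE)."""
    if a < 0:
        b = math.sqrt(abs(a))   # true block
    else:
        b = math.sqrt(a)        # false block
    return b


# Sequence: statements execute one after another, top to bottom.
x = -4.0
y = clamp_nonnegative(x)
z = safe_sqrt(x)
print(y, z)
```

The indentation plays the role of the THEN ... ENDIF “sandwich” in the pseudocode; Python has no explicit ENDIF keyword.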
  • 48. Although the IF/THEN and the IF/THEN/ELSE constructs are sufficient to construct any numerical algorithm, two other variants are commonly used. Suppose that the ELSE clause of an IF/THEN/ELSE contains another IF/THEN. For such cases, the ELSE and the IF can be combined in the IF/THEN/ELSEIF structure shown in Fig. 2.4a.
Notice how in Fig. 2.4a there is a chain or “cascade” of decisions. The first one is the IF statement, and each successive decision is an ELSEIF statement. Going down the chain, the first condition encountered that tests true will cause a branch to its corresponding code block followed by an exit of the structure. At the end of the chain of conditions, if all the conditions have tested false, an optional ELSE block can be included.
The CASE structure is a variant on this type of decision making (Fig. 2.4b). Rather than testing individual conditions, the branching is based on the value of a single test expression. Depending on its value, different blocks of code will be implemented. In addition, an optional block can be implemented if the expression takes on none of the prescribed values (CASE ELSE).
Repetition. Repetition provides a means to implement instructions repeatedly. The resulting constructs, called loops, come in two “flavors” distinguished by how they are terminated.
FIGURE 2.3 Flowchart and pseudocode for simple selection constructs. (a) Single-alternative selection (IF/THEN) and (b) double-alternative selection (IF/THEN/ELSE). (Flowcharts not reproduced; the pseudocode is listed here.)
(a) Single-alternative structure (IF/THEN)
    IF condition THEN
      True block
    ENDIF
(b) Double-alternative structure (IF/THEN/ELSE)
    IF condition THEN
      True block
    ELSE
      False block
    ENDIF
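The cascade and CASE ideas described above can be sketched in Python as follows. The names grade_band and unit_to_meters and their values are hypothetical, and the dictionary lookup is only one possible stand-in for a CASE structure (an assumption of this sketch, not something taken from the text).

```python
def grade_band(score):
    """IF/THEN/ELSEIF cascade: the first true condition wins, then the chain exits."""
    if score >= 90:
        band = "A"
    elif score >= 80:
        band = "B"
    elif score >= 70:
        band = "C"
    else:                # optional ELSE block: reached only if every test is false
        band = "F"
    return band


def unit_to_meters(unit):
    """CASE-style branch on the value of a single test expression."""
    factors = {"m": 1.0, "cm": 0.01, "km": 1000.0}
    # dict.get returns None for an unlisted unit, playing the role of CASE ELSE
    return factors.get(unit)


print(grade_band(85), unit_to_meters("km"))
```

Because the cascade exits at the first true test, the order of the conditions matters: testing score >= 70 first would send every passing score to the same branch.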
  • 49. FIGURE 2.4 Flowchart and pseudocode for supplementary selection or branching constructs. (a) Multiple-alternative selection (IF/THEN/ELSEIF) and (b) CASE construct. (Flowcharts not reproduced; the pseudocode is listed here.)
(a) Multialternative structure (IF/THEN/ELSEIF)
    IF condition1 THEN
      Block1
    ELSEIF condition2
      Block2
    ELSEIF condition3
      Block3
    ELSE
      Block4
    ENDIF
(b) CASE structure (SELECT or SWITCH)
    SELECT CASE Test Expression
      CASE Value1
        Block1
      CASE Value2
        Block2
      CASE Value3
        Block3
      CASE ELSE
        Block4
    END SELECT
The first and most fundamental type is called a decision loop because it terminates based on the result of a logical condition. Figure 2.5 shows the most generic type of decision loop, the DOEXIT construct, also called a break loop. This structure repeats until a logical condition is true.
It is not necessary to have two blocks in this structure. If the first block is not included, the structure is sometimes called a pretest loop because the logical test is performed before anything occurs. Alternatively, if the second block is omitted, it is
  • 50. called a posttest loop. Because both blocks are included, the general case in Fig. 2.5 is sometimes called a midtest loop.
It should be noted that the DOEXIT loop was introduced in Fortran 90 in an effort to simplify decision loops. This control construct is a standard part of the Excel VBA macro language but is not standard in C or MATLAB, which use the so-called WHILE structure. Because we believe that the DOEXIT is superior, we have adopted it as our decision loop structure throughout this book. In order to ensure that our algorithms can be directly implemented in both MATLAB and Excel, we will show how the break loop can be simulated with the WHILE structure later in this chapter (see Sec. 2.5).
FIGURE 2.5 The DOEXIT or break loop. (Flowchart not reproduced; the pseudocode is listed here.)
    DO
      Block1
      IF condition EXIT
      Block2
    ENDDO
The break loop in Fig. 2.5 is called a logical loop because it terminates on a logical condition. In contrast, a count-controlled or DOFOR loop (Fig. 2.6) performs a specified number of repetitions, or iterations.
FIGURE 2.6 The count-controlled or DOFOR construct. (Flowchart not reproduced: i = start, test i > finish?, Block, i = i + step. The pseudocode is listed here.)
    DOFOR i = start, finish, step
      Block
    ENDDO
The count-controlled loop works as follows. The index (represented as i in Fig. 2.6) is a variable that is set at an initial value of start. The program then tests whether the
  • 51. index is less than or equal to the final value, finish. If so, it executes the body of the loop, and then cycles back to the DO statement. Every time the ENDDO statement is encountered, the index is automatically increased by the step. Thus the index acts as a counter. Then, when the index is greater than the final value (finish), the computer automatically exits the loop and transfers control to the line following the ENDDO statement. Note that for nearly all computer languages, including those of Excel and MATLAB, if the step is omitted, the computer assumes it is equal to 1.²
The numerical algorithms outlined in the following pages will be developed exclusively from the structures outlined in Figs. 2.2 through 2.6. The following example illustrates the basic approach by developing an algorithm to determine the roots of a quadratic equation with the quadratic formula.
EXAMPLE 2.1 Algorithm for Roots of a Quadratic
Problem Statement. The roots of a quadratic equation
$ax^2 + bx + c = 0$
can be determined with the quadratic formula,
$x_1, x_2 = \frac{-b \pm \sqrt{|b^2 - 4ac|}}{2a}$ (E2.1.1)
Develop an algorithm that does the following:
Step 1: Prompts the user for the coefficients, a, b, and c.
Step 2: Implements the quadratic formula, guarding against all eventualities (for example, avoiding division by zero and allowing for complex roots).
Step 3: Displays the solution, that is, the values for x.
Step 4: Allows the user the option to return to step 1 and repeat the process.
Solution. We will use a top-down approach to develop our algorithm. That is, we will successively refine the algorithm rather than trying to work out all the details the first time around.
To do this, let us assume for the present that the quadratic formula is foolproof regardless of the values of the coefficients (obviously not true, but good enough for now).
A structured algorithm to implement the scheme is
    DO
      INPUT a, b, c
      r1 = (-b + SQRT(b^2 - 4ac))/(2a)
      r2 = (-b - SQRT(b^2 - 4ac))/(2a)
      DISPLAY r1, r2
      DISPLAY 'Try again? Answer yes or no'
      INPUT response
      IF response = 'no' EXIT
    ENDDO
² A negative step can be used. In such cases, the loop terminates when the index is less than the final value.
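The algorithm above relies on the DOEXIT (break) loop, and footnote 2 concerns the count-controlled loop. A minimal Python sketch of both follows, assuming integer loop indices; the function names halve_until_small and count_controlled are hypothetical, and Python is used here only for illustration.

```python
def halve_until_small(x, tol):
    """Midtest (DOEXIT) loop: Block1, exit test, Block2."""
    steps = 0
    while True:
        x = x / 2.0        # Block1
        if x < tol:        # IF condition EXIT
            break
        steps += 1         # Block2 runs only when the loop continues
    return x, steps


def count_controlled(start, finish, step=1):
    """DOFOR i = start, finish, step, returning the indices visited.

    The pseudocode treats finish as inclusive, whereas Python's range()
    excludes its stop value, so the stop is shifted by one in the
    direction of travel. Integer arguments are assumed.
    """
    stop = finish + 1 if step > 0 else finish - 1
    return list(range(start, stop, step))


print(halve_until_small(1.0, 0.2))
print(count_controlled(1, 10, 2))
print(count_controlled(10, 1, -3))
```

Dropping Block1 from the first function would give a pretest loop, and dropping Block2 a posttest loop, matching the terminology used for Fig. 2.5.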
  • 52. A DOEXIT construct is used to implement the quadratic formula repeatedly as long as the condition is false. The condition depends on the value of the character variable response. If response is equal to ‘yes’, the calculation is implemented. If not, that is, response = ‘no’, the loop terminates. Thus, the user controls termination by inputting a value for response.
Now although the above algorithm works for certain cases, it is not foolproof. Depending on the values of the coefficients, the algorithm might not work. Here is what can happen:
• If a = 0, an immediate problem arises because of division by zero. In fact, close inspection of Eq. (E2.1.1) indicates that two different cases can arise. That is,
    If b ≠ 0, the equation reduces to a linear equation with one real root, -c/b.
    If b = 0, then no solution exists. That is, the problem is trivial.
• If a ≠ 0, two possible cases occur depending on the value of the discriminant, d = b^2 - 4ac. That is,
    If d ≥ 0, two real roots occur.
    If d < 0, two complex roots occur.
Notice how we have used indentation to highlight the decisional structure that underlies the mathematics. This structure then readily translates to a set of coupled IF/THEN/ELSE structures that can be inserted in place of the quadratic-formula statements (shaded in the original) in the previous code to give the final algorithm:
    DO
      INPUT a, b, c
      r1 = 0: r2 = 0: i1 = 0: i2 = 0
      IF a = 0 THEN
        IF b ≠ 0 THEN
          r1 = -c/b
        ELSE
          DISPLAY "Trivial solution"
        ENDIF
      ELSE
        discr = b^2 - 4 * a * c
        IF discr ≥ 0 THEN
          r1 = (-b + Sqrt(discr))/(2 * a)
          r2 = (-b - Sqrt(discr))/(2 * a)
        ELSE
          r1 = -b/(2 * a)
          r2 = r1
          i1 = Sqrt(Abs(discr))/(2 * a)
          i2 = -i1
        ENDIF
      ENDIF
      DISPLAY r1, r2, i1, i2
      DISPLAY 'Try again? Answer yes or no'
      INPUT response
      IF response = 'no' EXIT
    ENDDO
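The branching in the final algorithm can be exercised directly in code. The sketch below is one possible Python rendering, not the book’s implementation: the function name quadratic_roots is hypothetical, the interactive INPUT/“Try again?” loop is omitted, and returning None for the a = 0, b = 0 case (where the pseudocode displays “Trivial solution”) is an assumption.

```python
import math


def quadratic_roots(a, b, c):
    """Return (r1, r2, i1, i2): real parts r1, r2 and imaginary parts i1, i2.

    Returns None when a == 0 and b == 0 (the 'Trivial solution' branch).
    """
    r1 = r2 = i1 = i2 = 0.0
    if a == 0:
        if b != 0:
            r1 = -c / b                      # linear equation: one real root
        else:
            return None                      # no solution exists
    else:
        discr = b**2 - 4 * a * c
        if discr >= 0:                       # two real roots
            r1 = (-b + math.sqrt(discr)) / (2 * a)
            r2 = (-b - math.sqrt(discr)) / (2 * a)
        else:                                # complex conjugate pair
            r1 = -b / (2 * a)
            r2 = r1
            i1 = math.sqrt(abs(discr)) / (2 * a)
            i2 = -i1
    return r1, r2, i1, i2


print(quadratic_roots(1, -3, 2))   # real roots
print(quadratic_roots(1, 2, 5))    # complex roots
print(quadratic_roots(0, 2, -4))   # linear case
```

As in the pseudocode, the linear case leaves r2 at its initial value of zero, so a caller has to know that only r1 is meaningful there.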
  • 53. The approach in the foregoing example can be employed to develop an algorithm for the parachutist problem. Recall that, given an initial condition for time and velocity, the problem involved iteratively solving the formula
$v_{i+1} = v_i + \frac{dv_i}{dt} \Delta t$ (2.1)
Now also remember that if we desired to attain good accuracy, we would need to employ small steps. Therefore, we would probably want to apply the formula repeatedly from the initial time to the final time. Consequently, an algorithm to solve the problem would be based on a loop.
For example, suppose that we started the computation at t = 0 and wanted to predict the velocity at t = 4 s using a time step of Δt = 0.5 s. We would, therefore, need to apply Eq. (2.1) eight times, that is,
$n = \frac{4}{0.5} = 8$
where n = the number of iterations of the loop. Because this result is exact, that is, the ratio is an integer, we can use a count-controlled loop as the basis for the algorithm. Here is an example of the pseudocode:
    g = 9.81
    INPUT cd, m
    INPUT ti, vi, tf, dt
    t = ti
    v = vi
    n = (tf - ti) / dt
    DOFOR i = 1 TO n
      dvdt = g - (cd / m) * v
      v = v + dvdt * dt
      t = t + dt
    ENDDO
    DISPLAY v
Although this scheme is simple to program, it is not foolproof. In particular, it will work only if the computation interval is evenly divisible by the time step.³
³ This problem is compounded by the fact that computers use base-2 number representation for their internal math. Consequently, some apparently evenly divisible numbers do not yield integers when the division is implemented on a computer. We will cover this in Chap. 3.
In order to cover such cases, a decision loop can be substituted in place of the count-controlled loop (the shaded area in the original) in the previous pseudocode. The final result is
    g = 9.81
    INPUT cd, m
    INPUT ti, vi, tf, dt
    t = ti
    v = vi
  • 54.
    h = dt
    DO
      IF t + dt > tf THEN
        h = tf - t
      ENDIF
      dvdt = g - (cd / m) * v
      v = v + dvdt * h
      t = t + h
      IF t ≥ tf EXIT
    ENDDO
    DISPLAY v
As soon as we enter the loop, we use an IF/THEN structure to test whether adding t + dt will take us beyond the end of the interval. If it does not, which would usually be the case at first, we do nothing. If it does, we would need to shorten the interval by setting the variable step h to tf - t. By doing this, we guarantee that the next step falls exactly on tf. After we implement this final step, the loop will terminate because the condition t ≥ tf will test true.
Notice that before entering the loop, we assign the value of the time step, dt, to another variable, h. We create this dummy variable so that our routine does not change the given value of dt if and when we shorten the time step. We do this in anticipation that we might need to use the original value of dt somewhere else in the event that this code is integrated within a larger program.
It should be noted that the algorithm is still not foolproof. For example, the user could have mistakenly entered a step size greater than the calculation interval, for example, tf - ti = 5 and dt = 20. Thus, you might want to include error traps in your code to catch such errors and to then allow the user to correct the mistake.
2.3 MODULAR PROGRAMMING
Imagine how difficult it would be to study a textbook that had no chapters, sections, or paragraphs. Breaking complicated tasks or subjects into more manageable parts is one way to make them easier to handle. In the same spirit, computer programs can be divided into small subprograms, or modules, that can be developed and tested separately. This approach is called modular programming.
The most important attribute of modules is that they be as independent and self-contained as possible.
In addition, they are typically designed to perform a specific, well-defined function and have one entry and one exit point. As such, they are usually short (generally 50 to 100 instructions in length) and highly focused.
In standard high-level languages such as Fortran 90 or C, the primary programming element used to represent each module is the procedure. A procedure is a series of computer instructions that together perform a given task. Two types of procedures are commonly employed: functions and subroutines. The former usually returns a single result, whereas the latter returns several.
In addition, it should be mentioned that much of the programming related to software packages like Excel and MATLAB involves the development of subprograms. Hence,
  • 55. Excel macros and MATLAB functions are designed to receive some information, perform a calculation, and return results. Thus, modular thinking is also consistent with how programming is implemented in package environments.
Modular programming has a number of advantages. The use of small, self-contained units makes the underlying logic easier to devise and to understand for both the developer and the user. Development is facilitated because each module can be perfected in isolation. In fact, for large projects, different programmers can work on individual parts. Modular design also increases the ease with which a program can be debugged and tested because errors can be more easily isolated. Finally, program maintenance and modification are facilitated. This is primarily due to the fact that new modules can be developed to perform additional tasks and then easily incorporated into the already coherent and organized scheme.
While all these attributes are reason enough to use modules, the most important reason related to numerical engineering problem solving is that they allow you to maintain your own library of useful modules for later use in other programs. This will be the philosophy of this book: All the algorithms will be presented as modules.
This approach is illustrated in Fig. 2.7, which shows a function developed to implement Euler’s method. Notice that this function application and the previous versions differ in how they handle input/output. In the former versions, input and output directly come from (via INPUT statements) and to (via DISPLAY statements) the user. In the function, the inputs are passed into the FUNCTION via its argument list
    Function Euler(dt, ti, tf, yi)
and the output is returned via the assignment statement
    y = Euler(dt, ti, tf, yi)
In addition, recognize how generic the routine has become. There are no references to the specifics of the parachutist problem. For example, rather than calling the dependent
FIGURE 2.7 Pseudocode for a function that solves a differential equation using Euler’s method.
    FUNCTION Euler(dt, ti, tf, yi)
      t = ti
      y = yi
      h = dt
      DO
        IF t + dt > tf THEN
          h = tf - t
        ENDIF
        dydt = dy(t, y)
        y = y + dydt * h
        t = t + h
        IF t ≥ tf EXIT
      ENDDO
      Euler = y
    END Euler
  • 56. variable v for velocity, the more generic label, y, is used within the function. Further, notice that the derivative is not computed within the function by an explicit equation. Rather, another function, dy, must be invoked to compute it. This acknowledges the fact that we might want to use this function for many different problems beyond solving for the parachutist’s velocity.
2.4 EXCEL
Excel is the spreadsheet produced by Microsoft, Inc. Spreadsheets are a special type of mathematical software that allow the user to enter and perform calculations on rows and columns of data. As such, they are a computerized version of a large accounting worksheet on which large interconnected calculations can be implemented and displayed. Because the entire calculation is updated when any value on the sheet is changed, spreadsheets are ideal for “what if?” sorts of analysis.
Excel has some built-in numerical capabilities including equation solving, curve fitting, and optimization. It also includes VBA as a macro language that can be used to implement numerical calculations. Finally, it has several visualization tools, such as graphs and three-dimensional surface plots, that serve as valuable adjuncts for numerical analysis. In the present section, we will show how these capabilities can be used to solve the parachutist problem.
To do this, let us first set up a simple spreadsheet. As shown below, the first step involves entering labels and numbers into the spreadsheet cells.
Before we write a macro program to calculate the numerical value, we can make our subsequent work easier by attaching names to the parameter values. To do this, select cells A3:B5 (the easiest way to do this is by moving the mouse to A3, holding down the left mouse button and dragging down to B5). Next, go to the Formulas tab and in the Defined Names group, click Create from Selection.
This will open the Create Names from Selection dialog box, where the Left column box should be automatically selected. Then click OK to create the names. To verify that this has worked properly, select cell B3 and check that the label “m” appears in the name box (located on the left side of the sheet just below the menu bars).
  • 57. Move to cell C8 and enter the analytical solution (Eq. 1.9),
    =9.81*m/cd*(1-exp(-cd/m*A8))
When this formula is entered, the value 0 should appear in cell C8. Then copy the formula down to cell C9 to give a value of 16.405 m/s.
All the above is typical of the standard use of Excel. For example, at this point you could change parameter values and see how the analytical solution changes.
Now, we will illustrate how VBA macros can be used to extend the standard capabilities. Figure 2.8 lists pseudocode alongside Excel VBA code for all the control structures described in Sec. 2.2 (Figs. 2.3 through 2.6). Notice how, although the details differ, the structure of the pseudocode and the VBA code are identical.
We can now use some of the constructs from Fig. 2.8 to write a macro function to numerically compute velocity. Open VBA by selecting⁴
    Tools → Macro → Visual Basic Editor
Once inside the Visual Basic Editor (VBE), select
    Insert → Module
and a new code window will open up. The following VBA function can be developed directly from the pseudocode in Fig. 2.7. Type it into the code window.
    Option Explicit

    Function Euler(dt, ti, tf, yi, m, cd)
      Dim h As Double, t As Double, y As Double, dydt As Double
      t = ti
      y = yi
      h = dt
      Do
        If t + dt > tf Then
          h = tf - t
        End If
        dydt = dy(t, y, m, cd)
        y = y + dydt * h
        t = t + h
        If t >= tf Then Exit Do
      Loop
      Euler = y
    End Function
Compare this macro with the pseudocode from Fig. 2.7 and recognize how similar they are. Also, see how we have expanded the function’s argument list to include the necessary parameters for the parachutist velocity model. The resulting velocity, y, is then passed back to the spreadsheet via the function name.
⁴ The hot key combination Alt-F11 is even quicker!
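For readers working outside Excel, the same modular idea can be sketched in Python. This is an illustrative translation of the VBA function above, not part of the book: the names euler and parachutist_dvdt are hypothetical, the derivative is passed in as a function argument rather than looked up by name, and the step size dt = 0.1 used in the usage lines is an assumed value.

```python
def euler(dydt, dt, ti, tf, yi):
    """Integrate dy/dt = dydt(t, y) from ti to tf with Euler's method.

    As in the pseudocode, the final step is shortened so that the
    integration ends on tf even when (tf - ti) is not a multiple of dt.
    """
    t, y, h = ti, yi, dt
    while True:
        if t + dt > tf:          # next full step would overshoot the interval
            h = tf - t
        y = y + dydt(t, y) * h
        t = t + h
        if t >= tf:
            break
    return y


def parachutist_dvdt(m, cd, g=9.81):
    """Build the derivative function dv/dt = g - (cd/m) v for given m and cd."""
    def dvdt(t, v):
        return g - (cd / m) * v
    return dvdt


# Usage with the parameter values from the text (m = 68.1 kg, cd = 12.5 kg/s)
v = euler(parachutist_dvdt(68.1, 12.5), dt=0.1, ti=0.0, tf=2.0, yi=0.0)
print(v)
```

Passing the derivative as an argument keeps euler free of any reference to the parachutist model, which is the same generality the text attributes to the pseudocode in Fig. 2.7.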
  • 58. FIGURE 2.8 The fundamental control structures in (a) pseudocode and (b) Excel VBA.
IF/THEN
(a) Pseudocode:
    IF condition THEN
      True block
    ENDIF
(b) Excel VBA:
    If b <> 0 Then
      r1 = -c / b
    End If
IF/THEN/ELSE
(a) Pseudocode:
    IF condition THEN
      True block
    ELSE
      False block
    ENDIF
(b) Excel VBA:
    If a < 0 Then
      b = Sqr(Abs(a))
    Else
      b = Sqr(a)
    End If
IF/THEN/ELSEIF
(a) Pseudocode:
    IF condition1 THEN
      Block1
    ELSEIF condition2
      Block2
    ELSEIF condition3
      Block3
    ELSE
      Block4
    ENDIF
(b) Excel VBA:
    If class = 1 Then
      x = x + 8
    ElseIf class < 1 Then
      x = x - 8
    ElseIf class < 10 Then
      x = x - 32
    Else
      x = x - 64
    End If
CASE
(a) Pseudocode:
    SELECT CASE Test Expression
      CASE Value1
        Block1
      CASE Value2
        Block2
      CASE Value3
        Block3
      CASE ELSE
        Block4
    END SELECT
(b) Excel VBA:
    Select Case a + b
      Case Is < -50
        x = -5
      Case Is < 0
        x = -5 - (a + b) / 10
      Case Is < 50
        x = (a + b) / 10
      Case Else
        x = 5
    End Select
DOEXIT
(a) Pseudocode:
    DO
      Block1
      IF condition EXIT
      Block2
    ENDDO
(b) Excel VBA:
    Do
      i = i + 1
      If i >= 10 Then Exit Do
      j = i * x
    Loop
COUNT-CONTROLLED LOOP
(a) Pseudocode:
    DOFOR i = start, finish, step
      Block
    ENDDO
(b) Excel VBA:
    For i = 1 To 10 Step 2
      x = x + i
    Next i
  • 59. Also notice how we have included another function to compute the derivative. This can be entered in the same module by typing it directly below the Euler function,
    Function dy(t, v, m, cd)
      Const g As Double = 9.81
      dy = g - (cd / m) * v
    End Function
The final step is to return to the spreadsheet and invoke the function by entering the following formula in cell B9
    =Euler(dt,A8,A9,B8,m,cd)
The result of the numerical integration, 16.531, will appear in cell B9.
You should appreciate what has happened here. When you enter the function into the spreadsheet cell, the parameters are passed into the VBA program where the calculation is performed and the result is then passed back and displayed in the cell. In effect, the VBA macro language allows you to use Excel as your input/output mechanism. All sorts of benefits arise from this fact.
For example, now that you have set up the calculation, you can play with it. Suppose that the jumper was much heavier, say, m = 100 kg (about 220 lb). Enter 100 into cell B3 and the spreadsheet will update immediately to show a value of 17.438 in cell B9. Change the mass back to 68.1 kg and the previous result, 16.531, automatically reappears in cell B9.
Now let us take the process one step further by filling in some additional numbers for the time. Enter the numbers 4, 6, ..., 16 in cells A10 through A16. Then copy the formulas from cells B9:C9 down to rows 10 through 16. Notice how the VBA program calculates the numerical result correctly for each new row. (To verify this, change dt to 2 and compare with the results previously computed by hand in Example 1.2.) An additional embellishment would be to develop an x-y plot of the results using the Excel Chart Wizard.
The final spreadsheet is shown below. We now have created a pretty nice problem-solving tool. You can perform sensitivity analyses by changing the values for each of
  • 60. the parameters. As each new value is entered, the computation and the graph would be automatically updated. It is this interactive nature that makes Excel so powerful. However, recognize that the ability to solve this problem hinges on being able to write the macro with VBA.
It is the combination of the Excel environment with the VBA programming language that truly opens up a world of possibilities for engineering problem solving. In the coming chapters, we will illustrate how this is accomplished.
2.5 MATLAB
MATLAB is the flagship software product of The MathWorks, Inc., which was cofounded by the numerical analysts Cleve Moler and John N. Little. As the name implies, MATLAB was originally developed as a matrix laboratory. To this day, the major element of MATLAB is still the matrix. Mathematical manipulations of matrices are very conveniently implemented in an easy-to-use, interactive environment. To these matrix manipulations, MATLAB has added a variety of numerical functions, symbolic computations, and visualization tools. As a consequence, the present version represents a fairly comprehensive technical computing environment.
MATLAB has a variety of functions and operators that allow convenient implementation of many of the numerical methods developed in this book. These will be described in detail in the individual chapters that follow. In addition, programs can be written as so-called M-files that can be used to implement numerical calculations. Let us explore how this is done.
First, you should recognize that normal MATLAB use is closely related to programming. For example, suppose that we wanted to determine the analytical solution to the parachutist problem.
This could be done with the following series of MATLAB commands
    g=9.81; m=68.1; cd=12.5; tf=2;
    v=g*m/cd*(1-exp(-cd/m*tf))
with the result being displayed as
    v =
       16.4217
Thus, the sequence of commands is just like the sequence of instructions in a typical programming language.
Now what if you want to deviate from the sequential structure? Although there are some neat ways to inject some nonsequential capabilities in the standard command mode, the inclusion of decisions and loops is best done by creating a MATLAB document called an M-file. To do this, make the menu selection File, New, Script
  • 61. and a new window will open with a heading "MATLAB Editor/Debugger." In this window, you can type and edit MATLAB programs. Type the following code there:
    g=9.81; m=68.1; cd=12.5; tf=2;
    v=g*m/cd*(1-exp(-cd/m*tf))
Notice how the commands are written exactly the way they would be written in the front end of MATLAB. Save the program with the name: analpara. MATLAB will automatically attach the extension .m to denote it as an M-file: analpara.m.
To run the program, you must go back to the command mode. The most direct way to do this is to click on the "MATLAB Command Window" button on the task bar (which is usually at the bottom of the screen).
The program can now be run by typing the name of the M-file, analpara, which should look like
    analpara
If you have done everything correctly, MATLAB should respond with the correct answer:
    v =
       16.4217
Now one problem with the foregoing is that it is set up to compute one case only. You can make it more flexible by having the user input some of the variables. For example, suppose that you wanted to assess the impact of mass on the velocity at 2 s. The M-file could be rewritten as follows to accomplish this:
    g=9.81; m=input('mass (kg): '); cd=12.5; tf=2;
    v=g*m/cd*(1-exp(-cd/m*tf))
Save this as analpara2.m. If you typed analpara2 while in command mode, the prompt would show
    mass (kg):
The user could then enter a value like 100, and the result would be displayed as
    v =
       17.3597
Now it should be pretty clear how we can program a numerical solution with an M-file. In order to do this, we must first understand how MATLAB handles logical and looping structures. Figure 2.9 lists pseudocode alongside MATLAB code for all the
  • 62. IF/THEN
  (a) Pseudocode
    IF condition THEN
      True block
    ENDIF
  (b) MATLAB
    if b ~= 0
      r1 = -c / b;
    end
IF/THEN/ELSE
  (a) Pseudocode
    IF condition THEN
      True block
    ELSE
      False block
    ENDIF
  (b) MATLAB
    if a < 0
      b = sqrt(abs(a));
    else
      b = sqrt(a);
    end
IF/THEN/ELSEIF
  (a) Pseudocode
    IF condition1 THEN
      Block1
    ELSEIF condition2
      Block2
    ELSEIF condition3
      Block3
    ELSE
      Block4
    ENDIF
  (b) MATLAB
    if class == 1
      x = x + 8;
    elseif class < 1
      x = x - 8;
    elseif class < 10
      x = x - 32;
    else
      x = x - 64;
    end
CASE
  (a) Pseudocode
    SELECT CASE Test Expression
      CASE Value1
        Block1
      CASE Value2
        Block2
      CASE Value3
        Block3
      CASE ELSE
        Block4
    END SELECT
  (b) MATLAB
    switch a + b
      case 1
        x = -25;
      case 2
        x = -5 - (a + b) / 10;
      case 3
        x = (a + b) / 10;
      otherwise
        x = 5;
    end
DOEXIT
  (a) Pseudocode
    DO
      Block1
      IF condition EXIT
      Block2
    ENDDO
  (b) MATLAB
    while (1)
      i = i + 1;
      if i >= 10, break, end
      j = i*x;
    end
COUNT-CONTROLLED LOOP
  (a) Pseudocode
    DOFOR i = start, finish, step
      Block
    ENDDO
  (b) MATLAB
    for i = 1:2:10
      x = x + i;
    end
FIGURE 2.9 The fundamental control structures in (a) pseudocode and (b) the MATLAB programming language.
  • 63. control structures from Sec. 2.2. Although the structures of the pseudocode and the MATLAB code are very similar, there are some slight differences that should be noted.
In particular, look at how we have represented the DOEXIT structure. In place of the DO, we use the statement while (1). Because MATLAB interprets the number 1 as corresponding to "true," this statement will repeat infinitely in the same manner as the DO statement. The loop is terminated with a break command. This command transfers control to the statement following the end statement that terminates the loop.
Also notice that the parameters of the count-controlled loop are ordered differently. For the pseudocode, the loop parameters are specified as start, finish, step. For MATLAB, the parameters are ordered as start:step:finish.
The following MATLAB M-file can now be developed directly from the pseudocode in Fig. 2.7. Type it into the MATLAB Editor/Debugger:
    g=9.81; m=input('mass (kg): '); cd=12.5;
    ti=0; tf=2; vi=0; dt=0.1;
    t = ti; v = vi;
    h = dt;
    while (1)
      if t + dt > tf
        h = tf - t;
      end
      dvdt = g - (cd / m) * v;
      v = v + dvdt * h;
      t = t + h;
      if t >= tf, break, end
    end
    disp('velocity (m/s):')
    disp(v)
Save this file as numpara.m and return to the command mode and run it by entering: numpara. The following output should result:
    mass (kg): 100
    velocity (m/s):
       17.4559
As a final step in this development, let us take the above M-file and convert it into a proper function. This can be done in the following M-file based on the pseudocode from Fig. 2.7
    function yy = euler(dt,ti,tf,yi,m,cd)
    t = ti;
    y = yi;
    h = dt;
  • 64.     while (1)
      if t + dt > tf
        h = tf - t;
      end
      dydt = dy(t, y, m, cd);
      y = y + dydt * h;
      t = t + h;
      if t >= tf, break, end
    end
    yy = y;
Save this file as euler.m and then create another M-file to compute the derivative,
    function dydt = dy(t, v, m, cd)
    g = 9.81;
    dydt = g - (cd / m) * v;
Save this file as dy.m and return to the command mode. In order to invoke the function and see the result, you can type in the following commands
    m=68.1; cd=12.5; ti=0; tf=2.; vi=0; dt=0.1;
    euler(dt,ti,tf,vi,m,cd)
When the last command is entered, the answer will be displayed as
    ans =
       16.5478
It is the combination of the MATLAB environment with the M-file programming language that truly opens up a world of possibilities for engineering problem solving. In the coming chapters we will illustrate how this is accomplished.
2.6 MATHCAD
Mathcad attempts to bridge the gap between spreadsheets like Excel and notepads. It was originally developed by Allen Razdow of MIT, who cofounded Mathsoft, Inc., which published the first commercial version in 1986. Today, Mathsoft is part of Parametric Technology Corporation (PTC) and Mathcad is in version 15.
Mathcad is essentially an interactive notepad that allows engineers and scientists to perform a number of common mathematical, data-handling, and graphical tasks. Information and equations are input to a "whiteboard" design environment that is similar in spirit to a page of paper. Unlike a programming tool or spreadsheet, Mathcad's interface accepts and displays natural mathematical notation using keystrokes or menu palette clicks, with no programming required. Because the worksheets contain live calculations, a single keystroke that changes an input or equation instantly returns an updated result.
  • 65. Mathcad can perform tasks in either numeric or symbolic mode. In numeric mode, Mathcad functions and operators give numerical responses, whereas in symbolic mode results are given as general expressions or equations. Maple V, a comprehensive symbolic math package, is the basis of the symbolic mode and was incorporated into Mathcad in 1993.
Mathcad has a variety of functions and operators that allow convenient implementation of many of the numerical methods developed in this book. These will be described in detail in succeeding chapters. In the event that you are unfamiliar with Mathcad, Appendix C also provides a primer on using this powerful software.
2.7 OTHER LANGUAGES AND LIBRARIES
In Secs. 2.4 and 2.5, we showed how Excel and MATLAB function procedures for Euler's method could be developed from an algorithm expressed as pseudocode. You should recognize that similar functions can be written in high-level languages like Fortran 90 and C++. For example, a Fortran 90 function for Euler's method is
    Function Euler(dt, ti, tf, yi, m, cd)
    REAL dt, ti, tf, yi, m, cd
    Real h, t, y, dydt
    t = ti
    y = yi
    h = dt
    Do
      If (t + dt > tf) Then
        h = tf - t
      End If
      dydt = dy(t, y, m, cd)
      y = y + dydt * h
      t = t + h
      If (t >= tf) Exit
    End Do
    Euler = y
    End Function
For C, the result would look quite similar to the MATLAB function. The point is that once a well-structured algorithm is developed in pseudocode form, it can be readily implemented in a variety of programming environments.
In this book, our approach will be to provide you with well-structured procedures written as pseudocode. This collection of algorithms then constitutes a numerical library that can be accessed to perform specific numerical tasks in a range of software tools and programming languages.
Beyond your own programs, you should be aware that commercial programming libraries contain many useful numerical procedures.
For example, the Numerical Recipes library includes a large range of algorithms written in Fortran and C. [5] These procedures are described in both book (for example, Press et al. 2007) and electronic form.
[5] Numerical Recipes procedures are also available in book and electronic format for Pascal, MS BASIC, and MATLAB. Information on all the Numerical Recipes products can be found at http://www.nr.com/.
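To illustrate the point of Sec. 2.7 that a pseudocode algorithm ports readily to other environments, here is a minimal Python sketch of the same Euler function. Python is not one of the languages used in the text, and the helper `analytical` is an illustrative addition; `euler` and `dy` simply mirror the MATLAB M-files of Sec. 2.5.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2), the constant used in dy of Sec. 2.5


def dy(t, v, m, cd):
    """Derivative dv/dt for the falling parachutist."""
    return G - (cd / m) * v


def euler(dt, ti, tf, yi, m, cd):
    """Integrate dy from ti to tf with Euler's method, shortening the last step."""
    t, y, h = ti, yi, dt
    while True:
        if t + dt > tf:  # a full step would overshoot tf, so shrink it
            h = tf - t
        y += dy(t, y, m, cd) * h
        t += h
        if t >= tf:
            break
    return y


def analytical(t, m, cd):
    """Closed-form velocity, the expression evaluated by analpara.m (illustrative helper)."""
    return G * m / cd * (1.0 - math.exp(-cd / m * t))


if __name__ == "__main__":
    print(round(euler(0.1, 0.0, 2.0, 0.0, 68.1, 12.5), 4))
    print(round(analytical(2.0, 68.1, 12.5), 4))
```

Run as a script, the two printed values should be close to the 16.5478 and 16.4217 reported for the MATLAB versions in Sec. 2.5. The `while True` loop with a `break` plays the same role as `while (1)` in MATLAB and the bare `Do` in Fortran.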
  • 66. PROBLEMS
2.1 Write pseudocode to implement the flowchart depicted in Fig. P2.1. Make sure that proper indentation is included to make the structure clear.
[FIGURE P2.1: flowchart, not reproduced in this transcript. Legible box labels: x ≤ 500, x = 75, x = 0, x = x − 50.]
2.2 Rewrite the following pseudocode using proper indentation
    DO
    j = j + 1
    x = x + 5
    IF x > 5 THEN
    y = x
    ELSE
    y = 0
    ENDIF
    z = x + y
    IF z > 50 EXIT
    ENDDO
2.3 Develop, debug, and document a program to determine the roots of a quadratic equation, $ax^2 + bx + c$, in either a high-level language or a macro language of your choice. Use a subroutine procedure to compute the roots (either real or complex). Perform test runs for the cases (a) a = 1, b = 6, c = 2; (b) a = 0, b = −4, c = 1.6; (c) a = 3, b = 2.5, c = 7.
2.4 The sine function can be evaluated by the following infinite series:
$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots$$
Write an algorithm to implement this formula so that it computes and prints out the values of sin x as each term in the series is added. In other words, compute and print in sequence the values for
$$\sin x = x$$
$$\sin x = x - \frac{x^3}{3!}$$
$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!}$$
up to the order term n of your choosing. For each of the preceding, compute and display the percent relative error as
$$\%\ \text{error} = \frac{\text{true} - \text{series approximation}}{\text{true}} \times 100\%$$
Write the algorithm as (a) a structured flowchart and (b) pseudocode.
2.5 Develop, debug, and document a program for Prob. 2.4 in either a high-level language or a macro language of your choice. Employ the library function for the sine in your computer to determine the true value. Have the program print out the series approximation and the error at each step. As a test case, employ the program to compute sin(1.5) for up to and including the term $x^{15}/15!$. Interpret your results.
2.6 The following algorithm is designed to determine a grade for a course that consists of quizzes, homework, and a final exam:
Step 1: Input course number and name.
Step 2: Input weighting factors for quizzes (WQ), homework (WH), and the final exam (WF).
Step 3: Input quiz grades and determine an average quiz grade (AQ).
Step 4: Input homework grades and determine an average homework grade (AH).
Step 5: If this course has a final grade, continue to step 6. If not, go to step 9.
Step 6: Input final exam grade (FE).
Step 7: Determine average grade AG according to
$$AG = \frac{WQ \times AQ + WH \times AH + WF \times FE}{WQ + WH + WF} \times 100\%$$
Step 8: Go to step 10.
Step 9: Determine average grade AG according to
$$AG = \frac{WQ \times AQ + WH \times AH}{WQ + WH} \times 100\%$$
  • 67. Step 10: Print out course number, name, and average grade.
Step 11: Terminate computation.
(a) Write well-structured pseudocode to implement this algorithm.
(b) Write, debug, and document a structured computer program based on this algorithm. Test it using the following data to calculate a grade without the final exam and a grade with the final exam: WQ = 30; WH = 40; WF = 30; quizzes = 98, 95, 90, 60, 99; homework = 98, 95, 86, 100, 100, 77; and final exam = 91.
2.7 The "divide and average" method, an old-time method for approximating the square root of any positive number a, can be formulated as
$$x = \frac{x + a/x}{2}$$
(a) Write well-structured pseudocode to implement this algorithm as depicted in Fig. P2.7. Use proper indentation so that the structure is clear.
(b) Develop, debug, and document a program to implement this equation in either a high-level language or a macro language of your choice. Structure your code according to Fig. P2.7.
[FIGURE P2.7: flowchart, not reproduced in this transcript. Legible box labels: x = a/2, y = (x + a/x)/2, e = |(y − x)/y|, x = y, SquareRoot = x, SquareRoot = 0, plus tests on a and on e against a tolerance tol.]
2.8 An amount of money P is invested in an account where interest is compounded at the end of the period. The future worth F yielded at an interest rate i after n periods may be determined from the following formula:
$$F = P(1 + i)^n$$
Write a program that will calculate the future worth of an investment for each year from 1 through n. The input to the function should include the initial investment P, the interest rate i (as a decimal), and the number of years n for which the future worth is to be calculated. The output should consist of a table with headings and columns for n and F. Run the program for P = $100,000, i = 0.04, and n = 11 years.
2.9 Economic formulas are available to compute annual payments for loans. Suppose that you borrow an amount of money P and agree to repay it in n annual payments at an interest rate of i. The formula to compute the annual payment A is
$$A = P\,\frac{i(1 + i)^n}{(1 + i)^n - 1}$$
Write a program to compute A. Test it with P = $55,000 and an interest rate of 6.6% (i = 0.066). Compute results for n = 1, 2, 3, 4, and 5 and display the results as a table with headings and columns for n and A.
2.10 The average daily temperature for an area can be approximated by the following function,
$$T = T_{mean} + (T_{peak} - T_{mean})\cos(\omega(t - t_{peak}))$$
where $T_{mean}$ = the average annual temperature, $T_{peak}$ = the peak temperature, $\omega$ = the frequency of the annual variation (= 2π/365), and $t_{peak}$ = day of the peak temperature (≅ 205 d). Develop a program that computes the average temperature between two days of the year for a particular city. Test it for (a) January–February (t = 0 to 59) in Miami, Florida ($T_{mean}$ = 22.1°C; $T_{peak}$ = 28.3°C), and (b) July–August (t = 180 to 242) in Boston, Massachusetts ($T_{mean}$ = 10.7°C; $T_{peak}$ = 22.9°C).
2.11 Develop, debug, and test a program in either a high-level language or a macro language of your choice to compute the velocity of the falling parachutist as outlined in Example 1.2. Design the program so that it allows the user to input values for the drag coefficient and mass. Test the program by duplicating the results from Example 1.2. Repeat the computation but employ step sizes of 1 and 0.5 s. Compare your results with the analytical solution obtained previously in Example 1.1. Does a smaller step size make the results better or worse? Explain your results.
2.12 The bubble sort is an inefficient, but easy-to-program, sorting technique. The idea behind the sort is to move down through an array comparing adjacent pairs and swapping the
  • 68. values if they are out of order. For this method to sort the array completely, it may need to pass through it many times. As the passes proceed for an ascending-order sort, the smaller elements in the array appear to rise toward the top like bubbles. Eventually, there will be a pass through the array where no swaps are required. Then, the array is sorted. After the first pass, the largest value in the array drops directly to the bottom. Consequently, the second pass only has to proceed to the second-to-last value, and so on. Develop a program to set up an array of 20 random numbers and sort them in ascending order with the bubble sort (Fig. P2.12).
[FIGURE P2.12: flowchart of the bubble sort, not reproduced in this transcript.]
2.13 Figure P2.13 shows a cylindrical tank with a conical base. If the liquid level is quite low in the conical part, the volume is simply the conical volume of liquid. If the liquid level is midrange in the cylindrical part, the total volume of liquid includes the filled conical part and the partially filled cylindrical part. Write a well-structured function procedure to compute the tank's volume as a function of given values of R and d. Use decisional control structures (like If/Then, ElseIf, Else, End If). Design the function so that it returns the volume for all cases where the depth is less than 3R. Return an error message ("Overtop") if you overtop the tank, that is, d > 3R. Test it with the following data:
    R   1     1     1     1
    d   0.5   1.2   3.0   3.1
[FIGURE P2.13: cylindrical tank with a conical base; dimension labels 2R, R, and d.]
2.14 Two distances are required to specify the location of a point relative to an origin in two-dimensional space (Fig. P2.14):
• The horizontal and vertical distances (x, y) in Cartesian coordinates
• The radius and angle (r, θ) in radial coordinates.
[FIGURE P2.14: x-y axes with quadrants I to IV and a point located by r and θ.]
  • 69. It is relatively straightforward to compute Cartesian coordinates (x, y) on the basis of polar coordinates (r, θ). The reverse process is not so simple. The radius can be computed by the following formula:
$$r = \sqrt{x^2 + y^2}$$
If the coordinates lie within the first and fourth quadrants (i.e., x > 0), then a simple formula can be used to compute θ
$$\theta = \tan^{-1}\left(\frac{y}{x}\right)$$
The difficulty arises for the other cases. The following table summarizes the possibilities:
    x     y     θ
    <0    >0    tan^{-1}(y/x) + π
    <0    <0    tan^{-1}(y/x) − π
    <0    =0    π
    =0    >0    π/2
    =0    <0    −π/2
    =0    =0    0
(a) Write a well-structured flowchart for a subroutine procedure to calculate r and θ as a function of x and y. Express the final results for θ in degrees.
(b) Write a well-structured function procedure based on your flowchart. Test your program by using it to fill out the following table:
    x     y     r     θ
    1     0
    1     1
    0     1
    −1    1
    −1    0
    −1    −1
    0     −1
    1     −1
    0     0
2.15 Develop a well-structured function procedure that is passed a numeric grade from 0 to 100 and returns a letter grade according to the scheme:
    Letter   Criteria
    A        90 ≤ numeric grade ≤ 100
    B        80 ≤ numeric grade < 90
    C        70 ≤ numeric grade < 80
    D        60 ≤ numeric grade < 70
    F        numeric grade < 60
2.16 Develop well-structured function procedures to determine (a) the factorial; (b) the minimum value in a vector; and (c) the average of the values in a vector.
2.17 Develop well-structured programs to (a) determine the square root of the sum of the squares of the elements of a two-dimensional array (i.e., a matrix) and (b) normalize a matrix by dividing each row by the maximum absolute value in the row so that the maximum element in each row is 1.
2.18 Piecewise functions are sometimes useful when the relationship between a dependent and an independent variable cannot be adequately represented by a single equation. For example, the velocity of a rocket might be described by
$$v(t) = \begin{cases} 11t^2 - 5t & 0 \le t \le 10 \\ 1100 - 5t & 10 \le t \le 20 \\ 50t + 2(t - 20)^2 & 20 \le t \le 30 \\ 1520e^{-0.2(t - 30)} & t > 30 \\ 0 & \text{otherwise} \end{cases}$$
Develop a well-structured function to compute v as a function of t. Then use this function to generate a table of v versus t for t = −5 to 50 at increments of 0.5.
2.19 Develop a well-structured function to determine the elapsed days in a year. The function should be passed three values: mo = the month (1–12), da = the day (1–31), and leap = (0 for non–leap year and 1 for leap year). Test it for January 1, 1999; February 29, 2000; March 1, 2001; June 21, 2002; and December 31, 2004. Hint: a nice way to do this combines the for and the switch structures.
2.20 Develop a well-structured function to determine the elapsed days in a year. The first line of the function should be set up as
    function nd = days(mo, da, year)
where mo = the month (1–12), da = the day (1–31), and year = the year. Test it for January 1, 1999; February 29, 2000; March 1, 2001; June 21, 2002; and December 31, 2004.
2.21 Manning's equation can be used to compute the velocity of water in a rectangular open channel,
$$U = \frac{\sqrt{S}}{n}\left(\frac{BH}{B + 2H}\right)^{2/3}$$
  • 70. where U = velocity (m/s), S = channel slope, n = roughness coefficient, B = width (m), and H = depth (m). The following data are available for five channels:
    n       S        B    H
    0.035   0.0001   10   2
    0.020   0.0002   8    1
    0.015   0.0010   20   1.5
    0.030   0.0007   24   3
    0.022   0.0003   15   2.5
Write a well-structured program that computes the velocity for each of these channels. Have the program display the input data along with the computed velocity in tabular form where velocity is the fifth column. Include headings on the table to label the columns.
2.22 A simply supported beam is loaded as shown in Fig. P2.22. Using singularity functions, the displacement along the beam can be expressed by the equation:
$$u_y(x) = \frac{-5}{6}\left[\langle x - 0\rangle^4 - \langle x - 5\rangle^4\right] + \frac{15}{6}\langle x - 8\rangle^3 + 75\langle x - 7\rangle^2 + \frac{57}{6}x^3 - 238.25x$$
By definition, the singularity function can be expressed as follows:
$$\langle x - a\rangle^n = \begin{cases} (x - a)^n & \text{when } x > a \\ 0 & \text{when } x \le a \end{cases}$$
Develop a program that creates a plot of displacement versus distance along the beam x. Note that x = 0 at the left end of the beam.
[FIGURE P2.22: simply supported beam; load labels 20 kips/ft, 150 kip-ft, and 15 kips; span labels 5', 2', 1', 2'.]
2.23 The volume V of liquid in a hollow horizontal cylinder of radius r and length L is related to the depth of the liquid h by
$$V = \left[r^2\cos^{-1}\left(\frac{r - h}{r}\right) - (r - h)\sqrt{2rh - h^2}\right]L$$
Develop a well-structured function to create a plot of volume versus depth. Test the program for r = 2 m and L = 5 m.
2.24 Develop a well-structured program to compute the velocity of a parachutist as a function of time using Euler's method. Test your program for the case where m = 80 kg and c = 10 kg/s. Perform the calculation from t = 0 to 20 s with a step size of 2 s. Use an initial condition that the parachutist has an upward velocity of 20 m/s at t = 0. At t = 10 s, assume that the parachute is instantaneously deployed so that the drag coefficient jumps to 50 kg/s.
2.25 The pseudocode in Fig. P2.25 computes the factorial. Express this algorithm as a well-structured function in the language of your choice. Test it by computing 0! and 5!. In addition, test the error trap by trying to evaluate −2!.
    FUNCTION fac(n)
    IF n ≥ 0 THEN
      x = 1
      DOFOR i = 1, n
        x = x · i
      END DO
      fac = x
    ELSE
      display error message
      terminate
    ENDIF
    END fac
FIGURE P2.25
2.26 The height of a small rocket y can be calculated as a function of time after blastoff with the following piecewise function:
$$y = 38.1454t + 0.13743t^3 \qquad 0 \le t < 15$$
$$y = 1036 + 130.909(t - 15) + 6.18425(t - 15)^2 - 0.428(t - 15)^3 \qquad 15 \le t < 33$$
$$y = 2900 - 62.468(t - 33) - 16.9274(t - 33)^2 + 0.41796(t - 33)^3 \qquad t > 33$$
  • 71. Develop a well-structured pseudocode function to compute y as a function of t. Note that if the user enters a negative value of t or if the rocket has hit the ground (y ≤ 0), then return a value of zero for y. Also, the function should be invoked in the calling program as height(t). Write the algorithm as (a) pseudocode, or (b) in the high-level language of your choice.
2.27 As depicted in Fig. P2.27, a water tank consists of a cylinder topped by the frustum of a cone. Develop a well-structured function in the high-level language or macro language of your choice to compute the volume given the water level h (m) above the tank's bottom. Design the function so that it returns a value of zero for negative h's and the value of the maximum filled volume for h's greater than the tank's maximum depth. Given the following parameters, H1 = 10 m, r1 = 4 m, H2 = 5 m, and r2 = 6.5 m, test your function by using it to compute the volumes and generate a graph of the volume as a function of level from h = −1 to 16 m.
[FIGURE P2.27: water tank; dimension labels h, H1, H2, r1, and r2.]
  • 72. CHAPTER 3
Approximations and Round-Off Errors
Because so many of the methods in this book are straightforward in description and application, it would be very tempting at this point for us to proceed directly to the main body of the text and teach you how to use these techniques. However, understanding the concept of error is so important to the effective use of numerical methods that we have chosen to devote the next two chapters to this topic.
The importance of error was introduced in our discussion of the falling parachutist in Chap. 1. Recall that we determined the velocity of a falling parachutist by both analytical and numerical methods. Although the numerical technique yielded estimates that were close to the exact analytical solution, there was a discrepancy, or error, because the numerical method involved an approximation. Actually, we were fortunate in that case because the availability of an analytical solution allowed us to compute the error exactly. For many applied engineering problems, we cannot obtain analytical solutions. Therefore, we cannot compute exactly the errors associated with our numerical methods. In these cases, we must settle for approximations or estimates of the errors.
Such errors are characteristic of most of the techniques described in this book. This statement might at first seem contrary to what one normally conceives of as sound engineering. Students and practicing engineers constantly strive to limit errors in their work. When taking examinations or doing homework problems, you are penalized, not rewarded, for your errors. In professional practice, errors can be costly and sometimes catastrophic. If a structure or device fails, lives can be lost.
Although perfection is a laudable goal, it is rarely, if ever, attained. For example, despite the fact that the model developed from Newton's second law is an excellent approximation, it would never in practice exactly predict the parachutist's fall.
A variety of factors such as winds and slight variations in air resistance would result in deviations from the prediction. If these deviations are systematically high or low, then we might need to develop a new model. However, if they are randomly distributed and tightly grouped around the prediction, then the deviations might be considered negligible and the model deemed adequate. Numerical approximations also introduce similar discrepancies into the analysis. Again, the question is: How much error is present in our calculations and is it tolerable?
This chapter and Chap. 4 cover basic topics related to the identification, quantification, and minimization of these errors. In this chapter, general information concerned with the quantification of error is reviewed in the first sections. This is
  • 73. followed by a section on one of the two major forms of numerical error: round-off error. Round-off error is due to the fact that computers can represent only quantities with a finite number of digits. Then Chap. 4 deals with the other major form: truncation error. Truncation error is the discrepancy introduced by the fact that numerical methods may employ approximations to represent exact mathematical operations and quantities. Finally, we briefly discuss errors not directly connected with the numerical methods themselves. These include blunders, formulation or model errors, and data uncertainty.
3.1 SIGNIFICANT FIGURES
This book deals extensively with approximations connected with the manipulation of numbers. Consequently, before discussing the errors associated with numerical methods, it is useful to review basic concepts related to approximate representation of the numbers themselves.
FIGURE 3.1 An automobile speedometer and odometer illustrating the concept of a significant figure.
Whenever we employ a number in a computation, we must have assurance that it can be used with confidence. For example, Fig. 3.1 depicts a speedometer and odometer from an automobile. Visual inspection of the speedometer indicates that the car is traveling between 48 and 49 km/h. Because the indicator is higher than the midpoint between the markers on the gauge, we can say with assurance that the car is traveling at approximately 49 km/h. We have confidence in this result because two or more reasonable individuals reading this gauge would arrive at the same conclusion. However, let us say that we insist that the speed be estimated to one decimal place. For this case,
  • 74. one person might say 48.8, whereas another might say 48.9 km/h. Therefore, because of the limits of this instrument, only the first two digits can be used with confidence. Estimates of the third digit (or higher) must be viewed as approximations. It would be ludicrous to claim, on the basis of this speedometer, that the automobile is traveling at 48.8642138 km/h. In contrast, the odometer provides up to six certain digits. From Fig. 3.1, we can conclude that the car has traveled slightly less than 87,324.5 km during its lifetime. In this case, the seventh digit (and higher) is uncertain.
The concept of a significant figure, or digit, has been developed to formally designate the reliability of a numerical value. The significant digits of a number are those that can be used with confidence. They correspond to the number of certain digits plus one estimated digit. For example, the speedometer and the odometer in Fig. 3.1 yield readings of three and seven significant figures, respectively. For the speedometer, the two certain digits are 48. It is conventional to set the estimated digit at one-half of the smallest scale division on the measurement device. Thus the speedometer reading would consist of the three significant figures: 48.5. In a similar fashion, the odometer would yield a seven-significant-figure reading of 87,324.45.
Although it is usually a straightforward procedure to ascertain the significant figures of a number, some cases can lead to confusion. For example, zeros are not always significant figures because they may be necessary just to locate a decimal point. The numbers 0.00001845, 0.0001845, and 0.001845 all have four significant figures. Similarly, when trailing zeros are used in large numbers, it is not clear how many, if any, of the zeros are significant. For example, at face value the number 45,300 may have three, four, or five significant digits, depending on whether the zeros are known with confidence.
Such uncertainty can be resolved by using scientific notation, where $4.53 \times 10^4$, $4.530 \times 10^4$, and $4.5300 \times 10^4$ designate that the number is known to three, four, and five significant figures, respectively.
The concept of significant figures has two important implications for our study of numerical methods:
1. As introduced in the falling parachutist problem, numerical methods yield approximate results. We must, therefore, develop criteria to specify how confident we are in our approximate result. One way to do this is in terms of significant figures. For example, we might decide that our approximation is acceptable if it is correct to four significant figures.
2. Although quantities such as $\pi$, $e$, or $\sqrt{7}$ represent specific quantities, they cannot be expressed exactly by a limited number of digits. For example, $\pi = 3.141592653589793238462643\ldots$ ad infinitum. Because computers retain only a finite number of significant figures, such numbers can never be represented exactly. The omission of the remaining significant figures is called round-off error.
Both round-off error and the use of significant figures to express our confidence in a numerical result will be explored in detail in subsequent sections. In addition, the concept of significant figures will have relevance to our definition of accuracy and precision in the next section.
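As a small illustration of the two points above, the following Python sketch writes 45,300 in scientific notation with three, four, and five significant figures, and then shows the discrepancy that appears when π is kept to only four significant figures. The function name `to_sig_figs` is hypothetical, and the sketch rounds rather than chops the discarded digits, which is an assumption made here for simplicity.

```python
import math


def to_sig_figs(value, n):
    """Format value in scientific notation with n significant figures (rounded)."""
    return f"{value:.{n - 1}e}"


# The same number, 45,300, stated with different levels of confidence.
for n in (3, 4, 5):
    print(n, to_sig_figs(45300, n))

# Keeping only four significant figures of pi leaves a small discrepancy.
pi_4 = float(to_sig_figs(math.pi, 4))
print(pi_4, math.pi - pi_4)
```

The exponent form makes the number of retained digits explicit, which is exactly why scientific notation removes the ambiguity of trailing zeros.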
  • 75. 3.2 ACCURACY AND PRECISION
The errors associated with both calculations and measurements can be characterized with regard to their accuracy and precision. Accuracy refers to how closely a computed or measured value agrees with the true value. Precision refers to how closely individual computed or measured values agree with each other.
These concepts can be illustrated graphically using an analogy from target practice. The bullet holes on each target in Fig. 3.2 can be thought of as the predictions of a numerical technique, whereas the bull's-eye represents the truth. Inaccuracy (also called bias) is defined as systematic deviation from the truth. Thus, although the shots in Fig. 3.2c are more tightly grouped than those in Fig. 3.2a, the two cases are equally biased because they are both centered on the upper left quadrant of the target. Imprecision (also called uncertainty), on the other hand, refers to the magnitude of the scatter. Therefore, although Fig. 3.2b and d are equally accurate (that is, centered on the bull's-eye), the latter is more precise because the shots are tightly grouped.
FIGURE 3.2 An example from marksmanship illustrating the concepts of accuracy and precision. (a) Inaccurate and imprecise; (b) accurate and imprecise; (c) inaccurate and precise; (d) accurate and precise. (Axes: increasing accuracy, increasing precision.)
Numerical methods should be sufficiently accurate or unbiased to meet the requirements of a particular engineering problem. They also should be precise enough for adequate
  • 76. 3.3 ERROR DEFINITIONS 59 engineering design. In this book, we will use the collective term error to represent both the inaccuracy and the imprecision of our predictions. With these concepts as background, we can now discuss the factors that contribute to the error of numerical computations. 3.3 ERROR DEFINITIONS Numerical errors arise from the use of approximations to represent exact mathematical operations and quantities. These include truncation errors, which result when approxima- tions are used to represent exact mathematical procedures, and round-off errors, which result when numbers having limited significant figures are used to represent exact num- bers. For both types, the relationship between the exact, or true, result and the approxi- mation can be formulated as True value 5 approximation 1 error (3.1) By rearranging Eq. (3.1), we find that the numerical error is equal to the discrepancy between the truth and the approximation, as in Et 5 true value 2 approximation (3.2) where Et is used to designate the exact value of the error. The subscript t is included to designate that this is the “true” error. This is in contrast to other cases, as described shortly, where an “approximate” estimate of the error must be employed. A shortcoming of this definition is that it takes no account of the order of magnitude of the value under examination. For example, an error of a centimeter is much more sig- nificant if we are measuring a rivet rather than a bridge. One way to account for the mag- nitudes of the quantities being evaluated is to normalize the error to the true value, as in True fractional relative error 5 true error true value where, as specified by Eq. (3.2), error 5 true value 2 approximation. The relative error can also be multiplied by 100 percent to express it as et 5 true error true value 100% (3.3) where et designates the true percent relative error. EXAMPLE 3.1 Calculation of Errors Problem Statement. 
Suppose that you have the task of measuring the lengths of a bridge and a rivet and come up with 9999 and 9 cm, respectively. If the true values are 10,000 and 10 cm, respectively, compute (a) the true error and (b) the true percent relative error for each case.

Solution. (a) The error for measuring the bridge is [Eq. (3.2)]

$$E_t = 10{,}000 - 9999 = 1 \text{ cm}$$
and for the rivet it is

$$E_t = 10 - 9 = 1 \text{ cm}$$

(b) The percent relative error for the bridge is [Eq. (3.3)]

$$\varepsilon_t = \frac{1}{10{,}000} \times 100\% = 0.01\%$$

and for the rivet it is

$$\varepsilon_t = \frac{1}{10} \times 100\% = 10\%$$

Thus, although both measurements have an error of 1 cm, the relative error for the rivet is much greater. We would conclude that we have done an adequate job of measuring the bridge, whereas our estimate for the rivet leaves something to be desired.

Notice that for Eqs. (3.2) and (3.3), $E$ and $\varepsilon$ are subscripted with a $t$ to signify that the error is based on the true value. In Example 3.1, we were provided with this value. However, in actual situations such information is rarely available. For numerical methods, the true value will be known only when we deal with functions that can be solved analytically. Such will typically be the case when we investigate the theoretical behavior of a particular technique for simple systems. However, in real-world applications, we will obviously not know the true answer a priori. For these situations, an alternative is to normalize the error using the best available estimate of the true value, that is, to the approximation itself, as in

$$\varepsilon_a = \frac{\text{approximate error}}{\text{approximation}} \times 100\% \tag{3.4}$$

where the subscript $a$ signifies that the error is normalized to an approximate value. Note also that for real-world applications, Eq. (3.2) cannot be used to calculate the error term for Eq. (3.4). One of the challenges of numerical methods is to determine error estimates in the absence of knowledge regarding the true value. For example, certain numerical methods use an iterative approach to compute answers. In such an approach, a present approximation is made on the basis of a previous approximation. This process is performed repeatedly, or iteratively, to successively compute (we hope) better and better approximations. For such cases, the error is often estimated as the difference between previous and current approximations.
Thus, percent relative error is determined according to

$$\varepsilon_a = \frac{\text{current approximation} - \text{previous approximation}}{\text{current approximation}} \times 100\% \tag{3.5}$$

This and other approaches for expressing errors will be elaborated on in subsequent chapters.

The signs of Eqs. (3.2) through (3.5) may be either positive or negative. If the approximation is greater than the true value (or the previous approximation is greater than the current approximation), the error is negative; if the approximation is less than the true value, the error is positive. Also, for Eqs. (3.3) to (3.5), the denominator may
be less than zero, which can also lead to a negative error. Often, when performing computations, we may not be concerned with the sign of the error, but we are interested in whether the absolute value of the percent relative error is lower than a prespecified percent tolerance $\varepsilon_s$. Therefore, it is often useful to employ the absolute value of Eqs. (3.2) through (3.5). For such cases, the computation is repeated until

$$|\varepsilon_a| < \varepsilon_s \tag{3.6}$$

If this relationship holds, our result is assumed to be within the prespecified acceptable level $\varepsilon_s$. Note that for the remainder of this text, we will almost exclusively employ absolute values when we use relative errors.

It is also convenient to relate these errors to the number of significant figures in the approximation. It can be shown (Scarborough, 1966) that if the following criterion is met, we can be assured that the result is correct to at least $n$ significant figures.

$$\varepsilon_s = (0.5 \times 10^{2-n})\% \tag{3.7}$$

EXAMPLE 3.2 Error Estimates for Iterative Methods

Problem Statement. In mathematics, functions can often be represented by infinite series. For example, the exponential function can be computed using

$$e^x = 1 + x + \frac{x^2}{2} + \frac{x^3}{3!} + \cdots + \frac{x^n}{n!} \tag{E3.2.1}$$

Thus, as more terms are added in sequence, the approximation becomes a better and better estimate of the true value of $e^x$. Equation (E3.2.1) is called a Maclaurin series expansion.

Starting with the simplest version, $e^x = 1$, add terms one at a time to estimate $e^{0.5}$. After each new term is added, compute the true and approximate percent relative errors with Eqs. (3.3) and (3.5), respectively. Note that the true value is $e^{0.5} = 1.648721\ldots$. Add terms until the absolute value of the approximate error estimate $\varepsilon_a$ falls below a prespecified error criterion $\varepsilon_s$ conforming to three significant figures.

Solution. First, Eq.
(3.7) can be employed to determine the error criterion that ensures a result is correct to at least three significant figures:

$$\varepsilon_s = (0.5 \times 10^{2-3})\% = 0.05\%$$

Thus, we will add terms to the series until $\varepsilon_a$ falls below this level.

The first estimate is simply equal to Eq. (E3.2.1) with a single term. Thus, the first estimate is equal to 1. The second estimate is then generated by adding the second term, as in

$$e^x = 1 + x$$

or for $x = 0.5$,

$$e^{0.5} = 1 + 0.5 = 1.5$$

This represents a true percent relative error of [Eq. (3.3)]

$$\varepsilon_t = \frac{1.648721 - 1.5}{1.648721} \times 100\% = 9.02\%$$
Equation (3.5) can be used to determine an approximate estimate of the error, as in

$$\varepsilon_a = \frac{1.5 - 1}{1.5} \times 100\% = 33.3\%$$

Because $\varepsilon_a$ is not less than the required value of $\varepsilon_s$, we would continue the computation by adding another term, $x^2/2!$, and repeating the error calculations. The process is continued until $\varepsilon_a < \varepsilon_s$. The entire computation can be summarized as

Terms    Result         $\varepsilon_t$ (%)    $\varepsilon_a$ (%)
1        1              39.3
2        1.5            9.02       33.3
3        1.625          1.44       7.69
4        1.645833333    0.175      1.27
5        1.648437500    0.0172     0.158
6        1.648697917    0.00142    0.0158

Thus, after six terms are included, the approximate error falls below $\varepsilon_s = 0.05\%$ and the computation is terminated. However, notice that, rather than three significant figures, the result is accurate to five! This is because, for this case, both Eqs. (3.5) and (3.7) are conservative. That is, they ensure that the result is at least as good as they specify. Although, as discussed in Chap. 6, this is not always the case for Eq. (3.5), it is true most of the time.

3.3.1 Computer Algorithm for Iterative Calculations

Many of the numerical methods described in the remainder of this text involve iterative calculations of the sort illustrated in Example 3.2. These all entail solving a mathematical problem by computing successive approximations to the solution starting from an initial guess.

The computer implementation of such iterative solutions involves loops. As we saw in Sec. 2.1.1, these come in two basic flavors: count-controlled and decision loops. Most iterative solutions use decision loops. Thus, rather than employing a prespecified number of iterations, the process typically is repeated until an approximate error estimate falls below a stopping criterion, as in Example 3.2.

A pseudocode for a generic iterative calculation is presented in Fig. 3.3. The function is passed a value (val) along with a stopping error criterion (es) and a maximum allowable number of iterations (maxit).
The value is typically either (1) an initial value or (2) the value for which the iterative calculation is to be made. The function first initializes three variables. These include (1) a variable iter that keeps track of the number of iterations, (2) a variable sol that holds the current estimate of the solution, and (3) a variable ea that holds the approximate percent relative error. Note that ea is initially set to a value of 100 to ensure that the loop executes at least once. These initializations are followed by the decision loop that actually implements the iterative calculation. Prior to generating a new solution, sol is first assigned to solold. Then a new value of sol is computed and the iteration counter is incremented. If the new value of sol is nonzero, the percent relative error ea is determined. The stopping
criteria are then tested. If both are false, the loop repeats. If either is true, the loop terminates and the final solution is sent back to the function call. The following example illustrates how the generic algorithm can be applied to a specific iterative calculation.

EXAMPLE 3.3 Computer Implementation of an Iterative Calculation

Problem Statement. Develop a computer program based on the pseudocode from Fig. 3.3 to implement the calculation from Example 3.2.

Solution. A function to implement the Maclaurin series expansion for $e^x$ can be based on the general scheme in Fig. 3.3. To do this, we first formulate the series expansion as a formula:

$$e^x \approx \sum_{i=0}^{n} \frac{x^i}{i!}$$

Figure 3.4 shows functions to implement this series written in VBA and MATLAB software. Similar codes could be developed in other languages such as C++ or Fortran 95. Notice that whereas MATLAB has a built-in factorial function, it is necessary to compute the factorial as part of the VBA implementation with a simple product accumulator fac.

When the programs are run, they generate an estimate for the exponential function. For the MATLAB version, the answer is returned along with the approximate error and the number of iterations. For example, $e^1$ can be evaluated as

    format long
    [val, ea, iter] = IterMeth(1,1e-6,100)
    val =
       2.718281826198493
    ea =
       9.216155641522974e-007
    iter =
       12

    FUNCTION IterMeth(val, es, maxit)
    iter = 1
    sol = val
    ea = 100
    DO
      solold = sol
      sol = ...
      iter = iter + 1
      IF sol ≠ 0 ea = abs((sol - solold)/sol)*100
      IF ea ≤ es OR iter ≥ maxit EXIT
    END DO
    IterMeth = sol
    END IterMeth

FIGURE 3.3 Pseudocode for a generic iterative calculation.
We can see that after 12 iterations, we obtain a result of 2.7182818 with an approximate error estimate of $\varepsilon_a = 9.2162 \times 10^{-7}\%$. The result can be verified by using the built-in exp function to directly calculate the exact value and the true percent relative error,

    trueval=exp(1)
    trueval =
       2.718281828459046
    et=abs((trueval-val)/trueval)*100
    et =
       8.316108397236229e-008

As was the case with Example 3.2, we obtain the desirable outcome that the true error is less than the approximate error.

With the preceding definitions as background, we can now proceed to the two types of error connected directly with numerical methods: round-off errors and truncation errors.

(a) VBA/Excel

    Function IterMeth(x, es, maxit)
    ' initialization
    iter = 1
    sol = 1
    ea = 100
    fac = 1
    ' iterative calculation
    Do
      solold = sol
      fac = fac * iter
      sol = sol + x ^ iter / fac
      iter = iter + 1
      If sol <> 0 Then
        ea = Abs((sol - solold) / sol) * 100
      End If
      If ea <= es Or iter >= maxit Then Exit Do
    Loop
    IterMeth = sol
    End Function

(b) MATLAB

    function [v,ea,iter] = IterMeth(x,es,maxit)
    % initialization
    iter = 1;
    sol = 1;
    ea = 100;
    % iterative calculation
    while (1)
      solold = sol;
      sol = sol + x ^ iter / factorial(iter);
      iter = iter + 1;
      if sol~=0
        ea=abs((sol - solold)/sol)*100;
      end
      if ea<=es | iter>=maxit,break,end
    end
    v = sol;
    end

FIGURE 3.4 (a) VBA/Excel and (b) MATLAB functions based on the pseudocode from Fig. 3.3.
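For readers working outside VBA or MATLAB, the same scheme can be sketched in Python. This is an illustrative translation of the Fig. 3.3 logic, not a listing from the text; the name iter_meth and the returned tuple are our own choices. It reruns Example 3.2 with the stopping criterion of Eq. (3.7).

```python
import math

def iter_meth(x, es, maxit):
    """Estimate e**x with a Maclaurin series, following the Fig. 3.3 scheme.

    Returns (estimate, approximate percent relative error, iteration count).
    """
    iteration = 1
    sol = 1.0     # first term of the series
    ea = 100.0    # start at 100% so the stopping test cannot pass prematurely
    while True:
        sol_old = sol
        sol += x ** iteration / math.factorial(iteration)
        iteration += 1
        if sol != 0:
            # Eq. (3.5): change between successive estimates, in percent
            ea = abs((sol - sol_old) / sol) * 100
        if ea <= es or iteration >= maxit:
            break
    return sol, ea, iteration

# Stopping criterion of Eq. (3.7) for n = 3 significant figures (0.05%).
n = 3
es = 0.5 * 10 ** (2 - n)

val, ea, terms = iter_meth(0.5, es, 100)
true_val = math.exp(0.5)
et = abs((true_val - val) / true_val) * 100   # Eq. (3.3), absolute value
print(val, ea, terms, et)
```

Calling iter_meth(1, 1e-6, 100) repeats the MATLAB session shown above and stops after the same 12 iterations.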
3.4 ROUND-OFF ERRORS

As mentioned previously, round-off errors originate from the fact that computers retain only a fixed number of significant figures during a calculation. Numbers such as $\pi$, $e$, or $\sqrt{7}$ cannot be expressed by a fixed number of significant figures. Therefore, they cannot be represented exactly by the computer. In addition, because computers use a base-2 representation, they cannot precisely represent certain exact base-10 numbers. The discrepancy introduced by this omission of significant figures is called round-off error.

3.4.1 Computer Representation of Numbers

Numerical round-off errors are directly related to the manner in which numbers are stored in a computer. The fundamental unit whereby information is represented is called a word. This is an entity that consists of a string of binary digits, or bits. Numbers are typically stored in one or more words. To understand how this is accomplished, we must first review some material related to number systems.

Number Systems. A number system is merely a convention for representing quantities. Because we have 10 fingers and 10 toes, the number system that we are most familiar with is the decimal, or base-10, number system. A base is the number used as the reference for constructing the system. The base-10 system uses the 10 digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9) to represent numbers. By themselves, these digits are satisfactory for counting from 0 to 9.

For larger quantities, combinations of these basic digits are used, with the position or place value specifying the magnitude. The right-most digit in a whole number represents a number from 0 to 9. The second digit from the right represents a multiple of 10. The third digit from the right represents a multiple of 100 and so on.
For example, if we have the number 86,409 then we have eight groups of 10,000, six groups of 1000, four groups of 100, zero groups of 10, and nine more units, or

$$(8 \times 10^4) + (6 \times 10^3) + (4 \times 10^2) + (0 \times 10^1) + (9 \times 10^0) = 86{,}409$$

Figure 3.5a provides a visual representation of how a number is formulated in the base-10 system. This type of representation is called positional notation.

Because the decimal system is so familiar, it is not commonly realized that there are alternatives. For example, if human beings happened to have had eight fingers and eight toes, we would undoubtedly have developed an octal, or base-8, representation. In the same sense, our friend the computer is like a two-fingered animal who is limited to two states, either 0 or 1. This relates to the fact that the primary logic units of digital computers are on/off electronic components. Hence, numbers on the computer are represented with a binary, or base-2, system. Just as with the decimal system, quantities can be represented using positional notation. For example, the binary number 11 is equivalent to $(1 \times 2^1) + (1 \times 2^0) = 2 + 1 = 3$ in the decimal system. Figure 3.5b illustrates a more complicated example.

Integer Representation. Now that we have reviewed how base-10 numbers can be represented in binary form, it is simple to conceive of how integers are represented on a computer. The most straightforward approach, called the signed magnitude method, employs the first bit of a word to indicate the sign, with a 0 for positive and a 1 for
negative. The remaining bits are used to store the number. For example, the integer value of −173 would be stored on a 16-bit computer, as in Fig. 3.6.

FIGURE 3.5 How the (a) decimal (base-10) and the (b) binary (base-2) systems work. In (b), the binary number 10101101 is equivalent to the decimal number 173.
(a) $86{,}409 = (8 \times 10^4) + (6 \times 10^3) + (4 \times 10^2) + (0 \times 10^1) + (9 \times 10^0) = 80{,}000 + 6{,}000 + 400 + 0 + 9$
(b) $10101101_2 = (1 \times 2^7) + (0 \times 2^6) + (1 \times 2^5) + (0 \times 2^4) + (1 \times 2^3) + (1 \times 2^2) + (0 \times 2^1) + (1 \times 2^0) = 128 + 0 + 32 + 0 + 8 + 4 + 0 + 1 = 173$

FIGURE 3.6 The representation of the decimal integer −173 on a 16-bit computer using the signed magnitude method.
Sign: 1    Number: 0 0 0 0 0 0 0 1 0 1 0 1 1 0 1

EXAMPLE 3.4 Range of Integers

Problem Statement. Determine the range of integers in base-10 that can be represented on a 16-bit computer.
Solution. Of the 16 bits, the first bit holds the sign. The remaining 15 bits can hold binary numbers from 0 to 111111111111111. The upper limit can be converted to a decimal integer, as in

$$(1 \times 2^{14}) + (1 \times 2^{13}) + \cdots + (1 \times 2^1) + (1 \times 2^0)$$

which equals 32,767 (note that this expression can be simply evaluated as $2^{15} - 1$). Thus, a 16-bit computer word can store decimal integers ranging from −32,767 to 32,767. In addition, because zero is already defined as 0000000000000000, it is redundant to use the number 1000000000000000 to define a "minus zero." Therefore, it is usually employed to represent an additional negative number: −32,768, and the range is from −32,768 to 32,767.

Note that the signed magnitude method described above is not used to represent integers on conventional computers. A preferred approach called the 2's complement technique directly incorporates the sign into the number's magnitude rather than providing a separate bit to represent plus or minus (see Chapra and Canale 1994). However, Example 3.4 still serves to illustrate how all digital computers are limited in their capability to represent integers. That is, numbers above or below the range cannot be represented. A more serious limitation is encountered in the storage and manipulation of fractional quantities as described next.

Floating-Point Representation. Fractional quantities are typically represented in computers using floating-point form. In this approach, the number is expressed as a fractional part, called a mantissa or significand, and an integer part, called an exponent or characteristic, as in

$$m \cdot b^e$$

where $m$ = the mantissa, $b$ = the base of the number system being used, and $e$ = the exponent. For instance, the number 156.78 could be represented as $0.15678 \times 10^3$ in a floating-point base-10 system.

Figure 3.7 shows one way that a floating-point number could be stored in a word.
The first bit is reserved for the sign, the next series of bits for the signed exponent, and the last bits for the mantissa.

FIGURE 3.7 The manner in which a floating-point number is stored in a word. [Word layout, left to right: sign | signed exponent | mantissa.]
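The mantissa-and-exponent split can be inspected directly. The sketch below is an illustration in Python rather than anything from the text: it uses the standard-library function math.frexp, which returns a base-2 mantissa and exponent, and then repeats the base-10 decomposition of 156.78 by hand. The names m10 and e10 are our own.

```python
import math

x = 156.78

# math.frexp returns (m, e) with x == m * 2**e and 0.5 <= |m| < 1 for nonzero x.
m, e = math.frexp(x)
print(m, e)

# Reassembling the two parts recovers the stored value exactly,
# because scaling by a power of 2 only changes the exponent.
assert math.ldexp(m, e) == x

# The same idea in base 10: shift the decimal point until 0.1 <= m10 < 1.
e10 = math.floor(math.log10(abs(x))) + 1
m10 = x / 10 ** e10
print(m10, e10)   # m10 is about 0.15678 and e10 is 3
```

Note that the base-2 mantissa returned here is a fraction between one-half and one, so it carries no leading zero bits.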
Note that the mantissa is usually normalized if it has leading zero digits. For example, suppose the quantity $1/34 = 0.029411765\ldots$ was stored in a floating-point base-10 system that allowed only four decimal places to be stored. Thus, 1/34 would be stored as

$$0.0294 \times 10^0$$

However, in the process of doing this, the inclusion of the useless zero to the right of the decimal forces us to drop the digit 1 in the fifth decimal place. The number can be normalized to remove the leading zero by multiplying the mantissa by 10 and lowering the exponent by 1 to give

$$0.2941 \times 10^{-1}$$

Thus, we retain an additional significant figure when the number is stored.

The consequence of normalization is that the absolute value of $m$ is limited. That is,

$$\frac{1}{b} \le m < 1 \tag{3.8}$$

where $b$ = the base. For example, for a base-10 system, $m$ would range between 0.1 and 1, and for a base-2 system, between 0.5 and 1.

Floating-point representation allows both fractions and very large numbers to be expressed on the computer. However, it has some disadvantages. For example, floating-point numbers take up more room and take longer to process than integer numbers. More significantly, however, their use introduces a source of error because the mantissa holds only a finite number of significant figures. Thus, a round-off error is introduced.

EXAMPLE 3.5 Hypothetical Set of Floating-Point Numbers

Problem Statement. Create a hypothetical floating-point number set for a machine that stores information using 7-bit words. Employ the first bit for the sign of the number, the next three for the sign and the magnitude of the exponent, and the last three for the magnitude of the mantissa (Fig. 3.8).

FIGURE 3.8 The smallest possible positive floating-point number from Example 3.5.
Bits: 0 | 1 | 1 1 | 1 0 0
Fields: sign of number | sign of exponent | magnitude of exponent (place values $2^1$, $2^0$) | magnitude of mantissa (place values $2^{-1}$, $2^{-2}$, $2^{-3}$)
Solution. The smallest possible positive number is depicted in Fig. 3.8. The initial 0 indicates that the quantity is positive. The 1 in the second place designates that the exponent has a negative sign. The 1's in the third and fourth places give a maximum value to the exponent of

$$1 \times 2^1 + 1 \times 2^0 = 3$$

Therefore, the exponent will be −3. Finally, the mantissa is specified by the 100 in the last three places, which conforms to

$$1 \times 2^{-1} + 0 \times 2^{-2} + 0 \times 2^{-3} = 0.5$$

Although a smaller mantissa is possible (e.g., 000, 001, 010, 011), the value of 100 is used because of the limit imposed by normalization [Eq. (3.8)]. Thus, the smallest possible positive number for this system is $+0.5 \times 2^{-3}$, which is equal to 0.0625 in the base-10 system. The next highest numbers are developed by increasing the mantissa, as in

$0111101 = (1 \times 2^{-1} + 0 \times 2^{-2} + 1 \times 2^{-3}) \times 2^{-3} = (0.078125)_{10}$
$0111110 = (1 \times 2^{-1} + 1 \times 2^{-2} + 0 \times 2^{-3}) \times 2^{-3} = (0.093750)_{10}$
$0111111 = (1 \times 2^{-1} + 1 \times 2^{-2} + 1 \times 2^{-3}) \times 2^{-3} = (0.109375)_{10}$

Notice that the base-10 equivalents are spaced evenly with an interval of 0.015625.

At this point, to continue increasing, we must decrease the magnitude of the exponent to binary 10, which gives a value of

$$1 \times 2^1 + 0 \times 2^0 = 2$$

so that the exponent becomes −2. The mantissa is decreased back to its smallest value of 100. Therefore, the next number is

$0110100 = (1 \times 2^{-1} + 0 \times 2^{-2} + 0 \times 2^{-3}) \times 2^{-2} = (0.125000)_{10}$

This still represents a gap of $0.125000 - 0.109375 = 0.015625$. However, now when higher numbers are generated by increasing the mantissa, the gap is lengthened to 0.03125,

$0110101 = (1 \times 2^{-1} + 0 \times 2^{-2} + 1 \times 2^{-3}) \times 2^{-2} = (0.156250)_{10}$
$0110110 = (1 \times 2^{-1} + 1 \times 2^{-2} + 0 \times 2^{-3}) \times 2^{-2} = (0.187500)_{10}$
$0110111 = (1 \times 2^{-1} + 1 \times 2^{-2} + 1 \times 2^{-3}) \times 2^{-2} = (0.218750)_{10}$

This pattern is repeated as each larger quantity is formulated until a maximum number is reached,

$0011111 = (1 \times 2^{-1} + 1 \times 2^{-2} + 1 \times 2^{-3}) \times 2^{3} = (7)_{10}$

The final number set is depicted graphically in Fig. 3.9.
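The whole number set of Example 3.5 is small enough to enumerate. The following Python sketch is our own illustration (the function name toy_float_set is hypothetical); it assumes, as in the example, a signed 2-bit exponent magnitude and a normalized 3-bit mantissa, and it lists only the positive values.

```python
def toy_float_set():
    """Positive values of the 7-bit system of Example 3.5.

    Exponent: one sign bit plus 2 magnitude bits, so -3..3
    (a "minus zero" exponent duplicates exponent 0 and adds nothing new).
    Mantissa: 3 bits, normalized so the leading bit is 1 (100..111).
    """
    values = set()
    for exponent in range(-3, 4):
        for mantissa_bits in range(0b100, 0b111 + 1):
            mantissa = mantissa_bits / 8          # 3 bits: units of 2**-3
            values.add(mantissa * 2.0 ** exponent)
    return sorted(values)

values = toy_float_set()
gaps = [b - a for a, b in zip(values, values[1:])]

print(len(values), values[0], values[-1])
print(values[:6])   # the first numbers worked out in the example
print(gaps[:6])     # spacing doubles once the exponent changes
```

Plotting values on a number line would give the tick marks described for Fig. 3.9, with the spacing doubling each time the exponent increases.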
Figure 3.9 manifests several aspects of floating-point representation that have significance regarding computer round-off errors: 1. There Is a Limited Range of Quantities That May Be Represented. Just as for the integer case, there are large positive and negative numbers that cannot be represented. Attempts to employ numbers outside the acceptable range will result in what is called
an overflow error. However, in addition to large quantities, the floating-point representation has the added limitation that very small numbers cannot be represented. This is illustrated by the underflow "hole" between zero and the first positive number in Fig. 3.9. It should be noted that this hole is enlarged because of the normalization constraint of Eq. (3.8).

FIGURE 3.9 The hypothetical number system developed in Example 3.5. Each value is indicated by a tick mark. Only the positive numbers are shown. An identical set would also extend in the negative direction. [Number line from 0 to 7 marking the underflow "hole" at zero and overflow beyond 7; an inset contrasts chopping, with interval $\Delta x$, and rounding, with interval $\Delta x/2$ on either side of a value.]

2. There Are Only a Finite Number of Quantities That Can Be Represented within the Range. Thus, the degree of precision is limited. Obviously, irrational numbers cannot be represented exactly. Furthermore, rational numbers that do not exactly match one of the values in the set also cannot be represented precisely. The errors introduced by approximating both these cases are referred to as quantizing errors. The actual approximation is accomplished in either of two ways: chopping or rounding. For example, suppose that the value of $\pi = 3.14159265358\ldots$ is to be stored on a base-10 number system carrying seven significant figures. One method of approximation would be to merely omit, or "chop off," the eighth and higher terms, as in $\pi = 3.141592$, with the introduction of an associated error of [Eq. (3.2)]

$$E_t = 0.00000065\ldots$$

This technique of retaining only the significant terms was originally dubbed "truncation" in computer jargon. We prefer to call it chopping to distinguish it from the truncation errors discussed in Chap. 4. Note that for the base-2 number system
in Fig. 3.9, chopping means that any quantity falling within an interval of length $\Delta x$ will be stored as the quantity at the lower end of the interval. Thus, the upper error bound for chopping is $\Delta x$. Additionally, a bias is introduced because all errors are positive. The shortcomings of chopping are attributable to the fact that the higher terms in the complete decimal representation have no impact on the shortened version. For instance, in our example of $\pi$, the first discarded digit is 6. Thus, the last retained digit should be rounded up to yield 3.141593. Such rounding reduces the error to

$$E_t = -0.00000035\ldots$$

Consequently, rounding yields a lower absolute error than chopping. Note that for the base-2 number system in Fig. 3.9, rounding means that any quantity falling within an interval of length $\Delta x$ will be represented as the nearest allowable number. Thus, the upper error bound for rounding is $\Delta x/2$. Additionally, no bias is introduced because some errors are positive and some are negative. Some computers employ rounding. However, this adds to the computational overhead, and, consequently, many machines use simple chopping. This approach is justified under the supposition that the number of significant figures is large enough that resulting round-off error is usually negligible.

3. The Interval between Numbers, $\Delta x$, Increases as the Numbers Grow in Magnitude. It is this characteristic, of course, that allows floating-point representation to preserve significant digits. However, it also means that quantizing errors will be proportional to the magnitude of the number being represented.
For normalized floating-point numbers, this proportionality can be expressed, for cases where chopping is employed, as

$$\frac{|\Delta x|}{|x|} \le \varepsilon \tag{3.9}$$

and, for cases where rounding is employed, as

$$\frac{|\Delta x|}{|x|} \le \frac{\varepsilon}{2} \tag{3.10}$$

where $\varepsilon$ is referred to as the machine epsilon, which can be computed as

$$\varepsilon = b^{1-t} \tag{3.11}$$

where $b$ is the number base and $t$ is the number of significant digits in the mantissa. Notice that the inequalities in Eqs. (3.9) and (3.10) signify that these are error bounds. That is, they specify the worst cases.

EXAMPLE 3.6 Machine Epsilon

Problem Statement. Determine the machine epsilon and verify its effectiveness in characterizing the errors of the number system from Example 3.5. Assume that chopping is used.

Solution. The hypothetical floating-point system from Example 3.5 employed values of the base $b = 2$, and the number of mantissa bits $t = 3$. Therefore, the machine epsilon would be [Eq. (3.11)]

$$\varepsilon = 2^{1-3} = 0.25$$
Consequently, the relative quantizing error should be bounded by 0.25 for chopping. The largest relative errors should occur for those quantities that fall just below the upper bound of the first interval between successive equispaced numbers (Fig. 3.10). Those numbers falling in the succeeding higher intervals would have the same value of $\Delta x$ but a greater value of $x$ and, hence, would have a lower relative error. An example of a maximum error would be a value falling just below the upper bound of the interval between $(0.125000)_{10}$ and $(0.156250)_{10}$. For this case, the error would be less than

$$\frac{0.03125}{0.125000} = 0.25$$

Thus, the error is as predicted by Eq. (3.9).

FIGURE 3.10 The largest quantizing error will occur for those values falling just below the upper bound of the first of a series of equispaced intervals.

The magnitude dependence of quantizing errors has a number of practical applications in numerical methods. Most of these relate to the commonly employed operation of testing whether two numbers are equal. This occurs when testing convergence of quantities as well as in the stopping mechanism for iterative processes (recall Example 3.2). For these cases, it should be clear that, rather than test whether the two quantities are equal, it is advisable to test whether their difference is less than an acceptably small tolerance. Further, it should also be evident that normalized rather than absolute difference should be compared, particularly when dealing with numbers of large magnitude. In addition, the machine epsilon can be employed in formulating stopping or convergence criteria. This ensures that programs are portable; that is, they are not dependent on the computer on which they are implemented. Figure 3.11 lists pseudocode to automatically determine the machine epsilon of a binary computer.

Extended Precision.
It should be noted at this point that, although round-off errors can be important in contexts such as testing convergence, the number of significant digits carried on most computers allows most engineering computations to be performed with more than acceptable precision. For example, the hypothetical number system in Fig. 3.9 is a gross exaggeration that was employed for illustrative purposes. Commercial computers use much larger words and, consequently, allow numbers to be expressed with more than adequate precision. For example, computers that use IEEE format allow 24 bits to be used for the mantissa, which translates into about seven significant base-10 digits of precision¹ with a range of about $10^{-38}$ to $10^{39}$.

    epsilon = 1
    DO
      IF (epsilon + 1 ≤ 1) EXIT
      epsilon = epsilon/2
    END DO
    epsilon = 2 × epsilon

FIGURE 3.11 Pseudocode to determine machine epsilon for a binary computer.

¹ Note that only 23 bits are actually used to store the mantissa. However, because of normalization, the first bit of the mantissa is always 1 and is, therefore, not stored. Thus, this first bit together with the 23 stored bits gives the 24 total bits of precision for the mantissa.
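The pseudocode of Fig. 3.11 translates almost line for line into Python. The sketch below is illustrative only (the function name machine_epsilon is ours). Because Python floats are IEEE double-precision numbers, the result reflects a 53-bit mantissa rather than the 24-bit single-precision format described above; the value is then compared with Eq. (3.11) and with the constant that Python itself reports.

```python
import sys

def machine_epsilon():
    """Halve epsilon until adding it to 1 no longer changes the sum (Fig. 3.11)."""
    epsilon = 1.0
    while epsilon + 1.0 > 1.0:
        epsilon /= 2.0
    # The loop overshoots by one halving, so step back up.
    return 2.0 * epsilon

eps = machine_epsilon()

# Eq. (3.11) with b = 2 and t = number of mantissa bits in a Python float.
b = sys.float_info.radix
t = sys.float_info.mant_dig
eps_formula = float(b) ** (1 - t)

print(eps, eps_formula, sys.float_info.epsilon)
```

All three printed values should coincide, which is a quick check that Eq. (3.11) describes the machine the code is running on.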
With this acknowledged, there are still cases where round-off error becomes critical. For this reason most computers allow the specification of extended precision. The most common of these is double precision, in which the number of words used to store floating-point numbers is doubled. It provides about 15 to 16 decimal digits of precision and a range of approximately $10^{-308}$ to $10^{308}$.

In many cases, the use of double-precision quantities can greatly mitigate the effect of round-off errors. However, a price is paid for such remedies in that they also require more memory and execution time. The difference in execution time for a small calculation might seem insignificant. However, as your programs become larger and more complicated, the added execution time could become considerable and have a negative impact on your effectiveness as a problem solver. Therefore, extended precision should not be used frivolously. Rather, it should be selectively employed where it will yield the maximum benefit at the least cost in terms of execution time. In the following sections, we will look more closely at how round-off errors affect computations, and in so doing provide a foundation of understanding to guide your use of the double-precision capability.

Before proceeding, it should be noted that some of the commonly used software packages (for example, Excel, Mathcad) routinely use double precision to represent numerical quantities. Thus, the developers of these packages decided that mitigating round-off errors would take precedence over any loss of speed incurred by using extended precision. Others, like MATLAB software, allow you to use extended precision, if you desire.

3.4.2 Arithmetic Manipulations of Computer Numbers

Aside from the limitations of a computer's number system, the actual arithmetic manipulations involving these numbers can also result in round-off error.
In the following section, we will first illustrate how common arithmetic operations affect round-off errors. Then we will investigate a number of particular manipulations that are especially prone to round-off errors.

Common Arithmetic Operations. Because of their familiarity, normalized base-10 numbers will be employed to illustrate the effect of round-off errors on simple addition, subtraction, multiplication, and division. Other number bases would behave in a similar fashion. To simplify the discussion, we will employ a hypothetical decimal computer with a 4-digit mantissa and a 1-digit exponent. In addition, chopping is used. Rounding would lead to similar though less dramatic errors.

When two floating-point numbers are added, the mantissa of the number with the smaller exponent is modified so that the exponents are the same. This has the effect of aligning the decimal points. For example, suppose we want to add $0.1557 \times 10^{1} + 0.4381 \times 10^{-1}$. The decimal of the mantissa of the second number is shifted to the left a number of places equal to the difference of the exponents $[1 - (-1) = 2]$, as in

    $0.4381 \times 10^{-1} \rightarrow 0.004381 \times 10^{1}$

Now the numbers can be added,

    $0.1557 \times 10^{1} + 0.004381 \times 10^{1} = 0.160081 \times 10^{1}$

and the result chopped to $0.1600 \times 10^{1}$. Notice how the last two digits of the second number that were shifted to the right have essentially been lost from the computation.
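The chopped addition above can be imitated with Python's standard `decimal` module. This is a minimal sketch under stated assumptions: the names `CHOP4` and `chopped_add` are hypothetical, the context models only the 4-digit mantissa with chopping (`ROUND_DOWN`), and the 1-digit exponent limit of the hypothetical computer is not modeled.

```python
from decimal import Context, Decimal, ROUND_DOWN

# Hypothetical 4-digit machine: every result keeps 4 significant digits,
# and any further digits are chopped (ROUND_DOWN) rather than rounded.
CHOP4 = Context(prec=4, rounding=ROUND_DOWN)

def chopped_add(x, y, ctx=CHOP4):
    """Add two decimal strings exactly, then chop the sum to 4 digits."""
    return ctx.add(Decimal(x), Decimal(y))

exact = Decimal("1.557") + Decimal("0.04381")   # 0.1557e1 + 0.4381e-1
chopped = chopped_add("1.557", "0.04381")

print(exact)    # exact sum, all digits kept
print(chopped)  # sum as stored by the 4-digit machine
```

The difference between the two printed values is the part of the second operand that the alignment and chopping discard.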
  • 91. Subtraction is performed identically to addition except that the sign of the subtrahend is reversed. For example, suppose that we are subtracting 26.86 from 36.41. That is,

    $0.3641 \times 10^{2} - 0.2686 \times 10^{2} = 0.0955 \times 10^{2}$

For this case the result is not normalized, and so we must shift the decimal one place to the right to give $0.9550 \times 10^{1} = 9.550$. Notice that the zero added to the end of the mantissa is not significant but is merely appended to fill the empty space created by the shift. Even more dramatic results would be obtained when the numbers are very close, as in

    $0.7642 \times 10^{3} - 0.7641 \times 10^{3} = 0.0001 \times 10^{3}$

which would be converted to $0.1000 \times 10^{0} = 0.1000$. Thus, for this case, three nonsignificant zeros are appended. This introduces a substantial computational error because subsequent manipulations would act as if these zeros were significant. As we will see in a later section, the loss of significance during the subtraction of nearly equal numbers is among the greatest sources of round-off error in numerical methods.

Multiplication and division are somewhat more straightforward than addition or subtraction. The exponents are added and the mantissas multiplied. Because multiplication of two n-digit mantissas will yield a 2n-digit result, most computers hold intermediate results in a double-length register. For example,

    $0.1363 \times 10^{3} \times 0.6423 \times 10^{-1} = 0.08754549 \times 10^{2}$

If, as in this case, a leading zero is introduced, the result is normalized,

    $0.08754549 \times 10^{2} \rightarrow 0.8754549 \times 10^{1}$

and chopped to give

    $0.8754 \times 10^{1}$

Division is performed in a similar manner, but the mantissas are divided and the exponents are subtracted. Then the results are normalized and chopped.

Large Computations. Certain methods require extremely large numbers of arithmetic manipulations to arrive at their final results. In addition, these computations are often interdependent. That is, the later calculations are dependent on the results of earlier ones.
Consequently, even though an individual round-off error could be small, the cumulative effect over the course of a large computation can be significant.

EXAMPLE 3.7 Large Numbers of Interdependent Computations

Problem Statement. Investigate the effect of round-off error on large numbers of interdependent computations. Develop a program to sum a number 100,000 times. Sum the number 1 in single precision, and 0.00001 in single and double precision.

Solution. Figure 3.12 shows a Fortran 90 program that performs the summation. Whereas the single-precision summation of 1 yields the expected result, the single-precision
  • 92. summation of 0.00001 yields a large discrepancy. This error is reduced significantly when 0.00001 is summed in double precision.

Quantizing errors are the source of the discrepancies. Because the integer 1 can be represented exactly within the computer, it can be summed exactly. In contrast, 0.00001 cannot be represented exactly and is quantized by a value that is slightly different from its true value. Whereas this very slight discrepancy would be negligible for a small computation, it accumulates after repeated summations. The problem still occurs in double precision but is greatly mitigated because the quantizing error is much smaller.

    PROGRAM fig0312
    IMPLICIT none
    INTEGER::i
    REAL::sum1, sum2, x1, x2
    DOUBLE PRECISION::sum3, x3
    sum1=0.
    sum2=0.
    sum3=0.
    x1=1.
    x2=1.e-5
    x3=1.d-5
    DO i=1,100000
      sum1=sum1+x1
      sum2=sum2+x2
      sum3=sum3+x3
    END DO
    PRINT *, sum1
    PRINT *, sum2
    PRINT *, sum3
    END

    output:
    100000.000000
    1.000990
    9.999999999980838E-001

FIGURE 3.12 Fortran 90 program to sum a number $10^{5}$ times. The case sums the number 1 in single precision and the number $10^{-5}$ in single and double precision.

Note that the type of error illustrated by the previous example is somewhat atypical in that all the errors in the repeated operation are of the same sign. In most cases the errors of a long computation alternate sign in a random fashion and, thus, often cancel out. However, there are also instances where such errors do not cancel but, in fact, lead to a spurious final result. The following sections are intended to provide insight into ways in which this may occur.

Adding a Large and a Small Number. Suppose we add a small number, 0.0010, to a large number, 4000, using a hypothetical computer with the 4-digit mantissa and the 1-digit exponent. We modify the smaller number so that its exponent matches the larger,

    $0.4000 \times 10^{4} + 0.0000001 \times 10^{4} = 0.4000001 \times 10^{4}$
  • 93. which is chopped to $0.4000 \times 10^{4}$. Thus, we might as well have not performed the addition!

This type of error can occur in the computation of an infinite series. The initial terms in such series are often relatively large in comparison with the later terms. Thus, after a few terms have been added, we are in the situation of adding a small quantity to a large quantity. One way to mitigate this type of error is to sum the series in reverse order, that is, in ascending rather than descending order. In this way, each new term will be of comparable magnitude to the accumulated sum (see Prob. 3.5).

Subtractive Cancellation. This term refers to the round-off induced when subtracting two nearly equal floating-point numbers. One common instance where this can occur involves finding the roots of a quadratic equation or parabola with the quadratic formula,

    $x_1, x_2 = \dfrac{-b \pm \sqrt{b^2 - 4ac}}{2a}$    (3.12)

For cases where $b^2 \gg 4ac$, the difference in the numerator can be very small. In such cases, double precision can mitigate the problem. In addition, an alternative formulation can be used to minimize subtractive cancellation,

    $x_1, x_2 = \dfrac{-2c}{b \pm \sqrt{b^2 - 4ac}}$    (3.13)

An illustration of the problem and the use of this alternative formula are provided in the following example.

EXAMPLE 3.8 Subtractive Cancellation

Problem Statement. Compute the values of the roots of a quadratic equation with $a = 1$, $b = 3000.001$, and $c = 3$. Check the computed values versus the true roots of $x_1 = -0.001$ and $x_2 = -3000$.

Solution. Figure 3.13 shows an Excel/VBA program that computes the roots $x_1$ and $x_2$ on the basis of the quadratic formula [Eq. (3.12)]. Note that both single- and double-precision versions are given. Whereas the results for $x_2$ are adequate, the percent relative errors for $x_1$ are poor for the single-precision version, $\varepsilon_t = 2.4\%$. This level could be inadequate for many applied engineering problems.
This result is particularly surprising because we are employing an analytical formula to obtain our solution! The loss of significance occurs in the line of both programs where two relatively large numbers are subtracted. Similar problems do not occur when the same numbers are added. On the basis of the above, we can draw the general conclusion that the quadratic formula will be susceptible to subtractive cancellation whenever $b^2 \gg 4ac$. One way to circumvent this problem is to use double precision. Another is to recast the quadratic formula in the format of Eq. (3.13). As in the program output, both options give a much smaller error because the subtractive cancellation is minimized or avoided.
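The comparison described above can be reproduced approximately in Python. This is a hedged sketch rather than the program of Fig. 3.13: the helper names `f32`, `roots_single`, `roots_double`, and `pct_err` are hypothetical, and single precision is only emulated by rounding every intermediate result to the nearest IEEE single-precision value with the standard `struct` module.

```python
import math
import struct

def f32(x):
    """Round a Python float (double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def roots_single(a, b, c):
    """Quadratic roots with every intermediate result rounded to single precision."""
    a, b, c = f32(a), f32(b), f32(c)
    disc = f32(f32(b * b) - f32(f32(4.0 * a) * c))
    d = f32(math.sqrt(disc))
    x1 = f32(f32(-b + d) / f32(2.0 * a))    # Eq. (3.12): nearly equal numbers subtracted
    x2 = f32(f32(-b - d) / f32(2.0 * a))    # Eq. (3.12): magnitudes added, no cancellation
    x1r = f32(f32(-2.0 * c) / f32(b + d))   # Eq. (3.13) applied to the first root
    return x1, x2, x1r

def roots_double(a, b, c):
    """Same formula, Eq. (3.12), in ordinary double precision."""
    d = math.sqrt(b * b - 4.0 * a * c)
    return (-b + d) / (2.0 * a), (-b - d) / (2.0 * a)

def pct_err(approx, true):
    """True percent relative error (absolute value)."""
    return abs((true - approx) / true) * 100.0

x1, x2, x1r = roots_single(1.0, 3000.001, 3.0)
x11, x22 = roots_double(1.0, 3000.001, 3.0)

print("single, Eq. (3.12):", x1, x2, pct_err(x1, -0.001))
print("double, Eq. (3.12):", x11, x22, pct_err(x11, -0.001))
print("single, Eq. (3.13):", x1r, pct_err(x1r, -0.001))
```

Under this emulation the single-precision error in the first root is a few percent, while both the double-precision result and the recast formula bring it down by several orders of magnitude, in line with the discussion above.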
  • 94.
    Option Explicit
    Sub fig0313()
    Dim a As Single, b As Single
    Dim c As Single, d As Single
    Dim x1 As Single, x2 As Single
    Dim x1r As Single
    Dim aa As Double, bb As Double
    Dim cc As Double, dd As Double
    Dim x11 As Double, x22 As Double
    'Single precision:
    a = 1: b = 3000.001: c = 3
    d = Sqr(b * b - 4 * a * c)
    x1 = (-b + d) / (2 * a)
    x2 = (-b - d) / (2 * a)
    'Double precision:
    aa = 1: bb = 3000.001: cc = 3
    dd = Sqr(bb * bb - 4 * aa * cc)
    x11 = (-bb + dd) / (2 * aa)
    x22 = (-bb - dd) / (2 * aa)
    'Modified formula for first root
    'single precision:
    x1r = -2 * c / (b + d)
    'Display results
    Sheets("sheet1").Select
    Range("b2").Select
    ActiveCell.Value = x1
    ActiveCell.Offset(1, 0).Select
    ActiveCell.Value = x2
    ActiveCell.Offset(2, 0).Select
    ActiveCell.Value = x11
    ActiveCell.Offset(1, 0).Select
    ActiveCell.Value = x22
    ActiveCell.Offset(2, 0).Select
    ActiveCell.Value = x1r
    End Sub

    OUTPUT: [worksheet image not reproduced in this transcript]

FIGURE 3.13 Excel/VBA program to determine the roots of a quadratic.

Note that, as in the foregoing example, there are times when subtractive cancellation can be circumvented by using a transformation. However, the only general remedy is to employ extended precision.

Smearing. Smearing occurs whenever the individual terms in a summation are larger than the summation itself. As in the following example, one case where this occurs is in series of mixed signs.

EXAMPLE 3.9 Evaluation of $e^x$ using Infinite Series

Problem Statement. The exponential function $y = e^x$ is given by the infinite series

    $y = 1 + x + \dfrac{x^2}{2} + \dfrac{x^3}{3!} + \cdots$

Evaluate this function for $x = 10$ and $x = -10$, and be attentive to the problems of round-off error.

Solution. Figure 3.14a gives an Excel/VBA program that uses the infinite series to evaluate $e^x$. The variable i is the number of terms in the series, term is the value of the
  • 95. current term added to the series, and sum is the accumulative value of the series. The variable test is the preceding accumulative value of the series prior to adding term. The series is terminated when the computer cannot detect the difference between test and sum.

Figure 3.14b shows the results of running the program for $x = 10$. Note that this case is completely satisfactory. The final result is achieved in 31 terms with the series identical to the library function value within seven significant figures.

(a) Program

    Option Explicit
    Sub fig0314()
    Dim term As Single, test As Single
    Dim sum As Single, x As Single
    Dim i As Integer
    i = 0: term = 1#: sum = 1#: test = 0#
    Sheets("sheet1").Select
    Range("b1").Select
    x = ActiveCell.Value
    Range("a3:c1003").ClearContents
    Range("a3").Select
    Do
      If sum = test Then Exit Do
      ActiveCell.Value = i
      ActiveCell.Offset(0, 1).Select
      ActiveCell.Value = term
      ActiveCell.Offset(0, 1).Select
      ActiveCell.Value = sum
      ActiveCell.Offset(1, -2).Select
      i = i + 1
      test = sum
      term = x ^ i / _
        Application.WorksheetFunction.Fact(i)
      sum = sum + term
    Loop
    ActiveCell.Offset(0, 1).Select
    ActiveCell.Value = "Exact value = "
    ActiveCell.Offset(0, 1).Select
    ActiveCell.Value = Exp(x)
    End Sub

(b) Evaluation of $e^{10}$ [worksheet image not reproduced]
(c) Evaluation of $e^{-10}$ [worksheet image not reproduced]

FIGURE 3.14 (a) An Excel/VBA program to evaluate $e^x$ using an infinite series. (b) Evaluation of $e^x$. (c) Evaluation of $e^{-x}$.

Figure 3.14c shows similar results for $x = -10$. However, for this case, the results of the series calculation are not even the same sign as the true result. As a matter of fact, the negative results are open to serious question because $e^x$ can never be less than zero. The problem here is caused by round-off error. Note that many of the terms that make up the
  • 96. sum are much larger than the final result of the sum. Furthermore, unlike the previous case, the individual terms vary in sign. Thus, in effect we are adding and subtracting large numbers (each with some small error) and placing great significance on the differences, that is, subtractive cancellation. Thus, we can see that the culprit behind this example of smearing is, in fact, subtractive cancellation. For such cases it is appropriate to seek some other computational strategy. For example, one might try to compute $y = e^{-10}$ as $y = (e^{-1})^{10}$. Other than such a reformulation, the only general recourse is extended precision.

Inner Products. As should be clear from the last sections, some infinite series are particularly prone to round-off error. Fortunately, the calculation of series is not one of the more common operations in numerical methods. A far more ubiquitous manipulation is the calculation of inner products, as in

    $\sum_{i=1}^{n} x_i y_i = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n$

This operation is very common, particularly in the solution of simultaneous linear algebraic equations. Such summations are prone to round-off error. Consequently, it is often desirable to compute such summations in extended precision.

Although the foregoing sections should provide rules of thumb to mitigate round-off error, they do not provide a direct means beyond trial and error to actually determine the effect of such errors on a computation. In Chap. 4, we will introduce the Taylor series, which will provide a mathematical approach for estimating these effects.

PROBLEMS

3.1 Convert the following base-2 numbers to base-10: (a) 101101, (b) 101.011, and (c) 0.01101.
3.2 Convert the following base-8 numbers to base-10: 71,263 and 3.147.
3.3 Compose your own program based on Fig. 3.11 and use it to determine your computer’s machine epsilon.
3.4 In a fashion similar to that in Fig.
3.11, write a short program to determine the smallest number, xmin, used on the computer you will be employing along with this book. Note that your computer will be unable to reliably distinguish between zero and a quantity that is smaller than this number.
3.5 The infinite series

    $f(n) = \sum_{i=1}^{n} \dfrac{1}{i^4}$

converges on a value of $f(n) = \pi^4/90$ as n approaches infinity. Write a program in single precision to calculate $f(n)$ for $n = 10{,}000$ by computing the sum from $i = 1$ to 10,000. Then repeat the calculation but in reverse order, that is, from $i = 10{,}000$ to 1 using increments of $-1$. In each case, compute the true percent relative error. Explain the results.
3.6 Evaluate $e^{-5}$ using two approaches

    $e^{-x} = 1 - x + \dfrac{x^2}{2} - \dfrac{x^3}{3!} + \cdots$

and

    $e^{-x} = \dfrac{1}{e^x} = \dfrac{1}{1 + x + \dfrac{x^2}{2} + \dfrac{x^3}{3!} + \cdots}$

and compare with the true value of $6.737947 \times 10^{-3}$. Use 20 terms to evaluate each series and compute true and approximate relative errors as terms are added.
3.7 The derivative of $f(x) = 1/(1 - 3x^2)$ is given by

    $\dfrac{6x}{(1 - 3x^2)^2}$

Do you expect to have difficulties evaluating this function at $x = 0.577$? Try it using 3- and 4-digit arithmetic with chopping.
3.8 (a) Evaluate the polynomial

    $y = x^3 - 5x^2 + 6x + 0.55$
  • 97. at $x = 1.37$. Use 3-digit arithmetic with chopping. Evaluate the percent relative error. (b) Repeat (a) but express y as

    $y = ((x - 5)x + 6)x + 0.55$

Evaluate the error and compare with part (a).
3.9 Calculate the random access memory (RAM) in megabytes necessary to store a multidimensional array that is $20 \times 40 \times 120$. This array is double precision, and each value requires a 64-bit word. Recall that a 64-bit word = 8 bytes and 1 kilobyte = $2^{10}$ bytes. Assume that the index starts at 1.
3.10 Determine the number of terms necessary to approximate cos x to 8 significant figures using the Maclaurin series approximation

    $\cos x = 1 - \dfrac{x^2}{2} + \dfrac{x^4}{4!} - \dfrac{x^6}{6!} + \dfrac{x^8}{8!} - \cdots$

Calculate the approximation using a value of $x = 0.3\pi$. Write a program to determine your result.
3.11 Use 5-digit arithmetic with chopping to determine the roots of the following equation with Eqs. (3.12) and (3.13)

    $x^2 - 5000.002x + 10$

Compute percent relative errors for your results.
3.12 How can the machine epsilon be employed to formulate a stopping criterion $\varepsilon_s$ for your programs? Provide an example.
3.13 The “divide and average” method, an old-time method for approximating the square root of any positive number a, can be formulated as

    $x = \dfrac{x + a/x}{2}$

Write a well-structured function to implement this algorithm based on the algorithm outlined in Fig. 3.3.
  • 98. CHAPTER 4

Truncation Errors and the Taylor Series

Truncation errors are those that result from using an approximation in place of an exact mathematical procedure. For example, in Chap. 1 we approximated the derivative of velocity of a falling parachutist by a finite-divided-difference equation of the form [Eq. (1.11)]

    $\dfrac{dv}{dt} \cong \dfrac{\Delta v}{\Delta t} = \dfrac{v(t_{i+1}) - v(t_i)}{t_{i+1} - t_i}$    (4.1)

A truncation error was introduced into the numerical solution because the difference equation only approximates the true value of the derivative (recall Fig. 1.4). In order to gain insight into the properties of such errors, we now turn to a mathematical formulation that is used widely in numerical methods to express functions in an approximate fashion: the Taylor series.

4.1 THE TAYLOR SERIES

Taylor’s theorem (Box 4.1) and its associated formula, the Taylor series, are of great value in the study of numerical methods. In essence, the Taylor series provides a means to predict a function value at one point in terms of the function value and its derivatives at another point. In particular, the theorem states that any smooth function can be approximated as a polynomial.

A useful way to gain insight into the Taylor series is to build it term by term. For example, the first term in the series is

    $f(x_{i+1}) \cong f(x_i)$    (4.2)

This relationship, called the zero-order approximation, indicates that the value of f at the new point is the same as its value at the old point. This result makes intuitive sense because if $x_i$ and $x_{i+1}$ are close to each other, it is likely that the new value is similar to the old value.

Equation (4.2) provides a perfect estimate if the function being approximated is, in fact, a constant. However, if the function changes at all over the interval, additional terms
  • 99. Box 4.1 Taylor’s Theorem

Taylor’s Theorem
If the function f and its first n + 1 derivatives are continuous on an interval containing a and x, then the value of the function at x is given by

    $f(x) = f(a) + f'(a)(x - a) + \dfrac{f''(a)}{2!}(x - a)^2 + \dfrac{f^{(3)}(a)}{3!}(x - a)^3 + \cdots + \dfrac{f^{(n)}(a)}{n!}(x - a)^n + R_n$    (B4.1.1)

where the remainder $R_n$ is defined as

    $R_n = \int_a^x \dfrac{(x - t)^n}{n!} f^{(n+1)}(t)\,dt$    (B4.1.2)

where t = a dummy variable. Equation (B4.1.1) is called the Taylor series or Taylor’s formula. If the remainder is omitted, the right side of Eq. (B4.1.1) is the Taylor polynomial approximation to f(x). In essence, the theorem states that any smooth function can be approximated as a polynomial.

Equation (B4.1.2) is but one way, called the integral form, by which the remainder can be expressed. An alternative formulation can be derived on the basis of the integral mean-value theorem.

First Theorem of Mean for Integrals
If the function g is continuous and integrable on an interval containing a and x, then there exists a point $\xi$ between a and x such that

    $\int_a^x g(t)\,dt = g(\xi)(x - a)$    (B4.1.3)

In other words, this theorem states that the integral can be represented by an average value for the function $g(\xi)$ times the interval length $x - a$. Because the average must occur between the minimum and maximum values for the interval, there is a point $x = \xi$ at which the function takes on the average value.

The first theorem is in fact a special case of a second mean-value theorem for integrals.

Second Theorem of Mean for Integrals
If the functions g and h are continuous and integrable on an interval containing a and x, and h does not change sign in the interval, then there exists a point $\xi$ between a and x such that

    $\int_a^x g(t)h(t)\,dt = g(\xi) \int_a^x h(t)\,dt$    (B4.1.4)

Thus, Eq. (B4.1.3) is equivalent to Eq. (B4.1.4) with $h(t) = 1$.

The second theorem can be applied to Eq. (B4.1.2) with

    $g(t) = f^{(n+1)}(t) \qquad h(t) = \dfrac{(x - t)^n}{n!}$
As t varies from a to x, h(t) is continuous and does not change sign. Therefore, if $f^{(n+1)}(t)$ is continuous, then the integral mean-value theorem holds and

    $R_n = \dfrac{f^{(n+1)}(\xi)}{(n + 1)!}(x - a)^{n+1}$

This equation is referred to as the derivative or Lagrange form of the remainder.

of the Taylor series are required to provide a better estimate. For example, the first-order approximation is developed by adding another term to yield

    $f(x_{i+1}) \cong f(x_i) + f'(x_i)(x_{i+1} - x_i)$    (4.3)

The additional first-order term consists of a slope $f'(x_i)$ multiplied by the distance between $x_i$ and $x_{i+1}$. Thus, the expression is now in the form of a straight line and is capable of predicting an increase or decrease of the function between $x_i$ and $x_{i+1}$.

Although Eq. (4.3) can predict a change, it is exact only for a straight-line, or linear, trend. Therefore, a second-order term is added to the series to capture some of the curvature that the function might exhibit:

    $f(x_{i+1}) \cong f(x_i) + f'(x_i)(x_{i+1} - x_i) + \dfrac{f''(x_i)}{2!}(x_{i+1} - x_i)^2$    (4.4)
  • 100. In a similar manner, additional terms can be included to develop the complete Taylor series expansion:

    $f(x_{i+1}) = f(x_i) + f'(x_i)(x_{i+1} - x_i) + \dfrac{f''(x_i)}{2!}(x_{i+1} - x_i)^2 + \dfrac{f^{(3)}(x_i)}{3!}(x_{i+1} - x_i)^3 + \cdots + \dfrac{f^{(n)}(x_i)}{n!}(x_{i+1} - x_i)^n + R_n$    (4.5)

Note that because Eq. (4.5) is an infinite series, an equal sign replaces the approximate sign that was used in Eqs. (4.2) through (4.4). A remainder term is included to account for all terms from n + 1 to infinity:

    $R_n = \dfrac{f^{(n+1)}(\xi)}{(n + 1)!}(x_{i+1} - x_i)^{n+1}$    (4.6)

where the subscript n connotes that this is the remainder for the nth-order approximation and $\xi$ is a value of x that lies somewhere between $x_i$ and $x_{i+1}$. The introduction of the $\xi$ is so important that we will devote an entire section (Sec. 4.1.1) to its derivation. For the time being, it is sufficient to recognize that there is such a value that provides an exact determination of the error.

It is often convenient to simplify the Taylor series by defining a step size $h = x_{i+1} - x_i$ and expressing Eq. (4.5) as

    $f(x_{i+1}) = f(x_i) + f'(x_i)h + \dfrac{f''(x_i)}{2!}h^2 + \dfrac{f^{(3)}(x_i)}{3!}h^3 + \cdots + \dfrac{f^{(n)}(x_i)}{n!}h^n + R_n$    (4.7)

where the remainder term is now

    $R_n = \dfrac{f^{(n+1)}(\xi)}{(n + 1)!}h^{n+1}$    (4.8)

EXAMPLE 4.1 Taylor Series Approximation of a Polynomial

Problem Statement. Use zero- through fourth-order Taylor series expansions to approximate the function

    $f(x) = -0.1x^4 - 0.15x^3 - 0.5x^2 - 0.25x + 1.2$

from $x_i = 0$ with $h = 1$. That is, predict the function’s value at $x_{i+1} = 1$.

Solution. Because we are dealing with a known function, we can compute values for f(x) between 0 and 1. The results (Fig. 4.1) indicate that the function starts at $f(0) = 1.2$ and then curves downward to $f(1) = 0.2$. Thus, the true value that we are trying to predict is 0.2.

The Taylor series approximation with $n = 0$ is [Eq. (4.2)]

    $f(x_{i+1}) \cong 1.2$
  • 101. Thus, as in Fig. 4.1, the zero-order approximation is a constant. Using this formulation results in a truncation error [recall Eq. (3.2)] of

    $E_t = 0.2 - 1.2 = -1.0$

at $x = 1$.

For $n = 1$, the first derivative must be determined and evaluated at $x = 0$:

    $f'(0) = -0.4(0.0)^3 - 0.45(0.0)^2 - 1.0(0.0) - 0.25 = -0.25$

Therefore, the first-order approximation is [Eq. (4.3)]

    $f(x_{i+1}) \cong 1.2 - 0.25h$

which can be used to compute $f(1) = 0.95$. Consequently, the approximation begins to capture the downward trajectory of the function in the form of a sloping straight line (Fig. 4.1). This results in a reduction of the truncation error to

    $E_t = 0.2 - 0.95 = -0.75$

For $n = 2$, the second derivative is evaluated at $x = 0$:

    $f''(0) = -1.2(0.0)^2 - 0.9(0.0) - 1.0 = -1.0$

Therefore, according to Eq. (4.4),

    $f(x_{i+1}) \cong 1.2 - 0.25h - 0.5h^2$

and substituting $h = 1$, $f(1) = 0.45$. The inclusion of the second derivative now adds some downward curvature resulting in an improved estimate, as seen in Fig. 4.1. The truncation error is reduced further to $0.2 - 0.45 = -0.25$.

FIGURE 4.1 The approximation of $f(x) = -0.1x^4 - 0.15x^3 - 0.5x^2 - 0.25x + 1.2$ at $x = 1$ by zero-order, first-order, and second-order Taylor series expansions. [Plot not reproduced: f(x) versus x from $x_i = 0$ to $x_{i+1} = 1$, with curves labeled True, Zero order, First order, and Second order.]
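The zero- through second-order predictions worked out so far can be checked with a few lines of Python. This is a minimal sketch, not part of the original example: the names `DERIVS_AT_0` and `taylor` are hypothetical, and the derivative values are the ones evaluated by hand above.

```python
from math import factorial

def f(x):
    """The polynomial of Example 4.1."""
    return -0.1 * x**4 - 0.15 * x**3 - 0.5 * x**2 - 0.25 * x + 1.2

# f(0), f'(0), f''(0), as evaluated by hand in the example
DERIVS_AT_0 = [1.2, -0.25, -1.0]

def taylor(derivs, h, order):
    """Taylor prediction of the given order, Eq. (4.7) without the remainder."""
    return sum(derivs[k] * h**k / factorial(k) for k in range(order + 1))

h = 1.0
true_value = f(0.0 + h)
for n in range(3):
    estimate = taylor(DERIVS_AT_0, h, n)
    print(n, estimate, true_value - estimate)   # order, prediction, E_t
```

Up to floating-point round-off, the printed truncation errors should match the values of $E_t$ obtained above for each order.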
  • 102. Additional terms would improve the approximation even more. In fact, the inclusion of the third and the fourth derivatives results in exactly the same equation we started with:

    $f(x) = 1.2 - 0.25h - 0.5h^2 - 0.15h^3 - 0.1h^4$

where the remainder term is

    $R_4 = \dfrac{f^{(5)}(\xi)}{5!}h^5 = 0$

because the fifth derivative of a fourth-order polynomial is zero. Consequently, the Taylor series expansion to the fourth derivative yields an exact estimate at $x_{i+1} = 1$:

    $f(1) = 1.2 - 0.25(1) - 0.5(1)^2 - 0.15(1)^3 - 0.1(1)^4 = 0.2$

In general, the nth-order Taylor series expansion will be exact for an nth-order polynomial. For other differentiable and continuous functions, such as exponentials and sinusoids, a finite number of terms will not yield an exact estimate. Each additional term will contribute some improvement, however slight, to the approximation. This behavior will be demonstrated in Example 4.2. Only if an infinite number of terms are added will the series yield an exact result.

Although the above is true, the practical value of Taylor series expansions is that, in most cases, the inclusion of only a few terms will result in an approximation that is close enough to the true value for practical purposes. The assessment of how many terms are required to get “close enough” is based on the remainder term of the expansion. Recall that the remainder term is of the general form of Eq. (4.8). This relationship has two major drawbacks. First, $\xi$ is not known exactly but merely lies somewhere between $x_i$ and $x_{i+1}$. Second, to evaluate Eq. (4.8), we need to determine the (n + 1)th derivative of f(x). To do this, we need to know f(x). However, if we knew f(x), there would be no need to perform the Taylor series expansion in the present context!

Despite this dilemma, Eq. (4.8) is still useful for gaining insight into truncation errors. This is because we do have control over the term h in the equation.
In other words, we can choose how far away from x we want to evaluate f(x), and we can control the number of terms we include in the expansion. Consequently, Eq. (4.8) is usually expressed as

    $R_n = O(h^{n+1})$

where the nomenclature $O(h^{n+1})$ means that the truncation error is of the order of $h^{n+1}$. That is, the error is proportional to the step size h raised to the (n + 1)th power. Although this approximation implies nothing regarding the magnitude of the derivatives that multiply $h^{n+1}$, it is extremely useful in judging the comparative error of numerical methods based on Taylor series expansions. For example, if the error is $O(h)$, halving the step size will halve the error. On the other hand, if the error is $O(h^2)$, halving the step size will quarter the error.

In general, we can usually assume that the truncation error is decreased by the addition of terms to the Taylor series. In many cases, if h is sufficiently small, the first- and other lower-order terms usually account for a disproportionately high percent of the error. Thus, only a few terms are required to obtain an adequate estimate. This property is illustrated by the following example.
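The halving and quartering behavior described above can also be observed numerically. The sketch below is an illustration under stated assumptions, not part of the original text: it reuses the polynomial of Example 4.1 expanded about $x_i = 0$, the function names are hypothetical, and the step sizes 0.05 and 0.025 are arbitrary small values chosen so that the leading error term dominates.

```python
def f(x):
    """The polynomial of Example 4.1."""
    return -0.1 * x**4 - 0.15 * x**3 - 0.5 * x**2 - 0.25 * x + 1.2

def zero_order_error(h):
    # True value minus the zero-order prediction f(0) = 1.2, so R_0 = O(h)
    return f(h) - 1.2

def first_order_error(h):
    # True value minus the first-order prediction 1.2 - 0.25h, so R_1 = O(h^2)
    return f(h) - (1.2 - 0.25 * h)

h = 0.05
ratio_zero = zero_order_error(h) / zero_order_error(h / 2)
ratio_first = first_order_error(h) / first_order_error(h / 2)

print(ratio_zero)    # close to 2: halving h roughly halves an O(h) error
print(ratio_first)   # close to 4: halving h roughly quarters an O(h^2) error
```

The ratios are not exactly 2 and 4 because the higher-order terms of the remainder still contribute; they approach those values as h shrinks.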
  • 103. EXAMPLE 4.2 Use of Taylor Series Expansion to Approximate a Function with an Infinite Number of Derivatives

Problem Statement. Use Taylor series expansions with n = 0 to 6 to approximate $f(x) = \cos x$ at $x_{i+1} = \pi/3$ on the basis of the value of f(x) and its derivatives at $x_i = \pi/4$. Note that this means that $h = \pi/3 - \pi/4 = \pi/12$.

Solution. As with Example 4.1, our knowledge of the true function means that we can determine the correct value $f(\pi/3) = 0.5$.

The zero-order approximation is [Eq. (4.2)]

    $f\left(\dfrac{\pi}{3}\right) \cong \cos\left(\dfrac{\pi}{4}\right) = 0.707106781$

which represents a percent relative error of

    $\varepsilon_t = \dfrac{0.5 - 0.707106781}{0.5} \times 100\% = -41.4\%$

For the first-order approximation, we add the first derivative term where $f'(x) = -\sin x$:

    $f\left(\dfrac{\pi}{3}\right) \cong \cos\left(\dfrac{\pi}{4}\right) - \sin\left(\dfrac{\pi}{4}\right)\left(\dfrac{\pi}{12}\right) = 0.521986659$

which has $\varepsilon_t = -4.40$ percent.

For the second-order approximation, we add the second derivative term where $f''(x) = -\cos x$:

    $f\left(\dfrac{\pi}{3}\right) \cong \cos\left(\dfrac{\pi}{4}\right) - \sin\left(\dfrac{\pi}{4}\right)\left(\dfrac{\pi}{12}\right) - \dfrac{\cos(\pi/4)}{2}\left(\dfrac{\pi}{12}\right)^2 = 0.497754491$

with $\varepsilon_t = 0.449$ percent. Thus, the inclusion of additional terms results in an improved estimate.

The process can be continued and the results listed, as in Table 4.1.

TABLE 4.1 Taylor series approximation of $f(x) = \cos x$ at $x_{i+1} = \pi/3$ using a base point of $\pi/4$. Values are shown for various orders (n) of approximation.

    Order n   f^(n)(x)   f(π/3)        ε_t (%)
    0         cos x      0.707106781   -41.4
    1         -sin x     0.521986659   -4.4
    2         -cos x     0.497754491   0.449
    3         sin x      0.499869147   2.62 × 10^-2
    4         cos x      0.500007551   -1.51 × 10^-3
    5         -sin x     0.500000304   -6.08 × 10^-5
    6         -cos x     0.499999988   2.44 × 10^-6

Notice that the derivatives never go to zero, as was the case with the polynomial in Example 4.1. Therefore, each additional term results in some improvement in the estimate. However, also notice how most of the improvement comes with the initial terms. For this case, by the time we have added the third-order term, the error is reduced to $2.62 \times 10^{-2}$ percent,
  • 104. which means that we have attained 99.9738 percent of the true value. Consequently, although the addition of more terms will reduce the error further, the improvement becomes negligible.

4.1.1 The Remainder for the Taylor Series Expansion

Before demonstrating how the Taylor series is actually used to estimate numerical errors, we must explain why we included the argument $\xi$ in Eq. (4.8). A mathematical derivation is presented in Box 4.1. We will now develop an alternative exposition based on a somewhat more visual interpretation. Then we can extend this specific case to the more general formulation.

Suppose that we truncated the Taylor series expansion [Eq. (4.7)] after the zero-order term to yield

    $f(x_{i+1}) \cong f(x_i)$

A visual depiction of this zero-order prediction is shown in Fig. 4.2. The remainder, or error, of this prediction, which is also shown in the illustration, consists of the infinite series of terms that were truncated:

    $R_0 = f'(x_i)h + \dfrac{f''(x_i)}{2!}h^2 + \dfrac{f^{(3)}(x_i)}{3!}h^3 + \cdots$

FIGURE 4.2 Graphical depiction of a zero-order Taylor series prediction and remainder. [Plot not reproduced: f(x) versus x between $x_i$ and $x_{i+1}$, showing the zero-order prediction, the exact prediction, the step h, and the remainder $R_0$.]

It is obviously inconvenient to deal with the remainder in this infinite series format. One simplification might be to truncate the remainder itself, as in

    $R_0 \cong f'(x_i)h$    (4.9)
  • 105. Although, as stated in the previous section, lower-order derivatives usually account for a greater share of the remainder than the higher-order terms, this result is still inexact because of the neglected second- and higher-order terms. This “inexactness” is implied by the approximate equality symbol ($\cong$) employed in Eq. (4.9).

An alternative simplification that transforms the approximation into an equivalence is based on a graphical insight. As in Fig. 4.3, the derivative mean-value theorem states that if a function f(x) and its first derivative are continuous over an interval from $x_i$ to $x_{i+1}$, then there exists at least one point on the function that has a slope, designated by $f'(\xi)$, that is parallel to the line joining $f(x_i)$ and $f(x_{i+1})$. The parameter $\xi$ marks the x value where this slope occurs (Fig. 4.3). A physical illustration of this theorem is that, if you travel between two points with an average velocity, there will be at least one moment during the course of the trip when you will be moving at that average velocity.

FIGURE 4.3 Graphical depiction of the derivative mean-value theorem. [Plot not reproduced: f(x) versus x between $x_i$ and $x_{i+1}$, showing the step h, the remainder $R_0$, a tangent of slope $f'(\xi)$, and the parallel chord of slope $R_0/h$.]

By invoking this theorem it is simple to realize that, as illustrated in Fig. 4.3, the slope $f'(\xi)$ is equal to the rise $R_0$ divided by the run h, or

    $f'(\xi) = \dfrac{R_0}{h}$

which can be rearranged to give

    $R_0 = f'(\xi)h$    (4.10)

Thus, we have derived the zero-order version of Eq. (4.8). The higher-order versions are merely a logical extension of the reasoning used to derive Eq. (4.10). The first-order version is

    $R_1 = \dfrac{f''(\xi)}{2!}h^2$    (4.11)
  • 106. For this case, the value of $\xi$ conforms to the $x$ value corresponding to the second derivative that makes Eq. (4.11) exact. Similar higher-order versions can be developed from Eq. (4.8).

4.1.2 Using the Taylor Series to Estimate Truncation Errors

Although the Taylor series will be extremely useful in estimating truncation errors throughout this book, it may not be clear to you how the expansion can actually be applied to numerical methods. In fact, we have already done so in our example of the falling parachutist. Recall that the objective of both Examples 1.1 and 1.2 was to predict velocity as a function of time. That is, we were interested in determining $v(t)$. As specified by Eq. (4.5), $v(t)$ can be expanded in a Taylor series:

$$v(t_{i+1}) = v(t_i) + v'(t_i)(t_{i+1} - t_i) + \frac{v''(t_i)}{2!}(t_{i+1} - t_i)^2 + \cdots + R_n \tag{4.12}$$

Now let us truncate the series after the first derivative term:

$$v(t_{i+1}) = v(t_i) + v'(t_i)(t_{i+1} - t_i) + R_1 \tag{4.13}$$

Equation (4.13) can be solved for

$$v'(t_i) = \underbrace{\frac{v(t_{i+1}) - v(t_i)}{t_{i+1} - t_i}}_{\text{First-order approximation}} - \underbrace{\frac{R_1}{t_{i+1} - t_i}}_{\text{Truncation error}} \tag{4.14}$$

The first part of Eq. (4.14) is exactly the same relationship that was used to approximate the derivative in Example 1.2 [Eq. (1.11)]. However, because of the Taylor series approach, we have now obtained an estimate of the truncation error associated with this approximation of the derivative. Using Eqs. (4.6) and (4.14) yields

$$\frac{R_1}{t_{i+1} - t_i} = \frac{v''(\xi)}{2!}(t_{i+1} - t_i) \tag{4.15}$$

or

$$\frac{R_1}{t_{i+1} - t_i} = O(t_{i+1} - t_i) \tag{4.16}$$

Thus, the estimate of the derivative [Eq. (1.11) or the first part of Eq. (4.14)] has a truncation error of order $t_{i+1} - t_i$. In other words, the error of our derivative approximation should be proportional to the step size. Consequently, if we halve the step size, we would expect to halve the error of the derivative.

EXAMPLE 4.3 The Effect of Nonlinearity and Step Size on the Taylor Series Approximation

Problem Statement. 
Figure 4.4 is a plot of the function

$$f(x) = x^m \tag{E4.3.1}$$

for $m = 1$, 2, 3, and 4 over the range from $x = 1$ to 2. Notice that for $m = 1$ the function is linear, and as $m$ increases, more curvature or nonlinearity is introduced into the function.
  • 107. Employ the first-order Taylor series to approximate this function for various values of the exponent $m$ and the step size $h$.

FIGURE 4.4 Plot of the function $f(x) = x^m$ for $m = 1$, 2, 3, and 4. Notice that the function becomes more nonlinear as $m$ increases. [Axes: $x$ from 1 to 2, $f(x)$ from 0 to 15; one curve for each value of $m$.]

Solution. Equation (E4.3.1) can be approximated by a first-order Taylor series expansion, as in

$$f(x_{i+1}) = f(x_i) + m x_i^{m-1} h \tag{E4.3.2}$$

which has a remainder

$$R_1 = \frac{f''(x_i)}{2!}h^2 + \frac{f^{(3)}(x_i)}{3!}h^3 + \frac{f^{(4)}(x_i)}{4!}h^4 + \cdots$$

First, we can examine how the approximation performs as $m$ increases, that is, as the function becomes more nonlinear. For $m = 1$, the actual value of the function at $x = 2$ is 2.
  • 108. The Taylor series yields

$$f(2) = 1 + 1(1) = 2$$

and

$$R_1 = 0$$

The remainder is zero because the second and higher derivatives of a linear function are zero. Thus, as expected, the first-order Taylor series expansion is perfect when the underlying function is linear.

For $m = 2$, the actual value is $f(2) = 2^2 = 4$. The first-order Taylor series approximation is

$$f(2) = 1 + 2(1) = 3$$

and

$$R_1 = \frac{2}{2}(1)^2 + 0 + 0 + \cdots = 1$$

Thus, because the function is a parabola, the straight-line approximation results in a discrepancy. Note that the remainder is determined exactly.

For $m = 3$, the actual value is $f(2) = 2^3 = 8$. The Taylor series approximation is

$$f(2) = 1 + 3(1)^2(1) = 4$$

and

$$R_1 = \frac{6}{2}(1)^2 + \frac{6}{6}(1)^3 + 0 + 0 + \cdots = 4$$

Again, there is a discrepancy that can be determined exactly from the Taylor series.

For $m = 4$, the actual value is $f(2) = 2^4 = 16$. The Taylor series approximation is

$$f(2) = 1 + 4(1)^3(1) = 5$$

and

$$R_1 = \frac{12}{2}(1)^2 + \frac{24}{6}(1)^3 + \frac{24}{24}(1)^4 + 0 + 0 + \cdots = 11$$

On the basis of these four cases, we observe that $R_1$ increases as the function becomes more nonlinear. Furthermore, $R_1$ accounts exactly for the discrepancy. This is because Eq. (E4.3.1) is a simple monomial with a finite number of nonzero derivatives. This permits a complete determination of the Taylor series remainder.

Next, we will examine Eq. (E4.3.2) for the case $m = 4$ and observe how $R_1$ changes as the step size $h$ is varied. For $m = 4$, Eq. (E4.3.2) is

$$f(x + h) = f(x) + 4x^3 h$$

If $x = 1$, $f(1) = 1$ and this equation can be expressed as

$$f(1 + h) = 1 + 4h$$

with a remainder of

$$R_1 = 6h^2 + 4h^3 + h^4$$
  • 109. This leads to the conclusion that the discrepancy will decrease as $h$ is reduced. Also, at sufficiently small values of $h$, the error should become proportional to $h^2$. That is, as $h$ is halved, the error will be quartered. This behavior is confirmed by Table 4.2 and Fig. 4.5.

TABLE 4.2 Comparison of the exact value of the function $f(x) = x^4$ with the first-order Taylor series approximation. Both the function and the approximation are evaluated at $x + h$, where $x = 1$.

h           True        First-Order Approximation    R_1
1           16          5                            11
0.5         5.0625      3                            2.0625
0.25        2.441406    2                            0.441406
0.125       1.601807    1.5                          0.101807
0.0625      1.274429    1.25                         0.024429
0.03125     1.130982    1.125                        0.005982
0.015625    1.063980    1.0625                       0.001480

FIGURE 4.5 Log-log plot of the remainder $R_1$ of the first-order Taylor series approximation of the function $f(x) = x^4$ versus step size $h$. A line with a slope of 2 is also shown to indicate that as $h$ decreases, the error becomes proportional to $h^2$. [Axes: $h$ (horizontal) and $R_1$ (vertical), both logarithmic.]

Thus, we conclude that the error of the first-order Taylor series approximation decreases as $m$ approaches 1 and as $h$ decreases. Intuitively, this means that the Taylor
  • 110. series becomes more accurate when the function we are approximating becomes more like a straight line over the interval of interest. This can be accomplished either by reducing the size of the interval or by “straightening” the function by reducing $m$. Obviously, the latter option is usually not available in the real world because the functions we analyze are typically dictated by the physical problem context. Consequently, we do not have control of their lack of linearity, and our only recourse is reducing the step size or including additional terms in the Taylor series expansion.

4.1.3 Numerical Differentiation

Equation (4.14) is given a formal label in numerical methods: it is called a finite divided difference. It can be represented generally as

$$f'(x_i) = \frac{f(x_{i+1}) - f(x_i)}{x_{i+1} - x_i} + O(x_{i+1} - x_i) \tag{4.17}$$

or

$$f'(x_i) = \frac{\Delta f_i}{h} + O(h) \tag{4.18}$$

where $\Delta f_i$ is referred to as the first forward difference and $h$ is called the step size, that is, the length of the interval over which the approximation is made. It is termed a “forward” difference because it utilizes data at $i$ and $i + 1$ to estimate the derivative (Fig. 4.6a). The entire term $\Delta f_i/h$ is referred to as a first finite divided difference.

This forward divided difference is but one of many that can be developed from the Taylor series to approximate derivatives numerically. For example, backward and centered difference approximations of the first derivative can be developed in a fashion similar to the derivation of Eq. (4.14). The former utilizes values at $x_{i-1}$ and $x_i$ (Fig. 4.6b), whereas the latter uses values that are equally spaced around the point at which the derivative is estimated (Fig. 4.6c). More accurate approximations of the first derivative can be developed by including higher-order terms of the Taylor series. Finally, all the above versions can also be developed for second, third, and higher derivatives. 
The following sections provide brief summaries illustrating how some of these cases are derived.

Backward Difference Approximation of the First Derivative. The Taylor series can be expanded backward to calculate a previous value on the basis of a present value, as in

$$f(x_{i-1}) = f(x_i) - f'(x_i)h + \frac{f''(x_i)}{2!}h^2 - \cdots \tag{4.19}$$

Truncating this equation after the first derivative and rearranging yields

$$f'(x_i) \cong \frac{f(x_i) - f(x_{i-1})}{h} = \frac{\nabla f_i}{h} \tag{4.20}$$

where the error is $O(h)$, and $\nabla f_i$ is referred to as the first backward difference. See Fig. 4.6b for a graphical representation.
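As a rough illustration of Eqs. (4.18) and (4.20), the following Python sketch evaluates the forward and backward divided differences and shows that halving the step size roughly halves the error, as an $O(h)$ estimate should. The helper names `forward_difference` and `backward_difference`, and the choice of $f(x) = x^4$ at $x = 1$ as a test case (borrowed from Example 4.3), are assumptions made for illustration only.

```python
def forward_difference(f, x, h):
    """First forward finite divided difference, Eq. (4.18); error is O(h)."""
    return (f(x + h) - f(x)) / h


def backward_difference(f, x, h):
    """First backward finite divided difference, Eq. (4.20); error is O(h)."""
    return (f(x) - f(x - h)) / h


def f(x):
    # Illustrative test function (an assumption): f(x) = x^4, so f'(1) = 4.
    return x ** 4


true_derivative = 4.0
for h in (0.1, 0.05, 0.025):
    fwd_err = abs(forward_difference(f, 1.0, h) - true_derivative)
    bwd_err = abs(backward_difference(f, 1.0, h) - true_derivative)
    print(f"h = {h:<6} forward error = {fwd_err:.6f}  backward error = {bwd_err:.6f}")
```

Each time `h` is halved, both printed errors shrink by a factor close to 2, which is the behavior the first-order truncation term predicts.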
  • 111. FIGURE 4.6 Graphical depiction of (a) forward, (b) backward, and (c) centered finite-divided-difference approximations of the first derivative. [Each panel plots $f(x)$ versus $x$ and compares the true derivative with the approximation: (a) over the step $h$ from $x_i$ to $x_{i+1}$, (b) over the step $h$ from $x_{i-1}$ to $x_i$, (c) over $2h$ from $x_{i-1}$ to $x_{i+1}$.]
  • 112. Centered Difference Approximation of the First Derivative. A third way to approximate the first derivative is to subtract Eq. (4.19) from the forward Taylor series expansion:

$$f(x_{i+1}) = f(x_i) + f'(x_i)h + \frac{f''(x_i)}{2!}h^2 + \cdots \tag{4.21}$$

to yield

$$f(x_{i+1}) = f(x_{i-1}) + 2f'(x_i)h + \frac{2f^{(3)}(x_i)}{3!}h^3 + \cdots$$

which can be solved for

$$f'(x_i) = \frac{f(x_{i+1}) - f(x_{i-1})}{2h} - \frac{f^{(3)}(x_i)}{6}h^2 - \cdots$$

or

$$f'(x_i) = \frac{f(x_{i+1}) - f(x_{i-1})}{2h} - O(h^2) \tag{4.22}$$

Equation (4.22) is a centered difference representation of the first derivative. Notice that the truncation error is of the order of $h^2$ in contrast to the forward and backward approximations that were of the order of $h$. Consequently, the Taylor series analysis yields the practical information that the centered difference is a more accurate representation of the derivative (Fig. 4.6c). For example, if we halve the step size using a forward or backward difference, we would approximately halve the truncation error, whereas for the centered difference, the error would be quartered.

EXAMPLE 4.4 Finite-Divided-Difference Approximations of Derivatives

Problem Statement. Use forward and backward difference approximations of $O(h)$ and a centered difference approximation of $O(h^2)$ to estimate the first derivative of

$$f(x) = -0.1x^4 - 0.15x^3 - 0.5x^2 - 0.25x + 1.2$$

at $x = 0.5$ using a step size $h = 0.5$. Repeat the computation using $h = 0.25$. Note that the derivative can be calculated directly as

$$f'(x) = -0.4x^3 - 0.45x^2 - 1.0x - 0.25$$

and can be used to compute the true value as $f'(0.5) = -0.9125$.

Solution. For $h = 0.5$, the function can be employed to determine

$$\begin{aligned} x_{i-1} &= 0 & f(x_{i-1}) &= 1.2 \\ x_i &= 0.5 & f(x_i) &= 0.925 \\ x_{i+1} &= 1.0 & f(x_{i+1}) &= 0.2 \end{aligned}$$

These values can be used to compute the forward divided difference [Eq. (4.17)],

$$f'(0.5) \cong \frac{0.2 - 0.925}{0.5} = -1.45 \qquad |\varepsilon_t| = 58.9\%$$
  • 113. the backward divided difference [Eq. (4.20)],

$$f'(0.5) \cong \frac{0.925 - 1.2}{0.5} = -0.55 \qquad |\varepsilon_t| = 39.7\%$$

and the centered divided difference [Eq. (4.22)],

$$f'(0.5) \cong \frac{0.2 - 1.2}{1.0} = -1.0 \qquad |\varepsilon_t| = 9.6\%$$

For $h = 0.25$,

$$\begin{aligned} x_{i-1} &= 0.25 & f(x_{i-1}) &= 1.10351563 \\ x_i &= 0.5 & f(x_i) &= 0.925 \\ x_{i+1} &= 0.75 & f(x_{i+1}) &= 0.63632813 \end{aligned}$$

which can be used to compute the forward divided difference,

$$f'(0.5) \cong \frac{0.63632813 - 0.925}{0.25} = -1.155 \qquad |\varepsilon_t| = 26.5\%$$

the backward divided difference,

$$f'(0.5) \cong \frac{0.925 - 1.10351563}{0.25} = -0.714 \qquad |\varepsilon_t| = 21.7\%$$

and the centered divided difference,

$$f'(0.5) \cong \frac{0.63632813 - 1.10351563}{0.5} = -0.934 \qquad |\varepsilon_t| = 2.4\%$$

For both step sizes, the centered difference approximation is more accurate than forward or backward differences. Also, as predicted by the Taylor series analysis, halving the step size approximately halves the error of the backward and forward differences and quarters the error of the centered difference.

Finite Difference Approximations of Higher Derivatives. Besides first derivatives, the Taylor series expansion can be used to derive numerical estimates of higher derivatives. To do this, we write a forward Taylor series expansion for $f(x_{i+2})$ in terms of $f(x_i)$:

$$f(x_{i+2}) = f(x_i) + f'(x_i)(2h) + \frac{f''(x_i)}{2!}(2h)^2 + \cdots \tag{4.23}$$

Equation (4.21) can be multiplied by 2 and subtracted from Eq. (4.23) to give

$$f(x_{i+2}) - 2f(x_{i+1}) = -f(x_i) + f''(x_i)h^2 + \cdots$$

which can be solved for

$$f''(x_i) = \frac{f(x_{i+2}) - 2f(x_{i+1}) + f(x_i)}{h^2} + O(h) \tag{4.24}$$
  • 114. This relationship is called the second forward finite divided difference. Similar manipulations can be employed to derive a backward version

$$f''(x_i) = \frac{f(x_i) - 2f(x_{i-1}) + f(x_{i-2})}{h^2} + O(h)$$

and a centered version

$$f''(x_i) = \frac{f(x_{i+1}) - 2f(x_i) + f(x_{i-1})}{h^2} + O(h^2)$$

As was the case with the first-derivative approximations, the centered case is more accurate. Notice also that the centered version can be alternatively expressed as

$$f''(x_i) \cong \frac{\dfrac{f(x_{i+1}) - f(x_i)}{h} - \dfrac{f(x_i) - f(x_{i-1})}{h}}{h}$$

Thus, just as the second derivative is a derivative of a derivative, the second divided difference approximation is a difference of two first divided differences.

We will return to the topic of numerical differentiation in Chap. 23. We have introduced you to the topic at this point because it is a very good example of why the Taylor series is important in numerical methods. In addition, several of the formulas introduced in this section will be employed prior to Chap. 23.

4.2 ERROR PROPAGATION

The purpose of this section is to study how errors in numbers can propagate through mathematical functions. For example, if we multiply two numbers that have errors, we would like to estimate the error in the product.

4.2.1 Functions of a Single Variable

Suppose that we have a function $f(x)$ that is dependent on a single independent variable $x$. Assume that $\tilde{x}$ is an approximation of $x$. We, therefore, would like to assess the effect of the discrepancy between $x$ and $\tilde{x}$ on the value of the function. That is, we would like to estimate

$$\Delta f(\tilde{x}) = |f(x) - f(\tilde{x})|$$

The problem with evaluating $\Delta f(\tilde{x})$ is that $f(x)$ is unknown because $x$ is unknown. We can overcome this difficulty if $\tilde{x}$ is close to $x$ and $f(\tilde{x})$ is continuous and differentiable. If these conditions hold, a Taylor series can be employed to compute $f(x)$ near $f(\tilde{x})$, as in

$$f(x) = f(\tilde{x}) + f'(\tilde{x})(x - \tilde{x}) + \frac{f''(\tilde{x})}{2}(x - \tilde{x})^2 + \cdots$$

Dropping the second- and higher-order terms and rearranging yields

$$f(x) - f(\tilde{x}) \cong f'(\tilde{x})(x - \tilde{x})$$
  • 115. or

$$\Delta f(\tilde{x}) = |f'(\tilde{x})|\,\Delta\tilde{x} \tag{4.25}$$

where $\Delta f(\tilde{x}) = |f(x) - f(\tilde{x})|$ represents an estimate of the error of the function and $\Delta\tilde{x} = |x - \tilde{x}|$ represents an estimate of the error of $x$. Equation (4.25) provides the capability to approximate the error in $f(x)$ given the derivative of a function and an estimate of the error in the independent variable. Figure 4.7 is a graphical illustration of the operation.

FIGURE 4.7 Graphical depiction of first-order error propagation. [Plot of $f(x)$ versus $x$ showing $\tilde{x}$, $x$, the error $\Delta\tilde{x}$, the true error, and the estimated error $|f'(\tilde{x})|\Delta\tilde{x}$.]

EXAMPLE 4.5 Error Propagation in a Function of a Single Variable

Problem Statement. Given a value of $\tilde{x} = 2.5$ with an error of $\Delta\tilde{x} = 0.01$, estimate the resulting error in the function $f(x) = x^3$.

Solution. Using Eq. (4.25),

$$\Delta f(\tilde{x}) \cong 3(2.5)^2(0.01) = 0.1875$$

Because $f(2.5) = 15.625$, we predict that

$$f(2.5) = 15.625 \pm 0.1875$$

or that the true value lies between 15.4375 and 15.8125. In fact, if $x$ were actually 2.49, the function could be evaluated as 15.4382, and if $x$ were 2.51, it would be 15.8133. For this case, the first-order error analysis provides a fairly close estimate of the true error.
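The calculation in Example 4.5 can be checked with a few lines of Python. This is only a sketch of Eq. (4.25): the helper name `propagated_error` is an assumption introduced here, and the derivative is supplied analytically rather than estimated numerically.

```python
def propagated_error(dfdx, x_approx, dx):
    """First-order error estimate of Eq. (4.25): |f'(x~)| * (error in x)."""
    return abs(dfdx(x_approx)) * dx


def f(x):
    return x ** 3


def dfdx(x):
    return 3 * x ** 2


x_approx, dx = 2.5, 0.01  # values given in Example 4.5
estimate = propagated_error(dfdx, x_approx, dx)

# Evaluate the function at the two extremes of x for comparison.
low, high = f(x_approx - dx), f(x_approx + dx)

print(f"estimated error = {estimate:.4f}")
print(f"predicted range = [{f(x_approx) - estimate:.4f}, {f(x_approx) + estimate:.4f}]")
print(f"actual range    = [{low:.4f}, {high:.4f}]")
```

Comparing the predicted and actual ranges shows how close the first-order estimate is when the error in $x$ is small relative to the curvature of the function.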
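  • 116. 4.2.2 Functions of More than One Variable

The foregoing approach can be generalized to functions that are dependent on more than one independent variable. This is accomplished with a multivariable version of the Taylor series. For example, if we have a function of two independent variables $u$ and $v$, the Taylor series can be written as

$$f(u_{i+1}, v_{i+1}) = f(u_i, v_i) + \frac{\partial f}{\partial u}(u_{i+1} - u_i) + \frac{\partial f}{\partial v}(v_{i+1} - v_i) + \frac{1}{2!}\left[\frac{\partial^2 f}{\partial u^2}(u_{i+1} - u_i)^2 + 2\frac{\partial^2 f}{\partial u\,\partial v}(u_{i+1} - u_i)(v_{i+1} - v_i) + \frac{\partial^2 f}{\partial v^2}(v_{i+1} - v_i)^2\right] + \cdots \tag{4.26}$$

where all partial derivatives are evaluated at the base point $i$. If all second-order and higher terms are dropped, Eq. (4.26) can be solved for

$$\Delta f(\tilde{u}, \tilde{v}) = \left|\frac{\partial f}{\partial u}\right|\Delta\tilde{u} + \left|\frac{\partial f}{\partial v}\right|\Delta\tilde{v}$$

where $\Delta\tilde{u}$ and $\Delta\tilde{v}$ = estimates of the errors in $u$ and $v$, respectively.

For $n$ independent variables $\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_n$ having errors $\Delta\tilde{x}_1, \Delta\tilde{x}_2, \ldots, \Delta\tilde{x}_n$, the following general relationship holds:

$$\Delta f(\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_n) \cong \left|\frac{\partial f}{\partial x_1}\right|\Delta\tilde{x}_1 + \left|\frac{\partial f}{\partial x_2}\right|\Delta\tilde{x}_2 + \cdots + \left|\frac{\partial f}{\partial x_n}\right|\Delta\tilde{x}_n \tag{4.27}$$

EXAMPLE 4.6 Error Propagation in a Multivariable Function

Problem Statement. The deflection $y$ of the top of a sailboat mast is

$$y = \frac{FL^4}{8EI}$$

where $F$ = a uniform side loading (N/m), $L$ = height (m), $E$ = the modulus of elasticity (N/m$^2$), and $I$ = the moment of inertia (m$^4$). Estimate the error in $y$ given the following data:

$$\begin{aligned} \tilde{F} &= 750 \text{ N/m} & \Delta\tilde{F} &= 30 \text{ N/m} \\ \tilde{L} &= 9 \text{ m} & \Delta\tilde{L} &= 0.03 \text{ m} \\ \tilde{E} &= 7.5 \times 10^9 \text{ N/m}^2 & \Delta\tilde{E} &= 5 \times 10^7 \text{ N/m}^2 \\ \tilde{I} &= 0.0005 \text{ m}^4 & \Delta\tilde{I} &= 0.000005 \text{ m}^4 \end{aligned}$$

Solution. Employing Eq. (4.27) gives

$$\Delta y(\tilde{F}, \tilde{L}, \tilde{E}, \tilde{I}) = \left|\frac{\partial y}{\partial F}\right|\Delta\tilde{F} + \left|\frac{\partial y}{\partial L}\right|\Delta\tilde{L} + \left|\frac{\partial y}{\partial E}\right|\Delta\tilde{E} + \left|\frac{\partial y}{\partial I}\right|\Delta\tilde{I}$$

or

$$\Delta y(\tilde{F}, \tilde{L}, \tilde{E}, \tilde{I}) \cong \frac{\tilde{L}^4}{8\tilde{E}\tilde{I}}\Delta\tilde{F} + \frac{\tilde{F}\tilde{L}^3}{2\tilde{E}\tilde{I}}\Delta\tilde{L} + \frac{\tilde{F}\tilde{L}^4}{8\tilde{E}^2\tilde{I}}\Delta\tilde{E} + \frac{\tilde{F}\tilde{L}^4}{8\tilde{E}\tilde{I}^2}\Delta\tilde{I}$$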
  • 117. Substituting the appropriate values gives

$$\Delta y = 0.006561 + 0.002187 + 0.001094 + 0.00164 = 0.011482$$

Therefore, $y = 0.164025 \pm 0.011482$. In other words, $y$ is between 0.152543 and 0.175507 m. The validity of these estimates can be verified by substituting the extreme values for the variables into the equation to generate an exact minimum of

$$y_{\min} = \frac{720(8.97)^4}{8(7.55 \times 10^9)\,0.000505} = 0.152818$$

and an exact maximum of

$$y_{\max} = \frac{780(9.03)^4}{8(7.45 \times 10^9)\,0.000495} = 0.175790$$

Thus, the first-order estimates are reasonably close to the exact values.

Equation (4.27) can be employed to define error propagation relationships for common mathematical operations. The results are summarized in Table 4.3. We will leave the derivation of these formulas as a homework exercise.

TABLE 4.3 Estimated error bounds associated with common mathematical operations using inexact numbers $\tilde{u}$ and $\tilde{v}$.

Operation         Estimated Error
Addition          $\Delta(\tilde{u} + \tilde{v}) \cong \Delta\tilde{u} + \Delta\tilde{v}$
Subtraction       $\Delta(\tilde{u} - \tilde{v}) \cong \Delta\tilde{u} + \Delta\tilde{v}$
Multiplication    $\Delta(\tilde{u} \times \tilde{v}) \cong |\tilde{u}|\Delta\tilde{v} + |\tilde{v}|\Delta\tilde{u}$
Division          $\Delta(\tilde{u}/\tilde{v}) \cong \dfrac{|\tilde{u}|\Delta\tilde{v} + |\tilde{v}|\Delta\tilde{u}}{|\tilde{v}|^2}$

4.2.3 Stability and Condition

The condition of a mathematical problem relates to its sensitivity to changes in its input values. We say that a computation is numerically unstable if the uncertainty of the input values is grossly magnified by the numerical method.

These ideas can be studied using a first-order Taylor series

$$f(x) = f(\tilde{x}) + f'(\tilde{x})(x - \tilde{x})$$

This relationship can be employed to estimate the relative error of $f(x)$ as in

$$\frac{f(x) - f(\tilde{x})}{f(x)} \cong \frac{f'(\tilde{x})(x - \tilde{x})}{f(\tilde{x})}$$

The relative error of $x$ is given by

$$\frac{x - \tilde{x}}{\tilde{x}}$$
  • 118. A condition number can be defined as the ratio of these relative errors

$$\text{Condition number} = \frac{\tilde{x}\,f'(\tilde{x})}{f(\tilde{x})} \tag{4.28}$$

The condition number provides a measure of the extent to which an uncertainty in $x$ is magnified by $f(x)$. A value of 1 tells us that the function’s relative error is identical to the relative error in $x$. A value greater than 1 tells us that the relative error is amplified, whereas a value less than 1 tells us that it is attenuated. Functions with very large condition numbers are said to be ill-conditioned. Any combination of factors in Eq. (4.28) that increases the numerical value of the condition number will tend to magnify uncertainties in the computation of $f(x)$.

EXAMPLE 4.7 Condition Number

Problem Statement. Compute and interpret the condition number for

$$f(x) = \tan x \quad \text{for } \tilde{x} = \frac{\pi}{2} + 0.1\left(\frac{\pi}{2}\right)$$

$$f(x) = \tan x \quad \text{for } \tilde{x} = \frac{\pi}{2} + 0.01\left(\frac{\pi}{2}\right)$$

Solution. The condition number is computed as

$$\text{Condition number} = \frac{\tilde{x}\,(1/\cos^2 \tilde{x})}{\tan \tilde{x}}$$

For $\tilde{x} = \pi/2 + 0.1(\pi/2)$,

$$\text{Condition number} = \frac{1.7279(40.86)}{-6.314} = -11.2$$

Thus, the function is ill-conditioned. For $\tilde{x} = \pi/2 + 0.01(\pi/2)$, the situation is even worse:

$$\text{Condition number} = \frac{1.5865(4053)}{-63.66} = -101$$

For this case, the major cause of ill conditioning appears to be the derivative. This makes sense because in the vicinity of $\pi/2$, the tangent approaches both positive and negative infinity.

4.3 TOTAL NUMERICAL ERROR

The total numerical error is the summation of the truncation and round-off errors. In general, the only way to minimize round-off errors is to increase the number of significant figures of the computer. Further, we have noted that round-off error will increase due to subtractive cancellation or due to an increase in the number of computations in an analysis. In contrast, Example 4.4 demonstrated that the truncation error can be reduced by decreasing the step size. 
Because a decrease in step size can lead to subtractive cancellation or to an increase in computations, the truncation errors are decreased as the round-off
  • 119. errors are increased. Therefore, we are faced by the following dilemma: The strategy for decreasing one component of the total error leads to an increase of the other component. In a computation, we could conceivably decrease the step size to minimize truncation errors only to discover that in doing so, the round-off error begins to dominate the solution and the total error grows! Thus, our remedy becomes our problem (Fig. 4.8). One challenge that we face is to determine an appropriate step size for a particular computation. We would like to choose a large step size in order to decrease the amount of calculations and round-off errors without incurring the penalty of a large truncation error. If the total error is as shown in Fig. 4.8, the challenge is to identify the point of diminishing returns where round-off error begins to negate the benefits of step-size reduction.

In actual cases, however, such situations are relatively uncommon because most computers carry enough significant figures that round-off errors do not predominate. Nevertheless, they sometimes do occur and suggest a sort of “numerical uncertainty principle” that places an absolute limit on the accuracy that may be obtained using certain computerized numerical methods. We explore such a case in the following section.

4.3.1 Error Analysis of Numerical Differentiation

As described in Sec. 4.1.3, a centered difference approximation of the first derivative can be written as [Eq. (4.22)]:

$$\underbrace{f'(x_i)}_{\text{True value}} = \underbrace{\frac{f(x_{i+1}) - f(x_{i-1})}{2h}}_{\text{Finite-difference approximation}} - \underbrace{\frac{f^{(3)}(\xi)}{6}h^2}_{\text{Truncation error}} \tag{4.29}$$

FIGURE 4.8 A graphical depiction of the trade-off between round-off and truncation error that sometimes comes into play in the course of a numerical method. The point of diminishing returns is shown, where round-off error begins to negate the benefits of step-size reduction. 
[Figure 4.8 axes: log error versus log step size. Curves: total error, round-off error, and truncation error, with the point of diminishing returns marked.]
  • 120. Thus, if the two function values in the numerator of the finite-difference approximation have no round-off error, the only error is due to truncation.

However, because we are using digital computers, the function values do include round-off error as in

$$f(x_{i-1}) = \tilde{f}(x_{i-1}) + e_{i-1}$$

$$f(x_{i+1}) = \tilde{f}(x_{i+1}) + e_{i+1}$$

where the $\tilde{f}$’s are the rounded function values and the $e$’s are the associated round-off errors. Substituting these values into Eq. (4.29) gives

$$\underbrace{f'(x_i)}_{\text{True value}} = \underbrace{\frac{\tilde{f}(x_{i+1}) - \tilde{f}(x_{i-1})}{2h}}_{\text{Finite-difference approximation}} + \underbrace{\frac{e_{i+1} - e_{i-1}}{2h}}_{\text{Round-off error}} - \underbrace{\frac{f^{(3)}(\xi)}{6}h^2}_{\text{Truncation error}}$$

We can see that the total error of the finite-difference approximation consists of a round-off error, which grows as the step size is reduced (it is proportional to $1/h$), and a truncation error, which shrinks as the step size is reduced (it is proportional to $h^2$).

Assuming that the absolute value of each component of the round-off error has an upper bound of $\varepsilon$, the maximum possible value of the difference $e_{i+1} - e_{i-1}$ will be $2\varepsilon$. Further, assume that the third derivative has a maximum absolute value of $M$. An upper bound on the absolute value of the total error can therefore be represented as

$$\text{Total error} = \left| f'(x_i) - \frac{\tilde{f}(x_{i+1}) - \tilde{f}(x_{i-1})}{2h} \right| \le \frac{\varepsilon}{h} + \frac{h^2 M}{6} \tag{4.30}$$

An optimal step size can be determined by differentiating Eq. (4.30), setting the result equal to zero and solving for

$$h_{\text{opt}} = \sqrt[3]{\frac{3\varepsilon}{M}} \tag{4.31}$$

EXAMPLE 4.8 Round-off and Truncation Errors in Numerical Differentiation

Problem Statement. In Example 4.4, we used a centered difference approximation of $O(h^2)$ to estimate the first derivative of the following function at $x = 0.5$,

$$f(x) = -0.1x^4 - 0.15x^3 - 0.5x^2 - 0.25x + 1.2$$

Perform the same computation starting with $h = 1$. Then progressively divide the step size by a factor of 10 to demonstrate how round-off becomes dominant as the step size is reduced. Relate your results to Eq. (4.31). Recall that the true value of the derivative is $-0.9125$.

Solution. We can develop a program to perform the computations and plot the results. 
For the present example, we have done this with a MATLAB software M-file. Notice that we pass both the function and its analytical derivative as arguments. In addition, the function generates a plot of the results.
  • 121.

    function diffex(func,dfunc,x,n)
    % func = function, dfunc = its analytical derivative,
    % x = evaluation point, n = number of step sizes to try
    format long
    dftrue=dfunc(x);
    h=1;
    H(1)=h;
    D(1)=(func(x+h)-func(x-h))/(2*h);   % centered difference, Eq. (4.22)
    E(1)=abs(dftrue-D(1));              % true absolute error
    for i=2:n
      h=h/10;                           % reduce the step by a factor of 10
      H(i)=h;
      D(i)=(func(x+h)-func(x-h))/(2*h);
      E(i)=abs(dftrue-D(i));
    end
    L=[H' D' E']';
    fprintf('   step size   finite difference     true error\n');
    fprintf('%14.10f %16.14f %16.13f\n',L);
    loglog(H,E),xlabel('Step Size'),ylabel('Error')
    title('Plot of Error Versus Step Size')
    format short

The M-file can then be run using the following commands:

    ff=@(x) -0.1*x^4-0.15*x^3-0.5*x^2-0.25*x+1.2;
    df=@(x) -0.4*x^3-0.45*x^2-x-0.25;
    diffex(ff,df,0.5,11)

When the function is run, the following numeric output is generated along with the plot (Fig. 4.9):

       step size   finite difference     true error
      1.0000000000 -1.26250000000000  0.3500000000000
      0.1000000000 -0.91600000000000  0.0035000000000
      0.0100000000 -0.91253500000000  0.0000350000000
      0.0010000000 -0.91250035000001  0.0000003500000
      0.0001000000 -0.91250000349985  0.0000000034998
      0.0000100000 -0.91250000003318  0.0000000000332
      0.0000010000 -0.91250000000542  0.0000000000054
      0.0000001000 -0.91249999945031  0.0000000005497
      0.0000000100 -0.91250000333609  0.0000000033361
      0.0000000010 -0.91250001998944  0.0000000199894
      0.0000000001 -0.91250007550059  0.0000000755006

The results are as expected. At first, round-off is minimal and the estimate is dominated by truncation error. Hence, as in Eq. (4.30), the total error drops by a factor of 100 each time we divide the step by 10. However, starting at $h = 0.0001$, we see round-off error begin to creep in and erode the rate at which the error diminishes. A minimum error is reached at $h = 10^{-6}$. Beyond this point, the error increases as round-off dominates.

Because we are dealing with an easily differentiable function, we can also investigate whether these results are consistent with Eq. (4.31). First, we can estimate $M$ by evaluating the function’s third derivative as

$$M = |f^{(3)}(0.5)| = |-2.4(0.5) - 0.9| = 2.1$$
  • 122. Because MATLAB has a precision of about 15 to 16 base-10 digits, a rough estimate of the upper bound on round-off would be about $\varepsilon = 0.5 \times 10^{-16}$. Substituting these values into Eq. (4.31) gives

$$h_{\text{opt}} = \sqrt[3]{\frac{3(0.5 \times 10^{-16})}{2.1}} = 4.1 \times 10^{-6}$$

which is on the same order as the result of $1 \times 10^{-6}$ obtained with our computer program.

FIGURE 4.9 Plot of error versus step size. [Log-log plot titled “Plot of Error Versus Step Size”; axes: Step Size (horizontal) and Error (vertical).]

4.3.2 Control of Numerical Errors

For most practical cases, we do not know the exact error associated with numerical methods. The exception, of course, is when we have obtained the exact solution that makes our numerical approximations unnecessary. Therefore, for most engineering applications we must settle for some estimate of the error in our calculations.

There are no systematic and general approaches to evaluating numerical errors for all problems. In many cases, error estimates are based on the experience and judgment of the engineer.

Although error analysis is to a certain extent an art, there are several practical programming guidelines we can suggest. First and foremost, avoid subtracting two nearly equal numbers. Loss of significance almost always occurs when this is done. Sometimes you can rearrange or reformulate the problem to avoid subtractive cancellation. If this is not possible, you may want to use extended-precision arithmetic. Furthermore, when adding and
  • 123. subtracting numbers, it is best to sort the numbers and work with the smallest numbers first. This avoids loss of significance.

Beyond these computational hints, one can attempt to predict total numerical errors using theoretical formulations. The Taylor series is our primary tool for analysis of both truncation and round-off errors. Several examples have been presented in this chapter. Prediction of total numerical error is very complicated for even moderately sized problems and tends to be pessimistic. Therefore, it is usually attempted for only small-scale tasks.

The tendency is to push forward with the numerical computations and try to estimate the accuracy of your results. This can sometimes be done by seeing if the results satisfy some condition or equation as a check. Or it may be possible to substitute the results back into the original equation to check that it is actually satisfied.

Finally you should be prepared to perform numerical experiments to increase your awareness of computational errors and possible ill-conditioned problems. Such experiments may involve repeating the computations with a different step size or method and comparing the results. We may employ sensitivity analysis to see how our solution changes when we change model parameters or input values. We may want to try different numerical algorithms that have different theoretical foundations, are based on different computational strategies, or have different convergence properties and stability characteristics.

When the results of numerical computations are extremely critical and may involve loss of human life or have severe economic ramifications, it is appropriate to take special precautions. This may involve the use of two or more independent groups to solve the same problem so that their results can be compared.

The roles of errors will be a topic of concern and analysis in all sections of this book. 
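One numerical experiment of the kind described above is to repeat the step-size study of Example 4.8 in a different environment. The Python sketch below is offered only as an illustration, not as the book's program: the names `centered_difference`, `steps`, and `errors` are assumptions, and the round-off bound `eps` is the same rough estimate used in Example 4.8.

```python
def centered_difference(f, x, h):
    """Centered finite divided difference, Eq. (4.22)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def f(x):
    return -0.1 * x**4 - 0.15 * x**3 - 0.5 * x**2 - 0.25 * x + 1.2


def dfdx(x):
    return -0.4 * x**3 - 0.45 * x**2 - x - 0.25


x = 0.5
true_value = dfdx(x)

# Step sizes h = 1, 0.1, ..., 1e-10 and the true absolute error at each.
steps = [10.0 ** (-k) for k in range(11)]
errors = [abs(true_value - centered_difference(f, x, h)) for h in steps]

for h, err in zip(steps, errors):
    print(f"h = {h:.0e}   error = {err:.3e}")

# Optimal step from Eq. (4.31), using the rough bounds of Example 4.8.
eps = 0.5e-16  # assumed upper bound on round-off in each function value
M = 2.1        # |f'''(0.5)|
h_opt = (3 * eps / M) ** (1 / 3)
print(f"h_opt from Eq. (4.31) = {h_opt:.2e}")
```

The printed errors should first fall by about a factor of 100 per decade of `h` and then rise again once round-off dominates; the exact digits at the smallest steps can differ between platforms and languages, which is itself a useful reminder that those digits are round-off noise.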
We will leave these investigations to specific sections.

4.4 BLUNDERS, FORMULATION ERRORS, AND DATA UNCERTAINTY

Although the following sources of error are not directly connected with most of the numerical methods in this book, they can sometimes have great impact on the success of a modeling effort. Thus, they must always be kept in mind when applying numerical techniques in the context of real-world problems.

4.4.1 Blunders

We are all familiar with gross errors, or blunders. In the early years of computers, erroneous numerical results could sometimes be attributed to malfunctions of the computer itself. Today, this source of error is highly unlikely, and most blunders must be attributed to human imperfection.

Blunders can occur at any stage of the mathematical modeling process and can contribute to all the other components of error. They can be avoided only by sound knowledge of fundamental principles and by the care with which you approach and design your solution to a problem.

Blunders are usually disregarded in discussions of numerical methods. This is no doubt due to the fact that, try as we may, mistakes are to a certain extent unavoidable. However, we believe that there are a number of ways in which their occurrence can be
  • 124. minimized. In particular, the good programming habits that were outlined in Chap. 2 are extremely useful for mitigating programming blunders. In addition, there are usually simple ways to check whether a particular numerical method is working properly. Throughout this book, we discuss ways to check the results of numerical calculations.

4.4.2 Formulation Errors

Formulation, or model, errors relate to bias that can be ascribed to incomplete mathematical models. An example of a negligible formulation error is the fact that Newton’s second law does not account for relativistic effects. This does not detract from the adequacy of the solution in Example 1.1 because these errors are minimal on the time and space scales associated with the falling parachutist problem.

However, suppose that air resistance is not linearly proportional to fall velocity, as in Eq. (1.7), but is a function of the square of velocity. If this were the case, both the analytical and numerical solutions obtained in Chap. 1 would be erroneous because of formulation error. Further consideration of formulation error is included in some of the engineering applications in the remainder of the book. You should be cognizant of these problems and realize that, if you are working with a poorly conceived model, no numerical method will provide adequate results.

4.4.3 Data Uncertainty

Errors sometimes enter into an analysis because of uncertainty in the physical data upon which a model is based. For instance, suppose we wanted to test the falling parachutist model by having an individual make repeated jumps and then measuring his or her velocity after a specified time interval. Uncertainty would undoubtedly be associated with these measurements, since the parachutist would fall faster during some jumps than during others. These errors can exhibit both inaccuracy and imprecision. 
If our instruments consistently underestimate or overestimate the velocity, we are dealing with an inaccurate, or biased, device. On the other hand, if the measurements are randomly high and low, we are dealing with a question of precision.

Measurement errors can be quantified by summarizing the data with one or more well-chosen statistics that convey as much information as possible regarding specific characteristics of the data. These descriptive statistics are most often selected to represent (1) the location of the center of the distribution of the data and (2) the degree of spread of the data. As such, they provide a measure of the bias and imprecision, respectively. We will return to the topic of characterizing data uncertainty in Part Five.

Although you must be cognizant of blunders, formulation errors, and uncertain data, the numerical methods used for building models can be studied, for the most part, independently of these errors. Therefore, for most of this book, we will assume that we have not made gross errors, we have a sound model, and we are dealing with error-free measurements. Under these conditions, we can study numerical errors without complicating factors.
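The two descriptive statistics mentioned in Sec. 4.4.3 can be illustrated with a short Python sketch. The velocity readings and the reference value are hypothetical numbers invented for illustration, not data from the text. The sample mean stands in for the location of the data (and, compared against a reference, the bias), and the sample standard deviation for the spread (imprecision).

```python
import statistics

# Hypothetical repeated velocity measurements at a fixed time after the jump.
# These values are invented for illustration; they are not data from the text.
measured = [44.1, 45.3, 44.8, 45.0, 44.3]

# Hypothetical reference ("true") velocity, assumed only to illustrate bias.
reference = 44.9

mean_v = statistics.mean(measured)     # location of the center of the data
spread_v = statistics.stdev(measured)  # sample standard deviation: degree of spread
bias = mean_v - reference              # systematic offset relative to the reference

print(f"mean = {mean_v:.3f}, std dev = {spread_v:.3f}, bias = {bias:+.3f}")
```

A nonzero bias with a small standard deviation would point to an inaccurate but precise instrument; the reverse would point to an imprecise one.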
PROBLEMS

4.1 The following infinite series can be used to approximate $e^x$:

$$e^x = 1 + x + \frac{x^2}{2} + \frac{x^3}{3!} + \cdots + \frac{x^n}{n!}$$

(a) Prove that this Maclaurin series expansion is a special case of the Taylor series expansion [Eq. (4.7)] with $x_i = 0$ and $h = x$.
(b) Use the Taylor series to estimate $f(x) = e^{-x}$ at $x_{i+1} = 1$ for $x_i = 0.2$. Employ the zero-, first-, second-, and third-order versions and compute the $|\varepsilon_t|$ for each case.

4.2 The Maclaurin series expansion for cos x is

$$\cos x = 1 - \frac{x^2}{2} + \frac{x^4}{4!} - \frac{x^6}{6!} + \frac{x^8}{8!} - \cdots$$

Starting with the simplest version, $\cos x = 1$, add terms one at a time to estimate $\cos(\pi/3)$. After each new term is added, compute the true and approximate percent relative errors. Use your pocket calculator to determine the true value. Add terms until the absolute value of the approximate error estimate falls below an error criterion conforming to two significant figures.

4.3 Perform the same computation as in Prob. 4.2, but use the Maclaurin series expansion for sin x to estimate $\sin(\pi/3)$.

$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots$$

4.4 The Maclaurin series expansion for the arctangent of x is defined for $|x| \le 1$ as

$$\arctan x = \sum_{n=0}^{\infty} \frac{(-1)^n}{2n+1}\, x^{2n+1}$$

(a) Write out the first four terms (n = 0, . . . , 3).
(b) Starting with the simplest version, $\arctan x = x$, add terms one at a time to estimate $\arctan(\pi/6)$. After each new term is added, compute the true and approximate percent relative errors. Use your calculator to determine the true value. Add terms until the absolute value of the approximate error estimate falls below an error criterion conforming to two significant figures.

4.5 Use zero- through third-order Taylor series expansions to predict $f(3)$ for

$$f(x) = 25x^3 - 6x^2 + 7x - 88$$

using a base point at $x = 1$. Compute the true percent relative error $\varepsilon_t$ for each approximation.

4.6 Use zero- through fourth-order Taylor series expansions to predict $f(2.5)$ for $f(x) = \ln x$ using a base point at $x = 1$.
Compute the true percent relative error $\varepsilon_t$ for each approximation. Discuss the meaning of the results.

4.7 Use forward and backward difference approximations of $O(h)$ and a centered difference approximation of $O(h^2)$ to estimate the first derivative of the function examined in Prob. 4.5. Evaluate the derivative at $x = 2$ using a step size of $h = 0.2$. Compare your results with the true value of the derivative. Interpret your results on the basis of the remainder term of the Taylor series expansion.

4.8 Use a centered difference approximation of $O(h^2)$ to estimate the second derivative of the function examined in Prob. 4.5. Perform the evaluation at $x = 2$ using step sizes of $h = 0.25$ and $0.125$. Compare your estimates with the true value of the second derivative. Interpret your results on the basis of the remainder term of the Taylor series expansion.

4.9 The Stefan-Boltzmann law can be employed to estimate the rate of radiation of energy H from a surface, as in

$$H = A e \sigma T^4$$

where H is in watts, A = the surface area (m$^2$), e = the emissivity that characterizes the emitting properties of the surface (dimensionless), $\sigma$ = a universal constant called the Stefan-Boltzmann constant ($= 5.67 \times 10^{-8}$ W m$^{-2}$ K$^{-4}$), and T = absolute temperature (K). Determine the error of H for a steel plate with $A = 0.15$ m$^2$, $e = 0.90$, and $T = 650 \pm 20$. Compare your results with the exact error. Repeat the computation but with $T = 650 \pm 40$. Interpret your results.

4.10 Repeat Prob. 4.9 but for a copper sphere with radius $= 0.15 \pm 0.01$ m, $e = 0.90 \pm 0.05$, and $T = 550 \pm 20$.

4.11 Recall that the velocity of the falling parachutist can be computed by [Eq. (1.10)],

$$v(t) = \frac{gm}{c}\left(1 - e^{-(c/m)t}\right)$$

Use a first-order error analysis to estimate the error of v at $t = 6$, if $g = 9.81$ and $m = 50$ but $c = 12.5 \pm 1.5$.

4.12 Repeat Prob. 4.11 with $g = 9.81$, $t = 6$, $c = 12.5 \pm 1.5$, and $m = 50 \pm 2$.
4.13 Evaluate and interpret the condition numbers for
(a) $f(x) = \sqrt{|x - 1|} + 1$ for $x = 1.00001$
(b) $f(x) = e^{-x}$ for $x = 10$
(c) $f(x) = \sqrt{x^2 + 1} - x$ for $x = 300$
(d) $f(x) = \dfrac{e^{-x} - 1}{x}$ for $x = 0.001$
(e) $f(x) = \dfrac{\sin x}{1 + \cos x}$ for $x = 1.0001\pi$

4.14 Employing ideas from Sec. 4.2, derive the relationships from Table 4.3.

4.15 Prove that Eq. (4.4) is exact for all values of x if $f(x) = ax^2 + bx + c$.
4.16 Manning's formula for a rectangular channel can be written as

$$Q = \frac{1}{n}\,\frac{(BH)^{5/3}}{(B + 2H)^{2/3}}\,\sqrt{S}$$

where Q = flow (m$^3$/s), n = a roughness coefficient, B = width (m), H = depth (m), and S = slope. You are applying this formula to a stream where you know that the width = 20 m and the depth = 0.3 m. Unfortunately, you know the roughness and the slope to only a $\pm 10\%$ precision. That is, you know that the roughness is about 0.03 with a range from 0.027 to 0.033 and the slope is 0.0003 with a range from 0.00027 to 0.00033. Use a first-order error analysis to determine the sensitivity of the flow prediction to each of these two factors. Which one should you attempt to measure with more precision?

4.17 If $|x| < 1$, it is known that

$$\frac{1}{1 - x} = 1 + x + x^2 + x^3 + \cdots$$

Repeat Prob. 4.1 for this series for $x = 0.1$.

4.18 A missile leaves the ground with an initial velocity $v_0$ forming an angle $\phi_0$ with the vertical as shown in Fig. P4.18. The maximum desired altitude is $\alpha R$ where R is the radius of the earth. The laws of mechanics can be used to show that

$$\sin \phi_0 = (1 + \alpha)\sqrt{1 - \frac{\alpha}{1 + \alpha}\left(\frac{v_e}{v_0}\right)^2}$$

where $v_e$ = the escape velocity of the missile. It is desired to fire the missile and reach the design maximum altitude within an accuracy of $\pm 2\%$. Determine the range of values for $\phi_0$ if $v_e/v_0 = 2$ and $\alpha = 0.25$.

4.19 To calculate a planet's space coordinates, we have to solve the function

$$f(x) = x - 1 - 0.5 \sin x$$

Let the base point be $a = x_i = \pi/2$ on the interval $[0, \pi]$. Determine the highest-order Taylor series expansion resulting in a maximum error of 0.015 on the specified interval. The error is equal to the absolute value of the difference between the given function and the specific Taylor series expansion. (Hint: Solve graphically.)

4.20 Consider the function $f(x) = x^3 - 2x + 4$ on the interval $[-2, 2]$ with $h = 0.25$.
Use the forward, backward, and centered finite difference approximations for the first and second derivatives so as to graphically illustrate which approximation is most accurate. Graph all three first derivative finite difference approximations along with the theoretical, and do the same for the second derivative as well.

4.21 Derive Eq. (4.31).

4.22 Repeat Example 4.8, but for $f(x) = \cos(x)$ at $x = \pi/6$.

4.23 Repeat Example 4.8, but for the forward divided difference [Eq. (4.17)].

4.24 Develop a well-structured program to compute the Maclaurin series expansion for the cosine function as described in Prob. 4.2. The function should have the following features:
• Iterate until the relative error falls below a stopping criterion (es) or the number of iterations exceeds a maximum (maxit). Allow the user to specify values for these parameters.
• Include default values of es (= 0.000001) and maxit (= 100) in the event that they are not specified by the user.
• Return the estimate of cos(x), the approximate relative error, the number of iterations, and the true relative error (that you can calculate based on the built-in cosine function).

FIGURE P4.18 (diagram labels: $R$, $v_0$, $\phi_0$)
EPILOGUE: PART ONE

PT1.4 TRADE-OFFS

Numerical methods are scientific in the sense that they represent systematic techniques for solving mathematical problems. However, there is a certain degree of art, subjective judgment, and compromise associated with their effective use in engineering practice. For each problem, you may be confronted with several alternative numerical methods and many different types of computers. Thus, the elegance and efficiency of different approaches to problems is highly individualistic and correlated with your ability to choose wisely among options. Unfortunately, as with any intuitive process, the factors influencing this choice are difficult to communicate. Only by experience can these skills be fully comprehended and honed. However, because these skills play such a prominent role in the effective implementation of the methods, we have included this section as an introduction to some of the trade-offs that you must consider when selecting a numerical method and the tools for implementing the method. It is hoped that the discussion that follows will influence your orientation when approaching subsequent material. Also, it is hoped that you will refer back to this material when you are confronted with choices and trade-offs in the remainder of the book.

1. Type of Mathematical Problem. As delineated previously in Fig. PT1.2, several types of mathematical problems are discussed in this book:
(a) Roots of equations.
(b) Systems of simultaneous linear algebraic equations.
(c) Optimization.
(d) Curve fitting.
(e) Numerical integration.
(f) Ordinary differential equations.
(g) Partial differential equations.
You will probably be introduced to the applied aspects of numerical methods by confronting a problem in one of the above areas. Numerical methods will be required because the problem cannot be solved efficiently using analytical techniques.
You should be cognizant of the fact that your professional activities will eventually involve problems in all the above areas. Thus, the study of numerical methods and the selection of automatic computation equipment should, at the minimum, consider these basic types of problems. More advanced problems may require capabilities of handling areas such as functional approximation, integral equations, etc. These areas typically demand greater computation power or advanced methods not covered in this text. Other references such as Carnahan, Luther, and Wilkes (1969); Hamming (1973); Ralston and Rabinowitz (1978); Burden and Faires (2005); and Moler (2004) should be consulted for problems beyond the scope of this book. In addition, at the end of each part of this text, we include a brief summary
and references for advanced methods to provide you with avenues for pursuing further studies of numerical methods.

2. Type, Availability, Precision, Cost, and Speed of Computer. You may have the option of working with a variety of computation tools. These range from pocket calculators to large mainframe computers. Of course, any of the tools can be used to implement any numerical method (including simple paper and pencil). It is usually not a question of ultimate capability but rather of cost, convenience, speed, dependability, repeatability, and precision. Although each of the tools will continue to have utility, the recent rapid advances in the performance of personal computers have already had a major impact on the engineering profession. We expect this revolution will spread as technological improvements continue because personal computers offer an excellent compromise in convenience, cost, precision, speed, and storage capacity. Furthermore, they can be readily applied to most practical engineering problems.

3. Program Development Cost versus Software Cost versus Run-Time Cost. Once the types of mathematical problems to be solved have been identified and the computer system has been selected, it is appropriate to consider software and run-time costs. Software development may represent a substantial effort in many engineering projects and may therefore be a significant cost. In this regard, it is particularly important that you be very well acquainted with the theoretical and practical aspects of the relevant numerical methods. In addition, you should be familiar with professionally developed software. Low-cost software is widely available to implement numerical methods that may be readily adapted to a broad variety of problems.

4. Characteristics of the Numerical Method.
When computer hardware and software costs are high, or if computer availability is limited (for example, on some timeshare systems), it pays to choose carefully the numerical method to suit the situation. On the other hand, if the problem is still at the exploratory stage and computer access and cost are not problems, it may be appropriate for you to select a numerical method that always works but may not be the most computationally efficient. The numerical methods available to solve any particular type of problem involve the types of trade-offs just discussed and others:

(a) Number of Initial Guesses or Starting Points. Some of the numerical methods for finding roots of equations or solving differential equations require the user to specify initial guesses or starting points. Simple methods usually require one value, whereas complicated methods may require more than one value. The advantages of complicated methods that are computationally efficient may be offset by the requirement for multiple starting points. You must use your experience and judgment to assess the trade-offs for each particular problem.

(b) Rate of Convergence. Certain numerical methods converge more rapidly than others. However, this rapid convergence may require more refined initial guesses and more complex programming than a method with slower convergence. Again, you must use your judgment in selecting a method. Faster is not always better.

(c) Stability. Some numerical methods for finding roots of equations or solutions for systems of linear equations may diverge rather than converge on the correct answer for certain problems. Why would you tolerate this possibility when confronted with design or planning problems? The answer is that these methods may be highly efficient when they work. Thus, trade-offs again emerge. You must decide
if your problem requirements justify the effort needed to apply a method that may not always converge.

(d) Accuracy and Precision. Some numerical methods are simply more accurate or precise than others. Good examples are the various equations available for numerical integration. Usually, the performance of low-accuracy methods can be improved by decreasing the step size or increasing the number of applications over a given interval. Is it better to use a low-accuracy method with small step sizes or a high-accuracy method with large step sizes? This question must be addressed on a case-by-case basis taking into consideration the additional factors such as cost and ease of programming. In addition, you must also be concerned with round-off errors when you are using multiple applications of low-accuracy methods and when the number of computations becomes large. Here the number of significant figures handled by the computer may be the deciding factor.

(e) Breadth of Application. Some numerical methods can be applied to only a limited class of problems or to problems that satisfy certain mathematical restrictions. Other methods are not affected by such limitations. You must evaluate whether it is worth your effort to develop programs that employ techniques that are appropriate for only a limited number of problems. The fact that such techniques may be widely used suggests that they have advantages that will often outweigh their disadvantages. Obviously, trade-offs are occurring.

(f) Special Requirements. Some numerical techniques attempt to increase accuracy and rate of convergence using additional or special information. An example would be to use estimated or theoretical values of errors to improve accuracy. However, these improvements are generally not achieved without some inconvenience in terms of added computing costs or increased program complexity.

(g) Programming Effort Required.
Efforts to improve rates of convergence, stability, and accuracy can be creative and ingenious. When improvements can be made without increasing the programming complexity, they may be considered elegant and will probably find immediate use in the engineering profession. However, if they require more complicated programs, you are again faced with a trade-off situation that may or may not favor the new method.

It is clear that the above discussion concerning a choice of numerical methods reduces to one of cost and accuracy. The costs are those involved with computer time and program development. Appropriate accuracy is a question of professional judgment and ethics.

5. Mathematical Behavior of the Function, Equation, or Data. In selecting a particular numerical method, type of computer, and type of software, you must consider the complexity of your functions, equations, or data. Simple equations and smooth data may be appropriately handled by simple numerical algorithms and inexpensive computers. The opposite is true for complicated equations and data exhibiting discontinuities.

6. Ease of Application (User-Friendly?). Some numerical methods are easy to apply; others are difficult. This may be a consideration when choosing one method over
another. This same idea applies to decisions regarding program development costs versus professionally developed software. It may take considerable effort to convert a difficult program to one that is user-friendly. Ways to do this were introduced in Chap. 2 and are elaborated throughout the book.

7. Maintenance. Programs for solving engineering problems require maintenance because during application, difficulties invariably occur. Maintenance may require changing the program code or expanding the documentation. Simple programs and numerical algorithms are simpler to maintain.

The chapters that follow involve the development of various types of numerical methods for various types of mathematical problems. Several alternative methods will be given in each chapter. These various methods (rather than a single method chosen by the authors) are presented because there is no single "best" method. There is no best method because there are many trade-offs that must be considered when applying the methods to practical problems. A table that highlights the trade-offs involved in each method will be found at the end of each part of the book. This table should assist you in selecting the appropriate numerical procedure for your particular problem context.

PT1.5 IMPORTANT RELATIONSHIPS AND FORMULAS

Table PT1.2 summarizes important information that was presented in Part One. The table can be consulted to quickly access important relationships and formulas. The epilogue of each part of the book will contain such a summary.

PT1.6 ADVANCED METHODS AND ADDITIONAL REFERENCES

The epilogue of each part of the book will also include a section designed to facilitate and encourage further studies of numerical methods.
This section will reference other books on the subject as well as material related to more advanced methods.¹

To extend the background provided in Part One, numerous manuals on computer programming are available. It would be difficult to reference all the excellent books and manuals pertaining to specific languages and computers. In addition, you probably already have material from your previous exposure to programming. However, if this is your first experience with computers, your instructor and fellow students should also be able to advise you regarding good reference books for the machines and languages available at your school.

As for error analysis, any good introductory calculus book will include supplementary material related to subjects such as the Taylor series expansion. Texts by Swokowski (1979), Thomas and Finney (1979), and Simmons (1985) provide very readable discussions of these subjects. In addition, Taylor (1982) presents a nice introduction to error analysis.

Finally, although we hope that our book serves you well, it is always good to consult other sources when trying to master a new subject. Burden and Faires (2005); Ralston and Rabinowitz (1978); Hoffman (1992); and Carnahan, Luther, and Wilkes (1969) provide comprehensive discussions of most numerical methods. Other enjoyable books on the subject are Gerald and Wheatley (2004), and Cheney and Kincaid (2008). In addition, Press et al. (2007) include algorithms to implement a variety of methods, and Moler (2004) and Chapra (2007) are devoted to numerical methods with MATLAB software.

¹ Books are referenced only by author here; a complete bibliography will be found at the back of this text.

TABLE PT1.2 Summary of important information presented in Part One.

Error Definitions
  True error: $E_t = \text{true value} - \text{approximation}$
  True percent relative error: $\varepsilon_t = \dfrac{\text{true value} - \text{approximation}}{\text{true value}} \times 100\%$
  Approximate percent relative error: $\varepsilon_a = \dfrac{\text{present approximation} - \text{previous approximation}}{\text{present approximation}} \times 100\%$
  Stopping criterion: Terminate computation when $\varepsilon_a < \varepsilon_s$, where $\varepsilon_s$ is the desired percent relative error

Taylor Series
  Taylor series expansion: $f(x_{i+1}) = f(x_i) + f'(x_i)h + \dfrac{f''(x_i)}{2!}h^2 + \dfrac{f'''(x_i)}{3!}h^3 + \cdots + \dfrac{f^{(n)}(x_i)}{n!}h^n + R_n$
  Remainder: $R_n = \dfrac{f^{(n+1)}(\xi)}{(n+1)!}h^{n+1}$ or $R_n = O(h^{n+1})$

Numerical Differentiation
  First forward finite divided difference: $f'(x_i) = \dfrac{f(x_{i+1}) - f(x_i)}{h} + O(h)$
  (Other divided differences are summarized in Chaps. 4 and 23.)

Error Propagation
  For n independent variables $x_1, x_2, \ldots, x_n$ having errors $\Delta\tilde{x}_1, \Delta\tilde{x}_2, \ldots, \Delta\tilde{x}_n$, the error in the function f can be estimated via
  $\Delta f = \left|\dfrac{\partial f}{\partial x_1}\right|\Delta\tilde{x}_1 + \left|\dfrac{\partial f}{\partial x_2}\right|\Delta\tilde{x}_2 + \cdots + \left|\dfrac{\partial f}{\partial x_n}\right|\Delta\tilde{x}_n$
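The error definitions and stopping criterion in Table PT1.2 can be sketched in a few lines of Python. The example below sums the Maclaurin series for $e^x$ quoted in Prob. 4.1 and stops once the approximate percent relative error falls below a tolerance. The function name `maclaurin_exp`, the argument $x = 0.5$, and the tolerance `es = 0.05` percent are assumptions chosen for illustration. The absolute value of the approximate error is used in the test, which is also an assumption about how the criterion is applied.

```python
import math

def maclaurin_exp(x, es=0.05, maxit=50):
    """Sum 1 + x + x^2/2! + ... until the approximate percent relative error < es."""
    estimate = 1.0   # zero-order version of the series
    term = 1.0       # current term x^n / n!
    ea = 100.0       # no previous approximation exists yet, so start at 100%
    terms = 1        # number of terms included so far
    while ea >= es and terms < maxit:
        term *= x / terms            # x^n/n! built from x^(n-1)/(n-1)!
        previous = estimate
        estimate += term
        terms += 1
        if estimate != 0.0:          # guard the division in the error formula
            ea = abs((estimate - previous) / estimate) * 100.0
    return estimate, ea, terms

estimate, ea, terms = maclaurin_exp(0.5)
true_value = math.exp(0.5)
et = abs((true_value - estimate) / true_value) * 100.0  # true percent relative error

print(f"terms = {terms}, estimate = {estimate:.9f}, ea = {ea:.4f}%, et = {et:.5f}%")
```

In this run the approximate error is larger than the true error, so stopping on the approximate error is conservative here.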
PART TWO: ROOTS OF EQUATIONS

PT2.1 MOTIVATION

Years ago, you learned to use the quadratic formula

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \tag{PT2.1}$$

to solve

$$f(x) = ax^2 + bx + c = 0 \tag{PT2.2}$$

The values calculated with Eq. (PT2.1) are called the "roots" of Eq. (PT2.2). They represent the values of x that make Eq. (PT2.2) equal to zero. Thus, we can define the root of an equation as the value of x that makes $f(x) = 0$. For this reason, roots are sometimes called the zeros of the equation.

Although the quadratic formula is handy for solving Eq. (PT2.2), there are many other functions for which the root cannot be determined so easily. For these cases, the numerical methods described in Chaps. 5, 6, and 7 provide efficient means to obtain the answer.

PT2.1.1 Noncomputer Methods for Determining Roots

Before the advent of digital computers, there were several ways to solve for roots of algebraic and transcendental equations. For some cases, the roots could be obtained by direct methods, as was done with Eq. (PT2.1). Although there were equations like this that could be solved directly, there were many more that could not. For example, even an apparently simple function such as $f(x) = e^{-x} - x$ cannot be solved analytically. In such instances, the only alternative is an approximate solution technique.

One method to obtain an approximate solution is to plot the function and determine where it crosses the x axis. This point, which represents the x value for which $f(x) = 0$, is the root. Graphical techniques are discussed at the beginning of Chaps. 5 and 6.

Although graphical methods are useful for obtaining rough estimates of roots, they are limited because of their lack of precision. An alternative approach is to use trial and error. This "technique" consists of guessing a value of x and evaluating whether f(x) is zero. If not (as is almost always the case), another guess is made, and f(x) is again evaluated to determine whether the new value provides a better estimate of the root.
The process is repeated until a guess is obtained that results in an f(x) that is close to zero.

Such haphazard methods are obviously inefficient and inadequate for the requirements of engineering practice. The techniques described in Part Two represent alternatives that are also approximate but employ systematic strategies to home in on the true root. As elaborated on in the following pages, the combination of these systematic methods and computers makes the solution of most applied roots-of-equations problems a simple and efficient task.
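As a rough stand-in for reading a root off a plot, the function $f(x) = e^{-x} - x$ mentioned in PT2.1.1 can be tabulated on a coarse grid and inspected for a change of sign. The following is a minimal Python sketch. The search interval [0, 1] and the grid spacing of 0.1 are assumptions chosen for illustration, and the result is only a coarse bracket, not a precise root.

```python
import math

def f(x):
    """The example function from the text: f(x) = exp(-x) - x."""
    return math.exp(-x) - x

step = 0.1        # assumed grid spacing
bracket = None    # will hold the first subinterval where f changes sign

for k in range(10):                       # subintervals covering [0, 1]
    left, right = k * step, (k + 1) * step
    print(f"x = {left:.1f}  f(x) = {f(left):+.4f}")
    if f(left) * f(right) < 0:            # sign change: the curve crosses the x axis
        bracket = (left, right)
        break

print("sign change between", bracket)
```

The tabulation locates the crossing only to within one grid spacing, which mirrors the limited precision of graphical estimates noted above.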
PT2.1.2 Roots of Equations and Engineering Practice

Although they arise in other problem contexts, roots of equations frequently occur in the area of engineering design. Table PT2.1 lists several fundamental principles that are routinely used in design work. As introduced in Chap. 1, mathematical equations or models derived from these principles are employed to predict dependent variables as a function of independent variables, forcing functions, and parameters. Note that in each case, the dependent variables reflect the state or performance of the system, whereas the parameters represent its properties or composition.

An example of such a model is the equation, derived from Newton's second law, used in Chap. 1 for the parachutist's velocity:

$$v = \frac{gm}{c}\left(1 - e^{-(c/m)t}\right) \tag{PT2.3}$$

where velocity v = the dependent variable, time t = the independent variable, the gravitational constant g = the forcing function, and the drag coefficient c and mass m = parameters. If the parameters are known, Eq. (PT2.3) can be used to predict the parachutist's velocity as a function of time. Such computations can be performed directly because v is expressed explicitly as a function of time. That is, it is isolated on one side of the equal sign.

TABLE PT2.1 Fundamental principles used in engineering design problems.
| Fundamental Principle | Dependent Variable | Independent Variable | Parameters |
| Heat balance | Temperature | Time and position | Thermal properties of material and geometry of system |
| Mass balance | Concentration or quantity of mass | Time and position | Chemical behavior of material, mass transfer coefficients, and geometry of system |
| Force balance | Magnitude and direction of forces | Time and position | Strength of material, structural properties, and geometry of system |
| Energy balance | Changes in the kinetic- and potential-energy states of the system | Time and position | Thermal properties, mass of material, and system geometry |
| Newton's laws of motion | Acceleration, velocity, or location | Time and position | Mass of material, system geometry, and dissipative parameters such as friction or drag |
| Kirchhoff's laws | Currents and voltages in electric circuits | Time | Electrical properties of systems such as resistance, capacitance, and inductance |
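Because Eq. (PT2.3) gives v explicitly, evaluating it is a direct computation. The following minimal Python sketch shows this. The function name `velocity` is hypothetical, and the numerical values (g = 9.81, m = 50, c = 12.5, t = 6) are borrowed from Prob. 4.11 purely as illustrative inputs.

```python
import math

def velocity(t, m, c, g=9.81):
    """Eq. (PT2.3): v = (g*m/c) * (1 - exp(-(c/m)*t)), explicit in v."""
    return (g * m / c) * (1.0 - math.exp(-(c / m) * t))

# Illustrative inputs taken from Prob. 4.11 (an assumption, not a worked example).
v6 = velocity(t=6.0, m=50.0, c=12.5)

# As t grows, the exponential vanishes and v approaches g*m/c.
terminal = 9.81 * 50.0 / 12.5

print(f"v(6) = {v6:.4f}, limiting velocity = {terminal:.2f}")
```

No iteration is needed here: given t, m, and c, a single function evaluation returns v.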
However, suppose we had to determine the drag coefficient for a parachutist of a given mass to attain a prescribed velocity in a set time period. Although Eq. (PT2.3) provides a mathematical representation of the interrelationship among the model variables and parameters, it cannot be solved explicitly for the drag coefficient. Try it. There is no way to rearrange the equation so that c is isolated on one side of the equal sign. In such cases, c is said to be implicit.

This represents a real dilemma, because many engineering design problems involve specifying the properties or composition of a system (as represented by its parameters) to ensure that it performs in a desired manner (as represented by its variables). Thus, these problems often require the determination of implicit parameters.

The solution to the dilemma is provided by numerical methods for roots of equations. To solve the problem using numerical methods, it is conventional to reexpress Eq. (PT2.3). This is done by subtracting the dependent variable v from both sides of the equation to give

$$f(c) = \frac{gm}{c}\left(1 - e^{-(c/m)t}\right) - v \tag{PT2.4}$$

The value of c that makes $f(c) = 0$ is, therefore, the root of the equation. This value also represents the drag coefficient that solves the design problem.

Part Two of this book deals with a variety of numerical and graphical methods for determining roots of relationships such as Eq. (PT2.4). These techniques can be applied to engineering design problems that are based on the fundamental principles outlined in Table PT2.1 as well as to many other problems confronted routinely in engineering practice.

PT2.2 MATHEMATICAL BACKGROUND

For most of the subject areas in this book, there is usually some prerequisite mathematical background needed to successfully master the topic. For example, the concepts of error estimation and the Taylor series expansion discussed in Chaps.
3 and 4 have direct relevance to our discussion of roots of equations. Additionally, prior to this point we have mentioned the terms "algebraic" and "transcendental" equations. It might be helpful to formally define these terms and discuss how they relate to the scope of this part of the book.

By definition, a function given by $y = f(x)$ is algebraic if it can be expressed in the form

$$f_n y^n + f_{n-1} y^{n-1} + \cdots + f_1 y + f_0 = 0 \tag{PT2.5}$$

where $f_i$ = an ith-order polynomial in x. Polynomials are a simple class of algebraic functions that are represented generally by

$$f_n(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n \tag{PT2.6}$$

where n = the order of the polynomial and the a's = constants. Some specific examples are

$$f_2(x) = 1 - 2.37x + 7.5x^2 \tag{PT2.7}$$

and

$$f_6(x) = 5x^2 - x^3 + 7x^6 \tag{PT2.8}$$
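As a small illustration, the second-order example of Eq. (PT2.7) can be solved with the quadratic formula of Eq. (PT2.1). The Python sketch below uses the standard-library `cmath` module so that a negative discriminant is handled without special cases. The helper name `quadratic_roots` is hypothetical. Note that Eq. (PT2.6) lists coefficients in ascending powers, so for Eq. (PT2.7) the quadratic-formula coefficients are a = 7.5, b = -2.37, and c = 1.

```python
import cmath

def quadratic_roots(a, b, c):
    """Roots of a*x**2 + b*x + c = 0 via Eq. (PT2.1); works for complex roots."""
    disc = cmath.sqrt(b * b - 4.0 * a * c)   # complex square root of the discriminant
    return (-b + disc) / (2.0 * a), (-b - disc) / (2.0 * a)

def f2(x):
    """Eq. (PT2.7): f2(x) = 1 - 2.37x + 7.5x^2."""
    return 1.0 - 2.37 * x + 7.5 * x**2

# Ascending-order coefficients (1, -2.37, 7.5) map to c, b, a respectively.
r1, r2 = quadratic_roots(7.5, -2.37, 1.0)

print("roots:", r1, r2)
print("residuals:", abs(f2(r1)), abs(f2(r2)))
```

The discriminant is negative for this polynomial, so the two roots form a complex-conjugate pair.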
A transcendental function is one that is nonalgebraic. These include trigonometric, exponential, logarithmic, and other, less familiar, functions. Examples are

$$f(x) = \ln x^2 - 1 \tag{PT2.9}$$

and

$$f(x) = e^{-0.2x} \sin(3x - 0.5) \tag{PT2.10}$$

Roots of equations may be either real or complex. Although there are cases where complex roots of nonpolynomials are of interest, such situations are less common than for polynomials. As a consequence, the standard methods for locating roots typically fall into two somewhat related but primarily distinct problem areas:

1. The determination of the real roots of algebraic and transcendental equations. These techniques are usually designed to determine the value of a single real root on the basis of foreknowledge of its approximate location.
2. The determination of all real and complex roots of polynomials. These methods are specifically designed for polynomials. They systematically determine all the roots of the polynomial rather than determining a single real root given an approximate location.

In this book we discuss both. Chapters 5 and 6 are devoted to the first category. Chapter 7 deals with polynomials.

PT2.3 ORIENTATION

Some orientation is helpful before proceeding to the numerical methods for determining roots of equations. The following is intended to give you an overview of the material in Part Two. In addition, some objectives have been included to help you focus your efforts when studying the material.

PT2.3.1 Scope and Preview

Figure PT2.1 is a schematic representation of the organization of Part Two. Examine this figure carefully, starting at the top and working clockwise.

After the present introduction, Chap. 5 is devoted to bracketing methods for finding roots. These methods start with guesses that bracket, or contain, the root and then systematically reduce the width of the bracket. Two specific methods are covered: bisection and false position.
Graphical methods are used to provide visual insight into the techniques. Error formulations are developed to help you determine how much computational effort is required to estimate the root to a prespecified level of precision.

Chapter 6 covers open methods. These methods also involve systematic trial-and-error iterations but do not require that the initial guesses bracket the root. We will discover that these methods are usually more computationally efficient than bracketing methods but that they do not always work. One-point iteration, Newton-Raphson, and secant methods are described. Graphical methods are used to provide geometric insight into cases where the open methods do not work. Formulas are developed that provide an idea of how fast open methods home in on the root. An advanced approach, Brent’s method, that combines the reliability of bracketing with the speed of open methods is
described. In addition, an approach to extend the Newton-Raphson method to systems of nonlinear equations is explained.

FIGURE PT2.1 Schematic of the organization of the material in Part Two: Roots of Equations.

Chapter 7 is devoted to finding the roots of polynomials. After background sections on polynomials, the use of conventional methods (in particular the open methods from Chap. 6) is discussed. Then two special methods for locating polynomial roots are
described: Müller’s and Bairstow’s methods. The chapter ends with information related to finding roots with Excel, MATLAB software, and Mathcad.

Chapter 8 extends the above concepts to actual engineering problems. Engineering case studies are used to illustrate the strengths and weaknesses of each method and to provide insight into the application of the techniques in professional practice. The applications also highlight the trade-offs (as discussed in Part One) associated with the various methods.

An epilogue is included at the end of Part Two. It contains a detailed comparison of the methods discussed in Chaps. 5, 6, and 7. This comparison includes a description of trade-offs related to the proper use of each technique. This section also provides a summary of important formulas, along with references for some numerical methods that are beyond the scope of this text.

PT2.3.2 Goals and Objectives

Study Objectives. After completing Part Two, you should have sufficient information to successfully approach a wide variety of engineering problems dealing with roots of equations. In general, you should have mastered the techniques, have learned to assess their reliability, and be capable of choosing the best method (or methods) for any particular problem. In addition to these general goals, the specific concepts in Table PT2.2 should be assimilated for a comprehensive understanding of the material in Part Two.

Computer Objectives. The book provides you with software and simple computer algorithms to implement the techniques discussed in Part Two. All have utility as learning tools.

Pseudocodes for several methods are also supplied directly in the text. This information will allow you to expand your software library to include programs that are more efficient than the bisection method.
For example, you may also want to have your own software for the false-position, Newton-Raphson, and secant techniques, which are often more efficient than the bisection method.

Finally, packages such as Excel, MATLAB, and Mathcad have powerful capabilities for locating roots. You can use this part of the book to become familiar with these capabilities.

TABLE PT2.2 Specific study objectives for Part Two.

1. Understand the graphical interpretation of a root
2. Know the graphical interpretation of the false-position method and why it is usually superior to the bisection method
3. Understand the difference between bracketing and open methods for root location
4. Understand the concepts of convergence and divergence; use the two-curve graphical method to provide a visual manifestation of the concepts
5. Know why bracketing methods always converge, whereas open methods may sometimes diverge
6. Realize that convergence of open methods is more likely if the initial guess is close to the true root
7. Understand the concepts of linear and quadratic convergence and their implications for the efficiencies of the fixed-point-iteration and Newton-Raphson methods
8. Know the fundamental difference between the false-position and secant methods and how it relates to convergence
9. Understand how Brent’s method combines the reliability of bisection with the speed of open methods
10. Understand the problems posed by multiple roots and the modifications available to mitigate them
11. Know how to extend the single-equation Newton-Raphson approach to solve systems of nonlinear equations
CHAPTER 5 Bracketing Methods

This chapter on roots of equations deals with methods that exploit the fact that a function typically changes sign in the vicinity of a root. These techniques are called bracketing methods because two initial guesses for the root are required. As the name implies, these guesses must “bracket,” or be on either side of, the root. The particular methods described herein employ different strategies to systematically reduce the width of the bracket and, hence, home in on the correct answer.

As a prelude to these techniques, we will briefly discuss graphical methods for depicting functions and their roots. Beyond their utility for providing rough guesses, graphical techniques are also useful for visualizing the properties of the functions and the behavior of the various numerical methods.

5.1 GRAPHICAL METHODS

A simple method for obtaining an estimate of the root of the equation $f(x) = 0$ is to make a plot of the function and observe where it crosses the x axis. This point, which represents the x value for which $f(x) = 0$, provides a rough approximation of the root.

EXAMPLE 5.1 The Graphical Approach

Problem Statement. Use the graphical approach to determine the drag coefficient $c$ needed for a parachutist of mass $m = 68.1$ kg to have a velocity of 40 m/s after free-falling for time $t = 10$ s. Note: The acceleration due to gravity is 9.81 m/s².

Solution. This problem can be solved by determining the root of Eq. (PT2.4) using the parameters $t = 10$, $g = 9.81$, $v = 40$, and $m = 68.1$:

$$f(c) = \frac{9.81(68.1)}{c}\left(1 - e^{-(c/68.1)10}\right) - 40$$

or

$$f(c) = \frac{668.06}{c}\left(1 - e^{-0.146843c}\right) - 40 \tag{E5.1.1}$$

Various values of $c$ can be substituted into the right-hand side of this equation to compute
   c      f(c)
   4    34.190
   8    17.712
  12     6.114
  16    -2.230
  20    -8.368

These points are plotted in Fig. 5.1. The resulting curve crosses the $c$ axis between 12 and 16. Visual inspection of the plot provides a rough estimate of the root of 14.75. The validity of the graphical estimate can be checked by substituting it into Eq. (E5.1.1) to yield

$$f(14.75) = \frac{668.06}{14.75}\left(1 - e^{-0.146843(14.75)}\right) - 40 = 0.100$$

which is close to zero. It can also be checked by substituting it into Eq. (PT2.3) along with the parameter values from this example to give

$$v = \frac{9.81(68.1)}{14.75}\left(1 - e^{-(14.75/68.1)10}\right) = 40.100$$

which is very close to the desired fall velocity of 40 m/s.

FIGURE 5.1 The graphical approach for determining the roots of an equation. (Plot of $f(c)$ versus $c$; the root is where the curve crosses the $c$ axis.)
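The tabulation in Example 5.1 can also be generated with a few lines of code. The following is a minimal Python sketch, not part of the original text: the function name `f`, its default arguments, and the list names are our own choices, and the function is written from the unrounded form of Eq. (E5.1.1), so the last digit may differ slightly from the table.

```python
import math

def f(c, m=68.1, g=9.81, t=10.0, v=40.0):
    """Unrounded form of Eq. (E5.1.1): model velocity minus the target velocity."""
    return g * m / c * (1.0 - math.exp(-(c / m) * t)) - v

# Tabulate f(c) at the same c values used in Example 5.1.
cs = [4, 8, 12, 16, 20]
table = [(c, f(c)) for c in cs]
for c, fc in table:
    print(f"{c:4d} {fc:10.3f}")

# Keep every pair of neighbouring points between which f(c) changes sign.
brackets = [(a, b) for (a, fa), (b, fb) in zip(table, table[1:]) if fa * fb < 0]
print("sign change between:", brackets)
```

The only sign change in this tabulation falls between c = 12 and c = 16, which is the interval read from Fig. 5.1.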
Graphical techniques are of limited practical value because they are not precise. However, graphical methods can be utilized to obtain rough estimates of roots. These estimates can be employed as starting guesses for numerical methods discussed in this and the next chapter.

Aside from providing rough estimates of the root, graphical interpretations are important tools for understanding the properties of the functions and anticipating the pitfalls of the numerical methods. For example, Fig. 5.2 shows a number of ways in which roots can occur (or be absent) in an interval prescribed by a lower bound $x_l$ and an upper bound $x_u$. Figure 5.2b depicts the case where a single root is bracketed by negative and positive values of $f(x)$. However, Fig. 5.2d, where $f(x_l)$ and $f(x_u)$ are also on opposite sides of the x axis, shows three roots occurring within the interval. In general, if $f(x_l)$ and $f(x_u)$ have opposite signs, there are an odd number of roots in the interval. As indicated by Fig. 5.2a and c, if $f(x_l)$ and $f(x_u)$ have the same sign, there are either no roots or an even number of roots between the values.

Although these generalizations are usually true, there are cases where they do not hold. For example, functions that are tangential to the x axis (Fig. 5.3a) and discontinuous functions (Fig. 5.3b) can violate these principles. An example of a function that is tangential to the axis is the cubic equation $f(x) = (x - 2)(x - 2)(x - 4)$. Notice that $x = 2$ makes two terms in this polynomial equal to zero. Mathematically, $x = 2$ is called a multiple root. At the end of Chap. 6, we will present techniques that are expressly designed to locate multiple roots.

The existence of cases of the type depicted in Fig. 5.3 makes it difficult to develop general computer algorithms guaranteed to locate all the roots in an interval.
FIGURE 5.2 Illustration of a number of general ways that a root may occur in an interval prescribed by a lower bound $x_l$ and an upper bound $x_u$. Parts (a) and (c) indicate that if both $f(x_l)$ and $f(x_u)$ have the same sign, either there will be no roots or there will be an even number of roots within the interval. Parts (b) and (d) indicate that if the function has different signs at the end points, there will be an odd number of roots in the interval.

FIGURE 5.3 Illustration of some exceptions to the general cases depicted in Fig. 5.2. (a) Multiple root that occurs when the function is tangential to the x axis. For this case, although the end points are of opposite signs, there are an even number of axis intersections for the interval. (b) Discontinuous function where end points of opposite sign bracket an even number of roots. Special strategies are required for determining the roots for these cases.

However, when used in conjunction with graphical approaches, the methods described in the
following sections are extremely useful for solving many roots-of-equations problems confronted routinely by engineers and applied mathematicians.

EXAMPLE 5.2 Use of Computer Graphics to Locate Roots

Problem Statement. Computer graphics can expedite and improve your efforts to locate roots of equations. The function

$$f(x) = \sin 10x + \cos 3x$$

has several roots over the range $x = 0$ to $x = 5$. Use computer graphics to gain insight into the behavior of this function.

FIGURE 5.4 The progressive enlargement of $f(x) = \sin 10x + \cos 3x$ by the computer. Such interactive graphics permit the analyst to determine that two distinct roots exist between $x = 4.2$ and $x = 4.3$. (Panels: (a) $x = 0$ to 5; (b) $x = 3$ to 5; (c) $x = 4.2$ to 4.3, with the vertical scale from −0.15 to 0.15.)

Solution. Packages such as Excel and MATLAB software can be used to generate plots. Figure 5.4a is a plot of $f(x)$ from $x = 0$ to $x = 5$. This plot suggests the presence of several roots, including a possible double root at about $x = 4.2$ where $f(x)$ appears to be
tangent to the x axis. A more detailed picture of the behavior of $f(x)$ is obtained by changing the plotting range so that it runs from $x = 3$ to $x = 5$, as shown in Fig. 5.4b. Finally, in Fig. 5.4c, the vertical scale is narrowed further to $f(x) = -0.15$ to $f(x) = 0.15$ and the horizontal scale is narrowed to $x = 4.2$ to $x = 4.3$. This plot shows clearly that a double root does not exist in this region and that in fact there are two distinct roots at about $x = 4.23$ and $x = 4.26$.

Computer graphics will have great utility in your studies of numerical methods. This capability will also find many applications in your other classes and professional activities.

FIGURE 5.5
Step 1: Choose lower $x_l$ and upper $x_u$ guesses for the root such that the function changes sign over the interval. This can be checked by ensuring that $f(x_l) f(x_u) < 0$.
Step 2: An estimate of the root $x_r$ is determined by
$$x_r = \frac{x_l + x_u}{2}$$
Step 3: Make the following evaluations to determine in which subinterval the root lies:
(a) If $f(x_l) f(x_r) < 0$, the root lies in the lower subinterval. Therefore, set $x_u = x_r$ and return to step 2.
(b) If $f(x_l) f(x_r) > 0$, the root lies in the upper subinterval. Therefore, set $x_l = x_r$ and return to step 2.
(c) If $f(x_l) f(x_r) = 0$, the root equals $x_r$; terminate the computation.

5.2 THE BISECTION METHOD

When applying the graphical technique in Example 5.1, you observed (Fig. 5.1) that $f(x)$ changed sign on opposite sides of the root. In general, if $f(x)$ is real and continuous in the interval from $x_l$ to $x_u$ and $f(x_l)$ and $f(x_u)$ have opposite signs, that is,

$$f(x_l) f(x_u) < 0 \tag{5.1}$$

then there is at least one real root between $x_l$ and $x_u$.

Incremental search methods capitalize on this observation by locating an interval where the function changes sign. Then the location of the sign change (and consequently, the root) is identified more precisely by dividing the interval into a number of subintervals.
Each of these subintervals is searched to locate the sign change. The process is repeated and the root estimate refined by dividing the subintervals into finer increments. We will return to the general topic of incremental searches in Sec. 5.4. The bisection method, which is alternatively called binary chopping, interval halving, or Bolzano’s method, is one type of incremental search method in which the interval is always divided in half. If a function changes sign over an interval, the function value at the midpoint is evaluated. The location of the root is then determined as lying at the midpoint of the subinterval within which the sign change occurs. The process is repeated to obtain refined estimates. A simple algorithm for the bisection calculation is listed in Fig. 5.5, and a graphical depiction of the method is provided in Fig. 5.6. The following example goes through the actual computations involved in the method.
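The three steps listed in Fig. 5.5 can be sketched in Python as follows. This is a minimal illustration rather than the book's program: the names `f` and `bisection_steps` are hypothetical, the number of iterations is fixed in advance instead of using a stopping criterion, and the bracket of 12 to 16 is the one read from Fig. 5.1.

```python
import math

def f(c):
    # Eq. (E5.1.1) from Example 5.1
    return 668.06 / c * (1.0 - math.exp(-0.146843 * c)) - 40.0

def bisection_steps(func, xl, xu, n_iter):
    """Apply steps 1 to 3 of Fig. 5.5 for a fixed number of iterations (our assumption)."""
    # Step 1: the guesses must bracket a sign change.
    if func(xl) * func(xu) >= 0:
        raise ValueError("no sign change over [xl, xu]")
    estimates = []
    for _ in range(n_iter):
        xr = (xl + xu) / 2.0          # Step 2: midpoint estimate
        estimates.append(xr)
        test = func(xl) * func(xr)    # Step 3: decide which half holds the root
        if test < 0:
            xu = xr                   # (a) root in lower subinterval
        elif test > 0:
            xl = xr                   # (b) root in upper subinterval
        else:
            break                     # (c) xr is the root
    return estimates

estimates = bisection_steps(f, 12.0, 16.0, 3)
print(estimates)
```

With three iterations the successive estimates are 14, 15, and 14.5.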
EXAMPLE 5.3 Bisection

Problem Statement. Use bisection to solve the same problem approached graphically in Example 5.1.

Solution. The first step in bisection is to guess two values of the unknown (in the present problem, $c$) that give values for $f(c)$ with different signs. From Fig. 5.1, we can see that the function changes sign between values of 12 and 16. Therefore, the initial estimate of the root $x_r$ lies at the midpoint of the interval

$$x_r = \frac{12 + 16}{2} = 14$$

This estimate represents a true percent relative error of $\varepsilon_t = 5.3\%$ (note that the true value of the root is 14.8011). Next we compute the product of the function value at the lower bound and at the midpoint:

$$f(12) f(14) = 6.114(1.611) = 9.850$$

which is greater than zero, and hence no sign change occurs between the lower bound and the midpoint. Consequently, the root must be located between 14 and 16. Therefore, we create a new interval by redefining the lower bound as 14 and determining a revised root estimate as

$$x_r = \frac{14 + 16}{2} = 15$$

which represents a true percent error of $\varepsilon_t = 1.3\%$.

FIGURE 5.6 A graphical depiction of the bisection method. This plot conforms to the first three iterations from Example 5.3.

The process can be repeated to obtain refined estimates. For example,

$$f(14) f(15) = 1.611(-0.384) = -0.619$$
Therefore, the root is between 14 and 15. The upper bound is redefined as 15, and the root estimate for the third iteration is calculated as

$$x_r = \frac{14 + 15}{2} = 14.5$$

which represents a percent relative error of $\varepsilon_t = 2.0\%$. The method can be repeated until the result is accurate enough to satisfy your needs.

In the previous example, you may have noticed that the true error does not decrease with each iteration. However, the interval within which the root is located is halved with each step in the process. As discussed in the next section, the interval width provides an exact estimate of the upper bound of the error for the bisection method.

5.2.1 Termination Criteria and Error Estimates

We ended Example 5.3 with the statement that the method could be continued to obtain a refined estimate of the root. We must now develop an objective criterion for deciding when to terminate the method.

An initial suggestion might be to end the calculation when the true error falls below some prespecified level. For instance, in Example 5.3, the relative error dropped to 2.0 percent during the course of the computation. We might decide that we should terminate when the error drops below, say, 0.1 percent. This strategy is flawed because the error estimates in the example were based on knowledge of the true root of the function. This would not be the case in an actual situation because there would be no point in using the method if we already knew the root.

Therefore, we require an error estimate that is not contingent on foreknowledge of the root. As developed previously in Sec. 3.3, an approximate percent relative error $\varepsilon_a$ can be calculated, as in [recall Eq. (3.5)]

$$\varepsilon_a = \left| \frac{x_r^{\text{new}} - x_r^{\text{old}}}{x_r^{\text{new}}} \right| 100\% \tag{5.2}$$

where $x_r^{\text{new}}$ is the root for the present iteration and $x_r^{\text{old}}$ is the root from the previous iteration. The absolute value is used because we are usually concerned with the magnitude of $\varepsilon_a$ rather than with its sign.
When $\varepsilon_a$ becomes less than a prespecified stopping criterion $\varepsilon_s$, the computation is terminated.

EXAMPLE 5.4 Error Estimates for Bisection

Problem Statement. Continue Example 5.3 until the approximate error falls below a stopping criterion of $\varepsilon_s = 0.5\%$. Use Eq. (5.2) to compute the errors.

Solution. The results of the first two iterations for Example 5.3 were 14 and 15. Substituting these values into Eq. (5.2) yields

$$|\varepsilon_a| = \left| \frac{15 - 14}{15} \right| 100\% = 6.667\%$$
Recall that the true percent relative error for the root estimate of 15 was 1.3%. Therefore, $\varepsilon_a$ is greater than $\varepsilon_t$. This behavior is manifested for the other iterations:

Iteration   xl       xu        xr         εa (%)   εt (%)
1           12       16        14                  5.413
2           14       16        15         6.667    1.344
3           14       15        14.5       3.448    2.035
4           14.5     15        14.75      1.695    0.345
5           14.75    15        14.875     0.840    0.499
6           14.75    14.875    14.8125    0.422    0.077

Thus, after six iterations $\varepsilon_a$ finally falls below $\varepsilon_s = 0.5\%$, and the computation can be terminated.

These results are summarized in Fig. 5.7. The “ragged” nature of the true error is due to the fact that, for bisection, the true root can lie anywhere within the bracketing interval. The true and approximate errors are far apart when the interval happens to be centered on the true root. They are close when the true root falls at either end of the interval.

FIGURE 5.7 Errors for the bisection method. True and estimated errors are plotted versus the number of iterations.

Although the approximate error does not provide an exact estimate of the true error, Fig. 5.7 suggests that $\varepsilon_a$ captures the general downward trend of $\varepsilon_t$. In addition, the plot exhibits the extremely attractive characteristic that $\varepsilon_a$ is always greater than $\varepsilon_t$. Thus,
when $\varepsilon_a$ falls below $\varepsilon_s$, the computation could be terminated with confidence that the root is known to be at least as accurate as the prespecified acceptable level.

Although it is always dangerous to draw general conclusions from a single example, it can be demonstrated that $\varepsilon_a$ will always be greater than $\varepsilon_t$ for the bisection method. This is because each time an approximate root is located using bisection as $x_r = (x_l + x_u)/2$, we know that the true root lies somewhere within an interval of $(x_u - x_l)/2 = \Delta x/2$. Therefore, the root must lie within $\pm\Delta x/2$ of our estimate (Fig. 5.8). For instance, when Example 5.3 was terminated, we could make the definitive statement that

$$x_r = 14.5 \pm 0.5$$

Because $\Delta x/2 = x_r^{\text{new}} - x_r^{\text{old}}$ (Fig. 5.9), Eq. (5.2) provides an exact upper bound on the true error. For this bound to be exceeded, the true root would have to fall outside the bracketing interval, which, by definition, could never occur for the bisection method.

FIGURE 5.8 Three ways in which the interval may bracket the root. In (a) the true value lies at the center of the interval, whereas in (b) and (c) the true value lies near the extreme. Notice that the discrepancy between the true value and the midpoint of the interval never exceeds half the interval length, or $\Delta x/2$.

FIGURE 5.9 Graphical depiction of why the error estimate for bisection ($\Delta x/2$) is equivalent to the root estimate for the present iteration ($x_r^{\text{new}}$) minus the root estimate for the previous iteration ($x_r^{\text{old}}$).

As illustrated in a subsequent example (Example 5.6), other root-locating techniques do not always behave as nicely. Although bisection is generally slower than other methods,
the neatness of its error analysis is certainly a positive aspect that could make it attractive for certain engineering applications.

Before proceeding to the computer program for bisection, we should note that the relationships (Fig. 5.9)

$$x_r^{\text{new}} - x_r^{\text{old}} = \frac{x_u - x_l}{2}$$

and

$$x_r^{\text{new}} = \frac{x_l + x_u}{2}$$

can be substituted into Eq. (5.2) to develop an alternative formulation for the approximate percent relative error

$$\varepsilon_a = \left| \frac{x_u - x_l}{x_u + x_l} \right| 100\% \tag{5.3}$$

This equation yields identical results to Eq. (5.2) for bisection. In addition, it allows us to calculate an error estimate on the basis of our initial guesses, that is, on our first iteration. For instance, on the first iteration of Example 5.3, an approximate error can be computed as

$$\varepsilon_a = \left| \frac{16 - 12}{16 + 12} \right| 100\% = 14.29\%$$

Another benefit of the bisection method is that the number of iterations required to attain an absolute error can be computed a priori, that is, before starting the iterations. This can be seen by recognizing that before starting the technique, the absolute error is

$$E_a^0 = x_u^0 - x_l^0 = \Delta x^0$$

where the superscript designates the iteration. Hence, before starting the method, we are at the “zero iteration.” After the first iteration, the error becomes

$$E_a^1 = \frac{\Delta x^0}{2}$$

Because each succeeding iteration halves the error, a general formula relating the error and the number of iterations $n$ is

$$E_a^n = \frac{\Delta x^0}{2^n} \tag{5.4}$$

If $E_{a,d}$ is the desired error, this equation can be solved for

$$n = \frac{\log(\Delta x^0 / E_{a,d})}{\log 2} = \log_2\left(\frac{\Delta x^0}{E_{a,d}}\right) \tag{5.5}$$

Let us test the formula. For Example 5.4, the initial interval was $\Delta x^0 = 16 - 12 = 4$. After six iterations, the absolute error was

$$E_a = \frac{|14.875 - 14.75|}{2} = 0.0625$$
We can substitute these values into Eq. (5.5) to give

$$n = \frac{\log(4/0.0625)}{\log 2} = 6$$

Thus, if we knew beforehand that an error of less than 0.0625 was acceptable, the formula tells us that six iterations would yield the desired result.

Although we have emphasized the use of relative errors for obvious reasons, there will be cases where (usually through knowledge of the problem context) you will be able to specify an absolute error. For these cases, bisection along with Eq. (5.5) can provide a useful root-location algorithm. We will explore such applications in the end-of-chapter problems.

5.2.2 Bisection Algorithm

The algorithm in Fig. 5.5 can now be expanded to include the error check (Fig. 5.10). The algorithm employs user-defined functions to make root location and function evaluation more efficient. In addition, an upper limit is placed on the number of iterations. Finally, an error check is included to avoid division by zero during the error evaluation. Such would be the case when the bracketing interval is centered on zero. For this situation, Eq. (5.2) becomes infinite. If this occurs, the program skips over the error evaluation for that iteration.

The algorithm in Fig. 5.10 is not user-friendly; it is designed strictly to come up with the answer. In Prob. 5.14 at the end of this chapter, you will have the task of making it easier to use and understand.

FUNCTION Bisect(xl, xu, es, imax, xr, iter, ea)
  iter = 0
  DO
    xrold = xr
    xr = (xl + xu) / 2
    iter = iter + 1
    IF xr ≠ 0 THEN
      ea = ABS((xr − xrold) / xr) * 100
    END IF
    test = f(xl) * f(xr)
    IF test < 0 THEN
      xu = xr
    ELSE IF test > 0 THEN
      xl = xr
    ELSE
      ea = 0
    END IF
    IF ea < es OR iter ≥ imax EXIT
  END DO
  Bisect = xr
END Bisect

FIGURE 5.10 Pseudocode for function to implement bisection.
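For readers who prefer a runnable version, the pseudocode of Fig. 5.10 might be translated into Python roughly as follows. This is a sketch under stated assumptions, not the book's software: the names `bisect` and `iterations_needed`, the default arguments, and the initial bracket check are ours. We also skip the error estimate on the first iteration, because no previous root estimate exists at that point; the pseudocode instead relies on whatever value of `xr` is passed in.

```python
import math

def f(c):
    # Eq. (E5.1.1): parachutist problem from Example 5.1
    return 668.06 / c * (1.0 - math.exp(-0.146843 * c)) - 40.0

def bisect(func, xl, xu, es=0.5, imax=50):
    """Bisection with the stopping test of Eq. (5.2); returns (xr, iterations, ea)."""
    if func(xl) * func(xu) > 0:
        raise ValueError("xl and xu do not bracket a sign change")
    xr = xl        # placeholder only; there is no previous estimate yet
    ea = 100.0     # percent; kept at 100 until two estimates are available
    it = 0
    while it < imax:
        xr_old = xr
        xr = (xl + xu) / 2.0
        it += 1
        if it > 1 and xr != 0:                      # avoid division by zero
            ea = abs((xr - xr_old) / xr) * 100.0
        test = func(xl) * func(xr)
        if test < 0:
            xu = xr
        elif test > 0:
            xl = xr
        else:
            ea = 0.0                                # xr is exactly a root
        if ea < es:
            break
    return xr, it, ea

def iterations_needed(dx0, ea_d):
    """Eq. (5.5): iterations needed to reach the absolute error ea_d."""
    return math.ceil(math.log2(dx0 / ea_d))

root, n_iter, ea = bisect(f, 12.0, 16.0, es=0.5)
print(root, n_iter, round(ea, 3))
print(iterations_needed(4.0, 0.0625))
```

Run on the bracket of Example 5.4 with a stopping criterion of 0.5%, the sketch stops after six iterations at a root estimate of 14.8125, matching the last row of that example's table, and the helper based on Eq. (5.5) returns six for an initial interval of 4 and a desired absolute error of 0.0625.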
5.2.3 Minimizing Function Evaluations

The bisection algorithm in Fig. 5.10 is just fine if you are performing a single root evaluation for a function that is easy to evaluate. However, there are many instances in engineering when this is not the case. For example, suppose that you develop a computer program that must locate a root numerous times. In such cases you could call the algorithm from Fig. 5.10 thousands and even millions of times in the course of a single run.

Further, in its most general sense, a univariate function is merely an entity that returns a single value in return for a single value you send to it. Perceived in this sense, functions are not always simple formulas like the one-line equations solved in the preceding examples in this chapter. For example, a function might consist of many lines of code that could take a significant amount of execution time to evaluate. In some cases, the function might even represent an independent computer program.

Because of both these factors, it is imperative that numerical algorithms minimize function evaluations. In this light, the algorithm from Fig. 5.10 is deficient. In particular, notice that in making two function evaluations per iteration, it recalculates one of the functions that was determined on the previous iteration.

Figure 5.11 provides a modified algorithm that does not have this deficiency. We have highlighted the lines that differ from Fig. 5.10.

FUNCTION Bisect(xl, xu, es, imax, xr, iter, ea)
  iter = 0
  fl = f(xl)
  DO
    xrold = xr
    xr = (xl + xu) / 2
    fr = f(xr)
    iter = iter + 1
    IF xr ≠ 0 THEN
      ea = ABS((xr − xrold) / xr) * 100
    END IF
    test = fl * fr
    IF test < 0 THEN
      xu = xr
    ELSE IF test > 0 THEN
      xl = xr
      fl = fr
    ELSE
      ea = 0
    END IF
    IF ea < es OR iter ≥ imax EXIT
  END DO
  Bisect = xr
END Bisect

FIGURE 5.11 Pseudocode for bisection subprogram which minimizes function evaluations.

In this case, only the new function value at
the root estimate is calculated. Previously calculated values are saved and merely reassigned as the bracket shrinks. Thus, $n + 1$ function evaluations are performed, rather than $2n$.

5.3 THE FALSE-POSITION METHOD

Although bisection is a perfectly valid technique for determining roots, its “brute-force” approach is relatively inefficient. False position is an alternative based on a graphical insight.

A shortcoming of the bisection method is that, in dividing the interval from $x_l$ to $x_u$ into equal halves, no account is taken of the magnitudes of $f(x_l)$ and $f(x_u)$. For example, if $f(x_l)$ is much closer to zero than $f(x_u)$, it is likely that the root is closer to $x_l$ than to $x_u$ (Fig. 5.12). An alternative method that exploits this graphical insight is to join $f(x_l)$ and $f(x_u)$ by a straight line. The intersection of this line with the x axis represents an improved estimate of the root. The fact that the replacement of the curve by a straight line gives a “false position” of the root is the origin of the name, method of false position, or in Latin, regula falsi. It is also called the linear interpolation method.

FIGURE 5.12 A graphical depiction of the method of false position. Similar triangles used to derive the formula for the method are shaded.

Using similar triangles (Fig. 5.12), the intersection of the straight line with the x axis can be estimated as

$$\frac{f(x_l)}{x_r - x_l} = \frac{f(x_u)}{x_r - x_u} \tag{5.6}$$

which can be solved for $x_r$ (see Box 5.1 for details):

$$x_r = x_u - \frac{f(x_u)(x_l - x_u)}{f(x_l) - f(x_u)} \tag{5.7}$$
This is the false-position formula. The value of $x_r$ computed with Eq. (5.7) then replaces whichever of the two initial guesses, $x_l$ or $x_u$, yields a function value with the same sign as $f(x_r)$. In this way, the values of $x_l$ and $x_u$ always bracket the true root. The process is repeated until the root is estimated adequately. The algorithm is identical to the one for bisection (Fig. 5.5) with the exception that Eq. (5.7) is used for step 2. In addition, the same stopping criterion [Eq. (5.2)] is used to terminate the computation.

EXAMPLE 5.5 False Position

Problem Statement. Use the false-position method to determine the root of the same equation investigated in Example 5.1 [Eq. (E5.1.1)].

Solution. As in Example 5.3, initiate the computation with guesses of $x_l = 12$ and $x_u = 16$.

First iteration:

$x_l = 12$, $f(x_l) = 6.1139$
$x_u = 16$, $f(x_u) = -2.2303$

$$x_r = 16 - \frac{-2.2303(12 - 16)}{6.1139 - (-2.2303)} = 14.9309$$

which has a true relative error of 0.88 percent.

Second iteration:

$$f(x_l) f(x_r) = -1.5376$$

Box 5.1 Derivation of the Method of False Position

Cross-multiply Eq. (5.6) to yield

$$f(x_l)(x_r - x_u) = f(x_u)(x_r - x_l)$$

Collect terms and rearrange:

$$x_r [f(x_l) - f(x_u)] = x_u f(x_l) - x_l f(x_u)$$

Divide by $f(x_l) - f(x_u)$:

$$x_r = \frac{x_u f(x_l) - x_l f(x_u)}{f(x_l) - f(x_u)} \tag{B5.1.1}$$

This is one form of the method of false position. Note that it allows the computation of the root $x_r$ as a function of the lower and upper guesses $x_l$ and $x_u$. It can be put in an alternative form by expanding it:

$$x_r = \frac{x_u f(x_l)}{f(x_l) - f(x_u)} - \frac{x_l f(x_u)}{f(x_l) - f(x_u)}$$

then adding and subtracting $x_u$ on the right-hand side:

$$x_r = x_u + \frac{x_u f(x_l)}{f(x_l) - f(x_u)} - x_u - \frac{x_l f(x_u)}{f(x_l) - f(x_u)}$$

Collecting terms yields

$$x_r = x_u + \frac{x_u f(x_u)}{f(x_l) - f(x_u)} - \frac{x_l f(x_u)}{f(x_l) - f(x_u)}$$

or

$$x_r = x_u - \frac{f(x_u)(x_l - x_u)}{f(x_l) - f(x_u)}$$

which is the same as Eq. (5.7). We use this form because it involves one less function evaluation and one less multiplication than Eq. (B5.1.1).
In addition, it is directly comparable with the secant method, which will be discussed in Chap. 6.
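As a quick numerical check of the algebra in Box 5.1, the two forms can be evaluated side by side. The following Python sketch is ours, not the book's: the function names `xr_b511` and `xr_57` are hypothetical labels for Eq. (B5.1.1) and Eq. (5.7), and the bracket of 12 to 16 is the one used in Example 5.5.

```python
import math

def f(c):
    # Eq. (E5.1.1) from Example 5.1
    return 668.06 / c * (1.0 - math.exp(-0.146843 * c)) - 40.0

def xr_b511(xl, xu, fl, fu):
    # Eq. (B5.1.1)
    return (xu * fl - xl * fu) / (fl - fu)

def xr_57(xl, xu, fl, fu):
    # Eq. (5.7)
    return xu - fu * (xl - xu) / (fl - fu)

xl, xu = 12.0, 16.0
fl, fu = f(xl), f(xu)
a = xr_b511(xl, xu, fl, fu)
b = xr_57(xl, xu, fl, fu)
print(a, b)
```

Both forms return the same estimate to within round-off, approximately 14.9309 for this bracket.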
Therefore, the root lies in the first subinterval, and $x_r$ becomes the upper limit for the next iteration, $x_u = 14.9309$:

$x_l = 12$, $f(x_l) = 6.1139$
$x_u = 14.9309$, $f(x_u) = -0.2515$

$$x_r = 14.9309 - \frac{-0.2515(12 - 14.9309)}{6.1139 - (-0.2515)} = 14.8151$$

which has true and approximate relative errors of 0.09 and 0.78 percent. Additional iterations can be performed to refine the estimate of the root.

FIGURE 5.13 Comparison of the relative errors of the bisection and the false-position methods. (True percent relative error, on a logarithmic scale, versus iterations.)

A feeling for the relative efficiency of the bisection and false-position methods can be appreciated by referring to Fig. 5.13, where we have plotted the true percent relative errors for Examples 5.4 and 5.5. Note how the error for false position decreases much faster than for bisection because of the more efficient scheme for root location in the false-position method.

Recall in the bisection method that the interval between $x_l$ and $x_u$ grew smaller during the course of a computation. The interval, as defined by $\Delta x/2 = |x_u - x_l|/2$ for the first iteration, therefore provided a measure of the error for this approach. This is not the case
  • 155. for the method of false position because one of the initial guesses may stay fixed throughout the computation as the other guess converges on the root. For instance, in Example 5.5 the lower guess $x_l$ remained at 12 while $x_u$ converged on the root. For such cases, the interval does not shrink but rather approaches a constant value.

Example 5.5 suggests that Eq. (5.2) represents a very conservative error criterion. In fact, Eq. (5.2) actually constitutes an approximation of the discrepancy of the previous iteration. This is because for a case such as Example 5.5, where the method is converging quickly (for example, the error is being reduced nearly an order of magnitude per iteration), the root for the present iteration $x_r^{\text{new}}$ is a much better estimate of the true value than the result of the previous iteration $x_r^{\text{old}}$. Thus, the quantity in the numerator of Eq. (5.2) actually represents the discrepancy of the previous iteration. Consequently, we are assured that satisfaction of Eq. (5.2) ensures that the root will be known with greater accuracy than the prescribed tolerance. However, as described in the next section, there are cases where false position converges slowly. For these cases, Eq. (5.2) becomes unreliable, and an alternative stopping criterion must be developed.

5.3.1 Pitfalls of the False-Position Method

Although the false-position method would seem to always be the bracketing method of preference, there are cases where it performs poorly. In fact, as in the following example, there are certain cases where bisection yields superior results.

EXAMPLE 5.6 A Case Where Bisection Is Preferable to False Position

Problem Statement. Use bisection and false position to locate the root of

$$f(x) = x^{10} - 1$$

between $x = 0$ and 1.3.

Solution.
Using bisection, the results can be summarized as

Iteration   x_l       x_u       x_r        ε_a (%)   ε_t (%)
1           0         1.3       0.65       100.0     35
2           0.65      1.3       0.975      33.3      2.5
3           0.975     1.3       1.1375     14.3      13.8
4           0.975     1.1375    1.05625    7.7       5.6
5           0.975     1.05625   1.015625   4.0       1.6

Thus, after five iterations, the true error is reduced to less than 2 percent. For false position, a very different outcome is obtained:

Iteration   x_l       x_u   x_r       ε_a (%)   ε_t (%)
1           0         1.3   0.09430             90.6
2           0.09430   1.3   0.18176   48.1      81.8
3           0.18176   1.3   0.26287   30.9      73.7
4           0.26287   1.3   0.33811   22.3      66.2
5           0.33811   1.3   0.40788   17.1      59.2
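The two tables of Example 5.6 can be regenerated with a short script. This is a minimal sketch: the helper names are hypothetical, and each routine simply records the root estimate at every iteration instead of applying a stopping criterion.

```python
def f(x):
    return x**10 - 1.0


def bisection_estimates(f, xl, xu, n):
    """Return the root estimates xr from n bisection iterations."""
    estimates = []
    for _ in range(n):
        xr = (xl + xu) / 2.0
        estimates.append(xr)
        test = f(xl) * f(xr)
        if test < 0:
            xu = xr          # root lies in the lower subinterval
        elif test > 0:
            xl = xr          # root lies in the upper subinterval
        else:
            break            # xr is exactly a root
    return estimates


def false_position_estimates(f, xl, xu, n):
    """Return the root estimates xr from n false-position iterations, Eq. (5.7)."""
    estimates = []
    for _ in range(n):
        fl, fu = f(xl), f(xu)
        xr = xu - fu * (xl - xu) / (fl - fu)
        estimates.append(xr)
        test = fl * f(xr)
        if test < 0:
            xu = xr
        elif test > 0:
            xl = xr
        else:
            break
    return estimates


TRUE_ROOT = 1.0
bis = bisection_estimates(f, 0.0, 1.3, 5)
fp = false_position_estimates(f, 0.0, 1.3, 5)
for i, (xb, xf) in enumerate(zip(bis, fp), start=1):
    et_b = abs(TRUE_ROOT - xb) / TRUE_ROOT * 100
    et_f = abs(TRUE_ROOT - xf) / TRUE_ROOT * 100
    print(f"{i}  bisection xr={xb:.6f} et={et_b:5.1f}%   "
          f"false position xr={xf:.5f} et={et_f:5.1f}%")
```

In the false-position run the upper bound never moves from 1.3, which is the one-sidedness the example is meant to expose.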
  • 156. After five iterations, the true error has only been reduced to about 59 percent. In addition, note that $\varepsilon_a < \varepsilon_t$. Thus, the approximate error is misleading. Insight into these results can be gained by examining a plot of the function. As in Fig. 5.14, the curve violates the premise upon which false position was based, that is, if $f(x_l)$ is much closer to zero than $f(x_u)$, then the root is closer to $x_l$ than to $x_u$ (recall Fig. 5.12). Because of the shape of the present function, the opposite is true.

FIGURE 5.14 Plot of $f(x) = x^{10} - 1$, illustrating slow convergence of the false-position method.

The foregoing example illustrates that blanket generalizations regarding root-location methods are usually not possible. Although a method such as false position is often superior to bisection, there are invariably cases that violate this general conclusion. Therefore, in addition to using Eq. (5.2), the results should always be checked by substituting the root estimate into the original equation and determining whether the result is close to zero. Such a check should be incorporated into all computer programs for root location.

The example also illustrates a major weakness of the false-position method: its one-sidedness. That is, as iterations are proceeding, one of the bracketing points will tend to
  • 157. stay fixed. This can lead to poor convergence, particularly for functions with significant curvature. The following section provides a remedy.

5.3.2 Modified False Position

One way to mitigate the "one-sided" nature of false position is to have the algorithm detect when one of the bounds is stuck. If this occurs, the function value at the stagnant bound can be divided in half. This is called the modified false-position method.

The algorithm in Fig. 5.15 implements this strategy. Notice how counters are used to determine when one of the bounds stays fixed for two iterations. If this occurs, the function value at this stagnant bound is halved.

    FUNCTION ModFalsePos(xl, xu, es, imax, xr, iter, ea)
      iter = 0
      fl = f(xl)
      fu = f(xu)
      DO
        xrold = xr
        xr = xu - fu * (xl - xu) / (fl - fu)
        fr = f(xr)
        iter = iter + 1
        IF xr ≠ 0 THEN
          ea = Abs((xr - xrold) / xr) * 100
        END IF
        test = fl * fr
        IF test < 0 THEN
          xu = xr
          fu = f(xu)
          iu = 0
          il = il + 1
          IF il ≥ 2 THEN fl = fl / 2
        ELSE IF test > 0 THEN
          xl = xr
          fl = f(xl)
          il = 0
          iu = iu + 1
          IF iu ≥ 2 THEN fu = fu / 2
        ELSE
          ea = 0
        END IF
        IF ea < es OR iter ≥ imax THEN EXIT
      END DO
      ModFalsePos = xr
    END ModFalsePos

FIGURE 5.15 Pseudocode for the modified false-position method.

The effectiveness of this algorithm can be demonstrated by applying it to Example 5.6. If a stopping criterion of 0.01% is used, the bisection and standard false-position
  • 158. methods would converge in 14 and 39 iterations, respectively. In contrast, the modified false-position method would converge in 12 iterations. Thus, for this example, it is somewhat more efficient than bisection and is vastly superior to the unmodified false-position method.

5.4 INCREMENTAL SEARCHES AND DETERMINING INITIAL GUESSES

Besides checking an individual answer, you must determine whether all possible roots have been located. As mentioned previously, a plot of the function is usually very useful in guiding you in this task. Another option is to incorporate an incremental search at the beginning of the computer program. This consists of starting at one end of the region of interest and then making function evaluations at small increments across the region. When the function changes sign, it is assumed that a root falls within the increment. The $x$ values at the beginning and the end of the increment can then serve as the initial guesses for one of the bracketing techniques described in this chapter.

A potential problem with an incremental search is the choice of the increment length. If the length is too small, the search can be very time consuming. On the other hand, if the length is too great, there is a possibility that closely spaced roots might be missed (Fig. 5.16). The problem is compounded by the possible existence of multiple roots. A partial remedy for such cases is to compute the first derivative of the function $f'(x)$ at the beginning and the end of each interval. If the derivative changes sign, it suggests that a minimum or maximum may have occurred and that the interval should be examined more closely for the existence of a possible root.

Although such modifications or the employment of a very fine increment can alleviate the problem, it should be clear that brute-force methods such as incremental search are not foolproof.
You would be wise to supplement such automatic techniques with any other information that provides insight into the location of the roots. Such information can be found in plotting and in understanding the physical problem from which the equation originated.

FIGURE 5.16 Cases where roots could be missed because the increment length of the search procedure is too large. Note that the last root on the right is multiple and would be missed regardless of increment length.
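The incremental search described in this section and the modified false-position algorithm of Fig. 5.15 can be sketched together in Python. This is a minimal illustration under stated assumptions, not the book's program: the function names, the choice of five search increments, and the initialization of the counters and of the first root estimate (which Fig. 5.15 leaves implicit) are all assumptions made here.

```python
def incremental_search(f, x_start, x_end, n_steps):
    """Scan [x_start, x_end] in n_steps equal increments and return
    every subinterval (a, b) over which f changes sign."""
    dx = (x_end - x_start) / n_steps
    brackets = []
    a, fa = x_start, f(x_start)
    for k in range(1, n_steps + 1):
        b = x_start + k * dx
        fb = f(b)
        if fa * fb < 0:
            brackets.append((a, b))
        a, fa = b, fb
    return brackets


def modified_false_position(f, xl, xu, es=0.01, imax=100):
    """Follows the structure of Fig. 5.15; es is a percent tolerance.
    Returns (root estimate, iterations, approximate percent error)."""
    fl, fu = f(xl), f(xu)
    il = iu = 0              # counters for a stagnant lower/upper bound
    xr, ea = xl, 100.0       # assumed starting estimate for the first ea
    for it in range(1, imax + 1):
        xr_old = xr
        xr = xu - fu * (xl - xu) / (fl - fu)
        fr = f(xr)
        if xr != 0:
            ea = abs((xr - xr_old) / xr) * 100
        test = fl * fr
        if test < 0:
            xu, fu = xr, fr
            iu = 0
            il += 1
            if il >= 2:
                fl /= 2      # lower bound is stuck: halve its function value
        elif test > 0:
            xl, fl = xr, fr
            il = 0
            iu += 1
            if iu >= 2:
                fu /= 2      # upper bound is stuck: halve its function value
        else:
            ea = 0.0         # xr is exactly a root
        if ea < es:
            break
    return xr, it, ea


def f(x):
    return x**10 - 1.0       # function of Example 5.6


brackets = incremental_search(f, 0.0, 1.3, 5)
print("sign changes in:", brackets)
root, iterations, ea = modified_false_position(f, 0.0, 1.3, es=0.01)
print(f"root = {root:.6f} after {iterations} iterations (ea = {ea:.4f}%)")
```

A coarse search like this one only reports intervals where the sign changes, so it would miss a multiple root or a pair of closely spaced roots, as the text warns.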
  • 159. PROBLEMS

5.1 Determine the real roots of $f(x) = -0.5x^2 + 2.5x + 4.5$:
(a) Graphically.
(b) Using the quadratic formula.
(c) Using three iterations of the bisection method to determine the highest root. Employ initial guesses of $x_l = 5$ and $x_u = 10$. Compute the estimated error $\varepsilon_a$ and the true error $\varepsilon_t$ after each iteration.

5.2 Determine the real root of $f(x) = 5x^3 - 5x^2 + 6x - 2$:
(a) Graphically.
(b) Using bisection to locate the root. Employ initial guesses of $x_l = 0$ and $x_u = 1$ and iterate until the estimated error $\varepsilon_a$ falls below a level of $\varepsilon_s = 10\%$.

5.3 Determine the real root of $f(x) = -25 + 82x - 90x^2 + 44x^3 - 8x^4 + 0.7x^5$:
(a) Graphically.
(b) Using bisection to determine the root to $\varepsilon_s = 10\%$. Employ initial guesses of $x_l = 0.5$ and $x_u = 1.0$.
(c) Perform the same computation as in (b) but use the false-position method and $\varepsilon_s = 0.2\%$.

5.4 (a) Determine the roots of $f(x) = -12 - 21x + 18x^2 - 2.75x^3$ graphically. In addition, determine the first root of the function with (b) bisection, and (c) false position. For (b) and (c) use initial guesses of $x_l = -1$ and $x_u = 0$, and a stopping criterion of 1%.

5.5 Locate the first nontrivial root of $\sin x = x^2$ where $x$ is in radians. Use a graphical technique and bisection with the initial interval from 0.5 to 1. Perform the computation until $\varepsilon_a$ is less than $\varepsilon_s = 2\%$. Also perform an error check by substituting your final answer into the original equation.

5.6 Determine the positive real root of $\ln(x^2) = 0.7$ (a) graphically, (b) using three iterations of the bisection method, with initial guesses of $x_l = 0.5$ and $x_u = 2$, and (c) using three iterations of the false-position method, with the same initial guesses as in (b).

5.7 Determine the real root of $f(x) = (0.8 - 0.3x)/x$:
(a) Analytically.
(b) Graphically.
(c) Using three iterations of the false-position method and initial guesses of 1 and 3. Compute the approximate error $\varepsilon_a$ and the true error $\varepsilon_t$ after each iteration. Is there a problem with the result?

5.8 Find the positive square root of 18 using the false-position method to within $\varepsilon_s = 0.5\%$. Employ initial guesses of $x_l = 4$ and $x_u = 5$.

5.9 Find the smallest positive root of the function ($x$ is in radians) $x^2 |\cos \sqrt{x}| = 5$ using the false-position method. To locate the region in which the root lies, first plot this function for values of $x$ between 0 and 5. Perform the computation until $\varepsilon_a$ falls below $\varepsilon_s = 1\%$. Check your final answer by substituting it into the original function.

5.10 Find the positive real root of $f(x) = x^4 - 8x^3 - 35x^2 + 450x - 1001$ using the false-position method. Use initial guesses of $x_l = 4.5$ and $x_u = 6$ and perform five iterations. Compute both the true and approximate errors based on the fact that the root is 5.60979. Use a plot to explain your results and perform the computation to within $\varepsilon_s = 1.0\%$.

5.11 Determine the real root of $x^{3.5} = 80$: (a) analytically and (b) with the false-position method to within $\varepsilon_s = 2.5\%$. Use initial guesses of 2.0 and 5.0.

5.12 Given

$$f(x) = -2x^6 - 1.5x^4 + 10x + 2$$

Use bisection to determine the maximum of this function. Employ initial guesses of $x_l = 0$ and $x_u = 1$, and perform iterations until the approximate relative error falls below 5%.

5.13 The velocity $v$ of a falling parachutist is given by

$$v = \frac{gm}{c}\left(1 - e^{-(c/m)t}\right)$$

where $g = 9.81\ \text{m/s}^2$. For a parachutist with a drag coefficient $c = 15$ kg/s, compute the mass $m$ so that the velocity is $v = 36$ m/s at $t = 10$ s. Use the false-position method to determine $m$ to a level of $\varepsilon_s = 0.1\%$.

5.14 Use bisection to determine the drag coefficient needed so that an 82-kg parachutist has a velocity of 36 m/s after 4 s of free fall. Note: The acceleration of gravity is 9.81 m/s². Start with initial guesses of $x_l = 3$ and $x_u = 5$ and iterate until the approximate relative error falls below 2%. Also perform an error check by substituting your final answer into the original equation.

5.15 As depicted in Fig. P5.15, the velocity of water, $v$ (m/s), discharged from a cylindrical tank through a long pipe can be computed as

$$v = \sqrt{2gH}\,\tanh\left(\frac{\sqrt{2gH}}{2L}\,t\right)$$

FIGURE P5.15 [Tank with head $H$ discharging through a pipe of length $L$ at velocity $v$.]
  • 160. where $g = 9.81\ \text{m/s}^2$, $H$ = initial head (m), $L$ = pipe length (m), and $t$ = elapsed time (s). Determine the head needed to achieve $v = 5$ m/s in 2.5 s for a 4-m-long pipe (a) graphically, (b) by bisection, and (c) with false position. Employ initial guesses of $x_l = 0$ and $x_u = 2$ m with a stopping criterion of $\varepsilon_s = 1\%$. Check your results.

5.16 Water is flowing in a trapezoidal channel at a rate of $Q = 20\ \text{m}^3/\text{s}$. The critical depth $y$ for such a channel must satisfy the equation

$$0 = 1 - \frac{Q^2}{g A_c^3} B$$

where $g = 9.81\ \text{m/s}^2$, $A_c$ = the cross-sectional area (m²), and $B$ = the width of the channel at the surface (m). For this case, the width and the cross-sectional area can be related to depth $y$ by

$$B = 3 + y \qquad \text{and} \qquad A_c = 3y + \frac{y^2}{2}$$

Solve for the critical depth using (a) the graphical method, (b) bisection, and (c) false position. For (b) and (c) use initial guesses of $x_l = 0.5$ and $x_u = 2.5$, and iterate until the approximate error falls below 1% or the number of iterations exceeds 10. Discuss your results.

5.17 You are designing a spherical tank (Fig. P5.17) to hold water for a small village in a developing country. The volume of liquid it can hold can be computed as

$$V = \pi h^2 \frac{[3R - h]}{3}$$

where $V$ = volume (m³), $h$ = depth of water in tank (m), and $R$ = the tank radius (m).

FIGURE P5.17

If $R = 3$ m, to what depth must the tank be filled so that it holds 30 m³? Use three iterations of the false-position method to determine your answer. Determine the approximate relative error after each iteration. Employ initial guesses of 0 and $R$.

5.18 The saturation concentration of dissolved oxygen in freshwater can be calculated with the equation (APHA, 1992)

$$\ln o_{sf} = -139.34411 + \frac{1.575701 \times 10^5}{T_a} - \frac{6.642308 \times 10^7}{T_a^2} + \frac{1.243800 \times 10^{10}}{T_a^3} - \frac{8.621949 \times 10^{11}}{T_a^4}$$

where $o_{sf}$ = the saturation concentration of dissolved oxygen in freshwater at 1 atm (mg/L) and $T_a$ = absolute temperature (K). Remember that $T_a = T + 273.15$, where $T$ = temperature (°C). According to this equation, saturation decreases with increasing temperature. For typical natural waters in temperate climates, the equation can be used to determine that oxygen concentration ranges from 14.621 mg/L at 0°C to 6.413 mg/L at 40°C. Given a value of oxygen concentration, this formula and the bisection method can be used to solve for temperature in °C.
(a) If the initial guesses are set as 0 and 40°C, how many bisection iterations would be required to determine temperature to an absolute error of 0.05°C?
(b) Develop and test a bisection program to determine $T$ as a function of a given oxygen concentration to a prespecified absolute error as in (a). Given initial guesses of 0 and 40°C, test your program for an absolute error = 0.05°C and the following cases: $o_{sf}$ = 8, 10, and 12 mg/L. Check your results.

5.19 According to Archimedes principle, the buoyancy force is equal to the weight of fluid displaced by the submerged portion of an object. For the sphere depicted in Fig. P5.19, use bisection to determine the height $h$ of the portion that is above water. Employ the following values for your computation: $r = 1$ m, $\rho_s$ = density of sphere = 200 kg/m³, and $\rho_w$ = density of water = 1000 kg/m³. Note that the volume of the above-water portion of the sphere can be computed with

$$V = \frac{\pi h^2}{3}(3r - h)$$

FIGURE P5.19
  • 161. 5.20 Perform the same computation as in Prob. 5.19, but for the frustum of a cone, as depicted in Fig. P5.20. Employ the following values for your computation: $r_1 = 0.5$ m, $r_2 = 1$ m, $h = 1$ m, $\rho_f$ = frustum density = 200 kg/m³, and $\rho_w$ = water density = 1000 kg/m³. Note that the volume of a frustum is given by

$$V = \frac{\pi h}{3}\left(r_1^2 + r_2^2 + r_1 r_2\right)$$

FIGURE P5.20

5.21 Integrate the algorithm outlined in Fig. 5.10 into a complete, user-friendly bisection subprogram. Among other things:
(a) Place documentation statements throughout the subprogram to identify what each section is intended to accomplish.
(b) Label the input and output.
(c) Add an answer check that substitutes the root estimate into the original function to verify whether the final result is close to zero.
(d) Test the subprogram by duplicating the computations from Examples 5.3 and 5.4.

5.22 Develop a subprogram for the bisection method that minimizes function evaluations based on the pseudocode from Fig. 5.11. Determine the number of function evaluations ($n$) per total iterations. Test the program by duplicating Example 5.6.

5.23 Develop a user-friendly program for the false-position method. The structure of your program should be similar to the bisection algorithm outlined in Fig. 5.10. Test the program by duplicating Example 5.5.

5.24 Develop a subprogram for the false-position method that minimizes function evaluations in a fashion similar to Fig. 5.11. Determine the number of function evaluations ($n$) per total iterations. Test the program by duplicating Example 5.6.

5.25 Develop a user-friendly subprogram for the modified false-position method based on Fig. 5.15. Test the program by determining the root of the function described in Example 5.6. Perform a number of runs until the true percent relative error falls below 0.01%. Plot the true and approximate percent relative errors versus number of iterations on semilog paper. Interpret your results.

5.26 Develop a function for bisection in a similar fashion to Fig. 5.10. However, rather than using the maximum iterations and Eq. (5.2), employ Eq. (5.5) as your stopping criterion. Make sure to round the result of Eq. (5.5) up to the next highest integer. Test your function by solving Example 5.3 using $E_{a,d} = 0.0001$.
  • 162. CHAPTER 6

Open Methods

FIGURE 6.1 Graphical depiction of the fundamental difference between the (a) bracketing and (b) and (c) open methods for root location. In (a), which is the bisection method, the root is constrained within the interval prescribed by $x_l$ and $x_u$. In contrast, for the open method depicted in (b) and (c), a formula is used to project from $x_i$ to $x_{i+1}$ in an iterative fashion. Thus, the method can either (b) diverge or (c) converge rapidly, depending on the value of the initial guess.

For the bracketing methods in Chap. 5, the root is located within an interval prescribed by a lower and an upper bound. Repeated application of these methods always results in closer estimates of the true value of the root. Such methods are said to be convergent because they move closer to the truth as the computation progresses (Fig. 6.1a). In contrast, the open methods described in this chapter are based on formulas that require only a single starting value of $x$ or two starting values that do not
  • 163. necessarily bracket the root. As such, they sometimes diverge or move away from the true root as the computation progresses (Fig. 6.1b). However, when the open methods converge (Fig. 6.1c), they usually do so much more quickly than the bracketing methods. We will begin our discussion of open techniques with a simple version that is useful for illustrating their general form and also for demonstrating the concept of convergence.

6.1 SIMPLE FIXED-POINT ITERATION

As mentioned above, open methods employ a formula to predict the root. Such a formula can be developed for simple fixed-point iteration (or, as it is also called, one-point iteration or successive substitution) by rearranging the function $f(x) = 0$ so that $x$ is on the left-hand side of the equation:

$$x = g(x) \qquad \text{(6.1)}$$

This transformation can be accomplished either by algebraic manipulation or by simply adding $x$ to both sides of the original equation. For example,

$$x^2 - 2x + 3 = 0$$

can be simply manipulated to yield

$$x = \frac{x^2 + 3}{2}$$

whereas $\sin x = 0$ could be put into the form of Eq. (6.1) by adding $x$ to both sides to yield

$$x = \sin x + x$$

The utility of Eq. (6.1) is that it provides a formula to predict a new value of $x$ as a function of an old value of $x$. Thus, given an initial guess at the root $x_i$, Eq. (6.1) can be used to compute a new estimate $x_{i+1}$ as expressed by the iterative formula

$$x_{i+1} = g(x_i) \qquad \text{(6.2)}$$

As with other iterative formulas in this book, the approximate error for this equation can be determined using the error estimator [Eq. (3.5)]:

$$\varepsilon_a = \left| \frac{x_{i+1} - x_i}{x_{i+1}} \right| 100\%$$

EXAMPLE 6.1 Simple Fixed-Point Iteration

Problem Statement. Use simple fixed-point iteration to locate the root of $f(x) = e^{-x} - x$.

Solution. The function can be separated directly and expressed in the form of Eq. (6.2) as

$$x_{i+1} = e^{-x_i}$$
  • 164. Starting with an initial guess of $x_0 = 0$, this iterative equation can be applied to compute

i     x_i        ε_a (%)   ε_t (%)
0     0                    100.0
1     1.000000   100.0     76.3
2     0.367879   171.8     35.1
3     0.692201   46.9      22.1
4     0.500473   38.3      11.8
5     0.606244   17.4      6.89
6     0.545396   11.2      3.83
7     0.579612   5.90      2.20
8     0.560115   3.48      1.24
9     0.571143   1.93      0.705
10    0.564879   1.11      0.399

Thus, each iteration brings the estimate closer to the true value of the root: 0.56714329.

6.1.1 Convergence

Notice that the true percent relative error for each iteration of Example 6.1 is roughly proportional (by a factor of about 0.5 to 0.6) to the error from the previous iteration. This property, called linear convergence, is characteristic of fixed-point iteration.

Aside from the "rate" of convergence, we must comment at this point about the "possibility" of convergence. The concepts of convergence and divergence can be depicted graphically. Recall that in Sec. 5.1, we graphed a function to visualize its structure and behavior (Example 5.1). Such an approach is employed in Fig. 6.2a for the function $f(x) = e^{-x} - x$. An alternative graphical approach is to separate the equation into two component parts, as in

$$f_1(x) = f_2(x)$$

Then the two equations

$$y_1 = f_1(x) \qquad \text{(6.3)}$$

and

$$y_2 = f_2(x) \qquad \text{(6.4)}$$

can be plotted separately (Fig. 6.2b). The $x$ values corresponding to the intersections of these functions represent the roots of $f(x) = 0$.

EXAMPLE 6.2 The Two-Curve Graphical Method

Problem Statement. Separate the equation $e^{-x} - x = 0$ into two parts and determine its root graphically.
  • 165. Solution. Reformulate the equation as $y_1 = x$ and $y_2 = e^{-x}$. The following values can be computed:

x     y_1   y_2
0.0   0.0   1.000
0.2   0.2   0.819
0.4   0.4   0.670
0.6   0.6   0.549
0.8   0.8   0.449
1.0   1.0   0.368

These points are plotted in Fig. 6.2b. The intersection of the two curves indicates a root estimate of approximately $x = 0.57$, which corresponds to the point where the single curve in Fig. 6.2a crosses the $x$ axis.

FIGURE 6.2 Two alternative graphical methods for determining the root of $f(x) = e^{-x} - x$. (a) Root at the point where it crosses the $x$ axis; (b) root at the intersection of the component functions $f_1(x) = x$ and $f_2(x) = e^{-x}$.
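The tabulation in Example 6.2 is easy to reproduce. The short sketch below, whose variable names are illustrative only, recomputes the two component functions and reports the pair of tabulated points between which the curves cross.

```python
import math

# Tabulate the two component functions of Example 6.2.
xs = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
rows = [(x, x, math.exp(-x)) for x in xs]   # (x, y1, y2)
for x, y1, y2 in rows:
    print(f"{x:.1f}  {y1:.1f}  {y2:.3f}")

# The curves cross where y1 - y2 changes sign between neighbouring rows.
crossing = [(a[0], b[0]) for a, b in zip(rows, rows[1:])
            if (a[1] - a[2]) * (b[1] - b[2]) < 0]
print("intersection lies in:", crossing)
```

The sign change falls between the rows that straddle the graphical estimate of about 0.57.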
  • 166. The two-curve method can now be used to illustrate the convergence and divergence of fixed-point iteration. First, Eq. (6.1) can be reexpressed as a pair of equations $y_1 = x$ and $y_2 = g(x)$. These two equations can then be plotted separately. As was the case with Eqs. (6.3) and (6.4), the roots of $f(x) = 0$ correspond to the abscissa value at the intersection of the two curves. The function $y_1 = x$ and four different shapes for $y_2 = g(x)$ are plotted in Fig. 6.3.

For the first case (Fig. 6.3a), the initial guess of $x_0$ is used to determine the corresponding point on the $y_2$ curve $[x_0, g(x_0)]$. The point $(x_1, x_1)$ is located by moving left horizontally to the $y_1$ curve. These movements are equivalent to the first iteration in the fixed-point method:

$$x_1 = g(x_0)$$

Thus, in both the equation and in the plot, a starting value of $x_0$ is used to obtain an estimate of $x_1$. The next iteration consists of moving to $[x_1, g(x_1)]$ and then to $(x_2, x_2)$. This iteration is equivalent to the equation

$$x_2 = g(x_1)$$

FIGURE 6.3 Iteration cobwebs depicting convergence (a and b) and divergence (c and d) of simple fixed-point iteration. Graphs (a) and (c) are called monotone patterns, whereas (b) and (d) are called oscillating or spiral patterns. Note that convergence occurs when $|g'(x)| < 1$.
  • 167. The solution in Fig. 6.3a is convergent because the estimates of $x$ move closer to the root with each iteration. The same is true for Fig. 6.3b. However, this is not the case for Fig. 6.3c and d, where the iterations diverge from the root. Notice that convergence seems to occur only when the absolute value of the slope of $y_2 = g(x)$ is less than the slope of $y_1 = x$, that is, when $|g'(x)| < 1$. Box 6.1 provides a theoretical derivation of this result.

6.1.2 Algorithm for Fixed-Point Iteration

The computer algorithm for fixed-point iteration is extremely simple. It consists of a loop to iteratively compute new estimates until the termination criterion has been met. Figure 6.4 presents pseudocode for the algorithm. Other open methods can be programmed in a similar way, the major modification being to change the iterative formula that is used to compute the new root estimate.

Box 6.1 Convergence of Fixed-Point Iteration

From studying Fig. 6.3, it should be clear that fixed-point iteration converges if, in the region of interest, $|g'(x)| < 1$. In other words, convergence occurs if the magnitude of the slope of $g(x)$ is less than the slope of the line $f(x) = x$. This observation can be demonstrated theoretically. Recall that the iterative equation is

$$x_{i+1} = g(x_i)$$

Suppose that the true solution is

$$x_r = g(x_r)$$

Subtracting these equations yields

$$x_r - x_{i+1} = g(x_r) - g(x_i) \qquad \text{(B6.1.1)}$$

The derivative mean-value theorem (recall Sec. 4.1.1) states that if a function $g(x)$ and its first derivative are continuous over an interval $a \le x \le b$, then there exists at least one value of $x = \xi$ within the interval such that

$$g'(\xi) = \frac{g(b) - g(a)}{b - a} \qquad \text{(B6.1.2)}$$

The right-hand side of this equation is the slope of the line joining $g(a)$ and $g(b)$. Thus, the mean-value theorem states that there is at least one point between $a$ and $b$ that has a slope, designated by $g'(\xi)$, which is parallel to the line joining $g(a)$ and $g(b)$ (recall Fig. 4.3).
Now, if we let $a = x_i$ and $b = x_r$, the right-hand side of Eq. (B6.1.1) can be expressed as

$$g(x_r) - g(x_i) = (x_r - x_i)\,g'(\xi)$$

where $\xi$ is somewhere between $x_i$ and $x_r$. This result can then be substituted into Eq. (B6.1.1) to yield

$$x_r - x_{i+1} = (x_r - x_i)\,g'(\xi) \qquad \text{(B6.1.3)}$$

If the true error for iteration $i$ is defined as

$$E_{t,i} = x_r - x_i$$

then Eq. (B6.1.3) becomes

$$E_{t,i+1} = g'(\xi)\,E_{t,i}$$

Consequently, if $|g'(x)| < 1$, the errors decrease with each iteration. For $|g'(x)| > 1$, the errors grow. Notice also that if the derivative is positive, the errors keep the same sign from one iteration to the next, and hence, the iterative solution will be monotonic (Fig. 6.3a and c). If the derivative is negative, the errors will oscillate (Fig. 6.3b and d).

An offshoot of the analysis is that it also demonstrates that when the method converges, the error is roughly proportional to and less than the error of the previous step. For this reason, simple fixed-point iteration is said to be linearly convergent.
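The loop described in Sec. 6.1.2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name `fixed_point` and its return tuple are illustrative choices made here, and `es` is taken to be a percent tolerance. It is applied to $g(x) = e^{-x}$ from Example 6.1, and the slope $g'(x_r)$ is evaluated to connect the run with the criterion $|g'(x)| < 1$ from Box 6.1.

```python
import math


def fixed_point(g, x0, es=1e-6, imax=100):
    """Iterate x = g(x) from x0 until the approximate percent relative
    error drops below es or imax iterations have been made.
    Returns (estimate, iterations, approximate error in percent)."""
    xr, ea = x0, 100.0
    for it in range(1, imax + 1):
        xr_old = xr
        xr = g(xr_old)
        if xr != 0:
            ea = abs((xr - xr_old) / xr) * 100
        if ea < es:
            break
    return xr, it, ea


g = lambda x: math.exp(-x)          # Example 6.1: x = e^(-x)

# es = 0 can never be satisfied, so this runs exactly 10 iterations.
x10, _, ea10 = fixed_point(g, 0.0, es=0.0, imax=10)
print(f"x10 = {x10:.6f}, ea = {ea10:.2f}%")

root, iters, _ = fixed_point(g, 0.0, es=1e-6)
slope = -math.exp(-root)            # g'(x) evaluated at the root
print(f"root = {root:.8f} in {iters} iterations, |g'(root)| = {abs(slope):.3f}")
```

The magnitude of the slope at the root is close to the factor of 0.5 to 0.6 by which the errors in Example 6.1 shrink from one iteration to the next.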
  • 168.
    FUNCTION Fixpt(x0, es, imax, iter, ea)
      xr = x0
      iter = 0
      DO
        xrold = xr
        xr = g(xrold)
        iter = iter + 1
        IF xr ≠ 0 THEN
          ea = |(xr - xrold) / xr| · 100
        END IF
        IF ea < es OR iter ≥ imax EXIT
      END DO
      Fixpt = xr
    END Fixpt

FIGURE 6.4 Pseudocode for fixed-point iteration. Note that other open methods can be cast in this general format.

6.2 THE NEWTON-RAPHSON METHOD

Perhaps the most widely used of all root-locating formulas is the Newton-Raphson equation (Fig. 6.5). If the initial guess at the root is $x_i$, a tangent can be extended from the point $[x_i, f(x_i)]$. The point where this tangent crosses the $x$ axis usually represents an improved estimate of the root.

FIGURE 6.5 Graphical depiction of the Newton-Raphson method. A tangent to the function at $x_i$ [that is, $f'(x_i)$] is extrapolated down to the $x$ axis to provide an estimate of the root at $x_{i+1}$.
  • 169. The Newton-Raphson method can be derived on the basis of this geometrical interpretation (an alternative method based on the Taylor series is described in Box 6.2). As in Fig. 6.5, the first derivative at $x_i$ is equivalent to the slope:

$$f'(x_i) = \frac{f(x_i) - 0}{x_i - x_{i+1}} \qquad \text{(6.5)}$$

which can be rearranged to yield

$$x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)} \qquad \text{(6.6)}$$

which is called the Newton-Raphson formula.

EXAMPLE 6.3 Newton-Raphson Method

Problem Statement. Use the Newton-Raphson method to estimate the root of $f(x) = e^{-x} - x$, employing an initial guess of $x_0 = 0$.

Solution. The first derivative of the function can be evaluated as

$$f'(x) = -e^{-x} - 1$$

which can be substituted along with the original function into Eq. (6.6) to give

$$x_{i+1} = x_i - \frac{e^{-x_i} - x_i}{-e^{-x_i} - 1}$$

Starting with an initial guess of $x_0 = 0$, this iterative equation can be applied to compute

i    x_i           ε_t (%)
0    0             100
1    0.500000000   11.8
2    0.566311003   0.147
3    0.567143165   0.0000220
4    0.567143290   < 10^-8

Thus, the approach rapidly converges on the true root. Notice that the true percent relative error at each iteration decreases much faster than it does in simple fixed-point iteration (compare with Example 6.1).

6.2.1 Termination Criteria and Error Estimates

As with other root-location methods, Eq. (3.5) can be used as a termination criterion. In addition, however, the Taylor series derivation of the method (Box 6.2) provides theoretical insight regarding the rate of convergence as expressed by $E_{i+1} = O(E_i^2)$. Thus the error should be roughly proportional to the square of the previous error. In other words,
  • 170. the number of significant figures of accuracy approximately doubles with each iteration. This behavior is examined in the following example.

EXAMPLE 6.4 Error Analysis of Newton-Raphson Method

Problem Statement. As derived in Box 6.2, the Newton-Raphson method is quadratically convergent. That is, the error is roughly proportional to the square of the previous error, as in

$$E_{t,i+1} \cong \frac{-f''(x_r)}{2f'(x_r)} E_{t,i}^2 \qquad \text{(E6.4.1)}$$

Examine this formula and see if it applies to the results of Example 6.3.

Solution. The first derivative of $f(x) = e^{-x} - x$ is

$$f'(x) = -e^{-x} - 1$$

Box 6.2 Derivation and Error Analysis of the Newton-Raphson Method

Aside from the geometric derivation [Eqs. (6.5) and (6.6)], the Newton-Raphson method may also be developed from the Taylor series expansion. This alternative derivation is useful in that it also provides insight into the rate of convergence of the method.

Recall from Chap. 4 that the Taylor series expansion can be represented as

$$f(x_{i+1}) = f(x_i) + f'(x_i)(x_{i+1} - x_i) + \frac{f''(\xi)}{2!}(x_{i+1} - x_i)^2 \qquad \text{(B6.2.1)}$$

where $\xi$ lies somewhere in the interval from $x_i$ to $x_{i+1}$. An approximate version is obtainable by truncating the series after the first derivative term:

$$f(x_{i+1}) \cong f(x_i) + f'(x_i)(x_{i+1} - x_i)$$

At the intersection with the $x$ axis, $f(x_{i+1})$ would be equal to zero, or

$$0 = f(x_i) + f'(x_i)(x_{i+1} - x_i) \qquad \text{(B6.2.2)}$$

which can be solved for

$$x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)}$$

which is identical to Eq. (6.6). Thus, we have derived the Newton-Raphson formula using a Taylor series.

Aside from the derivation, the Taylor series can also be used to estimate the error of the formula. This can be done by realizing that if the complete Taylor series were employed, an exact result would be obtained. For this situation $x_{i+1} = x_r$, where $x_r$ is the true value of the root. Substituting this value along with $f(x_r) = 0$ into Eq. (B6.2.1) yields

$$0 = f(x_i) + f'(x_i)(x_r - x_i) + \frac{f''(\xi)}{2!}(x_r - x_i)^2 \qquad \text{(B6.2.3)}$$

Equation (B6.2.2) can be subtracted from Eq.
(B6.2.3) to give

$$0 = f'(x_i)(x_r - x_{i+1}) + \frac{f''(\xi)}{2!}(x_r - x_i)^2 \qquad \text{(B6.2.4)}$$

Now, realize that the error is equal to the discrepancy between $x_{i+1}$ and the true value $x_r$, as in

$$E_{t,i+1} = x_r - x_{i+1}$$

and Eq. (B6.2.4) can be expressed as

$$0 = f'(x_i)E_{t,i+1} + \frac{f''(\xi)}{2!}E_{t,i}^2 \qquad \text{(B6.2.5)}$$

If we assume convergence, both $x_i$ and $\xi$ should eventually be approximated by the root $x_r$, and Eq. (B6.2.5) can be rearranged to yield

$$E_{t,i+1} = \frac{-f''(x_r)}{2f'(x_r)} E_{t,i}^2 \qquad \text{(B6.2.6)}$$

According to Eq. (B6.2.6), the error is roughly proportional to the square of the previous error. This means that the number of correct decimal places approximately doubles with each iteration. Such behavior is referred to as quadratic convergence. Example 6.4 manifests this property.
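Equation (B6.2.6) can be checked numerically for $f(x) = e^{-x} - x$. The sketch below is an illustration rather than part of the text's derivation: it generates the first three Newton-Raphson iterates from $x_0 = 0$ and compares the observed ratio $E_{t,i+1}/E_{t,i}^2$ with the predicted factor $-f''(x_r)/[2f'(x_r)]$. The root value is the eight-digit figure quoted in Example 6.1, so the ratios are only meaningful while the errors are much larger than that rounding.

```python
import math

f = lambda x: math.exp(-x) - x
df = lambda x: -math.exp(-x) - 1.0
d2f = lambda x: math.exp(-x)

XR = 0.56714329            # true root as quoted in Example 6.1

# Newton-Raphson iterates from x0 = 0, Eq. (6.6)
xs = [0.0]
for _ in range(3):
    x = xs[-1]
    xs.append(x - f(x) / df(x))

errors = [XR - x for x in xs]                 # E_t,i = x_r - x_i
predicted = -d2f(XR) / (2.0 * df(XR))         # factor in Eq. (B6.2.6)
print(f"predicted factor: {predicted:.5f}")
for i in range(1, 3):
    observed = errors[i + 1] / errors[i] ** 2
    print(f"i = {i}: E(i+1)/E(i)^2 = {observed:.5f}")
```

The observed ratio should move toward the predicted factor as the iterates approach the root, since $x_i$ and $\xi$ are then better approximated by $x_r$.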
  • 171. which can be evaluated at $x_r = 0.56714329$ as $f'(0.56714329) = -1.56714329$. The second derivative is

$f''(x) = e^{-x}$

which can be evaluated as $f''(0.56714329) = 0.56714329$. These results can be substituted into Eq. (E6.4.1) to yield

$E_{t,i+1} \cong -\frac{0.56714329}{2(-1.56714329)}E_{t,i}^2 = 0.18095E_{t,i}^2$

From Example 6.3, the initial error was $E_{t,0} = 0.56714329$, which can be substituted into the error equation to predict

$E_{t,1} \cong 0.18095(0.56714329)^2 = 0.0582$

which is close to the true error of 0.06714329. For the next iteration,

$E_{t,2} \cong 0.18095(0.06714329)^2 = 0.0008158$

which also compares favorably with the true error of 0.0008323. For the third iteration,

$E_{t,3} \cong 0.18095(0.0008323)^2 = 0.000000125$

which is the error obtained in Example 6.3. The error estimate improves in this manner because, as we come closer to the root, $x_i$ and $\xi$ are better approximated by $x_r$ [recall our assumption in going from Eq. (B6.2.5) to Eq. (B6.2.6) in Box 6.2]. Finally,

$E_{t,4} \cong 0.18095(0.000000125)^2 = 2.83 \times 10^{-15}$

Thus, this example illustrates that the error of the Newton-Raphson method for this case is, in fact, roughly proportional (by a factor of 0.18095) to the square of the error of the previous iteration.

6.2.2 Pitfalls of the Newton-Raphson Method

Although the Newton-Raphson method is often very efficient, there are situations where it performs poorly. A special case, multiple roots, will be addressed later in this chapter. However, even when dealing with simple roots, difficulties can also arise, as in the following example.

EXAMPLE 6.5 Example of a Slowly Converging Function with Newton-Raphson

Problem Statement. Determine the positive root of $f(x) = x^{10} - 1$ using the Newton-Raphson method and an initial guess of $x = 0.5$.

Solution. The Newton-Raphson formula for this case is

$x_{i+1} = x_i - \frac{x_i^{10} - 1}{10x_i^9}$

which can be used to compute
  • 172.
Iteration    x
0            0.5
1            51.65
2            46.485
3            41.8365
4            37.65285
5            33.887565
...
∞            1.0000000

Thus, after the first poor prediction, the technique is converging on the true root of 1, but at a very slow rate.

Aside from slow convergence due to the nature of the function, other difficulties can arise, as illustrated in Fig. 6.6. For example, Fig. 6.6a depicts the case where an inflection point [that is, $f''(x) = 0$] occurs in the vicinity of a root. Notice that iterations beginning at $x_0$ progressively diverge from the root. Figure 6.6b illustrates the tendency of the Newton-Raphson technique to oscillate around a local maximum or minimum. Such oscillations may persist, or as in Fig. 6.6b, a near-zero slope is reached, whereupon the solution is sent far from the area of interest. Figure 6.6c shows how an initial guess that is close to one root can jump to a location several roots away. This tendency to move away from the area of interest is because near-zero slopes are encountered. Obviously, a zero slope [$f'(x) = 0$] is truly a disaster because it causes division by zero in the Newton-Raphson formula [Eq. (6.6)]. Graphically (see Fig. 6.6d), it means that the solution shoots off horizontally and never hits the x axis.

Thus, there is no general convergence criterion for Newton-Raphson. Its convergence depends on the nature of the function and on the accuracy of the initial guess. The only remedy is to have an initial guess that is "sufficiently" close to the root. And for some functions, no guess will work! Good guesses are usually predicated on knowledge of the physical problem setting or on devices such as graphs that provide insight into the behavior of the solution. The lack of a general convergence criterion also suggests that good computer software should be designed to recognize slow convergence or divergence. The next section addresses some of these issues.

6.2.3 Algorithm for Newton-Raphson

An algorithm for the Newton-Raphson method is readily obtained by substituting Eq. (6.6) for the predictive formula [Eq. (6.2)] in Fig. 6.4. Note, however, that the program must also be modified to compute the first derivative. This can be simply accomplished by the inclusion of a user-defined function.
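The behavior seen in Examples 6.4 and 6.5 can be reproduced with a short Python sketch of Eq. (6.6). It is only an illustration, not the algorithm of Fig. 6.4: the function name `newton_raphson`, its `tol` and `max_iter` parameters, and the relative stopping test are assumptions introduced here. The derivative is supplied as a separate user-defined function, and an iteration limit and a zero-slope check are included as minimal safeguards.

```python
import math


def newton_raphson(f, df, x0, tol=1e-12, max_iter=50):
    """Return the list of iterates x0, x1, ... produced by Eq. (6.6).

    tol and max_iter are illustrative choices, not values from the text.
    """
    xs = [x0]
    for _ in range(max_iter):
        x = xs[-1]
        slope = df(x)
        if slope == 0.0:
            # A zero slope would cause division by zero in Eq. (6.6).
            raise ZeroDivisionError("f'(x) = 0 encountered")
        x_new = x - f(x) / slope
        xs.append(x_new)
        if abs(x_new - x) <= tol * max(abs(x_new), 1.0):
            break
    return xs


# Example 6.4: f(x) = exp(-x) - x, starting from x0 = 0.
xs = newton_raphson(lambda x: math.exp(-x) - x,
                    lambda x: -math.exp(-x) - 1.0, 0.0)
root = xs[-1]
# Ratio E_{t,i+1} / E_{t,i}^2, which Eq. (B6.2.6) predicts approaches 0.18095.
ratios = [(root - xs[i + 1]) / (root - xs[i]) ** 2 for i in range(3)]
print(ratios)

# Example 6.5: f(x) = x**10 - 1, starting from x0 = 0.5 (first five steps only).
slow = newton_raphson(lambda x: x ** 10 - 1.0,
                      lambda x: 10.0 * x ** 9, 0.5, max_iter=5)
print(slow)
```

The printed ratios settle near the factor 0.18095 of Example 6.4, while the second run shows the iterates of Example 6.5 shrinking by only about 10 percent per step after the first poor prediction.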
  • 173. FIGURE 6.6 Four cases where the Newton-Raphson method exhibits poor convergence. [Panels (a) through (d), each plotting f(x) versus x with the successive iterates $x_0, x_1, \ldots$ marked.]

Additionally, in light of the foregoing discussion of potential problems of the Newton-Raphson method, the program would be improved by incorporating several additional features:
  • 174.
1. A plotting routine should be included in the program.
2. At the end of the computation, the final root estimate should always be substituted into the original function to check whether the result is close to zero. This check partially guards against those cases where slow or oscillating convergence may lead to a small value of $\varepsilon_a$ while the solution is still far from a root.
3. The program should always include an upper limit on the number of iterations to guard against oscillating, slowly convergent, or divergent solutions that could persist interminably.
4. The program should alert the user and take account of the possibility that $f'(x)$ might equal zero at any time during the computation.

6.3 THE SECANT METHOD

A potential problem in implementing the Newton-Raphson method is the evaluation of the derivative. Although this is not inconvenient for polynomials and many other functions, there are certain functions whose derivatives may be extremely difficult or inconvenient to evaluate. For these cases, the derivative can be approximated by a backward finite divided difference (Fig. 6.7), as in

$f'(x_i) \cong \frac{f(x_{i-1}) - f(x_i)}{x_{i-1} - x_i}$

This approximation can be substituted into Eq. (6.6) to yield the following iterative equation:

$x_{i+1} = x_i - \frac{f(x_i)(x_{i-1} - x_i)}{f(x_{i-1}) - f(x_i)}$  (6.7)

FIGURE 6.7 Graphical depiction of the secant method. This technique is similar to the Newton-Raphson technique (Fig. 6.5) in the sense that an estimate of the root is predicted by extrapolating a tangent of the function to the x axis. However, the secant method uses a difference rather than a derivative to estimate the slope. [Plot of f(x) versus x showing the points $(x_{i-1}, f(x_{i-1}))$ and $(x_i, f(x_i))$.]
  • 175. Equation (6.7) is the formula for the secant method. Notice that the approach requires two initial estimates of x. However, because f(x) is not required to change signs between the estimates, it is not classified as a bracketing method.

EXAMPLE 6.6 The Secant Method

Problem Statement. Use the secant method to estimate the root of $f(x) = e^{-x} - x$. Start with initial estimates of $x_{-1} = 0$ and $x_0 = 1.0$.

Solution. Recall that the true root is 0.56714329. . . .

First iteration:

$x_{-1} = 0$    $f(x_{-1}) = 1.00000$
$x_0 = 1$    $f(x_0) = -0.63212$

$x_1 = 1 - \frac{-0.63212(0 - 1)}{1 - (-0.63212)} = 0.61270$    $\varepsilon_t = 8.0\%$

Second iteration:

$x_0 = 1$    $f(x_0) = -0.63212$
$x_1 = 0.61270$    $f(x_1) = -0.07081$

(Note that both estimates are now on the same side of the root.)

$x_2 = 0.61270 - \frac{-0.07081(1 - 0.61270)}{-0.63212 - (-0.07081)} = 0.56384$    $\varepsilon_t = 0.58\%$

Third iteration:

$x_1 = 0.61270$    $f(x_1) = -0.07081$
$x_2 = 0.56384$    $f(x_2) = 0.00518$

$x_3 = 0.56384 - \frac{0.00518(0.61270 - 0.56384)}{-0.07081 - 0.00518} = 0.56717$    $\varepsilon_t = 0.0048\%$

6.3.1 The Difference Between the Secant and False-Position Methods

Note the similarity between the secant method and the false-position method. For example, Eqs. (6.7) and (5.7) are identical on a term-by-term basis. Both use two initial estimates to compute an approximation of the slope of the function that is used to project to the x axis for a new estimate of the root. However, a critical difference between the methods is how one of the initial values is replaced by the new estimate. Recall that in the false-position method the latest estimate of the root replaces whichever of the original values yielded a function value with the same sign as $f(x_r)$. Consequently, the two estimates always bracket the root. Therefore, for all practical purposes, the method always converges because the root is kept within the bracket. In contrast, the secant method replaces the values in strict sequence, with the new value $x_{i+1}$ replacing $x_i$ and $x_i$ replacing $x_{i-1}$. As a result, the two values can sometimes lie on the same side of the root. For certain cases, this can lead to divergence.
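The strict-sequence replacement rule of Eq. (6.7) can be illustrated with a minimal Python sketch that reruns Example 6.6. The function name `secant`, the absolute step tolerance `tol`, and the iteration cap `max_iter` are assumptions made for this illustration and do not come from the text.

```python
import math


def secant(f, x_prev, x_curr, tol=1e-10, max_iter=50):
    """Return the list of estimates [x_{-1}, x_0, x_1, ...] from Eq. (6.7)."""
    xs = [x_prev, x_curr]
    for _ in range(max_iter):
        f_prev, f_curr = f(xs[-2]), f(xs[-1])
        denom = f_prev - f_curr
        if denom == 0.0:
            # A horizontal secant line never reaches the x axis.
            break
        x_next = xs[-1] - f_curr * (xs[-2] - xs[-1]) / denom
        # Strict sequence: the newest value is appended, the oldest drops out
        # of the next step. No sign check (bracketing) is performed.
        xs.append(x_next)
        if abs(x_next - xs[-2]) <= tol:
            break
    return xs


estimates = secant(lambda x: math.exp(-x) - x, 0.0, 1.0)
for i, x in enumerate(estimates[2:], start=1):
    print(i, x)
```

The first three printed values should agree with the hand calculation of Example 6.6 to about four decimal places; small differences arise because the example rounds intermediate results.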
  • 176. EXAMPLE 6.7 Comparison of Convergence of the Secant and False-Position Techniques

Problem Statement. Use the false-position and secant methods to estimate the root of $f(x) = \ln x$. Start the computation with values of $x_l = x_{i-1} = 0.5$ and $x_u = x_i = 5.0$.

Solution. For the false-position method, the use of Eq. (5.7) and the bracketing criterion for replacing estimates results in the following iterations:

Iteration    x_l    x_u       x_r
1            0.5    5.0       1.8546
2            0.5    1.8546    1.2163
3            0.5    1.2163    1.0585

As can be seen (Fig. 6.8a and c), the estimates are converging on the true root, which is equal to 1.

FIGURE 6.8 Comparison of the false-position and the secant methods. The first iterations (a) and (b) for both techniques are identical. However, for the second iterations (c) and (d), the points used differ. As a consequence, the secant method can diverge, as indicated in (d). [Four panels plotting f(x) versus x: (a) and (c) false position, (b) and (d) secant.]
  • 177. For the secant method, using Eq. (6.7) and the sequential criterion for replacing estimates results in

Iteration    x_{i-1}    x_i       x_{i+1}
1            0.5        5.0       1.8546
2            5.0        1.8546    -0.10438

As in Fig. 6.8d, the approach is divergent: the second estimate is negative, where $\ln x$ is not defined.

Although the secant method may be divergent, when it converges it usually does so at a quicker rate than the false-position method. For instance, Fig. 6.9 demonstrates the superiority of the secant method in this regard. The inferiority of the false-position method is due to one end staying fixed to maintain the bracketing of the root. This property, which is an advantage in that it prevents divergence, is a shortcoming with regard to the rate of convergence; it makes the finite-difference estimate a less-accurate approximation of the derivative.

FIGURE 6.9 Comparison of the true percent relative errors $\varepsilon_t$ for the methods to determine the roots of $f(x) = e^{-x} - x$. [Semilog plot of true percent relative error versus iterations, with curves for bisection, false position, secant, and Newton-Raphson.]
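The contrast drawn in Example 6.7 can be checked numerically with a short Python sketch. Both helpers below use the same projection formula and differ only in how the old points are replaced. The function names `false_position` and `secant_steps` are hypothetical, and only two secant steps are taken because a third would require the logarithm of a negative number.

```python
import math


def false_position(f, xl, xu, n_iter):
    """Return successive false-position estimates, keeping the root bracketed."""
    estimates = []
    for _ in range(n_iter):
        xr = xu - f(xu) * (xl - xu) / (f(xl) - f(xu))
        estimates.append(xr)
        # Keep the sub-interval across which f changes sign.
        if f(xl) * f(xr) < 0.0:
            xu = xr
        else:
            xl = xr
    return estimates


def secant_steps(f, x_prev, x_curr, n_iter):
    """Return successive secant estimates, replacing points in strict sequence."""
    estimates = []
    for _ in range(n_iter):
        x_next = x_curr - f(x_curr) * (x_prev - x_curr) / (f(x_prev) - f(x_curr))
        estimates.append(x_next)
        x_prev, x_curr = x_curr, x_next
    return estimates


fp = false_position(math.log, 0.5, 5.0, 3)
sec = secant_steps(math.log, 0.5, 5.0, 2)
print(fp)   # stays inside the bracket and approaches 1
print(sec)  # second value is negative, outside the domain of ln x
```

The first estimate is the same for both methods; the second secant estimate falls below zero, which is the divergence sketched in Fig. 6.8d.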
  • 178. 6.3.2 Algorithm for the Secant Method

As with the other open methods, an algorithm for the secant method is obtained simply by modifying Fig. 6.4 so that two initial guesses are input and by using Eq. (6.7) to calculate the root. In addition, the options suggested in Sec. 6.2.3 for the Newton-Raphson method can also be applied to good advantage for the secant program.

6.3.3 Modified Secant Method

Rather than using two arbitrary values to estimate the derivative, an alternative approach involves a fractional perturbation of the independent variable to estimate $f'(x)$,

$f'(x_i) \cong \frac{f(x_i + \delta x_i) - f(x_i)}{\delta x_i}$

where $\delta$ = a small perturbation fraction. This approximation can be substituted into Eq. (6.6) to yield the following iterative equation:

$x_{i+1} = x_i - \frac{\delta x_i f(x_i)}{f(x_i + \delta x_i) - f(x_i)}$  (6.8)

EXAMPLE 6.8 Modified Secant Method

Problem Statement. Use the modified secant method to estimate the root of $f(x) = e^{-x} - x$. Use a value of 0.01 for $\delta$ and start with $x_0 = 1.0$. Recall that the true root is 0.56714329. . . .

Solution.

First iteration:

$x_0 = 1$    $f(x_0) = -0.63212$
$x_0 + \delta x_0 = 1.01$    $f(x_0 + \delta x_0) = -0.64578$

$x_1 = 1 - \frac{0.01(-0.63212)}{-0.64578 - (-0.63212)} = 0.537263$    $|\varepsilon_t| = 5.3\%$

Second iteration:

$x_1 = 0.537263$    $f(x_1) = 0.047083$
$x_1 + \delta x_1 = 0.542635$    $f(x_1 + \delta x_1) = 0.038579$

$x_2 = 0.537263 - \frac{0.005373(0.047083)}{0.038579 - 0.047083} = 0.56701$    $|\varepsilon_t| = 0.0236\%$

Third iteration:

$x_2 = 0.56701$    $f(x_2) = 0.000209$
$x_2 + \delta x_2 = 0.572680$    $f(x_2 + \delta x_2) = -0.00867$

$x_3 = 0.56701 - \frac{0.00567(0.000209)}{-0.00867 - 0.000209} = 0.567143$    $|\varepsilon_t| = 2.365 \times 10^{-5}\%$
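Equation (6.8) translates almost directly into code. The following Python sketch reruns Example 6.8. The function name `modified_secant`, the stopping tolerance `tol`, and the iteration cap `max_iter` are illustrative assumptions rather than part of the text.

```python
import math


def modified_secant(f, x0, delta=0.01, tol=1e-10, max_iter=50):
    """Return the list of iterates from the modified secant formula, Eq. (6.8)."""
    xs = [x0]
    for _ in range(max_iter):
        x = xs[-1]
        dx = delta * x                 # fractional perturbation of x
        denom = f(x + dx) - f(x)       # finite-difference numerator of f'(x)
        if dx == 0.0 or denom == 0.0:
            break                      # perturbation or slope estimate vanished
        x_new = x - dx * f(x) / denom
        xs.append(x_new)
        if abs(x_new - x) <= tol:
            break
    return xs


iterates = modified_secant(lambda x: math.exp(-x) - x, 1.0, delta=0.01)
for i, x in enumerate(iterates):
    print(i, x)
```

Only one starting value is needed, at the cost of a second function evaluation per step and a choice of delta.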
  • 179. The choice of a proper value for $\delta$ is not automatic. If $\delta$ is too small, the method can be swamped by round-off error caused by subtractive cancellation in the denominator of Eq. (6.8). If it is too big, the technique can become inefficient and even divergent. However, if chosen correctly, it provides a nice alternative for cases where evaluating the derivative is difficult and developing two initial guesses is inconvenient.

6.4 BRENT'S METHOD

Wouldn't it be nice to have a hybrid approach that combined the reliability of bracketing with the speed of the open methods? Brent's root-location method is a clever algorithm that does just that by applying a speedy open method wherever possible, but reverting to a reliable bracketing method if necessary. The approach was developed by Richard Brent (1973) based on an earlier algorithm of Theodorus Dekker (1969).

The bracketing technique is the trusty bisection method (Sec. 5.2), whereas two different open methods are employed. The first is the secant method described in Sec. 6.3. As explained next, the second is inverse quadratic interpolation.

6.4.1 Inverse Quadratic Interpolation

Inverse quadratic interpolation is similar in spirit to the secant method. As in Fig. 6.10a, the secant method is based on computing a straight line that goes through two guesses. The intersection of this straight line with the x axis represents the new root estimate. For this reason, it is sometimes referred to as a linear interpolation method.

Now suppose that we had three points. In that case, we could determine a quadratic function of x that goes through the three points (Fig. 6.10b). Just as with the linear secant method, the intersection of this parabola with the x axis would represent the new root estimate. And as illustrated in Fig. 6.10b, using a curve rather than a straight line often yields a better estimate.

Although this would seem to represent a great improvement, the approach has a fundamental flaw: It is possible that the parabola might not intersect the x axis! Such would be the case when the resulting parabola had complex roots. This is illustrated by the parabola, $y = f(x)$, in Fig. 6.11.

FIGURE 6.10 Comparison of (a) the secant method and (b) inverse quadratic interpolation. Note that the dark parabola passing through the three points in (b) is called "inverse" because it is written in y rather than in x. [Two panels plotting f(x) versus x.]
  • 180. The difficulty can be rectified by employing inverse quadratic interpolation. That is, rather than using a parabola in x, we can fit the points with a parabola in y. This amounts to reversing the axes and creating a "sideways" parabola [the curve, $x = f(y)$, in Fig. 6.11].

If the three points are designated as $(x_{i-2}, y_{i-2})$, $(x_{i-1}, y_{i-1})$, and $(x_i, y_i)$, a quadratic function of y that passes through the points can be generated as

$g(y) = \frac{(y - y_{i-1})(y - y_i)}{(y_{i-2} - y_{i-1})(y_{i-2} - y_i)}x_{i-2} + \frac{(y - y_{i-2})(y - y_i)}{(y_{i-1} - y_{i-2})(y_{i-1} - y_i)}x_{i-1} + \frac{(y - y_{i-2})(y - y_{i-1})}{(y_i - y_{i-2})(y_i - y_{i-1})}x_i$  (6.9)

As we will learn in Sec. 18.2, this form is called a Lagrange polynomial. The root, $x_{i+1}$, corresponds to $y = 0$, which when substituted into Eq. (6.9) yields

$x_{i+1} = \frac{y_{i-1}y_i}{(y_{i-2} - y_{i-1})(y_{i-2} - y_i)}x_{i-2} + \frac{y_{i-2}y_i}{(y_{i-1} - y_{i-2})(y_{i-1} - y_i)}x_{i-1} + \frac{y_{i-2}y_{i-1}}{(y_i - y_{i-2})(y_i - y_{i-1})}x_i$  (6.10)

As shown in Fig. 6.11, such a "sideways" parabola always intersects the x axis.

FIGURE 6.11 Two parabolas fit to three points. The parabola written as a function of x, $y = f(x)$, has complex roots and hence does not intersect the x axis. In contrast, if the variables are reversed, and the parabola developed as $x = f(y)$, the function does intersect the x axis. [Plot of y versus x showing both parabolas and the root.]

EXAMPLE 6.9 Inverse Quadratic Interpolation

Problem Statement. Develop quadratic equations in both x and y for the data points depicted in Fig. 6.11: (1, 2), (2, 1), and (4, 5). For the first, $y = f(x)$, employ the quadratic formula to illustrate that the roots are complex. For the latter, $x = g(y)$, use inverse quadratic interpolation (Eq. 6.10) to determine the root estimate.
  • 181. Solution. By reversing the x's and y's, Eq. (6.9) can be used to generate a quadratic in x as

$f(x) = \frac{(x - 2)(x - 4)}{(1 - 2)(1 - 4)}2 + \frac{(x - 1)(x - 4)}{(2 - 1)(2 - 4)}1 + \frac{(x - 1)(x - 2)}{(4 - 1)(4 - 2)}5$

or collecting terms

$f(x) = x^2 - 4x + 5$

This equation was used to generate the parabola, $y = f(x)$, in Fig. 6.11. The quadratic formula can be used to determine that the roots for this case are complex,

$x = \frac{4 \pm \sqrt{(-4)^2 - 4(1)(5)}}{2} = 2 \pm i$

Equation (6.9) can be used to generate the quadratic in y as

$g(y) = \frac{(y - 1)(y - 5)}{(2 - 1)(2 - 5)}1 + \frac{(y - 2)(y - 5)}{(1 - 2)(1 - 5)}2 + \frac{(y - 2)(y - 1)}{(5 - 2)(5 - 1)}4$

or collecting terms

$g(y) = 0.5y^2 - 2.5y + 4$

Finally, Eq. (6.10) can be used to determine the root as

$x_{i+1} = \frac{-1(-5)}{(2 - 1)(2 - 5)}1 + \frac{-2(-5)}{(1 - 2)(1 - 5)}2 + \frac{-2(-1)}{(5 - 2)(5 - 1)}4 = 4$

Before proceeding to Brent's algorithm, we need to mention one more case where inverse quadratic interpolation does not work. If the three y values are not distinct (that is, $y_{i-2} = y_{i-1}$ or $y_{i-1} = y_i$), an inverse quadratic function does not exist. So this is where the secant method comes into play. If we arrive at a situation where the y values are not distinct, we can always revert to the less efficient secant method to generate a root using two of the points. If $y_{i-2} = y_{i-1}$, we use the secant method with $x_{i-1}$ and $x_i$. If $y_{i-1} = y_i$, we use $x_{i-2}$ and $x_{i-1}$.

6.4.2 Brent's Method Algorithm

The general idea behind Brent's root-finding method is to use one of the quick open methods whenever possible. In the event that these generate an unacceptable result (i.e., a root estimate that falls outside the bracket), the algorithm reverts to the more conservative bisection method. Although bisection may be slower, it generates an estimate guaranteed to fall within the bracket. This process is then repeated until the root is located to within an acceptable tolerance.
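As a concrete companion to Sec. 6.4.1, the inverse quadratic interpolation step of Eq. (6.10), together with the secant fallback described there, can be sketched in Python. This is not the full algorithm of Brent's method. The function name `inverse_quadratic_step` and its tuple arguments are assumptions made for this illustration, and the case $y_{i-2} = y_i$, which the text does not discuss, is deliberately left unhandled.

```python
def inverse_quadratic_step(xs, ys):
    """One root estimate from three points (x_{i-2}, x_{i-1}, x_i) and their y values.

    Uses Eq. (6.10) when the y values are distinct, and falls back to the
    secant method on two of the points otherwise, as described in Sec. 6.4.1.
    The case y_{i-2} == y_i is not covered and raises ZeroDivisionError.
    """
    x0, x1, x2 = (float(v) for v in xs)
    y0, y1, y2 = (float(v) for v in ys)
    if y0 == y1:
        # Secant method with (x_{i-1}, y_{i-1}) and (x_i, y_i).
        return x2 - y2 * (x1 - x2) / (y1 - y2)
    if y1 == y2:
        # Secant method with (x_{i-2}, y_{i-2}) and (x_{i-1}, y_{i-1}).
        return x1 - y1 * (x0 - x1) / (y0 - y1)
    # Lagrange polynomial in y, Eq. (6.9), evaluated at y = 0.
    return (y1 * y2 / ((y0 - y1) * (y0 - y2)) * x0
            + y0 * y2 / ((y1 - y0) * (y1 - y2)) * x1
            + y0 * y1 / ((y2 - y0) * (y2 - y1)) * x2)


# Data of Example 6.9: (1, 2), (2, 1), (4, 5).
root_estimate = inverse_quadratic_step((1, 2, 4), (2, 1, 5))
print(root_estimate)

# The ordinary parabola y = x**2 - 4x + 5 has a negative discriminant,
# so it never crosses the x axis.
discriminant = (-4) ** 2 - 4 * 1 * 5
print(discriminant)
```

For the data of Example 6.9 the sideways parabola gives a root estimate of 4, while the negative discriminant confirms that the parabola in x has only complex roots.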
As might be expected, bisection typically dominates at first but as the root is approached, the technique shifts to the faster open methods.

Figure 6.12 presents pseudocode for the algorithm based on a MATLAB software M-file developed by Cleve Moler (2005). It represents a stripped-down version of the fzero function, which is the professional root-location function employed in MATLAB. For that reason, we call the simplified version fzerosimp. Note that it requires another function, f, that holds the equation for which the root is being evaluated.

  • 182. FIGURE 6.12 Pseudocode for Brent's root-finding algorithm based on a MATLAB M-file developed by Cleve Moler (2005).

    Function fzerosimp(xl, xu)
      eps = 2.22044604925031E-16
      tol = 0.000001
      a = xl: b = xu: fa = f(a): fb = f(b)
      c = a: fc = fa: d = b - c: e = d
      DO
        IF fb = 0 EXIT
        IF Sgn(fa) = Sgn(fb) THEN          (If necessary, rearrange points)
          a = c: fa = fc: d = b - c: e = d
        ENDIF
        IF |fa| < |fb| THEN
          c = b: b = a: a = c
          fc = fb: fb = fa: fa = fc
        ENDIF
        m = 0.5 * (a - b)                  (Termination test and possible exit)
        tol = 2 * eps * max(|b|, 1)
        IF |m| <= tol Or fb = 0. THEN
          EXIT
        ENDIF
        (Choose open methods or bisection)
        IF |e| >= tol And |fc| > |fb| THEN
          s = fb / fc
          IF a = c THEN                    (Secant method)
            p = 2 * m * s
            q = 1 - s
          ELSE                             (Inverse quadratic interpolation)
            q = fc / fa: r = fb / fa
            p = s * (2 * m * q * (q - r) - (b - c) * (r - 1))
            q = (q - 1) * (r - 1) * (s - 1)
          ENDIF
          IF p > 0 THEN q = -q ELSE p = -p
          IF 2 * p < 3 * m * q - |tol * q| AND p < |0.5 * e * q| THEN
            e = d: d = p / q
          ELSE
            d = m: e = m
          ENDIF
        ELSE                               (Bisection)
          d = m: e = m
        ENDIF
        c = b: fc = fb
        IF |d| > tol THEN b = b + d Else b = b - Sgn(b - a) * tol
        fb = f(b)
      ENDDO
      fzerosimp = b
    END fzerosimp

  • 183. The fzerosimp function is passed two initial guesses that must bracket the root. After assigning values for machine epsilon and a tolerance, the three variables defining the search interval (a, b, c) are initialized, and f is evaluated at the endpoints.

A main loop is then implemented. If necessary, the three points are rearranged to satisfy the conditions required for the algorithm to work effectively. At this point, if the stopping criteria are met, the loop is terminated. Otherwise, a decision structure chooses among the three methods and checks whether the outcome is acceptable. A final section then evaluates f at the new point and the loop is repeated. Once the stopping criteria are met, the loop terminates and the final root estimate is returned.

Note that Sec. 7.7.2 presents an application of Brent's method where we illustrate how MATLAB's fzero function works. In addition, it is employed in Case Study 8.4 to determine the friction factor for air flow through a tube.

6.5 MULTIPLE ROOTS

A multiple root corresponds to a point where a function is tangent to the x axis. For example, a double root results from

$f(x) = (x - 3)(x - 1)(x - 1)$  (6.11)

or, multiplying terms, $f(x) = x^3 - 5x^2 + 7x - 3$. The equation has a double root because one value of x makes two terms in Eq. (6.11) equal to zero. Graphically, this corresponds to the curve touching the x axis tangentially at the double root. Examine Fig. 6.13a at $x = 1$. Notice that the function touches the axis but does not cross it at the root.
A triple root corresponds to the case where one x value makes three terms in an equation equal to zero, as in

$f(x) = (x - 3)(x - 1)(x - 1)(x - 1)$

or, multiplying terms, $f(x) = x^4 - 6x^3 + 12x^2 - 10x + 3$. Notice that the graphical depiction (Fig. 6.13b) again indicates that the function is tangent to the axis at the root, but that for this case the axis is crossed. In general, odd multiple roots cross the axis, whereas even ones do not. For example, the quadruple root in Fig. 6.13c does not cross the axis.

FIGURE 6.13 Examples of multiple roots that are tangential to the x axis. Notice that the function does not cross the axis on either side of even multiple roots (a) and (c), whereas it crosses the axis for odd cases (b). [Three panels plotting f(x) versus x: (a) double root, (b) triple root, (c) quadruple root.]

Multiple roots pose some difficulties for many of the numerical methods described in Part Two:

1. The fact that the function does not change sign at even multiple roots precludes the use of the reliable bracketing methods that were discussed in Chap. 5. Thus, of the methods covered in this book, you are limited to the open methods that may diverge.
2. Another possible problem is related to the fact that not only f(x) but also $f'(x)$ goes to zero at the root. This poses problems for both the Newton-Raphson and secant methods, which both contain the derivative (or its estimate) in the denominator of
  • 184. their respective formulas. This could result in division by zero when the solution converges very close to the root. A simple way to circumvent these problems is based on the fact that it can be demonstrated theoretically (Ralston and Rabinowitz, 1978) that f(x) will always reach zero before $f'(x)$. Therefore, if a zero check for f(x) is incorporated into the computer program, the computation can be terminated before $f'(x)$ reaches zero.
3. It can be demonstrated that the Newton-Raphson and secant methods are linearly, rather than quadratically, convergent for multiple roots (Ralston and Rabinowitz, 1978). Modifications have been proposed to alleviate this problem. Ralston and Rabinowitz (1978) have indicated that a slight change in the formulation returns it to quadratic convergence, as in

$x_{i+1} = x_i - m\frac{f(x_i)}{f'(x_i)}$  (6.12)

where m is the multiplicity of the root (that is, $m = 2$ for a double root, $m = 3$ for a triple root, etc.). Of course, this may be an unsatisfactory alternative because it hinges on foreknowledge of the multiplicity of the root.

Another alternative, also suggested by Ralston and Rabinowitz (1978), is to define a new function u(x), that is, the ratio of the function to its derivative, as in

$u(x) = \frac{f(x)}{f'(x)}$  (6.13)

It can be shown that this function has roots at all the same locations as the original function. Therefore, Eq. (6.13) can be substituted into Eq. (6.6) to develop an alternative form of the Newton-Raphson method:

$x_{i+1} = x_i - \frac{u(x_i)}{u'(x_i)}$  (6.14)

Equation (6.13) can be differentiated to give

$u'(x) = \frac{f'(x)f'(x) - f(x)f''(x)}{[f'(x)]^2}$  (6.15)

Equations (6.13) and (6.15) can be substituted into Eq. (6.14) and the result simplified to yield

$x_{i+1} = x_i - \frac{f(x_i)f'(x_i)}{[f'(x_i)]^2 - f(x_i)f''(x_i)}$  (6.16)

EXAMPLE 6.10 Modified Newton-Raphson Method for Multiple Roots

Problem Statement. Use both the standard and modified Newton-Raphson methods to evaluate the multiple root of Eq. (6.11), with an initial guess of $x_0 = 0$.
  • 185. Solution. The first derivative of Eq. (6.11) is $f'(x) = 3x^2 - 10x + 7$, and therefore, the standard Newton-Raphson method for this problem is [Eq. (6.6)]

$x_{i+1} = x_i - \frac{x_i^3 - 5x_i^2 + 7x_i - 3}{3x_i^2 - 10x_i + 7}$

which can be solved iteratively for

i    x_i          ε_t (%)
0    0            100
1    0.4285714    57
2    0.6857143    31
3    0.8328654    17
4    0.9133290    8.7
5    0.9557833    4.4
6    0.9776551    2.2

As anticipated, the method is linearly convergent toward the true value of 1.0.

For the modified method, the second derivative is $f''(x) = 6x - 10$, and the iterative relationship is [Eq. (6.16)]

$x_{i+1} = x_i - \frac{(x_i^3 - 5x_i^2 + 7x_i - 3)(3x_i^2 - 10x_i + 7)}{(3x_i^2 - 10x_i + 7)^2 - (x_i^3 - 5x_i^2 + 7x_i - 3)(6x_i - 10)}$

which can be solved for

i    x_i         ε_t (%)
0    0           100
1    1.105263    11
2    1.003082    0.31
3    1.000002    0.00024

Thus, the modified formula is quadratically convergent. We can also use both methods to search for the single root at $x = 3$. Using an initial guess of $x_0 = 4$ gives the following results:

i    Standard    ε_t (%)       Modified    ε_t (%)
0    4           33            4           33
1    3.4         13            2.636364    12
2    3.1         3.3           2.820225    6.0
3    3.008696    0.29          2.961728    1.3
4    3.000075    0.0025        2.998479    0.051
5    3.000000    2 × 10^-7     2.999998    7.7 × 10^-5

Thus, both methods converge quickly, with the standard method being somewhat more efficient.
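The two iteration formulas of Example 6.10 can be compared side by side with a minimal Python sketch. The helper names (`standard_step`, `modified_step`, `iterate`) are hypothetical. The number of steps is kept fixed and small because, very close to the double root, both $f(x)$ and $f'(x)$ approach zero and the quotients become sensitive to round-off.

```python
def f(x):
    return x**3 - 5*x**2 + 7*x - 3      # Eq. (6.11), expanded


def df(x):
    return 3*x**2 - 10*x + 7            # first derivative


def d2f(x):
    return 6*x - 10                     # second derivative


def standard_step(x):
    """One step of the standard Newton-Raphson formula, Eq. (6.6)."""
    return x - f(x) / df(x)


def modified_step(x):
    """One step of the modified formula for multiple roots, Eq. (6.16)."""
    return x - f(x) * df(x) / (df(x)**2 - f(x) * d2f(x))


def iterate(step, x0, n):
    """Apply `step` n times starting from x0 and return all iterates."""
    xs = [x0]
    for _ in range(n):
        xs.append(step(xs[-1]))
    return xs


standard = iterate(standard_step, 0.0, 6)   # creeps linearly toward x = 1
modified = iterate(modified_step, 0.0, 3)   # reaches x = 1 in three steps
print(standard)
print(modified)
```

Starting both runs from 4.0 instead of 0.0 reproduces the third table of the example, where the simple root at x = 3 is found slightly faster by the standard formula.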
  • 186. The preceding example illustrates the trade-offs involved in opting for the modified Newton-Raphson method. Although it is preferable for multiple roots, it is somewhat less efficient and requires more computational effort than the standard method for simple roots.

It should be noted that a modified version of the secant method suited for multiple roots can also be developed by substituting Eq. (6.13) into Eq. (6.7). The resulting formula is (Ralston and Rabinowitz, 1978)

$x_{i+1} = x_i - \frac{u(x_i)(x_{i-1} - x_i)}{u(x_{i-1}) - u(x_i)}$

6.6 SYSTEMS OF NONLINEAR EQUATIONS

To this point, we have focused on the determination of the roots of a single equation. A related problem is to locate the roots of a set of simultaneous equations,

$f_1(x_1, x_2, \ldots, x_n) = 0$
$f_2(x_1, x_2, \ldots, x_n) = 0$
$\vdots$  (6.17)
$f_n(x_1, x_2, \ldots, x_n) = 0$

The solution of this system consists of a set of x values that simultaneously result in all the equations equaling zero.

In Part Three, we will present methods for the case where the simultaneous equations are linear, that is, they can be expressed in the general form

$f(x) = a_1x_1 + a_2x_2 + \cdots + a_nx_n - b = 0$  (6.18)

where the b and the a's are constants. Algebraic and transcendental equations that do not fit this format are called nonlinear equations. For example,

$x^2 + xy = 10$

and

$y + 3xy^2 = 57$

are two simultaneous nonlinear equations with two unknowns, x and y. They can be expressed in the form of Eq. (6.17) as

$u(x, y) = x^2 + xy - 10 = 0$  (6.19a)
$v(x, y) = y + 3xy^2 - 57 = 0$  (6.19b)

Thus, the solution would be the values of x and y that make the functions u(x, y) and v(x, y) equal to zero. Most approaches for determining such solutions are extensions of the open methods for solving single equations. In this section, we will investigate two of these: fixed-point iteration and Newton-Raphson.
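To make the idea of a simultaneous root concrete, the two residual functions of Eq. (6.19) can be written down and evaluated directly. The short Python sketch below only checks residuals and is not a solution method. The names `u` and `v` follow Eq. (6.19), and the test points are the pair x = 2, y = 3 and the trial guess x = 1.5, y = 3.5.

```python
def u(x, y):
    """Residual of Eq. (6.19a)."""
    return x**2 + x*y - 10


def v(x, y):
    """Residual of Eq. (6.19b)."""
    return y + 3*x*y**2 - 57


# At a solution both residuals vanish simultaneously.
at_root = (u(2, 3), v(2, 3))
# At a nearby guess they do not.
at_guess = (u(1.5, 3.5), v(1.5, 3.5))
print(at_root, at_guess)
```

A pair (x, y) is a solution only when both residuals are zero at once; driving just one of them to zero is not enough.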
  • 187. 6.6.1 Fixed-Point Iteration

The fixed-point-iteration approach (Sec. 6.1) can be modified to solve two simultaneous, nonlinear equations. This approach will be illustrated in the following example.

EXAMPLE 6.11 Fixed-Point Iteration for a Nonlinear System

Problem Statement. Use fixed-point iteration to determine the roots of Eq. (6.19). Note that a correct pair of roots is $x = 2$ and $y = 3$. Initiate the computation with guesses of $x = 1.5$ and $y = 3.5$.

Solution. Equation (6.19a) can be solved for

$x_{i+1} = \frac{10 - x_i^2}{y_i}$  (E6.11.1)

and Eq. (6.19b) can be solved for

$y_{i+1} = 57 - 3x_iy_i^2$  (E6.11.2)

Note that we will drop the subscripts for the remainder of the example.

On the basis of the initial guesses, Eq. (E6.11.1) can be used to determine a new value of x:

$x = \frac{10 - (1.5)^2}{3.5} = 2.21429$

This result and the initial value of $y = 3.5$ can be substituted into Eq. (E6.11.2) to determine a new value of y:

$y = 57 - 3(2.21429)(3.5)^2 = -24.37516$

Thus, the approach seems to be diverging. This behavior is even more pronounced on the second iteration:

$x = \frac{10 - (2.21429)^2}{-24.37516} = -0.20910$

$y = 57 - 3(-0.20910)(-24.37516)^2 = 429.709$

Obviously, the approach is deteriorating.

Now we will repeat the computation but with the original equations set up in a different format. For example, an alternative formulation of Eq. (6.19a) is

$x = \sqrt{10 - xy}$

and of Eq. (6.19b) is

$y = \sqrt{\frac{57 - y}{3x}}$

Now the results are more satisfactory:

$x = \sqrt{10 - 1.5(3.5)} = 2.17945$
  • 188.
$y = \sqrt{\frac{57 - 3.5}{3(2.17945)}} = 2.86051$

$x = \sqrt{10 - 2.17945(2.86051)} = 1.94053$

$y = \sqrt{\frac{57 - 2.86051}{3(1.94053)}} = 3.04955$

Thus, the approach is converging on the true values of $x = 2$ and $y = 3$.

The previous example illustrates the most serious shortcoming of simple fixed-point iteration: convergence often depends on the manner in which the equations are formulated. Additionally, even in those instances where convergence is possible, divergence can occur if the initial guesses are insufficiently close to the true solution. Using reasoning similar to that in Box 6.1, it can be demonstrated that sufficient conditions for convergence for the two-equation case are

$\left|\frac{\partial u}{\partial x}\right| + \left|\frac{\partial u}{\partial y}\right| < 1$

and

$\left|\frac{\partial v}{\partial x}\right| + \left|\frac{\partial v}{\partial y}\right| < 1$

These criteria are so restrictive that fixed-point iteration has limited utility for solving nonlinear systems. However, as we will describe later in the book, it can be very useful for solving linear systems.

6.6.2 Newton-Raphson

Recall that the Newton-Raphson method was predicated on employing the derivative (that is, the slope) of a function to estimate its intercept with the axis of the independent variable, that is, the root (Fig. 6.5). This estimate was based on a first-order Taylor series expansion (recall Box 6.2),

$f(x_{i+1}) = f(x_i) + (x_{i+1} - x_i)f'(x_i)$  (6.20)

where $x_i$ is the initial guess at the root and $x_{i+1}$ is the point at which the slope intercepts the x axis. At this intercept, $f(x_{i+1})$ by definition equals zero and Eq. (6.20) can be rearranged to yield

$x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)}$  (6.21)

which is the single-equation form of the Newton-Raphson method.

The multiequation form is derived in an identical fashion. However, a multivariable Taylor series must be used to account for the fact that more than one independent
  • 189. variable contributes to the determination of the root. For the two-variable case, a first-order Taylor series can be written [recall Eq. (4.26)] for each nonlinear equation as

$u_{i+1} = u_i + (x_{i+1} - x_i)\frac{\partial u_i}{\partial x} + (y_{i+1} - y_i)\frac{\partial u_i}{\partial y}$  (6.22a)

and

$v_{i+1} = v_i + (x_{i+1} - x_i)\frac{\partial v_i}{\partial x} + (y_{i+1} - y_i)\frac{\partial v_i}{\partial y}$  (6.22b)

Just as for the single-equation version, the root estimate corresponds to the values of x and y, where $u_{i+1}$ and $v_{i+1}$ equal zero. For this situation, Eq. (6.22) can be rearranged to give

$\frac{\partial u_i}{\partial x}x_{i+1} + \frac{\partial u_i}{\partial y}y_{i+1} = -u_i + x_i\frac{\partial u_i}{\partial x} + y_i\frac{\partial u_i}{\partial y}$  (6.23a)

$\frac{\partial v_i}{\partial x}x_{i+1} + \frac{\partial v_i}{\partial y}y_{i+1} = -v_i + x_i\frac{\partial v_i}{\partial x} + y_i\frac{\partial v_i}{\partial y}$  (6.23b)

Because all values subscripted with i's are known (they correspond to the latest guess or approximation), the only unknowns are $x_{i+1}$ and $y_{i+1}$. Thus, Eq. (6.23) is a set of two linear equations with two unknowns [compare with Eq. (6.18)]. Consequently, algebraic manipulations (for example, Cramer's rule) can be employed to solve for

$x_{i+1} = x_i - \frac{u_i\frac{\partial v_i}{\partial y} - v_i\frac{\partial u_i}{\partial y}}{\frac{\partial u_i}{\partial x}\frac{\partial v_i}{\partial y} - \frac{\partial u_i}{\partial y}\frac{\partial v_i}{\partial x}}$  (6.24a)

$y_{i+1} = y_i - \frac{v_i\frac{\partial u_i}{\partial x} - u_i\frac{\partial v_i}{\partial x}}{\frac{\partial u_i}{\partial x}\frac{\partial v_i}{\partial y} - \frac{\partial u_i}{\partial y}\frac{\partial v_i}{\partial x}}$  (6.24b)

The denominator of each of these equations is formally referred to as the determinant of the Jacobian of the system.

Equation (6.24) is the two-equation version of the Newton-Raphson method. As in the following example, it can be employed iteratively to home in on the roots of two simultaneous equations.

EXAMPLE 6.12 Newton-Raphson for a Nonlinear System

Problem Statement. Use the multiple-equation Newton-Raphson method to determine roots of Eq. (6.19). Note that a correct pair of roots is $x = 2$ and $y = 3$. Initiate the computation with guesses of $x = 1.5$ and $y = 3.5$.

Solution. First compute the partial derivatives and evaluate them at the initial guesses of x and y:

$\frac{\partial u_0}{\partial x} = 2x + y = 2(1.5) + 3.5 = 6.5$    $\frac{\partial u_0}{\partial y} = x = 1.5$
  • 190.
$$ \frac{\partial v_0}{\partial x} = 3y^2 = 3(3.5)^2 = 36.75 \qquad \frac{\partial v_0}{\partial y} = 1 + 6xy = 1 + 6(1.5)(3.5) = 32.5 $$
Thus, the determinant of the Jacobian for the first iteration is
$$ 6.5(32.5) - 1.5(36.75) = 156.125 $$
The values of the functions can be evaluated at the initial guesses as
$$ u_0 = (1.5)^2 + 1.5(3.5) - 10 = -2.5 $$
$$ v_0 = 3.5 + 3(1.5)(3.5)^2 - 57 = 1.625 $$
These values can be substituted into Eq. (6.24) to give
$$ x = 1.5 - \frac{-2.5(32.5) - 1.625(1.5)}{156.125} = 2.03603 $$
$$ y = 3.5 - \frac{1.625(6.5) - (-2.5)(36.75)}{156.125} = 2.84388 $$
Thus, the results are converging to the true values of $x = 2$ and $y = 3$. The computation can be repeated until an acceptable accuracy is obtained.
Just as with fixed-point iteration, the Newton-Raphson approach will often diverge if the initial guesses are not sufficiently close to the true roots. Whereas graphical methods could be employed to derive good guesses for the single-equation case, no such simple procedure is available for the multiequation version. Although there are some advanced approaches for obtaining acceptable first estimates, often the initial guesses must be obtained on the basis of trial and error and knowledge of the physical system being modeled.
The two-equation Newton-Raphson approach can be generalized to solve $n$ simultaneous equations. Because the most efficient way to do this involves matrix algebra and the solution of simultaneous linear equations, we will defer discussion of the general approach to Part Three.
PROBLEMS
6.1 Use simple fixed-point iteration to locate the root of $f(x) = \sin(\sqrt{x}) - x$. Use an initial guess of $x_0 = 0.5$ and iterate until $\varepsilon_a \le 0.01\%$. Verify that the process is linearly convergent as described in Box 6.1.
6.2 Determine the highest real root of $f(x) = 2x^3 - 11.7x^2 + 17.7x - 5$
(a) Graphically.
(b) Fixed-point iteration method (three iterations, $x_0 = 3$). Note: Make certain that you develop a solution that converges on the root.
(c) Newton-Raphson method (three iterations, $x_0 = 3$).
(d) Secant method (three iterations, $x_{-1} = 3$, $x_0 = 4$).
(e) Modified secant method (three iterations, $x_0 = 3$, $\delta = 0.01$).
Compute the approximate percent relative errors for your solutions.
6.3 Use (a) fixed-point iteration and (b) the Newton-Raphson method to determine a root of $f(x) = -0.9x^2 + 1.7x + 2.5$ using $x_0 = 5$. Perform the computation until $\varepsilon_a$ is less than $\varepsilon_s = 0.01\%$. Also perform an error check of your final answer.
6.4 Determine the real roots of $f(x) = -1 + 5.5x - 4x^2 + 0.5x^3$: (a) graphically and (b) using the Newton-Raphson method to within $\varepsilon_s = 0.01\%$.
6.5 Employ the Newton-Raphson method to determine a real root for $f(x) = -1 + 5.5x - 4x^2 + 0.5x^3$ using initial guesses of (a) 4.52
  • 191. and (b) 4.54. Discuss and use graphical and analytical methods to explain any peculiarities in your results.
6.6 Determine the lowest real root of $f(x) = -12 - 21x + 18x^2 - 2.4x^3$: (a) graphically and (b) using the secant method to a value of $\varepsilon_s$ corresponding to three significant figures.
6.7 Locate the first positive root of $f(x) = \sin x + \cos(1 + x^2) - 1$ where $x$ is in radians. Use four iterations of the secant method with initial guesses of (a) $x_{i-1} = 1.0$ and $x_i = 3.0$; (b) $x_{i-1} = 1.5$ and $x_i = 2.5$; and (c) $x_{i-1} = 1.5$ and $x_i = 2.25$ to locate the root. (d) Use the graphical method to explain your results.
6.8 Determine the real root of $x^{3.5} = 80$ with the modified secant method to within $\varepsilon_s = 0.1\%$ using an initial guess of $x_0 = 3.5$ and $\delta = 0.01$.
6.9 Determine the highest real root of $f(x) = x^3 - 6x^2 + 11x - 6.1$:
(a) Graphically.
(b) Using the Newton-Raphson method (three iterations, $x_i = 3.5$).
(c) Using the secant method (three iterations, $x_{i-1} = 2.5$ and $x_i = 3.5$).
(d) Using the modified secant method (three iterations, $x_i = 3.5$, $\delta = 0.01$).
6.10 Determine the lowest positive root of $f(x) = 7\sin(x)e^{-x} - 1$:
(a) Graphically.
(b) Using the Newton-Raphson method (three iterations, $x_i = 0.3$).
(c) Using the secant method (five iterations, $x_{i-1} = 0.5$ and $x_i = 0.4$).
(d) Using the modified secant method (three iterations, $x_i = 0.3$, $\delta = 0.01$).
6.11 Use the Newton-Raphson method to find the root of $f(x) = e^{-0.5x}(4 - x) - 2$. Employ initial guesses of (a) 2, (b) 6, and (c) 8. Explain your results.
6.12 Given $f(x) = -2x^6 - 1.5x^4 + 10x + 2$, use a root location technique to determine the maximum of this function. Perform iterations until the approximate relative error falls below 5%. If you use a bracketing method, use initial guesses of $x_l = 0$ and $x_u = 1$. If you use the Newton-Raphson or the modified secant method, use an initial guess of $x_i = 1$. If you use the secant method, use initial guesses of $x_{i-1} = 0$ and $x_i = 1$.
Assuming that convergence is not an issue, choose the technique that is best suited to this problem. Justify your choice.
6.13 You must determine the root of the following easily differentiable function, $e^{0.5x} = 5 - 5x$. Pick the best numerical technique, justify your choice, and then use that technique to determine the root. Note that it is known that for positive initial guesses, all techniques except fixed-point iteration will eventually converge. Perform iterations until the approximate relative error falls below 2%. If you use a bracketing method, use initial guesses of $x_l = 0$ and $x_u = 2$. If you use the Newton-Raphson or the modified secant method, use an initial guess of $x_i = 0.7$. If you use the secant method, use initial guesses of $x_{i-1} = 0$ and $x_i = 2$.
6.14 Use (a) the Newton-Raphson method and (b) the modified secant method ($\delta = 0.05$) to determine a root of $f(x) = x^5 - 16.05x^4 + 88.75x^3 - 192.0375x^2 + 116.35x + 31.6875$ using an initial guess of $x = 0.5825$ and $\varepsilon_s = 0.01\%$. Explain your results.
6.15 The "divide and average" method, an old-time method for approximating the square root of any positive number $a$, can be formulated as
$$ x = \frac{x + a/x}{2} $$
Prove that this is equivalent to the Newton-Raphson algorithm.
6.16 (a) Apply the Newton-Raphson method to the function $f(x) = \tanh(x^2 - 9)$ to evaluate its known real root at $x = 3$. Use an initial guess of $x_0 = 3.2$ and take a minimum of four iterations. (b) Did the method exhibit convergence onto its real root? Sketch the plot with the results for each iteration shown.
6.17 The polynomial $f(x) = 0.0074x^4 - 0.284x^3 + 3.355x^2 - 12.183x + 5$ has a real root between 15 and 20. Apply the Newton-Raphson method to this function using an initial guess of $x_0 = 16.15$. Explain your results.
6.18 Use the secant method on the circle function $(x + 1)^2 + (y - 2)^2 = 16$ to find a positive real root. Set your initial guess to $x_i = 3$ and $x_{i-1} = 0.5$. Approach the solution from the first and fourth quadrants.
When solving for $f(x)$ in the fourth quadrant, be sure to take the negative value of the square root. Why does your solution diverge?
6.19 You are designing a spherical tank (Fig. P6.19) to hold water for a small village in a developing country. The volume of liquid it can hold can be computed as
$$ V = \pi h^2 \frac{[3R - h]}{3} $$
where $V$ = volume (m$^3$), $h$ = depth of water in tank (m), and $R$ = the tank radius (m). If $R = 3$ m, what depth must the tank be filled to so that it holds 30 m$^3$? Use three iterations of the Newton-Raphson method to determine your answer. Determine the approximate relative error after each iteration. Note that an initial guess of $R$ will always converge.
  • 192. 6.20 The Manning equation can be written for a rectangular open channel as
$$ Q = \frac{\sqrt{S}\,(BH)^{5/3}}{n(B + 2H)^{2/3}} $$
where $Q$ = flow [m$^3$/s], $S$ = slope [m/m], $H$ = depth [m], and $n$ = the Manning roughness coefficient. Develop a fixed-point iteration scheme to solve this equation for $H$ given $Q = 5$, $S = 0.0002$, $B = 20$, and $n = 0.03$. Prove that your scheme converges for all initial guesses greater than or equal to zero.
6.21 The function $x^3 - 2x^2 - 4x + 8$ has a double root at $x = 2$. Use (a) the standard Newton-Raphson [Eq. (6.6)], (b) the modified Newton-Raphson [Eq. (6.12)], and (c) the modified Newton-Raphson [Eq. (6.16)] to solve for the root at $x = 2$. Compare and discuss the rate of convergence using an initial guess of $x_0 = 1.2$.
6.22 Determine the roots of the following simultaneous nonlinear equations using (a) fixed-point iteration and (b) the Newton-Raphson method:
$$ y = -x^2 + x + 0.75 $$
$$ y + 5xy = x^2 $$
Employ initial guesses of $x = y = 1.2$ and discuss the results.
6.23 Determine the roots of the simultaneous nonlinear equations
$$ (x - 4)^2 + (y - 4)^2 = 5 $$
$$ x^2 + y^2 = 16 $$
Use a graphical approach to obtain your initial guesses. Determine refined estimates with the two-equation Newton-Raphson method described in Sec. 6.6.2.
6.24 Repeat Prob. 6.23 except determine the positive root of
$$ y = x^2 + 1 $$
$$ y = 2\cos x $$
6.25 A mass balance for a pollutant in a well-mixed lake can be written as
$$ V\frac{dc}{dt} = W - Qc - kV\sqrt{c} $$
Given the parameter values $V = 1 \times 10^6$ m$^3$, $Q = 1 \times 10^5$ m$^3$/yr, $W = 1 \times 10^6$ g/yr, and $k = 0.25$ m$^{0.5}$/g$^{0.5}$/yr, use the modified secant method to solve for the steady-state concentration. Employ an initial guess of $c = 4$ g/m$^3$ and $\delta = 0.5$. Perform three iterations and determine the percent relative error after the third iteration.
6.26 For Prob. 6.25, the root can be located with fixed-point iteration as
$$ c = \left(\frac{W - Qc}{kV}\right)^2 $$
or as
$$ c = \frac{W - kV\sqrt{c}}{Q} $$
Only one will converge for initial guesses of $2 < c < 6$. Select the correct one and demonstrate why it will always work.
6.27 Develop a user-friendly program for the Newton-Raphson method based on Fig. 6.4 and Sec. 6.2.3. Test it by duplicating the computation from Example 6.3.
6.28 Develop a user-friendly program for the secant method based on Fig. 6.4 and Sec. 6.3.2. Test it by duplicating the computation from Example 6.6.
6.29 Develop a user-friendly program for the modified secant method based on Fig. 6.4 and Sec. 6.3.2. Test it by duplicating the computation from Example 6.8.
6.30 Develop a user-friendly program for Brent's root location method based on Fig. 6.12. Test it by solving Prob. 6.6.
6.31 Develop a user-friendly program for the two-equation Newton-Raphson method based on Sec. 6.6.2. Test it by solving Example 6.12.
6.32 Use the program you developed in Prob. 6.31 to solve Probs. 6.22 and 6.23 to within a tolerance of $\varepsilon_s = 0.01\%$.
FIGURE P6.19 [figure labels: $h$, $V$, $R$]
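As a starting point for Prob. 6.31, the two-equation Newton-Raphson update of Eq. (6.24) can be sketched in Python. This is a minimal illustration, not a finished user-friendly program: the names `newton_step` and `newton_solve`, the stopping tolerance, and the iteration cap are assumptions introduced here, and the functions u and v are the ones evaluated in Example 6.12.

```python
def u(x, y):
    # First function of the system used in Example 6.12
    return x**2 + x * y - 10.0


def v(x, y):
    # Second function of the system used in Example 6.12
    return y + 3.0 * x * y**2 - 57.0


def jacobian(x, y):
    """Return (du/dx, du/dy, dv/dx, dv/dy) for the two functions above."""
    return 2.0 * x + y, x, 3.0 * y**2, 1.0 + 6.0 * x * y


def newton_step(x, y):
    """Apply Eq. (6.24) once and return the updated (x, y)."""
    ux, uy, vx, vy = jacobian(x, y)
    det = ux * vy - uy * vx  # determinant of the Jacobian
    if det == 0.0:
        raise ZeroDivisionError("Jacobian determinant is zero")
    ui, vi = u(x, y), v(x, y)
    x_new = x - (ui * vy - vi * uy) / det
    y_new = y - (vi * ux - ui * vx) / det
    return x_new, y_new


def newton_solve(x, y, es=0.01, max_iter=50):
    """Repeat newton_step until the approximate percent relative error
    in both unknowns drops below es (hypothetical stopping rule)."""
    for _ in range(max_iter):
        x_new, y_new = newton_step(x, y)
        ea = 100.0 * max(abs((x_new - x) / x_new), abs((y_new - y) / y_new))
        x, y = x_new, y_new
        if ea < es:
            break
    return x, y


if __name__ == "__main__":
    print(newton_step(1.5, 3.5))   # first iteration of Example 6.12
    print(newton_solve(1.5, 3.5))  # iterated toward x = 2, y = 3
```

The first call reproduces the single hand iteration of Example 6.12; the second simply repeats the same update, so it inherits the sensitivity to initial guesses noted in Sec. 6.6.2.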
  • 193. CHAPTER 7
Roots of Polynomials
In this chapter, we will discuss methods to find the roots of polynomial equations of the general form
$$ f_n(x) = a_0 + a_1x + a_2x^2 + \cdots + a_nx^n \tag{7.1} $$
where $n$ = the order of the polynomial and the $a$'s = constant coefficients. Although the coefficients can be complex numbers, we will limit our discussion to cases where they are real. For such cases, the roots can be real and/or complex.
The roots of such polynomials follow these rules:
1. For an $n$th-order equation, there are $n$ real or complex roots. It should be noted that these roots will not necessarily be distinct.
2. If $n$ is odd, there is at least one real root.
3. If complex roots exist, they exist in conjugate pairs (that is, $\lambda + \mu i$ and $\lambda - \mu i$), where $i = \sqrt{-1}$.
Before describing the techniques for locating the roots of polynomials, we will provide some background. The first section offers some motivation for studying the techniques; the second deals with some fundamental computer manipulations involving polynomials.
7.1 POLYNOMIALS IN ENGINEERING AND SCIENCE
Polynomials have many applications in engineering and science. For example, they are used extensively in curve-fitting. However, we believe that one of their most interesting and powerful applications is in characterizing dynamic systems and, in particular, linear systems. Examples include mechanical devices, structures, and electrical circuits. We will be exploring specific examples throughout the remainder of this text. In particular, they will be the focus of several of the engineering applications.
For the time being, we will keep the discussion simple and general by focusing on a simple second-order system defined by the following linear ordinary differential equation (or ODE):
$$ a_2\frac{d^2y}{dt^2} + a_1\frac{dy}{dt} + a_0y = F(t) \tag{7.2} $$
  • 194. where $y$ and $t$ are the dependent and independent variables, respectively, the $a$'s are constant coefficients, and $F(t)$ is the forcing function.
In addition, it should be noted that Eq. (7.2) can be alternatively expressed as a pair of first-order ODEs by defining a new variable $z$,
$$ z = \frac{dy}{dt} \tag{7.3} $$
Equation (7.3) can be substituted along with its derivative into Eq. (7.2) to remove the second-derivative term. This reduces the problem to solving
$$ \frac{dz}{dt} = \frac{F(t) - a_1z - a_0y}{a_2} \tag{7.4} $$
$$ \frac{dy}{dt} = z \tag{7.5} $$
In a similar fashion, an $n$th-order linear ODE can always be expressed as a system of $n$ first-order ODEs.
Now let's look at the solution. The forcing function represents the effect of the external world on the system. The homogeneous or general solution of the equation deals with the case when the forcing function is set to zero,
$$ a_2\frac{d^2y}{dt^2} + a_1\frac{dy}{dt} + a_0y = 0 \tag{7.6} $$
Thus, as the name implies, the general solution should tell us something very fundamental about the system being simulated, that is, how the system responds in the absence of external stimuli.
Now, the general solution to all unforced linear systems is of the form $y = e^{rt}$. If this function is differentiated and substituted into Eq. (7.6), the result is
$$ a_2r^2e^{rt} + a_1re^{rt} + a_0e^{rt} = 0 $$
or canceling the exponential terms,
$$ a_2r^2 + a_1r + a_0 = 0 \tag{7.7} $$
Notice that the result is a polynomial called the characteristic equation. The roots of this polynomial are the values of $r$ that satisfy Eq. (7.7). These $r$'s are referred to as the system's characteristic values, or eigenvalues.
So, here is the connection between roots of polynomials and engineering and science. The eigenvalue tells us something fundamental about the system we are modeling, and finding the eigenvalues involves finding the roots of polynomials.
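For the second-order case, the link between Eq. (7.7) and the eigenvalues can be made concrete with a short Python sketch. The coefficient values below are hypothetical, chosen only for illustration, and the helper names `characteristic_roots` and `residual` are not from the text. Complex arithmetic is used so that the same code handles real and complex roots.

```python
import cmath


def characteristic_roots(a2, a1, a0):
    """Roots of a2*r**2 + a1*r + a0 = 0, returned as complex numbers."""
    sq = cmath.sqrt(a1 * a1 - 4.0 * a2 * a0)
    return (-a1 + sq) / (2.0 * a2), (-a1 - sq) / (2.0 * a2)


def residual(a2, a1, a0, r):
    """Left-hand side of Eq. (7.7); it should vanish at an eigenvalue."""
    return a2 * r * r + a1 * r + a0


if __name__ == "__main__":
    # Assumed coefficients (illustrative only)
    for coeffs in [(1.0, 3.0, 2.0), (1.0, 2.0, 5.0)]:
        for r in characteristic_roots(*coeffs):
            print(coeffs, r, abs(residual(*coeffs, r)))
```

With the first assumed set the eigenvalues are real; with the second they form a complex conjugate pair.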
Whereas finding the roots of a second-order equation is easy with the quadratic formula, finding the roots of higher-order systems (and hence, higher-order polynomials) is arduous analytically. Thus, the best general approach requires numerical methods of the type described in this chapter.
Before proceeding to these methods, let us take our analysis a bit farther by investigating what specific values of the eigenvalues might imply about the behavior of
  • 195. physical systems. First, let us evaluate the roots of Eq. (7.7) with the quadratic formula,
$$ r_1, r_2 = \frac{-a_1 \pm \sqrt{a_1^2 - 4a_2a_0}}{2a_2} $$
Thus, we get two roots. If the discriminant ($a_1^2 - 4a_2a_0$) is positive, the roots are real and the general solution can be represented as
$$ y = c_1e^{r_1t} + c_2e^{r_2t} \tag{7.8} $$
where the $c$'s = constants that can be determined from the initial conditions. This is called the overdamped case.
If the discriminant is zero, a single real root results, and the general solution can be formulated as
$$ y = (c_1 + c_2t)e^{\lambda t} \tag{7.9} $$
This is called the critically damped case.
If the discriminant is negative, the roots will be complex conjugate numbers,
$$ r_1, r_2 = \lambda \pm \mu i $$
and the general solution can be formulated as
$$ y = c_1e^{(\lambda + \mu i)t} + c_2e^{(\lambda - \mu i)t} $$
FIGURE 7.1 The general solution for linear ODEs can be composed of (a) exponential and (b) sinusoidal components. The combination of the two shapes results in the damped sinusoid shown in (c). [Panels (a), (b), and (c) each plot $y$ versus $t$.]
  • 196. The physical behavior of this solution can be elucidated by using Euler's formula
$$ e^{\mu it} = \cos\mu t + i\sin\mu t $$
to reformulate the general solution as (see Boyce and DiPrima, 1992, for details of the derivation)
$$ y = c_1e^{\lambda t}\cos\mu t + c_2e^{\lambda t}\sin\mu t \tag{7.10} $$
This is called the underdamped case.
Equations (7.8), (7.9), and (7.10) express the possible ways that linear systems respond dynamically. The exponential terms mean that the solutions are capable of decaying (negative real part) or growing (positive real part) exponentially with time (Fig. 7.1a). The sinusoidal terms (imaginary part) mean that the solutions can oscillate (Fig. 7.1b). If the eigenvalue has both real and imaginary parts, the exponential and sinusoidal shapes are combined (Fig. 7.1c). Because such knowledge is a key element in understanding, designing, and controlling the behavior of a physical system, characteristic polynomials are very important in engineering and many branches of science. We will explore the dynamics of several engineering systems in the applications covered in Chap. 8.
7.2 COMPUTING WITH POLYNOMIALS
Before describing root-location methods, we will discuss some fundamental computer operations involving polynomials. These have utility in their own right as well as providing support for root finding.
7.2.1 Polynomial Evaluation and Differentiation
Although it is the most common format, Eq. (7.1) provides a poor means for determining the value of a polynomial for a particular value of $x$. For example, evaluating a third-order polynomial as
$$ f_3(x) = a_3x^3 + a_2x^2 + a_1x + a_0 \tag{7.11} $$
involves six multiplications and three additions. In general, for an $n$th-order polynomial, this approach requires $n(n + 1)/2$ multiplications and $n$ additions.
In contrast, a nested format,
$$ f_3(x) = ((a_3x + a_2)x + a_1)x + a_0 \tag{7.12} $$
involves three multiplications and three additions. For an $n$th-order polynomial, this approach requires $n$ multiplications and $n$ additions.
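The operation counts quoted above can be checked with a small Python sketch that evaluates a polynomial both ways and tallies the multiplications. The function names and the counting logic are illustrative assumptions; coefficients are stored in ascending order, `a[j]` multiplying `x**j`.

```python
def poly_naive(a, x):
    """Evaluate term by term as in Eq. (7.11); returns (value, multiplications)."""
    total, mults = 0.0, 0
    for j, aj in enumerate(a):
        term = aj
        for _ in range(j):  # build aj * x**j with j multiplications
            term *= x
            mults += 1
        total += term
    return total, mults


def poly_nested(a, x):
    """Evaluate in the nested form of Eq. (7.12); returns (value, multiplications)."""
    p, mults = a[-1], 0
    for aj in reversed(a[:-1]):
        p = p * x + aj  # one multiplication and one addition per coefficient
        mults += 1
    return p, mults


if __name__ == "__main__":
    coeffs = [-120.0, -46.0, 79.0, -3.0, -7.0, 1.0]  # an arbitrary fifth-order test case
    print(poly_naive(coeffs, 2.0))
    print(poly_nested(coeffs, 2.0))
```

For a fifth-order polynomial the naive count is n(n + 1)/2 = 15 multiplications and the nested count is n = 5, matching the general formulas.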
Because the nested format minimizes the number of operations, it also tends to minimize round-off errors. Note that, depending on your preference, the order of nesting can be reversed:
$$ f_3(x) = a_0 + x(a_1 + x(a_2 + xa_3)) \tag{7.13} $$
Succinct pseudocode to implement the nested form can be written simply as
    DOFOR j = n, 0, -1
      p = p * x + a(j)
    END DO
  • 197. where p holds the value of the polynomial (defined by its coefficients, the a's) evaluated at x.
There are cases (such as in the Newton-Raphson method) where you might want to evaluate both the function and its derivative. This evaluation can also be neatly included by adding a single line to the preceding pseudocode,
    DOFOR j = n, 0, -1
      df = df * x + p
      p = p * x + a(j)
    END DO
where df holds the first derivative of the polynomial.
7.2.2 Polynomial Deflation
Suppose that you determine a single root of an $n$th-order polynomial. If you repeat your root location procedure, you might find the same root. Therefore, it would be nice to remove the found root before proceeding. This removal process is referred to as polynomial deflation. Before we show how this is done, some orientation might be useful.
Polynomials are typically represented in the format of Eq. (7.1). For example, a fifth-order polynomial could be written as
$$ f_5(x) = -120 - 46x + 79x^2 - 3x^3 - 7x^4 + x^5 \tag{7.14} $$
Although this is a familiar format, it is not necessarily the best expression to understand the polynomial's mathematical behavior. For example, this fifth-order polynomial might be expressed alternatively as
$$ f_5(x) = (x + 1)(x - 4)(x - 5)(x + 3)(x - 2) \tag{7.15} $$
This is called the factored form of the polynomial. If multiplication is completed and like terms collected, Eq. (7.14) would be obtained. However, the format of Eq. (7.15) has the advantage that it clearly indicates the function's roots. Thus, it is apparent that $x = -1$, 4, 5, $-3$, and 2 are all roots because each causes an individual term in Eq. (7.15) to become zero.
Now, suppose that we divide this fifth-order polynomial by any of its factors, for example, $x + 3$. For this case, the result would be a fourth-order polynomial
$$ f_4(x) = (x + 1)(x - 4)(x - 5)(x - 2) = -40 - 2x + 27x^2 - 10x^3 + x^4 \tag{7.16} $$
with a remainder of zero.
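The two-line loop of Sec. 7.2.1 and the roots listed for Eq. (7.15) can be checked together with a short Python sketch. The function name `poly_and_derivative` is an assumption made here; `p` and `df` are started at zero, which the pseudocode leaves implicit, and coefficients are stored in ascending order.

```python
def poly_and_derivative(a, x):
    """Nested evaluation returning (f(x), f'(x)) for ascending coefficients a."""
    p = 0.0   # running value of the polynomial
    df = 0.0  # running value of its first derivative
    for aj in reversed(a):  # j = n down to 0
        df = df * x + p     # update the derivative before p changes
        p = p * x + aj
    return p, df


if __name__ == "__main__":
    f5 = [-120.0, -46.0, 79.0, -3.0, -7.0, 1.0]  # coefficients of Eq. (7.14)
    for root in (-1.0, 4.0, 5.0, -3.0, 2.0):
        print(root, poly_and_derivative(f5, root))
```

Each listed root should give a function value of zero, and the derivative returned alongside it is what a Newton-Raphson step would need.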
In the distant past, you probably learned to divide polynomials using the approach called synthetic division. Several computer algorithms (based on both synthetic division and other methods) are available for performing the operation. One simple scheme is provided by the following pseudocode, which divides an nth-order polynomial by a
  • 198. monomial factor $x - t$:
    r = a(n)
    a(n) = 0
    DOFOR i = n-1, 0, -1
      s = a(i)
      a(i) = r
      r = s + r * t
    END DO
If the monomial is a root of the polynomial, the remainder r will be zero, and the coefficients of the quotient are stored in a at the end of the loop.
EXAMPLE 7.1 Polynomial Deflation
Problem Statement. Divide the second-order polynomial,
$$ f(x) = (x - 4)(x + 6) = x^2 + 2x - 24 $$
by the factor $x - 4$.
Solution. Using the approach outlined in the above pseudocode, the parameters are $n = 2$, $a_0 = -24$, $a_1 = 2$, $a_2 = 1$, and $t = 4$. These can be used to compute
$$ r = a_2 = 1 \qquad a_2 = 0 $$
The loop is then iterated from $i = 2 - 1 = 1$ to 0. For $i = 1$,
$$ s = a_1 = 2 \qquad a_1 = r = 1 \qquad r = s + rt = 2 + 1(4) = 6 $$
For $i = 0$,
$$ s = a_0 = -24 \qquad a_0 = r = 6 \qquad r = -24 + 6(4) = 0 $$
Thus, the result is as expected: the quotient is $a_0 + a_1x = 6 + x$, with a remainder of zero.
It is also possible to divide by polynomials of higher order. As we will see later in this chapter, the most common task involves dividing by a second-order polynomial or parabola. The subroutine in Fig. 7.2 addresses the more general problem of dividing an $n$th-order polynomial $a$ by an $m$th-order polynomial $d$. The result is an $(n - m)$th-order polynomial $q$, with an $(m - 1)$th-order polynomial as the remainder.
Because each calculated root is known only approximately, it should be noted that deflation is sensitive to round-off errors. In some cases, round-off error can grow to the point that the results can become meaningless.
Some general strategies can be applied to minimize this problem. For example, round-off error is affected by the order in which the terms are evaluated. Forward deflation refers to the
  • 199. case where new polynomial coefficients are in order of descending powers of $x$ (that is, from the highest-order to the zero-order term). For this case, it is preferable to divide by the roots of smallest absolute value first. Conversely, for backward deflation (that is, from the zero-order to the highest-order term), it is preferable to divide by the roots of largest absolute value first.
Another way to reduce round-off errors is to consider each successive root estimate obtained during deflation as a good first guess. These can then be used as a starting guess, and the root determined again with the original nondeflated polynomial. This is referred to as root polishing.
Finally, a problem arises when two deflated roots are inaccurate enough that they both converge on the same undeflated root. In that case, you might be erroneously led to believe that the polynomial has a multiple root (recall Sec. 6.5). One way to detect this problem is to compare each polished root with those that were located previously. Press et al. (2007) discuss this problem in more detail.
7.3 CONVENTIONAL METHODS
Now that we have covered some background material on polynomials, we can begin to describe methods to locate their roots. The obvious first step would be to investigate the viability of the bracketing and open approaches described in Chaps. 5 and 6.
The efficacy of these approaches depends on whether the problem being solved involves complex roots. If only real roots exist, any of the previously described methods could have utility. However, the problem of finding good initial guesses complicates both the bracketing and the open methods, whereas the open methods could be susceptible to divergence.
    SUB poldiv(a, n, d, m, q, r)
      DOFOR j = 0, n
        r(j) = a(j)
        q(j) = 0
      END DO
      DOFOR k = n-m, 0, -1
        q(k+1) = r(m+k) / d(m)
        DOFOR j = m+k-1, k, -1
          r(j) = r(j) - q(k+1) * d(j-k)
        END DO
      END DO
      DOFOR j = m, n
        r(j) = 0
      END DO
      n = n-m
      DOFOR i = 0, n
        a(i) = q(i+1)
      END DO
    END SUB
FIGURE 7.2 Algorithm to divide a polynomial (defined by its coefficients a) by a lower-order polynomial d.
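The division routine of Fig. 7.2 can be sketched in Python as follows. This is an adaptation under stated assumptions, not a transcription: it uses zero-based lists with ascending coefficients, returns the quotient and remainder instead of overwriting `a`, and indexes the quotient from zero (the figure stores the coefficient of x^k in q(k+1)).

```python
def poldiv(a, d):
    """Divide polynomial a by polynomial d (both ascending coefficient lists).

    Returns (q, r): quotient of order n - m and remainder of order m - 1.
    """
    n, m = len(a) - 1, len(d) - 1
    if m > n:
        return [0.0], [float(c) for c in a]
    if d[m] == 0:
        raise ZeroDivisionError("leading coefficient of divisor is zero")
    r = [float(c) for c in a]  # working copy that becomes the remainder
    q = [0.0] * (n - m + 1)
    for k in range(n - m, -1, -1):
        q[k] = r[m + k] / d[m]
        # subtract q[k] * x**k * d(x) from the working polynomial
        for j in range(m + k, k - 1, -1):
            r[j] -= q[k] * d[j - k]
    remainder = r[:m] if m > 0 else [0.0]
    return q, remainder


if __name__ == "__main__":
    f5 = [-120, -46, 79, -3, -7, 1]    # Eq. (7.14)
    print(poldiv(f5, [3, 1]))          # divide by x + 3
    print(poldiv(f5, [-4, -3, 1]))     # divide by (x + 1)(x - 4) = x**2 - 3x - 4
```

Dividing Eq. (7.14) by x + 3 should return the coefficients of Eq. (7.16) with a zero remainder; the second call illustrates division by a quadratic factor.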
  • 200. When complex roots are possible, the bracketing methods cannot be used because of the obvious problem that the criterion for defining a bracket (that is, sign change) does not translate to complex guesses.
Of the open methods, the conventional Newton-Raphson method would provide a viable approach. In particular, concise code including deflation can be developed. If a language that accommodates complex variables (like Fortran) is used, such an algorithm will locate both real and complex roots. However, as might be expected, it would be susceptible to convergence problems. For this reason, special methods have been developed to find the real and complex roots of polynomials. We describe two, the Müller and Bairstow methods, in the following sections. As you will see, both are related to the more conventional open approaches described in Chap. 6.
7.4 MÜLLER'S METHOD
Recall that the secant method obtains a root estimate by projecting a straight line to the $x$ axis through two function values (Fig. 7.3a). Müller's method takes a similar approach, but projects a parabola through three points (Fig. 7.3b).
The method consists of deriving the coefficients of the parabola that goes through the three points. These coefficients can then be substituted into the quadratic formula to obtain the point where the parabola intercepts the $x$ axis, that is, the root estimate. The approach is facilitated by writing the parabolic equation in a convenient form,
$$ f_2(x) = a(x - x_2)^2 + b(x - x_2) + c \tag{7.17} $$
We want this parabola to intersect the three points $[x_0, f(x_0)]$, $[x_1, f(x_1)]$, and $[x_2, f(x_2)]$. The coefficients of Eq. (7.17) can be evaluated by substituting each of the three points to give
$$ f(x_0) = a(x_0 - x_2)^2 + b(x_0 - x_2) + c \tag{7.18} $$
$$ f(x_1) = a(x_1 - x_2)^2 + b(x_1 - x_2) + c \tag{7.19} $$
$$ f(x_2) = a(x_2 - x_2)^2 + b(x_2 - x_2) + c \tag{7.20} $$
FIGURE 7.3 A comparison of two related approaches for locating roots: (a) the secant method and (b) Müller's method.
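[Figure 7.3 panels, each plotting $f(x)$ versus $x$ with the root and the root estimate marked: (a) straight line through the points at $x_0$ and $x_1$; (b) parabola through the points at $x_0$, $x_1$, and $x_2$.]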
  • 201. Note that we have dropped the subscript "2" from the function for conciseness. Because we have three equations, we can solve for the three unknown coefficients, $a$, $b$, and $c$. Because two of the terms in Eq. (7.20) are zero, it can be immediately solved for $c = f(x_2)$. Thus, the coefficient $c$ is merely equal to the function value evaluated at the third guess, $x_2$. This result can then be substituted into Eqs. (7.18) and (7.19) to yield two equations with two unknowns:
$$ f(x_0) - f(x_2) = a(x_0 - x_2)^2 + b(x_0 - x_2) \tag{7.21} $$
$$ f(x_1) - f(x_2) = a(x_1 - x_2)^2 + b(x_1 - x_2) \tag{7.22} $$
Algebraic manipulation can then be used to solve for the remaining coefficients, $a$ and $b$. One way to do this involves defining a number of differences,
$$ h_0 = x_1 - x_0 \qquad h_1 = x_2 - x_1 $$
$$ \delta_0 = \frac{f(x_1) - f(x_0)}{x_1 - x_0} \qquad \delta_1 = \frac{f(x_2) - f(x_1)}{x_2 - x_1} \tag{7.23} $$
These can be substituted into Eqs. (7.21) and (7.22) to give
$$ (h_0 + h_1)b - (h_0 + h_1)^2a = h_0\delta_0 + h_1\delta_1 $$
$$ h_1b - h_1^2a = h_1\delta_1 $$
which can be solved for $a$ and $b$. The results can be summarized as
$$ a = \frac{\delta_1 - \delta_0}{h_1 + h_0} \tag{7.24} $$
$$ b = ah_1 + \delta_1 \tag{7.25} $$
$$ c = f(x_2) \tag{7.26} $$
To find the root, we apply the quadratic formula to Eq. (7.17). However, because of potential round-off error, rather than using the conventional form, we use the alternative formulation [Eq. (3.13)] to yield
$$ x_3 - x_2 = \frac{-2c}{b \pm \sqrt{b^2 - 4ac}} \tag{7.27a} $$
or isolating the unknown $x_3$ on the left side of the equal sign,
$$ x_3 = x_2 + \frac{-2c}{b \pm \sqrt{b^2 - 4ac}} \tag{7.27b} $$
Note that the use of the quadratic formula means that both real and complex roots can be located. This is a major benefit of the method.
In addition, Eq. (7.27a) provides a neat means to determine the approximate error. Because the left side represents the difference between the present ($x_3$) and the previous ($x_2$) root estimate, the error can be calculated as
$$ \varepsilon_a = \left|\frac{x_3 - x_2}{x_3}\right| 100\% $$
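Equations (7.23) through (7.27b) amount to a single update that can be sketched in Python. The function name `muller_step` and the test polynomial are assumptions made for illustration. The sketch picks the sign in the denominator that gives the larger magnitude, which is the choice the text goes on to justify, and it uses complex arithmetic so that a negative discriminant is not an error.

```python
import cmath


def muller_step(f, x0, x1, x2):
    """One Muller update: fit a parabola through three points, return x3."""
    h0 = x1 - x0
    h1 = x2 - x1
    d0 = (f(x1) - f(x0)) / h0            # delta_0 of Eq. (7.23)
    d1 = (f(x2) - f(x1)) / h1            # delta_1 of Eq. (7.23)
    a = (d1 - d0) / (h1 + h0)            # Eq. (7.24)
    b = a * h1 + d1                      # Eq. (7.25)
    c = f(x2)                            # Eq. (7.26)
    rad = cmath.sqrt(b * b - 4.0 * a * c)
    # choose the sign that makes the denominator largest in magnitude
    den = b + rad if abs(b + rad) > abs(b - rad) else b - rad
    return x2 + (-2.0 * c) / den         # Eq. (7.27b)


if __name__ == "__main__":
    def f(x):
        # assumed test function with roots at -3, -1, and 4
        return x**3 - 13.0 * x - 12.0

    x3 = muller_step(f, 4.5, 5.5, 5.0)
    ea = abs((x3 - 5.0) / x3) * 100.0    # approximate percent relative error
    print(x3, ea)
```

The returned value is complex; for real roots its imaginary part is zero, and `x3.real` can be used. A zero denominator (for example, three collinear points on a horizontal line) is not guarded against in this sketch.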
  • 202. Now, a problem with Eq. (7.27a) is that it yields two roots, corresponding to the $\pm$ term in the denominator. In Müller's method, the sign is chosen to agree with the sign of $b$. This choice will result in the largest denominator, and hence, will give the root estimate that is closest to $x_2$.
Once $x_3$ is determined, the process is repeated. This brings up the issue of which point is discarded. Two general strategies are typically used:
1. If only real roots are being located, we choose the two original points that are nearest the new root estimate, $x_3$.
2. If both real and complex roots are being evaluated, a sequential approach is employed. That is, just like the secant method, $x_1$, $x_2$, and $x_3$ take the place of $x_0$, $x_1$, and $x_2$.
EXAMPLE 7.2 Müller's Method
Problem Statement. Use Müller's method with guesses of $x_0$, $x_1$, and $x_2$ = 4.5, 5.5, and 5, respectively, to determine a root of the equation
$$ f(x) = x^3 - 13x - 12 $$
Note that the roots of this equation are $-3$, $-1$, and 4.
Solution. First, we evaluate the function at the guesses
$$ f(4.5) = 20.625 \qquad f(5.5) = 82.875 \qquad f(5) = 48 $$
which can be used to calculate
$$ h_0 = 5.5 - 4.5 = 1 \qquad h_1 = 5 - 5.5 = -0.5 $$
$$ \delta_0 = \frac{82.875 - 20.625}{5.5 - 4.5} = 62.25 \qquad \delta_1 = \frac{48 - 82.875}{5 - 5.5} = 69.75 $$
These values in turn can be substituted into Eqs. (7.24) through (7.26) to compute
$$ a = \frac{69.75 - 62.25}{-0.5 + 1} = 15 \qquad b = 15(-0.5) + 69.75 = 62.25 \qquad c = 48 $$
The square root of the discriminant can be evaluated as
$$ \sqrt{62.25^2 - 4(15)48} = 31.54461 $$
Then, because $|62.25 + 31.54461| > |62.25 - 31.54461|$, a positive sign is employed in the denominator of Eq. (7.27b), and the new root estimate can be determined as
$$ x_3 = 5 + \frac{-2(48)}{62.25 + 31.54461} = 3.976487 $$
and the error estimate developed as
$$ \varepsilon_a = \left|\frac{-1.023513}{3.976487}\right| 100\% = 25.74\% $$
Because the error is large, new guesses are assigned; $x_0$ is replaced by $x_1$, $x_1$ is replaced by $x_2$, and $x_2$ is replaced by $x_3$. Therefore, for the new iteration,
$$ x_0 = 5.5 \qquad x_1 = 5 \qquad x_2 = 3.976487 $$
  • 203. and the calculation is repeated. The results, tabulated below, show that the method converges rapidly on the root, $x_r = 4$:
    i    x_r         ε_a (%)
    0    5
    1    3.976487    25.74
    2    4.00105     0.6139
    3    4           0.0262
    4    4           0.0000119
FIGURE 7.4 Pseudocode for Müller's method.
    SUB Muller(xr, h, eps, maxit)
      x2 = xr
      x1 = xr + h*xr
      x0 = xr - h*xr
      DO
        iter = iter + 1
        h0 = x1 - x0
        h1 = x2 - x1
        d0 = (f(x1) - f(x0)) / h0
        d1 = (f(x2) - f(x1)) / h1
        a = (d1 - d0) / (h1 + h0)
        b = a*h1 + d1
        c = f(x2)
        rad = SQRT(b*b - 4*a*c)
        IF |b+rad| > |b-rad| THEN
          den = b + rad
        ELSE
          den = b - rad
        END IF
        dxr = -2*c / den
        xr = x2 + dxr
        PRINT iter, xr
        IF (|dxr| < eps*xr OR iter >= maxit) EXIT
        x0 = x1
        x1 = x2
        x2 = xr
      END DO
    END Muller
Pseudocode to implement Müller's method for real roots is presented in Fig. 7.4. Notice that this routine is set up to take a single initial nonzero guess that is then perturbed to develop the other two guesses. Of course, the algorithm can also be
  • 204. programmed to accommodate three guesses. For languages like Fortran, the code will find complex roots if the proper variables are declared as complex.
7.5 BAIRSTOW'S METHOD
Bairstow's method is an iterative approach related loosely to both the Müller and Newton-Raphson methods. Before launching into a mathematical description of the technique, recall the factored form of the polynomial,
$$ f_5(x) = (x + 1)(x - 4)(x - 5)(x + 3)(x - 2) \tag{7.28} $$
If we divided by a factor that is not a root (for example, $x + 6$), the quotient would be a fourth-order polynomial. However, for this case, a remainder would result.
On the basis of the above, we can elaborate on an algorithm for determining a root of a polynomial: (1) guess a value for the root $x = t$, (2) divide the polynomial by the factor $x - t$, and (3) determine whether there is a remainder. If not, the guess was perfect and the root is equal to $t$. If there is a remainder, the guess can be systematically adjusted and the procedure repeated until the remainder disappears and a root is located. After this is accomplished, the entire procedure can be repeated for the quotient to locate another root.
Bairstow's method is generally based on this approach. Consequently, it hinges on the mathematical process of dividing a polynomial by a factor. Recall from our discussion of polynomial deflation (Sec. 7.2.2) that synthetic division involves dividing a polynomial by a factor $x - t$. For example, the general polynomial [Eq. (7.1)]
$$ f_n(x) = a_0 + a_1x + a_2x^2 + \cdots + a_nx^n \tag{7.29} $$
can be divided by the factor $x - t$ to yield a second polynomial that is one order lower,
$$ f_{n-1}(x) = b_1 + b_2x + b_3x^2 + \cdots + b_nx^{n-1} \tag{7.30} $$
with a remainder $R = b_0$, where the coefficients can be calculated by the recurrence relationship
$$ b_n = a_n $$
$$ b_i = a_i + b_{i+1}t \qquad \text{for } i = n - 1 \text{ to } 0 $$
Note that if $t$ were a root of the original polynomial, the remainder $b_0$ would equal zero.
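The recurrence for dividing by x - t translates almost line for line into Python. In this sketch the function name `synthetic_division` and the choice to return the quotient and remainder as a pair are assumptions; coefficients are in ascending order, so `a[i]` corresponds to a_i.

```python
def synthetic_division(a, t):
    """Divide the polynomial with ascending coefficients a by (x - t).

    Returns (quotient, remainder), where quotient = [b1, ..., bn] as in
    Eq. (7.30) and remainder = b0.
    """
    n = len(a) - 1
    b = [0.0] * (n + 1)
    b[n] = a[n]
    for i in range(n - 1, -1, -1):
        b[i] = a[i] + b[i + 1] * t
    return b[1:], b[0]


if __name__ == "__main__":
    # x**2 + 2x - 24 divided by x - 4 (the case worked in Example 7.1)
    print(synthetic_division([-24.0, 2.0, 1.0], 4.0))
    # Eq. (7.28) in expanded form divided by x + 6, which is not a factor
    f5 = [-120.0, -46.0, 79.0, -3.0, -7.0, 1.0]
    print(synthetic_division(f5, -6.0))
```

The first call should leave no remainder; the second leaves a nonzero remainder equal to the polynomial's value at x = -6, which is the signal used in step (3) of the algorithm outlined above.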
To permit the evaluation of complex roots, Bairstow's method divides the polynomial by a quadratic factor $x^2 - rx - s$. If this is done to Eq. (7.29), the result is a new polynomial
      $f_{n-2}(x) = b_2 + b_3 x + \cdots + b_{n-1} x^{n-3} + b_n x^{n-2}$
    with a remainder
      $R = b_1(x - r) + b_0$   (7.31)
    As with normal synthetic division, a simple recurrence relationship can be used to perform the division by the quadratic factor:
      $b_n = a_n$   (7.32a)
      $b_{n-1} = a_{n-1} + r b_n$   (7.32b)
      $b_i = a_i + r b_{i+1} + s b_{i+2}$   for $i = n - 2$ to 0   (7.32c)
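As an illustration of Eqs. (7.32a) to (7.32c), here is a short Python sketch. The helper name `divide_by_quadratic` and the ascending coefficient ordering are assumptions made for this example, and the polynomial is assumed to be at least second order. It divides the fifth-order polynomial used later in Example 7.3 by $x^2 + 0.5x - 0.5$ (that is, r = -0.5 and s = 0.5), which is an exact divisor.

```python
def divide_by_quadratic(a, r, s):
    """Apply Eqs. (7.32a)-(7.32c) for the divisor x**2 - r*x - s.

    a: coefficients in ascending order (a[i] multiplies x**i), degree >= 2.
    Returns the full list b; b[2:] is the quotient (ascending) and the
    remainder is R = b[1]*(x - r) + b[0].
    """
    n = len(a) - 1
    b = [0.0] * (n + 1)
    b[n] = a[n]                         # Eq. (7.32a)
    b[n - 1] = a[n - 1] + r * b[n]      # Eq. (7.32b)
    for i in range(n - 2, -1, -1):      # Eq. (7.32c)
        b[i] = a[i] + r * b[i + 1] + s * b[i + 2]
    return b


# x^5 - 3.5x^4 + 2.75x^3 + 2.125x^2 - 3.875x + 1.25, ascending order
a = [1.25, -3.875, 2.125, 2.75, -3.5, 1.0]
b = divide_by_quadratic(a, r=-0.5, s=0.5)
print("quotient:", b[2:])
print("b1, b0:", b[1], b[0])
```

Both b1 and b0 come out as zero here, and the quotient coefficients correspond to $x^3 - 4x^2 + 5.25x - 2.5$.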
  • 205. The quadratic factor is introduced to allow the determination of complex roots. This relates to the fact that, if the coefficients of the original polynomial are real, the complex roots occur in conjugate pairs. If $x^2 - rx - s$ is an exact divisor of the polynomial, complex roots can be determined by the quadratic formula. Thus, the method reduces to determining the values of r and s that make the quadratic factor an exact divisor. In other words, we seek the values that make the remainder term equal to zero.
    Inspection of Eq. (7.31) leads us to conclude that for the remainder to be zero, $b_0$ and $b_1$ must be zero. Because it is unlikely that our initial guesses at the values of r and s will lead to this result, we must determine a systematic way to modify our guesses so that $b_0$ and $b_1$ approach zero. To do this, Bairstow's method uses a strategy similar to the Newton-Raphson approach. Because both $b_0$ and $b_1$ are functions of both r and s, they can be expanded using a Taylor series, as in [recall Eq. (4.26)]
      $b_1(r + \Delta r, s + \Delta s) = b_1 + \frac{\partial b_1}{\partial r}\Delta r + \frac{\partial b_1}{\partial s}\Delta s$
      $b_0(r + \Delta r, s + \Delta s) = b_0 + \frac{\partial b_0}{\partial r}\Delta r + \frac{\partial b_0}{\partial s}\Delta s$   (7.33)
    where the values on the right-hand side are all evaluated at r and s. Notice that second- and higher-order terms have been neglected. This represents an implicit assumption that $\Delta r$ and $\Delta s$ are small enough that the higher-order terms are negligible. Another way of expressing this assumption is to say that the initial guesses are adequately close to the values of r and s at the roots.
    The changes, $\Delta r$ and $\Delta s$, needed to improve our guesses can be estimated by setting Eq. (7.33) equal to zero to give
      $\frac{\partial b_1}{\partial r}\Delta r + \frac{\partial b_1}{\partial s}\Delta s = -b_1$   (7.34)
      $\frac{\partial b_0}{\partial r}\Delta r + \frac{\partial b_0}{\partial s}\Delta s = -b_0$   (7.35)
    If the partial derivatives of the b's can be determined, these form a system of two equations that can be solved simultaneously for the two unknowns, $\Delta r$ and $\Delta s$.
Bairstow showed that the partial derivatives can be obtained by a synthetic division of the b's in a fashion similar to the way in which the b's themselves were derived:
      $c_n = b_n$   (7.36a)
      $c_{n-1} = b_{n-1} + r c_n$   (7.36b)
      $c_i = b_i + r c_{i+1} + s c_{i+2}$   for $i = n - 2$ to 1   (7.36c)
    where $\partial b_0/\partial r = c_1$, $\partial b_0/\partial s = \partial b_1/\partial r = c_2$, and $\partial b_1/\partial s = c_3$. Thus, the partial derivatives are obtained by synthetic division of the b's. Then the partial derivatives can be substituted into Eqs. (7.34) and (7.35) along with the b's to give
      $c_2 \Delta r + c_3 \Delta s = -b_1$
      $c_1 \Delta r + c_2 \Delta s = -b_0$
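To make the two recurrences and the 2 x 2 solve concrete, the following Python sketch performs a single correction step. The function name `bairstow_step`, the ascending coefficient ordering, and the use of Cramer's rule for the two equations are choices made here for illustration; the polynomial must be at least third order so that c3 exists. The sample call uses the fifth-order polynomial and the guesses r = s = -1 of Example 7.3.

```python
def bairstow_step(a, r, s):
    """One Bairstow correction: returns (dr, ds, b, c).

    a: coefficients in ascending order (a[i] multiplies x**i), degree >= 3.
    b follows Eqs. (7.32); c follows Eqs. (7.36).
    """
    n = len(a) - 1
    b = [0.0] * (n + 1)
    c = [0.0] * (n + 1)
    b[n] = a[n]
    b[n - 1] = a[n - 1] + r * b[n]
    c[n] = b[n]
    c[n - 1] = b[n - 1] + r * c[n]
    for i in range(n - 2, -1, -1):
        b[i] = a[i] + r * b[i + 1] + s * b[i + 2]
        c[i] = b[i] + r * c[i + 1] + s * c[i + 2]   # c[0] is computed but unused

    # Solve  c2*dr + c3*ds = -b1  and  c1*dr + c2*ds = -b0  by Cramer's rule.
    det = c[2] * c[2] - c[3] * c[1]
    if det == 0.0:
        raise ZeroDivisionError("singular system; perturb r and s and retry")
    dr = (-b[1] * c[2] + b[0] * c[3]) / det
    ds = (-b[0] * c[2] + b[1] * c[1]) / det
    return dr, ds, b, c


a = [1.25, -3.875, 2.125, 2.75, -3.5, 1.0]
dr, ds, b, c = bairstow_step(a, r=-1.0, s=-1.0)
print(round(dr, 4), round(ds, 4))
```

With these inputs the b's and c's match the values listed in Example 7.3 (for instance b0 = 11.375 and c1 = -16.375).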
  • 206. These equations can be solved for $\Delta r$ and $\Delta s$, which can in turn be employed to improve the initial guesses of r and s. At each step, an approximate error in r and s can be estimated, as in
      $|\varepsilon_{a,r}| = \left|\frac{\Delta r}{r}\right| 100\%$   (7.37)
    and
      $|\varepsilon_{a,s}| = \left|\frac{\Delta s}{s}\right| 100\%$   (7.38)
    When both of these error estimates fall below a prespecified stopping criterion $\varepsilon_s$, the values of the roots can be determined by
      $x = \frac{r \pm \sqrt{r^2 + 4s}}{2}$   (7.39)
    At this point, three possibilities exist:
    1. The quotient is a third-order polynomial or greater. For this case, Bairstow's method would be applied to the quotient to evaluate new values for r and s. The previous values of r and s can serve as the starting guesses for this application.
    2. The quotient is a quadratic. For this case, the remaining two roots could be evaluated directly with Eq. (7.39).
    3. The quotient is a first-order polynomial. For this case, the remaining single root can be evaluated simply as
      $x = -\frac{s}{r}$   (7.40)
    EXAMPLE 7.3 Bairstow's Method
    Problem Statement. Employ Bairstow's method to determine the roots of the polynomial
      $f_5(x) = x^5 - 3.5x^4 + 2.75x^3 + 2.125x^2 - 3.875x + 1.25$
    Use initial guesses of $r = s = -1$ and iterate to a level of $\varepsilon_s = 1\%$.
    Solution. Equations (7.32) and (7.36) can be applied to compute
      $b_5 = 1$, $b_4 = -4.5$, $b_3 = 6.25$, $b_2 = 0.375$, $b_1 = -10.5$, $b_0 = 11.375$
      $c_5 = 1$, $c_4 = -5.5$, $c_3 = 10.75$, $c_2 = -4.875$, $c_1 = -16.375$
    Thus, the simultaneous equations to solve for $\Delta r$ and $\Delta s$ are
      $-4.875\,\Delta r + 10.75\,\Delta s = 10.5$
      $-16.375\,\Delta r - 4.875\,\Delta s = -11.375$
    which can be solved for $\Delta r = 0.3558$ and $\Delta s = 1.1381$. Therefore, our original guesses can be corrected to
      $r = -1 + 0.3558 = -0.6442$
      $s = -1 + 1.1381 = 0.1381$
  • 207. and the approximate errors can be evaluated by Eqs. (7.37) and (7.38),
      $|\varepsilon_{a,r}| = \left|\frac{0.3558}{-0.6442}\right| 100\% = 55.23\%$
      $|\varepsilon_{a,s}| = \left|\frac{1.1381}{0.1381}\right| 100\% = 824.1\%$
    Next, the computation is repeated using the revised values for r and s. Applying Eqs. (7.32) and (7.36) yields
      $b_5 = 1$, $b_4 = -4.1442$, $b_3 = 5.5578$, $b_2 = -2.0276$, $b_1 = -1.8013$, $b_0 = 2.1304$
      $c_5 = 1$, $c_4 = -4.7884$, $c_3 = 8.7806$, $c_2 = -8.3454$, $c_1 = 4.7874$
    Therefore, we must solve
      $-8.3454\,\Delta r + 8.7806\,\Delta s = 1.8013$
      $4.7874\,\Delta r - 8.3454\,\Delta s = -2.1304$
    for $\Delta r = 0.1331$ and $\Delta s = 0.3316$, which can be used to correct the root estimates as
      $r = -0.6442 + 0.1331 = -0.5111$   $|\varepsilon_{a,r}| = 26.0\%$
      $s = 0.1381 + 0.3316 = 0.4697$   $|\varepsilon_{a,s}| = 70.6\%$
    The computation can be continued, with the result that after four iterations the method converges on values of $r = -0.5$ ($|\varepsilon_{a,r}| = 0.063\%$) and $s = 0.5$ ($|\varepsilon_{a,s}| = 0.040\%$). Equation (7.39) can then be employed to evaluate the roots as
      $x = \frac{-0.5 \pm \sqrt{(-0.5)^2 + 4(0.5)}}{2} = 0.5, -1.0$
    At this point, the quotient is the cubic equation
      $f(x) = x^3 - 4x^2 + 5.25x - 2.5$
    Bairstow's method can be applied to this polynomial using the results of the previous step, $r = -0.5$ and $s = 0.5$, as starting guesses. Five iterations yield estimates of $r = 2$ and $s = -1.249$, which can be used to compute
      $x = \frac{2 \pm \sqrt{2^2 + 4(-1.249)}}{2} = 1 \pm 0.499i$
    At this point, the quotient is a first-order polynomial that can be directly evaluated by Eq. (7.40) to determine the fifth root: 2.
    Note that the heart of Bairstow's method is the evaluation of the b's and c's via Eqs. (7.32) and (7.36). One of the primary strengths of the method is the concise way in which these recurrence relationships can be programmed.
    Figure 7.5 lists pseudocode to implement Bairstow's method. The heart of the algorithm consists of the loop to evaluate the b's and c's. Also notice that the code to solve the simultaneous equations checks to prevent division by zero.
If this is the case, the values of r and s are perturbed slightly and the procedure is begun again. In addition, the algorithm places a user-defined upper limit on the number of iterations (MAXIT) and should be designed to avoid division by zero while calculating the error estimates. Finally, the algorithm requires initial guesses for r and s (rr and ss in the code). If no prior knowledge of the roots exists, they can be set to zero in the calling program.
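For readers who prefer a runnable version, the following is a compact Python sketch of the overall procedure: iterate on r and s, extract two roots with Eq. (7.39), deflate, and repeat. It is an illustration under stated assumptions rather than a transcription of Fig. 7.5. The names `bairstow` and `quad_roots` are made up here, coefficients are stored in ascending order, the stopping tolerance `es` is a fraction rather than a percentage, and the test `abs(dr) <= es * max(abs(r), 1.0)` replaces Eqs. (7.37) and (7.38) so that no division by r or s is needed.

```python
import cmath


def quad_roots(r, s):
    """Both roots of x**2 - r*x - s, Eq. (7.39), returned as complex numbers."""
    root = cmath.sqrt(r * r + 4.0 * s)
    return (r + root) / 2.0, (r - root) / 2.0


def bairstow(a, r=0.0, s=0.0, es=1e-10, maxit=100):
    """Return all roots of the polynomial with ascending coefficients a."""
    a = [float(v) for v in a]
    n = len(a) - 1
    roots = []
    while n >= 3:
        for _ in range(maxit):
            b = [0.0] * (n + 1)
            c = [0.0] * (n + 1)
            b[n] = a[n]
            b[n - 1] = a[n - 1] + r * b[n]
            c[n] = b[n]
            c[n - 1] = b[n - 1] + r * c[n]
            for i in range(n - 2, -1, -1):
                b[i] = a[i] + r * b[i + 1] + s * b[i + 2]
                c[i] = b[i] + r * c[i + 1] + s * c[i + 2]
            det = c[2] * c[2] - c[3] * c[1]
            if det == 0.0:
                # Singular system: perturb the guesses and try again.
                r += 1.0
                s += 1.0
                continue
            dr = (-b[1] * c[2] + b[0] * c[3]) / det
            ds = (-b[0] * c[2] + b[1] * c[1]) / det
            r += dr
            s += ds
            if abs(dr) <= es * max(abs(r), 1.0) and abs(ds) <= es * max(abs(s), 1.0):
                break
        else:
            raise RuntimeError("Bairstow iteration did not converge")
        roots.extend(quad_roots(r, s))
        a = b[2:]          # deflate: the quotient becomes the new polynomial
        n -= 2
    if n == 2:
        roots.extend(quad_roots(-a[1] / a[2], -a[0] / a[2]))
    elif n == 1:
        roots.append(complex(-a[0] / a[1]))
    return roots


# Polynomial and starting guesses of Example 7.3
for root in bairstow([1.25, -3.875, 2.125, 2.75, -3.5, 1.0], r=-1.0, s=-1.0):
    print(root)
```

As in the text, the converged r and s from one quadratic factor are reused as starting guesses for the deflated polynomial. Convergence still depends on the quality of those guesses, so the `RuntimeError` branch should be expected for difficult cases.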
  • 208. FIGURE 7.5 (a) Algorithm for implementing Bairstow's method, along with (b) an algorithm to determine the roots of a quadratic.
    (a) Bairstow Algorithm
      SUB Bairstow(a, nn, es, rr, ss, maxit, re, im, ier)
      DIMENSION b(nn), c(nn)
      r = rr; s = ss; n = nn
      ier = 0; ea1 = 1; ea2 = 1; iter = 0
      DO
        IF n < 3 OR iter >= maxit EXIT
        iter = 0
        DO
          iter = iter + 1
          b(n) = a(n)
          b(n - 1) = a(n - 1) + r * b(n)
          c(n) = b(n)
          c(n - 1) = b(n - 1) + r * c(n)
          DO i = n - 2, 0, -1
            b(i) = a(i) + r * b(i + 1) + s * b(i + 2)
            c(i) = b(i) + r * c(i + 1) + s * c(i + 2)
          END DO
          det = c(2) * c(2) - c(3) * c(1)
          IF det ≠ 0 THEN
            dr = (-b(1) * c(2) + b(0) * c(3)) / det
            ds = (-b(0) * c(2) + b(1) * c(1)) / det
            r = r + dr
            s = s + ds
            IF r ≠ 0 THEN ea1 = ABS(dr/r) * 100
            IF s ≠ 0 THEN ea2 = ABS(ds/s) * 100
          ELSE
            r = r + 1
            s = s + 1
            iter = 0
          END IF
          IF ea1 <= es AND ea2 <= es OR iter >= maxit EXIT
        END DO
        CALL Quadroot(r, s, r1, i1, r2, i2)
        re(n) = r1
        im(n) = i1
        re(n - 1) = r2
        im(n - 1) = i2
        n = n - 2
        DO i = 0, n
          a(i) = b(i + 2)
        END DO
      END DO
      IF iter < maxit THEN
        IF n = 2 THEN
          r = -a(1)/a(2)
          s = -a(0)/a(2)
          CALL Quadroot(r, s, r1, i1, r2, i2)
          re(n) = r1
          im(n) = i1
          re(n - 1) = r2
          im(n - 1) = i2
        ELSE
          re(n) = -a(0)/a(1)
          im(n) = 0
        END IF
      ELSE
        ier = 1
      END IF
      END Bairstow
    (b) Roots of Quadratic Algorithm
      SUB Quadroot(r, s, r1, i1, r2, i2)
      disc = r^2 + 4 * s
      IF disc > 0 THEN
        r1 = (r + SQRT(disc))/2
        r2 = (r - SQRT(disc))/2
        i1 = 0
        i2 = 0
      ELSE
        r1 = r/2
        r2 = r1
        i1 = SQRT(ABS(disc))/2
        i2 = -i1
      END IF
      END Quadroot
  • 209. 7.6 OTHER METHODS
    Other methods are available to locate the roots of polynomials. The Jenkins-Traub method is commonly used in software libraries. It is fairly complicated, and a good starting point for understanding it is found in Ralston and Rabinowitz (1978).
    Laguerre's method, which approximates both real and complex roots and has cubic convergence, is among the best approaches. A complete discussion can be found in Householder (1970). In addition, Press et al. (2007) present a nice algorithm to implement the method.
    7.7 ROOT LOCATION WITH SOFTWARE PACKAGES
    Software packages have great capabilities for locating roots. In this section, we will give you a taste of some of the more useful ones.
    7.7.1 Excel
    A spreadsheet like Excel can be used to locate a root by trial and error. For example, if you want to find a root of
      $f(x) = x - \cos x$
    first, you can enter a value for x in a cell. Then set up another cell for f(x) that would obtain its value for x from the first cell. You can then vary the x cell until the f(x) cell approaches zero. This process can be further enhanced by using Excel's plotting capabilities to obtain a good initial guess (Fig. 7.6).
    Although Excel does facilitate a trial-and-error approach, it also has two standard tools that can be employed for root location: Goal Seek and Solver. Both these tools can be employed to systematically adjust the initial guesses. Goal Seek is expressly used to drive an equation to a value (in our case, zero) by varying a single parameter.
    FIGURE 7.6 A spreadsheet set up to determine the root of $f(x) = x - \cos x$ by trial and error. The plot is used to obtain a good initial guess.
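The idea of adjusting one value until a dependent value reaches zero can be mimicked outside a spreadsheet. The sketch below is only an analogy: the text does not say which algorithm Goal Seek uses, so plain bisection (Chap. 5) is swapped in, and the bracket [0, 1] is an assumed starting interval of the kind a plot would suggest.

```python
import math


def f(x):
    """The function from the spreadsheet example: f(x) = x - cos(x)."""
    return x - math.cos(x)


def bisect(func, lo, hi, tol=1e-10, maxit=200):
    """Bisection on [lo, hi]; requires a sign change across the bracket."""
    f_lo = func(lo)
    if f_lo * func(hi) > 0:
        raise ValueError("func(lo) and func(hi) must have opposite signs")
    for _ in range(maxit):
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if f_mid == 0.0 or 0.5 * (hi - lo) < tol:
            return mid
        if f_lo * f_mid < 0:
            hi = mid                 # root lies in the lower half
        else:
            lo, f_lo = mid, f_mid    # root lies in the upper half
    return 0.5 * (lo + hi)


root = bisect(f, 0.0, 1.0)
print(root, f(root))
```

The role of the "x cell" is played by `mid` and the role of the "f(x) cell" by `f_mid`; each pass halves the interval in which the sign change occurs.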
  • 210. EXAMPLE 7.4 Using Excel's Goal Seek Tool to Locate a Single Root
    Problem Statement. Employ Goal Seek to determine the root of the transcendental function
      $f(x) = x - \cos x$
    Solution. As in Fig. 7.6, the key to solving a single equation with Excel is creating a cell to hold the value of the function in question and then making the value dependent on another cell. Once this is done, the selection Goal Seek is chosen from the What-If Analysis button on your Data ribbon. At this point a dialogue box will be displayed, asking you to set a cell to a value by changing another cell. For the example, suppose that as in Fig. 7.6 your guess is entered in cell A11 and your function result in cell B11. The Goal Seek dialogue box would be filled out as
      [screenshot: Goal Seek dialogue box]
    When the OK button is selected, a message box displays the results,
      [screenshot: Goal Seek status box]
    The cells on the spreadsheet would also be modified to the new values (as shown in Fig. 7.6).
    The Solver tool is more sophisticated than Goal Seek in that (1) it can vary several cells simultaneously and (2) along with driving a target cell to a value, it can minimize and maximize its value. The next example illustrates how it can be used to solve a system of nonlinear equations.
    EXAMPLE 7.5 Using Excel's Solver for a Nonlinear System
    Problem Statement. Recall that in Sec. 6.6 we obtained the solution of the following set of simultaneous equations,
      $u(x, y) = x^2 + xy - 10 = 0$
      $v(x, y) = y + 3xy^2 - 57 = 0$
  • 211. Note that a correct pair of roots is $x = 2$ and $y = 3$. Use Solver to determine the roots using initial guesses of $x = 1$ and $y = 3.5$.
    Solution. As shown below, two cells (B1 and B2) can be created to hold the guesses for x and y. The function values themselves, $u(x, y)$ and $v(x, y)$, can then be entered into two other cells (B3 and B4). As can be seen, the initial guesses result in function values that are far from zero.
      [screenshot: spreadsheet with guesses, function values, and sum of squares]
    Next, another cell can be created that contains a single value reflecting how close both functions are to zero. One way to do this is to sum the squares of the function values. This is done and the result entered in cell B6. If both functions are at zero, this function should also be at zero. Further, using the squared functions avoids the possibility that both functions could have the same nonzero value, but with opposite signs. For this case, the target cell (B6) would be zero, but the roots would be incorrect.
    Once the spreadsheet is created, the selection Solver is chosen from the Data ribbon.¹ At this point a dialogue box will be displayed, querying you for pertinent information. The pertinent cells of the Solver dialogue box would be filled out as
      [screenshot: Solver dialogue box]
    ¹ Note that you may have to install Solver by choosing Office, Excel Options, Add-Ins. Select Excel Add-Ins from the Manage drop-down box at the bottom of the Excel options menu and click Go. Then, check the Solver box. The Solver then should be installed and a button to access it should appear on your Data ribbon.
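The sum-of-squares target cell can be reproduced in a few lines of Python. This sketch is not how Solver works internally, which the text does not describe; it simply evaluates the same target quantity and, as a stand-in for Solver, drives both functions to zero with the two-equation Newton-Raphson approach of Sec. 6.6. The function names `target` and `newton_system` are illustrative.

```python
def u(x, y):
    return x**2 + x * y - 10.0


def v(x, y):
    return y + 3.0 * x * y**2 - 57.0


def target(x, y):
    """Counterpart of cell B6: sum of the squared function values."""
    return u(x, y) ** 2 + v(x, y) ** 2


def newton_system(x, y, tol=1e-12, maxit=50):
    """Two-equation Newton-Raphson, solved with Cramer's rule at each step."""
    for _ in range(maxit):
        fu, fv = u(x, y), v(x, y)
        du_dx, du_dy = 2.0 * x + y, x
        dv_dx, dv_dy = 3.0 * y**2, 1.0 + 6.0 * x * y
        det = du_dx * dv_dy - du_dy * dv_dx
        dx = (-fu * dv_dy + fv * du_dy) / det
        dy = (-fv * du_dx + fu * dv_dx) / det
        x += dx
        y += dy
        if abs(dx) < tol and abs(dy) < tol:
            break
    return x, y


print("target at the initial guesses:", target(1.0, 3.5))
x, y = newton_system(1.0, 3.5)
print("solution:", x, y, "target:", target(x, y))
```

Because the target is a sum of squares, it can only reach zero when u and v are both zero, which is the point made in the text about avoiding cancellation between values of opposite sign.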
  • 212. When the OK button is selected, a dialogue box will open with a report on the success of the operation. For the present case, the Solver obtains the correct solution:
      [screenshot: spreadsheet showing the Solver result]
    It should be noted that the Solver can fail. Its success depends on (1) the condition of the system of equations and/or (2) the quality of the initial guesses. Thus, the successful outcome of the previous example is not guaranteed. Despite this, we have found Solver useful enough to make it a feasible option for quickly obtaining roots in a wide range of engineering applications.
    7.7.2 MATLAB
    As summarized in Table 7.1, MATLAB software is capable of locating roots of single algebraic and transcendental equations. It is superb at manipulating and locating the roots of polynomials.
    The fzero function is designed to locate one root of a single function. A simplified representation of its syntax is
      fzero(f,x0,options)
    where f is the function you are analyzing, x0 is the initial guess, and options are the optimization parameters (these are changed using the function optimset). If options are omitted, default values are employed. Note that one or two guesses can be employed. If two guesses are employed, they are assumed to bracket a root. The following example illustrates how fzero can be used.
    TABLE 7.1 Common functions in MATLAB related to root location and polynomial manipulation.
      Function    Description
      fzero       Root of single function.
      roots       Find polynomial roots.
      poly        Construct polynomial with specified roots.
      polyval     Evaluate polynomial.
      polyvalm    Evaluate polynomial with matrix argument.
      residue     Partial-fraction expansion (residues).
      polyder     Differentiate polynomial.
      conv        Multiply polynomials.
      deconv      Divide polynomials.
  • 213. EXAMPLE 7.6 Using MATLAB for Root Location
    Problem Statement. Use the MATLAB function fzero to find the roots of
      $f(x) = x^{10} - 1$
    within the interval $x_l = 0$ and $x_u = 4$. Obviously two roots occur at -1 and 1. Recall that in Example 5.6, we used the false-position method with initial guesses of 0 and 1.3 to determine the positive root.
    Solution. Using the same initial conditions as in Example 5.6, we can use MATLAB to determine the positive root as in
      x0=[0 1.3];
      x=fzero(@(x) x^10-1,x0)
      x =
           1
    In a similar fashion, we can use initial guesses of -1.3 and 0 to determine the negative root,
      x0=[-1.3 0];
      x=fzero(@(x) x^10-1,x0)
      x =
          -1
    We can also employ a single guess. An interesting case would be to use an initial guess of 0,
      x0=0;
      x=fzero(@(x) x^10-1,x0)
      x =
          -1
    Thus, for this guess, the underlying algorithm happens to home in on the negative root.
    The use of optimset can be illustrated by using it to display the actual iterations as the solution progresses:
      x0=0;
      option=optimset('DISP','ITER');
      x=fzero(@(x) x^10-1,x0,option)
      Func-count    x             f(x)           Procedure
       1            0             -1             initial
       2            -0.0282843    -1             search
       3            0.0282843     -1             search
       4            -0.04         -1             search
       ...
      21            0.64          -0.988471      search
      22            -0.905097     -0.631065      search
  • 214.
      23            0.905097      -0.631065      search
      24            -1.28         10.8059        search
      Looking for a zero in the interval [-1.28, 0.9051]
      25            0.784528      -0.911674      interpolation
      26            -0.247736     -0.999999      bisection
      27            -0.763868     -0.932363      bisection
      28            -1.02193      0.242305       bisection
      29            -0.968701     -0.27239       interpolation
      30            -0.996873     -0.0308299     interpolation
      31            -0.999702     -0.00297526    interpolation
      32            -1            5.53132e-006   interpolation
      33            -1            -7.41965e-009  interpolation
      34            -1            -1.88738e-014  interpolation
      35            -1            0              interpolation
      Zero found in the interval: [-1.28, 0.9051].
      x =
          -1
    These results illustrate the strategy used by fzero when it is provided with a single guess. First, it searches in the vicinity of the guess until it detects a sign change. Then it uses a combination of bisection and interpolation to home in on the root. The interpolation involves both the secant method and inverse quadratic interpolation (recall Sec. 7.4). It should be noted that the fzero algorithm has more to it than this basic description might imply. You can consult Press et al. (2007) for additional details.
    EXAMPLE 7.7 Using MATLAB to Manipulate and Determine the Roots of Polynomials
    Problem Statement. Explore how MATLAB can be employed to manipulate and determine the roots of polynomials. Use the following equation from Example 7.3,
      $f_5(x) = x^5 - 3.5x^4 + 2.75x^3 + 2.125x^2 - 3.875x + 1.25$   (E7.7.1)
    which has three real roots: 0.5, -1.0, and 2, and one pair of complex roots: $1 \pm 0.5i$.
    Solution. Polynomials are entered into MATLAB by storing the coefficients as a vector. For example, at the MATLAB prompt (>>) typing and entering the following line stores the coefficients in the vector a,
      a=[1 -3.5 2.75 2.125 -3.875 1.25];
    We can then proceed to manipulate the polynomial. For example, we can evaluate it at $x = 1$ by typing
      polyval(a,1)
    with the result $1(1)^5 - 3.5(1)^4 + 2.75(1)^3 + 2.125(1)^2 - 3.875(1) + 1.25 = -0.25$,
      ans =
         -0.2500
  • 215. We can evaluate the derivative $f'(x) = 5x^4 - 14x^3 + 8.25x^2 + 4.25x - 3.875$ by
      polyder(a)
      ans =
          5.0000  -14.0000    8.2500    4.2500   -3.8750
    Next, let us create a quadratic polynomial that has roots corresponding to two of the original roots of Eq. (E7.7.1): 0.5 and -1. This quadratic is $(x - 0.5)(x + 1) = x^2 + 0.5x - 0.5$ and can be entered into MATLAB as the vector b,
      b=[1 0.5 -0.5];
    We can divide this polynomial into the original polynomial by
      [d,e]=deconv(a,b)
    with the result being a quotient (a third-order polynomial d) and a remainder (e),
      d =
          1.0000   -4.0000    5.2500   -2.5000
      e =
          0     0     0     0     0     0
    Because the polynomial is a perfect divisor, the remainder polynomial has zero coefficients. Now, the roots of the quotient polynomial can be determined as
      roots(d)
    with the expected result that the remaining roots of the original polynomial (E7.7.1) are found,
      ans =
          2.0000
          1.0000 + 0.5000i
          1.0000 - 0.5000i
    We can now multiply d by b to come up with the original polynomial,
      conv(d,b)
      ans =
          1.0000   -3.5000    2.7500    2.1250   -3.8750    1.2500
    Finally, we can determine all the roots of the original polynomial by
      r=roots(a)
      r =
         -1.0000
          2.0000
          1.0000 + 0.5000i
          1.0000 - 0.5000i
          0.5000
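For readers without MATLAB, the evaluation, differentiation, division, and multiplication steps of Example 7.7 can be sketched in plain Python. The functions below are small hand-written helpers that borrow the MATLAB names for readability; they are not MATLAB's implementations and they follow MATLAB's convention of listing coefficients from the highest power down.

```python
def polyval(p, x):
    """Evaluate a polynomial (highest power first) by Horner's rule."""
    result = 0.0
    for coef in p:
        result = result * x + coef
    return result


def polyder(p):
    """Coefficients of the derivative, highest power first."""
    n = len(p) - 1
    return [coef * (n - k) for k, coef in enumerate(p[:-1])]


def deconv(num, den):
    """Polynomial long division: returns (quotient, remainder)."""
    rem = list(num)
    quotient = []
    for k in range(len(num) - len(den) + 1):
        factor = rem[k] / den[0]
        quotient.append(factor)
        for j, d in enumerate(den):
            rem[k + j] -= factor * d
    return quotient, rem


def conv(p, q):
    """Multiply two polynomials."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


a = [1, -3.5, 2.75, 2.125, -3.875, 1.25]
b = [1, 0.5, -0.5]

print(polyval(a, 1))      # value of the polynomial at x = 1
print(polyder(a))         # derivative coefficients
d, e = deconv(a, b)
print(d, e)               # quotient and remainder
print(conv(d, b))         # multiplying back recovers a
```

Finding all roots, as `roots` does in MATLAB, needs an eigenvalue or iterative method and is deliberately left out of this sketch.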
  • 216. 7.7.3 Mathcad
    Mathcad has a numeric mode function called root that can be used to solve an equation of a single variable. The method requires that you supply a function f(x) and either an initial guess or a bracket. When a single guess value is used, root uses the Secant and Müller methods. In the case where two guesses that bracket a root are supplied, it uses a combination of the Ridder method (a variation of false position) and Brent's method. It iterates until the magnitude of f(x) at the proposed root is less than the predefined value of TOL. The Mathcad implementation has advantages and disadvantages similar to those of conventional root location methods, such as issues concerning the quality of the initial guess and the rate of convergence.
    Mathcad can find all the real or complex roots of polynomials with polyroots. This numeric or symbolic mode function is based on the Laguerre method. This function does not require initial guesses, and all the roots are returned at the same time.
    Mathcad contains a numeric mode function called Find that can be used to solve up to 50 simultaneous nonlinear algebraic equations. The Find function chooses an appropriate method from a group of available methods, depending on whether the problem is linear or nonlinear, and other attributes. Acceptable values for the solution may be unconstrained or constrained to fall within specified limits. If Find fails to locate a solution that satisfies the equations and constraints, it returns the error message "did not find solution." However, Mathcad also contains a similar function called Minerr. This function gives solution results that minimize the errors in the constraints even when exact solutions cannot be found. Thus, the problem of solving for the roots of nonlinear equations is closely related to both optimization and nonlinear least squares. These areas and Minerr are covered in detail in Parts Four and Five.
FIGURE 7.7 Mathcad screen to find the root of a single equation.
    Figure 7.7 shows a typical Mathcad worksheet. The menus at the top provide quick access to common arithmetic operators and functions, various two- and three-dimensional
  • 217. plot types, and the environment to create subprograms. Equations, text, data, or graphs can be placed anywhere on the screen. You can use a variety of fonts, colors, and styles to construct worksheets with almost any design and format that pleases you. Consult the summary of the Mathcad User's manual in Appendix C or the full manual available from MathSoft. Note that in all our Mathcad examples, we have tried to fit the entire Mathcad session onto a single screen. You should realize that the graph would have to be placed below the commands to work properly.
    Let's start with an example that solves for the root of $f(x) = x - \cos x$. The first step is to enter the function. This is done by typing f(x): which is automatically converted to f(x):= by Mathcad. The := is called the definition symbol. Next an initial guess is input in a similar manner using the definition symbol. Now, soln is defined as root(f(x), x), which invokes the secant method with a starting value of 1.0. Iteration is continued until f(x) evaluated at the proposed root is less than TOL. The value of TOL is set from the Math/Options pull down menu. Finally the value of soln is displayed using a normal equal sign (=). The number of significant figures is set from the Format/Number pull down menu. The text labels and equation definitions can be placed anywhere on the screen in a number of different fonts, styles, sizes, and colors. The graph can be placed anywhere on the worksheet by clicking to the desired location. This places a red cross hair at that location. Then use the Insert/Graph/X-Y Plot pull down menu to place an empty plot on the worksheet with placeholders for the expressions to be graphed and for the ranges of the x and y axes. Simply type f(z) in the placeholder on the y axis and -10 and 10 for the z-axis range. Mathcad does all the rest to produce the graph shown in Fig. 7.7.
Once the graph has been created you can use the Format/Graph/X-Y Plot pull down menu to vary the type of graph; change the color, type, and weight of the trace of the function; and add titles, labels, and other features.
    Figure 7.8 shows how Mathcad can be used to find the roots of a polynomial using the polyroots function. First, p(x) and v are input using the := definition symbol. Note that v is a vector that contains the coefficients of the polynomial starting with the zero-order term and ending in this case with the third-order term. Next, r is defined (using :=) as polyroots(v), which invokes the Laguerre method. The roots contained in r are displayed as $r^T$ using a normal equal sign (=). Next, a plot is constructed in a manner similar to the above, except that now two range variables, x and j, are used to define the range of the x axis and the location of the roots. The range variable for x is constructed by typing x and then ":" (which appears as :=) and then -4, and then "," and then -3.99, and then ";" (which is transformed into .. by Mathcad), and finally 4. This creates a vector of values of x ranging from -4 to 4 with an increment of 0.01 for the x axis with corresponding values for p(x) on the y axis. The j range variable is used to create three values for r and p(r) that are plotted as individual small circles. Note that again, in our effort to fit the entire Mathcad session onto a single screen, we have placed the graph above the commands. You should realize that the graph would have to be below the commands to work properly.
    The last example shows the solution of a system of nonlinear equations using a Mathcad Solve Block (Fig. 7.9). The process begins with using the definition symbol to create initial guesses for x and y. The word Given then alerts Mathcad that what follows is a system of equations. Then come the equations and inequalities (not used here).
Note that for this application Mathcad requires the use of a symbolic equal sign typed as [Ctrl]= or < and > to separate the left and right sides of an equation. Now, the variable vec is defined as Find(x,y) and the value of vec is shown using an equal sign.
  • 218. FIGURE 7.8 Mathcad screen to solve for roots of polynomial. FIGURE 7.9 Mathcad screen to solve a system of nonlinear equations.
  • 219. PROBLEMS
    7.1 Divide a polynomial $f(x) = x^4 - 7.5x^3 + 14.5x^2 + 3x - 20$ by the monomial factor $x - 2$. Is $x = 2$ a root?
    7.2 Divide a polynomial $f(x) = x^5 - 5x^4 + x^3 - 6x^2 - 7x + 10$ by the monomial factor $x - 2$.
    7.3 Use Müller's method to determine the positive real root of
      (a) $f(x) = x^3 + x^2 - 4x - 4$
      (b) $f(x) = x^3 - 0.5x^2 + 4x - 2$
    7.4 Use Müller's method or MATLAB to determine the real and complex roots of
      (a) $f(x) = x^3 - x^2 + 2x - 2$
      (b) $f(x) = 2x^4 + 6x^2 + 8$
      (c) $f(x) = x^4 - 2x^3 + 6x^2 - 2x + 5$
    7.5 Use Bairstow's method to determine the roots of
      (a) $f(x) = -2 + 6.2x - 4x^2 + 0.7x^3$
      (b) $f(x) = 9.34 - 21.97x + 16.3x^2 - 3.704x^3$
      (c) $f(x) = x^4 - 2x^3 + 6x^2 - 2x + 5$
    7.6 Develop a program to implement Müller's method. Test it by duplicating Example 7.2.
    7.7 Use the program developed in Prob. 7.6 to determine the real roots of Prob. 7.4a. Construct a graph (by hand or with a software package) to develop suitable starting guesses.
    7.8 Develop a program to implement Bairstow's method. Test it by duplicating Example 7.3.
    7.9 Use the program developed in Prob. 7.8 to determine the roots of the equations in Prob. 7.5.
    7.10 Determine the real root of $x^{3.5} = 80$ with Excel, MATLAB or Mathcad.
    7.11 The velocity of a falling parachutist is given by
      $v = \frac{gm}{c}\left(1 - e^{-(c/m)t}\right)$
    where $g = 9.81$ m/s². For a parachutist with a drag coefficient $c = 15$ kg/s, compute the mass m so that the velocity is $v = 35$ m/s at $t = 8$ s. Use Excel, MATLAB or Mathcad to determine m.
    7.12 Determine the roots of the simultaneous nonlinear equations
      $y = -x^2 + x + 0.75$
      $y + 5xy = x^2$
    Employ initial guesses of $x = y = 1.2$ and use the Solver tool from Excel or a software package of your choice.
    7.13 Determine the roots of the simultaneous nonlinear equations
      $(x - 4)^2 + (y - 4)^4 = 5$
      $x^2 + y^2 = 16$
    Use a graphical approach to obtain your initial guesses. Determine refined estimates with the Solver tool from Excel or a software package of your choice.
7.14 Perform the identical MATLAB operations as those in Example 7.7 or use a software package of your choice to find all the roots of the polynomial
      $f(x) = (x + 2)(x + 5)(x - 6)(x - 4)(x - 8)$
    Note that the poly function can be used to convert the roots to a polynomial.
    7.15 Use MATLAB or Mathcad to determine the roots for the equations in Prob. 7.5.
    7.16 A two-dimensional circular cylinder is placed in a high-speed uniform flow. Vortices shed from the cylinder at a constant frequency, and pressure sensors on the rear surface of the cylinder detect this frequency by calculating how often the pressure oscillates. Given three data points, use Müller's method to find the time where the pressure was zero.
      Time        0.60    0.62    0.64
      Pressure    20      50      60
    7.17 When trying to find the acidity of a solution of magnesium hydroxide in hydrochloric acid, we obtain the following equation
      $A(x) = x^3 + 3.5x^2 - 40$
    where x is the hydronium ion concentration. Find the hydronium ion concentration for a saturated solution (acidity equals zero) using two different methods in MATLAB (for example, graphically and the roots function).
    7.18 Consider the following system with three unknowns a, u, and v:
      $u^2 - 2v^2 = a^2$
      $u + v = 2$
      $a^2 - 2a - u = 0$
    Solve for the real values of the unknowns using: (a) the Excel Solver and (b) a symbolic manipulator software package.
    7.19 In control systems analysis, transfer functions are developed that mathematically relate the dynamics of a system's input to its output. A transfer function for a robotic positioning system is given by
      $G(s) = \frac{C(s)}{N(s)} = \frac{s^3 + 9s^2 + 26s + 24}{s^4 + 15s^3 + 77s^2 + 153s + 90}$
    where $G(s)$ = system gain, $C(s)$ = system output, $N(s)$ = system input, and s = Laplace transform complex frequency. Use a
  • 220. numerical technique to find the roots of the numerator and denominator and factor these into the form
      $G(s) = \frac{(s + a_1)(s + a_2)(s + a_3)}{(s + b_1)(s + b_2)(s + b_3)(s + b_4)}$
    where $a_i$ and $b_i$ = the roots of the numerator and denominator, respectively.
    7.20 Develop an M-file function for bisection in a similar fashion to Fig. 5.10. Test the function by duplicating the computations from Examples 5.3 and 5.4.
    7.21 Develop an M-file function for the false-position method. The structure of your function should be similar to the bisection algorithm outlined in Fig. 5.10. Test the program by duplicating Example 5.5.
    7.22 Develop an M-file function for the Newton-Raphson method based on Fig. 6.4 and Sec. 6.2.3. Along with the initial guess, pass the function and its derivative as arguments. Test it by duplicating the computation from Example 6.3.
    7.23 Develop an M-file function for the secant method based on Fig. 6.4 and Sec. 6.3.2. Along with the two initial guesses, pass the function as an argument. Test it by duplicating the computation from Example 6.6.
    7.24 Develop an M-file function for the modified secant method based on Fig. 6.4 and Sec. 6.3.2. Along with the initial guess and the perturbation fraction, pass the function as an argument. Test it by duplicating the computation from Example 6.8.
  • 221. CHAPTER 8 Case Studies: Roots of Equations
    The purpose of this chapter is to use the numerical procedures discussed in Chaps. 5, 6, and 7 to solve actual engineering problems. Numerical techniques are important for practical applications because engineers frequently encounter problems that cannot be approached using analytical techniques. For example, simple mathematical models that can be solved analytically may not be applicable when real problems are involved. Thus, more complicated models must be employed. For these cases, it is appropriate to implement a numerical solution on a computer. In other situations, engineering design problems may require solutions for implicit variables in complicated equations.
    The following case studies are typical of those that are routinely encountered during upper-class courses and graduate studies. Furthermore, they are representative of problems you will address professionally. The problems are drawn from the four major disciplines of engineering: chemical, civil, electrical, and mechanical. These applications also serve to illustrate the trade-offs among the various numerical techniques.
    The first application, taken from chemical engineering, provides an excellent example of how root-location methods allow you to use realistic formulas in engineering practice. It also demonstrates how the efficiency of the Newton-Raphson technique is used to advantage when a large number of root-location computations is required.
    The following engineering design problems are taken from civil, electrical, and mechanical engineering. Section 8.2 uses bisection to determine changes in rainwater chemistry due to increases in atmospheric carbon dioxide. Section 8.3 shows how the roots of transcendental equations can be used in the design of an electrical circuit. Sections 8.2 and 8.3 also illustrate how graphical methods provide insight into the root-location process. Finally, Sec.
8.4 uses a variety of numerical methods to compute the friction factor for fluid flow in a pipe. 8.1 IDEAL AND NONIDEAL GAS LAWS (CHEMICAL/BIO ENGINEERING) Background. The ideal gas law is given by pV 5 nRT (8.1) where p is the absolute pressure, V is the volume, n is the number of moles, R is the universal gas constant, and T is the absolute temperature. Although this equation is
  • 222. 8.1 IDEAL AND NONIDEAL GAS LAWS 205 widely used by engineers and scientists, it is accurate over only a limited range of pres- sure and temperature. Furthermore, Eq. (8.1) is more appropriate for some gases than for others. An alternative equation of state for gases is given by ap 1 a y2 b(y 2 b) 5 RT (8.2) known as the van der Waals equation, where y 5 V/n is the molal volume and a and b are empirical constants that depend on the particular gas. A chemical engineering design project requires that you accurately estimate the molal volume (y) of both carbon dioxide and oxygen for a number of different temperature and pressure combinations so that appropriate containment vessels can be selected. It is also of interest to examine how well each gas conforms to the ideal gas law by comparing the molal volume as calculated by Eqs. (8.1) and (8.2). The following data are provided: R 5 0.082054 L atm/(mol K) a 5 3.592 b 5 0.04267 f carbon dioxide a 5 1.360 b 5 0.03183 f oxygen The design pressures of interest are 1, 10, and 100 atm for temperature combinations of 300, 500, and 700 K. Solution. Molal volumes for both gases are calculated using the ideal gas law, with n 5 1. For example, if p 5 1 atm and T 5 300 K, y 5 V n 5 RT p 5 0.082054 L atm mol K 300 K 1 atm 5 24.6162 L/mol These calculations are repeated for all temperature and pressure combinations and presented in Table 8.1. TABLE 8.1 Computations of molal volume. Molal Volume Molal Volume Molal Volume (van der Waals) (van der Waals) Temperature, Pressure, (Ideal Gas Law), Carbon Dioxide, Oxygen, K atm L/mol L/mol L/mol 300 1 24.6162 24.5126 24.5928 10 2.4616 2.3545 2.4384 100 0.2462 0.0795 0.2264 500 1 41.0270 40.9821 41.0259 10 4.1027 4.0578 4.1016 100 0.4103 0.3663 0.4116 700 1 57.4378 57.4179 57.4460 10 5.7438 5.7242 5.7521 100 0.5744 0.5575 0.5842
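The ideal-gas column of Table 8.1 can be reproduced with a few lines of code. The following is a minimal sketch in Python (the language and the function name `ideal_molal_volume` are assumptions made for illustration; the text itself does not prescribe them). It simply evaluates v = RT/p from Eq. (8.1) with n = 1.

```python
# Ideal gas law, Eq. (8.1), solved for the molal volume v = V/n = R*T/p.
R = 0.082054  # universal gas constant, L atm/(mol K), as given in the text


def ideal_molal_volume(T, p):
    """Molal volume in L/mol for temperature T (K) and pressure p (atm)."""
    return R * T / p


# Tabulate the ideal-gas column for the design temperatures and pressures.
for T in (300, 500, 700):
    for p in (1, 10, 100):
        print(f"T = {T} K, p = {p:3d} atm, v = {ideal_molal_volume(T, p):.4f} L/mol")
```

Because the ideal gas law is explicit in v, no root finding is needed here; the van der Waals columns are the ones that require a numerical method.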
  • 223. The computation of molal volume from the van der Waals equation can be accomplished using any of the numerical methods for finding roots of equations discussed in Chaps. 5, 6, and 7, with
$$f(v) = \left(p + \frac{a}{v^2}\right)(v - b) - RT \tag{8.3}$$
In this case, the derivative of $f(v)$ is easy to determine and the Newton-Raphson method is convenient and efficient to implement. The derivative of $f(v)$ with respect to $v$ is given by
$$f'(v) = p - \frac{a}{v^2} + \frac{2ab}{v^3} \tag{8.4}$$
The Newton-Raphson method is described by Eq. (6.6):
$$v_{i+1} = v_i - \frac{f(v_i)}{f'(v_i)}$$
which can be used to estimate the root. For example, using the initial guess of 24.6162, the molal volume of carbon dioxide at 300 K and 1 atm is computed as 24.5126 L/mol. This result was obtained after just two iterations and has an $\varepsilon_a$ of less than 0.001 percent.

Similar computations for all combinations of pressure and temperature for both gases are presented in Table 8.1. The results for the ideal gas law differ from those for the van der Waals equation for both gases, by an amount that depends on the specific values of $p$ and $T$. The discrepancy is largest at low temperature and high pressure (for example, 0.2462 versus 0.0795 L/mol for carbon dioxide at 300 K and 100 atm). Because some of these results are significantly different, your design of the containment vessels would be quite different, depending on which equation of state was used.

In this case, a complicated equation of state was examined using the Newton-Raphson method. The results varied significantly from the ideal gas law for several cases. From a practical standpoint, the Newton-Raphson method was appropriate for this application because $f'(v)$ was easy to calculate. Thus, the rapid convergence properties of the Newton-Raphson method could be exploited.

In addition to demonstrating its power for a single computation, the present design problem also illustrates how the Newton-Raphson method is especially attractive when numerous computations are required.
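The Newton-Raphson computation described above can be sketched as follows. This is an illustrative Python version under stated assumptions: the function name `vdw_molal_volume`, the relative stopping tolerance, and the iteration cap are hypothetical choices, not taken from the text. The ideal gas law supplies the initial guess, and Eqs. (8.3) and (8.4) supply the function and its derivative.

```python
R = 0.082054  # universal gas constant, L atm/(mol K), as given in the text


def vdw_molal_volume(p, T, a, b, tol=1e-10, max_iter=50):
    """Solve the van der Waals equation for v (L/mol) by Newton-Raphson.

    p in atm, T in K; a and b are the gas-specific constants.
    tol and max_iter are illustrative choices, not values from the text.
    """
    v = R * T / p  # ideal-gas estimate used as the initial guess
    for _ in range(max_iter):
        f = (p + a / v**2) * (v - b) - R * T    # Eq. (8.3)
        df = p - a / v**2 + 2 * a * b / v**3    # Eq. (8.4)
        v_new = v - f / df                      # Newton-Raphson update, Eq. (6.6)
        if abs(v_new - v) <= tol * abs(v_new):
            return v_new
        v = v_new
    raise RuntimeError("Newton-Raphson did not converge")


# Carbon dioxide and oxygen at 300 K and 1 atm.
print(vdw_molal_volume(1, 300, a=3.592, b=0.04267))
print(vdw_molal_volume(1, 300, a=1.360, b=0.03183))
```

Passing the previous solution as the starting value, instead of the ideal-gas estimate, would be a natural variation when pressure and temperature change only slightly between calls.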
Because of the speed of digital computers, the efficiency of various numerical methods for most roots of equations is indistinguishable for a single computation. Even a 1-s difference between the crude bisection approach and the efficient Newton-Raphson does not amount to a significant time loss when only one computation is performed. However, suppose that millions of root evaluations are required to solve a problem. In this case, the efficiency of the method could be a deciding factor in the choice of a technique.

For example, suppose that you are called upon to design an automatic computerized control system for a chemical production process. This system requires accurate estimates of molal volumes on an essentially continuous basis to properly manufacture the final product. Gauges are installed that provide instantaneous readings of pressure and temperature. Evaluations of $v$ must be obtained for a variety of gases that are used in the process.

For such an application, bracketing methods such as bisection or false position would probably be too time-consuming.
  • 224. In addition, the two initial guesses that are required for these approaches may also interject a critical delay in the procedure. This shortcoming also applies to the secant method, which likewise needs two initial estimates.

In contrast, the Newton-Raphson method requires only one guess for the root. The ideal gas law could be employed to obtain this guess at the initiation of the process. Then, assuming that the time frame is short enough so that pressure and temperature do not vary wildly between computations, the previous root solution would provide a good guess for the next application. Thus, the close guess that is often a prerequisite for convergence of the Newton-Raphson method would automatically be available. All the above considerations would greatly favor the Newton-Raphson technique for such problems.

8.2 GREENHOUSE GASES AND RAINWATER (CIVIL/ENVIRONMENTAL ENGINEERING)

Background. Civil engineering is a broad field that includes such diverse areas as structural, geotechnical, transportation, water-resources, and environmental engineering. The last area has traditionally dealt with pollution control. However, in recent years, environmental engineers (as well as chemical engineers) have addressed broader problems such as climate change.

It is well documented that the atmospheric levels of several greenhouse gases have been increasing over the past 50 years. For example, Fig. 8.1 shows data for the partial pressure of carbon dioxide (CO2) collected at Mauna Loa, Hawaii, from 1958 through 2003. The trend in the data can be nicely fit with a quadratic polynomial (in Part Five, we will learn how to determine such polynomials),
$$p_{CO_2} = 0.011825(t - 1980.5)^2 + 1.356975(t - 1980.5) + 339$$
where $p_{CO_2}$ = the partial pressure of CO2 in the atmosphere [ppm]. The data indicate that levels have increased by a little over 19% during the period, from 315 to 376 ppm.

FIGURE 8.1 Average annual partial pressures of atmospheric carbon dioxide (ppm) measured at Mauna Loa, Hawaii.
[Figure 8.1 plot: pCO2 (ppm), axis from 310 to 370, versus year, 1950 to 2010.]
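As a quick check on the quadratic trend quoted for the Mauna Loa data, the polynomial can be evaluated at the two ends of the record. The snippet below is a small Python sketch (the function name `p_co2` is a hypothetical label chosen for illustration); it uses only the coefficients stated in the text.

```python
def p_co2(t):
    """Quadratic fit to the Mauna Loa record: partial pressure of CO2 in ppm for year t."""
    dt = t - 1980.5
    return 0.011825 * dt**2 + 1.356975 * dt + 339.0


start, end = p_co2(1958), p_co2(2003)
increase_percent = 100.0 * (end - start) / start
print(f"1958: {start:.1f} ppm, 2003: {end:.1f} ppm, increase: {increase_percent:.1f}%")
```

The fitted values land close to the 315 and 376 ppm read from the data, and the computed rise is consistent with the stated increase of a little over 19%.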
  • 225. Aside from global warming, greenhouse gases can also influence atmospheric chemistry. One question that we can address is how the carbon dioxide trend is affecting the pH of rainwater. Outside of urban and industrial areas, it is well documented that carbon dioxide is the primary determinant of the pH of the rain. pH is the measure of the activity of hydrogen ions and, therefore, of acidity. For dilute aqueous solutions, it can be computed as
$$\mathrm{pH} = -\log_{10}[\mathrm{H^+}] \tag{8.5}$$
where $[\mathrm{H^+}]$ is the molar concentration of hydrogen ions.

The following system of five nonlinear equations governs the chemistry of rainwater:
$$K_1 = 10^6 \frac{[\mathrm{H^+}][\mathrm{HCO_3^-}]}{K_H\, p_{CO_2}} \tag{8.6}$$
$$K_2 = \frac{[\mathrm{H^+}][\mathrm{CO_3^{2-}}]}{[\mathrm{HCO_3^-}]} \tag{8.7}$$
$$K_w = [\mathrm{H^+}][\mathrm{OH^-}] \tag{8.8}$$
$$c_T = \frac{K_H\, p_{CO_2}}{10^6} + [\mathrm{HCO_3^-}] + [\mathrm{CO_3^{2-}}] \tag{8.9}$$
$$0 = [\mathrm{HCO_3^-}] + 2[\mathrm{CO_3^{2-}}] + [\mathrm{OH^-}] - [\mathrm{H^+}] \tag{8.10}$$
where $K_H$ = Henry's constant, and $K_1$, $K_2$, and $K_w$ are equilibrium coefficients. The five unknowns in this system are $c_T$ = total inorganic carbon, $[\mathrm{HCO_3^-}]$ = bicarbonate, $[\mathrm{CO_3^{2-}}]$ = carbonate, $[\mathrm{H^+}]$ = hydrogen ion, and $[\mathrm{OH^-}]$ = hydroxyl ion. Notice how the partial pressure of CO2 shows up in Eqs. (8.6) and (8.9).

Use these equations to compute the pH of rainwater given that $K_H = 10^{-1.46}$, $K_1 = 10^{-6.3}$, $K_2 = 10^{-10.3}$, and $K_w = 10^{-14}$. Compare the results in 1958, when $p_{CO_2}$ was 315 ppm, and in 2003, when it was 375 ppm. When selecting a numerical method for your computation, consider the following:
You know with certainty that the pH of rain in pristine areas always falls between 2 and 12.
You also know that your measurement devices can only measure pH to two places of decimal precision.

Solution. There are a variety of ways to solve this nonlinear system of five equations. One way is to eliminate unknowns by combining them to produce a single function that only depends on $[\mathrm{H^+}]$. To do this, first solve Eqs. (8.6) and (8.7) for
$$[\mathrm{HCO_3^-}] = \frac{K_1}{10^6 [\mathrm{H^+}]} K_H\, p_{CO_2} \tag{8.11}$$
$$[\mathrm{CO_3^{2-}}] = \frac{K_2 [\mathrm{HCO_3^-}]}{[\mathrm{H^+}]} \tag{8.12}$$
Substitute Eq. (8.11) into (8.12):
$$[\mathrm{CO_3^{2-}}] = \frac{K_2 K_1}{10^6 [\mathrm{H^+}]^2} K_H\, p_{CO_2} \tag{8.13}$$
  • 226. Equations (8.11) and (8.13) can be substituted along with Eq. (8.8) into Eq. (8.10) to give
$$0 = \frac{K_1}{10^6 [\mathrm{H^+}]} K_H\, p_{CO_2} + 2 \frac{K_2 K_1}{10^6 [\mathrm{H^+}]^2} K_H\, p_{CO_2} + \frac{K_w}{[\mathrm{H^+}]} - [\mathrm{H^+}] \tag{8.14}$$
Although it might not be apparent, this result is a third-order polynomial in $[\mathrm{H^+}]$ (multiplying through by $[\mathrm{H^+}]^2$ clears the denominators and leaves a cubic). Thus, its root can be used to compute the pH of the rainwater.

Now we must decide which numerical method to employ to obtain the solution. There are two reasons why bisection would be a good choice. First, the fact that the pH always falls within the range from 2 to 12 provides us with two good initial guesses. Second, because the pH can only be measured to two decimal places of precision, we will be satisfied with an absolute error of $E_{a,d} = \pm 0.005$. Remember that given an initial bracket and the desired absolute error, we can compute the number of iterations a priori. Using Eq. (5.5) with the bracket width of $12 - 2 = 10$, the result is $n = \log_2(10/0.005) = 10.9658$. Thus, eleven iterations of bisection will produce the desired precision.

If this is done, the result for 1958 will be a pH of 5.6279 with an approximate relative error of 0.0868%. We can be confident that the rounded result of 5.63 is correct to two decimal places. This can be verified by performing another run with more iterations. For example, if we perform 35 iterations, a result of 5.6304 is obtained with an approximate relative error of $\varepsilon_a = 5.17 \times 10^{-9}\,\%$. The same calculation can be repeated for the 2003 conditions to give pH = 5.59 with $\varepsilon_a = 0.0874\%$.

Interestingly, these results indicate that the 19% rise in atmospheric CO2 levels has produced only a 0.67% drop in pH. Although this is certainly true, remember that the pH represents a logarithmic scale as defined by Eq. (8.5). Consequently, a unit drop in pH represents a 10-fold increase in hydrogen ion. The concentration can be computed as $[\mathrm{H^+}] = 10^{-\mathrm{pH}}$ and the resulting percent change can be calculated as 9.1%. Therefore, the hydrogen ion concentration has increased about 9%.
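The bisection computation for the rainwater pH can be sketched as below. This is an illustrative Python version, assuming the charge balance of Eq. (8.14) is evaluated as a function of pH (with the hydrogen ion concentration set to 10 raised to the power of minus pH) and bisected on the bracket from 2 to 12. The names `charge_balance` and `bisect_ph` are hypothetical and chosen only for this sketch.

```python
import math

# Equilibrium constants given in the text.
K_H, K1, K2, K_w = 10**-1.46, 10**-6.3, 10**-10.3, 10**-14


def charge_balance(pH, p_co2):
    """Right-hand side of Eq. (8.14), with [H+] = 10**(-pH) and p_co2 in ppm."""
    h = 10.0 ** (-pH)
    return (K1 * K_H * p_co2 / (1e6 * h)
            + 2 * K2 * K1 * K_H * p_co2 / (1e6 * h**2)
            + K_w / h
            - h)


def bisect_ph(p_co2, lo=2.0, hi=12.0, n_iter=11):
    """Return the midpoint estimate of pH after n_iter bisection steps."""
    f_lo = charge_balance(lo, p_co2)
    mid = lo
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        f_mid = charge_balance(mid, p_co2)
        if f_lo * f_mid < 0:
            hi = mid               # sign change in the lower half
        else:
            lo, f_lo = mid, f_mid  # sign change in the upper half
    return mid


# Number of iterations needed for an absolute error of 0.005 on a bracket of width 10.
n = math.ceil(math.log2((12.0 - 2.0) / 0.005))
ph_1958 = bisect_ph(315.0, n_iter=n)
ph_1958_fine = bisect_ph(315.0, n_iter=35)
ph_2003_fine = bisect_ph(375.0, n_iter=35)
h_increase_percent = 100.0 * (10.0 ** (ph_1958_fine - ph_2003_fine) - 1.0)
print(n, ph_1958, ph_1958_fine, ph_2003_fine, h_increase_percent)
```

Bisecting on pH rather than on the concentration itself keeps the bracket and the error tolerance in the same units as the measurement precision, which is what makes the a priori iteration count directly usable.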
There is quite a lot of controversy related to the true significance of the greenhouse gas trends. However, regardless of the ultimate implications, it is quite sobering to realize that something as large as our atmosphere has changed so much over a relatively short time period. This case study illustrates how numerical methods can be employed to analyze and interpret such trends. Over the coming years, engineers and scientists can hopefully use such tools to gain increased understanding and help rationalize the debate over their ramifications.

8.3 DESIGN OF AN ELECTRIC CIRCUIT (ELECTRICAL ENGINEERING)

Background. Electrical engineers often use Kirchhoff's laws to study the steady-state (not time-varying) behavior of electric circuits. Such steady-state behavior will be examined in Sec. 12.3. Another important problem involves circuits that are transient in nature where sudden temporal changes take place. Such a situation occurs following the closing of the switch in Fig. 8.2. In this case, there will be a period of adjustment following the closing of the switch as a new steady state is reached. The length of this adjustment period is closely related to the storage properties of the capacitor and the inductor. Energy storage may oscillate between these two elements during a transient period. However, resistance in the circuit will dissipate the magnitude of the oscillations.
  • 227. The flow of current through the resistor causes a voltage drop ($V_R$) given by
$$V_R = iR$$
where $i$ = the current and $R$ = the resistance of the resistor. When $R$ and $i$ have units of ohms and amperes, respectively, $V_R$ has units of volts.

Similarly, an inductor resists changes in current, such that the voltage drop $V_L$ across it is
$$V_L = L \frac{di}{dt}$$
where $L$ = the inductance. When $L$ and $i$ have units of henrys and amperes, respectively, $V_L$ has units of volts and $t$ has units of seconds.

The voltage drop across the capacitor ($V_C$) depends on the charge ($q$) on it:
$$V_C = \frac{q}{C} \tag{8.15}$$
where $C$ = the capacitance. When the charge is expressed in units of coulombs, the unit of $C$ is the farad.

Kirchhoff's second law states that the algebraic sum of voltage drops around a closed circuit is zero. After the switch is closed we have
$$L \frac{di}{dt} + Ri + \frac{q}{C} = 0 \tag{8.16}$$
However, the current is related to the charge according to
$$i = \frac{dq}{dt} \tag{8.17}$$
Therefore,
$$L \frac{d^2 q}{dt^2} + R \frac{dq}{dt} + \frac{1}{C} q = 0 \tag{8.18}$$
This is a second-order linear ordinary differential equation that can be solved using the methods of calculus. This solution is given by
$$q(t) = q_0 e^{-Rt/(2L)} \cos\left[\sqrt{\frac{1}{LC} - \left(\frac{R}{2L}\right)^2}\; t\right] \tag{8.19}$$
where at $t = 0$, $q = q_0 = V_0 C$, and $V_0$ = the voltage from the charging battery.

FIGURE 8.2 An electric circuit. When the switch is closed, the current will undergo a series of oscillations until a new steady state is reached. [Diagram labels: switch, resistor, capacitor, inductor, battery ($V_0$), current $i$.]

  • 228. Equation (8.19) describes the time variation of the charge on the capacitor. The solution $q(t)$ is plotted in Fig. 8.3.

FIGURE 8.3 The charge on a capacitor as a function of time following the closing of the switch in Fig. 8.2. [Plot of q(t) versus time, starting at q0.]

A typical electrical engineering design problem might involve determining the proper resistor to dissipate energy at a specified rate, with known values for $L$ and $C$. For this problem, assume the charge must be dissipated to 1 percent of its original value ($q/q_0 = 0.01$) in $t = 0.05$ s, with $L = 5$ H and $C = 10^{-4}$ F.

Solution. It is necessary to solve Eq. (8.19) for $R$, with known values of $q$, $q_0$, $L$, and $C$. However, a numerical approximation technique must be employed because $R$ is an implicit variable in Eq. (8.19). The bisection method will be used for this purpose. The other methods discussed in Chaps. 5 and 6 are also appropriate, although the Newton-Raphson method might be deemed inconvenient because the derivative of Eq. (8.19) is a little cumbersome. Rearranging Eq. (8.19),
$$f(R) = e^{-Rt/(2L)} \cos\left[\sqrt{\frac{1}{LC} - \left(\frac{R}{2L}\right)^2}\; t\right] - \frac{q}{q_0}$$
or using the numerical values given,
$$f(R) = e^{-0.005R} \cos\left[\sqrt{2000 - 0.01R^2}\,(0.05)\right] - 0.01 \tag{8.20}$$
Examination of this equation suggests that a reasonable initial range for $R$ is 0 to 400 Ω (because $2000 - 0.01R^2$ must be greater than zero). Figure 8.4, a plot of Eq. (8.20), confirms this. Twenty-one iterations of the bisection method give $R = 328.1515$ Ω, with an error of less than 0.0001 percent.

FIGURE 8.4 Plot of Eq. (8.20) used to obtain initial guesses for R that bracket the root. [Plot of f(R), from about −0.6 to 0.0, versus R from 0 to 400; root ≅ 325.]
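The resistor calculation can be sketched in code as follows. This is a minimal Python illustration of bisection applied to Eq. (8.20), assuming the bracket of 0 to 400 Ω and the 21 iterations quoted in the text. The helper names `charge_residual` and `bisect` are hypothetical.

```python
import math

L_H = 5.0      # inductance, H
C_F = 1e-4     # capacitance, F
t_s = 0.05     # time, s
ratio = 0.01   # required q/q0


def charge_residual(R):
    """f(R) from Eq. (8.19) rearranged: zero when q/q0 equals the required ratio."""
    omega = math.sqrt(1.0 / (L_H * C_F) - (R / (2.0 * L_H)) ** 2)
    return math.exp(-R * t_s / (2.0 * L_H)) * math.cos(omega * t_s) - ratio


def bisect(func, lo, hi, n_iter):
    """Plain bisection: return the midpoint estimate after n_iter steps."""
    f_lo = func(lo)
    if f_lo * func(hi) > 0:
        raise ValueError("root is not bracketed")
    mid = lo
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if f_lo * f_mid < 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return mid


R_root = bisect(charge_residual, 0.0, 400.0, 21)
print(f"R = {R_root:.4f} ohm")
```

The upper bracket must stay below roughly 447 Ω, since beyond that the argument of the square root becomes negative; the choice of 400 Ω respects this limit.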
  • 229. Thus, you can specify a resistor with this rating for the circuit shown in Fig. 8.2 and expect to achieve a dissipation performance that is consistent with the requirements of the problem. This design problem could not be solved efficiently without using the numerical methods in Chaps. 5 and 6.

8.4 PIPE FRICTION (MECHANICAL/AEROSPACE ENGINEERING)

Background. Determining fluid flow through pipes and tubes has great relevance in many areas of engineering and science. In mechanical and aerospace engineering, typical applications include the flow of liquids and gases through cooling systems.

The resistance to flow in such conduits is parameterized by a dimensionless number called the friction factor. For turbulent flow, the Colebrook equation provides a means to calculate the friction factor,
$$0 = \frac{1}{\sqrt{f}} + 2.0 \log\left(\frac{\varepsilon}{3.7D} + \frac{2.51}{\mathrm{Re}\sqrt{f}}\right) \tag{8.21}$$
where $\varepsilon$ = the roughness (m), $D$ = diameter (m), log denotes the base-10 logarithm, and Re = the Reynolds number,
$$\mathrm{Re} = \frac{\rho V D}{\mu}$$
where $\rho$ = the fluid's density (kg/m³), $V$ = its velocity (m/s), and $\mu$ = dynamic viscosity (N·s/m²). In addition to appearing in Eq. (8.21), the Reynolds number also serves as the criterion for whether flow is turbulent (Re > 4000).

In the present case study, we will illustrate how the numerical methods covered in this part of the book can be employed to determine $f$ for air flow through a smooth, thin tube. For this case, the parameters are $\rho = 1.23$ kg/m³, $\mu = 1.79 \times 10^{-5}$ N·s/m², $D = 0.005$ m, $V = 40$ m/s, and $\varepsilon = 0.0015$ mm. Note that friction factors range from about 0.008 to 0.08. In addition, an explicit formulation called the Swamee-Jain equation provides an approximate estimate,
$$f = \frac{1.325}{\left[\ln\left(\dfrac{\varepsilon}{3.7D} + \dfrac{5.74}{\mathrm{Re}^{0.9}}\right)\right]^2} \tag{8.22}$$

Solution. The Reynolds number can be computed as
$$\mathrm{Re} = \frac{\rho V D}{\mu} = \frac{1.23(40)0.005}{1.79 \times 10^{-5}} = 13{,}743$$
This value along with the other parameters can be substituted into Eq. (8.21) to give
$$g(f) = \frac{1}{\sqrt{f}} + 2.0 \log\left(\frac{0.0000015}{3.7(0.005)} + \frac{2.51}{13{,}743\sqrt{f}}\right)$$
Before determining the root, it is advisable to plot the function to estimate initial guesses and to anticipate possible difficulties.
  • 230. This can be done easily with tools such as MATLAB software, Excel, or Mathcad. For example, a plot of the function can be generated with the following MATLAB commands

    % parameters (roughness converted from mm to m)
    rho=1.23;mu=1.79e-5;D=0.005;V=40;e=0.0015/1000;
    Re=rho*V*D/mu;
    % Colebrook residual g(f), Eq. (8.21)
    g=@(f) 1/sqrt(f)+2*log10(e/(3.7*D)+2.51/(Re*sqrt(f)));
    fplot(g,[0.008 0.08]),grid,xlabel('f'),ylabel('g(f)')

As in Fig. 8.5, the root is located at about 0.03.

Because we are supplied initial guesses ($x_l = 0.008$ and $x_u = 0.08$), either of the bracketing methods from Chap. 5 could be used. For example, bisection gives a value of $f = 0.0289678$ with a percent relative error of $5.926 \times 10^{-5}$ in 22 iterations. False position yields a result of similar precision in 26 iterations. Thus, although they produce the correct result, they are somewhat inefficient. This would not be important for a single application, but could become prohibitive if many evaluations were made.

We could try to attain improved performance by turning to an open method. Because Eq. (8.21) is relatively straightforward to differentiate, the Newton-Raphson method is a good candidate. For example, using an initial guess at the lower end of the range ($x_0 = 0.008$), Newton-Raphson converges quickly to 0.0289678 with an approximate error of $6.87 \times 10^{-6}\,\%$ in only 6 iterations. However, when the initial guess is set at the upper end of the range ($x_0 = 0.08$), the routine diverges!

As can be seen by inspecting Fig. 8.5, this occurs because the function's slope at the initial guess causes the first iteration to jump to a negative value. Further runs demonstrate that for this case, convergence only occurs when the initial guess is below about 0.066.

FIGURE 8.5 [Plot of g(f), axis from −3 to 6, versus f from 0.01 to 0.08.]
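The sensitivity of Newton-Raphson to the starting value can be reproduced with a short script. The following Python sketch is an illustration under stated assumptions: the derivative `dg` is worked out analytically from Eq. (8.21) for this example, and the function names, tolerance, and the convention of returning `None` when an iterate leaves the domain f > 0 are hypothetical choices, not the routine used in the text.

```python
import math

# Parameters from the case study (roughness converted from mm to m).
rho, mu, D, V, eps = 1.23, 1.79e-5, 0.005, 40.0, 0.0015 / 1000.0
Re = rho * V * D / mu
A = eps / (3.7 * D)   # relative-roughness term of Eq. (8.21)
B = 2.51 / Re         # Reynolds-number term of Eq. (8.21)


def g(f):
    """Colebrook residual, Eq. (8.21), with a base-10 logarithm."""
    return 1.0 / math.sqrt(f) + 2.0 * math.log10(A + B / math.sqrt(f))


def dg(f):
    """Analytical derivative of g with respect to f (derived for this sketch)."""
    return -0.5 * f**-1.5 * (1.0 + (2.0 / math.log(10.0)) * B / (A + B / math.sqrt(f)))


def newton(f0, tol=1e-10, max_iter=50):
    """Newton-Raphson on g; returns None if an iterate becomes non-positive."""
    f = f0
    for _ in range(max_iter):
        f_new = f - g(f) / dg(f)
        if f_new <= 0.0:
            return None  # g is undefined for f <= 0, so the iteration has failed
        if abs(f_new - f) <= tol * abs(f_new):
            return f_new
        f = f_new
    return None


print(f"Re = {Re:.0f}")
print("start 0.008:", newton(0.008))
print("start 0.08: ", newton(0.08))
```

Starting from 0.008 the iterates approach the root from below, whereas starting from 0.08 the first update is negative and the sketch reports failure, which mirrors the behavior described for Fig. 8.5.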
  • 231. So we can see that although the Newton-Raphson method is very efficient, it requires good initial guesses. For the Colebrook equation, a good strategy might be to employ the Swamee-Jain equation (Eq. 8.22) to provide the initial guess as in
$$f = \frac{1.325}{\left[\ln\left(\dfrac{0.0000015}{3.7(0.005)} + \dfrac{5.74}{13743^{0.9}}\right)\right]^2} = 0.029031$$
For this case, Newton-Raphson converges to 0.0289678 in only 3 iterations with an approximate error of $8.51 \times 10^{-10}\,\%$.

Aside from our homemade functions, we can also use professional root finders like MATLAB's built-in fzero function. However, just as with the Newton-Raphson method, divergence also occurs when the fzero function is used with a single guess. In this case, though, it is guesses at the lower end of the range that cause problems. For example,

    rho=1.23;mu=1.79e-5;D=0.005;V=40;e=0.0015/1000;
    Re=rho*V*D/mu;
    g=@(f) 1/sqrt(f)+2*log10(e/(3.7*D)+2.51/(Re*sqrt(f)));
    fzero(g,0.008)

    Exiting fzero: aborting search for an interval containing a sign change because complex function value encountered during search. (Function value at -0.0028 is -4.92028-20.2423i.) Check function or try again with a different starting value.
    ans =
       NaN

If the iterations are displayed using optimset (recall Sec. 7.7.2), it is revealed that a negative value occurs during the search phase before a sign change is detected and the routine aborts. However, for single initial guesses above about 0.016, the routine works nicely. For example, for the guess of 0.08 that caused problems for Newton-Raphson, fzero does just fine,

    fzero(g,0.08)
    ans =
       0.02896781017144

As a final note, let's see whether convergence is possible for simple fixed-point iteration. The easiest and most straightforward version involves solving for the first $f$ in Eq. (8.21),
$$f_{i+1} = \frac{0.25}{\left[\log\left(\dfrac{\varepsilon}{3.7D} + \dfrac{2.51}{\mathrm{Re}\sqrt{f_i}}\right)\right]^2} \tag{8.23}$$
The cobweb display of this function (Fig. 8.6) indicates a surprising result.
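Equation (8.23) can be tried directly. The sketch below, in Python, iterates it from both ends of the range and also evaluates the Swamee-Jain estimate of Eq. (8.22) as a candidate starting value. The function names `swamee_jain` and `fixed_point`, and the fixed count of 30 iterations, are assumptions made for this illustration.

```python
import math

# Parameters from the case study (roughness converted from mm to m).
rho, mu, D, V, eps = 1.23, 1.79e-5, 0.005, 40.0, 0.0015 / 1000.0
Re = rho * V * D / mu


def swamee_jain():
    """Explicit friction-factor estimate, Eq. (8.22)."""
    return 1.325 / math.log(eps / (3.7 * D) + 5.74 / Re**0.9) ** 2


def fixed_point(f0, n_iter=30):
    """Simple fixed-point iteration of Eq. (8.23) for a fixed number of steps."""
    f = f0
    for _ in range(n_iter):
        f = 0.25 / math.log10(eps / (3.7 * D) + 2.51 / (Re * math.sqrt(f))) ** 2
    return f


print("Swamee-Jain estimate:", swamee_jain())
for start in (0.008, 0.08, swamee_jain()):
    print(f"start {start:.6f} -> f = {fixed_point(start):.7f}")
```

No derivative is needed here, and every starting value in the range leads to the same friction factor.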
Recall that fixed-point iteration converges when the $y_2$ curve has a relatively flat slope (i.e., $|g'(x)| < 1$).
  • 232. As indicated by Fig. 8.6, the fact that the $y_2$ curve is quite flat in the range from $f = 0.008$ to 0.08 means that not only does fixed-point iteration converge, but it converges fairly rapidly! In fact, for initial guesses anywhere between 0.008 and 0.08, fixed-point iteration yields predictions with percent relative errors less than 0.008% in six or fewer iterations. Thus, this simple approach that requires only one guess and no derivative estimates performs really well for this particular case.

FIGURE 8.6 [Cobweb plot of $y_1 = x$ and $y_2 = g(x)$ for x from 0 to 0.08; y axis from 0 to 0.05.]

The take-home message from this case study is that even great, professionally developed software like MATLAB is not always foolproof. Further, there is usually no single method that works best for all problems. Sophisticated users understand the strengths and weaknesses of the available numerical techniques. In addition, they understand enough of the underlying theory so that they can effectively deal with situations where a method breaks down.

PROBLEMS

Chemical/Bio Engineering
8.1 Perform the same computation as in Sec. 8.1, but for ethyl alcohol (a = 12.02 and b = 0.08407) at a temperature of 375 K and p of 2.0 atm. Compare your results with the ideal gas law. Use any of the numerical methods discussed in Chaps. 5 and 6 to perform the computation. Justify your choice of technique.
8.2 In chemical engineering, plug flow reactors (that is, those in which fluid flows from one end to the other with minimal mixing along the longitudinal axis) are often used to convert reactants into products. It has been determined that the efficiency of the conversion can sometimes be improved by recycling a portion of the product stream so that it returns to the entrance for an additional pass through the reactor (Fig. P8.2). The recycle rate is defined as
$$R = \frac{\text{volume of fluid returned to entrance}}{\text{volume leaving the system}}$$
Suppose that we are processing a chemical A to generate a product B.
  • 233. For the case where A forms B according to an autocatalytic reaction (that is, in which one of the products acts as a catalyst or stimulus for the reaction), it can be shown that an optimal recycle rate must satisfy
$$\ln\frac{1 + R(1 - X_{Af})}{R(1 - X_{Af})} = \frac{R + 1}{R[1 + R(1 - X_{Af})]}$$
where $X_{Af}$ = the fraction of reactant A that is converted to product B. The optimal recycle rate corresponds to the minimum-sized reactor needed to attain the desired level of conversion. Use a numerical method to determine the recycle ratios needed to minimize reactor size for a fractional conversion of $X_{Af} = 0.9$.

FIGURE P8.2 Schematic representation of a plug flow reactor with recycle. [Diagram labels: feed, plug flow reactor, recycle, product.]

8.3 In a chemical engineering process, water vapor (H2O) is heated to sufficiently high temperatures that a significant portion of the water dissociates, or splits apart, to form oxygen (O2) and hydrogen (H2):
$$\mathrm{H_2O} \rightleftharpoons \mathrm{H_2} + \tfrac{1}{2}\mathrm{O_2}$$
If it is assumed that this is the only reaction involved, the mole fraction $x$ of H2O that dissociates can be represented by
$$K = \frac{x}{1 - x}\sqrt{\frac{2p_t}{2 + x}} \tag{P8.3.1}$$
where $K$ = the reaction equilibrium constant and $p_t$ = the total pressure of the mixture. If $p_t = 3$ atm and $K = 0.05$, determine the value of $x$ that satisfies Eq. (P8.3.1).
8.4 The following equation pertains to the concentration of a chemical in a completely mixed reactor:
$$c = c_{in}(1 - e^{-0.04t}) + c_0 e^{-0.04t}$$
If the initial concentration $c_0 = 4$ and the inflow concentration $c_{in} = 10$, compute the time required for $c$ to be 93 percent of $c_{in}$.
8.5 A reversible chemical reaction
$$2A + B \rightleftharpoons C$$
can be characterized by the equilibrium relationship
$$K = \frac{c_c}{c_a^2 c_b}$$
where the nomenclature $c_i$ represents the concentration of constituent $i$. Suppose that we define a variable $x$ as representing the number of moles of C that are produced. Conservation of mass can be used to reformulate the equilibrium relationship as
$$K = \frac{c_{c,0} + x}{(c_{a,0} - 2x)^2 (c_{b,0} - x)}$$
where the subscript 0 designates the initial concentration of each constituent. If $K = 0.015$, $c_{a,0} = 42$, $c_{b,0} = 30$, and $c_{c,0} = 4$, determine the value of $x$. (a) Obtain the solution graphically. (b) On the basis of (a), solve for the root with initial guesses of $x_l = 0$ and $x_u = 20$ to $\varepsilon_s = 0.5\%$. Choose either bisection or false position to obtain your solution. Justify your choice.
8.6 The following chemical reactions take place in a closed system
$$2A + B \rightleftharpoons C$$
$$A + D \rightleftharpoons C$$
At equilibrium, they can be characterized by
$$K_1 = \frac{c_c}{c_a^2 c_b} \qquad K_2 = \frac{c_c}{c_a c_d}$$
where the nomenclature $c_i$ represents the concentration of constituent $i$. If $x_1$ and $x_2$ are the number of moles of C that are produced due to the first and second reactions, respectively, use an approach similar to that of Prob. 8.5 to reformulate the equilibrium relationships in terms of the initial concentrations of the constituents. Then, use the Newton-Raphson method to solve the pair of simultaneous nonlinear equations for $x_1$ and $x_2$ if $K_1 = 4 \times 10^{-4}$, $K_2 = 3.7 \times 10^{-2}$, $c_{a,0} = 50$, $c_{b,0} = 20$, $c_{c,0} = 5$, and $c_{d,0} = 10$. Use a graphical approach to develop your initial guesses.
8.7 The Redlich-Kwong equation of state is given by
$$p = \frac{RT}{v - b} - \frac{a}{v(v + b)\sqrt{T}}$$
where $R$ = the universal gas constant [= 0.518 kJ/(kg K)], $T$ = absolute temperature (K), $p$ = absolute pressure (kPa), and $v$ = the volume of a kg of gas (m³/kg). The parameters $a$ and $b$ are calculated by
$$a = 0.427\frac{R^2 T_c^{2.5}}{p_c} \qquad b = 0.0866 R \frac{T_c}{p_c}$$
where $p_c$ = critical pressure (kPa) and $T_c$ = critical temperature (K). As a chemical engineer, you are asked to determine the amount of methane fuel ($p_c = 4600$ kPa and $T_c = 191$ K) that can be held in a 3-m³ tank at a temperature of −40°C with a pressure of 65,000 kPa. Use a root-locating method of your choice to calculate $v$ and then determine the mass of methane contained in the tank.
  • 234. 8.8 The volume $V$ of liquid in a hollow horizontal cylinder of radius $r$ and length $L$ is related to the depth of the liquid $h$ by
$$V = \left[r^2 \cos^{-1}\left(\frac{r - h}{r}\right) - (r - h)\sqrt{2rh - h^2}\right] L$$
Determine $h$ given $r = 2$ m, $L = 5$ m, and $V = 8$ m³. Note that if you are using a programming language or software tool that is not rich in trigonometric functions, the arc cosine can be computed with
$$\cos^{-1} x = \frac{\pi}{2} - \tan^{-1}\left(\frac{x}{\sqrt{1 - x^2}}\right)$$
8.9 The volume $V$ of liquid in a spherical tank of radius $r$ is related to the depth $h$ of the liquid by
$$V = \frac{\pi h^2 (3r - h)}{3}$$
Determine $h$ given $r = 1$ m and $V = 0.5$ m³.
8.10 For the spherical tank in Prob. 8.9, it is possible to develop the following two fixed-point formulas:
$$h = \sqrt{\frac{h^3 + (3V/\pi)}{3r}} \qquad \text{and} \qquad h = \sqrt[3]{3\left(rh^2 - \frac{V}{\pi}\right)}$$
If $r = 1$ m and $V = 0.5$ m³, determine whether either of these is stable, and the range of initial guesses for which they are stable.
8.11 The operation of a constant density plug flow reactor for the production of a substance via an enzymatic reaction is described by the equation below, where $V$ is the volume of the reactor, $F$ is the flow rate of reactant C, $C_{in}$ and $C_{out}$ are the concentrations of reactant entering and leaving the reactor, respectively, and $K$ and $k_{max}$ are constants. For a 100-L reactor, with an inlet concentration of $C_{in} = 0.2$ M, an inlet flow rate of 80 L/s, $k_{max} = 10^{-2}$ s⁻¹, and $K = 0.1$ M, find the concentration of C at the outlet of the reactor.
$$\frac{V}{F} = -\int_{C_{in}}^{C_{out}} \left(\frac{K}{k_{max} C} + \frac{1}{k_{max}}\right) dC$$
8.12 The Ergun equation, shown below, is used to describe the flow of a fluid through a packed bed. $\Delta P$ is the pressure drop, $\rho$ is the density of the fluid, $G_o$ is the mass velocity (mass flow rate divided by cross-sectional area), $D_p$ is the diameter of the particles within the bed, $\mu$ is the fluid viscosity, $L$ is the length of the bed, and $\varepsilon$ is the void fraction of the bed.
$$\frac{\Delta P \rho}{G_o^2} \frac{D_p}{L} \frac{\varepsilon^3}{1 - \varepsilon} = 150\frac{1 - \varepsilon}{(D_p G_o/\mu)} + 1.75$$
Given the parameter values listed below, find the void fraction $\varepsilon$ of the bed.
$$\frac{D_p G_o}{\mu} = 1000 \qquad \frac{\Delta P \rho D_p}{G_o^2 L} = 10$$
8.13 The pressure drop in a section of pipe can be calculated as
$$\Delta p = f\frac{L \rho V^2}{2D}$$
where $\Delta p$ = the pressure drop (Pa), $f$ = the friction factor, $L$ = the length of pipe [m], $\rho$ = density (kg/m³), $V$ = velocity (m/s), and $D$ = diameter (m). For turbulent flow, the Colebrook equation provides a means to calculate the friction factor,
$$\frac{1}{\sqrt{f}} = -2.0 \log\left(\frac{\varepsilon}{3.7D} + \frac{2.51}{\mathrm{Re}\sqrt{f}}\right)$$
where $\varepsilon$ = the roughness (m), and Re = the Reynolds number,
$$\mathrm{Re} = \frac{\rho V D}{\mu}$$
where $\mu$ = dynamic viscosity (N·s/m²). (a) Determine $\Delta p$ for a 0.2-m-long horizontal stretch of smooth drawn tubing given $\rho = 1.23$ kg/m³, $\mu = 1.79 \times 10^{-5}$ N·s/m², $D = 0.005$ m, $V = 40$ m/s, and $\varepsilon = 0.0015$ mm. Use a numerical method to determine the friction factor. Note that for smooth pipes with Re < 10⁵, a good initial guess can be obtained using the Blasius formula, $f = 0.316/\mathrm{Re}^{0.25}$. (b) Repeat the computation but for a rougher commercial steel pipe ($\varepsilon = 0.045$ mm).

Civil and Environmental Engineering
8.14 In structural engineering, the secant formula defines the force per unit area, $P/A$, that causes a maximum stress $\sigma_m$ in a column of given slenderness ratio $L/k$:
$$\frac{P}{A} = \frac{\sigma_m}{1 + (ec/k^2)\sec\left[0.5\sqrt{P/(EA)}\,(L/k)\right]}$$
where $ec/k^2$ = the eccentricity ratio and $E$ = the modulus of elasticity. If for a steel beam, $E = 200{,}000$ MPa, $ec/k^2 = 0.2$, and $\sigma_m = 250$ MPa, compute $P/A$ for $L/k = 100$. Recall that $\sec x = 1/\cos x$.
8.15 In environmental engineering (a specialty area in civil engineering), the following equation can be used to compute the oxygen level $c$ (mg/L) in a river downstream from a sewage discharge:
$$c = 10 - 20(e^{-0.2x} - e^{-0.75x})$$
  • 235. where $x$ is the distance downstream in kilometers. (a) Determine the distance downstream where the oxygen level first falls to a reading of 5 mg/L. (Hint: It is within 2 km of the discharge.) Determine your answer to a 1% error. Note that levels of oxygen below 5 mg/L are generally harmful to gamefish such as trout and salmon. (b) Determine the distance downstream at which the oxygen is at a minimum. What is the concentration at that location?
8.16 The concentration of pollutant bacteria $c$ in a lake decreases according to
$$c = 70e^{-1.5t} + 25e^{-0.075t}$$
Determine the time required for the bacteria concentration to be reduced to 9 using (a) the graphical method and (b) the Newton-Raphson method with an initial guess of $t = 10$ and a stopping criterion of 0.5%. Check your result.
8.17 A catenary cable is one that is hung between two points not in the same vertical line. As depicted in Fig. P8.17a, it is subject to no loads other than its own weight. Thus, its weight (N/m) acts as a uniform load per unit length along the cable. A free-body diagram of a section AB is depicted in Fig. P8.17b, where $T_A$ and $T_B$ are the tension forces at the end. Based on horizontal and vertical force balances, the following differential equation model of the cable can be derived:
$$\frac{d^2 y}{dx^2} = \frac{w}{T_A}\sqrt{1 + \left(\frac{dy}{dx}\right)^2}$$
Calculus can be employed to solve this equation for the height $y$ of the cable as a function of distance $x$,
$$y = \frac{T_A}{w}\cosh\left(\frac{w}{T_A}x\right) + y_0 - \frac{T_A}{w}$$
where the hyperbolic cosine can be computed by
$$\cosh x = \frac{1}{2}(e^x + e^{-x})$$
Use a numerical method to calculate a value for the parameter $T_A$ given values for the parameters $w = 10$ and $y_0 = 5$, such that the cable has a height of $y = 15$ at $x = 50$.

FIGURE P8.17 (a) Forces acting on a section AB of a flexible hanging cable. The load is uniform along the cable (but not uniform per the horizontal distance x). (b) A free-body diagram of section AB. [Diagram labels: A, B, $T_A$, $T_B$, $\theta$, $W = ws$, $w$, $y_0$, axes x and y.]

8.18 Figure P8.18a shows a uniform beam subject to a linearly increasing distributed load. The equation for the resulting elastic curve is (see Fig. P8.18b)
$$y = \frac{w_0}{120EIL}(-x^5 + 2L^2 x^3 - L^4 x) \tag{P8.18.1}$$
Use bisection to determine the point of maximum deflection (that is, the value of $x$ where $dy/dx = 0$). Then substitute this value into Eq. (P8.18.1) to determine the value of the maximum deflection. Use the following parameter values in your computation: $L = 450$ cm, $E = 50{,}000$ kN/cm², $I = 30{,}000$ cm⁴, and $w_0 = 1.75$ kN/cm.
8.19 The displacement of a structure is defined by the following equation for a damped oscillation:
$$y = 8e^{-kt}\cos \omega t$$
where $k = 0.5$ and $\omega = 3$. (a) Use the graphical method to make an initial estimate of the time required for the displacement to decrease to 4. (b) Use the Newton-Raphson method to determine the root to $\varepsilon_s = 0.01\%$. (c) Use the secant method to determine the root to $\varepsilon_s = 0.01\%$.
8.20 The Manning equation can be written for a rectangular open channel as
$$Q = \frac{\sqrt{S}\,(BH)^{5/3}}{n(B + 2H)^{2/3}}$$
  • 236. where \(Q\) = flow [m\(^3\)/s], \(S\) = slope [m/m], \(H\) = depth [m], and \(n\) = the Manning roughness coefficient. Develop a fixed-point iteration scheme to solve this equation for \(H\) given \(Q = 5\), \(S = 0.0002\), \(B = 20\), and \(n = 0.03\). Prove that your scheme converges for all initial guesses greater than or equal to zero.

8.21 In ocean engineering, the equation for a reflected standing wave in a harbor is given by \(\lambda = 16\), \(t = 12\), \(v = 48\):

\[ h = h_0 \left[ \sin\left( \frac{2\pi x}{\lambda} \right) \cos\left( \frac{2\pi t v}{\lambda} \right) + e^{-x} \right] \]

Solve for the lowest positive value of \(x\) if \(h = 0.4 h_0\).

8.22 You buy a $20,000 piece of equipment for nothing down and $4000 per year for 6 years. What interest rate are you paying? The formula relating present worth \(P\), annual payments \(A\), number of years \(n\), and interest rate \(i\) is

\[ A = P\, \frac{i (1 + i)^n}{(1 + i)^n - 1} \]

8.23 Many fields of engineering require accurate population estimates. For example, transportation engineers might find it necessary to determine separately the population growth trends of a city and adjacent suburb. The population of the urban area is declining with time according to

\[ P_u(t) = P_{u,\max} e^{-k_u t} + P_{u,\min} \]

while the suburban population is growing, as in

\[ P_s(t) = \frac{P_{s,\max}}{1 + \left[ P_{s,\max}/P_0 - 1 \right] e^{-k_s t}} \]

where \(P_{u,\max}\), \(k_u\), \(P_{s,\max}\), \(P_0\), and \(k_s\) = empirically derived parameters. Determine the time and corresponding values of \(P_u(t)\) and \(P_s(t)\) when the suburbs are 20% larger than the city. The parameter values are \(P_{u,\max} = 75{,}000\), \(k_u = 0.045\)/yr, \(P_{u,\min} = 100{,}000\) people, \(P_{s,\max} = 300{,}000\) people, \(P_0 = 10{,}000\) people, \(k_s = 0.08\)/yr. To obtain your solutions, use (a) graphical, (b) false-position, and (c) modified secant methods.

8.24 A simply supported beam is loaded as shown in Fig. P8.24. Using singularity functions, the shear along the beam can be expressed by the equation:

\[ V(x) = 20 \left[ \langle x - 0 \rangle^1 - \langle x - 5 \rangle^1 \right] - 15 \langle x - 8 \rangle^0 - 57 \]

By definition, the singularity function can be expressed as follows:

\[ \langle x - a \rangle^n = \begin{cases} (x - a)^n & \text{when } x > a \\ 0 & \text{when } x \le a \end{cases} \]

Use a numerical method to find the point(s) where the shear equals zero.

FIGURE P8.18 (a) Beam of length \(L\) under a distributed load that increases linearly to \(w_0\); (b) elastic curve between (x = 0, y = 0) and (x = L, y = 0).

FIGURE P8.24 (labels: 20 kips/ft, 150 kip-ft, 15 kips; spans 5', 2', 1', 2')
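The singularity-function definition in Prob. 8.24 translates almost directly into code. The following is a minimal Python sketch, not a solution method prescribed by the text; the helper names `singularity` and `shear` are hypothetical, and the sample points at the end are chosen only to show the piecewise behavior of the shear expression.

```python
def singularity(x, a, n):
    """Singularity function <x - a>^n: (x - a)^n when x > a, otherwise 0."""
    return float((x - a) ** n) if x > a else 0.0


def shear(x):
    """Shear V(x) from Prob. 8.24, written with singularity functions."""
    return (20.0 * (singularity(x, 0.0, 1) - singularity(x, 5.0, 1))
            - 15.0 * singularity(x, 8.0, 0)
            - 57.0)


# Evaluate the shear at a few illustrative (assumed) stations along the beam.
for x in (1.0, 2.85, 6.0, 9.0):
    print(f"x = {x:5.2f}  V(x) = {shear(x):8.3f}")
```

Scanning `shear` over a grid of x values and handing any sign-changing interval to a bracketing method is one possible way to locate the zero crossing numerically.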
  • 237. 8.25 Using the simply supported beam from Prob. 8.24, the moment along the beam, \(M(x)\), is given by:

\[ M(x) = -10 \left[ \langle x - 0 \rangle^2 - \langle x - 5 \rangle^2 \right] + 15 \langle x - 8 \rangle^1 + 150 \langle x - 7 \rangle^0 + 57x \]

Use a numerical method to find the point(s) where the moment equals zero.

8.26 Using the simply supported beam from Prob. 8.24, the slope along the beam is given by:

\[ \frac{du_y}{dx}(x) = -\frac{10}{3} \left[ \langle x - 0 \rangle^3 - \langle x - 5 \rangle^3 \right] + \frac{15}{2} \langle x - 8 \rangle^2 + 150 \langle x - 7 \rangle^1 + \frac{57}{2} x^2 - 238.25 \]

Use a numerical method to find the point(s) where the slope equals zero.

8.27 Using the simply supported beam from Prob. 8.24, the displacement along the beam is given by:

\[ u_y(x) = -\frac{5}{6} \left[ \langle x - 0 \rangle^4 - \langle x - 5 \rangle^4 \right] + \frac{15}{6} \langle x - 8 \rangle^3 + 75 \langle x - 7 \rangle^2 + \frac{57}{6} x^3 - 238.25x \]

(a) Find the point(s) where the displacement equals zero.
(b) How would you use a root location technique to determine the location of the minimum displacement?

8.28 Although we did not mention it in Sec. 8.2, Eq. (8.10) is actually an expression of electroneutrality; that is, that positive and negative charges must balance. This can be seen more clearly by expressing it as

\[ [\text{H}^+] = [\text{HCO}_3^-] + 2[\text{CO}_3^{2-}] + [\text{OH}^-] \]

In other words, the positive charges must equal the negative charges. Thus, when you compute the pH of a natural water body such as a lake, you must also account for other ions that may be present. For the case where these ions originate from nonreactive salts, the net negative minus positive charges due to these ions are lumped together in a quantity called alkalinity, and the equation is reformulated as

\[ \text{Alk} + [\text{H}^+] = [\text{HCO}_3^-] + 2[\text{CO}_3^{2-}] + [\text{OH}^-] \tag{P8.28} \]

where Alk = alkalinity (eq/L). For example, the alkalinity of Lake Superior is approximately \(0.4 \times 10^{-3}\) eq/L. Perform the same calculations as in Sec. 8.2 to compute the pH of Lake Superior in 2008. Assume that just like the raindrops, the lake is in equilibrium with atmospheric CO\(_2\), but account for the alkalinity as in Eq. (P8.28).

Electrical Engineering

8.29 Perform the same computation as in Sec. 8.3, but determine the value of \(L\) required for the circuit to dissipate to 1% of its original value in \(t = 0.05\) s, given \(R = 280\ \Omega\) and \(C = 10^{-4}\) F. Use (a) a graphical approach, (b) bisection, and (c) root location software such as the Excel Solver, the MATLAB function fzero, or the Mathcad function root.

8.30 An oscillating current in an electric circuit is described by \(i = 9 e^{-t} \sin(2\pi t)\), where \(t\) is in seconds. Determine the lowest value of \(t\) such that \(i = 3.5\).

8.31 The resistivity \(\rho\) of doped silicon is based on the charge \(q\) on an electron, the electron density \(n\), and the electron mobility \(\mu\). The electron density is given in terms of the doping density \(N\) and the intrinsic carrier density \(n_i\). The electron mobility is described by the temperature \(T\), the reference temperature \(T_0\), and the reference mobility \(\mu_0\). The equations required to compute the resistivity are

\[ \rho = \frac{1}{q n \mu} \]

where

\[ n = \frac{1}{2} \left( N + \sqrt{N^2 + 4 n_i^2} \right) \qquad \text{and} \qquad \mu = \mu_0 \left( \frac{T}{T_0} \right)^{-2.42} \]

Determine \(N\), given \(T_0 = 300\) K, \(T = 1000\) K, \(\mu_0 = 1300\ \text{cm}^2\,(\text{V s})^{-1}\), \(q = 1.6 \times 10^{-19}\) C, \(n_i = 6.21 \times 10^{9}\ \text{cm}^{-3}\), and a desired \(\rho = 6 \times 10^{6}\) V s cm/C. Use (a) bisection and (b) the modified secant method.

8.32 A total charge \(Q\) is uniformly distributed around a ring-shaped conductor with radius \(a\). A charge \(q\) is located at a distance \(x\) from the center of the ring (Fig. P8.32). The force exerted on the charge by the ring is given by

\[ F = \frac{1}{4 \pi e_0} \frac{q Q x}{(x^2 + a^2)^{3/2}} \]

where \(e_0 = 8.85 \times 10^{-12}\ \text{C}^2/(\text{N m}^2)\). Find the distance \(x\) where the force is 1 N if \(q\) and \(Q\) are \(2 \times 10^{-5}\) C for a ring with a radius of 0.9 m.

FIGURE P8.32 (labels: x, a, Q, q)
  • 238. 8.33 Figure P8.33 shows a circuit with a resistor, an inductor, and a capacitor in parallel. Kirchhoff's rules can be used to express the impedance of the system as

\[ \frac{1}{Z} = \sqrt{\frac{1}{R^2} + \left( \omega C - \frac{1}{\omega L} \right)^2} \]

where \(Z\) = impedance (\(\Omega\)) and \(\omega\) = the angular frequency. Find the \(\omega\) that results in an impedance of 75 \(\Omega\) using both bisection and false position with initial guesses of 1 and 1000 for the following parameters: \(R = 225\ \Omega\), \(C = 0.6 \times 10^{-6}\) F, and \(L = 0.5\) H. Determine how many iterations of each technique are necessary to determine the answer to \(\varepsilon_s = 0.1\%\). Use the graphical approach to explain any difficulties that arise.

FIGURE P8.33 (labels: R, L, C)

Mechanical and Aerospace Engineering

8.34 Beyond the Colebrook equation, other relationships, such as the Fanning friction factor \(f\), are available to estimate friction in pipes. The Fanning friction factor is dependent on a number of parameters related to the size of the pipe and the fluid, which can all be represented by another dimensionless quantity, the Reynolds number Re. A formula that predicts \(f\) given Re is the von Karman equation,

\[ \frac{1}{\sqrt{f}} = 4 \log_{10} \left( \text{Re} \sqrt{f} \right) - 0.4 \]

Typical values for the Reynolds number for turbulent flow are 10,000 to 500,000 and for the Fanning friction factor are 0.001 to 0.01. Develop a function that uses bisection to solve for \(f\) given a user-supplied value of Re between 2500 and 1,000,000. Design the function so that it ensures that the absolute error in the result is \(E_{a,d} < 0.000005\).

8.35 Real mechanical systems may involve the deflection of nonlinear springs. In Fig. P8.35, a mass \(m\) is released a distance \(h\) above a nonlinear spring. The resistance force \(F\) of the spring is given by

\[ F = -\left( k_1 d + k_2 d^{3/2} \right) \]

Conservation of energy can be used to show that

\[ 0 = \frac{2 k_2 d^{5/2}}{5} + \frac{1}{2} k_1 d^2 - m g d - m g h \]

Solve for \(d\), given the following parameter values: \(k_1 = 40{,}000\ \text{g/s}^2\), \(k_2 = 40\ \text{g/(s}^2\,\text{m}^{0.5})\), \(m = 95\) g, \(g = 9.81\ \text{m/s}^2\), and \(h = 0.43\) m.

FIGURE P8.35 (labels: (a) h; (b) d, h + d)

8.36 Mechanical engineers, as well as most other engineers, use thermodynamics extensively in their work. The following polynomial can be used to relate the zero-pressure specific heat of dry air, \(c_p\) kJ/(kg K), to temperature (K):

\[ c_p = 0.99403 + 1.671 \times 10^{-4}\, T + 9.7215 \times 10^{-8}\, T^2 - 9.5838 \times 10^{-11}\, T^3 + 1.9520 \times 10^{-14}\, T^4 \]

Determine the temperature that corresponds to a specific heat of 1.2 kJ/(kg K).

8.37 Aerospace engineers sometimes compute the trajectories of projectiles like rockets. A related problem deals with the trajectory of a thrown ball. The trajectory of a ball is defined by the (x, y) coordinates, as displayed in Fig. P8.37. The trajectory can be modeled as

\[ y = (\tan \theta_0)\, x - \frac{g}{2 v_0^2 \cos^2 \theta_0}\, x^2 + y_0 \]

Find the appropriate initial angle \(\theta_0\), if the initial velocity \(v_0 = 20\) m/s and the distance to the catcher \(x\) is 40 m. Note that the ball leaves the thrower's hand at an elevation of \(y_0 = 1.8\) m and the catcher receives it at 1 m. Express the final result in degrees. Use a value of 9.81 m/s\(^2\) for \(g\) and employ the graphical method to develop your initial guesses.

FIGURE P8.37 (labels: \(\theta_0\), \(v_0\), y, x)
  • 239. 8.38 The general form for a three-dimensional stress field is given by

\[ \begin{bmatrix} \sigma_{xx} & \sigma_{xy} & \sigma_{xz} \\ \sigma_{xy} & \sigma_{yy} & \sigma_{yz} \\ \sigma_{xz} & \sigma_{yz} & \sigma_{zz} \end{bmatrix} \]

where the diagonal terms represent tensile or compressive stresses and the off-diagonal terms represent shear stresses. A stress field (in MPa) is given by

\[ \begin{bmatrix} 10 & 14 & 25 \\ 14 & 7 & 15 \\ 25 & 15 & 16 \end{bmatrix} \]

To solve for the principal stresses, it is necessary to construct the following matrix (again in MPa):

\[ \begin{bmatrix} 10 - \sigma & 14 & 25 \\ 14 & 7 - \sigma & 15 \\ 25 & 15 & 16 - \sigma \end{bmatrix} \]

\(\sigma_1\), \(\sigma_2\), and \(\sigma_3\) can be solved from the equation

\[ \sigma^3 - I \sigma^2 + II \sigma - III = 0 \]

where

\[ I = \sigma_{xx} + \sigma_{yy} + \sigma_{zz} \]
\[ II = \sigma_{xx}\sigma_{yy} + \sigma_{xx}\sigma_{zz} + \sigma_{yy}\sigma_{zz} - \sigma_{xy}^2 - \sigma_{xz}^2 - \sigma_{yz}^2 \]
\[ III = \sigma_{xx}\sigma_{yy}\sigma_{zz} - \sigma_{xx}\sigma_{yz}^2 - \sigma_{yy}\sigma_{xz}^2 - \sigma_{zz}\sigma_{xy}^2 + 2\sigma_{xy}\sigma_{xz}\sigma_{yz} \]

\(I\), \(II\), and \(III\) are known as the stress invariants. Find \(\sigma_1\), \(\sigma_2\), and \(\sigma_3\) using a root-finding technique.

8.39 The upward velocity of a rocket can be computed by the following formula:

\[ v = u \ln \frac{m_0}{m_0 - qt} - gt \]

where \(v\) = upward velocity, \(u\) = the velocity at which fuel is expelled relative to the rocket, \(m_0\) = the initial mass of the rocket at time \(t = 0\), \(q\) = the fuel consumption rate, and \(g\) = the downward acceleration of gravity (assumed constant = 9.81 m/s\(^2\)). If \(u = 2200\) m/s, \(m_0 = 160{,}000\) kg, and \(q = 2680\) kg/s, compute the time at which \(v = 1000\) m/s. (Hint: \(t\) is somewhere between 10 and 50 s.) Determine your result so that it is within 1% of the true value. Check your answer.

8.40 The phase angle \(\phi\) between the forced vibration caused by the rough road and the motion of the car is given by

\[ \tan \phi = \frac{2 (c/c_c)(\omega/p)}{1 - (\omega/p)^2} \]

As a mechanical engineer, you would like to know if there are cases where \(\phi = \omega/2 - 1\). Use the other parameters from the section to set up the equation as a roots problem and solve for \(\omega\).

8.41 Two fluids at different temperatures enter a mixer and come out at the same temperature. The heat capacity of fluid A is given by:

\[ c_p = 3.381 + 1.804 \times 10^{-2}\, T - 4.300 \times 10^{-6}\, T^2 \]

and the heat capacity of fluid B is given by:

\[ c_p = 8.592 + 1.290 \times 10^{-1}\, T - 4.078 \times 10^{-5}\, T^2 \]

where \(c_p\) is in units of cal/mol K, and \(T\) is in units of K. Note that

\[ \Delta H = \int_{T_1}^{T_2} c_p\, dT \]

A enters the mixer at 400°C. B enters the mixer at 600°C. There is twice as much A as there is B entering into the mixer. At what temperature do the two fluids exit the mixer?

8.42 A compressor is operating at compression ratio \(R_c\) of 3.0 (the pressure of gas at the outlet is three times greater than the pressure of the gas at the inlet). The power requirements of the compressor \(HP\) can be determined from the equation below. Assuming that the power requirements of the compressor are exactly equal to \(zRT_1/MW\), find the polytropic efficiency \(n\) of the compressor. The parameter \(z\) is compressibility of the gas under operating conditions of the compressor, \(R\) is the gas constant, \(T_1\) is the temperature of the gas at the compressor inlet, and \(MW\) is the molecular weight of the gas.

\[ HP = \frac{z R T_1}{MW} \frac{n}{n - 1} \left( R_c^{(n-1)/n} - 1 \right) \]

8.43 In the thermos shown in Fig. P8.43, the innermost compartment is separated from the middle container by a vacuum. There is a final shell around the thermos. This final shell is separated from the middle layer by a thin layer of air. The outside of the final shell comes in contact with room air. Heat transfer from the inner compartment to the next layer \(q_1\) is by radiation only (since the space is evacuated). Heat transfer between the middle layer and outside shell \(q_2\) is by convection in a small space. Heat transfer from the outside shell to the air \(q_3\) is by natural convection. The heat flux from each region of the thermos must be equal; that is, \(q_1 = q_2 = q_3\). Find the temperatures \(T_1\) and \(T_2\) at steady state. \(T_0\) is 500°C and \(T_3 = 25\)°C.

\[ q_1 = 10^{-9} \left[ (T_0 + 273)^4 - (T_1 + 273)^4 \right] \]
\[ q_2 = 4 (T_1 - T_2) \]
\[ q_3 = 1.3 (T_2 - T_3)^{4/3} \]
  • 240. FIGURE P8.43 (labels: \(T_0\), \(T_1\), \(T_2\), \(T_3\))

8.44 Figure P8.44 shows three reservoirs connected by circular pipes. The pipes, which are made of asphalt-dipped cast iron (\(\varepsilon = 0.0012\) m), have the following characteristics:

Pipe            1       2       3
Length, m       1800    500     1400
Diameter, m     0.4     0.25    0.2
Flow, m^3/s     ?       0.1     ?

If the water surface elevations in Reservoirs A and C are 200 and 172.5 m, respectively, determine the elevation in Reservoir B and the flows in pipes 1 and 3. Note that the kinematic viscosity of water is \(1 \times 10^{-6}\ \text{m}^2\)/s and use the Colebrook equation to determine the friction factor (recall Prob. 8.13).

FIGURE P8.44 (labels: reservoirs A, B, C; pipes 1, 2, 3; flows \(Q_1\), \(Q_2\), \(Q_3\); elevations \(h_1\), \(h_2\), \(h_3\))

8.45 A fluid is pumped into the network of pipes shown in Fig. P8.45. At steady state, the following flow balances must hold,

\[ Q_1 = Q_2 + Q_3 \]
\[ Q_3 = Q_4 + Q_5 \]
\[ Q_5 = Q_6 + Q_7 \]

where \(Q_i\) = flow in pipe \(i\) (m\(^3\)/s). In addition, the pressure drops around the three right-hand loops must equal zero. The pressure drop in each circular pipe length can be computed with

\[ \Delta P = \frac{16}{\pi^2} \frac{f L \rho}{2 D^5} Q^2 \]

where \(\Delta P\) = the pressure drop (Pa), \(f\) = the friction factor (dimensionless), \(L\) = the pipe length (m), \(\rho\) = the fluid density (kg/m\(^3\)), and \(D\) = pipe diameter (m). Write a program (or develop an algorithm in a mathematics software package) that will allow you to compute the flow in every pipe length given that \(Q_1 = 1\ \text{m}^3\)/s and \(\rho = 1.23\ \text{kg/m}^3\). All the pipes have \(D = 500\) mm and \(f = 0.005\). The pipe lengths are: \(L_3 = L_5 = L_8 = L_9 = 2\) m; \(L_2 = L_4 = L_6 = 4\) m; and \(L_7 = 8\) m.

FIGURE P8.45 (labels: flows \(Q_1\) through \(Q_{10}\))

8.46 Repeat Prob. 8.45, but incorporate the fact that the friction factor can be computed with the von Karman equation,

\[ \frac{1}{\sqrt{f}} = 4 \log_{10} \left( \text{Re} \sqrt{f} \right) - 0.4 \]

where Re = the Reynolds number

\[ \text{Re} = \frac{\rho V D}{\mu} \]

where \(V\) = the velocity of the fluid in the pipe (m/s) and \(\mu\) = dynamic viscosity (N · s/m\(^2\)). Note that for a circular pipe
  • 241. \(V = 4Q/\pi D^2\). Also, assume that the fluid has a viscosity of \(1.79 \times 10^{-5}\) N · s/m\(^2\).

8.47 The space shuttle, at lift-off from the launch pad, has four forces acting on it, which are shown on the free-body diagram (Fig. P8.47). The combined weight of the two solid rocket boosters and external fuel tank is \(W_B = 1.663 \times 10^6\) lb. The weight of the orbiter with a full payload is \(W_S = 0.23 \times 10^6\) lb. The combined thrust of the two solid rocket boosters is \(T_B = 5.30 \times 10^6\) lb. The combined thrust of the three liquid fuel orbiter engines is \(T_S = 1.125 \times 10^6\) lb.

At liftoff, the orbiter engine thrust is directed at angle \(\theta\) to make the resultant moment acting on the entire craft assembly (external tank, solid rocket boosters, and orbiter) equal to zero. With the resultant moment equal to zero, the craft will not rotate about its mass center G at liftoff. With these forces, the craft will have a resultant force with components in both the vertical and horizontal direction. The vertical resultant force component is what allows the craft to lift off from the launch pad and fly vertically. The horizontal resultant force component causes the craft to fly horizontally. The resultant moment acting on the craft will be zero when \(\theta\) is adjusted to the proper value. If this angle is not adjusted properly, and there is some resultant moment acting on the craft, the craft will tend to rotate about its mass center.

FIGURE P8.47 (labels: external tank, solid rocket booster, orbiter; dimensions 38', 4', 28'; forces \(W_B\), \(W_S\), \(T_S\), \(T_B\); angle \(\theta\); mass center G)

(a) Resolve the orbiter thrust \(T_S\) into horizontal and vertical components, and then sum moments about point G, the craft mass center. Set the resulting moment equation equal to zero. This equation can now be solved for the value of \(\theta\) required for liftoff.
(b) Derive an equation for the resultant moment acting on the craft in terms of the angle \(\theta\). Plot the resultant moment as a function of the angle \(\theta\) over a range of \(-5\) radians to \(+5\) radians.
(c) Write a computer program to solve for the angle \(\theta\) using Newton's method to find the root of the resultant moment equation. Make an initial first guess at the root of interest using the plot. Terminate your iterations when the value of \(\theta\) has better than five significant figures.
(d) Repeat the program for the minimum payload weight of the orbiter of \(W_S = 195{,}000\) lb.

8.48 Determining the velocity of particles settling through fluids is of great importance to many areas of engineering and science. Such calculations depend on the flow regime as represented by the dimensionless Reynolds number,

\[ \text{Re} = \frac{\rho d v}{\mu} \tag{P8.48.1} \]

where \(\rho\) = the fluid's density (kg/m\(^3\)), \(d\) = the particle diameter (m), \(v\) = the particle's settling velocity (m/s), and \(\mu\) = the fluid's dynamic viscosity (N s/m\(^2\)). Under laminar conditions (Re < 0.1), the settling velocity of a spherical particle can be computed with the following formula based on Stokes law,

\[ v = \frac{g}{18} \left( \frac{\rho_s - \rho}{\mu} \right) d^2 \tag{P8.48.2} \]

where \(g\) = the gravitational constant (= 9.81 m/s\(^2\)), and \(\rho_s\) = the particle's density (kg/m\(^3\)). For turbulent conditions (i.e., higher
  • 242. Reynolds numbers), an alternative approach can be used based on the following formula:

\[ v = \sqrt{\frac{4 g (\rho_s - \rho) d}{3 C_D \rho}} \tag{P8.48.3} \]

where \(C_D\) = the drag coefficient, which depends on the Reynolds number as in

\[ C_D = \frac{24}{\text{Re}} + \frac{3}{\sqrt{\text{Re}}} + 0.34 \tag{P8.48.4} \]

(a) Combine Eqs. (P8.48.2), (P8.48.3), and (P8.48.4) to express the determination of \(v\) as a roots of equations problem. That is, express the combined formula in the format \(f(v) = 0\).
(b) Use the modified secant method with \(\delta = 10^{-3}\) and \(\varepsilon_s = 0.05\%\) to determine \(v\) for a spherical iron particle settling in water, where \(d = 200\ \mu\text{m}\), \(\rho = 1\ \text{g/cm}^3\), \(\rho_s = 7.874\ \text{g/cm}^3\), and \(\mu = 0.014\) g/(cm·s). Employ Eq. (P8.48.2) to generate your initial guess.
(c) Based on the result of (b), compute the Reynolds number and the drag coefficient, and use the latter to confirm that the flow regime is not laminar.
(d) Develop a fixed-point iteration solution for the conditions outlined in (b).
(e) Use a graphical approach to illustrate that the formulation developed in (d) will converge for any positive guess.
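One way to set up Prob. 8.48 as a roots problem is sketched below in Python. This is only an illustration under stated assumptions, not the book's worked solution: it forms the residual by substituting the Reynolds number of Eq. (P8.48.1) and the drag coefficient of Eq. (P8.48.4) into Eq. (P8.48.3), works in CGS units (so g is taken as 981 cm/s² and 200 μm as 0.02 cm, both conversions being assumptions of this sketch), and uses hypothetical function names throughout.

```python
import math

# Parameter values from part (b), converted to CGS units (assumed conversion).
G = 981.0       # gravitational acceleration, cm/s^2
RHO = 1.0       # fluid density, g/cm^3
RHO_S = 7.874   # particle density, g/cm^3
MU = 0.014      # dynamic viscosity, g/(cm s)
D = 0.02        # particle diameter, cm (200 micrometres)


def reynolds(v):
    """Reynolds number, Eq. (P8.48.1)."""
    return RHO * D * v / MU


def drag_coefficient(re):
    """Drag coefficient, Eq. (P8.48.4)."""
    return 24.0 / re + 3.0 / math.sqrt(re) + 0.34


def stokes_velocity():
    """Laminar settling velocity, Eq. (P8.48.2), used as the initial guess."""
    return G / 18.0 * (RHO_S - RHO) / MU * D ** 2


def residual(v):
    """f(v) = v minus the right-hand side of Eq. (P8.48.3); a root gives v."""
    cd = drag_coefficient(reynolds(v))
    return v - math.sqrt(4.0 * G * (RHO_S - RHO) * D / (3.0 * cd * RHO))


def modified_secant(f, x0, delta=1e-3, es=0.05, max_iter=50):
    """Modified secant iteration with fractional perturbation delta.

    Stops when the approximate percent relative error drops to es or below.
    """
    x = x0
    for iteration in range(1, max_iter + 1):
        fx = f(x)
        x_new = x - delta * x * fx / (f(x + delta * x) - fx)
        ea = abs((x_new - x) / x_new) * 100.0
        x = x_new
        if ea <= es:
            return x, iteration
    raise RuntimeError("modified secant did not converge")


v_guess = stokes_velocity()
v_settle, n_iter = modified_secant(residual, v_guess)
print(f"Stokes guess: {v_guess:.4f} cm/s")
print(f"Settling velocity: {v_settle:.4f} cm/s after {n_iter} iterations")
print(f"Re = {reynolds(v_settle):.3f}, CD = {drag_coefficient(reynolds(v_settle)):.3f}")
```

Writing the residual as v minus the right-hand side keeps it well defined for any positive v, which is convenient when the Stokes estimate overshoots the turbulent-regime answer.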
  • 243. PT2.4 TRADE-OFFS

Table PT2.3 provides a summary of the trade-offs involved in solving for roots of algebraic and transcendental equations. Although graphical methods are time-consuming, they provide insight into the behavior of the function and are useful in identifying initial guesses and potential problems such as multiple roots. Therefore, if time permits, a quick sketch (or better yet, a computerized graph) can yield valuable information regarding the behavior of the function.

The numerical methods themselves are divided into two general categories: bracketing and open methods. The former require two initial guesses that are on either side of a root. This "bracketing" is maintained as the solution proceeds, and thus, these techniques are always convergent. However, a price is paid for this property in that the rate of convergence is relatively slow.

TABLE PT2.3 Comparison of the characteristics of alternative methods for finding roots of algebraic and transcendental equations. The comparisons are based on general experience and do not account for the behavior of specific functions.
Method | Type | Guesses | Convergence | Stability | Programming | Comments
Direct | Analytical | - | - | - | |
Graphical | Visual | - | - | - | - | Imprecise
Bisection | Bracketing | 2 | Slow | Always | Easy |
False-position | Bracketing | 2 | Slow/medium | Always | Easy |
Modified FP | Bracketing | 2 | Medium | Always | Easy |
Fixed-point iteration | Open | 1 | Slow | Possibly divergent | Easy |
Newton-Raphson | Open | 1 | Fast | Possibly divergent | Easy | Requires evaluation of f'(x)
Modified Newton-Raphson | Open | 1 | Fast (multiple), medium (single) | Possibly divergent | Easy | Requires evaluation of f'(x) and f''(x)
Secant | Open | 2 | Medium/fast | Possibly divergent | Easy | Initial guesses do not have to bracket the root
Modified secant | Open | 1 | Medium/fast | Possibly divergent | Easy |
Brent | Hybrid | 1 or 2 | Medium | Always (for 2 guesses) | Moderate | Robust
Müller | Polynomials | 2 | Medium/fast | Possibly divergent | Moderate |
Bairstow | Polynomials | 2 | Fast | Possibly divergent | Moderate |

EPILOGUE: PART TWO
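The "always convergent but slow" behavior attributed to bracketing methods can be seen in a short bisection sketch. The Python below is a minimal illustration rather than the book's own program; it borrows the spherical-tank relation of Prob. 8.9 (r = 1 m, V = 0.5 m³) as a test function, and the function names and the default tolerance are assumptions of this sketch.

```python
import math


def bisection(f, xl, xu, es=1e-4, max_iter=100):
    """Bisection with a percent relative error stopping criterion.

    xl and xu must bracket a root, i.e. f(xl) and f(xu) differ in sign.
    Returns the root estimate and the number of iterations used.
    """
    fl = f(xl)
    if fl * f(xu) > 0.0:
        raise ValueError("initial guesses do not bracket a root")
    xr = xl
    for iteration in range(1, max_iter + 1):
        xr_old = xr
        xr = (xl + xu) / 2.0
        fr = f(xr)
        if fl * fr < 0.0:
            xu = xr              # root lies in the lower half
        elif fl * fr > 0.0:
            xl, fl = xr, fr      # root lies in the upper half
        else:
            return xr, iteration  # landed exactly on the root
        if iteration > 1 and xr != 0.0:
            ea = abs((xr - xr_old) / xr) * 100.0
            if ea <= es:
                return xr, iteration
    return xr, max_iter


def tank_residual(h, r=1.0, volume=0.5):
    """Spherical-tank relation of Prob. 8.9 written as f(h) = 0."""
    return math.pi * h ** 2 * (3.0 * r - h) / 3.0 - volume


depth, n_iter = bisection(tank_residual, 0.0, 2.0)
print(f"h = {depth:.6f} m after {n_iter} iterations")
```

Because each pass halves the bracket, the iteration count depends only on the starting interval and the tolerance, which is the price in speed that the text describes.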
  • 244. Open techniques differ from bracketing methods in that they use information at a single point (or two values that need not bracket the root) to extrapolate to a new root estimate. This property is a double-edged sword. Although it leads to quicker convergence, it also allows the possibility that the solution may diverge. In general, the convergence of open techniques is partially dependent on the quality of the initial guess and the nature of the function. The closer the guess is to the true root, the more likely the methods will converge.

Of the open techniques, the standard Newton-Raphson method is often used because of its property of quadratic convergence. However, its major shortcoming is that it requires the derivative of the function be obtained analytically. For some functions this is impractical. In these cases, the secant method, which employs a finite-difference representation of the derivative, provides a viable alternative. Because of the finite-difference approximation, the rate of convergence of the secant method is initially slower than for the Newton-Raphson method. However, as the root estimate is refined, the difference approximation becomes a better representation of the true derivative, and convergence accelerates rapidly. The modified Newton-Raphson technique can be used to attain rapid convergence for multiple roots. However, this technique requires an analytical expression for both the first and second derivatives.

Of particular interest are hybrid methods that combine the reliability of bracketing with the speed of open methods. Brent's method does this by combining bisection with several open methods.

All the methods are easy to moderately difficult to program on computers and require minimal time to determine a single root. On this basis, you might conclude that simple methods such as bisection would be good enough for practical purposes.
This would be true if you were exclusively interested in determining the root of an equation once. However, there are many cases in engineering where numerous root locations are required and where speed becomes important. For these cases, slow methods are very time-consuming and, hence, costly. On the other hand, the fast open methods may diverge, and the accompanying delays can also be costly. Some computer algorithms attempt to capitalize on the strong points of both classes of techniques by initially employing a bracketing method to approach the root, then switching to an open method to rapidly refine the estimate. Whether a single approach or a combination is used, the trade-offs between convergence and speed are at the heart of the choice of a root-location technique.

PT2.5 IMPORTANT RELATIONSHIPS AND FORMULAS

Table PT2.4 summarizes important information that was presented in Part Two. This table can be consulted to quickly access important relationships and formulas.

PT2.6 ADVANCED METHODS AND ADDITIONAL REFERENCES

The methods in this text have focused on determining a single real root of an algebraic or transcendental equation based on foreknowledge of its approximate location. In addition, we have also described methods expressly designed to determine both the real
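  • 245. and complex roots of polynomials. Additional references on the subject are Ralston and Rabinowitz (1978) and Carnahan, Luther, and Wilkes (1969).

TABLE PT2.4 Summary of important information presented in Part Two.

Bracketing methods:

Bisection
Formulation:
\[ x_r = \frac{x_l + x_u}{2} \]
If \(f(x_l) f(x_r) < 0\), \(x_u = x_r\); if \(f(x_l) f(x_r) > 0\), \(x_l = x_r\).
Graphical interpretation: the bracket between \(x_l\) and \(x_u\), of width \(L\), is halved to \(L/2\), \(L/4\), and so on around the root.
Stopping criterion:
\[ \left| \frac{x_r^{\text{new}} - x_r^{\text{old}}}{x_r^{\text{new}}} \right| 100\% \le \varepsilon_s \]

False position
Formulation:
\[ x_r = x_u - \frac{f(x_u)(x_l - x_u)}{f(x_l) - f(x_u)} \]
If \(f(x_l) f(x_r) < 0\), \(x_u = x_r\); if \(f(x_l) f(x_r) > 0\), \(x_l = x_r\).
Graphical interpretation: the chord joining the function values at \(x_l\) and \(x_u\) locates \(x_r\).
Stopping criterion:
\[ \left| \frac{x_r^{\text{new}} - x_r^{\text{old}}}{x_r^{\text{new}}} \right| 100\% \le \varepsilon_s \]

Newton-Raphson
Formulation:
\[ x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)} \]
Graphical interpretation: the tangent at \(x_i\) locates \(x_{i+1}\).
Stopping criterion:
\[ \left| \frac{x_{i+1} - x_i}{x_{i+1}} \right| 100\% \le \varepsilon_s \]
Error: \(E_{i+1} = O(E_i^2)\)

Secant
Formulation:
\[ x_{i+1} = x_i - \frac{f(x_i)(x_{i-1} - x_i)}{f(x_{i-1}) - f(x_i)} \]
Graphical interpretation: the line through the function values at \(x_{i-1}\) and \(x_i\) locates \(x_{i+1}\).
Stopping criterion:
\[ \left| \frac{x_{i+1} - x_i}{x_{i+1}} \right| 100\% \le \varepsilon_s \]

In addition to Müller's and Bairstow's methods, several techniques are available to determine all the roots of polynomials. In particular, the quotient difference (QD) algorithm (Henrici, 1964, and Gerald and Wheatley, 2004) determines all roots without initial guesses. Ralston and Rabinowitz (1978) and Carnahan, Luther, and Wilkes (1969)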
  • 246. contain discussions of this method as well as of other techniques for locating roots of polynomials. As discussed in the text, the Jenkins-Traub and Laguerre's methods are widely employed.

In summary, the foregoing is intended to provide you with avenues for deeper exploration of the subject. Additionally, all the above references provide descriptions of the basic techniques covered in Part Two. We urge you to consult these alternative sources to broaden your understanding of numerical methods for root location.¹

¹ Books are referenced only by author here; a complete bibliography will be found at the back of this text.
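The Newton-Raphson and secant formulas summarized in Table PT2.4 differ only in how the slope is obtained, and a side-by-side run makes the trade-off discussed in Sec. PT2.4 concrete. The Python sketch below is illustrative only: it uses the bacteria-concentration expression of Prob. 8.16 as a test function, the starting values 10 and 11 for the secant method are arbitrary assumptions, and the function names are hypothetical.

```python
import math


def newton_raphson(f, df, x0, es=0.01, max_iter=50):
    """Newton-Raphson: x_{i+1} = x_i - f(x_i) / f'(x_i)."""
    x = x0
    for iteration in range(1, max_iter + 1):
        x_new = x - f(x) / df(x)
        ea = abs((x_new - x) / x_new) * 100.0   # percent relative error
        x = x_new
        if ea <= es:
            return x, iteration
    raise RuntimeError("Newton-Raphson did not converge")


def secant(f, x_prev, x_curr, es=0.01, max_iter=50):
    """Secant: the derivative is replaced by a backward finite difference."""
    for iteration in range(1, max_iter + 1):
        f_prev, f_curr = f(x_prev), f(x_curr)
        x_new = x_curr - f_curr * (x_prev - x_curr) / (f_prev - f_curr)
        ea = abs((x_new - x_curr) / x_new) * 100.0
        x_prev, x_curr = x_curr, x_new
        if ea <= es:
            return x_curr, iteration
    raise RuntimeError("secant method did not converge")


def f(t):
    """Concentration of Prob. 8.16 minus the target level of 9."""
    return 70.0 * math.exp(-1.5 * t) + 25.0 * math.exp(-0.075 * t) - 9.0


def df(t):
    """Analytical derivative of f, needed only by Newton-Raphson."""
    return -105.0 * math.exp(-1.5 * t) - 1.875 * math.exp(-0.075 * t)


t_newton, n_newton = newton_raphson(f, df, 10.0)
t_secant, n_secant = secant(f, 10.0, 11.0)
print(f"Newton-Raphson: t = {t_newton:.5f} in {n_newton} iterations")
print(f"Secant:         t = {t_secant:.5f} in {n_secant} iterations")
```

The secant routine needs no derivative function at all, which is the practical advantage the text points to when an analytical derivative is hard to obtain.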
  • 248. LINEAR ALGEBRAIC EQUATIONS

PT3.1 MOTIVATION

In Part Two, we determined the value \(x\) that satisfied a single equation, \(f(x) = 0\). Now, we deal with the case of determining the values \(x_1, x_2, \ldots, x_n\) that simultaneously satisfy a set of equations

\[ \begin{aligned} f_1(x_1, x_2, \ldots, x_n) &= 0 \\ f_2(x_1, x_2, \ldots, x_n) &= 0 \\ &\;\;\vdots \\ f_n(x_1, x_2, \ldots, x_n) &= 0 \end{aligned} \]

Such systems can be either linear or nonlinear. In Part Three, we deal with linear algebraic equations that are of the general form

\[ \begin{aligned} a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n &= b_1 \\ a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n &= b_2 \\ &\;\;\vdots \\ a_{n1} x_1 + a_{n2} x_2 + \cdots + a_{nn} x_n &= b_n \end{aligned} \tag{PT3.1} \]

where the \(a\)'s are constant coefficients, the \(b\)'s are constants, and \(n\) is the number of equations. All other equations are nonlinear. Nonlinear systems were discussed in Chap. 6 and will be covered briefly again in Chap. 9.

PT3.1.1 Noncomputer Methods for Solving Systems of Equations

For small numbers of equations (\(n \le 3\)), linear (and sometimes nonlinear) equations can be solved readily by simple techniques. Some of these methods will be reviewed at the beginning of Chap. 9. However, for four or more equations, solutions become arduous and computers must be utilized. Historically, the inability to solve all but the smallest sets of equations by hand has limited the scope of problems addressed in many engineering applications.

Before computers, techniques to solve linear algebraic equations were time-consuming and awkward. These approaches placed a constraint on creativity because the methods were often difficult to implement and understand. Consequently, the techniques were sometimes overemphasized at the expense of other aspects of the problem-solving process such as formulation and interpretation (recall Fig. PT1.1 and accompanying discussion).
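To make the general form of Eq. (PT3.1) concrete, the short Python sketch below stores the a's and b's of a small system and checks whether a candidate set of x's satisfies every equation simultaneously. The 3 × 3 coefficients and the candidate solution are illustrative numbers chosen for this sketch, not values taken from this section, and the function name `residuals` is hypothetical.

```python
def residuals(a, b, x):
    """Return a_i1*x_1 + ... + a_in*x_n - b_i for each equation i of Eq. (PT3.1)."""
    return [
        sum(a_ij * x_j for a_ij, x_j in zip(row, x)) - b_i
        for row, b_i in zip(a, b)
    ]


# Illustrative (assumed) coefficients a_ij and right-hand-side constants b_i.
a = [
    [3.0, -0.1, -0.2],
    [0.1, 7.0, -0.3],
    [0.3, -0.2, 10.0],
]
b = [7.85, -19.3, 71.4]

# Candidate values of x_1, x_2, x_3 to be checked against all three equations.
x = [3.0, -2.5, 7.0]

for i, r in enumerate(residuals(a, b, x), start=1):
    print(f"equation {i}: residual = {r:.3e}")
```

A vector x solves the system only when all n residuals vanish together, which is what distinguishes this problem from the single-equation root location of Part Two.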
  • 249. The advent of easily accessible computers makes it possible and practical for you to solve large sets of simultaneous linear algebraic equations. Thus, you can approach more complex and realistic examples and problems. Furthermore, you will have more time to test your creative skills because you will be able to place more emphasis on problem formulation and solution interpretation.

PT3.1.2 Linear Algebraic Equations and Engineering Practice

Many of the fundamental equations of engineering are based on conservation laws (recall Table 1.1). Some familiar quantities that conform to such laws are mass, energy, and momentum. In mathematical terms, these principles lead to balance or continuity equations that relate system behavior as represented by the levels or response of the quantity being modeled to the properties or characteristics of the system and the external stimuli or forcing functions acting on the system.

As an example, the principle of mass conservation can be used to formulate a model for a series of chemical reactors (Fig. PT3.1a). For this case, the quantity being modeled is the mass of the chemical in each reactor. The system properties are the reaction characteristics of the chemical and the reactors' sizes and flow rates. The forcing functions are the feed rates of the chemical into the system.

FIGURE PT3.1 Two types of systems that can be modeled using linear algebraic equations: (a) lumped variable system that involves coupled finite components and (b) distributed variable system that involves a continuum.

In Part Two, you saw how single-component systems result in a single equation that can be solved using root-location techniques. Multicomponent systems result in a coupled set of mathematical equations that must be solved simultaneously. The equations are
  • 250. coupled because the individual parts of the system are influenced by other parts. For example, in Fig. PT3.1a, reactor 4 receives chemical inputs from reactors 2 and 3. Consequently, its response is dependent on the quantity of chemical in these other reactors. When these dependencies are expressed mathematically, the resulting equations are often of the linear algebraic form of Eq. (PT3.1). The \(x\)'s are usually measures of the magnitudes of the responses of the individual components. Using Fig. PT3.1a as an example, \(x_1\) might quantify the amount of mass in the first reactor, \(x_2\) might quantify the amount in the second, and so forth. The \(a\)'s typically represent the properties and characteristics that bear on the interactions between components. For instance, the \(a\)'s for Fig. PT3.1a might be reflective of the flow rates of mass between the reactors. Finally, the \(b\)'s usually represent the forcing functions acting on the system, such as the feed rate in Fig. PT3.1a. The applications in Chap. 12 provide other examples of such equations derived from engineering practice.

Multicomponent problems of the above types arise from either lumped (macro-) or distributed (micro-) variable mathematical models (Fig. PT3.1). Lumped variable problems involve coupled finite components. Examples include trusses (Sec. 12.2), reactors (Fig. PT3.1a and Sec. 12.1), and electric circuits (Sec. 12.3). These types of problems use models that provide little or no spatial detail.

Conversely, distributed variable problems attempt to describe spatial detail of systems on a continuous or semicontinuous basis. The distribution of chemicals along the length of an elongated, rectangular reactor (Fig. PT3.1b) is an example of a continuous variable model. Differential equations derived from the conservation laws specify the distribution of the dependent variable for such systems.
These differential equations can be solved numerically by converting them to an equivalent system of simultaneous algebraic equations. The solution of such sets of equations represents a major engineering application area for the methods in the following chapters. These equations are coupled because the variables at one location are dependent on the variables in adjoining regions. For example, the concentration at the middle of the reactor is a function of the concentration in adjoining regions. Similar examples could be developed for the spatial distribution of temperature or momentum. We will address such problems when we discuss differential equations later in the book.

Aside from physical systems, simultaneous linear algebraic equations also arise in a variety of mathematical problem contexts. These result when mathematical functions are required to satisfy several conditions simultaneously. Each condition results in an equation that contains known coefficients and unknown variables. The techniques discussed in this part can be used to solve for the unknowns when the equations are linear and algebraic. Some widely used numerical techniques that employ simultaneous equations are regression analysis (Chap. 17) and spline interpolation (Chap. 18).

PT3.2 MATHEMATICAL BACKGROUND

All parts of this book require some mathematical background. For Part Three, matrix notation and algebra are useful because they provide a concise way to represent and manipulate linear algebraic equations. If you are already familiar with matrices, feel free to skip to Sec. PT3.3. For those who are unfamiliar or require a review, the following material provides a brief introduction to the subject.
  • 251. PT3.2.1 Matrix Notation

A matrix consists of a rectangular array of elements represented by a single symbol. As depicted in Fig. PT3.2, [A] is the shorthand notation for the matrix and $a_{ij}$ designates an individual element of the matrix.

A horizontal set of elements is called a row and a vertical set is called a column. The first subscript $i$ always designates the number of the row in which the element lies. The second subscript $j$ designates the column. For example, element $a_{23}$ is in row 2 and column 3.

The matrix in Fig. PT3.2 has $n$ rows and $m$ columns and is said to have a dimension of $n$ by $m$ (or $n \times m$). It is referred to as an $n$ by $m$ matrix.

Matrices with row dimension $n = 1$, such as

$$[B] = [\,b_1 \;\; b_2 \;\; \cdots \;\; b_m\,]$$

are called row vectors. Note that for simplicity, the first subscript of each element is dropped. Also, it should be mentioned that there are times when it is desirable to employ a special shorthand notation to distinguish a row matrix from other types of matrices. One way to accomplish this is to employ special open-topped brackets, as in $\lfloor B \rfloor$.

Matrices with column dimension $m = 1$, such as

$$[C] = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}$$

are referred to as column vectors. For simplicity, the second subscript is dropped. As with the row vector, there are occasions when it is desirable to employ a special shorthand notation to distinguish a column matrix from other types of matrices. One way to accomplish this is to employ special brackets, as in $\{C\}$.

FIGURE PT3.2 A matrix. (Row 2 and column 3 are marked in the original figure.)

$$[A] = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1m} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2m} \\ \vdots & \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nm} \end{bmatrix}$$
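As a small illustration of the notation above, the sketch below stores a matrix as a list of rows in Python. The helper names `element`, `row`, and `column` and the numeric values are illustrative assumptions, not part of the text. Python indices start at 0, so the element the text calls $a_{23}$ is `A[1][2]`.

```python
# A 3 by 4 matrix stored as a list of rows (values are arbitrary placeholders
# chosen so that each entry spells out its own row and column number).
A = [
    [11, 12, 13, 14],
    [21, 22, 23, 24],
    [31, 32, 33, 34],
]

def element(matrix, i, j):
    """Return a_ij using the 1-based row/column subscripts of the text."""
    return matrix[i - 1][j - 1]

def row(matrix, i):
    """Return row i (1-based) as a row vector."""
    return list(matrix[i - 1])

def column(matrix, j):
    """Return column j (1-based) as a column vector (a flat list here)."""
    return [r[j - 1] for r in matrix]

n, m = len(A), len(A[0])   # the matrix has dimension n by m
print(n, m)                # 3 4
print(element(A, 2, 3))    # 23
print(row(A, 2))           # [21, 22, 23, 24]
print(column(A, 3))        # [13, 23, 33]
```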
  • 252. Matrices where $n = m$ are called square matrices. For example, a 4 by 4 matrix is

$$[A] = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix}$$

The diagonal consisting of the elements $a_{11}$, $a_{22}$, $a_{33}$, and $a_{44}$ is termed the principal or main diagonal of the matrix.

Square matrices are particularly important when solving sets of simultaneous linear equations. For such systems, the number of equations (corresponding to rows) and the number of unknowns (corresponding to columns) must be equal for a unique solution to be possible. Consequently, square matrices of coefficients are encountered when dealing with such systems. Some special types of square matrices are described in Box PT3.1.

Box PT3.1 Special Types of Square Matrices

There are a number of special forms of square matrices that are important and should be noted:

A symmetric matrix is one where $a_{ij} = a_{ji}$ for all $i$'s and $j$'s. For example,

$$[A] = \begin{bmatrix} 5 & 1 & 2 \\ 1 & 3 & 7 \\ 2 & 7 & 8 \end{bmatrix}$$

is a 3 by 3 symmetric matrix.

A diagonal matrix is a square matrix where all elements off the main diagonal are equal to zero, as in

$$[A] = \begin{bmatrix} a_{11} & & & \\ & a_{22} & & \\ & & a_{33} & \\ & & & a_{44} \end{bmatrix}$$

Note that where large blocks of elements are zero, they are left blank.

An identity matrix is a diagonal matrix where all elements on the main diagonal are equal to 1, as in

$$[I] = \begin{bmatrix} 1 & & & \\ & 1 & & \\ & & 1 & \\ & & & 1 \end{bmatrix}$$

The symbol [I] is used to denote the identity matrix. The identity matrix has properties similar to unity.

An upper triangular matrix is one where all the elements below the main diagonal are zero, as in

$$[A] = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ & a_{22} & a_{23} & a_{24} \\ & & a_{33} & a_{34} \\ & & & a_{44} \end{bmatrix}$$

A lower triangular matrix is one where all elements above the main diagonal are zero, as in

$$[A] = \begin{bmatrix} a_{11} & & & \\ a_{21} & a_{22} & & \\ a_{31} & a_{32} & a_{33} & \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix}$$

A banded matrix has all elements equal to zero, with the exception of a band centered on the main diagonal:

$$[A] = \begin{bmatrix} a_{11} & a_{12} & & \\ a_{21} & a_{22} & a_{23} & \\ & a_{32} & a_{33} & a_{34} \\ & & a_{43} & a_{44} \end{bmatrix}$$

The above matrix has a bandwidth of 3 and is given a special name: the tridiagonal matrix.
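The definitions in Box PT3.1 translate almost directly into element-by-element tests. The following is a minimal sketch, assuming matrices are stored as lists of rows; the function names are hypothetical, and `bandwidth` assumes the band is symmetric about the main diagonal, so that a tridiagonal matrix reports 3 as in the box.

```python
def is_square(a):
    return all(len(row) == len(a) for row in a)

def is_symmetric(a):
    n = len(a)
    return is_square(a) and all(a[i][j] == a[j][i] for i in range(n) for j in range(n))

def is_upper_triangular(a):
    # every element below the main diagonal (j < i) must be zero
    n = len(a)
    return is_square(a) and all(a[i][j] == 0 for i in range(n) for j in range(i))

def is_lower_triangular(a):
    # every element above the main diagonal (j > i) must be zero
    n = len(a)
    return is_square(a) and all(a[i][j] == 0 for i in range(n) for j in range(i + 1, n))

def is_diagonal(a):
    return is_upper_triangular(a) and is_lower_triangular(a)

def is_identity(a):
    return is_diagonal(a) and all(a[i][i] == 1 for i in range(len(a)))

def bandwidth(a):
    """Width of the band, centered on the main diagonal, that holds every nonzero."""
    n = len(a)
    half = max((abs(i - j) for i in range(n) for j in range(n) if a[i][j] != 0), default=0)
    return 2 * half + 1

symmetric = [[5, 1, 2], [1, 3, 7], [2, 7, 8]]          # the 3 by 3 example from the box
tridiagonal = [[2, 1, 0, 0], [1, 2, 1, 0],
               [0, 1, 2, 1], [0, 0, 1, 2]]             # made-up values

print(is_symmetric(symmetric))            # True
print(is_identity([[1, 0], [0, 1]]))      # True
print(is_upper_triangular(tridiagonal))   # False
print(bandwidth(tridiagonal))             # 3
```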
  • 253. PT3.2.2 Matrix Operating Rules

Now that we have specified what we mean by a matrix, we can define some operating rules that govern its use. Two $n$ by $m$ matrices are equal if, and only if, every element in the first is equal to the corresponding element in the second, that is, $[A] = [B]$ if $a_{ij} = b_{ij}$ for all $i$ and $j$.

Addition of two matrices, say, [A] and [B], is accomplished by adding corresponding terms in each matrix. The elements of the resulting matrix [C] are computed as

$$c_{ij} = a_{ij} + b_{ij}$$

for $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, m$. Similarly, the subtraction of two matrices, say, [E] minus [F], is obtained by subtracting corresponding terms, as in

$$d_{ij} = e_{ij} - f_{ij}$$

for $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, m$. It follows directly from the above definitions that addition and subtraction can be performed only between matrices having the same dimensions.

Addition is commutative:

$$[A] + [B] = [B] + [A]$$

Addition is also associative, that is,

$$([A] + [B]) + [C] = [A] + ([B] + [C])$$

The multiplication of a matrix [A] by a scalar $g$ is obtained by multiplying every element of [A] by $g$, as in

$$[D] = g[A] = \begin{bmatrix} ga_{11} & ga_{12} & \cdots & ga_{1m} \\ ga_{21} & ga_{22} & \cdots & ga_{2m} \\ \vdots & \vdots & & \vdots \\ ga_{n1} & ga_{n2} & \cdots & ga_{nm} \end{bmatrix}$$

The product of two matrices is represented as $[C] = [A][B]$, where the elements of [C] are defined as (see Box PT3.2 for a simple way to conceptualize matrix multiplication)

$$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj} \tag{PT3.2}$$

where $n$ = the column dimension of [A] and the row dimension of [B]. That is, the $c_{ij}$ element is obtained by adding the products of individual elements from the $i$th row of the first matrix, in this case [A], and the $j$th column of the second matrix [B].

According to this definition, multiplication of two matrices can be performed only if the first matrix has as many columns as the number of rows in the second matrix. Thus, if [A] is an $n$ by $m$ matrix, [B] could be an $m$ by $l$ matrix. 
For this case, the resulting [C] matrix would have the dimension of $n$ by $l$. However, if [B] were an $l$ by $m$ matrix, the multiplication could not be performed. Figure PT3.3 provides an easy way to check whether two matrices can be multiplied.
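The element-by-element rules and the dimension check described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names (`shape`, `add`, `subtract`, `scale`, `product_shape`) and the numeric values are assumptions introduced here, with matrices stored as lists of rows.

```python
def shape(a):
    """Return (rows, columns) of a matrix stored as a list of rows."""
    return len(a), len(a[0])

def add(a, b):
    if shape(a) != shape(b):
        raise ValueError("addition needs matrices of the same dimensions")
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def subtract(e, f):
    if shape(e) != shape(f):
        raise ValueError("subtraction needs matrices of the same dimensions")
    return [[x - y for x, y in zip(row_e, row_f)] for row_e, row_f in zip(e, f)]

def scale(g, a):
    """Multiply every element of a by the scalar g."""
    return [[g * x for x in row] for row in a]

def product_shape(a, b):
    """Shape of the product [a][b], or None if the interior dimensions differ."""
    (n, m), (rows_b, l) = shape(a), shape(b)
    return (n, l) if m == rows_b else None

A = [[1, 2, 3], [4, 5, 6]]       # 2 by 3, arbitrary values
B = [[6, 5, 4], [3, 2, 1]]       # 2 by 3
C = [[1, 0], [0, 1], [2, 2]]     # 3 by 2

print(add(A, B))                 # [[7, 7, 7], [7, 7, 7]]
print(add(A, B) == add(B, A))    # True
print(subtract(A, B))            # [[-5, -3, -1], [1, 3, 5]]
print(scale(2, A))               # [[2, 4, 6], [8, 10, 12]]
print(product_shape(A, C))       # (2, 2)
print(product_shape(A, B))       # None
```

Returning `None` for an impossible product, rather than raising an error, is just one design choice; it keeps the check usable as a simple precondition before multiplying.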
  • 254. FIGURE PT3.3

$$[A]_{n \times m}\,[B]_{m \times l} = [C]_{n \times l}$$

Interior dimensions ($m$ and $m$) are equal; multiplication is possible. Exterior dimensions ($n$ and $l$) define the dimensions of the result.

Box PT3.2 A Simple Method for Multiplying Two Matrices

Although Eq. (PT3.2) is well suited for implementation on a computer, it is not the simplest means for visualizing the mechanics of multiplying two matrices. What follows gives more tangible expression to the operation.

Suppose that we want to multiply [X] by [Y] to yield [Z],

$$[Z] = [X][Y] = \begin{bmatrix} 3 & 1 \\ 8 & 6 \\ 0 & 4 \end{bmatrix} \begin{bmatrix} 5 & 9 \\ 7 & 2 \end{bmatrix}$$

A simple way to visualize the computation of [Z] is to raise [Y], as in

$$\begin{array}{cc} & \begin{bmatrix} 5 & 9 \\ 7 & 2 \end{bmatrix} \leftarrow [Y] \\ [X] \rightarrow \begin{bmatrix} 3 & 1 \\ 8 & 6 \\ 0 & 4 \end{bmatrix} & \begin{bmatrix} \; & ? & \; \\ & & \\ & & \end{bmatrix} \leftarrow [Z] \end{array}$$

Now the answer [Z] can be computed in the space vacated by [Y]. This format has utility because it aligns the appropriate rows and columns that are to be multiplied. For example, according to Eq. (PT3.2), the element $z_{11}$ is obtained by multiplying the first row of [X] by the first column of [Y]. This amounts to adding the product of $x_{11}$ and $y_{11}$ to the product of $x_{12}$ and $y_{21}$, as in

$$\begin{array}{cc} & \begin{bmatrix} 5 & 9 \\ 7 & 2 \end{bmatrix} \\ \begin{bmatrix} 3 & 1 \\ 8 & 6 \\ 0 & 4 \end{bmatrix} & \begin{bmatrix} 3 \times 5 + 1 \times 7 = 22 & \; \\ & \\ & \end{bmatrix} \end{array}$$

Thus, $z_{11}$ is equal to 22. Element $z_{21}$ can be computed in a similar fashion, as in

$$\begin{array}{cc} & \begin{bmatrix} 5 & 9 \\ 7 & 2 \end{bmatrix} \\ \begin{bmatrix} 3 & 1 \\ 8 & 6 \\ 0 & 4 \end{bmatrix} & \begin{bmatrix} 22 & \; \\ 8 \times 5 + 6 \times 7 = 82 & \\ & \end{bmatrix} \end{array}$$

The computation can be continued in this way, following the alignment of the rows and columns, to yield the result

$$[Z] = \begin{bmatrix} 22 & 29 \\ 82 & 84 \\ 28 & 8 \end{bmatrix}$$

Note how this simple method makes it clear why it is impossible to multiply two matrices if the number of columns of the first matrix does not equal the number of rows in the second matrix. Also, note how it demonstrates that the order of multiplication matters (that is, matrix multiplication is not commutative).
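The row-by-column pairing in Box PT3.2 can be checked with a short Python sketch. The function name `matmul` is an assumption introduced here; it pairs each row of the first matrix with each column of the second, which is exactly the alignment the box draws. Reversing the order for this example fails the dimension check, since [Y] has 2 columns but [X] has 3 rows.

```python
def matmul(x, y):
    """Multiply two matrices stored as lists of rows, row by column."""
    if len(x[0]) != len(y):
        raise ValueError("columns of the first matrix must equal rows of the second")
    # zip(*y) iterates over the columns of y
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

X = [[3, 1], [8, 6], [0, 4]]   # 3 by 2, from Box PT3.2
Y = [[5, 9], [7, 2]]           # 2 by 2, from Box PT3.2

Z = matmul(X, Y)
print(Z)                       # [[22, 29], [82, 84], [28, 8]]

try:
    matmul(Y, X)               # 2 by 2 times 3 by 2: interior dimensions differ
except ValueError as err:
    print("[Y][X] is not defined:", err)
```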
  • 255. If the dimensions of the matrices are suitable, matrix multiplication is associative,

$$([A][B])[C] = [A]([B][C])$$

and distributive,

$$[A]([B] + [C]) = [A][B] + [A][C]$$

or

$$([A] + [B])[C] = [A][C] + [B][C]$$

However, multiplication is not generally commutative:

$$[A][B] \neq [B][A]$$

That is, the order of multiplication is important.

Figure PT3.4 shows pseudocode to multiply an $n$ by $m$ matrix [A] by an $m$ by $l$ matrix [B] and store the result in an $n$ by $l$ matrix [C]. Notice that, instead of the inner product being directly accumulated in [C], it is collected in a temporary variable, sum. This is done for two reasons. First, it is a bit more efficient, because the computer need determine the location of $c_{i,j}$ only $n \times l$ times rather than $n \times l \times m$ times. Second, the precision of the multiplication can be greatly improved by declaring sum as a double precision variable (recall the discussion of inner products in Sec. 3.4.2).

Although multiplication is possible, matrix division is not a defined operation. However, if a matrix [A] is square and nonsingular, there is another matrix $[A]^{-1}$, called the inverse of [A], for which

$$[A][A]^{-1} = [A]^{-1}[A] = [I] \tag{PT3.3}$$

Thus, the multiplication of a matrix by the inverse is analogous to division, in the sense that a number divided by itself is equal to 1. That is, multiplication of a matrix by its inverse leads to the identity matrix (recall Box PT3.1).

The inverse of a 2 by 2 matrix can be represented simply by

$$[A]^{-1} = \frac{1}{a_{11}a_{22} - a_{12}a_{21}} \begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix} \tag{PT3.4}$$

    SUBROUTINE Mmult (a, b, c, m, n, l)
      DOFOR i = 1, n
        DOFOR j = 1, l
          sum = 0.
          DOFOR k = 1, m
            sum = sum + a(i,k) · b(k,j)
          END DO
          c(i,j) = sum
        END DO
      END DO

FIGURE PT3.4
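The pseudocode of Fig. PT3.4 and the 2 by 2 inverse of Eq. (PT3.4) can be sketched in Python as follows. This is one possible translation, not the book's own code: the names `mmult` and `inverse_2x2` and the test matrix are assumptions, and the accumulator is called `total` to avoid shadowing Python's built-in `sum`. Python floats are already double precision, so the precision advice in the text is satisfied without a separate declaration.

```python
def mmult(a, b):
    """Multiply an n by m matrix a by an m by l matrix b (after Fig. PT3.4)."""
    n, m, l = len(a), len(b), len(b[0])
    if any(len(row) != m for row in a):
        raise ValueError("interior dimensions do not match")
    c = [[0.0] * l for _ in range(n)]
    for i in range(n):
        for j in range(l):
            total = 0.0                 # temporary accumulator (sum in the pseudocode)
            for k in range(m):
                total += a[i][k] * b[k][j]
            c[i][j] = total             # c[i][j] is written once per (i, j) pair
    return c

def inverse_2x2(a):
    """Inverse of a 2 by 2 matrix using Eq. (PT3.4)."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if det == 0:
        raise ValueError("matrix is singular; no inverse exists")
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

A = [[3.0, 2.0], [-1.0, 2.0]]           # illustrative values
A_inv = inverse_2x2(A)
print(A_inv)                            # [[0.25, -0.25], [0.125, 0.375]]
print(mmult(A, A_inv))                  # the 2 by 2 identity, as in Eq. (PT3.3)
```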
  • 256. Similar formulas for higher-dimensional matrices are much more involved. Sections in Chaps. 10 and 11 will be devoted to techniques for using numerical methods and the computer to calculate the inverse for such systems.

Two other matrix manipulations that will have utility in our discussion are the transpose and the trace of a matrix. The transpose of a matrix involves transforming its rows into columns and its columns into rows. For example, for the $4 \times 4$ matrix,

$$[A] = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix}$$

the transpose, designated $[A]^T$, is defined as

$$[A]^T = \begin{bmatrix} a_{11} & a_{21} & a_{31} & a_{41} \\ a_{12} & a_{22} & a_{32} & a_{42} \\ a_{13} & a_{23} & a_{33} & a_{43} \\ a_{14} & a_{24} & a_{34} & a_{44} \end{bmatrix}$$

In other words, the element $a_{ij}$ of the transpose is equal to the $a_{ji}$ element of the original matrix.

The transpose has a variety of functions in matrix algebra. One simple advantage is that it allows a column vector to be written as a row. For example, if

$$\{C\} = \begin{Bmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{Bmatrix}$$

then

$$\{C\}^T = \lfloor c_1 \;\; c_2 \;\; c_3 \;\; c_4 \rfloor$$

where the superscript $T$ designates the transpose. For example, this can save space when writing a column vector in a manuscript. In addition, the transpose has numerous mathematical applications.

The trace of a matrix is the sum of the elements on its principal diagonal. It is designated as tr [A] and is computed as

$$\operatorname{tr}[A] = \sum_{i=1}^{n} a_{ii}$$

The trace will be used in our discussion of eigenvalues in Chap. 27.

The final matrix manipulation that will have utility in our discussion is augmentation. A matrix is augmented by the addition of a column (or columns) to the original matrix. For example, suppose we have a matrix of coefficients:

$$[A] = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$$
  • 257. We might wish to augment this matrix [A] with an identity matrix (recall Box PT3.1) to yield a 3-by-6-dimensional matrix:

$$[A] = \left[\begin{array}{ccc|ccc} a_{11} & a_{12} & a_{13} & 1 & 0 & 0 \\ a_{21} & a_{22} & a_{23} & 0 & 1 & 0 \\ a_{31} & a_{32} & a_{33} & 0 & 0 & 1 \end{array}\right]$$

Such an expression has utility when we must perform a set of identical operations on two matrices. Thus, we can perform the operations on the single augmented matrix rather than on the two individual matrices.

PT3.2.3 Representing Linear Algebraic Equations in Matrix Form

It should be clear that matrices provide a concise notation for representing simultaneous linear equations. For example, Eq. (PT3.1) can be expressed as

$$[A]\{X\} = \{B\} \tag{PT3.5}$$

where [A] is the $n$ by $n$ square matrix of coefficients,

$$[A] = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}$$

{B} is the $n$ by 1 column vector of constants,

$$\{B\}^T = \lfloor b_1 \;\; b_2 \;\; \cdots \;\; b_n \rfloor$$

and {X} is the $n$ by 1 column vector of unknowns:

$$\{X\}^T = \lfloor x_1 \;\; x_2 \;\; \cdots \;\; x_n \rfloor$$

Recall the definition of matrix multiplication [Eq. (PT3.2) or Box PT3.2] to convince yourself that Eqs. (PT3.1) and (PT3.5) are equivalent. Also, realize that Eq. (PT3.5) is a valid matrix multiplication because the number of columns, $n$, of the first matrix [A] is equal to the number of rows, $n$, of the second matrix {X}.

This part of the book is devoted to solving Eq. (PT3.5) for {X}. A formal way to obtain a solution using matrix algebra is to multiply each side of the equation by the inverse of [A] to yield

$$[A]^{-1}[A]\{X\} = [A]^{-1}\{B\}$$

Because $[A]^{-1}[A]$ equals the identity matrix, the equation becomes

$$\{X\} = [A]^{-1}\{B\} \tag{PT3.6}$$

Therefore, the equation has been solved for {X}. This is another example of how the inverse plays a role in matrix algebra that is similar to division. It should be noted that this is not a very efficient way to solve a system of equations. Thus, other approaches
  • 258. are employed in numerical algorithms. However, as discussed in Chap. 10, the matrix inverse itself has great value in the engineering analyses of such systems.

Finally, we will sometimes find it useful to augment [A] with {B}. For example, if $n = 3$, this results in a 3-by-4-dimensional matrix:

$$[A] = \left[\begin{array}{ccc|c} a_{11} & a_{12} & a_{13} & b_1 \\ a_{21} & a_{22} & a_{23} & b_2 \\ a_{31} & a_{32} & a_{33} & b_3 \end{array}\right] \tag{PT3.7}$$

Expressing the equations in this form is useful because several of the techniques for solving linear systems perform identical operations on a row of coefficients and the corresponding right-hand-side constant. As expressed in Eq. (PT3.7), we can perform the manipulation once on an individual row of the augmented matrix rather than separately on the coefficient matrix and the right-hand-side vector.

PT3.3 ORIENTATION

Before proceeding to the numerical methods, some further orientation might be helpful. The following is intended as an overview of the material discussed in Part Three. In addition, we have formulated some objectives to help focus your efforts when studying the material.

PT3.3.1 Scope and Preview

Figure PT3.5 provides an overview for Part Three. Chapter 9 is devoted to the most fundamental technique for solving linear algebraic systems: Gauss elimination. Before launching into a detailed discussion of this technique, a preliminary section deals with simple methods for solving small systems. These approaches are presented to provide you with visual insight and because one of the methods, the elimination of unknowns, represents the basis for Gauss elimination.

After the preliminary material, "naive" Gauss elimination is discussed. We start with this "stripped-down" version because it allows the fundamental technique to be elaborated on without complicating details. Then, in subsequent sections, we discuss potential problems of the naive approach and present a number of modifications to minimize and circumvent these problems. 
The focus of this discussion will be the process of switching rows, or partial pivoting. Chapter 10 begins by illustrating how Gauss elimination can be formulated as an LU decomposition solution. Such solution techniques are valuable for cases where many right-hand-side vectors need to be evaluated. It is shown how this attribute allows efficient calculation of the matrix inverse, which has tremendous utility in engineering practice. Finally, the chapter ends with a discussion of matrix condition. The condition number is introduced as a measure of the loss of significant digits of accuracy that can result when solving ill-conditioned matrices. The beginning of Chap. 11 focuses on special types of systems of equations that have broad engineering application. In particular, efficient techniques for solving tridiagonal systems are presented. Then, the remainder of the chapter focuses on an alternative to elimination methods called the Gauss-Seidel method. This technique is similar in spirit to
  • 259. FIGURE PT3.5 Schematic of the organization of the material in Part Three: Linear Algebraic Equations.
PART 3 Linear Algebraic Equations: PT3.1 Motivation; PT3.2 Mathematical background; PT3.3 Orientation.
CHAPTER 9 Gauss Elimination: 9.1 Small systems; 9.2 Naive Gauss elimination; 9.3 Pitfalls; 9.4 Remedies; 9.5 Complex systems; 9.6 Nonlinear systems; 9.7 Gauss-Jordan.
CHAPTER 10 LU Decomposition and Matrix Inversion: 10.1 LU decomposition; 10.2 Matrix inverse; 10.3 System condition.
CHAPTER 11 Special Matrices and Gauss-Seidel: 11.1 Special matrices; 11.2 Gauss-Seidel; 11.3 Software.
CHAPTER 12 Engineering Case Studies: 12.1 Chemical engineering; 12.2 Civil engineering; 12.3 Electrical engineering; 12.4 Mechanical engineering.
EPILOGUE: PT3.4 Trade-offs; PT3.5 Important formulas; PT3.6 Advanced methods.
  • 260. the approximate methods for roots of equations that were discussed in Chap. 6. That is, the technique involves guessing a solution and then iterating to obtain a refined estimate. The chapter ends with information related to solving linear algebraic equations with software packages.

Chapter 12 demonstrates how the methods can actually be applied for problem solving. As with other parts of the book, applications are drawn from all fields of engineering.

Finally, an epilogue is included at the end of Part Three. This review includes discussion of trade-offs that are relevant to implementation of the methods in engineering practice. This section also summarizes the important formulas and advanced methods related to linear algebraic equations. As such, it can be used before exams or as a refresher after you have graduated and must return to linear algebraic equations as a professional.

PT3.3.2 Goals and Objectives

Study Objectives. After completing Part Three, you should be able to solve problems involving linear algebraic equations and appreciate the application of these equations in many fields of engineering. You should strive to master several techniques and assess their reliability. You should understand the trade-offs involved in selecting the "best" method (or methods) for any particular problem. In addition to these general objectives, the specific concepts listed in Table PT3.1 should be assimilated and mastered.

TABLE PT3.1 Specific study objectives for Part Three.
1. Understand the graphical interpretation of ill-conditioned systems and how it relates to the determinant.
2. Be familiar with terminology: forward elimination, back substitution, pivot equation, and pivot coefficient.
3. Understand the problems of division by zero, round-off error, and ill-conditioning.
4. Know how to compute the determinant using Gauss elimination.
5. Understand the advantages of pivoting; realize the difference between partial and complete pivoting.
6. Know the fundamental difference between Gauss elimination and the Gauss-Jordan method and which is more efficient.
7. Recognize how Gauss elimination can be formulated as an LU decomposition.
8. Know how to incorporate pivoting and matrix inversion into an LU decomposition algorithm.
9. Know how to interpret the elements of the matrix inverse in evaluating stimulus response computations in engineering.
10. Realize how to use the inverse and matrix norms to evaluate system condition.
11. Understand how banded and symmetric systems can be decomposed and solved efficiently.
12. Understand why the Gauss-Seidel method is particularly well suited for large, sparse systems of equations.
13. Know how to assess diagonal dominance of a system of equations and how it relates to whether the system can be solved with the Gauss-Seidel method.
14. Understand the rationale behind relaxation; know where underrelaxation and overrelaxation are appropriate.

Computer Objectives. Your most fundamental computer objectives are to be able to solve a system of linear algebraic equations and to evaluate the matrix inverse. You will
  • 261. want to have subprograms developed for LU decomposition of both full and tridiagonal matrices. You may also want to have your own software to implement the Gauss-Seidel method. You should know how to use packages to solve linear algebraic equations and find the matrix inverse. You should become familiar with how the same evaluations can be implemented on popular software packages such as Excel, MATLAB software, and Mathcad.
  • 262. CHAPTER 9 Gauss Elimination

This chapter deals with simultaneous linear algebraic equations that can be represented generally as

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\ &\;\;\vdots \\ a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= b_n \end{aligned} \tag{9.1}$$

where the a's are constant coefficients and the b's are constants.

The technique described in this chapter is called Gauss elimination because it involves combining equations to eliminate unknowns. Although it is one of the earliest methods for solving simultaneous equations, it remains among the most important algorithms in use today and is the basis for linear equation solving on many popular software packages.

9.1 SOLVING SMALL NUMBERS OF EQUATIONS

Before proceeding to the computer methods, we will describe several methods that are appropriate for solving small ($n \le 3$) sets of simultaneous equations and that do not require a computer. These are the graphical method, Cramer's rule, and the elimination of unknowns.

9.1.1 The Graphical Method

A graphical solution is obtainable for two equations by plotting them on Cartesian coordinates with one axis corresponding to $x_1$ and the other to $x_2$. Because we are dealing with linear systems, each equation is a straight line. This can be easily illustrated for the general equations

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 &= b_1 \\ a_{21}x_1 + a_{22}x_2 &= b_2 \end{aligned}$$
  • 263. Both equations can be solved for $x_2$:

$$x_2 = -\left(\frac{a_{11}}{a_{12}}\right) x_1 + \frac{b_1}{a_{12}}$$

$$x_2 = -\left(\frac{a_{21}}{a_{22}}\right) x_1 + \frac{b_2}{a_{22}}$$

Thus, the equations are now in the form of straight lines; that is, $x_2 = (\text{slope})\, x_1 + \text{intercept}$. These lines can be graphed on Cartesian coordinates with $x_2$ as the ordinate and $x_1$ as the abscissa. The values of $x_1$ and $x_2$ at the intersection of the lines represent the solution.

EXAMPLE 9.1 The Graphical Method for Two Equations

Problem Statement. Use the graphical method to solve

$$3x_1 + 2x_2 = 18 \tag{E9.1.1}$$

$$-x_1 + 2x_2 = 2 \tag{E9.1.2}$$

Solution. Let $x_1$ be the abscissa. Solve Eq. (E9.1.1) for $x_2$:

$$x_2 = -\frac{3}{2} x_1 + 9$$

which, when plotted on Fig. 9.1, is a straight line with an intercept of 9 and a slope of $-3/2$.

Equation (E9.1.2) can also be solved for $x_2$:

$$x_2 = \frac{1}{2} x_1 + 1$$

which is also plotted on Fig. 9.1. The solution is the intersection of the two lines at $x_1 = 4$ and $x_2 = 3$. This result can be checked by substituting these values into the original equations to yield

$$3(4) + 2(3) = 18$$

$$-(4) + 2(3) = 2$$

Thus, the results are equivalent to the right-hand sides of the original equations.

FIGURE 9.1 Graphical solution of a set of two simultaneous linear algebraic equations. The intersection of the lines represents the solution. [Plot of $x_2$ versus $x_1$ showing the lines $3x_1 + 2x_2 = 18$ and $-x_1 + 2x_2 = 2$ crossing at the solution $x_1 = 4$, $x_2 = 3$.]

  • 264. For three simultaneous equations, each equation would be represented by a plane in a three-dimensional coordinate system. The point where the three planes intersect would represent the solution. Beyond three equations, graphical methods break down and, consequently, have little practical value for solving simultaneous equations. However, they sometimes prove useful in visualizing properties of the solutions.

For example, Fig. 9.2 depicts three cases that can pose problems when solving sets of linear equations. Figure 9.2a shows the case where the two equations represent parallel lines. For such situations, there is no solution because the lines never cross. Figure 9.2b depicts the case where the two lines are coincident. For such situations there is an infinite number of solutions. Both types of systems are said to be singular. In addition, systems that are very close to being singular (Fig. 9.2c) can also cause problems. These systems are said to be ill-conditioned. Graphically, this corresponds to the fact that it is difficult to identify the exact point at which the lines intersect. Ill-conditioned systems will also pose problems when they are encountered during the numerical solution of linear equations. This is because they will be extremely sensitive to round-off error (recall Sec. 4.2.3).

FIGURE 9.2 Graphical depiction of singular and ill-conditioned systems: (a) no solution, (b) infinite solutions, and (c) ill-conditioned system where the slopes are so close that the point of intersection is difficult to detect visually. [Three plots of $x_2$ versus $x_1$, one per panel, not reproduced.]
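The slope-intercept reasoning of Sec. 9.1.1 can be mimicked numerically. The sketch below is illustrative only: the names `slope_intercept` and `classify`, the tolerance `tol`, and the right-hand sides used for the parallel case are assumptions introduced here. Each equation is passed as a tuple `(a1, a2, b)` meaning $a_1 x_1 + a_2 x_2 = b$, with $a_2 \neq 0$ so that the line can be written as $x_2 = (\text{slope})\,x_1 + \text{intercept}$.

```python
def slope_intercept(a1, a2, b):
    """Rewrite a1*x1 + a2*x2 = b as x2 = slope*x1 + intercept (needs a2 != 0)."""
    return -a1 / a2, b / a2

def classify(eq1, eq2, tol=1e-12):
    """Return a label and, when it exists, the intersection point (x1, x2)."""
    s1, c1 = slope_intercept(*eq1)
    s2, c2 = slope_intercept(*eq2)
    if abs(s1 - s2) <= tol:                      # equal slopes: singular system
        if abs(c1 - c2) <= tol:
            return "infinite solutions", None    # coincident lines
        return "no solution", None               # parallel lines
    x1 = (c2 - c1) / (s1 - s2)                   # where the two lines cross
    return "unique solution", (x1, s1 * x1 + c1)

# Example 9.1
print(classify((3, 2, 18), (-1, 2, 2)))          # ('unique solution', (4.0, 3.0))

# Same slope, different intercepts (right-hand sides assumed for illustration)
print(classify((-0.5, 1, 1), (-0.5, 1, 0.5)))    # ('no solution', None)

# One equation is a multiple of the other
print(classify((-0.5, 1, 1), (-1, 2, 2)))        # ('infinite solutions', None)
```

An ill-conditioned pair still returns a unique solution here; the difficulty shows up as a very small slope difference in the denominator, which amplifies any error in the coefficients.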
  • 265. 9.1.2 Determinants and Cramer's Rule

Cramer's rule is another solution technique that is best suited to small numbers of equations. Before describing this method, we will briefly introduce the concept of the determinant, which is used to implement Cramer's rule. In addition, the determinant has relevance to the evaluation of the ill-conditioning of a matrix.

Determinants. The determinant can be illustrated for a set of three equations:

$$[A]\{X\} = \{B\}$$

where [A] is the coefficient matrix:

$$[A] = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$$

The determinant $D$ of this system is formed from the coefficients of the equation, as in

$$D = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} \tag{9.2}$$

Although the determinant $D$ and the coefficient matrix [A] are composed of the same elements, they are completely different mathematical concepts. That is why they are distinguished visually by using brackets to enclose the matrix and straight lines to enclose the determinant. In contrast to a matrix, the determinant is a single number. For example, the value of the second-order determinant

$$D = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}$$

is calculated by

$$D = a_{11}a_{22} - a_{12}a_{21} \tag{9.3}$$

For the third-order case [Eq. (9.2)], a single numerical value for the determinant can be computed as

$$D = a_{11} \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12} \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13} \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix} \tag{9.4}$$

where the 2 by 2 determinants are called minors.

EXAMPLE 9.2 Determinants

Problem Statement. Compute values for the determinants of the systems represented in Figs. 9.1 and 9.2.

Solution. For Fig. 9.1:

$$D = \begin{vmatrix} 3 & 2 \\ -1 & 2 \end{vmatrix} = 3(2) - 2(-1) = 8$$

For Fig. 9.2a:

$$D = \begin{vmatrix} -1/2 & 1 \\ -1/2 & 1 \end{vmatrix} = -\frac{1}{2}(1) - 1\left(-\frac{1}{2}\right) = 0$$

For Fig. 9.2b:

$$D = \begin{vmatrix} -1/2 & 1 \\ -1 & 2 \end{vmatrix} = -\frac{1}{2}(2) - 1(-1) = 0$$

For Fig. 9.2c:

$$D = \begin{vmatrix} -1/2 & 1 \\ -2.3/5 & 1 \end{vmatrix} = -\frac{1}{2}(1) - 1\left(-\frac{2.3}{5}\right) = -0.04$$

  • 266. In the foregoing example, the singular systems had zero determinants. Additionally, the results suggest that the system that is almost singular (Fig. 9.2c) has a determinant that is close to zero. These ideas will be pursued further in our subsequent discussion of ill-conditioning (Sec. 9.3.3).

Cramer's Rule. This rule states that each unknown in a system of linear algebraic equations may be expressed as a fraction of two determinants with denominator $D$ and with the numerator obtained from $D$ by replacing the column of coefficients of the unknown in question by the constants $b_1, b_2, \ldots, b_n$. For example, $x_1$ would be computed as

$$x_1 = \frac{\begin{vmatrix} b_1 & a_{12} & a_{13} \\ b_2 & a_{22} & a_{23} \\ b_3 & a_{32} & a_{33} \end{vmatrix}}{D} \tag{9.5}$$

EXAMPLE 9.3 Cramer's Rule

Problem Statement. Use Cramer's rule to solve

$$\begin{aligned} 0.3x_1 + 0.52x_2 + x_3 &= -0.01 \\ 0.5x_1 + x_2 + 1.9x_3 &= 0.67 \\ 0.1x_1 + 0.3x_2 + 0.5x_3 &= -0.44 \end{aligned}$$

Solution. The determinant $D$ can be written as [Eq. (9.2)]

$$D = \begin{vmatrix} 0.3 & 0.52 & 1 \\ 0.5 & 1 & 1.9 \\ 0.1 & 0.3 & 0.5 \end{vmatrix}$$

The minors are [Eq. (9.3)]

$$A_1 = \begin{vmatrix} 1 & 1.9 \\ 0.3 & 0.5 \end{vmatrix} = 1(0.5) - 1.9(0.3) = -0.07$$

$$A_2 = \begin{vmatrix} 0.5 & 1.9 \\ 0.1 & 0.5 \end{vmatrix} = 0.5(0.5) - 1.9(0.1) = 0.06$$

  • 267.

$$A_3 = \begin{vmatrix} 0.5 & 1 \\ 0.1 & 0.3 \end{vmatrix} = 0.5(0.3) - 1(0.1) = 0.05$$

These can be used to evaluate the determinant, as in [Eq. (9.4)]

$$D = 0.3(-0.07) - 0.52(0.06) + 1(0.05) = -0.0022$$

Applying Eq. (9.5), the solution is

$$x_1 = \frac{\begin{vmatrix} -0.01 & 0.52 & 1 \\ 0.67 & 1 & 1.9 \\ -0.44 & 0.3 & 0.5 \end{vmatrix}}{-0.0022} = \frac{0.03278}{-0.0022} = -14.9$$

$$x_2 = \frac{\begin{vmatrix} 0.3 & -0.01 & 1 \\ 0.5 & 0.67 & 1.9 \\ 0.1 & -0.44 & 0.5 \end{vmatrix}}{-0.0022} = \frac{0.0649}{-0.0022} = -29.5$$

$$x_3 = \frac{\begin{vmatrix} 0.3 & 0.52 & -0.01 \\ 0.5 & 1 & 0.67 \\ 0.1 & 0.3 & -0.44 \end{vmatrix}}{-0.0022} = \frac{-0.04356}{-0.0022} = 19.8$$

For more than three equations, Cramer's rule becomes impractical because, as the number of equations increases, the determinants are time consuming to evaluate by hand (or by computer). Consequently, more efficient alternatives are used. Some of these alternatives are based on the last noncomputer solution technique, covered in the next section: the elimination of unknowns.

9.1.3 The Elimination of Unknowns

The elimination of unknowns by combining equations is an algebraic approach that can be illustrated for a set of two equations:

$$a_{11}x_1 + a_{12}x_2 = b_1 \tag{9.6}$$

$$a_{21}x_1 + a_{22}x_2 = b_2 \tag{9.7}$$

The basic strategy is to multiply the equations by constants so that one of the unknowns will be eliminated when the two equations are combined. The result is a single equation that can be solved for the remaining unknown. This value can then be substituted into either of the original equations to compute the other variable.

For example, Eq. (9.6) might be multiplied by $a_{21}$ and Eq. (9.7) by $a_{11}$ to give

$$a_{11}a_{21}x_1 + a_{12}a_{21}x_2 = b_1a_{21} \tag{9.8}$$

$$a_{21}a_{11}x_1 + a_{22}a_{11}x_2 = b_2a_{11} \tag{9.9}$$
  • 268. Subtracting Eq. (9.8) from Eq. (9.9) will, therefore, eliminate the $x_1$ term from the equations to yield

$$a_{22}a_{11}x_2 - a_{12}a_{21}x_2 = b_2a_{11} - b_1a_{21}$$

which can be solved for

$$x_2 = \frac{a_{11}b_2 - a_{21}b_1}{a_{11}a_{22} - a_{12}a_{21}} \tag{9.10}$$

Equation (9.10) can then be substituted into Eq. (9.6), which can be solved for

$$x_1 = \frac{a_{22}b_1 - a_{12}b_2}{a_{11}a_{22} - a_{12}a_{21}} \tag{9.11}$$

Notice that Eqs. (9.10) and (9.11) follow directly from Cramer's rule, which states

$$x_1 = \frac{\begin{vmatrix} b_1 & a_{12} \\ b_2 & a_{22} \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}} = \frac{b_1a_{22} - a_{12}b_2}{a_{11}a_{22} - a_{12}a_{21}}$$

$$x_2 = \frac{\begin{vmatrix} a_{11} & b_1 \\ a_{21} & b_2 \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}} = \frac{a_{11}b_2 - b_1a_{21}}{a_{11}a_{22} - a_{12}a_{21}}$$

EXAMPLE 9.4 Elimination of Unknowns

Problem Statement. Use the elimination of unknowns to solve (recall Example 9.1)

$$\begin{aligned} 3x_1 + 2x_2 &= 18 \\ -x_1 + 2x_2 &= 2 \end{aligned}$$

Solution. Using Eqs. (9.11) and (9.10),

$$x_1 = \frac{2(18) - 2(2)}{3(2) - 2(-1)} = 4$$

$$x_2 = \frac{3(2) - (-1)18}{3(2) - 2(-1)} = 3$$

which is consistent with our graphical solution (Fig. 9.1).

The elimination of unknowns can be extended to systems with more than two or three equations. However, the numerous calculations that are required for larger systems make the method extremely tedious to implement by hand. As described in the next section, though, the technique can be formalized and readily programmed for the computer.
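The determinant formulas of Eqs. (9.3) and (9.4) and Cramer's rule, Eq. (9.5), can be collected into a short Python sketch that reproduces Examples 9.3 and 9.4. The function names `det2`, `det3`, and `cramer` are assumptions introduced here, and the sketch handles only 2 by 2 and 3 by 3 systems, which is the range the text recommends for this method. For two equations the result coincides with the closed-form Eqs. (9.10) and (9.11).

```python
def det2(a):
    """Second-order determinant, Eq. (9.3)."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def minor(a, col):
    """2 by 2 minor of a 3 by 3 matrix: drop the first row and column `col`."""
    return [[row[j] for j in range(3) if j != col] for row in a[1:]]

def det3(a):
    """Third-order determinant by expansion along the first row, Eq. (9.4)."""
    return (a[0][0] * det2(minor(a, 0))
            - a[0][1] * det2(minor(a, 1))
            + a[0][2] * det2(minor(a, 2)))

def cramer(a, b):
    """Solve a 2 by 2 or 3 by 3 system [a]{x} = {b} with Cramer's rule."""
    n = len(a)
    if n not in (2, 3):
        raise ValueError("this sketch only handles 2 or 3 equations")
    det = det2 if n == 2 else det3
    d = det(a)
    if d == 0:
        raise ValueError("singular system: the determinant is zero")
    solution = []
    for col in range(n):
        # Replace the coefficients of unknown `col` with the constants b.
        replaced = [row[:col] + [b[i]] + row[col + 1:] for i, row in enumerate(a)]
        solution.append(det(replaced) / d)
    return solution

# Example 9.4 (same system as Example 9.1)
print(cramer([[3, 2], [-1, 2]], [18, 2]))         # [4.0, 3.0]

# Example 9.3
A = [[0.3, 0.52, 1.0], [0.5, 1.0, 1.9], [0.1, 0.3, 0.5]]
B = [-0.01, 0.67, -0.44]
print([round(x, 1) for x in cramer(A, B)])        # [-14.9, -29.5, 19.8]
```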
  • 269. 9.2 NAIVE GAUSS ELIMINATION
In the previous section, the elimination of unknowns was used to solve a pair of simultaneous equations. The procedure consisted of two steps:
1. The equations were manipulated to eliminate one of the unknowns from the equations. The result of this elimination step was that we had one equation with one unknown.
2. Consequently, this equation could be solved directly and the result back-substituted into one of the original equations to solve for the remaining unknown.
This basic approach can be extended to large sets of equations by developing a systematic scheme or algorithm to eliminate unknowns and to back-substitute. Gauss elimination is the most basic of these schemes.
This section includes the systematic techniques for forward elimination and back substitution that comprise Gauss elimination. Although these techniques are ideally suited for implementation on computers, some modifications will be required to obtain a reliable algorithm. In particular, the computer program must avoid division by zero. The following method is called “naive” Gauss elimination because it does not avoid this problem. Subsequent sections will deal with the additional features required for an effective computer program.
The approach is designed to solve a general set of n equations:
$$a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1n}x_n = b_1 \tag{9.12a}$$
$$a_{21}x_1 + a_{22}x_2 + a_{23}x_3 + \cdots + a_{2n}x_n = b_2 \tag{9.12b}$$
$$\vdots$$
$$a_{n1}x_1 + a_{n2}x_2 + a_{n3}x_3 + \cdots + a_{nn}x_n = b_n \tag{9.12c}$$
As was the case with the solution of two equations, the technique for n equations consists of two phases: elimination of unknowns and solution through back substitution.

Forward Elimination of Unknowns. The first phase is designed to reduce the set of equations to an upper triangular system (Fig. 9.3). The initial step will be to eliminate the first unknown, $x_1$, from the second through the nth equations. To do this, multiply Eq. (9.12a) by $a_{21}/a_{11}$ to give
$$a_{21}x_1 + \frac{a_{21}}{a_{11}}a_{12}x_2 + \cdots + \frac{a_{21}}{a_{11}}a_{1n}x_n = \frac{a_{21}}{a_{11}}b_1 \tag{9.13}$$
Now, this equation can be subtracted from Eq. (9.12b) to give
$$\left(a_{22} - \frac{a_{21}}{a_{11}}a_{12}\right)x_2 + \cdots + \left(a_{2n} - \frac{a_{21}}{a_{11}}a_{1n}\right)x_n = b_2 - \frac{a_{21}}{a_{11}}b_1$$
or
$$a'_{22}x_2 + \cdots + a'_{2n}x_n = b'_2$$
where the prime indicates that the elements have been changed from their original values.
The procedure is then repeated for the remaining equations. For instance, Eq. (9.12a) can be multiplied by $a_{31}/a_{11}$ and the result subtracted from the third equation. Repeating
  • 270. the procedure for the remaining equations results in the following modified system:
$$a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1n}x_n = b_1 \tag{9.14a}$$
$$a'_{22}x_2 + a'_{23}x_3 + \cdots + a'_{2n}x_n = b'_2 \tag{9.14b}$$
$$a'_{32}x_2 + a'_{33}x_3 + \cdots + a'_{3n}x_n = b'_3 \tag{9.14c}$$
$$\vdots$$
$$a'_{n2}x_2 + a'_{n3}x_3 + \cdots + a'_{nn}x_n = b'_n \tag{9.14d}$$
For the foregoing steps, Eq. (9.12a) is called the pivot equation and $a_{11}$ is called the pivot coefficient or element. Note that the process of multiplying the first row by $a_{21}/a_{11}$ is equivalent to dividing it by $a_{11}$ and multiplying it by $a_{21}$. Sometimes the division operation is referred to as normalization. We make this distinction because a zero pivot element can interfere with normalization by causing a division by zero. We will return to this important issue after we complete our description of naive Gauss elimination.
Now repeat the above to eliminate the second unknown from Eqs. (9.14c) through (9.14d). To do this, multiply Eq. (9.14b) by $a'_{32}/a'_{22}$ and subtract the result from Eq. (9.14c). Perform a similar elimination for the remaining equations to yield
$$a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1n}x_n = b_1$$
$$a'_{22}x_2 + a'_{23}x_3 + \cdots + a'_{2n}x_n = b'_2$$
$$a''_{33}x_3 + \cdots + a''_{3n}x_n = b''_3$$
$$\vdots$$
$$a''_{n3}x_3 + \cdots + a''_{nn}x_n = b''_n$$
where the double prime indicates that the elements have been modified twice.

FIGURE 9.3 The two phases of Gauss elimination: forward elimination and back substitution. The primes indicate the number of times that the coefficients and constants have been modified.
Forward elimination:
$$\left[\begin{array}{ccc|c} a_{11} & a_{12} & a_{13} & b_1 \\ a_{21} & a_{22} & a_{23} & b_2 \\ a_{31} & a_{32} & a_{33} & b_3 \end{array}\right] \Rightarrow \left[\begin{array}{ccc|c} a_{11} & a_{12} & a_{13} & b_1 \\ & a'_{22} & a'_{23} & b'_2 \\ & & a''_{33} & b''_3 \end{array}\right]$$
Back substitution:
$$x_3 = b''_3/a''_{33}$$
$$x_2 = (b'_2 - a'_{23}x_3)/a'_{22}$$
$$x_1 = (b_1 - a_{12}x_2 - a_{13}x_3)/a_{11}$$
  • 271. The procedure can be continued using the remaining pivot equations. The final manipulation in the sequence is to use the (n − 1)th equation to eliminate the $x_{n-1}$ term from the nth equation. At this point, the system will have been transformed to an upper triangular system (recall Box PT3.1):
$$a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1n}x_n = b_1 \tag{9.15a}$$
$$a'_{22}x_2 + a'_{23}x_3 + \cdots + a'_{2n}x_n = b'_2 \tag{9.15b}$$
$$a''_{33}x_3 + \cdots + a''_{3n}x_n = b''_3 \tag{9.15c}$$
$$\vdots$$
$$a_{nn}^{(n-1)}x_n = b_n^{(n-1)} \tag{9.15d}$$
Pseudocode to implement forward elimination is presented in Fig. 9.4a. Notice that three nested loops provide a concise representation of the process. The outer loop moves down the matrix from one pivot row to the next. The middle loop moves below the pivot row to each of the subsequent rows where elimination is to take place. Finally, the innermost loop progresses across the columns to eliminate or transform the elements of a particular row.

Back Substitution. Equation (9.15d) can now be solved for $x_n$:
$$x_n = \frac{b_n^{(n-1)}}{a_{nn}^{(n-1)}} \tag{9.16}$$
This result can be back-substituted into the (n − 1)th equation to solve for $x_{n-1}$. The procedure, which is repeated to evaluate the remaining x’s, can be represented by the following formula:
$$x_i = \frac{b_i^{(i-1)} - \sum_{j=i+1}^{n} a_{ij}^{(i-1)}x_j}{a_{ii}^{(i-1)}} \quad \text{for } i = n-1, n-2, \ldots, 1 \tag{9.17}$$

(a)
    DOFOR k = 1, n - 1
      DOFOR i = k + 1, n
        factor = a_{i,k} / a_{k,k}
        DOFOR j = k + 1 to n
          a_{i,j} = a_{i,j} - factor · a_{k,j}
        END DO
        b_i = b_i - factor · b_k
      END DO
    END DO
(b)
    x_n = b_n / a_{n,n}
    DOFOR i = n - 1, 1, -1
      sum = b_i
      DOFOR j = i + 1, n
        sum = sum - a_{i,j} · x_j
      END DO
      x_i = sum / a_{i,i}
    END DO
FIGURE 9.4 Pseudocode to perform (a) forward elimination and (b) back substitution.
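The pseudocode of Fig. 9.4 can be translated almost line for line into a runnable form. The following Python sketch is one possible rendering, assuming list-of-lists storage and zero-based indexing; the name `naive_gauss` is an illustrative choice. Like the pseudocode, it makes no attempt to avoid a zero pivot.

```python
def naive_gauss(a, b):
    """Naive Gauss elimination following Fig. 9.4 (no pivoting).

    a is a list of n rows, b a list of n right-hand-side values.
    Copies are made so the caller's data is left untouched.
    """
    n = len(b)
    a = [list(row) for row in a]
    b = list(b)

    # (a) forward elimination: three nested loops
    for k in range(n - 1):               # pivot row
        for i in range(k + 1, n):        # rows below the pivot row
            factor = a[i][k] / a[k][k]   # fails if the pivot is zero
            for j in range(k + 1, n):    # columns right of the pivot
                a[i][j] -= factor * a[k][j]
            b[i] -= factor * b[k]

    # (b) back substitution, Eqs. (9.16) and (9.17)
    x = [0.0] * n
    x[n - 1] = b[n - 1] / a[n - 1][n - 1]
    for i in range(n - 2, -1, -1):
        total = b[i]                     # temporary accumulator ("sum")
        for j in range(i + 1, n):
            total -= a[i][j] * x[j]
        x[i] = total / a[i][i]
    return x


# System from Example 9.4; the exact solution is x1 = 4, x2 = 3
print(naive_gauss([[3, 2], [-1, 2]], [18, 2]))
```

Because the arithmetic is floating point, the printed values may differ from 4 and 3 in the last digit or two.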
  • 272. Pseudocode to implement Eqs. (9.16) and (9.17) is presented in Fig. 9.4b. Notice the similarity between this pseudocode and that in Fig. PT3.4 for matrix multiplication. As with Fig. PT3.4, a temporary variable, sum, is used to accumulate the summation from Eq. (9.17). This results in a somewhat faster execution time than if the summation were accumulated in $b_i$. More importantly, it allows efficient improvement in precision if the variable, sum, is declared in double precision.

EXAMPLE 9.5 Naive Gauss Elimination
Problem Statement. Use Gauss elimination to solve
$$3x_1 - 0.1x_2 - 0.2x_3 = 7.85 \tag{E9.5.1}$$
$$0.1x_1 + 7x_2 - 0.3x_3 = -19.3 \tag{E9.5.2}$$
$$0.3x_1 - 0.2x_2 + 10x_3 = 71.4 \tag{E9.5.3}$$
Carry six significant figures during the computation.
Solution. The first part of the procedure is forward elimination. Multiply Eq. (E9.5.1) by (0.1)/3 and subtract the result from Eq. (E9.5.2) to give
$$7.00333x_2 - 0.293333x_3 = -19.5617$$
Then multiply Eq. (E9.5.1) by (0.3)/3 and subtract it from Eq. (E9.5.3) to eliminate $x_1$. After these operations, the set of equations is
$$3x_1 - 0.1x_2 - 0.2x_3 = 7.85 \tag{E9.5.4}$$
$$7.00333x_2 - 0.293333x_3 = -19.5617 \tag{E9.5.5}$$
$$-0.190000x_2 + 10.0200x_3 = 70.6150 \tag{E9.5.6}$$
To complete the forward elimination, $x_2$ must be removed from Eq. (E9.5.6). To accomplish this, multiply Eq. (E9.5.5) by −0.190000/7.00333 and subtract the result from Eq. (E9.5.6). This eliminates $x_2$ from the third equation and reduces the system to an upper triangular form, as in
$$3x_1 - 0.1x_2 - 0.2x_3 = 7.85 \tag{E9.5.7}$$
$$7.00333x_2 - 0.293333x_3 = -19.5617 \tag{E9.5.8}$$
$$10.0120x_3 = 70.0843 \tag{E9.5.9}$$
We can now solve these equations by back substitution. First, Eq. (E9.5.9) can be solved for
$$x_3 = \frac{70.0843}{10.0120} = 7.0000 \tag{E9.5.10}$$
This result can be back-substituted into Eq. (E9.5.8):
$$7.00333x_2 - 0.293333(7.0000) = -19.5617$$
which can be solved for
$$x_2 = \frac{-19.5617 + 0.293333(7.0000)}{7.00333} = -2.50000 \tag{E9.5.11}$$
  • 273. Finally, Eqs. (E9.5.10) and (E9.5.11) can be substituted into Eq. (E9.5.4):
$$3x_1 - 0.1(-2.50000) - 0.2(7.0000) = 7.85$$
which can be solved for
$$x_1 = \frac{7.85 + 0.1(-2.50000) + 0.2(7.0000)}{3} = 3.00000$$
The results are identical to the exact solution of $x_1 = 3$, $x_2 = -2.5$, and $x_3 = 7$. This can be verified by substituting the results into the original equation set:
$$3(3) - 0.1(-2.5) - 0.2(7) = 7.85$$
$$0.1(3) + 7(-2.5) - 0.3(7) = -19.3$$
$$0.3(3) - 0.2(-2.5) + 10(7) = 71.4$$

9.2.1 Operation Counting
The execution time of Gauss elimination depends on the number of floating-point operations (or flops) involved in the algorithm. On modern computers using math coprocessors, the time consumed to perform addition/subtraction and multiplication/division is about the same. Therefore, totaling up these operations provides insight into which parts of the algorithm are most time consuming and how computation time increases as the system gets larger.
Before analyzing naive Gauss elimination, we will first define some quantities that facilitate operation counting:
$$\sum_{i=1}^{m} c\,f(i) = c\sum_{i=1}^{m} f(i) \qquad \sum_{i=1}^{m} \left[f(i) + g(i)\right] = \sum_{i=1}^{m} f(i) + \sum_{i=1}^{m} g(i) \tag{9.18a,b}$$
$$\sum_{i=1}^{m} 1 = 1 + 1 + 1 + \cdots + 1 = m \qquad \sum_{i=k}^{m} 1 = m - k + 1 \tag{9.18c,d}$$
$$\sum_{i=1}^{m} i = 1 + 2 + 3 + \cdots + m = \frac{m(m+1)}{2} = \frac{m^2}{2} + O(m) \tag{9.18e}$$
$$\sum_{i=1}^{m} i^2 = 1^2 + 2^2 + 3^2 + \cdots + m^2 = \frac{m(m+1)(2m+1)}{6} = \frac{m^3}{3} + O(m^2) \tag{9.18f}$$
where $O(m^n)$ means “terms of order $m^n$ and lower.”
Now let us examine the naive Gauss elimination algorithm (Fig. 9.4a) in detail. We will first count the flops in the elimination stage. On the first pass through the outer loop, $k = 1$. Therefore, the limits on the middle loop are from $i = 2$ to $n$. According to Eq. (9.18d), this means that the number of iterations of the middle loop will be
$$\sum_{i=2}^{n} 1 = n - 2 + 1 = n - 1 \tag{9.19}$$
For every one of these iterations, there is one division to define the factor. The interior loop then performs a single multiplication and subtraction for each iteration from $j = 2$ to $n$.
Finally, there is one additional multiplication and subtraction for the right-hand-side value.
  • 274. Thus, for every iteration of the middle loop, the number of multiplications is
$$1 + [n - 2 + 1] + 1 = 1 + n \tag{9.20}$$
The total multiplications for the first pass through the outer loop is therefore obtained by multiplying Eq. (9.19) by (9.20) to give $[n - 1](1 + n)$. In like fashion, the number of subtractions is computed as $[n - 1](n)$.
Similar reasoning can be used to estimate the flops for the subsequent iterations of the outer loop. These can be summarized as

Outer Loop k    Middle Loop i    Addition/Subtraction flops    Multiplication/Division flops
1               2, n             (n - 1)(n)                    (n - 1)(n + 1)
2               3, n             (n - 2)(n - 1)                (n - 2)(n)
...             ...              ...                           ...
k               k + 1, n         (n - k)(n + 1 - k)            (n - k)(n + 2 - k)
...             ...              ...                           ...
n - 1           n, n             (1)(2)                        (1)(3)

Therefore, the total addition/subtraction flops for elimination can be computed as
$$\sum_{k=1}^{n-1} (n-k)(n+1-k) = \sum_{k=1}^{n-1} \left[n(n+1) - k(2n+1) + k^2\right]$$
or
$$n(n+1)\sum_{k=1}^{n-1} 1 - (2n+1)\sum_{k=1}^{n-1} k + \sum_{k=1}^{n-1} k^2$$
Applying some of the relationships from Eq. (9.18) yields
$$\left[n^3 + O(n)\right] - \left[n^3 + O(n^2)\right] + \left[\frac{1}{3}n^3 + O(n^2)\right] = \frac{n^3}{3} + O(n) \tag{9.21}$$
A similar analysis for the multiplication/division flops yields
$$\left[n^3 + O(n^2)\right] - \left[n^3 + O(n)\right] + \left[\frac{1}{3}n^3 + O(n^2)\right] = \frac{n^3}{3} + O(n^2) \tag{9.22}$$
Summing these results gives
$$\frac{2n^3}{3} + O(n^2)$$
Thus, the total number of flops is equal to $2n^3/3$ plus an additional component proportional to terms of order $n^2$ and lower. The result is written in this way because as n gets large, the $O(n^2)$ and lower terms become negligible. We are therefore justified in concluding that for large n, the effort involved in forward elimination converges on $2n^3/3$.
Because only a single loop is used, back substitution is much simpler to evaluate. The number of addition/subtraction flops is equal to $n(n-1)/2$. Because of the extra
  • 275. division prior to the loop, the number of multiplication/division flops is $n(n+1)/2$. These can be added to arrive at a total of
$$n^2 + O(n)$$
Thus, the total effort in naive Gauss elimination can be represented as
$$\underbrace{\frac{2n^3}{3} + O(n^2)}_{\text{Forward elimination}} + \underbrace{n^2 + O(n)}_{\text{Backward substitution}} \;\xrightarrow{\text{as } n \text{ increases}}\; \frac{2n^3}{3} + O(n^2) \tag{9.23}$$
Two useful general conclusions can be drawn from this analysis:
1. As the system gets larger, the computation time increases greatly. As in Table 9.1, the number of flops increases nearly three orders of magnitude for every order of magnitude increase in the dimension.
2. Most of the effort is incurred in the elimination step. Thus, efforts to make the method more efficient should probably focus on this step.

9.3 PITFALLS OF ELIMINATION METHODS
Whereas there are many systems of equations that can be solved with naive Gauss elimination, there are some pitfalls that must be explored before writing a general computer program to implement the method. Although the following material relates directly to naive Gauss elimination, the information is relevant for other elimination techniques as well.

9.3.1 Division by Zero
The primary reason that the foregoing technique is called “naive” is that during both the elimination and the back-substitution phases, it is possible that a division by zero can occur. For example, if we use naive Gauss elimination to solve
$$2x_2 + 3x_3 = 8$$
$$4x_1 + 6x_2 + 7x_3 = -3$$
$$2x_1 + x_2 + 6x_3 = 5$$
the normalization of the first row would involve division by $a_{11} = 0$. Problems also can arise when a coefficient is very close to zero. The technique of pivoting has been developed to partially avoid these problems. It will be described in Sec. 9.4.2.

TABLE 9.1 Number of Flops for Gauss Elimination.
n       Elimination    Back Substitution    Total Flops    2n^3/3         Percent Due to Elimination
10      705            100                  805            667            87.58%
100     671550         10000                681550         666667         98.53%
1000    6.67 × 10^8    1 × 10^6             6.68 × 10^8    6.67 × 10^8    99.85%
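The flop counts in Table 9.1 can be reproduced by walking the loop structure of Fig. 9.4 and tallying operations instead of performing them. The Python sketch below does this under the counting convention of Sec. 9.2.1 (each addition, subtraction, multiplication, or division counts as one flop); the function name `count_flops` is an illustrative assumption.

```python
def count_flops(n):
    """Tally the flops of naive Gauss elimination for an n-by-n system.

    Returns (elimination_flops, back_substitution_flops).
    """
    elim = 0
    for k in range(1, n):                # outer loop: k = 1 .. n-1
        for i in range(k + 1, n + 1):    # middle loop: i = k+1 .. n
            elim += 1                    # division that defines the factor
            elim += 2 * (n - k)          # inner loop: multiply + subtract per column
            elim += 2                    # multiply + subtract for the right-hand side

    back = 1                             # x_n = b_n / a_nn
    for i in range(n - 1, 0, -1):        # i = n-1 .. 1
        back += 2 * (n - i)              # multiply + subtract per term of the sum
        back += 1                        # final division by a_ii
    return elim, back


for n in (10, 100, 1000):
    elim, back = count_flops(n)
    total = elim + back
    print(n, elim, back, total, round(2 * n**3 / 3), f"{100 * elim / total:.2f}%")
```

For n = 10 and n = 100 the tallies are 705 and 671550 for elimination and 100 and 10000 for back substitution, matching the table.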
  • 276. 9.3.2 Round-Off Errors
Even though the solution in Example 9.5 was close to the true answer, there was a slight discrepancy in the result for $x_3$ [Eq. (E9.5.10)]. This discrepancy, which amounted to a relative error of −0.00043 percent, was due to our use of six significant figures during the computation. If we had used more significant figures, the error in the results would be reduced further. If we had used fractions instead of decimals (and consequently avoided round-off altogether), the answers would have been exact. However, because computers carry only a limited number of significant figures (recall Sec. 3.4.1), round-off errors can occur and must be considered when evaluating the results.
The problem of round-off error can become particularly important when large numbers of equations are to be solved. This is due to the fact that every result is dependent on previous results. Consequently, an error in the early steps will tend to propagate; that is, it will cause errors in subsequent steps.
Specifying the system size where round-off error becomes significant is complicated by the fact that the type of computer and the properties of the equations are determining factors. A rough rule of thumb is that round-off error may be important when dealing with 100 or more equations. In any event, you should always substitute your answers back into the original equations to check whether a substantial error has occurred. However, as discussed below, the magnitudes of the coefficients themselves can influence whether such an error check ensures a reliable result.

9.3.3 Ill-Conditioned Systems
The adequacy of the solution depends on the condition of the system. In Sec. 9.1.1, a graphical depiction of system condition was developed. As discussed in Sec. 4.2.3, well-conditioned systems are those where a small change in one or more of the coefficients results in a similar small change in the solution.
Ill-conditioned systems are those where small changes in coefficients result in large changes in the solution. An alternative interpretation of ill-conditioning is that a wide range of answers can approximately satisfy the equations. Because round-off errors can induce small changes in the coefficients, these artificial changes can lead to large solution errors for ill-conditioned systems, as illustrated in the following example.

EXAMPLE 9.6 Ill-Conditioned Systems
Problem Statement. Solve the following system:
$$x_1 + 2x_2 = 10 \tag{E9.6.1}$$
$$1.1x_1 + 2x_2 = 10.4 \tag{E9.6.2}$$
Then, solve it again, but with the coefficient of $x_1$ in the second equation modified slightly to 1.05.
Solution. Using Eqs. (9.10) and (9.11), the solution is
$$x_1 = \frac{2(10) - 2(10.4)}{1(2) - 2(1.1)} = 4$$
$$x_2 = \frac{1(10.4) - 1.1(10)}{1(2) - 2(1.1)} = 3$$
  • 277. However, with the slight change of the coefficient $a_{21}$ from 1.1 to 1.05, the result is changed dramatically to
$$x_1 = \frac{2(10) - 2(10.4)}{1(2) - 2(1.05)} = 8$$
$$x_2 = \frac{1(10.4) - 1.05(10)}{1(2) - 2(1.05)} = 1$$
Notice that the primary reason for the discrepancy between the two results is that the denominator represents the difference of two almost-equal numbers. As illustrated previously in Sec. 3.4.2, such differences are highly sensitive to slight variations in the numbers being manipulated.
At this point, you might suggest that substitution of the results into the original equations would alert you to the problem. Unfortunately, for ill-conditioned systems this is often not the case. Substitution of the erroneous values of $x_1 = 8$ and $x_2 = 1$ into Eqs. (E9.6.1) and (E9.6.2) yields
$$8 + 2(1) = 10 = 10$$
$$1.1(8) + 2(1) = 10.8 \cong 10.4$$
Therefore, although $x_1 = 8$ and $x_2 = 1$ is not the true solution to the original problem, the error check is close enough to possibly mislead you into believing that your solutions are adequate.

As was done previously in the section on graphical methods, a visual representation of ill-conditioning can be developed by plotting Eqs. (E9.6.1) and (E9.6.2) (recall Fig. 9.2). Because the slopes of the lines are almost equal, it is visually difficult to see exactly where they intersect. This visual difficulty is reflected quantitatively in the nebulous results of Example 9.6. We can mathematically characterize this situation by writing the two equations in general form:
$$a_{11}x_1 + a_{12}x_2 = b_1 \tag{9.24}$$
$$a_{21}x_1 + a_{22}x_2 = b_2 \tag{9.25}$$
Dividing Eq. (9.24) by $a_{12}$ and Eq. (9.25) by $a_{22}$ and rearranging yields alternative versions that are in the format of straight lines [$x_2$ = (slope) $x_1$ + intercept]:
$$x_2 = -\frac{a_{11}}{a_{12}}x_1 + \frac{b_1}{a_{12}}$$
$$x_2 = -\frac{a_{21}}{a_{22}}x_1 + \frac{b_2}{a_{22}}$$
Consequently, if the slopes are nearly equal,
$$\frac{a_{11}}{a_{12}} \cong \frac{a_{21}}{a_{22}}$$
  • 278. or, cross-multiplying,
$$a_{11}a_{22} \cong a_{12}a_{21}$$
which can be also expressed as
$$a_{11}a_{22} - a_{12}a_{21} \cong 0 \tag{9.26}$$
Now, recalling that $a_{11}a_{22} - a_{12}a_{21}$ is the determinant of a two-dimensional system [Eq. (9.3)], we arrive at the general conclusion that an ill-conditioned system is one with a determinant close to zero. In fact, if the determinant is exactly zero, the two slopes are identical, which connotes either no solution or an infinite number of solutions, as is the case for the singular systems depicted in Fig. 9.2a and b.
It is difficult to specify how close to zero the determinant must be to indicate ill-conditioning. This is complicated by the fact that the determinant can be changed by multiplying one or more of the equations by a scale factor without changing the solution. Consequently, the determinant is a relative value that is influenced by the magnitude of the coefficients.

EXAMPLE 9.7 Effect of Scale on the Determinant
Problem Statement. Evaluate the determinant of the following systems:
(a) From Example 9.1:
$$3x_1 + 2x_2 = 18 \tag{E9.7.1}$$
$$-x_1 + 2x_2 = 2 \tag{E9.7.2}$$
(b) From Example 9.6:
$$x_1 + 2x_2 = 10 \tag{E9.7.3}$$
$$1.1x_1 + 2x_2 = 10.4 \tag{E9.7.4}$$
(c) Repeat (b) but with the equations multiplied by 10.
Solution.
(a) The determinant of Eqs. (E9.7.1) and (E9.7.2), which are well-conditioned, is
$$D = 3(2) - 2(-1) = 8$$
(b) The determinant of Eqs. (E9.7.3) and (E9.7.4), which are ill-conditioned, is
$$D = 1(2) - 2(1.1) = -0.2$$
(c) The results of (a) and (b) seem to bear out the contention that ill-conditioned systems have near-zero determinants. However, suppose that the ill-conditioned system in (b) is multiplied by 10 to give
$$10x_1 + 20x_2 = 100$$
$$11x_1 + 20x_2 = 104$$
The multiplication of an equation by a constant has no effect on its solution. In addition, it is still ill-conditioned. This can be verified by the fact that multiplying by
  • 279. a constant has no effect on the graphical solution. However, the determinant is dramatically affected:
$$D = 10(20) - 20(11) = -20$$
Not only has it been raised two orders of magnitude, but it is now over twice as large as the determinant of the well-conditioned system in (a).

As illustrated by the previous example, the magnitude of the coefficients interjects a scale effect that complicates the relationship between system condition and determinant size. One way to partially circumvent this difficulty is to scale the equations so that the maximum element in any row is equal to 1.

EXAMPLE 9.8 Scaling
Problem Statement. Scale the systems of equations in Example 9.7 to a maximum value of 1 and recompute their determinants.
Solution.
(a) For the well-conditioned system, scaling results in
$$x_1 + 0.667x_2 = 6$$
$$-0.5x_1 + x_2 = 1$$
for which the determinant is
$$D = 1(1) - 0.667(-0.5) = 1.333$$
(b) For the ill-conditioned system, scaling gives
$$0.5x_1 + x_2 = 5$$
$$0.55x_1 + x_2 = 5.2$$
for which the determinant is
$$D = 0.5(1) - 1(0.55) = -0.05$$
(c) For the last case, scaling changes the system to the same form as in (b) and the determinant is also −0.05. Thus, the scale effect is removed.

In a previous section (Sec. 9.1.2), we suggested that the determinant is difficult to compute for more than three simultaneous equations. Therefore, it might seem that it does not provide a practical means for evaluating system condition. However, as described in Box 9.1, there is a simple algorithm that results from Gauss elimination that can be used to evaluate the determinant.
Aside from the approach used in the previous example, there are a variety of other ways to evaluate system condition. For example, there are alternative methods for normalizing the elements (see Stark, 1970). In addition, as described in the next chapter (Sec. 10.3), the matrix inverse and matrix norms can be employed to evaluate system condition.
Finally, a simple (but time-consuming) test is to modify the coefficients
  • 280. slightly and repeat the solution. If such modifications lead to drastically different results, the system is likely to be ill-conditioned.
As you might gather from the foregoing discussion, ill-conditioned systems are problematic. Fortunately, most linear algebraic equations derived from engineering-problem settings are naturally well-conditioned. In addition, some of the techniques outlined in Sec. 9.4 help to alleviate the problem.

9.3.4 Singular Systems
In the previous section, we learned that one way in which a system of equations can be ill-conditioned is when two or more of the equations are nearly identical. Obviously, it is even worse when the two are identical. In such cases, we would lose one degree of freedom, and would be dealing with the impossible case of n − 1 equations with n unknowns. Such cases might not be obvious to you, particularly when dealing with large equation sets. Consequently, it would be nice to have some way of automatically detecting singularity.
The answer to this problem is neatly offered by the fact that the determinant of a singular system is zero. This idea can, in turn, be connected to Gauss elimination by recognizing that after the elimination step, the determinant can be evaluated as the product of the diagonal elements (recall Box 9.1). Thus, a computer algorithm can test to discern whether a zero diagonal element is created during the elimination stage. If one is discovered, the calculation can be immediately terminated and a message displayed

Box 9.1 Determinant Evaluation Using Gauss Elimination
In Sec. 9.1.2, we stated that determinant evaluation by expansion of minors was impractical for large sets of equations. Thus, we concluded that Cramer’s rule would be applicable only to small systems. However, as mentioned in Sec. 9.3.3, the determinant has value in assessing system condition.
It would, therefore, be useful to have a practical method for computing this quantity.
Fortunately, Gauss elimination provides a simple way to do this. The method is based on the fact that the determinant of a triangular matrix can be simply computed as the product of its diagonal elements:
$$D = a_{11}a_{22}a_{33} \cdots a_{nn} \tag{B9.1.1}$$
The validity of this formulation can be illustrated for a 3 by 3 system:
$$D = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & 0 & a_{33} \end{vmatrix}$$
where the determinant can be evaluated as [recall Eq. (9.4)]
$$D = a_{11}\begin{vmatrix} a_{22} & a_{23} \\ 0 & a_{33} \end{vmatrix} - a_{12}\begin{vmatrix} 0 & a_{23} \\ 0 & a_{33} \end{vmatrix} + a_{13}\begin{vmatrix} 0 & a_{22} \\ 0 & 0 \end{vmatrix}$$
or, by evaluating the minors (that is, the 2 by 2 determinants),
$$D = a_{11}a_{22}a_{33} - a_{12}(0) + a_{13}(0) = a_{11}a_{22}a_{33}$$
Recall that the forward-elimination step of Gauss elimination results in an upper triangular system. Because the value of the determinant is not changed by the forward-elimination process, the determinant can be simply evaluated at the end of this step via
$$D = a_{11}a'_{22}a''_{33} \cdots a_{nn}^{(n-1)} \tag{B9.1.2}$$
where the superscripts signify the number of times that the elements have been modified by the elimination process. Thus, we can capitalize on the effort that has already been expended in reducing the system to triangular form and, in the bargain, come up with a simple estimate of the determinant.
There is a slight modification to the above approach when the program employs partial pivoting (Sec. 9.4.2). For this case, the determinant changes sign every time a row is pivoted. One way to represent this is to modify Eq. (B9.1.2):
$$D = a_{11}a'_{22}a''_{33} \cdots a_{nn}^{(n-1)}(-1)^p \tag{B9.1.3}$$
where p represents the number of times that rows are pivoted. This modification can be incorporated simply into a program; merely keep track of the number of pivots that take place during the course of the computation and then use Eq. (B9.1.3) to evaluate the determinant.
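Equation (B9.1.3) lends itself to a short program. The following Python sketch performs forward elimination with partial pivoting, counts the row interchanges, and multiplies the diagonal; the function name `det_by_elimination` and the early return for a zero pivot column are assumptions made for this illustration.

```python
def det_by_elimination(a):
    """Determinant via forward elimination with partial pivoting, Eq. (B9.1.3)."""
    n = len(a)
    a = [list(row) for row in a]         # work on a copy
    pivots = 0                           # p in Eq. (B9.1.3)
    for k in range(n - 1):
        # row (at or below k) holding the largest magnitude in column k
        p = max(range(k, n), key=lambda r: abs(a[r][k]))
        if a[p][k] == 0:
            return 0.0                   # whole column is zero: singular
        if p != k:
            a[k], a[p] = a[p], a[k]      # each interchange flips the sign
            pivots += 1
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            for j in range(k + 1, n):
                a[i][j] -= factor * a[k][j]
    d = (-1.0) ** pivots
    for i in range(n):
        d *= a[i][i]                     # product of the diagonal elements
    return d


# Systems (a), (b), and (c) of Example 9.7
print(det_by_elimination([[3, 2], [-1, 2]]))
print(det_by_elimination([[1, 2], [1.1, 2]]))
print(det_by_elimination([[10, 20], [11, 20]]))
```

Up to round-off, the three calls should reproduce the determinants 8, −0.2, and −20 found in Example 9.7.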
  • 281. alerting the user. We will show the details of how this is done when we present a full algorithm for Gauss elimination later in this chapter.

9.4 TECHNIQUES FOR IMPROVING SOLUTIONS
The following techniques can be incorporated into the naive Gauss elimination algorithm to circumvent some of the pitfalls discussed in the previous section.

9.4.1 Use of More Significant Figures
The simplest remedy for ill-conditioning is to use more significant figures in the computation. If your application can be extended to handle larger word size, such a feature will greatly reduce the problem. However, a price must be paid in the form of the computational and memory overhead connected with using extended precision (recall Sec. 3.4.1).

9.4.2 Pivoting
As mentioned at the beginning of Sec. 9.3, obvious problems occur when a pivot element is zero because the normalization step leads to division by zero. Problems may also arise when the pivot element is close to, rather than exactly equal to, zero because if the magnitude of the pivot element is small compared to the other elements, then round-off errors can be introduced.
Therefore, before each row is normalized, it is advantageous to determine the largest available coefficient in the column below the pivot element. The rows can then be switched so that the largest element is the pivot element. This is called partial pivoting. If columns as well as rows are searched for the largest element and then switched, the procedure is called complete pivoting. Complete pivoting is rarely used because switching columns changes the order of the x’s and, consequently, adds significant and usually unjustified complexity to the computer program. The following example illustrates the advantages of partial pivoting. Aside from avoiding division by zero, pivoting also minimizes round-off error. As such, it also serves as a partial remedy for ill-conditioning.

EXAMPLE 9.9 Partial Pivoting
Problem Statement.
Use Gauss elimination to solve
$$0.0003x_1 + 3.0000x_2 = 2.0001$$
$$1.0000x_1 + 1.0000x_2 = 1.0000$$
Note that in this form the first pivot element, $a_{11} = 0.0003$, is very close to zero. Then repeat the computation, but partial pivot by reversing the order of the equations. The exact solution is $x_1 = 1/3$ and $x_2 = 2/3$.
Solution. Multiplying the first equation by 1/(0.0003) yields
$$x_1 + 10{,}000x_2 = 6667$$
which can be used to eliminate $x_1$ from the second equation:
$$-9999x_2 = -6666$$
  • 282. which can be solved for
$$x_2 = \frac{2}{3}$$
This result can be substituted back into the first equation to evaluate $x_1$:
$$x_1 = \frac{2.0001 - 3(2/3)}{0.0003} \tag{E9.9.1}$$
However, due to subtractive cancellation, the result is very sensitive to the number of significant figures carried in the computation:

Significant Figures    x2           x1           Absolute Value of Percent Relative Error for x1
3                      0.667        -3.33        1099
4                      0.6667       0.0000       100
5                      0.66667      0.30000      10
6                      0.666667     0.330000     1
7                      0.6666667    0.3330000    0.1

Note how the solution for $x_1$ is highly dependent on the number of significant figures. This is because in Eq. (E9.9.1), we are subtracting two almost-equal numbers.
On the other hand, if the equations are solved in reverse order, the row with the larger pivot element is normalized. The equations are
$$1.0000x_1 + 1.0000x_2 = 1.0000$$
$$0.0003x_1 + 3.0000x_2 = 2.0001$$
Elimination and substitution yield $x_2 = 2/3$. For different numbers of significant figures, $x_1$ can be computed from the first equation, as in
$$x_1 = \frac{1 - (2/3)}{1} \tag{E9.9.2}$$
This case is much less sensitive to the number of significant figures in the computation:

Significant Figures    x2           x1           Absolute Value of Percent Relative Error for x1
3                      0.667        0.333        0.1
4                      0.6667       0.3333       0.01
5                      0.66667      0.33333      0.001
6                      0.666667     0.333333     0.0001
7                      0.6666667    0.3333333    0.00001

Thus, a pivot strategy is much more satisfactory.
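The sensitivity contrasted in Eqs. (E9.9.1) and (E9.9.2) can be explored with exact rational arithmetic, so that the only error introduced is the rounding of $x_2$. The Python sketch below is a minimal illustration for four through seven significant figures; the helper names are hypothetical, and the assumption that only $x_2$ is rounded (with the remaining arithmetic exact) is a simplification of a real finite-precision computation.

```python
from fractions import Fraction


def rounded_x2(sig):
    """x2 = 2/3 rounded to `sig` significant figures (2/3 lies between 0.1 and 1)."""
    scale = 10 ** sig
    return Fraction(round(Fraction(2, 3) * scale), scale)


def x1_without_pivoting(sig):
    """Eq. (E9.9.1): back-substitute into 0.0003*x1 + 3*x2 = 2.0001."""
    return (Fraction("2.0001") - 3 * rounded_x2(sig)) / Fraction("0.0003")


def x1_with_pivoting(sig):
    """Eq. (E9.9.2): back-substitute into x1 + x2 = 1."""
    return 1 - rounded_x2(sig)


exact = Fraction(1, 3)
for sig in range(4, 8):
    for label, x1 in (("no pivot", x1_without_pivoting(sig)),
                      ("pivot", x1_with_pivoting(sig))):
        err = abs((exact - x1) / exact) * 100   # percent relative error
        print(sig, label, float(x1), float(err))
```

Dividing by the small pivot 0.0003 magnifies the rounding error in $x_2$ by a factor of 10,000, whereas the pivoted form passes it through unchanged.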
  • 283. General-purpose computer programs must include a pivot strategy. Figure 9.5 provides a simple algorithm to implement such a strategy. Notice that the algorithm consists of two major loops. After storing the current pivot element and its row number as the variables, big and p, the first loop compares the pivot element with the elements below it to check whether any of these is larger than the pivot element. If so, the new largest element and its row number are stored in big and p. Then, the second loop switches the original pivot row with the one with the largest element so that the latter becomes the new pivot row. This pseudocode can be integrated into a program based on the other elements of Gauss elimination outlined in Fig. 9.4. The best way to do this is to employ a modular approach and write Fig. 9.5 as a subroutine (or procedure) that would be called directly after the beginning of the first loop in Fig. 9.4a.
Note that the second IF/THEN construct in Fig. 9.5 physically interchanges the rows. For large matrices, this can become quite time consuming. Consequently, most codes do not actually exchange rows but rather keep track of the pivot rows by storing the appropriate subscripts in a vector. This vector then provides a basis for specifying the proper row ordering during the forward-elimination and back-substitution operations. Thus, the operations are said to be implemented in place.

9.4.3 Scaling
In Sec. 9.3.3, we proposed that scaling had value in standardizing the size of the determinant. Beyond this application, it has utility in minimizing round-off errors for those cases where some of the equations in a system have much larger coefficients than others. Such situations are frequently encountered in engineering practice when widely different units are used in the development of simultaneous equations.
For instance, in electric-circuit problems, the unknown voltages can be expressed in units ranging from microvolts to kilovolts. Similar examples can arise in all fields of engineering. As long as each equation is consistent, the system will be technically correct and solvable. However, the use of widely differing units can lead to coefficients of widely differing magnitudes. This, in turn, can have an impact on round-off error as it affects pivoting, as illustrated by the following example.

    p = k
    big = |a_{k,k}|
    DOFOR ii = k + 1, n
      dummy = |a_{ii,k}|
      IF (dummy > big)
        big = dummy
        p = ii
      END IF
    END DO
    IF (p ≠ k)
      DOFOR jj = k, n
        dummy = a_{p,jj}
        a_{p,jj} = a_{k,jj}
        a_{k,jj} = dummy
      END DO
      dummy = b_p
      b_p = b_k
      b_k = dummy
    END IF
FIGURE 9.5 Pseudocode to implement partial pivoting.

EXAMPLE 9.10 Effect of Scaling on Pivoting and Round-Off
Problem Statement.
(a) Solve the following set of equations using Gauss elimination and a pivoting strategy:
$$2x_1 + 100{,}000x_2 = 100{,}000$$
$$x_1 + x_2 = 2$$
(b) Repeat the solution after scaling the equations so that the maximum coefficient in each row is 1.
(c) Finally, use the scaled coefficients to determine whether pivoting is necessary. However, actually solve the equations with the original coefficient values.
For all cases, retain only three significant figures. Note that the correct answers are $x_1 = 1.00002$ and $x_2 = 0.99998$ or, for three significant figures, $x_1 = x_2 = 1.00$.
  • 284. Solution. (a) Without scaling, forward elimination is applied to give

$$\begin{aligned} 2x_1 + 100{,}000x_2 &= 100{,}000 \\ -50{,}000x_2 &= -50{,}000 \end{aligned}$$

which can be solved by back substitution for

$$x_2 = 1.00 \qquad x_1 = 0.00$$

Although $x_2$ is correct, $x_1$ is 100 percent in error because of round-off.

(b) Scaling transforms the original equations to

$$\begin{aligned} 0.00002x_1 + x_2 &= 1 \\ x_1 + x_2 &= 2 \end{aligned}$$

Therefore, the rows should be pivoted to put the greatest value on the diagonal.

$$\begin{aligned} x_1 + x_2 &= 2 \\ 0.00002x_1 + x_2 &= 1 \end{aligned}$$

Forward elimination yields

$$\begin{aligned} x_1 + x_2 &= 2 \\ x_2 &= 1.00 \end{aligned}$$

which can be solved for $x_1 = x_2 = 1$. Thus, scaling leads to the correct answer.

(c) The scaled coefficients indicate that pivoting is necessary. We therefore pivot but retain the original coefficients to give

$$\begin{aligned} x_1 + x_2 &= 2 \\ 2x_1 + 100{,}000x_2 &= 100{,}000 \end{aligned}$$

Forward elimination yields

$$\begin{aligned} x_1 + x_2 &= 2 \\ 100{,}000x_2 &= 100{,}000 \end{aligned}$$

which can be solved for the correct answer: $x_1 = x_2 = 1$. Thus, scaling was useful in determining whether pivoting was necessary, but the equations themselves did not require scaling to arrive at a correct result.
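Parts (a) and (c) of the example can be reproduced with a short Python sketch. It is only an illustration under stated assumptions: the helper names `sig3` and `solve_2x2` are hypothetical, and three-significant-figure arithmetic is imitated by rounding after every operation, which may not match every hand-calculation convention.

```python
def sig3(x):
    """Round x to three significant figures (hypothetical helper)."""
    return float(f"{x:.3g}")


def solve_2x2(a, b):
    """Naive Gauss elimination on a 2x2 system, rounding every result to 3 s.f."""
    factor = sig3(a[1][0] / a[0][0])
    # Eliminate x1 from the second equation.
    a22 = sig3(a[1][1] - sig3(factor * a[0][1]))
    b2 = sig3(b[1] - sig3(factor * b[0]))
    # Back substitution.
    x2 = sig3(b2 / a22)
    x1 = sig3(sig3(b[0] - sig3(a[0][1] * x2)) / a[0][0])
    return x1, x2


# Part (a): original row order, no pivoting.
no_pivot = solve_2x2([[2.0, 100000.0], [1.0, 1.0]], [100000.0, 2.0])

# Part (c): rows interchanged, original (unscaled) coefficients retained.
pivoted = solve_2x2([[1.0, 1.0], [2.0, 100000.0]], [2.0, 100000.0])

print(no_pivot)  # x1 is lost to round-off
print(pivoted)   # both unknowns recovered
```

The only difference between the two calls is the row order, which is exactly the decision that the scaled coefficients are used to make.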
  • 285. FIGURE 9.6 Pseudocode to implement Gauss elimination with partial pivoting.

    SUB Gauss (a, b, n, x, tol, er)
      DIMENSION s(n)
      er = 0
      DOFOR i = 1, n
        s(i) = ABS(a(i,1))
        DOFOR j = 2, n
          IF ABS(a(i,j)) > s(i) THEN s(i) = ABS(a(i,j))
        END DO
      END DO
      CALL Eliminate(a, s, n, b, tol, er)
      IF er ≠ -1 THEN
        CALL Substitute(a, n, b, x)
      END IF
    END Gauss

    SUB Eliminate (a, s, n, b, tol, er)
      DOFOR k = 1, n - 1
        CALL Pivot (a, b, s, n, k)
        IF ABS(a(k,k)/s(k)) < tol THEN
          er = -1
          EXIT DO
        END IF
        DOFOR i = k + 1, n
          factor = a(i,k)/a(k,k)
          DOFOR j = k + 1, n
            a(i,j) = a(i,j) - factor*a(k,j)
          END DO
          b(i) = b(i) - factor * b(k)
        END DO
      END DO
      IF ABS(a(n,n)/s(n)) < tol THEN er = -1
    END Eliminate

    SUB Pivot (a, b, s, n, k)
      p = k
      big = ABS(a(k,k)/s(k))
      DOFOR ii = k + 1, n
        dummy = ABS(a(ii,k)/s(ii))
        IF dummy > big THEN
          big = dummy
          p = ii
        END IF
      END DO
      IF p ≠ k THEN
        DOFOR jj = k, n
          dummy = a(p,jj)
          a(p,jj) = a(k,jj)
          a(k,jj) = dummy
        END DO
        dummy = b(p)
        b(p) = b(k)
        b(k) = dummy
        dummy = s(p)
        s(p) = s(k)
        s(k) = dummy
      END IF
    END Pivot

    SUB Substitute (a, n, b, x)
      x(n) = b(n)/a(n,n)
      DOFOR i = n - 1, 1, -1
        sum = 0
        DOFOR j = i + 1, n
          sum = sum + a(i,j) * x(j)
        END DO
        x(i) = (b(i) - sum) / a(i,i)
      END DO
    END Substitute
  • 286. As in the previous example, scaling has utility in minimizing round-off. However, it should be noted that scaling itself also leads to round-off. For example, given the equation

$$2x_1 + 300{,}000x_2 = 1$$

and using three significant figures, scaling leads to

$$0.00000667x_1 + x_2 = 0.00000333$$

Thus, scaling introduces a round-off error to the first coefficient and the right-hand-side constant. For this reason, it is sometimes suggested that scaling should be employed only as in part (c) of the preceding example. That is, it is used to calculate scaled values for the coefficients solely as a criterion for pivoting, but the original coefficient values are retained for the actual elimination and substitution computations. This involves a trade-off if the determinant is being calculated as part of the program. That is, the resulting determinant will be unscaled. However, because many applications of Gauss elimination do not require determinant evaluation, it is the most common approach and will be used in the algorithm in the next section.

9.4.4 Computer Algorithm for Gauss Elimination

The algorithms from Figs. 9.4 and 9.5 can now be combined into a larger algorithm to implement the entire Gauss elimination algorithm. Figure 9.6 shows an algorithm for a general subroutine to implement Gauss elimination.

Note that the program includes modules for the three primary operations of the Gauss elimination algorithm: forward elimination, back substitution, and pivoting. In addition, there are several aspects of the code that differ and represent improvements over the pseudocodes from Figs. 9.4 and 9.5. These are:

- The equations are not scaled, but scaled values of the elements are used to determine whether pivoting is to be implemented.
- The diagonal term is monitored during the pivoting phase to detect near-zero occurrences in order to flag singular systems.
  If it passes back a value of er = -1, a singular matrix has been detected and the computation should be terminated. A parameter tol is set by the user to a small number in order to detect near-zero occurrences.

EXAMPLE 9.11 Solution of Linear Algebraic Equations Using the Computer

Problem Statement. A computer program to solve linear algebraic equations such as one based on Fig. 9.6 can be used to solve a problem associated with the falling parachutist example discussed in Chap. 1. Suppose that a team of three parachutists is connected by a weightless cord while free-falling at a velocity of 5 m/s (Fig. 9.7). Calculate the tension in each section of cord and the acceleration of the team, given the following:

    Parachutist    Mass, kg    Drag Coefficient, kg/s
    1              70          10
    2              60          14
    3              40          17

FIGURE 9.7 Three parachutists free-falling while connected by weightless cords.

  • 287. Solution. Free-body diagrams for each of the parachutists are depicted in Fig. 9.8. Summing the forces in the vertical direction and using Newton's second law gives a set of three simultaneous linear equations:

$$\begin{aligned} m_1 g - T - c_1 v &= m_1 a \\ m_2 g + T - c_2 v - R &= m_2 a \\ m_3 g - c_3 v + R &= m_3 a \end{aligned}$$

These equations have three unknowns: $a$, $T$, and $R$. After substituting the known values, the equations can be expressed in matrix form as ($g = 9.81\ \text{m/s}^2$),

$$\begin{bmatrix} 70 & 1 & 0 \\ 60 & -1 & 1 \\ 40 & 0 & -1 \end{bmatrix} \begin{Bmatrix} a \\ T \\ R \end{Bmatrix} = \begin{Bmatrix} 636.7 \\ 518.6 \\ 307.4 \end{Bmatrix}$$

This system can be solved using your own software. The result is $a = 8.6041\ \text{m/s}^2$; $T = 34.4118$ N; and $R = 36.7647$ N.

FIGURE 9.8 Free-body diagrams for each of the three falling parachutists.
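One possible Python rendering of the approach in Fig. 9.6 is sketched below and applied to the parachutist system. It is not the book's program: the function name `gauss_solve` is hypothetical, the routine works on copies rather than in place, and a singular system raises `ValueError` instead of returning er = -1.

```python
def gauss_solve(a, b, tol=1e-12):
    """Solve [a]{x} = {b} by Gauss elimination with scaled partial pivoting.

    The equations themselves are not scaled; the row maxima in s are used
    only to decide which row becomes the pivot row.
    """
    n = len(b)
    a = [row[:] for row in a]  # work on copies so the caller's data survive
    b = b[:]
    s = [max(abs(v) for v in row) for row in a]
    if any(v == 0.0 for v in s):
        raise ValueError("matrix has a row of zeros")

    for k in range(n - 1):
        # Row whose scaled entry in column k is largest in magnitude.
        p = max(range(k, n), key=lambda i: abs(a[i][k]) / s[i])
        if p != k:
            a[k], a[p] = a[p], a[k]
            b[k], b[p] = b[p], b[k]
            s[k], s[p] = s[p], s[k]
        if abs(a[k][k]) / s[k] < tol:
            raise ValueError("matrix is singular to within tol")
        # Forward elimination below the pivot.
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            for j in range(k + 1, n):
                a[i][j] -= factor * a[k][j]
            b[i] -= factor * b[k]
    if abs(a[n - 1][n - 1]) / s[n - 1] < tol:
        raise ValueError("matrix is singular to within tol")

    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        total = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - total) / a[i][i]
    return x


# Coefficients and right-hand side from Example 9.11.
a_mat = [[70.0, 1.0, 0.0],
         [60.0, -1.0, 1.0],
         [40.0, 0.0, -1.0]]
rhs = [636.7, 518.6, 307.4]
accel, tension, r_force = gauss_solve(a_mat, rhs)
print(f"a = {accel:.4f} m/s^2, T = {tension:.4f} N, R = {r_force:.4f} N")
```

Swapping whole rows keeps the sketch short; a production code would more likely track the row order in an index vector, as noted in the discussion of pivoting.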
  • 288. 9.5 COMPLEX SYSTEMS

In some problems, it is possible to obtain a complex system of equations

$$[C]\{Z\} = \{W\} \tag{9.27}$$

where

$$[C] = [A] + i[B] \qquad \{Z\} = \{X\} + i\{Y\} \qquad \{W\} = \{U\} + i\{V\} \tag{9.28}$$

where $i = \sqrt{-1}$.

The most straightforward way to solve such a system is to employ one of the algorithms described in this part of the book, but replace all real operations with complex ones. Of course, this is only possible for those languages, such as Fortran, that allow complex variables.

For languages that do not permit the declaration of complex variables, it is possible to write a code to convert real to complex operations. However, this is not a trivial task. An alternative is to convert the complex system into an equivalent one dealing with real variables. This can be done by substituting Eq. (9.28) into Eq. (9.27) and equating real and imaginary parts of the resulting equation to yield

$$[A]\{X\} - [B]\{Y\} = \{U\} \tag{9.29}$$

and

$$[B]\{X\} + [A]\{Y\} = \{V\} \tag{9.30}$$

Thus, the system of n complex equations is converted to a set of 2n real ones. This means that storage and execution time will be increased significantly. Consequently, a trade-off exists regarding this option. If you evaluate complex systems infrequently, it is preferable to use Eqs. (9.29) and (9.30) because of their convenience. However, if you use them often and desire to employ a language that does not allow complex data types, it may be worth the up-front programming effort to write a customized equation solver that converts real to complex operations.

9.6 NONLINEAR SYSTEMS OF EQUATIONS

Recall that at the end of Chap. 6 we presented an approach to solve two nonlinear equations with two unknowns. This approach can be extended to the general case of solving n simultaneous nonlinear equations.

$$\begin{aligned} f_1(x_1, x_2, \ldots, x_n) &= 0 \\ f_2(x_1, x_2, \ldots, x_n) &= 0 \\ &\;\;\vdots \\ f_n(x_1, x_2, \ldots, x_n) &= 0 \end{aligned} \tag{9.31}$$
  • 289. The solution of this system consists of the set of x values that simultaneously result in all the equations equaling zero.

As described in Sec. 6.5.2, one approach to solving such systems is based on a multidimensional version of the Newton-Raphson method. Thus, a Taylor series expansion is written for each equation. For example, for the kth equation,

$$f_{k,i+1} = f_{k,i} + (x_{1,i+1} - x_{1,i})\frac{\partial f_{k,i}}{\partial x_1} + (x_{2,i+1} - x_{2,i})\frac{\partial f_{k,i}}{\partial x_2} + \cdots + (x_{n,i+1} - x_{n,i})\frac{\partial f_{k,i}}{\partial x_n} \tag{9.32}$$

where the first subscript, k, represents the equation or unknown and the second subscript denotes whether the value or function in question is at the present value (i) or at the next value (i + 1).

Equations of the form of (9.32) are written for each of the original nonlinear equations. Then, as was done in deriving Eq. (6.20) from (6.19), all $f_{k,i+1}$ terms are set to zero as would be the case at the root, and Eq. (9.32) can be written as

$$-f_{k,i} + x_{1,i}\frac{\partial f_{k,i}}{\partial x_1} + x_{2,i}\frac{\partial f_{k,i}}{\partial x_2} + \cdots + x_{n,i}\frac{\partial f_{k,i}}{\partial x_n} = x_{1,i+1}\frac{\partial f_{k,i}}{\partial x_1} + x_{2,i+1}\frac{\partial f_{k,i}}{\partial x_2} + \cdots + x_{n,i+1}\frac{\partial f_{k,i}}{\partial x_n} \tag{9.33}$$

Notice that the only unknowns in Eq. (9.33) are the $x_{k,i+1}$ terms on the right-hand side. All other quantities are located at the present value (i) and, thus, are known at any iteration. Consequently, the set of equations generally represented by Eq. (9.33) (that is, with k = 1, 2, . . . , n) constitutes a set of linear simultaneous equations that can be solved by methods elaborated in this part of the book.

Matrix notation can be employed to express Eq. (9.33) concisely. The partial derivatives can be expressed as

$$[Z] = \begin{bmatrix} \dfrac{\partial f_{1,i}}{\partial x_1} & \dfrac{\partial f_{1,i}}{\partial x_2} & \cdots & \dfrac{\partial f_{1,i}}{\partial x_n} \\ \dfrac{\partial f_{2,i}}{\partial x_1} & \dfrac{\partial f_{2,i}}{\partial x_2} & \cdots & \dfrac{\partial f_{2,i}}{\partial x_n} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial f_{n,i}}{\partial x_1} & \dfrac{\partial f_{n,i}}{\partial x_2} & \cdots & \dfrac{\partial f_{n,i}}{\partial x_n} \end{bmatrix} \tag{9.34}$$

The initial and final values can be expressed in vector form as

$$\{X_i\}^T = \lfloor x_{1,i} \;\; x_{2,i} \;\; \cdots \;\; x_{n,i} \rfloor$$

and

$$\{X_{i+1}\}^T = \lfloor x_{1,i+1} \;\; x_{2,i+1} \;\; \cdots \;\; x_{n,i+1} \rfloor$$
  • 290. Finally, the function values at i can be expressed as

$$\{F_i\}^T = \lfloor f_{1,i} \;\; f_{2,i} \;\; \cdots \;\; f_{n,i} \rfloor$$

Using these relationships, Eq. (9.33) can be represented concisely as

$$[Z]\{X_{i+1}\} = -\{F_i\} + [Z]\{X_i\} \tag{9.35}$$

Equation (9.35) can be solved using a technique such as Gauss elimination. This process can be repeated iteratively to obtain refined estimates in a fashion similar to the two-equation case in Sec. 6.5.2.

It should be noted that there are two major shortcomings to the foregoing approach. First, Eq. (9.34) is often inconvenient to evaluate. Therefore, variations of the Newton-Raphson approach have been developed to circumvent this dilemma. As might be expected, most are based on using finite-difference approximations for the partial derivatives that comprise [Z].

The second shortcoming of the multiequation Newton-Raphson method is that excellent initial guesses are usually required to ensure convergence. Because these are often difficult to obtain, alternative approaches that are slower than Newton-Raphson but which have better convergence behavior have been developed. One common approach is to reformulate the nonlinear system as a single function

$$F(x) = \sum_{i=1}^{n} \left[ f_i(x_1, x_2, \ldots, x_n) \right]^2 \tag{9.36}$$

where $f_i(x_1, x_2, \ldots, x_n)$ is the ith member of the original system of Eq. (9.31). The values of x that minimize this function also represent the solution of the nonlinear system. As we will see in Chap. 17, this reformulation belongs to a class of problems called nonlinear regression. As such, it can be approached with a number of optimization techniques such as the ones described later in this text (Part Four and specifically Chap. 14).

9.7 GAUSS-JORDAN

The Gauss-Jordan method is a variation of Gauss elimination. The major difference is that when an unknown is eliminated in the Gauss-Jordan method, it is eliminated from all other equations rather than just the subsequent ones.
In addition, all rows are normalized by dividing them by their pivot elements. Thus, the elimination step results in an identity matrix rather than a triangular matrix (Fig. 9.9). Consequently, it is not necessary to employ back substitution to obtain the solution. The method is best illustrated by an example.

EXAMPLE 9.12 Gauss-Jordan Method

Problem Statement. Use the Gauss-Jordan technique to solve the same system as in Example 9.5:

$$\begin{aligned} 3x_1 - 0.1x_2 - 0.2x_3 &= 7.85 \\ 0.1x_1 + 7x_2 - 0.3x_3 &= -19.3 \\ 0.3x_1 - 0.2x_2 + 10x_3 &= 71.4 \end{aligned}$$

$$\left[\begin{array}{ccc|c} a_{11} & a_{12} & a_{13} & b_1 \\ a_{21} & a_{22} & a_{23} & b_2 \\ a_{31} & a_{32} & a_{33} & b_3 \end{array}\right] \;\longrightarrow\; \left[\begin{array}{ccc|c} 1 & 0 & 0 & b_1^{(n)} \\ 0 & 1 & 0 & b_2^{(n)} \\ 0 & 0 & 1 & b_3^{(n)} \end{array}\right] \;\longrightarrow\; \begin{array}{l} x_1 = b_1^{(n)} \\ x_2 = b_2^{(n)} \\ x_3 = b_3^{(n)} \end{array}$$

FIGURE 9.9 Graphical depiction of the Gauss-Jordan method. Compare with Fig. 9.3 to elucidate the differences between this technique and Gauss elimination. The superscript (n) means that the elements of the right-hand-side vector have been modified n times (for this case, n = 3).
  • 291. Solution. First, express the coefficients and the right-hand side as an augmented matrix:

$$\left[\begin{array}{rrr|r} 3 & -0.1 & -0.2 & 7.85 \\ 0.1 & 7 & -0.3 & -19.3 \\ 0.3 & -0.2 & 10 & 71.4 \end{array}\right]$$

Then normalize the first row by dividing it by the pivot element, 3, to yield

$$\left[\begin{array}{rrr|r} 1 & -0.0333333 & -0.066667 & 2.61667 \\ 0.1 & 7 & -0.3 & -19.3 \\ 0.3 & -0.2 & 10 & 71.4 \end{array}\right]$$

The $x_1$ term can be eliminated from the second row by subtracting 0.1 times the first row from the second row. Similarly, subtracting 0.3 times the first row from the third row will eliminate the $x_1$ term from the third row:

$$\left[\begin{array}{rrr|r} 1 & -0.0333333 & -0.066667 & 2.61667 \\ 0 & 7.00333 & -0.293333 & -19.5617 \\ 0 & -0.190000 & 10.0200 & 70.6150 \end{array}\right]$$

Next, normalize the second row by dividing it by 7.00333:

$$\left[\begin{array}{rrr|r} 1 & -0.0333333 & -0.066667 & 2.61667 \\ 0 & 1 & -0.0418848 & -2.79320 \\ 0 & -0.190000 & 10.0200 & 70.6150 \end{array}\right]$$

Reduction of the $x_2$ terms from the first and third equations gives

$$\left[\begin{array}{rrr|r} 1 & 0 & -0.0680629 & 2.52356 \\ 0 & 1 & -0.0418848 & -2.79320 \\ 0 & 0 & 10.01200 & 70.0843 \end{array}\right]$$

The third row is then normalized by dividing it by 10.0120:

$$\left[\begin{array}{rrr|r} 1 & 0 & -0.0680629 & 2.52356 \\ 0 & 1 & -0.0418848 & -2.79320 \\ 0 & 0 & 1 & 7.0000 \end{array}\right]$$

Finally, the $x_3$ terms can be reduced from the first and the second equations to give

$$\left[\begin{array}{rrr|r} 1 & 0 & 0 & 3.0000 \\ 0 & 1 & 0 & -2.5000 \\ 0 & 0 & 1 & 7.0000 \end{array}\right]$$

Thus, as depicted in Fig. 9.9, the coefficient matrix has been transformed to the identity matrix, and the solution is obtained in the right-hand-side vector. Notice that no back substitution was required to obtain the solution.

All the material in this chapter regarding the pitfalls and improvements in Gauss elimination also applies to the Gauss-Jordan method. For example, a similar pivoting strategy can be used to avoid division by zero and to reduce round-off error.
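The hand calculation above can be mirrored in a few lines of Python. This is a sketch rather than the book's algorithm: the function name `gauss_jordan` is hypothetical, and partial pivoting (unscaled) is included as one instance of the pivoting strategy mentioned in the preceding paragraph.

```python
def gauss_jordan(a, b):
    """Solve [a]{x} = {b} by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], built from copies of the inputs.
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for k in range(n):
        # Bring the largest remaining entry of column k onto the diagonal.
        p = max(range(k, n), key=lambda i: abs(aug[i][k]))
        if aug[p][k] == 0.0:
            raise ValueError("matrix is singular")
        aug[k], aug[p] = aug[p], aug[k]
        # Normalize the pivot row.
        pivot = aug[k][k]
        aug[k] = [v / pivot for v in aug[k]]
        # Eliminate column k from every other row, above and below.
        for i in range(n):
            if i != k:
                factor = aug[i][k]
                aug[i] = [vi - factor * vk for vi, vk in zip(aug[i], aug[k])]
    # The coefficient block is now the identity; the last column is {x}.
    return [row[n] for row in aug]


# System from Example 9.12.
a_mat = [[3.0, -0.1, -0.2],
         [0.1, 7.0, -0.3],
         [0.3, -0.2, 10.0]]
rhs = [7.85, -19.3, 71.4]
solution = gauss_jordan(a_mat, rhs)
print([round(v, 4) for v in solution])
```

Because elimination is applied to the rows above the pivot as well as below, the loop over `i` runs over all n rows, which is where the extra work relative to Gauss elimination comes from.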
  • 292. Although the Gauss-Jordan technique and Gauss elimination might appear almost identical, the former requires more work. Using a similar approach to Sec. 9.2.1, it can be determined that the number of flops involved in naive Gauss-Jordan is

$$n^3 + n^2 - n \;\xrightarrow{\text{as } n \text{ increases}}\; n^3 + O(n^2) \tag{9.37}$$

Thus, Gauss-Jordan involves approximately 50 percent more operations than Gauss elimination [compare with Eq. (9.23)]. Therefore, Gauss elimination is the simple elimination method of preference for obtaining solutions of linear algebraic equations. One of the primary reasons that we have introduced the Gauss-Jordan method, however, is that it is still used in engineering as well as in some numerical algorithms.

9.8 SUMMARY

In summary, we have devoted most of this chapter to Gauss elimination, the most fundamental method for solving simultaneous linear algebraic equations. Although it is one of the earliest techniques developed for this purpose, it is nevertheless an extremely effective algorithm for obtaining solutions for many engineering problems. Aside from this practical utility, this chapter also provided a context for our discussion of general issues such as round-off, scaling, and conditioning. In addition, we briefly presented material on the Gauss-Jordan method, as well as complex and nonlinear systems.

Answers obtained using Gauss elimination may be checked by substituting them into the original equations. However, this does not always represent a reliable check for ill-conditioned systems. Therefore, some measure of condition, such as the determinant of the scaled system, should be computed if round-off error is suspected. Using partial pivoting and more significant figures in the computation are two options for mitigating round-off error. In the next chapter, we will return to the topic of system condition when we discuss the matrix inverse.
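As a companion to the nonlinear-systems material summarized above, the iteration of Eq. (9.35) can be sketched in Python with the partial derivatives in [Z] estimated by forward differences. Everything named here is an assumption made for illustration: the functions `solve_linear`, `jacobian_fd`, and `newton_system`, the step size `h`, the stopping rule, and the two-equation test system with its starting guess.

```python
def solve_linear(a, b):
    """Gauss elimination with partial pivoting, working on copies."""
    n = len(b)
    a = [row[:] for row in a]
    b = b[:]
    for k in range(n - 1):
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
            b[i] -= factor * b[k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        total = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - total) / a[i][i]
    return x


def jacobian_fd(funcs, x, h=1e-7):
    """Forward-difference estimate of [Z]; also returns the values {F}."""
    f0 = [f(x) for f in funcs]
    z = []
    for k, f in enumerate(funcs):
        row = []
        for j in range(len(x)):
            xp = x[:]
            xp[j] += h
            row.append((f(xp) - f0[k]) / h)
        z.append(row)
    return z, f0


def newton_system(funcs, x0, tol=1e-10, max_iter=50):
    """Iterate Eq. (9.35) in the form [Z]{dx} = -{F}, dx = X(i+1) - X(i)."""
    x = x0[:]
    for _ in range(max_iter):
        z, f0 = jacobian_fd(funcs, x)
        dx = solve_linear(z, [-v for v in f0])
        x = [xi + d for xi, d in zip(x, dx)]
        if max(abs(d) for d in dx) < tol:
            return x
    raise RuntimeError("Newton-Raphson did not converge")


# Illustrative system (an assumption, not taken from this section):
#   x^2 + x*y - 10 = 0   and   y + 3*x*y^2 - 57 = 0
funcs = [lambda x: x[0] ** 2 + x[0] * x[1] - 10.0,
         lambda x: x[1] + 3.0 * x[0] * x[1] ** 2 - 57.0]
root = newton_system(funcs, [1.5, 3.5])
print(root)
```

Solving for the correction {dx} rather than for {X(i+1)} directly is algebraically the same as Eq. (9.35) and avoids forming the product [Z]{X(i)}. As the text cautions, convergence depends on a good starting guess.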
PROBLEMS

9.1 (a) Write the following set of equations in matrix form:

$$\begin{aligned} 8 &= 6x_3 + 2x_2 \\ 2 - x_1 &= x_3 \\ 5x_2 + 8x_1 &= 13 \end{aligned}$$

(b) Multiply the matrix of coefficients by its transpose; i.e., $[A][A]^T$.

9.2 A number of matrices are defined as

$$[A] = \begin{bmatrix} 4 & 7 \\ 1 & 2 \\ 5 & 6 \end{bmatrix} \qquad [B] = \begin{bmatrix} 4 & 3 & 7 \\ 1 & 2 & 7 \\ 2 & 0 & 4 \end{bmatrix} \qquad \{C\} = \begin{Bmatrix} 3 \\ 6 \\ 1 \end{Bmatrix} \qquad [D] = \begin{bmatrix} 9 & 4 & 3 & -6 \\ 2 & -1 & 7 & 5 \end{bmatrix}$$

$$[E] = \begin{bmatrix} 1 & 5 & 8 \\ 7 & 2 & 3 \\ 4 & 0 & 6 \end{bmatrix} \qquad [F] = \begin{bmatrix} 3 & 0 & 1 \\ 1 & 7 & 3 \end{bmatrix} \qquad \lfloor G \rfloor = \lfloor 7 \;\; 6 \;\; 4 \rfloor$$

Answer the following questions regarding these matrices:
(a) What are the dimensions of the matrices?
(b) Identify the square, column, and row matrices.
(c) What are the values of the elements: $a_{12}$, $b_{23}$, $d_{32}$, $e_{22}$, $f_{12}$, $g_{12}$?
(d) Perform the following operations:
    (1) $[E] + [B]$
    (2) $[A] \times [F]$
    (3) $[B] - [E]$
    (4) $7 \times [B]$
    (5) $[E] \times [B]$
    (6) $\{C\}^T$
    (7) $[B] \times [A]$
    (8) $[D]^T$
  • 293.     (9) $[A] \times \{C\}$
    (10) $[I] \times [B]$
    (11) $[E]^T[E]$
    (12) $\{C\}^T\{C\}$

9.3 Three matrices are defined as

$$[A] = \begin{bmatrix} 1 & 6 \\ 3 & 10 \\ 7 & 4 \end{bmatrix} \qquad [B] = \begin{bmatrix} 1 & 3 \\ 0.5 & 2 \end{bmatrix} \qquad [C] = \begin{bmatrix} 2 & -2 \\ -3 & 1 \end{bmatrix}$$

(a) Perform all possible multiplications that can be computed between pairs of these matrices.
(b) Use the method in Box PT3.2 to justify why the remaining pairs cannot be multiplied.
(c) Use the results of (a) to illustrate why the order of multiplication is important.

9.4 Use the graphical method to solve

$$\begin{aligned} 4x_1 - 8x_2 &= -24 \\ -x_1 + 6x_2 &= 34 \end{aligned}$$

Check your results by substituting them back into the equations.

9.5 Given the system of equations

$$\begin{aligned} -1.1x_1 + 10x_2 &= 120 \\ -2x_1 + 17.4x_2 &= 174 \end{aligned}$$

(a) Solve graphically and check your results by substituting them back into the equations.
(b) On the basis of the graphical solution, what do you expect regarding the condition of the system?
(c) Compute the determinant.
(d) Solve by the elimination of unknowns.

9.6 For the set of equations

$$\begin{aligned} 2x_2 + 5x_3 &= 9 \\ 2x_1 + x_2 + x_3 &= 9 \\ 3x_1 + x_2 &= 10 \end{aligned}$$

(a) Compute the determinant.
(b) Use Cramer's rule to solve for the x's.
(c) Substitute your results back into the original equation to check your results.

9.7 Given the equations

$$\begin{aligned} 0.5x_1 - x_2 &= -9.5 \\ 1.02x_1 - 2x_2 &= -18.8 \end{aligned}$$

(a) Solve graphically.
(b) Compute the determinant.
(c) On the basis of (a) and (b), what would you expect regarding the system's condition?
(d) Solve by the elimination of unknowns.
(e) Solve again, but with $a_{11}$ modified slightly to 0.52. Interpret your results.

9.8 Given the equations

$$\begin{aligned} 10x_1 + 2x_2 - x_3 &= 27 \\ -3x_1 - 6x_2 + 2x_3 &= -61.5 \\ x_1 + x_2 + 5x_3 &= -21.5 \end{aligned}$$

(a) Solve by naive Gauss elimination. Show all steps of the computation.
(b) Substitute your results into the original equations to check your answers.

9.9 Use Gauss elimination to solve:

$$\begin{aligned} 8x_1 + 2x_2 - 2x_3 &= -2 \\ 10x_1 + 2x_2 + 4x_3 &= 4 \\ 12x_1 + 2x_2 + 2x_3 &= 6 \end{aligned}$$

Employ partial pivoting and check your answers by substituting them into the original equations.

9.10 Given the system of equations

$$\begin{aligned} -3x_2 + 7x_3 &= 2 \\ x_1 + 2x_2 - x_3 &= 3 \\ 5x_1 - 2x_2 &= 2 \end{aligned}$$

(a) Compute the determinant.
(b) Use Cramer's rule to solve for the x's.
(c) Use Gauss elimination with partial pivoting to solve for the x's.
(d) Substitute your results back into the original equations to check your solution.

9.11 Given the equations

$$\begin{aligned} 2x_1 - 6x_2 - x_3 &= -38 \\ -3x_1 - x_2 + 7x_3 &= -34 \\ -8x_1 + x_2 - 2x_3 &= -20 \end{aligned}$$

(a) Solve by Gauss elimination with partial pivoting. Show all steps of the computation.
(b) Substitute your results into the original equations to check your answers.

9.12 Use Gauss-Jordan elimination to solve:

$$\begin{aligned} 2x_1 + x_2 - x_3 &= 1 \\ 5x_1 + 2x_2 + 2x_3 &= -4 \\ 3x_1 + x_2 + x_3 &= 5 \end{aligned}$$

Do not employ pivoting. Check your answers by substituting them into the original equations.
  • 294. 9.13 Solve:

$$\begin{aligned} x_1 + x_2 - x_3 &= -3 \\ 6x_1 + 2x_2 + 2x_3 &= 2 \\ -3x_1 + 4x_2 + x_3 &= 1 \end{aligned}$$

with (a) naive Gauss elimination, (b) Gauss elimination with partial pivoting, and (c) Gauss-Jordan without partial pivoting.

9.14 Perform the same computation as in Example 9.11, but use five parachutists with the following characteristics:

    Parachutist    Mass, kg    Drag Coefficient, kg/s
    1              55          10
    2              75          12
    3              60          15
    4              75          16
    5              90          10

The parachutists have a velocity of 9 m/s.

9.15 Solve

$$\begin{bmatrix} 3 + 2i & 4 \\ -i & 1 \end{bmatrix} \begin{Bmatrix} z_1 \\ z_2 \end{Bmatrix} = \begin{Bmatrix} 2 + i \\ 3 \end{Bmatrix}$$

9.16 Develop, debug, and test a program in either a high-level language or macro language of your choice to multiply two matrices, that is, [X] = [Y][Z], where [Y] is m by n and [Z] is n by p. Test the program using the matrices from Prob. 9.3.

9.17 Develop, debug, and test a program in either a high-level language or macro language of your choice to generate the transpose of a matrix. Test it on the matrices from Prob. 9.3.

9.18 Develop, debug, and test a program in either a high-level language or macro language of your choice to solve a system of equations with Gauss elimination with partial pivoting. Base the program on the pseudocode from Fig. 9.6. Test the program using the following system (which has an answer of $x_1 = x_2 = x_3 = 1$),

$$\begin{aligned} x_1 + 2x_2 - x_3 &= 2 \\ 5x_1 + 2x_2 + 2x_3 &= 9 \\ -3x_1 + 5x_2 - x_3 &= 1 \end{aligned}$$

9.19 Three masses are suspended vertically by a series of identical springs where mass 1 is at the top and mass 3 is at the bottom. If $g = 9.81\ \text{m/s}^2$, $m_1 = 2$ kg, $m_2 = 3$ kg, $m_3 = 2.5$ kg, and the k's = 10 kg/s$^2$, solve for the displacements x.

9.20 Develop, debug, and test a program in either a high-level language or macro language of your choice to solve a system of n simultaneous nonlinear equations based on Sec. 9.6. Test the program by solving Prob. 7.12.

9.21 Recall from Sec. 8.2 that the chemistry of water exposed to atmospheric CO$_2$ can be determined by solving five nonlinear equations (Eqs. 8.6 through 8.10) for five unknowns: $c_T$, [HCO$_3^-$], [CO$_3^{2-}$], [H$^+$], and [OH$^-$]. Employing the parameters from Sec. 8.2 and the program developed in Prob. 9.20, solve this system for conditions in 1958 when the partial pressure of CO$_2$ was 315 ppm. Use your results to compute the pH.
  • 295. CHAPTER 10

LU Decomposition and Matrix Inversion

This chapter deals with a class of elimination methods called LU decomposition techniques. The primary appeal of LU decomposition is that the time-consuming elimination step can be formulated so that it involves only operations on the matrix of coefficients, [A]. Thus, it is well suited for those situations where many right-hand-side vectors {B} must be evaluated for a single value of [A]. Although there are a variety of ways in which this is done, we will focus on showing how the Gauss elimination method can be implemented as an LU decomposition.

One motive for introducing LU decomposition is that it provides an efficient means to compute the matrix inverse. The inverse has a number of valuable applications in engineering practice. It also provides a means for evaluating system condition.

10.1 LU DECOMPOSITION

As described in Chap. 9, Gauss elimination is designed to solve systems of linear algebraic equations,

$$[A]\{X\} = \{B\} \tag{10.1}$$

Although it certainly represents a sound way to solve such systems, it becomes inefficient when solving equations with the same coefficients [A], but with different right-hand-side constants (the b's).

Recall that Gauss elimination involves two steps: forward elimination and back substitution (Fig. 9.3). Of these, the forward-elimination step comprises the bulk of the computational effort (recall Table 9.1). This is particularly true for large systems of equations.

LU decomposition methods separate the time-consuming elimination of the matrix [A] from the manipulations of the right-hand side {B}. Thus, once [A] has been "decomposed," multiple right-hand-side vectors can be evaluated in an efficient manner.

Interestingly, Gauss elimination itself can be expressed as an LU decomposition. Before showing how this can be done, let us first provide a mathematical overview of the decomposition strategy.
  • 296. 10.1.1 Overview of LU Decomposition

Just as was the case with Gauss elimination, LU decomposition requires pivoting to avoid division by zero. However, to simplify the following description, we will defer the issue of pivoting until after the fundamental approach is elaborated. In addition, the following explanation is limited to a set of three simultaneous equations. The results can be directly extended to n-dimensional systems.

Equation (10.1) can be rearranged to give

$$[A]\{X\} - \{B\} = 0 \tag{10.2}$$

Suppose that Eq. (10.2) could be expressed as an upper triangular system:

$$\begin{bmatrix} u_{11} & u_{12} & u_{13} \\ 0 & u_{22} & u_{23} \\ 0 & 0 & u_{33} \end{bmatrix} \begin{Bmatrix} x_1 \\ x_2 \\ x_3 \end{Bmatrix} = \begin{Bmatrix} d_1 \\ d_2 \\ d_3 \end{Bmatrix} \tag{10.3}$$

Recognize that this is similar to the manipulation that occurs in the first step of Gauss elimination. That is, elimination is used to reduce the system to upper triangular form. Equation (10.3) can also be expressed in matrix notation and rearranged to give

$$[U]\{X\} - \{D\} = 0 \tag{10.4}$$

Now, assume that there is a lower triangular matrix with 1's on the diagonal,

$$[L] = \begin{bmatrix} 1 & 0 & 0 \\ l_{21} & 1 & 0 \\ l_{31} & l_{32} & 1 \end{bmatrix} \tag{10.5}$$

that has the property that when Eq. (10.4) is premultiplied by it, Eq. (10.2) is the result. That is,

$$[L]\{[U]\{X\} - \{D\}\} = [A]\{X\} - \{B\} \tag{10.6}$$

If this equation holds, it follows from the rules for matrix multiplication that

$$[L][U] = [A] \tag{10.7}$$

and

$$[L]\{D\} = \{B\} \tag{10.8}$$

A two-step strategy (see Fig. 10.1) for obtaining solutions can be based on Eqs. (10.4), (10.7), and (10.8):

1. LU decomposition step. [A] is factored or "decomposed" into lower [L] and upper [U] triangular matrices.
2. Substitution step. [L] and [U] are used to determine a solution {X} for a right-hand side {B}. This step itself consists of two steps. First, Eq. (10.8) is used to generate an intermediate vector {D} by forward substitution. Then, the result is substituted into Eq. (10.4), which can be solved by back substitution for {X}.

Now, let us show how Gauss elimination can be implemented in this way.
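The step from Eq. (10.6) to Eqs. (10.7) and (10.8) can be spelled out briefly. The following is a short supporting derivation, not part of the original text, and it assumes Eq. (10.6) is required to hold for every vector {X}. Distributing [L] on the left-hand side gives

$$[L][U]\{X\} - [L]\{D\} = [A]\{X\} - \{B\}$$

Setting $\{X\} = 0$ leaves $[L]\{D\} = \{B\}$, which is Eq. (10.8). Substituting this back cancels the constant terms, so that

$$[L][U]\{X\} = [A]\{X\} \quad \text{for all } \{X\}$$

which can only be true if $[L][U] = [A]$, that is, Eq. (10.7).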
  • 297. FIGURE 10.1 The steps in LU decomposition: (a) decomposition of [A] into [L] and [U]; (b) forward substitution, using [L] and {B} to obtain {D}; (c) back substitution, using [U] and {D} to obtain {X}.

10.1.2 LU Decomposition Version of Gauss Elimination

Although it might appear at face value to be unrelated to LU decomposition, Gauss elimination can be used to decompose [A] into [L] and [U]. This can be easily seen for [U], which is a direct product of the forward elimination. Recall that the forward-elimination step is intended to reduce the original coefficient matrix [A] to the form

$$[U] = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a'_{22} & a'_{23} \\ 0 & 0 & a''_{33} \end{bmatrix} \tag{10.9}$$

which is in the desired upper triangular format.

Though it might not be as apparent, the matrix [L] is also produced during the step. This can be readily illustrated for a three-equation system,

$$\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \begin{Bmatrix} x_1 \\ x_2 \\ x_3 \end{Bmatrix} = \begin{Bmatrix} b_1 \\ b_2 \\ b_3 \end{Bmatrix}$$

The first step in Gauss elimination is to multiply row 1 by the factor [recall Eq. (9.13)]

$$f_{21} = \frac{a_{21}}{a_{11}}$$

and subtract the result from the second row to eliminate $a_{21}$. Similarly, row 1 is multiplied by

$$f_{31} = \frac{a_{31}}{a_{11}}$$
  • 298. and the result subtracted from the third row to eliminate $a_{31}$. The final step is to multiply the modified second row by

$$f_{32} = \frac{a'_{32}}{a'_{22}}$$

and subtract the result from the third row to eliminate $a'_{32}$.

Now suppose that we merely perform all these manipulations on the matrix [A]. Clearly, if we do not want to change the equation, we also have to do the same to the right-hand side {B}. But there is absolutely no reason that we have to perform the manipulations simultaneously. Thus, we could save the f's and manipulate {B} later.

Where do we store the factors $f_{21}$, $f_{31}$, and $f_{32}$? Recall that the whole idea behind the elimination was to create zeros in $a_{21}$, $a_{31}$, and $a_{32}$. Thus, we can store $f_{21}$ in $a_{21}$, $f_{31}$ in $a_{31}$, and $f_{32}$ in $a_{32}$. After elimination, the [A] matrix can therefore be written as

$$\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ f_{21} & a'_{22} & a'_{23} \\ f_{31} & f_{32} & a''_{33} \end{bmatrix} \tag{10.10}$$

This matrix, in fact, represents an efficient storage of the LU decomposition of [A],

$$[A] \rightarrow [L][U] \tag{10.11}$$

where

$$[U] = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a'_{22} & a'_{23} \\ 0 & 0 & a''_{33} \end{bmatrix}$$

and

$$[L] = \begin{bmatrix} 1 & 0 & 0 \\ f_{21} & 1 & 0 \\ f_{31} & f_{32} & 1 \end{bmatrix}$$

The following example confirms that [A] = [L][U].

EXAMPLE 10.1 LU Decomposition with Gauss Elimination

Problem Statement. Derive an LU decomposition based on the Gauss elimination performed in Example 9.5.

Solution. In Example 9.5, we solved the matrix

$$[A] = \begin{bmatrix} 3 & -0.1 & -0.2 \\ 0.1 & 7 & -0.3 \\ 0.3 & -0.2 & 10 \end{bmatrix}$$

After forward elimination, the following upper triangular matrix was obtained:

$$[U] = \begin{bmatrix} 3 & -0.1 & -0.2 \\ 0 & 7.00333 & -0.293333 \\ 0 & 0 & 10.0120 \end{bmatrix}$$
  • 299. The factors employed to obtain the upper triangular matrix can be assembled into a lower triangular matrix. The elements $a_{21}$ and $a_{31}$ were eliminated by using the factors

$$f_{21} = \frac{0.1}{3} = 0.03333333 \qquad f_{31} = \frac{0.3}{3} = 0.1000000$$

and the element $a'_{32}$ was eliminated by using the factor

$$f_{32} = \frac{-0.19}{7.00333} = -0.0271300$$

Thus, the lower triangular matrix is

$$[L] = \begin{bmatrix} 1 & 0 & 0 \\ 0.0333333 & 1 & 0 \\ 0.100000 & -0.0271300 & 1 \end{bmatrix}$$

Consequently, the LU decomposition is

$$[A] = [L][U] = \begin{bmatrix} 1 & 0 & 0 \\ 0.0333333 & 1 & 0 \\ 0.100000 & -0.0271300 & 1 \end{bmatrix} \begin{bmatrix} 3 & -0.1 & -0.2 \\ 0 & 7.00333 & -0.293333 \\ 0 & 0 & 10.0120 \end{bmatrix}$$

This result can be verified by performing the multiplication of [L][U] to give

$$[L][U] = \begin{bmatrix} 3 & -0.1 & -0.2 \\ 0.0999999 & 7 & -0.3 \\ 0.3 & -0.2 & 9.99996 \end{bmatrix}$$

where the minor discrepancies are due to round-off.

The following is pseudocode for a subroutine to implement the decomposition phase:

    SUB Decompose (a, n)
      DOFOR k = 1, n - 1
        DOFOR i = k + 1, n
          factor = a(i,k)/a(k,k)
          a(i,k) = factor
          DOFOR j = k + 1, n
            a(i,j) = a(i,j) - factor * a(k,j)
          END DO
        END DO
      END DO
    END Decompose

Notice that this algorithm is "naive" in the sense that pivoting is not included. This feature will be added later when we develop the full algorithm for LU decomposition.

After the matrix is decomposed, a solution can be generated for a particular right-hand-side vector {B}. This is done in two steps. First, a forward-substitution step is executed by solving Eq. (10.8) for {D}. It is important to recognize that this merely
  • 300. amounts to performing the elimination manipulations on {B}. Thus, at the end of this step, the right-hand side will be in the same state that it would have been had we performed forward elimination on [A] and {B} simultaneously.

The forward-substitution step can be represented concisely as

$$d_i = b_i - \sum_{j=1}^{i-1} a_{ij} d_j \qquad \text{for } i = 2, 3, \ldots, n \tag{10.12}$$

with $d_1 = b_1$, where the $a_{ij}$ for $j < i$ are the factors stored in the lower part of the decomposed matrix [Eq. (10.10)].

The second step then merely amounts to implementing back substitution, as in Eq. (10.4). Again, it is important to recognize that this is identical to the back-substitution phase of conventional Gauss elimination. Thus, in a fashion similar to Eqs. (9.16) and (9.17), the back-substitution step can be represented concisely as

$$x_n = d_n / a_{nn} \tag{10.13}$$

$$x_i = \frac{d_i - \displaystyle\sum_{j=i+1}^{n} a_{ij} x_j}{a_{ii}} \qquad \text{for } i = n - 1, n - 2, \ldots, 1 \tag{10.14}$$

EXAMPLE 10.2 The Substitution Steps

Problem Statement. Complete the problem initiated in Example 10.1 by generating the final solution with forward and back substitution.

Solution. As stated above, the intent of forward substitution is to impose the elimination manipulations that we had formerly applied to [A] on the right-hand-side vector {B}. Recall that the system being solved in Example 9.5 was

$$\begin{bmatrix} 3 & -0.1 & -0.2 \\ 0.1 & 7 & -0.3 \\ 0.3 & -0.2 & 10 \end{bmatrix} \begin{Bmatrix} x_1 \\ x_2 \\ x_3 \end{Bmatrix} = \begin{Bmatrix} 7.85 \\ -19.3 \\ 71.4 \end{Bmatrix}$$

and that the forward-elimination phase of conventional Gauss elimination resulted in

$$\begin{bmatrix} 3 & -0.1 & -0.2 \\ 0 & 7.00333 & -0.293333 \\ 0 & 0 & 10.0120 \end{bmatrix} \begin{Bmatrix} x_1 \\ x_2 \\ x_3 \end{Bmatrix} = \begin{Bmatrix} 7.85 \\ -19.5617 \\ 70.0843 \end{Bmatrix} \tag{E10.2.1}$$

The forward-substitution phase is implemented by applying Eq. (10.8) to our problem,

$$\begin{bmatrix} 1 & 0 & 0 \\ 0.0333333 & 1 & 0 \\ 0.100000 & -0.0271300 & 1 \end{bmatrix} \begin{Bmatrix} d_1 \\ d_2 \\ d_3 \end{Bmatrix} = \begin{Bmatrix} 7.85 \\ -19.3 \\ 71.4 \end{Bmatrix}$$

or multiplying out the left-hand side,

$$\begin{aligned} d_1 &= 7.85 \\ 0.0333333d_1 + d_2 &= -19.3 \\ 0.1d_1 - 0.02713d_2 + d_3 &= 71.4 \end{aligned}$$
  • 301. We can solve the first equation for $d_1$,

$$d_1 = 7.85$$

which can be substituted into the second equation to solve for

$$d_2 = -19.3 - 0.0333333(7.85) = -19.5617$$

Both $d_1$ and $d_2$ can be substituted into the third equation to give

$$d_3 = 71.4 - 0.1(7.85) + 0.02713(-19.5617) = 70.0843$$

Thus,

$$\{D\} = \begin{Bmatrix} 7.85 \\ -19.5617 \\ 70.0843 \end{Bmatrix}$$

which is identical to the right-hand side of Eq. (E10.2.1).

This result can then be substituted into Eq. (10.4), [U]{X} = {D}, to give

$$\begin{bmatrix} 3 & -0.1 & -0.2 \\ 0 & 7.00333 & -0.293333 \\ 0 & 0 & 10.0120 \end{bmatrix} \begin{Bmatrix} x_1 \\ x_2 \\ x_3 \end{Bmatrix} = \begin{Bmatrix} 7.85 \\ -19.5617 \\ 70.0843 \end{Bmatrix}$$

which can be solved by back substitution (see Example 9.5 for details) for the final solution,

$$\{X\} = \begin{Bmatrix} 3 \\ -2.5 \\ 7.00003 \end{Bmatrix}$$

The following is pseudocode for a subroutine to implement both substitution phases:

    SUB Substitute (a, n, b, x)
      'forward substitution
      DOFOR i = 2, n
        sum = b(i)
        DOFOR j = 1, i - 1
          sum = sum - a(i,j) * b(j)
        END DO
        b(i) = sum
      END DO
      'back substitution
      x(n) = b(n)/a(n,n)
      DOFOR i = n - 1, 1, -1
        sum = 0
        DOFOR j = i + 1, n
          sum = sum + a(i,j) * x(j)
        END DO
        x(i) = (b(i) - sum)/a(i,i)
      END DO
    END Substitute
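The Decompose and Substitute pseudocodes translate almost line for line into Python. The sketch below is illustrative only: the lowercase function names are hypothetical, pivoting is omitted as in the naive pseudocode, and unlike the pseudocode the substitution works on a copy of {B}, so the same decomposed matrix can be reused for further right-hand sides.

```python
def decompose(a):
    """Naive in-place LU decomposition (no pivoting).

    On return the upper triangle of a holds [U] and the strictly lower
    triangle holds the elimination factors, i.e. the off-diagonal of [L].
    """
    n = len(a)
    for k in range(n - 1):
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            a[i][k] = factor  # store the factor where the zero would go
            for j in range(k + 1, n):
                a[i][j] -= factor * a[k][j]


def substitute(a, b):
    """Forward then back substitution using a decomposed matrix a."""
    n = len(b)
    d = b[:]  # copy, so the caller's right-hand side is left untouched
    # Forward substitution, Eq. (10.12): [L]{D} = {B}.
    for i in range(1, n):
        d[i] -= sum(a[i][j] * d[j] for j in range(i))
    # Back substitution, Eqs. (10.13) and (10.14): [U]{X} = {D}.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        total = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (d[i] - total) / a[i][i]
    return x


# Matrix and right-hand side from Examples 10.1 and 10.2.
a_orig = [[3.0, -0.1, -0.2],
          [0.1, 7.0, -0.3],
          [0.3, -0.2, 10.0]]
lu = [row[:] for row in a_orig]
decompose(lu)

x = substitute(lu, [7.85, -19.3, 71.4])

# A second right-hand side (chosen arbitrarily) reuses the same factors.
x_second = substitute(lu, [1.0, 0.0, 0.0])
residual = [sum(a_orig[i][j] * x_second[j] for j in range(3)) for i in range(3)]

print([round(v, 5) for v in x])
print([round(v, 7) for v in lu[2]])  # f31, f32, and the last diagonal of [U]
```

The second call to `substitute` costs only the forward and back substitutions, which is the practical payoff of separating the decomposition from the work on {B}.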
  • 302. The LU decomposition algorithm requires the same total multiply/divide flops as for Gauss elimination. The only difference is that a little less effort is expended in the decomposition phase since the operations are not applied to the right-hand side. Thus, the number of multiply/divide flops involved in the decomposition phase can be calculated as

$$ \frac{n^3}{3} - \frac{n}{3} \;\xrightarrow{\text{as } n \text{ increases}}\; \frac{n^3}{3} + O(n) \tag{10.15} $$

Conversely, the substitution phase takes a little more effort. Thus, the number of flops for forward and back substitution is $n^2$. The total effort is therefore identical to Gauss elimination

$$ \frac{n^3}{3} - \frac{n}{3} + n^2 \;\xrightarrow{\text{as } n \text{ increases}}\; \frac{n^3}{3} + O(n^2) \tag{10.16} $$

10.1.3 LU Decomposition Algorithm

An algorithm to implement an LU decomposition expression of Gauss elimination is listed in Fig. 10.2. Four features of this algorithm bear mention:

- The factors generated during the elimination phase are stored in the lower part of the matrix. This can be done because these are converted to zeros anyway and are unnecessary for the final solution. This storage saves space.
- This algorithm keeps track of pivoting by using an order vector o. This greatly speeds up the algorithm because only the order vector (as opposed to the whole row) is pivoted.
- The equations are not scaled, but scaled values of the elements are used to determine whether pivoting is to be implemented.
- The diagonal term is monitored during the pivoting phase to detect near-zero occurrences in order to flag singular systems. If it passes back a value of er = -1, a singular matrix has been detected and the computation should be terminated. A parameter tol is set by the user to a small number in order to detect near-zero occurrences.

10.1.4 Crout Decomposition

Notice that for the LU decomposition implementation of Gauss elimination, the [L] matrix has 1's on the diagonal. This is formally referred to as a Doolittle decomposition, or factorization. An alternative approach involves a [U] matrix with 1's on the diagonal. This is called Crout decomposition. Although there are some differences between the approaches (Atkinson, 1978; Ralston and Rabinowitz, 1978), their performance is comparable.

The Crout decomposition approach generates [U] and [L] by sweeping through the matrix by columns and rows, as depicted in Fig. 10.3. It can be implemented by the following concise series of formulas:

$$ l_{i,1} = a_{i,1} \quad \text{for } i = 1, 2, \ldots, n \tag{10.17} $$

$$ u_{1j} = \frac{a_{1j}}{l_{11}} \quad \text{for } j = 2, 3, \ldots, n \tag{10.18} $$
  • 303. For $j = 2, 3, \ldots, n - 1$

$$ l_{ij} = a_{ij} - \sum_{k=1}^{j-1} l_{ik} u_{kj} \quad \text{for } i = j, j+1, \ldots, n \tag{10.19} $$

$$ u_{jk} = \frac{a_{jk} - \sum_{i=1}^{j-1} l_{ji} u_{ik}}{l_{jj}} \quad \text{for } k = j+1, j+2, \ldots, n \tag{10.20} $$

    SUB Ludecomp (a, b, n, tol, x, er)
      DIM o(n), s(n)
      er = 0
      CALL Decompose(a, n, tol, o, s, er)
      IF er <> -1 THEN
        CALL Substitute(a, o, n, b, x)
      END IF
    END Ludecomp

    SUB Decompose (a, n, tol, o, s, er)
      DOFOR i = 1, n
        o(i) = i
        s(i) = ABS(a(i,1))
        DOFOR j = 2, n
          IF ABS(a(i,j)) > s(i) THEN s(i) = ABS(a(i,j))
        END DO
      END DO
      DOFOR k = 1, n - 1
        CALL Pivot(a, o, s, n, k)
        IF ABS(a(o(k),k)/s(o(k))) < tol THEN
          er = -1
          PRINT a(o(k),k)/s(o(k))
          EXIT DO
        END IF
        DOFOR i = k + 1, n
          factor = a(o(i),k)/a(o(k),k)
          a(o(i),k) = factor
          DOFOR j = k + 1, n
            a(o(i),j) = a(o(i),j) - factor * a(o(k),j)
          END DO
        END DO
      END DO
      IF ABS(a(o(k),k)/s(o(k))) < tol THEN
        er = -1
        PRINT a(o(k),k)/s(o(k))
      END IF
    END Decompose

    SUB Pivot (a, o, s, n, k)
      p = k
      big = ABS(a(o(k),k)/s(o(k)))
      DOFOR ii = k + 1, n
        dummy = ABS(a(o(ii),k)/s(o(ii)))
        IF dummy > big THEN
          big = dummy
          p = ii
        END IF
      END DO
      dummy = o(p)
      o(p) = o(k)
      o(k) = dummy
    END Pivot

    SUB Substitute (a, o, n, b, x)
      DOFOR i = 2, n
        sum = b(o(i))
        DOFOR j = 1, i - 1
          sum = sum - a(o(i),j) * b(o(j))
        END DO
        b(o(i)) = sum
      END DO
      x(n) = b(o(n))/a(o(n),n)
      DOFOR i = n - 1, 1, -1
        sum = 0
        DOFOR j = i + 1, n
          sum = sum + a(o(i),j) * x(j)
        END DO
        x(i) = (b(o(i)) - sum)/a(o(i),i)
      END DO
    END Substitute

FIGURE 10.2 Pseudocode for an LU decomposition algorithm.

FIGURE 10.3 A schematic depicting the evaluations involved in Crout LU decomposition, in four panels (a) through (d).
  • 304. and

$$ l_{nn} = a_{nn} - \sum_{k=1}^{n-1} l_{nk} u_{kn} \tag{10.21} $$

Aside from the fact that it consists of a few concise loops, the foregoing approach also has the benefit that storage space can be economized. There is no need to store the 1's on the diagonal of [U] or the 0's for [L] or [U] because they are givens in the method. Consequently, the values of [U] can be stored in the zero space of [L]. Further, close examination of the foregoing derivation makes it clear that after each element of [A] is employed once, it is never used again. Therefore, as each element of [L] and [U] is computed, it can be substituted for the corresponding element (as designated by its subscripts) of [A].

Pseudocode to accomplish this is presented in Fig. 10.4. Notice that Eq. (10.17) is not included in the pseudocode because the first column of [L] is already stored in [A]. Otherwise, the algorithm directly follows from Eqs. (10.18) through (10.21).

10.2 THE MATRIX INVERSE

In our discussion of matrix operations (Sec. PT3.2.2), we introduced the notion that if a matrix [A] is square, there is another matrix, $[A]^{-1}$, called the inverse of [A], for which [Eq. (PT3.3)]

$$ [A][A]^{-1} = [A]^{-1}[A] = [I] $$

    DOFOR j = 2, n
      a(1,j) = a(1,j)/a(1,1)
    END DO
    DOFOR j = 2, n - 1
      DOFOR i = j, n
        sum = 0
        DOFOR k = 1, j - 1
          sum = sum + a(i,k) * a(k,j)
        END DO
        a(i,j) = a(i,j) - sum
      END DO
      DOFOR k = j + 1, n
        sum = 0
        DOFOR i = 1, j - 1
          sum = sum + a(j,i) * a(i,k)
        END DO
        a(j,k) = (a(j,k) - sum)/a(j,j)
      END DO
    END DO
    sum = 0
    DOFOR k = 1, n - 1
      sum = sum + a(n,k) * a(k,n)
    END DO
    a(n,n) = a(n,n) - sum

FIGURE 10.4 Pseudocode for Crout's LU decomposition algorithm.
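As a cross-check on Fig. 10.4, Eqs. (10.17) through (10.21) can be written as a short Python sketch. Unlike the figure, this version keeps [L] and [U] in separate lists for readability instead of overwriting [A]. The names `crout` and `matmul` are chosen here for illustration, and no pivoting is attempted, so a zero on the diagonal of [L] would cause a division error.

```python
def crout(a):
    """Return (L, U) with [A] = [L][U] and 1's on the diagonal of [U]."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    U = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for j in range(n):
        # Column j of [L]: Eq. (10.17) for j = 0, Eq. (10.19) in general,
        # and Eq. (10.21) for the last diagonal element.
        for i in range(j, n):
            L[i][j] = a[i][j] - sum(L[i][k] * U[k][j] for k in range(j))
        # Row j of [U]: Eq. (10.18) for j = 0, Eq. (10.20) otherwise.
        for k in range(j + 1, n):
            U[j][k] = (a[j][k] - sum(L[j][i] * U[i][k] for i in range(j))) / L[j][j]
    return L, U


def matmul(p, q):
    """Plain triple-loop matrix product, used only to verify the factors."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]


A = [[3.0, -0.1, -0.2], [0.1, 7.0, -0.3], [0.3, -0.2, 10.0]]
L, U = crout(A)
product = matmul(L, U)  # should reproduce A to round-off
```

Multiplying the factors back together, as several of the end-of-chapter problems request, is a convenient way to confirm a hand calculation.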
  • 305. Now we will focus on how the inverse can be computed numerically. Then we will explore how it can be used for engineering analysis.

10.2.1 Calculating the Inverse

The inverse can be computed in a column-by-column fashion by generating solutions with unit vectors as the right-hand-side constants. For example, if the right-hand-side constant has a 1 in the first position and zeros elsewhere,

$$ \{b\} = \begin{Bmatrix} 1 \\ 0 \\ 0 \end{Bmatrix} $$

the resulting solution will be the first column of the matrix inverse. Similarly, if a unit vector with a 1 at the second row is used

$$ \{b\} = \begin{Bmatrix} 0 \\ 1 \\ 0 \end{Bmatrix} $$

the result will be the second column of the matrix inverse.

The best way to implement such a calculation is with the LU decomposition algorithm described at the beginning of this chapter. Recall that one of the great strengths of LU decomposition is that it provides a very efficient means to evaluate multiple right-hand-side vectors. Thus, it is ideal for evaluating the multiple unit vectors needed to compute the inverse.

EXAMPLE 10.3 Matrix Inversion

Problem Statement. Employ LU decomposition to determine the matrix inverse for the system from Example 10.2.

$$ [A] = \begin{bmatrix} 3 & -0.1 & -0.2 \\ 0.1 & 7 & -0.3 \\ 0.3 & -0.2 & 10 \end{bmatrix} $$

Recall that the decomposition resulted in the following lower and upper triangular matrices:

$$ [U] = \begin{bmatrix} 3 & -0.1 & -0.2 \\ 0 & 7.00333 & -0.293333 \\ 0 & 0 & 10.0120 \end{bmatrix} \qquad [L] = \begin{bmatrix} 1 & 0 & 0 \\ 0.0333333 & 1 & 0 \\ 0.100000 & -0.0271300 & 1 \end{bmatrix} $$

Solution. The first column of the matrix inverse can be determined by performing the forward-substitution solution procedure with a unit vector (with 1 in the first row) as the right-hand-side vector. Thus, Eq. (10.8), the lower-triangular system, can be set up as

$$ \begin{bmatrix} 1 & 0 & 0 \\ 0.0333333 & 1 & 0 \\ 0.100000 & -0.0271300 & 1 \end{bmatrix} \begin{Bmatrix} d_1 \\ d_2 \\ d_3 \end{Bmatrix} = \begin{Bmatrix} 1 \\ 0 \\ 0 \end{Bmatrix} $$
  • 306. and solved with forward substitution for $\{D\}^T = \lfloor 1 \;\; -0.03333 \;\; -0.1009 \rfloor$. This vector can then be used as the right-hand side of Eq. (10.3),

$$ \begin{bmatrix} 3 & -0.1 & -0.2 \\ 0 & 7.00333 & -0.293333 \\ 0 & 0 & 10.0120 \end{bmatrix} \begin{Bmatrix} x_1 \\ x_2 \\ x_3 \end{Bmatrix} = \begin{Bmatrix} 1 \\ -0.03333 \\ -0.1009 \end{Bmatrix} $$

which can be solved by back substitution for $\{X\}^T = \lfloor 0.33249 \;\; -0.00518 \;\; -0.01008 \rfloor$, which is the first column of the matrix,

$$ [A]^{-1} = \begin{bmatrix} 0.33249 & 0 & 0 \\ -0.00518 & 0 & 0 \\ -0.01008 & 0 & 0 \end{bmatrix} $$

To determine the second column, Eq. (10.8) is formulated as

$$ \begin{bmatrix} 1 & 0 & 0 \\ 0.0333333 & 1 & 0 \\ 0.100000 & -0.0271300 & 1 \end{bmatrix} \begin{Bmatrix} d_1 \\ d_2 \\ d_3 \end{Bmatrix} = \begin{Bmatrix} 0 \\ 1 \\ 0 \end{Bmatrix} $$

This can be solved for {D}, and the results are used with Eq. (10.3) to determine $\{X\}^T = \lfloor 0.004944 \;\; 0.142903 \;\; 0.00271 \rfloor$, which is the second column of the matrix,

$$ [A]^{-1} = \begin{bmatrix} 0.33249 & 0.004944 & 0 \\ -0.00518 & 0.142903 & 0 \\ -0.01008 & 0.00271 & 0 \end{bmatrix} $$

Finally, the forward- and back-substitution procedures can be implemented with $\{B\}^T = \lfloor 0 \;\; 0 \;\; 1 \rfloor$ to solve for $\{X\}^T = \lfloor 0.006798 \;\; 0.004183 \;\; 0.09988 \rfloor$, which is the final column of the matrix,

$$ [A]^{-1} = \begin{bmatrix} 0.33249 & 0.004944 & 0.006798 \\ -0.00518 & 0.142903 & 0.004183 \\ -0.01008 & 0.00271 & 0.09988 \end{bmatrix} $$

The validity of this result can be checked by verifying that $[A][A]^{-1} = [I]$.

Pseudocode to generate the matrix inverse is shown in Fig. 10.5. Notice how the decomposition subroutine from Fig. 10.2 is called to perform the decomposition, after which the inverse is generated by repeatedly calling the substitution algorithm with unit vectors.

The effort required for this algorithm is simply computed as

$$ \underbrace{\frac{n^3}{3} - \frac{n}{3}}_{\text{decomposition}} + \underbrace{n(n^2)}_{n \,\times\, \text{substitutions}} = \frac{4n^3}{3} - \frac{n}{3} \tag{10.22} $$

where, from Sec. 10.1.2, the decomposition is defined by Eq. (10.15) and the effort involved with every right-hand-side evaluation involves $n^2$ multiply/divide flops.
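The column-by-column procedure of Example 10.3 can be sketched in Python as follows. This is an illustrative outline under simplifying assumptions, not a transcription of Figs. 10.2 and 10.5: it uses a Doolittle decomposition without pivoting or singularity checks, and the function names `lu_decompose`, `lu_solve`, and `inverse` are invented for this sketch.

```python
def lu_decompose(a):
    """Doolittle LU without pivoting; L (below diagonal) and U share one matrix."""
    n = len(a)
    lu = [row[:] for row in a]
    for k in range(n - 1):
        for i in range(k + 1, n):
            factor = lu[i][k] / lu[k][k]
            lu[i][k] = factor  # store the multiplier where the zero would go
            for j in range(k + 1, n):
                lu[i][j] -= factor * lu[k][j]
    return lu


def lu_solve(lu, b):
    """Forward then back substitution for one right-hand side."""
    n = len(b)
    d = list(b)
    for i in range(1, n):
        d[i] -= sum(lu[i][j] * d[j] for j in range(i))
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (d[i] - sum(lu[i][j] * x[j] for j in range(i + 1, n))) / lu[i][i]
    return x


def inverse(a):
    """Decompose once, then solve for each unit vector to fill one column each."""
    n = len(a)
    lu = lu_decompose(a)
    inv = [[0.0] * n for _ in range(n)]
    for col in range(n):
        unit = [1.0 if row == col else 0.0 for row in range(n)]
        x = lu_solve(lu, unit)
        for row in range(n):
            inv[row][col] = x[row]
    return inv


A = [[3.0, -0.1, -0.2], [0.1, 7.0, -0.3], [0.3, -0.2, 10.0]]
A_inv = inverse(A)
for row in A_inv:
    print([round(v, 6) for v in row])
```

The decomposition is performed only once; each additional column costs just one pair of substitutions, which is the source of the $n(n^2)$ term in Eq. (10.22).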
  • 307. 10.2.2 Stimulus-Response Computations

As discussed in Sec. PT3.1.2, many of the linear systems of equations confronted in engineering practice are derived from conservation laws. The mathematical expression of these laws is some form of balance equation to ensure that a particular property (mass, force, heat, momentum, or other) is conserved. For a force balance on a structure, the properties might be horizontal or vertical components of the forces acting on each node of the structure (see Sec. 12.2). For a mass balance, the properties might be the mass in each reactor of a chemical process (see Sec. 12.1). Other fields of engineering would yield similar examples.

A single balance equation can be written for each part of the system, resulting in a set of equations defining the behavior of the property for the entire system. These equations are interrelated, or coupled, in that each equation may include one or more of the variables from the other equations. For many cases, these systems are linear and, therefore, of the exact form dealt with in this chapter:

$$ [A]\{X\} = \{B\} \tag{10.23} $$

Now, for balance equations, the terms of Eq. (10.23) have a definite physical interpretation. For example, the elements of {X} are the levels of the property being balanced for each part of the system. In a force balance of a structure, they represent the horizontal and vertical forces in each member. For the mass balance, they are the mass of chemical in each reactor. In either case, they represent the system's state or response, which we are trying to determine.

The right-hand-side vector {B} contains those elements of the balance that are independent of behavior of the system; that is, they are constants. As such, they often represent the external forces or stimuli that drive the system.

    CALL Decompose (a, n, tol, o, s, er)
    IF er = 0 THEN
      DOFOR i = 1, n
        DOFOR j = 1, n
          IF i = j THEN
            b(j) = 1
          ELSE
            b(j) = 0
          END IF
        END DO
        CALL Substitute (a, o, n, b, x)
        DOFOR j = 1, n
          ai(j, i) = x(j)
        END DO
      END DO
      Output ai, if desired
    ELSE
      PRINT "ill-conditioned system"
    END IF

FIGURE 10.5 Driver program that uses some of the subprograms from Fig. 10.2 to generate a matrix inverse.
  • 308. Finally, the matrix of coefficients [A] usually contains the parameters that express how the parts of the system interact or are coupled. Consequently, Eq. (10.23) might be reexpressed as

$$ [\text{Interactions}]\{\text{response}\} = \{\text{stimuli}\} $$

Thus, Eq. (10.23) can be seen as an expression of the fundamental mathematical model that we formulated previously as a single equation in Chap. 1 [recall Eq. (1.1)]. We can now see that Eq. (10.23) represents a version that is designed for coupled systems involving several dependent variables {X}.

As we know from this chapter and Chap. 9, there are a variety of ways to solve Eq. (10.23). However, using the matrix inverse yields a particularly interesting result. The formal solution can be expressed as

$$ \{X\} = [A]^{-1}\{B\} $$

or (recalling our definition of matrix multiplication from Box PT3.2)

$$ \begin{aligned} x_1 &= a_{11}^{-1} b_1 + a_{12}^{-1} b_2 + a_{13}^{-1} b_3 \\ x_2 &= a_{21}^{-1} b_1 + a_{22}^{-1} b_2 + a_{23}^{-1} b_3 \\ x_3 &= a_{31}^{-1} b_1 + a_{32}^{-1} b_2 + a_{33}^{-1} b_3 \end{aligned} $$

Thus, we find that the inverted matrix itself, aside from providing a solution, has extremely useful properties. That is, each of its elements represents the response of a single part of the system to a unit stimulus of any other part of the system.

Notice that these formulations are linear and, therefore, superposition and proportionality hold. Superposition means that if a system is subject to several different stimuli (the b's), the responses can be computed individually and the results summed to obtain a total response. Proportionality means that multiplying the stimuli by a quantity results in the response to those stimuli being multiplied by the same quantity. Thus, the coefficient $a_{11}^{-1}$ is a proportionality constant that gives the value of $x_1$ due to a unit level of $b_1$. This result is independent of the effects of $b_2$ and $b_3$ on $x_1$, which are reflected in the coefficients $a_{12}^{-1}$ and $a_{13}^{-1}$, respectively. Therefore, we can draw the general conclusion that the element $a_{ij}^{-1}$ of the inverted matrix represents the value of $x_i$ due to a unit quantity of $b_j$. Using the example of the structure, element $a_{ij}^{-1}$ of the matrix inverse would represent the force in member i due to a unit external force at node j. Even for small systems, such behavior of individual stimulus-response interactions would not be intuitively obvious. As such, the matrix inverse provides a powerful technique for understanding the interrelationships of component parts of complicated systems. This power will be demonstrated in Secs. 12.1 and 12.2.

10.3 ERROR ANALYSIS AND SYSTEM CONDITION

Aside from its engineering applications, the inverse also provides a means to discern whether systems are ill-conditioned. Three methods are available for this purpose:

1. Scale the matrix of coefficients [A] so that the largest element in each row is 1. Invert the scaled matrix and if there are elements of $[A]^{-1}$ that are several orders of magnitude greater than one, it is likely that the system is ill-conditioned (see Box 10.1).
  • 309.
2. Multiply the inverse by the original coefficient matrix and assess whether the result is close to the identity matrix. If not, it indicates ill-conditioning.
3. Invert the inverted matrix and assess whether the result is sufficiently close to the original coefficient matrix. If not, it again indicates that the system is ill-conditioned.

Although these methods can indicate ill-conditioning, it would be preferable to obtain a single number (such as the condition number from Sec. 4.2.3) that could serve as an indicator of the problem. Attempts to formulate such a matrix condition number are based on the mathematical concept of the norm.

10.3.1 Vector and Matrix Norms

A norm is a real-valued function that provides a measure of the size or "length" of multicomponent mathematical entities such as vectors and matrices (see Box 10.2).

A simple example is a vector in three-dimensional Euclidean space (Fig. 10.6) that can be represented as

$$ \lfloor F \rfloor = \lfloor a \;\; b \;\; c \rfloor $$

where a, b, and c are the distances along the x, y, and z axes, respectively. The length of this vector, that is, the distance from the coordinate (0, 0, 0) to (a, b, c), can be simply computed as

$$ \|F\|_e = \sqrt{a^2 + b^2 + c^2} $$

where the nomenclature $\|F\|_e$ indicates that this length is referred to as the Euclidean norm of [F].

Box 10.1 Interpreting the Elements of the Matrix Inverse as a Measure of Ill-Conditioning

One method for assessing a system's condition is to scale [A] so that the largest element in each row is 1 and then compute $[A]^{-1}$. If elements of $[A]^{-1}$ are several orders of magnitude greater than the elements of the original scaled matrix, it is likely that the system is ill-conditioned.

Insight into this approach can be gained by recalling that a way to check whether an approximate solution $\{\tilde{X}\}$ is acceptable is to substitute it into the original equations and see whether the original right-hand-side constants result. This is equivalent to

$$ \{R\} = \{B\} - [A]\{\tilde{X}\} \tag{B10.1.1} $$

where {R} is the residual between the right-hand-side constants and the values computed with the solution $\{\tilde{X}\}$. If {R} is small, we might conclude that the $\{\tilde{X}\}$ values are adequate. However, suppose that {X} is the exact solution that yields a zero residual, as in

$$ \{0\} = \{B\} - [A]\{X\} \tag{B10.1.2} $$

Subtracting Eq. (B10.1.2) from (B10.1.1) yields

$$ \{R\} = [A]\left\{ \{X\} - \{\tilde{X}\} \right\} $$

Multiplying both sides of this equation by $[A]^{-1}$ gives

$$ \{X\} - \{\tilde{X}\} = [A]^{-1}\{R\} $$

This result indicates why checking a solution by substitution can be misleading. For cases where elements of $[A]^{-1}$ are large, a small discrepancy in the right-hand-side residual {R} could correspond to a large error $\{X\} - \{\tilde{X}\}$ in the calculated value of the unknowns. In other words, a small residual does not guarantee an accurate solution. However, we can conclude that if the largest element of $[A]^{-1}$ is on the order of magnitude of unity, the system can be considered to be well-conditioned. Conversely, if $[A]^{-1}$ includes elements much larger than unity, we conclude that the system is ill-conditioned.
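The point of Box 10.1 can be seen numerically with a small experiment. The 2 × 2 system below is a hypothetical example constructed for this sketch (it does not come from the text): its exact solution is (1, 1), yet the poor guess (2, 0) leaves a residual of only about $10^{-4}$.

```python
# Hypothetical, nearly singular 2x2 system whose exact solution is x = (1, 1).
A = [[1.0, 1.0], [1.0, 1.0001]]
b = [2.0, 2.0001]
x_exact = [1.0, 1.0]
x_approx = [2.0, 0.0]  # a deliberately poor "solution"

# Residual {R} = {B} - [A]{X~}, Eq. (B10.1.1).
residual = [b[i] - sum(A[i][j] * x_approx[j] for j in range(2)) for i in range(2)]

# Actual error {X} - {X~}.
error = [x_exact[i] - x_approx[i] for i in range(2)]

# Closed-form inverse of a 2x2 matrix; its elements are of order 10^4.
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
A_inv = [[A[1][1] / det, -A[0][1] / det],
         [-A[1][0] / det, A[0][0] / det]]

# Box 10.1 predicts {X} - {X~} = [A]^-1 {R}.
predicted = [sum(A_inv[i][j] * residual[j] for j in range(2)) for i in range(2)]

print("largest residual:", max(abs(r) for r in residual))
print("largest error:   ", max(abs(e) for e in error))
print("[A]^-1 {R}:      ", predicted)
```

The residual is four orders of magnitude smaller than the error, and the large elements of the inverse account for the difference.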
  • 310. Box 10.2 Matrix Norms

As developed in this section, Euclidean norms can be employed to quantify the size of a vector,

$$ \|X\|_e = \sqrt{\sum_{i=1}^{n} x_i^2} $$

or matrix,

$$ \|A\|_e = \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{n} a_{i,j}^2} $$

For vectors, there are alternatives called p norms that can be represented generally by

$$ \|X\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p} $$

We can also see that the Euclidean norm and the 2 norm, $\|X\|_2$, are identical for vectors. Other important examples are

$$ \|X\|_1 = \sum_{i=1}^{n} |x_i| $$

which represents the norm as the sum of the absolute values of the elements. Another is the maximum-magnitude or uniform-vector norm,

$$ \|X\|_\infty = \max_{1 \le i \le n} |x_i| $$

which defines the norm as the element with the largest absolute value.

Using a similar approach, norms can be developed for matrices. For example,

$$ \|A\|_1 = \max_{1 \le j \le n} \sum_{i=1}^{n} |a_{ij}| $$

That is, a summation of the absolute values of the coefficients is performed for each column, and the largest of these summations is taken as the norm. This is called the column-sum norm. A similar determination can be made for the rows, resulting in a uniform-matrix or row-sum norm,

$$ \|A\|_\infty = \max_{1 \le i \le n} \sum_{j=1}^{n} |a_{ij}| $$

It should be noted that, in contrast to vectors, the 2 norm and the Euclidean norm for a matrix are not the same. Whereas the Euclidean norm $\|A\|_e$ can be easily determined by Eq. (10.24), the matrix 2 norm $\|A\|_2$ is calculated as

$$ \|A\|_2 = (\mu_{\max})^{1/2} $$

where $\mu_{\max}$ is the largest eigenvalue of $[A]^T[A]$. In Chap. 27, we will learn more about eigenvalues. For the time being, the important point is that the $\|A\|_2$, or spectral, norm is the minimum norm and, therefore, provides the tightest measure of size (Ortega 1972).

FIGURE 10.6 Graphical depiction of a vector $\lfloor F \rfloor = \lfloor a \;\; b \;\; c \rfloor$ in Euclidean space. (The figure shows the components a, b, and c along the x, y, and z axes and the length $\|F\|_e = \sqrt{a^2 + b^2 + c^2}$.)
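The norms collected in Box 10.2 translate almost directly into code. The following Python sketch is illustrative only: the function names are chosen here, the test vector and matrix are arbitrary, and the spectral norm is omitted because it requires an eigenvalue computation.

```python
import math


def vector_norm(x, p=2):
    """p norm of a vector; p = math.inf gives the uniform-vector norm."""
    if p == math.inf:
        return max(abs(v) for v in x)
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


def frobenius_norm(a):
    """Euclidean (Frobenius) norm: root of the sum of squared elements."""
    return math.sqrt(sum(v * v for row in a for v in row))


def column_sum_norm(a):
    """Matrix 1 norm: largest column sum of absolute values."""
    return max(sum(abs(row[j]) for row in a) for j in range(len(a[0])))


def row_sum_norm(a):
    """Uniform-matrix norm: largest row sum of absolute values."""
    return max(sum(abs(v) for v in row) for row in a)


x = [3.0, -4.0]                  # arbitrary example vector
A = [[1.0, -2.0], [3.0, 4.0]]    # arbitrary example matrix

print(vector_norm(x, 1), vector_norm(x, 2), vector_norm(x, math.inf))
print(frobenius_norm(A), column_sum_norm(A), row_sum_norm(A))
```

For this vector the 1, 2, and uniform norms are 7, 5, and 4. For the matrix the column sums are 4 and 6 and the row sums are 3 and 7, so the column-sum and row-sum norms are 6 and 7, while the Frobenius norm is $\sqrt{30}$.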
  • 311. Similarly, for an n-dimensional vector $\lfloor X \rfloor = \lfloor x_1 \;\; x_2 \;\; \cdots \;\; x_n \rfloor$, a Euclidean norm would be computed as

$$ \|X\|_e = \sqrt{\sum_{i=1}^{n} x_i^2} $$

The concept can be extended further to a matrix [A], as in

$$ \|A\|_e = \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{n} a_{i,j}^2} \tag{10.24} $$

which is given a special name: the Frobenius norm. However, as with the other vector norms, it provides a single value to quantify the "size" of [A].

It should be noted that there are alternatives to the Euclidean and Frobenius norms (see Box 10.2). For example, a uniform vector norm is defined as

$$ \|X\|_\infty = \max_{1 \le i \le n} |x_i| $$

That is, the element with the largest absolute value is taken as the measure of the vector's size. Similarly, a uniform matrix norm or row-sum norm is defined as

$$ \|A\|_\infty = \max_{1 \le i \le n} \sum_{j=1}^{n} |a_{ij}| \tag{10.25} $$

In this case, the sum of the absolute value of the elements is computed for each row, and the largest of these is taken as the norm.

Although there are theoretical benefits for using certain of the norms, the choice is sometimes influenced by practical considerations. For example, the uniform-row norm is widely used because of the ease with which it can be calculated and the fact that it usually provides an adequate measure of matrix size.

10.3.2 Matrix Condition Number

Now that we have introduced the concept of the norm, we can use it to define

$$ \text{Cond}\,[A] = \|A\| \cdot \|A^{-1}\| \tag{10.26} $$

where Cond [A] is called the matrix condition number. Note that for a matrix [A], this number will be greater than or equal to 1. It can be shown (Ralston and Rabinowitz, 1978; Gerald and Wheatley, 2004) that

$$ \frac{\|\Delta X\|}{\|X\|} \le \text{Cond}\,[A] \, \frac{\|\Delta A\|}{\|A\|} $$

That is, the relative error of the norm of the computed solution can be as large as the relative error of the norm of the coefficients of [A] multiplied by the condition number. For example, if the coefficients of [A] are known to t-digit precision (that is, rounding errors are on the order of $10^{-t}$) and Cond [A] = $10^c$, the solution [X] may be valid to only t − c digits (rounding errors ~ $10^{c-t}$).
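Equation (10.26) can be evaluated with a few lines of Python. The sketch below is a minimal illustration under stated assumptions: it uses the row-sum norm, forms the inverse by Gauss-Jordan elimination in exact rational arithmetic (practical only for very small matrices), and applies the result to the row-scaled 3 × 3 Hilbert matrix that Example 10.4 works by hand. All function names are chosen here for illustration.

```python
from fractions import Fraction
import math


def row_sum_norm(a):
    """Uniform-matrix (row-sum) norm, Eq. (10.25)."""
    return max(sum(abs(v) for v in row) for row in a)


def inverse(a):
    """Gauss-Jordan inverse in exact rational arithmetic.

    Raises StopIteration if no nonzero pivot exists (singular matrix).
    """
    n = len(a)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for k in range(n):
        pivot = next(r for r in range(k, n) if aug[r][k] != 0)
        aug[k], aug[pivot] = aug[pivot], aug[k]
        pivot_value = aug[k][k]
        aug[k] = [v / pivot_value for v in aug[k]]
        for r in range(n):
            if r != k:
                f = aug[r][k]
                aug[r] = [vr - f * vk for vr, vk in zip(aug[r], aug[k])]
    return [row[n:] for row in aug]


def condition_number(a):
    """Cond[A] = ||A|| * ||A^-1||, Eq. (10.26), with the row-sum norm."""
    return row_sum_norm(a) * row_sum_norm(inverse(a))


hilbert = [[Fraction(1, i + j + 1) for j in range(3)] for i in range(3)]
scaled = [[v / max(row) for v in row] for row in hilbert]  # largest element per row = 1
cond = condition_number(scaled)
digits_lost = math.log10(cond)  # the exponent c in Cond[A] = 10^c
print(float(cond), round(digits_lost, 2))  # prints 451.2 2.65
```

Exact fractions are used so that the computed condition number is not itself contaminated by the round-off it is meant to diagnose; a floating-point inverse would normally be used for matrices of realistic size.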
  • 312. EXAMPLE 10.4 Matrix Condition Evaluation

Problem Statement. The Hilbert matrix, which is notoriously ill-conditioned, can be represented generally as

$$ \begin{bmatrix} 1 & 1/2 & 1/3 & \cdots & 1/n \\ 1/2 & 1/3 & 1/4 & \cdots & 1/(n+1) \\ \vdots & \vdots & \vdots & & \vdots \\ 1/n & 1/(n+1) & 1/(n+2) & \cdots & 1/(2n-1) \end{bmatrix} $$

Use the row-sum norm to estimate the matrix condition number for the 3 × 3 Hilbert matrix,

$$ [A] = \begin{bmatrix} 1 & 1/2 & 1/3 \\ 1/2 & 1/3 & 1/4 \\ 1/3 & 1/4 & 1/5 \end{bmatrix} $$

Solution. First, the matrix can be normalized so that the maximum element in each row is 1,

$$ [A] = \begin{bmatrix} 1 & 1/2 & 1/3 \\ 1 & 2/3 & 1/2 \\ 1 & 3/4 & 3/5 \end{bmatrix} $$

Summing each of the rows gives 1.833, 2.1667, and 2.35. Thus, the third row has the largest sum and the row-sum norm is

$$ \|A\|_\infty = 1 + \frac{3}{4} + \frac{3}{5} = 2.35 $$

The inverse of the scaled matrix can be computed as

$$ [A]^{-1} = \begin{bmatrix} 9 & -18 & 10 \\ -36 & 96 & -60 \\ 30 & -90 & 60 \end{bmatrix} $$

Note that the elements of this matrix are larger than the original matrix. This is also reflected in its row-sum norm, which is computed as

$$ \|A^{-1}\|_\infty = |-36| + |96| + |-60| = 192 $$

Thus, the condition number can be calculated as

$$ \text{Cond}\,[A] = 2.35(192) = 451.2 $$

The fact that the condition number is considerably greater than unity suggests that the system is ill-conditioned. The extent of the ill-conditioning can be quantified by calculating c = log 451.2 = 2.65. Computers using IEEE floating-point representation
  • 313. have approximately $t = -\log 2^{-24} = 7.2$ significant base-10 digits (recall Sec. 3.4.1). Therefore, the solution could exhibit rounding errors of up to $10^{(2.65 - 7.2)} = 3 \times 10^{-5}$. Note that such estimates almost always overpredict the actual error. However, they are useful in alerting you to the possibility that round-off errors may be significant.

Practically speaking, the problem with implementing Eq. (10.26) is the computational price required to obtain $\|A^{-1}\|$. Rice (1983) outlines some possible strategies to mitigate this problem. Further, he suggests an alternative way to assess system condition: run the same solution on two different compilers. Because the resulting codes will likely implement the arithmetic differently, the effect of ill-conditioning should be evident from such an experiment. Finally, it should be mentioned that software packages such as MATLAB software and Mathcad have the capability to conveniently compute matrix condition. We will review these capabilities when we review such packages at the end of Chap. 11.

10.3.3 Iterative Refinement

In some cases, round-off errors can be reduced by the following procedure. Suppose that we are solving the following set of equations:

$$ \begin{aligned} a_{11}x_1 + a_{12}x_2 + a_{13}x_3 &= b_1 \\ a_{21}x_1 + a_{22}x_2 + a_{23}x_3 &= b_2 \\ a_{31}x_1 + a_{32}x_2 + a_{33}x_3 &= b_3 \end{aligned} \tag{10.27} $$

For conciseness, we will limit the following discussion to this small (3 × 3) system. However, the approach is generally applicable to larger sets of linear equations.

Suppose an approximate solution vector is given by $\{\tilde{X}\}^T = \lfloor \tilde{x}_1 \;\; \tilde{x}_2 \;\; \tilde{x}_3 \rfloor$. This solution can be substituted into Eq. (10.27) to give

$$ \begin{aligned} a_{11}\tilde{x}_1 + a_{12}\tilde{x}_2 + a_{13}\tilde{x}_3 &= \tilde{b}_1 \\ a_{21}\tilde{x}_1 + a_{22}\tilde{x}_2 + a_{23}\tilde{x}_3 &= \tilde{b}_2 \\ a_{31}\tilde{x}_1 + a_{32}\tilde{x}_2 + a_{33}\tilde{x}_3 &= \tilde{b}_3 \end{aligned} \tag{10.28} $$

Now, suppose that the exact solution {X} is expressed as a function of the approximate solution and a vector of correction factors $\{\Delta X\}$, where

$$ \begin{aligned} x_1 &= \tilde{x}_1 + \Delta x_1 \\ x_2 &= \tilde{x}_2 + \Delta x_2 \\ x_3 &= \tilde{x}_3 + \Delta x_3 \end{aligned} \tag{10.29} $$

If these results are substituted into Eq. (10.27), the following system results:

$$ \begin{aligned} a_{11}(\tilde{x}_1 + \Delta x_1) + a_{12}(\tilde{x}_2 + \Delta x_2) + a_{13}(\tilde{x}_3 + \Delta x_3) &= b_1 \\ a_{21}(\tilde{x}_1 + \Delta x_1) + a_{22}(\tilde{x}_2 + \Delta x_2) + a_{23}(\tilde{x}_3 + \Delta x_3) &= b_2 \\ a_{31}(\tilde{x}_1 + \Delta x_1) + a_{32}(\tilde{x}_2 + \Delta x_2) + a_{33}(\tilde{x}_3 + \Delta x_3) &= b_3 \end{aligned} \tag{10.30} $$
  • 314. Now, Eq. (10.28) can be subtracted from Eq. (10.30) to yield

$$ \begin{aligned} a_{11}\Delta x_1 + a_{12}\Delta x_2 + a_{13}\Delta x_3 &= b_1 - \tilde{b}_1 = E_1 \\ a_{21}\Delta x_1 + a_{22}\Delta x_2 + a_{23}\Delta x_3 &= b_2 - \tilde{b}_2 = E_2 \\ a_{31}\Delta x_1 + a_{32}\Delta x_2 + a_{33}\Delta x_3 &= b_3 - \tilde{b}_3 = E_3 \end{aligned} \tag{10.31} $$

This system itself is a set of simultaneous linear equations that can be solved to obtain the correction factors. The factors can then be applied to improve the solution, as specified by Eq. (10.29).

It is relatively straightforward to integrate an iterative refinement procedure into computer programs for elimination methods. It is especially effective for the LU decomposition approaches described earlier, which are designed to evaluate different right-hand-side vectors efficiently. Note that to be effective for correcting ill-conditioned systems, the E's in Eq. (10.31) must be expressed in double precision.

PROBLEMS

10.1 Use the rules of matrix multiplication to prove that Eqs. (10.7) and (10.8) follow from Eq. (10.6).

10.2 (a) Use naive Gauss elimination to decompose the following system according to the description in Sec. 10.1.2.

$$ \begin{aligned} 10x_1 + 2x_2 - x_3 &= 27 \\ -3x_1 - 6x_2 + 2x_3 &= -61.5 \\ x_1 + x_2 + 5x_3 &= -21.5 \end{aligned} $$

Then, multiply the resulting [L] and [U] matrices to determine that [A] is produced. (b) Use LU decomposition to solve the system. Show all the steps in the computation. (c) Also solve the system for an alternative right-hand-side vector: $\{B\}^T = \lfloor 12 \;\; 18 \;\; -6 \rfloor$.

10.3 (a) Solve the following system of equations by LU decomposition without pivoting

$$ \begin{aligned} 8x_1 + 4x_2 - x_3 &= 11 \\ -2x_1 + 5x_2 + x_3 &= 4 \\ 2x_1 - x_2 + 6x_3 &= 7 \end{aligned} $$

(b) Determine the matrix inverse. Check your results by verifying that $[A][A]^{-1} = [I]$.

10.4 Solve the following system of equations using LU decomposition with partial pivoting:

$$ \begin{aligned} 2x_1 - 6x_2 - x_3 &= -38 \\ -3x_1 - x_2 + 7x_3 &= -34 \\ -8x_1 + x_2 - 2x_3 &= -20 \end{aligned} $$

10.5 Determine the total flops as a function of the number of equations n for the (a) decomposition, (b) forward-substitution, and (c) back-substitution phases of the LU decomposition version of Gauss elimination.

10.6 Use LU decomposition to determine the matrix inverse for the following system. Do not use a pivoting strategy, and check your results by verifying that $[A][A]^{-1} = [I]$.

$$ \begin{aligned} 10x_1 + 2x_2 - x_3 &= 27 \\ -3x_1 - 6x_2 + 2x_3 &= -61.5 \\ x_1 + x_2 + 5x_3 &= -21.5 \end{aligned} $$

10.7 Perform Crout decomposition on

$$ \begin{aligned} 2x_1 - 5x_2 + x_3 &= 12 \\ -x_1 + 3x_2 - x_3 &= -8 \\ 3x_1 - 4x_2 + 2x_3 &= 16 \end{aligned} $$

Then, multiply the resulting [L] and [U] matrices to determine that [A] is produced.

10.8 The following system of equations is designed to determine concentrations (the c's in g/m³) in a series of coupled reactors as a function of the amount of mass input to each reactor (the right-hand sides in g/day),

$$ \begin{aligned} 15c_1 - 3c_2 - c_3 &= 3800 \\ -3c_1 + 18c_2 - 6c_3 &= 1200 \\ -4c_1 - c_2 + 12c_3 &= 2350 \end{aligned} $$

(a) Determine the matrix inverse. (b) Use the inverse to determine the solution. (c) Determine how much the rate of mass input to reactor 3 must be increased to induce a 10 g/m³ rise in the concentration of reactor 1. (d) How much will the concentration in reactor 3 be reduced if the rate of mass input to reactors 1 and 2 is reduced by 500 and 250 g/day, respectively?
  • 315. 10.9 Solve the following set of equations with LU decomposition:

$$ \begin{aligned} 3x_1 - 2x_2 + x_3 &= -10 \\ 2x_1 + 6x_2 - 4x_3 &= 44 \\ -x_1 - 2x_2 + 5x_3 &= -26 \end{aligned} $$

10.10 (a) Determine the LU decomposition without pivoting by hand for the following matrix and check your results by validating that [L][U] = [A].

$$ \begin{bmatrix} 8 & 2 & 1 \\ 3 & 7 & 2 \\ 2 & 3 & 9 \end{bmatrix} $$

(b) Employ the result of (a) to compute the determinant. (c) Repeat (a) and (b) using MATLAB.

10.11 Use the following LU decomposition to (a) compute the determinant and (b) solve [A]{x} = {b} with $\{b\}^T = \lfloor -10 \;\; 44 \;\; -26 \rfloor$.

$$ [A] = [L][U] = \begin{bmatrix} 1 & & \\ 0.6667 & 1 & \\ -0.3333 & -0.3636 & 1 \end{bmatrix} \begin{bmatrix} 3 & -2 & 1 \\ & 7.3333 & -4.6667 \\ & & 3.6364 \end{bmatrix} $$

10.12 Determine $\|A\|_e$, $\|A\|_1$, and $\|A\|_\infty$ for

$$ [A] = \begin{bmatrix} 8 & 2 & -10 \\ -9 & 1 & 3 \\ 15 & -1 & 6 \end{bmatrix} $$

Scale the matrix by making the maximum element in each row equal to one.

10.13 Determine the Frobenius and the row-sum norms for the systems in Probs. 10.3 and 10.4. Scale the matrices by making the maximum element in each row equal to one.

10.14 A matrix [A] is defined as

$$ [A] = \begin{bmatrix} 0.125 & 0.25 & 0.5 & 1 \\ 0.015625 & 0.625 & 0.25 & 1 \\ 0.00463 & 0.02777 & 0.16667 & 1 \\ 0.001953 & 0.015625 & 0.125 & 1 \end{bmatrix} $$

Using the column-sum norm, compute the condition number and how many suspect digits would be generated by this matrix.

10.15 (a) Determine the condition number for the following system using the row-sum norm. Do not normalize the system.

$$ \begin{bmatrix} 1 & 4 & 9 & 16 & 25 \\ 4 & 9 & 16 & 25 & 36 \\ 9 & 16 & 25 & 36 & 49 \\ 16 & 25 & 36 & 49 & 64 \\ 25 & 36 & 49 & 64 & 81 \end{bmatrix} $$

How many digits of precision will be lost due to ill-conditioning? (b) Repeat (a), but scale the matrix by making the maximum element in each row equal to one.

10.16 Determine the condition number based on the row-sum norm for the normalized 5 × 5 Hilbert matrix. How many significant digits of precision will be lost due to ill-conditioning?

10.17 Besides the Hilbert matrix, there are other matrices that are inherently ill-conditioned. One such case is the Vandermonde matrix, which has the following form:

$$ \begin{bmatrix} x_1^2 & x_1 & 1 \\ x_2^2 & x_2 & 1 \\ x_3^2 & x_3 & 1 \end{bmatrix} $$

(a) Determine the condition number based on the row-sum norm for the case where $x_1 = 4$, $x_2 = 2$, and $x_3 = 7$. (b) Use MATLAB or Mathcad software to compute the spectral and Frobenius condition numbers.

10.18 Develop a user-friendly program for LU decomposition based on the pseudocode from Fig. 10.2.

10.19 Develop a user-friendly program for LU decomposition, including the capability to evaluate the matrix inverse. Base the program on Figs. 10.2 and 10.5.

10.20 Use iterative refinement techniques to improve $x_1 = 2$, $x_2 = -3$, and $x_3 = 8$, which are approximate solutions of

$$ \begin{aligned} 2x_1 + 5x_2 + x_3 &= -5 \\ 5x_1 + 2x_2 + x_3 &= 12 \\ x_1 + 2x_2 + x_3 &= 3 \end{aligned} $$

10.21 Consider vectors:

$$ \begin{aligned} \vec{A} &= 2\vec{i} - 3\vec{j} + a\vec{k} \\ \vec{B} &= b\vec{i} + \vec{j} - 4\vec{k} \\ \vec{C} &= 3\vec{i} + c\vec{j} + 2\vec{k} \end{aligned} $$

Vector $\vec{A}$ is perpendicular to $\vec{B}$ as well as to $\vec{C}$. It is also known that $\vec{B} \cdot \vec{C} = 2$. Use any method studied in this chapter to solve for the three unknowns, a, b, and c.

10.22 Consider the following vectors:

$$ \begin{aligned} \vec{A} &= a\vec{i} + b\vec{j} + c\vec{k} \\ \vec{B} &= -2\vec{i} + \vec{j} - 4\vec{k} \\ \vec{C} &= \vec{i} + 3\vec{j} + 2\vec{k} \end{aligned} $$

where $\vec{A}$ is an unknown vector. If

$$ (\vec{A} \times \vec{B}) + (\vec{A} \times \vec{C}) = (5a + 6)\vec{i} + (3b - 2)\vec{j} + (-4c + 1)\vec{k} $$

use any method learned in this chapter to solve for the three unknowns, a, b, and c.
  • 316. 10.23 Let the function be defined on the interval [0, 2] as follows:

$$ f(x) = \begin{cases} ax + b, & 0 \le x \le 1 \\ cx + d, & 1 \le x \le 2 \end{cases} $$

Determine the constants a, b, c, and d so that the function f satisfies the following:

- f(0) = f(2) = 1.
- f is continuous on the entire interval.
- a + b = 4.

Derive and solve a system of linear algebraic equations with a matrix form identical to Eq. (10.1).

10.24 (a) Create a 3 × 3 Hilbert matrix. This will be your matrix [A]. Multiply the matrix by the column vector $\{x\} = [1, 1, 1]^T$. The solution of [A]{x} will be another column vector {b}. Using any numerical package and Gauss elimination, find the solution to [A]{x} = {b} using the Hilbert matrix and the vector {b} that you calculated. Compare the result to your known {x} vector. Use sufficient precision in displaying results to allow you to detect imprecision. (b) Repeat part (a) using a 7 × 7 Hilbert matrix. (c) Repeat part (a) using a 10 × 10 Hilbert matrix.

10.25 Polynomial interpolation consists of determining the unique (n − 1)th-order polynomial that fits n data points. Such polynomials have the general form,

$$ f(x) = p_1 x^{n-1} + p_2 x^{n-2} + \cdots + p_{n-1} x + p_n \tag{P10.25} $$

where the p's are constant coefficients. A straightforward way for computing the coefficients is to generate n linear algebraic equations that we can solve simultaneously for the coefficients. Suppose that we want to determine the coefficients of the fourth-order polynomial $f(x) = p_1 x^4 + p_2 x^3 + p_3 x^2 + p_4 x + p_5$ that passes through the following five points: (200, 0.746), (250, 0.675), (300, 0.616), (400, 0.525), and (500, 0.457). Each of these pairs can be substituted into Eq. (P10.25) to yield a system of five equations with five unknowns (the p's). Use this approach to solve for the coefficients. In addition, determine and interpret the condition number.
CHAPTER 11

Special Matrices and Gauss-Seidel

Certain matrices have a particular structure that can be exploited to develop efficient solution schemes. The first part of this chapter is devoted to two such systems: banded and symmetric matrices. Efficient elimination methods are described for both.

The second part of the chapter turns to an alternative to elimination methods, that is, approximate, iterative methods. The focus is on the Gauss-Seidel method, which employs initial guesses and then iterates to obtain refined estimates of the solution. The Gauss-Seidel method is particularly well suited for large numbers of equations. In these cases, elimination methods can be subject to round-off errors. Because the error of the Gauss-Seidel method is controlled by the number of iterations, round-off error is not an issue of concern with this method. However, there are certain instances where the Gauss-Seidel technique will not converge on the correct answer. These and other trade-offs between elimination and iterative methods will be discussed in subsequent pages.

11.1 SPECIAL MATRICES

As mentioned in Box PT3.1, a banded matrix is a square matrix that has all elements equal to zero, with the exception of a band centered on the main diagonal. Banded systems are frequently encountered in engineering and scientific practice. For example, they typically occur in the solution of differential equations. In addition, other numerical methods such as cubic splines (Sec. 18.5) involve the solution of banded systems.

The dimensions of a banded system can be quantified by two parameters: the bandwidth BW and the half-bandwidth HBW (Fig. 11.1). These two values are related by BW = 2HBW + 1. In general, then, a banded system is one for which $a_{ij} = 0$ if $|i - j| > \mathrm{HBW}$.
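The definition $a_{ij} = 0$ for $|i - j| > \mathrm{HBW}$ can be turned into a small check. The following is only an illustrative sketch: the function names `half_bandwidth` and `bandwidth` and the demo matrix are assumptions introduced here, not part of the text.

```python
def half_bandwidth(a):
    """Smallest HBW such that a[i][j] == 0 whenever |i - j| > HBW."""
    n = len(a)
    hbw = 0
    for i in range(n):
        for j in range(n):
            if a[i][j] != 0:
                hbw = max(hbw, abs(i - j))
    return hbw


def bandwidth(a):
    """Total bandwidth from the relation BW = 2*HBW + 1."""
    return 2 * half_bandwidth(a) + 1


# Illustrative 4 x 4 tridiagonal matrix (values chosen only for the demo).
A = [
    [2.0, -1.0, 0.0, 0.0],
    [-1.0, 2.0, -1.0, 0.0],
    [0.0, -1.0, 2.0, -1.0],
    [0.0, 0.0, -1.0, 2.0],
]
print(half_bandwidth(A), bandwidth(A))
```

For this tridiagonal demo matrix the half-bandwidth is 1 and the bandwidth is 3.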
Although Gauss elimination or conventional LU decomposition can be employed to solve banded equations, they are inefficient, because if pivoting is unnecessary none of the elements outside the band would change from their original values of zero. Thus, unnecessary space and time would be expended on the storage and manipulation of these useless zeros. If it is known beforehand that pivoting is unnecessary, very efficient algorithms can be developed that do not involve the zero elements outside the band. Because many problems involving banded systems do not require pivoting, these alternative algorithms, as described next, are the methods of choice.
11.1.1 Tridiagonal Systems

A tridiagonal system (that is, one with a bandwidth of 3) can be expressed generally as

$$\begin{bmatrix} f_1 & g_1 & & & & \\ e_2 & f_2 & g_2 & & & \\ & e_3 & f_3 & g_3 & & \\ & & \ddots & \ddots & \ddots & \\ & & & e_{n-1} & f_{n-1} & g_{n-1} \\ & & & & e_n & f_n \end{bmatrix} \begin{Bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_{n-1} \\ x_n \end{Bmatrix} = \begin{Bmatrix} r_1 \\ r_2 \\ r_3 \\ \vdots \\ r_{n-1} \\ r_n \end{Bmatrix} \tag{11.1}$$

Notice that we have changed our notation for the coefficients from a's and b's to e's, f's, g's, and r's. This was done to avoid storing large numbers of useless zeros in the square matrix of a's. This space-saving modification is advantageous because the resulting algorithm requires less computer memory.

Figure 11.2 shows pseudocode for an efficient method, called the Thomas algorithm, to solve Eq. (11.1). As with conventional LU decomposition, the algorithm consists of three steps: decomposition and forward and back substitution. Thus, all the advantages of LU decomposition, such as convenient evaluation of multiple right-hand-side vectors and the matrix inverse, can be accomplished by proper application of this algorithm.

EXAMPLE 11.1 Tridiagonal Solution with the Thomas Algorithm

Problem Statement. Solve the following tridiagonal system with the Thomas algorithm.

$$\begin{bmatrix} 2.04 & -1 & & \\ -1 & 2.04 & -1 & \\ & -1 & 2.04 & -1 \\ & & -1 & 2.04 \end{bmatrix} \begin{Bmatrix} T_1 \\ T_2 \\ T_3 \\ T_4 \end{Bmatrix} = \begin{Bmatrix} 40.8 \\ 0.8 \\ 0.8 \\ 200.8 \end{Bmatrix}$$

[Figure 11.1: schematic of a banded matrix with the labels Diagonal, HBW, HBW + 1, and BW.]
FIGURE 11.1 Parameters used to quantify the dimensions of a banded system. BW and HBW designate the bandwidth and the half-bandwidth, respectively.

FIGURE 11.2 Pseudocode to implement the Thomas algorithm, an LU decomposition method for tridiagonal systems.

    (a) Decomposition
    DOFOR k = 2, n
      e_k = e_k / f_{k-1}
      f_k = f_k - e_k · g_{k-1}
    END DO

    (b) Forward substitution
    DOFOR k = 2, n
      r_k = r_k - e_k · r_{k-1}
    END DO

    (c) Back substitution
    x_n = r_n / f_n
    DOFOR k = n - 1, 1, -1
      x_k = (r_k - g_k · x_{k+1}) / f_k
    END DO
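The three steps of Fig. 11.2 translate almost line for line into a short program. The sketch below is one possible Python rendering under a few assumptions made here for illustration: the function name `thomas`, zero-based lists with unused placeholder entries `e[0]` and `g[-1]`, and working on copies rather than overwriting the inputs as the pseudocode does. It is applied to the system of Example 11.1.

```python
def thomas(e, f, g, r):
    """Solve a tridiagonal system following the three steps of Fig. 11.2.

    e: subdiagonal (e[0] is an unused placeholder)
    f: main diagonal
    g: superdiagonal (g[-1] is an unused placeholder)
    r: right-hand side
    No pivoting is performed; the inputs are copied, not overwritten.
    """
    n = len(f)
    e, f, r = list(e), list(f), list(r)
    # (a) Decomposition
    for k in range(1, n):
        e[k] = e[k] / f[k - 1]
        f[k] = f[k] - e[k] * g[k - 1]
    # (b) Forward substitution
    for k in range(1, n):
        r[k] = r[k] - e[k] * r[k - 1]
    # (c) Back substitution
    x = [0.0] * n
    x[n - 1] = r[n - 1] / f[n - 1]
    for k in range(n - 2, -1, -1):
        x[k] = (r[k] - g[k] * x[k + 1]) / f[k]
    return x


# System of Example 11.1
e = [0.0, -1.0, -1.0, -1.0]
f = [2.04, 2.04, 2.04, 2.04]
g = [-1.0, -1.0, -1.0, 0.0]
r = [40.8, 0.8, 0.8, 200.8]
T = thomas(e, f, g, r)
print([round(t, 3) for t in T])
```

Because the program carries full double precision while the hand calculation rounds intermediate values, the two sets of temperatures should agree only to about three decimal places.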
Solution. First, the decomposition is implemented as

$$\begin{aligned} e_2 &= -1/2.04 = -0.49 \\ f_2 &= 2.04 - (-0.49)(-1) = 1.550 \\ e_3 &= -1/1.550 = -0.645 \\ f_3 &= 2.04 - (-0.645)(-1) = 1.395 \\ e_4 &= -1/1.395 = -0.717 \\ f_4 &= 2.04 - (-0.717)(-1) = 1.323 \end{aligned}$$

Thus, the matrix has been transformed to

$$\begin{bmatrix} 2.04 & -1 & & \\ -0.49 & 1.550 & -1 & \\ & -0.645 & 1.395 & -1 \\ & & -0.717 & 1.323 \end{bmatrix}$$

and the LU decomposition is

$$[A] = [L][U] = \begin{bmatrix} 1 & & & \\ -0.49 & 1 & & \\ & -0.645 & 1 & \\ & & -0.717 & 1 \end{bmatrix} \begin{bmatrix} 2.04 & -1 & & \\ & 1.550 & -1 & \\ & & 1.395 & -1 \\ & & & 1.323 \end{bmatrix}$$

You can verify that this is correct by multiplying $[L][U]$ to yield $[A]$.

The forward substitution is implemented as

$$\begin{aligned} r_2 &= 0.8 - (-0.49)40.8 = 20.8 \\ r_3 &= 0.8 - (-0.645)20.8 = 14.221 \\ r_4 &= 200.8 - (-0.717)14.221 = 210.996 \end{aligned}$$

Thus, the right-hand-side vector has been modified to

$$\begin{Bmatrix} 40.8 \\ 20.8 \\ 14.221 \\ 210.996 \end{Bmatrix}$$

which then can be used in conjunction with the $[U]$ matrix to perform back substitution and obtain the solution

$$\begin{aligned} T_4 &= 210.996/1.323 = 159.480 \\ T_3 &= [14.221 - (-1)159.48]/1.395 = 124.538 \\ T_2 &= [20.800 - (-1)124.538]/1.550 = 93.778 \\ T_1 &= [40.800 - (-1)93.778]/2.040 = 65.970 \end{aligned}$$

11.1.2 Cholesky Decomposition

Recall from Box PT3.1 that a symmetric matrix is one where $a_{ij} = a_{ji}$ for all i and j. In other words, $[A] = [A]^T$. Such systems occur commonly in both mathematical and engineering problem contexts. They offer computational advantages because only half the storage is needed and, in most cases, only half the computation time is required for their solution.

One of the most popular approaches involves Cholesky decomposition. This algorithm is based on the fact that a symmetric matrix can be decomposed, as in

$$[A] = [L][L]^T \tag{11.2}$$

That is, the resulting triangular factors are the transpose of each other.

The terms of Eq. (11.2) can be multiplied out and set equal to each other. The result can be expressed simply by recurrence relations. For the kth row,

$$l_{ki} = \frac{a_{ki} - \sum_{j=1}^{i-1} l_{ij} l_{kj}}{l_{ii}} \qquad \text{for } i = 1, 2, \ldots, k - 1 \tag{11.3}$$

and

$$l_{kk} = \sqrt{a_{kk} - \sum_{j=1}^{k-1} l_{kj}^2} \tag{11.4}$$

EXAMPLE 11.2 Cholesky Decomposition

Problem Statement. Apply Cholesky decomposition to the symmetric matrix

$$[A] = \begin{bmatrix} 6 & 15 & 55 \\ 15 & 55 & 225 \\ 55 & 225 & 979 \end{bmatrix}$$

Solution. For the first row (k = 1), Eq. (11.3) is skipped and Eq. (11.4) is employed to compute

$$l_{11} = \sqrt{a_{11}} = \sqrt{6} = 2.4495$$

For the second row (k = 2), Eq. (11.3) gives

$$l_{21} = \frac{a_{21}}{l_{11}} = \frac{15}{2.4495} = 6.1237$$

and Eq. (11.4) yields

$$l_{22} = \sqrt{a_{22} - l_{21}^2} = \sqrt{55 - (6.1237)^2} = 4.1833$$

For the third row (k = 3), Eq. (11.3) gives (i = 1)

$$l_{31} = \frac{a_{31}}{l_{11}} = \frac{55}{2.4495} = 22.454$$

and (i = 2)

$$l_{32} = \frac{a_{32} - l_{21} l_{31}}{l_{22}} = \frac{225 - 6.1237(22.454)}{4.1833} = 20.917$$
and Eq. (11.4) yields

$$l_{33} = \sqrt{a_{33} - l_{31}^2 - l_{32}^2} = \sqrt{979 - (22.454)^2 - (20.917)^2} = 6.1101$$

Thus, the Cholesky decomposition yields

$$[L] = \begin{bmatrix} 2.4495 & & \\ 6.1237 & 4.1833 & \\ 22.454 & 20.917 & 6.1101 \end{bmatrix}$$

The validity of this decomposition can be verified by substituting it and its transpose into Eq. (11.2) to see if their product yields the original matrix $[A]$. This is left for an exercise.

Figure 11.3 presents pseudocode for implementing the Cholesky decomposition algorithm. It should be noted that the algorithm in Fig. 11.3 could result in an execution error if the evaluation of $a_{kk}$ involves taking the square root of a negative number. However, for cases where the matrix is positive definite,¹ this will never occur. Because many symmetric matrices dealt with in engineering are, in fact, positive definite, the Cholesky algorithm has wide application. Another benefit of dealing with positive definite symmetric matrices is that pivoting is not required to avoid division by zero. Thus, we can implement the algorithm in Fig. 11.3 without the complication of pivoting.

¹ A positive definite matrix is one for which the product $\{X\}^T [A] \{X\}$ is greater than zero for all nonzero vectors $\{X\}$.

11.2 GAUSS-SEIDEL

Iterative or approximate methods provide an alternative to the elimination methods described to this point. Such approaches are similar to the techniques we developed to obtain the roots of a single equation in Chap. 6. Those approaches consisted of guessing a value and then using a systematic method to obtain a refined estimate of the root. Because the present part of the book deals with a similar problem, obtaining the values that simultaneously satisfy a set of equations, we might suspect that such approximate methods could be useful in this context.

The Gauss-Seidel method is the most commonly used iterative method. Assume that we are given a set of n equations:

$$[A]\{X\} = \{B\}$$

Suppose that for conciseness we limit ourselves to a 3 × 3 set of equations. If the diagonal elements are all nonzero, the first equation can be solved for $x_1$, the second for $x_2$, and the third for $x_3$ to yield

$$x_1 = \frac{b_1 - a_{12} x_2 - a_{13} x_3}{a_{11}} \tag{11.5a}$$

FIGURE 11.3 Pseudocode for Cholesky's LU decomposition algorithm.

    DOFOR k = 1, n
      DOFOR i = 1, k - 1
        sum = 0.
        DOFOR j = 1, i - 1
          sum = sum + a_ij · a_kj
        END DO
        a_ki = (a_ki - sum) / a_ii
      END DO
      sum = 0.
      DOFOR j = 1, k - 1
        sum = sum + (a_kj)^2
      END DO
      a_kk = SQRT(a_kk - sum)
    END DO
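As a companion to the pseudocode of Fig. 11.3, the following Python sketch applies the recurrences of Eqs. (11.3) and (11.4) to the matrix of Example 11.2. It is illustrative only: the function name `cholesky` is introduced here, and, unlike Fig. 11.3, it returns a separate matrix $[L]$ instead of overwriting the lower triangle of $[A]$. It also raises an error instead of attempting the square root of a nonpositive number.

```python
import math


def cholesky(a):
    """Return lower-triangular L with L L^T = a, via Eqs. (11.3) and (11.4).

    a is assumed symmetric; a ValueError is raised if it is not
    positive definite (nonpositive value under the square root).
    """
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for k in range(n):
        # Off-diagonal terms of row k, Eq. (11.3)
        for i in range(k):
            s = sum(L[i][j] * L[k][j] for j in range(i))
            L[k][i] = (a[k][i] - s) / L[i][i]
        # Diagonal term, Eq. (11.4)
        d = a[k][k] - sum(L[k][j] ** 2 for j in range(k))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[k][k] = math.sqrt(d)
    return L


# Matrix of Example 11.2
A = [[6.0, 15.0, 55.0], [15.0, 55.0, 225.0], [55.0, 225.0, 979.0]]
L = cholesky(A)
for row in L:
    print([round(v, 4) for v in row])
```

Multiplying the returned factor by its transpose is a quick way to confirm Eq. (11.2) for this matrix.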
$$x_2 = \frac{b_2 - a_{21} x_1 - a_{23} x_3}{a_{22}} \tag{11.5b}$$

$$x_3 = \frac{b_3 - a_{31} x_1 - a_{32} x_2}{a_{33}} \tag{11.5c}$$

Now, we can start the solution process by choosing guesses for the x's. A simple way to obtain initial guesses is to assume that they are all zero. These zeros can be substituted into Eq. (11.5a), which can be used to calculate a new value for $x_1 = b_1/a_{11}$. Then, we substitute this new value of $x_1$ along with the previous guess of zero for $x_3$ into Eq. (11.5b) to compute a new value for $x_2$. The process is repeated for Eq. (11.5c) to calculate a new estimate for $x_3$. Then we return to the first equation and repeat the entire procedure until our solution converges closely enough to the true values. Convergence can be checked using the criterion [recall Eq. (3.5)]

$$|\varepsilon_{a,i}| = \left| \frac{x_i^{j} - x_i^{j-1}}{x_i^{j}} \right| 100\% < \varepsilon_s \tag{11.6}$$

for all i, where j and j − 1 are the present and previous iterations.

EXAMPLE 11.3 Gauss-Seidel Method

Problem Statement. Use the Gauss-Seidel method to obtain the solution of the same system used in Example 10.2:

$$\begin{aligned} 3x_1 - 0.1x_2 - 0.2x_3 &= 7.85 \\ 0.1x_1 + 7x_2 - 0.3x_3 &= -19.3 \\ 0.3x_1 - 0.2x_2 + 10x_3 &= 71.4 \end{aligned}$$

Recall that the true solution is $x_1 = 3$, $x_2 = -2.5$, and $x_3 = 7$.

Solution. First, solve each of the equations for its unknown on the diagonal.

$$x_1 = \frac{7.85 + 0.1x_2 + 0.2x_3}{3} \tag{E11.3.1}$$

$$x_2 = \frac{-19.3 - 0.1x_1 + 0.3x_3}{7} \tag{E11.3.2}$$

$$x_3 = \frac{71.4 - 0.3x_1 + 0.2x_2}{10} \tag{E11.3.3}$$

By assuming that $x_2$ and $x_3$ are zero, Eq. (E11.3.1) can be used to compute

$$x_1 = \frac{7.85 + 0 + 0}{3} = 2.616667$$

This value, along with the assumed value of $x_3 = 0$, can be substituted into Eq. (E11.3.2) to calculate

$$x_2 = \frac{-19.3 - 0.1(2.616667) + 0}{7} = -2.794524$$
The first iteration is completed by substituting the calculated values for $x_1$ and $x_2$ into Eq. (E11.3.3) to yield

$$x_3 = \frac{71.4 - 0.3(2.616667) + 0.2(-2.794524)}{10} = 7.005610$$

For the second iteration, the same process is repeated to compute

$$x_1 = \frac{7.85 + 0.1(-2.794524) + 0.2(7.005610)}{3} = 2.990557 \qquad |\varepsilon_t| = 0.31\%$$

$$x_2 = \frac{-19.3 - 0.1(2.990557) + 0.3(7.005610)}{7} = -2.499625 \qquad |\varepsilon_t| = 0.015\%$$

$$x_3 = \frac{71.4 - 0.3(2.990557) + 0.2(-2.499625)}{10} = 7.000291 \qquad |\varepsilon_t| = 0.0042\%$$

The method is, therefore, converging on the true solution. Additional iterations could be applied to improve the answers. However, in an actual problem, we would not know the true answer a priori. Consequently, Eq. (11.6) provides a means to estimate the error. For example, for $x_1$,

$$|\varepsilon_{a,1}| = \left| \frac{2.990557 - 2.616667}{2.990557} \right| 100\% = 12.5\%$$

For $x_2$ and $x_3$, the error estimates are $|\varepsilon_{a,2}| = 11.8\%$ and $|\varepsilon_{a,3}| = 0.076\%$. Note that, as was the case when determining roots of a single equation, formulations such as Eq. (11.6) usually provide a conservative appraisal of convergence. Thus, when they are met, they ensure that the result is known to at least the tolerance specified by $\varepsilon_s$.

As each new x value is computed for the Gauss-Seidel method, it is immediately used in the next equation to determine another x value. Thus, if the solution is converging, the best available estimates will be employed. An alternative approach, called Jacobi iteration, utilizes a somewhat different tactic. Rather than using the latest available x's, this technique uses Eq. (11.5) to compute a set of new x's on the basis of a set of old x's. Thus, as new values are generated, they are not immediately used but rather are retained for the next iteration. The difference between the Gauss-Seidel method and Jacobi iteration is depicted in Fig. 11.4.
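The contrast between the two update strategies can be seen by coding a single pass of each. The sketch below is a minimal illustration; the function names `gauss_seidel_sweep` and `jacobi_sweep` are assumptions introduced here. It runs two passes of each method on the system of Example 11.3, starting from zero guesses.

```python
def gauss_seidel_sweep(a, b, x):
    """One Gauss-Seidel pass: each new x[i] is used immediately."""
    x = list(x)
    n = len(b)
    for i in range(n):
        s = sum(a[i][j] * x[j] for j in range(n) if j != i)
        x[i] = (b[i] - s) / a[i][i]
    return x


def jacobi_sweep(a, b, x):
    """One Jacobi pass: every new value is computed from the old vector."""
    n = len(b)
    return [
        (b[i] - sum(a[i][j] * x[j] for j in range(n) if j != i)) / a[i][i]
        for i in range(n)
    ]


# System of Example 11.3
A = [[3.0, -0.1, -0.2], [0.1, 7.0, -0.3], [0.3, -0.2, 10.0]]
b = [7.85, -19.3, 71.4]

x_gs = [0.0, 0.0, 0.0]
x_j = [0.0, 0.0, 0.0]
for it in range(1, 3):
    x_gs = gauss_seidel_sweep(A, b, x_gs)
    x_j = jacobi_sweep(A, b, x_j)
    print(it, [round(v, 6) for v in x_gs], [round(v, 6) for v in x_j])
```

The Gauss-Seidel columns should match the two iterations worked out in Example 11.3. The Jacobi columns differ from the first pass onward, because $x_2$ and $x_3$ are computed from the old zero guesses instead of the freshly updated values.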
Although there are certain cases where the Jacobi method is useful, Gauss-Seidel's utilization of the best available estimates usually makes it the method of preference.

11.2.1 Convergence Criterion for the Gauss-Seidel Method

Note that the Gauss-Seidel method is similar in spirit to the technique of simple fixed-point iteration that was used in Sec. 6.1 to solve for the roots of a single equation. Recall that simple fixed-point iteration had two fundamental problems: (1) it was sometimes nonconvergent and (2) when it converged, it often did so very slowly. The Gauss-Seidel method can also exhibit these shortcomings.
[Figure 11.4: two panels, (a) and (b), each listing the update equations $x_1 = (b_1 - a_{12}x_2 - a_{13}x_3)/a_{11}$, $x_2 = (b_2 - a_{21}x_1 - a_{23}x_3)/a_{22}$, and $x_3 = (b_3 - a_{31}x_1 - a_{32}x_2)/a_{33}$ for the first and second iterations.]
FIGURE 11.4 Graphical depiction of the difference between (a) the Gauss-Seidel and (b) the Jacobi iterative methods for solving simultaneous linear algebraic equations.

Convergence criteria can be developed by recalling from Sec. 6.5.1 that sufficient conditions for convergence of two nonlinear equations, u(x, y) and v(x, y), are

$$\left| \frac{\partial u}{\partial x} \right| + \left| \frac{\partial u}{\partial y} \right| < 1 \tag{11.7a}$$

and

$$\left| \frac{\partial v}{\partial x} \right| + \left| \frac{\partial v}{\partial y} \right| < 1 \tag{11.7b}$$

These criteria also apply to linear equations of the sort we are solving with the Gauss-Seidel method. For example, in the case of two simultaneous equations, the Gauss-Seidel algorithm [Eq. (11.5)] can be expressed as

$$u(x_1, x_2) = \frac{b_1}{a_{11}} - \frac{a_{12}}{a_{11}} x_2 \tag{11.8a}$$

and

$$v(x_1, x_2) = \frac{b_2}{a_{22}} - \frac{a_{21}}{a_{22}} x_1 \tag{11.8b}$$

The partial derivatives of these equations can be evaluated with respect to each of the unknowns as

$$\frac{\partial u}{\partial x_1} = 0 \qquad \frac{\partial u}{\partial x_2} = -\frac{a_{12}}{a_{11}}$$

and

$$\frac{\partial v}{\partial x_1} = -\frac{a_{21}}{a_{22}} \qquad \frac{\partial v}{\partial x_2} = 0$$

which can be substituted into Eq. (11.7) to give

$$\left| \frac{a_{12}}{a_{11}} \right| < 1 \tag{11.9a}$$

and

$$\left| \frac{a_{21}}{a_{22}} \right| < 1 \tag{11.9b}$$

In other words, the absolute values of the slopes of Eq. (11.8) must be less than unity to ensure convergence. This is displayed graphically in Fig. 11.5.

[Figure 11.5: two panels, (a) and (b), each plotting the lines u and v on axes $x_1$ (horizontal) and $x_2$ (vertical).]
FIGURE 11.5 Iteration cobwebs illustrating (a) convergence and (b) divergence of the Gauss-Seidel method. Notice that the same functions are plotted in both cases (u: $11x_1 + 13x_2 = 286$; v: $11x_1 - 9x_2 = 99$). Thus, the order in which the equations are implemented (as depicted by the direction of the first arrow from the origin) dictates whether the computation converges.

Equation (11.9) can also be reformulated as

$$|a_{11}| > |a_{12}| \qquad \text{and} \qquad |a_{22}| > |a_{21}|$$

That is, the diagonal element must be greater than the off-diagonal element for each row.

The extension of the above to n equations is straightforward and can be expressed as

$$|a_{ii}| > \sum_{\substack{j=1 \\ j \ne i}}^{n} |a_{ij}| \tag{11.10}$$
That is, the absolute value of the diagonal coefficient in each of the equations must be larger than the sum of the absolute values of the other coefficients in the equation. This criterion is sufficient but not necessary for convergence. That is, although the method may sometimes work if Eq. (11.10) is not met, convergence is guaranteed if the condition is satisfied. Systems where Eq. (11.10) holds are called diagonally dominant. Fortunately, many engineering problems of practical importance fulfill this requirement.

11.2.2 Improvement of Convergence Using Relaxation

Relaxation represents a slight modification of the Gauss-Seidel method and is designed to enhance convergence. After each new value of x is computed using Eq. (11.5), that value is modified by a weighted average of the results of the previous and the present iterations:

$$x_i^{\text{new}} = \lambda x_i^{\text{new}} + (1 - \lambda) x_i^{\text{old}} \tag{11.11}$$

where λ is a weighting factor that is assigned a value between 0 and 2.

If λ = 1, (1 − λ) is equal to 0 and the result is unmodified. However, if λ is set at a value between 0 and 1, the result is a weighted average of the present and the previous results. This type of modification is called underrelaxation. It is typically employed to make a nonconvergent system converge or to hasten convergence by dampening out oscillations.

For values of λ from 1 to 2, extra weight is placed on the present value. In this instance, there is an implicit assumption that the new value is moving in the correct direction toward the true solution but at too slow a rate. Thus, the added weight of λ is intended to improve the estimate by pushing it closer to the truth. Hence, this type of modification, which is called overrelaxation, is designed to accelerate the convergence of an already convergent system. The approach is also called successive or simultaneous overrelaxation, or SOR.

The choice of a proper value for λ is highly problem-specific and is often determined empirically.
For a single solution of a set of equations it is often unnecessary. However, if the system under study is to be solved repeatedly, the efficiency introduced by a wise choice of λ can be extremely important. Good examples are the very large systems of partial differential equations that often arise when modeling continuous variations of variables (recall the distributed system depicted in Fig. PT3.1b). We will return to this topic in Part Eight.

11.2.3 Algorithm for Gauss-Seidel

An algorithm for the Gauss-Seidel method, with relaxation, is depicted in Fig. 11.6. Note that this algorithm is not guaranteed to converge if the equations are not input in a diagonally dominant form.

The pseudocode has two features that bear mentioning. First, there is an initial set of nested loops to divide each equation by its diagonal element. This reduces the total number of operations in the algorithm. Second, notice that the error check is designated by a variable called sentinel. If any of the equations has an approximate error greater than the stopping criterion (es), then the iterations are allowed to continue. The use of the sentinel allows us to circumvent unnecessary calculations of error estimates once one of the equations exceeds the criterion.

11.2.4 Problem Contexts for the Gauss-Seidel Method

Aside from circumventing the round-off dilemma, the Gauss-Seidel technique has a number of other advantages that make it particularly attractive in the context of certain engineering problems. For example, when the matrix in question is very large and very sparse (that is, most of the elements are zero), elimination methods waste large amounts of computer memory by storing zeros.

FIGURE 11.6 Pseudocode for Gauss-Seidel with relaxation.

    SUBROUTINE Gseid (a,b,n,x,imax,es,lambda)
      DOFOR i = 1,n
        dummy = a_i,i
        DOFOR j = 1,n
          a_i,j = a_i,j/dummy
        END DO
        b_i = b_i/dummy
      END DO
      DOFOR i = 1, n
        sum = b_i
        DOFOR j = 1, n
          IF i ≠ j THEN sum = sum - a_i,j*x_j
        END DO
        x_i = sum
      END DO
      iter = 1
      DO
        sentinel = 1
        DOFOR i = 1,n
          old = x_i
          sum = b_i
          DOFOR j = 1,n
            IF i ≠ j THEN sum = sum - a_i,j*x_j
          END DO
          x_i = lambda*sum + (1. - lambda)*old
          IF sentinel = 1 AND x_i ≠ 0. THEN
            ea = ABS((x_i - old)/x_i)*100.
            IF ea > es THEN sentinel = 0
          END IF
        END DO
        iter = iter + 1
        IF sentinel = 1 OR (iter ≥ imax) EXIT
      END DO
    END Gseid
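A compact Python counterpart to Fig. 11.6 is sketched below, together with a check of the diagonal-dominance condition of Eq. (11.10). It is a sketch under stated assumptions, not a transcription of the figure: the names `gseid` and `is_diagonally_dominant`, the default arguments, and the returned iteration count are choices made here. Unlike the pseudocode, it does not pre-divide each row by its diagonal element and does not perform the initial unrelaxed pass, and it leaves `a` and `b` unmodified.

```python
def is_diagonally_dominant(a):
    """Check Eq. (11.10): |a_ii| > sum of |a_ij| over j != i, for every row."""
    n = len(a)
    return all(
        abs(a[i][i]) > sum(abs(a[i][j]) for j in range(n) if j != i)
        for i in range(n)
    )


def gseid(a, b, x0=None, imax=50, es=1e-5, lam=1.0):
    """Gauss-Seidel with relaxation, loosely following Fig. 11.6.

    es is the stopping criterion in percent and lam the relaxation factor.
    Returns (x, iterations_used); a and b are not modified.
    """
    n = len(b)
    x = [0.0] * n if x0 is None else list(x0)
    for it in range(1, imax + 1):
        converged = True  # plays the role of the sentinel in Fig. 11.6
        for i in range(n):
            old = x[i]
            s = b[i] - sum(a[i][j] * x[j] for j in range(n) if j != i)
            # Relaxed update, Eq. (11.11)
            x[i] = lam * s / a[i][i] + (1.0 - lam) * old
            # Skip further error estimates once one equation fails the test
            if converged and x[i] != 0.0:
                ea = abs((x[i] - old) / x[i]) * 100.0
                if ea > es:
                    converged = False
        if converged:
            return x, it
    return x, imax


# System of Example 11.3
A = [[3.0, -0.1, -0.2], [0.1, 7.0, -0.3], [0.3, -0.2, 10.0]]
b = [7.85, -19.3, 71.4]
print(is_diagonally_dominant(A))
x, iters = gseid(A, b, es=1e-6)
print(iters, [round(v, 6) for v in x])
```

Passing `lam` between 0 and 1 gives underrelaxation and between 1 and 2 overrelaxation. Because the dominance test is only a sufficient condition, a `False` result does not by itself mean the iteration will diverge.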
At the beginning of this chapter, we saw how this shortcoming could be circumvented if the coefficient matrix is banded. For nonbanded systems, there is usually no simple way to avoid large memory requirements when using elimination methods. Because all computers have a finite amount of memory, this inefficiency can place a constraint on the size of systems for which elimination methods are practical.

Although a general algorithm such as the one in Fig. 11.6 is prone to the same constraint, the structure of the Gauss-Seidel equations [Eq. (11.5)] permits concise programs to be developed for specific systems. Because only nonzero coefficients need be included in Eq. (11.5), large savings of computer memory are possible. Although this entails more up-front investment in software development, the long-term advantages are substantial when dealing with large systems for which many simulations are to be performed. Both lumped- and distributed-variable systems can result in large, sparse matrices for which the Gauss-Seidel method has utility.

11.3 LINEAR ALGEBRAIC EQUATIONS WITH SOFTWARE PACKAGES

Software packages have great capabilities for solving systems of linear algebraic equations. Before describing these tools, we should mention that the approaches described in Chap. 7 for solving nonlinear systems can be applied to linear systems. However, in this section, we will focus on the approaches that are expressly designed for linear equations.

11.3.1 Excel

There are two ways to solve linear algebraic equations with Excel: (1) using the Solver tool or (2) using matrix inversion and multiplication functions. Recall that one way to determine the solution of linear algebraic equations is

$$\{X\} = [A]^{-1}\{B\} \tag{11.12}$$

Excel has built-in functions for both matrix inversion and multiplication that can be used to implement this formula.

EXAMPLE 11.4 Using Excel to Solve Linear Systems

Problem Statement.
Recall that in Chap. 10 we introduced the Hilbert matrix. The following system is based on the Hilbert matrix. Note that it is scaled, as was done previously in Example 10.3, so that the maximum coefficient in each row is unity.

$$\begin{bmatrix} 1 & 1/2 & 1/3 \\ 1 & 2/3 & 1/2 \\ 1 & 3/4 & 3/5 \end{bmatrix} \begin{Bmatrix} x_1 \\ x_2 \\ x_3 \end{Bmatrix} = \begin{Bmatrix} 1.833333 \\ 2.166667 \\ 2.35 \end{Bmatrix}$$

The solution to this system is $\{X\}^T = \lfloor 1 \;\; 1 \;\; 1 \rfloor$. Use Excel to obtain this solution.

Solution. The spreadsheet to solve this problem is displayed in Fig. 11.7. First, the matrix $[A]$ and the right-hand-side constants $\{B\}$ are entered into the spreadsheet cells. Then, a set of cells of the proper dimensions (in our example 3 × 3) is highlighted by either clicking and dragging the mouse or by using the arrow keys while depressing the shift key. As in Fig. 11.7, we highlight the range: B5..D7.
Next, a formula invoking the matrix inverse function is entered,

    =minverse(B1..D3)

Note that the argument is the range holding the elements of $[A]$. The Ctrl and Shift keys are held down while the Enter key is depressed. The resulting inverse of $[A]$ will be calculated by Excel and displayed in the range B5..D7 as shown in Fig. 11.7.

A similar approach is used to multiply the inverse by the right-hand-side vector. For this case, the range from F5..F7 is highlighted and the following formula is entered

    =mmult(B5..D7,F1..F3)

where the first range is the first matrix to be multiplied, $[A]^{-1}$, and the second range is the second matrix to be multiplied, $\{B\}$. By again using the Ctrl-Shift-Enter combination, the solution $\{X\}$ will be calculated by Excel and displayed in the range F5..F7, as shown in Fig. 11.7. As can be seen, the correct answer results.

[FIGURE 11.7: spreadsheet for Example 11.4.]

Notice that we deliberately reformatted the results in Example 11.4 to show 15 digits. We did this because Excel uses double-precision to store numerical values. Thus, we see that round-off error occurs in the last two digits. This implies a condition number on the order of 100, which agrees with the result of 451.2 originally calculated in Example 10.3. Excel does not have the capability to calculate a condition number. In most cases, particularly because it employs double-precision numbers, this does not represent a problem. However, for cases where you suspect that the system is ill-conditioned, determination of the condition number is useful. MATLAB and Mathcad software are capable of computing this quantity.

11.3.2 MATLAB

As the name implies, MATLAB (short for MATrix LABoratory) was designed to facilitate matrix manipulations. Thus, as might be expected, its capabilities in this area are excellent. Some of the key MATLAB functions related to matrix operations are listed in Table 11.1. The following example illustrates a few of these capabilities.

TABLE 11.1 MATLAB functions to implement matrix analysis and numerical linear algebra.

    Matrix Analysis
    Function    Description
    cond        Matrix condition number
    norm        Matrix or vector norm
    rcond       LINPACK reciprocal condition estimator
    rank        Number of linearly independent rows or columns
    det         Determinant
    trace       Sum of diagonal elements
    null        Null space
    orth        Orthogonalization
    rref        Reduced row echelon form

    Linear Equations
    Function    Description
    \ and /     Linear equation solution; use "help slash"
    chol        Cholesky factorization
    lu          Factors from Gauss elimination
    inv         Matrix inverse
    qr          Orthogonal-triangular decomposition
    qrdelete    Delete a column from the QR factorization
    qrinsert    Insert a column in the QR factorization
    nnls        Nonnegative least squares
    pinv        Pseudoinverse
    lscov       Least squares in the presence of known covariance

EXAMPLE 11.5 Using MATLAB to Manipulate Linear Algebraic Equations

Problem Statement. Explore how MATLAB can be employed to solve and analyze linear algebraic equations. Use the same system as in Example 11.4.

Solution. First, we can enter the $[A]$ matrix and the $\{B\}$ vector,

    >> A = [ 1 1/2 1/3 ; 1 2/3 1/2 ; 1 3/4 3/5 ]
    A =
        1.0000    0.5000    0.3333
        1.0000    0.6667    0.5000
        1.0000    0.7500    0.6000

    >> B = [1+1/2+1/3;1+2/3+2/4;1+3/4+3/5]
    B =
        1.8333
        2.1667
        2.3500

Next, we can determine the condition number for $[A]$, as in

    >> cond(A)
    ans =
        366.3503
This result is based on the spectral, or $\|A\|_2$, norm discussed in Box 10.2. Note that it is of the same order of magnitude as the condition number = 451.2 based on the row-sum norm in Example 10.3. Both results imply that between two and three digits of precision could be lost.

Now we can solve the system of equations in two different ways. The most direct and efficient way is to employ backslash, or "left division":

    >> X = A\B
    X =
        1.0000
        1.0000
        1.0000

For cases such as ours, MATLAB uses Gauss elimination to solve such systems.

As an alternative, we can implement Eq. (PT3.6) directly, as in

    >> X = inv(A)*B
    X =
        1.0000
        1.0000
        1.0000

This approach actually determines the matrix inverse first and then performs the matrix multiplication. Hence, it is more time consuming than using the backslash approach.

11.3.3 Mathcad

Mathcad contains many special functions that manipulate vectors and matrices. These include common operations such as the dot product, matrix transpose, matrix addition, and matrix multiplication. In addition, it allows calculation of the matrix inverse, determinant, trace, various types of norms, and condition numbers based on different norms. It also has several functions that decompose matrices.

Systems of linear equations can be solved in two ways by Mathcad. First, it is possible to use matrix inversion and subsequent multiplication by the right-hand-side as discussed in Chap. 10. In addition, Mathcad has a special function called lsolve(A,b) that is specifically designed to solve linear equations. You can use other built-in functions to evaluate the condition of A to determine if A is nearly singular and thus possibly subject to round-off errors.

As an example, let's use lsolve to solve a system of linear equations. As shown in Fig. 11.8, the first step is to enter the coefficients of the A matrix using the definition symbol and the Insert/Matrix pull down menu.
This gives a box that allows you to specify the dimensions of the matrix. For our case, we will select a dimension of 4 × 4, and Mathcad places a blank 4-by-4-size matrix on screen. Now, simply click the appropriate cell location and enter values. Repeat similar operations to create the right-hand-side b vector. Now the vector x is defined as lsolve(A,b) and the value of x is displayed with the equal sign.

We can also solve the same system using the matrix inverse. The inverse can be simply computed by merely raising A to the exponent −1. The result is shown on the right side of Fig. 11.8. The solution is then generated as the product of the inverse times b.

Next, let's use Mathcad to find the inverse and the condition number of the Hilbert matrix. As in Fig. 11.9, the scaled matrix can be entered using the definition symbol and the Insert/Matrix pull down menu. The inverse can again be computed by simply raising H to the exponent −1. The result is shown in Fig. 11.9. We can then use some other Mathcad functions to determine condition numbers by using the definition symbol to define variables c1, c2, ce, and ci as the condition number based on the column-sum (cond1), spectral (cond2), the Euclidean (conde), and the row-sum (condi) norms, respectively. The resulting values are shown at the bottom of Fig. 11.9. As expected, the spectral norm provides the smallest measure of magnitude.

FIGURE 11.8 Mathcad screen to solve a system of linear algebraic equations.
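For readers without access to the packages discussed in this section, the row-sum and column-sum condition numbers of the scaled 3 × 3 Hilbert matrix can be reproduced with a few lines of standard-library Python. The sketch below is illustrative only: the helper names `inverse`, `row_sum_norm`, and `column_sum_norm` are introduced here, and exact rational arithmetic (`fractions.Fraction`) is used so that round-off does not cloud the result. The inverse routine is a plain Gauss-Jordan elimination that assumes the matrix is nonsingular.

```python
from fractions import Fraction


def inverse(a):
    """Gauss-Jordan inverse; exact when a holds Fractions.

    Assumes a is square and nonsingular (a singular matrix makes
    the pivot search below raise StopIteration).
    """
    n = len(a)
    # Augment [a | I]
    m = [list(row) + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        # Swap in a row with a nonzero pivot in column c
        p = next(r for r in range(c, n) if m[r][c] != 0)
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for r in range(n):
            if r != c and m[r][c] != 0:
                factor = m[r][c]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


def row_sum_norm(a):
    return max(sum(abs(v) for v in row) for row in a)


def column_sum_norm(a):
    return max(sum(abs(row[j]) for row in a) for j in range(len(a[0])))


# Scaled 3 x 3 Hilbert matrix and right-hand side of Example 11.4
A = [
    [Fraction(1), Fraction(1, 2), Fraction(1, 3)],
    [Fraction(1), Fraction(2, 3), Fraction(1, 2)],
    [Fraction(1), Fraction(3, 4), Fraction(3, 5)],
]
b = [Fraction(11, 6), Fraction(13, 6), Fraction(47, 20)]

A_inv = inverse(A)
x = [sum(A_inv[i][j] * b[j] for j in range(3)) for i in range(3)]
cond_inf = row_sum_norm(A) * row_sum_norm(A_inv)
cond_1 = column_sum_norm(A) * column_sum_norm(A_inv)
print([float(v) for v in x], float(cond_inf), float(cond_1))
```

The row-sum value should reproduce the 451.2 quoted from Example 10.3. Forming the inverse explicitly is done here only to obtain the norms; as noted for MATLAB, it is not the efficient way to solve the system itself.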
FIGURE 11.9 Mathcad screen to determine the matrix inverse and condition numbers of a scaled 3 × 3 Hilbert matrix.

PROBLEMS

11.1 Perform the same calculations as in (a) Example 11.1, and (b) Example 11.3, but for the tridiagonal system,

$$\begin{bmatrix} 0.8 & -0.4 & \\ -0.4 & 0.8 & -0.4 \\ & -0.4 & 0.8 \end{bmatrix} \begin{Bmatrix} x_1 \\ x_2 \\ x_3 \end{Bmatrix} = \begin{Bmatrix} 41 \\ 25 \\ 105 \end{Bmatrix}$$

11.2 Determine the matrix inverse for Example 11.1 based on the LU decomposition and unit vectors.

11.3 The following tridiagonal system must be solved as part of a larger algorithm (Crank-Nicolson) for solving partial differential equations:

$$\begin{bmatrix} 2.01475 & -0.020875 & & \\ -0.020875 & 2.01475 & -0.020875 & \\ & -0.020875 & 2.01475 & -0.020875 \\ & & -0.020875 & 2.01475 \end{bmatrix} \begin{Bmatrix} T_1 \\ T_2 \\ T_3 \\ T_4 \end{Bmatrix} = \begin{Bmatrix} 4.175 \\ 0 \\ 0 \\ 2.0875 \end{Bmatrix}$$

Use the Thomas algorithm to obtain a solution.

11.4 Confirm the validity of the Cholesky decomposition of Example 11.2 by substituting the results into Eq. (11.2) to see if the product of $[L]$ and $[L]^T$ yields $[A]$.
11.5 Perform the same calculations as in Example 11.2, but for the symmetric system,

$$\begin{bmatrix} 6 & 15 & 55 \\ 15 & 55 & 225 \\ 55 & 225 & 979 \end{bmatrix} \begin{Bmatrix} a_0 \\ a_1 \\ a_2 \end{Bmatrix} = \begin{Bmatrix} 152.6 \\ 585.6 \\ 2488.8 \end{Bmatrix}$$

In addition to solving for the Cholesky decomposition, employ it to solve for the a's.

11.6 Perform a Cholesky decomposition of the following symmetric system by hand,

$$\begin{bmatrix} 8 & 20 & 15 \\ 20 & 80 & 50 \\ 15 & 50 & 60 \end{bmatrix} \begin{Bmatrix} x_1 \\ x_2 \\ x_3 \end{Bmatrix} = \begin{Bmatrix} 50 \\ 250 \\ 100 \end{Bmatrix}$$

11.7 Compute the Cholesky decomposition of

$$[A] = \begin{bmatrix} 9 & 0 & 0 \\ 0 & 25 & 0 \\ 0 & 0 & 4 \end{bmatrix}$$

Do your results make sense in terms of Eqs. (11.3) and (11.4)?

11.8 Use the Gauss-Seidel method to solve the tridiagonal system from Prob. 11.1 ($\varepsilon_s$ = 5%). Use overrelaxation with λ = 1.2.

11.9 Recall from Prob. 10.8, that the following system of equations is designed to determine concentrations (the c's in g/m³) in a series of coupled reactors as a function of amount of mass input to each reactor (the right-hand sides in g/d),

$$\begin{aligned} 15c_1 - 3c_2 - c_3 &= 3800 \\ -3c_1 + 18c_2 - 6c_3 &= 1200 \\ -4c_1 - c_2 + 12c_3 &= 2350 \end{aligned}$$

Solve this problem with the Gauss-Seidel method to $\varepsilon_s$ = 5%.

11.10 Repeat Prob. 11.9, but use Jacobi iteration.

11.11 Use the Gauss-Seidel method to solve the following system until the percent relative error falls below $\varepsilon_s$ = 5%,

$$\begin{aligned} 10x_1 + 2x_2 - x_3 &= 27 \\ -3x_1 - 6x_2 + 2x_3 &= -61.5 \\ x_1 + x_2 + 5x_3 &= -21.5 \end{aligned}$$

11.12 Use the Gauss-Seidel method (a) without relaxation and (b) with relaxation (λ = 0.95) to solve the following system to a tolerance of $\varepsilon_s$ = 5%. If necessary, rearrange the equations to achieve convergence.

$$\begin{aligned} -3x_1 + x_2 + 12x_3 &= 50 \\ 6x_1 - x_2 - x_3 &= 3 \\ 6x_1 + 9x_2 + x_3 &= 40 \end{aligned}$$

11.13 Use the Gauss-Seidel method (a) without relaxation and (b) with relaxation (λ = 1.2) to solve the following system to a tolerance of $\varepsilon_s$ = 5%. If necessary, rearrange the equations to achieve convergence.

$$\begin{aligned} 2x_1 - 6x_2 - x_3 &= -38 \\ -3x_1 - x_2 + 7x_3 &= -34 \\ -8x_1 + x_2 - 2x_3 &= -20 \end{aligned}$$

11.14 Redraw Fig. 11.5 for the case where the slopes of the equations are 1 and −1. What is the result of applying Gauss-Seidel to such a system?

11.15 Of the following three sets of linear equations, identify the set(s) that you could not solve using an iterative method such as Gauss-Seidel. Show using any number of iterations that is necessary that your solution does not converge. Clearly state your convergence criteria (how you know it is not converging).

    Set One               Set Two              Set Three
    8x + 3y + z = 12      x + y + 5z = 7       −x + 3y + 5z = 7
    −6x + 7z = 1          x + 4y − z = 4       −2x + 4y − 5z = −3
    2x + 4y − z = 5       3x + y − z = 4       2y − z = 1

11.16 Use the software package of your choice to obtain a solution, calculate the inverse, and determine the condition number (without scaling) based on the row-sum norm for

(a)
$$\begin{bmatrix} 1 & 4 & 9 \\ 4 & 9 & 16 \\ 9 & 16 & 25 \end{bmatrix} \begin{Bmatrix} x_1 \\ x_2 \\ x_3 \end{Bmatrix} = \begin{Bmatrix} 14 \\ 29 \\ 50 \end{Bmatrix}$$

(b)
$$\begin{bmatrix} 1 & 4 & 9 & 16 \\ 4 & 9 & 16 & 25 \\ 9 & 16 & 25 & 36 \\ 16 & 25 & 36 & 49 \end{bmatrix} \begin{Bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{Bmatrix} = \begin{Bmatrix} 30 \\ 54 \\ 86 \\ 126 \end{Bmatrix}$$

In both cases, the answers for all the x's should be 1.

11.17 Given the pair of nonlinear simultaneous equations:

$$\begin{aligned} f(x, y) &= 4 - y - 2x^2 \\ g(x, y) &= 8 - y^2 - 4x \end{aligned}$$

(a) Use the Excel Solver to determine the two pairs of values of x and y that satisfy these equations. (b) Using a range of initial guesses (x = −6 to 6 and y = −6 to 6), determine which initial guesses yield each of the solutions.
  • 335. 11.18 An electronics company produces transistors, resistors, and computer chips. Each transistor requires four units of copper, one unit of zinc, and two units of glass. Each resistor requires three, three, and one units of the three materials, respectively, and each computer chip requires two, one, and three units of these materials, respectively. Putting this information into table form, we get:
Component        Copper   Zinc   Glass
Transistors         4       1      2
Resistors           3       3      1
Computer chips      2       1      3
Supplies of these materials vary from week to week, so the company needs to determine a different production run each week. For example, one week the total amounts of materials available are 960 units of copper, 510 units of zinc, and 610 units of glass. Set up the system of equations modeling the production run, and use Excel, MATLAB, or Mathcad to solve for the number of transistors, resistors, and computer chips to be manufactured this week.
11.19 Use MATLAB or Mathcad software to determine the spectral condition number for a 10-dimensional Hilbert matrix. How many digits of precision are expected to be lost due to ill-conditioning? Determine the solution for this system for the case where each element of the right-hand-side vector {b} consists of the summation of the coefficients in its row. In other words, solve for the case where all the unknowns should be exactly one. Compare the resulting errors with those expected based on the condition number.
11.20 Repeat Prob. 11.19, but for the case of a six-dimensional Vandermonde matrix (see Prob. 10.17) where $x_1 = 4$, $x_2 = 2$, $x_3 = 7$, $x_4 = 10$, $x_5 = 3$, and $x_6 = 5$.
11.21 Given a square matrix [A], write a single-line MATLAB command that will create a new matrix [Aug] that consists of the original matrix [A] augmented by an identity matrix [I].
11.22 Write the following set of equations in matrix form:
$$\begin{aligned} 50 &= 5x_3 - 7x_2 \\ 4x_2 + 7x_3 + 30 &= 0 \\ x_1 - 7x_3 &= 40 - 3x_2 + 5x_1 \end{aligned}$$
Use Excel, MATLAB, or Mathcad to solve for the unknowns. In addition, compute the transpose and the inverse of the coefficient matrix.
11.23 In Sec. 9.2.1, we determined the number of operations required for Gauss elimination without partial pivoting. Make a similar determination for the Thomas algorithm (Fig. 11.2). Develop a plot of operations versus n (from 2 to 20) for both techniques.
11.24 Develop a user-friendly program in either a high-level or macro language of your choice to obtain a solution for a tridiagonal system with the Thomas algorithm (Fig. 11.2). Test your program by duplicating the results of Example 11.1.
11.25 Develop a user-friendly program in either a high-level or macro language of your choice for Cholesky decomposition based on Fig. 11.3. Test your program by duplicating the results of Example 11.2.
11.26 Develop a user-friendly program in either a high-level or macro language of your choice for the Gauss-Seidel method based on Fig. 11.6. Test your program by duplicating the results of Example 11.3.
11.27 As described in Sec. PT3.1.2, linear algebraic equations can arise in the solution of differential equations. For example, the following differential equation results from a steady-state mass balance for a chemical in a one-dimensional canal,
$$0 = D \frac{d^2 c}{dx^2} - U \frac{dc}{dx} - kc$$
where c = concentration, t = time, x = distance, D = diffusion coefficient, U = fluid velocity, and k = a first-order decay rate. Convert this differential equation to an equivalent system of simultaneous algebraic equations. Given D = 2, U = 1, k = 0.2, c(0) = 80, and c(10) = 20, solve these equations from x = 0 to 10 with $\Delta x = 2$, and develop a plot of concentration versus distance.
11.28 A pentadiagonal system with a bandwidth of five can be expressed generally as
$$\begin{bmatrix}
f_1 & g_1 & h_1 & & & & \\
e_2 & f_2 & g_2 & h_2 & & & \\
d_3 & e_3 & f_3 & g_3 & h_3 & & \\
 & \ddots & \ddots & \ddots & \ddots & \ddots & \\
 & & & d_{n-1} & e_{n-1} & f_{n-1} & g_{n-1} \\
 & & & & d_n & e_n & f_n
\end{bmatrix}
\begin{Bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_{n-1} \\ x_n \end{Bmatrix} =
\begin{Bmatrix} r_1 \\ r_2 \\ r_3 \\ \vdots \\ r_{n-1} \\ r_n \end{Bmatrix}$$
Develop a program to efficiently solve such systems without pivoting in a similar fashion to the algorithm used for tridiagonal matrices in Sec. 11.1.1. Test it for the following case:
$$\begin{bmatrix} 8 & -2 & -1 & 0 & 0 \\ -2 & 9 & -4 & -1 & 0 \\ -1 & -3 & 7 & -1 & -2 \\ 0 & -4 & -2 & 12 & -5 \\ 0 & 0 & -7 & -3 & 15 \end{bmatrix} \begin{Bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{Bmatrix} = \begin{Bmatrix} 5 \\ 2 \\ 0 \\ 1 \\ 5 \end{Bmatrix}$$
  • 336. CHAPTER 12
Case Studies: Linear Algebraic Equations
The purpose of this chapter is to use the numerical procedures discussed in Chaps. 9, 10, and 11 to solve systems of linear algebraic equations for some engineering case studies. These systematic numerical techniques have practical significance because engineers frequently encounter problems involving systems of equations that are too large to solve by hand. The numerical algorithms in these applications are particularly convenient to implement on personal computers.
Section 12.1 shows how a mass balance can be employed to model a system of reactors. Section 12.2 places special emphasis on the use of the matrix inverse to determine the complex cause-effect interactions between forces in the members of a truss. Section 12.3 is an example of the use of Kirchhoff's laws to compute the currents and voltages in a resistor circuit. Finally, Sec. 12.4 is an illustration of how linear equations are employed to determine the steady-state configuration of a mass-spring system.
12.1 STEADY-STATE ANALYSIS OF A SYSTEM OF REACTORS (CHEMICAL/BIO ENGINEERING)
Background. One of the most important organizing principles in chemical engineering is the conservation of mass (recall Table 1.1). In quantitative terms, the principle is expressed as a mass balance that accounts for all sources and sinks of a material that pass in and out of a volume (Fig. 12.1). Over a finite period of time, this can be expressed as
$$\text{Accumulation} = \text{inputs} - \text{outputs} \tag{12.1}$$
The mass balance represents a bookkeeping exercise for the particular substance being modeled. For the period of the computation, if the inputs are greater than the outputs, the mass of the substance within the volume increases. If the outputs are greater than the inputs, the mass decreases. If inputs are equal to the outputs, accumulation is zero and mass remains constant. For this stable condition, or steady state, Eq. (12.1) can be expressed as
$$\text{Inputs} = \text{outputs} \tag{12.2}$$
  • 337. Employ the conservation of mass to determine the steady-state concentrations of a system of coupled reactors.
Solution. The mass balance can be used for engineering problem solving by expressing the inputs and outputs in terms of measurable variables and parameters. For example, if we were performing a mass balance for a conservative substance (that is, one that does not increase or decrease due to chemical transformations) in a reactor (Fig. 12.2), we would have to quantify the rate at which mass flows into the reactor through the two inflow pipes and out of the reactor through the outflow pipe. This can be done by taking the product of the flow rate $Q$ (in cubic meters per minute) and the concentration $c$ (in milligrams per cubic meter) for each pipe. For example, for pipe 1 in Fig. 12.2, $Q_1 = 2$ m³/min and $c_1 = 25$ mg/m³; therefore, the rate at which mass flows into the reactor through pipe 1 is $Q_1 c_1 = (2\ \text{m}^3/\text{min})(25\ \text{mg/m}^3) = 50$ mg/min. Thus, 50 mg of chemical flows into the reactor through this pipe each minute. Similarly, for pipe 2 the mass inflow rate can be calculated as $Q_2 c_2 = (1.5\ \text{m}^3/\text{min})(10\ \text{mg/m}^3) = 15$ mg/min.
Notice that the concentration out of the reactor through pipe 3 is not specified by Fig. 12.2. This is because we already have sufficient information to calculate it on the basis of the conservation of mass. Because the reactor is at steady state, Eq. (12.2) holds and the inputs should be in balance with the outputs, as in
$$Q_1 c_1 + Q_2 c_2 = Q_3 c_3$$
Substituting the given values into this equation yields
$$50 + 15 = 3.5 c_3$$
which can be solved for $c_3 = 18.6$ mg/m³. Thus, we have determined the concentration in the third pipe. However, the computation yields an additional bonus. Because the reactor is well mixed (as represented by the propeller in Fig. 12.2), the concentration will be uniform, or homogeneous, throughout the tank. Therefore the concentration in pipe 3 should be identical to the concentration throughout the reactor. Consequently, the mass balance has allowed us to compute both the concentration in the reactor and in the outflow pipe. Such information is of great utility to chemical and petroleum engineers who must design reactors to yield mixtures of a specified concentration.
FIGURE 12.1 A schematic representation of mass balance. (Figure labels: Input, Output, Accumulation, Volume.)
  • 338. Because simple algebra was used to determine the concentration for the single reactor in Fig. 12.2, it might not be obvious how computers figure in mass-balance calculations. Figure 12.3 shows a problem setting where computers are not only useful but are a practical necessity. Because there are five interconnected, or coupled, reactors, five simultaneous mass-balance equations are needed to characterize the system. For reactor 1, the rate of mass flow in is
$$5(10) + Q_{31} c_3$$
and the rate of mass flow out is
$$Q_{12} c_1 + Q_{15} c_1$$
Because the system is at steady state, the inflows and outflows must be equal:
$$5(10) + Q_{31} c_3 = Q_{12} c_1 + Q_{15} c_1$$
or, substituting the values for flow from Fig. 12.3,
$$6c_1 - c_3 = 50$$
FIGURE 12.2 A steady-state, completely mixed reactor with two inflow pipes and one outflow pipe. The flows Q are in cubic meters per minute, and the concentrations c are in milligrams per cubic meter. (Figure labels: Q1 = 2 m³/min, c1 = 25 mg/m³; Q2 = 1.5 m³/min, c2 = 10 mg/m³; Q3 = 3.5 m³/min, c3 = ?)
FIGURE 12.3 Five reactors linked by pipes. (Figure labels: Q01 = 5, c01 = 10; Q03 = 8, c03 = 20; Q12 = 3; Q15 = 3; Q23 = 1; Q24 = 1; Q25 = 1; Q31 = 1; Q34 = 8; Q54 = 2; Q44 = 11; Q55 = 2.)
  • 339. Similar equations can be developed for the other reactors:
$$\begin{aligned} -3c_1 + 3c_2 &= 0 \\ -c_2 + 9c_3 &= 160 \\ -c_2 - 8c_3 + 11c_4 - 2c_5 &= 0 \\ -3c_1 - c_2 + 4c_5 &= 0 \end{aligned}$$
A numerical method can be used to solve these five equations for the five unknown concentrations:
$$\{C\}^T = \lfloor 11.51 \quad 11.51 \quad 19.06 \quad 17.00 \quad 11.51 \rfloor$$
In addition, the matrix inverse can be computed as
$$[A]^{-1} = \begin{bmatrix} 0.16981 & 0.00629 & 0.01887 & 0 & 0 \\ 0.16981 & 0.33962 & 0.01887 & 0 & 0 \\ 0.01887 & 0.03774 & 0.11321 & 0 & 0 \\ 0.06003 & 0.07461 & 0.08748 & 0.09091 & 0.04545 \\ 0.16981 & 0.08962 & 0.01887 & 0 & 0.25000 \end{bmatrix}$$
Each of the elements $a_{ij}$ signifies the change in concentration of reactor i due to a unit change in loading to reactor j. Thus, the zeros in column 4 indicate that a loading to reactor 4 will have no impact on reactors 1, 2, 3, and 5. This is consistent with the system configuration (Fig. 12.3), which indicates that flow out of reactor 4 does not feed back into any of the other reactors. In contrast, loadings to any of the first three reactors will affect the entire system as indicated by the lack of zeros in the first three columns. Such information is of great utility to engineers who design and manage such systems.
12.2 ANALYSIS OF A STATICALLY DETERMINATE TRUSS (CIVIL/ENVIRONMENTAL ENGINEERING)
Background. An important problem in structural engineering is that of finding the forces and reactions associated with a statically determinate truss. Figure 12.4 shows an example of such a truss. The forces (F) represent either tension or compression on the members of the truss. External reactions ($H_2$, $V_2$, and $V_3$) are forces that characterize how the truss interacts with the supporting surface. The hinge at node 2 can transmit both horizontal and vertical forces to the surface, whereas the roller at node 3 transmits only vertical forces. It is observed that the effect of the external loading of 1000 lb is distributed among the various members of the truss.
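Before the truss is worked out, the five-reactor result of Sec. 12.1 can be checked numerically. The sketch below is written in Python purely for illustration (the text does not prescribe a language). It solves the five mass-balance equations with a small Gauss elimination routine and then builds the inverse one column at a time by applying a unit load to each reactor in turn. The names `gauss_solve`, `A`, `loads`, and `inverse` are assumptions introduced here, not part of the original.

```python
def gauss_solve(a, b):
    """Solve a x = b by Gauss elimination with partial pivoting (illustrative helper)."""
    n = len(b)
    # Augmented working copy, so the caller's lists are left untouched.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for k in range(n):
        # Partial pivoting: move the largest entry of column k onto the diagonal.
        p = max(range(k, n), key=lambda r: abs(m[r][k]))
        m[k], m[p] = m[p], m[k]
        for i in range(k + 1, n):
            factor = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= factor * m[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


# Coefficients of the five steady-state balances (unknowns c1..c5), as listed in Sec. 12.1.
A = [
    [6, 0, -1, 0, 0],
    [-3, 3, 0, 0, 0],
    [0, -1, 9, 0, 0],
    [0, -1, -8, 11, -2],
    [-3, -1, 0, 0, 4],
]
loads = [50, 0, 160, 0, 0]

c = gauss_solve(A, loads)

# Column j of the inverse is the concentration response to a unit load on reactor j + 1.
n = len(A)
columns = [gauss_solve(A, [1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
inverse = [[columns[j][i] for j in range(n)] for i in range(n)]

print("concentrations:", [round(v, 2) for v in c])
print("column 4 of inverse:", [round(inverse[i][3], 5) for i in range(n)])
```

Under these assumptions the printed concentrations should agree with the values reported in Sec. 12.1 to two decimals, and column 4 of the inverse should have its only nonzero entry in row 4, which is the numerical form of the observation that reactor 4 does not feed back into the other reactors.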
  • 340. Solution. This type of structure can be described as a system of coupled linear algebraic equations. Free-body force diagrams are shown for each node in Fig. 12.5. The sum of the forces in both horizontal and vertical directions must be zero at each node, because the system is at rest. Therefore, for node 1,
$$\sum F_H = 0 = -F_1 \cos 30^\circ + F_3 \cos 60^\circ + F_{1,h} \tag{12.3}$$
$$\sum F_V = 0 = -F_1 \sin 30^\circ - F_3 \sin 60^\circ + F_{1,v} \tag{12.4}$$
for node 2,
$$\sum F_H = 0 = F_2 + F_1 \cos 30^\circ + F_{2,h} + H_2 \tag{12.5}$$
$$\sum F_V = 0 = F_1 \sin 30^\circ + F_{2,v} + V_2 \tag{12.6}$$
for node 3,
$$\sum F_H = 0 = -F_2 - F_3 \cos 60^\circ + F_{3,h} \tag{12.7}$$
$$\sum F_V = 0 = F_3 \sin 60^\circ + F_{3,v} + V_3 \tag{12.8}$$
where $F_{i,h}$ is the external horizontal force applied to node i (where a positive force is from left to right) and $F_{i,v}$ is the external vertical force applied to node i (where a positive force is upward). Thus, in this problem, the 1000-lb downward force on node 1 corresponds to $F_{1,v} = -1000$. For this case all other $F_{i,v}$'s and $F_{i,h}$'s are zero. Note that the directions of the internal forces and reactions are unknown. Proper application of Newton's laws requires only consistent assumptions regarding direction. Solutions are negative if the directions are assumed incorrectly. Also note that in this problem, the forces in all members are assumed to be in tension and act to pull adjoining nodes together. A negative solution therefore corresponds to compression.
FIGURE 12.4 Forces on a statically determinate truss. (Figure labels: 1000-lb load at node 1; nodes 1, 2, 3; angles of 30°, 60°, and 90°; member forces F1, F2, F3; reactions H2, V2, V3.)
FIGURE 12.5 Free-body force diagrams for the nodes of a statically determinate truss.
  • 341. This problem can be written as the following system of six equations and six unknowns:
$$\begin{bmatrix} 0.866 & 0 & -0.5 & 0 & 0 & 0 \\ 0.5 & 0 & 0.866 & 0 & 0 & 0 \\ -0.866 & -1 & 0 & -1 & 0 & 0 \\ -0.5 & 0 & 0 & 0 & -1 & 0 \\ 0 & 1 & 0.5 & 0 & 0 & 0 \\ 0 & 0 & -0.866 & 0 & 0 & -1 \end{bmatrix} \begin{Bmatrix} F_1 \\ F_2 \\ F_3 \\ H_2 \\ V_2 \\ V_3 \end{Bmatrix} = \begin{Bmatrix} 0 \\ -1000 \\ 0 \\ 0 \\ 0 \\ 0 \end{Bmatrix} \tag{12.9}$$
Notice that, as formulated in Eq. (12.9), partial pivoting is required to avoid division by zero diagonal elements. Employing a pivot strategy, the system can be solved using any of the elimination techniques discussed in Chap. 9 or 10. However, because this problem is an ideal case study for demonstrating the utility of the matrix inverse, the LU decomposition can be used to compute
$$F_1 = -500 \quad F_2 = 433 \quad F_3 = -866 \quad H_2 = 0 \quad V_2 = 250 \quad V_3 = 750$$
and the matrix inverse is
$$[A]^{-1} = \begin{bmatrix} 0.866 & 0.5 & 0 & 0 & 0 & 0 \\ 0.25 & -0.433 & 0 & 0 & 1 & 0 \\ -0.5 & 0.866 & 0 & 0 & 0 & 0 \\ -1 & 0 & -1 & 0 & -1 & 0 \\ -0.433 & -0.25 & 0 & -1 & 0 & 0 \\ 0.433 & -0.75 & 0 & 0 & 0 & -1 \end{bmatrix}$$
Now, realize that the right-hand-side vector represents the externally applied horizontal and vertical forces on each node, as in
$$\{F\}^T = \lfloor F_{1,h} \quad F_{1,v} \quad F_{2,h} \quad F_{2,v} \quad F_{3,h} \quad F_{3,v} \rfloor \tag{12.10}$$
Because the external forces have no effect on the LU decomposition, the method need not be implemented over and over again to study the effect of different external forces on the truss. Rather, all that we have to do is perform the forward- and backward-substitution steps for each right-hand-side vector to efficiently obtain alternative solutions.
  • 342. For example, we might want to study the effect of horizontal forces induced by a wind blowing from left to right. If the wind force can be idealized as two point forces of 1000 lb on nodes 1 and 2 (Fig. 12.6a), the right-hand-side vector is
$$\{F\}^T = \lfloor 1000 \quad 0 \quad 1000 \quad 0 \quad 0 \quad 0 \rfloor$$
which can be used to compute
$$F_1 = 866 \quad F_2 = 250 \quad F_3 = -500 \quad H_2 = -2000 \quad V_2 = -433 \quad V_3 = 433$$
For a wind from the right (Fig. 12.6b), $F_{1,h} = -1000$, $F_{3,h} = -1000$, and all other external forces are zero, with the result that
$$F_1 = -866 \quad F_2 = -1250 \quad F_3 = 500 \quad H_2 = 2000 \quad V_2 = 433 \quad V_3 = -433$$
The results indicate that the winds have markedly different effects on the structure. Both cases are depicted in Fig. 12.6.
The individual elements of the inverted matrix also have direct utility in elucidating stimulus-response interactions for the structure. Each element represents the change of one of the unknown variables to a unit change of one of the external stimuli. For example, element $a^{-1}_{32}$ indicates that the third unknown ($F_3$) will change 0.866 due to a unit change of the second external stimulus ($F_{1,v}$). Thus, if the vertical load at the first node were increased by 1, $F_3$ would increase by 0.866. The fact that elements are 0 indicates that certain unknowns are unaffected by some of the external stimuli. For instance, $a^{-1}_{13} = 0$ means that $F_1$ is unaffected by changes in $F_{2,h}$. This ability to isolate interactions has a number of engineering applications, including the identification of those components that are most sensitive to external stimuli and, as a consequence, most prone to failure. In addition, it can be used to determine components that may be unnecessary (see Prob. 12.18).
FIGURE 12.6 Two test cases showing (a) winds from the left and (b) winds from the right.
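The reuse of one LU decomposition for several load cases can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the book's own program: the helper names `lu_factor` and `lu_solve` are hypothetical, the trigonometric coefficients are evaluated with `math` rather than rounded to 0.866 and 0.5, and partial pivoting is included because Eq. (12.9) has zeros on its diagonal.

```python
import math


def lu_factor(a):
    """Factor a copy of a into L and U (stored together) with partial pivoting."""
    n = len(a)
    lu = [[float(v) for v in row] for row in a]
    perm = list(range(n))  # perm[k] = original row now sitting in position k
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(lu[r][k]))
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]  # multiplier, stored in the L part
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Forward and back substitution for one right-hand side."""
    n = len(b)
    y = [float(b[p]) for p in perm]  # apply the recorded row swaps to b
    for i in range(n):  # forward substitution (L has a unit diagonal)
        y[i] -= sum(lu[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution with U
        x[i] = (y[i] - sum(lu[i][j] * x[j] for j in range(i + 1, n))) / lu[i][i]
    return x


c30, s30 = math.cos(math.radians(30)), math.sin(math.radians(30))
c60, s60 = math.cos(math.radians(60)), math.sin(math.radians(60))

# Coefficient matrix of Eq. (12.9); unknowns are F1, F2, F3, H2, V2, V3.
A = [
    [c30, 0, -c60, 0, 0, 0],
    [s30, 0, s60, 0, 0, 0],
    [-c30, -1, 0, -1, 0, 0],
    [-s30, 0, 0, 0, -1, 0],
    [0, 1, c60, 0, 0, 0],
    [0, 0, -s60, 0, 0, -1],
]

lu, perm = lu_factor(A)  # done once

# Right-hand sides ordered as F1h, F1v, F2h, F2v, F3h, F3v.
base = lu_solve(lu, perm, [0, -1000, 0, 0, 0, 0])
wind_left = lu_solve(lu, perm, [1000, 0, 1000, 0, 0, 0])
wind_right = lu_solve(lu, perm, [-1000, 0, 0, 0, -1000, 0])

for name, sol in (("base", base), ("wind left", wind_left), ("wind right", wind_right)):
    print(name, [round(v) for v in sol])
```

Only `lu_solve` is repeated for each load case, which is the saving the text describes. Small differences from the tabulated values can arise because the text rounds the coefficients to three digits.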
  • 343. The foregoing approach becomes particularly useful when applied to large complex structures. In engineering practice, it may be necessary to solve trusses with hundreds or even thousands of structural members. Linear equations provide one powerful approach for gaining insight into the behavior of these structures.
12.3 CURRENTS AND VOLTAGES IN RESISTOR CIRCUITS (ELECTRICAL ENGINEERING)
Background. A common problem in electrical engineering involves determining the currents and voltages at various locations in resistor circuits. These problems are solved using Kirchhoff's current and voltage rules. The current (or point) rule states that the algebraic sum of all currents entering a node must be zero (see Fig. 12.7a), or
$$\sum i = 0 \tag{12.11}$$
where all current entering the node is considered positive in sign. The current rule is an application of the principle of conservation of charge (recall Table 1.1).
The voltage (or loop) rule specifies that the algebraic sum of the potential differences (that is, voltage changes) in any loop must equal zero. For a resistor circuit, this is expressed as
$$\sum \xi - \sum iR = 0 \tag{12.12}$$
where $\xi$ is the emf (electromotive force) of the voltage sources and R is the resistance of any resistors on the loop. Note that the second term derives from Ohm's law (Fig. 12.7b), which states that the voltage drop across an ideal resistor is equal to the product of the current and the resistance. Kirchhoff's voltage rule is an expression of the conservation of energy.
Solution. Application of these rules results in systems of simultaneous linear algebraic equations because the various loops within a circuit are coupled. For example, consider the circuit shown in Fig. 12.8. The currents associated with this circuit are unknown both in magnitude and direction. This presents no great difficulty because one simply assumes a direction for each current. If the resultant solution from Kirchhoff's laws is negative, then the assumed direction was incorrect. For example, Fig. 12.9 shows some assumed currents.
FIGURE 12.7 Schematic representations of (a) Kirchhoff's current rule and (b) Ohm's law.
FIGURE 12.8 A resistor circuit to be solved using simultaneous linear algebraic equations. (Figure labels: nodes 1 through 6; resistances of 5, 10, 10, 15, 5, and 20 Ω; V1 = 200 V; V6 = 0 V.)
  • 344. Given these assumptions, Kirchhoff's current rule is applied at each node to yield
$$\begin{aligned} i_{12} + i_{52} + i_{32} &= 0 \\ i_{65} - i_{52} - i_{54} &= 0 \\ i_{43} - i_{32} &= 0 \\ i_{54} - i_{43} &= 0 \end{aligned}$$
Application of the voltage rule to each of the two loops gives
$$\begin{aligned} -i_{54}R_{54} - i_{43}R_{43} - i_{32}R_{32} + i_{52}R_{52} &= 0 \\ -i_{65}R_{65} - i_{52}R_{52} + i_{12}R_{12} - 200 &= 0 \end{aligned}$$
or, substituting the resistances from Fig. 12.8 and bringing constants to the right-hand side,
$$\begin{aligned} -15i_{54} - 5i_{43} - 10i_{32} + 10i_{52} &= 0 \\ -20i_{65} - 10i_{52} + 5i_{12} &= 200 \end{aligned}$$
Therefore, the problem amounts to solving the following set of six equations with six unknown currents:
$$\begin{bmatrix} 1 & 1 & 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 1 & -1 & 0 \\ 0 & 0 & -1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & -1 \\ 0 & 10 & -10 & 0 & -15 & -5 \\ 5 & -10 & 0 & -20 & 0 & 0 \end{bmatrix} \begin{Bmatrix} i_{12} \\ i_{52} \\ i_{32} \\ i_{65} \\ i_{54} \\ i_{43} \end{Bmatrix} = \begin{Bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 200 \end{Bmatrix}$$
Although impractical to solve by hand, this system is easily handled using an elimination method. Proceeding in this manner, the solution is
$$\begin{aligned} i_{12} &= 6.1538 & i_{52} &= -4.6154 & i_{32} &= -1.5385 \\ i_{65} &= -6.1538 & i_{54} &= -1.5385 & i_{43} &= -1.5385 \end{aligned}$$
Thus, with proper interpretation of the signs of the result, the circuit currents and voltages are as shown in Fig. 12.10. The advantages of using numerical algorithms and computers for problems of this type should be evident.
FIGURE 12.9 Assumed currents. (Figure labels: nodes 1 through 6; currents i12, i65, i52, i32, i54, i43.)
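The six-current system above is small enough to verify with a short script. The following Python sketch is illustrative only: `gauss_solve` is a hypothetical helper written for this example, and the node-voltage step assumes the sign convention that a current $i_{ab}$ is positive from node a to node b, so that $V_a - V_b = i_{ab} R_{ab}$. Under that assumption the voltages can be compared with the labels of Fig. 12.10.

```python
def gauss_solve(a, b):
    """Solve a x = b by Gauss elimination with partial pivoting (illustrative helper)."""
    n = len(b)
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for k in range(n):
        # The first column has its largest entry in the last row, so pivoting matters here.
        p = max(range(k, n), key=lambda r: abs(m[r][k]))
        m[k], m[p] = m[p], m[k]
        for i in range(k + 1, n):
            factor = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= factor * m[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


# Unknowns in the order used in the text: i12, i52, i32, i65, i54, i43.
A = [
    [1, 1, 1, 0, 0, 0],        # current rule, node 2
    [0, -1, 0, 1, -1, 0],      # current rule, node 5
    [0, 0, -1, 0, 0, 1],       # current rule, node 3
    [0, 0, 0, 0, 1, -1],       # current rule, node 4
    [0, 10, -10, 0, -15, -5],  # voltage rule, loop through nodes 5-4-3-2
    [5, -10, 0, -20, 0, 0],    # voltage rule, loop containing the 200-V source
]
b = [0, 0, 0, 0, 0, 200]

i12, i52, i32, i65, i54, i43 = gauss_solve(A, b)

# Node voltages from Ohm's law, starting from the known V1 = 200 V and V6 = 0 V.
V1, V6 = 200.0, 0.0
V2 = V1 - i12 * 5    # R12 = 5 ohms
V5 = V6 - i65 * 20   # R65 = 20 ohms
V3 = V2 + i32 * 10   # R32 = 10 ohms
V4 = V3 + i43 * 5    # R43 = 5 ohms

print([round(v, 4) for v in (i12, i52, i32, i65, i54, i43)])
print([round(v, 2) for v in (V2, V3, V4, V5)])
```

A negative current simply means the true flow is opposite to the direction assumed in Fig. 12.9, as the text notes.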
  • 345. 12.4 SPRING-MASS SYSTEMS (MECHANICAL/AEROSPACE ENGINEERING)
Background. Idealized spring-mass systems play an important role in mechanical and other engineering problems. Figure 12.11 shows such a system. After they are released, the masses are pulled downward by the force of gravity. Notice that the resulting displacement of each spring in Fig. 12.11b is measured along local coordinates referenced to its initial position in Fig. 12.11a.
As introduced in Chap. 1, Newton's second law can be employed in conjunction with force balances to develop a mathematical model of the system. For each mass, the second law can be expressed as
$$m \frac{d^2 x}{dt^2} = F_D - F_U \tag{12.13}$$
To simplify the analysis, we will assume that all the springs are identical and follow Hooke's law. A free-body diagram for the first mass is depicted in Fig. 12.12a. The upward force is merely a direct expression of Hooke's law:
$$F_U = k x_1 \tag{12.14}$$
The downward component consists of the two spring forces along with the action of gravity on the mass,
$$F_D = k(x_2 - x_1) + k(x_2 - x_1) + m_1 g \tag{12.15}$$
Note how the force component of the two springs is proportional to the displacement of the second mass, $x_2$, corrected for the displacement of the first mass, $x_1$.
Equations (12.14) and (12.15) can be substituted into Eq. (12.13) to give
$$m_1 \frac{d^2 x_1}{dt^2} = 2k(x_2 - x_1) + m_1 g - k x_1 \tag{12.16}$$
Thus, we have derived a second-order ordinary differential equation to describe the displacement of the first mass with respect to time. However, notice that the solution cannot be obtained because the model includes a second dependent variable, $x_2$. Consequently, free-body diagrams must be developed for the second and the third masses (Fig. 12.12b and c) that can be employed to derive
$$m_2 \frac{d^2 x_2}{dt^2} = k(x_3 - x_2) + m_2 g - 2k(x_2 - x_1) \tag{12.17}$$
and
$$m_3 \frac{d^2 x_3}{dt^2} = m_3 g - k(x_3 - x_2) \tag{12.18}$$
FIGURE 12.10 The solution for currents and voltages obtained using an elimination method. (Figure labels: V = 200, 169.23, 153.85, 146.15, 123.08, and 0; i = 6.1538 and i = 1.5385.)
  • 346. FIGURE 12.11 A system composed of three masses suspended vertically by a series of springs. (a) The system before release, that is, prior to extension or compression of the springs. (b) The system after release. Note that the positions of the masses are referenced to local coordinates with origins at their position before release.
FIGURE 12.12 Free-body diagrams for the three masses from Fig. 12.11.
  • 347. Equations (12.16), (12.17), and (12.18) form a system of three differential equations with three unknowns. With the appropriate initial conditions, they could be used to solve for the displacements of the masses as a function of time (that is, their oscillations). We will discuss numerical methods for obtaining such solutions in Part Seven. For the present, we can obtain the displacements that occur when the system eventually comes to rest, that is, to the steady state. To do this, the derivatives in Eqs. (12.16), (12.17), and (12.18) are set to zero to give
$$\begin{aligned} 3k x_1 - 2k x_2 &= m_1 g \\ -2k x_1 + 3k x_2 - k x_3 &= m_2 g \\ -k x_2 + k x_3 &= m_3 g \end{aligned}$$
or, in matrix form,
$$[K]\{X\} = \{W\}$$
where [K], called the stiffness matrix, is
$$[K] = \begin{bmatrix} 3k & -2k & 0 \\ -2k & 3k & -k \\ 0 & -k & k \end{bmatrix}$$
and {X} and {W} are the column vectors of the unknowns X and the weights mg, respectively.
Solution. At this point, numerical methods can be employed to obtain a solution. If $m_1 = 2$ kg, $m_2 = 3$ kg, $m_3 = 2.5$ kg, and the k's = 10 kg/s², use LU decomposition to solve for the displacements and generate the inverse of [K].
Substituting the model parameters with g = 9.81 gives
$$[K] = \begin{bmatrix} 30 & -20 & 0 \\ -20 & 30 & -10 \\ 0 & -10 & 10 \end{bmatrix} \qquad \{W\} = \begin{Bmatrix} 19.62 \\ 29.43 \\ 24.525 \end{Bmatrix}$$
LU decomposition can be employed to solve for $x_1 = 7.36$, $x_2 = 10.06$, and $x_3 = 12.51$. These displacements were used to construct Fig. 12.11b. The inverse of the stiffness matrix is computed as
$$[K]^{-1} = \begin{bmatrix} 0.1 & 0.1 & 0.1 \\ 0.1 & 0.15 & 0.15 \\ 0.1 & 0.15 & 0.25 \end{bmatrix}$$
Each element of this matrix $k^{-1}_{ji}$ tells us the displacement of mass i due to a unit force imposed on mass j. Thus, the values of 0.1 in column 1 tell us that a downward unit load to the first mass will displace all of the masses 0.1 m downward. The other elements can be interpreted in a similar fashion. Therefore, the inverse of the stiffness matrix provides a fundamental summary of how the system's components respond to externally applied forces.
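The steady-state displacements and the inverse of the stiffness matrix can be reproduced with a compact LU routine. The Python sketch below is one possible arrangement, offered as an assumption-laden illustration rather than the book's program: `lu_factor`, `lu_solve`, and the variable names are hypothetical, and the inverse is built by solving for one unit force at a time.

```python
def lu_factor(a):
    """Factor a copy of a into L and U (stored together) with partial pivoting."""
    n = len(a)
    lu = [[float(v) for v in row] for row in a]
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(lu[r][k]))
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]  # multiplier, stored in the L part
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Forward and back substitution for one right-hand side."""
    n = len(b)
    y = [float(b[p]) for p in perm]
    for i in range(n):
        y[i] -= sum(lu[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - sum(lu[i][j] * x[j] for j in range(i + 1, n))) / lu[i][i]
    return x


k = 10.0                 # spring constant, kg/s^2
g = 9.81                 # gravitational acceleration, m/s^2
masses = [2.0, 3.0, 2.5]  # m1, m2, m3 in kg

K = [
    [3 * k, -2 * k, 0.0],
    [-2 * k, 3 * k, -k],
    [0.0, -k, k],
]
W = [m * g for m in masses]  # weights on the right-hand side

lu, perm = lu_factor(K)
x = lu_solve(lu, perm, W)

# Column j of the inverse is the displacement response to a unit force on mass j + 1.
n = len(K)
cols = [lu_solve(lu, perm, [1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
K_inv = [[cols[j][i] for j in range(n)] for i in range(n)]

print("displacements (m):", [round(v, 2) for v in x])
for row in K_inv:
    print([round(v, 2) for v in row])
```

Because the inverse is available, displacements for other sets of masses follow from a matrix-vector product with the new weight vector, with no further factorization.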
  • 348. PROBLEMS
Chemical/Bio Engineering
12.1 Perform the same computation as in Sec. 12.1, but change $c_{01}$ to 20 and $c_{03}$ to 6. Also change the following flows: $Q_{01} = 6$, $Q_{12} = 4$, $Q_{24} = 2$, and $Q_{44} = 12$.
12.2 If the input to reactor 3 in Sec. 12.1 is decreased 25 percent, use the matrix inverse to compute the percent change in the concentration of reactors 2 and 4.
12.3 Because the system shown in Fig. 12.3 is at steady state, what can be said regarding the four flows: $Q_{01}$, $Q_{03}$, $Q_{44}$, and $Q_{55}$?
12.4 Recompute the concentrations for the five reactors shown in Fig. 12.3, if the flows are changed to
$$\begin{aligned} Q_{01} &= 5 & Q_{31} &= 3 & Q_{25} &= 2 & Q_{23} &= 2 \\ Q_{15} &= 4 & Q_{55} &= 3 & Q_{54} &= 3 & Q_{34} &= 7 \\ Q_{12} &= 4 & Q_{03} &= 8 & Q_{24} &= 0 & Q_{44} &= 10 \end{aligned}$$
12.5 Solve the same system as specified in Prob. 12.4, but set $Q_{12} = Q_{54} = 0$ and $Q_{15} = Q_{34} = 3$. Assume that the inflows ($Q_{01}$, $Q_{03}$) and outflows ($Q_{44}$, $Q_{55}$) are the same. Use conservation of flow to recompute the values for the other flows.
12.6 Figure P12.6 shows three reactors linked by pipes. As indicated, the rate of transfer of chemicals through each pipe is equal to a flow rate (Q, with units of cubic meters per second) multiplied by the concentration of the reactor from which the flow originates (c, with units of milligrams per cubic meter). If the system is at a steady state, the transfer into each reactor will balance the transfer out. Develop mass-balance equations for the reactors and solve the three simultaneous linear algebraic equations for their concentrations.
FIGURE P12.6 Three reactors linked by pipes. The rate of mass transfer through each pipe is equal to the product of flow Q and concentration c of the reactor from which the flow originates. (Figure labels: Q33 = 120, Q13 = 40, Q12 = 80, Q23 = 60, Q21 = 20; loadings of 400 mg/s and 200 mg/s.)
12.7 Employing the same basic approach as in Sec. 12.1, determine the concentration of chloride in each of the Great Lakes using the information shown in Fig. P12.7.
12.8 The Lower Colorado River consists of a series of four reservoirs as shown in Fig. P12.8. Mass balances can be written for each reservoir and the following set of simultaneous linear algebraic equations results:
$$\begin{bmatrix} 13.442 & 0 & 0 & 0 \\ -13.442 & 12.252 & 0 & 0 \\ 0 & -12.252 & 12.377 & 0 \\ 0 & 0 & -12.377 & 11.797 \end{bmatrix} \begin{Bmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{Bmatrix} = \begin{Bmatrix} 750.5 \\ 300 \\ 102 \\ 30 \end{Bmatrix}$$
where the right-hand-side vector consists of the loadings of chloride to each of the four lakes and $c_1$, $c_2$, $c_3$, and $c_4$ = the resulting chloride concentrations for Lakes Powell, Mead, Mohave, and Havasu, respectively.
(a) Use the matrix inverse to solve for the concentrations in each of the four lakes.
(b) How much must the loading to Lake Powell be reduced in order for the chloride concentration of Lake Havasu to be 75?
(c) Using the column-sum norm, compute the condition number and how many suspect digits would be generated by solving this system.
12.9 A stage extraction process is depicted in Fig. P12.9. In such systems, a stream containing a weight fraction $Y_{in}$ of a chemical enters from the left at a mass flow rate of $F_1$. Simultaneously, a solvent carrying a weight fraction $X_{in}$ of the same chemical enters from the right at a flow rate of $F_2$. Thus, for stage i, a mass balance can be represented as
$$F_1 Y_{i-1} + F_2 X_{i+1} = F_1 Y_i + F_2 X_i \tag{P12.9.1}$$
At each stage, an equilibrium is assumed to be established between $Y_i$ and $X_i$ as in
$$K = \frac{X_i}{Y_i} \tag{P12.9.2}$$
where K is called a distribution coefficient. Equation (P12.9.2) can be solved for $X_i$ and substituted into Eq. (P12.9.1) to yield
$$Y_{i-1} - \left(1 + \frac{F_2}{F_1} K\right) Y_i + \left(\frac{F_2}{F_1} K\right) Y_{i+1} = 0 \tag{P12.9.3}$$
If $F_1 = 400$ kg/h, $Y_{in} = 0.1$, $F_2 = 800$ kg/h, $X_{in} = 0$, and $K = 5$, determine the values of $Y_{out}$ and $X_{out}$ if a five-stage reactor is used. Note that Eq. (P12.9.3) must be modified to account for the inflow weight fractions when applied to the first and last stages.
  • 349. FIGURE P12.7 A chloride balance for the Great Lakes. Numbered arrows are direct inputs. (Figure labels: QSH = 67, QMH = 36, QHE = 161, QEO = 182, QOO = 212; direct-input values shown: 3850, 4720, 740, 180, 710.)
FIGURE P12.8 The Lower Colorado River. (Figure labels: Upper Colorado River; Lake Powell, Lake Mead, Lake Mohave, Lake Havasu; c1 through c4.)
FIGURE P12.9 A stage extraction process.
12.10 An irreversible, first-order reaction takes place in four well-mixed reactors (Fig. P12.10),
$$A \xrightarrow{k} B$$
Thus, the rate at which A is transformed to B can be represented as
$$R_{ab} = kVc$$
The reactors have different volumes, and because they are operated at different temperatures, each has a different reaction rate:
Reactor   V, L   k, h⁻¹
   1       25     0.05
   2       75     0.1
   3      100     0.5
   4       25     0.1
Determine the concentration of A and B in each of the reactors at steady state.
FIGURE P12.10 (Figure labels: Qin = 10, cA,in = 1, Q32 = 5, Q43 = 3.)
12.11 A peristaltic pump delivers a unit flow ($Q_1$) of a highly viscous fluid. The network is depicted in Fig. P12.11. Every pipe section has the same length and diameter. The mass and mechanical energy balance can be simplified to obtain the flows in every pipe. Solve the following system of equations to obtain the flow in every pipe.
$$\begin{aligned} Q_3 + 2Q_4 - 2Q_2 &= 0 \\ Q_5 + 2Q_6 - 2Q_4 &= 0 \\ 3Q_7 - 2Q_6 &= 0 \\ Q_1 &= Q_2 + Q_3 \\ Q_3 &= Q_4 + Q_5 \\ Q_5 &= Q_6 + Q_7 \end{aligned}$$
FIGURE P12.11 (Figure labels: flows Q1 through Q7.)
  • 350. 12.12 Figure P12.12 depicts a chemical exchange process consisting of a series of reactors in which a gas flowing from left to right is passed over a liquid flowing from right to left. The transfer of a chemical from the gas into the liquid occurs at a rate that is proportional to the difference between the gas and liquid concentrations in each reactor. At steady state, a mass balance for the first reactor can be written for the gas as
$$Q_G c_{G0} - Q_G c_{G1} + D(c_{L1} - c_{G1}) = 0$$
and for the liquid as
$$Q_L c_{L2} - Q_L c_{L1} + D(c_{G1} - c_{L1}) = 0$$
where $Q_G$ and $Q_L$ are the gas and liquid flow rates, respectively, and D = the gas-liquid exchange rate. Similar balances can be written for the other reactors. Solve for the concentrations given the following values: $Q_G = 2$, $Q_L = 1$, $D = 0.8$, $c_{G0} = 100$, $c_{L6} = 20$.
FIGURE P12.12 (Figure labels: gas concentrations cG0 through cG5, liquid concentrations cL1 through cL6, flows QG and QL, exchange rate D.)
Civil/Environmental Engineering
12.13 A civil engineer involved in construction requires 4800, 5810, and 5690 m³ of sand, fine gravel, and coarse gravel, respectively, for a building project. There are three pits from which these materials can be obtained. The composition of these pits is
         Sand, %   Fine Gravel, %   Coarse Gravel, %
Pit 1      52           30                18
Pit 2      20           50                30
Pit 3      25           20                55
How many cubic meters must be hauled from each pit in order to meet the engineer's needs?
  • 351. 12.14 Perform the same computation as in Sec. 12.2, but for the truss depicted in Fig. P12.14.
12.15 Perform the same computation as in Sec. 12.2, but for the truss depicted in Fig. P12.15.
12.16 Calculate the forces and reactions for the truss in Fig. 12.4 if a downward force of 2500 kg and a horizontal force to the right of 2000 kg are applied at node 1.
12.17 In the example for Fig. 12.4, where a 1000-lb downward force is applied at node 1, the external reactions $V_2$ and $V_3$ were calculated. But if the lengths of the truss members had been given, we could have calculated $V_2$ and $V_3$ by utilizing the fact that $V_2 + V_3$ must equal 1000 and by summing moments around node 2. However, because we do know $V_2$ and $V_3$, we can work backward to solve for the lengths of the truss members. Note that because there are three unknown lengths and only two equations, we can solve for only the relationship between lengths. Solve for this relationship.
12.18 Employing the same methods as used to analyze Fig. 12.4, determine the forces and reactions for the truss shown in Fig. P12.18.
12.19 Solve for the forces and reaction for the truss in Fig. P12.19. Determine the matrix inverse for the system. Does the vertical-member force in the middle member seem reasonable? Why?
FIGURE P12.14
FIGURE P12.15
FIGURE P12.18
FIGURE P12.19
12.20 As the name implies, indoor air pollution deals with air contamination in enclosed spaces such as homes, offices, work areas, etc. Suppose that you are designing a ventilation system for a restaurant as shown in Fig. P12.20. The restaurant serving area consists of two square rooms and one elongated room. Room 1 and room 3 have sources of carbon monoxide from smokers and a faulty grill, respectively. Steady-state mass balances can be written for each room. For example, for the smoking section (room 1), the balance can be written as

$$0 = W_{\text{smoker}} + Q_a c_a - Q_a c_1 + E_{13}(c_3 - c_1)$$
(load) + (inflow) − (outflow) + (mixing)

or, substituting the parameters shown in Fig. P12.20,

$$225c_1 - 25c_3 = 1400$$

Similar balances can be written for the other rooms.
(a) Solve for the steady-state concentration of carbon monoxide in each room.
(b) Determine what percent of the carbon monoxide in the kids' section is due to (i) the smokers, (ii) the grill, and (iii) the air in the intake vents.
(c) If the smoker and grill loads are increased to 2000 and 5000 mg/hr, respectively, use the matrix inverse to determine the increase in the concentration in the kids' section.
(d) How does the concentration in the kids' area change if a screen is constructed so that the mixing between areas 2 and 4 is decreased to 5 m^3/hr?

FIGURE P12.20 Overhead view of rooms in a restaurant. The one-way arrows represent volumetric airflows, whereas the two-way arrows represent diffusive mixing. The smoker and grill loads add carbon monoxide mass to the system but negligible airflow. (Labels: rooms 1 (smoking section), 2 (kids' section), 3, and 4; $Q_a = 200$ m^3/hr, $c_a = 2$ mg/m^3; $Q_b = 50$ m^3/hr, $c_b = 2$ mg/m^3; $Q_c = 150$ m^3/hr; $Q_d = 100$ m^3/hr; smoker load 1000 mg/hr; grill load 2000 mg/hr; mixing flows of 25, 25, and 50 m^3/hr.)

12.21 An upward force of 20 kN is applied at the top of a tripod as depicted in Fig. P12.21. Determine the forces in the legs of the tripod.

FIGURE P12.21 (Labels: points A, B, C, D; axes x and y; dimensions 0.6 m, 2.4 m, 0.8 m, 0.8 m, 1 m.)

12.22 A truss is loaded as shown in Fig. P12.22. Using the following set of equations, solve for the 10 unknowns: AB, BC, AD, BD, CD, DE, CE, $A_x$, $A_y$, and $E_y$.

$$A_x + AD = 0$$
$$A_y + AB = 0$$
$$74 + BC + (3/5)BD = 0$$
$$-AB - (4/5)BD = 0$$
$$-BC + (3/5)CE = 0$$
$$-24 - CD - (4/5)CE = 0$$
$$-AD + DE - (3/5)BD = 0$$
$$CD + (4/5)BD = 0$$
$$-DE - (3/5)CE = 0$$
$$E_y + (4/5)CE = 0$$

FIGURE P12.22 (Labels: nodes A, B, C, D, E; dimensions 3 m, 3 m, 4 m; loads 54 kN and 24 kN.)

Electrical Engineering

12.23 Perform the same computation as in Sec. 12.3, but for the circuit depicted in Fig. P12.23.

12.24 Perform the same computation as in Sec. 12.3, but for the circuit depicted in Fig. P12.24.

12.25 Solve the circuit in Fig. P12.25 for the currents in each wire. Use Gauss elimination with pivoting.

12.26 An electrical engineer supervises the production of three types of electrical components. Three kinds of material (metal, plastic, and rubber) are required for production. The amounts needed to produce each component are

Component | Metal, g/component | Plastic, g/component | Rubber, g/component
1 | 15 | 0.30 | 1.0
2 | 17 | 0.40 | 1.2
3 | 19 | 0.55 | 1.5

If totals of 3.89, 0.095, and 0.282 kg of metal, plastic, and rubber, respectively, are available each day, how many components can be produced per day?

12.27 Determine the currents for the circuit in Fig. P12.27.

12.28 Determine the currents for the circuit in Fig. P12.28.

12.29 The following system of equations was generated by applying the mesh current law to the circuit in Fig. P12.29:

$$55I_1 - 25I_4 = -200$$
$$-37I_3 - 4I_4 = -250$$
$$-25I_1 - 4I_3 + 29I_4 = 100$$

Solve for $I_1$, $I_3$, and $I_4$.

FIGURE P12.23 (Labels: nodes 1 to 6; $V_1 = 200$ volts, $V_6 = 0$ volts.)
FIGURE P12.24 (Labels: nodes 1 to 6; $V_1 = 10$ volts, $V_6 = 200$ volts.)
FIGURE P12.25 (Labels: nodes 1 to 9; $V_1 = 110$, $V_2 = 40$.)
12.30 The following system of equations was generated by applying the mesh current law to the circuit in Fig. P12.30:

$$60I_1 - 40I_2 = 200$$
$$-40I_1 + 150I_2 - 100I_3 = 0$$
$$-100I_2 + 130I_3 = 230$$

Solve for $I_1$, $I_2$, and $I_3$.

FIGURE P12.27 (Sources: 50 V and 80 V.)
FIGURE P12.28 (Source: 20 V.)
FIGURE P12.29 (Sources: 100 V and 10 A; mesh currents $I_1$ to $I_4$.)
FIGURE P12.30 (Sources: 200 V, 80 V, and 10 A; mesh currents $I_1$ to $I_4$.)

Mechanical/Aerospace Engineering

12.31 Perform the same computation as in Sec. 12.4, but add a third spring between masses 1 and 2 and triple k for all springs.

12.32 Perform the same computation as in Sec. 12.4, but change the masses from 2, 3, and 2.5 kg to 10, 3.5, and 2 kg, respectively.

12.33 Idealized spring-mass systems have numerous applications throughout engineering. Figure P12.33 shows an arrangement of four springs in series being depressed with a force of 2000 kg. At equilibrium, force-balance equations can be developed defining the interrelationships between the springs,

$$k_2(x_2 - x_1) = k_1 x_1$$
$$k_3(x_3 - x_2) = k_2(x_2 - x_1)$$
$$k_4(x_4 - x_3) = k_3(x_3 - x_2)$$
$$F = k_4(x_4 - x_3)$$

where the k's are spring constants. If $k_1$ through $k_4$ are 150, 50, 75, and 225 N/m, respectively, compute the x's.

FIGURE P12.33 (Labels: force F; springs $k_1$ to $k_4$; displacements $x_1$ to $x_4$.)

12.34 Three blocks are connected by a weightless cord and rest on an inclined plane (Fig. P12.34a). Employing a procedure similar to the one used in the analysis of the falling parachutists in Example 9.11 yields the following set of simultaneous equations (free-body diagrams are shown in Fig. P12.34b):

$$100a + T = 519.72$$
$$50a - T + R = 216.55$$
$$25a - R = 108.28$$

Solve for acceleration a and the tensions T and R in the two ropes.

FIGURE P12.34 (a) Three blocks of 100, 50, and 25 kg on a 45° incline, with acceleration a. (b) Free-body diagrams. (Values shown: 100 × 9.8 = 980, components 692.96, friction 692.96 × 0.25 = 173.24; 50 × 9.8 = 490, components 346.48, friction 346.48 × 0.375 = 129.93; 25 × 9.8 = 245, components 173.24, friction 173.24 × 0.375 = 64.97; tensions T and R.)

12.35 Perform a computation similar to that called for in Prob. 12.34, but for the system shown in Fig. P12.35.

12.36 Perform the same computation as in Prob. 12.34, but for the system depicted in Fig. P12.36 (angles are 45°).

FIGURE P12.35 (Labels: masses 40 kg, 50 kg, 10 kg; angles 30° and 60°; friction = 0.5, 0.3, 0.2.)
FIGURE P12.36 (Labels: masses 8 kg, 10 kg, 15 kg, 5 kg; friction = 0.8, 0.2.)

12.37 Consider the three mass-four spring system in Fig. P12.37. Determining the equations of motion from $\sum F_x = ma$ for each mass using its free-body diagram results in the following differential equations:

$$\ddot{x}_1 + \left(\frac{k_1 + k_2}{m_1}\right)x_1 - \left(\frac{k_2}{m_1}\right)x_2 = 0$$
$$\ddot{x}_2 - \left(\frac{k_2}{m_2}\right)x_1 + \left(\frac{k_2 + k_3}{m_2}\right)x_2 - \left(\frac{k_3}{m_2}\right)x_3 = 0$$
$$\ddot{x}_3 - \left(\frac{k_3}{m_3}\right)x_2 + \left(\frac{k_3 + k_4}{m_3}\right)x_3 = 0$$

where $k_1 = k_4 = 10$ N/m, $k_2 = k_3 = 30$ N/m, and $m_1 = m_2 = m_3 = 2$ kg. Write the three equations in matrix form:

0 = [Acceleration vector] + [k/m matrix][displacement vector x]

At a specific time when $x_1 = 0.05$ m, $x_2 = 0.04$ m, and $x_3 = 0.03$ m, this forms a tridiagonal matrix. Solve for the acceleration of each mass.

FIGURE P12.37 (Labels: masses $m_1$, $m_2$, $m_3$; springs $k_1$ to $k_4$; displacements $x_1$, $x_2$, $x_3$.)

12.38 Linear algebraic equations can arise in the solution of differential equations. For example, the following differential equation derives from a heat balance for a long, thin rod (Fig. P12.38):

$$\frac{d^2T}{dx^2} + h'(T_a - T) = 0 \tag{P12.38.1}$$

where T = temperature (°C), x = distance along the rod (m), h' = a heat transfer coefficient between the rod and the ambient air (m^-2), and $T_a$ = the temperature of the surrounding air (°C). This equation can be transformed into a set of linear algebraic equations by using a finite divided difference approximation for the second derivative (recall Section 4.1.3),

$$\frac{d^2T}{dx^2} = \frac{T_{i+1} - 2T_i + T_{i-1}}{\Delta x^2}$$

where $T_i$ designates the temperature at node i. This approximation can be substituted into Eq. (P12.38.1) to give

$$-T_{i-1} + (2 + h'\Delta x^2)T_i - T_{i+1} = h'\Delta x^2 T_a$$

This equation can be written for each of the interior nodes of the rod, resulting in a tridiagonal system of equations. The first and last nodes at the rod's ends are fixed by boundary conditions.
(a) Develop an analytical solution for Eq. (P12.38.1) for a 10-m rod with $T_a = 20$, $T(x = 0) = 40$, $T(x = 10) = 200$, and $h' = 0.02$.
(b) Develop a numerical solution for the same parameter values employed in (a) using a finite-difference solution with four interior nodes as shown in Fig. P12.38 ($\Delta x = 2$ m).

FIGURE P12.38 A noninsulated uniform rod positioned between two walls of constant but different temperature. The finite difference representation employs four interior nodes. (Labels: $T_0 = 40$ at x = 0; $T_5 = 200$ at x = 10; $T_a = 10$; node spacing $\Delta x$.)

12.39 The steady-state distribution of temperature on a heated plate can be modeled by the Laplace equation,

$$0 = \frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2}$$

If the plate is represented by a series of nodes (Fig. P12.39), centered finite-divided differences can be substituted for the second derivatives, which results in a system of linear algebraic equations. Use the Gauss-Seidel method to solve for the temperatures of the nodes in Fig. P12.39.
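The finite-difference formulation in Prob. 12.38 can be sketched in Python as follows. This is only an illustrative outline, not the book's solution: the function names (`thomas`, `rod_temperatures`, `rod_analytical`) are hypothetical, the tridiagonal solver is a plain Thomas algorithm without pivoting, and the closed-form expression is the standard hyperbolic-sine solution of Eq. (P12.38.1) for fixed end temperatures. The ambient temperature is passed in as a parameter because the problem text and the figure labels give different values.

```python
import math

def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system (Thomas algorithm, no pivoting).
    lower[0] and upper[-1] are ignored."""
    n = len(diag)
    d, r = list(diag), list(rhs)
    for i in range(1, n):                 # forward elimination
        factor = lower[i] / d[i - 1]
        d[i] -= factor * upper[i - 1]
        r[i] -= factor * r[i - 1]
    x = [0.0] * n
    x[-1] = r[-1] / d[-1]
    for i in range(n - 2, -1, -1):        # back substitution
        x[i] = (r[i] - upper[i] * x[i + 1]) / d[i]
    return x

def rod_temperatures(h, Ta, T_left, T_right, length, n_interior):
    """Interior-node temperatures from
    -T[i-1] + (2 + h*dx^2) T[i] - T[i+1] = h*dx^2*Ta."""
    dx = length / (n_interior + 1)
    k = h * dx ** 2
    lower = [-1.0] * n_interior
    diag = [2.0 + k] * n_interior
    upper = [-1.0] * n_interior
    rhs = [k * Ta] * n_interior
    rhs[0] += T_left                      # boundary values move to the right-hand side
    rhs[-1] += T_right
    return thomas(lower, diag, upper, rhs)

def rod_analytical(x, h, Ta, T_left, T_right, length):
    """Closed-form solution of T'' + h (Ta - T) = 0 with fixed end temperatures."""
    lam = math.sqrt(h)
    num = ((T_left - Ta) * math.sinh(lam * (length - x))
           + (T_right - Ta) * math.sinh(lam * x))
    return Ta + num / math.sinh(lam * length)

if __name__ == "__main__":
    T = rod_temperatures(h=0.02, Ta=20.0, T_left=40.0, T_right=200.0,
                         length=10.0, n_interior=4)
    for i, Ti in enumerate(T, start=1):
        x = 2.0 * i
        print(x, Ti, rod_analytical(x, 0.02, 20.0, 40.0, 200.0, 10.0))
```

Storing only the three diagonals reflects the point made in the problem statement that the system is tridiagonal; a general dense solver would also work but would store mostly zeros.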
FIGURE P12.39 (Labels: interior nodes $T_{11}$, $T_{12}$, $T_{21}$, $T_{22}$; boundary temperatures 200°C, 75°C, 25°C, and 0°C, each shown twice.)

12.40 A rod on a ball and socket joint is attached to cables A and B, as in Fig. P12.40.
(a) If a 50-N force is exerted on the massless rod at G, what is the tensile force at cables A and B?
(b) Solve for the reaction forces at the base of the rod. Call the base point P.

FIGURE P12.40 (Labels: ball and socket; axes x, y, z; 50 N load; cables A and B; dimensions 2 m, 2 m, 2 m, 1 m, 2 m, 1 m.)
PT3.4 TRADE-OFFS

Table PT3.2 provides a summary of the trade-offs involved in solving simultaneous linear algebraic equations. Two methods, graphical and Cramer's rule, are limited to small (≤ 3) numbers of equations and thus have little utility for practical problem solving. However, these techniques are useful didactic tools for understanding the behavior of linear systems in general.

The numerical methods themselves are divided into two general categories: exact and approximate methods. As the name implies, the former are intended to yield exact answers. However, because they are affected by round-off errors, they sometimes yield imprecise results. The magnitude of the round-off error varies from system to system and is dependent on a number of factors. These include the system's dimensions, its condition, and whether the matrix of coefficients is sparse or full. In addition, computer precision will affect round-off error.

It is recommended that a pivoting strategy be employed in any computer program implementing exact elimination methods. The inclusion of such a strategy minimizes round-off error and avoids problems such as division by zero. All other things being equal, LU decomposition-based algorithms are the methods of choice because of their efficiency and flexibility.

TABLE PT3.2 Comparison of the characteristics of alternative methods for finding solutions of simultaneous linear algebraic equations.
Method | Stability | Precision | Breadth of Application | Programming Effort | Comments
Graphical | - | Poor | Limited | - | May take more time than the numerical method, but can be useful for visualization
Cramer's rule | - | Affected by round-off error | Limited | - | Excessive computational effort required for more than three equations
Gauss elimination (with partial pivoting) | - | Affected by round-off error | General | Moderate |
LU decomposition | - | Affected by round-off error | General | Moderate | Preferred elimination method; allows computation of matrix inverse
Gauss-Seidel | May not converge if not diagonally dominant | Excellent | Appropriate only for diagonally dominant systems | Easy |

EPILOGUE: PART THREE
Although elimination methods have great utility, their use of the entire matrix of coefficients can be somewhat limiting when dealing with very large, sparse systems. This is because large portions of computer memory would be devoted to storage of meaningless zeros. For banded systems, techniques are available to implement elimination methods without having to store the entire coefficient matrix.

The approximate technique described in this book is called the Gauss-Seidel method. It differs from the exact techniques in that it employs an iterative scheme to obtain progressively closer estimates of the solution. Thus, the effect of round-off is a moot point with the Gauss-Seidel method because the iterations can be continued as long as is necessary to obtain the desired precision. In addition, versions of the Gauss-Seidel method can be developed that use computer storage efficiently for sparse systems. Consequently, the Gauss-Seidel technique has utility for large systems of equations where storage requirements would pose significant problems for the exact techniques.

The disadvantage of the Gauss-Seidel method is that it does not always converge, or it sometimes converges slowly on the true solution. It is strictly reliable only for those systems that are diagonally dominant. However, relaxation methods are available that sometimes offset these disadvantages. In addition, because many sets of linear algebraic equations originating from physical systems exhibit diagonal dominance, the Gauss-Seidel method has great utility for engineering problem solving.

In summary, a variety of factors will bear on your choice of a technique for a particular problem involving linear algebraic equations. However, as outlined above, the size and sparseness of the system are particularly important factors in determining your choice.
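The points made above about the Gauss-Seidel method (iteration to a chosen precision, reliance on diagonal dominance, and the option of relaxation) can be illustrated with a short Python sketch. The function names, the relaxation parameter `lam`, and the 3 × 3 test system are assumptions chosen for illustration; the system was picked only because it is diagonally dominant and has the easily checked solution (3, −2.5, 7).

```python
def is_diagonally_dominant(A):
    """True if every diagonal element exceeds the sum of the
    absolute off-diagonal elements in its row."""
    n = len(A)
    return all(
        abs(A[i][i]) > sum(abs(A[i][j]) for j in range(n) if j != i)
        for i in range(n)
    )

def gauss_seidel(A, b, es=1e-6, max_iter=200, lam=1.0):
    """Gauss-Seidel iteration with optional relaxation (lam = 1 means none).
    Stops when the largest approximate relative error, in percent,
    drops below es. Returns the solution and the iteration count."""
    n = len(b)
    x = [0.0] * n
    for iteration in range(1, max_iter + 1):
        max_ea = 0.0
        for i in range(n):
            old = x[i]
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            new = (b[i] - s) / A[i][i]
            x[i] = lam * new + (1.0 - lam) * old
            if x[i] != 0.0:
                ea = abs((x[i] - old) / x[i]) * 100.0
            else:
                ea = 0.0 if old == 0.0 else float("inf")
            max_ea = max(max_ea, ea)
        if max_ea < es:
            return x, iteration
    raise RuntimeError("Gauss-Seidel did not converge within max_iter iterations")

if __name__ == "__main__":
    # Illustrative diagonally dominant system (an assumption, not from this section).
    A = [[3.0, -0.1, -0.2],
         [0.1, 7.0, -0.3],
         [0.3, -0.2, 10.0]]
    b = [7.85, -19.3, 71.4]
    print(is_diagonally_dominant(A))
    print(gauss_seidel(A, b))
```

Only one row of the coefficient matrix is touched at a time, which is why variants of this loop can be written to skip zero entries for sparse systems.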
PT3.5 IMPORTANT RELATIONSHIPS AND FORMULAS

Every part of this book includes a section that summarizes important formulas. Although Part Three does not really deal with single formulas, we have used Table PT3.3 to summarize the algorithms that were covered. The table provides an overview that should be helpful for review and in elucidating the major differences between the methods.

PT3.6 ADVANCED METHODS AND ADDITIONAL REFERENCES

General references on the solution of simultaneous linear equations can be found in Fadeev and Fadeeva (1963), Stewart (1973), Varga (1962), and Young (1971). Ralston and Rabinowitz (1978) provide a general summary.

Many advanced techniques are available to increase the savings in time and/or space when solving linear algebraic equations. Most of these focus on exploiting properties of the equations such as symmetry and bandedness. In particular, algorithms are available to operate on sparse matrices to convert them to a minimum banded format. Jacobs (1977) and Tewarson (1973) include information on this area. Once they are in a minimum banded format, there are a variety of efficient solution strategies that are employed, such as the active column storage approach of Bathe and Wilson (1976).

Aside from n × n sets of equations, there are other systems where the number of equations, m, and number of unknowns, n, are not equal. Systems where m < n are called underdetermined. In such cases, there can be either no solution or else more than one. Systems where m > n are called overdetermined. For such situations, there is in general no exact solution. However, it is often possible to develop a compromise solution that attempts to determine answers that come "closest" to satisfying all the equations simultaneously. A common approach is to solve the equation in a "least-squares" sense (Lawson and Hanson, 1974; Wilkinson and Reinsch, 1971). Alternatively, linear programming methods can be used where the equations are solved in an "optimal" sense by minimizing some objective function (Dantzig, 1963; Luenberger, 1984; and Rabinowitz, 1968). We describe this approach in detail in Chap. 15.

TABLE PT3.3 Summary of important information presented in Part Three.

Method: Gauss elimination
Procedure:
$$\left[\begin{array}{ccc|c} a_{11} & a_{12} & a_{13} & c_1 \\ a_{21} & a_{22} & a_{23} & c_2 \\ a_{31} & a_{32} & a_{33} & c_3 \end{array}\right] \Rightarrow \left[\begin{array}{ccc|c} a_{11} & a_{12} & a_{13} & c_1 \\ & a'_{22} & a'_{23} & c'_2 \\ & & a''_{33} & c''_3 \end{array}\right] \Rightarrow \begin{array}{l} x_3 = c''_3 / a''_{33} \\ x_2 = (c'_2 - a'_{23}x_3)/a'_{22} \\ x_1 = (c_1 - a_{12}x_2 - a_{13}x_3)/a_{11} \end{array}$$
Potential problems: ill conditioning, round-off, division by zero.
Remedies: higher precision, partial pivoting.

Method: LU decomposition
Procedure (decomposition, forward substitution, back substitution):
$$\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \Rightarrow \begin{bmatrix} 1 & 0 & 0 \\ l_{21} & 1 & 0 \\ l_{31} & l_{32} & 1 \end{bmatrix} \begin{Bmatrix} d_1 \\ d_2 \\ d_3 \end{Bmatrix} = \begin{Bmatrix} c_1 \\ c_2 \\ c_3 \end{Bmatrix} \Rightarrow \begin{bmatrix} u_{11} & u_{12} & u_{13} \\ 0 & u_{22} & u_{23} \\ 0 & 0 & u_{33} \end{bmatrix} \begin{Bmatrix} x_1 \\ x_2 \\ x_3 \end{Bmatrix} = \begin{Bmatrix} d_1 \\ d_2 \\ d_3 \end{Bmatrix} \Rightarrow \begin{Bmatrix} x_1 \\ x_2 \\ x_3 \end{Bmatrix}$$
Potential problems: ill conditioning, round-off, division by zero.
Remedies: higher precision, partial pivoting.

Method: Gauss-Seidel method
Procedure:
$$x_1^{i} = (c_1 - a_{12}x_2^{i-1} - a_{13}x_3^{i-1})/a_{11}$$
$$x_2^{i} = (c_2 - a_{21}x_1^{i} - a_{23}x_3^{i-1})/a_{22}$$
$$x_3^{i} = (c_3 - a_{31}x_1^{i} - a_{32}x_2^{i})/a_{33}$$
Continue iteratively until
$$\left|\frac{x_i^{i} - x_i^{i-1}}{x_i^{i}}\right| 100\% < \varepsilon_s \quad \text{for all } x_i\text{'s}$$
Potential problems: divergent or converges slowly.
Remedies: diagonal dominance, relaxation.
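As a companion to the Gauss elimination entry in Table PT3.3, the following Python sketch performs forward elimination with partial pivoting followed by back substitution. It is a minimal illustration under stated assumptions: the function name `gauss_eliminate` is hypothetical, no scaling is applied, and the test system is invented so that its first pivot is zero, which is exactly the division-by-zero case that pivoting is meant to avoid.

```python
def gauss_eliminate(A, c):
    """Solve [A]{x} = {c} by Gauss elimination with partial pivoting.
    A is a list of rows; the inputs are not modified."""
    n = len(c)
    # Work on an augmented copy [A | c].
    M = [[float(v) for v in row] + [float(ci)] for row, ci in zip(A, c)]

    # Forward elimination.
    for k in range(n - 1):
        # Partial pivoting: bring the largest |element| in column k to row k.
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        if M[p][k] == 0.0:
            raise ValueError("matrix is singular")
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            factor = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= factor * M[k][j]

    if M[n - 1][n - 1] == 0.0:
        raise ValueError("matrix is singular")

    # Back substitution, starting from the last unknown.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

if __name__ == "__main__":
    # Invented system with a zero in the first pivot position.
    A = [[0.0, 2.0, 3.0],
         [4.0, 6.0, 7.0],
         [2.0, 1.0, 6.0]]
    c = [13.0, 37.0, 22.0]
    print(gauss_eliminate(A, c))
```

Without the row swap, the first elimination step would divide by the zero in position (1, 1); with it, the routine proceeds normally.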
OPTIMIZATION

PT4.1 MOTIVATION

Root location (Part Two) and optimization are related in the sense that both involve guessing and searching for a point on a function. The fundamental difference between the two types of problems is illustrated in Fig. PT4.1. Root location involves searching for zeros of a function or functions. In contrast, optimization involves searching for either the minimum or the maximum.

The optimum is the point where the curve is flat. In mathematical terms, this corresponds to the x value where the derivative $f'(x)$ is equal to zero. Additionally, the second derivative, $f''(x)$, indicates whether the optimum is a minimum or a maximum: if $f''(x) < 0$, the point is a maximum; if $f''(x) > 0$, the point is a minimum.

Now, understanding the relationship between roots and optima would suggest a possible strategy for finding the latter. That is, you can differentiate the function and locate the root (that is, the zero) of the new function. In fact, some optimization methods seek to find an optimum by solving the root problem $f'(x) = 0$. It should be noted that such searches are often complicated because $f'(x)$ is not available analytically. Thus, one must sometimes use finite-difference approximations to estimate the derivative.

Beyond viewing optimization as a roots problem, it should be noted that the task of locating optima is aided by some extra mathematical structure that is not part of simple root finding. This tends to make optimization a more tractable task, particularly for multidimensional cases.

FIGURE PT4.1 A function of a single variable illustrating the difference between roots and optima. (Labels: roots, where $f(x) = 0$; maximum, where $f'(x) = 0$ and $f''(x) < 0$; minimum, where $f'(x) = 0$ and $f''(x) > 0$.)
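The strategy described above, treating an optimum as a root of $f'(x)$ and estimating the derivatives by finite differences, can be sketched in Python. Everything here is an illustrative assumption rather than a method from the text: the helper names, the step size `h`, the use of bisection as the root finder, and the test function $f(x) = x^3 - 3x$, which was chosen because its optima at $x = \pm 1$ are easy to verify by hand.

```python
def dfdx(f, x, h=1e-4):
    """Centered finite-difference estimate of the first derivative."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def d2fdx2(f, x, h=1e-4):
    """Centered finite-difference estimate of the second derivative."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h ** 2

def bisect(g, lo, hi, tol=1e-10, max_iter=200):
    """Bisection root finder; g(lo) and g(hi) must have opposite signs."""
    g_lo = g(lo)
    if g_lo * g(hi) > 0.0:
        raise ValueError("root is not bracketed")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if g_lo * g_mid <= 0.0:
            hi = mid
        else:
            lo, g_lo = mid, g_mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def find_optimum(f, lo, hi):
    """Locate a point where f'(x) = 0 inside [lo, hi] and classify it
    using the sign of the second derivative."""
    x = bisect(lambda t: dfdx(f, t), lo, hi)
    curvature = d2fdx2(f, x)
    if curvature > 0.0:
        kind = "minimum"
    elif curvature < 0.0:
        kind = "maximum"
    else:
        kind = "undetermined"
    return x, kind

if __name__ == "__main__":
    f = lambda x: x ** 3 - 3.0 * x
    print(find_optimum(f, 0.0, 2.0))    # bracket around the minimum
    print(find_optimum(f, -2.0, 0.0))   # bracket around the maximum
```

Each derivative estimate costs two or three evaluations of f, which hints at why methods that limit function evaluations are attractive when f itself is expensive.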
PT4.1.1 Noncomputer Methods and History

As mentioned above, differential calculus methods are still used to determine optimum solutions. All engineering and science students recall working maxima-minima problems by determining first derivatives of functions in their calculus courses. Bernoulli, Euler, Lagrange, and others laid the foundations of the calculus of variations, which deals with the minimization of functions. The Lagrange multiplier method was developed to optimize constrained problems, that is, optimization problems where the variables are bounded in some way.

The first major advances in numerical approaches occurred only with the development of digital computers after World War II. Koopmans in the United Kingdom and Kantorovich in the former Soviet Union independently worked on the general problem of least-cost distribution of supplies and products. In 1947, Koopmans's student Dantzig invented the simplex procedure for solving linear programming problems. This approach paved the way for other methods of constrained optimization by a number of investigators, notably Charnes and his coworkers. Approaches for unconstrained optimization also developed rapidly following the widespread availability of computers.

PT4.1.2 Optimization and Engineering Practice

Most of the mathematical models we have dealt with to this point have been descriptive models. That is, they have been derived to simulate the behavior of an engineering device or system. In contrast, optimization typically deals with finding the "best result," or optimum solution, of a problem. Thus, in the context of modeling, optimization models are often termed prescriptive models since they can be used to prescribe a course of action or the best design.

Engineers must continuously design devices and products that perform tasks in an efficient fashion. In doing so, they are constrained by the limitations of the physical world. Further, they must keep costs down. Thus, they are always confronting optimization problems that balance performance and limitations. Some common instances are listed in Table PT4.1. The following example has been developed to help you get a feel for the way in which such problems might be formulated.

TABLE PT4.1 Some common examples of optimization problems in engineering.
• Design aircraft for minimum weight and maximum strength.
• Optimal trajectories of space vehicles.
• Design civil engineering structures for minimum cost.
• Design water-resource projects like dams to mitigate flood damage while yielding maximum hydropower.
• Predict structural behavior by minimizing potential energy.
• Material-cutting strategy for minimum cost.
• Design pump and heat transfer equipment for maximum efficiency.
• Maximize power output of electrical networks and machinery while minimizing heat generation.
• Shortest route of salesperson visiting various cities during one sales trip.
• Optimal planning and scheduling.
• Statistical analysis and models with minimum error.
• Optimal pipeline networks.
• Inventory control.
• Maintenance planning to minimize cost.
• Minimize waiting and idling times.
• Design waste treatment systems to meet water-quality standards at least cost.
EXAMPLE PT4.1 Optimization of Parachute Cost

Problem Statement. Throughout the rest of the book, we have used the falling parachutist to illustrate the basic problem areas of numerical methods. You may have noticed that none of these examples concentrate on what happens after the chute opens. In this example, we will examine a case where the chute has opened and we are interested in predicting impact velocity at the ground.

You are an engineer working for an agency planning to airlift supplies to refugees in a war zone. The supplies will be dropped at low altitude (500 m) so that the drop is not detected and the supplies fall as close as possible to the refugee camp. The chutes open immediately upon leaving the plane. To reduce damage, the vertical velocity on impact must be below a critical value of $v_c = 20$ m/s.

The parachute used for the drop is depicted in Fig. PT4.2. The cross-sectional area of the chute is that of a half sphere,

$$A = 2\pi r^2 \tag{PT4.1}$$

The length of each of the 16 cords connecting the chute to the mass is related to the chute radius by

$$\ell = \sqrt{2}\, r \tag{PT4.2}$$

You know that the drag force for the chute is a linear function of its cross-sectional area described by the following formula:

$$c = k_c A \tag{PT4.3}$$

where c = drag coefficient (kg/s) and $k_c$ = a proportionality constant parameterizing the effect of area on drag [kg/(s · m^2)].

Also, you can divide the payload into as many parcels as you like. That is, the mass of each individual parcel can be calculated as

$$m = \frac{M_t}{n}$$

where m = mass of an individual parcel (kg), $M_t$ = total load being dropped (kg), and n = total number of parcels.

FIGURE PT4.2 A deployed parachute. (Labels: radius r, cord length $\ell$, mass m.)

Finally, the cost of each chute is related to chute size in a nonlinear fashion,

$$\text{Cost per chute} = c_0 + c_1 \ell + c_2 A^2 \tag{PT4.4}$$

where $c_0$, $c_1$, and $c_2$ = cost coefficients. The constant term, $c_0$, is the base price for the chutes. The nonlinear relationship between cost and area exists because larger chutes are much more difficult to construct than small chutes.

Determine the size (r) and number of chutes (n) that result in minimum cost while at the same time meeting the requirement of having a sufficiently small impact velocity.

Solution. The objective here is to determine the number and size of parachutes to minimize the cost of the airlift. The problem is constrained because the parcels must have an impact velocity less than a critical value.

The cost can be computed by multiplying the cost of the individual parachute [Eq. (PT4.4)] by the number of parachutes (n). Thus, the function you wish to minimize, which is formally called the objective function, is written as

$$\text{Minimize } C = n(c_0 + c_1 \ell + c_2 A^2) \tag{PT4.5}$$

where C = cost ($) and A and $\ell$ are calculated by Eqs. (PT4.1) and (PT4.2), respectively.

Next, we must specify the constraints. For this problem there are two constraints. First, the impact velocity must be equal to or less than the critical velocity,

$$v \le v_c \tag{PT4.6}$$

Second, the number of parcels must be an integer and greater than or equal to 1,

$$n \ge 1 \tag{PT4.7}$$

where n is an integer.

At this point, the optimization problem has been formulated. As can be seen, it is a nonlinear constrained problem.

Although the problem has been broadly formulated, one more issue must be addressed: How do we determine the impact velocity v? Recall from Chap. 1 that the velocity of a falling object can be computed with

$$v = \frac{gm}{c}\left(1 - e^{-(c/m)t}\right) \tag{1.10}$$

where v = velocity (m/s), g = acceleration of gravity (m/s^2), m = mass (kg), and t = time (s).
Although Eq. (1.10) provides a relationship between v and t, we need to know how long the mass falls. Therefore, we need a relationship between the drop distance z and the time of fall t. The drop distance can be calculated from the velocity in Eq. (1.10) by integration

$$z = \int_0^t \frac{gm}{c}\left(1 - e^{-(c/m)t}\right) dt \tag{PT4.8}$$

This integral can be evaluated and subtracted from the initial height to yield the height above the ground,

$$z = z_0 - \frac{gm}{c}t + \frac{gm^2}{c^2}\left(1 - e^{-(c/m)t}\right) \tag{PT4.9}$$

where $z_0$ = initial height (m). This function, as plotted in Fig. PT4.3, provides a way to predict z given knowledge of t.

FIGURE PT4.3 The height z and velocity v of a deployed parachute as it falls to earth (z = 0). (Axes: t (s) on the horizontal axis; z (m) and v (m/s) on the vertical axis; impact marked where z reaches zero.)

However, we do not need z as a function of t to solve this problem. Rather, we need to compute the time required for the parcel to fall the distance $z_0$. Thus, we recognize that we must reformulate Eq. (PT4.9) as a root-finding problem. That is, we must solve for the time at which z goes to zero,

$$f(t) = 0 = z_0 - \frac{gm}{c}t + \frac{gm^2}{c^2}\left(1 - e^{-(c/m)t}\right) \tag{PT4.10}$$

Once the time to impact is computed, we can substitute it into Eq. (1.10) to solve for the impact velocity.

The final specification of the problem, therefore, would be

$$\text{Minimize } C = n(c_0 + c_1 \ell + c_2 A^2) \tag{PT4.11}$$

subject to

$$v \le v_c \tag{PT4.12}$$
$$n \ge 1 \tag{PT4.13}$$

where

$$A = 2\pi r^2 \tag{PT4.14}$$
$$\ell = \sqrt{2}\, r \tag{PT4.15}$$
$$c = k_c A \tag{PT4.16}$$
$$m = \frac{M_t}{n} \tag{PT4.17}$$
$$t = \text{root}\left[z_0 - \frac{gm}{c}t + \frac{gm^2}{c^2}\left(1 - e^{-(c/m)t}\right)\right] \tag{PT4.18}$$
$$v = \frac{gm}{c}\left(1 - e^{-(c/m)t}\right) \tag{PT4.19}$$

We will solve this problem in Example 15.4 in Chap. 15. For the time being, recognize that it has most of the fundamental elements of other optimization problems you will routinely confront in engineering practice. These are
• The problem will involve an objective function that embodies your goal.
• There will be a number of design variables. These variables can be real numbers or they can be integers. In our example, these are r (real) and n (integer).
• The problem will include constraints that reflect the limitations you are working under.

We should make one more point before proceeding. Although the objective function and constraints may superficially appear to be simple equations [e.g., Eq. (PT4.12)], they may in fact be the "tip of the iceberg." That is, they may be underlain by complex dependencies and models. For instance, as in our example, they may involve other numerical methods [Eq. (PT4.18)]. This means that the functional relationships you will be using could actually represent large and complicated calculations. Thus, techniques that can find the optimal solution, while minimizing function evaluations, can be extremely valuable.

PT4.2 MATHEMATICAL BACKGROUND

There are a number of mathematical concepts and operations that underlie optimization. Because we believe that they will be more relevant to you in context, we will defer discussion of specific mathematical prerequisites until they are needed. For example, we will discuss the important concepts of the gradient and Hessians at the beginning of Chap. 14 on multivariate unconstrained optimization. In the meantime, we will limit ourselves here to the more general topic of how optimization problems are classified.
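Returning briefly to Example PT4.1, the chain of calculations in Eqs. (PT4.14) through (PT4.19) can be sketched in Python to show how much work hides behind the single constraint of Eq. (PT4.12). This is a hedged illustration only: the function names are hypothetical, bisection is used as a stand-in root finder, and the numerical values for $M_t$, $k_c$, and the cost coefficients in the usage section are placeholders assumed for demonstration. Only $z_0 = 500$ m and $v_c = 20$ m/s come from the problem statement.

```python
import math

G = 9.81  # acceleration of gravity (m/s^2)

def height(t, m, c, z0):
    """Height above the ground at time t, Eq. (PT4.9)."""
    return z0 - G * m / c * t + G * m ** 2 / c ** 2 * (1.0 - math.exp(-c / m * t))

def velocity(t, m, c):
    """Downward velocity at time t, Eq. (1.10)."""
    return G * m / c * (1.0 - math.exp(-c / m * t))

def impact(r, n, Mt, kc, z0):
    """Return (time to impact, impact velocity) for chute radius r and n parcels."""
    A = 2.0 * math.pi * r ** 2          # Eq. (PT4.14)
    c = kc * A                          # Eq. (PT4.16)
    m = Mt / n                          # Eq. (PT4.17)
    # Bracket the root of Eq. (PT4.10): height starts at z0 > 0 and decreases.
    t_lo, t_hi = 0.0, 1.0
    while height(t_hi, m, c, z0) > 0.0:
        t_hi *= 2.0
    for _ in range(200):                # bisection, Eq. (PT4.18)
        t_mid = 0.5 * (t_lo + t_hi)
        if height(t_mid, m, c, z0) > 0.0:
            t_lo = t_mid
        else:
            t_hi = t_mid
    t = 0.5 * (t_lo + t_hi)
    return t, velocity(t, m, c)         # Eq. (PT4.19)

def total_cost(r, n, c0, c1, c2):
    """Objective function, Eq. (PT4.11)."""
    A = 2.0 * math.pi * r ** 2
    ell = math.sqrt(2.0) * r            # Eq. (PT4.15)
    return n * (c0 + c1 * ell + c2 * A ** 2)

if __name__ == "__main__":
    # Placeholder parameter values assumed for illustration only.
    Mt, kc, z0, vc = 2000.0, 3.0, 500.0, 20.0
    c0, c1, c2 = 200.0, 56.0, 0.1
    r, n = 3.0, 6
    t, v = impact(r, n, Mt, kc, z0)
    print("time to impact:", t)
    print("impact velocity:", v, "feasible:", v <= vc)
    print("cost:", total_cost(r, n, c0, c1, c2))
```

Every trial pair (r, n) considered by an optimizer triggers a full root-finding loop, which is the sense in which the constraint is the "tip of the iceberg."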
An optimization or mathematical programming problem generally can be stated as:

Find $\mathbf{x}$, which minimizes or maximizes $f(\mathbf{x})$ subject to

$$d_i(\mathbf{x}) \le a_i \qquad i = 1, 2, \ldots, m \tag{PT4.20}$$
$$e_i(\mathbf{x}) = b_i \qquad i = 1, 2, \ldots, p \tag{PT4.21}$$

where $\mathbf{x}$ is an n-dimensional design vector, $f(\mathbf{x})$ is the objective function, $d_i(\mathbf{x})$ are inequality constraints, $e_i(\mathbf{x})$ are equality constraints, and $a_i$ and $b_i$ are constants.

Optimization problems can be classified on the basis of the form of $f(\mathbf{x})$:
• If $f(\mathbf{x})$ and the constraints are linear, we have linear programming.
• If $f(\mathbf{x})$ is quadratic and the constraints are linear, we have quadratic programming.
• If $f(\mathbf{x})$ is not linear or quadratic and/or the constraints are nonlinear, we have nonlinear programming.
Further, when Eqs. (PT4.20) and (PT4.21) are included, we have a constrained optimization problem; otherwise, it is an unconstrained optimization problem.

Note that for constrained problems, the degrees of freedom are given by $n - p - m$. Generally, to obtain a solution, $p + m$ must be $\le n$. If $p + m > n$, the problem is said to be overconstrained.

Another way in which optimization problems are classified is by dimensionality. This is most commonly done by dividing them into one-dimensional and multidimensional problems. As the name implies, one-dimensional problems involve functions that depend on a single independent variable. As in Fig. PT4.4a, the search then consists of climbing or descending one-dimensional peaks and valleys. Multidimensional problems involve functions that depend on two or more independent variables. In the same spirit, a two-dimensional optimization can again be visualized as searching out peaks and valleys (Fig. PT4.4b). However, just as in real hiking, we are not constrained to walk in a single direction; instead, the topography is examined to efficiently reach the goal.

Finally, the process of finding a maximum versus finding a minimum is essentially identical because the same value, $x^*$, both minimizes $f(x)$ and maximizes $-f(x)$. This equivalence is illustrated graphically for a one-dimensional function in Fig. PT4.4a.

PT4.3 ORIENTATION

Some orientation is helpful before proceeding to the numerical methods for optimization. The following is intended to provide an overview of the material in Part Four. In addition, some objectives have been included to help you focus your efforts when studying the material.

FIGURE PT4.4 (a) One-dimensional optimization. This figure also illustrates how minimization of $f(x)$ is equivalent to the maximization of $-f(x)$. (b) Two-dimensional optimization.
Note that this figure can be taken to represent either a maximization (contours increase in elevation up to the maximum like a mountain) or a minimization (contours decrease in elevation down to the minimum like a valley).
[Figure labels: panel (a) plots f(x) and −f(x) against x, marking the minimum of f(x) and the maximum of −f(x) at x*; panel (b) shows contours of f(x, y) in the x-y plane, with the optimum f(x*, y*) at (x*, y*).]
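The equivalence between minimizing f(x) and maximizing −f(x) can be checked numerically. The following minimal Python sketch uses a hypothetical test function and a coarse grid search, neither of which comes from the text; both are assumptions chosen only for illustration.

```python
# Illustrative check that the x* minimizing f(x) also maximizes -f(x).
# The test function and the search grid are assumptions made for this sketch.


def f(x):
    """Hypothetical one-dimensional function with a single minimum at x = 1.5."""
    return (x - 1.5) ** 2 + 0.25


# Coarse grid on [0, 4] with a spacing of 0.001
grid = [i / 1000 for i in range(4001)]

x_min_of_f = min(grid, key=f)                    # location that minimizes f(x)
x_max_of_neg_f = max(grid, key=lambda x: -f(x))  # location that maximizes -f(x)

print(x_min_of_f, x_max_of_neg_f)
```

Because the two searches land on the same grid point, a routine written for maximization can be reused for minimization simply by passing it −f(x).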
PT4.3.1 Scope and Preview

Figure PT4.5 is a schematic representation of the organization of Part Four. Examine this figure carefully, starting at the top and working clockwise.

After the present introduction, Chap. 13 is devoted to one-dimensional unconstrained optimization. Methods are presented to find the minimum or maximum of a function of a single variable. Three methods are covered: golden-section search, parabolic interpolation, and Newton’s method. An advanced hybrid approach, Brent’s method, that combines the reliability of the golden-section search with the speed of parabolic interpolation is also described.

Chapter 14 covers two general types of methods to solve multidimensional unconstrained optimization problems. Direct methods such as random searches, univariate searches, and pattern searches do not require the evaluation of the function’s derivatives. On the other hand, gradient methods use first and sometimes second derivatives to find the optimum. The chapter introduces the gradient and the Hessian, which are multidimensional representations of the first and second derivatives. The method of steepest ascent/descent is then covered in some detail. This is followed by descriptions of some advanced methods: conjugate gradient, Newton’s method, Marquardt’s method, and quasi-Newton methods.

Chapter 15 is devoted to constrained optimization. Linear programming is described in detail using both a graphical representation and the simplex method. The detailed analysis of nonlinear constrained optimization is beyond this book’s scope, but we provide an overview of the major approaches. In addition, we illustrate how such problems (along with the problems covered in Chaps. 13 and 14) can be solved with software packages such as Excel, MATLAB, and Mathcad.

Chapter 16 extends the above concepts to actual engineering problems.
Engineering applications are used to illustrate how optimization problems are formulated and provide insight into the application of the solution techniques in professional practice.

An epilogue is included at the end of Part Four. It contains an overview of the methods discussed in Chaps. 13, 14, and 15. This overview includes a description of trade-offs related to the proper use of each technique. This section also provides references for some numerical methods that are beyond the scope of this text.

PT4.3.2 Goals and Objectives

Study Objectives. After completing Part Four, you should have sufficient information to successfully approach a wide variety of engineering problems dealing with optimization. In general, you should have mastered the techniques, have learned to assess their reliability, and be capable of analyzing alternative methods for any particular problem. In addition to these general goals, the specific concepts in Table PT4.2 should be assimilated for a comprehensive understanding of the material in Part Four.

Computer Objectives. You should be able to write a subprogram to implement a simple one-dimensional (like golden-section search or parabolic interpolation) and multidimensional (like the random-search method) search. In addition, software packages such as Excel, MATLAB, or Mathcad have varying capabilities for optimization. You can use this part of the book to become familiar with these capabilities.
FIGURE PT4.5 Schematic of the organization of the material in Part Four: Optimization.
[Figure content, clockwise from the top:
PART 4 Optimization: PT 4.1 Motivation; PT 4.2 Mathematical background; PT 4.3 Orientation
CHAPTER 13 One-Dimensional Unconstrained Optimization: 13.1 Golden-section search; 13.2 Parabolic interpolation; 13.3 Newton's method; 13.4 Brent's method
CHAPTER 14 Multidimensional Unconstrained Optimization: 14.1 Direct methods; 14.2 Gradient methods
CHAPTER 15 Constrained Optimization: 15.1 Linear programming; 15.2 Nonlinear constrained; 15.3 Software packages
CHAPTER 16 Case Studies: 16.1 Chemical engineering; 16.2 Civil engineering; 16.3 Electrical engineering; 16.4 Mechanical engineering
EPILOGUE: PT 4.4 Trade-offs; PT 4.5 Additional references]
TABLE PT4.2 Specific study objectives for Part Four.
1. Understand why and where optimization occurs in engineering problem solving.
2. Understand the major elements of the general optimization problem: objective function, decision variables, and constraints.
3. Be able to distinguish between linear and nonlinear optimization, and between constrained and unconstrained problems.
4. Be able to define the golden ratio and understand how it makes one-dimensional optimization efficient.
5. Locate the optimum of a single variable function with the golden-section search, parabolic interpolation, and Newton’s method. Also, recognize the trade-offs among these approaches, with particular attention to initial guesses and convergence.
6. Understand how Brent’s optimization method combines the reliability of the golden-section search with the speed of parabolic interpolation.
7. Be capable of writing a program and solving for the optimum of a multivariable function using random searching.
8. Understand the ideas behind pattern searches, conjugate directions, and Powell’s method.
9. Be able to define and evaluate the gradient and Hessian of a multivariable function both analytically and numerically.
10. Compute by hand the optimum of a two-variable function using the method of steepest ascent/descent.
11. Understand the basic ideas behind the conjugate gradient, Newton’s, Marquardt’s, and quasi-Newton methods. In particular, understand the trade-offs among the approaches and recognize how each improves on the steepest ascent/descent.
12. Be capable of recognizing and setting up a linear programming problem to represent applicable engineering problems.
13. Be able to solve a two-dimensional linear programming problem with both the graphical and simplex methods.
14. Understand the four possible outcomes of a linear programming problem.
15. Be able to set up and solve nonlinear constrained optimization problems using a software package.
CHAPTER 13
One-Dimensional Unconstrained Optimization

This section will describe techniques to find the minimum or maximum of a function of a single variable, f(x). A useful image in this regard is the one-dimensional, “roller coaster”-like function depicted in Fig. 13.1. Recall from Part Two that root location was complicated by the fact that several roots can occur for a single function. Similarly, both local and global optima can occur in optimization. Such cases are called multimodal. In almost all instances, we will be interested in finding the absolute highest or lowest value of a function. Thus, we must take care that we do not mistake a local result for the global optimum.

FIGURE 13.1 A function that asymptotically approaches zero at plus and minus ∞ and has two maximum and two minimum points in the vicinity of the origin. The two points to the right are local optima, whereas the two to the left are global.
[Figure labels: global maximum, global minimum, local maximum, local minimum; axes x and f(x).]

Distinguishing a global from a local extremum can be a very difficult problem for the general case. There are three usual ways to approach this problem. First, insight into the behavior of low-dimensional functions can sometimes be obtained graphically. Second, optima can be found from widely varying and perhaps randomly generated starting guesses, with the largest of these then selected as global. Finally, the starting point associated with a local optimum can be perturbed to see whether the routine returns a better point or always returns to the same point. Although all these approaches can have utility, the fact is that in some problems (usually the large ones), there may be no practical way to ensure that you have located a global optimum. However, although you should always
be sensitive to the issue, it is fortunate that there are numerous engineering problems where you can locate the global optimum in an unambiguous fashion.

Just as in root location, optimization in one dimension can be divided into bracketing and open methods. As described in the next section, the golden-section search is an example of a bracketing method that depends on initial guesses that bracket a single optimum. This is followed by an alternative approach, parabolic interpolation, which often converges faster than the golden-section search, but sometimes diverges.

Another method described in this chapter is an open method based on the idea from calculus that the minimum or maximum can be found by solving $f'(x) = 0$. This reduces the optimization problem to finding the root of $f'(x)$ using techniques of the sort described in Part Two. We will demonstrate one version of this approach, Newton’s method.

Finally, an advanced hybrid approach, Brent’s method, is described. This approach combines the reliability of the golden-section search with the speed of parabolic interpolation.

13.1 GOLDEN-SECTION SEARCH

In solving for the root of a single nonlinear equation, the goal was to find the value of the variable x that yields a zero of the function f(x). Single-variable optimization has the goal of finding the value of x that yields an extremum, either a maximum or minimum of f(x).

The golden-section search is a simple, general-purpose, single-variable search technique. It is similar in spirit to the bisection approach for locating roots in Chap. 5. Recall that bisection hinged on defining an interval, specified by a lower guess ($x_l$) and an upper guess ($x_u$), that bracketed a single root. The presence of a root between these bounds was verified by determining that $f(x_l)$ and $f(x_u)$ had different signs.
The root was then estimated as the midpoint of this interval,

$$x_r = \frac{x_l + x_u}{2}$$

The final step in a bisection iteration involved determining a new smaller bracket. This was done by replacing whichever of the bounds $x_l$ or $x_u$ had a function value with the same sign as $f(x_r)$. One advantage of this approach was that the new value $x_r$ replaced one of the old bounds.

Now we can develop a similar approach for locating the optimum of a one-dimensional function. For simplicity, we will focus on the problem of finding a maximum. When we discuss the computer algorithm, we will describe the minor modifications needed to simulate a minimum.

As with bisection, we can start by defining an interval that contains a single answer. That is, the interval should contain a single maximum, and hence is called unimodal. We can adopt the same nomenclature as for bisection, where $x_l$ and $x_u$ defined the lower and upper bounds, respectively, of such an interval. However, in contrast to bisection, we need a new strategy for finding a maximum within the interval. Rather than using only two function values (which are sufficient to detect a sign change, and hence a zero), we would need three function values to detect whether a maximum occurred. Thus, an additional point within the interval has to be chosen. Next, we have to pick a fourth point.
Then the test for the maximum could be applied to discern whether the maximum occurred within the first three or the last three points.

The key to making this approach efficient is the wise choice of the intermediate points. As in bisection, the goal is to minimize function evaluations by replacing old values with new values. This goal can be achieved by specifying that the following two conditions hold (Fig. 13.2):

$$\ell_0 = \ell_1 + \ell_2 \tag{13.1}$$

$$\frac{\ell_1}{\ell_0} = \frac{\ell_2}{\ell_1} \tag{13.2}$$

The first condition specifies that the sum of the two sublengths $\ell_1$ and $\ell_2$ must equal the original interval length. The second says that the ratio of the lengths must be equal. Equation (13.1) can be substituted into Eq. (13.2),

$$\frac{\ell_1}{\ell_1 + \ell_2} = \frac{\ell_2}{\ell_1} \tag{13.3}$$

If the reciprocal is taken and $R = \ell_2/\ell_1$, we arrive at

$$1 + R = \frac{1}{R} \tag{13.4}$$

or

$$R^2 + R - 1 = 0 \tag{13.5}$$

which can be solved for the positive root

$$R = \frac{-1 + \sqrt{1 - 4(-1)}}{2} = \frac{\sqrt{5} - 1}{2} = 0.61803\ldots \tag{13.6}$$

FIGURE 13.2 The initial step of the golden-section search algorithm involves choosing two interior points according to the golden ratio.
[Figure labels: f(x) versus x with the maximum between $x_l$ and $x_u$; the first iteration divides $\ell_0$ into $\ell_1$ and $\ell_2$; the second iteration reuses $\ell_2$.]
This value, which has been known since antiquity, is called the golden ratio (see Box 13.1). Because it allows optima to be found efficiently, it is the key element of the golden-section method we have been developing conceptually. Now let us derive an algorithm to implement this approach on the computer.

As mentioned above and as depicted in Fig. 13.4, the method starts with two initial guesses, $x_l$ and $x_u$, that bracket one local extremum of f(x). Next, two interior points $x_1$ and $x_2$ are chosen according to the golden ratio,

$$d = \frac{\sqrt{5} - 1}{2}(x_u - x_l)$$

$$x_1 = x_l + d$$

$$x_2 = x_u - d$$

The function is evaluated at these two interior points. Two results can occur:
1. If, as is the case in Fig. 13.4, $f(x_1) > f(x_2)$, then the domain of x to the left of $x_2$, from $x_l$ to $x_2$, can be eliminated because it does not contain the maximum. For this case, $x_2$ becomes the new $x_l$ for the next round.
2. If $f(x_2) > f(x_1)$, then the domain of x to the right of $x_1$, from $x_1$ to $x_u$, would be eliminated. In this case, $x_1$ becomes the new $x_u$ for the next round.

Box 13.1 The Golden Ratio and Fibonacci Numbers

In many cultures, certain numbers are ascribed qualities. For example, we in the West are all familiar with “Lucky 7” and “Friday the 13th.” Ancient Greeks called the following number the “golden ratio:”

$$\frac{\sqrt{5} - 1}{2} = 0.61803\ldots$$

This ratio was employed for a number of purposes, including the development of the rectangle in Fig. 13.3. These proportions were considered aesthetically pleasing by the Greeks. Among other things, many of their temples followed this shape.

The golden ratio is related to an important mathematical series known as the Fibonacci numbers, which are

0, 1, 1, 2, 3, 5, 8, 13, 21, 34, …

Thus, each number after the first two represents the sum of the preceding two. This sequence pops up in many diverse areas of science and engineering.
In the context of the present discussion, an interesting property of the Fibonacci sequence relates to the ratio of consecutive numbers in the sequence; that is, 0/1 = 0, 1/1 = 1, 1/2 = 0.5, 2/3 = 0.667, 3/5 = 0.6, 5/8 = 0.625, 8/13 = 0.615, and so on. As one proceeds, the ratio of consecutive numbers approaches the golden ratio!

FIGURE 13.3 The Parthenon in Athens, Greece, was constructed in the 5th century B.C. Its front dimensions can be fit almost exactly within a golden rectangle.
[Figure labels: golden rectangle with sides 1 and 0.61803.]
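The convergence of the Fibonacci ratios described in Box 13.1 is easy to verify. The following short Python sketch generates the ratios of consecutive Fibonacci numbers and compares the last one with $(\sqrt{5} - 1)/2$; the variable names and the choice of 20 terms are assumptions made for illustration.

```python
import math

golden_ratio = (math.sqrt(5) - 1) / 2  # about 0.61803

# Ratios of consecutive Fibonacci numbers: 0/1, 1/1, 1/2, 2/3, 3/5, ...
a, b = 0, 1
ratios = []
for _ in range(20):
    ratios.append(a / b)
    a, b = b, a + b

for k, ratio in enumerate(ratios):
    print(f"{k:2d}  {ratio:.6f}  difference = {ratio - golden_ratio:+.2e}")
```

The printed differences alternate in sign and shrink quickly, which is consistent with the statement in the box that the ratios approach the golden ratio.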
Now, here is the real benefit from the use of the golden ratio. Because the original $x_1$ and $x_2$ were chosen using the golden ratio, we do not have to recalculate all the function values for the next iteration. For example, for the case illustrated in Fig. 13.4, the old $x_1$ becomes the new $x_2$. This means that we already have the value for the new $f(x_2)$, since it is the same as the function value at the old $x_1$.

To complete the algorithm, we now only need to determine the new $x_1$. This is done with the same proportionality as before,

$$x_1 = x_l + \frac{\sqrt{5} - 1}{2}(x_u - x_l)$$

A similar approach would be used for the alternate case where the optimum fell in the left subinterval.

As the iterations are repeated, the interval containing the extremum is reduced rapidly. In fact, each round the interval is reduced by a factor of the golden ratio (about 61.8%). That means that after 10 rounds, the interval is shrunk to about $0.618^{10}$ or 0.008 or 0.8% of its initial length. After 20 rounds, it is about 0.0066%. This is not quite as good as the reduction achieved with bisection, but this is a harder problem.

FIGURE 13.4 (a) The initial step of the golden-section search algorithm involves choosing two interior points according to the golden ratio. (b) The second step involves defining a new interval that includes the optimum.
[Figure labels: panel (a) shows f(x) versus x with the extremum (maximum), the interior points $x_2$ and $x_1$ placed a distance d from $x_u$ and $x_l$, and the eliminated region to the left of $x_2$; panel (b) shows the new interval, in which the old $x_2$ becomes $x_l$ and the old $x_1$ becomes $x_2$.]
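The interval-reduction figures quoted above follow directly from raising the golden ratio to the number of rounds. The following Python sketch reproduces that arithmetic and compares it with bisection, which halves its interval on every iteration. The helper names and the target of 0.1% of the initial length are assumptions introduced here for illustration.

```python
import math

R = (math.sqrt(5) - 1) / 2  # golden ratio, about 0.61803


def remaining_fraction(rounds, factor=R):
    """Fraction of the initial interval left after the given number of rounds."""
    return factor ** rounds


def rounds_needed(target_fraction, factor=R):
    """Smallest number of rounds that shrinks the interval to target_fraction."""
    return math.ceil(math.log(target_fraction) / math.log(factor))


after_10 = remaining_fraction(10)
after_20 = remaining_fraction(20)
print(f"after 10 rounds: {after_10 * 100:.2f}% of the initial length")
print(f"after 20 rounds: {after_20 * 100:.4f}% of the initial length")

# Rounds required to reach 0.1% of the initial length (assumed target)
golden_rounds = rounds_needed(1e-3)
bisection_rounds = rounds_needed(1e-3, factor=0.5)
print(golden_rounds, bisection_rounds)
```

For this assumed target, the golden-section search needs about half again as many rounds as bisection, which is the sense in which its reduction is "not quite as good."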
EXAMPLE 13.1 Golden-Section Search

Problem Statement. Use the golden-section search to find the maximum of

$$f(x) = 2 \sin x - \frac{x^2}{10}$$

within the interval $x_l = 0$ and $x_u = 4$.

Solution. First, the golden ratio is used to create the two interior points

$$d = \frac{\sqrt{5} - 1}{2}(4 - 0) = 2.472$$

$$x_1 = 0 + 2.472 = 2.472$$

$$x_2 = 4 - 2.472 = 1.528$$

The function can be evaluated at the interior points

$$f(x_2) = f(1.528) = 2 \sin(1.528) - \frac{1.528^2}{10} = 1.765$$

$$f(x_1) = f(2.472) = 0.63$$

Because $f(x_2) > f(x_1)$, the maximum is in the interval defined by $x_l$, $x_2$, and $x_1$. Thus, for the new interval, the lower bound remains $x_l = 0$, and $x_1$ becomes the upper bound, that is, $x_u = 2.472$. In addition, the former $x_2$ value becomes the new $x_1$, that is, $x_1 = 1.528$. Further, we do not have to recalculate $f(x_1)$ because it was determined on the previous iteration as f(1.528) = 1.765.

All that remains is to compute the new values of d and $x_2$,

$$d = \frac{\sqrt{5} - 1}{2}(2.472 - 0) = 1.528$$

$$x_2 = 2.472 - 1.528 = 0.944$$

The function evaluation at $x_2$ is f(0.944) = 1.531. Since this value is less than the function value at $x_1$, the maximum is in the interval prescribed by $x_2$, $x_1$, and $x_u$.

The process can be repeated, with the results tabulated below:

i   xl       f(xl)    x2       f(x2)    x1       f(x1)    xu       f(xu)     d
1   0        0        1.5279   1.7647   2.4721   0.6300   4.0000   −3.1136   2.4721
2   0        0        0.9443   1.5310   1.5279   1.7647   2.4721   0.6300    1.5279
3   0.9443   1.5310   1.5279   1.7647   1.8885   1.5432   2.4721   0.6300    0.9443
4   0.9443   1.5310   1.3050   1.7595   1.5279   1.7647   1.8885   1.5432    0.5836
5   1.3050   1.7595   1.5279   1.7647   1.6656   1.7136   1.8885   1.5432    0.3607
6   1.3050   1.7595   1.4427   1.7755   1.5279   1.7647   1.6656   1.7136    0.2229
7   1.3050   1.7595   1.3901   1.7742   1.4427   1.7755   1.5279   1.7647    0.1378
8   1.3901   1.7742   1.4427   1.7755   1.4752   1.7732   1.5279   1.7647    0.0851
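The iterations of Example 13.1 can be reproduced with a short script. The following is a minimal Python sketch of the golden-section search for a maximum, written from the steps described in this section rather than taken from the book. The function and parameter names (`golden_max`, `es`, `maxit`) and the default stopping values are assumptions, and the error estimate is evaluated on the interval after each update, which is one reasonable reading of the stopping test.

```python
import math

R = (math.sqrt(5) - 1) / 2  # golden ratio, about 0.61803


def f(x):
    """Function from Example 13.1."""
    return 2 * math.sin(x) - x ** 2 / 10


def golden_max(func, xl, xu, es=0.01, maxit=100):
    """Golden-section search for a maximum of func on [xl, xu].

    es is the stopping criterion in percent; maxit caps the iterations.
    Returns (xopt, fopt, iterations). Assumes one maximum inside [xl, xu].
    """
    d = R * (xu - xl)
    x1, x2 = xl + d, xu - d
    f1, f2 = func(x1), func(x2)
    xopt, fopt = (x1, f1) if f1 > f2 else (x2, f2)
    for it in range(1, maxit + 1):
        if f1 > f2:
            # Maximum lies above x2: drop [xl, x2] and reuse old x1 as new x2
            xl = x2
            x2, f2 = x1, f1
            x1 = xl + R * (xu - xl)
            f1 = func(x1)
        else:
            # Maximum lies below x1: drop [x1, xu] and reuse old x2 as new x1
            xu = x1
            x1, f1 = x2, f2
            x2 = xu - R * (xu - xl)
            f2 = func(x2)
        xopt, fopt = (x1, f1) if f1 > f2 else (x2, f2)
        if xopt != 0:
            # Error bound (1 - R) * interval length, normalized to xopt
            ea = (1 - R) * abs((xu - xl) / xopt) * 100
            if ea <= es:
                break
    return xopt, fopt, it


xopt, fopt, n_iter = golden_max(f, 0.0, 4.0)
print(f"x = {xopt:.4f}, f(x) = {fopt:.4f}, iterations = {n_iter}")
```

Only one new function evaluation is made per pass through the loop, which is the saving that the golden ratio provides. To locate a minimum instead, the same routine can be applied to −f(x).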
Note that, at every iteration, the current maximum is the larger of $f(x_2)$ and $f(x_1)$. After the eighth iteration, the maximum occurs at x = 1.4427 with a function value of 1.7755. Thus, the result is converging on the true value of 1.7757 at x = 1.4276.

Recall that for bisection (Sec. 5.2.1), an exact upper bound for the error can be calculated at each iteration. Using similar reasoning, an upper bound for golden-section search can be derived as follows: Once an iteration is complete, the optimum will fall in one of two intervals. If $f(x_2)$ is the larger function value, the optimum will be in the lower interval ($x_l$, $x_2$, $x_1$). If $f(x_1)$ is the larger function value, it will be in the upper interval ($x_2$, $x_1$, $x_u$). Because the interior points are symmetrical, either case can be used to define the error.

Looking at the upper interval, if the true value were at the far left, the maximum distance from the estimate would be

$$\Delta x_a = x_1 - x_2 = x_l + R(x_u - x_l) - x_u + R(x_u - x_l) = (x_l - x_u) + 2R(x_u - x_l) = (2R - 1)(x_u - x_l)$$

or $0.236(x_u - x_l)$. If the true value were at the far right, the maximum distance from the estimate would be

$$\Delta x_b = x_u - x_1 = x_u - x_l - R(x_u - x_l) = (1 - R)(x_u - x_l)$$

or $0.382(x_u - x_l)$. Therefore, this case would represent the maximum error. This result can then be normalized to the optimal value for that iteration, $x_{opt}$, to yield

$$\varepsilon_a = (1 - R) \left| \frac{x_u - x_l}{x_{opt}} \right| 100\%$$

This estimate provides a basis for terminating the iterations.

Pseudocode for the golden-section-search algorithm for maximization is presented in Fig. 13.5a. The minor modifications to convert the algorithm to minimization are listed in Fig. 13.5b. In both versions the x value for the optimum is returned as the function value (gold). In addition, the value of f(x) at the optimum is returned as the variable (fx).

You may be wondering why we have stressed the reduced function evaluations of the golden-section search.
Of course, for solving a single optimization, the speed savings would be negligible. However, there are two important contexts where minimizing the number of function evaluations can be important. These are
1. Many evaluations. There are cases where the golden-section-search algorithm may be a part of a much larger calculation. In such cases, it may be called many times. Therefore, keeping function evaluations to a minimum could pay great dividends for such cases.
FUNCTION Gold (xlow, xhigh, maxit, es, fx)
  R = (5^0.5 - 1)/2
  xl = xlow; xu = xhigh
  iter = 1
  d = R * (xu - xl)
  x1 = xl + d; x2 = xu - d
  f1 = f(x1)
  f2 = f(x2)
  IF f1 > f2 THEN
    xopt = x1
    fx = f1
  ELSE
    xopt = x2
    fx = f2
  END IF
  DO
    d = R*d; xint = xu - xl
    IF f1 > f2 THEN
      xl = x2
      x2 = x1
      x1 = xl + d
      f2 = f1
      f1 = f(x1)
    ELSE
      xu = x1
      x1 = x2
      x2 = xu - d
      f1 = f2
      f2 = f(x2)
    END IF
    iter = iter + 1
    IF f1 > f2 THEN
      xopt = x1
      fx = f1
    ELSE
      xopt = x2
      fx = f2
    END IF
    IF xopt ≠ 0. THEN
      ea = (1. - R) * ABS(xint/xopt) * 100.
    END IF
    IF ea ≤ es OR iter ≥ maxit EXIT
  END DO
  Gold = xopt
END Gold

(a) Maximization: the listing as shown.
(b) Minimization: replace each of the three tests IF f1 > f2 THEN with IF f1 < f2 THEN.

FIGURE 13.5 Algorithm for the golden-section search.
2. Time-consuming evaluation. For pedagogical reasons, we use simple functions in most of our examples. You should understand that a function can be very complex and time-consuming to evaluate. For example, in a later part of this book, we will describe how optimization can be used to estimate the parameters of a model consisting of a system of differential equations. For such cases, the “function” involves time-consuming model integration. Any method that minimizes such evaluations would be advantageous.

13.2 PARABOLIC INTERPOLATION

Parabolic interpolation takes advantage of the fact that a second-order polynomial often provides a good approximation to the shape of f(x) near an optimum (Fig. 13.6).

Just as there is only one straight line connecting two points, there is only one quadratic polynomial or parabola connecting three points. Thus, if we have three points that jointly bracket an optimum, we can fit a parabola to the points. Then we can differentiate it, set the result equal to zero, and solve for an estimate of the optimal x. It can be shown through some algebraic manipulations that the result is

$$x_3 = \frac{f(x_0)(x_1^2 - x_2^2) + f(x_1)(x_2^2 - x_0^2) + f(x_2)(x_0^2 - x_1^2)}{2 f(x_0)(x_1 - x_2) + 2 f(x_1)(x_2 - x_0) + 2 f(x_2)(x_0 - x_1)} \tag{13.7}$$

where $x_0$, $x_1$, and $x_2$ are the initial guesses, and $x_3$ is the value of x that corresponds to the maximum value of the parabolic fit to the guesses. After generating the new point, there are two strategies for selecting the points for the next iteration. The simplest approach, which is similar to the secant method, is to merely assign the new points sequentially. That is, for the new iteration, $x_0 = x_1$, $x_1 = x_2$, and $x_2 = x_3$. Alternatively, as illustrated in the following example, a bracketing approach, similar to bisection or the golden-section search, can be employed.

FIGURE 13.6 Graphical description of parabolic interpolation.
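[Figure labels: the true function f(x) with its true maximum; the parabolic function fitted through $x_0$, $x_1$, and $x_2$, whose maximum at $x_3$ is the parabolic approximation of the maximum; axes x and f(x).]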
EXAMPLE 13.2 Parabolic Interpolation

Problem Statement. Use parabolic interpolation to approximate the maximum of

$$f(x) = 2 \sin x - \frac{x^2}{10}$$

with initial guesses of $x_0 = 0$, $x_1 = 1$, and $x_2 = 4$.

Solution. The function values at the three guesses can be evaluated,

x0 = 0   f(x0) = 0
x1 = 1   f(x1) = 1.5829
x2 = 4   f(x2) = −3.1136

and substituted into Eq. (13.7) to give

$$x_3 = \frac{0(1^2 - 4^2) + 1.5829(4^2 - 0^2) + (-3.1136)(0^2 - 1^2)}{2(0)(1 - 4) + 2(1.5829)(4 - 0) + 2(-3.1136)(0 - 1)} = 1.5055$$

which has a function value of f(1.5055) = 1.7691.

Next, a strategy similar to the golden-section search can be employed to determine which point should be discarded. Because the function value for the new point is higher than for the intermediate point ($x_1$) and the new x value is to the right of the intermediate point, the lower guess ($x_0$) is discarded. Therefore, for the next iteration,

x0 = 1        f(x0) = 1.5829
x1 = 1.5055   f(x1) = 1.7691
x2 = 4        f(x2) = −3.1136

which can be substituted into Eq. (13.7) to give

$$x_3 = \frac{1.5829(1.5055^2 - 4^2) + 1.7691(4^2 - 1^2) + (-3.1136)(1^2 - 1.5055^2)}{2(1.5829)(1.5055 - 4) + 2(1.7691)(4 - 1) + 2(-3.1136)(1 - 1.5055)} = 1.4903$$

which has a function value of f(1.4903) = 1.7714.

The process can be repeated, with the results tabulated below:

i   x0       f(x0)    x1       f(x1)    x2       f(x2)     x3       f(x3)
1   0.0000   0.0000   1.0000   1.5829   4.0000   −3.1136   1.5055   1.7691
2   1.0000   1.5829   1.5055   1.7691   4.0000   −3.1136   1.4903   1.7714
3   1.0000   1.5829   1.4903   1.7714   1.5055   1.7691    1.4256   1.7757
4   1.0000   1.5829   1.4256   1.7757   1.4903   1.7714    1.4266   1.7757
5   1.4256   1.7757   1.4266   1.7757   1.4903   1.7714    1.4275   1.7757

Thus, within five iterations, the result is converging rapidly on the true value of 1.7757 at x = 1.4276.
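The calculations of Example 13.2 can be sketched in a few lines of Python. The code below evaluates Eq. (13.7) and applies a bracketing rule that keeps the best point in the middle of the three retained points. The function names and the exact form of the discard rule are assumptions chosen to mirror the example; the book does not give this listing, and the sketch has no convergence test or safeguard against a zero denominator.

```python
import math


def f(x):
    """Function from Example 13.2."""
    return 2 * math.sin(x) - x ** 2 / 10


def parabola_optimum(x0, x1, x2, f0, f1, f2):
    """Eq. (13.7): x-location of the optimum of the parabola through 3 points."""
    num = f0 * (x1**2 - x2**2) + f1 * (x2**2 - x0**2) + f2 * (x0**2 - x1**2)
    den = 2 * f0 * (x1 - x2) + 2 * f1 * (x2 - x0) + 2 * f2 * (x0 - x1)
    return num / den


def parabolic_max(func, x0, x1, x2, iterations):
    """Bracketing parabolic interpolation for a maximum.

    Assumes x0 < x1 < x2 with func(x1) larger than func(x0) and func(x2).
    Returns the list of (x3, f(x3)) estimates, one per iteration.
    """
    f0, f1, f2 = func(x0), func(x1), func(x2)
    history = []
    for _ in range(iterations):
        x3 = parabola_optimum(x0, x1, x2, f0, f1, f2)
        f3 = func(x3)
        history.append((x3, f3))
        if x3 > x1:
            if f3 > f1:   # better point to the right of x1: discard x0
                x0, f0, x1, f1 = x1, f1, x3, f3
            else:         # x1 is still best: x3 becomes the upper bound
                x2, f2 = x3, f3
        else:
            if f3 > f1:   # better point to the left of x1: discard x2
                x2, f2, x1, f1 = x1, f1, x3, f3
            else:         # x1 is still best: x3 becomes the lower bound
                x0, f0 = x3, f3
    return history


history = parabolic_max(f, 0.0, 1.0, 4.0, iterations=5)
for i, (x3, f3) in enumerate(history, start=1):
    print(f"{i}  x3 = {x3:.4f}  f(x3) = {f3:.4f}")
```

Each pass costs one new function evaluation, as in the golden-section search, but the new point is placed using the curvature information carried by the three retained points.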
We should mention that just like the false-position method, parabolic interpolation can get hung up with just one end of the interval converging. Thus, convergence can be slow. For example, notice that in our example, 1.0000 was an endpoint for most of the iterations.

This method, as well as others using third-order polynomials, can be formulated into algorithms that contain convergence tests, careful selection strategies for the points to retain on each iteration, and attempts to minimize round-off error accumulation.

13.3 NEWTON’S METHOD

Recall that the Newton-Raphson method of Chap. 6 is an open method that finds the root x of a function such that f(x) = 0. The method is summarized as

$$x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)}$$

A similar open approach can be used to find an optimum of f(x) by defining a new function, $g(x) = f'(x)$. Thus, because the same optimal value x* satisfies both

$$f'(x^*) = g(x^*) = 0$$

we can use the following,

$$x_{i+1} = x_i - \frac{f'(x_i)}{f''(x_i)} \tag{13.8}$$

as a technique to find the minimum or maximum of f(x). It should be noted that this equation can also be derived by writing a second-order Taylor series for f(x) and setting the derivative of the series equal to zero. Newton’s method is an open method similar to Newton-Raphson because it does not require initial guesses that bracket the optimum. In addition, it also shares the disadvantage that it may be divergent. Finally, it is usually a good idea to check that the second derivative has the correct sign to confirm that the technique is converging on the result you desire.

EXAMPLE 13.3 Newton’s Method

Problem Statement. Use Newton’s method to find the maximum of

$$f(x) = 2 \sin x - \frac{x^2}{10}$$

with an initial guess of $x_0 = 2.5$.

Solution. The first and second derivatives of the function can be evaluated as

$$f'(x) = 2 \cos x - \frac{x}{5}$$

$$f''(x) = -2 \sin x - \frac{1}{5}$$
which can be substituted into Eq. (13.8) to give

$$x_{i+1} = x_i - \frac{2 \cos x_i - x_i/5}{-2 \sin x_i - 1/5}$$

Substituting the initial guess yields

$$x_1 = 2.5 - \frac{2 \cos 2.5 - 2.5/5}{-2 \sin 2.5 - 1/5} = 0.99508$$

which has a function value of 1.57859. The second iteration gives

$$x_2 = 0.995 - \frac{2 \cos 0.995 - 0.995/5}{-2 \sin 0.995 - 1/5} = 1.46901$$

which has a function value of 1.77385.

The process can be repeated, with the results tabulated below:

i   x         f(x)      f′(x)      f″(x)
0   2.5       0.57194   −2.10229   −1.39694
1   0.99508   1.57859   0.88985    −1.87761
2   1.46901   1.77385   −0.09058   −2.18965
3   1.42764   1.77573   −0.00020   −2.17954
4   1.42755   1.77573   0.00000    −2.17952

Thus, within four iterations, the result converges rapidly on the true value.

Although Newton’s method works well in some cases, it is impractical for cases where the derivatives cannot be conveniently evaluated. For these cases, other approaches that do not involve derivative evaluation are available. For example, a secant-like version of Newton’s method can be developed by using finite-difference approximations for the derivative evaluations.

A bigger reservation regarding the approach is that it may diverge based on the nature of the function and the quality of the initial guess. Thus, it is usually employed only when we are close to the optimum. As described next, hybrid techniques that use bracketing approaches far from the optimum and open methods near the optimum attempt to exploit the strong points of both approaches.

13.4 BRENT’S METHOD

Recall that in Sec. 6.4, we described Brent’s method for root location. This hybrid method combined several root-finding methods into a single algorithm that balanced reliability with efficiency.

Brent also developed a similar approach for one-dimensional minimization. It combines the slow, dependable golden-section search with the faster, but possibly unreliable, parabolic interpolation.
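As a brief illustration of why unguarded open steps benefit from a safeguard, the following Python sketch applies the Newton iteration of Eq. (13.8) to the function of Example 13.3 from two starting guesses. The first guess, x0 = 2.5, is the one used in the example. The second guess, x0 = −2, and all helper names are assumptions introduced here; they do not appear in the text.

```python
import math


def df(x):
    """First derivative of f(x) = 2 sin x - x^2/10."""
    return 2 * math.cos(x) - x / 5


def d2f(x):
    """Second derivative of f(x) = 2 sin x - x^2/10."""
    return -2 * math.sin(x) - 1 / 5


def newton_optimum(first, second, x0, iterations):
    """Apply x_{i+1} = x_i - f'(x_i)/f''(x_i) a fixed number of times."""
    x = x0
    for _ in range(iterations):
        x = x - first(x) / second(x)
    return x


# Guess from Example 13.3: the iteration settles on the maximum
x_a = newton_optimum(df, d2f, 2.5, iterations=4)

# Assumed alternative guess: the iteration settles on a different stationary point
x_b = newton_optimum(df, d2f, -2.0, iterations=10)

for label, x in (("x0 = 2.5", x_a), ("x0 = -2 ", x_b)):
    kind = "maximum" if d2f(x) < 0 else "minimum"
    print(f"{label}: x = {x:.5f}, f''(x) = {d2f(x):+.5f} -> {kind}")
```

The sign of the second derivative at the result distinguishes the two outcomes, which is why checking it is recommended above. A bracketing method started on an interval known to contain the maximum would not be drawn to the other stationary point.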
Brent’s method first attempts parabolic interpolation and keeps applying it as long as acceptable results are obtained. If not, it uses the golden-section search to get matters in hand. Figure 13.7 presents pseudocode for the algorithm based on a MATLAB software M-file developed by Cleve Moler (2005). It represents a stripped-down version of the
Function fminsimp(x1, xu)
  tol = 0.000001; phi = (1 + √5)/2; rho = 2 - phi
  u = x1 + rho*(xu - x1); v = u; w = u; x = u
  fu = f(u); fv = fu; fw = fu; fx = fu
  xm = 0.5*(x1 + xu); d = 0; e = 0
  DO
    IF |x - xm| ≤ tol EXIT
    para = |e| > tol
    IF para THEN                    (Try parabolic fit)
      r = (x - w)*(fx - fv); q = (x - v)*(fx - fw)
      p = (x - v)*q - (x - w)*r; s = 2*(q - r)
      IF s > 0 THEN p = -p
      s = |s|
      ' Is the parabola acceptable?
      para = |p| < |0.5*s*e| And p > s*(x1 - x) And p < s*(xu - x)
      IF para THEN
        e = d; d = p/s              (Parabolic interpolation step)
      ENDIF
    ENDIF
    IF Not para THEN
      IF x ≥ xm THEN                (Golden-section search step)
        e = x1 - x
      ELSE
        e = xu - x
      ENDIF
      d = rho*e
    ENDIF
    u = x + d; fu = f(u)
    IF fu ≤ fx THEN                 (Update x1, xu, x, v, w, xm)
      IF u ≥ x THEN
        x1 = x
      ELSE
        xu = x
      ENDIF
      v = w; fv = fw; w = x; fw = fx; x = u; fx = fu
    ELSE
      IF u < x THEN
        x1 = u
      ELSE
        xu = u
      ENDIF
      IF fu ≤ fw Or w = x THEN
        v = w; fv = fw; w = u; fw = fu
      ELSEIF fu ≤ fv Or v = x Or v = w THEN
        v = u; fv = fu
      ENDIF
    ENDIF
    xm = 0.5*(x1 + xu)
  ENDDO
  fminsimp = fu
END fminsimp

FIGURE 13.7 Pseudocode for Brent’s minimum-finding algorithm based on a MATLAB M-file developed by Cleve Moler (2005).
fminbnd function, which is the professional minimization function employed in MATLAB. For that reason, we call the simplified version fminsimp. Note that it requires another function f that holds the equation for which the minimum is being evaluated.

This concludes our treatment of methods to find the optima of functions of a single variable. Some engineering examples are presented in Chap. 16. In addition, the techniques described here are an important element of some procedures to optimize multivariable functions, as discussed in Chap. 14.

PROBLEMS

13.1 Given the formula
$$f(x) = -x^2 + 8x - 12$$
(a) Determine the maximum and the corresponding value of x for this function analytically (i.e., using differentiation).
(b) Verify that Eq. (13.7) yields the same results based on initial guesses of $x_0 = 0$, $x_1 = 2$, and $x_2 = 6$.

13.2 Given
$$f(x) = -1.5x^6 - 2x^4 + 12x$$
(a) Plot the function.
(b) Use analytical methods to prove that the function is concave for all values of x.
(c) Differentiate the function and then use a root-location method to solve for the maximum f(x) and the corresponding value of x.

13.3 Solve for the value of x that maximizes f(x) in Prob. 13.2 using the golden-section search. Employ initial guesses of $x_l = 0$ and $x_u = 2$ and perform three iterations.

13.4 Repeat Prob. 13.3, except use parabolic interpolation in the same fashion as Example 13.2. Employ initial guesses of $x_0 = 0$, $x_1 = 1$, and $x_2 = 2$ and perform three iterations.

13.5 Repeat Prob. 13.3 but use Newton’s method. Employ an initial guess of $x_0 = 2$ and perform three iterations.

13.6 Employ the following methods to find the maximum of
$$f(x) = 4x - 1.8x^2 + 1.2x^3 - 0.3x^4$$
(a) Golden-section search ($x_l = -2$, $x_u = 4$, $\varepsilon_s = 1\%$).
(b) Parabolic interpolation ($x_0 = 1.75$, $x_1 = 2$, $x_2 = 2.5$, iterations = 4). Select new points sequentially as in the secant method.
(c) Newton’s method ($x_0 = 3$, $\varepsilon_s = 1\%$).
13.7 Consider the following function:
$f(x) = -x^4 - 2x^3 - 8x^2 - 5x$
Use analytical and graphical methods to show the function has a maximum for some value of x in the range $-2 \le x \le 1$.

13.8 Employ the following methods to find the maximum of the function from Prob. 13.7:
(a) Golden-section search ($x_l = -2$, $x_u = 1$, $\varepsilon_s = 1\%$).
(b) Parabolic interpolation ($x_0 = -2$, $x_1 = -1$, $x_2 = 1$, iterations = 4). Select new points sequentially as in the secant method.
(c) Newton’s method ($x_0 = -1$, $\varepsilon_s = 1\%$).

13.9 Consider the following function:
$f(x) = 2x + \dfrac{3}{x}$
Perform 10 iterations of parabolic interpolation to locate the minimum. Select new points in the same fashion as in Example 13.2. Comment on the convergence of your results. ($x_0 = 0.1$, $x_1 = 0.5$, $x_2 = 5$)

13.10 Consider the following function:
$f(x) = 3 + 6x + 5x^2 + 3x^3 + 4x^4$
Locate the minimum by finding the root of the derivative of this function. Use bisection with initial guesses of $x_l = -2$ and $x_u = 1$.

13.11 Determine the minimum of the function from Prob. 13.10 with the following methods:
(a) Newton’s method ($x_0 = -1$, $\varepsilon_s = 1\%$).
(b) Newton’s method, but using a finite-difference approximation for the derivative estimates,
$f'(x) = \dfrac{f(x_i + \delta x_i) - f(x_i - \delta x_i)}{2\,\delta x_i}$
$f''(x) = \dfrac{f(x_i + \delta x_i) - 2f(x_i) + f(x_i - \delta x_i)}{(\delta x_i)^2}$
where $\delta$ = a perturbation fraction (= 0.01). Use an initial guess of $x_0 = -1$ and iterate to $\varepsilon_s = 1\%$.

13.12 Develop a program using a programming or macro language to implement the golden-section search algorithm. Design the program expressly to locate a maximum. The subroutine should have the following features:
  • 386.
• Iterate until the relative error falls below a stopping criterion or exceeds a maximum number of iterations.
• Return both the optimal x and f(x).
• Minimize the number of function evaluations.
Test your program with the same problem as Example 13.1.

13.13 Develop a program as described in Prob. 13.12, but make it perform minimization or maximization depending on the user’s preference.

13.14 Develop a program using a programming or macro language to implement the parabolic interpolation algorithm. Design the program expressly to locate a maximum, selecting new points as in Example 13.2. The subroutine should have the following features:
• Base it on two initial guesses, and have the program generate the third initial value at the midpoint of the interval.
• Check whether the guesses bracket a maximum. If not, the subroutine should not implement the algorithm, but should return an error message.
• Iterate until the relative error falls below a stopping criterion or exceeds a maximum number of iterations.
• Return both the optimal x and f(x).
• Minimize the number of function evaluations.
Test your program with the same problem as Example 13.2.

13.15 Develop a program using a programming or macro language to implement Newton’s method. The subroutine should have the following features:
• Iterate until the relative error falls below a stopping criterion or exceeds a maximum number of iterations.
• Return both the optimal x and f(x).
Test your program with the same problem as Example 13.3.

13.16 Pressure measurements are taken at certain points behind an airfoil over time. These data best fit the curve $y = 6\cos x - 1.5\sin x$ from x = 0 to 6 s. Use four iterations of the golden-section search to find the minimum pressure. Set $x_l = 2$ and $x_u = 4$.

13.17 The trajectory of a ball can be computed with
$y = (\tan\theta_0)\,x - \dfrac{g}{2v_0^2\cos^2\theta_0}\,x^2 + y_0$
where y = the height (m), $\theta_0$ = the initial angle (radians), $v_0$ = the initial velocity (m/s), g = the gravitational constant = 9.81 m/s², and $y_0$ = the initial height (m). Use the golden-section search to determine the maximum height given $y_0 = 1$ m, $v_0 = 25$ m/s, and $\theta_0 = 50°$. Iterate until the approximate error falls below $\varepsilon_s = 1\%$ using initial guesses of $x_l = 0$ and $x_u = 60$ m.

13.18 The deflection of a uniform beam subject to a linearly increasing distributed load can be computed as
$y = \dfrac{w_0}{120EIL}\left(-x^5 + 2L^2x^3 - L^4x\right)$
Given that L = 600 cm, E = 50,000 kN/cm², I = 30,000 cm⁴, and $w_0$ = 2.5 kN/cm, determine the point of maximum deflection (a) graphically, (b) using the golden-section search until the approximate error falls below $\varepsilon_s = 1\%$ with initial guesses of $x_l = 0$ and $x_u = L$.

13.19 An object with a mass of 100 kg is projected upward from the surface of the earth at a velocity of 50 m/s. If the object is subject to linear drag (c = 15 kg/s), use the golden-section search to determine the maximum height the object attains. Hint: recall Sec. PT4.1.2.

13.20 The normal distribution is a bell-shaped curve defined by
$y = e^{-x^2}$
Use the golden-section search to determine the location of the inflection point of this curve for positive x.

13.21 An object can be projected upward at a specified velocity. If it is subject to linear drag, its altitude as a function of time can be computed as
$z = z_0 + \dfrac{m}{c}\left(v_0 + \dfrac{mg}{c}\right)\left(1 - e^{-(c/m)t}\right) - \dfrac{mg}{c}\,t$
where z = altitude (m) above the earth’s surface (defined as z = 0), $z_0$ = the initial altitude (m), m = mass (kg), c = a linear drag coefficient (kg/s), $v_0$ = initial velocity (m/s), and t = time (s). Note that for this formulation, positive velocity is considered to be in the upward direction. Given the following parameter values: g = 9.81 m/s², $z_0$ = 100 m, $v_0$ = 55 m/s, m = 80 kg, and c = 15 kg/s, the equation can be used to calculate the jumper’s altitude. Determine the time and altitude of the peak elevation (a) graphically, (b) analytically, and (c) with the golden-section search until the approximate error falls below $\varepsilon_s = 1\%$ with initial guesses of $t_l = 0$ and $t_u = 10$ s.

13.22 Use the golden-section search to determine the length of the shortest ladder that reaches from the ground over the fence to touch the building’s wall (Fig. P13.22). Test it for the case where h = d = 4 m.

FIGURE P13.22 A ladder leaning against a fence and just touching a wall. (The figure labels the fence height h and the distance d between the fence and the wall.)
  • 387. CHAPTER 14
Multidimensional Unconstrained Optimization

This chapter describes techniques to find the minimum or maximum of a function of several variables. Recall from Chap. 13 that our visual image of a one-dimensional search was like a roller coaster. For two-dimensional cases, the image becomes that of mountains and valleys (Fig. 14.1). For higher-dimensional problems, convenient images are not possible.

We have chosen to limit this chapter to the two-dimensional case. We have adopted this approach because the essential features of multidimensional searches are often best communicated visually.

Techniques for multidimensional unconstrained optimization can be classified in a number of ways. For purposes of the present discussion, we will divide them depending on whether they require derivative evaluation. The approaches that do not require derivative evaluation are called nongradient, or direct, methods. Those that require derivatives are called gradient, or descent (or ascent), methods.

FIGURE 14.1 The most tangible way to visualize two-dimensional searches is in the context of ascending a mountain (maximization) or descending into a valley (minimization). (a) A 2-D topographic map, showing lines of constant f, that corresponds to the 3-D mountain in (b).
  • 388. 14.1 DIRECT METHODS

These methods vary from simple brute force approaches to more elegant techniques that attempt to exploit the nature of the function. We will start our discussion with a brute force approach.

14.1.1 Random Search

A simple example of a brute force approach is the random search method. As the name implies, this method repeatedly evaluates the function at randomly selected values of the independent variables. If a sufficient number of samples are conducted, the optimum will eventually be located.

EXAMPLE 14.1 Random Search Method

Problem Statement. Use a random number generator to locate the maximum of
$$f(x, y) = y - x - 2x^2 - 2xy - y^2 \tag{E14.1.1}$$
in the domain bounded by x = −2 to 2 and y = 1 to 3. The domain is depicted in Fig. 14.2. Notice that a single maximum of 1.25 occurs at x = −1 and y = 1.5.

Solution. Random number generators typically generate values between 0 and 1. If we designate such a number as r, the following formula can be used to generate x values randomly within a range from $x_l$ to $x_u$:
$x = x_l + (x_u - x_l)\,r$
For the present application, $x_l = -2$ and $x_u = 2$, and the formula is
$x = -2 + (2 - (-2))\,r = -2 + 4r$
This can be tested by substituting 0 and 1 to yield −2 and 2, respectively.

FIGURE 14.2 Equation (E14.1.1) showing the maximum at x = −1 and y = 1.5.
  • 389. Similarly for y, a formula for the present example could be developed as
$y = y_l + (y_u - y_l)\,r = 1 + (3 - 1)\,r = 1 + 2r$
The following Excel VBA macrocode uses the VBA random number function Rnd to generate (x, y) pairs. These are then substituted into Eq. (E14.1.1). The maximum value from among these random trials is stored in the variable maxf, and the corresponding x and y values in maxx and maxy, respectively.

    maxf = -1E9                    ' start below any attainable value
    For j = 1 To n
      x = -2 + 4 * Rnd             ' random x in [-2, 2)
      y = 1 + 2 * Rnd              ' random y in [1, 3)
      fn = y - x - 2 * x ^ 2 - 2 * x * y - y ^ 2
      If fn > maxf Then            ' keep the best trial so far
        maxf = fn
        maxx = x
        maxy = y
      End If
    Next j

A number of iterations yields

    Iterations       x         y       f(x, y)
       1000      −0.9886    1.4282    1.2462
       2000      −1.0040    1.4724    1.2490
       3000      −1.0040    1.4724    1.2490
       4000      −1.0040    1.4724    1.2490
       5000      −1.0040    1.4724    1.2490
       6000      −0.9837    1.4936    1.2496
       7000      −0.9960    1.5079    1.2498
       8000      −0.9960    1.5079    1.2498
       9000      −0.9960    1.5079    1.2498
      10000      −0.9978    1.5039    1.2500

The results indicate that the technique homes in on the true maximum.

This simple brute force approach works even for discontinuous and nondifferentiable functions. Furthermore, it always finds the global optimum rather than a local optimum. Its major shortcoming is that as the number of independent variables grows, the implementation effort required can become onerous. In addition, it is not efficient because it takes no account of the behavior of the underlying function. The remainder of the approaches described in this chapter do take function behavior into account as well as the results of previous trials to improve the speed of convergence. Thus, although the random search can certainly prove useful in specific problem contexts, the following methods have more general utility and almost always lead to more efficient convergence.
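The random search of Example 14.1 can also be sketched in Python. This is only an illustrative translation under stated assumptions: the function name `random_search`, its arguments, and the fixed seed are choices made here for repeatability and are not part of the original macro.

```python
import random


def f(x, y):
    """Objective function of Eq. (E14.1.1)."""
    return y - x - 2 * x**2 - 2 * x * y - y**2


def random_search(func, xl, xu, yl, yu, n, seed=0):
    """Return (best_x, best_y, best_f) after n uniformly random trials.

    The seed argument is an assumption added here so runs are repeatable.
    """
    rng = random.Random(seed)
    best_x = best_y = None
    best_f = float("-inf")
    for _ in range(n):
        x = xl + (xu - xl) * rng.random()  # x = xl + (xu - xl) r
        y = yl + (yu - yl) * rng.random()  # y = yl + (yu - yl) r
        fn = func(x, y)
        if fn > best_f:  # keep the best trial found so far
            best_x, best_y, best_f = x, y, fn
    return best_x, best_y, best_f


best_x, best_y, best_f = random_search(f, -2.0, 2.0, 1.0, 3.0, 10000)
print(best_x, best_y, best_f)
```

Because the samples are random, the exact digits depend on the generator and seed, but the best trial should lie close to x = −1, y = 1.5 with a function value just under 1.25.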
  • 390. It should be noted that more sophisticated search techniques are available. These are heuristic approaches that were developed to handle nonlinear and/or discontinuous problems that classical optimization cannot usually handle well, if at all. Simulated annealing, tabu search, artificial neural networks, and genetic algorithms are a few. The most widely applied is the genetic algorithm, with a number of commercial packages available. Holland (1975) pioneered the genetic algorithm approach and Davis (1991) and Goldberg (1989) provide good overviews of the theory and application of the method.

14.1.2 Univariate and Pattern Searches

It is very appealing to have an efficient optimization approach that does not require evaluation of derivatives. The random search method described above does not require derivative evaluation, but it is not very efficient. This section describes an approach, the univariate search method, that is more efficient and still does not require derivative evaluation.

The basic strategy underlying the univariate search method is to change one variable at a time to improve the approximation while the other variables are held constant. Since only one variable is changed, the problem reduces to a sequence of one-dimensional searches that can be solved using a variety of methods (including those described in Chap. 13).

Let us perform a univariate search graphically, as shown in Fig. 14.3. Start at point 1, and move along the x axis with y constant to the maximum at point 2. You can see that point 2 is a maximum by noticing that the trajectory along the x axis just touches a contour line at the point. Next, move along the y axis with x constant to point 3. Continue this process generating points 4, 5, 6, etc.

FIGURE 14.3 A graphical depiction of how a univariate search is conducted.
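The univariate strategy described above can be sketched as alternating one-dimensional maximizations. The following Python sketch is a minimal illustration, assuming a golden-section line search on fixed bounds and using the function of Eq. (E14.1.1) as a test case; the names `golden_max` and `univariate_search`, the tolerance, and the number of cycles are hypothetical choices, not taken from the text.

```python
import math


def golden_max(g, lo, hi, tol=1e-6):
    """Golden-section search for the maximum of a unimodal g on [lo, hi]."""
    r = (math.sqrt(5) - 1) / 2
    x1 = lo + r * (hi - lo)  # upper interior point
    x2 = hi - r * (hi - lo)  # lower interior point
    f1, f2 = g(x1), g(x2)
    while hi - lo > tol:
        if f1 > f2:  # maximum lies in [x2, hi]
            lo = x2
            x2, f2 = x1, f1
            x1 = lo + r * (hi - lo)
            f1 = g(x1)
        else:  # maximum lies in [lo, x1]
            hi = x1
            x1, f1 = x2, f2
            x2 = hi - r * (hi - lo)
            f2 = g(x2)
    return 0.5 * (lo + hi)


def univariate_search(f, x, y, x_bounds, y_bounds, cycles=30):
    """Alternate 1-D searches along x (y fixed) and along y (x fixed)."""
    for _ in range(cycles):
        x = golden_max(lambda t: f(t, y), *x_bounds)
        y = golden_max(lambda t: f(x, t), *y_bounds)
    return x, y


def f(x, y):
    """Eq. (E14.1.1), used here only as a convenient test function."""
    return y - x - 2 * x**2 - 2 * x * y - y**2


x_opt, y_opt = univariate_search(f, 0.0, 1.0, (-2.0, 2.0), (1.0, 3.0))
print(x_opt, y_opt)
```

For this smooth, concave function the iterates approach x = −1, y = 1.5. On a long, narrow ridge the same loop would need many more cycles, which is the inefficiency that pattern directions are meant to address.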
  • 391. Although we are gradually moving toward the maximum, the search becomes less efficient as we move along the narrow ridge toward the maximum. However, also note that lines joining alternate points such as 1-3, 3-5 or 2-4, 4-6 point in the general direction of the maximum. These trajectories present an opportunity to shoot directly along the ridge toward the maximum. Such trajectories are called pattern directions.

Formal algorithms are available that capitalize on the idea of pattern directions to find optimum values efficiently. The best known of these algorithms is called Powell’s method. It is based on the observation (see Fig. 14.4) that if points 1 and 2 are obtained by one-dimensional searches in the same direction but from different starting points, then the line formed by 1 and 2 will be directed toward the maximum. Such lines are called conjugate directions.

In fact, it can be proved that if f(x, y) is a quadratic function, sequential searches along conjugate directions will converge exactly in a finite number of steps regardless of the starting point. Since a general nonlinear function can often be reasonably approximated by a quadratic function, methods based on conjugate directions are usually quite efficient and are in fact quadratically convergent as they approach the optimum.

Let us graphically implement a simplified version of Powell’s method to find the maximum of
$f(x, y) = c - (x - a)^2 - (y - b)^2$
where a, b, and c are positive constants. This equation results in circular contours in the x, y plane, as shown in Fig. 14.5.

Initiate the search at point 0 with starting directions $h_1$ and $h_2$. Note that $h_1$ and $h_2$ are not necessarily conjugate directions. From point 0, move along $h_1$ until a maximum is located

FIGURE 14.4 Conjugate directions.
  • 392. at point 1. Then search from point 1 along direction $h_2$ to find point 2. Next, form a new search direction $h_3$ through points 0 and 2. Search along this direction until the maximum at point 3 is located. Then search from point 3 in the $h_2$ direction until the maximum at point 4 is located. From point 4 arrive at point 5 by again searching along $h_3$. Now, observe that both points 5 and 3 have been located by searching in the $h_3$ direction from two different points. Powell has shown that $h_4$ (formed by points 3 and 5) and $h_3$ are conjugate directions. Thus, searching from point 5 along $h_4$ brings us directly to the maximum.

Powell’s method can be refined to make it more efficient, but the formal algorithms are beyond the scope of this text. However, it is an efficient method that is quadratically convergent without requiring derivative evaluation.

FIGURE 14.5 Powell’s method.

14.2 GRADIENT METHODS

As the name implies, gradient methods explicitly use derivative information to generate efficient algorithms to locate optima. Before describing specific approaches, we must first review some key mathematical concepts and operations.

14.2.1 Gradients and Hessians

Recall from calculus that the first derivative of a one-dimensional function provides a slope or tangent to the function being differentiated. From the standpoint of optimization, this is useful information. For example, if the slope is positive, it tells us that increasing the independent variable will lead to a higher value of the function we are exploring.

From calculus, also recall that the first derivative may tell us when we have reached an optimal value since this is the point at which the derivative goes to zero. Further, the sign of the second derivative can tell us whether we have reached a minimum (positive second derivative) or a maximum (negative second derivative).
  • 393. These ideas were useful to us in the one-dimensional search algorithms we explored in Chap. 13. However, to fully understand multidimensional searches, we must first understand how the first and second derivatives are expressed in a multidimensional context.

The Gradient. Suppose we have a two-dimensional function f(x, y). An example might be your elevation on a mountain as a function of your position. Suppose that you are at a specific location on the mountain (a, b) and you want to know the slope in an arbitrary direction. One way to define the direction is along a new axis h that forms an angle $\theta$ with the x axis (Fig. 14.6). The elevation along this new axis can be thought of as a new function g(h). If you define your position as being the origin of this axis (that is, h = 0), the slope in this direction would be designated as $g'(0)$. This slope, which is called the directional derivative, can be calculated from the partial derivatives along the x and y axes by
$$g'(0) = \frac{\partial f}{\partial x}\cos\theta + \frac{\partial f}{\partial y}\sin\theta \tag{14.1}$$
where the partial derivatives are evaluated at x = a and y = b.

Assuming that your goal is to gain the most elevation with the next step, the next logical question would be: what direction is the steepest ascent? The answer to this question is provided very neatly by what is referred to mathematically as the gradient, which is defined as
$$\nabla f = \frac{\partial f}{\partial x}\,\mathbf{i} + \frac{\partial f}{\partial y}\,\mathbf{j} \tag{14.2}$$
This vector is also referred to as “del f.” It points in the direction of steepest ascent of f(x, y) at the point x = a and y = b, and its magnitude is the directional derivative in that direction.

FIGURE 14.6 The directional gradient is defined along an axis h that forms an angle $\theta$ with the x axis.
  • 394. Vector notation provides a concise means to generalize the gradient to n dimensions, as
$$\nabla f(\mathbf{x}) = \begin{bmatrix} \dfrac{\partial f}{\partial x_1}(\mathbf{x}) \\ \dfrac{\partial f}{\partial x_2}(\mathbf{x}) \\ \vdots \\ \dfrac{\partial f}{\partial x_n}(\mathbf{x}) \end{bmatrix}$$
How do we use the gradient? For the mountain-climbing problem, if we are interested in gaining elevation as quickly as possible, the gradient tells us what direction to move locally and how much we will gain by taking it. Note, however, that this strategy does not necessarily take us on a direct path to the summit! We will discuss these ideas in more depth later in this chapter.

EXAMPLE 14.2 Using the Gradient to Evaluate the Path of Steepest Ascent

Problem Statement. Employ the gradient to evaluate the steepest ascent direction for the function
$f(x, y) = xy^2$
at the point (2, 2). Assume that positive x is pointed east and positive y is pointed north.

Solution. First, our elevation can be determined as
$f(2, 2) = 2(2)^2 = 8$
Next, the partial derivatives can be evaluated,
$\dfrac{\partial f}{\partial x} = y^2 = 2^2 = 4$
$\dfrac{\partial f}{\partial y} = 2xy = 2(2)(2) = 8$
which can be used to determine the gradient as
$\nabla f = 4\mathbf{i} + 8\mathbf{j}$
This vector can be sketched on a topographical map of the function, as in Fig. 14.7. This immediately tells us that the direction we must take is
$\theta = \tan^{-1}\left(\dfrac{8}{4}\right) = 1.107 \text{ radians } (= 63.4°)$
relative to the x axis. The slope in this direction, which is the magnitude of $\nabla f$, can be calculated as
$\sqrt{4^2 + 8^2} = 8.944$
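The numbers in Example 14.2 can be checked with a few lines of Python. This is a small sketch, assuming the analytical partial derivatives derived above; the helper names `grad_f` and `directional_derivative` are introduced here for illustration only.

```python
import math


def grad_f(x, y):
    """Analytical gradient of f(x, y) = x*y**2: (df/dx, df/dy)."""
    return y**2, 2 * x * y


def directional_derivative(gx, gy, theta):
    """Eq. (14.1): slope along an axis at angle theta to the x axis."""
    return gx * math.cos(theta) + gy * math.sin(theta)


gx, gy = grad_f(2.0, 2.0)      # components of the gradient at (2, 2)
theta = math.atan2(gy, gx)     # steepest-ascent direction, radians
slope = math.hypot(gx, gy)     # magnitude of the gradient

print(theta, math.degrees(theta), slope)
print(directional_derivative(gx, gy, theta))      # equals the magnitude
print(directional_derivative(gx, gy, theta / 2))  # any other direction is flatter
```

Evaluating Eq. (14.1) at the steepest-ascent angle reproduces the gradient magnitude, while any other angle gives a smaller slope.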
  • 395. Thus, during our first step, we will initially gain 8.944 units of elevation rise for a unit distance advanced along this steepest path. Observe that Eq. (14.1) yields the same result,
$g'(0) = 4\cos(1.107) + 8\sin(1.107) = 8.944$
Note that for any other direction, say $\theta = 1.107/2 = 0.5535$, $g'(0) = 4\cos(0.5535) + 8\sin(0.5535) = 7.608$, which is smaller.

As we move forward, both the direction and magnitude of the steepest path will change. These changes can be quantified at each step using the gradient, and your climbing direction modified accordingly.

A final insight can be gained by inspecting Fig. 14.7. As indicated, the direction of steepest ascent is perpendicular, or orthogonal, to the elevation contour at the coordinate (2, 2). This is a general characteristic of the gradient.

FIGURE 14.7 The arrow follows the direction of steepest ascent calculated with the gradient.

Aside from defining a steepest path, the first derivative can also be used to discern whether an optimum has been reached. As is the case for a one-dimensional function, if the partial derivatives with respect to both x and y are zero, a two-dimensional optimum has been reached.

The Hessian. For one-dimensional problems, both the first and second derivatives provide valuable information for searching out optima. The first derivative (a) provides a steepest trajectory of the function and (b) tells us that we have reached an optimum. Once at an optimum, the second derivative tells us whether we are at a maximum [negative $f''(x)$] or a minimum [positive $f''(x)$].

  • 396. In the previous paragraphs, we illustrated how the gradient provides best local trajectories for multidimensional problems. Now, we will examine how the second derivative is used in such contexts.

You might expect that if the partial second derivatives with respect to both x and y are both negative, then you have reached a maximum. Figure 14.8 shows that this kind of reasoning can fail. The point (a, b) of this graph appears to be a minimum when observed along either the x dimension or the y dimension. In both instances, the second partial derivatives are positive. However, if the function is observed along the line y = x, it can be seen that a maximum occurs at the same point. This shape is called a saddle, and clearly, neither a maximum nor a minimum occurs at the point.

Whether a maximum or a minimum occurs involves not only the second partials with respect to x and y individually but also the mixed second partial with respect to x and y. Assuming that the partial derivatives are continuous at and near the point being evaluated, the following quantity can be computed:
$$|H| = \frac{\partial^2 f}{\partial x^2}\,\frac{\partial^2 f}{\partial y^2} - \left(\frac{\partial^2 f}{\partial x\,\partial y}\right)^2 \tag{14.3}$$
Three cases can occur:
• If $|H| > 0$ and $\partial^2 f/\partial x^2 > 0$, then f(x, y) has a local minimum.
• If $|H| > 0$ and $\partial^2 f/\partial x^2 < 0$, then f(x, y) has a local maximum.
• If $|H| < 0$, then f(x, y) has a saddle point.

FIGURE 14.8 A saddle point (x = a and y = b). Notice that when the curve is viewed along the x and y directions, the function appears to go through a minimum (positive second derivative), whereas when viewed along an axis x = y, it is concave downward (negative second derivative).
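The three cases of Eq. (14.3) translate directly into a small classifier. The sketch below is illustrative only: the function name `classify_stationary_point` is hypothetical, the "inconclusive" branch for $|H| = 0$ is an addition not discussed in the text, and the saddle example $f = x^2 + y^2 - 3xy$ is an assumed test function chosen because both pure second partials are positive while the point is still a saddle.

```python
def classify_stationary_point(fxx, fyy, fxy):
    """Apply the |H| test of Eq. (14.3) at a point where the gradient is zero."""
    det_h = fxx * fyy - fxy**2
    if det_h > 0 and fxx > 0:
        return "local minimum"
    if det_h > 0 and fxx < 0:
        return "local maximum"
    if det_h < 0:
        return "saddle point"
    return "inconclusive"  # |H| = 0: the test gives no answer


# Assumed example f = x**2 + y**2 - 3*x*y at (0, 0):
# fxx = 2 and fyy = 2 are both positive, yet fxy = -3 makes |H| negative.
print(classify_stationary_point(2.0, 2.0, -3.0))

# Eq. (E14.1.1) at its stationary point: fxx = -4, fyy = -2, fxy = -2.
print(classify_stationary_point(-4.0, -2.0, -2.0))
```

The first call shows why checking the pure second partials alone is not enough: along y = x the assumed function reduces to $-x^2$, which is concave downward, just as in Fig. 14.8.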
  • 397. The quantity $|H|$ is equal to the determinant of a matrix made up of the second derivatives,¹
$$H = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x\,\partial y} \\ \dfrac{\partial^2 f}{\partial y\,\partial x} & \dfrac{\partial^2 f}{\partial y^2} \end{bmatrix} \tag{14.4}$$
where this matrix is formally referred to as the Hessian of f.

Besides providing a way to discern whether a multidimensional function has reached an optimum, the Hessian has other uses in optimization (for example, for the multidimensional form of Newton’s method). In particular, it allows searches to include second-order curvature to attain superior results.

Finite-Difference Approximations. It should be mentioned that, for cases where they are difficult or inconvenient to compute analytically, both the gradient and the determinant of the Hessian can be evaluated numerically. In most cases, the approach introduced in Sec. 6.3.3 for the modified secant method is employed. That is, the independent variables can be perturbed slightly to generate the required partial derivatives. For example, if a centered-difference approach is adopted, they can be computed as
$$\frac{\partial f}{\partial x} = \frac{f(x + \delta x, y) - f(x - \delta x, y)}{2\,\delta x} \tag{14.5}$$
$$\frac{\partial f}{\partial y} = \frac{f(x, y + \delta y) - f(x, y - \delta y)}{2\,\delta y} \tag{14.6}$$
$$\frac{\partial^2 f}{\partial x^2} = \frac{f(x + \delta x, y) - 2f(x, y) + f(x - \delta x, y)}{\delta x^2} \tag{14.7}$$
$$\frac{\partial^2 f}{\partial y^2} = \frac{f(x, y + \delta y) - 2f(x, y) + f(x, y - \delta y)}{\delta y^2} \tag{14.8}$$
$$\frac{\partial^2 f}{\partial x\,\partial y} = \frac{f(x + \delta x, y + \delta y) - f(x + \delta x, y - \delta y) - f(x - \delta x, y + \delta y) + f(x - \delta x, y - \delta y)}{4\,\delta x\,\delta y} \tag{14.9}$$
where $\delta$ is some small fractional value. Note that the methods employed in commercial software packages also use forward differences. In addition, they are usually more complicated than the approximations listed in Eqs. (14.5) through (14.9). Dennis and Schnabel (1996) provide more detail on such approaches.

Regardless of how the approximation is implemented, the important point is that you may have the option of evaluating the gradient and/or the Hessian analytically.
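Equations (14.5) through (14.9) can be sketched in Python as follows. One simplification is assumed and should be noted: the text perturbs each variable by a fraction of its value ($\delta x$), whereas this sketch takes absolute step sizes `dx` and `dy` as arguments so that it also works when a coordinate is zero. The function names and default steps are illustrative choices.

```python
def gradient_fd(f, x, y, dx=1e-3, dy=1e-3):
    """Centered-difference gradient, Eqs. (14.5) and (14.6)."""
    dfdx = (f(x + dx, y) - f(x - dx, y)) / (2 * dx)
    dfdy = (f(x, y + dy) - f(x, y - dy)) / (2 * dy)
    return dfdx, dfdy


def hessian_fd(f, x, y, dx=1e-3, dy=1e-3):
    """Centered-difference second partials, Eqs. (14.7) to (14.9)."""
    fxx = (f(x + dx, y) - 2 * f(x, y) + f(x - dx, y)) / dx**2
    fyy = (f(x, y + dy) - 2 * f(x, y) + f(x, y - dy)) / dy**2
    fxy = (f(x + dx, y + dy) - f(x + dx, y - dy)
           - f(x - dx, y + dy) + f(x - dx, y - dy)) / (4 * dx * dy)
    return fxx, fyy, fxy


def f(x, y):
    """Eq. (E14.1.1), whose maximum lies at (-1, 1.5)."""
    return y - x - 2 * x**2 - 2 * x * y - y**2


dfdx, dfdy = gradient_fd(f, -1.0, 1.5)
fxx, fyy, fxy = hessian_fd(f, -1.0, 1.5)
det_h = fxx * fyy - fxy**2  # Eq. (14.3)
print(dfdx, dfdy, fxx, fyy, fxy, det_h)
```

For this quadratic test function the centered differences are exact apart from roundoff, so the gradient comes out essentially zero and $|H|$ is close to 4 with a negative $\partial^2 f/\partial x^2$, which is consistent with a local maximum. Each gradient costs four function evaluations and the three second partials cost nine more, which is the overhead that analytical derivatives avoid.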
This can sometimes be an arduous task, but the performance of the algorithm may benefit enough to make your effort worthwhile.

¹ Note that $\partial^2 f/(\partial x\,\partial y) = \partial^2 f/(\partial y\,\partial x)$.

  • 398. The closed-form derivatives will be exact, but more importantly, you will reduce the number of function evaluations. This latter point can have a critical impact on the execution time.

On the other hand, you will often exercise the option of having the quantities computed internally using numerical approaches. In many cases, the performance will be quite adequate and you will be saved the difficulty of numerous partial differentiations. Such would be the case for the optimizers used in certain spreadsheets and mathematical software packages (for example, Excel). In such cases, you may not even be given the option of entering an analytically derived gradient and Hessian. However, for small to moderately sized problems, this is usually not a major shortcoming.

14.2.2 Steepest Ascent Method

An obvious strategy for climbing a hill would be to determine the maximum slope at your starting position and then start walking in that direction. But clearly, another problem arises almost immediately. Unless you were really lucky and started on a ridge that pointed directly to the summit, as soon as you moved, your path would diverge from the steepest ascent direction.

Recognizing this fact, you might adopt the following strategy. You could walk a short distance along the gradient direction. Then you could stop, reevaluate the gradient, and walk another short distance. By repeating the process you would eventually get to the top of the hill.

Although this strategy seems superficially sound, it is not very practical. In particular, the continuous reevaluation of the gradient can be computationally demanding. A preferred approach involves moving in a fixed path along the initial gradient until f(x, y) stops increasing, that is, becomes level along your direction of travel. This stopping point becomes the starting point where $\nabla f$ is reevaluated and a new direction followed.
The process is repeated until the summit is reached. This approach is called the steepest ascent method.² It is the most straightforward of the gradient search techniques. The basic idea behind the approach is depicted in Fig. 14.9.

We start at an initial point $(x_0, y_0)$ labeled “0” in the figure. At this point, we determine the direction of steepest ascent, that is, the gradient. We then search along the direction of the gradient, $h_0$, until we find a maximum, which is labeled “1” in the figure. The process is then repeated.

Thus, the problem boils down to two parts: (1) determining the “best” direction to search and (2) determining the “best value” along that search direction. As we will see, the effectiveness of the various algorithms described in the coming pages depends on how clever we are at both parts.

For the time being, the steepest ascent method uses the gradient approach as its choice for the “best” direction. We have already shown how the gradient is evaluated in Example 14.2. Now, before examining how the algorithm goes about locating the maximum along the steepest direction, we must pause to explore how to transform a function of x and y into a function of h along the gradient direction.

² Because of our emphasis on maximization here, we use the terminology steepest ascent. The same approach can also be used for minimization, in which case the terminology steepest descent is used.
  • 399. FIGURE 14.9 A graphical depiction of the method of steepest ascent.

FIGURE 14.10 The relationship between an arbitrary direction h and x and y coordinates.

Starting at $x_0$, $y_0$ the coordinates of any point in the gradient direction can be expressed as
$$x = x_0 + \frac{\partial f}{\partial x}\,h \tag{14.10}$$
$$y = y_0 + \frac{\partial f}{\partial y}\,h \tag{14.11}$$
  • 400. where h is distance along the h axis. For example, suppose $x_0 = 1$ and $y_0 = 2$ and $\nabla f = 3\mathbf{i} + 4\mathbf{j}$, as shown in Fig. 14.10. The coordinates of any point along the h axis are given by
$$x = 1 + 3h \tag{14.12}$$
$$y = 2 + 4h \tag{14.13}$$
The following example illustrates how we can use these transformations to convert a two-dimensional function of x and y into a one-dimensional function in h.

EXAMPLE 14.3 Developing a 1-D Function Along the Gradient Direction

Problem Statement. Suppose we have the following two-dimensional function:
$f(x, y) = 2xy + 2x - x^2 - 2y^2$
Develop a one-dimensional version of this equation along the gradient direction at point x = −1 and y = 1.

Solution. The partial derivatives can be evaluated at (−1, 1),
$\dfrac{\partial f}{\partial x} = 2y + 2 - 2x = 2(1) + 2 - 2(-1) = 6$
$\dfrac{\partial f}{\partial y} = 2x - 4y = 2(-1) - 4(1) = -6$
Therefore, the gradient vector is
$\nabla f = 6\mathbf{i} - 6\mathbf{j}$
To find the maximum, we could search along the gradient direction, that is, along an h axis running along the direction of this vector. The function can be expressed along this axis as
$f\left(x_0 + \dfrac{\partial f}{\partial x}h,\; y_0 + \dfrac{\partial f}{\partial y}h\right) = f(-1 + 6h,\; 1 - 6h) = 2(-1 + 6h)(1 - 6h) + 2(-1 + 6h) - (-1 + 6h)^2 - 2(1 - 6h)^2$
where the partial derivatives are evaluated at x = −1 and y = 1. By combining terms, we develop a one-dimensional function g(h) that maps f(x, y) along the h axis,
$g(h) = -180h^2 + 72h - 7$

Now that we have developed a function along the path of steepest ascent, we can explore how to answer the second question. That is, how far along this path do we travel? One approach might be to move along this path until we find the maximum of this function. We will call the location of this maximum h*. This is the value of the step that maximizes g (and hence, f) in the gradient direction. This problem is equivalent to finding the maximum of a function of a single variable h. This can be done using different one-dimensional search techniques like the ones we discussed in Chap. 13. Thus, we
  • 401. convert from finding the optimum of a two-dimensional function to performing a one-dimensional search along the gradient direction. This method is called steepest ascent when an arbitrary step size h is used. If a value of a single step h* is found that brings us directly to the maximum along the gradient direction, the method is called the optimal steepest ascent.

EXAMPLE 14.4 Optimal Steepest Ascent

Problem Statement. Maximize the following function:
$f(x, y) = 2xy + 2x - x^2 - 2y^2$
using initial guesses, x = −1 and y = 1.

Solution. Because this function is so simple, we can first generate an analytical solution. To do this, the partial derivatives can be evaluated as
$\dfrac{\partial f}{\partial x} = 2y + 2 - 2x = 0$
$\dfrac{\partial f}{\partial y} = 2x - 4y = 0$
This pair of equations can be solved for the optimum, x = 2 and y = 1. The second partial derivatives can also be determined and evaluated at the optimum,
$\dfrac{\partial^2 f}{\partial x^2} = -2 \qquad \dfrac{\partial^2 f}{\partial y^2} = -4 \qquad \dfrac{\partial^2 f}{\partial x\,\partial y} = \dfrac{\partial^2 f}{\partial y\,\partial x} = 2$
and the determinant of the Hessian is computed [Eq. (14.3)],
$|H| = -2(-4) - 2^2 = 4$
Therefore, because $|H| > 0$ and $\partial^2 f/\partial x^2 < 0$, the function value f(2, 1) is a maximum.

Now let us implement steepest ascent. Recall that, at the end of Example 14.3, we had already implemented the initial steps of the problem by generating
$g(h) = -180h^2 + 72h - 7$
Now, because this is a simple parabola, we can directly locate the maximum (that is, h = h*) by solving the problem,
$g'(h^*) = 0$
$-360h^* + 72 = 0$
$h^* = 0.2$
This means that if we travel along the h axis, g(h) reaches a maximum value when h = h* = 0.2. This result can be placed back into Eqs. (14.10) and (14.11) to solve for the
  • 402. (x, y) coordinates corresponding to this point,
$x = -1 + 6(0.2) = 0.2$
$y = 1 - 6(0.2) = -0.2$
This step is depicted in Fig. 14.11 as the move from point 0 to 1.

The second step is merely implemented by repeating the procedure. First, the partial derivatives can be evaluated at the new starting point (0.2, −0.2) to give
$\dfrac{\partial f}{\partial x} = 2(-0.2) + 2 - 2(0.2) = 1.2$
$\dfrac{\partial f}{\partial y} = 2(0.2) - 4(-0.2) = 1.2$
Therefore, the gradient vector is
$\nabla f = 1.2\mathbf{i} + 1.2\mathbf{j}$
This means that the steepest direction is now pointed up and to the right at a 45° angle with the x axis (see Fig. 14.11). The coordinates along this new h axis can now be expressed as
$x = 0.2 + 1.2h$
$y = -0.2 + 1.2h$
Substituting these values into the function yields
$f(0.2 + 1.2h,\; -0.2 + 1.2h) = g(h) = -1.44h^2 + 2.88h + 0.2$
The step h* to take us to the maximum along the search direction can then be directly computed as
$g'(h^*) = -2.88h^* + 2.88 = 0$
$h^* = 1$

FIGURE 14.11 The method of optimal steepest ascent.
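The two steps worked so far can be reproduced, and the iteration continued, with a short Python sketch. It relies on an assumption that holds only because this f is quadratic: the Hessian is constant, so setting $g'(h) = 0$ along the gradient direction gives the closed form $h^* = -(\nabla f \cdot \nabla f)/(\nabla f^T H\,\nabla f)$. For a general function a one-dimensional search from Chap. 13 would take the place of `optimal_step`. All function and variable names here are illustrative.

```python
def grad(x, y):
    """Analytical gradient of f(x, y) = 2xy + 2x - x**2 - 2y**2."""
    return 2 * y + 2 - 2 * x, 2 * x - 4 * y


# Constant second partials of this quadratic (see Example 14.4).
FXX, FYY, FXY = -2.0, -4.0, 2.0


def optimal_step(gx, gy):
    """h* that maximizes g(h) along the gradient; valid for this quadratic only."""
    curvature = FXX * gx**2 + 2 * FXY * gx * gy + FYY * gy**2  # g^T H g
    return -(gx**2 + gy**2) / curvature


def steepest_ascent(x, y, iterations):
    """Return the list of points visited, starting with (x, y)."""
    path = [(x, y)]
    for _ in range(iterations):
        gx, gy = grad(x, y)
        if gx == 0 and gy == 0:  # already at a stationary point
            break
        h = optimal_step(gx, gy)
        x, y = x + gx * h, y + gy * h  # Eqs. (14.10) and (14.11)
        path.append((x, y))
    return path


path = steepest_ascent(-1.0, 1.0, 40)
print(path[1], path[2], path[-1])
```

The first two entries after the starting point match the hand calculation, (0.2, −0.2) and (1.4, 1), and the later points zigzag toward the analytical optimum at (2, 1), with each new direction perpendicular to the previous one.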
  • 403. This result can be placed back into Eqs. (14.10) and (14.11) to solve for the (x, y) coordinates corresponding to this new point,

$$x = 0.2 + 1.2(1) = 1.4$$
$$y = -0.2 + 1.2(1) = 1$$

As depicted in Fig. 14.11, we move to the new coordinates, labeled point 2 in the plot, and in so doing move closer to the maximum. The approach can be repeated with the final result converging on the analytical solution, $x = 2$ and $y = 1$.

It can be shown that the method of steepest descent is linearly convergent. Further, it tends to move very slowly along long, narrow ridges. This is because the new gradient at each maximum point will be perpendicular to the original direction. Thus, the technique takes many small steps criss-crossing the direct route to the summit. Hence, although it is reliable, there are other approaches that converge much more rapidly, particularly in the vicinity of an optimum. The remainder of the section is devoted to such methods.

14.2.3 Advanced Gradient Approaches

Conjugate Gradient Method (Fletcher-Reeves). In Sec. 14.1.2, we saw how conjugate directions in Powell’s method greatly improved the efficiency of a univariate search. In a similar manner, we can also improve the linearly convergent steepest ascent using conjugate gradients. In fact, an optimization method that makes use of conjugate gradients to define search directions can be shown to be quadratically convergent. This also ensures that the method will optimize a quadratic function exactly in a finite number of steps regardless of the starting point. Since most well-behaved functions can be approximated reasonably well by a quadratic in the vicinity of an optimum, quadratically convergent approaches are often very efficient near an optimum.

We have seen how, starting with two arbitrary search directions, Powell’s method produced new conjugate search directions. This method is quadratically convergent and does not require gradient information. On the other hand, if evaluation of derivatives is practical, we can devise algorithms that combine the ideas of steepest descent and conjugate directions to achieve robust initial performance and rapid convergence as the technique gravitates toward the optimum. The Fletcher-Reeves conjugate gradient algorithm modifies the steepest-ascent method by imposing the condition that successive gradient search directions be mutually conjugate. The proof and algorithm are beyond the scope of the text but are described by Rao (1996).

Newton’s Method. Newton’s method for a single variable (recall Sec. 13.3) can be extended to multivariate cases. Write a second-order Taylor series for $f(x)$ near $x = x_i$,

$$f(x) = f(x_i) + \nabla f^T(x_i)(x - x_i) + \frac{1}{2}(x - x_i)^T H_i (x - x_i)$$

where $H_i$ is the Hessian matrix. At the minimum,

$$\frac{\partial f(x)}{\partial x_j} = 0 \qquad \text{for } j = 1, 2, \ldots, n$$
  • 404. Thus,

$$\nabla f = \nabla f(x_i) + H_i(x - x_i) = 0$$

If H is nonsingular,

$$x_{i+1} = x_i - H_i^{-1}\nabla f \tag{14.14}$$

which can be shown to converge quadratically near the optimum. This method again performs better than the steepest ascent method (see Fig. 14.12). However, note that the method requires both the computation of second derivatives and matrix inversion at each iteration. Thus, the method is not very useful in practice for functions with large numbers of variables. Furthermore, Newton’s method may not converge if the starting point is not close to the optimum.

FIGURE 14.12 When the starting point is close to the optimal point, following the gradient can be inefficient. Newton methods attempt to search along a direct path to the optimum (solid line). (Axes: x, y.)

Marquardt Method. We know that the method of steepest ascent increases the function value even if the starting point is far from an optimum. On the other hand, we have just described Newton’s method, which converges rapidly near the maximum. Marquardt’s method uses the steepest descent method when x is far from x*, and Newton’s method when x closes in on an optimum. This is accomplished by modifying the diagonal of the Hessian in Eq. (14.14),

$$\tilde{H}_i = H_i + \alpha_i I$$

where $\alpha_i$ is a positive constant and $I$ is the identity matrix. At the start of the procedure, $\alpha_i$ is assumed to be large and

$$\tilde{H}_i^{-1} \approx \frac{1}{\alpha_i} I$$
  • 405. which reduces Eq. (14.14) to the steepest ascent method. As the iterations proceed, $\alpha_i$ approaches zero and the method becomes Newton’s method.

Thus, Marquardt’s method offers the best of both worlds: it plods along reliably from poor initial starting values yet accelerates rapidly when it approaches the optimum. Unfortunately, the method still requires Hessian evaluation and matrix inversion at each step. It should be noted that the Marquardt method is primarily used for nonlinear least-squares problems.

Quasi-Newton Methods. Quasi-Newton, or variable metric, methods seek to estimate the direct path to the optimum in a manner similar to Newton’s method. However, notice that the Hessian matrix in Eq. (14.14) is composed of the second derivatives of f that vary from step to step. Quasi-Newton methods attempt to avoid these difficulties by approximating H with another matrix A using only first partial derivatives of f. The approach involves starting with an initial approximation of $H^{-1}$ and updating and improving it with each iteration. The methods are called quasi-Newton because we do not use the true Hessian, but rather an approximation. Thus, we have two approximations at work simultaneously: (1) the original Taylor-series approximation and (2) the Hessian approximation.

There are two primary methods of this type: the Davidon-Fletcher-Powell (DFP) and the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithms. They are similar except for details concerning how they handle round-off error and convergence issues. BFGS is generally recognized as being superior in most cases. Rao (1996) provides details and formal statements of both the DFP and the BFGS algorithms.

PROBLEMS

14.1 Find the directional derivative of $f(x, y) = x^2 + 2y^2$ at $x = 2$ and $y = 2$ in the direction of $h = 2\mathbf{i} + 3\mathbf{j}$.

14.2 Repeat Example 14.2 for the following function at the point (0.8, 1.2).
$$f(x, y) = 2xy + 1.5y - 1.25x^2 - 2y^2 + 5$$

14.3 Given

$$f(x, y) = 2.25xy + 1.75y - 1.5x^2 - 2y^2$$

Construct and solve a system of linear algebraic equations that maximizes f(x). Note that this is done by setting the partial derivatives of f with respect to both x and y to zero.

14.4 (a) Start with an initial guess of $x = 1$ and $y = 1$ and apply two applications of the steepest ascent method to f(x, y) from Prob. 14.3. (b) Construct a plot from the results of (a) showing the path of the search.

14.5 Find the gradient vector and Hessian matrix for each of the following functions:
(a) $f(x, y) = 2xy^2 + 3e^{xy}$
(b) $f(x, y, z) = x^2 + y^2 + 2z^2$
(c) $f(x, y) = \ln(x^2 + 2xy + 3y^2)$

14.6 Find the minimum value of

$$f(x, y) = (x - 3)^2 + (y - 2)^2$$

starting at $x = 1$ and $y = 1$, using the steepest descent method with a stopping criterion of $\varepsilon_s = 1\%$. Explain your results.

14.7 Perform one iteration of the steepest ascent method to locate the maximum of

$$f(x, y) = 4x + 2y + x^2 - 2x^4 + 2xy - 3y^2$$

using initial guesses $x = 0$ and $y = 0$. Employ bisection to find the optimal step size in the gradient search direction.

14.8 Perform one iteration of the optimal gradient steepest descent method to locate the minimum of

$$f(x, y) = -8x + x^2 + 12y + 4y^2 - 2xy$$

using initial guesses $x = 0$ and $y = 0$.

14.9 Develop a program using a programming or macro language to implement the random search method. Design the subprogram so that it is expressly designed to locate a maximum. Test the program with f(x, y) from Prob. 14.7. Use a range of -2 to 2 for both x and y.
  • 406. 14.10 The grid search is another brute force approach to optimization. The two-dimensional version is depicted in Fig. P14.10. The x and y dimensions are divided into increments to create a grid. The function is then evaluated at each node of the grid. The denser the grid, the more likely it would be to locate the optimum. Develop a program using a programming or macro language to implement the grid search method. Design the program so that it is expressly designed to locate a maximum. Test it with the same problem as Example 14.1.

14.11 Develop a one-dimensional equation in the pressure gradient direction at the point (4, 2). The pressure function is

$$f(x, y) = 6x^2y - 9y^2 - 8x^2$$

14.12 A temperature function is

$$f(x, y) = 2x^3y^2 - 7xy - x^2 + 3y$$

Develop a one-dimensional function in the temperature gradient direction at the point (1, 1).

FIGURE P14.10 The grid search. (Axes: x, y; contour plot with the maximum marked.)
  • 407. CHAPTER 15

Constrained Optimization

This chapter deals with optimization problems where constraints come into play. We first discuss problems where both the objective function and the constraints are linear. For such cases, special methods are available that exploit the linearity of the underlying functions. Called linear programming methods, the resulting algorithms solve very large problems with thousands of variables and constraints with great efficiency. They are used in a wide range of problems in engineering and management.

Then we will turn briefly to the more general problem of nonlinear constrained optimization. Finally, we provide an overview of how software packages can be employed for optimization.

15.1 LINEAR PROGRAMMING

Linear programming (LP) is an optimization approach that deals with meeting a desired objective such as maximizing profit or minimizing cost in the presence of constraints such as limited resources. The term linear connotes that the mathematical functions representing both the objective and the constraints are linear. The term programming does not mean “computer programming,” but rather, connotes “scheduling” or “setting an agenda” (Revelle et al., 1997).

15.1.1 Standard Form

The basic linear programming problem consists of two major parts: the objective function and a set of constraints. For a maximization problem, the objective function is generally expressed as

$$\text{Maximize } Z = c_1x_1 + c_2x_2 + \cdots + c_nx_n \tag{15.1}$$

where $c_j$ = payoff of each unit of the jth activity that is undertaken and $x_j$ = magnitude of the jth activity. Thus, the value of the objective function, Z, is the total payoff due to the total number of activities, n.

The constraints can be represented generally as

$$a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{in}x_n \le b_i \tag{15.2}$$
  • 408. where $a_{ij}$ = amount of the ith resource that is consumed for each unit of the jth activity and $b_i$ = amount of the ith resource that is available. That is, the resources are limited.

The second general type of constraint specifies that all activities must have a nonnegative value,

$$x_i \ge 0 \tag{15.3}$$

In the present context, this expresses the realistic notion that, for some problems, negative activity is physically impossible (for example, we cannot produce negative goods).

Together, the objective function and the constraints specify the linear programming problem. They say that we are trying to maximize the payoff for a number of activities under the constraint that these activities utilize finite amounts of resources. Before showing how this result can be obtained, we will first develop an example.

EXAMPLE 15.1 Setting Up the LP Problem

Problem Statement. The following problem is developed from the area of chemical or petroleum engineering. However, it is relevant to all areas of engineering that deal with producing products with limited resources.

Suppose that a gas-processing plant receives a fixed amount of raw gas each week. The raw gas is processed into two grades of heating gas, regular and premium quality. These grades of gas are in high demand (that is, they are guaranteed to sell) and yield different profits to the company. However, their production involves both time and on-site storage constraints. For example, only one of the grades can be produced at a time, and the facility is open for only 80 hr/week. Further, there is limited on-site storage for each of the products.
All these factors are listed below (note that a metric ton, or tonne, is equal to 1000 kg):

| Resource | Product: Regular | Product: Premium | Resource Availability |
|---|---|---|---|
| Raw gas | 7 m³/tonne | 11 m³/tonne | 77 m³/week |
| Production time | 10 hr/tonne | 8 hr/tonne | 80 hr/week |
| Storage | 9 tonnes | 6 tonnes | |
| Profit | 150/tonne | 175/tonne | |

Develop a linear programming formulation to maximize the profits for this operation.

Solution. The engineer operating this plant must decide how much of each gas to produce to maximize profits. If the amounts of regular and premium produced weekly are designated as $x_1$ and $x_2$, respectively, the total weekly profit can be calculated as

$$\text{Total profit} = 150x_1 + 175x_2$$

or written as a linear programming objective function,

$$\text{Maximize } Z = 150x_1 + 175x_2$$
  • 409. The constraints can be developed in a similar fashion. For example, the total raw gas used can be computed as

$$\text{Total gas used} = 7x_1 + 11x_2$$

This total cannot exceed the available supply of 77 m³/week, so the constraint can be represented as

$$7x_1 + 11x_2 \le 77$$

The remaining constraints can be developed in a similar fashion, with the resulting total LP formulation given by

$$
\begin{array}{ll}
\text{Maximize } Z = 150x_1 + 175x_2 & \text{(maximize profit)} \\
\text{subject to} & \\
7x_1 + 11x_2 \le 77 & \text{(material constraint)} \\
10x_1 + 8x_2 \le 80 & \text{(time constraint)} \\
x_1 \le 9 & \text{(“regular” storage constraint)} \\
x_2 \le 6 & \text{(“premium” storage constraint)} \\
x_1, x_2 \ge 0 & \text{(positivity constraints)}
\end{array}
$$

Note that the above set of equations constitutes the total LP formulation. The parenthetical explanations at the right have been appended to clarify the meaning of each term.

15.1.2 Graphical Solution

Because they are limited to two or three dimensions, graphical solutions have limited practical utility. However, they are very useful for demonstrating some basic concepts that underlie the general algebraic techniques used to solve higher-dimensional problems with the computer.

For a two-dimensional problem, such as the one in Example 15.1, the solution space is defined as a plane with $x_1$ measured along the abscissa and $x_2$ along the ordinate. Because they are linear, the constraints can be plotted on this plane as straight lines. If the LP problem was formulated properly (that is, it has a solution), these constraint lines will delineate a region, called the feasible solution space, encompassing all possible combinations of $x_1$ and $x_2$ that obey the constraints and hence represent feasible solutions. The objective function for a particular value of Z can then be plotted as another straight line and superimposed on this space. The value of Z can then be adjusted until it is at the maximum value while still touching the feasible space. This value of Z represents the optimal solution.
The corresponding values of $x_1$ and $x_2$, where Z touches the feasible solution space, represent the optimal values for the activities. The following example should help clarify the approach.

EXAMPLE 15.2 Graphical Solution

Problem Statement. Develop a graphical solution for the gas-processing problem previously derived in Example 15.1:

$$\text{Maximize } Z = 150x_1 + 175x_2$$
  • 410. subject to

$$
\begin{array}{ll}
7x_1 + 11x_2 \le 77 & (1) \\
10x_1 + 8x_2 \le 80 & (2) \\
x_1 \le 9 & (3) \\
x_2 \le 6 & (4) \\
x_1 \ge 0 & (5) \\
x_2 \ge 0 & (6)
\end{array}
$$

We have numbered the constraints to identify them in the following graphical solution.

Solution. First, the constraints can be plotted on the solution space. For example, the first constraint can be reformulated as a line by replacing the inequality by an equal sign and solving for $x_2$:

$$x_2 = -\frac{7}{11}x_1 + 7$$

Thus, as in Fig. 15.1a, the possible values of $x_1$ and $x_2$ that obey this constraint fall below this line (the direction designated in the plot by the small arrow). The other constraints can be evaluated similarly, as superimposed on Fig. 15.1a. Notice how they encompass a region where they are all met. This is the feasible solution space (the area ABCDE in the plot).

Aside from defining the feasible space, Fig. 15.1a also provides additional insight. In particular, we can see that constraint 3 (storage of regular gas) is “redundant.” That is, the feasible solution space would be unaffected if it were deleted.

FIGURE 15.1 Graphical solution of a linear programming problem. (a) The constraints define a feasible solution space. (b) The objective function can be increased until it reaches the highest value that obeys all constraints. Graphically, the function moves up and to the right until it touches the feasible space at a single optimal point. (Axes: x1, x2; panel (a) labels constraints 1 to 6 and points A to F; panel (b) shows the lines Z = 0, Z = 600, and Z = 1400.)
  • 411. Next, the objective function can be added to the plot. To do this, a value of Z must be chosen. For example, for Z = 0, the objective function becomes

$$0 = 150x_1 + 175x_2$$

or, solving for $x_2$, we derive the line

$$x_2 = -\frac{150}{175}x_1$$

As displayed in Fig. 15.1b, this represents a dashed line intersecting the origin. Now, since we are interested in maximizing Z, we can increase it to, say, 600, and the objective function is

$$x_2 = \frac{600}{175} - \frac{150}{175}x_1$$

Thus, increasing the value of the objective function moves the line away from the origin. Because the line still falls within the solution space, our result is still feasible. For the same reason, however, there is still room for improvement. Hence, Z can keep increasing until a further increase will take the objective beyond the feasible region. As shown in Fig. 15.1b, the maximum value of Z corresponds to approximately 1400. At this point, $x_1$ and $x_2$ are equal to approximately 4.9 and 3.9, respectively. Thus, the graphical solution tells us that if we produce these quantities of regular and premium, we will reap a maximum profit of about 1400.

Aside from determining optimal values, the graphical approach provides further insights into the problem. This can be appreciated by substituting the answers back into the constraint equations,

$$7(4.9) + 11(3.9) \approx 77$$
$$10(4.9) + 8(3.9) \approx 80$$
$$4.9 \le 9$$
$$3.9 \le 6$$

Consequently, as is also clear from the plot, producing at the optimal amount of each product brings us right to the point where we just meet the resource (1) and time (2) constraints. Such constraints are said to be binding. Further, as is also evident graphically, neither of the storage constraints [(3) and (4)] acts as a limitation. Such constraints are called nonbinding. This leads to the practical conclusion that, for this case, we can increase profits by either increasing our resource supply (the raw gas) or increasing our production time.
Further, it indicates that increasing storage would have no impact on profit.

The result obtained in the previous example is one of four possible outcomes that can be generally obtained in a linear programming problem. These are

1. Unique solution. As in the example, the maximum objective function intersects a single point.
2. Alternate solutions. Suppose that the objective function in the example had coefficients so that it was precisely parallel to one of the constraints. In our example problem,
  • 412. one way in which this would occur would be if the profits were changed to $140/tonne and $220/tonne. Then, rather than a single point, the problem would have an infinite number of optima corresponding to a line segment (Fig. 15.2a).
3. No feasible solution. As in Fig. 15.2b, it is possible that the problem is set up so that there is no feasible solution. This can be due to dealing with an unsolvable problem or due to errors in setting up the problem. The latter can result if the problem is over-constrained to the point that no solution can satisfy all the constraints.
4. Unbounded problems. As in Fig. 15.2c, this usually means that the problem is under-constrained and therefore open-ended. As with the no-feasible-solution case, it can often arise from errors committed during problem specification.

Now let us suppose that our problem involves a unique solution. The graphical approach might suggest an enumerative strategy for hunting down the maximum. From Fig. 15.1, it should be clear that the optimum always occurs at one of the corner points where two constraints meet. Such a point is known formally as an extreme point. Thus, out of the infinite number of possibilities in the decision space, focusing on extreme points clearly narrows down the possible options.

Further, we can recognize that not every extreme point is feasible, that is, satisfies all constraints. For example, notice that point F in Fig. 15.1a is an extreme point but is not feasible. Limiting ourselves to feasible extreme points narrows the field down still further.

Finally, once all feasible extreme points are identified, the one yielding the best value of the objective function represents the optimum solution. Finding this optimal solution could be done by exhaustively (and inefficiently) evaluating the value of the objective function at every feasible extreme point.
The following section discusses the simplex method, which offers a preferable strategy that charts a selective course through a sequence of feasible extreme points to arrive at the optimum in an extremely efficient manner.

FIGURE 15.2 Aside from a single optimal solution (for example, Fig. 15.1b), there are three other possible outcomes of a linear programming problem: (a) alternative optima, (b) no feasible solution, and (c) an unbounded result. (Axes in each panel: x1, x2.)
  • 413. 15.1.3 The Simplex Method

The simplex method is predicated on the assumption that the optimal solution will be an extreme point. Thus, the approach must be able to discern whether during problem solution an extreme point occurs. To do this, the constraint equations are reformulated as equalities by introducing what are called slack variables.

Slack Variables. As the name implies, a slack variable measures how much of a constrained resource is available, that is, how much “slack” of the resource is available. For example, recall the resource constraint used in Examples 15.1 and 15.2,

$$7x_1 + 11x_2 \le 77$$

We can define a slack variable $S_1$ as the amount of raw gas that is not used for a particular production level ($x_1$, $x_2$). If this quantity is added to the left side of the constraint, it makes the relationship exact,

$$7x_1 + 11x_2 + S_1 = 77$$

Now recognize what the slack variable tells us. If it is positive, it means that we have some “slack” for this constraint. That is, we have some surplus resource that is not being fully utilized. If it is negative, it tells us that we have exceeded the constraint. Finally, if it is zero, we exactly meet the constraint. That is, we have used up all the allowable resource. Since this is exactly the condition where constraint lines intersect, the slack variable provides a means to detect extreme points.

A different slack variable is developed for each constraint equation, resulting in what is called the fully augmented version,

$$\text{Maximize } Z = 150x_1 + 175x_2$$

subject to

$$
\begin{array}{rrrrrrll}
7x_1 & +\,11x_2 & +\,S_1 & & & & = 77 & \quad (15.4a) \\
10x_1 & +\,8x_2 & & +\,S_2 & & & = 80 & \quad (15.4b) \\
x_1 & & & & +\,S_3 & & = 9 & \quad (15.4c) \\
 & x_2 & & & & +\,S_4 & = 6 & \quad (15.4d)
\end{array}
$$

$$x_1, x_2, S_1, S_2, S_3, S_4 \ge 0$$

Notice how we have set up the four equality equations so that the unknowns are aligned in columns. We did this to underscore that we are now dealing with a system of linear algebraic equations (recall Part Three). In the following section, we will show how these equations can be used to determine extreme points algebraically.

Algebraic Solution. In contrast to Part Three, where we had n equations with n unknowns, our example system [Eqs. (15.4)] is underspecified or underdetermined, that is, it has more unknowns than equations. In general terms, there are n structural variables (that is, the original unknowns), m surplus or slack variables (one per constraint), and n + m total variables (structural plus surplus). For the gas production problem we have 2 structural variables, 4 slack variables, and 6 total variables. Thus, the problem involves solving 4 equations with 6 unknowns.
  • 414. The difference between the number of unknowns and the number of equations (equal to 2 for our problem) is directly related to how we can distinguish a feasible extreme point. Specifically, every extreme point has 2 of the 6 variables equal to zero. For example, the five corner points of the area ABCDE have the following zero values:

| Extreme Point | Zero Variables |
|---|---|
| A | x1, x2 |
| B | x2, S2 |
| C | S1, S2 |
| D | S1, S4 |
| E | x1, S4 |

This observation leads to the conclusion that the extreme points can be determined from the standard form by setting two of the variables equal to zero. In our example, this reduces the problem to a solvable form of 4 equations with 4 unknowns. For example, for point E, setting $x_1 = S_4 = 0$ reduces the standard form to

$$
\begin{array}{rrrrl}
11x_2 & +\,S_1 & & & = 77 \\
8x_2 & & +\,S_2 & & = 80 \\
 & & & S_3 & = 9 \\
x_2 & & & & = 6
\end{array}
$$

which can be solved for $x_2 = 6$, $S_1 = 11$, $S_2 = 32$, and $S_3 = 9$. Together with $x_1 = S_4 = 0$, these values define point E.

To generalize, a basic solution for m linear equations with n unknowns is developed by setting n - m variables to zero, and solving the m equations for the m remaining unknowns. The zero variables are formally referred to as nonbasic variables, whereas the remaining m variables are called basic variables. If all the basic variables are nonnegative, the result is called a basic feasible solution. The optimum will be one of these.

Now a direct approach to determining the optimal solution would be to calculate all the basic solutions, determine which were feasible, and among those, which had the highest value of Z. There are two reasons why this is not a wise approach.

First, for even moderately sized problems, the approach can involve solving a great number of equations. For m equations with n unknowns, this results in solving

$$C_n^m = \frac{n!}{m!\,(n - m)!}$$

systems of simultaneous equations. For example, if there are 10 equations (m = 10) with 16 unknowns (n = 16), you would have 8008 [= 16!/(10! 6!)] 10 × 10 systems of equations to solve!
Second, a significant portion of these may be infeasible. For example, in the present problem, out of $C_6^4 = 15$ extreme points, only 5 are feasible. Clearly, if we could avoid solving all these unnecessary systems, a more efficient algorithm would be developed. Such an approach is described next.
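Before turning to that approach, the brute-force enumeration just described can be sketched for the gas-processing problem. The Python below is an illustration under stated assumptions, not the text's algorithm: the helper names `solve` and `basic_solutions` are hypothetical, and exact fractions are used only to sidestep round-off. It tries each of the 15 ways of choosing 4 basic variables out of 6 in Eqs. (15.4), skips choices whose 4 × 4 system is singular, and keeps the nonnegative solutions.

```python
from fractions import Fraction
from itertools import combinations

# Augmented system, Eqs. (15.4): columns are x1, x2, S1, S2, S3, S4
NAMES = ["x1", "x2", "S1", "S2", "S3", "S4"]
A = [[7, 11, 1, 0, 0, 0],
     [10, 8, 0, 1, 0, 0],
     [1, 0, 0, 0, 1, 0],
     [0, 1, 0, 0, 0, 1]]
b = [77, 80, 9, 6]
c = [150, 175, 0, 0, 0, 0]   # objective coefficients

def solve(M, rhs):
    """Gauss-Jordan elimination with exact fractions; None if singular."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if piv is None:
            return None
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

def basic_solutions():
    """Set 2 variables to zero, solve for the other 4, for every choice."""
    results = []
    for basis in combinations(range(6), 4):
        M = [[A[i][j] for j in basis] for i in range(4)]
        sol = solve(M, b)
        if sol is None:          # no unique solution for this choice
            continue
        x = [Fraction(0)] * 6
        for j, v in zip(basis, sol):
            x[j] = v
        results.append(x)
    return results

solutions = basic_solutions()
feasible = [x for x in solutions if all(v >= 0 for v in x)]
best = max(feasible, key=lambda x: sum(ci * xi for ci, xi in zip(c, x)))

print(len(solutions), "basic solutions,", len(feasible), "feasible")
print({n: float(v) for n, v in zip(NAMES, best)})
```

In this sketch, two of the 15 choices (zeroing x1 with S3, or x2 with S4) give singular systems, so 13 basic solutions are actually computed; 5 of them are feasible, matching the corner points A to E, and the best one is point C.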
  • 415. Simplex Method Implementation. The simplex method avoids the inefficiencies outlined in the previous section. It does this by starting with a basic feasible solution. Then it moves through a sequence of other basic feasible solutions that successively improve the value of the objective function. Eventually, the optimal value is reached and the method is terminated.

We will illustrate the approach using the gas-processing problem from Examples 15.1 and 15.2. The first step is to start at a basic feasible solution (that is, at an extreme corner point of the feasible space). For cases like ours, an obvious starting point would be point A; that is, $x_1 = x_2 = 0$. The original 4 equations with 6 unknowns become

$$S_1 = 77 \qquad S_2 = 80 \qquad S_3 = 9 \qquad S_4 = 6$$

Thus, the starting values for the basic variables are given automatically as being equal to the right-hand sides of the constraints.

Before proceeding to the next step, the beginning information can now be summarized in a convenient tabular format called a tableau. As shown below, the tableau provides a concise summary of the key information constituting the linear programming problem.

| Basic | Z | x1 | x2 | S1 | S2 | S3 | S4 | Solution | Intercept |
|---|---|---|---|---|---|---|---|---|---|
| Z | 1 | -150 | -175 | 0 | 0 | 0 | 0 | 0 | |
| S1 | 0 | 7 | 11 | 1 | 0 | 0 | 0 | 77 | 11 |
| S2 | 0 | 10 | 8 | 0 | 1 | 0 | 0 | 80 | 8 |
| S3 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 9 | 9 |
| S4 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 6 | ∞ |

Notice that for the purposes of the tableau, the objective function is expressed as

$$Z - 150x_1 - 175x_2 - 0S_1 - 0S_2 - 0S_3 - 0S_4 = 0 \tag{15.5}$$

The next step involves moving to a new basic feasible solution that leads to an improvement of the objective function. This is accomplished by increasing a current nonbasic variable (at this point, $x_1$ or $x_2$) above zero so that Z increases. Recall that, for the present example, extreme points must have 2 zero values. Therefore, one of the current basic variables ($S_1$, $S_2$, $S_3$, or $S_4$) must also be set to zero.

To summarize this important step: one of the current nonbasic variables must be made basic (nonzero).
This variable is called the entering variable. In the process, one of the current basic variables is made nonbasic (zero). This variable is called the leaving variable.

Now, let us develop a mathematical approach for choosing the entering and leaving variables. Because of the convention by which the objective function is written [Eq. (15.5)], the entering variable can be any variable in the objective function having a negative coefficient (because this will make Z bigger). The variable with the largest negative value is conventionally chosen because it usually leads to the largest increase
  • 416. in Z. For our case, $x_2$ would be the entering variable since its coefficient, -175, is more negative than the coefficient of $x_1$, -150.

At this point the graphical solution can be consulted for insight. As in Fig. 15.3, we start at the initial point A. Based on its coefficient, $x_2$ should be chosen to enter. However, to keep the present example brief, we choose $x_1$ since we can see from the graph that this will bring us to the maximum quicker.

Next, we must choose the leaving variable from among the current basic variables: $S_1$, $S_2$, $S_3$, or $S_4$. Graphically, we can see that there are two possibilities. Moving to point B will drive $S_2$ to zero, whereas moving to point F will drive $S_1$ to zero. However, the graph also makes it clear that F is not possible because it lies outside the feasible solution space. Thus, we decide to move from A to B.

How is the same result detected mathematically? One way is to calculate the values at which the constraint lines intersect the axis or line corresponding to the entering variable (in our case, the $x_1$ axis). We can calculate this value as the ratio of the right-hand side of the constraint (the “Solution” column of the tableau) to the corresponding coefficient of $x_1$. For example, for the first constraint’s slack variable $S_1$, the result is

$$\text{Intercept} = \frac{77}{7} = 11$$

The remaining intercepts can be calculated and listed as the last column of the tableau. Because 8 is the smallest positive intercept, it means that the second constraint line will be reached first as $x_1$ is increased. Hence, $S_2$ should be the leaving variable.

FIGURE 15.3 Graphical depiction of how the simplex method successively moves through feasible basic solutions to arrive at the optimum in an efficient manner. (Axes: x1, x2; points A to F and constraints 1 to 4 are labeled.)
  • 417. At this point, we have moved to point B ($x_2 = S_2 = 0$), and the new basic solution becomes

$$
\begin{array}{rl}
7x_1 + S_1 &= 77 \\
10x_1 &= 80 \\
x_1 + S_3 &= 9 \\
S_4 &= 6
\end{array}
$$

The solution of this system of equations effectively defines the values of the basic variables at point B: $x_1 = 8$, $S_1 = 21$, $S_3 = 1$, and $S_4 = 6$.

The tableau can be used to make the same calculation by employing the Gauss-Jordan method. Recall that the basic strategy behind Gauss-Jordan involved converting the pivot element to 1 and then eliminating the coefficients in the same column above and below the pivot element (recall Sec. 9.7).

For this example, the pivot row is $S_2$ (the leaving variable) and the pivot element is 10 (the coefficient of the entering variable, $x_1$). Dividing the row by 10 and replacing $S_2$ by $x_1$ gives

| Basic | Z | x1 | x2 | S1 | S2 | S3 | S4 | Solution | Intercept |
|---|---|---|---|---|---|---|---|---|---|
| Z | 1 | -150 | -175 | 0 | 0 | 0 | 0 | 0 | |
| S1 | 0 | 7 | 11 | 1 | 0 | 0 | 0 | 77 | |
| x1 | 0 | 1 | 0.8 | 0 | 0.1 | 0 | 0 | 8 | |
| S3 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 9 | |
| S4 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 6 | |

Next, the $x_1$ coefficients in the other rows can be eliminated. For example, for the objective function row, the pivot row is multiplied by -150 and the result subtracted from the first row to give

| Z | x1 | x2 | S1 | S2 | S3 | S4 | Solution |
|---|---|---|---|---|---|---|---|
| 1 | -150 | -175 | 0 | 0 | 0 | 0 | 0 |
| -0 | -(-150) | -(-120) | -0 | -(-15) | 0 | 0 | -(-1200) |
| 1 | 0 | -55 | 0 | 15 | 0 | 0 | 1200 |

Similar operations can be performed on the remaining rows to give the new tableau,

| Basic | Z | x1 | x2 | S1 | S2 | S3 | S4 | Solution | Intercept |
|---|---|---|---|---|---|---|---|---|---|
| Z | 1 | 0 | -55 | 0 | 15 | 0 | 0 | 1200 | |
| S1 | 0 | 0 | 5.4 | 1 | -0.7 | 0 | 0 | 21 | 3.889 |
| x1 | 0 | 1 | 0.8 | 0 | 0.1 | 0 | 0 | 8 | 10 |
| S3 | 0 | 0 | -0.8 | 0 | -0.1 | 1 | 0 | 1 | -1.25 |
| S4 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 6 | 6 |
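The pivot just carried out by hand can also be sketched in code. The Python below is a minimal illustration, not a general-purpose simplex solver: the names `initial_tableau`, `leaving_row`, and `pivot` are assumptions, degeneracy and cycling are ignored, and exact fractions stand in for the rounded decimals of the tableaus. It repeats the first move with $x_1$ entering, as chosen above, and then keeps pivoting on the most negative objective-row coefficient until none remains.

```python
from fractions import Fraction

# Column order of the tableau: Z, x1, x2, S1, S2, S3, S4, Solution
COLS = ["Z", "x1", "x2", "S1", "S2", "S3", "S4", "Solution"]

def initial_tableau():
    """Tableau at point A (x1 = x2 = 0); the objective row comes first."""
    rows = [
        [1, -150, -175, 0, 0, 0, 0, 0],   # Z - 150 x1 - 175 x2 = 0, Eq. (15.5)
        [0, 7, 11, 1, 0, 0, 0, 77],       # S1 row
        [0, 10, 8, 0, 1, 0, 0, 80],       # S2 row
        [0, 1, 0, 0, 0, 1, 0, 9],         # S3 row
        [0, 0, 1, 0, 0, 0, 1, 6],         # S4 row
    ]
    return [[Fraction(v) for v in r] for r in rows], ["Z", "S1", "S2", "S3", "S4"]

def leaving_row(tab, col):
    """Row with the smallest intercept (Solution / coefficient),
    considering only rows whose coefficient in this column is positive."""
    candidates = [(tab[r][-1] / tab[r][col], r)
                  for r in range(1, len(tab)) if tab[r][col] > 0]
    if not candidates:
        raise ValueError("no leaving variable: the problem is unbounded")
    return min(candidates)[1]

def pivot(tab, basis, row, col):
    """Gauss-Jordan step: scale the pivot row so the pivot element is 1,
    then eliminate the pivot column from every other row."""
    p = tab[row][col]
    tab[row] = [v / p for v in tab[row]]
    for r in range(len(tab)):
        if r != row:
            factor = tab[r][col]
            tab[r] = [a - factor * b for a, b in zip(tab[r], tab[row])]
    basis[row] = COLS[col]

tab, basis = initial_tableau()

# First move, as in the text: x1 enters (chosen by inspecting the graph).
col = COLS.index("x1")
pivot(tab, basis, leaving_row(tab, col), col)
z_at_B = tab[0][-1]

# Later moves: the most negative objective-row coefficient enters.
while min(tab[0][1:-1]) < 0:
    col = min(range(1, 7), key=lambda j: tab[0][j])
    pivot(tab, basis, leaving_row(tab, col), col)

solution = dict(zip(basis[1:], (row[-1] for row in tab[1:])))
print(float(z_at_B), float(tab[0][-1]))
print({name: float(v) for name, v in solution.items()})
```

Under these assumptions, the first pivot reproduces the tableau for point B with Z = 1200, and one further pivot ($x_2$ entering, $S_1$ leaving) ends the loop with Z = 12725/9, about 1413.889.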
Thus, the new tableau summarizes all the information for point B. This includes the fact that the move has increased the objective function to Z = 1200.

This tableau can then be used to chart our next, and in this case final, step. Only one more variable, $x_2$, has a negative value in the objective function row, and it is therefore chosen as the entering variable. According to the intercept values (now calculated as the solution column over the coefficients in the $x_2$ column), the first constraint has the smallest positive value, and therefore, $S_1$ is selected as the leaving variable. Thus, the simplex method moves us from points B to C in Fig. 15.3. Finally, the Gauss-Jordan elimination can be implemented to solve the simultaneous equations. The result is the final tableau,

    Basic   Z   x1   x2      S1        S2      S3   S4   Solution
    Z       1    0    0   10.1852    7.8704     0    0   1413.889
    x2      0    0    1    0.1852   −0.1296     0    0      3.889
    x1      0    1    0   −0.1481    0.2037     0    0      4.889
    S3      0    0    0    0.1481   −0.2037     1    0      4.111
    S4      0    0    0   −0.1852    0.1296     0    1      2.111

We know that the result is final because there are no negative coefficients remaining in the objective function row. The final solution is tabulated as $x_1 = 4.889$ and $x_2 = 3.889$, which give a maximum objective function of Z = 1413.889. Further, because $S_3$ and $S_4$ are still in the basis, we know that the solution is limited by the first and second constraints.

15.2 NONLINEAR CONSTRAINED OPTIMIZATION

There are a number of approaches for handling nonlinear optimization problems in the presence of constraints. These can generally be divided into indirect and direct approaches (Rao, 1996). A typical indirect approach uses so-called penalty functions. These involve adding expressions that make the objective function less optimal as the solution approaches a constraint. Thus, the solution will be discouraged from violating constraints. Although such methods can be useful in some problems, they can become arduous when the problem involves many constraints.

The generalized reduced gradient (GRG) search method is one of the more popular of the direct methods (for details, see Fylstra et al., 1998; Lasdon et al., 1978; Lasdon and Smith, 1992). It is, in fact, the nonlinear method used within the Excel Solver.

It first “reduces” the problem to an unconstrained optimization problem. It does this by solving a set of nonlinear equations for the basic variables in terms of the nonbasic variables. Then, the unconstrained problem is solved using approaches similar to those described in Chap. 14. First, a search direction is chosen along which an improvement in the objective function is sought. The default choice is a quasi-Newton approach (BFGS) that, as described in Chap. 14, requires storage of an approximation of the Hessian matrix. This approach performs very well for most cases. The conjugate gradient approach is also available in Excel as an alternative for large problems. The Excel Solver has the nice feature that it automatically switches to the conjugate gradient method, depending on available storage. Once the search direction is established, a one-dimensional search is carried out along that direction using a variable step-size approach.
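To make the penalty-function idea mentioned at the start of this section concrete, the following Python sketch applies a quadratic exterior penalty, one common variant, to a deliberately tiny problem. Everything in it is an assumption made for illustration and does not come from the text: the toy problem (minimize $(x-3)^2$ subject to $x \le 2$), the penalty weights, and the helper names. The one-dimensional minimization uses a golden-section search of the kind covered in Chap. 13.

```python
def golden_min(f, lo, hi, tol=1e-6):
    """Golden-section search for the minimum of a unimodal f on [lo, hi]."""
    ratio = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    x1 = b - ratio * (b - a)
    x2 = a + ratio * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 < f2:                      # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - ratio * (b - a)
            f1 = f(x1)
        else:                            # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + ratio * (b - a)
            f2 = f(x2)
    return (a + b) / 2

def penalized(x, mu):
    """Hypothetical toy objective (x - 3)^2 with a penalty for violating x <= 2."""
    objective = (x - 3.0) ** 2
    violation = max(0.0, x - 2.0)        # zero while the constraint is satisfied
    return objective + mu * violation ** 2

results = {}
for mu in (1.0, 10.0, 1000.0):
    results[mu] = golden_min(lambda x: penalized(x, mu), 0.0, 5.0)
    print(mu, round(results[mu], 4))
```

For this toy problem the penalized minimum can be worked out by hand as $x = (3 + 2\mu)/(1 + \mu)$, so the computed values should approach the constrained optimum $x = 2$ as the weight $\mu$ grows. The need to tune $\mu$, and to add one such term per constraint, hints at why the approach becomes arduous for many constraints.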
15.3 OPTIMIZATION WITH SOFTWARE PACKAGES

Software packages have great capabilities for optimization. In this section, we will give you an introduction to some of the more useful ones.

15.3.1 Excel for Linear Programming

There are a variety of software packages expressly designed to implement linear programming. However, because of its broad availability, we will focus on the Excel spreadsheet. This involves using the Solver option previously employed in Chap. 7 for root location.

The manner in which Solver is used for linear programming is similar to our previous applications in that the data are entered into spreadsheet cells. The basic strategy is to arrive at a single cell that is to be optimized as a function of variations of other cells on the spreadsheet. The following example illustrates how this can be done for the gas-processing problem.

EXAMPLE 15.3 Using Excel's Solver for a Linear Programming Problem

Problem Statement. Use Excel to solve the gas-processing problem we have been examining in this chapter.

Solution. An Excel worksheet set up to calculate the pertinent values in the gas-processing problem is shown in Fig. 15.4. The unshaded cells are those containing numeric and labeling data. The shaded cells involve quantities that are calculated based on other cells. Recognize that the cell to be maximized is D12, which contains the total profit. The cells to be varied are B4:C4, which hold the amounts of regular and premium gas produced.

FIGURE 15.4 Excel spreadsheet set up to use the Solver for linear programming.
Once the spreadsheet is created, Solver is chosen from the Data tab (recall Sec. 7.7.1). At this point a dialogue box will be displayed, querying you for pertinent information. The pertinent cells of the Solver dialogue box are filled out so that the target cell D12 is maximized by changing cells B4:C4.

The constraints must be added one by one by selecting the “Add” button. This will open up a dialogue box in which each constraint is entered. For example, the constraint that the total raw gas (cell D6) must be less than or equal to the available supply (E6) is entered as D6 <= E6. After adding each constraint, the “Add” button can be selected. When all four constraints have been entered, the OK button is selected to return to the Solver dialogue box.

Now, before execution, the Solver options button should be selected and the box labeled “Assume linear model” should be checked. This will make Excel employ a version of the simplex algorithm (rather than the more general nonlinear solver it usually uses) that will speed up your application.

After selecting this option, return to the Solver menu. When the OK button is selected, a dialogue box will open with a report on the success of the operation. For the present case, the Solver obtains the correct solution (Fig. 15.5).
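The Solver result can be cross-checked without a spreadsheet because the optimum of a linear program lies at a corner of the feasible region. The Python sketch below enumerates the intersections of every pair of constraint boundaries, discards the infeasible ones, and keeps the most profitable. It is a brute-force check rather than the simplex method, and the variable and function names are illustrative assumptions. The coefficients are read from the tableau rows given earlier in the chapter.

```python
from itertools import combinations

# Each entry is (a1, a2, b), meaning a1*x1 + a2*x2 <= b.
constraints = [
    (7.0, 11.0, 77.0),    # first constraint  (S1 row)
    (10.0, 8.0, 80.0),    # second constraint (S2 row)
    (1.0, 0.0, 9.0),      # third constraint  (S3 row)
    (0.0, 1.0, 6.0),      # fourth constraint (S4 row)
    (-1.0, 0.0, 0.0),     # x1 >= 0
    (0.0, -1.0, 0.0),     # x2 >= 0
]

def profit(x1, x2):
    return 150.0 * x1 + 175.0 * x2

def feasible(x1, x2, eps=1e-9):
    return all(a1 * x1 + a2 * x2 <= b + eps for a1, a2, b in constraints)

best = None
for (a1, a2, b), (c1, c2, d) in combinations(constraints, 2):
    det = a1 * c2 - a2 * c1
    if abs(det) < 1e-12:
        continue                          # parallel boundaries never intersect
    px = (b * c2 - a2 * d) / det          # Cramer's rule for the corner point
    py = (a1 * d - b * c1) / det
    if feasible(px, py) and (best is None or profit(px, py) > best[0]):
        best = (profit(px, py), px, py)

z_opt, x1_opt, x2_opt = best
print(round(x1_opt, 3), round(x2_opt, 3), round(z_opt, 3))
```

Enumerating corners is practical only for small problems like this one; the simplex method reaches the same corner while visiting far fewer candidates.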
FIGURE 15.5 Excel spreadsheet showing solution to linear programming problem.

Beyond obtaining the solution, the Solver also provides some useful summary reports. We will explore these in the engineering application described in Sec. 16.2.

15.3.2 Excel for Nonlinear Optimization

The manner in which Solver is used for nonlinear optimization is similar to our previous applications in that the data are entered into spreadsheet cells. Once again, the basic strategy is to arrive at a single cell that is to be optimized as a function of variations of other cells on the spreadsheet. The following example illustrates how this can be done for the parachutist problem we set up in the introduction to this part of the book (recall Example PT4.1).

EXAMPLE 15.4 Using Excel's Solver for Nonlinear Constrained Optimization

Problem Statement. Recall from Example PT4.1 that we developed a nonlinear constrained optimization to minimize the cost for a parachute drop into a refugee camp. Parameters for this problem are

    Parameter                     Symbol   Value   Unit
    Total mass                    M_t      2000    kg
    Acceleration of gravity       g        9.8     m/s²
    Cost coefficient (constant)   c_0      200     $
    Cost coefficient (length)     c_1      56      $/m
    Cost coefficient (area)       c_2      0.1     $/m²
    Critical impact velocity      v_c      20      m/s
    Area effect on drag           k_c      3       kg/(s·m²)
    Initial drop height           z_0      500     m
Substituting these values into Eqs. (PT4.11) through (PT4.19) gives

Minimize

$$C = n(200 + 56\ell + 0.1A^2)$$

subject to

$$v \le 20$$
$$n \ge 1$$

where n is an integer and all other variables are real. In addition, the following quantities are defined as

$$A = 2\pi r^2$$
$$\ell = \sqrt{2}\,r$$
$$c = 3A$$
$$m = \frac{M_t}{n} \qquad \text{(E15.4.1)}$$
$$t = \text{root}\left[500 - \frac{9.8m}{c}\,t + \frac{9.8m^2}{c^2}\left(1 - e^{-(c/m)t}\right)\right] \qquad \text{(E15.4.2)}$$
$$v = \frac{9.8m}{c}\left(1 - e^{-(c/m)t}\right)$$

Use Excel to solve this problem for the design variables r and n that minimize cost C.

Solution. Before implementation of this problem on Excel, we must first deal with the problem of determining the root in the above formulation [Eq. (E15.4.2)]. One method might be to develop a macro to implement a root-location method such as bisection or the secant method. (Note that we will illustrate how this is done in the next chapter in Sec. 16.3.)

For the time being, an easier approach is possible by developing the following fixed-point iteration solution to Eq. (E15.4.2),

$$t_{i+1} = \left[500 + \frac{9.8m^2}{c^2}\left(1 - e^{-(c/m)t_i}\right)\right]\frac{c}{9.8m} \qquad \text{(E15.4.3)}$$

Thus, t can be adjusted until Eq. (E15.4.3) is satisfied. It can be shown that for the range of parameters used in the present problem, this formula always converges.

Now, how can this equation be solved on a spreadsheet? Two cells can be set up to hold a value for t and for the right-hand side of Eq. (E15.4.3) [that is, f(t)].
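Outside a spreadsheet, the fixed-point scheme of Eq. (E15.4.3) takes only a short loop. The Python sketch below is a minimal illustration; the function name `fall_time`, the starting guess of 10 s, and the stopping tolerance are assumptions rather than values from the text. It is evaluated at the design reported at the end of this example (six parcels, chute radius 2.944 m).

```python
import math

def fall_time(r, n, total_mass=2000.0, g=9.8, kc=3.0, z0=500.0,
              t0=10.0, tol=1e-10, max_iter=200):
    """Iterate Eq. (E15.4.3) until successive time estimates agree."""
    area = 2.0 * math.pi * r ** 2        # A = 2*pi*r^2
    c = kc * area                        # c = 3A
    m = total_mass / n                   # Eq. (E15.4.1)
    t = t0                               # assumed starting guess (s)
    for _ in range(max_iter):
        t_new = (z0 + g * m ** 2 / c ** 2
                 * (1.0 - math.exp(-c / m * t))) * c / (g * m)
        if abs(t_new - t) < tol:
            return t_new, m, c
        t = t_new
    raise RuntimeError("fixed-point iteration did not converge")

t, m, c = fall_time(r=2.944, n=6)
v = 9.8 * m / c * (1.0 - math.exp(-c / m * t))   # impact velocity (m/s)
print(round(t, 3), round(v, 3))
```

The iteration converges quickly here because the derivative of the right-hand side with respect to t is $e^{-(c/m)t}$, which is less than 1 for any positive t. The resulting impact velocity should sit just under the 20 m/s limit, consistent with that constraint being the binding one.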
You can type Eq. (E15.4.3) into cell B21 so that it gets its time value from cell B20 and the other parameter values from cells elsewhere on the sheet (see below for how we set up the whole sheet). Then go to cell B20 and point its value to cell B21.

Once you enter these formulations, you will immediately get the error message: “Cannot resolve circular references” because B20 depends on B21 and vice versa. Now, go to the Tools/Options selections from the menu and select calculation. From the calculation dialogue box, check “iteration” and hit “OK.” Immediately the spreadsheet will iterate these cells, and they will converge on the root. If you want to make it more precise, just strike the F9 key to make it iterate some more (the default is 100 iterations, which you can change if you wish).

An Excel worksheet to calculate the pertinent values can then be set up as shown in Fig. 15.6. The unshaded cells are those containing numeric and labeling data. The shaded cells involve quantities that are calculated based on other cells. For example, the mass in B17 was computed with Eq. (E15.4.1) based on the values for $M_t$ (B4) and n (E5). Note also that some cells are redundant. For example, cell E11 points back to cell E5. The information is repeated in cell E11 so that the structure of the constraints is evident from the sheet. Finally, recognize that the cell to be minimized is E15, which contains the total cost. The cells to be varied are E4:E5, which hold the radius and the number of parachutes.

FIGURE 15.6 Excel spreadsheet set up for the nonlinear parachute optimization problem.

Once the spreadsheet is created, the selection Solver is chosen from the Data tab. At this point a dialogue box will be displayed, querying you for pertinent information. The pertinent cells of the Solver dialogue box would be filled out so that the target cell E15 is minimized by changing cells E4:E5.

The constraints must be added one by one by selecting the “Add” button. This will open up a dialogue box in which each constraint is entered. For example, the constraint that the actual impact velocity (cell E10) must be less than or equal to the required velocity (G10) is entered as E10 <= G10. After adding each constraint, the “Add” button can be selected. Note that the down arrow allows you to choose among several types of constraints (<=, >=, =, and integer). Thus, we can force the number of parachutes (E5) to be an integer.

When all three constraints have been entered, the “OK” button is selected to return to the Solver dialogue box. When the “OK” button is selected there, a dialogue box will open with a report on the success of the operation. For the present case, the Solver obtains the correct solution as in Fig. 15.7.
FIGURE 15.7 Excel spreadsheet showing the solution for the nonlinear parachute optimization problem.

Thus, we determine that the minimum cost of $4377.26 will occur if we break the load up into six parcels with a chute radius of 2.944 m. Beyond obtaining the solution, the Solver also provides some useful summary reports. We will explore these in the engineering application described in Sec. 16.2.

15.3.3 MATLAB

As summarized in Table 15.1, MATLAB software has a variety of built-in functions to perform optimization. The following examples illustrate how they can be used.

TABLE 15.1 MATLAB functions to implement optimization.

    Function     Description
    fminbnd      Minimize function of one variable with bound constraints
    fminsearch   Minimize function of several variables
EXAMPLE 15.5 Using MATLAB for One-Dimensional Optimization

Problem Statement. Use the MATLAB fminbnd function to find the maximum of

$$f(x) = 2\sin x - \frac{x^2}{10}$$

within the interval $x_l = 0$ and $x_u = 4$. Recall that in Chap. 13, we used several methods to solve this problem for x = 1.4276 and f(x) = 1.7757.

Solution. First, we must create an M-file to hold the function.

    function f=fx(x)
    f = -(2*sin(x)-x^2/10)

Because we are interested in maximization, we enter the negative of the function. Then, we invoke the fminbnd function with

    x=fminbnd('fx',0,4)

The result is

    f =
       -1.7757
    x =
        1.4275

Note that additional arguments can be included. One useful addition is to set optimization options such as error tolerance or maximum iterations. This is done with the optimset function, which was used previously in Example 7.6 and has the general format,

    optimset('param1',value1,'param2',value2,...)

where param_i is a parameter specifying the type of option and value_i is the value assigned to that option. For example, if you wanted to set the tolerance at $1 \times 10^{-2}$,

    optimset('TolX',1e-2)

Thus, solving the present problem to a tolerance of $1 \times 10^{-2}$ can be generated with

    fminbnd('fx',0,4,optimset('TolX',1e-2))

with the result

    f =
       -1.7757
    ans =
        1.4270
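For readers without MATLAB, the same maximum can be located with the golden-section search from Chap. 13. The Python sketch below searches for the maximum directly rather than minimizing −f(x); the function names and the tolerance are illustrative assumptions, and the routine is not a reimplementation of fminbnd, which relies on a different algorithm internally.

```python
import math

def f(x):
    return 2.0 * math.sin(x) - x ** 2 / 10.0

def golden_max(func, lo, hi, tol=1e-6):
    """Golden-section search for the maximum of a unimodal func on [lo, hi]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    x1 = b - ratio * (b - a)             # lower interior point
    x2 = a + ratio * (b - a)             # upper interior point
    f1, f2 = func(x1), func(x2)
    while b - a > tol:
        if f1 > f2:                      # maximum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - ratio * (b - a)
            f1 = func(x1)
        else:                            # maximum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + ratio * (b - a)
            f2 = func(x2)
    x_opt = (a + b) / 2.0
    return x_opt, func(x_opt)

x_opt, f_opt = golden_max(f, 0.0, 4.0)
print(round(x_opt, 4), round(f_opt, 4))
```

The search should land close to the fminbnd result above, with the sign of the function value reversed because no negation is applied here.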
A complete set of parameters can be found by invoking Help as in

    help optimset

MATLAB has a variety of capabilities for dealing with multidimensional functions. Recall from Chap. 13 that our visual image of a one-dimensional search was like a roller coaster. For two-dimensional cases, the image becomes that of mountains and valleys. As in the following example, MATLAB's graphic capabilities provide a handy means to visualize such functions.

EXAMPLE 15.6 Visualizing a Two-Dimensional Function

Problem Statement. Use MATLAB's graphical capabilities to display the following function and visually estimate its minimum in the range $-2 \le x_1 \le 0$ and $0 \le x_2 \le 3$:

$$f(x_1, x_2) = 2 + x_1 - x_2 + 2x_1^2 + 2x_1x_2 + x_2^2$$

Solution. The following script generates contour and mesh plots of the function:

    x=linspace(-2,0,40);y=linspace(0,3,40);
    [X,Y] = meshgrid(x,y);
    Z=2+X-Y+2*X.^2+2*X.*Y+Y.^2;
    subplot(1,2,1);
    cs=contour(X,Y,Z);clabel(cs);
    xlabel('x_1');ylabel('x_2');
    title('(a) Contour plot');grid;
    subplot(1,2,2);
    cs=surfc(X,Y,Z);
    zmin=floor(min(Z));
    zmax=ceil(max(Z));
    xlabel('x_1');ylabel('x_2');zlabel('f(x_1,x_2)');
    title('(b) Mesh plot');

As displayed in Fig. 15.8, both plots indicate that the function has a minimum value of about $f(x_1, x_2) = 0$ to 1 located at about $x_1 = -1$ and $x_2 = 1.5$.

Standard MATLAB has a function fminsearch that can be used to determine the minimum of a multidimensional function. It is based on the Nelder-Mead method, which is a direct-search method that uses only function values (does not require derivatives) and handles nonsmooth objective functions. A simple expression of its syntax is

    [xmin, fval] = fminsearch(function,x0)

where xmin and fval are the location and value of the minimum, function is the name of the function being evaluated, and x0 is a vector holding the initial guesses from which the search starts.
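Because the function in Example 15.6 is quadratic, the visual estimate can also be checked by hand: setting both partial derivatives to zero gives two linear equations in $x_1$ and $x_2$. The short Python sketch below solves them with Cramer's rule. It is an analytical cross-check added for illustration, not the Nelder-Mead procedure that fminsearch uses, and the variable names are assumptions.

```python
def f(x1, x2):
    return 2 + x1 - x2 + 2 * x1 ** 2 + 2 * x1 * x2 + x2 ** 2

# Setting the gradient to zero gives a linear system:
#   df/dx1 =  1 + 4*x1 + 2*x2 = 0   ->   4*x1 + 2*x2 = -1
#   df/dx2 = -1 + 2*x1 + 2*x2 = 0   ->   2*x1 + 2*x2 =  1
a11, a12, b1 = 4.0, 2.0, -1.0
a21, a22, b2 = 2.0, 2.0, 1.0

det = a11 * a22 - a12 * a21              # also the determinant of the Hessian
x1_min = (b1 * a22 - a12 * b2) / det     # Cramer's rule
x2_min = (a11 * b2 - b1 * a21) / det
f_min = f(x1_min, x2_min)
print(x1_min, x2_min, f_min)
```

The Hessian of this function is constant, with a positive determinant and a positive second derivative with respect to $x_1$, so the stationary point is a minimum. It should fall where the contour plot suggests.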
FIGURE 15.8 (a) Contour and (b) mesh plots of a two-dimensional function.

EXAMPLE 15.7 Using MATLAB for Multidimensional Optimization

Problem Statement. Use the MATLAB fminsearch function to find the minimum for the simple function we just graphed in Example 15.6.

$$f(x_1, x_2) = 2 + x_1 - x_2 + 2x_1^2 + 2x_1x_2 + x_2^2$$

Employ initial guesses of x = −0.5 and y = 0.5.

Solution. We can invoke the fminsearch function with

    f=@(x) 2+x(1)-x(2)+2*x(1)^2+2*x(1)*x(2)+x(2)^2;
    [x,fval]=fminsearch(f,[-0.5,0.5])
    x =
       -1.0000    1.5000
    fval =
        0.7500

Just as with fminbnd, arguments can be included in order to specify additional parameters of the optimization process. For example, the optimset function can be used to limit the maximum number of iterations

    [x,fval]=fminsearch(f,[-0.5,0.5],optimset('MaxIter',2))

with the result

    Exiting: Maximum number of iterations has been exceeded
             - increase MaxIter option.
             Current function value: 1.225625
    x =
       -0.5000    0.5250
    fval =
        1.2256

Thus, because we have set a very stringent limit on the iterations, the optimization terminates well before the minimum is reached.

15.3.4 Mathcad

Mathcad contains a numeric mode function called Find that can be used to solve up to 50 simultaneous nonlinear algebraic equations with inequality constraints. The use of this function for unconstrained applications was described in Part Two. If Find fails to locate a solution that satisfies the equations and constraints, it returns the error message “did not find solution.” However, Mathcad also contains a similar function called Minerr. This function gives solution results that minimize the errors in the constraints even when exact solutions cannot be found. This function solves equations and accommodates several constraints using the Levenberg-Marquardt method taken from the public-domain MINPACK algorithms developed and published by the Argonne National Laboratory.

Let's develop an example where Find is used to solve a system of nonlinear equations with constraints. Initial guesses of x = −1 and y = 1 are input using the definition symbol as shown in Fig. 15.9. The word Given then alerts Mathcad that what follows is a system of equations. Then we can enter the equations and the inequality constraint. Note that for this application, Mathcad requires the use of a symbolic equal sign (typed as [Ctrl]=) and > to separate the left and right sides of an equation. Now the vector consisting of xval and yval is computed using Find(x,y) and the values are shown using an equal sign.

A graph that displays the equations and constraints as well as the solution can be placed on the worksheet by clicking on the desired location. This places a red crosshair at that location. Then use the Insert/Graph/X-Y Plot pull-down menu to place an empty plot on the worksheet with placeholders for the expressions to be graphed and for the ranges of the x and y axes. Four variables are plotted on the y axis as shown: the top and bottom halves of the equation for the circle, the linear function, and a vertical line to represent the x > 2 constraint. In addition, the solution is included as a point. Once the graph has been created, you can use the Format/Graph/X-Y Plot pull-down menu to vary the type of graph; change the color, type, and weight of the trace of the function; and add titles, labels, and other features. The graph and the numerical values for xval and yval nicely portray the solution as the intersection of the circle and the line in the region where x > 2.
FIGURE 15.9 Mathcad screen for a nonlinear constrained optimization problem.

PROBLEMS

15.1 A company makes two types of products, A and B. These products are produced during a 40-hr work week and then shipped out at the end of the week. They require 20 and 5 kg of raw material per kg of product, respectively, and the company has access to 9500 kg of raw material per week. Only one product can be created at a time with production times for each of 0.04 and 0.12 hr, respectively. The plant can only store 550 kg of total product per week. Finally, the company makes profits of $45 and $20 on each unit of A and B, respectively. Each unit of product is equivalent to a kg.
(a) Set up the linear programming problem to maximize profit.
(b) Solve the linear programming problem graphically.
(c) Solve the linear programming problem with the simplex method.
(d) Solve the problem with a software package.
(e) Evaluate which of the following options will raise profits the most: increasing raw material, storage, or production time.

15.2 Suppose that for Example 15.1, the gas-processing plant decides to produce a third grade of product with the following characteristics:

                      Supreme
    Raw gas           15 m³/tonne
    Production time   12 hr/tonne
    Storage           5 tonnes
    Profit            $250/tonne

In addition, suppose that a new source of raw gas has been discovered so that the total available is doubled to 154 m³/week.
(a) Set up the linear programming problem to maximize profit.
(b) Solve the linear programming problem with the simplex method.
(c) Solve the problem with a software package.
(d) Evaluate which of the following options will raise profits the most: increasing raw material, storage, or production time.

15.3 Consider the linear programming problem:

Maximize $f(x, y) = 1.75x + 1.25y$

subject to

$$1.2x + 2.25y \le 14$$
$$x + 1.1y \le 8$$
$$2.5x + y \le 9$$
$$x \ge 0$$
$$y \ge 0$$

Obtain the solution:
(a) Graphically.
(b) Using the simplex method.
(c) Using an appropriate software package (for example, Excel, MATLAB, or Mathcad).

15.4 Consider the linear programming problem:

Maximize $f(x, y) = 6x + 8y$

subject to

$$5x + 2y \le 40$$
$$6x + 6y \le 60$$
$$2x + 4y \le 32$$
$$x \ge 0$$
$$y \ge 0$$

Obtain the solution:
(a) Graphically.
(b) Using the simplex method.
(c) Using an appropriate software package (for example, Excel).

15.5 Use a software package (for example, Excel, MATLAB, Mathcad) to solve the following constrained nonlinear optimization problem:

Maximize $f(x, y) = 1.2x + 2y - y^3$

subject to

$$2x + y \le 2$$
$$x \ge 0$$
$$y \ge 0$$

15.6 Use a software package (for example, Excel, MATLAB, Mathcad) to solve the following constrained nonlinear optimization problem:

Maximize $f(x, y) = 15x + 15y$

subject to

$$x^2 + y^2 \le 1$$
$$x + 2y \le 2.1$$
$$x \ge 0$$
$$y \ge 0$$

15.7 Consider the following constrained nonlinear optimization problem:

Minimize $f(x, y) = (x - 3)^2 + (y - 3)^2$

subject to

$$x + 2y = 4$$

(a) Use a graphical approach to estimate the solution.
(b) Use a software package (for example, Excel) to obtain a more accurate estimate.

15.8 Use a software package to determine the maximum of

$$f(x, y) = 2.25xy + 1.75y - 1.5x^2 - 2y^2$$

15.9 Use a software package to determine the maximum of

$$f(x, y) = 4x + 2y + x^2 - 2x^4 + 2xy - 3y^2$$

15.10 Given the following function,

$$f(x, y) = -8x + x^2 + 12y + 4y^2 - 2xy$$

use a software package to determine the minimum:
(a) Graphically.
(b) Numerically.
(c) Substitute the result of (b) back into the function to determine the minimum f(x, y).
(d) Determine the Hessian and its determinant, and substitute the result of part (b) back into the latter to verify that a minimum has been detected.

15.11 You are asked to design a covered conical pit to store 50 m³ of waste liquid. Assume excavation costs at $100/m³, side lining costs at $50/m², and cover cost at $25/m². Determine the dimensions of the pit that minimize cost (a) if the side slope is unconstrained and (b) if the side slope must be less than 45°.

15.12 An automobile company has two versions of the same model car for sale, a two-door coupe and the full-size four door.
(a) Graphically solve how many cars of each design should be produced to maximize profit and what that profit is.
(b) Solve the same problem with Excel.

                      Two Door      Four Door     Availability
    Profit            $13,500/car   $15,000/car
    Production time   15 h/car      20 h/car      8000 h/year
    Storage           400 cars      350 cars
    Consumer demand   700/car       500/car       240,000 cars

15.13 Og is the leader of the surprisingly mathematically advanced, though technologically run-of-the-mill, Calm Waters caveman tribe. He must decide on the number of stone clubs and stone axes to be produced for the upcoming battle against the neighboring Peaceful Sunset tribe. Experience has taught him that each club is good for, on the average, 0.45 kills and 0.65 maims, while each axe produces 0.70 kills and 0.35 maims. Production of a club requires 5.1 lb of stone and 2.1 man-hours of labor while an axe requires 3.2 lb of stone and 4.3 man-hours of labor. Og's tribe has 240 lb of stone available for weapons production, and a total of 200 man-hours of labor available before the expected time of this battle (that Og is sure will end war for all time). Og values a kill as worth two maims in quantifying the damage inflicted on the enemy, and he wishes to produce that mix of weapons that will maximize damage.
(a) Formulate this as a linear programming problem. Make sure to define your decision variables.
(b) Represent this problem graphically, making sure to identify all the feasible corner points and the infeasible corner points.
(c) Solve the problem graphically.
(d) Solve the problem using the computer.

15.14 Develop an M-file that is expressly designed to locate a maximum with the golden-section search algorithm. In other words, set it up so that it directly finds the maximum rather than finding the minimum of −f(x). Test your program with the same problem as Example 13.1. The function should have the following features:
• Iterate until the relative error falls below a stopping criterion or exceeds a maximum number of iterations.
• Return both the optimal x and f(x).

15.15 Develop an M-file to locate a minimum with the golden-section search. Rather than using the standard stopping criteria (as in Fig. 13.5), determine the number of iterations needed to attain a desired tolerance.

15.16 Develop an M-file to implement parabolic interpolation to locate a minimum. Test your program with the same problem as Example 13.2. The function should have the following features:
• Base it on two initial guesses, and have the program generate the third initial value at the midpoint of the interval.
• Check whether the guesses bracket a maximum. If not, the function should not implement the algorithm, but should return an error message.
• Iterate until the relative error falls below a stopping criterion or exceeds a maximum number of iterations.
• Return both the optimal x and f(x).
• Use a bracketing approach (as in Example 13.2) to replace old values with new values.

15.17 The length of the longest ladder that can negotiate the corner depicted in Fig. P15.17 can be determined by computing the value of θ that minimizes the following function:

$$L(\theta) = \frac{w_1}{\sin\theta} + \frac{w_2}{\sin(\pi - \alpha - \theta)}$$

For the case where $w_1 = w_2 = 2$ m, use a numerical method (including software) to develop a plot of L versus a range of α's from 45° to 135°.

FIGURE P15.17 A ladder negotiating a corner formed by two hallways.
CHAPTER 16

Case Studies: Optimization

The purpose of this chapter is to use the numerical procedures discussed in Chaps. 13 through 15 to solve actual engineering problems involving optimization. These problems are important because engineers are often called upon to come up with the “best” solution to a problem. Because many of these cases involve complex systems and interactions, numerical methods and computers are often a necessity for developing optimal solutions.

The following applications are typical of those that are routinely encountered during upper-class and graduate studies. Furthermore, they are representative of problems you will address professionally. The problems are drawn from the major discipline areas of engineering: chemical/bio, civil/environmental, electrical, and mechanical/aerospace.

The first application, taken from chemical/bio engineering, deals with using nonlinear constrained optimization to design an optimal cylindrical tank. The Excel Solver is used to develop the solution.

Next, we use linear programming to assess a problem from civil/environmental engineering: minimizing the cost of waste treatment to meet water-quality objectives in a river. In this example, we introduce the notion of shadow prices and their use in assessing the sensitivity of a linear programming solution.

The third application, taken from electrical engineering, involves maximizing the power across a potentiometer in an electric circuit. The solution involves one-dimensional unconstrained optimization. Aside from solving the problem, we illustrate how the Visual Basic macro language allows access to the golden-section search algorithm within the context of the Excel environment.

Finally, the fourth application, taken from mechanical/aerospace engineering, involves determining the equilibrium position of a multi-spring system based on the minimum potential energy.

16.1 LEAST-COST DESIGN OF A TANK (CHEMICAL/BIO ENGINEERING)

Background. Chemical engineers (as well as other specialists such as mechanical and civil engineers) often encounter the general problem of designing containers to transport liquids and gases. Suppose that you are asked to determine the dimensions of a small cylindrical tank to transport toxic waste that is to be mounted on the back of a pickup truck. Your overall objective will be to minimize the cost of the tank. However, aside from cost, you must ensure that it holds the required amount of liquid and that it does not exceed the dimensions of the truck's bed. Note that because the tank will be carrying toxic waste, the tank thickness is specified by regulations.

A schematic of the tank and bed are shown in Fig. 16.1. As can be seen, the tank consists of a cylinder with two plates welded on each end.

The cost of the tank involves two components: (1) material expense, which is based on weight, and (2) welding expense based on length of weld. Note that the latter involves welding both the interior and the exterior seams where the plates connect with the cylinder. The data needed for the problem are summarized in Table 16.1.

Solution. The objective here is to construct a tank for a minimum cost. The cost is related to the design variables (length and diameter) as they affect the mass of the tank and the welding lengths. Further, the problem is constrained because the tank must (1) fit within the truck bed and (2) carry the required volume of material.

FIGURE 16.1 Parameters for determining the optimal dimensions of a cylindrical tank.

TABLE 16.1 Parameters for determining the optimal dimensions of a cylindrical tank used to transport toxic wastes.

    Parameter         Symbol   Value   Units
    Required volume   V_o      0.8     m³
    Thickness         t        3       cm
    Density           ρ        8000    kg/m³
    Bed length        L_max    2       m
    Bed width         D_max    1       m
    Material cost     c_m      4.5     $/kg
    Welding cost      c_w      20      $/m
The cost consists of tank material and welding costs. Therefore, the objective function can be formulated as minimizing

$$C = c_m m + c_w \ell_w \qquad (16.1)$$

where C = cost ($), m = mass (kg), $\ell_w$ = weld length (m), and $c_m$ and $c_w$ = cost factors for mass ($/kg) and weld length ($/m), respectively.

Next, we will formulate how the mass and weld lengths are related to the dimensions of the drum. First, the mass can be calculated as the volume of material times its density. The volume of the material used to create the side walls (that is, the cylinder) can be computed as

$$V_{\text{cylinder}} = L\pi\left[\left(\frac{D}{2} + t\right)^2 - \left(\frac{D}{2}\right)^2\right]$$

For each circular end plate, it is

$$V_{\text{plate}} = \pi\left(\frac{D}{2} + t\right)^2 t$$

Thus, the mass is computed by

$$m = \rho\left\{L\pi\left[\left(\frac{D}{2} + t\right)^2 - \left(\frac{D}{2}\right)^2\right] + 2\pi\left(\frac{D}{2} + t\right)^2 t\right\} \qquad (16.2)$$

where ρ = density (kg/m³).

The weld length for attaching each plate is equal to the cylinder's inside and outside circumference. For the two plates, the total weld length would be

$$\ell_w = 2\left[2\pi\left(\frac{D}{2} + t\right) + 2\pi\frac{D}{2}\right] = 4\pi(D + t) \qquad (16.3)$$

Given values for D and L (remember, thickness t is fixed by regulations), Eqs. (16.1) through (16.3) provide a means to compute cost. Also recognize that when Eqs. (16.2) and (16.3) are substituted into Eq. (16.1), the resulting objective function is nonlinear in the unknowns.

Next, we can formulate the constraints. First, we must compute how much volume can be held within the finished tank,

$$V = \frac{\pi D^2}{4}L$$

This value must be equal to the desired volume. Thus, one constraint is

$$\frac{\pi D^2 L}{4} = V_o$$

where $V_o$ is the desired volume (m³).

The remaining constraints deal with ensuring that the tank will fit within the dimensions of the truck bed,

$$L \le L_{\max}$$
$$D \le D_{\max}$$
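Equations (16.1) through (16.3) translate directly into code. The Python sketch below evaluates the mass, weld length, cost, and contained volume for a trial design using the parameter values of Table 16.1, with the 3-cm thickness converted to 0.03 m. The function names are illustrative assumptions, and the trial design is simply the truck-bed limits D = 1 m and L = 2 m.

```python
import math

def tank_mass(D, L, t=0.03, rho=8000.0):
    """Eq. (16.2): mass of the cylinder wall plus the two end plates (kg)."""
    wall = L * math.pi * ((D / 2 + t) ** 2 - (D / 2) ** 2)
    plates = 2 * math.pi * (D / 2 + t) ** 2 * t
    return rho * (wall + plates)

def weld_length(D, t=0.03):
    """Eq. (16.3): inside plus outside seams for both plates (m)."""
    return 4 * math.pi * (D + t)

def tank_cost(D, L, cm=4.5, cw=20.0):
    """Eq. (16.1): material cost plus welding cost ($)."""
    return cm * tank_mass(D, L) + cw * weld_length(D)

def tank_volume(D, L):
    """Volume held by the finished tank (m^3)."""
    return math.pi * D ** 2 * L / 4

# Trial design at the truck-bed limits
trial_volume = tank_volume(1.0, 2.0)
trial_cost = tank_cost(1.0, 2.0)
print(round(trial_volume, 2), round(trial_cost, 2))
```

At these limits the tank holds roughly twice the required 0.8 m³, which is why a cheaper, smaller design is possible.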
  • 436. The problem is now specified. Substituting the values from Table 16.1, it can be summarized as

Minimize

$$C = 4.5m + 20\ell_w$$

subject to

$$\frac{\pi D^2 L}{4} = 0.8 \qquad L \le 2 \qquad D \le 1$$

where

$$m = 8000\left\{L\pi\left[\left(\frac{D}{2} + 0.03\right)^2 - \left(\frac{D}{2}\right)^2\right] + 2\pi\left(\frac{D}{2} + 0.03\right)^2 0.03\right\}$$

and

$$\ell_w = 4\pi(D + 0.03)$$

The problem can now be solved in a number of ways. However, the simplest approach for a problem of this magnitude is to use a tool like the Excel Solver. The spreadsheet to accomplish this is shown in Fig. 16.2. For the case shown, we enter the upper limits for $D$ and $L$. For this case, the volume is more than required (1.57 > 0.8).

FIGURE 16.2 Excel spreadsheet set up to evaluate the cost of a tank subject to a volume requirement and size constraints.
  • 437. Once the spreadsheet is created, the selection Solver is chosen from the Data tab. At this point a dialogue box will be displayed, querying you for pertinent information. The pertinent cells of the Solver dialogue box would then be filled out. [Solver dialogue box image not reproduced.]

When the OK button is selected, a dialogue box will open with a report on the success of the operation. For the present case, the Solver obtains the correct solution, which is shown in Fig. 16.3. Notice that the optimal diameter is nudging up against the constraint of 1 m. Thus, if the required capacity of the tank were increased, we would run up against this constraint and the problem would reduce to a one-dimensional search for length.

FIGURE 16.3 Results of minimization. The price is reduced from $9154 to $5723 because of the smaller volume using dimensions of D = 0.98 m and L = 1.05 m.
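The tank problem can also be checked outside a spreadsheet. The following is a minimal Python sketch, not the Solver procedure used in the text: it assumes the volume constraint is used to eliminate L, so that cost becomes a function of D alone on the interval allowed by the bed dimensions, and it then applies a golden-section search. The function names (`tank_cost`, `length_for_volume`, `golden_min`) are illustrative choices, not taken from the book.

```python
import math

# Parameter values from Table 16.1
V_O = 0.8       # required volume, m^3
T = 0.03        # wall thickness, m
RHO = 8000.0    # density, kg/m^3
L_MAX = 2.0     # bed length, m
D_MAX = 1.0     # bed width, m
C_M = 4.5       # material cost, $/kg
C_W = 20.0      # welding cost, $/m


def tank_cost(D, L):
    """Cost from Eqs. (16.1)-(16.3) for inside diameter D and length L (m)."""
    outer_r = D / 2 + T
    v_cyl = L * math.pi * (outer_r**2 - (D / 2) ** 2)   # side-wall material
    v_plate = math.pi * outer_r**2 * T                  # one end plate
    mass = RHO * (v_cyl + 2 * v_plate)                  # Eq. (16.2)
    weld = 4 * math.pi * (D + T)                        # Eq. (16.3)
    return C_M * mass + C_W * weld                      # Eq. (16.1)


def length_for_volume(D):
    """Length that satisfies pi*D^2*L/4 = V_O exactly."""
    return 4 * V_O / (math.pi * D**2)


def golden_min(f, lo, hi, tol=1e-8):
    """Golden-section search for the minimum of a unimodal f on [lo, hi]."""
    r = (math.sqrt(5) - 1) / 2
    x1, x2 = lo + r * (hi - lo), hi - r * (hi - lo)
    f1, f2 = f(x1), f(x2)
    while hi - lo > tol:
        if f1 < f2:                      # minimum lies in [x2, hi]
            lo, x2, f2 = x2, x1, f1
            x1 = lo + r * (hi - lo)
            f1 = f(x1)
        else:                            # minimum lies in [lo, x1]
            hi, x1, f1 = x1, x2, f2
            x2 = hi - r * (hi - lo)
            f2 = f(x2)
    return (lo + hi) / 2


# L <= L_MAX together with the volume constraint gives a lower bound on D.
d_min = math.sqrt(4 * V_O / (math.pi * L_MAX))

cost_initial = tank_cost(D_MAX, L_MAX)   # starting case of Fig. 16.2
D_opt = golden_min(lambda D: tank_cost(D, length_for_volume(D)), d_min, D_MAX)
L_opt = length_for_volume(D_opt)
cost_opt = tank_cost(D_opt, L_opt)

print(f"initial cost = {cost_initial:.0f}")
print(f"D = {D_opt:.3f} m, L = {L_opt:.3f} m, cost = {cost_opt:.0f}")
```

Eliminating L works here only because the volume requirement is an equality; with an inequality constraint a general constrained optimizer would be the safer choice.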
  • 438. 16.2 LEAST-COST TREATMENT OF WASTEWATER (CIVIL/ENVIRONMENTAL ENGINEERING)

Background. Wastewater discharges from big cities are often a major cause of river pollution. Figure 16.4 illustrates the type of system that an environmental engineer might confront. Several cities are located on a river and its tributary. Each generates pollution at a loading rate $P$ that has units of milligrams per day (mg/d). The pollution loading is subject to waste treatment that results in a fractional removal $x$. Thus, the amount discharged to the river is the excess not removed by treatment,

$$W_i = (1 - x_i)P_i \tag{16.4}$$

where $W_i$ = waste discharge from the $i$th city.

When the waste discharge enters the stream, it mixes with pollution from upstream sources. If complete mixing is assumed at the discharge point, the resulting concentration at the discharge point can be calculated by a simple mass balance,

$$c_i = \frac{W_i + Q_u c_u}{Q_i} \tag{16.5}$$

where $Q_u$ = flow (L/d), $c_u$ = concentration (mg/L) in the river immediately upstream of the discharge, and $Q_i$ = flow downstream of the discharge point (L/d).

After the concentration at the mixing point is established, chemical and biological decomposition processes can remove some of the pollution as it flows downstream. For the present case, we will assume that this removal can be represented by a simple fractional reduction factor $R$.

Assuming that the headwaters (that is, the river above cities 1 and 2) are pollution-free, the concentrations at the four nodes can be computed as

$$c_1 = \frac{(1 - x_1)P_1}{Q_{13}}$$

$$c_2 = \frac{(1 - x_2)P_2}{Q_{23}}$$

$$c_3 = \frac{R_{13}Q_{13}c_1 + R_{23}Q_{23}c_2 + (1 - x_3)P_3}{Q_{34}} \tag{16.6}$$

$$c_4 = \frac{R_{34}Q_{34}c_3 + (1 - x_4)P_4}{Q_{45}}$$

FIGURE 16.4 Four wastewater treatment plants (WWTP1 through WWTP4) discharging pollution to a river system. The river segments between the cities are labeled with circled numbers (13, 23, 34, 45).
  • 439. Next, it is recognized that the waste treatment costs a different amount, $d_i$ ($10^{-6}$ dollars per mg removed, as listed in Table 16.2), at each of the facilities. Thus, the total cost of treatment (on a daily basis) can be calculated as

$$Z = d_1P_1x_1 + d_2P_2x_2 + d_3P_3x_3 + d_4P_4x_4 \tag{16.7}$$

where $Z$ is total daily cost of treatment ($/d).

The final piece in the "decision puzzle" involves environmental regulations. To protect the beneficial uses of the river (for example, boating, fisheries, bathing), regulations say that the river concentration must not exceed a water-quality standard of $c_s$.

Parameters for the river system in Fig. 16.4 are summarized in Table 16.2. Notice that there is a difference in treatment cost between the upstream (1 and 2) and the downstream cities (3 and 4) because of the outmoded nature of the downstream plants.

The concentration can be calculated with Eq. (16.6), and the result is listed in the $c_i$ column of Table 16.2 for the case where no waste treatment is implemented (that is, all the $x$'s = 0). Notice that the standard of 20 mg/L is being violated at all mixing points.

Use linear programming to determine the treatment levels that meet the water-quality standards for the minimum cost. Also, evaluate the impact of making the standard more stringent below city 3. That is, redo the exercise, but with the standards for segments 3–4 and 4–5 lowered to 10 mg/L.

Solution. All the factors outlined above can be combined into the following linear programming problem:

Minimize

$$Z = d_1P_1x_1 + d_2P_2x_2 + d_3P_3x_3 + d_4P_4x_4 \tag{16.8}$$

subject to the following constraints

$$\frac{(1 - x_1)P_1}{Q_{13}} \le c_{s1}$$

$$\frac{(1 - x_2)P_2}{Q_{23}} \le c_{s2}$$

$$\frac{R_{13}Q_{13}c_1 + R_{23}Q_{23}c_2 + (1 - x_3)P_3}{Q_{34}} \le c_{s3} \tag{16.9}$$

$$\frac{R_{34}Q_{34}c_3 + (1 - x_4)P_4}{Q_{45}} \le c_{s4}$$

$$0 \le x_1, x_2, x_3, x_4 \le 1 \tag{16.10}$$

TABLE 16.2 Parameters for four wastewater treatment plants discharging pollution to a river system, along with the resulting concentration ($c_i$) for zero treatment. Flow, removal, and standards for the river segments are also listed.

    City   P_i (mg/d)    d_i ($10^-6/mg)   c_i (mg/L)   Segment   Q (L/d)       R      c_s (mg/L)
    1      1.00 × 10^9   2                 100          1–3       1.00 × 10^7   0.5    20
    2      2.00 × 10^9   2                 40           2–3       5.00 × 10^7   0.35   20
    3      4.00 × 10^9   4                 47.3         3–4       1.10 × 10^8   0.6    20
    4      2.50 × 10^9   4                 22.5         4–5       2.50 × 10^8          20
  • 440. Thus, the objective function is to minimize treatment cost [Eq. (16.8)] subject to the constraint that water-quality standards must be met for all parts of the system [Eq. (16.9)]. In addition, treatment cannot be negative or greater than 100% removal [Eq. (16.10)].

The problem can be solved using a variety of packages. For the present application, we use the Excel spreadsheet. As seen in Fig. 16.5, these data along with the concentration calculations can be set up nicely in the spreadsheet cells.

Once the spreadsheet is created, the selection Solver is chosen from the Data tab. At this point a dialogue box will be displayed, querying you for pertinent information. The pertinent cells of the Solver dialogue box would then be filled out. [Solver dialogue box image not reproduced.] Notice that not all the constraints are shown, because the dialogue box displays only six constraints at a time.

FIGURE 16.5 Excel spreadsheet set up to evaluate the cost of waste treatment on a regulated river system. Column F contains the calculation of concentration according to Eq. (16.6). Cells F4 and H4 are highlighted to show the formulas used to calculate $c_1$ and treatment cost for city 1. In addition, highlighted cell H9 shows the formula (Eq. 16.8) for total cost that is to be minimized.
  • 441. When the OK button is selected, a dialogue box will open with a report on the success of the operation. For the present case, the Solver obtains the correct solution, which is shown in Fig. 16.6. Before accepting the solution (by selecting the OK button on the Solver Reports box), notice that three reports can be generated: Answer, Sensitivity, and Limits. Select the Sensitivity Report and then hit the OK button to accept the solution. The Solver will automatically generate a Sensitivity Report, as in Fig. 16.7.

Now let us examine the solution (Fig. 16.6). Notice that the standard will be met at all the mixing points. In fact, the concentration at city 4 will actually be less than the standard (15.28 mg/L versus 20 mg/L), even though no treatment would be required for city 4.

As a final exercise, we can lower the standards for reaches 3–4 and 4–5 to 10 mg/L. Before doing this, we can examine the Sensitivity Report. For the present case, the key column of Fig. 16.7 is the Lagrange Multiplier (aka the "shadow price"). The shadow price is a value that expresses the sensitivity of the objective function (in our case, cost) to a unit change of one of the constraints (water-quality standards). It therefore represents the additional cost that will be incurred by making the standards more stringent. For our example, it is revealing that the largest shadow price, $-\$440/\Delta c_{s3}$, occurs for one of the standard changes (that is, downstream from city 3) that we are contemplating. This tips us off that our modification will be costly.

This is confirmed when we rerun Solver with the new standards (that is, we lower cells G6 and G7 to 10). As seen in Table 16.3, the result is that treatment cost is increased from $12,600/day to $19,640/day. In addition, reducing the standard concentrations for the lower reaches means that city 4 must begin to treat its waste and city 3 must upgrade its treatment. Notice also that the treatment of the upstream cities is unaffected.

FIGURE 16.6 Results of minimization. The water-quality standards are met at a cost of $12,600/day. Notice that despite the fact that no treatment is required for city 4, the concentration at its mixing point is actually below the standard.
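The reported treatment levels can be checked directly against Eqs. (16.6) and (16.7). The sketch below is a forward evaluation only; it does not solve the linear program, and it assumes $d_i$ is in units of $10^{-6}$ dollars per mg as in the Table 16.2 header. The function names and the `tightened` case (standard $c_{s3}$ lowered by 1 mg/L with the other treatment levels held fixed) are illustrative assumptions used to reproduce the $440 shadow price by hand.

```python
# Data from Table 16.2
P = [1.00e9, 2.00e9, 4.00e9, 2.50e9]                  # loadings, mg/d
D_COST = [2e-6, 2e-6, 4e-6, 4e-6]                     # treatment cost, $/mg removed
Q13, Q23, Q34, Q45 = 1.00e7, 5.00e7, 1.10e8, 2.50e8   # segment flows, L/d
R13, R23, R34 = 0.5, 0.35, 0.6                        # fraction remaining after each segment


def concentrations(x):
    """Mixing-point concentrations (mg/L) from Eq. (16.6) for removals x[0..3]."""
    c1 = (1 - x[0]) * P[0] / Q13
    c2 = (1 - x[1]) * P[1] / Q23
    c3 = (R13 * Q13 * c1 + R23 * Q23 * c2 + (1 - x[2]) * P[2]) / Q34
    c4 = (R34 * Q34 * c3 + (1 - x[3]) * P[3]) / Q45
    return [c1, c2, c3, c4]


def daily_cost(x):
    """Total daily treatment cost ($/d) from Eq. (16.7)."""
    return sum(d * p * xi for d, p, xi in zip(D_COST, P, x))


no_treatment = [0.0, 0.0, 0.0, 0.0]
scenario1 = [0.8, 0.5, 0.5625, 0.0]      # Table 16.3, all c_s = 20
scenario2 = [0.8, 0.5, 0.8375, 0.264]    # Table 16.3, downstream c_s = 10

# Hypothetical check of the shadow price: meet c_s3 = 19 by raising x3 only.
tightened = [0.8, 0.5, 0.59, 0.0]
extra_cost = daily_cost(tightened) - daily_cost(scenario1)

for name, x in [("none", no_treatment), ("scenario 1", scenario1),
                ("scenario 2", scenario2), ("c_s3 = 19", tightened)]:
    c = [round(ci, 2) for ci in concentrations(x)]
    print(f"{name:>10}: c = {c}, cost = {daily_cost(x):.0f} $/d")
print(f"extra cost per 1 mg/L tightening of c_s3 = {extra_cost:.0f} $/d")
```

A full solution of Eqs. (16.8) through (16.10) would hand the same coefficients to a linear programming routine; the forward model above is only meant to make the tabulated numbers reproducible.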
  • 442. FIGURE 16.7 Sensitivity Report for spreadsheet set up to evaluate the cost of waste treatment on a regulated river system.

TABLE 16.3 Comparison of two scenarios involving the impact of different regulations on treatment costs.

    Scenario 1: All c_s = 20        Scenario 2: Downstream c_s = 10
    City   x        c               City   x        c
    1      0.8      20              1      0.8      20
    2      0.5      20              2      0.5      20
    3      0.5625   20              3      0.8375   10
    4      0        15.28           4      0.264    10
    Cost = $12,600                  Cost = $19,640

16.3 MAXIMUM POWER TRANSFER FOR A CIRCUIT (ELECTRICAL ENGINEERING)

Background. The simple resistor circuit in Fig. 16.8 contains three fixed resistors and one adjustable resistor. Adjustable resistors are called potentiometers. The values for the parameters are V = 80 V, R1 = 8 Ω, R2 = 12 Ω, and R3 = 10 Ω. (a) Find the value of the adjustable resistance $R_a$ that maximizes the power transfer across terminals 1 and 2. (b) Perform a sensitivity analysis to determine how the maximum power and the corresponding setting of the potentiometer ($R_a$) vary as V is varied over the range from 45 to 105 V.
  • 443. Solution. An expression for power for the circuit can be derived from Kirchhoff's laws as

$$P(R_a) = \frac{\left[\dfrac{V R_3 R_a}{R_1(R_a + R_2 + R_3) + R_3 R_a + R_3 R_2}\right]^2}{R_a} \tag{16.11}$$

Substituting the parameter values gives the plot shown in Fig. 16.9. Notice that a maximum power transfer occurs at a resistance of about 16 Ω.

We will solve this problem in two ways with the Excel spreadsheet. First, we will employ trial-and-error and the Solver option. Then, we will develop a Visual Basic macro program to perform the sensitivity analysis.

(a) An Excel spreadsheet to implement Eq. (16.11) is shown in Fig. 16.10. As indicated, Eq. (16.11) can be entered into cell B9. Then the value of $R_a$ (cell B8) can be varied in a trial-and-error fashion until the maximum power is determined. For this example, the result is a power of 30.03 W and a potentiometer setting of $R_a$ = 16.44 Ω.

A superior approach involves using the Solver option from the spreadsheet's Data tab. At this point a dialogue box will be displayed, querying you for pertinent information. The pertinent cells of the Solver dialogue box would be filled out as

    Set target cell:    B9
    Equal to:           (●) max   ( ) min   ( ) equal to 0
    By changing cells:  B8

FIGURE 16.8 A resistor circuit with an adjustable resistor, or potentiometer.

FIGURE 16.9 A plot of power transfer $P(R_a)$ across terminals 1-2 from Fig. 16.8 as a function of the potentiometer resistance $R_a$, with the maximum power marked.
  • 444. When the OK button is selected, a dialogue box will open with a report on the success of the operation. For the present case, the Solver obtains the same correct solution shown in Fig. 16.10.

(b) Now, although the foregoing approach is excellent for a single evaluation, it is not convenient for cases where multiple optimizations would be employed. Such would be the case for the second part of this application, where we are interested in determining how the maximum power varies for different voltage settings. Of course, the Solver could be invoked multiple times for different parameter values, but this would be inefficient. A preferable course would involve developing a macro function to come up with the optimum.

Such a function is listed in Fig. 16.11. Notice how closely it resembles the golden-section-search pseudocode previously presented in Fig. 13.5. In addition, notice that a function must also be defined to compute power according to Eq. (16.11).

An Excel spreadsheet utilizing this macro to evaluate the sensitivity of the solution to voltage is given in Fig. 16.12. A column of values is set up that spans the range of V's (that is, from 45 to 105 V). A function call to the macro is written in cell B9 that references the adjacent value of V (the 45 in A9). In addition, the other parameters in the function argument are also included. Notice that, whereas the reference to V is relative, the references to the lower and upper guesses and the resistances are absolute (that is, including leading $). This was done so that when the formula is copied down, the absolute references stay fixed, whereas the relative reference corresponds to the voltage in the same row. A similar strategy is used to place Eq. (16.11) in cell C9.

When the formulas are copied downward, the result is as shown in Fig. 16.12. The maximum power can be plotted to visualize the impact of voltage variations. As seen in Fig. 16.13, the power increases with V.

The results for the corresponding potentiometer settings ($R_a$) are more interesting. The spreadsheet indicates that the same setting, 16.44 Ω, results in maximum power at every voltage. Such a result might be difficult to intuit based on casual inspection of Eq. (16.11).

FIGURE 16.10 Excel determination of maximum power across a potentiometer using trial-and-error.
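The voltage-independent setting can be confirmed analytically. The short derivation below is an added check rather than part of the original solution; the shorthand $a$ and $b$ is introduced here only to group the terms of the denominator of Eq. (16.11).

```latex
% Group the denominator of Eq. (16.11) as a R_a + b
a = R_1 + R_3, \qquad b = R_1 (R_2 + R_3) + R_2 R_3
\quad\Longrightarrow\quad
P(R_a) = \frac{V^2 R_3^2 \, R_a}{(a R_a + b)^2}

% Differentiate with respect to R_a and set the result to zero
\frac{dP}{dR_a} = V^2 R_3^2 \, \frac{b - a R_a}{(a R_a + b)^3} = 0
\quad\Longrightarrow\quad
R_a^{*} = \frac{b}{a} = \frac{8(12 + 10) + 12(10)}{8 + 10} = \frac{296}{18} \approx 16.44\ \Omega

% Maximum power: V enters only through the factor V^2
P_{\max} = \frac{V^2 R_3^2}{4ab} = \frac{V^2}{213.12}
\quad\Longrightarrow\quad
P_{\max}(80\ \mathrm{V}) \approx 30.03\ \mathrm{W}
```

Because $V$ appears only as the multiplier $V^2$, it scales the height of the curve in Fig. 16.9 without moving its peak, which is why every row of the sensitivity analysis returns the same $R_a$.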
  • 445. FIGURE 16.11 Excel macro written in Visual Basic to determine a maximum with the golden-section search.

    Option Explicit

    Function Golden(xlow, xhigh, R1, R2, R3, V)
      Dim iter As Integer, maxit As Integer, ea As Double, es As Double
      Dim fx As Double, xL As Double, xU As Double, d As Double, x1 As Double
      Dim x2 As Double, f1 As Double, f2 As Double, xopt As Double
      Const R As Double = (5 ^ 0.5 - 1) / 2    ' golden ratio
      maxit = 50                               ' maximum iterations
      es = 0.001                               ' stopping criterion (%)
      xL = xlow
      xU = xhigh
      iter = 1
      d = R * (xU - xL)
      x1 = xL + d
      x2 = xU - d
      f1 = f(x1, R1, R2, R3, V)
      f2 = f(x2, R1, R2, R3, V)
      If f1 > f2 Then
        xopt = x1
        fx = f1
      Else
        xopt = x2
        fx = f2
      End If
      Do
        d = R * d
        If f1 > f2 Then                        ' maximum lies in [x2, xU]
          xL = x2
          x2 = x1
          x1 = xL + d
          f2 = f1
          f1 = f(x1, R1, R2, R3, V)
        Else                                   ' maximum lies in [xL, x1]
          xU = x1
          x1 = x2
          x2 = xU - d
          f1 = f2
          f2 = f(x2, R1, R2, R3, V)
        End If
        iter = iter + 1
        If f1 > f2 Then
          xopt = x1
          fx = f1
        Else
          xopt = x2
          fx = f2
        End If
        If xopt <> 0 Then ea = (1 - R) * Abs((xU - xL) / xopt) * 100
        If ea <= es Or iter >= maxit Then Exit Do
      Loop
      Golden = xopt
    End Function

    Function f(Ra, R1, R2, R3, V)
      ' Power across the potentiometer, Eq. (16.11)
      f = (V * R3 * Ra / (R1 * (Ra + R2 + R3) + R3 * Ra + R3 * R2)) ^ 2 / Ra
    End Function
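For readers working outside Excel, the same sensitivity analysis can be sketched in Python. This is a hedged re-expression of the idea behind Fig. 16.11, not a line-by-line port: it stops on interval width rather than on the percent-error estimate used in the macro, and the names `power` and `golden_max` are illustrative.

```python
import math

R1, R2, R3 = 8.0, 12.0, 10.0   # fixed resistances, ohms (problem statement)


def power(ra, v):
    """Power across the potentiometer, Eq. (16.11)."""
    ratio = v * R3 * ra / (R1 * (ra + R2 + R3) + R3 * ra + R3 * R2)
    return ratio**2 / ra


def golden_max(f, lo, hi, tol=1e-6):
    """Golden-section search for the maximum of a unimodal f on [lo, hi]."""
    r = (math.sqrt(5) - 1) / 2
    x1, x2 = lo + r * (hi - lo), hi - r * (hi - lo)
    f1, f2 = f(x1), f(x2)
    while hi - lo > tol:
        if f1 > f2:                      # maximum lies in [x2, hi]
            lo, x2, f2 = x2, x1, f1
            x1 = lo + r * (hi - lo)
            f1 = f(x1)
        else:                            # maximum lies in [lo, x1]
            hi, x1, f1 = x1, x2, f2
            x2 = hi - r * (hi - lo)
            f2 = f(x2)
    return (lo + hi) / 2


# Same bracket as the spreadsheet (Rmin = 0.1, Rmax = 100).
results = {}
for v in (45, 60, 75, 80, 90, 105):
    ra_opt = golden_max(lambda ra: power(ra, v), 0.1, 100.0)
    results[v] = (ra_opt, power(ra_opt, v))

for v, (ra_opt, p_max) in results.items():
    print(f"V = {v:3d} V  Ra = {ra_opt:.4f} ohm  Pmax = {p_max:.4f} W")
```

Stopping on interval width keeps the sketch short; the percent-error test in the macro is the better choice when the optimum may lie near zero or when the scale of the variable is unknown.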
  • 446. FIGURE 16.12 Excel spreadsheet to implement a sensitivity analysis of the maximum power to variations of voltage. This routine accesses the macro program for golden-section search from Fig. 16.11.

         A      B          C
    1    Maximum Power Transfer
    2
    3    R1     8
    4    R2     12
    5    R3     10
    6    Rmin   0.1
    7    Rmax   100
    8    V      Ra         P(Ra)
    9    45     16.44444   9.501689
    10   60     16.44444   16.89189
    11   75     16.44444   26.39358
    12   90     16.44444   38.00676
    13   105    16.44444   51.73142

    Cell B9 (call to Visual Basic macro function):  =Golden($B$6,$B$7,$B$3,$B$4,$B$5,A9)
    Cell C9 (power calculation):  =(A9*$B$5*B9/($B$3*(B9+$B$4+$B$5)+$B$5*B9+$B$5*$B$4))^2/B9

FIGURE 16.13 Results of sensitivity analysis of the effect of voltage variations on maximum power. Axes: V (V) from 45 to 105; P (W) and Ra (Ω).

16.4 EQUILIBRIUM AND MINIMUM POTENTIAL ENERGY (MECHANICAL/AEROSPACE ENGINEERING)

Background. As in Figure 16.14a, an unloaded spring can be attached to a wall mount. When a horizontal force is applied, the spring stretches. The displacement is related to the force by Hooke's law, $F = kx$. The potential energy of the deformed state consists of the difference between the strain energy of the spring and the work done by the force,

$$PE(x) = 0.5kx^2 - Fx \tag{16.12}$$

Equation (16.12) defines a parabola. Since the potential energy will be at a minimum at equilibrium, the solution for displacement can be viewed as a one-dimensional optimization problem.

  • 447. Because this equation is so easy to differentiate, we can solve for the displacement as $x = F/k$. For example, if $k$ = 2 N/cm and $F$ = 5 N, $x$ = 5 N/(2 N/cm) = 2.5 cm.

A more interesting two-dimensional case is shown in Figure 16.15. In this system, there are two degrees of freedom in that the system can move both horizontally and vertically. In the same way that we approached the one-dimensional system, the equilibrium deformations are the values of $x_1$ and $x_2$ that minimize the potential energy,

$$PE(x_1, x_2) = 0.5k_a\left(\sqrt{x_1^2 + (L_a - x_2)^2} - L_a\right)^2 + 0.5k_b\left(\sqrt{x_1^2 + (L_b + x_2)^2} - L_b\right)^2 - F_1x_1 - F_2x_2 \tag{16.13}$$

If the parameters are $k_a$ = 9 N/cm, $k_b$ = 2 N/cm, $L_a$ = 10 cm, $L_b$ = 10 cm, $F_1$ = 2 N, and $F_2$ = 4 N, solve for the displacements and the potential energy.

Solution. We can use a variety of software tools to solve this problem. For example, using MATLAB, an M-file can be developed to hold the potential energy function,

    function p=PE(x,ka,kb,La,Lb,F1,F2)
    PEa=0.5*ka*(sqrt(x(1)^2+(La-x(2))^2)-La)^2;
    PEb=0.5*kb*(sqrt(x(1)^2+(Lb+x(2))^2)-Lb)^2;
    W=F1*x(1)+F2*x(2);
    p=PEa+PEb-W;

The solution can then be obtained with the fminsearch function,

    ka=9;kb=2;La=10;Lb=10;F1=2;F2=4;
    [x,f]=fminsearch(@PE,[-0.5,0.5],[],ka,kb,La,Lb,F1,F2)
    x =
        4.9523    1.2769
    f =
       -9.6422

Thus, at equilibrium, the potential energy is −9.6422 N·cm. The connecting point is located 4.9523 cm to the right and 1.2769 cm above its original position.

FIGURE 16.14 (a) An unloaded spring attached to a wall mount. (b) Application of a horizontal force stretches the spring where the relationship between force and displacement is described by Hooke's law.

FIGURE 16.15 A two-spring system: (a) unloaded, and (b) loaded.
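The two-spring result can be cross-checked without MATLAB. The sketch below swaps in a different technique from the one used in the text: instead of the Nelder-Mead simplex behind fminsearch, it applies plain gradient descent with the analytical gradient of Eq. (16.13). The step size of 0.05 and the iteration cap are assumptions chosen for this particular problem, and the function names are illustrative.

```python
import math

KA, KB = 9.0, 2.0      # spring constants, N/cm
LA, LB = 10.0, 10.0    # unloaded spring lengths, cm
F1, F2 = 2.0, 4.0      # applied forces, N


def potential_energy(x1, x2):
    """Potential energy of the two-spring system, Eq. (16.13), in N*cm."""
    ra = math.hypot(x1, LA - x2)     # stretched length of spring a
    rb = math.hypot(x1, LB + x2)     # stretched length of spring b
    return (0.5 * KA * (ra - LA) ** 2 + 0.5 * KB * (rb - LB) ** 2
            - F1 * x1 - F2 * x2)


def gradient(x1, x2):
    """Analytical partial derivatives of Eq. (16.13)."""
    ra = math.hypot(x1, LA - x2)
    rb = math.hypot(x1, LB + x2)
    g1 = KA * (ra - LA) * x1 / ra + KB * (rb - LB) * x1 / rb - F1
    g2 = (-KA * (ra - LA) * (LA - x2) / ra
          + KB * (rb - LB) * (LB + x2) / rb - F2)
    return g1, g2


# Same starting guess as the fminsearch call in the text.
x1, x2 = -0.5, 0.5
step = 0.05                          # assumed fixed step size
for _ in range(20000):
    g1, g2 = gradient(x1, x2)
    if math.hypot(g1, g2) < 1e-10:   # stop once the gradient is negligible
        break
    x1 -= step * g1
    x2 -= step * g2

pe_min = potential_energy(x1, x2)
print(f"x1 = {x1:.4f} cm, x2 = {x2:.4f} cm, PE = {pe_min:.4f} N*cm")
```

A fixed step is adequate here because the curvature of this energy surface is bounded by roughly $k_a + k_b$; for stiffer or badly scaled systems a line search or a library optimizer would be more robust.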
  • 448. PROBLEMS 431 PROBLEMS Chemical/Bio Engineering 16.1 Design the optimal cylindrical container (Fig. P16.1) that is open at one end and has walls of negligible thickness. The con- tainer is to hold 0.5 m3 . Design it so that the areas of its bottom and sides are minimized. 16.2 (a) Design the optimal conical container (Fig. P16.2) that has a cover and has walls of negligible thickness. The container is to hold 0.5 m3 . Design it so that the areas of its top and sides are mini- mized. (b) Repeat (a) but for a conical container without a cover. 16.3 Design the optimal cylindrical tank with dished ends (Fig. P16.3). The container is to hold 0.5 m3 and has walls of negli- gible thickness. Note that the area and volume of each of the dished ends can be computed with A 5 p(h2 1 r2 ) V 5 ph(h2 1 3r2 ) 6 (a) Design the tank so that its surface area is minimized. Interpret the result. (b) Repeat part (a), but add the constraint L $ 2h. FIGURE P16.1 A cylindrical container with no lid. h r Open FIGURE P16.2 A conical container with a lid. h r Lid FIGURE P16.3 L h r 16.4 The specific growth rate of a yeast that produces an antibiotic is a function of the food concentration c, g 5 2c 4 1 0.8c 1 c2 1 0.2c3 As depicted in Fig. P16.4, growth goes to zero at very low concen- trations due to food limitation. It also goes to zero at high concen- trations due to toxicity effects. Find the value of c at which growth is a maximum. 16.5 A chemical plant makes three major products on a weekly basis. Each of these products requires a certain quantity of raw chemical and different production times, and yields different profits. The pertinent information is in Table P16.5. Note that there is sufficient warehouse space at the plant to store a total of 450 kg/week. FIGURE P16.4 The specific growth rate of a yeast that produces an antibiotic versus the food concentration. 5 10 0.4 0 0 c (mg/L) g (d⫺1 ) 0.2
  • 449. 432 CASE STUDIES: OPTIMIZATION that the initial cost of the system is a function of the conversion xA. Find the conversion that will result in the lowest cost system. C is a proportionality constant. Cost 5 C c a 1 (1 2 xA)2 b 0.6 1 5a 1 xA b 0.6 d 16.9 In problem 16.8, only one reactor is used. If two reactors are used in series, the governing equation for the system changes. Find the conversions for both reactors (xA1 and xA2) such that the total cost of the system is minimized. Cost 5 C c a xA1 xA2(1 2 xA1)2 b 0.6 1 a 1 2 (xA1 xA2 ) (1 2 xA2)2 b 0.6 1 5a 1 xA2 b 0.6 d 16.10 For the reaction: 2A 1 B 3 C equilibrium can be expressed as: K 5 [C] [A]2 [B] 5 [C] [A0 2 2C]2 [B0 2 C] If K 5 2 M21 , the initial concentration of A (A0) can be varied. The initial concentration of B is fixed by the process, B0 5 100. A costs $1/M and C sells for $10/M. What would be the optimum initial concentration ofA to use such that the profits would be maximized? 16.11 A chemical plant requires 106 L/day of a solution.Three sources are available at different prices and supply rates. Each source also has a different concentration of an impurity that must be kept below a minimum level to prevent interference with the chemical. The data for the three sources are summarized in the following table. Determine the amount from each source to meet the requirements at the least cost. Source 1 Source 2 Source 3 Required Cost ($yL) 0.50 1.00 1.20 minimize Supply (105 Lyday) 20 10 5 $10 Concentration (mgyL) 135 100 75 #100 (a) Set up a linear programming problem to maximize profit. (b) Solve the linear programming problem with the simplex method. (c) Solve the problem with a software package. (d) Evaluate which of the following options will raise profits the most: increasing raw chemical, production time, or storage. 16.6 Recently chemical engineers have become involved in the area known as waste minimization. 
This involves the operation of a chemical plant so that impacts on the environment are minimized. Suppose a refinery develops a product Z1 made from two raw materials X and Y. The production of 1 metric tonne of the product involves 1 tonne of X and 2.5 tonnes of Y and produces 1 tonne of a liquid waste W. The engineers have come up with three alternative ways to handle the waste: • Produce a tonne of a secondary product Z2 by adding an addi- tional tonne of X to each tonne of W. • Produce a tonne of another secondary product Z3 by adding an additional tonne of Y to each tonne of W. • Treat the waste so that it is permissible to discharge it. The products yield profits of $2500, 2$50, and $200/tonne for Z1, Z2, and Z3, respectively. Note that producing Z2 actually creates a loss. The treatment process costs $300/tonne. In addition, the com- pany has access to a limit of 7500 and 10,000 tonnes of X and Y, respectively, during the production period. Determine how much of the products and waste must be created in order to maximize profit. 16.7 A mixture of benzene and toluene are to be separated in a flash tank. At what temperature should the tank be operated to get the highest purity toluene in the liquid phase (maximizing xT)? The pres- sure in the flash tank is 800 mm Hg. The units for Antoine’s equation are mm Hg and 8C for pressure and temperature, respectively. xB PsatB 1 xT PsatT 5 P log10 (PsatB) 5 6.905 2 1211 T 1 221 log10 (PsatT) 5 6.953 2 1344 T 1 219 16.8 A compoundA will be converted into B in a stirred tank reactor. The product B and unreacted A are purified in a separation unit. Unreacted A is recycled to the reactor. A process engineer has found TABLE P16.5 Resource Product 1 Product 2 Product 3 Availability Raw chemical 7 kg/kg 5 kg/kg 13 kg/kg 3000 kg Production time 0.05 hr/kg 0.1 hr/kg 0.2 hr/kg 55 hr/week Profit $30/kg $30/kg $35/kg
  • 450. PROBLEMS 433 16.16 Suppose that you are asked to design a column to support a compressive load P, as shown in Fig. P16.16a. The column has a cross-section shaped as a thin-walled pipe as shown in Fig. P16.16b. The design variables are the mean pipe diameter d and the wall thickness t. The cost of the pipe is computed by Cost 5 f(t, d) 5 c1W 1 c2d where c1 5 4 and c2 5 2 are cost factors and W 5 weight of the pipe, W 5 pdtHr where r 5 density of the pipe material 5 0.0025 kg/cm3 . The col- umn must support the load under compressive stress and not buckle. Therefore, Actual stress (s) # maximum compressive yield stress 5 sy 5 550 kg/cm2 Actual stress # buckling stress 16.12 You must design a triangular open channel to carry a waste stream from a chemical plant to a waste stabilization pond (Fig. P16.12). The mean velocity increases with the hydraulic radius Rh 5 Ayp, where A is the cross-sectional area and p equals the wetted perimeter. Because the maximum flow rate corresponds to the maximum velocity, the optimal design amounts to minimiz- ing the wetted perimeter. Determine the dimensions to minimize the wetted perimeter for a given cross-sectional area.Are the relative dimensions universal? 16.13 As an agricultural engineer, you must design a trapezoi- dal open channel to carry irrigation water (Fig. P16.13). Deter- mine the optimal dimensions to minimize the wetted perimeter for a cross-sectional area of 100 m2 . Are the relative dimensions universal? 16.14 Find the optimal dimensions for a heated cylindrical tank designed to hold 10 m3 of fluid. The ends and sides cost $200/m2 and $100/m2 , respectively. In addition, a coating is applied to the entire tank area at a cost of $50/m2 . Civil/Environmental Engineering 16.15 A finite-element model of a cantilever beam subject to load- ing and moments (Fig. P16.15) is given by optimizing f(x, y) 5 5x2 2 5xy 1 2.5y2 2 x 2 1.5y where x 5 end displacement and y 5 end moment. 
Find the values of x and y that minimize f(x, y). FIGURE P16.12 w d ␪ ␪ FIGURE P16.13 w d ␪ ␪ FIGURE P16.15 A cantilever beam. x y FIGURE P16.16 (a) A column supporting a compressive load P. (b) The column has a cross section shaped as a thin-walled pipe. (a) H P (b) d t
  • 451. 434 CASE STUDIES: OPTIMIZATION below the point discharge. This point is called “critical” because it represents the location where biota that depend on oxygen (like fish) would be the most stressed. Determine the critical travel time and concentration, given the following values: os 5 10 mg/L kd 5 0.1 d21 ka 5 0.6 d21 ks 5 0.05 d21 Lo 5 50 mg/L Sb 5 1 mg/L/d 16.18 The two-dimensional distribution of pollutant concentration in a channel can be described by c(x, y) 5 7.9 1 0.13x 1 0.21y 2 0.05x2 2 0.016y2 2 0.007xy Determine the exact location of the peak concentration given the function and the knowledge that the peak lies within the bounds 210 # x # 10 and 0 # y # 20. 16.19 The flow Q (m3 /s) in an open channel can be predicted with the Manning equation Q 5 1 n AcR2y3 S1y2 where n 5 Manning roughness coefficient (a dimensionless num- ber used to parameterize the channel friction), Ac 5 cross-sectional area of the channel (m2 ), S 5 channel slope (dimensionless, meters drop per meter length), and R 5 hydraulic radius (m), which is re- lated to more fundamental parameters by R 5 AcyP, where P 5 wetted perimeter (m). As the name implies, the wetted perimeter is the length of the channel sides and bottom that is under water. For example, for a rectangular channel, it is defined as P 5 B 1 2H, where H 5 depth (m). Suppose that you are using this formula to design a lined canal (note that farmers line canals to minimize leak- age losses). (a) Given the parameters n 5 0.035, S 5 0.003, and Q 5 1 m3 /s, determine the values of B and H that minimize the wetted pe- rimeter. Note that such a calculation would minimize cost if lining costs were much larger than excavation costs. (b) Repeat part (a), but include the cost of excavation. To do this minimize the following cost function, C 5 c1 Ac 1 c2P where c1 is a cost factor for excavation 5 $100/m2 and c2 is a cost factor for lining $50/m. (c) Discuss the implications of your results. 
16.20 A cylindrical beam carries a compression load P 5 3000 kN. To prevent the beam from buckling, this load must be less than a critical load, Pc 5 p2 EI L2 The actual stress is given by s 5 P A 5 P pdt The buckling stress can be shown to be sb 5 pEI H2 dt where E 5 modulus of elasticity and I 5 second moment of the area of the cross section. Calculus can be used to show that I 5 p 8 dt(d2 1 t2 ) Finally, diameters of available pipes are between d1 and d2 and thicknesses between t1 and t2. Develop and solve this problem by determining the values of d and t that minimize the cost. Note that H 5 275 cm, P 5 2000 kg, E 5 900,000 kg/cm2 , d1 5 1 cm, d2 5 10 cm, t1 5 0.1 cm, and t2 5 1 cm. 16.17 The Streeter-Phelps model can be used to compute the dissolved oxygen concentration in a river below a point discharge of sewage (Fig. P16.17), o 5 os 2 kd Lo kd 1 ks 2 ka (e2ka t 2 e2(kd1ks)t ) 2 Sb ka (1 2 e2ka t ) (P16.17) where o 5 dissolved oxygen concentration (mg/L), os 5 oxygen saturation concentration (mg/L), t 5 travel time (d), Lo 5 biochem- ical oxygen demand (BOD) concentration at the mixing point (mg/L), kd 5 rate of decomposition of BOD (d21 ), ks 5 rate of set- tling of BOD (d21 ), ka 5 reaeration rate (d21 ), and Sb 5 sediment oxygen demand (mg/L/d). As indicated in Fig. P16.17, Eq. (P16.17) produces an oxygen “sag” that reaches a critical minimum level oc some travel time tc 15 20 8 12 0 0 t (d) 5 4 10 o (mg/L) o os tc oc FIGURE P16.17 A dissolved oxygen “sag” below a point discharge of sewage into a river.
  • 452. PROBLEMS
where \(E\) = Young’s modulus = \(200 \times 10^{9}\) N/m², \(I = \pi r^4/4\) (the area moment of inertia for a cylindrical beam of radius \(r\)), and \(L\) is the beam length. If the volume of beam \(V\) cannot exceed 0.075 m³, find the largest height \(L\) that can be utilized and the corresponding radius.
16.21 The Splash River has a flow rate of \(2 \times 10^{6}\) m³/d, of which up to 70% can be diverted into two channels where it flows through Splish County. These channels are used for transportation, irrigation, and electric power generation, with the latter two being sources of revenue. The transportation use requires a minimum diverted flow rate of \(0.3 \times 10^{6}\) m³/d for Channel 1 and \(0.2 \times 10^{6}\) m³/d for Channel 2. For political reasons it has been decided that the absolute difference between the flow rates in the two channels cannot exceed 40% of the total flow diverted into the channels. The Splish County Water Management Board has also limited maintenance costs for the channel system to be no more than $\(1.8 \times 10^{6}\) per year. Annual maintenance costs are estimated based on the daily flow rate. Channel 1 costs per year are estimated by multiplying $1.1 times the m³/d of flow, while for Channel 2 the multiplication factor is $1.4 per m³/d. Electric power production revenue is also estimated based on daily flow rate. For Channel 1 this is $4.0 per m³/d, while for Channel 2 it is $3.0 per m³/d. Annual revenue from irrigation is also estimated based on daily flow rate, but the flow rates must first be corrected for water loss in the channels previous to delivery for irrigation. This loss is 30% in Channel 1 and 20% in Channel 2. In both channels the revenue is $3.2 per m³/d. Determine the flows in the channels that maximize profit.
16.22 Determine the beam cross-sectional areas that result in the minimum weight for the truss we studied in Sec. 12.2 (Fig. 12.4). The critical buckling and maximum tensile strengths of compression and tension members are 10 and 20 ksi, respectively. The truss is to be constructed of steel (density = 3.5 lb/ft-in²). Note that the length of the horizontal member (2) is 50 ft. Also, recall that the stress in each member is equal to the force divided by cross-sectional area. Set up the problem as a linear programming problem. Obtain the solution graphically and with the Excel Solver.
Electrical Engineering
16.23 A total charge \(Q\) is uniformly distributed around a ring-shaped conductor with radius \(a\). A charge \(q\) is located at a distance \(x\) from the center of the ring (Fig. P16.23). The force exerted on the charge by the ring is given by
\[ F = \frac{1}{4\pi e_0}\,\frac{qQx}{(x^2 + a^2)^{3/2}} \]
where \(e_0 = 8.85 \times 10^{-12}\) C²/(N m²), \(q = Q = 2 \times 10^{-5}\) C, and \(a = 0.9\) m. Determine the distance \(x\) where the force is a maximum.
FIGURE P16.23 (diagram labels: x, a, Q, q)
16.24 A system consists of two power plants that must deliver loads over a transmission network. The costs of generating power at plants 1 and 2 are given by
\[ F_1 = 2p_1 + 2 \]
\[ F_2 = 10p_2 \]
where \(p_1\) and \(p_2\) = power produced by each plant. The losses of power due to transmission \(L\) are given by
\[ L_1 = 0.2p_1 + 0.1p_2 \]
\[ L_2 = 0.2p_1 + 0.5p_2 \]
The total demand for power is 30 and \(p_1\) must not exceed 42. Determine the power generation needed to meet demands while minimizing cost using an optimization routine such as those found in, for example, Excel, MATLAB, or Mathcad software.
16.25 The torque transmitted to an induction motor is a function of the slip between the rotation of the stator field and the rotor speed \(s\), where slip is defined as
\[ s = \frac{n - n_R}{n} \]
where \(n\) = revolutions per second of rotating stator speed and \(n_R\) = rotor speed. Kirchhoff’s laws can be used to show that the torque (expressed in dimensionless form) and slip are related by
\[ T = \frac{15(s - s^2)}{(1 - s)(4s^2 - 3s + 4)} \]
Figure P16.25 shows this function. Use a numerical method to determine the slip at which the maximum torque occurs.
16.26 (a) A computer equipment manufacturer produces scanners and printers. The resources needed for producing these devices and the corresponding profits are
Device    Capital ($/unit)   Labor (hr/unit)   Profit ($/unit)
Scanner         300                20                500
Printer         400                10                400
  • 453. CASE STUDIES: OPTIMIZATION
If there are $127,000 worth of capital and 4270 hr of labor available each day, how many of each device should be produced per day to maximize profit? (b) Repeat the problem, but now assume that the profit for each printer sold \(P_p\) depends on the number of printers produced \(X_p\), as in
\[ P_p = 400 - X_p \]
16.27 A manufacturer provides specialized microchips. During the next 3 months, its sales, costs, and available time are
                              Month 1   Month 2   Month 3
Chips required                  1000      2500      2200
Cost regular time ($/chip)       100       100       120
Cost overtime ($/chip)           110       120       130
Regular operation time (hr)     2400      2400      2400
Overtime (hr)                    720       720       720
There are no chips in stock at the beginning of the first month. It takes 1.5 hr of production time to produce a chip and costs $5 to store a chip from one month to the next. Determine a production schedule that meets the demand requirements, does not exceed the monthly production time limitations, and minimizes cost. Note that no chips should be in stock at the end of the 3 months.
FIGURE P16.25 Torque transmitted to an inductor as a function of slip. (Axes: T versus s.)
Mechanical/Aerospace Engineering
16.28 The total drag on an airfoil can be estimated by
\[ D = \underbrace{0.01\sigma V^2}_{\text{friction}} + \underbrace{\frac{0.95}{\sigma}\left(\frac{W}{V}\right)^2}_{\text{lift}} \]
where \(D\) = drag, \(\sigma\) = ratio of air density between the flight altitude and sea level, \(W\) = weight, and \(V\) = velocity. As seen in Fig. P16.28, the two factors contributing to drag are affected differently as velocity increases. Whereas friction drag increases with velocity, the drag due to lift decreases. The combination of the two factors leads to a minimum drag. (a) If \(\sigma = 0.5\) and \(W = 15{,}000\), determine the minimum drag and the velocity at which it occurs. (b) In addition, develop a sensitivity analysis to determine how this optimum varies in response to a range of \(W = 12{,}000\) to \(18{,}000\) with \(\sigma = 0.5\).
FIGURE P16.28 Plot of drag versus velocity for an airfoil. (Axes: D versus V; curves: friction, lift, and total, with the minimum marked.)
16.29 Roller bearings are subject to fatigue failure caused by large contact loads \(F\) (Fig. P16.29).
FIGURE P16.29 Roller bearings.
  • 454. The problem of finding the location of the maximum stress along the x axis can be shown to be equivalent to maximizing the function
\[ f(x) = \frac{0.4}{\sqrt{1 + x^2}} - \sqrt{1 + x^2}\left(1 - \frac{0.4}{1 + x^2}\right) + x \]
Find the \(x\) that maximizes \(f(x)\).
16.30 An aerospace company is developing a new fuel additive for commercial airliners. The additive is composed of three ingredients: X, Y, and Z. For peak performance, the total amount of additive must be at least 6 mL/L of fuel. For safety reasons, the sum of the highly flammable X and Y ingredients must not exceed 2.5 mL/L. In addition, the amount of the X ingredient must always be equal to or greater than the Y, and the Z must be greater than half the Y. If the cost per mL for the ingredients X, Y, and Z is 0.05, 0.025, and 0.15, respectively, determine the minimum cost mixture for each liter of fuel.
16.31 A manufacturing firm produces four types of automobile parts. Each is first fabricated and then finished. The required worker hours and profit for each part are
Part                                A       B       C       D
Fabrication time (hr/100 units)    2.5     1.5     2.75    2
Finishing time (hr/100 units)      3.5     3       3       2
Profit ($/100 units)               375     275     475     325
The capacities of the fabrication and finishing shops over the next month are 640 and 960 hours, respectively. Determine how many of each part should be produced in order to maximize profit.
16.32 In a similar fashion to the case study described in Sec. 16.4, develop the potential energy function for the system depicted in Fig. P16.32. Develop contour and surface plots in MATLAB. Minimize the potential energy function in order to determine the equilibrium displacements \(x_1\) and \(x_2\) given the forcing function \(F = 100\) N, and the parameters \(k_a = 20\) and \(k_b = 15\) N/m.
FIGURE P16.32 Two frictionless masses connected to a wall by a pair of linear elastic springs.
16.33 Recent interest in competitive and recreational cycling has meant that engineers have directed their skills toward the design and testing of mountain bikes (Fig. P16.33a). Suppose that you are given the task of predicting the horizontal and vertical displacement of a bike bracketing system in response to a force. Assume the forces you must analyze can be simplified as depicted in Fig. P16.33b. You are interested in testing the response of the truss to a force exerted in any number of directions designated by the angle \(\theta\). The parameters for the problem are \(E\) = Young’s modulus = \(2 \times 10^{11}\) Pa, \(A\) = cross-sectional area = 0.0001 m², \(w\) = width = 0.44 m, \(\ell\) = length = 0.56 m, and \(h\) = height = 0.5 m. The displacements \(x\) and \(y\) can be solved by determining the values that yield a minimum potential energy. Determine the displacements for a force of 10,000 N and a range of \(\theta\)’s from 0° (horizontal) to 90° (vertical).
FIGURE P16.33 (a) A mountain bike along with (b) a free-body diagram for a part of the frame.
  • 455. EPILOGUE: PART FOUR
The epilogues of other parts of this book contain a discussion and a tabular summary of the trade-offs among various methods as well as important formulas and relationships. Most of the methods of this part are quite complicated and, consequently, cannot be summarized with simple formulas and tabular summaries. Therefore, we deviate somewhat here by providing the following narrative discussion of trade-offs and further references.
PT4.4 TRADE-OFFS
Chapter 13 dealt with finding the optimum of an unconstrained function of a single variable. The golden-section search method is a bracketing method requiring that an interval containing a single optimum be known. It has the advantage that it minimizes function evaluations and always converges. Parabolic interpolation also works best when implemented as a bracketing method, although it can also be programmed as an open method. However, in such cases, it may diverge. Neither the golden-section search nor parabolic interpolation requires derivative evaluations. Thus, they are both appropriate methods when the bracket can be readily defined and function evaluations are costly.
Newton’s method is an open method not requiring that an optimum be bracketed. It can be implemented in a closed-form representation when first and second derivatives can be determined analytically. It can also be implemented in a fashion similar to the secant method with finite-difference representations of the derivatives. Although Newton’s method converges rapidly near the optimum, it is often divergent for poor guesses. Convergence is also dependent on the nature of the function.
Finally, hybrid approaches are available that orchestrate various methods to attain both reliability and efficiency. Brent’s method does this by combining the reliable golden-section search with speedy parabolic interpolation.
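As an illustration of the bracketing idea described above, the following is a minimal sketch of a golden-section search for a maximum. It is applied to the ring-charge force of Prob. 16.23 only because that function has a single peak on the chosen bracket; the function name `golden_section_max`, the bracket [0, 2], and the tolerance are assumptions made for this sketch, not values taken from the text.

```python
import math

def golden_section_max(f, xl, xu, tol=1e-6, max_iter=200):
    """Locate a maximum of f on [xl, xu], assuming f is unimodal there."""
    r = (math.sqrt(5.0) - 1.0) / 2.0      # golden ratio conjugate, about 0.618
    d = r * (xu - xl)
    x1, x2 = xl + d, xu - d               # two interior points, x2 < x1
    f1, f2 = f(x1), f(x2)
    for _ in range(max_iter):
        if xu - xl < tol:
            break
        if f1 > f2:
            # Maximum lies in [x2, xu]; the old x1 is reused as the new x2.
            xl = x2
            x2, f2 = x1, f1
            x1 = xl + r * (xu - xl)
            f1 = f(x1)
        else:
            # Maximum lies in [xl, x1]; the old x2 is reused as the new x1.
            xu = x1
            x1, f1 = x2, f2
            x2 = xu - r * (xu - xl)
            f2 = f(x2)
    return 0.5 * (xl + xu)

# Parameters of Prob. 16.23
E0 = 8.85e-12          # C^2/(N m^2)
Q_RING = Q_POINT = 2e-5  # C
A = 0.9                # m

def force(x):
    """Force on the point charge at distance x from the ring center."""
    return Q_POINT * Q_RING * x / (4.0 * math.pi * E0 * (x * x + A * A) ** 1.5)

# Bracket [0, 2] m is an assumed starting interval containing the single peak.
x_opt = golden_section_max(force, 0.0, 2.0)
print(f"x at maximum force = {x_opt:.4f} m, F = {force(x_opt):.4f} N")
```

Only one new function evaluation is needed per iteration because one interior point is always reused, which is the property that makes the method economical when evaluations are costly. Setting the derivative of the force to zero gives x = a/√2 analytically, which provides an independent check on the numerical result.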
Chapter 14 covered two general types of methods to solve multidimensional unconstrained optimization problems. Direct methods such as random searches and univariate searches do not require the evaluation of the function’s derivatives and are often inefficient. However, they also provide a tool to find global rather than local optima. Pattern search methods like Powell’s method can be very efficient and also do not require derivative evaluation.
Gradient methods use first and sometimes second derivatives to find the optimum. The method of steepest ascent/descent provides a reliable but sometimes slow approach. In contrast, Newton’s method often converges rapidly when in the vicinity of an optimum, but sometimes suffers from divergence. The Marquardt method uses the steepest descent method at the starting location far away from the optimum and switches to Newton’s method near the optimum in an attempt to take advantage of the strengths of each method.
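The contrast between the two gradient approaches can be seen in a small sketch. The quadratic test function below, its starting point, and the step size are illustrative assumptions, and the first routine is a simplified fixed-step gradient ascent rather than steepest ascent with a full line search along the gradient.

```python
def grad_f(x, y):
    """Gradient of the illustrative function f(x, y) = 2xy + 2x - x**2 - 2y**2."""
    return 2.0 * y + 2.0 - 2.0 * x, 2.0 * x - 4.0 * y

def gradient_ascent(grad, x, y, step=0.2, tol=1e-10, max_iter=10_000):
    """Fixed-step gradient ascent; returns the point and the iterations used."""
    for i in range(max_iter):
        gx, gy = grad(x, y)
        if gx * gx + gy * gy < tol * tol:   # gradient norm below tolerance
            return x, y, i
        x += step * gx
        y += step * gy
    return x, y, max_iter

def newton_step(x, y):
    """One Newton step; the Hessian of this quadratic is constant."""
    gx, gy = grad_f(x, y)
    hxx, hxy, hyy = -2.0, 2.0, -4.0         # Hessian [[-2, 2], [2, -4]]
    det = hxx * hyy - hxy * hxy
    # Solve H * d = -g by Cramer's rule
    dx = (-gx * hyy + gy * hxy) / det
    dy = (-gy * hxx + gx * hxy) / det
    return x + dx, y + dy

x0, y0 = -1.0, 1.0                          # assumed starting guess
x_sa, y_sa, n_iter = gradient_ascent(grad_f, x0, y0)
x_nt, y_nt = newton_step(x0, y0)
print(f"gradient ascent: ({x_sa:.6f}, {y_sa:.6f}) after {n_iter} iterations")
print(f"Newton:          ({x_nt:.6f}, {y_nt:.6f}) after 1 step")
```

For a quadratic function a single Newton step lands on the optimum, whereas the gradient iteration needs many small steps. On nonquadratic functions Newton’s method loses this guarantee and can diverge from a poor starting point, which is the motivation for the Marquardt blend described above.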
  • 456. The Newton method can be computationally costly because it requires computation of both the gradient vector and the Hessian matrix. Quasi-Newton approaches attempt to circumvent these problems by using approximations to reduce the number of matrix evaluations (particularly the evaluation, storage, and inversion of the Hessian).
Research investigations continue today that explore the characteristics and respective strengths of various hybrid and tandem methods. Some examples are the Fletcher-Reeves conjugate gradient method and Davidon-Fletcher-Powell quasi-Newton methods.
Chapter 15 was devoted to constrained optimization. For linear problems, linear programming based on the simplex method provides an efficient means to obtain solutions. Approaches such as the generalized reduced gradient (GRG) method are available to solve nonlinear constrained problems.
Software packages include a wide variety of optimization capabilities. As described in Chap. 15, Excel, MATLAB software, and Mathcad all have built-in search capabilities that can be used for both one-dimensional and multidimensional problems routinely encountered in engineering and science.
PT4.5 ADDITIONAL REFERENCES
General overviews of optimization including some algorithms can be found in Press et al. (2007) and Moler (2004). For multidimensional problems, additional information can be found in Dennis and Schnabel (1996), Fletcher (1980, 1981), Gill et al. (1981), and Luenberger (1984).
In addition, there are a number of advanced methods that are well suited for specific problem contexts. For example, genetic algorithms use strategies inspired by evolutionary biology such as inheritance, mutation, and selection. Because they do not make assumptions regarding the underlying search space, such evolutionary algorithms are often useful for large problems with many local optima. Related techniques include simulated annealing and Tabu search.
Hillier and Lieberman (2005) provide overviews of these and a number of other advanced techniques.
  • 458. PT5.1 MOTIVATION
Data are often given for discrete values along a continuum. However, you may require estimates at points between the discrete values. The present part of this book describes techniques to fit curves to such data to obtain intermediate estimates. In addition, you may require a simplified version of a complicated function. One way to do this is to compute values of the function at a number of discrete values along the range of interest. Then, a simpler function may be derived to fit these values. Both of these applications are known as curve fitting.
There are two general approaches for curve fitting that are distinguished from each other on the basis of the amount of error associated with these data. First, where these data exhibit a significant degree of error or “noise,” the strategy is to derive a single curve that represents the general trend of these data. Because any individual data point may be incorrect, we make no effort to intersect every point. Rather, the curve is designed to follow the pattern of the points taken as a group. One approach of this nature is called least-squares regression (Fig. PT5.1a).
Second, where these data are known to be very precise, the basic approach is to fit a curve or a series of curves that pass directly through each of the points. Such data usually originate from tables. Examples are values for the density of water or for the heat capacity of gases as a function of temperature. The estimation of values between well-known discrete points is called interpolation (Fig. PT5.1b and c).
PT5.1.1 Noncomputer Methods for Curve Fitting
The simplest method for fitting a curve to data is to plot the points and then sketch a line that visually conforms to these data. Although this is a valid option when quick estimates are required, the results are dependent on the subjective viewpoint of the person sketching the curve. For example, Fig.
PT5.1 shows sketches developed from the same set of data by three engineers. The first did not attempt to connect the points, but rather, characterized the general upward trend of these data with a straight line (Fig. PT5.1a). The second engineer used straight-line segments or linear interpolation to connect the points (Fig. PT5.1b). This is a very common practice in engineering. If the values are truly close to being linear or are spaced closely, such an approximation provides estimates that are adequate for many engineering calculations. However, where the underlying relationship is highly curvilinear or these data are widely spaced, significant errors can be introduced by such linear interpolation. The third engineer used curves to try to capture the meanderings suggested by these data (Fig. PT5.1c). A fourth or fifth engineer would likely develop alternative fits. Obviously, our goal here is to develop systematic and objective methods for the purpose of deriving such curves.
CURVE FITTING
  • 459. PT5.1.2 Curve Fitting and Engineering Practice
Your first exposure to curve fitting may have been to determine intermediate values from tabulated data, for instance, from interest tables for engineering economics or from steam tables for thermodynamics. Throughout the remainder of your career, you will have frequent occasion to estimate intermediate values from such tables.
Although many of the widely used engineering properties have been tabulated, there are a great many more that are not available in this convenient form. Special cases and new problem contexts often require that you measure your own data and develop your own predictive relationships.
FIGURE PT5.1 Three attempts to fit a “best” curve through five data points. (a) Least-squares regression, (b) linear interpolation, and (c) curvilinear interpolation. (Each panel plots f(x) versus x.)
Two types of applications are generally encountered when fitting experimental data: trend analysis and hypothesis testing. Trend analysis represents the process of using the pattern of these data to make predictions. For cases where these data are measured with high precision, you might
  • 460. utilize interpolating polynomials. Imprecise data are often analyzed with least-squares regression. Trend analysis may be used to predict or forecast values of the dependent variable. This can involve extrapolation beyond the limits of the observed data or interpolation within the range of the data. All fields of engineering commonly involve problems of this type.
A second engineering application of experimental curve fitting is hypothesis testing. Here, an existing mathematical model is compared with measured data. If the model coefficients are unknown, it may be necessary to determine values that best fit the observed data. On the other hand, if estimates of the model coefficients are already available, it may be appropriate to compare predicted values of the model with observed values to test the adequacy of the model. Often, alternative models are compared and the “best” one is selected on the basis of empirical observations.
In addition to the above engineering applications, curve fitting is important in other numerical methods such as integration and the approximate solution of differential equations. Finally, curve-fitting techniques can be used to derive simple functions to approximate complicated functions.
PT5.2 MATHEMATICAL BACKGROUND
The prerequisite mathematical background for interpolation is found in the material on Taylor series expansions and finite divided differences introduced in Chap. 4. Least-squares regression requires additional information from the field of statistics. If you are familiar with the concepts of the mean, standard deviation, residual sum of the squares, normal distribution, and confidence intervals, feel free to skip the following pages and proceed directly to PT5.3. If you are unfamiliar with these concepts or are in need of a review, the following material is designed as a brief introduction to these topics.
PT5.2.1 Simple Statistics
Suppose that in the course of an engineering study, several measurements were made of a particular quantity. For example, Table PT5.1 contains 24 readings of the coefficient of thermal expansion of a structural steel.
TABLE PT5.1 Measurements of the coefficient of thermal expansion of structural steel [× 10⁻⁶ in/(in · °F)].
6.495   6.595   6.615   6.635   6.485   6.555
6.665   6.505   6.435   6.625   6.715   6.655
6.755   6.625   6.715   6.575   6.655   6.605
6.565   6.515   6.555   6.395   6.775   6.685
Taken at face value, these data provide a limited amount of information, namely that the values range from a minimum of 6.395 to a maximum of 6.775. Additional insight can be gained by summarizing these data in one or more well-chosen statistics that convey as much information as possible about specific characteristics of the data set. These descriptive statistics are most often selected
  • 461. to represent (1) the location of the center of the distribution of these data and (2) the degree of spread of the data set.
The most common location statistic is the arithmetic mean. The arithmetic mean (\(\bar{y}\)) of a sample is defined as the sum of the individual data points (\(y_i\)) divided by the number of points (\(n\)), or
\[ \bar{y} = \frac{\sum y_i}{n} \tag{PT5.1} \]
where the summation (and all the succeeding summations in this introduction) is from \(i = 1\) through \(n\).
The most common measure of spread for a sample is the standard deviation (\(s_y\)) about the mean,
\[ s_y = \sqrt{\frac{S_t}{n - 1}} \tag{PT5.2} \]
where \(S_t\) is the total sum of the squares of the residuals between the data points and the mean, or
\[ S_t = \sum (y_i - \bar{y})^2 \tag{PT5.3} \]
Thus, if the individual measurements are spread out widely around the mean, \(S_t\) (and, consequently, \(s_y\)) will be large. If they are grouped tightly, the standard deviation will be small. The spread can also be represented by the square of the standard deviation, which is called the variance:
\[ s_y^2 = \frac{\sum (y_i - \bar{y})^2}{n - 1} \tag{PT5.4} \]
Note that the denominator in both Eqs. (PT5.2) and (PT5.4) is \(n - 1\). The quantity \(n - 1\) is referred to as the degrees of freedom. Hence \(S_t\) and \(s_y\) are said to be based on \(n - 1\) degrees of freedom. This nomenclature derives from the fact that the sum of the quantities upon which \(S_t\) is based (that is, \(\bar{y} - y_1, \bar{y} - y_2, \ldots, \bar{y} - y_n\)) is zero. Consequently, if \(\bar{y}\) is known and \(n - 1\) of the values are specified, the remaining value is fixed. Thus, only \(n - 1\) of the values are said to be freely determined. Another justification for dividing by \(n - 1\) is the fact that there is no such thing as the spread of a single data point. For the case where \(n = 1\), Eqs. (PT5.2) and (PT5.4) yield a meaningless result (the indeterminate form 0/0).
It should be noted that an alternative, more convenient formula is available to compute the variance, and hence the standard deviation,
\[ s_y^2 = \frac{\sum y_i^2 - \left(\sum y_i\right)^2/n}{n - 1} \]
This version does not require precomputation of \(\bar{y}\) and yields a result identical to that of Eq. (PT5.4).
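The formulas above translate directly into a few lines of code. The following sketch evaluates Eqs. (PT5.1) through (PT5.4) and the alternative variance formula for the 24 readings of Table PT5.1; the variable names are chosen for this illustration only.

```python
import math

# Readings from Table PT5.1 (units of 1e-6 in/(in * degF))
y = [
    6.495, 6.595, 6.615, 6.635, 6.485, 6.555,
    6.665, 6.505, 6.435, 6.625, 6.715, 6.655,
    6.755, 6.625, 6.715, 6.575, 6.655, 6.605,
    6.565, 6.515, 6.555, 6.395, 6.775, 6.685,
]

n = len(y)
mean = sum(y) / n                                  # Eq. (PT5.1)
St = sum((yi - mean) ** 2 for yi in y)             # Eq. (PT5.3)
variance = St / (n - 1)                            # Eq. (PT5.4)
sy = math.sqrt(variance)                           # Eq. (PT5.2)

# Alternative formula that does not need the mean beforehand
variance_alt = (sum(yi * yi for yi in y) - sum(y) ** 2 / n) / (n - 1)

print(f"n = {n}, mean = {mean:.4f}, St = {St:.6f}")
print(f"sy = {sy:.6f}, variance = {variance:.6f}, alternative = {variance_alt:.6f}")
```

The two variance expressions agree algebraically. In floating-point arithmetic the alternative form subtracts two large, nearly equal sums, so for data with a large mean and a small spread the residual-based form of Eq. (PT5.4) is usually the safer choice.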
  • 462. A final statistic that has utility in quantifying the spread of data is the coefficient of variation (c.v.). This statistic is the ratio of the standard deviation to the mean. As such, it provides a normalized measure of the spread. It is often multiplied by 100 so that it can be expressed in the form of a percent:
\[ \text{c.v.} = \frac{s_y}{\bar{y}}\,100\% \tag{PT5.5} \]
Notice that the coefficient of variation is similar in spirit to the percent relative error (\(\varepsilon_t\)) discussed in Sec. 3.3. That is, it is the ratio of a measure of error (\(s_y\)) to an estimate of the true value (\(\bar{y}\)).
EXAMPLE PT5.1 Simple Statistics of a Sample
Problem Statement. Compute the mean, variance, standard deviation, and coefficient of variation for the data in Table PT5.1.
TABLE PT5.2 Computations for statistics for the readings of the coefficient of thermal expansion. The frequencies and bounds are developed to construct the histogram in Fig. PT5.2.
                                               Interval
 i     y_i     (y_i − ȳ)²     Frequency   Lower Bound   Upper Bound
 1    6.395     0.042025          1           6.36          6.40
 2    6.435     0.027225          1           6.40          6.44
 3    6.485     0.013225
 4    6.495     0.011025          4           6.48          6.52
 5    6.505     0.009025
 6    6.515     0.007225
 7    6.555     0.002025
 8    6.555     0.002025          2           6.52          6.56
 9    6.565     0.001225
10    6.575     0.000625          3           6.56          6.60
11    6.595     0.000025
12    6.605     0.000025
13    6.615     0.000225
14    6.625     0.000625          5           6.60          6.64
15    6.625     0.000625
16    6.635     0.001225
17    6.655     0.003025
18    6.655     0.003025          3           6.64          6.68
19    6.665     0.004225
20    6.685     0.007225
21    6.715     0.013225          3           6.68          6.72
22    6.715     0.013225
23    6.755     0.024025          1           6.72          6.76
24    6.775     0.030625          1           6.76          6.80
 Σ    158.4     0.217000
  • 463. Solution. These data are added (Table PT5.2), and the results are used to compute [Eq. (PT5.1)]
\[ \bar{y} = \frac{158.4}{24} = 6.6 \]
As in Table PT5.2, the sum of the squares of the residuals is 0.217000, which can be used to compute the standard deviation [Eq. (PT5.2)]:
\[ s_y = \sqrt{\frac{0.217000}{24 - 1}} = 0.097133 \]
the variance [Eq. (PT5.4)]:
\[ s_y^2 = 0.009435 \]
and the coefficient of variation [Eq. (PT5.5)]:
\[ \text{c.v.} = \frac{0.097133}{6.6}\,100\% = 1.47\% \]
PT5.2.2 The Normal Distribution
Another characteristic that bears on the present discussion is the data distribution, that is, the shape with which these data are spread around the mean. A histogram provides a simple visual representation of the distribution. As seen in Table PT5.2, the histogram is constructed by sorting the measurements into intervals. The units of measurement are plotted on the abscissa and the frequency of occurrence of each interval is plotted on the ordinate. Thus, five of the measurements fall between 6.60 and 6.64. As in Fig. PT5.2, the histogram suggests that most of these data are grouped close to the mean value of 6.6.
If we have a very large set of data, the histogram often can be approximated by a smooth curve. The symmetric, bell-shaped curve superimposed on Fig. PT5.2 is one such characteristic shape: the normal distribution. Given enough additional measurements, the histogram for this particular case could eventually approach the normal distribution.
The concepts of the mean, standard deviation, residual sum of the squares, and normal distribution all have great relevance to engineering practice. A very simple example is their use to quantify the confidence that can be ascribed to a particular measurement. If a quantity is normally distributed, the range defined by \(\bar{y} - s_y\) to \(\bar{y} + s_y\) will encompass approximately 68 percent of the total measurements. Similarly, the range defined by \(\bar{y} - 2s_y\) to \(\bar{y} + 2s_y\) will encompass approximately 95 percent.
For example, for the data in Table PT5.1 (\(\bar{y} = 6.6\) and \(s_y = 0.097133\)), we can make the statement that approximately 95 percent of the readings should fall between 6.405734 and 6.794266. If someone told us that they had measured a value of 7.35, we would suspect that the measurement might be erroneous. The following section elaborates on such evaluations.
PT5.2.3 Estimation of Confidence Intervals
As should be clear from the previous sections, one of the primary aims of statistics is to estimate the properties of a population based on a limited sample drawn from that
  • 464. population. Clearly, it is impossible to measure the coefficient of thermal expansion for every piece of structural steel that has ever been produced. Consequently, as seen in Tables PT5.1 and PT5.2, we can randomly make a number of measurements and, on the basis of the sample, attempt to characterize the properties of the entire population.
Because we “infer” properties of the unknown population from a limited sample, the endeavor is called statistical inference. Because the results are often reported as estimates of the population parameters, the process is also referred to as estimation.
We have already shown how we estimate the central tendency (sample mean, \(\bar{y}\)) and spread (sample standard deviation and variance) of a limited sample. Now, we will briefly describe how we can attach probabilistic statements to the quality of these estimates. In particular, we will discuss how we can define a confidence interval around our estimate of the mean. We have chosen this particular topic because of its direct relevance to the regression models we will be describing in Chap. 17.
Note that in the following discussion, the nomenclature \(\bar{y}\) and \(s_y\) refer to the sample mean and standard deviation, respectively. The nomenclature \(\mu\) and \(\sigma\) refer to the population mean and standard deviation, respectively. The former are sometimes referred to as the “estimated” mean and standard deviation, whereas the latter are sometimes called the “true” mean and standard deviation.
An interval estimator gives the range of values within which the parameter is expected to lie with a given probability. Such intervals are described as being one-sided or two-sided. As the name implies, a one-sided interval expresses our confidence that the parameter estimate is less than or greater than the true value.
In contrast, the two-sided interval deals with the more general proposition that the estimate agrees with the truth with no consideration to the sign of the discrepancy. Because it is more general, we will focus on the two-sided interval.
FIGURE PT5.2 A histogram used to depict the distribution of data. As the number of data points increases, the histogram could approach the smooth, bell-shaped curve called the normal distribution. (Ordinate: frequency.)
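The histogram of Fig. PT5.2 and the 68 and 95 percent rules of thumb from Sec. PT5.2.2 can be checked with a short script. This is a sketch only: the interval bounds (6.36 to 6.80 in steps of 0.04) come from Table PT5.2, while the variable names and the text rendering of the bars are assumptions of this illustration.

```python
import math

# Readings from Table PT5.1
y = [
    6.495, 6.595, 6.615, 6.635, 6.485, 6.555,
    6.665, 6.505, 6.435, 6.625, 6.715, 6.655,
    6.755, 6.625, 6.715, 6.575, 6.655, 6.605,
    6.565, 6.515, 6.555, 6.395, 6.775, 6.685,
]

n = len(y)
mean = sum(y) / n
sy = math.sqrt(sum((yi - mean) ** 2 for yi in y) / (n - 1))

# Intervals of Table PT5.2: [6.36, 6.40), [6.40, 6.44), ..., [6.76, 6.80)
lower0, width, nbins = 6.36, 0.04, 11
counts = [0] * nbins
for yi in y:
    k = int((yi - lower0) / width)   # index of the interval containing yi
    counts[k] += 1

for k, c in enumerate(counts):
    lo = lower0 + k * width
    print(f"{lo:.2f}-{lo + width:.2f} | {'*' * c}")

# Fraction of readings within one and two standard deviations of the mean
within1 = sum(1 for yi in y if abs(yi - mean) <= sy)
within2 = sum(1 for yi in y if abs(yi - mean) <= 2.0 * sy)
print(f"within 1 sy: {within1}/{n} = {100.0 * within1 / n:.1f}%")
print(f"within 2 sy: {within2}/{n} = {100.0 * within2 / n:.1f}%")
```

For this small sample, 16 of 24 readings (about 67%) lie within one standard deviation and 23 of 24 (about 96%) within two, close to the 68 and 95 percent expected for a normal distribution.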
  • 465. A two-sided interval can be described by the statement
\[ P\{L \le \mu \le U\} = 1 - \alpha \]
which reads, “the probability that the true mean of \(y\), \(\mu\), falls within the bound from \(L\) to \(U\) is \(1 - \alpha\).” The quantity \(\alpha\) is called the significance level. So the problem of defining a confidence interval reduces to estimating \(L\) and \(U\). Although it is not absolutely necessary, it is customary to view the two-sided interval with the \(\alpha\) probability distributed evenly as \(\alpha/2\) in each tail of the distribution, as in Fig. PT5.3.
FIGURE PT5.3 A two-sided confidence interval. The abscissa scale in (a) is written in the natural units of the random variable \(y\). The normalized version of the abscissa in (b) has the mean at the origin and scales the axis so that the standard deviation corresponds to a unit value.
If the true variance of the distribution of \(y\), \(\sigma^2\), is known (which is not usually the case), statistical theory states that the sample mean \(\bar{y}\) comes from a normal distribution with mean \(\mu\) and variance \(\sigma^2/n\) (Box PT5.1). In the case illustrated in Fig. PT5.3, we really do not know \(\mu\). Therefore, we do not know where the normal curve is exactly located with respect to \(\bar{y}\). To circumvent this dilemma, we compute a new quantity, the standard normal estimate
\[ z = \frac{\bar{y} - \mu}{\sigma/\sqrt{n}} \tag{PT5.6} \]
which represents the normalized distance between \(\bar{y}\) and \(\mu\). According to statistical theory, this quantity should be normally distributed with a mean of 0 and a variance of 1. Furthermore, the probability that \(z\) would fall within the unshaded region of Fig. PT5.3
  • 466. should be \(1 - \alpha\). Therefore, the statement can be made that
\[ \frac{\bar{y} - \mu}{\sigma/\sqrt{n}} < -z_{\alpha/2} \quad \text{or} \quad \frac{\bar{y} - \mu}{\sigma/\sqrt{n}} > z_{\alpha/2} \]
with a probability of \(\alpha\).
The quantity \(z_{\alpha/2}\) is a standard normal random variable. This is the distance measured along the normalized axis above and below the mean that encompasses \(1 - \alpha\) probability (Fig. PT5.3b). Values of \(z_{\alpha/2}\) are tabulated in statistics books (for example, Milton and Arnold, 2002). They can also be calculated using functions on software packages like Excel, MATLAB, and Mathcad. As an example, for \(\alpha = 0.05\) (in other words, defining an interval encompassing 95%), \(z_{\alpha/2}\) is equal to about 1.96. This means that an interval around the mean of width \(\pm 1.96\) times the standard deviation will encompass approximately 95% of the distribution.
These results can be rearranged to yield
\[ L \le \mu \le U \]
Box PT5.1 A Little Statistics
Most engineers take several courses to become proficient at statistics. Because you may not have taken such a course yet, we would like to mention a few ideas that might make this present section more coherent.
As we have stated, the “game” of inferential statistics assumes that the random variable you are sampling, \(y\), has a true mean (\(\mu\)) and variance (\(\sigma^2\)). Further, in the present discussion, we also assume that it has a particular distribution: the normal distribution. The variance of this normal distribution has a finite value that specifies the “spread” of the normal distribution. If the variance is large, the distribution is broad. Conversely, if the variance is small, the distribution is narrow. Thus, the true variance quantifies the intrinsic uncertainty of the random variable.
In the game of statistics, we take a limited number of measurements of this quantity called a sample. From this sample, we can compute an estimated mean (\(\bar{y}\)) and variance (\(s_y^2\)). The more measurements we take, the better the estimates approximate the true values. That is, as \(n \to \infty\), \(\bar{y} \to \mu\) and \(s_y^2 \to \sigma^2\).
Suppose that we take \(n\) samples and compute an estimated mean \(\bar{y}_1\). Then, we take another \(n\) samples and compute another, \(\bar{y}_2\). We can keep repeating this process until we have generated a sample of means: \(\bar{y}_1, \bar{y}_2, \bar{y}_3, \ldots, \bar{y}_m\), where \(m\) is large. We can then develop a histogram of these means and determine a “distribution of the means,” as well as a “mean of the means” and a “standard deviation of the means.” Now the question arises: does this new distribution of means and its statistics behave in a predictable fashion?
There is an extremely important theorem known as the Central Limit Theorem that speaks directly to this question. It can be stated as
Let \(y_1, y_2, \ldots, y_n\) be a random sample of size \(n\) from a distribution with mean \(\mu\) and variance \(\sigma^2\). Then, for large \(n\), \(\bar{y}\) is approximately normal with mean \(\mu\) and variance \(\sigma^2/n\). Furthermore, for large \(n\), the random variable \((\bar{y} - \mu)/(\sigma/\sqrt{n})\) is approximately standard normal.
Thus, the theorem states the remarkable result that, for large \(n\), the distribution of means will be approximately normal regardless of the underlying distribution of the random variables (provided that distribution has a finite variance)! It also yields the expected result that given a sufficiently large sample, the mean of the means should converge on the true population mean \(\mu\).
Further, the theorem says that as the sample size gets larger, the variance of the means should approach zero. This makes sense, because if \(n\) is small, our individual estimates of the mean should be poor and the variance of the means should be large. As \(n\) increases, our estimates of the mean will improve and hence their spread should shrink. The Central Limit Theorem neatly defines exactly how this shrinkage relates to both the true variance and the sample size, that is, as \(\sigma^2/n\).
Finally, the theorem states the important result that we have given as Eq. (PT5.6). As is shown in this section, this result is the basis for constructing confidence intervals for the mean.
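The “sample of means” experiment described in the box is easy to simulate. The sketch below draws from a uniform distribution on [0, 1), which is deliberately not normal; the sample size, the number of repetitions, and the seed are arbitrary choices made for this illustration.

```python
import random
import statistics

random.seed(12345)     # fixed seed so the run is repeatable

n = 30                 # size of each sample
m = 10_000             # number of samples, hence number of means

# Underlying distribution: uniform on [0, 1), with mu = 1/2 and sigma^2 = 1/12
mu = 0.5
sigma2 = 1.0 / 12.0

means = [sum(random.random() for _ in range(n)) / n for _ in range(m)]

mean_of_means = sum(means) / m
var_of_means = statistics.variance(means)   # sample variance (divides by m - 1)

print(f"mean of the means     = {mean_of_means:.5f}  (mu = {mu})")
print(f"variance of the means = {var_of_means:.6f}  (sigma^2/n = {sigma2 / n:.6f})")
```

The mean of the means settles near the true mean, and the variance of the means is close to σ²/n, as the theorem predicts. A histogram of `means` would show the bell shape even though the individual draws are uniformly distributed.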
  • 467. with a probability of \(1 - \alpha\), where
\[ L = \bar{y} - \frac{\sigma}{\sqrt{n}}\,z_{\alpha/2} \qquad U = \bar{y} + \frac{\sigma}{\sqrt{n}}\,z_{\alpha/2} \tag{PT5.7} \]
Now, although the foregoing provides an estimate of \(L\) and \(U\), it is based on knowledge of the true standard deviation \(\sigma\). For our case, we know only the estimated standard deviation \(s_y\). A straightforward alternative would be to develop a version of Eq. (PT5.6) based on \(s_y\),
\[ t = \frac{\bar{y} - \mu}{s_y/\sqrt{n}} \tag{PT5.8} \]
Even when we sample from a normal distribution, this fraction will not be normally distributed, particularly when \(n\) is small. It was found by W. S. Gosset that the random variable defined by Eq. (PT5.8) follows the so-called Student-t, or simply, t distribution. For this case,
\[ L = \bar{y} - \frac{s_y}{\sqrt{n}}\,t_{\alpha/2,\,n-1} \qquad U = \bar{y} + \frac{s_y}{\sqrt{n}}\,t_{\alpha/2,\,n-1} \tag{PT5.9} \]
where \(t_{\alpha/2,\,n-1}\) is the standard random variable for the t distribution for a probability of \(\alpha/2\). As was the case for \(z_{\alpha/2}\), values are tabulated in statistics books and can also be calculated using software packages and libraries. For example, if \(\alpha = 0.05\) and \(n = 20\), \(t_{\alpha/2,\,n-1} = t_{0.025,19} = 2.093\).
FIGURE PT5.4 Comparison of the normal distribution with the t distribution for \(n = 3\) and \(n = 6\). Notice how the t distribution is generally flatter. (Abscissa: Z or t.)
The t distribution can be thought of as a modification of the normal distribution that accounts for the fact that we have an imperfect estimate of the standard deviation. When \(n\) is small, it tends to be flatter than the normal (see Fig. PT5.4). Therefore, for small
numbers of measurements, it yields wider and hence more conservative confidence intervals. As n grows larger, the t distribution converges on the normal.

EXAMPLE PT5.2 Confidence Interval on the Mean

Problem Statement. Determine the mean and the corresponding 95% confidence interval for the data from Table PT5.1. Perform three estimates based on (a) the first 8, (b) the first 16, and (c) all 24 measurements.

Solution. (a) The mean and standard deviation for the first 8 points are

$$\bar{y} = \frac{52.72}{8} = 6.59 \qquad s_y = \sqrt{\frac{347.4814 - (52.72)^2/8}{8 - 1}} = 0.089921$$

The appropriate t statistic can be calculated as

$$t_{0.05/2,\,8-1} = t_{0.025,7} = 2.364623$$

which can be used to compute the interval

$$L = 6.59 - \frac{0.089921}{\sqrt{8}}\, 2.364623 = 6.5148 \qquad U = 6.59 + \frac{0.089921}{\sqrt{8}}\, 2.364623 = 6.6652$$

or

$$6.5148 \le \mu \le 6.6652$$

FIGURE PT5.5 Estimates of the mean and 95% confidence intervals for different sample sizes (n = 8, 16, and 24). Vertical axis: coefficient of thermal expansion [$\times 10^{-6}$ in/(in · °F)].
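The arithmetic of part (a) can be reproduced from the two sums and the t value quoted in the example. This is a sketch under stated assumptions rather than the authors' program: the function name `confidence_interval` and its argument names are hypothetical, and the critical t value is passed in as given instead of being computed.

```python
import math


def confidence_interval(sum_y, sum_y2, n, t_crit):
    """Two-sided interval on the mean, Eq. (PT5.9), from running sums.

    sum_y and sum_y2 are the sums of y and of y squared; t_crit is the
    tabulated t value for alpha/2 and n - 1 degrees of freedom.
    """
    mean = sum_y / n
    s_y = math.sqrt((sum_y2 - sum_y**2 / n) / (n - 1))  # sample standard deviation
    half_width = s_y / math.sqrt(n) * t_crit
    return mean, s_y, mean - half_width, mean + half_width


# Sums for the first 8 measurements and t_{0.025,7}, all quoted in the example.
mean, s_y, lower, upper = confidence_interval(52.72, 347.4814, 8, 2.364623)
print(f"mean = {mean:.4f}, s_y = {s_y:.6f}, L = {lower:.4f}, U = {upper:.4f}")
```

Passing the sums and t values for 16 or 24 points would give the other two cases in the same way.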
Thus, based on the first eight measurements, we conclude that there is a 95% probability that the true mean falls within the range 6.5148 to 6.6652. The two other cases for (b) 16 points and (c) 24 points can be calculated in a similar fashion and the results tabulated along with case (a) as

    n     mean      s_y        t(α/2, n−1)   L        U
    8     6.5900    0.089921   2.364623      6.5148   6.6652
    16    6.5794    0.095845   2.131451      6.5283   6.6304
    24    6.6000    0.097133   2.068655      6.5590   6.6410

These results, which are also summarized in Fig. PT5.5, indicate the expected outcome that the confidence interval becomes narrower as n increases. Thus, the more measurements we take, the more refined our estimate of the true value becomes.

The above is just one simple example of how statistics can be used to make judgments regarding uncertain data. These concepts will also have direct relevance to our discussion of regression models. You can consult any basic statistics book (for example, Milton and Arnold, 2002) to obtain additional information on the subject.

PT5.3 ORIENTATION

Before we proceed to numerical methods for curve fitting, some orientation might be helpful. The following is intended as an overview of the material discussed in Part Five. In addition, we have formulated some objectives to help focus your efforts when studying the material.

PT5.3.1 Scope and Preview

Figure PT5.6 provides a visual overview of the material to be covered in Part Five. Chapter 17 is devoted to least-squares regression. We will first learn how to fit the “best” straight line through a set of uncertain data points. This technique is called linear regression. Besides discussing how to calculate the slope and intercept of this straight line, we also present quantitative and visual methods for evaluating the validity of the results. In addition to fitting a straight line, we also present a general technique for fitting a “best” polynomial.
Thus, you will learn to derive a parabolic, cubic, or higher-order polynomial that optimally fits uncertain data. Linear regression is a subset of this more general approach, which is called polynomial regression. The next topic covered in Chap. 17 is multiple linear regression. It is designed for the case where the dependent variable y is a linear function of two or more independent variables x1, x2, . . . , xm. This approach has special utility for evaluating experimental data where the variable of interest is dependent on a number of different factors.
After multiple regression, we illustrate how polynomial and multiple regression are both subsets of a general linear least-squares model. Among other things, this will allow us to introduce a concise matrix representation of regression and discuss its general statistical properties.

FIGURE PT5.6 Schematic of the organization of the material in Part Five: Curve Fitting.
    Introduction: PT5.1 Motivation; PT5.2 Mathematical background; PT5.3 Orientation
    Chapter 17 Least-Squares Regression: 17.1 Linear regression; 17.2 Polynomial regression; 17.3 Multiple regression; 17.4 General linear least squares; 17.5 Nonlinear regression
    Chapter 18 Interpolation: 18.1 Newton polynomial; 18.2 Lagrange polynomial; 18.3 Polynomial coefficients; 18.4 Inverse interpolation; 18.5 Additional comments; 18.6 Splines; 18.7 Multidimensional interpolation
    Chapter 19 Fourier Approximation: 19.1 Sinusoids; 19.2 Continuous Fourier series; 19.3 Frequency and time domains; 19.4 Fourier transform; 19.5 Discrete Fourier transform; 19.6 Fast Fourier transform; 19.7 Power spectrum; 19.8 Software packages
    Chapter 20 Case Studies: 20.1 Chemical engineering; 20.2 Civil engineering; 20.3 Electrical engineering; 20.4 Mechanical engineering
    Epilogue: PT5.4 Trade-offs; PT5.5 Important formulas; PT5.6 Advanced methods
Finally, the last sections of Chap. 17 are devoted to nonlinear regression. This approach is designed to compute a least-squares fit of a nonlinear equation to data.

In Chap. 18, the alternative curve-fitting technique called interpolation is described. As discussed previously, interpolation is used for estimating intermediate values between precise data points. In Chap. 18, polynomials are derived for this purpose. We introduce the basic concept of polynomial interpolation by using straight lines and parabolas to connect points. Then, we develop a generalized procedure for fitting an nth-order polynomial. Two formats are presented for expressing these polynomials in equation form. The first, called Newton’s interpolating polynomial, is preferable when the appropriate order of the polynomial is unknown. The second, called the Lagrange interpolating polynomial, has advantages when the proper order is known beforehand.

The next section of Chap. 18 presents an alternative technique for fitting precise data points. This technique, called spline interpolation, fits polynomials to data but in a piecewise fashion. As such, it is particularly well-suited for fitting data that are generally smooth but exhibit abrupt local changes. Finally, we provide a brief introduction to multidimensional interpolation.

Chapter 19 deals with the Fourier transform approach to curve fitting where periodic functions are fit to data. Our emphasis in this section will be on the fast Fourier transform. At the end of this chapter, we also include an overview of several software packages that can be used for curve fitting. These are Excel, MATLAB, and Mathcad.

Chapter 20 is devoted to engineering applications that illustrate the utility of the numerical methods in engineering problem contexts. Examples are drawn from the four major specialty areas of chemical, civil, electrical, and mechanical engineering.
In addition, some of the applications illustrate how software packages can be applied for engineering problem solving.

Finally, an epilogue is included at the end of Part Five. It contains a summary of the important formulas and concepts related to curve fitting as well as a discussion of trade-offs among the techniques and suggestions for future study.

PT5.3.2 Goals and Objectives

Study Objectives. After completing Part Five, you should have greatly enhanced your capability to fit curves to data. In general, you should have mastered the techniques, have learned to assess the reliability of the answers, and be capable of choosing the preferred method (or methods) for any particular problem. In addition to these general goals, the specific concepts in Table PT5.3 should be assimilated and mastered.

Computer Objectives. You have been provided with simple computer algorithms to implement the techniques discussed in Part Five. You may also have access to software packages and libraries. All have utility as learning tools.

Pseudocode algorithms are provided for most of the methods in Part Five. This information will allow you to expand your software library to include techniques beyond polynomial regression. For example, you may find it useful from a professional viewpoint to have software to implement multiple linear regression, Newton’s interpolating polynomial, cubic spline interpolation, and the fast Fourier transform.
In addition, one of your most important goals should be to master several of the general-purpose software packages that are widely available. In particular, you should become adept at using these tools to implement numerical methods for engineering problem solving.

TABLE PT5.3 Specific study objectives for Part Five.

1. Understand the fundamental difference between regression and interpolation and realize why confusing the two could lead to serious problems
2. Understand the derivation of linear least-squares regression and be able to assess the reliability of the fit using graphical and quantitative assessments
3. Know how to linearize data by transformation
4. Understand situations where polynomial, multiple, and nonlinear regression are appropriate
5. Be able to recognize general linear models, understand the general matrix formulation of linear least squares, and know how to compute confidence intervals for parameters
6. Understand that there is one and only one polynomial of degree n or less that passes exactly through n + 1 points
7. Know how to derive the first-order Newton’s interpolating polynomial
8. Understand the analogy between Newton’s polynomial and the Taylor series expansion and how it relates to the truncation error
9. Recognize that the Newton and Lagrange equations are merely different formulations of the same interpolating polynomial and understand their respective advantages and disadvantages
10. Realize that more accurate results are generally obtained if data used for interpolation are centered around and close to the unknown point
11. Realize that data points do not have to be equally spaced nor in any particular order for either the Newton or Lagrange polynomials
12. Know why equispaced interpolation formulas have utility
13. Recognize the liabilities and risks associated with extrapolation
14. Understand why spline functions have utility for data with local areas of abrupt change
15. Understand how interpolating polynomials can be applied in two dimensions
16. Recognize how the Fourier series is used to fit data with periodic functions
17. Understand the difference between the frequency and time domains
CHAPTER 17
Least-Squares Regression

Where substantial error is associated with data, polynomial interpolation is inappropriate and may yield unsatisfactory results when used to predict intermediate values. Experimental data are often of this type. For example, Fig. 17.1a shows seven experimentally derived data points exhibiting significant variability. Visual inspection of these data suggests a positive relationship between y and x. That is, the overall trend indicates that higher values of y are associated with higher values of x. Now, if a sixth-order interpolating polynomial is fitted to these data (Fig. 17.1b), it will pass exactly through all of the points. However, because of the variability in these data, the curve oscillates widely in the interval between the points. In particular, the interpolated values at x = 1.5 and x = 6.5 appear to be well beyond the range suggested by these data.

A more appropriate strategy for such cases is to derive an approximating function that fits the shape or general trend of the data without necessarily matching the individual points. Figure 17.1c illustrates how a straight line can be used to generally characterize the trend of these data without passing through any particular point.

One way to determine the line in Fig. 17.1c is to visually inspect the plotted data and then sketch a “best” line through the points. Although such “eyeball” approaches have commonsense appeal and are valid for “back-of-the-envelope” calculations, they are deficient because they are arbitrary. That is, unless the points define a perfect straight line (in which case, interpolation would be appropriate), different analysts would draw different lines.

To remove this subjectivity, some criterion must be devised to establish a basis for the fit. One way to do this is to derive a curve that minimizes the discrepancy between the data points and the curve.
A technique for accomplishing this objective, called least-squares regression, will be discussed in the present chapter.

17.1 LINEAR REGRESSION

The simplest example of a least-squares approximation is fitting a straight line to a set of paired observations: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. The mathematical expression for the straight line is

$$y = a_0 + a_1 x + e \tag{17.1}$$
where $a_0$ and $a_1$ are coefficients representing the intercept and the slope, respectively, and e is the error, or residual, between the model and the observations, which can be represented by rearranging Eq. (17.1) as

$$e = y - a_0 - a_1 x$$

Thus, the error, or residual, is the discrepancy between the true value of y and the approximate value, $a_0 + a_1 x$, predicted by the linear equation.

FIGURE 17.1 (a) Data exhibiting significant error. (b) Polynomial fit oscillating beyond the range of the data. (c) More satisfactory result using the least-squares fit. (All three panels plot y versus x.)
17.1.1 Criteria for a “Best” Fit

One strategy for fitting a “best” line through the data would be to minimize the sum of the residual errors for all the available data, as in

$$\sum_{i=1}^{n} e_i = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i) \tag{17.2}$$

where n = total number of points. However, this is an inadequate criterion, as illustrated by Fig. 17.2a, which depicts the fit of a straight line to two points.

FIGURE 17.2 Examples of some criteria for “best fit” that are inadequate for regression: (a) minimizes the sum of the residuals, (b) minimizes the sum of the absolute values of the residuals, and (c) minimizes the maximum error of any individual point. (Panel (a) marks the midpoint of the connecting line; panel (c) marks an outlier.)

Obviously, the best
fit is the line connecting the points. However, any straight line passing through the midpoint of the connecting line (except a perfectly vertical line) results in a minimum value of Eq. (17.2) equal to zero because the positive and negative errors cancel.

Therefore, another logical criterion might be to minimize the sum of the absolute values of the discrepancies, as in

$$\sum_{i=1}^{n} |e_i| = \sum_{i=1}^{n} |y_i - a_0 - a_1 x_i|$$

Figure 17.2b demonstrates why this criterion is also inadequate. For the four points shown, any straight line falling within the dashed lines will minimize the sum of the absolute values. Thus, this criterion also does not yield a unique best fit.

A third strategy for fitting a best line is the minimax criterion. In this technique, the line is chosen that minimizes the maximum distance that an individual point falls from the line. As depicted in Fig. 17.2c, this strategy is ill-suited for regression because it gives undue influence to an outlier, that is, a single point with a large error. It should be noted that the minimax principle is sometimes well-suited for fitting a simple function to a complicated function (Carnahan, Luther, and Wilkes, 1969).

A strategy that overcomes the shortcomings of the aforementioned approaches is to minimize the sum of the squares of the residuals between the measured y and the y calculated with the linear model

$$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_{i,\text{measured}} - y_{i,\text{model}})^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2 \tag{17.3}$$

This criterion has a number of advantages, including the fact that it yields a unique line for a given set of data. Before discussing these properties, we will present a technique for determining the values of $a_0$ and $a_1$ that minimize Eq. (17.3).

17.1.2 Least-Squares Fit of a Straight Line

To determine values for $a_0$ and $a_1$, Eq.
(17.3) is differentiated with respect to each coefficient:

$$\frac{\partial S_r}{\partial a_0} = -2 \sum (y_i - a_0 - a_1 x_i)$$

$$\frac{\partial S_r}{\partial a_1} = -2 \sum \left[ (y_i - a_0 - a_1 x_i)\, x_i \right]$$

Note that we have simplified the summation symbols; unless otherwise indicated, all summations are from i = 1 to n. Setting these derivatives equal to zero will result in a minimum $S_r$. If this is done, the equations can be expressed as

$$0 = \sum y_i - \sum a_0 - \sum a_1 x_i$$

$$0 = \sum y_i x_i - \sum a_0 x_i - \sum a_1 x_i^2$$
Now, realizing that $\sum a_0 = n a_0$, we can express the equations as a set of two simultaneous linear equations with two unknowns ($a_0$ and $a_1$):

$$n a_0 + \left(\sum x_i\right) a_1 = \sum y_i \tag{17.4}$$

$$\left(\sum x_i\right) a_0 + \left(\sum x_i^2\right) a_1 = \sum x_i y_i \tag{17.5}$$

These are called the normal equations. They can be solved simultaneously for

$$a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2} \tag{17.6}$$

This result can then be used in conjunction with Eq. (17.4) to solve for

$$a_0 = \bar{y} - a_1 \bar{x} \tag{17.7}$$

where $\bar{y}$ and $\bar{x}$ are the means of y and x, respectively.

EXAMPLE 17.1 Linear Regression

Problem Statement. Fit a straight line to the x and y values in the first two columns of Table 17.1.

Solution. The following quantities can be computed:

$$n = 7 \qquad \sum x_i y_i = 119.5 \qquad \sum x_i^2 = 140$$

$$\sum x_i = 28 \qquad \bar{x} = \frac{28}{7} = 4 \qquad \sum y_i = 24 \qquad \bar{y} = \frac{24}{7} = 3.428571$$

Using Eqs. (17.6) and (17.7),

$$a_1 = \frac{7(119.5) - 28(24)}{7(140) - (28)^2} = 0.8392857$$

$$a_0 = 3.428571 - 0.8392857(4) = 0.07142857$$

TABLE 17.1 Computations for an error analysis of the linear fit.

    x_i    y_i     (y_i − ȳ)²    (y_i − a0 − a1·x_i)²
    1      0.5     8.5765        0.1687
    2      2.5     0.8622        0.5625
    3      2.0     2.0408        0.3473
    4      4.0     0.3265        0.3265
    5      3.5     0.0051        0.5896
    6      6.0     6.6122        0.7972
    7      5.5     4.2908        0.1993
    Σ      24.0    22.7143       2.9911
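The derivation can be checked numerically for the data of Table 17.1. The sketch below is an illustration under stated assumptions, not the book's algorithm: it solves the two normal equations by Cramer's rule and then evaluates both partial derivatives of $S_r$ at the solution, which should vanish to rounding error. The function names `solve_normal_equations` and `gradient` are hypothetical.

```python
def solve_normal_equations(x, y):
    """Solve Eqs. (17.4)-(17.5) for the intercept a0 and slope a1 by Cramer's rule."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx              # zero only if all x values coincide
    if det == 0:
        raise ValueError("x values must not all be equal")
    a0 = (sy * sxx - sx * sxy) / det
    a1 = (n * sxy - sx * sy) / det       # same expression as Eq. (17.6)
    return a0, a1


def gradient(x, y, a0, a1):
    """Partial derivatives of S_r with respect to a0 and a1."""
    d_a0 = -2.0 * sum(yi - a0 - a1 * xi for xi, yi in zip(x, y))
    d_a1 = -2.0 * sum((yi - a0 - a1 * xi) * xi for xi, yi in zip(x, y))
    return d_a0, d_a1


# First two columns of Table 17.1.
x = [1, 2, 3, 4, 5, 6, 7]
y = [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5]

a0, a1 = solve_normal_equations(x, y)
d_a0, d_a1 = gradient(x, y, a0, a1)
print(f"a0 = {a0:.8f}, a1 = {a1:.7f}")
print(f"dSr/da0 = {d_a0:.2e}, dSr/da1 = {d_a1:.2e}")
```

Cramer's rule is used here only because the system is 2 by 2; it gives the same slope as Eq. (17.6) and the same intercept as Eq. (17.7).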
Therefore, the least-squares fit is

$$y = 0.07142857 + 0.8392857x$$

The line, along with the data, is shown in Fig. 17.1c.

17.1.3 Quantification of Error of Linear Regression

Any line other than the one computed in Example 17.1 results in a larger sum of the squares of the residuals. Thus, the line is unique and in terms of our chosen criterion is a “best” line through the points. A number of additional properties of this fit can be elucidated by examining more closely the way in which residuals were computed. Recall that the sum of the squares is defined as [Eq. (17.3)]

$$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2 \tag{17.8}$$

Notice the similarity between Eqs. (PT5.3) and (17.8). In the former case, the square of the residual represented the square of the discrepancy between the data and a single estimate of the measure of central tendency, the mean. In Eq. (17.8), the square of the residual represents the square of the vertical distance between the data and another measure of central tendency, the straight line (Fig. 17.3).

FIGURE 17.3 The residual in linear regression represents the vertical distance, $y_i - a_0 - a_1 x_i$, between a data point (the measurement $y_i$ at $x_i$) and the straight line (the value $a_0 + a_1 x_i$ on the regression line).

The analogy can be extended further for cases where (1) the spread of the points around the line is of similar magnitude along the entire range of the data and (2) the distribution of these points about the line is normal. It can be demonstrated that if these criteria are met, least-squares regression will provide the best (that is, the most likely) estimates of $a_0$ and $a_1$ (Draper and Smith, 1981). This is called the maximum likelihood
principle in statistics. In addition, if these criteria are met, a “standard deviation” for the regression line can be determined as [compare with Eq. (PT5.2)]

$$s_{y/x} = \sqrt{\frac{S_r}{n - 2}} \tag{17.9}$$

where $s_{y/x}$ is called the standard error of the estimate. The subscript notation “y/x” designates that the error is for a predicted value of y corresponding to a particular value of x. Also, notice that we now divide by n − 2 because two data-derived estimates, $a_0$ and $a_1$, were used to compute $S_r$; thus, we have lost two degrees of freedom. As with our discussion of the standard deviation in PT5.2.1, another justification for dividing by n − 2 is that there is no such thing as the “spread of data” around a straight line connecting two points. Thus, for the case where n = 2, Eq. (17.9) yields a meaningless result because the denominator n − 2 is zero.

Just as was the case with the standard deviation, the standard error of the estimate quantifies the spread of the data. However, $s_{y/x}$ quantifies the spread around the regression line, as shown in Fig. 17.4b, in contrast to the original standard deviation $s_y$ that quantified the spread around the mean (Fig. 17.4a).

The above concepts can be used to quantify the “goodness” of our fit. This is particularly useful for comparison of several regressions (Fig. 17.5). To do this, we return to the original data and determine the total sum of the squares around the mean for the dependent variable (in our case, y). As was the case for Eq. (PT5.3), this quantity is designated $S_t$. This is the magnitude of the residual error associated with the dependent variable prior to regression. After performing the regression, we can compute $S_r$, the sum of the squares of the residuals around the regression line. This characterizes the residual error that remains after the regression.
It is, therefore, sometimes called the unexplained sum of the squares. The difference between the two quantities, $S_t - S_r$, quantifies the improvement or error reduction due to describing the data in terms of a straight line rather than as an average value. Because the magnitude of this quantity is scale-dependent, the difference is normalized to $S_t$ to yield

$$r^2 = \frac{S_t - S_r}{S_t} \tag{17.10}$$

where $r^2$ is called the coefficient of determination and r is the correlation coefficient ($= \sqrt{r^2}$). For a perfect fit, $S_r = 0$ and $r = r^2 = 1$, signifying that the line explains 100 percent of the variability of the data. For $r = r^2 = 0$, $S_r = S_t$ and the fit represents no improvement. An alternative formulation for r that is more convenient for computer implementation is

$$r = \frac{n \sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{\sqrt{n \sum x_i^2 - \left(\sum x_i\right)^2}\, \sqrt{n \sum y_i^2 - \left(\sum y_i\right)^2}} \tag{17.11}$$

FIGURE 17.4 Regression data showing (a) the spread of the data around the mean of the dependent variable and (b) the spread of the data around the best-fit line. The reduction in the spread in going from (a) to (b), as indicated by the bell-shaped curves at the right, represents the improvement due to linear regression.

FIGURE 17.5 Examples of linear regression with (a) small and (b) large residual errors.
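Equation (17.11) needs only running sums, which is what makes it convenient on a computer. As a minimal sketch (the function name `correlation` is an assumption for illustration), it can be applied to the x and y columns of Table 17.1:

```python
import math


def correlation(x, y):
    """Correlation coefficient r from running sums, Eq. (17.11)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    syy = sum(v * v for v in y)
    sxy = sum(a * b for a, b in zip(x, y))
    return (n * sxy - sx * sy) / (
        math.sqrt(n * sxx - sx**2) * math.sqrt(n * syy - sy**2)
    )


# First two columns of Table 17.1.
x = [1, 2, 3, 4, 5, 6, 7]
y = [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5]

r = correlation(x, y)
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")
```

Squaring the result gives the same coefficient of determination as Eq. (17.10) without first computing $S_t$ and $S_r$. Unlike $\sqrt{r^2}$, Eq. (17.11) also carries the sign of the slope.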
EXAMPLE 17.2 Estimation of Errors for the Linear Least-Squares Fit

Problem Statement. Compute the total standard deviation, the standard error of the estimate, and the correlation coefficient for the data in Example 17.1.

Solution. The summations are performed and presented in Table 17.1. The standard deviation is [Eq. (PT5.2)]

$$s_y = \sqrt{\frac{22.7143}{7 - 1}} = 1.9457$$

and the standard error of the estimate is [Eq. (17.9)]

$$s_{y/x} = \sqrt{\frac{2.9911}{7 - 2}} = 0.7735$$

Thus, because $s_{y/x} < s_y$, the linear regression model has merit. The extent of the improvement is quantified by [Eq. (17.10)]

$$r^2 = \frac{22.7143 - 2.9911}{22.7143} = 0.868$$

or

$$r = \sqrt{0.868} = 0.932$$

These results indicate that 86.8 percent of the original uncertainty has been explained by the linear model.

Before proceeding to the computer program for linear regression, a word of caution is in order. Although the correlation coefficient provides a handy measure of goodness-of-fit, you should be careful not to ascribe more meaning to it than is warranted. Just because r is “close” to 1 does not mean that the fit is necessarily “good.” For example, it is possible to obtain a relatively high value of r when the underlying relationship between y and x is not even linear. Draper and Smith (1981) provide guidance and additional material regarding assessment of results for linear regression. In addition, at the minimum, you should always inspect a plot of the data along with your regression curve. As described in the next section, software packages include such a capability.

17.1.4 Computer Program for Linear Regression

It is a relatively trivial matter to develop a pseudocode for linear regression (Fig. 17.6). As mentioned above, a plotting option is critical to the effective use and interpretation of regression. Such capabilities are included in popular packages like MATLAB software and Excel.
If your computer language has plotting capabilities, we recommend that you expand your program to include a plot of y versus x, showing both the data and the regression line. The inclusion of the capability will greatly enhance the utility of the program in problem-solving contexts.
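For readers working in Python, the algorithm of Fig. 17.6 might be translated along the following lines. This is a sketch, not the authors' code: it returns the results as a tuple instead of filling output arguments, and the guard for fewer than three points is an added assumption (the pseudocode does not check for it). Plotting is left out.

```python
import math


def regress(x, y):
    """Least-squares line y = a0 + a1*x with the error measures of Fig. 17.6.

    Returns (a1, a0, syx, r2): slope, intercept, standard error of the
    estimate, and coefficient of determination.
    """
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 points so that n - 2 > 0")
    sumx, sumy = sum(x), sum(y)
    sumxy = sum(xi * yi for xi, yi in zip(x, y))
    sumx2 = sum(xi * xi for xi in x)
    xm, ym = sumx / n, sumy / n
    a1 = (n * sumxy - sumx * sumy) / (n * sumx2 - sumx * sumx)
    a0 = ym - a1 * xm
    st = sum((yi - ym) ** 2 for yi in y)                        # spread about the mean
    sr = sum((yi - a1 * xi - a0) ** 2 for xi, yi in zip(x, y))  # spread about the line
    syx = math.sqrt(sr / (n - 2))
    r2 = (st - sr) / st
    return a1, a0, syx, r2


# Data of Table 17.1; the results can be compared with Examples 17.1 and 17.2.
x = [1, 2, 3, 4, 5, 6, 7]
y = [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5]
a1, a0, syx, r2 = regress(x, y)
print(f"a1 = {a1:.7f}, a0 = {a0:.8f}, syx = {syx:.4f}, r2 = {r2:.3f}")
```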
FIGURE 17.6 Algorithm for linear regression.

    SUB Regress(x, y, n, a1, a0, syx, r2)
      sumx = 0: sumxy = 0: st = 0
      sumy = 0: sumx2 = 0: sr = 0
      DOFOR i = 1, n
        sumx = sumx + xi
        sumy = sumy + yi
        sumxy = sumxy + xi*yi
        sumx2 = sumx2 + xi*xi
      END DO
      xm = sumx/n
      ym = sumy/n
      a1 = (n*sumxy - sumx*sumy)/(n*sumx2 - sumx*sumx)
      a0 = ym - a1*xm
      DOFOR i = 1, n
        st = st + (yi - ym)^2
        sr = sr + (yi - a1*xi - a0)^2
      END DO
      syx = (sr/(n - 2))^0.5
      r2 = (st - sr)/st
    END Regress

EXAMPLE 17.3 Linear Regression Using the Computer

Problem Statement. We can use software based on Fig. 17.6 to solve a hypothesis-testing problem associated with the falling parachutist discussed in Chap. 1. A theoretical mathematical model for the velocity of the parachutist was given as the following [Eq. (1.10)]:

$$v(t) = \frac{gm}{c}\left(1 - e^{-(c/m)t}\right)$$

where v = velocity (m/s), g = gravitational constant (9.8 m/s²), m = mass of the parachutist equal to 68.1 kg, and c = drag coefficient of 12.5 kg/s. The model predicts the velocity of the parachutist as a function of time, as described in Example 1.1.

An alternative empirical model for the velocity of the parachutist is given by

$$v(t) = \frac{gm}{c}\left(\frac{t}{3.75 + t}\right) \tag{E17.3.1}$$

Suppose that you would like to test and compare the adequacy of these two mathematical models. This might be accomplished by measuring the actual velocity of the
parachutist at known values of time and comparing these results with the predicted velocities according to each model. Such an experimental-data-collection program was implemented, and the results are listed in column (a) of Table 17.2. Computed velocities for each model are listed in columns (b) and (c).

Solution. The adequacy of the models can be tested by plotting the model-calculated velocity versus the measured velocity. Linear regression can be used to calculate the slope and the intercept of the plot. This line will have a slope of 1, an intercept of 0, and an $r^2 = 1$ if the model matches the data perfectly. A significant deviation from these values can be used as an indication of the inadequacy of the model.

Figure 17.7a and b are plots of the line and data for the regressions of columns (b) and (c), respectively, versus column (a). For the first model [Eq. (1.10) as depicted in Fig. 17.7a],

$$v_{\text{model}} = -0.859 + 1.032\, v_{\text{measure}}$$

and for the second model [Eq. (E17.3.1) as depicted in Fig. 17.7b],

$$v_{\text{model}} = 5.776 + 0.752\, v_{\text{measure}}$$

These plots indicate that the linear regression between these data and each of the models is highly significant. Both models match the data with a correlation coefficient of greater than 0.99.

However, the model described by Eq. (1.10) conforms to our hypothesis test criteria much better than that described by Eq. (E17.3.1) because the slope and intercept are more nearly equal to 1 and 0. Thus, although each plot is well described by a straight line, Eq. (1.10) appears to be a better model than Eq. (E17.3.1).

TABLE 17.2 Measured and calculated velocities for the falling parachutist.

    Time, s   Measured v, m/s   Model-calculated v, m/s   Model-calculated v, m/s
                                [Eq. (1.10)]              [Eq. (E17.3.1)]
              (a)               (b)                       (c)
    1         10.00             8.953                     11.240
    2         16.30             16.405                    18.570
    3         23.00             22.607                    23.729
    4         27.50             27.769                    27.556
    5         31.00             32.065                    30.509
    6         35.60             35.641                    32.855
    7         39.00             38.617                    34.766
    8         41.50             41.095                    36.351
    9         42.90             43.156                    37.687
    10        45.00             44.872                    38.829
    11        46.00             46.301                    39.816
    12        45.50             47.490                    40.678
    13        46.00             48.479                    41.437
    14        49.00             49.303                    42.110
    15        50.00             49.988                    42.712
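The comparison in this example can be sketched in Python as follows. The measured velocities are column (a) of Table 17.2; the model columns are recomputed from Eqs. (1.10) and (E17.3.1) rather than typed in. The names `v_theory`, `v_empirical`, `fit_line`, and `fits` are assumptions made for this illustration.

```python
import math

G, M, C = 9.8, 68.1, 12.5   # g (m/s^2), mass (kg), drag coefficient (kg/s)


def v_theory(t):
    """Theoretical model, Eq. (1.10)."""
    return G * M / C * (1.0 - math.exp(-(C / M) * t))


def v_empirical(t):
    """Empirical model, Eq. (E17.3.1)."""
    return G * M / C * (t / (3.75 + t))


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line, Eqs. (17.6)-(17.7)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    slope = (n * sum(a * b for a, b in zip(x, y)) - sx * sy) / (
        n * sum(a * a for a in x) - sx * sx
    )
    return sy / n - slope * sx / n, slope


# Column (a) of Table 17.2: measured velocities at t = 1, 2, ..., 15 s.
measured = [10.00, 16.30, 23.00, 27.50, 31.00, 35.60, 39.00, 41.50,
            42.90, 45.00, 46.00, 45.50, 46.00, 49.00, 50.00]
times = range(1, 16)

# Regress each model's predictions (ordinate) on the measurements (abscissa).
fits = {}
for name, model in (("Eq. (1.10)", v_theory), ("Eq. (E17.3.1)", v_empirical)):
    predicted = [model(t) for t in times]
    fits[name] = fit_line(measured, predicted)
    intercept, slope = fits[name]
    print(f"{name}: v_model = {intercept:.3f} + {slope:.3f} * v_measure")
```

A model that matched the measurements perfectly would print an intercept of 0 and a slope of 1, so the printed lines can be read directly against that target.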
Model testing and selection are common and extremely important activities performed in all fields of engineering. The background material provided in this chapter, together with your software, should allow you to address many practical problems of this type.

FIGURE 17.7 (a) Results using linear regression to compare predictions computed with the theoretical model [Eq. (1.10)] versus measured values. (b) Results using linear regression to compare predictions computed with the empirical model [Eq. (E17.3.1)] versus measured values. (Both panels plot Y versus X over the range 5 to 55.)

There is one shortcoming with the analysis in Example 17.3. The example was unambiguous because the empirical model [Eq. (E17.3.1)] was clearly inferior to Eq. (1.10). Thus, the slope and intercept for the latter were so much closer to the desired result of 1 and 0 that it was obvious which model was superior.
However, suppose that the slope were 0.85 and the intercept were 2. Obviously this would make the conclusion that the slope and intercept were 1 and 0 open to debate. Clearly, rather than relying on a subjective judgment, it would be preferable to base such a conclusion on a quantitative criterion.

This can be done by computing confidence intervals for the model parameters in the same way that we developed confidence intervals for the mean in Sec. PT5.2.3. We will return to this topic at the end of this chapter.

17.1.5 Linearization of Nonlinear Relationships

Linear regression provides a powerful technique for fitting a best line to data. However, it is predicated on the assumption that the relationship between the dependent and independent variables is linear. This is not always the case, and the first step in any regression analysis should be to plot and visually inspect the data to ascertain whether a linear model applies. For example, Fig. 17.8 shows some data that are obviously curvilinear. In some cases, techniques such as polynomial regression, which is described in Sec. 17.2, are appropriate. For others, transformations can be used to express the data in a form that is compatible with linear regression.

FIGURE 17.8 (a) Data that are ill-suited for linear least-squares regression. (b) Indication that a parabola is preferable.
One example is the exponential model

$$y = \alpha_1 e^{\beta_1 x} \tag{17.12}$$

where $\alpha_1$ and $\beta_1$ are constants. This model is used in many fields of engineering to characterize quantities that increase (positive $\beta_1$) or decrease (negative $\beta_1$) at a rate that is directly proportional to their own magnitude. For example, population growth or radioactive decay can exhibit such behavior. As depicted in Fig. 17.9a, the equation represents a nonlinear relationship (for $\beta_1 \ne 0$) between y and x.

FIGURE 17.9 (a) The exponential equation, $y = \alpha_1 e^{\beta_1 x}$; (b) the power equation, $y = \alpha_2 x^{\beta_2}$; and (c) the saturation-growth-rate equation, $y = \alpha_3 x/(\beta_3 + x)$. Parts (d), (e), and (f) are linearized versions of these equations that result from simple transformations: (d) ln y versus x, with slope $\beta_1$ and intercept $\ln \alpha_1$; (e) log y versus log x, with slope $\beta_2$ and intercept $\log \alpha_2$; (f) 1/y versus 1/x, with slope $\beta_3/\alpha_3$ and intercept $1/\alpha_3$.

Another example of a nonlinear model is the simple power equation

$$y = \alpha_2 x^{\beta_2} \tag{17.13}$$
where $\alpha_2$ and $\beta_2$ are constant coefficients. This model has wide applicability in all fields of engineering. As depicted in Fig. 17.9b, the equation (for $\beta_2 \ne 0$ or 1) is nonlinear.

A third example of a nonlinear model is the saturation-growth-rate equation [recall Eq. (E17.3.1)]

$$y = \alpha_3 \frac{x}{\beta_3 + x} \tag{17.14}$$

where $\alpha_3$ and $\beta_3$ are constant coefficients. This model, which is particularly well-suited for characterizing population growth rate under limiting conditions, also represents a nonlinear relationship between y and x (Fig. 17.9c) that levels off, or “saturates,” as x increases.

Nonlinear regression techniques are available to fit these equations to experimental data directly. (Note that we will discuss nonlinear regression in Sec. 17.5.) However, a simpler alternative is to use mathematical manipulations to transform the equations into a linear form. Then, simple linear regression can be employed to fit the equations to data.

For example, Eq. (17.12) can be linearized by taking its natural logarithm to yield

$$\ln y = \ln \alpha_1 + \beta_1 x \ln e$$

But because ln e = 1,

$$\ln y = \ln \alpha_1 + \beta_1 x \tag{17.15}$$

Thus, a plot of ln y versus x will yield a straight line with a slope of $\beta_1$ and an intercept of $\ln \alpha_1$ (Fig. 17.9d).

Equation (17.13) is linearized by taking its base-10 logarithm to give

$$\log y = \beta_2 \log x + \log \alpha_2 \tag{17.16}$$

Thus, a plot of log y versus log x will yield a straight line with a slope of $\beta_2$ and an intercept of $\log \alpha_2$ (Fig. 17.9e).

Equation (17.14) is linearized by inverting it to give

$$\frac{1}{y} = \frac{\beta_3}{\alpha_3} \frac{1}{x} + \frac{1}{\alpha_3} \tag{17.17}$$

Thus, a plot of 1/y versus 1/x will be linear, with a slope of $\beta_3/\alpha_3$ and an intercept of $1/\alpha_3$ (Fig. 17.9f).

In their transformed forms, these models can use linear regression to evaluate the constant coefficients. They could then be transformed back to their original state and used for predictive purposes. Example 17.4 illustrates this procedure for Eq. (17.13). In addition, Sec.
20.1 provides an engineering example of the same sort of computation. EXAMPLE 17.4 Linearization of a Power Equation Problem Statement. Fit Eq. (17.13) to the data in Table 17.3 using a logarithmic transformation of the data. Solution. Figure 17.10a is a plot of the original data in its untransformed state. Figure 17.10b shows the plot of the transformed data. A linear regression of the log-transformed data yields the result log y 5 1.75 log x 2 0.300
  • 488. TABLE 17.3 Data to be fit to the power equation.

x    y      log x    log y
1    0.5    0        -0.301
2    1.7    0.301    0.226
3    3.4    0.477    0.534
4    5.7    0.602    0.753
5    8.4    0.699    0.922

FIGURE 17.10 (a) Plot of untransformed data with the power equation that fits these data. (b) Plot of transformed data used to determine the coefficients of the power equation. [Axes: (a) y versus x; (b) log y versus log x.]
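The log-transform fit of Example 17.4 can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the text: the function names `fit_line` and `fit_power` are hypothetical, and the logarithms are computed at full precision rather than taken from the rounded entries of Table 17.3, so the results agree with the example only to about three significant figures.

```python
import math

def fit_line(x, y):
    """Least-squares intercept and slope of the straight line y = a0 + a1*x."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    slope = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    intercept = sy / n - slope * sx / n
    return intercept, slope

def fit_power(x, y):
    """Fit y = alpha * x**beta by regressing log10(y) on log10(x), Eq. (17.16)."""
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    log_alpha, beta = fit_line(lx, ly)
    # Transform the intercept back with the antilogarithm.
    return 10 ** log_alpha, beta

# Data from the first two columns of Table 17.3.
x = [1, 2, 3, 4, 5]
y = [0.5, 1.7, 3.4, 5.7, 8.4]
alpha2, beta2 = fit_power(x, y)
print(f"alpha2 = {alpha2:.3f}, beta2 = {beta2:.3f}")
```

Note that the fit minimizes the squared residuals of log y, not of y itself, so it is not identical to a direct nonlinear fit of the power equation.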
  • 489. Thus, the intercept, $\log \alpha_2$, equals -0.300, and therefore, by taking the antilogarithm, $\alpha_2 = 10^{-0.3} = 0.5$. The slope is $\beta_2 = 1.75$. Consequently, the power equation is

$$ y = 0.5x^{1.75} $$

This curve, as plotted in Fig. 17.10a, indicates a good fit.

17.1.6 General Comments on Linear Regression

Before proceeding to curvilinear and multiple linear regression, we must emphasize the introductory nature of the foregoing material on linear regression. We have focused on the simple derivation and practical use of equations to fit data. You should be cognizant of the fact that there are theoretical aspects of regression that are of practical importance but are beyond the scope of this book. For example, some statistical assumptions that are inherent in the linear least-squares procedures are

1. Each x has a fixed value; it is not random and is known without error.
2. The y values are independent random variables and all have the same variance.
3. The y values for a given x must be normally distributed.

Such assumptions are relevant to the proper derivation and use of regression. For example, the first assumption means that (1) the x values must be error-free and (2) the regression of y versus x is not the same as x versus y (try Prob. 17.3 at the end of the chapter). You are urged to consult other references such as Draper and Smith (1981) to appreciate aspects and nuances of regression that are beyond the scope of this book.

17.2 POLYNOMIAL REGRESSION

In Sec. 17.1, a procedure was developed to derive the equation of a straight line using the least-squares criterion. Some engineering data, although exhibiting a marked pattern such as seen in Fig. 17.8, are poorly represented by a straight line. For these cases, a curve would be better suited to fit these data. As discussed in the previous section, one method to accomplish this objective is to use transformations. Another alternative is to fit polynomials to the data using polynomial regression.

The least-squares procedure can be readily extended to fit the data to a higher-order polynomial. For example, suppose that we fit a second-order polynomial or quadratic:

$$ y = a_0 + a_1 x + a_2 x^2 + e $$

For this case the sum of the squares of the residuals is [compare with Eq. (17.3)]

$$ S_r = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i - a_2 x_i^2)^2 \tag{17.18} $$

Following the procedure of the previous section, we take the derivative of Eq. (17.18) with respect to each of the unknown coefficients of the polynomial, as in

$$ \frac{\partial S_r}{\partial a_0} = -2 \sum (y_i - a_0 - a_1 x_i - a_2 x_i^2) $$
  • 490.
$$ \frac{\partial S_r}{\partial a_1} = -2 \sum x_i (y_i - a_0 - a_1 x_i - a_2 x_i^2) $$

$$ \frac{\partial S_r}{\partial a_2} = -2 \sum x_i^2 (y_i - a_0 - a_1 x_i - a_2 x_i^2) $$

These equations can be set equal to zero and rearranged to develop the following set of normal equations:

$$ \begin{aligned} (n)\,a_0 + \left(\sum x_i\right) a_1 + \left(\sum x_i^2\right) a_2 &= \sum y_i \\ \left(\sum x_i\right) a_0 + \left(\sum x_i^2\right) a_1 + \left(\sum x_i^3\right) a_2 &= \sum x_i y_i \\ \left(\sum x_i^2\right) a_0 + \left(\sum x_i^3\right) a_1 + \left(\sum x_i^4\right) a_2 &= \sum x_i^2 y_i \end{aligned} \tag{17.19} $$

where all summations are from i = 1 through n. Note that the above three equations are linear and have three unknowns: $a_0$, $a_1$, and $a_2$. The coefficients of the unknowns can be calculated directly from the observed data.

For this case, we see that the problem of determining a least-squares second-order polynomial is equivalent to solving a system of three simultaneous linear equations. Techniques to solve such equations were discussed in Part Three.

The two-dimensional case can be easily extended to an mth-order polynomial as

$$ y = a_0 + a_1 x + a_2 x^2 + \cdots + a_m x^m + e $$

The foregoing analysis can be easily extended to this more general case. Thus, we can recognize that determining the coefficients of an mth-order polynomial is equivalent to solving a system of m + 1 simultaneous linear equations. For this case, the standard error is formulated as

$$ s_{y/x} = \sqrt{\frac{S_r}{n - (m + 1)}} \tag{17.20} $$

This quantity is divided by n - (m + 1) because (m + 1) data-derived coefficients ($a_0, a_1, \ldots, a_m$) were used to compute $S_r$; thus, we have lost m + 1 degrees of freedom. In addition to the standard error, a coefficient of determination can also be computed for polynomial regression with Eq. (17.10).

EXAMPLE 17.5 Polynomial Regression

Problem Statement. Fit a second-order polynomial to the data in the first two columns of Table 17.4.

Solution. From the given data,

$m = 2$, $n = 6$, $\bar{x} = 2.5$, $\bar{y} = 25.433$
$\sum x_i = 15$, $\sum x_i^2 = 55$, $\sum x_i^3 = 225$, $\sum x_i^4 = 979$
$\sum y_i = 152.6$, $\sum x_i y_i = 585.6$, $\sum x_i^2 y_i = 2488.8$
  • 491. Therefore, the simultaneous linear equations are

$$ \begin{bmatrix} 6 & 15 & 55 \\ 15 & 55 & 225 \\ 55 & 225 & 979 \end{bmatrix} \begin{Bmatrix} a_0 \\ a_1 \\ a_2 \end{Bmatrix} = \begin{Bmatrix} 152.6 \\ 585.6 \\ 2488.8 \end{Bmatrix} $$

Solving these equations through a technique such as Gauss elimination gives $a_0 = 2.47857$, $a_1 = 2.35929$, and $a_2 = 1.86071$. Therefore, the least-squares quadratic equation for this case is

$$ y = 2.47857 + 2.35929x + 1.86071x^2 $$

The standard error of the estimate based on the regression polynomial is [Eq. (17.20)]

$$ s_{y/x} = \sqrt{\frac{3.74657}{6 - 3}} = 1.12 $$

TABLE 17.4 Computations for an error analysis of the quadratic least-squares fit.

x_i    y_i      (y_i - ybar)^2    (y_i - a0 - a1*x_i - a2*x_i^2)^2
0      2.1      544.44            0.14332
1      7.7      314.47            1.00286
2      13.6     140.03            1.08158
3      27.2     3.12              0.80491
4      40.9     239.22            0.61951
5      61.1     1272.11           0.09439
Sum    152.6    2513.39           3.74657

FIGURE 17.11 Fit of a second-order polynomial. [Axes: y versus x, with the data and the least-squares parabola.]
  • 492. The coefficient of determination is

$$ r^2 = \frac{2513.39 - 3.74657}{2513.39} = 0.99851 $$

and the correlation coefficient is r = 0.99925.

These results indicate that 99.851 percent of the original uncertainty has been explained by the model. This result supports the conclusion that the quadratic equation represents an excellent fit, as is also evident from Fig. 17.11.

17.2.1 Algorithm for Polynomial Regression

An algorithm for polynomial regression is delineated in Fig. 17.12. Note that the primary task is the generation of the coefficients of the normal equations [Eq. (17.19)]. (Pseudocode for accomplishing this is presented in Fig. 17.13.) Then, techniques from Part Three can be applied to solve these simultaneous equations for the coefficients.

A potential problem associated with implementing polynomial regression on the computer is that the normal equations tend to be ill-conditioned. This is particularly true for higher-order versions. For these cases, the computed coefficients may be highly susceptible to round-off error, and consequently, the results can be inaccurate. Among other things, this problem is related to the structure of the normal equations and to the fact that for higher-order polynomials the normal equations can have very large and very small coefficients. This is because the coefficients are summations of the data raised to powers.

Although the strategies for mitigating round-off error discussed in Part Three, such as pivoting, can help to partially remedy this problem, a simpler alternative is to use a computer with higher precision. Fortunately, most practical problems are limited to lower-order polynomials for which round-off is usually negligible. In situations where higher-order versions are required, other alternatives are available for certain types of data. However, these techniques (such as orthogonal polynomials) are beyond the scope of this book. The reader should consult texts on regression, such as Draper and Smith (1981), for additional information regarding the problem and possible alternatives.

FIGURE 17.12 Algorithm for implementation of polynomial and multiple linear regression.

Step 1: Input order of polynomial to be fit, m.
Step 2: Input number of data points, n.
Step 3: If n < m + 1, print out an error message that regression is impossible and terminate the process. If n ≥ m + 1, continue.
Step 4: Compute the elements of the normal equation in the form of an augmented matrix.
Step 5: Solve the augmented matrix for the coefficients $a_0, a_1, a_2, \ldots, a_m$, using an elimination method.
Step 6: Print out the coefficients.
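The steps of Fig. 17.12 can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the book's own program: the helper names `normal_equations` and `gauss_solve` are hypothetical, the elimination routine is a plain Gauss elimination with partial pivoting, and the data are those of Example 17.5.

```python
def normal_equations(x, y, order):
    """Assemble the augmented matrix of the normal equations, Eq. (17.19)."""
    n = len(x)
    if n < order + 1:
        # Step 3 of Fig. 17.12: too few points for the requested order.
        raise ValueError("regression is impossible: need n >= m + 1")
    a = [[0.0] * (order + 2) for _ in range(order + 1)]
    for i in range(order + 1):
        for j in range(i + 1):
            s = sum(xk ** (i + j) for xk in x)
            a[i][j] = s          # the coefficient matrix is symmetric
            a[j][i] = s
        a[i][order + 1] = sum(yk * xk ** i for xk, yk in zip(x, y))
    return a

def gauss_solve(aug):
    """Solve an augmented system by Gauss elimination with partial pivoting."""
    m = len(aug)
    a = [row[:] for row in aug]
    for k in range(m):
        p = max(range(k, m), key=lambda r: abs(a[r][k]))
        a[k], a[p] = a[p], a[k]
        for r in range(k + 1, m):
            f = a[r][k] / a[k][k]
            for c in range(k, m + 1):
                a[r][c] -= f * a[k][c]
    sol = [0.0] * m
    for k in range(m - 1, -1, -1):
        s = sum(a[k][c] * sol[c] for c in range(k + 1, m))
        sol[k] = (a[k][m] - s) / a[k][k]
    return sol

# Data from the first two columns of Table 17.4.
x = [0, 1, 2, 3, 4, 5]
y = [2.1, 7.7, 13.6, 27.2, 40.9, 61.1]
coeffs = gauss_solve(normal_equations(x, y, 2))
print([round(c, 5) for c in coeffs])
```

For higher orders the sums of powers grow quickly, which is the source of the ill-conditioning discussed above; this sketch makes no attempt to mitigate it beyond pivoting.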
  • 493. FIGURE 17.13 Pseudocode to assemble the elements of the normal equations for polynomial regression.

DOFOR i = 1, order + 1
  DOFOR j = 1, i
    k = i + j - 2
    sum = 0
    DOFOR ℓ = 1, n
      sum = sum + x_ℓ^k
    END DO
    a_{i,j} = sum
    a_{j,i} = sum
  END DO
  sum = 0
  DOFOR ℓ = 1, n
    sum = sum + y_ℓ · x_ℓ^(i-1)
  END DO
  a_{i,order+2} = sum
END DO

17.3 MULTIPLE LINEAR REGRESSION

A useful extension of linear regression is the case where y is a linear function of two or more independent variables. For example, y might be a linear function of $x_1$ and $x_2$, as in

$$ y = a_0 + a_1 x_1 + a_2 x_2 + e $$

Such an equation is particularly useful when fitting experimental data, where the variable being studied is often a function of two other variables. For this two-dimensional case, the regression “line” becomes a “plane” (Fig. 17.14).

FIGURE 17.14 Graphical depiction of multiple linear regression where y is a linear function of $x_1$ and $x_2$. [Axes: y versus $x_1$ and $x_2$.]
  • 494. As with the previous cases, the “best” values of the coefficients are determined by setting up the sum of the squares of the residuals,

$$ S_r = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_{1i} - a_2 x_{2i})^2 \tag{17.21} $$

and differentiating with respect to each of the unknown coefficients,

$$ \frac{\partial S_r}{\partial a_0} = -2 \sum (y_i - a_0 - a_1 x_{1i} - a_2 x_{2i}) $$

$$ \frac{\partial S_r}{\partial a_1} = -2 \sum x_{1i} (y_i - a_0 - a_1 x_{1i} - a_2 x_{2i}) $$

$$ \frac{\partial S_r}{\partial a_2} = -2 \sum x_{2i} (y_i - a_0 - a_1 x_{1i} - a_2 x_{2i}) $$

The coefficients yielding the minimum sum of the squares of the residuals are obtained by setting the partial derivatives equal to zero and expressing the result in matrix form as

$$ \begin{bmatrix} n & \sum x_{1i} & \sum x_{2i} \\ \sum x_{1i} & \sum x_{1i}^2 & \sum x_{1i} x_{2i} \\ \sum x_{2i} & \sum x_{1i} x_{2i} & \sum x_{2i}^2 \end{bmatrix} \begin{Bmatrix} a_0 \\ a_1 \\ a_2 \end{Bmatrix} = \begin{Bmatrix} \sum y_i \\ \sum x_{1i} y_i \\ \sum x_{2i} y_i \end{Bmatrix} \tag{17.22} $$

EXAMPLE 17.6 Multiple Linear Regression

Problem Statement. The following data were calculated from the equation $y = 5 + 4x_1 - 3x_2$:

x1     x2    y
0      0     5
2      1     10
2.5    2     9
1      3     0
4      6     3
7      2     27

Use multiple linear regression to fit these data.

Solution. The summations required to develop Eq. (17.22) are computed in Table 17.5. The result is

$$ \begin{bmatrix} 6 & 16.5 & 14 \\ 16.5 & 76.25 & 48 \\ 14 & 48 & 54 \end{bmatrix} \begin{Bmatrix} a_0 \\ a_1 \\ a_2 \end{Bmatrix} = \begin{Bmatrix} 54 \\ 243.5 \\ 100 \end{Bmatrix} $$

which can be solved using a method such as Gauss elimination for

$$ a_0 = 5 \qquad a_1 = 4 \qquad a_2 = -3 $$

which is consistent with the original equation from which these data were derived.
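Example 17.6 can be checked with a short Python sketch that builds the sums of Eq. (17.22) directly from the data. The function names `solve` and `multiple_regression` are hypothetical, chosen for this illustration, and a column of ones is prepended to the independent variables so that the intercept is handled like any other coefficient.

```python
def solve(a, b):
    """Gauss elimination with partial pivoting for a small dense system."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(a, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(aug[r][k]))
        aug[k], aug[p] = aug[p], aug[k]
        for r in range(k + 1, n):
            f = aug[r][k] / aug[k][k]
            for c in range(k, n + 1):
                aug[r][c] -= f * aug[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        s = sum(aug[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (aug[k][n] - s) / aug[k][k]
    return x

def multiple_regression(xs, y):
    """Return [a0, a1, ..., am] for y = a0 + a1*x1 + ... + am*xm."""
    # Each row is (1, x1, x2, ...); the leading 1 multiplies the intercept.
    rows = [[1.0] + list(r) for r in xs]
    m = len(rows[0])
    lhs = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    rhs = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(m)]
    return solve(lhs, rhs)

# Data of Example 17.6, generated from y = 5 + 4*x1 - 3*x2.
xs = [(0, 0), (2, 1), (2.5, 2), (1, 3), (4, 6), (7, 2)]
y = [5, 10, 9, 0, 3, 27]
coeffs = multiple_regression(xs, y)
print([round(c, 6) for c in coeffs])
```

Because the data contain no noise, the recovered coefficients should match the generating equation to within round-off.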
  • 495. TABLE 17.5 Computations required to develop the normal equations for Example 17.6.

       y     x1      x2    x1^2     x2^2    x1*x2    x1*y     x2*y
       5     0       0     0        0       0        0        0
       10    2       1     4        1       2        20       10
       9     2.5     2     6.25     4       5        22.5     18
       0     1       3     1        9       3        0        0
       3     4       6     16       36      24       12       18
       27    7       2     49       4       14       189      54
Sum    54    16.5    14    76.25    54      48       243.5    100

The foregoing two-dimensional case can be easily extended to m dimensions, as in

$$ y = a_0 + a_1 x_1 + a_2 x_2 + \cdots + a_m x_m + e $$

where the standard error is formulated as

$$ s_{y/x} = \sqrt{\frac{S_r}{n - (m + 1)}} $$

and the coefficient of determination is computed as in Eq. (17.10). An algorithm to set up the normal equations is listed in Fig. 17.15.

FIGURE 17.15 Pseudocode to assemble the elements of the normal equations for multiple regression. Note that aside from storing the independent variables in $x_{1,i}$, $x_{2,i}$, etc., 1’s must be stored in $x_{0,i}$ for this algorithm to work.

DOFOR i = 1, order + 1
  DOFOR j = 1, i
    sum = 0
    DOFOR ℓ = 1, n
      sum = sum + x_{i-1,ℓ} · x_{j-1,ℓ}
    END DO
    a_{i,j} = sum
    a_{j,i} = sum
  END DO
  sum = 0
  DOFOR ℓ = 1, n
    sum = sum + y_ℓ · x_{i-1,ℓ}
  END DO
  a_{i,order+2} = sum
END DO

Although there may be certain cases where a variable is linearly related to two or more other variables, multiple linear regression has additional utility in the derivation of power equations of the general form

$$ y = a_0 x_1^{a_1} x_2^{a_2} \cdots x_m^{a_m} $$
  • 496. Such equations are extremely useful when fitting experimental data. To use multiple linear regression, the equation is transformed by taking its logarithm to yield

$$ \log y = \log a_0 + a_1 \log x_1 + a_2 \log x_2 + \cdots + a_m \log x_m $$

This transformation is similar in spirit to the one used in Sec. 17.1.5 and Example 17.4 to fit a power equation when y was a function of a single variable x. Section 20.4 provides an example of such an application for two independent variables.

17.4 GENERAL LINEAR LEAST SQUARES

To this point, we have focused on the mechanics of obtaining least-squares fits of some simple functions to data. Before turning to nonlinear regression, there are several issues that we would like to discuss to enrich your understanding of the preceding material.

17.4.1 General Matrix Formulation for Linear Least Squares

In the preceding pages, we have introduced three types of regression: simple linear, polynomial, and multiple linear. In fact, all three belong to the following general linear least-squares model:

$$ y = a_0 z_0 + a_1 z_1 + a_2 z_2 + \cdots + a_m z_m + e \tag{17.23} $$

where $z_0, z_1, \ldots, z_m$ are m + 1 basis functions. It can easily be seen how simple and multiple linear regression fall within this model, that is, $z_0 = 1$, $z_1 = x_1$, $z_2 = x_2$, ..., $z_m = x_m$. Further, polynomial regression is also included if the basis functions are simple monomials as in $z_0 = x^0 = 1$, $z_1 = x$, $z_2 = x^2$, ..., $z_m = x^m$.

Note that the terminology “linear” refers only to the model’s dependence on its parameters, that is, the a’s. As in the case of polynomial regression, the functions themselves can be highly nonlinear. For example, the z’s can be sinusoids, as in

$$ y = a_0 + a_1 \cos(\omega t) + a_2 \sin(\omega t) $$

Such a format is the basis of Fourier analysis described in Chap. 19.

On the other hand, a simple-looking model like

$$ f(x) = a_0 (1 - e^{-a_1 x}) $$

is truly nonlinear because it cannot be manipulated into the format of Eq. (17.23). We will turn to such models at the end of this chapter.

For the time being, Eq. (17.23) can be expressed in matrix notation as

$$ \{Y\} = [Z]\{A\} + \{E\} \tag{17.24} $$

where [Z] is a matrix of the calculated values of the basis functions at the measured values of the independent variables,

$$ [Z] = \begin{bmatrix} z_{01} & z_{11} & \cdots & z_{m1} \\ z_{02} & z_{12} & \cdots & z_{m2} \\ \vdots & \vdots & & \vdots \\ z_{0n} & z_{1n} & \cdots & z_{mn} \end{bmatrix} $$
  • 497. where m is the number of variables in the model and n is the number of data points. Because n ≥ m + 1, you should recognize that most of the time, [Z] is not a square matrix.

The column vector {Y} contains the observed values of the dependent variable

$$ \{Y\}^T = \lfloor y_1 \;\; y_2 \;\; \cdots \;\; y_n \rfloor $$

The column vector {A} contains the unknown coefficients

$$ \{A\}^T = \lfloor a_0 \;\; a_1 \;\; \cdots \;\; a_m \rfloor $$

and the column vector {E} contains the residuals

$$ \{E\}^T = \lfloor e_1 \;\; e_2 \;\; \cdots \;\; e_n \rfloor $$

As was done throughout this chapter, the sum of the squares of the residuals for this model can be defined as

$$ S_r = \sum_{i=1}^{n} \left( y_i - \sum_{j=0}^{m} a_j z_{ji} \right)^2 $$

This quantity can be minimized by taking its partial derivative with respect to each of the coefficients and setting the resulting equation equal to zero. The outcome of this process is the normal equations that can be expressed concisely in matrix form as

$$ \left[ [Z]^T [Z] \right] \{A\} = \left\{ [Z]^T \{Y\} \right\} \tag{17.25} $$

It can be shown that Eq. (17.25) is, in fact, equivalent to the normal equations developed previously for simple linear, polynomial, and multiple linear regression.

Our primary motivation for the foregoing has been to illustrate the unity among the three approaches and to show how they can all be expressed simply in the same matrix notation. The matrix notation will also have relevance when we turn to nonlinear regression in the last section of this chapter.

From Eq. (PT3.6), recall that the matrix inverse can be employed to solve Eq. (17.25), as in

$$ \{A\} = \left[ [Z]^T [Z] \right]^{-1} \left\{ [Z]^T \{Y\} \right\} \tag{17.26} $$

As we have learned in Part Three, this is an inefficient approach for solving a set of simultaneous equations. However, from a statistical perspective, there are a number of reasons why we might be interested in obtaining the inverse and examining its coefficients. These reasons will be discussed next.

17.4.2 Statistical Aspects of Least-Squares Theory

In Sec. PT5.2.1, we reviewed a number of descriptive statistics that can be used to describe a sample. These included the arithmetic mean, the standard deviation, and the variance.

Aside from yielding a solution for the regression coefficients, the matrix formulation of Eq. (17.26) provides estimates of their statistics. It can be shown (Draper and Smith, 1981) that the diagonal and off-diagonal terms of the matrix $\left[ [Z]^T [Z] \right]^{-1}$ give, respectively, the variances and the covariances [1] of the a’s.

[1] The covariance is a statistic that measures the dependency of one variable on another. Thus, cov(x, y) indicates the dependency of x and y. For example, cov(x, y) = 0 would indicate that x and y are totally independent.

If the diagonal elements of
  • 498. $\left[ [Z]^T [Z] \right]^{-1}$ are designated as $z^{-1}_{i,i}$,

$$ \mathrm{var}(a_{i-1}) = z^{-1}_{i,i} \, s^2_{y/x} \tag{17.27} $$

and

$$ \mathrm{cov}(a_{i-1}, a_{j-1}) = z^{-1}_{i,j} \, s^2_{y/x} \tag{17.28} $$

These statistics have a number of important applications. For our present purposes, we will illustrate how they can be used to develop confidence intervals for the intercept and slope.

Using an approach similar to that in Sec. PT5.2.3, it can be shown that lower and upper bounds on the intercept can be formulated as (see Milton and Arnold, 2002, for details)

$$ L = a_0 - t_{\alpha/2,\,n-2} \, s(a_0) \qquad U = a_0 + t_{\alpha/2,\,n-2} \, s(a_0) \tag{17.29} $$

where $s(a_j)$ = the standard error of coefficient $a_j$ = $\sqrt{\mathrm{var}(a_j)}$. In a similar manner, lower and upper bounds on the slope can be formulated as

$$ L = a_1 - t_{\alpha/2,\,n-2} \, s(a_1) \qquad U = a_1 + t_{\alpha/2,\,n-2} \, s(a_1) \tag{17.30} $$

The following example illustrates how these intervals can be used to make quantitative inferences related to linear regression.

EXAMPLE 17.7 Confidence Intervals for Linear Regression

Problem Statement. In Example 17.3, we used regression to develop the following relationship between measurements and model predictions:

$$ y = -0.859 + 1.032x $$

where y = the model predictions and x = the measurements. We concluded that there was a good agreement between the two because the intercept was approximately equal to 0 and the slope approximately equal to 1. Recompute the regression but use the matrix approach to estimate standard errors for the parameters. Then employ these errors to develop confidence intervals, and use these to make a probabilistic statement regarding the goodness of fit.

Solution. These data can be written in matrix format for simple linear regression as:

$$ [Z] = \begin{bmatrix} 1 & 10 \\ 1 & 16.3 \\ 1 & 23 \\ \vdots & \vdots \\ 1 & 50 \end{bmatrix} \qquad \{Y\} = \begin{Bmatrix} 8.953 \\ 16.405 \\ 22.607 \\ \vdots \\ 49.988 \end{Bmatrix} $$

Matrix transposition and multiplication can then be used to generate the normal equations as

$$ \left[ [Z]^T [Z] \right] \{A\} = \left\{ [Z]^T \{Y\} \right\} $$

$$ \begin{bmatrix} 15 & 548.3 \\ 548.3 & 22191.21 \end{bmatrix} \begin{Bmatrix} a_0 \\ a_1 \end{Bmatrix} = \begin{Bmatrix} 552.741 \\ 22421.43 \end{Bmatrix} $$
  • 499. Matrix inversion can be used to obtain the slope and intercept as

$$ \{A\} = \left[ [Z]^T [Z] \right]^{-1} \left\{ [Z]^T \{Y\} \right\} = \begin{bmatrix} 0.688414 & -0.01701 \\ -0.01701 & 0.000465 \end{bmatrix} \begin{Bmatrix} 552.741 \\ 22421.43 \end{Bmatrix} = \begin{Bmatrix} -0.85872 \\ 1.031592 \end{Bmatrix} $$

Thus, the intercept and the slope are determined as $a_0 = -0.85872$ and $a_1 = 1.031592$, respectively. These values in turn can be used to compute the standard error of the estimate as $s_{y/x} = 0.863403$. This value can be used along with the diagonal elements of the matrix inverse to calculate the standard errors of the coefficients,

$$ s(a_0) = \sqrt{z^{-1}_{11} s^2_{y/x}} = \sqrt{0.688414(0.863403)^2} = 0.716372 $$

$$ s(a_1) = \sqrt{z^{-1}_{22} s^2_{y/x}} = \sqrt{0.000465(0.863403)^2} = 0.018625 $$

The statistic $t_{\alpha/2,\,n-2}$ needed for a 95% confidence interval with n - 2 = 15 - 2 = 13 degrees of freedom can be determined from a statistics table or using software. We used an Excel function, TINV, to come up with the proper value, as in

= TINV(0.05, 13)

which yielded a value of 2.160368. Equations (17.29) and (17.30) can then be used to compute the confidence intervals as

$$ a_0 = -0.85872 \pm 2.160368(0.716372) = -0.85872 \pm 1.547627 = [-2.40634, 0.688912] $$

$$ a_1 = 1.031592 \pm 2.160368(0.018625) = 1.031592 \pm 0.040237 = [0.991355, 1.071828] $$

Notice that the desired values (0 for the intercept and 1 for the slope) fall within the intervals. On the basis of this analysis we could make the following statement regarding the slope: We have strong grounds for believing that the slope of the true regression line lies within the interval from 0.991355 to 1.071828. Because 1 falls within this interval, we also have strong grounds for believing that the result supports the agreement between the measurements and the model. Because zero falls within the intercept interval, a similar statement can be made regarding the intercept.

As mentioned previously in Sec. 17.2.1, the normal equations are notoriously ill-conditioned. Hence, if solved with conventional techniques such as LU decomposition, the computed coefficients can be highly susceptible to round-off error. As a consequence, more sophisticated orthogonalization algorithms, such as QR factorization, are available to circumvent the problem. Because these techniques are beyond the scope of this book, the reader should consult texts on regression, such as Draper and Smith (1981), for additional information regarding the problem and possible alternatives. Moler (2004) also provides a nice discussion of the topic with emphasis on the numerical methods.

The foregoing is a limited introduction to the rich topic of statistical inference and its relationship to regression. There are many subtleties that are beyond the scope of this
  • 500. book. Our primary motivation has been to illustrate the power of the matrix approach to general linear least squares.

In addition, it should be noted that software packages such as Excel, MATLAB, and Mathcad can generate least-squares regression fits along with information relevant to inferential statistics. We will explore some of these capabilities when we describe these packages at the end of Chap. 19.

17.5 NONLINEAR REGRESSION

There are many cases in engineering where nonlinear models must be fit to data. In the present context, these models are defined as those that have a nonlinear dependence on their parameters. For example,

$$ f(x) = a_0 (1 - e^{-a_1 x}) + e \tag{17.31} $$

This equation cannot be manipulated so that it conforms to the general form of Eq. (17.23).

As with linear least squares, nonlinear regression is based on determining the values of the parameters that minimize the sum of the squares of the residuals. However, for the nonlinear case, the solution must proceed in an iterative fashion.

The Gauss-Newton method is one algorithm for minimizing the sum of the squares of the residuals between data and nonlinear equations. The key concept underlying the technique is that a Taylor series expansion is used to express the original nonlinear equation in an approximate, linear form. Then, least-squares theory can be used to obtain new estimates of the parameters that move in the direction of minimizing the residual.

To illustrate how this is done, first the relationship between the nonlinear equation and the data can be expressed generally as

$$ y_i = f(x_i; a_0, a_1, \ldots, a_m) + e_i $$

where $y_i$ = a measured value of the dependent variable, $f(x_i; a_0, a_1, \ldots, a_m)$ = the equation that is a function of the independent variable $x_i$ and a nonlinear function of the parameters $a_0, a_1, \ldots, a_m$, and $e_i$ = a random error. For convenience, this model can be expressed in abbreviated form by omitting the parameters,

$$ y_i = f(x_i) + e_i \tag{17.32} $$

The nonlinear model can be expanded in a Taylor series around the parameter values and curtailed after the first derivative. For example, for a two-parameter case,

$$ f(x_i)_{j+1} = f(x_i)_j + \frac{\partial f(x_i)_j}{\partial a_0} \Delta a_0 + \frac{\partial f(x_i)_j}{\partial a_1} \Delta a_1 \tag{17.33} $$

where j = the initial guess, j + 1 = the prediction, $\Delta a_0 = a_{0,j+1} - a_{0,j}$, and $\Delta a_1 = a_{1,j+1} - a_{1,j}$. Thus, we have linearized the original model with respect to the parameters. Equation (17.33) can be substituted into Eq. (17.32) to yield

$$ y_i - f(x_i)_j = \frac{\partial f(x_i)_j}{\partial a_0} \Delta a_0 + \frac{\partial f(x_i)_j}{\partial a_1} \Delta a_1 + e_i $$

or in matrix form [compare with Eq. (17.24)],

$$ \{D\} = [Z_j]\{\Delta A\} + \{E\} \tag{17.34} $$
  • 501. where $[Z_j]$ is the matrix of partial derivatives of the function evaluated at the initial guess j,

$$ [Z_j] = \begin{bmatrix} \partial f_1/\partial a_0 & \partial f_1/\partial a_1 \\ \partial f_2/\partial a_0 & \partial f_2/\partial a_1 \\ \vdots & \vdots \\ \partial f_n/\partial a_0 & \partial f_n/\partial a_1 \end{bmatrix} $$

where n = the number of data points and $\partial f_i/\partial a_k$ = the partial derivative of the function with respect to the kth parameter evaluated at the ith data point. The vector {D} contains the differences between the measurements and the function values,

$$ \{D\} = \begin{Bmatrix} y_1 - f(x_1) \\ y_2 - f(x_2) \\ \vdots \\ y_n - f(x_n) \end{Bmatrix} $$

and the vector $\{\Delta A\}$ contains the changes in the parameter values,

$$ \{\Delta A\} = \begin{Bmatrix} \Delta a_0 \\ \Delta a_1 \\ \vdots \\ \Delta a_m \end{Bmatrix} $$

Applying linear least-squares theory to Eq. (17.34) results in the following normal equations [recall Eq. (17.25)]:

$$ \left[ [Z_j]^T [Z_j] \right] \{\Delta A\} = \left\{ [Z_j]^T \{D\} \right\} \tag{17.35} $$

Thus, the approach consists of solving Eq. (17.35) for $\{\Delta A\}$, which can be employed to compute improved values for the parameters, as in

$$ a_{0,j+1} = a_{0,j} + \Delta a_0 $$

and

$$ a_{1,j+1} = a_{1,j} + \Delta a_1 $$

This procedure is repeated until the solution converges, that is, until

$$ |\varepsilon_a|_k = \left| \frac{a_{k,j+1} - a_{k,j}}{a_{k,j+1}} \right| 100\% \tag{17.36} $$

falls below an acceptable stopping criterion.
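The iteration defined by Eqs. (17.35) and (17.36) can be sketched in Python for the two-parameter model of Eq. (17.31). This is a minimal illustration, not the book's program: the function name `gauss_newton`, the default tolerance, and the use of Cramer's rule for the 2-by-2 normal equations are assumptions made for brevity, and the data and initial guesses are those of Example 17.8.

```python
import math

def gauss_newton(x, y, a0, a1, tol_percent=1e-4, max_iter=50):
    """Fit f(x) = a0*(1 - exp(-a1*x)) by Gauss-Newton iteration."""
    for _ in range(max_iter):
        # Columns of [Zj]: partial derivatives at the current parameters.
        z0 = [1.0 - math.exp(-a1 * xi) for xi in x]
        z1 = [a0 * xi * math.exp(-a1 * xi) for xi in x]
        # {D}: measurements minus model predictions.
        d = [yi - a0 * z0i for yi, z0i in zip(y, z0)]
        # Normal equations [Zj^T Zj]{dA} = {Zj^T D}, Eq. (17.35).
        s00 = sum(u * u for u in z0)
        s01 = sum(u * v for u, v in zip(z0, z1))
        s11 = sum(v * v for v in z1)
        g0 = sum(u * di for u, di in zip(z0, d))
        g1 = sum(v * di for v, di in zip(z1, d))
        det = s00 * s11 - s01 * s01
        da0 = (s11 * g0 - s01 * g1) / det
        da1 = (s00 * g1 - s01 * g0) / det
        a0 += da0
        a1 += da1
        # Stopping test of Eq. (17.36), in percent.
        if max(abs(da0 / a0), abs(da1 / a1)) * 100.0 < tol_percent:
            break
    return a0, a1

x = [0.25, 0.75, 1.25, 1.75, 2.25]
y = [0.28, 0.57, 0.68, 0.74, 0.79]
first_step = gauss_newton(x, y, 1.0, 1.0, max_iter=1)
a0_fit, a1_fit = gauss_newton(x, y, 1.0, 1.0)
print(first_step, (a0_fit, a1_fit))
```

The sketch has no safeguards against the slow convergence, oscillation, or divergence that the method can exhibit from poor initial guesses.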
  • 502. EXAMPLE 17.8 Gauss-Newton Method

Problem Statement. Fit the function $f(x; a_0, a_1) = a_0(1 - e^{-a_1 x})$ to the data:

x    0.25    0.75    1.25    1.75    2.25
y    0.28    0.57    0.68    0.74    0.79

Use initial guesses of $a_0 = 1.0$ and $a_1 = 1.0$ for the parameters. Note that for these guesses, the initial sum of the squares of the residuals is 0.0248.

Solution. The partial derivatives of the function with respect to the parameters are

$$ \frac{\partial f}{\partial a_0} = 1 - e^{-a_1 x} \tag{E17.8.1} $$

and

$$ \frac{\partial f}{\partial a_1} = a_0 x e^{-a_1 x} \tag{E17.8.2} $$

Equations (E17.8.1) and (E17.8.2) can be used to evaluate the matrix

$$ [Z_0] = \begin{bmatrix} 0.2212 & 0.1947 \\ 0.5276 & 0.3543 \\ 0.7135 & 0.3581 \\ 0.8262 & 0.3041 \\ 0.8946 & 0.2371 \end{bmatrix} $$

This matrix multiplied by its transpose results in

$$ [Z_0]^T [Z_0] = \begin{bmatrix} 2.3193 & 0.9489 \\ 0.9489 & 0.4404 \end{bmatrix} $$

which in turn can be inverted to yield

$$ \left[ [Z_0]^T [Z_0] \right]^{-1} = \begin{bmatrix} 3.6397 & -7.8421 \\ -7.8421 & 19.1678 \end{bmatrix} $$

The vector {D} consists of the differences between the measurements and the model predictions,

$$ \{D\} = \begin{Bmatrix} 0.28 - 0.2212 \\ 0.57 - 0.5276 \\ 0.68 - 0.7135 \\ 0.74 - 0.8262 \\ 0.79 - 0.8946 \end{Bmatrix} = \begin{Bmatrix} 0.0588 \\ 0.0424 \\ -0.0335 \\ -0.0862 \\ -0.1046 \end{Bmatrix} $$

It is multiplied by $[Z_0]^T$ to give

$$ [Z_0]^T \{D\} = \begin{Bmatrix} -0.1533 \\ -0.0365 \end{Bmatrix} $$
  • 503. The vector $\{\Delta A\}$ is then calculated by solving Eq. (17.35) for

$$ \{\Delta A\} = \begin{Bmatrix} -0.2714 \\ 0.5019 \end{Bmatrix} $$

which can be added to the initial parameter guesses to yield

$$ \begin{Bmatrix} a_0 \\ a_1 \end{Bmatrix} = \begin{Bmatrix} 1.0 \\ 1.0 \end{Bmatrix} + \begin{Bmatrix} -0.2714 \\ 0.5019 \end{Bmatrix} = \begin{Bmatrix} 0.7286 \\ 1.5019 \end{Bmatrix} $$

Thus, the improved estimates of the parameters are $a_0 = 0.7286$ and $a_1 = 1.5019$. The new parameters result in a sum of the squares of the residuals equal to 0.0242. Equation (17.36) can be used to compute $\varepsilon_0$ and $\varepsilon_1$ equal to 37 and 33 percent, respectively. The computation would then be repeated until these values fell below the prescribed stopping criterion. The final result is $a_0 = 0.79186$ and $a_1 = 1.6751$. These coefficients give a sum of the squares of the residuals of 0.000662.

A potential problem with the Gauss-Newton method as developed to this point is that the partial derivatives of the function may be difficult to evaluate. Consequently, many computer programs use difference equations to approximate the partial derivatives. One method is

$$ \frac{\partial f_i}{\partial a_k} \cong \frac{f(x_i; a_0, \ldots, a_k + \delta a_k, \ldots, a_m) - f(x_i; a_0, \ldots, a_k, \ldots, a_m)}{\delta a_k} \tag{17.37} $$

where $\delta$ = a small fractional perturbation.

The Gauss-Newton method has a number of other possible shortcomings:

1. It may converge slowly.
2. It may oscillate widely, that is, continually change directions.
3. It may not converge at all.

Modifications of the method (Booth and Peterson, 1958; Hartley, 1961) have been developed to remedy the shortcomings.

In addition, although there are several approaches expressly designed for regression, a more general approach is to use nonlinear optimization routines as described in Part Four. To do this, a guess for the parameters is made, and the sum of the squares of the residuals is computed. For example, for Eq. (17.31) it would be computed as

$$ S_r = \sum_{i=1}^{n} \left[ y_i - a_0 (1 - e^{-a_1 x_i}) \right]^2 \tag{17.38} $$

Then, the parameters would be adjusted systematically to minimize $S_r$ using search techniques of the type described previously in Chap. 14.
We will illustrate how this is done when we describe software applications at the end of Chap. 19.
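The difference approximation of Eq. (17.37) can be illustrated with a short Python sketch. The names `model` and `partial_fd` and the default perturbation of 1e-6 are assumptions made for this example; the perturbation is taken as a fraction of the parameter itself, so the sketch would fail for a parameter that is exactly zero.

```python
import math

def model(x, a):
    """The model of Eq. (17.31) without the error term."""
    return a[0] * (1.0 - math.exp(-a[1] * x))

def partial_fd(f, x, a, k, delta=1e-6):
    """Forward-difference estimate of df/da_k in the manner of Eq. (17.37)."""
    step = delta * a[k]          # fractional perturbation of parameter k
    bumped = list(a)
    bumped[k] += step
    return (f(x, bumped) - f(x, a)) / step

# Compare with the first row of [Z0] in Example 17.8 (x = 0.25, a0 = a1 = 1).
a = [1.0, 1.0]
d0 = partial_fd(model, 0.25, a, 0)
d1 = partial_fd(model, 0.25, a, 1)
print(round(d0, 4), round(d1, 4))
```

Such estimates could replace the analytical derivatives when assembling $[Z_j]$, at the cost of one extra function evaluation per parameter and data point.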
  • 504. PROBLEMS

17.1 Given these data

8.8    9.5    9.8    9.4    10.0
9.4    10.1   9.2    11.3   9.4
10.0   10.4   7.9    10.4   9.8
9.8    9.5    8.9    8.8    10.6
10.1   9.5    9.6    10.2   8.9

Determine (a) the mean, (b) the standard deviation, (c) the variance, (d) the coefficient of variation, and (e) the 95% confidence interval for the mean. (f) Construct a histogram using a range from 7.5 to 11.5 with intervals of 0.5.

17.2 Given these data

29.65  28.55  28.65  30.15  29.35  29.75  29.25
30.65  28.15  29.85  29.05  30.25  30.85  28.75
29.65  30.45  29.15  30.45  33.65  29.35  29.75
31.25  29.45  30.15  29.65  30.55  29.65  29.25

Determine (a) the mean, (b) the standard deviation, (c) the variance, (d) the coefficient of variation, and (e) the 90% confidence interval for the mean. (f) Construct a histogram. Use a range from 28 to 34 with increments of 0.4. (g) Assuming that the distribution is normal and that your estimate of the standard deviation is valid, compute the range (that is, the lower and the upper values) that encompasses 68% of the readings. Determine whether this is a valid estimate for the data in this problem.

17.3 Use least-squares regression to fit a straight line to

x    0    2    4    6    9    11   12   15   17   19
y    5    6    7    6    9    8    7    10   12   12

Along with the slope and intercept, compute the standard error of the estimate and the correlation coefficient. Plot the data and the regression line. Then repeat the problem, but regress x versus y, that is, switch the variables. Interpret your results.

17.4 Use least-squares regression to fit a straight line to

x    6    7    11   15   17   21   23   29   29   37   39
y    29   21   29   14   21   15   7    7    13   0    3

Along with the slope and the intercept, compute the standard error of the estimate and the correlation coefficient. Plot the data and the regression line. If someone made an additional measurement of x = 10, y = 10, would you suspect, based on a visual assessment and the standard error, that the measurement was valid or faulty? Justify your conclusion.

17.5 Using the same approach as was employed to derive Eqs. (17.15) and (17.16), derive the least-squares fit of the following model:

$$ y = a_1 x + e $$

That is, determine the slope that results in the least-squares fit for a straight line with a zero intercept. Fit the following data with this model and display the result graphically:

x    2    4    6    7    10   11   14   17   20
y    1    2    5    2    8    7    6    9    12

17.6 Use least-squares regression to fit a straight line to

x    1    2    3    4    5    6    7    8    9
y    1    1.5  2    3    4    5    8    10   13

(a) Along with the slope and intercept, compute the standard error of the estimate and the correlation coefficient. Plot the data and the straight line. Assess the fit. (b) Recompute (a), but use polynomial regression to fit a parabola to the data. Compare the results with those of (a).

17.7 Fit the following data with (a) a saturation-growth-rate model, (b) a power equation, and (c) a parabola. In each case, plot the data and the equation.

x    0.75   2      3    4     6     8     8.5
y    1.2    1.95   2    2.4   2.4   2.7   2.6

17.8 Fit the following data with the power model ($y = ax^b$). Use the resulting power equation to predict y at x = 9:

x    2.5   3.5   5     6     7.5   10    12.5   15    17.5   20
y    13    11    8.5   8.2   7     6.2   5.2    4.8   4.6    4.3

17.9 Fit an exponential model to

x    0.4   0.8   1.2    1.6    2      2.3
y    800   975   1500   1950   2900   3600

Plot the data and the equation on both standard and semi-logarithmic graph paper.

17.10 Rather than using the base-e exponential model [Eq. (17.12)], a common alternative is to use a base-10 model,

$$ y = \alpha_5 10^{\beta_5 x} $$

When used for curve fitting, this equation yields identical results to the base-e version, but the value of the exponent parameter ($\beta_5$) will differ from that estimated with Eq. (17.12) ($\beta_1$). Use the base-10 version to solve Prob. 17.9. In addition, develop a formulation to relate $\beta_1$ to $\beta_5$.

17.11 Beyond the examples in Fig. 17.9, there are other models that can be linearized using transformations. For example,

$$ y = \alpha_4 x e^{\beta_4 x} $$
  • 505. 488 LEAST-SQUARES REGRESSION Determine the coefficients by setting up and solving Eq. (17.25). 17.16 Given these data x 5 10 15 20 25 30 35 40 45 50 y 17 24 31 33 37 37 40 40 42 41 use least-squares regression to fit (a) a straight line, (b) a power equation, (c) a saturation-growth-rate equation, and (d) a parabola. Plot the data along with all the curves. Is any one of the curves superior? If so, justify. 17.17 Fit a cubic equation to the following data: x 3 4 5 7 8 9 11 12 y 1.6 3.6 4.4 3.4 2.2 2.8 3.8 4.6 Along with the coefficients, determine r2 and syyx. 17.18 Use multiple linear regression to fit x1 0 1 1 2 2 3 3 4 4 x2 0 1 2 1 2 1 2 1 2 y 15.1 17.9 12.7 25.6 20.5 35.1 29.7 45.4 40.2 Compute the coefficients, the standard error of the estimate, and the correlation coefficient. 17.19 Use multiple linear regression to fit x1 0 0 1 2 0 1 2 2 1 x2 0 2 2 4 4 6 6 2 1 y 14 21 11 12 23 23 14 6 11 Compute the coefficients, the standard error of the estimate, and the correlation coefficient. 17.20 Use nonlinear regression to fit a parabola to the following data: x 0.2 0.5 0.8 1.2 1.7 2 2.3 y 500 700 1000 1200 2200 2650 3750 17.21 Use nonlinear regression to fit a saturation-growth-rate equation to the data in Prob. 17.16. 17.22 Recompute the regression fits from Probs. (a) 17.3 and (b) 17.17, using the matrix approach. Estimate the standard errors and develop 90% confidence intervals for the coefficients. 17.23 Develop, debug, and test a program in either a high-level language or macro language of your choice to implement linear regression. Among other things: (a) include statements to docu- ment the code, and (b) determine the standard error and the coeffi- cient of determination. 17.24 A material is tested for cyclic fatigue failure whereby a stress, in MPa, is applied to the material and the number of cycles needed to cause failure is measured. The results are in the table below. 
When a log-log plot of stress versus cycles is generated, the Linearize this model and use it to estimate a4 and b4 based on the following data. Develop a plot of your fit along with the data. x 0.1 0.2 0.4 0.6 0.9 1.3 1.5 1.7 1.8 y 0.75 1.25 1.45 1.25 0.85 0.55 0.35 0.28 0.18 17.12 An investigator has reported the data tabulated below for an experiment to determine the growth rate of bacteria k (per d), as a function of oxygen concentration c (mg/L). It is known that such data can be modeled by the following equation: k 5 kmaxc2 cs 1 c2 where cs and kmax are parameters. Use a transformation to linearize this equation. Then use linear regression to estimate cs and kmax and predict the growth rate at c 5 2 mg/L. c 0.5 0.8 1.5 2.5 4 k 1.1 2.4 5.3 7.6 8.9 17.13 An investigator has reported the data tabulated below. It is known that such data can be modeled by the following equation x 5 e(y2b)ya where a and b are parameters. Use a transformation to linearize this equation and then employ linear regression to determine a and b. Based on your analysis predict y at x 5 2.6. x 1 2 3 4 5 y 0.5 2 2.9 3.5 4 17.14 It is known that the data tabulated below can be modeled by the following equation y 5 a a 1 1x b1x b 2 Use a transformation to linearize this equation and then employ linear regression to determine the parameters a and b. Based on your analysis predict y at x 5 1.6. x 0.5 1 2 3 4 y 10.4 5.8 3.3 2.4 2 17.15 The following data are provided x 1 2 3 4 5 y 2.2 2.8 3.6 4.5 5.5 You want to use least-squares regression to fit these data with the following model, y 5 a 1 bx 1 c x
  • 506. PROBLEMS 489 at which the concentration will reach 200 CFUy100 mL. Note that your choice of model should be consistent with the fact that nega- tive concentrations are impossible and that the bacteria concentra- tion always decreases with time. 17.28 An object is suspended in a wind tunnel and the force mea- sured for various levels of wind velocity. The results are tabulated below. v, m/s 10 20 30 40 50 60 70 80 F, N 25 70 380 550 610 1220 830 1450 Use least-squares regression to fit these data with (a) a straight line, (b) a power equation based on log transformations, and (c) a power model based on nonlinear regression. Display the results graphically. 17.29 Fit a power model to the data from Prob. 17.28, but use natural logarithms to perform the transformations. 17.30 Derive the least-squares fit of the following model: y 5 a1x 1 a2x2 1 e That is, determine the coefficients that results in the least-squares fit for a second-order polynomial with a zero intercept. Test the ap- proach by using it to fit the data from Prob. 17.28. 17.31 In Prob. 17.11 we used transformations to linearize and fit the following model: y 5 a4xeb4x Use nonlinear regression to estimate a4 and b4 based on the follow- ing data. Develop a plot of your fit along with the data. x 0.1 0.2 0.4 0.6 0.9 1.3 1.5 1.7 1.8 y 0.75 1.25 1.45 1.25 0.85 0.55 0.35 0.28 0.18 data trend shows a linear relationship. Use least-squares regression to determine a best-fit equation for these data. N, cycles 1 10 100 1000 10,000 100,000 1,000,000 Stress, MPa 1100 1000 925 800 625 550 420 17.25 The following data show the relationship between the vis- cosity of SAE 70 oil and temperature. After taking the log of the data, use linear regression to find the equation of the line that best fits the data and the r2 value. Temperature, 8C 26.67 93.33 148.89 315.56 Viscosity, m, N ? s/m2 1.35 0.085 0.012 0.00075 17.26 The data below represents the bacterial growth in a liquid culture over a number of days. 
Day 0 4 8 12 16 20 Amount 3 106 67 84 98 125 149 185 Find a best-fit equation to the data trend. Try several possibilities— linear, parabolic, and exponential. Use the software package of your choice to find the best equation to predict the amount of bac- teria after 40 days. 17.27 The concentration of E. coli bacteria in a swimming area is monitored after a storm: t (hr) 4 8 12 16 20 24 c (CFUy100 mL) 1600 1320 1000 890 650 560 The time is measured in hours following the end of the storm and the unit CFU is a “colony forming unit.” Use these data to estimate (a) the concentration at the end of the storm (t 5 0) and (b) the time
  • 507. CHAPTER 18 Interpolation
You will frequently have occasion to estimate intermediate values between precise data points. The most common method used for this purpose is polynomial interpolation. Recall that the general formula for an nth-order polynomial is
$$f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n \tag{18.1}$$
For n + 1 data points, there is one and only one polynomial of order n that passes through all the points. For example, there is only one straight line (that is, a first-order polynomial) that connects two points (Fig. 18.1a). Similarly, only one parabola connects a set of three points (Fig. 18.1b). Polynomial interpolation consists of determining the unique nth-order polynomial that fits n + 1 data points. This polynomial then provides a formula to compute intermediate values.
Although there is one and only one nth-order polynomial that fits n + 1 points, there are a variety of mathematical formats in which this polynomial can be expressed. In this chapter, we will describe two alternatives that are well-suited for computer implementation: the Newton and the Lagrange polynomials.
FIGURE 18.1 Examples of interpolating polynomials: (a) first-order (linear) connecting two points, (b) second-order (quadratic or parabolic) connecting three points, and (c) third-order (cubic) connecting four points.
  • 508. 18.1 NEWTON’S DIVIDED-DIFFERENCE INTERPOLATING POLYNOMIALS
As stated above, there are a variety of alternative forms for expressing an interpolating polynomial. Newton’s divided-difference interpolating polynomial is among the most popular and useful forms. Before presenting the general equation, we will introduce the first- and second-order versions because of their simple visual interpretation.
18.1.1 Linear Interpolation
The simplest form of interpolation is to connect two data points with a straight line. This technique, called linear interpolation, is depicted graphically in Fig. 18.2. Using similar triangles,
$$\frac{f_1(x) - f(x_0)}{x - x_0} = \frac{f(x_1) - f(x_0)}{x_1 - x_0}$$
which can be rearranged to yield
$$f_1(x) = f(x_0) + \frac{f(x_1) - f(x_0)}{x_1 - x_0}(x - x_0) \tag{18.2}$$
which is a linear-interpolation formula. The notation $f_1(x)$ designates that this is a first-order interpolating polynomial.
FIGURE 18.2 Graphical depiction of linear interpolation. The shaded areas indicate the similar triangles used to derive the linear-interpolation formula [Eq. (18.2)].
Notice that besides representing the slope of the line connecting the points, the term $[f(x_1) - f(x_0)]/(x_1 - x_0)$ is a finite-divided-difference
  • 509. approximation of the first derivative [recall Eq. (4.17)]. In general, the smaller the interval between the data points, the better the approximation. This is due to the fact that, as the interval decreases, a continuous function will be better approximated by a straight line. This characteristic is demonstrated in the following example.
EXAMPLE 18.1 Linear Interpolation
Problem Statement. Estimate the natural logarithm of 2 using linear interpolation. First, perform the computation by interpolating between ln 1 = 0 and ln 6 = 1.791759. Then, repeat the procedure, but use a smaller interval from ln 1 to ln 4 (1.386294). Note that the true value of ln 2 is 0.6931472.
Solution. We use Eq. (18.2) and a linear interpolation for ln(2) from $x_0 = 1$ to $x_1 = 6$ to give
$$f_1(2) = 0 + \frac{1.791759 - 0}{6 - 1}(2 - 1) = 0.3583519$$
which represents an error of $\varepsilon_t = 48.3\%$. Using the smaller interval from $x_0 = 1$ to $x_1 = 4$ yields
$$f_1(2) = 0 + \frac{1.386294 - 0}{4 - 1}(2 - 1) = 0.4620981$$
Thus, using the shorter interval reduces the percent relative error to $\varepsilon_t = 33.3\%$. Both interpolations are shown in Fig. 18.3, along with the true function.
FIGURE 18.3 Two linear interpolations to estimate ln 2. Note how the smaller interval provides a better estimate. (Plot of f(x) = ln x versus x, showing the true value and the two linear estimates $f_1(x)$.)
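The two computations of Example 18.1 can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `linear_interp` is a hypothetical helper, and `math.log` supplies the tabulated logarithms at full precision rather than the seven-digit values quoted in the example.

```python
import math


def linear_interp(x0, f0, x1, f1, x):
    """First-order interpolation between (x0, f0) and (x1, f1), Eq. (18.2)."""
    slope = (f1 - f0) / (x1 - x0)  # finite divided difference f[x1, x0]
    return f0 + slope * (x - x0)


true_value = math.log(2.0)

# Wide interval (1 to 6) and narrower interval (1 to 4), both evaluated at x = 2
wide = linear_interp(1.0, math.log(1.0), 6.0, math.log(6.0), 2.0)
narrow = linear_interp(1.0, math.log(1.0), 4.0, math.log(4.0), 2.0)

for label, estimate in (("1 to 6", wide), ("1 to 4", narrow)):
    percent_error = abs(true_value - estimate) / true_value * 100.0
    print(f"interval {label}: f1(2) = {estimate:.7f}, error = {percent_error:.1f}%")
```

The estimates agree with the 0.3583519 and 0.4620981 obtained in the example, and the percent relative errors round to 48.3% and 33.3%.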
  • 510. 18.1.2 Quadratic Interpolation
The error in Example 18.1 resulted from our approximating a curve with a straight line. Consequently, a strategy for improving the estimate is to introduce some curvature into the line connecting the points. If three data points are available, this can be accomplished with a second-order polynomial (also called a quadratic polynomial or a parabola). A particularly convenient form for this purpose is
$$f_2(x) = b_0 + b_1(x - x_0) + b_2(x - x_0)(x - x_1) \tag{18.3}$$
Note that although Eq. (18.3) might seem to differ from the general polynomial [Eq. (18.1)], the two equations are equivalent. This can be shown by multiplying the terms in Eq. (18.3) to yield
$$f_2(x) = b_0 + b_1 x - b_1 x_0 + b_2 x^2 + b_2 x_0 x_1 - b_2 x x_0 - b_2 x x_1$$
or, collecting terms,
$$f_2(x) = a_0 + a_1 x + a_2 x^2$$
where
$$a_0 = b_0 - b_1 x_0 + b_2 x_0 x_1$$
$$a_1 = b_1 - b_2 x_0 - b_2 x_1$$
$$a_2 = b_2$$
Thus, Eqs. (18.1) and (18.3) are alternative, equivalent formulations of the unique second-order polynomial joining the three points.
A simple procedure can be used to determine the values of the coefficients. For $b_0$, Eq. (18.3) with $x = x_0$ can be used to compute
$$b_0 = f(x_0) \tag{18.4}$$
Equation (18.4) can be substituted into Eq. (18.3), which can be evaluated at $x = x_1$ for
$$b_1 = \frac{f(x_1) - f(x_0)}{x_1 - x_0} \tag{18.5}$$
Finally, Eqs. (18.4) and (18.5) can be substituted into Eq. (18.3), which can be evaluated at $x = x_2$ and solved (after some algebraic manipulations) for
$$b_2 = \frac{\dfrac{f(x_2) - f(x_1)}{x_2 - x_1} - \dfrac{f(x_1) - f(x_0)}{x_1 - x_0}}{x_2 - x_0} \tag{18.6}$$
Notice that, as was the case with linear interpolation, $b_1$ still represents the slope of the line connecting points $x_0$ and $x_1$. Thus, the first two terms of Eq. (18.3) are equivalent to linear interpolation from $x_0$ to $x_1$, as specified previously in Eq. (18.2). The last term, $b_2(x - x_0)(x - x_1)$, introduces the second-order curvature into the formula.
Before illustrating how to use Eq. (18.3), we should examine the form of the coefficient $b_2$.
It is very similar to the finite-divided-difference approximation of the second derivative introduced previously in Eq. (4.24). Thus, Eq. (18.3) is beginning to manifest a structure that is very similar to the Taylor series expansion. This observation will be
  • 511. explored further when we relate Newton’s interpolating polynomials to the Taylor series in Sec. 18.1.4. But first, we will do an example that shows how Eq. (18.3) is used to interpolate among three points.
EXAMPLE 18.2 Quadratic Interpolation
Problem Statement. Fit a second-order polynomial to the three points used in Example 18.1:
x0 = 1    f(x0) = 0
x1 = 4    f(x1) = 1.386294
x2 = 6    f(x2) = 1.791759
Use the polynomial to evaluate ln 2.
Solution. Applying Eq. (18.4) yields
$$b_0 = 0$$
Equation (18.5) yields
$$b_1 = \frac{1.386294 - 0}{4 - 1} = 0.4620981$$
and Eq. (18.6) gives
$$b_2 = \frac{\dfrac{1.791759 - 1.386294}{6 - 4} - 0.4620981}{6 - 1} = -0.0518731$$
FIGURE 18.4 The use of quadratic interpolation to estimate ln 2. The linear interpolation from x = 1 to 4 is also included for comparison. (Plot of f(x) = ln x versus x, showing the true value, the linear estimate, and the quadratic estimate $f_2(x)$.)
  • 512. Substituting these values into Eq. (18.3) yields the quadratic formula
$$f_2(x) = 0 + 0.4620981(x - 1) - 0.0518731(x - 1)(x - 4)$$
which can be evaluated at x = 2 for
$$f_2(2) = 0.5658444$$
which represents a relative error of $\varepsilon_t = 18.4\%$. Thus, the curvature introduced by the quadratic formula (Fig. 18.4) improves the interpolation compared with the result obtained using straight lines in Example 18.1 and Fig. 18.3.
18.1.3 General Form of Newton’s Interpolating Polynomials
The preceding analysis can be generalized to fit an nth-order polynomial to n + 1 data points. The nth-order polynomial is
$$f_n(x) = b_0 + b_1(x - x_0) + \cdots + b_n(x - x_0)(x - x_1) \cdots (x - x_{n-1}) \tag{18.7}$$
As was done previously with the linear and quadratic interpolations, data points can be used to evaluate the coefficients $b_0, b_1, \ldots, b_n$. For an nth-order polynomial, n + 1 data points are required: $[x_0, f(x_0)], [x_1, f(x_1)], \ldots, [x_n, f(x_n)]$. We use these data points and the following equations to evaluate the coefficients:
$$b_0 = f(x_0) \tag{18.8}$$
$$b_1 = f[x_1, x_0] \tag{18.9}$$
$$b_2 = f[x_2, x_1, x_0] \tag{18.10}$$
$$\vdots$$
$$b_n = f[x_n, x_{n-1}, \ldots, x_1, x_0] \tag{18.11}$$
where the bracketed function evaluations are finite divided differences. For example, the first finite divided difference is represented generally as
$$f[x_i, x_j] = \frac{f(x_i) - f(x_j)}{x_i - x_j} \tag{18.12}$$
The second finite divided difference, which represents the difference of two first divided differences, is expressed generally as
$$f[x_i, x_j, x_k] = \frac{f[x_i, x_j] - f[x_j, x_k]}{x_i - x_k} \tag{18.13}$$
Similarly, the nth finite divided difference is
$$f[x_n, x_{n-1}, \ldots, x_1, x_0] = \frac{f[x_n, x_{n-1}, \ldots, x_1] - f[x_{n-1}, x_{n-2}, \ldots, x_0]}{x_n - x_0} \tag{18.14}$$
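As a minimal sketch of Eqs. (18.12) and (18.13), the first and second finite divided differences can be coded directly and used to rebuild the quadratic of Example 18.2. The helper names `first_dd`, `second_dd`, and `quadratic` are illustrative choices, not names from the text, and `math.log` is assumed as the source of the function values.

```python
import math


def first_dd(xi, fi, xj, fj):
    """First finite divided difference f[xi, xj], Eq. (18.12)."""
    return (fi - fj) / (xi - xj)


def second_dd(xi, fi, xj, fj, xk, fk):
    """Second finite divided difference f[xi, xj, xk], Eq. (18.13)."""
    return (first_dd(xi, fi, xj, fj) - first_dd(xj, fj, xk, fk)) / (xi - xk)


# Data of Example 18.2
x0, x1, x2 = 1.0, 4.0, 6.0
f0, f1, f2 = math.log(x0), math.log(x1), math.log(x2)

b0 = f0                                   # Eq. (18.8)
b1 = first_dd(x1, f1, x0, f0)             # Eq. (18.9)
b2 = second_dd(x2, f2, x1, f1, x0, f0)    # Eq. (18.10)


def quadratic(x):
    """Second-order Newton polynomial, Eq. (18.3)."""
    return b0 + b1 * (x - x0) + b2 * (x - x0) * (x - x1)


print(f"b1 = {b1:.7f}, b2 = {b2:.7f}, f2(2) = {quadratic(2.0):.7f}")
```

Up to rounding in the last digit, the coefficients match the 0.4620981 and -0.0518731 of Example 18.2, and the estimate matches $f_2(2) = 0.5658444$.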
  • 513. The divided differences of Eqs. (18.12) through (18.14) can be used to evaluate the coefficients in Eqs. (18.8) through (18.11), which can then be substituted into Eq. (18.7) to yield the interpolating polynomial
$$f_n(x) = f(x_0) + (x - x_0) f[x_1, x_0] + (x - x_0)(x - x_1) f[x_2, x_1, x_0] + \cdots + (x - x_0)(x - x_1) \cdots (x - x_{n-1}) f[x_n, x_{n-1}, \ldots, x_0] \tag{18.15}$$
which is called Newton’s divided-difference interpolating polynomial. It should be noted that it is not necessary that the data points used in Eq. (18.15) be equally spaced or that the abscissa values necessarily be in ascending order, as illustrated in the following example. Also, notice how Eqs. (18.12) through (18.14) are recursive; that is, higher-order differences are computed by taking differences of lower-order differences (Fig. 18.5). This property will be exploited when we develop an efficient computer program in Sec. 18.1.5 to implement the method.
FIGURE 18.5 Graphical depiction of the recursive nature of finite divided differences.
i   xi   f(xi)   First        Second           Third
0   x0   f(x0)   f[x1, x0]    f[x2, x1, x0]    f[x3, x2, x1, x0]
1   x1   f(x1)   f[x2, x1]    f[x3, x2, x1]
2   x2   f(x2)   f[x3, x2]
3   x3   f(x3)
EXAMPLE 18.3 Newton’s Divided-Difference Interpolating Polynomials
Problem Statement. In Example 18.2, data points at $x_0 = 1$, $x_1 = 4$, and $x_2 = 6$ were used to estimate ln 2 with a parabola. Now, adding a fourth point [$x_3 = 5$; $f(x_3) = 1.609438$], estimate ln 2 with a third-order Newton’s interpolating polynomial.
Solution. The third-order polynomial, Eq. (18.7) with n = 3, is
$$f_3(x) = b_0 + b_1(x - x_0) + b_2(x - x_0)(x - x_1) + b_3(x - x_0)(x - x_1)(x - x_2)$$
The first divided differences for the problem are [Eq. (18.12)]
$$f[x_1, x_0] = \frac{1.386294 - 0}{4 - 1} = 0.4620981$$
$$f[x_2, x_1] = \frac{1.791759 - 1.386294}{6 - 4} = 0.2027326$$
$$f[x_3, x_2] = \frac{1.609438 - 1.791759}{5 - 6} = 0.1823216$$
  • 514. The second divided differences are [Eq. (18.13)]
$$f[x_2, x_1, x_0] = \frac{0.2027326 - 0.4620981}{6 - 1} = -0.05187311$$
$$f[x_3, x_2, x_1] = \frac{0.1823216 - 0.2027326}{5 - 4} = -0.02041100$$
The third divided difference is [Eq. (18.14) with n = 3]
$$f[x_3, x_2, x_1, x_0] = \frac{-0.02041100 - (-0.05187311)}{5 - 1} = 0.007865529$$
The results for $f[x_1, x_0]$, $f[x_2, x_1, x_0]$, and $f[x_3, x_2, x_1, x_0]$ represent the coefficients $b_1$, $b_2$, and $b_3$, respectively, of Eq. (18.7). Along with $b_0 = f(x_0) = 0.0$, Eq. (18.7) is
$$f_3(x) = 0 + 0.4620981(x - 1) - 0.05187311(x - 1)(x - 4) + 0.007865529(x - 1)(x - 4)(x - 6)$$
which can be used to evaluate $f_3(2) = 0.6287686$, which represents a relative error of $\varepsilon_t = 9.3\%$. The complete cubic polynomial is shown in Fig. 18.6.
FIGURE 18.6 The use of cubic interpolation to estimate ln 2. (Plot of f(x) = ln x versus x, showing the true value and the cubic estimate $f_3(x)$.)
18.1.4 Errors of Newton’s Interpolating Polynomials
Notice that the structure of Eq. (18.15) is similar to the Taylor series expansion in the sense that terms are added sequentially to capture the higher-order behavior of the underlying function. These terms are finite divided differences and, thus, represent
  • 515. approximations of the higher-order derivatives. Consequently, as with the Taylor series, if the true underlying function is an nth-order polynomial, the nth-order interpolating polynomial based on n + 1 data points will yield exact results.
Also, as was the case with the Taylor series, a formulation for the truncation error can be obtained. Recall from Eq. (4.6) that the truncation error for the Taylor series could be expressed generally as
$$R_n = \frac{f^{(n+1)}(\xi)}{(n + 1)!}(x_{i+1} - x_i)^{n+1} \tag{4.6}$$
where $\xi$ is somewhere in the interval $x_i$ to $x_{i+1}$. For an nth-order interpolating polynomial, an analogous relationship for the error is
$$R_n = \frac{f^{(n+1)}(\xi)}{(n + 1)!}(x - x_0)(x - x_1) \cdots (x - x_n) \tag{18.16}$$
where $\xi$ is somewhere in the interval containing the unknown and the data. For this formula to be of use, the function in question must be known and differentiable. This is not usually the case. Fortunately, an alternative formulation is available that does not require prior knowledge of the function. Rather, it uses a finite divided difference to approximate the (n + 1)th derivative,
$$R_n = f[x, x_n, x_{n-1}, \ldots, x_0](x - x_0)(x - x_1) \cdots (x - x_n) \tag{18.17}$$
where $f[x, x_n, x_{n-1}, \ldots, x_0]$ is the (n + 1)th finite divided difference. Because Eq. (18.17) contains the unknown f(x), it cannot be solved for the error. However, if an additional data point $f(x_{n+1})$ is available, Eq. (18.17) can be used to estimate the error, as in
$$R_n \cong f[x_{n+1}, x_n, x_{n-1}, \ldots, x_0](x - x_0)(x - x_1) \cdots (x - x_n) \tag{18.18}$$
EXAMPLE 18.4 Error Estimation for Newton’s Polynomial
Problem Statement. Use Eq. (18.18) to estimate the error for the second-order polynomial interpolation of Example 18.2. Use the additional data point $f(x_3) = f(5) = 1.609438$ to obtain your results.
Solution. Recall that in Example 18.2, the second-order interpolating polynomial provided an estimate of $f_2(2) = 0.5658444$, which represents an error of 0.6931472 - 0.5658444 = 0.1273028. If we had not known the true value, as is most usually the case, Eq. (18.18), along with the additional value at $x_3$, could have been used to estimate the error, as in
$$R_2 = f[x_3, x_2, x_1, x_0](x - x_0)(x - x_1)(x - x_2)$$
or
$$R_2 = 0.007865529(x - 1)(x - 4)(x - 6)$$
where the value for the third-order finite divided difference is as computed previously in Example 18.3. This relationship can be evaluated at x = 2 for
$$R_2 = 0.007865529(2 - 1)(2 - 4)(2 - 6) = 0.0629242$$
which is of the same order of magnitude as the true error.
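The error estimate of Example 18.4 can be checked with a short recursive sketch of Eq. (18.14). The function `divided_difference` below is a hypothetical helper that recomputes lower-order differences on every call, so it is meant only for illustration on a handful of points, not as an efficient algorithm; `math.log` is assumed for the data values.

```python
import math


def divided_difference(xs, fs):
    """Return f[x_n, ..., x_0] for points (xs[k], fs[k]) using the recursion of Eq. (18.14)."""
    if len(xs) == 1:
        return fs[0]
    upper = divided_difference(xs[1:], fs[1:])    # f[x_n, ..., x_1]
    lower = divided_difference(xs[:-1], fs[:-1])  # f[x_{n-1}, ..., x_0]
    return (upper - lower) / (xs[-1] - xs[0])


xs = [1.0, 4.0, 6.0, 5.0]            # x0, x1, x2 plus the extra point x3
fs = [math.log(v) for v in xs]
x = 2.0

# Coefficients b0..b3 of Eq. (18.7)
b = [divided_difference(xs[: k + 1], fs[: k + 1]) for k in range(4)]

# Second-order prediction and its error estimate, Eq. (18.18) with n = 2
f2 = b[0] + b[1] * (x - xs[0]) + b[2] * (x - xs[0]) * (x - xs[1])
r2 = b[3] * (x - xs[0]) * (x - xs[1]) * (x - xs[2])
true_error = math.log(2.0) - f2

print(f"f2(2) = {f2:.7f}, estimated error = {r2:.7f}, true error = {true_error:.7f}")
```

The estimated error agrees with the 0.0629242 of the example, compared with a true error of about 0.1273, and adding it to $f_2(2)$ gives the third-order value of Example 18.3.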
  • 516. From the previous example and from Eq. (18.18), it should be clear that the error estimate for the nth-order polynomial is equivalent to the difference between the (n + 1)th-order and the nth-order prediction. That is,
$$R_n = f_{n+1}(x) - f_n(x) \tag{18.19}$$
In other words, the increment that is added to the nth-order case to create the (n + 1)th-order case [that is, Eq. (18.18)] is interpreted as an estimate of the nth-order error. This can be clearly seen by rearranging Eq. (18.19) to give
$$f_{n+1}(x) = f_n(x) + R_n$$
The validity of this approach is predicated on the fact that the series is strongly convergent. For such a situation, the (n + 1)th-order prediction should be much closer to the true value than the nth-order prediction. Consequently, Eq. (18.19) conforms to our standard definition of error as representing the difference between the truth and an approximation. However, note that whereas all other error estimates for iterative approaches introduced up to this point have been determined as a present prediction minus a previous one, Eq. (18.19) represents a future prediction minus a present one. This means that for a series that is converging rapidly, the error estimate of Eq. (18.19) could be less than the true error. This would represent a highly unattractive quality if the error estimate were being employed as a stopping criterion. However, as will be described in the following section, higher-order interpolating polynomials are highly sensitive to data errors; that is, they are very ill-conditioned. When employed for interpolation, they often yield predictions that diverge significantly from the true value. By “looking ahead” to sense errors, Eq. (18.19) is more sensitive to such divergence. As such, it is more valuable for the sort of exploratory data analysis for which Newton’s polynomial is best-suited.
18.1.5 Computer Algorithm for Newton’s Interpolating Polynomial
Three properties make Newton’s interpolating polynomials extremely attractive for computer applications:
1. As in Eq. (18.7), higher-order versions can be developed sequentially by adding a single term to the next lower-order equation. This facilitates the evaluation of several different-order versions in the same program. Such a capability is especially valuable when the order of the polynomial is not known a priori. By adding new terms sequentially, we can determine when a point of diminishing returns is reached, that is, when addition of higher-order terms no longer significantly improves the estimate or in certain situations actually detracts from it. The error equations discussed below in (3) are useful in devising an objective criterion for identifying this point of diminishing returns.
2. The finite divided differences that constitute the coefficients of the polynomial [Eqs. (18.8) through (18.11)] can be computed efficiently. That is, as in Eq. (18.14) and Fig. 18.5, lower-order differences are used to compute higher-order differences. By utilizing this previously determined information, the coefficients can be computed efficiently. The algorithm in Fig. 18.7 contains such a scheme.
3. The error estimate [Eq. (18.18)] can be very simply incorporated into a computer algorithm because of the sequential way in which the prediction is built.
  • 517. All the above characteristics can be exploited and incorporated into a general algorithm for implementing Newton’s polynomial (Fig. 18.7). Note that the algorithm consists of two parts: The first determines the coefficients from Eq. (18.7), and the second determines the predictions and their associated error. The utility of this algorithm is demonstrated in the following example.
EXAMPLE 18.5 Error Estimates to Determine the Appropriate Order of Interpolation
Problem Statement. After incorporating the error [Eq. (18.18)], utilize the computer algorithm given in Fig. 18.7 and the following information to evaluate f(x) = ln x at x = 2:
x      f(x) = ln x
1      0
4      1.3862944
6      1.7917595
5      1.6094379
3      1.0986123
1.5    0.4054641
2.5    0.9162907
3.5    1.2527630
FIGURE 18.7 An algorithm for Newton’s interpolating polynomial written in pseudocode.
SUBROUTINE NewtInt (x, y, n, xi, yint, ea)
  LOCAL fdd(n, n)
  DOFOR i = 0, n
    fdd(i, 0) = y(i)
  END DO
  DOFOR j = 1, n
    DOFOR i = 0, n - j
      fdd(i, j) = (fdd(i+1, j-1) - fdd(i, j-1)) / (x(i+j) - x(i))
    END DO
  END DO
  xterm = 1
  yint(0) = fdd(0, 0)
  DOFOR order = 1, n
    xterm = xterm * (xi - x(order-1))
    yint2 = yint(order-1) + fdd(0, order) * xterm
    ea(order-1) = yint2 - yint(order-1)
    yint(order) = yint2
  END DO
END NewtInt
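The pseudocode of Fig. 18.7 translates almost line for line into Python. The version below is a sketch under a few assumptions: the name `newt_int` and the choice to return the predictions and error estimates as lists (instead of filling output arguments) are mine, and the data are generated with `math.log` rather than typed in from the table.

```python
import math


def newt_int(x, y, xi):
    """Newton divided-difference interpolation at xi, following Fig. 18.7.

    Returns (yint, ea): yint[k] is the kth-order prediction and
    ea[k] is the error estimate of Eq. (18.18) for that prediction.
    """
    n = len(x) - 1
    # fdd[i][j] holds the jth divided difference that starts at point i
    fdd = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        fdd[i][0] = y[i]
    for j in range(1, n + 1):
        for i in range(n - j + 1):
            fdd[i][j] = (fdd[i + 1][j - 1] - fdd[i][j - 1]) / (x[i + j] - x[i])

    xterm = 1.0
    yint = [fdd[0][0]]
    ea = []
    for order in range(1, n + 1):
        xterm *= xi - x[order - 1]
        increment = fdd[0][order] * xterm   # next term of Eq. (18.15)
        ea.append(increment)                # error estimate for the previous order
        yint.append(yint[-1] + increment)
    return yint, ea


# Data of Example 18.5, in the order given in the problem statement
xs = [1.0, 4.0, 6.0, 5.0, 3.0, 1.5, 2.5, 3.5]
ys = [math.log(v) for v in xs]
yint, ea = newt_int(xs, ys, 2.0)

for order, value in enumerate(yint):
    estimate = f"{ea[order]:10.6f}" if order < len(ea) else ""
    print(f"{order:5d} {value:10.6f} {estimate}")
```

Storing the full two-dimensional table mirrors the pseudocode. Only the first row is needed for the coefficients, so a one-dimensional array updated in place would save memory at the cost of a less direct correspondence with Fig. 18.5.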
  • 518. Solution. The results of employing the algorithm in Fig. 18.7 to obtain a solution are shown in Fig. 18.8. The error estimates, along with the true error (based on the fact that ln 2 = 0.6931472), are depicted in Fig. 18.9. Note that the estimated error and the true error are similar and that their agreement improves as the order increases. From these results, it can be concluded that the fifth-order version yields a good estimate and that higher-order terms do not significantly enhance the prediction.
This exercise also illustrates the importance of the positioning and ordering of the points. For example, up through the third-order estimate, the rate of improvement is slow because the points that are added (at x = 4, 6, and 5) are distant and on one side of the point in question at x = 2. The fourth-order estimate shows a somewhat greater improvement because the new point at x = 3 is closer to the unknown. However, the most dramatic decrease in the error is associated with the inclusion of the fifth-order term using the data point at x = 1.5. Not only is this point close to the unknown but it is also positioned on the opposite side from most of the other points. As a consequence, the error is reduced by almost an order of magnitude.
The significance of the position and sequence of these data can also be illustrated by using the same data to obtain an estimate for ln 2 but considering the points in a different sequence. Figure 18.9 shows results for the case of reversing the order of the original data, that is, $x_0 = 3.5$, $x_1 = 2.5$, $x_2 = 1.5$, and so forth. Because the initial points for this case are closer to and spaced on either side of the point x = 2, the error decreases much more rapidly than for the original situation. By the second-order term, the error has been reduced to less than $\varepsilon_t = 2\%$. Other combinations could be employed to obtain different rates of convergence.
FIGURE 18.8 The output of a program, based on the algorithm from Fig. 18.7, to evaluate ln 2.
NUMBER OF POINTS? 8
X( 0 ), y( 0 ) = ? 1,0
X( 1 ), y( 1 ) = ? 4,1.3862944
X( 2 ), y( 2 ) = ? 6,1.7917595
X( 3 ), y( 3 ) = ? 5,1.6094379
X( 4 ), y( 4 ) = ? 3,1.0986123
X( 5 ), y( 5 ) = ? 1.5,0.40546411
X( 6 ), y( 6 ) = ? 2.5,0.91629073
X( 7 ), y( 7 ) = ? 3.5,1.2527630
INTERPOLATION AT X = 2
ORDER   F(X)        ERROR
0       0.000000    0.462098
1       0.462098    0.103746
2       0.565844    0.062924
3       0.628769    0.046953
4       0.675722    0.021792
5       0.697514    -0.003616
6       0.693898    -0.000459
7       0.693439
  • 519. The foregoing example illustrates the importance of the choice of base points. As should be intuitively obvious, the points should be centered around and as close as possible to the unknown. This observation is also supported by direct examination of the error equation [Eq. (18.17)]. If we assume that the finite divided difference does not vary markedly along the range of these data, the error is proportional to the product: $(x - x_0)(x - x_1) \cdots (x - x_n)$. Obviously, the closer the base points are to x, the smaller the magnitude of this product.
FIGURE 18.9 Percent relative errors for the prediction of ln 2 as a function of the order of the interpolating polynomial. (Curves: true error (original), estimated error (original), and estimated error (reversed), plotted against order.)
18.2 LAGRANGE INTERPOLATING POLYNOMIALS
The Lagrange interpolating polynomial is simply a reformulation of the Newton polynomial that avoids the computation of divided differences. It can be represented concisely as
$$f_n(x) = \sum_{i=0}^{n} L_i(x) f(x_i) \tag{18.20}$$
  • 520. where
$$L_i(x) = \prod_{\substack{j=0 \\ j \ne i}}^{n} \frac{x - x_j}{x_i - x_j} \tag{18.21}$$
where $\Pi$ designates the “product of.” For example, the linear version (n = 1) is
$$f_1(x) = \frac{x - x_1}{x_0 - x_1} f(x_0) + \frac{x - x_0}{x_1 - x_0} f(x_1) \tag{18.22}$$
and the second-order version is
$$f_2(x) = \frac{(x - x_1)(x - x_2)}{(x_0 - x_1)(x_0 - x_2)} f(x_0) + \frac{(x - x_0)(x - x_2)}{(x_1 - x_0)(x_1 - x_2)} f(x_1) + \frac{(x - x_0)(x - x_1)}{(x_2 - x_0)(x_2 - x_1)} f(x_2) \tag{18.23}$$
Equation (18.20) can be derived directly from Newton’s polynomial (Box 18.1). However, the rationale underlying the Lagrange formulation can be grasped directly by realizing that each term $L_i(x)$ will be 1 at $x = x_i$ and 0 at all other sample points (Fig. 18.10). Thus, each product $L_i(x) f(x_i)$ takes on the value of $f(x_i)$ at the sample point $x_i$. Consequently, the summation of all the products designated by Eq. (18.20) is the unique nth-order polynomial that passes exactly through all n + 1 data points.
EXAMPLE 18.6 Lagrange Interpolating Polynomials
Problem Statement. Use a Lagrange interpolating polynomial of the first and second order to evaluate ln 2 on the basis of the data given in Example 18.2:
x0 = 1    f(x0) = 0
x1 = 4    f(x1) = 1.386294
x2 = 6    f(x2) = 1.791760
Solution. The first-order polynomial [Eq. (18.22)] can be used to obtain the estimate at x = 2,
$$f_1(2) = \frac{2 - 4}{1 - 4}(0) + \frac{2 - 1}{4 - 1}(1.386294) = 0.4620981$$
In a similar fashion, the second-order polynomial is developed as [Eq. (18.23)]
$$f_2(2) = \frac{(2 - 4)(2 - 6)}{(1 - 4)(1 - 6)}(0) + \frac{(2 - 1)(2 - 6)}{(4 - 1)(4 - 6)}(1.386294) + \frac{(2 - 1)(2 - 4)}{(6 - 1)(6 - 4)}(1.791760) = 0.5658444$$
As expected, both these results agree with those previously obtained using Newton’s interpolating polynomial.
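Equations (18.20) and (18.21) map onto a pair of nested loops. The following is a minimal Python sketch; the function name `lagrange` and the use of `math.log` for the data are assumptions made for illustration. It also checks the property that each $L_i(x)$ is 1 at its own point and 0 at the others, which shows up as the polynomial reproducing the data exactly.

```python
import math


def lagrange(x, y, xx):
    """Evaluate the Lagrange interpolating polynomial through (x[i], y[i]) at xx."""
    total = 0.0
    for i in range(len(x)):
        product = y[i]                  # f(x_i) times the weight L_i(xx)
        for j in range(len(x)):
            if j != i:
                product *= (xx - x[j]) / (x[i] - x[j])
        total += product
    return total


# Data of Example 18.6
xs = [1.0, 4.0, 6.0]
ys = [math.log(v) for v in xs]

first_order = lagrange(xs[:2], ys[:2], 2.0)   # Eq. (18.22)
second_order = lagrange(xs, ys, 2.0)          # Eq. (18.23)
print(f"f1(2) = {first_order:.7f}, f2(2) = {second_order:.7f}")

# The polynomial passes through every data point
for xk, yk in zip(xs, ys):
    assert abs(lagrange(xs, ys, xk) - yk) < 1e-12
```

Both estimates agree with the 0.4620981 and 0.5658444 of the example, and therefore with the Newton results.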
  • 521. Box 18.1 Derivation of the Lagrange Form Directly from Newton’s Interpolating Polynomial
The Lagrange interpolating polynomial can be derived directly from Newton’s formulation. We will do this for the first-order case only [Eq. (18.2)]. To derive the Lagrange form, we reformulate the divided differences. For example, the first divided difference,
$$f[x_1, x_0] = \frac{f(x_1) - f(x_0)}{x_1 - x_0} \tag{B18.1.1}$$
can be reformulated as
$$f[x_1, x_0] = \frac{f(x_1)}{x_1 - x_0} + \frac{f(x_0)}{x_0 - x_1} \tag{B18.1.2}$$
which is referred to as the symmetric form. Substituting Eq. (B18.1.2) into Eq. (18.2) yields
$$f_1(x) = f(x_0) + \frac{x - x_0}{x_1 - x_0} f(x_1) + \frac{x - x_0}{x_0 - x_1} f(x_0)$$
Finally, grouping similar terms and simplifying yields the Lagrange form,
$$f_1(x) = \frac{x - x_1}{x_0 - x_1} f(x_0) + \frac{x - x_0}{x_1 - x_0} f(x_1)$$
FIGURE 18.10 A visual depiction of the rationale behind the Lagrange polynomial. This figure shows a second-order case. Each of the three terms in Eq. (18.23) passes through one of the data points and is zero at the other two. The summation of the three terms must, therefore, be the unique second-order polynomial $f_2(x)$ that passes exactly through the three points. (Curves: first term, second term, third term, and the summation of the three terms, $f_2(x)$.)
  • 522. Note that, as with Newton’s method, the Lagrange version has an estimated error of [Eq. (18.17)]
$$R_n = f[x, x_n, x_{n-1}, \ldots, x_0] \prod_{i=0}^{n} (x - x_i)$$
Thus, if an additional point is available at $x = x_{n+1}$, an error estimate can be obtained. However, because the finite divided differences are not employed as part of the Lagrange algorithm, this is rarely done.
Equations (18.20) and (18.21) can be very simply programmed for implementation on a computer. Figure 18.11 shows pseudocode that can be employed for this purpose.
FIGURE 18.11 Pseudocode to implement Lagrange interpolation. This algorithm is set up to compute a single nth-order prediction, where n + 1 is the number of data points.
FUNCTION Lagrng(x, y, n, xx)
  sum = 0
  DOFOR i = 0, n
    product = y(i)
    DOFOR j = 0, n
      IF i ≠ j THEN
        product = product * (xx - x(j)) / (x(i) - x(j))
      ENDIF
    END DO
    sum = sum + product
  END DO
  Lagrng = sum
END Lagrng
In summary, for cases where the order of the polynomial is unknown, the Newton method has advantages because of the insight it provides into the behavior of the different-order formulas. In addition, the error estimate represented by Eq. (18.18) can usually be integrated easily into the Newton computation because the estimate employs a finite difference (Example 18.5). Thus, for exploratory computations, Newton’s method is often preferable.
When only one interpolation is to be performed, the Lagrange and Newton formulations require comparable computational effort. However, the Lagrange version is somewhat easier to program. Because it does not require computation and storage of divided differences, the Lagrange form is often used when the order of the polynomial is known a priori.
EXAMPLE 18.7 Lagrange Interpolation Using the Computer
Problem Statement. We can use the algorithm from Fig. 18.11 to study a trend analysis problem associated with our now-familiar falling parachutist. Assume that we have
  • 523. 506 INTERPOLATION developed instrumentation to measure the velocity of the parachutist. The measured data obtained for a particular test case are Time, Measured Velocity v, s cm/s 1 800 3 2310 5 3090 7 3940 13 4755 Our problem is to estimate the velocity of the parachutist at t 5 10s to fill in the large gap in the measurements between t 5 7 and t 5 13s. We are aware that the behavior of interpolating polynomials can be unexpected. Therefore, we will construct polynomials of orders 4, 3, 2, and 1 and compare the results. Solution. The Lagrange algorithm can be used to construct fourth-, third-, second-, and first-order interpolating polynomials. The fourth-order polynomial and the input data can be plotted as shown in Fig. 18.12a. It is evident from this plot that the estimated value of y at x 5 10 is higher than the overall trend of these data. Figure 18.12b through d shows plots of the results of the computations for third-, second-, and first-order interpolating polynomials, respectively. It is noted that the lower the order, the lower the estimated value of the velocity at t 5 10s. The plots of the in- terpolating polynomials indicate that the higher-order polynomials tend to overshoot the trend of these data. This suggests that the first- or second-order versions are most ap- propriate for this particular trend analysis. It should be remembered, however, that be- cause we are dealing with uncertain data, regression would actually be more appropriate. v, cm/s v, cm/s 0 0 3000 6000 5 10 15 0 0 3000 6000 5 10 t(s) 15 0 0 3000 6000 5 10 15 0 0 3000 6000 5 10 t(s) 15 (a) (b) (c) (d) FIGURE 18.12 Plots showing (a) fourth-order, (b) third-order, (c) second-order, and (d) first-order interpolations.
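The pseudocode of Fig. 18.11 and the comparison in Example 18.7 can be sketched in Python as follows. This is a minimal illustration, not the authors’ program: the function name `lagrange_interpolate` is hypothetical, and the choice of which data points feed each lower-order fit (the points nearest t = 10 s, that is, the last k + 1 entries of the table) is an assumption, since the example does not state it.

```python
def lagrange_interpolate(x, y, xx):
    """Evaluate at xx the Lagrange polynomial through the points (x[i], y[i])."""
    total = 0.0
    for i in range(len(x)):
        product = y[i]
        for j in range(len(x)):
            if i != j:
                # Factor that is 1 at x[i] and 0 at x[j]
                product *= (xx - x[j]) / (x[i] - x[j])
        total += product
    return total


# Data from Example 18.7: time (s) and measured velocity (cm/s)
t = [1.0, 3.0, 5.0, 7.0, 13.0]
v = [800.0, 2310.0, 3090.0, 3940.0, 4755.0]

# Assumption: an order-k fit uses the k + 1 points closest to t = 10 s,
# which here are the last k + 1 entries of the table.
estimates = {}
for order in (4, 3, 2, 1):
    ts, vs = t[-(order + 1):], v[-(order + 1):]
    estimates[order] = lagrange_interpolate(ts, vs, 10.0)

for order, value in estimates.items():
    print(f"order {order}: v(10) is about {value:.1f} cm/s")
```

Under that point-selection assumption, the estimates come out at roughly 5430, 4875, 4673, and 4348 cm/s for orders 4 through 1, which is consistent with the observation that the lower the order, the lower the estimated velocity.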
  • 524. The preceding example illustrates that higher-order polynomials tend to be ill-conditioned; that is, they tend to be highly sensitive to round-off error. The same problem applies to higher-order polynomial regression. Double-precision arithmetic sometimes helps mitigate the problem. However, as the order increases, there will come a point at which round-off error will interfere with the ability to interpolate using the simple approaches covered to this point.

18.3 COEFFICIENTS OF AN INTERPOLATING POLYNOMIAL

Although both the Newton and the Lagrange polynomials are well-suited for determining intermediate values between points, they do not provide a convenient polynomial of the conventional form

$$f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n \tag{18.24}$$

A straightforward method for computing the coefficients of this polynomial is based on the fact that n + 1 data points are required to determine the n + 1 coefficients. Thus, simultaneous linear algebraic equations can be used to calculate the a’s. For example, suppose that you desired to compute the coefficients of the parabola

$$f(x) = a_0 + a_1 x + a_2 x^2 \tag{18.25}$$

Three data points are required: [x0, f(x0)], [x1, f(x1)], and [x2, f(x2)]. Each can be substituted into Eq. (18.25) to give

$$\begin{aligned} f(x_0) &= a_0 + a_1 x_0 + a_2 x_0^2 \\ f(x_1) &= a_0 + a_1 x_1 + a_2 x_1^2 \\ f(x_2) &= a_0 + a_1 x_2 + a_2 x_2^2 \end{aligned} \tag{18.26}$$

Thus, for this case, the x’s are the knowns and the a’s are the unknowns. Because there are the same number of equations as unknowns, Eq. (18.26) could be solved by an elimination method from Part Three.

It should be noted that the foregoing approach is not the most efficient method that is available to determine the coefficients of an interpolating polynomial. Press et al. (2007) provide a discussion and computer codes for more efficient approaches. Whatever technique is employed, a word of caution is in order. Systems such as Eq. (18.26) are notoriously ill-conditioned.
Whether they are solved with an elimination method or with a more efficient algorithm, the resulting coefficients can be highly inaccurate, particularly for large n. When used for a subsequent interpolation, they often yield erroneous results.

In summary, if you are interested in determining an intermediate point, employ Newton or Lagrange interpolation. If you must determine an equation of the form of Eq. (18.24), limit yourself to lower-order polynomials and check your results carefully.

18.4 INVERSE INTERPOLATION

As the nomenclature implies, the f(x) and x values in most interpolation contexts are the dependent and independent variables, respectively. As a consequence, the values of the x’s are typically uniformly spaced. A simple example is a table of values derived for the
  • 525. function f(x) = 1/x,

x      1    2     3        4      5     6        7
f(x)   1    0.5   0.3333   0.25   0.2   0.1667   0.1429

Now suppose that you must use the same data, but you are given a value for f(x) and must determine the corresponding value of x. For instance, for the data above, suppose that you were asked to determine the value of x that corresponded to f(x) = 0.3. For this case, because the function is available and easy to manipulate, the correct answer can be determined directly as x = 1/0.3 = 3.3333.

Such a problem is called inverse interpolation. For a more complicated case, you might be tempted to switch the f(x) and x values [that is, merely plot x versus f(x)] and use an approach like Lagrange interpolation to determine the result. Unfortunately, when you reverse the variables, there is no guarantee that the values along the new abscissa [the f(x)’s] will be evenly spaced. In fact, in many cases, the values will be “telescoped.” That is, they will have the appearance of a logarithmic scale with some adjacent points bunched together and others spread out widely. For example, for f(x) = 1/x the result is

f(x)   0.1429   0.1667   0.2   0.25   0.3333   0.5   1
x      7        6        5     4      3        2     1

Such nonuniform spacing on the abscissa often leads to oscillations in the resulting interpolating polynomial. This can occur even for lower-order polynomials.

An alternative strategy is to fit an nth-order interpolating polynomial, fn(x), to the original data [that is, with f(x) versus x]. In most cases, because the x’s are evenly spaced, this polynomial will not be ill-conditioned. The answer to your problem then amounts to finding the value of x that makes this polynomial equal to the given f(x). Thus, the interpolation problem reduces to a roots problem!

For example, for the problem outlined above, a simple approach would be to fit a quadratic polynomial to the three points: (2, 0.5), (3, 0.3333), and (4, 0.25).
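The strategy just described (fit a polynomial to the original data, then solve for the x that yields the target f(x)) can be sketched in Python. This is only an illustration under stated assumptions: the helper names `solve_linear` and `poly_coefficients` are hypothetical, the coefficients are obtained by solving a system of the form of Eq. (18.26) with a simple Gaussian elimination, and the function values are taken as exactly 1/x rather than the rounded table entries. Because such systems are ill-conditioned, the approach is reasonable only for low orders such as this quadratic.

```python
import math


def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(m[r][k]))
        m[k], m[pivot] = m[pivot], m[k]
        for r in range(k + 1, n):
            factor = m[r][k] / m[k][k]
            for c in range(k, n + 1):
                m[r][c] -= factor * m[k][c]
    # Back substitution
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        s = sum(m[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (m[k][n] - s) / m[k][k]
    return x


def poly_coefficients(xs, fs):
    """Return [a0, a1, ..., an] of the polynomial through the points (xs, fs)."""
    n = len(xs)
    a = [[xi ** p for p in range(n)] for xi in xs]  # rows of Eq. (18.26)
    return solve_linear(a, fs)


# Three points bracketing the target, with f(x) = 1/x
xs = [2.0, 3.0, 4.0]
fs = [1.0 / x for x in xs]
a0, a1, a2 = poly_coefficients(xs, fs)

# Inverse interpolation: solve a2*x^2 + a1*x + (a0 - target) = 0
target = 0.3
disc = a1 ** 2 - 4.0 * a2 * (a0 - target)
roots = sorted((-a1 + sign * math.sqrt(disc)) / (2.0 * a2) for sign in (-1.0, 1.0))
print(a0, a1, a2, roots)
```

Of the two roots, only the one lying within the range of the fitted points (x between 2 and 4) is meaningful as an interpolated value.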
The result would be

$$f_2(x) = 1.08333 - 0.375x + 0.041667x^2$$

The answer to the inverse interpolation problem of finding the x corresponding to f(x) = 0.3 would therefore involve determining the root of

$$0.3 = 1.08333 - 0.375x + 0.041667x^2$$

For this simple case, the quadratic formula can be used to calculate

$$x = \frac{0.375 \pm \sqrt{(-0.375)^2 - 4(0.041667)(0.78333)}}{2(0.041667)} = 5.704158 \ \text{or} \ 3.295842$$

Thus, the second root, 3.296, is a good approximation of the true value of 3.333. If additional accuracy were desired, a third- or fourth-order polynomial along with one of the root location methods from Part Two could be employed.

18.5 ADDITIONAL COMMENTS

Before proceeding to the next section, we must mention two additional topics: interpolation with equally spaced data and extrapolation.
  • 526. Because both the Newton and Lagrange polynomials are compatible with arbitrarily spaced data, you might wonder why we address the special case of equally spaced data (Box 18.2). Prior to the advent of digital computers, these techniques had great utility for interpolation from tables with equally spaced arguments. In fact, a computational framework known as a divided-difference table was developed to facilitate the implementation of these techniques. (Figure 18.5 is an example of such a table.)

However, because the formulas are subsets of the computer-compatible Newton and Lagrange schemes and because many tabular functions are available as library subroutines, the need for the equispaced versions has waned. In spite of this, we have included them at this point because of their relevance to later parts of this book. In particular, they are needed to derive numerical integration formulas that typically employ equispaced data (Chap. 21). Because the numerical integration formulas have relevance to the solution of ordinary differential equations, the material in Box 18.2 also has significance to Part Seven.

Extrapolation is the process of estimating a value of f(x) that lies outside the range of the known base points, x0, x1, . . . , xn (Fig. 18.13). In a previous section, we mentioned that the most accurate interpolation is usually obtained when the unknown lies near the center of the base points. Obviously, this is violated when the unknown lies outside the range, and consequently, the error in extrapolation can be very large. As depicted in Fig. 18.13, the open-ended nature of extrapolation represents a step into the unknown because the process extends the curve beyond the known region. As such, the true curve could easily diverge from the prediction. Extreme care should, therefore, be exercised whenever a case arises where one must extrapolate.
FIGURE 18.13 Illustration of the possible divergence of an extrapolated prediction. The extrapolation is based on fitting a parabola through the first three known points. [Figure 18.13: plot of f(x) versus x showing the true curve and the extrapolation of the interpolating polynomial, with the base points x0, x1, and x2 and the interpolation and extrapolation regions marked.]
  • 527. Box 18.2 Interpolation with Equally Spaced Data

If data are equally spaced and in ascending order, then the independent variable assumes values of

$$x_1 = x_0 + h, \quad x_2 = x_0 + 2h, \quad \ldots, \quad x_n = x_0 + nh$$

where h is the interval, or step size, between these data. On this basis, the finite divided differences can be expressed in concise form. For example, the second forward divided difference is

$$f[x_0, x_1, x_2] = \frac{\dfrac{f(x_2) - f(x_1)}{x_2 - x_1} - \dfrac{f(x_1) - f(x_0)}{x_1 - x_0}}{x_2 - x_0}$$

which can be expressed as

$$f[x_0, x_1, x_2] = \frac{f(x_2) - 2f(x_1) + f(x_0)}{2h^2} \tag{B18.2.1}$$

because $x_1 - x_0 = x_2 - x_1 = (x_2 - x_0)/2 = h$. Now recall that the second forward difference is equal to [numerator of Eq. (4.24)]

$$\Delta^2 f(x_0) = f(x_2) - 2f(x_1) + f(x_0)$$

Therefore, Eq. (B18.2.1) can be represented as

$$f[x_0, x_1, x_2] = \frac{\Delta^2 f(x_0)}{2!h^2}$$

or, in general,

$$f[x_0, x_1, \ldots, x_n] = \frac{\Delta^n f(x_0)}{n!h^n} \tag{B18.2.2}$$

Using Eq. (B18.2.2), we can express Newton’s interpolating polynomial [Eq. (18.15)] for the case of equispaced data as

$$f_n(x) = f(x_0) + \frac{\Delta f(x_0)}{h}(x - x_0) + \frac{\Delta^2 f(x_0)}{2!h^2}(x - x_0)(x - x_0 - h) + \cdots + \frac{\Delta^n f(x_0)}{n!h^n}(x - x_0)(x - x_0 - h) \cdots [x - x_0 - (n - 1)h] + R_n \tag{B18.2.3}$$

where the remainder is the same as Eq. (18.16). This equation is known as Newton’s formula, or the Newton-Gregory forward formula. It can be simplified further by defining a new quantity, α:

$$\alpha = \frac{x - x_0}{h}$$

This definition can be used to develop the following simplified expressions for the terms in Eq. (B18.2.3):

$$\begin{aligned} x - x_0 &= \alpha h \\ x - x_0 - h &= \alpha h - h = h(\alpha - 1) \\ &\ \ \vdots \\ x - x_0 - (n - 1)h &= \alpha h - (n - 1)h = h(\alpha - n + 1) \end{aligned}$$

which can be substituted into Eq. (B18.2.3) to give

$$f_n(x) = f(x_0) + \Delta f(x_0)\,\alpha + \frac{\Delta^2 f(x_0)}{2!}\,\alpha(\alpha - 1) + \cdots + \frac{\Delta^n f(x_0)}{n!}\,\alpha(\alpha - 1) \cdots (\alpha - n + 1) + R_n \tag{B18.2.4}$$

where

$$R_n = \frac{f^{(n+1)}(\xi)}{(n + 1)!}\, h^{n+1}\, \alpha(\alpha - 1)(\alpha - 2) \cdots (\alpha - n)$$

This concise notation will have utility in our derivation and error analyses of the integration formulas in Chap. 21.

In addition to the forward formula, backward and central Newton-Gregory formulas are also available.
Carnahan, Luther, and Wilkes (1969) can be consulted for further information regarding interpolation for equally spaced data.
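As a rough illustration of Eq. (B18.2.4), the Newton-Gregory forward formula can be evaluated from a list of equally spaced samples as sketched below. The function names `forward_differences` and `newton_gregory_forward` are hypothetical, the remainder term R_n is omitted, and the sample data (f(x) = 1/x at x = 2, 3, 4) are chosen here only for demonstration.

```python
def forward_differences(f):
    """Return [f(x0), Δf(x0), Δ²f(x0), ...] for equally spaced samples f."""
    diffs = [f[0]]
    column = list(f)
    while len(column) > 1:
        # Each pass produces the next column of the forward-difference table
        column = [b - a for a, b in zip(column, column[1:])]
        diffs.append(column[0])
    return diffs


def newton_gregory_forward(x0, h, f, x):
    """Evaluate Eq. (B18.2.4) without the remainder term R_n."""
    alpha = (x - x0) / h
    total = 0.0
    term = 1.0  # holds alpha*(alpha - 1)*...*(alpha - k + 1) / k!
    for k, d in enumerate(forward_differences(f)):
        total += term * d
        term *= (alpha - k) / (k + 1)
    return total


# Demonstration data (an assumption for this sketch): f(x) = 1/x at x = 2, 3, 4
samples = [1.0 / 2.0, 1.0 / 3.0, 1.0 / 4.0]
estimate = newton_gregory_forward(2.0, 1.0, samples, 2.5)
print(estimate)
```

Because the interpolating polynomial through a given set of points is unique, this should return the same value as any other form of the quadratic through these three points; only the bookkeeping differs.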
  • 528. 18.6 SPLINE INTERPOLATION

In the previous sections, nth-order polynomials were used to interpolate between n + 1 data points. For example, for eight points, we can derive a perfect seventh-order polynomial. This curve would capture all the meanderings (at least up to and including seventh derivatives) suggested by the points. However, there are cases where these functions can lead to erroneous results because of round-off error and overshoot. An alternative approach is to apply lower-order polynomials to subsets of data points. Such connecting polynomials are called spline functions.

For example, third-order curves employed to connect each pair of data points are called cubic splines. These functions can be constructed so that the connections between adjacent cubic equations are visually smooth. On the surface, it would seem that the third-order approximation of the splines would be inferior to the seventh-order expression. You might wonder why a spline would ever be preferable.

Figure 18.14 illustrates a situation where a spline performs better than a higher-order polynomial. This is the case where a function is generally smooth but undergoes an abrupt change somewhere along the region of interest. The step increase depicted in Fig. 18.14 is an extreme example of such a change and serves to illustrate the point.

Figure 18.14a through c illustrates how higher-order polynomials tend to swing through wild oscillations in the vicinity of an abrupt change. In contrast, the spline also connects the points, but because it is limited to lower-order changes, the oscillations are kept to a minimum. As such, the spline usually provides a superior approximation of the behavior of functions that have local, abrupt changes.

The concept of the spline originated from the drafting technique of using a thin, flexible strip (called a spline) to draw smooth curves through a set of points. The process is depicted in Fig. 18.15 for a series of five pins (data points). In this technique, the drafter places paper over a wooden board and hammers nails or pins into the paper (and board) at the location of the data points. A smooth cubic curve results from interweaving the strip between the pins. Hence, the name “cubic spline” has been adopted for polynomials of this type.

In this section, simple linear functions will first be used to introduce some basic concepts and problems associated with spline interpolation. Then we derive an algorithm for fitting quadratic splines to data. Finally, we present material on the cubic spline, which is the most common and useful version in engineering practice.

18.6.1 Linear Splines

The simplest connection between two points is a straight line. The first-order splines for a group of ordered data points can be defined as a set of linear functions,

$$\begin{aligned} f(x) &= f(x_0) + m_0(x - x_0) & x_0 \le x \le x_1 \\ f(x) &= f(x_1) + m_1(x - x_1) & x_1 \le x \le x_2 \\ &\ \ \vdots \\ f(x) &= f(x_{n-1}) + m_{n-1}(x - x_{n-1}) & x_{n-1} \le x \le x_n \end{aligned}$$
  • 529. where m_i is the slope of the straight line connecting the points:

$$m_i = \frac{f(x_{i+1}) - f(x_i)}{x_{i+1} - x_i} \tag{18.27}$$

These equations can be used to evaluate the function at any point between x0 and xn by first locating the interval within which the point lies. Then the appropriate equation is used to determine the function value within the interval. The method is obviously identical to linear interpolation.

FIGURE 18.14 A visual representation of a situation where the splines are superior to higher-order interpolating polynomials. The function to be fit undergoes an abrupt increase at x = 0. Parts (a) through (c) indicate that the abrupt change induces oscillations in interpolating polynomials. In contrast, because it is limited to third-order curves with smooth transitions, a cubic spline (d) provides a much more acceptable approximation. [Figure 18.14: four panels, (a) through (d), each plotting f(x) versus x.]

  • 530. EXAMPLE 18.8 First-Order Splines

Problem Statement. Fit the data in Table 18.1 with first-order splines. Evaluate the function at x = 5.

TABLE 18.1 Data to be fit with spline functions.

x      f(x)
3.0    2.5
4.5    1.0
7.0    2.5
9.0    0.5

Solution. These data can be used to determine the slopes between points. For example, for the interval x = 4.5 to x = 7 the slope can be computed using Eq. (18.27):

$$m = \frac{2.5 - 1}{7 - 4.5} = 0.60$$

The slopes for the other intervals can be computed, and the resulting first-order splines are plotted in Fig. 18.16a. The value at x = 5 is 1.3.

FIGURE 18.15 The drafting technique of using a spline to draw smooth curves through a series of points. Notice how, at the end points, the spline straightens out. This is called a “natural” spline.
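The two-step procedure for first-order splines (locate the interval, then apply that interval’s linear equation with the slope from Eq. (18.27)) can be sketched in Python using the data of Table 18.1. The function name `linear_spline` is hypothetical, and the use of the standard-library `bisect` module for the interval search is a design choice made for this sketch, not something prescribed by the text.

```python
from bisect import bisect_right


def linear_spline(xs, fs, x):
    """Evaluate the first-order spline through (xs[i], fs[i]) at x.

    xs must be sorted in ascending order.
    """
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x lies outside the range of the data")
    # Index i of the interval [xs[i], xs[i+1]] that contains x;
    # the min() keeps x == xs[-1] inside the last interval.
    i = min(bisect_right(xs, x) - 1, len(xs) - 2)
    slope = (fs[i + 1] - fs[i]) / (xs[i + 1] - xs[i])  # Eq. (18.27)
    return fs[i] + slope * (x - xs[i])


# Data from Table 18.1
xs = [3.0, 4.5, 7.0, 9.0]
fs = [2.5, 1.0, 2.5, 0.5]
value = linear_spline(xs, fs, 5.0)
print(value)
```

For x = 5 the search selects the interval from 4.5 to 7, whose slope is 0.60, so the result is approximately 1.3, in agreement with Example 18.8.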
  • 531. Visual inspection of Fig. 18.16a indicates that the primary disadvantage of first-order splines is that they are not smooth. In essence, at the data points where two splines meet (called a knot), the slope changes abruptly. In formal terms, the first derivative of the function is discontinuous at these points. This deficiency is overcome by using higher-order polynomial splines that ensure smoothness at the knots by equating derivatives at these points, as discussed in the next section.

18.6.2 Quadratic Splines

To ensure that the mth derivatives are continuous at the knots, a spline of at least order m + 1 must be used. Third-order polynomials or cubic splines that ensure continuous first and second derivatives are most frequently used in practice. Although third and higher derivatives could be discontinuous when using cubic splines, they usually cannot be detected visually and consequently are ignored.

FIGURE 18.16 Spline fits of a set of four points. (a) Linear spline, (b) quadratic spline, and (c) cubic spline, with a cubic interpolating polynomial also plotted. [Figure 18.16: three panels plotting f(x) versus x, with curves labeled “First-order spline,” “Second-order spline,” “Cubic spline,” and “Interpolating cubic.”]
  • 532. Because the derivation of cubic splines is somewhat involved, we have chosen to include them in a subsequent section. We have decided to first illustrate the concept of spline interpolation using second-order polynomials. These “quadratic splines” have continuous first derivatives at the knots. Although quadratic splines do not ensure equal second derivatives at the knots, they serve nicely to demonstrate the general procedure for developing higher-order splines.

The objective in quadratic splines is to derive a second-order polynomial for each interval between data points. The polynomial for each interval can be represented generally as

$$f_i(x) = a_i x^2 + b_i x + c_i \tag{18.28}$$

Figure 18.17 has been included to help clarify the notation. For n + 1 data points (i = 0, 1, 2, . . . , n), there are n intervals and, consequently, 3n unknown constants (the a’s, b’s, and c’s) to evaluate. Therefore, 3n equa