Understanding the “Chain Rule” by Deriving
Your Own Version
July 29, 2018
James Smith
nitac14b@yahoo.com
https://mx.linkedin.com/in/james-smith-1b195047
Abstract
Because the Chain Rule can confuse students as much as it helps them
solve real problems, we put ourselves in the shoes of the mathematicians
who derived it, so that students may understand the motivation for the
rule; its limitations; and why textbooks present it in its customary form.
We begin by finding the derivative of sin 2x without using the Chain
Rule. That exercise, having shown that even a comparatively simple
compound function can be bothersome to differentiate using the definition
of the derivative as a limit, provides the motivation for developing our
own formula for the derivative of the general compound function g [f (x)].
In the course of that development, we see why the function f must be
continuous at any value of x to which the formula is applied. We finish by
comparing our formula to that which is commonly given.
Contents
1 Introduction
2 Motivational Example: Derivative of sin 2x from the Definition of the Derivative as a Limit
3 Looking for an Easier Route
4 Comparison with the Usual Form of the Chain Rule
1 Introduction
When we are learning a new technique in mathematics, we benefit from
familiarizing ourselves with the type of problem that the method was developed to solve.
We also benefit from struggling with a few problems of that sort before being
shown the technique in its modern form. In that way, we are better prepared
to understand that version of that technique, as well as its derivation. We also
become better problem-solvers in general.
In this document, we will develop our own formula for the derivative of a
composite function, then compare it to a version of the Chain Rule that is found
in many standard calculus textbooks. As a motivational example (that is, to
help us see why some sort of Chain Rule would be desirable), we’ll begin by
finding $\dfrac{d \sin 2x}{dx}$ using the definition
$$
\left.\frac{du(x)}{dx}\right|_{x=a} = \lim_{\delta \to 0} \frac{u(a+\delta) - u(a)}{\delta}, \tag{1.1}
$$
followed by the Law of Universal Generalization.
2 Motivational Example: Derivative of sin 2x from
the Definition of the Derivative as a Limit
For the real number a, arbitrary, the definition in Eq. (1.1) gives
$$
\left.\frac{d \sin 2x}{dx}\right|_{x=a} = \lim_{\delta \to 0} \frac{\sin 2(a+\delta) - \sin 2a}{\delta}.
$$
The trig identities that we need:
(1) $\sin 2\theta = 2 \sin\theta \cos\theta$
(2) $\sin(\alpha+\beta) = \sin\alpha \cos\beta + \cos\alpha \sin\beta$
(3) $\cos(\alpha+\beta) = \cos\alpha \cos\beta - \sin\alpha \sin\beta$
The next several steps use trigonometric identities for sums and doubles of
angles to transform the right-hand side.
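$$
\begin{aligned}
\left.\frac{d \sin 2x}{dx}\right|_{x=a}
&= \lim_{\delta \to 0} \frac{2 \sin(a+\delta)\cos(a+\delta) - \sin 2a}{\delta} \\
&= \lim_{\delta \to 0} \frac{2\,[\sin(a+\delta)]\,[\cos(a+\delta)] - \sin 2a}{\delta} \\
&= \lim_{\delta \to 0} \frac{2\,[\sin a \cos\delta + \cos a \sin\delta]\,[\cos a \cos\delta - \sin a \sin\delta] - \sin 2a}{\delta} \\
&= \lim_{\delta \to 0} \frac{2 \sin a \cos a\,[\cos^2\delta - \sin^2\delta - 1] + 2\,[\cos^2 a - \sin^2 a]\,\sin\delta \cos\delta}{\delta} \\
&= \lim_{\delta \to 0} \frac{2 \sin a \cos a\,[-2 \sin^2\delta] + 2 \cos 2a \,\sin\delta \cos\delta}{\delta}.
\end{aligned}
$$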
Next, we transform the right-hand side in a way that will enable us to use theorems
about limits.
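$$
\left.\frac{d \sin 2x}{dx}\right|_{x=a} = \lim_{\delta \to 0} \left[ \frac{-4 \sin a \cos a \sin^2\delta}{\delta} + \frac{2 \cos 2a \,\sin\delta \cos\delta}{\delta} \right].
$$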
Now, we’ll use those theorems about limits.
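$$
\begin{aligned}
\left.\frac{d \sin 2x}{dx}\right|_{x=a}
&= \lim_{\delta \to 0} \frac{-4 \sin a \cos a \sin^2\delta}{\delta} + \lim_{\delta \to 0} \frac{2 \cos 2a \,\sin\delta \cos\delta}{\delta} \\
&= -4 \sin a \cos a \lim_{\delta \to 0} \left[ \frac{\sin\delta}{\delta} \sin\delta \right] + 2 \cos 2a \lim_{\delta \to 0} \left[ \frac{\sin\delta}{\delta} \cos\delta \right] \\
&= -4 \sin a \cos a \underbrace{\left[ \lim_{\delta \to 0} \frac{\sin\delta}{\delta} \right]}_{=1} \underbrace{\left[ \lim_{\delta \to 0} \sin\delta \right]}_{=0} + 2 \cos 2a \underbrace{\left[ \lim_{\delta \to 0} \frac{\sin\delta}{\delta} \right]}_{=1} \underbrace{\left[ \lim_{\delta \to 0} \cos\delta \right]}_{=1} \\
&= 2 \cos 2a.
\end{aligned}
$$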
We conclude by saying that because a was an arbitrary real number, the
result is valid for all real numbers. Customarily, we communicate that conclusion
by writing
$$
\frac{d \sin 2x}{dx} = 2 \cos 2x.
$$
WOW! That was a lot of work to find the derivative of such a simple function.
We might well dread trying to find the derivative of, say, $\sin\sqrt{1 + \log x}$ by the
same route, that is, by starting from the definition in Eq. (1.1). Let’s see if
we can find a better idea.
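As a rough numerical cross-check of this section's result (a sketch only, not part of the derivation), we can evaluate the difference quotient of Eq. (1.1) for shrinking values of δ and watch it approach 2 cos 2a. The function names and the sample point a = 0.7 are our own choices for illustration.

```python
import math

def difference_quotient(u, a, delta):
    """The quotient inside the limit of Eq. (1.1), for one nonzero delta."""
    return (u(a + delta) - u(a)) / delta

def sin_2x(x):
    return math.sin(2 * x)

a = 0.7  # arbitrary sample point (an assumption for illustration)
for delta in (1e-1, 1e-3, 1e-5):
    approx = difference_quotient(sin_2x, a, delta)
    print(f"delta={delta:g}  quotient={approx:.8f}  2cos(2a)={2 * math.cos(2 * a):.8f}")
```

A table of this kind only suggests the value of the limit; the argument above is what establishes it.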
3 Looking for an Easier Route
Anyone who has solved math problems “by hand” or with spreadsheets has
seen several benefits of treating functions like sin 2x as composites of the form
v [u (x)]. (In the case of sin 2x, u (x) = 2x, and v is the sine function.) Therefore,
why not attempt to derive a formula for the derivative of the generic composite
function g [f (x)]? We want our formula to be applicable to as many types of
functions as possible, so we’ll accept restrictions upon g, f, and their domains
only when necessary. We’ll begin by writing, for x = a, arbitrary,
$$
\left.\frac{dg[f(x)]}{dx}\right|_{x=a} = \lim_{\delta \to 0} \frac{g[f(a+\delta)] - g[f(a)]}{\delta}. \tag{3.1}
$$
The expression on the right-hand side appears unhelpful, so we’ll look for
ideas that might suggest ways to transform it. We’re searching for notions, so
for now we won’t pay much attention to rigor—time for that later. If we bear in
mind that $\dfrac{dg[f(x)]}{dx}$ is “the rate of change of g with respect to x”, we might jot
down (informally)
$$
\text{Rate of change of } g \text{ with respect to } x
= \left[ \text{Rate of change of } f \text{ with respect to } x \right]
\left[ \text{Rate of change of } g \text{ with respect to } f \right].
$$
Continuing to think informally, we might rewrite that note as
$$
\frac{dg}{dx} = \left[ \frac{df}{dx} \right] \left[ \frac{dg}{df} \right]. \tag{3.2}
$$
This idea has intuitive appeal. Let’s test it on our result for $\dfrac{d \sin 2x}{dx}$. As
we noted above, our “f” in that case is 2x, so $\dfrac{df}{dx}$ would be 2. Our “g” is the sine
function. Viewing 2x as a single variable, the derivative of sin 2x with respect
to 2x would be cos 2x. Thus, $\dfrac{dg}{df}$ in our case would be cos 2x. Putting these
ideas together,
$$
\frac{d \sin 2x}{dx} = \left[ \frac{df}{dx} \right] \left[ \frac{dg}{df} \right] = [2]\,[\cos 2x] = 2 \cos 2x,
$$
which is the result that we obtained in our motivational example.
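The same informal test can be repeated at several values of x. The sketch below compares the product [2][cos 2x] with a symmetric difference quotient, which serves here only as an independent numerical check; the helper names and sample points are our own assumptions.

```python
import math

def informal_rule(x):
    """Right-hand side of the informal Eq. (3.2) for sin 2x: [df/dx] * [dg/df]."""
    df_dx = 2.0              # f(x) = 2x
    dg_df = math.cos(2 * x)  # g is the sine, differentiated with respect to 2x
    return df_dx * dg_df

def central_difference(u, x, h=1e-6):
    """Symmetric difference quotient, used only as an independent numerical check."""
    return (u(x + h) - u(x - h)) / (2 * h)

sample_points = (-1.0, 0.0, 0.5, 2.0)  # arbitrary values of x
for x in sample_points:
    numeric = central_difference(lambda t: math.sin(2 * t), x)
    print(f"x={x:5.2f}  informal={informal_rule(x): .6f}  numeric={numeric: .6f}")
```

Agreement at a handful of points is encouraging but proves nothing, which is why a rigorous derivation is still needed.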
Although we would appear to be on the right track, we can’t trust our idea
that $\dfrac{dg}{dx} = \left[ \dfrac{df}{dx} \right] \left[ \dfrac{dg}{df} \right]$ without deriving it rigorously (for example, from Eq. (3.1)) and
expressing it clearly. How might we do that? Recognizing that Eq. (3.1) refers
specifically to the value of the derivative for x = a, we might write
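$$
\text{Rate of change of } g \text{ with respect to } x
= \left[ \left.\frac{df(x)}{dx}\right|_{x=a} \right]
\boxed{\text{Rate of change of } g \text{ with respect to } f \text{ at } x = a},
$$
and therefore
$$
\left.\frac{dg[f(x)]}{dx}\right|_{x=a}
= \left[ \lim_{\delta \to 0} \frac{f(a+\delta) - f(a)}{\delta} \right]
\boxed{\text{Rate of change of } g \text{ with respect to } f \text{ at } x = a}. \tag{3.3}
$$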
At this point, we might note that the quantity in the box on the right-hand
side is a limit of “something” as δ → 0:
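$$
\left.\frac{dg[f(x)]}{dx}\right|_{x=a}
= \left[ \lim_{\delta \to 0} \frac{f(a+\delta) - f(a)}{\delta} \right]
\left[ \lim_{\delta \to 0} \text{“Something”} \right]. \tag{3.4}
$$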
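But what is that “Something”? Comparing the right-hand sides of Eqs. (3.1)
and (3.4), and using the theorem that “the limit of a product of functions is the
product of the functions’ limits”, we reason as follows:
$$
\begin{aligned}
\left.\frac{dg[f(x)]}{dx}\right|_{x=a} &= \left.\frac{dg[f(x)]}{dx}\right|_{x=a} \\
\left[ \lim_{\delta \to 0} \frac{f(a+\delta) - f(a)}{\delta} \right] \left[ \lim_{\delta \to 0} \text{“Something”} \right] &= \lim_{\delta \to 0} \frac{g[f(a+\delta)] - g[f(a)]}{\delta} \\
\lim_{\delta \to 0} \left\{ \frac{f(a+\delta) - f(a)}{\delta} \, \text{“Something”} \right\} &= \lim_{\delta \to 0} \frac{g[f(a+\delta)] - g[f(a)]}{\delta} \\
\therefore \ \text{“Something”} &= \frac{g[f(a+\delta)] - g[f(a)]}{f(a+\delta) - f(a)}.
\end{aligned}
$$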
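Now, we can return to Eq. (3.4) to write
$$
\begin{aligned}
\left.\frac{dg[f(x)]}{dx}\right|_{x=a}
&= \left[ \lim_{\delta \to 0} \frac{f(a+\delta) - f(a)}{\delta} \right] \left[ \lim_{\delta \to 0} \frac{g[f(a+\delta)] - g[f(a)]}{f(a+\delta) - f(a)} \right] \\
&= \left[ \left.\frac{df(x)}{dx}\right|_{x=a} \right] \left[ \lim_{\delta \to 0} \frac{g[f(a+\delta)] - g[f(a)]}{f(a+\delta) - f(a)} \right].
\end{aligned} \tag{3.5}
$$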
Margin note: A restriction upon the formula that we’re attempting to develop: the derivative of f with respect to x must exist at x = a.
Having written that result, we need to recall that it is true only if the derivative
of f with respect to x exists at a.
We’ve just placed our first restriction upon whatever result we may obtain
from our derivation.
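Before any limit is taken, the factorization behind Eq. (3.5) is plain arithmetic: for a nonzero δ with f(a + δ) ≠ f(a), the quotient in Eq. (3.1) equals the quotient for f times the “Something”. The sketch below illustrates that for sin 2x at one δ; the function name `factors` and the values a = 0.7 and δ = 0.001 are assumptions chosen for illustration.

```python
import math

def factors(f, g, a, delta):
    """Return (quotient of Eq. (3.1), quotient for f, 'Something') for one nonzero delta."""
    df = f(a + delta) - f(a)
    dg = g(f(a + delta)) - g(f(a))
    if df == 0:
        # The case in which the 'Something' has a zero denominator.
        raise ZeroDivisionError("f(a + delta) equals f(a); 'Something' is undefined")
    return dg / delta, df / delta, dg / df

whole, f_quotient, something = factors(lambda x: 2 * x, math.sin, a=0.7, delta=1e-3)
print(f"quotient of Eq. (3.1):  {whole:.10f}")
print(f"product of the factors: {f_quotient * something:.10f}")
```

The explicit check for a zero denominator is a reminder that the factorization is only available when f(a + δ) differs from f(a).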
Our question now is what to do with the remaining limit on the right-hand
side of Eq. (3.5):
$$
\lim_{\delta \to 0} \frac{g[f(a+\delta)] - g[f(a)]}{f(a+\delta) - f(a)}.
$$
That’s quite a “busy” expression, so let’s draw a graph to help us “get our minds
around it”. We’ll start with a graph of f(x) (Fig. 1).
Figure 1: Our first step in constructing a graph that might help us to understand
the limit on the right-hand side of Eq. (3.5): the graph of f (x) .
Figure 2: To eliminate a possible confusion, we’ve graphed both f and g as
functions of the real variable z.
Next, we’ll want to add the graph of g [f (x)]. But how do we do that, on a
graph whose horizontal axis is x?
At this point, we might realize that we’ve been a little careless with our use
of symbols. We’re accustomed to using the single symbol “x” both to represent
the independent variable in a problem (as we have here with f (x)), and as a
coordinate along the real-number line (as in our graph). That dual meaning
seldom causes trouble for us, but now it has.
Margin note: “Defined” means that for every real number b, there exists a unique real number f(b). Similarly for g.
To find a way forward, let’s consider the case where both f and g are
defined for every real number z. (We’ll discuss more-complicated cases later.)
We’ll graph both functions in that way (Fig. 2). Now, along the horizontal axis,
we’ll locate the point for the number a at which we’re evaluating our derivative
$\dfrac{dg[f(x)]}{dx}$. On the vertical axis, we’ll locate the point for the number f(a) (Fig. 3).
Figure 3: Along the horizontal axis, we’ve located the point for the number a
at which we’re evaluating our derivative $\dfrac{dg[f(x)]}{dx}$. On the vertical axis, we’ve
located the point for the number f(a).
Figure 4: After locating the point on the horizontal axis for the number f (a).
Margin note: The set-theoretic concept of a function may be helpful here: the function f is a set of ordered pairs (b, c) such that no two distinct pairs have the same first element. We can call c “the value of f at z = b”. When making a graph of f, we “highlight” those points whose horizontal coordinate is the first element of some pair, and whose vertical coordinate is the second element of that same pair.
Because the function g is defined for every real number, it’s defined for the
specific real number f (a). Therefore, our next step is to locate the point on the
horizontal axis for that number (Fig. 4). The value of g, evaluated at z = f (a),
is some specific real number that we’ll write as g [f (a)]. In Fig. 5, we locate the
point for that number along the vertical axis.
We seem to be progressing, but we’ve yet to incorporate δ. We should know
how to do that; we did it many times in our first classes on derivatives as limits.
Still, before adding to our graph the points that involve δ, we want to think a
bit about our goal: we want to understand what happens when δ approaches
zero. To that end, we first attempt to understand the situation that exists when
δ is some suitably small, non-zero number. Taking δ as positive for the time
being (Fig. 6), we locate the point for the number z = a + δ on the horizontal
axis, and that for the number f(a + δ) on the vertical axis. Then, we locate
f(a + δ) on the horizontal axis, and g[f(a + δ)] on the vertical axis.
Figure 5: After locating the point on the vertical axis for the number g[f(a)].
Figure 6: After locating the point for g[f(a + δ)].
We’re ready, finally, to investigate $\lim_{\delta \to 0} \dfrac{g[f(a+\delta)] - g[f(a)]}{f(a+\delta) - f(a)}$. To avoid
distractions, we’ll eliminate the portions of our graph that don’t concern g (Fig.
7). We’ll also draw a straight line connecting the indicated points on the curve
for g (z).
We can let our early experiences with derivatives as limits guide us now.
As the interval between z = f (a) and z = f (a + δ) shrinks, the straight line
that we drew becomes the line tangent to the graph of g at z = f (a) (Fig. 8).
The slope of that tangent line is $\left.\dfrac{dg(z)}{dz}\right|_{z=f(a)}$. Therefore, if f(a + δ) − f(a)
goes to zero as δ itself shrinks to zero, then
$$
\lim_{\delta \to 0} \frac{g[f(a+\delta)] - g[f(a)]}{f(a+\delta) - f(a)} = \left.\frac{dg(z)}{dz}\right|_{z=f(a)}. \tag{3.6}
$$
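For a case in which f(a + δ) does approach f(a), the secant slope on the left-hand side of Eq. (3.6) can be watched as it closes in on the tangent slope. The sketch below uses f(z) = z² and g(z) = sin z at a = 1.3; those choices, and the name `secant_slope`, are assumptions made only for illustration.

```python
import math

def secant_slope(f, g, a, delta):
    """Slope of the secant through (f(a), g[f(a)]) and (f(a + delta), g[f(a + delta)])."""
    return (g(f(a + delta)) - g(f(a))) / (f(a + delta) - f(a))

f = lambda z: z ** 2   # hypothetical continuous f
g = math.sin
dg_dz = math.cos       # known derivative of g, used for comparison only
a = 1.3

tangent_slope = dg_dz(f(a))
errors = [abs(secant_slope(f, g, a, d) - tangent_slope) for d in (1e-1, 1e-2, 1e-3, 1e-4)]
for d, err in zip((1e-1, 1e-2, 1e-3, 1e-4), errors):
    print(f"delta={d:g}  |secant slope - tangent slope| = {err:.2e}")
```

The shrinking gap is what the graphical argument predicts when the interval between f(a) and f(a + δ) shrinks to zero.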
Figure 7: Focusing on the curve for g. We’ve added the secant line as preparation
for considering what occurs when δ → 0.
Figure 8: The tangent that the secant line shown in Fig. 7 approaches if
f(a + δ) → f(a) as δ → 0. The slope of the tangent is $\left.\dfrac{dg(z)}{dz}\right|_{z=f(a)}$.
But notice the “if”: as we know, not all functions behave as stated for every
real number.
Margin note: Functions can also be piecewise continuous; that is, continuous on certain intervals. The same arguments that we’re using here work for that type of function as well.
Where does that realization leave us? The bad news is that if we’re dealing
with a function f and a number a such that f (a + δ) does not go to f (a) as
δ goes to zero, then we’re stuck: nothing can be done. The good news is that
many common functions do have the required behavior: they’re the type that
mathematicians call continuous. That is, the functions u (z) such that for every
real number c,
$$
\lim_{z \to c^{+}} u(z) = \lim_{z \to c^{-}} u(z) = u(c).
$$
Polynomials and sin x are examples of continuous functions.
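To make the contrast concrete, the sketch below samples a function just to the left and just to the right of a point c. The step function and the cubic are hypothetical examples of our own, and sampling of this kind is only a hint about the one-sided limits, not a proof of continuity.

```python
def one_sided_values(u, c, steps=(1e-2, 1e-4, 1e-6)):
    """Sample u slightly to the left and to the right of c (a rough hint, not a proof)."""
    left = [u(c - h) for h in steps]
    right = [u(c + h) for h in steps]
    return left, right

def step(z):
    """Hypothetical discontinuous function: jumps from 0 to 1 at z = 0."""
    return 1.0 if z >= 0 else 0.0

def cubic(z):
    """Hypothetical polynomial, continuous at every real number."""
    return z ** 3 - 2 * z

for name, u, c in (("step", step, 0.0), ("cubic", cubic, 1.0)):
    left, right = one_sided_values(u, c)
    print(f"{name}: from the left {left[-1]:.6f}, from the right {right[-1]:.6f}, u(c) = {u(c):.6f}")
```

For the step function the values from the left stay at 0 while u(0) = 1, so f(a + δ) would not approach f(a) at a = 0.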
Figure 9: A function in which f(a + δ) would be equal to f(a) several times
as δ → 0, at each of which the denominator in Eq. (3.6) would be zero. Those
subtleties require treatment that is beyond the scope of this document.
The need for f(z) to be continuous at a becomes apparent when f has the
behavior shown in Fig. 9. In such a case, the point on the horizontal axis for
f(a + δ) will alternate between being to the left and the right of that for f(a).
Nevertheless, the length of the interval will shrink to zero as δ itself goes to
zero. Note that for some values of δ in Fig. 9, f(a + δ) = f(a), making the
denominator zero in the limit in Eq. (3.6). Those subtleties require treatment
that is beyond the scope of this document.
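One function with this kind of behavior (our own hypothetical example, not necessarily the one drawn in Fig. 9) is f(x) = x² sin(1/x) with f(0) = 0, examined at a = 0. Its exact zeros occur at δ = 1/(nπ), which floating-point numbers cannot represent exactly, so the sketch instead samples δ halfway between those zeros and shows that f(a + δ) − f(a) keeps changing sign while shrinking.

```python
import math

def f(x):
    """Hypothetical oscillating function: x^2 sin(1/x), with f(0) = 0."""
    return 0.0 if x == 0 else x * x * math.sin(1.0 / x)

a = 0.0
# Values of delta shrinking toward 0, chosen where sin(1/delta) is +1 or -1.
deltas = [1.0 / ((n + 0.5) * math.pi) for n in range(1, 9)]
differences = [f(a + d) - f(a) for d in deltas]
signs = [math.copysign(1.0, diff) for diff in differences]

for d, diff in zip(deltas, differences):
    print(f"delta={d:.5f}  f(a + delta) - f(a) = {diff: .3e}")
```

Because this f is continuous, each sign change means that f(a + δ) − f(a) passes through zero somewhere in between, which is exactly where the denominator in Eq. (3.6) vanishes.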
However, those considerations should not distract us from what we’ve accomplished:
accepting the restriction that f must be continuous at a is a small price
to pay for being able to reduce a monstrosity like $\lim_{\delta \to 0} \dfrac{g[f(a+\delta)] - g[f(a)]}{f(a+\delta) - f(a)}$
to $\left.\dfrac{dg(z)}{dz}\right|_{z=f(a)}$. Using that result, we can write that if f is continuous at a,
and if $\dfrac{dg(z)}{dz}$ exists at f(a), then
$$
\left.\frac{dg[f(x)]}{dx}\right|_{x=a} = \left[ \left.\frac{df(x)}{dx}\right|_{x=a} \right] \left[ \left.\frac{dg(z)}{dz}\right|_{z=f(a)} \right]. \tag{3.7}
$$
The right-hand side of that equation is messy because of the (apparently) different
variables x and z. To clean it up, we can ask ourselves what those variables mean
in this context. We’ll start with $\left.\dfrac{df(x)}{dx}\right|_{x=a}$: that expression means “the rate
of change of the dependent variable f with respect to its independent variable,
when the value of the latter is a”. In the context of our present problem, x and
z refer to the same variable. Therefore, we’ll use z, and rewrite Eq. (3.7) as
$$
\left.\frac{dg[f(x)]}{dx}\right|_{x=a} = \left[ \left.\frac{df(z)}{dz}\right|_{z=a} \right] \left[ \left.\frac{dg(z)}{dz}\right|_{z=f(a)} \right]. \tag{3.8}
$$
Let’s test that result on the function sin 2x, which we used in our motivational
example. Our f(x) is 2x, and our g is the sine function. Table 1 shows the
procedure for finding the derivative of sin 2x according to Eq. (3.8).

Table 1: Implementation of Eq. (3.8)

| Step | Implementation for g[f(x)] = sin 2x |
| --- | --- |
| Identify f(z) and g(z) | f(z) = 2z, g(z) = sin z |
| Identify formulas for $\dfrac{df(z)}{dz}$ and $\dfrac{dg(z)}{dz}$ | $\dfrac{d\,2z}{dz} = 2$, and $\dfrac{d \sin z}{dz} = \cos z$ |
| Evaluate $\dfrac{df(z)}{dz}$ at z = a, and $\dfrac{dg(z)}{dz}$ at z = f(a) | $2\,\big\rvert_{z=a} = 2$; and $\cos z\,\big\rvert_{z=2a} = \cos 2a$ |
| $\left.\dfrac{dg[f(x)]}{dx}\right\rvert_{x=a} = \left[ \left.\dfrac{df(z)}{dz}\right\rvert_{z=a} \right] \left[ \left.\dfrac{dg(z)}{dz}\right\rvert_{z=f(a)} \right]$ | $\left.\dfrac{d \sin 2x}{dx}\right\rvert_{x=a} = [2]\,[\cos 2a]$ |
Because the procedure worked, we now invoke the Law of Universal Generalization:
since (1) the function f(z) = 2z is continuous for all
values of z, (2) $\dfrac{d\,2z}{dz}$ exists at all z, and (3) cos z exists at all z, we may write
$$
\frac{d \sin 2x}{dx} = 2 \cos 2x.
$$
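The steps of Table 1 translate almost line for line into a small function. The sketch below is one possible rendering, with names of our own choosing. It applies the steps to sin 2x and then to the harder function mentioned in Section 2, read as g[f(x)] with f(z) = √(1 + log z) and g(z) = sin z. We assume “log” means the natural logarithm, and the derivative of that f is supplied by hand.

```python
import math

def chain_rule_at(f, df_dz, dg_dz, a):
    """Eq. (3.8): [df/dz evaluated at z = a] times [dg/dz evaluated at z = f(a)]."""
    return df_dz(a) * dg_dz(f(a))

def central_difference(u, x, h=1e-6):
    """Symmetric difference quotient, used only as an independent numerical check."""
    return (u(x + h) - u(x - h)) / (2 * h)

# Table 1 example: f(z) = 2z, g(z) = sin z.
a = 0.7  # arbitrary sample point
table_1 = chain_rule_at(lambda z: 2 * z, lambda z: 2.0, math.cos, a)

# Harder example: f(z) = sqrt(1 + log z), natural log assumed; g(z) = sin z.
def f2(z):
    return math.sqrt(1 + math.log(z))

def df2_dz(z):
    # Supplied by hand; finding it is itself a composite-function problem.
    return 1.0 / (2.0 * z * math.sqrt(1 + math.log(z)))

b = 2.0  # sample point; requires 1 + log b > 0
harder = chain_rule_at(f2, df2_dz, math.cos, b)
numeric = central_difference(lambda x: math.sin(f2(x)), b)

print(f"sin 2x at a={a}: {table_1:.8f} vs 2cos(2a) = {2 * math.cos(2 * a):.8f}")
print(f"sin sqrt(1 + log x) at x={b}: {harder:.8f} vs numeric check {numeric:.8f}")
```

Passing the derivatives in as separate functions mirrors the second row of Table 1, where the formulas for df/dz and dg/dz are identified before anything is evaluated.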
Now that we’re sure our method for finding the derivative of a compound
function is sound, we’ll want to compare our method to the standard formulation
of the Chain Rule.
4 Comparison with the Usual Form of the Chain
Rule
A typical presentation of the Chain Rule is

    If a variable g depends on the variable f, which itself depends on the
    variable x, so that f and g are therefore dependent variables, then g,
    via the intermediate variable of f, depends on x as well. The chain
    rule then states¹
    $$
    \frac{dg}{dx} = \frac{dg}{df} \, \frac{df}{dx}. \tag{4.1}
    $$

¹ Paraphrase of Wikipedia’s article “Chain rule”, accessed 22 July 2018.
Eq. (4.1) is identical to our “intuitive” Eq. (3.2). In both, the derivative $\dfrac{df}{dx}$
is the customary way of writing the generalization of our $\left.\dfrac{df(z)}{dz}\right|_{z=a}$ (in Eq.
(3.8)) to the whole set of real numbers. (Or more accurately, to those at which
$\dfrac{df}{dz}$ exists.) But what about the factor $\dfrac{dg}{df}$ in Eqs. (3.2) and (4.1)? It must
be equal to the factor $\left.\dfrac{dg(z)}{dz}\right|_{z=f(a)}$ in our Eq. (3.8). Can we establish that
equality rigorously?
Margin note: For any given f, g, and a for which the limit $\lim_{\delta \to 0} \dfrac{g[f(a+\delta)] - g[f(a)]}{f(a+\delta) - f(a)}$ exists, that limit is a specific real number.
Let’s review the analysis through which we established that
$$
\lim_{\delta \to 0} \frac{g[f(a+\delta)] - g[f(a)]}{f(a+\delta) - f(a)} = \left.\frac{dg(z)}{dz}\right|_{z=f(a)},
$$
with the restriction that f must be continuous at a. We accepted that restriction
because it ensured that [f(a + δ) − f(a)] → 0 as δ → 0. Therefore, the
expressions
$$
\lim_{\delta \to 0} \frac{g[f(a+\delta)] - g[f(a)]}{f(a+\delta) - f(a)}
\quad \text{and} \quad
\lim_{[f(a+\delta) - f(a)] \to 0} \frac{g[f(a+\delta)] - g[f(a)]}{f(a+\delta) - f(a)}
$$
are the same number. The latter expression is the definition, given in the form
of a limit, of $\left.\dfrac{dg}{df}\right|_{f=f(a)}$. We’d generalize that result by writing simply $\dfrac{dg}{df}$.
Thus, $\dfrac{dg}{df}$ in Eqs. (3.2) and (4.1) is indeed the generalization of the factor
$\left.\dfrac{dg(z)}{dz}\right|_{z=f(a)}$ that we identified in our own version of the Chain Rule.
12

More Related Content

PDF
Limits, Continuity & Differentiation (Theory)
PDF
Review 1 -_limits-_continuity_(pcalc+_to_ap_calc)
DOCX
Introduction to calculus
DOCX
Limits and continuity[1]
PDF
limits and continuity
PPTX
Limit and continuity
PDF
Lesson 11: Limits and Continuity
PPT
The Application of Derivatives
Limits, Continuity & Differentiation (Theory)
Review 1 -_limits-_continuity_(pcalc+_to_ap_calc)
Introduction to calculus
Limits and continuity[1]
limits and continuity
Limit and continuity
Lesson 11: Limits and Continuity
The Application of Derivatives

What's hot (20)

PDF
Lesson 27: Evaluating Definite Integrals
PDF
PDF
Lesson 27: Integration by Substitution (Section 041 slides)
PPT
Limits And Derivative
PPT
Functions limits and continuity
PPTX
Integration presentation
PDF
Lesson 2: Limits and Limit Laws
PPT
functions limits and continuity
PDF
Lesson 5: Continuity
PPTX
Limits, continuity, and derivatives
PPT
Limit and continuity (2)
PDF
Formulas
PDF
Lesson 10: The Chain Rule (slides)
PDF
Lesson 25: Evaluating Definite Integrals (slides)
PPT
Limits and derivatives
PPTX
5.2 the substitution methods
PPTX
1.5 all notes
PDF
Analysis Solutions CVI
PDF
PPTX
DIFFERENTIATION
Lesson 27: Evaluating Definite Integrals
Lesson 27: Integration by Substitution (Section 041 slides)
Limits And Derivative
Functions limits and continuity
Integration presentation
Lesson 2: Limits and Limit Laws
functions limits and continuity
Lesson 5: Continuity
Limits, continuity, and derivatives
Limit and continuity (2)
Formulas
Lesson 10: The Chain Rule (slides)
Lesson 25: Evaluating Definite Integrals (slides)
Limits and derivatives
5.2 the substitution methods
1.5 all notes
Analysis Solutions CVI
DIFFERENTIATION
Ad

Similar to Understanding the "Chain Rule" for Derivatives by Deriving Your Own Version (20)

PDF
1543 integration in mathematics b
PDF
Introduction to Functions
PDF
Integration material
PDF
Integration
PDF
Note introductions of functions
PDF
Introduction to functions
PPTX
Project in Calcu
PDF
lemh201 (1).pdfvjsbdkkdjfkfjfkffkrnfkfvfkrjof
PPTX
Lecture co3 math21-1
PPT
PPT
Derivatie class 12
PDF
PDF
Calculus - Functions Review
PDF
Real World Haskell: Lecture 6
PDF
Matematicas FINANCIERAS CIFF dob.pdf
PDF
The Fundamental theorem of calculus
PPTX
Differential Equations Assignment Help
PPTX
AIOU Code 803 Mathematics for Economists Semester Spring 2022 Assignment 2.pptx
PPTX
Differential Equations Assignment Help
1543 integration in mathematics b
Introduction to Functions
Integration material
Integration
Note introductions of functions
Introduction to functions
Project in Calcu
lemh201 (1).pdfvjsbdkkdjfkfjfkffkrnfkfvfkrjof
Lecture co3 math21-1
Derivatie class 12
Calculus - Functions Review
Real World Haskell: Lecture 6
Matematicas FINANCIERAS CIFF dob.pdf
The Fundamental theorem of calculus
Differential Equations Assignment Help
AIOU Code 803 Mathematics for Economists Semester Spring 2022 Assignment 2.pptx
Differential Equations Assignment Help
Ad

More from James Smith (20)

PDF
Using a Common Theme to Find Intersections of Spheres with Lines and Planes v...
PDF
Via Geometric Algebra: Direction and Distance between Two Points on a Spheric...
PDF
Solution of a Vector-Triangle Problem Via Geometric (Clifford) Algebra
PDF
Via Geometric (Clifford) Algebra: Equation for Line of Intersection of Two Pl...
PDF
Solution of a Sangaku ``Tangency" Problem via Geometric Algebra
PDF
Un acercamiento a los determinantes e inversos de matrices
PDF
Making Sense of Bivector Addition
PDF
Learning Geometric Algebra by Modeling Motions of the Earth and Shadows of Gn...
PDF
Solution of a High-School Algebra Problem to Illustrate the Use of Elementary...
PDF
Nuevo Manual de la UNESCO para la Enseñanza de Ciencias
PDF
Calculating the Angle between Projections of Vectors via Geometric (Clifford)...
PDF
Estimation of the Earth's "Unperturbed" Perihelion from Times of Solstices an...
PDF
Projection of a Vector upon a Plane from an Arbitrary Angle, via Geometric (C...
PDF
Formulas and Spreadsheets for Simple, Composite, and Complex Rotations of Vec...
PDF
"Rotation of a Rotation" via Geometric (Clifford) Algebra
PDF
Sismos: Recursos acerca de la inspección y refuerzo de edificios dañados por ...
PDF
How to Effect a Composite Rotation of a Vector via Geometric (Clifford) Algebra
PDF
A Modification of the Lifshitz-Slyozov-Wagner Equation for Predicting Coarsen...
PDF
Calculating Dimensions for Constructing Super Adobe (Earth Bag) Domes
PDF
Trampas comunes en los exámenes de se selección sobre matemáticas
Using a Common Theme to Find Intersections of Spheres with Lines and Planes v...
Via Geometric Algebra: Direction and Distance between Two Points on a Spheric...
Solution of a Vector-Triangle Problem Via Geometric (Clifford) Algebra
Via Geometric (Clifford) Algebra: Equation for Line of Intersection of Two Pl...
Solution of a Sangaku ``Tangency" Problem via Geometric Algebra
Un acercamiento a los determinantes e inversos de matrices
Making Sense of Bivector Addition
Learning Geometric Algebra by Modeling Motions of the Earth and Shadows of Gn...
Solution of a High-School Algebra Problem to Illustrate the Use of Elementary...
Nuevo Manual de la UNESCO para la Enseñanza de Ciencias
Calculating the Angle between Projections of Vectors via Geometric (Clifford)...
Estimation of the Earth's "Unperturbed" Perihelion from Times of Solstices an...
Projection of a Vector upon a Plane from an Arbitrary Angle, via Geometric (C...
Formulas and Spreadsheets for Simple, Composite, and Complex Rotations of Vec...
"Rotation of a Rotation" via Geometric (Clifford) Algebra
Sismos: Recursos acerca de la inspección y refuerzo de edificios dañados por ...
How to Effect a Composite Rotation of a Vector via Geometric (Clifford) Algebra
A Modification of the Lifshitz-Slyozov-Wagner Equation for Predicting Coarsen...
Calculating Dimensions for Constructing Super Adobe (Earth Bag) Domes
Trampas comunes en los exámenes de se selección sobre matemáticas

Recently uploaded (20)

PDF
Classroom Observation Tools for Teachers
PDF
FourierSeries-QuestionsWithAnswers(Part-A).pdf
PDF
Saundersa Comprehensive Review for the NCLEX-RN Examination.pdf
PDF
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
PDF
RMMM.pdf make it easy to upload and study
PDF
Supply Chain Operations Speaking Notes -ICLT Program
PDF
STATICS OF THE RIGID BODIES Hibbelers.pdf
PDF
2.FourierTransform-ShortQuestionswithAnswers.pdf
PDF
Insiders guide to clinical Medicine.pdf
PDF
102 student loan defaulters named and shamed – Is someone you know on the list?
PDF
Basic Mud Logging Guide for educational purpose
PPTX
1st Inaugural Professorial Lecture held on 19th February 2020 (Governance and...
PPTX
Pharmacology of Heart Failure /Pharmacotherapy of CHF
PPTX
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
PPTX
Renaissance Architecture: A Journey from Faith to Humanism
PDF
Physiotherapy_for_Respiratory_and_Cardiac_Problems WEBBER.pdf
PDF
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
PDF
Anesthesia in Laparoscopic Surgery in India
PPTX
Microbial diseases, their pathogenesis and prophylaxis
PDF
Pre independence Education in Inndia.pdf
Classroom Observation Tools for Teachers
FourierSeries-QuestionsWithAnswers(Part-A).pdf
Saundersa Comprehensive Review for the NCLEX-RN Examination.pdf
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
RMMM.pdf make it easy to upload and study
Supply Chain Operations Speaking Notes -ICLT Program
STATICS OF THE RIGID BODIES Hibbelers.pdf
2.FourierTransform-ShortQuestionswithAnswers.pdf
Insiders guide to clinical Medicine.pdf
102 student loan defaulters named and shamed – Is someone you know on the list?
Basic Mud Logging Guide for educational purpose
1st Inaugural Professorial Lecture held on 19th February 2020 (Governance and...
Pharmacology of Heart Failure /Pharmacotherapy of CHF
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
Renaissance Architecture: A Journey from Faith to Humanism
Physiotherapy_for_Respiratory_and_Cardiac_Problems WEBBER.pdf
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
Anesthesia in Laparoscopic Surgery in India
Microbial diseases, their pathogenesis and prophylaxis
Pre independence Education in Inndia.pdf

Understanding the "Chain Rule" for Derivatives by Deriving Your Own Version

  • 1. Understanding the “Chain Rule” by Deriving Your Own Version July 29, 2018 James Smith nitac14b@yahoo.com https://mx.linkedin.com/in/james-smith-1b195047 Abstract Because the Chain Rule can confuse students as much as it helps them solve real problems, we put ourselves in the shoes of the mathematicians who derived it, so that students may understand the motivation for the rule; its limitations; and why textbooks present it in its customary form. We begin by finding the derivative of sin 2x without using the Chain Rule. That exercise, having shown that even a comparatively simple compound function can be bothersome to differentiate using the definition of the derivative as a limit, provides the motivation for developing our own formula for the derivative of the general compound function g [f (x)]. In the course of that development, we see why the function f must be continuous at any value of x to which the formula is applied. We finish by comparing our formula to that which is commonly given. 1
  • 2. Contents 1 Introduction 2 2 Motivational Example: Derivative of sin 2x from the Definition of the Derivative as a Limit 2 3 Looking for an Easier Route 3 4 Comparison with the Usual Form of the Chain Rule 11 1 Introduction When we are learning a new technique in mathematics, we benefit from familiar- izing ourselves with the type of problem that the method was developed to solve. We also benefit from struggling with a few problems of that sort before being shown the technique in its modern form. In that way, we are better prepared to understand that version of that technique, as well as its derivation. We also become better problem-solvers in general. In this document, we will develop our own formula for the derivative of a composite function, then compare it to a version of the Chain Rule that is found in many standard calculus textbooks. As a motivational example (that is, to help us see why some sort of Chain Rule would be desirable), we’ll begin by finding d sin 2x dx using the definition du (x) dx x=a = lim δ→0 u (a + δ) − u (a) δ , (1.1) followed by the Law of Universal Generalization. 2 Motivational Example: Derivative of sin 2x from the Definition of the Derivative as a Limit For the real number a, arbitrary, the definition in Eq. (1.1) gives d sin 2x dx x=a = lim δ→0 sin 2 (a + δ) − sin 2a δ .
  • 3. The trig identities that we need: (1) sin 2θ = 2 sin θ cos θ (2) sin (α + β) = sin α cos β + cos α sin β (3) cos (α + β) = cos α cos β − sin α sin β. The next several steps use trigonometric identities for sums and doubles of angles to transform the right-hand side. d sin 2x dx x=a = lim δ→0 2 sin (a + δ) cos (a + δ) − sin 2a δ = lim δ→0 2 [sin (a + δ)] [cos (a + δ)] − sin 2a δ = lim δ→0 2 [sin a cos δ + cos a sin δ] [cos a cos δ − sin a sin δ] − sin 2a δ = lim δ→0 2 sin a cos a cos2 δ − sin2 δ − 1 + 2 cos2 a − sin2 a sin δ cos δ δ = lim δ→0 2 sin a cos a −2 sin2 δ + 2 cos 2a sin δ cos δ δ . Next, we transform the right-hand side in a way that will enables to use theorems about limits. d sin 2x dx x=a = lim δ→0 −4 sin a cos a sin2 δ δ + 2 cos 2a sin δ cos δ δ Now, we’ll use those theorems about limits. d sin 2x dx x=a = lim δ→0 −4 sin a cos a sin2 δ δ + lim δ→0 2 cos 2a sin δ cos δ δ = −4 sin a cos a lim δ→0 sin δ δ sin δ + 2 cos 2a lim δ→0 sin δ δ cos δ = −4 sin a cos a lim δ→0 sin δ δ =1 lim δ→0 sin δ =0 +2 cos 2a lim δ→0 sin δ δ =1 lim δ→0 cos δ =1 = 2 cos 2a. We conclude by saying that because a was an arbitrary real number, the result is valid for all real numbers. Customarily, we communicate that conclusion by writing d sin 2x dx = 2 cos 2x. WOW! That was a lot of work to find the derivative of such a simple function. We might well dread trying to find the derivative of, say, sin √ 1 + log x by the same route. That is, by starting from the definition in (Eq. (1.1)). Let’s see if we can find a better idea. 3 Looking for an Easier Route Anyone who has solved math problems “by hand” or with spreadsheets has seen several benefits of treating functions like sin 2x as composites of the form 3
  • 4. v [u (x)]. (In the case of sin 2x, u (x) = 2x, and v is the sine function.) Therefore, why not attempt to derive a formula for the derivative of the generic composite function g [f (x)]? We want our formula to be applicable to as many types of functions as possible, so we’ll accept restrictions upon g, f, and their domains only when necessary. We’ll begin by writing, for x = a, arbitrary, dg [f (x)] dx x=a = lim δ→0 g [f (a + δ)] − g [f (a)] δ . (3.1) The expression on the right-hand side appears unhelpful, so we’ll look for ideas that might suggest ways to transform it. We’re searching for notions, so for now we won’t pay much attention to rigor—time for that later. If we bear in mind that dg [f (x)] dx is ‘the rate of change of g with respect to x”, we might jot down (informally) Rate of change of g with respect to x =    Rate of change of f with re- spect to x       Rate of change of g with re- spect to f    . Continuing to think informally, might rewrite that note as dg dx = df dx dg df . (3.2) This idea has intuitive appeal. Let’s test it on our result for d sin 2x dx . As we noted above, our “f” in that case is 2x, so df dx would be 2. Ou “g” is the sine function. Viewing 2x as a single variable, the derivative of sin 2x with respect to 2x would be cos 2x . Thus, dg df in our case would be cos 2x. Putting these ideas together, d sin 2x dx = df dx dg df = [2] cos 2x = 2 cos 2x, which is the result that we we obtained in our motivational example. Although we would appear to be on the right track, we can’t trust our idea that dg dx = df dx dg df without deriving it rigorously—for example, from Ec. (3.1))—and expressing it clearly. How might we do that? Recognizing that Ec. (3.1) refers 4
  • 5. specifically to the value of the derivative for x = a, we might write Rate of change of g with respect to x = df (x) dx x=a    Rate of change of g with re- spect to f at x = a    , and therefore dg [f (x)] dx x=a = lim δ→0 f (a + δ) − (a) δ    Rate of change of g with re- spect to f at x = a    . (3.3) At this point, we might note that the quantity in the box on the right-hand side is a limit of “something” as δ → 0: dg [f (x)] dx x=a = lim δ→0 f (a + δ) − (a) δ lim δ→0 [“Something”] . (3.4) But what is that “Something”? Comparing the right-hand sides of Ecs. (3.1) and (3.4), and using the theorem that “the limit of a product of functions is the product of the functions’ limits”, we reason as follows: dg [f (x)] dx x=a = dg [f (x)] dx x=a lim δ→0 f (a + δ) − (a) δ lim δ→0 [“Something”] = lim δ→0 g [f (a + δ)] − g [f (a)] δ lim δ→0 f (a + δ) − (a) δ [“Something”] = lim δ→0 g [f (a + δ)] − g [f (a)] δ ∴ “Something” = g [f (a + δ)] − g [f (a)] f (a + δ) − f (a) . Now, we can return to Ec. (3.4) to write dg [f (x)] dx x=a = lim δ→0 f (a + δ) − (a) δ lim δ→0 g [f (a + δ)] − g [f (a)] f (a + δ) − f (a) = df (x) dx x=a lim δ→0 g [f (a + δ)] − g [f (a)] f (a + δ) − f (a) . (3.5) A restriction upon the formula that we’re attempting to develop: the derivative of f with respect to x must exist at x = a. Having written that result, we need to recall that it is true only if the derivative of f with respect to x exists at a. We’ve just placed our first restriction upon whatever result we may obtain from our derivation. Our question now is what to to with the remaining limit on the right-hand side of Ec. (3.5): lim δ→0 g [f (a + δ)] − g [f (a)] f (a + δ) − f (a) That’s quite a “busy” expression, so let’s draw a graph to help us “get our minds around it”. We’ll start with a graph of f (x) (Fig. 1). 5
  • 6. Figure 1: Our first step in constructing a graph that might help us to understand the limit on the right-hand side of Eq. (3.5): the graph of f (x).

Figure 2: To eliminate a possible confusion, we’ve graphed both f and g as functions of the real variable z.

Next, we’ll want to add the graph of g [f (x)]. But how do we do that, on a graph whose horizontal axis is x? At this point, we might realize that we’ve been a little careless with our use of symbols. We’re accustomed to using the single symbol “x” both to represent the independent variable in a problem (as we have here with f (x)), and as a coordinate along the real-number line (as in our graph). That dual meaning seldom causes trouble for us, but now it has.

To find a way forward, let’s consider the case where both f and g are defined for every real number z. (We’ll discuss more-complicated cases later.) We’ll graph both functions in that way (Fig. 2).

(Margin note: “Defined” means that for every real number b, there exists a unique real number f (b). Similarly for g.)

Now, along the horizontal axis, we’ll locate the point for the number a at which we’re evaluating our derivative $\frac{d\,g[f(x)]}{dx}$. On the vertical axis, we’ll locate the point for the number f (a) (Fig. 3).
  • 7. Figure 3: Along the horizontal axis, we’ve located the point for the number a at which we’re evaluating our derivative $\frac{d\,g[f(x)]}{dx}$. On the vertical axis, we’ve located the point for the number f (a).

Figure 4: After locating the point on the horizontal axis for the number f (a).

(Margin note: The set-theoretic concept of a function may be helpful here. The function f is a set of ordered pairs (b, c) in which no two pairs have the same first element. We can call c “the value of f at z = b”. When making a graph of f, we “highlight” those points whose horizontal coordinate is the first element of some pair, and whose vertical coordinate is the second element of that same pair.)

Because the function g is defined for every real number, it’s defined for the specific real number f (a). Therefore, our next step is to locate the point on the horizontal axis for that number (Fig. 4).

The value of g, evaluated at z = f (a), is some specific real number that we’ll write as g [f (a)]. In Fig. 5, we locate the point for that number along the vertical axis.

We seem to be progressing, but we’ve yet to incorporate δ. We should know how to do that; we did it many times in our first classes on derivatives as limits. Still, before adding to our graph the points that involve δ, we want to think a bit about our goal: we want to understand what happens when δ approaches zero. To that end, we first attempt to understand the situation that exists when δ is some suitably small, non-zero number. Taking δ as positive for the time being (Fig. 6), we locate the point for the number z = a + δ on the horizontal
  • 8. axis, and that for the number f (a + δ) on the vertical axis. Then, we locate f (a + δ) on the horizontal axis, and g [f (a + δ)] on the vertical axis.

Figure 5: After locating the point on the vertical axis for the number g [f (a)].

Figure 6: After locating the point for g [f (a + δ)].

We’re ready, finally, to investigate

$$\lim_{\delta \to 0} \frac{g[f(a+\delta)] - g[f(a)]}{f(a+\delta) - f(a)}.$$

To avoid distractions, we’ll eliminate the portions of our graph that don’t concern g (Fig. 7). We’ll also draw a straight line connecting the indicated points on the curve for g (z).

We can let our early experiences with derivatives as limits guide us now. As the interval between z = f (a) and z = f (a + δ) shrinks, the straight line that we drew becomes the line tangent to the graph of g at z = f (a) (Fig. 8). The slope of that tangent line is $\left.\frac{dg(z)}{dz}\right|_{z=f(a)}$. Therefore, if f (a + δ) − f (a) goes to zero as δ itself shrinks to zero, then

$$\lim_{\delta \to 0} \frac{g[f(a+\delta)] - g[f(a)]}{f(a+\delta) - f(a)} = \left.\frac{dg(z)}{dz}\right|_{z=f(a)}. \tag{3.6}$$
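The secant-to-tangent argument behind Eq. (3.6) can be illustrated numerically. The sketch below is not taken from the original text: the choices f (z) = z², g (z) = sin z, and a = 1 are assumptions made only for illustration, as are the function names. It computes the slope of the secant line of g between z = f (a) and z = f (a + δ) and compares it with the known tangent slope cos f (a).

```python
import math


def f(z):
    return z * z            # hypothetical continuous inner function


def g(z):
    return math.sin(z)      # hypothetical outer function


def dg_dz(z):
    return math.cos(z)      # known derivative of the sine function


def secant_slope(a, delta):
    """Slope of the secant line of g between z = f(a) and z = f(a + delta)."""
    run = f(a + delta) - f(a)
    if run == 0.0:
        # This is the situation that makes the denominator in Eq. (3.6) zero.
        raise ZeroDivisionError("f(a + delta) equals f(a); the quotient is undefined")
    return (g(f(a + delta)) - g(f(a))) / run


a = 1.0  # arbitrary sample point (an assumption for this illustration)
tangent = dg_dz(f(a))
for delta in (1e-1, 1e-2, 1e-3, 1e-4):
    slope = secant_slope(a, delta)
    # The gap between secant slope and tangent slope should shrink with delta.
    print(delta, slope, abs(slope - tangent))
```

Because this f is continuous, f (a + δ) − f (a) shrinks along with δ, which is exactly the “if” on which Eq. (3.6) depends.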
  • 9. Figure 7: Focusing on the curve for g. We’ve added the secant line as preparation for considering what occurs when δ → 0.

Figure 8: The tangent that the secant line shown in Fig. 7 approaches if f (a + δ) → f (a) as δ → 0. The slope of the tangent is $\left.\frac{dg(z)}{dz}\right|_{z=f(a)}$.

But notice the “if”: as we know, not all functions behave as stated for every real number.

(Margin note: Functions can also be piecewise continuous; that is, continuous on certain intervals. The same arguments that we’re using here work for that type of function as well.)

Where does that realization leave us? The bad news is that if we’re dealing with a function f and a number a such that f (a + δ) does not go to f (a) as δ goes to zero, then we’re stuck: nothing can be done. The good news is that many common functions do have the required behavior: they’re the type that mathematicians call continuous. That is, the functions u (z) such that for every real number c,

$$\lim_{z \to c^{+}} u(z) = \lim_{z \to c^{-}} u(z) = u(c).$$

Polynomials and sin x are examples of continuous functions.

The need for f (z) to be continuous at a becomes apparent when f has the behavior shown in Fig. 9. In such a case, the point on the horizontal axis for
  • 10. f (a + δ) will alternate between being to the left and the right of that for f (a). Nevertheless, the length of the interval will shrink to zero as δ itself goes to zero.

Figure 9: A function for which f (a + δ) would be equal to f (a) several times as δ → 0; at each of those values of δ, the denominator in Eq. (3.6) would be zero.

Note that for some values of δ in Fig. 9, f (a + δ) = f (a), making the denominator zero in the limit in Eq. (3.6). Those subtleties require treatment that is beyond the scope of this document.

However, those considerations should not distract us from what we’ve accomplished: accepting the restriction that f must be continuous at a is a small price to pay for being able to reduce a monstrosity like

$$\lim_{\delta \to 0} \frac{g[f(a+\delta)] - g[f(a)]}{f(a+\delta) - f(a)} \quad \text{to} \quad \left.\frac{dg(z)}{dz}\right|_{z=f(a)}.$$

Using that result, and recalling the restriction that we noted at Eq. (3.5), we can write that if f is continuous at a, if $\frac{df(x)}{dx}$ exists at a, and if $\frac{dg(z)}{dz}$ exists at f (a), then

$$\left.\frac{d\,g[f(x)]}{dx}\right|_{x=a} = \left.\frac{df(x)}{dx}\right|_{x=a} \left.\frac{dg(z)}{dz}\right|_{z=f(a)}. \tag{3.7}$$

The right-hand side of that equation is messy because of the (apparently) different variables x and z. To clean it up, we can ask ourselves what those variables mean in this context. We’ll start with $\left.\frac{df(x)}{dx}\right|_{x=a}$: that expression means “the rate of change of the dependent variable f with respect to its independent variable, when the value of the latter is a”. In the context of our present problem, x and z refer to the same variable. Therefore, we’ll use z, and rewrite Eq. (3.7) as

$$\left.\frac{d\,g[f(x)]}{dx}\right|_{x=a} = \left.\frac{df(z)}{dz}\right|_{z=a} \left.\frac{dg(z)}{dz}\right|_{z=f(a)}. \tag{3.8}$$

Let’s test that result on the function sin 2x, which we used in our motivational example. Our f (x) is 2x, and our g is the sine function. Table 1 shows the procedure for finding the derivative of sin 2x according to Eq. (3.8).
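The same test of Eq. (3.8) on sin 2x can be sketched in code. This is a minimal illustration, not the author’s procedure: the helper names `chain_rule_at` and `central_difference`, the sample point a = 0.7, and the step size are all assumptions introduced here. The first helper evaluates the right-hand side of Eq. (3.8); the second gives an independent numerical estimate of the derivative of g [f (x)] for comparison.

```python
import math


def chain_rule_at(df_dz, dg_dz, f, a):
    """Right-hand side of Eq. (3.8): df/dz at z = a, times dg/dz at z = f(a)."""
    return df_dz(a) * dg_dz(f(a))


def central_difference(h, a, step=1e-6):
    """Independent numerical estimate of dh/dx at x = a (the step size is an assumption)."""
    return (h(a + step) - h(a - step)) / (2.0 * step)


# The sin 2x example: f(z) = 2z and g(z) = sin z.
def f(z):
    return 2.0 * z


def df_dz(z):
    return 2.0


g = math.sin
dg_dz = math.cos

a = 0.7  # arbitrary sample point (an assumption for this illustration)
formula = chain_rule_at(df_dz, dg_dz, f, a)
estimate = central_difference(lambda x: g(f(x)), a)

# All three printed values should agree closely with 2 cos 2a.
print(formula, 2.0 * math.cos(2.0 * a), estimate)
```

Note that `chain_rule_at` evaluates `dg_dz` at f (a), not at a; that is the point of the subscript z = f (a) in Eq. (3.8).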
  • 11. Table 1: Implementation of Eq. (3.8)

| Step | Implementation for g [f (x)] = sin 2x |
| --- | --- |
| Identify f (z) and g (z) | f (z) = 2z, g (z) = sin z |
| Identify formulas for $\frac{df(z)}{dz}$ and $\frac{dg(z)}{dz}$ | $\frac{d\,2z}{dz} = 2$, and $\frac{d \sin z}{dz} = \cos z$ |
| Evaluate $\frac{df(z)}{dz}$ at z = a, and $\frac{dg(z)}{dz}$ at z = f (a) | $\left. 2 \right\vert_{z=a} = 2$, and $\left. \cos z \right\vert_{z=2a} = \cos 2a$ |
| Apply $\left.\frac{d\,g[f(x)]}{dx}\right\vert_{x=a} = \left.\frac{df(z)}{dz}\right\vert_{z=a} \left.\frac{dg(z)}{dz}\right\vert_{z=f(a)}$ | $\left.\frac{d \sin 2x}{dx}\right\vert_{x=a} = [2]\,[\cos 2a] = 2 \cos 2a$ |

Because the procedure worked for an arbitrary a, we now invoke the Law of Universal Generalization to write that because (1) the function f (z) = 2z is continuous for all values of z, (2) $\frac{d\,2z}{dz}$ exists at all z, and (3) $\frac{d \sin z}{dz} = \cos z$ exists at all z,

$$\frac{d \sin 2x}{dx} = 2 \cos 2x.$$

Now that we’re sure our method for finding the derivative of a compound function is sound, we’ll want to compare our method to the standard formulation of the Chain Rule.

4 Comparison with the Usual Form of the Chain Rule

A typical presentation of the Chain Rule is

If a variable g depends on the variable f, which itself depends on the variable x, so that f and g are therefore dependent variables, then g, via the intermediate variable of f, depends on x as well. The chain rule then states [1]

$$\frac{dg}{dx} = \frac{dg}{df}\,\frac{df}{dx}. \tag{4.1}$$

[1] Paraphrase of Wikipedia’s article “Chain rule”, accessed 22 July 2018.
  • 12. Eq. (4.1) is identical to our “intuitive” Eq. (3.2). In both, the derivative $\frac{df}{dx}$ is the customary way of writing the generalization of our $\left.\frac{df(z)}{dz}\right|_{z=a}$ (in Eq. (3.8)) to the whole set of real numbers. (Or more accurately, to those at which $\frac{df}{dz}$ exists.)

But what about the factor $\frac{dg}{df}$ in Eqs. (3.2) and (4.1)? It must be equal to the factor $\left.\frac{dg(z)}{dz}\right|_{z=f(a)}$ in our Eq. (3.8). Can we establish that equality rigorously?

For any given f, g, and a for which the limit

$$\lim_{\delta \to 0} \frac{g[f(a+\delta)] - g[f(a)]}{f(a+\delta) - f(a)}$$

exists, that limit is a specific real number. Let’s review the analysis through which we established that

$$\lim_{\delta \to 0} \frac{g[f(a+\delta)] - g[f(a)]}{f(a+\delta) - f(a)} = \left.\frac{dg(z)}{dz}\right|_{z=f(a)},$$

with the restriction that f must be continuous at a. We accepted that restriction because it ensured that [f (a + δ) − f (a)] → 0 as δ → 0. Therefore, the expressions

$$\lim_{\delta \to 0} \frac{g[f(a+\delta)] - g[f(a)]}{f(a+\delta) - f(a)} \quad \text{and} \quad \lim_{[f(a+\delta) - f(a)] \to 0} \frac{g[f(a+\delta)] - g[f(a)]}{f(a+\delta) - f(a)}$$

are the same number. The latter expression is the definition, given in the form of a limit, of $\left.\frac{dg}{df}\right|_{f=f(a)}$. We’d generalize that result by writing simply $\frac{dg}{df}$.

Thus, $\frac{dg}{df}$ in Eqs. (3.2) and (4.1) is indeed the generalization of the factor $\left.\frac{dg(z)}{dz}\right|_{z=f(a)}$ that we identified in our own version of the Chain Rule.