Understanding the "Chain Rule" for Derivatives by Deriving Your Own Version

July 29, 2018
James Smith
nitac14b@yahoo.com
https://mx.linkedin.com/in/james-smith-1b195047
Abstract
Because the Chain Rule can confuse students as much as it helps them
solve real problems, we put ourselves in the shoes of the mathematicians
who derived it, so that students may understand the motivation for the
rule; its limitations; and why textbooks present it in its customary form.
We begin by finding the derivative of sin 2x without using the Chain
Rule. That exercise, having shown that even a comparatively simple
compound function can be bothersome to differentiate using the definition
of the derivative as a limit, provides the motivation for developing our
own formula for the derivative of the general compound function g [f (x)].
In the course of that development, we see why the function f must be
continuous at any value of x to which the formula is applied. We finish by
comparing our formula to that which is commonly given.

Contents
1 Introduction
2 Motivational Example: Derivative of sin 2x from the Definition of the Derivative as a Limit
3 Looking for an Easier Route
4 Comparison with the Usual Form of the Chain Rule

1 Introduction
When we are learning a new technique in mathematics, we benefit from familiarizing
ourselves with the type of problem that the method was developed to solve.
We also benefit from struggling with a few problems of that sort before being
shown the technique in its modern form. In that way, we are better prepared
to understand that version of that technique, as well as its derivation. We also
become better problem-solvers in general.
In this document, we will develop our own formula for the derivative of a
composite function, then compare it to a version of the Chain Rule that is found
in many standard calculus textbooks. As a motivational example (that is, to
help us see why some sort of Chain Rule would be desirable), we’ll begin by
finding
\[
\frac{d \sin 2x}{dx}
\]
using the definition
\[
\left.\frac{du(x)}{dx}\right|_{x=a} = \lim_{\delta \to 0} \frac{u(a+\delta) - u(a)}{\delta}, \tag{1.1}
\]
followed by the Law of Universal Generalization.
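To get a feel for the quantity inside the limit in Eq. (1.1), the following Python sketch evaluates the difference quotient for u(x) = sin 2x at shrinking values of δ. The helper name `difference_quotient` and the sample point a = 0.7 are our own choices for illustration, not part of the derivation.

```python
import math

def difference_quotient(u, a, delta):
    """Return [u(a + delta) - u(a)] / delta, whose limit as delta -> 0 defines du/dx at x = a."""
    return (u(a + delta) - u(a)) / delta

def u(x):
    return math.sin(2 * x)

a = 0.7  # arbitrary sample point (an assumption for illustration)
for delta in (1e-1, 1e-2, 1e-3, 1e-4):
    print(delta, difference_quotient(u, a, delta))
```

The printed values should settle toward a single number as δ shrinks; the next section finds that number analytically.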
2 Motivational Example: Derivative of sin 2x from the Definition of the Derivative as a Limit
For the real number a, arbitrary, the definition in Eq. (1.1) gives
\[
\left.\frac{d \sin 2x}{dx}\right|_{x=a} = \lim_{\delta \to 0} \frac{\sin 2(a+\delta) - \sin 2a}{\delta}.
\]

The trig identities that we need:
(1) sin 2θ = 2 sin θ cos θ
(2) sin (α + β) = sin α cos β + cos α sin β
(3) cos (α + β) = cos α cos β − sin α sin β.
The next several steps use trigonometric identities for sums and doubles of
angles to transform the right-hand side.
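\[
\begin{aligned}
\left.\frac{d \sin 2x}{dx}\right|_{x=a}
&= \lim_{\delta \to 0} \frac{2 \sin(a+\delta) \cos(a+\delta) - \sin 2a}{\delta} \\
&= \lim_{\delta \to 0} \frac{2 [\sin(a+\delta)] [\cos(a+\delta)] - \sin 2a}{\delta} \\
&= \lim_{\delta \to 0} \frac{2 [\sin a \cos \delta + \cos a \sin \delta] [\cos a \cos \delta - \sin a \sin \delta] - \sin 2a}{\delta} \\
&= \lim_{\delta \to 0} \frac{2 \sin a \cos a \left[\cos^2 \delta - \sin^2 \delta - 1\right] + 2 \left[\cos^2 a - \sin^2 a\right] \sin \delta \cos \delta}{\delta} \\
&= \lim_{\delta \to 0} \frac{2 \sin a \cos a \left[-2 \sin^2 \delta\right] + 2 \cos 2a \sin \delta \cos \delta}{\delta}.
\end{aligned}
\]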
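Next, we transform the right-hand side in a way that will enable us to use theorems
about limits.
\[
\left.\frac{d \sin 2x}{dx}\right|_{x=a}
= \lim_{\delta \to 0} \left[ \frac{-4 \sin a \cos a \sin^2 \delta}{\delta} + \frac{2 \cos 2a \sin \delta \cos \delta}{\delta} \right].
\]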
Now, we’ll use those theorems about limits.
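\[
\begin{aligned}
\left.\frac{d \sin 2x}{dx}\right|_{x=a}
&= \lim_{\delta \to 0} \left[ \frac{-4 \sin a \cos a \sin^2 \delta}{\delta} \right] + \lim_{\delta \to 0} \left[ \frac{2 \cos 2a \sin \delta \cos \delta}{\delta} \right] \\
&= -4 \sin a \cos a \lim_{\delta \to 0} \left[ \frac{\sin \delta}{\delta} \sin \delta \right] + 2 \cos 2a \lim_{\delta \to 0} \left[ \frac{\sin \delta}{\delta} \cos \delta \right] \\
&= -4 \sin a \cos a \underbrace{\left[ \lim_{\delta \to 0} \frac{\sin \delta}{\delta} \right]}_{=1} \underbrace{\left[ \lim_{\delta \to 0} \sin \delta \right]}_{=0} + 2 \cos 2a \underbrace{\left[ \lim_{\delta \to 0} \frac{\sin \delta}{\delta} \right]}_{=1} \underbrace{\left[ \lim_{\delta \to 0} \cos \delta \right]}_{=1} \\
&= 2 \cos 2a.
\end{aligned}
\]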
We conclude by saying that because a was an arbitrary real number, the
result is valid for all real numbers. Customarily, we communicate that conclusion
by writing
\[
\frac{d \sin 2x}{dx} = 2 \cos 2x.
\]
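The algebra above is easy to get wrong, so a numerical spot-check may be reassuring. The Python sketch below (with function names that are ours) compares the original numerator sin 2(a + δ) − sin 2a with the rewritten form −4 sin a cos a sin²δ + 2 cos 2a sin δ cos δ, and then prints the quotient next to 2 cos 2a for comparison.

```python
import math

def numerator_direct(a, delta):
    """sin 2(a + delta) - sin 2a, straight from the definition."""
    return math.sin(2 * (a + delta)) - math.sin(2 * a)

def numerator_rewritten(a, delta):
    """The same numerator after applying the double-angle and sum identities."""
    return (-4 * math.sin(a) * math.cos(a) * math.sin(delta) ** 2
            + 2 * math.cos(2 * a) * math.sin(delta) * math.cos(delta))

a = 1.2  # arbitrary sample point (an assumption for illustration)
for delta in (1e-1, 1e-3, 1e-5):
    mismatch = numerator_direct(a, delta) - numerator_rewritten(a, delta)
    quotient = numerator_rewritten(a, delta) / delta
    print(delta, mismatch, quotient, 2 * math.cos(2 * a))
```

The mismatch column should stay at the level of floating-point rounding, which is what we would expect if the two forms are algebraically identical.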
WOW! That was a lot of work to find the derivative of such a simple function.
We might well dread trying to find the derivative of, say, $\sin \sqrt{1 + \log x}$ by the
same route, that is, by starting from the definition in Eq. (1.1). Let’s see if
we can find a better idea.
3 Looking for an Easier Route
Anyone who has solved math problems “by hand” or with spreadsheets has
seen several benefits of treating functions like sin 2x as composites of the form
v [u (x)]. (In the case of sin 2x, u (x) = 2x, and v is the sine function.) Therefore,
why not attempt to derive a formula for the derivative of the generic composite
function g [f (x)]? We want our formula to be applicable to as many types of
functions as possible, so we’ll accept restrictions upon g, f, and their domains
only when necessary. We’ll begin by writing, for x = a, arbitrary,
\[
\left.\frac{dg[f(x)]}{dx}\right|_{x=a} = \lim_{\delta \to 0} \frac{g[f(a+\delta)] - g[f(a)]}{\delta}. \tag{3.1}
\]
The expression on the right-hand side appears unhelpful, so we’ll look for
ideas that might suggest ways to transform it. We’re searching for notions, so
for now we won’t pay much attention to rigor—time for that later. If we bear in
mind that $\frac{dg[f(x)]}{dx}$ is “the rate of change of g with respect to x”, we might jot
down (informally)
\[
\text{Rate of change of } g \text{ with respect to } x
= \left[ \text{Rate of change of } f \text{ with respect to } x \right]
\left[ \text{Rate of change of } g \text{ with respect to } f \right].
\]
Continuing to think informally, we might rewrite that note as
\[
\frac{dg}{dx} = \frac{df}{dx} \, \frac{dg}{df}. \tag{3.2}
\]
This idea has intuitive appeal. Let’s test it on our result for $\frac{d \sin 2x}{dx}$. As
we noted above, our “f” in that case is 2x, so $\frac{df}{dx}$ would be 2. Our “g” is the sine
function. Viewing 2x as a single variable, the derivative of sin 2x with respect
to 2x would be cos 2x. Thus, $\frac{dg}{df}$ in our case would be cos 2x. Putting these
ideas together,
\[
\frac{d \sin 2x}{dx} = \frac{df}{dx} \, \frac{dg}{df} = [2] \cos 2x = 2 \cos 2x,
\]
which is the result that we obtained in our motivational example.
Although we would appear to be on the right track, we can’t trust our idea
that $\frac{dg}{dx} = \frac{df}{dx} \, \frac{dg}{df}$ without deriving it rigorously (for example, from Eq. (3.1)) and
expressing it clearly. How might we do that? Recognizing that Eq. (3.1) refers
specifically to the value of the derivative for x = a, we might write
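\[
\text{Rate of change of } g \text{ with respect to } x
= \left[ \left.\frac{df(x)}{dx}\right|_{x=a} \right]
\boxed{\text{Rate of change of } g \text{ with respect to } f \text{ at } x = a},
\]
and therefore
\[
\left.\frac{dg[f(x)]}{dx}\right|_{x=a}
= \left[ \lim_{\delta \to 0} \frac{f(a+\delta) - f(a)}{\delta} \right]
\boxed{\text{Rate of change of } g \text{ with respect to } f \text{ at } x = a}. \tag{3.3}
\]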
At this point, we might note that the quantity in the box on the right-hand
side is a limit of “something” as δ → 0:
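\[
\left.\frac{dg[f(x)]}{dx}\right|_{x=a}
= \left[ \lim_{\delta \to 0} \frac{f(a+\delta) - f(a)}{\delta} \right]
\left[ \lim_{\delta \to 0} \left[ \text{“Something”} \right] \right]. \tag{3.4}
\]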
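But what is that “Something”? Comparing the right-hand sides of Eqs. (3.1)
and (3.4), and using the theorem that “the limit of a product of functions is the
product of the functions’ limits”, we reason as follows:
\[
\begin{aligned}
\left.\frac{dg[f(x)]}{dx}\right|_{x=a} &= \left.\frac{dg[f(x)]}{dx}\right|_{x=a} \\
\left[ \lim_{\delta \to 0} \frac{f(a+\delta) - f(a)}{\delta} \right] \left[ \lim_{\delta \to 0} \left[ \text{“Something”} \right] \right]
&= \lim_{\delta \to 0} \frac{g[f(a+\delta)] - g[f(a)]}{\delta} \\
\lim_{\delta \to 0} \left\{ \left[ \frac{f(a+\delta) - f(a)}{\delta} \right] \left[ \text{“Something”} \right] \right\}
&= \lim_{\delta \to 0} \frac{g[f(a+\delta)] - g[f(a)]}{\delta} \\
\therefore \ \text{“Something”} &= \frac{g[f(a+\delta)] - g[f(a)]}{f(a+\delta) - f(a)}.
\end{aligned}
\]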
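Now, we can return to Eq. (3.4) to write
\[
\begin{aligned}
\left.\frac{dg[f(x)]}{dx}\right|_{x=a}
&= \left[ \lim_{\delta \to 0} \frac{f(a+\delta) - f(a)}{\delta} \right]
\left[ \lim_{\delta \to 0} \frac{g[f(a+\delta)] - g[f(a)]}{f(a+\delta) - f(a)} \right] \\
&= \left[ \left.\frac{df(x)}{dx}\right|_{x=a} \right]
\left[ \lim_{\delta \to 0} \frac{g[f(a+\delta)] - g[f(a)]}{f(a+\delta) - f(a)} \right].
\end{aligned} \tag{3.5}
\]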
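Before taking limits, it may help to confirm numerically that multiplying the two quotients in Eq. (3.5) really does reproduce the quotient in Eq. (3.1) for a fixed nonzero δ. The Python sketch below does so for the sample functions f(x) = x³ + x and g = sin. Both functions, the point a, and the helper name `split_quotient` are assumptions chosen for illustration (f is strictly increasing, so f(a + δ) − f(a) is never zero here).

```python
import math

def f(x):
    return x ** 3 + x

def g(z):
    return math.sin(z)

def split_quotient(a, delta):
    """Return (delta_f / delta, delta_g / delta_f, delta_g / delta) for g[f(x)] at x = a."""
    delta_f = f(a + delta) - f(a)
    delta_g = g(f(a + delta)) - g(f(a))
    return delta_f / delta, delta_g / delta_f, delta_g / delta

a = 0.5
for delta in (1e-1, 1e-2, 1e-3):
    first, second, whole = split_quotient(a, delta)
    # The product of the two factors should match the undivided quotient.
    print(delta, first * second, whole)
```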
A restriction upon the formula
that we’re attempting to
develop: the derivative of f with
respect to x must exist at x = a.
Having written that result, we need to recall that it is true only if the derivative
of f with respect to x exists at a.
We’ve just placed our first restriction upon whatever result we may obtain
from our derivation.
Our question now is what to do with the remaining limit on the right-hand
side of Eq. (3.5):
\[
\lim_{\delta \to 0} \frac{g[f(a+\delta)] - g[f(a)]}{f(a+\delta) - f(a)}.
\]
That’s quite a “busy” expression, so let’s draw a graph to help us “get our minds
around it”. We’ll start with a graph of f (x) (Fig. 1).

Figure 1: Our first step in constructing a graph that might help us to understand
the limit on the right-hand side of Eq. (3.5): the graph of f (x) .
Figure 2: To eliminate a possible confusion, we’ve graphed both f and g as
functions of the real variable z.
Next, we’ll want to add the graph of g [f (x)]. But how do we do that, on a
graph whose horizontal axis is x?
At this point, we might realize that we’ve been a little careless with our use
of symbols. We’re accustomed to using the single symbol “x” both to represent
the independent variable in a problem (as we have here with f (x)), and as a
coordinate along the real-number line (as in our graph). That dual meaning
seldom causes trouble for us, but now it has.
“Defined” means that for every
real number b, there exists a
unique real number f (b).
Similarly for g.
To find a way forward, let’s consider the case where both f and g are
defined for every real number z. (We’ll discuss more-complicated cases later.)
We’ll graph both functions in that way (Fig. 2). Now, along the horizontal axis,
we’ll locate the point for the number a at which we’re evaluating our derivative
$\frac{dg[f(x)]}{dx}$. On the vertical axis, we’ll locate the point for the number f (a) (Fig. 3).

Figure 3: Along the horizontal axis, we’ve located the point for the number a
at which we’re evaluating our derivative $\frac{dg[f(x)]}{dx}$. On the vertical axis, we’ve
located the point for the number f (a).
Figure 4: After locating the point on the horizontal axis for the number f (a).
The set-theoretic concept of a
function may be helpful here:
The function f is the set of
ordered pairs (b, c) such that no
two pairs have b as their first
element. We can call c “the
value of f at z = b”. When
making a graph of f, we
“highlight” those points whose
horizontal coordinate is the first
element of some pair, and whose
vertical coordinate is the second
element of that same pair.
Because the function g is defined for every real number, it’s defined for the
specific real number f (a). Therefore, our next step is to locate the point on the
horizontal axis for that number (Fig. 4). The value of g, evaluated at z = f (a),
is some specific real number that we’ll write as g [f (a)]. In Fig. 5, we locate the
point for that number along the vertical axis.
We seem to be progressing, but we’ve yet to incorporate δ. We should know
how to do that; we did it many times in our first classes on derivatives as limits.
Still, before adding to our graph the points that involve δ, we want to think a
bit about our goal: we want to understand what happens when δ approaches
zero. To that end, we first attempt to understand the situation that exists when
δ is some suitably small, non-zero number. Taking δ as positive for the time
being (Fig. 6), we locate the point for the number z = a + δ on the horizontal

Figure 5: After locating the point on the vertical axis for the number g [f (a)].
Figure 6: After locating the point for g [f (a + δ)].
axis, and that for the number f (a + δ) on the vertical axis. Then, we locate
f (a + δ) on the horizontal axis, and g [f (a + δ)] on the vertical axis.
We’re ready, finally, to investigate $\lim_{\delta \to 0} \frac{g[f(a+\delta)] - g[f(a)]}{f(a+\delta) - f(a)}$. To avoid
distractions, we’ll eliminate the portions of our graph that don’t concern g (Fig.
7). We’ll also draw a straight line connecting the indicated points on the curve
for g (z).
We can let our early experiences with derivatives as limits guide us now.
As the interval between z = f (a) and z = f (a + δ) shrinks, the straight line
that we drew becomes the line tangent to the graph of g at z = f (a) (Fig. 8).
The slope of that tangent line is $\left.\frac{dg(z)}{dz}\right|_{z=f(a)}$. Therefore, if f (a + δ) − f (a)
goes to zero as δ itself shrinks to zero, then
\[
\lim_{\delta \to 0} \frac{g[f(a+\delta)] - g[f(a)]}{f(a+\delta) - f(a)} = \left.\frac{dg(z)}{dz}\right|_{z=f(a)}. \tag{3.6}
\]
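Eq. (3.6) can be probed numerically as well. In this Python sketch, f(x) = x² and g = exp are sample choices of ours (f is continuous everywhere, and f(a + δ) ≠ f(a) for the small positive δ used at a = 1.5). The ratio on the left-hand side of Eq. (3.6) should drift toward dg/dz evaluated at z = f(a), which for g = exp is simply exp[f(a)].

```python
import math

def f(x):
    return x * x

def g(z):
    return math.exp(z)

def inner_ratio(a, delta):
    """{g[f(a + delta)] - g[f(a)]} / {f(a + delta) - f(a)}, the ratio in Eq. (3.6)."""
    return (g(f(a + delta)) - g(f(a))) / (f(a + delta) - f(a))

a = 1.5
target = math.exp(f(a))  # dg/dz at z = f(a), since d(exp z)/dz = exp z
for delta in (1e-1, 1e-2, 1e-3, 1e-4):
    print(delta, inner_ratio(a, delta), target)
```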

Figure 7: Focusing on the curve for g. We’ve added the secant line as preparation
for considering what occurs when δ → 0.
Figure 8: The tangent that the secant line shown in Fig. 7 approaches if
f (a + δ) → f (a) as δ → 0. The slope of the tangent is $\left.\frac{dg(z)}{dz}\right|_{z=f(a)}$.
But notice the “if”: as we know, not all functions behave as stated for every
real number.
Functions can also be piecewise
continuous; that is, continuous
on certain intervals. The same
arguments that we’re using here
work for that type of function as
well.
Where does that realization leave us? The bad news is that if we’re dealing
with a function f and a number a such that f (a + δ) does not go to f (a) as
δ goes to zero, then we’re stuck: nothing can be done. The good news is that
many common functions do have the required behavior: they’re the type that
mathematicians call continuous. That is, the functions u (z) such that for every
real number c,
\[
\lim_{z \to c^{+}} u(z) = \lim_{z \to c^{-}} u(z) = u(c).
\]
Polynomials and sin x are examples of continuous functions.
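For contrast, here is a small Python sketch of a function that fails the requirement. The step function `step` below is a hypothetical example of ours: at a = 1, its value for inputs just to the left of a stays at 0 no matter how small δ becomes, so f(a − δ) does not approach f(a).

```python
def step(x):
    """A function with a jump at x = 1: 0 to the left of 1, and 1 at and to the right of 1."""
    return 0.0 if x < 1.0 else 1.0

a = 1.0
for delta in (1e-1, 1e-3, 1e-6):
    left_gap = step(a - delta) - step(a)    # does not shrink as delta shrinks
    right_gap = step(a + delta) - step(a)   # zero on this side
    print(delta, left_gap, right_gap)
```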
The need for f (z) to be continuous at a becomes apparent when f has the
behavior shown in Fig. 9. In such a case, the point on the horizontal axis for

Figure 9: A function for which f (a + δ) would be equal to f (a) several times
as δ → 0, at each of which the denominator in Eq. (3.6) would be zero. Those
subtleties require treatment that is beyond the scope of this document.
f (a + δ) will alternate between being to the left and the right of that for f (a).
Nevertheless, the length of the interval will shrink to zero as δ itself goes to
zero. Note that for some values of δ in Fig. 9, f (a + δ) = f (a), making the
denominator zero in the limit in Eq. (3.6). Those subtleties require treatment
that is beyond the scope of this document.
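The simplest function for which the denominator in Eq. (3.6) vanishes is a constant one, for which f(a + δ) = f(a) for every δ rather than only occasionally as in Fig. 9. The Python sketch below (the constant 5 and the function names are our own illustrative choices) shows that the ratio in Eq. (3.6) is then undefined, even though the quotient in Eq. (3.1) is perfectly well behaved. It illustrates the difficulty only; it does not resolve it.

```python
import math

def f(x):
    return 5.0  # constant, so f(a + delta) - f(a) is exactly zero

def g(z):
    return math.sin(z)

def quotient_3_1(a, delta):
    """The quotient in Eq. (3.1): {g[f(a + delta)] - g[f(a)]} / delta."""
    return (g(f(a + delta)) - g(f(a))) / delta

def ratio_3_6(a, delta):
    """The ratio in Eq. (3.6); returns None when its denominator is zero."""
    denominator = f(a + delta) - f(a)
    if denominator == 0:
        return None
    return (g(f(a + delta)) - g(f(a))) / denominator

for delta in (1e-1, 1e-3):
    print(delta, quotient_3_1(2.0, delta), ratio_3_6(2.0, delta))
```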
However, those considerations should not distract us from what we’ve accomplished:
accepting the restriction that f must be continuous at a is a small price
to pay for being able to reduce a monstrosity like $\lim_{\delta \to 0} \frac{g[f(a+\delta)] - g[f(a)]}{f(a+\delta) - f(a)}$
to $\left.\frac{dg(z)}{dz}\right|_{z=f(a)}$. Using that result, we can write that if f is continuous at a,
and if $\frac{dg(z)}{dz}$ is continuous at f (a), then
\[
\left.\frac{dg[f(x)]}{dx}\right|_{x=a} = \left[ \left.\frac{df(x)}{dx}\right|_{x=a} \right] \left[ \left.\frac{dg(z)}{dz}\right|_{z=f(a)} \right]. \tag{3.7}
\]
The right-hand side of that equation is messy because of the (apparently) different
variables x and z. To clean it up, we can ask ourselves what those variables mean
in this context. We’ll start with $\left.\frac{df(x)}{dx}\right|_{x=a}$: that expression means “the rate
of change of the dependent variable f with respect to its independent variable,
when the value of the latter is a”. In the context of our present problem, x and
z refer to the same variable. Therefore, we’ll use z, and rewrite Eq. (3.7) as
\[
\left.\frac{dg[f(x)]}{dx}\right|_{x=a} = \left[ \left.\frac{df(z)}{dz}\right|_{z=a} \right] \left[ \left.\frac{dg(z)}{dz}\right|_{z=f(a)} \right]. \tag{3.8}
\]
Let’s test that result on the function sin 2x, which we used in our motivational
example. Our f (x) is 2x, and our g is the sine function. Table 1 shows the
procedure for finding the derivative of sin 2x according to Eq. (3.8).

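Table 1: Implementation of Eq. (3.8)

| Step | Implementation for $g[f(x)] = \sin 2x$ |
|---|---|
| Identify $f(z)$ and $g(z)$ | $f(z) = 2z$, $g(z) = \sin z$ |
| Identify formulas for $\frac{df(z)}{dz}$ and $\frac{dg(z)}{dz}$ | $\frac{d(2z)}{dz} = 2$, and $\frac{d \sin z}{dz} = \cos z$ |
| Evaluate $\frac{df(z)}{dz}$ at $z = a$, and $\frac{dg(z)}{dz}$ at $z = f(a)$ | $2 \big\rvert_{z=a} = 2$; and $\cos z \big\rvert_{z=2a} = \cos 2a$ |
| $\left.\frac{dg[f(x)]}{dx}\right\rvert_{x=a} = \left[\left.\frac{df(z)}{dz}\right\rvert_{z=a}\right] \left[\left.\frac{dg(z)}{dz}\right\rvert_{z=f(a)}\right]$ | $\left.\frac{d \sin 2x}{dx}\right\rvert_{x=a} = [2] [\cos 2a]$ |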
Because the procedure worked, we now invoke the Law of Universal Generalization
to write that because (1) the function f = 2z is continuous for all
values of z, (2) $\frac{d(2z)}{dz}$ exists at all z, and (3) cos z exists at all z,
\[
\frac{d \sin 2x}{dx} = 2 \cos 2x.
\]
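The steps of Table 1 translate directly into a few lines of Python. This is a sketch under our own naming (`chain_rule_at` is a hypothetical helper), assuming that the caller supplies formulas for df/dz and dg/dz that are valid at the points where they are evaluated.

```python
import math

def chain_rule_at(f, df_dz, dg_dz, a):
    """Evaluate the right-hand side of Eq. (3.8): [df/dz at z = a] * [dg/dz at z = f(a)]."""
    return df_dz(a) * dg_dz(f(a))

# Step 1: identify f(z) and g(z) for g[f(x)] = sin 2x.
f = lambda z: 2 * z
g = math.sin
# Step 2: identify formulas for their derivatives.
df_dz = lambda z: 2.0
dg_dz = math.cos
# Steps 3 and 4: evaluate at z = a and z = f(a), then multiply.
a = 0.4  # arbitrary sample point
print(chain_rule_at(f, df_dz, dg_dz, a), 2 * math.cos(2 * a))
```

Keeping f, df/dz, and dg/dz as separate arguments mirrors the table: each row becomes one input, and only the final row performs arithmetic.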
Now that we’re sure our method for finding the derivative of a compound
function is sound, we’ll want to compare our method to the standard formulation
of the Chain Rule.
4 Comparison with the Usual Form of the Chain Rule
A typical presentation of the Chain Rule is
If a variable g depends on the variable f, which itself depends on the
variable x, so that f and g are therefore dependent variables, then g,
via the intermediate variable of f, depends on x as well. The chain
rule then states¹
\[
\frac{dg}{dx} = \frac{dg}{df} \, \frac{df}{dx}. \tag{4.1}
\]
¹ Paraphrase of Wikipedia’s article “Chain rule”, accessed 22 July 2018.

Eq. (4.1) is identical to our “intuitive” Eq. (3.2). In both, the derivative $\frac{df}{dx}$
is the customary way of writing the generalization of our $\left.\frac{df(z)}{dz}\right|_{z=a}$ (in Eq.
(3.8)) to the whole set of real numbers. (Or more accurately, to those at which
$\frac{df}{dz}$ exists.) But what about the factor $\frac{dg}{df}$ in Eqs. (3.2) and (4.1)? It must
be equal to the factor $\left.\frac{dg(z)}{dz}\right|_{z=f(a)}$ in our Eq. (3.8). Can we establish that
equality rigorously?
For any given f, g, and a for
which the limit
$\lim_{\delta \to 0} \frac{g[f(a+\delta)] - g[f(a)]}{f(a+\delta) - f(a)}$
exists, that limit is a specific real
number.
Let’s review the analysis through which we established that
\[
\lim_{\delta \to 0} \frac{g[f(a+\delta)] - g[f(a)]}{f(a+\delta) - f(a)} = \left.\frac{dg(z)}{dz}\right|_{z=f(a)},
\]
with the restriction that f must be continuous at a. We accepted that restriction
because it ensured that [f (a + δ) − f (a)] → 0 as δ → 0. Therefore, the
expressions
\[
\lim_{\delta \to 0} \frac{g[f(a+\delta)] - g[f(a)]}{f(a+\delta) - f(a)}
\quad \text{and} \quad
\lim_{[f(a+\delta) - f(a)] \to 0} \frac{g[f(a+\delta)] - g[f(a)]}{f(a+\delta) - f(a)}
\]
are the same number. The latter expression is the definition, given in the form
of a limit, of $\left.\frac{dg}{df}\right|_{f=f(a)}$. We’d generalize that result by writing simply $\frac{dg}{df}$.
Thus, $\frac{dg}{df}$ in Eqs. (3.2) and (4.1) is indeed the generalization of the factor
$\left.\frac{dg(z)}{dz}\right|_{z=f(a)}$ that we identified in our own version of the Chain Rule.
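As a closing illustration, we can return to the function sin √(1 + log x) that we dreaded in Section 2. The Python sketch below applies the rule twice, once for the sine of a square root and once for the square root of 1 + log x, which goes a step beyond the single composition treated in this document. It assumes that “log” means the natural logarithm, and the function names are ours. A central difference quotient serves as a rough independent check.

```python
import math

def h(x):
    """The composite function sin(sqrt(1 + log x)), real-valued for x >= 1/e."""
    return math.sin(math.sqrt(1 + math.log(x)))

def dh_dx(x):
    """Derivative obtained by applying the Chain Rule twice (valid for x > 1/e)."""
    inner = 1 + math.log(x)   # innermost function, with derivative 1/x
    root = math.sqrt(inner)   # middle function, with derivative 1/(2*root)
    return math.cos(root) * (1 / (2 * root)) * (1 / x)

def central_difference(func, x, step=1e-6):
    """Symmetric difference quotient, used only as a numerical check."""
    return (func(x + step) - func(x - step)) / (2 * step)

x0 = 2.0  # arbitrary sample point inside the domain
print(dh_dx(x0), central_difference(h, x0))
```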