Honors Calculus
Pete L. Clark
Contents
Foreword
Spivak and Me
What is Honors Calculus?
Some Features of the Text
Some Novelties in the Text
Chapter 5. Differentiation
1. Differentiability Versus Continuity
2. Differentiation Rules
3. Optimization
4. The Mean Value Theorem
5. Monotone Functions
6. Inverse Functions I: Theory
7. Inverse Functions II: Examples and Applications
8. Some Complements
Chapter 6. Completeness
1. Dedekind Completeness
2. Intervals and the Intermediate Value Theorem
3. The Monotone Jump Theorem
4. Real Induction
5. The Extreme Value Theorem
6. The Heine-Borel Theorem
7. Uniform Continuity
8. The Bolzano-Weierstrass Theorem For Subsets
9. Tarski's Fixed Point Theorem
Chapter 8. Integration
1. The Fundamental Theorem of Calculus
2. Building the Definite Integral
3. Further Results on Integration
4. Riemann Sums, Dicing, and the Riemann Integral
5. Lebesgue's Theorem
6. Improper Integrals
6. Absolute Convergence
7. Non-Absolute Convergence
8. Power Series I: Power Series as Series
Bibliography
Foreword
Spivak and Me
The document you are currently reading began its life as the lecture notes for a
year-long undergraduate course in honors calculus. Specifically, during the 2011-2012
academic year I taught Math 2400(H) and Math 2410(H), Calculus With Theory,
at the University of Georgia. This is a course for unusually talented and motivated
(mostly first year) undergraduate students. It has been offered for many years at
the University of Georgia, and so far as I know the course text has always been
Michael Spivak's celebrated Calculus [S]. The Spivak text's take on calculus is
sufficiently theoretical that, although it is much beloved by students and practitioners
of mathematics, it is seldom used nowadays as a course text. In fact, the
UGA Math 2400/2410 course is traditionally something of an interpolation between
standard freshman calculus and the fully theoretical approach of Spivak. My own
take on the course was different: I treated it as being a sequel to, rather than an
enriched revision of, freshman calculus.
I began the course with a substantial familiarity with Spivak's text. The summer
after my junior year of high school I interviewed at the University of Chicago
and visited Paul Sally, then (and still now, as I write this in early 2013, though
he is 80 years old) Director of Undergraduate Mathematics at the University of
Chicago. After hearing that I had taken AP calculus and recently completed a
summer course in multivariable calculus at Johns Hopkins, Sally led me to a supply
closet, rummaged through it, and came out with a beat up old copy of Spivak's
text. "This is how we do calculus around here," he said, presenting it to me. During
my senior year at high school I took more advanced math courses at LaSalle University
(which turned out, almost magically, to be located directly adjacent to my
high school) but read through Spivak's text. And I must have learned something
from it, because by the time I went on to college (of course at the University of
Chicago) I placed not into their Spivak calculus course, but the following course,
"Honors Analysis in $\mathbb{R}^n$". This course has the reputation of sharing the honor with
Harvard's Math 55 of being the hardest undergraduate math course that American
universities have to offer. I can't speak to that, but it was certainly the hardest
math course I ever took. There were three ten week quarters. The first quarter was
primarily taught out of Rudin's classic text [R], with an emphasis on metric spaces.
The second quarter treated Lebesgue integration and some Fourier analysis, and
the third quarter treated analysis on manifolds and Stokes's theorem.
However, that was not the end of my exposure to Spivak's text. In my second
year of college I was a grader for the first quarter of Spivak calculus, at the (even
then) amazingly low rate of $20 per student for the entire 10 week quarter. Though
the material in Spivak's text was at least a level below the trial by fire that I had
successfully endured in the previous year, I found that there were many truly difficult
problems in Spivak's text, including a non-negligible percentage that I still
did not know how to solve. Grading for this course solidified my knowledge of this
elementary but important material. It was also my first experience with reading
proofs written by bright but very inexperienced authors: often I would stare at
an entire page of text, one long paragraph, and eventually circle a single sentence
which carried the entire content that the writer was trying to express. I only graded
for one quarter after that, but I was a drop-in tutor for my last three years of
college, meaning that I would field questions from any undergraduate math course
that a student was having trouble with, and I had many further interactions with
Spivak's text. (By the end of my undergraduate career there were a small number
of double-starred problems of which I well knew to steer clear.)
Here is what I remembered about Spivak's text in fall of 2011:
(i) It is an amazing trove of problems, some of which are truly difficult.
(ii) The text itself is lively and idiosyncratic.[1]
(iii) The organization is somewhat eccentric. In particular limits are not touched
until Chapter 5. The text begins with a chapter on "basic properties of numbers",
which are essentially the ordered field axioms, although not called such. Chapter
3 is on Functions, and Chapter 4 is on Graphs. These chapters are essentially
contentless. The text is broken up into five parts, of which the third is "Derivatives
and Integrals".
After teaching the 2400 course for a while, I lost no esteem for Spivak's text,
but increasingly I realized it was not the ideal accompaniment for my course. For
one thing, I realized that my much more vivid memories of the problems than the
text itself had some basis in fact: although Spivak writes with a lively and distinct
voice and has many humorous turns of phrase, the text itself is rather spare. His
words are (very) well chosen, but few. When one takes into account the ample
margins (sometimes used for figures, but most often a white expanse) the chapters
themselves are very short and core-minded. When given the chance to introduce a
subtlety or ancillary topic, Spivak almost inevitably defers it to the problems.
I have had many years to reflect on Spivak's text, and I now think its best use
is in fact the way I first encountered it myself: as a source of self study for bright,
motivated students with little prior background and exposure to university level
mathematics (in our day, this probably means high school students, but I could
imagine a student underwhelmed by an ordinary freshman calculus class for which
Spivak's text would be a similarly mighty gift). Being good for self study is a
high compliment to pay a text, and any such text can a fortiori also be used in a
course... but not in a completely straightforward way. For my course, although the
[1] In between the edition of the book that Sally had given me and the following edition, all
instances of third person pronouns referring to mathematicians had been changed from "he" to
"she". Initially I found this amusing but silly. More recently I have begun doing so myself.
students who stuck with it were very motivated and hard-working, most (or all) of
them needed a lot of help from me in all aspects of the course. I had been informed
in advance by my colleague Ted Shifrin that signing on to teach this course should
entail committing to about twice as many office hours as is typical for an undergraduate
course, and this did indeed come to pass: I ended up having office hours
four days a week, and I spent the majority of them helping the students make their
way through the problems in Spivak's text. Even the best students (who, by the
way, I thought were awfully good) could not solve some of the problems unassisted.
There were many beautiful, multi-part problems introducing extra material that I
wanted my students to experience, and eventually I figured out that the best way
to do this was to incorporate the problems into my lectures as theorems and proofs.
All in all I ended up viewing Spivak's book as being something of a
deconstruction[2] of the material, and much of my teaching time was spent reconstructing it.
For me, the lightness of touch of Spivak's approach was ultimately something I
appreciated aesthetically but could see was causing my students difficulty at various
key points. My experience rather convinced me that "more is more": students
wanted to see more arguments in complete detail, and also simply more proofs overall.
Aside from the (substantial) content conveyed in the course, a major goal is to
give the students facility with reading, understanding, constructing and critiquing
proofs. In this regard being in a classroom setting with other motivated students is
certainly invaluable, and I tried to take advantage of this by insisting that students
present their questions and arguments to each other as well as to me.
But students also learn about how to reason and express themselves mathematically
by being repeatedly exposed to careful, complete proofs. I think experienced
mathematicians can forget the extent to which excellent, even brilliant, students
benefit from this exposure. Of course they also benefit immensely, probably more
so, by working things out for themselves. The value of this is certainly not in
question, and there is no competition here: the amount of things to do in honors
calculus is infinite (and the repository of problems in Spivak's text is nearly so),
so by working out more for herself, the instructor is not leaving less for the students
to do, but only different things for them to do.
This brings me to the current text. As explained above, it is heavily indebted
to [S]. However, it is (or at least, I mean it to be) a new honors calculus text,
and not merely a gloss of [S]. Indeed:
The text is indebted to [S], but not uniquely so. It is also heavily indebted
to Rudin's classic text [R], and it borrows at key points from several other sources,
e.g. [A], [Go], [H], [L].
I do not view this borrowing as being in any way detrimental, and I certainly do
not attempt to suppress my reliance on other texts. The mathematics exposed here
is hundreds of years old and has been treated excellently in many famous texts. An
undergraduate mathematics text which strove to be unlike its predecessors would
[2] My conception of the meaning of "deconstruction" in the sense of academic humanities is
painfully vague. I am more thinking of the term in the sense that chefs use it.
and harder (or phrased more positively: one merit of abstraction is to make certain
arguments shorter and easier).
I am a fan of abstraction, but I have come to believe that it is much less useful
(and moreover, much less educational) in basic real analysis than in most
other branches of pure mathematics. A turning point in my feelings about this
was a second semester undergraduate real analysis course I taught at McGill University
in 2005. When preparing to teach the course I was quite surprised at how
pedestrian the syllabus seemed: an entire year of real analysis was restricted to
the one-dimensional case, and there was no topology or set theory whatsoever.[3] I
was strictly forbidden from talking about metric spaces, for instance. The topics of
the course were: infinite series, the Riemann integral, and sequences and series of
functions, including uniform convergence. By the end of the course I had acquired
more respect for the deep content inherent in these basic topics and the efficacy of
treating them using no more than $\epsilon$-$\delta$ arguments and the completeness axiom. This
was also the first course I taught in which I typed up lecture notes, and portions of
these notes appear here as parts of Chapters 11 through 14.
Exception: We begin our treatment of the Riemann integral with an axiomatic approach: that is, we list some reasonable properties (axioms) that an area functional
should satisfy, and before we do the hard work of constructing such a functional
we explore the consequences of these axioms. In particular, less than two pages
into our discussion of integration we state and prove (given the axioms!) the Fundamental Theorem of Calculus. This approach to the Riemann Integral resonates
deeply with me, as it addresses concerns that bubbled up over several years of my
teaching this material in the context of freshman calculus. I came up with this
approach in late 2004 while preparing for the McGill analysis course. Just a few
days later I noticed that Lang had done (essentially) the same thing in his text [L].
I was encouraged that this idea had been used by others, and I endorse it still.
Given the choice between pounding out an $\epsilon$-$\delta$ argument and developing a more
abstract or "softer" technique that leads to an easier proof, in this text we usually
opt for the former. Well, that's not strictly true: sometimes we do both.
We often spend time giving multiple proofs and approaches to basic results.
For instance, there are two distinct approaches to the Riemann integral: Riemann's
original approach using tagged partitions and Riemann sums, and Gaston Darboux's
later simplification using upper and lower sums and integrals. Most texts at
this level cover only one of these in detail, but here we cover both: first Darboux,
then Riemann. I got permission to do so while teaching the analysis class at
McGill, since this was done in the official course text of R. Gordon [Go]. Later I
realized that Gordon is in real life an integration theorist!
More significantly, we prove the Interval Theorems using $\epsilon$-$\delta$ arguments and later
come back to give much quicker proofs using sequences and the Bolzano-Weierstrass
Theorem. One may well say that this is evidence that sequences should be treated
[3] In fact, I didn't find out until just after the course ended that the students did not know
about countable and uncountable sets. Without conscious thought I had assumed otherwise.
at the beginning of the course rather than towards the end, and many texts do
take this approach, most notably [R]. However I endorse Spivak's ordering of the
material, in which honors calculus essentially begins with a thorough grappling
with the $\epsilon$-$\delta$ definition of limit. Although there are easier (and especially, "softer")
approaches to most of the key theorems, I feel that the manipulation of inequalities
that characterizes hard analysis is a vitally important skill for students to learn,
and this is their best chance to learn it.
We view the completeness axiom as the star of the show. Thus we do not put it
on the stage in Act I, Scene I, when people are still settling into their seats.
Some Novelties in the Text
We emphasize a version of the completeness axiom as an inductive principle, a
sort of continuous analogue to the usual discrete mathematical induction. (More
precisely, our inductive principle is valid in a linearly ordered set iff it is Dedekind
complete.) This principle, which we call "real induction", is not new as a piece
of research mathematics. Equivalent forms of it have been known for almost 100
years, and an identical form to mine appears in a 2011 note of D. Hathaway [?]. (I
first explored these concepts in October of 2010, so in order for his article to appear
in published form in May 2011 it must have been done at least a little earlier than
mine.) I am told that in certain areas of analysis (e.g. PDEs) arguments by real
induction are quite standard. (In fact my note XXX, which explores real induction
in a manner slightly deeper than the context of honors calculus allows, has a few
citations in research papers.)
CHAPTER 1
$$\mathbb{Z}^+ \subset \mathbb{N} \subset \mathbb{Z} \subset \mathbb{Q} \subset \mathbb{R} \subset \mathbb{C}.$$
1. INTRODUCTION
lowest terms, i.e., with a and b not both divisible by any integer n > 1. Thus in
Q we have an additive identity 0, every element has an additive inverse, we have a
multiplicative identity 1, and every nonzero element has a multiplicative inverse.
What then are the real numbers $\mathbb{R}$? The geometric answer is that the real numbers
correspond to points on the number line, but this does not make clear why there
are such points other than the rational numbers. An answer that one learns in high
school is that every real number has an infinite decimal expansion, not necessarily
terminating or repeating, and conversely any integer followed by an infinite decimal
expansion determines a real number. In fact this is perfectly correct: it gives a
complete characterization of the real numbers, but it is not a cure-all: in order to
pursue the implications of this definition (and even to really understand it) one
needs tools that we will develop later in the course.
Finally, the complex numbers $\mathbb{C}$ are expressions of the form $a + bi$ where a and
b are real numbers and $i^2 = -1$. They are extremely important in mathematics
generally (e.g. one needs them in order to solve polynomial equations), but in this
course they will play at most a peripheral role.
Back to R: let us nail down the fact that there are real numbers which are not
rational. One way to see this is as follows: show that the decimal expansion of
every rational number is eventually periodic, and then exhibit a decimal expansion
which is not eventually periodic, e.g.
x = 0.16116111611116111116 . . .
where the number of 1's after each 6 increases by 1 each time. But this number x
reeks of contrivance: it seems to have been constructed only to make trouble. The
ancient Pythagoreans discovered a much more natural irrational real number.
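The claim that every rational number has an eventually periodic decimal expansion can be explored by machine. The following is a minimal Python sketch, not part of the original notes: the function names `decimal_period` and `contrived_digits` are hypothetical, and the code assumes a non-negative rational input. Long division can only ever produce a remainder in the range 0, ..., b - 1, so some remainder must repeat, and from that point on the digits cycle.

```python
from fractions import Fraction

def decimal_period(q):
    """Long division on the fractional part of a rational q >= 0.

    Returns (pre, rep): the digits after the decimal point that come
    before the cycle, and the block of digits that then repeats forever.
    """
    remainder = q.numerator % q.denominator
    digits = []
    seen = {}  # remainder -> position of the digit it produces
    # Only finitely many remainders are possible, so this loop terminates.
    while remainder not in seen:
        seen[remainder] = len(digits)
        remainder *= 10
        digits.append(str(remainder // q.denominator))
        remainder %= q.denominator
    start = seen[remainder]
    return "".join(digits[:start]), "".join(digits[start:])

def contrived_digits(blocks):
    """Leading digits of x = 0.16116111611116...: a 1, then 6 11, 6 111, ..."""
    return "1" + "".join("6" + "1" * k for k in range(2, blocks + 2))

print(decimal_period(Fraction(1, 6)))   # ('1', '6'), i.e. 0.1666...
print(decimal_period(Fraction(1, 7)))   # ('', '142857')
print(contrived_digits(4))
```

A terminating expansion such as 1/4 = 0.25 is reported with repeating block "0". The runs of 1's in the digits of x keep getting longer, which is why no fixed block can repeat forever.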
Theorem 1.4. The square root of 2 is not a rational number.
Proof. The proof is the most famous (and surely one of the first) instance
of a certain important kind of argument, namely a proof by contradiction. We
assume that what we are trying to prove is false, and from that we reason until we
reach an absurd conclusion. Therefore what we are trying to prove must be true.
So suppose that $\sqrt{2}$ is rational: then there are
integers $a, b$ with $b > 0$ and $\sqrt{2} = \frac{a}{b}$. Since the defining property of $\sqrt{2}$ is that its
square is 2, there is really nothing to do but square both sides to get
$$2 = \frac{a^2}{b^2}.$$
Clearing denominators, we get
$$2b^2 = a^2.$$
Thus $a^2$ is divisible by 2, and since the square of an odd integer is odd, $a$ itself
is divisible by 2: $a = 2A$ for some $A \in \mathbb{Z}$. Substituting, we get
$$2b^2 = a^2 = (2A)^2 = 4A^2,$$
or
$$b^2 = 2A^2.$$
Thus $b^2$ is divisible by 2, so as above $b = 2B$ for some $B \in \mathbb{Z}$. Substituting, we get
$$4B^2 = (2B)^2 = b^2 = 2A^2,$$
or
$$2B^2 = A^2.$$
Thus we are back where we started: assuming that $2b^2 = a^2$, we found that both
a and b were divisible by 2. This is suspect in the extreme, and we now have our
choice of killing blow. One ending is to observe that everything we have said above
applies to A and B: thus we must also have $A = 2A_1$, $B = 2B_1$, and so forth.
We can continue in this way factoring out as many powers of 2 from a and b as we
wish. But the only integer which is arbitrarily divisible by 2 is 0, so our conclusion
is a = b = 0, whereas we assumed b > 0: contradiction.
Alternately (and perhaps more simply), each rational number may be written
in lowest terms, so we could have assumed this about $\frac{a}{b}$ at the outset and, in particular,
that a and b are not both divisible by 2. Either way we get a contradiction,
so $\sqrt{2}$ must not be a rational number.
1.3. Why do we not do calculus on Q?
To paraphrase the title question, why do we want to use R to do calculus? Is
there something stopping us from doing calculus over, say, Q?
The answer to the second question is no: we can define limits, continuity, derivatives
and so forth for functions $f: \mathbb{Q} \to \mathbb{Q}$ exactly as is done for real functions. The
most routine results carry over with no change: it is still true, for instance, that
sums and products of continuous functions are continuous. However most of the
big theorems (especially, the Interval Theorems) become false over $\mathbb{Q}$.
For $a, b \in \mathbb{Q}$, let $[a, b]_{\mathbb{Q}} = \{x \in \mathbb{Q} \mid a \le x \le b\}$.
Example: Consider the function $f: [0, 2]_{\mathbb{Q}} \to \mathbb{Q}$ given by $f(x) = -1$ if $x^2 < 2$
and $f(x) = 1$ if $x^2 > 2$. Note that we do not need to define $f(x)$ at $x = \pm\sqrt{2}$,
because by the result of the previous section these are not rational numbers. Then
f is continuous; in fact it is differentiable and has identically zero derivative. But
$f(0) = -1 < 0$, $f(2) = 1 > 0$, and there is no $c \in [0, 2]_{\mathbb{Q}}$ such that $f(c) = 0$. Thus
the Intermediate Value Theorem fails over $\mathbb{Q}$.
Example: Consider the function $f: [0, 2]_{\mathbb{Q}} \to \mathbb{Q}$ given by $f(x) = \frac{1}{x^2 - 2}$. Again, this
function is well-defined at all points of $[0, 2]_{\mathbb{Q}}$ because $\sqrt{2}$ is not a rational number.
It is also a continuous function. However it is
not bounded above: by taking rational numbers which are arbitrarily close to (and larger than) $\sqrt{2}$,
$x^2 - 2$ becomes an arbitrarily small positive number
and thus $f(x)$ becomes arbitrarily large.[3] In particular, f certainly does not attain
a maximum value. Thus the Extreme Value Theorem fails over $\mathbb{Q}$.
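Both examples can be tried out in exact rational arithmetic. The Python sketch below is an illustration added here, not taken from the notes: the names `f_sign`, `f_recip` and `upper_approximations` are hypothetical, and the choice of Newton's iteration to produce rationals slightly larger than the square root of 2 is an assumption of the sketch.

```python
from fractions import Fraction

def f_sign(x):
    """First example: -1 where x^2 < 2 and +1 where x^2 > 2.

    For rational x the case x^2 == 2 never occurs, so no value is needed there.
    """
    return -1 if x * x < 2 else 1

def f_recip(x):
    """Second example: 1 / (x^2 - 2); the denominator is nonzero for rational x."""
    return 1 / (x * x - 2)

def upper_approximations(steps):
    """Rationals in [0, 2] decreasing towards sqrt(2).

    Uses x -> (x + 2/x) / 2 starting from x = 2; each iterate satisfies
    x^2 - 2 = ((x_prev^2 - 2) / (2 * x_prev))^2 > 0.
    """
    xs = [Fraction(2)]
    for _ in range(steps):
        x = xs[-1]
        xs.append((x + 2 / x) / 2)
    return xs

values = [f_recip(x) for x in upper_approximations(4)]
print(values[:4])                              # the values 1/2, 4, 144, 166464
print(f_sign(Fraction(0)), f_sign(Fraction(2)))
```

The first function changes sign on $[0, 2]_{\mathbb{Q}}$ without ever being 0, and the values of the second grow without bound along the approximating sequence. Of course a finite computation only illustrates these facts; it does not prove them.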
Moreover, it can be shown (and will be later) that any function on a closed,
bounded interval which is either uniformly continuous or integrable is bounded, so
[3] We will be much more precise about this sort of thing later on. This is just an overview.
the above function f is neither uniformly continuous nor integrable. If you have
had second semester
freshman calculus, you should think about why the analogous
function $f: [0, 2] \setminus \{\sqrt{2}\} \to \mathbb{R}$
is not improperly Riemann integrable: it builds up
infinite area as we approach $\sqrt{2}$.
The point of these examples is that in order to succeed in getting calculus off the ground,
we need to make use of some fundamental property of the real numbers not possessed
by (for instance) the rational numbers. This property, which can be expressed
in various forms, is called completeness, and will play a major role in this course.
2. Some Properties of Numbers
2.1. Axioms for a Field.
In order to do mathematics in a rigorous way, one needs to identify a starting
point. Virtually all mathematical theorems are of the form $A \implies B$. That is,
assuming A, B must follow. For instance, in Euclidean geometry one lays down a
set of axioms and reasons only from them. The axioms needed for calculus are a
lot to swallow in one dose, so we will introduce them gradually. What we give here
is essentially a codification of high school algebra, including inequalities.
Specifically, we will give axioms that we want a number system to satisfy. At
this point we will take it for granted that in our number system we have operations
of addition, multiplication and an inequality relation <, and that there are
distinguished numbers called 0 and 1. We require the following properties:
(P0) $0 \neq 1$.
(P1) (Commutativity of $+$): For all numbers x, y, $x + y = y + x$.
(P2) (Associativity of $+$): For all numbers x, y, z, $(x + y) + z = x + (y + z)$.
(P3) (Identity for $+$): For all numbers x, $x + 0 = x$.
(P4) (Inverses for $+$): For all numbers x, there exists y with $x + y = 0$.
(P5) (Commutativity of $\cdot$): For all numbers x, y, $x \cdot y = y \cdot x$.
(P6) (Associativity of $\cdot$): For all numbers x, y, z, $(x \cdot y) \cdot z = x \cdot (y \cdot z)$.
(P7) (Identity for $\cdot$): For all numbers x, $x \cdot 1 = x$.
(P8) (Inverses for $\cdot$): For all numbers $x \neq 0$, there exists a number y with $xy = 1$.
(P9) (Distributivity of $\cdot$ over $+$): For all numbers x, y, z, $x \cdot (y + z) = (x \cdot y) + (x \cdot z)$.
Although it is not important for us now, the above axioms (P0) through (P9)
are called the field axioms, and a structure which satisfies them is called a field.
Example: Both $\mathbb{Q}$ and $\mathbb{R}$ satisfy all of the above field axioms. (We take this as
known information.)
Example: The complex numbers $\mathbb{C}$ satisfy all of the above field axioms. The only
one which is not straightforward is the existence of multiplicative inverses. For this:
if $z = x + iy$ is a nonzero complex number (i.e., the real numbers x and y are not
both zero) then if $w = \frac{x - iy}{x^2 + y^2}$, $zw = 1$.
Example: Let $\mathbb{F}_2 = \{0, 1\}$ be a set consisting of two elements, 0 and 1. We define
$0 + 0 = 0$, $0 + 1 = 1 + 0 = 1$, $1 + 1 = 0$, $0 \cdot 0 = 0 \cdot 1 = 1 \cdot 0 = 0$, $1 \cdot 1 = 1$. Then $\mathbb{F}_2$
satisfies all of the above field axioms. It is sometimes called the binary field.
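Because the binary field has only two elements, each of the axioms (P0) through (P9) can be confirmed by checking every case. Here is a minimal Python sketch of such a check; the names `add`, `mul` and `check_field_axioms` are hypothetical, and modelling the two operations as arithmetic modulo 2 is an assumption that reproduces the addition and multiplication tables given above.

```python
from itertools import product

F2 = (0, 1)

def add(x, y):
    return (x + y) % 2   # in particular 1 + 1 = 0

def mul(x, y):
    return (x * y) % 2

def check_field_axioms():
    """Check (P0) through (P9) by brute force over all elements of F2."""
    pairs = list(product(F2, repeat=2))
    triples = list(product(F2, repeat=3))
    return {
        "P0": 0 != 1,
        "P1": all(add(x, y) == add(y, x) for x, y in pairs),
        "P2": all(add(add(x, y), z) == add(x, add(y, z)) for x, y, z in triples),
        "P3": all(add(x, 0) == x for x in F2),
        "P4": all(any(add(x, y) == 0 for y in F2) for x in F2),
        "P5": all(mul(x, y) == mul(y, x) for x, y in pairs),
        "P6": all(mul(mul(x, y), z) == mul(x, mul(y, z)) for x, y, z in triples),
        "P7": all(mul(x, 1) == x for x in F2),
        "P8": all(any(mul(x, y) == 1 for y in F2) for x in F2 if x != 0),
        "P9": all(mul(x, add(y, z)) == add(mul(x, y), mul(x, z)) for x, y, z in triples),
    }

print(check_field_axioms())
```

Every entry of the resulting dictionary is True. The same exhaustive strategy works for any finite structure, though it obviously says nothing about infinite fields such as $\mathbb{Q}$ or $\mathbb{R}$.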
Proposition 1.5. In every system satisfying the field axioms, for every number
x we have $x \cdot 0 = 0$.
Proof. We have $x \cdot 0 = x \cdot (0 + 0) = (x \cdot 0) + (x \cdot 0)$. Subtracting $(x \cdot 0)$ from
both sides gives $0 = x \cdot 0$.
Proposition 1.6. In every system satisfying the field axioms:
a) The only additive identity is 0.
b) Every number x has a unique additive inverse. If $-1$ denotes the additive inverse
of 1, then the additive inverse of x is $(-1) \cdot x$.
c) The only multiplicative identity is 1.
d) Every nonzero number has a unique multiplicative inverse.
Proof. a) Note that 0 is an additive identity by (P3). Suppose that z is
another additive identity, and consider $0 + z$. Since 0 is an additive identity, $0 + z = z$.
Since z is an additive identity, $0 + z = 0$. Thus $z = 0$.
b) Suppose y and z are both additive inverses to x: $x + y = x + z = 0$. Adding y
to both sides gives
$$y = 0 + y = (x + y) + y = (y + x) + y = y + (x + y)$$
$$= y + (x + z) = (y + x) + z = (x + y) + z = 0 + z = z,$$
so $y = z$. Moreover, for any number x,
$$(-1) \cdot x + x = ((-1) \cdot x) + (1 \cdot x) = ((-1) + 1) \cdot x = 0 \cdot x = 0.$$
c), d) The proofs of these are the same as the proofs of parts a) and b) but with
all instances of $+$ replaced by $\cdot$ and all instances of 0 replaced by 1.
Proposition 1.7. In every system satisfying the field axioms, $(-1)^2 = 1$.
Proof. By Proposition 1.6, $(-1) \cdot (-1)$ is the additive inverse of $-1$, namely 1.
Note that a logically equivalent formulation of Proposition 1.8 is: in any system
satisfying the eld axioms, if xy = 0 then x = 0 or y = 0.
2.2. Axioms for an ordered field.
The remaining properties of numbers concern the inequality relation <. Instead
of describing the relation < directly, it turns out to be simpler to talk about the
properties of positive numbers. If we are given the inequality relation <, then we
say that x is positive if x > 0, thus knowing < we know which numbers are positive.
Conversely, suppose that we have identified a subset P of numbers as positive.
Then we can define $x < y$ if $y - x \in P$. Now we want our set of positive numbers
and thus $\frac{1}{x} = -\left(\frac{-1}{x}\right)$ is negative.
d) Suppose x is positive and y is negative. In particular x and y are not zero,
so $xy \neq 0$. To show that xy is negative, by part b) it is enough to rule out the
possibility that xy is positive. Suppose it is. Then, by part c), since x is positive,
$\frac{1}{x}$ is positive, and thus $y = xy \cdot \frac{1}{x}$ would be positive: contradiction.
e) Suppose x and y are both negative. Then $xy \neq 0$, and we need to rule out the
possibility that xy is negative. Suppose it is. Then $-xy$ is positive, $\frac{1}{x}$ is negative,
so by part d) $-y = -xy \cdot \frac{1}{x}$ is negative and thus y is positive: contradiction.
[4] When I first typed this I wrote that x is less than $n_0 + 1$. But actually this need not be
true! Can you think of an example? Beware: decimal expansions can be tricky.
Since every positive real number is less than or equal to some integer, and every
positive rational number is, in particular, a positive real number, then also
every positive rational number is less than or equal to some integer. That is, $\mathbb{Q}$ also
satisfies the Archimedean property. (Or, directly: any positive rational number
may be written in the form $\frac{a}{b}$ with $a, b \in \mathbb{Z}^+$, and then $\frac{a}{b} \le a$.)
This Archimedean property is so natural and familiar (not to mention useful...)
that the curious student may well wonder: are there in fact systems of numbers
satisfying the ordered field axioms but not the Archimedean property?!? The answer
is yes, there are plenty of them, and it is in fact possible to construct a theory
of calculus based upon them (in fact, such a theory is in many ways more faithful to
the calculus of Newton and Leibniz than the theory which we are presenting here,
which is a 19th century innovation). But we will not see such things in this course!
The next property does provide a basic difference between $\mathbb{Q}$ and $\mathbb{R}$.
Theorem 1.12. Let x be a real number and $n \in \mathbb{Z}^+$.
Proof. If $x \ge 0$ then $x = |x|$. If $x < 0$ then $x < 0 < -x = |x|$, so $x < |x|$.
Theorem 1.15. (Triangle Inequality) For all numbers x, y, $|x + y| \le |x| + |y|$.
Proof. Since $|x|$ is defined to be x if $x \ge 0$ and $-x$ if $x < 0$, it is natural to
break the proof into cases.
Case 1: $x, y \ge 0$. Then $|x + y| = x + y = |x| + |y|$.
Case 2: $x, y < 0$. Then $x + y < 0$, so $|x + y| = -(x + y) = -x - y = |x| + |y|$.
Case 3: $x \ge 0$, $y < 0$. Now unfortunately we do not know whether $x + y$ is
non-negative or negative, so we must consider further cases.
Case 3a: $x + y \ge 0$. Then $|x + y| = x + y \le |x| + |y|$.
Case 3b: $x + y < 0$. Then $|x + y| = -x - y \le |-x| + |-y| = |x| + |y|$.
Case 4: $x < 0$, $y \ge 0$. The argument is exactly the same as that in Case 3. In fact,
we can guarantee it is the same: since the desired inequality is symmetric in x
and y (meaning, if we interchange x and y we do not change what we are trying
to show), we may reduce to Case 3 by interchanging x and y.[5]
The preceding argument is definitely the sort that one should be prepared to make
when dealing with expressions involving absolute values. However, it is certainly
not very much fun. Spivak gives an alternate proof of the Triangle Inequality
which is more interesting and thematic. First, since both quantities $|x + y|$ and
$|x| + |y|$ are non-negative, the inequality will hold iff it holds after squaring both
sides (Proposition 1.11e). So it is enough to show
$$(|x + y|)^2 \le (|x| + |y|)^2.$$
Now $(|x + y|)^2 = (x + y)^2 = x^2 + 2xy + y^2$, whereas $(|x| + |y|)^2 = |x|^2 + 2|x||y| + |y|^2 =
x^2 + |2xy| + y^2$, so subtracting the left hand side from the right, it is equivalent to
show that
$$0 \le (x^2 + |2xy| + y^2) - (x^2 + 2xy + y^2).$$
But
$$(x^2 + |2xy| + y^2) - (x^2 + 2xy + y^2) = |2xy| - 2xy \ge 0$$
by Proposition 1.14. So this gives a second proof of the Triangle Inequality.
A similar argument can be used to establish the following variant.
Proposition 1.16. (Reverse Triangle Inequality)
For all numbers x, y, $||x| - |y|| \le |x - y|$.
Proof. Again, since both quantities are non-negative, it is sufficient to prove
the inequality after squaring both sides:
$$(||x| - |y||)^2 = (|x| - |y|)^2 = |x|^2 - 2|x||y| + |y|^2 = x^2 - |2xy| + y^2$$
$$\le x^2 - 2xy + y^2 = (x - y)^2 = (|x - y|)^2.$$
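As a sanity check on the two inequalities and on the key identity behind the squaring argument, one can test them exactly on a grid of rational numbers. This is only an illustrative Python sketch: the function names and the particular grid are arbitrary choices made here, and passing the check is evidence, not a proof.

```python
from fractions import Fraction
from itertools import product

def triangle_holds(x, y):
    return abs(x + y) <= abs(x) + abs(y)

def reverse_triangle_holds(x, y):
    return abs(abs(x) - abs(y)) <= abs(x - y)

def squared_gap(x, y):
    """(|x| + |y|)^2 - (|x + y|)^2, which the squaring argument identifies
    with |2xy| - 2xy."""
    return (abs(x) + abs(y)) ** 2 - abs(x + y) ** 2

# Quarter-integers between -3 and 3, covering all sign combinations.
grid = [Fraction(n, 4) for n in range(-12, 13)]

all_ok = all(
    triangle_holds(x, y)
    and reverse_triangle_holds(x, y)
    and squared_gap(x, y) == abs(2 * x * y) - 2 * x * y
    for x, y in product(grid, repeat=2)
)
print(all_ok)
```

Using `Fraction` rather than floating point keeps every comparison exact, so a failure could not be blamed on rounding.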
Exercise: Let x, y be any numbers.
a) Show that $|x| - |y| \le |x - y|$ by writing $x = (x - y) + y$ and applying the usual
triangle inequality.
b) Deduce from part a) that $||x| - |y|| \le |x - y|$.
[5] Such symmetry arguments can often be used to reduce the number of cases considered.
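$$\text{LHS} = \sum_{i=1}^n x_i^2 y_i^2 + 2 \sum_{i<j} x_i y_i x_j y_j,$$
$$\text{RHS} = \sum_{i=1}^n x_i^2 y_i^2 + \sum_{i \neq j} x_i^2 y_j^2,$$
so
$$\text{RHS} - \text{LHS} = \sum_{i \neq j} x_i^2 y_j^2 - 2 \sum_{i<j} x_i y_j x_j y_i = \sum_{i<j} (x_i y_j - x_j y_i)^2 \ge 0.$$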
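The final identity can be checked numerically. The Python sketch below assumes, since the statement being proved is not reproduced above, that LHS denotes $(x_1 y_1 + \ldots + x_n y_n)^2$ and RHS denotes $(x_1^2 + \ldots + x_n^2)(y_1^2 + \ldots + y_n^2)$; the function names and the random test vectors are hypothetical choices made for illustration.

```python
from fractions import Fraction
from itertools import combinations
import random

def cbs_gap(xs, ys):
    """RHS - LHS under the assumed meanings of the two sides."""
    lhs = sum(x * y for x, y in zip(xs, ys)) ** 2
    rhs = sum(x * x for x in xs) * sum(y * y for y in ys)
    return rhs - lhs

def sum_of_squares(xs, ys):
    """The sum over i < j of (x_i y_j - x_j y_i)^2."""
    n = len(xs)
    return sum(
        (xs[i] * ys[j] - xs[j] * ys[i]) ** 2
        for i, j in combinations(range(n), 2)
    )

rng = random.Random(0)  # fixed seed so the check is reproducible

def random_vector(n):
    return [Fraction(rng.randint(-9, 9), rng.randint(1, 9)) for _ in range(n)]

identity_ok = True
for n in range(1, 6):
    for _ in range(20):
        xs, ys = random_vector(n), random_vector(n)
        identity_ok = identity_ok and cbs_gap(xs, ys) == sum_of_squares(xs, ys)
print(identity_ok)
```

For example, with x = (1, 2) and y = (3, 4) both functions return 4: the gap is 5 * 25 - 11^2, and the single cross term is (1 * 4 - 2 * 3)^2.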
Theorem 1.19. (Arithmetic-Geometric Mean Inequality, n = 2)
For all numbers 0 < a < b, we have
$$a^2 < ab < \left(\frac{a+b}{2}\right)^2 < b^2.$$
Proof. First inequality: Since $a > 0$ and $0 < a < b$, $a \cdot a < a \cdot b$.
Second inequality: Expanding out the square and clearing denominators, it is equivalent
to $4ab < a^2 + 2ab + b^2$, or to $a^2 - 2ab + b^2 > 0$. But $a^2 - 2ab + b^2 = (a - b)^2$,
so since $a \neq b$, $(a - b)^2 > 0$.
Third inequality: Since $\frac{a+b}{2}$ and b are both positive, it is equivalent to $\frac{a+b}{2} < b$ and
thus to $a + b < 2b$. But indeed since $a < b$, $a + b < b + b = 2b$.
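A concrete instance may help fix the chain of inequalities in mind. The short Python sketch below (the helper names are hypothetical) computes the four quantities of Theorem 1.19 exactly and confirms that the middle gap is the square $\left(\frac{a-b}{2}\right)^2$ used in the proof of the second inequality.

```python
from fractions import Fraction

def agm_chain(a, b):
    """The four quantities a^2, ab, ((a+b)/2)^2, b^2 of Theorem 1.19."""
    mean = (a + b) / 2
    return [a * a, a * b, mean * mean, b * b]

def strictly_increasing(values):
    return all(u < v for u, v in zip(values, values[1:]))

a, b = Fraction(1), Fraction(3)
chain = agm_chain(a, b)
print(chain)                        # the values 1, 3, 4, 9
print(strictly_increasing(chain))

# The gap between the middle two terms is ((a - b) / 2)^2.
middle_gap = chain[2] - chain[1]
print(middle_gap == ((a - b) / 2) ** 2)
```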
Later we will use the theory of convexity to prove a significant generalization of
Theorem 1.19, the Weighted Arithmetic-Geometric Mean Inequality.
CHAPTER 2
Mathematical Induction
1. Introduction
Principle of Mathematical Induction for sets
Let S be a subset of the positive integers. Suppose that:
(i) 1 S, and
(ii) n Z+ , n S = n + 1 S.
Then S = Z+ .
The intuitive justication is as follows: by (i), we know that 1 S. Now apply (ii) with n = 1: since 1 S, we deduce 1 + 1 = 2 S. Now apply (ii) with
n = 2: since 2 S, we deduce 2 + 1 = 3 S. Now apply (ii) with n = 3: since
3 S, we deduce 3 + 1 = 4 S. And so forth.
This is not a proof. (No good proof uses "and so forth" to gloss over a key point!)
But the idea is as follows: we can keep iterating the above argument as many times
as we want, deducing at each stage that $S$ contains the natural number which
is one greater than the last natural number we showed that it contained. Now it
is a fundamental part of the structure of the positive integers that every positive
integer can be reached in this way, i.e., starting from 1 and adding 1 sufficiently
many times. In other words, any rigorous definition of the natural numbers (for
instance in terms of sets, as alluded to earlier in the course) needs to incorporate,
either implicitly or (more often) explicitly, the principle of mathematical induction.
Alternately, the principle of mathematical induction is a key ingredient in any axiomatic characterization of the natural numbers.
It is not a key point, but it is somewhat interesting, so let us be a bit more specific.
In Euclidean geometry one studies points, lines, planes and so forth, but one does
not start by saying what sort of object the Euclidean plane "really is". (At least
this is how Euclidean geometry has been approached for more than a hundred years.
Euclid himself gave such "definitions" as: "A point is that which has position but
not dimensions." "A line is breadth without depth." In the 19th century it was
recognized that these are descriptions rather than definitions, in the same way that
many dictionary definitions are actually descriptions: "cat: A small carnivorous
mammal domesticated since early times as a catcher of rats and mice and as a pet
and existing in several distinctive breeds and varieties." This helps you if you are
already familiar with the animal but not the word, but if you have never seen a cat
before this definition would not allow you to determine with certainty whether any
particular animal you encountered was a cat, and still less would it allow you to
reason abstractly about the cat concept or "prove theorems about cats.") Rather
"point", "line", "plane" and so forth are taken as undefined terms. They are
related by certain axioms: abstract properties they must satisfy.
In 1889, the Italian mathematician and proto-logician Giuseppe Peano came up
with a similar (and, in fact, much simpler) system of axioms for the natural numbers. In slightly modernized form, this goes as follows:
The undefined terms are zero, number and successor.
There are five axioms that they must satisfy, the Peano axioms. The first four are:
(P1) Zero is a number.
(P2) Every number has a successor, which is also a number.
(P3) No two distinct numbers have the same successor.
(P4) Zero is not the successor of any number.
Using set-theoretic language we can clarify what is going on here as follows: the
structures we are considering are triples $(X, 0, S)$, where $X$ is a set, $0$ is an element
of $X$, and $S : X \to X$ is a function, subject to the above axioms.
From this we can deduce quite a bit. First, we have a number (i.e., an element
of $X$) called $S(0)$. Is $0 = S(0)$? No, that is prohibited by (P4). We also have a
number $S(S(0))$, which is not equal to $0$ by (P4) and it is also not equal to $S(0)$,
because then $S(0) = S(S(0))$ would be the successor of the distinct numbers $0$
and $S(0)$, contradicting (P3). Continuing in this way, we can produce an infinite
sequence of distinct elements of $X$:
(3) $0,\ S(0),\ S(S(0)),\ S(S(S(0))),\ \ldots$
In particular $X$ itself is infinite. The crux of the matter is this: is there any element
of $X$ which is not a member of the sequence (3), i.e., is not obtained by starting at
$0$ and applying the successor function finitely many times?
The axioms so far do not allow us to answer this question. For instance, suppose
that the "numbers" consisted of the set $[0, \infty)$ of all non-negative real numbers, we
define 0 to be the real number of that name, and we define the successor of $x$ to be
$x + 1$. This system satisfies (P1) through (P4) but has much more in it than just
the natural numbers we want, so we must be missing an axiom! Indeed, the last
axiom is:
(P5) If $Y$ is a subset of the set $X$ of numbers such that $0 \in Y$ and such that
$x \in Y$ implies $S(x) \in Y$, then $Y = X$.
Notice that the example we cooked up above fails (P5), since in $[0, \infty)$ the subset
of natural numbers contains zero and contains the successor of each of its elements
but is a proper subset of $[0, \infty)$.
Thus it was Peano's contribution to realize that mathematical induction is an axiom for the natural numbers in much the same way that the parallel postulate is
an axiom for Euclidean geometry.
On the other hand, it is telling that this work of Peano is little more than one
hundred years old, which in the scope of mathematical history is quite recent.
Traces of what we now recognize as induction can be found from the mathematics
of antiquity (including Euclid's Elements!) on forward. According to the (highly
recommended!) Wikipedia article on mathematical induction, the first mathematician to formulate it explicitly was Blaise Pascal, in 1665. During the next hundred
years various equivalent versions were used by different mathematicians (notably
the methods of infinite descent and minimal counterexample, which we shall discuss later), and the technique seems to have become commonplace by the end of
the 18th century. Not having a formal understanding of the relationship between
mathematical induction and the structure of the natural numbers was not much
of a hindrance to mathematicians of the time, so still less should it stop us from
learning to use induction as a proof technique.
Principle of mathematical induction for predicates
Let $P(x)$ be a sentence whose domain is the positive integers. Suppose that:
(i) $P(1)$ is true, and
(ii) For all $n \in \mathbb{Z}^+$, $P(n)$ is true $\implies$ $P(n + 1)$ is true.
Then $P(n)$ is true for all positive integers $n$.
Variant 1: Suppose instead that P (x) is a sentence whose domain is the natural numbers, i.e., with zero included, and in the above principle we replace (i) by
the assumption that P (0) is true and keep the assumption (ii). Then of course the
conclusion is that P (n) is true for all natural numbers n. This is more in accordance
with the discussion of the Peano axioms above.1
Exercise 1: Suppose that $N_0$ is a fixed integer. Let $P(x)$ be a sentence whose
domain contains the set of all integers $n \geq N_0$. Suppose that:
(i) $P(N_0)$ is true, and
(ii) For all $n \geq N_0$, $P(n)$ is true $\implies$ $P(n + 1)$ is true.
Show that $P(n)$ is true for all integers $n \geq N_0$. (Hint: define a new predicate $Q(n)$
with domain $\mathbb{Z}^+$ by making a change of variables in $P$.)
2. The First Induction Proofs
2.1. The Pedagogically First Induction Proof.
There are many things that one can prove by induction, but the first thing that
everyone proves by induction is invariably the following result.
Proposition 2.1. For all $n \in \mathbb{Z}^+$, $1 + \ldots + n = \frac{n(n+1)}{2}$.
Proof. We go by induction on $n$.
Base case ($n = 1$): Indeed $1 = \frac{1(1+1)}{2}$.
Induction step: Let $n \in \mathbb{Z}^+$ and suppose that $1 + \ldots + n = \frac{n(n+1)}{2}$. Then
$$1 + \ldots + n + n + 1 = (1 + \ldots + n) + n + 1 \overset{\text{IH}}{=} \frac{n(n+1)}{2} + n + 1$$
1 In fact Peano's original axiomatization did not include zero. What we presented above is a
$$= \frac{n^2 + n}{2} + \frac{2n + 2}{2} = \frac{n^2 + 3n + 2}{2} = \frac{(n + 1)(n + 2)}{2} = \frac{(n + 1)((n + 1) + 1)}{2}.$$
Here the letters "IH" signify that the induction hypothesis was used.
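The two ingredients of the induction proof of Proposition 2.1 can be checked mechanically for small $n$: the formula itself, and the algebraic identity used in the induction step. This is only a finite check under the assumption that 500 cases are a reasonable sample; the function names are illustrative.

```python
def direct_sum(n):
    """Compute 1 + 2 + ... + n by brute force."""
    return sum(range(1, n + 1))

def closed_form(n):
    """The right-hand side n(n+1)/2 of Proposition 2.1 (always an integer)."""
    return n * (n + 1) // 2

# The statement P(n) itself, for n = 1, ..., 500.
formula_ok = all(direct_sum(n) == closed_form(n) for n in range(1, 501))

# The induction step: adding n + 1 to the formula for n gives the formula for n + 1.
step_ok = all(closed_form(n) + (n + 1) == closed_form(n + 1) for n in range(1, 501))
```

Of course no finite computation replaces the induction argument; the check only guards against slips in the algebra.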
Induction is such a powerful tool that once one learns how to use it one can prove
many nontrivial facts with essentially no thought or ideas required, as is the case in
the above proof. However thought and ideas are good things when you have them!
In many cases an inductive proof of a result is a sort of "first assault" which raises
the challenge of a more insightful, noninductive proof. This is certainly the case
for Proposition 2.1 above, which can be proved in many ways.
Here is one non-inductive proof: replacing $n$ by $n - 1$, it is equivalent to show:
(4) $\forall n \in \mathbb{Z},\ n \geq 2 : 1 + \ldots + (n - 1) = \frac{(n - 1)n}{2}$.
We recognize the quantity on the right-hand side as the binomial coefficient $\binom{n}{2}$:
it counts the number of 2-element subsets of an $n$ element set. This raises the
prospect of a combinatorial proof, i.e., to show that the number of 2-element
subsets of an $n$ element set is also equal to $1 + 2 + \ldots + (n - 1)$. This comes out
immediately if we list the 2-element subsets of $\{1, 2, \ldots, n\}$ in a systematic way:
we may write each such subset as $\{i, j\}$ with $1 \leq i \leq n - 1$ and $i < j \leq n$. Then:
The subsets with least element 1 are $\{1, 2\}, \{1, 3\}, \ldots, \{1, n\}$, a total of $n - 1$.
The subsets with least element 2 are $\{2, 3\}, \{2, 4\}, \ldots, \{2, n\}$, a total of $n - 2$.
$\vdots$
The subset with least element $n - 1$ is $\{n - 1, n\}$, a total of 1.
Thus the number of 2-element subsets of $\{1, \ldots, n\}$ is on the one hand $\binom{n}{2}$ and
on the other hand $(n - 1) + (n - 2) + \ldots + 1 = 1 + 2 + \ldots + (n - 1)$. This gives a
combinatorial proof of Proposition 2.1.
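The systematic listing in the combinatorial proof is easy to carry out by machine. The sketch below groups the 2-element subsets of $\{1, \ldots, n\}$ by least element for the illustrative value $n = 6$; the helper name `pairs_by_least_element` is an assumption made for this example.

```python
from itertools import combinations
from math import comb

def pairs_by_least_element(n):
    """Group the 2-element subsets {i, j} of {1, ..., n} by their least element i."""
    groups = {i: [] for i in range(1, n)}
    # combinations() yields each pair (i, j) exactly once, with i < j.
    for i, j in combinations(range(1, n + 1), 2):
        groups[i].append((i, j))
    return groups

n = 6
groups = pairs_by_least_element(n)
sizes = [len(groups[i]) for i in range(1, n)]  # n-1, n-2, ..., 1
total = sum(sizes)                             # equals 1 + 2 + ... + (n-1)
```

Here `total` counts the subsets one group at a time, while `comb(n, 2)` counts them all at once; the two agree, which is exactly the content of equation (4).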
2.2. The (Historically) First(?) Induction Proof.
Theorem 2.2. (Euclid) There are infinitely many prime numbers.
Proof. For $n \in \mathbb{Z}^+$, let $P(n)$ be the assertion that there are at least $n$ prime
numbers. Then there are infinitely many primes if and only if $P(n)$ holds for all
positive integers $n$. We will prove the latter by induction on $n$.
Base Case ($n = 1$): We need to show that there is at least one prime number. For
instance, 2 is a prime number.
Induction Step: Let $n \in \mathbb{Z}^+$, and assume that $P(n)$ holds, i.e., that there are at
least $n$ prime numbers $p_1 < \ldots < p_n$. We need to show that $P(n + 1)$ holds, i.e.,
there is at least one prime number different from the numbers we have already
found. To establish this, consider the quantity
$$N_n = p_1 \cdots p_n + 1.$$
Since $p_1 \cdots p_n \geq p_1 \geq 2$, $N_n \geq 3$. In particular it is divisible by at least one prime
number, say $q$.² But I claim that $N_n$ is not divisible by $p_i$ for any $1 \leq i \leq n$. Indeed,
if $N_n = a p_i$ for some $a \in \mathbb{Z}$, then let $b = \frac{p_1 \cdots p_n}{p_i} \in \mathbb{Z}$. Then $a p_i = p_1 \cdots p_n + 1 =$
2Later in these notes we will prove the stronger fact that any integer greater than one may
be expressed as a product of primes. For now we assume this (familiar) fact.
LHS(n) = RHS(n). In this case to make the induction proof work you need only
(i) establish the base case and (ii) verify the equality of successive differences
$$\mathrm{LHS}(n + 1) - \mathrm{LHS}(n) = \mathrm{RHS}(n + 1) - \mathrm{RHS}(n).$$
We give two more familiar examples of this.
Proposition 2.3. For all $n \in \mathbb{Z}^+$, $1 + 3 + \ldots + (2n - 1) = n^2$.
Proof. Let $P(n)$ be the statement "$1 + 3 + \ldots + (2n - 1) = n^2$". We will show
$P(n)$ holds for all $n \in \mathbb{Z}^+$ by induction on $n$. Base case ($n = 1$): indeed $1 = 1^2$.
Induction step: Let $n$ be an arbitrary positive integer and assume $P(n)$:
(5) $1 + 3 + \ldots + (2n - 1) = n^2$.
$\frac{n(n+1)(2n+1)}{6}$.
Proof. By induction on $n$.
Base case: $n = 1$.
Induction step: Let $n \in \mathbb{Z}^+$ and suppose that $1^2 + \ldots + n^2 = \frac{n(n+1)(2n+1)}{6}$. Then
$$1^2 + \ldots + n^2 + (n + 1)^2 \overset{\text{IH}}{=} \frac{n(n + 1)(2n + 1)}{6} + (n + 1)^2 =$$
$$1 + 3 + \ldots + (2n - 1) = \sum_{i=1}^n (2i - 1) = 2 \sum_{i=1}^n i - \sum_{i=1}^n 1 = 2 \cdot \frac{n(n + 1)}{2} - n = n^2 + n - n = n^2.$$
4. Inequalities
Proposition 2.5. For all $n \in \mathbb{N}$, $2^n > n$.
Proof analysis: For $n \in \mathbb{N}$, let $P(n)$ be the statement "$2^n > n$". We want to show
that $P(n)$ holds for all natural numbers $n$ by induction.
Base case: $n = 0$: $2^0 = 1 > 0$.
Induction step: let $n$ be any natural number and assume $P(n)$: $2^n > n$. Then
$$2^{n+1} = 2 \cdot 2^n > 2 \cdot n.$$
We would now like to say that $2n \geq n + 1$. But in fact this is true if and only
if $n \geq 1$. Well, don't panic. We just need to restructure the argument a bit: we
verify the statement separately for $n = 0$ and then use $n = 1$ as the base case of
our induction argument. Here is a formal writeup:
Proof. Since $2^0 = 1 > 0$ and $2^1 = 2 > 1$, it suffices to verify the statement
for all natural numbers $n \geq 2$. We go by induction on $n$.
Base case: $n = 2$: $2^2 = 4 > 2$.
Induction step: Assume that for some natural number $n \geq 2$, $2^n > n$. Then
$$2^{n+1} = 2 \cdot 2^n > 2n > n + 1.$$
Proposition 2.6. There exists $N_0 \in \mathbb{Z}^+$ such that for all $n \geq N_0$, $2^n \geq n^3$.
Proof analysis: A little experimentation shows that there are several small values
of $n$ such that $2^n < n^3$: for instance $2^9 = 512 < 9^3 = 729$. On the other hand, it
seems to be the case that we can take $N_0 = 10$: let's try.
Base case: $n = 10$: $2^{10} = 1024 > 1000 = 10^3$.
Induction step: Suppose that for some $n \geq 10$ we have $2^n \geq n^3$. Then
$$2^{n+1} = 2 \cdot 2^n \geq 2n^3.$$
Our task is then to show that $2n^3 \geq (n + 1)^3$ for all $n \geq 10$. (By considering limits
as $n \to \infty$, it is certainly the case that the left hand side exceeds the right hand
side for all sufficiently large $n$. It's not guaranteed to work for $n \geq 10$; if not, we
will replace 10 with some larger number.) Now,
$$2n^3 - (n + 1)^3 = 2n^3 - n^3 - 3n^2 - 3n - 1 = n^3 - 3n^2 - 3n - 1 \geq 0 \iff n^3 - 3n^2 - 3n \geq 1.$$
Since everything in sight is a whole number, this is in turn equivalent to
$$n^3 - 3n^2 - 3n > 0.$$
Now $n^3 - 3n^2 - 3n = n(n^2 - 3n - 3)$, so this is equivalent
to $n^2 - 3n - 3 > 0$.
The roots of the polynomial $x^2 - 3x - 3$ are $x = \frac{3 \pm \sqrt{21}}{2}$, so $n^2 - 3n - 3 > 0$ if
$n \geq 4 = \frac{3 + \sqrt{25}}{2} > \frac{3 + \sqrt{21}}{2}$. In particular, the desired inequality holds if $n \geq 10$, so
by induction we have shown that $2^n \geq n^3$ for all $n \geq 10$.
We leave it to the student to convert the above analysis into a formal proof.
Remark: More precisely, $2^n \geq n^3$ for all natural numbers $n$ except $n = 2, 3, 4, 5, 6, 7, 8, 9$.
It is interesting that the desired inequality is true for a little while (i.e., at $n = 0, 1$)
then becomes false for a little while longer, and then becomes true for all $n \geq 10$.
Note that it follows from our analysis that if for any $N \geq 4$ we have $2^N \geq N^3$, then
this inequality remains true for all larger natural numbers $n$. Thus from the fact that
$2^9 < 9^3$, we can in fact deduce that $2^n < n^3$ for all $4 \leq n \leq 8$.
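The "little experimentation" behind Proposition 2.6 can be reproduced by brute force. The sketch below lists the natural numbers up to an arbitrary cutoff (100, chosen for illustration) at which $2^n < n^3$, and also spot-checks the inequality $2n^3 \geq (n+1)^3$ that drives the induction step.

```python
def failures(limit):
    """Return the natural numbers n <= limit for which 2^n < n^3."""
    return [n for n in range(limit + 1) if 2 ** n < n ** 3]

failing = failures(100)

# The key inequality of the induction step, checked for 4 <= n < 1000.
step_ok = all(2 * n ** 3 >= (n + 1) ** 3 for n in range(4, 1000))

# The same inequality is false at n = 3, so 4 is the first n where it holds.
step_fails_at_3 = 2 * 3 ** 3 < (3 + 1) ** 3
```

Python integers have arbitrary precision, so the comparisons are exact even for large exponents.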
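Proposition 2.7. For all $n \in \mathbb{Z}^+$, $1 + \frac{1}{4} + \ldots + \frac{1}{n^2} \leq 2 - \frac{1}{n}$.
Suppose that $1 + \frac{1}{4} + \ldots + \frac{1}{n^2} \leq 2 - \frac{1}{n}$. Then
$$1 + \frac{1}{4} + \ldots + \frac{1}{n^2} + \frac{1}{(n + 1)^2} \leq 2 - \frac{1}{n} + \frac{1}{(n + 1)^2}.$$
We want the left hand side to be at most $2 - \frac{1}{n+1}$, so it suffices to show
$$2 - \frac{1}{n} + \frac{1}{(n + 1)^2} < 2 - \frac{1}{n + 1}.$$
Equivalently, it suffices to show
$$\frac{1}{n + 1} + \frac{1}{(n + 1)^2} < \frac{1}{n}.$$
But we have
$$\frac{1}{n + 1} + \frac{1}{(n + 1)^2} = \frac{n + 1 + 1}{(n + 1)^2} = \frac{n + 2}{(n + 1)^2}.$$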
Everything in sight is positive, so by clearing denominators, the desired inequality
is equivalent to
$$n^2 + 2n = n(n + 2) < (n + 1)^2 = n^2 + 2n + 1,$$
which is true.
all have the same color. Consider now a set of $n + 1$ horses, which for specificity
we label $H_1, H_2, \ldots, H_n, H_{n+1}$. Now we can split this into two sets of $n$ horses:
$S = \{H_1, \ldots, H_n\}$
and
$T = \{H_2, \ldots, H_n, H_{n+1}\}$.
By induction, every horse in $S$ has the same color as $H_1$: in particular $H_n$ has
the same color as $H_1$. Similarly, every horse in $T$ has the same color as $H_n$: in
particular $H_{n+1}$ has the same color as $H_n$. But this means that $H_2, \ldots, H_n, H_{n+1}$
all have the same color as $H_1$. It follows by induction that for all $n \in \mathbb{Z}^+$, in any
set of $n$ horses, all have the same color.
Proof analysis: Naturally one suspects that there is a mistake somewhere, and
there is. However it is subtle, and occurs in a perhaps unexpected place. In fact
the argument is completely correct, except the induction step is not valid when
$n = 1$: in this case $S = \{H_1\}$ and $T = \{H_2\}$ and these two sets are disjoint:
they have no horses in common. We have been misled by the "dot dot dot" notation which suggests, erroneously, that $S$ and $T$ must have more than one element.
In fact, if only we could establish the argument for $n = 2$, then the proof goes
through just fine. For instance, the result can be fixed as follows: if in a finite set
of horses, any two have the same color, then they all have the same color.
There is a moral here: one should pay especially close attention to the smallest
values of n to make sure that the argument has no gaps. On the other hand,
there is a certain type of induction proof for which the n = 2 case is the most
important (often it is also the base case, but not always), and the induction step
is easy to show, but uses once again the n = 2 case. Here are some examples of this.
The following is a fundamental fact of number theory, called Euclid's Lemma.
Proposition 2.8. Let $p$ be a prime, and let $a, b \in \mathbb{Z}^+$. If $p \mid ab$, then $p \mid a$ or $p \mid b$.
Later in this chapter we will give a proof (yes, by induction!). Let's assume it for
now. Then we can swiftly deduce the following useful generalization.
Proposition 2.9. Let $p$ be a prime number, $n \in \mathbb{Z}^+$ and $a_1, \ldots, a_n \in \mathbb{Z}^+$. If
$p \mid a_1 \cdots a_n$, then $p \mid a_i$ for some $1 \leq i \leq n$.
Proof. This is trivial for $n = 1$. We show it for all $n \geq 2$ by induction.
Base case: $n = 2$: This is precisely Euclid's Lemma.
Induction Step: We assume that for a given $n \in \mathbb{Z}^+$ and $a_1, \ldots, a_n \in \mathbb{Z}^+$, if a prime
$p$ divides the product $a_1 \cdots a_n$, then it divides at least one $a_i$. Let $a_1, \ldots, a_n, a_{n+1} \in \mathbb{Z}^+$,
and suppose that a prime $p$ divides $a_1 \cdots a_n a_{n+1}$. Then $p \mid (a_1 \cdots a_n) a_{n+1}$, so by Euclid's
Lemma, $p \mid a_1 \cdots a_n$ or $p \mid a_{n+1}$. If the latter, we're done. If the former, then by
our inductive hypothesis, $p \mid a_i$ for some $1 \leq i \leq n$, so we are also done.
Corollary 2.10. Let $p$ be a prime, and let $a, n \in \mathbb{Z}^+$. Then $p \mid a^n \implies p \mid a$.
Exercise 5: Use Corollary 2.10 to show that for any prime $p$, $p^{1/n}$ is irrational.
In other words, if one can deduce statement C from statement A, then one can
also deduce statement C from A together with some additional hypothesis or hypotheses B. Specifically, we can take A to be $P(n)$, C to be $P(n + 1)$ and B to be
$P(1) \wedge P(2) \wedge \ldots \wedge P(n - 1)$.
Less obviously, one can use our previous PMI to prove PS/CI. The proof is not
hard but slightly tricky. Suppose we know PMI and wish to prove PS/CI. Let $P(n)$
be a sentence with domain the positive integers and satisfying (i) and (ii) above.
We wish to show that $P(n)$ holds for all $n \in \mathbb{Z}^+$, using only ordinary induction.
The trick is to introduce a new predicate $Q(n)$, namely
$$Q(n) = P(1) \wedge P(2) \wedge \ldots \wedge P(n).$$
Notice that $Q(1) = P(1)$ and that (ii) above tells us that $Q(n) \implies P(n + 1)$.
But if we know $Q(n) = P(1) \wedge \ldots \wedge P(n)$ and we also know $P(n + 1)$, then we
know $P(1) \wedge \ldots \wedge P(n) \wedge P(n + 1) = Q(n + 1)$. So $Q(1)$ holds and for all $n$,
$Q(n) \implies Q(n + 1)$. So by ordinary mathematical induction, $Q(n)$ holds for all $n$,
hence certainly $P(n)$ holds for all $n$.
Exercise 6: As for ordinary induction, there is a variant of strong/complete induction where instead of starting at 1 we start at any integer N0 . State this explicitly.
Here is an application which makes full use of the strength of PS/CI.
Proposition 2.11. Let $n > 1$ be an integer. Then there exist prime numbers
$p_1, \ldots, p_k$ (for some $k \geq 1$) such that $n = p_1 \cdots p_k$.
Proof. We go by strong induction on $n$. Base case: $n = 2$. Indeed 2 is prime,
so we're good.
Induction step: Let $n > 2$ be any integer and assume that the statement is true for
all integers $2 \leq k < n$. We wish to show that it is true for $n$.
Case 1: $n$ is prime. As above, we're good.
Case 2: $n$ is not prime. By definition, this means that there exist integers $a, b$, with
$1 < a, b < n$, such that $n = ab$. But now our induction hypothesis applies to both
$a$ and $b$: we can write $a = p_1 \cdots p_k$ and $b = q_1 \cdots q_l$, where the $p_i$'s and $q_j$'s are all
prime numbers. But then
$$n = ab = p_1 \cdots p_k q_1 \cdots q_l$$
is an expression of $n$ as a product of prime numbers: done!
This is a good example of the use of induction (of one kind or another) to give a
very clean proof of a result whose truth was not really in doubt but for which a
more straightforward proof is wordier and messier.
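The strong induction proof of Proposition 2.11 translates almost word for word into a recursive procedure: if $n$ is prime, return it; otherwise write $n = ab$ with $1 < a, b < n$ and factor $a$ and $b$ recursively. The following is a minimal sketch; the function name `prime_factors` and the trial-division search for a divisor are illustrative choices, not part of the proof.

```python
from math import isqrt

def prime_factors(n):
    """Return a list of primes whose product is n, for an integer n > 1."""
    if n < 2:
        raise ValueError("n must be an integer greater than 1")
    # Look for a nontrivial factorization n = a * b with 1 < a, b < n.
    for a in range(2, isqrt(n) + 1):
        if n % a == 0:
            b = n // a
            # Case 2 of the proof: both a and b are smaller than n,
            # so the "induction hypothesis" (recursion) applies to each.
            return prime_factors(a) + prime_factors(b)
    # Case 1 of the proof: no nontrivial divisor exists, so n is prime.
    return [n]
```

For example, `prime_factors(360)` returns `[2, 2, 2, 3, 3, 5]`. The recursion terminates for the same reason the strong induction works: every recursive call is on a strictly smaller integer greater than 1.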
7. Solving Homogeneous Linear Recurrences
Recall our motivating problem for PS/CI: we were given a sequence defined by
$a_1 = 1$, $a_2 = 2$, and for all $n \geq 3$, $a_n = 3a_{n-1} - 2a_{n-2}$. By trial and error we
guessed that $a_n = 2^{n-1}$, and this was easily confirmed using PS/CI.
But this was very lucky (or worse: the example was constructed so as to be easy
to solve). In general, it might not be so obvious what the answer is, and as above,
(ab + ac c)an b
a
1
xn = can + b
=
.
ai = can + b
a1
a1
i=1
In particular the sequence xn grows exponentially in n.
Let us try our hand on a sequence defined by a two-term recurrence:
$$F_1 = F_2 = 1; \quad \forall n \geq 1,\ F_{n+2} = F_{n+1} + F_n.$$
The $F_n$'s are the famous Fibonacci numbers. Again we list some values:
$F_3 = 2$, $F_4 = 3$, $F_5 = 5$, $F_6 = 8$, $F_7 = 13$, $F_8 = 21$, $F_9 = 34$, $F_{10} = 55$,
$F_{11} = 89$, $F_{12} = 144$, $F_{13} = 233$, $F_{14} = 377$, $F_{15} = 610$,
$F_{200} = 280571172992510140037611932413038677189525$,
$F_{201} = 453973694165307953197296969697410619233826$.
These computations suggest $F_n$ grows exponentially. Taking ratios of successive
values suggests that the base of the exponential lies between 1 and 2, e.g.
$$\frac{F_{201}}{F_{200}} = 1.618033988749894848204586834\ldots.$$
Cognoscenti may recognize this as the decimal expansion of the golden ratio
$$\varphi = \frac{1 + \sqrt{5}}{2}.$$
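The values and the ratio quoted above are easy to recompute, since Python integers have arbitrary precision. This is a small sketch; the function name `fib` is an illustrative choice, and the indexing follows the convention $F_1 = F_2 = 1$ used in the text.

```python
def fib(n):
    """Return the n-th Fibonacci number, with F_1 = F_2 = 1."""
    a, b = 1, 1  # (F_1, F_2)
    for _ in range(n - 1):
        a, b = b, a + b  # slide the window (F_k, F_{k+1}) forward by one
    return a

first_fifteen = [fib(n) for n in range(1, 16)]

# Ratio of successive large values, compared with the golden ratio.
ratio = fib(201) / fib(200)
phi = (1 + 5 ** 0.5) / 2
```

The ratio is computed in floating point, so it agrees with $\varphi$ only to about 15 or 16 significant digits; the longer expansion displayed above needs higher-precision arithmetic.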
However, let's consider a more general problem and make a vaguer guess. Namely,
for real numbers $b, c$ we consider a recurrence of the form
(6) $x_{n+2} = b x_{n+1} + c x_n$.
In all the cases we've looked at the solution was (roughly) exponential. So let's
guess an exponential solution $x_n = C r^n$ and plug this into the recurrence; we get
$$C r^{n+2} = x_{n+2} = b(C r^{n+1}) + c(C r^n),$$
which simplifies to
$$r^2 - br - c = 0.$$
Evidently the solutions to this are
$$r = \frac{b \pm \sqrt{b^2 + 4c}}{2}.$$
Some cases to be concerned about are the case $c = -\frac{b^2}{4}$, in which case we have
only a single root $r = \frac{b}{2}$, and the case $c < -\frac{b^2}{4}$, in which case the roots are complex
numbers.
But for the moment let's look at the Fibonacci case: $b = c = 1$. Then
$r = \frac{1 \pm \sqrt{5}}{2}$, i.e., $r = \varphi$ or $r = 1 - \varphi = -0.618033988749894848204586834\ldots$.
So we have two different bases: what do we do with that? A little thought shows
that if $r_1^n$ and $r_2^n$ are both solutions to the recurrence $x_{n+2} = b x_{n+1} + c x_n$ (with any
initial conditions), then so is $C_1 r_1^n + C_2 r_2^n$ for any constants $C_1$ and $C_2$. Therefore
we propose $x_n = C_1 r_1^n + C_2 r_2^n$ as the general solution to the two-term homogeneous linear recurrence (6) and the two initial conditions $x_1 = A_1$, $x_2 = A_2$ provide
just enough information to solve for $C_1$ and $C_2$.
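$$C_1 = \frac{1}{2\varphi - 1} = \frac{1}{2\left(\frac{1 + \sqrt{5}}{2}\right) - 1} = \frac{1}{\sqrt{5}}, \quad C_2 = -\frac{1}{\sqrt{5}}.$$
$$F_n = \frac{1}{\sqrt{5}} \left( \varphi^n - (1 - \varphi)^n \right),$$
where $\varphi = \frac{1 + \sqrt{5}}{2}$.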
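The closed formula for $F_n$ in terms of $\varphi$ can be compared numerically with the recurrence. The sketch below evaluates the formula in floating point and rounds to the nearest integer, which is only reliable for moderately small $n$ (here $n \leq 40$, an arbitrary illustrative range); the names `closed_formula` and `fib_by_recurrence` are assumptions of this example.

```python
from math import sqrt

PHI = (1 + sqrt(5)) / 2  # the golden ratio

def closed_formula(n):
    """Evaluate (phi^n - (1 - phi)^n) / sqrt(5), rounded to the nearest integer."""
    return round((PHI ** n - (1 - PHI) ** n) / sqrt(5))

def fib_by_recurrence(count):
    """Return [F_1, ..., F_count] using F_1 = F_2 = 1 and F_{n+2} = F_{n+1} + F_n."""
    values = [1, 1]
    while len(values) < count:
        values.append(values[-1] + values[-2])
    return values[:count]

agree = [closed_formula(n) for n in range(1, 41)] == fib_by_recurrence(40)
```

Since $|1 - \varphi| < 1$, the term $(1 - \varphi)^n$ tends to zero, which is why $F_n$ is the nearest integer to $\varphi^n / \sqrt{5}$ and why the ratio of successive terms approaches $\varphi$.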
It certainly looks as though $x_n = n$ for all $n$. Indeed, assuming it to be true for all
positive integers smaller than $n + 2$, we easily check
$$x_{n+2} = 2x_{n+1} - x_n = 2(n + 1) - n = 2n + 2 - n = n + 2.$$
The characteristic polynomial is $r^2 - 2r + 1 = (r - 1)^2$: it has repeated roots. One
solution is $C_1 1^n = C_1$ (i.e., $x_n$ is a constant sequence). This occurs iff $x_2 = x_1$, so
clearly there are nonconstant solutions as well. It turns out that in general, if the
characteristic polynomial is $(x - r_0)^2$, then the two basic solutions are $x_n = r_0^n$ and
also $x_n = n r_0^n$. It is unfortunately harder to guess this in advance, but it is not hard
to check that this gives a solution to a recurrence of the form $x_{n+2} = 2r_0 x_{n+1} - r_0^2 x_n$.
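The claim about repeated roots can be checked directly: both $r_0^n$ and $n r_0^n$ satisfy $x_{n+2} = 2 r_0 x_{n+1} - r_0^2 x_n$. The sketch below tests this with exact rational arithmetic for the illustrative value $r_0 = 3/2$ and the first 20 indices; the helper name `satisfies_recurrence` is an assumption of this example.

```python
from fractions import Fraction

def satisfies_recurrence(seq, r0, terms=20):
    """Check x_{n+2} == 2*r0*x_{n+1} - r0^2*x_n for n = 1, ..., terms."""
    return all(
        seq(n + 2) == 2 * r0 * seq(n + 1) - r0 ** 2 * seq(n)
        for n in range(1, terms + 1)
    )

r0 = Fraction(3, 2)  # arbitrary sample value of the repeated root

geometric_ok = satisfies_recurrence(lambda n: r0 ** n, r0)
n_times_geometric_ok = satisfies_recurrence(lambda n: n * r0 ** n, r0)

# Any combination C1*r0^n + C2*n*r0^n is again a solution (sample constants).
combination_ok = satisfies_recurrence(lambda n: 5 * r0 ** n - 7 * n * r0 ** n, r0)

# By contrast, n^2 * r0^n is not a solution.
n_squared_fails = not satisfies_recurrence(lambda n: n ** 2 * r0 ** n, r0)
```

The underlying identity is $(n + 2) = 2(n + 1) - n$, which is the same computation as in the case $r_0 = 1$ above.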
These considerations will be eerily familiar to the reader who has studied differential equations. For a more systematic exposition on discrete analogues of calculus
concepts (with applications to the determination of power sums as in §3), see [DC].
8. The Well-Ordering Principle
There is yet another form of mathematical induction that can be used to give what
is, arguably, an even more elegant proof of Proposition 2.11.
Theorem 2.13. (Well-Ordering Principle) Let $S$ be a nonempty subset of $\mathbb{Z}^+$.
Then $S$ has a least element, i.e., there exists $s \in S$ such that for all $t \in S$, $s \leq t$.
Intuitively, the statement is true by the following reasoning: first we ask: is $1 \in S$?
If so, it is certainly the least element of $S$. If not, we ask: is $2 \in S$? If so, it is
certainly the least element of $S$. And then we continue in this way: if we eventually
get a "yes" answer then we have found our least element. But if for every $n$ the
answer to the question "Is $n$ an element of $S$?" is negative, then $S$ is empty!
The well-ordering principle (henceforth WOP) is often useful in its contrapositive form: if a subset $S \subset \mathbb{Z}^+$ does not have a least element, then $S = \varnothing$.
We claim WOP is logically equivalent to the principle of mathematical induction (PMI) and thus also to the principle of strong/complete induction (PS/CI).
First we will assume PS/CI and show that WOP follows. For this, observe that
WOP holds iff $P(n)$ holds for all $n \in \mathbb{Z}^+$, where $P(n)$ is the following statement:
$P(n)$: If $S \subset \mathbb{Z}^+$ and $n \in S$, then $S$ has a least element.
Indeed, if $P(n)$ holds for all $n$ and $S \subset \mathbb{Z}^+$ is nonempty, then it contains some
positive integer $n$, and then we can apply $P(n)$ to see that $S$ has a least element.
Now we can prove that $P(n)$ holds for all $n$ by complete induction: first, if $1 \in S$,
then indeed 1 is the least element of $S$, so $P(1)$ is certainly true. Now assume $P(k)$
for all $1 \leq k \leq n$, and suppose that $n + 1 \in S$. If $n + 1$ is the least element of $S$,
then we're done. If it isn't, then it means that there exists $k \in S$ with $1 \leq k \leq n$. Since
we have assumed $P(k)$ is true, therefore there exists a least element of $S$.
Conversely, let us assume WOP and prove PMI. Namely, let $S \subset \mathbb{Z}^+$ and suppose
that $1 \in S$, and that for all $n$, if $n \in S$ then $n + 1 \in S$. We wish to show that
$S = \mathbb{Z}^+$. Equivalently, putting $T = \mathbb{Z}^+ \setminus S$, we wish to show that $T = \varnothing$. If not,
then by WOP $T$ has a least element, say $n$. Reasoning this out gives an immediate
contradiction: first, $n \notin S$. By assumption, $1 \in S$, so we must have $n > 1$, so that
we can write $n = m + 1$ for some $m \in \mathbb{Z}^+$. Further, since $n$ is the least element of
$T$ we must have $n - 1 = m \in S$, but now our inductive assumption implies that
$m + 1 = n \in S$, contradiction.
So now we have shown that PMI $\iff$ PS/CI $\implies$ WOP $\implies$ PMI.
Let us give another proof of Proposition 2.11 using WOP. We wish to show that
every integer n > 1 can be factored into primes. Let S be the set of integers n > 1
which cannot be factored into primes. Seeking a contradiction, we assume S is
nonempty. In that case, by WOP it has a least element, say n. Now n is certainly
not prime, since otherwise it can be factored into primes. So we must have n = ab
with 1 < a, b < n. But now, since a and b are integers greater than 1 which are
smaller than the least element of S, they must each have prime factorizations, say
$a = p_1 \cdots p_k$, $b = q_1 \cdots q_l$. But then (stop me if you've heard this one before)
$$n = ab = p_1 \cdots p_k q_1 \cdots q_l$$
itself can be expressed as a product of primes, contradicting our assumption. Therefore $S$ is empty: every integer greater than 1 is a product of primes.
This kind of argument is often called proof by minimum counterexample.
Upon examination, the two proofs of Proposition 2.11 are very close: the difference
between a proof using strong induction and a proof using well ordering is more a
matter of literary taste than mathematical technique.
9. The Fundamental Theorem of Arithmetic
9.1. Euclid's Lemma and the Fundamental Theorem of Arithmetic.
The following are the two most important theorems in beginning number theory.
Theorem 2.14. (Euclid's Lemma) Let $p$ be a prime number and $a, b$ be positive
integers. Suppose that $p \mid ab$. Then $p \mid a$ or $p \mid b$.
Theorem 2.15. (Fundamental Theorem of Arithmetic) The factorization of
any integer $n > 1$ into primes is unique, up to the order of the factors. Explicitly,
suppose that
$$n = p_1 \cdots p_k = q_1 \cdots q_l$$
are two factorizations of $n$ into primes, with $p_1 \leq \ldots \leq p_k$ and $q_1 \leq \ldots \leq q_l$. Then
$k = l$ and $p_i = q_i$ for all $1 \leq i \leq k$.
We say a prime factorization $n = p_1 \cdots p_k$ is in standard form if, as above,
$p_1 \leq \ldots \leq p_k$. Every prime factorization can be put in standard form by ordering
the primes from least to greatest, and dealing with standard form factorizations is
a convenient bookkeeping device, since otherwise our uniqueness statement would
have to include the proviso "up to the order of the factors."
Given Proposition 2.11 (i.e., the existence of prime factorizations), Theorems
2.14 and 2.15 are equivalent: each can be easily deduced from the other.
EL implies FTA: Assume Euclid's Lemma. As seen above, this implies the Generalized Euclid's Lemma (Proposition 2.9): if a prime divides any finite product of
integers it must divide one of the factors. Our proof will be by minimal counterexample: suppose that there are some integers greater than one which factor into
primes in more than one way, and let $n$ be the least such integer, so
(7) $n = p_1 \cdots p_k = q_1 \cdots q_l$,
$$n = p_1 \cdots p_r = q_1 \cdots q_s.$$
Here the $p_i$'s and $q_j$'s are prime numbers, not necessarily distinct from each other.
However, $p_1 \neq q_j$ for any $j$. Indeed, if we had such an equality, then after relabelling
the $q_j$'s we could assume $p_1 = q_1$ and then divide through by $p_1 = q_1$ to get a smaller
positive integer $\frac{n}{p_1}$. By the assumed minimality of $n$, the prime factorization of $\frac{n}{p_1}$
must be unique: i.e., $r - 1 = s - 1$ and $p_i = q_i$ for all $2 \leq i \leq r$. But then
multiplying back by $p_1 = q_1$ we see that we didn't have two different factorizations
after all. (In fact this shows that for all $i, j$, $p_i \neq q_j$.)
In particular $p_1 \neq q_1$. Without loss of generality, assume $p_1 < q_1$. Then, if we
subtract $p_1 q_2 \cdots q_s$ from both sides of (9), we get
(10)
CHAPTER 3
$$f : x \mapsto a_n x^n + \ldots + a_1 x + a_0$$
Then every polynomial expression $f = \sum_{i=0}^n a_i x^i$ determines a polynomial function $x \mapsto f(x)$. But it is at least conceivable that two different-looking polynomial
expressions give rise to the same function. To give some rough idea of what I mean
here, consider the two expressions $f = 2 \arcsin x + 2 \arccos x$ and $g = \pi$. Now it
turns out for all $x \in [-1, 1]$ (the common domain of the arcsin and arccos functions) we have $f(x) = \pi$. (The angle whose sine is $x$ is complementary to the
angle whose cosine is $x$, so $\arcsin x + \arccos x = \frac{\pi}{2}$.) But still $f$ and $g$ are
given by different expressions: if I ask you what the coefficient of $\arcsin x$ is in
the expression $f$, you will immediately tell me it is 2. If I ask you what the coefficient of $\arcsin x$ is in the expression $\pi$, you will have no idea what I'm talking about.
One special polynomial expression is the zero polynomial. This is the polynomial whose $i$th coefficient $a_i$ is equal to zero for all $i \geq 0$.
Every nonzero polynomial expression has a degree, which is a natural number,
the largest natural number $i$ such that the coefficient $a_i$ of $x^i$ is nonzero. Thus in
(11) the degree of $f$ is $n$ if and only if $a_n \neq 0$: otherwise the degree is smaller than $n$.
Although the zero polynomial expression does not have any natural number as
a degree, it is extremely convenient to regard deg 0 as negative, i.e., such that deg 0
is smaller than the degree of any nonzero polynomial. This means that for any
$d \in \mathbb{N}$ the set of polynomials of degree at most $d$ includes the zero polynomial.
We will follow this convention here but try not to make too big a deal of it.
Let us give some examples to solidify this important concept:
The polynomials of degree at most 0 are the expressions f = a0 . The corresponding
functions are all constant functions: their graphs are horizontal lines. (The graph
of the zero polynomial is still a horizontal line, y = 0, so it is useful to include the
zero polynomial as having degree at most zero.)
The polynomials of degree at most one are the linear expressions $L = mx + b$.
The corresponding functions are linear functions: their graphs are straight lines.
The degree of $L(x)$ is one if $m \neq 0$ (i.e., if the line is not horizontal) and 0 if
$m = 0$ and $b \neq 0$.
Similarly the polynomials of degree at most two are the quadratic expressions
$q(x) = ax^2 + bx + c$. The degree of $q$ is 2 unless $a = 0$.
We often denote the degree of the polynomial expression f by deg f .
Theorem 3.1. Let $f, g$ be nonzero polynomial expressions.
a) If $f + g \neq 0$, then $\deg(f + g) \leq \max(\deg f, \deg g)$.
b) We have $\deg(fg) = \deg f + \deg g$.
Proof. a) Suppose that
$$f(x) = a_m x^m + \ldots + a_1 x + a_0, \quad a_m \neq 0$$
and
$$g(x) = b_n x^n + \ldots + b_1 x + b_0, \quad b_n \neq 0$$
1. POLYNOMIAL FUNCTIONS
Exercise: For polynomial expressions $f = \sum_{i=0}^{n} a_i x^i$ and $g = \sum_{j=0}^{m} b_j x^j$, define
$$(f \circ g)(x) = \sum_{i=0}^{n} a_i \Bigl( \sum_{j=0}^{m} b_j x^j \Bigr)^i.$$
Thus we may take $q(x) = q_1(x) + \ldots + q_k(x)$, so that $r(x) = a(x) - q(x)b(x) = r_k(x)$
and $\deg r(x) = \deg r_k(x) < \deg b(x)$.
Step 2: We prove the uniqueness of $q(x)$ (and thus of $r(x) = a(x) - q(x)b(x)$).
Suppose $Q(x)$ is another polynomial such that $R(x) = a(x) - Q(x)b(x)$ has degree
less than the degree of $b(x)$. Then
$$a(x) = q(x)b(x) + r(x) = Q(x)b(x) + R(x),$$
so
$$(q(x) - Q(x))b(x) = R(x) - r(x).$$
Since $r(x)$ and $R(x)$ both have degree less than $\deg b(x)$, so does $R(x) - r(x)$, so
$$\deg b(x) > \deg(R(x) - r(x)) = \deg((q(x) - Q(x))b(x)) = \deg(q(x) - Q(x)) + \deg b(x).$$
Thus $\deg(q(x) - Q(x)) < 0$, and the only polynomial with negative degree is the
zero polynomial, i.e., $q(x) = Q(x)$ and thus $r(x) = R(x)$.¹
Exercise: Convince yourself that the proof of Step 1 is really a careful, abstract
description of the standard high school procedure for long division of polynomials.
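To make the connection with long division concrete, here is a minimal sketch of the procedure for polynomials with rational coefficients. The representation (a list of coefficients from the highest degree down) and the name `poly_divmod` are assumptions of this example, and it is a plain rendering of the high school algorithm rather than a transcription of Step 1.

```python
from fractions import Fraction

def poly_divmod(a, b):
    """Divide a(x) by b(x), both given as coefficient lists, highest degree first.

    Returns (q, r) with a = q*b + r, where r has fewer coefficients than b
    (so deg r < deg b, with r possibly zero).
    """
    a = [Fraction(c) for c in a]
    b = [Fraction(c) for c in b]
    if not b or b[0] == 0:
        raise ValueError("b must have a nonzero leading coefficient")
    q = []
    r = a[:]
    while len(r) >= len(b):
        # Cancel the leading term of r with a multiple of b.
        coeff = r[0] / b[0]
        q.append(coeff)
        for i in range(len(b)):
            r[i] -= coeff * b[i]
        r.pop(0)  # the leading coefficient is now zero; drop it
    return q, r

# Example: divide x^3 - 2x^2 - 4 by x - 3.
quotient, remainder = poly_divmod([1, -2, 0, -4], [1, -3])
```

In the example, the quotient is $x^2 + x + 3$ and the remainder is the constant 5, so $x^3 - 2x^2 - 4 = (x - 3)(x^2 + x + 3) + 5$. Each pass through the loop lowers the degree of the running remainder by one, which is why the process stops.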
Theorem 3.2 has many important and useful consequences; here are some of them.
Theorem 3.3. (Root-Factor Theorem) Let f (x) be a polynomial expression and
c a real number. The following are equivalent:
(i) f (c) = 0. (c is a root of f .)
(ii) There is some polynomial expression q such that as polynomial expressions,
$f(x) = (x - c)q(x)$. ($x - c$ is a factor of $f$.)
Proof. We apply the Division Theorem with $a(x) = f(x)$ and $b(x) = x - c$,
getting polynomials $q(x)$ and $r(x)$ such that
$$f(x) = (x - c)q(x) + r(x)$$
and $r(x)$ is either the zero polynomial or has $\deg r < \deg(x - c) = 1$. In other
words, $r(x)$ is in all cases a constant polynomial (perhaps constantly zero), and its
constant value can be determined by plugging in $x = c$:
$$f(c) = (c - c)q(c) + r(c) = r(c).$$
Thus if $f(c) = 0$, then $r$ is the zero polynomial and $f(x) = (x - c)q(x)$.
The converse is easier: if $f(x) = (x - c)q(x)$, then $f(c) = (c - c)q(c) = 0$.
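The proof shows that the remainder on division by $x - c$ is the constant $f(c)$. A small Python sketch can make this concrete. It uses synthetic division (Horner's scheme), which is a standard shortcut for dividing by $x - c$ and is not the method of the text; the name `synthetic_division` is an assumption of this sketch.

```python
def synthetic_division(coeffs, c):
    """Divide the polynomial with the given coefficients (highest degree first,
    non-empty list) by (x - c). Returns (quotient_coeffs, remainder)."""
    running = []
    carry = 0
    for a in coeffs:
        carry = carry * c + a  # Horner step
        running.append(carry)
    remainder = running.pop()  # the last value equals f(c)
    return running, remainder


# f(x) = x^3 - 6x^2 + 11x - 6 = (x - 1)(x - 2)(x - 3)
f = [1, -6, 11, -6]
print(synthetic_division(f, 2))  # remainder 0: x - 2 is a factor
print(synthetic_division(f, 4))  # remainder f(4) = 6: x - 4 is not a factor
```

A zero remainder corresponds to condition (i), and the returned quotient is the $q(x)$ of condition (ii).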
Corollary 3.4. Let f be a nonzero polynomial of degree n. Then the corresponding polynomial function f has at most n real roots: i.e., there are at most n
real numbers a such that f (a) = 0.
Proof. By induction on n.
Base case (n = 0): If deg f (x) = 0, then f is a nonzero constant, so has no roots.
Induction Step: Let $n \in \mathbb{N}$, suppose that every polynomial of degree $n$ has at
most $n$ real roots, and let $f(x)$ be a polynomial of degree $n + 1$. If $f(x)$ has no
real root, great. Otherwise, there exists $a \in \mathbb{R}$ such that $f(a) = 0$, and by the
Root-Factor Theorem we may write $f(x) = (x - a)g(x)$. Moreover by Theorem
3.1, we have $n + 1 = \deg f = \deg((x - a)g(x)) = \deg(x - a) + \deg g = 1 + \deg g$, so
$\deg g = n$. Therefore our induction hypothesis applies and $g(x)$ has $m$ distinct real
¹ If you don't like the convention that the zero polynomial has negative degree, then you
can phrase the argument as follows: if $q(x) - Q(x)$ were a nonzero polynomial, these degree
considerations would give the absurd conclusion $\deg(q(x) - Q(x)) < 0$, so $q(x) - Q(x) = 0$.
Proof. We know
$$0 = P\left(\frac{b}{c}\right) = a_n \frac{b^n}{c^n} + \ldots + a_1 \frac{b}{c} + a_0.$$
2. Rational Functions
A rational function is a function which is a quotient of two polynomial functions:
$f(x) = \frac{P(x)}{Q(x)}$, with $Q(x)$ not the zero polynomial. To be sure, $Q$ is allowed to have
roots; it is just not allowed to be zero at all points of $\mathbb{R}$.
The "natural domain" of $\frac{P(x)}{Q(x)}$ is the set of real numbers for which $Q(x) \neq 0$,
i.e., all but a finite (possibly empty) set of points.
so we may take A = aP , B = bP .
Exercise: Show that every monic polynomial of positive degree factors uniquely into
a product of monic irreducible polynomials.
Corollary 3.12. Let $n \in \mathbb{Z}^+$, and let $p, P, Q_0$ be polynomials, such that $p$ is
irreducible and that $p$ does not divide $Q_0$.
a) There are polynomials $B, A$ such that we have a rational function identity
$$\frac{P(x)}{p(x)^n Q_0(x)} = \frac{B(x)}{p(x)^n} + \frac{A(x)}{p(x)^{n-1} Q_0(x)}.$$
(12) $P = Ap + BQ_0$.
The identity of Corollary 3.12 can be applied repeatedly: suppose we start with a
proper rational function $\frac{P}{Q}$, with $Q$ a monic polynomial (as is no loss of generality;
we can absorb the leading coefficient into $P$). Then $Q$ is a polynomial of positive
degree, so we may factor it as
$$Q = p_1^{a_1} \cdots p_r^{a_r},$$
where $p_1, \ldots, p_r$ are distinct monic irreducible polynomials. Let us put $Q_0 = p_2^{a_2} \cdots p_r^{a_r}$,
so $Q = p_1^{a_1} Q_0$. Applying Corollary 3.12 with $n = a_1$, we may write
$$\frac{P}{Q} = \frac{B_{a_1}}{p_1^{a_1}} + \frac{A_{a_1}}{p_1^{a_1 - 1} Q_0}$$
with $\deg B_{a_1} < \deg p_1$ and $\deg A_{a_1} < \deg(p_1^{a_1 - 1} Q_0)$. Thus Corollary 3.12 applies to
$\frac{A_{a_1}}{p_1^{a_1 - 1} Q_0}$ as well, giving us overall
$$\frac{P}{Q} = \frac{B_{a_1}}{p_1^{a_1}} + \frac{B_{a_1 - 1}}{p_1^{a_1 - 1}} + \frac{A_{a_1 - 1}}{p_1^{a_1 - 2} Q_0}.$$
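$\frac{P(x)}{Q(x)}$ be a proper
$$\frac{P(x)}{Q(x)} = \sum_{i=1} \frac{A_{1,i}}{(x - c_1)^i} + \ldots + \sum_{i=1} \frac{A_{k,i}}{(x - c_k)^i} + \sum_{j=1} \frac{B_{1,j}x + C_{1,j}}{q_1(x)^j} + \ldots + \sum_{j=1} \frac{B_{l,j}x + C_{l,j}}{q_l(x)^j}.$$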
Exercise: Show that the constants in (16) are unique.
CHAPTER 4
easy to see that an ordered field admits infinitesimal elements iff it does not satisfy
the Archimedean axiom, whereas the real numbers $\mathbb{R}$ do satisfy the Archimedean
axiom. So at best Leibniz was advocating a limiting process based on a different
mathematical model of the real numbers than the standard modern one. And
at worst, Leibniz's writing on infinitesimals seems like equivocation: at different
stages of a calculation the same quantity is at one point vanishingly small and at
another point not. The calculus of both fluxions and infinitesimals required, among
other things, some goodwill: if you used them as Newton and Leibniz did in their
calculations then at the end you would get a sensible (in fact, correct!) answer.
But if you wanted to make trouble and ask why infinitesimals could not be manipulated in other ways which swiftly led to contradictions, it was all too easy to do so.
The calculus of Newton and Leibniz had a famous early critic, Bishop George
Berkeley. In 1734 he published The Analyst, subtitled "A DISCOURSE Addressed
to an Infidel MATHEMATICIAN. WHEREIN It is examined whether the Object,
Principles, and Inferences of the modern Analysis are more distinctly conceived,
or more evidently deduced, than Religious Mysteries and Points of Faith." Famously, Berkeley described fluxions as "the ghosts of departed quantities." I haven't
read Berkeley's text, but from what I am told it displays a remarkable amount of
mathematical sophistication and most of its criticisms are essentially valid!
So if the mid 17th century is the birth of systematic calculus, it is not the birth
of a satisfactory treatment of the limiting concept. When did this come? More
than 150 years later! The modern definition of limits via inequalities was given by
Bolzano in 1817 (but not widely read), in a somewhat imprecise form by Cauchy
in his influential 1821 text, and then finally by Weierstrass around 1850.
2. Derivatives Without a Careful Definition of Limits
Example 2.1: Let $f(x) = mx + b$ be a linear function. Then $f$ has the following
property: for any $x_1 \neq x_2$, the secant line between the two points $(x_1, f(x_1))$ and
$(x_2, f(x_2))$ is the line $y = f(x)$. Indeed, the slope of the secant line is
$$\frac{f(x_2) - f(x_1)}{x_2 - x_1} = \frac{mx_2 + b - (mx_1 + b)}{x_2 - x_1} = \frac{m(x_2 - x_1)}{x_2 - x_1} = m.$$
Thus the secant line has slope $m$ and passes through the point $(x_1, f(x_1))$, as does
the linear function $f$. But there is a unique line passing through a given point with
a given slope, so that the secant line must be $y = mx + b$.
Using this, it is now not at all difficult to compute the derivative of a linear function...assuming an innocuous fact about limits.
Example 2.2: Let $f(x) = mx + b$. Then
$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h \to 0} \frac{m(x + h) + b - (mx + b)}{h} = \lim_{h \to 0} \frac{mh}{h} = \lim_{h \to 0} m.$$
The above computation is no surprise, since we already saw that the slope of any
secant line to a linear function $y = mx + b$ is just $m$. So now we need to evaluate
the limiting slope of the secant lines. But surely if the slope of every secant line
is $m$, the desired limiting slope is also $m$, and thus $f'(x) = m$ (constant function).
Let us record the fact about limits we used.
Fact 4.1. The limit of a constant function f (x) = C as x approaches a is C.
Example 2.3: Let $f(x) = x^2$. Then
$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h \to 0} \frac{(x + h)^2 - x^2}{h} = \lim_{h \to 0} \frac{x^2 + 2xh + h^2 - x^2}{h} = \lim_{h \to 0} \frac{h(2x + h)}{h} = \lim_{h \to 0} 2x + h.$$
Now Leibniz would argue as follows: in computing the limit, we want to take $h$
infinitesimally small. Therefore $2x + h$ is infinitesimally close to $2x$, and so in the
limit the value is $2x$. Thus
$$f'(x) = 2x.$$
But these are just words. A simpler and equally accurate description of what we
have done is as follows: we simplified the difference quotient $\frac{f(x+h) - f(x)}{h}$ until we
got an expression in which it made good sense to plug in $h = 0$, and then we
plugged in $h = 0$. If you wanted to give a freshman calculus student practical
instructions on how to compute derivatives of reasonably simple functions directly
from the definition, I think you couldn't do much better than this!
Example 2.4: Let $f(x) = x^3$. Then
$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h \to 0} \frac{(x + h)^3 - x^3}{h} = \lim_{h \to 0} \frac{x^3 + 3x^2 h + 3xh^2 + h^3 - x^3}{h}$$
$$= \lim_{h \to 0} \frac{h(3x^2 + 3xh + h^2)}{h} = \lim_{h \to 0} 3x^2 + 3xh + h^2.$$
Again we have simplified to the point where we may meaningfully set $h = 0$, getting
$$f'(x) = 3x^2.$$
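Example 2.5: For $n \in \mathbb{Z}^+$, let $f(x) = x^n$. Then
$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h \to 0} \frac{(x + h)^n - x^n}{h} = \lim_{h \to 0} \frac{\sum_{i=0}^{n} \binom{n}{i} x^{n-i} h^i - x^n}{h}$$
$$= \lim_{h \to 0} \frac{h \sum_{i=1}^{n} \binom{n}{i} x^{n-i} h^{i-1}}{h} = \lim_{h \to 0} \left( \binom{n}{1} x^{n-1} + h \sum_{i=2}^{n} \binom{n}{i} x^{n-i} h^{i-2} \right)$$
$$= \binom{n}{1} x^{n-1} = nx^{n-1}.$$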
At this point we have seen many examples of a very pleasant algebraic phenomenon.
Namely, for $y = f(x)$ a polynomial function, when we compute the difference
quotient $\frac{f(x+h) - f(x)}{h}$ we find that the numerator, say $G(h) = f(x + h) - f(x)$,
always has $h$ as a factor: thus we can write it as $G(h) = hg(h)$, where $g(h)$ is
another polynomial in $h$. This is exactly what we need in order to compute the
derivative, because when this happens we get
$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h \to 0} \frac{hg(h)}{h} = \lim_{h \to 0} g(h) = g(0).$$
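The factorization $G(h) = hg(h)$ can be checked by machine for $f(x) = x^n$. The sketch below is only an illustration: the function name `difference_quotient_coeffs` is made up for this purpose, and it simply reads the coefficients of $g(h)$ off the binomial expansion used in Example 2.5.

```python
from math import comb


def difference_quotient_coeffs(n, x):
    """Coefficients of g(h) = ((x + h)^n - x^n) / h as a polynomial in h,
    listed from the constant term upward, for a fixed number x."""
    # (x + h)^n - x^n = sum_{i=1}^{n} C(n, i) x^(n-i) h^i; dividing by h
    # lowers every exponent of h by one.
    return [comb(n, i) * x ** (n - i) for i in range(1, n + 1)]


coeffs = difference_quotient_coeffs(3, 2)
print(coeffs)     # g(h) = 12 + 6h + h^2 when n = 3, x = 2
print(coeffs[0])  # g(0) = 12 = 3 * 2^2
```

The constant term `coeffs[0]` is $g(0)$, which agrees with $nx^{n-1}$.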
(This generalizes our first fact above, since constant functions are polynomials.)
Differentiating polynomial functions directly from the definition is, evidently, somewhat tedious. Perhaps we can establish some techniques to streamline the process?
For instance, suppose we know the derivative of some function $f$: what can we say
about the derivative of $cf(x)$, where $c$ is some real number? Let's see:
$$\lim_{h \to 0} \frac{cf(x + h) - cf(x)}{h} = \lim_{h \to 0} c \left( \frac{f(x + h) - f(x)}{h} \right).$$
If we assume $\lim_{x \to a} cf(x) = c \lim_{x \to a} f(x)$, then we can complete this computation:
the derivative of $cf(x)$ is $cf'(x)$. Let us again record our assumption about limits.
Fact 4.3. If $\lim_{x \to a} f(x) = L$, then $\lim_{x \to a} cf(x) = cL$.
This tells us, for instance, that the derivative of $17x^{10}$ is $17(10x^9) = 170x^9$. More generally, this tells us that the derivative of the general monomial $cx^n$ is $cnx^{n-1}$.
Now what about sums?
Let $f$ and $g$ be two differentiable functions. Then the derivative of $f + g$ is
$$\lim_{h \to 0} \frac{f(x + h) + g(x + h) - f(x) - g(x)}{h} = \lim_{h \to 0} \left( \frac{f(x + h) - f(x)}{h} + \frac{g(x + h) - g(x)}{h} \right).$$
If we assume the limit of a sum is the sum of limits, we get
$$(f + g)' = f' + g'.$$
Again, let's record what we've used.
Fact 4.4. If $\lim_{x \to a} f(x) = L$ and $\lim_{x \to a} g(x) = M$, then $\lim_{x \to a} (f + g)(x) = L + M$.
Exercise 2.6: Show by mathematical induction that if $f_1, \ldots, f_n$ are functions with
derivatives $f_1', \ldots, f_n'$, then $(f_1 + \ldots + f_n)' = f_1' + \ldots + f_n'$.
Putting these facts together, we get an expression for the derivative of any polynomial function: if $f(x) = a_n x^n + \ldots + a_1 x + a_0$, then $f'(x) = na_n x^{n-1} + \ldots + a_1$. In
particular the derivative of a degree $n$ polynomial is a polynomial of degree $n - 1$
(and the derivative of a constant polynomial is the zero polynomial).
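The resulting rule is entirely mechanical, which a few lines of Python make plain. This is a sketch under stated assumptions: polynomials are stored as coefficient lists with `coeffs[k]` the coefficient of $x^k$, and the name `poly_derivative` is chosen here for illustration.

```python
def poly_derivative(coeffs):
    """Derivative of a_0 + a_1 x + ... + a_n x^n, where coeffs[k] = a_k.

    Applies the term-by-term rule (a_k x^k)' = k a_k x^(k-1); the constant
    term contributes nothing, so a constant polynomial returns []."""
    return [k * a for k, a in enumerate(coeffs)][1:]


# 17 x^10 has derivative 170 x^9
print(poly_derivative([0] * 10 + [17]))
# 4 + 3x + 2x^2 has derivative 3 + 4x
print(poly_derivative([4, 3, 2]))
```

Here the empty list stands for the zero polynomial, matching the remark about constant polynomials.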
3. Limits in Terms of Continuity
We have been dancing around two fundamental issues in our provisional treatment
of derivatives. The first is, of course, the notion of the limit of a function at a point.
The second, just as important, is that of continuity at a point.
In freshman calculus it is traditional to define continuity in terms of limits. A
true fact which is not often mentioned is that this works just as well the other way
around: treating the concept of a continuous function as known, one can define
limits in terms of it. Since I think most people have at least some vague notion
of what a continuous function is (very roughly, it is that the graph $y = f(x)$ is a
nice, unbroken curve), and I know all too well that many students have zero intuition for limits, it seems to be of some value to define limits in terms of continuity.
Let $f: \mathbb{R} \to \mathbb{R}$ be a function. For any $x \in \mathbb{R}$, $f$ may or may not be continuous at $x$. We say $f$ is simply continuous if it is continuous at $x$ for every $x \in \mathbb{R}$.³
Here are some basic and useful properties of continuous functions. (Of course
we cannot prove them until we give a formal definition of continuous functions.)
Fact 4.5. a) Every constant function is continuous at every $c \in \mathbb{R}$.
b) The identity function $I(x) = x$ is continuous at every $c \in \mathbb{R}$.
c) If $f$ and $g$ are continuous at $x = c$, then $f + g$ and $f \cdot g$ are continuous at $x = c$.
d) If $f$ is continuous at $x = c$ and $f(c) \neq 0$, then $\frac{1}{f}$ is continuous at $x = c$.
e) If $f$ is continuous at $x = c$ and $g$ is continuous at $x = f(c)$, then $g \circ f$ is
continuous at $x = c$.
From this relatively small number of facts many other facts follow. For instance,
since polynomials are built up out of the identity function and the constant functions by repeated addition and multiplication, it follows that all polynomials are
continuous at every $c \in \mathbb{R}$. Similarly, every rational function $\frac{f}{g}$ is continuous at
every $c$ in its domain, i.e., at all points $c$ such that $g(c) \neq 0$.
We now wish to define the limit of a function $f$ at a point $c \in \mathbb{R}$. Here it is
crucial to remark that $c$ need not be in the domain of $f$. Rather what we need
is that $f$ is defined on some deleted interval $I_{c,\delta}$ about $c$: that is, there is some
$\delta > 0$ such that $f$ is defined at all points of $(c - \delta, c + \delta)$ except possibly at $c$. To see
that this is a necessary business, consider the basic limit defining the derivative:
$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}.$$
Here $x$ is fixed and we are thinking of the difference quotient as a function of $h$.
Note though that this function is not defined at $h = 0$: the denominator is equal
to zero. In fact what we are trying to do when differentiating is to find the most reasonable extension of the right hand side to a function which is defined at 0. This
brings us to the following definition.
|A| . (Note what is being done here: by continuity, we can make $|f(x) - f(c)|$ less
than any positive number we choose. It is convenient for us to make it smaller than
= .
|A|
b) Fix $\epsilon > 0$. We must show that there exists $\delta > 0$ such that $|x - c| < \delta$ implies
$|f(x) + g(x) - (f(c) + g(c))| < \epsilon$. Now
$$|f(x) + g(x) - (f(c) + g(c))| = |(f(x) - f(c)) + (g(x) - g(c))| \leq |f(x) - f(c)| + |g(x) - g(c)|.$$
This is good: since $f$ and $g$ are both continuous at $c$, we can make each of $|f(x) - f(c)|$
and $|g(x) - g(c)|$ as small as we like by taking $x$ sufficiently close to $c$. The
sum of two quantities which can each be made as small as we like can be made as
small as we like!
Now formally: choose $\delta_1 > 0$ such that $|x - c| < \delta_1$ implies $|f(x) - f(c)| < \frac{\epsilon}{2}$.
Choose $\delta_2 > 0$ such that $|x - c| < \delta_2$ implies $|g(x) - g(c)| < \frac{\epsilon}{2}$. Let $\delta = \min(\delta_1, \delta_2)$.
Then $|x - c| < \delta$ implies $|x - c| < \delta_1$ and $|x - c| < \delta_2$, so
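Using Lemma 4.6a) and taking $\epsilon = 1$, there exists $\delta_1 > 0$ such that $|x - c| < \delta_1$
implies $|f(x)| \leq |f(c)| + 1$. There exists $\delta_2 > 0$ such that $|x - c| < \delta_2$ implies
$|g(x) - g(c)| < \frac{\epsilon}{2(|f(c)| + 1)}$. Finally, there exists $\delta_3 > 0$ such that $|x - c| < \delta_3$ implies
$|f(x) - f(c)| < \frac{\epsilon}{2|g(c)|}$. (Here we are assuming that $g(c) \neq 0$. If $g(c) = 0$ then we
simply don't have the second term in our expression and the argument is similar
but easier.) Taking $\delta = \min(\delta_1, \delta_2, \delta_3)$, for $|x - c| < \delta$ we have that $|x - c|$ is less than $\delta_1$, $\delta_2$
and $\delta_3$, so
$$|f(x)g(x) - f(c)g(c)| \leq |f(x)||g(x) - g(c)| + |g(c)||f(x) - f(c)|$$
$$< (|f(c)| + 1) \frac{\epsilon}{2(|f(c)| + 1)} + |g(c)| \frac{\epsilon}{2|g(c)|} = \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.$$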
d) Since $\frac{f}{g} = f \cdot \frac{1}{g}$, in light of part c) it will suffice to show that if $g$ is continuous
at $c$ and $g(c) \neq 0$ then $\frac{1}{g}$ is continuous at $c$. Fix $\epsilon > 0$. We must show that there
exists $\delta > 0$ such that $|x - c| < \delta$ implies
$$\left| \frac{1}{g(x)} - \frac{1}{g(c)} \right| < \epsilon.$$
Now
$$\left| \frac{1}{g(x)} - \frac{1}{g(c)} \right| = \frac{|g(c) - g(x)|}{|g(x)||g(c)|} = \frac{|g(x) - g(c)|}{|g(x)||g(c)|}.$$
Since $g$ is continuous at $x = c$, we can make the numerator $|g(x) - g(c)|$ as small
as we like by taking $x$ sufficiently close to $c$. This will make the entire fraction as
small as we like provided the denominator is not also getting arbitrarily small as $x$
approaches $c$. But indeed, since $g$ is continuous at $c$ and $g(c) \neq 0$, the denominator
is approaching $|g(c)|^2 \neq 0$. Thus again we have a quantity which we can make
arbitrarily small times a bounded quantity, so it can be made arbitrarily small!
Now formally:
We apply Lemma 4.6b) with $\alpha = \frac{1}{2}$: there exists $\delta_1 > 0$ such that $|x - c| < \delta_1$
implies $|g(x)| \geq \frac{|g(c)|}{2}$ and thus also
$$\frac{1}{|g(x)||g(c)|} \leq \frac{2}{|g(c)|^2}.$$
Also there exists $\delta_2 > 0$ such that $|x - c| < \delta_2$ implies $|g(x) - g(c)| < \left( \frac{|g(c)|^2}{2} \right) \epsilon$.
Taking $\delta = \min(\delta_1, \delta_2)$, $|x - c| < \delta$ implies
$$\left| \frac{1}{g(x)} - \frac{1}{g(c)} \right| = \left( \frac{1}{|g(x)||g(c)|} \right) |g(x) - g(c)| < \left( \frac{2}{|g(c)|^2} \right) \left( \frac{|g(c)|^2}{2} \right) \epsilon = \epsilon.$$
e) Fix $\epsilon > 0$. Since $g(y)$ is continuous at $y = f(c)$, there exists $\gamma > 0$ such that
$|y - f(c)| < \gamma$ implies $|g(y) - g(f(c))| < \epsilon$. Moreover, since $f$ is continuous at $c$,
there exists $\delta > 0$ such that $|x - c| < \delta$ implies $|f(x) - f(c)| < \gamma$. Thus, if $|x - c| < \delta$, then writing $y = f(x)$ we have
$|f(x) - f(c)| = |y - f(c)| < \gamma$ and hence
$$|g(f(x)) - g(f(c))| = |g(y) - g(f(c))| < \epsilon.$$
Corollary 4.8. All rational functions are continuous.
Proof. Since rational functions are built out of constant functions and the
identity by repeated addition, multiplication and division, this follows immediately
from Theorem 4.7.
Other elementary functions: unfortunately if we try to go beyond rational functions to other elementary functions familiar from precalculus, we run into the issue
that we have not yet given complete, satisfactory definitions of these functions! For
instance, take even the relatively innocuous $f(x) = \sqrt{x}$. We want this function to
have domain $[0, \infty)$, but this uses the special property of $\mathbb{R}$ that every non-negative
number has a square root: we haven't proved this yet! If $\alpha > 0$ is irrational we
have not given any definition of the power function $x^{\alpha}$. Similarly we do not yet
have rigorous definitions of $a^x$ for $a > 1$, $\log x$, $\sin x$ and $\cos x$, so we are poorly
placed to rigorously prove their continuity. However (following Spivak), in order
not to drastically limit the supply of functions to appear in our examples
and exercises, we will proceed for now on the assumption that all the above
elementary functions are continuous. We hasten to make two remarks:
Remark 4.5: This assumption can be justified! That is, all the elementary functions above are indeed continuous at every point of their domain (with the small
proviso that for power functions like $\sqrt{x}$ we will need to give a separate definition
of continuity at an endpoint of an interval, coming up soon). And in fact we will
prove this later in the course...much later.
Remark 4.6: We will not use the continuity of the elementary functions as an
assumption in any of our main results (but only in results and examples explicitly
involving elementary functions; e.g. we will use the assumed continuity of the sine
function to differentiate it). Thus it will be clear that we are not arguing circularly
when we finally prove the continuity of these functions.
5. Limits Done Right
5.1. The Formal Definition of a Limit.
In order to formally define limits, it is convenient to have the notion of a deleted
interval $I_{c,\delta}$ about a point $c$, namely a set of real numbers of the form
$$0 < |x - c| < \delta$$
for some $\delta > 0$. Thus $I_{c,\delta}$ consists of $(c - \delta, c)$ together with the points $(c, c + \delta)$,
or more colloquially it contains all points "sufficiently close to $c$ but not equal to $c$".
Now comes the definition. For real numbers $c$ and $L$ and a function $f: D \subset \mathbb{R} \to \mathbb{R}$,
we say $\lim_{x \to c} f(x) = L$ if for every $\epsilon > 0$ there exists $\delta > 0$ such that for all $x$ in
the deleted interval $I_{c,\delta}$ (i.e., for all $x$ with $0 < |x - c| < \delta$), $f$ is defined at $x$ and
$|f(x) - L| < \epsilon$.
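The definition is a statement about all $x$ in a deleted interval, so no finite computation can prove a limit. Still, a numerical spot check can help build intuition for how $\delta$ depends on $\epsilon$. The Python sketch below is such a spot check, not a proof; the helper name `check_delta`, the sample function $f(x) = 3x + 1$ and the choice $\delta = \epsilon/3$ are assumptions made for this illustration.

```python
def check_delta(f, c, L, eps, delta, samples=1000):
    """Sample points x with 0 < |x - c| < delta on both sides of c and
    report whether |f(x) - L| < eps held at every sampled point."""
    for k in range(1, samples + 1):
        offset = delta * k / (samples + 1)  # strictly between 0 and delta
        for x in (c - offset, c + offset):
            if not abs(f(x) - L) < eps:
                return False
    return True


def f(x):
    return 3 * x + 1


eps = 0.01
# For f(x) = 3x + 1, c = 2, L = 7 we have |f(x) - 7| = 3|x - 2|,
# so delta = eps / 3 works, while delta = eps is too generous.
print(check_delta(f, 2, 7, eps, eps / 3))
print(check_delta(f, 2, 7, eps, eps))
```

The first call reports success on every sampled point, whereas the second finds sample points violating the inequality, showing that $\delta$ must shrink with $\epsilon$.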
Among all the many problems of limits, perhaps the following is the most basic
and important.
Theorem 4.9. The limit at a point is unique (if it exists at all): that is, if $L$
and $M$ are two numbers such that $\lim_{x \to c} f(x) = L$ and $\lim_{x \to c} f(x) = M$, then
$L = M$.
Proof. Seeking a contradiction, we suppose $L \neq M$; it is no loss of generality
to suppose that $L < M$ (otherwise switch $L$ and $M$) and we do so. Now we take
$\epsilon = \frac{M - L}{2}$ in the definition of limit: since $\lim_{x \to c} f(x) = L$, there exists $\delta_1 > 0$ such
that $0 < |x - c| < \delta_1$ implies $|f(x) - L| < \frac{M - L}{2}$; and similarly, since $\lim_{x \to c} f(x) = M$,
there exists $\delta_2 > 0$ such that $0 < |x - c| < \delta_2$ implies $|f(x) - M| < \frac{M - L}{2}$.
Taking $\delta = \min(\delta_1, \delta_2)$, then, as usual, for $0 < |x - c| < \delta$ we get both inequalities:
$$|f(x) - L| < \frac{M - L}{2},$$
$$|f(x) - M| < \frac{M - L}{2}.$$
However these inequalities are contradictory! Before we go further we urge the
reader to draw a picture to see that the vertical strips defined by the two inequalities above are disjoint: they have no points in common. Let us now check
this formally: since $|f(x) - L| < \frac{M - L}{2}$, $f(x) < L + \frac{M - L}{2} = \frac{M + L}{2}$. On the other
hand, since $|f(x) - M| < \frac{M - L}{2}$, $f(x) > M - \frac{M - L}{2} = \frac{M + L}{2}$. Clearly there is not
a single value of $x$ such that $f(x)$ is at the same time greater than and less than
$\frac{M + L}{2}$, let alone a deleted interval around $c$ of such values of $x$, so we have reached
a contradiction. Therefore $L = M$.
We have now given a formal definition of continuity at a point and also a formal
definition of limits at a point. Previously though we argued that each of limits and
continuity can be defined in terms of the other, so we are now in an "overdetermined" situation. We should therefore check the compatibility of our definitions.
Theorem 4.10. Let $f: D \subset \mathbb{R} \to \mathbb{R}$, and let $c \in \mathbb{R}$.
a) $f$ is continuous at $x = c$ if and only if $f$ is defined at $c$ and $\lim_{x \to c} f(x) = f(c)$.
b) $\lim_{x \to c} f(x) = L$ iff $f$ is defined in some deleted interval $I_{c,\delta}$ around $c$ and,
defining $\tilde{f}$ on $(c - \delta, c + \delta)$ by $\tilde{f}(x) = f(x)$ for $x \neq c$ and $\tilde{f}(c) = L$, makes $\tilde{f}$ continuous at
$x = c$.
Proof. All the pedagogical advantage here comes from working through this
yourself rather than reading my proof, so I leave it to you.
5.2. Basic Properties of Limits.
Most of the basic properties of continuity discussed above have analogues for limits.
We state the facts in the following result.
Theorem 4.11. Let $f$ and $g$ be two functions defined in a deleted interval $I_{c,\delta}$
of a point $c$. We suppose that $\lim_{x \to c} f(x) = L$ and $\lim_{x \to c} g(x) = M$.
a) For any constant $A$, $\lim_{x \to c} Af(x) = AL$.
b) We have $\lim_{x \to c} (f(x) + g(x)) = L + M$.
c) We have $\lim_{x \to c} f(x)g(x) = LM$.
d) If $M \neq 0$, then $\lim_{x \to c} \frac{f(x)}{g(x)} = \frac{L}{M}$.
We leave the proof to the reader. It is possible to prove any/all of these facts in
one of two ways: (i) by rephrasing the definition of limit in terms of continuity and
appealing to Theorem 4.7 above, or (ii) adapting the proofs of Theorem 4.7 to the
current context.
Now what about limits of composite functions? The natural analogue of Theorem 4.7e) above would be the following:
If $\lim_{x \to c} f(x) = L$ and $\lim_{x \to L} g(x) = M$, then $\lim_{x \to c} g(f(x)) = M$.
In other words, one can "pull a limit through a continuous function". In this form
the result is actually a standard one in freshman calculus.
It happens that one can say exactly when the above statement about limits of
composite functions holds. I don't plan on mentioning this in class and you needn't
keep it in mind or even read it, but I recently learned that this question has a rather
simple answer so I might as well record it here so I don't forget it myself.
Theorem 4.13. (Marjanovic [MK09]) Suppose $\lim_{x \to c} f(x) = L$ and $\lim_{x \to L} g(x) = M$.
The following are equivalent:
(i) $\lim_{x \to c} g(f(x)) = M$.
(ii) At least one of the following holds:
a) $g$ is continuous at $L$.
b) There exists $\Delta > 0$ such that for all $0 < |x - c| < \Delta$, $f(x) \neq L$.
Proof. (i) $\implies$ (ii): We will argue by contradiction: suppose that neither a)
nor b) holds; we will show that $\lim_{x \to c} g(f(x)) \neq M$. Indeed, since b) does not
hold, for every $\delta > 0$ there exists $x$ with $0 < |x - c| < \delta$ such that $f(x) = L$. For
such $x$ we have $g(f(x)) = g(L)$. But since a) does not hold, $g$ is not continuous at
$L$, i.e., $M \neq g(L)$. Thus $g(f(x)) = g(L) \neq M$. Taking $\epsilon = |g(L) - M|$ this shows
that there is no $\delta > 0$ such that $0 < |x - c| < \delta$ implies $|g(f(x)) - M| < \epsilon$, so
$\lim_{x \to c} g(f(x)) \neq M$.
(ii) $\implies$ (i). The case in which a) holds (i.e., $g$ is continuous at $L$) is precisely
Theorem 4.12. So it suffices to assume that b) holds: there exists some $\Delta > 0$ such
that $0 < |x - c| < \Delta$ implies $f(x) \neq L$. Now fix $\epsilon > 0$; since $\lim_{x \to L} g(x) = M$,
there exists $\gamma > 0$ such that $0 < |y - L| < \gamma$ implies $|g(y) - M| < \epsilon$. Similarly (and
familiarly), since $\lim_{x \to c} f(x) = L$, there exists $\delta_1 > 0$ such that $0 < |x - c| < \delta_1$
implies $|f(x) - L| < \gamma$. Here is the point: for $0 < |x - c| < \delta_1$, we have $|f(x) - L| < \gamma$.
If in addition $0 < |f(x) - L| < \gamma$, then we may conclude that $|g(f(x)) - M| < \epsilon$.
So our only concern is that perhaps $f(x) = L$ for some $x$ with $0 < |x - c| < \delta_1$,
and this is exactly what the additional hypothesis b) allows us to rule out: if we
take $\delta = \min(\delta_1, \Delta)$ then $0 < |x - c| < \delta$ implies $0 < |f(x) - L| < \gamma$ and thus
$|g(f(x)) - M| < \epsilon$.
Remark 5.2: The nice expository article [MK09] gives some applications of the
implication (ii)b) $\implies$ (i) of Theorem 4.13 involving making an inverse change of
variables to evaluate a limit. Perhaps we may revisit this point towards the end of
the course when we talk about inverse functions.
5.3. The Squeeze Theorem and the Switching Theorem.
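Theorem 4.14. (Squeeze Theorem) Let $m(x)$, $f(x)$ and $M(x)$ be defined on
some deleted interval $I^{\circ} = (c - \delta, c + \delta) \setminus \{c\}$ about $x = c$. We suppose that:
(i) For all $x \in I^{\circ}$, $m(x) \leq f(x) \leq M(x)$, and
(ii) $\lim_{x \to c} m(x) = \lim_{x \to c} M(x) = L$.
Then $\lim_{x \to c} f(x) = L$.
Proof. Fix $\epsilon > 0$. There exists $\delta_1 > 0$ such that $0 < |x - c| < \delta_1$ implies
$|m(x) - L| < \epsilon$ and $\delta_2 > 0$ such that $0 < |x - c| < \delta_2$ implies $|M(x) - L| < \epsilon$. Let
$\delta = \min(\delta_1, \delta_2)$. Then $0 < |x - c| < \delta$ implies
$$f(x) \leq M(x) < L + \epsilon$$
and
$$f(x) \geq m(x) > L - \epsilon,$$
so $L - \epsilon < f(x) < L + \epsilon$, or equivalently $|f(x) - L| < \epsilon$.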
Example 5.3: For $\alpha \geq 0$, define $f_{\alpha}: \mathbb{R} \to \mathbb{R}$ by $f_{\alpha}(x) = x^{\alpha} \sin(\frac{1}{x})$ for $x \neq 0$ and
$f_{\alpha}(0) = 0$. By our assumption about the continuity of sine and our results on
continuity of rational functions and compositions of continuous functions, $f_{\alpha}$ is
continuous at all $x \neq 0$. We claim that $f_{\alpha}$ is continuous at $x = 0$ iff $\alpha > 0$. . . .
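the circular sector is $\frac{x}{2\pi}$ times the area of the unit circle, or $\frac{x}{2\pi} \cdot \pi = \frac{x}{2}$. The area
of $T_2$ is $\frac{1}{2} \tan x = \frac{1}{2} \frac{\sin x}{\cos x}$. This gives us the inequalities
$$\frac{1}{2} \cos x \sin x \leq \frac{1}{2} x \leq \frac{1}{2} \frac{\sin x}{\cos x},$$
or equivalently, for $x \neq 0$,
$$\cos x \leq \frac{x}{\sin x} \leq \frac{1}{\cos x}.$$
Taking reciprocals this inequality is equivalent to
$$\frac{1}{\cos x} \geq \frac{\sin x}{x} \geq \cos x.$$
Now we may apply the Squeeze Theorem: since $\cos x$ is continuous at 0 and takes
the value $1 \neq 0$ there, we have
$$\lim_{x \to 0} \frac{1}{\cos x} = \lim_{x \to 0} \cos x = 1.$$
Therefore the Squeeze Theorem implies
(17) $$\lim_{x \to 0} \frac{\sin x}{x} = 1.$$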
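As a sanity check on the inequalities feeding the Squeeze Theorem, one can tabulate the three quantities for a few small values of $x$. The sketch below uses Python's floating-point `math.sin` and `math.cos`, so it illustrates the squeeze numerically rather than proving anything; the helper name `squeeze_bounds` is an assumption of this example.

```python
import math


def squeeze_bounds(x):
    """Return (cos x, sin x / x, 1 / cos x) for a small nonzero x."""
    return math.cos(x), math.sin(x) / x, 1.0 / math.cos(x)


for x in (0.5, 0.1, 0.01, -0.01):
    lower, middle, upper = squeeze_bounds(x)
    # The middle value should be trapped between the outer two.
    print(x, lower, middle, upper, lower <= middle <= upper)
```

As $x$ shrinks, the outer columns both approach 1 and force the middle column to do the same.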
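Example 5.6: We will show that $\lim_{x \to 0} \frac{1 - \cos x}{x} = 0$. The idea is to use trigonometric
identities to reduce this limit to an expression involving the limit (17). Here goes:
$$\lim_{x \to 0} \frac{1 - \cos x}{x} = \lim_{x \to 0} \frac{1 - \cos x}{x} \left( \frac{1 + \cos x}{1 + \cos x} \right) = \lim_{x \to 0} \frac{1 - \cos^2 x}{x(1 + \cos x)} = \lim_{x \to 0} \frac{\sin^2 x}{x(1 + \cos x)}$$
$$= \left( \lim_{x \to 0} \frac{\sin x}{x} \right) \left( \lim_{x \to 0} \frac{\sin x}{1 + \cos x} \right) = 1 \cdot \left( \frac{0}{2} \right) = 0.$$
Of course we also have $\lim_{x \to 0} \frac{\cos x - 1}{x} = -\lim_{x \to 0} \frac{1 - \cos x}{x} = -0 = 0$. In summary:
(18) $$\lim_{x \to 0} \frac{1 - \cos x}{x} = \lim_{x \to 0} \frac{\cos x - 1}{x} = 0.$$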
Before doing the next two examples we remind the reader of the composite angle formulas from trigonometry: for any real numbers $x, y$,
$$\sin(x + y) = \sin x \cos y + \cos x \sin y,$$
$$\cos(x + y) = \cos x \cos y - \sin x \sin y.$$
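Example 5.7: If $f(x) = \sin x$, then we claim $f'(x) = \cos x$. Indeed
$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h \to 0} \frac{\sin(x + h) - \sin x}{h} = \lim_{h \to 0} \frac{\sin x \cos h + \cos x \sin h - \sin x}{h}$$
$$= -\sin x \left( \lim_{h \to 0} \frac{1 - \cos h}{h} \right) + \cos x \left( \lim_{h \to 0} \frac{\sin h}{h} \right) = (-\sin x) \cdot 0 + (\cos x) \cdot 1 = \cos x.$$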
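Example 5.8: If $f(x) = \cos x$, then we claim $f'(x) = -\sin x$. Indeed
$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h \to 0} \frac{\cos(x + h) - \cos x}{h} = \lim_{h \to 0} \frac{\cos x \cos h - \sin x \sin h - \cos x}{h}$$
$$= -\cos x \left( \lim_{h \to 0} \frac{1 - \cos h}{h} \right) - \sin x \left( \lim_{h \to 0} \frac{\sin h}{h} \right) = (-\cos x) \cdot 0 + (-\sin x) \cdot 1 = -\sin x.$$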
A similar phenomenon arises when we consider the function $f(x) = \sqrt{x(1 - x)}$,
which has natural domain $[0, 1]$. The function is defined at 1 but not in any open
interval containing 1: only intervals of the form $(1 - \delta, 1]$.
$\sqrt{x}$ is right continuous at $x = 0$.
As above, it is necessary to require left/right continuity when discussing behavior at right/left endpoints of an interval. On the other hand one may still discuss
left/right continuity at interior points of an interval, and it is sometimes helpful to
do so.
Example: Let $f(x) = \lfloor x \rfloor$ be the greatest integer function. Then $f$ is continuous at $c$ for all $c \in \mathbb{R} \setminus \mathbb{Z}$, whereas for any $c \in \mathbb{Z}$, $f$ is right continuous but not left
continuous at $c$.
This example suggests the following simple result.
Proposition 4.16. For a function $f: D \subset \mathbb{R} \to \mathbb{R}$ and $c \in D$, the following
are equivalent:
(i) f is left continuous at c and right continuous at c.
(ii) f is continuous at c.
We leave the proof to the reader.
In a similar way we can dene one-sided limits at a point c.
We say $\lim_{x \to c^-} f(x) = L$ (and read this as "the limit as $x$ approaches $c$ from
the left of $f(x)$ is $L$") if for all $\epsilon > 0$ there exists $\delta > 0$ such that for all $x$ with
$c - \delta < x < c$, $|f(x) - L| < \epsilon$.
We say $\lim_{x \to c^+} f(x) = L$ (and read this as "the limit as $x$ approaches $c$ from
the right of $f(x)$ is $L$") if for all $\epsilon > 0$, there exists $\delta > 0$ such that for all $x$ with
$c < x < c + \delta$, $|f(x) - L| < \epsilon$.
Proposition 4.17. For a function $f: D \subset \mathbb{R} \to \mathbb{R}$ and $c \in D$, the following
are equivalent:
(i) The left hand and right hand limits at $c$ exist and are equal.
(ii) $\lim_{x \to c} f(x)$ exists.
Again we leave the proof to the reader.
Example: Let $f(x) = \lfloor x \rfloor$ be the greatest integer function, and let $n \in \mathbb{Z}$. Then
$\lim_{x \to n^-} f(x) = n - 1$ and $\lim_{x \to n^+} f(x) = n$, so $f$ is not continuous at $n$.
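The two one-sided limits of the greatest integer function are easy to observe numerically. The following Python sketch evaluates $\lfloor x \rfloor$ at points approaching $n = 3$ from each side; the helper `one_sided_values` and the particular sample points $n \pm 10^{-k}$ are choices made for this illustration.

```python
import math


def one_sided_values(f, c, side, steps=5):
    """Evaluate f at c - 10^(-k) (side='left') or c + 10^(-k) (side='right')
    for k = 1, ..., steps."""
    sign = -1.0 if side == "left" else 1.0
    return [f(c + sign * 10.0 ** (-k)) for k in range(1, steps + 1)]


n = 3
print(one_sided_values(math.floor, n, "left"))   # [2, 2, 2, 2, 2]
print(one_sided_values(math.floor, n, "right"))  # [3, 3, 3, 3, 3]
```

The values from the left sit at $n - 1$ and those from the right at $n$, in line with the one-sided limits just computed.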
There is some terminology here, not essential but sometimes useful. If for a
function $f$ and $c \in \mathbb{R}$, the left and right hand limits at $c$ both exist but are unequal,
we say that $f$ has a jump discontinuity at $c$. If the left and right hand limits at
$c$ both exist and are equal (i.e., if $\lim_{x \to c} f(x) = L$ exists) but still $f(x)$ is not
continuous at $c$ (this can happen if either $f(c) \neq L$ or, more plausibly, if $c$ is not in
the domain of $f$), we say that $f$ has a removable discontinuity at $c$. This terminology comes from our earlier observation that if we (re)define $f$ at $c$ to be the
limiting value $L$ then $f$ becomes continuous at $c$. One sometimes calls a discontinuity which is either removable or a jump discontinuity a simple discontinuity: i.e.,
this is the case whenever both one-sided limits exist at $c$ but $f$ is not continuous at $c$.
Infinite limits: Consider $\lim_{x \to 0} \frac{1}{x^2}$. This limit does not exist: indeed, if it did,
then there would be some deleted interval $I_{0,\delta}$ on which $f$ is bounded, whereas
just the opposite is happening: the closer $x$ is to 0, the larger $f(x)$ becomes. In
freshman calculus we would say $\lim_{x \to 0} f(x) = \infty$. And we still want to say that,
but in order to know what we mean when we say this we want to give an $\epsilon$-$\delta$ style
definition of this. Here it is:
We say $\lim_{x \to c} f(x) = \infty$ if for all $M \in \mathbb{R}$, there exists $\delta > 0$ such that
$0 < |x - c| < \delta \implies f(x) > M$.
Geometrically, this is similar to the $\epsilon$-$\delta$ definition of limit, but instead of picking two horizontal lines arbitrarily close to $y = L$, we pick one horizontal line which
is arbitrarily large and require that on some small deleted interval the graph of
$y = f(x)$ always lie above that line. Similarly:
We say $\lim_{x \to c} f(x) = -\infty$ if for all $m \in \mathbb{R}$, there exists $\delta > 0$ such that
$0 < |x - c| < \delta$ implies $f(x) < m$.
Example: Let us indeed prove that $\lim_{x \to 0} \frac{1}{x^2} = \infty$. Fix $M \in \mathbb{R}$. We need to
find $\delta$ such that $0 < |x| < \delta$ implies $\frac{1}{x^2} > M$. It is no loss of generality to assume
$M > 0$ (why?). Then $\frac{1}{x^2} > M \iff |x| < \frac{1}{\sqrt{M}}$, so we may take $\delta = \frac{1}{\sqrt{M}}$.
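The choice $\delta = \frac{1}{\sqrt{M}}$ can be spot-checked numerically. As before, this is a finite sample and not a proof; the function name `check_blowup` and the sampling scheme are assumptions of this sketch.

```python
import math


def check_blowup(M, samples=1000):
    """For M > 0, take delta = 1 / sqrt(M) and test 1 / x^2 > M at sample
    points x with 0 < |x| < delta on both sides of 0."""
    delta = 1.0 / math.sqrt(M)
    for k in range(1, samples + 1):
        offset = delta * k / (samples + 1)  # strictly between 0 and delta
        for x in (-offset, offset):
            if not 1.0 / (x * x) > M:
                return False
    return True


print(check_blowup(100.0))
print(check_blowup(1e6))
```

Larger $M$ forces a smaller $\delta$, which is the $\epsilon$-$\delta$ pattern with "close to $L$" replaced by "larger than $M$".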
CHAPTER 5
Differentiation
1. Differentiability Versus Continuity
Recall that a function $f: D \subset \mathbb{R} \to \mathbb{R}$ is differentiable at $a \in D$ if
$$\lim_{h \to 0} \frac{f(a + h) - f(a)}{h}$$
exists, and when this limit exists it is called the derivative $f'(a)$ of $f$ at $a$. Moreover, the tangent line to $y = f(x)$ at $f(a)$ exists if $f$ is differentiable at $a$ and is
the unique line passing through the point $(a, f(a))$ with slope $f'(a)$.
Note that an equivalent definition of the derivative at $a$ is
$$\lim_{x \to a} \frac{f(x) - f(a)}{x - a}.$$
One can see this by going to the $\epsilon$-$\delta$ definition of a limit and making the substitution $h = x - a$: then $0 < |h| < \delta \iff 0 < |x - a| < \delta$.
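Theorem 5.1. Let $f: D \subset \mathbb{R} \to \mathbb{R}$ be a function, and let $a \in D$. If $f$ is
differentiable at $a$, then $f$ is continuous at $a$.
Proof. We have
$$\lim_{x \to a} \big( f(x) - f(a) \big) = \lim_{x \to a} \frac{f(x) - f(a)}{x - a} \cdot (x - a) = \left( \lim_{x \to a} \frac{f(x) - f(a)}{x - a} \right) \left( \lim_{x \to a} (x - a) \right) = f'(a) \cdot 0 = 0.$$
Thus
$$0 = \lim_{x \to a} (f(x) - f(a)) = \left( \lim_{x \to a} f(x) \right) - f(a),$$
so
$$\lim_{x \to a} f(x) = f(a).$$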
The converse of Theorem 5.1 is far from being true: a function $f$ which is continuous at $a$ need not be differentiable at $a$. An easy example is $f(x) = |x|$ at $a = 0$.
In fact the situation is worse: a function $f: \mathbb{R} \to \mathbb{R}$ can be continuous everywhere yet still fail to be differentiable at many points. One way of introducing
points of non-differentiability while preserving continuity is to take the absolute
value of a differentiable function.
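To see concretely why $f(x) = |x|$ fails to be differentiable at 0, one can compare difference quotients for positive and negative $h$. The Python sketch below does this; the helper name `difference_quotient` is an assumption of the example.

```python
def difference_quotient(f, a, h):
    """The difference quotient (f(a + h) - f(a)) / h for h != 0."""
    return (f(a + h) - f(a)) / h


for k in range(1, 5):
    h = 10.0 ** (-k)
    # Quotients from the right are +1, from the left are -1.
    print(difference_quotient(abs, 0.0, h), difference_quotient(abs, 0.0, -h))
```

Since the one-sided difference quotients stay at $1$ and $-1$ respectively, the two-sided limit defining $f'(0)$ cannot exist, even though $|x|$ is continuous at 0.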
$$(fg)'(a) = \lim_{h \to 0} \frac{f(a + h)g(a + h) - f(a)g(a)}{h}$$
$$= \lim_{h \to 0} \frac{f(a + h)g(a + h) - f(a)g(a + h) + (f(a)g(a + h) - f(a)g(a))}{h}$$
$$= \left( \lim_{h \to 0} \frac{f(a + h) - f(a)}{h} \right) \left( \lim_{h \to 0} g(a + h) \right) + f(a) \left( \lim_{h \to 0} \frac{g(a + h) - g(a)}{h} \right).$$
Since $g$ is differentiable at $a$, $g$ is continuous at $a$ and thus $\lim_{h \to 0} g(a + h) =
\lim_{x \to a} g(x) = g(a)$. The last expression above is therefore equal to
2. DIFFERENTIATION RULES
Dimensional analysis and the product rule: Leibniz kept a diary in which he
recorded his scientic discoveries, including his development of dierential calculus.
One day he got to the product rule and wrote a formula equivalent to
(f g) = f g .
But this entire entry of the diary is crossed out. Three days later there is a new
entry with a correct statement of the product rule and the derivation, which Leibniz
says he has known for some time.
I confess that I have taken this story from a third-party account. Every once
in a while I think about trying to verify it, but I am always stopped by the fact
that if it turns out to be apocryphal, I don't really want to know: it's too good
a story! It is fundamentally honest about the way mathematics is done and, especially, the way new mathematics is created. That is, to go forward we make
guesses. Often we later realize that our guesses were not only wrong but silly, to
the extent that we try to hide them from others and perhaps even from ourselves.
But the process of silly guesswork is vital to mathematics, and if the great genius Gottfried von Leibniz was not above it, what chance do the rest of us have?
It is natural, like Leibniz, to want to hide our silly guesses. However, they play an
important role in teaching and learning. Many veteran instructors of higher mathematics come to lament the fact that for the sake of efficiency and coverage of the
syllabus we often give the student the correct answer within minutes or seconds of
asking the question (or worse, do not present the question at all!) and thus deprive
them of the opportunity to make a silly guess and learn from their mistake. What
can we learn from Leibniz's guess that $(fg)' = f'g'$? Simply to write down the
correct product rule is not enough: I want to try to persuade you that Leibniz's
guess was not only incorrect but truly silly: that in some sense the product rule
could have turned out to be any number of things, but certainly not $(fg)' = f'g'$.
For this I want to follow an approach I learned in high school chemistry: dimensional analysis. We can give physically meaningful dimensions (the more
common, but less precise, term is "units") to all our quantities, and then our formulas must manipulate these quantities in certain ways or they are meaningless.
For a simple example, suppose you walk into an empty room and find on the
blackboard a drawing of a cylinder with its radius labelled $r$ and its height labelled
$h$. Below it are some formulas, and this one jumps out at you: $2\pi(rh + r^3)$. You
can see right away that something has gone wrong: since $r$ and $h$ are both lengths
(say in meters), $rh$ denotes a product of lengths (say, square meters). But $r^3$ is
a product of three lengths, i.e., cubic meters. How on earth will we get anything
physically meaningful by adding square meters to cubic meters? It is much more
likely that the writer meant $2\pi(rh + r^2)$, since that is a meaningful quantity of dimension length squared (i.e., area). In fact, it happens to be the correct formula for
the surface area of the cylinder with the top and bottom faces included, although
dimensional considerations won't tell you that.
Something similar can be applied to the formula $(fg)' = f'g'$. Let $f = f(t)$
and view $t$ as being time, say measured in seconds. Since $f'(t)$ is a limit of quotients $\frac{f(t+h) - f(t)}{h}$, its dimension is the dimension of $f$ divided by time. Now suppose
both $f(t)$ and $g(t)$ are lengths, say in meters. Then $f'(t)$ and $g'(t)$ both have units
meters per second, so $f'(t) g'(t)$ has units meters squared per second squared.
On the other hand, the units of $(fg)'$ are meters squared per second. Thus the
formula $(fg)' = f'g'$ is asserting that some number of meters squared per second
is equal to some number of meters squared per second squared. That's not only
wrong, it's a priori meaningless.
By contrast the correct formula $(fg)' = f'g + fg'$ makes good dimensional
sense: as above, $(fg)'$ is meters squared per second; so is $f'g$ and so is $fg'$, so we
can add them to get a meaningful number of meters squared per second. Thus the
formula at least makes sense, which is good because above we proved it to be correct.
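The bookkeeping in this dimensional argument can be sketched in a few lines of Python. This is only an illustration under an assumed encoding, not part of the text's development: a dimension is stored as a pair of exponents (length, time), and the helper names `times` and `d_dt` are hypothetical.

```python
from typing import Tuple

# A dimension meters^L * seconds^T is encoded as the exponent pair (L, T).
# This encoding and the helper names below are illustrative assumptions.
Dim = Tuple[int, int]

METERS: Dim = (1, 0)


def times(p: Dim, q: Dim) -> Dim:
    """Dimension of a product: the exponents add."""
    return (p[0] + q[0], p[1] + q[1])


def d_dt(p: Dim) -> Dim:
    """Dimension of a time derivative: divide by seconds once."""
    return (p[0], p[1] - 1)


f_dim, g_dim = METERS, METERS  # f and g are both lengths

lhs = d_dt(times(f_dim, g_dim))                 # dimension of (fg)'
leibniz_guess = times(d_dt(f_dim), d_dt(g_dim))  # dimension of f'g'
term1 = times(d_dt(f_dim), g_dim)               # dimension of f'g
term2 = times(f_dim, d_dt(g_dim))               # dimension of fg'

print("(fg)':", lhs)
print("f'g' :", leibniz_guess)
print("f'g  :", term1, " fg':", term2)
```

The two terms of the correct product rule carry the same exponent pair as $(fg)'$, so they can be added, while the guessed formula carries a different one.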
Taking these ideas more seriously suggests that we should look for a proof of the
product rule which explicitly takes into account that both sides are rates of change
of areas. This is indeed possible, but we omit it for now.
Suppose we want to find the derivative of a function which is a product of not
two but three functions whose derivatives we already know, e.g. $f(x) = x \sin x \, e^x$.
We can of course still use the product rule, in two steps:
\[
\begin{aligned}
f'(x) = (x \sin x \, e^x)' &= ((x \sin x) e^x)' = (x \sin x)' e^x + (x \sin x)(e^x)' \\
&= (x' \sin x + x (\sin x)') e^x + x \sin x \, e^x = \sin x \, e^x + x \cos x \, e^x + x \sin x \, e^x.
\end{aligned}
\]
Note that we didn't use the fact that our three differentiable functions were $x$,
$\sin x$ and $e^x$ until the last step, so the same method shows that for any three
functions $f_1, f_2, f_3$ which are all differentiable at $a$, the product $f = f_1 f_2 f_3$ is also
differentiable at $a$ and
\[
f'(a) = f_1'(a) f_2(a) f_3(a) + f_1(a) f_2'(a) f_3(a) + f_1(a) f_2(a) f_3'(a).
\]
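The following result rides this train of thought to its final destination.
Theorem 5.7. (Generalized Product Rule) Let $n \geq 2$ be an integer, and let
$f_1, \ldots, f_n$ be $n$ functions which are all differentiable at $a$. Then $f = f_1 \cdots f_n$ is also
differentiable at $a$, and
\[
(f_1 \cdots f_n)'(a) = f_1'(a) f_2(a) \cdots f_n(a) + f_1(a) f_2'(a) f_3(a) \cdots f_n(a) + \ldots + f_1(a) \cdots f_{n-1}(a) f_n'(a). \tag{19}
\]
Proof. By induction on $n$.
Base Case ($n = 2$): This is precisely the ordinary Product Rule (Theorem 5.6).
Induction Step: Let $n \geq 2$ be an integer, and suppose that the product of any $n$
functions which are each differentiable at $a \in \mathbb{R}$ is differentiable at $a$ and that the
derivative is given by (19). Now let $f_1, \ldots, f_n, f_{n+1}$ be functions, each differentiable
at $a$. Then by the usual product rule
\[
(f_1 \cdots f_n f_{n+1})'(a) = ((f_1 \cdots f_n) f_{n+1})'(a) = (f_1 \cdots f_n)'(a) f_{n+1}(a) + f_1(a) \cdots f_n(a) f_{n+1}'(a).
\]
By the induction hypothesis, this equals
\[
\big( f_1'(a) f_2(a) \cdots f_n(a) + \ldots + f_1(a) \cdots f_{n-1}(a) f_n'(a) \big) f_{n+1}(a) + f_1(a) \cdots f_n(a) f_{n+1}'(a),
\]
which is (19) with $n+1$ in place of $n$.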
Example: We may use the Generalized Product Rule to give a less computationally
intensive derivation of the power rule
\[
(x^n)' = n x^{n-1}
\]
for $n \in \mathbb{Z}^+$. Indeed, taking $f_1 = \cdots = f_n = x$, we have $f(x) = x^n = f_1 \cdots f_n$, so
applying the Generalized Product Rule we get
\[
(x^n)' = (x)' x \cdots x + \ldots + x \cdots x (x)'.
\]
Here in each term we have $x' = 1$ multiplied by $n - 1$ factors of $x$, so each term
evaluates to $x^{n-1}$. Moreover we have $n$ terms in all, so
\[
(x^n)' = n x^{n-1}.
\]
No need to dirty our hands with binomial coefficients!
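As a sanity check on the Generalized Product Rule, here is a minimal Python sketch that evaluates the sum of $n$ terms directly. The function names `product_derivative`, `power_rule_via_product` and `triple_product_derivative` are hypothetical helpers introduced only for this illustration; the caller is assumed to supply each factor together with its known derivative.

```python
import math
from typing import Callable, Sequence

Func = Callable[[float], float]


def product_derivative(fs: Sequence[Func], dfs: Sequence[Func], a: float) -> float:
    """Evaluate (f_1 ... f_n)'(a) as the sum over i of
    f_i'(a) times the product of the other factors f_j(a), j != i."""
    total = 0.0
    for i in range(len(fs)):
        term = dfs[i](a)
        for j in range(len(fs)):
            if j != i:
                term *= fs[j](a)
        total += term
    return total


def power_rule_via_product(n: int, a: float) -> float:
    """Derivative of x^n at a, taking f_1 = ... = f_n = x."""
    identity = lambda x: x
    one = lambda x: 1.0
    return product_derivative([identity] * n, [one] * n, a)


def triple_product_derivative(a: float) -> float:
    """Derivative of x * sin(x) * e^x at a via the three-factor rule."""
    return product_derivative(
        [lambda x: x, math.sin, math.exp],
        [lambda x: 1.0, math.cos, math.exp],
        a,
    )


print(power_rule_via_product(5, 2.0))   # compare with 5 * 2**4
print(triple_product_derivative(1.0))
```

With all factors equal to $x$, each of the $n$ terms is a product of $n-1$ copies of `a`, which is exactly the counting argument used in the example.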
Theorem 5.8. (Quotient Rule) Let $f$ and $g$ be functions which are both differentiable at $a \in \mathbb{R}$, with $g(a) \neq 0$. Then $\frac{f}{g}$ is differentiable at $a$ and
\[
\left( \frac{f}{g} \right)'(a) = \frac{g(a) f'(a) - f(a) g'(a)}{g(a)^2}.
\]
Proof. Step 0: First observe that since $g$ is continuous and $g(a) \neq 0$, there is
some interval $I = (a - \delta, a + \delta)$ about $a$ on which $g$ is nonzero, and on this interval
$\frac{f}{g}$ is defined. Thus it makes sense to consider the difference quotient
\[
\frac{f(a+h)/g(a+h) - f(a)/g(a)}{h}
\]
for $h$ sufficiently close to zero.
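Step 1: We first establish the Reciprocal Rule, i.e., the special case of the Quotient Rule in which $f(x) = 1$ (constant function). Then
\[
\begin{aligned}
\left( \frac{1}{g} \right)'(a) &= \lim_{h \to 0} \frac{\frac{1}{g(a+h)} - \frac{1}{g(a)}}{h}
= \lim_{h \to 0} \frac{g(a) - g(a+h)}{h \, g(a) g(a+h)} \\
&= - \left( \lim_{h \to 0} \frac{g(a+h) - g(a)}{h} \right) \left( \lim_{h \to 0} \frac{1}{g(a) g(a+h)} \right) = \frac{-g'(a)}{g(a)^2}.
\end{aligned}
\]
We have again used the fact that if $g$ is differentiable at $a$, $g$ is continuous at $a$.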
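Step 2: We now derive the full Quotient Rule by combining the Product Rule and
the Reciprocal Rule. Indeed, we have
\[
\begin{aligned}
\left( \frac{f}{g} \right)'(a) &= \left( f \cdot \frac{1}{g} \right)'(a) = f'(a) \frac{1}{g(a)} + f(a) \left( \frac{1}{g} \right)'(a) \\
&= \frac{f'(a)}{g(a)} - f(a) \frac{g'(a)}{g(a)^2} = \frac{g(a) f'(a) - g'(a) f(a)}{g(a)^2}.
\end{aligned}
\]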
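The two-step structure of this proof (Reciprocal Rule, then Product Rule) can be mirrored numerically. The following Python sketch is an illustration only; the function names are hypothetical, and the test case $\tan x = \sin x / \cos x$, whose derivative is $1/\cos^2 x$, is my choice rather than the text's.

```python
import math
from typing import Callable

Func = Callable[[float], float]


def reciprocal_rule(g: Func, dg: Func, a: float) -> float:
    """(1/g)'(a) = -g'(a) / g(a)^2, assuming g(a) != 0."""
    return -dg(a) / g(a) ** 2


def quotient_rule(f: Func, df: Func, g: Func, dg: Func, a: float) -> float:
    """(f/g)'(a) by the closed formula of the Quotient Rule."""
    return (g(a) * df(a) - f(a) * dg(a)) / g(a) ** 2


def quotient_via_product(f: Func, df: Func, g: Func, dg: Func, a: float) -> float:
    """(f * (1/g))'(a) by the Product Rule combined with the Reciprocal Rule."""
    return df(a) * (1.0 / g(a)) + f(a) * reciprocal_rule(g, dg, a)


a = 0.5
neg_sin = lambda x: -math.sin(x)  # derivative of cos
direct = quotient_rule(math.sin, math.cos, math.cos, neg_sin, a)
two_step = quotient_via_product(math.sin, math.cos, math.cos, neg_sin, a)
print(direct, two_step, 1.0 / math.cos(a) ** 2)
```

Both routes should agree with $1/\cos^2(0.5)$ up to rounding error.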
Theorem 5.10. (Chain Rule) Let $f$ and $g$ be functions, and let $a \in \mathbb{R}$ be such
that $f$ is differentiable at $a$ and $g$ is differentiable at $f(a)$. Then the composite
function $g \circ f$ is differentiable at $a$ and
\[
(g \circ f)'(a) = g'(f(a)) f'(a).
\]
Proof. Motivated by Leibniz notation, it is tempting to argue as follows:
\[
\begin{aligned}
(g \circ f)'(a) &= \lim_{x \to a} \frac{g(f(x)) - g(f(a))}{x - a}
= \lim_{x \to a} \left( \frac{g(f(x)) - g(f(a))}{f(x) - f(a)} \right) \cdot \left( \frac{f(x) - f(a)}{x - a} \right) \\
&= \left( \lim_{x \to a} \frac{g(f(x)) - g(f(a))}{f(x) - f(a)} \right) \left( \lim_{x \to a} \frac{f(x) - f(a)}{x - a} \right) \\
&= \left( \lim_{f(x) \to f(a)} \frac{g(f(x)) - g(f(a))}{f(x) - f(a)} \right) \left( \lim_{x \to a} \frac{f(x) - f(a)}{x - a} \right) = g'(f(a)) f'(a).
\end{aligned}
\]
The replacement of $\lim_{x \to a} \ldots$ by $\lim_{f(x) \to f(a)} \ldots$ in the first factor above is
justified by the fact that $f$ is continuous at $a$. However, this argument has a gap:
when we multiply and divide by $f(x) - f(a)$, how do we know that we are not
dividing by zero? The answer is that we cannot rule this out: it is possible for
$f(x)$ to take the value $f(a)$ on arbitrarily small deleted intervals around $a$: again,
this is exactly what happens for the function $f(x)$ of the above example near $a = 0$.
I maintain that the above gap can be mended so as to give a complete proof.
The above argument is valid unless for all $\delta > 0$, there is $x$ with $0 < |x - a| < \delta$
such that $f(x) - f(a) = 0$. In this case, it follows from Lemma 5.9 that if
\[
\lim_{x \to a} \frac{f(x) - f(a)}{x - a}
\]
exists at all, it must be equal to $0$. But we are assuming the above limit exists, since
we are assuming $f$ is differentiable at $a$. So $f'(a) = 0$, and therefore, since the Chain
Rule reads $(g \circ f)'(a) = g'(f(a)) f'(a)$, here we are trying to show $(g \circ f)'(a) = 0$.
For $x \in \mathbb{R}$ we have two possibilities: the first is $f(x) - f(a) = 0$, in which case
also $g(f(x)) - g(f(a)) = g(f(a)) - g(f(a)) = 0$, so the difference quotient is zero at
these points. The second is $f(x) - f(a) \neq 0$, in which case
\[
\frac{g(f(x)) - g(f(a))}{x - a} = \frac{g(f(x)) - g(f(a))}{f(x) - f(a)} \cdot \frac{f(x) - f(a)}{x - a}
\]
holds, and the above argument shows this expression tends to $g'(f(a)) f'(a) = 0$
as $x \to a$. So whichever holds, the difference quotient $\frac{g(f(x)) - g(f(a))}{x - a}$ is close to (or
equal to!) zero. Thus the limit tends to zero no matter which alternative obtains.
Somewhat more formally, if we fix $\epsilon > 0$, then the first step of the argument
shows that there is $\delta > 0$ such that for all $x$ with $0 < |x - a| < \delta$ such that
$f(x) - f(a) \neq 0$, $\left| \frac{g(f(x)) - g(f(a))}{x - a} \right| < \epsilon$. On the other hand, when $f(x) - f(a) = 0$,
then $\left| \frac{g(f(x)) - g(f(a))}{x - a} \right| = 0$, so it is certainly less than $\epsilon$! Therefore, all in all we have
$0 < |x - a| < \delta \implies \left| \frac{g(f(x)) - g(f(a))}{x - a} \right| < \epsilon$, so that
\[
\lim_{x \to a} \frac{g(f(x)) - g(f(a))}{x - a} = 0.
\]
3. Optimization
3.1. Intervals and interior points.
At this point I wish to digress to formally define the notion of an interval on
the real line and an interior point of the interval. . . .
3.2. Functions increasing or decreasing at a point.
Let $f : D \to \mathbb{R}$ be a function, and let $a$ be an interior point of $D$. We say that $f$ is
increasing at $a$ if for all $x$ sufficiently close to $a$ and to the left of $a$, $f(x) < f(a)$
and for all $x$ sufficiently close to $a$ and to the right of $a$, $f(x) > f(a)$. More formally
phrased, we require the existence of a $\delta > 0$ such that:
• for all $x$ with $a - \delta < x < a$, $f(x) < f(a)$, and
• for all $x$ with $a < x < a + \delta$, $f(x) > f(a)$.
We say $f$ is decreasing at $a$ if there exists $\delta > 0$ such that:
• for all $x$ with $a - \delta < x < a$, $f(x) > f(a)$, and
• for all $x$ with $a < x < a + \delta$, $f(x) < f(a)$.
We say $f$ is weakly increasing at $a$ if there exists $\delta > 0$ such that:
• for all $x$ with $a - \delta < x < a$, $f(x) \leq f(a)$, and
• for all $x$ with $a < x < a + \delta$, $f(x) \geq f(a)$.
Exercise: Give the definition of "$f$ is weakly decreasing at $a$".
Exercise: Let $f : I \to \mathbb{R}$, and let $a$ be an interior point of $I$.
a) Show that $f$ is increasing at $a$ iff $-f$ is decreasing at $a$.
b) Show that $f$ is weakly increasing at $a$ iff $-f$ is weakly decreasing at $a$.
Example: Let $f(x) = mx + b$ be the general linear function. Then for any $a \in \mathbb{R}$:
$f$ is increasing at $a$ iff $m > 0$, $f$ is weakly increasing at $a$ iff $m \geq 0$, $f$ is decreasing
at $a$ iff $m < 0$, and $f$ is weakly decreasing at $a$ iff $m \leq 0$.
Example: Let $n$ be a positive integer, let $f(x) = x^n$. Then:
• If $n$ is odd, then for all $a \in \mathbb{R}$, $f(x)$ is increasing at $a$.
• If $n$ is even, then if $a < 0$, $f(x)$ is decreasing at $a$, if $a > 0$ then $f(x)$ is increasing
at $a$. Note that when $n$ is even $f$ is neither increasing at $0$ nor decreasing at $0$
because for every nonzero $x$, $f(x) > 0 = f(0)$.$^1$
If one looks back at the previous examples and keeps in mind that we are supposed to be studying derivatives (!), one is swiftly led to the following fact.
Theorem 5.11. Let $f : I \to \mathbb{R}$. Suppose $f$ is differentiable at an interior point $a$ of $I$.
a) If $f'(a) > 0$, then $f$ is increasing at $a$.
b) If $f'(a) < 0$, then $f$ is decreasing at $a$.
$^1$ We do not stop to prove these assertions as it would be inefficient to do so: soon enough
we will develop the right tools to prove stronger assertions. But when given a new definition, it is
always good to find one's feet by considering some examples and nonexamples of that definition.
\[
\frac{f(x) - f(a)}{x - a} < 2 f'(a).
\]
In particular, for all $x$ with $0 < |x - a| < \delta$, $\frac{f(x) - f(a)}{x - a} > 0$, so: if $x > a$, $f(x) - f(a) > 0$,
i.e., $f(x) > f(a)$; and if $x < a$, $f(x) - f(a) < 0$, i.e., $f(x) < f(a)$.
b) This is similar enough to part a) to be best left to the reader as an exercise.2
c) If $f(x) = x^3$, then $f'(0) = 0$ but $f$ is increasing at $0$. If $f(x) = -x^3$, then
$f'(0) = 0$ but $f$ is decreasing at $0$. If $f(x) = x^2$, then $f'(0) = 0$ but $f$ is neither
increasing nor decreasing at $0$.
Exercise: State and prove a version of Theorem 5.14 for the maximum value.
Often one lumps cases (ii) and (iii) of Theorem 5.14 together under the term "critical point" (but there is nothing very deep going on here: it's just terminology).
Clearly there are always exactly two endpoints. In favorable circumstances there
will be only finitely many critical points, and in very favorable circumstances they
can be found exactly: suppose they are $c_1, \ldots, c_n$. (There may not be any critical
points, but that only makes things easier...) Suppose further that we can explicitly
compute all the values $f(a), f(b), f(c_1), \ldots, f(c_n)$. Then we win: the largest of
these values is the maximum value, and the smallest is the minimum value.
Example: Let $f(x) = x^2 (x - 1)(x - 2) = x^4 - 3x^3 + 2x^2$. Above we argued
that there is a $\Delta$ such that $|x| > \Delta \implies |f(x)| \geq 1$: let's find such a $\Delta$ explicitly.
We intend nothing fancy here:
\[
f(x) = x^4 - 3x^3 + 2x^2 \geq x^4 - 3x^3 = x^3 (x - 3).
\]
So if $x \geq 4$, then
\[
x^3 (x - 3) \geq 4^3 \cdot 1 = 64 \geq 1.
\]
On the other hand, if $x < -1$, then $x < 0$, so $-3x^3 > 0$ and thus
\[
f(x) \geq x^4 + 2x^2 = x^2 (x^2 + 2) \geq 1 \cdot 3 = 3.
\]
Thus we may take $\Delta = 4$.
Now let us try the procedure of Theorem 5.14 out by finding the maximum and
minimum values of $f(x) = x^4 - 3x^3 + 2x^2$ on $[-4, 4]$.
Since $f$ is differentiable everywhere on $(-4, 4)$, the only critical points will be the
stationary points, where $f'(x) = 0$. So we compute the derivative:
\[
f'(x) = 4x^3 - 9x^2 + 4x = x(4x^2 - 9x + 4).
\]
The roots are $x = 0$ and
\[
x = \frac{9 \pm \sqrt{17}}{8},
\]
or, approximately,
\[
x_1 \approx 0.6096\ldots, \quad x_2 \approx 1.6403\ldots.
\]
We have
\[
f(0) = 0, \quad f(x_1) = 0.2017\ldots, \quad f(x_2) = -0.619\ldots.
\]
Also we always test the endpoints:
\[
f(-4) = 480, \quad f(4) = 96.
\]
So the maximum value is $480$ and the minimum value is $-0.619\ldots$.
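The procedure just carried out (evaluate $f$ at the endpoints and at the stationary points, then compare) is easy to replay numerically. The Python sketch below is only an illustration for this particular polynomial; the variable names are my own, and the stationary points are taken from the factorization $f'(x) = x(4x^2 - 9x + 4)$ rather than found by a general root finder.

```python
import math


def f(x: float) -> float:
    """The polynomial from the example: x^4 - 3x^3 + 2x^2."""
    return x**4 - 3 * x**3 + 2 * x**2


# Stationary points: the roots of f'(x) = x(4x^2 - 9x + 4).
root17 = math.sqrt(17)
stationary = [0.0, (9 - root17) / 8, (9 + root17) / 8]
endpoints = [-4.0, 4.0]

# Candidate points for the extrema on [-4, 4].
candidates = endpoints + stationary
values = {x: f(x) for x in candidates}

x_max = max(values, key=values.get)
x_min = min(values, key=values.get)

for x in sorted(values):
    print(f"f({x:.4f}) = {values[x]:.4f}")
print("maximum at", x_max, "minimum at", x_min)
```

Only finitely many candidate points need to be compared, which is what makes the method practical when the stationary points can be found exactly.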
4. The Mean Value Theorem
4.1. Statement of the Mean Value Theorem.
Our goal in this section is to prove the following important result.
Theorem 5.15. (Mean Value Theorem) Let $f : [a, b] \to \mathbb{R}$ be continuous on
$[a, b]$ and differentiable on $(a, b)$. Then there is at least one $c$ with $a < c < b$ and
\[
f'(c) = \frac{f(b) - f(a)}{b - a}.
\]
Remark:$^3$ I still remember the calculus test I took in high school in which I was
asked to state the Mean Value Theorem. It was a multiple choice question, and I
didn't see the choice I wanted, which was as above except with the subtly stronger
assumption that $f'_R(a)$ and $f'_L(b)$ exist: i.e., $f$ is one-sided differentiable at both
endpoints. So I went up to the teacher's desk to ask about this. He thought for
a moment and said, "Okay, you can add that as an answer if you want," and so
as not to give special treatment to any one student, he announced to all that he
was adding a possible answer to the Mean Value Theorem question. So I marked
my added answer, did the rest of the exam, and then had time to come back to
this question. After more thought I decided that one-sided differentiability at the
endpoints was not required. So in the end I selected this pre-existing choice and
submitted my exam. As you can see, my final answer was correct. But many other
students figured that if I had successfully lobbied for an additional answer then
my answer was probably correct, so they changed their answer from the correct
answer to my added answer. They were not so thrilled with me or the teacher, but
in my opinion he behaved admirably: talk about a teachable moment!
$^3$ Please excuse this personal anecdote.
One should certainly draw a picture to go with the Mean Value Theorem, as it
has a very simple geometric interpretation: under the hypotheses of the theorem,
there exists at least one interior point c of the interval such that the tangent line
at c is parallel to the secant line joining the endpoints of the interval.
And one should also interpret it physically: if $y = f(x)$ gives the position of
a particle at a given time $x$, then the expression $\frac{f(b) - f(a)}{b - a}$ is nothing less than the
average velocity between time $a$ and time $b$, whereas the derivative $f'(c)$ is the instantaneous velocity at time $c$, so that the Mean Value Theorem says that there is at
least one instant at which the instantaneous velocity is equal to the average velocity.
Example: Suppose that cameras are set up at checkpoints along an interstate
highway in Georgia. One day you receive timestamped photos of yourself at two
checkpoints. The two checkpoints are 90 miles apart and the second photo is taken
73 minutes after the first photo. You are issued a ticket for violating the speed
limit of 70 miles per hour. Your average velocity was (90 miles) / (73 minutes)
$\cdot$ (60 minutes) / (hour) $\approx 73.97$ miles per hour. Thus, although no one saw you
violating the speed limit, they may mathematically deduce that at some point your
instantaneous velocity was over 70 mph. Guilt by the Mean Value Theorem!
4.2. Proof of the Mean Value Theorem.
We will deduce the Mean Value Theorem from the (as yet unproven) Extreme
Value Theorem. However, it is convenient to rst establish a special case.
Theorem 5.16. (Rolle's Theorem) Let $f : [a, b] \to \mathbb{R}$. We suppose:
(i) $f$ is continuous on $[a, b]$.
(ii) $f$ is differentiable on $(a, b)$.
(iii) $f(a) = f(b)$.
Then there exists $c$ with $a < c < b$ and $f'(c) = 0$.
Proof. By Theorem 5.12, $f$ has a maximum $M$ and a minimum $m$.
Case 1: Suppose $M > f(a) = f(b)$. Then the maximum value does not occur
at either endpoint. Since $f$ is differentiable on $(a, b)$, it must therefore occur at a
stationary point: i.e., there exists $c \in (a, b)$ with $f'(c) = 0$.
Case 2: Suppose $m < f(a) = f(b)$. Then the minimum value does not occur at
either endpoint. Since $f$ is differentiable on $(a, b)$, it must therefore occur at a
stationary point: there exists $c \in (a, b)$ with $f'(c) = 0$.
Case 3: The remaining case is $m = f(a) = M$. Then $f$ is constant and $f'(c) = 0$
at every point $c \in (a, b)$.
To deduce the Mean Value Theorem from Rolle's Theorem, it is tempting to tilt
our head until the secant line from $(a, f(a))$ to $(b, f(b))$ becomes horizontal and
then apply Rolle's Theorem. The possible flaw here is that if we start with a subset of
the plane which is the graph of a function and rotate it too much, it may no longer
be the graph of a function, so Rolle's Theorem does not apply.
The above objection is just a technicality. In fact, it suggests that more is true:
there should be some version of the Mean Value Theorem which applies to curves
in the plane which are not necessarily graphs of functions. Nevertheless formalizing
this argument is more of a digression than we want to make, so the "official" proof
that follows is (slightly) different.
Proof of the Mean Value Theorem: Let $f : [a, b] \to \mathbb{R}$ be continuous on $[a, b]$ and
differentiable on $(a, b)$. There is a unique linear function $L(x)$ such that $L(a) = f(a)$
and $L(b) = f(b)$: indeed, $L$ is nothing else than the secant line to $f$ between $(a, f(a))$
and $(b, f(b))$. Here's the trick: by subtracting $L(x)$ from $f(x)$ we reduce ourselves
to a situation where we may apply Rolle's Theorem, and then the conclusion that
we get is easily seen to be the one we want about $f$. Here goes: define
\[
g(x) = f(x) - L(x).
\]
Then $g$ is defined and continuous on $[a, b]$, differentiable on $(a, b)$, and $g(a) =
f(a) - L(a) = f(a) - f(a) = 0 = f(b) - f(b) = f(b) - L(b) = g(b)$. Applying Rolle's
Theorem to $g$, there exists $c \in (a, b)$ such that $g'(c) = 0$. On the other hand, since
$L$ is a linear function with slope $\frac{f(b) - f(a)}{b - a}$, we compute
\[
0 = g'(c) = f'(c) - L'(c) = f'(c) - \frac{f(b) - f(a)}{b - a},
\]
and thus
\[
f'(c) = \frac{f(b) - f(a)}{b - a}.
\]
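The proof is an existence argument, but for a concrete function one can often locate a suitable $c$ numerically. The Python sketch below does this by bisection on $g'(x) = f'(x) - \frac{f(b) - f(a)}{b - a}$. It rests on assumptions that the theorem itself does not make: $f'$ is taken to be continuous, and $g'$ is required to have opposite signs at the two endpoints. The function name `mvt_point` and the test function $f(x) = x^3$ on $[0, 2]$ are choices made only for this illustration.

```python
import math
from typing import Callable

Func = Callable[[float], float]


def mvt_point(f: Func, df: Func, a: float, b: float, tol: float = 1e-12) -> float:
    """Find c in [a, b] with f'(c) equal to the secant slope, by bisection
    on g'(x) = f'(x) - slope. Assumes f' is continuous and that g' changes
    sign between a and b; this is an assumption of the sketch, not of the
    Mean Value Theorem."""
    slope = (f(b) - f(a)) / (b - a)
    g_prime = lambda x: df(x) - slope
    lo, hi = a, b
    if g_prime(lo) * g_prime(hi) > 0:
        raise ValueError("bisection needs g' to change sign on [a, b]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g_prime(lo) * g_prime(mid) <= 0:
            hi = mid  # a sign change lies in [lo, mid]
        else:
            lo = mid  # otherwise it lies in [mid, hi]
    return (lo + hi) / 2


cube = lambda x: x**3
cube_prime = lambda x: 3 * x**2

c = mvt_point(cube, cube_prime, 0.0, 2.0)
print(c, 2 / math.sqrt(3))  # secant slope is 4, so 3c^2 = 4
```

For $f(x) = x^3$ on $[0, 2]$ the secant slope is $4$, so the point found should agree with $c = 2/\sqrt{3}$.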
(20)
h(b) =
the tangent line to $v(t)$ is $\frac{y'(t)}{x'(t)} = \frac{-\cos t}{\sin t}$ and observing that these two slopes are
opposite reciprocals. Thus we get a derivation using calculus of a familiar (or so I
hope!) fact from elementary geometry.
Here is a version of the Mean Value Theorem for nonsingular parameterized curves.
$^4$ Don't be so impressed: we wanted a constant $C$ such that if $h(x) = f(x) - Cg(x)$, then
$h(a) = h(b)$, so we set $f(a) - Cg(a) = f(b) - Cg(b)$ and solved for $C$.
\[
\frac{y'(c)}{x'(c)} = \frac{y(b) - y(a)}{x(b) - x(a)},
\]
which says that the tangent line to $v$ at $c$ has the same slope as the secant line
between $(x(a), y(a))$ and $(x(b), y(b))$, hence these two lines are parallel.
Case 2: Suppose $x'(c) = 0$, i.e., the tangent line to $v$ at $c$ is vertical. By our
nonsingularity assumption $y'(c) \neq 0$, so
\[
0 = (y(b) - y(a)) x'(c) = (x(b) - x(a)) y'(c)
\]
implies $x(a) = x(b)$, so the secant line from $(x(a), y(a))$ to $(x(b), y(b))$ is vertical,
hence the two lines are parallel.
Conversely, it is possible (indeed, similar) to deduce the Cauchy Mean Value Theorem from the Parametric Mean Value Theorem: try it. Thus really we have one
theorem with two moderately different phrasings, and indeed it is common to also
refer to Theorem 5.18 as the Cauchy Mean Value Theorem.
5. Monotone Functions
A function $f : I \to \mathbb{R}$ is monotone if it is weakly increasing or weakly decreasing.
5.1. The Monotone Function Theorems.
The Mean Value Theorem has several important consequences. Foremost of all
it will be used in the proof of the Fundamental Theorem of Calculus, but that's for
later. At the moment we can use it to establish a criterion for a function $f$ to be
monotone on an interval in terms of a sign condition on $f'$.
Theorem 5.19. (First Monotone Function Theorem) Let $I$ be an open interval,
and let $f : I \to \mathbb{R}$ be a function which is differentiable on $I$.
a) Suppose $f'(x) > 0$ for all $x \in I$. Then $f$ is increasing on $I$: for all $x_1, x_2 \in I$
with $x_1 < x_2$, $f(x_1) < f(x_2)$.
b) Suppose $f'(x) \geq 0$ for all $x \in I$. Then $f$ is weakly increasing on $I$: for all
$x_1, x_2 \in I$ with $x_1 < x_2$, $f(x_1) \leq f(x_2)$.
c) Suppose $f'(x) < 0$ for all $x \in I$. Then $f$ is decreasing on $I$: for all $x_1, x_2 \in I$
with $x_1 < x_2$, $f(x_1) > f(x_2)$.
$\neg$(iii) $\implies$ $\neg$(i): If $f'$ is identically zero on some subinterval $[a, b]$, then by the Zero
Velocity Theorem $f$ is constant on $[a, b]$, hence is not increasing.
The next result follows immediately.
Corollary 5.25. Let $f : I \to \mathbb{R}$. Suppose $f'(x) \geq 0$ for all $x \in I$, and
$f'(x) > 0$ except at a finite set of points $x_1, \ldots, x_n$. Then $f$ is increasing on $I$.
small $\delta > 0$, $f'(x)$ is negative for $x \in (a - \delta, a)$ and positive for $x \in (a, a + \delta)$ and
then apply the First Derivative Test. To see this, consider
\[
f''(a) = \lim_{x \to a} \frac{f'(x) - f'(a)}{x - a} = \lim_{x \to a} \frac{f'(x)}{x - a}.
\]
We are assuming that this limit exists and is positive, so that there exists $\delta > 0$
such that for all $x \in (a - \delta, a) \cup (a, a + \delta)$, $\frac{f'(x)}{x - a}$ is positive. And this gives us exactly
what we want: suppose $x \in (a - \delta, a)$. Then $\frac{f'(x)}{x - a} > 0$ and $x - a < 0$, so $f'(x) < 0$.
On the other hand, suppose $x \in (a, a + \delta)$. Then $\frac{f'(x)}{x - a} > 0$ and $x - a > 0$, so
$f'(x) > 0$. So $f$ has a strict local minimum at $a$ by the First Derivative Test.
Remark: When $f'(a) = f''(a) = 0$, no conclusion can be drawn about the local
behavior of $f$ at $a$: it may have a local minimum at $a$, a local maximum at $a$, be
increasing at $a$, decreasing at $a$, or none of the above.
5.4. Sign analysis and graphing.
When one is graphing a function $f$, the features of interest include the number and
approximate locations of the roots of $f$, regions on which $f$ is positive or negative,
regions on which $f$ is increasing or decreasing, and local extrema, if any. For these
considerations one wishes to do a sign analysis on both $f$ and its derivative $f'$.
Let us agree that a sign analysis of a function $g : I \to \mathbb{R}$ is the determination of regions on which $g$ is positive, negative and zero.
The basic strategy is to determine first the set of roots of $g$. As discussed before,
finding exact values of roots may be difficult or impossible even for polynomial
functions, but often it is feasible to determine at least the number of roots and
their approximate location (certainly this is possible for all polynomial functions,
although this requires justification that we do not give here). The next step is to
test a point in each region between consecutive roots to determine the sign.
This procedure comes with two implicit assumptions. Let us make them explicit.
The first is that the roots of $f$ are sparse enough to separate the domain $I$ into
"regions". One precise formulation of this is that $f$ has only finitely many roots
on any bounded subset of its domain. This holds for all the elementary functions we
know and love, but certainly not for all functions, even all differentiable functions:
we have seen that things like $x^2 \sin(\frac{1}{x})$ are not so well-behaved. But this is a convenient assumption and in a given situation it is usually easy to see whether it holds.
The second assumption is more subtle: it is that if a function $f$ takes a positive value at some point $a$ and a negative value at some other point $b$ then it must
take the value zero somewhere in between. Of course this does not hold for all
functions: it fails very badly, for instance, for the function $f$ which takes the value
$1$ at every rational number and $-1$ at every irrational number.
Let us formalize the desired property and then say which functions satisfy it.
\[
\lim_{x \to a^+} \frac{f(x) - f(a)}{x - a} = \lim_{x \to a^+} f'(c_x) = \lim_{x \to a^+} f'(x) = L.
\]
\[
f(g(y)) = f(x_y) = y.
\]
(ii) $\implies$ (i): Suppose that $f^{-1}$ exists. To see that $f$ is injective, let $x_1, x_2 \in X$
be such that $f(x_1) = f(x_2)$. Applying $f^{-1}$ on the left gives $x_1 = f^{-1}(f(x_1)) =
f^{-1}(f(x_2)) = x_2$. So $f$ is injective. To see that $f$ is surjective, let $y \in Y$. Then
$f(f^{-1}(y)) = y$, so there is $x \in X$ with $f(x) = y$, namely $x = f^{-1}(y)$.
For any function $f : X \to Y$, we define the image of $f$ to be $\{ y \in Y \mid \exists x \in X \text{ such that } y =
f(x) \}$. The image of $f$ is often denoted $f(X)$.$^5$
We now introduce the dirty trick of codomain restriction. Let $f : X \to Y$
be any function. Then if we replace the codomain $Y$ by the image $f(X)$, we still
get a well-defined function $f : X \to f(X)$, and this new function is tautologically
surjective. (Imagine that you manage the up-and-coming band Yellow Pigs. You
get them a gig one night in an enormous room filled with folding chairs. After
everyone sits down you remove all the empty chairs, and the next morning you
write a press release saying that Yellow Pigs played to a "packed house". This is
essentially the same dirty trick as codomain restriction.)
Example: Let $f : \mathbb{R} \to \mathbb{R}$ by $f(x) = x^2$. Then $f(\mathbb{R}) = [0, \infty)$, and although
$x^2 : \mathbb{R} \to \mathbb{R}$ is not surjective, $x^2 : \mathbb{R} \to [0, \infty)$ certainly is.
Since a codomain-restricted function is always surjective, it has an inverse iff it
is injective iff the original function is injective. Thus:
Corollary 5.34. For a function $f : X \to Y$, the following are equivalent:
(i) The codomain-restricted function $f : X \to f(X)$ has an inverse function.
(ii) The original function $f$ is injective.
6.2. The Interval Image Theorem.
Next we want to return to earth by considering functions $f : I \to \mathbb{R}$ and their
inverses, concentrating on the case in which $f$ is continuous.
Theorem 5.35. (Interval Image Theorem) Let $I \subset \mathbb{R}$ be an interval, and let
$f : I \to \mathbb{R}$ be a continuous function. Then the image $f(I)$ of $f$ is also an interval.
Proof. For now we will give the proof when $I = [a, b]$, i.e., is closed and
bounded. The general case will be discussed later.
Suppose $f : [a, b] \to \mathbb{R}$ is continuous. Then $f$ has a minimum value $m$, say at
$x_m$, and a maximum value $M$, say at $x_M$. Thus the image $f([a, b])$ of $f$ is a subset
of $[m, M]$. Moreover, if $L \in (m, M)$, then by the Intermediate Value Theorem there
exists $c$ in between $x_m$ and $x_M$ such that $f(c) = L$. So $f([a, b]) = [m, M]$.
Exercise: Let $I$ be a nonempty interval which is not of the form $[a, b]$. Let $J$ be any
nonempty interval. Show: there is a continuous function $f : I \to \mathbb{R}$ with $f(I) = J$.
6.3. Monotone Functions and Invertibility.
Recall $f : I \to \mathbb{R}$ is strictly monotone if it is either increasing or decreasing.
Every strictly monotone function is injective. Therefore our dirty trick of codomain
restriction works to show that if $f : I \to \mathbb{R}$ is strictly monotone, $f : I \to f(I)$ is
bijective, hence invertible. Thus in this sense we may speak of the inverse of any
strictly monotone function.
$^5$ This is sometimes called the range of $f$, but sometimes not. It is safer to call it the image!
Proposition 5.36. Let $f : I \to f(I)$ be a strictly monotone function.
a) If $f$ is increasing, then $f^{-1} : f(I) \to I$ is increasing.
b) If $f$ is decreasing, then $f^{-1} : f(I) \to I$ is decreasing.
Proof. As usual, we will content ourselves with the increasing case, the decreasing case being so similar as to make a good exercise for the reader.
Seeking a contradiction we suppose that $f^{-1}$ is not increasing: that is, there
exist $y_1 < y_2 \in f(I)$ such that $f^{-1}(y_1)$ is not less than $f^{-1}(y_2)$. Since $f^{-1}$ is an
inverse function, it is necessarily injective (if it weren't, $f$ itself would not be a
function!), so we cannot have $f^{-1}(y_1) = f^{-1}(y_2)$, and thus the possibility we need
to rule out is $f^{-1}(y_2) < f^{-1}(y_1)$. But if this holds we apply the increasing function
$f$ to get $y_2 = f(f^{-1}(y_2)) < f(f^{-1}(y_1)) = y_1$, a contradiction.
Lemma 5.37. ($\Lambda$-V Lemma) Let $f : I \to \mathbb{R}$. The following are equivalent:
(i) $f$ is not strictly monotone: i.e., $f$ is neither increasing nor decreasing.
(ii) At least one of the following holds:
(a) $f$ is not injective.
(b) $f$ admits a $\Lambda$-configuration: there exist $a < b < c \in I$ with $f(a) < f(b) > f(c)$.
(c) $f$ admits a V-configuration: there exist $a < b < c \in I$ with $f(a) > f(b) < f(c)$.
Exercise: Prove Lemma 5.37.
Theorem 5.38. If $f : I \to \mathbb{R}$ is continuous and injective, it is strictly monotone.
Proof. We will suppose that $f$ is injective and not strictly monotone and show that
it cannot be continuous, which suffices. We may apply Lemma 5.37 to conclude
that $f$ has either a $\Lambda$-configuration or a V-configuration.
Suppose first $f$ has a $\Lambda$-configuration: there exist $a < b < c \in I$ with $f(a) <
f(b) > f(c)$. Then there exists $L \in \mathbb{R}$ such that $f(a) < L < f(b) > L > f(c)$. If $f$
were continuous then by the Intermediate Value Theorem there would be $d \in (a, b)$
and $e \in (b, c)$ such that $f(d) = f(e) = L$, contradicting the injectivity of $f$.
Next suppose $f$ has a V-configuration: there exist $a < b < c \in I$ such that
$f(a) > f(b) < f(c)$. Then there exists $L \in \mathbb{R}$ such that $f(a) > L > f(b) < L < f(c)$.
If $f$ were continuous then by the Intermediate Value Theorem there would be
$d \in (a, b)$ and $e \in (b, c)$ such that $f(d) = f(e) = L$, contradicting injectivity.
6.4. Inverses of Continuous Functions.
Theorem 5.39. (Continuous Inverse Function Theorem) Let $f : I \to \mathbb{R}$ be
injective and continuous. Let $J = f(I)$ be the image of $f$.
a) $f : I \to J$ is a bijection, and thus there is an inverse function $f^{-1} : J \to I$.
b) $J$ is an interval in $\mathbb{R}$.
c) If $I = [a, b]$, then either $f$ is increasing and $J = [f(a), f(b)]$ or $f$ is decreasing
and $J = [f(b), f(a)]$.
d) The function $f^{-1} : J \to I$ is also continuous.
Proof. [S, Thm. 12.3] Parts a) through c) simply recap previous results. The
new result is part d), that $f^{-1} : J \to I$ is continuous. By part c) and Proposition
5.36, either $f$ and $f^{-1}$ are both increasing, or $f$ and $f^{-1}$ are both decreasing. As
usual, we restrict ourselves to the first case.
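Let $b \in J$. We must show that $\lim_{y \to b} f^{-1}(y) = f^{-1}(b)$. We may write $b = f(a)$
for a unique $a \in I$. Fix $\epsilon > 0$. We want to find $\delta > 0$ such that if $f(a) - \delta < y <
f(a) + \delta$, then $a - \epsilon < f^{-1}(y) < a + \epsilon$.
Take $\delta = \min(f(a + \epsilon) - f(a), f(a) - f(a - \epsilon))$. Then:
\[
f(a - \epsilon) \leq f(a) - \delta, \quad f(a) + \delta \leq f(a + \epsilon),
\]
and thus if $f(a) - \delta < y < f(a) + \delta$ we have
\[
f(a - \epsilon) \leq f(a) - \delta < y < f(a) + \delta \leq f(a + \epsilon).
\]
Since $f^{-1}$ is increasing, we get
\[
f^{-1}(f(a - \epsilon)) < f^{-1}(y) < f^{-1}(f(a + \epsilon)),
\]
or
\[
f^{-1}(b) - \epsilon < f^{-1}(y) < f^{-1}(b) + \epsilon.
\]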
Remark: To be honest, I don't find the above proof very enlightening. After reflecting on my dissatisfaction with it, I came up with an alternate proof that I find
conceptually simpler, but which depends on the Monotone Jump Theorem, a
characterization of the possible discontinuities of a monotone function. The proof
uses the completeness of the real numbers, so is postponed to the next chapter.
6.5. Inverses of Differentiable Functions.
In this section our goal is to determine conditions under which the inverse $f^{-1}$
of a differentiable function is differentiable, and if so to find a formula for $(f^{-1})'$.
Let's first think about the problem geometrically. The graph of the inverse function $y = f^{-1}(x)$ is obtained from the graph of $y = f(x)$ by interchanging $x$ and
$y$, or, put more geometrically, by reflecting the graph of $y = f(x)$ across the line
$y = x$. Geometrically speaking $y = f(x)$ is differentiable at $x$ iff its graph has
a well-defined, nonvertical tangent line at the point $(x, f(x))$, and if a curve has
a well-defined tangent line, then reflecting it across a line should not change this.
Thus it should be the case that if $f$ is differentiable, so is $f^{-1}$. Well, almost. Notice
the occurrence of "nonvertical" above: if a curve has a vertical tangent line, then
since a vertical line has infinite slope it does not have a finite-valued derivative.
So we need to worry about the possibility that reflection through $y = x$ carries a
nonvertical tangent line to a vertical tangent line. When does this happen? Well,
the inverse function of the straight line $y = mx + b$ is the straight line $y = \frac{1}{m}(x - b)$,
i.e., reflecting across $y = x$ takes a line of slope $m$ to a line of slope $\frac{1}{m}$. Moreover,
it takes a horizontal line $y = c$ to a vertical line $x = c$, so that is our answer: at
any point $(a, b) = (a, f(a))$ such that $f'(a) = 0$, the inverse function will fail
to be differentiable at the point $(b, a) = (b, f^{-1}(b))$ because it will have a vertical
tangent. Otherwise, the slope of the tangent line of the inverse function at $(b, a)$ is
precisely the reciprocal of the slope of the tangent line to $y = f(x)$ at $(a, b)$.
Well, so the geometry tells us. It turns out to be quite straightforward to adapt
this geometric argument to derive the desired formula for $(f^{-1})'(b)$, under the assumption that $f$ is differentiable. We will do this first. Then we need to come back
(21) $(f^{-1})'(f(x)) \, f'(x) = 1$,
or
$(f^{-1})'(f(x)) = \frac{1}{f'(x)}$.
To apply this to get the derivative at $b \in J$, we just need to think a little about our
variables. Let $a = f^{-1}(b)$, so $f(a) = b$. Evaluating the last equation at $x = a$ gives
$(f^{-1})'(b) = \frac{1}{f'(a)} = \frac{1}{f'(f^{-1}(b))}$.
Moreover, since by (21) we have $(f^{-1})'(b) \, f'(f^{-1}(b)) = 1$, $f'(f^{-1}(b)) \neq 0$.
$b + h = f(a + k_h)$
for a unique $k_h \in I$. Since $b + h = f(a + k_h)$, $f^{-1}(b + h) = a + k_h$; let's make this
substitution as well as $h = f(a + k_h) - f(a)$ in the limit we are trying to evaluate:
$(f^{-1})'(b) = \lim_{h \to 0} \frac{a + k_h - a}{f(a + k_h) - b} = \lim_{h \to 0} \frac{k_h}{f(a + k_h) - f(a)} = \lim_{h \to 0} \frac{1}{\frac{f(a + k_h) - f(a)}{k_h}}$.⁶
We are getting close: the limit now looks like the reciprocal of the derivative of $f$
at $a$. The only issue is the pesky $k_h$, but if we can show that $\lim_{h \to 0} k_h = 0$, then
we may simply replace the $\lim_{h \to 0}$ with $\lim_{k_h \to 0}$ and we'll be done.
⁶ Unlike Spivak, we will include the subscript in $k_h$ to remind ourselves that this $k$ is defined in
terms of $h$: to my taste this reminder is worth a little notational complication.
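But $k_h = f^{-1}(b + h) - a$, and $f^{-1}$ is continuous at $b$, so
$\lim_{h \to 0} k_h = \lim_{h \to 0} f^{-1}(b + h) - a = f^{-1}(b) - a = a - a = 0$.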
So as $h \to 0$, $k_h \to 0$ and thus
$(f^{-1})'(b) = \frac{1}{\lim_{k_h \to 0} \frac{f(a + k_h) - f(a)}{k_h}} = \frac{1}{f'(a)} = \frac{1}{f'(f^{-1}(b))}$.
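The formula $(f^{-1})'(b) = \frac{1}{f'(f^{-1}(b))}$ can be spot-checked numerically. The sketch below assumes the illustrative function $f(x) = x^3 + x$ (whose derivative never vanishes) and a bisection-based inverse; neither comes from the text.

```python
def f(x):
    # Illustrative increasing function; f'(x) = 3x^2 + 1 is never zero.
    return x**3 + x

def f_prime(x):
    return 3 * x**2 + 1

def f_inv(y, lo=-100.0, hi=100.0, iters=200):
    # Invert f by bisection; valid because f is increasing on [lo, hi].
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def inverse_derivative_formula(b):
    # The formula derived above: 1 / f'(f^{-1}(b)).
    return 1.0 / f_prime(f_inv(b))

def inverse_derivative_numeric(b, h=1e-6):
    # Symmetric difference quotient of f^{-1} at b.
    return (f_inv(b + h) - f_inv(b - h)) / (2 * h)

b = 2.0  # f(1) = 2, so f^{-1}(2) = 1 and f'(1) = 4
by_formula = inverse_derivative_formula(b)
by_quotient = inverse_derivative_numeric(b)
print(by_formula, by_quotient)
```

Both values should be close to $\frac{1}{4}$; the difference quotient is only approximate, so agreement is expected to several decimal places rather than exactly.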
7.1. $x^{1/n}$.
In this section we illustrate the preceding concepts by defining and differentiating the $n$th root function $x^{1/n}$. The reader should not now be surprised to hear that
we give separate consideration to the cases of odd $n$ and even $n$.
Either way, let $n > 1$ be an integer, and consider
$f : \mathbb{R} \to \mathbb{R}$, $x \mapsto x^n$.
Case 1: $n = 2k + 1$ is odd. Then $f'(x) = (2k + 1)x^{2k} = (2k + 1)(x^k)^2$ is non-negative
for all $x \in \mathbb{R}$ and not identically zero on any subinterval $[a, b]$ with $a < b$, so by
Theorem 5.24 $f : \mathbb{R} \to \mathbb{R}$ is increasing. Moreover, we have $\lim_{x \to \pm\infty} f(x) = \pm\infty$.
Since $f$ is continuous, by the Intermediate Value Theorem the image of $f$ is all of
$\mathbb{R}$. Moreover, $f$ is everywhere differentiable and has a horizontal tangent only at
$x = 0$. Therefore there is an inverse function
$f^{-1} : \mathbb{R} \to \mathbb{R}$
which is everywhere continuous and differentiable at every $x \in \mathbb{R}$ except $x = 0$. It
is typical to call this function $x^{1/n}$.
Case 2: $n = 2k$ is even. Then $f'(x) = (2k)x^{2k-1}$ is positive when $x > 0$ and
negative when $x < 0$. Thus $f$ is decreasing on $(-\infty, 0]$ and increasing on $[0, \infty)$. In
particular it is not injective on its domain. If we want to get an inverse function,
we need to engage in domain restriction. Unlike codomain restriction, which can
be done in exactly one way so as to result in a surjective function, domain restriction brings with it many choices. Luckily for us, this is a relatively simple case: if
$D \subset \mathbb{R}$, then the restriction of $f$ to $D$ will be injective if and only if for each $x \in \mathbb{R}$,
at most one of $x, -x$ lies in $D$. If we want the restricted domain to be as large as
possible, we should choose the domain to include $0$ and exactly one of $x, -x$ for
all $x > 0$. There are still lots of ways to do this, so let's try to impose another
desirable property of the domain of a function: namely, if possible we would like
it to be an interval. A little thought shows that there are two restricted domains
which meet all these requirements: we may take $D = [0, \infty)$ or $D = (-\infty, 0]$.
We have
$f'(x) = L'(xy)(xy)' - L'(x) = \frac{1}{xy} \cdot y - \frac{1}{x} = \frac{y}{xy} - \frac{1}{x} = 0$.
By the Zero Velocity Theorem, the function $f(x)$ is a constant (depending, a priori,
on $y$), say $C_y$. Thus for all $x \in (0, \infty)$,
$L(xy) = L(x) + L(y) + C_y$.
If we plug in $x = 1$ we get
$L(y) = 0 + L(y) + C_y$,
and thus $C_y = 0$, so $L(xy) = L(x) + L(y)$.
$\frac{E'(x)}{E(x)}$.
$E'(x) = E(x)$.
$\frac{f(x)}{E(x)}$.
$\frac{f}{E}$
In other words, if there really is a function $f(x) = e^x$ out there with $f'(x) = e^x$ and
$f(0) = 1$, then we must have $e^x = E(x)$ for all $x$. The point of this logical maneuver
is that although in precalculus mathematics one learns to manipulate and graph
exponential functions, the actual definition of $a^x$ for irrational $x$ is not given, and
indeed I don't see how it can be given without using key concepts and theorems of
calculus. But, with the functions $E(x)$ and $L(x)$ in hand, let us develop the theory
of exponentials and logarithms to arbitrary bases.
Let $a > 0$ be a real number. How should we define $a^x$? In the following slightly
strange way: for any $x \in \mathbb{R}$,
$a^x := E(L(a)x)$.
Let us make two comments: first, if $a = e$ this agrees with our previous definition:
$e^x = E(xL(e)) = E(x)$. Second, the definition is motivated by the following
desirable law of exponents: $(a^b)^c = a^{bc}$. Indeed, assuming this holds unrestrictedly
for $b, c \in \mathbb{R}$ and $a > 1$, we would have
$a^x = E(x \log a) = e^{x \log a} = (e^{\log a})^x = a^x$.
But here is the point: we do not wish to assume that the laws of exponents work
for all real numbers as they do for positive integers...we want to prove them!
Proposition 5.46. Fix $a \in (0, \infty)$. For $x \in \mathbb{R}$, we define
$a^x := E(L(a)x)$.
If $a \neq 1$, we define
$\log_a(x) = \frac{L(x)}{L(a)}$.
a) The function $a^x$ is differentiable and $(a^x)' = L(a)a^x$.
b) The function $\log_a x$ is differentiable and $(\log_a x)' = \frac{1}{L(a)x}$.
c) Suppose $a > 1$. Then $a^x$ is increasing with image $(0, \infty)$, $\log_a x$ is increasing
with image $(-\infty, \infty)$, and $a^x$ and $\log_a x$ are inverse functions.
d) For all $x, y \in \mathbb{R}$, $a^{x+y} = a^x a^y$.
e) For all $x > 0$ and $y \in \mathbb{R}$, $(a^x)^y = a^{xy}$.
f) For all $x, y > 0$, $\log_a(xy) = \log_a x + \log_a y$.
g) For all $x > 0$ and $y \in \mathbb{R}$, $\log_a(x^y) = y \log_a x$.
Proof. a) We have
$(a^x)' = E(L(a)x)' = E'(L(a)x)(L(a)x)' = E(L(a)x) \cdot L(a) = L(a)a^x$.
b) We have
$(\log_a(x))' = \left( \frac{L(x)}{L(a)} \right)' = \frac{1}{L(a)x}$.
c) Since their derivatives are always positive, $a^x$ and $\log_a x$ are both increasing
functions. Moreover, since $a > 1$, $L(a) > 0$ and thus
$\lim_{x \to \infty} a^x = \lim_{x \to \infty} E(L(a)x) = E(\infty) = \infty$,
$\lim_{x \to \infty} \log_a(x) = \lim_{x \to \infty} \frac{L(x)}{L(a)} = \frac{\infty}{L(a)} = \infty$.
Thus $a^x : (-\infty, \infty) \to (0, \infty)$ and $\log_a x : (0, \infty) \to (-\infty, \infty)$ are bijective and thus
have inverse functions. To check that they are inverses of each other, it suffices
to show that either one of the two compositions is the identity function. Now
$\log_a(a^x) = \frac{L(a^x)}{L(a)} = \frac{L(E(L(a)x))}{L(a)} = \frac{L(a)x}{L(a)} = x$.
d) We have
$a^{x+y} = E(L(a)(x + y)) = E(L(a)x + L(a)y) = E(L(a)x)E(L(a)y) = a^x a^y$.
e) We have
$(a^x)^y = E(L(a^x)y) = E(L(E(L(a)x))y) = E(L(a)xy) = a^{xy}$.
f) We have
$\log_a(xy) = \frac{L(xy)}{L(a)} = \frac{L(x) + L(y)}{L(a)} = \frac{L(x)}{L(a)} + \frac{L(y)}{L(a)} = \log_a x + \log_a y$.
g) We have
$\log_a x^y = \frac{L(x^y)}{L(a)} = \frac{L(E(L(x)y))}{L(a)} = \frac{L(x)y}{L(a)} = y \log_a x$.
Having established all this, we now feel free to write $e^x$ for $E(x)$ and $\log x$ for $L(x)$.
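As a small numerical companion to Proposition 5.46, the sketch below models $E$ and $L$ by Python's `math.exp` and `math.log` and spot-checks parts d) through g) at a few sample values. The helper names `power` and `log_base` and the sample numbers are assumptions made for illustration.

```python
import math

def power(a, x):
    # a^x := E(L(a) x), with E = math.exp and L = math.log (requires a > 0).
    return math.exp(math.log(a) * x)

def log_base(a, x):
    # log_a(x) := L(x) / L(a) (requires a > 0, a != 1, x > 0).
    return math.log(x) / math.log(a)

a, x, y = 2.0, 1.5, -0.75
checks = {
    "d": math.isclose(power(a, x + y), power(a, x) * power(a, y)),
    "e": math.isclose(power(power(a, x), y), power(a, x * y)),
    "f": math.isclose(log_base(a, 3.0 * 5.0), log_base(a, 3.0) + log_base(a, 5.0)),
    "g": math.isclose(log_base(a, power(3.0, y)), y * log_base(a, 3.0)),
    "inverse": math.isclose(log_base(a, power(a, x)), x),
}
print(checks)
```

Floating-point arithmetic only gives approximate equality, which is why the comparisons use `math.isclose` rather than `==`.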
Exercise: Suppose $0 < a < 1$. Show that $a^x$ is decreasing with image $(0, \infty)$,
$\log_a x$ is decreasing with image $(-\infty, \infty)$, and $a^x$ and $\log_a x$ are inverse functions.
Exercise: Prove the change of base formula: for all $a, b, c > 0$ with $a, c \neq 1$,
$\log_a b = \frac{\log_c b}{\log_c a}$.
Proposition 5.47. Let $f(x) = e^{x^2}$. Then for all $n \in \mathbb{Z}^+$ there exists a polynomial $P_n(x)$, of degree $n$, such that
$\frac{d^n}{dx^n} f(x) = P_n(x)e^{x^2}$.
Proof. By induction on $n$.
Base case ($n = 1$):
$\frac{d}{dx} e^{x^2} = 2xe^{x^2} = P_1(x)e^{x^2}$, where $P_1(x) = 2x$, a degree one polynomial.
Inductive step: Assume that for some positive integer $n$ there exists $P_n(x)$ of degree
$n$ such that $\frac{d^n}{dx^n} e^{x^2} = P_n(x)e^{x^2}$. So
$\frac{d^{n+1}}{dx^{n+1}} e^{x^2} = \frac{d}{dx} \frac{d^n}{dx^n} e^{x^2} \overset{\text{IH}}{=} \frac{d}{dx} P_n(x)e^{x^2} = P_n'(x)e^{x^2} + 2xP_n(x)e^{x^2} = (P_n'(x) + 2xP_n(x)) e^{x^2}$.
Now, since $P_n(x)$ has degree $n$, $P_n'(x)$ has degree $n - 1$ and $2xP_n(x)$ has degree
$n + 1$. If $f$ and $g$ are two polynomials such that the degree of $f$ is different from
the degree of $g$, then $\deg(f + g) = \max(\deg(f), \deg(g))$. In particular, $P_{n+1}(x) :=
P_n'(x) + 2xP_n(x)$ has degree $n + 1$, completing the proof of the induction step.
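The recursion $P_{n+1}(x) = P_n'(x) + 2xP_n(x)$ from the proof is easy to run by hand or by machine. The following sketch represents a polynomial as a list of coefficients (index $i$ holds the coefficient of $x^i$); this representation and the helper names are choices made here for illustration.

```python
def derivative(p):
    # p[i] is the coefficient of x^i; differentiate term by term.
    return [i * c for i, c in enumerate(p)][1:] or [0]

def times_2x(p):
    # Multiply by 2x: shift every coefficient up one degree and double it.
    return [0] + [2 * c for c in p]

def add(p, q):
    # Add two polynomials after padding the shorter list with zeros.
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [u + v for u, v in zip(p, q)]

def degree(p):
    # Index of the highest nonzero coefficient.
    return max(i for i, c in enumerate(p) if c != 0)

def P(n):
    # P_1(x) = 2x, and P_{k+1}(x) = P_k'(x) + 2x P_k(x).
    p = [0, 2]
    for _ in range(n - 1):
        p = add(derivative(p), times_2x(p))
    return p

for n in range(1, 6):
    print(n, P(n), degree(P(n)))
```

For instance `P(2)` is `[2, 0, 4]`, i.e. $4x^2 + 2$, and `P(3)` is `[0, 12, 0, 8]`, i.e. $8x^3 + 12x$, matching a direct computation of the second and third derivatives of $e^{x^2}$.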
7.3. Some inverse trigonometric functions.
We now wish to consider inverses of the trigonometric functions: sine, cosine, tangent, and so forth. Right away we encounter a problem similar to the case of $x^n$ for
even $n$: the trigonometric functions are periodic, hence certainly not injective on
their entire domain. Once again we are forced into the art of domain restriction
(as opposed to the science of codomain restriction).
Consider first $f(x) = \sin x$. To get an inverse function, we need to restrict the
domain to some subset $S$ on which $f$ is injective. As usual we like intervals, and a
little thought shows that the maximal possible length of an interval on which the
sine function is injective is $\pi$, attained by any interval at which the function either
increases from $-1$ to $1$ or decreases from $1$ to $-1$. This still gives us choices to make.
The most standard choice (but, to be sure, one that is not the only possible one) is to take $I = [-\frac{\pi}{2}, \frac{\pi}{2}]$.
We claim that $f$ is increasing on $I$. To check this, note that $f'(x) = \cos x$ is indeed
positive on $(-\frac{\pi}{2}, \frac{\pi}{2})$. We have $f([-\frac{\pi}{2}, \frac{\pi}{2}]) = [-1, 1]$. The inverse function here is often called $\arcsin x$ ("arcsine of $x$") in an attempt to distinguish it from $\frac{1}{\sin x} = \csc x$.
This is as good a name as any: let's go with it. We have
$\arcsin : [-1, 1] \to [-\frac{\pi}{2}, \frac{\pi}{2}]$.
As the inverse of an increasing function, $\arcsin x$ is increasing. Moreover since sin x
negative on $(0, \pi)$ and zero at $0$ and $\pi$, $f(x)$ is decreasing on $[0, \pi]$ and hence injective
there. Its image is $f([0, \pi]) = [-1, 1]$. Therefore we have an inverse function
$\arccos : [-1, 1] \to [0, \pi]$.
Since $\cos x$ is continuous, so is $\arccos x$. Since $\cos x$ is differentiable and has zero
derivative only at $0$ and $\pi$, $\arccos x$ is differentiable on $(-1, 1)$ and has vertical tangent lines at $x = -1$ and $x = 1$. Moreover, since $\cos x$ is decreasing, so is $\arccos x$.
We find a formula for the derivative of arccos just as for arcsin: differentiating
$\cos \arccos x = x$
gives
$-\sin(\arccos x) \arccos' x = 1$,
or
$\arccos' x = \frac{-1}{\sin \arccos x}$.
Again, this may be simplified. If $\varphi = \arccos x$, then $x = \cos \varphi$, so if we are on the
unit circle then the $y$-coordinate is $\sin \varphi = \sqrt{1 - x^2}$, and thus
$\arccos' x = \frac{-1}{\sqrt{1 - x^2}}$.
Remark: It is hard not to notice that the derivatives of the arcsine and the arccosine
are simply negatives of each other, so for all $x \in (-1, 1)$,
$\arccos' x + \arcsin' x = 0$.
By the Zero Velocity Theorem, we conclude
$\arccos x + \arcsin x = C$
for some constant $C$. To determine $C$, simply evaluate at $x = 0$:
$C = \arccos 0 + \arcsin 0 = \frac{\pi}{2} + 0 = \frac{\pi}{2}$,
and thus, since both functions are continuous on $[-1, 1]$, for all $x \in [-1, 1]$ we have
$\arccos x + \arcsin x = \frac{\pi}{2}$.
So the angle whose sine is $x$ is complementary to the angle whose cosine is $x$.
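The complementary-angle identity is easy to spot-check numerically. The sketch below uses Python's `math.asin` and `math.acos` at a handful of sample points in $[-1, 1]$, the common domain of the two functions; the sample points are arbitrary choices.

```python
import math

# Sample points in [-1, 1], including both endpoints.
xs = [-1.0, -0.5, 0.0, 0.3, 1.0]

# arccos x + arcsin x should equal pi/2 at each sample point.
sums = [math.acos(x) + math.asin(x) for x in xs]
max_error = max(abs(s - math.pi / 2) for s in sums)
print(sums, max_error)
```

The reported error is at the level of floating-point rounding, which is consistent with (though of course not a proof of) the identity.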
Finally, consider $f(x) = \tan x = \frac{\sin x}{\cos x}$. The domain is all real numbers for which
$\cos x \neq 0$, so all real numbers except $\pm\frac{\pi}{2}, \pm\frac{3\pi}{2}, \ldots$. The tangent function is periodic with period $\pi$ and also odd, which suggests that, as with the sine function, we
should restrict this domain to the largest interval about $0$ on which $f$ is defined and
$f((-\frac{\pi}{2}, \frac{\pi}{2})) = \mathbb{R}$. Therefore we have an inverse function
$\arctan : \mathbb{R} \to (-\frac{\pi}{2}, \frac{\pi}{2})$.
Since the tangent function is differentiable with everywhere positive derivative, the
same is true for $\arctan x$. In particular it is increasing but bounded: $\lim_{x \to \pm\infty} \arctan x =
\pm\frac{\pi}{2}$. In other words the arctangent has horizontal asymptotes at $y = \pm\frac{\pi}{2}$.
8. Some Complements
The Mean Value Theorem and its role in freshman calculus have been a popular
topic of research and debate over the years.
A short paper of W.J. Knight improves upon the Zero Velocity Theorem.
Theorem 5.48. (Right-Handed Zero Velocity Theorem [Kn80]) Let f : [a, b]
CHAPTER 6
Completeness
1. Dedekind Completeness
1.1. Introducing (LUB) and (GLB).
Gather round, my friends: the time has come to tell what makes calculus work.
Recall that we began the course by considering the real numbers as a set endowed
with two binary operations $+$ and $\cdot$ together with a relation $<$, and satisfying a
longish list of familiar axioms (P0) through (P12), the ordered field axioms. We
then showed that using these axioms we could deduce many other familiar properties of numbers and prove many other identities and inequalities.
However we did not claim that (P0) through (P12) was a complete list of axioms
for $\mathbb{R}$. On the contrary, we saw that this could not be the case: for instance the
rational numbers $\mathbb{Q}$ also satisfy the ordered field axioms but (as we have taken
great pains to point out) most of the big theorems of calculus are meaningful
but false when regarded as results applied to the system of rational numbers. So
there must be some further axiom, or property, of $\mathbb{R}$ which is needed to prove the
three Interval Theorems, among others.
Here it is. Among structures $F$ satisfying the ordered field axioms, consider the
following further property:
(P14): Least Upper Bound Axiom (LUB): Let $S$ be a nonempty subset of
$F$ which is bounded above. Then $S$ admits a least upper bound.
This means exactly what it sounds like, but it is so important that we had better
make sure. Recall a subset $S$ of $F$ is bounded above if there exists $M \in F$ such
that for all $x \in S$, $x \le M$. (For future reference, a subset $S$ of $F$ is bounded
below if there exists $m \in F$ such that for all $x \in S$, $m \le x$.) By a least upper
bound for a subset $S$ of $F$, we mean an upper bound $M$ which is less than any
other upper bound: thus, $M$ is a least upper bound for $S$ if $M$ is an upper bound
for $S$ and for any upper bound $M'$ for $S$, $M \le M'$.
There is a widely used synonym for "the least upper bound of $S$", namely the
supremum of $S$. We also introduce the notation $\operatorname{lub} S = \sup S$ for the supremum
of a subset $S$ of an ordered field (when it exists).
The following is a useful alternate characterization of $\sup S$: the supremum of
$S$ is an upper bound $M$ for $S$ with the property that for any $M' < M$, $M'$ is not
an upper bound for $S$: explicitly, for all $M' < M$, there exists $x \in S$ with $M' < x$.
The definition of the least upper bound of a subset $S$ makes sense for any set
$X$ endowed with an order relation $<$. Notice that the uniqueness of the supremum
$\sup S$ is clear: we cannot have two different least upper bounds for a subset, because one of them will be larger than the other! Rather what is in question is the
existence of least upper bounds, and (LUB) is an assertion about this.
Taking the risk of introducing even more terminology, we say that an ordered field
$(F, +, \cdot, <)$ is Dedekind complete¹ if it satisfies the least upper bound axiom.
Now here is the key fact lying at the foundations of calculus and real analysis.
Theorem 6.1. a) The ordered field $\mathbb{R}$ is Dedekind complete.
b) Conversely, any Dedekind complete ordered field is isomorphic to $\mathbb{R}$.
Part b) of Theorem 6.1 really means the following: if $F$ is any Dedekind complete
ordered field then there is a bijection $f : F \to \mathbb{R}$ which preserves the addition,
multiplication and order structures in the following sense: for all $x, y \in F$,
$f(x + y) = f(x) + f(y)$,
$f(xy) = f(x)f(y)$, and
if $x < y$, then $f(x) < f(y)$.
This concept of "isomorphism of structures" comes from a more advanced course
(abstract algebra), so it is probably best to let it go for now. One may take part
b) to mean that there is essentially only one Dedekind complete ordered field: $\mathbb{R}$.
The proof of Theorem 6.1 involves constructing the real numbers in a mathematically rigorous way. This is something of a production, and although in some
sense every serious student of mathematics should see a construction of R at some
point of her career, this sense is similar to the one in which every serious student
of computer science should build at least one working computer from scratch: in
practice, one can probably get away with relying on the fact that many other people
have performed this task in the past. Spivak does give a construction of R and a
proof of Theorem 6.1 in the Epilogue of his text. And indeed, if we treat this
material at all it will be at the very end of the course.
After discussing least upper bounds, it is only natural to consider the "dual" concept of greatest lower bounds. Again, this means exactly what it sounds like but it
is so important that we spell it out explicitly: if $S$ is a subset of an ordered field $F$,
then a greatest lower bound for $S$, or an infimum of $S$, is an element $m \in F$
which is a lower bound for $S$ (i.e., $m \le x$ for all $x \in S$) and is such that if $m'$
is any lower bound for $S$ then $m' \le m$. Equivalently, $m = \inf S$ iff $m$ is a lower
bound for $S$ and for any $m' > m$ there exists $x \in S$ with $x < m'$. Now consider:
(P14'): Greatest Lower Bound Axiom (GLB): Let $S$ be a nonempty subset
of $F$ which is bounded below. Then $S$ admits a greatest lower bound, or infimum.
¹ It is perhaps more common to say "complete" instead of "Dedekind complete". I have my
reasons for preferring the lengthier terminology, but I won't trouble you with them.
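To give a name to what we have done, we define the extended real numbers
$[-\infty, \infty] = \mathbb{R} \cup \{\pm\infty\}$ to be the real numbers together with these two formal
symbols $-\infty$ and $\infty$. This extension is primarily order-theoretic: that is, we may
extend the $\le$ relation to the extended real numbers in the obvious way:
$\forall x \in \mathbb{R}$, $-\infty < x < \infty$.
Conversely much of the point of the extended real numbers is to give the real
numbers, as an ordered set, the pleasant properties of a closed, bounded interval
$[a, b]$: namely we have a largest and smallest element.
The extended real numbers $[-\infty, \infty]$ are not a field. In fact, we cannot even
define the operations of $+$ and $\cdot$ unrestrictedly on them. However, it is useful to
define some of these operations:
$\forall x \in \mathbb{R}$, $-\infty + x = -\infty$, $x + \infty = \infty$.
$\forall x \in (0, \infty)$, $x \cdot \infty = \infty$, $x \cdot (-\infty) = -\infty$.
$\forall x \in (-\infty, 0)$, $x \cdot \infty = -\infty$, $x \cdot (-\infty) = \infty$.
$\infty \cdot \infty = \infty$, $\infty \cdot (-\infty) = -\infty$, $(-\infty) \cdot (-\infty) = \infty$.
$\frac{1}{\infty} = \frac{1}{-\infty} = 0$.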
None of these definitions are really surprising, are they? If you think about it,
they correspond to facts you have learned about manipulating infinite limits, e.g.
if $\lim_{x \to c} f(x) = \infty$ and $\lim_{x \to c} g(x) = 17$, then $\lim_{x \to c} f(x) + g(x) = \infty$. However,
certain other operations with the extended real numbers are not defined, for similar
reasons. In particular we do not define
$\infty - \infty$, $0 \cdot \infty$, $\frac{\pm\infty}{\pm\infty}$.
Why not? Well, again we might think in terms of associated limits. The above are
indeterminate forms: if I tell you that $\lim_{x \to c} f(x) = \infty$ and $\lim_{x \to c} g(x) = -\infty$,
then what can you tell me about $\lim_{x \to c} f(x) + g(x)$? Answer: nothing, unless you
know what specific functions $f$ and $g$ are. As a simple example, suppose
$f(x) = \frac{1}{(x - c)^2} + 2011$, $g(x) = \frac{-1}{(x - c)^2}$.
Then $\lim_{x \to c} f(x) = \infty$, $\lim_{x \to c} g(x) = -\infty$, but
$\lim_{x \to c} f(x) + g(x) = \lim_{x \to c} 2011 = 2011$.
So $\infty - \infty$ cannot have a universal definition independent of the chosen functions.³ In a similar way, when evaluating limits $0 \cdot \infty$ is an indeterminate form:
if $\lim_{x \to c} f(x) = 0$ and $\lim_{x \to c} g(x) = \infty$, then $\lim_{x \to c} f(x)g(x)$ depends on how
fast $f$ approaches zero compared to how fast $g$ approaches infinity. Again, consider
something like $f(x) = (x - c)^2$, $g(x) = \frac{2011}{(x - c)^2}$. And similarly for $\frac{\infty}{\infty}$.
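As an aside, floating-point arithmetic adopts conventions for infinities that are closely analogous to the ones above. The following sketch uses Python's `math.inf`; it illustrates the analogy only and is not part of the mathematical development. In Python the indeterminate forms evaluate to the special value `nan` ("not a number").

```python
import math

inf = math.inf

# Operations that are defined for the extended real numbers.
defined = {
    "inf + 17": inf + 17,          # inf
    "-inf + 17": -inf + 17,        # -inf
    "2 * inf": 2 * inf,            # inf
    "-2 * inf": -2 * inf,          # -inf
    "inf * (-inf)": inf * (-inf),  # -inf
    "1 / inf": 1 / inf,            # 0.0
}

# The indeterminate forms: Python reports these as nan.
undefined = {
    "inf - inf": inf - inf,
    "0 * inf": 0 * inf,
    "inf / inf": inf / inf,
}

print(defined)
print(undefined)
```

One difference worth noting: the floating-point convention returns `nan` instead of refusing to answer, whereas in the text these expressions are simply left undefined.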
These are good reasons. However, there are also more purely algebraic reasons:
there is no way to define the above expressions in such a way to make the field
³ In the unlikely event you think that perhaps $\infty - \infty = 2011$ always, try constructing another
example...or wait until next semester and ask me again.
$X = \bigcup_{n=0}^{\infty} X_n$;
in other words, every element of $X$ is an element of $X_n$ for some $n$ (this is precisely the
Archimedean property). Applying Proposition 6.7, we get that for every nonempty
subset $X$ of $\mathbb{R}$,
$\sup X_0 \le \sup X_1 \le \sup X_2 \le \ldots \le \sup X_n \le \ldots$.
Suppose moreover that $X$ is bounded above. Then some $N \in \mathbb{Z}^+$ is an upper
bound for $X$, i.e., $X = X_N = X_{N+1} = \ldots$, so the sequence $\sup X_n$ is eventually
constant, and in particular $\lim_{n \to \infty} \sup X_n = \sup X$. On the other hand, if $X$
is unbounded above, then the sequence $\sup X_n$ is not eventually constant; in fact it
takes increasingly large values, and thus
$\lim_{n \to \infty} \sup X_n = \infty$.
Thus if we take as our definition for $\sup X$, $\lim_{n \to \infty} \sup X_n$, then for $X$ which is
unbounded above, we get $\sup X = \lim_{n \to \infty} \sup X_n = \infty$. By reflection, a similar
discussion holds for $\inf X$.
There is, however, one last piece of business to attend to: we said we wanted
$\sup S$ and $\inf S$ to be defined for all subsets of $\mathbb{R}$: what if $S = \emptyset$? There is an
answer for this as well, but many people find it confusing and counterintuitive at
first, so let me approach it again using Proposition 6.7. For each $n \in \mathbb{Z}$, consider
the set $P_n = \{n\}$: i.e., $P_n$ has a single element, the integer $n$. Certainly then
$\inf P_n = \sup P_n = n$. So what? Well, I claim we can use these sets $P_n$ along with
Proposition 6.7 to see what $\inf \emptyset$ and $\sup \emptyset$ should be. Namely, to define these
quantities in such a way as to obey Proposition 6.7, then for all $n \in \mathbb{Z}$, because
$\emptyset \subset \{n\}$, we must have
$\sup \emptyset \le \sup\{n\} = n$
and
$\inf \emptyset \ge \inf\{n\} = n$.
There is exactly one extended real number which is less than or equal to every
integer: $-\infty$. Similarly, there is exactly one extended real number which is greater
than or equal to every integer: $\infty$. Therefore the inexorable conclusion is
$\sup \emptyset = -\infty$, $\inf \emptyset = \infty$.
Other reasonable thought leads to this conclusion: for instance, in class I had a lot
of success with the "pushing" conception of suprema and infima. Namely, if your
set $S$ is bounded above, then you start out to the right of every element of your
set (i.e., at some upper bound of $S$) and keep pushing to the left until you can't
push any farther without passing by some element of $S$. What happens if you try
this with $\emptyset$? Well, every real number is an upper bound for $\emptyset$, so start anywhere
and push to the left: you can keep pushing as far as you want, because you will
never hit an element of the set. Thus you can push all the way to $-\infty$, so to speak.
Similarly for infima, by reflection.
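The convention $\sup \emptyset = -\infty$, $\inf \emptyset = \infty$ is also the natural one when computing with finite sets. The sketch below is restricted to finite sets of numbers (so it says nothing about completeness) and uses the hypothetical helper names `sup` and `inf`.

```python
import math

def sup(S):
    # Supremum of a finite set of reals as an extended real number;
    # the empty set gets -infinity.
    return max(S, default=-math.inf)

def inf(S):
    # Infimum of a finite set of reals as an extended real number;
    # the empty set gets +infinity.
    return min(S, default=math.inf)

empty = set()
# The empty set is a subset of {n}, so monotonicity demands
# sup(empty) <= sup({n}) and inf(empty) >= inf({n}) for every integer n.
monotone = all(
    sup(empty) <= sup({n}) and inf(empty) >= inf({n})
    for n in range(-1000, 1001)
)
print(sup(empty), inf(empty), monotone)
```

Any other choice of default value would break the monotonicity check for some integer $n$, which is the content of the argument with the sets $P_n = \{n\}$.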
2. Intervals and the Intermediate Value Theorem
2.1. Convex subsets of R.
We say that a subset $S$ of $\mathbb{R}$ is convex if for all $x < y \in S$, the entire interval
$[x, y]$ lies in $S$. In other words, a convex set is one that whenever two points are in
it, all in between points are also in it.
Example 2.1: The empty set is convex. For any $x \in \mathbb{R}$, the singleton set $\{x\}$ is
convex. In both cases the definition applies vacuously: until we have two distinct
points of $S$, there is nothing to check!
Example 2.2: We claim any interval is convex. This is immediate, or it would
be, if we didn't have so many different kinds of intervals to write down and check.
One needs to see that the definition applies to intervals of all of the following forms:
$(a, b)$, $[a, b)$, $(a, b]$, $[a, b]$, $(-\infty, b)$, $(-\infty, b]$, $(a, \infty)$, $[a, \infty)$, $(-\infty, \infty)$.
All these verifications are trivial appeals to things like the transitivity of $<$ and $\le$.
Are there any nonempty convex sets other than intervals? (Just to be sure, we
count $\{x\} = [x, x]$ as an interval.⁴) A little thought suggests that the answer should
be no. But more thought shows that if so we had better use the Dedekind completeness of $\mathbb{R}$, because if we work over $\mathbb{Q}$ with all of the corresponding definitions
then there are nonempty convex sets which are not intervals, e.g.
$S = \{x \in \mathbb{Q} \mid x^2 < 2\}$.
This has a familiar theme: replacing $\mathbb{Q}$ by $\mathbb{R}$ we would get an interval, namely
$(-\sqrt{2}, \sqrt{2})$, but once again $\sqrt{2} \notin \mathbb{Q}$. When one looks carefully at the definitions
it is no trouble to check that working solely in the rational numbers $S$ is a convex
set but is not an interval.
Remark: Perhaps the above example seems legalistic, or maybe even a little silly.
It really isn't: one may surmise that contemplation of such examples led Dedekind
to his construction of the real numbers via Dedekind cuts. This construction
may be discussed at the end of this course. Most contemporary analysts prefer
a rival construction of $\mathbb{R}$ due to Cauchy using Cauchy sequences. I agree that
Cauchy's construction is simpler. However, both are important in later mathematics: Cauchy's construction works in the context of a general metric space (and,
with certain modifications, in a general uniform space) to construct an associated complete space. Dedekind's construction works in the context of a general
linearly ordered set to construct an associated Dedekind-complete ordered set.
Theorem 6.8. Any nonempty convex subset $D$ of $\mathbb{R}$ is an interval.
Proof. We have already seen the most important insight for the proof: we
must use the Dedekind-completeness of $\mathbb{R}$ in our argument. With this in mind the
only remaining challenge is one of organization: we are given a nonempty convex
subset $D$ of $\mathbb{R}$ and we want to show it is an interval, but as above an interval can
have any one of nine basic shapes. It may be quite tedious to argue that one of
nine things must occur!
So we just need to set things up a bit carefully. Here goes: let $a \in [-\infty, \infty)$ be
the infimum of $D$, and let $b \in (-\infty, \infty]$ be the supremum of $D$. Let $I = (a, b)$, and
let $\overline{I}$ be the closure of $I$, i.e., if $a$ is finite, we include $a$; if $b$ is finite, we include $b$.
Step 1: We claim that $I \subset D \subset \overline{I}$. Let $x \in I$.
Case 1: Suppose $I = (a, b)$ with $a, b \in \mathbb{R}$. Let $z \in (a, b)$. Then, since $z > a =
\inf D$, there exists $c \in D$ with $c < z$. Similarly, since $z < b = \sup D$, there
exists $d \in D$ with $z < d$. Since $D$ is convex, $z \in D$. Now suppose $z \in D$. We must
have $\inf D = a \le z \le b = \sup D$.
Case 2: Suppose $I = (-\infty, b)$, and let $z \in I$. Since $D$ is unbounded below,
there exists $a' \in D$ with $a' < z$. Moreover, since $z < \sup D$, there exists $b' \in D$ such
that $z < b'$. Since $D$ is convex, $z \in D$. Next, let $z \in D$. We wish to show that
$z \in \overline{I} = (-\infty, b]$; in other words, we want $z \le b$. But since $z \in D$ and $b = \sup D$,
this is immediate. Thus $I \subset D \subset \overline{I}$.
Case 3: Suppose $I = (a, \infty)$. This is similar to Case 2 and is left to the reader.
Case 4: Suppose $I = (-\infty, \infty) = \mathbb{R}$. Let $z \in \mathbb{R}$. Since $D$ is unbounded below,
there exists $a' \in D$ with $a' < z$, and since $D$ is unbounded above there exists $b' \in D$
with $z < b'$. Since $D$ is convex, $z \in D$. Thus $I = D = \overline{I} = \mathbb{R}$.
⁴ However, we do not wish to say whether the empty set is an interval. Throughout these
notes the reader may notice minor linguistic contortions to ensure that this issue never arises.
$\lim_{x \to c^-} f(x) \le f(c) \le \lim_{x \to c^+} f(x)$.
b) Suppose $I$ has a left endpoint $a$. Then $\lim_{x \to a^+} f(x)$ exists and is at least $f(a)$.
c) Suppose $I$ has a right endpoint $b$. Then $\lim_{x \to b^-} f(x)$ exists and is at most $f(b)$.
Proof. a) Step 0: As usual, we may assume $f$ is weakly increasing. We define
$L = \{f(x) \mid x \in I, x < c\}$, $R = \{f(x) \mid x \in I, x > c\}$.
Since $f$ is weakly increasing, $L$ is bounded above by $f(c)$ and $R$ is bounded below
by $f(c)$. Therefore we may define
$l = \sup L$, $r = \inf R$.
Step 1: For all $x < c$, $f(x) \le f(c)$, so $f(c)$ is an upper bound for $L$, so $l \le f(c)$. For
all $c < x$, $f(c) \le f(x)$, so $f(c)$ is a lower bound for $R$, so $f(c) \le r$. Thus
(23) $l \le f(c) \le r$.
Step 2: We claim $\lim_{x \to c^-} f(x) = l$. To see this, let $\epsilon > 0$. Since $l$ is the least upper
bound of $L$ and $l - \epsilon < l$, $l - \epsilon$ is not an upper bound for $L$: there exists $x_0 < c$
such that $f(x_0) > l - \epsilon$. Since $f$ is weakly increasing, for all $x_0 < x < c$ we have
$l - \epsilon < f(x_0) \le f(x) \le l < l + \epsilon$.
Thus we may take $\delta = c - x_0$.
Step 3: We claim $\lim_{x \to c^+} f(x) = r$: this is shown as above and is left to the reader.
Step 4: Substituting the results of Steps 2 and 3 into (23) gives the desired result.
b) and c): The arguments at an endpoint are routine modifications of those of part
a) above and are left to the reader as an opportunity to check her understanding.
4. Real Induction
Real Induction proofs are not by contradiction). This strategy follows [Ka07].
Namely, IVT is equivalent to: let $f : [a, b] \to \mathbb{R}$ be continuous and nowhere zero. If
$f(a) > 0$, then $f(b) > 0$. We prove this by Real Induction. Let
$S = \{x \in [a, b] \mid f(x) > 0\}$.
Then $f(b) > 0$ iff $b \in S$. We will show $S = [a, b]$ by real induction, which suffices.
(RI1) By hypothesis, $f(a) > 0$, so $a \in S$.
(RI2) Let $x \in S$, $x < b$, so $f(x) > 0$. Since $f$ is continuous at $x$, there exists $\delta > 0$
such that $f$ is positive on $[x, x + \delta]$, and thus $[x, x + \delta] \subset S$.
(RI3) Let $x \in (a, b]$ be such that $[a, x) \subset S$, i.e., $f$ is positive on $[a, x)$. We claim
that $f(x) > 0$. Indeed, since $f(x) \neq 0$, the only other possibility is $f(x) < 0$,
but if so, then by continuity there would exist $\delta > 0$ such that $f$ is negative on
$[x - \delta, x]$, i.e., $f$ is both positive and negative at each point of $[x - \delta, x)$: contradiction!
The following result shows that Real Induction not only uses the Dedekind
completeness of $\mathbb{R}$ but actually carries the full force of it.
Theorem 6.15. In an ordered field $F$, the following are equivalent:
(i) $F$ is Dedekind complete: every nonempty bounded above subset has a supremum.
(ii) $F$ satisfies the Principle of Real Induction: for all $a < b \in F$, a subset $S \subset [a, b]$
satisfying (RI1) through (RI3) above must be all of $[a, b]$.
Proof. (i) $\implies$ (ii): This is simply a restatement of Theorem 6.14.
(ii) $\implies$ (i): Let $T \subset F$ be nonempty and bounded below by $a \in F$. We will show
that $T$ has an infimum. For this, let $S$ be the set of lower bounds $m$ of $T$ with
$a \le m$. Let $b$ be any element of $T$. Then $S \subset [a, b]$.
Step 1: Observe that $b \in S \iff b = \inf T$. In general the infimum could be
smaller, so our strategy is not exactly to use real induction to prove $S = [a, b]$.
Nevertheless we claim that $S$ satisfies (RI1) and (RI3).
(RI1): Since $a$ is a lower bound of $T$ with $a \le a$, we have $a \in S$.
(RI3): Suppose $x \in (a, b]$ and $[a, x) \subset S$, so every $y \in [a, x)$ is a lower bound for $T$.
Then $x$ is a lower bound for $T$: if not, there exists $t \in T$ such that $t < x$; taking
any $y \in (t, x)$, we get that $y$ is not a lower bound for $T$ either, a contradiction.
Step 2: Since $F$ satisfies the Principle of Real Induction, by Step 1 $S = [a, b]$ iff $S$
satisfies (RI2). If $S = [a, b]$, then the element $b$ is a lower bound for $T$, so it must
be the infimum of $T$. Now suppose that $S \neq [a, b]$, so by Step 1 $S$ does not satisfy
(RI2): there exists $x \in S$, $x < b$ such that for any $y > x$, there exists $z \in (x, y)$ such
that $z \notin S$, i.e., $z$ is not a lower bound for $T$. In other words $x$ is a lower bound
for $T$ and no element larger than $x$ is a lower bound for $T$...so $x = \inf T$.
Remark: Like Dedekind completeness, Real Induction depends only on the ordering relation $<$ and not on the field operations $+$ and $\cdot$. In fact, given any ordered
set $(F, <)$ (i.e., we need not have operations $+$ or $\cdot$ at all) it makes sense to speak
of Dedekind completeness and also of whether an analogue of Real Induction holds.
In [Cl11], I proved that Theorem 6.15 holds in this general context: an ordered set
$F$ is Dedekind complete iff it satisfies a "Principle of Ordered Induction".
5. The Extreme Value Theorem
Theorem 6.16. (Extreme Value Theorem)
Let $f : [a, b] \to \mathbb{R}$ be continuous. Then:
a) f is bounded.
b) f attains a minimum and maximum value.
Proof. a) Let $S = \{x \in [a, b] \mid f : [a, x] \to \mathbb{R}$ is bounded$\}$.
(RI1): Evidently $a \in S$.
(RI2): Suppose $x \in S$, so that $f$ is bounded on $[a, x]$. But then $f$ is continuous
at $x$, so is bounded near $x$: for instance, there exists $\delta > 0$ such that for all
$y \in [x - \delta, x + \delta]$, $|f(y)| \le |f(x)| + 1$. So $f$ is bounded on $[a, x]$ and also on $[x, x + \delta]$
and thus on $[a, x + \delta]$.
(RI3): Suppose that $x \in (a, b]$ and $[a, x) \subset S$. Now beware: this does not say that
$f$ is bounded on $[a, x)$: rather it says that for all $a \le y < x$, $f$ is bounded on $[a, y]$.
These are really different statements: for instance, $f(x) = \frac{1}{x - 2}$ is bounded on $[0, y]$
for all $y < 2$ but it is not bounded on $[0, 2)$. But, as usual, the key feature of this
counterexample is a lack of continuity: this $f$ is not continuous at $2$. Having said
this, it becomes clear that we can proceed almost exactly as we did above: since $f$
is continuous at $x$, there exists $0 < \delta < x - a$ such that $f$ is bounded on $[x - \delta, x]$.
But since $a < x - \delta < x$ we know also that $f$ is bounded on $[a, x - \delta]$, so $f$ is
bounded on $[a, x]$.
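The warning in (RI3) can be made concrete with the example $f(x) = \frac{1}{x - 2}$. The sketch below samples $|f|$ on $[0, y]$ for several $y < 2$; the sampling grid and the helper name `max_abs_on` are assumptions of this illustration.

```python
def f(x):
    # The example from the text: bounded on [0, y] for every y < 2,
    # but unbounded on [0, 2).
    return 1.0 / (x - 2.0)

def max_abs_on(y, steps=10_000):
    # Largest value of |f| over an evenly spaced sample of [0, y].
    return max(abs(f(y * i / steps)) for i in range(steps + 1))

ys = (1.0, 1.9, 1.99, 1.999)
bounds = [max_abs_on(y) for y in ys]
print(bounds)
```

Each individual bound is finite, but the bounds grow without limit as $y$ approaches $2$, so no single bound works on all of $[0, 2)$.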
b) Let $m = \inf f([a, b])$ and $M = \sup f([a, b])$. By part a) we have
$-\infty < m \le M < \infty$.
We want to show that there exist $x_m, x_M \in [a, b]$ such that $f(x_m) = m$, $f(x_M) = M$,
i.e., that the infimum and supremum are actually attained as values of $f$. Suppose
that there does not exist $x \in [a, b]$ with $f(x) = m$: then $f(x) > m$ for all $x \in [a, b]$
and the function $g_m : [a, b] \to \mathbb{R}$ by $g_m(x) = \frac{1}{f(x) - m}$ is defined and continuous. By
the result of part a), $g_m$ is bounded, but this is absurd: by definition of the infimum,
$f(x) - m$ takes values less than $\frac{1}{n}$ for any $n \in \mathbb{Z}^+$ and thus $g_m$ takes values greater
than $n$ for any $n \in \mathbb{Z}^+$ and is accordingly unbounded. So indeed there must exist
$x_m \in [a, b]$ such that $f(x_m) = m$. Similarly, assuming that $f(x) < M$ for all $x \in
[a, b]$ gives rise to an unbounded continuous function $g_M : [a, b] \to \mathbb{R}$, $x \mapsto \frac{1}{M - f(x)}$,
contradicting part a). So there exists $x_M \in [a, b]$ with $f(x_M) = M$.
6. The Heine-Borel Theorem
Let $S \subset \mathbb{R}$, and let $\{X_i\}_{i \in I}$ be a family of subsets of $\mathbb{R}$. We say that the family
$\{X_i\}$ covers $S$ if $S \subset \bigcup_{i \in I} X_i$: in words, this simply means that every element
$x \in S$ is also an element of $X_i$ for at least one $i$.
Theorem 6.17. (Heine-Borel) Let $\{U_i\}_{i \in I}$ be any covering of the closed,
bounded interval $[a, b]$ by open intervals $U_i$. Then the covering $\{U_i\}_{i \in I}$ has a finite
subcovering: there is a finite subset $J \subset I$ such that every $x \in [a, b]$ lies in $U_j$ for
some $j \in J$.
Proof. For an open covering $\mathcal{U} = \{U_i\}_{i \in I}$ of $[a, b]$, let
$$S = \{x \in [a, b] \mid \mathcal{U} \cap [a, x] \text{ has a finite subcovering}\}.$$
We prove $S = [a, b]$ by Real Induction. (RI1) is clear. (RI2): If $U_1, \ldots, U_n$ covers
$[a, x]$, then some $U_i$ contains $[x, x + \delta]$ for some $\delta > 0$. (RI3): if $[a, x) \subset S$, let $i_x \in I$
be such that $x \in U_{i_x}$, and let $\delta > 0$ be such that $[x - \delta, x] \subset U_{i_x}$. Since $x - \delta \in S$,
there is a finite $J \subset I$ with $\bigcup_{i \in J} U_i \supset [a, x - \delta]$, so $\{U_i\}_{i \in J} \cup \{U_{i_x}\}$ covers $[a, x]$.
long ago shows this, because for every $\epsilon > 0$ we took $\delta = \frac{\epsilon}{|m|}$. Although we used
this to show that $f$ is continuous at some arbitrary point $c \in \mathbb{R}$, evidently the
choice of $\delta$ does not depend on the point $c$: it works uniformly across all values of
$c$. Thus $f$ is uniformly continuous on $\mathbb{R}$.
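Example 6.2: Let $f : \mathbb{R} \to \mathbb{R}$ by $f(x) = x^2$. This time I claim that our usual
proof did not show uniform continuity. Let's see it in action. To show that $f$ is
continuous at $c$, we factored $x^2 - c^2$ into $(x - c)(x + c)$ and saw that to get some
control on the other factor $x + c$ we needed to restrict $x$ to some bounded interval
around $c$, say $[c - 1, c + 1]$. On this interval $|x + c| \le |x| + |c| \le |c| + 1 + |c| \le 2|c| + 1$.
So with $\delta = \min(1, \frac{\epsilon}{2|c| + 1})$, if $|x - c| < \delta$ then
$$|x^2 - c^2| = |x - c||x + c| < \frac{\epsilon}{2|c| + 1} \cdot (2|c| + 1) = \epsilon.$$
Here the choice of $\delta$ depends on $c$. However, for $c \in [-M, M]$
we have $\frac{\epsilon}{2|c| + 1} \ge \frac{\epsilon}{2M + 1}$, so for $\epsilon > 0$ we may take $\delta = \min(1, \frac{\epsilon}{2M + 1})$. This shows
that $f(x) = x^2$ is uniformly continuous on $[-M, M]$.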
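The choice $\delta = \min(1, \frac{\epsilon}{2M+1})$ can be sanity-checked numerically. The following is a minimal sketch, not a proof: the function names `delta_for` and `check_uniform` are illustrative choices, and the check only samples a finite grid of pairs $x, y \in [-M, M]$ with $|x - y| < \delta$.

```python
def delta_for(eps: float, M: float) -> float:
    """The delta from the text: min(1, eps / (2M + 1))."""
    return min(1.0, eps / (2 * M + 1))


def check_uniform(eps: float, M: float, steps: int = 400) -> bool:
    """Sample pairs x, y in [-M, M] with |x - y| < delta and test |x^2 - y^2| < eps."""
    d = delta_for(eps, M)
    grid = [-M + 2 * M * i / steps for i in range(steps + 1)]
    for x in grid:
        for frac in (0.25, 0.5, 0.99):      # fractions of delta, so |x - y| < delta
            for sign in (-1, 1):
                y = x + sign * frac * d
                if -M <= y <= M and abs(x * x - y * y) >= eps:
                    return False
    return True


print(check_uniform(0.1, 5.0))     # one delta works at every sampled point of [-5, 5]
print(check_uniform(1e-3, 100.0))  # a smaller delta is needed on the larger interval
```

The same $\delta$ is used at every point of the grid, which is exactly what distinguishes uniform continuity on $[-M, M]$ from pointwise continuity.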
It turns out that one can always recover uniform continuity from continuity by
restricting to a closed bounded interval: this is the last of our Interval Theorems.
7.2. The Uniform Continuity Theorem.
Let $f : I \to \mathbb{R}$. For $\epsilon, \delta > 0$, let us say that $f$ is $(\epsilon, \delta)$-UC on $I$ if for all $x_1, x_2 \in I$,
$|x_1 - x_2| < \delta \implies |f(x_1) - f(x_2)| < \epsilon$. This is a sort of halfway unpacking of the
definition of uniform continuity. More precisely, $f : I \to \mathbb{R}$ is uniformly continuous
iff for all $\epsilon > 0$, there exists $\delta > 0$ such that $f$ is $(\epsilon, \delta)$-UC on $I$.
The following small technical argument will be applied twice in the proof of the
Uniform Continuity Theorem, so advance treatment of this argument should make
the proof of the Uniform Continuity Theorem more palatable.
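Lemma 6.18. (Covering Lemma) Let $a < b < c < d$ be real numbers, and let
$f : [a, d] \to \mathbb{R}$. Suppose that for real numbers $\epsilon, \delta_1, \delta_2 > 0$,
$f$ is $(\epsilon, \delta_1)$-UC on $[a, c]$ and
$f$ is $(\epsilon, \delta_2)$-UC on $[b, d]$.
Then $f$ is $(\epsilon, \min(\delta_1, \delta_2, c - b))$-UC on $[a, d]$.
Proof. Put $\delta = \min(\delta_1, \delta_2, c - b)$, and suppose $x_1 < x_2 \in [a, d]$ are such that $|x_1 - x_2| < \delta$. Then it cannot be
the case that both $x_1 < b$ and $c < x_2$: if so, $x_2 - x_1 > c - b \ge \delta$. Thus we must
have either that $b \le x_1 < x_2$ or $x_1 < x_2 \le c$. If $b \le x_1 < x_2$, then $x_1, x_2 \in [b, d]$
and $|x_1 - x_2| < \delta \le \delta_2$, so $|f(x_1) - f(x_2)| < \epsilon$. Similarly, if $x_1 < x_2 \le c$, then
$x_1, x_2 \in [a, c]$ and $|x_1 - x_2| < \delta \le \delta_1$, so $|f(x_1) - f(x_2)| < \epsilon$.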
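Theorem 6.19. (Uniform Continuity Theorem) Let $f : [a, b] \to \mathbb{R}$ be continuous. Then $f$ is uniformly continuous on $[a, b]$.
Proof. For $\epsilon > 0$, let $S(\epsilon)$ be the set of $x \in [a, b]$ such that there exists $\delta > 0$
such that $f$ is $(\epsilon, \delta)$-UC on $[a, x]$. To show that $f$ is uniformly continuous on $[a, b]$, it
suffices to show that $S(\epsilon) = [a, b]$ for all $\epsilon > 0$. We will show this by Real Induction.
(RI1): Trivially $a \in S(\epsilon)$: $f$ is $(\epsilon, \delta)$-UC on $[a, a]$ for all $\delta > 0$!
(RI2): Suppose $x \in S(\epsilon)$, so there exists $\delta_1 > 0$ such that $f$ is $(\epsilon, \delta_1)$-UC on
$[a, x]$. Moreover, since $f$ is continuous at $x$, there exists $\delta_2 > 0$ such that for all
$c \in [x - \delta_2, x + \delta_2]$, $|f(c) - f(x)| < \frac{\epsilon}{2}$. Why $\frac{\epsilon}{2}$? Because then for all $c_1, c_2 \in [x - \delta_2, x + \delta_2]$,
$$|f(c_1) - f(c_2)| = |f(c_1) - f(x) + f(x) - f(c_2)| \le |f(c_1) - f(x)| + |f(c_2) - f(x)| < \epsilon.$$
In other words, $f$ is $(\epsilon, \delta_2)$-UC on $[x - \delta_2, x + \delta_2]$. We apply the Covering Lemma to
$f$ with $a < x - \delta_2 < x < x + \delta_2$ to conclude that $f$ is $(\epsilon, \min(\delta_1, \delta_2, x - (x - \delta_2))) =
(\epsilon, \min(\delta_1, \delta_2))$-UC on $[a, x + \delta_2]$. It follows that $[x, x + \delta_2] \subset S(\epsilon)$.
(RI3): Suppose $[a, x) \subset S(\epsilon)$. As above, since $f$ is continuous at $x$, there exists
$\delta_1 > 0$ such that $f$ is $(\epsilon, \delta_1)$-UC on $[x - \delta_1, x]$. Since $x - \frac{\delta_1}{2} < x$, by hypothesis there
exists $\delta_2$ such that $f$ is $(\epsilon, \delta_2)$-UC on $[a, x - \frac{\delta_1}{2}]$. We apply the Covering Lemma to $f$
with $a < x - \delta_1 < x - \frac{\delta_1}{2} < x$ to conclude that $f$ is $(\epsilon, \min(\delta_1, \delta_2, x - \frac{\delta_1}{2} - (x - \delta_1))) =
(\epsilon, \min(\frac{\delta_1}{2}, \delta_2))$-UC on $[a, x]$. Thus $x \in S(\epsilon)$.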
8. The Bolzano-Weierstrass Theorem For Subsets
Let $S \subset \mathbb{R}$. We say that $x \in \mathbb{R}$ is a limit point of $S$ if for every $\delta > 0$, there
exists $s \in S$ with $0 < |s - x| < \delta$. Equivalently, $x$ is a limit point of $S$ if every open
interval $I$ containing $x$ also contains an element $s$ of $S$ which is not equal to $x$.
Proposition 6.20. For $S \subset \mathbb{R}$ and $x \in \mathbb{R}$, the following are equivalent:
(i) Every open interval $I$ containing $x$ also contains infinitely many points of $S$.
(ii) $x$ is a limit point of $S$.
Example: If $S = \mathbb{R}$, then every $x \in \mathbb{R}$ is a limit point. More generally, if $S \subset \mathbb{R}$
is dense (i.e., if every nonempty open interval $I$ contains an element of $S$) then
every point of $\mathbb{R}$ is a limit point of $S$. In particular this holds when $S = \mathbb{Q}$ and
when $S = \mathbb{R} \setminus \mathbb{Q}$. Note that these examples show that a limit point $x$ of $S$ may or
may not be an element of $S$: both cases can occur.
Example: If $S \subset T$ and $x$ is a limit point of $S$, then $x$ is a limit point of $T$.
Example: No finite subset $S$ of $\mathbb{R}$ has a limit point.
Example: The subset $\mathbb{Z}$ has no limit points: indeed, for any $x \in \mathbb{R}$, take $I =
(x - 1, x + 1)$. Then $I$ is bounded so contains only finitely many integers.
Example: More generally, let $S$ be a subset such that for all $M > 0$, $S \cap [-M, M]$
is finite. Then $S$ has no limit points.
Theorem 6.21. (Bolzano-Weierstrass)
Every infinite subset $A$ of $[a, b]$ has a limit point.
Proof. Let $A \subset [a, b]$, and let $S$ be the set of $x$ in $[a, b]$ such that if $A \cap [a, x]$
is infinite, it has a limit point. It suffices to show $S = [a, b]$, which we will do by
Real Induction. (RI1) is clear. (RI2) Suppose $x \in [a, b) \cap S$. If $A \cap [a, x]$ is infinite,
then it has a limit point and hence so does $A \cap [a, b]$: thus $S = [a, b]$. If for some
CHAPTER 7
Differential Miscellany
1. L'Hôpital's Rule
We have come to the calculus topic most hated by calculus instructors: L'Hôpital's Rule.
First, since $\frac{f'(x)}{g'(x)} \to A < \beta$, there is $c \in (a, b)$ such that for all $x > c$, $\frac{f'(x)}{g'(x)} < \beta$.
Let $c < x < y < b$. By Cauchy's Mean Value Theorem, there is $t \in (x, y)$ such that
(24) $$\frac{f(x) - f(y)}{g(x) - g(y)} = \frac{f'(t)}{g'(t)} < \beta.$$
Suppose first that (i) holds. Then by letting $x$ approach $b$ in (24) we get $\frac{f(y)}{g(y)} \le \beta < \alpha$.
$$\frac{f(x)}{g(x)} < \beta - \beta \frac{g(y)}{g(x)} + \frac{f(y)}{g(x)}.$$
Letting $x$ approach $b$, we find: there is $c \in (c_1, b)$ such that for all $x > c$, $\frac{f(x)}{g(x)} < \alpha$.
Step 2: Suppose $A > -\infty$. Then arguing in a very similar manner as in Step 1
we may show that for any $\alpha < A$ there exists $c \in (a, b)$ such that for all $x > c$,
$\frac{f(x)}{g(x)} > \alpha$.
$$\lim_{x \to b^-} \frac{f(x)}{g(x)} = A.$$
Remark: Perhaps you were expecting the additional hypothesis $\lim_{x \to b^-} f(x) = \pm\infty$
in condition (ii). As the proof shows, this is not necessary. But it seems to be
very risky to present the result to freshman calculus students in this form!
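Example 7.1: We claim that for all $n \in \mathbb{Z}^+$, $\lim_{x \to \infty} \frac{x^n}{e^x} = 0$. We show this by
induction on $n$. First we do $n = 1$: $\lim_{x \to \infty} \frac{x}{e^x} = 0$. Here $f(x) = x$ and $g(x) = e^x$. Since $\lim_{x \to \infty} g(x) = \infty$
and $\lim_{x \to \infty} \frac{f'(x)}{g'(x)} = \lim_{x \to \infty} \frac{1}{e^x} = 0$, L'Hôpital's Rule gives
$\lim_{x \to \infty} \frac{x}{e^x} = 0$. Induction Step: let $n \in \mathbb{Z}^+$ and suppose $\lim_{x \to \infty} \frac{x^n}{e^x} = 0$. Then
$$\lim_{x \to \infty} \frac{x^{n+1}}{e^x} \overset{\text{LH}}{=} \lim_{x \to \infty} \frac{(n+1)x^n}{e^x} = (n + 1) \lim_{x \to \infty} \frac{x^n}{e^x} = (n + 1) \cdot 0 = 0.$$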
Why do calculus instructors not like L'Hôpital's Rule? Oh, let us count the ways!
1) Every derivative $f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$ is of the form $\frac{0}{0}$. Thus many calculus
students switch to applying L'Hôpital's Rule instead of evaluating derivatives
from the definition. This can lead to painfully circular reasoning. For instance,
what is $\lim_{x \to 0} \frac{\sin x}{x}$? Well, both numerator and denominator approach 0 and
$\lim_{x \to 0} \frac{(\sin x)'}{x'} = \lim_{x \to 0} \frac{\cos x}{1} = \cos 0 = 1$. What's wrong with this? Well, how
do we know that $(\sin x)' = \cos x$? Thinking back, we reduced this to computing
the derivative of $\sin x$ at $x = 0$, i.e., to showing that $\lim_{x \to 0} \frac{\sin x}{x} = 1$!
2) Many limits which can be evaluated using L'Hôpital's Rule can also be evaluated in many other ways, and often just by thinking a bit about how functions
actually behave. For instance, try to evaluate the limit of Example 7.1 above without
using L'Hôpital. There are any number of ways. For instance:
Lemma 7.2. (Racetrack Principle) Let $f, g : [a, \infty) \to \mathbb{R}$ be two differentiable
functions such that $f'(x) \ge g'(x)$ for all $x \ge a$. Then:
a) We have $f(x) - f(a) \ge g(x) - g(a)$ for all $x \ge a$.
b) If $f'(x) > g'(x)$ for all $x > a$, then $f(x) - f(a) > g(x) - g(a)$ for all $x > a$.
Proof. Put $h = f - g : [a, \infty) \to \mathbb{R}$.
a) Then $h'(x) \ge 0$ for all $x \ge a$, so $h$ is weakly increasing on $(a, \infty)$, and thus being
continuous, weakly increasing on $[a, \infty)$: for all $x \ge a$, $f(x) - g(x) = h(x) \ge h(a) =
f(a) - g(a)$, and thus $f(x) - f(a) \ge g(x) - g(a)$.
b) This is the same as part a) with all instances of $\ge$ replaced by $>$: details may
be safely left to the reader.
Proposition 7.3. Let $f : [a, \infty) \to \mathbb{R}$ be twice differentiable such that:
(i) $f'(a) > 0$ and
(ii) $f''(x) \ge 0$ for all $x \ge a$.
Then $\lim_{x \to \infty} f(x) = \infty$.
Proof. Let $g$ be the tangent line to $f$ at $x = a$, viewed as a function from
$[a, \infty)$ to $\mathbb{R}$. Because $f'' = (f')'$ is non-negative on $[a, \infty)$, $f'$ is weakly increasing
1 In fact, [R] states and proves the result with $\lim_{x \to a^+}$ instead of $\lim_{x \to b^-}$. I recast it this
way since a natural class of examples concerns $\lim_{x \to \infty}$.
(i) Let $f_n(x) = \frac{e^x}{x^n}$. One can therefore establish $\lim_{x \to \infty} \frac{e^x}{x^n} = \infty$ by showing that
$f_n'(x)$, $f_n''(x)$ are both positive for sufficiently large $x$. It is easy to see that $f_n'(x) > 0$
for all $x > n$. The analysis for $f_n''$ is a bit messier; we leave it to the reader and try
something slightly different instead.
(ii) Since $f_n'(x) > 0$ for all $x > n$, $f_n$ is eventually increasing and thus tends either
to a positive limit $A$ or to $+\infty$. But as $x \to \infty$, $x + 1 \to \infty$, so
$$A = \lim_{x \to \infty} \frac{e^{x+1}}{(x + 1)^n} = e \lim_{x \to \infty} \frac{e^x}{(x + 1)^n} = eA.$$
for $e^x$, namely $\sum_{n=0}^{\infty} \frac{x^n}{n!}$. From this the desired limit follows almost immediately!
log A = lim log
3) The statement of L'Hôpital's Rule is complicated and easy for even relatively
proficient practitioners of the mathematical craft to misremember or misapply. A
classic rookie mistake is to forget to verify condition (i) or (ii): of course in general
$$\lim_{x \to a} \frac{f(x)}{g(x)} \ne \lim_{x \to a} \frac{f'(x)}{g'(x)};$$
try a random example. But there are subtler pitfalls as well. For instance, even
under conditions (i) and (ii), $\lim_{x \to a} \frac{f(x)}{g(x)} = A$ need not imply that $\lim_{x \to a} \frac{f'(x)}{g'(x)}$
exists, so you cannot use L'Hôpital's Rule to show that a limit does not exist.
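Example 7.2: Let $f, g : \mathbb{R} \to \mathbb{R}$ by $f(x) = x^2 \sin(\frac{1}{x})$ and $f(0) = 0$ and $g(x) = x$.
Then $f$ and $g$ are both differentiable (for $f$ this involves going back to the limit
definition of the derivative at $x = 0$; we have seen this example before), and
$\lim_{x \to 0} \frac{f(x)}{g(x)} = \lim_{x \to 0} x \sin(\frac{1}{x}) = 0$. However,
$$\lim_{x \to 0} \frac{f'(x)}{g'(x)} = \lim_{x \to 0} \left( 2x \sin\left(\frac{1}{x}\right) - \cos\left(\frac{1}{x}\right) \right) = -\lim_{x \to 0} \cos\left(\frac{1}{x}\right),$$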
or
(26) $$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}.$$
Note that our expression for $x_{n+1}$ is undefined if $f'(x_n) = 0$, as well it should be:
if the tangent line at $x_n$ is horizontal, then either it coincides with the $x$-axis (in
which case $x_n$ is already a root of $f$ and no amelioration is needed) or it is parallel
to the $x$-axis, in which case the method breaks down: in a sense we will soon make
precise, this means that $x_n$ is too far away from the true root $c$ of $f$.
2.2. A Babylonian Algorithm.
We can use Newton's method to approximate $\sqrt{2}$. Consider $f(x) = x^2 - 2$; straightforward calculus tells us that there is a unique positive number $c$ such that $f(c) = 0$
and that for instance $c \in [1, 2]$. We compute the amelioration formula in this case:
(27) $$x_{n+1} = x_n - \frac{x_n^2 - 2}{2x_n} = \frac{2x_n^2 - (x_n^2 - 2)}{2x_n} = \frac{x_n^2 + 2}{2x_n} = \frac{1}{2}\left(x_n + \frac{2}{x_n}\right).$$
In other words, to get from $x_n$ to $x_{n+1}$ we take the average of $x_n$ and $\frac{2}{x_n}$.
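The amelioration formula (27) is easy to run by hand or by machine. Here is a minimal sketch using exact rational arithmetic; the function name `babylonian_iterates` is an illustrative choice, and the starting value $x_1 = 1$ is an assumption taken from the root lying in $[1, 2]$.

```python
from fractions import Fraction


def babylonian_iterates(count: int, start: Fraction = Fraction(1)) -> list:
    """Return [x_1, ..., x_count] where x_{n+1} = (x_n + 2/x_n) / 2, as in (27)."""
    xs = [start]
    while len(xs) < count:
        x = xs[-1]
        xs.append((x + 2 / x) / 2)  # average of x_n and 2/x_n, kept as an exact fraction
    return xs


xs = babylonian_iterates(6)
for n, x in enumerate(xs, start=1):
    # x_n^2 - 2 measures how far x_n is from being a root of f(x) = x^2 - 2
    print(f"x_{n} = {x}   x_n^2 - 2 = {float(x * x - 2):.3e}")
```

Because the iterates are stored as fractions, the printed residuals $x_n^2 - 2$ are not affected by floating point rounding in the iteration itself.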
If I now ask my laptop computer to directly compute $\sqrt{2}$, then it tells me² that
$$\sqrt{2} = 1.414213562373095048801688724\ldots.$$
Thus $x_5$ is accurate to 11 decimal places and $x_6$ is accurate to 23 decimal places.
Looking more carefully, it seems that each iteration of the amelioration process
$x_n \mapsto x_{n+1}$ roughly doubles the number of decimal places of accuracy. If this holds
true, it means
that the approximations get close to the true root very fast it we
application
of which with $x_1 = 1$ leads to fantastically good numerical approximations
to $\sqrt{a}$. If you ever find yourself on a desert island and needing to compute $\sqrt{a}$
to many decimal places as part of your engineering research to build a raft that will
carry you back to civilization, then this is probably the method you should use.
And now if anyone asks you whether honors calculus contains any practically
useful information, you must tell them that the answer is yes!
2.3. Questioning Newton's Method.
Of course we haven't proven anything yet. Here are two natural questions:
Question 7.4. Let $f : I \to \mathbb{R}$ be differentiable, and let $c \in I$ be such that
$f(c) = 0$.
a) Is there some subinterval $(c - \delta, c + \delta)$ about the true root $c$ such that starting
Newton's method with any $x_1 \in (c - \delta, c + \delta)$ guarantees that the sequence of approximations $\{x_n\}$ gets arbitrarily close to $c$?
b) Assuming the answer to part a) is yes, given some $x_1 \in (c - \delta, c + \delta)$ can we give
a quantitative estimate on how close $x_n$ is to $c$ as a function of $n$?
Questions like these are explored in a branch of mathematics called numerical
analysis. Most theoretical mathematicians (e.g. me) know little about it, which
is a shame because the questions it treats are fundamental and closely related to
pure mathematics. (As well as being useful in applications, of course.)
2.4. Introducing Infinite Sequences.
We will give some answers to these questions. First, the business of the $x_n$'s getting
arbitrarily close to $c$ should be construed in terms of a limiting process, but one of
a kind which is slightly different and in fact simpler than the limit of a real-valued
function at a point. Namely, a real infinite sequence $x_n$ is simply an ordered list
of real numbers $x_1, x_2, \ldots, x_n, \ldots$, or slightly more formally, is given by a function
from the positive integers $\mathbb{Z}^+$ to $\mathbb{R}$, say $f(n) = x_n$. If $L \in \mathbb{R}$, we say the infinite
sequence $\{x_n\}$ converges to $L$ and write $x_n \to L$ if for all $\epsilon > 0$ there exists
$N \in \mathbb{Z}^+$ such that for all $n \ge N$, $|x_n - L| < \epsilon$. This is precisely the definition
of $\lim_{x \to \infty} f(x) = L$ except that our function $f$ is no longer defined for all (or all
sufficiently large) real numbers but only at positive integers. So it is a very close
cousin of the types of limit operations we have already studied.
Here is one very convenient property of limits of sequences.
Proposition 7.5. Let $\{x_n\}_{n=1}^{\infty}$ be a sequence of real numbers, and let $f : \mathbb{R} \to \mathbb{R}$
be a continuous function. Suppose that $x_n \to L$. Then $f(x_n) \to f(L)$.
Proof. Fix $\epsilon > 0$. Since $f$ is continuous at $L$, there exists $\delta > 0$ such that
$|x - L| < \delta \implies |f(x) - f(L)| < \epsilon$. Moreover, since $x_n \to L$, there exists a positive
integer $N$ such that for all $n \ge N$, $|x_n - L| < \delta$. Putting these together: if $n \ge N$
then $|x_n - L| < \delta$, so $|f(x_n) - f(L)| < \epsilon$. This shows that $f(x_n) \to f(L)$.
Remark: a) Proposition 7.5 is a close cousin of the fact that compositions of continuous functions are continuous, and in particular the proof is almost the same.
b) At the moment we are just getting a taste of infinite sequences. Later in the
course we will study them more seriously and show that Proposition 7.5 has a very
$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}.$$
$$T(x) = x - \frac{f(x)}{f'(x)}.$$
Now we have to check that our setup does apply to $T$, quite nicely. First, observe
that a point $x$ is a root of $f$ if and only if it is a fixed point of $T$. Since by our
assumption $c$ is the unique root of $f$ in $[c - \delta, c + \delta]$, $c$ is the unique fixed point of
$T$ on this interval.
The next order of business is to show that $T$ is contractive, at least in some
smaller interval around $c$. For this we look at the derivative:
T (x) = 1
Let $x_0 \in [c - \delta, c + \delta]$, and let $\{x_n\}$ be the Newton's Method sequence of iterates.
It will be useful to rewrite the defining recursion as
$$\forall n \in \mathbb{N}, \quad x_{n+1} - x_n = -\frac{f(x_n)}{f'(x_n)}.$$
Apply the Mean Value Theorem to $f$ on $|[x_n, c]|$: there is $y_n \in |(x_n, c)|$ such that
$$\frac{f(x_n) - f(c)}{x_n - c} = \frac{f(x_n)}{x_n - c} = f'(y_n).$$
Apply the Mean Value Theorem to $f'$ on $|[x_n, y_n]|$: there is $z_n \in |(x_n, y_n)|$ such
that
$$\frac{f'(x_n) - f'(y_n)}{x_n - y_n} = f''(z_n).$$
The rest of the proof is a clever calculation: for $n \in \mathbb{N}$, we have
$$|x_{n+1} - c| = |(x_{n+1} - x_n) + (x_n - c)| = \left| -\frac{f(x_n)}{f'(x_n)} + \frac{f(x_n)}{f'(y_n)} \right| = \left| \frac{f(x_n)(f'(x_n) - f'(y_n))}{f'(x_n) f'(y_n)} \right|$$
$$= \left| \frac{f'(y_n)(x_n - c)(f'(x_n) - f'(y_n))}{f'(x_n) f'(y_n)} \right| = \left| \frac{f'(x_n) - f'(y_n)}{f'(x_n)} \right| |x_n - c| = \left| \frac{f''(z_n)}{f'(x_n)} \right| |x_n - y_n| |x_n - c|$$
$$\le \left| \frac{f''(z_n)}{f'(x_n)} \right| |x_n - c|^2 \le \frac{B}{A} |x_n - c|^2.$$
In the second to the last inequality above, we used the fact that since $y_n$ lies between
$x_n$ and $c$, $|x_n - y_n| \le |x_n - c|$.
b) This is left as an exercise for the reader.
Exercise: Prove Theorem 7.11b).
Let $c$ be a real number, $C$ a positive real number, and let $\{x_n\}_{n=0}^{\infty}$ be a sequence
of real numbers. We say that the sequence $\{x_n\}$ quadratically converges to $c$ if
(QC1) $x_n \to c$, and
(QC2) For all $n \in \mathbb{N}$, $|x_{n+1} - c| \le C|x_n - c|^2$.
Exercise:
a) Show that a sequence can satisfy (QC2) but not (QC1).
b) Suppose that a sequence $\{x_n\}_{n=0}^{\infty}$ satisfies (QC2) and $|x_0 - c| < \min(1, \frac{1}{C})$. Show that $\{x_n\}$
quadratically converges to $c$.
c) Deduce that under the hypotheses of Theorem 7.11, there is $\delta > 0$ such that for
all $x_0 \in [c - \delta, c + \delta]$, the Newton's Method sequence quadratically converges to $c$.
Exercise: Let $\{x_n\}$ be a sequence which is quadratically convergent to $c \in \mathbb{R}$. Viewing $x_n$ as an approximation to $c$, one often says that the number of decimal places
of accuracy roughly doubles with each iteration.
a) Formulate this as a statement about the sequence $d_n = \log_{10}(|x_n - c|)$.
b) Prove the precise statement you formulated in part a).
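Before attempting the exercise, it may help to look at the sequence $d_n$ in one concrete case. The sketch below is an illustration only, under assumptions not made in the exercise: it uses the iteration $x_{n+1} = \frac{1}{2}(x_n + \frac{2}{x_n})$ with $x_1 = 1$ and $c = \sqrt{2}$, and the helper name `log_errors` and the working precision of 120 digits are arbitrary choices.

```python
from decimal import Decimal, getcontext

getcontext().prec = 120  # enough working digits that rounding does not hide the errors


def log_errors(count: int) -> list:
    """Return [d_1, ..., d_count] with d_n = log10|x_n - sqrt(2)|, x_1 = 1."""
    c = Decimal(2).sqrt()
    x = Decimal(1)
    ds = []
    for _ in range(count):
        ds.append(abs(x - c).log10())
        x = (x + Decimal(2) / x) / 2  # one Newton step for f(x) = x^2 - 2
    return ds


ds = log_errors(7)
for n in range(1, len(ds)):
    # d_n is negative; the ratio d_{n+1} / d_n measures how the digit count grows
    print(f"d_{n} = {float(ds[n - 1]):9.3f}   d_{n + 1} / d_{n} = {float(ds[n] / ds[n - 1]):.3f}")
```

The printed ratios drift toward 2, which is the numerical shadow of the "doubling" statement; the exercise asks for the precise inequality behind it.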
Exercise: Let $\Omega_1, \ldots, \Omega_n$ be convex subsets of $\mathbb{R}^n$.
a) Show that $\bigcap_{i=1}^n \Omega_i$ (the set of all points lying in every $\Omega_i$) is convex.
b) Show by example that $\bigcup_{i=1}^n \Omega_i$ need not be convex.
When $n = 1$, convex subsets are quite constrained. Recall we have proven:
Theorem 7.12. For a subset $\Omega \subset \mathbb{R}$, the following are equivalent:
(i) $\Omega$ is an interval.
(ii) $\Omega$ is convex.
3.2. Goals.
In freshman calculus one learns, when graphing a function $f$, to identify subintervals on which the graph of $f$ is "concave up" and intervals on which it is "concave
down." Indeed one learns that the former occurs when $f''(x) > 0$ and the latter
occurs when $f''(x) < 0$. But, really, what does this mean?
First, where freshman calculus textbooks say "concave up" the rest of the mathematical world says "convex"; and where freshman calculus textbooks say "concave
down" the rest of the mathematical world says "concave." Moreover, the rest of the
mathematical world doesn't speak explicitly of concave functions very much because it knows that $f$ is concave exactly when $-f$ is convex.
Second of all, really, what's going on here? Are we saying that our definition
of convexity is that $f'' > 0$? If so, exactly why do we care when $f'' > 0$ and when
$f'' < 0$: why not look at the third, fourth or seventeenth derivatives? The answer is
that we have not a formal definition but an intuitive conception of convexity, which
a good calculus text will at least try to nurture: for instance I was taught that a
function is convex (or rather "concave up") when its graph holds water and that
it is concave ("concave down") when its graph spills water. This is obviously not
a mathematical definition, but it may succeed in conveying some intuition. In less
poetic terms, the graph of a convex function has a certain characteristic shape that
the eye can see: it looks, in fact, qualitatively like an upward opening parabola or
some portion thereof. Similarly, the eye can spot concavity as regions where the
graph looks, qualitatively, like a piece of a downward opening parabola. And this
explains why one talks about convexity in freshman calculus: it is a qualitative,
visual feature of the graph of $f$ that you want to take into account. If you are
graphing $f$ and you draw something concave when the graph is actually convex,
the graph will look wrong and you are liable to draw false conclusions about the
behavior of the function.
So, at a minimum, our task in making good mathematical sense of this portion
of freshman calculus comes down to the following:
Step 1: Give a precise definition of convexity: no pitchers of water allowed!
Step 2: Use our definition to prove a theorem relating convexity of $f$ to the second
derivative $f''$, when $f''$ exists.
In fact this is an oversimplification of what we will actually do. When we try
to nail down a mathematical definition of a convex function, we succeed all too
well: there are five different definitions, each having some intuitive geometric appeal and each having its technical uses. But we want to be talking about one class
of functions, not five different classes, so we will need to show that all five of our
definitions are equivalent, i.e., that any function $f : I \to \mathbb{R}$ which satisfies any one
of these definitions in fact satisfies all five. This will take some time.
3.3. Epigraphs.
For a function $f : I \to \mathbb{R}$, we define its epigraph to be the set of points $(x, y) \in I \times \mathbb{R}$
which lie on or above the graph of the function. In fewer words,
$$\mathrm{Epi}(f) = \{(x, y) \in I \times \mathbb{R} \mid y \ge f(x)\}.$$
A function $f : I \to \mathbb{R}$ is convex if its epigraph $\mathrm{Epi}(f)$ is a convex subset of $\mathbb{R}^2$.
Example: Any linear function $f(x) = mx + b$ is convex.
Example: The function $f(x) = |x|$ is convex.
Example: Suppose $f(x) = ax^2 + bx + c$. Then $\mathrm{Epi}(f)$ is just the set of points
of $\mathbb{R}^2$ lying on or above a parabola. From this picture it is certainly intuitively
clear that $\mathrm{Epi}(f)$ is convex iff $a > 0$, i.e., iff the parabola is opening upward. But
proving from scratch that $\mathrm{Epi}(f)$ is a convex subset is not so much fun.
3.4. Secant-graph, three-secant and two-secant inequalities.
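A function $f : I \to \mathbb{R}$ satisfies the secant-graph inequality if for all $a < b \in I$
and all $\lambda \in [0, 1]$, we have
(29) $$f((1 - \lambda)a + \lambda b) \le (1 - \lambda)f(a) + \lambda f(b).$$
A function $f : I \to \mathbb{R}$ satisfies the three-secant inequality if for all $a < x < b$,
(30) $$\frac{f(x) - f(a)}{x - a} \le \frac{f(b) - f(a)}{b - a} \le \frac{f(b) - f(x)}{b - x}.$$
A function $f : I \to \mathbb{R}$ satisfies the two-secant inequality if for all $a < x < b$,
(31) $$\frac{f(x) - f(a)}{x - a} \le \frac{f(b) - f(a)}{b - a}.$$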
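The secant inequalities are easy to test numerically on sample functions. The following is a rough sketch, assuming the forms of the secant-graph and three-secant inequalities given in this section; the function names and the tolerance `tol` are illustrative choices, and a passing check on finitely many points is evidence, not a proof of convexity.

```python
def secant_graph_ok(f, a: float, b: float, steps: int = 50, tol: float = 1e-12) -> bool:
    """Test f((1 - t)a + tb) <= (1 - t)f(a) + t f(b) for sampled t in [0, 1]."""
    for k in range(steps + 1):
        t = k / steps
        if f((1 - t) * a + t * b) > (1 - t) * f(a) + t * f(b) + tol:
            return False
    return True


def three_secant_ok(f, a: float, x: float, b: float, tol: float = 1e-12) -> bool:
    """Test slope(a, x) <= slope(a, b) <= slope(x, b) for a < x < b."""
    slope_ax = (f(x) - f(a)) / (x - a)
    slope_ab = (f(b) - f(a)) / (b - a)
    slope_xb = (f(b) - f(x)) / (b - x)
    return slope_ax <= slope_ab + tol and slope_ab <= slope_xb + tol


def square(x: float) -> float:
    return x * x


def neg_square(x: float) -> float:
    return -x * x


print(secant_graph_ok(square, -1.0, 3.0), three_secant_ok(square, -1.0, 0.5, 3.0))
print(secant_graph_ok(neg_square, -1.0, 3.0), three_secant_ok(neg_square, -1.0, 0.5, 3.0))
print(secant_graph_ok(abs, -2.0, 1.0), three_secant_ok(abs, -2.0, 0.0, 1.0))
```

For $f(x) = x^2$ with $a = -1$, $x = 0.5$, $b = 3$ the three slopes are $-0.5$, $2$ and $3.5$, in increasing order as the three-secant inequality predicts; for $-x^2$ the order is reversed and both checks fail.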
Proof. We will show (i) $\implies$ (ii) $\iff$ (iii) $\implies$ (i) and (iii) $\iff$ (iv).
(i) $\implies$ (ii): This is immediate.
(ii) $\iff$ (iii): The two-secant inequality
$$\frac{f(x) - f(a)}{x - a} \le \frac{f(b) - f(a)}{b - a}$$
is equivalent to
$$f(x) \le f(a) + \left( \frac{f(b) - f(a)}{b - a} \right)(x - a) = L_{a,b}(x),$$
say. Now $L_{a,b}(x)$ is a linear function with $L_{a,b}(a) = f(a)$ and $L_{a,b}(b) = f(b)$, hence
it is the secant line between $(a, f(a))$ and $(b, f(b))$. Thus the two-secant inequality
is equivalent to the secant-graph inequality.
(iii) $\implies$ (i): As above, since the secant line $L_{a,b}(x)$ from $(a, f(a))$ to $(b, f(b))$ has
equation $y = f(a) + \left( \frac{f(b) - f(a)}{b - a} \right)(x - a)$, the secant-graph inequality implies
$$\frac{f(x) - f(a)}{x - a} \le \frac{f(b) - f(a)}{b - a}.$$
To get the other half of the three-secant inequality, note that we also have
$$L_{a,b}(x) = f(b) + \frac{f(a) - f(b)}{b - a}(b - x),$$
and the inequality $f(x) \le f(b) + \frac{f(a) - f(b)}{b - a}(b - x)$ is equivalent to
$$\frac{f(b) - f(a)}{b - a} \le \frac{f(b) - f(x)}{b - x}.$$
(iii) $\implies$ (iv): Let $P_1 = (x_1, y_1)$, $P_2 = (x_2, y_2) \in \mathrm{Epi}(f)$. We want to show $\mathrm{Epi}(f)$
contains the line segment joining $P_1$ and $P_2$. This is clear if $x_1 = x_2 = x$: in
this case the line segment is vertical, and since $y_1$ and $y_2$ are both greater than or
equal to $f(x)$, so is every point $y$ in between $y_1$ and $y_2$. So we may assume $x_1 \ne x_2$
and then that $x_1 < x_2$ (otherwise interchange $P_1$ and $P_2$). Seeking a contradiction,
we suppose there is $\lambda_1 \in (0, 1)$ such that $(1 - \lambda_1)P_1 + \lambda_1 P_2 \notin \mathrm{Epi}(f)$: that is,
$$(1 - \lambda_1)y_1 + \lambda_1 y_2 < f((1 - \lambda_1)x_1 + \lambda_1 x_2).$$
But since $f(x_1) \le y_1$ and $f(x_2) \le y_2$, we have
$$(1 - \lambda_1)f(x_1) + \lambda_1 f(x_2) \le (1 - \lambda_1)y_1 + \lambda_1 y_2$$
and thus
$$(1 - \lambda_1)f(x_1) + \lambda_1 f(x_2) < f((1 - \lambda_1)x_1 + \lambda_1 x_2),$$
violating the secant-graph inequality.
(iv) $\implies$ (iii): Let $x < y \in I$. Since $(x, f(x))$ and $(y, f(y))$ lie on the graph of $f$,
they are elements of the epigraph $\mathrm{Epi}(f)$. Since $\mathrm{Epi}(f)$ is convex the line segment
joining $(x, f(x))$ and $(y, f(y))$ lies inside $\mathrm{Epi}(f)$. But this line segment is nothing
else than the secant line between the two points, and to say that it lies inside the
epigraph is to say that the secant line always lies on or above the graph of $f$.
Corollary 7.14. (Generalized Two Secant Inequality) Let $f : I \to \mathbb{R}$ be a
convex function, and let $a, b, c, d \in I$ with $a < b \le c < d$. Then
(32) $$\frac{f(b) - f(a)}{b - a} \le \frac{f(d) - f(c)}{d - c}.$$
$$\frac{f(b) - f(a)}{b - a} \le \frac{f(c) - f(b)}{c - b},$$
and applying it to the points $b, c, d$ gives
$$\frac{f(c) - f(b)}{c - b} \le \frac{f(d) - f(c)}{d - c}.$$
Combining these two inequalities gives the desired result.
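$$\frac{f(v) - f(u)}{v - u} \le \frac{f(y) - f(x)}{y - x} \le \frac{f(z) - f(w)}{z - w}.$$
Thus
$$\frac{|f(x) - f(y)|}{|x - y|} \le \max\left( \left| \frac{f(v) - f(u)}{v - u} \right|, \left| \frac{f(z) - f(w)}{z - w} \right| \right),$$
so
$$L = \max\left( \left| \frac{f(v) - f(u)}{v - u} \right|, \left| \frac{f(z) - f(w)}{z - w} \right| \right)$$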
f (x) f (a)
,
xa
S(x) =
f (b) f (x)
.
bx
xb
f (x)f (a)
.
xa
Thus $g'(x) \ge 0$ for all $x \in (a, b]$, so indeed $g$ is increasing on $(a, b]$.
3Andrew Kane was a student in the 2011-2012 course who suggested this criterion upon being
prompted in class.
so $\ell_c$ is a supporting line and also that if $A < 0$, $f(x) - \ell_c(x) < 0$ for all $x \ne c$ so
$\ell_c$ is not a supporting line. Note that since $f''(x) = 2A$, $f$ is convex iff $A > 0$.
Example: Consider the function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = |x|$. Since $\mathrm{Epi}(f)$
is a convex subset of $\mathbb{R}^2$, $f$ is convex. For every $c > 0$, the line $y = x$ is a supporting line, and for every $c < 0$, the line $y = -x$ is a supporting line, and in both
cases these supporting lines are unique. For $c = 0$, $y = 0$ is a supporting line, but
it is not the only one: indeed $y = mx$ is a supporting line at $c = 0$ iff $-1 \le m \le 1$.
Note that the smallest slope of a supporting line is the left-hand derivative at zero:
$$f'_-(0) = \lim_{h \to 0^-} \frac{f(0 + h) - f(0)}{h} = \lim_{h \to 0^-} \frac{-h}{h} = -1,$$
and the largest slope of a supporting line is the right-hand derivative at zero:
$$f'_+(0) = \lim_{h \to 0^+} \frac{f(0 + h) - f(0)}{h} = \lim_{h \to 0^+} \frac{h}{h} = 1.$$
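Lemma 7.22. Convex functions are closed under suprema. More precisely, if
$\{f_i : I \to \mathbb{R}\}_{i \in I}$ is a family of convex functions, $f : I \to \mathbb{R}$ is a function such that
for all $x \in I$, $f(x) = \sup_{i \in I} f_i(x)$, then $f$ is convex.
Proof. Let $a < b \in I$ and $\lambda \in (0, 1)$. Then
$$f((1 - \lambda)a + \lambda b) = \sup_{i \in I} f_i((1 - \lambda)a + \lambda b) \le \sup_{i \in I} \left( (1 - \lambda)f_i(a) + \lambda f_i(b) \right)$$
$$\le (1 - \lambda) \sup_{i \in I} f_i(a) + \lambda \sup_{i \in I} f_i(b) = (1 - \lambda)f(a) + \lambda f(b).$$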
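Theorem 7.23. Let $I$ be an open interval. For a function $f : I \to \mathbb{R}$, TFAE:
(i) $f$ is convex.
(ii) $f$ admits a supporting line at each $c \in I$.
Proof. (i) $\implies$ (ii): Neither property (i) or (ii) is disturbed by translating
the coordinate axes, so we may assume that $c = 0$ and $f(0) = 0$. Let $\alpha \in I \setminus \{0\}$.
For all $\alpha_1, \alpha_2 > 0$ such that $\alpha_1, -\alpha_2 \in I$, by the secant-graph inequality we have
$$0 = (\alpha_1 + \alpha_2) f\left( \frac{\alpha_1}{\alpha_1 + \alpha_2}(-\alpha_2) + \frac{\alpha_2}{\alpha_1 + \alpha_2}(\alpha_1) \right) \le \alpha_1 f(-\alpha_2) + \alpha_2 f(\alpha_1),$$
or
$$\frac{-f(-\alpha_2)}{\alpha_2} \le \frac{f(\alpha_1)}{\alpha_1}.$$
It follows that $\sup_{\alpha_2} \frac{-f(-\alpha_2)}{\alpha_2} \le \inf_{\alpha_1} \frac{f(\alpha_1)}{\alpha_1}$, so there is $m \in \mathbb{R}$ with
$$\frac{-f(-\alpha_2)}{\alpha_2} \le m \le \frac{f(\alpha_1)}{\alpha_1}.$$
Equivalently, $f(t) \ge mt$ for all $t \in \mathbb{R}$ such that $t \in I$. Thus $\ell(x) = mx$ is a
supporting line for $f$ at $c = 0$.
(ii) $\implies$ (i): For each $c \in I$, let $\ell_c : I \to \mathbb{R}$ be a supporting line for $f$ at $c$. Since
for all $x \in I$, $f(x) \ge \ell_c(x)$ for all $c$ and $f(c) = \ell_c(c)$, we have $f(x) = \sup_{c \in I} \ell_c(x)$.
Since the linear functions $\ell_c$ are certainly convex, $f$ is the supremum of a family of
convex functions, hence convex by Lemma 7.22.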
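Before stating the next result, we recall the notion of one-sided differentiability:
if $f$ is a function defined (at least) on some interval $[c - \delta, c]$, we say $f$ is left
differentiable at $c$ if $\lim_{x \to c^-} \frac{f(x) - f(c)}{x - c}$ exists, and if so, we denote this limit by
$f'_-(c)$, the left derivative of $f$ at $c$. Similarly, if $f$ is defined (at least) on some interval
$[c, c + \delta]$, we say $f$ is right differentiable at $c$ if $\lim_{x \to c^+} \frac{f(x) - f(c)}{x - c}$ exists,
and if so, we denote this limit by $f'_+(c)$, the right derivative of $f$ at $c$. (As usual
c) $f'_-, f'_+ : I \to \mathbb{R}$ are both weakly increasing functions.
d) A line passing through $(c, f(c))$ is a supporting line for $f$ iff its slope $m$ satisfies
$f'_-(c) \le m \le f'_+(c)$.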
$$\varphi(x) = \frac{f(x) - f(c)}{x - c}.$$
Further, put
$$A = \varphi((a, c)), \quad B = \varphi((c, b)).$$
From the three-secant inequality we immediately deduce all of the following: $\varphi$ is
weakly increasing on $(a, c)$, $\varphi$ is weakly increasing on $(c, b)$, and $A \le B$. Thus
$$f'_-(c) = \lim_{x \to c^-} \varphi(x) = \sup A \le \inf B = \lim_{x \to c^+} \varphi(x) = f'_+(c).$$
c) Let $x_1, x_2 \in (a, b)$ with $x_1 < x_2$, and choose $v$ with $x_1 < v < x_2$. Then by part
b) and the three-secant inequality,
$$f'_-(x_1) \le f'_+(x_1) \le \frac{f(v) - f(x_1)}{v - x_1} \le \frac{f(v) - f(x_2)}{v - x_2} \le f'_-(x_2) \le f'_+(x_2).$$
d) The proof of parts a) and b) shows that for $x \in I$,
$$f(x) \ge f(c) + f'_-(c)(x - c), \text{ if } x \le c,$$
$$f(x) \ge f(c) + f'_+(c)(x - c), \text{ if } x \ge c.$$
Thus if $f'_-(c) \le m \le f'_+(c)$, then
$$f(x) \ge f(c) + f'_-(c)(x - c) \ge f(c) + m(x - c), \text{ if } x \le c,$$
$$f(x) \ge f(c) + f'_+(c)(x - c) \ge f(c) + m(x - c), \text{ if } x \ge c,$$
so $\ell(x) = f(c) + m(x - c)$ is a supporting line for $f$ at $c$. That these are the only
possible slopes of supporting lines for $f$ at $c$ is left as an exercise for the reader.
(i) $f'_-$ is continuous at $c$.
(ii) $f'_+$ is continuous at $c$.
(iii) $f$ is differentiable at $c$, and the tangent line is a supporting line at $c$.
(iv) $f$ has a unique supporting line at $c$.
Remark: Because a weakly increasing function can only have jump discontinuities, it can be shown that such functions are continuous at "most" points of their
domain. For those who are familiar with the notions of countable and uncountable
sets, we may be more precise: the set of discontinuities of a monotone function
must be countable. Since the union of two countable sets is countable, it follows
that a convex function is differentiable except possibly at a countable set of points.
Remark: One can go further: a convex function is twice differentiable at "most"
points of its domain. The sense of "most" here is different (and weaker): the set
of points at which $f$ fails to be twice differentiable has measure zero in the sense
of Lebesgue. This result, which is itself due to Lebesgue, lies considerably deeper
than the one of the previous remark.
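3.9. Jensen's Inequality.
Theorem 7.25. (Jensen's Inequality) Let $f : I \to \mathbb{R}$ be continuous and convex.
For any $x_1, \ldots, x_n \in I$ and any $\lambda_1, \ldots, \lambda_n \in [0, 1]$ with $\lambda_1 + \ldots + \lambda_n = 1$, we have
$$f(\lambda_1 x_1 + \ldots + \lambda_n x_n) \le \lambda_1 f(x_1) + \ldots + \lambda_n f(x_n).$$
Proof. We go by induction on $n$, the base case $n = 1$ being trivial. So
suppose Jensen's Inequality holds for some $n \in \mathbb{Z}^+$, and consider $x_1, \ldots, x_{n+1} \in I$
and $\lambda_1, \ldots, \lambda_{n+1} \in [0, 1]$ with $\lambda_1 + \ldots + \lambda_{n+1} = 1$. If $\lambda_{n+1} = 0$ we are reduced
to the case of $n$ variables which holds by induction. Similarly if $\lambda_{n+1} = 1$ then
$\lambda_1 = \ldots = \lambda_n = 0$ and we have, trivially, equality. So we may assume $\lambda_{n+1} \in (0, 1)$
and thus also that $1 - \lambda_{n+1} \in (0, 1)$. Now for the big trick: we write
$$\lambda_1 x_1 + \ldots + \lambda_{n+1} x_{n+1} = (1 - \lambda_{n+1}) \left( \frac{\lambda_1}{1 - \lambda_{n+1}} x_1 + \ldots + \frac{\lambda_n}{1 - \lambda_{n+1}} x_n \right) + \lambda_{n+1} x_{n+1},$$
so that
$$f(\lambda_1 x_1 + \ldots + \lambda_{n+1} x_{n+1}) = f\left( (1 - \lambda_{n+1}) \left( \frac{\lambda_1}{1 - \lambda_{n+1}} x_1 + \ldots + \frac{\lambda_n}{1 - \lambda_{n+1}} x_n \right) + \lambda_{n+1} x_{n+1} \right)$$
$$\le (1 - \lambda_{n+1}) f\left( \frac{\lambda_1}{1 - \lambda_{n+1}} x_1 + \ldots + \frac{\lambda_n}{1 - \lambda_{n+1}} x_n \right) + \lambda_{n+1} f(x_{n+1}).$$
Since $\frac{\lambda_1}{1 - \lambda_{n+1}}, \ldots, \frac{\lambda_n}{1 - \lambda_{n+1}}$ are non-negative numbers that sum to 1, by induction
the $n$ variable case of Jensen's Inequality can be applied to give that the above
expression is less than or equal to
$$(1 - \lambda_{n+1}) \left( \frac{\lambda_1}{1 - \lambda_{n+1}} f(x_1) + \ldots + \frac{\lambda_n}{1 - \lambda_{n+1}} f(x_n) \right) + \lambda_{n+1} f(x_{n+1})$$
$$= \lambda_1 f(x_1) + \ldots + \lambda_{n+1} f(x_{n+1}).$$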
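Jensen's Inequality can be spot-checked numerically. The sketch below is only an illustration: the helper name `jensen_gap` and the sample points and weights are arbitrary choices, not taken from the text. It computes the difference between the two sides, which should be non-negative for convex $f$, zero for linear $f$, and can be negative when $f$ is concave.

```python
import math


def jensen_gap(f, xs, weights) -> float:
    """Return sum(w_i * f(x_i)) - f(sum(w_i * x_i)) for weights w_i >= 0 summing to 1."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1) > 1e-12:
        raise ValueError("weights must be non-negative and sum to 1")
    mean = sum(w * x for w, x in zip(weights, xs))
    return sum(w * f(x) for w, x in zip(weights, xs)) - f(mean)


def affine(x: float) -> float:
    return 3 * x + 1


print(jensen_gap(math.exp, [0.0, 1.0, 2.0], [0.2, 0.3, 0.5]))     # convex: gap >= 0
print(jensen_gap(affine, [0.0, 1.0, 2.0], [0.2, 0.3, 0.5]))       # linear: gap is 0 up to rounding
print(jensen_gap(math.log, [1.0, 2.0, 4.0], [0.25, 0.25, 0.5]))   # concave: gap <= 0
```

The concave case is consistent with the earlier observation that $f$ is concave exactly when $-f$ is convex: applying Jensen's Inequality to $-f$ reverses the inequality for $f$.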
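(33)
Taking $\lambda_1 = \ldots = \lambda_n = \frac{1}{n}$,
$$x_1^{\lambda_1} \cdots x_n^{\lambda_n} = e^{\log(x_1^{\lambda_1} \cdots x_n^{\lambda_n})}$$
(Young's Inequality) Let $x, y \in [0, \infty)$ and let $p, q \in (1, \infty)$ satisfy $\frac{1}{p} + \frac{1}{q} = 1$. Then
$$xy \le \frac{x^p}{p} + \frac{y^q}{q}.$$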
Proof. When either $x = 0$ or $y = 0$ the left hand side is zero and the right hand
side is non-negative, so the inequality holds and we may thus assume $x, y > 0$. Now
apply the Weighted Arithmetic-Geometric Mean Inequality with $n = 2$, $x_1 = x^p$,
$x_2 = y^q$, $\lambda_1 = \frac{1}{p}$, $\lambda_2 = \frac{1}{q}$. We get
(34) $$xy = (x^p)^{\frac{1}{p}} (y^q)^{\frac{1}{q}} = x_1^{\lambda_1} x_2^{\lambda_2} \le \lambda_1 x_1 + \lambda_2 x_2 = \frac{x^p}{p} + \frac{y^q}{q}.$$
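The inequality $xy \le \frac{x^p}{p} + \frac{y^q}{q}$ just proved can be checked on a few numbers. In this small sketch the helper name `young_gap` is an illustrative choice; the conjugate exponent is computed as $q = \frac{p}{p-1}$, which is the solution of $\frac{1}{p} + \frac{1}{q} = 1$ for $p > 1$.

```python
def young_gap(x: float, y: float, p: float) -> float:
    """Return x^p/p + y^q/q - x*y, where q = p/(p - 1) is the conjugate exponent of p > 1."""
    if p <= 1 or x < 0 or y < 0:
        raise ValueError("need p > 1 and x, y >= 0")
    q = p / (p - 1)  # solves 1/p + 1/q = 1
    return x ** p / p + y ** q / q - x * y


print(young_gap(2.0, 3.0, 2.0))  # p = q = 2: 2 + 4.5 - 6 = 0.5
print(young_gap(3.0, 3.0, 2.0))  # x^p = y^q, so the two sides agree and the gap is 0
print(min(young_gap(x / 2, y / 2, p)
          for p in (1.5, 3.0, 10.0) for x in range(11) for y in range(11)))
```

In the second call $x_1 = x^p$ and $x_2 = y^q$ coincide, which is the situation in which the weighted arithmetic and geometric means of two equal numbers agree.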
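Theorem 7.28. (Hölder's Inequality)
Let $x_1, \ldots, x_n, y_1, \ldots, y_n \in \mathbb{R}$ and let $p, q \in (1, \infty)$ satisfy $\frac{1}{p} + \frac{1}{q} = 1$. Then
(35) $$|x_1 y_1| + \ldots + |x_n y_n| \le (|x_1|^p + \ldots + |x_n|^p)^{\frac{1}{p}} (|y_1|^q + \ldots + |y_n|^q)^{\frac{1}{q}}.$$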
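$$|x_1||x_1 + y_1|^{p-1} + \ldots + |x_n||x_n + y_n|^{p-1} + |y_1||x_1 + y_1|^{p-1} + \ldots + |y_n||x_n + y_n|^{p-1}$$
$$\le (|x_1|^p + \ldots + |x_n|^p)^{\frac{1}{p}} (|x_1 + y_1|^p + \ldots + |x_n + y_n|^p)^{\frac{1}{q}} + (|y_1|^p + \ldots + |y_n|^p)^{\frac{1}{p}} (|x_1 + y_1|^p + \ldots + |x_n + y_n|^p)^{\frac{1}{q}}$$
$$= \left( (|x_1|^p + \ldots + |x_n|^p)^{\frac{1}{p}} + (|y_1|^p + \ldots + |y_n|^p)^{\frac{1}{p}} \right) (|x_1 + y_1|^p + \ldots + |x_n + y_n|^p)^{\frac{1}{q}}.$$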
CHAPTER 8
Integration
1. The Fundamental Theorem of Calculus
Having finished with continuity and differentiation, we turn to the third main
theme of calculus: integration. The basic idea is this: for a function $f : [a, b] \to \mathbb{R}$,
we wish to associate a number $\int_a^b f$, the definite integral. When $f$ is non-negative,
our intuition is that $\int_a^b f$ should represent the area under the curve $y = f(x)$, or
more precisely the area of the region bounded above by $y = f(x)$, bounded below
by $y = 0$, bounded on the left by $x = a$ and bounded on the right by $x = b$.
Unfortunately this is not yet a formal denition, because we do not have a
formal denition of the area of a subset of the plane! In high school geometry
one learns only about areas of very simple gures: polygons, circles and so forth.
Dealing head-on with the task of assigning an area to every subset of R2 is quite
dicult: it is one of the important topics of graduate level real analysis: measure
theory.
b
So we need to back up a bit and give a denition of a f . As you probably know,
b
the general idea is to construe a f as the result of some kind of limiting process,
wherein we divide [a, b] into subintervals and take the sum of the areas of certain rectangles which approximate the function f at various points of the interval
(Riemann sums). As usual in freshman calculus, reasonably careful denitions
appear in the textbook somewhere, but with so little context and development that
(almost) no actual freshman calculus student can really appreciate them.
But wait! Before plunging into the details of this limiting process, let's take a
more axiomatic approach: given that we want $\int_a^b f$ to represent the area under
$y = f(x)$, what properties should it satisfy? Here are some reasonable ones.
(I1) If $f = C$ is a constant function, then $\int_a^b C = C(b - a)$.
(I2) If $f_1(x) \leq f_2(x)$ for all $x \in [a, b]$, then $\int_a^b f_1 \leq \int_a^b f_2$.
(I3) If $a \leq c \leq b$, then $\int_a^b f = \int_a^c f + \int_c^b f$.
Exercise 1.1: Show (I1) implies: for any $f : [a, b] \to \mathbb{R}$ and any $c \in [a, b]$, $\int_c^c f = 0$.
It turns out that these three axioms already imply many of the other properties we
want an integral to have. Even more, there is essentially only one way to define
$\int_a^b f$ so as to satisfy (I1) through (I3).
Well, almost. One feature that we haven't explicitly addressed yet is this: for
which functions $f : [a, b] \to \mathbb{R}$ do we expect $\int_a^b f$ to be defined? For all functions??
A little thought shows this not to be plausible: there are some functions so pathological that there is no reason to believe that the area under the curve $y = f(x)$
has any meaning whatsoever, and there are some functions for which this area concept seems meaningful but for which the area is infinite.
So it turns out to be useful to think of integration itself as a real-valued function,
with domain some set of functions $\{f : [a, b] \to \mathbb{R}\}$. That is, for each $a \leq b$ we
wish to have a set, say $\mathcal{R}[a, b]$, of integrable functions $f : [a, b] \to \mathbb{R}$, and to each
$f \in \mathcal{R}[a, b]$ we wish to associate a real number $\int_a^b f$. As to exactly what this set
$\mathcal{R}[a, b]$ of integrable functions should be, it turns out that we have some leeway, but
to get a theory which is useful and not too complicated, let's assume the following:
(I0) For all real numbers $a < b$:
a) Every continuous $f : [a, b] \to \mathbb{R}$ lies in $\mathcal{R}[a, b]$.
b) Every function $f \in \mathcal{R}[a, b]$ is bounded.
By the Extreme Value Theorem, every continuous function $f : [a, b] \to \mathbb{R}$ is
bounded. Thus the class $C[a, b]$ of all continuous functions $f : [a, b] \to \mathbb{R}$ is contained in the class $B[a, b]$ of all bounded functions $f : [a, b] \to \mathbb{R}$, and axiom (I0) requires
that the set of integrable functions lies somewhere in between:
$$C[a, b] \subseteq \mathcal{R}[a, b] \subseteq B[a, b].$$
Let's recast the other three axioms in terms of our set $\mathcal{R}[a, b]$ of integrable functions:
(I1) If $f = C$ is constant, then $f \in \mathcal{R}[a, b]$ and $\int_a^b C = C(b - a)$.
(I2) If for $f_1, f_2 \in \mathcal{R}[a, b]$ we have $f_1(x) \leq f_2(x)$ for all $x \in [a, b]$, then $\int_a^b f_1 \leq \int_a^b f_2$.
(I3) Let $f : [a, b] \to \mathbb{R}$, and let $c \in (a, b)$. Then $f \in \mathcal{R}[a, b]$ iff $f \in \mathcal{R}[a, c]$ and
$f \in \mathcal{R}[c, b]$. If these equivalent conditions hold, then $\int_a^b f = \int_a^c f + \int_c^b f$.
If this business of "integrable functions" seems abstruse, then on the first pass just
imagine that $\mathcal{R}[a, b]$ is precisely the set of all continuous functions $f : [a, b] \to \mathbb{R}$.
Now we have the following extremely important result.
Theorem 8.1. (Fundamental Theorem of Calculus) Let $f \in \mathcal{R}[a, b]$ be any
integrable function. For $x \in [a, b]$, define $\mathcal{F}(x) = \int_a^x f$. Then:
a) The function $\mathcal{F} : [a, b] \to \mathbb{R}$ is continuous at every $c \in [a, b]$.
b) If $f$ is continuous at $c \in [a, b]$, then $\mathcal{F}$ is differentiable at $c$, and $\mathcal{F}'(c) = f(c)$.
c) If $f$ is continuous and $F$ is any antiderivative of $f$, i.e., a function $F : [a, b] \to \mathbb{R}$
such that $F'(x) = f(x)$ for all $x \in [a, b]$, then $\int_a^b f = F(b) - F(a)$.
Proof. By (I0), there exists $M \in \mathbb{R}$ such that $|f(x)| \leq M$ for all $x \in [a, b]$. If
$M = 0$ then $f$ is the constant function 0, and then it follows from (I1) that $\mathcal{F}$ is
also the constant function zero, and one sees easily that the theorem holds in this
case.
So we may assume $M > 0$. For all $\epsilon > 0$, we may take $\delta = \frac{\epsilon}{M}$. Indeed, by (I3)
(37) $\mathcal{F}(x) - \mathcal{F}(c) = \int_a^x f - \int_a^c f = \int_c^x f$.
and thus
$\left| \int_A^B f \right| \leq M(B - A)$.
(38)
M.
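$$f(c) - \epsilon = \frac{\int_c^x (f(c) - \epsilon)}{x - c} \leq \frac{\int_c^x f}{x - c} \leq \frac{\int_c^x (f(c) + \epsilon)}{x - c} = f(c) + \epsilon,$$
and thus
$$\left| \frac{\int_c^x f}{x - c} - f(c) \right| \leq \epsilon.$$
This shows that $\mathcal{F}'(c)$ exists and is equal to $f(c)$.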
c) By part b), if $f$ is continuous, $\mathcal{F}(x) = \int_a^x f$ is an antiderivative of $f$. But we have
shown that if antiderivatives exist at all they are unique up to an additive constant.
We have just found an antiderivative $\mathcal{F}$, so if $F$ is any other antiderivative of $f$ we
must have $F(x) = \mathcal{F}(x) + C$ for some constant $C$, and then
$$F(b) - F(a) = (\mathcal{F}(b) + C) - (\mathcal{F}(a) + C) = \mathcal{F}(b) - \mathcal{F}(a) = \int_a^b f - \int_a^a f = \int_a^b f.$$
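Parts b) and c) of the theorem lend themselves to a quick numerical illustration. The sketch below is not part of the axiomatic development: it assumes a midpoint-rule approximation stands in for $\int_a^x f$, and the names `integral` and `difference_quotient`, the step sizes, and the choice $f = \cos$ are all illustrative.

```python
import math

def integral(f, a, x, n=20000):
    """Midpoint-rule approximation of the integral of f over [a, x]."""
    h = (x - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def difference_quotient(f, a, c, h=1e-3):
    """Symmetric difference quotient at c of F(x) = integral of f from a to x."""
    return (integral(f, a, c + h) - integral(f, a, c - h)) / (2 * h)

f = math.cos
# Part b): the derivative of F at c = 1 should be close to f(1).
derivative_error = abs(difference_quotient(f, 0.0, 1.0) - f(1.0))
# Part c): with the antiderivative sin, the integral over [0, 1] is sin(1) - sin(0).
evaluation_error = abs(integral(f, 0.0, 1.0) - (math.sin(1.0) - math.sin(0.0)))
```

Both errors come out small, but only because the approximation scheme already presupposes that the integral exists, which is exactly the existence question taken up in the next section.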
Remark: Although we introduced the integral "axiomatically", as long as we are
only trying to integrate continuous functions we had no choice: the only way to
assign a value $\int_a^b f$ to each continuous function $f : [a, b] \to \mathbb{R}$ satisfying the (reasonable!) axioms (I1) through (I3) is to take $\int_a^b f = F(b)$, where $F$ is an antiderivative of $f$
with $F(a) = 0$, and again, there is at most one such function.
These same considerations answer the conundrum of why the celebrated Theorem 8.1 has such a short and simple proof.$^1$ The theorem assumes that we already
$^1$ This is not just florid language. I taught second semester calculus four times as a graduate
student and really did become puzzled at how easy it was to prove the Fundamental Theorem
of Calculus so soon after integration is discussed. I worked out the answer while teaching an
undergraduate real analysis course at McGill University in 2005. The current presentation is an
adaptation of my lecture notes from this older course. Soon after I gave my 2005 lectures I found
that a very similar axiomatic treatment of the integral was given by the eminent mathematician
have an integral, i.e., an assignment $(f : [a, b] \to \mathbb{R}) \mapsto \int_a^b f$ for every continuous
function $f$. We have shown that there is at most one such integral on the continuous functions, but we have not yet constructed this integral! In other words,
we have settled the problem of uniqueness of the definite integral but (thus far)
assumed a solution to the much harder problem of existence of the definite integral.
And again, this existence problem is equivalent to an existence problem that we
mentioned before, namely that every continuous function has an antiderivative.
Thus: if we could prove by some other means that every continuous function $f$
is the derivative of some other function $F$, then by the above we may simply define
$\int_a^b f = F(b) - F(a)$. This is the approach that Newton himself took, although
he didn't prove that every continuous function was a derivative but rather merely
assumed it. It is also what freshman calculus students seem to think is taught in
freshman calculus, namely that the definition of $\int_a^b f$ is $F(b) - F(a)$.$^2$
But I do not know any way to prove that an arbitrary continuous function has
an antiderivative except to give a constructive definition of $\int_a^b f$ as a limit of sums
and then appeal to Theorem 8.1b) to get that $\int_a^x f$ is an antiderivative of $f$.
Thus Theorem 8.1 is "easy" because it diverts the hard work elsewhere: we need to
give a constructive definition of the definite integral via a (new) kind of limiting process and then show "from scratch" that applied to every continuous $f : [a, b] \to \mathbb{R}$
this limiting process converges and results in a well-defined number $\int_a^b f$.
2. Building the Definite Integral
2.1. Upper and Lower Sums.
Now we begin the proof of the hard fact lurking underneath the Fundamental Theorem of Calculus: that we may define for every continuous function $f : [a, b] \to \mathbb{R}$
a number $\int_a^b f$ so as to satisfy (I1) through (I3) above. For now, we will make
a simplifying assumption on our class of integrable functions: namely, let us only
consider functions $f : [a, b] \to \mathbb{R}$ such that for every closed subinterval $[c, d] \subseteq [a, b]$,
$f : [c, d] \to \mathbb{R}$ has a maximum and minimum value. Of course this holds for all
continuous functions, so it will be a good start.
The basic idea is familiar from freshman calculus: we wish to subdivide our interval $[a, b]$ into a bunch of closed subintervals meeting only at the endpoints, and
then we want to consider the lower sum and upper sum associated to $f$ on each
subinterval. Then the lower sum should be less than or equal to the "true area
under the curve" which should be less than or equal to the upper sum, and by
dividing $[a, b]$ into more and smaller subintervals we should get better and better
approximations to the "true area under the curve", so we should define $\int_a^b f$ via
some limiting process involving lower sums and upper sums.
Okay, let's do it!
Serge Lang in [L]. So the presentation that I give here is not being given by me for the first time
and was not originated by me...but nevertheless the material is rarely presented this way.
$^2$ This is not what the books actually say, but what they actually say they don't say loudly
enough in order for the point to really stick.
Step 1: We need the notion of a partition of an interval $[a, b]$: we choose finitely
many sample points in $[a, b]$ and use them to divide $[a, b]$ into subintervals. Formally, a partition $P$ is given by a positive integer $n$ and real numbers
$$a = x_0 < x_1 < \ldots < x_{n-1} < x_n = b.$$
That is, we require the first sample point $x_0$ to be the left endpoint of the interval,
the last sample point $x_n$ to be the right endpoint of the interval, and the other
(distinct) points are absolutely arbitrary but written in increasing order.
Let $f : [a, b] \to \mathbb{R}$ be any function admitting a minimum and maximum value on
every closed subinterval of $[a, b]$ (e.g. any continuous function!). For $0 \leq i \leq n - 1$,
let $m_i(f)$ denote the minimum value of $f$ on the subinterval $[x_i, x_{i+1}]$ and let $M_i(f)$
denote the maximum value of $f$ on the subinterval $[x_i, x_{i+1}]$. Then we define the
lower sum associated to $f : [a, b] \to \mathbb{R}$ and the partition $P = \{x_0, \ldots, x_n\}$ as
$$L(f, P) = \sum_{i=0}^{n-1} m_i(f)(x_{i+1} - x_i)$$
and the upper sum associated to $f$ and $P$ as
$$U(f, P) = \sum_{i=0}^{n-1} M_i(f)(x_{i+1} - x_i).$$
These sums have a simple and important geometric interpretation: for any $0 \leq i \leq n - 1$,
the quantity $x_{i+1} - x_i$ is simply the length of the subinterval $[x_i, x_{i+1}]$. So
consider the constant function $m_i(f)$ on the interval $[x_i, x_{i+1}]$: by definition of $m_i$,
this is the largest constant function whose graph lies on or below the graph of $f$
at every point of $[x_i, x_{i+1}]$. Therefore the quantity $m_i(f)(x_{i+1} - x_i)$ is simply the
area of the rectangle with height $m_i(f)$ and width $x_{i+1} - x_i$, or equivalently the
area under the constant function $y = m_i(f)$ on $[x_i, x_{i+1}]$.
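To make the definitions of $L(f, P)$ and $U(f, P)$ concrete, here is a minimal sketch in Python. It rests on a simplifying assumption that is not in the text: $f$ is taken to be monotone on each subinterval, so that the minimum and maximum are attained at the endpoints. The function name `lower_upper_sums` is hypothetical, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction

def lower_upper_sums(f, partition):
    """Return (L(f, P), U(f, P)) for f monotone on each subinterval of P.

    Assumption: on each [x_i, x_{i+1}] the minimum and maximum of f are
    attained at the endpoints, which holds when f is monotone there.
    """
    lower = upper = 0
    for left, right in zip(partition, partition[1:]):
        m_i = min(f(left), f(right))   # minimum of f on [left, right]
        M_i = max(f(left), f(right))   # maximum of f on [left, right]
        lower += m_i * (right - left)  # area of the inscribed rectangle
        upper += M_i * (right - left)  # area of the circumscribed rectangle
    return lower, upper

P = [Fraction(0), Fraction(1, 2), Fraction(1)]
L, U = lower_upper_sums(lambda x: x * x, P)   # f(x) = x^2 on [0, 1]
```

For this two-piece partition the lower sum is $0 \cdot \frac{1}{2} + \frac{1}{4} \cdot \frac{1}{2} = \frac{1}{8}$ and the upper sum is $\frac{1}{4} \cdot \frac{1}{2} + 1 \cdot \frac{1}{2} = \frac{5}{8}$.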
We say the function $f : [a, b] \to \mathbb{R}$ is integrable if there is a unique $I \in \mathbb{R}$
such that for every partition $P$ of $[a, b]$ we have
$$L(f, P) \leq I \leq U(f, P).$$
This definition, although correct, is not ideally formulated: it underplays the most
important part, the uniqueness of $I$, while making it annoying to show the existence of $I$. (It turns out that there is always at least one $I$ lying between every lower
sum and every upper sum, but this is as yet far from clear.) Here are some examples.
Example 2.1: If $f(x) \equiv C$ is a constant function, then for every partition $P$ of
$[a, b]$ we have $L(f, P) = U(f, P) = C(b - a)$. Thus the unique $I$ in question is
clearly $C(b - a)$: constant functions are integrable.
Example 2.2: Suppose $f(x)$ is constantly equal to 1 on the interval $[a, b]$ except
for one interior point $c$, at which $f(c) = 0$. We claim that despite having a discontinuity at $c$, $f$ is integrable, with $\int_a^b f = b - a$. To see this, first observe that
for any partition $P$ of $[a, b]$ we have $U(f, P) = b - a$. Indeed this is because on
every subinterval of $[a, b]$, $f$ has 1 as its maximum value. On the other hand, for
any sufficiently small $\epsilon > 0$, we may choose a partition in which $c$ occurs in exactly
one subinterval (i.e., $c$ is not one of the points of the partition), and in which that subinterval has length $\epsilon(b - a)$. Then the minimum
of $f$ on that subinterval is 0, whereas on every other subinterval the minimum is
again 1, so $L(f, P) = (b - a)(1 - \epsilon)$. This shows that the unique number between
every $L(f, P)$ and $U(f, P)$ is $b - a$, so $\int_a^b f = b - a$.
Exercise 2.3: Show that starting with the constant function $C$ on $[a, b]$ and changing
its value at finitely many points yields an integrable function $f$ with $\int_a^b f = C(b - a)$.
The previous examples have the property that the upper sums $U(f, P)$ are constant. When this happens, one can show $f$ is integrable by finding a sequence of
partitions for which the lower sums approach this common value $U(f, P)$, which
must then be the integral. But constancy of upper sums only occurs in trivial examples. For instance, suppose we want to show that $f(x) = x$ is integrable on $[0, 1]$.
If we partition $[0, 1]$ into $n$ equally spaced subintervals (let us call this partition
$P_n$), then since $f$ is increasing its minimum on each subinterval occurs at the left
endpoint and its maximum on each subinterval occurs at the right endpoint. Thus
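$$L(f, P_n) = \sum_{i=0}^{n-1} \left( \frac{i}{n} \right) \frac{1}{n} = \frac{1}{n^2} \sum_{i=0}^{n-1} i = \frac{1}{n^2} \cdot \frac{(n - 1)n}{2} = \frac{1}{2} - \frac{1}{2n}$$
and
$$U(f, P_n) = \sum_{i=0}^{n-1} \left( \frac{i + 1}{n} \right) \frac{1}{n} = \frac{1}{n^2} \sum_{i=1}^{n} i = \frac{1}{n^2} \cdot \frac{n(n + 1)}{2} = \frac{1}{2} + \frac{1}{2n}.$$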
Since $\lim_{x \to \infty} \frac{x - 1}{2x} = \lim_{x \to \infty} \frac{x + 1}{2x} = \frac{1}{2}$, the upper and lower sums can both be
made arbitrarily close to $\frac{1}{2}$ by taking $n$ to be sufficiently large. Thus if $f(x) = x$
is integrable on $[0, 1]$, its integral must be $\frac{1}{2}$. Unfortunately we have not yet shown
that $f$ is integrable according to our definition: to do this we would have to consider
an arbitrary partition $P$ of $[0, 1]$ and show that $L(f, P) \leq \frac{1}{2} \leq U(f, P)$. For this
very simple function $f(x) = x$ it is possible to grind this out directly, but it's quite
a bit of work. And that's just to find the area of a right triangle!
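The closed forms for the sums of $f(x) = x$ over the equally spaced partition $P_n$ can be confirmed with exact arithmetic. This is a small sketch only; the function name `sums_for_identity` and the sample values of $n$ are illustrative choices.

```python
from fractions import Fraction

def sums_for_identity(n):
    """Lower and upper sums of f(x) = x on [0, 1] for n equal subintervals."""
    width = Fraction(1, n)
    # f is increasing: minimum at the left endpoint i/n, maximum at the right endpoint (i+1)/n.
    lower = sum(Fraction(i, n) * width for i in range(n))
    upper = sum(Fraction(i + 1, n) * width for i in range(n))
    return lower, upper

results = {n: sums_for_identity(n) for n in (1, 2, 10, 1000)}
gaps = {n: upper - lower for n, (lower, upper) in results.items()}  # U - L for each n
```

The gap $U(f, P_n) - L(f, P_n)$ equals $\frac{1}{n}$, which is why both sums are squeezed toward $\frac{1}{2}$ as $n$ grows.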
$$L(f, P) = \sum_{i=0}^{n-1} m_i(f)(x_{i+1} - x_i) \in [-\infty, \infty),$$
$$U(f, P) = \sum_{i=0}^{n-1} M_i(f)(x_{i+1} - x_i) \in (-\infty, \infty].$$
Observe though that the lower sum could take the value $-\infty$ and the upper sum
could take the value $\infty$. The following result clarifies when this is the case.
Proposition 8.2. Let $f : [a, b] \to \mathbb{R}$ be any function.
a) The following are equivalent:
(i) For all partitions $P$ of $[a, b]$, $L(f, P) = -\infty$.
(ii) There exists a partition $P$ of $[a, b]$ such that $L(f, P) = -\infty$.
(iii) $f$ is not bounded below on $[a, b]$.
b) The following are equivalent:
(i) For all partitions $P$ of $[a, b]$, $U(f, P) = \infty$.
(ii) There exists a partition $P$ of $[a, b]$ such that $U(f, P) = \infty$.
(iii) $f$ is not bounded above on $[a, b]$.
c) The following are equivalent:
(i) For all partitions $P$ of $[a, b]$, $L(f, P) > -\infty$ and $U(f, P) < \infty$.
(ii) $f$ is bounded on $[a, b]$.
Proof. a) (i) $\implies$ (ii) is immediate.
(ii) $\implies$ (iii): We prove the contrapositive: suppose that there is $m \in \mathbb{R}$ such that
$m \leq f(x)$ for all $x \in [a, b]$. Then for all partitions $P = \{a = x_0 < \ldots < x_{n-1} <
x_n = b\}$ and all $0 \leq i \leq n - 1$, we have $m_i(f) \geq m > -\infty$, so $L(f, P) > -\infty$.
(iii) $\implies$ (i): Suppose $f$ is not bounded below on $[a, b]$, and let $P = \{a = x_0 < \ldots <
x_{n-1} < x_n = b\}$ be a partition of $[a, b]$. If $m_i(f) > -\infty$ for all $0 \leq i \leq n - 1$, then
$\min_{0 \leq i \leq n-1} m_i(f)$ is a finite lower bound for $f$ on $[a, b]$, contradicting our assumption.
So there is at least one $i$ such that $m_i(f) = -\infty$, which forces $L(f, P) = -\infty$.
b) This is similar enough to part a) to be left to the reader.
c) If for all partitions $P$, $L(f, P) > -\infty$ and $U(f, P) < \infty$, then by parts a) and b)
$f$ is bounded above and below on $[a, b]$, so is bounded on $[a, b]$. Conversely, if $f$ is
bounded on $[a, b]$ then it is bounded above and below on $[a, b]$, so by parts a) and
b), for all partitions $P$ we have $L(f, P) > -\infty$ and $U(f, P) < \infty$.
Let $P_1$ and $P_2$ be two partitions of $[a, b]$. We say that $P_2$ refines $P_1$ if $P_2$ contains
every point of $P_1$: i.e., if $P_1 \subseteq P_2$.
Lemma 8.3. (Refinement Lemma) Let $P_1 \subseteq P_2$ be partitions of $[a, b]$ (i.e., $P_2$
refines $P_1$). Then for any bounded function $f : [a, b] \to \mathbb{R}$ we have
$$L(f, P_1) \leq L(f, P_2) \leq U(f, P_2) \leq U(f, P_1).$$
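The chain of inequalities in the Refinement Lemma can be observed on a small example. The sketch below assumes an increasing function, so that the infimum and supremum on each subinterval sit at the left and right endpoints; the helper name `lower_upper` and the two partitions are illustrative, not taken from the text.

```python
from fractions import Fraction

def lower_upper(f, partition):
    """(L(f, P), U(f, P)) for an increasing f: min at left endpoint, max at right."""
    pairs = list(zip(partition, partition[1:]))
    lower = sum(f(left) * (right - left) for left, right in pairs)
    upper = sum(f(right) * (right - left) for left, right in pairs)
    return lower, upper

def square(x):
    return x * x                               # increasing on [0, 1]

P1 = [Fraction(k, 2) for k in range(3)]        # {0, 1/2, 1}
P2 = [Fraction(k, 4) for k in range(5)]        # {0, 1/4, 1/2, 3/4, 1}, refines P1
L1, U1 = lower_upper(square, P1)
L2, U2 = lower_upper(square, P2)
chain_holds = L1 <= L2 <= U2 <= U1             # the Refinement Lemma's inequalities
```

Here refining raises the lower sum from $\frac{1}{8}$ to $\frac{7}{32}$ and lowers the upper sum from $\frac{5}{8}$ to $\frac{15}{32}$.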
f
a
f.
a
Proof. Recall that if $X, Y \subseteq \mathbb{R}$ are such that $x \leq y$ for all $x \in X$ and all
$y \in Y$, then $\sup X \leq \inf Y$. Now, by Lemma 8.4, for any partitions $P_1$ and $P_2$ we
have $L(f, P_1) \leq U(f, P_2)$. Therefore
$$\underline{\int_a^b} f = \sup_{P_1} L(f, P_1) \leq \inf_{P_2} U(f, P_2) = \overline{\int_a^b} f.$$
b
a
f=
b
a
f = ?
(40)
U (f, P) U (f, P2 ) <
f+
2
a
and also
and thus
L(f, P) <
2
(41)
f ,
2
f.
a
f
a
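Since this holds for all $\epsilon > 0$, we have $\overline{\int_a^b} f \leq \underline{\int_a^b} f$. On the other hand, by Lemma
8.5 we have $\underline{\int_a^b} f \leq \overline{\int_a^b} f$, so $\underline{\int_a^b} f = \overline{\int_a^b} f \in \mathbb{R}$ and thus $f$ is Darboux integrable.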
(i) $\implies$ (iii): Suppose $f$ is Darboux integrable, so $\int_a^b f = \underline{\int_a^b} f = \overline{\int_a^b} f \in \mathbb{R}$. Then for
all partitions $P$ we have
$$L(f, P) \leq \int_a^b f \leq U(f, P).$$
f=
f=
a
L(f, P)
f U (f, P),
f<
a
and thus every $I \in \left[ \underline{\int_a^b} f, \overline{\int_a^b} f \right]$ lies between every upper sum and every lower sum.
2.3. Verification of the Axioms.
Let $\mathcal{R}([a, b])$ denote the set of Darboux integrable functions on $[a, b]$. We now
tie together the work of the previous two sections by showing that the assignment
$f \in \mathcal{R}([a, b]) \mapsto \int_a^b f$ satisfies the axioms (I0) through (I3) introduced in §1. In
particular, this shores up the foundations of the Fundamental Theorem of Calculus
and completes the proof that every continuous $f : [a, b] \to \mathbb{R}$ has an antiderivative.
In summary, we wish to prove the following result.
Theorem 8.8. (Main Theorem on Integration)
a) Every continuous function $f : [a, b] \to \mathbb{R}$ is Darboux integrable.
b) The operation which assigns to every Darboux integrable function $f : [a, b] \to \mathbb{R}$
the number $\int_a^b f$ satisfies axioms (I0) through (I3) above.
c) Thus the Fundamental Theorem of Calculus holds for the Darboux integral. In
particular, for every continuous function $f$, $F(x) = \int_a^x f$ is an antiderivative of $f$.
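Proof. a) Let $f : [a, b] \to \mathbb{R}$ be continuous. The key is that $f$ is uniformly
continuous, so for all $\epsilon > 0$, there is $\delta > 0$ such that for all $x_1, x_2 \in [a, b]$,
$|x_1 - x_2| < \delta$ implies $|f(x_1) - f(x_2)| < \frac{\epsilon}{b - a}$. Choose $n$ with $\frac{b - a}{n} < \delta$ and let $P_n$ be the partition of $[a, b]$ into $n$ subintervals of equal length $\frac{b - a}{n}$. Then
(42) $U(f, P_n) - L(f, P_n) = \sum_{i=0}^{n-1} (M_i(f) - m_i(f)) \frac{b - a}{n} = \frac{b - a}{n} \sum_{i=0}^{n-1} (M_i(f) - m_i(f))$.
For each $i$, choose $c_i, d_i \in [x_i, x_{i+1}]$ with $f(c_i) = m_i(f)$ and $f(d_i) = M_i(f)$. Since $|d_i - c_i| \leq \frac{b - a}{n} < \delta$, we have
(43) $|M_i(f) - m_i(f)| = |f(d_i) - f(c_i)| < \frac{\epsilon}{b - a}$.
Combining (42) and (43) gives
$$U(f, P_n) - L(f, P_n) = \frac{b - a}{n} \sum_{i=0}^{n-1} (M_i(f) - m_i(f)) < \frac{b - a}{n} \sum_{i=0}^{n-1} \frac{\epsilon}{b - a} = \epsilon.$$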
b) (I0): By part a), every continuous function $f : [a, b] \to \mathbb{R}$ is Darboux integrable.
By Proposition 8.6, every Darboux integrable function on $[a, b]$ is bounded. (I1):
In Example 2.1, we showed that the constant function $C$ is integrable on $[a, b]$ with
$\int_a^b C = C(b - a)$. (I2): If $f_1, f_2 : [a, b] \to \mathbb{R}$ are both Darboux integrable and such
that $f_1(x) \leq f_2(x)$ for all $x \in [a, b]$, then for every partition $P$ of $[a, b]$ we have
$L(f_1, P) \leq L(f_2, P)$, and thus
$$\int_a^b f_1 = \sup_P L(f_1, P) \leq \sup_P L(f_2, P) = \int_a^b f_2.$$
(I3): Let $f : [a, b] \to \mathbb{R}$, and let $c \in (a, b)$. Suppose first that $f : [a, b] \to \mathbb{R}$ is
Darboux integrable: thus, for all $\epsilon > 0$, there exists a partition $P$ of $[a, b]$ with
$U(f, P) - L(f, P) < \epsilon$. Let $P_c = P \cup \{c\}$. By the Refinement Lemma,
$$L(f, P) \leq L(f, P_c) \leq U(f, P_c) \leq U(f, P),$$
so $U(f, P_c) - L(f, P_c) \leq U(f, P) - L(f, P) < \epsilon$. Let $P_1 = P_c \cap [a, c]$ and $P_2 =
P_c \cap [c, b]$. Then
$$L(f, P_c) = L(f, P_1) + L(f, P_2), \quad U(f, P_c) = U(f, P_1) + U(f, P_2),$$
and therefore
$$(U(f, P_1) - L(f, P_1)) + (U(f, P_2) - L(f, P_2)) = (U(f, P_1) + U(f, P_2)) - (L(f, P_1) + L(f, P_2))$$
$$= U(f, P_c) - L(f, P_c) < \epsilon.$$
Both summands on the left are non-negative, so each is less than $\epsilon$, and
by Darboux's criterion $f : [a, c] \to \mathbb{R}$ and $f : [c, b] \to \mathbb{R}$ are Darboux integrable.
Conversely, suppose $f : [a, c] \to \mathbb{R}$ and $f : [c, b] \to \mathbb{R}$ are Darboux integrable; let
$\epsilon > 0$. By Darboux's criterion, there is a partition $P_1$ of $[a, c]$ such that
Thus $\int_a^c f + \int_c^b f$ is a real number lying in between $L(f, P)$ and $U(f, P)$ for every
partition $P$ of $[a, b]$, so by Theorem 8.7 $\int_a^c f + \int_c^b f = \int_a^b f$.
c) This is immediate from Theorem 8.1 (the Fundamental Theorem of Calculus!).
2.4. An Inductive Proof of the Integrability of Continuous Functions.
In this section we will give a proof of the Darboux integrability of an arbitrary
continuous function $f : [a, b] \to \mathbb{R}$ which avoids the rather technical Uniform Continuity Theorem. We should say that we got the idea for doing this from Spivak's
text, which first proves the integrability using uniform continuity as we did above
and then later goes back to give a direct proof.
Theorem 8.9. Let $f : [a, b] \to \mathbb{R}$ be a continuous function on a closed bounded
interval. Then $f$ is Darboux integrable.
Proof. By Darboux's Criterion, it suffices to show that for all $\epsilon > 0$, there is
a partition $P$ of $[a, b]$ such that $U(f, P) - L(f, P) < \epsilon$. It is convenient to prove the
following slightly different (but logically equivalent!) statement: for every $\epsilon > 0$,
there exists a partition $P$ of $[a, b]$ such that $U(f, P) - L(f, P) < \epsilon(b - a)$.
Fix $\epsilon > 0$, and let $S(\epsilon)$ be the set of $x \in [a, b]$ such that there exists a partition
$P_x$ of $[a, x]$ with $U(f, P_x) - L(f, P_x) < \epsilon$. We want to show $b \in S(\epsilon)$; our strategy
will be to show $S(\epsilon) = [a, b]$ by Real Induction.
(RI1) The only partition of $[a, a]$ is $P_a = \{a\}$, and for this partition we have
$U(f, P_a) = L(f, P_a) = f(a) \cdot 0 = 0$, so $U(f, P_a) - L(f, P_a) = 0 < \epsilon$.
(RI2) Suppose that for $x \in [a, b)$ we have $[a, x] \subseteq S(\epsilon)$. We must show that there
is $\delta > 0$ such that $[a, x + \delta] \subseteq S(\epsilon)$, and by the above observation it is enough
to find $\delta > 0$ such that $x + \delta \in S(\epsilon)$: we must find a partition $P_{x+\delta}$ of $[a, x + \delta]$
such that $U(f, P_{x+\delta}) - L(f, P_{x+\delta}) < \epsilon(x + \delta - a)$. Since $x \in S(\epsilon)$, there is a
partition $P_x$ of $[a, x]$ with $U(f, P_x) - L(f, P_x) < \epsilon(x - a)$. Since $f$ is continuous
at $x$, we can make the difference between the maximum value and the minimum
value of $f$ as small as we want by taking a sufficiently small interval around $x$: i.e.,
there is $\delta > 0$ such that $\max(f, [x, x + \delta]) - \min(f, [x, x + \delta]) < \epsilon$. Now take the
smallest partition of $[x, x + \delta]$, namely $P' = \{x, x + \delta\}$. Then $U(f, P') - L(f, P') =
(x + \delta - x)(\max(f, [x, x + \delta]) - \min(f, [x, x + \delta])) < \delta\epsilon$. Thus if we put $P_{x+\delta} = P_x \cup P'$
and use the fact that upper / lower sums add when split into subintervals, we have
$$U(f, P_{x+\delta}) - L(f, P_{x+\delta}) = U(f, P_x) + U(f, P') - L(f, P_x) - L(f, P')$$
$$= U(f, P_x) - L(f, P_x) + U(f, P') - L(f, P') < \epsilon(x - a) + \epsilon\delta = \epsilon(x + \delta - a).$$
(RI3) Suppose that for $x \in (a, b]$ we have $[a, x) \subseteq S(\epsilon)$. We must show that $x \in S(\epsilon)$.
The argument for this is the same as for (RI2) except we use the interval $[x - \delta, x]$
instead of $[x, x + \delta]$. Indeed: since $f$ is continuous at $x$, there exists $\delta > 0$ such that
$\max(f, [x - \delta, x]) - \min(f, [x - \delta, x]) < \epsilon$. Since $x - \delta < x$, $x - \delta \in S(\epsilon)$ and thus there
exists a partition $P_{x-\delta}$ of $[a, x - \delta]$ such that $U(f, P_{x-\delta}) - L(f, P_{x-\delta}) < \epsilon(x - \delta - a)$.
Let $P' = \{x - \delta, x\}$ and let $P_x = P_{x-\delta} \cup P'$. Then
$$U(f, P_x) - L(f, P_x) = U(f, P_{x-\delta}) + U(f, P') - (L(f, P_{x-\delta}) + L(f, P'))$$
$$= (U(f, P_{x-\delta}) - L(f, P_{x-\delta})) + \delta(\max(f, [x - \delta, x]) - \min(f, [x - \delta, x]))$$
$$< \epsilon(x - \delta - a) + \delta\epsilon = \epsilon(x - a).$$
Suppose now that $c$ is a point in the interior of the domain $D$ of $f$. We define the
oscillation of $f$ at $c$ to be
$$\omega(f, c) = \lim_{\delta \to 0^+} \omega(f, [c - \delta, c + \delta]).$$
Now let $f : [a, b] \to \mathbb{R}$ and let $P = \{a = x_0 < x_1 < \ldots < x_{n-1} < x_n = b\}$
be a partition of $[a, b]$. We define$^3$
$$\Delta(f, P) = \sum_{i=0}^{n-1} \omega(f, [x_i, x_{i+1}])(x_{i+1} - x_i) = U(f, P) - L(f, P).$$
Thus this notation is just a way of abbreviating the quantities "upper sum minus
lower sum" which will appear ubiquitously in the near future. We can restate Darboux's Criterion especially cleanly with this new notation: a function $f : [a, b] \to \mathbb{R}$
is integrable iff for all $\epsilon > 0$, there exists a partition $P$ of $[a, b]$ with $\Delta(f, P) < \epsilon$.
3.2. Discontinuities of Darboux Integrable Functions.
At this point, I want to discuss the result that a bounded function $f : [a, b] \to \mathbb{R}$
with only finitely many discontinuities is Darboux integrable. So I wrote up a direct proof of this and it was long and messy. Afterwards I realized that a better
argument is by induction on the number of discontinuities. One then has to prove
the result for a function with a single discontinuity (base case), and assuming the
result for every function with $n$ discontinuities, prove it for every function with
$n + 1$ discontinuities (inductive step). Here the inductive step is especially easy:
if $f : [a, b] \to \mathbb{R}$ has $n + 1$ points of discontinuity, we can choose $c \in (a, b)$ such
that $f|_{[a,c]}$ has exactly one discontinuity and $f|_{[c,b]}$ has exactly $n$ discontinuities.
The restricted functions are Darboux integrable by the base case and the induction
hypothesis, and as we know, this implies that $f : [a, b] \to \mathbb{R}$ is Darboux integrable.
So really it is enough to treat the case of a bounded function with a single
discontinuity. It turns out that it is no trouble to prove a stronger version of this.
Theorem 8.11. Let $f : [a, b] \to \mathbb{R}$ be bounded. Suppose that for all $c \in (a, b)$,
$f|_{[c,b]} : [c, b] \to \mathbb{R}$ is Darboux integrable. Then $f$ is Darboux integrable and
$$\lim_{c \to a^+} \int_c^b f = \int_a^b f.$$
Proof. Let $M > 0$ be such that $|f(x)| \leq M$ for all $x \in [a, b]$. Fix $\delta > 0$ and
consider partitions $P$ of $[a, b]$ with $x_1 = a + \delta$. For such partitions,
$$\Delta(f, P) = \Delta(f, P \cap [a, a + \delta]) + \Delta(f, P \cap [a + \delta, b]).$$
Since the infimum of $f$ on any subinterval of $[a, b]$ is at least $-M$ and the supremum
is at most $M$, $\Delta(f, P \cap [a, a + \delta]) \leq 2M\delta$, which we can make as small as we wish by
taking $\delta$ small enough. Similarly, having chosen $\delta$, we may make $\Delta(f, P \cap [a + \delta, b])$
as small as we like with a suitable choice of $P$, since $f$ is assumed to be Darboux
integrable on $[a + \delta, b]$. Thus we can make $\Delta(f, P)$ at most $\epsilon$ for any $\epsilon > 0$,
so $f$ is Darboux integrable on $[a, b]$. The second statement follows easily:
$$\left| \int_a^b f - \int_c^b f \right| = \left| \int_a^c f \right| \leq 2M(c - a),$$
$^3$ For once we do not introduce a name but only a piece of notation. In an earlier course on
this subject I called this quantity "the oscillation of $f$ on $P$", but this is not especially apt. Better
perhaps would be to call $\Delta(f, P)$ the discrepancy of $f$ and $P$, since it is the difference between
the upper and the lower sum. But in fact it is simplest not to call it anything but $\Delta(f, P)$!
To see this, observe that for any fixed $\epsilon > 0$, there are only finitely many nonzero rational numbers $\frac{p}{q}$ in $[0, 1]$ with $q \leq \frac{1}{\epsilon}$: indeed there is at most 1 such with denominator
1, at most 2 with denominator 2, and so forth (and in fact there are fewer than
this because e.g. in our terminology the "denominator" of $\frac{2}{4}$ is actually 2, since
$\frac{2}{4} = \frac{1}{2}$ in lowest terms). Suppose then that there are $N$ points $x$ in $[0, 1]$ such that
$f(x) \geq \epsilon$. Choose a partition $P$ such that each of these points $x$ lies in the interior
of a subinterval of length at most $\frac{\epsilon}{N}$. Since the maximum value of $f$ on $[0, 1]$ is 1,
the term of the upper sum corresponding to each of these $N$ "bad" subintervals is
at most $1 \cdot \frac{\epsilon}{N}$; since there are $N$ bad subintervals over all, this part of the sum
is at most $N \cdot \frac{\epsilon}{N} = \epsilon$, and the remaining part of the sum is at most $\epsilon$ times the
length of $[a, b] = [0, 1]$, i.e., at most $\epsilon$. Thus $U(f, P) \leq \epsilon + \epsilon = 2\epsilon$. Since of course
$\lim_{\epsilon \to 0} 2\epsilon = 0$, this shows that $f$ is Darboux integrable.
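The counting step in this argument can be carried out explicitly. The sketch below assumes, as the passage suggests, that $f(\frac{p}{q}) = \frac{1}{q}$ when $\frac{p}{q}$ is written in lowest terms; the function name `bad_points` and the choice $\epsilon = \frac{1}{4}$ are illustrative only.

```python
from fractions import Fraction

def bad_points(eps):
    """Points x in (0, 1] with f(x) >= eps, assuming f(p/q) = 1/q in lowest terms."""
    eps = Fraction(eps)
    max_q = int(1 / eps)  # f(p/q) = 1/q >= eps exactly when q <= 1/eps
    # A set of Fractions removes duplicates such as 2/4 = 1/2 automatically.
    return sorted({Fraction(p, q) for q in range(1, max_q + 1) for p in range(1, q + 1)})

eps = Fraction(1, 4)
points = bad_points(eps)
N = len(points)                                   # number of "bad" points
crude_count = sum(range(1, int(1 / eps) + 1))     # the bound 1 + 2 + ... + max_q
upper_sum_bound = N * (eps / N) + eps * 1         # bad subintervals + remaining part
```

For $\epsilon = \frac{1}{4}$ the bad points are $\frac{1}{4}, \frac{1}{3}, \frac{1}{2}, \frac{2}{3}, \frac{3}{4}, 1$, so $N = 6$, which is indeed below the crude count of 10, and the resulting bound on the upper sum is $2\epsilon = \frac{1}{2}$.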
All of our results so far have been in the direction of exhibiting examples of Darboux
integrable functions with increasingly large sets of discontinuities. What about the
other direction: is there, for instance, a Darboux integrable function which is discontinuous at every point? In fact, no:
Theorem 8.14. Let $f : [a, b] \to \mathbb{R}$ be Darboux integrable. Let $S$ be the set of
$x \in [a, b]$ such that $f$ is continuous at $x$. Then $S$ is dense in $[a, b]$: i.e., for all
$a \leq x < y \leq b$, there exists $z \in (x, y)$ such that $f$ is continuous at $z$.
Proof. Step 1: We show that there is at least one $c \in [a, b]$ such that $f$ is
continuous at $c$. We will construct such a $c$ using the Nested Intervals Theorem:
recall that if we have a sequence of closed subintervals $[a_n, b_n]$ such that for all $n$,
$a_n \leq b_n$,
$a_n \leq a_{n+1}$,
$b_{n+1} \leq b_n$,
there is at least one $c$ such that $a_n \leq c \leq b_n$ for all $n$: indeed $\sup_n a_n \leq \inf_n b_n$,
so any $c \in [\sup_n a_n, \inf_n b_n]$ will do. Since $f$ is Darboux integrable, for all $n \in \mathbb{Z}^+$
there is a partition $P_n = \{a = x_0 < x_1 < \ldots < x_{n-1} < x_n = b\}$ of $[a, b]$ such that
(45) $\Delta(f, P_n) = \sum_{i=0}^{n-1} \omega(f, [x_i, x_{i+1}])(x_{i+1} - x_i) < \frac{b - a}{n}$.
Now (45) implies that for at least one $0 \leq i \leq n - 1$ we have $\omega(f, [x_i, x_{i+1}]) < \frac{1}{n}$:
for, if not, $\omega(f, [x_i, x_{i+1}]) \geq \frac{1}{n}$ for all $i$ and thus
$$\Delta(f, P_n) \geq \frac{1}{n}(x_1 - x_0) + \frac{1}{n}(x_2 - x_1) + \ldots + \frac{1}{n}(x_n - x_{n-1}) = \frac{x_n - x_0}{n} = \frac{b - a}{n},$$
contradiction. We will use this analysis to choose a nested sequence of subintervals.
First we take $n = 1$ and see that there is some closed subinterval $[x_i, x_{i+1}]$ of $[a, b]$
on which $\omega(f, [x_i, x_{i+1}]) < 1$. We then define $a_1 = x_i$, $b_1 = x_{i+1}$, and instead of
considering $f$ as defined on $[a, b]$, we now consider it as defined on the subinterval
$[a_1, b_1]$. Since $f$ is Darboux integrable on $[a, b]$, we know it is also Darboux integrable
on $[a_1, b_1]$, so the above argument still works: there exists a partition $P_2$ of $[a_1, b_1]$
such that for at least one subinterval $[x_i, x_{i+1}] \subseteq [a_1, b_1]$ we have $\omega(f, [x_i, x_{i+1}]) < \frac{1}{2}$.
We then put $a_2 = x_i$ (this is not necessarily the same number that we were
calling $x_i$ in the previous step, but we will stick with the simpler notation) and
$b_2 = x_{i+1}$ and have defined a sub-subinterval $[a_2, b_2] \subseteq [a_1, b_1] \subseteq [a, b]$ on which
$\omega(f, [a_2, b_2]) < \frac{1}{2}$. Now, continuing in this way we construct a nested sequence
$[a_n, b_n]$ of closed subintervals such that for all $n \in \mathbb{Z}^+$, $\omega(f, [a_n, b_n]) < \frac{1}{n}$. Now
apply the Nested Intervals Theorem: there exists $c \in \mathbb{R}$ such that $c \in [a_n, b_n]$ for
all $n \in \mathbb{Z}^+$. It follows that for all $n \in \mathbb{Z}^+$
$$\omega(f, c) \leq \omega(f, [a_n, b_n]) < \frac{1}{n},$$
i.e., $\omega(f, c) = 0$ and thus $f$ is continuous at $c$ by Proposition 8.10.
Step 2: To show $f$ has infinitely many points of continuity, it's enough to show that
for all $N \in \mathbb{Z}^+$, $f$ is continuous at at least $N$ distinct points, and we can do this by
induction, the base case $N = 1$ being Step 1 above. So suppose we have already
shown $f$ is continuous at $x_1 < x_2 < \ldots < x_N$ in $[a, b]$. Choose any $A, B \in \mathbb{R}$ with
$a \leq x_1 < A < B < x_2 \leq b$. Once again, since $f : [a, b] \to \mathbb{R}$ is Darboux integrable,
the restriction of $f$ to $[A, B]$ is Darboux integrable on $[A, B]$. Applying Step 1, we
get $c \in [A, B]$ such that $f$ is continuous at $c$, and by construction $c$ is different from
all the continuity points we have already found. This completes the induction step,
and thus it follows that $f$ is continuous at infinitely many points of $[a, b]$.
3.3. A supplement to the Fundamental Theorem of Calculus.
Theorem 8.15. Let $f : [a, b] \to \mathbb{R}$ be differentiable and suppose $f'$ is Darboux
integrable. Then $\int_a^b f' = f(b) - f(a)$.
Proof. Let $P$ be a partition of $[a, b]$. By the Mean Value Theorem there is
$t_i \in [x_i, x_{i+1}]$ such that $f(x_{i+1}) - f(x_i) = f'(t_i)(x_{i+1} - x_i)$. Then we have
$$m_i(f')(x_{i+1} - x_i) \leq f'(t_i)(x_{i+1} - x_i) \leq M_i(f')(x_{i+1} - x_i)$$
and thus
$$m_i(f')(x_{i+1} - x_i) \leq f(x_{i+1}) - f(x_i) \leq M_i(f')(x_{i+1} - x_i).$$
Summing these inequalities from $i = 0$ to $n - 1$ gives
$$L(f', P) \leq f(b) - f(a) \leq U(f', P).$$
Since for the integrable function $f'$, $\int_a^b f'$ is the unique number lying in between all
lower and upper sums, we conclude $f(b) - f(a) = \int_a^b f'$.
How is Theorem 8.15 different from Theorem 8.1c)? Only in a rather subtle way:
in order to apply Theorem 8.1c) to $f'$, we need $f'$ to be continuous, whereas in
Theorem 8.15 we are assuming only that $f'$ is Darboux integrable. Every continuous function is Darboux integrable but, as we have seen, there are discontinuous Darboux integrable functions. What about discontinuous, Darboux integrable
derivatives? The possible discontinuities of a monotone function are incompatible
with the possible discontinuities of a derivative: if $f'$ is monotone, it is continuous.
So we must look elsewhere for examples. In fact, we return to an old friend.
Example 3.3: Let $a, b \in (0, \infty)$ and let $f_{a,b}$ be given by $x \mapsto x^a \sin(\frac{1}{x^b})$, $x \neq 0$
and $0 \mapsto 0$. Then $f_{a,b}$ is infinitely differentiable except possibly at zero. It is continuous at 0: the sine of anything is bounded, and $\lim_{x \to 0} x^a = 0$, so the product
approaches zero. To check differentiability at 0, we use the definition:
$$f'(0) = \lim_{h \to 0} \frac{f(h) - f(0)}{h} = \lim_{h \to 0} \frac{h^a \sin(\frac{1}{h^b})}{h} = \lim_{h \to 0} h^{a-1} \sin\left(\frac{1}{h^b}\right).$$
This limit exists and is 0 iff $a - 1 > 0$ iff $a > 1$. Thus if $a > 1$ then $f'_{a,b}(0) = 0$. As
for continuity of $f'_{a,b}$ at zero, we compute the derivative for nonzero $x$ and consider
the limit as $x \to 0$:
$$f'_{a,b}(x) = a x^{a-1} \sin\left(\frac{1}{x^b}\right) - b x^{a-b-1} \cos\left(\frac{1}{x^b}\right).$$
The first term approaches 0 for $a > 1$. As for the second term, in order for the
$$\int_0^x f'_{a,b} = f_{a,b}(x) - f_{a,b}(0) = f_{a,b}(x).$$
$\lim_{x \to 0} f'_{a,b}$ does not exist, but $f'_{a,b}$ is bounded on any closed, bounded interval, say
$[0, x]$. Therefore Theorem 8.15 applies to give
$$\int_0^x f'_{b+1,b} = f_{b+1,b}(x) - f_{b+1,b}(0) = f_{b+1,b}(x)$$
Proof. a) The idea here is simply that $C$ may be factored out of the lower
sums and the upper sums. The details may be safely left to the reader.
b) Let $I \subseteq [a, b]$ be a subinterval, and let $m_f$, $m_g$, $m_{f+g}$ be the infima of $f$, $g$
169
$$\frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.$$
$$\frac{\epsilon}{b - a + 2M}.$$
$$\sum_{i \in S_1} (x_{i+1} - x_i) \le \sum_{i=0}^{n-1} (x_{i+1} - x_i) = b - a.$$
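On the other hand, since $-M \le g(f(x)) \le M$ for all $x \in [a, b]$, the oscillation of $g \circ f$ on
any subinterval of $[a, b]$ is at most $2M$. Thus we get
$$\Delta(g \circ f, \mathcal{P}) = \sum_{i \in S_1} \omega(g \circ f, [x_i, x_{i+1}])(x_{i+1} - x_i) + \sum_{i \in S_2} \omega(g \circ f, [x_i, x_{i+1}])(x_{i+1} - x_i)$$
$$\le \frac{\epsilon}{b - a + 2M}(b - a) + 2M \sum_{i \in S_2} (x_{i+1} - x_i).$$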
(Note that reasoning as above also gives $\sum_{i \in S_2} (x_{i+1} - x_i) \le (b - a)$, but this is not
good enough: using it would give us a second term of $2M(b - a)$, i.e., not something
that we can make arbitrarily small.) Here is a better estimate:
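$$\sum_{i \in S_2} (x_{i+1} - x_i) = \frac{1}{\eta} \sum_{i \in S_2} \eta (x_{i+1} - x_i) < \frac{1}{\eta} \sum_{i \in S_2} \omega(f, [x_i, x_{i+1}])(x_{i+1} - x_i)$$
$$\le \frac{1}{\eta} \sum_{i=0}^{n-1} \omega(f, [x_i, x_{i+1}])(x_{i+1} - x_i) = \frac{1}{\eta} \Delta(f, \mathcal{P}) < \frac{1}{\eta} \eta^2 = \eta.$$
Using this estimate, we get
$$\Delta(g \circ f, \mathcal{P}) \le \frac{\epsilon}{b - a + 2M} \sum_{i \in S_1} (x_{i+1} - x_i) + (2M) \sum_{i \in S_2} (x_{i+1} - x_i)$$
$$< \frac{\epsilon}{b - a + 2M}(b - a) + 2M\eta < \frac{(b - a)\epsilon}{b - a + 2M} + \frac{2M\epsilon}{b - a + 2M} = \epsilon.$$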
The proof of Theorem 8.17 becomes much easier if we assume that $g$ is not merely
continuous but a Lipschitz function. Recall that $f : I \to \mathbb{R}$ is Lipschitz if there
exists $C \ge 0$ such that for all $x, y \in I$, $|f(x) - f(y)| \le C|x - y|$.
Example: $f : \mathbb{R} \to \mathbb{R}$ by $x \mapsto |x|$ is a Lipschitz function. Indeed, the reverse
triangle inequality reads: for all $x, y \in \mathbb{R}$,
$$||x| - |y|| \le |x - y|,$$
and this shows that 1 is a Lipschitz constant for $f$.
Exercise: a) For which functions may we take $C = 0$ as a Lipschitz constant?
b) Let $I$ be an interval. Show that for every Lipschitz function $f : I \to \mathbb{R}$, there is
a smallest Lipschitz constant.
Proposition 8.18. Let $f : [a, b] \to \mathbb{R}$ be a $C^1$-function. Then $M = \max_{x \in [a,b]} |f'(x)|$
is a Lipschitz constant for $f$.
Proof. Let $x < y \in [a, b]$. By the Mean Value Theorem, there is $z \in (x, y)$
such that $f(x) - f(y) = f'(z)(x - y)$, so $|f(x) - f(y)| \le |f'(z)||x - y| \le M|x - y|$.
Lemma 8.19. Let $f : I \to [c, d]$ be bounded and $g : [c, d] \to \mathbb{R}$ a Lipschitz function
with Lipschitz constant $C$. Then $\omega(g \circ f, I) \le C \omega(f, I)$.
Exercise: Prove Lemma 8.19.
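Theorem 8.20. Let $f : [a, b] \to [c, d]$ be Darboux integrable, and let $g : [c, d] \to \mathbb{R}$
be Lipschitz with constant $C$. Then $g \circ f : [a, b] \to \mathbb{R}$ is Darboux integrable.
Proof. Fix $\epsilon > 0$; enlarging $C$ if necessary, we may assume $C > 0$. Since $f$ is Darboux
integrable, there is a partition $\mathcal{P}$ of $[a, b]$ such that
$$\Delta(f, \mathcal{P}) = \sum_{i=0}^{n-1} \omega(f, [x_i, x_{i+1}])(x_{i+1} - x_i) < \frac{\epsilon}{C}.$$
Then by Lemma 8.19 we have
$$\Delta(g \circ f, \mathcal{P}) = \sum_{i=0}^{n-1} \omega(g \circ f, [x_i, x_{i+1}])(x_{i+1} - x_i) \le C \left( \sum_{i=0}^{n-1} \omega(f, [x_i, x_{i+1}])(x_{i+1} - x_i) \right) < C \left( \frac{\epsilon}{C} \right) = \epsilon.$$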
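Corollary 8.21. Let $f : [a, b] \to \mathbb{R}$ be Darboux integrable. Then $|f| : [a, b] \to \mathbb{R}$
is Darboux integrable, and we have the integral triangle inequality
$$\left| \int_a^b f \right| \le \int_a^b |f|.$$
Proof. Since $x \mapsto |x|$ is Lipschitz, $|f|$ is Darboux integrable by Theorem 8.20. Moreover
$-|f| \le f \le |f|$, so by (I2)
$$-\int_a^b |f| \le \int_a^b f \le \int_a^b |f|,$$
so
$$\left| \int_a^b f \right| \le \int_a^b |f|.$$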
Example 3.6: Let $f : [1, 2] \to [0, 1]$ be the function which takes the value $\frac{1}{q}$ at
every rational number $\frac{p}{q}$ (written in lowest terms) and 0 at every irrational number, and
let $g : [0, 1] \to \mathbb{R}$ be the function which takes 0 to 0 and every $x \in (0, 1]$ to 1. Then $f$ is
Darboux integrable, $g$ is bounded and discontinuous only at 0 so is Darboux integrable, but
$g \circ f : [1, 2] \to \mathbb{R}$ takes every rational number to 1 and every irrational number to
0, so is not Darboux integrable. Thus we see that the composition of Darboux integrable
functions need not be Darboux integrable without some further hypothesis.
Example 3.7: Above we showed that if $g$ is continuous and $f$ is Darboux integrable
then the composition $g \circ f$ is Darboux integrable; then we saw that if $g$
and $f$ are both merely Darboux integrable, $g \circ f$ need not be Darboux integrable.
So what about the other way around: suppose $f$ is continuous and $g$ is Darboux
integrable; must $g \circ f$ be Darboux integrable? The answer is again no; the easiest
counterexample I know is contained in a paper of Jitan Lu [Lu99].
4. Riemann Sums, Dicing, and the Riemann Integral
We now turn to the task of reconciling G. Darboux's take on the integral with
B. Riemann's (earlier) work. Riemann gave an apparently different construction
of an integral $\int_a^b f$ which also satisfies axioms (I0) through (I3). By virtue of the
uniqueness of the integral of a continuous function, the Riemann integral $R\int_a^b f$ of
a continuous function agrees with our previously constructed Darboux integral $\int_a^b f$.
But this leaves open the question of how the class of Riemann integrable functions
compares with the class of Darboux integrable functions. In fact, although
the definitions look different, a function $f : [a, b] \to \mathbb{R}$ is Riemann integrable iff it
is Darboux integrable and then $R\int_a^b f = \int_a^b f$. Thus what we really have is a rival
construction of the Darboux integral, which is in some respects more complicated
but also possesses certain advantages.
It turns out however to be relatively clear that a Riemann integrable function
is necessarily Darboux integrable. This suggests a slightly different perspective: we
view Riemann integrability as an additional property that we want to show that
every Darboux integrable function possesses. This seems like a clean way to go: on
the one hand, it obviates the need for things like $R\int_a^b f$. On the other, it highlights
what is gained by this construction: namely, further insight on the relationship of
the upper and lower sums $U(f, \mathcal{P})$ and $L(f, \mathcal{P})$ to the integral $\int_a^b f$. At the moment
the theory tells us that if $f$ is Darboux integrable, then for every $\epsilon > 0$ there exists
some partition $\mathcal{P}_\epsilon$ of $[a, b]$ such that $U(f, \mathcal{P}_\epsilon) - L(f, \mathcal{P}_\epsilon) < \epsilon$. But this is not very
explicit: how do we go about finding such a $\mathcal{P}_\epsilon$? In the (few!) examples in which
we showed integrability from scratch, we saw that we could always take a uniform
partition $\mathcal{P}_n$: in particular it was enough to chop the interval $[a, b]$ into a sufficiently
large number of pieces of equal size. In fact, looking back at our first proof of the
integrability of continuous functions, we see that at least if $f$ is continuous, such
uniform partitions always suffice. The key claim that we wish to establish in this
section is that for any integrable function we will have $\Delta(f, \mathcal{P}_n) < \epsilon$ for sufficiently
large $n$. In fact we will show something more general than this: in order to achieve
$\Delta(f, \mathcal{P}) < \epsilon$, we do not need $\mathcal{P}$ to have equally spaced subintervals but only to have
all subintervals of length no larger than some fixed, sufficiently small constant $\delta$.
Given a function $f : [a, b] \to \mathbb{R}$ and a partition $\mathcal{P}$ of $[a, b]$, we will also introduce
a more general approximating sum to $\int_a^b f$ than just the upper and lower sums,
namely we will define and consider Riemann sums. The additional flexibility of
Riemann sums is of great importance in the field of numerical integration (i.e.,
the branch of numerical analysis where we quantitatively study the error between
a numerical approximation to an integral and its true value), and it pays some
modest theoretical dividends as well. But the Riemann sums are little more than
a filigree to the main "dicing" property of the Darboux integral alluded to in the
last paragraph: the Riemann sums will always lie in between the lower and upper
sums, so if we can prove that the upper and lower sums are good approximations, in
whatever sense, to the integral $\int_a^b f$, then the same has to be true for the Riemann
sums: they will be carried along for the ride.
4.1. Riemann sums.
Let $f : [a, b] \to \mathbb{R}$ be any function, and let $\mathcal{P}$ be a partition of $[a, b]$. Instead
of forming the rectangle with height the infimum (or supremum) of $f$ on $[x_i, x_{i+1}]$,
we choose any point $x_i^* \in [x_i, x_{i+1}]$ and take $f(x_i^*)$ as the height of the rectangle.
In this way we get a Riemann sum $\sum_{i=0}^{n-1} f(x_i^*)(x_{i+1} - x_i)$ associated to
the function $f$, the partition $\mathcal{P}$, and the choice of a point $x_i^* \in [x_i, x_{i+1}]$ for all
$0 \le i \le n - 1$. Given a partition $\mathcal{P} = \{a = x_0 < x_1 < \ldots < x_{n-1} < x_n = b\}$,
a choice of $x_i^* \in [x_i, x_{i+1}]$ for $0 \le i \le n - 1$ is called a tagging of $\mathcal{P}$ and gets
a notation of its own, say $\tau = \{x_0^*, \ldots, x_{n-1}^*\}$. A pair $(\mathcal{P}, \tau)$ is called a tagged
partition, and given any function $f : [a, b] \to \mathbb{R}$ and any tagged partition $(\mathcal{P}, \tau)$
of $[a, b]$, we associate the Riemann sum
$$R(f, \mathcal{P}, \tau) = \sum_{i=0}^{n-1} f(x_i^*)(x_{i+1} - x_i).$$
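Let us compare the Riemann sums $R(f, \mathcal{P}, \tau)$ to the upper and lower sums. Just
because every value of a function $f$ on a (sub)interval $I$ lies between its infimum
and its supremum, we have that for any tagging $\tau$ of $\mathcal{P}$,
$$(47) \quad L(f, \mathcal{P}) \le R(f, \mathcal{P}, \tau) \le U(f, \mathcal{P}).$$
Conversely, if $f$ is bounded then for all $\epsilon > 0$ we can find $x_i^*, x_i^{**} \in [x_i, x_{i+1}]$
such that $\sup(f, [x_i, x_{i+1}]) \le f(x_i^*) + \epsilon$ and $\inf(f, [x_i, x_{i+1}]) \ge f(x_i^{**}) - \epsilon$, and it
follows that the upper and lower sums associated to $f$ and $\mathcal{P}$ are the supremum
and infimum of the possible Riemann sums $R(f, \mathcal{P}, \tau)$:
$$(48) \quad \sup_\tau R(f, \mathcal{P}, \tau) = U(f, \mathcal{P}), \quad \inf_\tau R(f, \mathcal{P}, \tau) = L(f, \mathcal{P}).$$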
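The squeeze of a Riemann sum between the lower and upper sums can be checked numerically. The following is only a sketch under a simplifying assumption: the test function $x \mapsto x^2$ is increasing on $[0, 1]$, so its infimum and supremum on each subinterval are attained at the left and right endpoints, and the lower and upper sums are themselves Riemann sums for the left and right taggings. The helper names (riemann_sum, uniform_partition) are illustrative and not part of the text.

```python
import random

def riemann_sum(f, partition, tags):
    """R(f, P, tau): sum of f(tag_i) * (x_{i+1} - x_i) over the subintervals of P."""
    return sum(f(t) * (x1 - x0)
               for x0, x1, t in zip(partition, partition[1:], tags))

def uniform_partition(a, b, n):
    """The partition of [a, b] into n subintervals of equal length."""
    return [a + (b - a) * i / n for i in range(n + 1)]

def f(x):
    return x * x  # increasing on [0, 1], so inf/sup sit at the endpoints

P = uniform_partition(0.0, 1.0, 50)
lower = riemann_sum(f, P, P[:-1])   # left tagging gives L(f, P) for increasing f
upper = riemann_sum(f, P, P[1:])    # right tagging gives U(f, P) for increasing f

rng = random.Random(0)              # fixed seed so the run is reproducible
random_tags = [rng.uniform(x0, x1) for x0, x1 in zip(P, P[1:])]
middle = riemann_sum(f, P, random_tags)

print(lower, middle, upper)
```

Whatever tagging is chosen, the printed middle value should sit between the other two, and all three should be close to $\frac{1}{3}$.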
Exercise 4.1: Show that (48) holds even if $f$ is unbounded. More precisely, show:
a) If $f$ is unbounded above, then $\sup_\tau R(f, \mathcal{P}, \tau) = U(f, \mathcal{P}) = \infty$.
b) If $f$ is unbounded below, then $\inf_\tau R(f, \mathcal{P}, \tau) = L(f, \mathcal{P}) = -\infty$.
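From inequalities (47) and (48) the following result follows almost immediately.
Theorem 8.23. For a function $f : [a, b] \to \mathbb{R}$, the following are equivalent:
(i) $f$ is Darboux integrable.
(ii) For all $\epsilon > 0$, there exists a real number $I$ and a partition $\mathcal{P}_\epsilon$ of $[a, b]$ such that
for any refinement $\mathcal{P}$ of $\mathcal{P}_\epsilon$ and any tagging $\tau$ of $\mathcal{P}$ we have
$$|R(f, \mathcal{P}, \tau) - I| < \epsilon.$$
If the equivalent conditions hold, then $I = \int_a^b f$.
Proof. (i) $\implies$ (ii): If $f$ is Darboux integrable, then by Darboux's Criterion
there is a partition $\mathcal{P}_\epsilon$ such that $\Delta(f, \mathcal{P}_\epsilon) = U(f, \mathcal{P}_\epsilon) - L(f, \mathcal{P}_\epsilon) < \epsilon$. For any
refinement $\mathcal{P}$ of $\mathcal{P}_\epsilon$ we have $\Delta(f, \mathcal{P}) \le \Delta(f, \mathcal{P}_\epsilon)$, and moreover by integrability
$$L(f, \mathcal{P}) \le \int_a^b f \le U(f, \mathcal{P}).$$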
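Proof. (Levermore) Let $\epsilon > 0$. There exists a partition $\mathcal{P}_\epsilon$ of $[a, b]$ such that
$$(51) \quad 0 \le U(f, \mathcal{P}) - \overline{\int_a^b} f = \left( U(f, \mathcal{P}') - \overline{\int_a^b} f \right) + \left( U(f, \mathcal{P}) - U(f, \mathcal{P}') \right).$$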
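We will establish the claim by showing that the two terms on the right hand side of
(50) are each less than $\frac{\epsilon}{2}$ and, similarly, that the two terms on the right hand side
of (51) are each less than $\frac{\epsilon}{2}$. Using the Refinement Lemma (Lemma 8.3), we have
$$0 \le \underline{\int_a^b} f - L(f, \mathcal{P}') \le \underline{\int_a^b} f - L(f, \mathcal{P}_\epsilon) < \frac{\epsilon}{2}$$
and
$$0 \le U(f, \mathcal{P}') - \overline{\int_a^b} f \le U(f, \mathcal{P}_\epsilon) - \overline{\int_a^b} f < \frac{\epsilon}{2}.$$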
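This gives two of the four inequalities. As for the other two: since $\mathcal{P}'$ is a refinement
of $\mathcal{P} = \{a = x_0 < \ldots < x_{n-1} < x_n = b\}$, for $0 \le i \le n - 1$, $\mathcal{P}'_i :=
\mathcal{P}' \cap [x_i, x_{i+1}]$ is a partition of $[x_i, x_{i+1}]$. By the Refinement Lemma,
$$0 \le L(f, \mathcal{P}') - L(f, \mathcal{P}) = \sum_{i=0}^{n-1} \left( L(f, \mathcal{P}'_i) - \inf(f, [x_i, x_{i+1}])(x_{i+1} - x_i) \right),$$
$$0 \le U(f, \mathcal{P}) - U(f, \mathcal{P}') = \sum_{i=0}^{n-1} \left( \sup(f, [x_i, x_{i+1}])(x_{i+1} - x_i) - U(f, \mathcal{P}'_i) \right).$$
Because $\mathcal{P}'$ has at most $N - 1$ elements which are not contained in $\mathcal{P}$, there are at
most $N - 1$ indices $i$ such that $(x_i, x_{i+1})$ contains at least one point of $\mathcal{P}'_i$. For all
other indices the terms are zero. Further, each nonzero term in either sum satisfies
$$0 \le L(f, \mathcal{P}'_i) - \inf(f, [x_i, x_{i+1}])(x_{i+1} - x_i) \le 2M(x_{i+1} - x_i) < 2M\delta,$$
$$0 \le \sup(f, [x_i, x_{i+1}])(x_{i+1} - x_i) - U(f, \mathcal{P}'_i) \le 2M(x_{i+1} - x_i) < 2M\delta.$$
Because there are at most $N - 1$ nonzero terms, we get
$$0 \le L(f, \mathcal{P}') - L(f, \mathcal{P}) < 2MN\delta < \frac{\epsilon}{2},$$
$$0 \le U(f, \mathcal{P}) - U(f, \mathcal{P}') < 2MN\delta < \frac{\epsilon}{2}.$$
So the last terms on the right hand sides of (50) and (51) are each less than $\frac{\epsilon}{2}$.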
Proof. a) (i) $\implies$ (ii): if $f$ is Darboux integrable, then $\underline{\int_a^b} f = \overline{\int_a^b} f$, and
property (ii) follows immediately from the Dicing Lemma (Lemma 8.25).
(ii) $\implies$ (iii): Indeed, if (ii) holds then for any sequence of tagged partitions
$(\mathcal{P}_n, \tau_n)$ with $|\mathcal{P}_n| \to 0$, we have $R(f, \mathcal{P}_n, \tau_n) \to I$.
(iii) $\implies$ (i): We will show the contrapositive: if $f$ is not Darboux integrable,
then there is a sequence $(\mathcal{P}_n, \tau_n)$ of tagged partitions with $|\mathcal{P}_n| \to 0$ such that the
sequence of Riemann sums $R(f, \mathcal{P}_n, \tau_n)$ is not convergent.
Case 1: Suppose $f$ is unbounded. Then for any partition $\mathcal{P}$ of $[a, b]$ and any $M > 0$,
there exists a tagging $\tau$ such that $|R(f, \mathcal{P}, \tau)| > M$. Thus we can build a sequence
of tagged partitions $(\mathcal{P}_n, \tau_n)$ with $|\mathcal{P}_n| \to 0$ and $|R(f, \mathcal{P}_n, \tau_n)| \to \infty$.
Case 2: Suppose $f$ is bounded but not Darboux integrable, i.e.,
$$-\infty < \underline{\int_a^b} f < \overline{\int_a^b} f < \infty.$$
lim R(f, Pn , Tn ) =
f.
a
f.
a
Since $\underline{\int_a^b} f \neq \overline{\int_a^b} f$, the sequence $\{R(f, \mathcal{P}_n, \tau_n)\}_{n=1}^{\infty}$ does not converge.
b) This follows from Theorem 8.23: therein, the number $I$ satisfying (ii) was unique.
Our condition (ii) is more stringent, so there can be at most one $I$ satisfying it.
c) This is almost immediate from the equivalence (ii) $\iff$ (iii) and part b): we
leave the details to the reader.
4.3. The Riemann Integral.
By definition, a function $f : [a, b] \to \mathbb{R}$ satisfying condition (ii) of Theorem 8.26 is
Riemann integrable, and the number $I$ associated to $f$ is called the Riemann
integral of $f$. In this language, what we have shown is that a function is Riemann
integrable iff it is Darboux integrable, and the associated integrals are the same.
As mentioned above, Riemann set up his integration theory using the Riemann integral.
Some contemporary texts take this approach as well. It really is a bit messier
though: on the one hand, the business about the taggings creates another level
of notation and another (minor, but nevertheless present) thing to worry about.
But more significantly, the more stringent notion of convergence in the definition
of the Riemann integral can be hard to work with: directly showing that the composition
of a continuous function with a Riemann integrable function is Riemann
integrable seems troublesome. On the other hand, there are one or two instances
where Riemann sums are more convenient to work with than upper and lower sums.
Example 4.2: Suppose $f, g : [a, b] \to \mathbb{R}$ are both Darboux integrable. We wanted
to show that $f + g$ is also Darboux integrable...and we did, but the argument was
slightly complicated by the fact that we had only inequalities
$$L(f, \mathcal{P}) + L(g, \mathcal{P}) \le L(f + g, \mathcal{P}), \quad U(f, \mathcal{P}) + U(g, \mathcal{P}) \ge U(f + g, \mathcal{P}).$$
However, for any tagging $\tau$ of $\mathcal{P}$, the Riemann sum is truly additive:
$$R(f + g, \mathcal{P}, \tau) = R(f, \mathcal{P}, \tau) + R(g, \mathcal{P}, \tau).$$
Using this equality and Theorem 8.23 leads to a more graceful proof that $f + g$ is
integrable and $\int_a^b (f + g) = \int_a^b f + \int_a^b g$. I encourage you to work out the details.
Example 4.3: Let $f : [a, b] \to \mathbb{R}$ be differentiable such that $f'$ is Darboux integrable.
Choose a partition $\mathcal{P} = \{a = x_0 < x_1 < \ldots < x_{n-1} < x_n = b\}$ of $[a, b]$.
Apply the Mean Value Theorem to $f$ on $[x_i, x_{i+1}]$: there is $x_i^* \in (x_i, x_{i+1})$ with
$$f(x_{i+1}) - f(x_i) = f'(x_i^*)(x_{i+1} - x_i).$$
Now $\tau = \{x_i^*\}_{i=0}^{n-1}$ gives a tagging of $\mathcal{P}$. The corresponding Riemann sum is
$$R(f', \mathcal{P}, \tau) = (f(x_1) - f(x_0)) + \ldots + (f(x_n) - f(x_{n-1})) = f(b) - f(a).$$
Thus, no matter what partition of $[a, b]$ we choose, there is some tagging such that
the corresponding Riemann sum for $\int_a^b f'$ is exactly $f(b) - f(a)$! Since the integral
of an integrable function can be evaluated as the limit of any sequence of Riemann
sums over tagged partitions of mesh approaching zero, we find that $\int_a^b f'$ is the limit
of a sequence each of whose terms has value exactly $f(b) - f(a)$, and thus the limit
is surely $f(b) - f(a)$. This is not really so different from the proof of the supplement
to the Fundamental Theorem of Calculus we gave using upper and lower sums
(and certainly, no shorter), but I confess I find it to be a more interesting argument.
Remark: By distinguishing between Darboux integrable functions and Riemann
integrable functions, we are exhibiting a fastidiousness which is largely absent in
the mathematical literature. It is more common to refer to "the Riemann integral"
to mean the integral defined either using upper and lower sums and
upper and lower integrals or using convergence of Riemann sums as the mesh of a
partition tends to zero. However, this ambiguity leads to things which are not completely
kosher: in the renowned text [R], W. Rudin gives the Darboux version of
"the Riemann integral", but then gives an exercise involving recognizing a certain
limit as the limit of a sequence of Riemann sums and equating it with the integral
of a certain function: he's cheating! Let us illustrate with an example.
Example: Compute $\lim_{n \to \infty} \sum_{k=1}^{n} \frac{n}{k^2 + n^2}$.
Solution: First observe that as a consequence of Theorem 8.26, for any Darboux
integrable function $f : [0, 1] \to \mathbb{R}$, we have
$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} f\left(\frac{k}{n}\right) = \int_0^1 f.$$
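Now observe that our limit can be recognized as a special case of this:
$$\lim_{n \to \infty} \sum_{k=1}^{n} \frac{n}{k^2 + n^2} = \lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} \frac{n^2}{k^2 + n^2} = \lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} \frac{1}{(\frac{k}{n})^2 + 1} = \lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} f\left(\frac{k}{n}\right),$$
where $f(x) = \frac{1}{x^2 + 1}$. Thus the limit is
$$\int_0^1 \frac{dx}{x^2 + 1} = \arctan 1 - \arctan 0 = \frac{\pi}{4}.$$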
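As a sanity check on this kind of computation, one can simply evaluate the sums for growing $n$ and watch them approach $\arctan 1 - \arctan 0 = \frac{\pi}{4}$. The snippet below is a minimal numerical sketch; the function name scaled_sum is illustrative and not from the text.

```python
import math

def scaled_sum(n):
    """The n-th sum: sum over k = 1..n of n / (k^2 + n^2)."""
    return sum(n / (k * k + n * n) for k in range(1, n + 1))

target = math.pi / 4  # arctan(1) - arctan(0)
for n in (10, 100, 1000, 10000):
    value = scaled_sum(n)
    print(n, value, abs(value - target))
```

The printed gap should shrink roughly in proportion to $\frac{1}{n}$, as one expects for right endpoint Riemann sums of a function with bounded derivative.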
Anyway, we have done our homework: by establishing Theorem 8.26 we have earned
the right to use "Darboux integral" and "Riemann integral" interchangeably. In
fact, however, we will generally drop both names and simply speak of "integrable
functions" and "the integral". For us, this is completely safe. However,
as mentioned before, you should be aware that in more advanced mathematical
analysis one studies other kinds of integrals, especially the Lebesgue integral.
5. Lebesgue's Theorem
5.1. Statement of Lebesgue's Theorem.
In this section we give a characterization of the Riemann-Darboux integrable functions
$f : [a, b] \to \mathbb{R}$ due to H. Lebesgue. Lebesgue's Theorem is a powerful, definitive
result: many of our previous results on Riemann-Darboux integrable functions are
immediate corollaries. Here we give an unusually elementary proof of Lebesgue's
Theorem following lecture notes of A.R. Schep.
For an interval $I$, we denote by $\ell(I)$ its length: for all $-\infty < a \le b < \infty$,
$$\ell((a, b)) = \ell([a, b)) = \ell((a, b]) = \ell([a, b]) = b - a,$$
$$\ell((a, \infty)) = \ell([a, \infty)) = \ell((-\infty, b)) = \ell((-\infty, b]) = \ell((-\infty, \infty)) = \infty.$$
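We define a subset $S \subseteq \mathbb{R}$ to have measure zero if for all $\epsilon > 0$, there is a sequence
$\{I_n\}$ of open intervals in $\mathbb{R}$ such that (i) for all $N \ge 1$, $\sum_{n=1}^{N} \ell(I_n) \le \epsilon$, and (ii)
$S \subseteq \bigcup_{n=1}^{\infty} I_n$.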
= .
In,k =
(In,k ) <
2k+1
n
n
n,k
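Proof. Fix $\epsilon > 0$, and choose a partition $\mathcal{P}$ of $[a, b]$ such that $U(f, \mathcal{P}) < C\epsilon$.
Let $I$ be the set of indices $i$ such that $S_C \cap [x_i, x_{i+1}]$ is nonempty. Thus, for $i \in I$,
$M_i = \sup(f, [x_i, x_{i+1}]) \ge C$. It follows that
$$C\epsilon > U(f, \mathcal{P}) \ge \sum_{i \in I} M_i (x_{i+1} - x_i) \ge C \sum_{i \in I} \ell([x_i, x_{i+1}]).$$
Thus $S_C \subseteq \bigcup_{i \in I} [x_i, x_{i+1}]$ and $\sum_{i \in I} \ell([x_i, x_{i+1}]) < \epsilon$. Thus $S_C$ has content zero.
b) We have $S = \bigcup_{n=1}^{\infty} S_{\frac{1}{n}}$. Each $S_{\frac{1}{n}}$ has content zero, hence certainly has measure
zero. Now apply Proposition 8.27.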
Exercise: Let $E$ be a subset of $\mathbb{R}$, and let $\overline{E}$ be the set of $x \in \mathbb{R}$ such that for all
$\epsilon > 0$, there is $y \in E$ with $|x - y| \le \epsilon$. Show that $E$ has content zero iff $\overline{E}$ has
measure zero.
5.3. Proof of Lebesgue's Theorem.
The proof of Lebesgue's Theorem uses Heine-Borel (Theorem 6.17) and also the
fact that a bounded monotone sequence is convergent (Theorem 10.11), which we
will not discuss until Chapter 9. All in all this section should probably be omitted
on a first reading, and only the extremely interested student need proceed.
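Step 0: Let $\mathcal{P} = \{a = x_0 < x_1 < \ldots < x_{n-1} < x_n = b\}$ be a partition of
$[a, b]$. As usual, for $0 \le i \le n - 1$, put
$$m_i = \inf(f, [x_i, x_{i+1}]), \quad M_i = \sup(f, [x_i, x_{i+1}]),$$
so that
$$L(f, \mathcal{P}) = \sum_{i=0}^{n-1} m_i (x_{i+1} - x_i), \quad U(f, \mathcal{P}) = \sum_{i=0}^{n-1} M_i (x_{i+1} - x_i).$$
Let $\varphi, \Phi : [a, b] \to \mathbb{R}$ be the lower and upper step functions, i.e., $\varphi$ takes the value $m_i$
on $[x_i, x_{i+1})$ and $\Phi$ takes the value $M_i$ on $[x_i, x_{i+1})$. The functions $\varphi, \Phi$ are bounded
with only finitely many discontinuities, so are Riemann-Darboux integrable by X.X;
moreover
$$\int_a^b \varphi = L(f, \mathcal{P}), \quad \int_a^b \Phi = U(f, \mathcal{P}).$$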
k = U(f, Pk )
a
f.
a
Further, since $\mathcal{P}_k \subseteq \mathcal{P}_{k+1}$ for all $k$, for all $x \in [a, b)$, the sequence $\varphi_k(x)$ is increasing
and bounded above, hence convergent, say to $\varphi(x)$; and similarly the sequence $\Phi_k(x)$
is decreasing and bounded below, hence convergent, say to $\Phi(x)$. For all $x \in [a, b)$,
we have
$$\varphi_k(x) \le \varphi(x) \le f(x) \le \Phi(x) \le \Phi_k(x),$$
and thus
$$\int_a^b \varphi_k \le \int_a^b f \le \int_a^b \Phi_k.$$
These inequalities show that $\varphi$ and $\Phi$ are Riemann-Darboux integrable and $\int_a^b \varphi =
\int_a^b \Phi = \int_a^b f$. Applying Lemma 8.29 to $\Phi - \varphi$, we get that $\varphi = \Phi$ except on a set of
measure zero. Let
Since $E$ is the union of a set of measure zero with the union of a sequence of
finite sets, it has measure zero. We claim that $f$ is continuous at every point of
$[a, b] \setminus E$, which will be enough to complete this direction of the proof.
Proof of claim: Fix $x_0 \in [a, b] \setminus E$ and $\epsilon > 0$. Since $\varphi(x_0) = \Phi(x_0)$, there is $k \in \mathbb{Z}^+$ such
that $\Phi_k(x_0) - \varphi_k(x_0) < \epsilon$. Further, since $x_0 \notin \mathcal{P}_k$, there is $\delta > 0$ such that $\Phi_k - \varphi_k$
is constant on the interval $(x_0 - \delta, x_0 + \delta)$, and for $x$ in this interval we have
$$-\epsilon < \varphi_k(x_0) - \Phi_k(x_0) \le f(x) - f(x_0) \le \Phi_k(x_0) - \varphi_k(x_0) < \epsilon,$$
so $f$ is continuous at $x_0$.
Step 2: Suppose now that $f$ is bounded on $[a, b]$ and continuous on $[a, b] \setminus E$ for a
subset $E$ of measure zero. We must show that $f$ is Riemann-Darboux integrable
on $[a, b]$. Let $M$ be such that $|f(x)| \le M$ for all $x \in [a, b]$. Fix $\epsilon > 0$. Since $E$ has
measure zero, there is a sequence $\{I_n\}_{n=1}^{\infty}$ of open intervals covering $E$ such that
N
1
i=0
iI
([ti , ti+1 ]) 2M +
([ti , ti+1 ])
iI
/
2(b a)
( )
2M + (b a)
= .
<
4M
2(b a)
Since $\epsilon > 0$ was arbitrary, $f$ is Riemann-Darboux integrable on $[a, b]$.
6. Improper Integrals
6.1. Basic definitions and first examples.
In other words, the limit exists as a real number iff $F$ is bounded; otherwise, the
limit is $\infty$: there is no oscillation! We deduce:
Proposition 8.30. Let $f : [a, \infty) \to [0, \infty)$ be integrable on every finite interval
$[a, N]$ with $N \ge a$. Then either $\int_a^\infty f$ is convergent or $\int_a^\infty f = \infty$.
In view of Proposition 8.30, we may write the two alternatives as
$\int_a^\infty f < \infty$ (convergent)
$\int_a^\infty f = \infty$ (divergent).
Example: Suppose we wish to compute $\int_{-\infty}^{\infty} e^{-x^2}$. Well, we are out of luck: this
integral cannot be destroyed (ahem, I mean computed) by any craft that we
here possess.4 The problem is that we do not know any useful expression for the
antiderivative of $e^{-x^2}$ (and in fact it can be shown that this antiderivative is not an
"elementary function"). But because we are integrating a non-negative function,
we know that the integral is either convergent or infinite. Can we at least decide
which of these alternatives is the case?
Yes we can. First, since we are integrating an even function, we have
$$\int_{-\infty}^{\infty} e^{-x^2} = 2 \int_0^{\infty} e^{-x^2}.$$
x2
2
at least for sufficiently large $x$. So it seems like a good guess that $\int_0^\infty e^{-x^2} < \infty$.
Can we formalize this reasoning?
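Yes we can. First, for all $x \ge 1$, $x \le x^2$, and since $(e^x)' = e^x > 0$, $e^x$ is increasing,
so for all $x \ge 1$, $e^x \le e^{x^2}$, and finally, for all $x \ge 1$, $e^{-x^2} \le e^{-x}$. By the
familiar (I2) property of integrals, this gives that for all $N \ge 1$,
$$\int_1^N e^{-x^2} \le \int_1^N e^{-x},$$
and taking limits as $N \to \infty$,
$$\int_1^\infty e^{-x^2} \le \int_1^\infty e^{-x}.$$
Now
$$\int_1^\infty e^{-x} = -e^{-x} \Big|_1^\infty = -(e^{-\infty} - e^{-1}) = \frac{1}{e}.$$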
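Note that we replaced $\int_0^\infty e^{-x^2}$ with $\int_1^\infty e^{-x^2}$: does that make a difference? Well,
yes, the difference between the two quantities is precisely $\int_0^1 e^{-x^2}$, but this is a
"proper integral", hence finite, so removing it changes the value of the integral
(which we don't know anyway!) but not whether it converges. However we can be
slightly more quantitative: for all $x \in \mathbb{R}$, $-x^2 \le 0$ so $e^{-x^2} \le 1$, and thus
$$\int_0^1 e^{-x^2} \le \int_0^1 1 = 1,$$
so
$$\int_{-\infty}^{\infty} e^{-x^2} = 2 \left( \int_0^1 e^{-x^2} + \int_1^\infty e^{-x^2} \right) \le 2 \left( 1 + \frac{1}{e} \right) = 2.735\ldots.$$
The exact value of the integral is known; we just don't possess the craft to find it:
$$\int_{-\infty}^{\infty} e^{-x^2} = \sqrt{\pi} = 1.772\ldots.$$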
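Both numbers in this example are easy to compare on a computer. The following sketch approximates the integral by a midpoint Riemann sum over $[-10, 10]$; the truncation to this interval is an assumption of the sketch (justified by the fact that $e^{-x^2} < e^{-100}$ outside it), and the names midpoint_rule and gaussian are illustrative.

```python
import math

def midpoint_rule(f, a, b, n):
    """Midpoint Riemann sum of f over [a, b] with n equal subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def gaussian(x):
    return math.exp(-x * x)

# Truncate the improper integral to [-10, 10]; the discarded tails are tiny.
estimate = midpoint_rule(gaussian, -10.0, 10.0, 20000)
upper_bound = 2 * (1 + 1 / math.e)   # the comparison bound from the text
exact = math.sqrt(math.pi)           # the known exact value

print(estimate, upper_bound, exact)
```

The estimate should agree with $\sqrt{\pi}$ to many decimal places and sit comfortably below the cruder bound $2(1 + \frac{1}{e})$.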
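b) If $\int_a^\infty f = \infty$, then $\int_a^\infty g = \infty$.
Proof. By property (I2) of integrals, for all $N \ge a$, since $f(x) \le g(x)$ on $[a, N]$
we have $\int_a^N f \le \int_a^N g$. Taking limits of both sides as $N \to \infty$ gives
$$(52) \quad \int_a^\infty f \le \int_a^\infty g;$$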
d) Suppose there exists $b \ge a$ such that $g(x) > 0$ for all $x \ge b$ and that $\lim_{x \to \infty} \frac{f(x)}{g(x)} =
L$ with $0 < L < \infty$. Then
$$\int_a^\infty f < \infty \iff \int_a^\infty g < \infty.$$
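Proof. For any $b \ge a$, since $f$ and $g$ are integrable on $[a, b]$, we have
$$\int_b^\infty g < \infty \iff \int_a^\infty g < \infty, \quad \int_b^\infty f < \infty \iff \int_a^\infty f < \infty.$$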
f
g < . Then by part a), a f < , contradiction.
a
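c) Since $\lim_{x \to \infty} \frac{f(x)}{g(x)} = L < \infty$, there is $b \ge a$ such that for all $x \ge b$, $\frac{f(x)}{g(x)} \le L + 1$.
Thus for all $x \ge b$ we have $f(x) \le (L + 1)g(x)$, so (S) holds.
d) Suppose $\lim_{x \to \infty} \frac{f(x)}{g(x)} = L \in (0, \infty)$. By part c), (S) holds, so by part a), if
$\int_a^\infty g < \infty$, then $\int_a^\infty f < \infty$. Moreover, since $L \neq 0$, $\lim_{x \to \infty} \frac{g(x)}{f(x)} = \frac{1}{L} \in (0, \infty)$. So
part c) applies with the roles of $f$ and $g$ reversed: if $\int_a^\infty f < \infty$ then $\int_a^\infty g < \infty$.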
Although from a strictly logical perspective part d) of Theorem 8.32 is the weakest,
it is the most useful in practice.
CHAPTER 9
Integral Miscellany
1. The Mean Value Theorem for Integrals
Theorem 9.1. Let $f, g : [a, b] \to \mathbb{R}$. Suppose that $f$ is continuous and $g$ is
integrable and non-negative. Then there is $c \in [a, b]$ such that
$$(53) \quad \int_a^b f g = f(c) \int_a^b g.$$
Thus $\frac{\int_a^b f g}{I}$ lies between the minimum and maximum values of the continuous function
$f : [a, b] \to \mathbb{R}$, so by the Intermediate Value Theorem there is $c \in [a, b]$ such
that $f(c) = \frac{\int_a^b f g}{\int_a^b g}$. Multiplying through by $\int_a^b g$, we get the desired result.
Exercise 9.1.1: Show that in the setting of Theorem 9.1 we may in fact choose
$c \in (a, b)$.
Exercise 9.1.2: Show by example that the conclusion of Theorem 9.1 becomes false,
even when $g \equiv 1$, if the hypothesis on continuity of $f$ is dropped.
Exercise 9.1.3: Suppose that $f : [a, b] \to \mathbb{R}$ is differentiable and $f'$ is continuous
on $[a, b]$. Deduce the Mean Value Theorem for $f$ from the Mean Value Theorem
for Integrals and the Fundamental Theorem of Calculus.1
2. Some Antidifferentiation Techniques
2.1. Change of Variables.
1 Since the full Mean Value Theorem does not require continuity of $f'$, this is not so exciting.
But we want to keep track of the logical relations among these important theorems.
entiable functions:
i.e., $f'$ and $g'$ are defined and continuous on $I$.
a) We have $\int f g' = f g - \int f' g$, in the sense that subtracting from $f g$ any antiderivative
of $f' g$ gives an antiderivative of $f g'$.
b) If $[a, b] \subseteq I$ then
$$\int_a^b f g' = f(b)g(b) - f(a)g(a) - \int_a^b f' g.$$
Exercise 9.2.1: a) Prove Theorem 9.2. (Yes, the point is that it's easy.)
b) Can the hypothesis on continuous differentiability of $f$ and $g$ be weakened?
Exercise 9.2.2: Use Theorem 9.2a) to find antiderivatives for the following functions.
a) $x^n e^x$ for $1 \le n \le 6$.
b) $\log x$.
c) $\arctan x$.
d) $x \cos x$.
e) $e^x \sin x$.
f) $\sin^6 x$.
g) $\sec^3 x$, given that $\log(\sec x + \tan x)$ is an antiderivative of $\sec x$.
Exercise 9.2.3: a) Show that for each $n \in \mathbb{Z}^+$, there is a unique monic (= leading
coefficient 1) polynomial $P_n(x)$ such that $\frac{d}{dx}(P_n(x)e^x) = x^n e^x$.
b) Can you observe/prove anything about the other coefficients of $P_n(x)$? (If you
did part a) of the preceding exercise, you should be able to find patterns in at least
three of the non-leading coefficients.)
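Exercise 9.2.4: Use Theorem 9.2a) to derive the following reduction formulas:
here $m, n \in \mathbb{Z}^+$ and $a, b, c \in \mathbb{R}$.
a)
$$\int \cos^n x = \frac{1}{n} \cos^{n-1} x \sin x + \frac{n-1}{n} \int \cos^{n-2} x.$$
b)
$$\int \frac{1}{(x^2 + a^2)^n} = \frac{x}{2a^2(n-1)(x^2 + a^2)^{n-1}} + \frac{2n - 3}{2a^2(n-1)} \int \frac{1}{(x^2 + a^2)^{n-1}}.$$
c)
$$\int \sin^m ax \cos^n ax = \frac{-1}{a(m+n)} \sin^{m-1} ax \cos^{n+1} ax + \frac{m-1}{m+n} \int \sin^{m-2} ax \cos^n ax$$
$$= \frac{1}{a(m+n)} \sin^{m+1} ax \cos^{n-1} ax + \frac{n-1}{m+n} \int \sin^m ax \cos^{n-2} ax.$$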
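Reduction formulas of this kind are recursions in disguise, and a short program makes that concrete. The sketch below applies the reduction formula for $\cos^n x$ on the interval $[0, \frac{\pi}{2}]$, where the boundary term $\cos^{n-1} x \sin x$ vanishes at both endpoints for $n \ge 2$, so that $I_n = \frac{n-1}{n} I_{n-2}$ with $I_0 = \frac{\pi}{2}$ and $I_1 = 1$. The choice of interval and the function names are assumptions made for illustration; the result is compared against a plain midpoint sum.

```python
import math

def cos_power_integral(n):
    """Integral of cos^n over [0, pi/2] via I_n = ((n - 1) / n) * I_{n-2}."""
    if n == 0:
        return math.pi / 2
    if n == 1:
        return 1.0
    return (n - 1) / n * cos_power_integral(n - 2)

def midpoint_rule(f, a, b, steps):
    """Midpoint Riemann sum of f over [a, b] with equal subintervals."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

by_reduction = cos_power_integral(6)
numeric = midpoint_rule(lambda x: math.cos(x) ** 6, 0.0, math.pi / 2, 2000)
print(by_reduction, numeric)
```

For $n = 6$ the recursion unwinds to $\frac{5}{6} \cdot \frac{3}{4} \cdot \frac{1}{2} \cdot \frac{\pi}{2} = \frac{5\pi}{32}$, and the numerical value should agree closely.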
$$= (0 - 0) + (n + 1) \int_0^\infty x^n e^{-x} dx \overset{\text{IH}}{=} (n + 1) n! = (n + 1)!$$
2.3. Integration of Rational Functions.
3. Approximate Integration
Despite the emphasis on integration (more precisely, antidifferentiation!) techniques
in a typical freshman calculus class, it is a dirty secret of the trade that in practice
many functions you wish to integrate do not have an "elementary" antiderivative,
i.e., one that can be written in (finitely many) terms of the elementary functions one
learns about in precalculus mathematics. Thus one wants methods for evaluating
definite integrals other than the Fundamental Theorem of Calculus. In practice it
would often be sufficient to approximate $\int_a^b f$ rather than know it exactly.2
Theorem 9.4. (Endpoint Approximation Theorem) Let $f : [a, b] \to \mathbb{R}$ be differentiable
with bounded derivative: there is $M \ge 0$ such that $|f'(x)| \le M$ for
all $x \in [a, b]$. For $n \in \mathbb{Z}^+$, let $L_n(f)$ be the left endpoint Riemann sum obtained by
dividing $[a, b]$ into $n$ equally spaced subintervals: thus for $0 \le i \le n - 1$,
$x_i^* = a + i \left( \frac{b-a}{n} \right)$ and $L_n(f) = \sum_{i=0}^{n-1} f(x_i^*) \frac{b-a}{n}$. Then
$$\left| \int_a^b f - L_n(f) \right| \le \left( \frac{(b - a)^2 M}{2} \right) \frac{1}{n}.$$
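To see the endpoint estimate in action, here is a small numerical sketch. It uses the test function $\cos$ on $[0, \frac{\pi}{2}]$, for which the integral is exactly 1 and $M = 1$ bounds the derivative; this choice of example and the name left_endpoint_sum are assumptions made for illustration.

```python
import math

def left_endpoint_sum(f, a, b, n):
    """L_n(f): left endpoint Riemann sum with n equal subintervals of [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + i * h) for i in range(n))

a, b = 0.0, math.pi / 2
exact = 1.0   # integral of cos over [0, pi/2]
M = 1.0       # |cos'(x)| = |sin x| <= 1 on [a, b]

results = []
for n in (1, 10, 100, 1000):
    error = abs(exact - left_endpoint_sum(math.cos, a, b, n))
    bound = (b - a) ** 2 * M / (2 * n)
    results.append((n, error, bound))
    print(n, error, bound)
```

In each row the observed error should be no larger than the bound, and both should shrink like $\frac{1}{n}$.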
Proof. Step 1: We establish the result for $n = 1$. Note that $L_1(f) = (b - a)f(a)$.
By the Racetrack Principle, for all $x \in [a, b]$ we have
$$-M(x - a) + f(a) \le f(x) \le M(x - a) + f(a)$$
2It would be reasonable to argue that if one can approximate the real number b f to any
a
and thus
$$\int_a^b (-M(x - a) + f(a)) \le \int_a^b f \le \int_a^b (M(x - a) + f(a)).$$
Thus
$$-\frac{M}{2}(b - a)^2 + (b - a)f(a) \le \int_a^b f \le \frac{M}{2}(b - a)^2 + (b - a)f(a),$$
which is equivalent to
$$\left| \int_a^b f - L_1(f) \right| \le \frac{M}{2}(b - a)^2.$$
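Step 2: Let $n \in \mathbb{Z}^+$. Then
$$\left| \int_a^b f - L_n(f) \right| = \left| \sum_{i=0}^{n-1} \left( \int_{x_i}^{x_{i+1}} f - f(x_i)\left(\frac{b-a}{n}\right) \right) \right| \le \sum_{i=0}^{n-1} \left| \int_{x_i}^{x_{i+1}} f - f(x_i)\left(\frac{b-a}{n}\right) \right|.$$
Step 1 applies to each subinterval $[x_i, x_{i+1}]$, which has length $\frac{b-a}{n}$, so
$$\left| \int_a^b f - L_n(f) \right| \le \sum_{i=0}^{n-1} \frac{M}{2} \left( \frac{b-a}{n} \right)^2 = \left( \frac{(b-a)^2 M}{2} \right) \frac{1}{n}.$$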
Exercise 9.3.1: Show that Theorem 9.4 holds verbatim with left endpoint sums
replaced by right endpoint sums $R_n(f)$.
Exercise 9.3.2: a) Suppose $f : [a, b] \to \mathbb{R}$ is increasing. Show that
$$0 \le \int_a^b f - L_n(f) \le (f(b) - f(a))(b - a) \frac{1}{n}.$$
b) Derive a similar result to part a) for the right endpoint sum $R_n(f)$.
c) Derive similar results to parts a) and b) if $f$ is decreasing.
In view of the preceding exercise there is certainly no reason to prefer left endpoint
sums over right endpoint sums. In fact, if you stare at pictures of left and
right endpoint sums for a while, eventually it will occur to you to take the average
of the two of them: with $\Delta = \frac{b-a}{n}$, we get
$$(55) \quad T_n(f) = \frac{1}{2} L_n(f) + \frac{1}{2} R_n(f) = \Delta \left( \frac{1}{2} f(a) + f(a + \Delta) + f(a + 2\Delta) + \ldots + f(a + (n-1)\Delta) + \frac{1}{2} f(b) \right).$$
On each subinterval $[x_i, x_{i+1}]$ the average of the left and right endpoint sums is
$$\frac{f(x_i)(x_{i+1} - x_i) + f(x_{i+1})(x_{i+1} - x_i)}{2}.$$
Thinking of the trapezoidal rule this way suggests that $T_n(f)$ should, at least for
nice $f$ and large $n$, be a better approximation to $\int_a^b f$ than either the left or right
endpoint sums. For instance, if $f$ is linear then on each subinterval we are approximating
$f$ by itself and thus $T_n(f) = \int_a^b f$. This was certainly not the case for
the endpoint rules: in fact, our proof of Theorem 9.4 showed that for each fixed
$M$, taking $f$ to be linear with slope $M$ (or $-M$) was the worst case scenario: if
we approximate a linear function $\ell(x)$ with slope $M$ on the interval $[a, b]$ by the
horizontal line given by the left endpoint (say), then we may as well assume that
$\ell(a) = 0$ and then we are approximating by the zero function, whereas $\int_a^b \ell$ is
the area of the triangle with base $(b - a)$ and height $M(b - a)$, hence $\frac{M}{2}(b - a)^2$.
Another motivation is that if $f$ is increasing, then $L_n(f)$ is a lower estimate and
$R_n(f)$ is an upper estimate, so averaging the two of them gives something which is
closer to the true value.
The following result confirms our intuitions.
Theorem 9.5. (Trapezoidal Approximation Theorem) Let $f : [a, b] \to \mathbb{R}$ be
twice differentiable with bounded second derivative: there is $M \ge 0$ such that
$|f''(x)| \le M$ for all $x \in [a, b]$. Then for all $n \in \mathbb{Z}^+$ we have
$$\left| \int_a^b f - T_n(f) \right| \le \left( \frac{(b - a)^3 M}{12} \right) \frac{1}{n^2}.$$
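A quick experiment illustrates the trapezoidal estimate. The sketch below uses the test function $e^x$ on $[0, 1]$, with exact integral $e - 1$ and $M = e$ as a bound for the second derivative; the example and the name trapezoid are assumptions chosen for illustration.

```python
import math

def trapezoid(f, a, b, n):
    """T_n(f): average of the left and right endpoint sums on n equal subintervals."""
    h = (b - a) / n
    interior = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * f(a) + interior + 0.5 * f(b))

a, b = 0.0, 1.0
exact = math.e - 1.0   # integral of exp over [0, 1]
M = math.e             # |exp''(x)| <= e on [0, 1]

rows = []
for n in (1, 4, 16, 64):
    error = abs(exact - trapezoid(math.exp, a, b, n))
    bound = (b - a) ** 3 * M / (12 * n * n)
    rows.append((n, error, bound))
    print(n, error, bound)
```

Each time $n$ is multiplied by 4 the error should drop by a factor of about 16, in line with the $\frac{1}{n^2}$ in the bound.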
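Proof. Put $\Delta = \frac{b-a}{n}$, and for $i = 0, \ldots, n - 1$, let $x_i = a + i\Delta$. Also put
$$\varphi_i : [0, \Delta] \to \mathbb{R}, \quad \varphi_i(t) = \frac{t}{2}(f(x_i) + f(x_i + t)) - \int_{x_i}^{x_i + t} f.$$
Then
$$(56) \quad \sum_{i=0}^{n-1} \varphi_i(\Delta) = T_n(f) - \int_a^b f.$$
We have $\varphi_i(0) = 0$ and $\varphi_i'(t) = \frac{1}{2}(f(x_i) - f(x_i + t)) + \frac{1}{2} t f'(x_i + t)$, so $\varphi_i'(0) = 0$, and
$$\varphi_i''(t) = -\frac{1}{2} f'(x_i + t) + \frac{1}{2} f'(x_i + t) + \frac{1}{2} t f''(x_i + t) = \frac{1}{2} t f''(x_i + t).$$
Put
$$A = \inf(f'', [a, b]), \quad B = \sup(f'', [a, b]).$$
Then for all $0 \le i \le n - 1$ and $t \in [0, \Delta]$ we have
$$\frac{A}{2} t \le \varphi_i''(t) \le \frac{B}{2} t.$$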
b
a
f.
Looking back at the formula (55) for the trapezoidal rule we notice that we have
given equal weights to all the interior sample points but only half as much weight
to the two endpoints. This is an instance of a heuristic in statistical reasoning:
extremal sample points are not as reliable as interior points. This suggests a different
kind of approximation scheme: rather than averaging endpoint Riemann sums,
let's consider the Riemann sums in which each sample point $x_i^*$ is the midpoint
of the subinterval $[x_i, x_{i+1}]$. Dividing the interval $[a, b]$ into $n$ subintervals of equal
width $\Delta = \frac{b-a}{n}$ as usual, this leads us to the midpoint rule
$$M_n(f) = \Delta \sum_{i=0}^{n-1} f\left(a + \left(i + \frac{1}{2}\right)\Delta\right).$$
b
a
f = Mn (f ).
The following result concerning the midpoint rule is somewhat surprising: it gives
a sense in which the midpoint rule is twice as good as the trapezoidal rule.
Theorem 9.6. (Midpoint Approximation Theorem) Let $f : [a, b] \to \mathbb{R}$ be twice
differentiable with bounded second derivative: there is $M \ge 0$ such that $|f''(x)| \le M$
for all $x \in [a, b]$. Then for all $n \in \mathbb{Z}^+$ we have
$$\left| \int_a^b f - M_n(f) \right| \le \left( \frac{(b - a)^3 M}{24} \right) \frac{1}{n^2}.$$
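The "twice as good" phenomenon is visible in practice. The following sketch compares the midpoint and trapezoidal errors for the test function $e^x$ on $[0, 1]$ (exact integral $e - 1$, second derivative bounded by $M = e$); the example and the function names are assumptions made for illustration.

```python
import math

def midpoint(f, a, b, n):
    """M_n(f): Riemann sum tagged at the midpoint of each of n equal subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def trapezoid(f, a, b, n):
    """T_n(f): average of the left and right endpoint sums."""
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))

a, b = 0.0, 1.0
exact = math.e - 1.0
M = math.e   # |exp''(x)| <= e on [0, 1]

table = []
for n in (1, 4, 16, 64):
    mid_err = abs(exact - midpoint(math.exp, a, b, n))
    trap_err = abs(exact - trapezoid(math.exp, a, b, n))
    mid_bound = (b - a) ** 3 * M / (24 * n * n)
    table.append((n, mid_err, trap_err, mid_bound))
    print(n, mid_err, trap_err, trap_err / mid_err)
```

The last printed column, the ratio of the trapezoidal error to the midpoint error, should hover near 2 for this function.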
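Proof. Put $\Delta = \frac{b-a}{n}$; for $i = 0, \ldots, n - 1$, let $x_i = a + (i + \frac{1}{2})\Delta$. Also put
$$\Psi_i : [0, \Delta/2] \to \mathbb{R}, \quad \Psi_i(t) = \int_{x_i - t}^{x_i + t} f - 2t f(x_i).$$
Then
$$(57) \quad \sum_{i=0}^{n-1} \Psi_i(\Delta/2) = \int_a^b f - M_n(f).$$
We have $\Psi_i(0) = 0$ and $\Psi_i'(t) = f(x_i + t) + f(x_i - t) - 2f(x_i)$, so $\Psi_i'(0) = 0$, and
$\Psi_i''(t) = f'(x_i + t) - f'(x_i - t)$. Applying the Mean Value Theorem to $f'$ on
$[x_i - t, x_i + t]$, there is $x_{i,t} \in (x_i - t, x_i + t)$ such that
$$f''(x_{i,t}) = \frac{f'(x_i + t) - f'(x_i - t)}{2t} = \frac{\Psi_i''(t)}{2t},$$
or
$$\Psi_i''(t) = 2t f''(x_{i,t}).$$
Put
$$A = \inf(f'', [a, b]), \quad B = \sup(f'', [a, b]).$$
Then for all $0 \le i \le n - 1$ and $t \in [0, \frac{\Delta}{2}]$ we have
$$2tA \le \Psi_i''(t) \le 2tB.$$
Integrate and apply the Racetrack Principle twice as in the proof of Theorem 9.5,
and plug in $t = \frac{\Delta}{2}$ to get
$$(58) \quad \frac{A}{24} \Delta^3 \le \Psi_i(\Delta/2) \le \frac{B}{24} \Delta^3.$$
Summing (58) over $i$ and using (57), we get
$$\left( \frac{(b - a)^3 A}{24} \right) \frac{1}{n^2} \le \int_a^b f - M_n(f) \le \left( \frac{(b - a)^3 B}{24} \right) \frac{1}{n^2}.$$
Since $|A|, |B| \le M$, we get the desired result:
$$\left| \int_a^b f - M_n(f) \right| \le \left( \frac{(b - a)^3 M}{24} \right) \frac{1}{n^2}.$$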
For integrable $f : [a, b] \to \mathbb{R}$ and $n \in \mathbb{Z}^+$, we define Simpson's Rule
$$S_{2n}(f) = \frac{2}{3} M_n(f) + \frac{1}{3} T_n(f).$$
Thus $S_{2n}(f)$ is a weighted average of the midpoint rule and the trapezoidal rule.
Since our previous results suggest that the midpoint rule is "twice as good" as the
trapezoidal rule, it makes some vague sense to weight in this way.
Exercise: a) Show that for any $\alpha, \beta \in \mathbb{R}$ such that $\alpha + \beta = 1$, $\alpha M_n(f) + \beta T_n(f) \to \int_a^b f$.
b) Deduce that $S_{2n}(f) \to \int_a^b f$.
Let us better justify Simpson's Rule. As with the other approximation rules, it
Lemma 9.7. For any quadratic function $f(x) = Ax^2 + Bx + C$, Simpson's Rule
is exact: $\int_a^b f = S_{2n}(f)$ for all $n \in \mathbb{Z}^+$.
Proof. Let $\Delta = \frac{b-a}{2n}$. By splitting up $[a, b]$ into $n$ pairs of subintervals, it suffices
to show $\int_a^b f = S_2(f)$. This is done by a direct, if unenlightening, calculation:
$$\int_a^b (Ax^2 + Bx + C) = \frac{A}{3}(b^3 - a^3) + \frac{B}{2}(b^2 - a^2) + C(b - a)$$
$$= (b - a) \left( \frac{A}{3}(a^2 + ab + b^2) + \frac{B}{2}(a + b) + C \right),$$
whereas
$$S_2(f) = \frac{b-a}{6} \left( f(a) + 4f\left(\frac{a+b}{2}\right) + f(b) \right)$$
$$= \frac{b-a}{6} \left( Aa^2 + Ba + C + 4\left(A\left(\frac{a+b}{2}\right)^2 + B\left(\frac{a+b}{2}\right) + C\right) + (Ab^2 + Bb + C) \right)$$
$$= \frac{b-a}{6} \left( A(a^2 + (a^2 + 2ab + b^2) + b^2) + B(a + 2a + 2b + b) + 6C \right)$$
$$= (b - a) \left( \frac{A}{3}(a^2 + ab + b^2) + \frac{B}{2}(a + b) + C \right).$$
Thus whereas the endpoint rule is an approximation by constant functions and
the trapezoidal rule is an approximation by linear functions, Simpson's Rule is an
approximation by quadratic functions. We may therefore expect it to be more accurate than either the Trapezoidal or Midpoint Rules. The following exercise (which,
again, can be solved by a direct, if unenlightening, calculation) shows that it is
even a little better than we might have expected.
Exercise 9.3.6: Suppose $f$ is a cubic polynomial. Show that Simpson's Rule is
exact: $S_{2n}(f) = \int_a^b f$.
Since Simpson's Rule is exact for polynomials of degree at most 3, and a function $f$ is a polynomial of degree at most three if and only if $f^{(4)} \equiv 0$, it stands to
reason that Simpson's Rule will be a better or worse approximation according to
the magnitude of $f^{(4)}$ on $[a, b]$. The following result confirms this.
Theorem 9.8. (Simpson Approximation Theorem) Let $f : [a, b] \to \mathbb{R}$ be four
times differentiable with bounded fourth derivative: there is $M \geq 0$ such that
$|f^{(4)}(x)| \leq M$ for all $x \in [a, b]$. Then for all even $n \in \mathbb{Z}^+$ we have
\[ \left| \int_a^b f - S_n(f) \right| \leq \left( \frac{(b-a)^5 M}{180} \right) \frac{1}{n^4}. \]
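Proof. Put $\Delta = \frac{b-a}{n}$. For $0 \leq i \leq \frac{n}{2} - 1$, put $x_i = a + (2i+1)\Delta$ and
\[ \Psi_i : [0, \Delta] \to \mathbb{R}, \quad \Psi_i(t) = \frac{1}{3} t \left( f(x_i - t) + 4f(x_i) + f(x_i + t) \right) - \int_{x_i - t}^{x_i + t} f. \]
Then $\Psi_i(0) = 0$ and
\[ \sum_{i=0}^{\frac{n}{2}-1} \Psi_i(\Delta) = S_n(f) - \int_a^b f. \tag{59} \]
Differentiating, we get
\[ \Psi_i'(t) = \frac{1}{3} t \left( f'(x_i + t) - f'(x_i - t) \right) + \frac{4}{3} f(x_i) - \frac{2}{3} f(x_i - t) - \frac{2}{3} f(x_i + t). \]
So $\Psi_i'(0) = 0$ and
\[ \Psi_i''(t) = \frac{1}{3} t \left( f''(x_i - t) + f''(x_i + t) \right) + \frac{1}{3}\left( f'(x_i + t) - f'(x_i - t) \right) + \frac{2}{3} f'(x_i - t) - \frac{2}{3} f'(x_i + t) \]
\[ = \frac{1}{3} t \left( f''(x_i + t) + f''(x_i - t) \right) - \frac{1}{3}\left( f'(x_i + t) - f'(x_i - t) \right). \]
So $\Psi_i''(0) = 0$ and
\[ \Psi_i'''(t) = \frac{1}{3} t \left( f'''(x_i + t) - f'''(x_i - t) \right) + \frac{1}{3}\left( f''(x_i + t) + f''(x_i - t) \right) - \frac{1}{3} f''(x_i + t) - \frac{1}{3} f''(x_i - t) \]
\[ = \frac{1}{3} t \left( f'''(x_i + t) - f'''(x_i - t) \right). \]
So $\Psi_i'''(0) = 0$. Applying the Mean Value Theorem to $f'''$ on $[x_i - t, x_i + t]$, there
is $\eta \in (x_i - t, x_i + t)$ such that
\[ \frac{3}{2} \Psi_i'''(t)/t^2 = \frac{f'''(x_i + t) - f'''(x_i - t)}{2t} = f^{(4)}(\eta). \]
Let $A = \inf(f^{(4)}, [a, b])$, $B = \sup(f^{(4)}, [a, b])$, so for all $i$ and all $t \in [0, \Delta]$ we have
\[ \frac{2A}{3} t^2 \leq \Psi_i'''(t) \leq \frac{2B}{3} t^2. \]
Integrate and apply the Racetrack Principle three times and plug in $t = \Delta$ to get
\[ \frac{A}{90} \Delta^5 \leq \Psi_i(\Delta) \leq \frac{B}{90} \Delta^5. \tag{60} \]
Summing (60) from $i = 0$ to $\frac{n}{2} - 1$ and using (59) and $\Delta = \frac{b-a}{n}$ gives
\[ \left( \frac{A}{180} \right) \frac{(b-a)^5}{n^4} \leq S_n(f) - \int_a^b f \leq \left( \frac{B}{180} \right) \frac{(b-a)^5}{n^4}. \]
Since $|A|, |B| \leq M$, the result follows.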
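The error bounds of Theorems 9.6 and 9.8 can be checked numerically. The following is a minimal Python sketch, not part of the original text: the function names `midpoint`, `trapezoid` and `simpson` are illustrative, the test integrand $\sin$ on $[0, \pi]$ is an arbitrary choice with $|f''|, |f^{(4)}| \leq 1$, and Simpson's Rule is computed from its definition as a weighted average of the midpoint and trapezoidal rules.

```python
import math

def midpoint(f, a, b, n):
    """M_n(f): n equal subintervals, f sampled at each midpoint."""
    d = (b - a) / n
    return d * sum(f(a + (i + 0.5) * d) for i in range(n))

def trapezoid(f, a, b, n):
    """T_n(f): n equal subintervals, endpoints weighted by 1/2."""
    d = (b - a) / n
    interior = sum(f(a + i * d) for i in range(1, n))
    return d * (0.5 * f(a) + interior + 0.5 * f(b))

def simpson(f, a, b, n):
    """S_n(f) for even n, as (2/3) M_{n/2}(f) + (1/3) T_{n/2}(f)."""
    if n % 2 != 0:
        raise ValueError("Simpson's Rule needs an even number of subintervals")
    half = n // 2
    return (2 * midpoint(f, a, b, half) + trapezoid(f, a, b, half)) / 3

# Assumed test case: f = sin on [0, pi], whose integral is exactly 2,
# and for which M = 1 bounds both |f''| and |f^(4)|.
a, b, n, M = 0.0, math.pi, 10, 1.0
exact = 2.0

mid_err = abs(exact - midpoint(math.sin, a, b, n))
simp_err = abs(exact - simpson(math.sin, a, b, n))

mid_bound = (b - a) ** 3 * M / (24 * n ** 2)     # Theorem 9.6
simp_bound = (b - a) ** 5 * M / (180 * n ** 4)   # Theorem 9.8

print(f"midpoint error {mid_err:.3e} <= bound {mid_bound:.3e}")
print(f"Simpson  error {simp_err:.3e} <= bound {simp_bound:.3e}")
```

In this test case both observed errors fall below the corresponding bounds, and Simpson's Rule with the same number of subintervals is far more accurate than the midpoint rule.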
Exercise 9.3.7: a) In the setting of the Trapezoidal Rule, suppose moreover that
$f''$ is continuous on $[a, b]$. Adapt the proof of Theorem 9.5 to show that there is
$\xi \in [a, b]$ such that
\[ \int_a^b f = T_n(f) - \left( \frac{(b-a)^3 f''(\xi)}{12} \right) \frac{1}{n^2}. \]
b) Derive similar error equalities for the Endpoint, Midpoint and Simpson rules.
The proofs of Theorems 9.5, 9.6 and 9.8 are taken from [BS, Appendix D]. They are
admirably down-to-earth, using nothing more than the Mean Value Theorem and
the Fundamental Theorem of Calculus (perhaps the two most basic and important
results of the entire course). However it must be admitted that they are rather
mysterious. Even our treatment of Simpson's Rule is a bit obscure: we introduced
it simply as a finite sum and then asked the reader to show in the exercises that it
is a weighted average of the trapezoidal and midpoint rules. A better motivation
for Simpson's Rule is given by Exercise X.Xd): it is in fact what one obtains by
splitting $[a, b]$ up into subintervals $[x_i, x_i + 2\Delta]$ and on each subinterval approximating $\int_a^b f$ by the integral of the unique quadratic function which interpolates $f$
at the points $x_i$, $x_i + \Delta$, $x_i + 2\Delta$.
This suggests that all of the rules have something to do with polynomial interpolation. This is true and will be taken up in more detail in Chapter 12,
which is concerned with various types of polynomial approximations. For now we
just mention the sobering fact that all these rules, and higher-degree analogues
of them, were already known to Newton, developed jointly with his younger
collaborator Roger Cotes in the early years of the 18th century.
4. Integral Inequalities
Theorem 9.9. (The Hermite-Hadamard Inequality) Let $f : [a, b] \to \mathbb{R}$ be convex
and continuous. Then
\[ f\left( \frac{a+b}{2} \right) \leq \frac{\int_a^b f}{b-a} \leq \frac{f(a) + f(b)}{2}. \]
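Proof. Let $s(x) = f(\frac{a+b}{2}) + m(x - \frac{a+b}{2})$ be a supporting line for $f$ at $x = \frac{a+b}{2}$,
and let $S(x) = f(a) + \left( \frac{f(b) - f(a)}{b-a} \right)(x - a)$ be the secant line from $(a, f(a))$ to $(b, f(b))$,
so $s(x) \leq f(x) \leq S(x)$ for all $x \in [a, b]$. Integrating these inequalities and dividing by $b - a$, we get
\[ \frac{\int_a^b s}{b-a} \leq \frac{\int_a^b f}{b-a} \leq \frac{\int_a^b S}{b-a}. \tag{61} \]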
[3] Note that there are $\frac{n}{2}$ terms here, half as many as in the previous results. This gives rise
to an extra factor of $\frac{1}{2}$.
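Now we have
\[ \frac{\int_a^b s(x)dx}{b-a} = \frac{\int_a^b \left( f(\frac{a+b}{2}) + m(x - \frac{a+b}{2}) \right)dx}{b-a} = f\left(\frac{a+b}{2}\right) + \frac{m}{b-a} \int_a^b \left( x - \frac{a+b}{2} \right)dx \]
\[ = f\left(\frac{a+b}{2}\right) + \frac{m}{b-a} \left( \frac{b^2}{2} - \frac{a^2}{2} - \frac{a+b}{2}(b - a) \right) = f\left(\frac{a+b}{2}\right) + \frac{m}{b-a} \cdot 0 = f\left(\frac{a+b}{2}\right), \]
\[ \frac{\int_a^b S(x)dx}{b-a} = \frac{\int_a^b \left( f(a) + \left(\frac{f(b) - f(a)}{b-a}\right)(x - a) \right)dx}{b-a} = \frac{f(a) + f(b)}{2}. \]
Substituting these evaluations of $\frac{\int_a^b s}{b-a}$ and $\frac{\int_a^b S}{b-a}$ into (61) gives
\[ f\left(\frac{a+b}{2}\right) \leq \frac{\int_a^b f}{b-a} \leq \frac{f(a) + f(b)}{2}. \]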
Exercise: Show that the hypothesis of continuity in Theorem 9.9 is not necessary:
the inequality holds for any convex $f : [a, b] \to \mathbb{R}$.
Let $[a, b]$ be a closed interval, and let $P : [a, b] \to [0, \infty)$ be a probability density:
i.e., $P$ is integrable on $[a, b]$ and $\int_a^b P = 1$. For an integrable function $f : [a, b] \to \mathbb{R}$,
we define the expected value
\[ E(f) = \int_a^b f(x)P(x)dx. \]
\[ c = \int_a^b cP(x)dx \leq \int_a^b f(x)P(x)dx \leq \int_a^b dP(x)dx = d, \]
so $\int_a^b f(x)P(x)dx = E(f) \in [c, d]$ and thus $\varphi(E(f))$ is defined. Now put $x_0 = E(f)$
and let $s(x) = mx + B$ be a supporting line for the convex function $\varphi$ at $x_0$, so
$s(x) \leq \varphi(x)$ for all $x \in [c, d]$ and $s(x_0) = \varphi(x_0)$. Now
\[ E(\varphi(f)) = \int_a^b \varphi(f(x))P(x)dx \geq \int_a^b (mf(x) + B)P(x)dx \]
\[ = m \int_a^b f(x)P(x)dx + B \int_a^b P(x)dx = mE(f) + B = s(x_0) = \varphi(E(f)). \]
Proof. Step 1: Suppose $\int_a^b |f|^p = 0$. Let $M_f$ and $M_g$ be upper bounds for $|f|$
and $|g|$ on $[a, b]$. By Lemma 8.29, the set of $x \in [a, b]$ such that $|f|^p > \delta$ has content
zero: thus, for every $\epsilon > 0$ there is a finite collection of subintervals of $[a, b]$, of total
length at most $\epsilon$, such that on the complement of those subintervals $|f|^p \leq \delta$. On
this complement, $|fg| \leq \delta^{\frac{1}{p}} M_g$ and thus the sum of the integrals of $|fg|$ over the
complement is at most $(b - a)\delta^{\frac{1}{p}} M_g$. The sum of the Riemann integrals of $|fg|$ over the
subintervals of total length $\epsilon$ is at most $\epsilon M_f M_g$, so altogether we get
\[ \left| \int_a^b fg \right| \leq \int_a^b |fg| \leq (b - a)\delta^{\frac{1}{p}} M_g + \epsilon M_f M_g. \]
Since all the other quantities are fixed and $\epsilon$, $\delta$ are arbitrary, we deduce $|\int_a^b fg| = 0$.
Thus both sides of (62) are zero in this case and the inequality holds. A similar
argument works if $\int_a^b |g|^q = 0$. Henceforth we may suppose
\[ I_f = \int_a^b |f|^p > 0, \quad I_g = \int_a^b |g|^q > 0. \]
1
p
1
q
|
f g|
|f ||
g|
+
= + = 1.
p
q
p q
a
a
a
1
b
a
|f |p
) p1 (
b
a
|g|q
) q1
f2
g2 .
a
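\[ 0 \leq \int_a^b f - L(f, \mathcal{P}) < \frac{\epsilon}{2}. \]
Let $g$ be the step function which is constantly equal to $m_i = \inf(f, [x_i, x_{i+1}])$ on
the subinterval $[x_i, x_{i+1})$ of $[a, b]$, so $g \leq f$ and $\int_a^b g = L(f, \mathcal{P})$, so
\[ 0 \leq \int_a^b (f - g) < \frac{\epsilon}{2}. \]
Now
\[ \left| \int_a^b f(x)\cos(\lambda x)dx \right| \leq \int_a^b |f(x) - g(x)||\cos(\lambda x)|dx + \left| \int_a^b g(x)\cos(\lambda x)dx \right| \]
\[ < \frac{\epsilon}{2} + \left| \sum_{i=0}^{n-1} \int_{x_i}^{x_{i+1}} m_i \cos(\lambda x)dx \right| = \frac{\epsilon}{2} + \left| \sum_{i=0}^{n-1} \frac{m_i}{\lambda} \left( \sin(\lambda x_{i+1}) - \sin(\lambda x_i) \right) \right|. \tag{63} \]
Here we have a lot of expressions of the form $|\sin(A) - \sin(B)|$ for which an obvious
upper bound is 2. Using this, the last expression in (63) is at most
\[ \frac{\epsilon}{2} + \frac{2\sum_{i=0}^{n-1} |m_i|}{\lambda}. \]
But this inequality holds for any $\lambda > 0$, so taking $\lambda$ sufficiently large we can make
the last term at most $\frac{\epsilon}{2}$ and thus $|\int_a^b f(x)\cos(\lambda x)dx| < \epsilon$.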
CHAPTER 10
Infinite Sequences
Let $X$ be a set. An infinite sequence in $X$ is given by a function $x_\bullet : \mathbb{Z}^+ \to X$.
Less formally but entirely equivalently, we are getting an ordered infinite list of
elements of $X$: $x_1, x_2, x_3, \ldots, x_n, \ldots$. Note that the function is not required to be
injective: i.e., we may have $x_i = x_j$ for $i \neq j$. In fact, a simple but important example of a sequence is a constant sequence, in which we fix some element $x \in X$
and take $x_n = x$ for all $n$.
The notion of an infinite sequence in a general set $X$ really is natural and important
throughout mathematics: for instance, if $X = \{0, 1\}$ then we are considering infinite sequences of binary digits, a concept which comes up naturally in computer
science and probability theory, among other places. But here we will focus on real
infinite sequences $x_\bullet : \mathbb{Z}^+ \to \mathbb{R}$. In place of $x_\bullet$, we will write $\{x_n\}_{n=1}^\infty$ or
even, by a slight abuse of notation, $x_n$.
We say that an infinite sequence $a_n$ converges to a real number $L$ if for all $\epsilon > 0$
there exists a positive integer $N$ such that for all $n \geq N$, we have $|a_n - L| < \epsilon$.
A sequence is said to be convergent if it converges to some $L \in \mathbb{R}$ and otherwise divergent. Further, we say a sequence $a_n$ diverges to infinity, and write
$\lim_{n \to \infty} a_n = \infty$ or $a_n \to \infty$, if for all $M \in \mathbb{R}$ there exists $N \in \mathbb{Z}^+$ such that
$n \geq N \implies a_n > M$. Finally, we define divergence to negative infinity: I leave it
to you to write out the definition.
This concept is strongly reminiscent of that of the limit of a function $f : [1, \infty) \to \mathbb{R}$
as $x$ approaches infinity. In fact, it is more than reminiscent: there is a direct connection. If $\lim_{x \to \infty} f(x) = L$, then if we form the sequence $x_n = f(n)$, it
follows that $\lim_{n \to \infty} x_n = L$. If $x_n = f(n)$ for a function $f$ which is continuous (or
better, differentiable), then the methods of calculus can often be brought to bear
to analyze the limiting behavior of $x_n$.
Given a sequence $\{x_n\}$, we say that a function $f : [1, \infty) \to \mathbb{R}$ interpolates $\{x_n\}$
if $f(n) = x_n$ for all $n \in \mathbb{Z}^+$.
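Example: Suppose $x_n = \frac{\log n}{n}$. Then $f(x) = \frac{\log x}{x}$ interpolates the sequence, and
\[ \lim_{x \to \infty} \frac{\log x}{x} \overset{LH}{=} \lim_{x \to \infty} \frac{\frac{1}{x}}{1} = 0. \]
It follows that $x_n \to 0$.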
Exercise: Let $\{a_n\}_{n=1}^\infty$ be a real sequence. Define $f : [1, \infty) \to \mathbb{R}$ as follows:
for $x \in [n, n + 1)$, $f(x) = (n + 1 - x)a_n + (x - n)a_{n+1}$.
\[ a_n = \sum_{i=1}^n \frac{1}{i} = 1 + \frac{1}{2} + \ldots + \frac{1}{n}, \]
and let
\[ b_n = \sum_{i=1}^n \frac{1}{i^2} = 1 + \frac{1}{2^2} + \ldots + \frac{1}{n^2}. \]
What is the limiting behavior of $a_n$ and $b_n$? In fact it turns out that $a_n \to \infty$ and
$b_n \to \frac{\pi^2}{6}$: whatever is happening here is rather clearly beyond the tools we have at
the moment! So we will need to develop new tools.
1. Summation by Parts
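Lemma 10.1. (Summation by Parts) Let $\{a_n\}$ and $\{b_n\}$ be two sequences. Then
for all $m \leq n$ we have
\[ \sum_{k=m}^n a_k(b_{k+1} - b_k) = (a_{n+1}b_{n+1} - a_m b_m) - \sum_{k=m}^n (a_{k+1} - a_k)b_{k+1}. \]
Proof.
\[ \sum_{k=m}^n a_k(b_{k+1} - b_k) = \sum_{k=m}^n a_k b_{k+1} - \sum_{k=m}^n a_k b_k \]
\[ = a_n b_{n+1} - a_m b_m - \sum_{k=m}^{n-1} (a_{k+1} - a_k)b_{k+1} \]
\[ = a_n b_{n+1} - a_m b_m + (a_{n+1} - a_n)b_{n+1} - \sum_{k=m}^{n} (a_{k+1} - a_k)b_{k+1} \]
\[ = a_{n+1}b_{n+1} - a_m b_m - \sum_{k=m}^{n} (a_{k+1} - a_k)b_{k+1}. \]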
The proof of Lemma 10.1 could hardly be more elementary or straightforward: literally all we are doing is regrouping some finite sums. Nevertheless the human
mind strives for understanding and rebels against its absence: without any further
explanation, summation by parts would seem (to me at least) very mysterious.
The point is that Lemma 10.1 is a discrete analogue of integration by parts:
\[ \int_a^b f g' = f(b)g(b) - f(a)g(a) - \int_a^b f' g. \]
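Since summation by parts is only a regrouping of finite sums, it is easy to test on concrete data. The following Python sketch (an illustration, not part of the original text) evaluates both sides of the identity $\sum_{k=m}^n a_k(b_{k+1} - b_k) = (a_{n+1}b_{n+1} - a_m b_m) - \sum_{k=m}^n (a_{k+1} - a_k)b_{k+1}$ for arbitrarily chosen integer sequences; the helper names `lhs` and `rhs` are hypothetical.

```python
def lhs(a, b, m, n):
    """Left side: sum over k = m..n of a_k (b_{k+1} - b_k)."""
    return sum(a[k] * (b[k + 1] - b[k]) for k in range(m, n + 1))

def rhs(a, b, m, n):
    """Right side: boundary term minus the sum of (a_{k+1} - a_k) b_{k+1}."""
    boundary = a[n + 1] * b[n + 1] - a[m] * b[m]
    correction = sum((a[k + 1] - a[k]) * b[k + 1] for k in range(m, n + 1))
    return boundary - correction

# Arbitrary integer test sequences (integers keep the arithmetic exact).
a = [k * k for k in range(12)]        # a_k = k^2
b = [3 * k + 1 for k in range(12)]    # b_k = 3k + 1

print(lhs(a, b, 2, 9), rhs(a, b, 2, 9))
```

The two printed values agree for any choice of $m \leq n$ within the range of the lists.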
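Exercise: Let $\{a_n\}_{n=1}^\infty$ and $\{b_n\}_{n=1}^\infty$ be sequences, and for $N \in \mathbb{Z}^+$, put $A_N = \sum_{i=1}^N a_i$, $B_N = \sum_{i=1}^N b_i$, $A_0 = B_0 = 0$. Show that
\[ \sum_{n=1}^N a_n b_n = \sum_{n=1}^{N-1} A_n(b_n - b_{n+1}) + A_N b_N. \tag{64} \]
\[ \left| \sum_{n=1}^N a_n b_n \right| = \left| \sum_{n=1}^{N-1} A_n(b_n - b_{n+1}) + A_N b_N \right| \]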
We will put Abel's Lemma to use... but only much later on. For now though you
might try to prove it without using the summation by parts variant (64): by doing
so, you'll probably gain some appreciation that these formulas, though in some
sense trivial, can be used in distinctly nontrivial ways.
2. Easy Facts
The following result collects some easy facts about convergence of infinite sequences.
Theorem 10.3. Let $\{a_n\}$, $\{b_n\}$, $\{c_n\}$ be real infinite sequences.
a) If $a_n = C$ for all $n$ (a constant sequence), then $a_n \to C$.
b) The limit of a convergent sequence is unique: if for $L_1, L_2 \in \mathbb{R}$ we have $a_n \to L_1$
and $a_n \to L_2$, then $L_1 = L_2$.
c) If $a_n \to L$ and $b_n \to M$ then:
(i) For all $C \in \mathbb{R}$, $Ca_n \to CL$.
(ii) $a_n + b_n \to L + M$.
(iii) $a_n b_n \to LM$.
(iv) If $M \neq 0$, $\frac{a_n}{b_n} \to \frac{L}{M}$.
d) If $a_n \leq b_n$ for all $n$, $a_n \to L$ and $b_n \to M$, then $L \leq M$.
e) If $a \leq b$ are such that $a_n \in [a, b]$ for all $n$ and $a_n \to L$, then $L \in [a, b]$.
f) (Three Sequence Principle) Suppose $c_n = a_n + b_n$. Then it is not possible for
exactly two of the three sequences $\{a_n\}$, $\{b_n\}$, $\{c_n\}$ to be convergent: if any two are
convergent, then so is the third.
g) Suppose there exists $N \in \mathbb{Z}^+$ such that $b_n = a_n$ for all $n \geq N$. Then for any
$L \in [-\infty, \infty]$, $a_n \to L \iff b_n \to L$.
Most of these facts are quite familiar, and the ones that may not be are routine.
In fact, every part of Theorem 10.3 holds verbatim for functions of a continuous
variable approaching infinity. Hence one method of proof would be to establish
these for functions (or maintain that we have known these facts for a long time)
and then apply the Sequence Interpolation Theorem. But be honest with yourself:
for each part of Theorem 10.3 for which you have an iota of doubt as to how to
prove it, please take some time right now to write out a careful proof.
We say that a sequence $\{a_n\}$ is eventually constant if there is $C \in \mathbb{R}$ and $N \in \mathbb{Z}^+$
such that $a_n = C$ for all $n \geq N$. It is easy to see that if such a $C$ exists then it
is unique, and we call such a $C$ the eventual value of the sequence. Of course
an eventually constant sequence converges to its eventual value, e.g. by applying
parts a) and g) of Theorem 10.3, but really this is almost obvious in any event.
Proposition 10.4. Let $\{a_n\}$ be an infinite sequence with values in the integers
$\mathbb{Z}$. Then $a_n$ is convergent iff it is eventually constant.
Proof. As above, it is clear that an eventually constant sequence is convergent.
Conversely, suppose $a_n \to L \in \mathbb{R}$. First we claim that $L \in \mathbb{Z}$. If not, the distance
from $L$ to the nearest integer is a positive number, say $\epsilon$. But since $a_n \to L$, there
exists $N \in \mathbb{Z}^+$ such that for all $n \geq N$, $|a_n - L| < \epsilon$. But the interval $(L - \epsilon, L + \epsilon)$
contains no integers: contradiction.
Now take $\epsilon = 1$ in the definition of convergence: there is $N \in \mathbb{Z}^+$ such that for
$n \geq N$, we have $|a_n - L| < 1$, and since $a_n$ and $L$ are both integers this implies
$a_n = L$. Thus the sequence is eventually constant with eventual value $L$.
Proposition 10.4 goes a long way towards explaining why we have described a function from $\mathbb{Z}^+$ to $\mathbb{R}$ as semi-discrete. A function from $\mathbb{Z}^+$ to $\mathbb{Z}^+$ is fully discrete,
and thus the limiting behavior of such functions is very limited.
Exercise: A subset $S \subset \mathbb{R}$ is discrete if for all $x \in S$, there is $\epsilon > 0$ such that the
only element of $S$ which lies in $(x - \epsilon, x + \epsilon)$ is $x$.
a) Which of the following subsets of $\mathbb{R}$ are discrete?
(i) A finite set.
(ii) The integers $\mathbb{Z}$.
(iii) The rational numbers $\mathbb{Q}$.
(iv) The set $\{\frac{1}{n} \mid n \in \mathbb{Z}^+\}$ of reciprocals of positive integers.
We are therefore able to speak of bounded sequences just as for bounded functions: i.e., in terms of the image...um, I mean the term set.
A sequence $a : \mathbb{Z}^+ \to \mathbb{R}$ is bounded above if its term set is bounded above:
that is, if there exists $M \in \mathbb{R}$ such that $a_n \leq M$ for all $n \in \mathbb{Z}^+$. Otherwise we say
the sequence is unbounded above. Similarly, we say $a$ is bounded below if its
term set is bounded below: that is, if there exists $m \in \mathbb{R}$ such that $m \leq a_n$ for all
$n \in \mathbb{Z}^+$. Otherwise we say the sequence is unbounded below. Finally, a sequence
is bounded if it is both bounded above and bounded below, and a sequence is
unbounded if it is not bounded.
Proposition 10.9. Let $\{a_n\}_{n=1}^\infty$ be a weakly increasing sequence.
a) If the sequence converges to $L \in \mathbb{R}$, then $L$ is the least upper bound of the term
set $A = \{a_n \mid n \in \mathbb{Z}^+\}$.
b) Conversely, if the term set $A$ has a least upper bound $L < \infty$, then $a_n \to L$.
Proof. a) First we claim $L = \lim_{n \to \infty} a_n$ is an upper bound for the term
set $A$. Indeed, suppose not: then there is $N \in \mathbb{Z}^+$ with $L < a_N$. But since the
sequence is weakly increasing, this implies that for all $n \geq N$, $L < a_N \leq a_n$. Thus
if we take $\epsilon = a_N - L$, then for no $n \geq N$ do we have $|a_n - L| < \epsilon$, contradicting
our assumption that $a_n \to L$. Second we claim $L$ is the least upper bound. Indeed,
suppose not: then there is $L'$ such that for all $n \in \mathbb{Z}^+$, $a_n \leq L' < L$. Let $\epsilon = L - L'$.
For no $n$ do we have $|a_n - L| < \epsilon$, contradicting our assumption that $a_n \to L$.
b) Let $\epsilon > 0$. We need to show that for all but finitely many $n \in \mathbb{Z}^+$ we have
$-\epsilon < L - a_n < \epsilon$. Since $L$ is the least upper bound of $A$, in particular $L \geq a_n$ for
all $n \in \mathbb{Z}^+$, so $L - a_n \geq 0 > -\epsilon$. Next suppose that there are infinitely many terms
$a_n$ with $L - a_n \geq \epsilon$, or $L \geq a_n + \epsilon$. But if this inequality holds for infinitely many
terms of the sequence, then because $a_n$ is increasing, it holds for all terms of the
sequence, and this implies that $L - \epsilon \geq a_n$ for all $n$, so that $L - \epsilon$ is a smaller upper
bound for $A$ than $L$, contradiction.
Remark: In the previous result we have not used the completeness property of
$\mathbb{R}$, and thus it holds for sequences with values in the rationals $\mathbb{Q}$ (and where by
"converges" we mean "converges to a rational number"!) or really in any ordered field.
By combining this with the least upper bound axiom, we get a much stronger result.
Theorem 10.10. Let $\{a_n\}_{n=1}^\infty$ be a weakly increasing real sequence. Let $L \in
(-\infty, \infty]$ be the least upper bound of the term set $A$. Then $a_n \to L$.
This is so important as to be worth spelling out very carefully. We get:
Theorem 10.11. (Monotone Sequence Theorem) a) Every bounded monotone
real sequence is convergent. More precisely:
b) Let $\{a_n\}$ be weakly increasing. Then if $\{a_n\}$ is bounded above, it converges to its
least upper bound, and if it is unbounded above it diverges to $\infty$.
c) Let $\{a_n\}$ be weakly decreasing. Then if $\{a_n\}$ is bounded below, it converges to its
greatest lower bound, and if it is unbounded below it diverges to $-\infty$.
In fact, in proving the Monotone Sequence Theorem we did not just invoke the
completeness of the real field: we used its full force, in the following sense.
1, 1, 1, 1, . . .
1, 1, 1, 1, . . .
1, 1, 1, 1, . . . .
There are other choices, many other choices. In fact, a real sequence can be obtained as a subsequence of $\{a_n\}$ iff it takes values in $\{\pm 1\}$.
But note that something very interesting happened in the passage from our original
sequence to each of the first two subsequences. The original sequence (65) is not
convergent, due to oscillation. However, the subsequence (66) is constant, hence
convergent to its constant value $1$. Similarly, the subsequence (67) converges to
its constant value $-1$.
Let's recap: we began with a sequence (65) which did not converge due to oscillation. However, by choosing appropriate subsequences we were able to remove the
oscillation, resulting (in this case, at least) in a convergent subsequence. (There
are also lots of subsequences which are inappropriate for this purpose.)
Example: Let $a_n = n$, so the sequence is
\[ 1, 2, 3, 4, \ldots \tag{68} \]
Among its subsequences are:
\[ 1, 3, 5, 7, \ldots \tag{69} \]
\[ 2, 4, 6, 8, \ldots \tag{70} \]
\[ 1, 4, 9, 16, \ldots \tag{71} \]
\[ 1, 2, 4, 8, \ldots \tag{72} \]
And so forth. Indeed, the subsequences of (68) are precisely the increasing sequences with values in the positive integers. Note moreover that our sequence (68)
fails to converge, but not due to oscillation. It is an increasing sequence which is
not bounded above, and thus it diverges to infinity. For that matter, so do the
subsequences (69), (70), (71), (72), and a little thought suggests that every subsequence will have this property. Thus, passing to a subsequence can cure divergence
due to oscillation, but not divergence to infinity.
Example (subsequences of a convergent sequence):
We are now well-prepared for the formal definition. In fact, we practically saw
it in the example above. Given a real sequence $\{a_n\}$, we view it as a function
$a_\bullet : \mathbb{Z}^+ \to \mathbb{R}$, $n \mapsto a_n$. To obtain a subsequence we choose an increasing sequence
$n_\bullet : \mathbb{Z}^+ \to \mathbb{Z}^+$, $k \mapsto n_k$ and form the composite function
\[ a_\bullet \circ n_\bullet : \mathbb{Z}^+ \to \mathbb{R}, \quad k \mapsto a_{n_k}. \]
Less formally, we choose an increasing list $n_1 < n_2 < \ldots < n_k < \ldots$ of positive integers
and use this to tell us which terms of the sequence to take, getting
\[ a_{n_1}, a_{n_2}, a_{n_3}, \ldots, a_{n_k}, \ldots. \]
Let's formalize these observations about what passing to subsequences does for us.
Exercise: Let $n_\bullet : \mathbb{Z}^+ \to \mathbb{Z}^+$ be increasing. Show that for all $k \in \mathbb{Z}^+$, $n_k \geq k$.
Proposition 10.13. Let $\{a_n\}$ be a real sequence, $L \in [-\infty, \infty]$, and suppose
that $a_n \to L$. Then for any subsequence $\{a_{n_k}\}_{k=1}^\infty$, we have $a_{n_k} \to L$.
Our task now is to explain why we have two different results called the Bolzano-Weierstrass Theorem. In fact they are really equivalent results, which means
roughly that it is much easier to deduce each from the other than it is to prove
either one. Indeed:
Assume the Bolzano-Weierstrass Theorem for sequences, and let $X$ be an infinite subset of $[a, b]$. Since $X$ is infinite, we can choose a sequence $\{x_n\}_{n=1}^\infty$ of
distinct elements of $X$. Since $x_n \in X \subset [a, b]$, we have $a \leq x_n \leq b$ for all $n$;
in particular, $\{x_n\}$ is bounded, so by Bolzano-Weierstrass for sequences there is a
subsequence $x_{n_k}$ converging to some $L \in [a, b]$. We claim that $L$ is a limit point for
$X$: indeed, for any $\epsilon > 0$, there is $K \in \mathbb{Z}^+$ such that for all $k \geq K$, $|x_{n_k} - L| < \epsilon$:
since the terms are distinct, at most one of them is equal to $L$ and thus the interval
$(L - \epsilon, L + \epsilon)$ contains infinitely many elements of $X$.
Assume the Bolzano-Weierstrass Theorem for subsets, and let $\{x_n\}$ be a bounded
sequence: thus there are $a, b \in \mathbb{R}$ with $a \leq x_n \leq b$ for all $n \in \mathbb{Z}^+$. Let $X = \{x_n \mid n \in
\mathbb{Z}^+\}$ be the term set of the sequence. If $X$ is finite, then the sequence has a constant (hence convergent) subsequence. Otherwise $X$ is infinite and we may apply
Bolzano-Weierstrass for subsets to get a limit point $L$ of $X$. This implies: there is
$n_1 \in \mathbb{Z}^+$ such that $|x_{n_1} - L| < 1$; having chosen such an $n_1$, there is $n_2 \in \mathbb{Z}^+$ such
that $n_2 > n_1$ and $|x_{n_2} - L| < \frac{1}{2}$: continuing in this way we build a subsequence
$\{x_{n_k}\}$ such that for all $k \in \mathbb{Z}^+$, $|x_{n_k} - L| < \frac{1}{k}$, and thus $x_{n_k} \to L$.
7. Partial Limits; Limits Superior and Inferior
7.1. Partial Limits.
For a real sequence $\{a_n\}$, we say that an extended real number $L \in [-\infty, \infty]$
is a partial limit of $\{a_n\}$ if there exists a subsequence $a_{n_k}$ such that $a_{n_k} \to L$.
Lemma 10.23. Let $\{a_n\}$ be a real sequence. Suppose that $L$ is a partial limit of
some subsequence of $\{a_n\}$. Then $L$ is also a partial limit of $\{a_n\}$.
Exercise: Prove Lemma 10.23. (Hint: this comes down to the fact that a subsequence of a subsequence is itself a subsequence.)
Theorem 10.24. Let $\{a_n\}$ be a real sequence.
a) $\{a_n\}$ has at least one partial limit $L \in [-\infty, \infty]$.
b) The sequence $\{a_n\}$ is convergent iff it has exactly one partial limit $L$ and $L$ is
finite, i.e., $L \neq \pm\infty$.
c) $a_n \to \infty$ iff $\infty$ is the only partial limit.
d) $a_n \to -\infty$ iff $-\infty$ is the only partial limit.
Proof. a) If $\{a_n\}$ is bounded, then by Bolzano-Weierstrass there is a finite
partial limit $L$. If $\{a_n\}$ is unbounded above, then by Theorem 10.19a), $+\infty$ is a
partial limit. If $\{a_n\}$ is unbounded below, then by Theorem 10.19b), $-\infty$ is a
partial limit. Every sequence is either bounded, unbounded above or unbounded
below (and the last two are not mutually exclusive), so there is always at least one
partial limit.
b) Suppose that $L \in \mathbb{R}$ is the unique partial limit of $\{a_n\}$. We wish to show that
$a_n \to L$. First observe that by Theorem 10.19, $\{a_n\}$ must be bounded above and
below, for otherwise it would have an infinite partial limit. So choose $M \in \mathbb{R}$ such
Case 2: The sequence diverges to $-\infty$. Then $-\infty$ is the only partial limit and
thus $\overline{L} = -\infty$ is the largest partial limit.
Case 3: The sequence is bounded above and does not diverge to $-\infty$. Then it
has a finite partial limit $L$ (it may or may not also have $-\infty$ as a partial limit), so
$\overline{L} \in (-\infty, \infty)$. We need to find a subsequence converging to $\overline{L}$.
For each $k \in \mathbb{Z}^+$, $\overline{L} - \frac{1}{k} < \overline{L}$, so there exists a subsequence converging to some
$L' > \overline{L} - \frac{1}{k}$. In particular, there exists $n_k$ such that $a_{n_k} > \overline{L} - \frac{1}{k}$. It follows from
these inequalities that the subsequence $a_{n_k}$ cannot have any partial limit which
is less than $\overline{L}$; moreover, by the definition of $\overline{L} = \sup \mathcal{L}$ the subsequence cannot
have any partial limit which is strictly greater than $\overline{L}$: therefore by the process of
elimination we must have $a_{n_k} \to \overline{L}$.
Similarly we define the limit infimum $\underline{L}$ of a real sequence to be the infimum of the
set of all partial limits. By reflection, the proof of Theorem 10.25 shows that $\underline{L}$ is a
partial limit of the sequence, i.e., there exists a subsequence $a_{n_k}$ such that $a_{n_k} \to \underline{L}$.
Here is a very useful characterization of the limit supremum of a sequence $\{a_n\}$: it is
the unique extended real number $\overline{L}$ such that for any $M > \overline{L}$, $\{n \in \mathbb{Z}^+ \mid a_n \geq M\}$
is finite, and such that for any $m < \overline{L}$, $\{n \in \mathbb{Z}^+ \mid a_n \geq m\}$ is infinite.
Exercise:
a) Prove the above characterization of the limit supremum.
b) State and prove an analogous characterization of the limit infimum.
Proposition 10.26. For any real sequence $a_n$, we have
\[ \overline{L} = \lim_{n \to \infty} \sup_{k \geq n} a_k \tag{73} \]
and
\[ \underline{L} = \lim_{n \to \infty} \inf_{k \geq n} a_k. \tag{74} \]
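Formulas (73) and (74) can be explored numerically on an example. The sketch below is an illustration only, not part of the original text: the sequence $a_n = (-1)^n(1 + \frac{1}{n})$, the helper names `tail_sup` and `tail_inf`, and the finite window size are all assumptions. A computer can only inspect finitely many terms, but for this particular sequence the supremum (resp. infimum) of the tail is attained within the first two terms of the tail, so a finite window gives the true value.

```python
def a(n):
    """Example sequence a_n = (-1)^n (1 + 1/n), with partial limits -1 and 1."""
    return (-1) ** n * (1 + 1 / n)

def tail_sup(n, window=1000):
    """sup of a_k over n <= k < n + window (equals sup over k >= n for this a)."""
    return max(a(k) for k in range(n, n + window))

def tail_inf(n, window=1000):
    """inf of a_k over n <= k < n + window (equals inf over k >= n for this a)."""
    return min(a(k) for k in range(n, n + window))

for n in (1, 10, 100, 10_000):
    print(n, tail_sup(n), tail_inf(n))
```

As $n$ grows, the tail suprema decrease towards $1$ and the tail infima increase towards $-1$, consistent with limit supremum $1$ and limit infimum $-1$ for this sequence.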
what number the integral should be. And that's what made Darboux's criterion
so useful: we used it to show that every continuous function and every monotone
function is integrable, but of course without having to find in any sense the value
of the integral. (This inexplicitness is not entirely a good thing, and the main point
of our discussion of Riemann sums was to make the convergence more explicit.)
Upshot: it would be nice to have some way of expressing/proving that a sequence
is convergent which doesn't have the limit of the sequence built into it. This is
exactly what Cauchy sequences are for.
8.2. Cauchy sequences.
A sequence $\{a_n\}_{n=1}^\infty$ in $\mathbb{R}$ is Cauchy if for all $\epsilon > 0$, there exists $N \in \mathbb{Z}^+$ such
that for all $m, n \geq N$, $|a_m - a_n| < \epsilon$.
The above proofs did not use completeness, and thus the results hold in any ordered
field. In contrast, the next result does crucially use the Dedekind completeness of
the real numbers, in the form of the Bolzano-Weierstrass Theorem.
Theorem 10.32. (Cauchy Criterion) Any real Cauchy sequence is convergent.
Proof. Let $\{a_n\}$ be a real Cauchy sequence. By Proposition 10.30, $\{a_n\}$ is
bounded. By Bolzano-Weierstrass there exists a convergent subsequence. Finally,
by Proposition 10.31, this implies that $\{a_n\}$ is convergent.
It can be further shown that an Archimedean ordered field $F$ in which every
Cauchy sequence is convergent must be Dedekind complete. However, there are
non-Archimedean (and thus necessarily not Dedekind complete) ordered fields
in which every Cauchy sequence converges. (In fact there are non-Archimedean
ordered fields in which the only Cauchy sequences are the eventually constant sequences!) But we had better not get into such matters here.
Exercise: Is there a "Cauchy criterion" for a function to be differentiable at a
point? (Hint: yes. See [Ma56].)
9. Geometric Sequences and Series
A geometric sequence is a sequence $\{x_n\}_{n=0}^\infty$ of real numbers such that there is
a fixed real number $r$ with $x_{n+1} = rx_n$ for all $n$. We call $r$ the geometric ratio since, if
$x_n \neq 0$ for all $n$, we have $\frac{x_{n+1}}{x_n} = r$.
Theorem 10.33. (Geometric Sequences) Let $\{x_n\}$ be a geometric sequence with
geometric ratio $r$ and $x_0 \neq 0$.
a) We have $x_n = x_0 r^n$.
b) If $|r| < 1$, then $x_n \to 0$.
c) If $r = 1$, then $x_n \to x_0$.
d) If $r = -1$, then the sequence is $x_0, -x_0, x_0, -x_0, \ldots$, which diverges.
e) If $|r| > 1$, then $|x_n| \to \infty$.
Proof. a) A simple induction argument which is left to the reader.
Suppose $x_n \to L \in [-\infty, \infty]$. Since $x_{n+1} = rx_n$, $L = \lim_{n \to \infty} x_{n+1} =
\lim_{n \to \infty} rx_n = rL$. The solutions to $L = rL$ for $r \in \mathbb{R}$, $L \in [-\infty, \infty]$ are: $(1, L)$ for
any $L$, $(r, 0)$ for any $r$, and $(r, \infty)$ and $(r, -\infty)$ for positive $r$. Now:
b) If $|r| < 1$, then $|x_{n+1}| = |r||x_n| < |x_n|$, so $\{|x_n|\}$ is decreasing and bounded
below by $0$. By the Monotone Sequence Theorem, $|x_n|$ converges to a finite, nonnegative number $L$, and by the above analysis $L = 0$. Since $|x_n| \to 0$, $x_n \to 0$.
c), d) These are immediate and are just recorded for easy reference.
e) If $|r| > 1$, then $|x_{n+1}| = |r||x_n| > |x_n|$, so the sequence $\{|x_n|\}$ is increasing and
bounded below by $|x_0|$. By the Monotone Sequence Theorem, $|x_n| \to L \in (|x_0|, \infty]$,
and by the above analysis we must have $L = \infty$.
For $x_0, r \in \mathbb{R}$, we define the finite geometric series
\[ S_n = x_0 + x_0 r + \ldots + x_0 r^n. \tag{75} \]
We claim that (quite luckily!) we are able to obtain a simple closed-form expression for $S_n$. We may dispose of the case $r = 1$: then $S_n = (n + 1)x_0$. When $r \neq 1$, we
use a very standard trick: multiplying (75) by $r$ gives
\[ rS_n = x_0 r + \ldots + x_0 r^n + x_0 r^{n+1}, \tag{76} \]
and subtracting (76) from (75) gives $(1 - r)S_n = x_0(1 - r^{n+1})$, so
\[ S_n = \sum_{k=0}^n x_0 r^k = x_0 \left( \frac{1 - r^{n+1}}{1 - r} \right). \]
Having this closed form enables us to determine easily whether the sequence $\{S_n\}$
converges, and if so, to what.
Theorem 10.34. (The Geometric Series) For $x_0, r \in \mathbb{R}$, let
\[ S_n = \sum_{k=0}^n x_0 r^k = x_0 + x_0 r + \ldots + x_0 r^n. \]
a) If $|r| < 1$, then $\lim_{n \to \infty} S_n = \frac{x_0}{1 - r}$.
b) If $|r| \geq 1$ and $x_0 \neq 0$, then $\{S_n\}$ diverges.
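The closed form for $S_n$ and the limit in Theorem 10.34a) can be verified with exact rational arithmetic. The following Python sketch is illustrative only: the function names `partial_sum` and `closed_form` and the sample values $x_0 = 3$, $r = \frac{1}{2}$ are assumptions made for the example.

```python
from fractions import Fraction

def partial_sum(x0, r, n):
    """S_n = x0 + x0*r + ... + x0*r^n, summed term by term."""
    return sum(x0 * r ** k for k in range(n + 1))

def closed_form(x0, r, n):
    """Closed form: (n+1)*x0 if r == 1, else x0*(1 - r^(n+1))/(1 - r)."""
    if r == 1:
        return (n + 1) * x0
    return x0 * (1 - r ** (n + 1)) / (1 - r)

x0, r = Fraction(3), Fraction(1, 2)
limit = x0 / (1 - r)  # Theorem 10.34a): equals 6 here

for n in (0, 1, 5, 10):
    s = partial_sum(x0, r, n)
    print(n, s, closed_form(x0, r, n), limit - s)
```

For these sample values the gap `limit - s` is exactly $6 \cdot (\frac{1}{2})^{n+1}$, which shrinks to $0$ as $n$ grows.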
Recall that $f : I \to \mathbb{R}$ is Lipschitz if (78) holds for some $C > 0$. If so, for
any $\epsilon > 0$, we may (independently of $x$) choose $\delta < \frac{\epsilon}{C}$, and then $|x - y| \leq \delta
\implies |f(x) - f(y)| \leq C\delta < \epsilon$. So a Lipschitz function is uniformly continuous. A
contraction mapping is a Lipschitz mapping with Lipschitz constant $C < 1$.
A function $f : I \to \mathbb{R}$ is weakly contractive if:
\[ \forall x \neq y \in I, \ |f(x) - f(y)| < |x - y|. \]
A function $f : I \to \mathbb{R}$ is a short map if $1$ is a Lipschitz constant for $f$, i.e.,
\[ \forall x, y \in I, \ |f(x) - f(y)| \leq |x - y|. \]
Thus contractive implies weakly contractive implies short map.
Exercise: Let $f : I \to I$ be differentiable.
x2 + 1.
0<x1
1, x = 0
x
2,
the literature, the term "attracting point" is often used for a weaker, local property which is studied in the following exercise.
Exercise: Let $f : I \to I$ be a function. A point $L \in I$ is a locally attracting
point if there exists $\delta > 0$ such that $f$ maps $[L - \delta, L + \delta]$ into itself and the
restriction of $f$ to $[L - \delta, L + \delta]$ has $L$ as an attracting point.
Now let $f : I \to I$ be $C^1$, and let $L \in I$ be a fixed point of $f$.
a) Show that if $|f'(L)| < 1$, then $L$ is a locally attracting point for $f$.
b) Show that if $|f'(L)| > 1$, then $L$ is not a locally attracting point for $f$.
c) Exhibit $f : [-1, 1] \to \mathbb{R}$ such that $L = 0$ is a fixed point, $f'(0) = 1$, and $0$ is a
locally attracting point for $f$.
d) Exhibit $f : [-1, 1] \to \mathbb{R}$ such that $L = 0$ is a fixed point, $f'(0) = 1$ and $0$ is not
a locally attracting point for $f$.
10.3. The Contraction Mapping Theorem.
Theorem 10.39. (Contraction Mapping Theorem) Let $I \subset \mathbb{R}$ be a closed interval, and let $f : I \to I$ be contractive with constant $C \in (0, 1)$.
a) Then $f$ is attractive: there is a unique fixed point $L$, and for all $x_0 \in I$, the
sequence $\{x_n\}$ of iterates of $x_0$ under $f$ converges to $L$.
b) Explicitly, for all $x_0 \in I$ and all $n \in \mathbb{N}$,
\[ |x_n - L| \leq \left( \frac{|x_0 - f(x_0)|}{1 - C} \right) C^n. \tag{79} \]
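Proof. Step 0: By Theorem 10.36, $f$ has at most one fixed point.
Step 1: Let $x_0 \in I$, fix $\epsilon > 0$, let $N$ be a large positive integer to be chosen (rather
sooner than) later, let $n \geq N$ and let $k \geq 0$. Then
\[ |x_{n+k} - x_n| \leq |x_{n+k} - x_{n+k-1}| + |x_{n+k-1} - x_{n+k-2}| + \ldots + |x_{n+1} - x_n| \]
\[ \leq C^{n+k-1}|x_1 - x_0| + C^{n+k-2}|x_1 - x_0| + \ldots + C^n|x_1 - x_0| \]
\[ = |x_1 - x_0| C^n \left( 1 + C + \ldots + C^{k-1} \right) = |x_1 - x_0| C^n \left( \frac{1 - C^k}{1 - C} \right) \leq \left( \frac{|x_1 - x_0|}{1 - C} \right) C^n. \]
Since $|C| < 1$, $C^n \to 0$, so we may choose $N$ such that for all $n \geq N$ and all $k \in \mathbb{N}$,
$\left( \frac{|x_1 - x_0|}{1 - C} \right) C^n < \epsilon$.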
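The explicit estimate (79) is easy to watch in action. The sketch below is an illustration under stated assumptions, not part of the original text: the map $f(x) = \frac{x}{2} + 1$ on $I = [0, 4]$ is a hypothetical example chosen because it is contractive with constant $C = \frac{1}{2}$ and has fixed point $L = 2$, and the helper name `iterates` is invented for the example.

```python
def iterates(f, x0, n):
    """Return [x_0, x_1, ..., x_n], where x_{k+1} = f(x_k)."""
    xs = [x0]
    for _ in range(n):
        xs.append(f(xs[-1]))
    return xs

def f(x):
    # Assumed example: maps [0, 4] into itself and satisfies
    # |f(x) - f(y)| = |x - y| / 2, so it is contractive with C = 1/2.
    return x / 2 + 1

C, L, x0 = 0.5, 2.0, 0.0
xs = iterates(f, x0, 20)

# Right-hand side of (79) for n = 0, ..., 20.
bounds = [abs(x0 - f(x0)) / (1 - C) * C ** n for n in range(21)]

for n in (0, 1, 5, 20):
    print(n, xs[n], abs(xs[n] - L), bounds[n])
```

For this particular map the error $|x_n - L|$ equals the bound in (79) for every $n$, so the estimate cannot be improved in general.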
Exercise: a) Show that there is a unique real number $x$ such that $\cos x = x$.
b) Explain how to use (say) a handheld calculator to approximate $x$ to as many
decimal places of accuracy as your calculator carries.
Exercise: Let $f : [a, b) \to [a, b)$ be a contractive map with constant $C \in (0, 1)$.
a) Show that $\lim_{x \to b^-} f(x)$ exists (as a real number!), and that by defining $f$ at
$b$ to be this limit, the extended function $f : [a, b] \to \mathbb{R}$ remains contractive with
constant $C$.[5]
b) Use part a) to extend the Contraction Mapping Principle to contractions $f :
[a, b) \to [a, b)$, with the proviso that the unique fixed point may be $b$.
c) Give an example of a contraction mapping on $[a, b)$ with fixed point $b$.
d) State and prove a version of the Contraction Mapping Principle valid for an
arbitrary interval.
Exercise: Let $f : I \to I$ be a function. For $n \in \mathbb{Z}^+$, let $f^n = f \circ \ldots \circ f$ be
the $n$th iterate of $f$. Suppose that for some $N \in \mathbb{Z}^+$, $f^N$ is contractive.
a) Show that any fixed point of $f$ is a fixed point of $f^N$.
b) Show that $f^N$ has a unique fixed point $L \in I$, and deduce that $f$ has at most
one fixed point in $I$.
c) Show that $f(L)$ is also a fixed point for $f^N$, and deduce that $f(L) = L$: thus $f$
has a unique fixed point in $I$.
d) Show that in fact $L$ is an attracting point for $f$.
e) Consider the function $f : [0, 2] \to [0, 2]$ defined by
\[ f(x) = \begin{cases} 0, & x \in [0, 1] \\ 1, & x \in (1, 2]. \end{cases} \]
Show that $f$ is not a contraction but $f \circ f$ is a contraction.
10.4. Further Attraction Theorems.
Let $I$ be an interval, and let $f : I \to I$ be continuous. Recall that $L \in I$ is
attracting for $f$ if for all $x_0 \in I$, the sequence of iterates of $x_0$ under $f$ converges to
$L$; above, we showed that if $L$ is attracting for $f$ then $L$ is the unique fixed point
of $f$. Let us say $\sup I$ is an attracting point for $f$ if for all $x_0 \in I$, the sequence
of iterates of $x_0$ under $f$ approaches $\sup I$ ($\sup I = \infty$ iff $I$ is unbounded above).
Similarly, we say $\inf I$ is an attracting point for $f$ if for all $x_0 \in I$, the sequence of
iterates of $x_0$ under $f$ approaches $\inf I$ ($\inf I = -\infty$ iff $I$ is unbounded below).
Theorem 10.40. Let $I \subset \mathbb{R}$ be an interval, and let $f : I \to I$ be continuous.
a) At least one of the following holds:
(i) $f$ has a fixed point.
(ii) $\sup I$ is an attracting point for $f$.
(iii) $\inf I$ is an attracting point for $f$.
b) If $I = [a, b]$ then $f$ has a fixed point.
Proof. a) Define $g : I \to \mathbb{R}$ by $g(x) = f(x) - x$. A fixed point of $f$ is precisely
a root of $g$, so to prove part a) we may assume that $g$ has no roots on $I$ and show
that either (ii) or (iii) holds. If the continuous function $g$ has no roots then either
(I) $f(x) > x$ for all $x \in I$ or (II) $f(x) < x$ for all $x \in I$. We will show (I) $\implies$ (ii);
the very similar proof that (II) $\implies$ (iii) is left to the reader.
Suppose $f(x) > x$ for all $x \in I$. Then, for any $x_0 \in I$, the sequence of iterates
is increasing. This sequence cannot converge to any element $L$ of $I$, for by Lemma
10.37, $L$ would then be a fixed point of $f$, contradiction. So $x_n$ must approach
$\sup I$.
b) Let $I = [a, b]$ and let $x_0 \in I$. If there were no fixed point, then by part a) the
sequence of iterates of $x_0$ under $f$ would approach either $\sup I = b$ or $\inf I = a$.
But both are elements of $I$, so by Lemma 10.37 either $a$ or $b$ is a fixed point.
[5] This is the hardest part of the problem. The result on extension will be much easier after
you have read the next section, which treats similar but more general problems. You may wish to
assume this part for now and come back to it later.
Theorem 10.41. Let $I \subset \mathbb{R}$ be an interval, and let $f : I \to I$ be weakly
contractive. Then $f$ has an attracting point in $[\inf I, \sup I]$.
Proof. Step 0: If $f$ has no fixed point in $I$, then by Theorem 10.40 either
$\sup I$ or $\inf I$ is an attracting point for $f$. Thus we may assume that $f$ has a fixed
point $L \in I$, and our task is to show that $L$ is attracting for $f$.
Step 1: Let $x_0 \in I$, and for $n \in \mathbb{N}$, put $d_n = |x_n - L|$. If for some $N$ we have
$x_N = L$, then $x_n = L$ for all $n \ge N$, so the sequence of iterates converges to $L$.
So we may assume $d_n > 0$ for all $n \in \mathbb{N}$, and then by weak contractivity $\{d_n\}$ is
decreasing. Since 0 is a lower bound, there is $d \ge 0$ such that $d_n \to d$. Observe
that the desired conclusion that $x_n \to L$ is equivalent to $d = 0$.
Step 2: For all $n \in \mathbb{N}$, $x_n \in [L - d_0, L + d_0]$, so $\{x_n\}$ is bounded; by Bolzano-Weierstrass, there is a convergent subsequence, say $x_{n_k} \to y$. As $k \to \infty$ we have:
\[ |x_{n_k} - L| \to |y - L|, \quad |x_{n_k} - L| = d_{n_k} \to d, \]
\[ |f(x_{n_k}) - L| = |x_{n_k + 1} - L| = d_{n_k + 1} \to d, \quad |f(x_{n_k}) - L| \to |f(y) - L|, \]
so $|y - L| = d = |f(y) - L| = |f(y) - f(L)|$. If $d > 0$ then $y \neq L$, and weak contractivity
would give $|f(y) - f(L)| < |y - L|$, a contradiction. So $d = 0$.
Remark: The case $I = \mathbb{R}$ of Theorem 10.41 is due to A. Beardon [Be06]. The case
$I = [a, b]$ is an instance of a general result of M. Edelstein [Ed62] which is described
in the next section. Our proof of Theorem 10.41 draws ideas from Beardon's proof
and also from K. Conrad's treatment of Edelstein's Theorem in [CdC].
Exercise: Show that the function $f : [0, 1] \to [0, 1]$ defined by $f(x) = \frac{1}{1+x}$ is
weakly contractive but not contractive.
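To get a feeling for what Theorem 10.41 predicts in this example, here is a small numerical sketch in Python. It is an illustration only, not a solution to the exercise: the helper names `iterate` and `ratio` are ours, and floating point arithmetic proves nothing. The fixed point used is the positive root of $L^2 + L - 1 = 0$, i.e. $L = \frac{\sqrt{5}-1}{2}$.

```python
import math

def f(x):
    """The map f(x) = 1/(1+x) on [0, 1]."""
    return 1.0 / (1.0 + x)

def iterate(g, x0, steps):
    """Return the result of applying g to x0 `steps` times."""
    x = x0
    for _ in range(steps):
        x = g(x)
    return x

def ratio(x, y):
    """|f(x) - f(y)| / |x - y| for x != y."""
    return abs(f(x) - f(y)) / abs(x - y)

# Fixed point of f in [0, 1]: L = 1/(1+L), i.e. L^2 + L - 1 = 0.
L = (math.sqrt(5.0) - 1.0) / 2.0

for x0 in (0.0, 0.25, 0.5, 1.0):
    x60 = iterate(f, x0, 60)
    print(x0, x60, abs(x60 - L))

# The ratio stays below 1 but creeps up toward 1 near x = y = 0,
# so no single contraction constant C < 1 can work on all of [0, 1].
print(ratio(0.0, 1e-6))
```

Every starting point tried lands on $L$ to within rounding error, while the printed ratio suggests why $f$ fails to be contractive.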
you will surely learn about systems of differential equations, and the most important result in this area is that (with suitable hypotheses and precisions, of course)
every system of differential equations has a unique solution. The now standard
proof of this seminal result uses Banach's Fixed Point Theorem!7
Let $X$ be a metric space. Then the statement "Every sequence with values in
$X$ admits a convergent subsequence" is certainly meaningful, but (as is already
the case with intervals on the real line!) whether it is true or false certainly depends on $X$. We say that a metric space $X$ is compact if every sequence with values
in $X$ admits a convergent subsequence.
In fact every compact metric space is complete, and the proof again requires no
ideas other than the ones we have already developed: indeed, if $\{x_n\}$ is a Cauchy
sequence in a compact metric space, then by definition of compactness it admits
a subsequence $x_{n_k}$ converging to some $L \in X$, and then we prove exactly as we
did before that if a subsequence of a Cauchy sequence converges to $L$ then the
Cauchy sequence itself must converge to $L$.
Above we showed that a weakly contractive map on a closed, bounded (and thus
compact) interval was attractive and attributed this to M. Edelstein. What Edelstein actually showed was the following result.
Theorem 10.43. (Edelstein [Ed62]) Let $X$ be a compact metric space, and let
$f : X \to X$ be a weakly contractive mapping. Then $f$ is attractive:
a) There is a unique fixed point $L$ of $f$.
b) For all $x_0 \in X$, the sequence of iterates of $x_0$ under $f$ converges to $L$.
Proof. We follow [CdC].
Step 0: The Extreme Value Theorem has the following generalization to compact
metric spaces: if $X$ is a compact metric space, then any continuous function $f :
X \to \mathbb{R}$ is bounded and attains its maximum and minimum values. Recall that
we gave two proofs of the Extreme Value Theorem for $X = [a, b]$: one using Real
Induction and one using the fact that every sequence in $[a, b]$ admits a convergent
subsequence. Since by definition this latter property holds in a compact metric
space, it is the second proof that we wish to carry over here, and we ask the
interested reader to check that it does carry over with no new difficulties.
Step 1: We claim $f$ has a fixed point. Here we need a new argument: the one we
gave for $X = [a, b]$ used the Intermediate Value Theorem, which is not available in
our present context. So here goes: let $g : X \to \mathbb{R}$ by $g(x) = d(x, f(x))$. Since $f$ is
continuous, so is $g$ and thus the Extreme Value Theorem applies and in particular
$g$ attains a minimum value: there is $L \in X$ such that for all $y \in X$, $d(L, f(L)) \le
d(y, f(y))$. But if $f(L) \neq L$, then by weak contractivity we have $d(f(L), f(f(L))) <
d(L, f(L))$, i.e., $g(f(L)) < g(L)$, contradiction. So $L$ is a fixed point for $f$.
Step 2: The argument of Step 2 of the proof of Theorem 10.41 carries over directly
to show that $L$ is an attracting point for $f$.
7 In fact the title of [Ba22] indicates that applications to integral equations are being explicitly
considered. An integral equation is very similar in spirit to a differential equation: it is an
equation relating an unknown function to its integral(s).
+ = ,
3
3
contradiction.
Theorem 10.47 gives necessary and sufficient conditions for a function $f : S \to \mathbb{R}$ to
admit a uniformly continuous extension to $I$. When $I$ is closed and bounded, this
solves our extension problem because uniform continuity is equivalent to continuity.
However, for an interval which is not closed and bounded, being uniformly continuous is much stronger than being continuous. For instance, the only polynomial
functions which are uniformly continuous on all of $\mathbb{R}$ are the polynomials of degree at most 1.
However there is an easy, but sneaky, way to "soup up" the Extension Theorem: if $I$ is not closed and bounded, we don't have to extend $f$ to $I$ all at once;
we can do it via an increasing sequence of closed, bounded subintervals of $I$.
Corollary 10.48. Let $S$ be a dense subset of $\mathbb{R}$, and let $f : S \to \mathbb{R}$. The
following are equivalent:
(i) For all $M > 0$, the restriction of $f$ to $S \cap [-M, M]$ is uniformly continuous.
(ii) There is a unique extension of $f$ to a continuous function $\tilde{f} : \mathbb{R} \to \mathbb{R}$.
Proof. (i) $\implies$ (ii): Applying the Extension Theorem to $f$ on $S \cap [-M, M]$,
we get a unique continuous extension $\tilde{f}_M : [-M, M] \to \mathbb{R}$. Since the extensions
are unique, for any $x \in \mathbb{R}$, we may choose any $M$ with $M \ge |x|$ and define $\tilde{f}(x) =
\tilde{f}_M(x)$: this does not depend on which $M$ we choose. Moreover, since continuity at
a point depends only on the behavior of the function in small intervals around the
point, it is immediate that any function constructed from an expanding family of
continuous functions in this way is continuous on all of $\mathbb{R}$.
CHAPTER 11
Infinite Series
1. Introduction
1.1. Zeno Comes Alive: a historico-philosophical introduction.
Humankind has had a fascination with, but also a suspicion of, infinite processes
for well over two thousand years. Historically, the first kind of infinite process that
received detailed attention was the idea of adding together infinitely many quantities; or, to put a slightly different emphasis on the same idea, to divide a whole
into infinitely many parts.
The idea that any sort of infinite process can lead to a finite answer has been
deeply unsettling to philosophers going back at least to Zeno,1 who believed that a
convergent infinite process was absurd. Since he had a sufficiently penetrating eye
to see convergent infinite processes all around him, he ended up at the lively conclusion that many everyday phenomena are in fact absurd (so, in his view, illusory).
We will get the flavor of his ideas by considering just one paradox, Zeno's arrow paradox. Suppose that an arrow is fired at a target one stadium away. Can
the arrow possibly hit the target? Call this event $E$. Before $E$ can take place,
the arrow must arrive at the halfway point to its target: call this event $E_1$. But
before it does that it must arrive halfway to the halfway point: call this event $E_2$.
We may continue in this way, getting infinitely many events $E_1, E_2, \ldots$ all of which
must happen before the event $E$. That infinitely many things can happen before
some predetermined thing Zeno regarded as absurd, and he concluded that the arrow never hits its target. Similarly he deduced that all motion is impossible.2
Nowadays we have the mathematical tools to retain Zeno's basic insight (that a
single interval of finite length can be divided into infinitely many subintervals)
without regarding it as distressing or paradoxical. Indeed, assuming the arrow
takes one second to hit its target and (rather unrealistically) travels at uniform
velocity, we know exactly when these events $E_i$ take place: $E_1$ takes place after
$\frac{1}{2}$ seconds, $E_2$ takes place after $\frac{1}{4}$ seconds, and so forth: $E_n$ takes place after $\frac{1}{2^n}$
seconds. Nevertheless there is something interesting here: we have divided the total
time of the trip into infinitely many parts, and the conclusion seems to be that
\[ \frac{1}{2} + \frac{1}{4} + \ldots + \frac{1}{2^n} + \ldots = 1. \tag{80} \]
1 Zeno of Elea, ca. 490 BC - ca. 430 BC.
2 One has to wonder whether he got out much.
So now we have a problem not in the philosophical sense but in the mathematical
one: what meaning can be given to the left hand side of (80)? Certainly we ought
to proceed with some caution in our desire to add infinitely many things together
and get a finite number: the expression
\[ 1 + 1 + \ldots + 1 + \ldots \]
represents an infinite sequence of events, each lasting one second. Surely the aggregate of these events takes forever.
We see then that we dearly need a mathematical definition of an infinite series
of numbers and also of its sum. Precisely, if $a_1, a_2, \ldots$ is a sequence of real numbers
and $S$ is a real number, we need to give a precise meaning to the equation
\[ a_1 + \ldots + a_n + \ldots = S. \]
So here it is. We do not try to add everything together all at once. Instead, we
form from our sequence $\{a_n\}$ an auxiliary sequence $\{S_n\}$ whose terms represent
adding up the first $n$ numbers. Precisely, for $n \in \mathbb{Z}^+$, we define
\[ S_n = a_1 + \ldots + a_n. \]
The associated sequence $\{S_n\}$ is said to be the sequence of partial sums of the
sequence $\{a_n\}$; when necessary we call $\{a_n\}$ the sequence of terms. Finally, we
say that the infinite series $\sum_{n=1}^\infty a_n$ converges to $S$ (or has sum $S$) if
$\lim_{n \to \infty} S_n = S$. If the sequence $\{S_n\}$ converges then the infinite
series is convergent; if the sequence $\{S_n\}$ diverges then the infinite series $\sum_{n=1}^\infty a_n$
is divergent.
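Thus the trick of defining the infinite sum $\sum_{n=1}^\infty a_n$ is to do everything in terms
of the associated sequence of partial sums $S_n = a_1 + \ldots + a_n$.
In particular, by $\sum_{n=1}^\infty a_n = \infty$ we mean that the sequence of partial sums diverges
to $\infty$. For instance, for the series $1 + 1 + \ldots + 1 + \ldots$ considered above we have
$a_n = 1$ and $S_n = n$ for all $n$, so
\[ \sum_{n=1}^\infty 1 = \lim_{n \to \infty} n = \infty. \]
Next consider the series $\frac{1}{2} + \frac{1}{4} + \ldots + \frac{1}{2^n} + \ldots$, in which $a_n = \frac{1}{2^n}$ for all $n$, so
\[ S_n = \frac{1}{2} + \ldots + \frac{1}{2^n}. \tag{81} \]
There is a standard trick for evaluating such finite sums. Namely, multiplying (81)
by $\frac{1}{2}$ and subtracting it from (81) all but the first and last terms cancel, and we get
\[ \frac{1}{2} S_n = S_n - \frac{1}{2} S_n = \frac{1}{2} - \frac{1}{2^{n+1}}, \]
and thus
\[ S_n = 1 - \frac{1}{2^n}. \]
It follows that
\[ \sum_{n=1}^\infty \frac{1}{2^n} = \lim_{n \to \infty} \left(1 - \frac{1}{2^n}\right) = 1. \]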
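For readers who like to see such identities confirmed by machine, here is a minimal Python sketch that computes the partial sums of the geometric series exactly and compares them with the closed form $S_n = 1 - \frac{1}{2^n}$. The function name `partial_sum` is ours; exact rational arithmetic is used so that no rounding enters.

```python
from fractions import Fraction

def partial_sum(n):
    """S_n = 1/2 + 1/4 + ... + 1/2^n, computed exactly."""
    return sum(Fraction(1, 2**k) for k in range(1, n + 1))

for n in (1, 2, 3, 10, 20):
    s = partial_sum(n)
    # Compare with the closed form S_n = 1 - 1/2^n.
    print(n, s, float(s), s == 1 - Fraction(1, 2**n))
```

The partial sums increase toward 1 without ever reaching it, which is exactly what the statement $\sum_{n=1}^\infty \frac{1}{2^n} = 1$ means.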
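Consider the series $\sum_{n=1}^\infty \frac{1}{n^2+n}$.
We have
\[ S_1 = \frac{1}{2}, \]
\[ S_2 = S_1 + a_2 = \frac{1}{2} + \frac{1}{6} = \frac{2}{3}, \]
\[ S_3 = S_2 + a_3 = \frac{2}{3} + \frac{1}{12} = \frac{3}{4}, \]
\[ S_4 = S_3 + a_4 = \frac{3}{4} + \frac{1}{20} = \frac{4}{5}. \]
It certainly seems as though we have $S_n = 1 - \frac{1}{n+1} = \frac{n}{n+1}$
for all $n \in \mathbb{Z}^+$. If this is
the case, then we have
\[ \sum_{n=1}^\infty a_n = \lim_{n \to \infty} \frac{n}{n+1} = 1. \]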
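To confirm this, we show by induction that for all $n \in \mathbb{Z}^+$,
\[ S_n = \sum_{k=1}^n \frac{1}{k^2+k} = \frac{n}{n+1}. \tag{82} \]
When $n = 1$ both sides equal $\frac{1}{2}$. Assuming that (82) holds for $n$, we have
\[ S_{n+1} = S_n + \frac{1}{(n+1)^2 + (n+1)} \stackrel{\text{IH}}{=} \frac{n}{n+1} + \frac{1}{n^2+3n+2} = \frac{n}{n+1} + \frac{1}{(n+1)(n+2)} \]
\[ = \frac{(n+2)n + 1}{(n+1)(n+2)} = \frac{(n+1)^2}{(n+1)(n+2)} = \frac{n+1}{n+2}. \]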
This approach will work whenever we have some reason to look for and successfully
guess a simple closed form identity for $S_n$. But in fact, as we will see in the coming
sections, in practice it is exceedingly rare that we are able to express the partial
sums $S_n$ in a simple closed form. Trying to do this for each given series would turn
out to be a discouraging waste of time. We need some insight into why the series
$\sum_{n=1}^\infty \frac{1}{n^2+n}$ happens to work out so nicely.
Well, if we stare at the induction proof long enough we will eventually notice how
convenient it was that the denominator of $\frac{1}{(n+1)^2+(n+1)}$ factors into $(n+1)(n+2)$.
Equivalently, we may look at the factorization $\frac{1}{n^2+n} = \frac{1}{(n+1)(n)}$. Does this remind
us of anything? I certainly hope so: this is a partial fractions decomposition. In
this case, we know there are constants $A$ and $B$ such that
\[ \frac{1}{n(n+1)} = \frac{A}{n} + \frac{B}{n+1}. \]
I leave it to you to confirm, in whatever manner seems best to you, that we have
\[ \frac{1}{n(n+1)} = \frac{1}{n} - \frac{1}{n+1}. \]
This makes the behavior of the partial sums much more clear! Indeed we have
\[ S_1 = 1 - \frac{1}{2}, \]
\[ S_2 = S_1 + a_2 = \left(1 - \frac{1}{2}\right) + \left(\frac{1}{2} - \frac{1}{3}\right) = 1 - \frac{1}{3}, \]
\[ S_3 = S_2 + a_3 = \left(1 - \frac{1}{3}\right) + \left(\frac{1}{3} - \frac{1}{4}\right) = 1 - \frac{1}{4}, \]
and so on. This much simplifies the inductive proof that $S_n = 1 - \frac{1}{n+1}$. In fact
induction is not needed: we have that
\[ S_n = a_1 + \ldots + a_n = \left(1 - \frac{1}{2}\right) + \left(\frac{1}{2} - \frac{1}{3}\right) + \ldots + \left(\frac{1}{n} - \frac{1}{n+1}\right) = 1 - \frac{1}{n+1}, \]
the point being that every term except the first and last is cancelled out by some
other term. Thus once again $\sum_{n=1}^\infty \frac{1}{n^2+n} = \lim_{n \to \infty} \left(1 - \frac{1}{n+1}\right) = 1$.
Finite sums which cancel in this way are often called telescoping sums, I believe
after those old-timey collapsible nautical telescopes. In general an infinite
lucky), then in order to prove it you do not need to do anything so fancy as mathematical induction (or fancier!). Rather, it will suffice to just compute that $S_1 = a_1$
and for all $n \ge 2$, $S_n - S_{n-1} = a_n$. This is the discrete analogue of the fact that if
you want to show that $\int f \, dx = F$ (i.e., you already have a function $F$ which you
believe is an antiderivative of $f$) then you need not use any integration techniques
whatsoever but may simply check that $F' = f$.
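The check just described is mechanical enough to hand to a computer. The following Python sketch tests a guessed closed form against the terms $a_n = \frac{1}{n^2+n}$ for finitely many $n$; the names `check_closed_form`, `a` and `S_guess` are ours, and of course a finite check is evidence rather than proof (the identity $S_n - S_{n-1} = a_n$ still has to be verified algebraically for all $n$).

```python
from fractions import Fraction

def a(n):
    """The n-th term 1/(n^2 + n)."""
    return Fraction(1, n * n + n)

def S_guess(n):
    """The guessed closed form n/(n+1) for the n-th partial sum."""
    return Fraction(n, n + 1)

def check_closed_form(term, S, n_max):
    """Check S(1) == term(1) and S(n) - S(n-1) == term(n) for 2 <= n <= n_max."""
    if S(1) != term(1):
        return False
    return all(S(n) - S(n - 1) == term(n) for n in range(2, n_max + 1))

print(check_closed_form(a, S_guess, 1000))
# A wrong guess is rejected immediately:
print(check_closed_form(a, lambda n: Fraction(n, n + 2), 1000))
```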
Exercise: Let $n \in \mathbb{Z}^+$. We define the $n$th harmonic number $H_n = \sum_{k=1}^n \frac{1}{k} =
1 + \frac{1}{2} + \ldots + \frac{1}{n}$. Show that for all $n \ge 2$, $H_n \in \mathbb{Q} \setminus \mathbb{Z}$. (Suggestion: more specifically,
show that for all $n \ge 2$, when $H_n$ is written as a fraction $\frac{a}{b}$ in lowest terms, the
denominator $b$ is divisible by 2.)3
Exercise: Let $k \in \mathbb{Z}^+$. Use the method of telescoping sums to give an exact formula
for $\sum_{n=1}^\infty \frac{1}{n(n+k)}$ in terms of the harmonic number $H_k$ of the previous exercise.
Given an infinite series $\sum_{n=1}^\infty a_n$ there are two basic questions to ask:
Question 11.1. a) Is the series convergent? b) If it is convergent, what is its sum?
For geometric series we can answer both
of these questions: for a geometric series $\sum_{n=N}^\infty c r^n$, we know that the series converges iff $|r| < 1$ and in that case its sum is $\frac{c r^N}{1-r}$. We should keep this success
story in mind, both because geometric series are ubiquitous and turn out to play
a distinguished role in the theory in many ways, but also because other examples
of series in which we can answer Question 11.1b) (i.e., determine the sum of a
convergent series) are much harder to come by. Frankly, in a standard course
on infinite series one all but forgets about Question 11.1b) and the game becomes
simply to decide whether a given series is convergent or not. In these notes we try
to give a little more attention to the second question in some of the optional sections.
In any case, there is a certain philosophy at work when one is, for the
moment,
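\[ \sum_{k=1}^N a_k + T_n = a_1 + \ldots + a_N + a_{N+1} + \ldots + a_{N+n} = S_{N+n}. \]
It follows that if $\lim_{n \to \infty} T_n = \sum_{n=N+1}^\infty a_n$ exists, then so does
$\lim_{n \to \infty} S_{N+n} = \lim_{n \to \infty} S_n = \sum_{n=1}^\infty a_n$. Conversely if $\sum_{n=1}^\infty a_n$ exists, then
so does $\lim_{n \to \infty} \left( \sum_{k=1}^N a_k + T_n \right) = \sum_{k=1}^N a_k + \lim_{n \to \infty} T_n$, hence $\lim_{n \to \infty} T_n =
\sum_{n=N+1}^\infty a_n$ exists.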
Similarly, if we are so inclined (and we will be, on occasion), we could add finitely
many terms to the series, or for that matter change finitely many terms of the
series, without affecting the convergence. We record this as follows.
Proposition 11.2. The addition, removal or altering of any finite number of
terms in an infinite series does not affect the convergence or divergence of the series
(though of course it may change the sum of a convergent series).
As the reader has probably already seen for herself, reading someone else's formal
proof of this result can be more tedious than enlightening, so we leave it to the
reader to construct a proof that she finds satisfactory.
Proposition 11.3. Let $\sum_{n=1}^\infty a_n$, $\sum_{n=1}^\infty b_n$ be two infinite series, and let $\alpha$ be
any real number.
a) If $\sum_{n=1}^\infty a_n = A$ and $\sum_{n=1}^\infty b_n = B$ are both convergent, then the series $\sum_{n=1}^\infty (a_n +
b_n)$ is also convergent, with sum $A + B$.
... $\sum_n b_n$, $\sum_n c_n$ converge, then so does the third.
an
Warning: The converse of Theorem 11.4 is not valid! It may well be the case
that $a_n \to 0$ but $\sum_n a_n$ diverges. Later we will see many examples. Still, when put
under duress (e.g. while taking an exam) many students can will themselves into
believing that the converse might be true. Don't do it!
Exercise: Let $\frac{P(x)}{Q(x)}$ be a rational function. The polynomial $Q(x)$ has only finitely
many roots, so we may choose $N \in \mathbb{Z}^+$ such that for all $n \ge N$, $Q(n) \neq 0$. Show
that if $\deg P \ge \deg Q$, then $\sum_{n=N}^\infty \frac{P(n)}{Q(n)}$
is divergent.
series $\sum_{n=1}^\infty a_n$ converges if and only if: for every $\epsilon > 0$, there exists $N_0 \in \mathbb{Z}^+$ such
that for all $N \ge N_0$ and all $k \in \mathbb{N}$, $\left| \sum_{n=N}^{N+k} a_n \right| < \epsilon$.
Note that taking $k = 0$ in the Cauchy criterion, we recover the Nth Term Test for
convergence (Theorem 11.4). It is important to compare these two results: the Nth
Term Test gives a very weak necessary condition for the convergence of the series.
In order to turn this into a necessary and sufficient condition we must require not
only that $a_n \to 0$ but also $a_n + a_{n+1} \to 0$ and indeed that $a_n + \ldots + a_{n+k} \to 0$ for
a $k$ which is allowed to be (in a certain precise sense) arbitrarily large.
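Let us call a sum of the form $\sum_{n=N}^{N+k} a_n = a_N + a_{N+1} + \ldots + a_{N+k}$ a finite tail
of the series. Then the Cauchy criterion says that for every $\epsilon > 0$ and all sufficiently large $N$,
\[ \sup_{k \in \mathbb{N}} \left| \sum_{n=N}^{N+k} a_n \right| \le \epsilon. \]
In other words the supremum of the absolute values of the finite tails $\left| \sum_{n=N}^{N+k} a_n \right|$
is at most $\epsilon$. This gives a nice way of thinking about the Cauchy criterion.
4 This means: $a_n \to L \neq 0$, or $a_n$ diverges.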
we may express
convergence by writing $\sum_n a_n < \infty$ and divergence by writing $\sum_n a_n = \infty$.
3.2. The Comparison Test.
Example: Consider the series $\sum_{n=1}^\infty \frac{1}{n 2^n}$. Its sequence of partial sums is
\[ T_n = 1 \cdot \left(\frac{1}{2}\right) + \frac{1}{2} \cdot \left(\frac{1}{4}\right) + \ldots + \frac{1}{n} \cdot \left(\frac{1}{2^n}\right). \]
Unfortunately we do not (yet!) know a closed form expression for $T_n$, so it is
not possible for us to compute $\lim_{n \to \infty} T_n$ directly. But if we just want to decide
whether the series converges, we can compare it with the geometric series $\sum_{n=1}^\infty \frac{1}{2^n}$:
\[ S_n = \frac{1}{2} + \frac{1}{4} + \ldots + \frac{1}{2^n}. \]
Since $\frac{1}{n} \le 1$ for all $n \in \mathbb{Z}^+$, we have that for all $n \in \mathbb{Z}^+$, $\frac{1}{n 2^n} \le \frac{1}{2^n}$. Summing these
inequalities from $k = 1$ to $n$ gives $T_n \le S_n$ for all $n$. By our work with geometric
series we know that $S_n \le 1$ for all $n$ and thus also $T_n \le 1$ for all $n$. Therefore our
given series has partial sums bounded above by 1, so $\sum_{n=1}^\infty \frac{1}{n 2^n} \le 1$. In particular,
the series converges.
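The comparison in this example can be watched numerically. The sketch below (with function names `T` and `S` chosen to match the partial sums in the text) computes both sequences exactly and checks that $T_n \le S_n \le 1$ for a range of $n$; it illustrates the argument and is not a substitute for it.

```python
from fractions import Fraction

def T(n):
    """Partial sum of 1/(k * 2^k) for k = 1..n, computed exactly."""
    return sum(Fraction(1, k * 2**k) for k in range(1, n + 1))

def S(n):
    """Partial sum of the geometric series 1/2^k for k = 1..n."""
    return sum(Fraction(1, 2**k) for k in range(1, n + 1))

for n in (1, 2, 5, 10, 50):
    print(n, float(T(n)), float(S(n)), T(n) <= S(n) <= 1)
```

The partial sums $T_n$ increase and stay below 1, so they converge; the comparison alone does not tell us what the limit is.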
Example: Consider the series $\sum_{n=1}^\infty \sqrt{n}$. Again, a closed form expression for $T_n =$
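Theorem 11.8. (Comparison Test) Let $\sum_{n=1}^\infty a_n$, $\sum_{n=1}^\infty b_n$ be two series with
non-negative terms, and suppose that $a_n \le b_n$ for all $n \in \mathbb{Z}^+$. Then
\[ \sum_{n=1}^\infty a_n \le \sum_{n=1}^\infty b_n. \]
In particular: if $\sum_n b_n < \infty$ then $\sum_n a_n < \infty$, and if $\sum_n a_n = \infty$ then $\sum_n b_n = \infty$.
Proof. There is really nothing new to say here, but just to be sure: write
\[ S_n = a_1 + \ldots + a_n, \quad T_n = b_1 + \ldots + b_n. \]
Since $a_k \le b_k$ for all $k$ we have $S_n \le T_n$ for all $n$ and thus
\[ \sum_{n=1}^\infty a_n = \sup_n S_n \le \sup_n T_n = \sum_{n=1}^\infty b_n. \]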
Example: Consider the series
\[ \sum_{n=0}^\infty \frac{1}{n!} = 1 + 1 + \frac{1}{2} + \frac{1}{2 \cdot 3} + \frac{1}{2 \cdot 3 \cdot 4} + \ldots + \frac{1}{2 \cdot 3 \cdots n} + \ldots. \]
We would like to show that the series converges by comparison, but what to compare
it to? Well, there is always the geometric series! Observe that the sequence $n!$
grows faster than any geometric sequence $r^n$ in the sense that $\lim_{n \to \infty} \frac{n!}{r^n} = \infty$. Taking
reciprocals, it follows that for any $r > 1$ we will have $\frac{1}{n!} < \frac{1}{r^n}$, not necessarily
for all $n \in \mathbb{Z}^+$, but at least for all sufficiently large $n$. For instance, one easily
establishes by induction that $\frac{1}{n!} < \frac{1}{2^n}$ if and only if $n \ge 4$. Putting $a_n = \frac{1}{n!}$ and
$b_n = \frac{1}{2^n}$ we cannot apply the Comparison Test because we have $a_n \le b_n$ for all
$n \ge 4$ rather than for all $n \ge 0$. But this objection is more worthy of a bureaucrat
than a mathematician: certainly the idea of the Comparison Test is applicable here:
\[ \sum_{n=0}^\infty \frac{1}{n!} = \sum_{n=0}^3 \frac{1}{n!} + \sum_{n=4}^\infty \frac{1}{n!} \le \frac{8}{3} + \sum_{n=4}^\infty \frac{1}{2^n} = \frac{8}{3} + \frac{1}{8} = \frac{67}{24} < \infty. \]
So the series converges. More than that, we still retain a quantitative estimate on
the sum: it is at most (in fact strictly less than, as a moment's thought will show)
$\frac{67}{24} = 2.79166666\ldots$. (Perhaps this reminds you of $e = 2.7182818284590452353602874714\ldots$,
which also happens to be a bit less than $\frac{67}{24}$. It should! More on this later...)
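Both claims in this example (that $n! > 2^n$ exactly from $n = 4$ on, and that the partial sums stay below $\frac{67}{24}$) are easy to test by machine. Here is a minimal Python sketch; the names `partial_sum`, `bound` and `first_n` are ours, and the comparison with `math.e` is only a floating point sanity check.

```python
import math
from fractions import Fraction

def partial_sum(n):
    """Sum of 1/k! for k = 0..n, computed exactly."""
    return sum(Fraction(1, math.factorial(k)) for k in range(n + 1))

bound = Fraction(67, 24)

# Smallest n with n! > 2^n (searched over a small range only).
first_n = min(n for n in range(20) if math.factorial(n) > 2**n)
print(first_n)

for n in (3, 5, 10, 30):
    s = partial_sum(n)
    print(n, float(s), s < bound)

print(float(bound), math.e)
```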
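We record the technique of the preceding example as a theorem.
Theorem 11.9. (Delayed Comparison Test) Let $\sum_{n=1}^\infty a_n$, $\sum_{n=1}^\infty b_n$ be two series
with non-negative terms. Suppose that there exists $N \in \mathbb{Z}^+$ such that for all $n > N$,
$a_n \le b_n$. Then
\[ \sum_{n=1}^\infty a_n \le \left( \sum_{n=1}^N (a_n - b_n) \right) + \sum_{n=1}^\infty b_n. \]
In particular: if $\sum_n b_n < \infty$ then $\sum_n a_n < \infty$, and if $\sum_n a_n = \infty$ then
$\sum_n b_n = \infty$.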
Corollary 11.11. (Calculus Student's Limit Comparison Test) Let $\sum_n a_n$
and $\sum_n b_n$ be two series. Suppose that for all sufficiently large $n$ both $a_n$ and $b_n$
are positive and $\lim_{n \to \infty} \frac{a_n}{b_n} = L \in [0, \infty]$.
a) If $0 < L < \infty$, then the series $\sum_n a_n$ and $\sum_n b_n$ converge or diverge together (i.e., either both converge
or both diverge).
b) If $L = \infty$ and $\sum_n a_n$ converges, then $\sum_n b_n$ converges.
c) If $L = 0$ and $\sum_n b_n$ converges, then $\sum_n a_n$ converges.
Proof. In all cases we deduce the result from the Limit Comparison Test.
a) If $0 < L < \infty$, then there exists $N \in \mathbb{Z}^+$ such that for all $n \ge N$, $0 < \frac{L}{2} b_n \le
a_n \le (2L) b_n$. Applying Theorem 11.10 to the second inequality, we get that if
$\sum_n b_n$ converges, then $\sum_n a_n$ converges. The first inequality is equivalent to $0 < b_n \le \frac{2}{L} a_n$ for all
$n \ge N$, and applying Theorem 11.10 to this we get that if $\sum_n a_n$ converges, then
$\sum_n b_n$ converges. So the two series $\sum_n a_n$, $\sum_n b_n$ converge or diverge together.
b) If $L = \infty$, then there exists $N \in \mathbb{Z}^+$ such that for all $n \ge N$, $a_n \ge b_n \ge 0$.
Applying Theorem 11.10 to this we get that if $\sum_n a_n$ converges, then $\sum_n b_n$ converges.
c) This case is left to the reader as an exercise.
Exercise: Prove Corollary 11.11c).
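Example: We will show that for all $p \ge 2$, the p-series $\sum_{n=1}^\infty \frac{1}{n^p}$ converges. In
fact it is enough to show this for $p = 2$, since for $p > 2$ we have for all $n \in \mathbb{Z}^+$ that
$n^2 \le n^p$ and thus $\frac{1}{n^p} \le \frac{1}{n^2}$, so $\sum_n \frac{1}{n^p} \le \sum_n \frac{1}{n^2}$. For $p = 2$, we happen to know that
\[ \sum_{n=1}^\infty \frac{1}{n^2+n} = \sum_{n=1}^\infty \left( \frac{1}{n} - \frac{1}{n+1} \right) = 1, \]
and in particular that $\sum_n \frac{1}{n^2+n}$ converges. For large $n$, $\frac{1}{n^2+n}$ is close to $\frac{1}{n^2}$. More
precisely, putting $a_n = \frac{1}{n^2+n}$ and $b_n = \frac{1}{n^2}$ we have $a_n \sim b_n$, i.e.,
\[ \lim_{n \to \infty} \frac{a_n}{b_n} = \lim_{n \to \infty} \frac{n^2}{n^2+n} = \lim_{n \to \infty} \frac{1}{1 + \frac{1}{n}} = 1. \]
By Corollary 11.11a), the series $\sum_n \frac{1}{n^2+n}$ and $\sum_n \frac{1}{n^2}$ converge or diverge
together. Since the former series converges, we deduce that $\sum_n \frac{1}{n^2}$ converges, even
though the Comparison Test does not directly apply.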
Exercise: Let
P (x)
Q(x)
n=0
an and
n=0 bn
In order to forestall possible confusion, let us point out that many students are
tempted to consider the following product operation on series:
\[ \left( \sum_{n=0}^\infty a_n \right) \cdot \left( \sum_{n=0}^\infty b_n \right) \stackrel{??}{=} \sum_{n=0}^\infty a_n b_n. \]
In other words, given two sequences of terms $\{a_n\}$, $\{b_n\}$, we form a new sequence
of terms $\{a_n b_n\}$ and then the associated series. In fact this is not a very useful
candidate for the product. If $\sum_n a_n = A$ and $\sum_n b_n = B$, we want our product
series to converge to $AB$. But for instance, take $\{a_n\} = \{b_n\} = \{\frac{1}{2^n}\}$. Then
\[ \sum_{n=0}^\infty a_n = \sum_{n=0}^\infty b_n = \frac{1}{1 - \frac{1}{2}} = 2, \]
so $AB = 4$, whereas
\[ \sum_{n=0}^\infty a_n b_n = \sum_{n=0}^\infty \frac{1}{4^n} = \frac{1}{1 - \frac{1}{4}} = \frac{4}{3}. \]
Unfortunately $\frac{4}{3} \neq 4$. What went wrong?
Plenty! We have ignored the laws of algebra for finite sums: e.g.
\[ (a_0 + a_1 + a_2)(b_0 + b_1 + b_2) = a_0 b_0 + a_1 b_1 + a_2 b_2 + a_0 b_1 + a_1 b_0 + a_0 b_2 + a_2 b_0 + a_1 b_2 + a_2 b_1. \]
The product is different and more complicated (and indeed, if all the terms are
positive, strictly larger) than just $a_0 b_0 + a_1 b_1 + a_2 b_2$. We have forgotten about
the cross-terms which show up when we multiply one expression involving several
terms by another expression involving several terms.5
Let us try again at formally multiplying out a product of infinite series:
\[ (a_0 + a_1 + \ldots + a_n + \ldots)(b_0 + b_1 + \ldots + b_n + \ldots) \]
\[ = a_0 b_0 + a_0 b_1 + a_1 b_0 + a_0 b_2 + a_1 b_1 + a_2 b_0 + \ldots + a_0 b_n + a_1 b_{n-1} + \ldots + a_n b_0 + \ldots. \]
5 To the readers who did not forget about the cross-terms: my apologies. But it is a common
enough misconception that it had to be addressed.
The notation is getting complicated. In order to shoehorn the right hand side into
a single infinite series, we need to either (i) choose some particular ordering of the
terms $a_k b_\ell$ or (ii) collect some terms together into an $n$th term.
For the moment we choose the latter: we define for any $n \in \mathbb{N}$
\[ c_n = \sum_{k=0}^n a_k b_{n-k} = a_0 b_n + a_1 b_{n-1} + \ldots + a_n b_0 \]
and then we define the Cauchy product of $\sum_{n=0}^\infty a_n$ and $\sum_{n=0}^\infty b_n$ to be the series
\[ \sum_{n=0}^\infty c_n = \sum_{n=0}^\infty \left( \sum_{k=0}^n a_k b_{n-k} \right). \]
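To see the definition in action on the example $a_n = b_n = \frac{1}{2^n}$, here is a short Python sketch comparing the Cauchy product with the naive termwise product. The function name `cauchy_product_terms` and the cutoff `N = 40` are our own choices; since only finitely many terms are summed, the printed values approximate the infinite sums from below.

```python
from fractions import Fraction

def cauchy_product_terms(a, b, n_terms):
    """Return [c_0, ..., c_{n_terms-1}] with c_n = sum_{k=0}^n a[k] * b[n-k]."""
    return [sum(a[k] * b[n - k] for k in range(n + 1)) for n in range(n_terms)]

N = 40
a = [Fraction(1, 2**n) for n in range(N)]

c = cauchy_product_terms(a, a, N)   # terms of the Cauchy product
termwise = [x * x for x in a]       # terms of the naive product a_n * b_n

print(c[:4])                        # c_n = (n+1)/2^n for this example
print(float(sum(c)))                # close to A * B = 4
print(float(sum(termwise)))         # close to 4/3
```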
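Theorem 11.12. Let $\{a_n\}_{n=0}^\infty$, $\{b_n\}_{n=0}^\infty$ be two series with non-negative terms.
Let $\sum_{n=0}^\infty a_n = A$ and $\sum_{n=0}^\infty b_n = B$. Putting $c_n = \sum_{k=0}^n a_k b_{n-k}$ we have that
$\sum_{n=0}^\infty c_n = AB$. In particular, the Cauchy product series converges iff the two
series $\sum_n a_n$ and $\sum_n b_n$ both converge.
Proof. For $N \in \mathbb{N}$, write $A_N = a_0 + \ldots + a_N$ and $B_N = b_0 + \ldots + b_N$ for the partial
sums, and define the box product
\[ \square_N = \sum_{0 \le i,j \le N} a_i b_j = (a_0 + \ldots + a_N)(b_0 + \ldots + b_N) = A_N B_N. \]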
So the box product clearly converges to the product of the sums of the two series.
This suggests that we compare the Cauchy product to the box product. The entries
of the box product can be arranged to form a square, viz:
\[ \begin{aligned} \square_N = {} & a_0 b_0 + a_0 b_1 + \ldots + a_0 b_N \\ & + a_1 b_0 + a_1 b_1 + \ldots + a_1 b_N \\ & \vdots \\ & + a_N b_0 + a_N b_1 + \ldots + a_N b_N. \end{aligned} \]
On the other hand, the terms of the $N$th partial sum of the Cauchy product can
naturally be arranged in a triangle:
\[ \begin{aligned} C_N = {} & a_0 b_0 \\ & + a_0 b_1 + a_1 b_0 \\ & + a_0 b_2 + a_1 b_1 + a_2 b_0 \\ & + a_0 b_3 + a_1 b_2 + a_2 b_1 + a_3 b_0 \\ & \vdots \\ & + a_0 b_N + a_1 b_{N-1} + a_2 b_{N-2} + \ldots + a_N b_0. \end{aligned} \]
Thus while $\square_N$ is a sum of $(N+1)^2$ terms, $C_N$ is a sum of $1 + 2 + \ldots + (N+1) =
\frac{(N+1)(N+2)}{2}$ terms: those lying on or below the diagonal of the square. Thus in
considerations involving the Cauchy product, the question is to what extent one
can neglect the terms in the upper half of the square (i.e., those $a_i b_j$ with
$i + j > N$) as $N$ gets large.
Here, since all the $a_i$'s and $b_j$'s are non-negative and $\square_N$ contains all the terms
of $C_N$ and others as well, we certainly have
\[ C_N \le \square_N = A_N B_N \le AB. \]
Thus $C = \lim_{N \to \infty} C_N \le AB$. For the converse, the key observation is that if we
make the sides of the triangle twice as long, it will contain the box: that is, every
term of $\square_N$ is of the form $a_i b_j$ with $0 \le i, j \le N$; thus $i + j \le 2N$ so $a_i b_j$ appears
as a term in $C_{2N}$. It follows that $C_{2N} \ge \square_N$ and thus
\[ C = \lim_{N \to \infty} C_N = \lim_{N \to \infty} C_{2N} \ge \lim_{N \to \infty} \square_N = \lim_{N \to \infty} A_N B_N = AB. \]
Combining the two inequalities, we get
\[ C = \sum_{n=0}^\infty c_n = AB = \left( \sum_{n=0}^\infty a_n \right) \left( \sum_{n=0}^\infty b_n \right). \]
4. Series With Non-Negative Terms II: Condensation and Integration
For now, we give the following brilliant and elementary argument due to Cauchy.
Consider the terms arranged as follows:
\[ \left( \frac{1}{1} \right) + \left( \frac{1}{2} + \frac{1}{3} \right) + \left( \frac{1}{4} + \frac{1}{5} + \frac{1}{6} + \frac{1}{7} \right) + \ldots, \]
i.e., we group the terms in blocks of length $2^k$. The power of $\frac{1}{2}$ which begins each
block is smaller than every term in the preceding block, so if we replaced every term
in the current block by the first term in the next block, we would only decrease
the sum of the series. But this latter sum is much easier to deal with:
\[ \sum_{n=1}^\infty \frac{1}{n} \ge \left( \frac{1}{2} \right) + \left( \frac{1}{4} + \frac{1}{4} \right) + \left( \frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8} \right) + \ldots = \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \ldots = \infty. \]
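The blocking argument can be checked directly: each block $\frac{1}{2^k} + \ldots + \frac{1}{2^{k+1}-1}$ should contribute at least $\frac{1}{2}$. The following Python sketch (the name `block_sum` is ours) computes the first few block sums exactly. It only inspects finitely many blocks, so it illustrates the argument rather than replacing it.

```python
from fractions import Fraction

def block_sum(k):
    """Sum of 1/n over the k-th block, 2^k <= n < 2^(k+1)."""
    return sum(Fraction(1, n) for n in range(2**k, 2**(k + 1)))

running_total = Fraction(0)
for k in range(10):
    b = block_sum(k)
    running_total += b
    # Each block contributes at least 1/2, so after k+1 blocks
    # the partial sum is at least (k+1)/2.
    print(k, float(b), b >= Fraction(1, 2), float(running_total))
```

The block sums shrink toward a limit below 1 but never drop under $\frac{1}{2}$, which is why the partial sums grow without bound, albeit very slowly.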
1
1
n1+ n
Exercise: Let $\frac{P(x)}{Q(x)}$ be a rational function with $\deg Q - \deg P = 1$. Show that
$\sum_{n=N}^\infty \frac{P(n)}{Q(n)}$ diverges.6
P (x)
Q(x) ,
the series
P (n)
n=N Q(n)
con
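The apparently ad hoc argument used to prove the divergence of the harmonic
series can be adapted to give the following useful test, due to A.L. Cauchy.
Theorem. (Cauchy Condensation Test) Let $\sum_{n=1}^\infty a_n$ be a series with $a_n \ge a_{n+1} \ge 0$ for all $n$. Then:
a) We have $\sum_{n=1}^\infty a_n \le \sum_{n=0}^\infty 2^n a_{2^n} \le 2 \sum_{n=1}^\infty a_n$.
b) Thus the series $\sum_n a_n$ converges iff the condensed series $\sum_n 2^n a_{2^n}$ converges.
Proof. We have
\[ \sum_{n=1}^\infty a_n = a_1 + a_2 + a_3 + a_4 + a_5 + a_6 + a_7 + a_8 + \ldots \]
\[ \le a_1 + a_2 + a_2 + a_4 + a_4 + a_4 + a_4 + 8a_8 + \ldots = \sum_{n=0}^\infty 2^n a_{2^n} \]
\[ = (a_1 + a_2) + (a_2 + a_4 + a_4 + a_4) + (a_4 + a_8 + \ldots + a_8) + \ldots \]
\[ \le (a_1 + a_1) + (a_2 + a_2 + a_3 + a_3) + (a_4 + a_4 + a_5 + a_5 + a_6 + a_6 + a_7 + a_7) + \ldots = 2 \sum_{n=1}^\infty a_n. \]
6 Take $N$ larger than any of the roots of $Q(x)$, so that every term in the sum is well defined.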
The Cauchy Condensation Test is, I think, an a priori interesting result: it says
that, under the given hypotheses, in order to determine whether a series converges
we need to know only a very sparse set of the terms of the series: whatever is
happening in between $a_{2^n}$ and $a_{2^{n+1}}$ is immaterial, so long as the sequence remains
decreasing. This is a very curious phenomenon, and of course without the hypothesis
that the terms are decreasing, nothing like this could hold.
On the other hand, it may be less clear that the Condensation Test is of any
practical use: after all, isn't the condensed series $\sum_n 2^n a_{2^n}$ more complicated than
the original series $\sum_n a_n$? In fact the opposite is often the case: passing from the
given series to the condensed series preserves the convergence or divergence but
tends to exchange subtly convergent/divergent series for more obviously (or better:
more rapidly) converging/diverging series.
Example: Fix a real number $p$ and consider the p-series7 $\sum_{n=1}^\infty \frac{1}{n^p}$. Our task
is to find all values of $p$ for which the series converges.
Step 1: The sequence $a_n = \frac{1}{n^p}$ has positive terms. The terms are decreasing iff
the sequence $n^p$ is increasing iff $p > 0$. So we had better treat the cases $p \le 0$ separately. First, if $p < 0$, then $\lim_{n \to \infty} \frac{1}{n^p} = \lim_{n \to \infty} n^{|p|} = \infty$, so the p-series diverges
by the $n$th term test. Second, if $p = 0$ then our series is simply $\sum_n \frac{1}{n^0} = \sum_n 1 = \infty$.
So the p-series obviously diverges when $p \le 0$.
Step 2: Henceforth we assume $p > 0$, so that the hypotheses of Cauchy's Condensation
Test apply. We get that $\sum_n n^{-p}$ converges iff $\sum_n 2^n (2^n)^{-p} = \sum_n 2^n 2^{-np} =
\sum_n (2^{1-p})^n$ converges. But the latter series is a geometric series with geometric
ratio $r = 2^{1-p}$, so it converges iff $|r| < 1$ iff $2^{p-1} > 1$ iff $p > 1$.
Thus we have proved the following important result.
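Theorem. For $p \in \mathbb{R}$, the p-series $\sum_{n=1}^\infty \frac{1}{n^p}$ converges iff $p > 1$.
For $p > 1$ the Condensation Test also gives an upper bound on the sum:
\[ \sum_{n=1}^\infty \frac{1}{n^p} \le \sum_{n=0}^\infty 2^n (2^n)^{-p} = \sum_{n=0}^\infty (2^{1-p})^n = \frac{1}{1 - 2^{1-p}}. \]
For instance, taking $p = 2$ we get
\[ \sum_{n=1}^\infty \frac{1}{n^2} \le \frac{1}{1 - 2^{1-2}} = 2. \]
On the other hand, a direct computation gives
\[ \sum_{n=1}^{1024} \frac{1}{n^2} = 1.643957981030164240100762569\ldots. \]
So it seems like $\sum_{n=1}^\infty \frac{1}{n^2} \approx 1.64$, whereas the Condensation Test tells us that it
is at most 2. (Note that since the terms are positive, simply adding up any finite
number of terms gives a lower bound.)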
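The numbers quoted above are easy to reproduce. In the following Python sketch, `p_series_partial` and `condensation_bound` are names of our own choosing; the partial sum is computed in floating point (with `math.fsum` to limit rounding error), so only the leading digits should be trusted.

```python
import math

def p_series_partial(p, n_terms):
    """Floating point value of sum_{n=1}^{n_terms} 1/n^p."""
    return math.fsum(1.0 / n**p for n in range(1, n_terms + 1))

def condensation_bound(p):
    """Upper bound 1/(1 - 2^(1-p)) for the full p-series, valid only for p > 1."""
    if p <= 1:
        raise ValueError("the bound requires p > 1")
    return 1.0 / (1.0 - 2.0 ** (1.0 - p))

lower = p_series_partial(2, 1024)   # a lower bound: finitely many positive terms
upper = condensation_bound(2)       # the Condensation Test upper bound
print(lower, upper)
```

For $p = 2$ this brackets the sum between roughly 1.6440 and 2; a larger exponent narrows the bracket, since the bound $\frac{1}{1 - 2^{1-p}}$ tends to 1 as $p$ grows.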
The following exercise gives a technique for using the Condensation Test to estimate
$\sum_{n=1}^\infty \frac{1}{n^p}$ to arbitrary accuracy.
Exercise: Let $N$ be a non-negative integer.
a) Show that under the hypotheses of the Condensation Test we have
\[ \sum_{n=2^N+1}^\infty a_n \le \sum_{n=0}^\infty 2^{n+N} a_{2^{n+N}}. \]
Example: $\sum_{n=2}^\infty \frac{1}{n \log n}$. $a_n = \frac{1}{n \log n}$ is positive and decreasing (since its reciprocal
is positive and increasing) so the Condensation Test applies. We get that the
convergence of the series is equivalent to the convergence of
\[ \sum_n \frac{2^n}{2^n \log 2^n} = \frac{1}{\log 2} \sum_n \frac{1}{n} = \infty, \]
so the series diverges. This is rather subtle: we know that for any $\epsilon > 0$, $\sum_n \frac{1}{n \cdot n^\epsilon}$
converges, since it is a p-series with $p = 1 + \epsilon$. But $\log n$ grows more slowly than
$n^\epsilon$ for any $\epsilon > 0$, indeed slowly enough so that replacing $n^\epsilon$ with $\log n$ converts a
convergent series to a divergent one.
1
n log(n!)
converges.
Exercise: Let $p, q, r$ be positive real numbers.
a) Show that $\sum_n \frac{1}{n (\log n)^q}$ converges iff $q > 1$.
b) Show that $\sum_n \frac{1}{n^p (\log n)^q}$ converges iff $p > 1$ or ($p = 1$ and $q > 1$).
c) Find all values of $p, q, r$ such that $\sum_n \frac{1}{n^p (\log n)^q (\log \log n)^r}$ converges.
The pattern of the preceding exercise can be continued indefinitely, giving series which converge or diverge excruciatingly slowly, and showing that the difference between
convergence and divergence can be arbitrarily subtle.
4.3. The Integral Test.
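Theorem 11.16. (Integral Test) Let $f : [1, \infty) \to \mathbb{R}$ be a positive decreasing
function, and for $n \in \mathbb{Z}^+$ put $a_n = f(n)$. Then
\[ \sum_{n=2}^\infty a_n \le \int_1^\infty f(x)dx \le \sum_{n=1}^\infty a_n. \]
Thus the series $\sum_n a_n$ converges iff the improper integral $\int_1^\infty f(x)dx$ converges.
Proof. Since $f$ is decreasing, for all $n \in \mathbb{Z}^+$ and all $x \in [n, n+1]$ we have
$a_{n+1} \le f(x) \le a_n$, and thus $a_{n+1} \le \int_n^{n+1} f(x)dx \le a_n$. Summing over all $n \in \mathbb{Z}^+$ gives
\[ \sum_{n=2}^\infty a_n \le \int_1^\infty f(x)dx \le \sum_{n=1}^\infty a_n. \]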
Remark: The Integral Test is due to Maclaurin8 [Ma42] and later in more modern
form to Cauchy [Ca89].
Among series which arise naturally in undergraduate analysis, it usually holds that
the Condensation Test can be successfully applied to determine convergence or divergence of a series if and only if the Integral Test can be successfully applied.9
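Example: Let us use the Integral Test to determine the set of $p > 0$ such that
$\sum_n \frac{1}{n^p}$ converges. Indeed the series converges iff the improper integral $\int_1^\infty \frac{dx}{x^p}$ is
finite. If $p \neq 1$, then we have
\[ \int_1^\infty \frac{dx}{x^p} = \left. \frac{x^{1-p}}{1-p} \right|_{x=1}^{x=\infty}. \]
The upper limit is 0 if $1 - p < 0 \iff p > 1$ and is $\infty$ if $p < 1$. Finally,
\[ \int_1^\infty \frac{dx}{x} = \log x \Big|_{x=1}^{\infty} = \infty. \]
So, once again, the p-series converges iff $p > 1$.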
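The Integral Test also has a finite version which is pleasant to check numerically: for a positive decreasing $f$, summing $a_{n+1} \le \int_n^{n+1} f(x)dx \le a_n$ over $1 \le n \le N-1$ gives $\sum_{n=2}^N a_n \le \int_1^N f(x)dx \le \sum_{n=1}^{N-1} a_n$. The sketch below tests this for $f(x) = x^{-p}$, using the antiderivative computed above; the function names are ours, and the sums are floating point approximations.

```python
import math

def integral_1_to_N(p, N):
    """Value of the integral of x^(-p) over [1, N], from the antiderivative."""
    if p == 1:
        return math.log(N)
    return (N ** (1 - p) - 1) / (1 - p)

def sandwich(p, N):
    """Return (sum_{n=2}^{N} n^-p, integral over [1, N], sum_{n=1}^{N-1} n^-p)."""
    lower = math.fsum(n ** (-p) for n in range(2, N + 1))
    upper = math.fsum(n ** (-p) for n in range(1, N))
    return lower, integral_1_to_N(p, N), upper

for p in (0.5, 1, 2, 3):
    lo, mid, hi = sandwich(p, 1000)
    print(p, lo, mid, hi, lo <= mid <= hi)
```

For $p \le 1$ all three quantities grow without bound as $N$ increases, while for $p > 1$ they stay bounded, in line with the conclusion of the example.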
Exercise: Verify that all of the above examples involving the Condensation Test
can also be done using the Integral Test.
Given the similar applicability of the Condensation and Integral Tests, it is perhaps
not so surprising that many texts content themselves to give one or the other. In
calculus texts, one almost always nds the Integral Test, which is logical since often
integration and then improper integation are covered earlier in the same course in
which one studies innite series. In elementary analysis courses one often develops
sequences and series before the study of functions of a real variable, which is logical because a formal treatment of the Riemann integral is necessarily somewhat
involved and technical. Thus many of these texts give the Condensation Test.
The Condensation Test and the Integral Test have a similar range of applicability: in most textbook examples, where one test succeeeds, so will the other.
From an aesthetic standpoint, the Condensation Test is more appealing (to me).
On the other hand, under a mild additional hypothesis the Integral Test can be
used to give asymptotic expansions for divergent series. Our treatment of the
8Colin Maclaurin, 1698-1746
9Why this should be so is not clear to me: the observation is purely empirical.
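Lemma 11.17. Let $\{a_n\}$ and $\{b_n\}$ be two sequences of positive real numbers with $a_n \sim b_n$ and $\sum_n a_n = \infty$. Then $\sum_n b_n = \infty$ and $\sum_{n=1}^N a_n \sim \sum_{n=1}^N b_n$.
Proof. Fix $\epsilon > 0$ and choose $K$ such that $a_n \le (1+\epsilon) b_n$ for all $n \ge K$. Then for $N \ge K$,
$$\sum_{n=1}^{N} a_n = \sum_{n=1}^{K-1} a_n + \sum_{n=K}^{N} a_n \le \sum_{n=1}^{K-1} a_n + \sum_{n=K}^{N} (1+\epsilon) b_n$$
$$= \left(\sum_{n=1}^{K-1} a_n - \sum_{n=1}^{K-1} (1+\epsilon) b_n\right) + \sum_{n=1}^{N} (1+\epsilon) b_n = C_{\epsilon,K} + (1+\epsilon)\sum_{n=1}^{N} b_n,$$
say, where $C_{\epsilon,K}$ does not depend on $N$. Dividing both sides by $\sum_{n=1}^N b_n$ and using $\lim_{N \to \infty} \sum_{n=1}^N b_n = \infty$, we find that $\frac{\sum_{n=1}^N a_n}{\sum_{n=1}^N b_n}$ is at most $1 + 2\epsilon$ for all sufficiently large $N$. Since the hypotheses are symmetric in the two sequences, the same bound holds for $\frac{\sum_{n=1}^N b_n}{\sum_{n=1}^N a_n}$.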
Theorem 11.18. Let $f : [1, \infty) \to (0, \infty)$ be continuous and monotone. Suppose the series $\sum_n f(n)$ diverges and that as $x \to \infty$, $f(x) \sim f(x + 1)$. Then
$$\sum_{n=1}^{N} f(n) \sim \int_1^N f(x)\,dx.$$
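Proof. Case 1: Suppose $f$ is increasing. Then, for $n \le x \le n + 1$, we have
$f(n) \le \int_n^{n+1} f(x)\,dx \le f(n + 1)$, or
$$1 \le \frac{\int_n^{n+1} f(x)\,dx}{f(n)} \le \frac{f(n+1)}{f(n)}.$$
By assumption we have
$$\lim_{n \to \infty} \frac{f(n+1)}{f(n)} = 1,$$
so by the Squeeze Principle we have
$$\int_n^{n+1} f(x)\,dx \sim f(n). \tag{83}$$
Applying Lemma 11.17 with $a_n = f(n)$ and $b_n = \int_n^{n+1} f(x)\,dx$, we conclude
$$\int_1^{N+1} f(x)\,dx = \sum_{k=1}^{N} \int_k^{k+1} f(x)\,dx \sim \sum_{n=1}^{N} f(n).$$
Further, we have
$$\lim_{N \to \infty} \frac{\int_1^{N+1} f(x)\,dx}{\int_1^{N} f(x)\,dx} \stackrel{*}{=} \lim_{N \to \infty} \frac{f(N+1)}{f(N)} = 1,$$
where in the starred equality we have applied L'Hôpital's Rule and then the Fundamental Theorem of Calculus. We conclude
$$\int_1^{N} f(x)\,dx \sim \int_1^{N+1} f(x)\,dx \sim \sum_{n=1}^{N} f(n).$$
Case 2: Suppose $f$ is decreasing. Then, for $n \le x \le n + 1$, we have
$f(n+1) \le \int_n^{n+1} f(x)\,dx \le f(n)$, or
$$\frac{f(n+1)}{f(n)} \le \frac{\int_n^{n+1} f(x)\,dx}{f(n)} \le 1.$$
Once again, by our assumption that $f(n) \sim f(n + 1)$ and the Squeeze Principle we
get (83); the remainder of the proof proceeds as in the previous case.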
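To see the asymptotic relation of Theorem 11.18 in action, here is a small Python sketch. The helper names are hypothetical, and the antiderivatives are supplied by hand rather than computed. It prints the ratio of $\sum_{n=1}^N f(n)$ to $\int_1^N f(x)\,dx$ for $f(x) = \frac1x$ and $f(x) = \sqrt{x}$, and also for $f(x) = e^x$, which violates the hypothesis $f(x) \sim f(x+1)$.

```python
import math

def partial_sum(f, N):
    """Return f(1) + f(2) + ... + f(N), summed accurately with math.fsum."""
    return math.fsum(f(n) for n in range(1, N + 1))

def sum_to_integral_ratio(f, antiderivative, N):
    """Ratio of sum_{n=1}^N f(n) to the integral of f over [1, N]."""
    integral = antiderivative(N) - antiderivative(1)
    return partial_sum(f, N) / integral

if __name__ == "__main__":
    N = 10**6
    # Decreasing case: harmonic sums versus log N (the ratio tends to 1 slowly).
    print(sum_to_integral_ratio(lambda x: 1 / x, math.log, N))
    # Increasing case: sum of sqrt(n) versus (2/3) N^(3/2).
    print(sum_to_integral_ratio(math.sqrt, lambda x: (2 / 3) * x**1.5, N))
    # Hypothesis fails for e^x: the ratio tends to e/(e-1), not to 1.
    print(sum_to_integral_ratio(math.exp, math.exp, 50), math.e / (math.e - 1))
```

The last line shows why some hypothesis like $f(x) \sim f(x+1)$ is needed: for $f(x) = e^x$ the sum and the integral are both geometric in nature and their ratio tends to $\frac{e}{e-1}$.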
5. Series With Non-Negative Terms III: Ratios and Roots
Theorem 11.19. (Ratio Test) Let $\sum_n a_n$ be a series with $a_n > 0$ for all $n$.
a) Suppose there exists $N \in \mathbb{Z}^+$ and $0 < r < 1$ such that for all $n \ge N$, $\frac{a_{n+1}}{a_n} \le r$. Then the series $\sum_n a_n$ converges.
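For $n \ge N$ we have
$$\frac{a_{n+2}}{a_n} = \frac{a_{n+2}}{a_{n+1}} \cdot \frac{a_{n+1}}{a_n} \le r^2,$$
and in general
$$\frac{a_{N+k}}{a_N} \le r^k,$$
so
$$a_{N+k} \le a_N r^k.$$
Therefore
$$\sum_{k=N}^{\infty} a_k = \sum_{k=0}^{\infty} a_{N+k} \le \sum_{k=0}^{\infty} a_N r^k < \infty.$$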
$$\frac{a_{n+1}}{a_n} = \frac{x^{n+1}/(n+1)!}{x^n/n!} = \frac{x}{n+1}.$$
Suppose $\sum_n a_n$ is a series with non-negative terms such that $a_n^{1/n} \le r$ for some $r < 1$. Raising both sides to the $n$th power gives $a_n \le r^n$, and once again we find that the
series converges by comparison to a geometric series.
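Theorem 11.20. (Root Test) Let $\sum_n a_n$ be a series with $a_n \ge 0$ for all $n$.
a) Suppose there exists $N \in \mathbb{Z}^+$ and $0 < r < 1$ such that for all $n \ge N$, $a_n^{1/n} \le r$.
Then the series $\sum_n a_n$ converges.
b) Suppose that for infinitely many positive integers $n$ we have $a_n^{1/n} \ge 1$. Then the
series $\sum_n a_n$ diverges.
c) The hypothesis of part a) holds if $\theta = \lim_{n \to \infty} a_n^{1/n}$ exists and is less than 1.
d) The hypothesis of part b) holds if $\theta = \lim_{n \to \infty} a_n^{1/n}$ exists and is greater than 1.
Exercise: Prove Theorem 11.20.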
5.3. Ratios versus Roots.
It is a fact (a piece of calculus folklore) that the Root Test is stronger than
the Ratio Test. That is, whenever the Ratio Test succeeds in determining the convergence or divergence of a series, the Root Test will also succeed.
In order to explain this result we need to make use of the limit infimum and limit
supremum. First we recast the ratio and root tests in those terms.
Exercise: Let
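Now let $\sum_n a_n$ be a series which the Ratio Test succeeds in showing is convergent: that is, $\overline{\rho} = \limsup_n \frac{a_{n+1}}{a_n} < 1$. Then by Proposition 11.21, we have $\overline{\theta} = \limsup_n a_n^{1/n} \le \overline{\rho} < 1$, so the Root
Test also shows that the series is convergent. Now suppose that the Ratio Test
succeeds in showing that the series is divergent: that is, $\underline{\rho} = \liminf_n \frac{a_{n+1}}{a_n} > 1$. Then $\overline{\theta} \ge \underline{\theta} \ge \underline{\rho} > 1$,
so the Root Test also shows that the series is divergent.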
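Exercise: Consider the series $\sum_n 2^{-n+(-1)^n}$.
a) Show that $\underline{\rho} = \liminf_n \frac{a_{n+1}}{a_n} = \frac18$ and $\overline{\rho} = \limsup_n \frac{a_{n+1}}{a_n} = 2$, so the Ratio Test fails.
b) Show that $\underline{\theta} = \overline{\theta} = \lim_n a_n^{1/n} = \frac12$, so the Root Test shows that the series converges.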
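Before proving the claims of this exercise it may help to look at the numbers. The following Python sketch (the function names are ours) tabulates the ratios $\frac{a_{n+1}}{a_n}$ and the roots $a_n^{1/n}$ for $a_n = 2^{-n+(-1)^n}$; it is numerical evidence only and does not replace the proof.

```python
def a(n: int) -> float:
    """n-th term 2^(-n + (-1)^n); an exact power of two in floating point."""
    return 2.0 ** (-n + (-1) ** n)

def ratio(n: int) -> float:
    """The ratio a_{n+1} / a_n."""
    return a(n + 1) / a(n)

def root(n: int) -> float:
    """The n-th root a_n^(1/n)."""
    return a(n) ** (1.0 / n)

if __name__ == "__main__":
    for n in (1, 2, 3, 4, 10, 100, 1000):
        print(n, ratio(n), root(n))
```

The ratios oscillate between two values and so have no limit, while the roots settle down to a single value less than 1.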
Exercise: Construct further examples of series for which the Ratio Test fails but
the Root Test succeeds to show either convergence or divergence.
Warning: The sense in which the Root Test is stronger than the Ratio Test is
a theoretical one. For a given relatively benign series, it may well be the case that
the Ratio Test is easier to apply than the Root Test, even though in theory whenever the Ratio Test works the Root Test must also work.
Example: Consider again the series $\sum_{n=0}^{\infty} \frac{1}{n!}$. In the presence of factorials one
should always attempt the Ratio Test first. Indeed
$$\lim_{n \to \infty} \frac{a_{n+1}}{a_n} = \lim_{n \to \infty} \frac{1/(n+1)!}{1/n!} = \lim_{n \to \infty} \frac{n!}{(n+1)n!} = \lim_{n \to \infty} \frac{1}{n+1} = 0.$$
Thus the Ratio Test limit exists (no need for liminfs or limsups) and is equal to 0,
so the series converges. If instead we tried the Root Test we would have to evaluate
$\lim_{n \to \infty} (\frac{1}{n!})^{\frac1n}$. This is not so bad if we keep our head: e.g. one can show that for
any fixed $R > 0$ and sufficiently large $n$, $n! > R^n$ and thus $(\frac{1}{n!})^{\frac1n} \le (\frac{1}{R^n})^{\frac1n} = \frac1R$.
Thus the root test limit is at most $\frac1R$ for any positive $R$, so it is 0. But this is
elaborate compared to the Ratio Test computation, which was immediate.
Turning these ideas around, Proposition 11.21 can be put to the following
sneaky use.
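Corollary 11.22. Let $\{a_n\}_{n=1}^{\infty}$ be a sequence of positive real numbers. Assume
that $\lim_{n \to \infty} \frac{a_{n+1}}{a_n} = L$. Then also $\lim_{n \to \infty} a_n^{1/n} = L$.
Proof. Indeed, the hypothesis gives that for the infinite series $\sum_n a_n$ we have
$\rho = L$, so by Proposition 11.21 we must also have $\theta = L$.
b) For $\alpha \in \mathbb{R}$, $\lim_{n \to \infty} n^{\alpha/n}$.
c) $\lim_{n \to \infty} (n!)^{\frac1n}$.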
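Corollary 11.22 can be watched at work on the sequence $a_n = \frac{n!}{n^n}$ (our choice of example, not one from the text). Here $\frac{a_{n+1}}{a_n} = (\frac{n}{n+1})^n \to \frac1e$, so the corollary predicts $a_n^{1/n} = \frac{(n!)^{1/n}}{n} \to \frac1e$ as well. The sketch below computes both quantities in Python, using `math.lgamma(n + 1)` for $\log n!$ to avoid overflow.

```python
import math

def ratio(n: int) -> float:
    """a_{n+1} / a_n for a_n = n! / n^n, which simplifies to (n / (n + 1))^n."""
    return (n / (n + 1)) ** n

def root(n: int) -> float:
    """a_n^(1/n) for a_n = n! / n^n, computed as exp((log n! - n log n) / n)."""
    return math.exp((math.lgamma(n + 1) - n * math.log(n)) / n)

if __name__ == "__main__":
    for n in (10, 1000, 10**6):
        print(n, ratio(n), root(n), 1 / math.e)
```

The roots converge to the same limit as the ratios, only more slowly, which matches the Warning above that the Ratio Test is often the easier one to apply.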
6. Absolute Convergence
6.1. Introduction to absolute convergence.
We turn now to the serious study of series with both positive and negative terms.
It turns out that under one relatively mild additional hypothesis, virtually all of
our work on series with non-negative terms can be usefully applied in this case. In
this section we study this wonderful hypothesis: absolute convergence. (In the next
section we get really serious by studying series in the absence of absolute convergence. This will lead to surprisingly delicate and intricate considerations.)
A real series $\sum_n a_n$ is absolutely convergent if $\sum_n |a_n|$ converges. Note that
$\sum_n |a_n|$ is a series with non-negative terms, so to decide whether it is convergent
we may use any of the tests for series with non-negative terms.
important
First Proof: Consider the three series $\sum_n a_n$, $\sum_n |a_n|$ and $\sum_n a_n + |a_n|$. Our
hypothesis is that $\sum_n |a_n|$ converges. But we claim that this implies that $\sum_n a_n + |a_n|$ converges as well. Indeed, consider the expression $a_n + |a_n|$: it is equal to
$2a_n = 2|a_n|$ when $a_n$ is non-negative and 0 when $a_n$ is negative. In particular the
series $\sum_n a_n + |a_n|$ has non-negative terms and
$$\sum_n a_n + |a_n| \le \sum_n 2|a_n| < \infty.$$
So $\sum_n a_n + |a_n|$ converges. By the Three Series Principle, $\sum_n a_n$ converges.
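Second Proof: The above argument is clever, maybe too clever! Let's try
something more fundamental: since $\sum_n |a_n|$ converges, for every $\epsilon > 0$ there exists
$N \in \mathbb{Z}^+$ such that $\sum_{n=N}^{\infty} |a_n| < \epsilon$. Therefore for all $k \ge 0$,
$$\Big|\sum_{n=N}^{N+k} a_n\Big| \le \sum_{n=N}^{N+k} |a_n| \le \sum_{n=N}^{\infty} |a_n| < \epsilon,$$
so $\sum_n a_n$ converges by the Cauchy criterion.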
n=1
|an | is
As an example of how Theorem 11.23 may be combined with the previous tests
to give tests for absolute convergence, we record the following result.
Theorem 11.24. (Ratio & Root Tests for Absolute Convergence) Let $\sum_n a_n$
be a real series.
a) Assume $a_n \neq 0$ for all $n$. If there exists $0 \le r < 1$ such that for all sufficiently
11We warn the reader that the more standard terminology is "conditionally convergent".
We will later on give a separate definition for "conditionally convergent" and then it will be a
theorem that a real series is conditionally convergent if and only if it is nonabsolutely convergent.
The reasoning for this (which we admit will seem abstruse at best to our target audience) is
that in functional analysis one studies convergence and absolute convergence of series in a more
general context, such that nonabsolute convergence and conditional convergence may indeed differ.
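large $n$, $|\frac{a_{n+1}}{a_n}| \le r$, the series $\sum_n a_n$ is absolutely convergent.
b) Assume $a_n \neq 0$ for all $n$. If there exists $r > 1$ such that for all sufficiently large $n$, $|\frac{a_{n+1}}{a_n}| \ge r$, the series $\sum_n a_n$ is divergent.
c) If there exists $r < 1$ such that for all sufficiently large $n$, $|a_n|^{\frac1n} \le r$, the series
$\sum_n a_n$ is absolutely convergent.
d) If there are infinitely many $n$ for which $|a_n|^{\frac1n} \ge 1$, then the series diverges.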
Proof. Parts a) and c) are immediate: applying Theorem 11.19 (resp. Theorem 11.20) we find that $\sum_n |a_n|$ is convergent, and the point is that by Theorem
11.23, this implies that $\sum_n a_n$ is convergent.
There is something to say in parts b) and d), because in general just because
$\sum_n |a_n| = \infty$ does not imply that $\sum_n a_n$ diverges. (We will study this subtlety
later on in detail.) But recall that whenever the Ratio or Root tests establish the
= lim |
n
and then all previous material on Ratio and Root Tests applies to all real series.
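$$\sum_{n=0}^{\infty} |c_n| = \sum_{n=0}^{\infty} \Big|\sum_{k=0}^{n} a_k b_{n-k}\Big| \le \sum_{n=0}^{\infty} \sum_{k=0}^{n} |a_k||b_{n-k}| < \infty,$$
the last inequality following from the fact that $\sum_{n=0}^{\infty} \sum_{k=0}^{n} |a_k||b_{n-k}|$ is the Cauchy
product of the two non-negative series $\sum_{n=0}^{\infty} |a_n|$ and $\sum_{n=0}^{\infty} |b_n|$, hence it converges.
We now wish to show that $\lim_{N \to \infty} C_N = \sum_{n=0}^{\infty} c_n = AB$. Recall the notation
$$\square_N = \sum_{0 \le i,j \le N} a_i b_j = (a_0 + \ldots + a_N)(b_0 + \ldots + b_N) = A_N B_N.$$
We have
$$|C_N - AB| \le |\square_N - AB| + |\square_N - C_N|$$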
(
)
)
|AN BN AB| +
|an |
|bn | +
|bn |
|an | .
n=0
n=0
nN
nN
Fix > 0; since AN BN AB, for suciently large N |AN BN AB| < 3 . Put
A=
|an |, B =
n=0
|bn |.
n=0
nN
|bn | <
3A
and
Theorem 11.26. (Mertens' Theorem) Let $\sum_{n=0}^{\infty} a_n = A$ be an absolutely convergent
series and $\sum_{n=0}^{\infty} b_n = B$ be a convergent series. Then the Cauchy product series $\sum_{n=0}^{\infty} c_n$ converges to $AB$.
an , BN =
n=0
bn , CN =
n=0
cn
n=0
put = n=0 |an |. Since BN B, N 0, and thus for any > 0 we may choose
nN N0
|N | |0 aN + . . . + N0 aN N0 | + |N0 +1 aN N0 1 + . . . + N a0 |
|0 aN + . . . + N0 aN N0 | + M
|an | + + = .
2
2
2 2
nN N0
So N 0.
7. Non-Absolute Convergence
We say that a real series $\sum_n a_n$ is nonabsolutely convergent if the series converges but $\sum_n |a_n|$ diverges, thus if it is convergent but not absolutely convergent.13
A series which is nonabsolutely convergent is a more delicate creature than any
we have studied thus far. A test which can show that a series is convergent but
nonabsolutely convergent is necessarily subtler than those of the previous section.
In fact the typical undergraduate student of calculus / analysis learns exactly one
such test, which we give in the next section.
7.1. The Alternating Series Test.
Consider the alternating harmonic series
$$\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} = 1 - \frac12 + \frac13 - \ldots.$$
Upon taking the absolute value of every term we get the usual harmonic series,
which diverges, so the alternating harmonic series is not absolutely convergent.
However, some computations with partial sums suggest that the alternating harmonic series is convergent, with sum $\log 2$. By looking more carefully at the partial
sums, we can find a pattern that allows us to show that the series does indeed
converge. (Whether it converges to $\log 2$ is a different matter, of course, which we
will revisit much later on.)
It will be convenient to write $a_n = \frac1n$, so that the alternating harmonic series
is $\sum_n (-1)^{n+1} a_n$. We draw the reader's attention to three properties of this series:
(AST1) The terms alternate in sign.
(AST2) The nth term approaches 0.
(AST3) The sequence of absolute values of the terms is weakly decreasing:
$a_1 \ge a_2 \ge \ldots \ge a_n \ge \ldots$.
These are the clues from which we will make our case for convergence. Here it is:
consider the process of passing from the first partial sum $S_1 = 1$ to $S_3 = 1 - \frac12 + \frac13 = \frac56$.
We have $S_3 \le 1$, and this is no accident: since $a_2 \ge a_3$, subtracting $a_2$ and then
adding $a_3$ leaves us no larger than where we started. But indeed this argument is
valid in passing from any $S_{2n-1}$ to $S_{2n+1}$:
$$S_{2n+1} = S_{2n-1} - (a_{2n} - a_{2n+1}) \le S_{2n-1}.$$
Thus the sequence of odd-numbered partial sums $\{S_{2n-1}\}$ is decreasing. Moreover,
$$S_{2n+1} = (a_1 - a_2) + (a_3 - a_4) + \ldots + (a_{2n-1} - a_{2n}) + a_{2n+1} \ge 0.$$
13One therefore has to distinguish between the phrases "not absolutely convergent" and
"nonabsolutely convergent": the former allows the possibility that the series is divergent, whereas
the latter does not. In fact our terminology here is not completely standard. We defend ourselves
grammatically: "nonabsolutely" is an adverb, so it must modify "convergent", i.e., it describes
how the series converges.
Therefore all the odd-numbered partial sums are bounded below by 0. By the Monotone
Sequence Lemma, the sequence $\{S_{2n+1}\}$ converges to its greatest lower bound, say
$S_{odd}$. On the other hand, just the opposite sort of thing happens for the even-numbered partial sums:
$$S_{2n+2} = S_{2n} + a_{2n+1} - a_{2n+2} \ge S_{2n}$$
and
$$S_{2n+2} = a_1 - (a_2 - a_3) - (a_4 - a_5) - \ldots - (a_{2n} - a_{2n+1}) - a_{2n+2} \le a_1.$$
Therefore the sequence of even-numbered partial sums $\{S_{2n}\}$ is increasing and
bounded above by $a_1$, so it converges to its least upper bound, say $S_{even}$. Thus we
have split up our sequence of partial sums into two complementary subsequences
and found that each of these subsequences converges. Now the sequence $\{S_n\}$ converges iff
$S_{odd} = S_{even}$, and the inequalities
$$S_2 \le S_4 \le \ldots \le S_{2n} \le S_{2n+1} \le S_{2n-1} \le \ldots \le S_3 \le S_1$$
show that $S_{even} \le S_{odd}$. Moreover, for any $n \in \mathbb{Z}^+$ we have
$$S_{odd} - S_{even} \le S_{2n+1} - S_{2n} = a_{2n+1}.$$
Since $a_{2n+1} \to 0$, we conclude $S_{odd} = S_{even} = S$, i.e., the series converges.
Further, since for all $n$ we have $S_{2n} \le S_{2n+2} \le S \le S_{2n+1}$, it follows that
$$|S - S_{2n}| = S - S_{2n} \le S_{2n+1} - S_{2n} = a_{2n+1}$$
and similarly
$$|S - S_{2n+1}| = S_{2n+1} - S \le S_{2n+1} - S_{2n+2} = a_{2n+2}.$$
Thus the error in cutting off the infinite sum $\sum_{n=1}^{\infty} (-1)^{n+1} |a_n|$ after $N$ terms is in
absolute value at most the absolute value of the next term: $a_{N+1}$.
Of course in all this we never used that $a_n = \frac1n$ but only that we had a series
satisfying (AST1) (i.e., an alternating series), (AST2) and (AST3). Therefore the
preceding arguments have in fact proved the following more general result, due
originally to Leibniz.
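Theorem 11.27. Let $\{a_n\}_{n=1}^{\infty}$ be a sequence of non-negative real numbers which
is weakly decreasing and such that $\lim_{n \to \infty} a_n = 0$. Then:
a) The associated alternating series $\sum_n (-1)^{n+1} a_n$ converges.
b) For $N \in \mathbb{Z}^+$, put
$$E_N = \Big|\Big(\sum_{n=1}^{\infty} (-1)^{n+1} a_n\Big) - \Big(\sum_{n=1}^{N} (-1)^{n+1} a_n\Big)\Big|. \tag{84}$$
Then $E_N \le a_{N+1}$.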
n=1
(1)n+1
np
is:
Exercise: Let $\frac{P(x)}{Q(x)}$ be a rational function. Give necessary and sufficient conditions for $\sum_n (-1)^n \frac{P(n)}{Q(n)}$ to be nonabsolutely convergent.
n=1
EN = |S
an |.
n=1
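$$\Big|\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} - \sum_{n=1}^{1000} \frac{(-1)^{n+1}}{n}\Big| \le \frac{1}{1001}.$$
$$\sum_{n=1}^{1000} \frac{(-1)^{n+1}}{n} = 0.6926474305598203096672310589\ldots.$$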
Again, later we will show that the exact value of the sum is log 2, which my software
package tells me is14
log 2 = 0.6931471805599453094172321214.
Thus the actual error in cutting o the sum after 1000 terms is
E1000 = 0.0004997500001249997500010625033.
It is important to remember that this and other error estimates only give upper
bounds on the error: the true error could well be much smaller. In this case we
were guaranteed to have an error at most $\frac{1}{1001}$ and we see that the true error is
about half of that. Thus the estimate for the error is reasonably accurate.
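The numbers in this example are easy to reproduce. The following Python sketch (the function name is ours) sums the first 1000 terms and compares the actual error with the guaranteed bound $\frac{1}{1001}$; it takes for granted that the sum is $\log 2$, which the text only proves later.

```python
import math

def alternating_harmonic_partial_sum(N: int) -> float:
    """Return sum_{n=1}^N (-1)^(n+1) / n, summed accurately with math.fsum."""
    return math.fsum((-1) ** (n + 1) / n for n in range(1, N + 1))

if __name__ == "__main__":
    N = 1000
    S = alternating_harmonic_partial_sum(N)
    error = abs(math.log(2) - S)   # assumes the exact sum is log 2
    bound = 1 / (N + 1)            # error bound of Theorem 11.27b)
    print(S, error, bound, error / bound)
```

The last printed value is the ratio of the true error to the bound, which makes the "about half" remark above visible at a glance.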
Note well that although the error estimate of Theorem 11.27b) is very easy
to apply, if $a_n$ tends to zero rather slowly (as in this example), it is not especially
efficient for computations. For instance, in order to compute the true sum of the alternating harmonic series to six decimal place accuracy using this method, we would
14Yes, you should be wondering how it is computing this! More on this later.
need to add up the first million terms: that's a lot of calculation. (Thus please be
assured that this is not the way that a calculator or computer would compute $\log 2$.)
$$\Big|\sum_{n=0}^{\infty} \frac{(-1)^n}{n!} - \sum_{n=0}^{9} \frac{(-1)^n}{n!}\Big| < \frac{1}{10!} < 10^{-6}.$$
$$\sum_{n=0}^{9} \frac{(-1)^n}{n!} = 0.3678791887125220458553791887.$$
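Theorem 11.28. (Dirichlet's Test) Let $\sum_{n=1}^{\infty} a_n$, $\sum_{n=1}^{\infty} b_n$ be two infinite series. Suppose that:
(i) The partial sums $B_n = b_1 + \ldots + b_n$ are bounded.
(ii) The sequence $\{a_n\}$ is decreasing with $\lim_{n \to \infty} a_n = 0$.
Then $\sum_{n=1}^{\infty} a_n b_n$ converges.
Proof. Let $M$ be an upper bound for all the $|B_n|$. Fix $\epsilon > 0$ and choose $N$ such that $a_N < \frac{\epsilon}{2M}$. Then for $n > m \ge N$,
$$\Big|\sum_{k=m}^{n} a_k b_k\Big| = \Big|\sum_{k=m}^{n} a_k (B_k - B_{k-1})\Big| = \Big|\sum_{k=m}^{n} a_k B_k - \sum_{k=m-1}^{n-1} a_{k+1} B_k\Big|$$
$$= \Big|\sum_{k=m}^{n-1} (a_k - a_{k+1}) B_k + a_n B_n - a_m B_{m-1}\Big|$$
$$\le \Big(\sum_{k=m}^{n-1} |a_k - a_{k+1}||B_k|\Big) + |a_n||B_n| + |a_m||B_{m-1}|$$
$$\le M\Big(\sum_{k=m}^{n-1} (a_k - a_{k+1}) + a_n + a_m\Big) = M(a_m - a_n + a_n + a_m) = 2M a_m \le 2M a_N < \epsilon.$$
Therefore $\sum_n a_n b_n$ converges by the Cauchy criterion.
15Johann Peter Gustav Lejeune Dirichlet, 1805-1859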
In the preceding proof, without saying what we were doing, we used the technique
of summation by parts.
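The summation by parts identity used in the proof,
$\sum_{k=m}^{n} a_k b_k = \sum_{k=m}^{n-1} (a_k - a_{k+1}) B_k + a_n B_n - a_m B_{m-1}$ with $B_0 = 0$, is purely algebraic and can be tested mechanically. Here is a minimal Python sketch using exact rational arithmetic; the function name and the 1-indexed calling convention are our own.

```python
from fractions import Fraction
from itertools import accumulate

def summation_by_parts_sides(a, b, m, n):
    """Return both sides of the summation by parts identity.

    a and b are lists holding a_1, a_2, ... and b_1, b_2, ... (so a_k is a[k - 1]);
    requires 1 <= m < n <= len(a).
    """
    B = [0] + list(accumulate(b))   # B[k] = b_1 + ... + b_k, with B[0] = 0
    lhs = sum(a[k - 1] * b[k - 1] for k in range(m, n + 1))
    rhs = (sum((a[k - 1] - a[k]) * B[k] for k in range(m, n))
           + a[n - 1] * B[n] - a[m - 1] * B[m - 1])
    return lhs, rhs

if __name__ == "__main__":
    a = [Fraction(1, k) for k in range(1, 11)]     # a_k = 1/k, decreasing to 0
    b = [(-1) ** (k + 1) for k in range(1, 11)]    # b_k = (-1)^(k+1), bounded partial sums
    print(summation_by_parts_sides(a, b, 3, 10))
```

Because the arithmetic is exact, the two returned values agree exactly, not merely up to rounding.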
If we take $b_n = (-1)^{n+1}$, then $B_{2n+1} = 1$ for all $n$ and $B_{2n} = 0$ for all $n$,
so $\{b_n\}$ has bounded partial sums. Applying Dirichlet's Test with a sequence
$a_n$ which decreases to 0 and with this sequence $\{b_n\}$, we find that the series
$\sum_n a_n b_n = \sum_n (-1)^{n+1} a_n$ converges. We have recovered the Alternating Series
Test!
In fact Dirichlet's Test yields the following Almost Alternating Series Test:
let $\{a_n\}$ be a sequence decreasing to 0, and for all $n$ let $b_n \in \{\pm 1\}$ be a sign sequence which is almost alternating in the sense
that the sequence of partial
sums $B_n = b_1 + \ldots + b_n$ is bounded. Then the series $\sum_n b_n a_n$ converges.
Exercise: Show that Dirichlet's generalization of the Alternating Series Test is
as strong as possible in the following sense: if $\{b_n\}$ is a sequence of elements,
each $\pm 1$, such that the sequence of partial sums $B_n = b_1 + \ldots + b_n$ is unbounded,
then there is a sequence $a_n$ decreasing to zero such that $\sum_n a_n b_n$ diverges.
Exercise:
a) Use the trigonometric identity
$$\cos n = \frac{\sin(n + \frac12) - \sin(n - \frac12)}{2 \sin(\frac12)}$$
$\sum_{n=1}^{\infty} \frac{\sin n}{n}$ is nonabsolutely convergent.
Remark: Once we know about series of complex numbers and Euler's formula
$e^{ix} = \cos x + i \sin x$, we will be able to give a trigonometry-free proof of the preceding two exercises.
Dirichlet himself applied his test to establish the convergence of a certain class
of series of a mixed algebraic and number-theoretic nature. The analytic properties of these series were used to prove his celebrated theorem on prime numbers in
arithmetic progressions. To give a sense of how influential this work has become, in
modern terminology Dirichlet studied the analytic properties of Dirichlet series
associated to nontrivial Dirichlet characters. For more information on this work,
the reader may consult (for instance) [DS].
7.3. Cauchy Products III: A Divergent Cauchy Product.
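Let us give an example (due to Cauchy!) of a Cauchy product of two nonabsolutely
convergent series which fails to converge. Take
$$\sum_{n=0}^{\infty} a_n = \sum_{n=0}^{\infty} b_n = \sum_{n=0}^{\infty} \frac{(-1)^n}{\sqrt{n+1}}.$$
The $n$th term of the Cauchy product is
$$c_n = \sum_{i+j=n} (-1)^i (-1)^j \frac{1}{\sqrt{i+1}} \frac{1}{\sqrt{j+1}}.$$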
Now $(-1)^i (-1)^j = (-1)^{i+j} = (-1)^n$, so $c_n$ is equal to $(-1)^n$ times a sum of positive
terms. Since $i, j \le n$, $\frac{1}{\sqrt{i+1}}, \frac{1}{\sqrt{j+1}} \ge \frac{1}{\sqrt{n+1}}$, and thus each term in $c_n$ has absolute
value at least $(\frac{1}{\sqrt{n+1}})^2 = \frac{1}{n+1}$. Since we are summing from $i = 0$ to $n$ there are
$n + 1$ terms, all of the same sign, so we find $|c_n| \ge 1$ for all $n$. Thus the general term
of $\sum_n c_n$ does not converge to 0, so the series diverges.
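The lower bound $|c_n| \ge 1$ is easy to confirm numerically. The following Python sketch (the function name is ours) computes the Cauchy product terms for $a_n = b_n = \frac{(-1)^n}{\sqrt{n+1}}$ directly from the definition.

```python
import math

def cauchy_product_term(n: int) -> float:
    """c_n = sum_{i+j=n} a_i b_j for a_i = b_i = (-1)^i / sqrt(i + 1)."""
    return sum(
        ((-1) ** i / math.sqrt(i + 1)) * ((-1) ** (n - i) / math.sqrt(n - i + 1))
        for i in range(n + 1)
    )

if __name__ == "__main__":
    for n in (0, 1, 2, 3, 10, 100, 1000):
        print(n, cauchy_product_term(n))
```

The printed terms alternate in sign and never drop below 1 in absolute value, so they cannot tend to 0.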
7.4. Decomposition into positive and negative parts.
For a real number $r$, we define its positive part
$$r^+ = \max(r, 0)$$
and its negative part
$$r^- = -\min(r, 0).$$
Exercise: Let $r$ be a real number. Show:
a) $r = r^+ - r^-$.
b) $|r| = r^+ + r^-$.
For any real series $\sum_n a_n$ we have a decomposition
$$\sum_n a_n = \sum_n a_n^+ - \sum_n a_n^-,$$
at least if all three series converge.
Case 1: Both $\sum_n a_n^+$ and $\sum_n a_n^-$ converge. Hence $\sum_n |a_n| = \sum_n (a_n^+ + a_n^-)$ converges: i.e., $\sum_n a_n$ is absolutely convergent.
Case 2: Both $\sum_n a_n^+$ and $\sum_n a_n^-$ diverge. Hence $\sum_n |a_n| = \sum_n a_n^+ + a_n^-$ diverges:
indeed, if it converged, then adding and subtracting $\sum_n a_n$ we would get that
$2 \sum_n a_n^+$ and $2 \sum_n a_n^-$ converge, contradiction. Thus:
Exercise: Let $\sum_n a_n$ be a real series.
a) Show that if $\sum_n a_n^+$ converges and $\sum_n a_n^-$ diverges then $\sum_n a_n = -\infty$.
b) Show that if $\sum_n a_n^+$ diverges and $\sum_n a_n^-$ converges then $\sum_n a_n = \infty$.
8. Power Series I: Power Series as Series
8.1. Convergence of Power Series.
Let $\{a_n\}_{n=0}^{\infty}$ be a sequence of real numbers. Then a series of the form $\sum_{n=0}^{\infty} a_n x^n$
is called a power series. Thus, for instance, if we had $a_n = 1$ for all $n$ we would
get the geometric series $\sum_{n=0}^{\infty} x^n$ which converges iff $x \in (-1, 1)$ and has sum $\frac{1}{1-x}$.
The $n$th partial sum of a power series is $\sum_{k=0}^{n} a_k x^k$, a polynomial in $x$. One
of the major themes of Chapter three will be to try to view power series as infinite
polynomials: in particular, we will regard $x$ as a variable and be interested in the
properties (continuity, differentiability, integrability, and so on) of the function
$f(x) = \sum_{n=0}^{\infty} a_n x^n$.
$$\sum_{n=0}^{\infty} a_n 0^n = a_0 + a_1 \cdot 0 + a_2 \cdot 0^2 + \ldots = a_0.$$
$$\lim_{n \to \infty} \frac{|(n+1)! x^{n+1}|}{|n! x^n|} = \lim_{n \to \infty} (n + 1)|x|.$$
The last limit is 0 if $x = 0$ and otherwise is $+\infty$. Therefore the Ratio Test shows
that (as we already knew!) the series converges absolutely at $x = 0$ and diverges
at every nonzero $x$. So it is indeed possible for a power series to converge only at
$x = 0$. This is disappointing if we are interested in $f(x) = \sum_n a_n x^n$ as a function
of $x$, since in this case it is just the function from $\{0\}$ to $\mathbb{R}$ which sends 0 to $a_0$.
There is nothing interesting going on here.
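Example 2: Consider $\sum_{n=0}^{\infty} \frac{x^n}{n!}$. We apply the Ratio Test:
$$\lim_{n \to \infty} \Big|\frac{x^{n+1}}{(n+1)!}\Big| \Big|\frac{n!}{x^n}\Big| = \lim_{n \to \infty} \frac{|x|}{n+1} = 0.$$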
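$\sum_{n=1}^{\infty} \frac{1}{n R^n} x^n$.
$$\lim_{n \to \infty} \frac{n R^n |x|^{n+1}}{(n+1) R^{n+1} |x|^n} = |x| \lim_{n \to \infty} \frac{n}{n+1} \cdot \frac1R = \frac{|x|}{R}.$$
Therefore the series converges absolutely when $|x| < R$ and diverges when $|x| > R$.
We must look separately at the case $|x| = R$, i.e., when $x = \pm R$. When $x = R$, the
series is the harmonic series $\sum_n \frac1n$, hence divergent. But when $x = -R$, the series
is the alternating harmonic series $\sum_n \frac{(-1)^n}{n}$, hence (nonabsolutely) convergent.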
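$\sum_{n=1}^{\infty} \frac{1}{n^2 R^n} x^n$.
$$\lim_{n \to \infty} \frac{n^2 R^n |x|^{n+1}}{(n+1)^2 R^{n+1} |x|^n} = |x| \lim_{n \to \infty} \Big(\frac{n}{n+1}\Big)^2 \cdot \frac1R = \frac{|x|}{R}.$$
So once again the series converges absolutely when $|x| < R$, diverges when $|x| > R$,
and we must look separately at $x = \pm R$. This time plugging in $x = R$ gives $\sum_n \frac{1}{n^2}$, a convergent $p$-series.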
In each case the set of values is an interval containing 0 and with a certain radius,
i.e., an extended real number $R \in [0, \infty]$ such that the series definitely converges
for all $x \in (-R, R)$ and definitely diverges for all $x$ outside of $[-R, R]$. Our goal is
to show that this is the case for any power series.
This goal can be approached at various degrees of sophistication. At the calculus level, we have already said what is needed: we use the Ratio Test to see that
the convergence set is an interval around 0 of a certain radius $R$. Namely, taking a
general power series $\sum_n a_n x^n$ and applying the Ratio Test, we find
$$\lim_{n \to \infty} \frac{|a_{n+1} x^{n+1}|}{|a_n x^n|} = |x| \lim_{n \to \infty} \Big|\frac{a_{n+1}}{a_n}\Big|.$$
Lemma 11.31. Let $A > 0$ and let $\sum_n a_n x^n$ be a power series. If $\sum_n a_n A^n$
converges, then $\sum_n a_n x^n$ converges absolutely for all $x \in (-A, A)$.
B n
B
|an B n | =
|an An |
< .
A
A
n
n
n
Exercise: Let $\sum_{n=0}^{\infty} a_n x^n$ and $\sum_{n=0}^{\infty} b_n x^n$ be two power series with positive radii
of convergence $R_a$ and $R_b$. Let $R = \min(R_a, R_b)$. Put $c_n = \sum_{k=0}^{n} a_k b_{n-k}$. Show
that the formal identity
$$\Big(\sum_{n=0}^{\infty} a_n x^n\Big)\Big(\sum_{n=0}^{\infty} b_n x^n\Big) = \sum_{n=0}^{\infty} c_n x^n$$
is valid for all $x \in (-R, R)$. (Suggestion: use past results on Cauchy products.)
The drawback of Theorem 11.32 is that it does not give an explicit description
of the radius of convergence $R$ in terms of the coefficients of the power series, as is
the case when the ratio test limit $\rho = \lim_{n \to \infty} \frac{|a_{n+1}|}{|a_n|}$ exists. In order to achieve this
in general, we need to appeal instead to the Root Test and make use of the limit
supremum. The following elegant result is generally attributed to J.S. Hadamard,16
who published it in 1888 [Ha88] and included it in his 1892 PhD thesis. This seems
remarkably late in the day for a result which is so closely linked to (Cauchy's) Root
Test. It turns out that the result was established by our most usual suspect: it was
first proven by Cauchy in 1821 [Ca21] but apparently had been nearly forgotten.
16Jacques Salomon Hadamard (1865-1963)
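Proof. We have $\limsup_{n \to \infty} |a_n x^n|^{\frac1n} = |x| \limsup_{n \to \infty} |a_n|^{\frac1n} = |x|\theta$, where $\theta = \limsup_{n \to \infty} |a_n|^{\frac1n}$. Put
$R = \frac{1}{\theta}$. If $|x| < R$, choose $A$ such that $|x| < A < R$ and then $A'$ such that
$$\frac1R < A' < \frac1A.$$
Then for all sufficiently large $n$, $|a_n x^n|^{\frac1n} \le A' A < 1$, so the series converges absolutely by the Root Test. Similarly, if $|x| > R$, choose $A$ such that $R < A < |x|$ and
then $A'$ such that
$$\frac1A < A' < \frac1R = \theta.$$
Then there are infinitely many non-negative integers $n$ such that $|a_n x^n|^{\frac1n} \ge A' A > 1$, so the series diverges by the Root Test.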
Remark: For the reader who is less than comfortable with limits infimum and supremum, we recommend simply assuming that the Ratio Test limit $\rho = \lim_{n \to \infty} |\frac{a_{n+1}}{a_n}|$
exists and proving Theorem 11.35 under that additional assumption using the Ratio
Test. This will be good enough for most of the power series encountered in practice.
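In practice one can get a feel for the radius of convergence by evaluating $|a_n|^{-1/n}$ at a single large $n$. This is only a heuristic, since Hadamard's formula involves a limit supremum rather than one value. The following Python sketch (all names are illustrative) works with $\log |a_n|$ to avoid overflow and tries it on $a_n = \frac{1}{n R^n}$ with $R = 3$ and on $a_n = \frac{1}{n!}$.

```python
import math

def radius_estimate(log_abs_coeff, n: int) -> float:
    """Return 1 / |a_n|^(1/n), computed from log|a_n| as exp(-log|a_n| / n)."""
    return math.exp(-log_abs_coeff(n) / n)

def log_coeff_example(n: int, R: float = 3.0) -> float:
    """log|a_n| for a_n = 1 / (n R^n); the radius of convergence is R."""
    return -math.log(n) - n * math.log(R)

def log_coeff_exponential(n: int) -> float:
    """log|a_n| for a_n = 1 / n!, using lgamma(n + 1) = log n!."""
    return -math.lgamma(n + 1)

if __name__ == "__main__":
    for n in (10, 1000, 10**6):
        print(n, radius_estimate(log_coeff_example, n),
              radius_estimate(log_coeff_exponential, n))
```

For the first series the estimates approach 3; for the second they grow without bound, consistent with an infinite radius of convergence.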
CHAPTER 12
P (n) (c)
n!
for all 0 n N .
N
1
bn (t c)n .
n=0
Then
$\sum_{n=0}^{N-1} (a_n - b_n)(t - c)^n = \sum_{n=0}^{N-1} d_n t^n$. Reasoning as above we find that $d_{N-1} =
a_{N-1} - b_{N-1} = 0$, and so forth: continuing in this way we find that $a_n = b_n$ for all
$0 \le n \le N$, and thus the $c$-expansion is unique.
b) Consider the identity
(85)
P (x) =
n=0
P (k) (c)
.
k!
f (k) (c)
k=0
k!
(x c)k .
Then $T_N(x)$ is the unique polynomial function of degree at most $N$ such that
$T_N^{(k)}(c) = f^{(k)}(c)$ for all $0 \le k \le N$.
Proof. Applying Theorem 12.1 to $T_N(x) = \sum_{k=0}^{N} \frac{f^{(k)}(c)}{k!} (x - c)^k$ gives
$$\frac{f^{(k)}(c)}{k!} = \frac{T_N^{(k)}(c)}{k!},$$
hence $T_N^{(k)}(c) = f^{(k)}(c)$ for all $0 \le k \le N$. As for the uniqueness: let $P(x)$ be
any polynomial of degree at most $N$ such that $P^{(k)}(c) = f^{(k)}(c)$ for $0 \le k \le N$, and
let $Q = T_N - P$. Then $Q$ is a polynomial of degree at most $N$ such that $Q^{(k)}(c) = 0$
for $0 \le k \le N$; applying Theorem 12.1 we get $Q(x) = \sum_{k=0}^{N} \frac{Q^{(k)}(c)}{k!} (x - c)^k =
\sum_{k=0}^{N} 0 \cdot (x - c)^k = 0$, i.e., $Q$ is the zero polynomial and thus $P = T_N$.
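For polynomials, the coefficients $\frac{P^{(n)}(c)}{n!}$ of the $c$-expansion can be computed by repeated differentiation. The following Python sketch does this with exact rational arithmetic. The representation of a polynomial as a list of coefficients (lowest degree first) and all function names are our own choices.

```python
from fractions import Fraction
from math import factorial

def derivative(coeffs):
    """Derivative of the polynomial sum coeffs[i] * x**i, in the same representation."""
    return [i * coeffs[i] for i in range(1, len(coeffs))]

def evaluate(coeffs, x):
    """Evaluate sum coeffs[i] * x**i by Horner's rule."""
    result = Fraction(0)
    for a in reversed(coeffs):
        result = result * x + a
    return result

def c_expansion(coeffs, c):
    """Return [a_0, ..., a_N] with P(x) = sum a_n (x - c)**n, where a_n = P^(n)(c) / n!."""
    out = []
    current = [Fraction(a) for a in coeffs]
    for n in range(len(coeffs)):
        out.append(evaluate(current, Fraction(c)) / factorial(n))
        current = derivative(current)
    return out

if __name__ == "__main__":
    # P(x) = x^3 - 2x + 5 expanded around c = 1
    print(c_expansion([5, -2, 0, 1], 1))
```

For $P(x) = x^3 - 2x + 5$ and $c = 1$ this yields $P(x) = 4 + (x-1) + 3(x-1)^2 + (x-1)^3$, which one can confirm by expanding.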
(k)
$$\lim_{x \to c} \frac{f(x) - g(x)}{(x - c)^n} = 0.$$
$$\lim_{x \to c} \frac{f(x) - g(x)}{(x - c)^0} = \lim_{x \to c} f(x) - g(x) = f(c) - g(c).$$
The converse, that if $f(c) = g(c)$ then $\lim_{x \to c} f(x) - g(x) = 0$, is equally clear.
Example 1: We claim that two differentiable functions $f$ and $g$ agree to order
1 at $c$ if and only if $f(c) = g(c)$ and $f'(c) = g'(c)$. Indeed, by the exercise above
both hypotheses imply $f(c) = g(c)$, so we may assume that, and then we find
$$\lim_{x \to c} \frac{f(x) - g(x)}{x - c} = \lim_{x \to c} \Big(\frac{f(x) - f(c)}{x - c} - \frac{g(x) - g(c)}{x - c}\Big) = f'(c) - g'(c).$$
Thus assuming $f(c) = g(c)$, $f$ and $g$ agree to order 1 at $c$ if and only if $f'(c) = g'(c)$.
The following result gives the expected generalization of these two examples. It
is generally attributed to Taylor,1 probably correctly, although special cases were
known to earlier mathematicians. Note that "Taylor's Theorem" often refers to a
later result (Theorem 12.4) that we call "Taylor's Theorem With Remainder", even
though it is Theorem 12.3 (only) that was proved by Brook Taylor.
Theorem 12.3. (Taylor) Let $n \in \mathbb{N}$ and $f, g : I \to \mathbb{R}$ be two $n$ times differentiable functions. Let $c$ be an interior point of $I$. The following are equivalent:
(i) We have $f(c) = g(c)$, $f'(c) = g'(c)$, ..., $f^{(n)}(c) = g^{(n)}(c)$.
(ii) $f$ and $g$ agree to order $n$ at $c$.
Proof. Set $h(x) = f(x) - g(x)$. Then (i) holds iff $h(c) = h'(c) = \ldots = h^{(n)}(c) = 0$ and (ii) holds iff $\lim_{x \to c} \frac{h(x)}{(x-c)^n} = 0$. So we may work with $h$ instead of
$f$ and $g$. Since we dealt with $n = 0$ and $n = 1$ above, we may assume $n \ge 2$.
(i) $\implies$ (ii): $L = \lim_{x \to c} \frac{h(x)}{(x-c)^n}$ is of the form $\frac00$, so L'Hôpital's Rule gives
$$L = \lim_{x \to c} \frac{h'(x)}{n(x - c)^{n-1}},$$
provided the latter limit exists. By our assumptions, this latter limit is still of the
form $\frac00$, so we may apply L'Hôpital's Rule again. We do so iff $n > 2$. In general,
we apply L'Hôpital's Rule $n - 1$ times, getting
$$L = \lim_{x \to c} \frac{h^{(n-1)}(x)}{n!(x - c)} = \frac{1}{n!} \lim_{x \to c} \Big(\frac{h^{(n-1)}(x) - h^{(n-1)}(c)}{x - c}\Big),$$
provided the latter limit exists. But the limit of the expression in parentheses is nothing else
than the derivative of the function $h^{(n-1)}(x)$ at $x = c$, i.e., it is $h^{(n)}(c) = 0$ (and,
in particular the limit exists; only now have the $n - 1$ applications of L'Hôpital's
Rule been unconditionally justified), so $L = 0$. Thus (ii) holds.
(ii) $\implies$ (i): Let $T_n(x)$ be the degree $n$ Taylor polynomial to $h$ at $c$. By Corollary
12.2, $h$ and $T_n$ have the same derivatives of order up to $n$ at $c$, so by the just proved implication (i) $\implies$ (ii),
$h(x)$ and $T_n(x)$ agree to order $n$ at $x = c$:
$$\lim_{x \to c} \frac{h(x) - T_n(x)}{(x - c)^n} = 0.$$
1Brook Taylor, 1685 - 1731
h (c)
2 (x
c)2 + . . . +
(x c)n
h (c)
2 (x
c) + . . . + h
(x c)n1
(n)
h(n) (c)
n! (x
Tn (x)
(xc)n
(c)
n! (x
c)n
= 0.
c)n1
= 0.
xc
so h
(n)
h(n) (c)
n! (x
c)n
h(n) (c)
=
,
n
(x c)
n!
(c) = 0.
L = lim
assuming this limit exists. But to assume this last limit exists and is equal to h(n) (0)
is to assume that nth derivative of h is continuous at zero, which is slightly more
than we want (or need) to assume.
For $n \in \mathbb{N}$, a function $f : I \to \mathbb{R}$ vanishes to order $n$ at $c$ if $\lim_{x \to c} \frac{f(x)}{(x-c)^n} = 0$.
Note that this concept came up prominently in the proof of Theorem 12.3 in the
form: $f$ and $g$ agree to order $n$ at $c$ iff $f - g$ vanishes to order $n$ at $c$.
$$R_n(x) = \frac{f^{(n+1)}(z)}{n!} (x - z)^n (x - c).$$
$$R_n(x) = \frac{f^{(n+1)}(z)}{(n + 1)!} (x - c)^{n+1}.$$
d) We have
$$|R_n(x)| = |f(x) - T_n(x)| \le \frac{||f^{(n+1)}||}{(n + 1)!} |x - c|^{n+1}.$$
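$$R_{n,t}(x) = f(x) - T_{n,t}(x) = f(x) - \sum_{k=0}^{n} \frac{f^{(k)}(t)}{k!} (x - t)^k. \tag{89}$$
Viewing $S(t) = R_{n,t}(x)$ as a function of $t$ (with $x$ fixed), we compute
$$S'(t) = -f'(t) - \sum_{k=1}^{n} \Big(\frac{f^{(k+1)}(t)}{k!} (x - t)^k - \frac{f^{(k)}(t)}{(k - 1)!} (x - t)^{k-1}\Big) = \frac{-f^{(n+1)}(t)}{n!} (x - t)^n.$$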
We apply the Mean Value Theorem to $S$ on $|[c, x]|$: there is $z \in |(c, x)|$ such that
$$\frac{S(x) - S(c)}{x - c} = S'(z) = \frac{-f^{(n+1)}(z)}{n!} (x - z)^n.$$
Noting that $S(x) = R_{n,x}(x) = 0$ and $S(c) = R_{n,c}(x) = R_n(x)$, this gives
$$R_n(x) = S(c) - S(x) = \frac{f^{(n+1)}(z)}{n!} (x - z)^n (x - c).$$
b) Apply the Cauchy Mean Value Theorem to $S(t)$ and $g(t) = (x - t)^{n+1}$ on $|[c, x]|$:
there is $z \in |(c, x)|$ such that
$$\frac{R_n(x)}{(x - c)^{n+1}} = \frac{S(x) - S(c)}{g(x) - g(c)} = \frac{S'(z)}{g'(z)} = \frac{\frac{-f^{(n+1)}(z)}{n!} (x - z)^n}{-(n + 1)(x - z)^n} = \frac{f^{(n+1)}(z)}{(n + 1)!},$$
so
$$R_n(x) = \frac{f^{(n+1)}(z)}{(n + 1)!} (x - c)^{n+1}.$$
c) If $f^{(n+1)}$ is integrable on $|[c, x]|$, then
$$R_n(x) = -(S(x) - S(c)) = -\int_c^x S'(t)\,dt = \frac{\int_c^x f^{(n+1)}(t)(x - t)^n\,dt}{n!}.$$
d) This follows almost immediately from part b); the proof is left to the reader.
Exercise: Show that Theorem 12.4 (Taylor's Theorem With Remainder) implies
Theorem 12.3 (Taylor's Theorem) under the additional hypothesis that $f^{(n+1)}$ exists
and is continuous3 on the interval $|[c, x]|$.
3Thanks to Nick Fink for pointing out the hypothesis of continuity seems to be needed here.
4. Taylor Series
4.1. The Taylor Series. Let $f : I \to \mathbb{R}$ be an infinitely differentiable function, and let $c \in I$. We define the Taylor series of $f$ at $c$ to be
$$T(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(c)(x - c)^n}{n!}.$$
We will mostly work with the case $c = 0$, in which
$$T(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(0) x^n}{n!};$$
indeed, to get from this to the general case one merely has to make the change of
variables $x \mapsto x - c$. It is traditional to call Taylor series centered around $c = 0$
Maclaurin series. But I know no good reason for this: Taylor series were introduced by Taylor in 1721, whereas Colin Maclaurin's Theory of fluxions was not
published until 1742 and makes explicit attribution to Taylor's work.4 Using separate names for Taylor series centered at 0 and Taylor series centered at $c$
often suggests (misleadingly!) to students that there is some conceptual difference
between the two cases. So we will not use the term "Maclaurin series" here.
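Question 12.5. Let $f : I \to \mathbb{R}$ be an infinitely differentiable function and let
$T(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(0) x^n}{n!}$ be its Taylor series.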
a) For which values of x does T (x) converge?
b) If for x I, T (x) converges, do we have T (x) = f (x)?
Notice that Question 12.5a) is simply asking for which values of $x \in \mathbb{R}$ a power series is convergent, a question to which we worked out a very satisfactory answer in
X.X. Namely, the set of values $x$ on which a power series converges is an interval
of radius $R \in [0, \infty]$ centered at 0. More precisely, in theory the value of $R$ is given
by Hadamard's Formula $\frac1R = \limsup_{n \to \infty} |a_n|^{\frac1n}$, and in practice we expect to be
able to apply the Ratio Test (or, if necessary, the Root Test) to compute $R$.
If $R = 0$ then $T(x)$ only converges at $x = 0$ and we have $T(0) = f(0)$: this is a
trivial case. Henceforth we assume that $R \in (0, \infty]$ so that $T$ converges (at least)
on $(-R, R)$. Fix a number $A$, $0 < A \le R$ such that $(-A, A) \subset I$. We may then
move on to Question 12.5b): must $f(x) = T(x)$ for all $x \in (-A, A)$?
The answer is no: consider the function $f(x)$ of Exercise X.X. $f(x)$ is infinitely differentiable and has $f^{(n)}(0) = 0$ for all $n \in \mathbb{N}$, so its Taylor series is
4Special cases of the Taylor series concept were well known to Newton and Gregory in the
17th century and to the Indian mathematician Madhava of Sangamagrama in the 14th century.
$$T(x) = \sum_{n=0}^{\infty} \frac{0 \cdot x^n}{n!} = \sum_{n=0}^{\infty} 0 = 0.$$
So it comes down to being able to give upper bounds on $R_n(x)$ which tend to zero as
$n \to \infty$. According to Taylor's Theorem with Remainder, this will hold whenever
we can show that the norm of the $n$th derivative $||f^{(n)}||$ does not grow too rapidly.
Example: We claim that for all $x \in \mathbb{R}$, the function $f(x) = e^x$ is equal to its
Taylor series expansion at $x = 0$:
$$e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}.$$
First we compute the Taylor series expansion: $f^{(0)}(0) = f(0) = e^0 = 1$, and $f'(x) =
e^x$, hence every derivative of $e^x$ is just $e^x$ again. We conclude that $f^{(n)}(0) = 1$ for
all $n$ and thus the Taylor series is $\sum_{n=0}^{\infty} \frac{x^n}{n!}$, as claimed. Next note that this power
series converges for all real $x$, as we have already seen: just apply the Ratio Test.
Finally, we use Taylor's Theorem with Remainder to show that $R_n(x) \to 0$ for each
fixed $x \in \mathbb{R}$. Indeed, Theorem 12.4 (with $c = 0$) gives us
$$|R_n(x)| \le \frac{||f^{(n+1)}||}{(n + 1)!} |x - c|^{n+1},$$
where $||f^{(n+1)}||$ is the supremum of the absolute value of the $(n+1)$st derivative
on the interval $|[0, x]|$. But (lucky us) in this case $f^{(n+1)}(x) = e^x$ for all $n$ and
the maximum value of $e^x$ on this interval is $e^x$ if $x \ge 0$ and 1 otherwise, so either
way $||f^{(n+1)}|| \le e^{|x|}$. So
$$|R_n(x)| \le e^{|x|} \Big(\frac{|x|^{n+1}}{(n + 1)!}\Big).$$
And now we win: the factor inside the parentheses approaches zero with $n$ and is
being multiplied by a quantity which is independent of $n$, so $R_n(x) \to 0$. In fact a
moment's thought shows that $R_n(x) \to 0$ uniformly on any bounded interval, say
on $[-A, A]$, and thus our work on the general properties of uniform convergence of
power series (in particular the $M$-test) is not needed here: everything comes from
Taylor's Theorem With Remainder.
Example continued: we use Taylor's Theorem With Remainder to compute $e = e^1$
accurate to 10 decimal places.
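One way to organize this computation is sketched below in Python. At $x = 1$ the remainder bound reads $|R_n(1)| \le \frac{e}{(n+1)!}$, and we use the crude estimate $e < 3$ (an extra assumption brought in for this sketch) to find the first $n$ with $\frac{3}{(n+1)!} < \frac12 \cdot 10^{-10}$. The function name and the use of exact fractions are our own choices.

```python
import math
from fractions import Fraction

def approximate_e(tolerance: Fraction):
    """Return (n, T_n(1)), where n is the smallest index with 3/(n+1)! < tolerance.

    Relies on |R_n(1)| <= e/(n+1)! < 3/(n+1)!, i.e. on the estimate e < 3.
    """
    n = 0
    while Fraction(3, math.factorial(n + 1)) >= tolerance:
        n += 1
    partial_sum = sum(Fraction(1, math.factorial(k)) for k in range(n + 1))
    return n, partial_sum

if __name__ == "__main__":
    n, approx = approximate_e(Fraction(1, 2 * 10**10))
    print(n, float(approx), math.e, abs(float(approx) - math.e))
```

The partial sum is held as an exact rational number, so the only rounding happens in the final conversion to a float for printing.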
A little thought shows that the work we did for $f(x) = e^x$ carries over verbatim
under somewhat more general hypotheses.
Theorem 12.6. Let $f(x) : \mathbb{R} \to \mathbb{R}$ be a smooth function. Suppose that for all
$A \in [0, \infty)$ there exists a number $M_A$ such that for all $x \in [-A, A]$ and all $n \in \mathbb{N}$,
\[ |f^{(n)}(x)| \le M_A. \]
a) The Taylor series $T(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(0) x^n}{n!}$ converges absolutely for all $x \in \mathbb{R}$.
b) For all $x \in \mathbb{R}$ we have $f(x) = T(x)$: that is, $f$ is equal to its Taylor series
expansion at 0.
\[ \sum_{n=1}^{\infty} \frac{\alpha(\alpha-1)\cdots(\alpha-(n-1))}{n!}\, x^n. \]
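If $\alpha \in \mathbb{N}$, we recognize the $n$th Taylor series coefficient as the binomial coefficient
$\binom{\alpha}{n}$, and this ought not to be surprising because for $\alpha \in \mathbb{N}$, expanding out $T(x)$
simply gives the binomial theorem:
\[ \forall \alpha \in \mathbb{N}, \quad (1+x)^{\alpha} = \sum_{n=0}^{\alpha} \binom{\alpha}{n} x^n. \]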
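So let's extend our definition of binomial coefficients: for any $\alpha \in \mathbb{R}$, put
\[ \binom{\alpha}{0} = 1, \]
\[ \forall n \in \mathbb{Z}^+, \quad \binom{\alpha}{n} = \frac{\alpha(\alpha-1)\cdots(\alpha-(n-1))}{n!}. \]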
Exercise: For any $\alpha \in \mathbb{R}$, $n \in \mathbb{Z}^+$, show
\[ \binom{\alpha}{n} = \binom{\alpha-1}{n-1} + \binom{\alpha-1}{n}. \tag{90} \]
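Before proving the identity (90), it can be reassuring to test it on a non-integer value of $\alpha$. The following sketch computes the extended binomial coefficient exactly with rational arithmetic; the function name `gen_binom` and the sample value $\alpha = 1/2$ are choices made here for illustration.

```python
from fractions import Fraction

def gen_binom(alpha, n):
    """Extended binomial coefficient alpha(alpha-1)...(alpha-(n-1)) / n!."""
    result = Fraction(1)
    for k in range(n):
        # multiply in the factor (alpha - k) / (k + 1)
        result *= Fraction(alpha - k) / (k + 1)
    return result

alpha = Fraction(1, 2)
lhs = gen_binom(alpha, 4)
rhs = gen_binom(alpha - 1, 3) + gen_binom(alpha - 1, 4)
print(lhs, rhs)
```

A numerical check of this kind is of course no substitute for the proof requested in the exercise.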
Finally, we rename the Taylor series to $f(x)$ as the binomial series
\[ B(\alpha, x) = \sum_{n=0}^{\infty} \binom{\alpha}{n} x^n. \]
The binomial series is as old as calculus itself, having been studied by Newton in the
17th century.5 It remains one of the most important and useful of all power series.
For us, our order of business is the usual one when given a Taylor series: first, for
each fixed $\alpha$ we wish to find the interval $I$ on which the series $B(\alpha, x)$ converges.
Second, we would like to show (if possible!) that for all $x \in I$, $B(\alpha, x) = (1+x)^{\alpha}$.
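Theorem 12.7. Let $\alpha \in \mathbb{R} \setminus \mathbb{N}$, and consider the binomial series
\[ B(\alpha, x) = \sum_{n=0}^{\infty} \binom{\alpha}{n} x^n = 1 + \sum_{n=1}^{\infty} \binom{\alpha}{n} x^n. \]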
a)
b)
c)
d)
n
n n + 1
n
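Step 1: Suppose $\alpha \in (0, 1)$. Choose an integer $m \ge 2$ such that $\frac{1}{m} < \alpha$. Then
\[ \left| \binom{\alpha}{n} \right| = \frac{\alpha(1-\alpha)\cdots(n-1-\alpha)}{n!} < 1 \left(1 - \frac{1}{m}\right) \cdots \left(n - 1 - \frac{1}{m}\right) \frac{1}{n!} \]
\[ = \frac{m-1}{m} \cdot \frac{2m-1}{2m} \cdots \frac{(n-1)m-1}{(n-1)m} \cdot \frac{1}{n} = a_n \frac{1}{n}, \]
say. Using Step 0, we get
\[ a_n^{m-1} < \frac{m}{2m-1} \cdot \frac{2m}{3m-1} \cdots \frac{(n-1)m}{nm-1} = \frac{m}{m-1} \cdot \frac{2m}{2m-1} \cdots \frac{(n-1)m}{(n-1)m-1} \cdot \frac{m-1}{nm-1} = \frac{1}{a_n} \cdot \frac{m-1}{nm-1} \le \frac{1}{a_n n}. \]
It follows that $a_n < \frac{1}{n^{1/m}}$, so $\left| \binom{\alpha}{n} \right| < \frac{1}{n^{1 + \frac{1}{m}}}$, so
\[ \sum_n \left| \binom{\alpha}{n} \right| \le \sum_n \frac{1}{n^{1 + \frac{1}{m}}} < \infty. \]
This shows that $B(\alpha, 1)$ is absolutely convergent; since $\left| \binom{\alpha}{n} (-1)^n \right| = \left| \binom{\alpha}{n} \right|$, it also
shows that $B(\alpha, -1)$ is absolutely convergent.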
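Step 2: Using the identity (90), we find
\[ S(\alpha, x) = 1 + \sum_{n=1}^{\infty} \binom{\alpha}{n} x^n = 1 + \sum_{n=1}^{\infty} \left( \binom{\alpha-1}{n-1} + \binom{\alpha-1}{n} \right) x^n = (1+x) S(\alpha-1, x). \]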
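\[ \binom{\alpha}{n+1} \Big/ \binom{\alpha}{n} = \frac{\alpha - n}{n+1} \in (-1, 0); \]
this shows simultaneously that the sequence of terms of $B(\alpha, 1) = \sum_{n=0}^{\infty} \binom{\alpha}{n}$ is
decreasing in absolute value and alternating in sign. Further, write $\alpha = \beta - 1$, so
that $\beta \in (0, 1)$. Choose an integer $m \ge 2$ such that $\frac{1}{m} < \beta$. Then
\[ \left| \binom{\alpha}{n} \right| = \frac{(1-\beta)(2-\beta)\cdots(n-1-\beta)}{(n-1)!} \cdot \frac{n-\beta}{n} = b_n \frac{n-\beta}{n}. \]
Arguing as in Step 1 of part b) shows that $b_n < \frac{1}{n^{1/m}}$, and hence
\[ \lim_{n \to \infty} \left| \binom{\alpha}{n} \right| = \lim_{n \to \infty} b_n \cdot \lim_{n \to \infty} \frac{n-\beta}{n} = 0 \cdot 1 = 0. \]
Therefore the Alternating Series Test applies to show that $S(\alpha, 1)$ converges.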
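d) The absolute value of the $n$th term of both $B(\alpha, -1)$ and $B(\alpha, 1)$ is $\left| \binom{\alpha}{n} \right|$. If
$\alpha \le -1$, then $|\alpha - n| \ge n + 1$ and thus
\[ \left| \binom{\alpha}{n+1} \Big/ \binom{\alpha}{n} \right| = \left| \frac{\alpha - n}{n+1} \right| \ge 1, \]
and thus $\binom{\alpha}{n} \not\to 0$. By the $N$th term test, $S(\alpha, -1)$ and $S(\alpha, 1)$ diverge.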
Exercise*: Show that for $\alpha \in (-1, 0)$, the binomial series $B(\alpha, -1)$ diverges.
Remark: As the reader has surely noted, the convergence of the binomial series
$S(\alpha, x)$ at $x = \pm 1$ is a rather delicate and tricky enterprise. In fact most texts at
this level (even [S]) do not treat it. We have taken Step 1 of part b) from [Ho66].
Remark: There is an extension of the Ratio Test due to J.L. Raabe which simplifies much of the above analysis, including the preceding exercise.
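Theorem 12.8. Let $\alpha \in \mathbb{R} \setminus \mathbb{N}$; let $f(x) = (1+x)^{\alpha}$, and consider its Taylor
series at zero, the binomial series
\[ B(\alpha, x) = \sum_{n=0}^{\infty} \binom{\alpha}{n} x^n. \]
a) For all $x \in (-1, 1)$, $f(x) = B(\alpha, x)$.
b) If $\alpha > -1$, $f(1) = B(\alpha, 1)$.
c) If $\alpha > 0$, $f(-1) = B(\alpha, -1)$.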
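Part a) is easy to probe numerically. The sketch below sums the binomial series term by term, using the ratio $\binom{\alpha}{n+1} / \binom{\alpha}{n} = \frac{\alpha - n}{n+1}$, and compares the result with $(1+x)^{\alpha}$. The function name `binomial_series` and the sample values of $\alpha$ and $x$ are illustrative assumptions.

```python
def binomial_series(alpha, x, terms):
    """Partial sum of B(alpha, x) using the first `terms` terms."""
    total = 0.0
    term = 1.0                       # binom(alpha, 0) * x^0
    for n in range(terms):
        total += term
        # binom(alpha, n+1) x^(n+1) = binom(alpha, n) x^n * (alpha - n)/(n + 1) * x
        term *= (alpha - n) / (n + 1) * x
    return total

print(binomial_series(0.5, 0.5, 200), 1.5 ** 0.5)
print(binomial_series(-2.0, 0.3, 400), 1.3 ** -2.0)
```

At the endpoints $x = \pm 1$ the convergence, when it holds at all, is much slower, which is consistent with the delicacy of parts b) and c).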
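Proof. [La] Let $T_{n-1}(x)$ be the $(n-1)$st Taylor polynomial for $f$ at 0, so
\[ B(\alpha, x) = \lim_{n \to \infty} T_{n-1}(x) \]
is the Taylor series expansion of $f$ at zero. As usual, put $R_{n-1}(x) = f(x) - T_{n-1}(x)$.
a) By Theorem 12.4b),
\[ R_{n-1}(x) = \int_0^x \frac{f^{(n)}(t)(x-t)^{n-1}\, dt}{(n-1)!} = \frac{1}{(n-1)!} \int_0^x \alpha(\alpha-1)\cdots(\alpha-n+1)(1+t)^{\alpha-n}(x-t)^{n-1}\, dt. \]
By the Mean Value Theorem for Integrals, there is $\theta \in (0, 1)$ such that
\[ R_{n-1}(x) = \frac{\alpha(\alpha-1)\cdots(\alpha-n+1)}{(n-1)!} (1+\theta x)^{\alpha-n} (x - \theta x)^{n-1} (x - 0). \]
Put
\[ t = \frac{1-\theta}{1+\theta x}, \quad c_n(s) = \binom{\alpha-1}{n-1} s^{n-1}. \]
Then
\[ (1+s)^{\alpha-1} = \sum_{n=1}^{\infty} c_n(s) \]
and
\[ R_{n-1}(x) = \alpha\, c_n(xt)\, x (1+\theta x)^{\alpha-1}. \]
Since $x \in (-1, 1)$, we have $t \in (0, 1)$, so $|xt| < 1$. It follows that $\sum_{n=1}^{\infty} c_n(xt)$
converges, so by the $n$th term test $c_n(xt) \to 0$ as $n \to \infty$ and thus $R_{n-1}(x) \to 0$.
b) The above argument works verbatim if $x = 1$ and $\alpha > -1$.
c) If $\alpha > 0$, then by Theorem 12.7b), $S(\alpha, -1)$ is convergent. Moreover, $\alpha - 1 > -1$,
so $\sum_{n=1}^{\infty} c_n(1)$ converges and thus $c_n(1) \to 0$. But $|c_n(-1)| = |c_n(1)|$, so also
$c_n(-1) \to 0$ and thus $R_{n-1}(-1) \to 0$.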
5. Hermite Interpolation
It is a well-known fact that two points determine a line. One version of this
is: given real numbers $x_1 < x_2$ and real numbers $f_1, f_2$, there is a unique linear
function $f : \mathbb{R} \to \mathbb{R}$ such that $f(x_1) = f_1$ and $f(x_2) = f_2$. In a similar way, three
points determine a parabola: given real numbers $x_1 < x_2 < x_3$ and real numbers
$f_1, f_2, f_3$, there is a unique quadratic polynomial $P(x) = ax^2 + bx + c$ such that
$P(x_i) = f_i$ for $i = 1, 2, 3$. The following is a generalization of this.
Theorem 12.9. (Polynomial Interpolation) Let $n \in \mathbb{Z}^+$, let $x_0 < \ldots < x_n$ be
real numbers, and let $y_0, \ldots, y_n$ be real numbers. Then there is a unique polynomial
$P$ of degree at most $n$ such that $P(x_i) = y_i$ for $0 \le i \le n$. Indeed, there are real
numbers $A_0, \ldots, A_n$ such that
\[ P(x) = A_0 + A_1(x - x_0) + A_2(x - x_0)(x - x_1) + \ldots + A_n(x - x_0) \cdots (x - x_{n-1}). \tag{91} \]
Proof. Uniqueness: Suppose that $f$ and $g$ are two such polynomials. Then
the polynomial $h = f - g$ has degree at most $n$ and vanishes at the $n + 1$ points
$x_0, \ldots, x_n$. Since a nonzero polynomial cannot have more roots than its degree, we
must have $h \equiv 0$ and thus $f = g$.
Existence: Often the way to find something is to postulate (i.e., guess) a certain
form that the solution should take, while leaving certain parameters undetermined,
then algebraically solve for these parameters. (This is sometimes called the method
of undetermined coefficients.) The present result is an instance of this: (91)
gives the general form of the solution. If we plug in $x = x_0$ we find
\[ y_0 = P(x_0) = A_0. \]
Plugging in $x = x_1$ we find
\[ y_1 = P(x_1) = A_0 + A_1(x_1 - x_0), \]
which we can uniquely solve for $A_1$: explicitly
\[ A_1 = \frac{y_1 - A_0}{x_1 - x_0} = \frac{y_1 - y_0}{x_1 - x_0}. \]
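The same pattern continues: plugging in $x = x_k$ gives one equation in which $A_k$ is the only unknown not yet determined. A minimal sketch of this forward substitution follows, assuming distinct nodes and exact rational arithmetic; the names `newton_coefficients` and `newton_eval` and the sample data are hypothetical.

```python
from fractions import Fraction

def newton_coefficients(xs, ys):
    """Solve P(x_k) = y_k for A_0, ..., A_n in the form (91), one k at a time."""
    coeffs = []
    for k, (xk, yk) in enumerate(zip(xs, ys)):
        value, basis = Fraction(0), Fraction(1)
        for j in range(k):
            value += coeffs[j] * basis      # A_j (x_k - x_0)...(x_k - x_{j-1})
            basis *= xk - xs[j]
        # basis is now (x_k - x_0)...(x_k - x_{k-1}), nonzero for distinct nodes
        coeffs.append((yk - value) / basis)
    return coeffs

def newton_eval(xs, coeffs, x):
    """Evaluate A_0 + A_1 (x - x_0) + ... + A_n (x - x_0)...(x - x_{n-1})."""
    value, basis = Fraction(0), Fraction(1)
    for j, a in enumerate(coeffs):
        value += a * basis
        basis *= x - xs[j]
    return value

xs, ys = [0, 1, 2], [1, 3, 11]
coeffs = newton_coefficients(xs, ys)
print(coeffs, [newton_eval(xs, coeffs, x) for x in xs])
```

Each $A_k$ is obtained by a single division, which is exactly why the system can always be solved when the $x_i$ are distinct.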
In fact, following Lagrange, we can give a more explicit formula for the interpolating
polynomial of Theorem 12.9. (Why didn't we start with this? Because often it is
useful to make a distinction between proving the existence of something and giving
an explicit construction of it. The former is often easier, whereas the latter is often
more useful. In fact we will not need the explicit formula for most of our work here,
but in our study of quadrature we will want to have it.)
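Theorem 12.10. (Lagrange Interpolation Formula) Let $n \in \mathbb{Z}^+$, let $x_0 < \ldots < x_n$
be real numbers, and let $y_0, \ldots, y_n$ be real numbers. For $0 \le j \le n$, define
\[ \ell_j(x) = \frac{x - x_0}{x_j - x_0} \cdots \frac{x - x_{j-1}}{x_j - x_{j-1}} \cdot \frac{x - x_{j+1}}{x_j - x_{j+1}} \cdots \frac{x - x_n}{x_j - x_n} = \prod_{0 \le i \le n,\ i \ne j} \frac{x - x_i}{x_j - x_i}. \]
Then an explicit formula for the unique polynomial $P(x)$ of degree at most $n$ such
that $P(x_i) = y_i$ for all $0 \le i \le n$ of Theorem 12.9 is
\[ P(x) = \sum_{j=0}^{n} y_j \ell_j(x). \tag{92} \]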
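Formula (92) translates almost word for word into code. The sketch below assumes integer (or rational) nodes and values so that the arithmetic is exact; the function names and the sample data are illustrative choices, not part of the theorem.

```python
from fractions import Fraction

def lagrange_basis(xs, j, x):
    """ell_j(x): the product over i != j of (x - x_i) / (x_j - x_i)."""
    value = Fraction(1)
    for i, xi in enumerate(xs):
        if i != j:
            value *= Fraction(x - xi, xs[j] - xi)
    return value

def lagrange_interpolate(xs, ys, x):
    """P(x) = sum over j of y_j * ell_j(x), as in (92)."""
    return sum(y * lagrange_basis(xs, j, x) for j, y in enumerate(ys))

xs, ys = [0, 1, 2], [1, 3, 11]
print([lagrange_interpolate(xs, ys, x) for x in xs])
```

The key property is that $\ell_j(x_i)$ is 1 when $i = j$ and 0 otherwise, so evaluating the sum at a node picks out exactly one $y_j$.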
Let us now try to switch back to the old notation: we give ourselves $n + 1$ real
numbers with multiplicity: $x_0 \le x_1 \le \ldots \le x_n$, and $n + 1$ real numbers
$f_0, \ldots, f_n$. We write the interpolation problem as above as $f(x_i) = f_i$, but with the
understanding that when a root is repeated more than once, the further conditions
are conditions on the derivatives of $f$. In this case we claim that the interpolation
polynomial can be taken in the same form as above: namely, there are unique real
numbers $A_0, \ldots, A_n$ such that
\[ f = f(x) = A_0 + A_1(x - x_0) + A_2(x - x_0)(x - x_1) + \ldots + A_n(x - x_0) \cdots (x - x_{n-1}). \]
At the moment we will prove this by linear algebraic considerations (which is cheating: we are not supposed to be assuming any knowledge of linear algebra in this
text!). Namely, since we have already shown the existence of an interpolating polynomial $f$ of degree at most $n$, it suffices to show that the set of polynomials
\[ S = \{1,\ x - x_0,\ (x - x_0)(x - x_1),\ \ldots,\ (x - x_0) \cdots (x - x_{n-1})\} \]
spans the $\mathbb{R}$-vector space $P_n$ of all polynomials of degree at most $n$. The set $S$ is
linearly independent: indeed, the polynomials have distinct degrees, so a nontrivial
linear dependence relation would allow us to write a nonzero polynomial as
a linear combination of polynomials of smaller degree, which is absurd. Further,
$\#S = n + 1$. But $P_n$ has dimension $n + 1$, so $S$ must be a basis for $P_n$: in particular $S$
spans $P_n$.
Having billed Theorem 12.11 as generalizing Theorem 12.9, let us now call attention to the other extreme: suppose $x_0 = \ldots = x_n = c$, say. Then the interpolating
polynomial $P$ is precisely the degree $n$ Taylor polynomial at $c$ to any $n$ times differentiable function $f$ with $f^{(j)}(c) = f_j$ for $0 \le j \le n$. This brings up a key
idea in general: let $I$ be an interval containing the points $x_0 \le \ldots \le x_n$, and let
$f : I \to \mathbb{R}$ be an $n$ times differentiable function. We define the Hermite interpolation polynomial $P(x)$ to be the unique polynomial of degree at most $n$ such
that for all $0 \le i \le n$, $P(x_i) = f(x_i)$: here we are using the above slightly shady
convention that when the $x_i$'s occur with multiplicity greater than 1, the conditions
$P(x_i) = f(x_i)$ are actually conditions on the derivatives of $P$ and $f$ at $x_i$.
Let us define the remainder function: for $x \in I$,
\[ R(x) = f(x) - P(x). \]
Following [CJ], we will now give an expression for $R$ which generalizes one form of
Taylor's Theorem With Remainder. We begin with one preliminary result.
Theorem 12.12. (Generalized Rolle's Theorem) Let $f : I \to \mathbb{R}$ be $n$ times differentiable, and assume that $f$ has at least $n+1$ roots on $I$, counted with multiplicity.
Then there is $\xi \in I$ with $f^{(n)}(\xi) = 0$.
Exercise: Prove Theorem 12.12.
Theorem 12.13. (Hermite With Remainder) Let $x_0 \le \ldots \le x_n \in I$, and let
$f : I \to \mathbb{R}$ be $(n + 1)$ times differentiable. Let $P$ be the Hermite Interpolation
Polynomial for $f$. Then, for all $x \in I$, there is $\xi \in I$ (in fact, lying in any closed
interval containing $x, x_0, \ldots, x_n$) such that
\[ R(x) = f(x) - P(x) = \frac{(x - x_0) \cdots (x - x_n)}{(n+1)!} f^{(n+1)}(\xi). \tag{93} \]
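Proof. If $x = x_i$ for some $i$, then both sides of (93) are 0, so equality holds.
We may thus assume $x \ne x_i$ for any $i$. Let $c \in \mathbb{R}$, and consider
\[ K(x) = R(x) - c(x - x_0) \cdots (x - x_n). \]
There is a unique value of $c$ such that $K(x) = 0$: namely,
\[ c = \frac{R(x)}{(x - x_0) \cdots (x - x_n)}. \]
The function $K : I \to \mathbb{R}$ thus vanishes at least $n + 2$ times on $I$ with multiplicity,
so by the Generalized Rolle's Theorem there is $\xi \in I$ such that $K^{(n+1)}(\xi) = 0$. Since $P$ has degree at most $n$ and $(x - x_0) \cdots (x - x_n)$ is monic of degree $n + 1$, this says $f^{(n+1)}(\xi) - c\,(n+1)! = 0$, i.e.,
\[ c = \frac{f^{(n+1)}(\xi)}{(n+1)!}. \]
Comparing the two expressions for $c$ gives
\[ R(x) = \frac{(x - x_0) \cdots (x - x_n)}{(n+1)!} f^{(n+1)}(\xi). \]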
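When the nodes are distinct and a bound on $|f^{(n+1)}|$ is known, (93) gives a computable error estimate. The following sketch checks it for $f = \sin$, where every derivative is bounded by 1, so that $|R(x)| \le |(x - x_0) \cdots (x - x_n)| / (n+1)!$. The choice of $f$, the nodes, and the helper names are assumptions made for this illustration, and only the distinct-node case is handled.

```python
import math

def interpolate(xs, ys, x):
    """Lagrange form of the interpolating polynomial (distinct nodes only)."""
    total = 0.0
    for j, xj in enumerate(xs):
        term = ys[j]
        for i, xi in enumerate(xs):
            if i != j:
                term *= (x - xi) / (xj - xi)
        total += term
    return total

def remainder_and_bound(xs, x):
    """Return |R(x)| and the bound |prod (x - x_i)| / (n+1)! for f = sin."""
    ys = [math.sin(t) for t in xs]
    remainder = abs(math.sin(x) - interpolate(xs, ys, x))
    # len(xs) = n + 1, and |sin^{(n+1)}| <= 1 everywhere
    bound = abs(math.prod(x - xi for xi in xs)) / math.factorial(len(xs))
    return remainder, bound

nodes = [0.0, 0.5, 1.0, 1.5]
for x in (0.25, 0.75, 1.2):
    print(x, remainder_and_bound(nodes, x))
```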
Remark: Restricting to Taylor polynomials, our earlier argument for the existence
of the interpolating polynomial is certainly easier: recall this consisted of simply
writing down the answer and checking that it was correct. However this proof of
part b) of Taylor's Theorem with Remainder seems easier.
Exercise: a) Let $x_0 \le \ldots \le x_n$, and let $m \le n$. Let
\[ P_n(x) = A_0 + A_1(x - x_0) + \ldots + A_n(x - x_0) \cdots (x - x_{n-1}) \]
be the Hermite interpolation polynomial for a function $f$. Show that the Hermite
interpolation polynomial for $f$ with respect to the approximation points $x_0 \le \ldots \le x_m$ is
\[ P_m(x) = A_0 + A_1(x - x_0) + \ldots + A_m(x - x_0) \cdots (x - x_{m-1}). \]
b) Suppose $x_{n-1} \ne x_n$. Show that there is $\xi \in [x_0, x_n]$ such that
\[ A_n = \frac{f^{(n)}(\xi)}{n!}. \]
c) Show that there is a sequence $\{\xi_k\}$ taking values in $I$ such that
\[ A_n = \lim_{k \to \infty} \frac{f^{(n)}(\xi_k)}{n!}. \]
d) Suppose that $f$ has a continuous $(n + 1)$st derivative. Use part c) to recover the
formula for the $n$th Taylor series coefficient.
CHAPTER 13
The great mathematicians of the 17th, 18th and early 19th centuries encountered
many sequences and series of functions (again, especially power series and Taylor
series) and often did not hesitate to assert that the pointwise limit of a sequence of
functions having a certain nice property itself had that nice property.2 The problem
is that statements like this unfortunately need not be true!
Example 1: Define $f_n = x^n : [0, 1] \to \mathbb{R}$. Clearly $f_n(0) = 0^n = 0$, so $f_n(0) \to 0$. For
any $0 < x \le 1$, the sequence $f_n(x) = x^n$ is a geometric sequence with geometric
ratio $x$, so that $f_n(x) \to 0$ for $0 < x < 1$ and $f_n(1) \to 1$. It follows that the
sequence of functions $\{f_n\}$ has a pointwise limit $f : [0, 1] \to \mathbb{R}$, the function which
is 0 for $0 \le x < 1$ and 1 at $x = 1$. Unfortunately the limit function is discontinuous
at $x = 1$, despite the fact that each of the functions $f_n$ is continuous (and is a
polynomial, so really as nice as a function can be). Therefore the pointwise
limit of a sequence of continuous functions need not be continuous.
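A few numerical values make the behavior in Example 1 concrete: at each fixed $x < 1$ the values $x^n$ eventually become small, but the closer $x$ is to 1 the longer this takes, while at $x = 1$ the value never moves. The sample points and exponents below are arbitrary choices for this sketch.

```python
def f(n, x):
    """The function f_n(x) = x^n of Example 1."""
    return x ** n

for x in (0.5, 0.9, 0.99, 1.0):
    # one row per sample point x, one column per exponent n
    print(x, [f(n, x) for n in (1, 10, 100, 1000)])
```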
Remark: Example 1 was chosen for its simplicity, not to exhibit maximum pathology. It is possible to construct a sequence $\{f_n\}_{n=1}^{\infty}$ of polynomial functions converging pointwise to a function $f : [0, 1] \to \mathbb{R}$ that has infinitely many discontinuities!
(On the other hand, it turns out that it is not possible for a pointwise limit of
continuous functions to be discontinuous at every point. This is a theorem of R.
Baire. But we had better not talk about this, or we'll get distracted from our stated
goal of establishing the wonderful properties of power series.)
One can also find assertions in the math papers of old that if $f_n$ converges to
$f$ pointwise on an interval $[a, b]$, then $\int_a^b f_n\, dx \to \int_a^b f\, dx$. To a modern eye, there
are in fact two things to establish here: first that if each $f_n$ is Riemann integrable,
then the pointwise limit $f$ must be Riemann integrable. And second, that if $f$ is
Riemann integrable, its integral is the limit of the sequence of integrals of the $f_n$'s.
In fact both of these are false!
Example 2: Define a sequence $\{f_n\}_{n=0}^{\infty}$ with common domain $[0, 1]$ as follows. Let
$f_0$ be the constant function 1. Let $f_1$ be the function which is constantly 1 except
$f_1(0) = f_1(1) = 0$. Let $f_2$ be the function which is equal to $f_1$ except $f_2(1/2) = 0$.
Let $f_3$ be the function which is equal to $f_2$ except $f_3(1/3) = f_3(2/3) = 0$. And
so forth. To get from $f_n$ to $f_{n+1}$ we change the value of $f_n$ at the finitely many
rational numbers $\frac{a}{n+1}$ in $[0, 1]$ from 1 to 0. Thus each $f_n$ is equal to 1 except at a finite
set of points: in particular it is bounded with only finitely many discontinuities,
so it is Riemann integrable. The functions $f_n$ converge pointwise to a function $f$
which is 1 on every irrational point of $[0, 1]$ and 0 on every rational point of $[0, 1]$.
Since every open interval $(a, b)$ contains both rational and irrational numbers, the
function $f$ is not Riemann integrable: for any partition of $[0, 1]$ its upper sum is
1 and its lower sum is 0. Thus a pointwise limit of Riemann integrable functions
need not be Riemann integrable.
2 This is an exaggeration. The precise definition of convergence of real sequences did not come
until the work of Weierstrass in the latter half of the 19th century. Thus mathematicians spoke of
functions $f_n$ approaching or getting infinitely close to a fixed function $f$. Exactly what they
meant by this (and indeed, whether even they knew exactly what they meant; presumably some
did better than others) is a matter of serious debate among historians of mathematics.
0 n
Thus
n xc
xc
xc n
Proof. Step 1: We show that the sequence $\{L_n\}$ is convergent. Since we don't
yet have a real number to show that it converges to, it is natural to try to use the
Cauchy criterion, hence to try to bound $|L_m - L_n|$. Now comes the trick: for all
$x \in I$ we have
\[ |L_m - L_n| \le |L_m - f_m(x)| + |f_m(x) - f_n(x)| + |f_n(x) - L_n|. \]
By the Cauchy criterion for uniform convergence, for any $\epsilon > 0$ there exists $N \in \mathbb{Z}^+$
such that for all $m, n \ge N$ and all $x \in I$ we have $|f_m(x) - f_n(x)| < \frac{\epsilon}{3}$. Moreover,
the fact that $f_m(x) \to L_m$ and $f_n(x) \to L_n$ gives us bounds on the first and last
terms: there exists $\delta > 0$ such that if $0 < |x - c| < \delta$ then $|L_n - f_n(x)| < \frac{\epsilon}{3}$
and $|L_m - f_m(x)| < \frac{\epsilon}{3}$. Combining these three estimates, we find that by taking
$x \in (c - \delta, c + \delta)$, $x \ne c$ and $m, n \ge N$, we have
\[ |L_m - L_n| < \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon. \]
So the sequence $\{L_n\}$ is Cauchy and hence convergent, say to the real number $L$.
Step 2: We show that $\lim_{x \to c} f(x) = L$ (so in particular the limit exists!). Actually
the argument for this is very similar to that of Step 1:
\[ |f(x) - L| \le |f(x) - f_n(x)| + |f_n(x) - L_n| + |L_n - L|. \]
Since $L_n \to L$ and $f_n(x) \to f(x)$, the first and last term will each be less than $\frac{\epsilon}{3}$
for sufficiently large $n$. Since $f_n(x) \to L_n$, the middle term will be less than $\frac{\epsilon}{3}$ for
$x$ sufficiently close to $c$. Overall we find that by taking $x$ sufficiently close to (but
not equal to) $c$, we get $|f(x) - L| < \epsilon$ and thus $\lim_{x \to c} f(x) = L$.
Corollary 13.3. Let $f_n$ be a sequence of continuous functions with common
domain $I$ and suppose that $f_n \xrightarrow{u} f$ on $I$. Then $f$ is continuous on $I$.
Since Corollary 13.3 is easier than Theorem 13.2, we include a separate proof.
Proof. Let $c \in I$. We need to show that $\lim_{x \to c} f(x) = f(c)$, thus we need to
show that for any $\epsilon > 0$ there exists $\delta > 0$ such that for all $x$ with $|x - c| < \delta$ we
have $|f(x) - f(c)| < \epsilon$. The idea (again!) is to trade this one quantity for three
quantities that we have an immediate handle on by writing
\[ |f(x) - f(c)| \le |f(x) - f_n(x)| + |f_n(x) - f_n(c)| + |f_n(c) - f(c)|. \]
By uniform convergence, there exists $n \in \mathbb{Z}^+$ such that $|f(x) - f_n(x)| < \frac{\epsilon}{3}$ for
all $x \in I$: in particular $|f_n(c) - f(c)| = |f(c) - f_n(c)| < \frac{\epsilon}{3}$. Further, since $f_n(x)$
is continuous, there exists $\delta > 0$ such that for all $x$ with $|x - c| < \delta$ we have
$|f_n(x) - f_n(c)| < \frac{\epsilon}{3}$. Consolidating these estimates, we get
\[ |f(x) - f(c)| < \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon. \]
a n
a
u
n1
i=0
n1
(xi+1 xi ) = (b a),
i=0
and similarly,
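\[ |L(f_n, \mathcal{P}) - L(f, \mathcal{P})| \le \epsilon(b - a). \]
Since $f_N$ is integrable, by Darboux's Criterion there is a partition $\mathcal{P}$ of $[a, b]$ such
that $U(f_N, \mathcal{P}) - L(f_N, \mathcal{P}) < \epsilon$. Thus, taking $n = N$,
\[ |U(f, \mathcal{P}) - L(f, \mathcal{P})| \le |U(f, \mathcal{P}) - U(f_n, \mathcal{P})| + |U(f_n, \mathcal{P}) - L(f_n, \mathcal{P})| + |L(f_n, \mathcal{P}) - L(f, \mathcal{P})| \]
\[ \le \epsilon(b - a) + \epsilon + \epsilon(b - a) = \epsilon(2(b - a) + 1). \]
Since $\epsilon > 0$ was arbitrary, Darboux's Criterion shows $f$ is integrable on $[a, b]$.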
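Step 2: If $f, g : [a, b] \to \mathbb{R}$ are integrable and $|f(x) - g(x)| \le \epsilon$ for all $x \in [a, b]$, then
\[ \left| \int_a^b f - \int_a^b g \right| = \left| \int_a^b (f - g) \right| \le \int_a^b |f - g| \le \epsilon(b - a). \]
From this simple observation and Step 1 the fact that $f_n \xrightarrow{u} f$ implies $\int_a^b f_n \to \int_a^b f$
is almost immediate. The details are left to you.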
Exercise: It follows from Theorem 13.4 that the sequences in Examples 2 and 3
above are not uniformly convergent. Verify this directly.
Corollary 13.5. Let $\{f_n\}$ be a sequence of continuous functions defined on
the interval $[a, b]$ such that $\sum_{n=0}^{\infty} f_n \xrightarrow{u} f$. For each $n$, let $F_n : [a, b] \to \mathbb{R}$ be the
unique function with $F_n' = f_n$ and $F_n(a) = 0$, and similarly let $F : [a, b] \to \mathbb{R}$ be the
unique function with $F' = f$ and $F(a) = 0$. Then $\sum_{n=0}^{\infty} F_n \xrightarrow{u} F$.
Exercise: Prove Corollary 13.5.
Our next order of business is to discuss differentiation of sequences of functions.
For this we should reconsider Example 4: let $g : \mathbb{R} \to \mathbb{R}$ be a bounded differentiable
function such that $\lim_{n \to \infty} g'(n)$ does not exist, and let $f_n(x) = \frac{g(nx)}{n}$. Let $M$ be
such that $|g(x)| \le M$ for all $x \in \mathbb{R}$. Then for all $x \in \mathbb{R}$, $|f_n(x)| \le \frac{M}{n}$, so $f_n \xrightarrow{u} 0$. But
as we saw above, $\lim_{n \to \infty} f_n'(1)$ does not exist.
Thus we have shown the following somewhat distressing fact: uniform convergence of $f_n$ to $f$ does not imply that $f_n'$ converges.
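To see the phenomenon numerically, one can take $g = \sin$. This is a choice made for this sketch and is not necessarily the function used in Example 4. Then $f_n(x) = \frac{\sin(nx)}{n}$ satisfies $|f_n(x)| \le \frac{1}{n}$, while $f_n'(1) = \cos(n)$ keeps oscillating.

```python
import math

def f(n, x):
    """f_n(x) = g(nx)/n with the illustrative choice g = sin."""
    return math.sin(n * x) / n

def f_prime(n, x):
    """Derivative of f_n: g'(nx) = cos(nx)."""
    return math.cos(n * x)

for n in (1, 10, 100, 1000):
    # first value shrinks like 1/n, second value does not settle down
    print(n, f(n, 1.0), f_prime(n, 1.0))
```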
Well, don't panic. What we want is true in practice; we just need suitable hypotheses. We will give a relatively simple result sufficient for our coming applications.
Theorem 13.6. Let $\{f_n\}_{n=1}^{\infty}$ be a sequence of functions on $[a, b]$. We suppose:
(i) Each $f_n$ is continuously differentiable on $[a, b]$,
(ii) The functions $f_n$ converge pointwise on $[a, b]$ to some function $f$, and
(iii) The functions $f_n'$ converge uniformly on $[a, b]$ to some function $g$.
Then $f$ is differentiable and $f' = g$, or in other words
\[ \left( \lim_{n \to \infty} f_n \right)' = \lim_{n \to \infty} f_n'. \]
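Proof. Let $x \in [a, b]$. Since $f_n' \xrightarrow{u} g$ on $[a, b]$, certainly $f_n' \xrightarrow{u} g$ on $[a, x]$. Since
each $f_n'$ is continuous, by Corollary 13.3 $g$ is continuous. Now applying Theorem
13.4 and the Fundamental Theorem of Calculus we have
\[ \int_a^x g = \int_a^x \lim_{n \to \infty} f_n' = \lim_{n \to \infty} \int_a^x f_n' = \lim_{n \to \infty} \left( f_n(x) - f_n(a) \right) = f(x) - f(a). \]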
to $f(x)$. Suppose that each $f_n$ is continuously differentiable and $\sum_{n=0}^{\infty} f_n'(x) \xrightarrow{u} g$.
\[ \left( \sum_{n=0}^{\infty} f_n \right)' = \sum_{n=0}^{\infty} f_n'. \]
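\[ |g(x) - g(t)| = |x - t|\,|g'(c)| \le |x - t| \frac{\epsilon}{2(b - a)} \le \frac{\epsilon}{2}. \tag{95} \]
It follows that for all $x \in [a, b]$,
\[ |f_m(x) - f_n(x)| = |g(x)| \le |g(x) - g(x_0)| + |g(x_0)| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon. \]
By the Cauchy criterion, $f_n$ is uniformly convergent on $[a, b]$ to some function $f$.
Step 2: Now fix $x \in [a, b]$ and define
\[ \varphi_n(t) = \frac{f_n(t) - f_n(x)}{t - x} \]
and
\[ \varphi(t) = \frac{f(t) - f(x)}{t - x}, \]
so that for all $n \in \mathbb{Z}^+$, $\lim_{t \to x} \varphi_n(t) = f_n'(x)$. Now by (95) we have
\[ |\varphi_m(t) - \varphi_n(t)| \le \frac{\epsilon}{2(b - a)} \]
for all $m, n \ge N$, so once again by the Cauchy criterion $\varphi_n$ converges uniformly for
all $t \ne x$. Since $f_n \xrightarrow{u} f$, we get $\varphi_n \xrightarrow{u} \varphi$ for all $t \ne x$. Finally we apply Theorem
13.2 on the interchange of limit operations:
\[ f'(x) = \lim_{t \to x} \varphi(t) = \lim_{t \to x} \lim_{n \to \infty} \varphi_n(t) = \lim_{n \to \infty} \lim_{t \to x} \varphi_n(t) = \lim_{n \to \infty} f_n'(x). \]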
In (more) words, $\|f\|$ is the least $M \in [0, \infty]$ such that $|f(x)| \le M$ for all $x \in I$.
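Theorem 13.9. (Weierstrass M-Test) Let $\{f_n\}_{n=0}^{\infty}$ be a sequence of functions
defined on an interval $I$. Let $\{M_n\}_{n=0}^{\infty}$ be a non-negative sequence such that $\|f_n\|$
\[ |S_{N+k}(x) - S_N(x)| = \left| \sum_{n=N+1}^{N+k} f_n(x) \right| \le \sum_{n > N} |f_n(x)| \le \sum_{n > N} M_n < \epsilon. \]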
\[ f'(x) = \sum_{n=1}^{\infty} n a_n x^{n-1}. \]
c) Since the power series $f'$ has the same radius of convergence $R > 0$ as $f$, $f$ is
in fact infinitely differentiable.
d) For all $n \in \mathbb{N}$, $f^{(n)}(0) = (n!) a_n$.
Proof.
a) Let $0 < A < R$, so $f$ defines a function from $[-A, A]$ to $\mathbb{R}$. We claim that
the series $\sum_n a_n x^n$ converges to $f$ uniformly on $[-A, A]$. Indeed, as a function
on $[-A, A]$, we have $\|a_n x^n\| = |a_n| A^n$, and thus $\sum_n \|a_n x^n\| = \sum_n |a_n| A^n < \infty$,
because power series converge absolutely on the interior of their interval of
convergence. Thus by the Weierstrass $M$-test $f$ is the uniform limit of the sequence
$S_n(x) = \sum_{k=0}^{n} a_k x^k$. But each $S_n$ is a polynomial function, hence continuous and
infinitely differentiable. So by Theorem 13.2 $f$ is continuous on $[-A, A]$. Since any
$x \in (-R, R)$ lies in $[-A, A]$ for some $0 < A < R$, $f$ is continuous on $(-R, R)$.
b) According to Corollary 13.7, in order to show that $f = \sum_n a_n x^n = \sum_n f_n$ is
differentiable and the derivative may be computed termwise, it is enough to check
By X.X, this power series also has radius of convergence $R$, hence by the result of
part a) it is uniformly convergent on $[-A, A]$. Therefore Corollary 13.7 applies to
$F(x) = \sum_{n=0}^{\infty} \frac{a_n}{n+1} x^{n+1}$
is an anti-derivative of $f$.
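Termwise differentiation and integration are easy to try out on the geometric series $\sum_{n=0}^{\infty} x^n = \frac{1}{1-x}$, which has all $a_n = 1$ and radius of convergence 1. The sketch below truncates the series at 200 terms (an arbitrary cutoff chosen here) and compares the results at $x = 0.5$ with $\frac{1}{(1-x)^2}$ and $-\log(1-x)$; the helper names are illustrative.

```python
import math

def power_series(coeffs, x):
    """Evaluate the truncated series sum of a_n x^n."""
    return sum(a * x**n for n, a in enumerate(coeffs))

def termwise_derivative(coeffs):
    """Coefficients of sum n a_n x^(n-1): entry n-1 holds n * a_n."""
    return [n * a for n, a in enumerate(coeffs)][1:]

def termwise_antiderivative(coeffs):
    """Coefficients of sum a_n x^(n+1) / (n+1): entry n+1 holds a_n / (n+1)."""
    return [0.0] + [a / (n + 1) for n, a in enumerate(coeffs)]

coeffs = [1.0] * 200                      # truncated geometric series
x = 0.5
print(power_series(termwise_derivative(coeffs), x), 1 / (1 - x) ** 2)
print(power_series(termwise_antiderivative(coeffs), x), -math.log(1 - x))
```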
The following exercise drives home that uniform convergence of a sequence or series
of functions on all of $\mathbb{R}$ is a very strong condition, often too much to hope for.
Exercise: Let $f(x) = \sum_{n=0}^{\infty} a_n x^n$ be a power series with $a_n \ge 0$ for all $n$. Suppose
that the radius of convergence is 1, so that $f$ defines a function on $(-1, 1)$. Show
that the following are equivalent:
(n)
(0)
n!
Exercise: Suppose $f(x) = \sum_n a_n x^n$ and $g(x) = \sum_n b_n x^n$ are two power series each
converging on some open interval $(-A, A)$. Let $\{x_n\}_{n=1}^{\infty}$ be a sequence of elements
of $(-A, A) \setminus \{0\}$ such that $\lim_{n \to \infty} x_n = 0$. Suppose that $f(x_n) = g(x_n)$ for all
$n \in \mathbb{Z}^+$. Show that $a_n = b_n$ for all $n$.
The upshot of Corollary 13.11 is that the only way that two power series can be
equal as functions (even in some very small interval around zero) is if all of their
coefficients are equal. This is not obvious, since in general $\sum_{n=0}^{\infty} a_n = \sum_{n=0}^{\infty} b_n$
does not imply $a_n = b_n$ for all $n$. Another way of saying this is that the only power
series a function can be equal to on a small interval around zero is its Taylor series.
CHAPTER 14
Serial Miscellany
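1. $\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}$
$\sum_{n=1}^{\infty} \frac{1}{n^p}$ converges
Theorem 14.1. (Euler) $\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}$.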
Euler's original argument is brilliant but not fully rigorous by modern standards.
Since then several branches of mathematical analysis have been founded which give
systematic tools for finding sums of this and similar series. In particular if one
learns about Fourier series or complex analysis then very natural proofs can
be given, but both of these topics are beyond the scope of an honors calculus course.
On the other hand, in the intervening centuries literally hundreds of proofs of
Theorem 14.1 have been given, some of which use only tools we have developed
(or indeed, no tools beyond standard freshman calculus). Among these we give
here a particularly nice argument due to D. Daners [Da12] following Y. Matsuoka
[Ma61]. In fact this argument barely uses notions from infinite series! Rather,
it gives an upper bound on $\frac{\pi^2}{6} - \sum_{n=1}^{N} \frac{1}{n^2}$ in terms of $N$ which approaches 0 as
$N \to \infty$, and this certainly suffices. Precisely, we will show the following result.
Theorem 14.2. For all positive integers $N$,
\[ \frac{\pi^2}{6} - \sum_{n=1}^{N} \frac{1}{n^2} \le \frac{\pi^2}{4(N+1)}. \]
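The inequality of Theorem 14.2 can be checked numerically for a few values of $N$. The sketch below uses floating point arithmetic and the library value of $\pi$; the function names `gap` and `bound` are illustrative.

```python
import math

def gap(N):
    """pi^2/6 minus the N-th partial sum of the series sum 1/n^2."""
    return math.pi**2 / 6 - sum(1.0 / n**2 for n in range(1, N + 1))

def bound(N):
    """The upper bound pi^2 / (4(N+1)) of Theorem 14.2."""
    return math.pi**2 / (4 * (N + 1))

for N in (1, 10, 100, 1000):
    print(N, gap(N), bound(N))
```

The gap behaves roughly like $\frac{1}{N}$, so the bound is of the right order of magnitude even though its constant is not sharp.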
\[ \frac{A_n}{2n - 1} = \frac{A_{n-1}}{2n}. \tag{96} \]
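\[ A_n = (2n - 1) \int_0^{\pi/2} \sin^2 x \cos^{2(n-1)} x\, dx = (2n - 1) \int_0^{\pi/2} (1 - \cos^2 x) \cos^{2(n-1)} x\, dx = (2n - 1)(A_{n-1} - A_n). \]
Thus
\[ \frac{A_n}{2n - 1} = A_{n-1} - A_n, \]
and
\[ \frac{A_{n-1}}{2n} = \frac{1}{2n} \left( A_n + \frac{A_n}{2n - 1} \right) = \frac{1}{2n} \cdot \frac{(2n - 1) A_n + A_n}{2n - 1} = \frac{A_n}{2n - 1}. \]
\[ A_n = \int_0^{\pi/2} 1 \cdot \cos^{2n} x\, dx = 2n \int_0^{\pi/2} x \sin x \cos^{2n-1} x\, dx \]
\[ = -n \int_0^{\pi/2} x^2 \left( \cos x \cos^{2n-1} x - (2n - 1) \sin^2 x \cos^{2(n-1)} x \right) dx \]
\[ = -n B_n + n(2n - 1)(B_{n-1} - B_n) = (2n - 1) n B_{n-1} - 2n^2 B_n. \]
Dividing by $n^2 A_n$ and using (96), we get
\[ \frac{1}{n^2} = \frac{(2n - 1) B_{n-1}}{n A_n} - \frac{2 B_n}{A_n} = \frac{2 B_{n-1}}{A_{n-1}} - \frac{2 B_n}{A_n}. \]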
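Thus for all $N \in \mathbb{Z}^+$ we have
\[ \sum_{n=1}^{N} \frac{1}{n^2} = \sum_{n=1}^{N} \left( \frac{2 B_{n-1}}{A_{n-1}} - \frac{2 B_n}{A_n} \right) = \frac{2 B_0}{A_0} - \frac{2 B_N}{A_N}. \]
Since
\[ A_0 = \int_0^{\pi/2} dx = \frac{\pi}{2}, \quad B_0 = \int_0^{\pi/2} x^2\, dx = \frac{\pi^3}{24}, \]
we have
\[ \frac{2 B_0}{A_0} = \frac{\pi^2}{6} \]
and thus
\[ \sum_{n=1}^{N} \frac{1}{n^2} = \frac{\pi^2}{6} - \frac{2 B_N}{A_N}. \]
Equivalently
\[ \frac{\pi^2}{6} - \sum_{n=1}^{N} \frac{1}{n^2} = \frac{2 B_N}{A_N} > 0. \tag{98} \]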
1 This time we leave it to the reader to check that the boundary terms $uv \big|_0^{\pi/2}$ evaluate to 0.
and thus
\[ x^2 \le \left( \frac{\pi}{2} \right)^2 \sin^2 x. \]
Exercise: Prove Lemma 14.4. (Hint: use convexity!)
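Using Lemma 14.4 and Lemma 14.3a) with $N = n - 1$ we get
\[ 0 < \frac{\pi^2}{6} - \sum_{n=1}^{N} \frac{1}{n^2} = \frac{2 B_N}{A_N} = \frac{2}{A_N} \int_0^{\pi/2} x^2 \cos^{2N} x\, dx \]
\[ \le \frac{2}{A_N} \left( \frac{\pi}{2} \right)^2 \int_0^{\pi/2} \sin^2 x \cos^{2N} x\, dx = \frac{2}{A_N} \left( \frac{\pi}{2} \right)^2 \frac{A_N}{2(N+1)} = \frac{\pi^2}{4(N+1)}, \]
which proves Theorem 14.2.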
sequence. However, if we reorder the terms $\{a_n\}$ of an infinite series $\sum_{n=1}^{\infty} a_n$, the
corresponding change in the sequence $A_n$ of partial sums is not simply a reordering,
as one can see by looking at very simple examples. For instance, if we reorder
\[ \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \ldots + \frac{1}{2^n} + \ldots \]
as
\[ \frac{1}{4} + \frac{1}{2} + \frac{1}{8} + \ldots + \frac{1}{2^n} + \ldots \]
Then the first partial sum of the new series is $\frac{1}{4}$, whereas every nonzero partial sum
of the original series is at least $\frac{1}{2}$.
Thus there is some evidence to fuel suspicion that reordering the terms of an infinite series may not be so innocuous an operation as for that of an infinite sequence.
All of this discussion is mainly justification for our setting up the rearrangement
problem carefully, with a precision that might otherwise look merely pedantic.
Namely, the formal notion of rearrangement of a series $\sum_{n=0}^{\infty} a_n$ begins with a
permutation $\sigma$ of $\mathbb{N}$, i.e., a bijective function $\sigma : \mathbb{N} \to \mathbb{N}$.
Indeed, suppose that $a_n \ge 0$ for all $n$. In this case the sum $A = \sum_{n=0}^{\infty} a_n \in [0, \infty]$
is simply the supremum of the set $A_k = \sum_{n=0}^{k} a_n$ of finite sums. More generally, let $S = \{n_1, \ldots, n_k\}$ be any finite subset of the natural numbers, and put
$A_S = a_{n_1} + \ldots + a_{n_k}$. Now every finite subset $S \subset \mathbb{N}$ is contained in $\{0, \ldots, N\}$ for
some $N \in \mathbb{N}$, so for all $S$, $A_S \le A_N$ for some (indeed, for all sufficiently large) $N$.
This shows that if we define
\[ A' = \sup_S A_S \]
as $S$ ranges over all finite subsets of $\mathbb{N}$, then $A' \le A$. On the other hand, for all
$N \in \mathbb{N}$, $A_N = a_0 + \ldots + a_N = A_{\{0, \ldots, N\}}$: in other words, each partial sum $A_N$
arises as $A_S$ for a suitable finite subset $S$. Therefore $A \le A'$ and thus $A = A'$.
The point here is that the description $\sum_{n=0}^{\infty} a_n = \sup_S A_S$ is manifestly unchanged
by rearranging the terms of the series by any permutation $\sigma$: taking $S \mapsto \sigma(S)$
gives a bijection on the set of all finite subsets of $\mathbb{N}$, and thus
n=0
The case of absolutely convergent series follows rather easily from this.
the largest element of $S$. Then $S \subset \{0, \ldots, N\}$ so $A_S = \sum_{n \in S} a_n \le \sum_{n=0}^{N} a_n =
A_N \le A$, so $A' = \sup A_S \le A$. Thus $A = A'$.
The point of Lemma 14.6 is that we have expressed the sum of a series with nonnegative terms in a way which is manifestly independent of the ordering of the
terms:2 for any bijection $\sigma$ of $\mathbb{N}$, as $S = \{n_1, \ldots, n_k\}$ ranges over all finite subsets
2This is a small preview of unordered summation, the subject of the following section.
\[ \sum_{n=0}^{\infty} a_n = \sum_{n=0}^{\infty} a_{\sigma(n)} \in [0, \infty], \]
i.e., rearrangement of a series with non-negative terms does not disturb the convergence/divergence or the sum.
\[ \left| \sum_{n=0}^{M} a_{\sigma(n)} - A \right| = \left| \sum_{n=0}^{M} a_{\sigma(n)} - \sum_{n=0}^{\infty} a_n \right| \le \sum_{n=N_0}^{\infty} |a_n| < \epsilon. \]
Indeed: by our choice of $M$ we know that the terms $a_0, \ldots, a_{N_0 - 1}$ appear in both
$\sum_{n=0}^{M} a_{\sigma(n)}$ and $\sum_{n=0}^{\infty} a_n$ and thus get cancelled; some further terms may or may
not be cancelled, but by applying the triangle inequality and summing the absolute
values we get an upper bound by assuming no further cancellation. This shows
M
to a permutation $\sigma$ such that $\sum_{n=0}^{\infty} a_{\sigma(n)} = -\infty$. We leave this case to the reader.
Step 4 (converging to $B \in \mathbb{R}$): if anything, the argument is simpler in this case. We
first take positive terms $p_1, \ldots, p_{N_1}$, stopping when the partial sum $p_1 + \ldots + p_{N_1}$
is greater than $B$. (To be sure, we take at least one positive term, even if $0 > B$.)
Then we take negative terms $n_1, \ldots, n_{N_2}$, stopping when the partial sum
$p_1 + \ldots + p_{N_1} + n_1 + \ldots + n_{N_2}$ is less than $B$. Then we repeat the process, taking
enough positive terms to get a sum strictly larger than $B$ then enough negative
terms to get a sum strictly less than $B$, and so forth. Because both the positive
and negative parts diverge, this construction can be completed. Because the general term $a_n \to 0$, a little thought shows that the absolute value of the difference
between the partial sums of the series and $B$ approaches zero.
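The construction of Step 4 is effectively an algorithm, and it can be run on a concrete series. The sketch below applies a slightly simplified greedy variant to the alternating harmonic series $\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n}$: it takes the next unused positive term whenever the running sum is at or below the target $B$, and the next unused negative term otherwise (so, unlike the text, it does not insist on starting with a positive term). The series, the function name and the number of terms are choices made for this illustration.

```python
from itertools import count

def rearranged_partial_sum(target, num_terms):
    """Partial sum of a greedy rearrangement of sum (-1)^(n+1)/n aimed at target."""
    positives = (1.0 / k for k in count(1, 2))    # 1, 1/3, 1/5, ...
    negatives = (-1.0 / k for k in count(2, 2))   # -1/2, -1/4, -1/6, ...
    total = 0.0
    for _ in range(num_terms):
        # at or below the target: add a positive term; above it: add a negative term
        total += next(positives) if total <= target else next(negatives)
    return total

for B in (2.0, -1.0, 0.0):
    print(B, rearranged_partial_sum(B, 100000))
```

Every term of the original series is eventually used exactly once, because both generators are advanced infinitely often, so this really is a rearrangement and not just a subseries.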
The conclusion of Theorem 14.8 holds under somewhat milder hypotheses.
+
[,
],
there
Exercise: Let $\sum_{n=0}^{\infty} a_n$ be any nonabsolutely convergent real series, and let
$-\infty \le a \le A \le \infty$. Show that there exists a permutation $\sigma$ of $\mathbb{N}$ such that the set of
partial limits of $\sum_{n=0}^{\infty} a_{\sigma(n)}$ is the closed interval $[a, A]$.
2.3. Unordered summation.
It is very surprising that the ordering of the terms of a nonabsolutely convergent
series affects both its convergence and its sum: it seems fair to say that this phenomenon was undreamt of by the founders of the theory of infinite series.
Armed now, as we are, with the full understanding of the implications of our definition
of $\sum_{n=0}^{\infty} a_n$ as the limit of a sequence of partial sums, it seems reasonable to
ask: is there an alternative definition for the sum of an infinite series, one in which
the ordering of the terms is a priori immaterial?
The answer to this question is yes and is given by the theory of unordered
summation.
To be sure to get a definition of the sum of a series which does not depend on
the ordering of the terms, it is helpful to work in a context in which no ordering is
present. Namely, let $S$ be a nonempty set, and define an $S$-indexed sequence of
real numbers to be a function $a : S \to \mathbb{R}$. The point here is that we recover the
usual definition of a sequence by taking $S = \mathbb{N}$ (or $S = \mathbb{Z}^+$) but whereas $\mathbb{N}$ and $\mathbb{Z}^+$
come equipped with a natural ordering, the naked set $S$ does not.
3Both of these are well beyond the scope of these notes, i.e., you are certainly not expected
to know what I am talking about here.
We wish to define $\sum_{s \in S} a_s$, i.e., the unordered sum of the numbers $a_s$ as $s$
ranges over all elements of $S$. Here it is: for every finite subset $T = \{s_1, \ldots, s_N\}$
of $S$, we define $a_T = \sum_{s \in T} a_s = a_{s_1} + \ldots + a_{s_N}$. (We also define $a_{\emptyset} = 0$.) Finally,
for $A \in \mathbb{R}$, we say that the unordered sum $\sum_{s \in S} a_s$ converges to $A$ if: for all
$\epsilon > 0$, there exists a finite subset $T_0 \subset S$ such that for all finite subsets $T_0 \subset T \subset S$
we have $|a_T - A| < \epsilon$. If there exists $A \in \mathbb{R}$ such that $\sum_{s \in S} a_s = A$, we say that
$\sum_{s \in S} a_s$ is convergent or that the $S$-indexed sequence $a$ is summable. (When
$S = \mathbb{Z}^{\ge N}$ we already have a notion of summability, so when we need to make the
distinction we will say unordered summable.)
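The definition can be explored numerically on an absolutely summable family. The sketch below takes $S = \mathbb{Z}^+$ and $a_n = \frac{(-1)^n}{n^2}$, whose ordered sum is known to be $-\frac{\pi^2}{12}$, fixes $T_0 = \{1, \ldots, K\}$, and sums over randomly chosen finite sets $T \supset T_0$. For every such $T$ one has $|a_T - A| \le \sum_{n > K} \frac{1}{n^2} < \frac{1}{K}$. The family, the value of $K$ and the random sampling scheme are all assumptions of this illustration.

```python
import math
import random

def a(n):
    """An absolutely summable family indexed by S = {1, 2, 3, ...}."""
    return (-1) ** n / n**2

A = -math.pi**2 / 12          # value of the ordered sum of (-1)^n / n^2
K = 1000
T0 = set(range(1, K + 1))     # finite set that should work for epsilon = 1/K

def sum_over(T):
    """a_T: the finite sum of a_s over s in T."""
    return sum(a(n) for n in sorted(T))

rng = random.Random(0)
worst = 0.0
for _ in range(20):
    # enlarge T0 by 500 (or slightly fewer, if repeats occur) random indices above K
    extra = {rng.randrange(K + 1, 10**6) for _ in range(500)}
    worst = max(worst, abs(sum_over(T0 | extra) - A))
print(worst, 1.0 / K)
```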
Notation: because we are often going to be considering various finite subsets $T$
of a set $S$, we allow ourselves the following time-saving notation: for two sets $A$
and $B$, we denote the fact that $A$ is a finite subset of $B$ by $A \subset_f B$.
Exercise: Suppose $S$ is finite. Show that every $S$-indexed sequence $a : S \to \mathbb{R}$
is summable, with sum $a_S = \sum_{s \in S} a_s$.
Exercise: If $S = \emptyset$, there is a unique function $a : \emptyset \to \mathbb{R}$, the
empty function. Convince yourself that the most reasonable value to assign $\sum_{s \in \emptyset} a_s$ is 0.
Exercise: Give reasonable definitions for $\sum_{s \in S} a_s = \infty$ and $\sum_{s \in S} a_s = -\infty$.
|
as
as | < .
sT
sT
Tn Tn+1 for all n and such that for all nite subsets T, T of S containing Tn ,
|aT aT | < 2 . It follows that the real sequence {aTn } is Cauchy, hence convergent,
say to A. We claim that a is summable to A: indeed, for > 0, choose n > 2 .
Then, for any nite subset T containing Tn we have
|
as | = |
as
as | < .
sT
sT
sT0
(iii) = (ii): Fix > 0, and let T f S be such that for all nite subsets T of S
with T T = , |aT | < 2 . Then, for any nite subset T of S containing T ,
|aT | |
as | + |
as | 1 +
|as |,
sT T1
sT \T1
so we may take M = 1 +
sT1
|as |.
sT1
s such that as 0 and Tn consists of the elements s such that as < 0. It follows
that
aTn = |aTn+ | |aTn |
hence
|aTn | |aTn+ | + |aTn |,
from which it follows that max |aTn+ , aTn | 2 , so we may dene for all n a subset
Tn Tn such that |aTn | 2 and the sum aTn consists either entirely of non-negative
$$\sum_{s \in S} a_s = \sup_{T \subset_f S} a_T.$$
$$|a_T| = \Big| \sum_{s \in T} a_s \Big| \leq \Big| \sum_{s \in T} |a_s| \Big| = \big| |a|_T \big| < \epsilon.$$
Suppose $|a_\bullet|$ is not summable. Then by Proposition 14.12, for every $M > 0$, there
exists $T \subset_f S$ such that $|a|_T \geq 2M$. But as in the proof of Theorem 14.11, there
must exist a subset $T' \subset T$ such that (i) $a_{T'}$ consists entirely of non-negative terms
or entirely of negative terms and (ii) $|a_{T'}| \geq M$. Thus the partial sums of $a_\bullet$ are
not uniformly bounded, and by Theorem 14.11 $a_\bullet$ is not summable.
Theorem 14.14. For $a_\bullet : \mathbb{N} \to \mathbb{R}$ an ordinary sequence and $A \in \mathbb{R}$, TFAE:
(i) The unordered sum $\sum_{n \in \mathbb{Z}^+} a_n$ is convergent, with sum $A$.
(ii) $\implies$ (i): We will prove the contrapositive: suppose the unordered sum $\sum_{n \in \mathbb{N}} a_n$
is divergent. Then by Theorem 14.11 for every $M \geq 0$, there exists $T \subset S$ with
Exercise: Fill in the missing details of (ii) $\implies$ (i) in the proof of Theorem 14.14.
Exercise*: Can one prove Theorem 14.14 without appealing to the fact that $|x| \geq M$
implies $x \geq M$ or $x \leq -M$? For instance, does Theorem 14.14 hold for $S$-indexed
sequences with values in any Banach space? Any complete normed abelian group?
Comparing Theorems 14.12 and 14.13 we get a second proof of the portion of
the Main Rearrangement Theorem that says that a real series is unconditionally
convergent iff it is absolutely convergent. Recall that our first proof of this depended on the Riemann Rearrangement Theorem, a more complicated result.
On the other hand, if we allow ourselves to use the previously derived result
that unconditional convergence and absolute convergence coincide, then we can get
an easier proof of (ii) $\implies$ (i): if the series $\sum_n a_n$ is unconditionally convergent,
then $\sum_n |a_n| < \infty$, so by Proposition 14.10 the unordered sequence $|a_\bullet|$ is summable, hence by Theorem 14.12 the unordered sequence $a_\bullet$ is summable.
To sum up (!), when we apply the very general definition of unordered summability to the classical case of $S = \mathbb{N}$, we recover precisely the theory of absolute
(= unconditional) convergence. This gives us a clearer perspective on exactly what
the usual, order-dependent notion of convergence is buying us: namely, the theory
of conditionally convergent series. It may perhaps be disappointing that such an
elegant theory did not gain us anything new.
However when we try to generalize the notion of an infinite series in various
ways, the results on unordered summability become very helpful. For instance,
often in nature one encounters biseries
$$\sum_{n=-\infty}^{\infty} a_n$$
or double series
$$\sum_{m,n \in \mathbb{N}} a_{m,n}.$$
We may treat the first case as the unordered sum associated to the $\mathbb{Z}$-indexed sequence $n \mapsto a_n$ and the second as the unordered sum associated to the $\mathbb{N} \times \mathbb{N}$-indexed
sequence $(m, n) \mapsto a_{m,n}$ and we are done: there is no need to set up separate theories of convergence. Or, if we prefer, we may shoehorn these more ambitiously
indexed series into conventional $\mathbb{N}$-indexed series: this involves choosing a bijection
$b$ from $\mathbb{Z}$ (respectively $\mathbb{N} \times \mathbb{N}$) to $\mathbb{N}$. In both cases such bijections exist, in fact
in great multitude: if $S$ is any countably infinite set, then for any two bijections
$b_1, b_2 : S \to \mathbb{N}$, $b_2 \circ b_1^{-1} : \mathbb{N} \to \mathbb{N}$ is a permutation of $\mathbb{N}$. Thus the discrepancy
between two chosen bijections corresponds precisely to a rearrangement of the series. By Theorem 14.13, if the unordered sequence is summable, then the choice of
bijection $b$ is immaterial, as we are getting an unconditionally convergent series.
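To see the immateriality of the bijection in a concrete case, here is a minimal Python sketch (not part of the original notes). It assumes the illustrative family $a_{m,n} = 2^{-m} 3^{-n}$ on $\mathbb{N} \times \mathbb{N}$, whose unordered sum is $2 \cdot \frac{3}{2} = 3$, and sums it along two different enumerations of $\mathbb{N} \times \mathbb{N}$. The function names `by_diagonals` and `by_squares` are hypothetical choices made for this example.

```python
from fractions import Fraction
from itertools import islice

def term(m, n):
    """a_{m,n} = 2^(-m) * 3^(-n); all terms are positive, so the family is summable."""
    return Fraction(1, 2**m * 3**n)

def by_diagonals():
    """Enumerate N x N along the anti-diagonals m + n = d."""
    d = 0
    while True:
        for m in range(d + 1):
            yield (m, d - m)
        d += 1

def by_squares():
    """Enumerate N x N by the growing 'shells' max(m, n) = d (a different bijection)."""
    d = 0
    while True:
        for m in range(d):
            yield (m, d)
        for n in range(d + 1):
            yield (d, n)
        d += 1

def partial_sum(enumeration, count):
    """Exact sum of the first `count` terms in the given enumeration."""
    return sum(term(m, n) for m, n in islice(enumeration, count))

diag = float(partial_sum(by_diagonals(), 2000))
sq = float(partial_sum(by_squares(), 2000))
print(diag, sq)  # both partial sums should be very close to 3
```

The two enumerations differ by a permutation of $\mathbb{N}$, so agreement of the two partial sums is exactly what unconditional convergence predicts.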
The theory of products of infinite series comes out especially cleanly in this unordered setting (which is not surprising, since it corresponds to the case of absolute
convergence, where Cauchy products are easy to deal with).
Exercise: Let $S_1$ and $S_2$ be two sets, and let $a_\bullet : S_1 \to \mathbb{R}$, $b_\bullet : S_2 \to \mathbb{R}$. We
assume the following nontriviality condition: there exist $s_1 \in S_1$ and $s_2 \in S_2$ such
that $a_{s_1} \neq 0$ and $b_{s_2} \neq 0$. We define $(a, b)_\bullet : S_1 \times S_2 \to \mathbb{R}$ by
$$(a, b)_s = (a, b)_{(s_1, s_2)} = a_{s_1} b_{s_2}.$$
a) Show that $a_\bullet$ and $b_\bullet$ are both summable iff $(a, b)_\bullet$ is summable.
b) Assuming the equivalent conditions of part a) hold, show
$$\sum_{s \in S_1 \times S_2} (a, b)_s = \Big( \sum_{s_1 \in S_1} a_{s_1} \Big) \Big( \sum_{s_2 \in S_2} b_{s_2} \Big).$$
Let $\sum_{n=0}^{\infty} a_n$ be a convergent series.
a) The series $\sum_{n=0}^{\infty} a_n x^n$ is uniformly convergent on $[0, 1]$.
Proof. a) ([C, p. 47]) Since $\sum_{n=0}^{\infty} a_n = \sum_{n=0}^{\infty} a_n 1^n$, convergence at $x = 1$
is our hypothesis. By our work on power series (specifically Lemma 11.31),
convergence at $1$ implies convergence on $(-1, 1)$, and thus $\sum_{n=0}^{\infty} a_n x^n$ converges
pointwise on $[0, 1]$. Since we have convergence at $x = 1$, it suffices to show uniform
convergence on $[0, 1)$. Fix $\epsilon > 0$; because $\sum_n a_n$ converges, there is $N \in \mathbb{Z}^+$ such
that $|\sum_{n=N}^{\infty} a_n| < \epsilon$. Now we apply Abel's Lemma (Proposition 10.2) with "$a_n$
sequence" $a_N, a_{N+1}, \ldots$, with "$b_n$ sequence" $x^N, x^{N+1}, \ldots$ (note this is positive and
$$\Big| \sum_{n=N}^{N+k} a_n x^n \Big| \leq x^N M \leq x^N \epsilon < \epsilon.$$
By the Cauchy Criterion (Lemma 13.1), $\sum_{n=0}^{N} a_n x^n \xrightarrow{u} f$ on $[0, 1)$.
b) $\sum_{n=0}^{\infty} a_n x^n$ is continuous at $x = 1$: $\lim_{x \to 1^-} \sum_{n=0}^{\infty} a_n x^n = \sum_{n=0}^{\infty} a_n$.
5Here we are following our usual convention of allowing individual exercises to assume knowledge that we do not want to assume in the text itself. Needless to say, there is no need to attempt
this exercise if you do not already know and care about uncountable sets.
3. ABELS THEOREM
Remark: Usually "Abel's Theorem" means part b) of the above result: $\sum_{n=0}^{\infty} a_n =
\lim_{x \to 1^-} \sum_{n=0}^{\infty} a_n x^n$. But this is an immediate consequence of the uniformity of the
convergence on $[0, 1]$, so having this statement be part of Abel's Theorem gives a
stronger and also more conceptually transparent result.
Above we followed an exercise in the text [C] of Cartan.6 For comparison, here
is a different proof of Theorem 14.15b) from a famous text of Rudin.
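$$\sum_{n=0}^{N} a_n x^n = (1 - x) \sum_{n=0}^{N-1} A_n x^n + A_N x^N.$$
$$f(x) = \sum_{n=0}^{\infty} a_n x^n = (1 - x) \sum_{n=0}^{\infty} A_n x^n.$$
Now fix $\epsilon > 0$, and choose $N$ such that $n \geq N$ implies $|A - A_n| < \epsilon$. Then, since
$$(1 - x) \sum_{n=0}^{\infty} x^n = 1 \tag{99}$$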
An xn A(1 x)
n=0
(1 x)
n=0
|An A|xn +
xn | = |(1 x)
n=0
()
2
(1 x)
n=N +1
xn (1 x)
(An A)xn |
n=0
N
|An A|xn + .
n=0
The last quantity above approaches $\epsilon$ as $x$ approaches $1$ from the left. Since $\epsilon$ was
arbitrary, this shows $\lim_{x \to 1^-} f(x) = A$.
As you can see, Rudin's proof uses much less than Cartan's: rather than relying on
Abel's Lemma, a bit of partial summation is done on the fly. Moreover, most of the
appeals to the theory of power series and uniform convergence are replaced by a
clever introduction of the geometric series! Nevertheless I must say that, although
Rudin's argument is easy enough to follow line by line, in terms of "what's going
on" in the proof I find it absolutely impenetrable.
The rest of this section is an extended exercise in "Abel's Theorem appreciation".
First of all, it may help to restate the result in a form which is slightly more general
and moreover makes more clear exactly what has been established.
Exercise: Consider $f(x) = \frac{1}{1-x} = \sum_{n=0}^{\infty} x^n$, which converges for all $x \in (-1, 1)$.
Show that $\lim_{x \to -1^+} f(x)$ exists and thus $f$ extends to a continuous function on
$[-1, 1)$. Nevertheless $f(-1) \neq \lim_{x \to -1^+} f(x)$. Why does this not contradict Abel's
Theorem?
3.2. An Application to the Cauchy Product.
As our first application, we round out our treatment of Cauchy products by showing
that the Cauchy product never "wrongly converges".
x1
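3.3. Two Amazing Identities Justified By Abel's Theorem.
Here are two surprising and beautiful identities.
$$\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \ldots = \log 2. \tag{100}$$
$$\sum_{n=0}^{\infty} \frac{(-1)^n}{2n+1} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \ldots = \frac{\pi}{4}. \tag{101}$$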
The identity (101) was "known" to Leibniz (as would be logical, given that the
convergence of both series follows from Leibniz's Alternating Series Test), where
the quotation marks are meant to suggest that Leibniz probably did not have an
argument we would accept as a rigorous proof. Suppose someone shows you such
identities, as I now have: what would you make of them?
A good first reaction is to attempt numerical verification. Even this is not as
easy as one might expect, because the convergence of both series is rather slow.
(In particular, among all ways one might try to numerically compute $\pi$, (101) is
one of the worst I know.) As with any series which is shown to be convergent by the
Alternating Series Test, there is a built-in error estimate for the sum: if we cut off
after $N$ terms the error is in absolute value at most $|a_{N+1}|$. The problem with this
is that the $N$th terms of these series tend to zero quite slowly! So for instance,
$$S_{10^4} = \sum_{n=1}^{10^4} \frac{(-1)^{n+1}}{n} = 0.6930971830599452969172323714\ldots$$
(Using a software package I asked for the exact sum of the series, which is of course
a rational number, but a very complicated one: it occupies more than one full screen
on my computer. The amount of time spent to compute this rational number was
small but not instantaneous. In fact the reason I chose $10^4$ is that the software managed this but had trouble with $S_{10^5}$. Then I converted the fraction to a decimal.
Of course a much better way to do this would be to convert the fractions to decimals as we go along, but this needs to be done carefully to prevent rounding errors:
to be serious about this sort of thing one needs to know some numerical analysis.)
By the Alternating Series Test, the difference between the infinite sum and the
finite sum $S_{10^4}$ is at most $\frac{1}{10001}$, so we are guaranteed (roughly) four decimal places
of accuracy. For comparison, computing $\log 2$ e.g. by writing it as $-\log(\frac{1}{2})$ and
using the Taylor series for $\log(1 + x)$ we get
$$\log 2 = 0.6931471805599453094172321214\ldots$$
So indeed the identity (100) holds true at least up to four decimal places. If we
wanted to do much more numerical verification than this, we would probably have
to do something a little more clever.
Similarly, we have
$$\sum_{n=0}^{10^4} \frac{(-1)^n}{2n+1} = 0.78542316\ldots,$$
and by the Alternating Series Test this approximates the infinite sum $\sum_{n=0}^{\infty} \frac{(-1)^n}{2n+1}$
to at least four decimal places of accuracy, whereas
$$\frac{\pi}{4} = 0.7853981633974483096156608458\ldots,$$
which shows that (101) holds true at least up to four decimal places.
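The numerical check described above is easy to repeat. The following Python sketch (my own illustration, not the software package mentioned in the text) computes both partial sums up to $N = 10^4$ in floating point, using `math.fsum` to keep rounding error under control instead of exact rational arithmetic, and compares them with $\log 2$ and $\frac{\pi}{4}$ and with the Alternating Series Test error bounds.

```python
import math

def alternating_harmonic(N):
    """Partial sum S_N = sum_{n=1}^{N} (-1)^(n+1) / n."""
    return math.fsum((-1) ** (n + 1) / n for n in range(1, N + 1))

def leibniz(N):
    """Partial sum sum_{n=0}^{N} (-1)^n / (2n + 1)."""
    return math.fsum((-1) ** n / (2 * n + 1) for n in range(N + 1))

N = 10**4
S_alt = alternating_harmonic(N)
S_leib = leibniz(N)

# Alternating Series Test: the error is at most the first omitted term.
print(S_alt, math.log(2), abs(S_alt - math.log(2)) <= 1 / (N + 1))
print(S_leib, math.pi / 4, abs(S_leib - math.pi / 4) <= 1 / (2 * N + 3))
```

Both comparisons come out as expected, but only a handful of digits agree: the error bound shrinks like $\frac{1}{N}$, so each additional decimal place costs ten times as many terms.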
$$f(x) = \frac{1}{1+x} = \frac{1}{1-(-x)} = \sum_{n=0}^{\infty} (-x)^n = \sum_{n=0}^{\infty} (-1)^n x^n.$$
(1)n xn+1
(1)n xn+1
=
.
n+1
n+1
n=1
n=1
(1)n
.
n+1
n=1
But not so fast. Although you will find this explanation in many freshman calculus
books, it is not yet justified. We were being sloppy in our above work by writing
down power series expansions and not keeping track of the interval of convergence.
The identity (102) holds for $x \in (-1, 1)$, and that's really the best we can do, since
the power series on the right hand side does not converge for any other values of $x$.
Integrating this term by term, we find that (103) holds for $x \in (-1, 1)$. Above, in
our excitement, we plugged in $x = 1$: so close, but out of bounds. Too bad!
But don't despair: it's Abel's Theorem for the win! Indeed, because the series
converges at $x = 1$ and the function $\log(1 + x)$ is defined and continuous at $x = 1$,
$$\log(2) = \log(1 + 1) = \lim_{x \to 1^-} \log(1 + x) = \lim_{x \to 1^-} \sum_{n=0}^{\infty} \frac{(-1)^n x^{n+1}}{n+1} = \sum_{n=0}^{\infty} \frac{(-1)^n}{n+1}.$$
I leave it to you to establish (101) starting with the function f (x) = arctan x.
3.4. Abel Summability.
Abel's Theorem gives rise to a summability method: a way to extract numerical
n
Example: Consider the series $\sum_{n=0}^{\infty} (-1)^n$. As we saw, the partial sums alternate between $0$ and $1$, so the series does not converge. We mentioned earlier that (the
great) L. Euler believed that nevertheless the right number to attach to the series
$\sum_{n=0}^{\infty} (-1)^n$ is $\frac{1}{2}$. Since the two partial limits of the sequence of partial sums are $0$
and $1$, it seems vaguely plausible to split the difference.
Abel's Theorem provides a much more convincing argument. The power series
$\sum_n (-1)^n x^n$ converges for all $x$ with $|x| < 1$, and moreover for all such $x$ we have
$$\sum_{n=0}^{\infty} (-1)^n x^n = \sum_{n=0}^{\infty} (-x)^n = \frac{1}{1-(-x)} = \frac{1}{1+x},$$
and thus
$$\lim_{x \to 1^-} \sum_{n=0}^{\infty} (-1)^n x^n = \lim_{x \to 1^-} \frac{1}{1+x} = \frac{1}{2}.$$
That is, the series $\sum_n (-1)^n$ is divergent but Abel summable, with Abel sum $\frac{1}{2}$. So
Euler's idea was better than we gave him credit for.
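The Abel sum can also be watched numerically. The sketch below is an illustration I am adding under simple assumptions: it approximates $\sum_{n=0}^{\infty} (-1)^n x^n$ by a long partial sum (the helper name `abel_partial` and the cutoff `terms` are hypothetical choices) and evaluates it at points $x$ approaching $1$ from the left.

```python
def abel_partial(x, terms=100000):
    """Approximate sum_{n>=0} (-1)^n x^n by a long partial sum (for 0 <= x < 1)."""
    total, power = 0.0, 1.0
    for _ in range(terms):
        total += power      # add the current term (-x)^n
        power *= -x         # move on to (-x)^(n+1)
    return total

for x in (0.9, 0.99, 0.999):
    # the partial sum should match the closed form 1/(1+x), which tends to 1/2
    print(x, abel_partial(x), 1 / (1 + x))
```

For each fixed $x < 1$ the tail is geometrically small, so the partial sum agrees with $\frac{1}{1+x}$ to machine precision, and these values drift toward $\frac{1}{2}$ as $x \to 1^-$.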
this case the converse of Abel's Theorem holds: if $\lim_{x \to 1^-} \sum_{n=0}^{\infty} a_n x^n = L$, then
$\sum_{n=0}^{\infty} a_n = L$.
4. The Weierstrass Approximation Theorem
4.1. Statement of Weierstrass Approximation.
|mi xi + bi |.
i=1
7http://www.math.harvard.edu/elkies/M55b.10/index.html
.
2
2
(iii) One may, for instance, go by induction on the number of corners of f .
max(f, g) =
Then $\sum_{n=0}^{\infty} a_n = L$, and the convergence of the series to the limit function is
uniform on $[0, 1]$.
Exercise: Prove Lemma 14.21. Two suggestions:
(i) Reduce to the case in which $a_n \geq 0$ for all $n \in \mathbb{N}$.
(ii) Use the Weierstrass M-Test.
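Proposition 14.22. For any $\alpha > 0$, the function $f(x) = |x|$ on $[-\alpha, \alpha]$ can be
uniformly approximated by polynomials.
Proof. Step 1: Suppose that for all $\epsilon > 0$, there is a polynomial function
$P : [-1, 1] \to \mathbb{R}$ such that $|P(x) - |x|| < \epsilon$ for all $x \in [-1, 1]$. Put $x = \frac{y}{\alpha}$. Then for
all $y \in [-\alpha, \alpha]$ we have
$$\Big| \alpha P\Big(\frac{y}{\alpha}\Big) - \alpha \Big|\frac{y}{\alpha}\Big| \Big| = |Q(y) - |y|| < \alpha \epsilon,$$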
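$$(1 + y)^{\alpha} = \sum_{n=0}^{\infty} \binom{\alpha}{n} y^n$$
valid for all $\alpha \in \mathbb{R}$ and all $y \in (-1, 1)$. Taking $\alpha = \frac{1}{2}$ and substituting $-y$ for $y$, we obtain the partial sums
$$T_N(y) = \sum_{n=0}^{N} (-1)^n \binom{1/2}{n} y^n,$$
and $\lim_{N \to \infty} T_N(y) = \sqrt{1 - y}$ for $y \in [0, 1)$. Further, $(-1)^n \binom{1/2}{n} < 0$ for $n \geq 1$, and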
y1
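Thus we may apply Elkies' Lemma to get $T_N(y) \xrightarrow{u} \sqrt{1 - y}$ on $[0, 1]$. For $x \in [-1, 1]$,
$y = 1 - x^2 \in [0, 1]$, so making this substitution we find that on $[-1, 1]$,
$$T_N(1 - x^2) = \sum_{n=0}^{N} (-1)^n \binom{1/2}{n} (1 - x^2)^n \xrightarrow{u} \sqrt{1 - (1 - x^2)} = \sqrt{x^2} = |x|.$$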
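To get a feeling for how well (and how slowly) these polynomials approximate $|x|$, here is a small Python sketch that I am adding as an illustration. It evaluates $T_N(1 - x^2)$ using the recurrence $c_{n+1} = c_n \cdot \frac{n - 1/2}{n + 1}$ for the coefficients $c_n = (-1)^n \binom{1/2}{n}$, and estimates the uniform error on a grid of sample points. The grid size and the helper names `T` and `sup_error` are assumptions of this example, and a grid maximum only approximates the true supremum.

```python
def T(N, y):
    """Partial sum T_N(y) = sum_{n=0}^{N} (-1)^n * binom(1/2, n) * y^n."""
    coeff, power, total = 1.0, 1.0, 0.0
    for n in range(N + 1):
        total += coeff * power
        coeff *= (n - 0.5) / (n + 1)   # c_{n+1} = c_n * (n - 1/2) / (n + 1)
        power *= y
    return total

def sup_error(N, samples=401):
    """Largest value of |T_N(1 - x^2) - |x|| over an evenly spaced grid on [-1, 1]."""
    xs = [-1 + 2 * i / (samples - 1) for i in range(samples)]
    return max(abs(T(N, 1 - x * x) - abs(x)) for x in xs)

for N in (25, 100, 400):
    print(N, sup_error(N))
```

The worst error occurs at the corner $x = 0$ and decays only like $\frac{1}{\sqrt{N}}$, which reflects the fact that the binomial series for $\sqrt{1 - y}$ converges slowly at the endpoint $y = 1$.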
4.4. Proof of the Weierstrass Approximation Theorem.
It will be convenient to first introduce some notation. For $a < b \in \mathbb{R}$, let $C[a, b]$
be the set of all continuous functions $f : [a, b] \to \mathbb{R}$, and let $\mathcal{P}$ be the set of all
polynomial functions $f : [a, b] \to \mathbb{R}$. Let $\mathrm{PL}([a, b])$ denote the set of piecewise linear
functions $f : [a, b] \to \mathbb{R}$.
For a subset $S \subset C[a, b]$, we define the uniform closure of $S$ to be the set $\overline{S}$ of
all functions $f \in C[a, b]$ which are uniform limits of sequences in $S$: precisely, for
which there is a sequence of functions $f_n : [a, b] \to \mathbb{R}$ with each $f_n \in S$ and $f_n \xrightarrow{u} f$.
Lemma 14.23. For any subset $S \subset C[a, b]$, we have $\overline{\overline{S}} = \overline{S}$.
Proof. Simply unpacking the notation is at least half of the battle here. Let
$f \in \overline{\overline{S}}$, so that there is a sequence of functions $g_i \in \overline{S}$ with $g_i \xrightarrow{u} f$. Similarly, since
each $g_i \in \overline{S}$, there is a sequence of continuous functions $f_{ij} \xrightarrow{u} g_i$. Fix $k \in \mathbb{Z}^+$:
choose $n$ such that $||g_n - f|| < \frac{1}{2k}$ and then $j$ such that $||f_{nj} - g_n|| < \frac{1}{2k}$; then
$$||f_{nj} - f|| \leq ||f_{nj} - g_n|| + ||g_n - f|| < \frac{1}{2k} + \frac{1}{2k} = \frac{1}{k}.$$
1
k
and thus fk f .
|mi x + bi |.
i=1
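Choose $\alpha > 0$ such that for all $1 \leq i \leq n$, if $x \in [a, b]$, then $m_i x + b_i \in [-\alpha, \alpha]$. For
each $1 \leq i \leq n$, by Proposition 14.22 there is a polynomial $P_i$ such that for all $x \in [a, b]$,
$|P_i(m_i x + b_i) - |m_i x + b_i|| < \frac{\epsilon}{n}$. Let $P : [a, b] \to \mathbb{R}$ by $P(x) = b + \sum_{i=1}^{n} P_i(m_i x + b_i)$.
Then $P \in \mathcal{P}$ and for all $x \in [a, b]$,
$$|P(x) - f(x)| = \Big| \sum_{i=1}^{n} \big( P_i(m_i x + b_i) - |m_i x + b_i| \big) \Big| < \sum_{i=1}^{n} \frac{\epsilon}{n} = \epsilon.$$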
5. A Continuous, Nowhere Differentiable Function
We are going to construct a function $f : \mathbb{R} \to \mathbb{R}$ with the following striking property:
for all $x_0 \in \mathbb{R}$, $f$ is continuous at $x_0$ but $f$ is not differentiable at $x_0$. In short, we
say that $f$ is continuous but nowhere differentiable.
The first such construction (accompanied by a complete, correct proof) was given
in a seminal 1872 paper of Weierstrass. Weierstrass's example was as follows: let
$\alpha \in (0, 1)$, and let $b$ be a positive odd integer such that $\alpha b > 1 + \frac{3\pi}{2}$. Then the
function $f : \mathbb{R} \to \mathbb{R}$ given by
$$f(x) = \sum_{n=0}^{\infty} \alpha^n \cos(b^n \pi x) \tag{104}$$
Exercise:
a) Let $f : [a, b] \to \mathbb{R}$ be a piecewise linear function with slopes $m_1, \ldots, m_n$. Show
that $f$ is Lipschitz, and the smallest possible Lipschitz constant is $C = \max_i |m_i|$.
b) Let $f : \mathbb{R} \to \mathbb{R}$ be a function. Suppose that there is $C > 0$ such that for every
closed subinterval $[a, b]$ of $\mathbb{R}$, $C$ is a Lipschitz constant for the restriction of $f$ to
$[a, b]$. Show that $C$ is a Lipschitz constant for $f$.
c) Let $f : \mathbb{R} \to \mathbb{R}$ be a piecewise linear function with corners at the integers,
i.e., $f$ is differentiable on $(n, n + 1)$ for all $n \in \mathbb{Z}$ and is not differentiable at
any integer $n$. For $n \in \mathbb{Z}$, let $m_n$ be the slope of $f$ on the interval $(n, n + 1)$. Let
$C = \sup_{n \in \mathbb{Z}} |m_n|$. Show that $f$ is Lipschitz iff $C < \infty$, in which case $C$ is the smallest
Lipschitz constant for $f$.
Now we begin our construction with the "sawtooth function" $S : \mathbb{R} \to \mathbb{R}$: the
unique piecewise linear function with corners at the integers and such that $S(n) = 0$
for every even integer $n$ and $S(n) = 1$ for every odd integer $n$. The slopes of $S$ are
all $\pm 1$, so by the preceding exercise $S$ is Lipschitz (hence continuous):
$$\forall x, y \in \mathbb{R}, \ |S(x) - S(y)| \leq |x - y|.$$
Also $S$ is 2-periodic: for all $x \in \mathbb{R}$, $S(x + 2) = S(x)$. For $k \in \mathbb{N}$, define
$$f_k : \mathbb{R} \to \mathbb{R}, \quad f_k(x) = \Big(\frac{3}{4}\Big)^k S(4^k x).$$
We suggest that the reader sketch the graphs of the functions $f_k$: roughly speaking
they are sawtooth functions which, as $k$ increases, oscillate more and more rapidly
but with smaller amplitude: indeed $||f_k|| = (\frac{3}{4})^k$. We define $f : \mathbb{R} \to \mathbb{R}$ by
$$f(x) = \sum_{k=0}^{\infty} f_k(x) = \sum_{k=0}^{\infty} \Big(\frac{3}{4}\Big)^k S(4^k x).$$
Since $\sum_{k=0}^{\infty} ||f_k|| = \sum_{k=0}^{\infty} (\frac{3}{4})^k < \infty$, the series defining $f$ converges uniformly by
the Weierstrass M-Test. This also gives that $f$ is continuous, since $f$ is a uniform
limit of a sequence of continuous functions. We claim however that $f$ is nowhere
differentiable. To see this, fix $x_0 \in \mathbb{R}$. We will define a sequence $\{\delta_n\}$ of nonzero
real numbers such that $\delta_n \to 0$ and the sequence
$$D_n = \frac{f(x_0 + \delta_n) - f(x_0)}{\delta_n}$$
(105)
(106)
$k = n$, $|S(4^k x_0 + 4^k \delta_n) - S(4^k x_0)| = \frac{1}{2}$.
(107)
( )n
n1
( 3 )k S(4k x0 + 4k n ) S(4k x0 )
3
n
4
4
4
n
k=0
n1
3n
k=0
3k = 3n
3n 1
3n
.
2
2
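The growth of the difference quotients can be observed exactly with rational arithmetic. The following Python sketch is my own illustration of the standard choice $\delta_n = \pm \frac{1}{2} 4^{-n}$, with the sign picked so that no integer lies strictly between $4^n x_0$ and $4^n (x_0 + \delta_n)$; this particular recipe, and the lower bound $|D_n| \geq \frac{3^n + 1}{2}$ it is checked against, are assumptions of the sketch, since the corresponding passage of the text did not survive intact here.

```python
from fractions import Fraction

def S(x):
    """Sawtooth: piecewise linear, S(even integer) = 0, S(odd integer) = 1, period 2."""
    return abs((x + 1) % 2 - 1)

def f_partial(x, K):
    """Partial sum sum_{k=0}^{K} (3/4)^k S(4^k x), computed exactly on Fractions."""
    return sum(Fraction(3, 4) ** k * S(4**k * x) for k in range(K + 1))

def difference_quotient(x0, n):
    """D_n for delta_n = +-(1/2) * 4^(-n), with the sign chosen so that no integer
    lies strictly between 4^n * x0 and 4^n * (x0 + delta_n)."""
    frac = (4**n * x0) % 1                      # fractional part of 4^n * x0
    sign = 1 if frac <= Fraction(1, 2) else -1
    delta = sign * Fraction(1, 2 * 4**n)
    # For k > n, 4^k * delta is an even integer, so the k-th terms at x0 and
    # x0 + delta agree (S is 2-periodic): the partial sum up to k = n is exact.
    return (f_partial(x0 + delta, n) - f_partial(x0, n)) / delta

x0 = Fraction(1, 3)
for n in range(6):
    D = difference_quotient(x0, n)
    print(n, D, abs(D) >= Fraction(3**n + 1, 2))
```

Because $|D_n|$ grows at least like $\frac{3^n}{2}$ while $\delta_n \to 0$, the difference quotients at $x_0$ cannot converge, which is the mechanism behind nowhere differentiability.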
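$$\Gamma(x) = \int_0^{\infty} t^{x-1} e^{-t} \, dt.$$
two pieces, say $\int_0^1$ and $\int_1^{\infty}$. The argument of part a) will handle the latter integral.
Reduce the former integral to $\int_0^1 \frac{dx}{x^p}$.
b) Deduce that
$$\Gamma\Big(\frac{1}{2}\Big) = 2 \int_0^{\infty} e^{-x^2} \, dx = \int_{-\infty}^{\infty} e^{-x^2} \, dx. \tag{109}$$
$$\Gamma(x + 1) = x \Gamma(x).$$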
Proof.
(x)(y)
.
(x + y)
Proof. . . .
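Making the substitution $t = \sin^2 \theta$ in the integral defining $B(x, y)$ and applying
Theorem 14.29 we get
$$2 \int_0^{\pi/2} (\sin \theta)^{2x-1} (\cos \theta)^{2y-1} \, d\theta = \frac{\Gamma(x)\Gamma(y)}{\Gamma(x + y)}. \tag{111}$$
Taking $x = y = \frac{1}{2}$ in (111) we deduce:
Theorem 14.30.
$$\Gamma\Big(\frac{1}{2}\Big) = \int_{-\infty}^{\infty} e^{-x^2} \, dx = \sqrt{\pi}. \tag{112}$$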
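These identities are pleasant to confirm numerically. The sketch below is an added illustration: it uses the standard library's `math.gamma` for $\Gamma$ and a plain midpoint rule for the Gaussian integral (the truncation to $[-10, 10]$ and the number of steps are assumptions of the example, justified by the rapid decay of $e^{-x^2}$).

```python
import math

def gaussian_integral(half_width=10.0, steps=100000):
    """Midpoint-rule approximation of the integral of exp(-x^2) over [-half_width, half_width]."""
    h = 2 * half_width / steps
    return h * math.fsum(
        math.exp(-((-half_width + (i + 0.5) * h) ** 2)) for i in range(steps)
    )

print(math.gamma(0.5), gaussian_integral(), math.sqrt(math.pi))
# The functional equation Gamma(x + 1) = x * Gamma(x), checked at x = 3.5:
print(math.gamma(4.5), 3.5 * math.gamma(3.5))
```

All three numbers on the first line should agree to many digits, and the second line shows the two sides of the functional equation.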
f
(x)dx
f
(x)dx
+
|f
(x)|dx
+
|f (x)|dx < .
0
N
0
N
Since $f_n \to f$ uniformly on $[\frac{1}{N}, N]$,
$$\lim_{n \to \infty} \int_{1/N}^{N} f_n(x) \, dx = \int_{1/N}^{N} f(x) \, dx.$$
f
(x)dx
n
0
0
( 1 )
( 1 )
+
0
|fn (x)|dx +
+
0
N
|f (x)|dx +
(fn (x) f (x))dx
1
+ + = 3.
Since $\epsilon$ was arbitrary, the proof is complete.
$$\lim_{x \to \infty} \frac{\Gamma(x + 1)}{(x/e)^x \sqrt{2 \pi x}} = 1.$$
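The limit above can be sampled numerically. In this added sketch the ratio is computed through logarithms, using `math.lgamma` from the standard library, so that large values of $x$ do not overflow; the helper name `stirling_ratio` is a hypothetical choice for the example.

```python
import math

def stirling_ratio(x):
    """Gamma(x + 1) / ((x/e)^x * sqrt(2*pi*x)), computed via logarithms to avoid overflow."""
    log_ratio = math.lgamma(x + 1) - (x * math.log(x) - x + 0.5 * math.log(2 * math.pi * x))
    return math.exp(log_ratio)

for x in (1.0, 10.0, 100.0, 1000.0):
    print(x, stirling_ratio(x))
```

The printed ratios decrease toward $1$, and the excess over $1$ shrinks roughly in proportion to $\frac{1}{x}$.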
CHAPTER 15
f1
f2
is continuous.
this section we provide a review of complex numbers and the rudiments of complex
power series. This theory can be developed to an amazing extent (further than the
theory of real power series, in fact!) but such qualitatively different developments
are the subject of another course. Here we just want to develop the theory enough
so that we can make sense of plugging complex numbers into power series.
Recall that a complex number is an expression of the form $z = a + bi$. Here $a$
and $b$ are real numbers and $i$ is a formal symbol having the property that $i^2 = -1$.
For many years people had philosophical difficulties with complex numbers; indeed, numbers of the form $ib$ were called "imaginary", and the prevailing view was
that although they did not exist, they were nevertheless very useful.
From a modern point of view this is neither acceptable (we cannot work with
things that don't exist, no matter how useful they may be!) nor necessary: we can
define the complex numbers entirely in terms of the real numbers. Namely, we may
identify a complex number $a + bi$ with the ordered pair $(a, b)$ of real numbers, and
we will define addition and multiplication. Since we would want $(a + bi) + (c + di) =
(a + c) + (b + d)i$, in terms of ordered pairs this is just $(a, b) + (c, d) = (a + c, b + d)$.
In other words, this is the usual addition of vectors in the plane. The multiplication
operation is more interesting but still easy enough to write down in terms of only
real numbers: to compute $(a + bi)(c + di)$, we would want to use the distributive
law of multiplication over addition and the relation $i^2 = -1$. In other words, we
would like $(a + bi) \cdot (c + di) = ac + bci + adi + bdi^2 = (ac - bd) + (ad + bc)i$. Thus
in terms of ordered pairs we define a multiplication operation
$$(a, b) \cdot (c, d) = (ac - bd, ad + bc).$$
Note that with this convention, we may identify real numbers $a$ (i.e., those with
$b = 0$) with pairs of the form $(a, 0)$; moreover, what we were calling $i$ corresponds
to $(0, 1)$, and now any ordered pair $a + bi$ can be expressed as $(a, 0) + (b, 0) \cdot (0, 1)$.
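The ordered-pair definitions translate directly into code. Here is a minimal Python sketch, added for illustration, that implements addition and multiplication on pairs of real numbers exactly as defined above; the function names `add` and `mul` are my own.

```python
def add(p, q):
    """(a, b) + (c, d) = (a + c, b + d)."""
    (a, b), (c, d) = p, q
    return (a + c, b + d)

def mul(p, q):
    """(a, b) * (c, d) = (ac - bd, ad + bc)."""
    (a, b), (c, d) = p, q
    return (a * c - b * d, a * d + b * c)

i = (0, 1)
print(mul(i, i))                     # (-1, 0): the pair playing the role of i squares to -1
print(add((3, 0), mul((4, 0), i)))   # (3, 4), i.e. 3 + 4i written as (a, 0) + (b, 0) * (0, 1)
```

Nothing beyond real arithmetic is used, which is the whole point of the construction: the "imaginary" unit is just the pair $(0, 1)$.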
Exercise: Show that the above operations of addition and multiplication on ordered pairs satisfy all the field axioms (P0) through (P9). The resulting structure
is called the complex numbers and denoted $\mathbb{C}$.
Exercise: Show that because of the relation $i^2 = -1$, $\mathbb{C}$ cannot be endowed with
the structure of an ordered field.
Two other important operations on the complex numbers are conjugation and taking the modulus. For any complex number $z = a + bi$, we define its complex
conjugate to be $\overline{z} = a - bi$. Conjugation fits in nicely with the rest of the algebraic
structure: one has $\overline{z_1 + z_2} = \overline{z_1} + \overline{z_2}$ and $\overline{z_1 z_2} = \overline{z_1} \, \overline{z_2}$.
For any complex number $z = a + bi$, we define its modulus (or norm, or absolute
value) to be $|z| = \sqrt{a^2 + b^2}$. This is just the usual norm of an element of $\mathbb{R}^N$
specialized to the case $N = 2$. In particular, we have the triangle inequality
$$\forall z_1, z_2 \in \mathbb{C}, \ |z_1 + z_2| \leq |z_1| + |z_2|.$$
However, the norm also behaves nicely with respect to the multiplicative structure.
$\sum_{n=0}^{\infty} a_n z^n$ is a power series with complex coefficients, then defining $\rho = \limsup |a_n|^{1/n}$,
we find that $\frac{1}{\rho}$ is the radius of convergence of the complex power series in the sense
that the series converges for all $z$ with $|z| < \frac{1}{\rho}$ and diverges for all $z$ with $|z| > \frac{1}{\rho}$.
In particular, if $\sum_n a_n x^n$ is a power series with real coefficients and infinite radius of
convergence, then because for a real number $x$, its absolute value $|x|$ is the same
as the modulus of the complex number $x + 0i$, the power series $\sum_n a_n z^n$ must
converge for all complex numbers $z$.
3. Elementary Functions Over the Complex Numbers
3.1. The complex exponential function.
Consider the following complex power series:
$$E(z) := \sum_{n=0}^{\infty} \frac{z^n}{n!}.$$
Because the ratio test limit is $\lim_{n \to \infty} \frac{1/(n+1)!}{1/n!} = \lim_{n \to \infty} \frac{1}{n+1} = 0$, the radius of
convergence is infinite: the series converges for all complex numbers $z$.
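$$E(z)E(w) = \sum_{n=0}^{\infty} \sum_{k=0}^{n} \frac{z^k w^{n-k}}{k!(n-k)!} = \sum_{n=0}^{\infty} \frac{1}{n!} \sum_{k=0}^{n} \binom{n}{k} z^k w^{n-k} = \sum_{n=0}^{\infty} \frac{(z + w)^n}{n!} = E(z + w).$$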
Since $E(0) = 1$, we have for all $z$ that
$$E(z)E(-z) = E(z - z) = E(0) = 1,$$
or $E(-z) = \frac{1}{E(z)}$. Note in particular that $E(z)$ is never zero. Restricting attention
to real values, since $E : x \mapsto E(x)$ is a continuous function which is never zero and
such that $E(0) = 1$, we conclude $E(x) > 0$ for all real $x$.
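The addition formula can be tested on partial sums of the series. The following Python sketch is an added illustration: `E` sums the first 60 terms of $\sum z^n / n!$ (the cutoff is an assumption that is ample for the small inputs used here) and the result is compared with the standard library's `cmath.exp`.

```python
import cmath

def E(z, terms=60):
    """Partial sum of the power series sum_{n>=0} z^n / n!."""
    total, term = 0j, 1 + 0j
    for n in range(terms):
        total += term
        term *= z / (n + 1)    # z^(n+1)/(n+1)! from z^n/n!
    return total

z, w = 1 + 2j, -0.5 + 0.3j
print(abs(E(z) * E(w) - E(z + w)))   # tiny: E(z)E(w) = E(z + w)
print(abs(E(z) * E(-z) - 1))         # tiny: E(-z) = 1/E(z)
print(abs(E(z) - cmath.exp(z)))      # tiny: agrees with the library exponential
```

All three printed quantities are at the level of floating-point rounding error.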
3.2. The trigonometric functions.
Let us now turn to the functions $\sin x$ and $\cos x$. Recall that we have already shown
that any pair of differentiable functions $S(x)$ and $C(x)$ such that $S'(x) = C(x)$,
$C'(x) = -S(x)$, $S(0) = 0$ and $C(0) = 1$ must be equal to their Taylor series and
given by the following expansions:
$$S(x) := \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n+1}}{(2n + 1)!}.$$
$$C(x) := \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n}}{(2n)!}.$$
Of course we would like to say $S(x) = \sin x$ and $C(x) = \cos x$, but we do not
want to have to resort to discussions involving angles, lengths of arcs and other
such things. We want to see how much can be derived directly from the power series expansions themselves. For instance, we would like to show that $C^2 + S^2 = 1$.
Unfortunately, although this identity does hold, showing it directly from the power
series expansions involves some rather unpleasant algebra (try it and see).
This is where complex numbers come in to save the day:
Proposition 15.10. For all real $x$, we have the following identities:
$$C(x) = \frac{1}{2} (E(ix) + E(-ix)),$$
$$S(x) = \frac{1}{2i} (E(ix) - E(-ix)),$$
$$E(ix) = C(x) + iS(x).$$
Exercise: Prove Proposition 15.10.
Now we are in business: since the coefficients of $E(z)$ are real, we have $\overline{E(ix)} =
E(\overline{ix}) = E(-ix)$ for all real $x$, hence
$$C(x)^2 + S(x)^2 = |E(ix)|^2 = E(ix)\overline{E(ix)} = E(ix)E(-ix) = E(ix - ix) = E(0) = 1.$$
We're not done yet: we'd like to prove that $S(x)$ and $C(x)$ are periodic functions,
whose period is a mysterious number approximately equal to $2 \times 3.141592653\ldots$.
This can also be worked out from the power series expansions, with some cleverness:
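For readers who would like to see Proposition 15.10 and the identity $C^2 + S^2 = 1$ in action before proving them, here is an added Python sketch. It works only with partial sums of the three power series (the cutoffs and the names `C_series`, `S_series` are assumptions of the example) and never calls a library sine or cosine.

```python
def E(z, terms=60):
    """Partial sum of sum_{n>=0} z^n / n!."""
    total, term = 0j, 1 + 0j
    for n in range(terms):
        total += term
        term *= z / (n + 1)
    return total

def C_series(x, terms=30):
    """Partial sum of sum_{n>=0} (-1)^n x^(2n) / (2n)!."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= -x * x / ((2 * n + 1) * (2 * n + 2))
    return total

def S_series(x, terms=30):
    """Partial sum of sum_{n>=0} (-1)^n x^(2n+1) / (2n+1)!."""
    total, term = 0.0, x
    for n in range(terms):
        total += term
        term *= -x * x / ((2 * n + 2) * (2 * n + 3))
    return total

x = 1.3
print(abs(C_series(x) - (E(1j * x) + E(-1j * x)) / 2))
print(abs(S_series(x) - (E(1j * x) - E(-1j * x)) / 2j))
print(abs(C_series(x) ** 2 + S_series(x) ** 2 - 1))
```

Each printed value is of the size of rounding error, as the proposition and the identity predict.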
We first claim that there exists $x_0 > 0$ such that $C(x_0) = 0$. Otherwise, since
$C(0) = 1 > 0$, we'd have $C(x) > 0$ for all $x$, hence $S'(x) = C(x) > 0$ for all $x$, hence
$S$ would be strictly increasing on the entire real line. Since $S(0) = 0$, it follows that
$S(x) > 0$ for all $x > 0$. Now, if $0 < x < y$, we have
$$S(x)(y - x) < \int_x^y S(t) \, dt = C(x) - C(y) \leq 2.$$
But now for fixed $x$ and $y > x + \frac{2}{S(x)}$,
Lemma 15.11. Let $f : [0, \infty) \to \mathbb{R}$ be continuous such that $f(0) > 0$ and
$f(x) = 0$ for some $x > 0$. Then there is a least positive number $x_0$ such that
$f(x_0) = 0$.
Proof. Left to the reader.
Now we define the number $\pi$ by $\pi = 2x_0$, where $x_0$ is the least positive number $x$
such that $C(x) = 0$. The relation $C(x)^2 + S(x)^2 = 1$ together with $C(\frac{\pi}{2}) = 0$ shows
that $S(\frac{\pi}{2}) = \pm 1$. On the other hand, since $C(x) = S'(x)$ is non-negative on $[0, \frac{\pi}{2}]$,
$S(x)$ is increasing on this interval, so it must be that $S(\frac{\pi}{2}) = 1$. Thus $E(\frac{\pi i}{2}) = i$.
Using the addition formula for $E(z)$ we recover Euler's amazing identity
$$e^{\pi i} = \big( e^{\frac{\pi i}{2}} \big)^2 = -1,$$
and also $e^{2\pi i} = 1$. In general, $e^{z + 2\pi i} = e^z e^{2\pi i} = e^z$, so $E$ is periodic with period $2\pi i$.
Using the periodicity of $E$ and the formulas of Proposition 15.10, we get that for all
$x$, $C(x + 2\pi) = C(x)$ and $S(x + 2\pi) = S(x)$.
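This definition of $\pi$ is completely computable. The following Python sketch, which I add as an illustration, locates a zero of the power series $C$ by bisection and doubles it. It assumes, without proving it in code, that the least positive zero lies in $[1, 2]$ (one can check by hand from the series that $C(1) > 0 > C(2)$ and that $C$ is positive on $[0, 1]$); `math.pi` is used only for the final comparison.

```python
import math

def C(x, terms=40):
    """Partial sum of sum_{n>=0} (-1)^n x^(2n) / (2n)!."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= -x * x / ((2 * n + 1) * (2 * n + 2))
    return total

def zero_of_C(lo=1.0, hi=2.0, iterations=60):
    """Bisection for a zero of C on [lo, hi], assuming C(lo) > 0 > C(hi)."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if C(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

pi_estimate = 2 * zero_of_C()
print(pi_estimate, math.pi)
```

Bisection is the Intermediate Value Theorem made effective: each step halves the interval known to contain the zero.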
Since for all real $t$, $|e^{it}| = 1$, the parameterized curve
$$r(t) = e^{it} = C(t) + iS(t) \leftrightarrow (C(t), S(t)) = (x(t), y(t))$$
has image contained in the unit circle. We claim that every point on the unit circle
is of the form $e^{it}$ for a unique $t \in [0, 2\pi)$. To see this, start at the point $1 = e^{i0}$,
and consider $t \in [0, \frac{\pi}{2}]$. The function $C : [0, \frac{\pi}{2}] \to \mathbb{R}$ is continuous and decreasing,
hence injective, with $C(0) = 1$ and $C(\frac{\pi}{2}) = 0$. By the Intermediate Value Theorem, all values in $[0, 1]$ are assumed for a (necessarily unique, by the injectivity)
$t \in [0, \frac{\pi}{2}]$, and every point in the first quadrant of the unit circle is of the form
$(x, y)$ for a unique $x \in [0, 1]$. By making similar arguments in the intervals $[\frac{\pi}{2}, \pi]$,
$[\pi, \frac{3\pi}{2}]$, $[\frac{3\pi}{2}, 2\pi]$ we establish the claim.
Finally, if we grant that by the arclength of the parameterized curve $r(t) = (x(t), y(t))$
from $t = a$ to $t = b$ we mean the integral
$$\int_{t=a}^{b} \sqrt{\Big| \frac{dx}{dt} \Big|^2 + \Big| \frac{dy}{dt} \Big|^2} \, dt$$
it is easy to show that $C(x) = \cos x$ and $S(x) = \sin x$.
Indeed, for $r(t) = (C(t), S(t))$, the arclength integral is
$$\int_{t=0}^{\theta} \sqrt{S^2(t) + C^2(t)} \, dt = \theta,$$
so the point $r(\theta) = (C(\theta), S(\theta))$ really is the point that we arrive at by starting at
the point $(1, 0)$ on the unit circle and traversing $\theta$ units of arc.
Lemma 15.12. (DeMoivre) Let $k \in \mathbb{Z}^+$ and $z \in \mathbb{C}$. Then there is $w \in \mathbb{C}$ such
that $w^k = z$.
Proof. If $z = 0$ we may take $w = 0$. Otherwise, $r := |z| > 0$, so $\frac{z}{r}$ lies on
the unit circle and thus $\frac{z}{r} = e^{i\theta}$ for a unique $\theta \in [0, 2\pi)$. We well know (as a
consequence of the Intermediate Value Theorem) that every positive real number
has a positive $k$th root, denoted $r^{\frac{1}{k}}$. Thus if $w = r^{\frac{1}{k}} e^{i\theta / k}$,
$$w^k = \big( r^{\frac{1}{k}} e^{i\theta / k} \big)^k = r \big( e^{i\theta / k} \big)^k = r e^{i\theta} = z.$$
Exercise: Let $z$ be a nonzero complex number and $k$ a positive integer. Show that
there are precisely $k$ complex numbers $w$ such that $w^k = z$.
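The proof of the lemma is constructive, and the exercise suggests where the other roots come from: replacing $\theta$ by $\theta + 2\pi j$ changes nothing in $z$ but changes $w$. The Python sketch below is an added illustration of this recipe using the standard library's `cmath.polar`; the function name `kth_roots` is my own, and the code does not replace a proof that these are all the roots.

```python
import cmath
import math

def kth_roots(z, k):
    """Return k complex numbers w with w**k == z (up to rounding), for nonzero z."""
    if z == 0:
        return [0j]
    r, theta = cmath.polar(z)       # z = r * e^(i*theta) with theta in (-pi, pi]
    theta %= 2 * math.pi            # normalize to [0, 2*pi) as in the lemma
    root_modulus = r ** (1.0 / k)   # the positive real k-th root of r
    return [root_modulus * cmath.exp(1j * (theta + 2 * math.pi * j) / k) for j in range(k)]

for w in kth_roots(-8, 3):
    print(w, w**3)
```

For $z = -8$ and $k = 3$ this prints three distinct numbers of modulus $2$, one of which is (up to rounding) the real cube root $-2$.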
4. The Fundamental Theorem of Algebra
4.1. The Statement and Some Consequences.
Theorem 15.13. (Fundamental Theorem of Algebra) Let
$$P(z) = a_n z^n + \ldots + a_1 z + a_0$$
be a polynomial with complex coefficients and positive degree. Then $P$ has a root in
the complex numbers: there is $z_0 \in \mathbb{C}$ such that $P(z_0) = 0$.
Theorem 15.13 is not easy to prove, and we defer the proof until the next section.
For now we give some important consequences of this seminal result.
Corollary 15.14. Every nonconstant polynomial with complex coefficients
factors as a product of linear polynomials. More precisely, let
$$P(z) = a_n z^n + \ldots + a_1 z + a_0, \text{ with } a_n \neq 0.$$
Then there are $\alpha_1, \ldots, \alpha_n \in \mathbb{C}$ (not necessarily distinct) such that
$$P(z) = a_n (z - \alpha_1)(z - \alpha_2) \cdots (z - \alpha_n). \tag{114}$$
Proof. First observe that if the result holds for $P(z)$ then it holds for $\lambda P(z)$
for any $\lambda \in \mathbb{C} \setminus \{0\}$. It is therefore no loss of generality to assume that the leading
coefficient $a_n$ of $P(z)$ is equal to $1$. Let us do so.
We now prove the result by induction on $n$, the degree of the polynomial $P$.
Base Case ($n = 1$): A degree one polynomial with leading coefficient $1$ is of the
form $z + a_0 = z - \alpha_1$, with $\alpha_1 = -a_0$.
Induction Step: Let $n \in \mathbb{Z}^+$, suppose the result holds for all polynomials of degree
$n$, and let $P(z)$ be a polynomial of degree $n + 1$. By Theorem 15.13, there is
$z_0 \in \mathbb{C}$ such that $P(z_0) = 0$. By the Root-Factor Theorem, we may write $P(z) =
(z - z_0)Q(z)$, with $Q(z)$ a polynomial of degree $n$ and leading coefficient $1$. By
induction, $Q(z) = (z - \alpha_1) \cdots (z - \alpha_n)$, so putting $\alpha_{n+1} = z_0$ we get
$$P(z) = (z - \alpha_1) \cdots (z - \alpha_n)(z - \alpha_{n+1}).$$
More generally, let $F$ be a field: that is, a set endowed with two binary operations
denoted $+$ and $\cdot$ and satisfying the field axioms (P0) through (P9) from Chapter
1. We say that $F$ is algebraically closed if every nonconstant polynomial $P(x)$
with coefficients in $F$ has a root in $F$, i.e., there is $x_0 \in F$ such that $P(x_0) = 0$. In
this terminology, the Fundamental Theorem of Algebra asserts precisely that the
complex field $\mathbb{C}$ is algebraically closed.
Exercise: Let $F$ be an algebraically closed field. Show that the conclusion of Corollary 15.14 holds for $F$: that is, every nonconstant polynomial factors as a product
of linear polynomials.
Since every real number is, in particular, a complex number, Corollary 15.14 applies
in particular to polynomials over $\mathbb{R}$: if $P(x) = a_n x^n + \ldots + a_1 x + a_0$ with $a_n \neq 0$,
then there are complex numbers $\alpha_1, \ldots, \alpha_n$ such that
$$P(x) = a_n (x - \alpha_1)(x - \alpha_2) \cdots (x - \alpha_n).$$
But since the coefficients of $P$ are real, it is natural to ask whether or to what
extent some or all of the roots $\alpha_i$ must also be real. Recall that we need not have
any real roots (that is, the field $\mathbb{R}$ is not algebraically closed): for any $n \in \mathbb{Z}^+$, the
polynomial $P_n(x) = (x^2 + 1)^n$ is positive for all real $x$, so has no real roots. And
indeed, its factorization over $\mathbb{C}$ is $(x^2 + 1)^n = (x + i)^n (x - i)^n$.
However, the polynomials $P_n$ all had even degree. Recall that, as a consequence
of the Intermediate Value Theorem, every polynomial of odd degree has at least
one real root (and need not have more than one, as the family of examples $xP_n(x)$
shows). So there is some relation between the parity (i.e., the evenness or oddness)
of the degree of a real polynomial and its real and complex roots. This observation
can be clarified and sharpened in terms of the operation $z = a + bi \mapsto a - bi = \overline{z}$ of
complex conjugation.
Lemma 15.15. a) For $z, a_0, \ldots, a_n \in \mathbb{C}$, we have
$$\overline{a_n z^n + \ldots + a_1 z + a_0} = \overline{a_n} \, \overline{z}^n + \ldots + \overline{a_1} \, \overline{z} + \overline{a_0}.$$
b) Thus, if $a_0, \ldots, a_n \in \mathbb{R}$, we have
$$\overline{a_n z^n + \ldots + a_1 z + a_0} = a_n \overline{z}^n + \ldots + a_1 \overline{z} + a_0.$$
c) Let $a_0, \ldots, a_n \in \mathbb{R}$ and put $P(z) = a_n z^n + \ldots + a_1 z + a_0$. If $z_0 \in \mathbb{C}$ is such that
$P(z_0) = 0$, then also $P(\overline{z_0}) = 0$.
Proof. We have already observed that for any $z_1, z_2 \in \mathbb{C}$, $\overline{z_1 + z_2} = \overline{z_1} + \overline{z_2}$ and
$\overline{z_1 z_2} = \overline{z_1} \, \overline{z_2}$. Keeping these identities in mind, the proof becomes a straightforward
exercise which we leave to the reader.
It is part c) of Lemma 15.15 that is important for us: this well-known result often
goes by the description "The complex roots of a real polynomial occur in conjugate
pairs." To see why this is relevant, consider the following extremely simple but
extremely important result.
Lemma 15.16. For a complex number $z$, $z \in \mathbb{R} \iff \overline{z} = z$.
Proof. If $z \in \mathbb{R}$, then $z = z + 0i$, so $\overline{z} = z - 0i = z$. Conversely, let $z = a + bi$.
If $a - bi = \overline{z} = z = a + bi$, then $(2i)b = 0$. Multiplying through by $(2i)^{-1} = \frac{-i}{2}$,
we get $b = 0$, so $z = a + 0i \in \mathbb{R}$.
Let us say that $\alpha \in \mathbb{C}$ is properly complex if it is not a real number. If $P(x)$
is a polynomial with real coefficients and has the properly complex number $\alpha$ as a
root, then by Lemma 15.15c) and Lemma 15.16 it also has the (distinct!) properly complex number
$\overline{\alpha}$ as a root. By the Root-Factor Theorem, we may write $P(z) = (z - \alpha)(z - \overline{\alpha})P_2(z)$.
Now wake up! Something interesting is about to happen. Namely, if we write
$$P_1(z) = (z - \alpha)(z - \overline{\alpha}),$$
then we claim both $P_1(z)$ and $P_2(z)$ have real coefficients, so we have obtained a
factorization of real polynomials
$$P(x) = P_1(x)P_2(x).$$
For $P_1(z)$ we need only write $\alpha = a + bi$ and multiply it out:
$$(z - \alpha)(z - \overline{\alpha}) = z^2 - (\alpha + \overline{\alpha})z + \alpha\overline{\alpha} = z^2 - (2a)z + (a^2 + b^2).$$
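The last computation is worth trying on a specific number. The Python sketch below, added for illustration, builds the real quadratic $z^2 - 2az + (a^2 + b^2)$ from a properly complex $\alpha = a + bi$ and checks that it vanishes at both $\alpha$ and $\overline{\alpha}$; the helper names and the sample value $\alpha = 2 + 3i$ are assumptions of the example.

```python
def real_quadratic_from_root(alpha):
    """Coefficients (1, -2a, a^2 + b^2) of (z - alpha)(z - conj(alpha)), where alpha = a + bi."""
    a, b = alpha.real, alpha.imag
    return (1.0, -2 * a, a * a + b * b)

def evaluate(coeffs, z):
    """Horner evaluation; coefficients are listed from the leading one down."""
    result = 0
    for c in coeffs:
        result = result * z + c
    return result

alpha = 2 + 3j
quadratic = real_quadratic_from_root(alpha)
print(quadratic)                                   # three real coefficients
print(evaluate(quadratic, alpha))                  # vanishes at alpha
print(evaluate(quadratic, alpha.conjugate()))      # and at its conjugate
```

The coefficients involve only the real numbers $a$ and $b$, which is exactly why the conjugate pair of linear factors combines into a real quadratic factor.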
For $P_2(z)$ we have to argue a bit more abstractly. Namely, for polynomials over $\mathbb{R}$ (and really, over any field) we can always perform division with remainder: there are unique real polynomials $Q(x)$, $R(x)$ such that
$$P(x) = Q(x)P_1(x) + R(x), \quad \deg R < \deg P_1. \tag{115}$$
We claim $R(x) \equiv 0$ (i.e., it is the zero polynomial). To see this, we use the uniqueness part of the division algorithm in a slightly sneaky way: namely, consider $P(x)$ and $P_1(x)$ as polynomials with complex coefficients and perform the division algorithm there: there are unique complex polynomials, say $Q_{\mathbb{C}}(x)$ and $R_{\mathbb{C}}(x)$, such that
$$P(x) = Q_{\mathbb{C}}(x)P_1(x) + R_{\mathbb{C}}(x), \quad \deg R_{\mathbb{C}} < \deg P_1. \tag{116}$$
Here's the point: on the one hand, real polynomials are complex polynomials, so comparing (115) and (116) we deduce $Q(x) = Q_{\mathbb{C}}(x)$ and $R(x) = R_{\mathbb{C}}(x)$. On the other hand, the identity $P(x) = P_1(x)P_2(x)$ of complex polynomials shows that we may take $Q_{\mathbb{C}}(x) = P_2(x)$ and $R_{\mathbb{C}}(x) \equiv 0$. Putting these together, we get $Q(x) = P_2(x)$ and $R(x) \equiv 0$, so indeed $P(x) = P_1(x)P_2(x)$ is a factorization of real polynomials.
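The computation of $P_1$ above can be checked numerically. The following Python sketch builds the quadratic $z^2 - 2az + (a^2 + b^2)$ from a sample properly complex number $\alpha = a + bi$ and confirms that its coefficients are real and that both $\alpha$ and $\overline{\alpha}$ are roots. The function names and the sample value $\alpha = 2 + 3i$ are illustrative assumptions, not part of the text.

```python
def conjugate_pair_quadratic(alpha):
    """Coefficients (1, -2a, a^2 + b^2) of (z - alpha)(z - conj(alpha)), alpha = a + bi."""
    a, b = alpha.real, alpha.imag
    return (1.0, -2.0 * a, a * a + b * b)


def evaluate(coeffs, z):
    """Evaluate a polynomial given by coefficients in descending degree (Horner's rule)."""
    result = 0
    for c in coeffs:
        result = result * z + c
    return result


alpha = complex(2, 3)                     # hypothetical sample root
p1 = conjugate_pair_quadratic(alpha)      # all three coefficients are real floats
# Both alpha and its conjugate should be roots of the real quadratic.
print(p1, abs(evaluate(p1, alpha)), abs(evaluate(p1, alpha.conjugate())))
```

The point of the sketch is only that the imaginary parts cancel in the product; it does not replace the division-with-remainder argument for $P_2$.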
A positive degree polynomial $P(x)$ over a field $F$ is called irreducible if it cannot be factored as $P(x) = P_1(x)P_2(x)$ with $\deg P_1, \deg P_2 < \deg P$. (This last condition is there to prevent trivial factorizations like $x^2 + 1 = (2) \cdot (\frac{1}{2}x^2 + \frac{1}{2})$.) A polynomial of positive degree which is not irreducible is called reducible.
Exercise: Let $F$ be a field, and let $P(x)$ be a polynomial with coefficients in $F$.
a) Show that if $P$ is irreducible of degree at least 2, it has no roots in $F$.
b) Suppose $P$ has degree 2 and no roots in $F$. Show that $P$ is irreducible.
c) Suppose $P$ has degree 3 and no roots in $F$. Show that $P$ is irreducible.
d) For each even $n \ge 4$, exhibit a degree $n$ polynomial with coefficients in $\mathbb{R}$ which is reducible, but has no real roots.
Theorem 15.17. Let $P(x)$ be a real polynomial of degree $n \ge 1$.
a) There are natural numbers $r, s \in \mathbb{N}$ such that $r + 2s = n$, linear polynomials $L_1(x), \ldots, L_r(x)$ and irreducible quadratic polynomials $Q_1(x), \ldots, Q_s(x)$ such that
$$P(x) = L_1(x) \cdots L_r(x) Q_1(x) \cdots Q_s(x).$$
b) If $n$ is odd, then $P$ has at least one real root.
Proof. It's harmless to assume that $P$ has leading coefficient 1, and we do so.
a) We go by strong induction on $n$. When $n = 1$, $P(x)$ is a linear polynomial, so we may take $r = n = 1$, $s = 0$ and $P(x) = L_1(x)$. Suppose $n \ge 2$ and the result holds for all polynomials of degree less than $n$, and let $P(x)$ be a real polynomial of degree $n$. By the Fundamental Theorem of Algebra, $P(x)$ has a complex root $\alpha$. If $\alpha \in \mathbb{R}$, then by the Root-Factor Theorem $P(x) = (x - \alpha)P_2(x)$ and we are done by induction. If $\alpha$ is properly complex, $Q(x) = (x - \alpha)(x - \overline{\alpha})$ is a real, irreducible quadratic polynomial and $P(x) = Q(x)P_2(x)$, and again we are done by induction.
b) Since $r + 2s = n$, if $n$ is odd we cannot have $r = 0$. Thus $P$ has at least one linear factor and thus at least one real root.
The partial fractions decomposition rests on the foundation of Theorem 15.17.
4.2. Proof of the Fundamental Theorem of Algebra.
We now give a proof of Theorem 15.13, closely following W. Rudin [R, Thm. 8.8].
Let $P(z)$ be a polynomial with complex coefficients, degree $n \ge 1$, and leading coefficient $a_n$. We want to show that $P(z)$ has a complex root; certainly this holds iff $\frac{1}{a_n}P(z)$ has a complex root, so it is no loss of generality to assume that the leading coefficient is 1 and thus
$$P(z) = z^n + a_{n-1}z^{n-1} + \ldots + a_1 z + a_0, \quad a_i \in \mathbb{C}.$$
Let
$$\mu = \inf_{z \in \mathbb{C}} |P(z)|.$$
Thus $\mu$ is a non-negative real number and our job is to show (i) that $\mu$ is actually attained as a value of $|P|$ and (ii) $\mu = 0$.
Step 1: Since for $z \neq 0$,
$$|P(z)| = |z|^n \left| 1 + \frac{a_{n-1}}{z} + \ldots + \frac{a_0}{z^n} \right|,$$
and the second factor approaches 1 as $|z| \to \infty$, it follows that
$$\lim_{z \to \infty} |P(z)| = \infty.$$
Thus there is $R > 0$ such that for all $z \in \mathbb{C}$ with $|z| > R$, $|P(z)| > |P(0)|$. By Theorem 15.7, the continuous function $|P(z)|$ assumes a minimum value on the closed, bounded set $\{z \in \mathbb{C} \mid |z| \le R\}$, say at $z_0$. But $R$ was chosen so that $|P(z)| > |P(0)| \ge |P(z_0)|$ for all $z$ with $|z| > R$, so altogether $|P(z)| \ge |P(z_0)|$ for all $z \in \mathbb{C}$ and thus $\mu = \inf_{z \in \mathbb{C}} |P(z)| = |P(z_0)|$.
Step 2: Seeking a contradiction, we suppose $\mu > 0$. Define $Q : \mathbb{C} \to \mathbb{C}$ by $Q(z) = \frac{P(z + z_0)}{P(z_0)}$. Thus $Q$ is also a degree $n$ polynomial function, $Q(0) = 1$, and by minimality of $|P(z_0)|$, $|Q(z)| \ge 1$ for all $z \in \mathbb{C}$. We may write $Q(z) = 1 + b_k z^k + \ldots + b_n z^n$ with $b_k \neq 0$ for some $1 \le k \le n$. Let $w \in \mathbb{C}$ be such that
$$w^k = -\frac{|b_k|}{b_k};$$
the existence of such a $w$ is guaranteed by Lemma 15.12. Then for $r \in (0, \infty)$,
$$Q(rw) = 1 + b_k r^k w^k + b_{k+1} r^{k+1} w^{k+1} + \ldots + b_n r^n w^n$$
$$= 1 - r^k \left( |b_k| - r w^{k+1} b_{k+1} - \ldots - r^{n-k} w^n b_n \right) = 1 - r^k \left( |b_k| + C(r) \right),$$
where we have set $C(r) = -r w^{k+1} b_{k+1} - \ldots - r^{n-k} w^n b_n$. Thus
$$|Q(rw)| = |1 - r^k(|b_k| + C(r))| \le |1 - r^k |b_k|| + |r^k C(r)|.$$
As $r$ approaches 0 from the right, $r^k |b_k|$ and $C(r)$ both approach 0. Thus for sufficiently small $r$, we have $r^k |b_k| < 1$ and $|C(r)| < |b_k|$ and then
$$|Q(rw)| \le |1 - r^k |b_k|| + |r^k C(r)| = 1 - r^k (|b_k| - |C(r)|) < 1.$$
This contradicts the fact that $|Q(z)| \ge 1$ for all $z \in \mathbb{C}$ and completes the proof.
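Step 2 is constructive enough to try on a computer: at any point $z_0$ with $P(z_0) \neq 0$ it names a direction $w$ and promises that $|P(z_0 + rw)| < |P(z_0)|$ for small $r > 0$. The Python sketch below follows that recipe under some stated assumptions: coefficients are listed in ascending degree, the test "$b_k \neq 0$" is replaced by a floating-point threshold, and $r$ is found by repeated halving. The names `taylor_coefficients`, `descent_step`, and `evaluate` are hypothetical helpers introduced only for this illustration.

```python
import cmath
from math import comb


def evaluate(a, z):
    """P(z) for P(z) = sum_i a[i] * z**i (coefficients in ascending degree)."""
    return sum(ai * z ** i for i, ai in enumerate(a))


def taylor_coefficients(a, z0):
    """Return c_0..c_n with P(z0 + h) = sum_j c_j h^j (binomial expansion)."""
    n = len(a) - 1
    return [sum(a[i] * comb(i, j) * z0 ** (i - j) for i in range(j, n + 1))
            for j in range(n + 1)]


def descent_step(a, z0, max_halvings=60):
    """Given P(z0) != 0, return z1 with |P(z1)| < |P(z0)|, as in Step 2."""
    c = taylor_coefficients(a, z0)
    b = [cj / c[0] for cj in c]          # Q(h) = P(z0 + h)/P(z0) = 1 + b_k h^k + ...
    # Assumption: treat |b_j| <= 1e-14 as zero when locating the first nonzero b_k.
    k = next(j for j in range(1, len(b)) if abs(b[j]) > 1e-14)
    target = -abs(b[k]) / b[k]           # a complex number of modulus 1
    w = cmath.rect(1.0, cmath.phase(target) / k)   # one solution of w^k = target
    r = 1.0
    for _ in range(max_halvings):
        h = r * w
        if abs(evaluate(b, h)) < 1:      # |Q(rw)| < 1 means |P| strictly decreased
            return z0 + h
        r /= 2
    raise RuntimeError("no decrease found within the halving budget")


a = [1, 0, 1]                            # P(z) = z^2 + 1, which has no real roots
z1 = descent_step(a, 0)
print(abs(evaluate(a, 0)), abs(evaluate(a, z1)))
```

Iterating `descent_step` is not claimed to converge to a root; the proof only needs a single strict decrease to contradict minimality.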
CHAPTER 16
Foundations Revisited
"The reader should picture a street mime juggling non-existent balls. As the mime continues, the action of juggling slowly brings the balls into existence, at first in dim outline and then into solid reality." T.W. Körner^1
An ordered field $F$ is Dedekind complete if every nonempty subset which is bounded above has a least upper bound (or supremum).
Exercise 16.0: Show that an ordered field is Dedekind complete iff every nonempty subset which is bounded below has a greatest lower bound (or infimum).
Our initial definition of $\mathbb{R}$ was precisely that it was a Dedekind complete ordered field. Practically speaking, this is a great foundation for honors calculus and real analysis, because it contains all the information we need to know about $\mathbb{R}$.
In other words, we have put a neat little black box around our foundational problems. Real analysis works perfectly well without ever having to look in the box. But curiosity is a fundamental part of mathematics, and at some point most of us will want to look in the box. This chapter is for those who have reached that point, i.e., who want to understand a proof of the following theorem.
Theorem 16.1. (Black Box Theorem)
a) There is a Dedekind complete ordered field.
b) If $F_1$ and $F_2$ are Dedekind complete ordered fields, they are isomorphic: that is, there is a bijection $\Phi : F_1 \to F_2$ such that:
(i) For all $x, y \in F_1$, $\Phi(x + y) = \Phi(x) + \Phi(y)$.
(ii) For all $x, y \in F_1$, $\Phi(xy) = \Phi(x)\Phi(y)$.
(iii) For all $x, y \in F_1$, $x \le y \iff \Phi(x) \le \Phi(y)$.
c) The isomorphism $\Phi$ of part b) is unique: there is exactly one such map between any two Dedekind complete ordered fields.
The Black Box Theorem explains why we never needed any further axioms of $\mathbb{R}$ beyond the fact that it is a Dedekind complete ordered field: there is exactly one such structure, up to isomorphism.^2
^1 Thomas William Körner, 1946-
^2 The student unfamiliar with the notion of isomorphism should think of it as nothing else than a relabelling of the points of $\mathbb{R}$. For instance consider the x-axis $\mathbb{R}_x = \{(a, 0) \mid a \in \mathbb{R}\}$ and the y-axis $\mathbb{R}_y = \{(0, a) \mid a \in \mathbb{R}\}$ in the plane. These are two copies of $\mathbb{R}$. Are they the same? Not in a hard-nosed set-theoretic sense: they are different subsets of the plane. But they are essentially the same: the bijection which carries $(a, 0) \mapsto (0, a)$ preserves the addition, multiplication and order relation. So really we have two slightly different presentations of the same
We will prove the Black Box Theorem...eventually. But rather than taking the most direct possible route we broaden our focus to a study of the structure of ordered fields, not just $\mathbb{Q}$ and $\mathbb{R}$.
1. Ordered Fields
1.1. Basic Definitions.
In this section we revisit the considerations of §1.2 from a somewhat different perspective. Before we listed certain ordered field axioms, but the perspective there was that we were collecting true, and basic, facts about the real numbers for use in our work with them. This time our perspective is to study and understand the collection of all ordered fields. One of our main goals is to construct the real numbers $\mathbb{R}$ in terms of the rational numbers $\mathbb{Q}$ and to understand this in terms of a more general process, completion, which can be applied in any ordered field.
A field is a set $F$ endowed with two binary operations $+$ and $\cdot$ which satisfy all of the field axioms (P0) through (P9). To a first approximation, these axioms simply encode the usual rules of addition, subtraction, multiplication and division of numbers, so any field can be thought of as a kind of "generalized number system". The most important basic examples are the rational numbers $\mathbb{Q}$, the real numbers $\mathbb{R}$, and the complex numbers $\mathbb{C}$. But there are other examples which seem farther removed from the "usual numbers": e.g. finite fields like $\mathbb{F}_2 = \{0, 1\}$ are smaller than what we normally think of as a number system, whereas the set $\mathbb{R}(t)$ of all rational functions (with real coefficients) is a field whose elements are naturally regarded as functions, not as numbers.
Field theory is an active branch of mathematical research, with several texts and thousands of papers devoted to it (e.g. [FT]). Nevertheless the very simple properties of fields established in §1.2.1 will be sufficient for our needs here, in part because we are not interested in fields per se but rather ordered fields. An ordered field is a field equipped with the additional structure of a total order relation, namely a binary relation $\le$ which satisfies:
(TO1)
(TO2)
(TO3)
(TO4)
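Proof. Assume to the contrary that there are $x \neq y \in F$ with $f(x) = f(y)$. Then $f(x - y) = f(x) - f(y) = 0$. Since $x \neq y$, $x - y \neq 0$, and thus we have a multiplicative inverse $\frac{1}{x - y}$. Then
$$1 = f(1) = f\left((x - y) \cdot \frac{1}{x - y}\right) = f(x - y)\, f\left(\frac{1}{x - y}\right) = 0 \cdot f\left(\frac{1}{x - y}\right) = 0,$$
a contradiction.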
in a non-Archimedean field.
By definition, an ordered field $K$ is Archimedean if $\mathbb{Z}$ is cofinal in $K$. Equivalently, $K$ is Archimedean iff $\mathbb{Q}$ is cofinal in $K$.
Let $F$ be a subfield of the ordered field $K$. We say that $F$ is dense in $K$ if for all $x < y \in K$, there is $z \in F$ such that $x < z < y$.
Lemma 16.4. Let $K$ be an ordered field, and let $F$ be a subfield of $K$. If $F$ is dense in $K$, then $F$ is cofinal in $K$.
Proof. We show the contrapositive: suppose $F$ is not cofinal in $K$: there is $x \in K$ such that for all $y \in F$, $y \le x$. Then the interval $(x, x + 1)$ contains no points of $F$, so $F$ is not dense in $K$.
More generally, let $f : F \to F'$ be a homomorphism of ordered fields. We say that $f$ is cofinal if the image $f(F)$ is a cofinal subfield of $F'$. We say that $f$ is dense if the image $f(F)$ is a dense subfield of $F'$.
Exercise 16.10: Let $K$ be a subfield of $F$.
a) Suppose that for every $\alpha \in F$, there is a sequence $\{x_n\}$ of elements of $K$ such that $x_n \to \alpha$. Show that $K$ is a dense subfield of $F$.
b) Does the converse of part a) hold? (Hint: no, but counterexamples are not so easy to come by.)
Lemma 16.5. For a homomorphism $f : F \to F'$ of ordered fields, TFAE:
(i) $f$ is cofinal.
(ii) For every positive $\epsilon' \in F'$, there is a positive $\delta \in F$ such that $f(\delta) < \epsilon'$.
Exercise 16.11: Prove Lemma 16.5. (Hint: take reciprocals!)
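Lemma 16.6. For an ordered field $F$, the following are equivalent:
(i) $F$ is Archimedean.
(ii) $\mathbb{Q}$ is a dense subfield of $F$.
Proof. (i) $\implies$ (ii): Suppose $F$ is Archimedean and let $x < y \in F$. Let $n \in \mathbb{Z}^+$ be such that $\frac{1}{y - x} < n$; then $0 < \frac{1}{n} < y - x$, so
$$x < x + \frac{1}{n} < y.$$
Since $F$ is Archimedean, there is a least integer $m$ with $m > nx$; then $m - 1 \le nx$, so $x < \frac{m}{n} \le x + \frac{1}{n} < y$, and $\frac{m}{n}$ is a rational number lying strictly between $x$ and $y$.
(ii) $\implies$ (i): If $\mathbb{Q}$ is dense in $F$ then by Lemma 16.4, $\mathbb{Q}$ is cofinal in $F$.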
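The argument for (i) $\implies$ (ii) is effectively an algorithm: pick $n$ larger than $\frac{1}{y - x}$, then take the least integer $m$ exceeding $nx$. Here is a minimal Python sketch of that recipe, using exact rational arithmetic from the standard library. The function name `rational_between` is a hypothetical label, and Python's `Fraction` and `float` types stand in for elements of an Archimedean ordered field.

```python
from fractions import Fraction
from math import floor, sqrt


def rational_between(x, y):
    """Return a rational m/n with x < m/n < y, assuming x < y."""
    if not x < y:
        raise ValueError("need x < y")
    n = floor(1 / (y - x)) + 1       # a positive integer with 1/(y - x) < n
    m = floor(n * x) + 1             # the least integer with m > n*x
    return Fraction(m, n)


q = rational_between(Fraction(1, 3), Fraction(1, 2))
print(q)                             # a rational strictly between 1/3 and 1/2
print(rational_between(sqrt(2), 1.5))
```

The choice $n = \lfloor 1/(y - x) \rfloor + 1$ is just one convenient integer exceeding $\frac{1}{y - x}$; any larger $n$ works equally well.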
On the other hand, in an arbitrary ordered field a Cauchy sequence need not be convergent. For instance the Babylonian sequence
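$$S_n = \{k \in \mathbb{Z} \mid \tfrac{k}{2^n} \in U(A) \text{ and } k \le 2^n M\}.$$
Every element of $S_n$ lies in the interval $[2^n m, 2^n M]$ and $2^n M \in S_n$, so each $S_n$ is finite and nonempty. Put $k_n = \min S_n$ and $a_n = \frac{k_n}{2^n}$, so $\frac{2k_n}{2^{n+1}} = \frac{k_n}{2^n} \in U(S)$ while $\frac{2k_n - 2}{2^{n+1}} = \frac{k_n - 1}{2^n} \notin U(S)$. It follows that we have either $k_{n+1} = 2k_n$ or $k_{n+1} = 2k_n - 1$ and thus either $a_{n+1} = a_n$ or $a_{n+1} = a_n - \frac{1}{2^{n+1}}$. In particular $\{a_n\}$ is decreasing.
For all $1 \le m < n$ we have
so $a_n - \frac{1}{2^n} = \frac{k_n - 1}{2^n}$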
The proof of (ii) $\implies$ (i) in Theorem 16.14 above is taken from [HS] by way of [Ha11]. It is rather unexpectedly complicated, but I do not know a simpler proof at this level. However, if one is willing to introduce the notion of convergent and Cauchy nets, then one can show first that in an Archimedean ordered field, the convergence of all Cauchy sequences implies the convergence of all Cauchy nets, and second use the hypothesis that all Cauchy nets converge to give a proof which is (in my opinion of course) more conceptually transparent. This is the approach taken in my (more advanced) notes on Field Theory [FT].
In fact there are (many!) non-Archimedean sequentially complete ordered fields. We will attempt to describe two very different examples of such fields here. We hasten to add that this is material that the majority of working research mathematicians are happily unfamiliar with, and which is thus extremely rarely covered in undergraduate courses. Only the exceptionally curious need the next section.
2.2. Sequential Completion I: Statement and Applications.
We will now establish one of our main results: for every ordered field $F$, there is a sequentially complete ordered field $\mathcal{R}$ and a homomorphism $f : F \to \mathcal{R}$.
In fact we can, and will, prove even more than this. The point is that there will be many (nonisomorphic) sequentially complete fields into which any given ordered field embeds. For example, when we construct the real numbers we will have an embedding $\mathbb{Q} \hookrightarrow \mathbb{R}$. But we also have an embedding $\mathbb{R} \hookrightarrow \mathbb{R}((t))$, so taking the composite gives an embedding $\mathbb{Q} \hookrightarrow \mathbb{R}((t))$. (There is no way that $\mathbb{R}$ and $\mathbb{R}((t))$ are isomorphic, since the former is Archimedean and the latter is not.)
We would like a general definition which allows us to prefer the embedding $\mathbb{Q} \hookrightarrow \mathbb{R}$ to the embedding $\mathbb{Q} \hookrightarrow \mathbb{R}((t))$. The key observation is that, since $\mathbb{R}$ is Archimedean, the embedding of $\mathbb{Q}$ into $\mathbb{R}$ is dense, whereas since $\mathbb{R}((t))$ is not Archimedean, the embedding of $\mathbb{Q}$ into $\mathbb{R}((t))$ is not dense. This leads to the following important definition.
A sequential completion of an ordered field $F$ is a dense embedding $F \hookrightarrow \mathcal{R}$ into a sequentially complete ordered field.
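Lemma 16.15. For an ordered field $F$, the following are equivalent.
(i) $F$ is Dedekind complete.
(ii) The inclusion $\iota : \mathbb{Q} \hookrightarrow F$ makes $F$ into a sequential completion of $\mathbb{Q}$.
Proof. (i) $\implies$ (ii): By Theorem 16.14, $F$ is sequentially complete and Archimedean. By Lemma 16.6, $\mathbb{Q} = \iota(\mathbb{Q})$ is a dense subfield of $F$, and it follows that $F$ is a sequential completion of $\mathbb{Q}$.
(ii) $\implies$ (i): If $F$ is a sequential completion of $\mathbb{Q}$, then $F$ is sequentially complete and $\mathbb{Q}$ is dense in $F$, so $F$ is Archimedean by Lemma 16.6 and hence Dedekind complete by Theorem 16.14.
We will prove that every ordered field admits a sequential completion. And again, we will in fact prove a bit more.
Theorem 16.16. Let $F$ be an ordered field.
a) $F$ admits a sequential completion $\iota : F \to \mathcal{R}$.
b) If $L$ is any sequentially complete ordered field and $f : F \to L$ is a cofinal ordered field homomorphism, then there is a unique ordered field homomorphism $g : \mathcal{R} \to L$ such that $f = g \circ \iota$.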
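Corollary 16.17. Two sequential completions of the same ordered field are isomorphic.
Proof. Let $\iota_1 : F \to \mathcal{R}_1$ and $\iota_2 : F \to \mathcal{R}_2$ be two sequential completions. Both are dense, hence cofinal by Lemma 16.4. Applying Theorem 16.16 with $\mathcal{R} = \mathcal{R}_1$ and $f = \iota_2 : F \to \mathcal{R}_2$, we get a unique homomorphism $g : \mathcal{R}_1 \to \mathcal{R}_2$ such that $\iota_2 = g \circ \iota_1$. Interchanging the roles of $\mathcal{R}_1$ and $\mathcal{R}_2$ we also get a unique homomorphism $g' : \mathcal{R}_2 \to \mathcal{R}_1$ such that $\iota_1 = g' \circ \iota_2$. Now consider $g' \circ g : \mathcal{R}_1 \to \mathcal{R}_1$. We have
$$(g' \circ g) \circ \iota_1 = g' \circ (g \circ \iota_1) = g' \circ \iota_2 = \iota_1.$$
Applying Theorem 16.16 with $L = \mathcal{R} = \mathcal{R}_1$ we get that there is a unique homomorphism $G : \mathcal{R}_1 \to \mathcal{R}_1$ such that $G \circ \iota_1 = \iota_1$, but clearly the identity map $1_{\mathcal{R}_1}$ has this property. Thus we must have $g' \circ g = 1_{\mathcal{R}_1}$. Similarly considering $g \circ g' : \mathcal{R}_2 \to \mathcal{R}_2$, then in view of
$$(g \circ g') \circ \iota_2 = g \circ (g' \circ \iota_2) = g \circ \iota_1 = \iota_2,$$
we deduce that $g \circ g' = 1_{\mathcal{R}_2}$. In other words, $g$ and $g'$ are mutually inverse isomorphisms...so $\mathcal{R}_1$ and $\mathcal{R}_2$ are isomorphic.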
Applying Theorem 16.16 to the ordered field $\mathbb{Q}$, we get a sequential completion $\mathcal{R}$ of $\mathbb{Q}$. Since $\mathbb{Q}$ is dense in $\mathcal{R}$, $\mathcal{R}$ is Archimedean by Lemma 16.6; since it is also sequentially complete, by Theorem 16.14, $\mathcal{R}$ is Dedekind complete. Conversely, by Lemma 16.15 any Dedekind complete ordered field is a sequential completion of $\mathbb{Q}$, hence isomorphic to $\mathcal{R}$ by Corollary 16.17. Thus the existence and uniqueness statements of Theorem 16.16 imply the existence and uniqueness up to isomorphism of a Dedekind complete ordered field.
The uniqueness statement can be strengthened: let $R_1$ and $R_2$ be two Dedekind complete ordered fields. We claim that not only are they isomorphic, but that the isomorphism between them is unique. Indeed, for $i = 1, 2$ let $\iota_i : \mathbb{Q} \to R_i$ be the inclusion maps. We saw above that there is a unique map $g : R_1 \to R_2$ such that $g \circ \iota_1 = \iota_2$ and this $g$ is an isomorphism. But any isomorphism $h : R_1 \to R_2$ will satisfy $h \circ \iota_1 = \iota_2$, since in fact there is exactly one embedding from $\mathbb{Q}$ into any ordered field. Thus whereas in general there is an isomorphism $g$ between two sequential completions of a given ordered field $F$ which is unique such that blah blah blah (more precisely, such that $g \circ \iota_1 = \iota_2$), in this case the blah blah blah is vacuous and the isomorphism is unique full stop.
In abstract mathematics, uniqueness up to a unique isomorphism is as close to identical as we can reasonably ask for two structures to be. (Even the "horizontal copy of $\mathbb{R}$" and the "vertical copy of $\mathbb{R}$" are different sets, but the obvious isomorphism between them is the only isomorphism, so no trouble can arise by identifying the two.) We denote this unique field by $\mathbb{R}$ and call it the real numbers...of course.
2.3. Sequential Completion II: The Proof.
Now we are properly motivated to roll up our sleeves and endure the rather lengthy, technical proof of Theorem 16.16. The essential idea (which is indeed due to A.L. Cauchy) is to build the sequential completion directly from the set $\mathcal{C}$ of all Cauchy sequences in $F$.
We can observe that $\mathcal{C}$ itself has some structure reminiscent of an ordered field but that things do not quite work out: it is somehow too large to itself be an ordered field. Namely, it makes perfectly good sense to add, subtract and multiply Cauchy sequences in $F$. For that matter, it makes perfectly good sense to add, subtract and multiply arbitrary sequences in $F$: we simply put
$$\{x_n\} + \{y_n\} = \{x_n + y_n\},$$
$$\{x_n\} - \{y_n\} = \{x_n - y_n\},$$
$$\{x_n\} \cdot \{y_n\} = \{x_n y_n\}.$$
It remains to check that these operations take Cauchy sequences to Cauchy sequences. At the very beginning of our study of sequences we showed this for convergent sequences (in $\mathbb{R}$, but the proofs certainly did not use any form of the completeness axiom). It is no more difficult to establish the analogue for Cauchy sequences in $F$.
Lemma 16.18. Let $F$ be any ordered field, and let $a_\bullet$, $b_\bullet$ be Cauchy sequences. Then $a_\bullet + b_\bullet$ and $a_\bullet b_\bullet$ are both Cauchy.
Proof. Since $a_\bullet$ and $b_\bullet$ are both Cauchy, for $\epsilon > 0$ there is $N \in \mathbb{Z}^+$ such that for $m, n \ge N$, $|a_m - a_n| < \epsilon$ and $|b_m - b_n| < \epsilon$. Then
$$|(a_m + b_m) - (a_n + b_n)| \le |a_m - a_n| + |b_m - b_n| < 2\epsilon.$$
Further, since the sequences are Cauchy, they are bounded: there are $M_a, M_b \in F$ such that $|a_n| \le M_a$ and $|b_n| \le M_b$ for all $n \in \mathbb{Z}^+$. Then for $m, n \ge N$,
$$|a_m b_m - a_n b_n| \le |a_m - a_n||b_m| + |a_n||b_m - b_n| \le (M_a + M_b)\epsilon.$$
So do this addition and multiplication endow $\mathcal{C}$ with the structure of a field? There is an additive identity, namely the sequence with $x_n = 0$ for all $n$. There is also a multiplicative identity, namely the sequence with $x_n = 1$ for all $n$. It all works well until we get to multiplicative inverses.
Exercise 16.16: Let $\{x_n\}$ be a sequence in the ordered field $F$.
a) Show that there is a sequence $\{y_n\}$ with $\{x_n\} \cdot \{y_n\} = \{1\}$ if and only if for all $n \in \mathbb{Z}^+$, $x_n \neq 0$.
b) Show that if $\{x_n\}$ is Cauchy, does not converge to 0, and $x_n \neq 0$ for all $n$, then its inverse $\{\frac{1}{x_n}\}$ is again a Cauchy sequence.
Thus there are plenty of Cauchy sequences other than the constantly zero sequence which do not have multiplicative inverses: e.g. $(0, 1, 1, 1, \ldots)$, or indeed any Cauchy sequence which takes the value 0 at least once. Thus $\mathcal{C}$ has many good algebraic properties, but it is not the case that every nonzero element has a multiplicative inverse, so it is not a field.^5
We also have some order structure on $\mathcal{C}$. For instance, it is tempting to define $\{x_n\} > \{y_n\}$ if $x_n > y_n$ for all $n$. This turns out not to be a good definition in the sense that it does not lead to a trichotomy: there will be unequal Cauchy sequences $\{x_n\}$ and $\{y_n\}$ for which neither is less than the other, e.g.
$$\{x_n\} = \{0, 1, 1, \ldots\}, \quad \{y_n\} = \{1, 0, 0, \ldots\}.$$
As in the definition of convergence, it is more fruitful to pay attention to what a Cauchy sequence is doing eventually. Exploiting this idea we can get a sort of trichotomy result.
Lemma 16.19. (Cauchy Trichotomy) For a Cauchy sequence $\{x_n\}$ in an ordered field $F$, exactly one of the following holds:
(i) There is a positive element $\epsilon \in F$ and $N \in \mathbb{Z}^+$ such that $x_n \ge \epsilon$ for all $n \ge N$.
(ii) There is a positive element $\epsilon \in F$ and $N \in \mathbb{Z}^+$ such that $x_n \le -\epsilon$ for all $n \ge N$.
(iii) $x_n$ converges to 0.
Proof. It is easy to see that the conditions are mutually exclusive. Let us suppose that (iii) does not hold: thus there is $\epsilon > 0$ and a subsequence $\{x_{n_k}\}$ such that $|x_{n_k}| \ge \epsilon$ for all $k \in \mathbb{Z}^+$. By passing to a further subsequence we may assume either that $x_{n_k} \ge \epsilon$ for all $k$ or $x_{n_k} \le -\epsilon$ for all $k$. Let us suppose that the former holds and show that this implies (i): if so, replacing $x$ by $-x$ shows that the latter alternative implies (ii). Since $\{x_n\}$ is Cauchy, there is $N \in \mathbb{Z}^+$ such that $|x_m - x_n| \le \frac{\epsilon}{2}$ for all $m, n \ge N$. Putting these two conditions together we get $x_n \ge \frac{\epsilon}{2}$ for all $n \ge N$.
^5 For those who know some abstract algebra: what we've shown is that $(\mathcal{C}, +, \cdot)$ is a commutative ring. There is a very general algebraic method which, when given a commutative ring, will yield a collection of fields defined in terms of that ring. The present construction is indeed an instance of this.
Unfortunately this is not quite the kind of trichotomy which defines a total order relation: we have some elements that we regard as positive (case (i) above), some elements that we regard as negative (case (ii) above), but for an order relation only the zero element should be neither positive nor negative, whereas case (iii) above includes the much larger collection of elements converging to zero.
Lemma 16.19 suggests that if we could somehow "squash down" the subset of Cauchy sequences which converge to 0 to a single point, then we would actually get a total order relation. This business of "squashing subsets to a point" is formalized in mathematics (more so in algebra and topology than the part of mathematics we've been studying for most of this text!) by an equivalence relation. Rather than providing a logically complete but pedagogically useless whirlwind tour of equivalence relations, we will simply assume that the reader is familiar with them.^6
Namely, we regard any two Cauchy sequences which converge to 0 as equivalent. We are left with the question of when to regard two Cauchy sequences which do not converge to zero as equivalent. We could simply not "squash" them, i.e., declare two such sequences to be equivalent exactly when they are equal. But a little exploration shows that this won't work: we'll get a total order relation but it won't interact well with the algebraic structure. For instance, consider the Cauchy sequences
$$\{x_n\} = (0, 0, 1, 1, 1, \ldots), \quad \{y_n\} = (1, 1, 1, 1, \ldots).$$
The difference $\{x_n\} - \{y_n\}$ converges to 0 so is getting identified with 0. Thus we should identify $\{x_n\}$ and $\{y_n\}$ as well. This leads to the following key definitions.
Let $\mathcal{Z}$ be the set of all sequences in $F$ which converge to 0; convergent sequences are Cauchy, so certainly $\mathcal{Z} \subset \mathcal{C}$. For two Cauchy sequences $a_\bullet, b_\bullet \in \mathcal{C}$, we put
$$a_\bullet \sim b_\bullet \iff a_\bullet - b_\bullet \in \mathcal{Z}.$$
In words, two Cauchy sequences are equivalent iff their difference converges to zero.
Exercise 16.17 (if you know abstract algebra): Show that $\mathcal{Z}$ is a maximal ideal in the commutative ring $\mathcal{C}$. Why is this an exciting sign that we're on the right track?
Exercise 16.18: Let $\{x_n\}$ be a Cauchy sequence in $F$, and let $\{x_{n_k}\}$ be any subsequence. Show that $\{x_n\} \sim \{x_{n_k}\}$.
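To make the termwise operations and the equivalence relation concrete, here is a small Python sketch over $F = \mathbb{Q}$. Sequences are modelled as functions from positive integers to `Fraction`; this representation, and the helper names `add`, `sub`, `mul`, and `tail_within`, are assumptions made purely for illustration. Note that `tail_within` only inspects finitely many terms, so it gives numerical evidence that a difference converges to 0 and is not a proof of equivalence.

```python
from fractions import Fraction


def add(x, y):
    """Termwise sum {x_n + y_n}."""
    return lambda n: x(n) + y(n)


def sub(x, y):
    """Termwise difference {x_n - y_n}."""
    return lambda n: x(n) - y(n)


def mul(x, y):
    """Termwise product {x_n * y_n}."""
    return lambda n: x(n) * y(n)


def tail_within(x, eps, start, stop):
    """Check |x_n| < eps for start <= n < stop (finite evidence only)."""
    return all(abs(x(n)) < eps for n in range(start, stop))


x = lambda n: Fraction(0) if n <= 2 else Fraction(1)   # (0, 0, 1, 1, 1, ...)
y = lambda n: Fraction(1)                              # (1, 1, 1, 1, ...)
eps = Fraction(1, 1000)

# The difference is eventually 0, so x and y should be identified ...
print(tail_within(sub(x, y), eps, 3, 200))
# ... even though the sequences are not equal term by term.
print(tail_within(sub(x, y), eps, 1, 200))
```

The same helpers show that a sequence such as $\{\frac{1}{n}\}$ is equivalent to the zero sequence although none of its terms is 0.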
Now we define $\mathcal{R}$ as $\mathcal{C}/\sim$, that is, the set of equivalence classes of Cauchy sequences. This will be the underlying set of our sequential completion. It remains to endow it with all the rest of the structure. The idea here is that when we pass to a quotient by an equivalence relation we can try to simply carry over the structure we already had, but at every step we must check that the operations are well-defined, meaning they are independent of the chosen representative of each equivalence class. At no point are these verifications difficult, but we admit they can be somewhat tedious.
^6 At UGA they are covered in the "transitional" Math 3200 course. The reader who has made it through most of this text will have no problem learning this concept.
Let us check that addition and multiplication induce well-defined operations on the set $\mathcal{R}$ of equivalence classes. This means: if we have four Cauchy sequences $a_\bullet, b_\bullet, c_\bullet, d_\bullet$ and $a_\bullet \sim c_\bullet$, $b_\bullet \sim d_\bullet$, then
$$a_\bullet + b_\bullet \sim c_\bullet + d_\bullet, \quad a_\bullet b_\bullet \sim c_\bullet d_\bullet.$$
All right: since $a_\bullet \sim c_\bullet$ and $b_\bullet \sim d_\bullet$, $a_\bullet - c_\bullet \to 0$ and $b_\bullet - d_\bullet \to 0$, so
$$(a_\bullet + b_\bullet) - (c_\bullet + d_\bullet) = (a_\bullet - c_\bullet) + (b_\bullet - d_\bullet) \to 0 + 0 = 0,$$
so $a_\bullet + b_\bullet \sim c_\bullet + d_\bullet$. Similarly,
$$a_\bullet b_\bullet - c_\bullet d_\bullet = (a_\bullet - c_\bullet)b_\bullet + (b_\bullet - d_\bullet)c_\bullet,$$
and this converges to 0 because $a_\bullet - c_\bullet, b_\bullet - d_\bullet \to 0$ and $b_\bullet, c_\bullet$ are bounded. Thus we have equipped our set $\mathcal{R}$ with two binary operations $+$ and $\cdot$.
Proposition 16.20. $(\mathcal{R}, +, \cdot)$ is a field.
Proof. Most of these axioms are completely straightforward (but, yes, somewhat tedious) to verify and are left to the reader as exercises. Let us single out:
(P3) The additive identity is $[0_\bullet]$, the class of the constant sequence 0.
(P7) The multiplicative identity is $[1_\bullet]$, the class of the constant sequence 1.
(P8) Suppose that $x \in \mathcal{R} \setminus \{[0_\bullet]\}$, and let $x_\bullet$ be any Cauchy sequence representing $x$. Then we must have $x_n \neq 0$ for all sufficiently large $n$: indeed, otherwise we would have $0_\bullet = (0, 0, 0, \ldots)$ as a subsequence, and if a subsequence of a Cauchy sequence converges to 0, then the Cauchy sequence itself converges to 0, contradiction. Suppose $x_n \neq 0$ for all $n > N$. Then define $y_\bullet$ by $y_n = 0$ for all $1 \le n \le N$ (or whatever you want: it doesn't matter) and $y_n = \frac{1}{x_n}$ for $n > N$. The sequence $y_\bullet$ is Cauchy: by Lemma 16.19 there is $\epsilon > 0$ with $|x_n| \ge \epsilon$ for all sufficiently large $n$, and then $|y_m - y_n| = \frac{|x_n - x_m|}{|x_m||x_n|} \le \frac{|x_m - x_n|}{\epsilon^2}$. Moreover $x_n y_n = 1$ for all $n > N$, so $x_\bullet y_\bullet$ differs from $1_\bullet$ by a sequence which is convergent to zero: $[x_\bullet][y_\bullet] = [1_\bullet] = 1$, so $y = [y_\bullet]$ is the multiplicative inverse of $x = [x_\bullet]$.
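The construction in (P8) can be mirrored directly in code. The Python sketch below again models sequences in $\mathbb{Q}$ as functions from positive integers to `Fraction` (an illustrative assumption, as is the name `inverse_representative`). Given a representative that is nonzero beyond index $N$, it returns the sequence $y_\bullet$ from the proof, with the arbitrary choice $y_n = 0$ for $n \le N$.

```python
from fractions import Fraction


def inverse_representative(x, N):
    """Given x with x(n) != 0 for all n > N, return y with x(n) * y(n) == 1 for n > N."""
    def y(n):
        # The first N terms are arbitrary; they do not affect the equivalence class.
        return Fraction(0) if n <= N else 1 / x(n)
    return y


# A hypothetical Cauchy sequence converging to 1 whose first two terms vanish.
x = lambda n: Fraction(0) if n <= 2 else 1 + Fraction(1, n)
y = inverse_representative(x, 2)

# The product agrees with the constant sequence 1 from the third term on.
print([x(n) * y(n) for n in range(1, 7)])
```

Changing the first $N$ terms of `y` to any other values yields an equivalent sequence, which is why the proof says the choice does not matter.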
We now equip $\mathcal{R}$ with an ordering. For $a_\bullet, b_\bullet \in \mathcal{C}$, we put $[a_\bullet] > [b_\bullet]$ if there is a positive element $\epsilon$ in $F$ such that $a_n - b_n \ge \epsilon$ for all sufficiently large $n$. We claim that this is well-defined independent of the representatives $a_\bullet$ and $b_\bullet$ chosen. Indeed, if $x_\bullet$ and $y_\bullet$ converge to zero, then for sufficiently large $n$ we have $|x_n - y_n| < \frac{\epsilon}{2}$ and then
Existence: We must show that putting $g(x) = y$ as above defines an ordered field homomorphism from $\mathcal{R}$ to $L$. If $x_1 = [a_n]$ and $x_2 = [b_n]$, let $y_1 = \lim_{n \to \infty} f(a_n)$ and $y_2 = \lim_{n \to \infty} f(b_n)$. Then $a_n + b_n \to x_1 + x_2$ and $f(a_n + b_n) = f(a_n) +$
Bibliography
[Ac00]
[A]
[B]
[Ba70]
[Ba22] équations intégrales. Fund. Math. 3 (1922), 133-181.
[Ba98] B. Banaschewski, On proving the existence of complete ordered fields. Amer. Math. Monthly 105 (1998), 548-551.
[Be06] A.F. Beardon, Contractions of the Real Line. Amer. Math. Monthly 113 (2006), 557-558.
[Be12] S.J. Bernstein, Démonstration du théorème de Weierstrass fondée sur le calcul des probabilités. Communications of the Kharkov Mathematical Society 13 (1912), 1-2.
[BM22] H. Bohr and J. Mollerup, Lærebog i matematisk Analyse, vol. 3, Jul. Gjellerups Forlag, Copenhagen, 1922.
[Bo71] R.P. Boas, Jr., Signs of Derivatives and Analytic Behavior. Amer. Math. Monthly 78 (1971), 1085-1093.
[BS] R.G. Bartle and D.R. Sherbert, Introduction to real analysis. Second edition. John Wiley & Sons, Inc., New York, 1992.
[C] H. Cartan, Elementary theory of analytic functions of one or several complex variables. Dover Publications, Inc., New York, 1995.
[Ca21] A.L. Cauchy, Analyse algébrique, 1821.
[Ca89] A.L. Cauchy, Sur la convergence des séries, in Oeuvres complètes Sér. 2, Vol. 7, Gauthier-Villars (1889), 267-279.
[Ch01] D.R. Chalice, How to Differentiate and Integrate Sequences. Amer. Math. Monthly 108 (2001), 911-921.
[CJ] R. Courant and F. John, Introduction to Calculus and Analysis.
[Cl10] P.L. Clark, Real induction. http://math.uga.edu/pete/realinduction.pdf
[Cl11] P.L. Clark, Induction and completeness in ordered sets. http://math.uga.edu/pete/induction completeness brief.pdf
[Co77] G.L. Cohen, Is Every Absolutely Convergent Series Convergent? The Mathematical Gazette 61 (1977), 204-213.
[CdC] K. Conrad, The contraction mapping theorem. http://www.math.uconn.edu/kconrad/blurbs/analysis/contractionshort.pdf
[CdD] K. Conrad, Estimating the size of a divergent sum. http://www.math.uconn.edu/kconrad/blurbs/analysis/sumest.pdf
[Co49] J.L. Coolidge, The story of the binomial theorem. Amer. Math. Monthly 56 (1949), 147-157.
[Cu65] F. Cunningham, Jr., Classroom Notes: The Two Fundamental Theorems of Calculus. Amer. Math. Monthly 72 (1965), 406-407.
[Da12] D. Daners, A Short Elementary Proof of $\sum 1/k^2 = \pi^2/6$. Math. Mag. 85 (2012), 361-364.
[DC] P.L. Clark, Discrete calculus. In preparation. Draft available on request.
[DR50] A. Dvoretzky and C.A. Rogers, Absolute and unconditional convergence in normed linear spaces. Proc. Nat. Acad. Sci. USA 36 (1950), 192-197.
[DS] Dirichlet series, notes by P.L. Clark, available at http://math.uga.edu/pete/4400dirichlet.pdf
[Ed62] M. Edelstein, On fixed and periodic points under contractive mappings. J. London Math. Soc. 37 (1962), 74-79.
[Er94] M. Erickson, An introduction to combinatorial existence theorems. Math. Mag. 67 (1994), 118-123.
[ES35] P. Erdős and G. Szekeres, A combinatorial problem in geometry. Compositio Math. 2 (1935), 463-470.
[FT] Field Theory, notes by P.L. Clark, available at http://www.math.uga.edu/pete/FieldTheory.pdf
[Go] R. Gordon, Real Analysis: A First Course. Second Edition, Addison-Wesley, 2001.
[Gr] P.M. Gruber, Convex and discrete geometry. Grundlehren der Mathematischen Wissenschaften 336. Springer, Berlin, 2007.
[Ha88] J. Hadamard, Sur le rayon de convergence des séries ordonnées suivant les puissances d'une variable. C. R. Acad. Sci. Paris 106 (1888), 259-262.
[Ha11] J.F. Hall, Completeness of Ordered Fields. 2011 arxiv preprint.
[Ha50] H.J. Hamilton, A type of variation on Newton's method. Amer. Math. Monthly 57 (1950), 517-522.
[H] G.H. Hardy, A course of pure mathematics. Centenary edition. Reprint of the tenth (1952) edition with a foreword by T. W. Körner. Cambridge University Press, Cambridge, 2008.
[Ha02] F. Hartmann, Investigating Possible Boundaries Between Convergence and Divergence. College Math. Journal 33 (2002), 405-406.
[Ha11] D. Hathaway, Using Continuity Induction. College Math. Journal 42 (2011), 229-231.
[HS] E. Hewitt and K. Stromberg, Real and abstract analysis. A modern treatment of the theory of functions of a real variable. Third printing. Graduate Texts in Mathematics, No. 25. Springer-Verlag, New York-Heidelberg, 1975.
[Ho66] A. Howard, Classroom Notes: On the Convergence of the Binomial Series. Amer. Math. Monthly 73 (1966), 760-761.
[Ka07] I. Kalantari, Induction over the continuum. Induction, algorithmic learning theory, and philosophy, 145-154, Log. Epistemol. Unity Sci., 9, Springer, Dordrecht, 2007.
[Ke70] H. Kestelman, Riemann Integration of Limit Functions. Amer. Math. Monthly 77 (1970), 182-187.
[Kn80] W.J. Knight, Functions with zero right derivatives are constant. Amer. Math. Monthly 87 (1980), 657-658.
[Kö91] T.W. Körner, Differentiable functions on the rationals. Bull. London Math. Soc. 23 (1991), 557-562.
[La] J. Labute, Math 255, Lecture 22: Power Series: The Binomial Series. http://www.math.mcgill.ca/labute/courses/255w03/L22.pdf
[L] S. Lang, Undergraduate analysis. Second edition. Undergraduate Texts in Mathematics. Springer-Verlag, New York, 1997.
[Li33] F.A. Lindemann, The Unique Factorization of a Positive Integer. Quart. J. Math. 4, 319-320, 1933.
[LV06] M. Longo and V. Valori, The Comparison Test: Not Just for Nonnegative Series. Math. Magazine 79 (2006), 205-210.
[Lu99] J. Lu, Is the Composite Function Integrable? Amer. Math. Monthly 106 (1999), 763-766.
[Ma56] A.M. Macbeath, A criterion for differentiability. Edinburgh Math. Notes (1956), 8-11.
[Ma42] C. Maclaurin, Treatise of fluxions, 1. Edinburgh (1742), 289-290.
[Ma61] Y. Matsuoka, An elementary proof of the formula $\sum_{k=1}^{\infty} 1/k^2 = \pi^2/6$. Amer. Math. Monthly 68 (1961), 485-487.
[MK09] M.M. Marjanović and Z. Kadelburg, Limits of composite functions. The Teaching of Mathematics, XII (2009), 1-6. http://elib.mi.sanu.ac.rs/files/journals/tm/22/tm1211.pdf
[Me72] F. Mertens, Ueber die Multiplicationsregel für zwei unendliche Reihen. Journal für die Reine und Angewandte Mathematik 79 (1874), 182-184.
[MS22] E.H. Moore and H.L. Smith, A General Theory of Limits. Amer. J. of Math. 44 (1922), 102-121.
[Mo50] R.K. Morley, Classroom Notes: The Remainder in Computing by Series. Amer. Math. Monthly 57 (1950), 550-551.
[Mo51] R.K. Morley, Classroom Notes: Further Note on the Remainder in Computing by Series. Amer. Math. Monthly 58 (1951), 410-412.
[Mo57] T.E. Mott, Newtons method and multiple roots. Amer. Math. Monthly 64 (1957), 635638.
[Mu63] A.A. Mullin, Recursive function theory (A modern look at a Euclidean idea). Bulletin of
the American Mathematical Society 69 (1963), 737.
[Ne03] Nelsen, R.B. An Impoved Remainder Estimate for Use with the Integral Test. College
Math. Journal 34 (2003), 397399.
[Ne81] D.J. Newman, Differentiation of asymptotic formulas. Amer. Math. Monthly 88 (1981),
526-527.
[No52] M.J. Norris, Integrability of continuous functions. Amer. Math. Monthly 59 (1952), 244-245.
[Ol27] L. Olivier, Journal für die Reine und Angewandte Mathematik 2 (1827), 34.
[PFD] P.L. Clark, Partial Fractions Via Linear Algebra,
http://www.math.uga.edu/pete/partialfractions.pdf.
[Ro63] K. Rogers, Classroom Notes: Unique Factorization. Amer. Math. Monthly 70 (1963),
547-548.
[R] W. Rudin, Principles of mathematical analysis. Third edition. International Series in
Pure and Applied Mathematics. McGraw-Hill, New York-Auckland-Düsseldorf, 1976.
[S] M. Schramm, Introduction to Real Analysis. Dover edition, 2008.
[SP88] D. Scott and D.R. Peeples, The Teaching of Mathematics: A Constructive Proof of the
Partial Fraction Decomposition. Amer. Math. Monthly 95 (1988), 651-653.
[Se59] A. Seidenberg, A simple proof of a theorem of Erdős and Szekeres. J. London Math. Soc.
34 (1959), 352.
[S] M. Spivak, Calculus. Fourth edition.
[St95] S.K. Stein, Do Estimates of an Integral Really Improve as n Increases? Math. Mag. 68
(1995), 16-26.
[St37] M.H. Stone, Applications of the theory of Boolean rings to general topology. Trans. Amer.
Math. Soc. 41 (1937), 375-481.
[St48] M.H. Stone, The generalized Weierstrass approximation theorem. Math. Mag. 21 (1948),
167-184.
[Str90] G. Strang, Sums and Differences vs. Integrals and Derivatives. College Math. Journal 21
(1990), 20-27.
[Ta55] A. Tarski, A lattice-theoretical fixpoint theorem and its applications. Pacific J. Math. 5
(1955), 285-309.
[T] J.W. Tukey, Convergence and Uniformity in Topology. Annals of Mathematics Studies,
no. 2. Princeton University Press, 1940.
[Ve02] D.J. Velleman, Partial fractions, binomial coefficients, and the integral of an odd power
of $\sec \theta$. Amer. Math. Monthly 109 (2002), 746-749.
[Wa48] H.S. Wall, A modification of Newton's method. Amer. Math. Monthly 55 (1948), 90-94.
[Wa95] J.A. Walsh, The Dynamics of Newton's Method for Cubic Polynomials. The College
Mathematics Journal 26 (1995), 22-28.
[Wa36] M. Ward, A Calculus of Sequences. Amer. J. Math. 58 (1936), 255-266.
[Ze34] E. Zermelo, Elementare Betrachtungen zur Theorie der Primzahlen. Nachr. Gesellsch.
Wissensch. Göttingen 1, 43-46, 1934.