FOUNDATIONS OF INFINITESIMAL CALCULUS

H. JEROME KEISLER Department of Mathematics University of Wisconsin, Madison, Wisconsin, USA [email protected]

June 4, 2011

This work is licensed under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/

Copyright © 2007 by H. Jerome Keisler

CONTENTS

Preface

Chapter 1. The Hyperreal Numbers
1A. Structure of the Hyperreal Numbers (§1.4, §1.5)
1B. Standard Parts (§1.6)
1C. Axioms for the Hyperreal Numbers (§Epilogue)
1D. Consequences of the Transfer Axiom
1E. Natural Extensions of Sets
1F. Appendix. Algebra of the Real Numbers
1G. Building the Hyperreal Numbers

Chapter 2. Differentiation
2A. Derivatives (§2.1, §2.2)
2B. Infinitesimal Microscopes and Infinite Telescopes
2C. Properties of Derivatives (§2.3, §2.4)
2D. Chain Rule (§2.6, §2.7)

Chapter 3. Continuous Functions
3A. Limits and Continuity (§3.3, §3.4)
3B. Hyperintegers (§3.8)
3C. Properties of Continuous Functions (§3.5–§3.8)

Chapter 4. Integration
4A. The Definite Integral (§4.1)
4B. Fundamental Theorem of Calculus (§4.2)
4C. Second Fundamental Theorem of Calculus (§4.2)

Chapter 5. Limits
5A. ε,δ Conditions for Limits (§5.8, §5.1)
5B. L’Hospital’s Rule (§5.2)

Chapter 6. Applications of the Integral
6A. Infinite Sum Theorem (§6.1, §6.2, §6.6)
6B. Lengths of Curves (§6.3, §6.4)
6C. Improper Integrals (§6.7)

Chapter 7. Trigonometric Functions
7A. Inverse Function Theorem (§7.3)
7B. Derivatives of Trigonometric Functions (§7.1, §7.2)
7C. Area in Polar Coordinates (§7.9)

Chapter 8. Exponential Functions
8A. Extending Continuous Functions
8B. The Functions a^x and log_b x (§8.1, §8.2)
8C. Derivatives of Exponential Functions (§8.3)

Chapter 9. Infinite Series
9A. Sequences (§9.1)
9B. Series (§9.2–§9.6)
9C. Taylor’s Formula and Higher Differentials (§9.10)

Chapter 10. Vectors
10A. Hyperreal Vectors (§10.8)
10B. Vector Functions (§10.6)

Chapter 11. Partial Differentiation
11A. Continuity in Two Variables (§11.1, §11.2)
11B. Partial Derivatives (§11.3, §11.4)
11C. Chain Rule and Implicit Functions (§11.5, §11.6)
11D. Maxima and Minima (§11.7)
11E. Second Partial Derivatives (§11.8)

Chapter 12. Multiple Integration
12A. Double Integrals (§12.1, §12.2)
12B. Infinite Sum Theorem for Two Variables (§12.3)
12C. Change of Variables in Double Integrals (§12.5)

Chapter 13. Vector Calculus
13A. Line Integrals (§13.2)
13B. Green’s Theorem (§13.3, §13.4)

Chapter 14. Differential Equations
14A. Existence of Solutions (§14.4)
14B. Uniqueness of Solutions (§14.4)
14C. An Example where Uniqueness Fails (§14.3)

Chapter 15. Logic and Superstructures
15A. The Elementary Extension Principle
15B. Superstructures
15C. Standard, Internal, and External Sets
15D. Bounded Ultrapowers
15E. Saturation and Uniqueness

References

Index

PREFACE

In 1960 Abraham Robinson (1918–1974) solved the three-hundred-year-old problem of giving a rigorous development of the calculus based on infinitesimals. Robinson’s achievement was one of the major mathematical advances of the twentieth century. This is an exposition of Robinson’s infinitesimal calculus at the advanced undergraduate level. It is entirely self-contained but is keyed to the 2000 digital edition of my first-year college text Elementary Calculus: An Infinitesimal Approach [Keisler 2000] and the second printed edition [Keisler 1986]. Elementary Calculus: An Infinitesimal Approach is available free online at www.math.wisc.edu/~Keisler/calc. This monograph can be used as a quick introduction to the subject for mathematicians, as background material for instructors using the book Elementary Calculus, or as a text for an undergraduate seminar.

This is a major revision of the first edition of Foundations of Infinitesimal Calculus [Keisler 1976], which was published as a companion to the first (1976) edition of Elementary Calculus, and has been out of print for over twenty years. A companion to the second (1986) edition of Elementary Calculus was never written. The biggest changes are:

(1) A new chapter on differential equations, keyed to the corresponding new chapter in Elementary Calculus.

(2) The axioms for the hyperreal number system are changed to match those in the later editions of Elementary Calculus.

(3) An account of the discovery of Kanovei and Shelah [KS 2004] that the hyperreal number system, like the real number system, can be built as an explicitly definable mathematical structure. Earlier constructions of the hyperreal number system depended on an arbitrarily chosen parameter such as an ultrafilter.

The basic concepts of the calculus were originally developed in the seventeenth and eighteenth centuries using the intuitive notion of an infinitesimal, culminating in the work of Gottfried Leibniz (1646–1716) and Isaac Newton (1643–1727). When the calculus was put on a rigorous basis in the nineteenth century, infinitesimals were rejected in favor of the ε,δ approach, because mathematicians had not yet discovered a correct treatment of infinitesimals. Since then generations of students have been taught that infinitesimals do not exist and should be avoided.


The actual situation, as suggested by Leibniz and carried out by Robinson, is that one can form the hyperreal number system by adding infinitesimals to the real number system, and obtain a powerful new tool in analysis. The reason Robinson’s discovery did not come sooner is that the axioms needed to describe the hyperreal numbers are of a kind which were unfamiliar to mathematicians until the mid-twentieth century. Robinson used methods from the branch of mathematical logic called model theory which developed in the 1950’s.

Robinson called his method nonstandard analysis because it uses a nonstandard model of analysis. The older name infinitesimal analysis is perhaps more appropriate. The method is surprisingly adaptable and has been applied to many areas of pure and applied mathematics. It is also used in such fields as economics and physics as a source of mathematical models. (See, for example, the books [AFHL 1986] and [ACH 1997].) However, the method is still seen as controversial, and is unfamiliar to most mathematicians.

The purpose of this monograph, and of the book Elementary Calculus, is to make infinitesimals more readily available to mathematicians and students. Infinitesimals provided the intuition for the original development of the calculus and should help students as they repeat this development.

The book Elementary Calculus treats infinitesimal calculus at the simplest possible level, and gives plausibility arguments instead of proofs of theorems whenever it is appropriate. This monograph presents the subject from a more advanced viewpoint and includes proofs of almost all of the theorems stated in Elementary Calculus. Chapters 1–14 in this monograph match the chapters in Elementary Calculus, and after each section heading the corresponding sections of Elementary Calculus are indicated in parentheses.

In Chapter 1 the hyperreal numbers are first introduced with a set of axioms and their algebraic structure is studied. Then in Section 1G the hyperreal numbers are built from the real numbers. This is an optional section which is more advanced than the rest of the chapter and is not used later. It is included for the reader who wants to see where the hyperreal numbers come from.

Chapters 2 through 14 contain a rigorous development of infinitesimal calculus based on the axioms in Chapter 1. The only prerequisites are the traditional three semesters of calculus and a certain amount of mathematical maturity. In particular, the material is presented without using notions from mathematical logic. We will use some elementary set-theoretic notation familiar to all mathematicians, for example the function concept and the symbols ∅, A ∪ B, {x ∈ A : P(x)}.

Frequently, standard results are given alternate proofs using infinitesimals. In some cases a standard result which is beyond the scope of beginning calculus is rephrased as a simpler infinitesimal result and used effectively in Elementary Calculus; some examples are the Infinite Sum Theorem, and the two-variable criterion for a global maximum.


The last chapter of this monograph, Chapter 15, is a bridge between the simple treatment of infinitesimal calculus given here and the more advanced subject of infinitesimal analysis found in the research literature. To go beyond infinitesimal calculus one should at least be familiar with some basic notions from logic and model theory. Chapter 15 introduces the concept of a nonstandard universe, explains the use of mathematical logic, superstructures, and internal and external sets, uses ultrapowers to build a nonstandard universe, and presents uniqueness theorems for the hyperreal number systems and nonstandard universes.

The simple set of axioms for the hyperreal number system given here (and in Elementary Calculus) makes it possible to present infinitesimal calculus at the college freshman level, avoiding concepts from mathematical logic. It is shown in Chapter 15 that these axioms are equivalent to Robinson’s approach.

For additional background in logic and model theory, the reader can consult the book [CK 1990]. Section 4.4 of that book gives further results on nonstandard universes. Additional background in infinitesimal analysis can be found in the book [Goldblatt 1991].

I thank my late colleague Jon Barwise, and Keith Stroyan of the University of Iowa, for valuable advice in preparing the First Edition of this monograph. In the thirty years between the first and the present edition, I have benefited from equally valuable and much appreciated advice from friends and colleagues too numerous to recount here.

CHAPTER 1

THE HYPERREAL NUMBERS

We will assume that the reader is familiar with the real number system and develop a new object, called a hyperreal number system. The deﬁnition of the real numbers and the basic existence and uniqueness theorems are brieﬂy outlined in Section 1F, near the end of this chapter. That section also explains some useful notions from modern algebra, such as a ring, a complete ordered ﬁeld, an ideal, and a homomorphism. If any of these terms are unfamiliar, you should read through Section 1F. We do not require any knowledge of modern algebra except for a modest vocabulary. In Sections 1A–1E we introduce axioms for the hyperreal numbers and obtain some ﬁrst consequences of the axioms. In the optional Section 1G at the end of this chapter we build a hyperreal number system as an ultrapower of the real number system. This proves that there exists a structure which satisﬁes the axioms. We conclude the chapter with the construction of Kanovei and Shelah [KS 2004] of a hyperreal number system which is deﬁnable in set theory. This shows that the hyperreal number system exists in the same sense that the real number system exists.

1A. Structure of the Hyperreal Numbers (§1.4, §1.5)

In this and the next section we assume only Axioms A, B, and C below.

Axiom A R is a complete ordered ﬁeld.

Axiom B R∗ is an ordered ﬁeld extension of R.

Axiom C R∗ has a positive infinitesimal, that is, an element ε such that 0 < ε and ε < r for every positive r ∈ R.

In Section 1C we will introduce two powerful additional axioms which are needed for our treatment of the calculus. However, the algebraic facts about infinitesimals which underlie the intuitive picture of the hyperreal line follow from Axioms A–C alone.

We call R the field of real numbers and R∗ the field of hyperreal numbers.

Definition 1.1. An element x ∈ R∗ is

• infinitesimal if |x| < r for all positive real r;
• finite if |x| < r for some real r;
• infinite if |x| > r for all real r.

Two elements x, y ∈ R∗ are said to be infinitely close, x ≈ y, if x − y is infinitesimal. (Thus x is infinitesimal if and only if x ≈ 0.)

Notice that a positive infinitesimal is hyperreal but not real, and that the only real infinitesimal is 0.

Definition 1.2. Given a hyperreal number x ∈ R∗, the monad of x is the set

monad(x) = {y ∈ R∗ : x ≈ y}.

The galaxy of x is the set

galaxy(x) = {y ∈ R∗ : x − y is finite}.

Thus monad(0) is the set of infinitesimals and galaxy(0) is the set of finite hyperreal numbers.

In Elementary Calculus, the pictorial device of an infinitesimal microscope is used to illustrate part of a monad, and an infinite telescope is used to illustrate part of an infinite galaxy. Figure 1 shows how the hyperreal line is drawn. In Section 2B we will give a rigorous treatment of infinitesimal microscopes and telescopes so the instructor can use them in new situations.

[Figure 1. The hyperreal line. Microscopes show monad(0), with the points −ε, 0, ε, and monad(1), with the points 1 − ε, 1, 1 + ε. Telescopes show galaxy(−H), with the points −H − 1, −H, −H + 1, and galaxy(H), with the points H − 1, H, H + 1. The finite part of the line is marked −2, −1, 0, 1, 2.]

We now describe the algebraic structure of R∗.

Theorem 1.3. The set galaxy(0) of finite elements is a subring of R∗, that is, sums, differences, and products of finite elements are finite.

Proof. Suppose x and y are finite, say |x| < r, |y| < s where r and s are real. Then |x + y| < r + s, |x − y| < r + s, |xy| < rs, so x + y, x − y, and xy are finite. ⊣

Corollary 1.4. Any two galaxies are either equal or disjoint.

Proof. For each x ∈ R∗, the galaxy of x is the coset of x modulo galaxy(0),

galaxy(x) = {x + a : a ∈ galaxy(0)}. ⊣

Theorem 1.5. The set monad(0) of infinitesimal elements is a subring of R∗ and an ideal in galaxy(0). That is:

(i) Sums, differences, and products of infinitesimals are infinitesimal.

(ii) The product of an infinitesimal and a finite element is infinitesimal.

Proof. Let ε, δ ≈ 0. For each positive real r, |ε| < r/2, |δ| < r/2, whence |ε + δ| < r, |ε − δ| < r. Hence ε + δ and ε − δ are infinitesimal. Let b be finite, say |b| < t, 1 ≤ t ∈ R. Then for any positive real r we have |ε| < r/t, |εb| < (r/t)t = r. Therefore εb is infinitesimal. ⊣

Corollary 1.6. Any two monads are equal or disjoint. The relation x ≈ y is an equivalence relation on R∗.

Proof. For each x ∈ R∗, monad(x) is the coset of x modulo monad(0),

monad(x) = {x + ε : ε ∈ monad(0)}.

From the definition of monad and x ≈ y we see that x ≈ y if and only if monad(x) = monad(y), so x ≈ y is an equivalence relation. ⊣

Theorem 1.7. (i) x is infinite if and only if x⁻¹ is infinitesimal.

(ii) monad(0) is a maximal ideal in galaxy(0). That is, there is no ideal I in galaxy(0) such that monad(0) ⊊ I ⊊ galaxy(0).

Proof. (i) The following are equivalent:

|x| ≥ r for all positive real r.
|x⁻¹| ≤ r⁻¹ for all positive real r.
x⁻¹ is infinitesimal.

(ii) Let I be an ideal containing monad(0) and let b ∈ I \ monad(0). By (i), b⁻¹ is not infinite since b = (b⁻¹)⁻¹ is not infinitesimal. Then b⁻¹ ∈ galaxy(0), so 1 = b·(b⁻¹) ∈ I. Then for any c ∈ galaxy(0), 1·c = c ∈ I, so I = galaxy(0). ⊣

Corollary 1.8. (i) There exist negative infinitesimals in R∗.

(ii) R∗ has positive and negative infinite elements.

Proof. Axiom C says that there exists a positive infinitesimal ε. The elements −ε, 1/ε, and −1/ε are respectively negative infinitesimal, positive infinite, and negative infinite. ⊣

Actually, one can go on to show that there are infinitely many infinitesimals and infinitely many galaxies in R∗. Moreover, each galaxy is partitioned into infinitely many monads. Each monad has a complicated structure; for example, the mapping H ↦ x + H⁻¹ maps the infinite elements of R∗ one-one onto monad(x) \ {x}.

We caution the reader that monad(0) is an ideal in galaxy(0) but is only a subring, not an ideal, in R∗. In other words, the product of an infinitesimal and an infinite element need not be infinitesimal. In fact, given a positive infinitesimal ε, we see that

ε² · 1/ε is infinitesimal,
ε · 1/ε is finite but not infinitesimal,
ε · 1/ε² is infinite.

This corresponds to the intuitive principle that 0·∞ is an “indeterminate form”.
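The three products above can be checked in a toy model. The sketch below is an illustration only and is not a construction of R∗: it assumes a hypothetical class `Mono` that represents a nonzero monomial c·ε^k with rational c and integer k, where ε stands for a fixed positive infinitesimal. Such an element is infinitesimal when k > 0, finite but not infinitesimal when k = 0, and infinite when k < 0.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Mono:
    """Hypothetical toy element c * eps**k, with c a nonzero rational, k an integer."""
    c: Fraction
    k: int

    def __mul__(self, other: "Mono") -> "Mono":
        # (c1 * eps**k1) * (c2 * eps**k2) = (c1 * c2) * eps**(k1 + k2)
        return Mono(self.c * other.c, self.k + other.k)

    def inverse(self) -> "Mono":
        # (c * eps**k)**(-1) = (1/c) * eps**(-k); c is nonzero by assumption
        return Mono(1 / self.c, -self.k)

    def kind(self) -> str:
        # Classification by the exponent of eps alone
        if self.k > 0:
            return "infinitesimal"
        if self.k < 0:
            return "infinite"
        return "finite, not infinitesimal"


eps = Mono(Fraction(1), 1)   # stands for a positive infinitesimal
eps2 = eps * eps             # eps squared

print((eps2 * eps.inverse()).kind())   # infinitesimal
print((eps * eps.inverse()).kind())    # finite, not infinitesimal
print((eps * eps2.inverse()).kind())   # infinite
```

The model only tracks the exponent of ε, which is exactly the information needed to see why a product of an infinitesimal and an infinite element can land in any of the three classes.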

1B. Standard Parts (§1.6)

We now state and prove the Standard Part Principle. This is the first place where we use the fact that the real numbers are complete. This principle was stated without proof in §1.6 of Elementary Calculus, but a proof was given in the Epilogue. It was often used in Elementary Calculus instead of the Completeness Property, which made some concepts more accessible to beginning students.

Theorem 1.9. (Standard Part Principle) Every finite x ∈ R∗ is infinitely close to a unique real number r ∈ R. That is, every finite monad contains a unique real number.

Proof. Let x ∈ R∗ be finite.

Uniqueness: Suppose r and s are real and r ≈ x, s ≈ x. Since ≈ is an equivalence relation we have r ≈ s, whence r − s ≈ 0. But r − s is real, so r − s = 0 and r = s.

Existence: Let X = {s ∈ R : s < x}. X is nonempty and has an upper bound because there is a positive real number r such that |x| < r, whence −r < x < r, so −r ∈ X and r is an upper bound of X. By Axiom A, R is a complete ordered field, so the set X has a least upper bound t. For every positive real r we have

x ≤ t + r, x − t ≤ r

and

t − r ≤ x, −(x − t) ≤ r.

It follows that x − t ≈ 0, so x ≈ t. ⊣

Definition 1.10. Given a finite x ∈ R∗, the unique real r ≈ x is called the standard part of x, in symbols r = st(x). If x is infinite, st(x) is undefined.

Corollary 1.11. Let x and y be finite.

(i) x ≈ y if and only if st(x) = st(y).

(ii) x ≈ st(x).

(iii) If r ∈ R then st(r) = r.

(iv) If x ≤ y then st(x) ≤ st(y).

Proof. (iv) We have

x = st(x) + ε, y = st(y) + δ

for some infinitesimal ε and δ. Assume x ≤ y. Then

st(x) + ε ≤ st(y) + δ, st(x) ≤ st(y) + (δ − ε).

For any positive real r, δ − ε < r, st(x) < st(y) + r, and therefore st(x) ≤ st(y). ⊣

The next theorem gives the basic algebraic rules for standard parts.

Theorem 1.12. The standard part function is a homomorphism of the ring galaxy(0) onto the field of real numbers. That is, for finite x and y,

(i) st(x + y) = st(x) + st(y),

(ii) st(x − y) = st(x) − st(y),

(iii) st(xy) = st(x)st(y).

Proof. Let x = r + ε, y = s + δ where r = st(x), s = st(y). Then ε and δ are infinitesimal.

(i) st(x + y) = st((r + ε) + (s + δ)) = st((r + s) + (ε + δ)) = r + s.

(ii) Similar to (i).

(iii) st(xy) = st((r + ε)(s + δ)) = st(rs + rδ + sε + εδ) = rs, since rδ + sε + εδ is infinitesimal.

Thus st(·) is a homomorphism of galaxy(0) into R. It is obviously onto R, because for r ∈ R, st(r) = r. ⊣

Corollary 1.13. Let x and y be finite.

(i) If st(y) ≠ 0 then st(x/y) = st(x)/st(y).

(ii) If x ≥ 0 and y = ⁿ√x then st(y) = ⁿ√(st(x)).

Proof. (i) This follows from the computation

st(x) = st((x/y)·y) = st(x/y)·st(y).

(ii) We have yⁿ = x and y ≥ 0. Taking standard parts,

st(x) = st(yⁿ) = (st(y))ⁿ,

and st(y) ≥ 0, so st(y) = ⁿ√(st(x)). ⊣
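As a worked illustration of Theorem 1.12(iii) and Corollary 1.13(i), take ε and δ to be infinitesimal. The particular numbers are chosen here for illustration and are not from the text. The hyperreal numbers 3 + ε, 2 − δ, 4 + ε, and 2 + δ are finite with standard parts 3, 2, 4, and 2, so the rules give:

```latex
\begin{align*}
\operatorname{st}\bigl((3+\varepsilon)(2-\delta)\bigr)
  &= \operatorname{st}(3+\varepsilon)\,\operatorname{st}(2-\delta)
   = 3\cdot 2 = 6,\\[4pt]
\operatorname{st}\!\left(\frac{4+\varepsilon}{2+\delta}\right)
  &= \frac{\operatorname{st}(4+\varepsilon)}{\operatorname{st}(2+\delta)}
   = \frac{4}{2} = 2
  \qquad\text{since } \operatorname{st}(2+\delta) = 2 \neq 0 .
\end{align*}
```

The quotient rule is applicable only because the standard part of the denominator is nonzero; for a quotient such as ε/δ the corollary gives no information.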

1C. Axioms for the Hyperreal Numbers (§Epilogue)

The properties of a hyperreal number system are given by five axioms. The first three of these axioms were stated in Section 1A. Before giving a precise statement of the remaining two axioms, we describe the intuitive idea. The real and hyperreal numbers are related by a ∗ mapping such that:

(1) With each relation X on R there is a corresponding relation X∗ on R∗, called the natural extension of X.

(2) The relations on R have the same “elementary properties” as their natural extensions on R∗.

The difficulty in making (2) precise is that we must explain exactly what an “elementary property” is. The properties “X ⊆ Y ”, “X is a function”, and “X is a symmetric binary relation” are elementary. On the other hand, the Archimedean Property and the Completeness Property must not be elementary, because no proper extension of R is an Archimedean or complete ordered field. In most expositions of the subject an “elementary property” is taken to be any sentence in first order logic. However, it is not appropriate to begin a calculus course by explaining first order logic to the students because they have not yet been exposed to the right sort of examples. It is better to learn calculus first, and at some later time use the ε,δ conditions from calculus as meaningful examples of sentences in first order logic.

Fortunately, the notion of a sentence of first order logic is not necessary at all in stating the axioms. It turns out that a simpler concept which is within the experience of beginning students is sufficient. This is the concept of a (finite) system of equations and inequalities. We shall see in Chapter 15 at the end of this monograph that we get equivalent sets of axioms using either the simple concept of a system of equations and inequalities or the more sophisticated concept of a sentence of first order logic.

The main objects of study in elementary calculus are partial functions of n real variables. Our plan is to have an axiom corresponding to (1) above but for partial functions instead of relations, and then an axiom corresponding to (2) above where an “elementary property” means a system of equations and inequalities in these functions. In the next few paragraphs we will explain exactly what is meant by a system of equations and inequalities.

Rⁿ denotes the set of all n-tuples of elements of R. A real function of n variables is a mapping f from a subset of Rⁿ into R. The letters x, y, z, ... are called variables. We think of them as varying over the set R of real numbers. A real number c ∈ R is also called a real constant. Expressions built up from variables and constants by applying real functions are called terms. For example,

x, c, x + c, f(x), g(c,x,f(y))


are terms. Here is a precise definition of a term. A term is an expression which can be built up using the following rules:

• Every variable is a term.
• Every real constant is a term.
• If τ1, ..., τn are terms and f is a real function of n variables, then f(τ1, ..., τn) is a term.

A term which contains no variables is called a constant term. A constant term is either undefined or has value equal to some real number. In particular, f(c) is undefined if c is not in the domain of f, and has value d if f(c) = d. The value of a constant term is computed step by step; thus the value of

g(f(5) + 3,f(f(2)))

is computed by first computing f(5), then f(5) + 3, then f(2), then f(f(2)), and then g(f(5) + 3, f(f(2))). If at any stage in the computation we reach an undefined part, the whole constant term is considered to be undefined. For example, since √(−6) is undefined in the reals, the constant terms √(−6) − √(−6) and √(−6)·√(−6) are also undefined in the reals. (If we were working over the complex numbers, these terms would be defined and have real values 0 and −6 respectively, but in this monograph we will always be working over the reals.)

By an equation we mean an expression σ = τ where σ and τ are terms. By an inequality we mean an expression of one of the forms σ ≤ τ, σ < τ, σ ≠ τ where σ and τ are terms. By a formula we mean an equation or inequality between two terms, or a statement of the form “τ is defined” or of the form “τ is undefined”. A system of formulas is a nonempty finite set of formulas.

The last notion we need is that of a solution of a system of formulas. Consider a system S of formulas whose variables are x1, ..., xn. By a real solution of S we mean an n-tuple (c1, ..., cn) of real constants such that when the xi are replaced by ci in S, each term within an equation or inequality in S is defined, and each formula in S is true.

The notion of a system of formulas is easily motivated in an elementary calculus course, because the first step in solving a “story problem” is to set the problem up as a system of formulas.

The statements of the forms “τ is defined” and “τ is undefined” are allowed for convenience, but they are not really needed because one can always use an equation to say that a term is defined or is undefined. To see this, let τ(x1, ..., xn) be a term with the variables x1, ..., xn. Now let g(x1, ..., xn) be the real function such that for each n-tuple of real constants (c1, ..., cn),

g(c1, ..., cn) = 1 if τ(c1, ..., cn) is defined,
g(c1, ..., cn) = 0 if τ(c1, ..., cn) is undefined.
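The step-by-step evaluation of constant terms described above, together with the indicator function g, can be sketched in code. This is a minimal sketch under stated assumptions and is not part of the monograph: `None` stands for “undefined”, the helper names `apply`, `sqrt`, `sub`, and `mul` are hypothetical, and for simplicity `g` takes the already computed value of the term rather than the tuple of constants.

```python
import math
from typing import Callable, Optional

Value = Optional[float]  # None models "undefined"


def apply(f: Callable[..., Value], *args: Value) -> Value:
    """Value of the term f(t1, ..., tn), given the values of t1, ..., tn.

    If any subterm is undefined, the whole term is undefined.
    """
    if any(a is None for a in args):
        return None
    return f(*args)


def sqrt(x: float) -> Value:
    # Real square root: a partial function, undefined for negative arguments.
    return math.sqrt(x) if x >= 0 else None


def sub(x: float, y: float) -> Value:
    return x - y


def mul(x: float, y: float) -> Value:
    return x * y


def g(value: Value) -> int:
    """Indicator from the text: 1 if the term is defined, 0 if it is undefined."""
    return 0 if value is None else 1


# sqrt(-6) - sqrt(-6) and sqrt(-6) * sqrt(-6) are undefined over the reals.
t1 = apply(sub, apply(sqrt, -6.0), apply(sqrt, -6.0))
t2 = apply(mul, apply(sqrt, -6.0), apply(sqrt, -6.0))
# sqrt(9) - sqrt(4) is defined, with value 3 - 2 = 1.
t3 = apply(sub, apply(sqrt, 9.0), apply(sqrt, 4.0))

print(t1, t2, t3)            # None None 1.0
print(g(t1), g(t2), g(t3))   # 0 0 1
```

Propagating `None` through `apply` mirrors the rule that an undefined part makes the whole constant term undefined, even when a naive cancellation such as √(−6) − √(−6) = 0 might suggest otherwise.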


Then τ(c1, ..., cn) is defined if and only if (c1, ..., cn) is a solution of the equation g(x1, ..., xn) = 1, and τ(c1, ..., cn) is undefined if and only if (c1, ..., cn) is a solution of the equation g(x1, ..., xn) = 0.

The following axioms describe a hyperreal number system as a triple (∗, R, R∗), where R is called the field of real numbers, R∗ the field of hyperreal numbers, and ∗ the natural extension mapping.

Axiom A R is a complete ordered ﬁeld.

Axiom B R∗ is an ordered ﬁeld extension of R.

Axiom C R∗ has a positive inﬁnitesimal.

Axiom D (Function Axiom) For each real function f of n variables there is a corresponding hyperreal function f∗ of n variables, called the natural extension of f. The ﬁeld operations of R∗ are the natural extensions of the ﬁeld operations of R.

By a hyperreal solution of a system of formulas S with the variables x1,... ,xn we mean an n-tuple (c1,... ,cn) of hyperreal numbers such that all the formulas in S are true when each function is replaced by its natural extension and each xi is replaced by ci.

Axiom E (Transfer Axiom) Given two systems of formulas S,T with the same variables, if every real solution of S is a solution of T, then every hyperreal solution of S is a solution of T.
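As a small worked illustration of how Axiom E is applied, consider the two systems below in the variables x, y. The systems are chosen here for illustration and are not taken from the text; sin denotes the real sine function, and the trivial equation x = x is included in T only so that S and T have the same variables.

```latex
\[
S = \{\, y = \sin x \,\}, \qquad
T = \{\, -1 \le y,\;\; y \le 1,\;\; x = x \,\}.
\]
\[
\text{Every real solution of } S \text{ is a solution of } T
\;\;\Longrightarrow\;\;
-1 \le \sin^{*}(x) \le 1 \ \text{ for every } x \in \mathbb{R}^{*}
\text{ at which } \sin^{*} \text{ is defined}.
\]
```

The left-hand statement is a familiar fact about real numbers; the right-hand statement follows from it by Transfer, because a hyperreal solution of S is exactly a pair (x, y) with y = sin∗(x).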

1D. Consequences of the Transfer Axiom

The Transfer Axiom E is much more powerful than it looks. We will postpone a full explanation of its scope to Chapter 15 at the end of this monograph, because this explanation requires notions from logic. In this section, we will give some easy but general consequences of the Transfer Axiom which will facilitate our development of the calculus.

Our first corollary shows that the Transfer Axiom still holds if T has fewer variables than S. If T has variables x1, ..., xk, we say that a tuple (c1, ..., ck, ..., cn) contains a solution of T if (c1, ..., ck) is a solution of T.


Corollary 1.14. Let S, T be two systems of formulas such that the variables in T form a subset of the variables in S. If every real solution of S contains a solution of T, then every hyperreal solution of S contains a solution of T.

Proof. Let the variables of S be x1, ..., xn. Form the system of formulas T′ by adding to T the trivial equations x1 = x1, ..., xn = xn. T′ has the same meaning as T but has the same variables as S. Every real solution of S is a solution of T′ because it contains a solution of T. By Transfer, every hyperreal solution of S is a hyperreal solution of T′, and thus contains a hyperreal solution of T. ⊣

Corollary 1.15. Any two systems of formulas with the same variables which have the same real solutions have the same hyperreal solutions. (This was called the Solution Axiom in the 1976 edition.)

Proof. Suppose S and T are systems of formulas with the same real solutions. Then every real solution of S is a solution of T. By Transfer, every hyperreal solution of S is a solution of T. Similarly, every hyperreal solution of T is a solution of S. ⊣

Often a real function f is defined by a rule of the form

f(x1, ..., xn) = y if and only if S

where S is a system of formulas with the same variables. If a real function f is defined by a rule of this kind, then by the above theorem, the natural extension f∗ is defined by the same rule applied to the hyperreal numbers. For example, the square root function on the reals is defined by the rule

√x = y if and only if {y² = x, 0 ≤ y}.

By Transfer, the natural extension of the square root function is defined by the same rule where x and y vary over the hyperreal numbers.
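For instance, the rule determines the value of the extended square root at points that are not real. In the illustration below, which is added here and is not from the text, ε is a positive infinitesimal; each pair (x, y) satisfies both formulas of the system, so y is the square root of x.

```latex
\[
\varepsilon\cdot\varepsilon = \varepsilon^{2},\quad 0 \le \varepsilon
\;\;\Longrightarrow\;\; \sqrt{\varepsilon^{2}} = \varepsilon,
\qquad\qquad
\frac{1}{\varepsilon}\cdot\frac{1}{\varepsilon} = \frac{1}{\varepsilon^{2}},\quad 0 \le \frac{1}{\varepsilon}
\;\;\Longrightarrow\;\; \sqrt{\frac{1}{\varepsilon^{2}}} = \frac{1}{\varepsilon}.
\]
```

The first value is infinitesimal and the second is infinite, so the extended function takes values outside R even though it is governed by the same two formulas as the real square root.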

Corollary 1.16. (i) If a system S of formulas is true for all real numbers, it is also true for all hyperreal numbers. (ii) If a system of formulas has no real solutions, it has no hyperreal solutions.

Proof. (i) Let S have the variables x1, ..., xn and let T be the system of equations x1 = x1, ..., xn = xn. Since S is true for all real numbers, S has the same real solutions as T. By Corollary 1.15, S has the same hyperreal solutions as T, and therefore S is true for all hyperreal numbers.

(ii) The proof is similar but uses the inequality xi ≠ xi instead of the equation xi = xi. ⊣

Corollary 1.17. Let f be a real function of n variables and let c1, ..., cn be real constants. If f(c1, ..., cn) is defined then f∗(c1, ..., cn) = f(c1, ..., cn).


Proof. Let f(c1, ..., cn) = c. The system of formulas f(c1, ..., cn) = c, x = x is true for all real numbers x. By Corollary 1.16 it is true for all hyperreal numbers. Therefore f∗(c1, ..., cn) = c. ⊣

From now on, we will usually drop the asterisk on the natural extension f∗ and write f(c1, ..., cn) for the value of f∗ at (c1, ..., cn), whether the ci’s are real or hyperreal. By Corollary 1.17, this will cause no trouble. Occasionally, we will still put in the asterisk if we wish to call attention to the fact that we are working with the natural extension rather than the original real function.

The axioms in Elementary Calculus were stated in a more leisurely way that avoided the phrase “ordered field”. Axioms A, C, D, and E are the same in Elementary Calculus as here. But instead of Axiom B, Elementary Calculus had the weaker axiom that R∗ is an extension of R and the relation <∗ satisfies the Transitive Law

a <∗ b and b <∗ c implies a <∗ c

and the Trichotomy Law, which says that for all a, b, c ∈ R∗, exactly one of

a <∗ b, a = b, b <∗ a

holds. We will now use the Transfer Axiom to show that the axioms in Elementary Calculus are equivalent to the present Axioms A–E.

Proposition 1.18. Assume Axioms A, C, D, E, and also that R∗ with the relation <∗ and the functions +, −, ·, ⁻¹ is an extension of R which satisfies the Trichotomy Law. Then R∗ is an ordered field, so Axiom B holds.

Proof. First, observe that the proof of Corollary 1.16 did not use Axiom B, so it follows from the remaining axioms. By definition, an ordered field is a structure with a relation ≤ and operations +, −, ·, ⁻¹ that satisfies the laws stated in Section 1F. Each of the laws for a field is an equation, except for the inequality 0 ≠ 1. Since R is a field by Axiom A, each of these laws is true for R. By Corollary 1.16, each of these laws is true for R∗, so R∗ is a field. Except for the Trichotomy Law, each of the order laws is of the form "if S then T" where S, T are systems of formulas. For example, the Sum Law says that if a < b and c = c then a + c < b + c. Since R is an ordered field, these laws are true for R. By Transfer, these laws are also true for R∗. By hypothesis, the Trichotomy Law is also true for R∗. Therefore R∗ is an ordered field. ⊣

Hereafter, we will usually leave out the stars on the hyperreal order relations <∗, ≤∗ in a system of formulas.

The next theorem extends the Transfer Axiom to the case where the system of formulas T has more variables than S. It will be used frequently in this monograph, and gives a way to show that a hyperreal number with a certain property exists.


Definition 1.19. Let T be a system of formulas with variables x1,... ,xk,... ,xn. A partial real solution of T is a k-tuple (c1,... ,ck) of real constants which can be extended to a real solution (c1,... ,ck,... ,cn) of T. A partial hyperreal solution is deﬁned similarly.

Theorem 1.20. (Partial Solution Theorem) Let S be a system of formulas with the variables x1, ..., xk and T a system of formulas with the variables x1, ..., xk, ..., xn. The following are equivalent.
(i) Every real solution of S is a partial real solution of T.
(ii) Every real solution of S is a partial hyperreal solution of T.
(iii) Every hyperreal solution of S is a partial hyperreal solution of T.

Proof. We prove (i) ⇒ (iii) ⇒ (ii) ⇒ (i). To simplify notation let S have the single variable x and T have the two variables x, y. Assume (i). For each real solution x0 of S choose a real number y0 = f(x0) such that (x0, y0) is a real solution of T. Then

Every real solution of S is a solution of "f(x) is defined". (1)

Every real solution of S ∪ {y = f(x)} is a solution of T. (2)

By Transfer, (1) and (2) also hold for the hyperreal numbers. Let x1 be a hyperreal solution of S. By (1), f(x1) is defined; let y1 = f(x1). By (2), (x1, y1) is a hyperreal solution of T. Thus x1 is a partial hyperreal solution of T. This shows that (i) implies (iii). (iii) trivially implies (ii). Assume (ii), and let x0 be a real solution of S. Suppose x0 is not a partial real solution of T. Let T(x0, y) be the system of formulas obtained by replacing the variable x by the constant x0 in T. Then T(x0, y) has no real solutions. By Corollary 1.16, T(x0, y) has no hyperreal solutions. But then x0 is not a partial hyperreal solution of T, contradicting our assumption (ii). We conclude that x0 is a partial real solution of T, so (ii) implies (i). ⊣

In the nonstandard approach to calculus, the Completeness Property of the real numbers is seldom used directly. It is always possible, and usually easier, to use the Standard Part Principle (Theorem 1.9) instead. This is explained by the following theorem, which shows that in the presence of the other axioms, the Completeness Property can be replaced by the Standard Part Principle and the Archimedean Property.

The set of natural numbers (non-negative integers) is denoted by N. An ordered field F is said to have the Archimedean Property if every element of F is less than some natural number. Equivalently, the set N of natural numbers has no upper bound in F.

Lemma 1.21. In any ordered field F, the set N of natural numbers does not have a least upper bound.

Proof. Suppose x is an upper bound of N. Then for any y ∈ N we have y + 1 ∈ N, so y + 1 ≤ x and hence y ≤ x − 1. Therefore x − 1 is also an upper bound of N. By the order laws we have x − 1 < x, so x cannot be a least upper bound of N. ⊣

Corollary 1.22. Every complete ordered field has the Archimedean Property.

Proof. In a complete ordered field, N cannot have an upper bound because, by the preceding lemma, N does not have a least upper bound. ⊣

Corollary 1.23. The ordered field R∗ of hyperreal numbers does not have the Archimedean Property.

Proof. By Corollary 1.8, R∗ has a positive infinite element H. By definition, H is an upper bound of R and N ⊆ R, so H is an upper bound of N. ⊣

Theorem 1.24. Assume Axioms B, C, D, E, the Standard Part Principle 1.9, and that R is an ordered field with the Archimedean Property. Then R has the Completeness Property, so Axiom A holds.

Proof. First observe that Axiom A was not used in the proof of the Partial Solution Theorem, so this theorem follows from the remaining Axioms B, C, D, and E. Let X be a nonempty subset of R with an upper bound. Let f be the function

f(y) = 1, if y is an upper bound of X,
f(y) = 0, otherwise.

We will find a real number b such that f(x) = 0 for all real x < b, and f(y) = 1 for all real y > b. Since X is nonempty and has an upper bound, there are real numbers a, c with

a < c, f(a) = 0, f(c) = 1.

Let t be any positive real number and consider the points

a, a + t, a + 2t, ..., a + nt, ... .

By the Archimedean Property there is an n such that (c − a)/t < n. Then c < a + nt. Since f(c) = 1, c is an upper bound of X. Then a + nt is an upper bound of X, so f(a + nt) = 1. Therefore there is a least positive integer n0 such that f(a + n0t) = 1. Hence

f(a + n0t − t) = 0, f(a + n0t) = 1.

Then c is an upper bound of X but a + n0t − t is not, so a − t ≤ a + n0t − t < c and thus a ≤ a + n0t < c + t.


Taking u = a + n0t, we see that every real solution of

0 < t (3)

is a partial real solution of

f(u − t) = 0, f(u) = 1, a ≤ u < c + t. (4)

Let t1 be a positive infinitesimal. Then t1 is a hyperreal solution of (3). By the Partial Solution Theorem, t1 is a partial hyperreal solution of (4). So there is a hyperreal number u1 with

f(u1 − t1) = 0, f(u1) = 1, a ≤ u1 < c + t1.

Then u1 is finite. By the Standard Part Principle, u1 has a standard part b = st(u1). We show that for any real x < b and y > b, f(x) = 0 and f(y) = 1. Every real solution of

x < z, f(z) = 0 (5)

contains a solution of

f(x) = 0. (6)

By Corollary 1.14, every hyperreal solution of (5) contains a solution of (6). Since f(u1 − t1) = 0, f(x1) = 0 for all hyperreal x1 < u1 − t1. In particular, if x is real and x < b, then x < u1 − t1, so f(x) = 0. Similarly, f(y) = 1 for all real y > b. It follows that b is the least upper bound of X. ⊣
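The grid argument in the proof of Theorem 1.24 can be imitated numerically. The following is only a sketch under stated assumptions: the set X = {x : x·x < 2}, the rational stand-in for real arithmetic, and the names is_upper_bound and grid_point are illustrative choices, not part of the text. For a real step t > 0 it finds the least positive integer n0 with f(a + n0t) = 1 and returns u = a + n0t. The proof takes t to be a positive infinitesimal; here t is merely a small positive rational, so the code only suggests how u closes in on the least upper bound.

```python
from fractions import Fraction

def is_upper_bound(y):
    # Assumed example set X = {x : x*x < 2}; y bounds X exactly when y >= 0 and y*y >= 2.
    # This plays the role of the condition f(y) = 1 in the proof.
    return y >= 0 and y * y >= 2

def grid_point(a, t):
    # Walk along a + t, a + 2t, ... and stop at the least positive integer n0
    # for which a + n0*t is an upper bound of X. Returns u = a + n0*t.
    n = 1
    while not is_upper_bound(a + n * t):
        n += 1
    return a + n * t

a = Fraction(0)  # f(a) = 0: the point 0 is not an upper bound of X
for k in range(1, 5):
    t = Fraction(1, 10 ** k)
    u = grid_point(a, t)
    # u - t is not an upper bound and u is one, so the least upper bound lies in (u - t, u].
    print(float(u - t), float(u))
```

As t shrinks, the printed pairs bracket the least upper bound of X ever more tightly, which mirrors the step in the proof where the standard part of u1 is taken.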

1E. Natural Extensions of Sets

Our axioms provide a natural extension f∗ of each real function f. We now define the natural extension of a set of reals. Later on we will extend our discussion to relations on the reals.

Definition 1.25. Let Y be a set of reals, Y ⊆ R, and let C_Y be the characteristic function of Y, defined by

C_Y(x) = 1 if x ∈ Y,
C_Y(x) = 0 if x ∉ Y.

The natural extension of Y is the set

Y∗ = {x ∈ R∗: C_Y(x) = 1}.

Proposition 1.26. For any set Y of reals, the natural extension Y∗ is the unique set of hyperreal numbers such that every system of formulas which has Y as its set of real solutions has Y∗ as its set of hyperreal solutions.


Proof. It is clear that there is at most one such set. Suppose a system of formulas S has Y as its set of real solutions. Then S has the same set of real solutions as the equation C_Y(x) = 1. By Corollary 1.15, S has the same set of hyperreal solutions as C_Y(x) = 1, which by definition is the set Y∗. ⊣

Given a term τ and a set Y ⊆ R, we will sometimes write τ ∈ Y for the formula C_Y(τ) = 1, and τ ∉ Y for the formula C_Y(τ) = 0.

Examples

Since [a, b] = {x ∈ R: a ≤ x, x ≤ b}, we have [a, b]∗ = {x ∈ R∗: a ≤ x, x ≤ b}.
Since (a, b) = {x ∈ R: a < x, x < b}, we have (a, b)∗ = {x ∈ R∗: a < x, x < b}.

Proposition 1.27. Let X, Y ⊆ R.
(i) X ⊆ X∗ and X∗ ∩ R = X.
(ii) The natural extension mapping preserves Boolean operations, i.e.,
(X ∩ Y)∗ = X∗ ∩ Y∗, (X ∪ Y)∗ = X∗ ∪ Y∗, (X \ Y)∗ = X∗ \ Y∗,
X ⊆ Y if and only if X∗ ⊆ Y∗.
(iii) For any real function f of one variable,
(domain(f))∗ = domain(f∗), (range(f))∗ = range(f∗).

Proof. We prove (iii). Let X be the domain of f. Then X is the set of all real solutions of the formula "f(x) is defined". By Proposition 1.26, X∗ is the set of all hyperreal solutions of this formula, which is the domain of f∗. Let Y be the range of f. For each y ∈ Y choose a real number x = g(y) such that f(x) = y. Then Y is the set of all real solutions of the equation

f(g(y)) = y, (7)

so by Proposition 1.26, Y∗ is the set of all hyperreal solutions of (7). It follows that Y∗ ⊆ range(f∗). Moreover, every real solution of the equation f(x) = y contains a solution of (7). By Corollary 1.14, every hyperreal solution of the equation f(x) = y contains a solution of (7). Therefore range(f∗) ⊆ Y∗. ⊣


We will now explain the connection between hyperreal numbers and topological properties of sets of reals. By a real neighborhood of a real point x ∈ R we mean an open interval of the form (x − r, x + r) where r is a positive real number. A point x belongs to the interior of Y if some neighborhood of x is included in Y. An open set is a set which is equal to its interior. A point x belongs to the closure of Y if every neighborhood of x meets Y. A closed set is a set Y which is equal to its closure.

Theorem 1.28. Let c ∈ R and Y ⊆ R. Y includes a neighborhood of c (i.e., c is in the interior of Y) if and only if Y∗ includes the monad of c.

Proof. Suppose Y contains a neighborhood (c − r, c + r) of c. Then every real solution of |c − x| < r belongs to Y. By Transfer, every hyperreal solution belongs to Y∗, and thus the monad of c is included in Y∗. Now suppose Y does not contain a neighborhood of c. Then each real solution of

x > 0 (8)

is a partial real solution of

y ∉ Y, |c − y| < x. (9)

Let x1 be a positive infinitesimal. Then x1 is a hyperreal solution of (8). By the Partial Solution Theorem 1.20, x1 is a partial hyperreal solution of (9). Thus there is a hyperreal y1 with

y1 ∉ Y∗, |c − y1| < x1.

Then y1 belongs to the monad of c but not to Y∗. ⊣

Corollary 1.29. The closure of a set Y ⊆ R of reals is equal to the set

{st(x): x is finite and x ∈ Y∗}.

Thus Y is closed if and only if st(x) ∈ Y for all finite x ∈ Y∗.

Proof. Let Z = R \ Y, so Z∗ = R∗ \ Y∗. The following are equivalent:
c ∈ closure of Y.
c ∉ interior of Z.
The monad of c is not included in Z∗.
There is an x ∈ Y∗ such that x ≈ c.
c = st(x) for some finite x ∈ Y∗. ⊣

Corollary 1.30. A real function f is defined at every point of some neighborhood of c if and only if f∗ is defined at every point of the monad of c.

Proof. By Proposition 1.27 (iii) and Theorem 1.28. ⊣

A set Y ⊆ R is bounded if Y is included in some closed real interval [a, b].

Theorem 1.31. Let Y ⊆ R. Then Y is bounded if and only if every element of Y∗ is finite.

Proof. If Y is bounded, Y ⊆ [a, b], then every element y ∈ Y is a solution of

a ≤ y, y ≤ b. (10)

By Transfer, every y1 ∈ Y∗ is a solution of (10), and hence is finite. Now suppose Y is not bounded. Then either Y has no upper bound or no lower bound, say Y has no upper bound. Thus each real number x is a partial real solution of

x < y, y ∈ Y. (11)

Let x1 be a positive infinite hyperreal number. By the Partial Solution Theorem, x1 is a partial hyperreal solution of (11), so there is a y1 such that

x1 < y1, y1 ∈ Y∗.

Thus y1 is a positive infinite element of Y∗. ⊣

We now extend our discussion to relations on the reals. The plane, or real plane, is the set R² of ordered pairs of real numbers, and the hyperreal plane is the set (R∗)² of ordered pairs of hyperreal numbers. In n variables we have the real n-space Rⁿ and the hyperreal n-space (R∗)ⁿ. By a real relation in n variables we mean a subset of the real n-space Rⁿ.

Definition 1.32. Let Y be a real relation in n variables. The characteristic function of Y is the function C_Y defined by

C_Y(x1, ..., xn) = 1 if (x1, ..., xn) ∈ Y,
C_Y(x1, ..., xn) = 0 if (x1, ..., xn) ∉ Y.

The natural extension of Y is the hyperreal relation

Y∗ = {x ∈ (R∗)ⁿ: C_Y(x) = 1}.

Proposition 1.26 has the following analogue for relations.

Proposition 1.33. Let Y be a real relation in n variables. The natural extension Y∗ of Y is the unique subset Y∗ ⊆ (R∗)ⁿ such that every system of formulas which has Y as its set of real solutions has Y∗ as its set of hyperreal solutions.

Definition 1.34. The distance between two points x = (x1, ..., xn), y = (y1, ..., yn) of Rⁿ or (R∗)ⁿ is defined by

|x − y| = √((x1 − y1)² + ··· + (xn − yn)²).

A real neighborhood of a point x ∈ Rⁿ is a set of the form

N_r(x) = {y ∈ Rⁿ: |x − y| < r}

where 0 < r ∈ R. A point y is infinitely close to x, in symbols y ≈ x, if |x − y| is infinitesimal. The monad of x is defined as the set

monad(x) = {y ∈ (R∗)ⁿ: x ≈ y}.

A point x is finite if each xi is finite.

With the above deﬁnitions, all the results in this section hold for subsets of the real and hyperreal n-spaces.


1F. Appendix. Algebra of the Real Numbers

This appendix is a summary of some basic notions from algebra which leads up to the characterization of the real number system (Theorem 1.38). The details, and the proofs of the theorems stated in this appendix, can be found in most undergraduate texts on modern algebra. Following the normal mathematical practice, we work within Zermelo-Fraenkel set theory.

(Commutative) Ring: A ring is a structure (R, 0, +, −, ·) such that 0 ∈ R, + and · are binary functions on R, − is a unary function on R, and the following laws hold for all a, b, c ∈ R.

Commutative Laws    a + b = b + a, a·b = b·a
Associative Laws    a + (b + c) = (a + b) + c, a·(b·c) = (a·b)·c
Identity Law        a + 0 = a
Inverse Law         a + (−a) = 0
Distributive Law    a·(b + c) = a·b + a·c

We write a − b for a + (−b).

Subring: A subring of a ring R is a subset S of R which contains 0 and is closed under +, −, ·. It follows that S itself is a ring.

Ideal: An ideal in a ring R is a subring I of R such that whenever a ∈ I and r ∈ R, a·r ∈ I.

Coset: Given a ring R and a subring S, the coset of an element r ∈ R modulo S is the set coset(r) = {r + s: s ∈ S}.

Equivalence Relation: An equivalence relation on a set A is a binary relation ≡ on A such that for all a, b, c ∈ A,

Reflexive Law     a ≡ a
Symmetry Law      a ≡ b implies b ≡ a
Transitive Law    a ≡ b and b ≡ c implies a ≡ c

Proposition 1.35. If S is a subring of a ring R, then the relation a − b ∈ S is an equivalence relation on R. Moreover,
If a − b ∈ S then coset(a) = coset(b),
If a − b ∉ S then coset(a) ∩ coset(b) = ∅.

Homomorphism: A homomorphism from a ring R into a ring S is a function h: R → S such that for all a, b ∈ R,

h(0) = 0, h(a + b) = h(a) + h(b), h(−a) = −h(a), h(a·b) = h(a)·h(b).

Isomorphism: An isomorphism from a ring R to a ring S is a homomorphism h: R → S such that h maps R one to one onto S.

Field: A field is a structure (F, 0, 1, +, −, ·, ⁻¹) such that (F, 0, +, −, ·) is a ring, 1 ∈ F, ⁻¹ is a function from F \ {0} into F, and the following laws hold for all a ≠ 0 in F:

Nontriviality    1 ≠ 0
Identity Law     1·a = a
Inverse Law      a·(a⁻¹) = 1

Subfield: A subfield of a field F is a subset G of F which contains 0, 1 and is closed under the functions +, −, ·, ⁻¹. It follows that G itself is a field.

Field Extension: Given a field G, a field extension of G is a field F such that G is a subfield of F. A proper field extension of G is a field extension F such that F ≠ G.

Ordered Field: An ordered field is a field F with a binary relation < such that the following laws hold for all a, b, c ∈ F.

Transitive Law     If a < b and b < c then a < c.
Trichotomy Law     Exactly one of the relations a < b, a = b, b < a holds.
Sum Law            If a < b and c = c then a + c < b + c.
Product Law        If a < b and 0 < c then a·c < b·c.

Examples

Z, the ring of integers.
Q, the ordered field of rational numbers.
R, the ordered field of real numbers.
C, the field of complex numbers.
GF(2), the field with exactly two elements 0, 1, where 1 + 1 = 0.

Z is a subring of Q, Q is a subfield of every ordered field (including R), and R is a subfield of C. The set of even integers is an ideal in Z. The mapping h: Z → GF(2) given by

h(n) = 0 if n is even,
h(n) = 1 if n is odd

is a homomorphism of Z onto GF(2).

As usual, ab means a·b, a/b means a·(b⁻¹), and a ≤ b means that either a < b or a = b. The absolute value of a is defined by

|a| = a if 0 ≤ a,
|a| = −a if a < 0.
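The claim that the parity mapping h: Z → GF(2) is a homomorphism can be spot-checked on a finite sample of integers. The sketch below is an illustration only: the helper names gf2_add, gf2_neg, gf2_mul and is_homomorphism_on are assumptions introduced here, and a finite check does not replace the (easy) proof.

```python
def h(n):
    # Parity map: 0 for even n, 1 for odd n.
    # Python's % returns 0 or 1 for every integer n, including negative ones.
    return n % 2

def gf2_add(x, y):
    # Addition in GF(2); note 1 + 1 = 0.
    return (x + y) % 2

def gf2_neg(x):
    # In GF(2) every element is its own additive inverse.
    return x

def gf2_mul(x, y):
    # Multiplication in GF(2).
    return x * y

def is_homomorphism_on(sample):
    # Check the four homomorphism laws for all pairs drawn from the sample.
    sample = list(sample)
    return h(0) == 0 and all(
        h(a + b) == gf2_add(h(a), h(b))
        and h(-a) == gf2_neg(h(a))
        and h(a * b) == gf2_mul(h(a), h(b))
        for a in sample
        for b in sample
    )

print(is_homomorphism_on(range(-10, 11)))
```

The map is onto GF(2) because h(0) = 0 and h(1) = 1.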


Proposition 1.36. The following algebraic rules hold in every ordered field.

a·(−b) = (−a)·b = −(a·b)
a·0 = 0, 0 < 1
−(−a) = a, (a⁻¹)⁻¹ = a if a ≠ 0
−(a − b) = b − a, (a/b)⁻¹ = b/a if a, b ≠ 0
|−a| = |a|, |a·b| = |a|·|b|
If a < b then −b < −a.
If 0 < a < b then 0 < b⁻¹ < a⁻¹.

Proposition 1.37. In an ordered field, if b, d ≠ 0 then

a/b + c/d = (a·d + b·c)/(b·d), (a/b)·(c/d) = (a·c)/(b·d).

Complete Ordered Field: An ordered field F is complete if every nonempty subset X ⊆ F which has an upper bound in F has a least upper bound in F.
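The two fraction rules of Proposition 1.37 can be spot-checked in the ordered field Q using exact rational arithmetic. This is a sketch only: the function name check_fraction_rules and the sample values are assumptions made for illustration, and the check covers only the listed sample, not every ordered field.

```python
from fractions import Fraction
from itertools import product

def check_fraction_rules(values):
    # Verify a/b + c/d = (a*d + b*c)/(b*d) and (a/b)*(c/d) = (a*c)/(b*d)
    # for every choice of a, b, c, d from the sample with b and d nonzero.
    for a, b, c, d in product(values, repeat=4):
        if b == 0 or d == 0:
            continue
        if a / b + c / d != (a * d + b * c) / (b * d):
            return False
        if (a / b) * (c / d) != (a * c) / (b * d):
            return False
    return True

# A small sample of rationals, including zero and negative values.
values = [Fraction(n, m) for n in range(-2, 3) for m in (1, 3)]
print(check_fraction_rules(values))
```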

Order Isomorphic: Two ordered ﬁelds F,G are order isomorphic if there is an isomorphism h from F onto G such that for any a,b ∈ F, a < b if and only if h(a) < h(b).

Theorem 1.38. There is a complete ordered ﬁeld, and any two complete ordered ﬁelds are order isomorphic.

This is an important theorem which shows that there is exactly one complete ordered ﬁeld up to order isomorphism. The complete ordered ﬁeld is called the ﬁeld R of real numbers. The theorem has two parts, existence and uniqueness. The uniqueness part, that any two complete ordered ﬁelds are order isomorphic, is easy to prove. Given two complete ordered ﬁelds F and G, one may assume that F and G have the same subﬁelds of rational numbers. Then the order isomorphism is the mapping h such that h(x) = y if and only if for every rational number q, q < x if and only if q < y. It also follows that this isomorphism is unique. There are several well-known ways to prove the existence part, that there exists a complete ordered ﬁeld. Each of these proofs shows somewhat more; it gives what is called a deﬁnable complete ordered ﬁeld. For beginning calculus students, this is usually done informally in pre-calculus courses, where the real numbers are constructed by taking a positive real number to be a natural number followed by a decimal point and an inﬁnite sequence of decimal digits which does not end in a sequence of 9’s. In more advanced courses this is done more carefully in other ways, such as constructing the real numbers as the set of equivalence classes of Cauchy sequences of rationals. We now turn to the natural numbers. The existence of the set of natural numbers, and the Principle of Induction, are part of the underlying set theory. We identify natural numbers with elements of an ordered ﬁeld in the usual way.

Natural Number: The set N = {0, 1, 2, ...} of natural numbers in an ordered field F is the smallest subset X of F such that 0 ∈ X, and x ∈ X implies x + 1 ∈ X. Thus N is the set of all elements of F formed by adding 1 to itself zero or more times.

Principle of Induction: If 0 ∈ Y, and n ∈ Y implies n + 1 ∈ Y, then N ⊆ Y.

Corollary 1.39. Every nonempty subset of N has a least element.

Archimedean Property: An ordered ﬁeld F has the Archimedean Property if every element x ∈ F is less than some natural number n ∈N. We saw in Section 1D that R has the Archimedean Property but R∗ does not. The next theorem generalizes these facts.

Theorem 1.40. An ordered ﬁeld G has the Archimedean property if and only if it is order isomorphic to a subﬁeld of R.

Since we often refer to intervals in the real line, we review the deﬁnition and notation here. Interval: A real interval is a set I of real numbers such that if a,b ∈ I and a < c < b, then c ∈ I.

It follows from the Completeness Property that every real interval is of one of the types below:

Bounded closed intervals: [a, b] = {x: a ≤ x ≤ b}.
Bounded open intervals: (a, b) = {x: a < x < b}.
Bounded half-open intervals: [a, b), (a, b].
Unbounded open intervals: (a, ∞) = {x: a < x}, (−∞, b) = {x: x < b}, (−∞, ∞) = R.
Unbounded half-open intervals: [a, ∞), (−∞, b].


1G. Building the Hyperreal Numbers

This is an optional section for the reader who wants to see where the hyperreal numbers come from. It will not be needed in the body of the monograph, until Chapter 15 at the end (which is also optional). The existence and uniqueness theorem for complete ordered ﬁelds, Theorem 1.38, has an analogue for the hyperreal number systems. Moreover, as discovered by Kanovei and Shelah [KS 2004], there are deﬁnable hyperreal number systems, just as there are deﬁnable complete ordered ﬁelds. In practice, this does not matter for the calculus course, where one just uses the axioms, but it is important for the foundations of the subject. In this section we will ﬁrst build a hyperreal number system the easy way, using what is called an ultrapower. This does not give a deﬁnable object, because it depends on an arbitrary choice of an ultraﬁlter. We will then describe a more elaborate method, the iterated ultrapower, which does give a deﬁnable hyperreal number system. The uniqueness theorem for hyperreal number systems will be postponed until the last chapter, Chapter 15. The reason for this is that to have uniqueness one needs one more axiom in addition to Axioms A–E for the hyperreal numbers, called the Saturation Axiom. Saturation has an appeal similar to completeness, and is important in more advanced applications of hyperreal numbers. But the Saturation Axiom is not needed at the beginning calculus level, and so it was not included in our present list. For this reason we say “a hyperreal number system” rather than “the hyperreal system”.

The Ultrapower. We will now build a hyperreal number system as an ultrapower of the real number system. This will prove that there exists a triple (∗, R, R∗) which satisfies Axioms A–E. We will then be able to conclude that any statement about the real numbers which follows from the axioms is true of the real numbers. The hyperreal numbers can be regarded as a tool which facilitates the study of the real numbers.

Historically, ultrapowers were first applied to the natural numbers by Skolem [Skolem 1934]. Hewitt [Hewitt 1948] studied ultrapowers of the real number field, and the ultrapower was applied to arbitrary structures by Łoś in [Łoś 1955]. Since then ultrapowers have had a variety of applications in several areas of mathematics.

We will use a form of the Axiom of Choice called Zorn's Lemma. A nonempty set X of sets is called a chain if for any two sets x, y ∈ X, either x ⊆ y or y ⊆ x.

Zorn’s Lemma. Let Y be a nonempty set of sets such that for any chain X ⊆ Y , the union of X belongs to Y . Then Y has a maximal element y, that is, a set y ∈ Y such that no member of Y properly contains y.


We begin with the deﬁnition of an ultraﬁlter over an inﬁnite set I. We call I the index set.

Definition 1.41. A filter U over I is a set of subsets of I such that:
(i) U is closed under supersets; if X ∈ U and X ⊆ Y ⊆ I then Y ∈ U.
(ii) U is closed under finite intersections; if X ∈ U and Y ∈ U then X ∩ Y ∈ U.
(iii) I ∈ U but ∅ ∉ U.
An ultrafilter over I is a filter U over I with the additional property that for each X ⊆ I, exactly one of the sets X, I \ X belongs to U. A free ultrafilter is an ultrafilter U such that no finite set belongs to U.
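The conditions of Definition 1.41 can be tested mechanically when the index set is finite. The sketch below is purely illustrative and rests on assumptions that differ from the text: it uses a finite set I = {0, 1, 2}, whereas the construction in this section needs an infinite index set, and on a finite set every ultrafilter is principal (it consists of all sets containing one fixed point), so none is free. The names powerset, is_filter and is_ultrafilter are introduced here for the example.

```python
from itertools import combinations

def powerset(I):
    # All subsets of I, as frozensets so they can be members of a set.
    items = list(I)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def is_filter(U, I):
    I = frozenset(I)
    subsets = powerset(I)
    # (i) closed under supersets within I
    supersets_ok = all(Y in U for X in U for Y in subsets if X <= Y)
    # (ii) closed under pairwise (hence finite) intersections
    intersections_ok = all((X & Y) in U for X in U for Y in U)
    # (iii) I belongs to U but the empty set does not
    return supersets_ok and intersections_ok and I in U and frozenset() not in U

def is_ultrafilter(U, I):
    I = frozenset(I)
    # Exactly one of X and its complement belongs to U, for every X.
    return is_filter(U, I) and all((X in U) != ((I - X) in U) for X in powerset(I))

I = {0, 1, 2}
principal = {X for X in powerset(I) if 0 in X}  # all subsets containing the point 0
only_whole = {frozenset(I)}                     # the smallest filter, {I}

print(is_filter(principal, I), is_ultrafilter(principal, I))
print(is_filter(only_whole, I), is_ultrafilter(only_whole, I))
```

The second example is a filter that fails to be an ultrafilter, since neither {0} nor its complement belongs to it.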

Theorem 1.42. For every infinite set I, there exists a free ultrafilter over I.

Proof. The set of all cofinite (complements of finite) subsets of I is a filter over I (called the Fréchet filter). Let A be the set of all filters F over I such that F contains all cofinite subsets of I. Then A is nonempty and A is closed under unions of chains. By Zorn's Lemma, A has a maximal element U (in fact, infinitely many maximal elements). U is a filter and contains no finite set, because U contains all cofinite sets but ∅ ∉ U. To show that U is an ultrafilter, we consider an arbitrary set X ⊆ I and prove that there is a filter V ⊇ U which contains either X or I \ X, so by maximality, X ∈ U or I \ X ∈ U.

Case 1: For all Y ∈ U, X ∩ Y is infinite. X and each Y ∈ U belong to the set

V = {Z ⊆ I: Z ⊇ X ∩ Y for some Y ∈ U}.

V is a filter over I, because V is obviously closed under supersets and finite intersections, and the hypothesis of Case 1 guarantees that each Z ∈ V is infinite.

Case 2: For some Y ∈ U, X ∩ Y is finite. Then for every W ∈ U, (I \ X) ∩ W is infinite, for otherwise Y ∩ W ∈ U would be finite. Case 1 applies to I \ X, so the set

V = {Z ⊆ I: Z ⊇ (I \ X) ∩ Y for some Y ∈ U}

is a filter over I such that V ⊇ U, I \ X ∈ V. ⊣

Hereafter we let U be a free ultrafilter over I. Let R^I be the set of all functions a: I → R. The elements a ∈ R^I will be called I-sequences, and we write a_i for the value of a at an element i ∈ I.

Definition 1.43. Two I-sequences a, b in R^I are said to be U-equivalent, in symbols a =_U b, if {i: a_i = b_i} ∈ U.

Lemma 1.44. The relation =_U is an equivalence relation on the set R^I.


Proof. The Reflexive and Symmetric Laws for =_U are obvious. We prove the Transitive Law. Assume a =_U b and b =_U c. Let

X = {i: a_i = b_i}, Y = {i: b_i = c_i}, Z = {i: a_i = c_i}.

Then X ∈ U and Y ∈ U, so X ∩ Y ∈ U. But X ∩ Y ⊆ Z, so Z ∈ U, and hence a =_U c. ⊣

Our next step is to define the ultrapower ∏_U R, which will be the set R∗ of hyperreal numbers built from U. The idea is to take the set of all U-equivalence classes of I-sequences and modify it by replacing the U-equivalence class of a constant I-sequence by the constant itself. This makes ∏_U R an extension of R.

Definition 1.45. Let a be an I-sequence. If a is U-equivalent to a constant I-sequence ⟨r, r, ...⟩ where r ∈ R, we define a_U = r. Otherwise, we define a_U to be the U-equivalence class of a,

a_U = {b: a =_U b}.

The ultrapower of the set R modulo U is the set

∏_U R = {a_U : a ∈ R^I}.

Lemma 1.46. (i) R ⊆ ∏_U R. (ii) a =_U b if and only if a_U = b_U.

Proof. (i) We have ⟨r, r, ...⟩_U = r ∈ R∗ for each r ∈ R. (ii) This follows from the fact that =_U is an equivalence relation. ⊣

Definition 1.47. The natural extension of the order relation < is the relation <∗ on ∏_U R such that for all a, b ∈ R^I,

a_U <∗ b_U if and only if {i: a_i < b_i} ∈ U.

Lemma 1.48. The relation <∗ is well-defined, that is, if a =_U c and b =_U d then

{i: a_i < b_i} ∈ U if and only if {i: c_i < d_i} ∈ U.

Proof. Suppose {i: a_i < b_i} ∈ U. Then

{i: c_i < d_i} ⊇ {i: a_i < b_i} ∩ {i: a_i = c_i} ∩ {i: b_i = d_i}.

The right side belongs to U, so the left side belongs to U, as required. ⊣

Definition 1.49. An element x ∈ ∏_U R is positive infinite if n <∗ x for every natural number n.

Definition 1.50. A set I is countable if there is a one to one function a from I onto N.

Lemma 1.51. Suppose the index set I is countable. Then there are positive infinite elements in ∏_U R.

Proof. Let a be a one to one function from I onto N. Then a_U ∈ ∏_U R. However, for each n ∈ N, the set {i: n < a_i} is cofinite hence belongs to U. Therefore n <∗ a_U, so a_U is positive infinite. ⊣

Definition 1.52. We use vector notation for n-tuples in the obvious way. Let f be a real function of n variables. The natural extension of f is the function f∗ of n variables on ∏_U R such that whenever a_1, ..., a_n, c ∈ R^I, we have

f∗(a⃗_U) = c_U if and only if {i: f(a⃗_i) = c_i} ∈ U.

Lemma 1.53. For each real function f of n variables, the natural extension f∗ is well-defined. That is, whenever a⃗ =_U b⃗ and c =_U d we have

{i: f(a⃗_i) = c_i} ∈ U if and only if {i: f(b⃗_i) = d_i} ∈ U.

The proof is similar to that of Lemma 1.48.

Definition 1.54. The hyperreal number system built from the ultrafilter U is the structure (∗, R, R∗) where R∗ = ∏_U R, <∗ is the natural extension of <, and f∗ is the natural extension of f for each real function f.

Theorem 1.55. For each free ultrafilter U on a countable index set I, the hyperreal number system (∗, R, R∗) built from U satisfies Axioms A–E.

Proof. Axiom A, that R is a complete ordered field, is satisfied by definition. Axiom D, the Function Axiom, follows from Lemmas 1.48 and 1.53, which show that <∗ and f∗ are well-defined.

We now prove the Transfer Axiom E. From the definition of the natural extensions <∗ and f∗, a tuple x⃗ = a⃗_U of hyperreal numbers is a solution of an equation or inequality S if and only if

{i: a⃗_i is a solution of S} ∈ U.

Since X ∩ Y ∈ U if and only if X ∈ U and Y ∈ U, it follows that this also holds for each finite system of formulas S. Suppose every real solution of a system of formulas S is a solution of T, and let x⃗ = a⃗_U be a hyperreal solution of S. Then

{i: a⃗_i is a solution of S} ⊆ {i: a⃗_i is a solution of T} ∈ U,

so x⃗ is a solution of T. This proves the Transfer Axiom.

Axiom B says that R∗ is an ordered field extension of R. By definition, the set R∗ is an extension of the set R, and it follows from I ∈ U that <∗ is an extension of < and that for each real function f, f∗ is an extension of f. Each of the ordered field axioms except for the Trichotomy Law is a statement saying that every real solution of some system of formulas S is a solution of T, and hence follows from the Transfer Axiom. For the Trichotomy Law, let


x = a_U and y = b_U. It is easy to see from the definition of an ultrafilter that exactly one of the sets

{i: a_i < b_i}, {i: a_i = b_i}, {i: b_i < a_i}

belongs to U. Therefore exactly one of

x <∗ y, x = y, y <∗ x

holds, as required. Finally Axiom C, that R∗ has a positive inﬁnitesimal, follows from Lemma 1.51 and the fact that in a hyperreal ordered ﬁeld reciprocals of positive inﬁnite elements are positive inﬁnitesimals. a

Examples of Inﬁnitesimals. When we build a hyperreal number system as an ultrapower, we can take the index set I to be any countable set. Let us now take I to be the set of natural numbers N. The elements of R∗\R are now U-equivalence classes of sequences of reals, and we can give explicit examples of sequences of reals whose equivalence classes are hyperreal numbers with various properties.

Positive infinitesimals:

⟨1, 1/2, 1/3, 1/4, ..., 1/(n + 1), ...⟩_U
⟨1, 1/2, 1/4, 1/8, ..., 2^(−n), ...⟩_U

Infinite hyperintegers:

⟨1, 2, 3, 4, ..., n, ...⟩_U
⟨1, 2, 6, 24, ..., n!, ...⟩_U

Elements of the monad of π:

⟨3, 3.1, 3.14, ..., [10^n π]/10^n, ...⟩_U
⟨π − 1, π − 1/2, π − 1/3, ..., π − 1/n, ...⟩_U

The next theorem verifies these assertions.

Theorem 1.56. Let ⟨a_1, a_2, a_3, ...⟩ be a sequence of real numbers and let r ∈ R.
(i) ⟨a_1, a_2, a_3, ...⟩_U ≈ r for every free ultrafilter U over N if and only if lim_{n→∞} a_n = r.
(ii) ⟨a_1, a_2, a_3, ...⟩_U is positive infinite for every free ultrafilter U over N if and only if lim_{n→∞} a_n = ∞.


Proof. We prove (i). First assume that lim_{n→∞} a_n = r. Then for each positive real ε, we have |a_n − r| < ε for all but finitely many n ∈ N, and hence

{n ∈ N: |a_n − r| < ε} ∈ U.

Then |a_U −∗ r|∗ <∗ ε. Since ε is arbitrary, we have a_U ≈ r. Now suppose it is not the case that lim_{n→∞} a_n = r. By the ε, δ condition 5.9 there is a real ε > 0 such that the set

X = {n ∈ N: |a_n − r| ≥ ε}

is infinite. Using the proof of Theorem 1.42 one can show that there is a free ultrafilter U over N such that X ∈ U. Then |a_U −∗ r|∗ ≥∗ ε, so a_U ≉ r. ⊣

Here are some examples of sequences a such that the behavior of a_U in R∗ depends on the ultrafilter U. The equivalence class

⟨1, −1, 1, −1, ..., (−1)^n, ...⟩_U

is equal to 1 if {n: n is even} ∈ U, and is equal to −1 if {n: n is odd} ∈ U. The equivalence class

⟨1, 1/2, 3, 1, 1/5, 6, 1, 1/7, 8, ...⟩_U

is either one, infinitesimal, or infinite, depending on which congruence class modulo 3 belongs to U. For each r ∈ [−1, 1] there is a free ultrafilter U over N such that

⟨sin 0, sin 1, sin 2, ..., sin n, ...⟩_U ≈ r.

The following theorem can be used to verify these examples. Its proof is similar to the proof of Theorem 1.56.

Theorem 1.57. Let a: N → R and r ∈ R.
(i) ⟨a_0, a_1, a_2, ...⟩_U ≈ r for some free ultrafilter U over N if and only if a has a subsequence converging to r.
(ii) ⟨a_0, a_1, a_2, ...⟩_U is positive infinite for some free ultrafilter U over N if and only if a has a subsequence diverging to ∞.
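The proof of Theorem 1.56 turns on whether the exceptional set {n ∈ N: |a_n − r| ≥ ε} is finite or infinite: a finite set never belongs to a free ultrafilter, while an infinite set belongs to some free ultrafilter. The sketch below lists the exceptional indices below a finite horizon for two of the sequences discussed above. It is an illustration under stated assumptions: the function name exceptions and the horizon parameter are introduced here, and a finite horizon can only suggest, not prove, that an exceptional set is finite or infinite.

```python
from fractions import Fraction

def exceptions(a, r, eps, horizon):
    # Indices n < horizon at which a(n) fails to be within eps of r.
    return [n for n in range(horizon) if abs(a(n) - r) >= eps]

def reciprocal(n):
    # The sequence 1, 1/2, 1/3, ..., 1/(n + 1), ...
    return Fraction(1, n + 1)

def alternating(n):
    # The sequence 1, -1, 1, -1, ..., (-1)^n, ...
    return (-1) ** n

# For 1/(n + 1) and r = 0 the exceptions stop early, matching convergence to 0.
small = exceptions(reciprocal, 0, Fraction(1, 100), 1000)
print(len(small), max(small))

# For (-1)^n and r = 1 every odd index is an exception, so the set keeps growing.
odd = exceptions(alternating, 1, Fraction(1, 2), 20)
print(odd)
```

In the second case the exceptional set is the set of odd numbers, in line with the remark that the class of the alternating sequence equals −1 when {n: n is odd} ∈ U.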

A Definable Hyperreal Number System. The ultrapower of the real number system produces a hyperreal number system $(*,\mathbb{R},\mathbb{R}^*)$ which satisfies Axioms A–E, but which depends on a free ultrafilter $U$ over a countable index set $I$.

We will next show how to modify the ultrapower to get a hyperreal number system which satisfies Axioms A–E and is definable in set theory. Our purpose here is only to explain why there is a definable hyperreal system. The details will not be needed in the rest of this monograph. For this reason, we will skip the proofs of some lemmas along the way.


First, we give some comments about the notion of being definable. In set theory, we say that a set $X$ is definable by a first order formula $\theta(v)$ if we can prove that $X$ is the unique set such that $\theta(X)$ holds. For example, the sets $\mathbb{N}$ of natural numbers, $\mathcal{P}(\mathbb{N})$ of sets of natural numbers, and $\mathbb{R}$ of equivalence classes of Cauchy sequences of rationals, are definable. A definable structure, such as an ordered field or a hyperreal number system, can be thought of as a structure that can be described explicitly by a formula. By contrast, the ultrapower construction gives us a nonempty class of isomorphic structures, each depending on an ultrafilter $U$.

We remark that definable sets can have elements which are not definable. In fact, there must be real numbers $r\in\mathbb{R}$ and sets of natural numbers $X\in\mathcal{P}(\mathbb{N})$ which are not definable, because there are only countably many statements in the language of set theory, but the sets $\mathbb{R}$ and $\mathcal{P}(\mathbb{N})$ have uncountably many elements. It should be no surprise that something similar happens when we build a definable hyperreal number system. The set of all free ultrafilters over the set $\mathbb{N}$ is definable, but some (and possibly every) free ultrafilter over $\mathbb{N}$ is not definable.

We are going to build a big but definable hyperreal number system from the set of all free ultrafilters over $\mathbb{N}$. The idea will be to amalgamate all ultrapowers of $\mathbb{R}$ with index set $\mathbb{N}$ together into one large structure. The starting point is a product operation on finite sequences of ultrafilters, which can be used to amalgamate finitely many ultrapowers.

Let $U, V$ be ultrafilters over index sets $I, J$. The product $U\otimes V$ is the set
\[ U\otimes V = \{Z\subseteq I\times J : \{j : \{i : \langle i,j\rangle\in Z\}\in U\}\in V\}. \]
Warning: in general, $U\otimes V$ will be different from $V\otimes U$. The finite product $U_1\otimes\cdots\otimes U_n$ is defined inductively by
\[ U_1\otimes\cdots\otimes U_n = (U_1\otimes\cdots\otimes U_{n-1})\otimes U_n. \]

Lemma 1.58. Given free ultrafilters $U_1,\ldots,U_n$ over index sets $I_1,\ldots,I_n$, the product $U_1\otimes\cdots\otimes U_n$ is a free ultrafilter over $I_1\times\cdots\times I_n$.

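The shape of the product definition can be checked on a toy case. The sketch below is an illustration under a strong simplifying assumption: it uses principal ultrafilters over small finite index sets, which are not free, because a free ultrafilter cannot be written down explicitly. The function names are hypothetical.

```python
from itertools import chain, combinations

def principal(point):
    """Membership test of the principal ultrafilter generated by `point`."""
    return lambda subset: point in subset

def in_product(in_U, in_V, I, J, Z):
    """Return True when Z, a set of pairs (i, j), belongs to the product of U and V.

    Follows the definition literally: collect the indices j whose slice
    {i : (i, j) in Z} lies in U, then ask whether that set of j lies in V.
    """
    good_j = {j for j in J if in_U({i for i in I if (i, j) in Z})}
    return in_V(good_j)

def all_subsets(items):
    """All subsets of `items`, each returned as a tuple."""
    items = list(items)
    return chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))

I, J = {0, 1, 2}, {0, 1}
U, V = principal(1), principal(0)
pairs = [(i, j) for i in I for j in J]

# In this toy case, membership in the product reduces to containing the pair (1, 0).
agrees = all(in_product(U, V, I, J, set(Z)) == ((1, 0) in Z) for Z in all_subsets(pairs))
```

Because both factors are principal here, this toy case says nothing about the warning that the product of free ultrafilters need not commute.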
The next ingredient is a definable function which maps a linearly ordered set onto the set of all free ultrafilters over $\mathbb{N}$. This is the key idea that was introduced in the paper of Kanovei and Shelah [KS 2004]. To get this function we need ordinal numbers in the sense of von Neumann. These are defined in such a way that each ordinal number is equal to the set of all smaller ordinal numbers. Let $c$ be the least ordinal number whose cardinality is the continuum $2^{\aleph_0}$, that is, the least ordinal number which can be mapped onto the set $\mathcal{P}(\mathbb{N})$. We define $A$ to be the set of all functions $a\colon c\to\mathcal{P}(\mathbb{N})$ such that $\operatorname{range}(a)$ is a free ultrafilter $U_a$ over $\mathbb{N}$. The set $A$ is nonempty because, by Theorem 1.42, free ultrafilters over $\mathbb{N}$ exist.

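For $X, Y\in\mathcal{P}(\mathbb{N})$, we write $X <_p Y$ if and only if
\[ \sum_{n\in X} 3^{-n} < \sum_{n\in Y} 3^{-n}. \]
We define the lexicographic order $<_A$ on $A$ as follows. For $a,b\in A$, $a <_A b$ if and only if $a(\alpha) <_p b(\alpha)$ where $\alpha$ is the least ordinal such that $a(\alpha)\ne b(\alpha)$.

Lemma 1.59. $<_p$ is a linear ordering of $\mathcal{P}(\mathbb{N})$, $<_A$ is a linear ordering of $A$, and $\{U_a : a\in A\}$ is the set of all free ultrafilters over $\mathbb{N}$.

We now build a whole definable family of hyperreal number systems, one for each nonempty finite subset $\sigma$ of $A$. Arrange $\sigma$ in increasing order, $\sigma = \{a_1 <_A \cdots <_A a_n\}$. Let $U_\sigma = U_{a_1}\otimes\cdots\otimes U_{a_n}$ be the product of the corresponding ultrafilters, and let $\mathbb{R}^{(\sigma)}$ be the ultrapower of $\mathbb{R}$ modulo $U_\sigma$. By Theorem 1.55, the ultrapower yields a hyperreal number system $((\sigma),\mathbb{R},\mathbb{R}^{(\sigma)})$ with an order relation $<^{(\sigma)}$ and a natural extension $f^{(\sigma)}$ for each real function $f$. We also put $\mathbb{R}^{(\emptyset)} = \mathbb{R}$ where $\emptyset$ is the empty set.

For each pair of finite subsets $\sigma\subseteq\tau$ of $A$, there is a natural embedding $h_{\sigma\tau}\colon\mathbb{R}^{(\sigma)}\to\mathbb{R}^{(\tau)}$. To illustrate, we give the definition of $h_{\sigma\tau}$ in the case that $\sigma = \{a_1 <_A a_3\}$ and $\tau = \{a_1 <_A a_2 <_A a_3\}$. $U_\sigma$ is an ultrafilter over $\mathbb{N}\times\mathbb{N}$ and $U_\tau$ is an ultrafilter over $\mathbb{N}\times\mathbb{N}\times\mathbb{N}$. Given an element $x_{U_\sigma}\in\mathbb{R}^{(\sigma)}$ where $x\colon\mathbb{N}\times\mathbb{N}\to\mathbb{R}$, $h_{\sigma\tau}(x_{U_\sigma})$ is the element $y_{U_\tau}\in\mathbb{R}^{(\tau)}$ such that $y(n_1,n_2,n_3) = x(n_1,n_3)$. In particular, $h_{\sigma\sigma}$ is the identity map on $\mathbb{R}^{(\sigma)}$.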
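The order on sets of natural numbers defined through sums of powers of 1/3 can be tried out on finite sets, where the sums are exact rationals. This is a minimal sketch under that restriction (infinite sets would need a series, which is not attempted here). The names `weight`, `less_p`, and `less_A` are hypothetical, and `less_A` compares finite tuples of sets as a stand-in for functions defined on an ordinal.

```python
from fractions import Fraction
from itertools import combinations

def weight(X):
    """Exact value of the sum of 3**(-n) over n in the finite set X."""
    return sum((Fraction(1, 3 ** n) for n in X), Fraction(0))

def less_p(X, Y):
    """X <_p Y for finite sets of natural numbers."""
    return weight(X) < weight(Y)

def less_A(a, b):
    """Lexicographic comparison of equal-length tuples of finite sets via <_p."""
    for X, Y in zip(a, b):
        if set(X) != set(Y):
            return less_p(X, Y)   # decided at the first place where a and b differ
    return False                  # a and b are equal

# All 32 subsets of {0, 1, 2, 3, 4} receive different weights, so they are
# totally ordered by <_p.
subsets = [set(c) for k in range(6) for c in combinations(range(5), k)]
distinct_weights = len({weight(X) for X in subsets})
```

Distinct finite sets get distinct weights because the digits 0 and 1 determine a terminating base 3 expansion uniquely.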

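Lemma 1.60. Let $\tau$ be a finite subset of $A$.
(i) If $\sigma\subseteq\rho\subseteq\tau$ then $h_{\sigma\tau}$ is the composition $h_{\sigma\tau}(x) = h_{\rho\tau}(h_{\sigma\rho}(x))$.
(ii) For each $x\in\mathbb{R}^{(\tau)}$, there is a unique smallest subset $\sigma\subseteq\tau$ such that $x$ is in the range of $h_{\sigma\tau}$.
(iii) If $\sigma\subseteq\tau$ then for each system of formulas $S$ and tuple $\vec{x}$ in $\mathbb{R}^{(\sigma)}$, $\vec{x}$ is a solution of $S$ in $\mathbb{R}^{(\sigma)}$ if and only if $h_{\sigma\tau}(\vec{x})$ is a solution of $S$ in $\mathbb{R}^{(\tau)}$.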

The proofs of Lemmas 1.58 and 1.60 can be found in Section 6.5 of the book [CK 1990].

With the above lemmas, we can now amalgamate the hyperreal fields $\mathbb{R}^{(\sigma)}$ into one large hyperreal field $\mathbb{R}^{\bullet}$. The intuitive idea is to identify each element $x\in\mathbb{R}^{(\sigma)}$ with its image $h_{\sigma\tau}(x)\in\mathbb{R}^{(\tau)}$, and then take $\mathbb{R}^{\bullet}$ to be the union of the sets $\mathbb{R}^{(\sigma)}$. Formally, we can carry out this idea by introducing, for each finite $\sigma\subseteq A$ and $x\in\mathbb{R}^{(\sigma)}$, a new object called the thread of $x$, defined by
\[ h_\sigma(x) = \{(\rho,y) : \rho\subseteq\sigma,\ h_{\rho\sigma}(y) = x\}\cup\{(\tau,y) : \sigma\subseteq\tau,\ h_{\sigma\tau}(x) = y\}. \]
One can easily check the following.

Lemma 1.61. If $\tau$ is a finite subset of $A$ and $\sigma\subseteq\tau$, then $h_\sigma$ is the composition $h_\sigma(x) = h_\tau(h_{\sigma\tau}(x))$.


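We then define $\mathbb{R}^{\bullet}$ to be the set of all threads,
\[ \mathbb{R}^{\bullet} = \{h_\sigma(x) : \sigma\subseteq A \text{ and } x\in\mathbb{R}^{(\sigma)}\}, \]
and define the natural extensions $<^{\bullet}$ and $f^{\bullet}$ in the obvious way on $\mathbb{R}^{\bullet}$. This gives the desired result.

Theorem 1.62. The hyperreal structure $(\bullet,\mathbb{R},\mathbb{R}^{\bullet})$ is definable and satisfies Axioms A–E.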

CHAPTER 2

DIFFERENTIATION

Permanent Assumption. Throughout this monograph it will be understood that $f, g, \ldots$ denote real functions.

We will use the hyperreal numbers as a tool to define the notions of limit, derivative, continuity, and integral for real functions. Section 2B contains a rigorous treatment of infinitesimal microscopes and telescopes suggested by Keith Stroyan [Stroyan 1997]. The remaining material in this chapter is also given in Elementary Calculus but is repeated here to make this monograph complete.

2A. Derivatives (§2.1, §2.2)

Definition 2.1. A real number $S$ is said to be the slope of a real function $f$ at a real point $a$ if
\[ S = \operatorname{st}\left(\frac{f(a+\Delta x)-f(a)}{\Delta x}\right) \]
for every nonzero infinitesimal $\Delta x$.

The derivative of a real function $f$ is the real function $f'$ such that: $f'(x)$ is the slope of $f$ at $x$ if it exists, and $f'(x)$ is undefined otherwise.

We will show in Chapter 5 that this deﬁnition is equivalent to the standard deﬁnition of derivative. We say that f is diﬀerentiable at a if the slope of f at a exists. Here are some easy consequences of the deﬁnition.

Corollary 2.2. $f$ is differentiable at a real number $a$ if and only if
(a) $f(x)$ is defined for all $x\approx a$, and
(b) the quotient $(f(a+\Delta x)-f(a))/\Delta x$ is finite and has the same standard part for all nonzero $\Delta x\approx 0$.

Corollary 2.3. If $f$ is differentiable at a real point $a$, then $f(x)$ is defined for all real $x$ in some neighborhood of $a$.

Proof. By part (a) above and Corollary 1.30. ⊣
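For polynomial functions the slope in Definition 2.1 can be computed purely algebraically. The sketch below is an illustration, not part of the monograph's development: it treats the infinitesimal increment as a formal symbol `dx`, expands $f(a + dx)$ as a polynomial in `dx`, divides the increment by `dx`, and takes the standard part by keeping the term free of `dx` (every positive power of an infinitesimal is infinitesimal). All function names are hypothetical.

```python
from fractions import Fraction

def poly_add(p, q):
    """Add two polynomials in dx given as coefficient lists (lowest degree first)."""
    n = max(len(p), len(q))
    p = p + [Fraction(0)] * (n - len(p))
    q = q + [Fraction(0)] * (n - len(q))
    return [s + t for s, t in zip(p, q)]

def poly_mul(p, q):
    """Multiply two polynomials in dx."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, s in enumerate(p):
        for j, t in enumerate(q):
            out[i + j] += s * t
    return out

def expand_shifted(coeffs, a):
    """Expand f(a + dx) in powers of dx, where f(x) = sum of coeffs[k] * x**k."""
    result = [Fraction(0)]
    power = [Fraction(1)]                  # (a + dx)**0
    shift = [Fraction(a), Fraction(1)]     # a + dx
    for c in coeffs:
        result = poly_add(result, [Fraction(c) * t for t in power])
        power = poly_mul(power, shift)
    return result

def slope(coeffs, a):
    """Standard part of (f(a + dx) - f(a)) / dx for the polynomial f."""
    expanded = expand_shifted(coeffs, a)   # constant term equals f(a)
    quotient = expanded[1:]                # subtract f(a), then divide by dx
    return quotient[0] if quotient else Fraction(0)
```

For example, `slope([0, 0, 0, 1], 2)` works with $f(x) = x^3$ at $a = 2$, where $(2+dx)^3 = 8 + 12\,dx + 6\,dx^2 + dx^3$.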


To take full advantage of the Leibniz notation we use independent and dependent variables. We first make these notions precise. If we are given a system of formulas which has the same solution set as the simple equation $y = f(x)$, we say that $y$ is a function of $x$, or that $y$ depends on $x$, and we call $x$ the independent variable and $y$ the dependent variable. There can also be more than one independent variable.

When $y$ depends on $x$, we introduce one new independent variable $\Delta x$ and two new dependent variables $\Delta y$ and $dy$. $\Delta y$ is called the increment of $y$ and its dependence on $x$ and $\Delta x$ is given by the equation
\[ \Delta y = f(x+\Delta x) - f(x). \]
Thus when $f'(x)$ exists, its value is
\[ f'(x) = \operatorname{st}\left(\frac{\Delta y}{\Delta x}\right). \]
We call $dy$ the differential of $y$, and its dependence on $x$ and $\Delta x$ is given by
\[ dy = f'(x)\,\Delta x, \]
with the understanding that $dy$ exists only when $f'(x)$ exists. As usual we write $dx = \Delta x$, so we have the familiar equations
\[ dy = f'(x)\,dx, \qquad f'(x) = \frac{dy}{dx}. \]

Geometrically, we define the tangent line to the curve $y = f(x)$ at a real point $(a,b)$ on the curve to be the line through $(a,b)$ with slope $f'(a)$. Thus
$\Delta y$ = change in $y$ along the curve,
$dy$ = change in $y$ along the tangent line.

Usually we will be interested in the case where $\Delta x$ is infinitesimal. Remember that the increment $\Delta y$ and differential $dy$ are dependent variables which depend on $x$ and $\Delta x$. The next theorem shows the relationship between $dy$ and $\Delta y$ when $\Delta x\approx 0$.

Definition 2.4. Let $\Delta x$ be a nonzero infinitesimal. We say that $u$ and $v$ are infinitely close compared to $\Delta x$, in symbols $u\approx v$ (compared to $\Delta x$), if
\[ \frac{u}{\Delta x} \approx \frac{v}{\Delta x}. \]

The relation $\approx$ (compared to $\Delta x$) is obviously an equivalence relation on the hyperreal numbers. It is closely related to the classical large and small oh notation. In the present hyperreal setting, for each nonzero infinitesimal $\Delta x$, we can define
\[ O(\Delta x) = \{u : u/\Delta x \text{ is finite}\}, \qquad o(\Delta x) = \{u : u/\Delta x \text{ is infinitesimal}\}. \]
Both $O(\Delta x)$ and $o(\Delta x)$ are ideals in galaxy(0), and $o(\Delta x)\subseteq O(\Delta x)\subseteq\operatorname{monad}(0)$. One can see from the definitions that $u\approx v$ (compared to $\Delta x$) iff $u - v\in o(\Delta x)$.

Theorem 2.5. (Increment Theorem) Suppose $x$ is real, $y = f(x)$, $f'(x)$ exists, and $\Delta x$ is a nonzero infinitesimal. Then
\[ \Delta y = f'(x)\,\Delta x + \varepsilon\,\Delta x = dy + \varepsilon\,\Delta x \]
for some infinitesimal $\varepsilon$. In other words, $\Delta y\approx dy$ (compared to $\Delta x$).

Proof. Take $\varepsilon = \dfrac{\Delta y}{\Delta x} - f'(x)$. Then $\varepsilon\approx 0$. Multiplying by $\Delta x$,
\[ \varepsilon\,\Delta x = \Delta y - f'(x)\,\Delta x, \qquad \Delta y = f'(x)\,\Delta x + \varepsilon\,\Delta x. \]
⊣
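The quantity $\varepsilon$ in the Increment Theorem can be computed exactly for a concrete function. The sketch below is only a numerical stand-in: it uses small rational increments in place of an infinitesimal, and the name `increment_error` is hypothetical. For $y = x^3$ the algebra gives $\varepsilon = 3x\,\Delta x + (\Delta x)^2$, which shrinks with $\Delta x$.

```python
from fractions import Fraction

def increment_error(f, fprime, x, dx):
    """Return epsilon with f(x + dx) - f(x) = fprime(x) * dx + epsilon * dx."""
    increment = f(x + dx) - f(x)        # Delta y
    differential = fprime(x) * dx       # dy
    return (increment - differential) / dx

def cube(x):
    return x ** 3

def cube_prime(x):
    return 3 * x ** 2

# Epsilon at x = 2 for dx = 1/10, 1/100, 1/1000 (exact rational arithmetic).
errors = [increment_error(cube, cube_prime, Fraction(2), Fraction(1, 10 ** k))
          for k in (1, 2, 3)]
```

With a genuine infinitesimal $\Delta x$ the same formula makes $\varepsilon$ infinitesimal, which is the content of the theorem for this function.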

2B. Infinitesimal Microscopes and Infinite Telescopes

In Elementary Calculus we frequently used the pictorial devices of infinitesimal microscopes and infinite telescopes to illustrate definitions and theorems about hyperreal numbers. The following precise definition was suggested by Keith Stroyan [Stroyan 1997] and may help the reader to consistently apply the device in new situations.

Given a point $P(a,b)$ in the hyperreal plane and a positive hyperreal number $\varepsilon$, the $\varepsilon$-disc around $P$ is defined as the set of all hyperreal points $(x,y)$ at distance at most $\varepsilon$ from $P$, that is,
\[ (x-a)^2 + (y-b)^2 \le \varepsilon^2. \]

Definition 2.6. Let $P(a,b)$ be a point in the hyperreal plane and $\delta$ be a positive infinitesimal. The $\delta$-infinitesimal microscope aimed at $P$ is the mapping $M$ from the $2\delta$-disc around $P$ onto the 2-disc around the origin given by the formula
\[ M(a+\delta x,\ b+\delta y) = (x,y) \quad\text{where } x^2+y^2\le 4. \]


Thus $M$ maps $(a,b)$ to $(0,0)$, magnifies distances by $1/\delta$, and preserves directions. The $2\delta$-disc around $P$ is called the field of view of the microscope. A drawing of a $\delta$-infinitesimal microscope will distinguish two points if and only if the distance between them is not infinitesimal compared to $\delta$. Thus a point $(x,y)$ is infinitely close to $(a,b)$ if and only if it is in the field of view of some infinitesimal microscope aimed at $(a,b)$.

As an example we discuss the use of infinitesimal microscopes in giving a picture of the slope of a function. By the Increment Theorem 2.5, for each real point $a$ we have $f'(a) = S$ if and only if for every nonzero infinitesimal $\Delta x$, the curve at $a+\Delta x$ is infinitely close to the tangent line at $a+\Delta x$ compared to $\Delta x$, that is,
\[ f(a+\Delta x) \approx f(a) + S\,\Delta x \quad (\text{compared to } \Delta x). \]
It follows that $f'(a) = S$ at a real point $a$ if and only if the curve $y = f(x)$ looks like a straight line with slope $S$ in the field of view of any infinitesimal microscope aimed at the point $(a,f(a))$. This is illustrated in Figure 2a.

One would also like to use infinitesimal microscopes to illustrate the difference between the tangent line and the curve when the slope exists. In Elementary Calculus we simply used artistic license and drew a picture like Figure 2b with the curvature exaggerated. A more sophisticated but accurate picture can be drawn by using a more powerful $(\Delta x)^2$-infinitesimal microscope within a $(\Delta x)$-infinitesimal microscope, as in Figure 2c. The curve and tangent line are indistinguishable in the $(\Delta x)$-infinitesimal microscope but can usually be distinguished in the $(\Delta x)^2$-infinitesimal microscope aimed at the hyperreal point $(a+\Delta x, f(a+\Delta x))$.
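The microscope map can be imitated with floating-point numbers. This is only a rough numerical sketch: a small real `delta` stands in for a positive infinitesimal, so "indistinguishable" becomes "differs by a tiny amount". The names `microscope`, `views`, and `deviations` are hypothetical.

```python
def microscope(a, b, delta):
    """Return M with M(a + delta*x, b + delta*y) = (x, y) on the 2*delta-disc."""
    def M(px, py):
        x = (px - a) / delta
        y = (py - b) / delta
        if x * x + y * y > 4:
            raise ValueError("point lies outside the field of view")
        return x, y
    return M

def f(x):
    return x * x

delta = 1e-6                       # small real stand-in for an infinitesimal
M = microscope(1.0, f(1.0), delta)

# Points of the curve y = x**2 near (1, 1), as seen through the microscope.
views = [M(1.0 + delta * t, f(1.0 + delta * t)) for t in (-0.5, 0.0, 0.5)]

# Vertical distance in the magnified picture from the line y = 2x,
# whose slope 2 is the derivative of f at 1.
deviations = [abs(y - 2.0 * x) for x, y in views]
```

In the magnified picture the curve stays within about `delta` of the straight line with slope $f'(1) = 2$.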

[Figures 2a, 2b, and 2c: views of the curve and its tangent line through infinitesimal microscopes near the point $(x,y)$. The drawings label $\Delta x$, $\Delta y$, and $dy$.]


We now turn to the notion of an inﬁnite telescope.

Definition 2.7. Let $(a,b)$ be a point in the hyperreal plane which is infinitely far from the origin, that is, $a^2+b^2$ is infinite. By the infinite telescope aimed at $(a,b)$ we mean the mapping $T$ from the 2-disc around $(a,b)$ onto the 2-disc around the origin given by
\[ T(a+x,\ b+y) = (x,y). \]

Thus $T$ simply translates the 2-disc around $(a,b)$ to the 2-disc around the origin, and preserves distances and directions. In Elementary Calculus, limits of the form $\lim_{x\to\infty} f(x) = L$ are illustrated with an infinitesimal microscope within an infinite telescope aimed at a point $(H,L)$ where $H$ is positive infinite. See Chapter 5 of this monograph for the hyperreal definition of infinite limits.

2C. Properties of Derivatives (§2.3, §2.4)

The familiar rules for derivatives can be obtained quite easily from the rules for standard parts in Section 1B. Given a term τ, we write dτ for dy, and dτ/dx for dy/dx, where y is the dependent variable given by the equation y = τ. This often saves space. For example, we can write d(u+v) without introducing a new variable y = u+v.

Theorem 2.8. Suppose $u$ and $v$ depend on the independent variable $x$. Then for any real value of $x$ where $du/dx$ and $dv/dx$ exist, we have
(i) (Sum Rule)
\[ \frac{d(u+v)}{dx} = \frac{du}{dx} + \frac{dv}{dx}. \]
(ii) (Constant Rule) For any real number $c$,
\[ \frac{d(cu)}{dx} = c\,\frac{du}{dx}. \]
(iii) (Product Rule)
\[ \frac{d(uv)}{dx} = u\,\frac{dv}{dx} + v\,\frac{du}{dx}. \]
(iv) (Quotient Rule) If $v\ne 0$,
\[ \frac{d(u/v)}{dx} = \frac{v\,(du/dx) - u\,(dv/dx)}{v^2}. \]

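Proof. For each part we let $\Delta x$ be a nonzero infinitesimal.

Sum Rule: Let $y = u+v$. Then
\[ \Delta y = (u+\Delta u) + (v+\Delta v) - (u+v) = \Delta u + \Delta v, \]
\[ \frac{\Delta y}{\Delta x} = \frac{\Delta u}{\Delta x} + \frac{\Delta v}{\Delta x}, \]
\[ \operatorname{st}\left(\frac{\Delta y}{\Delta x}\right) = \operatorname{st}\left(\frac{\Delta u}{\Delta x} + \frac{\Delta v}{\Delta x}\right) = \operatorname{st}\left(\frac{\Delta u}{\Delta x}\right) + \operatorname{st}\left(\frac{\Delta v}{\Delta x}\right), \]
\[ \frac{dy}{dx} = \frac{du}{dx} + \frac{dv}{dx}. \]

Constant Rule: Let $y = cu$. Then
\[ \Delta y = c(u+\Delta u) - cu = c\,\Delta u, \]
\[ \frac{\Delta y}{\Delta x} = \frac{c\,\Delta u}{\Delta x}, \]
\[ \operatorname{st}\left(\frac{\Delta y}{\Delta x}\right) = \operatorname{st}\left(\frac{c\,\Delta u}{\Delta x}\right) = c\cdot\operatorname{st}\left(\frac{\Delta u}{\Delta x}\right), \]
\[ \frac{dy}{dx} = c\,\frac{du}{dx}. \]

Product Rule: Let $y = uv$. Note that $\Delta u = (\Delta u/\Delta x)\,\Delta x$ is a finite number times an infinitesimal, so $\operatorname{st}(\Delta u) = 0$. Then
\[ \Delta y = (u+\Delta u)(v+\Delta v) - uv = u\,\Delta v + v\,\Delta u + \Delta u\,\Delta v, \]
\[ \frac{\Delta y}{\Delta x} = u\,\frac{\Delta v}{\Delta x} + v\,\frac{\Delta u}{\Delta x} + \Delta u\,\frac{\Delta v}{\Delta x}, \]
\[ \operatorname{st}\left(\frac{\Delta y}{\Delta x}\right) = \operatorname{st}\left(u\,\frac{\Delta v}{\Delta x} + v\,\frac{\Delta u}{\Delta x} + \Delta u\,\frac{\Delta v}{\Delta x}\right), \]
\[ \operatorname{st}\left(\frac{\Delta y}{\Delta x}\right) = u\cdot\operatorname{st}\left(\frac{\Delta v}{\Delta x}\right) + v\cdot\operatorname{st}\left(\frac{\Delta u}{\Delta x}\right) + 0\cdot\operatorname{st}\left(\frac{\Delta v}{\Delta x}\right), \]
\[ \frac{dy}{dx} = u\,\frac{dv}{dx} + v\,\frac{du}{dx}. \]

Quotient Rule: Let $y = u/v$, $v\ne 0$. Since $\Delta v$ is infinitesimal, $\operatorname{st}(v+\Delta v) = v$. Then
\[ \Delta y = \frac{u+\Delta u}{v+\Delta v} - \frac{u}{v} = \frac{(u+\Delta u)v - (v+\Delta v)u}{v(v+\Delta v)} = \frac{v\,\Delta u - u\,\Delta v}{v(v+\Delta v)}, \]
\[ \frac{\Delta y}{\Delta x} = \frac{v\,(\Delta u/\Delta x) - u\,(\Delta v/\Delta x)}{v(v+\Delta v)}, \]
\[ \operatorname{st}\left(\frac{\Delta y}{\Delta x}\right) = \frac{v\cdot\operatorname{st}(\Delta u/\Delta x) - u\cdot\operatorname{st}(\Delta v/\Delta x)}{v^2}, \]
\[ \frac{dy}{dx} = \frac{v\,(du/dx) - u\,(dv/dx)}{v^2}. \]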

⊣

Theorem 2.9. (Power Rule) If $x$ is a positive real number and $r$ is any rational number, then
\[ \frac{d(x^r)}{dx} = r\,x^{r-1}. \]

Proof. Case 1: $r$ is a positive integer. The proof is an easy induction using the Product Rule.

Case 2: $r = 1/n$ for some positive integer $n$. Let $y = x^{1/n}$ and let $\Delta x$ be a nonzero infinitesimal. Consider
\[ \Delta y = (x+\Delta x)^{1/n} - x^{1/n}. \]
$\Delta y\ne 0$ because $x+\Delta x\ne x$. $\Delta y$ is infinitesimal because
\[ \operatorname{st}(\Delta y) = \operatorname{st}\big((x+\Delta x)^{1/n}\big) - \operatorname{st}\big(x^{1/n}\big) = x^{1/n} - x^{1/n} = 0. \]
Now
\[ x = y^n, \qquad \frac{dx}{dy} = n\,y^{n-1}, \qquad \frac{\Delta x}{\Delta y} \approx n\,y^{n-1}. \]
Therefore
\[ \frac{\Delta y}{\Delta x} \approx \frac{1}{n\,y^{n-1}} = \frac{1}{n}\,x^{(1/n)-1}, \qquad \frac{dy}{dx} = \frac{1}{n}\,x^{(1/n)-1}. \]

Case 3: $r$ is a positive rational. This follows from Cases 1 and 2 using the fact that $x^{m/n} = (x^{1/n})^m$.

Case 4: $r$ is a negative rational. This follows from Case 3 using the Quotient Rule. ⊣

If $r = m/n$ where $n$ is odd, the above Power Rule also holds for negative values of $x$. If $r = m/n$ where $n$ is even, then $x^r$ is undefined in the real number system when $x$ is negative.

2D. Chain Rule (§2.6, §2.7)

The Chain Rule can be proved in a natural way using the Increment Theorem.

Theorem 2.10. (Chain Rule) Let $f$ and $G$ be real functions and let $g$ be the composition $g(t) = G(f(t))$. For any real value of $t$ where $f'(t)$ and $G'(f(t))$ exist, $g'(t)$ also exists and
\[ g'(t) = G'(f(t))\,f'(t). \]

Proof. Let $x = f(t)$, $y = g(t) = G(x)$. Let $\Delta t\ne 0$ be infinitesimal. By the Increment Theorem 2.5 for $x = f(t)$, $\Delta x$ is infinitesimal. By the Increment Theorem for $y = G(x)$,
\[ \Delta y = G'(x)\,\Delta x + \varepsilon\,\Delta x \]
for some infinitesimal $\varepsilon$. (If $\Delta x = 0$, then $\Delta y = 0$ and the equation holds with $\varepsilon = 0$.) Dividing by $\Delta t$ and taking standard parts,
\[ \frac{\Delta y}{\Delta t} = G'(x)\,\frac{\Delta x}{\Delta t} + \varepsilon\,\frac{\Delta x}{\Delta t}, \]
\[ \frac{dy}{dt} = G'(x)\,\frac{dx}{dt} + 0, \]
\[ g'(t) = G'(f(t))\,f'(t). \]
⊣

The Chain Rule with dependent variables is stated in the following form. Let $x = f(t)$, $y = g(t) = G(x)$, and suppose $f'(t)$ and $G'(x)$ exist. Then
\[ \frac{dy}{dt} = \frac{dy}{dx}\,\frac{dx}{dt} \]
where $dx/dt$ and $dy/dt$ are computed with $t$ as the independent variable, and $dy/dx$ is computed with $x$ as the independent variable.

As in the standard calculus treatment, the Chain Rule is not trivial because $dy$ has one meaning when $x$ is the independent variable, $dy = G'(x)\,dx$, and a different meaning when $t$ is the independent variable, $dy = g'(t)\,dt$.

Higher derivatives are defined in the usual way. Thus $f''$ is the derivative of $f'$, and $f^{(n+1)}$ is the derivative of $f^{(n)}$.


The $n$-th differential of $y$ can be considered separately and is defined by
\[ d^n y = f^{(n)}(x)\,dx^n \]
where $dx^n$ means $(dx)^n$. Notice that when $dx$ is infinitesimal, $dx^2$ is a much smaller infinitesimal and $d^2y$ is the product of the real number $f''(x)$ and $dx^2$.

CHAPTER 3

CONTINUOUS FUNCTIONS

In this chapter we go beyond the treatment given in Chapter 3 of Elementary Calculus. Several proofs which were omitted or only sketched there are given fully here.

3A. Limits and Continuity (§3.3, §3.4)

We will now use hyperreal numbers to define the notions of a limit and of a continuous function. In Chapter 5 we will show that these definitions are equivalent to the standard definitions.

Definition 3.1. Let $L$, $c$ be real numbers. $L$ is the limit of $f(x)$ as $x$ approaches $c$, in symbols
\[ L = \lim_{x\to c} f(x), \]
if whenever $x\approx c$ but $x\ne c$, we have $f(x)\approx L$. If there is no such $L$ we say that the limit does not exist.

In Elementary Calculus we made the intuitive statement that whenever $\lim_{x\to c} f(x) = L$, we can see the entire part of the hyperreal graph of $f(x)$, where $x\approx c$ but $x\ne c$, in an infinitesimal microscope aimed at $(c,L)$. This is a simplification that does not conform to the precise definition of infinitesimal microscope given in Section 2B, because an $\varepsilon$-infinitesimal microscope only has a field of radius $2\varepsilon$. A more exact statement is as follows: $\lim_{x\to c} f(x) = L$ if and only if whenever $x\approx c$ but $x\ne c$, the point $(x,f(x))$ belongs to the field of some infinitesimal microscope aimed at $(c,L)$.

Limits can often be found by computing standard parts. We see from the definition that if $\operatorname{st}(f(x)) = L$ for all $x$ infinitely close but not equal to $c$, then
\[ \lim_{x\to c} f(x) = L. \]

Corollary 3.2. If $\lim_{x\to c} f(x)$ exists then $f(x)$ is defined for all real $x\ne c$ in some neighborhood of $c$.

Proof. Let $Y = \operatorname{domain}(f)\cup\{c\}$. Then $Y^* = \operatorname{domain}(f^*)\cup\{c\}$, by Proposition 1.27. By the definition of limit, $f(x)$ must be defined for all $x\ne c$ in the monad of $c$, so $Y^*$ contains the monad of $c$. Then by Theorem 1.28, $Y$ contains a neighborhood of $c$, so $f(x)$ must be defined for all real $x\ne c$ in that neighborhood. ⊣

The next corollary follows at once from the definitions.

Corollary 3.3. The slope of $f$ at $a$ is given by the limit
\[ f'(a) = \lim_{\Delta x\to 0} \frac{f(a+\Delta x) - f(a)}{\Delta x}. \]

The limit with respect to a set $Y\subseteq\mathbb{R}$ is defined as follows.

Definition 3.4. Let $L$ and $c$ be real numbers. $L$ is the limit of $f(x)$ as $x$ approaches $c$ in $Y$,
\[ L = \lim_{x\to c,\ x\in Y} f(x), \]
if whenever $x\in Y^*$ and $x\approx c$ but $x\ne c$, we have $f(x)\approx L$.

Important special cases are the one-sided limits, defined by
\[ \lim_{x\to c^-} f(x) = \lim_{x\to c,\ x<c} f(x), \qquad \lim_{x\to c^+} f(x) = \lim_{x\to c,\ x>c} f(x). \]

The following result is easy.

Proposition 3.5. $\lim_{x\to c} f(x)$ exists if and only if both one-sided limits exist and are equal,
\[ \lim_{x\to c^-} f(x) = \lim_{x\to c^+} f(x). \]

The rules for standard parts lead at once to the following rules for limits.

Theorem 3.6. (Rules for Limits) Suppose the limits $\lim_{x\to c} f(x)$ and $\lim_{x\to c} g(x)$ both exist.
(i) For any constant $k$, $\lim_{x\to c}(k f(x)) = k\lim_{x\to c} f(x)$.
(ii) $\lim_{x\to c}(f(x)+g(x)) = \lim_{x\to c} f(x) + \lim_{x\to c} g(x)$.
(iii) $\lim_{x\to c} f(x)g(x) = (\lim_{x\to c} f(x))(\lim_{x\to c} g(x))$.
(iv) If $\lim_{x\to c} g(x)\ne 0$, then $\lim_{x\to c}(f(x)/g(x)) = (\lim_{x\to c} f(x))/(\lim_{x\to c} g(x))$.

Proof. To illustrate we prove (ii). Let $x\approx c$ but $x\ne c$. Then by Theorem 1.12,
\[ \lim_{x\to c}(f(x)+g(x)) = \operatorname{st}(f(x)+g(x)) = \operatorname{st}(f(x)) + \operatorname{st}(g(x)) = \lim_{x\to c} f(x) + \lim_{x\to c} g(x). \]
⊣

Definition 3.7. $f$ is continuous at a real point $c$ if $f(c)$ is defined and whenever $x$ is infinitely close to $c$, $f(x)$ is infinitely close to $f(c)$.

As an immediate consequence of the definitions, we have the usual condition for continuity in terms of limits.

Corollary 3.8. $f$ is continuous at a real point $c$ if and only if $f(c)$ is defined and $\lim_{x\to c} f(x) = f(c)$.

Corollary 3.9. If $f$ is continuous at $c$, then $f(x)$ is defined for all real $x$ in some neighborhood of $c$.

Proof. By Corollary 3.2, $f(x)$ is defined for all $x\ne c$ in some neighborhood of $c$. By definition, $f(x)$ is also defined at $x = c$. ⊣

It follows from Theorem 3.6 that sums, products, and quotients of continuous functions are continuous, provided that the denominator is not 0.

Theorem 3.10. If $f$ is differentiable at $c$ then $f$ is continuous at $c$.

Proof. Let $f$ be differentiable at $c$. Then $f(c)$ is defined. Let $x\approx c$ but $x\ne c$. Then
\[ \frac{f(x)-f(c)}{x-c} \]
is finite and $x-c$ is infinitesimal. It follows that $f(x)-f(c)$ is infinitesimal, so $f(x)\approx f(c)$. ⊣

Proposition 3.11. Compositions of continuous functions are continuous. That is, if $f$ is continuous at $c$, and $G$ is continuous at $f(c)$, then $g(x) = G(f(x))$ is continuous at $c$.

Proof. Let $x\approx c$. Then $f(x)\approx f(c)$, so $g(x) = G(f(x))\approx G(f(c)) = g(c)$. ⊣

We now define continuity and uniform continuity on a set $Y$ of real numbers.

Definition 3.12. Let $Y$ be a subset of the domain of $f$. $f$ is continuous on $Y$ if whenever $c\in Y$, $x\approx c$, and $x\in Y^*$, we have $f(x)\approx f(c)$. $f$ is uniformly continuous on $Y$ if whenever $x,y\in Y^*$ and $x\approx y$, we have $f(x)\approx f(y)$.

Corollary 3.13. If $f$ is uniformly continuous on $Y$ then $f$ is continuous on $Y$.

Proof. Suppose $c\in Y$, $x\in Y^*$, $x\approx c$. By Theorem 1.27, $Y\subseteq Y^*$, so $c\in Y^*$. Then, since $f$ is uniformly continuous on $Y$, $f(x)\approx f(c)$, and therefore $f$ is continuous on $Y$. ⊣

A set $Y$ of reals is said to be compact if it is closed and bounded. For example, each closed interval $[a,b]$ is compact.

Corollary 3.14. A set $Y$ of reals is compact if and only if for every $y\in Y^*$, $y$ is finite and $\operatorname{st}(y)\in Y$.

Proof. By Corollary 1.29 and Theorem 1.31. ⊣

Theorem 3.15. Let $Y$ be a compact set of reals. If $f$ is continuous on $Y$ then $f$ is uniformly continuous on $Y$.

Proof. Suppose $f$ is continuous on $Y$. Let $x,y\in Y^*$ and $x\approx y$. By Corollary 3.14, $x$ is finite and $c\in Y$ where $c = \operatorname{st}(x)$. Since $x\approx y$, $c = \operatorname{st}(y)$. By the continuity of $f$ on $Y$, $f(x)\approx f(c)$ and $f(y)\approx f(c)$. Thus $f(x)\approx f(y)$, and $f$ is uniformly continuous on $Y$. ⊣

If $f$ is uniformly continuous on a set $Y$ then $f$ is obviously uniformly continuous on any subset of $Y$. The following theorem allows us to extend the domain of a uniformly continuous function from an interval to the whole real line.

Theorem 3.16. (i) Let $f$ be uniformly continuous on an interval $I$. Then there is a function $g$ which agrees with $f$ on $I$ and is uniformly continuous on the whole real line.
(ii) Suppose the derivative $f'$ of $f$ is uniformly continuous on an interval $I$. Then there is a function $g$ which agrees with $f$ on $I$ such that $g'$ is uniformly continuous on the whole real line.

Proof. We give the proof for the case where $I$ is a half-open interval of the form $[a,b)$. The other cases are similar.

(i) We first show that $\lim_{x\to b^-} f(x)$ exists. Let $x<b$ and $x\approx b$. Assume for the moment that $f(x)$ is infinite. Then every real $u<b$ is a partial hyperreal solution of
\[ u<y,\quad y<b,\quad |f(u)-f(y)|\ge 1. \tag{12} \]
Let $u_1<b$, $u_1\approx b$. By the Partial Solution Theorem 1.20 there is a hyperreal $y_1$ such that (12) holds. But then $u_1\approx y_1$ but not $f(u_1)\approx f(y_1)$, contradicting the uniform continuity of $f$. Therefore $f(x)$ could not have been infinite. So $f(x)$ is finite, and has a standard part $B$. For all $y<b$ with $y\approx x$ we have $f(y)\approx f(x)$ and hence $f(y)\approx B$. This shows that $B = \lim_{x\to b^-} f(x)$. Now let $g$ be the function
\[ g(x) = \begin{cases} f(a), & \text{if } x<a,\\ f(x), & \text{if } x\in[a,b),\\ B, & \text{if } x\ge b. \end{cases} \]
Then $g$ agrees with $f$ on $[a,b)$ and is uniformly continuous on the whole real line.

(ii) From the proof of (i), the limits
\[ B = \lim_{x\to b^-} f(x), \qquad C = \lim_{x\to b^-} f'(x) \]
both exist. The function
\[ g(x) = \begin{cases} f(a) + f'(a)(x-a), & \text{if } x<a,\\ f(x), & \text{if } x\in[a,b),\\ B + C(x-b), & \text{if } x\ge b \end{cases} \]
agrees with $f$ on $[a,b)$ and has a uniformly continuous derivative on the whole real line. ⊣
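The two extensions used in this proof are easy to write down once the one-sided limits are known. The sketch below assumes the caller supplies the limits `B` and `C` (the code does not compute them), and the function names are hypothetical.

```python
def extend_constant(f, a, b, B):
    """Extension from part (i): f(a) left of a, f on [a, b), and B from b onward."""
    def g(x):
        if x < a:
            return f(a)
        if x < b:
            return f(x)
        return B
    return g

def extend_linear(f, fprime, a, b, B, C):
    """Extension from part (ii): straight-line pieces outside [a, b)."""
    def g(x):
        if x < a:
            return f(a) + fprime(a) * (x - a)
        if x < b:
            return f(x)
        return B + C * (x - b)
    return g

# Example with f(x) = x**2 on [0, 1): the left-hand limits at 1 are B = 1 and C = 2.
square = lambda x: x * x
square_prime = lambda x: 2 * x
g1 = extend_constant(square, 0, 1, 1)
g2 = extend_linear(square, square_prime, 0, 1, 1, 2)
```

In this example `g2` continues the parabola by its tangent directions at both ends, so its derivative has no jump at 0 or at 1.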

3B. Hyperintegers (§3.8)

Hyperintegers are a basic tool in several areas of the calculus, including integration and infinite series. In this chapter they are used in the proofs of the Intermediate and Extreme Value Theorems. Recall from Section 1E that $\mathbb{Z}$ denotes the set of integers, and that $x\in\mathbb{Z}$ is the formula $C_{\mathbb{Z}}(x) = 1$, where $C_{\mathbb{Z}}$ is the characteristic function of $\mathbb{Z}$.

Definition 3.17. The set of integers is denoted by $\mathbb{Z}$. The natural extension $\mathbb{Z}^*$ of $\mathbb{Z}$ is called the set of hyperintegers.

Note that by Theorem 1.27 (i), $\mathbb{Z}^*\cap\mathbb{R} = \mathbb{Z}$, that is, a real number is an integer if and only if it is a hyperinteger. In Elementary Calculus the hyperintegers were defined in a different way using the function
\[ [x] = \text{the greatest integer } n\le x. \]
We now show that the two definitions are equivalent.

Theorem 3.18. $\mathbb{Z}^*$ is the set of all hyperreal numbers $y$ such that $y = [x]$ for some hyperreal $x$.

Proof. We first note that the set of integers $\mathbb{Z}$ is the set of all real solutions of the equation $y = [y]$. By Proposition 1.26, the natural extension $\mathbb{Z}^*$ is the set of all hyperreal solutions of $y = [y]$. Thus if $y\in\mathbb{Z}^*$ then $y = [y]$, and hence $y = [x]$ for some hyperreal number $x$.

Now suppose that $y = [x]$ for some hyperreal number $x$. The equation $[x] = [[x]]$ holds for all real numbers. By Transfer, it holds for all hyperreal numbers. Therefore $y = [x] = [[x]] = [y]$, so $y = [y]$ and hence $y\in\mathbb{Z}^*$. ⊣

It follows that for each hyperreal number $x$, $[x]$ is a hyperinteger. Moreover, since $[x]\le x<[x]+1$ for all real $x$, we see from the Transfer Axiom that $[x]\le x<[x]+1$ for all hyperreal $x$.

Theorem 3.19.
(i) $\mathbb{Z}^*$ is a subring of $\mathbb{R}^*$. That is, sums, differences, and products of hyperintegers are hyperintegers.
(ii) For each hyperreal number $x$, $[x]$ is the greatest hyperinteger $\le x$, and $[x]\le x<[x]+1$.
(iii) There are positive infinite and negative infinite hyperintegers.
(iv) Every finite hyperinteger is an integer, that is, $\mathbb{Z}^*\cap\operatorname{galaxy}(0) = \mathbb{Z}$.

Proof. (i) Every real solution of
\[ x\in\mathbb{Z},\quad y\in\mathbb{Z} \tag{13} \]
is a solution of
\[ x+y\in\mathbb{Z}. \tag{14} \]
Hence every hyperreal solution of (13) is a solution of (14), so the sum of two hyperintegers is a hyperinteger. The proofs for differences and products are similar.

(ii) For each real number $x$, $[x]$ is the greatest integer $\le x$. Then $[x]\le x$ for all real $x$, and by Corollary 1.16, $[x]\le x$ for all hyperreal $x$. Moreover, every real solution of
\[ n\in\mathbb{Z},\quad n\le x \tag{15} \]
is a solution of
\[ n\le[x]. \tag{16} \]
By Transfer, every hyperreal solution of (15) is a solution of (16). This shows $[x]$ is the greatest hyperinteger $\le x$ for every hyperreal $x$. Finally, since the system of formulas $[x]\le x<[x]+1$ holds for all real $x$, it holds for all hyperreal $x$ by Corollary 1.16.

(iii) Let $H$ be a positive infinite hyperreal number. Then $K = [H]+1$ is a hyperinteger which is greater than $H$ and hence is positive infinite.

(iv) Let $K$ be a finite hyperinteger. Then $\operatorname{st}(K)$ is real, so there is an integer $n$ with $n\le\operatorname{st}(K)<n+1$. It follows that $0\le|n-K|<1$. However, $|n-K|$ is a hyperinteger by (i), so by (ii) we must have $|n-K| = 0$, and thus $n = K\in\mathbb{Z}$. ⊣

A frequent construction in the calculus is the partition of a closed interval $[a,b]$ into infinitely many subintervals of equal infinitesimal length. When $x$ and $y$ are hyperreal numbers, we call the set
\[ [x,y]^* = \{z\in\mathbb{R}^* : x\le z\le y\} \]
a hyperreal closed interval. If $a\le x\le y\le b$ we call $[x,y]^*$ a hyperreal subinterval of $[a,b]^*$. When $0<\Delta x$ and $\Delta x\approx 0$, we call $[x,x+\Delta x]^*$ an infinitesimal interval. Since no ambiguity can arise, we sometimes drop the star on an infinitesimal interval, writing $[x,x+\Delta x]$ for $[x,x+\Delta x]^*$. In Elementary Calculus we always wrote $[x,x+\Delta x]$ instead of $[x,x+\Delta x]^*$.

Given a hyperinteger $H>0$, the closed hyperreal interval $[a,b]^*$ may be partitioned into subintervals of length $\delta = (b-a)/H$. The partition points are
\[ a,\ a+\delta,\ a+2\delta,\ \ldots,\ a+K\delta,\ \ldots,\ a+H\delta = b \]
where $K$ runs over the hyperintegers from 0 to $H$. If $H$ is infinite, each subinterval will have infinitesimal length $\delta = (b-a)/H$, and the partition is called an infinite partition of $[a,b]^*$, or sometimes an infinite partition of $[a,b]$. A typical subinterval has the form $[a+K\delta,\ a+(K+1)\delta]^*$.

Corollary 3.20. Given a closed real interval $[a,b]$ and a positive hyperinteger $H$, let $\delta = (b-a)/H$. Then $[a,b]^*$ is the union of the subintervals $[a+K\delta,\ a+(K+1)\delta)^*$ where $K\in\mathbb{Z}^*$ and $0\le K<H$.

Proof. Let $x\in[a,b]^*$. Let $K = [(x-a)/\delta]$. Then $K\in\mathbb{Z}^*$ and
\[ K\le\frac{x-a}{\delta}<K+1. \]
It follows that
\[ 0\le K<\frac{b-a}{\delta} = H, \qquad a+K\delta\le x<a+(K+1)\delta. \]
⊣
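The index $K = [(x-a)/\delta]$ from this proof can be computed for an ordinary finite partition. This is a sketch under two assumptions: `H` is a positive integer standing in for a hyperinteger, and the point satisfies $a\le x<b$. Exact rational arithmetic avoids rounding at the partition points, and the name `subinterval_index` is hypothetical.

```python
from fractions import Fraction
from math import floor

def subinterval_index(x, a, b, H):
    """Locate x in the partition of [a, b) into H pieces of length (b - a) / H.

    Returns (K, left, right) with left = a + K*delta <= x < right = a + (K+1)*delta.
    """
    if not a <= x < b:
        raise ValueError("this sketch assumes a <= x < b")
    delta = Fraction(b - a) / H
    K = floor((x - a) / delta)          # the greatest integer <= (x - a) / delta
    return K, a + K * delta, a + (K + 1) * delta

K, left, right = subinterval_index(Fraction(7, 10), 0, 1, 4)
```

Here $\delta = 1/4$ and $(x-a)/\delta = 14/5$, so the point $7/10$ falls in the subinterval starting at $1/2$.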

3C. Properties of Continuous Functions (§3.5–§3.8)

We use hyperintegers to prove the Intermediate and Extreme Value Theorems.

Theorem 3.21. (Intermediate Value Theorem) Suppose $f$ is continuous on the closed interval $[a,b]$. Then for every real number $D$ between $f(a)$ and $f(b)$ there is a point $c\in[a,b]$ such that $f(c) = D$.

Proof. We may assume that $f(a)\le f(b)$. The result is trivial if $D = f(a)$ or $D = f(b)$, so we assume that $a<b$ and $f(a)<D<f(b)$. Consider a positive integer $n$ and the finite partition
\[ a,\ a+\delta,\ a+2\delta,\ \ldots,\ a+n\delta = b \]
where $\delta = (b-a)/n$. The value of $f$ must cross $D$ in one of the subintervals, so there is an integer $m$ such that
\[ 0\le m<n,\quad f(a+m\delta)<D\le f(a+(m+1)\delta). \tag{17} \]
Thus every real solution of the system of formulas
\[ n\in\mathbb{Z},\quad 0<n,\quad \delta = (b-a)/n \]
is a partial real solution of (17). Now let $n_1$ be a positive infinite hyperinteger and let $\delta_1 = (b-a)/n_1$. By the Partial Solution Theorem, there exists $m_1$ such that (17) holds. Let $c = \operatorname{st}(a+m_1\delta_1)$. We show that $a\le c\le b$ and $f(c) = D$. We have
\[ a\le a+m_1\delta_1\le a+(m_1+1)\delta_1\le a+n_1\delta_1 = b. \]
Taking standard parts, $a\le c\le b$. Since $f$ is continuous on $[a,b]$,
\[ f(c) = \operatorname{st}(f(a+m_1\delta_1))\le D, \qquad f(c) = \operatorname{st}(f(a+(m_1+1)\delta_1))\ge D. \]
Therefore $f(c) = D$. ⊣

Definition 3.22. A function $f$ is called increasing if $f(x)<f(y)$ whenever $x<y$ and $x,y\in\operatorname{domain}(f)$. $f$ is called decreasing if $f(x)>f(y)$ whenever $x<y$ and $x,y\in\operatorname{domain}(f)$.

Here is a useful consequence about continuous one to one functions.

Theorem 3.23. Suppose f is a continuous one to one function whose domain is an interval I. Then f is either increasing or decreasing.

Proof. We prove:
(1) Whenever $a<b<c$ in $I$, either $f(a)<f(b)<f(c)$ or $f(a)>f(b)>f(c)$.

Proof of (1): Suppose $a<b<c$ in $I$. Since $f$ is one to one, the values $f(a)$, $f(b)$, and $f(c)$ are all different. We consider two cases.

Case 1: $f(a)<f(b)$. In this case we must show that $f(b)<f(c)$. We do this by assuming instead that $f(b)>f(c)$ and arriving at a contradiction. Under this assumption, we may pick a value $x$ such that $f(a)<x<f(b)$ and $f(c)<x<f(b)$. By the Intermediate Value Theorem 3.21, there are points $a_1\in(a,b)$ and $c_1\in(b,c)$ such that $f(a_1) = x$ and $f(c_1) = x$. But then $a_1<c_1$ but $f(a_1) = f(c_1)$, contradicting our hypothesis that $f$ is one to one. This shows that $f(a)<f(b)<f(c)$, and proves (1) in Case 1.

Case 2: $f(a)>f(b)$. A similar argument shows that $f(a)>f(b)>f(c)$.

We next prove:
(2) Whenever $a<b<c<d$ in $I$, either $f(a)<f(b)<f(c)<f(d)$ or $f(a)>f(b)>f(c)>f(d)$.

Proof of (2): By (1), either $f(a)<f(b)<f(c)$ or $f(a)>f(b)>f(c)$. If $f(a)<f(b)<f(c)$, then we cannot have $f(b)>f(c)>f(d)$, so by (1) again, $f(a)<f(b)<f(c)<f(d)$. Similarly, if $f(a)>f(b)>f(c)$, then $f(a)>f(b)>f(c)>f(d)$.

We now use (2) to prove that $f$ is either increasing or decreasing. Pick any two points $x<y$ in $I$. We again consider two cases.

Case A: $f(x)<f(y)$. In this case we show that $f$ is increasing. To do this, let $u<v$ in $I$. Arrange the points $x,y,u,v$ in increasing order, giving us four points $a<b<c<d$ in $I$. Since $x<y$ and $f(x)<f(y)$, we cannot have $f(a)>f(b)>f(c)>f(d)$. Then by (2), we must have $f(a)<f(b)<f(c)<f(d)$. Since $u,v\in\{a,b,c,d\}$, $f(u)<f(v)$, so $f$ is increasing.

Case B: $f(x)>f(y)$. A similar argument shows that $f$ is decreasing. ⊣

Definition 3.24. $f$ has a maximum at $c$ if $f(c)\ge f(x)$ for all $x$ in the domain of $f$. A minimum of $f$ is defined analogously.

Proposition 3.25. Suppose f has a maximum at c. Then the natural extension f∗ also has a maximum at c, that is, f(c) ≥ f(x) for all hyperreal x in the domain of f∗.

Proof. Every real solution of the formula

f(x) is defined

is a solution of f(c) ≥ f(x). By Transfer, every hyperreal solution of the first formula is a solution of the second, so f∗ has a maximum at c. ⊣

Definition 3.26. f has a local maximum at c if c has a real neighborhood (c−r,c+r) such that f(x) is defined and f(c) ≥ f(x) for all x ∈ (c−r,c+r). Local minima are defined in a similar way.

Theorem 3.27. f has a local maximum at c if and only if f(x) is defined and f(c) ≥ f(x) for all hyperreal x ≈ c.

Proof. Suppose f has a local maximum at c. Then f(c) ≥ f(x) for all x in some real neighborhood (c−r,c+r) of c. By Proposition 3.25, f(c) ≥ f(x) for all hyperreal x such that c−r < x < c+r, and therefore f(c) ≥ f(x) for all hyperreal x ≈ c.

Now suppose f does not have a local maximum at c.

Case 1: There is no real neighborhood of c on which f is defined. By Corollary 1.30, there is a hyperreal number x ≈ c at which f(x) is undefined.

Case 2: f is defined on some real neighborhood (c−r,c+r) of c. Since f does not have a local maximum at c, every real solution of

0 < s < r (18)

is a partial real solution of

c−s < x < c+s, f(c) < f(x). (19)

Let s1 be positive infinitesimal. Then s1 is a hyperreal solution of (18). By the Partial Solution Theorem, s1 is a partial hyperreal solution of (19). Hence there exists x1 ≈ c with f(c) < f(x1). ⊣

Theorem 3.28. (Extreme Value Theorem) Suppose that the domain of f is a closed interval [a,b] and f is continuous on [a,b]. Then f has a maximum and a minimum.


Proof. The result is trivial if a = b, so we assume a < b. Let n be a positive integer and consider the finite partition

a, a + δ, a + 2δ, ..., a + nδ = b

where δ = (b−a)/n. Let f(a + mδ) be the greatest of the values

f(a), f(a + δ), ..., f(a + nδ),

taking the least such m if there are several, and let g be the function on the set of positive integers such that m = g(n). Then every real solution of

n ∈ ℤ, 0 < n, δ = (b−a)/n, m = g(n) (20)

is a solution of

a ≤ a + mδ ≤ b. (21)

Furthermore, every real solution of (20) plus

k ∈ ℤ, 0 ≤ k ≤ n (22)

is a solution of

f(a + mδ) ≥ f(a + kδ). (23)

Let n1 be a positive infinite hyperinteger and let δ1 = (b−a)/n1 and m1 = g(n1). We show that f has a maximum at c = st(a + m1δ1). The triple (n1,δ1,m1) is a hyperreal solution of (20), so by Transfer it is a solution of (21). Thus a ≤ a + m1δ1 ≤ b. Taking standard parts, a ≤ c ≤ b.

Consider any real number x ∈ [a,b]. By Corollary 3.20, x belongs to an infinitesimal subinterval of [a,b] of the form [a + k1δ1, a + (k1 + 1)δ1]∗ where k1 is a hyperinteger between 0 and n1. Then x = st(a + k1δ1). The quadruple (n1,δ1,m1,k1) is a hyperreal solution of (20) and (22), and by Transfer it is also a solution of (23). Thus f(a + m1δ1) ≥ f(a + k1δ1). Since f is continuous on [a,b],

f(c) = st(f(a + m1δ1)) ≥ st(f(a + k1δ1)) = f(x).

Thus f has a maximum at c. The existence of a minimum is proved in the same way. ⊣
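The finite step of the proof above can be imitated numerically. The following is a minimal sketch, not part of the original text: the helper name `grid_argmax` and the test function `example` are illustrative assumptions. For a finite n it returns the partition point a + mδ at which f is greatest. The proof replaces n by an infinite hyperinteger and takes the standard part, which a program can only approximate by taking n large.

```python
def grid_argmax(f, a, b, n):
    """Return the partition point a + m*delta at which f is greatest.

    Hypothetical helper (not from the text): it mirrors the choice m = g(n)
    in the proof, for a finite positive integer n and delta = (b - a)/n.
    """
    delta = (b - a) / n
    # max() returns the first index attaining the greatest value
    m = max(range(n + 1), key=lambda k: f(a + k * delta))
    return a + m * delta


def example(x):
    # Illustrative test function: continuous on [0, 1], maximum at x = 1/2
    return x * (1.0 - x)


for n in (10, 100, 1000):
    print(n, grid_argmax(example, 0.0, 1.0, n))
```

The returned point lies within one subinterval length δ of the true maximum point for this example, in line with the step of the proof where taking standard parts removes an infinitesimal error.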

Theorem 3.29. (Critical Point Theorem) Suppose the domain of f is an interval I, f is continuous on I, and f has a maximum or minimum at a point c in I. Then one of the following occurs:
(i) c is an endpoint of I,
(ii) f′(c) is undefined,
(iii) f′(c) = 0.

Proof. Assume neither (i) nor (ii) holds, so c is not an endpoint of I and f′(c) exists. We show that f′(c) = 0, so that (iii) holds. Suppose f has a maximum at c. Let Δx > 0 be infinitesimal. Then

f(c + Δx) ≤ f(c), f(c − Δx) ≤ f(c),

$$\frac{f(c+\Delta x)-f(c)}{\Delta x} \le 0 \le \frac{f(c-\Delta x)-f(c)}{-\Delta x},$$

$$\mathrm{st}\left(\frac{f(c+\Delta x)-f(c)}{\Delta x}\right) \le 0 \le \mathrm{st}\left(\frac{f(c-\Delta x)-f(c)}{-\Delta x}\right),$$

f′(c) ≤ 0 ≤ f′(c), f′(c) = 0.

The case of a minimum is similar. ⊣

We call a point c where (i), (ii), or (iii) occurs a critical point of f. A critical point which is not an endpoint of I is called an interior critical point of f.

Theorem 3.30. (Mean Value Theorem) Assume that a < b and f is continuous on the closed interval [a,b] and differentiable on (a,b). Then there is a point c ∈ (a,b) such that the slope of f at c equals the average slope of f,

$$f'(c) = \frac{f(b)-f(a)}{b-a}.$$

The proof of the Mean Value Theorem from the Extreme Value and Critical Point Theorems is elementary and uses standard methods only, so we will not give it here.

Here is a corollary of the Mean Value Theorem which is often used in the calculus.

Corollary 3.31.
(i) If f is continuous on an interval I and f′(x) = 0 whenever x is in the interior of I, then f(x) is constant on I.
(ii) If f is continuous on an interval I and f′(x) ≥ 0 whenever x is in the interior of I, then f(x) ≤ f(y) whenever x ≤ y in I.
(iii) If f is continuous on an interval I and f′(x) > 0 whenever x is in the interior of I, then f is increasing in I.
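As a concrete illustration of the Mean Value Theorem 3.30, the sketch below locates a point c whose slope equals the average slope. It is not taken from the text: the function name `mean_value_point` is hypothetical, and the bisection search relies on an extra assumption (a sign change of f′ minus the average slope across [a,b]) that the theorem itself does not need.

```python
import math


def mean_value_point(f, fprime, a, b, tol=1e-12):
    """Locate c in (a, b) with fprime(c) equal to the average slope of f.

    Assumption (not from the text): fprime is continuous and
    fprime(x) - slope has opposite signs at a and b, so bisection applies.
    """
    slope = (f(b) - f(a)) / (b - a)
    g = lambda x: fprime(x) - slope
    lo, hi = a, b
    if g(lo) * g(hi) >= 0:
        raise ValueError("bisection needs a strict sign change of fprime - slope")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        # keep the half-interval on which g still changes sign
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0


# f(x) = x^3 on [0, 2]: the average slope is 4, and 3c^2 = 4 gives c = 2/sqrt(3)
c = mean_value_point(lambda x: x ** 3, lambda x: 3 * x ** 2, 0.0, 2.0)
print(c, 2.0 / math.sqrt(3.0))
```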

The Intermediate, Extreme, and Mean Value Theorems have the following useful consequences which involve hyperreal numbers. In these theorems we start with a real function f with at least one variable x, but allow the possibility of other variables as well. We state the results for the case that there is one extra variable. Given a function f(x,s) of two variables x,s, for each real constant a we get a function f(x,a) of one variable x. The natural extension f∗(x,s) is a hyperreal function of two variables x and s, and for each hyperreal constant a, f∗(x,a) is a hyperreal function of one variable x. In each of the following theorems we suppose that f(x,s) is a real function of two variables such that for each real constant a, f(x,a) considered as a function of x is continuous on an interval I.

Theorem 3.32. (Hyperreal Intermediate Value Theorem) For each hyperreal constant a and each x < y in I∗, if u is a hyperreal number between f∗(x,a) and f∗(y,a), then there is a hyperreal z such that x ≤ z ≤ y and f∗(z,a) = u.

Theorem 3.33. (Hyperreal Extreme Value Theorem) For each hyperreal constant a and each x < y in I∗, f∗(z,a) has a maximum and minimum on the hyperreal closed interval [x,y]∗. That is, there is a hyperreal number z between x and y such that whenever x ≤ u ≤ y, f∗(z,a) ≥ f∗(u,a).

Theorem 3.34. (Hyperreal Mean Value Theorem) Suppose that for each real a, f(x,a) as a function of x is differentiable on the interior of I, and let g(x,a) = f′(x,a). Then for each hyperreal constant a and each x < y in I∗, there is a hyperreal number z such that x < z < y and

$$g^*(z,a) = \frac{f^*(y,a)-f^*(x,a)}{y-x}.$$

Proof. To illustrate the method we prove the Hyperreal Extreme Value Theorem. By the real Extreme Value Theorem, for each a, on every closed real subinterval [x0,y0] of I, f(z,a) has a maximum at some point z0 = g(x0,y0,a). Thus each real solution of

x0 ∈ I, y0 ∈ I, x0 ≤ u0 ≤ y0, a = a (24)

is a solution of

x0 ≤ g(x0,y0,a) ≤ y0, f(g(x0,y0,a),a) ≥ f(u0,a). (25)

By Transfer, every hyperreal solution of (24) is also a solution of (25). Therefore for each hyperreal constant a, f∗(z,a) has a maximum in [x,y]∗ at z = g∗(x,y,a). ⊣

We conclude this section with two applications of the Hyperreal Mean Value Theorem.

Theorem 3.35. (Second Derivative Test)
(i) If f′(c) = 0 and f″(c) < 0, then f has a local maximum at c.
(ii) If f′(c) = 0 and f″(c) > 0, then f has a local minimum at c.

Proof. We prove (i). Since f″(c) exists, f and f′ are defined on some real neighborhood of c. Let x ≈ c. By Theorem 3.27 it suffices to prove that f(c) ≥ f(x). We assume f(c) < f(x) and arrive at a contradiction. Say c < x. By the Hyperreal Mean Value Theorem there is a hyperreal point t such that

$$c < t < x, \quad f'(t) = \frac{f(x)-f(c)}{x-c}.$$

Then f′(t) > 0. Since f′(c) = 0, we have

$$\frac{f'(t)-f'(c)}{t-c} = \frac{f'(t)}{t-c} > 0.$$

Taking standard parts,

$$f''(c) = \mathrm{st}\left(\frac{f'(t)-f'(c)}{t-c}\right) \ge 0.$$

This contradicts the hypothesis f″(c) < 0. The case c > x is similar. We conclude that f(c) ≥ f(x). ⊣

Definition 3.36. A real function f is said to be uniformly differentiable at a real point c if f′(c) exists and whenever x ≈ c and Δx is nonzero infinitesimal,

$$f'(c) \approx \frac{f(x+\Delta x)-f(x)}{\Delta x}.$$

Note that uniform differentiability implies differentiability. Uniform differentiability will be useful when we study inverse functions in Chapter 7 and partial derivatives in Chapter 11. The next theorem compares uniform differentiability with continuous differentiability.

Theorem 3.37.
(i) If f is continuously differentiable at c, then f is uniformly differentiable at c.
(ii) If f is uniformly differentiable at every point of an open interval I, then f is continuously differentiable at every point of I.

Proof. (i) Assume f is continuously differentiable at c. Then f′ is continuous at c, and by Corollary 3.9, f′ is defined and hence f is differentiable at every point of some open neighborhood I of c. Let x ≈ c and let Δx be nonzero infinitesimal. By Theorem 1.28, x, x + Δx ∈ I∗. By the Hyperreal Mean Value Theorem there is a hyperreal number t between x and x + Δx such that

$$f'(t) = \frac{f(x+\Delta x)-f(x)}{\Delta x}.$$

Since t ≈ x ≈ c, we have f′(c) ≈ f′(t), so f is uniformly differentiable at c.

(ii) Assume f is uniformly differentiable at every point of an open interval I. Every real solution of

x ∈ I, ε > 0 (26)

is a partial hyperreal solution of

$$0 < \Delta x < \varepsilon, \quad \left| f'(x) - \frac{f(x+\Delta x)-f(x)}{\Delta x} \right| < \varepsilon,$$ (27)

for we may take Δx to be infinitesimal. Let x1 ≈ c ∈ I and let ε1 be positive infinitesimal. Since I is open, x1 ∈ I∗, hence (x1,ε1) is a hyperreal solution of (26). By the Partial Solution Theorem, every hyperreal solution of (26) is a partial hyperreal solution of (27). Therefore there is a Δx1 such that (27) holds. Then

$$f'(x_1) \approx \frac{f(x_1+\Delta x_1)-f(x_1)}{\Delta x_1}.$$

But Δx1 ≈ 0, so by the uniform differentiability of f at c (the hypothesis of (ii)),

$$f'(c) \approx \frac{f(x_1+\Delta x_1)-f(x_1)}{\Delta x_1}.$$

Therefore f′(x1) ≈ f′(c), so f′ is continuous at c and f is continuously differentiable at c. ⊣

The following corollary shows that if f has a continuous derivative then the hyperreal formula for f′(x) holds for finite hyperreal x as well as for real x.

Corollary 3.38. Suppose the derivative of f is continuous on an interval I (not necessarily open), c ∈ I, c ≈ x ≈ x + Δx, Δx ≠ 0, and x, x + Δx ∈ I∗. Then

$$f'(x) \approx \frac{f(x+\Delta x)-f(x)}{\Delta x}.$$

Proof. By Theorem 3.37, f is uniformly differentiable at c, so

$$f'(x) \approx f'(c) \approx \frac{f(x+\Delta x)-f(x)}{\Delta x}.$$ ⊣

Here is an example of a function f which is uniformly differentiable at 0 but not continuously differentiable at 0. Let f(x) be the function with domain [−1,1] such that f(0) = 0, f(−x) = f(x) for each x, $f(1/n) = n^{-2}$ for every positive integer n, and the graph of f(x) is a straight line on each subinterval [1/(n + 1), 1/n]. Then f is uniformly differentiable at 0. But f is not differentiable at 1/n for any positive integer n, so f′ is not defined on an open neighborhood of 0 and thus f′ cannot be continuous at 0.

This example shows that uniform differentiability of f at c does not imply that f is differentiable on some open neighborhood of c. But it does imply that f is continuous on some open neighborhood of c.

Theorem 3.39. If f is uniformly differentiable at c, then f is continuous on some open neighborhood I of c.

Proof. Let L = |f′(c)| + 1. It suffices to prove that on some real neighborhood (c−r,c+r) of c, we have the Lipschitz condition

|f(y) − f(x)| ≤ L|y − x| for all x,y ∈ (c−r,c+r). (28)

f is differentiable at c, so by Theorem 3.10 and Corollary 3.9, f is defined on some open neighborhood of c. Suppose there is no real number r > 0 such that (28) holds. Then every real solution of r > 0 is a partial solution of

|f(y) − f(x)| > L|y − x|, x,y ∈ (c−r,c+r). (29)

Let r1 be a positive infinitesimal. By the Partial Solution Theorem, there are hyperreal x1,y1 such that (29) holds. But then

$$\left| \frac{f(y_1)-f(x_1)}{y_1-x_1} \right| > |f'(c)| + 1, \quad x_1 \approx c, \quad y_1 \approx c,$$

contradicting the uniform differentiability of f at c. ⊣

CHAPTER 4

INTEGRATION

In this chapter we use hyperreal numbers to develop the Riemann Integral. To keep the theory as elementary as possible, we restrict ourselves to continuous real functions.

Permanent Assumption We assume throughout this chapter that f and g are real functions which are continuous on an interval I.

4A. The Definite Integral (§4.1)

Given a positive real function f, consider the region in the plane bounded by the lines x = a, x = b, y = 0, and the curve y = f(x). We call this the region under the curve y = f(x) from a to b. The area of this region may be regarded as a real function A(a,b) of two variables. In this and the next section we will define the definite integral

$$\int_a^b f(x)\,dx$$

and show that it is equal to the area of the region under y = f(x) from a to b. Our plan is as follows. First we list some properties which the intuitive concept of area has. Second we prove that the definite integral has these properties. Third we prove that the definite integral is the only function with these properties.

Definition 4.1. By an area function for f we mean a real function A(u,v), whose domain is the set of ordered pairs of elements of I, such that (i) A has the Addition Property: A(a,c) = A(a,b) + A(b,c) for all a,b,c ∈ I. (ii) A has the Rectangle Property: m(b−a) ≤ A(a,b) ≤ M(b−a) whenever a < b in I and f has minimum value m and maximum value M on [a,b].


The Rectangle Property states that the area of the region is between the areas of the inscribed and circumscribed rectangles. It follows at once from the Addition Property that

A(a,a) = 0, A(b,a) = −A(a,b).

Thus to specify an area function we need only specify the values of A(a,b) for a < b in I.

We now introduce the notion of a Riemann sum of a function with respect to a partition of an interval. For simplicity we will consider only partitions of [a,b] in which all subintervals except the last subinterval have the same length.

Definition 4.2. Let [a,b] be a subinterval of I and let Δx be a positive real number. The Riemann sum $\sum_a^b f(x)\,\Delta x$ is defined as the sum

$$\sum_a^b f(x)\,\Delta x = f(x_0)\,\Delta x + f(x_1)\,\Delta x + \cdots + f(x_{n-1})\,\Delta x + f(x_n)(b - x_n)$$

where n is the largest integer such that a + nΔx < b, and

x0 = a, x1 = a + Δx, ..., xn = a + nΔx.

Geometrically, the interval [a,b] is partitioned into subintervals of equal length Δx, except that if Δx does not evenly divide b−a then the last subinterval [xn,b] is shorter than Δx. The Riemann sum is equal to the sum of the areas of the vertical strips over each subinterval with height equal to the value of f(x) at the left end of the subinterval.

The Riemann sum $\sum_a^b f(x)\,\Delta x$ is a real function of the three variables a, b, Δx. If a and b are held fixed it becomes a function of the single variable Δx. If we replace the positive real Δx in this function by a positive infinitesimal dx, the natural extension gives us the infinite Riemann sum.
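Definition 4.2 translates directly into a short program for real Δx. The following is a minimal sketch, not part of the original text: the function name `riemann_sum` is an assumption, and floating-point arithmetic stands in for exact real arithmetic, so the computed n can differ by one when (b − a)/Δx is very close to an integer (the extra or missing strip then has width close to zero).

```python
import math


def riemann_sum(f, a, b, dx):
    """Riemann sum of Definition 4.2 for a < b and a positive real dx."""
    # n is the largest integer with a + n*dx < b
    n = math.ceil((b - a) / dx) - 1
    total = 0.0
    for k in range(n):
        total += f(a + k * dx) * dx      # full strips of width dx
    x_n = a + n * dx
    total += f(x_n) * (b - x_n)          # last strip, width at most dx
    return total


# dx = 0.25 divides [0, 1] evenly; dx = 0.4 leaves a short last strip [0.8, 1]
print(riemann_sum(lambda x: x, 0.0, 1.0, 0.25))
print(riemann_sum(lambda x: x, 0.0, 1.0, 0.4))
```

A program can only evaluate the finite Riemann sum. The infinite Riemann sum is its natural extension evaluated at an infinitesimal dx and has no direct floating-point counterpart.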

Definition 4.3. Given a continuous real function f on I and a subinterval [a,b] of I, let

$$S(\Delta x) = \sum_a^b f(x)\,\Delta x$$

be the finite Riemann sum. The infinite Riemann sum is the natural extension

$$S^*(dx) = \sum_a^b f(x)\,dx.$$

Since the finite Riemann sum is defined for all real Δx > 0, the infinite Riemann sum is defined for all hyperreal dx > 0. Our plan is to define the integral as the standard part of the infinite Riemann sum. First we must prove that this sum is finite, so its standard part exists.

Lemma 4.4. Let a < b in I and let dx be positive infinitesimal. Then the infinite Riemann sum $\sum_a^b f(x)\,dx$ is a finite hyperreal number.

Proof. By the Extreme Value Theorem 3.28, f has a minimum value m and a maximum value M on [a,b]. For each positive real Δx we have

$$\sum_a^b m\,\Delta x \le \sum_a^b f(x)\,\Delta x \le \sum_a^b M\,\Delta x,$$

and

$$\sum_a^b m\,\Delta x = m(b-a), \quad \sum_a^b M\,\Delta x = M(b-a).$$

Therefore every real solution of Δx > 0 is a solution of

$$m(b-a) \le \sum_a^b f(x)\,\Delta x \le M(b-a).$$ (30)

By Transfer, since dx > 0, dx is a hyperreal solution of (30), and therefore $\sum_a^b f(x)\,dx$ is finite. ⊣

Definition 4.5. Let a < b in I and let dx be positive infinitesimal. The definite integral of f from a to b with respect to dx is the standard part of the infinite Riemann sum,

$$\int_a^b f(x)\,dx = \mathrm{st}\left( \sum_a^b f(x)\,dx \right).$$

Moreover,

$$\int_a^a f(x)\,dx = 0, \quad \int_b^a f(x)\,dx = -\int_a^b f(x)\,dx.$$

For each fixed positive infinitesimal dx, the definite integral $\int_u^w f(x)\,dx$ is a real function of two variables u and w. It does not depend on the dummy variable x. We always use matching symbols for the dummy variable x and the infinitesimal dx. This convention identifies the dummy variable when integrating a function of two or more variables. For example,

$$\int_0^1 x^2 t\,dx = \tfrac{1}{3}\,t, \quad \int_0^1 x^2 t\,dt = \tfrac{1}{2}\,x^2.$$

We now develop some properties of the definite integral.

Theorem 4.6. Let a < b in I, let c be a real constant, and let dx be positive infinitesimal. Then
(i) $\int_a^b c\,dx = c(b-a)$
(ii) $\int_a^b c f(x)\,dx = c \int_a^b f(x)\,dx$
(iii) $\int_a^b (f(x) + g(x))\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx$
(iv) If f(x) ≤ g(x) for all x ∈ [a,b], then $\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx$


Proof. For each case the proof has three steps. First, verify the analogous formula for finite Riemann sums. Second, use the Transfer Axiom to prove the formula for infinite Riemann sums. Third, take standard parts. ⊣

The following theorem shows that the definite integral $\int_a^b f(x)\,dx$ does not depend on the infinitesimal dx.

Theorem 4.7. Let a < b in I and let dx and du be positive infinitesimals. Then

$$\int_a^b f(x)\,dx = \int_a^b f(u)\,du.$$

Proof. It suffices to prove that for every positive real number r,

$$\int_a^b f(x)\,dx \le \int_a^b f(u)\,du + r,$$

for then, interchanging the roles of dx and du, the two integrals differ by at most r for every positive real r, and so are equal. Let c = r/(b−a). We will show that

$$\sum_a^b f(x)\,dx \le \sum_a^b (f(u) + c)\,du,$$ (31)

whence by Theorem 4.6,

$$\int_a^b f(x)\,dx \le \int_a^b (f(u) + c)\,du = \int_a^b f(u)\,du + r.$$

In Elementary Calculus, Formula (31) was justified intuitively. To give a rigorous proof of (31) we use the Partial Solution Theorem 1.20. Let Δx and Δu be positive real numbers. If

$$\sum_a^b f(x)\,\Delta x > \sum_a^b (f(u) + c)\,\Delta u,$$

there must be a point at which a rectangle in $\sum_a^b f(x)\,\Delta x$ is above a rectangle in $\sum_a^b (f(u) + c)\,\Delta u$. So there must be a pair of points x,u in [a,b] such that

x − Δu ≤ u ≤ x + Δx, f(x) > f(u) + c.

Thus every real solution of

$$\Delta x > 0, \quad \Delta u > 0, \quad \sum_a^b f(x)\,\Delta x > \sum_a^b (f(u) + c)\,\Delta u$$ (32)

is a partial real solution of

a ≤ x ≤ b, a ≤ u ≤ b, x − Δu ≤ u ≤ x + Δx, f(x) > f(u) + c. (33)

Now suppose (31) fails for dx and du, so

$$\sum_a^b f(x)\,dx > \sum_a^b (f(u) + c)\,du.$$

Then (dx,du) is a hyperreal solution of (32). By the Partial Solution Theorem 1.20, there is a hyperreal solution (dx,du,x1,u1) of (33). Since dx and du are infinitesimal, (33) implies that x1 ≈ u1 and f(x1) ≉ f(u1), contradicting the continuity of f. We conclude that (31) is true. ⊣

Corollary 4.8.

$$\int_a^b f(x)\,dx = \lim_{\Delta x \to 0^+} \sum_a^b f(x)\,\Delta x.$$

Proof. By Theorem 4.7 and Definition 4.5, for every positive infinitesimal dx = Δx we have

$$\int_a^b f(u)\,du = \int_a^b f(x)\,dx = \mathrm{st}\left( \sum_a^b f(x)\,\Delta x \right).$$ ⊣

From now on when we write $\int_a^b f(x)\,dx$, it is to be understood that dx is some positive infinitesimal. By Theorem 4.7, it does not matter which one.

Theorem 4.9. The definite integral $\int_a^b f(x)\,dx$ is an area function for f.

Proof. The Rectangle Property follows at once from Theorem 4.6, which gives

$$m \le f(x) \le M, \quad m(b-a) = \int_a^b m\,dx \le \int_a^b f(x)\,dx \le \int_a^b M\,dx = M(b-a).$$

By Theorem 4.7, it suffices to prove the Addition Property for one positive infinitesimal dx. Let a < b < c in I. Let n be a positive integer and let Δx = (b−a)/n. Then b = a + nΔx is an endpoint of one of the subintervals of length Δx in the partition of [a,c], so

$$\sum_a^c f(x)\,\Delta x = \sum_a^b f(x)\,\Delta x + \sum_b^c f(x)\,\Delta x.$$ (34)

Thus every real solution of

n ∈ ℤ, 0 < n, Δx = (b−a)/n (35)

is a solution of (34). Now let n1 be a positive infinite hyperinteger and let dx = (b − a)/n1. Then (n1,dx) is a hyperreal solution of (35). By the Transfer Axiom, dx is a solution of (34), so

$$\sum_a^c f(x)\,dx = \sum_a^b f(x)\,dx + \sum_b^c f(x)\,dx.$$

Since dx is positive infinitesimal, we may take standard parts and obtain the Addition Property for f and dx,

$$\int_a^c f(x)\,dx = \int_a^b f(x)\,dx + \int_b^c f(x)\,dx.$$ ⊣
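The role of the choice Δx = (b − a)/n in the proof of the Addition Property can be seen with finite Riemann sums. The sketch below is illustrative only: `riemann_sum` and `addition_gap` are hypothetical names, and the sums follow Definition 4.2 with floating-point arithmetic in place of exact reals. When b is a partition point of [a,c] the sum over [a,c] splits exactly at b. For a Δx that does not divide b − a, the finite sums need not agree.

```python
import math


def riemann_sum(f, a, b, dx):
    """Riemann sum of Definition 4.2 (left endpoints, short last strip)."""
    n = math.ceil((b - a) / dx) - 1      # largest n with a + n*dx < b
    x_n = a + n * dx
    return sum(f(a + k * dx) * dx for k in range(n)) + f(x_n) * (b - x_n)


def addition_gap(f, a, b, c, dx):
    """Difference between the sum over [a, c] and the two partial sums."""
    whole = riemann_sum(f, a, c, dx)
    parts = riemann_sum(f, a, b, dx) + riemann_sum(f, b, c, dx)
    return whole - parts


identity = lambda x: x
# dx = (b - a)/n with n = 4: b is a partition point of [a, c], so the gap vanishes
print(addition_gap(identity, 0.0, 1.0, 1.75, 0.25))
# dx = 0.4 does not divide b - a = 1, and the finite sums differ
print(addition_gap(identity, 0.0, 1.0, 2.0, 0.4))
```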

4B. Fundamental Theorem of Calculus (§4.2)

We will prove that the definite integral is the only area function for f, and use that fact to prove the Fundamental Theorem of Calculus. We first introduce lower and upper Riemann sums. For simplicity we consider only the case where Δx evenly divides the interval [a,b].

Definition 4.10. Let Δx = (b − a)/n for some positive integer n. The lower Riemann sum of f is the sum

$$\sum_a^b m(x,x+\Delta x)\,\Delta x = m(x_0,x_1)\,\Delta x + m(x_1,x_2)\,\Delta x + \cdots + m(x_{n-1},x_n)\,\Delta x,$$

and the upper Riemann sum of f is the sum

$$\sum_a^b M(x,x+\Delta x)\,\Delta x = M(x_0,x_1)\,\Delta x + M(x_1,x_2)\,\Delta x + \cdots + M(x_{n-1},x_n)\,\Delta x,$$

where

m(x, x + Δx) = minimum value of f on [x, x + Δx],
M(x, x + Δx) = maximum value of f on [x, x + Δx],
xk = a + kΔx.

Geometrically, the lower Riemann sum is the sum of the areas of the inscribed rectangles, and the upper Riemann sum is the sum of the areas of the circumscribed rectangles. For a given real function f continuous on [a,b], the lower and upper Riemann sums

$$\sum_a^b m(x,x+\Delta x)\,\Delta x, \quad \sum_a^b M(x,x+\Delta x)\,\Delta x, \quad \Delta x = (b-a)/n$$

are real functions of n, and are defined whenever n is a positive integer. By Transfer, their natural extensions are defined for any positive hyperinteger. Given a positive infinite hyperinteger H, we let dx = (b−a)/H and call the natural extensions

$$\sum_a^b m(x,x+dx)\,dx, \quad \sum_a^b M(x,x+dx)\,dx$$

the infinite lower and upper Riemann sums of f with respect to dx.
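For finite n the lower and upper Riemann sums can be computed directly once the minimum and maximum on each subinterval are known. The following sketch is not from the text and makes a simplifying assumption, flagged in the code: f is increasing on [a,b], so the minimum on each subinterval sits at its left end and the maximum at its right end. The name `lower_upper_sums` is hypothetical.

```python
def lower_upper_sums(f, a, b, n):
    """Lower and upper Riemann sums for dx = (b - a)/n.

    Assumption (not from the text): f is increasing on [a, b], so on each
    subinterval the minimum is at the left end and the maximum at the right.
    """
    dx = (b - a) / n
    xs = [a + k * dx for k in range(n + 1)]
    lower = sum(f(xs[k]) * dx for k in range(n))
    upper = sum(f(xs[k + 1]) * dx for k in range(n))
    return lower, upper


square = lambda x: x * x
for n in (10, 100, 1000):
    lo, up = lower_upper_sums(square, 0.0, 1.0, n)
    # for increasing f the gap telescopes to (f(b) - f(a)) * dx
    print(n, lo, up, up - lo)
```

In this special case the gap between the two sums is (f(b) − f(a))Δx, which shrinks with Δx. This is the finite counterpart of the two infinite sums being infinitely close when dx is infinitesimal.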

Lemma 4.11. The infinite lower and upper Riemann sums are infinitely close to each other,

$$\sum_a^b m(x,x+dx)\,dx \approx \sum_a^b M(x,x+dx)\,dx.$$

Proof. For each positive integer n with Δx = (b−a)/n, we have

$$\sum_a^b m(x,x+\Delta x)\,\Delta x \le \sum_a^b M(x,x+\Delta x)\,\Delta x,$$

so by Transfer,

$$\sum_a^b m(x,x+dx)\,dx \le \sum_a^b M(x,x+dx)\,dx.$$

We must show that for each positive real r,

$$\sum_a^b M(x,x+dx)\,dx < \sum_a^b m(x,x+dx)\,dx + r.$$ (36)

The proof is like that of Theorem 4.7. Let c = r/(b−a). Consider a positive integer n with Δx = (b−a)/n. If

$$\sum_a^b M(x,x+\Delta x)\,\Delta x \ge \sum_a^b m(x,x+\Delta x)\,\Delta x + r,$$

then there must be an x such that M(x, x + Δx) ≥ m(x, x + Δx) + c. But M(x, x + Δx) = f(y) and m(x, x + Δx) = f(z) for some y,z ∈ [x, x + Δx]. Thus any real solution of

$$n \in \mathbb{Z}, \quad 0 < n, \quad \Delta x = (b-a)/n, \quad \sum_a^b M(x,x+\Delta x)\,\Delta x \ge \sum_a^b m(x,x+\Delta x)\,\Delta x + r$$ (37)

is a partial real solution of

a ≤ y ≤ b, a ≤ z ≤ b, |y − z| ≤ Δx, f(y) ≥ f(z) + c. (38)

Now let dx = (b−a)/n1 where n1 is a positive infinite hyperinteger. Suppose (36) fails,

$$\sum_a^b M(x,x+dx)\,dx \ge \sum_a^b m(x,x+dx)\,dx + r.$$

Then (n1,dx) is a hyperreal solution of (37). By the Partial Solution Theorem, (n1,dx) is a partial hyperreal solution of (38), so there is a solution (n1,dx,y1,z1) of (38). It follows that

y1 ≈ z1, f(y1) ≉ f(z1),

contradicting the continuity of f. We conclude that (36) holds. ⊣

Theorem 4.12. The definite integral is the only area function for f.

Proof. Let A(u,v) and B(u,v) be area functions for f. Let a < b in I. For each positive integer n with Δx = (b−a)/n, the Rectangle Property gives

$$m(x_k,x_{k+1})\,\Delta x \le A(x_k,x_{k+1}) \le M(x_k,x_{k+1})\,\Delta x, \quad k = 0,1,\ldots,n-1,$$

and by the Addition Property,

$$\sum_a^b m(x,x+\Delta x)\,\Delta x \le A(a,b) \le \sum_a^b M(x,x+\Delta x)\,\Delta x.$$

Let n1 be a positive infinite hyperinteger and dx = (b−a)/n1. By Transfer,

$$\sum_a^b m(x,x+dx)\,dx \le A(a,b) \le \sum_a^b M(x,x+dx)\,dx.$$

Similarly,

$$\sum_a^b m(x,x+dx)\,dx \le B(a,b) \le \sum_a^b M(x,x+dx)\,dx.$$

By Lemma 4.11,

A(a,b) ≈ B(a,b),

and since A(a,b) and B(a,b) are real they must be equal. If b ≤ a they are still equal because A(a,a) = B(a,a) = 0, and A(b,a) = −A(a,b) = −B(a,b) = B(b,a). ⊣

Definition 4.13. Suppose the domain of f is an open interval I. A function F is said to be an antiderivative of f on I if f is the derivative of F on I.

Theorem 4.14. (Fundamental Theorem of Calculus) Suppose I is an open interval, a ≤ b in I, and F is an antiderivative of f on I. Then

$$\int_a^b f(x)\,dx = F(b) - F(a).$$

Proof. Let D(a,b) = F(b) − F(a). We show that D is an area function for f. Since the definite integral is the only area function for f by Theorem 4.12, it will follow that D is equal to the definite integral of f.

Addition Property:

D(a,c) = F(c) − F(a) = (F(b) − F(a)) + (F(c) − F(b)) = D(a,b) + D(b,c).

Rectangle Property: Let a < b in I. By the Mean Value Theorem 3.30 there is a point c ∈ (a,b) such that

$$f(c) = F'(c) = \frac{F(b)-F(a)}{b-a},$$

f(c)(b−a) = F(b) − F(a) = D(a,b).

By the Extreme Value Theorem 3.28, f has a minimum value m and a maximum value M on [a,b]. Then

m ≤ f(c) ≤ M, m(b−a) ≤ f(c)(b−a) ≤ M(b−a), m(b−a) ≤ D(a,b) ≤ M(b−a).

Thus D has the Rectangle Property and hence D is an area function for f. ⊣

Theorem 4.15. Suppose the domain of f is an open interval I, and F is an antiderivative of f. Then the set of all antiderivatives of f is equal to the set of all functions which differ from F by a constant.

Proof. For any constant C0, F(x) + C0 has the same derivative as F and hence is an antiderivative of f. Suppose G is an antiderivative of f, and let H(x) = F(x) − G(x). We prove that H is constant. We have

H′(x) = F′(x) − G′(x) = f(x) − f(x) = 0.

By the Mean Value Theorem 3.30, whenever a < b in I there is a point c in (a,b) such that

$$0 = H'(c) = \frac{H(b)-H(a)}{b-a}.$$

Therefore H(b) = H(a), and H is a constant function. ⊣

Definition 4.16. The set of all antiderivatives of f is called the indefinite integral of f, and is written $\int f(x)\,dx$. The set of all functions which differ from F by a constant is denoted by F(x) + C.

If F is an antiderivative of f, then Theorem 4.15 states that

$$\int f(x)\,dx = F(x) + C.$$

The techniques for evaluating integrals in this treatment are exactly the same as in the standard calculus treatment. As usual, one can evaluate a definite integral by finding an antiderivative and using the Fundamental Theorem of Calculus.

Notice that we have not yet proved that every continuous real function f has an antiderivative. This will be done in the next section.
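The Fundamental Theorem of Calculus can be checked numerically against finite Riemann sums. The sketch below is an illustration under stated assumptions rather than anything from the text: it takes f = cos with antiderivative F = sin on [0, π/2], and the helper `riemann_sum` (a hypothetical name) follows Definition 4.2 with real Δx. The finite sums approach F(b) − F(a) as Δx shrinks, which is what Corollary 4.8 leads one to expect.

```python
import math


def riemann_sum(f, a, b, dx):
    """Riemann sum of Definition 4.2 (left endpoints, short last strip)."""
    n = math.ceil((b - a) / dx) - 1      # largest n with a + n*dx < b
    x_n = a + n * dx
    return sum(f(a + k * dx) * dx for k in range(n)) + f(x_n) * (b - x_n)


a, b = 0.0, math.pi / 2
exact = math.sin(b) - math.sin(a)        # F(b) - F(a) with F = sin, f = cos
for dx in (1e-1, 1e-2, 1e-3, 1e-4):
    approx = riemann_sum(math.cos, a, b, dx)
    print(dx, approx, abs(approx - exact))
```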

4C. Second Fundamental Theorem of Calculus (§4.2)

In this section let I be an arbitrary interval and suppose f has domain I and is continuous on I. We may regard the definite integral $\int_u^w f(t)\,dt$ as a real function of two variables defined for all u,w ∈ I. By the Function Axiom, $\int_u^w f(t)\,dt$ is also defined when u and w are hyperreal numbers in I∗.

Theorem 4.17. (Second Fundamental Theorem of Calculus) Let a ∈ I and define F(x) for all x ∈ I by

$$F(x) = \int_a^x f(t)\,dt.$$

(i) F is continuous on I.
(ii) F is an antiderivative of f on the interior of I.

Proof. (i) Let c ∈ I and let Δx be an infinitesimal such that c + Δx ∈ I∗. We must show that

F(c + Δx) ≈ F(c). (39)

Suppose Δx > 0, and let b be a point in I such that c < c + Δx < b. By the Extreme Value Theorem 3.28, f has a minimum value m and a maximum value M in [c,b]. For each positive real number u < b−c we have

$$F(c+u) - F(c) = \int_c^{c+u} f(t)\,dt, \quad mu \le \int_c^{c+u} f(t)\,dt \le Mu,$$

and hence

mu ≤ F(c + u) − F(c) ≤ Mu.

By Transfer we have

mΔx ≤ F(c + Δx) − F(c) ≤ MΔx,

and (39) follows. The case Δx < 0 is similar.

(ii) Let c be an interior point of I and let Δx be nonzero infinitesimal. Again suppose Δx > 0 and let b ∈ I, c < c + Δx < b. Consider a real number u ∈ (0,b−c). By the Extreme Value Theorem 3.28, in the interval [c,c + u] the function f has a minimum at some point y and a maximum at some point z. Then

$$F(c+u) - F(c) = \int_c^{c+u} f(t)\,dt, \quad f(y)u \le \int_c^{c+u} f(t)\,dt \le f(z)u.$$

Then any real solution of

0 < u < b−c (40)

is a partial real solution of

c ≤ y ≤ c + u, c ≤ z ≤ c + u, f(y)u ≤ F(c + u) − F(c) ≤ f(z)u. (41)

u = Δx is a hyperreal solution of (40). By the Partial Solution Theorem, it is a partial hyperreal solution of (41). Thus there are hyperreal numbers y1,z1 such that

y1 ≈ c, z1 ≈ c, f(y1)Δx ≤ F(c + Δx) − F(c) ≤ f(z1)Δx.

Then

$$f(y_1) \le \frac{F(c+\Delta x)-F(c)}{\Delta x} \le f(z_1).$$

Using the continuity of f,

$$f(c) = \mathrm{st}(f(y_1)) \le \mathrm{st}\left( \frac{F(c+\Delta x)-F(c)}{\Delta x} \right) \le \mathrm{st}(f(z_1)) = f(c),$$

whence

$$f(c) = \mathrm{st}\left( \frac{F(c+\Delta x)-F(c)}{\Delta x} \right).$$

A similar argument works when Δx < 0. It follows that f(c) = F′(c). ⊣

We now use the Second Fundamental Theorem to give a short alternate proof of the Fundamental Theorem of Calculus in Section 4B. This proof does not depend on the fact that the area function for f is unique.

Alternate Proof of the Fundamental Theorem of Calculus. Let G be an antiderivative of f. Since any two antiderivatives differ by a constant,

$$G(x) = \int_a^x f(t)\,dt + C_0$$

for some constant C0. Then

$$G(b) - G(a) = \left( \int_a^b f(t)\,dt + C_0 \right) - \left( \int_a^a f(t)\,dt + C_0 \right) = \int_a^b f(t)\,dt.$$
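The squeeze in part (ii) of the Second Fundamental Theorem has a visible finite analogue: for small real h, the quotient (F(c + h) − F(c))/h lies between the minimum and maximum of f on [c, c + h] and approaches f(c). The sketch below is a hedged illustration, not the author's construction: `integral` and `difference_quotient` are hypothetical names, f = cos and c = 1 are arbitrary choices, and a left-endpoint sum with equal strips stands in for the definite integral.

```python
import math


def integral(f, u, w, n=2000):
    """Approximate the definite integral of f from u to w.

    Assumption: a left-endpoint Riemann sum with n equal strips stands in
    for the standard part of the infinite Riemann sum.
    """
    dx = (w - u) / n
    return sum(f(u + k * dx) for k in range(n)) * dx


def difference_quotient(f, c, h):
    # F(c + h) - F(c) equals the integral of f from c to c + h, as in the proof
    return integral(f, c, c + h) / h


c = 1.0
for h in (1e-1, 1e-2, 1e-3):
    print(h, difference_quotient(math.cos, c, h), math.cos(c))
```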

CHAPTER 5

LIMITS

5A. ε,δ Conditions for Limits (§5.8, §5.1)

We show that the infinitesimal definitions of limit, continuity, uniform continuity, differentiability, and uniform differentiability are equivalent to the standard ε,δ definitions. These equivalence theorems will be useful later on. In this section, f is a real function. We start with the equivalence theorem for finite limits.

Theorem 5.1. Let c,L be real numbers. The following are equivalent.
(i) $\lim_{x \to c} f(x) = L$. That is, whenever x ≈ c but x ≠ c, we have f(x) ≈ L.
(ii) There exists a hyperreal δ > 0 such that whenever 0 < |x−c| < δ, we have f(x) ≈ L.
(iii) The ε,δ condition: For every real ε > 0 there is a real δ > 0 such that whenever x is real and 0 < |x−c| < δ, we have |f(x)−L| < ε.

Proof. (i) obviously implies (ii), with δ being any positive infinitesimal.

To prove (ii) implies (iii), assume (iii) fails for some real ε > 0. Then every real δ > 0 is a partial real solution of

0 < |x−c| < δ, |f(x)−L| ≥ ε. (42)

Let δ1 > 0 be hyperreal. By the Partial Solution Theorem 1.20, there is a hyperreal x1 such that (42) holds, and therefore

0 < |x1−c| < δ1, f(x1) ≉ L.

Thus if (iii) fails then (ii) fails, so (ii) implies (iii).

To prove (iii) implies (i), assume (iii). Let x1 ≈ c with x1 ≠ c. Let ε be any positive real number, and let δ be the corresponding positive real number in the ε,δ condition. Then every real solution of 0 < |x−c| < δ is a solution of |f(x)−L| < ε. We have 0 < |x1−c| < δ, and then by Transfer, |f(x1)−L| < ε. Since this holds for all positive real ε, f(x1) ≈ L. ⊣
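Condition (iii) can be made concrete for a specific function. The sketch below is an illustrative assumption, not an example from the text: it takes f(x) = x², c = 2, L = 4 and the choice δ = min(1, ε/5), and spot-checks the ε,δ condition at sample points. The names `delta_for` and `check` are hypothetical, and sampling gives evidence only, not a proof.

```python
def delta_for(eps):
    """A delta that works for f(x) = x*x at c = 2 with L = 4.

    Illustrative choice, not from the text: if |x - 2| < 1 then |x + 2| < 5,
    so |x*x - 4| = |x - 2| * |x + 2| < 5 * delta <= eps.
    """
    return min(1.0, eps / 5.0)


def check(eps, samples=1000):
    """Test the eps, delta condition at sample points with 0 < |x - 2| < delta."""
    delta = delta_for(eps)
    for k in range(1, samples):
        offset = delta * k / samples          # 0 < offset < delta
        for x in (2.0 - offset, 2.0 + offset):
            if not abs(x * x - 4.0) < eps:
                return False
    return True


for eps in (1.0, 0.1, 0.001):
    print(eps, delta_for(eps), check(eps))
```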


Condition (ii) in the above theorem is sometimes easier to verify than (i). Here is the equivalence theorem for continuity.

Corollary 5.2. Let c be a real number. The following are equivalent.
(i) f is continuous at c. That is, whenever x ≈ c, we have f(x) ≈ f(c).
(ii) There is a hyperreal δ > 0 such that whenever |x − c| < δ, we have f(x) ≈ f(c).
(iii) The ε,δ condition: For every real ε > 0 there is a real δ > 0 such that for all real x ∈ (c−δ,c + δ), we have |f(x)−f(c)| < ε.

We next give the equivalence theorem for differentiability.

Corollary 5.3. Let c,S be real numbers. The following are equivalent.
(i) f′(c) = S. That is, whenever Δx ≈ 0 but Δx ≠ 0, we have

$$\frac{f(c+\Delta x)-f(c)}{\Delta x} \approx S.$$

(ii) There is a hyperreal δ > 0 such that whenever 0 < |Δx| < δ, we have

$$\frac{f(c+\Delta x)-f(c)}{\Delta x} \approx S.$$

(iii) The ε,δ condition: For every real ε > 0 there is a real δ > 0 such that whenever Δx is real and 0 < |Δx| < δ, we have

$$\left| \frac{f(c+\Delta x)-f(c)}{\Delta x} - S \right| < \varepsilon.$$

Here is the equivalence theorem for uniform differentiability.

Theorem 5.4. Let c,S be real numbers. The following are equivalent.
(i) f′(c) = S and f is uniformly differentiable at c.
(ii) There is a hyperreal δ > 0 such that whenever 0 < |Δx| < δ and |x−c| < δ, we have

$$\frac{f(x+\Delta x)-f(x)}{\Delta x} \approx S.$$

(iii) The ε,δ condition: For every real ε > 0 there is a real δ > 0 such that for all real Δx,x with 0 < |Δx| < δ and |x−c| < δ, we have

$$\left| \frac{f(x+\Delta x)-f(x)}{\Delta x} - S \right| < \varepsilon.$$

Proof. The proof is similar to that of Theorem 5.1. (i) clearly implies (ii) where δ is any positive infinitesimal. Assume (iii) fails for some real ε > 0. Then any real δ > 0 is a partial real solution of

$$0 < |\Delta x| < \delta, \quad |x-c| < \delta, \quad \left| \frac{f(x+\Delta x)-f(x)}{\Delta x} - S \right| \ge \varepsilon.$$ (43)

Let δ1 > 0 be hyperreal. By the Partial Solution Theorem there are hyperreal Δx1 and x1 such that (43) holds, and therefore (ii) fails. Thus (ii) implies (iii).

Assume (iii), let Δx1 ≈ 0, Δx1 ≠ 0, and x1 ≈ c. Let ε > 0 be real and take the corresponding real δ > 0. Then every real solution of |x−c| < δ, 0 < |Δx| < δ is a solution of

$$\left| \frac{f(x+\Delta x)-f(x)}{\Delta x} - S \right| < \varepsilon.$$

By Transfer, this also holds for Δx1 and x1, and therefore (i) holds. ⊣

We now state, without proof, versions of Theorem 5.1 and Corollary 5.2 restricted to a set Y ⊆ ℝ.

Theorem 5.5. Let Y ⊆ ℝ and c ∈ Y, and let L be real. The following are equivalent.
(i) $\lim_{x \to c,\, x \in Y} f(x) = L$. That is, whenever x ∈ Y∗, x ≈ c, and x ≠ c, we have f(x) ≈ L.
(ii) There is a hyperreal δ > 0 such that whenever x ∈ Y∗ and 0 < |x−c| < δ, we have f(x) ≈ L.
(iii) The ε,δ condition: For every real ε > 0 there is a real δ > 0 such that whenever x ∈ Y and 0 < |x−c| < δ, we have |f(x)−L| < ε.

Corollary 5.6. Let Y ⊆ ℝ. The following are equivalent.
(i) f is continuous on Y. That is, whenever c ∈ Y, x ∈ Y∗, x ≈ c, and x ≠ c, we have f(x) ≈ f(c).
(ii) For each c ∈ Y there exists a hyperreal δ > 0 such that whenever x ∈ Y∗ and 0 < |x−c| < δ, we have f(x) ≈ f(c).
(iii) The ε,δ condition: For every real ε > 0 and c ∈ Y there is a real δ > 0 such that whenever x ∈ Y and 0 < |x−c| < δ, we have |f(x)−f(c)| < ε.

Our next result is an equivalence theorem for uniform continuity.

Theorem 5.7. Let Y ⊆ ℝ. The following are equivalent.
(i) f is uniformly continuous on Y. That is, whenever x,y ∈ Y∗ and x ≈ y, we have f(x) ≈ f(y).
(ii) There is a hyperreal δ > 0 such that whenever x,y ∈ Y∗ and |x−y| < δ, we have f(x) ≈ f(y).
(iii) The ε,δ condition: For every real ε > 0 there is a real δ > 0 such that whenever x,y ∈ Y and 0 < |x−y| < δ, we have |f(x)−f(y)| < ε.

Proof. (i) trivially implies (ii). Assume (iii) fails for some real ε > 0. Then every real δ > 0 is a partial real solution of

x ∈ Y, y ∈ Y, |x−y| < δ, |f(x)−f(y)| ≥ ε. (44)

Hence every hyperreal δ1 > 0 is a partial hyperreal solution of (44). So there exist x1,y1 ∈ Y∗ such that |x1−y1| < δ1 but f(x1) ≉ f(y1). This shows that the failure of (iii) implies the failure of (ii), so (ii) implies (iii).

Assume (iii). Let x₁, y₁ ∈ Y* and x₁ ≈ y₁. Let ε be any positive real and let δ > 0 be the corresponding number in the ε,δ condition. Every real solution of

\[ x \in Y, \quad y \in Y, \quad |x - y| < \delta \tag{45} \]

is a solution of

\[ |f(x) - f(y)| < \varepsilon. \tag{46} \]

(x₁, y₁) is a hyperreal solution of (45). By Transfer, (x₁, y₁) is a hyperreal solution of (46). Since this holds for all real ε > 0, f(x₁) ≈ f(y₁). ⊣

We conclude this section with a discussion of infinite limits.

Definition 5.8. Let c and L be real numbers.
$\lim_{x \to \infty} f(x) = L$ if f(H) ≈ L for every positive infinite H.
$\lim_{x \to c} f(x) = \infty$ if f(x) is positive infinite whenever x ≈ c but x ≠ c.

We state equivalence theorems for these limits without proof.

Theorem 5.9. The following are equivalent, where L is real.
(i) $\lim_{x \to \infty} f(x) = L$.
(ii) There is a hyperreal number K such that whenever H > K, we have f(H) ≈ L.
(iii) The ε,M condition: For every real ε > 0 there is a real number M such that whenever x is real and x > M, we have |f(x) − L| < ε.

Theorem 5.10. The following are equivalent, where c is real.
(i) $\lim_{x \to c} f(x) = \infty$.
(ii) There is a hyperreal δ > 0 such that whenever 0 < |x − c| < δ, f(x) is positive infinite.
(iii) The M,δ condition: For every real number M there is a real δ > 0 such that whenever x is real and 0 < |x − c| < δ, we have f(x) > M.

Limits such as $\lim_{x \to \infty} f(x) = \infty$, negative infinite limits, and infinite limits restricted to a set Y ⊆ ℝ, are defined analogously and have similar equivalence theorems.
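The ε,M condition of Theorem 5.9 and the M,δ condition of Theorem 5.10 can be explored numerically. The following Python sketch is only an illustration: the functions f(x) = 1/x and g(x) = 1/x², the witnesses M = 1/ε and δ = 1/√M, and all helper names are assumptions chosen for this example, and checking finitely many sample points does not prove a limit.

```python
import math

def f(x):
    return 1.0 / x          # illustrative function; its limit as x -> infinity is 0

def g(x):
    return 1.0 / (x * x)    # illustrative function; its limit as x -> 0 is infinity

def M_for_epsilon(eps):
    """Witness for the eps,M condition of f: x > 1/eps forces |f(x) - 0| < eps."""
    return 1.0 / eps

def delta_for_M(M):
    """Witness for the M,delta condition of g: 0 < |x| < delta forces g(x) > M."""
    return 1.0 / math.sqrt(M) if M > 0 else 1.0

def check_eps_M(eps, samples=1000):
    M = M_for_epsilon(eps)
    xs = [M + 0.5 * (k + 1) for k in range(samples)]                # sample points with x > M
    return all(abs(f(x) - 0.0) < eps for x in xs)

def check_M_delta(M, samples=1000):
    delta = delta_for_M(M)
    xs = [delta * (k + 1) / (samples + 1) for k in range(samples)]  # 0 < x < delta
    xs += [-x for x in xs]                                          # -delta < x < 0
    return all(g(x) > M for x in xs)

print(check_eps_M(1e-3), check_M_delta(1e6))
```

The witnesses are computed from ε and M alone, which mirrors the order of quantifiers in condition (iii) of each theorem.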

5B. L’Hospital’s Rule (§5.2)

In this section f and g are real functions. The proof of l’Hospital’s Rule uses the Generalized Mean Value Theorem, whose proof is elementary and can be found in Elementary Calculus.

Theorem 5.11. (Generalized Mean Value Theorem) Suppose f and g are continuous on the closed interval [a,b] and differentiable on the open interval (a,b). Assume further that g'(x) ≠ 0 for x ∈ (a,b). Then there is a point t ∈ (a,b) such that

\[ \frac{f'(t)}{g'(t)} = \frac{f(b) - f(a)}{g(b) - g(a)}. \]

Theorem 5.12. (L’Hospital’s Rule for 0/0) Suppose that for all x in some real open interval (c,b), f'(x) and g'(x) exist and g'(x) ≠ 0. Assume that

\[ \lim_{x \to c^+} f(x) = 0, \qquad \lim_{x \to c^+} g(x) = 0. \]

If $\lim_{x \to c^+} (f'(x)/g'(x))$ exists, then

\[ \lim_{x \to c^+} \frac{f(x)}{g(x)} = \lim_{x \to c^+} \frac{f'(x)}{g'(x)}. \]

Proof. Let $\lim_{x \to c^+} (f'(x)/g'(x)) = L$. We may set f(c) = 0 and g(c) = 0, so f and g are continuous on [c,b). By the Generalized Mean Value Theorem, every real solution of c < x < b is a partial real solution of

\[ c < t < x, \qquad \frac{f'(t)}{g'(t)} = \frac{f(x) - f(c)}{g(x) - g(c)}, \]

which simplifies to

\[ c < t < x, \qquad \frac{f'(t)}{g'(t)} = \frac{f(x)}{g(x)}. \tag{47} \]

Now let x₁ ≈ c and x₁ > c. By the Partial Solution Theorem there exists t₁ such that (47) holds. Thus

\[ t_1 \ne c, \qquad t_1 \approx c, \qquad \frac{f'(t_1)}{g'(t_1)} = \frac{f(x_1)}{g(x_1)}. \]

Since $L = \lim_{t \to c^+} (f'(t)/g'(t))$, we have

\[ L \approx \frac{f'(t_1)}{g'(t_1)} = \frac{f(x_1)}{g(x_1)}, \qquad L = \lim_{x \to c^+} \frac{f(x)}{g(x)}. \]

⊣

L’Hospital’s Rule also holds when either c or L or both are replaced by ∞ or −∞, and for x → c− and x → c, with only routine changes in the proof. The proof of l’Hospital’s Rule for ∞/∞ is more difficult.

Theorem 5.13. (L’Hospital’s Rule for ∞/∞) Suppose that for all x in some real open interval (c,b), f'(x) and g'(x) exist and g'(x) ≠ 0. Assume that

\[ \lim_{x \to c^+} f(x) = \infty, \qquad \lim_{x \to c^+} g(x) = \infty. \]

If $\lim_{x \to c^+} (f'(x)/g'(x))$ exists, then

\[ \lim_{x \to c^+} \frac{f(x)}{g(x)} = \lim_{x \to c^+} \frac{f'(x)}{g'(x)}. \]

Proof. We will use Theorem 5.1. Let $L = \lim_{x \to c^+} (f'(x)/g'(x))$. By the Generalized Mean Value Theorem every real solution of

\[ c < x < y < c + r \tag{48} \]

is a partial solution of

\[ x < t < y, \qquad \frac{f'(t)}{g'(t)} = \frac{f(y) - f(x)}{g(y) - g(x)}, \]

which we rewrite as

\[ x < t < y, \qquad \frac{f'(t)}{g'(t)} = \frac{\dfrac{f(x)}{g(x)} - \dfrac{f(y)}{g(x)}}{1 - \dfrac{g(y)}{g(x)}}. \tag{49} \]

Let y₁ ≈ c and y₁ > c. Then f(y₁) and g(y₁) are positive infinite, and so is their product K = f(y₁)g(y₁). By the M,δ condition for $\lim_{x \to c^+} g(x) = \infty$, for every real M there is a real δ(M) such that every real solution of c < x < c + δ(M) is a solution of g(x) > M. Let δ₁ be such that 0 < δ₁ ≤ δ(K) and c + δ₁ < y₁. Consider any x₁ with c < x₁ < c + δ₁. By Transfer, g(x₁) > K. Moreover, c < x₁ < y₁ < c + r, so by the Partial Solution Theorem there is a t₁ such that (49) holds. Then

\[ t_1 \approx c, \qquad \frac{f'(t_1)}{g'(t_1)} \approx L. \]

Also,

\[ \left| \frac{f(y_1)}{g(x_1)} \right| \le \left| \frac{f(y_1)}{K} \right| = \left| \frac{f(y_1)}{f(y_1)\,g(y_1)} \right| \approx 0, \]

so f(y₁)/g(x₁) ≈ 0. Similarly g(y₁)/g(x₁) ≈ 0. Taking standard parts in (49), we have

\[ \mathrm{st}\!\left( \frac{f'(t_1)}{g'(t_1)} \right) = \mathrm{st}\!\left( \frac{\dfrac{f(x_1)}{g(x_1)} - \dfrac{f(y_1)}{g(x_1)}}{1 - \dfrac{g(y_1)}{g(x_1)}} \right), \]

whence

\[ L \approx \frac{f'(t_1)}{g'(t_1)} \approx \frac{f(x_1)}{g(x_1)}. \]

Since this holds for all c < x₁ < c + δ₁, we see from Theorem 5.1 that

\[ L = \lim_{x \to c^+} \frac{f(x)}{g(x)}. \]
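Both forms of l’Hospital’s Rule can be illustrated numerically by tabulating f/g and f'/g' as x approaches c from the right. The Python sketch below is a hedged illustration, not part of the proofs: the example pairs (1 − cos x and x² for 0/0, −ln x and 1/x for ∞/∞), the step sizes, and the helper name `ratio_table` are all assumptions made here.

```python
import math

def ratio_table(f, g, df, dg, c, steps=(1e-1, 1e-2, 1e-3, 1e-4)):
    """Rows (x, f(x)/g(x), f'(x)/g'(x)) for x = c + step, approaching c from the right."""
    rows = []
    for step in steps:
        x = c + step
        rows.append((x, f(x) / g(x), df(x) / dg(x)))
    return rows

# 0/0 case at c = 0: f and g both tend to 0, and f'/g' = sin(x)/(2x) tends to 1/2.
zero_over_zero = ratio_table(
    f=lambda x: 1.0 - math.cos(x), g=lambda x: x * x,
    df=math.sin, dg=lambda x: 2.0 * x, c=0.0)

# inf/inf case at c = 0: f and g both tend to infinity, and f'/g' = x tends to 0.
inf_over_inf = ratio_table(
    f=lambda x: -math.log(x), g=lambda x: 1.0 / x,
    df=lambda x: -1.0 / x, dg=lambda x: -1.0 / (x * x), c=0.0)

for x, fg, dfdg in zero_over_zero + inf_over_inf:
    print(f"x={x:.0e}  f/g={fg:.6f}  f'/g'={dfdg:.6f}")
```

In both tables the two ratio columns approach a common value, as the theorems predict; in the ∞/∞ case the convergence of f/g is visibly slower than that of f'/g'.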

CHAPTER 6

APPLICATIONS OF THE INTEGRAL

6A. Infinite Sum Theorem (§6.1, §6.2, §6.6)

The Infinite Sum Theorem, which is a nonstandard version of Duhamel’s Principle, is a simple and extremely useful criterion for a quantity to be equal to the definite integral of a function. It can be used to justify many familiar applications of the integral in geometry and physics. It captures the intuitive idea of obtaining an integration formula by considering a typical infinitesimal element and adding up.

Recall from Chapter 2 that given a nonzero infinitesimal Δx, u ≈ v (compared to Δx) means that u/Δx ≈ v/Δx.

Theorem 6.1. (Infinite Sum Theorem) Assume that
(i) h is a real function which is continuous on the interval [a,b].
(ii) B(u,w) is a real function which has the Addition Property,

\[ B(u,w) = B(u,v) + B(v,w) \quad \text{for } u < v < w \text{ in } [a,b]. \]

(iii) For any infinitesimal subinterval [x, x+Δx]* of [a,b]*,

\[ B(x, x + \Delta x) \approx h(x)\,\Delta x \quad \text{(compared to } \Delta x\text{)}. \]

Then

\[ B(a,b) = \int_a^b h(x)\,dx. \]

We will write ΔB = B(x, x+Δx). The Infinite Sum Theorem intuitively says that if each infinitesimal piece ΔB is infinitely close to h(x)Δx (compared to Δx), then the sum B(a,b) of all the ΔB’s is infinitely close to the Riemann sum $\sum_a^b h(x)\,\Delta x$.
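A real-number shadow of the hypotheses and conclusion of the Infinite Sum Theorem can be computed directly. In the Python sketch below, the choice h(x) = x², the function B(u,w) = (w³ − u³)/3, and the helper names are assumptions made for illustration; small real Δx stand in for infinitesimals, so the output suggests the theorem but does not prove it.

```python
def h(x):
    return x * x

def B(u, w):
    """Area under h between u and w; it satisfies the Addition Property."""
    return (w**3 - u**3) / 3.0

def relative_gap(x, dx):
    """|B(x, x+dx)/dx - h(x)|: hypothesis (iii) says this is infinitesimal for infinitesimal dx."""
    return abs(B(x, x + dx) / dx - h(x))

def riemann_sum(a, b, n):
    """Left-endpoint Riemann sum of h on [a, b] with n equal subintervals."""
    dx = (b - a) / n
    return sum(h(a + k * dx) * dx for k in range(n))

a, b = 0.0, 2.0
gaps = [relative_gap(1.0, 10.0**-k) for k in range(1, 6)]   # shrinks with dx
sums = [riemann_sum(a, b, 10**k) for k in range(1, 5)]      # approaches B(a, b)
print(gaps)
print(sums, B(a, b))
```

Here the gap equals xΔx + Δx²/3 exactly, so it tends to zero with Δx, and the Riemann sums approach B(a,b) as the partition is refined.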

Proof. Let n₁ be a positive infinite hyperinteger and let Δx₁ = (b − a)/n₁. We will prove that for every positive real number r,

\[ \sum_a^b (h(x) - r)\,\Delta x_1 < B(a,b) < \sum_a^b (h(x) + r)\,\Delta x_1. \]

This will show that

\[ \sum_a^b h(x)\,\Delta x_1 - r(b-a) < B(a,b) < \sum_a^b h(x)\,\Delta x_1 + r(b-a), \]

\[ \sum_a^b h(x)\,\Delta x_1 \approx B(a,b), \]

and

\[ \int_a^b h(x)\,dx = B(a,b). \]

Consider a positive integer n and let Δx = (b − a)/n. By the Addition Property,

\[ B(a,b) = \sum_a^b B(x, x + \Delta x). \]

It follows that every real solution of

\[ n \in \mathbb{Z}, \quad 0 < n, \quad \Delta x = (b-a)/n, \quad B(a,b) \ge \sum_a^b (h(x) + r)\,\Delta x \tag{50} \]

is a partial real solution of

\[ a \le x < x + \Delta x \le b, \qquad B(x, x + \Delta x) \ge (h(x) + r)\,\Delta x. \tag{51} \]

We may rewrite (51) as

\[ a \le x < x + \Delta x \le b, \qquad \frac{B(x, x + \Delta x)}{\Delta x} \ge h(x) + r. \tag{52} \]

By hypothesis (iii), there is no hyperreal number x₁ such that (52) holds for x₁, Δx₁. By the Partial Solution Theorem 1.20, (50) cannot hold for n₁, Δx₁, so

\[ B(a,b) < \sum_a^b (h(x) + r)\,\Delta x_1. \]

The proof that

\[ B(a,b) > \sum_a^b (h(x) - r)\,\Delta x_1 \]

is similar. ⊣

A standard form of the Infinite Sum Theorem can be found, for example, in the book Buck [B], Section 3.5. This treatment also covers the two variable case, which is given in Chapter 12 of this monograph.

The Infinite Sum Theorem is actually a sharp form of Theorem 4.12, which states that the definite integral of f is the unique area function for f. As a first illustration of how the Infinite Sum Theorem is used, we give a second proof of the uniqueness of the area function.

Theorem 4.12 (Repeated) The definite integral is the only area function for a continuous function f on [a,b].

Second Proof. We show that for any area function B(u,v) for f,

\[ B(a,b) = \int_a^b f(x)\,dx. \]

B(u,v) has the Addition Property. Consider any infinitesimal subinterval [x, x+Δx]* of [a,b]*. By the Hyperreal Extreme Value Theorem 3.33, f has a minimum value m and a maximum value M in [x, x+Δx]*. By the Rectangle Property in Definition 4.1 and Transfer,

\[ m\,\Delta x \le \Delta B \le M\,\Delta x, \qquad m \le \Delta B/\Delta x \le M. \]

Since f is continuous, m ≈ f(x) ≈ M, so

\[ f(x) \approx \Delta B/\Delta x, \qquad f(x)\,\Delta x \approx \Delta B \quad \text{(compared to } \Delta x\text{)}. \]

Then by the Infinite Sum Theorem,

\[ B(a,b) = \int_a^b f(x)\,dx. \]

⊣

At this point we need the real constant π.

Definition 6.2. π is the area of the unit circle x² + y² = 1,

\[ \pi = \int_{-1}^{1} 2\sqrt{1 - x^2}\,dx. \]

It follows by a change of variables that the circle x² + y² = r² of radius r has area A = πr².

We will now use the Infinite Sum Theorem to justify several formulas in geometry and physics. In each case we start with intuitively reasonable assumptions about the infinitesimal elements and apply the Infinite Sum Theorem to get an integration formula. We first justify the formulas for volumes of solids of revolution.

Definition 6.3. Let D be a basic region in the plane of the form

\[ D = \{(x,y) : a \le x \le b,\ 0 \le y \le g(x)\}. \]

(i) The solid formed by revolving D about the x axis has volume

\[ V = \int_a^b \pi (g(x))^2\,dx. \]

(ii) The solid formed by revolving D about the y axis has volume

\[ V = \int_a^b 2\pi x\,g(x)\,dx. \]

Justification. (i) We use the disc method. Let V(u,w) be the volume of the solid generated around the x axis by the region between 0 and g over [u,w]. We assume:
(a) V(u,w) has the Addition Property.
(b) Subset Property: If S and T are solids of revolution and S ⊆ T, the volume of S is at most the volume of T.
(c) A right circular cylinder with base of radius r and thickness h has volume πr²h.

We use (b) and (c) for solids generated by regions over an infinitesimal subinterval [x, x+Δx]* of [a,b]*. The region between 0 and g over [x, x+Δx]* generates an infinitely thin disc of volume ΔV. g has minimum value m and maximum value M in [x, x+Δx]*, so ΔV has an inscribed cylinder of radius m and a circumscribed cylinder of radius M. Both cylinders have thickness Δx, so by (b) and (c),

\[ \pi m^2\,\Delta x \le \Delta V \le \pi M^2\,\Delta x. \]

By continuity, m ≈ g(x) ≈ M, so

\[ \Delta V \approx \pi (g(x))^2\,\Delta x \quad \text{(compared to } \Delta x\text{)}. \]

The Infinite Sum Theorem gives

\[ V(a,b) = \int_a^b \pi (g(x))^2\,dx. \]

(ii) We use the cylindrical shell method. This time V(u,w) denotes the volume generated about the y axis by the region between 0 and g over [u,w]. We assume the same properties (a) to (c). The region between 0 and g over [x, x+Δx]* generates a volume ΔV shaped like an infinitely thin cylindrical shell. By (a) and (c), the inscribed cylindrical shell has volume

\[ \pi (x + \Delta x)^2 m - \pi x^2 m = \pi (2x + \Delta x)\,\Delta x\, m. \]

Using a similar formula for the circumscribed shell and (b), we have

\[ \pi (2x + \Delta x)\,\Delta x\, m \le \Delta V \le \pi (2x + \Delta x)\,\Delta x\, M, \]

\[ \pi (2x + \Delta x)\, m \le \Delta V/\Delta x \le \pi (2x + \Delta x)\, M. \]

By continuity, m ≈ g(x) ≈ M, so

\[ \Delta V/\Delta x \approx \pi (2x + \Delta x)\,g(x) \approx 2\pi x\,g(x), \]

\[ \Delta V \approx 2\pi x\,g(x)\,\Delta x \quad \text{(compared to } \Delta x\text{)}. \]

The Infinite Sum Theorem gives

\[ V(a,b) = \int_a^b 2\pi x\,g(x)\,dx. \]

⊣

We give just one example from physics here. Other examples are sketched in Elementary Calculus, and the detailed use of the Infinite Sum Theorem can be readily worked out.

Definition 6.4. Suppose a plane object occupies the basic region a ≤ x ≤ b, 0 ≤ y ≤ g(x) and its density (mass per unit area) at (x,y) is a continuous function ρ(x) of x alone. The mass of the object is

\[ \text{mass} = \int_a^b g(x)\,\rho(x)\,dx. \]

Justification. Let m(u,w) be the mass of the piece of the object from x = u to x = w. Our assumptions are:
(a) m(u,w) has the Addition Property.
(b) If the region occupied by a plane object S is a subset of the region occupied by a plane object T, and the density of S is everywhere at most the density of T, then the mass of S is at most the mass of T.
(c) The mass of a vertical rectangular object of constant density is the product of the base, the height, and the density.

On an infinitesimal subinterval [x, x+Δx]* of [a,b]*, the continuous functions g and ρ have minimum values g_min, ρ_min and maximum values g_max, ρ_max. Δm is the mass of an infinitely thin strip of width Δx. Using (b) and (c),

\[ g_{\min}\,\rho_{\min}\,\Delta x \le \Delta m \le g_{\max}\,\rho_{\max}\,\Delta x. \]

By continuity, g_min ≈ g(x) ≈ g_max, ρ_min ≈ ρ(x) ≈ ρ_max, and hence

\[ \Delta m \approx g(x)\,\rho(x)\,\Delta x \quad \text{(compared to } \Delta x\text{)}. \]

Applying the Infinite Sum Theorem,

\[ m(a,b) = \int_a^b g(x)\,\rho(x)\,dx. \]

⊣
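The formulas of Definitions 6.3 and 6.4 can be sanity-checked on shapes whose volumes and masses are known from elementary geometry. The Python sketch below is an illustration under stated assumptions: the midpoint rule, the test region under y = x on [0,1], the plate with height 2 and density 1 + x, and all function names are choices made here.

```python
import math

def midpoint_integral(func, a, b, n=2000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    dx = (b - a) / n
    return sum(func(a + (k + 0.5) * dx) for k in range(n)) * dx

def volume_about_x(g, a, b):
    """Disc method: integral of pi * g(x)^2."""
    return midpoint_integral(lambda x: math.pi * g(x) ** 2, a, b)

def volume_about_y(g, a, b):
    """Cylindrical shell method: integral of 2 * pi * x * g(x)."""
    return midpoint_integral(lambda x: 2.0 * math.pi * x * g(x), a, b)

def mass(g, rho, a, b):
    """Mass of a plate of height g(x) and density rho(x)."""
    return midpoint_integral(lambda x: g(x) * rho(x), a, b)

# Region under y = x on [0, 1].
cone = volume_about_x(lambda x: x, 0.0, 1.0)    # cone of radius 1, height 1: pi/3
shell = volume_about_y(lambda x: x, 0.0, 1.0)   # cylinder minus cone: pi - pi/3
plate = mass(lambda x: 2.0, lambda x: 1.0 + x, 0.0, 1.0)  # integral of 2(1 + x) is 3

print(cone, math.pi / 3)
print(shell, 2 * math.pi / 3)
print(plate)
```

Revolving the same region about the two axes gives different solids, which is why the disc and shell integrands differ even though g is the same.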

6B. Lengths of Curves (§6.3, §6.4)

Definition 6.5. A real curve y = f(x), x ∈ [a,b] is said to be smooth if the derivative dy/dx is continuous on [a,b]. The length of a smooth curve is defined as

\[ s = \int_a^b \sqrt{1 + \left( \frac{dy}{dx} \right)^2}\,dx. \]

Justification. Let s(u,w) be the intuitive length of the curve segment over the interval [u,w]. Our assumptions are:
(a) s(u,w) has the Addition Property.
(b) If the slope of a smooth curve is infinitely close to the slope of the chord from P(x,y) to Q(x+Δx, y+Δy) at every point of [x, x+Δx]*, then

\[ \frac{\Delta s}{\sqrt{\Delta x^2 + \Delta y^2}} \approx 1. \]

Here Δs is the length of the curve segment over [x, x+Δx]* and $\sqrt{\Delta x^2 + \Delta y^2}$ is the length of the chord.

On an infinitesimal subinterval [x, x+Δx]*, the infinitesimal curve segment Δs connects P(x,y) to Q(x+Δx, y+Δy). The chord from P to Q has slope Δy/Δx. It follows from Corollary 3.38 on continuous derivatives that f'(x) ≈ Δy/Δx. Since f' is continuous, f'(u) ≈ Δy/Δx for all u ∈ [x, x+Δx]*. By (b),

\[ \frac{\Delta s}{\sqrt{\Delta x^2 + \Delta y^2}} \approx 1. \]

Then

\[ \frac{\Delta s}{\Delta x} = \frac{\Delta s}{\sqrt{\Delta x^2 + \Delta y^2}} \cdot \frac{\sqrt{\Delta x^2 + \Delta y^2}}{\Delta x} = \frac{\Delta s}{\sqrt{\Delta x^2 + \Delta y^2}} \sqrt{1 + \left( \frac{\Delta y}{\Delta x} \right)^2} \approx \sqrt{1 + \left( \frac{dy}{dx} \right)^2}. \]

Since dy/dx ≈ Δy/Δx,

\[ \frac{\Delta s}{\Delta x} \approx \sqrt{1 + \left( \frac{\Delta y}{\Delta x} \right)^2} \approx \sqrt{1 + \left( \frac{dy}{dx} \right)^2}, \]

\[ \Delta s \approx \sqrt{1 + \left( \frac{dy}{dx} \right)^2}\,\Delta x \quad \text{(compared to } \Delta x\text{)}. \]

By the Infinite Sum Theorem,

\[ s(a,b) = \int_a^b \sqrt{1 + \left( \frac{dy}{dx} \right)^2}\,dx. \]

⊣

As an application of the length formula we obtain the classical formula for the area of a sector of a circle.

Theorem 6.6. On the circle x² + y² = r² of radius r, let P be the point (r,0) and let Q be a point (x,y) in the first quadrant. Then the area A of the sector POQ is given by the formula

\[ A = \tfrac{1}{2}\, r s, \]

where s is the length of the arc PQ.

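Proof. Since the circle is vertical at P, we will take y as the independent variable instead of x. In the first quadrant the circle has the equation $x = \sqrt{r^2 - y^2}$. Then

\[ s = \int_0^y \sqrt{1 + \left( \frac{dx}{dy} \right)^2}\,dy, \]

\[ \frac{dx}{dy} = -\frac{y}{\sqrt{r^2 - y^2}} = -\frac{y}{x}, \]

\[ \sqrt{1 + \left( \frac{dx}{dy} \right)^2} = \sqrt{1 + \frac{y^2}{x^2}} = \frac{\sqrt{x^2 + y^2}}{x} = \frac{r}{x}. \]

Let A be the area of the sector POQ,

\[ A = \int_0^y \sqrt{r^2 - u^2}\,du - \tfrac{1}{2}\, x y. \]

Letting y vary, we have

\[ \frac{dA}{dy} = \sqrt{r^2 - y^2} - \frac{1}{2}\left( x + y \frac{dx}{dy} \right) = x - \frac{1}{2} x - \frac{1}{2} y \left( -\frac{y}{x} \right) = \frac{x^2 + y^2}{2x} = \frac{r^2}{2x}. \]

Therefore

\[ \frac{dA}{dy} = \frac{1}{2}\, r \sqrt{1 + \left( \frac{dx}{dy} \right)^2}. \]

Integrating from 0 to y we obtain

\[ A = \frac{1}{2}\, r \int_0^y \sqrt{1 + \left( \frac{dx}{dy} \right)^2}\,dy = \frac{1}{2}\, r s. \]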

Corollary 6.7. The circle x² + y² = r² has circumference 2πr.

Proof. This follows from the formula A = πr² for the area of the circle. By symmetry, the first quadrant of the circle has area A = ¼πr². By Theorem 6.6, A = ½rs where s is the arc length of the first quadrant. Solving for s we get s = ½πr. By symmetry again, the circumference is C = 4s = 2πr. ⊣

We next define the length of a parametric curve.

Definition 6.8. A parametric curve is a pair of continuous functions

\[ x = f(t), \quad y = g(t), \quad t \in [a,b]. \]

It is said to be smooth if at every point t ∈ [a,b], f' and g' are continuous and at least one of f'(t), g'(t) is nonzero. The path of the parametric curve is the set {(f(t), g(t)): a ≤ t ≤ b}. The length of a smooth parametric curve is the integral

\[ s = \int_a^b \sqrt{\left( \frac{dx}{dt} \right)^2 + \left( \frac{dy}{dt} \right)^2}\,dt. \]

The justification of the parametric length formula is exactly like the justification of the length formula for curves of the form y = f(x). Notice that when x = t, the parametric equation and length formula reduce to the special case

\[ y = g(x), \quad a \le x \le b, \qquad s = \int_a^b \sqrt{1 + \left( \frac{dy}{dx} \right)^2}\,dx. \]

Definition 6.9. A smooth parametric curve

\[ C: \quad x = f(t), \quad y = g(t), \quad t \in [a,b] \]

is said to be simple if C maps [a,b] one to one onto its path and there is no t with f'(t) = g'(t) = 0. C₁ is a reparametrization of C if C and C₁ are simple parametric curves with the same path P.

The next theorem shows that the length of a simple parametric curve depends only on its path.

Theorem 6.10. (Reparametrization Theorem) Let

\[ C: \quad x = f(t), \quad y = g(t), \quad t \in [a,b] \]

be a simple parametric curve.
(i) A parametric curve C₁: x = F(u), y = G(u), u ∈ [A,B] is a reparametrization of C if and only if there is a smooth function t = h(u) mapping [A,B] onto [a,b] such that h'(u) is never zero and for all u ∈ [A,B],

\[ F(u) = f(h(u)), \qquad G(u) = g(h(u)). \]

(ii) If C₁ is a reparametrization of C then C and C₁ have the same length.
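Part (ii) of the Reparametrization Theorem and Corollary 6.7 can be checked numerically on a quarter of the unit circle. The Python sketch below is illustrative only: the midpoint rule, the parametrization x = cos t, y = sin t, the change of parameter h(u) = (π/4)(u + u²), and the names `curve_length`, `h`, and `dh` are assumptions chosen for this example.

```python
import math

def curve_length(dx, dy, a, b, n=20000):
    """Midpoint-rule approximation of the integral of sqrt(dx(t)^2 + dy(t)^2) over [a, b]."""
    dt = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * dt
        total += math.hypot(dx(t), dy(t))
    return total * dt

# C: x = cos t, y = sin t, t in [0, pi/2], a quarter of the unit circle.
length_C = curve_length(lambda t: -math.sin(t), lambda t: math.cos(t),
                        0.0, math.pi / 2)

# C1: t = h(u) = (pi/4)(u + u^2) maps [0, 1] onto [0, pi/2] with h'(u) > 0.
def h(u):
    return (math.pi / 4) * (u + u * u)

def dh(u):
    return (math.pi / 4) * (1 + 2 * u)

length_C1 = curve_length(
    lambda u: -math.sin(h(u)) * dh(u),   # F'(u) = f'(h(u)) h'(u) by the Chain Rule
    lambda u: math.cos(h(u)) * dh(u),    # G'(u) = g'(h(u)) h'(u)
    0.0, 1.0)

print(length_C, length_C1, math.pi / 2)
print(4 * length_C, 2 * math.pi)         # circumference of the unit circle
```

The two parametrizations trace the same path at different speeds, yet the computed lengths agree, and four quarter arcs give the circumference 2πr with r = 1.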

Proof. (i) Suppose C₁ is a reparametrization of C. There are two cases for the endpoints.
Case 1: F(A) = f(a), G(A) = g(a).
Case 2: F(A) = f(b), G(A) = g(b).
We give the proof in Case 1, where both curves move in the same direction. Case 2 is similar, but with the curves moving in opposite directions.

Since C and C₁ are one to one functions onto the same path P, there is a function h mapping [A,B] onto [a,b] such that whenever t = h(u),

\[ (F(u), G(u)) = (f(t), g(t)). \]

The function h is one to one because C and C₁ are one to one. We claim that h is continuous. To see this, we assume that u, v ∈ [A,B]* and st(u) = st(v), and prove that st(h(u)) = st(h(v)). Since F and f are continuous, and F(u) = f(h(u)), we have

\[ f(\mathrm{st}(h(u))) = \mathrm{st}(f(h(u))) = \mathrm{st}(F(u)) = F(\mathrm{st}(u)) = F(\mathrm{st}(v)). \]

Using the same argument at v,

\[ f(\mathrm{st}(h(v))) = \mathrm{st}(f(h(v))) = \mathrm{st}(F(v)) = F(\mathrm{st}(v)). \]

Thus f(st(h(u))) = f(st(h(v))). A similar fact holds for g, so

\[ (f(\mathrm{st}(h(u))), g(\mathrm{st}(h(u)))) = (f(\mathrm{st}(h(v))), g(\mathrm{st}(h(v)))). \]

Since the curve C maps [a,b] one to one onto the path P, we have st(h(u)) = st(h(v)) as required.

In Case 1 we have h(A) = a. Since h is continuous and maps [A,B] one to one onto [a,b], it follows from Theorem 3.23 that h is increasing.

We now show that h is smooth and h'(u) is never zero. Let A ≤ u < u + Δu ≤ B with Δu ≈ 0, and let Δx, Δy, and Δt be the corresponding changes in x = F(u), y = G(u), and t = h(u). Since F, G, and h are continuous, Δx, Δy, and Δt are infinitesimal. By Transfer, h* is increasing, so Δt > 0. Since C₁ is a simple curve, either F'(st(u)) ≠ 0 or G'(st(u)) ≠ 0, say F'(st(u)) ≠ 0. The function F is smooth, so its derivative F' is continuous. Then by Theorem 3.37, F'(st(u)) ≈ Δx/Δu, and therefore Δx ≠ 0. It follows that $\sqrt{\Delta x^2 + \Delta y^2}$ is positive infinitesimal. We thus have

\[ \frac{\Delta t}{\Delta u} = \frac{\sqrt{\Delta x^2 + \Delta y^2}/\Delta u}{\sqrt{\Delta x^2 + \Delta y^2}/\Delta t} = \frac{\sqrt{(\Delta x/\Delta u)^2 + (\Delta y/\Delta u)^2}}{\sqrt{(\Delta x/\Delta t)^2 + (\Delta y/\Delta t)^2}} \approx \frac{\sqrt{F'(\mathrm{st}(u))^2 + G'(\mathrm{st}(u))^2}}{\sqrt{f'(\mathrm{st}(t))^2 + g'(\mathrm{st}(t))^2}}, \]

so for real u, h'(u) exists and

\[ h'(u) = \frac{\sqrt{F'(u)^2 + G'(u)^2}}{\sqrt{f'(h(u))^2 + g'(h(u))^2}} \ne 0. \]

Since h, F', G', f', and g' are continuous, the derivative h' is continuous.

We now prove the converse. Since h' is never zero, h maps [A,B] one to one onto [a,b]. Therefore C₁ maps [A,B] one to one onto the path P. By the Chain Rule,

\[ F'(u) = f'(h(u))\,h'(u), \qquad G'(u) = g'(h(u))\,h'(u). \]

Thus F' and G' are continuous and are never both zero, so C₁ is a simple parametric curve.

(ii) We give the proof in Case 1, where (F(A), G(A)) = (f(a), g(a)). Let h be as in part (i). Then h'(u) > 0, so

\[ \text{length of } C = \int_a^b \sqrt{f'(t)^2 + g'(t)^2}\,dt = \int_A^B \sqrt{f'(h(u))^2 + g'(h(u))^2}\;h'(u)\,du = \int_A^B \sqrt{F'(u)^2 + G'(u)^2}\,du = \text{length of } C_1. \]

The proof in Case 2, where (F(A), G(A)) = (f(b), g(b)), is similar but with h'(u) < 0 and the integral from B to A. ⊣

We now justify the formula for the area of a surface of revolution. Our starting point is the elementary formula

\[ A = \pi (r_1 + r_2)\,\ell \]

for the surface area of a circular cone frustum of slant height ℓ and bases of radius r₁ and r₂.

Definition 6.11. The surface area formed by revolving a smooth curve y = f(x), a ≤ x ≤ b about the y axis is

\[ A = \int_a^b 2\pi x \sqrt{1 + \left( \frac{dy}{dx} \right)^2}\,dx. \]

The surface area formed by revolving the curve about the x axis is

\[ A = \int_a^b 2\pi y \sqrt{1 + \left( \frac{dy}{dx} \right)^2}\,dx. \]

Justification. We justify the area formula for the surface of revolution about the y axis. Let s(u,v) and A(u,v) be the length of the curve and the area of the surface of revolution for u ≤ x ≤ v. Assume that A(u,v) has the Addition Property.

Let [x, x+Δx]* be an infinitesimal subinterval of [a,b]*. When the line segment from (x,y) to (x+Δx, y+Δy) is revolved about the y axis it generates an infinitely thin cone frustum of slant height $\sqrt{\Delta x^2 + \Delta y^2}$ and bases of radius x and x + Δx. This frustum has surface area

\[ \pi (x + (x + \Delta x)) \sqrt{\Delta x^2 + \Delta y^2}. \]

The curve y = f(x) has value and slope infinitely close to the line segment from (x,y) to (x+Δx, y+Δy) throughout the interval [x, x+Δx]*. It is thus reasonable to assume that the surface area ΔA = A(x, x+Δx) and the frustum area have a ratio infinitely close to one,

\[ \frac{\Delta A}{\pi (x + (x + \Delta x)) \sqrt{\Delta x^2 + \Delta y^2}} \approx 1. \]

Since Δy/Δx ≈ dy/dx,

\[ \frac{\Delta A}{\pi (x + (x + \Delta x)) \sqrt{\Delta x^2 + \Delta y^2}} \approx \frac{\Delta A}{2\pi x \sqrt{\Delta x^2 + \Delta y^2}} = \frac{\Delta A/\Delta x}{2\pi x \sqrt{1 + \left( \frac{\Delta y}{\Delta x} \right)^2}} \approx \frac{\Delta A/\Delta x}{2\pi x \sqrt{1 + \left( \frac{dy}{dx} \right)^2}}. \]

Therefore

\[ \Delta A \approx 2\pi x \sqrt{1 + \left( \frac{dy}{dx} \right)^2}\,\Delta x \quad \text{(compared to } \Delta x\text{)}, \]

and by the Infinite Sum Theorem,

\[ A(a,b) = \int_a^b 2\pi x \sqrt{1 + \left( \frac{dy}{dx} \right)^2}\,dx. \]

The justification of the area formula for the surface of revolution about the x axis is similar. ⊣

Given a simple parametric curve x = f(t), y = g(t), t ∈ [a,b], the area of the surface of revolution about the y axis is defined by

\[ A = \int_a^b 2\pi x \sqrt{\left( \frac{dx}{dt} \right)^2 + \left( \frac{dy}{dt} \right)^2}\,dt. \]
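The surface area formulas can be compared with two classical values: the cone frustum formula A = π(r₁ + r₂)ℓ used as the starting point above, and the area 4π of the unit sphere. The Python sketch below is a hedged illustration; the midpoint rule, the test curves (the segment y = x − 1 on [1,2] and the right half of the unit circle), and the function names are assumptions made here.

```python
import math

def midpoint_integral(func, a, b, n=20000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    dx = (b - a) / n
    return sum(func(a + (k + 0.5) * dx) for k in range(n)) * dx

def area_about_y(f_prime, a, b):
    """Surface area for y = f(x) revolved about the y axis (Definition 6.11)."""
    return midpoint_integral(
        lambda x: 2.0 * math.pi * x * math.sqrt(1.0 + f_prime(x) ** 2), a, b)

def area_about_y_parametric(x, dx, dy, a, b):
    """Surface area for a parametric curve revolved about the y axis."""
    return midpoint_integral(
        lambda t: 2.0 * math.pi * x(t) * math.hypot(dx(t), dy(t)), a, b)

# Segment y = x - 1 for 1 <= x <= 2: a frustum with radii 1, 2 and slant height sqrt(2).
frustum = area_about_y(lambda x: 1.0, 1.0, 2.0)
frustum_formula = math.pi * (1.0 + 2.0) * math.sqrt(2.0)

# Right half of the unit circle, x = cos t, y = sin t, -pi/2 <= t <= pi/2: a sphere.
sphere = area_about_y_parametric(
    math.cos, lambda t: -math.sin(t), math.cos, -math.pi / 2, math.pi / 2)

print(frustum, frustum_formula)
print(sphere, 4 * math.pi)
```

For a straight segment the integral reproduces the frustum formula exactly, which is consistent with the frustum being the infinitesimal building block in the justification.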

6C. Improper Integrals (§6.7)

In Chapter 4 we only considered integrals of continuous functions over closed intervals. Using improper integrals we can integrate functions with at most finitely many discontinuities over arbitrary intervals.

Definition 6.12. Suppose f is continuous on the half-open interval [a,b). The improper integral of f from a to b is defined as the limit

\[ \int_a^b f(x)\,dx = \lim_{u \to b^-} \int_a^u f(x)\,dx. \]

If the limit exists, the integral is said to converge. Otherwise the integral is said to diverge. If the limit is ∞ we say that the integral diverges to ∞ and write

\[ \int_a^b f(x)\,dx = \infty. \]

Other types of improper integrals are defined analogously. For example:

If f is continuous on (a,b],

\[ \int_a^b f(x)\,dx = \lim_{u \to a^+} \int_u^b f(x)\,dx. \]

If f is continuous on [a,∞),

\[ \int_a^\infty f(x)\,dx = \lim_{u \to \infty} \int_a^u f(x)\,dx. \]

The definitions can be rephrased using the infinitesimal definition of limit. We state four cases.

Let f be continuous on [a,b).
$\int_a^b f(x)\,dx = L$ if and only if whenever u < b but u ≈ b, $\int_a^u f(x)\,dx \approx L$.
$\int_a^b f(x)\,dx = \infty$ if and only if whenever u < b but u ≈ b, $\int_a^u f(x)\,dx$ is positive infinite.

Let f be continuous on [a,∞).
$\int_a^\infty f(x)\,dx = L$ if and only if whenever H is positive infinite, $\int_a^H f(x)\,dx \approx L$.
$\int_a^\infty f(x)\,dx = \infty$ if and only if whenever H is positive infinite, $\int_a^H f(x)\,dx$ is positive infinite.

Theorem 6.13. If f is continuous on [a,b], then the improper integral of f from a to b converges and equals the definite integral.

Proof. Let $F(u) = \int_a^u f(x)\,dx$. By the Second Fundamental Theorem of Calculus 4.17, F is continuous on [a,b]. Therefore

\[ \int_a^b f(x)\,dx = F(b) = \lim_{u \to b^-} F(u) = \lim_{u \to b^-} \int_a^u f(x)\,dx. \]

⊣

Theorem 6.14. Let f be continuous and nonnegative for x in [a,∞). Then the improper integral $\int_a^\infty f(x)\,dx$ either converges to some finite value or diverges to ∞.

Proof. Let $F(u) = \int_a^u f(x)\,dx$. By the Second Fundamental Theorem of Calculus, F is continuous on [a,∞). Since f is nonnegative, it follows from Theorem 4.9 that F is nondecreasing on [a,∞).

Case 1: The range of F is bounded. Then it has a least upper bound L. Let H be positive infinite. Since every real u ≥ a is a solution of F(u) ≤ L, Transfer gives F(H) ≤ L. Consider a real number r < L. There is a real u ≥ a such that F(u) > r. Any real solution of v ≥ u is a solution of F(v) ≥ F(u), and hence F(v) > r. Therefore F(H) > r. Since this holds for all real r < L, we conclude that F(H) ≈ L, so $\int_a^\infty f(x)\,dx = L$.

Case 2: The range of F is not bounded. Again let H be positive infinite. For every real r there is a real u ≥ a such that F(u) > r. Since F(H) ≥ F(u), we have F(H) > r and hence F(H) is positive infinite. Thus $\int_a^\infty f(x)\,dx = \infty$. ⊣
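The two cases in the proof of Theorem 6.14, and the endpoint limit of Definition 6.12, can be observed by evaluating F(u) at larger and larger real u. The Python sketch below is illustrative: the three integrands (1/x², 1/x, and 1/√(1 − x)) and their closed-form antiderivatives are examples chosen here, and large real values of u merely stand in for a positive infinite H.

```python
import math

def F_inverse_square(u):
    """Integral of 1/x^2 from 1 to u, in closed form (bounded case)."""
    return 1.0 - 1.0 / u

def F_reciprocal(u):
    """Integral of 1/x from 1 to u, in closed form (unbounded case)."""
    return math.log(u)

def F_endpoint(u):
    """Integral of 1/sqrt(1 - x) from 0 to u, for 0 <= u < 1."""
    return 2.0 - 2.0 * math.sqrt(1.0 - u)

us = [10.0 ** k for k in range(1, 9)]
bounded = [F_inverse_square(u) for u in us]     # nondecreasing, bounded above by 1
unbounded = [F_reciprocal(u) for u in us]       # nondecreasing, eventually exceeds any real r
near_b = [F_endpoint(1.0 - 10.0 ** -k) for k in range(1, 9)]   # u approaches b = 1 from the left

print(bounded[-1], unbounded[-1], near_b[-1])
```

The first list approaches its least upper bound 1 as in Case 1, the second grows without bound as in Case 2, and the third approaches 2 even though the integrand is unbounded near x = 1.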

CHAPTER 7

TRIGONOMETRIC FUNCTIONS

In the study of trigonometric functions we use the notion of arc length on the unit circle, which depends on the integration formula for curve length from Chapter 6,

\[ s = \int_a^b \sqrt{1 + \left( \frac{dy}{dx} \right)^2}\,dx. \]

To prepare the way for this study we introduce inverse functions.

7A. Inverse Function Theorem (§7.3)

Given a binary relation X ⊆ ℝ² on the reals, the inverse relation of X is the binary relation

\[ Y = \{(y,x) : (x,y) \in X\}. \]

Obviously, if Y is the inverse relation of X then X is the inverse relation of Y. If f and g are real functions and g is the inverse relation of f, we call g the inverse function of f. Here are some simple facts about inverse functions.

Lemma 7.1. (i) A real function f has an inverse function if and only if f is one to one.
(ii) g is the inverse function of f if and only if the equations f(x) = y, g(y) = x have the same solutions.
(iii) g is the inverse function of f if and only if domain(g) = range(f) and g(f(x)) = x for all x ∈ domain(f).

We remark that if f is increasing, then f is one to one and therefore has an inverse function. Similarly, if f is decreasing then f has an inverse function.

Theorem 7.2. Let f be increasing and continuous on its domain which is an interval I, and let g be the inverse function of f. Then g is increasing, the domain of g is an interval J, and g is continuous on J.

Proof. Let J be the domain of g. Suppose A, B ∈ J and A < B. Let a = g(A), b = g(B). Then f(a) = A, f(b) = B. We cannot have b ≤ a because that would imply that f(b) ≤ f(a), B ≤ A. Therefore a < b, so g is increasing.

Suppose C ∈ (A,B). By the Intermediate Value Theorem 3.28 for the continuous function f, there is a point c ∈ (a,b) such that f(c) = C. Therefore C belongs to the range of f, which is the domain J of g. This shows that J is an interval.

To show that g is continuous, let A ∈ J, X ∈ J*, A ≈ X, and put a = g(A), x = g(X). We must show that a ≈ x. C ∈ J and c = g(C) implies c ∈ I and f(c) = C, so by Transfer we have x ∈ I* and f(x) = X. Assume first that A < X. Since g is increasing, g* is increasing by Transfer, so a < x. If a ≉ x, then there is a real number c ∈ I such that a < c < x, hence f(a) < f(c) < f(x), contradicting f(a) = A ≈ X = f(x). This shows that a ≈ x. The case X < A is similar, so g is continuous on J. ⊣

Theorem 7.3. (Inverse Function Theorem) Suppose f is continuous and increasing on an open interval I, and g is the inverse function of f. For any point x ∈ I where f is differentiable and f'(x) ≠ 0, g is differentiable at y = f(x) and g'(y) = 1/f'(x).

Proof. Let Δy be a nonzero infinitesimal. We must show that

\[ \frac{g(y + \Delta y) - g(y)}{\Delta y} \approx \frac{1}{f'(x)}. \]

Let J be the range of f and the domain of g. Since I is an open interval, J is an open interval. Thus y = f(x) is an interior point of J. Therefore g(y + Δy) is defined. We have x = g(y), and we put Δx = g(y + Δy) − g(y). By Theorem 7.2, g is continuous and increasing, so Δx is infinitesimal and not zero. Also,

\[ x + \Delta x = g(y + \Delta y), \qquad f(x + \Delta x) = y + \Delta y. \]

Then

\[ f'(x) \approx \frac{f(x + \Delta x) - f(x)}{\Delta x} = \frac{\Delta y}{\Delta x}, \]

and since f'(x) ≠ 0,

\[ \frac{1}{f'(x)} \approx \frac{\Delta x}{\Delta y} = \frac{g(y + \Delta y) - g(y)}{\Delta y}. \]

⊣

We now apply the Inverse Function Theorem to show that in a simple parametric curve we may take the curve length itself as the independent variable.

Theorem 7.4. Let

\[ C: \quad x = f(t), \quad y = g(t), \quad t \in [a,b] \]

be a simple parametric curve. Then C has a reparametrization

\[ C_1: \quad x = F(s), \quad y = G(s), \quad s \in [0,L] \]

such that s is equal to the length of the curve from 0 to s.

Proof. Let ℓ(t) be the curve length from a to t,

\[ \ell(t) = \int_a^t \sqrt{f'(u)^2 + g'(u)^2}\,du. \]

By the Second Fundamental Theorem of Calculus, 4.17,

\[ \ell'(t) = \sqrt{f'(t)^2 + g'(t)^2}. \]

This is always positive and continuous, and ℓ maps [a,b] onto [0,L] where L is the length of C. By the Inverse Function Theorem 7.3, the inverse function h of ℓ maps [0,L] onto [a,b] and has the derivative h'(s) = 1/ℓ'(h(s)) > 0. Since h and ℓ' are continuous, h' is continuous. By the Reparametrization Theorem 6.10, the curve

\[ C_1: \quad x = f(h(s)), \quad y = g(h(s)), \quad s \in [0,L] \]

is a reparametrization of C, and the length of C₁ from 0 to s is ℓ(h(s)) = s. ⊣

We will now prove a Local Inverse Function Theorem, due to Behrens (see the book of Stroyan and Luxemburg [SL 1976]). It does not require the hypothesis that f is continuous on a neighborhood, and will be convenient when we study two variables later. Recall from Chapter 3 that a real function f is uniformly differentiable at a real point c if f'(c) exists and whenever x ≈ c and Δx is nonzero infinitesimal,

\[ f'(c) \approx \frac{f(x + \Delta x) - f(x)}{\Delta x}. \]

Theorem 7.5. (Local Inverse Function Theorem) Suppose f is defined in a neighborhood of a real point c, increasing, uniformly differentiable at c, and that f'(c) ≠ 0. Then the inverse function g of f is uniformly differentiable at d = f(c) and g'(d) = 1/f'(c).

Proof. Since f is increasing, g is increasing. By Theorem 3.39, f is continuous on some real neighborhood I of c. Then by the Inverse Function Theorem 7.3, g'(d) exists and g'(d) = 1/f'(c). To show that g is uniformly differentiable at d, suppose y ≈ d and Δy is nonzero infinitesimal. Let x = g(y) and Δx = g(y + Δy) − g(y). Since g is increasing and continuous, Δx is a nonzero infinitesimal. f is uniformly differentiable at c, so

\[ f'(c) \approx \frac{f(x + \Delta x) - f(x)}{\Delta x}. \]

From the definition of Δx we see that g(y + Δy) = g(y) + Δx = x + Δx, so

\[ y + \Delta y = f(x + \Delta x) \]

and

\[ \Delta y = f(x + \Delta x) - y = f(x + \Delta x) - f(x). \]

Therefore

\[ g'(d) = \frac{1}{f'(c)} \approx \frac{\Delta x}{f(x + \Delta x) - f(x)} = \frac{g(y + \Delta y) - g(y)}{\Delta y}. \]

This shows that g is uniformly differentiable at d. ⊣
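The formula g'(y) = 1/f'(x) of the Inverse Function Theorem 7.3 can be tested numerically when the inverse has no convenient closed form. In the Python sketch below, the increasing function f(x) = x³ + x, the bisection bracket [−10, 10], the step 10⁻⁶, and the helper names are assumptions made for illustration; a small real Δy stands in for an infinitesimal.

```python
def f(x):
    return x**3 + x              # increasing, since f'(x) = 3x^2 + 1 > 0

def f_prime(x):
    return 3 * x * x + 1

def g(y, lo=-10.0, hi=10.0, iterations=200):
    """Inverse of f computed by bisection; assumes f(lo) <= y <= f(hi)."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def difference_quotient(func, y, dy):
    """(func(y + dy) - func(y)) / dy for a small nonzero real dy."""
    return (func(y + dy) - func(y)) / dy

x = 1.5
y = f(x)
approx = difference_quotient(g, y, 1e-6)   # slope of the inverse at y = f(x)
exact = 1.0 / f_prime(x)                   # value predicted by Theorem 7.3

print(g(y), approx, exact)
```

Bisection only uses the fact that f is increasing and continuous, which are the hypotheses of Theorem 7.2 that guarantee the inverse exists and is continuous.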

7B. Derivatives of Trigonometric Functions (§7.1, §7.2)

We first define the trigonometric functions and then use the Inverse Function Theorem 7.3 to show that they are differentiable.

Definition 7.6. Let 0 ≤ θ ≤ π/2 and let P(x,y) be the point at distance θ counter-clockwise around the unit circle starting from (1,0). We then define

\[ x = \cos\theta, \qquad y = \sin\theta. \]

Lemma 7.7. For 0 ≤ θ ≤ π/2, the functions y = sin θ and x = cos θ are differentiable, and

\[ \frac{d(\sin\theta)}{d\theta} = \cos\theta, \qquad \frac{d(\cos\theta)}{d\theta} = -\sin\theta. \]

Proof. We repeat the argument used for Theorem 7.4 but for the special case

\[ x = \sqrt{1 - y^2}, \qquad 0 \le y < 1. \]

We have

\[ \frac{dx}{dy} = \frac{-y}{\sqrt{1 - y^2}}, \qquad \sqrt{1 + \left( \frac{dx}{dy} \right)^2} = \sqrt{1 + \frac{y^2}{1 - y^2}} = \frac{1}{\sqrt{1 - y^2}}. \]

Let θ be the arc length

\[ \theta = \int_0^y \frac{1}{\sqrt{1 - u^2}}\,du. \]

By definition, x = cos θ and y = sin θ. For 0 ≤ y < 1, θ is increasing and has the continuous derivative

\[ \frac{d\theta}{dy} = \frac{1}{\sqrt{1 - y^2}}. \]

By the Inverse Function Theorem 7.3, y = sin θ is differentiable and

\[ \frac{dy}{d\theta} = \frac{1}{d\theta/dy} = \sqrt{1 - y^2} = x = \cos\theta. \]

Similarly,

\[ \frac{dx}{d\theta} = \frac{dx}{dy}\,\frac{dy}{d\theta} = -y = -\sin\theta. \]

⊣

We extend the sine and cosine functions from the interval [0, π/2] to the whole real line by defining

\[ \sin(\theta + \pi/2) = \cos\theta, \qquad \sin(\theta + \pi) = -\sin\theta. \]

One can easily check that the sine and cosine functions have period 2π. Given the derivative formulas for 0 ≤ θ < π/2 in Lemma 7.7, we leave the proof of the formulas for arbitrary θ to the reader.

Theorem 7.8.

\[ \frac{d(\sin\theta)}{d\theta} = \cos\theta, \qquad \frac{d(\cos\theta)}{d\theta} = -\sin\theta. \]

The other trigonometric functions are defined by

\[ \tan\theta = \frac{\sin\theta}{\cos\theta}, \qquad \cot\theta = \frac{\cos\theta}{\sin\theta}, \qquad \sec\theta = \frac{1}{\cos\theta}, \qquad \csc\theta = \frac{1}{\sin\theta}. \]

The inverse trigonometric functions are obtained by restricting the trigonometric functions to either [−π/2, π/2] or [0, π] and then taking the inverse functions. For example,

arc sin x is the inverse of sin θ, −π/2 ≤ θ ≤ π/2,
arc tan x is the inverse of tan θ, −π/2 ≤ θ ≤ π/2,
arc sec x is the inverse of sec θ, 0 ≤ θ ≤ π.

The Inverse Function Theorem 7.3 shows that the inverse trigonometric functions are differentiable except at the endpoints and leads in the usual way to the formulas for the derivatives:

\[ \frac{d(\operatorname{arc\,sin} x)}{dx} = \frac{1}{\sqrt{1 - x^2}}, \quad |x| < 1, \]

\[ \frac{d(\operatorname{arc\,tan} x)}{dx} = \frac{1}{1 + x^2}, \quad \text{all } x, \]

\[ \frac{d(\operatorname{arc\,sec} x)}{dx} = \frac{1}{|x|\sqrt{x^2 - 1}}, \quad |x| > 1. \]
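The arc length definition of θ in the proof of Lemma 7.7 can be compared with the library sine and arc sine, and the derivative formulas can be checked by difference quotients. The Python sketch below is only a numerical illustration: the midpoint rule, the sample value y = 0.5, the step h = 10⁻⁶, and the helper names are assumptions made here, and Python's `math` functions are used purely as an independent reference.

```python
import math

def arc_length_theta(y, n=20000):
    """theta = integral from 0 to y of 1/sqrt(1 - u^2) du, midpoint rule, for 0 <= y < 1."""
    du = y / n
    return sum(1.0 / math.sqrt(1.0 - ((k + 0.5) * du) ** 2) for k in range(n)) * du

def central_difference(func, x, h=1e-6):
    """Symmetric difference quotient approximating func'(x)."""
    return (func(x + h) - func(x - h)) / (2.0 * h)

y = 0.5
theta = arc_length_theta(y)                       # arc length from (1, 0) up to height y
sin_slope = central_difference(math.sin, theta)   # compare with cos(theta) = sqrt(1 - y^2)
atan_slope = central_difference(math.atan, 2.0)   # compare with 1 / (1 + x^2) at x = 2

print(theta, math.asin(y))
print(sin_slope, math.sqrt(1.0 - y * y))
print(atan_slope, 1.0 / 5.0)
```

The computed θ agrees with arc sin 0.5 = π/6, and the slope of the sine at that θ agrees with x = √(1 − y²), as in the lemma.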

7C. Area in Polar Coordinates (§7.9)

We assume that the reader is familiar with the polar coordinate system. The Inﬁnite Sum Theorem 6.1 can be used to obtain a formula for area in polar coordinates.

96 7. Trigonometric Functions

Definition 7.9. By a basic polar region we mean a set of points with polar coordinates of the form

$$\{(r,\theta) : a \le \theta \le b,\ f(\theta) \le r \le g(\theta)\},$$

where $b \le a + 2\pi$, $f$ and $g$ are continuous on $[a,b]$, and $0 \le f(\theta) \le g(\theta)$.

Thus a basic polar region is the image of a basic rectangular region in the $(\theta, r)$ plane under the mapping $(r,\theta) \mapsto (r\cos\theta, r\sin\theta)$. If the functions $f$ and $g$ are constants the region is called a polar rectangle. The simplest kind of polar rectangle is a circular sector

$$\{(r,\theta) : a \le \theta \le b,\ 0 \le r \le c\}.$$

Our starting point for polar areas is the following formula for the area of a circular sector:

$$A = \tfrac{1}{2} c^2 (b-a).$$

This formula comes from the formula $A = \frac{1}{2} r s$ for a circular sector, where $r$ is the radius of the circle and $s = (b-a)r$ is the length of the arc.

Consider a polar region of the simple form

$$D = \{(r,\theta) : a \le \theta \le b,\ 0 \le r \le g(\theta)\}.$$

Definition 7.10. Let $g(\theta)$ be a nonnegative continuous real function for $a \le \theta \le b$, where $b \le a + 2\pi$. By a polar area function for $g$ we mean a function $A(u,v)$ defined for $u,v \in [a,b]$ with the following two properties.

(i) $A(u,w) = A(u,v) + A(v,w)$.

(ii) If $g$ has minimum value $m$ and maximum value $M$ on $[u,v]$, then

$$\tfrac{1}{2} m^2 (v-u) \le A(u,v) \le \tfrac{1}{2} M^2 (v-u).$$

Condition (ii) says that $A(u,v)$ is between the areas of the inscribed and circumscribed circular sectors for $u \le \theta \le v$.

Theorem 7.11. The unique polar area function for $g$ is the definite integral

$$A(a,b) = \int_a^b \tfrac{1}{2} g(\theta)^2 \, d\theta.$$

Proof. The integral $A(u,v) = \int_u^v \frac{1}{2} g(\theta)^2 \, d\theta$ is a polar area function for $g$: property (i) is the Addition Property of the definite integral, and property (ii) holds because

$$\int_u^v \tfrac{1}{2} m^2 \, d\theta \le \int_u^v \tfrac{1}{2} g(\theta)^2 \, d\theta \le \int_u^v \tfrac{1}{2} M^2 \, d\theta,$$

$$\int_u^v \tfrac{1}{2} m^2 \, d\theta = \tfrac{1}{2} m^2 (v-u), \qquad \int_u^v \tfrac{1}{2} M^2 \, d\theta = \tfrac{1}{2} M^2 (v-u).$$

To prove uniqueness let $B(u,v)$ be a polar area function for $g$. Let $[\theta, \theta+\Delta\theta]^*$ be an infinitesimal subinterval of $[a,b]^*$. On $[\theta, \theta+\Delta\theta]^*$, $g$ has a minimum value $m$ and a maximum value $M$. By the continuity of $g$, $m \approx g(\theta) \approx M$. Applying the Transfer Axiom to property (ii) we have

$$\tfrac{1}{2} m^2 \Delta\theta \le \Delta B \le \tfrac{1}{2} M^2 \Delta\theta,$$

$$\tfrac{1}{2} m^2 \le \frac{\Delta B}{\Delta\theta} \le \tfrac{1}{2} M^2,$$

$$\frac{\Delta B}{\Delta\theta} \approx \tfrac{1}{2} g(\theta)^2,$$

$$\Delta B \approx \tfrac{1}{2} g(\theta)^2 \Delta\theta \quad (\text{compared to } \Delta\theta).$$

By (i), $B$ has the Addition Property. Therefore by the Infinite Sum Theorem 6.1,

$$B(a,b) = \int_a^b \tfrac{1}{2} g(\theta)^2 \, d\theta. \quad ⊣$$

The area formula for an arbitrary basic polar region, which is justified in a similar way, is

$$A = \int_a^b \tfrac{1}{2} \left( g(\theta)^2 - f(\theta)^2 \right) d\theta.$$
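The polar area formula can be tried out numerically. The sketch below approximates the integral by a finite midpoint Riemann sum of thin sector areas $\frac{1}{2}(g^2 - f^2)\,\Delta\theta$; the function name `polar_area`, the subdivision count `n`, and the sample curves are assumptions made for this illustration only.

```python
import math

def polar_area(g, a, b, f=lambda theta: 0.0, n=20_000):
    """Midpoint Riemann sum for (1/2) * integral_a^b (g(theta)^2 - f(theta)^2) d(theta)."""
    d_theta = (b - a) / n
    total = 0.0
    for k in range(n):
        theta = a + (k + 0.5) * d_theta
        # Area of a thin piece: outer sector minus inner sector.
        total += 0.5 * (g(theta) ** 2 - f(theta) ** 2) * d_theta
    return total

if __name__ == "__main__":
    # Circular sector of radius 2 and angle pi/3: area (1/2) * 2^2 * pi/3.
    print(polar_area(lambda t: 2.0, 0.0, math.pi / 3), 2.0 * math.pi / 3)
    # Cardioid r = 1 + cos(theta): area 3*pi/2.
    print(polar_area(lambda t: 1.0 + math.cos(t), 0.0, 2.0 * math.pi), 1.5 * math.pi)
```

For a constant $g$ the sum reproduces the sector formula $\frac{1}{2}c^2(b-a)$ up to rounding, which is the starting point of the derivation above.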

CHAPTER 8

EXPONENTIAL FUNCTIONS

The exponential and logarithmic functions have been introduced in a variety of ways in calculus texts. Our approach here (and in Elementary Calculus) is to define $a^x$ as the unique continuous function of $x$ which has the value $a^{m/n} = \sqrt[n]{a^m}$ when $x = m/n$ is rational. In this chapter we will use hyperreal numbers to prove a general result on uniquely extending continuous functions, and then apply the result to the case of the exponential functions.

8A. Extending Continuous Functions

Let us recall the hyperreal characterizations of the closure of a set of reals and of uniform continuity. By Corollary 1.29, the closure of a set $Y \subseteq \mathbb{R}$ is the set

$$\overline{Y} = \{\operatorname{st}(y) : y \text{ is finite and } y \in Y^*\}.$$

By Definition 3.12, a real function $f$ is uniformly continuous on a set $Y \subseteq \mathbb{R}$ if and only if for all $x,y \in Y^*$, $x \approx y$ implies $f(x) \approx f(y)$. We will see in the next section that the exponential function $a^x$ is not uniformly continuous on the set $\mathbb{Q}$ of all rationals, but is uniformly continuous on every bounded subset of $\mathbb{Q}$.

Proposition 8.1. A real function $f$ is uniformly continuous on every bounded subset of $Y$ if and only if for every finite $x,y \in Y^*$, $x \approx y$ implies $f(x) \approx f(y)$.

Proof. This follows at once from the fact that $x,y$ are finite elements of $Y^*$ if and only if $x,y \in (Y \cap [a,b])^*$ for some real interval $[a,b]$. ⊣

Theorem 8.2. Let $f$ be a real function which is uniformly continuous on every bounded subset of its domain $Y$. Then $f$ has a unique extension $g$ whose domain is the closure $\overline{Y}$ of $Y$ and which is continuous on $\overline{Y}$.

Proof. Let $g$ be the real function on $\overline{Y}$ defined by

$$g(\operatorname{st}(x)) = \operatorname{st}(f(x))$$

for all finite $x \in Y^*$. This definition is unambiguous because if $\operatorname{st}(x) = \operatorname{st}(y)$ then $x \approx y$, so $f(x) \approx f(y)$ by Proposition 8.1 and $\operatorname{st}(f(x)) = \operatorname{st}(f(y))$. $g$ extends $f$, because if $r \in Y$ then

$$g(r) = g(\operatorname{st}(r)) = \operatorname{st}(f(r)) = f(r).$$

We show that $g$ is continuous at each point $c \in \overline{Y}$. Let $X$ be the bounded set $X = Y \cap [c-1, c+1]$. We use the $\varepsilon,\delta$ condition for uniform continuity, Theorem 5.7. Consider a real $\varepsilon > 0$. Since $f$ is uniformly continuous on $X$, there is a real $\delta \in (0,1)$ such that whenever $x,y \in X$ and $|x-y| < \delta$, $|f(x)-f(y)| < \varepsilon/2$. By Transfer, whenever $x,y \in X^*$ and $|x-y| < \delta$, $|f(x)-f(y)| < \varepsilon/2$. Now let $b \in \overline{Y}$ and $|b-c| < \delta$. We have $b = \operatorname{st}(x)$, $c = \operatorname{st}(y)$ for some $x,y \in Y^*$. Since $|b-c| < \delta < 1$ we have $x,y \in X^*$ and $|x-y| < \delta$. Therefore

$$|f(x)-f(y)| < \varepsilon/2,$$

$$|\operatorname{st}(f(x)) - \operatorname{st}(f(y))| \le \varepsilon/2 < \varepsilon,$$

$$|g(b)-g(c)| < \varepsilon.$$

Thus the $\varepsilon,\delta$ condition for continuity holds for $g$ at $c$.

The function $g$ is unique because for any continuous extension $h$ of $f$ to $\overline{Y}$ and any point $c \in \overline{Y}$ we have

$$h(c) = h(\operatorname{st}(x)) = \operatorname{st}(h(x)) = \operatorname{st}(f(x)) = g(c),$$

where $x \in Y^*$ and $c = \operatorname{st}(x)$. ⊣

8B. The Functions $a^x$ and $\log_b x$ (§8.1, §8.2)

In Elementary Calculus, hyperreal numbers were used to define and obtain the basic properties of the exponential function. The natural extension of the set $\mathbb{Q}$ of rationals is called the set $\mathbb{Q}^*$ of hyperrationals. Here are some properties of $\mathbb{Q}^*$ which follow at once from the Transfer Axiom.

Proposition 8.3. (i) $\mathbb{Q}^*$ is a subfield of $\mathbb{R}^*$, that is, $\mathbb{Q}^*$ is closed under addition, subtraction, multiplication, and division by nonzero numbers.

(ii) $\mathbb{Q}^*$ is dense in $\mathbb{R}^*$, that is, if $b < c$ in $\mathbb{R}^*$ then there exists $q \in \mathbb{Q}^*$ with $b < q < c$.

(iii) For every $x \in \mathbb{R}^*$ there is a $q \in \mathbb{Q}^*$ such that $x \approx q$. (This follows from (ii).)

The closure of the set $\mathbb{Q}$ of rationals is the whole real line $\mathbb{R}$. One way to see this is by applying (iii) to $x \in \mathbb{R}$. We will use the following properties of rational exponents.

Lemma 8.4. Let $a,b$ be positive real numbers and $q,r$ be rational. Then:

(i) $1^q = 1$

(ii) $a^{q+r} = a^q a^r$, $a^{q-r} = a^q / a^r$

(iii) $a^{qr} = (a^q)^r$

(iv) $a^q b^q = (ab)^q$

(v) $a < b$ and $q > 0$ imply $a^q < b^q$

(vi) $1 < a$ and $q < r$ imply $a^q < a^r$

(vii) $q \ge 1$ implies $(a+1)^q \ge aq + 1$.

Lemma 8.5. Let $a > 0$. The function $f(x) = a^x$, $x \in \mathbb{Q}$, is uniformly continuous on every bounded subset of $\mathbb{Q}$.

Proof. Assume first that $a \ge 1$. Suppose $q,r \in \mathbb{Q}^*$ are finite, $q \approx r$, and $q < r$. By Proposition 8.1 we must show that $a^q \approx a^r$. Let $b = a^{r-q} - 1$. By Lemma 8.4 (vi) and Transfer, $b \ge 0$. Moreover, by Lemma 8.4 (vii) and Transfer,

$$a = (b+1)^{1/(r-q)} \ge \frac{b}{r-q} + 1 \ge 1,$$

so $b/(r-q)$ is finite and hence $b \approx 0$. Thus $a^{r-q} \approx 1$. For some integer $n$, $n \le q < n+1$, and by Lemma 8.4 (vi), $a^n \le a^q \le a^{n+1}$. Therefore $a^q$ is finite and $a^r = a^q a^{r-q} \approx a^q$.

Now assume $0 < a < 1$. Then $a^{-1} > 1$, so by the preceding paragraph,

$$a^r = (a^{-1})^{-r} \approx (a^{-1})^{-q} = a^q. \quad ⊣$$

By Theorem 8.2 we may make the following definition.

Definition 8.6. Let $a$ be a positive real number. The exponential function with base $a$, $a^x$, is the unique extension of the function $a^q$, $q \in \mathbb{Q}$, which is continuous on the whole real line.
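To see the definition at work, the sketch below evaluates $a^q$ using only a root and an integer power, and then approaches a real exponent through rational approximations. The names `rational_power` and `power_by_rational_approximation` are hypothetical, and floating-point numbers stand in for the reals. The root is taken first, $(a^{1/n})^m$, which agrees with $\sqrt[n]{a^m}$ by the rational exponent rules and avoids very large intermediate values.

```python
import math
from fractions import Fraction

def rational_power(a, q):
    """a**q for a > 0 and rational q = m/n with n > 0: the n-th root of a, raised to m."""
    root = a ** (1.0 / q.denominator)   # a^(1/n)
    return root ** q.numerator          # (a^(1/n))^m

def power_by_rational_approximation(a, x, max_denominator):
    """Evaluate a**q at a rational q close to the real exponent x."""
    q = Fraction(x).limit_denominator(max_denominator)
    return rational_power(a, q)

if __name__ == "__main__":
    x = math.sqrt(2.0)
    for d in (10, 1000, 10**5):
        q = Fraction(x).limit_denominator(d)
        print(q, rational_power(2.0, q))
    print("library value:", 2.0 ** x)
```

As the rational exponents approach $\sqrt{2}$, the printed values settle toward the library value of $2^{\sqrt{2}}$, in line with continuity of the extension.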

Theorem 8.7. The exponent rules (i)–(vii) of Lemma 8.4 hold for arbitrary real numbers $q$ and $r$.

Proof. All the rules except (iii) are easily proved by using Lemma 8.4 and the fact that there are hyperrational numbers $q_1 \approx q$ and $r_1 \approx r$. We prove (iii) for the case $1 < a$ and $q,r > 0$. Choose hyperrational numbers $q_1, q_2$ such that $q_1 \approx q_2$, $q_1 < q < q_2$. Do the same for $r$. We have $q_1 r_1 < qr < q_2 r_2$. By (vi) and Transfer,

$$a^{q_1 r_1} < a^{qr} < a^{q_2 r_2}, \qquad a^{q_1} < a^q < a^{q_2}, \qquad (a^{q_1})^{r_1} < (a^q)^r < (a^{q_2})^{r_2}.$$

Using (iii) for hyperrational exponents,

$$a^{q_1 r_1} = (a^{q_1})^{r_1}, \qquad a^{q_2 r_2} = (a^{q_2})^{r_2}.$$

But $q_1 r_1 \approx qr \approx q_2 r_2$, so by the continuity of the function $a^x$,

$$a^{q_1 r_1} \approx a^{qr} \approx a^{q_2 r_2}.$$

It follows that $a^{qr} \approx (a^q)^r$, and since both numbers are real they must be equal. ⊣

Definition 8.8. Let $0 < a$ and $a \ne 1$. The logarithmic function with base $a$ is the inverse of the exponential function with base $a$:

$$x = \log_a y \text{ if and only if } y = a^x.$$

By Theorem 7.2, $\log_a y$ is continuous and has domain $(0,\infty)$. The familiar rules for logarithms follow from the corresponding rules for exponents and will be used freely below.

8C. Derivatives of Exponential Functions (§8.3)

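In this section we introduce the number $e$, and differentiate the functions $e^x$ and $\ln x$. The next lemma uses the geometric series formula

$$1 + b + b^2 + \cdots + b^n = \frac{b^{n+1} - 1}{b - 1}, \quad b \ne 1,$$

which is easily proved by multiplying both sides by $b - 1$.

Lemma 8.9. The limit $\lim_{x\to\infty} \left(1 + \frac{1}{x}\right)^x$ exists.

Proof. Let $b$ be a real number greater than 1. The function $y = b^t$ is continuous, so it has an integral

$$c = \int_0^1 b^t \, dt.$$

$b^t$ is positive for all $t$, so $c > 0$. Let $H$ be positive infinite. We will prove that

$$\left(1 + \frac{1}{H}\right)^H \approx b^{c/(b-1)}.$$

We work with the logarithm

$$\log_b\left[\left(1 + \frac{1}{H}\right)^H\right] = H \log_b\left(1 + \frac{1}{H}\right).$$

Let $\Delta t = \log_b\left(1 + \frac{1}{H}\right)$. $\Delta t$ is positive infinitesimal, because $\Delta t \approx \log_b 1 = 0$. Moreover,

$$b^{\Delta t} = 1 + \frac{1}{H}, \qquad H = \frac{1}{b^{\Delta t} - 1}, \qquad H \log_b\left(1 + \frac{1}{H}\right) = \frac{\Delta t}{b^{\Delta t} - 1}.$$

We wish to estimate the Riemann sum $\sum_0^1 b^t \, \Delta t$. Any real solution of

$$\Delta u > 0, \quad n \in \mathbb{Z}, \quad n\Delta u < 1 \le (n+1)\Delta u \tag{53}$$

is a solution of

$$\left(1 + b^{\Delta u} + \cdots + b^{(n-1)\Delta u}\right)\Delta u \le \sum_0^1 b^u \, \Delta u \le \left(1 + b^{\Delta u} + \cdots + b^{n\Delta u}\right)\Delta u.$$

By the geometric series formula, this simplifies to

$$\frac{b^{n\Delta u} - 1}{b^{\Delta u} - 1}\,\Delta u \le \sum_0^1 b^u \, \Delta u \le \frac{b^{(n+1)\Delta u} - 1}{b^{\Delta u} - 1}\,\Delta u. \tag{54}$$

Let $K$ be the hyperinteger with $K\Delta t < 1 \le (K+1)\Delta t$. By Transfer,

$$\frac{b^{K\Delta t} - 1}{b^{\Delta t} - 1}\,\Delta t \le \sum_0^1 b^t \, \Delta t \le \frac{b^{(K+1)\Delta t} - 1}{b^{\Delta t} - 1}\,\Delta t.$$

Then

$$H \log_b\left(1 + \frac{1}{H}\right)\left(b^{K\Delta t} - 1\right) \le \sum_0^1 b^t \, \Delta t \le H \log_b\left(1 + \frac{1}{H}\right)\left(b^{(K+1)\Delta t} - 1\right),$$

$$\frac{\sum_0^1 b^t \, \Delta t}{b^{(K+1)\Delta t} - 1} \le H \log_b\left(1 + \frac{1}{H}\right) \le \frac{\sum_0^1 b^t \, \Delta t}{b^{K\Delta t} - 1}.$$

Since $K\Delta t \approx 1 \approx (K+1)\Delta t$, both denominators are infinitely close to $b - 1$, and the Riemann sum is infinitely close to $c = \int_0^1 b^t \, dt$. We conclude that

$$\frac{c}{b-1} \approx H \log_b\left(1 + \frac{1}{H}\right)$$

and

$$b^{c/(b-1)} \approx \left(1 + \frac{1}{H}\right)^H.$$

Since this holds for all positive infinite $H$,

$$b^{c/(b-1)} = \lim_{x\to\infty} \left(1 + \frac{1}{x}\right)^x. \quad ⊣$$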

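Definition 8.10.

$$e = \lim_{x\to\infty} \left(1 + \frac{1}{x}\right)^x, \qquad \ln x = \log_e x.$$

Theorem 8.11. The exponential function $e^x$ is differentiable, and

$$\frac{d(e^x)}{dx} = e^x.$$

Proof. Let $t$ be finite and $\Delta t$ be positive infinitesimal. We show that

$$\frac{e^{t+\Delta t} - e^t}{\Delta t} \approx e^t.$$

We have

$$\frac{e^{t+\Delta t} - e^t}{\Delta t} = e^t \, \frac{e^{\Delta t} - 1}{\Delta t}.$$

Let $b = \dfrac{e^{\Delta t} - 1}{\Delta t}$. Then $e^{\Delta t} = 1 + b\Delta t$. Since $e^t$ is continuous, $e^{\Delta t} \approx 1$, and $b\Delta t$ is positive infinitesimal. Thus $H = 1/(b\Delta t)$ is positive infinite. Also,

$$e \approx \left(1 + \frac{1}{H}\right)^H = (1 + b\Delta t)^{1/(b\Delta t)} = \left(e^{\Delta t}\right)^{1/(b\Delta t)} = e^{1/b}.$$

Therefore $b \approx 1$, and

$$\frac{e^{t+\Delta t} - e^t}{\Delta t} = b e^t \approx e^t. \tag{55}$$

Now let $x$ be real and $\Delta t$ be positive infinitesimal. Then by (55),

$$\frac{e^{x+\Delta t} - e^x}{\Delta t} \approx e^x, \qquad \frac{e^x - e^{x-\Delta t}}{\Delta t} \approx e^{x-\Delta t} \approx e^x.$$

Therefore $d(e^x)/dx = e^x$. ⊣

In general, $a^x = e^{x \ln a}$, so

$$\frac{d(a^x)}{dx} = \frac{d\,e^{x \ln a}}{dx} = (\ln a)\,e^{x \ln a} = (\ln a)\,a^x.$$

This shows that $e$ is uniquely characterized by the equation $d(e^x)/dx = e^x$, because when $a \ne e$, $\ln a \ne 1$.

Corollary 8.12. $y = \ln x$ is differentiable for all $x > 0$, and

$$\frac{d(\ln x)}{dx} = \frac{1}{x}.$$

Proof. Since $y = \ln x$, $x = e^y$. By the Inverse Function Theorem 7.3, $dy/dx$ exists and

$$\frac{dy}{dx} = \frac{1}{dx/dy} = \frac{1}{e^y} = \frac{1}{x}. \quad ⊣$$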
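The quantities in this section can be explored with ordinary floating-point arithmetic. The sketch below is only a finite illustration: large real $x$ and small real $\Delta t$ stand in for infinite and infinitesimal numbers, and the names `compound`, `integral_of_power`, `lemma_constant`, and `exp_difference_quotient` are our own. It tabulates $(1 + 1/x)^x$, evaluates $b^{c/(b-1)}$ with $c = \int_0^1 b^t\,dt$ approximated by a midpoint sum, and forms the difference quotient of $e^t$.

```python
import math

def compound(x):
    """(1 + 1/x)**x, whose limit as x -> infinity defines e."""
    return (1.0 + 1.0 / x) ** x

def integral_of_power(b, n=10_000):
    """Midpoint Riemann sum for c = integral_0^1 b**t dt."""
    dt = 1.0 / n
    return sum(b ** ((k + 0.5) * dt) for k in range(n)) * dt

def lemma_constant(b):
    """b**(c / (b - 1)), the value identified with the limit in Lemma 8.9."""
    return b ** (integral_of_power(b) / (b - 1.0))

def exp_difference_quotient(t, dt):
    """(e^(t + dt) - e^t) / dt for a small real increment dt."""
    return (math.exp(t + dt) - math.exp(t)) / dt

if __name__ == "__main__":
    for x in (10.0, 1e3, 1e6):
        print(x, compound(x))
    for b in (2.0, 10.0):
        print(b, lemma_constant(b))
    print(exp_difference_quotient(1.0, 1e-6), math.e)
```

The value $b^{c/(b-1)}$ comes out the same for different bases $b$, as it must, since the limit does not depend on $b$.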

CHAPTER 9

INFINITE SERIES

Throughout this chapter we let H and K denote positive inﬁnite hyperintegers. The set of natural numbers (nonnegative integers) will be denoted by N, and the set of positive integers by N+.

9A. Sequences (§9.1)

By an infinite sequence $\langle a_n \rangle$ we mean a function from either $\mathbb{N}$ or $\mathbb{N}^+$ into the real numbers.

Definition 9.1. An infinite sequence $\langle a_n \rangle$ converges to a real number $L$ if $a_H \approx L$ for every $H$, in symbols, $\lim_{n\to\infty} a_n = L$. $\langle a_n \rangle$ diverges to $\infty$ if $a_H$ is positive infinite for all $H$, in symbols, $\lim_{n\to\infty} a_n = \infty$.

We begin with some computations of limits.

Theorem 9.2. The following sequences diverge to $\infty$.

$$\lim_{n\to\infty} \frac{n!}{b^n} = \infty \quad (b \ge 1),$$

$$\lim_{n\to\infty} \frac{b^n}{n^c} = \infty \quad (b > 1,\ c \ge 0),$$

$$\lim_{n\to\infty} \frac{n^c}{\ln(n)} = \infty \quad (c > 0),$$

$$\lim_{n\to\infty} \ln(n) = \infty.$$

Proof. We show that each of

$$\frac{H!}{b^H}, \quad \frac{b^H}{H^c}, \quad \frac{H^c}{\ln(H)}, \quad \ln(H)$$

is positive infinite. The natural logarithm $\ln(H)$ is positive infinite because for each real $r$, $e^r < H$ and hence $r < \ln(H)$. In the other three cases we show that the logarithm of the quotient is positive infinite.

$H!/b^H$: For an integer $m > b$ we have

$$\ln\left(\frac{H!}{b^H}\right) = \ln(1) + \cdots + \ln(m-1) + \ln(m) + \cdots + \ln(H) - H\ln(b)$$

$$> (H-m)\ln(m) - H\ln(b) = H(\ln m - \ln b) - m \ln m.$$

Since $\ln m > \ln b$, $\ln(H!/b^H)$ is positive infinite.

$b^H/H^c$: We have

$$\ln\left(\frac{b^H}{H^c}\right) = H\ln b - c\ln H = H\left(\ln(b) - c\,\frac{\ln(H)}{H}\right).$$

By l'Hospital's Rule, $\lim_{x\to\infty} (\ln(x)/x) = 0$, so $\ln(H)/H \approx 0$ and $\ln(b^H/H^c)$ is positive infinite.

$H^c/\ln H$: Putting $K = \ln(H)$,

$$\ln\left(\frac{H^c}{\ln(H)}\right) = c\ln(H) - \ln(\ln(H)) = K\left(c - \frac{\ln(K)}{K}\right),$$

so $\ln(H^c/\ln(H))$ is positive infinite. ⊣

Here are three equivalence theorems for limits of sequences.

Theorem 9.3. Given a sequence $\langle a_n \rangle$ and a real number $L$, the following are equivalent.

(i) $\lim_{n\to\infty} a_n = L$.

(ii) There is an $H$ such that for all $K \ge H$, $a_K \approx L$.

(iii) The $\varepsilon, M$ condition: For every real $\varepsilon > 0$ there is a positive integer $M$ such that for all $n \ge M$, $|a_n - L| < \varepsilon$.

The proof is similar to Theorem 5.1, the equivalence theorem for limits of functions.
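The logarithmic comparisons used in the proof of Theorem 9.2 can be tabulated for finite $n$. The sketch below is illustrative only, since no real $n$ is infinite; the function names are hypothetical, and `math.lgamma(n + 1)` is used for $\ln(n!)$. Each function returns the logarithm of one of the quotients, so that growth of the output reflects growth of the quotient.

```python
import math

def log_factorial_over_power(n, b):
    """ln(n! / b**n) = ln(n!) - n ln(b), with ln(n!) = lgamma(n + 1)."""
    return math.lgamma(n + 1) - n * math.log(b)

def log_power_over_polynomial(n, b, c):
    """ln(b**n / n**c) = n ln(b) - c ln(n)."""
    return n * math.log(b) - c * math.log(n)

def log_polynomial_over_log(n, c):
    """ln(n**c / ln(n)) = c ln(n) - ln(ln(n))."""
    return c * math.log(n) - math.log(math.log(n))

if __name__ == "__main__":
    for n in (10, 100, 1000, 10**4):
        print(n,
              log_factorial_over_power(n, 10.0),
              log_power_over_polynomial(n, 1.1, 5.0),
              log_polynomial_over_log(n, 0.1))
```

For small $n$ some of these logarithms are negative; the theorem only asserts what happens eventually, which is why the slowly growing case $n^{0.1}/\ln n$ needs very large $n$ before the quotient exceeds 1.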

Theorem 9.4. The following are equivalent.

(i) $\lim_{n\to\infty} a_n = \infty$.

(ii) There is an $H$ such that for all $K \ge H$, $a_K$ is positive infinite.

(iii) For every real $B$ there is a positive integer $M$ such that for all $n \ge M$, $a_n > B$.

Theorem 9.5. The following are equivalent.

(i) The sequence $\langle a_n \rangle$ converges.

(ii) Hyperreal Cauchy Condition: For all $H$ and $K$, $a_H \approx a_K$.

(iii) Real Cauchy Condition: For every real $\varepsilon > 0$ there is a positive integer $M$ such that for all $m,n \ge M$, $|a_m - a_n| < \varepsilon$.

Proof. Assume (i), say $\lim_{n\to\infty} a_n = L$. Then for all $H$ and $K$, $a_H \approx L \approx a_K$, so (ii) holds.

Now assume (ii). Suppose the Real Cauchy Condition (iii) fails for some real $\varepsilon > 0$. Then every $M \in \mathbb{N}^+$ is a partial real solution of

$$m \in \mathbb{N}^+, \quad n \in \mathbb{N}^+, \quad M \le m, \quad M \le n, \quad |a_m - a_n| \ge \varepsilon.$$

Let $J$ be positive infinite. By the Partial Solution Theorem 1.20 there exist $H,K \ge J$ such that $|a_H - a_K| \ge \varepsilon$, contradicting (ii).

Assume the Real Cauchy Condition (iii). There is a positive integer $M_1$ such that for all integers $m,n \ge M_1$, $|a_m - a_n| < 1$. By Transfer, for all integers $m \ge M_1$, $|a_m - a_H| < 1$. Therefore $a_H$ is finite. Let $L = \operatorname{st}(a_H)$. We show that the sequence converges to $L$. Let $\varepsilon > 0$ be real and let $M$ be the corresponding positive integer. Using Transfer again, for all hyperintegers $n \ge M$ we have $|a_n - a_H| < \varepsilon$. Thus for all $K$, $|a_K - a_H| < \varepsilon$. Since this holds for each $\varepsilon$, we have $a_K \approx a_H \approx L$ for every $K$, whence $\lim_{n\to\infty} a_n = L$. ⊣

We now give hyperreal proofs of the Bolzano-Weierstrass Theorem and the countable Heine-Borel Theorem. A subsequence of $\langle a_n \rangle$ is a sequence $\langle a_{f(n)} \rangle$ where $f$ is an increasing function from $\mathbb{N}$ into $\mathbb{N}$.

Lemma 9.6. (i) If $\operatorname{st}(a_H) = L$ then $\langle a_n \rangle$ has a subsequence converging to $L$.

(ii) If $a_H$ is positive infinite then $\langle a_n \rangle$ has a subsequence diverging to $\infty$.

Proof. (i) For each $n \in \mathbb{N}^+$ we have

$$H \in \mathbb{N}^*, \quad H \ge n, \quad |a_H - L| < \frac{1}{n}.$$

By the Partial Solution Theorem, each $n \in \mathbb{N}^+$ is a partial real solution of

$$m \in \mathbb{N}^+, \quad m \ge n, \quad |a_m - L| < \frac{1}{n}.$$

Define $f(0) = 0$, and for each $n > 0$ define $f(n)$ to be the first positive integer $m$ such that

$$m > f(n-1), \quad |a_m - L| < \frac{1}{n}.$$

Then $f$ is increasing and $|a_{f(n)} - L| < \frac{1}{n}$. By the $\varepsilon,\delta$ condition for limits (Theorem 5.1), the subsequence $\langle a_{f(n)} \rangle$ converges to $L$.

(ii) The proof is similar to (i). ⊣

$\langle a_n \rangle$ is a bounded sequence if its range $\{a_n : n \in \mathbb{N}\}$ is a bounded set. Thus by Theorem 1.31, $\langle a_n \rangle$ is bounded if and only if $a_H$ is finite for every $H \in \mathbb{N}^*$.

Theorem 9.7. (Bolzano-Weierstrass Theorem) Every bounded sequence has a convergent subsequence.

Proof. Let $\langle a_n \rangle$ be bounded and choose an infinite $H$. Then $a_H$ is finite, and by Lemma 9.6 $\langle a_n \rangle$ has a subsequence converging to $\operatorname{st}(a_H)$. ⊣

Theorem 9.8. (Countable Heine-Borel Theorem) Let

$$X_0 \supseteq X_1 \supseteq \cdots \supseteq X_n \supseteq \cdots$$

be a decreasing chain of nonempty closed bounded sets of real numbers. Then

$$\bigcap_{n=0}^{\infty} X_n \ne \emptyset.$$

Proof. Let $P(x,n)$ be the binary relation $x \in X_n$ on the reals. Then for each natural number $n$,

$$X_n^* = \{x \in \mathbb{R}^* : P^*(x,n)\}.$$

Choose an infinite $H$ and define

$$X_H^* = \{x \in \mathbb{R}^* : P^*(x,H)\}.$$

By hypothesis, each $X_n$ is nonempty, so each real solution of $n \in \mathbb{N}$ is a partial solution of

$$n \in \mathbb{N}, \quad P(x,n).$$

By the Partial Solution Theorem, there is a hyperreal $x$ such that $P^*(x,H)$, so $x \in X_H^*$. Using Transfer we see that $X_H^* \subseteq X_n^*$ for all $n \le H$, and hence for all $n \in \mathbb{N}$. Therefore $x \in X_n^*$ for all $n \in \mathbb{N}$. Since $X_n$ is closed and bounded, $x$ is finite and $\operatorname{st}(x) \in X_n$. Thus $\operatorname{st}(x) \in \bigcap_{n=0}^{\infty} X_n$. ⊣

Theorem 9.9. An increasing sequence $\langle a_n \rangle$ either diverges to $\infty$ or converges.

Proof. Suppose the sequence does not diverge to $\infty$. Then some $a_H$ is not positive infinite. Since $\langle a_n \rangle$ is increasing, it follows from the Transfer Axiom that the natural extension of $\langle a_n \rangle$ is increasing, that is, $a_m < a_n$ whenever $m < n$ in $\mathbb{N}^*$. Then $a_H > a_0$, so $a_H$ is not negative infinite. Therefore $a_H$ is finite. By Lemma 9.6, $\langle a_n \rangle$ has a subsequence $\langle a_{f(n)} \rangle$ which converges to $L = \operatorname{st}(a_H)$. For every natural number $n \ge f(0)$ there is an $m$ such that $f(m) \le n < f(m+1)$. Take an infinite $K$. By the Partial Solution Theorem there is an $M \in \mathbb{N}^*$ such that $f(M) \le K \le f(M+1)$. $M$ must be positive infinite, so $a_{f(M)} \approx L \approx a_{f(M+1)}$. But since $\langle a_n \rangle$ is increasing, $a_{f(M)} \le a_K \le a_{f(M+1)}$. Therefore $a_K \approx L$, so $\langle a_n \rangle$ converges to $L$. ⊣
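The index function $f$ built in the proof of Lemma 9.6 (i) is an explicit search procedure, and it can be run on a concrete sequence once a candidate limit $L$ is known. In the sketch below the name `subsequence_indices`, the `search_limit` guard, and the sample sequence are assumptions of ours; in the lemma itself $L$ is obtained as a standard part, which a program cannot compute.

```python
def subsequence_indices(a, L, count, search_limit=10**6):
    """Return [f(0), ..., f(count)] with f(0) = 0 and, for n >= 1,
    f(n) = least m > f(n - 1) such that |a(m) - L| < 1/n."""
    indices = [0]
    for n in range(1, count + 1):
        m = indices[-1] + 1
        while abs(a(m) - L) >= 1.0 / n:
            m += 1
            if m > search_limit:
                raise ValueError("no suitable index found below search_limit")
        indices.append(m)
    return indices

if __name__ == "__main__":
    # A bounded sequence that oscillates near +1 and -1.
    a = lambda n: (-1) ** n * (1.0 + 1.0 / (n + 1))
    idx = subsequence_indices(a, 1.0, 6)
    print(idx)
    print([a(m) for m in idx[1:]])
```

For this sample sequence the search skips every odd index, and the selected terms approach 1, which is one of the two values a subsequence of it can converge to.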

9B. Series (§9.2 – §9.6)

Given an infinite sequence

$$\langle a_n \rangle = a_0, a_1, \ldots, a_n, \ldots,$$

the partial sum sequence $\langle S_n \rangle$ is defined by

$$S_n = a_0 + a_1 + \cdots + a_n = \sum_{k=0}^{n} a_k.$$

By the Function Axiom, the natural extension of the function $S_n$ has a value $S_H$ for each positive infinite hyperinteger $H$, which we call the infinite partial sum

$$S_H = a_0 + a_1 + \cdots + a_H = \sum_{k=0}^{H} a_k.$$
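For finite $n$ the partial sum sequence is easy to compute directly. The sketch below builds $S_0, \ldots, S_n$ for a sequence given as a Python function and, for $a_k = b^k$, compares the last partial sum with the closed form $(b^{n+1} - 1)/(b - 1)$ from the geometric series formula of Section 8C. The names `partial_sums` and `geometric_closed_form` are hypothetical.

```python
def partial_sums(a, n):
    """Return [S_0, ..., S_n] where S_k = a(0) + a(1) + ... + a(k)."""
    sums, total = [], 0.0
    for k in range(n + 1):
        total += a(k)
        sums.append(total)
    return sums

def geometric_closed_form(b, n):
    """1 + b + ... + b**n = (b**(n + 1) - 1) / (b - 1), for b != 1."""
    return (b ** (n + 1) - 1.0) / (b - 1.0)

if __name__ == "__main__":
    b = 0.5
    sums = partial_sums(lambda k: b ** k, 20)
    print(sums[-1], geometric_closed_form(b, 20))
```

With $b = 0.5$ the partial sums increase toward 2, which is the value $1/(1-b)$.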

Definition 9.10. The infinite series $\sum_{n=0}^{\infty} a_n$ is said to converge to $S$, in symbols

$$S = a_0 + a_1 + \cdots + a_n + \cdots = \sum_{n=0}^{\infty} a_n,$$

if the partial sum sequence converges to $S$,

$$S = \lim_{n\to\infty} S_n = \lim_{n\to\infty} (a_0 + a_1 + \cdots + a_n).$$

That is, for every positive infinite hyperinteger $H$,

$$S \approx a_0 + a_1 + \cdots + a_H.$$

The series is said to diverge if the partial sum sequence diverges.

Proposition 9.11. (Geometric Series) For each $b \in (-1,1)$,

$$\sum_{n=0}^{\infty} b^n = \frac{1}{1-b}.$$

Proof. By the geometric series formula and Transfer, for each infinite $H$ we have

$$1 + b + b^2 + \cdots + b^H = \frac{b^{H+1} - 1}{b - 1} \approx \frac{1}{1-b},$$

since $b^{H+1} \approx 0$. ⊣

The equivalence theorems for limits of sequences in the preceding section automatically give equivalence theorems for sums of series. In particular, the Hyperreal Cauchy Condition in Theorem 9.5 has the following consequence, where $a_H + \cdots + a_K$ means $\sum_{n=0}^{K} a_n - \sum_{n=0}^{H-1} a_n$.

Proposition 9.12. (i) $\sum_{n=0}^{\infty} a_n$ converges if and only if for all infinite $H \le K$, $a_H + \cdots + a_K \approx 0$.

(ii) If $\sum_{n=0}^{\infty} a_n$ converges, then $\lim_{n\to\infty} a_n = 0$.

Proof. (i) is the Hyperreal Cauchy Condition stated in series notation. (ii) follows by setting $K = H$, so that $a_H \approx 0$. ⊣

The usual convergence tests for infinite series are developed in Elementary Calculus in the standard way. The following hyperreal form of the Limit Comparison Test is simpler to state and use than the standard result.

Theorem 9.13. (Limit Comparison Test) Let $\sum_{n=0}^{\infty} a_n$ and $\sum_{n=0}^{\infty} b_n$ be positive term series and $c$ be a positive real number. Suppose that $a_K \le c\,b_K$ for all infinite $K$. If $\sum_{n=0}^{\infty} b_n$ converges then $\sum_{n=0}^{\infty} a_n$ converges.

Proof. Let $H \le K$ be infinite. By the Hyperreal Cauchy Condition 9.12, $b_H + \cdots + b_K \approx 0$. Hence

$$0 \le a_H + \cdots + a_K \le c\,(b_H + \cdots + b_K) \approx 0,$$

so $a_H + \cdots + a_K \approx 0$ and $\sum_{n=0}^{\infty} a_n$ converges. ⊣

As an illustration of the use of the Limit Comparison Test we give a proof of a basic result on power series. A series $\sum_{n=0}^{\infty} a_n$ is absolutely convergent if the series $\sum_{n=0}^{\infty} |a_n|$ is convergent.

Theorem 9.14. If a power series $\sum_{n=0}^{\infty} a_n x^n$ is convergent when $x = u$, then it is absolutely convergent whenever $|x| < |u|$.

Proof. Let $|v| < |u|$ and $b = |v|/|u|$. Then $0 \le b < 1$, so the geometric series $\sum_{n=0}^{\infty} b^n$ converges. We assume that $\sum_{n=0}^{\infty} a_n u^n$ converges, so $\lim_{n\to\infty} a_n u^n = 0$ by Proposition 9.12. Then for positive infinite $H$, $a_H u^H \approx 0$. Thus

$$|a_H v^H| = |a_H u^H|\, b^H \le b^H,$$

so by the Limit Comparison Test 9.13, $\sum_{n=0}^{\infty} |a_n v^n|$ converges. ⊣

9C. Taylor’s Formula and Higher Differentials (§9.10)

In Elementary Calculus the standard proof of Taylor’s Formula is given. Let $f$ be a real function and $c,x \in \mathbb{R}$.

Theorem 9.15. (Taylor’s Formula) If $f^{(n+1)}$ exists between $c$ and $x$, then

$$f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(c)}{k!}(x-c)^k + \frac{f^{(n+1)}(t)}{(n+1)!}(x-c)^{n+1}$$

for some real $t$ between $c$ and $x$.
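For $f(x) = e^x$ every derivative is again $e^x$, so the point $t$ in Taylor's Formula can be solved for explicitly and checked to lie between $c$ and $x$. The sketch below does this; the function names are our own, and the choice of $e^x$ is only a convenient test case in which the remainder equation can be inverted with a logarithm.

```python
import math

def taylor_polynomial_exp(c, x, n):
    """Sum over k = 0..n of exp(c)/k! * (x - c)**k (all derivatives of exp equal exp)."""
    return sum(math.exp(c) / math.factorial(k) * (x - c) ** k for k in range(n + 1))

def remainder_point_exp(c, x, n):
    """Solve exp(t)/(n+1)! * (x - c)**(n+1) = exp(x) - P_n(x) for t (requires x != c)."""
    remainder = math.exp(x) - taylor_polynomial_exp(c, x, n)
    return math.log(remainder * math.factorial(n + 1) / (x - c) ** (n + 1))

if __name__ == "__main__":
    for c, x, n in ((0.0, 1.0, 3), (0.0, -1.0, 2), (1.0, 1.5, 4)):
        print(c, x, n, remainder_point_exp(c, x, n))
```

In each case the computed $t$ falls strictly between $c$ and $x$, as the theorem asserts.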

Using the Partial Solution Theorem we obtain the following consequence concerning inﬁnitesimals.

Corollary 9.16. Suppose $c$ is real and $f^{(n)}$ is continuous at $c$. Then whenever $x \approx c$ and $\Delta x$ is a nonzero infinitesimal,

$$f(x + \Delta x) \approx \sum_{k=0}^{n} \frac{f^{(k)}(x)}{k!}\,\Delta x^k \quad (\text{compared to } \Delta x^n).$$

Proof. $f^{(n)}(t)$ is defined for $t \approx c$. By Theorem 9.15 (with $n-1$ in place of $n$) and the Partial Solution Theorem there is a $t$ between $x$ and $x + \Delta x$ such that

$$f(x + \Delta x) = \sum_{k=0}^{n-1} \frac{f^{(k)}(x)}{k!}\,\Delta x^k + \frac{f^{(n)}(t)}{n!}\,\Delta x^n = \sum_{k=0}^{n} \frac{f^{(k)}(x)}{k!}\,\Delta x^k + \frac{f^{(n)}(t) - f^{(n)}(x)}{n!}\,\Delta x^n.$$

By continuity of $f^{(n)}$ at $c$, $f^{(n)}(t) - f^{(n)}(x) \approx 0$, and the result follows. ⊣

By modifying the proof of Taylor’s Formula, we show that Corollary 9.16 holds at $x = c$ assuming only that $f^{(n)}(c)$ exists.

Theorem 9.17. Suppose $c$ is real, $\Delta x$ is nonzero infinitesimal, and $f^{(n)}(c)$ exists. Then

$$f(c + \Delta x) \approx \sum_{k=0}^{n} \frac{f^{(k)}(c)}{k!}\,\Delta x^k \quad (\text{compared to } \Delta x^n).$$

Proof. Since $f^{(n)}(c)$ exists, $f^{(n-1)}(x)$ exists for all $x$ in some real neighborhood of $c$. Let

$$F(x) = f(x) - \sum_{k=0}^{n} \frac{f^{(k)}(c)}{k!}(x-c)^k, \qquad G(x) = (x-c)^n.$$

Then for $m < n$, $F^{(m)}(c) = 0$ and $G^{(m)}(c) = 0$. Using the Generalized Mean Value Theorem 5.11 $n-1$ times we see that

$$\frac{F(x)}{G(x)} = \frac{F^{(n-1)}(t)}{G^{(n-1)}(t)}$$

for some $t$ between $c$ and $x$. Note that the $(n-1)$-st derivative of $(t-c)^{n-1}$ is $(n-1)!$, and the $(n-1)$-st derivative of $(t-c)^n$ is $n!\,(t-c)$. From this, we make the computations

$$F^{(n-1)}(t) = f^{(n-1)}(t) - f^{(n-1)}(c) - f^{(n)}(c)(t-c), \qquad G^{(n-1)}(t) = n!\,(t-c),$$

and we have in turn

$$\frac{F(x)}{G(x)} = \frac{f^{(n-1)}(t) - f^{(n-1)}(c)}{n!\,(t-c)} - \frac{f^{(n)}(c)}{n!},$$

$$F(x) = \left[\frac{f^{(n-1)}(t) - f^{(n-1)}(c)}{n!\,(t-c)} - \frac{f^{(n)}(c)}{n!}\right](x-c)^n,$$

$$f(x) = \sum_{k=0}^{n-1} \frac{f^{(k)}(c)}{k!}(x-c)^k + \frac{f^{(n)}(c)}{n!}(x-c)^n + F(x),$$

$$f(x) = \sum_{k=0}^{n-1} \frac{f^{(k)}(c)}{k!}(x-c)^k + \frac{f^{(n-1)}(t) - f^{(n-1)}(c)}{n!\,(t-c)}(x-c)^n.$$

By the Partial Solution Theorem there is a hyperreal $t_1$ between $c$ and $c + \Delta x$ such that

$$f(c + \Delta x) = \sum_{k=0}^{n-1} \frac{f^{(k)}(c)}{k!}\,\Delta x^k + \frac{f^{(n-1)}(t_1) - f^{(n-1)}(c)}{n!\,(t_1 - c)}\,\Delta x^n.$$

Since $f^{(n)}(c)$ exists, we have

$$f^{(n)}(c) \approx \frac{f^{(n-1)}(t_1) - f^{(n-1)}(c)}{t_1 - c}.$$

Therefore

$$f(c + \Delta x) = \sum_{k=0}^{n} \frac{f^{(k)}(c)}{k!}\,\Delta x^k + \varepsilon\,\Delta x^n$$

for some $\varepsilon \approx 0$, and the desired result follows. ⊣

When $n = 1$, Theorem 9.17 reduces to the Increment Theorem,

$$f(c + \Delta x) \approx f(c) + f'(c)\,\Delta x \quad (\text{compared to } \Delta x).$$

Another natural way to generalize the Increment Theorem involves the notion of an $n$-th increment.

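Definition 9.18. Let $y = f(x)$ be defined on an interval $I$. The $n$-th increment $\Delta^n y = \Delta^n f(x, \Delta x)$ is defined by induction on $n$ as follows.

$$\Delta^1 y = \Delta^1 f(x, \Delta x) = f(x + \Delta x) - f(x),$$

$$\Delta^{n+1} y = \Delta^{n+1} f(x, \Delta x) = \Delta^n f(x + \Delta x, \Delta x) - \Delta^n f(x, \Delta x).$$

Thus $\Delta^n f(x, \Delta x)$ is defined whenever $x \in I$ and $x + n\Delta x \in I$. For $n = 2$ we have

$$\Delta^2 y = [f(x + 2\Delta x) - f(x + \Delta x)] - [f(x + \Delta x) - f(x)] = f(x + 2\Delta x) - 2f(x + \Delta x) + f(x).$$

For $n = 3$,

$$\Delta^3 y = f(x + 3\Delta x) - 3f(x + 2\Delta x) + 3f(x + \Delta x) - f(x).$$

In general, it can be shown by induction that

$$\Delta^n y = \sum_{k=0}^{n} (-1)^{n-k} \binom{n}{k} f(x + k\Delta x),$$

where $\binom{n}{k} = \dfrac{n!}{k!\,(n-k)!}$ is the binomial coefficient.

The following theorem gives a connection between the $n$-th increment and the $n$-th differential $d^n y = f^{(n)}(x)\,dx^n$.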

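Theorem 9.19. Suppose $y = f(x)$ is a real function, $c$ is a real number, and $\Delta x$ is a nonzero infinitesimal.

(i) If $f^{(n)}$ is continuous at $c$, then at every hyperreal point $x \approx c$ we have

$$\frac{\Delta^n y}{\Delta x^n} \approx \frac{d^n y}{dx^n},$$

that is,

$$\Delta^n f(x, \Delta x) \approx f^{(n)}(x)\,\Delta x^n \quad (\text{compared to } \Delta x^n).$$

(ii) If $f^{(n)}$ exists at $c$, then at $x = c$ we have

$$\frac{\Delta^n y}{\Delta x^n} \approx \frac{d^n y}{dx^n},$$

that is,

$$\Delta^n f(c, \Delta x) \approx f^{(n)}(c)\,\Delta x^n \quad (\text{compared to } \Delta x^n).$$

Proof. We give the proof for the case $\Delta x > 0$. We first prove a generalization of the Mean Value Theorem 3.30.

(a) Let $u$ and $\Delta u$ be real and suppose $g^{(m)}(t)$ exists for $u \le t \le u + m\Delta u$. Then

$$\Delta^m g(u, \Delta u) = g^{(m)}(t_1)\,\Delta u^m$$

for some $t_1 \in (u, u + m\Delta u)$.

The proof is by induction on $m$. For $m = 1$ it is just the Mean Value Theorem,

$$g(u + \Delta u) - g(u) = g'(t)\,\Delta u.$$

Assume (a) for $m - 1$ and let $h(t) = g(t + \Delta u) - g(t)$. Then

$$h^{(m-1)}(t) = g^{(m-1)}(t + \Delta u) - g^{(m-1)}(t)$$

exists for $u \le t \le u + (m-1)\Delta u$, so for some $t_0 \in (u, u + (m-1)\Delta u)$,

$$\Delta^{m-1} h(u, \Delta u) = h^{(m-1)}(t_0)\,\Delta u^{m-1}.$$

Moreover,

$$\Delta^{m-1} h(u, \Delta u) = \Delta^m g(u, \Delta u).$$

By the Mean Value Theorem,

$$h^{(m-1)}(t_0) = g^{(m-1)}(t_0 + \Delta u) - g^{(m-1)}(t_0) = g^{(m)}(t_1)\,\Delta u$$

for some $t_1 \in (t_0, t_0 + \Delta u)$, and

$$\Delta^m g(u, \Delta u) = g^{(m)}(t_1)\,\Delta u^m.$$

This proves (a) for $m$, and completes the induction.

We now prove (i). Since $f^{(n)}$ is continuous at $c$, $f^{(n)}(t)$ exists for $x \le t \le x + n\Delta x$. By (a) and the Partial Solution Theorem,

$$\frac{\Delta^n f(x, \Delta x)}{\Delta x^n} = f^{(n)}(t_1)$$

for some $t_1 \in (x, x + n\Delta x)$. Then $t_1 \approx x \approx c$, and since $f^{(n)}$ is continuous at $c$,

$$\frac{\Delta^n f(x, \Delta x)}{\Delta x^n} \approx f^{(n)}(x).$$

To prove (ii) we let $m = n - 1$ and $g(t) = f(t + \Delta x) - f(t)$. By (a) and the Partial Solution Theorem there is a $t_1 \in (c, c + (n-1)\Delta x)$ such that

$$\Delta^{n-1} g(c, \Delta x) = g^{(n-1)}(t_1)\,\Delta x^{n-1},$$

that is,

$$\Delta^n f(c, \Delta x) = \frac{f^{(n-1)}(t_1 + \Delta x) - f^{(n-1)}(t_1)}{\Delta x}\,\Delta x^n.$$

Let $t_2 = t_1 + \Delta x$. Then $t_1 \approx c$, $t_2 \approx c$, so

$$\frac{\Delta^n f(c, \Delta x)}{\Delta x^n} = \frac{f^{(n-1)}(t_2) - f^{(n-1)}(t_1)}{\Delta x}$$

$$= \frac{f^{(n-1)}(t_2) - f^{(n-1)}(c)}{t_2 - c} \cdot \frac{t_2 - c}{\Delta x} + \frac{f^{(n-1)}(c) - f^{(n-1)}(t_1)}{c - t_1} \cdot \frac{c - t_1}{\Delta x}$$

$$\approx f^{(n)}(c)\,\operatorname{st}\!\left(\frac{t_2 - c}{\Delta x}\right) + f^{(n)}(c)\,\operatorname{st}\!\left(\frac{c - t_1}{\Delta x}\right) = f^{(n)}(c),$$

because $(t_2 - c) + (c - t_1) = \Delta x$. Thus

$$\frac{\Delta^n f(c, \Delta x)}{\Delta x^n} \approx f^{(n)}(c). \quad ⊣$$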
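The $n$-th increment and its relation to the $n$-th derivative can be tried with a small real $\Delta x$ in place of an infinitesimal. The sketch below takes the first increment to be $f(x + \Delta x) - f(x)$ and applies the inductive step of Definition 9.18, then compares with the binomial formula. The function names are hypothetical, and `math.comb` assumes Python 3.8 or later.

```python
import math

def nth_increment(f, x, dx, n):
    """n-th increment of f at x: first increment f(x + dx) - f(x), then the recursion."""
    if n == 1:
        return f(x + dx) - f(x)
    return nth_increment(f, x + dx, dx, n - 1) - nth_increment(f, x, dx, n - 1)

def nth_increment_binomial(f, x, dx, n):
    """Sum over k = 0..n of (-1)**(n - k) * C(n, k) * f(x + k*dx)."""
    return sum((-1) ** (n - k) * math.comb(n, k) * f(x + k * dx) for k in range(n + 1))

if __name__ == "__main__":
    cube = lambda x: x ** 3
    # For a cubic the third increment is exactly 6 * dx**3.
    print(nth_increment(cube, 1.0, 0.5, 3), nth_increment_binomial(cube, 1.0, 0.5, 3))
    # Second increment of sin divided by dx**2, compared with the second derivative -sin.
    dx = 1e-3
    print(nth_increment(math.sin, 1.0, dx, 2) / dx ** 2, -math.sin(1.0))
```

The recursive version makes $2^{n-1}$ calls to the first increment, so for larger $n$ the binomial form is the more economical of the two.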

CHAPTER 10

VECTORS

10A. Hyperreal Vectors (§10.8)

Chapter 10 of Elementary Calculus deals mostly with two and three dimensional vectors over the reals. Only the last section of the chapter, §10.8, concerns hyperreal vectors.

Let $n$ be a fixed natural number. The simplest way to define a real vector in $n$ dimensions is as an $n$-tuple of real numbers, $\langle r_1, \ldots, r_n \rangle$. In Elementary Calculus we used the following more geometric definition. Given two points $P = \langle p_1, \ldots, p_n \rangle$, $Q = \langle q_1, \ldots, q_n \rangle$ in $\mathbb{R}^n$, the ordered pair $\overrightarrow{PQ}$ is called the directed line segment from $P$ to $Q$. The components of $\overrightarrow{PQ}$ are the terms in the $n$-tuple $\langle q_1 - p_1, \ldots, q_n - p_n \rangle$. Two directed line segments are said to be equivalent if they have the same components (hence the same length and direction), and a vector $A$ is an equivalence class of directed line segments. If $A$ is the equivalence class of $\overrightarrow{PQ}$, the components of $\overrightarrow{PQ}$ are called the components of $A$, denoted by $\langle a_1, \ldots, a_n \rangle$, and $A$ is called the vector from $P$ to $Q$.

The basis vectors are the vectors $j_1, \ldots, j_n$, where $j_m$ is the vector with $m$-th component 1 and all other components 0. For real vectors the vector sum $A + B$, vector difference $A - B$, and the scalar multiple $cA$ are defined in the usual way in terms of components. Thus the vector $A$ with components $\langle a_1, \ldots, a_n \rangle$ is equal to the sum

$$A = a_1 j_1 + \cdots + a_n j_n.$$

The inner product and length are defined by

$$A \cdot B = a_1 b_1 + \cdots + a_n b_n, \qquad |A| = \sqrt{A \cdot A} = \sqrt{a_1^2 + \cdots + a_n^2}.$$

The zero vector in $n$ dimensions is denoted by $0$. In two dimensions, the basis vectors are denoted by $i, j$, so the vector $A$ with components $\langle a, b \rangle$ is the sum $A = ai + bj$.


The hyperreal vectors in $n$ dimensions are defined in a similar way. By the Function and Transfer Axioms, the hyperreal vectors in $n$ dimensions form an inner product space over the field $\mathbb{R}^*$ of hyperreal numbers. That is, the usual algebraic rules for vector sums and differences, scalar multiples, and inner products hold for hyperreal vectors. Hyperreal vectors $A$ will be classified by the behavior of the length $|A|$ and the unit vector $\frac{1}{|A|} A$.

Definition 10.1. A hyperreal vector $A$ is said to be infinitesimal, finite, or infinite if its length $|A|$ is infinitesimal, finite, or infinite respectively. $A$ is infinitely close to $B$, in symbols $A \approx B$, if and only if $B - A$ is infinitesimal. The monad and galaxy of $A$ are defined by

monad(A) = {B: B ≈ A},

galaxy(A) = {B: B−A is ﬁnite}. Thus monad(0) is the set of inﬁnitesimal vectors, and galaxy(0) is the set of ﬁnite vectors.

The next proposition follows easily from the inequalities

$$|a_m| \le \sqrt{a_1^2 + \cdots + a_n^2} \le |a_1| + \cdots + |a_n|.$$

Proposition 10.2. (i) $A$ is infinitesimal if and only if all its components are infinitesimal.

(ii) $A$ is finite if and only if all its components are finite.

(iii) $A$ is infinite if and only if at least one of its components is infinite.

(iv) $A \approx B$ if and only if $a_1 \approx b_1, \ldots, a_n \approx b_n$.

Proposition 10.3. The sets monad(0) and galaxy(0) are closed under vector sums, vector differences, and finite scalar multiples.

This follows from Proposition 10.2 and the fact that the sets monad(0) and galaxy(0) of hyperreal numbers are closed under sums, diﬀerences, and ﬁnite multiples.

Definition 10.4. Given a ﬁnite hyperreal vector A, the standard part st(A) is deﬁned as the real vector

st(A) = st(a1)j1 +···+ st(an)jn. Thus st(A) is the unique real vector inﬁnitely close to A.

The next result also follows from Proposition 10.2.

Proposition 10.5. The mapping st preserves sums, differences, finite scalar multiples, inner products, and lengths. That is, if $A, B$ are finite vectors and $c$ is a finite scalar,

$$\operatorname{st}(A + B) = \operatorname{st}(A) + \operatorname{st}(B),$$

$$\operatorname{st}(A - B) = \operatorname{st}(A) - \operatorname{st}(B),$$

$$\operatorname{st}(cA) = \operatorname{st}(c)\operatorname{st}(A),$$

$$\operatorname{st}(A \cdot B) = \operatorname{st}(A) \cdot \operatorname{st}(B),$$

$$\operatorname{st}(|A|) = |\operatorname{st}(A)|.$$

Definition 10.6. A hyperreal vector $A$ has real length if $|A|$ is real. A unit vector is a hyperreal vector of length 1. If $A \ne 0$, the unit vector of $A$ is the unit vector $U = \frac{1}{|A|} A$. $A$ has real direction if $A \ne 0$ and the unit vector of $A$ is a real vector.

Proposition 10.7. A nonzero vector $A$ is real if and only if $A$ has both real length and real direction.

Proof. $A$ has real length and direction if and only if both $|A|$ and $U = \frac{1}{|A|} A$ are real, which holds if and only if $A = |A|\,U$ is real. ⊣

Definition 10.8. Two nonzero vectors $A$ and $B$ are said to be parallel if their unit vectors $U$ and $V$ are equal or opposite, $U = V$ or $U = -V$.

We introduce a weaker notion for hyperreal vectors. Two nonzero hyperreal vectors $A$ and $B$ are said to be almost parallel if their unit vectors $U$ and $V$ are such that either $U \approx V$ or $U \approx -V$.

Proposition 10.9. Every nonzero hyperreal vector $A$ is almost parallel to some real vector.

Proof. Let $U$ be the unit vector of $A$. Then $U$ has finite length 1, so the real vector $V = \operatorname{st}(U)$ exists. $V$ is its own unit vector because

$$|V| = |\operatorname{st}(U)| = \operatorname{st}(|U|) = 1.$$

Finally, $U \approx V$, so $A$ is almost parallel to $V$. ⊣

The notion of almost parallel can be generalized to finite sequences of vectors.

Definition 10.10. A $k$-tuple of real vectors $A_1, \ldots, A_k$ is linearly dependent over $\mathbb{R}$ if there exist real numbers $c_1, \ldots, c_k$, not all zero, such that

$$c_1 A_1 + \cdots + c_k A_k = 0. \tag{56}$$

Similarly, a $k$-tuple $A_1, \ldots, A_k$ of hyperreal vectors is linearly dependent over $\mathbb{R}^*$ if there exist hyperreal numbers $c_1, \ldots, c_k$, not all zero, such that (56) holds.

Finally, a $k$-tuple of hyperreal vectors $A_1, \ldots, A_k$ is almost linearly dependent over $\mathbb{R}$ if either some $A_m = 0$ or there exist real numbers $c_1, \ldots, c_k$, not all zero, such that

$$c_1 U_1 + \cdots + c_k U_k \approx 0,$$

where $U_m$ is the unit vector of $A_m$.


Thus nonzero hyperreal vectors $A_1, \ldots, A_k$ are almost linearly dependent over $\mathbb{R}$ if and only if the standard parts of the unit vectors, $\operatorname{st}(U_1), \ldots, \operatorname{st}(U_k)$, are linearly dependent over $\mathbb{R}$. For pairs of nonzero vectors, linearly dependent means parallel, and almost linearly dependent means almost parallel.

Theorem 10.11. (i) If $A_1, \ldots, A_k$ are linearly dependent over $\mathbb{R}^*$ then they are almost linearly dependent over $\mathbb{R}$.

(ii) For real vectors, linear dependence over $\mathbb{R}$, linear dependence over $\mathbb{R}^*$, and almost linear dependence over $\mathbb{R}$ are equivalent.

Proof. (i) Let $c_1 A_1 + \cdots + c_k A_k = 0$ where the hyperreal numbers $c_1, \ldots, c_k$ are not all zero. We may assume that all the $A_m$ are nonzero since otherwise the vectors are trivially almost linearly dependent over $\mathbb{R}$. Then

$$c_1 |A_1| U_1 + \cdots + c_k |A_k| U_k = 0.$$

Let $b = |c_m||A_m|$ be the largest of the $k$ hyperreal numbers $|c_1||A_1|, \ldots, |c_k||A_k|$, and let $d_\ell = c_\ell |A_\ell| / b$ for $\ell = 1, \ldots, k$. Then

$$d_1 U_1 + \cdots + d_k U_k = 0,$$

each $d_\ell$ is finite, and $|d_m| = 1$. Taking standard parts we see that

$$\operatorname{st}(d_1)\operatorname{st}(U_1) + \cdots + \operatorname{st}(d_k)\operatorname{st}(U_k) = 0$$

and $\operatorname{st}(d_m) \ne 0$. Therefore the standard parts of the unit vectors are linearly dependent over $\mathbb{R}$, and hence $A_1, \ldots, A_k$ are almost linearly dependent over $\mathbb{R}$. This proves (i).

(ii) By the Partial Solution Theorem, linear dependence over $\mathbb{R}$ and over $\mathbb{R}^*$ are equivalent for real vectors. Moreover, real vectors are linearly dependent over $\mathbb{R}$ if and only if the standard parts of their unit vectors are linearly dependent over $\mathbb{R}$. This proves (ii). ⊣

10B. Vector Functions (§10.6)

An $n$-dimensional real vector function $F$ maps a subset of $\mathbb{R}$ into the set of $n$-dimensional real vectors. The components of a real vector function $F$ are real functions $\langle f_1, \ldots, f_n \rangle$ with the same domain as $F$. In symbols,

$$F(t) = f_1(t)\,j_1 + \cdots + f_n(t)\,j_n.$$

The natural extension of $F$ is the hyperreal vector function $F^*$ whose components are the natural extensions $\langle f_1^*, \ldots, f_n^* \rangle$.

We conclude this chapter with definitions and equivalence theorems for limits, continuity, and derivatives of real vector functions of one variable.

Definition 10.12. Let $F$ be a real vector function, $A$ be a real vector, and $c$ be a real number.

$$\lim_{t \to c} F(t) = A$$

means that whenever $t \approx c$ but $t \ne c$, $F(t) \approx A$. $F$ is continuous at $c$ if whenever $t \approx c$, $F(t) \approx F(c)$, that is,

$$\lim_{t \to c} F(t) = F(c).$$

Theorem 10.13. The following are equivalent.

(i) $\lim_{t \to c} F(t) = A$.

(ii) For $m = 1, \ldots, n$, $\lim_{t \to c} f_m(t) = a_m$.

(iii) The $\varepsilon,\delta$ condition. For every real $\varepsilon > 0$ there is a real $\delta > 0$ such that whenever $0 < |t - c| < \delta$, $|F(t) - A| < \varepsilon$.

Proof. The equivalence of (i) and (ii) follows from Proposition 10.2, and the equivalence of (ii) and (iii) follows from Theorem 5.1, the equivalence theorem for limits of functions. ⊣

Definition 10.14. Let $F$ be a real vector function, $S$ be a real vector, and $c$ be a real number. $F$ has vector derivative $S$ at $c$, in symbols $F'(c) = S$, if for every nonzero infinitesimal $\Delta t$ we have

$$\frac{F(c + \Delta t) - F(c)}{\Delta t} \approx S.$$

Thus

$$F'(c) = \lim_{\Delta t \to 0} \frac{F(c + \Delta t) - F(c)}{\Delta t}.$$

Theorem 10.13 leads to $\varepsilon,\delta$ conditions for the vector derivative. In particular, it follows that $F'(c)$ exists if and only if $f_1'(c), \ldots, f_n'(c)$ all exist, and

$$F'(c) = f_1'(c)\,j_1 + \cdots + f_n'(c)\,j_n.$$

Vector increments and vector differentials are defined as vector dependent variables as follows. We are given a real vector function $X = F(t)$ where $t$ is a scalar independent variable and $X$ a vector dependent variable. We introduce a new scalar independent variable $\Delta t$ and a new vector dependent variable $\Delta X$, called the vector increment of $X$, with the equation

$$\Delta X = F(t + \Delta t) - F(t).$$

The vector differential of $X$ is a second vector dependent variable whose values are given by the equation

$$dX = F'(t)\,\Delta t.$$

Thus $dX$ exists when $F'(t)$ exists, and putting $dt = \Delta t$ we have

$$\frac{dX}{dt} = F'(t), \qquad \frac{\Delta X}{\Delta t} \approx \frac{dX}{dt}.$$
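The componentwise description of the vector derivative can be illustrated numerically. In the sketch below vectors are plain tuples, a small real `dt` stands in for an infinitesimal increment, and the names `vector_difference_quotient` and `length` together with the sample curve are our own choices.

```python
import math

def vector_difference_quotient(F, c, dt):
    """(F(c + dt) - F(c)) / dt, computed component by component."""
    return tuple((p - q) / dt for p, q in zip(F(c + dt), F(c)))

def length(A):
    """Euclidean length sqrt(a_1^2 + ... + a_n^2)."""
    return math.sqrt(sum(a * a for a in A))

if __name__ == "__main__":
    # A helix F(t) = (cos t, sin t, t) with derivative F'(t) = (-sin t, cos t, 1).
    F = lambda t: (math.cos(t), math.sin(t), t)
    c = 0.7
    quotient = vector_difference_quotient(F, c, 1e-6)
    exact = (-math.sin(c), math.cos(c), 1.0)
    print(quotient)
    print(exact)
    print(length(tuple(x - y for x, y in zip(quotient, exact))))
```

The length of the difference between the quotient and $F'(c)$ is small exactly when every component difference is small, which is the finite counterpart of Proposition 10.2 (iv).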

CHAPTER 11

PARTIAL DIFFERENTIATION

11A. Continuity in Two Variables (§11.1, §11.2)

For simplicity we concentrate on real functions of two variables. However, all the notions and results readily extend to n variables.

Recall from Definition 1.32 that the distance between two hyperreal points P(x1,y1) and Q(x2,y2) is

|P − Q| = √((x2 − x1)² + (y2 − y1)²),

that P is infinitely close to Q, P ≈ Q, if |P − Q| ≈ 0, and that the monad of P is the set of all points Q infinitely close to P. We have P(x1,y1) = Q(x2,y2) if and only if x1 = x2 and y1 = y2.

Given a real point P and a real number δ > 0, the real neighborhood Nδ(P) is defined as the set of all real points Q such that |Q − P| < δ. The interior of a set D ⊆ R² is the set of all points P such that some real neighborhood Nδ(P) is contained in D. D ⊆ R² is an open set if it is equal to its interior. A point P belongs to the boundary of D if every real neighborhood of P meets both D and its complement. It follows that the interior of D and the boundary of D are disjoint, so each open set is disjoint from its boundary. D is a closed set if D contains its boundary.

Throughout this chapter we let f be a real function of two variables. The graph of z = f(x,y) is a surface in (x,y,z) space. The following result was proved for one variable in Theorem 1.28 and Corollary 1.30, and the proof for two variables is similar.

Theorem 11.1. (i) Let Y ⊆ R² be a set of real points and let P ∈ R² be a real point. If Y∗ contains the monad of P, then Y contains some real neighborhood of P.
(ii) Let P ∈ R². If f(x,y) is defined for all (x,y) ≈ P, then f is defined on some real neighborhood of P.

Definition 11.2. Let (a,b) ∈ R² and L ∈ R. The limit

lim_{(x,y)→(a,b)} f(x,y) = L

means that whenever (x,y) ≈ (a,b) but (x,y) ≠ (a,b), f(x,y) ≈ L. Infinite limits are defined in a similar way. The function f is continuous at (a,b) if f(a,b) is defined, and (x,y) ≈ (a,b) implies f(x,y) ≈ f(a,b).

Corollary 11.3. f is continuous at (a,b) if and only if f is defined at (a,b) and lim_{(x,y)→(a,b)} f(x,y) = f(a,b).

Theorem 1.12 on the standard part function shows that the functions x + y, x − y, and xy are everywhere continuous. For example, if (x,y) ≈ (a,b) ∈ R² then x + y ≈ st(x + y) = st(x) + st(y) = a + b. It follows from Theorem 11.1 that any function which is continuous at (a,b) is defined on some real neighborhood of (a,b). As in the one variable case, we have the following equivalence theorem.

Theorem 11.4. The following are equivalent.
(i) lim_{(x,y)→(a,b)} f(x,y) = L.
(ii) For every real ε > 0 there is a real δ > 0 such that for all (x,y) ≠ (a,b) in Nδ(a,b), |f(x,y) − L| < ε.
(iii) There is a hyperreal δ > 0 such that whenever 0 < |(x,y) − (a,b)| < δ, we have f(x,y) ≈ L.

Proposition 11.5. Compositions of continuous functions are continuous. That is, if f(x,y) and g(x,y) are continuous at (x,y) = (a,b), and if h(u,v) is continuous at (u,v) = (f(a,b),g(a,b)), then the composition k(x,y) = h(f(x,y),g(x,y)) is continuous at (x,y) = (a,b).

The proof is the same as in the one variable case, Proposition 3.11.

11B. Partial Derivatives (§11.3, §11.4)

Definition 11.6. Given a real function f(x,y) and a real point (a,b), the partial derivatives are defined by

fx(a,b) = g′(a) where g(x) = f(x,b),
fy(a,b) = h′(b) where h(y) = f(a,y).

If z = f(x,y) we use the notation

∂z/∂x = ∂f/∂x (a,b) = fx(a,b),  ∂z/∂y = ∂f/∂y (a,b) = fy(a,b).

The mere existence of the partial derivatives of f(x,y) tells us nothing about the behavior of f off of the lines x = a, y = b. We introduce three stronger notions of differentiability which will be used later on in this chapter.

Definition 11.7. Let (a,b) ∈ R² and suppose both partial derivatives fx(a,b) and fy(a,b) exist.
(i) f is smooth at (a,b) if both fx and fy are continuous at (a,b).
(ii) f is differentiable at (a,b) if for any nonzero infinitesimal point (Δx,Δy),

f(a + Δx, b + Δy) − f(a,b) ≈ fx(a,b)Δx + fy(a,b)Δy compared to √(Δx² + Δy²).

(iii) f is uniformly differentiable at (a,b) if for any hyperreal point (x,y) ≈ (a,b) and nonzero infinitesimal point (Δx,Δy),

f(x + Δx, y + Δy) − f(x,y) ≈ fx(a,b)Δx + fy(a,b)Δy compared to √(Δx² + Δy²).

Differentiability and uniform differentiability correspond to the one variable notions. They are equivalent to real ε,δ conditions which are left to the reader.

When z = f(x,y) we introduce two new dependent variables, the increment Δz given by

Δz = f(x + Δx, y + Δy) − f(x,y)

and the total differential dz given by

dz = fx(x,y)Δx + fy(x,y)Δy = (∂z/∂x)Δx + (∂z/∂y)Δy.

Both Δz and dz depend on the four independent variables x, y, Δx, Δy. Using dependent variable notation, f is differentiable at (a,b) if and only if for every nonzero infinitesimal (Δx,Δy),

Δz ≈ dz compared to √(Δx² + Δy²).

If f is differentiable at (a,b), the tangent plane of f at (a,b) is defined as the plane with the equation

z − f(a,b) = fx(a,b)(x − a) + fy(a,b)(y − b).

Putting Δx = x − a and Δy = y − b, we see that

Δz = change in z along the surface,
dz = change in z along the tangent plane.

By definition, if f is differentiable at (a,b) then whenever (x,y) is infinitely close to but not equal to (a,b), the tangent plane is infinitely close to the surface compared to √(Δx² + Δy²).
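The statement that Δz ≈ dz compared to √(Δx² + Δy²) says that the ratio |Δz − dz|/√(Δx² + Δy²) is infinitesimal. A rough numerical sketch follows, with small real increments standing in for infinitesimals; the example function f(x,y) = x²y + sin y, the point (1,2), and the helper names are assumptions chosen for illustration.

```python
import math

def f(x, y):
    # Hypothetical smooth example function.
    return x * x * y + math.sin(y)

def fx(x, y):
    return 2.0 * x * y

def fy(x, y):
    return x * x + math.cos(y)

def relative_gap(a, b, dx, dy):
    """|increment - differential| divided by sqrt(dx^2 + dy^2) at (a, b)."""
    increment = f(a + dx, b + dy) - f(a, b)        # change along the surface
    differential = fx(a, b) * dx + fy(a, b) * dy   # change along the tangent plane
    return abs(increment - differential) / math.hypot(dx, dy)

# The ratio shrinks with the step, as differentiability at (1, 2) predicts.
for step in (1e-1, 1e-3, 1e-5):
    print(step, relative_gap(1.0, 2.0, step, -2.0 * step))
```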


The following result shows the relationship between the three notions of diﬀerentiability. The implication (i) ⇒ (iii) was called the Increment Theorem in Elementary Calculus. The one-variable form of the implication (i) ⇒ (ii) was given by Theorem 3.37 (i).

Theorem 11.8. Let (a,b) be a real point. Each condition below implies the next, (i) ⇒ (ii) ⇒ (iii) ⇒ (iv).
(i) f is smooth at (a,b).
(ii) f is uniformly differentiable at (a,b).
(iii) f is differentiable at (a,b).
(iv) f is continuous at (a,b).

Proof. (i) ⇒ (ii): Let (x,y) ≈ (a,b) and let (Δx,Δy) be nonzero infinitesimal. Then

f(x + Δx, y + Δy) − f(x,y) = [f(x + Δx, y + Δy) − f(x + Δx, y)] + [f(x + Δx, y) − f(x,y)].

Since fx and fy are continuous at (a,b) they are defined everywhere in the monad of (a,b). Using the Hyperreal Mean Value Theorem 3.34 for the one variable function g(u) = f(x + Δx, u), we have

f(x + Δx, y + Δy) − f(x + Δx, y) = fy(x + Δx, u)Δy

for some u between y and y + Δy. Similarly,

f(x + Δx, y) − f(x,y) = fx(t,y)Δx

for some t between x and x + Δx. Since fy(x + Δx, u) ≈ fy(a,b), fx(t,y) ≈ fx(a,b), and

Δx/√(Δx² + Δy²),  Δy/√(Δx² + Δy²)

are finite, we have

f(x + Δx, y + Δy) − f(x,y) ≈ fx(a,b)Δx + fy(a,b)Δy compared to √(Δx² + Δy²).

Thus (i) implies (ii). Condition (ii) trivially implies (iii).

(iii) ⇒ (iv): Let (Δx,Δy) ≈ (0,0). By (iii), f(a + Δx, b + Δy) − f(a,b) ≈ fx(a,b)Δx + fy(a,b)Δy ≈ 0. Therefore f is continuous at (a,b). ⊣

The next two theorems are the two-variable analogues of Theorems 3.37 (ii) and 3.39. The proofs are left to the reader.

Theorem 11.9. If f is uniformly differentiable at every point of an open set Y ⊆ R², then f is smooth at every point of Y.

Theorem 11.10. If f(x,y) is uniformly differentiable at a real point (a,b) then f is continuous on some real neighborhood of (a,b).

11C. Chain Rule and Implicit Functions (§11.5, §11.6)

The Chain Rule for functions of two variables holds when all functions involved are diﬀerentiable in the sense of the preceding section. In Elementary Calculus the theory was simpliﬁed by concentrating on smooth functions. Some theorems were stated under the hypothesis that f is smooth when only diﬀerentiability or uniform diﬀerentiability was actually needed. In this section we give detailed proofs of the basic results and keep track of the kind of diﬀerentiability which is actually needed in each case.

Theorem 11.11. (Chain Rule for Two Variables) Suppose x = f(s,t) and y = g(s,t) are diﬀerentiable at the real point (s0,t0) and z = h(x,y) is diﬀerentiable at the real point

(x0,y0) = (f(s0,t0),g(s0,t0)).

Then the composition

z = H(s,t) = h(f(s,t),g(s,t))

is differentiable at (s0,t0), and its partial derivatives are

∂z/∂s = (∂z/∂x)(∂x/∂s) + (∂z/∂y)(∂y/∂s),
∂z/∂t = (∂z/∂x)(∂x/∂t) + (∂z/∂y)(∂y/∂t).
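Before the proof, the two formulas of Theorem 11.11 can be checked on a concrete composition. This is a minimal sketch under stated assumptions: the functions f, g, h below are hypothetical examples, their partial derivatives are computed by hand, and a central difference with a small real step stands in for the derivative of H.

```python
import math

def f(s, t):          # x = f(s, t)
    return s * t

def g(s, t):          # y = g(s, t)
    return s + t * t

def h(x, y):          # z = h(x, y)
    return x * x + math.sin(y)

def H(s, t):          # the composition z = H(s, t) = h(f(s, t), g(s, t))
    return h(f(s, t), g(s, t))

def chain_rule_partials(s, t):
    """(dz/ds, dz/dt) from the Chain Rule, using hand-computed partials."""
    x, y = f(s, t), g(s, t)
    hx, hy = 2.0 * x, math.cos(y)   # partials of h at (x, y)
    fs, ft = t, s                   # partials of f at (s, t)
    gs, gt = 1.0, 2.0 * t           # partials of g at (s, t)
    return (hx * fs + hy * gs, hx * ft + hy * gt)

def numeric_partials(s, t, step=1e-5):
    """Central-difference estimates of the partials of H at (s, t)."""
    dz_ds = (H(s + step, t) - H(s - step, t)) / (2.0 * step)
    dz_dt = (H(s, t + step) - H(s, t - step)) / (2.0 * step)
    return (dz_ds, dz_dt)

print(chain_rule_partials(0.7, 1.3))
print(numeric_partials(0.7, 1.3))
```

The two printed pairs should agree to several decimal places; the agreement illustrates the formulas but is of course no substitute for the proof.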

Proof. Let (Δs,Δt) be a nonzero infinitesimal point and let Δx, Δy, and Δz be the corresponding increments in x, y, and z. Let δ = √(Δs² + Δt²). Then at the point (s0,t0),

Δx ≈ (∂x/∂s)Δs + (∂x/∂t)Δt (compared to δ),
Δy ≈ (∂y/∂s)Δs + (∂y/∂t)Δt (compared to δ).

It follows that Δx ≈ 0, Δy ≈ 0. Also, Δx/δ, Δy/δ, and hence √(Δx² + Δy²)/δ are finite. Therefore

Δz ≈ (∂z/∂x)Δx + (∂z/∂y)Δy (compared to δ).

When Δt = 0 we have δ = |Δs| and

Δz/Δs ≈ (∂z/∂x)(Δx/Δs) + (∂z/∂y)(Δy/Δs) ≈ (∂z/∂x)(∂x/∂s) + (∂z/∂y)(∂y/∂s),

so

∂z/∂s = (∂z/∂x)(∂x/∂s) + (∂z/∂y)(∂y/∂s).

The formula for ∂z/∂t is obtained in a similar way when Δs = 0. Finally, H is differentiable at (s0,t0) because

Δz/δ ≈ (∂z/∂x)(Δx/δ) + (∂z/∂y)(Δy/δ) ≈ (∂z/∂x)[(∂x/∂s)Δs + (∂x/∂t)Δt](1/δ) + (∂z/∂y)[(∂y/∂s)Δs + (∂y/∂t)Δt](1/δ),

so

Δz ≈ (∂z/∂s)Δs + (∂z/∂t)Δt (compared to δ). ⊣

We now turn to the Implicit Function Theorem, first for two variables and then for three variables.

Definition 11.12. An implicit function of a real curve F(x,y) = 0 at a real point (a,b) is a real function y = g(x) such that:

(i) g(a) = b; (ii) The domain of g is a real neighborhood of a; (iii) The graph of g is a subset of the graph of the equation F(x,y) = 0.

An implicit function of a real surface F(x,y,z) = 0 at a real point (a,b,c) is a real function z = h(x,y) such that:

(iv) h(a,b) = c; (v) The domain of h is a real neighborhood of (a,b); (vi) The graph of h is a subset of the graph of F(x,y,z) = 0.

The notion of uniform diﬀerentiability will be useful for the Implicit Function Theorem.

Theorem 11.13. (Two Variable Implicit Function Theorem) Suppose that at the real point (a,b), z = F(x,y) is uniformly differentiable, F(a,b) = 0, and Fy(a,b) ≠ 0. Then the curve F(x,y) = 0 has an implicit function at (a,b). Moreover, for every implicit function g(x) at (a,b), g is uniformly differentiable at a and

g′(a) = −Fx(a,b)/Fy(a,b).

Proof. Since Fy(a,b) ≠ 0, it follows from the ε,δ condition for uniform differentiability at (a,b) (the two variable form of Theorem 5.4) that there is a real neighborhood Nδ(a,b) with the following property. For any two distinct points (x,y) and (x,y + Δy) of Nδ(a,b), F(x,y) ≠ F(x,y + Δy). We may therefore define a real function g by

g(x) = y iff (x,y) ∈ Nδ(a,b) and F(x,y) = 0.

Obviously g(a) = b. We show first that g is continuous at a. Let x1 ≈ a. Since F is continuous at (a,b), F(x1,b) ≈ 0. Since F is uniformly differentiable at (a,b), for every nonzero infinitesimal Δy we have

(F(x1, b + Δy) − F(x1,b))/Δy ≈ Fy(a,b) ≠ 0.

Hence there is a Δy ≈ 0 such that 0 is between F(x1,b) and F(x1, b + Δy). Since F is continuous on Nδ(a,b), it follows from the Hyperreal Intermediate Value Theorem 3.32 that there is a hyperreal point y1 ≈ b with F(x1,y1) = 0. Every real solution of

F(x,y) = 0, |(x,y) − (a,b)| < δ

is a real solution of y = g(x). By Transfer, y1 = g(x1). Thus g is defined in the monad of a and is continuous at a. By Theorem 11.1 (or Corollary 1.30), g is defined on some real neighborhood of a. By definition, the graph of g is a subset of the graph of F(x,y) = 0.

It remains to show that g is uniformly differentiable at a and has the required derivative. Let x ≈ a and let Δx be nonzero infinitesimal. Let y = g(x), Δy = g(x + Δx) − g(x), and Δs = √(Δx² + Δy²). Since g is continuous at a, Δy ≈ 0. Since F is uniformly differentiable at (a,b),

0 = F(x + Δx, y + Δy) − F(x,y) ≈ Fx(a,b)Δx + Fy(a,b)Δy (compared to Δs).

Since Fy(a,b) ≠ 0,

Δy/Δs ≈ −(Fx(a,b)/Fy(a,b))(Δx/Δs).  (57)

We cannot have Δx/Δs ≈ 0, because then (Δy/Δs)² = 1 − (Δx/Δs)² ≈ 1, which leads to the contradiction

1 ≈ |Δy/Δs| ≈ |Fx(a,b)/Fy(a,b)| · 0 = 0.

Therefore we may divide both sides of (57) by Δx/Δs, and get

Δy/Δx = (Δy/Δs)/(Δx/Δs) ≈ −Fx(a,b)/Fy(a,b).

We conclude that g is uniformly differentiable at a and

g′(a) = −Fx(a,b)/Fy(a,b). ⊣

We now turn to the Implicit Function Theorem in three variables. It is proved in exactly the same way as the two variable theorem.


Theorem 11.14. (Three Variable Implicit Function Theorem) Suppose that at the point (a,b,c), w = F(x,y,z) is uniformly differentiable, F(a,b,c) = 0, and Fz(a,b,c) ≠ 0. Then the surface F(x,y,z) = 0 has an implicit function at (a,b,c). Moreover, every implicit function h(x,y) is uniformly differentiable at (a,b), has partial derivatives

hx(a,b) = −Fx(a,b,c)/Fz(a,b,c),  hy(a,b) = −Fy(a,b,c)/Fz(a,b,c),

and has the tangent plane

Fx(a,b,c)(x − a) + Fy(a,b,c)(y − b) + Fz(a,b,c)(z − c) = 0.
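The derivative formulas of Theorem 11.14 can be checked on a surface whose implicit function is known in closed form. The sketch below assumes the sphere x² + y² + z² = 14 at the point (1,2,3), where the upper hemisphere z = √(14 − x² − y²) is an implicit function; the surface, the point, and the helper names are illustrative choices, and central differences with a small real step stand in for the partial derivatives of h.

```python
import math

def F(x, y, z):
    # Hypothetical surface: the sphere x^2 + y^2 + z^2 = 14.
    return x * x + y * y + z * z - 14.0

def h(x, y):
    # Explicit implicit function near (1, 2, 3): the upper hemisphere.
    return math.sqrt(14.0 - x * x - y * y)

def implicit_partials(a, b, c):
    """(hx, hy) from the formulas -Fx/Fz and -Fy/Fz at (a, b, c)."""
    Fx, Fy, Fz = 2.0 * a, 2.0 * b, 2.0 * c   # hand-computed partials of F
    return (-Fx / Fz, -Fy / Fz)

def numeric_partials(a, b, step=1e-5):
    """Central-difference estimates of the partials of h at (a, b)."""
    hx = (h(a + step, b) - h(a - step, b)) / (2.0 * step)
    hy = (h(a, b + step) - h(a, b - step)) / (2.0 * step)
    return (hx, hy)

print(implicit_partials(1.0, 2.0, 3.0))
print(numeric_partials(1.0, 2.0))
```

Here Fz(1,2,3) = 6 ≠ 0, so the hypothesis of the theorem holds at the chosen point.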

11D. Maxima and Minima (§11.7)

We discuss maxima and minima for real functions of two variables.

Definition 11.15. Let f(x,y) be a real function with domain D. f has a maximum at a point (a,b) ∈ D if f(a,b) ≥ f(x,y) for all (x,y) ∈ D. The value f(a,b) is called the maximum value of f. A minimum of f and the minimum value are deﬁned analogously.

The following two results are proved exactly as in the one variable case.

Proposition 11.16. Suppose a real function f(x,y) with domain D has a maximum at a real point (a,b). Then the natural extension of f has a maximum at (a,b), that is, f(a,b) ≥ f(x,y) for all hyperreal points (x,y) ∈ D∗.

Theorem 11.17. Given a real function f(x,y) and a real point (a,b), the following are equivalent.
(i) f has a local maximum at (a,b), that is, the restriction of f to some real neighborhood of (a,b) has a maximum at (a,b).
(ii) Whenever (x,y) ≈ (a,b), f(a,b) ≥ f(x,y).

By Corollary 1.29 and Theorem 1.31 for two variables, a set of points D ⊆ R² is closed and bounded if and only if
(i) Every point (x,y) ∈ D∗ is finite, and
(ii) Whenever (x,y) ∈ D∗, (st(x),st(y)) ∈ D.
We use this fact to prove the Extreme Value Theorem for two variables.

Theorem 11.18. (Extreme Value Theorem) Suppose the domain of f(x,y) is a closed and bounded set D, and f(x,y) is continuous on D. Then f has a maximum and a minimum.

Proof. Let Y be the range of f,

Y = {f(x,y): (x,y) ∈ D}.

By Proposition 1.27, f∗ has domain D∗ and range Y∗. For each (x1,y1) ∈ D∗, we have (st(x1),st(y1)) ∈ D, and by the continuity of f, f(x1,y1) ≈ f(st(x1),st(y1)). Therefore f(x1,y1) is finite, so every element of Y∗ is finite. By Theorem 1.31, the set Y is bounded, so Y ⊆ [A,B] for some real A and B.

Consider a positive integer n, and partition the interval [A,B] into n equal subintervals of length δ = (B − A)/n. Let k = k(n) be the largest integer such that Y meets the kth subinterval. Then there is a point (g(n),h(n)) such that

(g(n),h(n)) ∈ D,  A + (k − 1)δ ≤ f(g(n),h(n)),

but for all (x,y) ∈ D, f(x,y) ≤ A + kδ.

Now let n1 be a positive infinite hyperinteger and let δ1 = (B − A)/n1 and k1 = k(n1). Using Transfer we see that

(g(n1),h(n1)) ∈ D∗,  A + (k1 − 1)δ1 ≤ f(g(n1),h(n1)).

By another use of Transfer, for every point (x1,y1) ∈ D∗ we have f(x1,y1) ≤ A + k1δ1. Since D is closed and bounded, (g(n1),h(n1)) is finite, and has a standard part

(a,b) = (st(g(n1)),st(h(n1))) ∈ D.

By continuity of f, f(a,b) = st(f(g(n1),h(n1))). It follows that for any point (x,y) ∈ D,

f(x,y) ≤ st(A + k1δ1) = st(A + (k1 − 1)δ1) ≤ f(a,b),

so f has a maximum at (a,b). ⊣

The above proof differs from the proof of the one variable Extreme Value Theorem 3.28 because this time we partitioned the range of f instead of the domain of f. The one variable proof can be generalized to the present case by using a rectangular grid to partition the domain of f. Here is a hyperreal form of the Extreme Value Theorem.

Theorem 11.19. (Hyperreal Extreme Value Theorem) Suppose f is continuous on its domain D, and E is a hyperreal closed rectangle which is contained in D∗. Then f∗ has a maximum and minimum on E.

Proof. By the Extreme Value Theorem 11.18, for every real closed rectangle E with sides [s,t] and [u,v] there is a point

x = g(s,t,u,v), y = h(s,t,u,v)

such that (x,y) ∈ E and either (x,y) ∉ D or f has a maximum on E at (x,y). Then every real solution of

s ≤ x1 ≤ t, u ≤ y1 ≤ v, x = g(s,t,u,v), y = h(s,t,u,v), (x,y) ∈ D  (58)

is a solution of

s ≤ x ≤ t, u ≤ y ≤ v, f(x,y) ≥ f(x1,y1).  (59)

By Transfer, every hyperreal solution of (58) is a solution of (59). For any hyperreal closed rectangle E ⊆ D∗ with sides [s,t]∗ and [u,v]∗, f∗ is defined everywhere in E. It follows that f∗ has a maximum on E at

x = g(s,t,u,v), y = h(s,t,u,v).

The proof that f∗ has a minimum is similar. ⊣

The above theorem can easily be generalized. For instance, later in this section we will introduce the notion of a basic closed hyperreal region, and a proof similar to the above works when E is such a region.

Definition 11.20. For simplicity, for the rest of this chapter we only consider functions f such that f is diﬀerentiable at every interior point of its domain D. An interior point (a,b) of D is said to be a critical point of f if both partial derivatives of f are equal to zero,

fx(a,b) = 0, fy(a,b) = 0.

Theorem 11.21. (Critical Point Theorem) Suppose f is continuous on its domain D and f is diﬀerentiable at every interior point of D. If f has a maximum or minimum at a real point (a,b), then (a,b) is either a boundary point of D or a critical point of f.

Proof. Suppose f has a maximum at an interior point (a,b) of D. Then the one variable function g(x) = f(x,b) has a maximum at the interior point a. By the one variable Critical Point Theorem 3.29, g′(a) = fx(a,b) = 0. Similarly, fy(a,b) = 0. ⊣

As in the one variable case, the Extreme Value Theorem and Critical Point Theorem lead to a method for finding the maxima and minima of a function with a bounded closed domain. We will see that the method can often be applied for other domains as well. In Elementary Calculus we concentrated on basic closed regions, that is, sets D of the form

D = {(x,y): a ≤ x ≤ b, g(x) ≤ y ≤ h(x)}

where g and h are continuous and g(x) ≤ h(x) for x in [a,b]. The following lemma ties the present approach to the simpler treatment in Elementary Calculus.

Lemma 11.22. Let D be a basic closed region in the plane. (i) D is a bounded closed set. (ii) The boundary of D is the set of all points of D on one of the four curves

x = a, x = b, y = g(x), y = h(x).

Proof. (i) Since g and h are continuous, they have maxima and minima on [a,b], so D is bounded. Let (x1,y1) ∈ D∗ and let x = st(x1), y = st(y1). Then

a ≤ x1 ≤ b,  g(x1) ≤ y1 ≤ h(x1).

By the continuity of g and h,

g(x) = st(g(x1)),  h(x) = st(h(x1)).

Therefore

a ≤ x ≤ b,  g(x) ≤ y ≤ h(x),

so (x,y) ∈ D.

(ii) Obviously any point on one of the four curves is on the boundary of D. For example, if a ≤ x ≤ b and g(x) = y then for any real ε > 0, y − ε/2 < g(x) and hence (x, y − ε/2) ∉ D, so Nε(x,y) is not contained in D and (x,y) is on the boundary of D. Suppose (x,y) ∈ D but (x,y) is not on one of the four curves. Then

a < x < b,  g(x) < y < h(x).

For any (x1,y1) ≈ (x,y) we have a < x1 < b. Also, g(x1) < y1 < h(x1) because if, for example, y1 ≤ g(x1), then

y = st(y1) ≤ st(g(x1)) = g(st(x1)) = g(x),

contradicting the assumption that g(x) < y. Thus (x1,y1) ∈ D∗, so D∗ contains the monad of (x,y). It follows by Theorem 1.28 that (x,y) belongs to the interior of D. ⊣

In many applications, a function f is continuous on a basic closed region D and is differentiable and has only finitely many critical points on the interior of D. The maximum of f can usually be found as follows. First, evaluate f at each of its critical points. Second, on each of the four boundary curves of D find the maxima of f by eliminating one variable and using the one variable method. Finally, the largest of the values of f at the critical points and on the boundary curves must be the maximum value of f.

We now present a method for finding maxima and minima on open regions. A basic open region in the plane is a set D ⊆ R² of one of the forms

{(x,y): x ∈ I, g(x) < y < h(x)},
{(x,y): x ∈ I, −∞ < y < h(x)},
{(x,y): x ∈ I, g(x) < y < ∞},
{(x,y): x ∈ I, −∞ < y < ∞}

where I is an open interval, g and h are continuous real functions on I, and g(x) < h(x) for all x ∈ I.

A basic closed hyperreal region in the plane is a set E ⊆ (R∗)² of hyperreal points of the form

E = {(x,y): a ≤ x ≤ b, g(x) + A ≤ y ≤ h(x) + B}

where a,b,A,B are hyperreal numbers and g,h are real functions on an interval I such that a,b ∈ I∗.

A hyperreal cover of a basic open region D is a basic closed hyperreal region E obtained as follows. Let ε be a positive infinitesimal and H be a positive infinite hyperreal number. If

D = {(x,y) ∈ R²: a < x < b, g(x) < y < h(x)},

then

E = {(x,y) ∈ (R∗)²: a + ε ≤ x ≤ b − ε, g(x) + ε ≤ y ≤ h(x) − ε}.

If

D = {(x,y) ∈ R²: a < x < ∞, g(x) < y < ∞},

then

E = {(x,y) ∈ (R∗)²: a + ε ≤ x ≤ H, g(x) + ε ≤ y ≤ H}.

The other cases are similar. Note that every basic open region D has a hyperreal cover. In fact, there is one hyperreal cover for each positive infinitesimal ε and positive infinite H.

Lemma 11.23. If E is a hyperreal cover of a basic open region D, then D ⊆ E ⊆ D∗.

This lemma follows easily from the definition.

The standard calculus course often gives the Second Derivative Test for local maxima, minima, and saddle points, but no practical test for global maxima and minima. The following method for finding global maxima and minima for open regions is presented in Elementary Calculus. It is an application of hyperreal numbers to an elementary problem about real functions.

For a function f with an open domain D, no point of D is on the boundary. Thus by the Critical Point Theorem, if f has a maximum, it must occur at a critical point. But it is often hard to determine whether or not f has a maximum. The next result is a practical criterion for the existence of a maximum of f. By a maximum critical point of f(x,y) we mean a critical point (x0,y0) such that for every other critical point (x1,y1), f(x0,y0) ≥ f(x1,y1). Obviously, if f(x,y) has at least one but only finitely many critical points, it has a maximum critical point.

Theorem 11.24. Suppose a real function f(x,y) is deﬁned on a basic open region D, has partial derivatives at every point of D, and has a maximum critical point (c,d). Let E be a hyperreal cover of D. Then (i) If f(c,d) ≥ f(x,y) for all (x,y) on the boundary of E, then f has a maximum in D at (c,d). (ii) If f(c,d) < f(x,y) for some (x,y) on the boundary of E, then f has no maximum in D. A similar result holds for minima.

Proof. We give the proof in the case that D has the form

D = {(x,y): a < x < ∞, g(x) < y < ∞}.

The hyperreal cover E has the form

E = {(x,y): a + ε ≤ x ≤ H, g(x) + ε ≤ y ≤ H}

where ε is positive infinitesimal and H is positive infinite.

We first prove (ii). Suppose f has a maximum in D. To prove (ii) it suffices to show that f(c,d) ≥ f(x,y) for all (x,y) on the boundary of E. Since D is open, no point of D is on the boundary of D. Hence by the Critical Point Theorem 11.21, the maximum must occur at the maximum critical point (c,d). By Proposition 11.16, f(c,d) ≥ f(x,y) for all (x,y) ∈ D∗. Since E ⊆ D∗, it follows that f(c,d) ≥ f(x,y) for all (x,y) on the boundary of E. This proves (ii).

The proof of (i) uses the Partial Solution Theorem. For each positive real ε0 and H0, let E(ε0,H0) be the basic closed region

E(ε0,H0) = {(x,y): a + ε0 ≤ x ≤ H0, g(x) + ε0 ≤ y ≤ H0}.

Then E(ε0,H0) ⊆ D ⊆ E ⊆ D∗. Let ∂E(ε0,H0) be the boundary of E(ε0,H0). The relations (x,y) ∈ E(ε0,H0) and (x,y) ∈ ∂E(ε0,H0) can both be expressed by systems of formulas. By the Extreme Value and Critical Point Theorems, if (c,d) ∈ E(ε0,H0) then in E(ε0,H0), f has a maximum either at (c,d) or on ∂E(ε0,H0). Thus every real solution of

(c,d) ∈ E(ε0,H0), (s,t) ∈ E(ε0,H0), f(c,d) < f(s,t)  (60)

is a partial real solution of

(s,t) = (s,t), (x,y) ∈ ∂E(ε0,H0), f(c,d) < f(x,y).  (61)

Suppose f does not have a maximum in D at (c,d). To prove (i) we must show that f(c,d) < f(u,v) for some (u,v) on the boundary of E. We have f(c,d) < f(s,t) for some real point (s,t) ∈ D. Since D ⊆ E = E(ε,H), (s,t,ε,H) is a hyperreal solution of the system of formulas (60). By the Partial Solution Theorem, there is a hyperreal point (u,v) such that (61) holds for (s,t,ε,H,u,v). Thus (u,v) is on the boundary of E and f(c,d) < f(u,v). This proves (i). ⊣
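As an illustration of how Theorem 11.24 is used, consider the hypothetical example f(x,y) = xy·e^(−x−y) on the open quadrant D = {(x,y): 0 < x < ∞, 0 < y < ∞}, so that a = 0 and g(x) = 0. Its only critical point in D is (1,1). On the boundary of a hyperreal cover E one checks by hand that f is infinitesimal (for instance, on x = ε we have f ≤ ε/e), so part (i) gives a maximum at (1,1). The sketch below only mimics this with real stand-ins ε0 and H0 for ε and H and a finite sample of boundary points; it is a plausibility check, not a proof, and all names in it are assumptions.

```python
import math

def f(x, y):
    # Hypothetical example on the open quadrant x > 0, y > 0.
    return x * y * math.exp(-x - y)

def boundary_max(eps0, H0, samples=2000):
    """Largest sampled value of f on the boundary of E(eps0, H0),
    the square eps0 <= x <= H0, eps0 <= y <= H0 (here a = 0, g(x) = 0)."""
    best = -math.inf
    for i in range(samples + 1):
        u = eps0 + (H0 - eps0) * i / samples   # position along one side
        best = max(best, f(eps0, u), f(H0, u), f(u, eps0), f(u, H0))
    return best

critical_value = f(1.0, 1.0)   # value at the only critical point (1, 1)
print(critical_value, boundary_max(1e-6, 50.0))
```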

11E. Second Partial Derivatives (§11.8)

Given a function f(x,y) of two variables, one can consider four second partial derivatives, the pure second partials

∂²f/∂x² = fxx = (fx)x and ∂²f/∂y² = fyy = (fy)y

and the mixed second partials

∂²f/∂x∂y = fxy = (fx)y and ∂²f/∂y∂x = fyx = (fy)x.

We give a hyperreal proof of the equality of the mixed second partial derivatives. The hypothesis we need is the uniform diﬀerentiability of both ﬁrst partial derivatives. (In Elementary Calculus we used the stronger hypothesis that both ﬁrst partial derivatives are smooth). The proof uses the following form of the Hyperreal Mean Value Theorem for two variables.

Theorem 11.25. (Hyperreal Mean Value Theorem) Suppose f(x,y) is a real function whose partial derivative fx(x,y) exists on an open rectangle D. Then for every pair of hyperreal points (x,y) and (x + Δx, y) in D∗ with Δx ≠ 0 there is a point t between x and x + Δx such that

(f(x + Δx, y) − f(x,y))/Δx = fx(t,y).

A similar result holds for the other partial derivative fy.

Proof. By the real Mean Value Theorem 3.30 in one variable, every real solution of

(x,y) ∈ D, (x + Δx, y) ∈ D, Δx > 0

is a partial real solution of

(f(x + Δx, y) − f(x,y))/Δx = fx(t,y),  x < t < x + Δx.

The result for Δx > 0 now follows from the Partial Solution Theorem, and the case Δx < 0 is handled in the same way. ⊣

Theorem 11.26. Suppose both partial derivatives fx and fy of a real function f(x,y) are uniformly differentiable at a real point (a,b). Then the mixed second partial derivatives are equal at (a,b),

∂²f/∂x∂y (a,b) = ∂²f/∂y∂x (a,b).

Proof. Let Δx and Δy be nonzero infinitesimals. Since fx and fy are uniformly differentiable at (a,b), they exist and are continuous on a real neighborhood of (a,b), by Theorem 11.10. Let

δ = [f(a + Δx, b + Δy) − f(a + Δx, b)] − [f(a, b + Δy) − f(a,b)].

By the Hyperreal Mean Value Theorem 11.25 applied to the function g(x,Δy) = f(x, b + Δy) − f(x,b), we have

δ/Δx = (g(a + Δx, Δy) − g(a, Δy))/Δx = fx(t, b + Δy) − fx(t,b)

for some t between a and a + Δx. Similarly, regrouping δ as [f(a + Δx, b + Δy) − f(a, b + Δy)] − [f(a + Δx, b) − f(a,b)] and applying the theorem to the function k(Δx,y) = f(a + Δx, y) − f(a,y), we have

δ/Δy = fy(a + Δx, u) − fy(a,u)

for some u between b and b + Δy. By uniform differentiability of fx and fy,

δ/(ΔxΔy) = (fx(t, b + Δy) − fx(t,b))/Δy ≈ (fx)y(a,b) = ∂²f/∂x∂y (a,b)

and

δ/(ΔxΔy) = (fy(a + Δx, u) − fy(a,u))/Δx ≈ (fy)x(a,b) = ∂²f/∂y∂x (a,b).

Both mixed partials at (a,b) are real numbers infinitely close to the same hyperreal number, so they are equal. Therefore

∂²f/∂x∂y (a,b) = ∂²f/∂y∂x (a,b). ⊣
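The quantity δ/(ΔxΔy) from the proof of Theorem 11.26 can be computed for small real increments and compared with the mixed partial derivative. The sketch below assumes the hypothetical example f(x,y) = x³y² + sin(xy), for which both mixed partials equal 6x²y + cos(xy) − xy·sin(xy) by a hand computation; real increments stand in for the infinitesimals of the proof.

```python
import math

def f(x, y):
    # Hypothetical example with continuous second partial derivatives.
    return x ** 3 * y ** 2 + math.sin(x * y)

def mixed_partial(x, y):
    # Hand-computed: (fx)y = (fy)x = 6 x^2 y + cos(xy) - xy sin(xy).
    return 6.0 * x * x * y + math.cos(x * y) - x * y * math.sin(x * y)

def second_difference_quotient(a, b, dx, dy):
    """delta / (dx * dy), with delta as in the proof of Theorem 11.26."""
    delta = (f(a + dx, b + dy) - f(a + dx, b)) - (f(a, b + dy) - f(a, b))
    return delta / (dx * dy)

print(second_difference_quotient(1.0, 2.0, 1e-4, 1e-4), mixed_partial(1.0, 2.0))
```

Because δ is symmetric in the roles of x and y, the same quotient approximates both mixed partials, which is the idea behind the proof.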

CHAPTER 12

MULTIPLE INTEGRATION

Double and triple integrals are developed in Chapter 12 of Elementary Calculus. To simplify our notation we will concentrate entirely on double integrals here. However, everything we do in this chapter can readily be generalized to triple integrals.

Permanent Assumption. We assume throughout this chapter that D0 ⊆ R² is an open set in the plane, and that f(x,y) is a real function which is continuous on D0.

12A. Double Integrals (§12.1, §12.2)

We will deﬁne the double integral of a continuous real function f(x,y) and state the basic results. Most of the proofs are similar to corresponding proofs for the single integral in Chapter 4 and will be omitted here. We will consider basic closed subregions D ⊆ D0. The analogue for f of an area function for a real function of one variable is a volume function.

Definition 12.1. A volume function for f is a function B(D) from the set of basic closed regions D ⊆ D0 into the real numbers which has the following properties:

Addition Property: If D = D1 ∪ D2 where

D1 = {(x,y) ∈ D: x ≤ c},  D2 = {(x,y) ∈ D: x ≥ c},

then B(D) = B(D1) + B(D2). If D = D1 ∪ D2 where

D1 = {(x,y) ∈ D: y ≤ k(x)},  D2 = {(x,y) ∈ D: y ≥ k(x)},

where k(x) is continuous on R, then B(D) = B(D1) + B(D2).

Cylinder Property: If f(x,y) has minimum value m and maximum value M on D, and A is the area of D, then mA ≤ B(D) ≤ MA.

The Addition Property states that if we divide D into two basic closed regions D1 and D2 with either a vertical line x = c or a continuous curve y = k(x), then B(D) = B(D1) + B(D2). The Cylinder Property states that B(D) is between the volume of the cylinder above D with the minimum height m and the volume of the cylinder above D with the maximum height M. The intuitive notion of the volume of a solid above the region D between the horizontal plane z = 0 and the surface z = f(x,y) has the Addition and Cylinder properties. It will turn out that the double integral is the unique volume function for f.

Our first step in defining the double integral is the finite double Riemann sum. Let

D = {(x,y): a ≤ x ≤ b, g(x) ≤ y ≤ h(x)}

be a basic closed region included in D0, and let

B1 = minimum value of g(x), a ≤ x ≤ b,
B2 = maximum value of h(x), a ≤ x ≤ b.

The rectangle [a,b] × [B1,B2] is called the circumscribed rectangle of D. Let Δx and Δy be positive real numbers and divide the intervals [a,b] and [B1,B2] into subintervals of length Δx and Δy respectively,

x0 = a, x1 = a + Δx, …, xm = a + mΔx where xm < b ≤ xm + Δx,
y0 = B1, y1 = B1 + Δy, …, yn = B1 + nΔy where yn < B2 ≤ yn + Δy.

If Δx does not evenly divide b − a, the last subinterval [xm,b] will have length less than Δx and will be covered by the interval [xm, xm + Δx]. Thus the circumscribed rectangle [a,b] × [B1,B2] of D is covered by a grid of Δx by Δy rectangles.

Definition 12.2. The finite double Riemann sum of f over D is defined as

ΣΣ_D f(x,y)ΔxΔy = Σ_{k=0}^{m} Σ_{ℓ=0}^{n} {f(xk,yℓ)ΔxΔy: (xk,yℓ) ∈ D}.

Thus ΣΣ_D f(x,y)ΔxΔy is the sum of the volumes of the rectangular solids of base ΔxΔy and height f(xk,yℓ) such that the point (xk,yℓ) belongs to D.

For a given region D, the finite double Riemann sum ΣΣ_D f(x,y)ΔxΔy is a real function of two variables Δx and Δy, and is defined for all Δx > 0, Δy > 0. By the Function Axiom, its natural extension is defined for all hyperreal Δx > 0, Δy > 0.
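For real Δx and Δy the finite double Riemann sum is an ordinary finite computation. The following minimal sketch follows the grid construction described above, under the assumption that the caller supplies B1 and B2 (the bounds of the circumscribed rectangle); the function name, its parameters, and the example region are illustrative and not part of the text.

```python
def double_riemann_sum(f, a, b, g, h, B1, B2, dx, dy):
    """Finite double Riemann sum of f over D = {a <= x <= b, g(x) <= y <= h(x)}.

    B1 and B2 are supplied by the caller as the lower and upper bounds
    of the circumscribed rectangle [a, b] x [B1, B2].
    """
    total = 0.0
    k = 0
    while a + k * dx < b:                  # x_k = a + k*dx for k = 0, ..., m
        xk = a + k * dx
        l = 0
        while B1 + l * dy < B2:            # y_l = B1 + l*dy for l = 0, ..., n
            yl = B1 + l * dy
            if g(xk) <= yl <= h(xk):       # keep only grid points inside D
                total += f(xk, yl) * dx * dy
            l += 1
        k += 1
    return total

# Hypothetical example: f(x, y) = x + y over the triangle 0 <= y <= x <= 1.
approx = double_riemann_sum(lambda x, y: x + y, 0.0, 1.0,
                            lambda x: 0.0, lambda x: x, 0.0, 1.0,
                            0.005, 0.005)
print(approx)
```

For this example the exact volume, computed by hand as an iterated integral, is 1/2, and the sum with Δx = Δy = 0.005 should be close to that value.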

Definition 12.3. If dx and dy are positive infinitesimals, the hyperreal number

ΣΣ_D f(x,y)dxdy

is called the infinite Riemann sum of f over D with respect to dx and dy.

Lemma 12.4. For any basic closed region D and positive infinitesimals dx and dy, the infinite Riemann sum ΣΣ_D f(x,y)dxdy is a finite hyperreal number.

Definition 12.5. Let dx and dy be positive infinitesimals. The double integral of f over D with respect to dx and dy is the standard part of the infinite Riemann sum,

∬_D f(x,y)dxdy = st(ΣΣ_D f(x,y)dxdy).

Theorem 12.6. The value of the double integral of f over a basic closed region D does not depend on dx or dy. That is, if dx, d1x, dy, d1y are positive infinitesimals, then

∬_D f(x,y)dxdy = ∬_D f(x,y)d1xd1y.

Hereafter we will write dA = dxdy and denote the double integral by

∬_D f(x,y)dxdy = ∬_D f(x,y)dA.

Theorem 12.7. The double integral is the unique volume function for f.

This theorem justifies the definition of the volume V of the solid over the region D above the plane z = 0 and below the surface z = f(x,y) as

V = ∬_D f(x,y)dA.

It follows from the Cylinder Property that the area of a basic closed region D is equal to the definite integral

A = ∬_D 1 dA.

The next theorem provides the basic tool for computing double integrals.

Theorem 12.8. (Iterated Integral Theorem) Let D be the basic closed region

D = {(x,y): a ≤ x ≤ b, g(x) ≤ y ≤ h(x)}.

Then the double integral of f over D is equal to the iterated integral

∬_D f(x,y)dA = ∫_a^b ∫_{g(x)}^{h(x)} f(x,y) dy dx.

Proof. Let

B(D) = ∫_a^b ∫_{g(x)}^{h(x)} f(x,y) dy dx.

Using properties of the single integral, a simple computation shows that B(D) is a volume function for f. By Theorem 12.7, the double integral is the only volume function for f, so it must be equal to B(D). ⊣

Corollary 12.9. Suppose D is a basic closed region in both the (x,y) plane and the (y,x) plane, that is,

D = {(x,y): a ≤ x ≤ b, g(x) ≤ y ≤ h(x)},
D = {(x,y): c ≤ y ≤ d, k(y) ≤ x ≤ ℓ(y)}.

Then the two iterated integrals are equal,

∫_a^b ∫_{g(x)}^{h(x)} f(x,y) dy dx = ∫_c^d ∫_{k(y)}^{ℓ(y)} f(x,y) dx dy.

Proof. The definition of the double Riemann sum is symmetric between x and y, so the double integral in the sense of the (x,y) plane equals the double integral in the sense of the (y,x) plane. By the Iterated Integral Theorem 12.8, each of the iterated integrals is equal to the double integral ∬_D f(x,y)dA. ⊣
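Corollary 12.9 can be illustrated on a region that is basic in both orders. The sketch below assumes the hypothetical example f(x,y) = xy on D = {0 ≤ x ≤ 1, x² ≤ y ≤ x}, which is also {0 ≤ y ≤ 1, y ≤ x ≤ √y}; a hand computation gives 1/24 for both iterated integrals. The midpoint rule used here is a stand-in numerical method chosen for simplicity and is not taken from the text.

```python
import math

def midpoint_integral(func, lo, hi, n=200):
    """Midpoint-rule approximation of the single integral of func on [lo, hi]."""
    width = (hi - lo) / n
    return width * sum(func(lo + (i + 0.5) * width) for i in range(n))

def f(x, y):
    # Hypothetical integrand.
    return x * y

def integral_dy_dx():
    # D = {0 <= x <= 1, x^2 <= y <= x}: integrate in y first, then in x.
    inner = lambda x: midpoint_integral(lambda y: f(x, y), x * x, x)
    return midpoint_integral(inner, 0.0, 1.0)

def integral_dx_dy():
    # The same D as {0 <= y <= 1, y <= x <= sqrt(y)}: x first, then y.
    inner = lambda y: midpoint_integral(lambda x: f(x, y), y, math.sqrt(y))
    return midpoint_integral(inner, 0.0, 1.0)

print(integral_dy_dx(), integral_dx_dy(), 1.0 / 24.0)
```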

12B. Infinite Sum Theorem for Two Variables (§12.3)

As in the case of single integrals, the Infinite Sum Theorem can be used for applications of the double integral.

Definition 12.10. An element of area is a hyperreal rectangle ΔD ⊆ (D0)∗ whose sides are infinitesimal and parallel to the x and y axes. Given an element of area ΔD we let

(x,y) = lower left corner of ΔD,
Δx, Δy = dimensions of ΔD,
ΔA = ΔxΔy = area of ΔD.

Theorem 12.11. (Infinite Sum Theorem) Let f(x,y) be a real function which is continuous on an open set D0 and let B(D) be a function from basic closed regions D ⊆ D0 to real numbers. Let Δx and Δy be positive infinitesimals. Assume that:
(i) B has the Addition Property;
(ii) B(D) ≥ 0 for all D;
(iii) For every element of area ΔD ⊆ (D0)∗ with dimensions Δx and Δy,

B(ΔD) ≈ f(x,y)ΔxΔy (compared to ΔxΔy).

Then for every basic closed region D ⊆ D0,

B(D) = ∬_D f(x,y)dA.

In Elementary Calculus we stated a weaker form of the Infinite Sum Theorem, where (iii) is assumed for all elements of area rather than only for elements of area with fixed dimensions Δx and Δy. The proof there was given only in the case that D is a rectangle. Here we give the proof in general. The proof in the general case is much longer than the proof in the case that D is a rectangle, and uses several lemmas.

Lemma 12.12. Let $D$ be a closed and bounded subset of the open set $D_0$. There is a positive real number $\varepsilon$ such that the $\varepsilon$-neighborhood of every point of $D$ is included in $D_0$.

Proof. Suppose, to the contrary, that for each real $\varepsilon > 0$ there are points $P \in D$ and $Q \notin D_0$ such that $|P - Q| < \varepsilon$. Then by the Partial Solution Theorem there are hyperreal points $P_1 \in D^*$, $Q_1 \notin (D_0)^*$ with $P_1 \approx Q_1$. Since $D$ is closed and bounded, by Corollary 1.29 and Theorem 1.31, $P_1$ has a standard part $P_0$, and $P_0 \in D$. Thus $P_0 \in D_0$ and $Q_1 \approx P_0$. Since $D_0$ is open, by Theorem 1.28 we have $Q_1 \in (D_0)^*$. This contradiction completes the proof. ⊣

Lemma 12.13. Suppose $f$ is a real function which is continuous on $D_0$. For every basic closed region $D \subseteq D_0$ there is a basic open region $D_1$ such that $D \subseteq D_1 \subseteq D_0$ and $f$ is bounded on $D_1$.

Proof. Let
$$D = \{(x,y) : a \le x \le b,\ g(x) \le y \le h(x)\}.$$
By Lemma 12.12 there is a real $\varepsilon > 0$ such that the $\varepsilon$-neighborhood of any point of $D$ is included in $D_0$. Therefore the basic closed region
$$D_2 = \{(x,y) : a - \varepsilon \le x \le b + \varepsilon,\ g(x) - \varepsilon \le y \le h(x) + \varepsilon\}$$
is included in $D_0$. By the Extreme Value Theorem 11.18, $f$ has a maximum and a minimum value in $D_2$, so $f$ is bounded in $D_2$. Let $D_1$ be the interior of $D_2$,
$$D_1 = \{(x,y) : a - \varepsilon < x < b + \varepsilon,\ g(x) - \varepsilon < y < h(x) + \varepsilon\}.$$
Then $D_1$ is a basic open region, $D \subseteq D_1 \subseteq D_0$, and $f$ is bounded on $D_1$. ⊣

Lemma 12.14. Suppose the function $B(D)$ on basic closed regions $D \subseteq D_0$ has the Addition Property and $B(D) \ge 0$ for all $D$. Then whenever $D_1 \subseteq D_2 \subseteq D_0$ we have $B(D_1) \le B(D_2)$.


Proof. By extending the left and right boundaries of the subregion $D_1$, which are vertical line segments, to the lower and upper boundary curves of $D_2$, we get a partition of $D_2$ into five basic closed regions
$$D_2 = D_1 \cup E_1 \cup E_2 \cup E_3 \cup E_4$$
which meet only on boundaries. By the Addition Property,
$$B(D_2) = B(D_1) + B(E_1) + B(E_2) + B(E_3) + B(E_4).$$
Since each term on the right is $\ge 0$, $B(D_1) \le B(D_2)$. ⊣

Lemma 12.15. Let $D$ be a basic closed region. For every real $\varepsilon > 0$ there is a real $\delta > 0$ such that for every partition of the circumscribed rectangle of $D$ into a grid of subrectangles of length and width less than $\delta$, the total area of the subrectangles which meet the boundary of $D$ is less than $\varepsilon$.

Proof. Let
$$D = \{(x,y) : a \le x \le b,\ g(x) \le y \le h(x)\}$$
and let $[a,b] \times [B_1,B_2]$ be the circumscribed rectangle of $D$. Let $\varepsilon_1$ be a positive real number. Since $g(x)$ and $h(x)$ are continuous, they are uniformly continuous on $[a,b]$ by Theorem 3.15. Hence there is a real $\delta > 0$ such that $\delta < \varepsilon_1$ and whenever $x_1, x_2 \in [a,b]$ with $|x_1 - x_2| < \delta$, we have
$$|g(x_2) - g(x_1)| < \varepsilon_1, \qquad |h(x_2) - h(x_1)| < \varepsilon_1.$$
Let $0 < \Delta x < \delta$, $0 < \Delta y < \delta$ and partition the circumscribed rectangle into a grid of $\Delta x$ by $\Delta y$ subrectangles. Each vertical boundary of $D$ is covered by at most two columns of subrectangles which have total area less than $2\varepsilon_1(B_2 - B_1)$. Since $g(x)$ and $h(x)$ change by less than $\varepsilon_1$ over an interval of length $\Delta x$, each of the upper and lower boundary curves of $D$ is covered by a set of subrectangles of total area less than $2\varepsilon_1(b - a)$. Therefore the set of subrectangles which meet the boundary of $D$ has total area less than $4\varepsilon_1[(B_2 - B_1) + (b - a)]$. Thus a value of $\delta$ corresponding to
$$\varepsilon_1 = \frac{\varepsilon}{4[(B_2 - B_1) + (b - a)]}$$
has the required property. ⊣

Proof of the Infinite Sum Theorem. Let $D$ be the basic closed region
$$D = \{(x,y) : a \le x \le b,\ g(x) \le y \le h(x)\}$$
and suppose $D \subseteq D_0$. We wish to show that $B(D) = \iint_D f(x,y)\,dA$. By Lemma 12.13, we may assume without loss of generality that $f$ is bounded on $D_0$. We have $f(x,y) \ge 0$ on $D_0$ because of hypotheses (ii) and (iii). Thus for some real $M$, $0 \le f(x,y) \le M$ on $D_0$. By Lemma 12.12 there is a real $\varepsilon > 0$ such that the $\varepsilon$-neighborhood of every point in $D$ is included in $D_0$.

Let $[a,b] \times [B_1,B_2]$ be the circumscribed rectangle of $D$. Consider real numbers $\Delta X, \Delta Y$ such that $0 < \Delta X < \varepsilon/2$, $0 < \Delta Y < \varepsilon/2$. Partition the circumscribed rectangle into a grid of $\Delta X$ by $\Delta Y$ subrectangles. (If $\Delta X$ does not evenly divide $b - a$, the right column of subrectangles will be smaller, and similarly for $\Delta Y$.) Let $G$ be the set of all subrectangles and
$$G_1 = \{E \in G : E \text{ is included in the interior of } D\}, \qquad G_2 = \{E \in G : E \text{ meets the boundary of } D\}.$$
Since $\Delta X + \Delta Y < \varepsilon$, we have $E \subseteq D_0$ whenever $E \in G_1 \cup G_2$.

We wish to estimate the difference between $B(D)$ and the finite Riemann sum $\sum\sum_D f(x,y)\,\Delta X\,\Delta Y$. Let $p(\Delta X, \Delta Y)$ be the total area of the subrectangles which meet the boundary of $D$, that is,
$$p(\Delta X, \Delta Y) = \text{area of } \bigcup G_2.$$
For each $E \in G_1 \cup G_2$ let $F(E) = f(x,y)\,\Delta X\,\Delta Y$ where $(x,y)$ is the lower left corner of $E$, and let $q(\Delta X, \Delta Y)$ be the maximum of the values
$$\frac{|B(E) - F(E)|}{\Delta X\,\Delta Y}, \qquad E \in G_1 \cup G_2.$$
Since $0 \le f(x,y) \le M$ we have
$$0 \le F(E) \le M\,\Delta X\,\Delta Y, \qquad 0 \le B(E) \le (M + q(\Delta X, \Delta Y))\,\Delta X\,\Delta Y.$$
By Lemma 12.14,
$$\sum_{E \in G_2} B(D \cap E) \le \sum_{E \in G_2} B(E).$$
It follows that

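$$\left| B(D) - \sum\sum_D f(x,y)\,\Delta X\,\Delta Y \right| \le \left| \sum_{E \in G_1} (B(E) - F(E)) \right| + \sum_{E \in G_2} B(D \cap E) + \sum_{E \in G_2} F(E)$$
$$\le \sum_{E \in G_1} |B(E) - F(E)| + \sum_{E \in G_2} (B(E) + F(E))$$
$$\le (b - a)(B_2 - B_1)\,q(\Delta X, \Delta Y) + p(\Delta X, \Delta Y)\,(2M + q(\Delta X, \Delta Y)).$$

Now consider the given positive infinitesimals $\Delta x, \Delta y$. By Transfer,
$$\left| B(D) - \sum\sum_D f(x,y)\,\Delta x\,\Delta y \right| \le (b - a)(B_2 - B_1)\,q(\Delta x, \Delta y) + p(\Delta x, \Delta y)\,(2M + q(\Delta x, \Delta y)).$$
By Lemma 12.15 and Transfer, $0 < p(\Delta x, \Delta y) < \varepsilon_0$ for every positive real $\varepsilon_0$, so $p(\Delta x, \Delta y) \approx 0$. By the Partial Solution Theorem there is a $\Delta x$ by $\Delta y$ element of area $\Delta D$ such that $\Delta D$ meets $D$ and
$$\frac{|B(\Delta D) - f(x,y)\,\Delta x\,\Delta y|}{\Delta x\,\Delta y} = q(\Delta x, \Delta y).$$
Therefore $\Delta D \subseteq (D_0)^*$ and by hypothesis (iii), $q(\Delta x, \Delta y)$ is infinitesimal. We conclude that
$$\left| B(D) - \sum\sum_D f(x,y)\,\Delta x\,\Delta y \right| \approx 0,$$
whence
$$B(D) = \iint_D f(x,y)\,dx\,dy. \quad ⊣$$

In Elementary Calculus the Infinite Sum Theorem for two variables was used to justify integration formulas for the volume between two surfaces, for mass, first moments, and moments of inertia of plane objects, and for the area of a surface. Another application is given in the next section.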
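The finite grid construction used in the proof above can be sketched numerically. The example below is only an illustration under stated assumptions: the region $0 \le x \le 1$, $x^2 \le y \le x + 1$, the integrand $f(x,y) = x + y$ (exact integral $1.65$ by hand), and all function names are chosen here and do not come from the text. Because the chosen $g$ and $h$ are increasing, a cell $[x_0,x_1] \times [y_0,y_1]$ meets $D$ exactly when $g(x_0) \le y_1$ and $h(x_1) \ge y_0$, and lies in the interior of $D$ exactly when it avoids the first and last columns and $g(x_1) < y_0$, $y_1 < h(x_0)$; the code relies on that special case.

```python
A, B = 0.0, 1.0        # x-range of the region D
B1, B2 = 0.0, 2.0      # y-range of the circumscribed rectangle

def g(x):
    return x * x       # lower boundary curve (increasing on [A, B])

def h(x):
    return x + 1.0     # upper boundary curve (increasing on [A, B])

def f(x, y):
    return x + y       # hypothetical integrand

def in_D(x, y):
    return A <= x <= B and g(x) <= y <= h(x)

def riemann_and_boundary(n):
    """Use an n-by-2n grid on [A,B] x [B1,B2].

    Returns (Riemann sum over lower-left corners lying in D,
             total area p of the cells that meet the boundary of D)."""
    dX = (B - A) / n
    dY = (B2 - B1) / (2 * n)
    total = 0.0
    boundary_cells = 0
    for i in range(n):
        x0, x1 = A + i * dX, A + (i + 1) * dX
        for j in range(2 * n):
            y0, y1 = B1 + j * dY, B1 + (j + 1) * dY
            if in_D(x0, y0):
                total += f(x0, y0) * dX * dY
            meets_D = g(x0) <= y1 and h(x1) >= y0
            inside = 0 < i < n - 1 and g(x1) < y0 and y1 < h(x0)
            if meets_D and not inside:
                boundary_cells += 1
    return total, boundary_cells * dX * dY

coarse_sum, coarse_p = riemann_and_boundary(20)
fine_sum, fine_p = riemann_and_boundary(200)
print(coarse_sum, coarse_p)
print(fine_sum, fine_p)
```

As the grid is refined, the boundary area $p$ shrinks (the content of Lemma 12.15) and the Riemann sum approaches the double integral.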

12C. Change of Variables in Double Integrals (§12.5)

In Elementary Calculus the Infinite Sum Theorem is used to prove the formula for a double integral over a region
$$D = \{(r,\theta) : \alpha \le \theta \le \beta,\ a(\theta) \le r \le b(\theta)\}$$
in polar coordinates:
$$\iint_D f(x,y)\,dA = \int_\alpha^\beta \int_{a(\theta)}^{b(\theta)} f(r\cos\theta, r\sin\theta)\,r\,dr\,d\theta.$$
Here we will use the Infinite Sum Theorem to obtain the general formula for a double integral with a change of variables. For comparison we first state the simpler result for single integrals (§4.7 in Elementary Calculus).

Theorem 12.16. Suppose $f(u)$ is continuous on an open interval $I$, $g$ maps an open interval $J$ into $I$, and $g$ is continuously differentiable on $J$. Then for all $a, b$ in $J$,
$$\int_{g(a)}^{g(b)} f(u)\,du = \int_a^b f(g(t))\,g'(t)\,dt.$$

This is proved in Elementary Calculus using the Chain Rule and the Fundamental Theorem of Calculus. In the analogous theorem for two variables, the mapping $g$ is replaced by a smooth transformation from an open region in the $(s,t)$ plane into the $(x,y)$ plane, and the derivative of $g$ is replaced by the Jacobian matrix of the transformation.

Our elementary treatment of double integrals over basic closed regions is not adequate for integration by change of variables. The difficulty is that the image of a basic closed region under a smooth transformation is not necessarily a basic closed region. To give a natural treatment we need the classical theory of Jordan content. Once this theory is developed, the Infinite Sum Theorem can be used to obtain the change of variables formula. Our main purpose in this section is to show how the Infinite Sum Theorem is used.

To prepare the way we sketch a hyperreal form of Jordan content. Let $D$ be a bounded set in the plane. The circumscribed rectangle $E$ of $D$ is the intersection of all closed rectangles containing $D$. Given positive real numbers $\Delta x$ and $\Delta y$, partition $E$ into a grid of $\Delta x$ by $\Delta y$ subrectangles. Let $\underline{C}(\Delta x, \Delta y)$ be the sum of the areas of the subrectangles which are entirely within $D$, and let $\overline{C}(\Delta x, \Delta y)$ be the sum of the areas of the subrectangles which meet $D$. By the Function Axiom, the natural extensions $\underline{C}(dx,dy)$ and $\overline{C}(dx,dy)$ are defined for all positive infinitesimal $dx$ and $dy$. We say that $D$ has Jordan content $A(D)$ if for all positive infinitesimals $dx$ and $dy$,
$$A(D) = \operatorname{st}(\underline{C}(dx,dy)) = \operatorname{st}(\overline{C}(dx,dy)).$$
Intuitively, the total area of the infinitesimal subrectangles included in $D^*$ has standard part $A(D)$. Also, the total area of the infinitesimal subrectangles which meet $D^*$ has standard part $A(D)$. The Jordan content of $D$ is also called the area of $D$.
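A finite analogue of this definition can be computed directly. The sketch below takes $D$ to be the closed unit disk (an assumption made for the example, as are the function names), partitions its circumscribed rectangle $[-1,1] \times [-1,1]$ into an $n$ by $n$ grid, and returns the total area of the cells entirely within $D$ and of the cells which meet $D$. Since the disk is convex, a cell lies within $D$ when its farthest corner from the origin does, and meets $D$ when its nearest point to the origin does.

```python
import math

def jordan_sums(n):
    """Inner and outer grid sums for the closed unit disk on an n-by-n grid."""
    d = 2.0 / n
    inner_cells = 0
    outer_cells = 0
    for i in range(n):
        x0 = -1.0 + i * d
        x1 = x0 + d
        for j in range(n):
            y0 = -1.0 + j * d
            y1 = y0 + d
            # Squared distance from the origin to the farthest corner of the cell.
            far = max(x0 * x0, x1 * x1) + max(y0 * y0, y1 * y1)
            # Squared distance from the origin to the nearest point of the cell.
            nx = min(max(0.0, x0), x1)
            ny = min(max(0.0, y0), y1)
            near = nx * nx + ny * ny
            if far <= 1.0:
                inner_cells += 1
            if near <= 1.0:
                outer_cells += 1
    return inner_cells * d * d, outer_cells * d * d

inner_c, outer_c = jordan_sums(50)
inner_f, outer_f = jordan_sums(400)
print(inner_c, outer_c)
print(inner_f, outer_f, math.pi)
```

The two sums bracket $\pi$, and the gap between them narrows as the grid is refined, which is the finite counterpart of both standard parts being equal to $A(D)$.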

For the rest of this section, suppose that D is a set whose closure is contained in D0, and D has Jordan content A(D).

The double Riemann sum and double integral are defined exactly as in the case of a basic closed region. That is, for positive infinitesimal $dx$ and $dy$, we partition the circumscribed rectangle $E$ of $D$ into a $dx$ by $dy$ grid and define
$$\sum\sum_D f(x,y)\,dx\,dy = \sum\sum \{f(x_K, y_L)\,dx\,dy : (x_K, y_L) \in D^*\},$$
$$\iint_D f(x,y)\,dx\,dy = \operatorname{st}\left(\sum\sum_D f(x,y)\,dx\,dy\right).$$
The basic properties of double integrals are readily generalized to integrals over sets which have Jordan content, and take the following form.

Theorem 12.17. The double integral $\iint_D f(x,y)\,dx\,dy$ does not depend on the infinitesimals $dx$ and $dy$.

Theorem 12.18. (Addition Property) If $D = D_1 \cup D_2$ where $D_1 \cap D_2$ has Jordan content zero, and $D_1, D_2$ have Jordan content, then
$$\iint_D f(x,y)\,dx\,dy = \iint_{D_1} f(x,y)\,dx\,dy + \iint_{D_2} f(x,y)\,dx\,dy.$$

Theorem 12.19. (Cylinder Property) Suppose $m \le f(x,y) \le M$ for $(x,y) \in D$. Then
$$m \cdot A(D) \le \iint_D f(x,y)\,dx\,dy \le M \cdot A(D).$$
Taking $f(x,y) = 1$, we see at once that
$$A(D) = \iint_D 1\,dx\,dy.$$
It follows that if $D$ is a basic closed region, then $D$ has Jordan content equal to the area of $D$ as defined in Chapter 6.

Theorem 12.20. (Sum Rule) If $f$ and $g$ are continuous on $D_0$, then
$$\iint_D (f(x,y) + g(x,y))\,dx\,dy = \iint_D f(x,y)\,dx\,dy + \iint_D g(x,y)\,dx\,dy.$$
This follows from the analogous formula for the double Riemann sum. We now introduce the notion of a smooth transformation.

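Definition 12.21. Let $D_0$ be an open set in the $(s,t)$ plane. A smooth transformation $T$ on $D_0$ is a mapping
$$T(s,t) = (g(s,t), h(s,t)) = (x,y)$$
of $D_0$ into the $(x,y)$ plane such that each of the functions $x = g(s,t)$ and $y = h(s,t)$ has continuous partial derivatives on $D_0$. The image of a subset $D \subseteq D_0$ under $T$ is the set
$$T(D) = \{T(s,t) : (s,t) \in D\}.$$
The Jacobian of a smooth transformation $T$, denoted by $J(T)$ or $\frac{\partial(x,y)}{\partial(s,t)}$, is the function
$$\frac{\partial(x,y)}{\partial(s,t)} = \begin{vmatrix} \dfrac{\partial x}{\partial s} & \dfrac{\partial x}{\partial t} \\[2mm] \dfrac{\partial y}{\partial s} & \dfrac{\partial y}{\partial t} \end{vmatrix} = \frac{\partial x}{\partial s}\frac{\partial y}{\partial t} - \frac{\partial x}{\partial t}\frac{\partial y}{\partial s}.$$
Thus $\frac{\partial(x,y)}{\partial(s,t)}$ is a continuous function with domain $D_0$. For example, the polar coordinate transformation
$$x = r\cos\theta, \qquad y = r\sin\theta$$
is smooth and its Jacobian is
$$\frac{\partial(x,y)}{\partial(r,\theta)} = \begin{vmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{vmatrix} = r.$$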
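The polar computation can be checked numerically. The following sketch estimates the four partial derivatives by central differences and forms the determinant from the definition; the helper `jacobian`, the step size, and the sample point are assumptions made for the illustration.

```python
import math

def polar(r, theta):
    """The polar coordinate transformation T(r, theta) = (x, y)."""
    return (r * math.cos(theta), r * math.sin(theta))

def jacobian(T, s, t, eps=1e-6):
    """Central-difference estimate of the Jacobian determinant of T at (s, t)."""
    x_s = (T(s + eps, t)[0] - T(s - eps, t)[0]) / (2 * eps)
    x_t = (T(s, t + eps)[0] - T(s, t - eps)[0]) / (2 * eps)
    y_s = (T(s + eps, t)[1] - T(s - eps, t)[1]) / (2 * eps)
    y_t = (T(s, t + eps)[1] - T(s, t - eps)[1]) / (2 * eps)
    return x_s * y_t - x_t * y_s

r, theta = 2.0, 0.7
print(jacobian(polar, r, theta), r)
```

The estimate should agree with $r$ up to the error of the finite-difference approximation.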

We will use the following two lemmas, which can be found in many advanced calculus books (for example, the book Buck [B]).

Lemma 12.22. Let $T$ be a smooth transformation which is one to one and has nonzero Jacobian on an open set $D_0$. Let $D$ be a closed bounded subset of $D_0$ which has Jordan content. Then
(i) $T(D_0)$ is an open set.
(ii) $T(D)$ is a closed bounded set which has Jordan content.
(iii) $T$ maps the interior of $D$ onto the interior of $T(D)$, and maps the boundary of $D$ onto the boundary of $T(D)$.

Lemma 12.23. Suppose $T$ is a smooth transformation with nonzero Jacobian on an open set $D_0$, and the partial derivatives $\partial x/\partial s$, $\partial x/\partial t$, $\partial y/\partial s$, $\partial y/\partial t$ are bounded on $D_0$. Then for every real $\varepsilon > 0$ there is a real $\delta > 0$ such that for every square $\Delta D \subseteq D_0$ with side $\Delta s < \delta$, the area of the image of $\Delta D$ is within $\varepsilon\,\Delta s^2$ of the Jacobian times the area of $\Delta D$, that is,
$$\left| A(T(\Delta D)) - |J(T)|\,\Delta s^2 \right| < \varepsilon\,\Delta s^2.$$
The number $|J(T)\,\Delta s^2|$ is the area of the parallelogram with vertex $T(s,t)$ and sides
$$\Delta s\left(\frac{\partial x}{\partial s}\mathbf{i} + \frac{\partial y}{\partial s}\mathbf{j}\right), \qquad \Delta s\left(\frac{\partial x}{\partial t}\mathbf{i} + \frac{\partial y}{\partial t}\mathbf{j}\right).$$
The lemma is proved by showing that the boundaries of $T(\Delta D)$ are close to the boundaries of the parallelogram.

Theorem 12.24. (Change of Variables) Let $T$ be a smooth transformation which is one to one and has nonzero Jacobian on an open set $D_0$. Then for every continuous function $h(x,y)$ on $T(D_0)$ and every basic closed region $D \subseteq D_0$,
$$\iint_{T(D)} h(x,y)\,dx\,dy = \iint_D h(T(s,t))\,|J(T)|\,ds\,dt.$$
For example, if $D$ is a basic closed region
$$D = \{(r,\theta) : \alpha \le \theta \le \beta,\ a(\theta) \le r \le b(\theta)\}$$
where $\alpha < \beta < \alpha + 2\pi$ and $0 < a(\theta) \le b(\theta)$, then the polar coordinate transformation $T$ satisfies the hypotheses of the theorem, and we obtain the formula
$$\iint_{T(D)} h(x,y)\,dx\,dy = \iint_D h(r\cos\theta, r\sin\theta)\,r\,dr\,d\theta.$$
We required $\beta < \alpha + 2\pi$ to make $T$ one to one, and $0 < a(\theta)$ to make $r = J(T) \ne 0$ in $D$. But by passing to a limit we see that the polar integration formula is valid even for $\alpha \le \beta \le \alpha + 2\pi$, $0 \le a(\theta) \le b(\theta)$.

Proof of Theorem 12.24. Let $D$ be a basic closed region contained in $D_0$. By Lemma 12.13 we may assume that the partial derivatives of $x$ and $y$ are bounded on $D_0$. We may also assume that $h(T(s,t))$ is bounded on $D_0$, so that $h(x,y)$ is bounded on $T(D_0)$. We first give the proof in the case that $h(x,y) \ge 0$ on $T(D_0)$. Let $B(D_1)$ be defined for basic closed regions $D_1 \subseteq D_0$ by
$$B(D_1) = \iint_{T(D_1)} h(x,y)\,dx\,dy.$$
We want to prove that
$$B(D) = \iint_D h(T(s,t))\,|J(T)|\,ds\,dt. \tag{62}$$
To do this we will use the Infinite Sum Theorem 12.11. By the Cylinder Property, $B(D_1) \ge 0$ for all $D_1$. If $D_1 = D_2 \cup D_3$ where $D_2$ and $D_3$ have disjoint interiors, then $T(D_1) = T(D_2) \cup T(D_3)$, and by Lemma 12.22, $T(D_2)$ and $T(D_3)$ have disjoint interiors. Thus by the Addition Property for double integrals, $B(D_1) = B(D_2) + B(D_3)$. We have now shown that hypotheses (i) and (ii) of the Infinite Sum Theorem hold for $B(D)$.

Hypothesis (iii) says that for at least one pair of positive infinitesimals $\Delta s, \Delta t$, for every element of area $\Delta D \subseteq (D_0)^*$ with lower left corner $(s,t)$ and dimensions $\Delta s$ and $\Delta t$ we have
$$B(\Delta D) \approx h(T(s,t))\,|J(T)|\,\Delta s\,\Delta t \quad (\text{compared to } \Delta s\,\Delta t).$$
Since we only need this for some pair of positive infinitesimals, it suffices to prove it for a pair of positive infinitesimals $\Delta s, \Delta t$ with $\Delta t = \Delta s$.

Suppose $\Delta s$ is positive infinitesimal, $\Delta D$ is an infinitesimal square of side $\Delta s$ with lower left corner $(s,t)$, and $\Delta D$ is contained in $(D_0)^*$. By the Hyperreal Extreme Value Theorem 11.19, $h(T(u,v))$ has a minimum value $m$ and a maximum value $M$ for $(u,v) \in \Delta D$. Therefore $h(x,y)$ has minimum value $m$ and maximum value $M$ for $(x,y) \in T(\Delta D)$. Since $T$ is continuous, every point of $T(\Delta D)$ is infinitely close to $T(s,t)$. By the continuity of $h$, $m$ and $M$ are infinitely close to $h(T(s,t))$. Then by the Cylinder Property and Transfer Axiom,
$$m \cdot A(T(\Delta D)) \le B(\Delta D) \le M \cdot A(T(\Delta D)),$$
whence
$$B(\Delta D) \approx h(T(s,t))\,A(T(\Delta D)) \quad (\text{compared to } A(T(\Delta D))).$$
Therefore by Lemma 12.23,
$$\frac{B(\Delta D)}{\Delta s^2} = \frac{B(\Delta D)}{A(T(\Delta D))} \cdot \frac{A(T(\Delta D))}{\Delta s^2} \approx h(T(s,t))\,|J(T)|,$$
and
$$B(\Delta D) \approx h(T(s,t))\,|J(T)|\,\Delta s^2 \quad (\text{compared to } \Delta s^2).$$
Hence by the Infinite Sum Theorem,
$$B(D) = \iint_D h(T(s,t))\,|J(T)|\,ds\,dt.$$

Finally, we consider the general case where $h$ is not necessarily $\ge 0$ on $T(D_0)$. Since $h$ is bounded on $T(D_0)$ there is a real number $c > 0$ such that $h(x,y) \ge -c$ on $T(D_0)$. Then $h(x,y) + c \ge 0$ on $T(D_0)$, so by the previous case,
$$\iint_{T(D)} (h(x,y) + c)\,dx\,dy = \iint_D (h(T(s,t)) + c)\,|J(T)|\,ds\,dt,$$
$$\iint_{T(D)} c\,dx\,dy = \iint_D c\,|J(T)|\,ds\,dt.$$
Using the Sum Rule for double integrals, Theorem 12.20,
$$\iint_{T(D)} h(x,y)\,dx\,dy = \iint_{T(D)} (h(x,y) + c)\,dx\,dy - \iint_{T(D)} c\,dx\,dy$$
$$= \iint_D (h(T(s,t)) + c)\,|J(T)|\,ds\,dt - \iint_D c\,|J(T)|\,ds\,dt = \iint_D h(T(s,t))\,|J(T)|\,ds\,dt. \quad ⊣$$
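The change of variables formula can be sanity-checked numerically in the polar case. In the sketch below, the region $D = \{(r,\theta) : 0 \le \theta \le \pi/2,\ 1 \le r \le 2\}$, the integrand $h(x,y) = xy$, and the function names are assumptions chosen for the example; a hand computation gives $15/8$ for both sides. The left side is approximated by a midpoint sum over grid cells of the $(x,y)$ plane whose centers fall in the quarter annulus $T(D)$, the right side by a midpoint sum over $D$ with the factor $|J(T)| = r$.

```python
import math

def h(x, y):
    return x * y   # hypothetical integrand

def lhs_over_image(n=500):
    """Midpoint sum of h over T(D) = {1 <= x^2 + y^2 <= 4, x >= 0, y >= 0}."""
    d = 2.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * d
        for j in range(n):
            y = (j + 0.5) * d
            if 1.0 <= x * x + y * y <= 4.0:
                total += h(x, y)
    return total * d * d

def rhs_over_D(n=500):
    """Midpoint sum of h(T(r, theta)) * |J(T)| over D, where |J(T)| = r."""
    dr = 1.0 / n
    dth = (math.pi / 2) / n
    total = 0.0
    for i in range(n):
        r = 1.0 + (i + 0.5) * dr
        for j in range(n):
            th = (j + 0.5) * dth
            total += h(r * math.cos(th), r * math.sin(th)) * r
    return total * dr * dth

lhs = lhs_over_image()
rhs = rhs_over_D()
print(lhs, rhs)
```

The left side converges more slowly because cells along the curved boundary of $T(D)$ are counted all-or-nothing, which is the same boundary effect controlled by Lemma 12.15.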

CHAPTER 13

VECTOR CALCULUS

13A. Line Integrals (§13.2)

In this section we will give a hyperreal characterization of the line integral. Recall that by Theorem 7.4, every smooth parametric curve has a reparametrization in which the curve length itself is the independent variable. We will take line integrals over parametric curves of this kind.

Definition 13.1. Let A and B be points in the (x,y) plane. A smooth curve from A to B is a real parametric curve C: x = g(s), y = h(s), s ∈ [0,L] where g(s) and h(s) are continuously diﬀerentiable functions on the closed interval [0,L], (g(0),h(0)) = A, (g(L),h(L)) = B, and s is the length of the part of the curve from A to the point (x,y) = (g(s),h(s)). We call A the initial point of C, and B the terminal point of C.

In Elementary Calculus the line integral was defined as an ordinary definite integral as follows.

Definition 13.2. Let
$$\mathbf{F}(x,y) = P(x,y)\,\mathbf{i} + Q(x,y)\,\mathbf{j}$$
be a continuous real vector valued function on an open rectangle $D$ in the $(x,y)$ plane which contains a smooth curve $C$. The line integral of $\mathbf{F}$ along $C$, denoted by
$$\int_C \mathbf{F} \cdot d\mathbf{S} = \int_C P\,dx + Q\,dy,$$
is defined as the definite integral
$$\int_0^L \left( P\,\frac{dx}{ds} + Q\,\frac{dy}{ds} \right) ds$$
where $x = g(s)$ and $y = h(s)$.


Let $n$ be a positive integer and let $\Delta s = L/n$. For $k = 0, \ldots, n-1$ let
$$s_k = k\,\Delta s, \quad x_k = g(s_k), \quad y_k = h(s_k),$$
$$\Delta x_k = x_{k+1} - x_k, \quad \Delta y_k = y_{k+1} - y_k, \quad \Delta \mathbf{S}_k = \Delta x_k\,\mathbf{i} + \Delta y_k\,\mathbf{j},$$
where $x_n = g(L)$ and $y_n = h(L)$. The finite Riemann sum along $C$,
$$\sum_C \mathbf{F} \cdot \Delta \mathbf{S} = \sum_C P\,\Delta x + Q\,\Delta y,$$
is defined as the sum
$$\sum_{k=0}^{n-1} \mathbf{F}(x_k, y_k) \cdot \Delta \mathbf{S}_k = \sum_{k=0}^{n-1} \left[ P(x_k, y_k)\,\Delta x_k + Q(x_k, y_k)\,\Delta y_k \right].$$
This sum corresponds to a polygonal line connecting $n+1$ points along $C$. For a given curve $C$ and vector valued function $\mathbf{F}$ it depends only on $n$, so we may write
$$I(n) = \sum_C \mathbf{F} \cdot \Delta \mathbf{S}.$$
By the Function Axiom, the natural extension $I(H)$ is defined for all positive hyperintegers $H$. When $H$ is a positive infinite hyperinteger, the value $I(H)$ is called an infinite Riemann sum along $C$ and is denoted by
$$I(H) = \sum_C \mathbf{F} \cdot d\mathbf{S} = \sum_C P\,dx + Q\,dy.$$

Theorem 13.3. Let $\mathbf{F}(x,y)$ be a continuous real vector valued function on an open rectangle $D$ containing a smooth curve $C$. Let $H$ be a positive infinite hyperinteger. Then the line integral of $\mathbf{F}$ along $C$ is equal to the standard part of the infinite Riemann sum of $\mathbf{F}$ along $C$,
$$\int_C \mathbf{F} \cdot d\mathbf{S} = \operatorname{st}\left( \sum_C \mathbf{F} \cdot d\mathbf{S} \right),$$
or
$$\int_C P\,dx + Q\,dy = \operatorname{st}\left( \sum_C P\,dx + Q\,dy \right).$$

Proof. First consider a positive integer $n$ and form the finite Riemann sum
$$\sum_C P\,\Delta x + Q\,\Delta y = \sum_C P(x_k, y_k)\,\Delta x_k + Q(x_k, y_k)\,\Delta y_k.$$
Let $r$ be a positive real number. If the Riemann sum differs from the line integral by at least $r$, then on one of the subintervals the Riemann sum differs from the line integral by at least $r/n$. That is, every real solution of
$$n \in \mathbb{Z}, \quad 0 < n, \quad \left| \sum_C P\,\Delta x + Q\,\Delta y - \int_C P\,dx + Q\,dy \right| \ge r \tag{63}$$
is a partial real solution of
$$\Delta s = L/n, \quad 0 \le s < s + \Delta s \le L, \quad x = g(s), \quad y = h(s),$$
$$\Delta x = g(s + \Delta s) - g(s), \quad \Delta y = h(s + \Delta s) - h(s), \tag{64}$$
$$\left| P(x,y)\,\Delta x + Q(x,y)\,\Delta y - \int_s^{s+\Delta s} (P g' + Q h')\,ds \right| \ge r/n.$$
By the Partial Solution Theorem, every hyperreal solution of (63) is a partial solution of (64). Now form the infinite Riemann sum $\sum_C P\,dx + Q\,dy$ with respect to $H$. Assume that

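$$\left| \sum_C P\,dx + Q\,dy - \int_C P\,dx + Q\,dy \right| \ge r. \tag{65}$$
Let $\Delta s = L/H$. Then there exist $0 \le s_1 < s_1 + \Delta s \le L$ such that, putting
$$x_1 = g(s_1), \quad y_1 = h(s_1), \quad \Delta x = g(s_1 + \Delta s) - g(s_1), \quad \Delta y = h(s_1 + \Delta s) - h(s_1),$$
we have
$$\left| P(x_1, y_1)\,\Delta x + Q(x_1, y_1)\,\Delta y - \int_{s_1}^{s_1 + \Delta s} (P g' + Q h')\,ds \right| \ge \frac{r}{H}.$$
Dividing by $\Delta s$,
$$\left| P(x_1, y_1)\,\frac{\Delta x}{\Delta s} + Q(x_1, y_1)\,\frac{\Delta y}{\Delta s} - \frac{\int_{s_1}^{s_1 + \Delta s} (P g' + Q h')\,ds}{\Delta s} \right| \ge \frac{r}{L}. \tag{66}$$
Since $x = g(s)$, $y = h(s)$, and $\int_0^s (P g' + Q h')\,ds$ are continuously differentiable functions of $s$, we have
$$\frac{\Delta x}{\Delta s} \approx g'(s_1), \qquad \frac{\Delta y}{\Delta s} \approx h'(s_1),$$
$$\frac{\int_{s_1}^{s_1 + \Delta s} (P g' + Q h')\,ds}{\Delta s} \approx P(x_1, y_1)\,g'(s_1) + Q(x_1, y_1)\,h'(s_1).$$
But then
$$P(x_1, y_1)\,\frac{\Delta x}{\Delta s} + Q(x_1, y_1)\,\frac{\Delta y}{\Delta s} \approx \frac{\int_{s_1}^{s_1 + \Delta s} (P g' + Q h')\,ds}{\Delta s},$$
contradicting (66). We conclude that (65) fails for every positive real $r$. It follows that
$$\sum_C P\,dx + Q\,dy \approx \int_C P\,dx + Q\,dy. \quad ⊣$$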
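The finite Riemann sum $I(n)$ along a curve can be computed directly. The sketch below uses a quarter of the unit circle parametrized by arc length, $x = \cos s$, $y = \sin s$, $s \in [0, \pi/2]$, and the field $P = xy$, $Q = x^2$; these choices and the function names are assumptions made for the illustration. For this field the line integral works out by hand to $1/3$.

```python
import math

L = math.pi / 2                 # length of the quarter circle

def g(s):
    return math.cos(s)          # x = g(s)

def h(s):
    return math.sin(s)          # y = h(s)

def P(x, y):
    return x * y

def Q(x, y):
    return x * x

def riemann_sum_along_C(n):
    """I(n): sum of P(x_k, y_k) dx_k + Q(x_k, y_k) dy_k for k = 0..n-1."""
    ds = L / n
    total = 0.0
    for k in range(n):
        x0, y0 = g(k * ds), h(k * ds)
        x1, y1 = g((k + 1) * ds), h((k + 1) * ds)
        total += P(x0, y0) * (x1 - x0) + Q(x0, y0) * (y1 - y0)
    return total

def line_integral(n=20000):
    """Midpoint approximation of the integral over [0, L] of (P g' + Q h') ds."""
    ds = L / n
    total = 0.0
    for k in range(n):
        s = (k + 0.5) * ds
        x, y = g(s), h(s)
        total += (P(x, y) * (-math.sin(s)) + Q(x, y) * math.cos(s)) * ds
    return total

for n in (20, 200, 2000):
    print(n, riemann_sum_along_C(n))
print(line_integral())
```

As $n$ grows, $I(n)$ approaches the line integral; Theorem 13.3 is the hyperreal statement that for infinite $H$ the two differ only by an infinitesimal.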


A piecewise smooth curve is a parametric curve $C$ which is a finite union of smooth curves $C = C_1 \cup \cdots \cup C_n$ such that the terminal point of $C_k$ is the initial point of $C_{k+1}$ for $k = 1, \ldots, n-1$. The line integral of $\mathbf{F}$ along a piecewise smooth curve $C$ is defined as the sum
$$\int_C \mathbf{F} \cdot d\mathbf{S} = \int_{C_1} \mathbf{F} \cdot d\mathbf{S} + \cdots + \int_{C_n} \mathbf{F} \cdot d\mathbf{S}.$$

13B. Green’s Theorem (§13.3, §13.4)

In this section we will prove Green’s Theorem for basic closed regions with smooth boundaries, using the Inﬁnite Sum Theorem for one variable. We need two lemmas about single integrals of functions with more than one variable.

Lemma 13.4. Suppose $E$ is a real open rectangular solid, $f(x,y,z)$ and $g(x,y,z)$ are continuous on $E$, the hyperreal points $(a,y,z)$ and $(b,y,z)$ belong to $E^*$, $b - a$ is finite, and $f(x,y,z) \approx g(x,y,z)$ for all $x \in [a,b]^*$. Then
$$\int_a^b f(x,y,z)\,dx \approx \int_a^b g(x,y,z)\,dx.$$

Proof. For each real $r > 0$ we have
$$f(x,y,z) - r \le g(x,y,z) \le f(x,y,z) + r$$
for all $x \in [a,b]^*$. By the rules of integrals in one variable and Transfer,
$$\int_a^b f(x,y,z)\,dx - r(b - a) \le \int_a^b g(x,y,z)\,dx \le \int_a^b f(x,y,z)\,dx + r(b - a).$$
Therefore
$$\left| \int_a^b f(x,y,z)\,dx - \int_a^b g(x,y,z)\,dx \right| \le r(b - a).$$
Since $b - a$ is finite and this holds for all real $r > 0$, we have
$$\int_a^b f(x,y,z)\,dx \approx \int_a^b g(x,y,z)\,dx. \quad ⊣$$

The following lemma was stated in Section §13.3 of Elementary Calculus, and used in the standard proof of the Path Independence Theorem for line integrals. A proof of the lemma was sketched there. We give a complete proof here.

Lemma 13.5. Suppose $P(x,y)$ is a smooth function on a real open rectangle $D$ containing the point $(a,b)$. Then whenever $(x,y) \in D$,
$$\frac{\partial}{\partial x} \int_a^x P(t,y)\,dt = P(x,y)$$
and
$$\frac{\partial}{\partial y} \int_a^x P(t,y)\,dt = \int_a^x \frac{\partial P}{\partial y}(t,y)\,dt.$$

Proof. The first formula follows at once from the Second Fundamental Theorem of Calculus, Theorem 4.17. We prove the second formula. Let
$$F(x,y) = \int_a^x P(t,y)\,dt, \qquad (x,y) \in D.$$
Any real solution of
$$(x,y) \in D, \quad (x, y + \Delta y) \in D, \quad \Delta y \ne 0 \tag{67}$$
is a solution of
$$\frac{F(x, y + \Delta y) - F(x,y)}{\Delta y} = \int_a^x \frac{P(t, y + \Delta y) - P(t,y)}{\Delta y}\,dt. \tag{68}$$
By Transfer, any hyperreal solution of (67) is a solution of (68). Now let $\Delta y$ be positive infinitesimal. By Theorem 11.8, $P$ is uniformly differentiable on $D$. Therefore, whenever $(x,y) \in D$ and $t \approx x$,
$$P(t, y + \Delta y) - P(t,y) \approx \frac{\partial P}{\partial y}(x,y)\,\Delta y \quad (\text{compared to } \Delta y),$$
so
$$\frac{P(t, y + \Delta y) - P(t,y)}{\Delta y} \approx \frac{\partial P}{\partial y}(x,y).$$
Since $P$ is smooth on $D$, $\partial P/\partial y$ is continuous on $D$. Hence whenever $(x,y) \in D$ and $t \approx x$, the above formula holds with $t$ in place of $x$,
$$\frac{P(t, y + \Delta y) - P(t,y)}{\Delta y} \approx \frac{\partial P}{\partial y}(t,y).$$
Since $D$ is open, $(x, y + \Delta y) \in D^*$ whenever $(x,y) \in D$. The function
$$\frac{P(x, y + \Delta y) - P(x,y)}{\Delta y}$$
is continuous in the variables $(x, y, \Delta y)$ on $D \times (0, \infty)$. Now fix $(x,y) \in D$. By Lemma 13.4,
$$\int_a^x \frac{P(t, y + \Delta y) - P(t,y)}{\Delta y}\,dt \approx \int_a^x \frac{\partial P}{\partial y}(t,y)\,dt.$$
Then by (68),
$$\frac{F(x, y + \Delta y) - F(x,y)}{\Delta y} \approx \int_a^x \frac{\partial P}{\partial y}(t,y)\,dt.$$
Taking standard parts, we have
$$\frac{\partial F}{\partial y}(x,y) = \int_a^x \frac{\partial P}{\partial y}(t,y)\,dt. \quad ⊣$$
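The second formula of Lemma 13.5 can be illustrated with a small finite $\Delta y$ in place of an infinitesimal. In the sketch below the function $P(t,y) = \sin(ty)$, the base point $a = 0$, the evaluation point, and the helper names are assumptions made for the example.

```python
import math

def P(t, y):
    return math.sin(t * y)

def dP_dy(t, y):
    return t * math.cos(t * y)     # partial derivative of P with respect to y

def midpoint(func, lo, hi, n=2000):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    width = (hi - lo) / n
    return sum(func(lo + (i + 0.5) * width) for i in range(n)) * width

def F(x, y, a=0.0):
    """F(x, y) = integral from a to x of P(t, y) dt."""
    return midpoint(lambda t: P(t, y), a, x)

x, y, dy = 1.0, 2.0, 1e-4
difference_quotient = (F(x, y + dy) - F(x, y)) / dy
integral_of_partial = midpoint(lambda t: dP_dy(t, y), 0.0, x)
print(difference_quotient, integral_of_partial)
```

The difference quotient of $F$ in $y$ should agree with the integral of $\partial P/\partial y$ up to an error of order $\Delta y$.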

Given a basic closed region
$$D = \{(x,y) : a \le x \le b,\ g(x) \le y \le h(x)\},$$
whose lower and upper boundary curves $g(x)$ and $h(x)$ are smooth, the (counterclockwise) boundary curve of $D$ is the piecewise smooth curve
$$\partial D = C_1 \cup C_2 \cup C_3 \cup C_4$$
where
$C_1$ is the lower boundary curve of $D$ moving from left to right,
$C_2$ is the right vertical boundary line of $D$ moving upward,
$C_3$ is the upper boundary curve of $D$ moving from right to left,
$C_4$ is the left vertical boundary line of $D$ moving downward.
$\partial D$ is a closed curve, that is, its initial point and terminal point are the same, both equal to $(a, g(a))$. The line integral of $P\,dx + Q\,dy$ around the boundary curve $\partial D$ is denoted by
$$\oint_{\partial D} P\,dx + Q\,dy.$$
In Elementary Calculus, Green's theorem was stated for basic closed regions but only proved for rectangles. Here we prove the general case.

Theorem 13.6. (Green's Theorem) Suppose $P(x,y)$ and $Q(x,y)$ are smooth functions on an open set containing a basic closed region $D$ with smooth upper and lower boundary curves. Then
$$\oint_{\partial D} P\,dx + Q\,dy = \iint_D \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dA.$$

Proof. For $a \le u < v \le b$ let
$$B(u,v) = \oint_{\partial E(u,v)} P\,dx + Q\,dy$$
where $E(u,v) = \{(x,y) \in D : u \le x \le v\}$ is the part of $D$ with $x \in [u,v]$. Then $B$ has the Addition Property
$$B(u,v) + B(v,w) = B(u,w)$$
because the right boundary of $E(u,v)$ is the same vertical line segment $V$ as the left boundary of $E(v,w)$, and the upward line integral over $V$ in $B(u,v)$ cancels the downward line integral over $V$ in $B(v,w)$.

Our plan is to show that for any infinitesimal subinterval $[x, x + \Delta x]^*$ of $[a,b]^*$, with $\Delta B = B(x, x + \Delta x)$,
$$\frac{\Delta B}{\Delta x} \approx P(x, g(x)) - P(x, h(x)) + \int_{g(x)}^{h(x)} \frac{\partial Q}{\partial x}(x,y)\,dy. \tag{69}$$
After (69) is verified, the proof is completed as follows. By (69) and the Infinite Sum Theorem 6.1,
$$B(a,b) = \int_a^b \left[ P(x, g(x)) - P(x, h(x)) + \int_{g(x)}^{h(x)} \frac{\partial Q}{\partial x}(x,y)\,dy \right] dx.$$
By the Fundamental Theorem of Calculus 4.14,
$$P(x, g(x)) - P(x, h(x)) = \int_{g(x)}^{h(x)} -\frac{\partial P}{\partial y}\,dy$$
for each real $x \in [a,b]$. Therefore
$$B(a,b) = \int_a^b \int_{g(x)}^{h(x)} \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dy\,dx = \iint_D \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dA,$$
as required.

It remains to prove (69). First let $[x, x + \Delta x]$ be a real subinterval of $[a,b]$, that is,
$$a \le x < x + \Delta x \le b. \tag{70}$$
Then
$$\frac{\Delta B}{\Delta x} = \frac{1}{\Delta x} \int_x^{x + \Delta x} [P(t, g(t)) + Q(t, g(t))\,g'(t)]\,dt + \frac{1}{\Delta x} \int_{g(x + \Delta x)}^{h(x + \Delta x)} Q(x + \Delta x, y)\,dy$$
$$- \frac{1}{\Delta x} \int_x^{x + \Delta x} [P(t, h(t)) + Q(t, h(t))\,h'(t)]\,dt - \frac{1}{\Delta x} \int_{g(x)}^{h(x)} Q(x,y)\,dy.$$
We rewrite this equation in the form

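$$\frac{\Delta B}{\Delta x} = \frac{1}{\Delta x} \int_x^{x + \Delta x} [P(t, g(t)) - P(t, h(t))]\,dt \tag{71}$$
$$+ \frac{1}{\Delta x} \int_x^{x + \Delta x} [Q(t, g(t))\,g'(t) - Q(t, h(t))\,h'(t)]\,dt$$
$$+ \frac{1}{\Delta x} \int_{g(x)}^{h(x)} [Q(x + \Delta x, y) - Q(x,y)]\,dy$$
$$- \frac{1}{\Delta x} \int_{g(x)}^{g(x + \Delta x)} Q(x + \Delta x, y)\,dy$$
$$+ \frac{1}{\Delta x} \int_{h(x)}^{h(x + \Delta x)} Q(x + \Delta x, y)\,dy.$$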

By Transfer, every hyperreal solution of (70) is a solution of (71). Now let $[x, x + \Delta x]^*$ be an infinitesimal subinterval of $[a,b]^*$. Then $(x, x + \Delta x)$ is a hyperreal solution of (70), so it is also a solution of (71).

Let us now consider the third line of (71). Let $G(x,t)$ be the function
$$G(x,t) = \int_{g(x)}^{h(x)} Q(t,y)\,dy.$$
Then the third line of (71) is equal to the quotient
$$\frac{G(x, x + \Delta x) - G(x,x)}{\Delta x} = \frac{1}{\Delta x} \int_{g(x)}^{h(x)} [Q(x + \Delta x, y) - Q(x,y)]\,dy. \tag{72}$$
By Lemma 13.5, for $x, t$ in $[a,b]$ we have
$$\frac{\partial}{\partial t} G(x,t) = \frac{\partial}{\partial t} \int_{g(x)}^{h(x)} Q(t,y)\,dy = \int_{g(x)}^{h(x)} \frac{\partial Q}{\partial t}(t,y)\,dy. \tag{73}$$
The right side of (73) is continuous in $x$ and $t$. Since $g'(x)$ and $h'(x)$ are continuous, one can see from the Second Fundamental Theorem of Calculus that $\frac{\partial}{\partial x} G(x,t)$ is also continuous in $x$ and $t$. Therefore by Theorem 11.8, $G(x,t)$ is uniformly differentiable on $[a,b] \times [a,b]$, and hence for all $t \approx x$ we have
$$\frac{G(x, x + \Delta x) - G(x,x)}{\Delta x} \approx \frac{\partial}{\partial t} G(x,t). \tag{74}$$
Then by (72)–(74), the third line of (71) has the infinitely close approximation

$$\frac{1}{\Delta x} \int_{g(x)}^{h(x)} [Q(x + \Delta x, y) - Q(x,y)]\,dy \approx \int_{g(x)}^{h(x)} \frac{\partial Q}{\partial x}(x,y)\,dy.$$
It follows from the Hyperreal Mean Value Theorem 3.34 that the sum of the second, fourth, and fifth lines of (71) is infinitesimal. Therefore, from (71) we obtain
$$\frac{\Delta B}{\Delta x} \approx P(x, g(x)) - P(x, h(x)) + \int_{g(x)}^{h(x)} \frac{\partial Q}{\partial x}(x,y)\,dy.$$
This completes the proof of (69). ⊣
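Green's Theorem can be checked numerically on a basic closed region with curved upper and lower boundaries. In the sketch below, the region $0 \le x \le 1$, $x^2 \le y \le x + 1$, the functions $P = -y^2$, $Q = xy$, and all helper names are assumptions chosen for the example; by hand, both sides equal $3.2$. The boundary integral is assembled from the four pieces $C_1, \ldots, C_4$ exactly as in the expression for $\Delta B/\Delta x$ in the proof.

```python
def midpoint(func, lo, hi, n=400):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    width = (hi - lo) / n
    return sum(func(lo + (i + 0.5) * width) for i in range(n)) * width

A, B = 0.0, 1.0

def g(x):  return x * x        # lower boundary curve
def dg(x): return 2.0 * x      # g'
def h(x):  return x + 1.0      # upper boundary curve
def dh(x): return 1.0          # h'

def P(x, y): return -y * y
def Q(x, y): return x * y

def curl(x, y):
    # dQ/dx - dP/dy = y + 2y for the P and Q chosen above.
    return 3.0 * y

def boundary_integral():
    c1 = midpoint(lambda t: P(t, g(t)) + Q(t, g(t)) * dg(t), A, B)    # lower curve, left to right
    c2 = midpoint(lambda y: Q(B, y), g(B), h(B))                      # right side, upward
    c3 = -midpoint(lambda t: P(t, h(t)) + Q(t, h(t)) * dh(t), A, B)   # upper curve, right to left
    c4 = -midpoint(lambda y: Q(A, y), g(A), h(A))                     # left side, downward
    return c1 + c2 + c3 + c4

def double_integral():
    return midpoint(lambda x: midpoint(lambda y: curl(x, y), g(x), h(x)), A, B)

line_side = boundary_integral()
area_side = double_integral()
print(line_side, area_side)
```

On the vertical sides only $Q$ contributes, since $dx = 0$ there, which is why $c_2$ and $c_4$ involve $Q$ alone.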

CHAPTER 14

DIFFERENTIAL EQUATIONS

In Elementary Calculus, most of the material in Chapter 14 deals with ways to ﬁnd explicit solutions of ﬁrst and second order linear diﬀerential equations, using standard methods. However, inﬁnitesimals are used extensively in Section 14.4, on the approximation, existence, and uniqueness of solutions of general ﬁrst order diﬀerential equations with continuous coeﬃcients. We will expand upon that section here. We assume throughout this chapter that f(t,y) is a real function which is continuous on I×R for some open interval I containing t0, and that [t0,T) is a half-open subinterval of I, where T is either a positive real number or ∞. A (ﬁrst order) diﬀerential equation is an equation of the form

$$\frac{dy}{dt} = f(t,y).$$
An initial value problem is a differential equation together with an initial value,
$$\frac{dy}{dt} = f(t,y), \qquad y(t_0) = y_0. \tag{75}$$
A solution of the initial value problem on $[t_0, T)$ is a function $y(t)$ with domain $[t_0, T)$ which satisfies (75) for all $t \in [t_0, T)$. It is helpful to think of the variable $t$ as time. Using the Fundamental Theorem of Calculus 4.14 and the Second Fundamental Theorem 4.17, (75) can also be written as an integral equation
$$y(t) = y_0 + \int_{t_0}^{t} f(s, y(s))\,ds. \tag{76}$$
All the results we present in this chapter can be generalized without difficulty to finite systems of differential equations, where $f$ and $y$ are vector valued functions of dimension $n$. We will confine our attention to solutions on intervals $[t_0, T)$, that is, solutions with time $t \ge t_0$. One can easily extend the results to solutions on intervals $(-T, t_0]$ by making a change of variables.


14A. Existence of Solutions (§14.4)

In this section we will use hyperfinite Euler approximations to prove that every initial value problem has at least one solution. At the end of the section we will draw a conclusion about standard Euler approximations of solutions.

Let $\Delta t > 0$ be real. By a polygonal function on $[t_0, T)$ with increment $\Delta t$ we will mean a real function $Y(t)$, $t \in [t_0, T)$ such that for each natural number $n$, the graph of $Y(t)$ for $t_0 + n\Delta t \le t \le t_0 + (n+1)\Delta t$ is a straight line segment. Thus a polygonal function with increment $\Delta t$ is continuous and its graph is a broken line with corners at $t_0$ plus multiples of $\Delta t$. For a given increment $\Delta t$ and assignment of values at multiples of $\Delta t$, there is exactly one polygonal function.

The (standard) Euler approximation with initial value $y_0$ and increment $\Delta t$ for a differential equation $y'(t) = f(t,y)$ is the polygonal function $Y(t)$ on $[t_0, T)$ with increment $\Delta t$ such that $Y(t_0) = y_0$ and for each $t = t_0 + n\Delta t$, the slope on the interval $[t, t + \Delta t]$ is equal to the value $f(t, Y(t))$. That is, $Y(\cdot)$ is the polygonal function on $[t_0, T)$ with increment $\Delta t$ such that
$$Y(t_0) = y_0, \qquad Y(t + \Delta t) = Y(t) + f(t, Y(t))\,\Delta t \quad \text{for } t = t_0 + n\Delta t.$$
It is defined for each natural number $n$ by induction, starting with
$$Y(t_0) = y_0, \qquad Y(t_0 + \Delta t) = y_0 + f(t_0, y_0)\,\Delta t.$$
Thus for each positive integer $n$, $Y(t_0 + n\Delta t)$ is equal to the finite sum
$$Y(t_0 + n\Delta t) = y_0 + \sum_{k=0}^{n-1} f(t_0 + k\Delta t,\ Y(t_0 + k\Delta t))\,\Delta t.$$
Following the notation for Riemann sums, we may write this as
$$Y(s) = y_0 + \sum_{t_0}^{s} f(t, Y(t))\,\Delta t.$$
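The inductive definition of the standard Euler approximation translates directly into code. The sketch below computes the corner values $Y(t_0 + k\Delta t)$ and evaluates the polygonal function between corners by linear interpolation; the function names and the test equation $y' = y$, $y(0) = 1$ (whose solution is $e^t$) are assumptions made for the illustration.

```python
import math

def euler_nodes(f, t0, y0, dt, n_steps):
    """Corner points (t0 + k*dt, Y(t0 + k*dt)), k = 0..n_steps, of the Euler approximation."""
    ts = [t0]
    ys = [y0]
    for k in range(n_steps):
        # Y(t + dt) = Y(t) + f(t, Y(t)) * dt at the current corner t = t0 + k*dt.
        ys.append(ys[-1] + f(ts[-1], ys[-1]) * dt)
        ts.append(t0 + (k + 1) * dt)
    return ts, ys

def polygonal(ts, ys, t):
    """Value at t of the polygonal function through the given corners."""
    dt = ts[1] - ts[0]
    k = min(int((t - ts[0]) / dt), len(ts) - 2)
    return ys[k] + (ys[k + 1] - ys[k]) * (t - ts[k]) / dt

ts, ys = euler_nodes(lambda t, y: y, 0.0, 1.0, 0.001, 1000)
print(ys[-1], math.e)
print(polygonal(ts, ys, 0.0005))
```

With increment $0.001$ the value at $t = 1$ is $(1.001)^{1000}$, which is close to but slightly below $e$; decreasing the increment narrows the gap.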

For a given function $f(t,y)$, let us denote the Euler approximation on $[t_0, T)$ with initial value $y_0$ and increment $\Delta t$ by $Y_{y_0,\Delta t}$. $Y_{y_0,\Delta t}(t)$ is a function of three real variables $(y_0, \Delta t, t)$. By the Function Axiom, its natural extension is a hyperreal function of three hyperreal variables. We call the Euler approximation $Y_{z,dt}$ on $[t_0, T)^*$ with infinitesimal increment $dt$ and an initial value $z \approx y_0$ a hyperfinite Euler approximation of the initial value problem (75), and write it as an infinite sum
$$Y(s) = z + \sum_{t_0}^{s} f(t, Y(t))\,dt.$$
For each $z$ and $dt$, $Y_{z,dt}(\cdot)$ is a hyperreal function of $t$ with domain $[t_0, T)^*$.

In Elementary Calculus, we only considered hyperfinite Euler approximations with the standard initial value $y_0$. Here we will give a fuller treatment, and it will be useful to allow hyperreal initial values as well. One advantage of doing this is that one can sometimes get a different solution of the initial value problem by making an infinitesimal change in the initial value in an Euler approximation, as in Section 14C of this chapter.

Given a hyperreal function $Y(\cdot)$ with domain $[t_0, T)^*$, a real function $y(\cdot)$ is said to be the standard part of $Y(\cdot)$ on $[t_0, T)$ if whenever $t \in [t_0, T)^*$, $s \in [t_0, T)$, and $t \approx s$ we have $Y(t) \approx y(s)$. That is, the standard part of $Y(t)$ exists and depends only on the standard part of $t$, and we have
$$y(\operatorname{st}(t)) = \operatorname{st}(Y(t)).$$
The next theorem shows that as long as a hyperfinite Euler approximation is finite, its standard part exists and is a solution of the initial value problem on some interval $[t_0, T)$. The special case of the theorem with $z = y_0$ was already proved in Elementary Calculus.
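The remark that a small change in the initial value of an Euler approximation can lead to a different solution can be illustrated with finite numbers standing in for infinitesimals. The example below is an assumption chosen for the sketch and is not taken from the text: the equation $y' = 2\sqrt{|y|}$ with $y(0) = 0$ has both $y(t) = 0$ and $y(t) = t^2$ as solutions for $t \ge 0$. The Euler approximation started exactly at $0$ stays at $0$, while one started at the tiny value $10^{-12}$ follows the other solution.

```python
import math

def f(t, y):
    # Hypothetical right-hand side: continuous, but not Lipschitz at y = 0.
    return 2.0 * math.sqrt(abs(y))

def euler_value(u, dt, n_steps, t0=0.0):
    """Value at t0 + n_steps*dt of the Euler approximation with initial value u."""
    y = u
    for k in range(n_steps):
        y += f(t0 + k * dt, y) * dt
    return y

dt = 1e-3
from_zero = euler_value(0.0, dt, 1000)       # initial value exactly y0 = 0
from_nearby = euler_value(1e-12, dt, 1000)   # initial value moved by a tiny positive amount
print(from_zero, from_nearby)
```

At $t = 1$ the first value is exactly $0$ and the second is close to $1 = t^2$, so the two approximations track different solutions of the same initial value problem.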

Theorem 14.1. Let $Y(t)$ be the hyperfinite Euler approximation of the initial value problem
$$\frac{dy}{dt} = f(t,y), \qquad y(t_0) = y_0$$
with initial value $z$ and positive infinitesimal increment $dt$. Suppose $Y(t)$ is finite whenever $\operatorname{st}(t) \in [t_0, T)$. Then the standard part of $Y(\cdot)$ exists and is a solution of the initial value problem on the interval $[t_0, T)$.

Proof. We will show that the standard part of $Y(\cdot)$ on $[t_0, T)$ exists and is a solution of the equivalent integral equation (76) on $[t_0, T)$. It is enough to show that for each positive real number $c \in (t_0, T)$, the standard part of $Y(\cdot)$ exists and is a solution of (76) on $[t_0, c)$. Take an arbitrary real $c \in (t_0, T)$. For each real $s \in [t_0, c]$ let $y(s) = \operatorname{st}(Y(s))$. $y(\cdot)$ is a real function which has a natural extension $y^*(\cdot)$ to $[t_0, c]^*$. The main steps are to prove the following for each $t \in [t_0, c]^*$:
$$\operatorname{st}(Y(t)) = \operatorname{st}(y(t)) = y(\operatorname{st}(t)) \tag{77}$$
and
$$\operatorname{st}(Y(t)) = y_0 + \operatorname{st}\left( \sum_{t_0}^{t} f(s, y(s))\,dt \right). \tag{78}$$
It follows from (77) that $y(\cdot)$ is the standard part of $Y(\cdot)$ on $[t_0, c]$. The right side of (78) is the infinite Riemann sum of $f(s, y(s))$, and it follows that
$$y(t) = y_0 + \int_{t_0}^{t} f(s, y(s))\,ds$$
for all $t \in [t_0, c]$.

Proof of (77): By the Extreme Value Theorem, for each real initial value $u$ and positive real increment $\Delta t$, the continuous function $|f(t, Y_{u,\Delta t}(t))|$ has a maximum at some point $N_{u,\Delta t} \in [t_0, c]$, and a maximum value $M_{u,\Delta t}$. $Y_{u,\Delta t}(t)$ never

This proves (77).

By (77), the function y(·) is continuous on [t0,c]. By the Extreme Value Theorem, for each positive real Δt and initial value u the difference |f(t,Y_{u,Δt}(t)) − f(t,y(t))| has a maximum at some point J_{u,Δt} ∈ [t0,c] and a maximum value K_{u,Δt}. By Transfer, |f(t,Y_{z,dt}(t)) − f(t,y(t))| has a maximum at J_{z,dt} in [t0,c]∗ and a maximum value K_{z,dt}. By (77) and the continuity of f, K_{z,dt} is infinitesimal. For each positive real Δt and real u, and each t ∈ [t0,c], we have

$$\Bigl|\, Y_{u,\Delta t}(t) - \Bigl(u + \sum_{t_0}^{t} f(s,y(s))\,\Delta t\Bigr) \Bigr| \le K_{u,\Delta t}\,(t - t_0).$$

By Transfer, for each t ∈ [t0,c]∗ we have

$$\Bigl|\, Y_{z,dt}(t) - \Bigl(z + \sum_{t_0}^{t} f(s,y(s))\,dt\Bigr) \Bigr| \le K_{z,dt}\,(t - t_0).$$

Since K_{z,dt} is infinitesimal, this proves (78). ⊣

To apply the above theorem, we need a convenient criterion for a hyperfinite Euler approximation to be finite. The next theorem gives one.

Theorem 14.2. We are given an initial value problem dy/dt = f(t,y), y(t0) = y0. Let M and ε be positive reals.

(i) Suppose |f(t,y)| ≤ M whenever t ∈ [t0,T) and |y−y0| ≤ M(t−t0) + ε. Then for every hyperfinite Euler approximation Y(·) with initial value z ≈ y0, the standard part of Y(·) exists and is a solution of the initial value problem on the interval [t0,T).

(ii) Suppose |f(t,y)| ≤ M whenever t ∈ [t0,T) and |y−y0| ≤ M(t−t0). Then for every hyperfinite Euler approximation Y(·) with initial value y0, the standard part of Y(·) exists and is a solution of the initial value problem on the interval [t0,T).

Proof. (i) We show that Y(t) is finite whenever st(t) ∈ [t0,T), and then use Theorem 14.1. Suppose c, u, and Δt are real numbers such that t0 < c < T, u is within ε of y0, and Δt > 0. Let U = Y_{u,Δt}. We show by induction on n that if t = t0 + nΔt ≤ c then

|U(t) − u| ≤ M(t−t0).   (80)

This is true for n = 0 because U(t0) = u. Assume that (80) is true for n and let s = t0 + nΔt, t = s + Δt. Then t = t0 + (n+1)Δt. Suppose t ≤ c. Then s ≤ c, so by the induction hypothesis we have |U(s) − u| ≤ M(s−t0). But u is within ε of y0, so |U(s) − y0| ≤ M(s−t0) + ε, and thus |f(s,U(s))| ≤ M. By definition,

U(t) = U(s + Δt) = U(s) + f(s,U(s))Δt.

Therefore

|U(t) − u| ≤ |U(s) − u| + |f(s,U(s))|Δt ≤ M(s−t0) + MΔt = M(t−t0)

as required. This completes the induction. Since U is a polygonal function, it follows that |U(t) − u| ≤ M(t−t0) for all t ∈ [t0,c]. Since z ≈ y0, z is within ε of y0. By Transfer, |Y(t) − z| ≤ M(t−t0) and hence Y(t) is finite whenever t ∈ [t0,c]∗. Since this holds for all c ∈ (t0,T), Y(t) is finite whenever st(t) ∈ [t0,T), as required.

The proof of (ii) is similar, but with ε = 0 and z = y0. ⊣

Corollary 14.3. (Peano Existence Theorem) For some real c > t0, the initial value problem dy/dt = f(t,y), y(t0) = y0 has a solution on [t0,c).

Proof. Choose real a ∈ (t0,T) and b > 0. By the Extreme Value Theorem, |f(t,y)| has a maximum value on the closed rectangle [t0,a]×[y0−b,y0+b]; let M be a positive real that is at least this maximum. Let c = min(a, t0 + b/M). Then |f(t,y)| ≤ M whenever t ∈ [t0,c] and |y−y0| ≤ M(t−t0), because when t ∈ [t0,c] we have t ∈ [t0,a] and M(t−t0) ≤ M(c−t0) ≤ b, so the point (t,y) lies in the rectangle. By Theorem 14.2 (ii), the standard part of a hyperfinite Euler approximation with initial value y0 is a solution on [t0,c). ⊣

Corollary 14.4. If f(t,y) is bounded on [t0,T)×R then for every y0 the initial value problem has a solution on [t0,T).
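The induction in the proof of Theorem 14.2 can be checked numerically for a standard (real) Euler approximation. The following Python sketch is illustrative only: the helper name `euler_approximation`, the sample right-hand side f(t,y) = cos(ty) (which satisfies |f| ≤ M = 1 everywhere, as in Corollary 14.4), and the particular values of t0, u, Δt, and c are assumptions made for this example and do not come from the text. It tests the bound (80) at the partition points t0 + nΔt ≤ c.

```python
import math

def euler_approximation(f, t0, u, dt, c):
    """Partition points t0 + n*dt <= c and the Euler values U(t0 + n*dt).

    Hypothetical helper: U(t0) = u and U(s + dt) = U(s) + f(s, U(s)) * dt.
    """
    ts, ys = [t0], [u]
    n = 0
    while t0 + (n + 1) * dt <= c:
        s, y = ts[-1], ys[-1]
        ys.append(y + f(s, y) * dt)
        n += 1
        ts.append(t0 + n * dt)
    return ts, ys

def f(t, y):
    # Assumed sample right-hand side, bounded by M = 1 for all (t, y).
    return math.cos(t * y)

M = 1.0
t0, u, dt, c = 0.0, 0.5, 2.0 ** -6, 2.0   # dt is a power of two, so t-values are exact
ts, ys = euler_approximation(f, t0, u, dt, c)

# Bound (80): |U(t) - u| <= M (t - t0); a tiny tolerance absorbs rounding error.
bound_holds = all(abs(y - u) <= M * (t - t0) + 1e-12 for t, y in zip(ts, ys))
print(bound_holds)
```

Because the bound does not depend on Δt, the same check succeeds for any smaller step, which is the real-number fact that Transfer carries over to an infinitesimal increment dt.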

If y(t) is a solution of the initial value problem and is the standard part of some hyperfinite Euler approximation on [t0,T), we say that y(t) is Euler approximable on [t0,T). Theorem 14.2 shows that every initial value problem has an Euler approximable solution on some subinterval [t0,c) of [t0,T). The preceding two corollaries also give Euler approximable solutions. We now show that any Euler approximable solution of an initial value problem has the property that it is within ε of a standard Euler approximation for each real ε > 0.

Proposition 14.5. Given an initial value problem dy/dt = f(t,y), y(t0) = y0, let y(·) be an Euler approximable solution on [t0,T). Then y(·) has the following standard property: for each real ε > 0 and c < T there are real numbers u, Δt such that u is within ε of y0, Δt ∈ (0,ε), and the Euler approximation Y_{u,Δt}(t) is within ε of y(t) for all t ∈ [t0,c].

Proof. Since y(·) is Euler approximable, there is a hyperreal value z ≈ y0 and infinitesimal dt > 0 such that y(·) is the standard part of the hyperfinite Euler approximation Y_{z,dt}(·) on [t0,T). Suppose that the property fails for some ε > 0 and c < T. Then every real solution of

|u − y0| < ε, 0 < Δt < ε

is a partial real solution of

t ∈ [t0,c], |Y_{u,Δt}(t) − y(t)| > ε.

By the Partial Solution Theorem, for every hyperreal z ≈ y0 and infinitesimal dt > 0, there exists t ∈ [t0,c]∗ such that |Y_{z,dt}(t) − y∗(t)| > ε. But y(·) is continuous on [t0,c], so y(st(t)) ≈ st(y∗(t)), and hence |Y_{z,dt}(t) − y(st(t))| > ε. This contradicts the fact that y(·) is the standard part of Y_{z,dt}(·) on [t0,T), and completes the proof. ⊣

The above result also has a converse which we will not prove here. It says that any solution of the initial value problem which satisfies the property in Proposition 14.5 is Euler approximable. In the next section we will see that if the function f is nice then the solution of the initial value problem is unique and Euler approximable. In the last section we will examine in detail an example of an initial value problem where there are many solutions, but every solution is Euler approximable.

14B. Uniqueness of Solutions (§14.4)

Definition 14.6. We say that the function f has Lipschitz bound L (in y) on a rectangle E if for all points (t,y) and (t,z) in E, |f(t,y) − f(t,z)| ≤ L|y−z|. We say that f is locally Lipschitz on a rectangle E if f has a finite Lipschitz bound on each bounded closed rectangle D ⊆ E.

Proposition 14.7. If the partial derivative f_y(t,y) exists and is continuous on an open rectangle E, then f is locally Lipschitz on E.

Proof. Let D be a bounded closed rectangle contained in E. By the Extreme Value Theorem, |f_y(t,y)| has a finite bound L on D. Take (t,y) and (t,z) in D, with y < z. By the Mean Value Theorem 3.30, there is a point u ∈ (y,z) such that

$$\frac{f(t,z) - f(t,y)}{z - y} = f_y(t,u).$$

The point (t,u) also belongs to D. Therefore |f_y(t,u)| ≤ L, and hence |f(t,z) − f(t,y)| ≤ L|z−y|, and f has Lipschitz bound L on D. ⊣

Lemma 14.8. Suppose L > 0, g(t) is continuous on [t0,c], and for all t ∈ [t0,c],

$$0 \le g(t) \le L \int_{t_0}^{t} g(s)\,ds.$$

Then g(t) = 0 for all t ∈ [t0,c].

Proof. Let $h(t) = \int_{t_0}^{t} g(s)\,ds$. Then h(t0) = 0, and by the Second Fundamental Theorem of Calculus, h′(t) = g(t) for t ∈ (t0,c). By hypothesis, g(t) − Lh(t) ≤ 0, so h′(t) − Lh(t) ≤ 0. By the Product Rule for derivatives,

$$\frac{d}{dt}\bigl(e^{-Lt}h(t)\bigr) = e^{-Lt}h'(t) - L e^{-Lt}h(t) = e^{-Lt}\bigl(h'(t) - Lh(t)\bigr).$$

Thus e^{−Lt}h(t) is continuous on [t0,c] and has derivative ≤ 0 on (t0,c). By Corollary 3.31,

$$0 = e^{-Lt_0}h(t_0) \ge e^{-Lt}h(t) \ge 0$$

and hence h(t) = 0 for all t ∈ [t0,c]. Therefore g(t) = 0 for all t ∈ [t0,c]. ⊣
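Proposition 14.7 suggests a practical recipe for a Lipschitz bound: bound |f_y| on the closed rectangle and use that bound as L. The Python sketch below is a hedged numerical illustration, not part of the original argument. The function f(t,y) = sin(ty), the rectangle D = [0,2]×[−3,3], and the grid sizes are all assumptions chosen for the example; on this D we have |f_y(t,y)| = |t cos(ty)| ≤ 2, so L = 2 should work.

```python
import math

def f(t, y):
    # Assumed sample function; any f with continuous f_y would do.
    return math.sin(t * y)

def f_y(t, y):
    # Partial derivative of the sample f with respect to y.
    return t * math.cos(t * y)

def grid(lo, hi, n):
    """n + 1 equally spaced points from lo to hi inclusive."""
    return [lo + (hi - lo) * i / n for i in range(n + 1)]

# Closed rectangle D = [0, 2] x [-3, 3] (an arbitrary choice for the demo).
ts = grid(0.0, 2.0, 40)
ys = grid(-3.0, 3.0, 60)

L = 2.0  # analytic bound for |f_y| on D
sampled_max = max(abs(f_y(t, y)) for t in ts for y in ys)

# Check |f(t,y) - f(t,z)| <= L |y - z| for all sampled pairs in D.
lipschitz_ok = all(
    abs(f(t, y) - f(t, z)) <= L * abs(y - z) + 1e-12
    for t in ts for y in ys for z in ys
)
print(sampled_max <= L, lipschitz_ok)
```

A grid check of this kind can only fail to refute the bound; the proof of the proposition, via the Mean Value Theorem, is what guarantees it for every pair of points in D.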


Theorem 14.9. (Uniqueness Theorem) Suppose f is locally Lipschitz on [t0,T)×R. Then the initial value problem dy/dt = f(t,y), y(t0) = y0 has at most one solution on [t0,T). It also has at most one solution on [t0,S) where t0 < S ≤ T.

Proof. Since f is locally Lipschitz on [t0,T)×R, it is also locally Lipschitz on [t0,S)×R. Let x(t) and y(t) be two solutions and let t0 < c < S. By the Extreme Value Theorem, |x(t)| and |y(t)| have maximum values for t ∈ [t0,c], so there is a closed rectangle D ⊆ [t0,S)×R which contains the graphs of x(t) and y(t) on [t0,c]. By hypothesis, f has a finite Lipschitz bound L on D. Let g(t) = |x(t) − y(t)|. Then g is continuous on [t0,c]. For all t ∈ [t0,c],

$$\begin{aligned}
g(t) = |x(t) - y(t)| &= \Bigl|\int_{t_0}^{t} f(s,x(s))\,ds - \int_{t_0}^{t} f(s,y(s))\,ds\Bigr| \\
&= \Bigl|\int_{t_0}^{t} \bigl(f(s,x(s)) - f(s,y(s))\bigr)\,ds\Bigr| \\
&\le \int_{t_0}^{t} |f(s,x(s)) - f(s,y(s))|\,ds \\
&\le \int_{t_0}^{t} L\,|x(s) - y(s)|\,ds = \int_{t_0}^{t} L\,g(s)\,ds.
\end{aligned}$$

Thus by Lemma 14.8, g(t) = 0 for all t ∈ [t0,c]. Since this holds whenever t0 < c < S, x(t) = y(t) for all t ∈ [t0,S). ⊣

Corollary 14.10. Suppose f_y(t,y) exists and is continuous on I×R. Then the initial value problem

dy/dt = f(t,y), y(t0) = y0

has at most one solution on [t0,T).

Proof. By Proposition 14.7 and Theorem 14.9. ⊣

Corollary 14.11. Let J be an open interval which contains the initial value y0, and suppose that f_y(t,y) exists and is continuous on I×J. Then the initial value problem

dy/dt = f(t,y), y(t0) = y0

has at most one solution y(·) on [t0,T) such that y(t) ∈ J for all t ∈ [t0,T).

Proof. Let x(t) and y(t) be two solutions such that x(t) ∈ J and y(t) ∈ J for all t ∈ [t0,T), and let t0 < c < T. By the Extreme Value Theorem, x(t) and y(t) have maximum and minimum values for t ∈ [t0,c], so there is a closed

interval [a,b] ⊆ J such that x(t) ∈ [a,b] and y(t) ∈ [a,b] for all t ∈ [t0,c]. Let g(t,y) be the function defined by

$$g(t,y) = \begin{cases} f(t,b) & \text{if } y > b, \\ f(t,y) & \text{if } y \in [a,b], \\ f(t,a) & \text{if } y < a. \end{cases}$$

Since g agrees with f on [t0,c]×[a,b], x(·) and y(·) are solutions of the initial value problem

$$\frac{dy}{dt} = g(t,y), \qquad y(t_0) = y_0 \tag{81}$$

on [t0,c). Observe that g is continuous on I×R. By Proposition 14.7, f is locally Lipschitz on I×J, and hence g is locally Lipschitz on I×R. By Theorem 14.9, there is at most one solution of (81) on [t0,c), so x(t) = y(t) for all t ∈ [t0,c). Since this holds for all c ∈ (t0,T), x(t) = y(t) for all t ∈ [t0,T). ⊣

Corollary 14.12. Suppose that for each c ∈ (t0,T) the initial value problem (75) has at most one solution on [t0,c). Then there is an S (a real number or ∞) such that t0 < S ≤ T and:
(i) There is a unique solution on [t0,R) when R ≤ S, and
(ii) There is no solution on [t0,R) when S < R ≤ T.

Proof. Let J be the set of all c ∈ (t0,T) such that either c = t0 or there exists a solution y_c(·) on [t0,c). For each c ∈ J and b ∈ (t0,c), the restriction of y_c(·) to [t0,b) is a solution on [t0,b), and therefore b ∈ J. We conclude that {t0}∪J is an interval [t0,S) ⊆ [t0,T) for some S ≤ T. The Peano Existence Theorem 14.3 shows that t0 < S. Since there is at most one solution on [t0,b) for each b ∈ (t0,S), y_b(t) = y_c(t) whenever t0 ≤ t < b < c < S. Thus the union

y(·) = ⋃{y_c(·) : c ∈ J}

is a function on [t0,S) and is the unique solution on [t0,S). This proves (i). If S < R ≤ T, then there is a real c with S < c < R. Then c does not belong to J, so there is no solution on [t0,c) and hence no solution on [t0,R). This proves (ii). ⊣

Corollary 14.13. Suppose that for each c ∈ (t0,T) the initial value problem (75) has exactly one solution on [t0,c). Then it has exactly one solution on [t0,T).

Proof. In this case, we must have S = T in Corollary 14.12. ⊣

We now show that if there is a unique solution, then the solution is also Euler approximable.


Theorem 14.14. Suppose the initial value problem (75) has the unique solution y(·) on [t0,T), and has only one solution on [t0,c) for each c ∈ (t0,T). Then y(t) is Euler approximable and is the standard part of every hyperfinite Euler approximation of (75) on [t0,T).

Proof. Let Y(t) be the hyperfinite Euler approximation with initial value z0 ≈ y0 and infinitesimal increment dt. It is enough to show that Y(t) is finite whenever st(t) ∈ [t0,T), because then by Theorem 14.1, the standard part of Y(·) is a solution on [t0,T) and is Euler approximable. Take a real number c ∈ (t0,T) and let c < d < T. By assumption, the initial value problem has exactly one solution x(t) on [t0,d). Then the function f(t,x(t)) is continuous for t ∈ [t0,c], and by the Extreme Value Theorem there is a positive real N such that |f(t,x(t))| ≤ N for all t ∈ [t0,c]. Let g(t,y) be the function defined by

$$g(t,y) = \begin{cases} N+1 & \text{if } f(t,y) > N+1, \\ f(t,y) & \text{if } |f(t,y)| \le N+1, \\ -(N+1) & \text{if } f(t,y) < -(N+1). \end{cases}$$

Then for all (t,y) ∈ [t0,T)×R, g(t,y) is continuous, |g(t,y)| ≤ N+1, and g(t,y) = f(t,y) whenever |f(t,y)| ≤ N+1. Let Z(·) be the hyperfinite Euler approximation for the initial value problem

$$\frac{dy}{dt} = g(t,y), \qquad y(t_0) = y_0 \tag{82}$$

with initial value z0 and increment dt. By Theorem 14.2, the standard part z(·) of Z(·) exists and is a solution of the new initial value problem (82) on the interval [t0,T).

We now show that z(t) = x(t) for all t ∈ [t0,c). Suppose not, and let v be the greatest lower bound of all t ∈ [t0,c) such that z(t) ≠ x(t). Then t0 ≤ v < c, z(t0) = x(t0) = y0, and for all t ∈ [t0,v) we have z(t) = x(t) and |f(t,z(t))| ≤ N. Since z(·) and x(·) are continuous on [t0,c), we also have z(v) = x(v) and |f(v,z(v))| ≤ N. f(t,z(t)) is also continuous on [t0,c), so there is an s ∈ (v,c] such that |f(t,z(t))| ≤ N+1 for all t ∈ [t0,s). Therefore g(t,z(t)) = f(t,z(t)) for all t ∈ [t0,s), and hence z(·) is a solution of the original initial value problem (75) on [t0,s). But by hypothesis there is exactly one such solution on [t0,s), so we must have z(t) = x(t) for all t ∈ [t0,s). This contradicts the fact that s > v, and proves that z(t) = x(t) for all t ∈ [t0,c).

We now have |f(t,z(t))| ≤ N for all t ∈ [t0,c). Now consider hyperreal points (t2,y2) ∈ [t0,c]∗ × R∗. Since z(·) is the standard part of Z(·) on [t0,c), |f(t2,Z(t2))| ≤ N+1 whenever st(t2) ∈ [t0,c). By Transfer, g(t2,y2) = f(t2,y2) for all hyperreal t2,y2 such that |f(t2,y2)| ≤ N+1. Hence f(t2,Z(t2)) = g(t2,Z(t2)) whenever st(t2) ∈ [t0,c). It follows that Z(t2) = Y(t2) whenever st(t2) ∈ [t0,c). Therefore Y(·) has the standard part

z(·) on [t0,c), so Y(t2) is finite whenever st(t2) ∈ [t0,c). Since c was an arbitrary real number less than T, Y(t2) is finite whenever st(t2) ∈ [t0,T). This completes the proof. ⊣

We conclude this section by showing that the unique solution y(·) in the preceding theorem has an ε,δ property involving standard Euler approximations.

Proposition 14.15. Assume the hypotheses of Theorem 14.14 and let y(·) be the unique solution of the initial value problem on [t0,T). Then y(·) has the following standard property: for each real ε > 0 and c < T there is a real δ > 0 such that for every Euler approximation Y_{u,Δt}(·) with |u−y0| < δ and 0 < Δt < δ, Y_{u,Δt}(t) is within ε of y(t) for all t ∈ [t0,c].

Proof. Suppose the property fails for some ε > 0 and c < T. For each real u and Δt > 0, let M(u,Δt) be the maximum value of |Y_{u,Δt}(t) − y(t)| for t ∈ [t0,c]. This maximum exists by the Extreme Value Theorem, because |Y_{u,Δt}(t) − y(t)| is continuous on [t0,c]. Since the property fails for ε and c, for every real δ > 0 there exist u and Δt such that

|u − y0| < δ, 0 < Δt < δ, M(u,Δt) ≥ ε.

By the Partial Solution Theorem, for each positive infinitesimal σ there exist hyperreal z and dt such that

|z − y0| < σ, 0 < dt < σ, M(z,dt) ≥ ε.

Using the Partial Solution Theorem again, M(z,dt) is the maximum value of |Y_{z,dt}(t) − y∗(t)| for t ∈ [t0,c]∗. Then z ≈ y0 and dt is positive infinitesimal, so Y_{z,dt}(·) is a hyperfinite Euler approximation but y(·) is not the standard part of Y_{z,dt}(·) on [t0,T). This contradicts Theorem 14.14, so the property must hold after all. ⊣
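The ε,δ property of Proposition 14.15 can be observed numerically in a case where uniqueness holds. The sketch below is an illustration under stated assumptions: the test problem y′ = y, y(0) = 1 on [0,1] (whose unique solution is e^t), the helper name `euler_max_error`, and the choice u = 1 + δ/2, Δt = δ/2 are all made up for this example. The error is measured at the partition points only.

```python
import math

def euler_max_error(u, dt, c=1.0):
    """Max over partition points t = n*dt <= c of |Y_{u,dt}(t) - e^t| for y' = y.

    Hypothetical helper for the assumed test problem y' = y, y(0) = 1.
    """
    y, n = u, 0
    worst = abs(u - 1.0)                # error at t = 0
    while (n + 1) * dt <= c:
        y += y * dt                     # Euler step: Y(t + dt) = Y(t) + f(t, Y(t)) dt
        n += 1
        worst = max(worst, abs(y - math.exp(n * dt)))
    return worst

# Shrinking delta, with |u - y0| < delta and 0 < dt < delta in each case.
deltas = (2.0 ** -1, 2.0 ** -4, 2.0 ** -7)
errors = [euler_max_error(1.0 + d / 2, d / 2) for d in deltas]
print(errors)
```

The maximum error decreases as δ decreases, which is what the proposition predicts: for a given ε one can pick δ small enough that every Euler approximation with perturbation and step below δ stays within ε of the solution on [t0,c].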

14C. An Example where Uniqueness Fails (§14.3)

In Elementary Calculus the initial value problem

y′ = 3y^{2/3}, y(0) = 0

was presented as an example with infinitely many solutions. Here we will examine this example more closely, and show that every solution is Euler approximable. For every real number a, let y_a(t) be the function

$$y_a(t) = \begin{cases} 0 & \text{for } t < a, \\ (t-a)^3 & \text{for } t \ge a \end{cases}$$

with domain [0,∞).


Proposition 14.16. The family of all solutions of the initial value problem

y′ = 3y^{2/3}, y(0) = 0

on [0,∞) consists of the constant zero function y(t) = 0 for all t ∈ [0,∞) and the functions y_a(·) where a ≥ 0.

Proof. The constant zero function is a solution because its derivative is everywhere zero. For each real a, one can check by differentiation that y_a′(t) = 0 whenever t ≤ a, and y_a′(t) = 3(t−a)^2 = 3(y_a(t))^{2/3} for t > a. When a ≥ 0, y_a(0) = 0, so y_a(·) is a solution of the given initial value problem. When a < 0, we instead have y_a(0) = −a^3 > 0, so in this case y_a(·) is not a solution.

We now show that these are the only solutions. Let x(·) be a solution of the initial value problem, and suppose that x(·) is not the constant zero function. Since f(t,y) = 3y^{2/3} ≥ 0 for all (t,y), x(t) must be non-decreasing, that is, x(s) ≤ x(t) whenever s ≤ t. We also must have x(0) = 0, so x(t) ≥ 0 for all t. By Corollary 14.11, whenever b > 0 and x(b) > 0, the initial value problem

y′ = 3y^{2/3}, y(b) = x(b)

has at most one solution on [b,∞). But it does have one solution on [b,∞), namely the function y_a(·) where (b−a)^3 = x(b), that is, a = b − (x(b))^{1/3}. It follows from uniqueness that we get the same value a for each starting point (b,x(b)) on the curve. Thus for some a ≥ 0, we have x(t) = y_a(t) whenever x(t) > 0. Since x(·) and y_a(·) are continuous, non-decreasing, and map 0 to 0, we must have x(t) = y_a(t) for all t. ⊣

The following result is given in [AFHL 1986], page 32, and is attributed to B. Birkeland and D. Normann.

Proposition 14.17. Every solution of the initial value problem

y′ = 3y^{2/3}, y(0) = 0

is Euler approximable. In fact, for each positive infinitesimal dt, every solution is the standard part of a hyperfinite Euler approximation with increment dt and initial value z with z ≥ 0.

Proof. We first consider the standard Euler approximations Y_{u,Δt}(·) with initial value u ≥ 0 and increment Δt > 0. Since 3y^{2/3} ≥ 0, Y_{u,Δt}(t) is non-decreasing in t. The function f(t,y) = 3y^{2/3} is continuous and increasing in y for y ≥ 0. It therefore follows by induction that for each fixed Δt and n, Y_{u,Δt}(nΔt) is a continuous and increasing function of u ∈ [0,∞). Since Y_{u,Δt}(·) is a polygonal function, for each fixed Δt > 0 and t > 0, the function h(u) = Y_{u,Δt}(t) is also continuous and increasing. We also have h(0) = 0 and u ≤ h(u). Using the Intermediate Value Theorem we see that h(u) = Y_{u,Δt}(t) maps [0,∞) onto [0,∞). Thus whenever Δt > 0, t ≥ 0, v ≥ 0

there exists u ∈ [0,v] such that Y_{u,Δt}(t) = v.

Now let dt be positive infinitesimal. The constant zero solution is the standard part of the hyperfinite Euler approximation with increment dt and initial value 0, because Y_{0,dt}(t) = 0. Let y_a(·) be a nonzero solution of the initial value problem with initial value 0. Then a is a non-negative real number. Take a time t > a, so that y_a(t) = (t−a)^3 > 0. By the Partial Solution Theorem, there is a hyperreal z ∈ [0,y_a(t)]∗ such that Y_{z,dt}(t) = y_a(t). By Theorem 14.2, the standard part of Y_{z,dt}(·) exists on [0,∞) and is a solution of the initial value problem with initial value st(z). By Proposition 14.16, y_a(·) is the only solution of the initial value problem whose graph contains the point (t,y_a(t)), so y_a(·) must be the standard part of Y_{z,dt}(·). ⊣
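The real-number step in the proof of Proposition 14.17 (for fixed Δt > 0 and t, the map h(u) = Y_{u,Δt}(t) is continuous and increasing with h(0) = 0 and u ≤ h(u), so every target v ≥ 0 is hit by some u ∈ [0,v]) can be imitated with bisection. This Python sketch is only a finite, floating-point illustration: the helper names `euler_value` and `initial_value_hitting`, and the sample values Δt = 1/32, t = 1, v = 0.5, are assumptions for the example. Floating-point numbers cannot represent the extremely small initial values that a very fine step would require, so a coarse step is used.

```python
def euler_value(u, dt, n):
    """Y_{u,dt}(n*dt) for y' = 3*y**(2/3), starting from u >= 0 (hypothetical helper)."""
    y = u
    for _ in range(n):
        y += 3.0 * y ** (2.0 / 3.0) * dt   # Euler step
    return y

def initial_value_hitting(v, dt, n, iterations=200):
    """Bisection for u in [0, v] with Y_{u,dt}(n*dt) = v.

    Relies on h(u) = Y_{u,dt}(n*dt) being continuous and increasing
    with h(0) = 0 <= v <= h(v).
    """
    lo, hi = 0.0, v
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if euler_value(mid, dt, n) < v:
            lo = mid
        else:
            hi = mid
    return hi

dt, n, v = 1.0 / 32, 32, 0.5           # assumed sample values; t = n*dt = 1
u = initial_value_hitting(v, dt, n)

print(euler_value(0.0, dt, n))         # initial value 0 stays at 0
print(u, euler_value(u, dt, n))        # a tiny positive u reaches the target v
```

The initial value found is positive but very small compared with v, which mirrors the hyperreal picture: an initial value z ≈ 0 that is not exactly 0 can produce an Euler approximation whose standard part is a nonzero solution y_a(·).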

CHAPTER 15

LOGIC AND SUPERSTRUCTURES

This chapter is optional, and provides a link between the simple treatment of infinitesimal calculus in the text Elementary Calculus and the more advanced treatment of infinitesimal analysis found in the literature. The material in this chapter is not needed as background for teaching calculus from Elementary Calculus. It is aimed at mathematicians who wish to go more deeply into the subject.

15A. The Elementary Extension Principle

We will show here that the Transfer Axiom is equivalent to an apparently stronger statement called the Elementary Extension Principle. It says that the real numbers (with all real functions and relations) satisfy the same sentences of first order logic as the hyperreal numbers (with the natural extensions of all real functions and relations). Before stating the result precisely we start from the beginning and define the notion of a sentence of first order logic.

In first order logic we start with a set of symbols, called a language, appropriate for the structure under consideration. In this case we introduce a language L for the real number system with the following uncountable collection of symbols:

A symbol c for each real constant c,
A symbol f for each real function f of n variables,
A symbol P for each real relation P of n variables.

In addition, L has the following logical symbols common to all ﬁrst order languages:

Variables v1, v2, v3, ...
Connectives ¬ (not), ∧ (and), ∨ (or), ⇒ (implies), ⇔ (if and only if)
Quantifiers ∀ (for all), ∃ (there exists)
Parentheses and commas.

For simplicity we identify each real constant c, function f, and relation P, with its symbol c, f, P. In Section 1C we defined the notion of a term.


We repeat the definition here. A term is a finite sequence of symbols built according to the following rules:
• Every variable is a term.
• Every constant is a term.
• If τ1,...,τn are terms and f is a real function of n variables, then f(τ1,...,τn) is a term.

When we replace each variable vi in a term τ(v1,...,vn) by a constant ci, we obtain a constant term, which is either equal to some real number or is undefined. In Section 1C we called an equation or inequality between two terms a formula. These are expressions of the forms

τ = σ, τ ≠ σ, τ ≤ σ, τ < σ, τ ≥ σ, τ > σ.

The equations and inequalities are special cases of a broader class which we call the atomic formulas of L. If P is a real relation of n variables and τ1,...,τn are terms, then P(τ1,...,τn) is called an atomic formula of L. The set of all (first order) formulas of L is defined as follows.
• Every atomic formula of L is a formula of L.
• If ϕ,ψ are formulas of L, so are ¬ϕ, (ϕ∧ψ), (ϕ∨ψ), (ϕ ⇒ ψ), (ϕ ⇔ ψ).
• If ϕ is a formula of L and vn is a variable, then (∀vnϕ), (∃vnϕ) are formulas of L.

For example, omitting unnecessary parentheses, the following formula of L is the standard ε,δ condition for f to be continuous at x. For readability we use ε,δ,x,y for variables instead of v1,v2,v3,v4.

∀ε(ε > 0 ⇒ ∃δ(δ > 0 ∧ ∀y(|x−y| < δ ⇒ |f(x)−f(y)| < ε))).

In this example, ε, δ, and y are bound variables, while x is a free variable because it is not bound by any quantifier. A first order formula with no free variables is called a sentence. Thus whenever all the free variables in a first order formula are replaced by constants, the result is a sentence.

The statement that every real solution of a system of formulas S is a real solution of a system of formulas T is expressed by a sentence of L of the form

∀v1···∀vℓ((ϕ1 ∧···∧ ϕm) ⇒ (ψ1 ∧···∧ ψn))   (83)

where the ϕi and ψj are atomic formulas.


The statement that every real solution of a system of formulas S is a partial real solution of a system of formulas T is expressed by a sentence of L of the form

∀v1···∀vk((ϕ1 ∧···∧ ϕm) ⇒ ∃vk+1···∃vℓ(ψ1 ∧···∧ ψn)).   (84)

Each sentence ϕ of L is either true or false in the real number system. The notion of a true sentence is a precise mathematical concept which is defined by induction on the complexity of a sentence, and corresponds to the intuitive notion.

Definition 15.1. (i) An atomic sentence P(τ1,...,τn) is true if each of the constant terms τ1,...,τn is defined and the n-tuple of values belongs to the relation P.
(ii) If ϕ and ψ are sentences, then the truth values of the combinations of ϕ and ψ by connectives are obtained from the truth values of ϕ and ψ using the following table.

ϕ   ψ   ¬ϕ   ϕ∧ψ   ϕ∨ψ   ϕ⇒ψ   ϕ⇔ψ
T   T   F    T     T     T     T
T   F   F    F     T     F     F
F   T   T    F     T     T     F
F   F   T    F     F     T     T

(iii) The sentence ∀xϕ(x) is true if and only if ϕ(c) is true for all constants c ∈ R.
(iv) The sentence ∃xϕ(x) is true if and only if ϕ(c) is true for some constant c ∈ R.

The notation R ⊨ ϕ means that ϕ is true in R. Note that we treat = and ≠ as binary relations, and that the sentence τ ≠ σ is different from ¬τ = σ. In fact, R ⊨ τ ≠ σ if and only if τ and σ are both defined but have different values, while R ⊨ ¬τ = σ if and only if it is not the case that τ and σ are both defined and equal.

We now introduce a second language L∗ for the hyperreal number system which has a symbol for each hyperreal constant, function, and relation. The truth value of a sentence of L∗ in R∗ is defined as before but with quantifiers ranging over R∗ instead of R.

Definition 15.2. Given a formula ϕ of L, the ∗-transform of ϕ is the formula ϕ∗ of L∗ obtained by replacing each real function f and relation P occurring in ϕ by its natural extension f∗ and P∗. Given a term τ of L, the ∗-transform τ∗ is defined analogously.

Thus the ∗-transform of the formula expressing continuity of f at x is the formula

∀ε(ε >∗ 0 ⇒ ∃δ(δ >∗ 0 ∧ ∀y(|x −∗ y|∗ <∗ δ ⇒ |f∗(x) −∗ f∗(y)|∗ <∗ ε))).
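The truth table in Definition 15.1 (ii) can be regenerated mechanically from the usual Boolean reading of the connectives. The Python sketch below is an illustration only; the names `CONNECTIVES` and `truth_table` are made up for the example, and the encoding of ⇒ as "not p, or q" and ⇔ as equality of truth values is the standard one rather than anything specific to the text.

```python
from itertools import product

# Binary connectives in the column order of the table: and, or, implies, iff.
CONNECTIVES = {
    "and": lambda p, q: p and q,
    "or": lambda p, q: p or q,
    "implies": lambda p, q: (not p) or q,
    "iff": lambda p, q: p == q,
}

def truth_table():
    """Rows (p, q, not p, p and q, p or q, p implies q, p iff q)."""
    rows = []
    for p, q in product([True, False], repeat=2):
        rows.append((p, q, not p) + tuple(op(p, q) for op in CONNECTIVES.values()))
    return rows

for row in truth_table():
    print(" ".join("T" if value else "F" for value in row))
# T T F T T T T
# T F F F T F F
# F T T F T T F
# F F T F F T T
```

The quantifier clauses (iii) and (iv) have no such finite check over R, since they range over all real constants.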

The relation x ≈ y is expressed by the following formula of L∗:

∀z(R(z) ∧ 0 <∗ z ⇒ |x −∗ y|∗ <∗ z).

This formula is not the ∗-transform of any formula of L because it involves the set R, which is a subset of R∗ but is not the natural extension of any set of reals. Similarly, the property "x is finite" is expressed by

∃z(R(z) ∧ |x|∗ <∗ z),

which again is not the ∗-transform of any formula of L. The ∗-transform of a sentence of L is always a sentence of L∗. We can now state the Elementary Extension Principle and show that it follows from our Axioms A–E for hyperreal numbers.

Theorem 15.3. (Elementary Extension Principle) For every sentence ϕ of L, ϕ is true in R if and only if its ∗-transform ϕ∗ is true in R∗.

Notice that the Transfer Axiom is just the special case of the Elementary Extension Principle where ϕ is a sentence of the form (83) above. The Partial Solution Theorem 1.20 is the special case where ϕ is a sentence of the form (84) above.

Proof. For each formula ϕ(v1,...,vn) of L with at most the free variables v1,...,vn, let C_ϕ be the corresponding characteristic function

$$C_\varphi(a_1,\dots,a_n) = \begin{cases} 1 & \text{if } \varphi(a_1,\dots,a_n) \text{ holds in } \mathbb{R}, \\ 0 & \text{otherwise.} \end{cases}$$

Thus

R ⊨ ∀v1···∀vn(ϕ(v1,...,vn) ⇔ C_ϕ(v1,...,vn) = 1).

We will prove by induction on the complexity of formulas that for each formula ϕ(v1,...,vn) of L,

R∗ ⊨ ∀v1···∀vn(ϕ∗(v1,...,vn) ⇔ (C_ϕ)∗(v1,...,vn) = 1).   (85)

Replacing variables by real constants, it will then follow that the statements below are equivalent:

R ⊨ ϕ(a1,...,an)
R ⊨ C_ϕ(a1,...,an) = 1
R∗ ⊨ (C_ϕ)∗(a1,...,an) = 1
R∗ ⊨ ϕ∗(a1,...,an).

To start the induction, we let ϕ(v) be an atomic formula P(τ1(v),...,τk(v)). (For simplicity we do this for the case that ϕ(v) has only one variable v; the case of n variables is similar.) Let C_P be the characteristic function of P(y1,...,yk). By the definition of P∗,

R∗ ⊨ ∀y1···∀yk(P∗(y1,...,yk) ⇔ (C_P)∗(y1,...,yk) = 1),


and since ϕ(v) is P(τ1(v),...,τk(v)),

R∗ ⊨ ∀v(ϕ∗(v) ⇔ (C_P)∗(τ1∗(v),...,τk∗(v)) = 1).

Moreover, in R we have

R ⊨ ∀v(C_P(τ1(v),...,τk(v)) = 1 ⇔ C_ϕ(v) = 1).

Then by Transfer,

R∗ ⊨ ∀v((C_P)∗(τ1∗(v),...,τk∗(v)) = 1 ⇔ (C_ϕ)∗(v) = 1).

Combining the above formulas, we see that (85) holds for ϕ(v).

To shorten our induction we use the fact that every formula of L is equivalent to a formula which is built up using only the connectives ∧ and ¬ and the quantifier ∃. The other connectives and quantifiers may be treated as abbreviations for longer expressions. In the following we assume that (85) holds for ϕ(v1,...,vn) and ψ(v1,...,vn), and prove that (85) also holds for ¬ϕ, ϕ∧ψ, and ∃vnϕ.

We first consider ¬ϕ. We have

R ⊨ ∀v1···∀vn(ϕ(v1,...,vn) ⇔ C_ϕ(v1,...,vn) = 1),
R ⊨ ∀v1···∀vn(¬ϕ(v1,...,vn) ⇔ C_{¬ϕ}(v1,...,vn) = 1),

so

R ⊨ ∀v1···∀vn(¬C_ϕ(v1,...,vn) = 1 ⇔ C_{¬ϕ}(v1,...,vn) = 1).

By Transfer the ∗-transforms hold in R∗. Therefore

R∗ ⊨ ∀v1···∀vn(¬(C_ϕ)∗(v1,...,vn) = 1 ⇔ (C_{¬ϕ})∗(v1,...,vn) = 1).

Using the hypothesis (85) for ϕ, we have

R∗ ⊨ ∀v1···∀vn(ϕ∗(v1,...,vn) ⇔ (C_ϕ)∗(v1,...,vn) = 1).

It follows that

R∗ ⊨ ∀v1···∀vn(¬ϕ∗(v1,...,vn) ⇔ (C_{¬ϕ})∗(v1,...,vn) = 1),

so (85) holds for ¬ϕ.

To verify (85) for ϕ∧ψ we observe that

R ⊨ ∀v1···∀vn(C_{ϕ∧ψ}(v1,...,vn) = C_ϕ(v1,...,vn)·C_ψ(v1,...,vn)),

and by Transfer the ∗-transform holds in R∗.

We now show that ∃vnϕ satisfies (85). For any real (n−1)-tuple a1,...,an−1 such that

R ⊨ ∃vnϕ(a1,...,an−1,vn),

choose a real number f(a1,...,an−1) such that

R ⊨ ϕ(a1,...,an−1,f(a1,...,an−1)).

Otherwise put f(a1,...,an−1) = 0. The real function f of n−1 variables is called a Skolem function for ∃vnϕ. Working in R we see that

R ⊨ ∀v1···∀vn−1(C_{∃vnϕ}(v1,...,vn−1) = C_ϕ(v1,...,vn−1,f(v1,...,vn−1)))


and also

R ⊨ ∀v1···∀vn(C_ϕ(v1,...,vn) = 1 ⇒ C_{∃vnϕ}(v1,...,vn−1) = 1).

It follows from Transfer that the ∗-transforms of these formulas hold in R∗. Thus for any b1,...,bn−1 ∈ R∗, each statement below implies the next.

R∗ ⊨ ∃vnϕ∗(b1,...,bn−1,vn)
R∗ ⊨ ∃vn((C_ϕ)∗(b1,...,bn−1,vn) = 1)
R∗ ⊨ (C_{∃vnϕ})∗(b1,...,bn−1) = 1
R∗ ⊨ (C_ϕ)∗(b1,...,bn−1,f∗(b1,...,bn−1)) = 1
R∗ ⊨ ϕ∗(b1,...,bn−1,f∗(b1,...,bn−1))
R∗ ⊨ ∃vnϕ∗(b1,...,bn−1,vn).

We conclude that (85) holds for ∃vnϕ. ⊣

The sentences which arise in beginning calculus are of a very simple form, and the Transfer Axiom and Partial Solution Theorem are broad enough to cover the cases of the Elementary Extension Principle which are needed in proofs. It is important pedagogically that we are able to base the course on the familiar concept of a system of equations and inequalities rather than on the general notion of a formula in first order logic. Beginning calculus students do not have the mathematical experience necessary to work with formulas with even three quantifiers, such as the ε,δ definition of continuity.

One advantage of introducing the hyperreal numbers at the beginning of the calculus course is that complicated first order formulas of L can often be replaced by simpler formulas of L∗. This is especially true when the hyperreal relations x ≈ y and x = st(y) are used. For example, the ε,δ condition for f to be continuous at x,

∀ε(ε > 0 ⇒ ∃δ(δ > 0 ∧ ∀y(|x−y| < δ ⇒ |f(x)−f(y)| < ε))),

is equivalent to the simpler L∗ formula

∀y(y ≈ x ⇒ f∗(y) ≈ f∗(x)).

15B. Superstructures

In classical analysis one goes beyond the real numbers and real functions. A more appropriate object of study is the superstructure over the real numbers, defined as follows.

Definition 15.4. The power set of a set X is the set P(X) of all subsets of X,

P(X) = {Y : Y ⊆ X}.

The n-th cumulative power set of X is defined recursively by

V_0(X) = X, V_{n+1}(X) = V_n(X) ∪ P(V_n(X)).


The superstructure over X is the union of the cumulative power sets and is denoted by V(X),

$$V(X) = \bigcup_{n=0}^{\infty} V_n(X).$$

The superstructure V(X) has a membership relation between elements of V_n(X) and V_{n+1}(X), n = 0,1,2,.... We treat elements of X itself as atoms, and assume that ∅ ∉ X and that no x ∈ X contains any elements of V(X). We observe that for any superstructure V(X), we have

X ⊆ V_1(X) ⊆ V_2(X) ⊆ ··· and X ∈ V_1(X) ∈ V_2(X) ∈ ··· .

Moreover, P(X) ∈ V(X), V_n(X) ∈ V(X) for each n, and if A ∈ V(X) \ X then A ⊆ V(X).

One usually takes the set X of atoms to be the set R of reals, the set C of complex numbers, or some other structure under investigation. Here we concentrate on the case where the set of atoms is X = R. The sets Z of integers and N of natural numbers are important subsets of R and elements of V(R). We may define an ordered pair ⟨x,y⟩ by ⟨x,y⟩ = {{x},{x,y}} and an ordered n-tuple, n > 2, as the function

⟨x1,...,xn⟩ = {⟨1,x1⟩,...,⟨n,xn⟩}

from {1,...,n} into R. Thus all ordered n-tuples belong to V_3(R), and all real relations and functions in n variables belong to V_4(R). Function spaces, measures, and all other structures from classical analysis belong to V(R), and even to, say, V_100(R). The following lemma is easily proved by induction on n.

Lemma 15.5. If n > 0 and x ∈ y ∈ V_n(R), then x ∈ V_{n−1}(R).

In order to be in a position to use infinitesimals in more advanced areas of mathematics, we must extend the whole superstructure instead of just the real numbers.
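The recursive definition of the cumulative power sets can be carried out explicitly when the set of atoms is small and finite. The Python sketch below is a toy model under stated assumptions: two string atoms stand in for the real numbers, and the names `power_set`, `cumulative_power_sets`, and `pair` are hypothetical helpers introduced for the example. It computes V_0, V_1, V_2 and checks the chains X ⊆ V_1 ⊆ V_2 and X ∈ V_1 ∈ V_2, as well as the level at which a Kuratowski ordered pair of atoms first appears.

```python
from itertools import combinations

def power_set(elements):
    """All subsets of a finite collection, each as a frozenset."""
    items = list(elements)
    return {frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)}

def cumulative_power_sets(atoms, n):
    """Return [V_0, ..., V_n] with V_0 = atoms and V_{k+1} = V_k ∪ P(V_k)."""
    levels = [frozenset(atoms)]
    for _ in range(n):
        current = levels[-1]
        levels.append(frozenset(current | power_set(current)))
    return levels

def pair(x, y):
    """Kuratowski ordered pair <x, y> = {{x}, {x, y}}."""
    return frozenset({frozenset({x}), frozenset({x, y})})

X = {"a", "b"}                      # two assumed atoms standing in for R
V0, V1, V2 = cumulative_power_sets(X, 2)

print(len(V0), len(V1), len(V2))    # 2 6 66
print(V0 <= V1 <= V2)               # inclusion chain
print(V0 in V1 and V1 in V2)        # membership chain
print(pair("a", "b") in V2)         # ordered pairs of atoms appear at level 2
```

The sizes grow very quickly (V_3 of this toy model already has more than 2^66 elements), which is why only the first levels are computed.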

Definition 15.6. A superstructure embedding is a one to one mapping ∗ of V(R) into another superstructure V(S) such that
(i) R is a proper subset of S, r∗ = r for all r ∈ R, and R∗ = S.
(ii) For x,y ∈ V(R), x ∈ y if and only if x∗ ∈ y∗.

In view of (i), we will always write R∗ instead of S, and denote the superstructure embedding by ∗: V(R) → V(R∗). Notice that by (i), V(R) ⊆ V(R∗) and ∗ contains the identity map on R, but ∗ does not contain the identity map on V(R).

To go further we need an analogue of the Elementary Extension Principle for superstructure embeddings. An arbitrary embedding will not do; we want

182 15. Logic and Superstructures

the embedding to preserve all properties of a certain kind. Assume hereafter that ∗: V (R) → V (R∗) is a superstructure embedding. We ﬁrst form the ﬁrst order predicate logic with the equality symbol, a binary relation symbol∈, and a constant symbol a for each element a ∈ V (R∗). For simplicity we identify a with its constant symbol a. We call the language L. Definition 15.7. A bounded formula of L is an expression built according to the following rules. • If x and y are variables or constants, x = y and x ∈ y are bounded formulas, called atomic formulas. • If ϕ,ψ are bounded formulas, so are (¬ϕ), (ϕ∧ψ), (ϕ∨ψ), (ϕ ⇒ ψ), (ϕ ⇔ ψ). • If u is a variable, c is a constant, and ϕ is a bounded formula, then (∀u ∈ c)ϕ and (∃u ∈ c)ϕ are bounded formulas. • If u,v are variables, ϕ is a bounded formula, and v does not appear in ϕ in the form (∀v ∈ w) or (∃v ∈ w), then (∀u ∈ v)ϕ and (∃u ∈ v)ϕ are bounded formulas.

The adjective bounded refers to the fact that the quantifiers in bounded formulas of L are of the form (∀u ∈ x) or (∃u ∈ x). A bounded sentence is a bounded formula in which each occurrence of a variable u is within the scope of a quantifier of the form (∀u ∈ x) or (∃u ∈ x) where x is another variable or constant. Note that if a bounded sentence begins with a quantifier (∀u ∈ x) or (∃u ∈ x), x must be a constant.

For example, the property y = ⋃x is expressed by the bounded formula
(∀u ∈ y)(∃v ∈ x) u ∈ v ∧ (∀v ∈ x)(∀u ∈ v) u ∈ y.
If x and y are constants, this is a bounded sentence.

Each bounded sentence is either true or false in the superstructure V(R∗). The definition of the relation “ϕ is true in V(R∗)” is by induction on the complexity of the bounded sentence ϕ. The quantifier clauses are:
(∃v ∈ c)ϕ(v) is true in V(R∗) iff ϕ(b) is true in V(R∗) for some b ∈ c,
(∀v ∈ c)ϕ(v) is true in V(R∗) iff ϕ(b) is true in V(R∗) for all b ∈ c.
The notation V(R∗) ⊨ ϕ means “ϕ is true in V(R∗)”. Since V(R∗) is the only structure under discussion, we sometimes suppress mention of V(R∗) and say “ϕ is true” instead of “ϕ is true in V(R∗)”.

Remember that R ⊆ R∗ and V(R) ⊆ V(R∗). We call the elements of R real numbers, the elements of V(R)\R real sets, and arbitrary elements of V(R) real entities. Similarly, elements of R∗, V(R∗)\R∗, and V(R∗) are called hyperreal numbers, hyperreal sets, and hyperreal entities respectively. A real bounded formula is a bounded formula of L all of whose constants are real entities. Since each element of a real set is a real entity, a real bounded sentence has the same meaning in V(R) as in V(R∗).
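The inductive definition of truth for bounded sentences can be traced on hereditarily finite sets. The sketch below is an illustration under stated assumptions, not part of the formal development: sets are modelled as Python frozensets, anything else is treated as an atom, only ¬, ∧ and the two bounded quantifiers are implemented (the remaining connectives are definable from these), and all class and function names are invented for this example. It evaluates the bounded sentence for y = ⋃x given above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    value: object


@dataclass(frozen=True)
class Atomic:            # op is "in" or "="
    op: str
    left: object
    right: object


@dataclass(frozen=True)
class Not:
    body: object


@dataclass(frozen=True)
class And:
    left: object
    right: object


@dataclass(frozen=True)
class Quant:             # kind is "forall" or "exists"; bound is a Var or Const
    kind: str
    var: Var
    bound: object
    body: object


def value(term, env):
    """Interpret a constant by itself and a variable by its current value."""
    return term.value if isinstance(term, Const) else env[term.name]


def elements(entity):
    """Atoms have no elements; sets are modelled as frozensets."""
    return entity if isinstance(entity, frozenset) else frozenset()


def holds(phi, env=None):
    """Truth of a bounded formula, by induction on its complexity."""
    env = env or {}
    if isinstance(phi, Atomic):
        a, b = value(phi.left, env), value(phi.right, env)
        return a in elements(b) if phi.op == "in" else a == b
    if isinstance(phi, Not):
        return not holds(phi.body, env)
    if isinstance(phi, And):
        return holds(phi.left, env) and holds(phi.right, env)
    if isinstance(phi, Quant):
        results = (holds(phi.body, {**env, phi.var.name: b})
                   for b in elements(value(phi.bound, env)))
        return all(results) if phi.kind == "forall" else any(results)
    raise TypeError(f"not a bounded formula: {phi!r}")


def is_union(y, x):
    """The bounded sentence expressing y = ⋃x, with constants x and y."""
    u, v, cx, cy = Var("u"), Var("v"), Const(x), Const(y)
    return And(
        Quant("forall", u, cy, Quant("exists", v, cx, Atomic("in", u, v))),
        Quant("forall", v, cx, Quant("forall", u, v, Atomic("in", u, cy))),
    )


x = frozenset({frozenset({1, 2}), frozenset({2, 3})})
print(holds(is_union(frozenset({1, 2, 3}), x)))   # True
print(holds(is_union(frozenset({1, 2}), x)))      # False: 3 ∈ ⋃x is missing
```

Because every quantifier ranges over the elements of a given set, the evaluator never has to search the whole universe; this is the practical content of the word bounded.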

The superstructure embedding ∗: V(R) → V(R∗) induces a mapping, called the ∗-transform, from real bounded formulas to bounded formulas. The ∗-transform ϕ∗ of a real bounded formula ϕ is defined as the bounded formula obtained by replacing each constant c occurring in ϕ by its image c∗. For example, the ∗-transform of the real bounded sentence
(∀x ∈ R)(x < 0 ∨ (∃y ∈ R) y·y = x)
is the bounded sentence
(∀x ∈ R∗)(x <∗ 0∗ ∨ (∃y ∈ R∗) y ·∗ y = x).

Definition 15.8. A nonstandard universe is a superstructure embedding ∗: V(R) → V(R∗) which satisfies Leibniz’ Principle, which is the property that for each real bounded sentence ϕ ∈ L, ϕ is true if and only if ϕ∗ is true.

Of course, Leibniz did not formulate his principle in anything like the present form. In fact, first order predicate logic was not available until the work of Frege and Peano in the late nineteenth century. The name “Leibniz’ Principle” is used in the literature because Leibniz suggested that the real numbers should be extended to a larger system which has the same elementary properties but contains infinitesimals. The formal notion given here captures this intuitive idea. In Section 15D we will build a nonstandard universe.

We now show that Leibniz’ Principle implies the Elementary Extension Principle of Section 15A. By the elementary part of the superstructure V(R) we will mean the set
R ∪ ⋃_{n=1}^{∞} P(R^n)
of all elements of R and all relations (and hence functions) on R with finitely many variables. By the elementary part of the nonstandard universe ∗: V(R) → V(R∗) we mean the structure (•,R,R∗) where • is the restriction of the mapping ∗ to the elementary part of V(R). This mapping • associates with each real relation P or function f a hyperreal relation P∗ or function f∗.

Theorem 15.9. The elementary part of a nonstandard universe ∗: V(R) → V(R∗) satisfies the Elementary Extension Principle.

Proof. Given a first order sentence ϕ of L, let ϕ^R be the real bounded sentence of L obtained by replacing each quantifier ∃v_n occurring in ϕ by the bounded quantifier (∃v_n ∈ R), and replacing each quantifier ∀v_n occurring in ϕ by the bounded quantifier (∀v_n ∈ R). Define ϕ^{R∗} similarly. Then for each sentence ϕ of L, we have (ϕ∗)^{R∗} = (ϕ^R)∗. From the definition of truth value we see that the following are equivalent: R ⊨ ϕ, ϕ^R is true, (ϕ^R)∗ is true, R∗ ⊨ ϕ∗. Here the middle equivalence is Leibniz’ Principle applied to the real bounded sentence ϕ^R. This completes the proof. ⊣
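As a worked instance of the relativization used in this proof, one may take the sentence from the ∗-transform example and read it as a first order sentence ϕ about the real field. The display below is only an illustration; the superscripts R and R∗ denote relativization of the quantifiers, as in the proof, and the last line records that relativizing and then transforming gives the same bounded sentence as transforming and then relativizing.

```latex
\begin{align*}
\varphi &:\quad \forall x\,\bigl(x < 0 \;\lor\; \exists y\; y\cdot y = x\bigr),\\
\varphi^{\mathbb{R}} &:\quad (\forall x\in\mathbb{R})\bigl(x < 0 \;\lor\; (\exists y\in\mathbb{R})\; y\cdot y = x\bigr),\\
(\varphi^{\mathbb{R}})^{*} &:\quad (\forall x\in\mathbb{R}^{*})\bigl(x <^{*} 0^{*} \;\lor\; (\exists y\in\mathbb{R}^{*})\; y\cdot^{*} y = x\bigr)
\;=\; (\varphi^{*})^{\mathbb{R}^{*}} .
\end{align*}
```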

Theorem 15.10. The elementary part of a nonstandard universe ∗: V(R) → V(R∗) is a hyperreal number system which satisfies Axioms A–E for the hyperreal numbers.

Proof. Axiom A, that R is a complete ordered field, is satisfied by definition.

Axiom B says that R∗ is an ordered field extension of R. By the definition of a superstructure embedding, r∗ = r for each r ∈ R, so R∗ is an extension of R. Each of the ordered field axioms is a first order sentence which holds in R, so its ∗-transform holds in R∗ by the Elementary Extension Principle. Therefore Axiom B is satisfied.

Axiom D, the Function Axiom, holds because the superstructure embedding ∗ gives us the natural extension f∗ of each real function f and the natural extension <∗ of the real order relation <.

Axiom E, the Transfer Axiom, is a special case of the Elementary Extension Principle, which holds by Theorem 15.9.

It remains to prove Axiom C, that R∗ has a positive infinitesimal. By the definition of a superstructure embedding, R∗ is a proper field extension of R, so there is an element x ∈ R∗\R. Suppose first that x is infinite. Using only Axioms A and B, it now follows that |x| is positive infinite, |x|^{−1} is positive, and |x|^{−1} is infinitesimal (here we use Theorem 1.7, whose proof uses only Axioms A and B). So in this case R∗ has a positive infinitesimal. Now suppose that x is finite. The Standard Part Principle, Theorem 1.9, is also proved using only Axioms A and B, and shows that there is a real number r such that r − x is infinitesimal. Since x ∉ R, r − x ≠ 0, and therefore R∗ has the positive infinitesimal element |r − x|. This proves Axiom C. ⊣

Theorem 15.10 has the following converse, which we will state without proof. It shows that any hyperreal system can be extended to a nonstandard universe. A proof can be found in the book [CK 1990], Section 4.4.

Theorem 15.11. Suppose that the triple (•,R,R∗) satisfies Axioms A–E for the hyperreal numbers. Then there is a nonstandard universe ∗: V(R) → V(R∗) with elementary part (•,R,R∗).

15C. Standard, Internal, and External Sets

In this section we assume that ∗: V(R) → V(R∗) is a nonstandard universe. By Theorem 15.10, it follows that Axioms A–E and all their consequences hold for the real and hyperreal numbers. We will adopt the convention of dropping asterisks on terms f∗(x_1,…,x_n) where f is a real function, and on the hyperreal order relations <∗, ≤∗, >∗, ≥∗.

A number of new distinctions which did not arise in the hyperreal numbers become important in a nonstandard universe. The image of the power set of R will contain some but not all subsets of R∗, that is, [P(R)]∗ will be a proper subset of P(R∗). For example, the set R of real numbers and the set N of natural numbers are subsets of R∗ but do not belong to [P(R)]∗. To see this we note that the ∗-transform of the sentence “Every nonempty bounded set in P(R) has a least upper bound” holds in V(R∗). However, R and N are bounded in R∗ (for instance by any positive infinite hyperreal number) but have no least upper bound in R∗, so R and N cannot belong to [P(R)]∗.

It is useful to distinguish between four kinds of entities in V(R∗). An entity b ∈ V(R∗) is said to be
real if b ∈ V(R),
standard if b = a∗ for some a ∈ V(R),
internal if b ∈ a∗ for some a ∈ V(R)\R,
external if b is not internal.

Proposition 15.12. (i) Every standard entity is internal.
(ii) An entity x is internal if and only if x ∈ [V_n(R)]∗ for some n.
(iii) Every element of an internal set is internal.

Proof. (i) If x is standard, x = a∗ for some real entity a. If a is a real number, then x ∈ R∗ and R ∈ V(R). If a is a real set, then a ∈ P(a) ∈ V(R), so x = a∗ ∈ [P(a)]∗. In each case, x is internal.
(ii) Since V_n(R) ∈ V(R), every element of [V_n(R)]∗ is internal. Suppose x is internal, so that x ∈ a∗ for some a ∈ V(R)\R. Then a ⊆ V_n(R) for some n, so (∀u ∈ a) u ∈ V_n(R). By Leibniz’ Principle, (∀u ∈ a∗) u ∈ [V_n(R)]∗, and hence x ∈ [V_n(R)]∗.
(iii) Suppose x is an internal set and y ∈ x. By (ii), x ∈ [V_n(R)]∗ for some n. Since x is a set, n > 0. By Lemma 15.5, (∀u ∈ V_n(R))(∀v ∈ u) v ∈ V_{n−1}(R). By Leibniz’ Principle, (∀u ∈ [V_n(R)]∗)(∀v ∈ u) v ∈ [V_{n−1}(R)]∗. Hence y ∈ [V_{n−1}(R)]∗, so y is internal. ⊣

Here are some examples.
Standard and real: Each r ∈ R. Each finite subset of R.
External and real: N, R.
Standard but not real: N∗, R∗.
Internal but not standard: Each c ∈ R∗\R. [a,b]∗ where a, b ∈ R∗\R. The function h(x) = sin(Hx) where H is infinite.
External but not real: The monad of 0. The galaxy of 0. The standard part function st(·).

We remark that for every standard function g(x,y) and hyperreal constant c, the function h(x) = g(x,c) is internal. Similarly, for any standard relation P(x,y) and hyperreal constant c, the relation P(x,c) is internal.

A bounded formula of L is said to be internal if every constant occurring in the formula is internal. For example, the ∗-transform of a real bounded formula is an internal formula. Any formula formed from an internal formula by replacing variables by internal constants is again an internal formula.

Theorem 15.13. Let A be a real set, A ∈ V(R)\R.
(i) If B ⊆ A then B∗ ⊆ A∗.
(ii) [P(A)]∗ is equal to the set of all internal subsets of A∗.

Proof. (i) Since B ⊆ A, we have (∀x ∈ B) x ∈ A. By Leibniz’ Principle, (∀x ∈ B∗) x ∈ A∗, so B∗ ⊆ A∗.
(ii) First suppose X ∈ [P(A)]∗. Then X is internal, because P(A) ∈ V(R). We have (∀u ∈ P(A))(∀v ∈ u) v ∈ A. By Leibniz’ Principle, (∀u ∈ [P(A)]∗)(∀v ∈ u) v ∈ A∗. Therefore (∀v ∈ X) v ∈ A∗, so X is an internal subset of A∗.
Now suppose that X is an internal subset of A∗. Then X ∈ B∗ for some real set B. We have (∀u ∈ B)((∀v ∈ u) v ∈ A ⇒ u ∈ P(A)). By Leibniz’ Principle, (∀u ∈ B∗)((∀v ∈ u) v ∈ A∗ ⇒ u ∈ [P(A)]∗). But X ∈ B∗ and (∀v ∈ X) v ∈ A∗. Therefore X ∈ [P(A)]∗. ⊣

Several basic notions, such as open set, continuity, and differentiability, split into two separate notions when applied to internal functions and relations. Consider an internal function f on R∗ and a point c ∈ R∗. Let R^+ be the set of positive reals. f is said to be S-continuous at c if it satisfies the real ε,δ condition
(∀ε ∈ R^+)(∃δ ∈ R^+)(∀x ∈ R∗)(|x − c| < δ ⇒ |f(x) − f(c)| < ε).


This is not an internal formula, because it has the constant R^+. f is said to be ∗-continuous at c if it satisfies the hyperreal ε,δ condition
(∀ε ∈ (R^+)∗)(∃δ ∈ (R^+)∗)(∀x ∈ R∗)(|x − c| < δ ⇒ |f(x) − f(c)| < ε),
which is an internal formula. Theorem 5.5 shows that if f is a standard function and c is real, then f is S-continuous at c if and only if f is ∗-continuous at c. This can also be proved by Leibniz’ Principle. First we use Leibniz’ Principle to show that for each real ε, δ ∈ R^+, we may replace (∀x ∈ R∗) by (∀x ∈ R) in the definition of S-continuous, and then we use Leibniz’ Principle again to show that f is S-continuous if and only if it is ∗-continuous.

The two notions are not equivalent when f is only assumed to be internal. Here are two examples. Let H be positive infinite. The internal function f(x) = sin(Hx) is everywhere ∗-continuous but nowhere S-continuous. The internal function
g(x) = 1/H if x ∈ Q∗, g(x) = 0 if x ∉ Q∗,
where Q is the set of rational numbers, is everywhere S-continuous but nowhere ∗-continuous.

Some properties of the real numbers are never preserved by a superstructure embedding. For example, the field R of real numbers is Archimedean but the larger field R∗ of hyperreal numbers is not. The Archimedean Property cannot be expressed by an internal formula. A field is Archimedean if every element is less than some natural number. Formally we have
(∀x ∈ R)(∃n ∈ N) x < n but ¬(∀x ∈ R∗)(∃n ∈ N) x < n.
The ∗-transform of the Archimedean Property has a different meaning and is true for the hyperreal numbers,
(∀x ∈ R∗)(∃n ∈ N∗) x < n.
Thus R∗ is ∗-Archimedean but not Archimedean.

A second example is the completeness property. R is complete but R∗ is not. Formally,
(∀x ∈ P(R))ϕ(x) but ¬(∀x ∈ P(R∗))ϕ∗(x),
where ϕ(x) is the real bounded formula stating that if x is nonempty and has an upper bound in R then x has a least upper bound in R. However, R∗ is ∗-complete, that is,
(∀x ∈ [P(R)]∗)ϕ∗(x).
∗-completeness states that every internal subset of R∗ which is nonempty and has an upper bound in R∗ has a least upper bound in R∗.

A third example is induction on the natural numbers. We have
(∀x ∈ P(N))(0 ∈ x ∧ (∀y ∈ x)(y + 1 ∈ x) ⇒ x = N),

but by considering x = N as an element of P(N∗) we see that
¬(∀x ∈ P(N∗))(0 ∈ x ∧ (∀y ∈ x)(y + 1 ∈ x) ⇒ x = N∗).
However, N∗ does satisfy ∗-induction, that is, induction for internal subsets of N∗,
(∀x ∈ [P(N)]∗)(0 ∈ x ∧ (∀y ∈ x)(y + 1 ∈ x) ⇒ x = N∗).

We conclude this section with some consequences of Leibniz’ Principle. The next theorem is very useful in practice when one wants to show that a set is internal.

Theorem 15.14. (Internal Definition Principle) If ϕ(x,y_1,…,y_n) is a bounded formula with no constants, and c, b_1,…,b_n are internal sets, then {x ∈ c: ϕ(x,b_1,…,b_n)} is an internal set.

Proof. By Proposition 15.12, there is a natural number m such that the elements c, b_1,…,b_n all belong to [V_m(R)]∗. By the axioms of set theory, the following real bounded sentence is true:
(∀y_1,…,y_n,z ∈ V_m(R))(∃u ∈ V_{m+1}(R)) u = {x ∈ z: ϕ(x,y_1,…,y_n)}.
By Leibniz’ Principle, the ∗-transform of this sentence is true. It follows that
{x ∈ c: ϕ(x,b_1,…,b_n)} ∈ [V_{m+1}(R)]∗.
Therefore the set on the left is internal. ⊣

Theorem 15.15. (Overspill Principle) Let X be an internal subset of R∗.
(i) If c ∈ R∗ and X contains the monad of c, then there is a real δ > 0 such that (c−δ, c+δ)∗ ⊆ X.
(ii) If X contains infinitely many natural numbers then X contains a positive infinite hyperinteger.

Proof. (i) Let
Y = {y ∈ R∗: 0 < y ≤ 1 ∧ (∀u ∈ (c−y, c+y)∗) u ∈ X}.
By the Internal Definition Principle 15.14, the set Y is internal. Y is a subset of R∗ and is bounded above by 1. Since X contains the monad of c, every positive infinitesimal belongs to Y. By the ∗-transform of the completeness property of the reals, Y has a least upper bound b ∈ R∗. Then b is positive but not infinitesimal, so there is a real number δ such that 0 < δ < b. Then δ ∈ Y and hence (c−δ, c+δ)∗ ⊆ X.

(ii) Let K be a positive infinite hyperinteger. By the Internal Definition Principle, the set
Y = {x ∈ X: x ∈ N∗ ∧ x ≤ K}
is internal, and has the upper bound K. Y is nonempty because it contains the infinite set X ∩ N. By the ∗-transform of the completeness property of the reals, Y has a least upper bound b. Since Y contains an infinite subset of N, b must be positive infinite. Take an infinite c < b. Then c is not an upper bound of Y, so there exists H ∈ Y such that H > c. Then H is infinite, H ∈ N∗, and H ∈ X. ⊣

For example, suppose f is an internal function from R∗ into R∗, which is ∗-continuous at every point x ≈ c. By the Internal Definition Principle, the set
X = {x ∈ R∗: f is ∗-continuous at x}
is internal and contains the monad of c. Then by the Overspill Principle, there is a real δ > 0 such that (c−δ, c+δ)∗ ⊆ X and thus f is ∗-continuous at every point of (c−δ, c+δ)∗.

Corollary 15.16. (Robinson’s Principle) Let f be an internal function from N∗ into R∗. If f(n) is infinitesimal for all finite n, then there is an infinite H ∈ N∗ such that f(K) is infinitesimal for all K ≤ H in N∗.

Proof. Let
Y = {n ∈ N∗: (∀m ∈ N∗)(m ≤ n ⇒ |f(m)| ≤ 1/(n + 1))}.
By the Internal Definition Principle, Y is internal. By hypothesis, N ⊆ Y ⊆ N∗. By Theorem 15.15, Y contains a positive infinite hyperinteger H. Then for all K ≤ H in N∗, |f(K)| ≤ 1/(H + 1) and hence f(K) is infinitesimal. ⊣
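A concrete case may help to see what Robinson’s Principle asserts. The example below is an illustration chosen for this purpose and is not taken from the text: M is an arbitrary positive infinite hyperinteger, and f(n) = n/M, which is internal because it has the form g(n, M) for the standard function g(x, y) = x/y. Here a suitable infinite H can be written down directly, and the last line shows that the conclusion cannot be extended to all of N∗.

```latex
\begin{align*}
f(n) &= \frac{n}{M} \approx 0 \quad\text{for every finite } n \in \mathbb{N},\\
H &= \lfloor \sqrt{M}\, \rfloor \quad\text{is infinite, since } H > \sqrt{M} - 1,\\
0 \le f(K) &= \frac{K}{M} \le \frac{\sqrt{M}}{M} = \frac{1}{\sqrt{M}} \approx 0
  \quad\text{for all } K \le H \text{ in } \mathbb{N}^{*},\\
f(M) &= 1 \not\approx 0 .
\end{align*}
```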

15D. Bounded Ultrapowers

In this section we build a nonstandard universe. One way to do this is to apply the logical compactness theorem for the theory of types ([Henkin 1950]), as in the book [Robinson 1970]. Another method, which we will adopt here, is the bounded ultrapower.

Our method here will have two stages. First, we build a bounded ultrapower of the superstructure V(R). This is a generalization of the ultrapower of the real number system R given in Section 1G. Then we use a method known as the Mostowski collapse to map this ultrapower into a new superstructure V(R∗).

The notion of a free ultrafilter over a set I was given in Definition 1.41. Theorem 1.42 shows that there exists a free ultrafilter over every infinite set. Hereafter we let U be a free ultrafilter over N.

Definition 15.17. A countable sequence a = ⟨a_0, a_1, a_2, …⟩ of elements of V(R) is said to be bounded in V(R) if there is a fixed n ∈ N such that each a_i belongs to V_n(R).


We will build the bounded ultrapower of V(R) modulo U. The elements of the bounded ultrapower are going to be equivalence classes of bounded sequences in V(R), and the equivalence relation is determined by U.

Definition 15.18. Two bounded sequences a, b in V(R) are said to be U-equivalent, in symbols a =_U b, if {n: a_n = b_n} ∈ U.

Lemma 15.19. The relation =_U is an equivalence relation on the set of bounded sequences in V(R).

Proof. =_U is obviously reflexive and symmetric. We show that =_U is transitive. Assume a =_U b and b =_U c. Let
X = {n: a_n = b_n}, Y = {n: b_n = c_n}, Z = {n: a_n = c_n}.
Then X ∈ U and Y ∈ U, so X ∩ Y ∈ U. But X ∩ Y ⊆ Z, so Z ∈ U, and hence a =_U c. ⊣

Definition 15.20. For each bounded sequence a in V(R), we define a_U to be the U-equivalence class of a,
a_U = {b: a =_U b}.
The bounded ultrapower ∏_U V(R) of the set V(R) is the set
∏_U V(R) = {a_U : a is bounded in V(R)}.
The natural embedding of V(R) into ∏_U V(R) is the mapping i: V(R) → ∏_U V(R) defined by i(x) = ⟨x,x,x,…⟩_U for each x ∈ V(R). That is, i(x) is the U-equivalence class of the constant sequence ⟨x,x,x,…⟩, for real numbers x as well as for real sets. The U-membership relation ∈_U on ∏_U V(R) is defined by
a_U ∈_U b_U iff {n: a_n ∈ b_n} ∈ U.

Lemma 15.21. The relation a_U ∈_U b_U depends only on the equivalence classes a_U, b_U. That is, if a =_U a′ and b =_U b′ then
{n: a_n ∈ b_n} ∈ U iff {n: a′_n ∈ b′_n} ∈ U.

Proof. Suppose {n: a_n ∈ b_n} ∈ U. Then
{n: a′_n ∈ b′_n} ⊇ {n: a_n ∈ b_n} ∩ {n: a_n = a′_n} ∩ {n: b_n = b′_n}.
The right side belongs to U, so the left side belongs to U, and hence a′_U ∈_U b′_U. The converse follows by interchanging the roles of a, b and a′, b′. ⊣

Lemma 15.22. The natural embedding i: V(R) → ∏_U V(R) is one to one.

Proof. If x ≠ y then ⟨x,x,x,…⟩ and ⟨y,y,y,…⟩ are never equal, and ∅ ∉ U, so i(x) ≠ i(y). ⊣

Definition 15.23. The ultrapower ∏_U R of R modulo U is the set
∏_U R = {b_U : b maps N into R}.

In Section 1G we defined ∏_U R in a slightly different way, replacing the equivalence class of a constant sequence ⟨r,r,…⟩ by r itself to make ∏_U R an extension of R. This time we postpone that replacement to the next step, when we form the set R∗ of hyperreal numbers.

Observe that the following conditions are equivalent:
a_U ∈ ∏_U R,
{n: b_n ∈ R} = N for some b =_U a,
{n: a_n ∈ R} ∈ U,
a_U ∈_U i(R).

Lemma 15.24. The set {i(r) : r ∈ R} is a proper subset of ∏_U R.

Proof. The set {i(r) : r ∈ R} is a subset of ∏_U R because i(r) = ⟨r,r,…⟩_U ∈ ∏_U R for each r ∈ R. Let a be a one to one mapping of N into R. Then a_U ∈ ∏_U R. However, for each r ∈ R, the set {n: a_n = r} has at most one element and hence is not in U, so a_U ≠ i(r). ⊣

Lemma 15.25. Suppose a_U, b_U ∈ ∏_U V(R) \ ∏_U R and a_U ≠ b_U. Then
{c_U : c_U ∈_U a_U} ≠ {c_U : c_U ∈_U b_U}.

Proof. The two sets
{n: a_n ⊆ b_n}, {n: b_n ⊆ a_n}
do not both belong to U, because their intersection {n: a_n = b_n} does not belong to U. Therefore one of these sets does not belong to U, say {n: a_n ⊆ b_n} ∉ U. Then its complement X does belong to U,
X = {n: a_n \ b_n ≠ ∅} ∈ U.
For each n ∈ X choose c_n ∈ a_n \ b_n, and for n ∈ N\X choose c_n = 0. Then c_U ∈_U a_U but not c_U ∈_U b_U. ⊣

We have defined the bounded ultrapower and proved some basic lemmas. We now turn to the second stage, the Mostowski collapse. We begin at the bottom level by forming the set R∗ of hyperreal numbers. As in Section 1G, the idea is to start with ∏_U R and replace i(r) by r itself for each real r. But this time, we want to use R∗ as the set of atoms of a superstructure V(R∗), so we must also make sure that no element of R∗ contains any elements of V(R∗).


Lemma 15.26. There is a set R∗ and a function j_0 such that:
(i) j_0 maps ∏_U R one to one onto R∗,
(ii) j_0(i(r)) = r for each r ∈ R, and hence R ⊆ R∗,
(iii) ∅ ∉ R∗ and no element of R∗ contains any elements of V(R∗).

Proof. There are many ways to do this. One way is to take a set λ of cardinality greater than V(R) and define
j_0(a_U) = r if r ∈ R and a_U = i(r), and j_0(a_U) = a_U × {λ} otherwise. ⊣

Hereafter we let R∗ and j_0 be as in Lemma 15.26.

Theorem 15.27. There is a unique one to one mapping
j: ∏_U V(R) → V(R∗),
called the Mostowski collapse of ∏_U V(R), such that:
(i) j(x) = j_0(x) for x ∈ ∏_U R.
(ii) j(a_U) = {j(b_U): b_U ∈_U a_U} for all a_U ∈ ∏_U V(R) \ ∏_U R.

Proof. For each k ∈ N we consider the ultrapower of the set V_k(R), which is defined by
∏_U V_k(R) = {a_U : a maps N into V_k(R)}.
Then
∏_U V(R) = ⋃_{k=0}^{∞} ∏_U V_k(R),
∏_U R = ∏_U V_0(R) ⊆ ∏_U V_1(R) ⊆ ∏_U V_2(R) ⊆ ···.
We define j restricted to ∏_U V_k(R) by induction on k. For k = 0, j restricted to ∏_U V_0(R) is the mapping j_0 from Lemma 15.26. Suppose we already have defined j restricted to ∏_U V_k(R). For a_U ∈ ∏_U V_{k+1}(R) \ ∏_U V_k(R), define
j(a_U) = {j(b_U): b_U ∈_U a_U}.
This definition is unambiguous because by Lemma 15.5, whenever b_U ∈_U a_U we have
{n: b_n ∈ V_k(R)} ⊇ {n: b_n ∈ a_n} ∈ U,
so b_U ∈ ∏_U V_k(R). Conditions (i) and (ii) hold by definition. j is one to one by Lemma 15.25. To show that there is at most one function satisfying (i) and (ii), one proves by induction on k that the restriction of j to ∏_U V_k(R) is the only function with properties (i) and (ii). ⊣
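The recursion in conditions (i) and (ii) is the same one that collapses any well-founded membership-like relation onto genuine sets. The following Python sketch shows that recursion on a small finite relation. It is a toy under stated assumptions: the node names, the dictionary `members` (playing the role of ∈_U) and the dictionary `atom_image` (playing the role of j_0) are all hypothetical, and the relation is assumed to be well-founded so that the recursion terminates.

```python
def mostowski_collapse(members, atom_image):
    """Collapse a finite well-founded membership relation onto real sets.

    members[a] is the set of nodes b with "b E a".  Nodes that are keys of
    atom_image play the role of atoms and are sent to atom_image[node],
    as in condition (i).  Every other node a is sent to {j(b) : b E a},
    as in condition (ii).
    """
    cache = {}

    def j(a):
        if a not in cache:
            if a in atom_image:
                cache[a] = atom_image[a]
            else:
                cache[a] = frozenset(j(b) for b in members.get(a, ()))
        return cache[a]

    nodes = set(members) | set(atom_image)
    return {a: j(a) for a in nodes}


# Hypothetical relation: p and q are atoms, s "contains" p and q,
# t "contains" s and p, and e has no members at all.
members = {"s": {"p", "q"}, "t": {"s", "p"}, "e": set()}
atom_image = {"p": 1, "q": 2}

collapsed = mostowski_collapse(members, atom_image)
print(collapsed["s"] == frozenset({1, 2}))                  # True
print(collapsed["t"] == frozenset({frozenset({1, 2}), 1}))  # True
print(collapsed["e"] == frozenset())                        # True
```

In this finite setting the collapse is one to one exactly when distinct non-atom nodes have distinct sets of members, which is the role played by Lemma 15.25 in the proof above.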

Definition 15.28. The embedding ∗: V(R) → V(R∗) is the composition of the natural embedding i and the Mostowski collapse j, that is, for a ∈ V(R), a∗ = j(i(a)).

[Diagram: commuting triangle in which i maps V(R) into ∏_U V(R), j maps ∏_U V(R) into V(R∗), and ∗ maps V(R) into V(R∗), so that ∗ = j ∘ i.]

Lemma 15.29. The mapping ∗: V(R) → V(R∗) built from a bounded ultrapower ∏_U V(R) is a superstructure embedding.

Proof. Since i and j are one to one, their composition ∗ is one to one. By Lemmas 15.24 and 15.26, R is a proper subset of R∗ and r∗ = r for each r ∈ R. Taking R as an element of V(R), we have
j(i(R)) = j(⟨R,R,…⟩_U) = {j(a_U): a_U ∈ ∏_U R} = R∗.
Finally, for x, y ∈ V(R),
x ∈ y iff i(x) ∈_U i(y) iff j(i(x)) ∈ j(i(y)) iff x∗ ∈ y∗. ⊣

The following theorem of Łoś will show that Leibniz’ Principle holds for this superstructure embedding.

Theorem 15.30. (Łoś’ Theorem) Given a bounded ultrapower ∏_U V(R), let ϕ(v_1,…,v_n) be a bounded formula with no constants and let a^1_U,…,a^n_U ∈ ∏_U V(R). Then ϕ(j(a^1_U),…,j(a^n_U)) is true if and only if
{k ∈ N: ϕ(a^1(k),…,a^n(k))} ∈ U.

Proof. We use induction on the complexity of formulas. The theorem holds for atomic formulas by the definition of a_U = b_U and a_U ∈_U b_U. For the induction steps it suffices to assume the theorem for ϕ(v_1,…,v_n) and ψ(v_1,…,v_n) and prove the theorem for ¬ϕ, ϕ ∧ ψ, and (∃v_1 ∈ v_2)ϕ.

For ¬ϕ we observe that the following are equivalent.
¬ϕ∗(j(a^1_U),…,j(a^n_U))
{k ∈ N: ϕ(a^1(k),…,a^n(k))} ∉ U
{k ∈ N: ¬ϕ(a^1(k),…,a^n(k))} ∈ U

For ϕ ∧ ψ we use the following equivalences.
(ϕ ∧ ψ)∗(j(a^1_U),…,j(a^n_U))
ϕ∗(j(a^1_U),…,j(a^n_U)) ∧ ψ∗(j(a^1_U),…,j(a^n_U))
{k ∈ N: ϕ(a^1(k),…,a^n(k))} ∈ U and {k ∈ N: ψ(a^1(k),…,a^n(k))} ∈ U
{k ∈ N: ϕ(a^1(k),…,a^n(k))} ∩ {k ∈ N: ψ(a^1(k),…,a^n(k))} ∈ U
{k ∈ N: (ϕ ∧ ψ)(a^1(k),…,a^n(k))} ∈ U.

The theorem for (∃v_1 ∈ v_2)ϕ again follows from a sequence of equivalent statements.
(∃v_1 ∈ j(a^2_U))ϕ∗(v_1,j(a^2_U),…,j(a^n_U))
For some a^1_U ∈_U a^2_U, ϕ∗(j(a^1_U),…,j(a^n_U))
{k ∈ N: For some a^1(k) ∈ a^2(k), ϕ(a^1(k),…,a^n(k))} ∈ U
{k ∈ N: (∃v_1 ∈ a^2(k))ϕ(v_1,a^2(k),…,a^n(k))} ∈ U. ⊣

Theorem 15.31. There exists a nonstandard universe. In fact, for each free ultrafilter U over N, the superstructure embedding ∗: V(R) → V(R∗) built by the bounded ultrapower modulo U is a nonstandard universe.

Proof. By Theorem 1.42 there exists a free ultrafilter U over N. For a real bounded sentence ψ, Łoś’ Theorem 15.30 shows that ψ∗ is true if and only if {k ∈ N: ψ is true} ∈ U. This set is N when ψ is true and ∅ when ψ is false. Since N ∈ U and ∅ ∉ U, ψ∗ is true if and only if ψ is true. Hence Leibniz’ Principle holds for ∗: V(R) → V(R∗). ⊣
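The definitions of =_U and ∈_U in this section can be tried out on a toy case, with one important caveat: a free ultrafilter over N cannot be written down explicitly, so the sketch below substitutes a principal ultrafilter over a hypothetical three-element index set. That is exactly the kind of ultrafilter the construction excludes, and with it the ultrapower is just a copy of the original structure read off at one coordinate. The code therefore illustrates only the mechanics of the definitions and of Lemma 15.21, not the existence of new elements; all names are invented for this example.

```python
from itertools import combinations

I = range(3)                          # hypothetical finite index set {0, 1, 2}
SUBSETS = [frozenset(c) for r in range(len(I) + 1)
           for c in combinations(I, r)]
U = {A for A in SUBSETS if 1 in A}    # principal ultrafilter at index 1 (not free)


def is_ultrafilter(U):
    """Brute-force check of the ultrafilter axioms over the index set I."""
    full = frozenset(I)
    return (
        full in U and frozenset() not in U
        and all(A & B in U for A in U for B in U)               # intersections
        and all(B in U for A in U for B in SUBSETS if A <= B)   # supersets
        and all((A in U) != (full - A in U) for A in SUBSETS)   # A or complement
    )


def agree(pred, a, b):
    """The index set {n : pred(a_n, b_n)}."""
    return frozenset(n for n in I if pred(a[n], b[n]))


def u_equal(a, b):
    """a =_U b iff {n : a_n = b_n} ∈ U."""
    return agree(lambda x, y: x == y, a, b) in U


def u_member(a, b):
    """a_U ∈_U b_U iff {n : a_n ∈ b_n} ∈ U (atoms have no elements)."""
    return agree(lambda x, y: isinstance(y, frozenset) and x in y, a, b) in U


a = (0, 5, 7)
a2 = (9, 5, 8)                        # agrees with a only at index 1
b = (frozenset({0}), frozenset({5, 6}), frozenset())

print(is_ultrafilter(U))              # True
print(u_equal(a, a2))                 # True: {1} ∈ U
print(u_member(a, b), u_member(a2, b))  # True True, as Lemma 15.21 predicts
```

With a free ultrafilter no single coordinate decides membership in U, which is what allows sequences such as a one to one map of N into R to represent new elements, as in Lemma 15.24.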

15E. Saturation and Uniqueness

By Theorem 1.38, the ﬁeld of real numbers is, up to isomorphism, the unique complete ordered ﬁeld. Furthermore, there is a deﬁnable complete ordered ﬁeld. In this section we present the analogous results for hyperreal number systems and for nonstandard universes. The theorems in this section use some fairly advanced methods from model theory and are stated without proof. The hyperreal number system (∗,R,R∗) is not uniquely characterized by Axioms A–E, even up to isomorphism. This lack of uniqueness does not matter much in practice, but can be unsettling. In this section we remedy the situation by introducing one more axiom, the Saturation Axiom, which is important in applications beyond calculus and which does give uniqueness up to isomorphism. Saturation is diﬀerent from completeness but has a similar appeal.


The most natural formulation of the Saturation Axiom requires the existence of an uncountable inaccessible cardinal. The default foundation for mathematics is usually taken to be the system ZFC, Zermelo-Fraenkel set theory plus the Axiom of Choice. The Axiom of Inaccessibility in set theory, which says that there are uncountable inaccessible cardinals, cannot be proved in ZFC (if ZFC is consistent). But the justiﬁcation of the axioms of ZFC, based on the idea of a cumulative hierarchy of sets, also justiﬁes the Axiom of Inaccessibility. For this reason, it is considered acceptable to add the Axiom of Inaccessibility to ZFC when convenient. We now introduce the notion of an inaccessible cardinal and state the Axiom of Inaccessibility in set theory. We will then introduce the Saturation Axiom for hyperreal numbers, and an analogous property for nonstandard universes.

Definition 15.32. A cardinal number κ is said to be inaccessible if:
(i) For any set x of cardinality less than κ, the power set P(x) has cardinality less than κ.
(ii) For any set of sets X = {x_i: i ∈ I} such that I and each x_i has cardinality less than κ, the union ⋃_{i∈I} x_i has cardinality less than κ.

The Axiom of Inaccessibility in set theory is the axiom that there exists an uncountable inaccessible cardinal.

The first infinite cardinal ℵ_0 is inaccessible. In ZFC, the axiom of infinity gives the existence of ℵ_0. The Axiom of Inaccessibility is the simplest example of a strong axiom of infinity, and is a common addition to the axioms of ZFC.

We now state the Saturation Axiom for a hyperreal number system (∗,R,R∗).

Saturation Axiom Let S be a set of equations and inequalities involving real functions, hyperreal constants, and variables, such that S has smaller cardinality than R∗. If every ﬁnite subset of S has a hyperreal solution, then S has a hyperreal solution.
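A simple instance shows how the axiom is used. The system S below is an illustration chosen for this purpose, not an example from the text. It is countable, hence of smaller cardinality than R∗, and each finite subset has a real solution, so the Saturation Axiom yields a hyperreal solution, which is necessarily a positive infinitesimal. Axiom C already provides such an element, so this case only displays the mechanism.

```latex
\begin{gather*}
S = \{\, 0 < x \,\} \cup \{\, x < 1/n : n = 1, 2, 3, \dots \,\},\\
\{\, 0 < x,\; x < 1/n_1,\; \dots,\; x < 1/n_k \,\}\ \text{is solved by}\
x = \frac{1}{1 + \max(n_1, \dots, n_k)},\\
\text{so } S \text{ has a solution } x \in \mathbb{R}^{*} \text{ with } 0 < x < 1/n \text{ for all } n \ge 1 .
\end{gather*}
```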

To state the uniqueness theorem, we need the notion of an isomorphism between two hyperreal number systems.

Definition 15.33. Let (∗,R,R∗) and (•,R,R•) be two hyperreal number systems with the same real part R. An isomorphism between them is a mapping h: R∗ → R• such that:
(i) h(r) = r for each r ∈ R,
(ii) h is an ordered field isomorphism from R∗ onto R•,
(iii) For each real function f of n variables and x_1,…,x_n ∈ R∗, f•(h(x_1),…,h(x_n)) = h(f∗(x_1,…,x_n)).
Two hyperreal number systems with the same real part R are said to be isomorphic if there is an isomorphism between them.


Theorem 15.34. Assume the Axiom of Inaccessibility. Then for each complete ordered field R there is up to isomorphism a unique structure (∗,R,R∗) which satisfies Axioms A–E and the Saturation Axiom, such that the cardinality of R∗ is the first uncountable inaccessible cardinal.

One might ask whether there exist structures in cardinalities other than the first uncountable inaccessible cardinal which satisfy Axioms A–E and the Saturation Axiom. The set theory ZFC plus the Axiom of Inaccessibility is not strong enough to decide this question, which depends on the behavior of cardinal exponents. However, the analogue of Theorem 15.34 can be shown to hold for every uncountable inaccessible cardinal. If the generalized continuum hypothesis never holds, i.e. κ^+ < 2^κ for all infinite cardinals κ, then such structures exist only in uncountable inaccessible cardinals. So it is natural to require the size of the hyperreal structure to be the first uncountable inaccessible cardinal.

There is a similar uniqueness theorem for nonstandard universes.

Definition 15.35. A nonstandard universe ∗: V(R) → V(R∗) is said to be saturated if for every set X of internal sets such that the cardinality of X is less than the cardinality of R∗, if every finite subset of X has nonempty intersection then X has nonempty intersection. We say that ∗: V(R) → V(R∗) has the inaccessibility property if both R∗ and the set of all internal sets have cardinality equal to the first uncountable inaccessible cardinal.

Definition 15.36. Let ∗: V(R) → V(R∗) and •: V(R) → V(R•) be two nonstandard universes with the same real part R. An isomorphism between them is a mapping h: V(R∗) → V(R•) such that:
(i) h(r) = r for each r ∈ R,
(ii) h maps R∗ one to one onto R•,
(iii) For each X ∈ V(R∗)\R∗, h(X) = {h(u) : u ∈ X},
(iv) For each A ∈ V(R), h(A∗) = A•.
Two nonstandard universes with the same real part R are said to be isomorphic if there is an isomorphism between them.

Here is an easy lemma.

Lemma 15.37. If h is an isomorphism between two nonstandard universes ∗: V (R) → V (R∗) and •: V (R) → V (R•), then h(A) = A for each A ∈ V (R), h maps V (R∗) one to one onto V (R•), and h is an extension of an isomorphism between the elementary parts of ∗: V (R) → V (R∗) and •: V (R) → V (R•). We can now state a uniqueness theorem for nonstandard universes.

Theorem 15.38. Assume the Axiom of Inaccessibility. Then for each complete ordered field R there is up to isomorphism a unique nonstandard universe ∗: V(R) → V(R∗) which is saturated and has the inaccessibility property.


Theorems 15.34 and 15.38 have analogues in ordinary ZFC set theory, without the Axiom of Inaccessibility. However, these results replace the notion of a saturated structure with a more complicated notion, called a special structure. Here we have chosen to assume the Axiom of Inaccessibility in order to get a more natural uniqueness result. One can also prove a result like Theorem 15.34 but where R∗ is a proper class. This approach would require an extension of ZFC with proper classes, and a corresponding result for nonstandard universes would need a notion of superstructure over a proper class.

In Section 1G we presented the result of Kanovei and Shelah [KS 2004], which gives a definable hyperreal number system (∗,R,R∗) that satisfies Axioms A–E. This was done with an iterated ultrapower. One can also build definable structures in Theorems 15.34 and 15.38. In the set theory ZFC plus the Axiom of Inaccessibility, we say that a set X is definable by a first order formula θ(v) if one can prove that X is the unique set such that θ(X) holds. The following two theorems are implicit in the paper of Kanovei and Shelah, and are proved with an iterated ultrapower which is similar to but more elaborate than the one in Section 1G.

Theorem 15.39. Assume the Axiom of Inaccessibility. There is a definable hyperreal number system (∗,R,R∗) which satisfies Axioms A–E and the Saturation Axiom, such that the cardinality of R∗ is the first uncountable inaccessible cardinal.

Theorem 15.40. Assume the Axiom of Inaccessibility. There is a definable nonstandard universe ∗: V(R) → V(R∗) which is saturated and has the inaccessibility property.

The superstructure method given here is not the only approach to infinitesimal analysis. Other approaches extend the language of ZFC by adding new primitive symbols to the language in some way. One of these, Nelson’s Internal Set Theory, has been used extensively, and several extensions of this theory have been studied (see the article of Hrbacek in [CNR 2006] for a survey). The superstructure method has been the most common approach in the literature, because it stays close to the traditional classical foundations of mathematics, working within ZFC, or ZFC with the Axiom of Inaccessibility. Theorems 15.38 and 15.40 strengthen this point by showing that one can work with a nonstandard universe which is definable and is uniquely characterized up to isomorphism by Leibniz’ Principle, saturation, and the inaccessibility property.

REFERENCES

[AFHL 1986] S. Albeverio, J.-E. Fenstad, R. Hoegh-Krohn, and T. Lindstrom, Nonstandard Methods in Stochastic Analysis and Mathematical Physics, Academic Press (1986).

[ACH 1997] L.O. Arkeryd, N.J. Cutland, and C.W. Henson, Nonstandard Analysis, Theory and Applications, Kluwer (1997).

[Buck 1965] R.C. Buck, Advanced Calculus, McGraw-Hill (Second Edition, 1965).

[CK 1990] C.C. Chang and H.J. Keisler, Model Theory, North-Holland (1990).

[CNR 2006] N.J. Cutland, M. Di Nasso, and D.A. Ross, Nonstandard Methods and Applications in Mathematics, Lecture Notes in Logic, Association for Symbolic Logic (2006).

[Goldblatt 1991] R. Goldblatt, Lectures on the Hyperreals, Springer (1991).

[Henkin 1950] L.A. Henkin, Completeness in the Theory of Types, Journal of Symbolic Logic vol. 15 (1950), pages 159-166.

[Hewitt 1948] E. Hewitt, Rings of real-valued continuous functions, Transactions of the American Mathematical Society vol. 64 (1948), pages 45-99.

[HL 1985] A.E. Hurd and P.A. Loeb, An Introduction to Nonstandard Real Analysis, Academic Press (1985).

[KS 2004] V. Kanovei and S. Shelah, A Definable Nonstandard Model of the Reals, Journal of Symbolic Logic vol. 69 (2004), pages 159-164.

[Keisler 2000] H.J. Keisler, Elementary Calculus: An Infinitesimal Approach, Online Edition, www.math.wisc.edu/~keisler.

[Keisler 1986] H.J. Keisler, Elementary Calculus: An Infinitesimal Approach, Prindle, Weber and Schmidt (First Edition 1976, Second Edition 1986).

[Keisler 1976] H.J. Keisler, Foundations of Infinitesimal Calculus, Prindle, Weber and Schmidt (1976).

[Łoś 1955] J. Łoś, Quelques remarques, théorèmes et problèmes sur les classes définissables d’algèbres, Mathematical Interpretation of Formal Systems, North-Holland (1955).

[Robinson 1970] A. Robinson, Nonstandard Analysis, North-Holland (1970).

[Skolem 1934] T. Skolem, Über die Nicht-Charakterisierbarkeit der Zahlenreihe mittels endlich oder abzählbar unendlich vieler Aussagen mit ausschliesslich Zahlenvariablen, Fundamenta Mathematicae vol. 23 (1934), pages 150-161.

[Stroyan 1997] K.D. Stroyan, Mathematical Background: Foundations of Infinitesimal Calculus, Second Edition, Academic Press (1997).

[SL 1976] K.D. Stroyan and W.A.J. Luxemburg, Introduction to the Theory of Infinitesimals, Academic Press (1976).

