The best NSA of this century!

 NONSTANDARD ANALYSIS

By DR. J. PONSTEIN

With love, love, love to those five women, who caressed me.

J. Ponstein former professor at the University of Groningen

A Naive Way to the Infinitesimals (an unorthodox treatment of Nonstandard Analysis)

ISBN: 90-367-1672-1

Contents

Prologue
Preface

1 Generalities
1.1 Infinitesimals and other nonstandard numbers: getting acquainted
1.2 Other ∗-transforms; generating new numbers
1.3 Bound and free variables; prenex normal form
1.4 The purpose of nonstandard analysis
1.5 More about the ∗-transform; transfer
1.6 Standard, internal, and external constants
1.7 Infinitesimals in Greek geometry?
1.8 Infinitesimals in the 17th to the 19th century
1.9 Infinitesimals in the 20th century
1.10 Introducing infinitesimals by plausible reasoning; filters
1.11 Basic assumptions of formalism
1.12 Basic assumptions of constructivism
1.13 Selecting basic assumptions naively
1.14 Basic definitions
1.15 Filters
1.16 About the nature of free ultrafilters

2 Basic theory
2.1 Reviewing the introduction of ℤ, ℚ and ℝ
2.2 Introducing internal constants; definition of equality
2.3 Identification of internal constants
2.4 Standard constants; basic results for internal constants
2.5 External constants
2.6 The ∗-transform of operations and expressions
2.7 The ∗-transform of relations and statements; Łoś’ theorem; the internal definition principle
2.8 Transfer; the standard definition principle
2.9 The ∗-transform of attributes
2.10 ∗ℕ, ∗ℤ, ∗ℚ, ∗ℝ: main definitions and properties
2.11 Overflow and underflow
2.12 ∗ℕ and ∗ℤ: more properties
2.13 ∗ℚ and ∗ℝ: more properties; standard part
2.14 An alternative to introducing ∗ℤ, ∗ℚ and ∗ℝ
2.15 Getting away with generating sequences and H(si); summary

3 Some applications
3.1 Introduction and least upper bound theorem
3.2 Simplifying definitions and proofs of elementary calculus
3.3 Continuity and limits for internal functions
3.4 More nonstandard characterizations of classical notions
3.5 Inverse functions; bc
3.6 Differentiation
3.7 Integration
3.8 Pitfalls in nonstandard analysis

4 Some special topics
4.1 Principles of permanence
4.2 The saturation principle
4.3 Stirling’s formula
4.4 Nonstandard mathematics without the axiom of choice?

Appendix
References
Index

 

Prologue

Jaap Ponstein was a professor of Operations Research at the University of Groningen for more than 20 years. The beginning of his career in Groningen coincided with the beginning of the study of econometrics in the university, in which OR was positioned. His broad experience as a mathematician, both pure and applied, appeared to be instrumental in the development of OR in Groningen. He became an authority on optimization, duality, convexity and generalized differentiability. His book “Approaches to the theory of optimization” (Cambridge University Press, 1980) is a beautiful example of the precise and transparent way in which he succeeded in connecting various areas of optimization. He was the right person to act as promotor honoris causa on the occasion of the honorary degree of R.T. Rockafellar (University of Washington, Seattle, USA) on June 20, 1984.

Professor Ponstein was known as an excellent teacher. For students and coworkers he was the ideal advisor: inspiring, and always making time to discuss papers. Usually, his comments were critical, both on contents and on formulations, but always aimed at improvement and maximal clarity. Hans Nieuwenhuis, Caspar Schweigman, Gerard Sierksma and I, to name a few, learned a great deal from him. We still are impressed by his integrity, his commitment, and his scientific erudition.

In the eighties Imme van den Berg raised Jaap’s interest in so-called Nonstandard Analysis. The possibility of dealing with infinitely small and infinitely large quantities as numbers, without losing mathematical rigor, appealed to him. After his retirement (at the end of 1989) he made a profound study of this subject. The point of view of his research characterizes Jaap’s scientific attitude: how to introduce infinitesimals and other “nonstandard” numbers naively and simplemindedly, but in such a way that the resulting theory is mathematically sound, and complete within obvious limits.

This book is the result of his study. It was finished in the summer of 1995. Unfortunately, he was not able to publish it: he died on November 22, 1995.


I am pleased and proud, that the Faculty of Economics at the University of Groningen has offered the opportunity to publish the book. It underlines the respect we feel for Jaap’s merits.

Groningen, February 2001 Wim Klein Haneveld


After Jaap had presented his inaugural speech my mother said to me: “I had no idea what he was talking about, but he presented it in such a way that I couldn’t help thinking: Any minute now it will all make sense”. This was one of the special things about Jaap: his ability to inspire people with topics that appealed to him. It was the same when he explained to our small daughters, Marianne, Anne, Els and Ada, what a half and a quarter is. He cut an apple-pie first into two and then into four pieces. That same spirit of enthusiasm showed itself later on in Groningen when teaching econometrics students and when helping our neighbours’ children with their “difficult maths”.

To my mind the above illustrates the way in which Jaap began working on this book. Infinitely small numbers and their applications fascinated him. Jaap understood how to address the subject in an unconventional way, in a way that would make the subject more accessible to others. He had extensive discussions with Els about which word, and why, would be the best to describe the infinitely small numbers. As the manuscript neared completion, he discussed with Anne the possibility of publishing it.

That was the situation when Jaap became sick in September 1995. My daughters and I did not act on the manuscript for quite some time. We were unsure about what to do. Finally, I contacted Imme van den Berg who agreed to review the manuscript and bring it to the attention of the Faculty of Economics at the University of Groningen. The Faculty, represented by Wim Klein Haneveld, offered to publish the manuscript as a volume of the internal Research Reports series of the SOM Research School.

I would like to express my gratitude to Imme van den Berg for his rapid and thorough review of the manuscript, and to Wim Klein Haneveld for his careful handling of the publication process. Last but not least, I would like to thank Suwarni Bambang Oetomo, who invested an enormous amount of effort into the task of transferring the manuscript from the word-processing program CW into the more modern TeX. It would not have been possible to publish this document without the financial assistance offered by the Faculty of Economics. My sincere thanks, also on behalf of our daughters, go to all who have helped to bring about the publication of Jaap’s manuscript.

Zeist, November 2000 Sileen Ponstein-Troelstra


Preface

An infinitesimal is a ‘number’ that is smaller than each positive real number and is larger than each negative real number, so that in the real number system there is just one infinitesimal, i.e. zero. But most of the time only nonzero infinitesimals are of interest. This is related to the fact that when in the usual limit definition x is tending to c, most of the time only the values of x that are different from c are of interest. Hence the real number system has to be extended in some way or other in order to include all infinitesimals.

This book is concerned with an attempt to introduce the infinitesimals and the other ‘nonstandard’ numbers in a naive, simpleminded way. Nevertheless, the resulting theory is hoped to be mathematically sound, and to be complete within obvious limits. Very likely, however, even if ‘nonstandard analysis’ is presented naively, we cannot do without the axiom of choice (there is a restricted version of nonstandard analysis, less elegant and less powerful, that does not need it). This is a pity, because this axiom is not obvious to every mathematician, and is even rejected by constructivistic mathematicians, which is not unreasonable as it does not tell us how the relevant choice could be made (except in simple cases, but then the axiom is not needed).

The remaining basic assumptions that will be made would seem to be acceptable to many mathematicians, although they will be taken partly from formalistic mathematics – i.e. the usual logical principles, in particular the principle of the excluded third – as well as from constructivistic mathematics – i.e. that at the start of all of mathematics the natural numbers (in the classical sense of the term) are given to us. Not only the natural number, but also the set and the pair will be taken as primitive notions. The net effect of this is a version of mathematics that, except for truly nonstandard results, would seem to produce the same theorems as produced by classical mathematics.

One of the consequences of combining ideas from the two main schools of mathematical thinking is that the usual axioms of set theory, notably those due to Zermelo and Fraenkel, will be ignored. First of all, there will be elements that are not sets, the natural numbers to begin with, only then sets will be formed from them in stages (or day by day), whereas when starting from the Zermelo-Fraenkel axioms each mathematical entity, in particular each natural number, is some set. From a formal point of view the latter has the advantage that there is just one primitive notion, but from a naive point of view it is not so obvious why numbers should be sets (in formalistic mathematics after the natural numbers come to life in the form of sets, this fact is concealed as soon as possible). Moreover, aren’t we presupposing at least the order of the natural numbers already when writing down axioms by means of suitable symbols?

To a certain extent nonstandard analysis is superfluous! For if a theorem of classical mathematics has a nonstandard proof, it also has a classical proof (this follows from what in nonstandard analysis is known as the ‘transfer’ theorem). Often the nonstandard proof is intuitively more attractive, simpler and shorter, which is one of the reasons to be interested in nonstandard analysis at all. Another reason is that totally new mathematical models for all kinds of problems can be (and in the meantime have been) formulated when infinitesimals or other nonstandard numbers occur in such models. A trivial example is a problem involving a heap of sand containing very many grains of sand, but where the number of grains of sand must not be infinite. Then taking the inverse of some positive infinitesimal and rounding the result up or down produces a so-called infinitely large ‘natural number’ that is larger than each ordinary natural number, but is smaller than infinity. It can be manipulated in much the same way as the ordinary numbers, which cannot, of course, be said of infinity. As a consequence the mathematics of infinitely large sets is essentially simpler than that of infinite sets. A peculiarity, however, is that the ‘selected’ infinitesimal and hence the infinitely large natural number are not specified the way the number of elements of a set of, say, 25 elements is specified. On the other hand, if ω is that infinitely large natural number, it makes sense to consider another heap of sand with ω² grains of sand, that can be thought of as the result of combining ω heaps of sand each containing ω grains of sand. But in what follows the analysis of practical models containing nonstandard numbers will not be stressed.

Chapter 1

Generalities

1.1 Infinitesimals and other nonstandard numbers: getting acquainted

An infinitesimal is a number that is smaller than every positive real number and is larger than every negative real number, or, equivalently, in absolute value it is smaller than 1/m for all m ∈ ℕ = {1, 2, 3, ...}. Zero is the only real number that at the same time is an infinitesimal, so that the nonzero infinitesimals do not occur in classical mathematics. Yet, they can be treated in much the same way as can the classical numbers. For example, each nonzero infinitesimal ε can be inverted and the result is the number ω = 1/ε. It follows that |ω| > m for all m ∈ ℕ, for which reason ω is called (positive or negative) hyperlarge (or infinitely large). Hyperlarge numbers too do not occur in classical mathematics, but nevertheless can be treated like classical numbers. If, for example, ω is positive hyperlarge, we can compute √ω, ω/2, ω − 1, ω + 1, 2ω, ω², etc., and we have (ω − 1) + (ω + 1) = 2ω, (ω − 1)·(ω + 1) = ω² − 1, etc. Also, for all m ∈ ℕ,

m < √ω < ω/2 < ω − 1 < ω < ω + 1 < 2ω < ω²,

giving seven different hyperlarge numbers. The positive hyperlarge numbers must not be confused with infinity (∞), which should not be regarded as a number at all, and which anyway does not satisfy these inequalities, except the first one.
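The inequalities above can be checked one by one. The following is only a sketch, assuming that the usual rules for an ordered field (and for square roots of positive numbers) carry over to the new numbers, which the text asserts here and justifies later.

```latex
% Requires amsmath and amssymb.
% Assumption: ordered-field rules of the reals also hold for the new numbers.
% Let \varepsilon \neq 0 be hypersmall and \omega = 1/\varepsilon.
\begin{align*}
|\varepsilon| < \tfrac{1}{m}\ \text{for all } m \in \mathbb{N}
  &\;\Longrightarrow\; |\omega| = \frac{1}{|\varepsilon|} > m\ \text{for all } m \in \mathbb{N}. \\
\intertext{Now let $\omega$ be positive hyperlarge, so that $\omega > 4$ and $\omega > m^2$ for every $m \in \mathbb{N}$:}
m < \sqrt{\omega} &\iff m^2 < \omega, \\
\sqrt{\omega} < \tfrac{\omega}{2} &\iff 4\omega < \omega^2 \iff \omega > 4, \\
\tfrac{\omega}{2} < \omega - 1 &\iff \omega > 2, \\
\omega + 1 < 2\omega &\iff \omega > 1, \\
2\omega < \omega^2 &\iff \omega > 2.
\end{align*}
```

Each right-hand condition holds because a positive hyperlarge number exceeds every natural number.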

Regrettably, there does not seem to exist a synonym for ‘hyperlarge number’ that would make a nice pair with ‘infinitesimal’, so let us introduce the synonym ‘hypersmall number’ for the latter.

If ε is hypersmall, if δ too is hypersmall but nonzero, and if ω is positive hyperlarge, so that −ω is negative hyperlarge, we write ε ≃ 0, δ ∼ 0, ω ∼ ∞, −ω ∼ −∞, respectively.


It would be wrong, of course, to deduce from ω ∼ ∞ that the difference between ω and ∞, or that between −ω and −∞, would be hypersmall. Given any x ∈ ℝ, x ≠ 0, and any δ ≃ 0, let t = x + δ, then ε < |t| < ω, for all ε ∼ 0 and all ω ∼ ∞. The number t is called appreciable (as it is not too small and not too large).

Three nonoverlapping sets of numbers (old or new) can now be presented:

a) the set of all infinitesimals, to which zero belongs, b) the set of all appreciable numbers, to which all nonzero reals belong, and c) the set of all hyperlarge numbers, containing no classical numbers at all.

Together these three sets constitute the set of all numbers of ‘real nonstandard analysis’. This set, which clearly is an extension of ℝ, is indicated by ∗ℝ and is called the ∗-transform of ℝ. The elements of ∗ℝ are called hyperreal. The use of the prefix ‘hyper’ here is not entirely defensible, as, say, 5, which obviously is an element of ∗ℝ, is just an ordinary real.

Abbreviating hypersmall, appreciable, and hyperlarge to s, a and l, respectively, and assuming that x and y are positive numbers, for addition and multiplication the following holds,

addition
y\x   s   a   l
s     s   a   l
a     a   a   l
l     l   l   l

multiplication
y\x   s   a   l
s     s   s   ?
a     s   a   l
l     ?   l   l

where the question marks stand for s or a or l. Examples for the lower left question mark are x ∼ 0 and y = 1/√x, or 1/x, or 1/x². For x − y the results are the same as for x + y (if still x, y > 0), except that if both x and y are appreciable, then x − y is either hypersmall or appreciable, and that if both x and y are hyperlarge, then x − y is either hyperlarge (positive or negative), or appreciable, or hypersmall, as is shown by the following examples: y = x/2, or 2x, or x − 1, or x + ε, with ε ≃ 0. If a number is not hyperlarge it is called finite or limited.
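The undetermined entries can be worked out explicitly from the examples just given. This is only a sketch, assuming that x is positive hypersmall in the multiplication case, that x is positive hyperlarge in the subtraction case, and that the ordinary rules of arithmetic apply.

```latex
% Requires amsmath.
% Multiplication, lower left entry: x positive hypersmall, y positive hyperlarge.
\begin{align*}
y = \frac{1}{\sqrt{x}} &: \quad x\,y = \sqrt{x} && \text{(hypersmall)}, \\
y = \frac{1}{x}        &: \quad x\,y = 1        && \text{(appreciable)}, \\
y = \frac{1}{x^{2}}    &: \quad x\,y = \frac{1}{x} && \text{(hyperlarge)}.
\end{align*}
% Subtraction: x positive hyperlarge, \varepsilon hypersmall, y > 0 hyperlarge.
\begin{align*}
y = \tfrac{x}{2}      &: \quad x - y = \tfrac{x}{2} && \text{(positive hyperlarge)}, \\
y = 2x                &: \quad x - y = -x           && \text{(negative hyperlarge)}, \\
y = x - 1             &: \quad x - y = 1            && \text{(appreciable)}, \\
y = x + \varepsilon   &: \quad x - y = -\varepsilon && \text{(hypersmall)}.
\end{align*}
```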


Remark: Elsewhere in the literature, any element of ∗ℝ is called finite.

Clearly, t is finite if and only if t = x + ε for some x ∈ ℝ and some ε ≃ 0. Given such a t, both x and ε are unique, for x + ε = y + δ, with x, y ∈ ℝ and ε, δ ≃ 0, implies that x − y = δ − ε ≃ 0, so that (as x − y ∈ ℝ) x − y = 0, hence x = y and ε = δ. By definition x is called the standard part of t, and this is written as,

x = st(t).

The standard part function st provides an important (mainly one-way) bridge between the finite numbers of nonstandard analysis and the classical numbers. Trivially, if t is itself a classical number, then st(t) = t.
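A small worked case may help. It is only a sketch, assuming x ∈ ℝ, that ε and η are hypersmall (η is a name introduced here for illustration), and that sums and real multiples of hypersmall numbers are again hypersmall, in line with the tables of this section.

```latex
% Requires amsmath and amssymb.
% Assumptions: x real, \varepsilon and \eta hypersmall; "\simeq 0" means "is hypersmall".
\begin{align*}
t &= (x+\varepsilon)^2 = x^2 + \underbrace{(2x\varepsilon + \varepsilon^2)}_{\simeq\, 0},
  & \operatorname{st}(t) &= x^2, \\
\operatorname{st}(x+\varepsilon) &= \operatorname{st}(x+\eta) = x
  && \text{even if } \varepsilon \neq \eta .
\end{align*}
```

The second line shows why the bridge is mainly one-way: st discards the hypersmall part, so t cannot be recovered from st(t).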

1.2 Other ∗-transforms; generating new numbers

The ∗-transform not only can be obtained for ℝ but also for ℕ, ℤ, ℚ, and in fact any set X of classical mathematics (and for much more, see Section 1.5). Their ∗-transforms are indicated by ∗ℕ, ∗ℤ, ∗ℚ, and ∗X, respectively. Throwing all nonfinite numbers out of ∗ℕ and ∗ℤ we obtain again ℕ and ℤ, but something similar is not true for ∗ℚ (for ∗ℝ we know this already), simply because ∗ℚ (just as ∗ℝ) contains finite non-classical numbers. Yet there is a striking difference between ∗ℚ and ∗ℝ in this respect: the ‘standard part theorem’ discussed at the end of the preceding section does not hold for ∗ℚ, that is to say, there are finite elements t of ∗ℚ that cannot be written as t = x + ε, with x ∈ ℚ, ε ∈ ∗ℚ, ε ≃ 0. For let c be any irrational number, say c = √2, and let (r₁, r₂, ...) be some Cauchy sequence of rationals converging to c. Later on it will become clear that then the sequence (r₁ − c, r₂ − c, ...) ‘generates’ an infinitesimal δ in ∗ℝ (because this sequence converges to zero). On the other hand (r₁, r₂, ...) generates an element r ∈ ∗ℚ ⊂ ∗ℝ, and r is finite (because the rᵢ are rational, and this sequence converges), but it has no standard part in ℚ, for otherwise r = x + ε for some x ∈ ℚ and some ε ∈ ∗ℚ, ε ≃ 0. But (r₁ − c, r₂ − c, ...) also generates the finite number r − c ∈ ∗ℝ, so that r − c = δ ≃ 0. It follows that x − c = δ − ε ≃ 0, hence x − c = 0 (as x − c is an ordinary real), which would mean that c ∈ ℚ, a contradiction. On the other hand, in ∗ℝ we have that st(r) = c. (Carrying this argument further it turns out that there exists a one-to-one mapping between ℝ and the set of all finite elements of ∗ℚ modulo the set of all rational infinitesimals, preserving addition and multiplication; i.e. the mapping is an isomorphism. In other words, ℝ (not ∗ℝ) can in a sense be produced by ∗ℚ.)


There are various ways to introduce the new numbers. Below this will be done by means of infinite sequences of classical numbers. In particular, the elements of ∗ℝ will be generated by means of infinite sequences of reals, and it will be necessary to consider all such sequences. (Recall that the elements of ℝ can be generated by means of rather special infinite sequences of rationals, i.e. the Cauchy sequences.) More generally, given any classical set X the elements of its ∗-transform ∗X will be generated by means of infinite sequences of elements of X, and again all such sequences must be taken into account. Each such sequence ‘generates’ an element of ∗X, and in case X is a set of numbers (or n-tuples of numbers) special sequences generate the elements of X itself. For example, (1, 2, 3, ...) generates a hyperlarge element of ∗ℕ, and (3/2, 5/4, 9/8, ...) generates a finite element of ∗ℚ, that is equal to the sum of 1, generated by (1, 1, 1, ...), and an infinitesimal, generated by (1/2, 1/4, 1/8, ...). Different sequences may generate the same element of ∗X. In fact, given any x ∈ ∗X there are many (uncountably many) different sequences that generate x (if X contains at least two elements). For example, changing finitely many terms of a generating sequence has no effect on the element generated. But there are many more variations on this theme. Wouldn’t it be possible to restrict ourselves to a suitable subset of all sequences? Unless we are satisfied with some sort of mutilated nonstandard analysis, most likely the answer is ‘no’. See Section 4.4.
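The example with (3/2, 5/4, 9/8, ...) can be read term by term. The sketch below assumes that the sum of two generated elements is generated by the termwise sum of their sequences, and that a property holding for all but finitely many terms carries over to the generated element; both are assumptions at this point, since the formal definitions only come later in the book.

```latex
% Requires amsmath and amssymb.
% Assumption: termwise operations on generating sequences (formalized later in the book).
\begin{align*}
\Bigl(\tfrac{3}{2}, \tfrac{5}{4}, \tfrac{9}{8}, \dots\Bigr)
  &= \bigl(1 + 2^{-n}\bigr)_{n \ge 1}
   = (1, 1, 1, \dots) + \bigl(\tfrac12, \tfrac14, \tfrac18, \dots\bigr), \\
\text{for each } m \in \mathbb{N}: \quad
  0 < 2^{-n} &< \tfrac{1}{m} \quad \text{for all } n \ge m, \\
\text{for each } m \in \mathbb{N}: \quad
  n &> m \quad \text{for all } n > m .
\end{align*}
```

The second line is why (1/2, 1/4, 1/8, ...) is expected to generate an infinitesimal, and the third why (1, 2, 3, ...) is expected to generate a hyperlarge number: in both cases only finitely many terms violate the inequality for a given m.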

Anyway, the nuisance of having to use generating sequences is only temporary. Once the new numbers have been introduced (as well as new functions, etc.) in most cases it is not necessary at all to know that they came about by means of infinite sequences. The situation is entirely analogous to that of introducing the real numbers: most of real analysis can be developed without the interference of Cauchy sequences. Most of the time an irrational such as √2 is treated as just a number, not as a sequence. Although ∗ℕ, ∗ℤ, ∗ℚ and ∗ℝ are extensions of ℕ, ℤ, ℚ and ℝ, respectively, ∗X is not always an extension of X. If, for example, X = {ℕ}, then ∗X = {∗ℕ}, and since ℕ ≠ ∗ℕ, X is not contained in ∗X.

1.3 Bound and free variables; prenex normal form

At this point we must interrupt the main subject to say a few words about the formulation of mathematical statements and the difference between bound and free variables occurring in them. It will be convenient to use the logical connectives ¬, ∧, ∨, ⇒, ⇔ (for not, and, or, implies, is equivalent to, respectively), the universal quantifier ∀ and the existential quantifier ∃ (‘∀x : ...’ means: ‘for all x it is true that ...’, and ‘∃x : ...’ means ‘there exists an x such that ...’). Apart from operations, logical connectives and quantifiers, each mathematical statement contains a number of variables and constants, which may be (expressions of) numbers, or functions, or n-tuples of numbers or functions, or sets of numbers or functions, etc.

For example, in

∃x : [x ∈ ℝ ∧ x² + y = 0],

the constants are ℝ, 2, and 0 and the variables are x and y (unless y is fixed; then it may be regarded as an unspecified constant). Clearly, this statement changes into another meaningful statement if y is replaced by another variable or by some specified constant, and the same is true if one or more of the constants are replaced by other constants or by variables, at least within reasonable limits:

∃x : [x ∈ ℝ ∧ x² + z = 0], or, ∃x : [x ∈ ℚ ∧ x² + y = 0].

Obviously, the new statement may not be equivalent to the given one, but that is not important in the present discussion (in fact the last statement is different from the given one, and the last but one is in case z is a variable that is different from the variable y). On the other hand, (only) replacing x by a constant leads to nonsense. Moreover, replacing the symbol ‘x’ by, say, ‘z’ leads to an equivalent statement:

∃z : [z ∈ ℝ ∧ z² + y = 0],

for it does not matter what symbol is used to indicate the relevant variable (again within reasonable limits). For these reasons, with respect to the given statement x is called a bound variable, and y is called a free variable. More generally, given any statement, if replacing any variable occurring in it by some constant leads to another meaningful statement that variable is called a free variable with respect to it, and any other variable occurring in it is called a bound variable (or a dummy variable) with respect to it. Any constant might be called free as well. It follows that the truth value of a statement depends on its free variables and its constants, but not on its bound variables. Furthermore, it follows that the x in ‘∀x’ or in ‘∃x’ is a bound variable, but the converse does not seem to be true. For example, in a limit definition the clause ‘for x tending to c’ may occur. Then x is a bound variable. And x may also be a bound variable in the definition of an integral, namely if ‘dx’ occurs in the well-known way. If limits and integrals are replaced by their definitions, however, x will be ‘bound’ to a quantifier.

How about x in x ∈ ℝ ⇒ x² ≥ 0? Many people will interpret this as ∀x : [x ∈ ℝ ⇒ x² ≥ 0], and then x is a bound variable, yet x is a free variable simply because, say, 5 ∈ ℝ ⇒ 5² ≥ 0 is a meaningful (though uninteresting) statement. Note that many theorems take the form: ‘if x satisfies ..., then ...’, which usually is meant to be read as: ‘for all x such that ..., it is true that ...’.

Some confusion may arise because of the possibility to indicate different bound variables by the same symbol (with free variables this is not possible), as in,

[∀x : [x ∈ ℝ ⇒ x² ≥ 0]] ∧ [∃x : [x ∈ ℝ ∧ x² + 1 = 0]],

which is equivalent to,

[∀x : [x ∈ ℝ ⇒ x² ≥ 0]] ∧ [∃y : [y ∈ ℝ ∧ y² + 1 = 0]].

Let us agree to avoid this, and in this example reject the first formulation.

Fortunately, any statement can be rewritten in its so-called prenex normal form. This is a rearrangement of the statement in which all quantifiers precede the logical connectives. Details regarding the existence of the prenex normal form, and the extent to which it is unique, can be found in books on formal logic. For example, the following statement is in prenex normal form,

∀x : ∃y : ∀z : P(c,d,...,s,t,...,x,y,z),

where P(c,d,...,s,t,...,x,y,z) is a statement that contains no quantifiers at all, and apart from its free (!) variables x, y, and z, contains the constants c, d, ... and other free variables s, t, ..., as well as logical connectives. It is clear that with three quantifiers there are eight possible normal forms if everything except the quantifiers is ignored: ∀∀∀, ∀∀∃, ∀∃∀, ∃∀∀, ∀∃∃, ∃∀∃, ∃∃∀, ∃∃∃. It is better, however, to consider the following variations, where for clarity in P there is only one constant c and only one additional free variable s,

∀x ∈ X : ∃y ∈ Y : ∀z ∈ Z : P(c,s,x,y,z),

where X, Y and Z are properly selected sets, or,

∀x ∈ X, A(x) : ∃y ∈ Y, B(x,y) : ∀z ∈ Z, C(x,y,z) : P(c,s,x,y,z),

where A(x), B(x,y), and C(x,y,z) are relatively simple conditions regarding the bound variables involved (such as x > 0), conditions that do not contain quantifiers. This is to be read as:

for all x in X satisfying A(x) it is true that there exists a y in Y satisfying B(x,y) such that for all z in Z satisfying C(x,y,z) it is true that P(c,s,x,y,z).

The reason why set inclusions should be specified explicitly is to avoid certain errors in nonstandard analysis, errors that cannot occur in classical mathematics. Even if x is a subset of a larger set Y, it is better to replace ‘x ⊂ Y’ by ‘x ∈ 𝒫(Y)’, where 𝒫(Y) is the power set of Y, i.e. the set of all subsets of Y. This will become clear when discussing the ∗-transform in Section 1.5, and the difference between internal and external constants in Section 1.6.
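The guarded quantifier notation of this section can be unpacked as follows. This is a sketch of the usual reading, not necessarily the exact convention adopted later in the book; Φ stands for whatever follows the colon. The last line shows one prenex form of the two-bracket example given earlier, which is valid because ℝ is nonempty and the two bound variables carry different names.

```latex
% Requires amsmath and amssymb.
% Assumption: the colon-and-comma notation abbreviates the usual guarded quantifiers.
\begin{align*}
\forall x \in X,\ A(x) : \Phi
  &\quad\text{abbreviates}\quad \forall x : \bigl[(x \in X \wedge A(x)) \Rightarrow \Phi\bigr], \\
\exists y \in Y,\ B(x,y) : \Phi
  &\quad\text{abbreviates}\quad \exists y : \bigl[\,y \in Y \wedge B(x,y) \wedge \Phi\,\bigr].
\end{align*}
% One prenex form of the conjunction discussed earlier in this section.
\begin{equation*}
\bigl[\forall x \in \mathbb{R} : x^2 \ge 0\bigr] \wedge \bigl[\exists y \in \mathbb{R} : y^2 + 1 = 0\bigr]
  \iff
\forall x \in \mathbb{R} : \exists y \in \mathbb{R} : \bigl[x^2 \ge 0 \wedge y^2 + 1 = 0\bigr].
\end{equation*}
```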

1.4 The purpose of nonstandard analysis

After the digression of the preceding section let us now contemplate the purpose of nonstandard analysis. Starting from ℕ, the sets ℤ, ℚ and ℝ (and ℂ, but below complex numbers will be ignored) have been introduced in classical mathematics in order to enrich mathematics with more tools and to refine existing tools. The introduction of negative numbers, of fractions, and of irrational numbers is felt as a strong necessity, and without it mathematics would only be a small portion of what it actually is. The introduction of ∗ℕ, ∗ℤ, ∗ℚ, and ∗ℝ, however, was not meant at all to enrich mathematics (at least not when it all started), but only to simplify doing mathematics. For as soon as notions like limit and continuity are involved, definitions in nonstandard analysis can be given a simpler form, and theorems can be proved in a simpler way. Often the simplifications are considerable. In one case the proof of a classical conjecture was found by means of nonstandard analysis, after which a classical proof was found as well. Moreover, both definitions and proofs receive a more natural appearance. This may even enhance the discovery of new facts.

In the meantime nonstandard analysis has also been applied in a more traditional way, namely to introduce new mathematical notions and models. Examples can be found in probability theory, asymptotic analysis, mathematical physics, economics, etc. In what follows the attention will primarily be focused, however, on simplifying mathematics, rather than on enriching it with new concepts.


As an example of a simpler definition, consider continuity. A function f from ℝ to ℝ is continuous at c ∈ ℝ if statement (1.1) holds,

∀ε ∈ ℝ, ε > 0 : ∃δ ∈ ℝ, δ > 0 : ∀x ∈ ℝ, |x − c| < δ : |f(x) − f(c)| < ε.

Now to f there corresponds a unique function ∗f, called the ∗-transform of f, that is a function from ∗ℝ to ∗ℝ, such that ∗f(x) = f(x) if x ∈ ℝ, and (1.1) is true if and only if its ∗-transform is true,

∀ε ∈ ∗ℝ, ε > 0 : ∃δ ∈ ∗ℝ, δ > 0 : ∀x ∈ ∗ℝ, |x − c| < δ : |∗f(x) − ∗f(c)| < ε.

(More about the ∗-transform in the next section.) Moreover, this ∗-transform is equivalent to the much simpler statement,

∀δ ∈ ∗ℝ, δ ≃ 0 : ∗f(c + δ) − ∗f(c) ≃ 0.

Warning: The equivalence between the ∗-transform of (1.1) and this simpler statement does in general not hold if c is replaced by a nonstandard number, or if f is replaced by a nonstandard function.

The essence of the simpler statement is,

δ ≃ 0 ⇒ ∗f(c + δ) − ∗f(c) ≃ 0,

which is precisely what we want the definition of continuity to be: if x − c = δ is infinitely close to zero, then f(x) − f(c) too should be infinitely close to zero. The only problem in classical mathematics is that ‘infinitely close to’ is not (and most likely will never be) a well defined notion. In nonstandard analysis, however, all that need be done is to replace ‘infinitely close to’ by ‘≃’. Note that in all four formulations δ plays the same role (i.e. ‘distance’ from c), but that in (1.1) and in its ∗-transform it is bound to ∃, whereas in the simpler statement it is bound to ∀. Also note that (1.1) and its ∗-transform each contain three quantifiers, but that the simpler statement contains only one (that its essence, as displayed above, contains no quantifiers at all is because it really is not complete, as ‘∀δ’ is missing).
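As a concrete check, take the squaring function. The sketch below assumes f(x) = x², a real point c, a hypersmall δ, and a positive hyperlarge ω; the asterisk on f is dropped. The second computation illustrates the warning given earlier in this section about nonstandard points.

```latex
% Requires amsmath and amssymb.
% Assumptions: f(x) = x^2, c real, \delta hypersmall, \omega positive hyperlarge.
\begin{align*}
f(c+\delta) - f(c) &= 2c\,\delta + \delta^2 \simeq 0
  && \text{for every real } c. \\
\intertext{At the hyperlarge point $\omega$, with the hypersmall increment $1/\omega$:}
f\bigl(\omega + \tfrac{1}{\omega}\bigr) - f(\omega) &= 2 + \frac{1}{\omega^2} \not\simeq 0 .
\end{align*}
```

So the infinitesimal condition holds at every real c but fails at ω, even though, by the transfer principle discussed in the next section, the ε-δ formulation for the squaring function remains valid at every point of ∗ℝ.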

An illustration of a simpler proof is that of the intermediate value theorem: if f : ℝ → ℝ is continuous in the closed interval [a,b], a < b, a and b both finite, and f(a) < 0, f(b) > 0, then f(c) = 0 for some c ∈ [a,b]. A nonstandard proof of this theorem proceeds as follows. Let m ∈ ∗ℕ be hyperlarge. Divide [a,b] into m equal subintervals, each of length δ = (b − a)/m. Then δ ∼ 0. Let n be the smallest element of ∗ℕ such that ∗f(a + nδ) > 0, then ∗f(a + (n − 1)δ) ≤ 0. Let c = st(a + nδ), then, by continuity, ∗f(a + nδ) − ∗f(c) = ε₁ and ∗f(c) − ∗f(a + (n − 1)δ) = ε₂, for certain infinitesimals ε₁ and ε₂. Hence −ε₁ < f(c) = ∗f(c) ≤ ε₂. But f(c) ∈ ℝ, so f(c) = 0.
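The last step of the proof can be spelled out as follows. This is only a sketch; the abbreviations x_n = a + nδ and x_{n−1} = a + (n − 1)δ are introduced here for convenience and are not the book's notation. Both points have standard part c, because they differ by δ ≃ 0.

```latex
% Requires amsmath and amssymb.
% Assumptions: notation of the proof above; x_n = a + n\delta, x_{n-1} = a + (n-1)\delta.
\begin{align*}
{}^{*}\!f(c) &= {}^{*}\!f(x_n) - \varepsilon_1 > -\varepsilon_1
  && \text{since } {}^{*}\!f(x_n) > 0, \\
{}^{*}\!f(c) &= {}^{*}\!f(x_{n-1}) + \varepsilon_2 \le \varepsilon_2
  && \text{since } {}^{*}\!f(x_{n-1}) \le 0, \\
-\varepsilon_1 < f(c) \le \varepsilon_2,\ f(c) \in \mathbb{R}
  &\;\Longrightarrow\; f(c) \simeq 0 \;\Longrightarrow\; f(c) = 0 ,
\end{align*}
```

where the final implication uses that zero is the only real infinitesimal.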

How come, dividing [a,b] into m subintervals if m is not finite, and assuming that n, which also is not finite, exists? Yes, this is all right, because hyperlarge numbers behave like classical numbers.

The classical proof of the theorem is more involved, because it is based on the fact that a nonempty subset of ℝ that is bounded above has a least upper bound (or supremum).

Exercise: Show this fact by means of nonstandard analysis.

1.5 More about the ∗-transform; transfer So far a number of isolated instances of ∗-transforms have been presented (the ∗-transform of IR and other sets, of functions from IR to IR and of statement (1.1) in the preceding section). Although it is too early to present a complete treatment of the ∗-transform, a number of interesting aspects of this notion may be discussed already now. To each number, each set, each function, each operation (such as + and ∪), each simple relation (such as < and ∈), each logical connective (¬, ∨, ∧, ⇒, ⇔), both quantifiers (∀, ∃), each definition, and each statement of classical mathematics, there corresponds a unique ∗-transform in nonstandard mathematics. The notation is quite simple: just add an asterisk to the upper left of the symbol representing what is to be transformed. Sometimes the∗-transform is identical to its inverse image, but often this is not so. In the former case the asterisk should, of course, be dropped, but even in the latter case this can sometimes be done without creating confusion. Below a number of typical examples is presented, but full details will only be given later on. a) Numbers. If x ∈ IR, then ∗x = x. b) Sets. If X is a finite set of numbers, then ∗X = X, and also (happily so) ∗∅ = ∅, but if X is an infinite set of numbers, then X is strictly included in ∗X (in case X is an arbitrary abstract set and ∗X 6= X, X need not be a subset of ∗X.) c) Pairs. If hx,yi is a pair, then ∗hx,yi = hx∗,y∗i, and similarly for n-tuples hx1,...,xni.


d) Functions. If f : X → Y, then ∗f : ∗X → ∗Y, and ∗f(x) = f(x) if x ∈ X. Often the asterisk in ∗f may be dropped.

e) Operations. As an example consider addition in IR. Its ∗-transform is ∗-addition in ∗IR, and x ∗+ y = x + y if x,y ∈ IR. The asterisk can safely be dropped.

f) Atomic relations. These are relations in which neither logical connectives nor quantifiers play a part, but only such relations as < or ∈, etc. Consider first < in IR, leading to ∗< in ∗IR. Similarly as under e) we have that x ∗< y is equivalent to x < y if x,y ∈ IR, and again the asterisk can safely be dropped. Next consider set inclusion. Let X be a subset of IR, then ∈ X transforms to ∗∈ ∗X. But ordinary set inclusion too is, of course, applicable to ∗X, so that there would be two set inclusions for the ∗-transform ∗X of X. Fortunately, the two are identical, so that dropping the asterisk is a must.

g) The logical connectives, and both quantifiers. For all of them the ∗-transform is identical to the inverse image, so that asterisks should be dropped.

h) Definitions. For example, continuity transforms to ∗-continuity, and ∗f, introduced in Section 1.4, is ∗-continuous at c if (1.1) is true.

i) Statements. To some extent this covers, of course, case h). To find the ∗-transform of a statement (1.1), it should be formulated in such a way that each bound variable x occurs in some set inclusion of the form x ∈ X, not x ⊂ Y. Then the ∗-transform is obtained by replacing each constant and each free variable by its ∗-transform. As an example consider (1.1) and (1.1) defined in Section 1.4. Note that if in (1.1) ‘∈ IR’ had been left out, something different would have been obtained. This is one of the reasons why at the end of Section 1.3 it was suggested to explicitly include each bound variable in some set inclusion. Why the inclusion x ⊂ Y should be avoided will become clear in the next section.

One of the basic principles of nonstandard analysis is that any given classical statement (1.1) is true if and only if its ∗-transform is true, which results from (1.1) by replacing all its constants and free variables by their ∗-transforms. Note that the bound variables are not replaced. The principle is applied both ways, from IR to ∗IR, or from ∗IR to IR. In either case one says that the deduction is done by ‘transfer’. Assuming that everything is in prenex normal form, two simple nontrivial cases are,

∀x ∈ X : P(x,s) and ∃x ∈ X : P(x,s),

where X is some set and P(x,s) is some atomic substatement with x a free variable and s a constant or a free variable. The ∗-transforms are,

∀x ∈ ∗X : P(x,∗s) and ∃x ∈ ∗X : P(x,∗s),

respectively. Clearly, for each of the two classical statements transfer is trivial in one direction, assuming that X is a subset of ∗X, but not necessarily in the opposite direction. The following two implications are the nontrivial ones,

[∀x ∈ X : P(x,s)] ⇒ [∀x ∈ ∗X : P(x,∗s)],
[∃x ∈ ∗X : P(x,∗s)] ⇒ [∃x ∈ X : P(x,s)].

Note that the first implication starts from a classical statement and leads to its ∗-transform, whereas the second one starts from a ∗-transform and leads to the corresponding classical statement.

In the great majority of practical situations, applying transfer is fairly obvious. In what follows transfer is applied in a slightly complicated situation, where it is required to show the equivalence of statements (1.1) and (1.1), as well as that (1.1) can be simplified to Q, with (1.1), (1.1) and (1.1) as in Section 1.4. Trivially, by transfer, (1.1) and (1.1) are equivalent, so it remains to show the equivalence of (1.1) and (1.1).

a) Let (1.1) be true, and let ε ∈ IR, ε > 0, and δ ∈ ∗IR, δ ≃ 0 be arbitrary. Then for some δ′ ∈ IR, δ′ > 0,

∀x ∈ IR, |x−c| < δ′ : |f(x)−f(c)| < ε,

hence, by transfer, and because ∗c = c, ∗ε = ε, ∗δ′ = δ′, ∗f(c) = f(c),

∀x ∈ ∗IR, |x−c| < δ′ : |∗f(x)−∗f(c)| < ε.

Let x = c+δ, then, because by definition of infinitesimal |δ| < δ′, |∗f(c+δ)−∗f(c)| < ε. But since ε is arbitrary, this means that ∗f(c+δ)−∗f(c) ≃ 0, and since δ is arbitrary that (1.1) is true.

b) Conversely, let (1.1) be true, and let ε ∈ IR, ε > 0, and δ ∈ ∗IR, δ ∼ 0, δ > 0, be arbitrary. For each x ∈ ∗IR, |x−c| < δ it follows that x−c = δ′ for some δ′ ≃ 0, hence, by (1.1), ∗f(x)−∗f(c) ≃ 0, and, by definition of infinitesimal, |∗f(x)−∗f(c)| < ε. Apparently,

∃δ″ ∈ ∗IR, δ″ > 0 : ∀x ∈ ∗IR, |x−c| < δ″ : |∗f(x)−∗f(c)| < ε

(take, for example, δ″ = δ), hence, by transfer (in the opposite direction as under a)),

∃δ″ ∈ IR, δ″ > 0 : ∀x ∈ IR, |x−c| < δ″ : |f(x)−f(c)| < ε.

Since ε is arbitrary this proves (1.1).


1.6 Standard, internal, and external constants

Each ∗-transform is called standard, because it corresponds in a 1−1 way to some classical constant, and in a number of cases is even identical to that constant. In particular any set ∗X is standard. For example, ∗IN, ∗IR, and ∗P(IR), with P(IR) the power set of IR, are all standard. Note that the term ‘standard’ must not be used within classical mathematics. The reason for this is that, for example, a function f from IR to IR, regarded as a function of nonstandard analysis, may not be standard, and usually it isn’t. On the other hand, ∗f is standard, but this is a function from ∗IR to ∗IR.

Each element (not subset) of a standard set turns out to be a special kind of constant of nonstandard analysis, namely a so-called internal constant, so that among others, infinitesimals and hyperlarge numbers are internal, because they are elements of ∗IR. Also the classical reals are internal, as IR ⊂∗IR. Since ∗x = x if x ∈ IR, the reals are also standard (as ingredients of nonstandard analysis). More generally, each standard constant happens to be internal, but the converse is not true, as is exemplified by infinitesimals and hyperlarge numbers.

Not every constant of nonstandard analysis is internal. For example, neither IR, nor the set of all infinitesimals, nor the set of all hyperlarge numbers is internal. Any constant that is not internal is called external.

Whereas internal constants behave like classical constants, external constants do not. They have extraordinary properties. For example, although IR is a subset of ∗IR that is bounded above in ∗IR by any positive hyperlarge number, IR has no least upper bound in ∗IR. For if b were such a bound, then b ∼ ∞, and b − 1 too would be an upper bound, but b − 1 < b. In a similar way it can be shown that the (bounded) set of all infinitesimals has neither a least upper bound nor a greatest lower bound; and that ∗IN\IN, the set of all hyperlarge natural numbers, has no smallest element, whereas each nonempty subset of IN has such an element. Therefore, working with external constants or external variables that are not recognized as such is rather dangerous. Fortunately, the occurrence of external variables can be avoided by explicitly using in classical statements set inclusions of the type x ∈ X (not x ⊂ Y), because then their ∗-transforms contain the inclusions x ∈ ∗X, and x is automatically internal. If an inclusion like x ⊂ Y is involved, it should be replaced by x ∈ P(Y), because an arbitrary subset of ∗Y need not be internal, as we have already seen. In this way external variables can literally be kept out of nonstandard analysis.

For any constant or variable, the diagram below shows the three main possibilities,

standard and internal | nonstandard and internal | nonstandard and external

‘Internal’ may be read as ‘mildly nonstandard’, and ‘external’ as ‘extremely nonstandard’.

1.7 Infinitesimals in Greek geometry?

Maybe it was Antiphon, a Greek mathematician and contemporary of Socrates, who for the first time contemplated the existence of infinitesimals. According to Heath [1] he, Antiphon, stated that, in Heath’s words:

“If one inscribed any regular polygon, say a square, in a circle, then inscribed an octagon by constructing isosceles triangles in the four segments, then inscribed isosceles triangles in the remaining eight segments, and so on ‘until the whole area of the circle was by this means exhausted, a polygon would thus be inscribed whose sides, in consequence of their smallness, would coincide with the circumference of the circle’.”

There are at least two interpretations in modern terminology of this. One is that the end product of Antiphon’s construction is a polygon with a hyperlarge number of sides, so that the length of each side is a positive infinitesimal. But this would imply that the end product would not coincide with the circumference of the circle, that is, not exactly. The other one is that the end product is the circumference of the circle itself. But this would imply that the end product no longer was a polygon. Either interpretation contains a contradiction, so it is difficult to say what really was in Antiphon’s mind.

Anyway, Antiphon’s idea was not accepted by his fellow mathematicians. Again in Heath’s words:

“The time had, in fact, not come for the acceptance of Antiphon’s idea, and, perhaps as the result of the dialectic disputes to which the notion of the infinite gave rise, the Greek geometers shrank from the use of such expressions as infinitely great and infinitely small and substituted the idea of things greater or less than any assigned magnitude. Thus, as Hankel says, they never said that a circle is a polygon with an infinite number of infinitely small sides; they always stood still before the abyss of the infinite and never ventured to overstep the bounds of clear conceptions. They never spoke of an infinitely close approximation or a limiting value of the sum of a series extending to an infinite number of terms.”

Note that the two interpretations mentioned above are also present in this quotation (‘infinitely close’ and ‘limiting value’).

Nevertheless, the Greek geometers solved many problems involving limits. They managed to do so by means of the so-called method of exhaustion. Given the problem to determine, say, the area of some figure, the method is to find a sequence of inscribed figures as well as a sequence of circumscribed figures, each of known area, such that the given figure is approximated better and better by the terms of either sequence. But this does not mean that they thought in terms of limits. From the areas of the terms of both sequences they derived (guessed?) the area of the given figure, and a rigorous proof was obtained by showing that the proposed area of the given figure always lay between the areas of corresponding terms of both sequences. All that we can criticize is that they took the existence of the desired area for granted. In fact they managed to determine many limits without ever presenting a definition of limit.

Perhaps in his ‘Method’ Archimedes comes closer to the use of infinitesimals. For example (see [1], Supplement, p. 15), when showing that the area of a segment ABC of a given parabola is 4/3 of the area of the triangle ABC, if, with D the middle point of the chord AC, BD is parallel to the axis of the parabola, Archimedes begins with some sort of plausible reasoning, where he states that the segment is made up of line segments between the parabola and the chord of the segment, all parallel to the axis of the parabola. Apparently, in his mind all these line segments together make up the entire segment of the parabola. It is tempting to conclude that the line segments were treated by him as parallelograms of hypersmall but positive breadth. At any rate, Heath ([1], Supplement, p. 8) writes that the line segments are

“... of course ... indefinitely narrow strips (areas) ...; but the breadth ... (dx, as we might call it) does not enter into the calculation because it is regarded the same in each of the two corresponding elements which are separately weighed against each other, and therefore divides out.”

If this were correct, Archimedes would have continued his plausible reasoning by showing that the parallelograms could each be ‘weighed’ (letting the area of a parallelogram be its weight) against one of the parallelograms making up a certain figure F. But the area of F could easily be shown to be equal to 4/3 of the area of triangle ABC.


There is an alternative, however, similar to the second interpretation mentioned earlier when discussing Antiphon’s idea, where not parallelograms but line segments are weighed against each other (letting the length of a line segment be its weight). In fact Archimedes neither mentioned something like breadth, nor discussed dividing something out at all. Instead, he considered line segments making up certain areas, not thin parallelograms. True, in this case the number of line segments is infinite, so a limit is involved, but when working with parallelograms each individual comparison of weights is not exact. And since (as Archimedes remarks himself) the reasoning is not to be regarded as a rigorous one, it is not clear which interpretation is the right one. Anyway, Archimedes later on presented a rigorous proof – based on the method of exhaustion – where he could use the ratio 4/3 that he found by plausible reasoning.

Let us close this discussion with Heath’s remark that Archimedes’ ‘Method’ is a rare instance where a Greek mathematician shows how his intuition has led him to the solution of some problem by means of plausible reasoning. Usually, in Greek mathematics any trace of the intuitive machinery used was completely cleared away.

Open question: Have infinitesimals been wandering through the minds of some Greek mathematicians, or haven’t they?

1.8 Infinitesimals in the 17th to the 19th century

There can be no doubt that in the 1670’s, some 1900 years after Archimedes lived, infinitesimals were conceived by Leibniz. Moreover, he formulated their main properties, and many contemporary mathematicians as well as mathematicians after him, among them Euler and Cauchy, were able to successfully work with them. But the theory of the infinitesimals lacked a rigorous basis, and during some 200 years all attempts to improve this situation were in vain, so that at last one gave up, the more so because in the 1870’s Weierstrass came up with a rigorous theory of limits and continuity, which became the basis of what now is known as classical analysis, and where there was and is no need to consider infinitesimals any more.

It is quite interesting to see how Euler [2] shows the well-known product formula for the sine function. He begins his proof with the equality,

\[ 2\sinh x = (1 + x/n)^n - (1 - x/n)^n, \]

valid for – in Euler’s own words – ‘infinitely large values’ of n. Obviously, this is only true up to an infinitesimal. Then the right-hand side is treated as if n were a classical natural number. This leads after a purely classical reasoning to,

\[ (1 + x/n)^n - (1 - x/n)^n = \frac{2x}{n} \prod_{k=1}^{m} 4\sin^2(k\pi/n) \left\{ 1 + \frac{x^2}{n^2 \tan^2(k\pi/n)} \right\}, \]

where m = (n−1)/2, taking n odd (the details of the reasoning do not matter here, and the case for n even is similar). So,

\[ \sinh x = \frac{x}{n} \prod_{k=1}^{m} 4\sin^2(k\pi/n) \left\{ 1 + \frac{x^2}{n^2 \tan^2(k\pi/n)} \right\}. \]

Taking x ≠ 0, and dividing by x, and then taking x = 0, gives,

\[ 1 = \frac{1}{n} \prod_{k=1}^{m} 4\sin^2(k\pi/n), \]

and hence,

\[ \sinh x = x \prod_{k=1}^{m} \left\{ 1 + \frac{x^2}{n^2 \tan^2(k\pi/n)} \right\}. \]

Now for k finite, n² tan²(kπ/n) is ‘infinitely close’ to (kπ)², so (?)

\[ \sinh x = x \prod_{k=1}^{\infty} \left\{ 1 + \frac{x^2}{k^2\pi^2} \right\}, \]

and putting x = iz, this gives the desired result,

\[ \sin z = z \prod_{k=1}^{\infty} \left\{ 1 - \frac{z^2}{k^2\pi^2} \right\}. \]

Obviously, at the question mark the argument goes a little too fast, and a number of steps must be included here (see e.g. Luxemburg [3]).
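The two claims at the end of Euler's argument can be checked numerically. The sketch below is only a floating-point illustration, not part of Euler's proof: it compares n² tan²(kπ/n) with (kπ)² for one large classical n, and compares truncated versions of the two infinite products with sinh and sin. The function names and the chosen values of n, x, z and the truncation length are assumptions made for this illustration.

```python
import math


def tangent_term(k: int, n: int) -> float:
    """Return n^2 * tan^2(k*pi/n), the quantity Euler replaces by (k*pi)^2."""
    return (n * math.tan(k * math.pi / n)) ** 2


def truncated_sinh_product(x: float, terms: int) -> float:
    """Return x * prod_{k=1}^{terms} (1 + x^2 / (k^2 pi^2))."""
    result = x
    for k in range(1, terms + 1):
        result *= 1.0 + x * x / (k * k * math.pi ** 2)
    return result


def truncated_sin_product(z: float, terms: int) -> float:
    """Return z * prod_{k=1}^{terms} (1 - z^2 / (k^2 pi^2))."""
    result = z
    for k in range(1, terms + 1):
        result *= 1.0 - z * z / (k * k * math.pi ** 2)
    return result


# For fixed k and large n the ratio below is close to 1.
print(tangent_term(3, 1_000_001) / (3 * math.pi) ** 2)

# Truncated products versus the library values of sinh and sin.
print(truncated_sinh_product(1.5, 100_000), math.sinh(1.5))
print(truncated_sin_product(1.0, 100_000), math.sin(1.0))
```

A finite n and a finite number of factors can of course only suggest what happens for hyperlarge n; the missing steps at the question mark are exactly what such an experiment cannot supply.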

Another famous example is Cauchy’s proof ([4], p. 131), that a convergent series of continuous functions has a continuous limit function. To many this theorem was not correct, because it would seem that all kinds of counter-examples could be given. One of them is the series with the partial sums,

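\[ s_n(x) = \frac{4}{\pi} \sum_{k=1}^{n} \frac{\sin(2k+1)x}{2k+1}, \]

that is periodic modulo 2π and converges to,

\[ f(x) = \begin{cases} -1 & \text{if } -\pi < x < 0, \\ \phantom{-}0 & \text{if } x = 0 \text{ or } x = \pi, \\ +1 & \text{if } 0 < x < \pi, \end{cases} \]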


as can be shown by classical Fourier analysis. Since the sine function is everywhere continuous and sn(x) converges to f(x) for n tending to ∞, according to Cauchy’s theorem f ought to be continuous, which it isn’t. But so far, everything takes place within IR, whereas Cauchy let everything happen in what we have indicated by ∗IR.

For him continuity of f at c meant that,

∀x ∈ ∗IR, x ≃ c : f(x) ≃ f(c),

where, however, f : ∗IR → ∗IR and f need not be a standard function, and c ∈ ∗IR, not only c ∈ IR, which is why his continuity is not ∗-continuity (in nonstandard analysis it is called S-continuity; recall definition (1.1) in Section 1.4, where c ∈ IR and a standard function was involved, so that there S-continuity was the same as ∗-continuity).

And by convergence of sn(c) to f(c) he meant that,

∀n ∼ ∞ : sn(c) ≃ f(c),

where again everything is in ∗IR. Note that the Weierstrassian definitions of limit and continuity appeared half a century after Cauchy’s book, so Cauchy in a sense ‘had to’ work with definitions of the kind given here.

Now, by transfer,

\[ {}^{*}s_n(x) = \frac{4}{\pi} \sum_{k=1}^{n} \frac{{}^{*}\!\sin(2k+1)x}{2k+1}, \qquad n \in {}^{*}\mathbb{N},\ x \in {}^{*}\mathbb{R}, \]

and

\[ {}^{*}f(x) = -1, \text{ or } 0, \text{ or } +1, \qquad x \in {}^{*}\mathbb{R}, \]

since if the range of a classical function f is finite, the range of its transform is the same as that of f. Let m ∼ ∞ be fixed, and let x = c = 1/(2m), and dt = 1/m, so that x ∼ 0, dt ∼ 0. Then,

\[ {}^{*}s_m\!\left(\frac{1}{2m}\right) = \frac{2}{\pi} \sum_{k=1}^{m} \frac{{}^{*}\!\sin\big((2k+1)\,dt/2\big)}{(2k+1)\,dt/2} \cdot dt. \]

If we had that m ∈ IN, then the sum to the right would be an approximation of the Riemann integral,

\[ J = \int_0^1 \frac{\sin t}{t}\,dt, \]

and it should therefore not come as a surprise that it can be shown that the standard part of the right-hand side is exactly equal to 2J/π, and hence,

\[ {}^{*}s_m\!\left(\frac{1}{2m}\right) - \frac{2J}{\pi} \simeq 0. \]

But by direct calculation it follows that 2J/π ≠ −1, 0, +1, and since in particular for c = 1/(2m), ∗f(c) = −1, or 0, or +1 (−1 is in fact impossible), it follows that ∗sn(c) does not converge to ∗f(c). Also, since for all n, ∗sn(0) = 0, ∗sm(x) is not continuous at c = 0, so that the ‘counter-example’ does not satisfy the assumptions of Cauchy’s theorem, and this is why Cauchy maintained his theorem against all criticism, but without basing his proof (and much of his other work) on a rigorous theory of the infinitesimals and other nonstandard numbers. For many interesting details, see Lakatos [5].
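The behaviour of the partial sums at x = 1/(2m) can be imitated with a large classical m. The sketch below is a numerical illustration under that assumption and not a nonstandard argument: it evaluates the partial sum exactly as written above (summation index k running from 1 to m) and compares it with 2J/π, where J is computed from the power series of sin(t)/t. The function names and the value m = 10000 are choices made for this illustration.

```python
import math


def partial_sum(n: int, x: float) -> float:
    """s_n(x) = (4/pi) * sum_{k=1}^{n} sin((2k+1)x) / (2k+1), as in the text."""
    return (4.0 / math.pi) * math.fsum(
        math.sin((2 * k + 1) * x) / (2 * k + 1) for k in range(1, n + 1)
    )


def sine_integral_at_one(terms: int = 20) -> float:
    """J = integral of sin(t)/t over [0, 1], from the series
    sum_{j>=0} (-1)^j / ((2j+1) * (2j+1)!)."""
    return math.fsum(
        (-1) ** j / ((2 * j + 1) * math.factorial(2 * j + 1)) for j in range(terms)
    )


m = 10_000                       # a large classical stand-in for a hyperlarge m
value = partial_sum(m, 1.0 / (2 * m))
target = 2.0 * sine_integral_at_one() / math.pi

print(value, target)
```

For growing m the first printed number approaches the second, which is visibly different from each of −1, 0 and +1. This is the classical shadow of the statement that ∗sm(1/(2m)) is infinitely close to 2J/π.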

1.9 Infinitesimals in the 20th century

When in the 1870’s Weierstrass formulated the well-known ε−δ definitions of limit and continuity, definitions that completely ignore nonstandard numbers, the dispute regarding infinitesimals was quickly settled to their disadvantage, but only temporarily, for in 1961 Robinson [6,7] presented a mathematically sound theory of the nonstandard numbers. These works embody the first fairly complete analysis of the nonstandard numbers. Not only are they based on work of forerunners, but also on an amount of mathematical logic that hitherto was unusual in mathematics. Only a few references should suffice here, see [8–12].

Robinson starts from the axioms of set theory due to Zermelo and Fraenkel, and the axiom of choice (called together the ZFC axioms), derives IR in a classical kind of way, and then extends IR to ∗IR by applying a rather considerable amount of mathematical logic, as indicated before. Another way to define ∗IR was already indicated by Hewitt [10] and worked out by Luxemburg [13]. Here the ZFC axioms are again the point of departure, but the more usual line of mathematical thinking is followed. (Except for the ZF axioms, this way is also followed in the next chapter.) Still another way to introduce ∗IR was found by Nelson [14]. Nelson adds three more axioms to the ZFC axioms, as well as a new symbol, st (for ‘standard’) that is used as a kind of label to distinguish standard constants from nonstandard constants. This leads directly to the set of all standard as well as all nonstandard constants, without the intermediary step of first introducing IR; consequently in internal set theory ∗IR is denoted by IR, and similarly, ∗IN is denoted by IN, etc. Actually, the point of view of internal set theory is that the IN of classical mathematics is the same as the IN of nonstandard analysis; and that all that happens is that unexpected elements of IN are discovered, elements that had always been there. In other words, according to this point of view, 0, 1, 2, etc. do not at all fill up IN (see Robert [15] and F. Diener and G. Reeb [16]). The additional axioms make sure that transfer is guaranteed (axiom of ‘transfer’), that nonstandard numbers exist (axiom of ‘idealization’), and that unique standard sets can be derived from given sets (axiom of ‘standardization’). Even though internal set theory uses relatively little of mathematical logic, the new axioms require some study, and do not seem to be as obvious as, for example, the axioms of Greek geometry:

Transfer: ∀^st t1 ... ∀^st tk : [∀^st x : P(x,t1,...,tk) ⇒ ∀x : P(x,t1,...,tk)].

Idealization: [∀^(st fin) z : ∃x : ∀y ∈ z : P(x,y)] ⇒ [∃x : ∀^st y : P(x,y)].

Standardization: ∀^st x : ∃^st y : ∀^st z : [z ∈ y ⇔ z ∈ x ∧ P(z)].

Here the label st on a quantifier over u means that the variable u must be standard, and similarly the label fin means that the corresponding variable must be finite (but beware, in internal set theory any hyperlarge natural number is finite, only the combination of standard and finite amounts to the classical notion of finiteness). Note that whereas the label st says that the variable u is standard, ∗u means that the variable ∗u is standard, because st is a label but ∗ is a mapping. P(...) denotes a given internal statement, except in the last axiom, where P(...) may even be external (see Section 1.6).

In naive nonstandard analysis these three additional axioms are not assumed but derived from the existence of the natural numbers and the axiom of choice. Transfer has already been discussed; and idealization is used to prove the existence of nonstandard elements in any internal set with an infinite number of elements. Perhaps standardization is the most intriguing of the three because it contains a statement P(z) that may be external. Reformulated naively it means that,

∀∗x : ∃∗y : ∀∗z : [∗z ∈ ∗y ⇔ ∗z ∈ ∗x ∧ P(∗z)],

where x, y and z are, of course, classical. Since always ∗s ∈ ∗S if and only if s ∈ S, it follows that,

y = {z ∈ x : P(∗z)},

or equivalently,

∗y = ∗{z ∈ x : P(∗z)},

which in internal set theory are illegal set formations. Here are a few examples, where x and y are still classical, but z need not be classical.

1) P(z) ≡ z ∈ ∗IN ∧ z is standard; then x = {1,2,3} gives y = x = {1,2,3}, x = IN gives y = x = IN, and x = IR also gives y = IN. In fact IN is the largest y that is possible for variable x.

2) P(z) ≡ z ∈ ∗IN ∧ z < n, with n ∈ ∗IN given such that n ∼ ∞; then the results are as under 1).

3) P(z) ≡ z ∈ ∗IR ∧ z ≃ 0; then y = {0} if 0 ∈ x and y = ∅ if 0 ∉ x.

For other details the reader should consult more adequate treatments of internal set theory.

In the meantime other versions of nonstandard analysis have been developed. In one of them external sets are ‘legalized’ by means of still other axioms, and another label, ext (for ‘external’).

By now many hundreds of publications have been devoted to nonstandard analysis: it is an established branch of mathematics.

No matter how infinitesimals are introduced, with or without the axioms of set theory, with or without extra axioms and new undefined symbols (st and ext), always the axiom of choice seems indispensable. If one tries to develop infinitesimal calculus without this axiom, it seems that one should be satisfied with a mutilated theory, as will be explained later on in Section 4.4. Here attempts by Chwistek [17,18] in this direction should be mentioned. In his 1926 paper Chwistek introduces new numbers by means of infinite sequences of classical numbers. These new numbers are called Progressionszahlen (‘sequence numbers’), and equality for them is defined as follows. Let Ni(αi) and Ni(βi) be two new numbers, then,

Ni(αi) = Ni(βi) if and only if αi = βi for i > n for some n ∈ IN.

Something similar is done to define inequality, and an operation like addition is defined by,

Ni(αi) + Ni(βi) = Ni(αi + βi).

A classical function f is extended by means of,

f(Ni(αi)) = Ni(f(αi)).


The extended function happens to be quite similar to ∗f, the ∗-transform of f. Even so not much new calculus is developed. An extension of IR that includes all sequence numbers could be introduced, however.

In his 1948 book Chwistek spends fewer than ten pages on the subject, but nevertheless shows that he is well aware of the fact that ‘infinitely small’ numbers can be introduced, and he also introduces internal functions (called normal functions by him). Again there is no fully expanded calculus. Most likely, the deeper reason for this is that Chwistek defines (in)equality for his sequence numbers as indicated above. This definition has the advantage that the axiom of choice is not needed, but leads to rather serious problems, as will become clear in Section 4.4. It remains to remark that working with sequences is a technique used by Hewitt [10] and Luxemburg [13], and will be the technique of the next chapter, which is based on assumptions that from a naive, intuitive point of view are understandable, obvious, and acceptable, except perhaps the axiom of choice, and where everything that is not so obvious, such as transfer and all the rest, will be proved, rather than assumed.

1.10 Introducing infinitesimals by plausible reasoning; filters

Let f be a function from IR to IR, and let c and b be real numbers. Then in today’s notation the definition according to Leibniz’s ideas that the limit of f(x) for x tending to c is equal to b, is as follows,

∀x, x−c ≃ 0 : f(x)−b ≃ 0.

As already mentioned, neither Leibniz nor anybody else at that time gave a mathematically sound definition of infinitesimal, and a dispute started that temporarily stopped in the 1870’s when Weierstrass was able to present a mathematically sound limit definition that was completely void of infinitesimal thinking:

∀ε > 0 : ∃δ > 0 : ∀x, 0 < |x−c| < δ : |f(x)−b| < ε,

where ε and δ too are real numbers. This definition leaves no room for misunderstanding, and is even intuitively clear: after mr. E. specifies any ε > 0, mrs. D. is able to specify a δ > 0, such that if x is within a distance δ of c (x need not be equal to c), then f(x) is within a distance ε of b (f(x) may be equal to b). Note that mrs. D. is able to come up with a δ > 0 no matter how small ε > 0 turns out to be; in each case she has a response.
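The exchange between mr. E. and mrs. D. can be played out for one concrete function. The sketch below assumes f(x) = x², so that b = c², and uses the response δ = min(1, ε/(2|c|+1)), which works because |x² − c²| = |x−c|·|x+c| < δ(2|c|+1) whenever |x−c| < δ ≤ 1. The function names and the sampling scheme are assumptions made for this illustration; testing finitely many sample points is only a plausibility check, not a proof.

```python
def f(x: float) -> float:
    """The example function; its limit at c is b = c * c."""
    return x * x


def mrs_d_response(eps: float, c: float) -> float:
    """Return a delta > 0 that answers the challenge eps for f at c."""
    return min(1.0, eps / (2.0 * abs(c) + 1.0))


def challenge_holds(eps: float, c: float, samples: int = 1000) -> bool:
    """Check |f(x) - b| < eps at sample points with 0 < |x - c| < delta."""
    delta = mrs_d_response(eps, c)
    b = f(c)
    for j in range(1, samples + 1):
        offset = delta * j / (samples + 1)      # 0 < offset < delta
        for x in (c - offset, c + offset):
            if not abs(f(x) - b) < eps:
                return False
    return True


# mr. E. presents ever smaller challenges; mrs. D. has a response each time.
for eps in (1.0, 0.1, 1e-3, 1e-6):
    print(eps, mrs_d_response(eps, 2.0), challenge_holds(eps, 2.0))
```

The output pairs each ε with a smaller δ, the finite beginning of the two sequences that the plausible reasoning of this section later turns into infinitesimals.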


Leibniz’s definition is more attractive: it contains only one quantifier, against three in Weierstrass’s definition, and even though the latter is intuitively clear, it is not as easy to grasp, not even for beginning mathematics students (probably for many other students it never becomes entirely clear).

Let us now see how by means of plausible reasoning we can find a good definition of infinitesimal. First note that the ε−δ definition is of a dynamical nature. If mr. E. first presents ε = ε1, after which mrs. D. presents δ = δ1, he may present another ε = ε2 that is so small that mrs. D. is forced to present another, i.e. smaller δ = δ2 (excluding trivial cases where f is constant in a neighborhood of c). Repeating this again and again, infinite sequences (ε1,ε2,...) and (δ1,δ2,...) are the result, and mr. E. and mrs. D. are involved in some dynamical process. Obviously, ε1 > ε2 > ..., and δ1 > δ2 > ... and it is clear that the ε’s as well as the δ’s tend to zero (again excluding trivial cases). Instead of considering the sequences (ε1,ε2,...) and (δ1,δ2,...), we can just as well consider the sequences (f(x1),f(x2),...) and (x1,x2,...) and require that if xi − c tends to 0 for i tending to infinity, but such that xi − c ≠ 0 for all i, then f(xi) − b too should tend to 0 for i tending to infinity. Let us abbreviate this to:

(xi − c) → 0 ⇒ (f(xi) − b) → 0,

where (xi − c) and (f(xi) − b) stand for the sequences (x1 − c, x2 − c, ...) and (f(x1) − b, f(x2) − b, ...), respectively. Now it so happens that the ε−δ definition is equivalent to,

∀(xi), (xi − c) → 0, xi − c ≠ 0 : (f(xi) − b) → 0.
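The sequence formulation can be illustrated with one particular sequence xi = c + (−1)^i/i, which approaches c from both sides without ever being equal to c. The sketch below assumes two example functions that do not occur in the text: x² at c = 2 with b = 4, where the deviations |f(xi) − b| shrink, and a step function at c = 0, where f(xi) keeps jumping between −1 and +1 so that no b can work. A single sequence and finitely many terms only illustrate the criterion, which quantifies over all sequences.

```python
def alternating_approach(c: float, count: int) -> list:
    """x_i = c + (-1)^i / i for i = 1..count; x_i != c and x_i - c -> 0."""
    return [c + (-1) ** i / i for i in range(1, count + 1)]


def deviations(func, b: float, xs: list) -> list:
    """|func(x_i) - b| for each term of the sequence."""
    return [abs(func(x) - b) for x in xs]


def square(x: float) -> float:
    return x * x


def step(x: float) -> float:
    """-1 for x < 0 and +1 for x > 0 (never evaluated at 0 here)."""
    return 1.0 if x > 0 else -1.0


square_dev = deviations(square, 4.0, alternating_approach(2.0, 10_000))
step_values = [step(x) for x in alternating_approach(0.0, 10_000)]

print(max(square_dev[:10]), max(square_dev[-100:]))   # large early, small late
print(sorted(set(step_values[-100:])))                # both values keep occurring
```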

Exercise: Show this.

This formulation contains only one quantifier, and in form comes quite close to Leibniz’s definition. However, we have traded two quantifiers for two converging sequences. This is overcome by replacing sequences that converge to 0 by infinitesimals, or rather, by letting a sequence that converges to 0 generate an infinitesimal, which will have the effect that the dynamical process mentioned above is replaced by something static.

So, let each infinite sequence (s1,s2,...) of real numbers si that converges to 0 generate an infinitesimal, and let this number be indicated by H(s1,s2,...) or simply by H(si), (with the H of Hyper).

As an example, let si = 1/i, then H(si) is an infinitesimal, and since 1/i > 0 for all i it should be positive. It is also reasonable to require that H(0,0,...) = H(0) = 0. Moreover, it should be possible to treat infinitesimals and other nonstandard numbers as ordinary numbers. This requires that, for example, addition, subtraction and the greater-than relation be defined for them. The natural, but partly wrong, guesses (that are almost identical to Chwistek’s definitions mentioned in Section 1.9) are,

addition: H(si) + H(ti) = H(si + ti),
subtraction: H(si) − H(ti) = H(si − ti),
greater than: H(si) > H(ti) if si > ti for all i ∈ IN.

Note that so far everything (being positive, being zero, addition, subtraction, greater than) is defined term by term. Let us adopt this as a basic rule. Unfortunately, this leads to trouble. For let,

(si) = (1/1,3/8,1/4,3/32,1/16,3/128,...), and (ti) = (7/8,7/16,7/32,7/64,7/128,7/256,...).

Then both (si) and (ti) converge to 0, and the terms of both (si) and (ti) are all positive, and nicely decrease with increasing i. So H(si) > 0, H(ti) > 0, H(si) ∼ 0 and H(ti) ∼ 0, and there seems to be no reason at all to reject these sequences as generating sequences of certain infinitesimals. But now consider the difference of H(si) and H(ti), which is generated by the sequence (si−ti), i.e. by, (+1/8,−1/16,+1/32,−1/64,+1/128,−1/256,...), then, as everything should be defined term by term, the conclusion must be that H(si −ti) is different from zero, is not positive and is not negative, hence that H(si − ti) does not behave as an ordinary number. Where did we go wrong? The example given would not give trouble if the definition of greater-than were revised:

H(si) > H(ti) if si > ti for all even i,

hence if only the even indices were ‘accepted’ and the odd indices were ‘rejected’. Then H(si − ti) would be negative. Not only would this revision be rather arbitrary (‘even’ could, for example, just as well have been replaced by ‘odd’, so that the odd indices would be accepted and the even indices rejected), it would not work either, as a new example could be concocted resulting in a hypernumber that was neither positive, nor negative, nor zero. The only way out would be that given any partitioning of IN into two disjoint subsets of indices i, it would be allowed to accept one of these subsets (that is to say that all of its elements would be accepted) if it were infinite, and reject the other, and given any partitioning of any accepted subset into two disjoint subsubsets, it would again be allowed to accept one of the latter if it were infinite and reject the other, such that any subset of a rejected subset is itself rejected, etc., etc., ad infinitum. That accepted subsets would have to be infinite is a reasonable requirement as infinitesimals should be generated by certain infinite sequences.
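The troublesome example with the sequences (si) and (ti) can be reproduced with exact rational arithmetic. The closed forms used below (si = 1/2^(i−1) for odd i and 3/2^(i+1) for even i, and ti = 7/2^(i+2)) are an assumption read off from the six listed terms of each sequence; the function names are likewise chosen for this illustration only.

```python
from fractions import Fraction


def s(i: int) -> Fraction:
    """Assumed closed form of (s_i) = (1/1, 3/8, 1/4, 3/32, 1/16, 3/128, ...)."""
    if i % 2 == 1:
        return Fraction(1, 2 ** (i - 1))
    return Fraction(3, 2 ** (i + 1))


def t(i: int) -> Fraction:
    """Assumed closed form of (t_i) = (7/8, 7/16, 7/32, 7/64, ...)."""
    return Fraction(7, 2 ** (i + 2))


def termwise_difference(u, v):
    """Subtraction defined term by term: the sequence (u_i - v_i)."""
    return lambda i: u(i) - v(i)


d = termwise_difference(s, t)

for i in range(1, 7):
    print(i, s(i), t(i), d(i))

# On the odd indices every term of (s_i - t_i) is positive,
# on the even indices every term is negative.
print(all(d(i) > 0 for i in range(1, 101, 2)))
print(all(d(i) < 0 for i in range(2, 101, 2)))
```

Both sequences are positive and decreasing, yet their termwise difference changes sign at every step, so a verdict ‘for all i’ is impossible and the verdict on the even indices alone contradicts the one on the odd indices alone.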

Even then the definition of greater-than would be rather arbitrary, but for all sequences (si), H(si) would always be either positive, or negative, or zero. To see this, let Q0 = {i : si > 0}. Then if Q0 is accepted, hence Q1 = {i : si ≤ 0} is rejected, H(si) is positive. On the other hand, if Q0 is rejected, so that Q1 is accepted, then H(si) is nonpositive. In the latter case, let Q10 = {i : si < 0}. Then if Q10 is accepted, hence Q11 = {i : si = 0} is rejected, H(si) is negative, and if Q10 is rejected, so that Q11 is accepted, then H(si) is zero. Note that Q0 and Q1 are two complementary subsets of IN and that Q10 and Q11 are two complementary subsets of Q1. In this example it has tacitly been assumed that all the Q’s involved are infinite sets. If this is not so fewer cases need be examined.
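The case distinction with Q0, Q1, Q10 and Q11 amounts to a small decision procedure, provided some oracle tells whether a set of indices is accepted. The sketch below represents an index set by a predicate on i and uses two toy oracles, ‘accept the sets that contain all even indices’ and ‘accept the sets that contain all odd indices’, each tested only on a finite window. These oracles are hypothetical stand-ins: they are not a complete system of preferences, and they do not decide every index set. The sequence d is the difference sequence of the example above, entered here through the assumed closed form (−1)^(i+1)/2^(i+2).

```python
from fractions import Fraction

WINDOW = range(1, 201)   # finite test window; an assumption of this sketch


def accepts_evens(index_set) -> bool:
    """Toy oracle: accept a set if it contains every even index in WINDOW."""
    return all(index_set(i) for i in WINDOW if i % 2 == 0)


def accepts_odds(index_set) -> bool:
    """Toy oracle: accept a set if it contains every odd index in WINDOW."""
    return all(index_set(i) for i in WINDOW if i % 2 == 1)


def classify(seq, accepted) -> str:
    """Follow the text: test Q0 = {i : s_i > 0}, then Q10 = {i : s_i < 0}.

    If neither is accepted, Q11 = {i : s_i = 0} is taken to be accepted,
    which relies on the rule that exactly one part of a partition of an
    accepted set is accepted (the toy oracles only satisfy it for the
    sequences used here).
    """
    if accepted(lambda i: seq(i) > 0):
        return "positive"
    if accepted(lambda i: seq(i) < 0):
        return "negative"
    return "zero"


def d(i: int) -> Fraction:
    """The alternating difference sequence (+1/8, -1/16, +1/32, ...)."""
    return Fraction((-1) ** (i + 1), 2 ** (i + 2))


print(classify(d, accepts_evens))
print(classify(d, accepts_odds))
print(classify(lambda i: Fraction(0), accepts_evens))
```

The same sequence receives opposite verdicts from the two oracles, which is the arbitrariness in the choice of accepted sets that the text describes.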

Note that a revision of the definitions for addition and subtraction would not be necessary, because although the attention could be restricted to some accepted subsets of indices i, there is no reason to do so.

Still there is some trouble. There is strong evidence that if we are fully free to accept or reject subsets of indices, but such that the given rules are obeyed (IN is accepted; accepted sets must be infinite; of any two disjoint sets whose union is equal to an accepted set, precisely one must be accepted; subsets of rejected subsets must be rejected), we will never be able to give a complete specification of our choices or preferences. The meaning of this is that most likely we will never be able to write a computer program that contains all our preferences, and that, once the program has completely been finished and has been read in the relevant computer, it outputs either ‘accepted’ or ‘rejected’, after any subset Q of indices is presented as input to it.

Exercise: If this seems unbelievable, just try to give such a complete specification in the example above. See Section 1.16.

Although it seems not possible to specify all preferences in a constructive kind of way, it can be shown that a ‘complete system of preferences’ exists if the axiom of choice is invoked. The more technical term for such a system is free ultrafilter. But there are many such filters, and when starting from one of them H(si − ti) may turn out to be positive, whereas when starting from another one H(si − ti) may turn out to be negative, because in the former case all odd indices and in the latter case all even indices would form an accepted set. We shall have to live with this arbitrariness, however. On the other hand, in practice only one free ultrafilter is needed. It is selected once and for all, and although there is much arbitrariness in its selection, after the selection it follows uniquely for each H(si) whether it is zero, positive or negative, and similarly for all other choices with a finite number of alternatives that will present themselves. Ironically, we will never know the ‘selected’ filter completely, or rather we will only know it extremely incompletely. See Sections 1.15 and 1.16.

1.11 Basic assumptions of formalism

The two main schools of mathematical thinking are formalism and constructivism. They will be reviewed in the present and the next sections.

In formalism, which is the predominant school, basic assumptions are the ZF axioms of set theory due to Zermelo and Fraenkel (or equivalent variations thereof). In them each symbol that represents a variable, represents a set, and in each of them the symbol ∈ occurs. Hence there is no reference at all to numbers, or geometrical concepts, or whatever. Instead of the word ‘set’ and the symbol ‘∈’ a word and a symbol not yet existing could be used, because the axioms only have a formal meaning. Nevertheless they are intended to fix intuitive ideas regarding ‘things contained in other things’, but that is their semantic aspect. Even though it is often said that in formalism the notion of set is undefined, and that ∈ is an undefined symbol, to a certain, but limited, extent they are defined implicitly by the axioms, because after all x ∈ y must be a statement, i.e. something that has a truth value (but recall that logic too can be formalized).

Other basic assumptions are that from the axioms other truths can be derived by applying the well-known rules of logic, such as the syllogisms, the rules of substitution, and the principle of the excluded third (or middle).

In formalism logic prevails over mathematics, that is to say that mathematics is subject to all rules of logic. In constructivism the order is reversed, which has the consequence that each rule of logic has to be screened before it can be accepted as a rule for mathematical reasoning. This has resulted in the rejection of just one rule of logic, namely the principle of the excluded third.

Here are the first seven of the ZF axioms (see, for example, C.C. Chang and H.J. Keisler [19]):

1. ∀x,y : (x ≡ y ⇔ (∀z : (z ∈ x ⇔ z ∈ y))).
2. ∃x : ∀y : (¬y ∈ x).
3. ∀x,y : ∃z : ∀u : (u ∈ z ⇔ (u ≡ x ∨ u ≡ y)).
4. ∀x : ∃y : ∀z : (z ∈ y ⇔ (∃w : (z ∈ w ∧ w ∈ x))).
5. ∀x : ∃y : ∀z : (z ∈ y ⇔ (∀w : (w ∈ z ⇒ w ∈ x))).
6. ∃x : ((∃y : y ∈ x) ∧ (∀z : (z ∈ x ⇒ ∃w : (z ∈ w ∧ w ∈ x)))).
7. ∀x : ((∃y : y ∈ x) ⇒ (∃z : (z ∈ x ∧ ¬(∃w : (w ∈ z ∧ w ∈ x))))).

The formal character of these axioms may be emphasized by not mentioning their intuitive meaning, and by replacing ∈ by any other suitable symbol. Whatever the interpretation of set and ∈, it can be shown that the x whose existence is secured in Axiom 2 is unique. For assume that x and x′ are such that ∀y : (¬y ∈ x) and ∀y : (¬y ∈ x′). Then it has to be shown that x ≡ x′, or, by Axiom 1, that ∀z : (z ∈ x ⇔ z ∈ x′). But for any z it follows that ¬z ∈ x and ¬z ∈ x′, hence for any z both implications z ∈ x ⇒ z ∈ x′ and z ∈ x′ ⇒ z ∈ x hold (their premises being false), so that the equivalence holds as well. Obviously, the intuitive meaning of the x of Axiom 2 is that of the empty set, and apparently the axioms dictate that there is a unique empty set, but for the proof the intuitive interpretation of neither set nor ∈ is required, which illustrates the formal character of formalism.

Axiom 6, which is the ‘infinity axiom’, together with other axioms takes care of the existence of an infinite set, that, apart from the empty set ∅, contains the following sets as elements: {∅}, {∅,{∅}}, {∅,{∅},{∅,{∅}}}, etc. Here the set denoted by {∅}, is defined by the requirements that ∅ ∈ {∅} and that if x ∈{∅} then x ≡∅, and similarly for the other sets. Only now numbers appear; by definition, 0 = ∅, 1 = {0}, 2 = {0,1}, 3 = {0,1,2} etc. In this way the natural numbers, including zero, appear as sets. Once in the possession of them the integers, the rationals, the reals, etc. can be defined in the well-known ways.
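The way the natural numbers appear as sets can be mimicked in Python with frozensets. This is only a sketch: the names successor and numeral are invented for the illustration, and forming the next number as n ∪ {n} is one common way to obtain 1 = {0}, 2 = {0,1}, 3 = {0,1,2}, assumed here rather than taken from the axioms above.

```python
def successor(n: frozenset) -> frozenset:
    """Return n ∪ {n}, the set that represents the number after n."""
    return n | frozenset({n})


def numeral(k: int) -> frozenset:
    """Build the set that represents the natural number k, starting from 0 = ∅."""
    s = frozenset()
    for _ in range(k):
        s = successor(s)
    return s


zero, one, two, three = (numeral(k) for k in range(4))

print(one == frozenset({zero}))              # 1 = {0}
print(two == frozenset({zero, one}))         # 2 = {0, 1}
print(three == frozenset({zero, one, two}))  # 3 = {0, 1, 2}
```

In this representation the set for k has exactly k elements, and j < k corresponds to numeral(j) being an element of numeral(k).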

Most likely, Zermelo and Fraenkel’s desire was to formulate a set of axioms that would lead to the natural numbers, but nothing more, i.e. to 0 and the elements of IN, but in fact the axioms happen to lead to 0 and ∗IN. After additional steps the latter leads to Nelson’s internal set theory (see Section 1.9). If one insists on just IN, there is a way out: take the intersection of all sets that contain 0, 1, 2, etc., in other words, take the minimal infinite set whose existence follows from Axiom 6. Altogether a fairly complicated way to get at the natural numbers and nothing more.

Moreover, one wonders whether in order to be able to even write down the ZF axioms the natural numbers are not at least implicitly presupposed. For this requires symbols to be written down one after the other in a linear order. There is a leftmost symbol, with just one right neighbor, for which the same is true, etc., until the rightmost symbol is reached (assuming sufficiently long lines). It would seem that here the natural numbers are implicitly used for ranking purposes. In addition, the required order is not only of a spatial nature, but of a temporal nature as well, because the symbols must be written down one after the other, presumably starting from the leftmost symbol, then proceeding with its right neighbor, etc. But here space and time are used as physical notions.

1.12 Basic assumptions of constructivism

Constructivism occurs in various forms, among which intuitionism and the theory of recursive functions, but only what they have in common will be briefly reviewed here. One of the starting points of constructivism is that before really starting mathematics the natural numbers, 1,2,3,..., are already given to us, simply because of our ability to count, and to do so indefinitely. This makes the ‘infinity axiom’ (Axiom 6 of the axioms listed in the previous section) superfluous, and has the agreeable consequence that the natural numbers are not sets. Nevertheless, there is no real difference between formalism and constructivism as far as the natural numbers in ordinary mathematics are concerned, since in formalistic mathematics the fact that the natural numbers are sets is largely ignored.

The same can be said about the rationals, but there is an essential difference as far as the reals are concerned, which is a consequence of the fact that the rationals form a countable set, but the reals do not. Yet the real numbers of both schools correspond to each other in a one-to-one kind of way. Differences appear if continua play a part. For example, according to both schools there exists a function f defined on the interval [0,1] of all rational numbers such that for all x ∈ [0,1], either f(x) = 5 or f(x) = 7, and such that both values are assumed somewhere. But as soon as this interval is replaced by the interval [0,1] of all real numbers, this is only true within formalism, because within constructivism such an f would not be accepted as a function, because it could not be defined constructively. In fact, within constructivism all functions defined on a finite real closed interval are continuous. For more details, see, for example, M.J. Beeson [20], or E. Bishop and D. Bridges [21].

More generally, a basic assumption (or perhaps restriction) in constructivism is that everything (definitions, proofs, etc.) must be of a constructive nature. For example, in formalism the following is a proper definition of p:

p is the largest prime number such that p − 2 too is a prime number, or, if there is no such largest prime number, p = 1.

But in constructivism it is not, because it is not (yet) known whether the number of so-called twin primes is finite, hence (up to today) the definition is not constructive and must be rejected.

Another example is the 4-color problem. Let q be the minimum number of colors that is required to color any given map of countries, such that each country receives just one color, and neighboring countries receive different colors. Before 1977 q was not well defined within constructivism, because one did not know whether q was 4 or 5, but after 1977 q is also well defined in constructivism, because in 1977 Appel and Haken showed constructively that q = 4.

A consequence of constructivism is that it puts a restriction on the use of the logical principle of the excluded third. Consider some binary relation R, a statement,

∀x ∈ IR : xRy, (1.1)

and its negation,

∃x ∈ IR : ¬xRy. (1.2)

Then within formalism either (1.1) or (1.2) holds, because of the principle of the excluded third. But within constructivism there are three possibilities: either (1.1) can be shown constructively, or (1.2) can be shown constructively, or neither (1.1) nor (1.2) can be shown constructively. The third possibility arises because a special kind of proof is required. It follows that within constructivism mathematics prevails over logic: the principles of logic must first be checked before they can be accepted as tools in mathematical reasoning. In fact, the principle of the excluded third is the only logical principle that is rejected.

There is some flexibility in constructivistic thinking, though, as can be seen from the next example, taken from E.W. Beth [22], where it is required to show constructively that,

1^1 + 2^2 + ... + 99^99

is either divisible by 7 or not. Then all that is to be done is evaluate the indicated sum, divide it by 7 and see whether the remainder is zero or not. The effort involved is considerable, even if the computer is used (if the example is not impressive enough, replace 99^99 by 999^999), but at this point the constructivist declares that the proposed procedure is a constructive one, because the division could be carried out!
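On a present-day computer the procedure is easily carried out. The following Python sketch does so in two ways, assuming the sum is read as 1^1 + 2^2 + ... + 99^99; the function names are hypothetical. The first function evaluates the sum exactly and then divides, as described above; the second reduces every term modulo 7 first, a shortcut that is not part of the procedure in the text but must give the same remainder.

```python
def sum_of_powers_exact(limit: int) -> int:
    """Evaluate 1^1 + 2^2 + ... + limit^limit as an exact integer."""
    return sum(n ** n for n in range(1, limit + 1))


def sum_of_powers_mod(limit: int, modulus: int) -> int:
    """Remainder of 1^1 + 2^2 + ... + limit^limit on division by modulus.

    Each power is reduced with the three-argument pow, so no large
    intermediate numbers are formed.
    """
    return sum(pow(n, n, modulus) for n in range(1, limit + 1)) % modulus


remainder = sum_of_powers_exact(99) % 7
print("divisible by 7" if remainder == 0 else f"not divisible by 7 (remainder {remainder})")
print(remainder == sum_of_powers_mod(99, 7))  # the two routes agree
```

Replacing 99 by 999 is handled by the same two functions.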

The case is similar to the construction of a regular polygon with 65,537 (= 1 + 2^16) sides by means of ruler and compass (and paper and pencil) only, except that there seems to have been a mathematician who spent 10 years of his life checking this. He truly was a constructivist.

From the point of view of constructivism many statements of formalism are incorrect or even nonsensical. Yet, from the same point of view it seems impossible ever to derive a contradiction from any statement proved within formalism, because it would seem that the required proof would have to be based on the principle of the excluded third (see, for example, E.W. Beth [23]): constructivism nobly denies itself the indispensable weapon that it would need to defeat its enemy.

Although constructivism may seem a restricted kind of mathematics, it is sound mathematics and its achievements are remarkable. Nonstandard analysis would be impossible within it, however, because then the axiom of choice would be required, but, as we will see, this axiom is of a highly nonconstructive nature.

1.13 Selecting basic assumptions naively

From a naive, common sense point of view, both formalism and constructivism have agreeable as well as disagreeable characteristics. Fortunately, we are not bound to choose between the two. In this section a number of basic assumptions will be presented that combine the agreeable aspects of both schools, and that at the same time are easily understandable and fairly obvious, perhaps with the exception of the axiom of choice, but this axiom is dictated by the subject of this book. The net effect of combining the starting points of both schools will be that the ensuing mathematics is identical to the mathematics of the formalistic school. It is this combination that will serve as the basis of the theory of the next chapters.

Basic assumptions regarding logic. Assume that mathematics is subject to the principles of logic, in particular to the principle of the excluded third, which holds that any statement is either true or false; there is no third possibility.

Basic assumptions regarding the natural numbers. Assume that when starting mathematics the natural numbers, including 0, are given to us, and that they can be used to count. Here 0 is the natural number that is used if there is nothing to count.

In formalistic mathematics the natural numbers are regarded as sets, but below this will not be done. The natural number is the first kind of variable that will be considered. As usual a variable may assume certain values, each value is a constant.

From the present basic assumption it follows that it is legitimate to use the well-known inductive proof. An inductive proof has the following structure.

Suppose we are concerned with infinitely many mathematical statements that can be counted by means of the natural numbers 0, 1, 2, .... Suppose further that P(0) is the first statement to be counted, that P(1) is the second one, etc. Assume that P(0) is a true statement. Also assume that, given any natural number n, if P(n) is true then P(n′) is true, where n′ is the natural number that comes immediately after n. Then P(m) is true for any natural number m.

Indeed, as P(0) is true, it follows that P(1) is true (taking n = 0 and hence n′ = 1), but since P(1) is true it follows that P(2) is true (taking n = 1 and hence n′ = 2), etc. Hence, when counting the given statements they can at the same time be proven to be true. Since by assumption all statements can be counted, they are all true, which shows the validity of the inductive proof.

For interesting arguments in favour of the present basic assumption, the reader may consult A. Heyting [24].

Basic assumptions regarding sets. The main difference between the assumptions that follow below and the ZF axioms of set theory is that in the latter there are sets that have to be filled in with elements, or with no elements at all, whereas below there are elements (for example, the natural numbers) to begin with that may be taken together to form sets. As a consequence, in formalism any constant is a set, whereas below there are numbers, sets, n-tuples (among them pairs and triples), and functions (among them sequences), that each have their own intuitively determined character, although it should be admitted that the intuitive meaning of the various types of constants is not strictly required when deducing theorems from the axioms.

Existence (or specification). Given any moment in time, assume the possibility to either label or not label each of the then available constants separately, and to consider the set of precisely all unlabeled constants. A suitable label would be: ‘no’. The unlabeled constants are called the elements of the set. As usual the relationship between a set and its elements is indicated by means of the symbol ∈. Labeling all available constants leads to the existence of the empty set, indicated by ∅. A well-known way to form sets is by a mathematical statement P(x) depending on a single free variable x; x is labeled if and only if P(x) does not hold.

Equality (or extensionality). Two sets X and Y are equal if and only if they have the same elements, or, more formally, if x ∈ X implies that x ∈ Y and vice versa. It follows that the empty set is unique.

Assume that any set is a constant, and hence that variables can be introduced whose values are sets, as well as sets whose elements are themselves sets, etc., but with the limitation that given any set ‘taking an element of’ can be applied a finite number of times only. Let the maximum number of times this can be done be called the level of the set, so that the lowest possible level is 1. As a natural number is not itself a set it is called an urelement or individual. In general, an urelement is any constant that is not a set. One might say that urelements are at level 0.

Without much loss of generality it may be assumed that all sets are regular. By definition a set of level k is called regular if all its elements are urelements (so that k = 1), or if k ≥ 2 and all its elements are regular of level k − 1. Hence {x,{x,y}} is not regular, but {{x},{x,y}} is if x and y both are urelements, or both are regular.
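The notions of level and regularity can be tried out in Python, as a sketch only. It is assumed here that sets are modelled by frozensets and that every other Python value plays the role of an urelement; the function names are invented, and giving the empty set level 1 is a convention chosen for the sketch.

```python
def level(c) -> int:
    """Level of a constant: 0 for an urelement, otherwise 1 + the highest level among its elements."""
    if not isinstance(c, frozenset):
        return 0
    return 1 + max((level(e) for e in c), default=0)


def is_regular(s: frozenset) -> bool:
    """A set of level k is regular if k = 1, or if all its elements are regular sets of level k - 1."""
    k = level(s)
    if k == 1:
        return True  # by the definition of level, all elements are urelements
    return all(
        isinstance(e, frozenset) and level(e) == k - 1 and is_regular(e)
        for e in s
    )


x, y = "x", "y"  # two urelements
mixed = frozenset({x, frozenset({x, y})})                # {x, {x,y}}
uniform = frozenset({frozenset({x}), frozenset({x, y})})  # {{x}, {x,y}}

print(level(mixed), is_regular(mixed))      # level 2, not regular
print(level(uniform), is_regular(uniform))  # level 2, regular
```

The set {x,{x,y}} fails the test because it mixes an urelement with a set of level 1, whereas in {{x},{x,y}} all elements are at level 1.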

It follows that subsets and the power set of a given set can be formed, as well as the complement of a subset, and the difference, the union and the intersection of two sets. The notations are as follows. Sets: {x : P(x)} (i.e. the set of all x such that P(x)), or {x ∈ X : ...} (i.e. the set of all x such that x ∈ X and such that ...), or {a,b,c,d} (i.e. the set with the elements a, b, c, and d), or {a1,a2,a3,...} (i.e. the set with the elements an for each natural number n), etc. The subset and superset relations: ⊆ and ⊇. The proper subset and proper superset relations: ⊂ and ⊃. The power set of a set X: P(X) or 2^X. The complement of a subset X: X^c. The difference, the union, and the intersection operations: −, ∪ and ∩.

From these assumptions it follows that a notion like the set of all sets is nonsensical. This notion plays a role in one of the many paradoxes that were found after the first, somewhat careless, set-up of the theory of sets. The argument runs as follows. If there were a set X of all sets, consider P(X). Then P(X) would be an element of X, as X is the set of all sets, but P(X) cannot be an element of X: contradiction. This paradox arises because the interference of time is ignored: if today the set X of all sets that are known so far is introduced, only tomorrow can P(X) be introduced. The existence of this and other paradoxes has been an impetus to the development of both formalism and constructivism.

As usual, the symbol IN is reserved for the set, {n : n is a natural number different from 0}, or, simply, IN = {1,2,3,...}. Similarly, INo = {0,1,2,...}.

Basic assumptions regarding n-tuples. Apart from natural numbers and sets also pairs, triples, and in general n-tuples are taken as primitive notions. The ordered pair or simply pair of two constants x and y is denoted by ⟨x,y⟩. The basic property of pairs is that ⟨x,y⟩ = ⟨x′,y′⟩ if and only if x = x′ and y = y′. Often, in particular within formalism, ⟨x,y⟩ is defined as the following set,

P = {{x}, {x,y}}, (1.7)

a definition that is due to Kuratowski. But there are many variations on this theme. One of the alternatives is to let ⟨x,y⟩ be the set,

P′ = {{{x}, ∅}, {{y}}}. (1.8)

It is not difficult to show that both (1.7) and (1.8) satisfy the basic property of pairs, but clearly P ≠ P′. Why prefer (1.7) over (1.8)? In fact definitions of ⟨x,y⟩ as sets do too much. If ⟨x,y⟩ were equated to (1.7), for example, then {⟨x,x⟩} = {{{x}}}. This property is entirely accidental. See M.D. Potter [25], from which alternative (1.8) has been taken, for interesting details. For these reasons we follow Bourbaki’s decision to regard the pair as a primitive term.

The basic property of the triple ⟨x,y,z⟩ is, of course, that ⟨x,y,z⟩ = ⟨x′,y′,z′⟩ if and only if x = x′ and y = y′ and z = z′. Instead of regarding the triple as a primitive notion, it is often defined as the pair ⟨x,⟨y,z⟩⟩, although the obvious alternative is ⟨⟨x,y⟩,z⟩. Combining the first choice with Kuratowski’s definition the result would be that

⟨x,y,z⟩ = {{x}, {x, {{y}, {y,z}}}},

but why not let

⟨x,y,z⟩ = {{x}, {x,y}, {x,y,z}}?

Similar remarks can be made with respect to the n-tuple ⟨x1,...,xn⟩, given any n ∈ IN, n ≥ 4. For any n ≥ 1, xj is the j-th term of the n-tuple ⟨x1,...,xn⟩, j = 1,...,n.

Basic assumptions regarding generating new constants. Given any constant, assume the possibility to let a, possibly new, constant be generated by this constant, subject to the following two rules:

1) Identification. If the generated constant is not to be new it must be identified explicitly with a constant already known.
2) Equality. Equality of constants must be defined in such a way that all constants, new and old, satisfy the well-known rules of equivalence: x = x, x = y ⇒ y = x, and x = y ∧ y = z ⇒ x = z.

The character of a new constant is determined by these two rules, as well as by its intuitive interpretation. Anyway, a new constant is regarded as an urelement.

The axiom of choice. Assume that the axiom of choice holds true, i.e. that given any infinite set, whose elements are nonempty sets, it is possible to select an element from each of these elements.

As will be illustrated in Section 1.16, this axiom is of a highly nonconstructive character, but it will only be used to establish the existence of free ultrafilters.

1.14 Basic definitions

A) Functions. Given two nonempty sets X and Y, a function (or mapping or map) f : X → Y is generated by a set G with elements of the form ⟨x,y⟩, x ∈ X, y ∈ Y, such that for all x ∈ X there is a unique y ∈ Y with ⟨x,y⟩ ∈ G. G is called the graph of the function f. Functions are regarded as new constants so that they are urelements, and two functions are equal if and only if their generating sets are equal. The intuitive meaning of a function is that of an assignment: a function assigns a certain y to a given x. As usual the relationship between x and y is written as y = f(x), or as x ↦ y, or as x ↦ f(x). In case X is a subset of IN, and n ∈ X, f(n) is often written as fn. Also the definitions of function value, domain, range, injective (or one-to-one), surjective (or onto), and bijective (or one-to-one onto) functions, as well as the inverse of a bijective function are as usual.

B) Sequences. If, for n ∈ IN, X = {1,...,n} (or X = {0,1,...,n}), a function f : X → Y is called a finite sequence; and if X = IN (or X = INo) then f is called an infinite sequence. The fj for j ∈ X are called the terms of the sequence. The usual notation of sequences is:

(f1,...,fn) or (f1,f2,...),

and similarly in case 0 ∈ X. Note that although intuitively there may not be much difference between ⟨x1,...,xn⟩ and (x1,...,xn), formally there is. Infinite sequences will play a crucial part in what is going to follow.

C) Kinds of sets. Given a set X, if there exists a bijection from X onto {1,2,...,n} for some n ∈ IN, then X is called a finite set, otherwise an infinite set. Alternative formulations in these cases are that X contains a finite number of elements or an infinite number of elements, respectively. If X is an infinite set, and if there exists a bijection from X to IN, then X is called a countably infinite or denumerably infinite set, or simply countable or denumerable. The Cartesian or direct product of two sets S and T is the set of pairs {⟨s,t⟩ : s ∈ S, t ∈ T}. The product is indicated by S × T. A similar definition can be given for n sets by means of n-tuples, n ≥ 3.

D) Sets representing n-tuples or functions. In what follows it will be convenient to relate n-tuples and functions in a one-to-one way to certain sets. The choice of these sets is completely arbitrary, as long as this requirement is satisfied. (If n-tuples and functions had been defined as sets themselves, this representation would be superfluous!) Given the n-tuple ⟨x1,...,xn⟩, let the set be,

{{x1}, {x1,x2}, ..., {x1,...,xn}},

and given the function f let the set be its graph G, i.e. the set that generates f. Then there exist bijections Tn and F such that,

⟨x1,...,xn⟩ = Tn({{x1}, {x1,x2}, ..., {x1,...,xn}}),

{{x1}, {x1,x2}, ..., {x1,...,xn}} = Tn^−1(⟨x1,...,xn⟩),

f = F(G), and G = F^−1(f).
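For n = 2 the representing set of part D) is {{x1},{x1,x2}}, which has the Kuratowski form mentioned in Section 1.13. The following Python sketch builds that set with frozensets and recovers the pair from it, which is what T2^−1 and T2 amount to. The names encode_pair and decode_pair are made up for the illustration, and Python tuples stand in for the primitive pairs. The sketch is restricted to pairs on purpose: for n ≥ 3 the sets as written can coincide when terms repeat (⟨x,x,y⟩ and ⟨x,y,y⟩ both give {{x},{x,y}}), so longer tuples would need some extra care.

```python
def encode_pair(x, y) -> frozenset:
    """Return the representing set {{x}, {x, y}} of the pair <x, y>."""
    return frozenset({frozenset({x}), frozenset({x, y})})


def decode_pair(p: frozenset) -> tuple:
    """Recover (x, y) from a set of the form {{x}, {x, y}}."""
    first_set = min(p, key=len)    # the element {x}
    (x,) = first_set
    union = frozenset().union(*p)  # {x, y}, or {x} when x = y
    rest = union - first_set
    y = next(iter(rest)) if rest else x
    return (x, y)


print(decode_pair(encode_pair(1, 2)))          # the pair <1, 2> is recovered
print(encode_pair(1, 2) == encode_pair(2, 1))  # the order of the terms matters
print(frozenset({encode_pair("x", "x")}) == frozenset({frozenset({frozenset({"x"})})}))
```

The last line checks the accidental identity {⟨x,x⟩} = {{{x}}} that arises when the pair is equated to its representing set.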

1.15 Filters

In Section 1.10, where infinitesimals were introduced by plausible reasoning, the conclusion was reached that a set of ‘acceptable’ subsets of IN was needed, satisfying the following rules:

1) IN is accepted,
2) accepted sets are infinite,
3) of any two disjoint sets whose union is equal to an accepted set, precisely one is accepted, and,
4) subsets of rejected subsets are rejected.

Such a set was called a free ultrafilter (over IN). Hence U is a free ultrafilter if:

1) IN ∈ U,
2) if Q ∈ U, then Q is infinite,
3) if Q = Q1 ∪ Q2 ∈ U, Q1 ∩ Q2 = ∅, then either Q1 ∈ U or Q2 ∈ U, but not both, and,
4) if Q ∉ U and S ⊆ Q, then S ∉ U.

These requirements are equivalent to the following more usual ones. U is a free ultrafilter (over IN) if:

1a) IN ∈ U,
2a) if Q ∈ U and if R ⊇ Q, then R ∈ U,
3a) if Q ∈ U and if R ∈ U, then Q ∩ R ∈ U,
4a) if Q ∈ U, then Q is infinite, and,
5a) if Q ⊆ IN, then either Q ∈ U or Q^c = IN − Q ∈ U, but not both.

Proof of the equivalence:

A. Let 1) to 4) hold. Then 1a) and 4a) hold trivially and 5a) follows immediately from 1) and 3). If Q ∈ U and R ⊇ Q, then Q^c ∉ U and R^c ⊆ Q^c, hence R^c ∉ U and R ∈ U, which proves 2a). Finally, if Q, R ∈ U, then Q^c ∉ U, hence Q^c ∩ R ∉ U by 4), hence Q ∩ R ∈ U by 3) applied to R = (Q ∩ R) ∪ (Q^c ∩ R), which proves 3a).

B. The proof of the converse is left as an exercise.

The terminology used rightly suggests that there are more general filters (over IN). Here are their definitions. Any set F of subsets of IN is called a filter (over IN) if:

1b) IN ∈ F,
2b) ∅ ∉ F,
3b) if Q ∈ F and if IN ⊇ R ⊇ Q, then R ∈ F,
4b) if Q ∈ F and if R ∈ F, then Q ∩ R ∈ F.

Note that 2b) and 4b) imply that if Q ∈ F then Q^c ∉ F, but this is not to say that either Q ∈ F or Q^c ∈ F. A filter is called free if all of its elements are infinite.

Remark: Another definition is: a filter is called free if the intersection of all of its elements is empty. The two definitions are equivalent for an ultrafilter, but not for every filter.

Finally, a filter F is called an ultrafilter if for any Q ⊆ IN, either Q ∈ F or Q^c ∈ F, but not both.
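The filter and ultrafilter requirements can be checked by brute force when IN is replaced by a small finite set M. The Python sketch below does this; all names in it are hypothetical, and the example is illustrative only, since over a finite M no filter can be free.

```python
from itertools import combinations


def powerset(m):
    """All subsets of m, as frozensets."""
    items = sorted(m)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def is_filter(f, m) -> bool:
    """Check requirements 1b) to 4b), with IN replaced by the finite set m."""
    m = frozenset(m)
    subsets = powerset(m)
    return (
        m in f                                                   # 1b)
        and frozenset() not in f                                 # 2b)
        and all(r in f for q in f for r in subsets if q <= r)    # 3b)
        and all((q & r) in f for q in f for r in f)              # 4b)
    )


def is_ultrafilter(f, m) -> bool:
    """A filter such that for every Q either Q or its complement belongs to it, but not both."""
    m = frozenset(m)
    return is_filter(f, m) and all((q in f) != ((m - q) in f) for q in powerset(m))


M = {1, 2, 3, 4}
principal = {q for q in powerset(M) if 2 in q}  # all subsets that contain 2
coarse = {frozenset(M)}                         # the filter consisting of M alone

print(is_filter(principal, M), is_ultrafilter(principal, M))
print(is_filter(coarse, M), is_ultrafilter(coarse, M))
```

The collection of all subsets containing a fixed point passes both tests, while the filter {M} is a filter but not an ultrafilter: for Q = {1} neither Q nor its complement belongs to it.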

Nonfree filters are not very interesting for our subject, but free filters and free ultrafilters are. An example of a free filter (that is not an ultrafilter) is the set Fo = {Q : [∃k ∈ IN : Q ⊇ Qk]} with Qk = {i : i ≥ k}. This filter, which is called the Fréchet filter, can be used for the definition of converging sequences, and hence for that of infinitesimals as suggested by Chwistek (see Section 1.9). For nonstandard analysis the really important filters are the free ultrafilters, because with a filter like Fo only an incomplete kind of nonstandard analysis can be developed, as will be shown later on in Section 4.4.

Filters over a set M different from IN can also be introduced. Then all that is needed is to replace IN in the definitions above by M. Below this will only happen in the next section, where M is an infinite subset of IN.

Exercise: Show that if Q = R1 ∪ R2 ∪ ... ∪ Rn ∈ F, for some n ∈ IN, Ri ∩ Rj = ∅ if i ≠ j, and if F is an ultrafilter, then Ri ∈ F for precisely one i ∈ {1,2,...,n}.

Exercise: Show that if F is an ultrafilter, and if M ∈ F, then an ultrafilter G over M is induced by F and M, if we let G = {Q : Q ⊆ M, Q ∈ F}. Also show that G = {Q ∩ M : Q ∈ F}.

Exercise: Show that if F is a free ultrafilter, if Q ∈ F and if qi ∈ Q, i = 1,2,...,n, n ∈ IN, then Q − {q1,...,qn} ∈ F.

Most likely, no completely specified examples of free ultrafilters can ever be given. Evidence for this statement is given in the next section. But if the axiom of choice is invoked their existence can nevertheless be shown.

Theorem 1.15.1 Free ultrafilters over any infinite set exist.

Proof: The proof is given in the appendix.

It is only in this proof that the axiom of choice is required in the version of nonstandard analysis that is presented in this book, assuming, of course, that when nonstandard analysis is used to prove a classical theorem, the axiom is not necessary in a classical proof of that theorem.

Since there are many free ultrafilters, it is necessary to select one. The selection is completely arbitrary, or rather the selection must of necessity be extremely arbitrary: apart from a tiny little bit we will not know which free ultrafilter we are really dealing with, so that, in fact, the term ‘selection’ is not quite adequate. This is a consequence of the highly nonconstructive character of the axiom of choice.

From now on assume that U is some free ultrafilter (over IN) and that it is fixed once and for all.

Since U is not known, it is not known either whether the infinitesimal s = H(+1, −1/2, +1/3, −1/4, ...), that is generated by the infinite sequence (+1, −1/2, +1/3, −1/4, ...), is positive or negative. For if Q = {1, 3, 5, ...} ∈ U then s > 0, and if Q^c = {2, 4, 6, ...} ∈ U then s < 0, but we do not know whether or not Q is in U. Surprisingly enough, this and similar dichotomies will not hurt at all, simply because the final answers are independent of the underlying filter.

1.16 About the nature of free ultrafilters

As already mentioned there are many free ultrafilters U (over IN). Actually their number is uncountable. Although this fact is not very important for what is going to follow, so that the next argument may be skipped, a proof of this statement is given here, because it may give some more insight into the nature of free ultrafilters, in particular as far as nonconstructability is concerned.

First of all the next theorem is needed.

Theorem 1.16.1 Let M be any infinite subset of IN, then there is a free ultrafilter U over IN such that M ∈ U.

Proof: Let D = IN − M, then if Q ⊆ IN then Q = Q′ ∪ R for unique Q′ ⊆ M and R ⊆ D. Let U′ be a free ultrafilter over M (which exists by Theorem 1.15.1), and let U be the set of all Q such that Q′ ∈ U′. Then U is a free ultrafilter over IN, and M ∈ U. The verification that U indeed is a free ultrafilter is left as an exercise.

From this theorem it follows that if a free ultrafilter U over IN is wanted, we can require beforehand that either Q0 = {1,3,5,...} ∈ U or Q1 = {2,4,6,...} ∈ U, for simply let M = Q0 or M = Q1. This is also true for the other Q’s that will be introduced. If we require beforehand that Q0 ∈ U, then we can also require beforehand that either Q00 = {1,5,9,...} ∈ U or Q01 = {3,7,11,...} ∈ U, and if we require beforehand that Q1 ∈ U, then we can also require beforehand that either Q10 = {2,6,10,...} ∈ U or Q11 = {4,8,12,...} ∈ U. These four cases can be split up into eight new cases in total, which in turn can be split up into sixteen new cases, and so on. Taking the n-th step, 2^n new Q’s are added, each with n indices, each of which is 0 or 1. Apparently, we can require beforehand that, e.g. Q0, Q00, Q000, Q0000, ... are all filter elements, but we can just as well replace this sequence of selections by Q0, Q01, Q010, Q0101, ..., and so on. Clearly, if, say, Qabcd ∈ U, the next Q is Qabcde with e = 0 or e = 1. It follows that each infinite sequence of 0’s and 1’s defines an infinite sequence of selections, and vice versa. Different infinite sequences of 0’s and 1’s lead to different filters, because two sequences that first differ at some step select disjoint Q’s there, and two disjoint sets cannot both belong to U. Since the set of all these sequences is uncountable, it follows that there are uncountably many free ultrafilters over IN, as claimed.
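The successive splittings can be sketched in Python. The text lists the Q’s for the first two steps only; the sketch assumes that the same pattern continues, i.e. that each Q, being an arithmetic progression, is split into its 1st, 3rd, 5th, ... elements (index 0 appended) and its 2nd, 4th, 6th, ... elements (index 1 appended). The function names are invented for the illustration.

```python
def progression(bits: str) -> tuple:
    """Return (start, step) of the arithmetic progression Q with the given index string."""
    start, step = 1, 1  # the empty index string stands for IN = {1, 2, 3, ...}
    for b in bits:
        if b not in ("0", "1"):
            raise ValueError("indices must be 0 or 1")
        if b == "1":
            start += step  # keep the 2nd, 4th, 6th, ... elements
        step *= 2          # either half has twice the spacing
    return start, step


def first_elements(bits: str, k: int) -> list:
    """The first k elements of the Q with the given index string."""
    start, step = progression(bits)
    return [start + j * step for j in range(k)]


for bits in ("0", "1", "00", "01", "10", "11", "0101"):
    print("Q" + bits, first_elements(bits, 3))
```

For the index strings 0, 1, 00, 01, 10 and 11 this reproduces the sets listed above; after n steps the step width is 2^n, so the 2^n sets of that step are pairwise disjoint.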

Note that selecting just one infinite sequence of zeros and ones would not specify U completely. Far from it, because if the sequence is, say, (0,1,0,1,...), so that Q0, Q01, Q010, Q0101, ... all are in U, then each of these Q’s can be split up into two, four, ... arbitrary infinite subsets and at each split a choice has to be made as far as the membership of U is concerned.

The conclusion must be that free ultrafilters cannot be completely constructed and that this is due to the fact that the axiom of choice cannot be dispensed with. Nevertheless, as already indicated in the previous section, classical mathematics developed by means of nonstandard analysis can be kept free from this axiom, if desired.

Chapter 2

Basic theory

2.1 Reviewing the introduction of ZZ, Q and IR

A) Integers. Each integer is generated by a pair of natural numbers. If ⟨m,n⟩ is such a pair, the integer generated by it is indicated by Z(m,n). ZZ is the set of all Z(m,n).

Equality. Z(m,n) = Z(p,q) if and only if m ≥ n, p ≥ q and m−n = p−q, or m < n, p < q and n−m = q−p.

Exercise: Show that this equality relation satisfies the rules of equivalence.

Identification. For all m ∈ IN, let Z(m,0) = m. The more usual notation instead of Z(m,n) is, of course, say, q in case m ≥ n and then q = m−n, or −q in case m < n and then q = n−m.

B) Rationals. Each rational is generated by a pair ⟨m,n⟩ of integers with n ≠ 0. Let Q(m,n) be the rational generated by ⟨m,n⟩. Q is the set of all Q(m,n).

Equality. Q(m,n) = Q(p,q) if and only if mq = np.

Exercise: Verify again the rules of equivalence.

Identification. For all k ∈ ZZ, let Q(k,1) = k. Again the usual notation is different: |m|/|n| or −|m|/|n| instead of Q(m,n), depending on whether mn ≥ 0 or mn < 0, preferably such that m and n have no common divisors.

C) Reals. Each real is generated by a Cauchy sequence of rationals, i.e. an infinite sequence (r1,r2,...) of rationals rn, such that,

∀m ∈ IN : ∃k ∈ IN : ∀n,p ∈ IN, n,p > k : |rn − rp| < 1/m.

Let the real generated by (r1,r2,...) be indicated by R(r1,r2,...). IR is the set of all R(r1,r2,...).
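The two equality rules under A) and B) can be tried out with a short Python sketch. Generating pairs are modelled as plain tuples, and the helper names (z_equal, q_equal, is_equivalence) are hypothetical. The brute-force check over a small range is only an illustration and does not replace the exercises.

```python
def z_equal(a, b):
    """Equality of Z(m,n) and Z(p,q), exactly as stated under A)."""
    (m, n), (p, q) = a, b
    return (m >= n and p >= q and m - n == p - q) or \
           (m < n and p < q and n - m == q - p)


def q_equal(a, b):
    """Equality of Q(m,n) and Q(p,q) as stated under B): mq = np."""
    (m, n), (p, q) = a, b
    if n == 0 or q == 0:
        raise ValueError("second member of a generating pair must be nonzero")
    return m * q == n * p


def is_equivalence(pairs, equal):
    """Brute-force test of reflexivity, symmetry and transitivity on `pairs`."""
    reflexive = all(equal(a, a) for a in pairs)
    symmetric = all(equal(b, a) for a in pairs for b in pairs if equal(a, b))
    transitive = all(equal(a, c)
                     for a in pairs for b in pairs for c in pairs
                     if equal(a, b) and equal(b, c))
    return reflexive and symmetric and transitive


natural_pairs = [(m, n) for m in range(5) for n in range(5)]
integer_pairs = [(m, n) for m in range(-3, 4) for n in range(-3, 4) if n != 0]

print(z_equal((5, 2), (7, 4)))       # both generate the integer 3
print(q_equal((1, 2), (-3, -6)))     # both generate the rational 1/2
print(is_equivalence(natural_pairs, z_equal))
print(is_equivalence(integer_pairs, q_equal))
```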

Equality. R(r1,r2,...) = R(s1,s2,...) if and only if (r1,r2,...) and (s1,s2, ...) are concurrent Cauchy sequences of rationals, i.e. if

∀m ∈ IN : ∃k ∈ IN : ∀n ∈ IN, n > k :| rn −sn |< 1/m.

The verification of the rules of equivalence is now somewhat more involved. Identification. For all r ∈Q, let R(r,r,...) = r. Once again the usual notation is different, although this time there is no simple notational rule as there was for integers and rationals. Examples are √2, π, e, etc., etc.

In all three cases, A) to C), operations such as + and simple relations such as ≤ can easily be defined for the new numbers by means of the corresponding operations and simple relations that are known for the terms of the sequences generating them. In some, but certainly not all, cases the definitions can be given ‘term by term’, such as + and ≤ for the reals:

R(r1,r2,...) + R(s1,s2,...) = R(r1 + s1,r2 + s2,...),

and,

R(r1,r2,...) ≤ R(s1,s2,...) if r1 ≤ s1, r2 ≤ s2,...

It would be wrong, however, to define < and > for reals in this way! A simple counterexample is,

?? 0 = R(0,0,...) < R(1,1/2,1/3,...) = 0, or 0 < 0 ??
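The counterexample can be checked numerically on an initial segment. The sketch below uses exact rationals; the bounds N and M are arbitrary choices, and a finite check of course only illustrates the statement, it proves nothing.

```python
from fractions import Fraction


def r(n):
    """n-th term of the sequence (0, 0, 0, ...)."""
    return Fraction(0)


def s(n):
    """n-th term of the sequence (1, 1/2, 1/3, ...)."""
    return Fraction(1, n)


N, M = 200, 50  # arbitrary bounds for this finite illustration

# Term by term, r_n < s_n holds everywhere on the segment ...
termwise_less = all(r(n) < s(n) for n in range(1, N + 1))

# ... and yet the two sequences are concurrent: with k = m,
# |r_n - s_n| < 1/m for all n > k, so they generate the same real, 0.
concurrent = all(abs(r(n) - s(n)) < Fraction(1, m)
                 for m in range(1, M + 1)
                 for n in range(m + 1, N + 1))

print(termwise_less, concurrent)
```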

Exercise: Present the right definitions. More generally, present the definitions for +, −, ∗, /, |·| for integers, rationals and reals, where appropriate.

The gain of all this is, of course, that apart from the fact that any two numbers of some kind have a sum and a product of the same kind, in addition any two integers have a difference that is an integer, any two rationals have a difference and (provided the divisor is nonzero) a quotient that are rationals, any two reals have a difference and (provided the divisor is nonzero) a quotient that are reals, and any Cauchy sequence of reals has a limit that is a real (the proof of the last statement is not entirely trivial).

From their definitions it follows that any integer, any rational and any real is an urelement.

2.2 Introducing internal constants; definition of equality

The collection of all constants of nonstandard analysis can be determined in two steps, starting from the collection of all classical constants. First of all generate constants by means of infinite sequences of classical constants and add them to the collection if they are new, i.e. if they are not identified with a classical constant. The generation of these constants will be explained in detail below. Then add new constants (sets, pairs, triples, ..., functions) to the collection that can be defined by means of what is in the collection so far in the same way as this is done in classical mathematics (for example, add the set of all infinitesimals).

Every constant of nonstandard analysis is either internal or external. The internal constants should be regarded as the decent members of the society of nonstandard analysis, and the external constants as its outcasts, because the behaviour of the former is like that of similar classical constants, but that of the latter may be strange, unexpected and counterintuitive, which is why in internal set theory (see Section 1.9) by set automatically is meant internal set, and an external set is not a set at all. Here we do not go that far: any set is either internal or external, and in the latter case still is a set. An example of an external set is IN.

Every internal constant is either standard or not. Standard constants are characterized by a close relationship to classical constants, and some of them are even identified with the latter. Every internal constant that is not standard and every external constant is nonstandard (see the diagram in Section 1.6). The internal constants are introduced first, then follow the standard constants and the external constants.

Remark: The terms ‘standard’, ‘internal’ and ‘external’ have been taken from Nelson’s internal set theory.

Recall that a free ultrafilter U over IN has been ‘selected’ once for all, but that so far the definition of new constants did not involve U, i.e. so far only classical constants have been considered.

Now let each infinite sequence (s1,s2,...) of classical constants si generate an internal constant or hyperconstant H(s1,s2,...), which will often be abbreviated to H(si). After the development of some theory, most of the time internal constants can be used without a generating sequence of classical constants being needed. This is entirely analogous to the generation of the reals R(r1,r2,...) by means of Cauchy sequences of rationals ri. Most of the time one can use, say, √2 without needing some Cauchy sequence tending to √2.

If all si are numbers, also H(si) will be regarded as a number (as it may not be a set in the present theory, it is not so easy to say what a number really is, all that can be done is tell who is a number). Similarly, if all Si are sets, also H(Si) will be a set; if all pi are pairs, also H(pi) will be a pair (and similarly for n-tuples), and if all fi are functions, also H(fi) will be a function. Hence hypersets will be sets, hyperpairs will be pairs (and similarly for n-tuples), and hyperfunctions will be functions, in the classical sense of these terms, but hypernumbers will not necessarily be classical numbers.

The introduction of H(s1,s2,...) requires that the rules of identification and equality must be specified. The latter is the simpler of the two:

Definition of equality of internal constants:

H(s1,s2,...) = H(t1,t2,...) if and only if {i : si = ti} ∈ U.

This definition applies no matter whether the si and ti are urelements or not. It implies that it is pointless to let the si be of two different kinds, say, numbers and pairs of numbers. For example, suppose that s2i−1 ∈ IR, and s2i = ⟨pi,qi⟩, pi,qi ∈ IR. Then, since either {2i−1 : i ∈ IN} ∈ U or {2i : i ∈ IN} ∈ U, we can just as well assume that all si are reals, or that all si are pairs of reals, respectively. Making the changes such that this is true will change the presentation, but not the value of H(si). This follows immediately from the definition of equality given. The same argument can be applied if there were not two but three or more kinds of elements, as long as the number of kinds is finite. A case with an infinite number of kinds is where si is an i-tuple of reals. Another case is where si is a set of level i. In order to avoid difficulties such cases are to be avoided themselves. This is quite acceptable, because in practice they would seem to be rather fancy.

2.3 Identification of internal constants

Identification of internal numbers

A) If all si are numbers, then let,

H(s1,s2,...) = s if and only if {i : si = s}∈ U.

Hence H(3,3,3,...) is identified with 3, but H(1,2,3,...) is new as it is not identified with any classical number.
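As a small worked illustration of rule A), written in LaTeX notation (with \mathbb{N} for IN): the only fact used beyond the rule itself is that a free ultrafilter over IN contains no finite set and hence contains every set with a finite complement. Changing finitely many terms of a generating sequence therefore does not change the value, whereas H(1,2,3,...) cannot be identified with any classical number s.

```latex
\begin{align*}
\{\, i : s_i = 3 \,\} &= \mathbb{N} \setminus \{1,2\} \in U
  && \text{for } (s_1,s_2,s_3,\dots) = (0,0,3,3,3,\dots), \\
\text{hence}\quad H(0,0,3,3,3,\dots) &= H(3,3,3,\dots) = 3, \\
\{\, i : i = s \,\} &\in \bigl\{ \{s\},\, \emptyset \bigr\},
  \ \text{a finite set, so } \{\, i : i = s \,\} \notin U
  && \text{for every classical number } s, \\
\text{hence}\quad H(1,2,3,\dots) &\neq s .
\end{align*}
```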

Identification of internal sets

The identification of internal sets is based on the following result.

Theorem 2.3.1 Given infinite sequences (S1,S2,...) and (T1,T2,...) of sets Si and Ti, {H(si) : si ∈ Si} = {H(ti) : ti ∈ Ti} if and only if {i : Si = Ti}∈ U.

Proof: Let S = {H(si) : si ∈ Si}, T = {H(ti) : ti ∈ Ti} and Q = {i : Si = Ti}. If Q ∈ U, take H(si) ∈ S arbitrarily, then for all i, si ∈ Si, and for i ∈ Q, si ∈ Ti, hence H(si) ∈ T, because of the definition of equality, which means that S ⊆ T and similarly it follows that T ⊆ S, so that S = T. Conversely, if Q ∉ U then {i : Si ≠ Ti} ∈ U, hence either R = {i : si ∉ Ti for some si ∈ Si} ∈ U or {i : ti ∉ Si for some ti ∈ Ti} ∈ U, or both. Suppose R ∈ U (the other case is similar). If i ∈ R take si ∈ Si but such that si ∉ Ti, and if i ∉ R take si ∈ Si arbitrary. Then {i : si ∉ Ti} ⊇ R, so that {i : si ∉ Ti} ∈ U and H(si) ∉ T, because otherwise ∅ = {i : si ∉ Ti} ∩ {i : si ∈ Ti} ∈ U, a contradiction. Hence, indeed, H(si) ∉ T and S ≠ T.

Comparing this result with the definition of equality, it follows that H(Si) = H(Ti) if and only if S = T, where Si, Ti, S and T are as in the theorem and its proof. This suggests the following definition.

B) If all Si are sets, let, H(S1,S2,...) = {H(si) : si ∈ Si}, which obviously is a (classical) set, although its elements need not be classical.

Proof of the rules of equivalence for equality for internal numbers and internal sets

(1) H(si) = H(si) because {i : si = si} = IN ∈ U.

(2) If H(si) = H(ti) then H(ti) = H(si), because {i : si = ti} = {i : ti = si}.

(3) If H(ri) = H(si) and H(si) = H(ti), so that P = {i : ri = si} ∈ U and Q = {i : si = ti} ∈ U, then R = {i : ri = ti} ∈ U, because R ⊇ P ∩ Q, hence H(ri) = H(ti).

The next result takes care of some simple cases.

Theorem 2.3.2 If for all i, Si = {si}, then H({s1},{s2},...) = {H(s1,s2,...)}, or H({si}) = {H(si)}. Also, if for all i, Si = {si, s′i}, then H({si, s′i}) = {H(si), H(s′i)}, and in general, if for some n ∈ IN and all i, Si = {si1,...,sin}, then H({si1,...,sin}) = {H(si1),...,H(sin)}.

Proof: Left as an exercise.

In shorthand, this result may be written as: H({...}) = {H(...)}, i.e. the ‘operator’ H may be interchanged with set formation if all the terms Si of the generating sequence are sets with n elements, n ∈ IN. If all Si are infinite, the conclusion of the theorem no longer holds. As a counterexample, let Si = {si1,si2,...} = {1,2,...} = IN for all i. Then H(Si) contains H(1,2,3,...) = H(i), which is not contained in {H(si1),H(si2),...} = {H(1),H(2),...} = {1,2,...} = IN.

Exercise: What if Si = {1,...,i}?

Identification of internal n-tuples

Next consider H(pi) where all pi = ⟨xi,yi⟩ are pairs. It would be nice if H(⟨xi,yi⟩) too would be a pair, but is it? Now recall from Section 1.14 that,

⟨xi,yi⟩ = T2({{xi},{xi,yi}}),

so that applying Theorem 2.3.2 twice it follows that,

H(T2⁻¹(⟨xi,yi⟩)) = H({{xi},{xi,yi}}) = {H({xi}),H({xi,yi})} = {{H(xi)},{H(xi),H(yi)}} = T2⁻¹(⟨H(xi),H(yi)⟩).

Therefore,

T2(H(T2⁻¹(⟨xi,yi⟩))) = ⟨H(xi),H(yi)⟩,

from which it is clear that if T2 were replaced by the identity map, H(⟨xi,yi⟩) would be the pair ⟨H(xi),H(yi)⟩. Fortunately, H(⟨xi,yi⟩) = H(⟨x′i,y′i⟩) if and only if ⟨H(xi),H(yi)⟩ = ⟨H(x′i),H(y′i)⟩, because,

a) H(⟨xi,yi⟩) = H(⟨x′i,y′i⟩) if and only if {i : ⟨xi,yi⟩ = ⟨x′i,y′i⟩} ∈ U, and

b) ⟨H(xi),H(yi)⟩ = ⟨H(x′i),H(y′i)⟩ if and only if T2(H(T2⁻¹(⟨xi,yi⟩))) = T2(H(T2⁻¹(⟨x′i,y′i⟩))), hence if and only if {i : T2⁻¹(⟨xi,yi⟩) = T2⁻¹(⟨x′i,y′i⟩)} ∈ U, hence if and only if {i : ⟨xi,yi⟩ = ⟨x′i,y′i⟩} ∈ U.

For this reason, H(⟨xi,yi⟩) is identified with ⟨H(xi),H(yi)⟩. This means that the H operator may be interchanged with pairing, and that the internal pair or hyperpair H(⟨xi,yi⟩) is a (classical) pair, albeit with terms that need not be classical.

The same reasoning can be followed for internal or hyper n-tuples.

Identification of internal functions

Turning to functions, the next question is whether, given an infinite sequence (f1,f2,...) of functions fi, H(fi) can be identified with a function, and if so which one? Let fi : Xi → Yi, for certain sets Xi and Yi. Then fi is generated by the set Gi = {⟨xi,fi(xi)⟩ : xi ∈ Xi}, with fi(xi) ∈ Yi. Also fi = F(Gi), see again Section 1.14. Hence,

H(F⁻¹(fi)) = H(Gi) = H({⟨xi,fi(xi)⟩ : xi ∈ Xi}).

Again interchanging H with set formation, this becomes,

{H(⟨xi,fi(xi)⟩) : H(xi) ∈ H(Xi)},

and interchanging H with pairing,

{⟨H(xi),H(fi(xi))⟩ : H(xi) ∈ H(Xi)},

which set clearly generates a function g from H(Xi) to H(Yi), because H(fi(xi)) ∈ H(Yi), and it follows that g = F(H(F⁻¹(fi))). Fortunately, H(fi) = H(f′i) if and only if g = g′, with g′ = F(H(F⁻¹(f′i))), because,

a) H(fi) = H(f′i) if and only if {i : fi = f′i} ∈ U, and

b) g = g′ if and only if H(F⁻¹(fi)) = H(F⁻¹(f′i)), hence if and only if {i : F⁻¹(fi) = F⁻¹(f′i)} ∈ U, hence if and only if {i : fi = f′i} ∈ U.

Therefore, H(fi) is identified with g, so that the internal function or hyperfunction H(fi) is a classical function from the internal set H(Xi) to the internal set H(Yi), and,

H(fi)(H(xi)) = H(fi(xi)).

Summarizing, we have:

C) For n-tuples: let H(⟨xi1,...,xin⟩) = ⟨H(xi1),...,H(xin)⟩.

D) For functions: given fi : Xi → Yi, let H(fi) : H(Xi) → H(Yi), with H(fi)(H(xi)) = H(fi(xi)).

Proof of the rules of equivalence for equality in the case of n-tuples and functions: In the case of n-tuples, this follows from the identification for n-tuples and the fact that, if n = 2, H(⟨xi,yi⟩) = H(⟨x′i,y′i⟩) if and only if ⟨H(xi),H(yi)⟩ = ⟨H(x′i),H(y′i)⟩, and in the case of functions, this follows from the identification for pairs and functions and the fact that,

H(fi) = H(f′i) if and only if g = g′.

The details are left as an exercise.

If for each i the function gi maps Wi to Xi and the function fi maps Xi to Yi, so that the composition fi◦gi is well-defined, then the corresponding internal composition H(fi◦gi) is the (classical) composition of H(fi) and H(gi), because,

H(fi)(H(gi)(H(wi))) = H(fi)(H(gi(wi))) = H(fi(gi(wi))), wi ∈ Wi.
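The term-by-term character of the rules for pairs, functions and compositions can be made concrete with a small Python sketch. An internal constant is represented here only by its generating sequence, modelled as a callable i ↦ si; equality of internal constants depends on the ultrafilter U and is therefore not computed, only initial terms are compared. All names (h_pair, h_apply, h_compose, first_terms) and the example data are hypothetical.

```python
def h_pair(x, y):
    """Generating sequence of H(<x_i, y_i>), identified with <H(x_i), H(y_i)>."""
    return lambda i: (x(i), y(i))


def h_apply(f, x):
    """Generating sequence of H(f_i)(H(x_i)) = H(f_i(x_i))."""
    return lambda i: f(i)(x(i))


def h_compose(f, g):
    """Generating sequence of H(f_i o g_i), built term by term."""
    return lambda i: (lambda w: f(i)(g(i)(w)))


def first_terms(s, n):
    """The terms s_1, ..., s_n of a generating sequence."""
    return [s(i) for i in range(1, n + 1)]


# Hypothetical example data: x_i = i, f_i(v) = v + i, g_i(w) = 2w.
x = lambda i: i
f = lambda i: (lambda v: v + i)
g = lambda i: (lambda w: 2 * w)

print(first_terms(h_apply(f, x), 4))             # terms of H(f_i(x_i))
print(first_terms(h_pair(x, h_apply(f, x)), 3))  # terms of <H(x_i), H(f_i(x_i))>

# Composition: H((f_i o g_i)(x_i)) and H(f_i)(H(g_i)(H(x_i))) have the same terms.
lhs = first_terms(h_apply(h_compose(f, g), x), 4)
rhs = first_terms(h_apply(f, h_apply(g, x)), 4)
print(lhs == rhs)
```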

Identification of internal sequences

Just as sequences are special functions, internal sequences are special internal functions. Suppose, given some n ∈ IN, that each fi is a sequence (fi(1),...,fi(n)) with n terms. (For more clarity, their arguments are not indicated by subscripts, but are between parentheses.) Then H(fi) too is a sequence with n terms, because the domain of H(fi) is equal to the domain of each fi, i.e. {1,...,n}. Why? The j-th term of this sequence is,

H(fi)(j) = H(fi)(H(j)) = H(fi(j)).

Hence in this case the internal sequence is a classical sequence with n internal terms. But if n depends on i this is not true in general. For if the i-th sequence is (fi(1),...,fi(ni)), then the argument or index of H(fi) takes the form H(ji) = H(j1,j2,...) and hence is an internal natural number that need not be a classical natural number. Still it is meaningful to speak of the H(ji)-th term of the internal sequence, and this term is equal to H(fi)(H(ji)) = H(fi(ji)), ji = 1,...,ni.

Example: Let ni = i and fi(ji) = i + ji, then the term with index H(ji) = H(j1,j2,...), ji = 1,...,i, is H(1 + j1,2 + j2,...). Taking all ji = 1 this gives that H(2,3,...) is the first term. The n-th term is found by taking ji = n for all i ≥ n, and the result is H(?,...,?,2n,2n + 1,...), where the n−1 question marks may be replaced by any numbers, as their values are irrelevant to the value of H(?,...,?,2n,2n+1,...). The obvious choice leads to H(n+1,...,2n− 1,2n,2n + 1,...). And the term corresponding to index H(i) = H(1,2,3,...) is H(2i) = H(2,4,6,...).

Suppose now that all fi are infinite sequences. Then the internal sequence is not a classical sequence; its argument is H(ji) and the H(ji)-th term is equal to H(fi)(H(ji)) = H(fi(ji)), ji = 1,2,....

Example: Let fi(ji) = i² + ji³, then the term with index H(ji) = H(j1,j2,...) is H(1² + j1³, 2² + j2³, ...), so that the first term is H(2,5,10,...) and the second one is H(9,12,17,...). And the term corresponding to index H(i) = H(1,2,3,...) is H(i² + i³) = H(2,12,36,...).
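The numbers in the two examples can be reproduced with a short Python sketch. Again an internal constant is represented only by the first few terms of a generating sequence; the helper names (term, constant, diagonal) are hypothetical.

```python
def term(f, index, n_terms):
    """First n_terms members of the generating sequence of H(f_i(j_i)).

    f(i, j) is the j-th term of the i-th classical sequence, and index(i)
    is j_i, the i-th member of the sequence that generates the index H(j_i).
    """
    return [f(i, index(i)) for i in range(1, n_terms + 1)]


def constant(j):
    """Generating sequence (j, j, j, ...) of the standard index *j."""
    return lambda i: j


def diagonal(i):
    """Generating sequence (1, 2, 3, ...) of the index H(i)."""
    return i


f_first = lambda i, j: i + j            # first example:  f_i(j) = i + j
f_second = lambda i, j: i**2 + j**3     # second example: f_i(j) = i^2 + j^3

print(term(f_first, constant(1), 3))    # [2, 3, 4]
print(term(f_first, diagonal, 3))       # [2, 4, 6]
print(term(f_second, constant(1), 3))   # [2, 5, 10]
print(term(f_second, constant(2), 3))   # [9, 12, 17]
print(term(f_second, diagonal, 3))      # [2, 12, 36]
```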

2.4 Standard constants; basic results for internal constants

Any constant that is equal to H(s) = H(s,s,...) for some classical constant s is called standard. This does not mean that it needs to be a constant of ‘standard’, that is to say classical, mathematics. It only means that it is closely related to some classical constant, although in a number of cases it is indeed equal to such a constant. Obviously, there is a bijection between the set of all classical constants and the set of all standard constants, whereby s is mapped to H(s). In order to emphasize this special relationship, instead of H(s) = H(s,s,...) the usual notation will be ∗s, and ∗s is called the ∗-transform of s. So, in case H(s) is identified with s, it follows that ∗s = s, so that, for example, ∗5 = 5, hence 5 is a standard constant, but a function f : IR → IR is not, because ∗f : ∗IR → ∗IR and ∗IR ⊃ IR. The strict inclusion here follows from the fact that, for example, H(1,2,3,...) ∉ IR and H(1,1/2,1/3,...) ∉ IR, as can easily be shown by an indirect argument and using the identification rules. Nevertheless, if x ∈ IR, then ∗f(x) = f(x), so that ∗f is an extension of f.

Exercise: Show this.

Obviously, a constant is internal if it is an element of some internal set. Even the following is true.

Theorem 2.4.1 A constant is internal if and only if it is an element of some standard set.

Proof: The if-part is obvious. Conversely, given any internal constant H(si) = H(s1,s2,...), let S = {s : s = si for some i}, then H(si) ∈ ∗S.

Theorem 2.4.2 Let S be a classical set such that ∗s = s for each s ∈ S. Then ∗S = S if and only if S is finite. Otherwise S ⊂ ∗S.

Proof: Since ∗s = H(s) = s if s ∈ S, ∗S = H(S) = {H(si) : si ∈ S} ⊇ S. If S is finite then any H(si) = s for some s ∈ S, hence ∗S ⊆ S, and if S is infinite then there exist s1,s2,..., all different, so that H(si) ∉ S. As an example let S be a set of numbers.
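The two steps of this proof that rely on properties of U can be spelled out as follows, in LaTeX notation. The sketch assumes two standard facts about a free ultrafilter U over IN (written \mathbb{N}): if a finite union of sets belongs to U then at least one of them does, and no finite set belongs to U.

```latex
\begin{align*}
&\text{Finite case, } S = \{a_1,\dots,a_n\},\ s_i \in S \text{ for all } i: \\
&\qquad A_k = \{\, i : s_i = a_k \,\}, \qquad
  A_1 \cup \dots \cup A_n = \mathbb{N} \in U, \qquad
  A_k \cap A_l = \emptyset \ (k \neq l), \\
&\qquad \text{so exactly one } A_k \in U, \text{ and } H(s_i) = a_k \in S. \\[4pt]
&\text{Infinite case, all } s_i \text{ different}: \\
&\qquad \{\, i : s_i = s \,\} \text{ has at most one element for every } s \in S, \\
&\qquad \text{so } \{\, i : s_i = s \,\} \notin U
  \text{ and } H(s_i) \neq s \text{ for every } s \in S,
  \text{ i.e. } H(s_i) \notin S .
\end{align*}
```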

 

Corollary 2.4.1 Let S be a classical set, such that each of its elements is a classical set of which each element is equal to its ∗-transform. Then ∗S = S if and only if S is finite and all its elements are finite. A similar result holds if S is of any level (see Section 1.13 for a definition).

Proof: Left as an exercise. In general, it may happen that S is not contained in ∗S. Examples are where S = {IR}, because then ∗S = {∗IR}, so that IR is not an element of ∗S, or where S is a set of functions from IR to IR.

Now let f be some function from a set of numbers to another set of numbers. When is ∗f = f? If this is true then the domains of both functions must be equal and the theorem tells us when this is the case. Conversely, if both domains are equal and hence are finite, then if X is their common domain it follows that ∗f(x) = f(x) for all x ∈ X, so that if f is a function from X onto some set Y, then Y must be finite, ∗Y = Y and ∗f too is a function from X onto Y. This can be generalized as follows.

Corollary 2.4.2 Let f be a function from X onto Y , and assume that X is finite, that ∗x = x for all x ∈ X and ∗y = y for all y ∈ Y . Then ∗X = X, ∗Y = Y and ∗f = f.

Corollary 2.4.3 Let g be a function from W to X, let f be a function from X onto Y , and assume that X is finite, that ∗x = x for all x ∈ X and that ∗y = y for all y ∈ Y . Then, with wi, w ∈ W, ∗(f◦g)(H(wi)) = (f◦∗g)(H(wi)) and ∗(f◦g)(∗w) = (f◦g)(w),

even if W is not finite.

Proof: It follows that ∗X = X, ∗Y = Y and that ∗f = f. Since ∗(f◦g) = ∗f◦∗g, the first equality follows immediately, and since ∗g(∗w) = ∗(g(w)) = g(w) also the second one follows quickly.

A number of useful results has been summarized below.

a) (Empty set) ∗∅ = ∅.

b) (Relations for sets) Given nonempty sets Si, S, Ti and T,
H(Si) = H(Ti) if and only if {i : Si = Ti} ∈ U,
H(Si) ≠ H(Ti) if and only if {i : Si ≠ Ti} ∈ U,
H(si) ∈ H(Si) if and only if {i : si ∈ Si} ∈ U,
H(Si) ⊆ H(Ti) if and only if {i : Si ⊆ Ti} ∈ U;
in particular,
∗S = ∗T if and only if S = T,
∗S ≠ ∗T if and only if S ≠ T,
∗s ∈ ∗S if and only if s ∈ S,
∗S ⊆ ∗T if and only if S ⊆ T,
and similarly for ⊂, ⊇ and ⊃.

c) (Operations on sets) Given sets Si, S, Ti and T,
H(Si ∪ Ti) = H(Si) ∪ H(Ti),
H(Si ∩ Ti) = H(Si) ∩ H(Ti),
H(Si − Ti) = H(Si) − H(Ti),
if Si ⊆ Ti then H(Siᶜ) = (H(Si))ᶜ, where taking the complement is with respect to Ti and H(Ti), respectively;
in particular,
∗(S ∪ T) = ∗S ∪ ∗T,
∗(S ∩ T) = ∗S ∩ ∗T,
∗(S − T) = ∗S − ∗T,
if S ⊆ T then ∗(Sᶜ) = (∗S)ᶜ, where taking the complement is with respect to T and ∗T, respectively.

d) (Pairs) H(⟨si,ti⟩) = ⟨H(si),H(ti)⟩; in particular, ∗⟨s,t⟩ = ⟨∗s,∗t⟩, and similar equalities hold for n-tuples, n = 3,4,....

e) (Functions) H(fi(xi)) = H(fi)(H(xi)); in particular, ∗(f(x)) = ∗f(∗x).

f) (Composite functions) H((fi◦gi)(wi)) = (H(fi)◦H(gi))(H(wi)); in particular, ∗[(f◦g)(w)] = (∗f◦∗g)(∗w).

Most of these relationships are easily shown or even follow directly from definitions. As far as a) is concerned, apply the definition of ∗S with S = ∅. Then it appears that no H(si) can be found, so that ∗S must be empty, hence, ∗∅ = ∅. And the proof to show under b) that ∗s ∈ ∗S if and only if s ∈ S, follows from H(si) ∈ H(Si) if and only if {i : si ∈ Si} ∈ U, because this gives that ∗s ∈ ∗S if and only if {i : s ∈ S} ∈ U and this is true if and only if s ∈ S.

Remark: Obviously, not only, say, H(fi)(H(xi)) and ∗f(∗x), but also mixtures like ∗f(H(xi)) and H(fi)(∗x) are well-defined.

Remark: It is not true that if t ∈ ∗S, then t = ∗s for some s ∈ S. As a counterexample, let S = IN, and let t = H(1,2,3,...).

Remark: It has been made clear that the inclusion ∗S ⊇ S does not always hold, but it can be ‘saved’ by introducing σS, defined by σS = {∗s : s ∈ S}, because then trivially σS ⊆ ∗S. Moreover, there is a ‘natural’ bijection ϕ from S onto σS, defined by ϕ(s) = ∗s. σS is sometimes called the standard copy of S, but often it is not a standard set.
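As an illustration of how such relationships are shown, the union rule under c) can be derived from the membership rule under b), here written in LaTeX notation. The derivation assumes the ultrafilter property that a union of two sets belongs to U if and only if at least one of the two sets does; every element of either side is an internal constant of the form H(si).

```latex
\begin{align*}
H(s_i) \in H(S_i \cup T_i)
 &\iff \{\, i : s_i \in S_i \cup T_i \,\} \in U \\
 &\iff \{\, i : s_i \in S_i \,\} \cup \{\, i : s_i \in T_i \,\} \in U \\
 &\iff \{\, i : s_i \in S_i \,\} \in U \ \text{ or } \ \{\, i : s_i \in T_i \,\} \in U \\
 &\iff H(s_i) \in H(S_i) \ \text{ or } \ H(s_i) \in H(T_i) \\
 &\iff H(s_i) \in H(S_i) \cup H(T_i).
\end{align*}
```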

Remark: In another presentation of nonstandard analysis, ∗S is called the nonstandard version of S, but ∗S is still called standard, which leads to the confusing conclusion that the nonstandard version of a classical set is standard. In still another presentation of nonstandard analysis ∗S is called the natural extension of S, but as we have seen, ∗S is not always an extension of S.

So even the wording of nonstandard analysis is sometimes nonstandard.

2.5 External constants

Having defined internal and standard constants, let us now consider external constants. As already said before, an external constant is a constant that is not internal.

Theorem 2.5.1 If S is a classical infinite set of numbers, then S is external.

Proof: First notice that if T is any set of classical numbers, then ∗T ⊇ T, because if t ∈ T then ∗t = H(t) = t. This implies that if T ⊆ S, then S ∩∗T = T, for if s ∈ S ∩∗T then s = ∗s ∈ ∗T, so that s ∈ T, hence S ∩∗T ⊆ T. Since also T ⊆ S ∩∗T, it follows that S ∩∗T = T. Now if S would be internal, then,

S = H(Si) = {H(si) : si ∈ Si} for suitable sets Si. Since S is infinite, it contains a countably infinite set,

T = {tn : n ∈ IN}.

Let Ti = Si ∩ T, then,

H(Ti) = H(Si ∩ T) = H(Si) ∩ H(T) = S ∩ ∗T = T,

so that T too would be internal. Let Q = {i : Ti is infinite}, then Q ∉ U, as otherwise it is possible to select t′i ∈ Ti, if i ∈ Q, all different, so that, taking t′i arbitrarily if i ∉ Q, t′ = H(t′i) ∉ T, because {i : t′i = t′} contains at most one element and hence is not in U. But at the same time t′ ∈ T, because H(Ti) = T, a contradiction. Therefore, Qᶜ = {i : Ti is finite} ∈ U.

Let, for i ∈ Qᶜ, tk(i) be the element of Ti with largest index (if T4 would be {t6,t7,t13}, then k(4) would be 13), and let tk(i) be arbitrary if i ∉ Qᶜ, then H(tk(1),tk(2),...) ∈ T, so that H(tk(1),tk(2),...) = tk for some k ∈ IN, hence {i : tk(i) = tk} ∈ U, or R = {i : Ti contains at most k elements} ∈ U, because of the definition of k(i), so that tk+1 ∉ Ti for all i ∈ R, hence, tk+1 ∉ H(Ti) = T, whereas tk+1 ∈ T, a contradiction.

The theorem implies that, although we do not yet know much about ∗IN, ∗IR, etc., we can say already now that IN is an external subset of ∗IN, that IR is an external subset of ∗IR, etc.

Another consequence of the theorem is related to power sets. Let, given any classical set A, P(A) be the power set of A, defined by,

P(A) = {S : S ⊆ A},

or, in words, P(A) is the set of all subsets of A (including A itself and ∅). Instead of P(A) also the notation 2^A is used.

Exercise: Show that if A is a set of numbers, then ∗(P(A)) = P(A) if A is finite.

Now let A be a set of numbers and let us compare ∗(P(A)) with P(∗A). By definition,

∗(P(A)) = {H(Si) : Si ∈ P(A)} = {H(Si) : Si ⊆ A}, and P(∗A) = {T : T ⊆ ∗A}.

If T is any element of ∗(P(A)), then T = H(Si) for certain Si ⊆ A, hence T = {H(si) : si ∈ Si} ⊆ ∗A, so that T ∈ P(∗A). It follows that ∗(P(A)) ⊆ P(∗A), i.e. applying the ∗-transform at a lower level (A’s level) gives no less than at a higher level (P(A)’s level). When is this inclusion strict? If A is finite, then P(A) is finite too, hence ∗(P(A)) = P(A) = P(∗A), so that the inclusion is not strict. If A is infinite, then (by the last theorem) A is an external subset of ∗A, i.e. P(∗A) contains an element, namely T = A, that is not internal, whereas any element of ∗(P(A)) is internal, and it follows that the inclusion is strict. In other words:

Theorem 2.5.2 If A is a set of numbers, the inclusion ∗(P(A)) ⊆ P(∗A) always holds and is strict if and only if A is an infinite set, and if so, P(∗A) contains an element that is an external subset of ∗A.

Since in many cases A will be infinite (e.g. if A = IN or A = IR), one should be careful when dealing with P(∗A) and prefer to work with ∗(P(A)) instead, which is standard, so that all its elements are internal. This is the deeper reason behind the rule advocated from the beginning of this book to formulate statements by means of ∈, not by means of ⊂ or ⊆. When applying transfer later on it will almost be a must to stick to this rule.

2.6 The ∗-transform of operations and expressions

By operations are meant functions like taking the absolute value, addition, subtraction, multiplication, division, taking the complement, union, intersection, etc. And by expressions are meant what results when operations or other functions, or compositions thereof are applied to suitable constants or variables. The simplest expressions contain only one function, in particular one operation and no other functions, for example |x|, or x + y, or S ∪ T; more complicated expressions are, for example, {f(x) + |y|}·z², or (S ∪ T) ∩ (V ∪ W). (In certain computer languages expressions all of whose constants or variables are numbers are called arithmetic expressions.)

Since, when introducing the internal version H(&1,&2,...) of any classical operation &, only one operation is given, i.e. & itself, all &i must be taken equal to &, so that only the standard operation ∗& can be introduced. Since operations are functions, the definition of ∗& is straightforward. For example, standard addition in ∗IR becomes ∗+, defined by, H(si)∗+H(ti) = H(si + ti), given any H(si) and H(ti) in ∗IR. Since ∗+ is the only addition for hyperreals, the asterisk in ∗+ may be dropped, because from the context it will always be clear whether classical addition is meant or its ∗-transform ∗+. It should be kept in mind, however, that often the domains of + and ∗+ are different, so that strictly speaking the two operations are different. So the definition becomes,

H(si) + H(ti) = H(si + ti).

Whereas, by definition, internal addition is always standard, an internal sum need not, of course, be standard.

Standard subtraction, multiplication, division, taking the absolute value etc. are defined in exactly the same way, except that the divisor of a division must be nonzero. This extra condition can be met in the following two ways. Let H(ti) be the divisor, then H(ti) ≠ 0 if and only if {i : ti ≠ 0} ∈ U. Now either each ti that is zero is changed to an arbitrary nonzero number, say, 1, which has no effect on the value of H(ti), and the definition becomes,

H(si)/H(ti) = H(si/ti),

or the ti that are zero are left unchanged, and the definition becomes, H(si)/H(ti) = H(ri), with ri = si/ti if ti 6= 0 and ri arbitrary otherwise. With regard to other operations similar refinements can be formulated, if necessary.
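The second way of defining the quotient can be sketched in Python as follows. Generating sequences are modelled as callables i ↦ si with exact rational values; the names h_add, h_div and the parameter filler (the arbitrary value used where ti = 0) are hypothetical. In the example the divisor is zero only at i = 1 and i = 2, so {i : ti ≠ 0} has a finite complement and therefore belongs to every free ultrafilter U.

```python
from fractions import Fraction


def h_add(s, t):
    """Generating sequence of H(s_i) + H(t_i) = H(s_i + t_i)."""
    return lambda i: s(i) + t(i)


def h_div(s, t, filler=Fraction(1)):
    """Generating sequence r_i of H(s_i)/H(t_i).

    r_i = s_i / t_i if t_i != 0, and r_i = filler (arbitrary) otherwise.
    """
    return lambda i: s(i) / t(i) if t(i) != 0 else filler


def first_terms(s, n):
    """The terms s_1, ..., s_n of a generating sequence."""
    return [s(i) for i in range(1, n + 1)]


s = lambda i: Fraction(1)
t = lambda i: Fraction(0) if i <= 2 else Fraction(i)  # zero only for i = 1, 2

q1 = first_terms(h_div(s, t), 5)
q2 = first_terms(h_div(s, t, filler=Fraction(7)), 5)
print(q1)
print(q2)
# The two choices of filler differ only at i = 1, 2, i.e. on a finite set,
# so both sequences generate the same internal number.
print(q1[2:] == q2[2:])
print(first_terms(h_add(s, t), 4))
```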

Exercise: Consider the composition of two functions.

Another group of operations is formed by the set operations of forming the union or the intersection or the difference of sets, or the complement of a set, hence ∪, ∩, −, and c. Again only their standard forms can be introduced. But whereas, say, addition has no meaning for the hyperreals, unless it is explicitly introduced for them, the classical versions of ∪, ∩, −, and c also have a meaning for internal sets, so that, for example ∗∪ must be distinguished from ∪, leading to, H(Si)∗∪H(Ti) and H(Si)∪H(Ti), where, by definition, H(Si)∗∪H(Ti) = H(Si ∪Ti), for hypersets H(Si) and H(Ti). Fortunately, ∗∪ = ∪, and the same holds for ∩, −, and c. It would be wrong, however, to assume that any operation on sets would be equal to its ∗-transform. The notorious exception is power set formation. For, by definition, (∗P)(H(Si)) = H(P(Si)) = H({Ti : Ti ⊆ Si}) = {H(Ti) : H(Ti) ⊆ H(Si)}, and, P(H(Si)) = {T : T ⊆ H(Si)}, but whereas H(Ti) is internal, T might be external. So the equality ∗& = & is only guaranteed for & = ∪, ∩, −, or c, and P should be avoided even for internal sets.

The internal forms of more complicated expressions such as {f(x) + |y|}·z² and (S ∪ T) ∩ (V ∪ W), become {H(fi)(H(xi)) + |H(yi)|}·(H(zi))² and (H(Si) ∪ H(Ti)) ∩ (H(Vi) ∪ H(Wi)), respectively. Taking everything standard gives, {∗f(∗x) + |∗y|}·(∗z)² = ∗[{f(x) + |y|}·z²] and (∗S ∪ ∗T) ∩ (∗V ∪ ∗W) = ∗[(S ∪ T) ∩ (V ∪ W)].

2.7 The ∗-transform of relations and statements; Łoś’ theorem; the internal definition principle

The simplest mathematical relations are the atomic relations, by which are meant relations containing neither logical connectives nor quantifiers, hence relations such as =, <, ∈, etc. They can be regarded as functions to the set B = {true, false}, where true and false are the Boolean constants, which are urelements. By definition, ∗true ≡ true and ∗false ≡ false, hence ∗B = B. (In order to avoid confusion equivalence will be indicated by means of ≡.) An atomic relation with n arguments is called n-ary. Many atomic relations are binary. Atomic statements result when atomic relations are applied to suitable arguments, which must be expressions. The smaller-than relation in IR thus leads to the atomic statements s < t (in which case R is binary), s < t < u (now R is ternary), etc. with s,t,u ∈ IR. So if a relation is regarded as a function, the corresponding statement should be regarded as a function value. As with operations, only the standard form of internal relations is introduced, but whereas some of them are new in nonstandard analysis (e.g. <) others are not, namely when the classical form is also meaningful for internal constants (e.g. =, ∈, ⊂). From the correspondence between relations and functions it follows, that if R is a binary relation such as < or ⊂, ∗R is defined by, H(si)∗RH(ti) ≡ H(siRti), for suitable expressions si and ti. Here by definition, H(siRti) ≡{i : siRti}∈ U, that is to say, H(siRti) ≡ true if and only if {i : siRti ≡ true}∈ U.


Hence ∗R too is a binary relation. In fact, this is a very special case of Łoś’ theorem, to be considered below. In particular, letting s_i = s and t_i = t for all i, ∗[sRt] ≡ ∗s ∗R ∗t ≡ sRt, as {i : sRt} ∈ U ≡ sRt, since sRt does not depend on i. This means that ∗s ∗R ∗t is equivalent to the classical statement that is obtained by removing all asterisks, which is a very simple case of transfer. For n-ary atomic relations the definitions are analogous. In case R is one of the relations =, ≠, ∈, ⊂, ⊆, ⊃, ⊇, R also has a meaning for internal expressions. Fortunately, these relations are equivalent to their standard forms. Proof: Consider ∈. By definition, H(s_i) ∗∈ H(S_i) ≡ H(s_i ∈ S_i) ≡ {i : s_i ∈ S_i} ∈ U ≡ H(s_i) ∈ H(S_i). The other cases are left as exercises.

In case R has no meaning within nonstandard analysis, it will usually cause no confusion when the asterisk in ∗R is dropped. Below this is done anyway when explicit examples are given, such as the next two.

Examples: 1) {H(f_i)(H(x_i)) + |H(y_i)|}·(H(z_i))² < (H(s_i) + H(t_i))·H(r_i) ≡ H[{f_i(x_i) + |y_i|}·z_i² < (s_i + t_i)·r_i], and this is also equivalent to {i : {f_i(x_i) + |y_i|}·z_i² < (s_i + t_i)·r_i} ∈ U. Taking everything standard, the results are: 2) {∗f(∗x) + |∗y|}·(∗z)² < (∗s + ∗t)·∗r ≡ ∗[{f(x) + |y|}·z² < (s + t)·r], which is equivalent to {f(x) + |y|}·z² < (s + t)·r, because {i : {f(x) + |y|}·z² < (s + t)·r} ∈ U if and only if {f(x) + |y|}·z² < (s + t)·r, as the latter statement does not depend on i.

In the first example to the left there are several internal variables (the constant 2 is even standard), whereas to the right there is only a single internal statement; and in the second example to the left there are several standard variables, whereas to the right there is only a single standard statement, that, moreover, is equivalent to the corresponding classical statement. As already indicated before, this equivalence is an example of transfer, but a very simple one because neither logical connectives nor quantifiers occur. Note that in cases like {∗f(∗x) + |∗y|}·(∗z)² = ∗[{f(x) + |y|}·z²] and (∗S ∪ ∗T) ∩ (∗V ∪ ∗W) = ∗[(S ∪ T) ∩ (V ∪ W)], considered in the preceding section, it is not always allowed to drop the asterisk to the right, because if s is some classical constant, ∗s may be different from s. So at this point there is a divergence between expressions and statements.

An arbitrary statement is composed of a finite number of atomic relations, logical connectives (¬, ∧, ∨, ⇒, ⇔), quantifiers (∀ and ∃), constants, free variables and bound variables. First of all, statements with at least one logical connective, but without quantifiers will be considered. They will be written as,

R(P(s,s′,s″,...), Q(t,t′,t″,...), S(u,u′,u″,...),...),

where P, Q, S,... are atomic relations, and s, s′, s″,..., t, t′, t″,..., u, u′, u″,... are expressions of constants and free variables. In the beginning of this section it was found more convenient to write the atomic statement derived from an atomic binary relation R as sRs′. This is now written as R(s,s′). In order to simplify the notation (s,s′,s″,...), (t,t′,t″,...), (u,u′,u″,...) will be abbreviated to (s), (t), (u), respectively. Regarding R(P(s), Q(t), S(u),...) as a function value, the corresponding (Boolean) function is, of course, R, which is called a relation. Its domain is B × B × B × ..., and its range is B, where, as before, B = {true, false}. As an example, let

R(P(s), Q(t), S(u)) ≡ (P(s)∧Q(t)) ⇒¬(S(u)), which, emphasizing that the logical connectives are functions, can be written as,

((⇒◦(∧,¬))◦(P,Q,S))(s,t,u), where by definition R ≡ (⇒◦(∧,¬)), (P,Q,S) (s,t,u) ≡ (P(s), Q(t),S(u)) and (⇒◦(∧,¬))(x1,x2,x3) ≡ (x1 ∧x2) ⇒ (¬x3).
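The view of logical connectives as Boolean functions can be made concrete with a small computation. The following Python sketch is an illustration only and not part of the theory; the names conj, neg, implies and R are chosen here for convenience. It tabulates (⇒◦(∧,¬))(x1,x2,x3) ≡ (x1 ∧ x2) ⇒ (¬x3) on B × B × B.

```python
from itertools import product

def conj(x1: bool, x2: bool) -> bool:
    """The connective ∧ regarded as a function from B × B to B."""
    return x1 and x2

def neg(x: bool) -> bool:
    """The connective ¬ regarded as a function from B to B."""
    return not x

def implies(a: bool, b: bool) -> bool:
    """The connective ⇒ regarded as a function from B × B to B."""
    return (not a) or b

def R(x1: bool, x2: bool, x3: bool) -> bool:
    """(⇒ ∘ (∧, ¬))(x1, x2, x3) ≡ (x1 ∧ x2) ⇒ (¬x3)."""
    return implies(conj(x1, x2), neg(x3))

# Tabulate R on all of B × B × B.
truth_table = {xs: R(*xs) for xs in product([True, False], repeat=3)}
```

In this sketch R takes the value false only at (true, true, true), as one would expect from the defining formula.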

Remark: In formal logic expressions, statements, and statements without free variables are often called terms, formulae (or predicates), and sentences, respectively.


As before, internal relations will only be standard, so that internal statements have the form H(R(P(s_i), Q(t_i), S(u_i),...)) ≡ ∗R(∗P(H(s_i)), ∗Q(H(t_i)), ∗S(H(u_i)),...). Continuing the last example, this gives H(R(P(s_i), Q(t_i), S(u_i))) ≡ ∗R(∗P(H(s_i)), ∗Q(H(t_i)), ∗S(H(u_i))) ≡ (∗P(H(s_i)) ∗∧ ∗Q(H(t_i))) ∗⇒ ∗¬(∗S(H(u_i))) ≡ ((∗⇒ ◦ (∗∧, ∗¬)) ◦ (∗P, ∗Q, ∗S))(H(s_i), H(t_i), H(u_i)). But whereas the domains of P, Q and S are rather arbitrary, their ranges are B, as is that of R, so that the range of g ≡ (P,Q,S) is X = B × B × B, and as ∗B = B and ∗(B × B × B) = B × B × B, by Corollary 2.4.2 with f ≡ (⇒ ◦ (∧,¬)) and Y = B, it follows that ∗R ≡ R ≡ ∗(⇒ ◦ (∧,¬)) ≡ (∗⇒ ◦ (∗∧, ∗¬)) ≡ (⇒ ◦ (∧,¬)). Hence H(R(P(s_i), Q(t_i), S(u_i))) ≡ R(∗P(H(s_i)), ∗Q(H(t_i)), ∗S(H(u_i))), where the asterisks in ∗P, ∗Q, and ∗S could be dropped, as it is clear that standard forms are meant. This is a less trivial case of Łoś’ theorem.

Taking si = s, ti = t, ui = u for all i, Corollary 2.4.3 gives that, ∗[R(P(s),Q(t),S(u))] ≡ H(R(P(s),Q(t),S(u))) ≡ R(P(s),Q(t),S(u)) that is, ∗[R(P(s),Q(t),S(u))] ≡ R(P(s),Q(t),S(u)), hence the standard form of the sample statement is equivalent to its classical form, which expresses a less trivial case of transfer.

Exactly the same kind of argument can be repeated for any other statement without quantifiers.

Finally, let arbitrary statements with at least one quantifier be given. They are assumed to be in prenex normal form (see Section 1.3), hence with all logical connectives to the right of the quantifiers. Also it is assumed that each bound variable occurs to the left of the ∈ relation. From this it follows that in any internal statement each bound variable automatically is internal, even if it is not explicitly written as such. To see this, observe that if to the right of this ∈ relation there is a constant or a free variable, then the bound variable must be internal, because that constant or that free variable is assumed to be internal, and if there is a bound variable, then the latter must be internal, as can be seen by repeating the argument. As an example see the least upper bound theorem below.

First let just one quantifier be included:

∃x ∈ X : R(P(x,s),...),

or,

∀x ∈ X : R(P(x,s),...),

where X is some set and R(P(x,s),...) is an arbitrary statement without quantifiers. Then, given sets X_i,

H(∃x_i ∈ X_i : R(P(x_i,s_i),...)) ≡ ∃H(x_i) ∈ H(X_i) : R(∗P(H(x_i),H(s_i)),...),

and similarly for ∀. Note that to the left x_i and to the right H(x_i) may be replaced by x. To prove this equivalence, observe that the statement to the left is equivalent to

1) Q = {i : [∃x_i ∈ X_i : R(P(x_i,s_i),...)]} ∈ U,

and that the statement to the right is equivalent to R(∗P(H(x′_i),H(s_i)),...) for certain x′_i, hence, as no quantifiers are involved, to H(R(P(x′_i,s_i),...)), hence to

2) {i : R(P(x′_i,s_i),...)} ∈ U.

Now if statement 1) is true, for i ∈ Q take some x′_i ∈ X_i such that R(P(x′_i,s_i),...), and if i ∉ Q take x′_i arbitrary. Then {i : [∃x_i ∈ X_i : R(P(x_i,s_i),...)]} ⊆ {i : R(P(x′_i,s_i),...)}, and statement 2) follows. Conversely, if statement 2) is true, then, since {i : [∃x_i ∈ X_i : R(P(x_i,s_i),...)]} ⊇ {i : R(P(x′_i,s_i),...)}, statement 1) follows. This completes the proof of the equivalence. A similar result for ∀ can be shown as follows.


¬[H(∀x_i ∈ X_i : R(P(x_i,s_i),...))] ≡ H(∃x_i ∈ X_i : ¬[R(P(x_i,s_i),...)]) ≡ ∃H(x_i) ∈ H(X_i) : ¬[R(∗P(H(x_i),H(s_i)),...)] ≡ ¬[∀H(x_i) ∈ H(X_i) : R(∗P(H(x_i),H(s_i)),...)].

Secondly, let two quantifiers be involved, as in, ∃x ∈ X : ∀y ∈ Y : R(P(x,y,s),...), then a similar result can be proved by using that, H(∀yi ∈ Yi : R(P(xi,yi,si),...)) ≡ ∀H(yi) ∈ H(Yi) : R(∗P(H(xi),H(yi),H(si)),...), since here only one quantifier is involved.

Applying induction, it follows that a similar result can be shown for a statement containing arbitrarily many quantifiers. Again the asterisk in ∗P may be dropped if no confusion can arise. Hence the following result, which is quite fundamental, has been proved. In its formulation R is no longer regarded as a function of substatements P(s), Q(t), S(u),..., and of sets X, X′, X″,..., required in the quantifications, but simply as a function of constants and free variables X, X′, X″,..., s, s′, s″,..., so that P, Q, S,... have altogether disappeared.

Theorem 2.7.1 (Łoś’ theorem.) Let any classical statement,

R(X,X′,X″,...;s,s′,s″,...),

with a finite number of constants or free variables X, X′, X″,..., s, s′, s″,..., and a finite number of logical connectives and quantifiers be given. Certain standard constants such as 0 need not be mentioned explicitly. X, X′, X″,... are the sets required to formulate the quantifications properly; that is to say that X must occur in ∃x ∈ X or in ∀x ∈ X, for some suitable bound variable x, and similarly for X′, X″,..., and that conversely each quantification is taken care of this way. Then,

H[R(X_i,X′_i,X″_i,...;s_i,s′_i,s″_i,...)] ≡ R(H(X_i),H(X′_i),H(X″_i),...;H(s_i),H(s′_i),H(s″_i),...).


In other words, (regarding for the time being each constant as a free variable) given any classical statement, to each of its free variables q (so that here q ∈ {X,X′,X″,...,s,s′,s″,...}) add the index i, which defines infinite sequences (q_i), and which also defines an infinite sequence of classical statements, which sequence in turn defines an internal statement (the one to the left). Then the latter is equivalent to the statement (the one to the right) that results from the given classical statement by replacing each free variable q by the internal free variable H(q_i) that is defined by the infinite sequence (q_i). It may happen, of course, that for certain q this leads to ∗q, namely when q_i = q for all i. Or, in still other words, given any classical statement, replace each of its free variables q by its internal version H(q_i) (that might be its standard version ∗q), then the resulting statement (the one to the right) is equivalent to the statement (the one to the left) that results from it by removing all H’s and all asterisks, and putting one single H in front.

The formulation of the theorem implies that each bound variable must occur in some set inclusion. A more careless formulation would be allowed if it would be required that each bound variable is internal (which is automatically true in the formulation given).

So far, bound variables were only combined with quantifications, but they can occur in other formulations as well. A very usual one is the definition of a set, such as, {x ∈ X : P(x,X,s)}, where P(x,X,s) is some statement. Then obviously x is a bound variable that is not combined with ∃ or ∀, although one says ‘the set of all x in X such that ...’ Anyway, the result is a set, hence a constant or a variable, not a statement. Yet, the theorem can be applied here and leads to the following corollary.

Corollary 2.7.1 (The internal definition principle.) Let any statement be given, say P(x,X,s), where X is some set and x ∈ X makes sense. Let X and s be internal, then so is the set T = {x ∈ X : P(x,X,s)}.

Proof: Since X is internal, also x is internal, so for suitable X_i and s_i, T = {H(x_i) ∈ H(X_i) : P(H(x_i),H(X_i),H(s_i))}. Let T_i = {x_i ∈ X_i : P(x_i,X_i,s_i)}. Then H(x_i) ∈ T if and only if H(x_i) ∈ H(X_i) ∧ P(H(x_i),H(X_i),H(s_i)), hence by Łoś’ theorem, if and only if H(x_i ∈ X_i ∧ P(x_i,X_i,s_i)), hence if and only if H(x_i ∈ T_i), hence if and only if H(x_i) ∈ H(T_i), so that H(x_i) ∈ T ≡ H(x_i) ∈ H(T_i), i.e. T = H(T_i), hence T is internal.

2.8 Transfer; the standard definition principle

In this section a few consequences of Łoś’ theorem will be presented that belong to the main tools of nonstandard analysis.

Theorem 2.8.1 (Transfer, first formulation.) Let R(X,X′,X″,...;s,s′,s″,...) be as before. Then, R(X,X′,X″,...;s,s′,s″,...) ≡ R(∗X,∗X′,∗X″,...;∗s,∗s′,∗s″,...).

Proof: In Łoś’ theorem take X_i = X, X′_i = X′, X″_i = X″,..., s_i = s, s′_i = s′, s″_i = s″,..., for all i, then,

∗[R(X,X′,X″,...;s,s′,s″,...)] ≡ R(∗X,∗X′,∗X″,...;∗s,∗s′,∗s″,...),

but,

∗[R(X,X′,X″,...;s,s′,s″,...)] ≡ R(X,X′,X″,...;s,s′,s″,...).

One fact is disguised in this formulation of transfer, namely that in the statement to the right the bound variables need not be standard. A simple example may clarify this:

∃x ∈ X : P(x,s) ≡ ∃H(xi) ∈∗X : P(H(xi),∗s) ≡∃x ∈∗X : P(x,∗s),


where P(x,s) is some substatement. So transfer expresses the fact that any classical statement is equivalent to the nonstandard statement that results from it by replacing everything by its ∗-transform except the bound variables. What will happen if the bound variables too are replaced by their ∗-transforms? In the example this gives, ∃∗x ∈∗X : P(∗x,∗s), but since ∗x ∈ ∗X ≡ x ∈ X and P(∗x,∗s) ≡ P(x,s), it follows that this is equivalent to ∃x ∈ X : P(x,s), and a similar equivalence holds for any classical statement. This leads to transfer in another formulation.

Theorem 2.8.2 (Transfer, second formulation.) Given any internal statement, replacing everything, including every bound variable, by its standard version is equivalent to replacing everything except every bound variable by its standard version.

The advantage of transfer in its first formulation is that it is the bridge between classical mathematics and nonstandard mathematics; that of transfer in its second formulation is that one remains within nonstandard mathematics, so that one may forget about classical mathematics, which via the ∗-transform is mapped in a one-to-one kind of way to a certain part of nonstandard mathematics. In the second formulation it comes close to what it is in Nelson’s internal set theory, which strictly speaking ignores classical mathematics completely, and which within nonstandard mathematics defines the difference between standard and internal (and external, which in essence is an irrelevant notion in this theory, however).

Let us now see what really is the essential part of transfer. First of all it is clear that it is only of some value if bound variables are present. Simple examples of transfer in its second formulation are ∃∗x ∈ ∗X : P(∗x,∗s) ≡ ∃x ∈ ∗X : P(x,∗s), and ∀∗x ∈ ∗X : P(∗x,∗s) ≡ ∀x ∈ ∗X : P(x,∗s), where H(x_i) has been replaced by x. Keep in mind that x ∈ ∗X implies that x is not necessarily standard, but internal. Clearly, in one direction these two equivalences are trivial (at least if X ⊆ ∗X), and what is really important are the following two implications,

∃x ∈ ∗X : P(x,∗s) ⇒ ∃∗x ∈ ∗X : P(∗x,∗s),

and

∀∗x ∈∗X : P(∗x,∗s) ⇒∀x ∈∗X : P(x,∗s), or, in words, if there exists an internal x such that something is true, then there even exists a standard ∗x such that that something is true; and if something is true for all standard ∗x, then that something is even true for all internal x. The second implication leads from classical mathematics towards nonstandard mathematics, and the first one leads in the opposite direction back from nonstandard mathematics to classical mathematics. Obviously, in case ∗X = X, also the two implications are trivial, and transfer is of no use. Examples where this is not the case have already been given in Section 1.4. In that section it also became clear what really is the purpose of transfer: in a number of important cases the nonstandard form of a classical statement can be given a much simpler form (see Section 1.4, where statements (1.1) and (1.1) are equivalent to statement (1.1)). Inevitably, this simpler form requires the use of certain internal constants that are not standard (nonzero infinitesimals in the example of Section 1.4).

Example: As a nontrivial example let us consider the least upper bound theorem for IR, which in its classical form reads,

∀X ∈ P(IR) : {X ≠ ∅ ∧ [∃b ∈ IR : ∀x ∈ X : x ≤ b]} ⇒ ∃β ∈ IR : [∀x ∈ X : x ≤ β] ∧ [∀ε ∈ IR, ε > 0 : ∃x ∈ X : x > β − ε].

By transfer in its first formulation this is equivalent to,

∀X ∈ ∗(P(IR)) : {X ≠ ∅ ∧ [∃b ∈ ∗IR : ∀x ∈ X : x ≤ b]} ⇒ ∃β ∈ ∗IR : [∀x ∈ X : x ≤ β] ∧ [∀ε ∈ ∗IR, ε > ∗0 : ∃x ∈ X : x > β − ε],

which by transfer in its second formulation is equivalent to,

∀∗X ∈ ∗(P(IR)) : {∗X ≠ ∅ ∧ [∃∗b ∈ ∗IR : ∀∗x ∈ ∗X : ∗x ≤ ∗b]} ⇒ ∃∗β ∈ ∗IR : [∀∗x ∈ ∗X : ∗x ≤ ∗β] ∧ [∀∗ε ∈ ∗IR, ∗ε > 0 : ∃∗x ∈ ∗X : ∗x > ∗β − ∗ε],

which is equivalent to the classical version of the theorem. Note that in this example different bound variables have been indicated by the same symbol, which is against the rule advocated before in Section 1.3, but this time seems appropriate. Also note that internal bound variables have not explicitly been indicated as such, simply because they are automatically internal.


Exercise: Apply Łoś’ theorem to the least upper bound theorem, but with internal free variables.

Just as transfer is a direct specialization of Łoś’ theorem, the next result is a direct specialization of the internal definition principle.

Theorem 2.8.3 (The standard definition principle.) Let ∗X be a standard set, x ∈ ∗X make sense, ∗s be standard, and P(x,∗X,∗s) be any statement. Then also, {x ∈∗X : P(x,∗X,∗s)} is standard.

Proof: The proof is a simplification of the proof of the internal definition principle, and is left as an exercise.

2.9 The ∗-transform of attributes

So far the ∗-transform was concerned with expressions and statements. In this section the ∗-transforms of a number of attributes, such as finiteness, are considered.

A) ∗finite or hyperfinite sets. A set S = H(S_i) is called ∗finite or hyperfinite if all S_i are finite sets, or equivalently, if {i : S_i is finite} ∈ U. This does not mean that S is a finite set. As a counterexample, let S_i = {1,2,...,i}, then the smallest element of S is m = 1 = H(1,1,1,...), the largest is M = H(1,2,3,...) = H(i), and those in between are H(s_i) with 1 ≤ s_i ≤ i. It follows that S is an infinite set, for if not then all of them would be at most n = H(n,n,n,...) for some n ∈ IN. In particular, M ≤ n, which is not true. Nevertheless, a hyperfinite set can be treated as if it were finite. For example, just as finite sets of classical numbers, hyperfinite sets of internal numbers have a smallest as well as a largest element, hence are bounded.

Exercise: Show this.

Another example is where ω ∼ ∞ and f is an internal function from ∗IN to {1,2}, such that f(1) = 1, f(ω) = 2. Then there exists a largest k ∈ ∗IN, 1 ≤ k ≤ ω, such that f(k) = 1, hence such that f(j) = 2 if k + 1 ≤ j ≤ ω. The hyperfinite set here is {j : 1 ≤ j ≤ H(n_i)} = H(S_i) with ω = H(n_i) and S_i = {1,...,n_i}. Such a result is trivial for finite sets, and by transfer carries over to the present case. For let F be the set of all classical f : IN → {1,2}. Then that trivial result reads, more formally,

∀n ∈ IN : ∀f ∈ F,f(1) = 1, f(n) = 2 : ∃k ∈ IN, 1 ≤ k ≤ n : [f(k) = 1∧∀j ∈ IN, k + 1 ≤ j ≤ n : f(j) = 2],

hence, by transfer,

∀n ∈ ∗IN : ∀f ∈ ∗F, f(1) = 1, f(n) = 2 : ∃k ∈ ∗IN, 1 ≤ k ≤ n : [f(k) = 1 ∧ ∀j ∈ ∗IN, k + 1 ≤ j ≤ n : f(j) = 2],

in particular if n = ω ∼ ∞.

B) ∗finite or hyperfinite numbers. A number x = H(x_i) is called ∗finite or hyperfinite if all x_i are finite. But there is no other choice, as all classical numbers are finite. In other words each internal number is hyperfinite. Yet an internal number might be larger than any natural number; just consider H(i), or H(i − 2). Nevertheless they can be treated as classical numbers: H(i) − 1, (H(i))², H(i) − H(i − 2), etc. make sense.

C) ∗real or hyperreal. A number H(x_i) is called ∗real or hyperreal if all x_i are real, that is to say if H(x_i) ∈ ∗IR. Obviously, such a number need not be real, but it can be treated as a classical real number.

D) ∗continuity or hypercontinuity. A function H(f_i) from ∗IR to ∗IR is called ∗continuous or hypercontinuous at H(c_i) ∈ ∗IR if for all i, f_i is continuous at c_i.

E) ∗countable or hypercountable. A set H(S_i) is called ∗countable or hypercountable if all S_i are countable. Hence ∗IN is hypercountable, although it is not countable, as will be shown below.

It should now be clear what is the purpose of this section, and that there are many variations of the theme.

Exercise: Define hyperfinite and hyperinfinite sequences.

Remark: In Nelson’s internal set theory the prefix ‘hyper’ is not used: there ‘finite’ means hyperfinite, ‘countable’ means hypercountable, etc. On the other hand ‘standard finite’ means finite.
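The trivial classical result used under A) in this section, namely that a function f from {1,...,n} to {1,2} with f(1) = 1 and f(n) = 2 has a largest k with f(k) = 1, can be checked by brute force for small n. The Python sketch below is an illustration only; the names last_one and holds_for are hypothetical, and such a check of course says nothing about hyperlarge n, for which transfer is needed.

```python
from itertools import product

def last_one(f, n):
    """Largest k in 1..n with f[k] == 1 (f is a dict from 1..n into {1, 2})."""
    return max(k for k in range(1, n + 1) if f[k] == 1)

def holds_for(n):
    """Check the classical statement for every f: {1..n} -> {1, 2} with f(1) = 1 and f(n) = 2."""
    for values in product((1, 2), repeat=n):
        f = dict(zip(range(1, n + 1), values))
        if f[1] != 1 or f[n] != 2:
            continue  # f does not satisfy the side conditions
        k = last_one(f, n)
        if not (1 <= k <= n and f[k] == 1
                and all(f[j] == 2 for j in range(k + 1, n + 1))):
            return False
    return True

# For n = 1 the side conditions cannot both hold, so the statement is vacuously true.
results = {n: holds_for(n) for n in range(1, 11)}

# A single sample function on {1,...,5}; the last index with value 1 is 3.
example = last_one({1: 1, 2: 2, 3: 1, 4: 2, 5: 2}, 5)
```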


2.10 ∗IN, ∗ZZ, ∗Q, ∗IR: main definitions and properties

The sets ∗IN, ∗ZZ, ∗Q, and ∗IR have already been used in various examples, but will now be treated in a more orderly fashion. Their definitions, the rules of equality and identification, and the definitions of the arithmetic operations and the inequalities are all obvious from the foregoing theory:

∗IN = {H(n_i) : n_i ∈ IN}, and similarly for ∗ZZ, ∗Q and ∗IR,
H(x_i) = H(y_i) if and only if {i : x_i = y_i} ∈ U,
H(x_i) = x if and only if {i : x_i = x} ∈ U, so that H(x) = ∗x = x,
H(x_i) + H(y_i) = H(x_i + y_i), x_i, y_i ∈ IN,
H(x_i)·H(y_i) = H(x_i·y_i), x_i, y_i ∈ IN,
|H(x_i)| = H(|x_i|), x_i ∈ ZZ,
H(x_i) − H(y_i) = H(x_i − y_i), x_i, y_i ∈ ZZ,
1/H(x_i) = H(1/x_i), if all x_i ≠ 0 and x_i ∈ Q,
H(x_i) < H(y_i) ≡ H(x_i < y_i), x_i, y_i ∈ IN, and similarly for >, ≤, ≥.

Trivially, here IN may be replaced by ZZ, ZZ by Q, and Q by IR. As far as inversion is concerned, it is of course sufficient that {i : x_i ≠ 0} ∈ U, as then the x_i that are 0 can be changed to, say, 1, without changing the value of H(x_i). Note that x < y is a statement, so x < y ∈ B, where B = {true, false}, hence H(x_i < y_i) ∈ ∗B = B.

Definitions: Let x be a hypernumber.
x is called positive hyperlarge if x > m for all m ∈ IN. Notation: x ∼ ∞.
x is called negative hyperlarge if −x > m for all m ∈ IN. Notation: x ∼ −∞.
Instead of hyperlarge the term infinitely large may be used, but this does not mean that x would be equal to ∞, which is not regarded as a number at all.
x is called finite or limited if it is not hyperlarge.
x is called hypersmall, or an infinitesimal, if |x| < 1/m for all m ∈ IN. Notation: x ≃ 0, or in case x ≠ 0, x ∼ 0.
x is called appreciable if x is limited but not hypersmall.

Theorem 2.10.1 Infinitely large numbers and nonzero infinitesimals exist!

Proof: For example,

H(1,2,3,...) ∼ ∞, H(−1,−2,−3,...) ∼ −∞, H(1,1/2,1/3,...) ∼ 0, H(−1,−1/2,−1/3,...) ∼ 0.


Clearly x = H(xi) = H(+1,−1/2,+1/3,−1/4,...) ∼ 0, but is it positive or negative? This depends: if {i : xi > 0} ∈ U then x > 0 and otherwise x < 0. Many dichotomies of this kind exist in nonstandard analysis, but they will not cause any trouble, because it will not take very long before generating sequences and the H-operator will disappear from the scene. Then references to the basic free ultrafilter U will no longer be required (except when it is necessary to go back to basic principles).

Theorem 2.10.2 ε ∼ 0 if and only if 1/ε ∼ +∞ or 1/ε ∼ −∞, hence s is appreciable if and only if 1/s is appreciable. Let ε ∼ 0, ε′ ∼ 0, s and s′ be appreciable, and ω ∼ ∞, ω′ ∼ ∞. Then,
ε + ε′ ≃ 0, ε − ε′ ≃ 0, ε·ε′ ≃ 0,
ε + s and ε − s are appreciable, and ε·s ∼ 0,
ε + ω ∼ ∞, ε − ω ∼ −∞, and ε·ω ∼ +∞ or −∞ or 0, or ε·ω is appreciable,
s + s′ and s − s′ are appreciable or ≃ 0, and s·s′ is appreciable,
s + ω ∼ +∞, s − ω ∼ −∞, s·ω ∼ +∞ or −∞,
ω + ω′ ∼ +∞, ω − ω′ ∼ +∞ or −∞ or 0, or ω − ω′ is appreciable, and ω·ω′ ∼ +∞.

Exercise: Show this, and provide examples for all possibilities in case there are more than one.

The results of this section so far show that at long last Leibniz’ theory of hypersmall and hyperlarge numbers can be given a sound mathematical basis. It was Robinson who in 1961 for the first time formulated a complete theory of nonstandard analysis. See Sections 1.8 and 1.9 for more details. In Section 2.6 it was shown that IN is an external subset of ∗IN. An alternative proof can be given by showing that IN has a property it would not have were it internal. This happens to be a proof technique that can be applied to many external notions. Some even define external sets as sets that have such a property, but they fail to tell what that property is given that set, so that this definition would not seem to be very practical. In the case of IN the property is that a bounded internal subset S of ∗IN has a maximum. The proof of this statement is not difficult: let S = H(Si) be bounded by b = H(bi), then Si is bounded by


bi, at least for all i ∈ {i : Si is bounded by bi} which is an element of the free ultrafilter U, but as usual we may assume that this is true for all i (why?). Hence each Si has a maximum mi and H(mi) is the maximum of S. Now IN is bounded in ∗IN by any hyperlarge natural number, and if it were internal it would have a maximum, which it has not. Therefore, IN is external.

In a similar way it can be shown that the set of all infinitesimals is external. Since this set is bounded in ∗IR, say by 1, it would have a least upper bound β if it were internal. But this would imply that β itself would be an infinitesimal, so that 2β would be an infinitesimal as well, but 2β > β.

Exercise: Show that β would indeed be an infinitesimal itself.

Theorem 2.10.3 In ∗IR the set of all infinitesimals is external.

Another variation of the theme is the next result.

Theorem 2.10.4 In ∗IR the set of all positive hyperlarge numbers is external.

Proof: Left as an exercise. Hint: use lower bounds.

2.11 Overflow and underflow

This section is concerned with the existence in certain internal sets of an element that depending on the internal set given either is infinitely large, or is limited, or is an infinitesimal, or is not an infinitesimal.

Theorem 2.11.1 (Overflow or overspill.) Let S be an internal subset of ∗T, where T is either IN or ZZ, or Q, or IR, such that ∀m ∈ IN : ∃s(m) ∈ S : s(m) ≥ m, i.e. such that from a classical point of view S contains arbitrarily large elements, then ∃s ∈ S : s ∼ ∞, i.e. S contains some infinitely large element.

Proof: If ∀b ∈ ∗T : ∃s(b) ∈ S : s(b) > b, then take b ∼ ∞, which implies that s(b) ∼ ∞. If this is not true, then ∃b ∈ ∗T : ∀s ∈ S : s ≤ b, so that by Łoś’ theorem, H(∃b_i ∈ T : ∀s_i ∈ S_i : s_i ≤ b_i), where S = H(S_i) and b = H(b_i), which by the classical least upper bound theorem implies that,

H(∃β_i ∈ T : [∀s_i ∈ S_i : s_i ≤ β_i] ∧ [∃s′_i ∈ S_i : s′_i > β_i − 1]),

hence, again by Łoś’ theorem,

∃β ∈ ∗T : [∀s ∈ S : s ≤ β] ∧ [∃s′ ∈ S : s′ > β − 1],

and it follows that ∀m ∈ IN : β ≥ s(m) ≥ m, hence that β ∼ ∞, so that s′ ∼ ∞.

Theorem 2.11.2 (Underflow or underspill.) Let S be an internal subset of ∗T, with T as before, such that ∀ω ∈ ∗IN, ω ∼ ∞ : ∃s(ω) ∈ S : s(ω) < ω ∧s(ω) ∼∞, i.e. such that S contains infinitely large elements that are arbitrarily small, then ∃s ∈ S : s is limited.

Proof: Let S_1 = {s ∈ S : s ≥ 1}. Clearly, S_1 is not empty, so that by the classical greatest lower bound theorem, ∃β ∈ ∗IN : [∀s ∈ S_1 : s ≥ β] ∧ [∃s′ ∈ S_1 : s′ < β + 1], as can be shown by an argument similar to that used in the preceding proof. It follows that β is limited, as otherwise β ≤ s(β) < β, so that s′ is limited as well.

Since in ∗Q and ∗IR, x is infinitely large if and only if 1/x is a nonzero infinitesimal, these two theorems have the following counterparts.

Theorem 2.11.3 (‘Inverse’ overflow.) Let S be an internal subset of ∗Q or ∗IR, such that ∀m ∈ IN : ∃s(m) ∈ S : |s(m)| ≤ 1/m, i.e. such that from a classical point of view S contains arbitrarily small elements, then ∃s ∈ S : s ≃ 0.

Proof: It is no restriction to assume that 0 ∉ S. Apply overflow to S′ = {t : 1/t ∈ S} and use the fact that S′ is internal if (and only if) S is internal.

Theorem 2.11.4 (‘Inverse’ underflow.) Let S be an internal subset of ∗Q or ∗IR, such that ∀ε, ε ∼ 0, ε > 0 : ∃s ∈ S : s ≥ ε, then ∃s ∈ S : s is not an infinitesimal.

Proof: Similar to the preceding proof.

The overflow theorem immediately implies that S is an external subset of ∗S in case S is equal to IN, ZZ, Q or IR, a fact we knew already. The underflow theorem immediately implies that ∗IN\IN too is an external subset of ∗IN, and similarly for the other sets of numbers. This also follows from the fact that a subset of a standard set ∗S is internal if and only if its complement with respect to ∗S is internal. The ‘inverse’ underflow theorem implies that the subset of ∗IR of all hypersmall elements of ∗IR is external, and similarly for ∗Q.

2.12 ∗IN and ∗ZZ: more properties

Theorem 2.12.1 If n ∈ ∗IN, then either n ∈ IN or n is hyperlarge.

Proof: If n = H(n_i) is not hyperlarge, then n ≤ m for some m ∈ IN, so that {i : 0 < n_i ≤ m} ∈ U, but then {i : n_i = m′} ∈ U for precisely one m′ ∈ IN, m′ ≤ m, as follows from the properties of U, so that n = m′.

Theorem 2.12.2 Given any ω ∈ ∗IN, ω ∼ ∞, then S = [1,ω] is uncountable.

Proof: The proof is given by constructing a bijection between S and the set of all infinite sequences of 0’s and 1’s, which is known to be uncountable.

Given ω_1 = ω, the interval S is split into intervals S_0 = (0,ω_0] and S_1 = (ω_0,ω_1] of approximately equal length (see the details below), S_0 is in a similar way split into S_00 = (0,ω_00] and S_01 = (ω_00,ω_01], and S_1 into S_10 = (ω_01,ω_10] and S_11 = (ω_10,ω_11], etc., where at each split the limits of all subintervals involved are given by a column of:

ω_1 = ω_11 = ω_111 = ...
             ω_110 = ...
      ω_10 = ω_101 = ...
             ω_100 = ...
ω_0 = ω_01 = ω_011 = ...
             ω_010 = ...
      ω_00 = ω_001 = ...
             ω_000 = ...
0   = 0    = 0     = ...,

where ω_0...00 = ⌊(ω_0...01 + 0)/2⌋, and, given that b is any string of 0’s and 1’s with an even positive binary value and that c is any such string with an odd binary value, ω_b = ⌊(ω_{b−1} + ω_{b+1})/2⌋, ω_c = ω_{(c−1)/2}, so that, for example,

ω_0111 = ω_011 = ω_01 = ω_0 = ⌊(ω_1 + 0)/2⌋,
ω_0011 = ω_001 = ω_00 = ⌊(ω_01 + 0)/2⌋,
ω_1011 = ω_101 = ω_10 = ⌊(ω_01 + ω_11)/2⌋,
ω_1111 = ω_111 = ω_11 = ω_1 = ω,
ω_0001 = ω_000 = ⌊(ω_001 + 0)/2⌋,
ω_0101 = ω_010 = ⌊(ω_001 + ω_011)/2⌋,
ω_1001 = ω_100 = ⌊(ω_011 + ω_101)/2⌋,
ω_1101 = ω_110 = ⌊(ω_101 + ω_111)/2⌋.

Each of these intervals contains a hyperinfinite number of elements. For example, if ω = H(1,2,3,4,5,6,7,8,...), then

S = H((0,1],(0,2],(0,3],(0,4],(0,5],(0,6],(0,7],(0,8],...),
S_0 = H(∅,(0,1],(0,1],(0,2],(0,2],(0,3],(0,3],(0,4],...),
S_1 = H((0,1],(1,2],(1,3],(2,4],(2,5],(3,6],(3,7],(4,8],...),
S_00 = H(∅,∅,∅,(0,1],(0,1],(0,1],(0,1],(0,2],...),
S_01 = H(∅,(0,1],(0,1],(1,2],(1,2],(1,3],(1,3],(2,4],...),
S_10 = H(∅,∅,(1,2],(2,3],(2,3],(3,4],(3,5],(4,6],...),
S_11 = H((0,1],(1,2],(2,3],(3,4],(3,5],(4,6],(5,7],(6,8],...), etc.

Let ω = H(n_i). Given any x = H(x_i) ∈ S, it may be assumed that 0 < x_i ≤ n_i for all i. Either x ∈ S_0 or x ∈ S_1; if x ∈ S_0, then either x ∈ S_00 or x ∈ S_01, and if x ∈ S_1, then either x ∈ S_10 or x ∈ S_11, etc., which defines a unique infinite sequence of 0’s and 1’s. If, for example, x ∈ S_0, x ∈ S_01, x ∈ S_011,..., then the sequence is (0,1,1,...).

Conversely, any infinite sequence of 0’s and 1’s defines a sequence of intervals for some x = H(x_i), inducing for each i a sequence of intervals for x_i. Continuing the example, let i = 7, then, if the infinite sequence of 0’s and 1’s is (0,1,1,...), the sequence of intervals for x_7 is,

(0,3],(1,3],(2,3],...,

because (0,3] is the 7th term of the sequence generating S_0, (1,3] that of the sequence generating S_01, (2,3] that of the sequence generating S_011, etc., and subsequent terms are either (2,3] or ∅, and an ∅ is followed by ∅’s only. Now no matter how large n_i is, the interval sequence for x_i will certainly contain a term of length 1, as, the length of (a,b] being b − a, at each split an interval is split into subintervals of approximately equal length. In fact it will take no more than ⌈log₂ n_i⌉ splits to find an interval of length 1. For each i with n_i ≥ 2 let x_i be the upper limit of any term of the interval sequence for x_i whose length is 1 (all those terms have the same upper limit), and for each i with n_i = 1 let x_i = 1. Then x ∈ S. In the example, x_7 = 3. According to the construction in this proof, if as before ω = H(i), then,

(1,1,1,...) leads to x = ω,
(0,0,0,...) to x = 1,
(0,1,1,...) to x = H(1,1,1,2,2,3,3,...),
(1,0,0,...) to x = H(1,2,2,3,3,4,4,...), etc.

Note that the last two x’s differ by 1 = H(0,1,1,...), and that the last sequence of 0’s and 1’s can be found from the last but one by ‘adding 1 at infinity’.
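The splitting construction of this proof can be followed term by term on a computer. The Python sketch below is an illustration under the assumption ω = H(i), so that n_i = i; the function names limits, interval and x_component are hypothetical. It recomputes the first eight generating terms of the interval labelled 01 and the first seven terms of the x belonging to (0,1,1,...) and to (1,0,0,...). Only finitely many terms can be computed this way, so nothing is said about the filter U.

```python
from itertools import chain, repeat

def limits(n, level):
    """Interval limits 0 = l_0 <= l_1 <= ... <= l_(2**level) = n after `level` rounds of splitting (0, n]."""
    lims = [0, n]
    for _ in range(level):
        refined = [lims[0]]
        for lo, hi in zip(lims, lims[1:]):
            refined += [(lo + hi) // 2, hi]  # floor of the midpoint, then the old limit
        lims = refined
    return lims

def interval(n, label):
    """Generating term of the interval with binary label `label` when n_i = n.

    Returns (lo, hi) for the half-open interval (lo, hi], or None for the empty set.
    """
    lims = limits(n, len(label))
    v = int(label, 2)
    lo, hi = lims[v], lims[v + 1]
    return (lo, hi) if lo < hi else None

def x_component(bits, n):
    """Follow the 0/1 sequence `bits` from (0, n] until the interval has length 1; return its upper limit."""
    lo, hi = 0, n
    bits = iter(bits)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if next(bits) == 0:
            hi = mid  # keep the lower part (lo, mid]
        else:
            lo = mid  # keep the upper part (mid, hi]
    return hi

row_S01 = [interval(n, "01") for n in range(1, 9)]
x_011 = [x_component(chain([0], repeat(1)), n) for n in range(1, 8)]
x_100 = [x_component(chain([1], repeat(0)), n) for n in range(1, 8)]
# x_011 == [1, 1, 1, 2, 2, 3, 3] and x_100 == [1, 2, 2, 3, 3, 4, 4]
```

The two computed sequences differ by 1 from the second term onwards, in agreement with the remark on ‘adding 1 at infinity’.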

From the construction it also follows that if the generating sequence (n_i) for ω is nondecreasing (as it is in the example given), then the generating sequence (x_i) for x too is nondecreasing. Now let x be any hyperlarge element of ∗IN, then there exists an ω = H(n_i) ≥ x such that (n_i) is nondecreasing, for simply let n_i = max{x_j : j ≤ i}. Starting from this ω and computing the x_i from the infinite sequence of 0’s and 1’s that corresponds to x, it follows that (x_i) is nondecreasing as well. This shows the following corollary, that probably does not have much practical value.

Corollary 2.12.1 If x ∈ ∗IN and x ∼ ∞, then there exists a nondecreasing infinite sequence (x_i) tending to infinity such that x = H(x_i).

Example: Let x = H(yi), where yi = 1 if i = 2j + 1 for some j, yi = 2 if i = 4j + 2 for some j, yi = 3 if i = 8j + 4 for some j, etc. Hence for each n the number of i with yi = n is infinitely large, (yi) is certainly not nondecreasing, and this sequence has no limit. Assume that x ∼ ∞, which is possible, as e.g. {1,2,4,8,16,...} could be an element of the filter U. Note that y1 = 1, y2 = 2, y4 = 3, y8 = 4, etc. Then,

ω = H(1,2,2,3,3,3,3,4,4,4,4,4,4,4,4,5,...).
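The first terms of (yi) and of its nondecreasing majorant ni = max{yj : j ≤ i} can be checked mechanically. The following is a minimal sketch, assuming the definition of yi given in the example (yi = k + 1 when 2^k is the largest power of 2 dividing i); the function names `y` and `running_max` are hypothetical and chosen only for illustration.

```python
from itertools import accumulate


def y(i: int) -> int:
    """y_i = k + 1, where 2**k is the largest power of 2 dividing i (i >= 1)."""
    k = 0
    while i % 2 == 0:
        i //= 2
        k += 1
    return k + 1


def running_max(n_terms: int) -> list[int]:
    """First n_terms of n_i = max{y_j : j <= i}, a nondecreasing majorant of (y_i)."""
    return list(accumulate((y(i) for i in range(1, n_terms + 1)), max))


print([y(i) for i in range(1, 17)])  # the oscillating sequence (y_i)
print(running_max(16))               # the generating sequence of omega
```

The sketch only covers finitely many terms; which hypernatural H(yi) actually is still depends on the filter U.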

To x there corresponds an infinite sequence of 0’s and 1’s. Which one is difficult to tell, because this entirely depends on the underlying filter U. To this sequence there corresponds the desired nondecreasing sequence (xi), and again it is difficult to tell which one. Even though the desired (xi) cannot be determined constructively, the following result has been shown.

Corollary 2.12.2 ∗IN is already generated by all nondecreasing infinite sequences that tend to infinity.

However, subtracting well-chosen H(xi) and H(yi) from each other will destroy this nice property; see Section 1.10.

Another corollary of the theorem is, that since IN is countable, it follows that if ω ∼∞, then [1,ω]−IN is uncountable, and that ∗IN and ∗IN−IN are uncountable as well. A direct proof of this is as follows. Proof: The proof uses a variation of Cantor’s diagonal method. If ∗IN were countable, let s(n) = H(si(n)) = H(s1(n),s2(n),...) be its n-th element, n ∈ IN. Let,

t1 = 1 + s1(1), t2 = 1 + max(s2(1),s2(2)), t3 = 1 + max(s3(1),s3(2),s3(3)),

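etc. Then t = H(t1,t2,t3,...) ∈ ∗IN is larger than each s(n), because ti > si(n) for all i ≥ n, and the set of these i belongs to U. But t = s(j) for some j, so that t would be larger than itself, a contradiction.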
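The diagonal construction ti = 1 + max(si(1),...,si(i)) can be tried out on any concrete list of generating sequences. The sketch below is an illustration only: the enumeration `example_enumeration` (s(n) generated by i ↦ n·i) is a hypothetical choice, not one taken from the text, and only finitely many indices are checked.

```python
from typing import Callable

Seq = Callable[[int], int]  # a generating sequence i -> s_i, with i >= 1


def diagonal(enumeration: Callable[[int], Seq], i: int) -> int:
    """t_i = 1 + max(s_i(1), ..., s_i(i)), where s(n) is the n-th listed element."""
    return 1 + max(enumeration(n)(i) for n in range(1, i + 1))


def example_enumeration(n: int) -> Seq:
    """Hypothetical n-th element: generated by the sequence i -> n * i."""
    return lambda i: n * i


def dominates_from(n: int, upto: int) -> bool:
    """Check that t_i > s_i(n) for every index i with n <= i <= upto."""
    return all(
        diagonal(example_enumeration, i) > example_enumeration(n)(i)
        for i in range(n, upto + 1)
    )


print(diagonal(example_enumeration, 3))  # 1 + max(3, 6, 9)
print(all(dominates_from(n, 50) for n in range(1, 20)))
```

Since ti exceeds si(n) at every index i ≥ n, the set of indices where t is larger than s(n) is cofinite, which is why it belongs to any free ultrafilter.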

As regards externality, recall from Section 2.6 that IN is an external set, and so is ∗IN−IN, as these two sets are each other’s complement with respect to ∗IN, see Section 2.5. Seemingly friendly functions from ∗IN to ∗IN turn out to be external as well. For example, let f be a function from ∗IN to ∗IN such that f(n) = 1 if n ∈ IN and f(n) ≠ 1 if n ∉ IN. Then f is external. For if f were internal, then by the internal definition principle the set T = {x ∈ ∗IN : f(x) = 1} would be internal, but T = IN. After all, f is not so friendly, as the external IN is involved in its definition.

Exercise: Show that f is external if f(n) = 1 if n ∼ ∞, and f(n) ≠ 1 if not.

Exercise: Let f be a function from ∗IN to {1,2} such that f(n) = 1 if and only if n = 1, 2, or 3. Show that f is a standard function.

Exercise: Let f be an internal function from ∗IN to itself, such that f(n) = 1 if n ∈ IN. Show that f(n) = 1 for all n ∈ ∗IN and that f is standard.

Theorem 2.12.3 Let f be an internal function from ∗IN to {1,2}, such that both 1 and 2 are assumed somewhere, i.e. such that f is onto. Moreover, let f(n) = 2 for all n ≥ b for some b ∈ ∗IN. Then there is a β < b, β ∈ ∗IN such that f(β) = 1 and f(n) = 2 if n > β. Hence there is a last β such that f(β) = 1, even though b ∼ ∞ is allowed, and {n : n ≤ b} is uncountable in that case.

Proof: The set {x ∈ ∗IN : f(x) = 1} is internal and is bounded above by b, hence by the least upper bound theorem in its internal form it has a least upper bound β, which must be a maximum.

The results of this section have obvious counterparts for ∗ZZ.

2.13 ∗Q and ∗IR: more properties; standard part

Theorem 2.13.1 (Standard part theorem.) If x ∈ ∗IR then either |x| ∼ ∞, or x = r + ε, r ∈ IR, ε ≃ 0 for unique r and ε.

Proof: Let x not be hyperlarge. Then the uniqueness of r and ε is easily shown, for let r + ε = r′ + ε′, r,r′ ∈ IR, ε,ε′ ≃ 0, then r − r′ ≃ 0, but r − r′ ∈ IR, hence r − r′ = 0, so that r = r′ and ε = ε′. Since x is limited, it follows that for some b ∈ IN, |x| ≤ b. Hence S = {s : s ∈ IR, s ≤ x} is a nonempty subset of IR that is bounded above by b. Indeed, S is nonempty because −b ∈ S. By the least upper bound theorem S has a least upper bound β ∈ IR. If β < x − 1/m for some m ∈ IN, then β would not be an upper bound of S. If β > x + 1/m for some m ∈ IN, then β would not be the least upper bound of S. Therefore, for all m ∈ IN, |β − x| ≤ 1/m, i.e. β − x ≃ 0, so that r = β is the desired real.

Definition: If x is a limited hyperreal, then the (unique) standard hyperreal r that is infinitely close to x is called the standard part of x, which is denoted by st(x).

The theorem is false if ∗IR is replaced by ∗Q.

Counterexample: Let (xi) be a Cauchy sequence of rationals, such that R(xi) = √2. Then x = H(xi) ∈ ∗Q and x is limited. Suppose x = r + ε, r ∈ Q, ε ≃ 0. Since also x ∈ ∗IR, it follows from the theorem that x ≃ √2, so that r = √2 would be the only possibility, a contradiction, as √2 ∉ Q.
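A concrete Cauchy sequence of rationals with R(xi) = √2 can be generated in exact arithmetic. The sketch below uses the Newton iteration x ↦ (x + 2/x)/2 starting from 1; this particular sequence is an assumption made for illustration, as the text does not single out any specific (xi).

```python
from fractions import Fraction


def sqrt2_approximations(count: int) -> list[Fraction]:
    """Rational Newton iterates x_{i+1} = (x_i + 2/x_i)/2, starting from x_1 = 1."""
    xs = [Fraction(1)]
    while len(xs) < count:
        x = xs[-1]
        xs.append((x + 2 / x) / 2)
    return xs


for x in sqrt2_approximations(6):
    # every term is rational, and x*x - 2 shrinks rapidly without ever being 0
    print(x, float(x * x - 2))
```

Every term is a rational number whose square differs from 2, so no rational r can serve as the standard part of H(xi).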

It follows that not every limited hyperrational has a standard part in Q. Yet limited hyperrationals do have a standard part in IR, and in fact the entire IR can thus be obtained.

Theorem 2.13.2 Given any r ∈ IR, there exists an x ∈ ∗Q such that st(x) = r.

Proof: If r ∈ IR, then there exists a sequence (rn), rn ∈ Q, such that rn tends to r if n tends to infinity, hence for all m ∈ IN, |rn − r| < 1/m if n is large enough. This implies that, ∀m ∈ IN : {i : |ri − r| < 1/m} ∈ U, hence that x = H(ri) ≃ r. Hence in some very special way ∗Q completely defines IR.

Theorem 2.13.3 Let a,b ∈ IR, a < b. If x ∈ ∗[a,b], then st(x) ∈ [a,b], but if the interval in IR is not closed, this is sometimes not true.

Proof: Since x is limited, st(x) is well defined. If st(x) ∉ [a,b], then either st(x) = a − δ or st(x) = b + δ for some δ ∈ IR, δ > 0. But x − st(x) ≃ 0, hence either x < a − δ/2 or x > b + δ/2, but then x is not in ∗[a,b]. For an interval that is not closed, say (a,b], let ai ∈ IR converge to a such that a < ai ≤ b; then x = H(ai) ∈ ∗(a,b] but st(x) = a ∉ (a,b].

2.14 An alternative to introducing ∗ZZ, ∗Q and ∗IR

In Sections 2.1 and 2.10 the following scheme for the introduction of ∗IN, ∗ZZ, ∗Q and ∗IR was used,

IN  →  ZZ  →  Q   →  IR
↓      ↓      ↓      ↓
∗IN    ∗ZZ    ∗Q     ∗IR

but there is an alternative, namely,

IN     ZZ     Q      IR
↓      ↑      ↑      ↑
∗IN →  ∗ZZ →  ∗Q  →  ∗IR

Extending ∗IN to ∗ZZ directly. Consider pairs ⟨m,n⟩ of elements m and n of ∗IN, and let these pairs generate constants Z′(m,n), subject to exactly the same identification and equality rules as were given in Section 2.1 for Z(m,n), m,n ∈ IN. Let ZZ′ = {Z′(m,n) : m,n ∈ ∗IN}. In ZZ′ (non)negativity, (non)positivity, absolute value, addition, subtraction, multiplication and the inequalities are defined in exactly the same way as they were defined for ZZ. A ‘natural’ bijection between ZZ′ and ∗ZZ can now be established as follows. Given any z′ ∈ ZZ′, there are infinite sequences (mi) and (ni), mi,ni ∈ IN, such that z′ = Z′(H(mi),H(ni)). Each pair ⟨mi,ni⟩ defines the element Z(mi,ni) ∈ ZZ, hence the two sequences define the element z = H(Z(mi,ni)) ∈ ∗ZZ. So each z′ ∈ ZZ′ defines a z ∈ ∗ZZ.

Conversely, each z ∈ ∗ZZ defines a z′ ∈ ZZ′. It is not difficult to see that the bijection that is implied by this preserves (non)negativity, (non)positivity, absolute value, addition, subtraction, multiplication and the inequalities, so that ZZ′ may be identified with ∗ZZ by identifying corresponding z′ and z.

Extending ∗ZZ to ∗Q directly. This time consider pairs ⟨m,n⟩ of elements m and n of ∗ZZ, and let these pairs generate constants Q′(m,n), subject to exactly the same identification and equality rules as were given in Section 2.1 for Q(m,n), m,n ∈ ZZ. Let Q′ = {Q′(m,n) : m,n ∈ ∗ZZ}. For the elements of Q′ everything is defined in exactly the same way as it was done for the elements of Q. Now a bijection between Q′ and ∗Q can be established that preserves everything, so that Q′ may be identified with ∗Q. In the preceding section we have shown that given any r ∈ IR there exists an x ∈ ∗Q such that st(x) = r. An alternative proof is now as follows. Given any r ∈ IR, ∀n ∈ ZZ : ∃m ∈ ZZ : m ≤ nr < m + 1, hence, by transfer, ∀n ∈ ∗ZZ : ∃m ∈ ∗ZZ : m ≤ n·∗r < m + 1, but ∗r = r, hence, for n > 0, m/n ≤ r < m/n + 1/n. Take n ∼ ∞, then, as 1/n ∼ 0, st(m/n) = r, where x = m/n ∈ Q′, as m,n ∈ ∗ZZ.

Extending ∗Q to ∗IR directly. This direct extension is more involved than the preceding two because in order to generate IR from Q infinite sequences (i.e. Cauchy sequences) of rationals are required, whereas for the preceding two extensions only pairs of natural numbers or integers were required, for recall that the internal version of a pair is still a pair, but that the internal version of an infinite sequence is not an infinite sequence. What is needed are internal Cauchy sequences with terms in ∗Q. Now what is an internal Cauchy sequence in the first place? A classical sequence (r(n)) of rationals r(n) is a Cauchy sequence if, ∀m ∈ IN : ∃k ∈ IN : ∀n,p ∈ IN, n,p > k : |r(n) − r(p)| < 1/m.
Consequently, an internal Cauchy sequence (r(n)), n ∈∗IN, r(n) ∈∗Q, is characterized by classical Cauchy sequences (ri(n)), n ∈ IN, ri(n) ∈Q, i = 1,2,3,..., such that r(H(ni)) = H(ri(ni)), hence by the hyperstatement, H[∀mi ∈ IN : ∃ki ∈ IN : ∀ni,pi ∈ IN, ni,pi > ki :| ri(ni)−ri(pi) |< 1/mi], which can be simplified to, H[∀m ∈ IN : ∃k ∈ IN : ∀n,p ∈ IN, n,p > k :| ri(n)−ri(p) |< 1/m]. In an analogous way, internal concurrency between the internal Cauchy sequences r(n) and s(n) is characterized by the hyperstatement, H[∀m ∈ IN : ∃k ∈ IN : ∀n ∈ IN, n > k :| ri(n)−si(n) |< 1/m].

Now let each internal Cauchy sequence r(n) of hyperrationals generate a constant R′(r(n)) and let,

IR′ = {R′(r(n)) : n ∈ ∗IN, r(n) ∈ ∗Q, and (r(n)) is an internal Cauchy sequence}.

Equality. R′(r(n)) = R′(s(n)) if and only if r(n) and s(n) are internally concurrent.

Identification. R′(r(n)) = r if for all n ∈ IN (hence for all n ∈ ∗IN), r(n) = r for some r ∈ ∗Q.

The definitions of absolute value, addition, subtraction, multiplication, division and the inequalities are similar to those given before in Section 2.1 and all this is preserved by the bijection between IR′ and ∗IR that is defined as follows. Let x′ ∈ IR′, then x′ = R′(r(n)) is generated by the internal Cauchy sequence r(H(ni)) = H(ri(ni)), which for each i defines the classical Cauchy sequence (ri(n)), n ∈ IN, which in turn defines xi = R(ri(n)) ∈ IR, and hence x = H(xi) ∈ ∗IR. Conversely, each x ∈ ∗IR defines an x′ ∈ IR′, for if ri(n) is given for all n ∈ IN, then this defines ri(n) for all n ∈ ∗IN. Therefore, IR′ can be identified with ∗IR.
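The alternative proof in this section that every real r is the standard part of some m/n rests on the inequality m ≤ nr < m + 1. As a finite illustration, the sketch below takes r = √2 (an assumption made here so that the floor can be computed exactly with integers) and lets ordinary large n stand in for a hyperlarge one; the function names are hypothetical.

```python
from fractions import Fraction
from math import isqrt


def floor_multiple_of_sqrt2(n: int) -> int:
    """Return the integer m with m <= n*sqrt(2) < m + 1, computed exactly."""
    return isqrt(2 * n * n)


def approximation(n: int) -> Fraction:
    """The rational m/n, which satisfies m/n <= sqrt(2) < m/n + 1/n."""
    return Fraction(floor_multiple_of_sqrt2(n), n)


for n in (1, 10, 1000, 10 ** 6):
    x = approximation(n)
    print(n, x, float(x))  # the error is below 1/n
```

For finite n the error bound 1/n is merely small; for hyperlarge n it is infinitesimal, which is what makes st(m/n) = r.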

2.15 Getting away with generating sequences and H(si); summary

Recall from classical analysis that the real numbers were introduced by means of Cauchy sequences of rational numbers. In more formalistic mathematics a real number ‘is’ the set of all Cauchy sequences concurrent with a given one, but in practice one is seldom working with these sequences. In a similar way we have been introducing the hypernumbers, say the hyperreals, by means of infinite sequences (xi) of reals xi. Rather than letting a hyperreal be some set, it was preferred to let it be something new, i.e. H(xi), generated by (xi), but a more formalistic procedure could have been followed just as well. Anyway, so many facts regarding nonstandard mathematics have become known in the preceding sections that, just as in classical real number analysis, we can in most cases do without generating sequences. This section serves to summarize these facts.

Just as in classical mathematics, in nonstandard mathematics there are the well-known notions of number, set and n-tuple, and in addition there are standard, internal and external notions. Any standard notion is internal, but no notion can be internal and external at the same time (see the figure in Section 1.6). There exists a bijection between the collection of all classical notions and the collection of all standard notions. If s is a classical notion, then ∗s, called its ∗-transform, is the corresponding standard one. Although the notation H(si) will largely disappear, the asterisks will not. Elements of internal sets are internal, but subsets of internal sets may or may not be internal. For example, {1,2,3} is an internal, but IN is an external subset of ∗IN. When external sets directly or indirectly enter the definition of, say, some function, then the latter may turn out to be external as well.

A fairly complete list of all results found so far that are free from the H-operator is given below.

1. A constant is internal if and only if it is an element of some standard set.

2. Let S be a classical set such that ∗s = s for each s ∈ S. Then ∗S = S if and only if S is finite. Otherwise S ⊆ ∗S.

3. Let S be a classical set such that each of its elements is a classical set of which each element is equal to its ∗-transform. Then ∗S = S if and only if S is finite and all its elements are finite. A similar result holds if S is of any level.

4. Let f be a function from X onto Y, and assume that X is finite, that ∗x = x for all x ∈ X and that ∗y = y for all y ∈ Y. Then ∗X = X, ∗Y = Y and ∗f = f.

5. Let g : W → X and f : X → Y, both g and f be surjective, and assume that X is finite, that ∗x = x for all x ∈ X and that ∗y = y for all y ∈ Y. Then,

∗(f◦g)(H(wi)) = (f◦∗g)(H(wi)) for all H(wi) ∈ ∗W, and
∗(f◦g)(∗w) = (f◦g)(w) for all w ∈ W,

even if W is not finite.

6. Summary of useful results.
a) ∗∅ = ∅.
b) ∗S = ∗T if and only if S = T, ∗S ≠ ∗T if and only if S ≠ T, ∗s ∈ ∗S if and only if s ∈ S, ∗S ⊆ ∗T if and only if S ⊆ T, and similarly for ⊂, ⊇ and ⊃.
c) ∗(S ∪ T) = ∗S ∪ ∗T, ∗(S ∩ T) = ∗S ∩ ∗T, ∗(S − T) = ∗S − ∗T, and if S ⊆ T then ∗(Sc) = (∗S)c, where taking the complement is with respect to T and ∗T, respectively.
d) ∗⟨s,t⟩ = ⟨∗s,∗t⟩, and similar equalities hold for n-tuples, n = 3,4,....
e) ∗(f(x)) = ∗f(∗x).
f) ∗[(f◦g)(w)] = (∗f◦∗g)(∗w).

7. If S is a classical infinite set of numbers, then S is external.

8. If A is a set of numbers, the inclusion ∗(P(A)) ⊆ P(∗A) always holds and is strict if and only if A is an infinite set, and if so, P(∗A) contains an element that is an external subset of A.

9. (The internal definition principle.) Let any statement be given, say P(x,X,s), where X is some set and x ∈ X makes sense. Let X and s be internal, then so is the set,

{x ∈ X : P(x,X,s)}.

10. (Transfer, first formulation.) Let R(X,X′,X″,...; s,s′,s″,...) be a given statement with constants or free variables X,X′,X″,... and s,s′,s″,..., and bound variables x,x′,x″,..., where X,X′,X″,... are sets, and where x occurs in either ∃x ∈ X or ∀x ∈ X, and similarly for x′,x″,.... Then,

R(X,X′,X″,...; s,s′,s″,...) ≡ R(∗X,∗X′,∗X″,...; ∗s,∗s′,∗s″,...).

11. (Transfer, second formulation.) Given any internal statement, replacing everything, including every bound variable, by its standard version is equivalent to replacing everything except every bound variable by its standard version.

12. (The standard definition principle.) Let ∗X be a standard set, x ∈ ∗X make sense, ∗s be standard, and P(x,∗X,∗s) be any statement. Then also,

{x ∈ ∗X : P(x,∗X,∗s)}

is standard.

13. Infinitely large numbers and nonzero infinitesimals exist.

14. ε ∼ 0 if and only if 1/ε ∼ +∞ or 1/ε ∼ −∞, hence s is appreciable if and only if 1/s is appreciable. Let ε ∼ 0, ε′ ∼ 0, s and s′ be appreciable, and ω ∼ ∞, ω′ ∼ ∞. Then,

ε + ε′ ∼ 0, ε − ε′ ∼ 0, ε·ε′ ∼ 0,
ε + s and ε − s are appreciable, and ε·s ∼ 0,
ε + ω ∼ ∞, ε − ω ∼ −∞, and ε·ω ∼ +∞ or −∞ or 0, or ε·ω is appreciable,
s + s′ and s − s′ are appreciable or ∼ 0, and s·s′ is appreciable,
s + ω ∼ +∞, s − ω ∼ −∞, s·ω ∼ +∞ or −∞,
ω + ω′ ∼ +∞, ω − ω′ ∼ +∞ or −∞ or ≃ 0, or ω − ω′ is appreciable, and ω·ω′ ∼ +∞.

15. In ∗IR the set of all infinitesimals is external.

16. In ∗IR the set of all positive hyperlarge numbers is external.

17. (Overflow or overspill.) Let S be an internal subset of ∗T, where T is either IN or ZZ, or Q, or IR, such that ∀m ∈ IN : ∃s(m) ∈ S : s(m) ≥ m, i.e. such that from a classical point of view S contains arbitrarily large elements, then ∃s ∈ S : s ∼ ∞, i.e. S contains some infinitely large element.

18. (Underflow or underspill.) Let S be an internal subset of ∗T, with T as before, such that ∀ω ∈ ∗IN, ω ∼ ∞ : ∃s(ω) ∈ S : s(ω) < ω ∧ s(ω) ∼ ∞, i.e. such that S contains infinitely large elements that are arbitrarily small, then ∃s ∈ S : s is limited.

19. (‘Inverse’ overflow.) Let S be an internal subset of ∗Q or ∗IR, such that ∀m ∈ IN : ∃s(m) ∈ S : |s(m)| ≤ 1/m, i.e. such that from a classical point of view S contains arbitrarily small elements, then ∃s ∈ S : s ≃ 0.

20. (‘Inverse’ underflow.) Let S be an internal subset of ∗Q or ∗IR, such that ∀ε, ε ∼ 0 : ∃s ∈ S : s ≥ ε, then ∃s ∈ S : s is not an infinitesimal.

21. If n ∈ ∗IN, then either n ∈ IN or n is hyperlarge.

22. Given any ω ∈ ∗IN, ω ∼ ∞, then S = [1,ω] is uncountable.

23. Let f be an internal function from ∗IN to {1,2}, such that both 1 and 2 are assumed somewhere, i.e. such that f is onto. Moreover, let f(n) = 2 for all n ≥ b for some b ∈ ∗IN. Then there is a β < b, β ∈ ∗IN such that f(β) = 1 and f(n) = 2 if n > β. Hence there is a last β such that f(β) = 1, even though b ∼ ∞ is allowed, and {n : n ≤ b} is uncountable in that case.

24. (Standard part theorem.) If x ∈ ∗IR then either |x| ∼ ∞, or x = r + ε, r ∈ IR, ε ≃ 0 for unique r and ε.

25. Given any r ∈ IR, there exists an x ∈ ∗Q such that st(x) = r.

26. Let a,b ∈ IR, a < b. If x ∈ ∗[a,b], then st(x) ∈ [a,b], but if the interval in IR is not closed, this is sometimes not true.

Chapter 3

Some applications

3.1 Introduction and least upper bound theorem

The aim of this chapter is to show how many definitions and proofs of elementary calculus can be simplified by means of nonstandard analysis. Only a number of important examples will be considered. A much more complete treatment is Keisler [26], where the existence of nonstandard numbers is taken for granted, however, and a simplified form of transfer is introduced in an axiomatic kind of way.

Theorem 3.1.1 (The least upper bound theorem.) Let S be a nonempty subset of IR that is bounded above by some (classical) real number. Then S has a least upper bound in IR.

Proof: Taking any a ∈ S, instead of S we may consider {s : s ∈ S, s ≥ a}, that is to say we may assume that s ≥ a for all s ∈ S. Then a, b ∈ IR, a < b, exist such that ∀s ∈ S : a ≤ s ≤ b, so that, by transfer, ∀s ∈ ∗S : a ≤ s ≤ b. Let ω ∈ ∗IN, ω ∼ ∞ be arbitrary and divide ∗[a,b] in ω equal subintervals of length δ = (b−a)/ω, so that δ ∼ 0, and consider the points a, a + δ, a + 2δ, ..., a + ωδ = b. Then, ∃j ∈ ∗IN : [∀s ∈ ∗S : s ≤ a + jδ] ∧ [∃s′ ∈ ∗S : s′ > a + jδ − δ].

Let β = st(a + jδ), which is well defined as a + jδ is limited. Then β is a (hence the) least upper bound of S. For first of all if s ∈ S then s ∈ ∗S, hence s ≤ a + jδ = β + ε for some ε ≃ 0, but since s, β ∈ IR this means that s ≤ β. And secondly, if β′ were a smaller upper bound of S, then β > β′ + 1/m for some m ∈ IN, hence β′ ≥ s′ > a + jδ − δ = β + ε − δ > β′ + 1/m + ε − δ, or δ − ε > 1/m, a contradiction.

Note that this proof is not much shorter than its classical counterpart, which essentially runs as follows. Let a1 = a, b1 = b and n = 1. (∗) If (an + bn)/2 is an upper bound of S, then let an+1 = an, bn+1 = (an + bn)/2, otherwise let an+1 = (an + bn)/2, bn+1 = bn. In either case replace n by n + 1 and start again from (∗). This procedure defines two concurrent Cauchy sequences (an) and (bn), both converging to the least upper bound of S (the reader may work out the details).
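The classical bisection procedure is easy to run for a concrete set. The sketch below assumes that the set is described by a predicate telling whether a given number is an upper bound; the example set S = {s ∈ Q : s² < 2} and all function names are hypothetical choices for illustration.

```python
from fractions import Fraction
from typing import Callable


def bisect_lub(is_upper_bound: Callable[[Fraction], bool],
               a: Fraction, b: Fraction, steps: int) -> tuple[Fraction, Fraction]:
    """Halve [a, b] `steps` times, keeping a <= sup S <= b throughout.

    Requires that initially a <= sup S and that b is an upper bound of S.
    """
    for _ in range(steps):
        mid = (a + b) / 2
        if is_upper_bound(mid):
            b = mid   # sup S lies in the left half
        else:
            a = mid   # some element of S exceeds mid, so sup S lies in the right half
    return a, b


def is_upper_bound_of_S(m: Fraction) -> bool:
    """For S = {s in Q : s*s < 2}: m is an upper bound iff m > 0 and m*m >= 2."""
    return m > 0 and m * m >= 2


a_n, b_n = bisect_lub(is_upper_bound_of_S, Fraction(1), Fraction(2), 30)
print(float(a_n), float(b_n))  # both close to the least upper bound sqrt(2)
```

Both endpoints stay rational while the least upper bound they enclose is not, which is the point made next about the role of IR.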

Since the theorem is false if IR is replaced by Q, both proofs must use something that is typical for IR. Indeed each limited element of ∗IR (not ∗Q) has a standard part, and any Cauchy sequence converges to some element of IR (not Q). This illustrates the obvious fact that a nonstandard proof must contain all essential steps (perhaps in disguise) of the corresponding classical proof.

3.2 Simplifying definitions and proofs of elementary calculus

First of all recall from Sections 1.4 and 1.5 that a function f from IR to IR is continuous at c ∈ IR if,

∀ε ∈ IR, ε > 0 : ∃δ ∈ IR, δ > 0 : ∀x ∈ IR, |x − c| < δ : |f(x) − f(c)| < ε,

or, equivalently, if,

∀ε ∈ ∗IR, ε > 0 : ∃δ ∈ ∗IR, δ > 0 : ∀x ∈ ∗IR, |x − c| < δ : |∗f(x) − ∗f(c)| < ε,

or, equivalently, if,

∀δ ∈ ∗IR, δ ≃ 0 : ∗f(c + δ) − ∗f(c) ≃ 0.

The first simplification, therefore, reads as follows.

Theorem 3.2.1 (Simplified definition of the continuity of real-valued real functions.) f : IR → IR is continuous at c ∈ IR if and only if, ∀δ ∈ ∗IR, δ ≃ 0 : ∗f(c + δ) − ∗f(c) ≃ 0.

Theorem 3.2.2 (Simplified definition of uniform continuity.) f : S → IR, S ⊆ IR is uniformly continuous in S if and only if, ∀x,y ∈ ∗S, x − y ≃ 0 : ∗f(x) − ∗f(y) ≃ 0.

Proof: Recall that f : S → IR, S ⊆ IR is uniformly continuous in S if,

∀ε ∈ IR, ε > 0 : ∃δ ∈ IR, δ > 0 : ∀x,y ∈ S, |x − y| < δ : |f(x) − f(y)| < ε. (3.1)

By transfer, this is equivalent to,

∀ε ∈ ∗IR, ε > 0 : ∃δ ∈ ∗IR, δ > 0 : ∀x,y ∈ ∗S, |x − y| < δ : |∗f(x) − ∗f(y)| < ε. (3.2)

But this can be simplified to,

∀x,y ∈ ∗S, x − y ≃ 0 : ∗f(x) − ∗f(y) ≃ 0. (3.3)

For let (3.1) be true and let m ∈ IN be given arbitrarily. Then there exist ε ∈ IR, ε > 0 such that ε < 1/m, and δ as in (3.1). Hence ∀x,y ∈ S, |x − y| < δ : |f(x) − f(y)| < 1/m, or, by transfer,

∀x,y ∈ ∗S, |x − y| < δ : |∗f(x) − ∗f(y)| < 1/m,

so that, as m was arbitrary, ∗f(x) − ∗f(y) ≃ 0 whenever x − y ≃ 0, which proves (3.3), since in absolute value any infinitesimal is smaller than δ. Conversely, let (3.3) be true, let ε ∈ IR, ε > 0 be arbitrary and let δ ≃ 0, δ > 0. Hence if x,y ∈ ∗S, |x − y| < δ then ∗f(x) − ∗f(y) = ε′ for some ε′ ≃ 0. In other words, since |ε′| < ε,

∃δ′ ∈ ∗IR, δ′ > 0 : ∀x,y ∈ ∗S, |x − y| < δ′ : |∗f(x) − ∗f(y)| < ε,

(take, for example, δ′ = δ) or, by transfer (in the opposite direction),

∃δ′ ∈ IR, δ′ > 0 : ∀x,y ∈ S, |x − y| < δ′ : |f(x) − f(y)| < ε,

which proves (3.1) since ε was arbitrary.

Theorem 3.2.3 If f is continuous at each x ∈ [a,b], a, b ∈ IR, a < b, then f is uniformly continuous in [a,b].

Simplified proof: Let x,y be in ∗[a,b], then by Theorem 2.13.3, st(x) ∈ [a,b] and st(y) ∈ [a,b]. Let x − y ≃ 0. Since y − st(y) ≃ 0 it follows that x − st(y) ≃ 0. By continuity, ∗f(y) − ∗f(st(y)) ≃ 0 and ∗f(x) − ∗f(st(y)) ≃ 0, so that ∗f(x) − ∗f(y) ≃ 0.

Theorem 3.2.4 (Simplified limit definition.) Let f : IR → IR, then lim x→c f(x) = k, c,k ∈ IR if and only if, ∀δ ∈ ∗IR, δ ∼ 0 : ∗f(c + δ) − k ≃ 0, so that k = st[∗f(c + δ)].

Proof: By definition, the limit exists if and only if,

∀ε ∈ IR, ε > 0 : ∃δ ∈ IR, δ > 0 : ∀x ∈ IR, 0 < |x − c| < δ : |f(x) − k| < ε,

or, by transfer,

∀ε ∈ ∗IR, ε > 0 : ∃δ ∈ ∗IR, δ > 0 : ∀x ∈ ∗IR, 0 < |x − c| < δ : |∗f(x) − k| < ε,

which can be simplified to,

∀δ ∈ ∗IR, δ ∼ 0 : ∗f(c + δ) − k ≃ 0.

Exercise: Complete this proof. Now let f : IN → IR, so that f is an infinite sequence, and let c be replaced by ∞.

Theorem 3.2.5 (Another simplified limit definition.) Let f : IN → IR and k ∈ IR, then lim n→∞ f(n) = k if and only if, ∀n ∈ ∗IN, n ∼ ∞ : ∗f(n) − k ≃ 0, so that k = st[∗f(n)].

Proof: By definition, the limit exists if and only if,

∀ε ∈ IR, ε > 0 : ∃n0 ∈ IN : ∀n ∈ IN, n > n0 : |f(n) − k| < ε,

or, by transfer,

∀ε ∈ ∗IR, ε > 0 : ∃n0 ∈ ∗IN : ∀n ∈ ∗IN, n > n0 : |∗f(n) − k| < ε,

which can be simplified to,

∀n ∈ ∗IN, n ∼ ∞ : ∗f(n) − k ≃ 0.

Exercise: Again complete the proof.

Exercise: Treat the cases where the limit itself is infinite.

Theorem 3.2.6 If f is a nondecreasing infinite sequence, that is bounded above, then f(n) has a finite limit for n tending to ∞.

Classical proof: The set {f(n) : n ∈ IN} is bounded above, hence in IR has a least upper bound β, so that f(n) ≤ β for all n ∈ IN, and for each m ∈ IN, f(n0) > β−1/m for some n0 ∈ IN, and since f is nondecreasing this implies that lim n→∞ f(n) = β. Exercise: Give a direct nonstandard proof similar to that of Theorem 3.1.1 (the least upper bound theorem), not using Theorem 3.1.1.

Theorem 3.2.7 (The intermediate value theorem.) If a, b ∈ IR, a < b, f : [a,b] → IR is continuous at each point of [a,b], and f(a) < 0, f(b) > 0, then f(c) = 0 for some c, a < c < b.

Simplified proof: See Section 1.4.

Theorem 3.2.8 (The extreme value theorem.) Let f : [a,b] → IR, a,b ∈ IR, a < b, and let f be continuous at each point of [a,b]. Then f(x) ≤ f(c) for some c ∈ [a,b] and all x ∈ [a,b], i.e. f has a maximum somewhere in the closed interval between a and b. And similarly for minimum.

Simplified proof: Let ω ∈ ∗IN, ω ∼ ∞, be arbitrary, and divide ∗[a,b] in ω equal subintervals of length δ = (b − a)/ω. Let n ∈ ∗IN be such that ∗f(a + nδ) ≥ ∗f(a + iδ) for all i = 0,1,...,ω. The existence of n follows by transfer, since any finite set has a maximum, hence so has any hyperfinite set. Obviously, a + nδ is limited, hence c = st(a + nδ) is well defined, c ∈ [a,b] by Theorem 2.13.3, and by continuity, ∗f(a + nδ) − f(c) = ε for some ε ≃ 0. Each x ∈ [a,b] is within the distance δ of some a + iδ and δ ∼ 0, hence, again by continuity, f(x) = ∗f(a + iδ) + ε′ for some ε′ ≃ 0, hence, f(x) ≤ ∗f(a + nδ) + ε′ = f(c) + ε + ε′, i.e. f(x) ≤ f(c), since f(x) and f(c) are both real.
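The hyperfinite grid in this proof has an obvious finite counterpart: split [a,b] into N equal parts and pick a grid point where f is largest. The sketch below does this in exact rational arithmetic; the function f(x) = x(1 − x) on [0,1] and the name `grid_maximizer` are hypothetical choices for illustration, and a finite N only approximates what a hyperlarge ω delivers exactly after taking the standard part.

```python
from fractions import Fraction
from typing import Callable


def grid_maximizer(f: Callable[[Fraction], Fraction],
                   a: Fraction, b: Fraction, n_parts: int) -> Fraction:
    """Return a grid point a + n*delta (0 <= n <= n_parts) where f is largest on the grid."""
    delta = (b - a) / n_parts
    return max((a + i * delta for i in range(n_parts + 1)), key=f)


def f(x: Fraction) -> Fraction:
    """Example function with maximum 1/4, attained at x = 1/2."""
    return x * (1 - x)


for n_parts in (10, 999, 1000):
    c = grid_maximizer(f, Fraction(0), Fraction(1), n_parts)
    print(n_parts, c, f(c))  # c lies within one grid step of 1/2
```

With finitely many parts the grid maximizer is only within one step δ of a true maximizer; in the nonstandard proof δ is infinitesimal, so the standard part removes the discrepancy.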

Theorem 3.2.9 (The composite function theorem.) Let g(w) be defined for w in a neighborhood of c ∈ IR, and let f(x) be defined for x in a neighborhood of g(c). Then f◦g is continuous at c if g is continuous at c and f is continuous at g(c).

Simplified proof: Let δ ≃ 0, then ∗g(c + δ) − g(c) ≃ 0, hence it follows that ∗f(∗g(c + δ)) − f(g(c)) ≃ 0.

3.3 Continuity and limits for internal functions

So far nonstandard characterizations were given for continuity and limits of classical functions. How about arbitrary internal functions? Let f be some internal function from ∗IR to ∗IR, and let c ∈∗IR, so that c may be hyperlarge or ‘almost standard’ i.e. be the sum of a real number and a nonzero infinitesimal. Or rather, let F be the set of all classical functions from IR to IR and let f ∈ ∗F. Then, by definition D) of Section 2.9, for suitable fi and ci, f = H(fi) is ∗continuous at c = H(ci) if for all i ∈ IN, fi is continuous at ci. Here fi : IR → IR and ci ∈ IR.

Theorem 3.3.1 (Continuity of internal functions.) The internal f : ∗IR →∗IR is ∗continuous at c ∈∗IR if and only if in the classical definition IR is replaced by ∗IR, i.e. if, ∀ε ∈∗IR,ε > 0 : ∃δ ∈∗IR,δ > 0 : ∀x ∈∗IR,| x−c |< δ :| f(x)−f(c) |< ε.

Proof: Letting f = H(fi) and c = H(ci), f is ∗continuous at c if and only if,

H[∀εi ∈ IR, εi > 0 : ∃δi ∈ IR, δi > 0 : ∀xi ∈ IR, |xi − ci| < δi : |fi(xi) − fi(ci)| < εi].

By Łoś’ theorem (Theorem 2.7.1) this is equivalent to,

∀H(εi) ∈ ∗IR, H(εi) > 0 : ∃H(δi) ∈ ∗IR, H(δi) > 0 : ∀H(xi) ∈ ∗IR, |H(xi) − H(ci)| < H(δi) : |H(fi)(H(xi)) − H(fi)(H(ci))| < H(εi),

and hence to what has to be proved.

Warning: If f or c is nonstandard, ∗continuity is not always equivalent to,

∀δ ∈ ∗IR, δ ≃ 0 : f(c + δ) − f(c) ≃ 0.

Counterexamples:

a) c standard, but f nonstandard; let ω ∼ ∞, f(x) = ωx, c = 1, δ = 1/ω^(1/2); then f(c + δ) − f(c) = ωδ = ω^(1/2) ∼ ∞.

b) f standard, but c nonstandard; let f(x) = x², δ ∼ 0, c = 1/δ; then f(c + δ) − f(c) = 2 + δ² ≃ 2, which is not infinitesimal.

Yet, ∀δ ∈ ∗IR, δ ≃ 0 : f(c + δ) − f(c) ≃ 0 makes sense for arbitrary internal f and c. If this is true, then f is called S-continuous at c.
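The arithmetic behind the two counterexamples can be checked with ordinary large numbers standing in for ω and 1/δ. This is only a finite caricature (no finite number is hyperlarge), and the function names are hypothetical, but it shows the increments ω^(1/2) and 2 + δ² exactly.

```python
from fractions import Fraction


def increment_linear(omega: int, c: Fraction, delta: Fraction) -> Fraction:
    """f(c + delta) - f(c) for f(x) = omega * x."""
    return omega * (c + delta) - omega * c


def increment_square(c: Fraction, delta: Fraction) -> Fraction:
    """f(c + delta) - f(c) for f(x) = x**2."""
    return (c + delta) ** 2 - c ** 2


for k in (1, 2, 3, 6):
    root = 10 ** k              # stands in for omega**(1/2)
    omega = root ** 2
    delta = Fraction(1, root)   # delta = omega**(-1/2), small when omega is large
    # a) the increment equals omega**(1/2): it grows although delta shrinks
    # b) with c = 1/delta the increment equals 2 + delta**2: it stays near 2
    print(k, increment_linear(omega, Fraction(1), delta),
          increment_square(1 / delta, delta))
```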

Examples:

a) Let α be a positive infinitesimal, and f(x) = α if x ≥ 0, f(x) = 0 if x < 0. Then f is S-continuous everywhere in ∗IR. It is ∗continuous at c ∈ ∗IR if c ≠ 0, but not at c = 0.

b) Let ω ∼ ∞ and f(x) = ωx. Then f is nowhere S-continuous, since f(x) − f(c) = ω(x − c) = ω^(1/2) if x − c = ω^(−1/2) ∼ 0. It is ∗continuous everywhere in ∗IR.

c) f(x) = x². Then f is not S-continuous if c ∼ ∞. It is ∗continuous everywhere in ∗IR.

Theorem 3.3.2 The internal function f : ∗IR → ∗IR is S-continuous at c ∈ ∗IR if and only if,

∀ε ∈ IR, ε > 0 : ∃δ ∈ IR, δ > 0 : ∀x ∈ ∗IR, |x − c| < δ : |f(x) − f(c)| < ε.

(Note that both ε and δ are standard, but that x is internal.)

Proof: The if part. Let ε ∈ IR, ε > 0 and δ ∈ ∗IR, δ ≃ 0 be given arbitrarily. Then there is a δ′ ∈ IR, δ′ > 0 such that, ∀x ∈ ∗IR, |x − c| < δ′ : |f(x) − f(c)| < ε. As |δ| < δ′, so that |x − c| < δ′ if x = c + δ, it follows that |f(c + δ) − f(c)| < ε, and since ε is arbitrary that f(c + δ) − f(c) ≃ 0.

The only-if part. Conversely, let ε and δ be as before but such that δ > 0. Then ∀x ∈ ∗IR, |x − c| < δ : |f(x) − f(c)| < ε. Now let the set S be defined by,

S = {δ ∈ ∗IR : δ > 0 and ∀x ∈ ∗IR, |x − c| < δ : |f(x) − f(c)| < ε}.

Then if δ ∈ S every number between 0 and δ is in S. By the internal definition principle (Corollary 2.7.1) S is internal, but it contains as a subset the set of all positive infinitesimals and the latter is external (see Theorems 2.10.3 and 2.11.4), so that S must contain some δ > 0 that is not an infinitesimal, and it follows that S must contain some δ > 0, δ ∈ IR.

Remark: In this proof the fact that an external set is not internal has been used. This fact is called Cauchy’s principle. It is an example of a principle of permanence. In general this is a statement that if some set S contains some subset T, the latter is strictly contained in S, because S and T happen to be different kinds of sets. In classical mathematics such principles do not seem to play a real part, but in nonstandard mathematics there are several of them, although there are only a few primary forms, or perhaps only one, i.e. Cauchy’s principle, which obviously really is a matter of definition. See Section 4.1 for more details.

In a similar way the ∗-transform of uniform continuity can be introduced: simply copy the classical definition and replace IR by ∗IR. And the simplified form of uniform continuity leads to S-uniform continuity. Hence the internal f : ∗IR → ∗IR is S-uniformly continuous in ∗IR if,

∀x,y ∈ ∗IR, x − y ≃ 0 : f(x) − f(y) ≃ 0.

Exercise: Formulate and show a theorem similar to Theorem 3.3.2, but for uniform continuity.

It is generally agreed to drop the asterisks in both ∗continuity and ∗uniform continuity, as well as in similar indications. But keep in mind that S-continuity is then not a special form of continuity, and similarly for other attributes.

Turning to ∗limits, the definitions similar to those in Section 3.2 are, dropping the asterisks: the internal f : ∗IR → ∗IR tends to the limit k ∈ ∗IR for x ∈ ∗IR tending to c ∈ ∗IR, if,

∀ε ∈ ∗IR, ε > 0 : ∃δ ∈ ∗IR, δ > 0 : ∀x ∈ ∗IR, 0 < |x − c| < δ : |f(x) − k| < ε.

And the internal f : ∗IN → ∗IR tends to the limit k ∈ ∗IR for n ∈ ∗IN tending to infinity, if,

∀ε ∈ ∗IR, ε > 0 : ∃n0 ∈ ∗IN : ∀n ∈ ∗IN, n > n0 : |f(n) − k| < ε.

Similar to S-continuity the definitions of S-limit are as follows. The internal f : ∗IR → ∗IR tends to the S-limit k ∈ ∗IR for x ∈ ∗IR tending to c ∈ ∗IR, if,

∀δ ∼ 0 : f(c + δ) − k ≃ 0.

And the internal f : ∗IN → ∗IR tends to the S-limit k ∈ ∗IR for n ∈ ∗IN tending to infinity, if,

∀n0 ∈ ∗IN, n0 ∼ ∞ : f(n0) − k ≃ 0.

Theorem 3.3.3 The internal f : ∗IR → ∗IR tends to the S-limit k ∈ ∗IR for x ∈ ∗IR tending to c ∈ ∗IR, if and only if, ∀ε ∈ IR, ε > 0 : ∃δ ∈ IR, δ > 0 : ∀x ∈ ∗IR, 0 < |x − c| < δ : |f(x) − k| < ε.

Proof: Left as an exercise.

Theorem 3.3.4 The internal f : ∗IN → ∗IR tends to the S-limit k ∈ ∗IR for n ∈ ∗IN tending to infinity, if and only if, ∀ε ∈ IR, ε > 0 : ∃n0 ∈ IN : ∀n ∈ ∗IN, n > n0 : |f(n) − k| < ε.

Proof: Left as an exercise.

A special case arises when k is finite, because then st(k) is well defined. Then also f(x) or f(n) are finite for x close enough to c or n large enough. Assuming in the first case that c is finite as well, it follows that the classical limits for x tending to st(c) or n tending to ∞ are equal to st(k).

Theorem 3.3.5 Let f, c and k be as before, and let k and c be finite. If f(x) tends to k for x ∈∗IR tending to c, then, lim x→st(c) st(f(x)) = st(k), where x ∈ IR, and if f(n) tends to k for n ∈∗IN tending to ∞, then, lim n→∞ st(f(n)) = st(k), where n ∈ IN.

Proof: In the first case

∃δ1 ∈ IR,δ1 > 0 : ∀x ∈∗IR,0 <| x−c |< δ1 :| f(x)−k |< 1, so that ∀x ∈ IR, 0 <| x−st(c) |< δ1/2 :| f(x) |<|st(k) | +2, which means that f(x) is finite for these x’s, from which the first claim follows. The second claim is shown in a similar way.

Exercise: Treat the cases where the limit itself is infinite. And define the corresponding S-limits.

3.4 More nonstandard characterizations of classical notions

Theorem 3.4.1 (Nonstandard characterization of Cauchy sequence.) s(n) is a classical Cauchy sequence if and only if,

∀n,p ∈ ∗IN, n,p ∼ ∞ : ∗s(n) − ∗s(p) ≃ 0. (3.4)

Proof: By definition,

∀m ∈ IN : ∃k ∈ IN : ∀n,p ∈ IN, n,p > k : |s(n) − s(p)| < 1/m, (3.5)

and by transfer, fixing m ∈ IN and k ∈ IN,

∀n,p ∈ IN, n,p > k : |s(n) − s(p)| < 1/m,

is equivalent to,

∀n,p ∈ ∗IN, n,p > k : |∗s(n) − ∗s(p)| < 1/m.

Now let n,p ∼ ∞, so that automatically n,p > k, no matter the value of k ∈ IN, then (3.5) implies that,

∀m ∈ IN : ∃k ∈ IN : ∀n,p ∈ ∗IN, n,p ∼ ∞ : |∗s(n) − ∗s(p)| < 1/m.

But since k plays no part any more this can be simplified to,

∀m ∈ IN : ∀n,p ∈ ∗IN, n,p ∼ ∞ : |∗s(n) − ∗s(p)| < 1/m,

hence to,

∀n,p ∈ ∗IN, n,p ∼ ∞ : ∀m ∈ IN : |∗s(n) − ∗s(p)| < 1/m,

hence to,

∀n,p ∈ ∗IN, n,p ∼ ∞ : ∗s(n) − ∗s(p) ≃ 0,

which is (3.4), the condition of the theorem.

Conversely, consider the negation of (3.5), that is, ∃m ∈ IN : ∀k ∈ IN : ∃n,p ∈ IN,n,p > k :| s(n)−s(p) |≥ 1/m,

111

fix m ∈ IN and apply transfer to, ∀k ∈ IN : ∃n,p ∈ IN,n,p > k :| s(n)−s(p) |≥ 1/m, giving, ∀k ∈∗IN : ∃n,p ∈∗IN,n,p > k :|∗s(n)−∗s(p) |≥ 1/m, which implies, fixing k ∼∞ arbitrarily, that n,p ∼∞, hence, ∃n,p ∈∗IN,n,p ∼∞ :|∗s(n)−∗s(p) |≥ 1/m, so that (3.1) implies that, ∃m ∈ IN : ∃n,p ∈∗IN,n,p ∼∞ :|∗s(n)−∗s(p) |≥ 1/m, or, ∃n,p ∈∗IN,n,p ∼∞ : ∃m ∈ IN :|∗s(n)−∗s(p) |≥ 1/m, or, ∃n,p ∈∗IN,n,p ∼∞ : ¬[∗s(n)−∗s(p) ' 0], which is the negation of (3.4).
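The characterization can be made tangible with a rough numerical sketch. It assumes that large finite indices may stand in for hyperlarge n and p (a float computation cannot hold a genuine hyperlarge number), and the helper names partial_sum and gap are hypothetical. For a Cauchy sequence the gap |s(n) − s(2n)| shrinks as n grows, for a non-Cauchy sequence it does not.

```python
def partial_sum(term, n):
    """s(n) = term(1) + ... + term(n)."""
    return sum(term(k) for k in range(1, n + 1))

def gap(term, n, p):
    """|s(n) - s(p)|, the quantity that should be infinitesimal for hyperlarge n, p."""
    return abs(partial_sum(term, n) - partial_sum(term, p))

def basel(k):
    return 1.0 / (k * k)   # partial sums converge, so s is Cauchy

def harmonic(k):
    return 1.0 / k         # partial sums diverge, so s is not Cauchy

sizes = [10**2, 10**3, 10**4]
basel_gaps = [gap(basel, n, 2 * n) for n in sizes]
harmonic_gaps = [gap(harmonic, n, 2 * n) for n in sizes]

for n, bg, hg in zip(sizes, basel_gaps, harmonic_gaps):
    print(n, bg, hg)
```

The first column of gaps keeps shrinking, while the second stays above 1/2 however large n is taken, which is the finite shadow of the failure of ∗s(n) − ∗s(p) ≃ 0.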

Theorem 3.4.2 (Nonstandard characterization of bounded set.) Let L be the set of all limited elements of ∗IR. Then S ⊆ IR is bounded if and only if ∗S ⊆ L.

Proof: S is bounded if,

∃m ∈ IN : ∀s ∈ S :| s |≤ m,

hence, by transfer, if,

∃m ∈ IN : ∀s ∈∗S :| s |≤ m,

so that ∗S ⊆ L. Conversely, if S is not bounded then,
∀m ∈ IN : ∃s ∈ S : |s| > m,
hence, by transfer,
∀m ∈ ∗IN : ∃s ∈ ∗S : |s| > m,
and taking m hyperlarge it follows that |s| is hyperlarge for some s ∈ ∗S, so that s ∉ L.


Theorem 3.4.3 (Nonstandard characterization of open set.) Let S ⊆ IR and let h(S) = {t ∈ ∗IR : t ≃ s for some s ∈ S}. Then S is open if and only if h(S) ⊆ ∗S.

Proof: S is open if,
∀s ∈ S : ∃m ∈ IN : ∀t ∈ IR, |t − s| < 1/m : t ∈ S,
hence, by transfer,
∀s ∈ S : ∃m ∈ IN : ∀t ∈ ∗IR, |t − s| < 1/m : t ∈ ∗S,
so that, restricting t such that t ≃ s,
∀s ∈ S : ∀t ∈ ∗IR, t ≃ s : t ∈ ∗S,
i.e. h(S) ⊆ ∗S.

Conversely, if S is not open, then,
∃s ∈ S : ∀m ∈ IN : ∃t ∈ IR, |t − s| < 1/m : t ∉ S,
hence, by transfer,
∃s ∈ S : ∀m ∈ ∗IN : ∃t ∈ ∗IR, |t − s| < 1/m : t ∉ ∗S,
and taking m hyperlarge it follows that for some s ∈ S and some t ∈ ∗IR we have that t ≃ s, but t ∉ ∗S, hence that h(S) is not a subset of ∗S.

Remark: h(S) is called the halo (or the monad) of S.

Theorem 3.4.4 (Nonstandard characterization of closed set.) S ⊆ IR is closed if and only if h(S^c) ⊆ ∗(S^c) = (∗S)^c.

Proof: Follows directly from the previous theorem.

Exercise: Show the last theorem independently of the previous theorem.

Theorem 3.4.5 (Nonstandard characterization of interior point.) Let s ∈ IR and let h(s) = {t ∈ ∗IR : t ≃ s}. Then s is an interior point of S ⊆ IR if and only if h(s) ⊆ ∗S.

Proof: Since s is an interior point of S if, ∃m ∈ IN : ∀t ∈ IR,| t−s |< 1/m : t ∈ S, the proof is a simplified version of that of Theorem 3.4.3. The details are left as an exercise.

Remark: In view of the previous remark, h(s) is of course called the halo of the point s. Note that h(0) is the set of all infinitesimals.

Theorem 3.4.6 (Nonstandard characterization of boundary point.) s ∈ IR is a boundary point of S ⊆ IR if and only if both h(s) ∩ ∗S and h(s) ∩ (∗S)^c are nonempty.

Proof: If s is a boundary point of S then,
∀m ∈ IN : [∃t ∈ IR, |t − s| < 1/m : t ∈ S] ∧ [∃t ∈ IR, |t − s| < 1/m : t ∉ S],
hence, by transfer,
∀m ∈ ∗IN : [∃t ∈ ∗IR, |t − s| < 1/m : t ∈ ∗S] ∧ [∃t ∈ ∗IR, |t − s| < 1/m : t ∉ ∗S],
so that, taking m ∼ ∞,
[∃t : t ≃ s, t ∈ ∗S] ∧ [∃t : t ≃ s, t ∉ ∗S],
hence h(s) ∩ ∗S ≠ ∅ and h(s) ∩ (∗S)^c ≠ ∅.

Conversely, if s is not a boundary point of S, then,
∃m ∈ IN : [∀t ∈ IR, |t − s| < 1/m : t ∉ S] ∨ [∀t ∈ IR, |t − s| < 1/m : t ∈ S],
hence, by transfer,
∃m ∈ IN : [∀t ∈ ∗IR, |t − s| < 1/m : t ∉ ∗S] ∨ [∀t ∈ ∗IR, |t − s| < 1/m : t ∈ ∗S].
The first substatement between square brackets implies that if t ≃ s then t ∉ ∗S, so that h(s) ⊆ (∗S)^c, and the second one similarly that h(s) ⊆ ∗S, so that either h(s) ∩ ∗S = ∅ or h(s) ∩ (∗S)^c = ∅.

Theorem 3.4.7 (Nonstandard characterizations of accumulation point and closure.) s ∈ IR is an accumulation point (or limit point) of S ⊆ IR if and only if,
∃t ∈ ∗S, t ≠ s : t ∼ s.
Let cl S be the closure of S. Then s ∈ cl S if and only if,
∃t ∈ ∗S : t ≃ s.

Proof: If s is an accumulation point of S then,
∀m ∈ IN : ∃t ∈ S, t ≠ s : |t − s| < 1/m,
hence, by transfer,
∀m ∈ ∗IN : ∃t ∈ ∗S, t ≠ s : |t − s| < 1/m,
so that, taking m hyperlarge, ∃t ∈ ∗S, t ≠ s : t ∼ s.

Conversely, if s is not an accumulation point of S, then,
∃m ∈ IN : ∀t ∈ S, t ≠ s : |t − s| ≥ 1/m,
hence, by transfer,
∃m ∈ IN : ∀t ∈ ∗S, t ≠ s : |t − s| ≥ 1/m,
so that, ∀t ∈ ∗S, t ≠ s : ¬[t ≃ s].

The second part of the theorem follows by observing that s ∈ cl S if and only if s ∈ S or else s is an accumulation point of S.

Exercise: Give an alternative proof of Theorem 3.4.4, using the fact that S is closed if and only if S = cl S.

3.5 Inverse functions; b^c

Recall that a function f : S → T has an inverse f⁻¹ if and only if f is bijective, and then f⁻¹(t) = s if f(s) = t.

Theorem 3.5.1 Let a function f be monotonically increasing (or decreasing) and be continuous in [a,b], a,b ∈ IR, a < b. Then,

1) range (f), the range of f, is a finite closed interval,
2) f has an inverse,
3) f⁻¹ too is monotonically increasing (or decreasing), and
4) f⁻¹ is continuous in its domain.

Proof: Only the case where f is increasing is considered.


1) As a ≤ x ≤ b implies that f(a) ≤ f(x) ≤ f(b), range (f) ⊆ [f(a),f(b)]. And if f(a) ≤ w ≤ f(b) then by the intermediate value theorem (Theorem 3.2.7) there is a c ∈ [a,b] such that f(c) = w, which means that [f(a),f(b)] ⊆ range (f). Therefore, range (f) = [f(a),f(b)].

2, 3) Clearly, f⁻¹ exists, [f(a),f(b)] is its domain, and it is increasing.

4) Let w = f(c) ∈ [f(a),f(b)], and let ε ∼ 0. If ε > 0 then, of course, w must be smaller than f(b), and if ε < 0 then w must be larger than f(a). Assume ε > 0. Note that ∗(f⁻¹) = (∗f)⁻¹, so that parentheses are not required here. Let c′ = ∗f⁻¹(w + ε), so that c′ > c. If c′ were not infinitesimally close to c, then c′ > c + 1/m for some m ∈ IN. As c, 1/m ∈ IR, f(c + 1/m) > f(c) + 1/n for some n ∈ IN, so that ∗f(c′) > f(c + 1/m) > f(c) + 1/n = w + 1/n, but ∗f(c′) = w + ε, hence ε > 1/n, a contradiction. It follows that c′ ∼ c, which shows the continuity of f⁻¹ at w.

The next subject of this section is the introduction of b^c, for b,c ∈ IR, b > 0. If c ∈ Q this can best be done in the classical way, using the functions x^c and b^x as well as the properties of inverse functions. Hence begin with x^n, n ∈ IN, x ∈ IR, x > 0, which is monotonically increasing and continuous, leading to the definition of x^{1/n} as its inverse, and hence to x^{m/n}, m,n ∈ IN, x ∈ IR, x > 0, either as (x^{1/n})^m or as (x^m)^{1/n}. To see that the two are identical, note that (y^n)^{1/n} = y, so that, taking y = (x^{1/n})^m it follows that,

(((x^{1/n})^m)^n)^{1/n} = (x^{1/n})^m,

and, since ((x^{1/n})^m)^n = ((x^{1/n})^n)^m = (x)^m, it follows that,

(((x^{1/n})^m)^n)^{1/n} = ((x)^m)^{1/n} = (x^m)^{1/n}.

For c ∈ Q, c < 0, let x^c = 1/x^{−c}, and in view of x^c·x^d = x^{c+d}, let x^0 = 1. Then x^c, c ∈ Q, x ∈ IR, x > 0, is increasing if c > 0, decreasing if c < 0, and equal to 1 if c = 0. Next consider b^x, b ∈ IR, b > 0, x ∈ Q, which is now well defined; then b^x is increasing if b > 1, decreasing if b < 1, and equal to 1 if b = 1; and continuous at each x ∈ Q.

For c ∈ IR there is a nonstandard alternative. Given any c ∈ IR, by Theorem 2.13.2, c = st(c′) for some c′ ∈ ∗Q, where c′ is determined uniquely up to a hyperrational infinitesimal. Now let,

g(c) = st(b^{c′}),

then g(c) = b^c, where,

b^c = lim_{x→c} b^x, x ∈ Q.


To see this note first of all that g(c) is defined uniquely, for if ε ≃ 0, ε ∈ ∗Q, then st(b^{c′+ε}) = st(b^{c′})·st(b^ε) = st(b^{c′})·1. Now let d ∈ Q, such that c = d + r for some r > 0, then g(d) = b^d. Since c′ = c + ε for some ε ≃ 0, c′ = d + r + ε and,
g(c) − g(d) = st(b^{d+r+ε}) − b^d = b^d·[st(b^{r+ε}) − 1].
In this product the first factor is positive, and the second one is positive (or negative) if b > 1 (or b < 1), and zero if b = 1. A similar result is obtained if r < 0. Hence the function g is monotonic if not constant, so that if c is between the rationals d and e, then g(c) is between b^d and b^e. This shows both the monotonicity and the continuity of g(x), x ∈ IR, and hence that g(c) = b^c.
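The construction can be imitated numerically. The sketch below is only an analogy: an ordinary rational close to c stands in for the hyperrational c′ (a float cannot hold a genuine infinitesimal), and the helper names rational_power and approx_power are hypothetical. It builds b^{m/n} as (b^{1/n})^m, as in the text, and shows that a finer rational approximation of c gives a value closer to b^c.

```python
from fractions import Fraction
import math

def rational_power(b: float, q: Fraction) -> float:
    """b**q for rational q = m/n, built as (b**(1/n))**m."""
    root = b ** (1.0 / q.denominator)   # x^(1/n), the inverse of x^n
    return root ** q.numerator          # (x^(1/n))^m, m may be negative

def approx_power(b: float, c: float, max_den: int) -> float:
    """Approximate b**c through a rational c_prime close to c."""
    c_prime = Fraction(c).limit_denominator(max_den)
    return rational_power(b, c_prime)

b, c = 2.0, math.sqrt(2.0)
target = b ** c
coarse = approx_power(b, c, 10)       # uses c_prime = 7/5
fine = approx_power(b, c, 10**6)      # c_prime agrees with c to about 12 digits

print(coarse, fine, target)
```

The monotonicity of b^x is what guarantees that squeezing c between closer rationals squeezes the value as well.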

Exercise: Show that st(p·q) = st(p)·st(q) and that st(b^ε) = 1 if b > 0 and ε ≃ 0, ε ∈ ∗Q.

3.6 Differentiation

Let f : ∗IR → ∗IR, c ∈ ∗IR, then f is said to be differentiable at c if for some k ∈ ∗IR,
lim_{x→c} (f(x) − f(c))/(x − c)
exists and is equal to k. Then this limit, which obviously is a ∗limit, is called the derivative of f at c, and usually k is replaced by f′(c) or by df(c)/dx. In case everything is standard, this definition becomes the classical definition of differentiability, and from Section 3.3 it follows that f : IR → IR is differentiable at c ∈ IR if for some k ∈ IR,
∀δ ∼ 0 : (∗f(c + δ) − f(c))/δ ≃ k = f′(c),
so that,
f′(c) = st[(∗f(c + δ) − f(c))/δ].
This means that f′(c) is infinitesimally close to a quotient, justifying to a certain extent calling f′(c) a differential quotient, even though f′(c) is a limit of a quotient.

As before this need not be true if f and c are internal; then f is called S-differentiable at c if for some internal k,
∀δ ∼ 0 : (f(c + δ) − f(c))/δ ≃ k = f′(c).

From the definition of f′(c) in the standard case it follows that,
∀δ ≃ 0 : ∗f(c + δ) − f(c) − f′(c)·δ = τδ, for some τ ≃ 0.
This is known as the increment theorem. Conversely, if, for some k ∈ IR,
∀δ ≃ 0 : ∗f(c + δ) − f(c) − kδ = τδ, for some τ ≃ 0,
then f : IR → IR is differentiable at c ∈ IR and f′(c) = k.
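As a small worked instance of the increment theorem, the sketch below takes f(x) = x³ (an example chosen here, not taken from the text) and computes the τ in f(c + δ) − f(c) − f′(c)·δ = τδ exactly with rational arithmetic. Small rational δ merely stand in for infinitesimals, and the function name tau is hypothetical.

```python
from fractions import Fraction

def f(x):
    return x**3

def f_prime(x):
    return 3 * x**2

def tau(c, delta):
    """The tau of the increment theorem: f(c+d) - f(c) - f'(c)*d = tau*d."""
    return (f(c + delta) - f(c) - f_prime(c) * delta) / delta

c = Fraction(2)
deltas = [Fraction(1, 10**k) for k in (1, 3, 6)]
taus = [tau(c, d) for d in deltas]

for d, t in zip(deltas, taus):
    print(d, t)   # tau equals 3*c*d + d**2, so it shrinks together with d
```

For this f the algebra gives τ = 3cδ + δ², which is infinitesimal whenever δ is, in agreement with the theorem.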

Theorem 3.6.1

1) If f : IR → IR is differentiable at c ∈ IR, then f is continuous at c.
2) If f is differentiable at c, and b ∈ IR, then f + b and b·f are differentiable at c as well, and,
(f + b)′(c) = f′(c), (b·f)′(c) = b·f′(c).
3) If both f and g are differentiable at c, then
(f + g)′(c) = f′(c) + g′(c), (f·g)′(c) = f′(c)·g(c) + f(c)·g′(c),
and, if g(c) ≠ 0, then,
(f/g)′(c) = (f′(c)·g(c) − f(c)·g′(c))/g²(c).
Note that in the last case g(x) ≠ 0 if x = c + δ, for δ ≃ 0, as g is continuous at c. In particular, if g(x) = 1/f(x), and f(c) ≠ 0, then, g′(c) = −f′(c)/f²(c).
4) Chain rule. If g is differentiable at c, and f is differentiable at g(c), then the composite function F = f◦g too is differentiable at c, and,
F′(c) = (f◦g)′(c) = f′(g(c))·g′(c).

Proof: 1) Follows immediately from ∗f(c + δ) − f(c) − f′(c)·δ = τδ, so that ∗f(c + δ) − f(c) ≃ 0 if δ ≃ 0.

2) and 3) are left as exercises.

4) For any δ ≃ 0, and for some µ ≃ 0,
∗f(g(c) + δ) − f(g(c)) − f′(g(c))·δ = µδ.
Let, given any ε ≃ 0, δ = ∗g(c + ε) − g(c) = g′(c)·ε + τε, for some τ ≃ 0. Then δ ≃ 0 and,
∗f(∗g(c + ε)) − f(g(c)) − f′(g(c))·(g′(c) + τ)·ε = µδ,
hence,
∗f(∗g(c + ε)) − f(g(c)) − f′(g(c))·g′(c)·ε = f′(g(c))·τε + µ(g′(c)·ε + τε) = τ′ε,
for some τ′ ≃ 0, which means that f◦g is differentiable at c, and that its derivative at c is equal to f′(g(c))·g′(c).
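The bookkeeping in the chain rule proof can be mimicked with dual numbers. This is an algebraic analogy only, not the hyperreal construction of this book: a dual number a + b·eps satisfies eps² = 0 exactly, so the terms τε and µδ that the proof shows to be negligible are simply dropped. The class Dual and the example functions f and g are hypothetical illustrations.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Dual:
    """A number re + eps*E with E*E = 0; eps carries the derivative."""
    re: float
    eps: float = 0.0

    def _lift(self, other):
        return other if isinstance(other, Dual) else Dual(float(other))

    def __add__(self, other):
        o = self._lift(other)
        return Dual(self.re + o.re, self.eps + o.eps)
    __radd__ = __add__

    def __mul__(self, other):
        o = self._lift(other)
        # (a + bE)(c + dE) = ac + (ad + bc)E, since E*E = 0
        return Dual(self.re * o.re, self.re * o.eps + self.eps * o.re)
    __rmul__ = __mul__

def g(x):
    return 3 * x + x * x      # g'(x) = 3 + 2x

def f(y):
    return y * y * y          # f'(y) = 3y^2

c = 2.0
F = f(g(Dual(c, 1.0)))        # evaluate f(g(c + E))
print(F.re, F.eps)            # value f(g(c)) and derivative f'(g(c))*g'(c)
```

Here g(2) = 10 and g′(2) = 7, so the eps part equals f′(10)·7 = 300·7, the product predicted by part 4 of the theorem.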

Theorem 3.6.2 (The critical point theorem.) Let X be an interval of IR, c ∈ X, f : X → IR be continuous at each x ∈ X, and c be a maximum or a minimum of f over X. Then c is an endpoint of X, or f′(c) is undefined, or f′(c) = 0.

Proof: Let c be a maximum of f over X. If c is not an endpoint of X and if f′(c) exists, then,
∀δ ∼ 0 : f′(c) = st[(∗f(c + δ) − f(c))/δ].
Taking first δ positive and then negative this gives that f′(c) ≤ 0 and f′(c) ≥ 0, hence f′(c) = 0.

Theorem 3.6.3 (Rolle’s theorem.) Let f be continuous in [a,b], a,b ∈ IR, a < b, and be differentiable in (a,b). Moreover, let f(a) = f(b) = 0. Then f′(c) = 0 for some c ∈ (a,b).

Proof: The proof is classical in nature, and relies on Theorems 3.2.8 and 3.6.2.

The proofs of the next corollaries too are classical in nature, and therefore are not given in detail.


Corollary 3.6.1 (Mean value theorem.) Let f be as in the previous theorem, except that f(a) = f(b) = 0 need not be true. Then,
f′(c) = (f(b) − f(a))/(b − a)
for some c ∈ (a,b).

Proof: The proof follows by applying Rolle’s theorem to g, defined by,

g(x) = f(x) − f(a) − ((f(b) − f(a))/(b − a))·(x − a).
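The auxiliary function g of this proof can be checked on a concrete case. The sketch below assumes f(x) = x³ on [0,2] (an example chosen here) and uses a hypothetical helper bisect_root to locate the point c where g′ vanishes, that is, where f′(c) equals the slope of the chord.

```python
import math

def bisect_root(h, lo, hi, tol=1e-12):
    """Root of h in [lo, hi], assuming h(lo) and h(hi) differ in sign."""
    assert h(lo) * h(hi) < 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if h(lo) * h(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

def f(x):
    return x**3

def f_prime(x):
    return 3 * x**2

a, b = 0.0, 2.0
slope = (f(b) - f(a)) / (b - a)

def g(x):
    """g vanishes at a and at b, so Rolle's theorem applies to it."""
    return f(x) - f(a) - slope * (x - a)

# g'(x) = f'(x) - slope; its root is the c of the mean value theorem.
c = bisect_root(lambda x: f_prime(x) - slope, a, b)
print(c, 2 / math.sqrt(3))
```

For this f the point c can also be found by hand: 3c² = 4 gives c = 2/√3.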

Corollary 3.6.2 (Generalized mean value theorem.) Let f and g both be continuous in [a,b], and be differentiable in (a,b). In addition let g′(x) ≠ 0 for all x ∈ (a,b). Then,
f′(c)/g′(c) = (f(b) − f(a))/(g(b) − g(a))
for some c ∈ (a,b).

Proof: The proof follows by applying Rolle’s theorem to h − h(a), where h is defined by,

h(x) = f(x)(g(b) − g(a)) − g(x)(f(b) − f(a)),

and h(a) = h(b) = f(a)g(b) − f(b)g(a).

Corollary 3.6.3 (Taylor’s theorem.) Let f and f′ both be continuous in [a,b], and let f″ exist in (a,b), then,
f(b) = f(a) + f′(a)(b − a)/1! + f″(c)(b − a)²/2!,
for some c ∈ (a,b), and similarly for higher derivatives.

Proof: The proof is based on the previous corollary.

Theorem 3.6.4 (L’Hôpital’s theorem.) Let both f and g be differentiable, and g′(x) ≠ 0 in a neighborhood (a,b) of c ∈ IR with the possible exception of the point c itself. Also assume that lim_{x→c} f(x) = lim_{x→c} g(x) = 0, and that lim_{x→c} f′(x)/g′(x) exists and is equal to k. Then,

lim_{x→c} f(x)/g(x) = k.

Similar statements hold true if k and/or c are replaced by ∞.

Proof: Only the case as stated will be proved. Without loss of generality it may be assumed that f(x) and g(x) are defined at x = c and that f(c) = g(c) = 0, so that f and g are continuous at c. It follows that the generalized mean value theorem can be applied to both [c, c + δ] and [c − δ, c], for δ > 0 and close enough to zero. Taking δ ∼ 0 this gives that, for some c_1 and c_2, c < c_1 < c + δ, c − δ < c_2 < c,
f′(c_1)/g′(c_1) = (f(c + δ) − f(c))/(g(c + δ) − g(c)) and f′(c_2)/g′(c_2) = (f(c) − f(c − δ))/(g(c) − g(c − δ)).
Since both c_1 and c_2 are infinitesimally close to c this implies that these two expressions are infinitesimally close to k and the theorem follows.
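A quick numerical illustration, under the assumption that small real δ may stand in for an infinitesimal: with f(x) = 1 − cos x, g(x) = x² and c = 0 (an example chosen here, with k = 1/2), both f(δ)/g(δ) and f′(δ)/g′(δ) approach k as δ shrinks.

```python
import math

def f(x):
    return 1.0 - math.cos(x)

def g(x):
    return x * x

def f_prime(x):
    return math.sin(x)

def g_prime(x):
    return 2.0 * x

k = 0.5
rows = []
for delta in (1e-1, 1e-2, 1e-3):
    quotient = f(delta) / g(delta)                      # f(c + d)/g(c + d) with c = 0
    derivative_quotient = f_prime(delta) / g_prime(delta)
    rows.append((delta, quotient, derivative_quotient))
    print(delta, quotient, derivative_quotient)
```

Both columns differ from 1/2 by an amount of order δ², the finite counterpart of being infinitesimally close to k.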

3.7 Integration

Only Riemann integration of continuous functions will be considered, even though the extension to more general functions is not very difficult. Let a,b ∈ ∗IR, a < b and let f : [a,b] → ∗IR be a continuous function. Then, by definition, the Riemann integral of f over [a,b] is,
J = ∫_a^b f(x) dx = lim_{n→∞} Σ_{i=1}^{n} f(a + i(b − a)/n)·(b − a)/n.
In case everything is standard, then, given any ω ∈ ∗IN, ω ∼ ∞,
J = st[Σ_{i=1}^{ω} f(a + i(b − a)/ω)·(b − a)/ω].
Note that in this case dx = (b − a)/ω ∼ 0 and dx > 0. And the usual formulation is,
J = st[Σ_{i=1}^{ω} f(a + i·dx)·dx].


Instead of the point a + i·dx any other point x_i ∈ [a + (i−1)dx, a + i·dx] may be selected, i = 1,...,ω, and J becomes st(S), where,

S = Σ_{i=1}^{ω} f(x_i) dx,

a so-called Riemann sum. That S is finite follows from the extreme value theorem (Theorem 3.2.8), as this theorem implies the existence of m,M ∈ IR such that, ∀x ∈ [a,b] : m ≤ f(x) ≤ M, so that m(b − a) ≤ S ≤ M(b − a) and S is indeed finite.

The notation of the integral correctly suggests that J depends neither on ω, nor on how the x_i are selected. Indeed, let y_i be some other element of the interval [a + (i−1)dx, a + i·dx], then if ω is not changed it follows from the uniform continuity of f and dx < δ for all δ ∈ IR, δ > 0, that,

∀ε ∈ IR, ε > 0 : |Σ_{i=1}^{ω} f(x_i) dx − Σ_{i=1}^{ω} f(y_i) dx| < ω·ε·dx = ε(b − a),

hence that the expression to the left of the last inequality is an infinitesimal, which shows that J does not depend on the selection of the x_i. And if ω_1,ω_2 ∼ ∞, and for k = 1,2, dx_k = (b − a)/ω_k and,

S_k = Σ_{i=1}^{ω_k} f(a + i·dx_k) dx_k,

let ω = ω_1ω_2 and let dx and S be as before, then,

S − S_1 = Σ_{i=1}^{ω_1} Σ_{j=1}^{ω_2} [f(a + {(i−1)ω_2 + j}dx) dx − f(a + iω_2 dx) dx],

and again this is an infinitesimal, as is S − S_2, and hence so is S_1 − S_2.

In case a > b, then by definition ∫_a^b f(x) dx = −∫_b^a f(x) dx, and if a = b, then ∫_a^a f(x) dx = 0. As remarked before f need not be continuous in order for J to exist. Also in what follows derivatives need not always be continuous, but let us not go into the details as this chapter is only concerned with showing the possibilities of nonstandard analysis.
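The independence of J from the choice of the x_i can be observed with ordinary Riemann sums. The sketch below assumes that a large finite n may stand in for the hyperlarge ω, takes f(x) = x² on [0,1] as an example chosen here, and uses a hypothetical helper riemann_sum whose parameter pick selects x_i inside each subinterval.

```python
def riemann_sum(f, a, b, n, pick):
    """Sum of f(x_i)*dx with x_i = a + (i - 1 + pick)*dx, where 0 <= pick <= 1."""
    dx = (b - a) / n
    return sum(f(a + (i - 1 + pick) * dx) for i in range(1, n + 1)) * dx

def f(x):
    return x * x

a, b, n = 0.0, 1.0, 10_000
left = riemann_sum(f, a, b, n, 0.0)    # x_i at the left endpoints
mid = riemann_sum(f, a, b, n, 0.5)     # x_i at the midpoints
right = riemann_sum(f, a, b, n, 1.0)   # x_i = a + i*dx, as in the definition

print(left, mid, right)
```

For an increasing f the right and left sums differ by exactly (f(b) − f(a))·dx, which becomes infinitesimal when n is replaced by a hyperlarge ω; all three sums then share the standard part 1/3.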

The definition of improper integrals is as in classical analysis: just consider the appropriate limits.


Exercise: Present the details of introducing improper integrals, complete with the simplified forms in case everything is standard.

In the remainder of this section it is assumed that everything is standard.

Theorem 3.7.1 If a < b < c, then,
∫_a^c f(x) dx = ∫_a^b f(x) dx + ∫_b^c f(x) dx.

Proof: Follows from the fact that the standard part of a sum is equal to the sum of the standard parts.

Theorem 3.7.2 F defined by F(x) = ∫_a^x f(t) dt, a ≤ x ≤ b, a < b, is continuous. Moreover, the derivative of F(x) exists and is equal to f(x), a < x < b. Conversely, if G is such that G′(x) = f(x), then,
F(b) = ∫_a^b f(x) dx = G(b) − G(a).

Proof: The continuity of F follows from the fact that, if δ ≃ 0,
∫_a^{x+δ} f(t) dt − ∫_a^x f(t) dt = ∫_x^{x+δ} f(t) dt,
no matter whether δ is nonnegative or negative, in which case the right-hand side too is an infinitesimal.

The second part of the theorem follows from the fact that m and M exist as before such that,
m·dx ≤ F(x + dx) − F(x) = ∫_x^{x+dx} f(t) dt ≤ M·dx.
If dx ∼ 0, dx > 0, m and M can be taken infinitesimally close to f(x), so that for some δ ≃ 0,
F(x + dx) − F(x) = f(x)dx + δ·dx,
which gives the desired result.

To show the last part of the theorem, note that for some constant c, F(x) = G(x) + c, a ≤ x ≤ b, and that F(a) = 0.

The proofs of the next two theorems are not particularly of a nonstandard nature, and for that reason are kept rather short.

Theorem 3.7.3 (Substitution rule.) Let F′(x) = f(x) for x ∈ [a,b], and let x = g(w) for w ∈ [α,β] such that g maps [α,β] onto [a,b], with g(α) = a, g(β) = b. Also assume that g is continuously differentiable. Then,
∫_a^b f(x) dx = ∫_α^β f(g(w)) g′(w) dw.

Proof: From Theorem 3.6.1 it follows that,

(F(g(w)))′ = F′(g(w)) g′(w) = f(g(w)) g′(w),

hence,

∫_a^b f(x) dx = F(b) − F(a) = F(g(β)) − F(g(α)) = ∫_α^β (F(g(w)))′ dw = ∫_α^β f(g(w)) g′(w) dw.

Theorem 3.7.4 (Integration by parts.) If both f and g are continuously differentiable in [a,b], then,
∫_a^b f(x) g′(x) dx = f(b)g(b) − f(a)g(a) − ∫_a^b f′(x) g(x) dx.

Proof: Since (f(x)g(x))′ = f(x)g′(x) + f′(x)g(x) it follows that,
J = ∫_a^b (f(x)g(x))′ dx,
exists and is equal to the sum of the two integrals in the statement of the theorem. But J = f(b)g(b) − f(a)g(a).

3.8 Pitfalls in nonstandard analysis

In this section it is shown by means of a number of examples that some care is required when applying nonstandard analysis.


A) The existence of a positive infinitesimal ε means that, ∃ε ∈∗IR : ∀m ∈ IN : 0 < ε < 1/m. By transfer (?) this would give, ? ∃ε ∈ IR : ∀m ∈ IN : 0 < ε < 1/m ?, which obviously is not true. The cause of the trouble is that in the first statement the constant IN is external, so that transfer is not allowed.

B) Any nonempty subset X of IN has a smallest element, that is,
∀X ⊆ IN, X ≠ ∅ : ∃x ∈ X : ∀y ∈ X : x ≤ y,
which by transfer (?) would lead to,
? ∀X ⊆ ∗IN, X ≠ ∅ : ∃x ∈ X : ∀y ∈ X : x ≤ y ?,
which is wrong, as can be seen by taking, for example, X = ∗IN − IN (which is external), for if x were the smallest element of X, then also x − 1 ∈ X, but x − 1 < x. The correct procedure would be:
∀X ∈ P(IN), X ≠ ∅ : ∃x ∈ X : ∀y ∈ X : x ≤ y,
which by transfer gives,
∀X ∈ ∗(P(IN)), X ≠ ∅ : ∃x ∈ X : ∀y ∈ X : x ≤ y,
so that X must be internal. The latter is indeed true, as could be shown by returning to first principles, i.e. by writing ∗(P(IN)) as {H(X_1,X_2,...) : X_i ⊆ IN}, so that, if x_i is the smallest element of X_i, H(x_1,x_2,...) is the smallest element of H(X_1,X_2,...).

C) As is well-known IR has Archimedean order, that is, given any x ∈ IR there is an n ∈ IN such that n > x, or,
∀x ∈ IR : ∃n ∈ IN : n > x.
But ∗IR has no such order,
? ∀x ∈ ∗IR : ∃n ∈ IN : n > x ?,
for take any x ∼ ∞. Indeed, transfer would give,
∀x ∈ ∗IR : ∃n ∈ ∗IN : n > x,
which is correct. One might say that ∗IR has hyper-Archimedean order.


D) In Section 1.4 it was shown by means of transfer that statements (1.2) and (1.3) are equivalent:
∀ε ∈ ∗IR, ε > 0 : ∃δ ∈ ∗IR, δ > 0 : ∀x ∈ ∗IR, |x − c| < δ : |∗f(x) − ∗f(c)| < ε
and,
∀δ ∈ ∗IR, δ ≃ 0 : ∗f(c + δ) − ∗f(c) ≃ 0,
where c ∈ IR and f : IR → IR. Note that ∗c = c. Replacing c by a nonstandard constant or replacing ∗f by a nonstandard, but internal, function, this equivalence may be destroyed. Examples were already given in Section 3.3.

E) Let S be a bounded subset of IR, then S has a least upper bound in IR. Hence, by transfer (?), the set of all infinitesimals in ∗IR would have a least upper bound β in ∗IR, which it has not, because β and hence 2β would have to be infinitesimals themselves, but 2β > β. Transfer is illegal here because, by Theorem 2.10.4, the infinitesimals form an external set.


Chapter 4

Some special topics

4.1 Principles of permanence

In the proof of Theorem 3.3.2 the fact was used that an external set is not internal, and in the remark below that proof it was noted that this fact, which is called Cauchy’s principle, is a principle of permanence. In general the latter is the statement that if a certain property P holds for all elements of a certain set A, it must hold for at least one element not in A:
[∀a ∈ A : P(a)] ⇒ [∃b ∉ A : P(b)].
The statement is based on the fact that an incompatibility between the property and the set would exist in case the property would only hold for the elements of the set. For example, let the set be IN and let the property be being an element of some given internal subset S of ∗IN:
[∀a ∈ IN : a ∈ S] ⇒ [∃b ∉ IN : b ∈ S].
If S contained only the elements of IN there would be an incompatibility, as S is internal and IN is external. In other words S must contain some ω ∼ ∞. The aspect of permanence here lies in the fact that belonging to S is necessarily carried over from the classical natural numbers to certain hyperlarge numbers.

From this point of view Cauchy’s principle would not seem to be a very explicit example of a principle of permanence, and indeed some authors restrict the term ‘principle of permanence’ to cases where if something is true for all elements of some set, it must be true for some element outside that set. On the other hand all principles of permanence different from Cauchy’s principle can be based on the latter (even Fehrele’s principle to be discussed below).

It would be wrong to conclude that in the example S would contain all ω ∼ ∞, since taking ω_o ∼ ∞ arbitrarily, S defined by,
S = {n : n ∈ ∗IN, n ≤ ω_o},
is internal. The latter is easily shown by returning to the basic theory, because, letting ω_o = H(n_{oi}),
S = {H(n_i) : n_i ≤ n_{oi}} = {H(n_i) : n_i ∈ S_i},
where S_i = {n : 1 ≤ n ≤ n_{oi}}. Note that the example is a special case of overflow (see Theorem 2.11.1), and that overflow, as well as underflow, may be seen as principles of permanence.

Theorem 4.1.1 If f : X → ∗IR is an internal function such that ∀x ∈ X : f(x) ≃ 0, then sup_{x∈X} |f(x)| ≃ 0.

Proof 1: Since f(x) is bounded over X, the supremum exists. Denote it by β. Then ∀δ > 0 : ∃x ∈ X : |f(x)| ≥ β − δ, that is β ≤ δ + |f(x)|. Let δ ∼ 0, then it follows that β ≃ 0.

Proof 2: Let I = {β ∈ ∗IR : [∀x ∈ X : |f(x)| ≤ β]}. Then I is internal, but it contains the external subset {β ∈ ∗IR : β > 0, β is not an infinitesimal}, hence, by Cauchy’s principle, it must contain some β ∼ 0, from which the result follows.

Note that the first proof, which is classical in nature, is to be preferred. Nevertheless, the second proof is a good illustration of applying Cauchy’s principle.

Corollary 4.1.1 Let [a,b] be some interval of ∗IR, such that b − a is limited, and let f and g both be Riemann integrable functions over [a,b] such that f(x) ≃ g(x) for all x ∈ [a,b]. Then,
∫_a^b f(x) dx ≃ ∫_a^b g(x) dx.

Proof: Let β = sup_{a≤x≤b} |f(x) − g(x)|. According to the theorem β ≃ 0, hence,
0 ≤ |∫_a^b f(x) dx − ∫_a^b g(x) dx| ≤ ∫_a^b |f(x) − g(x)| dx ≤ ∫_a^b β·dx = β(b − a),
which is an infinitesimal.
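The chain of inequalities in this proof holds for ordinary real functions too, which allows a rough numerical check. The sketch below assumes a small real β in place of an infinitesimal one; the functions f and g and the helper midpoint_integral are hypothetical examples chosen here.

```python
import math

def midpoint_integral(h, a, b, n=20_000):
    """Midpoint-rule approximation of the integral of h over [a, b]."""
    dx = (b - a) / n
    return sum(h(a + (i + 0.5) * dx) for i in range(n)) * dx

beta = 1e-6          # stands in for an infinitesimal bound on |f - g|

def f(x):
    return math.sin(x)

def g(x):
    return math.sin(x) + beta * math.cos(50.0 * x)   # |f(x) - g(x)| <= beta

a, b = 0.0, 2.0
diff = abs(midpoint_integral(f, a, b) - midpoint_integral(g, a, b))
bound = beta * (b - a)
print(diff, bound)
```

The difference of the integrals stays below β(b − a); with β infinitesimal and b − a limited that bound is itself infinitesimal, which is the content of the corollary.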

In Cauchy’s principle an external set is ‘confronted’ with an internal set, but there is another principle of permanence (in the more general sense of the term) where two external sets that are of different kinds are ‘confronted’ with each other. A typical example of an external set of the first kind is the set of all infinitesimals in ∗IR, and a typical example of an external set of the second kind is the set of all limited numbers in ∗IR. Obviously, one must find a general rule from which it follows that these two sets are indeed of a different kind. An external set is of the first kind if it is a halo, and it is of the second kind if it is a galaxy. The notions ‘halo’ and ‘galaxy’ are defined below, and the second principle of permanence is Fehrele’s principle:

No halo is a galaxy, hence no galaxy is a halo.

A set H is called a halo if there exists an ω ∈ ∗IN, ω ∼ ∞, such that,
1) H = ∩_{n∈IN} S(n), where (S(n), n ∈ [1,ω]) is a hyperfinite internal sequence of internal subsets S(n) of some given standard set (such as ∗IN or ∗IR or something totally different), and,
2) H is external.

The second requirement is not superfluous, for take all S(n) equal to some fixed internal set.

Obviously, the sequence involved may be an infinite sequence, because for any ω ∼ ∞ it contains a hyperfinite sequence with [1,ω] as its domain. It is also allowed to define S(n) for n ∈ IN only, even though the sequence would then be external. For, with the operator H as in the basic theory, let S(n) = H(S_i(n)), n ∈ IN, and define T(n) for n ∈ ∗IN by,
T(H(n_i)) = H(S_i(n_i)),
so that T = H(S_i) and T is internal. Moreover, T(n) = H(S_i(n)) = S(n), n ∈ IN, hence T extends S as a function with domain IN to a function with domain ∗IN.

It is no restriction to assume that the sequence (S(n), n ∈ [1,ω]) is nonincreasing, i.e. that S(1) ⊇ S(2) ⊇ ... ⊇ S(ω), for if this is not the case, let,
S′(n) = ∩_{k≤n} S(k), n ∈ [1,ω],
then (S′(n), n ∈ [1,ω]) is nonincreasing, and H = ∩_{n∈IN} S′(n).

It may even be assumed that the sequence is strictly decreasing, i.e. that S(1) ⊃ S(2) ⊃ ... ⊃ S(ω) (perhaps for some other ω). For let,
K = {n ∈ [1,ω] : [∃p(n) > n : S(n) ⊃ S(p(n))]},
then IN ⊆ K, because otherwise ∃n0 ∈ IN : ∀p > n0 : S(n0) = S(p), hence H = S(n0), but H is external and S(n0) is internal. Moreover, by the internal definition principle, K is internal, hence ∃ω′ ∼ ∞ : ω′ ∈ K, i.e. [1,ω′] ⊆ K. Now let m_1 = 1, m_2 = p(m_1), ..., and S″(n) = S(m_n), n ∈ [1,ω′], then (S″(n), n ∈ [1,ω′]) is strictly decreasing, and H = ∩_{n∈IN} S″(n).


Conversely, if H = ∩_{n∈IN} S(n) and (S(n), n ∈ [1,ω]), where ω ∼ ∞, is strictly decreasing, then H is automatically external, for let M = {n ∈ ∗IN : H ⊂ S(n)}. If H were internal, then, by the internal definition principle, M would be internal as well, but M = IN and IN is external. To see that M = IN observe that IN ⊆ M and suppose that H ⊂ S(ω) for some ω ∼ ∞, then H ⊂ S(ω) ⊂ S(n) for all n ∈ IN, so that H ⊂ S(ω) ⊆ ∩_{n∈IN} S(n) = H, a contradiction. This proves the next theorem.

Theorem 4.1.2 H is a halo if and only if H = ∩_{n∈IN} S(n), where (S(n), n ∈ [1,ω]) for some ω ∼ ∞ is a strictly decreasing internal sequence of internal sets S(n).

A set G is called a galaxy if there exists an ω ∈ ∗IN, ω ∼ ∞, such that,
1) G = ∪_{n∈IN} T(n), where (T(n), n ∈ [1,ω]) is a hyperfinite internal sequence of internal subsets T(n) of some given standard set, and,
2) G is external.

As before the sequence may be an infinite internal sequence or be defined for n ∈ IN only. Also the sequence may be assumed to be nondecreasing and even strictly increasing, as can be seen by an argument similar to the one leading to the preceding theorem.

Theorem 4.1.3 G is a galaxy if and only if G = ∪_{n∈IN} T(n), where (T(n), n ∈ [1,ω]) for some ω ∼ ∞ is a strictly increasing internal sequence of internal sets T(n).

Remarks:

1. The given standard set is arbitrary, and hence may be some abstract set. This shows the generality of the two definitions, where numbers only play a part in the definitions of the two notions, and even this can be weakened, as can be seen from the next remark.
2. The definitions can be generalized by replacing ∗IN by some standard index set ∗A, and letting S and T be internal functions mapping each a ∈ ∗A to internal sets S(a) and T(a). Then H = ∩_{a∈A} S(a) and G = ∪_{a∈A} T(a).

Theorem 4.1.4 (Fehrele’s principle.) No halo is a galaxy, hence no galaxy is a halo.

Proof: (Van den Berg.) Assume to the contrary that some halo H is equal to some galaxy G. Let H = ∩_{n∈IN} S(n) and G = ∪_{n∈IN} T(n), with S(n) and T(n) internal, (S(n)) nonincreasing and (T(n)) nondecreasing. Then T(n) ⊆ S(n) for all n ∈ IN. Let I = {n ∈ ∗IN : T(n) ⊆ S(n)}. Then I is internal, as follows from the internal definition principle. Also IN ⊆ I, so that, since IN is external, ω ∈ I for some ω ∼ ∞, so that S(n) ⊇ S(ω) ⊇ T(ω) ⊇ T(n) for all n ∈ IN, hence H ⊇ S(ω) ⊇ T(ω) ⊇ G, or H = S(ω) = T(ω) = G, so that H = G would be internal.

Exercise: Show that the subset G of an internal set S is a galaxy if and only if S −G is a halo.

Corollary 4.1.2 (Robinson’s lemma.) Let (s(n), n ∈ ∗IN), s(n) ∈ ∗IR, be an internal sequence, such that s(n) ≃ 0 for all n ∈ IN. Then,
∃ω ∈ ∗IN, ω ∼ ∞ : ∀k ∈ ∗IN, k ≤ ω : s(k) ≃ 0.

Proof 1: (Van den Berg.) Let H = {n ∈ ∗IN : [∀k ≤ n : s(k) ≃ 0]} and G = IN. Then G is a galaxy and H ⊇ IN = G. If H is external, then G ⊂ H (by Fehrele’s principle), and if H is internal then trivially G ⊂ H, hence G ⊂ H anyway. Let ω ∈ H − G, then ω ∼ ∞, and s(k) ≃ 0 for all k ≤ ω.

Proof 2: (Robinson; in time preceding the first proof, and using Cauchy’s principle.) Let S = {n ∈ ∗IN : [∀k ≤ n : |s(k)| ≤ 1/k]}, then S is internal and S ⊇ IN, hence (by Cauchy’s principle) S ⊃ IN, so that,
∃ω ∈ ∗IN, ω ∼ ∞ : ∀k ≤ ω : |s(k)| ≤ 1/k.
Let k ≤ ω. If k ∼ ∞, then s(k) ≃ 0, as then 1/k ≃ 0, and if k ∈ IN, then by assumption s(k) ≃ 0.

Corollary 4.1.3 (Dominated approximation.) Let f, g and h be functions from ∗IR to ∗IR, Riemann integrable over (−∞,+∞). Let f and g be internal but h be standard. Assume that f(x) ≃ g(x) for all limited x, and that |f(x)|, |g(x)| ≤ h(x) for all x ∈ ∗IR. Then,
∫_{−∞}^{+∞} f(x) dx ≃ ∫_{−∞}^{+∞} g(x) dx.

Proof: Given any n ∈ IN, let β = sup_{|x|≤n} |f(x) − g(x)|, then β ≃ 0, as follows from Theorem 4.1.1, and Corollary 4.1.1 implies that,
∀n ∈ IN : ∫_{−n}^{+n} f(x) dx ≃ ∫_{−n}^{+n} g(x) dx,
so that, by Robinson’s lemma, there exists an ω ∼ ∞ such that,
∫_{−ω}^{+ω} f(x) dx ≃ ∫_{−ω}^{+ω} g(x) dx.
Since, as h is standard, ∫_{|x|≥ω} h(x) dx ≃ 0, it follows that,
∫_{|x|≥ω} f(x) dx ≃ 0 and ∫_{−ω}^{+ω} f(x) dx ≃ ∫_{−∞}^{+∞} f(x) dx,
and similarly for g instead of f, from which the desired result follows.

Exercise: Show that if (s(n), n ∈ ∗IN), s(n) ∈ ∗IR, is an internal sequence such that ∀n ∈ IN : |s(n)| < 1/n, then ∃ω ∼ ∞ : ∀n ∈ [1,ω] : |s(n)| < 1/n.

4.2 The saturation principle

The saturation principle is concerned with infinite sequences of internal sets and does not hold in classical mathematics. An infinite sequence (S(n), n ∈ IN) of sets (internal or not) has the finite intersection property if ∩_{k=1}^{n} S(k) ≠ ∅ for all n ∈ IN.

Theorem 4.2.1 (Saturation.) Let the infinite sequence (S(n), n ∈ IN) of internal sets S(n) have the finite intersection property, then the intersection of all of them is nonempty, i.e. ∩_{n∈IN} S(n) ≠ ∅.

Proof 1: (Not using permanence.) Let S(n) = H(S_i(n)), where H is the H-operator of the basic theory. For all n ∈ IN, let,
T(n) = ∩_{k=1}^{n} S(k), and T_i(n) = ∩_{k=1}^{n} S_i(k),
then for n ≥ 2, T(1) ⊇ T(2) ⊇ ... ⊇ T(n), hence,
{i : T_i(1) ⊇ T_i(2) ⊇ ... ⊇ T_i(n)} ∈ U,
where U is the basic free ultrafilter. Also {i : T_i(n) ≠ ∅} ∈ U, and since {i : i ≤ n} is a finite set,
Q_n = {i : i ≥ n, T_i(1) ⊇ T_i(2) ⊇ ... ⊇ T_i(n), T_i(n) ≠ ∅} ∈ U.
Obviously, Q_n ⊇ Q_{n+1}. For i ∈ Q_2, i ≥ 2, let n_i be the maximal n ≥ 2 such that, T_i(1) ⊇ T_i(2) ⊇ ... ⊇ T_i(n), T_i(n) ≠ ∅, and n ≤ i. This n_i is well-defined. Then {n_2,n_3,...} is not bounded, because if it were, so that n_i ≤ m for some m ∈ IN, then Q_{m+1} = ∅, but Q_{m+1} ∈ U. Now for each i ∈ Q_2, i ≥ 2 take s_i ∈ T_i(n_i) and take s_i arbitrarily otherwise, then for each n ∈ IN, H(s_i) ∈ S(n), because for each n ∈ IN there is an n_i ≥ n, so that as s_i ∈ T_i(n_i), also s_i ∈ T_i(n) ⊆ S_i(n), and H(s_i) ∈ S(n), as Q_2 ∈ U.

Proof 2: (Using Cauchy’s principle, and more elegant.) First extend the given sequence to an internal sequence as indicated below the definition of halo (this is not necessary in the first proof). Now let Q = {n ∈ ∗IN : ∩_{k=1}^{n} S(k) ≠ ∅}, then Q is internal and Q ⊇ IN, hence Q ⊃ IN and,
∃ω ∼ ∞ : ∩_{k=1}^{ω} S(k) ≠ ∅,
so that certainly ∩_{k∈IN} S(k) ≠ ∅.

Note that the second proof leads to a more general result, which in fact is a principle of permanence.

In classical mathematics a counterexample to the theorem is, for example, the sequence (S(n)) where S(n) = {n,n + 1,...}, n ∈ IN.
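The classical counterexample can be spelled out in a few lines. The sketch below represents S(n) = {n, n + 1, ...} by its membership test; the helper names in_finite_intersection and excluding_index are hypothetical. Every finite intersection contains a witness, yet each candidate m is excluded from the full intersection by the set S(m + 1).

```python
def S(n):
    """Membership test for S(n) = {n, n+1, ...} inside the classical IN."""
    return lambda m: m >= n

def in_finite_intersection(m, n):
    """Is m an element of S(1) ∩ ... ∩ S(n)?"""
    return all(S(k)(m) for k in range(1, n + 1))

def excluding_index(m):
    """An index k with m not in S(k), so m is not in the intersection of all S(k)."""
    return m + 1

# Finite intersection property: n itself lies in S(1) ∩ ... ∩ S(n).
witness_ok = all(in_finite_intersection(n, n) for n in range(1, 50))

# No natural number survives all the sets.
nothing_survives = all(not S(excluding_index(m))(m) for m in range(1, 1000))

print(witness_ok, nothing_survives)
```

In ∗IN the situation differs: any hyperlarge ω belongs to every ∗S(n) with n ∈ IN, in line with the theorem.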

Corollary 4.2.1 Let A be a given internal set, and (S(n)) an infinite sequence of internal subsets of A. If for all n ∈ IN, ∪n k=1S(k) 6= A, then ∪k∈INS(k) 6= A.Hence if the union of any finite number of S(n) does not fill up A, then the union of all of them does not fill up A.

Proof: The proof follows from the fact that $(\bigcup S(n))^c = \bigcap S^c(n)$, where $c$ denotes complementation with respect to $A$.

Corollary 4.2.2 Given an infinite sequence $(S(n))$ of internal sets, then $S = \bigcup_{k \in \mathbb{N}} S(k)$ is internal if and only if there exists an $n \in \mathbb{N}$ such that $S = \bigcup_{k=1}^{n} S(k)$.

Proof: The if-part follows immediately. Conversely, if $S$ is internal, then so are all $T(n) = S - S(n)$, and $\bigcap_{k \in \mathbb{N}} T(k) = \emptyset$. Hence, by Theorem 4.2.1, there must exist an $n \in \mathbb{N}$ such that $\bigcap_{k=1}^{n} T(k) = \emptyset$, which means that $S = \bigcup_{k=1}^{n} S(k)$.


Corollary 4.2.3 Given an infinite sequence $(S(n))$ of internal sets, then $S = \bigcap_{k \in \mathbb{N}} S(k)$ is internal if and only if there exists an $n \in \mathbb{N}$ such that $S = \bigcap_{k=1}^{n} S(k)$.

Proof: By complementation from Corollary 4.2.1.

4.3 Stirling’s formula

In order to provide still more evidence that nonstandard mathematics can be a very elegant substitute for classical mathematics, in this section Stirling’s formula for large factorials will be derived by nonstandard means. The argument closely follows that given in Van den Berg and Sari [27]. It takes the definition and properties of $e$ as the base of the natural logarithm for granted, as well as those of $\pi$ as the area of the unit circle, and that

$$\int_{-\infty}^{+\infty} \exp(-x^2)\,dx = \sqrt{\pi}.$$

By definition,

$$\Gamma(x) = \int_{0}^{\infty} e^{-t} t^{x-1}\,dt, \quad x \in {}^*\mathbb{R},\ x > 0.$$

Also the existence of this integral is taken for granted. Let $\omega$ be any positive hyperlarge element of ${}^*\mathbb{R}$, so that

$$\Gamma(\omega + 1) = \int_{0}^{\infty} e^{-t} t^{\omega}\,dt.$$

The integrand is increasing in the interval $[0, \omega]$ and decreasing in the interval $[\omega, \infty)$, for which reason the variable $t$ is replaced by $u = (t - \omega)/\omega$, giving

$$\Gamma(\omega + 1)\,\omega^{-\omega-1} e^{\omega} = \int_{-1}^{\infty} e^{-\omega u + \omega \log(1+u)}\,du,$$

so that the integrand now reaches its maximum at $u = 0$. It so happens that there exists a positive infinitesimal $\delta$ such that the contributions of the integrand over the intervals $[-1, -\delta]$ and $[+\delta, \infty)$ may be ignored, so that only the interval $[-\delta, +\delta]$ need be taken into account. In other words, the ‘mass’ of the integrand is almost entirely concentrated in a hypersmall interval around zero.

Instead of the infinitesimal $\delta$ consider for the time being any $d \in {}^*\mathbb{R}$, $0 < d < 1$, split $[-1, \infty)$ into $[-1, -d]$, $[-d, +d]$ and $[+d, \infty)$, and indicate the integrals of $e^{-\omega u + \omega \log(1+u)}$ over these subintervals by

$$\int_{-1}^{-d},\quad \int_{-d}^{+d},\quad \int_{+d}^{\infty},$$

respectively.


1) $[+d, \infty)$. Since the second derivative of $u - \log(1+u)$ is positive, it follows from Taylor’s theorem that

$$u - \log(1+u) > d - \log(1+d) + (u-d)\,d/(1+d),$$

giving, after replacing in the integrand the left-hand side of this inequality by its right-hand side, and evaluating the resulting integral, that

$$0 < \int_{+d}^{\infty} < e^{-\omega d + \omega \log(1+d)}\,(1+d)/(\omega d),$$

if, in view of the denominator $\omega d$, $d$ is not an infinitesimal. Since $-d + \log(1+d) < 0$ it then further follows that

$$\forall m \in \mathbb{N} : 0 < \int_{+d}^{\infty} < \omega^{-m}.$$

Let $G = \{d \in {}^*\mathbb{R} : 0 < d < 1,\ d \text{ is not an infinitesimal}\}$, then since the positive infinitesimals form a halo, $G$ is a galaxy. But

$$H = \bigcap_{m \in \mathbb{N}} \Big\{d \in {}^*\mathbb{R} : 0 < d < 1,\ 0 < \int_{+d}^{\infty} < \omega^{-m}\Big\}$$

is a halo that clearly contains $G$, hence, by Fehrele’s principle, $H$ must contain a positive infinitesimal $\delta'$, such that

$$\forall m \in \mathbb{N} : 0 < \int_{+\delta'}^{\infty} < \omega^{-m}.$$

Obviously, $\delta'$ may be replaced by a larger infinitesimal.

2) $[-1, -d]$. Now

$$u - \log(1+u) > -d - \log(1-d) - (u+d)\,d/(1-d),$$

and since $d + \log(1-d) < 0$ it follows similarly that

$$\forall m \in \mathbb{N} : 0 < \int_{-1}^{-d} < \omega^{-m},$$

if again $d$ is not an infinitesimal, but nevertheless there must be a positive infinitesimal $\delta''$, that may be replaced by a larger one, such that

$$\forall m \in \mathbb{N} : 0 < \int_{-1}^{-\delta''} < \omega^{-m}.$$

The details are left as an exercise. Letting $\delta = \max\{\delta', \delta''\}$ it follows that

$$\forall m \in \mathbb{N} : 0 < \int_{+\delta}^{\infty} + \int_{-1}^{-\delta} < 2\omega^{-m},$$

showing that the contributions of the two ‘tails’ are extremely small, which is not yet to say that they may be ignored.


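3) $[-\delta, +\delta]$. By Taylor’s theorem, $u - \log(1+u) = (u^2/2)/(1+\theta u)^2$ for some $\theta$, $0 < \theta < 1$, so that $u - \log(1+u) = u^2(1 + \varepsilon'(u))/2$ for some $\varepsilon'(u) \simeq 0$. Replacing $u$ by $v = u\sqrt{\omega}$ then gives, for some $\varepsilon(v) \simeq 0$,

$$\sqrt{\omega} \cdot \int_{-\delta}^{+\delta} = \int_{-\delta\sqrt{\omega}}^{+\delta\sqrt{\omega}} \exp(-v^2(1 + \varepsilon(v))/2)\,dv.$$

Here $\delta$ is fixed such that $\delta\sqrt{\omega} \sim \infty$. Now let $f(v) = \exp(-v^2(1 + \varepsilon(v))/2)$, $g(v) = \exp(-v^2/2)$, and $h(v) = \exp(-v^2/4)$, then from Theorem 4.1.3 it follows that

$$\int_{-\delta}^{+\delta} \sim \omega^{-1/2} \cdot \int_{-\infty}^{+\infty} \exp(-v^2/2)\,dv = \sqrt{2\pi/\omega}.$$

Combining everything (in 2) it is sufficient to take $m = 1$) finally leads to

$$\lim_{x \to \infty} \frac{\Gamma(x+1)}{x^x e^{-x} \sqrt{2\pi x}} = 1.$$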

For more general results the reader may consult Van den Berg [28] and Koudjeti [29].
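The limit derived in this section can be checked numerically. The sketch below is an illustration only and not part of the original argument; it assumes Python’s standard `math.lgamma` and works with logarithms so that large arguments do not overflow. The function name `stirling_ratio` is hypothetical.

```python
import math


def stirling_ratio(x: float) -> float:
    """Return Gamma(x + 1) / (x**x * exp(-x) * sqrt(2*pi*x)) for x > 0.

    The quotient is computed through logarithms, because Gamma(x + 1)
    itself overflows a float already for moderate x.
    """
    log_gamma = math.lgamma(x + 1.0)
    log_stirling = x * math.log(x) - x + 0.5 * math.log(2.0 * math.pi * x)
    return math.exp(log_gamma - log_stirling)


if __name__ == "__main__":
    # The ratio should approach 1 from above as x grows.
    for x in (1.0, 10.0, 100.0, 1000.0):
        print(x, stirling_ratio(x))
```

A finite table of values is of course no substitute for the proof; it only shows the ratio decreasing towards 1, in agreement with the formula.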

4.4 Nonstandard mathematics without the axiom of choice?

The preceding pages should have made it clear that nonstandard mathematics can be introduced in a way that is well-known to classical mathematicians. But logicians claim that, from the point of view of logic and axiomatics, when relating, say, $\mathbb{R}$ to ${}^*\mathbb{R}$ our naive approach obscures the insight into what is really happening. They are right, but nevertheless a naive approach could well be more understandable and also more acceptable, because there is no verdict on external sets, and because the axioms can easily be grasped (not even the Zermelo-Fraenkel axioms of set theory are necessary, which require looking at natural numbers as sets and imply the unintended fact that there must be hyperlarge natural numbers). Yet one stumbling block remains: the axiom of choice. Couldn’t we do without it?

Let us try and see what happens if the same general line of thinking is followed but the underlying free ultrafilter $U$ over $\mathbb{N}$ is replaced by the Fréchet filter $F_o$ (see Section 1.14). This means that again infinite sequences of classical entities will generate entities that either are new or are identified with their classical counterparts. It also means that we no longer follow ideas of Luxemburg and others, but follow instead those of Chwistek (see Section 1.9) and perhaps those of Cauchy, who does not seem to be very explicit, however, when it comes to defining infinitesimals. Anyway, it would have been difficult for Cauchy to base his informal treatment of the infinitesimals on a free ultrafilter $U$, because the free ultrafilter theorem (see the Appendix) was not known to him, and moreover the axiom of choice had in his time still to be ‘invented’. Chwistek is much more explicit, but does not develop anything that could be appreciated as a fully fledged infinitesimal calculus.

Clearly, $Q \in F_o$ if and only if $i \in Q$ for $i \geq n$ for some $n \in \mathbb{N}$. The latter will be rephrased as ‘for $i$ large enough’. As has been made clear in Section 1.10, in order to introduce ${}^*\mathbb{R}$ (now with respect to $F_o$) it will again be necessary to consider all infinite sequences of real numbers. Instead of $H(s_i) = H(t_i)$ if and only if $\{i : s_i = t_i\} \in U$ (Section 2.2), the definition of equality will be,

$$H(s_i) = H(t_i) \text{ if and only if } s_i = t_i \text{ for } i \text{ large enough},$$

and (Section 2.3),

$$H(s_i) = s \text{ if and only if } s_i = s \text{ for } i \text{ large enough}, \quad s_i, s \in \mathbb{R}.$$

Theorem 2.3.1 remains true, although the only-if part of the proof must be modified: if $\{i : S_i = T_i\} \notin F_o$, then either there is a subsequence $(s_{i(j)}, j \in \mathbb{N})$ such that $s_{i(j)} \in S_{i(j)}$ but $s_{i(j)} \notin T_{i(j)}$, or there is a subsequence with the roles of $S$ and $T$ reversed (or both). Assume the first case, and take $s_i \in S_i$ arbitrarily if $i$ is not an index of the subsequence. Then $H(s_i) \in S$ and $H(s_i) \notin T$, i.e. $S \neq T$, and similarly for the other case.

So, again,

$$H(S_i) = \{H(s_i) : s_i \in S_i\},$$

and Theorem 2.3.2 too remains valid.

Also the introduction of internal pairs, $n$-tuples in general, and functions does not cause difficulties. But with Theorem 2.4.2 the problems begin. Let $S = \{0, 1\}$, then ${}^*S$ contains $q = H(0, 1, 0, 1, \dots)$. Even though $S$ is finite, ${}^*S \neq S$, because $q \neq 0$ and $q \neq 1$.
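The behaviour of $q = H(0, 1, 0, 1, \dots)$ under the Fréchet filter can be made concrete. The sketch below is a hypothetical illustration and not the author’s construction: it restricts attention to eventually periodic sequences (a finite prefix followed by a repeating cycle), an assumption made only so that ‘for $i$ large enough’ becomes decidable by inspecting one common period. The class and function names are invented for the example.

```python
from math import gcd


class EventuallyPeriodic:
    """A sequence given by a finite prefix followed by a repeating cycle."""

    def __init__(self, prefix, cycle):
        if not cycle:
            raise ValueError("cycle must be nonempty")
        self.prefix = list(prefix)
        self.cycle = list(cycle)

    def term(self, i: int):
        """Return the i-th term, with indices starting at 0."""
        if i < len(self.prefix):
            return self.prefix[i]
        return self.cycle[(i - len(self.prefix)) % len(self.cycle)]


def eventually_equal(a: EventuallyPeriodic, b: EventuallyPeriodic) -> bool:
    """True if a and b agree for all i large enough (equality modulo F_o).

    Past both prefixes the two sequences are periodic, so it suffices to
    compare them over one common period.
    """
    start = max(len(a.prefix), len(b.prefix))
    period = len(a.cycle) * len(b.cycle) // gcd(len(a.cycle), len(b.cycle))
    return all(a.term(i) == b.term(i) for i in range(start, start + period))


def eventually_in(a: EventuallyPeriodic, values) -> bool:
    """True if every term of a lies in `values` for i large enough."""
    return all(x in values for x in a.cycle)


q = EventuallyPeriodic([], [0, 1])       # generates q = H(0, 1, 0, 1, ...)
zero = EventuallyPeriodic([], [0])       # generates the standard number 0
one = EventuallyPeriodic([], [1])        # generates the standard number 1
late_zero = EventuallyPeriodic([7, 7, 7], [0])  # differs from 0 only finitely often

if __name__ == "__main__":
    print(eventually_in(q, {0, 1}))      # q belongs to *S
    print(eventually_equal(q, zero))     # but q is not 0
    print(eventually_equal(q, one))      # and q is not 1
    print(eventually_equal(late_zero, zero))
```

With a free ultrafilter $U$ instead, exactly one of the sets of even and odd indices belongs to $U$, so the same generating sequence would be identified with either 0 or 1.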

 


Moreover, $q$ is not hyperlarge, so what is it as an element of ${}^*\mathbb{N}$? Actually ${}^*S$ turns out to be infinite, and the conclusion is that ${}^*S$ contains far too many elements, i.e. the Fréchet filter does not operate properly and allows too much to go through. But perhaps this is not really harmful.

The survey at the end of Section 2.4 reveals more trouble, however, for although ${}^*\emptyset$, ${}^*{=}$, ${}^*{\in}$ and ${}^*{\cup}$ are equal to or equivalent to $\emptyset$, $=$, $\in$ and $\cup$, respectively, this is not true for $\neq$, $\notin$, $\cap$, $-$ and $c$. In fact, each of the relevant equivalences or equalities must be replaced by an implication or an inclusion in the correct direction, as the reader can find out for her or himself. It follows that ${}^*{\neq}$, ${}^*{\notin}$, ${}^*{\cap}$, ${}^*{-}$ and ${}^*c$ are all new relations or operations. Consequently, it follows from Section 2.6 (ignoring the implications of Section 2.5 on externality) that Łoś’ theorem (Section 2.7) is no longer true, and that the same holds for transfer (Section 2.8). As one of the many counterexamples, consider $H(S_i)\,{}^*{\cup}\,H(T_i)$, which is no longer equal to $H(S_i) \cup H(T_i)$.

Section 2.7 reveals even more trouble: although ${}^*{\wedge}$ is equivalent to $\wedge$, the connectives ${}^*{\neg}$, ${}^*{\vee}$, ${}^*{\Rightarrow}$ and ${}^*{\Leftrightarrow}$ are not equivalent to $\neg$, $\vee$, $\Rightarrow$ and $\Leftrightarrow$, respectively.

Finally, let us review the quantifiers. Although, given $X_i$ and $c_i$,

$$\exists H(x_i) \in H(X_i) : H(x_i) = H(c_i)$$

is equivalent to

$$H(\exists x_i \in X_i : x_i = c_i),$$

the equivalence is in general invalid if the simple statement $x_i = c_i$ is replaced by some other statement. Similar remarks apply to $\forall$. Also the definition of ${}^*R$, with $R$ some binary relation, is cumbersome, for suppose that $s_{2i} R\, t_{2i}$ but $\neg(s_{2i-1} R\, t_{2i-1})$ for all $i \in \mathbb{N}$, then $H(s_i R\, t_i)$ is neither true nor false. Compare this to the example before with $H(0, 1, 0, 1, \dots)$. So $H(s_i R\, t_i)$ is not an ordinary statement, but an awkward internal something.

Yet the definitions of ${}^*{<}$ and ${}^*{>}$, for example, do not cause any problems, simply because $<$ and $>$ are not yet defined for hyperreal numbers, and the following case of transfer regarding continuity is legitimate:

$$\forall \varepsilon \in \mathbb{R},\ \varepsilon > 0 : \exists \delta \in \mathbb{R},\ \delta > 0 : \forall x \in \mathbb{R},\ |x - c| < \delta : |f(x) - f(c)| < \varepsilon$$

is equivalent to

$$\forall \varepsilon \in {}^*\mathbb{R},\ \varepsilon > 0 : \exists \delta \in {}^*\mathbb{R},\ \delta > 0 : \forall x \in {}^*\mathbb{R},\ |x - c| < \delta : |{}^*f(x) - f(c)| < \varepsilon.$$

The definitions of infinitesimal and hyperlarge number can even be given a very simple form:

$$\varepsilon \simeq 0 \text{ if and only if } \varepsilon = H(\delta_i) \text{ where } (\delta_i) \text{ converges to } 0,$$

and

$$\omega \sim \infty \text{ if and only if } \omega = H(\omega_i) \text{ where } (\omega_i) \text{ converges to } +\infty \text{ or to } -\infty.$$

Remarks:

1. It may well be that Cauchy’s informal treatment comes closer to this kind of transfer than to transfer with respect to some free ultrafilter $U$, but we will never know for sure.

2. Compare the definition of $\omega \sim \infty$ with Corollary 2.12.1.

Even the following remains true: $f$ is continuous at $c \in \mathbb{R}$ if and only if

$$\forall \delta \in {}^*\mathbb{R},\ \delta \simeq 0 : {}^*f(c + \delta) - f(c) \simeq 0. \tag{4.1}$$

The proof of this equivalence is simple, because the latter statement is equivalent to

$$\forall (\delta_i, i \in \mathbb{N}) \text{ tending to } 0 : (f(c + \delta_i) - f(c), i \in \mathbb{N}) \text{ tends to } 0,$$

which is equivalent to the continuity of $f$ at $c$, and we are back at the plausible reasoning of Section 1.10. Note that Cauchy applied the simplified definition also to arbitrary $c \in {}^*\mathbb{R}$, so that he must have used some sort of S-continuity (see Section 3.3). In fact the $\varepsilon$-$\delta$ definition was introduced later on by Weierstrass.

The conclusion must be that with $F_o$ instead of $U$ nonstandard mathematics becomes a very restricted theory and nothing is left of the logician’s equivalence ideal. More seriously for the ordinary mathematician, nothing is left of entirely new mathematical models that can be studied on the basis of a free ultrafilter and that cannot exist in classical mathematics, but nevertheless have turned out to be of great value not only within mathematics, but also outside it (such models have not been treated in this book). On the other hand, what is left in the restricted theory can be based on well-known facts (such as the equivalence of (4.1) with ordinary continuity).
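The sequential form of continuity used above (every sequence $(\delta_i)$ tending to 0 makes $f(c + \delta_i) - f(c)$ tend to 0) can be illustrated with one particular sequence. The sketch below is only an illustration under stated assumptions: the sequence $\delta_i = (-1)^i / i$, the sampling window, and the names `tail_sup`, `square` and `step` are all choices made for this example, and a finite sample can suggest but never establish convergence.

```python
def tail_sup(f, c: float, start: int = 1000, stop: int = 2000) -> float:
    """Largest |f(c + d_i) - f(c)| for start <= i < stop, with d_i = (-1)**i / i."""
    return max(abs(f(c + (-1) ** i / i) - f(c)) for i in range(start, stop))


def square(x: float) -> float:
    """A function that is continuous everywhere."""
    return x * x


def step(x: float) -> float:
    """A unit step, discontinuous at 0 (its value there is 1)."""
    return 1.0 if x >= 0 else 0.0


if __name__ == "__main__":
    # For the continuous function the differences in the tail are small.
    print(tail_sup(square, 1.0))
    # For the step at c = 0 the odd-indexed terms approach from the left,
    # so the differences do not tend to 0.
    print(tail_sup(step, 0.0))
```

The alternating sign matters: a sequence approaching 0 only from the right would not detect the jump of `step`, which is why the criterion quantifies over all sequences tending to 0.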


Appendix

The proof of the theorem below is based on the axiom of choice. It is the only instance where this axiom is required for the theory of nonstandard mathematics, assuming that the corresponding classical mathematics does not require it. The axiom of choice can be shown to be equivalent to Zorn’s lemma, stating that if each totally ordered subset of a partially ordered nonempty set E has an upper bound in E with respect to the implied order, then E has at least one maximal element.

$E$ is called partially ordered if there exists a binary relation $\rho$, called order relation or simply order, for some or all pairs of elements of $E$ such that for $a, b, c \in E$: 1) if $a \rho b$ and $b \rho c$ then $a \rho c$, 2) if $a \rho b$ and $b \rho a$ then $a = b$, and 3) $a \rho a$ for all $a \in E$. A subset $G$ of the partially ordered $E$ with order $\rho$ is totally ordered with respect to $\rho$ if $a \rho b$ or $b \rho a$ or both for all pairs $(a, b)$, $a, b \in G$, and $m$ is a maximal element of $E$ if, for all $a \in E$, $m \rho a$ implies that $a \rho m$ and hence that $a = m$. The proof showing the equivalence of the axiom of choice and Zorn’s lemma is by no means trivial, and can be found in several textbooks, e.g. Dunford and Schwartz [30].

Theorem Free ultrafilters over $\mathbb{N}$ exist.

Proof: By ‘filter’ will be meant ‘filter over $\mathbb{N}$’. Let $F_o$ be the Fréchet filter, i.e. the set of the complements of all finite subsets of $\mathbb{N}$, so that $Q \in F_o$ if and only if $\{n, n+1, \dots\} \subseteq Q$ for some $n \in \mathbb{N}$. Let $E$ be the set of all filters $F$ such that $F \supseteq F_o$. Then $E$ is nonempty and can be partially ordered by means of the order $\rho$, where $a \rho b$ if and only if $a \subseteq b$, i.e. the order is set inclusion. Let $G$ be any totally ordered subset of $E$. Then

$$B = \bigcup \{F : F \in G\}$$

is an element of $E$, and $B$ is an upper bound of $G$. Obviously $B \supseteq F_o$, and that $B$ is a filter can be seen as follows.


1) $\mathbb{N} \in B$, as $\mathbb{N} \in F_o \subseteq B$.

2) $\emptyset \notin B$, as $B$ is the union of filters.

3) If $Q \in B$ and $\mathbb{N} \supseteq R \supseteq Q$, then $Q \in F$ for some $F \in G$, hence $R \in F$, so $R \in B$.

4) If $Q, R \in B$, then $Q \in F$ and $R \in F'$ for certain $F, F' \in G$. As $G$ is totally ordered, $F \subseteq F'$ or $F' \subseteq F$ (or both). Assume $F \subseteq F'$, then $Q, R \in F'$, so $Q \cap R \in F' \subseteq B$.

That $B$ is an upper bound for $G$ is easily seen. According to Zorn’s lemma $E$ must contain a maximal element $U$. Since $U \supseteq F_o$, $U$ is a free filter. $U$ is also an ultrafilter. For let $Q \subseteq \mathbb{N}$ be arbitrary. In order to show that either $Q \in U$ or $Q^c \in U$ consider the following two cases.

Case 1: Suppose $\forall Q' \in U : Q \cap Q'$ is infinite. Let

$$V = \{T : T \subseteq \mathbb{N},\ T \supseteq Q \cap Q' \text{ for some } Q' \in U\}.$$

Then $V$ is a filter. The verification of this statement is left as an exercise. Also $V \in E$, for let $Q' = \{n, n+1, \dots\}$, and $U \subseteq V$. By maximality $V \subseteq U$. But $Q \in V$, hence $Q \in U$.

Case 2: Suppose $\exists Q'' \in U : Q \cap Q''$ is finite. Then $\forall Q' \in U : Q^c \cap Q'$ is infinite. To see this let $Q' \in U$ be arbitrary. Since both $Q'$ and $Q''$ belong to $U$, also $Q' \cap Q'' \in U$ and is infinite. Since $Q \cap Q' \cap Q''$ is finite, $Q' \cap Q'' - Q \cap Q' \cap Q''$ is infinite, i.e. $Q^c \cap Q' \cap Q''$ is infinite and so is $Q^c \cap Q'$. Now apply Case 1 with $Q^c$ instead of $Q$.
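The Fréchet filter itself, unlike the free ultrafilter obtained from Zorn’s lemma, is completely explicit and can be modelled directly. The sketch below is a hypothetical illustration, not part of the proof: a cofinite subset of $\mathbb{N} = \{1, 2, \dots\}$ is stored through its finite complement, which makes the filter properties used above easy to check. The class name `Cofinite` and its methods are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cofinite:
    """A cofinite subset of N = {1, 2, ...}, stored via its finite complement."""

    missing: frozenset = frozenset()

    def __contains__(self, n: int) -> bool:
        """Membership of a natural number n in the set."""
        return n >= 1 and n not in self.missing

    def intersect(self, other: "Cofinite") -> "Cofinite":
        """Intersection: the complements are united, so the result is cofinite."""
        return Cofinite(self.missing | other.missing)

    def enlarge(self, restored) -> "Cofinite":
        """A superset obtained by putting some missing elements back."""
        return Cofinite(self.missing - frozenset(restored))

    def tail_start(self) -> int:
        """Smallest n such that {n, n+1, ...} is contained in the set."""
        return max(self.missing, default=0) + 1


if __name__ == "__main__":
    a = Cofinite(frozenset({1, 5}))
    b = Cofinite(frozenset({2}))
    c = a.intersect(b)
    print(sorted(c.missing), c.tail_start(), 7 in c, 5 in c)
```

Neither the set of even numbers nor its complement can be represented in this way, which reflects the fact that $F_o$ is a filter but not an ultrafilter; deciding such cases is exactly where the non-constructive maximal element $U$ is needed.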

References

[1] Heath, T.L., The Works of Archimedes, with the Method of Archimedes, Dover Publications, 1912, chapter VII.

[2] Euler, L., Introductio in Analysin Infinitorum, 1748.

[3] Luxemburg, W.A.J., What is Nonstandard Analysis?, American Mathematical Monthly, 80, 1973, 38–67.

[4] Cauchy, A.L., Cours d’analyse de l’École royale polytechnique, 1821.

[5] Lakatos, Imre, Cauchy and the Continuum: The Significance of Non-standard Analysis for the History and Philosophy of Mathematics, The Mathematical Intelligencer, 1978, 151–161.

 

[6] Robinson, A., Non-standard analysis, Proceedings Royal Academy, Amsterdam, Series A, 64, 1961, 432–440.

[7] Robinson, A., Non-standard Analysis, North-Holland, 1966 (2nd revised edition in 1974, 3rd edition in 1996, Princeton University Press).

[8] Hahn, H., Über die nichtarchimedischen Größensysteme, S.-B. Wiener Akademie, Math.-Natur. Kl. 116, Abt. IIa, 1907, 601–655.

[9] Skolem, T., Über die Nicht-charakterisierbarkeit der Zahlenreihe mittels endlich oder abzählbar unendlich vieler Aussagen mit ausschliesslich Zahlenvariablen, Fund. Math. 23, 1933, 150–161.

[10] Hewitt, E., Rings of real-valued continuous functions I, Trans. Amer. Math. Soc. 64, 1948, 45–99.

[11] Łoś, J., Quelques remarques, théorèmes et problèmes sur les classes définissables d’algèbres, in: Skolem et al., eds., Mathematical interpretations of formal systems, North-Holland, 1955, 98–113.

[12] Laugwitz, D., and C. Schmieden, Eine Erweiterung der Infinitesimalrechnung, Mathematische Zeitschrift 69, 1958, 1–39.

[13] Luxemburg, W.A.J., Nonstandard Analysis, Lectures on A. Robinson’s theory of infinitesimal and infinitely large numbers, Caltech Bookstore, 1962.

[14] Nelson, E., Internal set theory, Bull. Amer. Math. Soc. 83, 1977, 1165–1193.

[15] Robert, A., Nonstandard Analysis, Wiley, 1988.

[16] Diener, F., et G. Reeb, Analyse Non Standard, Hermann, 1989.


[17] Chwistek, L., Über die Hypothesen der Mengenlehre, Mathematische Zeitschrift 25, 1926, 439–473.

[18] Chwistek, L., The Limits of Science, translated from the Polish, Routledge & Kegan Paul, 1948.

[19] Chang, C.C., and H.J. Keisler, Model Theory, North-Holland,

[20] Beeson, M.J., Foundations of Constructive Mathematics, Springer, 1985.

[21] Bishop, E., and D. Bridges, Constructive Analysis, Springer, 1985.

[22] Beth, E.W., The Foundation of Mathematics, North-Holland, 1965.

[23] Beth, E.W., Mathematical Thought, Reidel, 1965, chapter V.

[24] Heyting, A., Intuitionism, An Introduction, North-Holland, 1966.

[25] Potter, M.D., Sets: An Introduction, Clarendon Press, 1990.

[26] Keisler, H.J., Elementary Calculus, An Infinitesimal Approach, Prindle, Weber & Schmidt, 1986.

[27] Berg, I.P. van den, en T. Sari, Inleiding tot de infinitesimaalrekening (Introduction to the infinitesimal calculus; Lecture Notes, University of Groningen, 1988; private communication; in Dutch).

[28] Berg, I.P. van den, Nonstandard Asymptotic Analysis, Lecture Notes in Mathematics, nr. 1249, Springer, 1987.

[29] Koudjeti, F., Elements of External Calculus with an Application to Mathematical Finance, thesis, University of Groningen, 1995.

[30] Dunford, N., and J.T. Schwartz, Linear Operators, part I, Interscience, 1957.

Index

accumulation point, 111 appreciable, 18, 83, 96 Archimedean order, 122 ∗-transform, 19, 28, 95 ∗-transform expression, 69 ∗-transform of relation, 71 ∗-transform operation, 69 ∗-transforms of attributes, 81 atomic relation, 26, 71 atomic statement, 71 axiom of choice, 15, 34, 40, 45, 139 axiomatics, 134 axioms of set theory, 15

basic assumptions regarding n-tuples, 48 basic assumptions regarding generating new constants, 49 basic assumptions regarding logic, 45 basic assumptions regarding sets, 46 basic assumptions regarding the natural numbers, 45 bijective, 50 bound variable, 21 boundary point, 111 bounded set, 109

Cartesian product, 50 Cauchy sequence, 55, 91 Cauchy’s principle, 106, 125, 131 chain rule, 115 classical mathematics, 15 closed set, 110 closure, 111 complement of subset, 47 composite function theorem, 104 concurrent Cauchy sequences, 56 constant, 46 constructivism, 41, 43 constructivistic mathematics, 15 continuity, 82, 100, 104 countable, 82

countably infinite set, 50 critical point theorem, 116

denumerably infinite set, 50 difference of sets, 70 differentiation, 114 direct product, 50 domain, 50, 112 dominated approximation, 129 dummy variable, 21

elements of set, 47 empty set, 47 equality, 47 equality of integers, 55 equality of internal constants, 58 equality of rationals, 55 equality of reals, 55 excluded third, 15 exhaustion, 30 existence, 46 existential quantifier, 20 expressions, 69 extensionality, 47 external constant, 28, 57, 67 external notion, 94 external set, 134 extreme value theorem, 103

Fehrele’s principle, 128, 133 filter, 51 finite, 18, 83 finite intersection property, 130 finite number, 73 finite sequence, 50 finite set, 50, 81 formalism, 41 formalistic mathematics, 15 formulae, 73 Fréchet filter, 52, 134, 139 free ultrafilter, 40, 49, 52, 53, 134, 139

free variable, 20 function, 49 function value, 50

galaxy, 127 generalized mean value theorem, 117 generating sequence, 94 generation infinite sequences, 134 graph, 49

halo, 110, 127 hyper n-tuple, 61 hyper-Archimedean order, 122 hyperconstant, 57 hypercontinuity, 82 hypercountable, 82 hyperfinite number, 82 hyperfinite set, 81 hyperfunction, 58, 62 hyperlarge, 17 hypernumber, 58 hyperpair, 58 hyperreal, 18 hyperreal number, 82 hyperset, 58 hypersmall, 87 hypersmall number, 17

idealization, 35 identification, 49 identification of integers, 55 identification of internal n-tuples, 61 identification of internal constants, 58 identification of internal functions, 61 identification of internal sequences, 62 identification of internal sets, 59 identification of rationals, 55 identification of reals, 55 improper integral, 119 increment theorem, 115 individual, 47 inductive proof, 46 infinite sequence, 50, 102 infinite set, 50 infinitely large, 16, 17, 83 infinitely large number, 96 infinitesimal, 15, 17, 83, 96 injective, 50 integers, 55 integration, 118

integration by parts, 118 interior point, 110 intermediate value theorem, 103 internal n-tuples, 61 internal composition, 62 internal constant, 28, 57, 63 internal definition principle, 71, 96 internal function, 61, 104 internal notion, 94 internal pair, 61 internal sequence, 62 internal set theory, 57 intuitionism, 43 inverse function, 112 inverse of a bijective function, 50 inverse overflow, 86, 97 inverse underflow, 86, 97

L’Hôpital’s theorem, 117 least upper bound theorem, 81, 99 limit, 102, 104 limit definition, 15 limited, 18, 83 logic, 134 logical connective, 20 Łoś’ theorem, 72

map, 49 mapping, 49 maximal element, 139 mean value theorem, 117 monotonically decreasing, 112 monotonically increasing, 112

n-tuple, 46, 94 natural extension, 67 natural number, 15 negative hyperlarge, 83 nonstandard analysis, 15 nonstandard constant, 57 nonstandard number, 16 nonstandard version, 67 number, 94

one-to-one, 50 one-to-one onto, 50 onto, 50 open set, 110 operation, 69 ordered pair, 48


overflow, 85, 97 overspill, 85, 97

pair, 15, 46 paradoxes, 48 partial order, 139 permanence, 130 positive hyperlarge, 83 power set, 47, 68, 70 predicate, 73 prenex normal form, 22 primitive notion, 15 principle of permanence, 106, 125

range, 50, 112 rationals, 55 real number, 82 real number system, 15 reals, 55 recursive functions, 43 regular of level k, 47 relation, 71 Riemann integral, 118 Riemann integration, 118 Riemann sum, 119 Robinson’s lemma, 129 Rolle’s theorem, 116

S-continuity, 105, 137 S-uniform continuity, 106 S-differentiable, 115 S-limit, 106 saturation, 130 sentence, 73 sequence, 50 set, 15, 46, 94 specification, 46 standard constant, 34 standard copy, 66 standard definition principle, 81, 96 standard notion, 94 standard part, 19, 91, 98 standardization, 35 statement, 96 Stirling’s formula, 132 subset, 47 substitution rule, 121 surjective, 50

Taylor’s theorem, 117
