HPS 0628 Paradox

Back to doc list

Paradoxes of Naive Set Theory

John D. Norton
Department of History and Philosophy of Science
University of Pittsburgh

A set, as it is naively understood, is just a collection of anything you like. Most often, in this conception, people think of sets of numbers, like the set of even numbers. But the conception is very expansive. All stars in the universe form a set. All mice on earth form a set. The collection of stars and mice form a set.

This conception is adequate for virtually all practical uses of sets. However the notion has earned the disparaging title "naive" since it turns out to harbor a foundational paradox of such importance that it has controlled work in set theory since the start of the 20th century.

The paradox is Russell's paradox. We shall see below that it can be set up in a few sentences without any great technical machinery. Yet its effect is devastating. The paradox itself has close affinities with those seen in earlier chapters. It depends on self-reference, as do the liar paradoxes. It also connects with notions of infinity in that diagonalization shows that there is no end to the increasing hierarchy of infinities.

Sets, Naively

Naively understood, a set is anything you care to collect. Commonly numbers figure in set theory. The sizes of sets are often at issue and collecting numbers is a way to keep track of the sizes.

Here is the set of natural numbers {1, 2, 3, 4, 5, ...}

Sets can have sets as elements. Here is a set whose elements are the set of odd numbers and the set of even numbers, { {1, 3, 5, 7, ...} , {2, 4, 6, 8, ...} }

We can collect as many stars as we like to form a set:

We can collect as many mice as we like to form a set:

We can nests within sets within sets:

What is so Important about Sets?

Recently, however, the foundational importance of sets has been challenged by the categories of category theory.

Why care about a problem in set theory? The reason is that sets have become fundamental to mathematics. They are used to define all the other entities of mathematics. Counting numbers can be built from sets. From them we define all the other types of numbers and the entities of geometry.

How does the construction work? The obvious way to recover the natural numbers (and zero) is from the cardinality of sets. We might form sets from objects like apples:

0 is a set with no members. {}
1 is a set with one member {}
2 is a set with two members {, }
3 is a set with three members {,, }
and so on.

There are two problems. First, we need to keep coming up with new apples of different colors. We have to add a different apple each time we increase the size of the set, or there would be no increase at all. Recall:

{, , , , , } = {}

since duplications of elements in the list inside the curly brackets does not change the set.

Second, use of real objects like apples ties the basic definitions of mathematics to quite specific things in the world. We would like numbers to be independent of whether the world has apples, pears, stars and galaxies.

There is a way. The key is the null or empty set:


It is independent of anything in the world. It is a set theoretic embodiment of nothing. It can be 0, zero.

We can now use it to build up all the natural numbers. We need to use nothing but sets formed from the null set. At first, that may seem impossible. The trick is that we can nestle sets within sets within sets. We never need a set with elements that are apples or mice or galaxies.

To get 1, 2, 3, ..., we could try:

0 = {}
1 = {{}}, 2={{}, {}}, 3 = { {}, {}, {} }, ...

Self-test puzzle: why would we end up with 1 = 2 = 3 = ... ?

That does not work since, with this definition, the duplications do not change the set. We would have 1 = 2 = 3 = ...

We can escape the problem by a simple alteration:

0 = {}, 1 = {0} = { {} },
2 = {1} = { { {} } },
3 = {2} = { { { {} } } },

Self-test puzzle: Is that right? No set here has more than one element?

That does work and could be used. The oddity of it is that each set in the construction only ever has one member at most.

It would be nice if the set theoretic representation of 1, 2, 3, ... were sets each with cardinality 1, 2, 3, ... The standard construction of numbers does this and proceeds as follows:

0 = {}
1 = {0} = { {} }
2 = {0, 1} = { {} , { {} } }
3 = {0, 1, 2} = {{} , { {} } , { {} , { {} } } }

Overall this construction has the nice property that:

natural number n = set with cardinality n

This tangle of curly brackets grows rapidly to ever greater tangles. We have only the first few numbers listed. A graphical representation is kinder to our powers of understanding:

Whichever way we look at it, the curious and striking fact is that numbers are called into being by set operations and nothing else.

It need not be this way. In Ancient Greece, geometry held pride of place foundationally. Its basic elements were not sets but the list of entities provided in the opening pages of Euclid's Elements: points, lines, surfaces and so on.

The Set of All Sets

The early clues that there is a problem with the naive notion of sets came with Cantor's account of infinite sets. If we can collect anything at all to make a set, then why not collect all sets. Then we have formed the set of all sets.

The trouble starts when we ask how big this set is. It is, of course, very big. After all, it has all sets in it as elements. But then Cantor's theorem, which we saw earlier, tells us something odd about it. There is a set bigger than the set of all sets: its power set.

This is a real puzzle. We might be inclined to say "Well, if the set of all sets really is the set of all sets, then included in it is are all the sets forming elements of its own power set!"

That does not work. For the power set is one order of infinity greater than set of all sets. That means the elements of the power set are too many to fit into the set. The situation is analogous to the jump from rational numbers to real numbers. There are just so many more real numbers that we cannot somehow encode them within the rational numbers.

In short, we have true contradiction:

If U is the universal set of all sets then:
1. The power set P(U) is of greater cardinality than the universal set U, by Cantor's theorem.
2. The power set P(U) is of no greater cardinality than U, since the power set P(U) is formed from sets, all of which must be elements of the set of all sets U.

Bertrand Russell

This last discussion indicates that Cantor's work had generated a problem for set theory and there were versions of it in the literature towards the end of the nineteenth century. The definitive version of the problem was discovered by Bertrand Russell in 1901. In his autobiography, he described the reflections that led to it.

Bertrand Russell, Autobiography. London: Routledge, 1975, p. 138.

"Cantor had a proof that there is no greatest number, and it seemed to me that the number of all the things in the world ought to be the greatest possible. Accordingly, I examined his proof with some minuteness, and endeavoured to apply it to the class of all the things there are. This led me to consider those classes which are not members of themselves, and to ask whether the class of such classes is or is not a member of itself. I found that either answer implies its contradictory. At first I supposed that I should be able to overcome the contradiction quite easily, and that probably there was some trivial error in the reasoning."

Russell continued to describe how the gravity of problem gradually became clear to him. He connected it with liar paradox.

"Gradually, however, it became clear that this was not the case. Burali-Forti had already discovered a similar contradiction, and it turned out on logical analysis that there was an affinity with the ancient Greek contradiction about Epimenides the Cretan, who said that all Cretans are liars. A contradiction essentially similar to that of Epimenides can be created by giving a person a piece of paper on which is written: ‘The statement on the other side of this paper is false.’ The person turns the paper over, and finds on the other side: ‘The statement on the other side of this paper is true.’ It seemed unworthy of a grown man to spend his time on such trivialities, but what was I to do? There was something wrong, since such contradictions were unavoidable on ordinary premisses. Trivial or not, the matter was a challenge."


The paradox appeared in Chapter X "The Contradiction" of Russell's 1903 The Principles of Mathematics. (Cambridge University Press).

Russell's Paradox

Russell's paradox arises from a variation in the problem of the set of all sets. It is noteworthy in reducing the notions needed to produce the problem to the barest minimum. All that is needed is the naive idea of a set and the notion of what it is for something to be an element (or member--as Russell said) of a set.

There are sets that are  members of themselves and there are sets that are not members (i.e. elements) of themselves.

From Bertrand Russell, the Philosophy of Logical Atomism, Lecture 7.

This second group is the sort that we almost invariably think about. Russell's example is of teaspoons. A set of  teaspoons is not itself a teaspoon. Here are teaspoons:

Here is set of teaspoons:

This is the situation we almost always encounter. When we form a set, it is not an element of itself. However there are cases of sets that are members of themselves. They are more awkward to form. The set of all sets is a member of itself. The set of abstract mathematical entities is a member of itself. Perhaps to avoid the awkwardness of these sorts of examples, Russell's examples included a negative one: The set formed from all things that are not teaspoons. That set is something formed from things that are not teaspoons.

If we call the set of all sets that are not members of themselves the "Russell set," then Russell's paradox follows:

Russell's paradox

Either the Russell set is a member of the Russell set or it is not.

If the Russell set is a member of the Russell set, then it is a set that is a member of itself. Therefore it is not a member of the Russell set.

If the Russell set is not a member of the Russell set, then it is a set that is not a member of itself. Therefore it is a member of the Russell set.

This is the compact version of the paradox. It is not quite explicitly a contradiction, that is, a sentence of the form "A and not-A." We can turn it into one by various means. One is to treat the problem as a pair of reductio arguments.

Assume for reductio that the Russell set is a member of the Russell set. It follows that the Russell set is not a member of the Russell set, which contradicts the assumption for reductio. We conclude the assumption is false. Hence:

The Russell set is not a member of the Russell set

Assume for reductio that the Russell set is not a member of the Russell set. It follows that the Russell set is a member of the Russell set, which contradicts the assumption for reductio. We conclude the assumption is false. Hence:

The Russell set is a member of the Russell set

The conjunction of the two conclusions of the reductio arguments give us the contradiction.

As a liar paradox

As Russell mentions in his autobiographical recollections, the Russell paradox has strong affinities with the liar paradox. Both depend essentially on self-reference. The liar sentence ("This sentence is false.") derives a contradiction through referring to itself. The Russell paradox does the same. The parallel is obvious:

Liar paradox The liar sentence is true.

The liar sentence is false.
The liar sentence is false.

The liar sentence is true.
Russell paradox The Russell set is a member of itself.

The Russell set is not a member of itself.
The Russell set is not a member of itself.

The Russell set is a member of itself.

As a diagonalization

Russell's paradox had its origins in Russell's reflections on Cantor's theorem, a diagonalization, and has the character of a diagonalization argument, if we are prepared to be liberal in our understanding of how diagonalization works.

Loosely speaking, a diagonalization posits a mapping from one set to another. It then constructs an entity that eludes the mapping and thereby contradicts a property assumed for the mapping.

We assume, for reductio, that a set can be mapped one-to-one onto its powers set. We then construct a set by diagonalization that does not lie within the range of the mapping, that is, on the "to" side of the mapping from elements of the set to elements of its power set.

Correspondingly, we can characterize the supposed existence of a Russell set through its indicator function. That is a function that assigns a value of 1 or 0 to each set, according to whether it is a member of the Russell set or not. For clarity, we can replace the 1 and 0 by "in" and "out." Then the mapping supposed is:

All sets listed
"in" or "out" of Russell set?
Set of all teaspoons
Set of all things that are not teaspoons out
Set of all sets out
By the argument given above, the Russell set cannot be one of the sets in the left column assigned "in"; and it cannot be one of these sets assigned "out."

The Russell set is, by its construction, a set that eludes this mapping. There is no consistent way to include it in the mapping. That is, Russell set fails to be well-defined as a set in so far as sets can be specified by which are and are not their elements.

The outcome of Russell paradox is that we need to abandon something. The assumptions used in setting up the paradox are so sparse as to leave us little choice but to abandon something major. The standard solution is that we should abandon the naive notion of a set. Sets, whatever they are, cannot be conceived as a collection of anything you please.

Gottlob Frege

The import of the paradox was profoundly destructive. Russell had been following the work of Gottlob Frege into the logical foundations of arithmetic. Russell realized that his paradox struck at the heart of Frege's system. Russell explained the difficulty to Frege in a letter of 1902. Frege replied and recognized the gravity of the problem. It came at a difficulty moment. Frege had just completed the second volume of a major work, Grundgesetze der Arithmetik [Basic Laws of Arithmetic]. He managed to add an appendix to the printing.

Translation from The Frege Reader. M. Beaney, ed., Blackwell, 1977. pp. 279-80

"Hardly anything more unfortunate can befall a scientific writer than to have one of the foundations of his edifice shaken after the work is finished.
   This was the position I was placed in by a letter of Mr. Bertrand Russell, just when the printing of this volume was nearing its completion...
... What is in question is not just my particular way of establishing arithmetic, but whether arithmetic can be given a logical foundation at all.
   But let us must come to the point. Mr. Russell has found a contradiction which must now be stated..."

It is at once the most tragic of passages a writer can pen and, at the same time, a profound testimony to Frege's integrity and intellectual honesty.

Here is the title page:

Here is the tragic appendix:

Solatium miseris, socios habuisse malorum = "It is a comfort to the wretched to have companions in misery."

Vicious-Circle Principle

Russell's immediate reaction to this paradox was to propose a wide-ranging principle designed to preclude not just the specific problems raised by the Russell set, but to preclude all self-referential, liar-like paradoxes involving similar totalities.

Bertrand Russell, "Mathematical Logic as Based on the Theory of Types," American Journal of Mathematics, 30, 1908 pp. 222-262. On p. 225.

In one version, the principle reads:

"This leads us to the rule: 'Whatever involves all of a collection must not be one of the collection;' or, conversely: 'If, provided a certain collection had a total, it would have members only definable in terms of that total, then the said collection has no total.' "

A footnote begins:

"When I say that a collection has no total, I mean that statements about all its members are nonsense..."

Bertrand Russell and Alfred North Whitehead, Principia Mathematica. Cambridge: at the University Press, 1910. p. 40.

A later version in Russell's joint work with Alfred North Whitehead names the principle:

"The principle which enables us to avoid illegitimate totalities may be stated as follows: 'Whatever involves all of a collection must not be one of the collection'; or, conversely: 'If, provided a certain collection had a total, it would have members only definable in terms of that total, then the said collection has no total.' We shall call this the 'vicious-circle principle,' because it enables us to avoid the vicious circles involved in the assumption of illegitimate totalities. Arguments which are condemned by the vicious-circle principle will be called 'vicious-circle fallacies.' "

The principle immediately precludes the existence of the structures that surround the Russell paradox. The set of all sets is an entity denied and talk of its members is precluded as nonsense. The Russell set similarly is precluded since it too is a totality defined in terms of the totality.

Problems for the Vicious-Circle Principle

The principle is very strong and wide-ranging. Russell immediately pointed out its scope. A long held and revered principle of logic is the law of the excluded middle:

All propositions are either true or false.

The "all" in this law includes the law of the excluded middle itself. If follows that the law is judged to be a nonsense assertion.

These sorts of problems have meant that the vicious-circle principle has no firm place is modern analysis. Sometimes it works, in the sense that it gives the result we want. Sometimes it does not work.

Barber Shop in Richardson, Texas, circa 1920. https://commons.wikimedia.org/wiki/File:TexasRichardson_barberShop.jpg

For example, recall Russell's Barber paradox, which is a version of the liar paradox:

We are asked to consider a barber in some village or other location. That barber "... shaves all those, and those only, who do not shave themselves." Who shaves the barber?

This barber is defined in terms of the totality of people in the village and the relations of who shaves whom among them. The vicious-circle principle allows us to dismiss the identification of the barber since the barber is defined in terms of the totality.


While that may seem a welcome escape, it is hard to distinguish this application of the vicious-circle principle from a similar one:

The oldest person in the village is defined as that villager who is older than all the other villagers.

This oldest person is defined in terms of the totality so is disallowed by the vicious-circle principle. However that there is an identifiable oldest person seems quite unproblematic and prohibition of the idea severe and destructive.

Another example brings us closer to the problem.

Consider the set of all the real numbers between 0 and 1, including 0 and 1. It is unproblematic to define x as the largest number in the set. x is just 1.

But now consider the set of all the real numbers between 0 and 1, excluding 0 and 1. Define y as the largest number in the set. We have an immediate problem, for there is no largest number in the set.

To see this, pick a candidate largest number, say 0.99. It is not the largest, since 0.999 is larger and in the set. For any number in the set we choose, similar arithmetic operations can yield a larger one.

For any number in the interval, no matter how close it is to one, we can always find a larger one.

Here is my appraisal: defining things in terms of the totality to which they would belong is sometimes benign and sometimes not. The vicious-circle principle is right to alert us to the danger than sometimes the mode of definition fails. But it errs on the side of excessive caution in enjoining us to prohibit all such definitions.

The Russell-Whitehead definition above tries to separate the bad from the good cases with the formulation "...members only definable in terms of that total..." The key difference is between "definable" and "only definable." The problem remains, however, if the import of "only" is left vague. "The largest member of the set" is only definable in terms of the totality. How this formula happens to pick out a number in one case and not another is the issue that "only" scarcely touches.

Russell's Theory of Types

The vicious-circle principle escapes Russell's paradox by asserting negatively what we cannot do with sets. What positive account of sets can we give? Russell embarked on the project of reforming the foundations of set theory and logic so as to accommodate this principle.

Russell's answer came in the form of his "theory of types." The leading idea is that sets live in a hierarchy, akin to the hierarchical notions of truth provided by Tarski's later theory. The paradox is resolved since the notion of the set being a member of itself, or not a member of itself, violates the hierarchy.

The theory of types come in multiple versions and is quite complicated. Here, only the barest glimpse of it can be provided to see how it escapes the Russell paradox.

The theory is set within the formal confines of symbolic logic. This is a logic that talks of individuals; their predicates, that is, properties; and properties of properties; and so on. Sets arise as a byproduct, as the extension of predicates. That is, they are the collections of things that carry the property.

The essential component is a labeling of terms by "types."

Type zero entities are individuals like a (red) apple or a (green) pear.

Type one entities are properties of individuals, such as "is red" or "is green." What makes them type one is that they can be predicated only of type zero entities. That means that


meaning "this apple carries the property red" is allowed. However red(green) is not allowed, since both red and green are type one entities.

Type two entities are properties of type one entities. "is a color" is a type two entity since it can be predicated of type one entities. We can have

color(red) and color(green).

The classification of types continues indefinitely and becomes considerably more complicated.

Sets (or "classes" in Russell's system) arise as what are known as "extensions" of predicates.

Consider all the individuals that carry the property "red."

They form the set of red things.

We get a higher level set, the set of colors, by considering the extension of the property "is a color."

This is the only way sets are allowed to be formed. The resulting hierarchical structure of sets precludes us even asking if a set is member of itself. For a set inherits the type of the predicate that defines it. A set can only be formed from sets of a lower type.

For example, the set of colors is of type two. It derives that type from the type two predicate "is a color." Its extension is the set of colors: {red, green, blue, ...}. The predicate is applied to type one predicates, red, green, etc.

color(red), color(green), ...

The predicate "is a color" cannot be applied to itself since then we would be trying to apply a type two predicate to a type two predicate.


What this means extensionally is that we cannot include the set of colors as a member of the set of colors, along with the specific colors like red and green.

Or, to be clear, this set is NOT allowed:

The prohibition is stronger than merely saying "It is false that being a color is being a color." Rather, the prohibition precludes this last assertion from even being included in the applicable logic. "color(color)" is just a nonsense sentence like "cat dog blah blah" would be in ordinary English.

In extensional terms, that means that the set of colors cannot be a member of the set of colors. Even to consider the possibility within the confines of Russell's theory of types is to contemplate nonsense.

Now let us apply this analysis to the Russell set.

To form the Russell set, we use the predicate "Russell" = "does not apply to itself." The Russell set is the extension of this predicate. To ask if the Russell set is a member of itself, we need to apply the predicate to itself. That is, we need to form:

Russell (Russell)

That self-application is prohibited since it requires us to apply a predicate to another predicate of the same type. To attempt it, is to attempt to assert nonsense. In extensional terms, this means that we cannot form the Russell set.

The paradox is resolved.


Type theory does resolve Russell's paradox. However it does so at some cost. Just as Tarski's theory of truth gave us multiple notions of truth, so also Russell's theory gives us multiple notions of sets. We might ask, "What is a set?" The answer would then be that there is no such thing as a set simpliciter. There are many types of sets, type 0, type 1, ...

It is possible to remove the differences among sets by types using an iterative notion of sets. The approach is analogous to Kripke's iterative notion of truth that reduces Tarski's multiple notions of truth to just one. However a different approach, the axiomatic approach, has been adopted as the standard way to respond to the paradoxes of naive set theory.

The approach was initiated by Ernst Zermelo in 1908 and developed by Abraham Fraenkel in 1922. What results is the most common axiom system: Zermelo-Fraenkel set theory. It is routinely called just "ZF"; or "ZFC" if the axiom of choice is includes. (We shall see below why this extra axiom is separated out.)

The starting point of the axiomatic approach is that we do not need to ask what a set really is. Rather the axiomatic approach just says:

"There are sets and their elements. Here are some truths about them. There are enough truths here for you to be able to say anything you need to say about sets without getting into trouble with paradoxes."

The axiomatic approach operates within a formal language. That is, there is no loose talk, such as "The set A is collection of whatever you want to collect together." What can be said is strictly limited to combinations of symbols and the rules for combining them.

For example, we might be allowed to write:

a ∈ A

We would look at it and say: "There is a thing a that is an element of a set A." Informally, that is fine. But within the strict confines of axiomatic set theory, all we have is a string of three symbols, a, ∈ and A that we are allowed to write down.

The axioms of the theory then tell us what else we can write down. As the stock of licit sentences that we can write grows, we find ourselves building up something that looks like the more familiar set theory.

An analogy to chess might help in adopting the right conception. In chess, we are given a collection of pieces: king, queen, bishops, etc. We can do nothing with them until we consult the rules of the game. They tell us where we can place the pieces on the board and how we may move them about. The totality of the game resides in the rules and what they allow.


We might imagine that what unfolds before us on the board is a monumental battle  between two great armies. That is a construction of our imagination. All that is really before us are pieces of wood conforming to configurations allowed by a set of rules. We might think that knights on horseback can ride forward in a straight line and battle fiercely all enemies in their path. But the rules of chess do not allow it. Knights are permitted two squares forward and one to the side, only; and they fight and vanquish only whatever they find in that terminating square.

Correspondingly we write complicated sentences using a combination of symbols allowed by the axioms of the theory. They may authorize us to write "a ∈ A." We naturally might say to ourselves: "There is a set A which is a collection of elements, one of which is a." Whenever a symbol for a set appears, we would form the corresponding picture. Almost always that picture would be unproblematic. However when the sets become large and paradoxical sentences can be written, that picture is misleading. We might want to write:

"∀A (A∈U)"

We read it to say "For all sets A, A is an element of U." That is, U is the universal set that collects together all sets. However the resulting picture of collecting all sets into a universal set depicts just what is prohibited. This sentence is precluded by the axiomatic theory. It appears at best as an assumption to be refuted by a reductio when it leads to a contradiction.

Axiom of Extensionality

The axioms give us the properties of sets, one by one, until we have enough to characterize sets. An early axiom is extensionality. It gives us the key property that sets are defined only by their members. The sets:

{a, b}, {b, a}, {a, b, a, a}, {b, b, b, a}

are all the same set. All that matters is that each has a as an element, b as an element and nothing else.

From Herbert B. Enderton, Elements of Set Theory. New York: Academic Press, 1977. p.17.

In formal terms, the axiom is written as:

∀A ∀B [∀x (x ∈ A  ⇔  x ∈ B)  ⇒  A = B]

This axiom can be decoded into more familiar language as:

∀A ∀B For all sets A and for all sets B

∀x for all elements x
x ∈ A x is an element of A
logically entails and
is logically entailed by
x ∈ B x is an element B
A = B A and B are the same sets.

Or, in even simpler language: two sets are the same, if the elements of one are elements of the other; and vice versa.

The Axioms of ZFC

The remaining axioms of the system continue in this vein. Some are equally simple. Others are more complicated. For our purposes, a rendering of the axioms in more informal language is all that is needed. We will not attempt major derivations within ZFC. The axioms are displayed here merely to give a rough impression of their content. A fuller analysis would go well beyond what can be accomplished here.

Quoted directly from Joan Bagaria, "Set Theory", The Stanford Encyclopedia of Philosophy (Spring 2020 Edition), Edward N. Zalta (ed.), https://plato.stanford.edu/archives/spr2020/entries/set-theory/ Section 2.1. The Axioms of ZFC

The formal versions are given as a supplement to this article in https://plato.stanford.edu/entries/set-theory/ZF.html

Here is an accessible, informal version.


These are the axioms of Zermelo-Fraenkel set theory, or ZF. The axioms of Null Set and Pair follow from the other ZF axioms, so they may be omitted. Also, Replacement implies Separation.

Finally, there is the Axiom of Choice (AC):


Escape from Russell's Paradox

The important work of escaping from Russell's paradox is carried by the axiom of separation. That axiom allows us to form sets from properties.

For example, consider the set of all natural numbers.

{1, 2, 3, 4, ...}

The property "is even" picks out all the even numbers. The axiom allows us to form them into a set

{2, 4, 6, 8, ...}

The Russell set was introduced by looking for the extension of a property. That is, we looked for all those sets that carried the property "is not a member of itself." Let us call this the "Russell property."

Let us see if the axiom system allows us to form a set with this Russell property. The axiom of separation only allows us to use this property to form a set if that set is a subset of another set. Can there be such a larger set? Perhaps there is a universal set of all sets? Or perhaps there is some other set in the system such the Russell set is a subset?

When the Russell property is implemented within the confines of this axiomatic theory, it turns into a short proof by reductio that there is no such larger set that hosts the Russell set as a subset:

Assume for reductio that there is a set A such that the Russell set is a subset, allowed by the axiom of separation using the Russell property.

By familiar argumentation, the existence of the Russell set entails a contradiction.
If the Russell set is a member of the Russell set,
   then it has the Russell property and is not a member of the Russell set.
If the Russell set is not a member of the Russell set,
   then it has the Russell property and is a member of the Russell set.

The reductio is complete. The assumption of the existence of a set A is false.

This reductio shows that the Russell set can be a subset of no set. Hence there is no Russell set in the sets authorized by the axiom system. There is no paradox any more.

The Axiom of Choice

What it says

The axioms of ZF are generally untroubling. The exception is the axiom of choice. As above, it is commonly separated from the axioms of ZF and, when added, the full system is "ZFC" "Zermelo-Fraenkel set theory with the axiom of choice." It is worth taking a short excursion into the issues it raises, since they lead us to one of the most perplexing paradoxes of the last century, the Banach-Tarski paradox.

The axiom seems quite innocuous. Assume that we have some set of pairwise disjoint sets; that is, no two sets share a common element. Then there is another set composed of elements, one drawn from each these sets.

For example, here is an infinite collection of disjoint sets of natural numbers:

1 to 10
11 to 100
101 to 1,000
1,001 to 10,000

We form a new set by drawing just one element from each. There are many ways to do this. One simple way is just to pick the smallest element from each:

{1, 11, 101, 1,001, ...}

What's the problem?

How, we might well ask, could a simple construction such as this cause any hesitation? The issue is that the axiom says:

"... there exists a set that contains exactly one element from each set..."

In the example of the natural numbers, we had no trouble specifying a set that satisfies the conditions of the axiom. We just picked the least number in each set. We added that specification to the axiom.

The trouble is that the axiom applies not just to easy cases like these sets of natural numbers. It also applies to all sets allowed by the axiom system. They include sets of real numbers. One might think that adding a specifying rule that picks out just one number from each set will always be possible.

For example, why not just keep using the specifying rule of "pick the least from each set"? The trouble is that infinite sets commonly have no least element.

Take for example the real numbers that lie between 0 and 1, excluding 0 and 1. There is no least element and no greatest element. The real 0.01 is in the set, but 0.001 is also in the set and it is smaller; and 0.0001 is also in the set and it is smaller again.


We might suppose that the problem here is that we are using a poor notion of "less than" in picking out the least element. Might there be another way of ordering the elements of any set so that every subset has a least element?

To presume so is to presume what is called a well-ordering of a set. Such a well ordering is a ranking of all the members of the set such that each subset has a least element.

Proof sketch: If a set is well ordered, then the least element of each subset is the element from each set to be combined by the axiom of choice into a set. Conversely, if through the axiom of choice, we can select just one element from a set, then declare that element to be least element of the set in the well ordering. Now take the set with that one element removed and repeat; and repeat again; until all subsets have a least element determined.

It is tempting to suppose that finding such an ordering is just a matter of sufficient ingenuity on our part. However--maddeningly--all efforts to find a general prescription that works everywhere fails. It turns out that the axiom of choice is logically equivalent to the assertion that every set can be well ordered. The axiom of choice entails that there is a well-ordering; and the existence a well-ordering entails the axiom of choice.

In sum, the axiom of choice is distinctive among the other axioms in that it just asserts the existence of a set without giving a means of identifying the set.

Non-measurable sets

Perhaps we could live with the idea that the axiom choice calls for the existence of sets without telling us how to identify them. As Hamlet remarked:

"There are more things in heaven and earth, Horatio,
Than are dreamt of in your philosophy."

The trouble is that sometimes we would like to identify the sets. In an earlier chapter, we saw that there are non-measurable sets. These are sets to which no additive measure can be assigned. All known examples of such sets are called into being by the axiom of choice, without a full prescription that allows us to identify them. There are strong reasons to believe that no prescription can be given for such sets.

This non-constructlble character of non-measurable sets became the key sticking point in one of the most controversial of the paradoxes of the early twentieth century.

Banach-Tarski Paradox

The paradox was published in 1924 by Stefan Banach and Alfred Tarski. The result on first acquaintance is both easy to state and equally implausible. They consider a sphere in Euclidean space. Banach and Tarski show that there is a decomposition into five pieces, such that the five pieces may be recombined to give two spheres of same size as the original sphere. Four of the pieces are continuum size in the points of the Euclidean space and the fifth is merely countable.

All the rearrangements are carried out by isometries of the Euclidean space. That is, they are transformations that preserve the volume of the pieces moved, if the pieces have a volume (and other geometric properties such as what is perpendicular to what).

An engaging and still quite thorough account is given in Leonard M. Wapner. The Pea and the Sun: A Mathematical Paradox. Wellesley, Massachusetts: A.K. Peters. 2005.

The details of the decomposition involve some fairly fancy mathematics that go well beyond what can be covered here. There are many accounts of the decomposition and recomposition. None are, in the end, simple since the process is not simple.

An urn model

To get a glimpse of why this result is not so impossible, recall that a sphere is composed of infinitely (continuum) many points. Infinities let us do strange things. Here is a simple operation in some ways like that of the Banach-Tarski paradox.

Imagine that one has an infinitely capacious urn filled with a countable infinity of equal sized balls. They are numbered 1, 2, 3, ...

We can fill a second urn of equal size merely by removing all the even-numbered balls from the first urn and putting them in the second urn. That is, we take all the balls in the urn:

We separate out the even numbered balls:

We put these even numbered balls in the second urn:

We can now renumber the balls such that each urn has an infinity of balls numbered 1, 2, 3, ...

By mere rearrangement of the balls, starting with one urn, we have somehow conjured up a doubling of the number of balls and we have enough balls to fill two urns.

The Banach-Tarski paradox is about measure

This urn model gives something of the flavor of the process in the Banach-Tarski paradox. However it falls short in what is the most significant part. While the sphere of the Banach-Tarski paradox has infinitely many points in it, there is also a measure, the volume measure of Euclidean space. Doubling an infinity by rearranging the individual points is unsurprising. We have seen that sort of manipulation in the various forms of the Hilbert hotel. However we have not seen a similar doubling to do with the volumes of measurable spaces.

What sharpens the puzzle is that the movement of parts in the reassembly is by a volume preserving isometry. If the reassembly had included transformations that are not isometries, then there would be no puzzle. We can easily double a sphere simply by uniformly expanding all its points:

This expansion is not an isometry precisely because it does not preserve the volume of the sphere. The proper isometries (i.e. excluding reflections) are translations in space

and rotations in space.

Non-measurable sets

How, then, is the decomposition and reassembly possible? The key fact is that the original sphere is decomposed into non-measurable parts. That is, they are parts to which no volume can be assigned without contradiction. Visualizing the structure of such parts is difficult. You need to imagine points and voids interspersed at all levels in such a way that no set of points clump together well enough to form a region to which a volume can be assigned.

Banach and Tarski happened to use a decomposition into non-measurable parts. We now see that they had no choice. For if the decomposition was into measurable parts, then the parts could only be recombined to form a new shape equal in volume to the original sphere. The Banach-Tarski paradox then provides an indirect demonstration of the existence of non-measurable sets.

Now comes the weakness emphasized by critics of the Banach-Tarski paradox and of the axiom of choice itself:

Banach and Tarski could give no prescription for identifying the parts of the decomposition. They used the axiom of choice to justify the existence the parts.

That is, Banach and Tarski could give no answer to the critic who demands "Show me just how to carry out this decomposition." All they can say is that, on the authority of the axiom of choice, such a decomposition exists.

Here the paradox differs from the urn model. There we had the prescription: odd numbered balls go into one urn; even numbered balls into the other.

The question of how to deal with this tension remains open. For some, the oddity of the Banach-Tarski paradox and the evident weakness of justification by the axiom of choice is too much. They deny both. For others, losing the axiom of choice is too great an obstacle to mathematical theorizing to be sustained. The oddity of the Banach-Tarski paradox is the price paid for a serviceable mathematics.

Another example of how the non-constructibility of non-measurable produces unexpected problems is associated with infinite lottery machines and is discussed in a later chapter.

To Ponder

Is the naive notion of set really in trouble? Sets are used in many parts of quantitative science and even mathematics by researchers who pay no attention to foundational issues and likely do not even know of them. How can this be?

Find something prohibited by the vicious circle principle.

A routine construction of the numbers 0, 1, 2, 3, ... is shown in the text. They are conjured up as sets from nothing other than sets. Recall 0 is {}, 1 is { {} }, 2 is { {} , { {} } }, ... and so on. Can these really be the numbers? If not, what is missing from them?

Consider the two resolutions of the Russell paradox: a hierarchical construction like the theory of types and an axiomatic approach such as the ZFC system. What are the good and the bad of each? Which do you prefer?

These resolutions of the Russell paradox do some violence to our intuitive notions. Is there a less destructive solution?

Does the Banach-Tarski paradox provide sufficient basis for excluding the axiom of choice from set theory?

July 8, October 26, 2021

Copyright, John D. Norton