HPS 0628 | Paradox | |

Back to doc list

Problems for Probability: the Principle of Indifference

John D. Norton

Department of History and Philosophy of Science

University of Pittsburgh

http://www.pitt.edu/~jdnorton

- The Principle of Indifference
- Problems
- Finest Partition
- No Preferred Partition
- Geometrical Probabilities
- Escape: A Benign Principle of Rationality
- Completely Neutral Support
- Invariances
- Incomparable States
- History of the Problem
- To Ponder

How are we to assign probabilities to outcomes? So far, the probabilistic systems we have considered arise within mechanical randomizers, such as coin tosses and die rolls. There the physical properties of the randomizers are so set up that the probabilities assigned to the various outcomes are unambiguous. We make sure that our dice a perfect cubes and rolled with vigor, so each outcome has an equal probability.

https://commons.wikimedia.org/wiki/File:Dice_-_1-2-4-5-6.jpg

How are we to treat systems in which these careful provisions are not in place? What is the probability that the stock market goes up tomorrow versus goes down? What is the probability that there will be a snowstorm on January 1 next year? What is the probability that there is life on some recently discovered, distant planet?

https://commons.wikimedia.org/wiki/File:10_PM_March_12_surface_analysis_of_Great_Blizzard_of_1888.png

The principle of indifference offers assistance that can be applied in many contexts. We have no reason to favor a head over a tail on a coin toss, so we set their probabilities equal to a half.

The principle of indifference generalizes this strategy to all cases. "No reason to favor" becomes "equal probability." This apparently innocent principle rapidly leads to paradoxes, as we shall now see.

An early and still definite
treatment of the principle of indifference is in Chapter 4, "The Principle
of Indifference," of Keynes' *A Treatise on Probability* (London:
MacMillan, 1921). In this chapter, Keynes names the principle. He judges
it a better name that the earlier "principle of non-sufficient reason."

Here is his statement of the principle (his emphasis, p. 42):

"The Principle of Indifference asserts that
if there is no *known* reason for predicating of our subject one
rather than another of several alternatives, then relatively to such
knowledge the assertions of each of these alternatives have an *equal*
probability. Thus *equal* probabilities must be assigned to each
of several arguments, if there is an absence of positive ground for
assigning *unequal* ones."

This formulation of the principle is immediately followed by an ominous:

"This rule, as it stands, may lead to paradoxical and even contradictory conclusions..."

Keynes provides a series of illustrations of the
problems that come with the principle. The first and simplest concerns a
book that we have not yet seen and we are unsure of
the color of its cover.

Might it be red?

Since we know nothing of the color of the book, we have as much reason to
expect red as not-red. We have no reason to prefer
one over the other. It follows from the principle that we must assign an
equal probability to both cases. Since those two cases exhaust all the
possibilities, they must sum to one. That is, each must be probability one
half:

Probability (red) = 1/2

Now we ask, might the color be black? We have no reason to prefer black over non-black. By the same reasoning, we again arrive at a probability of one half.

Probability (black) = 1/2

Again, we ask might the book be blue? A similar analysis yields a probability of one half.

Probability (blue) = 1/2

Combining these three probabilities, we have:

Probability = 1/2 |
Probability = 1/2 |
Probability = 1/2 |

Since these colors are mutually exclusive and also mutually exclusive of other colors, we have

P(red) + P(black) + P(blue) + P(some other
color)

≥ P(red) + P(black) + P(blue)

= 1/2 + 1/2 + 1/2 = 3/2

We immediately have a contradiction with normalization of the probability distribution to one. For, as we were reminded in the probability refresher, the total outcome space has the maximum probability one. These three probabilities already sum to greater than one.

This illustration is one of
many that populate the literature on the principle of
indifference. They are essentially the same paradox, but dressed up in
different specifics.

There is a solution that, initially, appears as if it will resolve the paradox. It is that we should select the finest partition of the outcome space, that is the one with the most specific outcomes, and apply indifference just to the outcomes of the finest partition.

Something like it is familiar from the analysis of coin tosses and related problems. For example, assume that there are two coins on display.

What is the probability for each of the three possibilities: no heads, one head, two heads. Since we have no reason to prefer one over another, the principle of indifference direct us to assign equal probability to each, that is, a probability of 1/3rd to each:

Probability = 1/3 |
Probability = 1/3 |
Probability = 1/3 |

Anyone who has a familiarity with coin tossing problems will immediately want to correct this list of possibilities.

No heads, one head, two heads

It is not the finest partition of the outcome space. We learned repeatedly that we need to break up the second outcome into two outcomes, so that the full space has four outcomes:

No heads,

head on first coin and tail on second,

tail on first coin and head on second,

heads of both coins

Assigning equal probability to each of these four outcomes gives us:

Probability = 1/4 |
Probability = 1/4 |
Probability = 1/4 |
Probability = 1/4 |

The escape proposed is that the paradoxes of indifference disappear if we proceed to the finest partition and apply the principle of indifference there only.

It may seem that this restriction to the finest partition resolves the paradoxes. However that is hasty. First, we should note that the familiar coin tossing case suggests but not does require the finer distribution of probabilities of 1/4 for each case.

The reason is that, in the coin tossing case, we are told antecedently, as part of the initiation of the problem, that each coin is tossed fairly and that the tosses are independent. Those physical conditions determine the result that the probability of the four familiar outcomes are equal.

We have no corresponding physical background
in the present case. All we are told is that there are *two coins on
display*:

The conditions under which they come to be displayed are not specified. We have no details of the physical process that put them in the frame. Perhaps these are rare coins in a coin collector's display. Then it is very likely that the collector wants to display both sides of the coin, so that one head-one tail is the probability one case.

it does seem rash to assume
that the world will always present us pairs of coins just as if they had a
history akin to that of two tossed coins. This does *not* mean
that the principle of indifference has been refuted. Rather it underscores
just how different are the directions of the principle from the more
familiar analyses of coin tosses and the like.

What does present insoluble problems for this "finest partition" escape is that there are many cases in which there is no finest partition; or in which there are multiple candidates that yield contradictory results.

Consider again Keynes' case of the book covers. There is no finest partition. The colors form a continuous spectrum. It spans light with wavelengths of roughly 400 to 700 nanometers (nm).

https://commons.wikimedia.org /wiki/File:Linear_visible_spectrum.svg

Might we apply indifference
over the individual wavelengths? That fails to yield a
probability distribution over the spectrum. There are continuum many
wavelengths, each of which would acquire zero probability. As we saw
earlier in the paradox of measure, these zero probabilities would leave
the probabilities of the intervals of wavelengths undetermined.

There is no finest division
of these colors. Any small interval of colors can be divided into smaller
intervals and then still smaller intervals without end. The range of 400
to 440 can be divided into 400 to 420 and 420 to 440; and then these into
400 to 410, 410 to 420, 420 to 430 and 430 to 440; and so on indefinitely.

Perhaps, one might imagine, that there is still a way to save something like the idea of the finest partition in these cases. Perhaps we do not assign probabilities indifferently to the individual cases, but to equal intervals of wavelengths, no matter how small the intervals may be. This would correspond to a uniform distribution over the wavelengths. We might assign equal probabilities to steps of size ten:

P(400 to 410nm) = P(410 to 420nm) = P(420 to 430nm) = ... = P(690-700nm)

If we make this uniform assignment, it will be compatible with any other uniform assignment that assigns the same probability to equal intervals. Thus it entails

P(400 to 500nm) = P(500 to 600nm) = P(600 to 700nm)

Tempting as this proposal may be, it fails since there is no reason to distribute our indifferences uniformly over the wavelengths of light. We might equally represent the spectrum of colors using frequencies, measured in TeraHertz, THz.

https://commons.wikimedia.org/wiki/File:Inverse_visible_spectrum.svg

The range of frequencies of visible light is roughly 430 THz to 750 THz.

The problem is that a uniform distribution over the wavelengths is not uniform over the frequencies; and a uniform distribution over the frequencies is not uniform over the wavelengths.

This failure of uniformity follows from the relationship between wavelength and frequency:

300,000 = wavelength in nm x frequency in Thz

To see how this failure arises, consider equal intervals of wavelength and how they translate into unequal intervals of frequencies.

Wavelengths (nm) | Wavelength intervals (nm) | Frequencies (THz) | Frequency intervals (THz) |

400 to 500 | 100 | 600 to 750 | 150 |

500 to 600 | 100 | 500 to 600 | 100 |

600 to 700 | 100 | 430 to 500 | 70 |

To see the contradiction,
uniformity of probability over wavelengths leads us to require

P(400 to 500 nm) = P(600 to 700 nm)

which is equivalent to requiring

P(600 to 750 THz) = P(430 to 500 THz)

However the interval 600 to 750 THz is of size 150THz, which is
over twice the interval of 430 to 500 THz of size 70 THz. Equality of
probability over equal frequency intervals would lead us to require that

P(600 to 750 THz) > 2 x P(430 to 500 THz)

In the reversed case, we start by considering equal
intervals of frequency. They then correspond to unequal intervals
of wavelength. The table uses a slightly
adjusted range of frequencies for arithmetic simplicity.

Frequencies (THz) | Frequency intervals (THz) | Wavelengths (nm) | Wavelength intervals (nm) |

420 to 530 | 110 | 566 to 715 | 149 |

530 to 640 | 110 | 469 to 566 | 97 |

640 to 750 | 110 | 400 to 469 | 69 |

This outcome is a paradox with a true contradiction. For repeated application of principle of indifference leads us to conclude two results that contradict:

Probability is uniform over wavelength.

Probability is uniform over frequency.

This mode of failure of the principle of indifference has appeared frequently in the literature in different guises. What they commonly share is that there are two variables used to describe some system, where one is related inversely to the other.

Specific volume = volume per unit mass. Specific
density = mass per unit volume.

Keynes (p. 45) example of this type concerns the specific volume and the specific density of some substance. One is the reciprocal of the other. Hence we have:

specific volume x specific density = 1

Von Mises has a celebrated example of a mixture of wine and water. His two variables are the ratio of wine to water; and the ratio of water to wine. Once again we have:

ratio wine to water x ratio water to wine = 1

In both cases, using the same reasoning as with the light spectrum, we conclude that uniformity over one variable precludes uniformity over the other. However the principle of indifference requires us to distribute probabilities uniformly over both.

A large collection of problems for the principle of induction are associated with "geometrical probabilities" (as identified by Keynes, pp. 47-48). They have the same general structure as the light spectrum case, but are now implemented in a geometrical setting.

One of the simplest of these geometrical paradoxes has been developed in the budget of paradoxes as "Lost in the Desert." Stripped of the lost camper narrative, the paradox concerned the probability of a random point being in the inner circle or in the outer ring of the figure:

If we are indifferent over the distance from the center, then the probability that the point is in the inner circle is the same as the probability that it is in the outer ring. For the range of distances from the center in the inner circle (0 to 1, say) is the same as the range of distances from the center spanned by the outer ring (1 to 2, say).

If we are indifferent over areas, then the probability that the point is in the inner circle is one third the probability that it is in the outer ring, for the outer ring has three times the area.

This is the simplest of the geometric style paradoxes derived from the principle of indifference. Best known among many more examples are those presented by Joseph Bertrand in his 1907 treatise on the probability calculus.

Among Bertrand's examples, the best known is the problem of the chord in a circle. Bertrand considers a circle with an equilateral triangle inscribed inside.

He then considers an indifferently chosen chord of the circle and asks for the probability that its length is greater than the side of the equilateral triangle. Bertrand arrives at different answers according to how indifference is applied. Here are his three cases and the results he achieves:

Probability = 1/3 Indifference over the angles of chords drawn from a point at the apex of the triangle. |
Probability = 1/2 Indifference over chords drawn perpendicular to a radius of the circle. |
probability=1/4 Indifference over the position of the center of all chords drawn in the circle enclosed by the triangle. (Chords with center points in the inner circle are longer than the side of the triangle.) |

These elaborate geometrical paradoxes are quite entertaining. Bertrand's ingenuity is displayed through his finding a case in which we have all three of the simplest probabilities--1/2, 1/3, 1/4--applied in the same case. However, they do not reveal any deeper foundational problems than have already been raised by the simpler version of "lost in the desert" and, before that, indifference over colors.

The weakness of this first approach is that it is
always subject to a rejoinder. It says do not apply indifference in cases
in which condition X fails. The rejoinder: but what if we do not know
whether X applies?

The paradoxes of indifference have been prominent in the literature on probability and have a long history. There have been many solutions offered for them. Some (following Keynes) seek to restrict the situations in which indifferences can be applied. Others seek to escape the paradoxes by relaxing or generalizing how we treat uncertainties formally. In my view, this second approach is correct.

For more details see John D. Norton, "Ignorance
and Indifference." *Philosophy of Science*, 75 (2008), pp.
45-68; and "Cosmic
Confusions: Not Supporting versus Supporting Not-". *Philosophy
of Science*. 77 (2010), pp. 501-23.

Here is my version of this escape from the paradox.

Keynes' canonical statement of principle of indifference has two components in it. He writes, as quoted above:

"The Principle of Indifference asserts that
if there is **no known reason for predicating of our
subject one rather than another** of several alternatives, then
relatively to such knowledge the assertions of each of these alternatives
have an

The two components are indicated by boldface.

The first is a principle or rationality.

If you have no reason to prefer one possibility over another, you should not prefer one over the other.

This is about as benign a truism of rationality as I can conceive. It goes to the heart of rationality. Our beliefs and approaches should be governed by reasons; and if reasons are powerless to distinguish cases, then the cases should be treated alike.

There is considerable vagueness in this formulation. What does it mean to "treat" in "treated alike"? Here we can fill in many possibilities. There are two extremes

One extreme is that our concern is our belief states. We should have the same belief in the two possibilities.

Another is a belief independent notion of evidential support. Then the principle becomes a principle of evidence: if the evidence cannot distinguish two cases, then both are equally well supported.

This first component of Keynes' statement is not troublesome. The trouble comes with the second component. How are we to represent belief states; or the strengths of evidential support; or whatever might be intermediate between them? Keynes' second component prejudges the situation and simply presumes that probabilities will always apply.

The real import of the paradoxes of indifference is that they show this second presumption is too hasty. We can contrive circumstances in which probabilities simply fail to accommodate the benign principle of rationality above.

A contingent outcome is one whose obtaining is not
dictated by deductive logic. That a coin shows heads is a contingent
outcome. That it shows heads or tails is necessarily true and thus not
contingent.

The most extreme case of indifference arises when we are indifferent over all contingent outcomes. We can deduce from the principle of indifference the formal properties an appropriate representation of complete ignorance states or completely neutral evidential support. In short:

*Completely neutral support: All contingent
outcomes accrue the same support.*

In the case of discrete outcome spaces, such as the two coin tosses, every contingent outcome will be attributed the same indifference or ignorance value "I." Writing in terms of inductive support on the evidence, we have:

support(no heads) = I

support(first heads then tails) = I

support (first tails then heads) = I

support (one head only)

= support( first heads then
tails OR first tails then heads) = I

support (two heads) = I

if we are completely ignorant of the circumstances that lead to the coin arrangement or if the evidence completely fails to distinguish any arrangement, then we must treat all contingent outcomes alike.

This last ignorance or indifference distribution violates the additivity of a probability measure:

support (one head only)

= support( first heads then
tails OR first tail then heads)

=
support(first heads then tails) = support (first tails then heads)

support(
or )

= support(
) = support()

This failure of additivity seems to me to be the key result.

In the case of a continuous outcome space, such as the color spectrum, all intervals of wavelength would be assigned the same support:

support (wavelength in 0 to 1) = support
(wavelength in 1 to 2)

= support (wavelength in 2 to 3) = ... = I

= support (wavelength in 0 to 2) = support (wavelength in 2 to 4)

= support (wavelength in 4 to 6) = ... = I

= support (wavelength in 0 to 1) = support (wavelength in 1 to ∞) = I

= support (wavelength in 0 to 2) = support (wavelength in 2 to ∞) = I

etc.

This distribution is illustrated here. Each of the intervals marked out by the bars are assigned the same support I:

The states of completely neutral support or complete ignorance are largely determined by two invariances that are implicit in the paradoxes of indifference.

The first pertains to what happens when we refine or coarsen outcomes:

*I. Invariance under redescription:*
Completely neutral support is unchanged if we refine or coarsen our
description of the outcome space.

"Coarsening" is the reverse of refinement. Several
outcomes are "or'ed" together to form one new, coarser outcome.

"Refinement" means replacing some outcome by its disjunctive parts. For example, in the case of the two coins, we would replace

"one head only"

by

"heads first then tails OR tails first then heads."

Thus, if the principle of indifference indicates that support for no heads is the same as for one head, then that equality is not altered when we note that "no heads" has two disjunctive parts.

support(no)
= support(one)

= support(
or )

The second invariance pertains to negation.

*II. Invariance under negation:* In
completely neutral support, the support for an outcome is unaltered if we
replace it by its negation.

The quick way to see this is to imagine some contingent proposition "A" about which we know nothing or for which there is no evidence. How does our state of belief or the level of support alter if we replace A with "not-A"? Since we have complete neutrality, it should not change.

This second invariance is illustrated with the color spectrum. Consider the possibility that some electromagnetic wave has a completely unknown wavelength and frequency. To proceed, we will measure the wavelength in units of √300,000nm = 548nm; and frequency in units of √300,000THz = 548THz. The reason is that then the relationship between them becomes very simple:

wavelength x frequency = 1

Consider the support accrued to the two wavelength intervals,

0 to 1 and 1 to ∞.

Each is the negation of the other.

It is tempting initially to assign greater support to the infinite interval 1 to ∞, since it is infinite. That possibility proves unworkable. If we redescribe the two outcomes in terms of frequency, which are the finite and which are the infinite intervals switch:

wavelength (0 to 1) = frequency (1 to ∞)

wavelength (1 to ∞) = frequency (0 to 1)

The figure shows which intervals match physically in the two scales of wavelength and frequency:

If the infinity of an interval means it should have greater support, then we arrive at a contradiction. We would infer from it:

Greater support for wavelength (1 to ∞), since it
is an infinite interval.

Greater support for wavelength (0 to 1), since it corresponds to an
infinite interval of frequency.

To escape the contradiction, we assign equal support to all these four intervals:

wavelength (0 to 1), wavelength (1 to ∞), frequency (0 to 1), frequency (1 to ∞)

These equalities implement invariance under negation. A negation takes us from the interval wavelength (0 to 1) to wavelength (1 to ∞); and from frequency (0 to 1) to frequency (1 to ∞).

For a development of this approach, see Ben Eva,
"Principles of Indifference," *The Journal of Philosophy*,
116(7)(2019), pp. 390–411.

Another approach in this tradition is worth mentioning. In the approach above, an outcome can have the same support as its disjunctive parts. In the two coin example, an outcome "one head" can have the same support as "first heads then tails" and as "first tails then heads."

For some, this equality is intolerable. It can be escaped if we deny that the support for mutually exclusive outcomes can be compared. We simply refuse to judge whether:

"First heads then tails" is more strongly supported
than "first tails then heads."

"First heads then tails" is equally supported as "first tails then heads."

"First heads then tails" is less strongly supported than "first tails then
heads."

We withhold judgment and assert none of these. Then it is possible to avoid the paradoxes. We can say, for example, that

"One head" is more strongly supported than "first
tails then heads."

"One head" is more strongly supported than "first heads then tails."

However the few comparisons allowed are insufficient
to enable a paradox of indifference to be set up.

The long-standing paradoxes of indifference have an interesting history. They arose as a by-product of the expansion of the theory of probability from games of chance to other contexts.

When the theory was first developed in the seventeenth century by Fermat and Pascal, their primary concern was the analysis of games of chance. These games, by design, were developed around an outcome space whose individual outcomes had the same chance. All possible pairs of outcomes on two dice rolls, for example, have the same probability.

,
,
,
,
,
,

,
,
,
,
,
,

,
,
,
,
,
,

,
,
,
,
,
,

,
,
,
,
,
,

,
,
,
,
,

In this most hospitable environment, the comparable likelihood of different outcomes could be determined simply by counting cases. Rolling a two with two dice could happen in only one way

Rolling a seven, however, could happen in six ways.

, , , , ,

It followed that the seven was six times as likely as the two. It was easy then and remains easy today to deal with simple gambling problems merely by counting cases.

The notion of probability was a derived notion. That is, the idea of the equally likely case was fundamental, the starting point. Once these cases were secured, probability could then be defined as a simple ratio:

probability = (number of favorable case) / (number of all cases)

Thus the probability of a seven was (6 favorable cases) / (36 cases in all) = 1/6.

Pp. 6-7 from the English translation, Pierre Simon
Laplace, *Philosophical Essay on Probabilitie*s. 6th French
edition. Trans. F. W. Truscott and F. L. Emory. London: John Wiley and
sons, 1902.

This was the conception of probability employed by Laplace in his 1814 "Essai philosophique sur les probabilités." He was, in his essay, seeking to extend the reach of probabilistic thinking to broader matters. His introductory examples included discussion of the motion of comets and conjectures on the motion of molecules. He then sought an extension of the gambler's notion of probability:

"The theory of chance consists in reducing all the events of the same kind to a certain number of cases equally possible, that is to say, to such as we may be equally undecided about in regard to their existence, and in determining the number of cases favorable to the event whose probability is sought. The ratio of this number to that of all the cases possible is the measure of this probability, which is thus simply a fraction whose numerator is the number of favorable cases and whose denominator is the number of all the cases possible."

This came to be known as the "classical interpretation of probability."

The difficulty with this conception is that the more removed we are from the context of games of chance, the harder it becomes to effect the reduction to equally likely cases. The paradoxes of indifference arise from efforts to determine if Laplace's definition is viable.

The descent to paradox is already visible in Laplace's definition. He writes of us being "equally undecided" as the criterion for distinguishing the cases whose number will be used to compute the probability.

In games of chance, we have (as noted above) something stronger than indecision. We have positive, antecedent physical reasons for expecting the base cases to be equally likely. We toss fair coins in such a way as to ensure it.

Now we are simply told that we have two coins on display and are given no history of how they came to be displayed. When we are asked which faces are uppermost, we are faced with a problem different from that of the tossed coins.

Can we be confident that the same methods that succeeded with games of chance will succeed with the new problem? Work in century following Laplace's essay explored just this question. Keynes asked, what color are we to expect for a book before we see it?

These explorations found that Laplace's condition of our being "equally undecided" could be used to arrive at contradictory probability assignments. They are the paradoxes of indifference.

The solution to the paradoxes of indifference above is to allow that some cases of indefiniteness cannot be represented by the additive measures of probability theory. Is this too severe a response? Should we be ready to abandon probabilities?

The solution above considers two options: a
non-probabilistic representation of indefiniteness or abandoning any
comparison of belief or strengths of support for mutually exclusive
outcomes. Which is preferable?

August 16, 2021

Copyright, John D. Norton