What Logics of Induction are There?
John D. Norton
The illustrations of the last section have all employed the most familiar of inductive logics, one formulated in terms of the probability calculus. There are many more in the class of deductively definable logics. In general, one defines a new inductive logic by selecting any suitably well-behaved function fN in (S). Very many such functions are possible, so very many alternative inductive logics can be defined. Most of these logics will differ in some important property from the probability calculus.
In work elsewhere, I have followed a tradition that seeks alternatives to the additivity of the probability calculus. According to that additivity, if we have two mutually contradictory propositions A and B, that is, propositions that cannot both be true, we find the probability of their disjunction by adding their individual probabilities:
P(A or B | Ω ) = P(A | Ω ) + P( B | Ω )
This sort of additivity makes sense if low values of probability are to represent disbelief. (For then a low probability for A forces a high probability for not-A, so that we strongly believe not-A, which means we strongly disbelieve A.) It is inappropriate if low values of the inductive strengths are to reflect ignorance. (For then we might not want a low support for A to force a high support for not-A. If we are largely ignorant about A, we are typically similarly ignorant about not-A, so we would want low support for both A and not-A, which the probability calculus does not allow.) In that case a non-additive measure may be more appropriate. For complete ignorance, I have argued elsewhere, we should seek a measure that allows
[A or B | Ω ] = [ A | Ω ] = [ B | Ω ]
where all three strengths are set to some "ignorance" value.
Since I have elaborated on these possibilities elsewhere, here I want to pursue another way that inductive logics may differ from a probabilistic logic. The logic to be pursued below differs from the Bayesian system in employing a different dynamics of conditionalization.
One of the distinctive properties of conditional probabilities is a property I have elsewhere called "narrowness." It asserts that P(A|B) = P(A&B|B). That means that the conditional probability P(A|B) takes no notice of the second of the two disjunctive parts of A = (A&B) or (A&~B).
This property can be generalized to other inductive logics and is expressed as:
Narrowness
[A|B] = [A&B|B]
The property is so familiar as generally to pass without comment. It is, however, a rather odd aspect of Bayesian confirmation theory. Imagine for example that we are trying to identify some unknown animal. Let us say that we learn it is a bird. In a narrow logic, that evidence gives the same support to the animal being a canary as to its being a (canary or whale), even though the evidence precludes it being a whale:
[ canary | bird ] = [ canary or whale | bird ]
That is, Narrowness lets quite inert disjuncts pass in a proposition without penalty, even though they play no role beyond background noise (or worse).
In one sense, Narrowness is admissible. In the population of birds, entities that are canaries arise exactly as often as entities that are canaries or whales. However, that statistical fact rarely exhausts our interest in the case. The evidence that the animal is a bird points more specifically to it being a canary than it does to the animal being a canary or a whale. While we might assign equal support to both possibilities in a narrow logic, we would then generally add a step to our analysis. We would dismiss the proposition (canary or whale) as encumbered with nuisance noise in comparison with the proposition (canary), perhaps remarking that the evidence (bird) points specifically to the latter.
That we need to add an extra step by hand in cases like these suggests that our inductive logic is incomplete. Shouldn't our logic tell us directly that the two propositions are not really equally favored by the evidence?
We pick out an inductive logic that addresses these problems and contradicts Narrowness by selecting the following function fN in (S):
(SC) [A|B]SC = fN(#A&B, #A&~B, #~A&B) = (#A&B/#B).(#A&B/#A)
The basic idea behind the logic can be seen by looking at its two terms. The first term (#A&B/#B) is the same as is found in the definition of probability (P). This much of the logic is the same as a probabilistic logic. The second term (#A&B/#A) will only fail to be unity when A extends beyond B. Only then will the logic differ from a probabilistic logic. This second term is a penalty paid whenever A extends beyond B. Proposition A extends beyond B whenever #A exceeds #A&B; so the penalty factor is their ratio. One should read the formula in (SC) loosely as saying "a probability with a penalty for extending disjunctively beyond the evidence." It is this second factor (#A&B/#A) that penalizes A=(canary or whale) when we are conditionalizing on B=bird.
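The two factors of (SC) can be made concrete with a minimal Python sketch. It assumes, purely for illustration, that each proposition is a finite set of equally weighted atoms in some adapted partition, and it uses a hypothetical toy partition of animal kinds. The helper names p and sc are introduced here and are not part of the original text.

```python
from fractions import Fraction

def p(a: set, b: set) -> Fraction:
    """Probabilistic strength (P): #A&B / #B, assuming equally weighted atoms."""
    return Fraction(len(a & b), len(b))

def sc(a: set, b: set) -> Fraction:
    """Specific conditioning strength (SC): (#A&B/#B) . (#A&B/#A); A, B nonempty."""
    both = len(a & b)
    return Fraction(both, len(b)) * Fraction(both, len(a))

# Hypothetical toy partition: each atom is one kind of animal.
bird = {"canary", "sparrow"}
canary = {"canary"}
canary_or_whale = {"canary", "whale"}

print(p(canary, bird), p(canary_or_whale, bird))    # 1/2 1/2 (Narrowness holds)
print(sc(canary, bird), sc(canary_or_whale, bird))  # 1/2 1/4 (inert "whale" penalized)
```

The probabilistic strength ignores the inert atom "whale", while the second factor of sc halves the support for (canary or whale).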
It is easy to see that this formula will produce an asymptotically stable logic. The inductively adapted partitions could all be produced from one initial partition by uniform refinements, that is, refinements that replace each atom by the same number of atoms. Then the value of [A|B]SC will remain the same in all refinements, for the strengths depend only on the ratios of the atom counts #A&B, #A and #B and not on N. These ratios remain the same under uniform disjunctive refinements to new adapted partitions.
Here's how it works in a simple case. Let us say we start out in some partition in which #A&B=1, #A=2 and #B=2, so that [A|B]SC = (1/2).(1/2) = 1/4. A uniform refinement will replace every single atom by the same number of new atoms. Suppose that number is 10. Then, in the new partition, we will have #A&B=10, #A=20 and #B=20, so that [A|B]SC = (10/20).(10/20) = 1/4 as before.
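The invariance under uniform refinement can be checked directly. This is a minimal sketch assuming atoms are hashable labels and that a uniform refinement replaces each atom x by k new atoms (x, 0), ..., (x, k-1); the helper refine and the atom labels x, y, z are hypothetical.

```python
from fractions import Fraction

def sc(a: set, b: set) -> Fraction:
    """Specific conditioning strength: (#A&B/#B) . (#A&B/#A)."""
    both = len(a & b)
    return Fraction(both, len(b)) * Fraction(both, len(a))

def refine(prop: set, k: int) -> set:
    """Uniform refinement: replace every atom x by k new atoms (x, 0), ..., (x, k-1)."""
    return {(x, i) for x in prop for i in range(k)}

# Atoms x, y, z with A = {x, y} and B = {x, z}: #A&B = 1, #A = 2, #B = 2.
A, B = {"x", "y"}, {"x", "z"}
print(sc(A, B))                          # 1/4
print(sc(refine(A, 10), refine(B, 10)))  # 1/4, now with #A&B = 10, #A = 20, #B = 20
```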
We can illustrate the logic's signature property, its selectivity under conditioning, with a simple example. A die is tossed and our evidence is that the outcome is
LOW = 1 or 2 or 3.
The probabilistic degrees of support and the corresponding specific conditioning degrees of support for four outcomes are:
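Probabilistic degrees of support:
P(1 | 1 or 2 or 3) = 1/3
P(1 or 4 | 1 or 2 or 3) = 1/3
P(1 or 4 or 5 | 1 or 2 or 3) = 1/3
P(1 or 4 or 5 or 6 | 1 or 2 or 3) = 1/3
Specific conditioning degrees of support:
[1 | 1 or 2 or 3]SC = 1/3 . 1/1 = 1/3
[1 or 4 | 1 or 2 or 3]SC = 1/3 . 1/2 = 1/6
[1 or 4 or 5 | 1 or 2 or 3]SC = 1/3 . 1/3 = 1/9
[1 or 4 or 5 or 6 | 1 or 2 or 3]SC = 1/3 . 1/4 = 1/12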
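The die comparison can be reproduced with a short sketch. It assumes the faces are labeled 1 to 6 with LOW = {1, 2, 3}, and that each outcome consists of one LOW face plus a growing number of non-LOW faces; the particular faces 1, 4, 5, 6 are an illustrative choice, and any such choice gives the same numbers.

```python
from fractions import Fraction

def p(a: set, b: set) -> Fraction:
    """Probabilistic strength: #A&B / #B."""
    return Fraction(len(a & b), len(b))

def sc(a: set, b: set) -> Fraction:
    """Specific conditioning strength: (#A&B/#B) . (#A&B/#A)."""
    both = len(a & b)
    return Fraction(both, len(b)) * Fraction(both, len(a))

LOW = {1, 2, 3}
# One LOW face plus 0, 1, 2, 3 inert faces that contradict LOW.
outcomes = [{1}, {1, 4}, {1, 4, 5}, {1, 4, 5, 6}]

for a in outcomes:
    print(sorted(a), p(a, LOW), sc(a, LOW))
# [1] 1/3 1/3
# [1, 4] 1/3 1/6
# [1, 4, 5] 1/3 1/9
# [1, 4, 5, 6] 1/3 1/12
```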
When conditionalizing on LOW, a probability measure does not penalize an outcome for inert atoms that contradict the evidence LOW. The SC logic does penalize them, so that the support it offers each outcome diminishes according to the number of inert atoms that outcome contains.
A few properties of the specific conditioning logic are noteworthy. If we conditionalize on the background Ω, then the logic reverts to an additive measure. For no proposition A can extend beyond the background Ω. A quick calculation confirms that this is the case. If there are N atoms in Ω, then
[ A | Ω ]SC = (#A&Ω/#Ω).(#A&Ω/#A) = (#A/N).(#A/#A) = #A/N
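The reduction to #A/N can be confirmed exhaustively on a small background. This sketch assumes, for illustration only, a background Ω of N = 6 equally weighted atoms; it checks every nonempty proposition A and every pair of disjoint (contradictory) propositions.

```python
from fractions import Fraction
from itertools import combinations

def sc(a, b) -> Fraction:
    """Specific conditioning strength: (#A&B/#B) . (#A&B/#A)."""
    both = len(a & b)
    return Fraction(both, len(b)) * Fraction(both, len(a))

omega = frozenset(range(6))  # assumed background of N = 6 atoms
N = len(omega)
nonempty = [frozenset(c) for r in range(1, N + 1) for c in combinations(omega, r)]

# [A | Omega]SC = #A/N for every nonempty A ...
assert all(sc(a, omega) == Fraction(len(a), N) for a in nonempty)

# ... so on Omega the SC strengths are additive over contradictory A, B.
assert all(sc(a | b, omega) == sc(a, omega) + sc(b, omega)
           for a in nonempty for b in nonempty if not a & b)
```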
However, when we conditionalize on some narrower proposition B, the result is rather different from what happens when we conditionalize in probability theory.
In probability theory, conditionalization produces a new measure P(.|B), which is also an additive measure. Because of Narrowness, we need no longer consider those parts of the original space Ω that contradict B.
The corresponding quantity in the specific conditioning logic [.|B]SC is not in general an additive measure. Since Narrowness is violated, we must continue to consider the disjunctive parts of outcomes that contradict the evidence. Indeed the essential novelty lies in outcomes being penalized for just such parts.
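A two-outcome case shows the failure of additivity. This is a sketch under the same assumptions as before (die faces 1 to 6, LOW = {1, 2, 3}); the outcomes 1 and 4 are an illustrative choice of two contradictory outcomes, one of which is inert given LOW.

```python
from fractions import Fraction

def p(a, b) -> Fraction:
    """Probabilistic strength: #A&B / #B."""
    return Fraction(len(a & b), len(b))

def sc(a, b) -> Fraction:
    """Specific conditioning strength: (#A&B/#B) . (#A&B/#A)."""
    both = len(a & b)
    return Fraction(both, len(b)) * Fraction(both, len(a))

LOW = {1, 2, 3}
one, four = {1}, {4}  # contradictory outcomes; 4 contradicts LOW

# P(.|LOW) is additive: 1/3 + 0 equals P(1 or 4 | LOW) = 1/3.
print(p(one, LOW) + p(four, LOW), p(one | four, LOW))    # 1/3 1/3
# [.|LOW]SC is not: 1/3 + 0 differs from [1 or 4 | LOW]SC = 1/6.
print(sc(one, LOW) + sc(four, LOW), sc(one | four, LOW))  # 1/3 1/6
```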
In the Bayesian system, we can usefully compute the conditionalized probabilities by means of Bayes' theorem. The analog of Bayes' theorem in a specific conditioning logic is astonishingly simple and can be read directly from the symmetry of A and B in the definition (SC) of [A|B]SC:
[A|B]SC = [B|A]SC
This simple formula shows that degrees of support coincide with what are called, in the Bayesian context, likelihoods. That is much less significant than it would be in the Bayesian context, for likelihoods are much more variable in the new logic. If we have some hypothesis H that entails evidence E, the probabilistic likelihood would be P(E|H)=1. In a specific conditioning logic, we are no longer assured of a simple value for the likelihood [E|H]SC when H entails E. If E has many atoms that extend beyond H, this likelihood can be very much less than one.
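Both the symmetry and the behavior of likelihoods can be checked on a small space. The sketch assumes six equally weighted atoms labeled 1 to 6; the hypothesis H = {1} and evidence E = {1, 2, 3} are illustrative choices in which H entails E.

```python
from fractions import Fraction
from itertools import combinations

def p(a, b) -> Fraction:
    """Probabilistic strength: #A&B / #B."""
    return Fraction(len(a & b), len(b))

def sc(a, b) -> Fraction:
    """Specific conditioning strength: (#A&B/#B) . (#A&B/#A)."""
    both = len(a & b)
    return Fraction(both, len(b)) * Fraction(both, len(a))

props = [frozenset(c) for r in range(1, 7) for c in combinations(range(1, 7), r)]

# The symmetry [A|B]SC = [B|A]SC holds for every pair of nonempty propositions.
assert all(sc(a, b) == sc(b, a) for a in props for b in props)

# H entails E, yet the SC likelihood falls well below one.
H, E = frozenset({1}), frozenset({1, 2, 3})
print(p(E, H), sc(E, H))  # 1 1/3
```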
The better way to read this striking symmetry [A|B]SC = [B|A]SC is as follows. In a specific conditioning logic, when we form [A|B]SC, A is penalized for falling short of B; and it is penalized for extending beyond B. The symmetry in the formula merely expresses the fact that both penalties are exacted in equal measure.
One consequence is that the maximum value of [A|B]SC = 1 can only arise when the finitely many atoms of A and the finitely many atoms of B coincide. In the probability calculus, when P(A|B) = 1, B deductively entails A. (Or at least that is the case when we have finitely many equiprobable atoms, so that there are no "measure zero" outcomes.) This limiting case of deduction does not arise in a specific conditioning logic. In it, [A|B]SC = 1 means that A is B.
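The contrast between the two limiting cases can be verified exhaustively on a small space, again assuming six equally weighted atoms. In Python, b <= a tests whether set b is a subset of set a, so the second check says P(A|B) = 1 exactly when B entails A.

```python
from fractions import Fraction
from itertools import combinations

def p(a, b) -> Fraction:
    """Probabilistic strength: #A&B / #B."""
    return Fraction(len(a & b), len(b))

def sc(a, b) -> Fraction:
    """Specific conditioning strength: (#A&B/#B) . (#A&B/#A)."""
    both = len(a & b)
    return Fraction(both, len(b)) * Fraction(both, len(a))

props = [frozenset(c) for r in range(1, 7) for c in combinations(range(1, 7), r)]

# [A|B]SC = 1 exactly when A and B have the same atoms.
assert all((sc(a, b) == 1) == (a == b) for a in props for b in props)

# P(A|B) = 1 exactly when B entails A (B is a subset of A).
assert all((p(a, b) == 1) == (b <= a) for a in props for b in props)
```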
According to the material theory of induction, there is no One True and Universal inductive logic. This specific conditioning logic is definitely not offered here as that one true and universal logic. All that is suggested is that it is another logic that may have application in this or that domain.
Whether a particular inductive logic can be applied in a particular domain is, according to the material theory of induction, decided by the material facts obtaining in the domain. That means that the applicability of the logic must be decided on a case by case basis. There is no general principle that can decide it in advance.
We can display one case in which the material facts will support the logic. It turns out to be a case that also supports a probabilistic logic. What decides between them is what we would like our degrees of support to express. In forming [ hypothesis | evidence ], do we want the evidence to support the hypothesis while ignoring inert possibilities in the hypothesis, or while penalizing it for them?
This case arises when we have a process for whose outcomes we have physical chances, in one form or another. Such processes include the familiar coin tosses, die throws, timing of radioactive decay, weather on particular days of the year, and so on. The essential point is that the physical chances give us some purchase on the relative frequency of certain outcomes.
For example, if the process is the repeated throw of a die, then we expect in the long run that we will throw a 1 roughly with frequency 1/6. We can also say that among the LOW (= 1 or 2 or 3) outcomes thrown, we expect the 1 to appear roughly with frequency 1/3.
These material facts can license a probabilistic logic of induction, if we require that the inductive strength for some outcome is to match, near enough, the long-term frequency of the outcome. That is, we match inductive strength with frequency of long-run truth, near enough. We assign a strength 1/3 to a LOW die throw being a 1, since that outcome will obtain roughly 1/3 of the time among LOW throws. Or we might assign inductive strength 0.999 to the proposition that there is at least one head among each ten coin tosses, for in the long run, that will be the case a 0.999 fraction (=1023/1024) of the time.
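The coin-toss figure comes from a simple count, sketched here under the assumption of a fair coin, so that all 2^10 sequences of ten tosses are equally likely.

```python
from fractions import Fraction
from itertools import product

seqs = list(product("HT", repeat=10))  # all 1024 equally likely sequences
at_least_one_head = [s for s in seqs if "H" in s]
strength = Fraction(len(at_least_one_head), len(seqs))

print(strength, float(strength))  # 1023/1024 0.9990234375
```

Only the single all-tails sequence lacks a head, which is why the fraction is 1023/1024.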
This is NOT another ill-fated attempt to deliver an "interpretation of probability." It is assumed that we already have physical chances or physical probabilities for the outcomes and all that comes with them. In particular, we have some version of the law of large numbers that assures us that relative frequencies will, with chance arbitrarily close to one, match the chances over sufficiently many trials. What is also assumed is that we match the atoms of the inductively adapted partitions of our inductive logic with outcomes that have equal chances. That is sufficient to enable probabilistic degrees of belief to match long-term frequencies "near enough," which means within the usual qualifications of a law of large numbers.
Demands for an "interpretation of probability" have had a woeful effect. For they demand an explicit definition, even an operational or behaviorist definition, of a central term of a theory. We have long since abandoned this demand for other central terms of our physical theories, since it is generally unsatisfiable. One product of this demand, the subjective interpretation, has proven especially dangerous. It allows its proponents to dismiss failures of Bayesian confirmation theory, such as the theory's failure adequately to treat ignorance, as mere aberrations of opinion.
Let us write this a little more formally. A and B will be outcomes each comprised of one or more atomic outcomes, where these atomic outcomes have equal physical chances. Among n trials, the relative frequency with which an atomic outcome in A obtains in those trials in which B obtains is
Relative frequencyn(A, B)
If we set strengths of support to the probability
(P) P(A|B) = #A&B/#B
then these strengths will match the long term relative frequencies, near enough.
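A seeded simulation illustrates the match. This sketch assumes a fair die and uses a hypothetical helper relative_frequency that returns the fraction of B-trials in which an atom of A obtained; the seed and the number of throws are arbitrary choices, so the simulated frequency only approximates 1/3.

```python
import random
from fractions import Fraction

def relative_frequency(a: set, b: set, throws: list) -> float:
    """Fraction of the trials in which B obtained that also realized an atom of A."""
    b_trials = [t for t in throws if t in b]
    return sum(1 for t in b_trials if t in a) / len(b_trials)

rng = random.Random(0)  # seeded so the run is reproducible
throws = [rng.randint(1, 6) for _ in range(100_000)]  # fair die assumed

LOW = {1, 2, 3}
A = {1}
freq = relative_frequency(A, LOW, throws)
prob = Fraction(len(A & LOW), len(LOW))  # (P): #A&B / #B
print(round(freq, 3), prob)
```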
Now let us turn to specific conditioning. The problem with merely matching inductive strengths with frequency is that we have no defense against inert atoms. In the case of the die throw, the outcome 1 or 4 will obtain among LOW throws just as often as the simple outcome 1. The atom 4 is inert in the sense that it never arises among LOW throws and makes no contribution to the relative frequency. Thus it makes no contribution to the inductive strength assigned.
To arrive at probabilities, we used relative frequency by itself as our guide in forming inductive strengths. The simple way to defend against these inert atoms is to apply a penalty factor to these relative frequencies. We would then use the penalized relative frequency as our guide in forming these inductive strengths. The penalty for outcome A among outcomes of type B in trials can be defined as
Penalty(A, B) = (number of atoms in A that can be realized in the trials of type B) / (number of atoms in A)
This penalty factor is equal to #A&B/#A.
We now use the penalized relative frequency
Penalized relative frequencyn(A, B) = Relative frequencyn(A, B) x Penalty(A, B)
in place of the simple relative frequency as our guide to forming inductive strengths. The two terms of the formula (SC) match the two terms of this formula for penalized relative frequency. The first term, #A&B/#B will match the long term relative frequency of A among B, as before. The second term #A&B/#A equals the penalty.
Therefore, if we form our inductive strengths according to (SC) in these cases, then, in the long run of very large n, our inductive strengths will match up near enough with the penalized relative frequency just defined.
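The convergence of penalized relative frequency to (SC) can be illustrated with a seeded simulation. The sketch assumes a fair die, LOW = {1, 2, 3}, and the illustrative outcomes 1, 1 or 4, 1 or 4 or 5, and 1 or 4 or 5 or 6. The helpers relative_frequency and penalty are hypothetical names for the quantities defined above, with penalty computed as #A&B/#A.

```python
import random
from fractions import Fraction

def relative_frequency(a: set, b: set, throws: list) -> float:
    """Fraction of the trials in which B obtained that also realized an atom of A."""
    b_trials = [t for t in throws if t in b]
    return sum(1 for t in b_trials if t in a) / len(b_trials)

def penalty(a: set, b: set) -> Fraction:
    """Share of A's atoms that can be realized in trials of type B: #A&B / #A."""
    return Fraction(len(a & b), len(a))

def sc(a: set, b: set) -> Fraction:
    """Specific conditioning strength: (#A&B/#B) . (#A&B/#A)."""
    both = len(a & b)
    return Fraction(both, len(b)) * Fraction(both, len(a))

rng = random.Random(1)  # seeded so the run is reproducible
throws = [rng.randint(1, 6) for _ in range(100_000)]  # fair die assumed
LOW = {1, 2, 3}

for a in [{1}, {1, 4}, {1, 4, 5}, {1, 4, 5, 6}]:
    penalized = relative_frequency(a, LOW, throws) * float(penalty(a, LOW))
    print(sorted(a), round(penalized, 3), sc(a, LOW))
```

The printed penalized frequencies should sit close to 1/3, 1/6, 1/9 and 1/12, with the small gaps due only to sampling.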