Less Evidence is Better
Agnes Bolinska
Lunchtime talk
September 22, 2015
We are now in the fourth week of term. The routine is setting in. We have had talks on evidence, on credit for discoveries, on common origin ideas, and several umbrellas have been offered and accepted. There were unexpected moments. Is that really bacon on the maple glazed donuts?!
Yes, it is.
Today’s talk by Agnes Bolinska is on epistemic representation. The topic of representation has become popular over the last decade or so. During that time, I’ve heard many talks on the topic and even helped a PhD dissertation through to completion on it. It’s an interesting topic, but not one on which I expected to hear much new.
That is how things started. Agnes began by recalling the use of representations as a way of conveying information. The examples were familiar: Suarez’s story of using pen and paper to map ships at sea; the London tube map; Watson and Crick’s geometrical model of DNA.
“I wonder,” my inner voice said, “if Agnes has seen the new, geographically correct map of the London tube system.” Before my inner voice had finished, there it appeared on the screen. “My thanks to Jim,” she said. I presumed she meant Jim Weatherall, who is also a Fellow this term. I scratched one question from my list.
As the material unfolded, I began to pay closer attention. There was a pleasing tightness and fluidity in the argumentation. It was good to see. I surmised that this must be core material from her recently completed dissertation.
Now came the new material that would be the subject of further investigation during Agnes’ Postdoc here in the Center. “What if the phenomena are little known and we want to use our representations as investigative tools?” To answer, Agnes has a scheme. It would be developed in the context of methods used to determine molecular structure in organic chemistry.
The overall idea was safe and simple: we have a large space of possible structures and we will use various approaches to prune it in our search for the right one. Then came the remark that gave me pause. She would use information entropy to analyze the pruning.
I have a few rules of thumb in philosophy of science. One of the most reliable is that almost anything that follows the word “entropy” will be a morass of confusion. This is no idle speculation on my part. In the late 1990s, along with John Earman, I was drawn into the excitement surrounding entropy, information, and Maxwell’s demon. In a grim seminar, we gradually came to the dark realization that the literature was deeply confused. Subsequent research over the next decade and a half confirmed our worst fears. (A visit to my website will provide ample reading, if you really want to see it.)
I settled a little lower in my chair. I’d seen the clouds. The storm must follow. As the talk developed, however, the sun continued to shine. What followed was a lovely, clear explanation of how information entropy can guide the pruning.
Her example illustrates the notion very well. Let us play a game in which you have to find an unknown number between 1 and 128 with questions that can have yes/no answers.
An easy but bad strategy is just to ask:
Is it 1?
Is it 2?
Is it 3?
Is it 4?
and so on.
Do that and you might get lucky. But, more likely, you won’t. On average you will need about 64 questions to find the number.
Here’s a better strategy. Keep dividing the possible number sets in half. Ask:
Is it greater than 64?
and then depending on the answer:
Is it greater than 32?
and then depending on the answer:
Is it greater than 48?
and so on.
Each question halves the possible numbers. They are reduced to
64, 32, 16, 8, 4, 2, 1
It will take just 7 questions to be sure of finding the number.
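The two strategies can be compared by simply counting questions. What follows is a minimal sketch in Python, not anything shown in the talk; the function names are my own, and the one-by-one strategy is assumed to stop as soon as the number is known, so that 127 answers of "no" settle that the number is 128.

```python
def linear_questions(secret, n=128):
    """Ask 'Is it 1?', 'Is it 2?', ... and count questions until the number is known."""
    asked = 0
    for guess in range(1, n):  # the last candidate needs no question of its own
        asked += 1
        if guess == secret:
            return asked
    return asked  # every guess from 1 to n-1 was "no", so the secret must be n


def halving_questions(secret, n=128):
    """Ask 'Is it greater than the midpoint?' and keep the half that survives."""
    lo, hi = 1, n
    asked = 0
    while lo < hi:
        mid = (lo + hi) // 2  # first question for n=128: "Is it greater than 64?"
        asked += 1
        if secret > mid:
            lo = mid + 1
        else:
            hi = mid
    return asked


numbers = range(1, 129)
linear_counts = [linear_questions(s) for s in numbers]
halving_counts = [halving_questions(s) for s in numbers]

print("one by one: average", sum(linear_counts) / 128, "worst", max(linear_counts))
print("halving:    average", sum(halving_counts) / 128, "worst", max(halving_counts))
```

Under these assumptions the halving strategy needs 7 questions for every number, while the one-by-one strategy averages a little over 64 and can need as many as 127.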
The second sort of question is obviously better. In real life examples, we may not be able to choose the questions so tailor-made to the problem. We may only be able to ask “Is the number prime?” or “Is the number divisible by 7?”
How are we to decide which is the better question? We want the one that divides up the space most uniformly. We need technical help to determine which does it most uniformly. That is provided by information entropy. It is, loosely speaking, a measure of how spread out a probability distribution is.
Let’s apply it. Assign probabilities p and q to the “yes” and “no” answers. The information entropy is just
- p log2 p - q log2 q
The most uniform division with p=q=1/2 accrues the greatest information entropy. The best question is the one that gives the most information. Fewer questions of this type are needed to identify the number. Put p=q=1/2 into the formula and you have an entropy of 1 bit. Seven such questions give 7 bits, and 7 is just the minimum number of questions needed to identify the unknown number among 128 possibilities.
A bad question, “Is it 1?”, has p for “yes” of 1/128 and q for “no” of 127/128. The entropy is a meager 0.0659 bits. Subsequent questions like this give successively higher values as the possibilities are pruned, one by one.
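The same formula can be used to rank the less tailor-made questions mentioned earlier. The sketch below is my own illustration, not Agnes' calculation. It assumes that all 128 numbers are equally likely before any question is asked, so that p is just the fraction of numbers answering "yes". The helper names are hypothetical.

```python
from math import log2


def binary_entropy(p):
    """Entropy in bits of a yes/no question whose 'yes' answer has probability p."""
    if p in (0.0, 1.0):
        return 0.0  # a question with a certain answer tells us nothing
    q = 1.0 - p
    return -p * log2(p) - q * log2(q)


def question_entropy(predicate, candidates):
    """Entropy of a question, assuming every candidate is equally likely (an assumption)."""
    yes = sum(1 for x in candidates if predicate(x))
    return binary_entropy(yes / len(candidates))


def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))


candidates = list(range(1, 129))
questions = {
    "Is it 1?": lambda x: x == 1,                    # about 0.0659 bits
    "Is it greater than 64?": lambda x: x > 64,      # exactly 1 bit
    "Is the number prime?": is_prime,
    "Is the number divisible by 7?": lambda x: x % 7 == 0,
}
entropies = {text: question_entropy(pred, candidates) for text, pred in questions.items()}

# List the questions from most to least informative.
for text, h in sorted(entropies.items(), key=lambda item: -item[1]):
    print(f"{h:.4f} bits  {text}")
```

On this uniform assumption, the question about primes comes out ahead of the question about divisibility by 7, since the primes cut the 128 numbers into less lopsided parts. Both fall between the two extremes of “Is it 1?” and “Is it greater than 64?”.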
This example gives the overall strategy in a simple case. In real science, things are messier. Agnes described two ways of ascertaining molecular structure: one way prunes by using the rules of stereochemistry. Another prunes by examining X-Ray diffraction images.
It turns out, as Agnes showed us at some length, that stereochemical pruning is far more effective. One part of the reason is that X-Ray diffraction images are subject to multiple interpretations. We can be mistaken in reading them. Stereochemical rules are less malleable. They get us to the structure faster, which means we need fewer questions and run less risk of misinterpreted evidence leading us astray.
Then came the moment of the talk. We had been led by comfortable and easy reasoning to a conclusion that moments earlier would have seemed absurd. We had accepted it, but we just didn’t know it.
Agnes unveiled the quote from Crick:
“. . . evidence can be unreliable. . . . you should use as little of it as you can.”
A titter ran through the room.
“That always gets a reaction,” Agnes remarked, “and I don’t know why.”
For Agnes, the moment was no longer remarkable or even unexpected. For us, it was the big reveal. The goal is to prune the space rapidly and efficiently. Fewer items of evidence of higher quality do that better. A few moments before we would never have assented to the ensuing maxim: use less evidence.
The talk was over. I congratulated Agnes on a great talk. It was so polished that I was surprised when she said that this was the start of her research on the topic. We turned to questions. They were energetic and sought to open new doors.
Had Agnes made the connection to analogies? Had Agnes seen the more recent work on structure determination? Was Agnes worried that the probabilities in the entropy might suffer from the notorious problem of the priors? Had Agnes looked at complexity issues in search? And more. All too soon, the time was up and Agnes was offered her umbrella.
John D. Norton