Proving History: Bayes's Theorem and the Quest for the Historical Jesus, Richard Carrier 2012:

# ch3

This is a slam-dunk Argument from Silence, establishing beyond any reasonable doubt the nonhistoricity of this solar event (for the logic of all arguments from silence, see chapter 4, page 117). This entails, in turn, that the Gospels, even from the very beginning, contain wildly unbelievable claims of inordinately public events that in fact never occurred, yet were never gainsaid by any of the millions of witnesses who would surely have known better. I'll consider the significance of that fact in my next volume. But here, our focus will be on the logic of the argument.

Theories in history are of two basic kinds: theories of evidence (e.g., how the extant accounts of the Civil War came to exist and survive to the present day), and theories of events (e.g., how that war got started, why Caesar did what he did, why he won, etc.). In other words, historians seek to determine two things: what happened in the past, and why. The more scientifically they do this, the better. And that means the more they attend to the logic of their own arguments, their formal validity and soundness, the better. Historians rarely realize the fact, but all sound history requires answering three difficult questions about any particular theory of evidence or events: (1) If our theory is false, how would we know it? (e.g., what evidence might there then be or should there be?) (2) What's the difference between an accidental agreement of the evidence with our theory, and an agreement produced by our theory actually being true—and how do we tell the two apart? (3) How do we distinguish merely plausible theories from provable ones, or strongly proven theories from weakly proven ones? In other words, when is the evidence clear or abundant enough to warrant believing our theory is actually true, and not just one possibility among many? As in natural science, so in history—I believe Bayes's Theorem is the only valid description of how to correctly answer these questions.6
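The paragraph above invokes Bayes's Theorem without stating it, so for reference here is a minimal sketch of its standard two-hypothesis form (the function name and the numbers in the usage lines are my own illustrative choices, not the book's):

```python
def bayes(prior_h, p_e_given_h, p_e_given_not_h):
    """Standard two-hypothesis Bayes's Theorem:
    P(h|e) = P(h)P(e|h) / [P(h)P(e|h) + P(~h)P(e|~h)]
    """
    prior_not_h = 1 - prior_h
    numerator = prior_h * p_e_given_h
    return numerator / (numerator + prior_not_h * p_e_given_not_h)

# Evidence equally likely on both hypotheses changes nothing:
print(bayes(0.5, 0.8, 0.8))        # 0.5
# A 1% prior, but evidence 100x likelier on h than on ~h:
print(bayes(0.01, 0.50, 0.005))    # ≈ 0.503
```

The second call illustrates question (2) above: evidence only favors a theory to the extent it is *more* expected on that theory than on its rivals.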

All of this entails mathematical thinking. Because as soon as you say x is more than y, you are doing math.
In fact, your thinking is even more mathematically precise than that. When you say something is “probably true,” you mean it has an epistemic probability greater than 50%. Because that's what that sentence literally means. And when you say something is probably false, you mean it has a probability less than 50%. And when you say you don't have any idea whether a claim is probably true or probably false, you mean it has a probability of 50%, because, again, that's what that sentence literally means. Likewise, when you say something is “very probably true,” you certainly don't mean it has a probability of 51%. Or even 60%. You surely mean better than 67%, since anything that has a 1 in 3 chance of being false is not what you would ever consider “very probably true.” And if you say something is “almost certainly true,” you don't mean 67% or even 90%, but surely at least 99%.
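The verbal-to-numeric mapping just described can be written out directly. The cutoffs below (50%, 67%, 99%) are the illustrative ones from this paragraph, not canonical values:

```python
def verbal_label(p):
    """Map an epistemic probability to the paragraph's verbal scale."""
    if p == 0.5:
        return "no idea either way"
    truth = "true" if p > 0.5 else "false"
    q = max(p, 1 - p)          # how far the estimate leans
    if q >= 0.99:
        return "almost certainly " + truth
    if q > 0.67:
        return "very probably " + truth
    return "probably " + truth

print(verbal_label(0.60))   # probably true
print(verbal_label(0.30))   # very probably false
print(verbal_label(0.995))  # almost certainly true
```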
And when you start comparing claims in order of likelihood, you're again thinking numbers. That the earth will continue spinning this summer is vastly more probable than that a local cop will catch a murderer this summer, which is in turn more probable than that it will rain in Los Angeles this summer, which is in turn more probable than that you'll suffer an injury requiring a trip to the hospital this summer. And so on. You certainly don't know what any of these probabilities are. And yet you have some idea of what they are, enough to rank them in just this way, and not merely rank them, but also rank them against known probabilities, because you know there is data on the frequency with which people like you get hospitalized for injuries, the frequency with which it rains in L.A., the frequency with which murderers are caught in your county, even the frequency with which the earth keeps spinning every year (we have data on that extending billions of years back, not just for the earth itself, but for all the phenomena that could stop the earth spinning). Thus even a merely ordinal ranking of likelihoods always translates into some range of probabilities. In fact, because you know each is more likely than the next, and roughly how much more likely, probability ratios are implicit in your ordinal ranking, and as it happens BT can proceed with just these ratios, without ever knowing any of the actual probabilities (as I show on page 284). And yet you will still often know in what ballpark each probability actually lies, because you can often relate them to a well-quantified benchmark, something whose probability you actually know. And when you think about it, you'll agree this knowledge is not completely arbitrary, but entirely reasonable and founded on evidence (such as your own past experience and study of the relevant facts and phenomena). You might never have thought about any of this, but your being unaware of it doesn't make it any less true.
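The claim that BT can proceed on ratios alone (the point credited to page 284) is the odds form of the theorem: posterior odds equal prior odds times the likelihood ratio, so no absolute probability is ever needed, only how much more likely one thing is than another. A sketch with made-up numbers:

```python
def posterior_odds(prior_odds, likelihood_ratio):
    """Odds form of Bayes's Theorem: O(h|e) = O(h) * LR."""
    return prior_odds * likelihood_ratio

def odds_to_prob(odds):
    """Convert odds back to a probability when one is wanted."""
    return odds / (1 + odds)

# Suppose h starts out 4x less likely than ~h (odds 1:4), but the
# evidence is 8x likelier on h than on ~h:
odds = posterior_odds(1 / 4, 8)   # 2.0, i.e., 2:1 in favor of h
print(odds_to_prob(odds))         # ≈ 0.667
```

An ordinal ranking with rough "how much more likely" judgments supplies exactly the ratios this form consumes.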

If there is only one viable hypothesis, all others being crazy alternatives, then the sum of all the latter can become the prior probability of ~h as a catch-all alternative, and a very low probability it will be. But usually there are at least two or three viable hypotheses (or even more) vying for confirmation. Then it's only a matter of deciding what their relative likelihoods are, based on past comparable cases. How often are stories of miraculously darkened suns made up, relative to how often suns actually get blotted out? Even if you don't have other stories of the sun going out, you have comparable cases, such as tales of the moon splitting in two, armies marching in the sky, and crucifixes and Buddhas towering over the clouds. Adding it all up, you get a reference class (a procedure I'll discuss more in chapter 6), in which we find most of the comparable cases are ‘made up’ (or hallucinated or whatever else) rather than ‘actually happened’ (unless we agree that most of those cases are real, but then we must face the consequences of our now believing that giant space Buddhas visit earth and mysterious cloud armies might descend upon us at any moment). “Most” is a numerical assertion, especially in this context, where you certainly don't mean six out of ten such events are real and the other four made up. You will probably be quite confident that no more than one in one hundred or even one in a million of them could have been real. If you settle on the former, you have a prior probability that any such story is real equal to 0.01 and therefore a prior probability that any such story was made up (or merely records an illusion, delusion, or hallucination) equal to 0.99 (because these two options exhaust all possibilities, so we know the odds that one of these possibilities is true is 100%, and 100% – 1% = 99%).
I'll explain later why you might settle on that specific number. For now the point to be made is that priors must be assessed by comparing all the viable hypotheses against each other and deciding how likely each is relative to all the others—not in isolation from them. The biggest mistake amateurs make in determining priors in BT is to confuse the probability of an event happening with the prior probability of a story about that event being true. The physical probability that a giant Buddha will materialize in the sky is certainly astronomically low. But that's not the same thing as the epistemic probability that, when someone claims to have seen a giant Buddha materialize in the sky, they are neither lying nor in error. The priors in BT represent the latter probability, not the former. For only then will the prior probability of ‘actually happened’ and the prior probability of ‘made up’ (or whatever else) add up to exactly 100%, as they must do for any argument to remain logically valid.
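The prior-setting procedure described above (count the reference class; don't reason from the physical improbability of the event) can be sketched like this. The counts are hypothetical stand-ins for "comparable miracle tales" and "at most one in a hundred could be real":

```python
# Hypothetical reference class: 1,000 comparable public-miracle tales,
# of which we would grant at most 10 could be real.
comparable_tales = 1000
at_most_real = 10

prior_real = at_most_real / comparable_tales   # 0.01
prior_made_up = 1 - prior_real                 # 0.99

# The two hypotheses exhaust the possibilities, so the priors must sum to 1,
# exactly as the text requires for the argument to stay logically valid.
assert abs(prior_real + prior_made_up - 1.0) < 1e-12
print(prior_real, prior_made_up)  # 0.01 0.99
```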

And in this account, having a criminal record is eight times more likely on the “untrustworthy” hypothesis than on the “trustworthy” hypothesis, whereas not having a criminal record makes very little difference on either hypothesis. Hence, the absence of a criminal record reduces the consequent for ~h (untrustworthy) by only a tiny amount (and for h, an even tinier amount), whereas the presence of a criminal record reduces the consequent of h (trustworthy) eight times more than the consequent of ~h, the same as if the consequent for ~h were 1 (100 percent) and the consequent of h were 0.125 (merely 12.5 percent), which is a huge difference.
In such a way, evidence can reduce the consequent probability of your hypothesis, and sometimes reduce it greatly, even though that evidence doesn't even contradict your theory—just as having a criminal record and being trustworthy are not mutually contradictory. As long as you lower your consequent to reflect the fact that some of the evidence is less expected on your theory than alternatives, your reasoning will be sound. But if you don't take this into account (and historians who avoid BT often do not), your reasoning will be fatally flawed. Thus using BT will often uncover errors otherwise overlooked, such as ignoring the effect of different degrees of fit between evidence and theory—rather than considering only evidence that directly ‘contradicts’ your theory as counting against it (a mistake too many historians make). Remember, you may already be making this mistake. So you can't avoid it by avoiding BT.
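The criminal-record example can be worked through numerically. The source fixes only the 8:1 ratio for having a record and "very little difference" for lacking one; the specific per-hypothesis probabilities below are my own illustrative choices consistent with that:

```python
def posterior(prior_h, p_e_h, p_e_not_h):
    """P(h|e) by Bayes's Theorem, h vs. ~h."""
    return prior_h * p_e_h / (prior_h * p_e_h + (1 - prior_h) * p_e_not_h)

prior_trustworthy = 0.5  # start with no lean either way

# Having a record: 8x likelier if untrustworthy (0.08 vs. 0.01).
p_trust_given_record = posterior(prior_trustworthy, 0.01, 0.08)
# Lacking a record: nearly the same either way (0.99 vs. 0.92).
p_trust_given_no_record = posterior(prior_trustworthy, 0.99, 0.92)

print(round(p_trust_given_record, 3))     # 0.111 — the record hurts badly
print(round(p_trust_given_no_record, 3))  # 0.518 — its absence barely helps
```

Note that 0.111 is exactly the 0.125 : 1 comparison in the text (0.125 / 1.125 = 1/9): evidence that never "contradicts" trustworthiness still drags it down eightfold.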

This also means you cannot exclude facts of either kind. Abusers of BT often attempt to argue in a vacuum, pretending a great many things we all know (the complete contents of b) aren't known, merely to generate a result they like. If we know a vast number of miracle claims have been established to be fraudulent, erroneous, or inaccurate (and we do), we cannot pretend otherwise. We must take this into account when estimating the prior probability of a genuine miracle. Because even if we are personally certain that some miracles are genuine, we still know for a fact most miracle claims are not. Therefore, we must accept that the prior probability of a miracle claim being true must still be low. Likewise, BT can be abused by excluding facts from e that substantially affect the consequent probabilities. The fact that the early growth of Christianity was exactly comparable (in rate and process) to other religious movements throughout history is a fact in evidence that significantly challenges the claim that Christianity had uniquely convincing evidence of its promises and claims.25 Likewise, the fact that medieval Christians became as depraved and despotic as peoples of any other faith is unlikely on the hypothesis that they had any more divine guidance or wisdom than anyone else has ever had. Similarly, the frequency of admirable new ideas among them was no greater than among many other cultures (who came up with many admirable ideas of their own), so they cannot claim any greater inspiration, either.26

Arguing a fortiori in BT also answers the objection that historians don't have precise data sets of the kind available in the sciences. All probabilities derived from properly accumulated data sets have a confidence level and a margin of error, often expressed in various ways, like “20% +/-3% at 95% confidence,” which means the data mathematically entail there is a 95% chance that the probability falls between 17% and 23% (and, conversely, a 5% chance that the probability is actually higher or lower than that). Widening the margin of error increases the confidence level according to a strict mathematical relationship. This permits subjective estimates to obtain objectively high levels of confidence. If you set the margin of error as far as you can reasonably believe it to be, then the confidence level will be as high as you reasonably require. In other words, “I am certain the probability is at least 10%” could entail such a wide margin of error (e.g., +/-10% on a base estimate of 20%) that your confidence level using that premise must be at least 95% (a confidence level that with most scientific data sets would entail a much narrower margin of error). Again, you may not have the data to determine an exact margin and confidence level, but if you stick with a fortiori estimates, then you are already working with such a wide margin of error that your confidence level must necessarily be correspondingly high (in fact, exactly as high as you need: i.e., if the margin is “as wide as I can reasonably believe” then it necessarily follows that the confidence level will be “as high as ensures my belief is reasonable”). Thus precise data is not needed—unless no definite conclusion can be reached with your a fortiori estimates, in which case you have only two options: get the data you need to be more precise, or accept that there isn't enough data to confidently know whether your theory is true. Both will be familiar outcomes to an experienced historian.
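The "strict mathematical relationship" between margin of error and confidence level is fixed for a normal sampling distribution, which Python's standard library can illustrate (the 20% estimate and the sample size are made up for the demonstration):

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.20, 400            # hypothetical estimate and sample size
se = sqrt(p * (1 - p) / n)  # standard error of the proportion

def margin(confidence):
    """Two-sided margin of error at the given confidence level."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return z * se

# Widening the margin raises the confidence level, and vice versa:
print(round(margin(0.95), 3))  # 0.039  (+/- ~4% at 95% confidence)
print(round(margin(0.99), 3))  # 0.052  (+/- ~5% at 99% confidence)
```

Arguing a fortiori amounts to choosing a margin so wide that the corresponding confidence level is as high as you need it to be.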
Indeed, “not knowing” is an especially common end result in the field of ancient history. BT indicates such agnosticism when it gives a result of exactly 0.5, or near enough as to make little difference in our confidence. BT also indicates agnosticism when a result using both margins of error spans the 50% mark. If you assign what you can defend to be the maximum and minimum probabilities for each variable in BT, you will get a conclusion that likewise spans a minimum and maximum. Such a result might be, for example, “45% to 60%,” which would indicate that you don't know whether the probability is 45% (and hence, “probably false”) or 60% (and hence, “probably true”) or anywhere in between. Such a result would indicate agnosticism is warranted, with a very slight lean toward “true”: not only is the probability of the claim being false at most 55% (when we would want well over 90% to confidently call it false), but the portion of this result's margin falling in the “false” range (45% to 50%) is only half the portion falling in the “true” range (50% to 60%). Since reducing the confidence level would narrow the error margin, a lower confidence would thus move the result entirely into the “true” range—but it would still be a very low probability (e.g., a 52% chance of being true hardly instills much confidence), at a very low confidence level (certainly lower than warrants our confidence).
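Running BT with a fortiori bounds looks like this. Because the posterior rises with the prior and with the likelihood on h, and falls with the likelihood on ~h, the extreme inputs yield the extreme outputs. The bounds below are invented to produce a result that straddles 50%, the agnostic case just described:

```python
def posterior(prior, p_e_h, p_e_not_h):
    """P(h|e) by Bayes's Theorem, h vs. ~h."""
    return prior * p_e_h / (prior * p_e_h + (1 - prior) * p_e_not_h)

# A fortiori bounds on each input (hypothetical):
prior_lo, prior_hi = 0.50, 0.60
like_h_lo, like_h_hi = 0.40, 0.50
like_not_lo, like_not_hi = 0.40, 0.50

# Worst case for h: lowest prior and likelihood on h, highest on ~h.
lo = posterior(prior_lo, like_h_lo, like_not_hi)
# Best case for h: the reverse.
hi = posterior(prior_hi, like_h_hi, like_not_lo)

print(round(lo, 2), round(hi, 2))  # 0.44 0.65 — spans 50%: agnosticism
```

When such an interval falls entirely above (or below) 50%, a definite conclusion survives even the most generous concessions; when it straddles 50%, as here, BT tells you the honest answer is "we don't know."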

One might still object not to BT's applicability, but to its utility. That is, we can acknowledge that BT is valid and that BT arguments in history are sound, while still claiming that BT adds no value to already-existing methods, or even makes things harder on the historian than they need to be, in terms of both learning curve and time-in-use. But this objection will be more than amply met in the following chapters. In general, there are three points to make. First, insofar as historians are doing their job as professionals, they should already be devoting considerable time to mastering the relevant methodologies. And yet learning BT is no more time-consuming than learning existing methods of historical reasoning, and as existing methods are significantly flawed (as will be shown in coming chapters), the time spent learning the latter can be redirected toward learning the former (or rather, new versions of the latter that have been reformulated in terms of the former), resulting in no net loss in learning time. Second, insofar as historians are not spending time learning logical methods of analysis and reasoning, they ought to be, otherwise they cannot claim to be professionals. Complaining that learning how to think properly is too time-consuming is not a complaint any honest professional can utter in good conscience. Third, when it comes to time-in-use, in most historical argumentation BT does not have to be very complicated or time-consuming at all, except in precisely the ways all sound and thoughtful research and argument must already be. And in the few cases where BT arguments need to be more complicated, this will be precisely because no other methods exist to manage those cases. Because if they did, and were logically valid, then they would already be covertly Bayesian anyway (as I'll demonstrate for a number of cases in chapter 4). When a problem is complicated, it will always be complicated no matter what tools you use to analyze it. Attempts to avoid that fact can only result in lazy or unsound thinking, and that is certainly not a good excuse to avoid BT.

Danica McKellar (a beautiful actress who became a published mathematician in defiance of stereotype) has also begun a series of books to help the novice get the point that math is important and not as hard as your lousy high school teachers led you to believe: Math Doesn't Suck: How to Survive Middle-School Math without Losing Your Mind or Breaking a Nail (New York: Hudson Street Press, 2007), Kiss My Math: Showing Pre-Algebra Who's Boss (New York: Hudson Street Press, 2008), and Hot X: Algebra Exposed (New York: Hudson Street Press, 2010). Expect more to come. Though aimed at girls, they are just as useful to boys (being equally comprehensible and entertaining to either gender), and though marketed as being for ‘middle schoolers’ (through high school—the three books cover grades six through ten in the American school system), they are not too dumb for adults. When I was at a seminar on this topic (of applying math to history), an actual university professor asked me how you multiply percentages (because “100% × 80% would be 800% and what kind of result is that!?”); I then realized everyone should be reading McKellar. She is not a celebrity poser, by the way, but a real mathematician. The Chayes-McKellar-Winn Theorem is based on her published work: L. Chayes, D. McKellar, and B. Winn, “Percolation and Gibbs States Multiplicity for Ferromagnetic Ashkin-Teller Models on Z²,” Journal of Physics A: Mathematical and General 31, no. 45 (1998): 9055.
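For what it's worth, the professor's confusion dissolves once percentages are converted to fractions before multiplying:

```python
# "100% × 80%" means 1.00 * 0.80, not 100 * 80.
result = 1.00 * 0.80
print(f"{result:.0%}")  # 80%
```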