Proving History: Bayes's Theorem and the Quest for the Historical Jesus, Richard Carrier 2012:

# ch6

There will remain occasions when you will have access to information the other can never access (usually private unshared experiences), in which case you will each get a different result from Bayes's Theorem. But since BT only produces a conditional probability (it demonstrates what your conclusion should be given what you know), disagreements in this case will be acceptable to both parties. Once all other disagreements are resolved in the manner described above, Party A will agree that Party B's conclusion should in fact be exactly what B finds it to be, given the information available to B, and Party B will agree that Party A's conclusion should in fact be exactly what A finds it to be, given the information available to A. In other words, they will actually agree they must disagree, and in exactly the way determined by their different results with Bayes's Theorem, precisely because they each have access to information the other cannot confirm. Each will thus agree the other's position is entirely rational (provided they've been sincere and are not insane), and therefore their disagreement is entirely appropriate. This latter condition does not support claims of epistemic relativism, however, since there is still a single objective fact of the matter (one or both parties are still wrong); it's just that to one (or both) of them the required information is unavailable and we must all work from what we know. The films Contact and (the original) Journey to the Center of the Earth each present clear (though far-fetched) examples of entirely valid instances of just such a condition, where one party validly knows the truth but cannot expect anyone else to agree with them. And in such cases the appropriate attitude of everyone else should be the same: that the party making the claim cannot be expected to disavow their conclusions, provided they in turn accept that others cannot be expected to share those conclusions.
...Hence the reason public or replicable data is so important to professional history (per my first axiom in chapter 2, page 20) is that it allows us to personally observe the same data (thus bypassing the need to trust more people than we have to), and the reason expert consensus is so important (per my second axiom in chapter 2, page 21) is that when the competent reporting witnesses are extremely numerous (e.g., a whole community with considerable training, mutually policing effective standards), the probability of mass error or deceptive collusion becomes extremely small. I'll revisit this point briefly later, and I discussed a few examples in chapters 2 and 3, but I won't analyze when and why to trust experts here. That will already become part of the information-sharing dialogue between disagreeing parties. And such dialogue almost invariably creates agreement. Rationally justified disagreement among well-informed parties is comparatively rare.

Resolving such disagreement requires exploring why each party derives the probability they do, and why they differ despite deriving it from the exact same information. One can do this by identifying determinable probabilities that can be connected to the probabilities being estimated and asking why deviations obtain. For example, if two historians disagree on how frequently bodies were stolen from graves in antiquity, at least one undeniable limit can be established: a maximum number of bodies available to be stolen in a given year can be agreed upon (which, let's say, archaeology can confirm can't have been more than 1,000,000 for any particular graveyard), and if both parties also agree at least one of those bodies would be stolen each year, you have a definite minimum frequency (one in a million per year), and if one party's estimate was lower than that, they must now agree to revise it. Thus at least some kind of minimum can be arrived at. Then you can approach the matter from the other side: if one party insists the frequency cannot be as high as, say, 1 out of every 1,000 bodies, yet this is their opponent's a fortiori maximum, their opponent must ask why they conclude the rate can't have been that high. If they can give no valid reason, then their objection is without foundation—they must then agree the rate could have been that high, as they know of no valid evidence it was lower. Now with a working maximum and minimum, a calculation can be made. And sometimes only the maximum matters to the actual argument being made: for example, if it is argued “so far as you know, 1 in 1,000 bodies were stolen in any given year,” then any conclusion that follows from this can also be argued to hold “so far as you know” (because any qualifiers in the premises will commute to the conclusion). Thus the conclusion “so far as you know P(h|e.b) = x” would have to be accepted by both parties (otherwise one of them is rejecting sound logic). In this dialogue, all relevant evidence could be adduced regarding the frequency of bodysnatching (e.g., from laws passed, cases recorded, etc.) and similarly debated in respect to the minimum and maximum rates that would explain all that evidence, and if disagreements persist even there, the same debate can surround why. Finally, both parties can discuss what further inquiry (by collecting more information) might change their minds—and if that inquiry is possible, the prescribed research can be completed and the issue revisited.
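To make that dialogue concrete, here is a minimal sketch, not from the book and with invented likelihoods, of how an agreed minimum and an a fortiori maximum for the theft rate translate into bounds on a posterior that both parties must then accept “so far as they know”:

```python
# A minimal sketch (hypothetical figures) of the min/max dialogue above.
# Both parties agree the yearly theft rate lies between an agreed floor and an
# a fortiori ceiling; any posterior computed from those bounds then holds
# "so far as we know."

def posterior(prior, p_e_given_h, p_e_given_not_h):
    """Bayes's Theorem: P(h|e.b) = P(h|b)P(e|h.b) / [P(h|b)P(e|h.b) + P(~h|b)P(e|~h.b)]."""
    numerator = prior * p_e_given_h
    return numerator / (numerator + (1 - prior) * p_e_given_not_h)

rate_min = 1 / 1_000_000   # at least one body stolen per million available per year
rate_max = 1 / 1_000       # neither party can show the rate was higher than this

# Suppose (purely for illustration) the theft rate serves as the prior for
# "this particular tomb was robbed," and both parties agree on the likelihoods.
p_e_given_h, p_e_given_not_h = 0.9, 0.01

low = posterior(rate_min, p_e_given_h, p_e_given_not_h)
high = posterior(rate_max, p_e_given_h, p_e_given_not_h)
print(f"So far as we know, P(h|e.b) lies between {low:.6f} and {high:.4f}")
```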

Other concerns about contingency are already resolved by probability theory. For example, sometimes it's claimed that the probability of life arising on earth is very small, whereas if it was by design, the odds would be very high, creating such an enormous disparity in consequent probabilities that unless you have a wildly outrageous bias against the existence of a Creator (resulting in an extraordinarily large disparity in the priors against it), BT entails life was created by intelligent design. But there are two fallacies in this argument. The first is of invalidly predicting e from h, when in fact from the hypothesis “God exists” it isn't possible to deduce the prediction ‘simple, single-celled carbon-coded life forms would arise on just this one planet out of trillions, and only billions of years after the universe formed, which would only slowly evolve into humans after billions of years more’ etc. Thus P(e|GOD.b) in this instance is not ‘very high.’ In fact, arguably it's extraordinarily low, even before adding any background knowledge that renders such divine beings improbable in and of themselves.13 The second fallacy, however, is a common mistake in reasoning about probability: the odds of life forming by chance are not the odds of life forming by chance specifically here on earth, but the odds of life forming by chance on some planet somewhere in the whole of the known universe.14 Because, obviously, wherever that happens to be will become “specifically here” for whoever ends up evolving on that planet to think about it. It's the difference between you winning the lottery (which is very improbable) and someone winning the lottery (which is very probable). You are reasoning fallaciously if, after winning, you conclude the lottery must be rigged simply because your winning was so very improbable. Because someone was likely to win, and that someone was as likely to be you as anyone else playing. Hence, in fact, the number of planets and years available are such that, where L = ‘life as we observe it to be’ and U = ‘the universe as we observe it to be,’ P(L|U)→1. And since (as suggested earlier), P(L|GOD)→0, the consequent probabilities are in fact exactly the reverse of what was thought, such that even if P(GOD|b) were high (and it's not), life still probably wasn't created by intelligent design.15
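The lottery point can be illustrated with a toy calculation; the per-planet odds and planet count below are invented placeholders, not estimates from the book:

```python
# A toy illustration (numbers invented) of the lottery point above: the
# relevant chance is not "life arises on this one planet" but "life arises
# somewhere among all the planets available."

import math

p_per_planet = 1e-18   # hypothetical chance of life arising by chance on any one planet
n_planets = 1e22       # rough order-of-magnitude count of planets in the observable universe

# P(life arises somewhere) = 1 - (1 - p)^N; log1p keeps the tiny per-planet
# chance from being rounded away in floating point.
p_somewhere = 1 - math.exp(n_planets * math.log1p(-p_per_planet))

print(f"Chance on this one specific planet: {p_per_planet:.1e}")  # vanishingly small
print(f"Chance on some planet somewhere:    {p_somewhere:.4f}")   # effectively certain
```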
The relevance of this to history is that the same kind of fallacious arguments can arise if you do not attend to the correct probabilities. For example, you cannot argue that Alexander the Great assassinated his father Philip on the grounds that the odds of that assassination happening by chance are small, while the odds of it happening “if Alexander did it” approach certainty. To begin with, such coincidences happen all the time (often kings are assassinated who just by chance have sons or successors who will benefit; indeed, this is probably true in most cases)—so the probability that this is one of those coincidences is actually high, not low (I'll discuss this phenomenon using a poker analogy on page 254). But more importantly, in Bayesian analysis this doesn't even become an issue because ~h would have to be accounted for, in which we would list a number of known persons who had the same motive (and that's assuming we can leave out of account the many unknown persons who would also have motive), and the prior probability for each being the culprit would have to be the same (assuming we have no other evidence implicating Alexander, or anyone else), and, crucially, the consequent probability would be the same for all of them. That is, “Philip gets assassinated” is 100 percent certain on any “x did it” hypothesis. So Alexander is no more likely to be the culprit than anyone else. In other words, it's fallacious from the start to assume the hypothesis “Alexander did it” is competing against “chance” (as if random quantum events caused kings to be assassinated). Rather, it's competing against other assassins, for every one of whom “the odds of Philip getting assassinated” are 100 percent. Of course, if we have other evidence, then e is not just “Philip got assassinated” but the conjunction of all that evidence, which could implicate someone specific. Or if there were no other known suspects (or the only known suspects are actually only known from Alexander claiming they are suspects), the prior probability could favor Alexander. If a study of royal assassinations found that, statistically, sons more often turned out to be the culprit, or that, when there was only one known suspect, more often than not that suspect turned out to be the culprit, such data could be used to alter the priors (if all the contexts are sufficiently similar—see my following discussion of using reference classes to assign priors).
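A minimal numerical sketch of this point (the suspect list and the equal priors are purely illustrative): when every “x did it” hypothesis predicts the assassination with certainty, the likelihoods cancel and each posterior simply equals its prior:

```python
# Illustrative only: equal priors over a hypothetical suspect list, and a
# likelihood of 1 for "Philip was assassinated" on every "x did it" hypothesis,
# so no suspect gains anything from the bare fact of the assassination.

suspects = ["Alexander", "Pausanias", "a Persian agent", "some unknown courtier"]

priors = {s: 1 / len(suspects) for s in suspects}   # no other evidence: equal priors
likelihood = {s: 1.0 for s in suspects}             # P("Philip assassinated" | x did it) = 1 for every x

total = sum(priors[s] * likelihood[s] for s in suspects)
posteriors = {s: priors[s] * likelihood[s] / total for s in suspects}

for s, p in posteriors.items():
    print(f"P({s} did it | Philip was assassinated) = {p:.2f}")  # all 0.25
```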

I suspect many critics by now have been chomping at the bit in protest of my cavalier assumption that epistemic probabilities in Bayes's Theorem are really just actual frequencies of things, and thus really physical probabilities after all—since I have evinced this assumption throughout this chapter and most of this book. I've given some hints already as to why that assumption is in fact valid, and as to how we convert such frequencies into seemingly unrelated things like “degrees of belief,” but I'll take that up in the rest of this chapter (especially in the last section, to which the next section is preliminary). In closing here, one final point deserves mention: even if we err in choosing a reference class (for priors) or estimating causal frequency (for consequents), this is no different than any other error in empirical reasoning. Until the error is identified and corrected, or shown to be uncorrectable, we have sufficient reason to believe what our analysis tells us. And we can be corrected, and will thus change our minds, if that error is indeed exposed by critics and then corrected, either by them or our own renewed inquiry and analysis.
If we can legitimately narrow the reference class, or are compelled by the logic of the situation to broaden it, we would simply recalculate our conclusion accordingly once we've been given new information not previously available to us. So, too, the solution to all other difficulties that arise in applying BT. And as we are discussing conclusions in history and not science, per my discussion in chapter 3 of the difference in degree between those two enterprises (page 45), as long as you follow all the rules and advice above and throughout this book, a Bayesian analysis will show you what you should believe given what you know, and since most of what you know (most of what's in b and e) does not rest on scientific certainty, neither can any conclusion you reach via BT. But scientific certainty is not required to warrant ordinary belief. What is required is that whatever degree of certainty you settle upon, it be based on a well-informed and logically valid analysis.

What are probabilities really probabilities of? Mathematicians and philosophers have long debated the question. Suppose we have a die with four sides (a tetrahedron), its geometry is perfect, and we toss it in a perfectly randomizing way. From the stated facts we can predict that it has a 1 in 4 chance of coming up a ‘4’ based on the geometry of the die, the laws of physics, and the previously “proven” randomizing effects of the way it will be tossed (and where). This could even be demonstrated with a deductive syllogism (such that from the stated premises, the conclusion necessarily follows). Yet this is still a physical probability. So in principle we can connect logical truths with empirical truths. The difference is that empirically we don't always know what all the premises are, or when or whether they apply (e.g., no die's geometry is ever perfect; we don't know if the die-thrower may have arranged a scheme to cheat; and countless other things we might never think of).30 That's why we can't prove facts from the armchair.
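A short simulation, an idealized toy that assumes the perfect geometry and perfect randomization stipulated above, shows the deduced 1 in 4 chance emerging as an observed frequency:

```python
# A toy simulation of the idealized tetrahedral die above: the deduced 1-in-4
# chance of a '4' shows up as an observed frequency once enough tosses are made.

import random

random.seed(0)
for n in (100, 10_000, 1_000_000):
    fours = sum(1 for _ in range(n) if random.randint(1, 4) == 4)
    print(f"{n:>9,} tosses: observed frequency of '4' = {fours / n:.4f}")  # approaches 0.25
```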
Nevertheless, Archimedes was able to prove the existence and operation of mathematical laws of physics purely from deductive logic—and he was right (he thus derived the basic laws of leverage and buoyancy). But he was only right because the premises on which his syllogisms depended were empirically confirmed to his satisfaction—at least in those conditions he restricted his laws to. We now know there are many factors that can alter or negate his premises and, therefore, to be more broadly applicable, his laws had to be considerably revised and expanded.31 He could not have deduced the world would turn out that way. But he could have speculated it would and then correctly deduced what laws of physics would then follow. And he was aware of this. For example, he knew the curvature of the earth complicated his premises for determining the laws of hydrostatics (since it meant the surface of a tub of water would not be a flat plane but a rounded convex shape), so he proved that that curvature was so slight it could be safely ignored for his purposes (i.e., he could assume the surface of a tub of water is flat). Only if someone demanded a certain (he might say absurd) level of precision would that curvature have to be reintroduced and accounted for.

Indeed, some historical events literally happen only once, because the conditions required converged only once. Yet we need to know with what frequency such a conjunction of causes will produce that effect. Returning to our previous example of Matthias the mechanic, suppose we had no evidence of anyone getting rich as an industrial mechanic in antiquity. That does not mean no one did. Because if it was rare, we can expect (to a very high probability) that we would have no evidence of it (see my earlier discussion of lost evidence, page 219). Meanwhile, all the elements required for it to have happened are well attested as operating in that context (and there is no evidence they were ever mutually exclusive), so their conjunction must have had an actual frequency. Indeed, this is so even if there were no rich mechanics. Just as a die that is never rolled nevertheless has a discernible probability of coming up ‘4’ if it ever is rolled, so, too, the conjunction of conditions required to produce a rich mechanic in antiquity will have some probability even if, by chance, that conjunction never occurred. And when we actually are faced with evidence of such a conjunction (which is always a logical possibility), we are certainly required to assess the prior probability of that conjunction, even in the absence of any prior examples, because if we know anything, we know it can't be zero, and is unlikely to be vanishingly small (since, after all, we're not talking about transmuting lead to gold). And as it happens, in this case we have been faced with evidence of just such a conjunction, by discovering direct evidence of a wealthy Roman industrial mechanic who made his fortune in the Middle East.36
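One way to formalize this point, though not a method prescribed in the book, is Laplace's rule of succession, which assigns a small but nonzero probability to an event even when the available sample contains no instances of it (the sample size below is hypothetical):

```python
# Laplace's rule of succession: one conventional way (not the book's stated
# method) of assigning a nonzero probability from zero observed cases.

def rule_of_succession(successes, trials):
    """Estimated probability of the event on the next trial: (s + 1) / (n + 2)."""
    return (successes + 1) / (trials + 2)

# Hypothetical: among 1,000 reasonably documented ancient mechanics we find
# no rich ones. The estimated chance any given mechanic got rich is still nonzero.
print(rule_of_succession(0, 1_000))   # ~0.000998: small, but not zero
```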

So when Bayesians argue that probabilities in BT represent estimates of personal confidence and not actual frequencies, they are simply wrong. Because an ‘estimate of personal confidence’ is still a frequency: the frequency with which beliefs based on that kind of evidence turn out to be true (or false). As Faris says of Jaynes (who in life was a prominent Bayesian), “Jaynes considers the frequency interpretation of probability as far too limiting. Instead, probability should be interpreted as an indication of a state of knowledge or strength of evidence or amount of information within the context of inductive reasoning.”40 But “an indication of a state of knowledge” is a frequency: the frequency with which beliefs in that state will actually be true, such that a 0.9 means 1 out of every 10 beliefs achieving that state of knowledge will actually be false (so of all the beliefs you have that are in that state, 1 in 10 are false, you just won't know which ones). This is true all the way down the line. To say “I am 99% confident that x will happen roughly 80% of the time” is to assert a confidence level, and a confidence level is a mathematically defined state (it follows necessarily from the deductive truths of randomized sets and some relevant physical frequencies), thus what you are saying is that the frequency of all beliefs in the same mathematically defined state that are true is 99 in 100. And of course the 80% in which you have this confidence is a straightforward physical frequency.
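A toy calibration check, an invented setup rather than an argument from the book, makes the identity concrete: of all beliefs held in a well-calibrated 0.9 state of knowledge, about 9 in 10 turn out true:

```python
# Toy calibration check: if "0.9" is a well-calibrated state of knowledge,
# the observed frequency of true beliefs among all beliefs held in that state
# recovers the number 0.9.

import random

random.seed(1)
confidence = 0.9       # the state of knowledge being tested
n_beliefs = 100_000    # hypothetical collection of beliefs all held at that confidence

true_count = sum(1 for _ in range(n_beliefs) if random.random() < confidence)
print(f"Fraction of 0.9-state beliefs that are true: {true_count / n_beliefs:.3f}")  # ~0.900
```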
This is why every claim has some small prior probability. If you are 99% confident that P(h|b) = 0, that amounts to saying there is a 1% chance that P(h|b) ≠ 0, so the a fortiori P(h|b) must be something more than 0. And in fact, since it's never strictly correct to say we're 99% confident that P(h|b) = 0, but rather we must say that P(h|b) = 0 +/-0.01 (or whatever confidence interval we intend at that stated confidence level—see chapter 3, page 87), we must always accept the possibility that the actual value is at the upper end of that error margin (since we can only narrow it further by reducing our confidence level below 99%). Therefore, since we are never 100% confident,41 nearly every logical possibility has a nonzero prior (hence my fourth axiom in chapter 2, page 23).
So perhaps we could say Jaynes is confusing confidence level with the frequencies in which we have confidence. Yet both are still just frequencies, confidence level being the frequency with which such confidence intervals correctly describe a physical frequency, and those confidence intervals describing a physical frequency (of some kind of event or correlation). If a frequentist validly determines that the frequency of x in a given population (in other words, a given reference class) has a 99% chance of falling somewhere between 20% and 30% (e.g., a frequency of 25% +/-5%), then a Bayesian must agree. That is, they must admit that their confidence that some new instance in that same reference class will have property x cannot be higher than 30% or lower than 20%, except 1% of the time, because (as the frequentist will have shown) the evidence can support no other conclusion. The former is their confidence interval; the latter, their confidence level. And when running a BT analysis in history, it's unwise to proceed with anything but a very high confidence level (well above 99%, high enough in fact that it shall never have to be stated), which we can always ensure by using a fortiori reasoning (as explained in chapter 3, page 85). Because the confidence levels for each probability assigned in a BT formula must commute to the conclusion. The mathematics of this can be quite complex, but as long as you keep your confidence levels very high, there will be no need to run any calculations—the confidence level that will always commute to the conclusion will then be the highest attainable for you (because you will only have chosen probabilities to include in your analysis that you can assert with the highest confidence you can attain). And that's all the certainty you need (or at least, all you can ever have, given the data available to you).
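A minimal sketch of this a fortiori procedure, with all intervals below invented for illustration: assign each probability the end of its agreed interval least favorable to your conclusion, and the posterior computed from those values is a bound that commutes to the conclusion:

```python
# A fortiori reasoning with intervals (hypothetical numbers): take the end of
# each agreed interval least favorable to h, and the resulting posterior is a
# lower bound both parties can accept at the inputs' confidence level.

def posterior(prior, p_e_given_h, p_e_given_not_h):
    numerator = prior * p_e_given_h
    return numerator / (numerator + (1 - prior) * p_e_given_not_h)

# Intervals each party concedes (purely illustrative):
prior_low, prior_high = 0.20, 0.40
likelihood_low, likelihood_high = 0.60, 0.90
alt_likelihood_low, alt_likelihood_high = 0.10, 0.30

# Argue a fortiori AGAINST h: lowest prior and likelihood for h, highest for ~h.
worst_case = posterior(prior_low, likelihood_low, alt_likelihood_high)
print(f"Even arguing a fortiori against h, P(h|e.b) >= {worst_case:.2f}")
```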

Bayesians aren't the only ones who can be confused about this. Historians might need help understanding it, too. In our personal correspondence, C. B. McCullagh observed that to apply BT to questions in history
    the hypothetical event has to be considered as a generic type, similar in some respect to others. That might worry historians, whose hypotheses are so often quite particular. For instance, consider how the hypothesis that Henry planned to kill William II in order to seize his throne explains the fact that after his death Henry quickly seized the royal treasure. The relation between these events is rational, not a matter of frequency.44
But, in fact, if the connection alleged is rational, then by definition it is a matter of frequency, entailed by a hypothetical reference class of comparable scenarios. To say it is rational is thus identical to saying that in any set of relevantly similar circumstances, most by far will exhibit the same relation. If we didn't believe that (if we had no certainty that that relation would frequently obtain in any other relevantly similar circumstances), then the proposed inference wouldn't be rational. Explaining why confirms the point that all epistemic probabilities are approximations of physical frequencies.
The evidence in this case is that Henry not only seized the royal treasure with unusual rapidity, but that his succeeding at this would have required considerable preparations before William's death, and such preparations entail foreknowledge of that death. Already to say Henry seized the royal treasure “with unusual rapidity” is a plain statement of frequency, for unusual = infrequent, and this statement of frequency is either well-founded or else irrational to maintain. And if that frequency is irrational to maintain, we are not warranted in saying anything was unusual about it. Likewise, saying “it would have required considerable preparations” amounts to saying that in any hypothetical set of scenarios in all other respects identical, successful acquisition of the treasure so quickly will be infrequent, and thus improbable, unless prior preparations had been made (in fact, if it is claimed such success would have been impossible without those preparations, that amounts to saying no member of the reference class will contain a successful outcome except members that include preparations). Again, the result is said to be unusual without such preparations, or even impossible; and unusual = infrequent, while impossible = a frequency of zero. Hence such a claim to frequency must already be defensible or it must be abandoned. Similarly for every other inference: making preparations in advance of an unexpected death is inherently improbable for anyone not privy to a conspiracy to arrange that death, and being privy to such a conspiracy is improbable for anyone not actually part of that conspiracy, and in each case we have again a frequency: we are literally saying that in all cases of foreknowing an otherwise unpredicted death, most of those cases will involve prior knowledge of a planned murder, and in all cases of having foreknowledge of a planned murder, few will involve people not part of that plan. If those frequency statements are unsustainable, so are the inferences that depend on them. And so on down the line.
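A hedged sketch of how such frequency claims would enter a Bayesian analysis (the numbers are invented, not McCullagh's or the book's): “infrequent without preparations” becomes a low P(e|~h), “expected with preparations” a high P(e|h), and the ratio of the two carries the evidential weight:

```python
# Illustrative only: translating the frequency claims above into likelihoods
# and a Bayes factor for "Henry had made preparations before William's death."

p_e_given_prepared = 0.8    # hypothetical frequency of so rapid a seizure given advance preparations
p_e_given_unprepared = 0.05 # hypothetical frequency of so rapid a seizure without them

bayes_factor = p_e_given_prepared / p_e_given_unprepared
print(f"The rapid seizure favors 'Henry had prepared' by a factor of {bayes_factor:.0f} to 1")
```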

For example, the prior probability that Jesus was raised from the dead by a supernatural agency is the same as the prior probability that a supernatural agency raised Romulus from the dead, or Asclepius, or Zalmoxis, or Inanna, or Lazarus, or the “many Saints” of Matthew 27:52–53, or “the Moabite” of 2 Kings 13:20–21, and so on.23 [23. Indeed the number of persons claimed to have been thus raised in antiquity is well more than two dozen: see NIF, pp. 85–127. And those are just the ones we know about.]

47. Although it's worth noting his [Ptolemy's] geocentric model was also very predictively successful, only failing after a very long time (with the notable exception of apparent diameters, which never conformed to the theory, past or future).