As so often with XKCD cartoons, this one is funny, but it also points to a crucial problem behind a lot of sensationalized pop-science headlines, as well as a lot of pseudoscience. In case you don't get the joke, or don't know about the problem, let me spell it out.
Here's the deal. In the 1920s, the statistician Ronald Fisher popularized the 0.05 significance level as the threshold to shoot for in studies. The idea was that below this threshold a result would be called statistically significant.
Hence the (P < 0.05) in the green jelly bean caption below, which means, roughly, less than a 0.05 or 5% probability that the result is just a lucky random fluke.
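Roughly speaking, that threshold means: if there is truly no effect at all, the test will still cry "significant!" about 5% of the time. Here's a minimal sketch in plain Python (hypothetical group sizes and acne rates, not from the comic) that simulates many jelly-bean-style trials where nothing real is going on, and counts how often a standard test fires anyway:

```python
import math
import random

random.seed(42)

def two_proportion_p(successes_a, successes_b, n):
    """Two-sided p-value for a difference in proportions (normal approximation)."""
    p_a, p_b = successes_a / n, successes_b / n
    pooled = (successes_a + successes_b) / (2 * n)
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    if se == 0:
        return 1.0
    z = (p_a - p_b) / se
    return math.erfc(abs(z) / math.sqrt(2))

n, base_rate, runs = 300, 0.2, 4000
false_positives = 0
for _ in range(runs):
    # Both groups have the SAME acne rate, so any "significant"
    # difference between them is a pure fluke.
    jelly = sum(random.random() < base_rate for _ in range(n))
    control = sum(random.random() < base_rate for _ in range(n))
    if two_proportion_p(jelly, control, n) < 0.05:
        false_positives += 1

print(false_positives / runs)  # hovers close to 0.05
```

The false-positive rate comes out right around 0.05, which is exactly what the threshold promises for a single, honest test.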
The standard was arbitrary, set there because it was easy to calculate with a slide-rule at the time. But it became the industry standard, so if you want to publish a psychology study, you want to demonstrate that your results are statistically significant. In itself, this is reasonable.
But look what happens in the cartoon below. This is known as significance chasing
or data fishing.
The researchers find no correlation between eating jelly beans and getting acne. But then they say, hey, maybe it's just one color, so they test each color separately. Sure enough, after testing lots of colors, the green ones show a correlation. They publish, claiming statistical significance with 95% confidence.
What's wrong with this scenario? Everything. You see, the (P < .05) threshold means that a pure fluke will pass the test about 1 out of 20 times. So if you test twenty colors, any math geek will tell you, you should expect about one of them to randomly show a false correlation, because that's roughly what a 5% false-positive rate delivers over twenty tries. So they've stacked the deck and concluded nonsense. It's nonsense because the 'green' is arbitrary: taking all the data as a whole, no correlation was found.
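That deck-stacking arithmetic takes two lines to check. With twenty colors each tested at the 0.05 level (the comic's setup), the expected number of flukes is one, and the chance of at least one fluke is about 64%:

```python
# 20 independent tests, each with a 5% false-positive rate under the null.
n_colors, alpha = 20, 0.05

expected_false_positives = n_colors * alpha        # 20 * 0.05 = 1.0
p_at_least_one = 1 - (1 - alpha) ** n_colors       # 1 - 0.95**20

print(round(expected_false_positives, 2))  # 1.0
print(round(p_at_least_one, 3))            # 0.642
```

In other words, the jelly bean lab was more likely than not to "discover" a significant color before anyone ate a single bean.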
They could check this by re-testing green jelly beans many more times, but they don't. They could publish all the results, including the overwhelming number of jelly bean trials with no correlation, but they don't. Instead they proclaim a sensational result that is "statistically significant" and quietly shelve all the data that showed no significance. The irony is that results of no significance are very significant!
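Both failures, the missing re-test and the missing full publication, are easy to watch in a quick simulation. A minimal sketch, assuming only the standard fact that under a true null hypothesis a well-calibrated p-value is uniform on [0, 1]: go fishing across twenty colors, then honestly re-test whichever color "won":

```python
import random

random.seed(1)

def null_p_value():
    # When there is truly no effect, a well-calibrated p-value is
    # uniformly distributed between 0 and 1.
    return random.random()

runs = 10000
fished_hits = retest_hits = 0
for _ in range(runs):
    color_ps = [null_p_value() for _ in range(20)]   # one test per color
    if min(color_ps) < 0.05:                         # some color looks "significant"
        fished_hits += 1
        if null_p_value() < 0.05:                    # independent re-test of that color
            retest_hits += 1

print(fished_hits / runs)           # ~0.64: fishing finds a "hit" often
print(retest_hits / fished_hits)    # ~0.05: the honest re-test rarely agrees
```

Fishing turns up a "significant" color about 64% of the time, yet an honest re-test confirms it only about 5% of the time, which is exactly the false-positive rate you started with.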
With a market hungry for new and more helpful drugs, drug companies sometimes do this as well, leaving negative results unpublished, and ineffective drugs get marketed, like the case of the green jelly bean below. But it's usually not fraud, as anti-big-pharma activists so often insist, so much as two other factors.
First, there is pressure to publish only positive results, as that's what gets the fame and grants and so forth. Nobody tends to care that "X is not correlated to Y" in most cases. And second, there's a built-in emotional bias: scientists find ways to confirm the hypothesis they're excited about, and they disregard negative results because they don't want to be wrong. They think, well, something must have gone wrong, so they keep trying, ignoring the results they don't like to see. Which would be fine, if they would just publish all the data instead of cherry-picking the trials that came out positive.
And there is plenty of fraud and flim-flam in pseudoscience. The paranormal crowd notoriously uses this trick, among others, to claim they have found ESP and psychic powers. So-called "alternative" medicine practitioners do the same: they keep testing and fishing until they get a hit, then publish that, claiming their concoctions or magical 'healing energies' are effective. The pseudoscience crowd plays with funny numbers until they get to that gold standard of statistical significance that might get their hokum into a legitimate publication.
What's surprising, however, is how rampant the same practice has become throughout the social sciences as well as the drug industry, even though mainstream science is supposed to be self-checking, replicating studies precisely to catch these kinds of errors. Maybe they need more math-geek consultants to oversee their work, so that real science doesn't copy the flim-flam of pseudoscience.
So that explains why the comic is funny. There's a long but very informative New Yorker
article on this issue that I posted a while back, called The Decline Effect and the Scientific Method.
It's worth the read, if you're interested in how scientific studies are done, and why so many claims can't be replicated after their initial hype: https://plus.google.com/+TomEigelsbach/posts/6e9yHAeHZsY
Source: http://xkcd.com/882/ #geekhumor #scienceeveryday #sciencesunday