“What clues were the traders looking for? Some said that they considered a study’s sample size: Small studies are more likely to produce false positives than bigger ones. Some looked at a common statistical metric called the P value. If a result has a P value that’s less than 0.05, it’s said to be statistically significant, or positive. And if a study contains lots of P values that just skate under this threshold, it’s a possible sign that the authors committed “p-hacking”—that is, they futzed with their experiment or their data until they got “positive” but potentially misleading results. Signs like this can be ambiguous, and “scientists are usually reluctant to lob around claims of p-hacking when they see them,” says Sanjay Srivastava from the University of Oregon. “But if you are just quietly placing bets, those are things you’d look at.”
[...]
But Ledgerwood notes that the prediction markets worked because they relied on a crowd of people making judgments as a collective. “These findings don’t mean we can each individually forecast with a crystal ball whether a given study result will replicate,” she says. “It would be a mistake to conclude that individuals can predict scientific truths with great accuracy based on their gut.””
[...]
“The 62-percent success rate from the SSRP, though higher, is still galling to Vazire, since the project specifically looked at the two most prestigious journals in the world. “We should not treat publication in Science or Nature as a mark of a particularly robust finding or a particularly skilled researcher,” she says. These journals “are not especially good at picking out really robust findings or excellent research practices. And the prediction market adds to my frustration because it shows that there are clues to the strength of the evidence in the papers themselves.”
If prediction-market participants could collectively identify reliable results, why couldn’t the scientists who initially reviewed those papers, or the journal editors who decided to publish them? “Maybe they’re not looking at the right things,” says Vazire. “They probably put too little weight on markers of replicability, and too much on irrelevant factors, including the prestige of the authors or their institution.””
https://web.archive.org/web/20180830000819/https://www.theatlantic.com/science/archive/2018/08/scientists-can-collectively-sense-which-psychology-studies-are-weak/568630/
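
An aside on the p-hacking clue Srivastava describes: here is a minimal simulation sketch (mine, not from the article) assuming one common form of p-hacking, "optional stopping," where a researcher peeks at the data repeatedly and stops as soon as p < 0.05. Even with no real effect, this inflates the false-positive rate well above the nominal 5% and piles the reported p-values just under the threshold, which is exactly the pattern the traders were watching for.

# Sketch of optional stopping as a form of p-hacking (hypothetical
# scenario, not the article's data): a one-sample t-test is re-run
# after every 5 subjects, and the study stops at the first p < 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def peeking_study(start_n=10, step=5, max_n=100, alpha=0.05):
    data = rng.normal(0.0, 1.0, max_n)  # true effect is exactly zero
    for n in range(start_n, max_n + 1, step):
        p = stats.ttest_1samp(data[:n], 0.0).pvalue
        if p < alpha:
            return p  # stop and report the first "significant" result
    return p  # never crossed the threshold: report the final p-value

p_vals = np.array([peeking_study() for _ in range(5000)])
sig = p_vals[p_vals < 0.05]
# Far more than 5% of these null studies come out "significant":
print(f"false-positive rate with peeking: {(p_vals < 0.05).mean():.1%}")
# And the significant p-values cluster just under 0.05 (with a single
# fixed-n test, only ~20% of null significants would land in this band):
print(f"significant p-values in (0.04, 0.05): {(sig > 0.04).mean():.1%}")

A reader, or a trader, cannot observe the peeking directly, but a paper whose key p-values all sit in that narrow band just under 0.05 is consistent with a process like this one; hence the betting clue.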