
'The Ironic Effect of Significant Results on the Credibility of Multiple-Study Articles', Schimmack 2012

"In medical drug trials, the occurrence of failed studies is actually very common. An astonishing 50% of Stage III drug trials, the last hurdle before a drug can be approved and sold, produce nonsignificant results (Gordian, Singh, & Zemmel, 2006). This is even more astonishing as effectiveness is tested in Stage II drug trials. This finding essentially shows a 50% failure rate to replicate effects that were significant during Stage II testing. The rate of failure is especially common for drugs that are based on novel mechanisms, which makes these studies more similar to studies published in top psychological journals that place a premium on new discoveries. In contrast to 50% failure rates in drug trials, the failure rate in psychological journals is close to zero (Sterling et al., 1995). Low total power and high IC-indices suggest that the main reason for this low failure rate is not that psychological research is more robust. A more likely explanation is that psychological discoveries are never subjected to rigorous tests equivalent to Stage III drug trials."

Great paper. The deceptively simple observation: psychology papers often report 2-5 'successful' experiments, but if there were 5 experiments that all had an unusually high statistical power of 80%/0.8, then even if there were a real underlying effect, the probability of all 5 being statistically significant is 0.80^5 ≈ 33%! Where are the other ~67% of 5-study papers, reporting 0, 1, 2, 3, or 4 successes...? A more realistic estimate of power sharpens the point (take the recent neurology http://lesswrong.com/lw/g13/against_nhst/8rob estimate of ~20% power in the average experiment: then the probability would be 0.20^5 < 1%).

Hence: publication bias, p-value hacking, manipulation of researcher degrees of freedom, etc.
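The binomial arithmetic behind the argument above can be sketched as follows. This is a minimal illustration, assuming the 5 studies are independent and all have exactly the same power (a simplification; real studies vary in power). `success_distribution` is a hypothetical helper name, not something from the paper.

```python
from math import comb


def success_distribution(power: float, n_studies: int) -> list[float]:
    """Return P(exactly k of n_studies are significant) for k = 0..n_studies.

    Assumes independent studies that share the same power (hypothetical
    simplification), so the count of significant results is Binomial(n, power).
    """
    return [
        comb(n_studies, k) * power**k * (1 - power) ** (n_studies - k)
        for k in range(n_studies + 1)
    ]


if __name__ == "__main__":
    for power in (0.8, 0.2):
        dist = success_distribution(power, 5)
        print(f"power={power}: P(all 5 significant) = {dist[5]:.5f}")
        for k, p in enumerate(dist):
            print(f"  exactly {k} significant: {p:.5f}")
```

At 80% power the last entry is 0.8^5 = 0.32768, so roughly two thirds of honest 5-study papers should report at least one failure; in fact "exactly 4 significant" (0.4096) comes out more likely than "all 5 significant". At 20% power, all 5 significant has probability 0.2^5 = 0.00032.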