'The Ironic Effect of Significant Results on the Credibility of Multiple-Study Articles', Schimmack 2012
"In medical drug trials, the occurrence of failed studies is actually very common. An astonishing 50% of Stage III drug trials, the last hurdle before a drug can be approved and sold, produce nonsignificant results (Gordian, Singh, & Zemmel, 2006). This is even more astonishing as effectiveness is tested in Stage II drug trials. This finding essentially shows a 50% failure rate to replicate effects that were significant during Stage II testing. The rate of failure is especially common for drugs that are based on novel mechanisms, which makes these studies more similar to studies published in top psychological journals that place a premium on new discoveries. In contrast to 50% failure rates in drug trials, the failure rate in psychological journals is close to zero (Sterling et al., 1995). Low total power and high IC-indices suggest that the main reason for this low failure rate is not that psychological research is more robust. A more likely explanation is that psychological discoveries are never subjected to rigorous tests equivalent to Stage III drug trials."
Great paper. The deceptively simple observation: psychology papers often report 2-5 'successful' experiments... but if there were 5 experiments which all had an unusually high statistical power of 80%/0.8, then even if there were a real underlying effect, the probability of all 5 being statistically significant is 0.80^5≈33%! Where are the other 67% of papers, reporting just 1 success, just 2 successes, 3 successes, or 4 successes (or none)...? And a more realistic estimate of power sharpens the point (take the recent neurology http://lesswrong.com/lw/g13/against_nhst/8rob estimate of ~20% power in the average experiment: then that would be 0.20^5 < 1%).
Hence the likely explanations: publication bias, p-value hacking, manipulation of researcher degrees of freedom, etc.
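The arithmetic above is just the binomial distribution. Here is a minimal sketch, assuming the studies are independent and all share the same power against a real effect (the function names are hypothetical, chosen for illustration), that reproduces the 0.80^5 and 0.20^5 figures and shows how the "missing" papers with 0-4 successes should be distributed:

```python
from math import comb


def prob_all_significant(power: float, k: int) -> float:
    """Probability that all k independent studies reach significance,
    assuming each has the same power and the effect is real."""
    return power ** k


def success_distribution(power: float, k: int) -> list[float]:
    """Binomial probabilities of observing 0..k significant results
    out of k independent studies with equal power."""
    return [comb(k, j) * power**j * (1 - power) ** (k - j) for j in range(k + 1)]


if __name__ == "__main__":
    for power in (0.8, 0.2):
        print(f"power={power}: P(all 5 significant) = {prob_all_significant(power, 5):.4f}")
        for j, p in enumerate(success_distribution(power, 5)):
            print(f"  {j} of 5 significant: {p:.4f}")
```

With power 0.8, all five studies succeed only about 33% of the time (0.3277), while exactly four successes is the single most likely outcome (0.4096); with power 0.2, five-for-five drops to about 0.0003.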
I thought this was an interesting anecdote in that paper:
"During the reign of a paradigm, it is hard to imagine that things will ever change. However, for most contemporary psychologists, it is also hard to imagine that there was a time when psychology was dominated by animal research and reinforcement schedules. Older psychologists may have learned that the only constant in life is change. I have been fortunate enough to witness historic moments of change such as the falling of the Berlin Wall in 1989 and the end of behaviorism when Skinner gave his last speech at the convention of the American Psychological Association in 1990. In front of a packed auditorium, Skinner compared cognitivism to creationism. There was dead silence, made more audible by a handful of grey-haired members in the audience who applauded him. I can only hope to live long enough to see the time when Cohen’s valuable contribution to psychological science will gain the prominence that it deserves. A better understanding of the need for power will not solve all problems, but it will go a long way toward improving the quality of empirical studies and the credibility of results published in psychological journals."
Apr 17, 2013