Drawing the wrong conclusion
More on Stapel's fraud and the fallout for psychology

Great. Now a NYT headline implies that all psychology research is suspect because of Stapel's fraud. Yes, responsibility for this fraud extends beyond Stapel himself (http://goo.gl/dQmBk), but not that broadly. Now those areas that lack replications (or even replication attempts) are harming the reputations of all psychology researchers.

Within cognitive psychology and especially in the vision sciences, if someone publishes a splashy, straightforward-to-try-yourself result in Science, many labs actually do try to replicate it. Following the annual Vision Sciences Society meeting, or sometimes while still there, researchers code up the latest result for themselves and try it out. If a finding doesn't replicate, people typically find out (and if one lab continually produces results that don't replicate, people stop trusting research from that lab). If a result does replicate, researchers then re-do the experiments and try to test their limits.

The problem here seems to be that nobody even tries to replicate any of the sorts of stuff Stapel "did," or if they do, it never gets published. In cognitive psychology, a standard approach is to replicate the original result and then extend or challenge it. Papers in Stapel's area don't include many such replications and extensions of earlier results by other labs.

Want a fun exercise? Try scanning the literature to see if anyone has published an attempt to replicate any of the dozens of fraudulent papers that Stapel produced. These were papers in Science and other top journals, and involved collaborations with other well-known researchers. The types of studies he published aren't that hard to do, and he's working in a crowded field with a lot of other people doing this sort of research. If you find any replications embedded in other articles, post them in the comments. I expect I'll be waiting a while.
 
It is pretty stunning that this seems to have gone on, unchecked, for so long... As you say in your earlier post, "where were the rebuttals, the replication attempts"... I wish that the NYT piece had not said there were "flaws in psychology"... Of course, many people are willing to use this to denounce an even broader range of science... The number one comment I see on many news sites is something like "yeah, he must have learned his analysis techniques from climate researchers..."
 
I'm a philosopher these days, but as a post-doc I spent a bunch of time pretending to be a moral psychologist. I worked hard trying to replicate a bunch of really well-known and well-publicized results in moral psychology. Specifically, I spent a lot of time running and re-running experiments on the role of affect and emotion in moral judgment. Sometimes I got marginal results; sometimes I got nothing. While I tried to publish some of these failures, I constantly had those papers rejected.

That said, I do not think that there were problems with the original studies, and think that everything was on the level. The problem is that these kinds of effects are likely to be incredibly fragile, up at the level of hyper-complex and glacially-slow processes like those responsible for making moral judgments. There are just so many factors at play in the production of such a judgment that failures to replicate are likely to be endemic for any of the fancy, exciting, and spooky-cool effects that people get so excited about in this area. My guess, and this is only a guess, is that many of the students and collaborators that were complicit in Stapel's fabrications had a sense for this fact--even if it was not an explicit understanding that this region of psychology is bound to produce fragile effects.

What are we to do about this? I'm not really sure. I do, though, support a movement to publish more failures to replicate in social-ish psychology. At this point, that is just not part of the discipline, and it really should be!
 
Could this now be a reason to be more permissive in publishing unsuccessful attempted replications? I imagine that even if people did try to replicate, if they found null results, they wouldn't be able to publish.
 
+Paul Minda -- I try to avoid reading comments on news websites as a general rule. They're just one notch above YouTube comments. The same thing happened after the Hauser story broke. It seemed like most of the comments were of the "I won't ever trust scientists" sort. I think the media is partly culpable here for not explaining what exactly went wrong and explaining how it can go right.

+bryce huebner -- You're right, of course, that people can fail to replicate for a variety of reasons when an effect actually exists. On the other hand, I think people tend to give more weight than they should to positive results. It's also possible that the original result is an error, and given publication biases, I fear that is the case much of the time. Before something is settled science, it seems essential to publish the failures to replicate in addition to the positive results.

I'm in the middle of the review process for a paper with a negative result (with plenty of power to find an effect). There have been a handful of positive results published, and our paper initially was rejected because we didn't explain away all of the positive results with a single confound. I made the case that the positive claim is not settled science and that it is just as likely to be in error as our null finding (more so, in my view). Fortunately, the editor was willing to reconsider the case in light of some of our arguments and we are now revising.

I wish more editors were understanding of the need to publish negative results, especially in replications of relatively new and not-yet-established findings (and by established, I mean replicated by multiple labs). I think there's a real danger in assuming that null findings must have been due to flaws in the experiment rather than to a spurious initial result. I would like to see more willingness to publish failures to replicate when a literature is young. +Cedar Riener is right that it's really hard to publish null results, especially when top journals like JPSP refuse to publish replication attempts at all.
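
(For what it's worth, the power part is easy to check for yourself; here is a rough sketch in Python using statsmodels, with hypothetical numbers rather than the actual design of our paper:)

    # Rough power calculation for a two-sample t-test; the effect size and
    # sample sizes below are hypothetical, just to illustrate the idea.
    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()

    # Subjects per group needed to detect a medium effect (d = 0.5)
    # with 80% power at alpha = .05 -- roughly 64 per group.
    n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
    print(round(n_per_group))

    # Power achieved by a (hypothetical) n = 120 per group for the same effect size.
    achieved_power = analysis.solve_power(effect_size=0.5, nobs1=120, alpha=0.05)
    print(round(achieved_power, 2))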
 
+Daniel Simons I think that it's also important to keep in mind the extent to which people are out for the wicked-awesome, spooky-cool result in the more socially relevant regions of psychology. When people like Bargh, Wegner, Gilbert, and Stapel are pumping out results that look to be so insightful, and which are really cool to talk about from a theoretical perspective, there is an internal pressure to find a result that is just as exciting. It's easy to assume that your result, no matter how weird, is the next big thing--and really easy to feel like you are not doing real work when you are confirming something that is transparently obvious. The problems driving these sorts of issues are deeply embedded in the structure of social-ish psychology. It's a problem with everything from ignoring the size and power of an effect, to insisting on parametric tests where the data really call for non-parametric analyses, to fussing with the statistics to get an effect, to aiming for crazier and crazier interventions.
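
(To illustrate the parametric point with simulated data -- nothing more than a sketch, and the numbers are invented:)

    # Skewed, reaction-time-like data: a rank-based test makes fewer assumptions
    # than the t-test. Purely illustrative; both groups here are simulated.
    import numpy as np
    from scipy.stats import ttest_ind, mannwhitneyu

    rng = np.random.default_rng(7)
    group_a = rng.exponential(scale=1.0, size=25)
    group_b = rng.exponential(scale=1.6, size=25)

    print(ttest_ind(group_a, group_b))       # assumes roughly normal data
    print(mannwhitneyu(group_a, group_b))    # compares ranks instead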
 
Yes. There really needs to be more openness to the attempts to replicate that don't pan out. As I mentioned elsewhere, I did work on replicating/extending one of Stapel's studies (mainly I did the data collection), and it didn't pan out, and basically we did nothing with it.

But I have also come across what Bryce experienced. One of the students here (Lund, Sweden) tried to replicate a robust moral effect, and it just didn't work. I'm working on some stuff on emotional faces that extends other work, and it looks interesting (but weak). Some of the measures I used, though, work differently in Sweden than in Indiana (well, duh), and when we ran my experiment in Thailand, the results were nowhere near what we expected, nor in line with my hypotheses.

I find this very interesting (though frustrating, because there is so much data, and it is so hard to make a reasonable, publishable story - well, without messing with the data, that is). But now, when data storage is cheap...
 
I didn't really get the sense that the article was implying that all of psychological research was suspect, but just that there are some systemic problems with the way the field produces and evaluates research--which I think is absolutely true. In any case, it might not be a wholly bad thing for the public (and other scientists) to be more skeptical of psychological research; that might actually create a bigger incentive for standards in the field to improve more quickly than they otherwise would.
 
+Tal Yarkoni -- I think there are specific types of research that should inspire greater skepticism, but I hate to see the whole field thrown into the mix. Many areas of psychology are based on replicability rather than novelty. Wholesale skepticism isn't the answer. That leads to people posting comments on news sites claiming that all science is bogus (generalizing from Stapel to global warming).

Some areas of psychology merit greater skepticism: specifically, those for which there are no published replications or replication attempts and for which media coverage seems to be the primary goal. That is not true of other areas. Although it's impossible to generalize across all researchers in an area, subfields have different flavors. In the vision sciences, people tend to be less trusting of effects that they can't experience for themselves.

It seems to me that some areas of social psychology need some serious self-evaluation -- there are systemic problems due to a lack of published replications or replication attempts for flashy findings that get media coverage. Other findings don't merit the same sort of skepticism because they are easily replicated or the effects are robust enough that you can experience them for yourself.

Yes, there are systemic problems with the peer review process and the publication/evaluation process, but I don't think we should go down the path of assuming that all areas are equally affected by these issues.
 
Meta-analyses are done (my former doctoral student was involved with one looking at an area of forensic psychology), and I have seen some on emotion and gender differences. I'm not sure how systematic it is. I've also seen calls for unpublished work ahead of upcoming meta-analyses, to try to get a handle on the file-drawer problem. I liked that Goldacre article!
 
+Richard Gray -- Yes, meta-analyses are done quite often in psychology. Many of the papers in Psych Bulletin include meta-analyses, for example. I don't know how often they appear in the top social psych journals like JPSP. Part of the issue in social psych appears to be a lack of replication attempts in the published literature (due to policies against publishing replication attempts), which makes the strength of the effects hard to estimate. Ioannidis-style funnel plots can detect that sort of publication bias to some extent. I don't believe they have been used much in psychology as of yet, but they will be...
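
To make the funnel-plot idea concrete, here is the kind of thing I have in mind, sketched in Python with invented studies (a true effect plus a crude "only significant results get published" filter):

    # Sketch of an Ioannidis-style funnel plot; every number here is made up.
    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(1)

    # Pretend each point is one study: an effect size estimate and its standard error.
    se = rng.uniform(0.05, 0.4, size=40)      # small studies have large standard errors
    d = rng.normal(loc=0.3, scale=se)         # true effect of d = 0.3 plus sampling noise
    published = (d / se) > 1.96               # crude filter: only "significant" results appear

    plt.scatter(d[published], se[published])
    plt.gca().invert_yaxis()                  # precise studies plotted at the top, by convention
    plt.axvline(0.3, linestyle='--')          # the (unknown in practice) true effect
    plt.xlabel('effect size (d)')
    plt.ylabel('standard error')
    plt.title('Asymmetry suggests missing null results')
    plt.show()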
 
+Daniel Simons and +Richard Gray there is also a real problem with a lack of overlapping methodologies in many regions of social psych. Unlike in vision science, say, there are not really agreed-upon standards for how experiments ought to be run, what sorts of measures ought to be used, and what sorts of population differences ought to be expected. I gather that this is part of the reason why some people are so opposed to treating social psych as experimental science. In many cases, the experiments end up being more like anthropological observations. It's good when social psychologists are honest about that. But there seems to be a lot of variation in the extent to which people think that social psychological methods can actually tell us something about mechanisms and the like.
 
>In many cases, the experiments end up being more like anthropological observations

I'm not sure at all what you mean by that. There certainly are experiments. It may be difficult to have overlapping methods or agreed-upon standards, because social behavior is broader than vision (I like Sanjay's blog name: The Hardest Science). We use a lot of different measures, depending on what we are interested in, but usually, before I embark on something, I want to see what others have used and what seemed to work for them, and then adapt it rather than reinvent the wheel. I don't think I'm alone in this (it is something you learn in graduate school).

Of course, it could be done better, and one would hope for more coherence. But I'm not sure this is where the problem is. I see a need for updated analysis methods, more replication (and more value placed on replication) across populations, and possibly more collaborations.
 
+Ase Innes-Ker I didn't mean to suggest that all of social psychology is like that. In fact, I think that there is plenty of excellent, careful, controlled, and well-thought-out social psychology. My thought was about the exciting kinds of studies that tend to get a lot of press, but that are run like one-off observations without any real controls. As an outsider, much of the stuff from Stapel's group seems to fit pretty squarely in this category. But perhaps I am wrong about that, too.
 
Thanks +Daniel Simons and +Ase Innes-Ker for the info. It seems like a lot of the standardization, replication, and review ideas from medicine will be useful in psychology. Of course this will not prevent fraud or biases, etc. (insert favourite evil-pharma example here), but it may be useful for separating the solid stuff from the not-so-good.
 
+bryce huebner - interesting points about non-overlapping methodologies and issues with standardization in some social psychology experiments. The funny thing is that I found Stapel's Science paper on chaotic environments particularly interesting because it described 3-4 completely separate experiments (from memory there was some overlap between a couple of them) to test a single hypothesis, and they all showed the same effect (of course, we now know the data were probably made up to get this). Replications of an effect using different methodologies and non-overlapping experiments (with some standards) would seem to point to the effect being real. Are a lot of social psych replications along these lines? Of course, if you want to quantify the size of an effect, then close-to-exact replications would be required, as those would allow a meta-analysis to be done (a toy example of what I mean is sketched below).

This type of thing has happened in climate change modelling where different groups have coded up models from scratch often focusing on different aspects of climate physics. The fact that they all essentially produce the same results, despite the variations, gives confidence that the effects they predict are solid.
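
Just to spell out the meta-analysis step from the previous paragraph, a toy fixed-effect (inverse-variance) pooling of some invented close replications would look like this:

    # Toy fixed-effect meta-analysis; the effect sizes and standard errors are invented.
    import numpy as np

    effects = np.array([0.42, 0.10, 0.35, -0.05, 0.28])   # Cohen's d from five hypothetical replications
    ses = np.array([0.20, 0.15, 0.25, 0.18, 0.22])        # their standard errors

    weights = 1.0 / ses**2                                 # precise studies get more weight
    pooled = np.sum(weights * effects) / np.sum(weights)
    pooled_se = np.sqrt(1.0 / np.sum(weights))

    low, high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
    print(f"pooled d = {pooled:.2f}, 95% CI [{low:.2f}, {high:.2f}]")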
 
+Richard Gray Yes - using different methods to look at a phenomenon is fairly common in psychology. The idea is that if the phenomenon turns up in different paradigms, it probably is real, and is not due to idiosyncrasies in the paradigm/measurement system.

I just read through the intro to Plastic Fantastic (the story of Schön's fraud in physics). One of the comments the author, Eugenie Samuel Reich, makes is that, as an insider, Schön knew very well which were the interesting phenomena to look for, which results were expected, how they would look, and how to make them plausible. I think a similar thing can be said about Stapel.

Reading the report on Stapel's case, it seems like the place where his fraud might have been discovered earlier is the "raw" data, because allegedly the fabrication there was not very well done. (Seth Roberts, on his blog, suggested techniques where you look at the first-digit distribution, for example.) But new cheaters can, of course, accommodate that once they know about it.
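
For what it's worth, the first-digit check is simple to run; here is a rough sketch in Python against made-up numbers (and note that Benford-style expectations only really apply to data spanning several orders of magnitude):

    # First-digit check in the spirit of the Seth Roberts suggestion; the "raw data"
    # here are simulated, and this is a screening heuristic, not proof of fraud.
    import numpy as np
    from scipy.stats import chisquare

    def first_digit(x):
        x = abs(x)
        while x >= 10:
            x /= 10
        while x < 1:
            x *= 10
        return int(x)

    values = np.random.default_rng(3).lognormal(mean=3.0, sigma=1.5, size=500)
    digits = np.array([first_digit(v) for v in values])
    observed = np.array([(digits == d).sum() for d in range(1, 10)])

    benford = np.log10(1 + 1 / np.arange(1, 10))    # Benford's expected first-digit proportions
    expected = benford * observed.sum()             # scaled to counts for the chi-square test

    stat, p = chisquare(observed, f_exp=expected)
    print(f"chi-square = {stat:.1f}, p = {p:.3f}")  # a very small p flags a suspicious digit pattern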

An enterprise that is built on trust, as science is, can always be gamed and cheated. Stapel isn't the first fraud and forger, and maybe not even the most spectacular, nor the most harmful (Wakefield has a lot more on his hands, I'd think).

I keep thinking about how to make the field more robust, to minimize the impact of cheaters. (And that needs quite a bit of thinking, I think - but with the possibilities for disseminating science being what they are now...)