
"The Economics of Reproducibility in Preclinical Research", Freedman et al 2015 http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002165 http://journals.plos.org/plosbiology/article/asset?unique&id=info:doi/10.1371/journal.pbio.1002165.s001 (commentary: http://pipeline.corante.com/archives/2015/06/10/the_cost_of_irreproducibility.php http://news.sciencemag.org/biology/2015/06/study-claims-28-billion-year-spent-irreproducible-biomedical-research http://www.nature.com/news/irreproducible-biology-research-costs-put-at-28-billion-per-year-1.17711 http://www.npr.org/sections/health-shots/2015/06/09/413140503/costs-of-slipshod-research-methods-may-be-in-the-billions ); excerpts:

"Low reproducibility rates within life science research undermine cumulative knowledge production and contribute to both delays and costs of therapeutic drug development. An analysis of past studies indicates that the cumulative (total) prevalence of irreproducible preclinical research exceeds 50%, resulting in approximately US$28,000,000,000 (US$28B)/year spent on preclinical research that is not reproducible—in the United States alone. We outline a framework for solutions and a plan for long-term improvements in reproducibility rates that will help to accelerate the discovery of life-saving therapies and cures.

Clearly, perfect reproducibility across all preclinical research is neither possible nor desirable. Attempting to achieve total reproducibility would dramatically increase the cost of such studies and radically curb their volume. Our assumption that current irreproducibility rates exceed a theoretical (and perhaps indeterminable) optimal level is based on the tremendous gap between the conventional 5% false positive rate (i.e., statistical significance level of 0.05) and the estimates reported below and elsewhere (see S1 Text and Fig 1).

An illustrative example is the use and misuse of cancer cell lines. The history of cell lines used in biomedical research is riddled with misidentification and cross-contamination events [29], which have been estimated to range from 15% to 36% [30]. Yet despite the availability of the short tandem repeat (STR) analysis as an accepted standard to authenticate cell lines, and its relatively low cost (approximately US$200 per assay), only one-third of labs typically test their cell lines for identity [31]. For an NIH-funded academic researcher receiving an average US$450,000, four-year grant, purchasing cell lines from a reputable vendor (or validating their own stock) and then authenticating annually will only cost about US$1,000 or 0.2% of the award. A search of NIH Reporter for projects using “cell line” or “cell culture” suggests that NIH currently funds about US$3.7B annually on research using cell lines. Given that a quarter of these research projects apparently use misidentified or contaminated cell lines, reducing this to even 10% through a broader application of the STR standard—a very realistic goal—would ensure a more effective use of nearly three-quarters of a billion dollars and ultimately speed the progress of research and the development of new treatments for disease.
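The grant-level arithmetic in this paragraph can be checked with a short sketch. All figures are those quoted in the text; the paper's "nearly three-quarters of a billion" savings figure presumably folds in additional assumptions beyond this simple calculation:

```python
# Cell-line authentication arithmetic, using the figures quoted in the text.
auth_cost = 1_000                 # vendor-validated stock + annual STR assays (US$)
grant_size = 450_000              # average four-year NIH grant (US$)
fraction_of_award = auth_cost / grant_size          # ~0.002, i.e. about 0.2%

cell_line_funding = 3.7e9         # annual NIH funding on cell-line research (US$)
current_rate = 0.25               # share of projects using bad cell lines
target_rate = 0.10                # goal with broader STR authentication
annual_saving = cell_line_funding * (current_rate - target_rate)  # US$ saved/year
```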

Four Categories of Irreproducibility
Study Design
    Lack of proper study methodology has been identified as an ongoing challenge to research fidelity for more than 50 years. Improper study design encompasses both studies that are underpowered to yield statistically significant results and those whose design lacks sufficiently rigorous statistical analysis [2]. For example, an analysis of 271 animal studies by Kilkenny et al. concluded that 13% used inappropriate statistical methods and almost 60% had problems with both the statistical analysis and the transparency of reporting [3]. In addition, the lack of rigor of the experimental design, and in particular the absence of properly blinded studies in confirmatory research, have been cited as key characteristics of and contributors to studies that ultimately are not reproducible [4].
    For our analysis, we determined an estimate for this category by evaluating research irreproducibility that can be attributed to the routine use of widely accepted statistical testing procedures. Jager and Leek's [5] analysis of p-values from more than 75,000 papers in the medical literature estimated the rate of false discoveries among reported results at 14%. Valen Johnson's [6] mathematical analysis, based on the assumption that 50% of tested null hypotheses are actually true, puts the false-result rate between 17% and 25%. Using the high and low figures from these studies, we estimate that between 14% and 25% of reported results may be false positives, and used a midpoint of 19.5% for this category.

Biological Reagents and Reference Materials
    Reference material flaws are associated with the unreliable identification of source materials used in the preclinical study, particularly contaminated, mishandled, or mislabeled biological reagents like antibodies [7] or cell lines [8]. A poster child for misidentified cell lines is the adriamycin-resistant breast adenocarcinoma cell line, MCF-7/AdrR, used in over 300 studies before it was found to be derived from human ovarian carcinoma cells (now re-designated NCI/ADR-RES) [9]. For perspective, based on the cost of an average NIH-funded breast cancer grant (US$370k) in 2013 (http://report.nih.gov/categorical_spending.aspx), as much as US$100M of research funding may have been spent using this misidentified cell line alone. Similarly, a recent assessment of mycoplasma contamination in the NCBI Sequence Read Archive (SRA; http://www.ncbi.nlm.nih.gov/sra) conservatively estimated that 11% of projects were contaminated [10], and that hundreds of millions of dollars in NIH-funded research have potentially been affected by widespread mycoplasma contamination of continuous cell lines.
    Hughes et al. examined the prevalence of contaminated cancer cell line usage over a period of more than 20 years, and reported a wide range of contamination and mischaracterization, with only a small improvement in rates over time [11]. Excluding studies within the Hughes analysis that were outside of the US or had a sample size of <200 cell lines, the range of the reported misidentification or contamination ranged from a low of 14.9% [12] to a high of 36% [13] (midpoint 25.5%), which serves as our estimated error rate for this category.

Laboratory Protocols
    Laboratory protocol issues encompass irreproducibility that arises during the preparation and execution of the experiment. Although no study estimating the prevalence of the error rate within preclinical laboratory protocol was identified, analysis within the clinical environment has shown laboratory error rates in the range of 0.3% to 0.5% [14,15]. The error rate within the preclinical environment—where there is less use of controls, blinding, and broadly accepted standards and best practices—was assumed to be significantly higher than in clinical trials [16]. To estimate the extent to which this assumption holds, we evaluated one study that compared error rates for other factors between the preclinical and clinical environments, where preclinical laboratory error rates were found to be as high as 19 times the clinical rate [17]. Applying this multiplier to the clinical laboratory error rates generates a preclinical estimate of 5.7% to 9.5%, with a midpoint of 7.6%.
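The multiplier step can be reproduced directly from the clinical error rates cited in the paragraph above:

```python
# Scaling clinical laboratory error rates by the 19x preclinical multiplier.
clinical_error_rates = (0.003, 0.005)   # 0.3%-0.5% from clinical labs [14,15]
multiplier = 19                          # preclinical-to-clinical ratio [17]

low, high = (r * multiplier for r in clinical_error_rates)
midpoint = (low + high) / 2              # 5.7%, 9.5%, midpoint 7.6%
```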

Data Analysis and Reporting
    The fourth category of contributing factors to preclinical irreproducibility is the analysis and reporting of data. Data sharing and reporting has been recognized by the NIH [18] as an essential part of the translational research process, and together with the rise of post-publication review [19], it is a key factor in facilitating the identification of irreproducible data or studies [20]. The same animal study referenced earlier (Kilkenny et al.) also looked at design issues and concluded that only 59% of the papers studied included a satisfactory level of detail on the methodology, sample size, and key characteristics of the animals used in the study [3]. And while less common, errors in analysis can have devastating impacts, as two researchers reported in 2007 when a simple calculation error (change in sign) undermined several years of work on multidrug resistance efflux transporters and led to the retraction of a widely cited paper [21].
    A number of studies have investigated the issue of inadequate reporting of research, with estimates of improper reporting reaching as high as 87% [22], although the impact of such data analysis and reporting errors on irreproducible research is inconclusive. For our analysis, we used the results of a study of 234 clinical trials, which found that 18% of data reported was deemed to be “inadequate” [23], which provides a conservative, lower-end estimate for the impact of this category on overall irreproducibility.

Cumulative Irreproducibility Rate
    In order to calculate the total rate of irreproducibility in preclinical research, the estimated prevalence values for all four categories were used as outlined in S1 and S2 Datasets. Given the limited number of studies in which we were able to identify reporting incidence rates for irreproducibility, and a lack of consistency as to how reproducibility/irreproducibility is defined, a rigorous meta-analysis or systematic review was not feasible. However, using both the range and midpoint estimates for each category of error, the combined impact was calculated using a highly conservative probability bounds approach [24] with the cumulative irreproducibility rate estimated to exceed 50% (see Fig. 2 and S1 Dataset).

Comparison to Prior Estimates of Irreproducibility
    Several prominent studies have examined the prevalence of irreproducibility within the confines of the research at a specific company or academic institution. One widely discussed effort was Amgen scientists' ability to replicate only 6 (11%) of 53 key oncological studies [25]. A similarly low reproducibility rate was seen at Bayer, whose study concluded that a mere 20 to 25% of published data over a 4-year period could be corroborated internally [26]. Likewise, researchers at the Oregon Health & Science University found that 54% of 238 biomedical papers published in 84 journals failed to identify all of the resources necessary to reproduce results [27]. And finally, a review of 80 studies published in the journal Evidence-Based Medicine found that fewer than half (49%) included sufficient details of results to accurately attempt replication [28]. Notably, authors of the latter advocate for tracking replication as a means of post-publication evaluation, both to assist researchers in identifying reliable findings and to explicitly recognize and incentivize the publication of reproducible data and results. Our calculated estimate (53.3%) of the cumulative prevalence of irreproducible preclinical research falls well within the boundaries of the results published in these previous studies (Fig. 1).

However, it is reasonable to state that cumulative errors in the following broad categories—as well as underlying biases that could contribute to each problem area [14] or even result in entire studies never being published or reported [15]—are the primary causes of irreproducibility [16]: (1) study design, (2) biological reagents and reference materials, (3) laboratory protocols, and (4) data analysis and reporting. Fig 2, S1 Text, S1 and S2 Datasets show the results of our analysis, which estimates the prevalence (low, high, and midpoint estimates) of errors in each category and builds up to a cumulative (total) irreproducibility rate that exceeds 50%. Using a highly conservative probability bounds approach [17], we estimate that the cumulative rate of preclinical irreproducibility lies between 18% (the maximum of the low estimates, assuming maximum overlap between categories), and 88.5% (the sum of the high estimates, assuming minimal overlap). A natural point estimate of the cumulative irreproducibility rate is the midpoint of the upper and lower bounds, or 53.3%.
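The probability-bounds calculation described in the paragraph above can be reconstructed from the four categories' low/high estimates. The paper's exact computation lives in its S1 Dataset; this is a simplified sketch under the stated assumptions (lower bound = maximum of the lows, assuming maximum overlap; upper bound = sum of the highs, assuming minimal overlap):

```python
# Cumulative irreproducibility bounds from the four category estimates.
categories = {
    "study design":            (0.14, 0.25),    # false-positive estimates [5,6]
    "reagents/reference":      (0.149, 0.36),   # cell-line contamination range
    "laboratory protocols":    (0.057, 0.095),  # clinical rates x19 multiplier
    "data analysis/reporting": (0.18, 0.18),    # single conservative estimate [23]
}

lower = max(lo for lo, hi in categories.values())            # maximum overlap
upper = min(1.0, sum(hi for lo, hi in categories.values()))  # minimal overlap
point = (lower + upper) / 2                                  # midpoint estimate
```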

Extrapolating from 2012 data, an estimated US$114.8B in the United States [18] is spent annually on life sciences research, with the pharmaceutical industry being the largest funder at 61.8%, followed by the federal government (31.5%), nonprofits (3.8%), and academia (3.0%) [20]. Of this amount, an estimated US$56.4B (49%) is spent on preclinical research, with government sources providing the majority of funding (roughly US$38B) [19]. Using a conservative cumulative irreproducibility rate of 50% means that approximately US$28B/year is spent on research that cannot be replicated (see Fig 2 and S2 Dataset). Of course, uncertainty remains about the precise magnitude of the direct economic costs—the conservative probability bounds approach reported above suggests that these costs could plausibly be much smaller or much larger than US$28B...Irreproducibility also has downstream impacts in the drug development pipeline. Academic research studies with potential clinical applications are typically replicated within the pharmaceutical industry before clinical studies are begun, with each study replication requiring between 3 and 24 months and an investment of between US$500,000 and US$2,000,000 [23]. While industry will continue to replicate external studies for their own drug discovery process, a substantially improved preclinical reproducibility rate would de-risk such investments and increase their hit rate, both increasing the productivity of life science research and improving the speed and efficiency of the therapeutic drug development processes. The annual value added to the return on investment from taxpayer dollars would be in the billions in the US alone."
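The headline US$28B figure follows directly from the spending estimates quoted in the excerpt:

```python
# Reconstructing the US$28B/year figure from the paper's spending estimates.
total_life_science = 114.8e9     # annual US life-science research spend (2012-extrapolated)
preclinical_spend = 56.4e9       # ~49% of total spent on preclinical research
irreproducibility_rate = 0.50    # conservative cumulative estimate

irreproducible_spend = preclinical_spend * irreproducibility_rate  # ~US$28B/year
```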

Category 4 of errors is a bit weak, but if anything, they're too generous on the other categories, and they don't include any downstream effects like the funding used in attempts to replicate hot research. 50% is a pretty reasonable estimate. (What is deeply unreasonable is trying to argue it's some much lower number like 10%...)