Is Most Published Research Really False?

Jeffrey T. Leek; Leah R. Jager

doi:10.1146/annurev-statistics-060116-054104

Annual Review of Statistics and Its Application

Volume 4, 2017

Review Article

Jeffrey T. Leek¹,² and Leah R. Jager¹

Affiliations: ¹Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland 21205; ²Center for Computational Biology, Johns Hopkins University, Baltimore, Maryland 21205
Vol. 4:109-122 (Volume publication date March 2017) https://doi.org/10.1146/annurev-statistics-060116-054104
First published as a Review in Advance on October 05, 2016
© Annual Reviews

Abstract

There has been an increasing concern in both the scientific and lay communities that most published medical findings are false. But what does it mean to be false? Here we describe the range of definitions of false discoveries in the scientific literature. We summarize the philosophical, statistical, and experimental evidence for each type of false discovery. We discuss common underpinning problems with the scientific and data analytic practices and point to tools and behaviors that can be implemented to reduce the problems with published scientific results.

Keywords: false discoveries, meta-analysis, reliability research, replicability, reproducibility, science-wise false discovery rate
