Abstract

There has been increasing concern in both the scientific and lay communities that most published medical findings are false. But what does it mean for a finding to be false? Here we describe the range of definitions of false discoveries in the scientific literature. We summarize the philosophical, statistical, and experimental evidence for each type of false discovery. We discuss common underlying problems in scientific and data analytic practice and point to tools and behaviors that can be implemented to reduce the problems with published scientific results.

https://doi.org/10.1146/annurev-statistics-060116-054104
2017-03-07
2024-11-15

  • Article Type: Review Article