Abstract

The reproducibility of statistical findings has become a concern not only for statisticians, but for all researchers engaged in empirical discovery. Section 2 of this article identifies key reasons statistical findings may not replicate, including power and sampling issues; misapplication of statistical tests; the instability of findings under reasonable perturbations of data or models; lack of access to methods, data, or equipment; and cultural barriers such as researcher incentives and rewards. Section 3 discusses five proposed remedies for these replication failures: improved prepublication and postpublication validation of findings; the complete disclosure of research steps; assessment of the stability of statistical findings; providing access to digital research objects, in particular data and software; and ensuring these objects are legally reusable.
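
As a concrete illustration of the first of these issues, the sketch below simulates how low statistical power interacts with a significance filter to produce findings that rarely replicate. This example is not drawn from the article; it assumes Python with NumPy and SciPy, and the effect size, sample size, and significance threshold are arbitrary illustrative choices.

# Minimal simulation (illustrative only, not from the article): low-powered
# studies plus a significance filter yield "findings" that seldom replicate.
# The effect size, sample size, and alpha below are assumed values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

true_effect = 0.2      # small standardized effect (assumed)
n_per_group = 25       # small samples, hence low power (assumed)
alpha = 0.05
n_studies = 10_000

def one_study():
    """Run one two-sample t-test on freshly simulated data; return its p-value."""
    control = rng.normal(0.0, 1.0, n_per_group)
    treated = rng.normal(true_effect, 1.0, n_per_group)
    return stats.ttest_ind(treated, control).pvalue

# Original studies: keep only those that cross the significance threshold.
original_p = np.array([one_study() for _ in range(n_studies)])
significant = original_p < alpha

# Independent replication attempts for the "published" (significant) studies.
replication_p = np.array([one_study() for _ in range(significant.sum())])
replicated = replication_p < alpha

print(f"share of all studies reaching significance (power): {significant.mean():.2f}")
print(f"share of significant findings that replicate:        {replicated.mean():.2f}")

With these settings, only about one study in ten reaches significance, and roughly the same small fraction of those "significant" findings reach significance again in an independent replication, because the replication attempts are just as underpowered as the originals.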

