
Abstract

The reproducibility of statistical findings has become a concern not only for statisticians, but for all researchers engaged in empirical discovery. Section 2 of this article identifies key reasons statistical findings may not replicate, including power and sampling issues; misapplication of statistical tests; the instability of findings under reasonable perturbations of data or models; lack of access to methods, data, or equipment; and cultural barriers such as researcher incentives and rewards. Section 3 discusses five proposed remedies for these replication failures: improved prepublication and postpublication validation of findings; the complete disclosure of research steps; assessment of the stability of statistical findings; providing access to digital research objects, in particular data and software; and ensuring these objects are legally reusable.
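The third remedy, assessing stability, admits a simple illustration. The sketch below is not taken from the article; the OLS model, the bootstrap-resampling perturbation scheme, and all variable names are illustrative assumptions. It refits a regression on resampled versions of the data and reports how often each coefficient keeps the sign it has in the full-data fit; an effect whose sign flips under such mild perturbation is a weak candidate for replication in an independent sample.

```python
# A minimal sketch (illustrative, not from the article) of one stability
# check: perturb the data by bootstrap resampling and see whether a
# reported effect estimate survives the perturbation.
import numpy as np

rng = np.random.default_rng(0)

# Simulate a study: one weak true effect among several null predictors.
n, p = 50, 5
X = rng.normal(size=(n, p))
beta = np.array([0.3, 0.0, 0.0, 0.0, 0.0])  # only the first effect is real
y = X @ beta + rng.normal(size=n)

def ols_coefs(X, y):
    """Ordinary least squares coefficients via least-squares solve."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

full_fit = ols_coefs(X, y)

# Refit under B bootstrap perturbations of the rows.
B = 1000
coefs = np.empty((B, p))
for b in range(B):
    idx = rng.integers(0, n, size=n)  # resample observations with replacement
    coefs[b] = ols_coefs(X[idx], y[idx])

# Fraction of perturbed datasets in which each coefficient keeps the
# sign it has in the full-data fit; values near 50% signal instability.
sign_stability = (np.sign(coefs) == np.sign(full_fit)).mean(axis=0)
for j in range(p):
    print(f"coef {j}: sign agrees with full-data fit "
          f"in {sign_stability[j]:.0%} of resamples")
```

In this simulated setting only the first coefficient should show high sign stability; the null coefficients hover near 50%, the behavior one would expect of findings unlikely to replicate.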
