
Abstract

Advances in computing technology have spurred two extraordinary phenomena in science: large-scale and high-throughput data collection coupled with the creation and implementation of complex statistical algorithms for data analysis. These two phenomena have brought about tremendous advances in scientific discovery but have raised two serious concerns. The complexity of modern data analyses raises questions about their reproducibility, meaning the ability of independent analysts to recreate the results claimed by the original authors using the original data and analysis techniques. Reproducibility is typically thwarted by a lack of availability of the original data and computer code. A more general concern is the replicability of scientific findings, which refers to the frequency with which scientific claims are confirmed by completely independent investigations. Although reproducibility and replicability are related, they focus on different aspects of scientific progress. In this review, we discuss the origins of reproducible research, characterize the current status of reproducibility in public health research, and connect reproducibility to current concerns about the replicability of scientific findings. Finally, we describe a path forward for improving both the reproducibility and replicability of public health research in the future.
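The abstract's working definition of reproducibility (an independent analyst reruns the authors' original code on the original data and recreates the claimed result) can be sketched in a few lines. This is an illustrative example only, not code from the article; the data, analysis function, and claimed value are all hypothetical.

```python
# Illustrative sketch (not from the article): a reproducibility check in the
# abstract's sense -- rerun the original analysis on the original data and
# compare the output to the result claimed in the publication.
import statistics

# Hypothetical "original data" shared alongside the publication.
original_data = [2.1, 2.5, 3.0, 2.8, 3.3, 2.9]

def original_analysis(data):
    """The authors' shared analysis code: report the sample mean, rounded."""
    return round(statistics.mean(data), 2)

claimed_result = 2.77  # value reported in the hypothetical paper

# Reproducibility holds if rerunning the code on the data recreates the claim.
recomputed = original_analysis(original_data)
print("reproducible:", recomputed == claimed_result)  # prints: reproducible: True
```

The check is only possible because both the data and the analysis code are available, which is exactly the availability gap the abstract identifies as the typical obstacle to reproducibility.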

DOI: 10.1146/annurev-publhealth-012420-105110
Published 2021-04-01

  • Article Type: Review Article