
Abstract

The value of research data has grown as the emphasis on research transparency and data-intensive research has increased. Data sharing is now required by funders and publishers and is becoming a disciplinary expectation in many fields. However, practices promoting data reusability and research transparency are poorly understood, making it difficult for statisticians and other researchers to reframe study methods to facilitate data sharing. This article reviews the larger landscape of open research and describes the contextual information that data reusers need to understand, evaluate, and appropriately analyze shared data. The article connects data reusability to statistical thinking by considering the impact of the type and quality of shared research artifacts on the capacity to reproduce or replicate studies and by examining quality evaluation frameworks to understand the nature of data errors and how they can be mitigated prior to sharing. Actions statisticians can take to update their research approaches for their own and collaborative investigations are suggested.

