Abstract

The reproducibility of scientific research has become a point of critical concern. We argue that openness and transparency are essential for reproducibility, and we outline an ecosystem for open and transparent science that has emerged within the human neuroimaging community. We discuss the range of open data-sharing resources that have been developed for neuroimaging data, as well as the role of data standards (particularly the Brain Imaging Data Structure, BIDS) in enabling the automated sharing, processing, and reuse of large neuroimaging data sets. We outline how the open-source Python language has provided the basis for a data science platform that enables reproducible data analysis and visualization. We also discuss how recent advances in software engineering, such as containerization, provide the basis for greater reproducibility in data analysis. The emergence of this new ecosystem provides an example for the many areas of science that are currently struggling with reproducibility.

/content/journals/10.1146/annurev-biodatasci-072018-021237
2019-07-20
2024-06-19
  • Article Type: Review Article