Computational and Informatic Advances for Reproducible Data Analysis in Neuroimaging

Russell A. Poldrack; Krzysztof J. Gorgolewski; Gaël Varoquaux

doi:10.1146/annurev-biodatasci-072018-021237

Annual Review of Biomedical Data Science

Volume 2, 2019

Review Article

Free

Russell A. Poldrack¹, Krzysztof J. Gorgolewski¹, and Gaël Varoquaux²

Affiliations: ¹Department of Psychology, Stanford University, Stanford, California 94305, USA; email: [email protected]
²Parietal Team, Inria and NeuroSpin/CEA (Atomic Energy Commission), 91191 Gif-sur-Yvette, France
Vol. 2:119-138 (Volume publication date July 2019) https://doi.org/10.1146/annurev-biodatasci-072018-021237
First published as a Review in Advance on April 08, 2019
Copyright © 2019 by Annual Reviews. All rights reserved

Abstract

The reproducibility of scientific research has become a point of critical concern. We argue that openness and transparency are essential for reproducibility, and we outline an ecosystem for open and transparent science that has emerged within the human neuroimaging community. We discuss the range of open data-sharing resources that have been developed for neuroimaging data, as well as the role of data standards (particularly the Brain Imaging Data Structure, BIDS) in enabling the automated sharing, processing, and reuse of large neuroimaging data sets. We outline how the open source Python language has provided the basis for a data science platform that enables reproducible data analysis and visualization. We also discuss how advances in software engineering, such as containerization, provide the basis for greater reproducibility in data analysis. The emergence of this new ecosystem provides an example for many areas of science that are currently struggling with reproducibility.

Keywords: containerization, machine learning, open science, Python, software engineering
