
Abstract

Inferences made about objects via vision, such as rapid and accurate categorization, are core to primate cognition despite the algorithmic challenge posed by varying viewpoints and scenes. Until recently, the brain mechanisms that support these capabilities were deeply mysterious. However, over the past decade, this scientific mystery has been illuminated by the discovery and development of brain-inspired, image-computable, artificial neural network (ANN) systems that rival primates in these behavioral feats. Apart from fundamentally changing the landscape of artificial intelligence, modified versions of these ANN systems are the current leading scientific hypotheses of an integrated set of mechanisms in the primate ventral visual stream that support core object recognition. What separates brain-mapped versions of these systems from prior conceptual models is that they are sensory computable, mechanistic, anatomically referenced, and testable (SMART). In this article, we review and provide perspective on the brain mechanisms addressed by the current leading SMART models. We review their empirical brain and behavioral alignment successes and failures, discuss the next frontiers for an even more accurate mechanistic understanding, and outline the likely applications.

/content/journals/10.1146/annurev-vision-112823-030616
2024-09-18
2024-12-10
  • Article Type: Review Article