Visual Object Recognition: Do We (Finally) Know More Now Than We Did?

Isabel Gauthier; Michael J. Tarr

doi:10.1146/annurev-vision-111815-114621

Annual Review of Vision Science

Volume 2, 2016

Review Article

Free

Isabel Gauthier¹ and Michael J. Tarr²

Affiliations:
¹Department of Psychology, Vanderbilt University, Nashville, Tennessee 37240-7817; email: [email protected]
²Department of Psychology, Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
Vol. 2:377-396 (Volume publication date October 2016) https://doi.org/10.1146/annurev-vision-111815-114621
First published as a Review in Advance on August 03, 2016
© Annual Reviews

Abstract

How do we recognize objects despite changes in their appearance? The past three decades have been witness to intense debates regarding both whether objects are encoded invariantly with respect to viewing conditions and whether specialized, separable mechanisms are used for the recognition of different object categories. We argue that such dichotomous debates ask the wrong question. Much more important is the nature of object representations: What are features that enable invariance or differential processing between categories? Although the nature of object features is still an unanswered question, new methods for connecting data to models show significant potential for helping us to better understand neural codes for objects. Most prominently, new approaches to analyzing data from functional magnetic resonance imaging, including neural decoding and representational similarity analysis, and new computational models of vision, including convolutional neural networks, have enabled a much more nuanced understanding of visual representation. Convolutional neural networks are particularly intriguing as a tool for studying biological vision in that this class of artificial vision systems, based on biologically plausible deep neural networks, exhibits visual recognition capabilities that are approaching those of human observers. As these models improve in their recognition performance, it appears that they also become more effective in predicting and accounting for neural responses in the ventral cortex. Applying these and other deep models to empirical data shows great promise for enabling future progress in the study of visual recognition.

Keyword(s): category selectivity, decoding, deep neural networks, face recognition, invariance, object recognition
