Abstract

Recent advances in neural network modeling have enabled major strides in computer vision and other artificial intelligence applications. Human-level visual recognition abilities are coming within reach of artificial systems. Artificial neural networks are inspired by the brain, and their computations could be implemented in biological neurons. Convolutional feedforward networks, which now dominate computer vision, take further inspiration from the architecture of the primate visual hierarchy. However, the current models are designed with engineering goals, not to model brain computations. Nevertheless, initial studies comparing internal representations between these models and primate brains find surprisingly similar representational spaces. With human-level performance no longer out of reach, we are entering an exciting new era, in which we will be able to build biologically faithful feedforward and recurrent computational models of how biological brains perform high-level feats of intelligence, including vision.

/content/journals/10.1146/annurev-vision-082114-035447
2015-11-24
2024-09-08
  • Article Type: Review Article