
Abstract

Artificial vision has often been described as one of the key remaining challenges to be solved before machines can act intelligently. Recent developments in a branch of machine learning known as deep learning have catalyzed impressive gains in machine vision—giving a sense that the problem of vision is getting closer to being solved. The goal of this review is to provide a comprehensive overview of recent deep learning developments and to critically assess actual progress toward achieving human-level visual intelligence. I discuss the implications of the successes and limitations of modern machine vision algorithms for biological vision and the prospect for neuroscience to inform the design of future artificial vision systems.

/content/journals/10.1146/annurev-vision-091718-014951
2019-09-15
2024-04-12

Literature Cited

  1. Abadi M, Barham P, Chen J, Chen Z, Davis A et al. 2016. TensorFlow: a system for large-scale machine learning. Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation265–83 Berkeley, CA: USENIX Assoc.
    [Google Scholar]
  2. Abbasi-Asl R, Chen Y, Bloniarz A, Oliver M 2018. The DeepTune framework for modeling and characterizing neurons in visual cortex area V4. bioRxiv 465534
  3. Adelson EH, Bergen JR. 1985. Spatiotemporal energy models for the perception of motion. J. Opt. Soc. Am. A. 2:2284–99
    [Google Scholar]
  4. Agrawal P, Girshick R, Malik J 2014. Analyzing the performance of multilayer neural networks for object recognition. Computer Vision: ECCV 2014329–44 Berlin: Springer
    [Google Scholar]
  5. Antolík J, Hofer SB, Bednar JA, Mrsic-Flogel TD 2016. Model constrained by visual hierarchy improves prediction of neural responses to natural scenes. PLOS Comput. Biol. 12:6e1004927
    [Google Scholar]
  6. Azulay A, Weiss Y. 2018. Why do deep convolutional networks generalize so poorly to small image transformations?. arXiv:1805.12177 [cs.CV]
  7. Badrinarayanan V, Kendall A, Cipolla R 2017. SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39:122481–95
    [Google Scholar]
  8. Baker N, Lu H, Erlikhman G, Kellman PJ 2018. Deep convolutional networks do not classify based on global object shape. PLOS Comput. Biol. 14:12e1006613
    [Google Scholar]
  9. Baldi P. 2018. Deep learning in biomedical data science. Annu. Rev. Biomed. Data Sci. 1:181–205
    [Google Scholar]
  10. Bashivan P, Kar K, DiCarlo JJ 2019. Neural population control via deep image synthesis. Science 364:6439eaav9436
    [Google Scholar]
  11. Batty E, Merel J, Brackbill N, Heitman A, Sher A et al. 2016. Multilayer recurrent network models of primate retinal ganglion cell responses Paper presented at the 5th International Conference on Learning Representations (ICLR) Toulon, France:
  12. Berardino A, Laparra V, Ballé J, Simoncelli E 2017. Eigen-distortions of hierarchical representations. Advances in Neural Information Processing Systems 30 I Guyon, UV Luxburg, S Bengio, H Wallach, R Fergus, S Vishwanathan, R Garnett 3530–39 Red Hook, NY: Curran Assoc.
    [Google Scholar]
  13. Bengio Y, Lee D-H, Bornschein J, Mesnard T, Lin Z 2015. Towards biologically plausible deep learning. arXiv:1502.04156 [cs.LG]
  14. Biederman I. 1987. Recognition-by-components: a theory of human image understanding. Psychol. Rev. 94:2115–47
    [Google Scholar]
  15. Biparva M, Tsotsos J. 2017. STNet: selective tuning of convolutional networks for object localization Paper presented at the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) Venice, Italy:
  16. Brendel W, Rauber J, Bethge M 2017. Decision-based adversarial attacks: reliable attacks against black-box machine learning models Paper presented at the 6th International Conference on Learning Representations (ICLR) Vancouver, BC, Canada:
  17. Brock A, Donahue J, Simonvan K 2019. Large scale GAN training for high fidelity natural image synthesis Paper presented at the 7th International Conference on Learning Representations (ICLR) New Orleans, LA:
  18. Brosch T, Neumann H, Roelfsema PR 2015. Reinforcement learning of linking and tracing contours in recurrent neural networks. PLOS Comput. Biol. 11:10e1004489
    [Google Scholar]
  19. Brown TB, Mané D, Roy A, Abadi M, Gilmer J 2017. Adversarial patch. arXiv:1712.09665 [cs.CV]
  20. Cadena SA, Weis MA, Gatys LA, Bethge M, Ecker AS 2018. Diverse feature visualizations reveal invariances in early layers of deep neural networks. Computer Vision: ECCV 2018 V Ferrari, M Hebert, C Sminchisescu, Y Weiss 225–40 Berlin: Springer
    [Google Scholar]
  21. Cadena SA, Denfield GH, Walker EY, Gatys LA, Tolias AS et al. 2019. Deep convolutional models improve predictions of macaque V1 responses to natural images. PLOS Comput. Biol. 15:4e1006897
    [Google Scholar]
  22. Cadieu CF, Hong H, Yamins DLK, Pinto N, Ardila D et al. 2014. Deep neural networks rival the representation of primate IT cortex for core visual object recognition. PLOS Comput. Biol. 10:12e1003963
    [Google Scholar]
  23. Carreira J, Zisserman A. 2017. Quo vadis, action recognition? A new model and the kinetics dataset. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)4724–33 Piscataway, NJ: IEEE
    [Google Scholar]
  24. Cauchoix M, Crouzet SM, Fize D, Serre T 2016. Fast ventral stream neural activity enables rapid visual categorization. NeuroImage 125:280–90
    [Google Scholar]
  25. Chen L, Zhang H, Xiao J, Nie L, Shao J et al. 2017. SCA-CNN: spatial and channel-wise attention in convolutional networks for image captioning. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)6298–306 Piscataway, NJ: IEEE
    [Google Scholar]
  26. Chen L-C, Papandreou G, Kokkinos I, Murphy K, Yuille AL 2018. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 40:4834–48
    [Google Scholar]
  27. Cho K, van Merrienboer B, Gulcehre C, Bahdanau D, Bougares F et al. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation Paper presented at the 2014 Conference on Empirical Methods in Natural Language Doha, Qatar:
  28. Cichy RM, Khosla A, Pantazis D, Oliva A 2017. Dynamics of scene representations in the human brain revealed by magnetoencephalography and deep neural networks. NeuroImage 153:346–58
    [Google Scholar]
  29. Cichy RM, Khosla A, Pantazis D, Torralba A, Oliva A 2016. Comparison of deep neural networks to spatio-temporal cortical dynamics of human visual object recognition reveals hierarchical correspondence. Sci. Rep. 6:27755
    [Google Scholar]
  30. Cireşan D, Meier U, Masci J, Schmidhuber J 2012. Multi-column deep neural network for traffic sign classification. Neural Netw 32:333–38
    [Google Scholar]
  31. Crick F. 1989. The recent excitement about neural networks. Nature 337:6203129–32
    [Google Scholar]
  32. Dai J, Li Y, He K, Sun J 2016. R-FCN: object detection via region-based fully convolutional networks. Advances in Neural Information Processing Systems 29 DD Lee, M Sugiyama, UV Luxburg, I Guyon, R Garnett 379–87 Red Hook, NY: Curran Assoc.
    [Google Scholar]
  33. Devereux BJ, Clarke A, Tyler LK 2018. Integrated deep visual and semantic attractor neural networks predict fMRI pattern-information along the ventral object processing pathway. Sci. Rep. 8:110636
    [Google Scholar]
  34. DiCarlo JJ, Zoccolan D, Rust NC 2012. How does the brain solve visual object recognition?. Neuron 73:3415–34
    [Google Scholar]
  35. Doersch C, Zisserman A. 2017. Multi-task self-supervised visual learning. Proceedings of the 2017 IEEE International Conference on Computer Vision and Pattern Recognition (CVPR)2070–79 Piscataway, NJ: IEEE
    [Google Scholar]
  36. Donahue J, Jia Y, Vinyals O, Hoffman J, Zhang N et al. 2014. DeCAF: a deep convolutional activation feature for generic visual recognition. Proceedings of the 31st International Conference on Machine Learning647–55 La Jolla, CA: Int. Conf. Machine Learn.
    [Google Scholar]
  37. Dubey R, Peterson J, Khosla A, Yang M-H, Ghanem B 2015. What makes an object memorable?. Proceedings of the 2015 IEEE International Conference on Computer Vision and Pattern Recognition (CVPR)1089–97 Piscataway, NJ: IEEE
    [Google Scholar]
  38. Eberhardt S, Cader JG, Serre T 2016. How deep is the feature analysis underlying rapid visual categorization?. Advances in Neural Information Processing Systems 29 DD Lee, M Sugiyama, UV Luxburg, I Guyon, R Garnett 1100–8 Red Hook, NY: Curran Assoc.
    [Google Scholar]
  39. Eckstein MP, Koehler K, Welbourne LE, Akbas E 2017. Humans, but not deep neural networks, often miss giant targets in scenes. Curr. Biol. 27:182827–32.e3
    [Google Scholar]
  40. Eickenberg M, Gramfort A, Varoquaux G, Thirion B 2017. Seeing it all: Convolutional network layers map the function of the human visual system. NeuroImage 152:184–94
    [Google Scholar]
  41. Ellis K, Solar-Lezama A, Tenenbaum J 2015. Unsupervised learning by program synthesis. Advances in Neural Information Processing Systems 28 C Cortes, ND Lawrence, DD Lee, M Sugiyama, R Garnett 973–81 Red Hook, NY: Curran Assoc.
    [Google Scholar]
  42. Erdogan G, Jacobs RA. 2017. Visual shape perception as Bayesian inference of 3D object-centered shape representations. Psychol. Rev. 124:6740–61
    [Google Scholar]
  43. Erhan D, Bengio Y, Courville A, Vincent P 2009. Visualizing higher-layer features of a deep network Tech. Rep., Univ. Montreal
  44. Eslami SMA, Jimenez Rezende D, Besse F, Viola F, Morcos AS et al. 2018. Neural scene representation and rendering. Science 360:63941204–10
    [Google Scholar]
  45. Eykholt K, Evtimov I, Fernandes E, Li B, Rahmati A et al. 2018. Robust physical-world attacks on deep learning visual classification. Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)1625–34 Piscataway, NJ: IEEE
    [Google Scholar]
  46. Fleuret F, Li T, Dubout C, Wampler EK, Yantis S, Geman D 2011. Comparing machines and humans on a visual categorization test. PNAS 108:4317621–25
    [Google Scholar]
  47. Freedman DJ, Riesenhuber M, Poggio T, Miller EK 2001. Categorical representation of visual stimuli in the primate prefrontal cortex. Science 291:312–16
    [Google Scholar]
  48. Fukushima K. 1980. Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol. Cybern. 36:193–202
    [Google Scholar]
  49. Gatys LA, Ecker AS, Bethge M 2017. Texture and art with deep neural networks. Curr. Opin. Neurobiol. 46:178–86
    [Google Scholar]
  50. Geirhos R, Temme J, Rauber J, Schutt M, Bethge M, Wichmann FA 2018. Generalisation in humans and deep neural networks. Advances in Neural Information Processing Systems 31 S Bengio, H Wallach, H Larochelle, K Grauman, N Cesa-Bianchi, R Garnett 7549–61 Red Hook, NY: Curran Assoc.
    [Google Scholar]
  51. George D, Lehrach W, Kansky K, Lázaro-Gredilla M, Laan C et al. 2017. A generative vision model that trains with high data efficiency and breaks text-based CAPTCHAs. Science 358:6368eaag2612
    [Google Scholar]
  52. Giese MA, Poggio T. 2003. Neural mechanisms for the recognition of biological movements. Nat. Rev. Neurosci. 4:3179–92
    [Google Scholar]
  53. Gilmer J, Adams RP, Goodfellow I, Andersen D, Dahl GE 2018. Motivating the rules of the game for adversarial example research. arXiv:1807.06732 [cs.LG]
  54. Girshick R. 2015. Fast R-CNN. Proceedings of the 2015 IEEE International Conference on Computer Vision and Pattern Recognition (CVPR)1440–48 Piscataway, NJ: IEEE
    [Google Scholar]
  55. Girshick R, Donahue J, Darrell T, Malik J 2014. Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)580–87 Piscataway, NJ: IEEE
    [Google Scholar]
  56. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D et al. 2014. Generative adversarial nets. Advances in Neural Information Processing Systems 27 Z Ghahramani, M Welling, C Cortes, ND Lawrence, KQ Weinberger 2672–80 Red Hook, NY: Curran Assoc.
    [Google Scholar]
  57. Graves A, Wayne G, Reynolds M, Harley T, Danihelka I et al. 2016. Hybrid computing using a neural network with dynamic external memory. Nature 538:7626471–76
    [Google Scholar]
  58. Greene MR, Hansen BC. 2018. Shared spatiotemporal category representations in biological and artificial deep neural networks. PLOS Comput. Biol. 14:7e1006327
    [Google Scholar]
  59. Gross CG, Rocha-Miranda CE, Bender DB 1972. Visual properties of neurons in inferotemporal cortex of the macaque. J. Neurophysiol. 35:196–111
    [Google Scholar]
  60. Güçlü U, van Gerven MAJ 2015. Deep neural networks reveal a gradient in the complexity of neural representations across the ventral stream. J. Neurosci. 35:2710005–14
    [Google Scholar]
  61. Güçlü U, van Gerven MAJ 2017. Increasingly complex representations of natural movies across the dorsal stream are shared between subjects. NeuroImage 145:Pt. B329–36
    [Google Scholar]
  62. Güçlütürk Y, Güçlü U, Seeliger K, Bosch S, van Lier R, van Gerven MAJ 2017. Reconstructing perceived faces from brain activations with deep adversarial neural decoding. Advances in Neural Information Processing Systems 30 I Guyon, UV Luxburg, S Bengio, H Wallach, R Fergus et al.4246–57 Red Hook, NY: Curran Assoc.
    [Google Scholar]
  63. Güler RA, Neverova N, Kokkinos I 2018. DensePose: dense human pose estimation in the wild. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition7297–306 Piscataway, NJ: IEEE
    [Google Scholar]
  64. Hamaguchi R, Fujita A, Nemoto K, Imaizumi T, Hikosaka S 2017. Effective use of dilated convolutions for segmenting small object instances in remote sensing imagery. Proceedings of 2018 IEEE Winter Conference on Applications of Computer Vision art. 00162 Piscataway, NJ: IEEE
    [Google Scholar]
  65. Han K, Wen H, Zhang Y, Fu D, Culurciello E, Liu Z 2018. Deep predictive coding network with local recurrent processing for object recognition. Advances in Neural Information Processing Systems 319221–33 Red Hook, NY: Curran Assoc.
    [Google Scholar]
  66. Hara K, Kataoka H, Satoh Y 2018. Can spatiotemporal 3D CNNs retrace the history of 2D CNNs and ImageNet?. Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)18–22 Piscataway, NJ: IEEE
    [Google Scholar]
  67. Hassabis D, Kumaran D, Summerfield C, Botvinick M 2017. Neuroscience-inspired artificial intelligence. Neuron 95:2245–58
    [Google Scholar]
  68. He K, Gkioxari G, Dollar P, Girshick R 2019. Mask R-CNN. IEEE Trans. Pattern Anal. Mach. Intell. In press
    [Google Scholar]
  69. He K, Zhang X, Ren S, Sun J 2016. Deep residual learning for image recognition. arXiv:1512.03385 [cs.CV]
  70. Heeger DJ, Simoncelli EP, Movshon JA 1996. Computational models of cortical visual processing. PNAS 93:2623–27
    [Google Scholar]
  71. Hénaff OJ, Goris RLT, Simoncelli EP 2019. Perceptual straightening of natural videos. Nat. Neurosci. 22:6984–91
    [Google Scholar]
  72. Hinton GE, Sabour S, Frosst N 2018. Matrix capsules with EM routing Paper presented at the 6th International Conference on Learning Representations Vancouver, Canada:
  73. Hochreiter S, Schmidhuber J. 1997. Long short-term memory. Neural Comput 9:81735–80
    [Google Scholar]
  74. Hong H, Yamins DLK, Majaj NJ, DiCarlo JJ 2016. Explicit information for category-orthogonal object properties increases along the ventral stream. Nat. Neurosci. 19:4613–22
    [Google Scholar]
  75. Horikawa T, Kamitani Y. 2017a. Generic decoding of seen and imagined objects using hierarchical visual features. Nat. Commun. 8:15037
    [Google Scholar]
  76. Horikawa T, Kamitani Y. 2017b. Hierarchical neural representation of dreamed objects revealed by brain decoding with deep neural network features. Front. Comput. Neurosci. 11:4
    [Google Scholar]
  77. Hu J, Shen L, Albanie S, Sun G, Wu E 2019. Squeeze-and-excitation networks. IEEE Trans. Pattern Anal. Mach. Intell. In press. https://doi.org/10.1109/TPAMI.2019.2913372
    [Crossref] [Google Scholar]
  78. Jaderberg M, Dalibard V, Osindero S, Czarnecki WM, Donahue J et al. 2017. Population based training of neural networks. arXiv:1711.09846 [cs.LG]
  79. Jhuang H, Serre T, Wolf L, Poggio T 2007. A biologically inspired system for action recognition. Proceedings of the 2007 IEEE 11th International Conference on Computer Vision and Pattern Recognition (CVPR)1–8 Piscataway, NJ: IEEE
    [Google Scholar]
  80. Jo J, Bengio Y. 2017. Measuring the tendency of CNNs to learn surface statistical regularities. arXiv:1711.11561 [cs.LG]
  81. Johnson J, Hariharan B, van der Maaten L, Li F-F, Zitnick CL, Girshick R 2017. CLEVR: a diagnostic dataset for compositional language and elementary visual reasoning. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)1988–97 Piscataway, NJ: IEEE
    [Google Scholar]
  82. Johnson SP. 2001. Visual development in human infants: binding features, surfaces, and objects. Vis. Cogn. 8:3–5565–78
    [Google Scholar]
  83. Jones JP, Palmer LA. 1987. The two-dimensional spatial structure of simple receptive fields in cat striate cortex. J. Neurophysiol. 58:61187–211
    [Google Scholar]
  84. Jozwik KM, Kriegeskorte N, Storrs KR, Mur M 2017. Deep convolutional neural networks outperform feature-based but not categorical models in explaining object similarity judgments. Front. Psychol. 8:1726
    [Google Scholar]
  85. Kalfas I, Kumar S, Vogels R 2017. Shape selectivity of middle superior temporal sulcus body patch neurons. eNeuro 4:3 ENEURO.0113-17 2017
    [Google Scholar]
  86. Karras T, Aila T, Laine S, Lehtinen J 2018. Progressive growing of GANs for improved quality, stability, and variation Paper presented at the International Conference on Learning Representations Vancouver, BC, Canada:
  87. Kay W, Carreira J, Simonyan K, Zhang B, Hillier C et al. 2017. The Kinetics Human Action Video dataset. arXiv:1705.06950 [cs.CV]
  88. Kellman P, Baker N, Erlikhman G, Lu H 2017. Classification images reveal that deep learning networks fail to perceive illusory contours. J. Vis. 17:10569
    [Google Scholar]
  89. Khaligh-Razavi S-M, Henriksson L, Kay K, Kriegeskorte N 2017. Fixed versus mixed RSA: explaining visual representations by fixed and mixed feature sets from shallow and deep computational models. J. Math. Psychol. 76:Pt. B184–97
    [Google Scholar]
  90. Khaligh-Razavi S-M, Kriegeskorte N. 2014. Deep supervised, but not unsupervised, models may explain IT cortical representation. PLOS Comput. Biol. 10:11e1003915
    [Google Scholar]
  91. Kheradpisheh SR, Ghodrati M, Ganjtabesh M, Masquelier T 2016a. Deep networks can resemble human feed-forward vision in invariant object recognition. Sci. Rep. 6:32672
    [Google Scholar]
  92. Kheradpisheh SR, Ghodrati M, Ganjtabesh M, Masquelier T 2016b. Humans and deep networks largely agree on which kinds of variation make object recognition harder. Front. Comput. Neurosci. 10:92
    [Google Scholar]
  93. Kim JK, Ricci M, Serre T 2018. Not-So-CLEVR: learning same-different relations strains feedforward neural networks. Interface Focus 8:420180011
    [Google Scholar]
  94. Kingma DP, Welling M. 2014. Auto-encoding variational Bayes Paper presented at the International Conference on Learning Representations Banff, Canada:
  95. Klindt D, Ecker AS, Euler T, Bethge M 2017. Neural system identification for large populations separating “what” and “where.”. Advances in Neural Information Processing Systems 30 I Guyon, UV Luxburg, S Bengio, H Wallach, R Fergus et al.3506–16 Red Hook, NY: Curran Assoc.
    [Google Scholar]
  96. Kobatake E, Tanaka K. 1994. Neuronal selectivities to complex object features in the ventral visual pathway of the macaque cerebral cortex. J. Neurophysiol. 71:3856–67
    [Google Scholar]
  97. Kokkinos I. 2017. UberNet: training a universal convolutional neural network for low-, mid-, and high-level vision using diverse datasets and limited memory. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)5454–63 Piscataway, NJ: IEEE
    [Google Scholar]
  98. Kriegeskorte N. 2015. Deep neural networks: a new framework for modeling biological vision and brain information processing. Annu. Rev. Vis. Sci. 1:417–46
    [Google Scholar]
  99. Krizhevsky A, Sutskever I, Hinton GE 2012. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems 25 F Pereira, CJC Burges, L Bottou, KQ Weinberger 1097–105 Red Hook, NY: Curran Assoc
    [Google Scholar]
  100. Kubilius J, Bracci S, Op de Beeck HP 2016. Deep neural networks as a computational model for human shape sensitivity. PLOS Comput. Biol. 12:4e1004896
    [Google Scholar]
  101. Lake BM, Zaremba W, Fergus R, Gureckis TM 2015. Deep neural networks predict category typicality ratings for images. Proceedings of the 37th Annual Conference of the Cognitive Science Society1243–48 Seattle, WA: Cogn. Sci. Soc.
    [Google Scholar]
  102. Landau B, Smith LB, Jones SS 1988. The importance of shape in early lexical learning. Cogn. Dev. 3:3299–321
    [Google Scholar]
  103. LeCun Y, Bottou L, Bengio Y, Haffner P 1998. Gradient-based learning applied to document recognition. Proc. IEEE. 86:112278–324
    [Google Scholar]
  104. Lee K, Zung J, Li P, Jain V, Seung HS 2017. Superhuman accuracy on the SNEMI3D connectomics challenge. arXiv:1706.00120 [cs.CV]
  105. Lee TS, Mumford D, Romero R, Lamme VA 1998. The role of the primary visual cortex in higher level vision. Vis. Res. 38:15–162429–54
    [Google Scholar]
  106. Li N, Dicarlo JJ. 2012. Neuronal learning of invariant object representation in the ventral visual stream is not dependent on reward. J. Neurosci. 32:196611–20
    [Google Scholar]
  107. Lin T-Y, Goyal P, Girshick R, He K, Dollar P 2019. Focal loss for dense object detection. IEEE Trans. Pattern Anal. Mach. Intell. In press. https://doi.org/10.1109/TPAMI.2018.2858826
    [Crossref] [Google Scholar]
  108. Linsley D, Eberhardt S, Sharma T, Gupta P, Serre T 2017. What are the visual features underlying human versus machine vision?. Proceedings of the IEEE ICCV Workshop on the Mutual Benefit of Cognitive and Computer Vision2706–14 Piscataway, NJ: IEEE
    [Google Scholar]
  109. Linsley D, Kim JK, Veerabadran V, Windolf C, Serre T 2018. Learning long-range spatial dependencies with horizontal gated recurrent units. Advances in Neural Information Processing Systems 31 S Bengio, H Wallach, H Larochelle, K Grauman, N Cesa-Bianchi, R Garnett 152–64 Red Hook, NY: Curran Assoc.
    [Google Scholar]
  110. Linsley D, Schiebler D, Eberhardt S, Serre T 2019. Learning what and where to attend Paper presented at the Seventh International Conference on Learning Representations New Orleans, LA:
  111. Logothetis NK, Pauls J, Poggio T 1995. Shape representation in the inferior temporal cortex of monkeys. Curr. Biol. 5:552–63
    [Google Scholar]
  112. Long J, Shelhamer E, Darrell T 2015. Fully convolutional networks for semantic segmentation. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)3431–40 Piscataway, NJ: IEEE
    [Google Scholar]
  113. Lotter W, Kreiman G, Cox D 2016. Deep predictive coding networks for video prediction and unsupervised learning Paper presented at the 5th International Conference on Learning Representations Toulon, France:
  114. Maheswaranathan N, Kastner DB, Baccus SA, Ganguli S 2018. Inferring hidden structure in multilayered neural circuits. PLOS Comput. Biol. 14:8e1006291
    [Google Scholar]
  115. Maninis K-K, Pont-Tuset J, Arbelaez P, Van Gool L 2018. Convolutional oriented boundaries: from image segmentation to high-level tasks. IEEE Trans. Pattern Anal. Mach. Intell. 40:4819–33
    [Google Scholar]
  116. Martinho A, Kacelnik A. 2016. Ducklings imprint on the relational concept of “same or different.”. Science 353:6296286–88
    [Google Scholar]
  117. Mathis A, Mamidanna P, Cury KM, Abe T, Murthy VN et al. 2018. DeepLabCut: markerless pose estimation of user-defined body parts with deep learning. Nat. Neurosci. 21:91281–89
    [Google Scholar]
  118. McIntosh L, Maheswaranathan N, Nayebi A, Ganguli S, Baccus S 2016. Deep learning models of the retinal response to natural scenes. Advances in Neural Information Processing Systems 29 DD Lee, M Sugiyama, UV Luxburg, I Guyon, R Garnett 1369–77 Red Hook, NY: Curran Assoc.
    [Google Scholar]
  119. Miconi T, Clune J, Stanley KO 2018. Differentiable plasticity: training plastic neural networks with backpropagation Proceedings of the 35th International Conference on Machine Learning (ICML2018)3556–65 La Jolla, CA: Int. Conf. Machine Learn.
  120. Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J et al. 2015. Human-level control through deep reinforcement learning. Nature 518:7540529–33
    [Google Scholar]
  121. Moldwin T, Segev I. 2018. Perceptron learning and classification in a modeled cortical pyramidal cell. bioRxiv 464826
  122. Moosavi-Dezfooli S-M, Fawzi A, Fawzi O, Frossard P 2017. Universal adversarial perturbations. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)1765–73 Piscataway, NJ: IEEE
    [Google Scholar]
  123. Morcos AS, Barrett DGT, Rabinowitz NC, Botvinick M 2018. On the importance of single directions for generalization Paper presented at the 6th International Conference on Learning Representations Vancouver, Canada:
  124. Mordvintsev A, Pezzotti N, Schubert L, Olah C 2018. Differentiable image parameterizations. Distill July 25. https://distill.pub/2018/differentiable-parameterizations/
    [Google Scholar]
  125. Nguyen A, Yosinski J, Clune J 2015. Deep neural networks are easily fooled: high confidence predictions for unrecognizable images. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 427–36 Piscataway, NJ: IEEE
    [Google Scholar]
  126. Nosofsky RM. 1987. Attention and learning processes in the identification and categorization of integral stimuli. J. Exp. Psychol. Learn. Mem. Cogn. 13:187–108
    [Google Scholar]
  127. Olah C, Mordvintsev A, Schubert L 2017. Feature visualization. Distill Nov. 7. https://distill.pub/2017/feature-visualization/
    [Google Scholar]
  128. Olah C, Satyanarayan A, Johnson I, Carter S, Schubert L et al. 2018. The building blocks of interpretability. Distill March 6. https://distill.pub/2018/building-blocks/
    [Google Scholar]
  129. Oquab M, Bottou L, Laptev I, Sivic J 2014. Learning and transferring mid-level image representations using convolutional neural networks. Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)1717–24 Piscataway, NJ: IEEE
    [Google Scholar]
  130. O'Reilly RC, Wyatte DR, Rohrlich J 2017. Deep predictive learning: a comprehensive model of three visual streams. arXiv:1709.04654 [q-bio.NC]
  131. Ostrovsky Y, Meyers E, Ganesh S, Mathur U, Sinha P 2009. Visual parsing after recovery from blindness. Psychol. Sci. 20:121484–91
    [Google Scholar]
  132. Papandreou G, Kokkinos I, Savalle P-A 2015. Modeling local and global deformations in Deep Learning: epitomic convolution, Multiple Instance Learning, and sliding window detection. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)390–99 Piscataway, NJ: IEEE
    [Google Scholar]
  133. Parde CJ, Castillo C, Hill MQ, Colon YI, Sankaranarayanan S et al. 2017. Face and image representation in deep CNN features. Proceedings of the 12th IEEE International Conference on Automatic Face Gesture Recognition (FG 2017)673–80 Piscataway, NJ: IEEE
    [Google Scholar]
  134. Park DY, Lee KH. 2019. Arbitrary style transfer with style-attentional. Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)5873–81 Piscataway, NJ: IEEE
    [Google Scholar]
  135. Paszke A, Gross S, Chintala S, Chanan G, Yang E et al. 2017. Automatic differentiation in PyTorch Paper presented at the Neural Information Processing Systems Autodiff Workshop Long Beach, CA:
  136. Peterson JC, Abbott JT, Griffiths TL 2018. Evaluating (and improving) the correspondence between deep neural networks and human representations. Cogn. Sci. 42:82648–69
    [Google Scholar]
  137. Phillips PJ, Yates AN, Hu Y, Hahn CA, Noyes E et al. 2018. Face recognition accuracy of forensic examiners, super-recognizers, and face recognition algorithms. PNAS 115:246171–76
    [Google Scholar]
  138. Ponce CR, Xiao W, Schade PF, Hartmann TS, Kreiman G, Livingstone MS 2019. Evolving images for visual neurons using a deep generative network reveals coding principles and neuronal preferences. Cell 177:4999–1009.e10
    [Google Scholar]
  139. Portilla J, Simoncelli EP. 2000. A parametric texture model based on joint statistics of complex wavelet coefficients. Int. J. Comput. Vis. 40:149–71
    [Google Scholar]
  140. Pramod RT, Arun SP. 2016. Do computational models differ systematically from human object perception?. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)1601–9 Piscataway, NJ: IEEE
    [Google Scholar]
  141. Radovic A, Williams M, Rousseau D, Kagan M, Bonacorsi D et al. 2018. Machine learning at the energy and intensity frontiers of particle physics. Nature 560:771641–48
    [Google Scholar]
  142. Rajalingham R, Issa EB, Bashivan P, Kar K, Schmidt K, DiCarlo JJ 2018. Large-scale, high-resolution comparison of the core visual object recognition behavior of humans, monkeys, and state-of-the-art deep artificial neural networks. J. Neurosci. 38:337255–69
    [Google Scholar]
  143. Recht B, Roelofs R, Schmidt L, Shankar V 2018. Do CIFAR-10 classifiers generalize to CIFAR-10?. arXiv:1806.00451 [cs.LG]
  144. Recht B, Roelofs R, Schmidt L, Shankar V 2019. Do ImageNet classifiers generalize to ImageNet?. arXiv:1902.10811 [cs.CV]
  145. Redmon J, Farhadi A. 2017. YOLO9000: better, faster, stronger. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)6517–25 Piscataway, NJ: IEEE
    [Google Scholar]
  146. Ren S, He K, Girshick R, Sun J 2015. Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems 28 C Cortes, ND Lawrence, DD Lee, M Sugiyama, R Garnett91–99 Red Hook, NY: Curran Assoc.
    [Google Scholar]
  147. Riesenhuber M, Poggio T. 1999. Hierarchical models of object recognition in cortex. Nat. Neurosci. 2:1019–25
    [Google Scholar]
  148. Ritter S, Barrett DGT, Santoro A, Botvinick MM 2017. Cognitive psychology for deep neural networks: a shape bias case study. Proceedings of the 34th International Conference on Machine Learning2940–49 La Jolla, CA: Int. Conf. Machine Learn.
    [Google Scholar]
  149. Ronneberger O, Fischer P, Brox T 2015. U-Net: convolutional networks for biomedical image segmentation. Medical Image Computing and Computer-Assisted Intervention: MICCAI 2015234–41 Berlin: Springer
    [Google Scholar]
  150. Rosenfeld A, Solbach MD, Tsotsos JK 2018a. Totally looks like—how humans compare, compared to machines. arXiv:1803.01485 [cs.CV]
  151. Rosenfeld A, Zemel R, Tsotsos JK 2018b. The elephant in the room. arXiv:1808.03305 [cs.CV]
  152. Russakovsky O, Deng J, Su H, Krause J, Satheesh S et al. 2015. ImageNet Large Scale Visual Recognition Challenge. Int. J. Comput. Vis. 115:211–52
    [Google Scholar]
  153. Sabour S, Frosst N, Hinton GE 2017. Dynamic routing between capsules. Advances in Neural Information Processing Systems 30, ed. I Guyon, UV Luxburg, S Bengio, H Wallach, R Fergus, S Vishwanathan, R Garnett, pp. 3859–69. Red Hook, NY: Curran Assoc.
  154. Salakhutdinov R, Hinton G. 2009. Deep Boltzmann machines. Proceedings of the 12th International Conference on Artificial Intelligence and Statistics, ed. D van Dyk, M Welling, pp. 448–55. New York: ACM
  155. Saleh B, Elgammal AM, Feldman J, Farhadi A 2016. Toward a taxonomy and computational models of abnormalities in images. Proceedings of the 30th AAAI Conference on Artificial Intelligence, pp. 3588–96. Palo Alto, CA: Assoc. Adv. Artif. Intell.
  156. Santoro A, Raposo D, Barrett DGT, Malinowski M, Pascanu R et al. 2017. A simple neural network module for relational reasoning. Advances in Neural Information Processing Systems 30, ed. I Guyon, UV Luxburg, S Bengio, H Wallach, R Fergus, S Vishwanathan, R Garnett, pp. 4974–83. Red Hook, NY: Curran Assoc.
  157. Scellier B, Bengio Y. 2017. Equilibrium propagation: bridging the gap between energy-based models and backpropagation. Front. Comput. Neurosci. 11:24
  158. Segler MHS, Preuss M, Waller MP 2018. Planning chemical syntheses with deep neural networks and symbolic AI. Nature 555(7698):604–10
  159. Serre T. 2015. Hierarchical models of the visual system. Encyclopedia of Computational Neuroscience, ed. D Jaeger, R Jung, pp. 1309–18. Berlin: Springer
  160. Sharif M, Bhagavatula S, Bauer L, Reiter MK 2019. A general framework for adversarial examples with objectives. ACM Trans. Priv. Secur. 22:36
  161. Silver D, Huang A, Maddison CJ, Guez A, Sifre L et al. 2016. Mastering the game of Go with deep neural networks and tree search. Nature 529(7587):484–89
  162. Silver D, Hubert T, Schrittwieser J, Antonoglou I, Lai M et al. 2018. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 362(6419):1140–44
  163. Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A et al. 2017. Mastering the game of Go without human knowledge. Nature 550(7676):354–59
  164. Simonyan K, Zisserman A. 2014. Two-stream convolutional networks for action recognition in videos. Advances in Neural Information Processing Systems 27, ed. Z Ghahramani, M Welling, C Cortes, ND Lawrence, KQ Weinberger, pp. 568–76. Red Hook, NY: Curran Assoc.
  165. Simonyan K, Zisserman A. 2015. Very deep convolutional networks for large-scale image recognition. Paper presented at the 3rd International Conference on Learning Representations, San Diego, CA
  166. Spoerer CJ, McClure P, Kriegeskorte N 2017. Recurrent convolutional neural networks: a better model of biological object recognition. Front. Psychol. 8:1551
  167. Stabinger S, Rodríguez-Sánchez A, Piater J 2016. 25 years of CNNs: Can we compare to human abstraction capabilities? Artificial Neural Networks and Machine Learning: ICANN 2016, pp. 380–87. Berlin: Springer
  168. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S et al. 2015. Going deeper with convolutions. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–9. Piscataway, NJ: IEEE
  169. Szegedy C, Zaremba W, Sutskever I 2013. Intriguing properties of neural networks. arXiv:1312.6199 [cs.CV]
  170. Tang H, Schrimpf M, Lotter W, Moerman C, Paredes A et al. 2018. Recurrent computations for visual pattern completion. PNAS 115(35):8835–40
  171. Tonegawa S, Pignatelli M, Roy DS, Ryan TJ 2015. Memory engram storage and retrieval. Curr. Opin. Neurobiol. 35:101–9
  172. Torralba A, Efros AA. 2011. Unbiased look at dataset bias. Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1521–28. Piscataway, NJ: IEEE
  173. Tosic I, Frossard P. 2011. Dictionary learning. IEEE Signal Process. Mag. 28(2):27–38
  174. Tran D, Bourdev L, Fergus R, Torresani L, Paluri M 2015. Learning spatiotemporal features with 3D convolutional networks. Proceedings of the 2015 IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4489–97. Piscataway, NJ: IEEE
  175. Ullman S, Assif L, Fetaya E, Harari D 2016. Atoms of recognition in human and computer vision. PNAS 113(10):2744–49
  176. Van Horn G, Mac Aodha O, Song Y, Cui Y, Sun C et al. 2018. The iNaturalist species classification and detection dataset. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8769–78. Piscataway, NJ: IEEE
  177. Vinyals O, Blundell C, Lillicrap T, Kavukcuoglu K, Wierstra D 2016. Matching networks for one shot learning. Advances in Neural Information Processing Systems 29, ed. DD Lee, M Sugiyama, UV Luxburg, I Guyon, R Garnett, pp. 3630–38. Red Hook, NY: Curran Assoc.
  178. Wainberg M, Merico D, Delong A, Frey BJ 2018. Deep learning in biomedicine. Nat. Biotechnol. 36(9):829–38
  179. Wang F, Jiang M, Qian C, Yang S, Li C et al. 2017. Residual attention network for image classification. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6450–58. Piscataway, NJ: IEEE
  180. Wang J, Zhang Z, Xie C, Zhou Y, Premachandran V et al. 2018. Visual concepts and compositional voting. Ann. Math. Sci. Appl. 3(1):151–88
  181. Wang T, Sun M, Hu K 2017. Dilated deep residual network for image denoising. Proceedings of the 29th IEEE International Conference on Tools with Artificial Intelligence (ICTAI), pp. 1272–79. Piscataway, NJ: IEEE
  182. Whittington JCR, Bogacz R. 2017. An approximation of the error backpropagation algorithm in a predictive coding network with local Hebbian synaptic plasticity. Neural Comput. 29(5):1229–62
  183. Xie S, Girshick R, Dollár P, Tu Z, He K 2017. Aggregated residual transformations for deep neural networks. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5987–95. Piscataway, NJ: IEEE
  184. Xie S, Tu Z. 2017. Holistically-nested edge detection. Int. J. Comput. Vis. 125:13–18
  185. Yamins DLK, DiCarlo JJ. 2016. Using goal-driven deep learning models to understand sensory cortex. Nat. Neurosci. 19(3):356–65
  186. Yamins DLK, Hong H, Cadieu CF, Solomon EA, Seibert D, DiCarlo JJ 2014. Performance-optimized hierarchical models predict neural responses in higher visual cortex. PNAS 111(23):8619–24
  187. Yu F, Koltun V. 2016. Multi-scale context aggregation by dilated convolutions. Paper presented at the 4th International Conference on Learning Representations (ICLR), San Juan, Puerto Rico
  188. Zeiler MD, Fergus R. 2014. Visualizing and understanding convolutional networks. Computer Vision: ECCV 2014, pp. 818–33. Berlin: Springer
  189. Zhang C, Bengio S, Hardt M, Recht B, Vinyals O 2016. Understanding deep learning requires rethinking generalization. Paper presented at the 4th International Conference on Learning Representations, Toulon, France, April 24–26
  190. Zhang R, Isola P, Efros AA 2016. Colorful image colorization. Computer Vision: ECCV 2016, pp. 649–66. Berlin: Springer
  191. Zhou B, Bau D, Oliva A, Torralba A 2019. Interpreting deep visual representations via network dissection. IEEE Trans. Pattern Anal. Mach. Intell. 41(9):2131–45
  192. Zhou B, Lapedriza A, Xiao J, Torralba A, Oliva A 2014. Learning deep features for scene recognition using Places Database. Advances in Neural Information Processing Systems 27, ed. Z Ghahramani, M Welling, C Cortes, ND Lawrence, KQ Weinberger, pp. 487–95. Red Hook, NY: Curran Assoc.
  193. Zhou Z, Firestone C. 2019. Humans can decipher adversarial images. Nat. Commun. 10(1):1334
  194. Zhu J-Y, Park T, Isola P, Efros AA 2017. Unpaired image-to-image translation using cycle-consistent adversarial networks. Paper presented at the IEEE International Conference on Computer Vision (ICCV), Venice, Italy
  • Article Type: Review Article