
Abstract

What are the core learning algorithms in brains? Nativists propose that intelligence emerges from innate domain-specific knowledge systems, whereas empiricists propose that intelligence emerges from domain-general systems that learn domain-specific knowledge from experience. We address this debate by reviewing digital twin studies designed to reverse engineer the learning algorithms in newborn brains. In digital twin studies, newborn animals and artificial agents are raised in the same environments and tested with the same tasks, permitting direct comparison of their learning abilities. Supporting empiricism, digital twin studies show that domain-general algorithms learn animal-like object perception when trained on the first-person visual experiences of newborn animals. Supporting nativism, digital twin studies show that domain-general algorithms produce innate domain-specific knowledge when trained on prenatal experiences (retinal waves). We argue that learning across humans, animals, and machines can be explained by a universal principle, which we call space-time fitting. Space-time fitting explains both empiricist and nativist phenomena, providing a unified framework for understanding the origins of intelligence.
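The space-time fitting principle can be made concrete with a toy sketch. The code below is illustrative only — it is not a model from the studies reviewed. A random linear encoder and an InfoNCE-style time-contrastive objective stand in for the deep networks and self-supervised objectives used in actual digital twin experiments; the "egocentric video" is a smooth random walk mimicking the temporal continuity of a newborn's first-person visual experience.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "egocentric video": T frames of D-dimensional visual input.
# A smooth random walk mimics the temporal continuity of a newborn's
# first-person visual experience (hypothetical stand-in for real video).
T, D, K = 200, 32, 8
frames = np.cumsum(rng.normal(size=(T, D)), axis=0)

# Hypothetical linear encoder standing in for a deep network.
W = rng.normal(size=(D, K))

def embed(x, W):
    """Project frames and L2-normalize the embeddings."""
    z = x @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

def time_contrastive_loss(z, temperature=0.1):
    """InfoNCE-style objective: each frame's positive is its successor;
    all other frames in the sequence serve as negatives."""
    sim = (z[:-1] @ z[1:].T) / temperature          # pairwise similarities
    sim = sim - sim.max(axis=1, keepdims=True)      # numerical stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))              # diagonal = true successors

loss_smooth = time_contrastive_loss(embed(frames, W))

# Shuffling the frames destroys temporal continuity, so successors are no
# longer especially similar and the objective becomes harder.
loss_shuffled = time_contrastive_loss(embed(frames[rng.permutation(T)], W))
```

On temporally smooth input the loss is far lower than on the shuffled sequence, echoing the controlled-rearing finding that newborn object recognition depends on visual experience with temporally smooth objects.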

2024-09-18
2024-10-05

  • Article Type: Review Article