Abstract

It has long been argued that only humans could produce and understand language. But now, for the first time, artificial language models (LMs) achieve this feat. Here we survey the new purchase LMs are providing on the question of how language is implemented in the brain. We discuss why, a priori, LMs might be expected to share similarities with the human language system. We then summarize evidence that LMs represent linguistic information similarly enough to humans to enable relatively accurate brain encoding and decoding during language processing. Finally, we examine which LM properties—their architecture, task performance, or training—are critical for capturing human neural responses to language and review studies using LMs as in silico model organisms for testing hypotheses about language. These ongoing investigations bring us closer to understanding the representations and processes that underlie our ability to comprehend sentences and express thoughts in language.
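To make concrete what "brain encoding" refers to here, the sketch below shows the general shape of such an analysis: a regularized linear map is fit from a language model's internal representations of a set of stimuli to recorded neural responses to the same stimuli, and predictivity is measured as the cross-validated correlation between predicted and held-out responses. This is a minimal illustrative sketch, not the pipeline of any particular study reviewed; the arrays `embeddings` and `brain` are hypothetical stand-ins for LM hidden states and fMRI voxel responses.

```python
# Minimal, illustrative brain-encoding sketch. Assumptions (hypothetical data):
#   embeddings: (n_stimuli, n_features) LM representations of each stimulus
#   brain:      (n_stimuli, n_voxels) neural responses to the same stimuli
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 768))   # stand-in for LM hidden states
brain = rng.normal(size=(200, 50))         # stand-in for voxel responses

def encoding_score(X, Y, alpha=1.0, n_splits=5):
    """Cross-validated Pearson r between predicted and held-out responses,
    averaged over voxels -- the usual 'brain predictivity' style of metric."""
    fold_scores = []
    for train, test in KFold(n_splits=n_splits, shuffle=True,
                             random_state=0).split(X):
        model = Ridge(alpha=alpha).fit(X[train], Y[train])
        pred = model.predict(X[test])
        # correlate predicted and observed responses per voxel, then average
        r = [np.corrcoef(pred[:, v], Y[test][:, v])[0, 1]
             for v in range(Y.shape[1])]
        fold_scores.append(np.nanmean(r))
    return float(np.mean(fold_scores))

print(f"mean cross-validated predictivity: {encoding_score(embeddings, brain):.3f}")
```

With the random stand-in data above the score hovers near zero; in the studies surveyed, representations from trained LMs yield substantially positive held-out predictivity for responses in the language network.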

