
Abstract

Deep learning has recently come to dominate computational linguistics, leading to claims of human-level performance in a range of language processing tasks. Like much previous computational work, deep learning–based linguistic representations adhere to the distributional meaning-in-use hypothesis, deriving semantic representations from word co-occurrence statistics. However, current deep learning methods entail fundamentally new models of lexical and compositional meaning that are ripe for theoretical analysis. Whereas traditional distributional semantics models take a bottom-up approach in which sentence meaning is characterized by explicit composition functions applied to word meanings, new approaches take a top-down approach in which sentence representations are treated as primary and representations of words and syntax are viewed as emergent. This article summarizes our current understanding of how well such representations capture lexical semantics, world knowledge, and composition. The goal is to foster increased collaboration on testing the implications of such representations as general-purpose models of semantics.
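To make the bottom-up versus top-down contrast concrete, the following is a minimal, hypothetical sketch in Python (toy corpus, numpy only; not drawn from the article itself). It builds count-based word vectors from co-occurrence statistics and derives a sentence representation with an explicit additive composition function, in the spirit of traditional distributional semantics; the comments at the end indicate how top-down contextual models reverse this direction. The `contextual_encoder` mentioned in the closing comment is a placeholder, not a real API.

```python
# Toy sketch contrasting bottom-up composition with top-down contextual encoding.
# Corpus, window size, and composition function are illustrative choices only.
import numpy as np

corpus = [
    "the dog chased the cat",
    "the cat chased the mouse",
    "a dog barked at a cat",
]

# --- Bottom-up distributional semantics ------------------------------------
# 1) Word meanings: vectors of co-occurrence counts within a +/-1 word window.
vocab = sorted({w for sent in corpus for w in sent.split()})
idx = {w: i for i, w in enumerate(vocab)}
counts = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    words = sent.split()
    for i, w in enumerate(words):
        for j in (i - 1, i + 1):
            if 0 <= j < len(words):
                counts[idx[w], idx[words[j]]] += 1

# 2) Sentence meaning: an explicit composition function applied to word vectors
#    (here, simple vector addition).
def compose(sentence: str) -> np.ndarray:
    return sum(counts[idx[w]] for w in sentence.split())

# Additive composition ignores word order, so these two sentences get the
# same representation:
print(np.allclose(compose("the dog chased the cat"),
                  compose("the cat chased the dog")))  # True

# --- Top-down contextual models ---------------------------------------------
# Transformer language models reverse the direction: the whole sentence is
# encoded first, each token's vector is conditioned on all the others, and
# "word meaning" is read off the sentence encoding rather than serving as the
# input to a composition function. Sketched with a hypothetical encoder:
#
#   token_vectors = contextual_encoder("the dog chased the cat")
#   # token_vectors[1] represents "dog" *in this sentence*; the same word
#   # would receive a different vector in "the dog chased the mouse".
```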

