Abstract

Distributional semantics provides multidimensional, graded, empirically induced word representations that successfully capture many aspects of meaning in natural languages, as shown by a large body of research in computational linguistics; yet, its impact in theoretical linguistics has so far been limited. This review provides a critical discussion of the literature on distributional semantics, with an emphasis on methods and results that are relevant for theoretical linguistics, in three areas: semantic change, polysemy and composition, and the grammar–semantics interface (specifically, the interface of semantics with syntax and with derivational morphology). The goal of this review is to foster greater cross-fertilization of theoretical and computational approaches to language as a means to advance our collective knowledge of how language works.

DOI: 10.1146/annurev-linguistics-011619-030303
2020-01-14
2024-07-15

Article Type: Review Article