Abstract

Modern deep neural networks achieve impressive performance in engineering applications that require extensive linguistic skills, such as machine translation. This success has sparked interest in probing whether these models are inducing human-like grammatical knowledge from the raw data they are exposed to and, consequently, whether they can shed new light on long-standing debates concerning the innate structure necessary for language acquisition. In this article, we survey representative studies of the syntactic abilities of deep networks and discuss the broader implications that this work has for theoretical linguistics.
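
To make the evaluation paradigm concrete, below is a minimal sketch (in Python) of the kind of targeted syntactic test that recurs throughout the surveyed work: a language model is presented with a minimal pair differing only in subject-verb agreement, and it is credited with the item if it assigns higher probability to the grammatical variant. The HuggingFace transformers library, the GPT-2 model, and the example sentences are illustrative assumptions, not the setup of any particular study; much of the work reviewed here evaluated LSTM language models on large, controlled sets of such pairs.

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Illustrative choice of model; the surveyed studies mostly used LSTM LMs.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def sentence_log_prob(sentence: str) -> float:
    """Summed log-probability of a sentence under the language model."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=ids the model returns the mean cross-entropy over the
        # predicted (shifted) tokens; undo the averaging to recover a sum.
        loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.size(1) - 1)

# Agreement across an intervening prepositional phrase ("attractor" noun).
grammatical = "The keys to the cabinet are on the table."
ungrammatical = "The keys to the cabinet is on the table."

# The model "passes" the item if it prefers the grammatical variant.
print(sentence_log_prob(grammatical) > sentence_log_prob(ungrammatical))

Published evaluations aggregate this comparison over thousands of automatically generated or corpus-extracted minimal pairs, systematically varying the number and position of intervening "attractor" nouns to test whether the model tracks hierarchical structure rather than linear proximity.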

