
Abstract

Learnability has traditionally been considered a crucial constraint on theoretical syntax; however, the issues involved have been poorly understood, partly because of the lack of simple learning algorithms for various types of formal grammars. Here I discuss the computational issues involved in learning hierarchically structured grammars from strings of symbols alone. The methods are based on an abstract notion of the derivational context of a syntactic category, which in the most elementary case of context-free grammars leads to learning algorithms based on a form of traditional distributional analysis. Crucially, these techniques can be extended to mildly context-sensitive grammars (and beyond), thus yielding learning methods that can in principle learn classes of grammars powerful enough to represent all natural languages. These learning methods require that the syntactic categories of the grammars be visible in a certain technical sense: They must be well characterized either by the sets of strings that they generate or by the sets of contexts in which they can appear. However, there are still significant gaps between these theoretical results and their direct implementation as models of language acquisition; I discuss how these remaining problems can be overcome.
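To make the notion of distributional analysis concrete, the following Python sketch groups the substrings of a toy corpus by the sets of contexts in which they occur. This is only a minimal illustration of the underlying idea, not an implementation of any published learning algorithm: the toy corpus, the function names, and the use of exact context-set identity as the grouping criterion are all simplifying assumptions.

```python
from collections import defaultdict

def contexts(corpus):
    """Map each non-empty substring to the set of contexts (l, r)
    in which it occurs, i.e., l + substring + r is a corpus sentence.
    Sentences are tuples of word tokens."""
    ctx = defaultdict(set)
    for sentence in corpus:
        n = len(sentence)
        for i in range(n):
            for j in range(i + 1, n + 1):
                ctx[sentence[i:j]].add((sentence[:i], sentence[j:]))
    return ctx

def congruence_classes(corpus):
    """Group substrings that share exactly the same context set:
    a crude approximation of the syntactic categories that a
    distributional learner would posit."""
    classes = defaultdict(list)
    for substring, c in contexts(corpus).items():
        classes[frozenset(c)].append(substring)
    return [subs for subs in classes.values() if len(subs) > 1]

corpus = [
    ("the", "dog", "barks"),
    ("the", "cat", "barks"),
    ("the", "dog", "sleeps"),
    ("the", "cat", "sleeps"),
]
for group in congruence_classes(corpus):
    print(group)
```

On this toy corpus the sketch groups ("dog",) with ("cat",) and ("barks",) with ("sleeps",), along with the corresponding noun-phrase-like and sentence-level units, which is exactly the kind of category structure a linguist would posit. A realistic learner must of course relax exact context-set identity to cope with sparse data, and further machinery is needed to go beyond the context-free case to the mildly context-sensitive grammars discussed in the abstract.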
