Abstract
Over the past 15 years, there has been great success in using linguistically annotated sentence collections, such as the Penn Treebank (PTB), to construct statistically based parsers. This success leads naturally to the question of the extent to which such systems acquire full “knowledge of language” in the conventional linguistic sense. This chapter addresses that question. It assesses the knowledge attained by several current statistically trained parsers in the areas of tense marking, questions, English passives, and the acquisition of “unnatural” language constructions, extending previous results showing that boosting training data with targeted examples can, in certain cases, improve performance, but also indicating that such systems may be too powerful, in the sense that they can learn “unnatural” language patterns. Going beyond this, the chapter advances a general approach to incorporating linguistic knowledge by means of “linguistic regularization” to canonicalize predicate-argument structure, and so improve statistical training and parser performance.
Notes
- 1.
We note that there have been recent proposals suggesting that “linguistic mastery does not need to be available early in the course of language development” and that “the acquisition of usage-based and fixed-form patterns can account for … [the] syntactic burst [occurring around age two to three]” [39]. It is uncontroversial that some fixed-form patterns are memorized by children, and equally that complete linguistic mastery of syntax is delayed until the age of eight or later, as first established by the work of Carol Chomsky [10]. However, while such mastery “need not” be “available early,” it has in fact long been established empirically that ‘telegraphic speech’ is not indicative of the full scope of syntactic comprehension at ages 2–3; rather, many aspects of syntax are acquired by this age, but telegraphic speech does not reveal these abilities, instead reflecting processing difficulties such as memory limitations [20, 47].
- 2.
As noted in [41] and [48], although statistically-based parsers have used both sorts of estimation methods, the underlying statistical models for generative approaches and for discriminative approaches using so-called “latent variables” – probabilistic and weighted context-free grammars, respectively – turn out to be equivalent in their expressive power.
- 3.
See, e.g., [9] and [2] for additional discussion of the absence of counting and palindromic rules in natural language, including syntax and phonology. Palindromic forms are known to be used in certain sociological settings, e.g., the Australian butchers’ market language, but all indications are that such behavior remains “puzzle based.”
- 4.
We attempted to match the training settings of the parsers’ “pre-built” models as far as possible. For example, we used the settings provided in the Stanford parser directory under makeSerialized.csh for the so-called wsjPCFG model. In the case of the BC-M2 parser, we used the settings given in collins.properties, since we wanted to ensure replicability with standard results.
- 5.
The full database was obtained by download from http://www.computing.dcu.ie/~jjudge/qtreebank/. A handful of errors in corpus annotation were corrected in this downloaded dataset.
- 6.
As noted in Sect. 2, we tested both the Berkeley parser’s pre-built eng_sm5 grammar and our own retrained version, which carried out six split-merge iterations. The results did not change. The results also remained the same when we used the Berkeley parser’s -accurate switch. In general, results did not change for any of the parsers when we substituted stock or should for will. Note that here the Berkeley parser is using its own part-of-speech tagger. If we forced it to use “gold standard” part-of-speech tags, it could not fail in the manner we have described. However, we wanted to examine the parser’s own performance, not that of some exogenous part-of-speech tagger.
- 7.
For CJ-I we selected the “best” parse (the one with the highest likelihood score) from the output of the CJ-I parser. In fact, in several cases the 2nd-best parse tree turned out to be the correct one; this was true, for instance, for sentence 4(h). On the other hand, just as often the best parse was correct and the 2nd-best parse was incorrect, as in example 4(a). Note that the CJ-I parser serves as input to the CJ-R re-ranking parser, which takes, e.g., the top-50 most likely parses and then re-sorts them according to a discriminatively weighted feature-based scheme, using features such as the degree of right-branching or conjunct parallelism. Since the top 50 parses usually included the correct answer, the re-ranking parser at least had a chance of selecting the correct answer in each case. Even so, re-ranking was ineffective and did not change the outcome for any of the sentence examples here. See [6] for details about this re-ranking parser.
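The n-best re-ranking scheme described above can be sketched in miniature: each candidate parse receives a score that is a weighted sum of its feature values, and the highest-scoring candidate is returned. This is only a toy illustration of the general technique, not the CJ-R implementation; the feature names, values, and weights below are hypothetical.

```python
# Toy sketch of n-best discriminative re-ranking: score each candidate
# parse as a weighted sum of its features, then return the argmax.

def rerank(candidates, weights):
    """Pick the candidate parse whose feature vector scores highest
    under the given feature weights (missing features count as 0)."""
    def score(parse):
        return sum(weights.get(name, 0.0) * value
                   for name, value in parse["features"].items())
    return max(candidates, key=score)

# Hypothetical 3-best list for one sentence; feature names and values
# are illustrative only.
nbest = [
    {"tree": "(S ...)",    "features": {"log_prob": -20.1, "right_branching": 3}},
    {"tree": "(SQ ...)",   "features": {"log_prob": -20.4, "right_branching": 5}},
    {"tree": "(FRAG ...)", "features": {"log_prob": -25.0, "right_branching": 1}},
]
weights = {"log_prob": 1.0, "right_branching": 0.5}

best = rerank(nbest, weights)
print(best["tree"])  # the re-ranker can overturn the 1-best parse
```

Note how the re-ranker can select a candidate other than the one with the highest raw likelihood: with these weights, the 2nd-best tree by log-probability wins once the right-branching feature is added in.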
- 8.
The remaining examples are some simple S’s and a few newswire stories. The authors would like to thank C. Manning for generously sharing these additional examples with us.
- 9.
We put to one side the question of carrying out fMRI experiments on computers.
References
Abney, S. (1996). Statistical methods and linguistics. In J. Klavans, & P. Resnik (Eds.), The balancing act: Combining symbolic and statistical approaches to language (pp. 1–26). Cambridge/Massachusetts: MIT Press.
Berwick, R. C., & Weinberg, A. S. (1982). The grammatical basis of linguistic performance. Cambridge: MIT Press.
Bikel, D. (2004a). On the Parameter Space of Generative Lexicalized Statistical Parsing Models. Ph.D. Thesis, University of Pennsylvania, Department of Computer Science.
Bikel, D. M. (2004b). Intricacies of Collins’ parsing model. Computational Linguistics, 30(4), 479–511.
Charniak, E. (2000). A maximum-entropy inspired parser. In Proceedings of the First Meeting of the North American Chapter of the Association for Computational Linguistics (pp. 132–139), Seattle. Association for Computational Linguistics.
Charniak, E., & Johnson, M. (2005). Coarse to fine n-best parser and maxent discriminative reranking. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (pp. 173–180), Ann Arbor. East Stroudsburg: Association for Computational Linguistics.
Chiang, D., & Bikel, D. M. (2002). Recovering latent information in treebanks. In Proceedings of the 19th International Conference on Computational Linguistics (pp. 183–189), Howard International, Taipei.
Chomsky, N. (1957). Syntactic structures. The Hague: Mouton.
Chomsky, N. (1968). Language and mind. New York: Harcourt-Brace.
Chomsky, C. (1969). The acquisition of syntax in children from 5 to 10. Cambridge: MIT Press.
Clark, S., & Curran, J. (2007). Wide-coverage efficient statistical parsing with CCG and log-linear models. Computational Linguistics, 33(4), 493–552.
Clark, A., & Lappin, S. (2009). Another look at indirect negative evidence. In Proceedings of the EACL 2009 Workshop on Cognitive Aspects of Computational Language Acquisition (pp. 26–33), Athens. Association for Computational Linguistics.
Clegg, A. B. (2008). Computational-linguistic approaches to biomedical text mining. Ph.D. thesis, Birkbeck College, University of London.
Collins, M. (1997). Three generative, lexicalized models for statistical parsing. In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics (pp. 16–23), Madrid. Association for Computational Linguistics.
Collins, M. (1999). Head-driven statistical models for natural language parsing. Ph.D. thesis, University of Pennsylvania.
Collins, M. (2003). Head-driven statistical models for natural language parsing. Computational Linguistics, 29(4), 589–637.
Crain, S., & Nakayama, M. (1987). Structure dependence in grammar formation. Language, 63, 522–543.
Curran, J., Clark, S., & Bos, J. (2007). Linguistically motivated large-scale NLP with C&C and Boxer. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions (pp. 33–36), Prague, Czech Republic: Association for Computational Linguistics.
Eisner, J. (2001). Smoothing a probabilistic Lexicon via syntactic transformations. Ph.D. thesis, University of Pennsylvania.
Gleitman, L., Gleitman, H., & Shipley, E. (1972). The emergence of the child as grammarian. Cognition, 1(2–3), 137–164.
Hale, K., & Keyser, S. (1993). On argument structure and the lexical representation of syntactic relations. In K. Hale, & S. Keyser (Eds.), The view from building 20 (pp. 53–110). Cambridge: MIT Press.
Hockenmaier, J. (2003a). Data and Models for Statistical Parsing with Combinatory Categorial Grammar. Doctoral Dissertation, University of Edinburgh.
Hockenmaier, J. (2003b). Parsing with generative models of predicate-argument structure. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (pp. 359–366), Sapporo, Japan: Association for Computational Linguistics.
Jackendoff, R. (1999). Why can’t computers use English? New York: Linguistic Society of America (LSA) Publications.
Johnson, M. (1998). PCFG models of linguistic tree representations. Computational Linguistics, 24(4), 613–632.
Judge, J., Cahill, A., & van Genabith, J. (2006). Questionbank: Creating a corpus of parse-annotated questions. In Proceedings of the 44th Annual Meeting of the Association for Computational Linguistics (pp. 497–504), Sydney, Australia: Association for Computational Linguistics.
Klein, D., & Manning, C. (2003a). Accurate unlexicalized parsing. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (pp. 423–430), Sapporo. East Stroudsburg: Association for Computational Linguistics.
Klein, D., & Manning, C. (2003b). Fast exact inference with a factored model for natural language parsing. In Advances in Neural Information Processing Systems (pp. 3–10), Cambridge.
Lappin, S., & Shieber, S. M. (2007). Machine learning theory and practice as a source of insight into universal grammar. Journal of Linguistics, 43(2), 393–427.
Levy, R. (2006). Probabilistic models of word order and syntactic discontinuity. Ph.D. thesis, Stanford University.
Levy, R., & Andrew, G. (2006). Tregex and tsurgeon: Tools for querying and manipulating tree data structures. In Proceedings of the Fifth International Conference on Language Resources and Evaluation, Genoa.
Levy, R., & Manning, C. D. (2004). Deep dependencies from context-free statistical parsers: Correcting the surface dependency approximation. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (pp. 327–334). East Stroudsburg: Association for Computational Linguistics.
Marcus, G. (2003). The algebraic mind. Cambridge: MIT Press.
Marcus, M. P., Santorini, B., & Marcinkiewicz, M. A. (1994). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2), 313–330.
Morgan, J., Meier, R., & Newport, E. (2004). Facilitating the acquisition of syntax with cross-sentential cues to phrase structure. Journal of Memory and Language, 28(3), 360–374.
Musso, M., Moro, A., Glauche, V., Rijntjes, M., Reichenbach, J., Buchel, C., & Weiller, C. (2003). Broca’s area and the language instinct. Nature Neuroscience, 6, 774–781.
Nivre, J., Hall, J., Nilsson, J., Chanev, A., Eryigit, G., Kubler, S., Marinov, S., & Marsi, E. (2007). MaltParser: A language-independent system for data-driven dependency parsing. Natural Language Engineering, 13(2), 95–135.
Nivre, J., Rimell, L., McDonald, R., & Rodriguez, C. G. (2010). Evaluation of dependency parsers on unbounded dependencies. In Proceedings of the 23rd International Conference on Computational Linguistics, Beijing. International Association for Computational Linguistics.
Parisse, C. (2012). Rethinking the syntactic burst in young children. In A. Alishahi, T. Poibeau, A. Korhonen, & A. Villavicencio (Eds.), Cognitive aspects of computational language acquisition. New York: Springer.
Petrov, S., & Klein, D. (2007). Learning and inference for hierarchically split PCFGs. In AAAI 2007 Nectar Track, Washington. AAAI.
Petrov, S., & Klein, D. (2008). Sparse multi-scale grammars for discriminative latent variable parsing. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (pp. 867–876), Honolulu. Association for Computational Linguistics.
Petrov, S., Barrett, L., Thibaux, R., & Klein, D. (2006). Learning accurate, compact, and interpretable tree annotation. In Proceedings of the 44th Annual Meeting of the Association for Computational Linguistics (pp. 433–440), Sydney, Australia: Association for Computational Linguistics.
Riezler, S., King, T. H., Kaplan, R. M., Crouch, R., Maxwell, J. T. I., & Johnson, M. (2002). Parsing the wall street journal using a lexical-functional grammar and discriminative estimation techniques. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL-02) (pp. 271–278), Philadelphia, PA: Association for Computational Linguistics.
Rimell, L., Clark, S., & Steedman, M. (2009). Unbounded dependency recovery for parser evaluation. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (pp. 813–821), Singapore: Association for Computational Linguistics.
Saffran, J., Aslin, R., & Newport, E. (1996). Statistical learning by 8-month-old infants. Science, 274(5294), 1926–1928.
Sekine, S., & Collins, M. (2008). The EVALB program.
Shipley, E., Smith, C., & Gleitman, L. (1969). A study in the acquisition of language: Free responses to commands. Language, 45, 322–343.
Smith, N., & Johnson, M. (2007). Weighted and context-free grammars are equally expressive. Computational Linguistics, 33(4), 477–491.
Smith, N., Tsimpli, I.-M., & Ouhalla, J. (1993). Learning the impossible: The acquisition of possible and impossible languages by a polyglot savant. Lingua, 91, 279–347.
Smith, N. A., & Eisner, J. (2005). Guiding unsupervised grammar induction using contrastive estimation. In International Joint Conference on Artificial Intelligence (IJCAI) Workshop on Grammatical Inference Applications (pp. 73–82), Edinburgh, Scotland: Association for Computational Linguistics.
Tateisi, Y., Yakushiji, A., Ohta, T., & Tsujii, J. (2005). Syntax annotation for the GENIA corpus. In Proceedings of the International Joint Conference on Natural Language Processing (pp. 222–227), Jeju Island, Korea: Association for Computational Linguistics.
Turian, J., & Melamed, I. D. (2006). Advances in discriminative parsing. In Proceedings of the 44th Annual Meeting of the Association for Computational Linguistics (pp. 873–880), Sydney, Australia: Association for Computational Linguistics.
Wexler, K., & Culicover, P. (1983). Formal principles of language acquisition. Cambridge: MIT Press.
Acknowledgements
We would like to thank Michael Coen and Ali Mohammed for assistance and valuable suggestions. More importantly, we would like to extend special thanks to those individuals who have graciously made their parsing systems publicly available for open experimentation, in particular Daniel Bikel and Michael Collins; John Judge for his extremely valuable QBank resource and his generosity in providing it to us; Mark Johnson and Eugene Charniak; the members of the Stanford NLP group, including Daniel Klein and Christopher Manning; the Berkeley NLP group, including Slav Petrov and Daniel Klein; and the Malt and C&C parser developers. Without their generosity, analyses like those carried out here would be impossible. Finally, we would like to acknowledge two anonymous reviewers whose suggestions greatly improved this work.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Fong, S., Malioutov, I., Yankama, B., Berwick, R.C. (2013). Treebank Parsing and Knowledge of Language. In: Villavicencio, A., Poibeau, T., Korhonen, A., Alishahi, A. (eds) Cognitive Aspects of Computational Language Acquisition. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31863-4_6
Print ISBN: 978-3-642-31862-7
Online ISBN: 978-3-642-31863-4