
Treebank Parsing and Knowledge of Language

  • Chapter in Cognitive Aspects of Computational Language Acquisition

Abstract

Over the past 15 years, there has been great success in using linguistically annotated sentence collections, such as the Penn Treebank (PTB), to construct statistically based parsers. This success leads naturally to the question of the extent to which such systems acquire full “knowledge of language” in a conventional linguistic sense. This chapter addresses that question. It assesses the knowledge attained by several current statistically trained parsers in the areas of tense marking, questions, English passives, and the acquisition of “unnatural” language constructions, extending previous results showing that boosting training data via targeted examples can, in certain cases, improve performance, but also indicating that such systems may be too powerful, in the sense that they can learn “unnatural” language patterns. Going beyond this, the chapter advances a general approach for incorporating linguistic knowledge by means of “linguistic regularization,” which canonicalizes predicate-argument structure and so improves statistical training and parser performance.
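
To give a rough sense of what “linguistic regularization” of predicate-argument structure can involve, the sketch below maps a toy passive parse onto a canonical predicate-argument triple. It is a minimal illustration only, assuming NLTK-style trees and a single hand-written rule; the tree shape, labels, and helper function are simplifications for exposition, not the chapter’s actual transformation machinery.

```python
# Minimal sketch: "regularize" a simple passive clause into a canonical
# predicate-argument triple (predicate, agent, patient). The tree layout and
# the single rule below are illustrative assumptions, not the chapter's rules.
from nltk.tree import Tree

def regularize_passive(tree):
    """Map an NP-was-VBN-by-NP passive onto (predicate, agent, patient)."""
    subj = next(tree.subtrees(lambda t: t.label() == "NP"))    # surface subject
    vp = next(tree.subtrees(lambda t: t.label() == "VP"))      # outer VP ("was ...")
    verb = next(vp.subtrees(lambda t: t.label() == "VBN"))     # passive participle
    by_np = next(vp.subtrees(lambda t: t.label() == "PP"))[1]  # NP inside the by-phrase
    return (verb.leaves()[0],
            " ".join(by_np.leaves()),                          # logical subject (agent)
            " ".join(subj.leaves()))                           # logical object (patient)

passive = Tree.fromstring(
    "(S (NP (DT the) (NN ball)) "
    "(VP (VBD was) (VP (VBN hit) (PP (IN by) (NP (NNP John))))))"
)
print(regularize_passive(passive))   # ('hit', 'John', 'the ball')
```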


Notes

  1. We note that there have been recent proposals suggesting that “linguistic mastery does not need to be available early in the course of language development” and that “the acquisition of usage-based and fixed-form patterns can account for [the] syntactic burst [occurring around age two to three]” [39]. It is uncontroversial that some fixed-form patterns are memorized by children, and equally that complete linguistic mastery of syntax is delayed until the age of eight or later, as first established by the work of Carol Chomsky [10]. However, while such mastery “need not” be “available early”, it has in fact long been established empirically that ‘telegraphic speech’ is not indicative of the full scope of syntactic comprehension at ages 2–3; rather, many aspects of syntax are acquired by this age, but telegraphic speech does not reveal these abilities, instead reflecting processing difficulties such as memory limitations [20, 47].

  2. As noted in [41] and [48], although statistically based parsers have used both sorts of estimation methods, the underlying statistical models for generative approaches and for discriminative approaches using so-called “latent variables” – probabilistic and weighted context-free grammars, respectively – turn out to be equivalent in their expressive power.
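
For concreteness, one direction of that equivalence rests on a standard renormalization argument; the sketch below states it for weighted grammars whose partition functions are finite (a paraphrase of the textbook construction, not a derivation reproduced from [48]):

```latex
% Sketch: renormalizing a weighted CFG (rule weights \theta, partition
% functions Z) into a PCFG defining the same distribution over trees,
% assuming every Z(A) is finite.
\[
  p(A \to B\,C) \;=\; \frac{\theta(A \to B\,C)\,Z(B)\,Z(C)}{Z(A)},
  \qquad
  p(A \to w) \;=\; \frac{\theta(A \to w)}{Z(A)},
\]
% where Z(A) is the total weight of all derivations rooted in A. The rule
% probabilities for each nonterminal A sum to one, and every tree's weight is
% preserved up to the constant factor 1/Z(S), so the two models score and rank
% parses identically.
```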

  3. See, e.g., [9] and [2] for additional discussion of the absence of counting and palindromic rules in natural language, including syntax and phonology. Palindromic forms are known to be used in certain sociological settings, e.g., the Australian butchers’ market language, but all indications are that such behavior remains “puzzle based.”
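
To make concrete what a “palindromic” (mirror-image) pattern looks like, the toy sketch below recognizes word strings that read the same forwards and backwards, the sort of construction natural-language rules do not appear to exploit even though context-free grammars generate it easily. This is an illustration of the concept only, not one of the chapter’s experimental items.

```python
# Toy illustration: a mirror-image ("palindromic") word pattern, easy to state
# as a context-free rule but apparently absent from natural-language syntax
# and phonology.
def is_mirror_image(words):
    """True if the word sequence reads the same forwards and backwards."""
    words = list(words)
    return words == words[::-1]

print(is_mirror_image("dogs chase cats cats chase dogs".split()))  # True
print(is_mirror_image("dogs chase cats".split()))                  # False
```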

  4. We attempted to use training settings that matched those for the parsers’ “pre-built” models as far as possible. For example, we used the settings provided in the Stanford parser directory under makeSerialized.csh for the so-called wsjPCFG model. In the case of the BC-M2 parser, we used the settings given by collins.properties, since we wanted to ensure replicability with standard results.

  5. The full database was obtained by download from http://www.computing.dcu.ie/~jjudge/qtreebank/. A handful of errors in corpus annotation were corrected in this downloaded dataset.
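
As a minimal sketch of the sort of sanity check one might run over such a downloaded resource (the filename and the one-tree-per-line layout are assumptions; the corrections mentioned above were made by inspection, not by this script):

```python
# Minimal sketch: scan a file of bracketed trees for entries that fail to parse
# (e.g. unbalanced brackets). Assumes one tree per line, which may not match
# the actual QuestionBank file layout; the path below is a placeholder.
from nltk.tree import Tree

def find_malformed_trees(path="qtreebank.trees"):
    bad = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                Tree.fromstring(line)       # raises ValueError on malformed input
            except ValueError as err:
                bad.append((lineno, str(err)))
    return bad
```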

  6. As noted in Sect. 2, we tested both the Berkeley parser’s pre-built eng_sm5 grammar and our own retrained version that carried out six split-merge iterations. The results did not change. The results also remained the same when we used the Berkeley parser’s -accurate switch. In general, results did not change for any of the parsers when we substituted stock or should for will. Note that here the Berkeley parser is using its own part-of-speech tagger. If we forced it to use “gold standard” part-of-speech tags, it could not fail in the manner we have described; however, we wanted to examine the parser’s own performance, not that of an exogenous part-of-speech tagger.
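
The substitution test mentioned here amounts to re-parsing the same sentence frame with different words in the critical slot and comparing the resulting analyses. A minimal sketch, in which the `parse_sentence` wrapper and the example template are hypothetical placeholders rather than the chapter’s actual parser interface or test items:

```python
# Minimal sketch of a lexical-substitution probe: parse the same frame with
# different fillers in the critical slot and collect the analyses so they can
# be compared. `parse_sentence` is a hypothetical wrapper around whichever
# parser is under test; the template in the usage note is a made-up example.
def probe(parse_sentence, template, substitutes=("will", "should", "stock")):
    return {word: parse_sentence(template.format(word)) for word in substitutes}

# Example use (hypothetical parser wrapper and sentence frame):
#   analyses = probe(berkeley_parse, "The fund {} probably rise .")
#   for word, tree in analyses.items():
#       print(word, tree)
```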

  7. For CJ-I we selected the “best” parse (the one with the highest likelihood score) from the output of the CJ-I parser. In several cases, the 2nd-best parse tree turned out to be the correct one; this was true, for instance, for sentence 4(h). On the other hand, just as often the best parse was correct and the 2nd-best parse was incorrect, as in example 4(a). Note that the CJ-I parser serves as input to the CJ-R re-ranking parser, which takes, e.g., the top 50 most likely parses and re-sorts them according to a discriminatively weighted, feature-based scheme using features such as the degree of right-branching or conjunct parallelism. Since the top 50 parses usually included the correct answer, the re-ranking parser at least had a chance of selecting the correct answer in each case. Even so, re-ranking was ineffective and did not change the outcome for any of the sentence examples here. See [6] for details about this re-ranking parser.
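
Selecting the “best” tree from an n-best list, and checking whether the correct tree is available to a re-ranker at all, is mechanically simple; the sketch below assumes a (score, tree) pair representation for the n-best output, which is an assumption about the format rather than the CJ-I parser’s exact interface:

```python
# Minimal sketch: choose the highest-scoring tree from an n-best list, and test
# the "oracle" condition (is the gold tree anywhere in, e.g., the top 50?) under
# which a re-ranker could in principle recover it. The (score, tree) pairs are
# an assumed representation, not the CJ-I parser's exact output format.
def best_parse(nbest):
    """nbest: iterable of (log_prob, tree) pairs; return the top-scoring tree."""
    return max(nbest, key=lambda pair: pair[0])[1]

def oracle_contains(nbest, gold_tree):
    """True if the correct tree occurs anywhere in the n-best list."""
    return any(tree == gold_tree for _, tree in nbest)
```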

  8. The remaining examples are some simple S’s and a few newswire stories. The authors would like to thank C. Manning for generously sharing these additional examples with us.

  9. We put to one side the question of carrying out fMRI experiments on computers.

References

  1. Abney, S. (1996). Statistical methods and linguistics. In J. Klavans & P. Resnik (Eds.), The balancing act: Combining symbolic and statistical approaches to language (pp. 1–26). Cambridge, MA: MIT Press.

  2. Berwick, R. C., & Weinberg, A. S. (1982). The grammatical basis of linguistic performance. Cambridge: MIT Press.

  3. Bikel, D. (2004a). On the parameter space of generative lexicalized statistical parsing models. Ph.D. thesis, University of Pennsylvania, Department of Computer Science.

  4. Bikel, D. M. (2004b). Intricacies of Collins’ parsing model. Computational Linguistics, 30(4), 479–511.

  5. Charniak, E. (2000). A maximum-entropy-inspired parser. In Proceedings of the First Meeting of the North American Chapter of the Association for Computational Linguistics (pp. 132–139), Seattle. Association for Computational Linguistics.

  6. Charniak, E., & Johnson, M. (2005). Coarse-to-fine n-best parsing and MaxEnt discriminative reranking. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (pp. 173–180), Ann Arbor. Association for Computational Linguistics.

  7. Chiang, D., & Bikel, D. M. (2002). Recovering latent information in treebanks. In Proceedings of the 19th International Conference on Computational Linguistics (pp. 183–189), Taipei.

  8. Chomsky, N. (1957). Syntactic structures. The Hague: Mouton.

  9. Chomsky, N. (1968). Language and mind. New York: Harcourt Brace.

  10. Chomsky, C. (1969). The acquisition of syntax in children from 5 to 10. Cambridge: MIT Press.

  11. Clark, S., & Curran, J. (2007). Wide-coverage efficient statistical parsing with CCG and log-linear models. Computational Linguistics, 33(4), 493–552.

  12. Clark, A., & Lappin, S. (2009). Another look at indirect negative evidence. In Proceedings of the EACL 2009 Workshop on Cognitive Aspects of Computational Language Acquisition (pp. 26–33), Athens. Association for Computational Linguistics.

  13. Clegg, A. B. (2008). Computational-linguistic approaches to biomedical text mining. Ph.D. thesis, Birkbeck College, University of London.

  14. Collins, M. (1997). Three generative, lexicalized models for statistical parsing. In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics (pp. 16–23), Madrid. Association for Computational Linguistics.

  15. Collins, M. (1999). Head-driven statistical models for natural language parsing. Ph.D. thesis, University of Pennsylvania.

  16. Collins, M. (2003). Head-driven statistical models for natural language parsing. Computational Linguistics, 29(4), 589–637.

  17. Crain, S., & Nakayama, M. (1987). Structure dependence in grammar formation. Language, 63, 522–543.

  18. Curran, J., Clark, S., & Bos, J. (2007). Linguistically motivated large-scale NLP with C&C and Boxer. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, Companion Volume: Proceedings of the Demo and Poster Sessions (pp. 33–36), Prague. Association for Computational Linguistics.

  19. Eisner, J. (2001). Smoothing a probabilistic lexicon via syntactic transformations. Ph.D. thesis, University of Pennsylvania.

  20. Gleitman, L., Gleitman, H., & Shipley, E. (1972). The emergence of the child as grammarian. Cognition, 1(2–3), 137–164.

  21. Hale, K., & Keyser, S. (1993). On argument structure and the lexical representation of syntactic relations. In K. Hale & S. Keyser (Eds.), The view from Building 20 (pp. 53–110). Cambridge: MIT Press.

  22. Hockenmaier, J. (2003a). Data and models for statistical parsing with Combinatory Categorial Grammar. Ph.D. thesis, University of Edinburgh.

  23. Hockenmaier, J. (2003b). Parsing with generative models of predicate-argument structure. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (pp. 359–366), Sapporo. Association for Computational Linguistics.

  24. Jackendoff, R. (1999). Why can’t computers use English? New York: Linguistic Society of America (LSA) Publications.

  25. Johnson, M. (1998). PCFG models of linguistic tree representations. Computational Linguistics, 24(4), 613–632.

  26. Judge, J., Cahill, A., & van Genabith, J. (2006). QuestionBank: Creating a corpus of parse-annotated questions. In Proceedings of the 44th Annual Meeting of the Association for Computational Linguistics (pp. 497–504), Sydney. Association for Computational Linguistics.

  27. Klein, D., & Manning, C. (2003a). Accurate unlexicalized parsing. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (pp. 423–430), Sapporo. Association for Computational Linguistics.

  28. Klein, D., & Manning, C. (2003b). Fast exact inference with a factored model for natural language parsing. In Advances in Neural Information Processing Systems (pp. 3–10), Cambridge.

  29. Lappin, S., & Shieber, S. M. (2007). Machine learning theory and practice as a source of insight into universal grammar. Journal of Linguistics, 43(2), 393–427.

  30. Levy, R. (2006). Probabilistic models of word order and syntactic discontinuity. Ph.D. thesis, Stanford University.

  31. Levy, R., & Andrew, G. (2006). Tregex and Tsurgeon: Tools for querying and manipulating tree data structures. In Proceedings of the Fifth International Conference on Language Resources and Evaluation, Genoa.

  32. Levy, R., & Manning, C. D. (2004). Deep dependencies from context-free statistical parsers: Correcting the surface dependency approximation. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (pp. 327–334). Association for Computational Linguistics.

  33. Marcus, G. (2003). The algebraic mind. Cambridge: MIT Press.

  34. Marcus, M. P., Santorini, B., & Marcinkiewicz, M. A. (1994). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2), 313–330.

  35. Morgan, J., Meier, R., & Newport, E. (2004). Facilitating the acquisition of syntax with cross-sentential cues to phrase structure. Journal of Memory and Language, 28(3), 360–374.

  36. Musso, M., Moro, A., Glauche, V., Rijntjes, M., Reichenbach, J., Büchel, C., & Weiller, C. (2003). Broca’s area and the language instinct. Nature Neuroscience, 6, 774–781.

  37. Nivre, J., Hall, J., Nilsson, J., Chanev, A., Eryiğit, G., Kübler, S., Marinov, S., & Marsi, E. (2007). MaltParser: A language-independent system for data-driven dependency parsing. Natural Language Engineering, 13(2), 95–135.

  38. Nivre, J., Rimell, L., McDonald, R., & Gómez-Rodríguez, C. (2010). Evaluation of dependency parsers on unbounded dependencies. In Proceedings of the 23rd International Conference on Computational Linguistics, Beijing.

  39. Parisse, C. (2012). Rethinking the syntactic burst in young children. In A. Alishahi, T. Poibeau, A. Korhonen, & A. Villavicencio (Eds.), Cognitive aspects of computational language acquisition. New York: Springer.

  40. Petrov, S., & Klein, D. (2007). Learning and inference for hierarchically split PCFGs. In AAAI 2007 Nectar Track, Washington. AAAI.

  41. Petrov, S., & Klein, D. (2008). Sparse multi-scale grammars for discriminative latent variable parsing. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (pp. 867–876), Honolulu. Association for Computational Linguistics.

  42. Petrov, S., Barrett, L., Thibaux, R., & Klein, D. (2006). Learning accurate, compact, and interpretable tree annotation. In Proceedings of the 44th Annual Meeting of the Association for Computational Linguistics (pp. 433–440), Sydney. Association for Computational Linguistics.

  43. Riezler, S., King, T. H., Kaplan, R. M., Crouch, R., Maxwell, J. T., III, & Johnson, M. (2002). Parsing the Wall Street Journal using a lexical-functional grammar and discriminative estimation techniques. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (pp. 271–278), Philadelphia. Association for Computational Linguistics.

  44. Rimell, L., Clark, S., & Steedman, M. (2009). Unbounded dependency recovery for parser evaluation. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (pp. 813–821), Singapore. Association for Computational Linguistics.

  45. Saffran, J., & Newport, E. (2007). Statistical learning in 8-month-old infants. Science, 274(5294), 1926–1928.

  46. Sekine, S., & Collins, M. (2008). The EVALB program.

  47. Shipley, E., Smith, C., & Gleitman, L. (1969). A study in the acquisition of language: Free responses to commands. Language, 45, 322–343.

  48. Smith, N., & Johnson, M. (2007). Weighted and probabilistic context-free grammars are equally expressive. Computational Linguistics, 33(4), 477–491.

  49. Smith, N., Tsimpli, I.-M., & Ouhalla, J. (1993). Learning the impossible: The acquisition of possible and impossible languages by a polyglot savant. Lingua, 91, 279–347.

  50. Smith, N. A., & Eisner, J. (2005). Guiding unsupervised grammar induction using contrastive estimation. In International Joint Conference on Artificial Intelligence (IJCAI) Workshop on Grammatical Inference Applications (pp. 73–82), Edinburgh. Association for Computational Linguistics.

  51. Tateisi, Y., Yakushiji, A., Ohta, T., & Tsujii, J. (2005). Syntax annotation for the GENIA corpus. In Proceedings of the International Joint Conference on Natural Language Processing (pp. 222–227), Jeju Island, Korea. Association for Computational Linguistics.

  52. Turian, J., & Melamed, I. D. (2006). Advances in discriminative parsing. In Proceedings of the 44th Annual Meeting of the Association for Computational Linguistics (pp. 873–880), Sydney. Association for Computational Linguistics.

  53. Wexler, K., & Culicover, P. (1983). Formal principles of language acquisition. Cambridge: MIT Press.


Acknowledgements

We would like to thank Michael Coen and Ali Mohammed for assistance and valuable suggestions. More importantly, we would like to extend special thanks to those individuals who have graciously made their parsing systems publicly available for open experimentation, in particular Daniel Bikel and Michael Collins; John Judge for his extremely valuable QBank resource and his generosity in providing it to us; Mark Johnson and Eugene Charniak; the members of the Stanford NLP group, including Daniel Klein and Christopher Manning; the Berkeley NLP group, including Slav Petrov and Daniel Klein; and the Malt and C&C parser developers. Without their generosity, analyses like those carried out here would be impossible. Finally, we would like to acknowledge two anonymous reviewers whose suggestions greatly improved this work.

Author information

Correspondence to Sandiway Fong.


Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Fong, S., Malioutov, I., Yankama, B., Berwick, R.C. (2013). Treebank Parsing and Knowledge of Language. In: Villavicencio, A., Poibeau, T., Korhonen, A., Alishahi, A. (eds) Cognitive Aspects of Computational Language Acquisition. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31863-4_6


  • DOI: https://doi.org/10.1007/978-3-642-31863-4_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31862-7

  • Online ISBN: 978-3-642-31863-4

  • eBook Packages: Computer Science, Computer Science (R0)
