Abstract
Over the past 15 years, there has been great success in using linguistically annotated sentence collections, such as the Penn Treebank (PTB), to construct statistically based parsers. This success leads naturally to the question of the extent to which such systems acquire full “knowledge of language” in the conventional linguistic sense. This chapter addresses that question. It assesses the knowledge attained by several current statistically trained parsers in the areas of tense marking, questions, English passives, and the acquisition of “unnatural” language constructions, extending previous results showing that boosting training data with targeted examples can, in certain cases, improve performance, but also indicating that such systems may be too powerful, in the sense that they can learn “unnatural” language patterns. Going beyond this, the chapter advances a general approach to incorporating linguistic knowledge by means of “linguistic regularization” to canonicalize predicate-argument structure, and so improve statistical training and parser performance.
Notes
- 1.
We note that there have been recent proposals suggesting that “linguistic mastery does not need to be available early in the course of language development” and that “the acquisition of usage-based and fixed-form patterns can account for … [the] syntactic burst [occurring around age two to three]” [39]. It is uncontroversial that some fixed-form patterns are memorized by children, and equally that complete linguistic mastery of syntax is delayed until the age of eight or later, as first established by the work of Carol Chomsky [10]. However, while such mastery “need not” be “available early,” it has in fact long been established empirically that ‘telegraphic speech’ is not indicative of the full scope of syntactic comprehension at ages 2–3; rather, many aspects of syntax are acquired by this age, but telegraphic speech does not reveal these abilities, instead reflecting processing difficulties such as memory limitations [20, 47].
- 2.
As noted in [41] and [48], although statistically-based parsers have used both sorts of estimation methods, the underlying statistical models for generative approaches and for discriminative approaches using so-called “latent variables” – probabilistic and weighted context-free grammars, respectively – turn out to be equivalent in their expressive power.
- 3.
See, e.g., [9] and [2] for additional discussion of the absence of counting and palindromic rules in natural language, including syntax and phonology. Palindromic forms are known to be used in certain sociological settings, e.g., the Australian butchers’ market language, but all indications are that such behavior remains “puzzle based.”
- 4.
We attempted to match the training settings of the parsers’ “pre-built” models as far as possible. For example, we used the settings provided in the Stanford parser directory under makeSerialized.csh for the so-called wsjPCFG model. In the case of the BC-M2 parser, we used the settings given in collins.properties, since we wanted to ensure replicability with standard results.
- 5.
The full database was obtained by download from http://www.computing.dcu.ie/~jjudge/qtreebank/. A handful of errors in corpus annotation were corrected in this downloaded dataset.
- 6.
As noted in Sect. 2, we tested both the Berkeley parser’s pre-built eng_sm5 grammar and our own retrained version, which carried out six split-merge iterations. The results did not change. The results also remained the same when we used the Berkeley parser’s -accurate switch. In general, results did not change for any of the parsers when we substituted stock or should for will. Note that here the Berkeley parser is using its own part-of-speech tagger. If we forced it to use “gold standard” part-of-speech tags, it could not fail in the manner we have described. However, we wanted to examine the parser’s own performance, not that of some exogenous part-of-speech tagger.
- 7.
For CJ-I we selected the “best” parse (the one with the highest likelihood score) from the output of the CJ-I parser. In fact, in several cases the 2nd-best parse tree turned out to be the correct one; this was true, for instance, for sentence 4(h). On the other hand, just as often the best parse was correct and the 2nd-best parse was incorrect, as in example 4(a). Note that the CJ-I parser serves as input to the CJ-R re-ranking parser, which takes, e.g., the top-50 most likely parses and then re-sorts them according to a discriminatively weighted feature-based scheme, using features such as the degree of right-branching or conjunct parallelism. Since the top 50 parses usually included the correct answer, the re-ranking parser at least had a chance of selecting the correct answer in each case. Even so, re-ranking was ineffective and did not change the outcome for any of the sentence examples here. See [6] for details about this re-ranking parser.
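The n-best re-ranking scheme described above can be sketched in miniature: each candidate parse receives a score that is a weighted sum of its feature values, and the highest-scoring candidate is returned. This is only a toy illustration of the general technique, not the CJ-R implementation; the feature names, values, and weights below are hypothetical.

```python
# Toy sketch of n-best discriminative re-ranking: score each candidate
# parse as a weighted sum of its features, then return the argmax.

def rerank(candidates, weights):
    """Pick the candidate parse whose feature vector scores highest
    under the given feature weights (missing features count as 0)."""
    def score(parse):
        return sum(weights.get(name, 0.0) * value
                   for name, value in parse["features"].items())
    return max(candidates, key=score)

# Hypothetical 3-best list for one sentence; feature names and values
# are illustrative only.
nbest = [
    {"tree": "(S ...)",    "features": {"log_prob": -20.1, "right_branching": 3}},
    {"tree": "(SQ ...)",   "features": {"log_prob": -20.4, "right_branching": 5}},
    {"tree": "(FRAG ...)", "features": {"log_prob": -25.0, "right_branching": 1}},
]
weights = {"log_prob": 1.0, "right_branching": 0.5}

best = rerank(nbest, weights)
print(best["tree"])  # the re-ranker can overturn the 1-best parse
```

Note how the re-ranker can select a candidate other than the one with the highest raw likelihood: with these weights, the 2nd-best tree by log-probability wins once the right-branching feature is added in.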
- 8.
The remaining examples are some simple S’s and a few newswire stories. The authors would like to thank C. Manning for generously sharing these additional examples with us.
- 9.
We put to one side the question of carrying out fMRI experiments on computers.
References
Abney, S. (1996). Statistical methods and linguistics. In J. Klavans, & P. Resnik (Eds.), The balancing act: Combining symbolic and statistical approaches to language (pp. 1–26). Cambridge/Massachusetts: MIT Press.
Berwick, R. C., & Weinberg, A. S. (1982). The grammatical basis of linguistic performance. Cambridge: MIT Press.
Bikel, D. (2004a). On the Parameter Space of Generative Lexicalized Statistical Parsing Models. Ph.D. Thesis, University of Pennsylvania, Department of Computer Science.
Bikel, D. M. (2004b). Intricacies of Collins’ parsing model. Computational Linguistics, 30(4), 479–511.
Charniak, E. (2000). A maximum-entropy inspired parser. In Proceedings of the First Meeting of the North American Chapter of the Association for Computational Linguistics (pp. 132–139), Seattle. Association for Computational Linguistics.
Charniak, E., & Johnson, M. (2005). Coarse to fine n-best parser and maxent discriminative reranking. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (pp. 173–180), Ann Arbor. East Stroudsburg: Association for Computational Linguistics.
Chiang, D., & Bikel, D. M. (2002). Recovering latent information in treebanks. In Proceedings of the 19th International Conference on Computational Linguistics (pp. 183–189), Howard International, Taipei.
Chomsky, N. (1957). Syntactic structures. The Hague: Mouton.
Chomsky, N. (1968). Language and mind. New York: Harcourt-Brace.
Chomsky, C. (1969). The acquisition of syntax in children from 5 to 10. Cambridge: MIT Press.
Clark, S., & Curran, J. (2007). Wide-coverage efficient statistical parsing with CCG and log-linear models. Computational Linguistics, 33(4), 493–552.
Clark, A., & Lappin, S. (2009). Another look at indirect negative evidence. In Proceedings of the EACL 2009 Workshop on Cognitive Aspects of Computational Language Acquisition (pp. 26–33), Athens. Association for Computational Linguistics.
Clegg, A. B. (2008). Computational-linguistic approaches to biomedical text mining. Ph.D. thesis, Birkbeck College, University of London.
Collins, M. (1997). Three generative, lexicalized models for statistical parsing. In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics (pp. 16–23), Madrid. Association for Computational Linguistics.
Collins, M. (1999). Head-driven statistical models for natural language parsing. Ph.D. thesis, University of Pennsylvania.
Collins, M. (2003). Head-driven statistical models for natural language parsing. Computational Linguistics, 29(4), 589–637.
Crain, S., & Nakayama, M. (1987). Structure dependence in grammar formation. Language, 63, 522–543.
Curran, J., Clark, S., & Bos, J. (2007). Linguistically motivated large-scale NLP with C&C and Boxer. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions (pp. 33–36), Prague, Czech Republic: Association for Computational Linguistics.
Eisner, J. (2001). Smoothing a probabilistic Lexicon via syntactic transformations. Ph.D. thesis, University of Pennsylvania.
Gleitman, L., Gleitman, H., & Shipley, E. (1972). The emergence of the child as grammarian. Cognition, 1(2–3), 137–164.
Hale, K., & Keyser, S. (1993). On argument structure and the lexical representation of syntactic relations. In K. Hale, & S. Keyser (Eds.), The view from building 20 (pp. 53–110). Cambridge: MIT Press.
Hockenmaier, J. (2003a). Data and Models for Statistical Parsing with Combinatory Categorial Grammar. Doctoral Dissertation, University of Edinburgh.
Hockenmaier, J. (2003b). Parsing with generative models of predicate-argument structure. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (pp. 359–366), Sapporo, Japan: Association for Computational Linguistics.
Jackendoff, R. (1999). Why can’t computers use English? New York: Linguistic Society of America (LSA) Publications.
Johnson, M. (1998). PCFG models of linguistic tree representations. Computational Linguistics, 24(4), 613–632.
Judge, J., Cahill, A., & van Genabith, J. (2006). Questionbank: Creating a corpus of parse-annotated questions. In Proceedings of the 44th Annual Meeting of the Association for Computational Linguistics (pp. 497–504), Sydney, Australia: Association for Computational Linguistics.
Klein, D., & Manning, C. (2003a). Accurate unlexicalized parsing. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (pp. 423–430), Sapporo. East Stroudsburg: Association for Computational Linguistics.
Klein, D., & Manning, C. (2003b). Fast exact inference with a factored model for natural language parsing. In Advances in Neural Information Processing Systems (pp. 3–10), Cambridge.
Lappin, S., & Shieber, S. M. (2007). Machine learning theory and practice as a source of insight into universal grammar. Journal of Linguistics, 43(2), 393–427.
Levy, R. (2006). Probabilistic models of word order and syntactic discontinuity. Ph.D. thesis, Stanford University.
Levy, R., & Andrew, G. (2006). Tregex and tsurgeon: Tools for querying and manipulating tree data structures. In Proceedings of the Fifth International Conference on Language Resources and Evaluation, Genoa.
Levy, R., & Manning, C. D. (2004). Deep dependencies from context-free statistical parsers: Correcting the surface dependency approximation. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (pp. 327–334). East Stroudsburg: Association for Computational Linguistics.
Marcus, G. (2003). The algebraic mind. Cambridge: MIT Press.
Marcus, M. P., Santorini, B., & Marcinkiewicz, M. A. (1994). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2), 313–330.
Morgan, J., Meier, R., & Newport, E. (2004). Facilitating the acquisition of syntax with cross-sentential cues to phrase structure. Journal of Memory and Language, 28(3), 360–374.
Musso, M., Moro, A., Glauche, V., Rijntjes, M., Reichenbach, J., Buchel, C., & Weiller, C. (2003). Broca’s area and the language instinct. Nature Neuroscience, 6, 774–781.
Nivre, J., Hall, J., Nilsson, J., Chanev, A., Eryigit, G., Kubler, S., Marinov, S., & Marsi, E. (2007). MaltParser: A language-independent system for data-driven dependency parsing. Natural Language Engineering, 13(2), 95–135.
Nivre, J., Rimell, L., McDonald, R., & Rodriguez, C. G. (2010). Evaluation of dependency parsers on unbounded dependencies. In Proceedings of the 23rd International Conference on Computational Linguistics, Beijing. International Association for Computational Linguistics.
Parisse, C. (2012). Rethinking the syntactic burst in young children. In A. Alishahi, T. Poibeau, A. Korhonen, & A. Villavicencio (Eds.), Cognitive aspects of computational language acquisition. New York: Springer.
Petrov, S., & Klein, D. (2007). Learning and inference for hierarchically split PCFGs. In AAAI 2007 Nectar Track, Washington. AAAI.
Petrov, S., & Klein, D. (2008). Sparse multi-scale grammars for discriminative latent variable parsing. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (pp. 867–876), Honolulu. Association for Computational Linguistics.
Petrov, S., Barrett, L., Thibaux, R., & Klein, D. (2006). Learning accurate, compact, and interpretable tree annotation. In Proceedings of the 44th Annual Meeting of the Association for Computational Linguistics (pp. 433–440), Sydney, Australia: Association for Computational Linguistics.
Riezler, S., King, T. H., Kaplan, R. M., Crouch, R., Maxwell, J. T. I., & Johnson, M. (2002). Parsing the wall street journal using a lexical-functional grammar and discriminative estimation techniques. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL-02) (pp. 271–278), Philadelphia, PA: Association for Computational Linguistics.
Rimell, L., Clark, S., & Steedman, M. (2009). Unbounded dependency recovery for parser evaluation. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (pp. 813–821), Singapore: Association for Computational Linguistics.
Saffran, J., Aslin, R., & Newport, E. (1996). Statistical learning by 8-month-old infants. Science, 274(5294), 1926–1928.
Sekine, S., & Collins, M. (2008). The EVALB program.
Shipley, E., Smith, C., & Gleitman, L. (1969). A study in the acquisition of language: Free responses to commands. Language, 45, 322–343.
Smith, N., & Johnson, M. (2007). Weighted and context-free grammars are equally expressive. Computational Linguistics, 33(4), 477–491.
Smith, N., Tsimpli, I.-M., & Ouhalla, J. (1993). Learning the impossible: The acquisition of possible and impossible languages by a polyglot savant. Lingua, 91, 279–347.
Smith, N. A., & Eisner, J. (2005). Guiding unsupervised grammar induction using contrastive estimation. In International Joint Conference on Artificial Intelligence (IJCAI) Workshop on Grammatical Inference Applications (pp. 73–82), Edinburgh, Scotland: Association for Computational Linguistics.
Tateisi, Y., Yakushiji, A., Ohta, T., & Tsujii, J. (2005). Syntax annotation for the GENIA corpus. In Proceedings of the International Joint Conference on Natural Language Processing (pp. 222–227), Jeju Island, Korea: Association for Computational Linguistics.
Turian, J., & Melamed, I. D. (2006). Advances in discriminative parsing. In Proceedings of the 44th Annual Meeting of the Association for Computational Linguistics (pp. 873–880), Sydney, Australia: Association for Computational Linguistics.
Wexler, K., & Culicover, P. (1983). Formal principles of language acquisition. Cambridge: MIT Press.
Acknowledgements
We would like to thank Michael Coen and Ali Mohammed for assistance and valuable suggestions. More importantly, we would like to extend special thanks to those individuals who have graciously made their parsing systems publicly available for open experimentation, in particular Daniel Bikel and Michael Collins; John Judge for his extremely valuable QBank resource and his generosity in providing it to us; Mark Johnson and Eugene Charniak; the members of the Stanford NLP group, including Daniel Klein and Christopher Manning; the Berkeley NLP group, including Slav Petrov and Daniel Klein; and the Malt and C&C parser developers. Without their generosity, analyses like those carried out here would be impossible. Finally, we would like to acknowledge two anonymous reviewers whose suggestions greatly improved this work.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Fong, S., Malioutov, I., Yankama, B., Berwick, R.C. (2013). Treebank Parsing and Knowledge of Language. In: Villavicencio, A., Poibeau, T., Korhonen, A., Alishahi, A. (eds) Cognitive Aspects of Computational Language Acquisition. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31863-4_6
Print ISBN: 978-3-642-31862-7
Online ISBN: 978-3-642-31863-4