This article was downloaded by: [Vrije Universiteit, Library] On: 8 June 2011
Publisher: Routledge, Informa Ltd

Discourse Processes
Surface Cues of Content and Tenor in Texts
Luuk Lagerwerf; Wilbert Spooren; Liesbeth Degand
Online publication date: 08 June 2010
To cite this Article: Lagerwerf, Luuk, Spooren, Wilbert and Degand, Liesbeth (2006) 'Surface Cues of Content and Tenor in Texts', Discourse Processes, 41: 2, 111–116
DOI: 10.1207/s15326950dp4102_1
URL: http://dx.doi.org/10.1207/s15326950dp4102_1

DISCOURSE PROCESSES, 41(2), 111–116
Copyright © 2006, Lawrence Erlbaum Associates, Inc.
INTRODUCTION

Surface Cues of Content and Tenor in Texts

This special issue of Discourse Processes contains a selection of articles from the workshop Multidisciplinary Approaches to Discourse (MAD03), a biennial workshop that brings together researchers from various disciplines who share an interest in the study of discourse. The aim of the 2003 edition was to address how the content and tenor of texts can be analyzed. This topic has a background in several disciplines, all of which were represented at the workshop: content analysis, discourse psychology, and computational and cognitive–functional linguistics. A number of the workshop papers addressed which cues signal a text's content and tenor, and what kind of information or effect is conveyed in this way. Four manuscripts that developed from these papers were selected for this issue. They exemplify the various approaches to discourse analysis that make use of text corpora, applying different statistical and computational techniques to analyze the surface cues that signal content or tenor in texts.

In this introduction, we present a short overview of current topics in corpus analysis as a tool for discourse analysis, and we show how the four contributions to this special issue represent recent developments.

CORPUS ANALYSIS AS A TOOL FOR DISCOURSE ANALYSIS

Corpus analysis is a method of linguistic analysis based on naturally occurring samples of written texts and spoken discourse (corpora). Corpora have become a standard element of the linguist's toolkit. Their function is obvious: Linguists who want to study actual language use need to look at concrete instances of that use in order to come up with new hypotheses, to increase the reliability of their analyses, and to test existing hypotheses.
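The most basic way of looking at concrete instances of language use in a corpus is a keyword-in-context (KWIC) concordance, which lists every occurrence of a word together with its left and right context. The sketch below is a minimal, hypothetical illustration: the function name `kwic`, its `width` parameter, and the whole-word, case-insensitive matching are assumptions made for this example, not features of any tool discussed in this issue.

```python
import re


def kwic(text: str, keyword: str, width: int = 30) -> list[str]:
    """Return one keyword-in-context line per occurrence of `keyword`.

    Hypothetical helper: matching is case-insensitive and on whole words only.
    """
    text = " ".join(text.split())  # collapse line breaks and repeated spaces
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so that all hits line up in one column.
        lines.append(f"{left:>{width}}[{match.group(0)}]{right}")
    return lines


sample = "He left because it rained. Because of that, we stayed inside."
for line in kwic(sample, "because", width=12):
    print(line)
```

Aligning all hits in one column is what makes a concordance useful for spotting recurring patterns in the surrounding words.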
Sophisticated computer technologies have made it feasible to carry out large-scale, systematic research on specific linguistic properties of large bodies of text. Large corpora have been assembled for linguistic research, and methods have been developed to retrieve the relevant linguistic information from them (see Biber, Conrad, & Reppen, 1998, for an example of this kind of work). As a result, traditional sentence analysis is increasingly being replaced by discourse analysis, for two reasons: Robust linguistic models need to be able to cope with the complexities of discourse and with the naturally occurring examples usually found in discourse, and the complexity of linguistic phenomena in discourse makes it impractical and implausible to invent discourses for analytic purposes.

Corpora are now used in all areas of linguistic research. Some influential examples are studies of cross-language variation (Degand, 2001; Granger, Lerot, & Petch-Tyson, 2003), cross-genre stylistic variation (Conrad & Biber, 2001), cross-linguistic comparison of conceptual metaphor (Cameron & Low, 1999), and language learning (Tomasello, 2003). Nor are tools for corpus analysis applied exclusively in linguistics. An example from social psychology is the Linguistic Inquiry and Word Count (LIWC) dictionary, which is used to detect personality differences between people on the basis of tenor differences in their language use (Pennebaker, Francis, & Booth, 2001; Pennebaker & King, 1999).

Tagging electronic corpora for parts of speech makes it possible to abstract over linguistic categories. Advances in rule-based and probabilistic methods of recognizing and annotating linguistic elements have made tagging faster and more extensive (Brill, 1995; van Halteren, Daelemans, & Zavrel, 2001).
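Abstracting over linguistic categories means that one can search for a sequence of part-of-speech tags rather than for particular words. The following is a minimal sketch under stated assumptions: the text is taken to be already tagged (for instance by a tagger of the kind cited above) with Penn Treebank tags, and the two hand-tagged sentences as well as the helper names `match_tag_pattern` and `count_pattern` are hypothetical.

```python
from collections import Counter
from typing import Iterator

# A tagged token is a (word, part-of-speech tag) pair; Penn Treebank tags are assumed.
TaggedToken = tuple[str, str]


def match_tag_pattern(tokens: list[TaggedToken], pattern: list[str]) -> Iterator[tuple[str, ...]]:
    """Yield the word sequences whose tags match `pattern`; '*' matches any tag."""
    n = len(pattern)
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        if all(p == "*" or p == tag for p, (_, tag) in zip(pattern, window)):
            yield tuple(word for word, _ in window)


def count_pattern(corpus: list[list[TaggedToken]], pattern: list[str]) -> Counter:
    """Count the lower-cased word sequences that instantiate a tag pattern."""
    counts: Counter = Counter()
    for sentence in corpus:
        counts.update(" ".join(m).lower() for m in match_tag_pattern(sentence, pattern))
    return counts


# Hypothetical hand-tagged sentences (IN = preposition/subordinating conjunction, PRP = personal pronoun).
corpus = [
    [("She", "PRP"), ("stayed", "VBD"), ("because", "IN"), ("it", "PRP"), ("rained", "VBD"), (".", ".")],
    [("Because", "IN"), ("it", "PRP"), ("rained", "VBD"), (",", ","), ("we", "PRP"), ("left", "VBD")],
]
print(count_pattern(corpus, ["IN", "PRP"]))  # prints Counter({'because it': 2})
```

Because the Penn tag IN covers both prepositions and subordinating conjunctions, a query like this would also return sequences such as "with it" in a larger corpus; separating connectives from prepositions still requires further analysis.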
The representations produced by part-of-speech taggers are very useful for discourse analysis, because they make it possible to search systematically for the surface cues that signal discourse phenomena. These growing possibilities make it all the more interesting to study how surface cues may determine discourse characteristics. This special issue is dedicated to the study of some of these surface cues. The question that unites the articles is which discourse functions these surface cues serve. The articles differ in their choice of surface cues and discourse functions; together they provide an overview of current approaches to discourse analysis.

DISCOURSE ANALYSIS

In both linguistics and discourse psychology, discourse analysis has become a prominent area of work. In psychology, discourse analysis is used to study how readers convert linguistic symbols into knowledge (Graesser, Millis, & Zwaan, 1997; Kintsch, 1998; van Dijk & Kintsch, 1983). Textual elements such as connectives function as cues for building the hierarchical structure of a text (Mann & Thompson, 1988), and they help readers build a coherent representation (Sanders, Spooren, & Noordman, 1993). Moreover, connectives are used to indicate interactive effects of written discourse, such as the degree of the writer's subjectivity toward the content of what is expressed (Halliday, 1985; Langacker, 1985; Pander Maat & Degand, 2001; for spoken discourse, see Schiffrin, 1987). Connectives thus serve as surface cues that model content (hierarchical and coherent representations) as well as tenor (subjectivity).

Parallel to the work on discourse representation, computational models have been developed to represent the content of a text (Gardent & Webber, 2001; Lagerwerf, 1998; Polanyi, 1988; Prüst, Scha, & van den Berg, 1994).
These models make use of formal theories of discourse representation (Asher & Lascarides, 2003; Beaver, 2001; Kamp & Reyle, 1993). There has also been a substantial body of computational work on the link between a writer's intentions and the production of text (Grosz & Sidner, 1986; Hovy, Lavid, Maier, Mittal, & Paris, 1992; Matthiessen & Bateman, 1991); these theories model the tenor of a text. As in the discourse psychological and text linguistic work, all of these computational approaches treat surface cues as the providers of the essential information from which their models are built.

A recent line of work is based on word co-occurrence and statistical analysis (Bod & Scha, 1996). One of the more sophisticated frameworks is Latent Semantic Analysis (LSA; Landauer, Foltz, & Laham, 1998). In this framework, a semantic space is built from a specific statistical analysis of all word–context combinations in a text corpus: a word-by-context frequency matrix is reduced to a small number of dimensions by singular value decomposition. LSA may mimic the cognitive processes that take place during language comprehension (Landauer & Dumais, 1997), including the processing of text coherence (Foltz, Kintsch, & Landauer, 1998). In this line of work, corpus analysis and discourse analysis are integrated. Surface cues are, in fact, all words in their contexts, without distinction, and word frequencies are the initial measures. In a second stage, once the semantic space has been built, distances between specific surface cues can be used to draw discourse-analytic inferences.

The combination of corpus analysis and discourse analysis enables researchers to build specific computer applications, and numerous applications of LSA exist (Foltz, 2005). A comprehensive application that determines the readability of texts automatically, using LSA cohesion measures as well as well-known readability formulas and other measures, is Coh-Metrix (Graesser, McNamara, Louwerse, & Cai, 2004).
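The two stages of LSA (first a word-by-context matrix reduced by singular value decomposition, then distances in the resulting space) can be sketched in a few lines of NumPy. This is a toy illustration under simplifying assumptions, not the implementation used by Landauer and colleagues or by Coh-Metrix: contexts are whole short texts, weighting is a plain log transform, and a sentence is represented by summing its word vectors. The coherence score follows the idea of comparing adjacent sentences (Foltz, Kintsch, & Landauer, 1998); all function names are hypothetical.

```python
import numpy as np


def build_lsa_space(contexts: list[str], k: int):
    """Build a k-dimensional semantic space from a list of contexts (e.g., paragraphs).

    Returns (word index, term vectors). Simplified sketch: whitespace tokenization,
    log weighting, truncated singular value decomposition.
    """
    vocab = sorted({w for c in contexts for w in c.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    # Word-by-context frequency matrix: rows are words, columns are contexts.
    counts = np.zeros((len(vocab), len(contexts)))
    for j, context in enumerate(contexts):
        for w in context.lower().split():
            counts[index[w], j] += 1
    weighted = np.log1p(counts)  # dampen raw frequencies
    u, s, _ = np.linalg.svd(weighted, full_matrices=False)
    # Keep only the k largest dimensions; scale word vectors by the singular values.
    return index, u[:, :k] * s[:k]


def text_vector(text, index, term_vectors):
    """Represent a text as the sum of the vectors of its known words."""
    vec = np.zeros(term_vectors.shape[1])
    for w in text.lower().split():
        if w in index:
            vec += term_vectors[index[w]]
    return vec


def cosine(a, b) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    norm = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / norm) if norm > 0 else 0.0


def adjacent_coherence(sentences, index, term_vectors) -> float:
    """Mean cosine between each sentence and the next one."""
    if len(sentences) < 2:
        raise ValueError("need at least two sentences")
    vecs = [text_vector(s, index, term_vectors) for s in sentences]
    sims = [cosine(a, b) for a, b in zip(vecs, vecs[1:])]
    return sum(sims) / len(sims)


contexts = [
    "the doctor treated the patient in the hospital",
    "the nurse helped the doctor with the patient",
    "the market rose and investors bought shares",
    "investors sold shares when the market fell",
]
index, terms = build_lsa_space(contexts, k=2)
print(adjacent_coherence(["the doctor saw the patient", "the nurse helped the doctor"], index, terms))
```

In the first stage every word is treated alike; only in the second stage are distances between particular cues, or between adjacent sentences, interpreted. Realistic applications use far larger corpora and a few hundred dimensions rather than two.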
In this issue, each article represents one of these approaches to discourse analysis. The articles exemplify discourse analysis in various forms, with the aid of different kinds of corpus analysis, including both quantitative and qualitative methods. They share the purpose of analyzing how surface cues indicate and model the content and tenor of texts.

TEXTUAL SURFACE CUES OF CONTENT AND TENOR

The four contributions in this issue vary in the kind of information that surface cues provide: density of information, subjectivity, causality, and interclausal versus intraclausal discourse representation. They also vary in their method of corpus analysis: descriptive and hypothesis-testing statistical techniques, automated statistical techniques to falsify hypotheses, and in-depth qualitative analyses of selected examples. Together they exemplify the wide variety of discourse analysis in corpus research.

Dorit Ravid and Ruth Berman show that written narratives, compared with spoken narratives about the same event by the same narrator, contain much less nonreferential material, which makes the representation of information denser and provides less explicit clues about that representation. They compared spoken and written corpora on the use of specific surface cues. Their contribution represents the discourse psychological approach.

Mirna Pit shows how subjectivity is expressed differently by several causal connectives, by analyzing how these connectives interact with other subjectivity indicators in a corpus. The article follows a linguistic approach to discourse analysis, systematically analyzing a corpus of newspaper items.

Yves Bestgen, Liesbeth Degand, and Wilbert Spooren used automated techniques such as LSA and Thematic Text Analysis to test hypotheses about the subjectifying properties of certain causal connectives.
They used large amounts of text as input for their analyses; their information retrieval approach thus served to test discourse-analytic hypotheses.

Michael Grabski and Manfred Stede studied the different uses of the German preposition–connective bei and analyzed its function in discourse representation. They based their analysis on selected examples from several corpora. Their work can be placed in the context of computational discourse analysis.

With this issue, we intend to present to the community one version of the state of the art in discourse analysis in corpus research, and we hope that this volume contributes to that aim. Other approaches to text research that emerged from MAD03 were published in a special issue of Information Design Journal + Document Design (see Foltz, 2005).

We thank the following people for their role in the reviewing process:

Nadjet Bouayad, Universitat Pompeu Fabra
Wallace Chafe, University of California at Santa Barbara
Lucile Chanquoy, Université de Nice Sophia Antipolis
Peter Foltz, New Mexico State University
Alistair Gill, University of Edinburgh
Michael Grabski, Technical University of Berlin
Eduard Hovy, University of Southern California
Walter Kintsch, University of Colorado
Emiel Krahmer, University of Tilburg
Ronald Langacker, University of California, San Diego
Max Louwerse, University of Memphis
Leonoor Oversteegen, University of Tilburg
Marie-Paule Péry-Woodley, Université de Toulouse-Le Mirail
Livia Polanyi, FX Palo Alto Laboratory and PARC
Dorit Ravid, Tel Aviv University

Luuk Lagerwerf
Wilbert Spooren
Liesbeth Degand
Co-Editors

REFERENCES

Asher, N., & Lascarides, A. (2003). Logics of conversation. Cambridge, England: Cambridge University Press.
Beaver, D. I. (2001). Presupposition and assertion in dynamic semantics. Chicago: University of Chicago Press.
Biber, D., Conrad, S., & Reppen, R. (1998). Corpus linguistics: Investigating language structure and use. Cambridge, England: Cambridge University Press.
Bod, L. W. M., & Scha, J. H. (1996). Data oriented language processing: An overview. Amsterdam: ILLC.
Brill, E. (1995). Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging. Computational Linguistics, 21(4), 543–565.
Cameron, L., & Low, G. (Eds.). (1999). Researching and applying metaphor. Cambridge, England: Cambridge University Press.
Conrad, S., & Biber, D. (Eds.). (2001). Variation in English: Multi-dimensional studies. London: Longman.
Degand, L. (2001). Form and function of causation: A theoretical and empirical investigation of causal constructions in Dutch. Leuven, Belgium: Peeters.
Foltz, P. W. (2005). Automated content processing of spoken and written discourse: Text coherence, essays, and team analyses. Information Design Journal + Document Design, 13(1), 5–13.
Foltz, P. W., Kintsch, W., & Landauer, T. K. (1998). The measurement of textual coherence with latent semantic analysis. Discourse Processes, 25(2–3), 285–307.
Gardent, C., & Webber, B. (2001). Towards the use of automated reasoning in discourse disambiguation. Journal of Logic, Language and Information, 10(4), 487–509.
Graesser, A. C., McNamara, D. S., Louwerse, M. M., & Cai, Z. (2004). Coh-Metrix: Analysis of text on cohesion and language. Behavior Research Methods, Instruments, & Computers, 36(2), 193–202.
Graesser, A. C., Millis, K. K., & Zwaan, R. A. (1997). Discourse comprehension. Annual Review of Psychology, 48, 163–189.
Granger, S., Lerot, J., & Petch-Tyson, S. (Eds.). (2003). Corpus-based approaches to contrastive linguistics and translation studies. Amsterdam: Rodopi.
Grosz, B. J., & Sidner, C. L. (1986). Attention, intentions, and the structure of discourse. Computational Linguistics, 12, 175–204.
Halliday, M. A. K. (1985). An introduction to functional grammar. London: Edward Arnold.
Hovy, E., Lavid, J., Maier, E., Mittal, V., & Paris, C. (1992, April). Employing resources in a new text planner architecture. In Proceedings of the 6th International Workshop on Natural Language Generation. Workshop conducted in Castel Ivano, Trento, Italy.
Kamp, H., & Reyle, U. (1993). From discourse to logic: Introduction to modeltheoretic semantics of natural language, formal logic and discourse representation theory. Dordrecht, The Netherlands: Kluwer.
Kintsch, W. (1998). Comprehension: A paradigm for cognition. Cambridge, England: Cambridge University Press.
Lagerwerf, L. (1998). Causal connectives have presuppositions: Effects on discourse structure and coherence. Utrecht, The Netherlands: LOT.
Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato's problem: The Latent Semantic Analysis theory of acquisition, induction and representation of knowledge. Psychological Review, 104, 211–240.
Landauer, T. K., Foltz, P. W., & Laham, D. (1998). An introduction to Latent Semantic Analysis. Discourse Processes, 25, 259–284.
Langacker, R. W. (1985). Observations and speculations on subjectivity. In J. Haiman (Ed.), Iconicity in syntax (pp. 109–150). Amsterdam: Benjamins.
Mann, W. C., & Thompson, S. A. (1988). Rhetorical structure theory: Toward a functional theory of text organization. Text, 8(3), 243–281.
Matthiessen, C. M. I. M., & Bateman, J. A. (1991). Text generation and systemic-functional linguistics: Experiences from English and Japanese. London: Pinter.
Pander Maat, H., & Degand, L. (2001). Scaling causal relations and connectives in terms of speaker involvement. Cognitive Linguistics, 12(3), 211–246.
Pennebaker, J. W., Francis, M. E., & Booth, R. J. (2001). Linguistic inquiry and word count dictionary: LIWC 2001. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Pennebaker, J. W., & King, L. A. (1999). Linguistic styles: Language use as an individual difference. Journal of Personality and Social Psychology, 77(6), 1296–1312.
Polanyi, L. (1988). A formal model of the structure of discourse. Journal of Pragmatics, 12, 601–638.
Prüst, H., Scha, R., & van den Berg, M. (1994). Discourse grammar and verb phrase anaphora. Linguistics and Philosophy, 17, 261–327.
Sanders, T. J. M., Spooren, W. P. M., & Noordman, L. G. M. (1993). Coherence relations in a cognitive theory of discourse representation. Cognitive Linguistics, 4, 93–133.
Schiffrin, D. (1987). Discourse markers. Cambridge, England: Cambridge University Press.
Tomasello, M. (2003). Constructing a language: A usage-based theory of language acquisition. Cambridge, MA: Harvard University Press.
van Dijk, T. A., & Kintsch, W. (1983). Strategies of discourse comprehension. Orlando, FL: Academic.
van Halteren, H., Daelemans, W., & Zavrel, J. (2001). Improving accuracy in word class tagging through the combination of machine learning systems. Computational Linguistics, 27(2), 199–229.