A hybrid approach to hypertext generation

In this paper we present SAX, a system that generates hypertext descriptions of conceptual models designed with the SADT methodology. The combination of natural language and hypertext significantly lowers the communicative barrier between the analyst and the domain expert, thus increasing the effectiveness of conceptual model validation. The application of hybrid techniques for text generation guarantees an optimal trade-off between robustness and portability across domains on one side and text fluency on the other.

Nicola Cancedda (1), Gjertrud Kamstrup (2), Emanuele Pianta (2) and Ettore Pietrosanti (3)

(1) Dipartimento di Informatica e Sistemistica, Università di Roma "La Sapienza", via Salaria 113 - 00198 Roma, cancedda@dis.uniroma1.it
(2) IRST - Istituto per la Ricerca Scientifica e Tecnologica, Loc. Pantè di Povo - 38050 Trento, kamstrup@irst.itc.it, pianta@irst.itc.it
(3) Finsiel SpA, via G. Bona - 00100 Roma, e.pietrosanti@finsiel.it

1 Introduction

Conceptual model validation is crucial in the information system development process, but this task may be difficult to accomplish if the domain expert is not acquainted with the formal language used by the Analyst. In such a situation the Analyst should provide the domain expert (Reader from now on) with additional, comprehensible documentation. Textual natural-language descriptions of the model are often suitable for this purpose. The motivation behind our work is that valuable Analyst time could be saved if these texts could be produced automatically. Furthermore, the Analyst herself may more easily detect flaws in the model by reading a natural language description of it.

In this paper we present SAX (SAdt eXplain; see footnote 4), a system that automatically generates hypertext descriptions of SADT models (footnote 5). The input to the system is the SADT model representation adopted by the "DAFNE Tools", a CASE system developed by Finsiel SpA (footnote 6).
The Analyst can tailor the hypertext generation through parameters that select the global presentation strategy and the way specific parts of the diagram are described.

The SAX architecture is shown in fig. 1. It includes three main components. The Text Planner is responsible for the content selection (what to say), the global textual organization (when to say what) and the sentence-level, template-based phrasing (how to structure sentences). The Linguistic Realizer performs morphological synthesis and phonological adjustment, while the HTML Writer translates a hypertext plan into an HTML document. SAX is implemented in Prolog and runs in the Windows environment. A first version of the system is currently under beta-testing.

[Fig. 1. SAX architecture. Elements shown: User interface, SADT model, Label processor, Labels, Syntactic module, Morphological analyzer/generator, Text Planner, Communicative schemata and Templates, Linguistic Realizer, HTML Writer, Hypertext.]

Footnote 4. The SAX system is sponsored by Finsiel SpA and jointly realized by IRST, Finsiel SpA and the Computer Science Department of the University of Rome "La Sapienza".
Footnote 5. Structured Analysis and Design Technique is a trademark and copyright of Softech Inc., see [MaMc88].
Footnote 6. DAFNE(TM) is a copyright and registered trademark ((c) 1982, 1983, 1986, 1987, 1989-1996) of Data and Functions Networking, Finsiel Group.

When designing a text generation system, one should choose the linguistic resources that are most suitable to the task. A clear distinction is usually made between template-based approaches and approaches based on deep generation (NLG), see [Re95]. Template-based approaches are usually rated as efficient but not flexible, while deep generation is considered flexible but also inefficient and resource consuming. Also, NLG systems are usually difficult to update and require specialized support, while template-based systems can be maintained by non-linguists. The distinction between these two approaches is becoming less clear-cut.
On the one hand, NLG systems are beginning to use templates when deep generation is not strictly necessary. On the other hand, we think that template-based systems could become more flexible and powerful. In SAX we have pursued the goal of integrating the two approaches. Firstly, we decided to use a classical NLG approach for planning the global structure of the text, using a schema-like representation, while using templates at the sentence level. Secondly, we tried to enhance the flexibility of traditional static templates. The choice of a hybrid approach was based on system efficiency requirements. First of all, the system had to run on a PC. Moreover, the Analyst can tailor the text generation through a graphical interface, and this requires iterations of parameter setting and on-line generation.

1.1 The SADT Language

An SADT model is a collection of diagrams, organized in a tree structure. Each diagram is composed of boxes, representing activities, which are connected by arrows, representing flows of materials, data and information (see fig. 5). Arrow objects play different roles with regard to the activities: inputs (from the left), outputs (to the right), controls (constraints on the activity, from the top) and mechanisms (participants in the activity, from the bottom). The arrows can branch and join as shown in fig. 5. Feedback arrows are outputs which go back to activities as inputs or controls. Some models are decorated with activation rules describing the dependencies among the arrows of an activity.

Every box and arrow is given a label, i.e. a natural language expression informally describing an activity (boxes) or a flow (arrows). Box labels are constrained to be infinitive verb phrases, while arrow labels should be noun phrases. No other constraint on the linguistic form of the labels is imposed, apart from the fact that labels should be short, both for readability and for layout purposes.
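The box-and-arrow structure described above can be pictured with a small data model. The following is only an illustrative Python sketch, not the Prolog representation used by SAX or by the DAFNE Tools: the class names `Box` and `Arrow`, the `position` field, the helper functions and the toy connections between the two boxes are all assumptions made for this example. In particular, "going back" is approximated here as an arrow whose source box is at the same or a later position than its target box.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Box:
    box_id: str
    label: str       # infinitive verb phrase describing the activity
    position: int    # hypothetical left-to-right order of the box in its diagram


@dataclass(frozen=True)
class Arrow:
    label: str                  # noun phrase describing the flow
    source: Optional[str]       # box whose output this is; None if it enters from outside
    target: Optional[str]       # box that receives it; None if it leaves the diagram
    role: Optional[str] = None  # role at the target: "input", "control" or "mechanism"


def arrows_by_role(box_id, arrows):
    """Group the labels of the arrows attached to one box by the role they play."""
    roles = {"input": [], "control": [], "mechanism": [], "output": []}
    for arrow in arrows:
        if arrow.target == box_id and arrow.role in ("input", "control", "mechanism"):
            roles[arrow.role].append(arrow.label)
        if arrow.source == box_id:
            roles["output"].append(arrow.label)
    return roles


def is_feedback(arrow, boxes):
    """True for an output that returns to the same or an earlier box as input or control."""
    if arrow.source is None or arrow.target is None:
        return False
    if arrow.role not in ("input", "control"):
        return False
    return boxes[arrow.source].position >= boxes[arrow.target].position


# Toy diagram: the labels echo the sample output, the connections are invented.
boxes = {
    "A1": Box("A1", "define products and production goals", 1),
    "A2": Box("A2", "manufacture quality products", 2),
}
arrows = [
    Arrow("orders", None, "A1", "input"),
    Arrow("budget", None, "A1", "control"),
    Arrow("production goals", "A1", "A2", "control"),
    Arrow("materials", None, "A2", "input"),
    Arrow("products", "A2", None),
    Arrow("production numbers", "A2", "A1", "input"),  # returns to an earlier box
]

print(arrows_by_role("A1", arrows))
print([a.label for a in arrows if is_feedback(a, boxes)])
```

Grouping labels by role in this way mirrors the kind of lookup a template slot such as the inputs of an activity would need, while branches, joins and activation rules are left out of the sketch.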
Because labels must be short, a glossary that adds information for each label is included in the model. Abbreviations and acronyms in the labels (like TEC, VEN and others in fig. 5) are automatically expanded by the system using a table written by the Analyst.

2 A Flexible Template-Based Approach

In this section we present the hybrid approach to text generation used in SAX. During Text Planning the structure and the content of the text are computed using enhanced versions of both textual schemas and templates. Schemas specify the global structure of a text, while templates are used to cope with the sentence level.

2.1 Communication Schemas

To represent the global structure of the text we use communication schemas [NoPi95]. A communication schema is a variant of rhetorical schemas [Mc85], enhanced with the communicative intention that the corresponding presentational pattern is meant to satisfy. All the schemas in SAX have been identified through analysis of a corpus of hand-written SADT diagram descriptions.

A communication schema is shown in fig. 2. The head slot identifies the schema. The intentions slot holds a free documentation string about the communicative intentions behind the schema. The effect slot, on the other hand, describes in a simplified but formal way the mental state that the schema is meant to induce in the Reader (footnote 7).

Footnote 7. The content of the effect slot corresponds to what is known in the NLG literature as communicative intention. See [MoPa94] for the description of an application based on cycles of generation and follow-up questions, where the representation of the communicative intentions of the system plays a crucial role. In SAX communicative intentions guided the elicitation of the communication schemas, but for the moment they play no role during computation.

The constraints must be verified before applying the schema. The body slot includes a list of sub-schemas that articulate the main schema.
Each sub-schema can be optional and its expansion can be restricted by local constraints. The order slot includes linear precedence constraints on sub-schemas. If no linear precedence constraint is specified, a default order is assumed. The example in fig. 2 describes how to present the overall activity modeled by an SADT diagram. Observe that the body includes both a sub-schema and a template.

    c_schema(
        head(main_activity(ActivityId)),
        intentions('present the main activity of the diagram'),
        effect(know(reader, structure_of(ActivityId))),
        constraints((sadt_parameter(strategy, forward),
                     activity_topology(ActivityId, ActTopology))),
        body([activity(ActivityId, forward, ActTopology),
              template(sub_activities_summary(ActivityId, forward))]),
        order([before(activity(_, _), sub_activities_summary(_))])).

Fig. 2. A communication schema used in SAX

2.2 The Hyper-Template Formalism

The main features of this formalism are flexibility and the ability to cope with hypertextual objects. Flexibility comes from the ability to handle morphological agreement and phonological adjustment. In addition, the formalism allows the functional description of hyper-links and images to be inserted in the final hypertext.

A template is a declarative structure including two kinds of elements: connectives and gaps. Connectives are formed by preselected linguistic items, while gaps can be seen as variables which are instantiated by other linguistic items (fillers) during the generation process (template resolution). Both connectives and fillers may include fixed or flexible items. Flexible items are realized differently according to the context in which they occur. In the current implementation of the formalism, flexible items may undergo morphological and/or phonological variation. Here is a list of the most interesting elements that can be included in a template definition (see also fig. 3):

Potential words. A potential word is a word form which can undergo phonological adjustment. It is described by a term specifying the base form and its lexical category: w(noun, responsibility). Sequences of potential words are mapped onto sequences of strings by the Linguistic Realization component. For example, [w(preposition, di), w(article, i), w(name, responsabili)] becomes ['dei', 'responsabili'].

Morphological bundles. These are sets of morphological features that are mapped onto potential words and then onto strings by the Linguistic Realization component. For example, the bundle morpho([cat=noun, pred=company, num=plur]) is mapped onto the potential word w(noun, companies) and then onto the string 'companies'. When used in template definitions, the values of morphological features can be variables: morpho([cat=noun, pred=company, num=Num]). Morphological variables make it possible to treat agreement phenomena that are difficult to handle with static templates; these variables are instantiated during the template resolution process.

Picture descriptors. Templates can introduce a picture in a hypertext by specifying the absolute name of a file or a functional expression which is evaluated during template resolution.

Slots. A slot fills a gap with linguistic items during template resolution. In the SAX application domain these linguistic items are selected from the labels of boxes and arrows of a diagram. Here is an example of a slot specification: slot(inputs, ActId, Agreement). This expression refers to the input labels of the activity identified by ActId (input parameter). The Agreement variable is instantiated as a result of filling the slot (output parameter). A slot expression can be extended by a further expression specifying the syntactic elaboration that must be performed on the labels that fill the slot: slot(label, ActId) with parse nominalization. This slot will be filled with the nominalized label(s) of the ActId activity.

Control expressions. Template definitions can include conditional and disjunctive expressions. Conditional expressions bind the resolution of a subpart of the template to the satisfaction of certain constraints. Disjunctive expressions give alternative ways of expressing something, for example: or(['taking into account', 'considering']).

Formatting. The template formalism allows any subpart of a body to be placed within the scope of one or more formatting operators such as italic, bold, etc. All HTML format operators can be used. The formalism also supports style definitions, as sets of format operators: style(my_style, [list, italic]).

Links. Links are treated as a special class of format instructions. They are specified through complex terms, which refer to linked documents through absolute addresses (file name) or functional descriptors, evaluated at run time. Here is an example of a functional link description: link(to, glossary(activity, paragraph)). This descriptor is evaluated as a link from an activity description to the corresponding entry of the activity glossary.

    template(diagram_title(DiagId, ActId),
      [ format([title1, title_case],            /* format operator list */
               slot(label, ActId)),             /* slot without syntactic elaborations */
        &newline,                               /* special character */
        if_then_else(                           /* control expression */
          ( has_father_diagram(DiagId, FatherDiagId),      /* constraint conjunction */
            not sadt_parameter(presentation, only_text) ),
          /* then */
          [ format([picture_link(FatherDiagId)],           /* parametric style */
                   [picture(image, 'father.gif')]),        /* absolute identifier */
            &newline ],
          /* else */
          [ &newline ]),
        if_then_else(
          has_son_diagram(DiagId),              /* simple constraint */
          /* pictures identified through functional descriptors */
          [picture(map, diagram_image(DiagId))],           /* clickable image */
          [picture(image, diagram_image(DiagId))])         /* simple image */
      ]).

Fig. 3. A sample hyper-template definition

3 The SAX System

3.1 The Generation Process

The generation starts with a preliminary step (not shown in fig. 1) in which the input model is translated from the source language to a Prolog representation. The hypertext generation is performed in three phases: Text Planning, Linguistic Realization and HTML Writing.

The Text Planning component computes the content and the structure of the text on the basis of communication schemas and templates. The Text Planner recursively expands a root schema, following a top-down strategy and building a text tree structure. The selection of communication schemas is guided by the topological structure of the diagram being described and by the user preferences (see sect. 3.3). When the Text Planner reaches the sentence level, a template solver is called which integrates diagram labels in the text tree structure. The output of the Text Planner is a textual tree with instantiated templates as leaves. Instantiated templates may include strings, potential words and ground morphological bundles, i.e. a subset of the elements available when defining templates.

The Linguistic Realizer maps morphological bundles onto words and carries out all the required phonological and orthographic adjustments. The HTML Writer then maps the textual tree onto an HTML document.

3.2 A Hybrid Approach to Label Transformations

In order to produce a fluent and readable text, a label processor adapts the activity and arrow labels to the context in which they are inserted. Basically, four kinds of operations are performed:

- a word can be substituted with a morphological bundle which is parametric with respect to some agreement features;
- a verb phrase can be nominalized, e.g. from 'to produce ice-cream' to 'the production of ice-cream';
- when referring to an already mentioned activity, the complements of the nominalization can be left out, e.g. from 'to produce ice-cream' to 'the production';
- definite or indefinite articles and prepositions can be inserted, e.g. from 'definizione prodotti' (lit. 'definition products', Italian telegraphic style) to 'la definizione dei prodotti' ('the definition of the products').

The transformation of labels is performed through a special kind of syntactic analysis. During the system analysis phase two approaches to label transformation were considered. The first is based on pattern matching, the second on a full cycle of syntactic analysis, transformation and re-generation. After some testing we realized that pattern matching would have led to a great number of ad hoc rules; on the other hand, using the full machinery of a text analyzer and generator seemed overkill given the relatively simple syntactic transformations that were needed.

The solution adopted in the system is again a hybrid one. The transformations are carried out by a Definite Clause Grammar (DCG) which performs a shallow syntactic analysis of labels and, instead of generating a parse tree and/or a semantic representation, produces a sequence of linguistic items which can be strings, potential words or morphological bundles. These sequences are mapped onto actual sentences through morphological synthesis and phonological adjustment; given this approach, deep sentence generation is unnecessary. The DCG has been designed so as to guarantee at least a partial analysis/transformation of the labels.

3.3 User Preferences

The Analyst can make some choices about the text properties by setting global or local parameters. At the global level (description of the whole diagram) she can choose the description strategy, i.e. how the content is linearly presented. A thorough analysis of a corpus of human-written descriptions (footnote 8) has led to the identification of two main strategies: forward (for each activity the inputs are described first) and backward (the description starts from outputs). The Analyst can include or exclude the description of branches, joins, feedbacks and activation rules.
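The effect of these global parameters can be sketched in a few lines. This is an illustrative Python example and not the SAX implementation, which is written in Prolog and driven by communication schemas; the function name `description_plan`, the part names, and the exact ordering of controls and mechanisms under each strategy are assumptions, since the text above only fixes that the forward strategy starts from inputs and the backward strategy from outputs.

```python
# Assumed ordering for the forward strategy; only "inputs first" comes from the text.
FORWARD_ORDER = ["inputs", "controls", "mechanisms", "outputs"]

# Optional parts the Analyst can switch on or off at the global level.
OPTIONAL_PARTS = ("branches", "joins", "feedbacks", "activation_rules")


def description_plan(strategy, include=()):
    """Return the ordered list of parts to describe for one activity."""
    if strategy == "forward":
        core = list(FORWARD_ORDER)
    elif strategy == "backward":
        # Assumption: the backward strategy simply reverses the forward order.
        core = list(reversed(FORWARD_ORDER))
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")

    unknown = set(include) - set(OPTIONAL_PARTS)
    if unknown:
        raise ValueError(f"unknown optional parts: {sorted(unknown)}")

    # Optional parts keep a fixed relative order, whatever order they were requested in.
    return core + [part for part in OPTIONAL_PARTS if part in include]


print(description_plan("forward"))
print(description_plan("backward", include={"feedbacks", "joins"}))
```

Keeping the choice in a single function that returns a plan, rather than scattering conditionals through the generator, is one way to make the same settings reusable across repeated generations of the same model.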
The Analyst can also set the values of local parameters associated with single diagram elements, i.e. boxes and arrows. The current version of the system allows the following parameters to be set:

- label transformation. The user can force or prevent any of the implemented label transformations on each label.
- grammatical number specification. Sometimes the grammatical number of a label is ambiguous. The user can disambiguate it by adding morphological information to the words in the label.
- verb selection. As the system cannot perform a deep semantic analysis of the labels, it describes the relation between activities and arrows with generic verbs, e.g. "activity x produces y". The user can force the system to use a more specific verb.

Both global and local parameters can be saved and re-used in subsequent generations for the same model. Given the incremental nature of conceptual model definition, this represents a clear advantage over "brute" post-editing.

4 System Output

Fig. 4 shows the top of a hypertext description: the clickable diagram and the first part of the text (an approximate English translation is given in fig. 5). The text is generated using a forward strategy (see sect. 3.3). The text shows some examples of label transformations. The arrow label "materiali" has become "dai materiali" and the activity label "definire prodotti e obiettivi di produzione" has become "la definizione dei prodotti e degli obiettivi di produzione" (see also sect. 3.2). Phrases referring to activities (e.g. "gestire la produzione dei beni industriali") or to arrows (e.g. "la normativa") are associated with links to a glossary page where the activity or arrow is explained in more detail. These are just some of the available links: fig. 6 contains a schematization of the network of links (footnote 9).

Footnote 8. Potential SAX users have been involved in different phases of the project. In particular, Analysts have been asked to write descriptions of SADT diagrams and to evaluate the coherence between the automatically generated texts and the content of the diagrams.
Footnote 9. The bold boxes in the figure are hypertext pages, the grey arrows are hyperlinks, the arrow icons inside the main diagram page are hyperlink buttons, and the activity and arrow references are pieces of text with connected links. Finally, the map image is a clickable version of the SADT diagram.

[Fig. 4. The first part of the diagram description with clickable image]

    The activity of managing industry production is carried out using
    the materials, the orders and the customer confirmations.
    This activity is influenced by the following factors:
    - the budget
    - the regulation
    - the plant characteristics
    - the company strategies
    The results of the activity are the production goals, the production
    numbers, the products and the purchase orders.
    The activity of managing industry production can be divided into the
    following subactivities:
    1. the definition of products and production goals
    2. the manufacturing of quality products
    3. the management of customer/provider relations
    [A description of each sub-activity follows ...]

Fig. 5. Partial translation of the generated text

[Fig. 6. Page hierarchy and connecting links]

5 Related Work

Text generation as a method for validating conceptual models and easing their comprehension by non-experts has been treated by several researchers (see [Da92, Re94, Gu95, PaKu+96]). Many researchers also agree upon the usefulness of natural language descriptions of models for the author herself: the text can help her in iteratively designing the model. The above-mentioned systems use full Natural Language Generation (NLG) technology and generate plain text. Other systems presented in the literature are relevant for our purposes because they produce hypertexts, although not in the conceptual model validation domain.
These systems use template-based or hybrid generation techniques to generate dynamic hypertexts, i.e. hypertexts that are generated in response to user requests (clicks on links), possibly taking the browsing context into account (footnote 10). IDAS [ReMe+95] generates hypertexts concerning technical documentation. IDAS was initially designed to use pure NLG technology, but hybrid approaches were applied after a cost-benefit evaluation. Important aspects of hypertext generation are considered in ILEX [KnMe+96]. Upon request from the user, ILEX generates dynamic hypertext for simulated tours in a museum. The system uses a hybrid text production approach: canned text is interleaved with information coming from KB entries. PEBA-II [DaMi96, MiTu+96] dynamically generates hypertext descriptions of a zoological database through an online interface. The authors argue that hypertext significantly eases the user modeling task, since part of the content selection is made by the user. In both PEBA-II and ILEX discourse history is used to obtain context-sensitive text. In [Ge96] a template-based approach is used both at the planning level and at the realization level. Page templates filled with information from databases are used to generate hypertext nodes in a movie festival context.

SAX combines aspects dealt with by both groups of systems. The initial goal of our system is the validation of a conceptual model represented in the SADT language. The system requirements and the nature of our domain made us choose a hybrid text production approach. The result is presented as a hypertext so that the user can easily move from one diagram description to all related pages. The hypertexts include clickable images of SADT diagrams, which make navigation in the whole model description easier.

6 Conclusions

This paper discussed the approach followed in SAX, a system for automatic generation of hypertextual descriptions of SADT models.
The adopted solutions combine the advantages of a hypertextual output format with those of a hybrid natural language generation architecture. On the basis of the system efficiency requirements (PC hardware and on-line generation), a hybrid solution has been chosen for text generation: a classical NLG approach is adopted for the global text planning, while flexible templates are used for sentence-level generation. The formalism devised for templates copes with morphological agreement and phonological adjustment, thus allowing the generation of flexible and fluent text. The decision to give up traditional sentence-level generation led to a system running on PCs, in the Windows environment, and capable of generating each diagram description in less than three seconds. The Analyst is given the possibility to influence the output both at a global level, by stating a "description strategy" to follow, and locally, by constraining the way single diagram elements are linguistically expressed.

SAX has been designed in a modular way in order to single out the part of the system which depends on the SADT methodology. We feel that the system is easily portable to other conceptual modeling methodologies.

Footnote 10. The content and organization of static hypertext, on the contrary, are completely predetermined by its authors.

References

[DaMi96] Robert Dale and Maria Milosavljevic, March 1996. Authoring on Demand: Natural Language Generation in Hypertext Documents. In Proceedings of the First Australian Document Computing Conference, Melbourne, Australia.
[Da92] Hercules Dalianis, 1992. A Method for Validating a Conceptual Model by Natural Language Discourse Generation. In Proceedings of the Fourth International Conference on Advanced Information Systems Engineering, Springer Verlag, 425-444.
[Ge96] Sabine Geldof, June 1996. Hyper-Text Generation from Databases on the Internet. In Proceedings of the Second International Workshop on Applications of Natural Language to Information Systems, Amsterdam, The Netherlands.
[Gu95] Jon Atle Gulla, 1995. A General Explanation Component for Conceptual Modeling in CASE Environments. ACM Transactions on Information Systems.
[KnMe+96] Alistair Knott, Chris Mellish, Jon Oberlander and Mick O'Donnell, 1996. Sources of Flexibility in Dynamic Hypertext Generation. In Proceedings of the International Workshop on Natural Language Generation.
[MaMc88] David A. Marca and Clement L. McGowan, 1988. SADT, Structured Analysis and Design Technique. McGraw-Hill Book Company.
[Mc85] Kathleen McKeown, 1985. Text Generation. Cambridge University Press, Cambridge.
[MiTu+96] Maria Milosavljevic, Adrian Tulloch and Robert Dale, January 1996. Text Generation in a Dynamic Hypertext Environment. In Proceedings of the 19th Australasian Computer Science Conference, Melbourne, Australia.
[MoPa94] Johanna D. Moore and Cecil L. Paris, 1994. Planning Text for Advisory Dialogues: Capturing Intentional and Rhetorical Information. Computational Linguistics, Vol. 19, 4.
[NoPi95] Elena Not and Emanuele Pianta, April 1995. Specifications for the Text Structurer. GIST deliverable, TST-2, LRE Project 062-09.
[PaKu+96] Rebecca Passonneau, Karen Kukich, Jacques Robin, Vasileios Hatzivassiloglou, Larry Lefkowitz and Hongyan Jing, June 1996. Generating Summaries of Work Flow Diagrams. In Proceedings of the International Conference on Natural Language Processing and Industrial Applications, Moncton, Canada.
[Re94] Ehud Reiter, 1994. Linguistically Based Generation of Software Documentation. Final Technical Report RL-TR-94-110, Rome Laboratory (USAF), New York, USA.
[Re95] Ehud Reiter, 1995. NLG vs. Template. In Proceedings of the Fifth Workshop on Natural Language Generation, Leiden, The Netherlands.
[ReMe+95] Ehud Reiter, Chris Mellish and John Levine, 1995. Automatic Generation of Technical Documentation. Applied Artificial Intelligence, 9:259-287.
This article was processed using the LATEX macro package with LLNCS style
