Lecture Notes in Artificial Intelligence
Edited by R. Goebel, J. Siekmann, and W. Wahlster

Subseries of Lecture Notes in Computer Science 5221

Bengt Nordström
Aarne Ranta (Eds.)

Advances in Natural Language Processing

6th International Conference, GoTAL 2008
Gothenburg, Sweden, August 25-27, 2008
Proceedings

Series Editors
Randy Goebel, University of Alberta, Edmonton, Canada
Jörg Siekmann, University of Saarland, Saarbrücken, Germany
Wolfgang Wahlster, DFKI and University of Saarland, Saarbrücken, Germany

Volume Editors
Bengt Nordström
Aarne Ranta
Chalmers University of Technology
Department of Computer Science and Engineering
41296 Göteborg, Sweden
E-mail: {bengt, aarne}@chalmers.se

Library of Congress Control Number: Applied for
CR Subject Classification (1998): I.2.7, F.4.2-3, I.2, H.3, I.7
LNCS Sublibrary: SL 7 – Artificial Intelligence

ISSN 0302-9743
ISBN-10 3-540-85286-7 Springer Berlin Heidelberg New York
ISBN-13 978-3-540-85286-5 Springer Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.
Springer is a part of Springer Science+Business Media
springer.com
© Springer-Verlag Berlin Heidelberg 2008
Printed in Germany
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper
SPIN: 12463534 06/3180 543210

Preface

This volume contains the papers presented at GoTAL 2008, the 6th International Conference on Natural Language Processing, held on August 25–27, 2008, at Chalmers University of Technology in Gothenburg, Sweden. GoTAL was the sixth conference in the TAL series, preceded by FracTAL 1997 (Université de Franche-Comté, Besançon, France), VexTAL 1999 (Università Ca’ Foscari di Venezia, Venice, Italy), PorTAL 2002 (Universidade do Algarve, Faro, Portugal), EsTAL 2004 (Universitat d’Alacant, Alicante, Spain), and FinTAL 2006 (University of Turku, Turku, Finland).

The conference received 107 submissions. Each submission was reviewed by three programme committee members or external reviewers. The committee finally accepted 44 papers to be presented at the conference and included in the proceedings. The conference programme also included three invited talks, which are the first three papers in this volume.

We are grateful to the programme committee members and the external reviewers for their careful and punctual work. The staff in the local organization team at Chalmers helped in a very professional way. The sponsors contributed, in particular, to the social programme planned for the conference. The invited speakers – Johan Bos, Lori Lamel, and Joakim Nivre – gave the scientific programme the broad, yet focused, profile that we wanted to achieve. And finally, it is essentially the authors of the submissions that created the substance of the conference and this volume, with all their good papers and their cooperative attitude.

The EasyChair software was used throughout the reviewing and editing process.
It saved us a lot of work by doing exactly the things that could be automatized, in exactly the ways we expected.

June 2008
Bengt Nordström
Aarne Ranta

Organization

Local Organization

Björn Bringert
Håkan Burden
Rebecca Cyrén
Markus Forsberg
Harald Hammarström
Tiina Rankanen

Programme Committee

Olli Aaltonen             University of Helsinki, Finland
Walid El Abed             Nestle Corp., Switzerland
Jan Alexandersson         DFKI, Germany
Jorge Baptista            University of Algarve, Portugal
Tilman Becker             DFKI, Germany
Chris Biemann             Powerset, USA
Patrick Blackburn         INRIA Lorraine
Lars Borin                University of Gothenburg, Sweden
Johan Bos                 University of Rome “La Sapienza,” Italy
Johan Boye                SpeechAct, Sweden
Caroline Brun             Xerox Corp., France
Sylviane Cardey           University of Franche-Comté, France
Lauri Carlson             University of Helsinki, Finland
Rolf Carlson              KTH, Sweden
Alexander Clark           Royal Holloway University of London, UK
Robin Cooper              University of Gothenburg, Sweden
Walter Daelemans          University of Antwerp, Belgium
Rodolfo Delmonte          University of Venice, Italy
Elisabet Engdahl          University of Gothenburg, Sweden
Jan van Eijck             CWI Amsterdam, The Netherlands
Filip Ginter              University of Turku, Finland
Peter Greenfield          University of Franche-Comté, France
Philippe de Groote        INRIA Lorraine, France
Viggo Kann                KTH, Sweden
Kimmo Koskenniemi         University of Helsinki, Finland
Hans Leiß                 LMU Munich, Germany
Oliver Lemon              University of Edinburgh, UK
Patricio Martinez Barco   University of Alicante, Spain
Adeline Nazarenko         University Paris-Nord, France
Joakim Nivre              Växjö University and Uppsala University, Sweden
Bengt Nordström           Chalmers University of Technology, Sweden
Pierre Nugues             University of Lund, Sweden
Guy Perrier               INRIA Lorraine, France
Elisabete Ranchhod        University of Lisbon, Portugal
Aarne Ranta               University of Gothenburg, Sweden (Chair)
Manny Rayner              University of Geneva, Switzerland
Tapio Salakoski           University of Turku, Finland
Karl-Michael Schneider    Textkernel, The Netherlands
Rolf Schwitter            Macquarie University, Australia
José Luis Vicedo          University of Alicante, Spain
Simo Vihjanen             Lingsoft Ltd., Finland
Annie Zaenen              Palo Alto Research Center, USA

External Reviewers

Krasimir Angelov
Björn Bringert
Håkan Burden
Maud Ehrmann
Samuel Eleutério
Oscar Ferrandez
Sergio Ferrández
Andrei Filip
Bruno Guillaume
Markus Forsberg
Caroline Hagege
Harald Hammarström
Rubén Izquierdo Beviá
Guillaume Jacquet
Kristofer Johannisson
Richard Johansson
Lauri Karttunen
Janna Khegai
Marco Kuhlmann
Joseph Le Roux
Peter Ljunglöf
Beáta Megyesi
Borja Navarro Colorado
Magnus Rosell
Fernando Ruiz-Rico
Markus Saers
Estela Saquete
Kamel Smaïli
David Tomás
Marcus Uneson

Sponsors

Centre for Language Technology, Gothenburg
City of Göteborg
Lingsoft Ltd., Helsinki

Table of Contents

Formal Semantics in the Real World
    Johan Bos

Speech Processing for Audio Indexing
    Lori Lamel and Jean-Luc Gauvain

Sorting Out Dependency Parsing
    Joakim Nivre

“I Know What You Feel”: Analyzing the Role of Conjunctions in Automatic Sentiment Analysis
    Ritesh Agarwal, T.V. Prabhakar, and Sugato Chakrabarty

Automatic Annotation of Direct Reported Speech in Arabic and French, According to a Semantic Map of Enunciative Modalities
    Motasem Alrahabi and Jean-Pierre Desclés

Type-Theoretical Bulgarian Grammar
    Krasimir Angelov

A Compact Arabic Lexical Semantics Language Resource Based on the Theory of Semantic Fields
    Mohamed Attia, Mohsen Rashwan, Ahmed Ragheb, Mohamed Al-Badrashiny, Husein Al-Basoumy, and Sherif Abdou

Automatically Extracting Personal Name Aliases from the Web
    Danushka Bollegala, Taiki Honma, Yutaka Matsuo, and Mitsuru Ishizuka

An Efficient Statistical Approach for Automatic Organic Chemistry Summarization
    Florian Boudin, Juan-Manuel Torres-Moreno, and Patricia Velázquez-Morales

Augmenting Word Space Models for Word Sense Discrimination Using an Automatic Thesaurus
    Hiram Calvo

Plagiarism Detection Based on Singular Value Decomposition
    Zdenek Ceska

Networking Multiword Units
    Matthieu Constant and Patrick Watrin

Searching for Part of Speech Tags That Improve Parsing Models
    Martín Ariel Domínguez and Gabriel Infante-Lopez

A POS-Based Word Prediction System for the Persian Language
    Masood Ghayoomi and Ehsan Daroodi

A Graph Based Method for Building Multilingual Weakly Supervised Dependency Parsers
    Jagadeesh Gorla, Anil Kumar Singh, Rajeev Sangal, Karthik Gali, Samar Husain, and Sriram Venkatapathy

A Web-Based Self-training Approach for Authorship Attribution
    Rafael Guzmán-Cabrera, Manuel Montes-y-Gómez, Paolo Rosso, and Luis Villaseñor-Pineda

Parsing Discontinuous Phrase Structure with Grammatical Functions
    Johan Hall and Joakim Nivre

How Can the Term Compositionality be Useful for Acquiring Elementary Semantic Relations?
    Thierry Hamon and Natalia Grabar

Training Statistical Language Models from Grammar-Generated Data: A Comparative Case-Study
    Beth Ann Hockey, Manny Rayner, and Gwen Christian

A Mixed Method Lemmatization Algorithm Using a Hierarchy of Linguistic Identities (HOLI)
    Anton Karl Ingason, Sigrún Helgadóttir, Hrafn Loftsson, and Eiríkur Rögnvaldsson

Semantic Roles in Valency Lexicon of Czech Verbs: Verbs of Communication and Exchange
    Václava Kettnerová, Markéta Lopatková, and Klára Hrstková

Automatic Generation of Frequent Case Forms of Query Keywords in Text Retrieval
    Kimmo Kettunen

Definition Extraction with Balanced Random Forests
    Łukasz Kobyliński and Adam Przepiórkowski

Reviewing and Evaluating Automatic Term Recognition Techniques
    Ioannis Korkontzelos, Ioannis P. Klapaftis, and Suresh Manandhar

Finding Text Boundaries and Finding Topic Boundaries: Two Different Tasks?
    Alexandre Labadié and Violaine Prince

Tamil Question Classification Using Morpheme Features
    S. Lakshmana Pandian and T.V. Geetha

Analogical Translation of Medical Words in Different Languages
    Philippe Langlais, François Yvon, and Pierre Zweigenbaum

Improving Chinese Pronominal Anaphora Resolution by Extensive Feature Representation and Confidence Estimation
    Tyne Liang and Dian-Song Wu

A Grammar Formalism for Specifying ISU-Based Dialogue Systems
    Peter Ljunglöf and Staffan Larsson

Word Sense Disambiguation of Farsi Homographs Using Thesaurus and Corpus
    Raheleh Makki and Mohammad Mehdi Homayounpour

Local Rephrasing Suggestions for Supporting the Work of Writers
    Aurélien Max

Interactive Multilingual Web Applications with Grammatical Framework
    Moisés Salvador Meza Moreno and Björn Bringert

Dependency Parsing by Transformation and Combination
    Jens Nilsson and Joakim Nivre

Using Constraints over Finite Sets of Integers for Range Concatenation Grammar Parsing
    Yannick Parmentier and Wolfgang Maier

Analyzing Argumentative Structures in Procedural Texts
    Lionel Fontan and Patrick Saint-Dizier

Natural Language Processing Across Time: An Empirical Investigation on Italian
    Marco Pennacchiotti and Fabio Massimo Zanzotto

Statistical Surface Realisation of Portuguese Referring Expressions
    Daniel Bastos Pereira and Ivandré Paraboni

Classification-Based Filtering of Semantic Relatedness in Hypernymy Extraction
    Maciej Piasecki, Stanislaw Szpakowicz, Michal Marcińczuk, and Bartosz Broda

Similarity of Names Across Scripts: Edit Distance Using Learned Costs of N-Grams
    Bruno Pouliquen

Turkish Language Resources: Morphological Parser, Morphological Disambiguator and Web Corpus
    Haşim Sak, Tunga Güngör, and Murat Saraçlar

A Graph Partitioning Approach to Entity Disambiguation Using Uncertain Information
    Emili Sapena, Lluís Padró, and Jordi Turmo

Arabic Named Entity Recognition from Diverse Text Types
    Khaled Shaalan and Hafsa Raza

A Noun-Predicate Bigram-Based Similarity Measure for Lexical Relations
    Hyopil Shin and Insik Cho

German Compounds in Factored Statistical Machine Translation
    Sara Stymne

A Reordering Model for Phrase-Based Machine Translation
    Vinh Van Nguyen, Thai Phuong Nguyen, Akira Shimazu, and Minh Le Nguyen

Interruption, Resumption and Domain Switching in In-Vehicle Dialogue
    Jessica Villing, Cecilia Holtelius, Staffan Larsson, Anders Lindström, Alexander Seward, and Nina Åberg

Finite Matters: Verbal Features in Data-Driven Parsing of Swedish
    Lilja Øvrelid

Author Index