Lecture Notes in Artificial Intelligence
Edited by R. Goebel, J. Siekmann, and W. Wahlster
Subseries of Lecture Notes in Computer Science
5221
Bengt Nordström, Aarne Ranta (Eds.)
Advances in Natural Language Processing
6th International Conference, GoTAL 2008
Gothenburg, Sweden, August 25-27, 2008
Proceedings
Series Editors
Randy Goebel, University of Alberta, Edmonton, Canada
Jörg Siekmann, University of Saarland, Saarbrücken, Germany
Wolfgang Wahlster, DFKI and University of Saarland, Saarbrücken, Germany
Volume Editors
Bengt Nordström
Aarne Ranta
Chalmers University of Technology
Department of Computer Science and Engineering
41296 Göteborg, Sweden
E-mail: {bengt, aarne}@chalmers.se
Library of Congress Control Number: Applied for
CR Subject Classification (1998): I.2.7, F.4.2-3, I.2, H.3, I.7
LNCS Sublibrary: SL 7 – Artificial Intelligence
ISSN 0302-9743
ISBN-10 3-540-85286-7 Springer Berlin Heidelberg New York
ISBN-13 978-3-540-85286-5 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting,
reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965,
in its current version, and permission for use must always be obtained from Springer. Violations are liable
to prosecution under the German Copyright Law.
Springer is a part of Springer Science+Business Media
springer.com
© Springer-Verlag Berlin Heidelberg 2008
Printed in Germany
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper
SPIN: 12463534
06/3180
543210
Preface
This volume contains the papers presented at GoTAL 2008, the 6th International Conference on Natural Language Processing, held on August 25–27, 2008,
at Chalmers University of Technology in Gothenburg, Sweden. GoTAL was the
sixth conference in the TAL series, preceded by FracTAL 1997 (Université de
Franche-Comté, Besançon, France), VexTAL 1999 (Università Ca’ Foscari di
Venezia, Venice, Italy), PorTAL 2002 (Universidade do Algarve, Faro, Portugal), EsTAL 2004 (Universitat d’Alacant, Alicante, Spain), and FinTAL 2006
(University of Turku, Turku, Finland).
The conference received 107 submissions. Each submission was reviewed by
three programme committee members or external reviewers. The committee finally accepted 44 papers to be presented at the conference and included in the
proceedings. The conference programme also included three invited talks, whose
papers appear first in this volume.
We are grateful to the programme committee members and the external reviewers for their careful and punctual work. The local organization team at Chalmers
provided highly professional support. The sponsors contributed,
in particular, to the social programme planned for the conference. The invited
speakers, Johan Bos, Lori Lamel, and Joakim Nivre, gave the scientific programme the broad, yet focused, profile that we wanted to achieve. Finally,
it is the authors of the submissions who created the substance of the
conference and of this volume, through their good papers and their cooperative
attitude.
The EasyChair software was used throughout the reviewing and editing process. It saved us a lot of work by doing exactly the things that could be automated, in exactly the ways we expected.
June 2008
Bengt Nordström
Aarne Ranta
Organization
Local Organization
Björn Bringert
Håkan Burden
Rebecca Cyrén
Markus Forsberg
Harald Hammarström
Tiina Rankanen
Programme Committee
Olli Aaltonen, University of Helsinki, Finland
Walid El Abed, Nestle Corp., Switzerland
Jan Alexandersson, DFKI, Germany
Jorge Baptista, University of Algarve, Portugal
Tilman Becker, DFKI, Germany
Chris Biemann, Powerset, USA
Patrick Blackburn, INRIA Lorraine, France
Lars Borin, University of Gothenburg, Sweden
Johan Bos, University of Rome “La Sapienza,” Italy
Johan Boye, SpeechAct, Sweden
Caroline Brun, Xerox Corp., France
Sylviane Cardey, University of Franche-Comté, France
Lauri Carlson, University of Helsinki, Finland
Rolf Carlson, KTH, Sweden
Alexander Clark, Royal Holloway University of London, UK
Robin Cooper, University of Gothenburg, Sweden
Walter Daelemans, University of Antwerp, Belgium
Rodolfo Delmonte, University of Venice, Italy
Elisabet Engdahl, University of Gothenburg, Sweden
Jan van Eijck, CWI Amsterdam, The Netherlands
Filip Ginter, University of Turku, Finland
Peter Greenfield, University of Franche-Comté, France
Philippe de Groote, INRIA Lorraine, France
Viggo Kann, KTH, Sweden
Kimmo Koskenniemi, University of Helsinki, Finland
Hans Leiß, LMU Munich, Germany
Oliver Lemon, University of Edinburgh, UK
Patricio Martinez Barco, University of Alicante, Spain
Adeline Nazarenko, University Paris-Nord, France
Joakim Nivre, Växjö University and Uppsala University, Sweden
Bengt Nordström, Chalmers University of Technology, Sweden
Pierre Nugues, University of Lund, Sweden
Guy Perrier, INRIA Lorraine, France
Elisabete Ranchhod, University of Lisbon, Portugal
Aarne Ranta, University of Gothenburg, Sweden (Chair)
Manny Rayner, University of Geneva, Switzerland
Tapio Salakoski, University of Turku, Finland
Karl-Michael Schneider, Textkernel, The Netherlands
Rolf Schwitter, Macquarie University, Australia
José Luis Vicedo, University of Alicante, Spain
Simo Vihjanen, Lingsoft Ltd., Finland
Annie Zaenen, Palo Alto Research Center, USA
External Reviewers
Krasimir Angelov
Björn Bringert
Håkan Burden
Maud Ehrmann
Samuel Eleutério
Oscar Ferrandez
Sergio Ferrández
Andrei Filip
Bruno Guillaume
Markus Forsberg
Caroline Hagege
Harald Hammarström
Rubén Izquierdo Beviá
Guillaume Jacquet
Kristofer Johannisson
Richard Johansson
Lauri Karttunen
Janna Khegai
Marco Kuhlmann
Joseph Le Roux
Peter Ljunglöf
Beáta Megyesi
Borja Navarro Colorado
Magnus Rosell
Fernando Ruiz-Rico
Markus Saers
Estela Saquete
Kamel Smaïli
David Tomás
Marcus Uneson
Sponsors
Centre for Language Technology, Gothenburg
City of Göteborg
Lingsoft Ltd., Helsinki
Table of Contents
Formal Semantics in the Real World . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Johan Bos
1
Speech Processing for Audio Indexing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Lori Lamel and Jean-Luc Gauvain
4
Sorting Out Dependency Parsing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Joakim Nivre
16
“I Know What You Feel”: Analyzing the Role of Conjunctions in
Automatic Sentiment Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Ritesh Agarwal, T.V. Prabhakar, and Sugato Chakrabarty
28
Automatic Annotation of Direct Reported Speech in Arabic and
French, According to a Semantic Map of Enunciative Modalities . . . . . . .
Motasem Alrahabi and Jean-Pierre Desclés
40
Type-Theoretical Bulgarian Grammar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Krasimir Angelov
52
A Compact Arabic Lexical Semantics Language Resource Based on the
Theory of Semantic Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Mohamed Attia, Mohsen Rashwan,
Ahmed Ragheb, Mohamed Al-Badrashiny,
Husein Al-Basoumy, and Sherif Abdou
65
Automatically Extracting Personal Name Aliases from the Web . . . . . . . .
Danushka Bollegala, Taiki Honma, Yutaka Matsuo, and
Mitsuru Ishizuka
77
An Efficient Statistical Approach for Automatic Organic Chemistry
Summarization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Florian Boudin, Juan-Manuel Torres-Moreno, and
Patricia Velázquez-Morales
89
Augmenting Word Space Models for Word Sense Discrimination Using
an Automatic Thesaurus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Hiram Calvo
100
Plagiarism Detection Based on Singular Value Decomposition . . . . . . . . .
Zdenek Ceska
108
Networking Multiword Units . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Matthieu Constant and Patrick Watrin
120
Searching for Part of Speech Tags That Improve Parsing Models . . . . . . .
Martín Ariel Domínguez and Gabriel Infante-Lopez
126
A POS-Based Word Prediction System for the Persian Language . . . . . . .
Masood Ghayoomi and Ehsan Daroodi
138
A Graph Based Method for Building Multilingual Weakly Supervised
Dependency Parsers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Jagadeesh Gorla, Anil Kumar Singh, Rajeev Sangal, Karthik Gali,
Samar Husain, and Sriram Venkatapathy
148
A Web-Based Self-training Approach for Authorship Attribution . . . . . . .
Rafael Guzmán-Cabrera, Manuel Montes-y-Gómez,
Paolo Rosso, and Luis Villaseñor-Pineda
160
Parsing Discontinuous Phrase Structure with Grammatical Functions . . .
Johan Hall and Joakim Nivre
169
How Can the Term Compositionality be Useful for Acquiring
Elementary Semantic Relations? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Thierry Hamon and Natalia Grabar
181
Training Statistical Language Models from Grammar-Generated Data:
A Comparative Case-Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Beth Ann Hockey, Manny Rayner, and Gwen Christian
193
A Mixed Method Lemmatization Algorithm Using a Hierarchy of
Linguistic Identities (HOLI) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Anton Karl Ingason, Sigrún Helgadóttir, Hrafn Loftsson, and
Eiríkur Rögnvaldsson
205
Semantic Roles in Valency Lexicon of Czech Verbs: Verbs of
Communication and Exchange . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Václava Kettnerová, Markéta Lopatková, and Klára Hrstková
217
Automatic Generation of Frequent Case Forms of Query Keywords in
Text Retrieval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Kimmo Kettunen
222
Definition Extraction with Balanced Random Forests . . . . . . . . . . . . . . . . .
Łukasz Kobyliński and Adam Przepiórkowski
237
Reviewing and Evaluating Automatic Term Recognition Techniques . . . .
Ioannis Korkontzelos, Ioannis P. Klapaftis, and Suresh Manandhar
248
Finding Text Boundaries and Finding Topic Boundaries: Two Different
Tasks? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Alexandre Labadié and Violaine Prince
260
Tamil Question Classification Using Morpheme Features . . . . . . . . . . . . . .
S. Lakshmana Pandian and T.V. Geetha
272
Analogical Translation of Medical Words in Different Languages . . . . . . .
Philippe Langlais, François Yvon, and Pierre Zweigenbaum
284
Improving Chinese Pronominal Anaphora Resolution by Extensive
Feature Representation and Confidence Estimation . . . . . . . . . . . . . . . . . . .
Tyne Liang and Dian-Song Wu
296
A Grammar Formalism for Specifying ISU-Based Dialogue Systems . . . .
Peter Ljunglöf and Staffan Larsson
303
Word Sense Disambiguation of Farsi Homographs Using Thesaurus and
Corpus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Raheleh Makki and Mohammad Mehdi Homayounpour
315
Local Rephrasing Suggestions for Supporting the Work of Writers . . . . . . .
Aurélien Max
324
Interactive Multilingual Web Applications with Grammatical
Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Moisés Salvador Meza Moreno and Björn Bringert
336
Dependency Parsing by Transformation and Combination . . . . . . . . . . . . .
Jens Nilsson and Joakim Nivre
348
Using Constraints over Finite Sets of Integers for Range Concatenation
Grammar Parsing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Yannick Parmentier and Wolfgang Maier
360
Analyzing Argumentative Structures in Procedural Texts . . . . . . . . . . . . .
Lionel Fontan and Patrick Saint-Dizier
366
Natural Language Processing Across Time: An Empirical Investigation
on Italian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Marco Pennacchiotti and Fabio Massimo Zanzotto
371
Statistical Surface Realisation of Portuguese Referring Expressions . . . . .
Daniel Bastos Pereira and Ivandré Paraboni
383
Classification-Based Filtering of Semantic Relatedness in Hypernymy
Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Maciej Piasecki, Stanislaw Szpakowicz, Michal Marcińczuk, and
Bartosz Broda
393
Similarity of Names Across Scripts: Edit Distance Using Learned Costs
of N-Grams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Bruno Pouliquen
405
Turkish Language Resources: Morphological Parser, Morphological
Disambiguator and Web Corpus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Haşim Sak, Tunga Güngör, and Murat Saraçlar
417
A Graph Partitioning Approach to Entity Disambiguation Using
Uncertain Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Emili Sapena, Lluís Padró, and Jordi Turmo
428
Arabic Named Entity Recognition from Diverse Text Types . . . . . . . . . . .
Khaled Shaalan and Hafsa Raza
440
A Noun-Predicate Bigram-Based Similarity Measure for Lexical
Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Hyopil Shin and Insik Cho
452
German Compounds in Factored Statistical Machine Translation . . . . . . .
Sara Stymne
464
A Reordering Model for Phrase-Based Machine Translation . . . . . . . . . . .
Vinh Van Nguyen, Thai Phuong Nguyen, Akira Shimazu, and
Minh Le Nguyen
476
Interruption, Resumption and Domain Switching in In-Vehicle
Dialogue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Jessica Villing, Cecilia Holtelius, Staffan Larsson,
Anders Lindström, Alexander Seward, and Nina Åberg
488
Finite Matters: Verbal Features in Data-Driven Parsing of Swedish . . . . .
Lilja Øvrelid
500
Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
511