Abstract: Natural language generation systems (NLG) map non-linguistic representations into strings of words through a number of steps, using intermediate representations at various levels of abstraction. Template-based systems, by contrast, tend to use only one representation level, i.e., fixed strings, which are combined, possibly in a sophisticated way, to generate the final text.
Abstract Automated extraction of ontological knowledge from text corpora is a relevant task in Natural Language Processing. In this paper, we focus on the problem of finding hypernyms for relevant concepts in a specific domain (e.g., Optical Recording) in the context of a concrete and challenging application scenario (patent processing). To this end, information available on the Web is exploited. The extraction method includes four main steps.
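The four steps themselves are not spelled out in this abstract; purely as a loose illustration of the Web-based idea, one common ingredient is matching Hearst-style lexico-syntactic patterns over retrieved text snippets, as in the hedged sketch below. The patterns, snippets and ranking are assumptions for illustration, not the paper's method.

```python
import re
from collections import Counter

# Hypothetical Hearst-style patterns signalling hypernymy; the paper's actual
# patterns, Web queries and ranking are not reproduced here.
PATTERNS = [
    r"(?P<hyper>\w+(?: \w+)?) such as (?P<hypo>{term})",
    r"(?P<hypo>{term}) is a (?:kind|type) of (?P<hyper>\w+(?: \w+)?)",
    r"(?P<hypo>{term}) and other (?P<hyper>\w+(?: \w+)?)",
]

def hypernym_candidates(term, snippets):
    """Rank hypernym candidates for `term` over text snippets (e.g. the result
    snippets returned by a Web search built around the term)."""
    counts = Counter()
    for pattern in PATTERNS:
        regex = re.compile(pattern.format(term=re.escape(term)), re.IGNORECASE)
        for snippet in snippets:
            for match in regex.finditer(snippet):
                counts[match.group("hyper").lower()] += 1
    return counts.most_common()

snippets = [
    "Many storage media such as optical disc and magnetic tape are in use.",
    "The optical disc is a kind of storage medium.",
]
print(hypernym_candidates("optical disc", snippets))
```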
We present the C-STAR Italian Generator (XIG), a system for generating Italian text from the interlingua content representation (Interchange Format) adopted within the C-STAR II speech-to-speech translation project. The constraints of the application scenario led us to follow a Mixed Representations approach to text generation and to adopt for the sentence planner an architecture based on cascades of default rules.
Abstract This document reports the process of extending MorphoPro for Venetan, a lesser-used language spoken in the north-eastern part of Italy. MorphoPro is the morphological component of TextPro, a suite of tools oriented towards a number of NLP tasks. In order to extend this component to Venetan, we developed a declarative representation of the morphological knowledge necessary to analyze and synthesize Venetan words.
Abstract Clustering of the entities composing a Web application (static and dynamic pages) can be used to support program understanding. However, several alternative options are available when a clustering technique is designed for Web applications. The entities to be clustered can be described in different ways (e.g., by their structure, by their connectivity, or by their content), different similarity measures are possible, and alternative procedures can be used to form the clusters.
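As a toy illustration of one possible configuration of these design choices — content-based page descriptions, cosine similarity, and a greedy grouping procedure — here is a minimal sketch. The descriptions, similarity measure and clustering procedure are all interchangeable options and none of them is claimed to be the configuration studied in the paper.

```python
from collections import Counter
from math import sqrt

# Illustrative only: pages are described by their textual content (one of the
# options discussed above); structure- or connectivity-based descriptions and
# other similarity measures would slot in at the same two points.
def cosine(a: Counter, b: Counter) -> float:
    shared = set(a) & set(b)
    num = sum(a[t] * b[t] for t in shared)
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def cluster(pages: dict, threshold: float = 0.3):
    """Greedy single-pass clustering: each page joins the first cluster whose
    representative is similar enough, otherwise it starts a new cluster."""
    vectors = {name: Counter(text.lower().split()) for name, text in pages.items()}
    clusters = []  # list of lists of page names
    for name, vec in vectors.items():
        for group in clusters:
            if cosine(vec, vectors[group[0]]) >= threshold:
                group.append(name)
                break
        else:
            clusters.append([name])
    return clusters

pages = {
    "catalog.php": "product list price cart order",
    "item.php": "product price description cart",
    "about.html": "company history mission staff",
}
print(cluster(pages))
```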
Abstract Dictionaries are a valuable source of information about multiwords. Unfortunately, only a few multiwords are explicitly marked as such in dictionaries: most of them are presented without being distinguished from free combinations of words. In this paper we present a methodology for detecting hidden multiwords in bilingual dictionaries, along with their translations into the other language.
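The detection methodology itself is not described in this abstract; the sketch below illustrates just one plausible signal, namely a headword whose translation on the other side of the dictionary is a multi-word string. The sample entries and the criterion are assumptions for illustration only.

```python
import re

# One plausible signal only (the paper's actual detection criteria are not
# described in the abstract): a headword whose translation on the other side
# of the dictionary is a multi-word string yields a candidate pair.
def hidden_multiword_candidates(entries):
    """entries: iterable of (headword, translation) pairs from a bilingual dictionary."""
    candidates = []
    for headword, translation in entries:
        if re.search(r"\s", translation.strip()):
            candidates.append((translation.strip(), headword))
    return candidates

sample_entries = [
    ("ferrovia", "railway line"),
    ("casa", "house"),
    ("capostazione", "station master"),
]
print(hidden_multiword_candidates(sample_entries))
```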
Abstract Institutions and companies that are based in countries where the main language is not English typically publish Web sites that offer the same information at least in the local language and in English. However, the evolution of these Web sites may be troublesome if the same pages are replicated for all supported languages. In fact, changes have to be propagated to all translations of a modified page.
Abstract This paper presents a speech-to-speech translation system for tourism applications developed in the context of the C-STAR consortium. Potential users can communicate by speech, in their own language, with a travel agent in order to organize their travel. The system uses an interchange format representation of the semantic contents of utterances, which is flexible and simplifies porting the system to new languages.
Abstract In this paper we present a novel approach to mapping FrameNet lexical units to WordNet synsets in order to automatically enrich the lexical unit set of a given frame. While the mapping approaches proposed in the past mainly rely on the semantic similarity between lexical units in a frame and lemmas in a synset, we exploit the definitions of the lexical entries in FrameNet and the WordNet glosses to find the best candidate synset(s) for the mapping. Evaluation results are also reported and discussed.
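A hedged sketch of the gloss-based idea: rank the WordNet synsets of a lexical unit's lemma by the overlap between a definition string and each synset gloss, using NLTK's WordNet interface. The plain token-overlap scoring and the example definition are assumptions, not the paper's actual measure.

```python
from nltk.corpus import stopwords, wordnet as wn

# Assumes the NLTK WordNet and stopwords corpora are installed. The token
# overlap below is an illustrative stand-in for the similarity used in the paper.
STOP = set(stopwords.words("english"))

def content_words(text):
    return {w.lower() for w in text.split() if w.isalpha() and w.lower() not in STOP}

def best_synsets(lemma, lu_definition, pos=wn.NOUN, top_n=2):
    """Rank the WordNet synsets of `lemma` by overlap between the lexical-unit
    definition and each synset gloss."""
    definition_words = content_words(lu_definition)
    scored = []
    for synset in wn.synsets(lemma, pos=pos):
        overlap = len(definition_words & content_words(synset.definition()))
        scored.append((overlap, synset))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]

# an illustrative definition string standing in for a FrameNet LU definition
print(best_synsets("statement", "a message that is stated or declared"))
```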
Abstract We propose a feature type classification intended to be used in a therapeutic context. Such a scenario lies behind our need for an easily usable and cognitively plausible classification. Nevertheless, our proposal has both a practical and a theoretical outcome, and its applications range from computational linguistics to psycholinguistics. An evaluation through inter-coder agreement has been performed to highlight the strengths of our proposal and to identify some improvements for the future.
ABSTRACT This document reports on Temporal Expression annotation for the Italian Content Annotation Bank (I-CAB) being developed at ITC-irst. We describe the extensions to the English annotation guidelines that are required for Italian and provide a large number of examples and a detailed description of the benchmark.
Abstract In this paper we address the task of transferring FrameNet annotations from an English corpus to an aligned Italian corpus. Experiments were carried out on an English-Italian bitext extracted from the Europarl corpus and on a set of selected sentences from the English FrameNet corpus that have been manually translated into Italian.
We present a framework for supporting ontology engineering by exploiting key-concept extraction. The framework is implemented in an existing wiki-based collaborative platform which has been extended with a component for terminology extraction from domain-specific textual corpora, and with a further step aimed at matching the extracted concepts with pre-existing structured and semi-structured information.
One of the issues of the NESPOLE! scenario is whether we need to analyse free-hand strokes at the semantic or at the lexical level. In other words, whether gestures need to be translated or can simply be recognized and re-produced. Moreover, even in the lexical approach, it may be the case that it will suffice to just segment a sequence of strokes without classifying it as a specific object (e.g., recognizing a sequence of three strokes as a single gesture without classifying it as an arrow). Let's call this kind of analysis partial lexical analysis.
Abstract. Citizens are increasingly aware of the influence of environmental and meteorological conditions on the quality of their life. This results in an increasing demand for personalized environmental information, i.e., information that is tailored to citizens' specific context and background. In this demonstration, we present an environmental information system that addresses this demand in its full complexity in the context of the PESCaDO EU project.
Abstract In this paper, we address the issue of how to annotate discontinuous elements in XML. We will take discontinuous multiwords as a case study to investigate different annotation possibilities, in the framework of the linguistic annotation of the MEANING Italian Corpus.
Abstract In order to show that a system for text understanding has produced a sound representation of the semantic and pragmatic contents of a story, it should be able to answer questions about the participants and the events occurring in the story. This requires processing not only linguistic descriptions that are lexically expressed but also unexpressed ones, a task that, in our opinion, can only be accomplished starting from full-fledged semantic representations.
The topic of this paper is the theoretical foundations and the results of a system for text analysis and understanding called GETA_RUN, developed at the University of Venice, Laboratory of Computational Linguistics, Department of Linguistics and Language Teaching Theory. The main tenet of the theory supporting the construction of the system is that it is possible to reduce access to domain world knowledge by means of contextual reasoning, i.e., reasoning triggered independently by contextual or linguistic features of the text.
Abstract This paper presents the Italian NESPOLE! Database. The database consists of three parts: the first two, called DB-1 and DB-2, concern the tourism domain, while the third part, DB-3, concentrates on the medical domain. The database includes audio files, transcriptions, Interlingua annotations in IF (Interchange Format) and translations into English, French and German.
Parsing with Prolog. Emanuele Pianta, Fondazione Bruno Kessler, CELCT, pianta@fbk.eu. Text Processing 2011-2012, Trento University. Prolog is a declarative (as opposed to imperative or functional) programming language based on first-order logic. Instead of specifying how to achieve a certain goal in a certain situation (imperative style), we specify what the situation is (through rules and facts) and what the goal is (the query), and let the Prolog interpreter derive the solution for us.
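To make the facts/rules/query idea concrete without a Prolog interpreter, here is a hedged toy analogue in Python: a handful of ground facts, one ancestry rule applied by naive forward chaining, and a query over the derived facts. The family facts and the tiny engine are illustrative assumptions; this is not Prolog syntax or the course material itself.

```python
# Toy analogue of "facts + rules + query": the engine derives ancestor facts
# from parent facts and then answers a query over everything it has derived.
facts = {("parent", "anna", "bruno"), ("parent", "bruno", "carla")}

# Rules: ancestor(X, Y) if parent(X, Y); ancestor(X, Z) if parent(X, Y) and ancestor(Y, Z).
def derive(facts):
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        new = set()
        for rel, x, y in derived:
            if rel == "parent":
                new.add(("ancestor", x, y))
                for rel2, y2, z in derived:
                    if rel2 == "ancestor" and y == y2:
                        new.add(("ancestor", x, z))
        if not new <= derived:
            derived |= new
            changed = True
    return derived

# "Query": who are the ancestors of carla?
print(sorted(x for rel, x, y in derive(facts) if rel == "ancestor" and y == "carla"))
```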
Abstract In this paper we present KNOWA, an English/Italian word aligner, developed at ITC-irst, which relies mostly on information contained in bilingual dictionaries. The performance of KNOWA is compared with that of GIZA++, a state-of-the-art statistics-based alignment algorithm. The two algorithms are evaluated on the EuroCor and MultiSemCor tasks, that is, on two publicly available English/Italian parallel corpora.
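As a rough illustration of the dictionary-driven idea only (KNOWA's actual alignment heuristics are not described in the abstract and are not reproduced here), the following sketch links an English token to an Italian token whenever a toy bilingual dictionary licenses the pair.

```python
# A minimal sketch of dictionary-driven word alignment; the bilingual
# dictionary below is a toy stand-in, not a real resource.
bilingual = {
    "house": {"casa"},
    "red": {"rossa", "rosso"},
    "the": {"la", "il"},
}

def align(english_tokens, italian_tokens):
    """Return (en_index, it_index) pairs whenever the dictionary licenses a link."""
    links = []
    for i, en in enumerate(english_tokens):
        translations = bilingual.get(en.lower(), set())
        for j, it in enumerate(italian_tokens):
            if it.lower() in translations:
                links.append((i, j))
    return links

print(align("the red house".split(), "la casa rossa".split()))
```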
Abstract This document contains the instructions for preparing a camera-ready manuscript for the proceedings of EACL-06. The document itself conforms to its own specifications, and is therefore an example of what your manuscript should look like. Authors are asked to conform to all the directions reported in this document.
Abstract. This paper describes the News People Search (NePS) Task organized as part of EVALITA 2011. The NePS Task aims at evaluating cross-document coreference resolution of person entities in Italian news and consists of clustering a set of Italian newspaper articles that mention a person name according to the different people sharing that name. The motivation behind the task, the dataset used for the evaluation and the results obtained are described and discussed.
Abstract This paper presents the theoretical bases and quantitative results of an activity consisting in manually annotating part-whole and motion relations in patent documents. The aim of this activity was to create a gold standard for the evaluation of an automatic relation extraction tool developed by FBK-irst within the PATExpert project.
Abstract In this paper, we propose an extension of the WordNet conceptual model, with the final purpose of encoding the common sense lexical knowledge associated with words used in everyday life. The extended model has been defined starting from the short descriptions generated by naïve speakers in relation to target concepts (i.e., feature norms).
In this paper we present SAX, a system that generates hypertext descriptions of conceptual models designed with the SADT methodology. The combination of natural language and hypertext significantly lowers the communicative barrier between the analyst and the domain expert, thus increasing the effectiveness of conceptual model validation. The application of hybrid techniques for text generation guarantees an optimal trade-off between robustness and portability across domains on the one hand and text fluency on the other.
Abstract. This paper presents an experimental system architecture for Part-Of-Speech Tagging for the Italian language, able to manage a large tagset providing both lexical and morphological information. The tagger was built as a cascade of four classifiers, where each classifier in the cascade accepts either the initial input or the guesses of the previous one, performs its annotation, and sends the resulting data to the next stage or to the output of the cascade.
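The following sketch shows only the cascade plumbing described above: each stage consumes the tokens plus the previous stage's guesses and refines or enriches them. The stage functions, lexicon and tags below are invented placeholders, not the classifiers or tagset actually used.

```python
# Cascade plumbing only; the real tagger's classifiers and tagset are not shown.
LEXICON = {"la": "DET", "casa": "NOUN", "rossa": "ADJ"}

def lexical_stage(tokens, tags):
    return [LEXICON.get(tok, tag) for tok, tag in zip(tokens, tags)]

def suffix_stage(tokens, tags):
    # guess still-unknown tokens from a couple of toy suffix cues
    def guess(tok):
        return "VERB" if tok.endswith("are") else "NOUN"
    return [tag if tag != "UNK" else guess(tok) for tok, tag in zip(tokens, tags)]

def morphology_stage(tokens, tags):
    # enrich tags with a toy morphological feature
    return [f"{tag}:fem" if tok.endswith("a") and tag in {"NOUN", "ADJ", "DET"} else tag
            for tok, tag in zip(tokens, tags)]

def context_stage(tokens, tags):
    # a final contextual touch-up could go here; identity in this sketch
    return tags

CASCADE = [lexical_stage, suffix_stage, morphology_stage, context_stage]

def tag(sentence):
    tokens = sentence.split()
    tags = ["UNK"] * len(tokens)
    for stage in CASCADE:   # the output of one stage is the input of the next
        tags = stage(tokens, tags)
    return list(zip(tokens, tags))

print(tag("la casa rossa cantare"))
```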
EVALITA 2011 – The News People Search Task: Evaluating Cross-document Coreference Resolution of Named Person Entities in Italian News. L. Bentivogli, A. Marchetti, E.
This report illustrates the results of task 3.2, aiming at specifying the “knowledge necessary for the construction of an optimally structured text plan” and “the process which handles the interaction between the various knowledge sources” [GIST Technical Annex]. The specifications presented here are all based on an extensive corpus analysis. The document turns out to be underspecified with regard to the thematic progression and referring expressions components, which are an integral part of the Text Structurer.
The CLEF 2010 Conference on Multilingual and Multimodal Information Access Evaluation was held at the University of Padua, Italy, September 20–23, 2010. CLEF 2010 was organized by the Information Management Systems (IMS) research group of the Department of Information Engineering (DEI) of the University of Padua, Italy.
Abstract GETARUN, the system for text understanding developed at the University of Venice, contains an integrated set of algorithms to compute quantifier scope efficiently. The algorithm for Quantifier Raising is coupled with a procedure that checks whether a given utterance may be interpreted as a generic assertion on the basis of a certain number of linguistic conditions dependent on tense, mood, frequency, temporal adjuncts, etc.
Abstract Although many approaches have been presented to compute and predict the readability of documents in different languages, the information provided by readability systems often fails to show, in a clear and understandable way, how difficult a document is and which aspects contribute to content readability.
Participative Research labOratory for Multimedia and Multilingual Information Systems Evaluation (PROMISE) is a Network of Excellence, starting in conjunction with this first independent CLEF 2010 conference, designed to support and develop the evaluation of multilingual and multimedia information access systems, largely through the activities taking place in the Cross-Language Evaluation Forum (CLEF) today, and taking them forward in important new ways.
Abstract. Question Answering (QA) evaluation potentially provides a way to evaluate systems that attempt to understand texts automatically. Although current QA technologies are still unable to answer complex questions that require deep inference, we believe QA evaluation techniques must be adapted to drive QA research in the direction of deeper understanding of texts.
We present EntityPro, a system for Named Entity Recognition (NER) based on Support Vector Machines. EntityPro was trained with a large number of both static and dynamic features. The system performed best on the Italian NER task at EVALITA 2007, with an F1 measure of 82.14. Keywords: Named Entity Recognition, SVM
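A hedged sketch of the static/dynamic feature idea: static features are computed from the tokens themselves, while the dynamic feature is the label predicted for the previous token, used here in greedy left-to-right decoding. The feature set, tagset and toy training data are illustrative assumptions, not EntityPro's.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import LinearSVC

# Illustrative only: not EntityPro's real features, tagset or training data.
def features(tokens, i, prev_label):
    tok = tokens[i]
    return {
        "word": tok.lower(),                               # static
        "prev_word": tokens[i - 1].lower() if i else "<s>",
        "is_capitalized": tok[0].isupper(),
        "suffix3": tok[-3:].lower(),
        "prev_label": prev_label,                          # dynamic
    }

train = [
    (["Mario", "Rossi", "vive", "a", "Trento"], ["B-PER", "I-PER", "O", "O", "B-LOC"]),
    (["Anna", "lavora", "a", "Roma"], ["B-PER", "O", "O", "B-LOC"]),
]

X, y = [], []
for tokens, labels in train:
    prev = "O"
    for i, label in enumerate(labels):
        X.append(features(tokens, i, prev))
        y.append(label)
        prev = label                   # gold previous label while training

vectorizer = DictVectorizer()
model = LinearSVC().fit(vectorizer.fit_transform(X), y)

def predict(tokens):
    prev, out = "O", []
    for i in range(len(tokens)):       # greedy left-to-right decoding
        label = model.predict(vectorizer.transform([features(tokens, i, prev)]))[0]
        out.append(label)
        prev = label                   # predicted previous label at test time
    return list(zip(tokens, out))

print(predict(["Luca", "abita", "a", "Milano"]))
```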
In this paper we introduce a joint project between Università di Venezia and Fondazione Bruno Kessler, Trento, for the semi-automatic development of FrameNet for Italian. The collaboration is aimed at investigating semi-automatic approaches to acquire FrameNet for new languages and at developing a paradigm that can be suitable for most European languages.
An interlingua is a representation of meaning or speaker intention that is neutral between the various ways that the meaning can be expressed. The examples in (1) show that the same meaning can be expressed by different syntactic means in different languages, and even within one language.
In this paper we address the development of a system for the multilingual automatic generation of instructions in the administrative field. English, Italian and German have been taken as target languages and pension forms as target domain. We describe the knowledge resources required to produce coherent and cohesive instructional texts, taking into account the distinction between knowledge about how to communicate in a specific domain and general communication knowledge.
Abstract WORDNET makes a great number of fine-grained word sense distinctions. However, what could be seen as an advantage has often been considered a problem from a computational point of view. A great number of sense distinctions makes the problem of word sense disambiguation harder. One way to face this issue is to reduce the number of senses, for example by grouping them into equivalence classes which abstract over some aspects of the meanings of words. In this paper we try a different approach.
Abstract Multilingual Web sites are expected to provide the same content expressed in various languages, presented according to a common style, with the same interaction facilities. To this end, most Web developers start from a source language version of the site and produce the multilingual versions by providing translations in all supported languages.
GraFo is a left corner parser for Italian, based on explicit rules manually coded in a unification formalism. As the linguistic coverage of GraFo is still quite limited, the parser produces complete parse trees for only a small percentage of sentences. This paper presents a number of strategies to recover from GraFo parsing failures. The various techniques have been evaluated on the data provided by the EVALITA 2007 evaluation campaign. Keywords: parsing, Italian, left corner, failure recovery.
The idea of a lexical knowledge base was recently proposed by the ESPRIT BRA AQUILEX project [Briscoe 91], [Calzolari 92] to provide information, mostly of a semantic nature, that is internally structured in a consistent way and electronically available.
Abstract We present a technique for preposition disambiguation based on sense-discriminative patterns, which are acquired using a variant of Angluin's algorithm. These patterns represent the essential information extracted from a particular type of local context that we call Chain Clarifying Relationship contexts. The data set and the results we present are from the SemEval task on the WSD of Prepositions (Litkowski 2007).
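Purely as an illustration of how sense-discriminative patterns might be applied once acquired (the Angluin-style acquisition step and the actual Chain Clarifying Relationship contexts are not reproduced here), a matcher could look like the sketch below; the patterns, word classes and sense labels are invented for the example.

```python
# Illustrative application of already-acquired patterns; acquisition is not shown.
PATTERNS = [
    # (lemma to the left, preposition, class to the right) -> sense id
    (("travel", "by", "VEHICLE"), "by:means"),
    (("stand", "by", "PERSON"), "by:proximity"),
]

CLASSES = {"train": "VEHICLE", "bus": "VEHICLE", "john": "PERSON"}

def disambiguate(left_lemma, preposition, right_lemma):
    right_class = CLASSES.get(right_lemma.lower(), right_lemma.upper())
    for (left, prep, right), sense in PATTERNS:
        if left == left_lemma.lower() and prep == preposition.lower() and right == right_class:
            return sense
    return None  # no pattern fires; a full system would back off to a default sense

print(disambiguate("travel", "by", "train"))   # -> "by:means"
print(disambiguate("stand", "by", "John"))     # -> "by:proximity"
```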
ABSTRACT We present an adaptation of the KX system for multilingual unsupervised key-concept extraction to the French text mining challenge (DEFT 2012). KX carries out the selection of a list of weighted keywords from a document by combining basic linguistic annotations with simple statistical measures. In order to adapt it to the French language, a French morphological analyzer (PoS tagger) has been added to the extraction pipeline to derive lexical patterns.
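A hedged sketch of the general recipe only (PoS-based lexical patterns combined with simple statistics): KX's actual patterns, weighting scheme and multiword handling are not reproduced, and the tagged input below stands in for the output of the French PoS tagger.

```python
from collections import Counter

# Generic sketch: PoS patterns plus simple statistics; not KX's actual method.
def candidate_phrases(tagged_tokens):
    """Collect maximal sequences of ADJ/NOUN tokens containing at least one NOUN."""
    phrases, current = [], []
    for token, pos in tagged_tokens + [("", "EOS")]:
        if pos in {"NOUN", "ADJ"}:
            current.append((token.lower(), pos))
        else:
            if any(p == "NOUN" for _, p in current):
                phrases.append(" ".join(t for t, _ in current))
            current = []
    return phrases

def key_concepts(tagged_tokens, top_n=5):
    counts = Counter(candidate_phrases(tagged_tokens))
    # toy weighting: frequency boosted by phrase length (longer = more specific)
    scored = {p: c * len(p.split()) for p, c in counts.items()}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

tagged = [("La", "DET"), ("fouille", "NOUN"), ("de", "ADP"), ("textes", "NOUN"),
          ("aide", "VERB"), ("la", "DET"), ("fouille", "NOUN"), ("de", "ADP"),
          ("textes", "NOUN"), ("scientifiques", "ADJ")]
print(key_concepts(tagged))
```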
Abstract In this paper we will present a system for Question Answering called GETARUNS, in its deep version applicable to closed domains, that is to say domains for which the lexical semantics is fully specified and does not have to be induced. In addition, no ontology is needed: semantic relations are derived from linguistic relations encoded in the syntax.
Abstract. We present a methodology to find the best document-summary match based on three steps: first, a set of key-concepts is extracted from the document in order to give a concise representation of its content. Then, the key-concept list is compared with each abstract by assigning a similarity score inspired by the standard metrics of Precision, Recall and F1. Finally, an algorithm is run that, given a weighted bipartite graph representing all possible document-summary pairs, finds the best matches.
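A minimal sketch of steps two and three, assuming the key-concept lists have already been extracted: each document-abstract pair gets an F1-style overlap score, and a maximum-weight bipartite matching (here via scipy's linear_sum_assignment) picks the best global document-summary assignment. The scoring details and the data are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Similarity: a plain F1 between a document's key-concept list and the words of
# an abstract (one possible instantiation of the P/R/F1-inspired score).
def f1_similarity(key_concepts, abstract):
    abstract_words = set(abstract.lower().split())
    hits = sum(1 for concept in key_concepts if concept in abstract_words)
    precision = hits / len(key_concepts) if key_concepts else 0.0
    recall = hits / len(abstract_words) if abstract_words else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

doc_key_concepts = {
    "doc1": ["alignment", "dictionary", "corpus"],
    "doc2": ["parser", "grammar", "italian"],
}
abstracts = {
    "sum_a": "a dictionary based word alignment method evaluated on a parallel corpus",
    "sum_b": "a rule based parser for italian built on an explicit grammar",
}

docs, sums = list(doc_key_concepts), list(abstracts)
scores = np.array([[f1_similarity(doc_key_concepts[d], abstracts[s]) for s in sums]
                   for d in docs])
rows, cols = linear_sum_assignment(scores, maximize=True)  # best global matching
print([(docs[r], sums[c]) for r, c in zip(rows, cols)])
```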
A major difficulty in developing technological aids for anomic patients is the need to create tools flexible enough to cope with the great variability of their impairment. As far as therapeutic aids are concerned, the search for flexibility coincides with the need for cognitively motivated models. In this paper we will introduce STaRS.
Abstract—In this paper we propose a pilot study aimed at an in-depth comprehension of the phenomena underlying Ontology Population from text. The study has been carried out on a collection of Italian news articles, which have been manually annotated at several semantic levels.
The ATP (Assistente Turistico Personale) project has the strategic objective of developing methods and technologies for accessing tourist information over the telephone. Within this framework, technologies (automatic speech recognition, speech synthesis, dialogue and automatic language generation) and application scenarios (see the previous project documents) have been selected that will make it possible to build a demonstration prototype by the end of the current year.
