M. Antonia Marti

Universitat de Barcelona, General Linguistics, Faculty Member

Papers by M. Antonia Marti

Information Theory–based Compositional Distributional Semantics

Computational Linguistics

In the context of text representation, Compositional Distributional Semantics models aim to fuse the Distributional Hypothesis and the Principle of Compositionality. Text embedding is based on co-occurrence distributions, and the representations are in turn combined by compositional functions that take the text structure into account. However, the theoretical basis of compositional functions is still an open issue. In this article we define and study the notion of Information Theory–based Compositional Distributional Semantics (ICDS): (i) we first establish formal properties for embedding, composition, and similarity functions based on Shannon’s Information Theory; (ii) we analyze the existing approaches under this prism, checking whether or not they comply with the established desirable properties; (iii) we propose two parameterizable composition and similarity functions that generalize traditional approaches while fulfilling the formal properties; and finally (iv) we perform an empiric...

On the Quality of Lexical Resources for Word Sense Disambiguation

Lecture Notes in Computer Science, 2004

MiniCors and Cast3LB: Two Semantically Tagged Spanish Corpora

In this paper we present two Spanish corpora, MiniCors and Cast3LB, semantically tagged according to different annotation criteria and objectives. In order to guarantee the quality of the results, we have established a methodology for the development of these corpora. The resulting resources consist of a semantically tagged corpus according to the lexical sample task, and a semantically tagged corpus ...

The CoNLL-2009 shared task

Proceedings of the Thirteenth Conference on Computational Natural Language Learning Shared Task - CoNLL '09, 2009

Identity, non-identity, and near-identity: Addressing the complexity of coreference

Lingua, 2011

Coreference is not always either/or: psycholinguistic evidence for near-identity

Language, Cognition and Neuroscience, 2013

EsPal: One-stop shopping for Spanish word properties

Behavior Research Methods, 2013

What do we mean when we speak about Named Entities

Proceedings of Corpus …, 2007

The concept of Named Entity (NE) has its origin in the Named Entity Recognition and Classificatio...

Comparison of the Final Wordnets Dutch, Spanish and Italian

Piek Vossen, University of Amsterdam; Salvador Climent, Maria Antonia Marti, Mariona Taule, Universitat de Barcelona; Julio Gonzalo, Irina Chugur, M. Felisa Verdejo, UNED; Gerard Escudero, German Rigau, Horacio Rodriguez, Universitat Politecnica de Catalunya; Antonietta Alonge, Francesca Bertagna, Rita Marinelli, Adriana Roventini, Luca Tarasi, Istituto di Linguistica del CNR, Pisa ... Deliverable D029, D030, WP3, WP4 EuroWordNet, LE2-4003 ... Title: Comparison of the Final Wordnets Dutch, Spanish and Italian ... Authors: Piek Vossen, Laura ...

TEXT-MESS: Intelligent, Interactive and Multilingual Text Mining based on Human Language Technologies, TIN2006-15265-C06

The goal of the project is to analyze, experiment with, and develop intelligent, interactive and multilingual Text Mining technologies, as a key element of the next generation of search engines, systems with the capacity to find "the need behind the query". This new generation will provide specialized services and interfaces according to the search domain and type of information needed. Moreover, it will integrate textual search (websites) and multimedia search (images, audio, video), it will be able to find and organize information, rather than ...

Defining a framework for the analysis of predicates

Proceedings of Interdisciplinary Workshop on the Identification and Representation of Verb Features and Verb Classes, 2005

In this position paper we present the research on verb predicates that we have carried out so far for Catalan, Spanish, and Basque, and we outline the framework of our future research. This framework is based on the idea that lexical resources such as WordNet need to include syntagmatic and statistical information so that they can be used for information extraction from annotated corpora and for the automatic syntactic and semantic tagging of corpora.

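HistoCat y DialCat: extensiones de un analizador morfológico para tratar textos históricos y dialectales del catalán [HistoCat and DialCat: extensions of a morphological analyzer for processing historical and dialectal Catalan texts]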

Abstract: Historical and dialectal Catalan texts cannot be morphosyntactically annotated automatically, because there is no standard reference variety that would allow a homogeneous and systematic treatment. The aim of the HistoCat and DialCat projects has been to develop a semi-automatic annotation environment that reuses existing tools for the morphosyntactic annotation of Catalan texts and keeps manual annotation to a minimum. Keywords: Corpus ...

Using hybrid probabilistic-linguistic knowledge to improve pos-tagging performance

Proceedings of Corpus Linguistics, 2003

Los predicados de cambio y su representación en una BCL [Change predicates and their representation in a lexical knowledge base]

Procesamiento del lenguaje natural, 1999

Abstract: This article presents a class of predicates, that of change, on the basis of the elements we have defined as basic for describing verbal behavior (meaning components, diathesis, and event structure). We start from the hypothesis that these three aspects interact with one another and that they are fundamental to accounting for the actual use of predicates. This information has been incorporated into the lexical entry of a lexical knowledge base, of which we present the ...

Establishing Semantic Oppositions for the Typification of Predicates

Language Design, 1999

In this article we present our conception of diathesis alternations and how they intervene in the definition of a model of lexical entries. We consider that diathesis alternations are the syntactic realizations of oppositions of a more general semantic nature. We will see how they interact with other components such as event structure and how different semantic classes of predicates arise from that interaction. Keywords: Theoretical Model of Lexical Entries, Computational Lexicography, Diathesis Alternations.

Evaluation of EuroWordNet- and LCS-based lexical resources for machine translation

Proceedings from First International Conference on Language Resources & Evaluation, 1998

We evaluate two types of lexical resources with respect to their applicability to interlingual machine translation: (1) a EuroWordNet-based database of bilingual links between Spanish and English words; and (2) a repository of semantically classified verbs with their corresponding Lexical Conceptual Structure (LCS) representations. We examine the utility of these two resources for the task of lexical selection in machine translation. Our approach uses a coarse-grained graph-matching scheme that selects target-language words based ...

Translation equivalence and lexicalization in the ACQUILEX LKB

Proceedings of TMI ‘92, Montreal, Canada, 1992

We propose a strongly lexicalist treatment of translation equivalence where mismatches due to diverging lexicalization patterns are dealt with by means of translation links which capture crosslinguistic generalizations across sets of semantically related lexical items. We show how this treatment can be developed within a unification-based, multilingual lexical knowledge base which is integrated with facilities for semi-automatic development of bilingual lexicons, and describe an approach to machine translation where generation ...

Annotating Near-Identity from Coreference Disagreements

We present an extension of the coreference annotation in the English NP4E and the Catalan AnCora-CA corpora with near-identity relations, which are borderline cases of coreference. The annotated subcorpora have 50K tokens each. Near-identity relations, as presented by Recasens et al. (2010; 2011), build upon the idea that identity is a continuum rather than an either/or relation, thus introducing a middle ground category to explain currently problematic cases. The first annotation effort that we describe shows that it is not ...

AnCora-Verb: A lexical resource for the semantic annotation of corpora

Proceedings of 6th International Conference on Language Resources and Evaluation, May 1, 2008

In this paper we present two large-scale verbal lexicons, AnCora-Verb-Ca for Catalan and AnCora-Verb-Es for Spanish, which are the basis for the semantic annotation of the AnCora corpora with arguments and thematic roles. In the AnCora-Verb lexicons, the mapping between syntactic functions, arguments and thematic roles of each verbal predicate is established taking into account the verbal semantic class and the diathesis alternations in which the predicate can participate. Each verbal predicate is related to one ...

Dealing with lexical mismatches
