EVALITA Evaluation of NLP and Speech Tools for Italian - December 17th, 2020
The Hate Speech Detection (HaSpeeDe 2) task is the second edition of a shared task on the detection of hateful content in Italian Twitter messages. HaSpeeDe 2 is composed of a Main task (hate speech detection) and two Pilot tasks (stereotype and nominal utterance detection). Systems were challenged along two dimensions: (i) time, with test data coming from a different time period than the training data, and (ii) domain, with test data coming from the news domain (i.e., news headlines). Overall, 14 teams participated in the Main task; the best systems achieved macro F1-scores of 0.8088 and 0.7744 on the in-domain and out-of-domain test sets, respectively. Six teams submitted results for Pilot task 1 (stereotype detection), where the best systems achieved macro F1-scores of 0.7719 and 0.7203 on the in-domain and out-of-domain test sets. We did not receive any submissions for Pilot task 2.
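The macro F1-score used to rank systems averages the per-class F1 scores with equal weight, so performance on the minority class (here, hateful messages) counts as much as performance on the majority class. A minimal sketch with toy binary labels (not the shared-task data):

```python
def macro_f1(gold, pred):
    """Macro-averaged F1: compute F1 per class, then average with equal weight."""
    labels = sorted(set(gold) | set(pred))
    f1s = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

# Toy labels: 1 = hateful, 0 = not hateful (illustrative only)
gold = [1, 0, 1, 1, 0, 0]
pred = [1, 0, 0, 1, 0, 1]
print(round(macro_f1(gold, pred), 4))  # -> 0.6667
```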
Abstract: This article reports on preliminary work on the use of a Generative Lexicon based lexical resource to resolve bridging anaphors in Italian. The results obtained, though not very satisfying, seem to support the use of such a resource with respect to WordNet-like ...
Abstract: We present a data-driven approach for recognizing and classifying TimeML events in Italian. A high-performance state-of-the-art approach, TIPSem, is adopted and extended with Italian-specific semantic features from a lexical resource. The resulting approach has ...
This paper reports the description and scores of our system, FBK-TR, which participated in SemEval 2014 Task #1, "Evaluation of Compositional Distributional Semantic Models on Full Sentences through Semantic Relatedness and Entailment". The system consists of two parts: one for computing semantic relatedness, based on SVM, and the other for identifying the entailment values on the basis of both semantic relatedness scores and entailment patterns based on verb-specific semantic frames. The system ranked 11th on both tasks with competitive results.
ABSTRACT: In this work we present a methodology for the annotation of Attribution Relations (ARs) in speech, which we apply to create a pilot corpus of spoken informal dialogues. This represents the first step towards the creation of a resource for the analysis of ARs in speech and the development of automatic extraction systems. Despite its relevance for speech recognition systems and spoken language understanding, the relation holding between quotations and opinions and their sources has been studied and extracted only in written corpora, characterized by a formal register (news, literature, scientific articles). The shift to the informal register and to a spoken corpus widens our view of this relation and poses new challenges. Our hypothesis is that the decreased reliability of the linguistic cues found for written corpora in the fragmented structure of speech could be overcome by including prosodic clues in the system. The analysis of SARC confirms the hypothesis, showing the crucial role played by the acoustic level in providing the missing lexical clues.
ABSTRACT: In this paper, we present a rich contextual perspective on the lexicon and background knowledge for the purpose of deep semantic parsing. In the project Understanding Language By Machine, we address various aspects of semantics in relation to (i) reference to entities and event instances, and (ii) modeling of author and reader perspectives. Lexical resources, and even resources with world knowledge such as Wikipedia, do not provide the episodic knowledge that is needed to determine reference and, eventually, meaning. Most resources, and also the Natural Language Processing that uses these resources, focus too much on semantic knowledge and local context. We argue that we need richer and more complex context models that integrate episodic knowledge, discourse structure, and reader/writer perspectives to be able to correctly process text. We outline the directions of research that our project follows and the different aspects that we will study.
Tempeval-2 comprises evaluation tasks for time expressions, events and temporal relations, the latter of which was split into four subtasks, motivated by the notion that smaller subtasks would make both data preparation and temporal relation extraction easier. Manually ...
Will reading different stories about the same event in the world result in a similar image of the world? Will reading the same story by different people result in a similar proxy for experiencing the story? The answer to both questions is no, because language is abstract by definition and relies on our episodic experience to turn a story into a more concrete mental movie. Since our episodic knowledge differs, the mental movies will differ as well. Language leaves out details, and this becomes specifically clear when building machines that read texts to represent events and to establish event relations across mentions, such as coreference, causality, subevents, scripts, timelines, and storylines. There is a lot of information and knowledge on the event that is not in the text but is needed to reconstruct these relations and understand the story. Machines lack this knowledge and experience, and likewise make explicit what it takes to understand stories from text. In this paper, we re...
Conceptual concreteness and categorical specificity are two continuous variables that allow distinguishing, for example, justice (low concreteness) from banana (high concreteness), and furniture (low specificity) from rocking chair (high specificity). The relation between these two variables is unclear, with some scholars suggesting that they might be highly correlated. In this study, we operationalize both variables and conduct a series of analyses on a sample of more than 13,000 nouns to investigate the relationship between them. Concreteness is operationalized by means of concreteness ratings, and specificity is operationalized as the relative position of the words in the WordNet taxonomy, which proxies this variable via the hypernym semantic relation. Findings from our studies show only a moderate correlation between concreteness and specificity. Moreover, the intersection of the two variables generates four groups of words that seem to denote qualitatively different types of concepts, which are, respectively: highly specific and highly concrete (typical concrete concepts denoting individual nouns), highly specific and highly abstract (among them many words denoting human-born creations and concepts within the social reality domains), highly generic and highly concrete (among which many mass nouns, or uncountable nouns), and highly generic and highly abstract (typical abstract concepts, which are likely to be loaded with affective information, as suggested by previous literature). These results suggest that future studies should consider concreteness and specificity as two distinct dimensions of the general phenomenon called abstraction.
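Operationalizing specificity as position in a hypernym hierarchy can be illustrated with a toy taxonomy standing in for WordNet; the chains below are invented for illustration, not taken from the study's data.

```python
# Toy hypernym taxonomy standing in for WordNet (illustrative only):
# each noun maps to its direct hypernym; the root ("entity") has no entry.
HYPERNYMS = {
    "rocking chair": "chair",
    "chair": "furniture",
    "furniture": "artifact",
    "banana": "fruit",
    "fruit": "food",
    "food": "entity",
    "artifact": "entity",
}

def specificity(noun):
    """Depth of a noun in the taxonomy: longer hypernym chains = more specific."""
    depth = 0
    while noun in HYPERNYMS:
        noun = HYPERNYMS[noun]
        depth += 1
    return depth

print(specificity("rocking chair"))  # 4 hops to the root -> more specific
print(specificity("furniture"))      # 2 hops to the root -> more generic
```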
AI*IA 2011: Artificial Intelligence Around …, Jan 1, 2011
Access to information through content has become the new frontier in NLP. Innovative annotation schemes such as TimeML [4] have pushed this aspect forward by creating benchmark corpora. In TimeML, an event is defined as something that holds true, obtains/ ...
This report describes the EVENTI (EValuation of Events aNd Temporal Information) task organized within the EVALITA 2014 evaluation campaign. The EVENTI task aims at evaluating the performance of Temporal Information Processing systems on a corpus of Italian news articles. Motivations for the task, datasets, evaluation metrics, and results obtained by participating systems are presented and discussed.
We present the VUA-Timeline module for extracting cross-document timelines. The system has been developed in the framework of SemEval-2015 Task 4, TimeLine: Cross-document event ordering (http://alt.qcri.org/semeval2015/task4/). Innovative aspects of this task are cross-document information extraction and the centering of timelines around entities, which requires careful handling of nominal and event coreference information. Timeline extraction is a complex task composed of a set of subtasks: named entity recognition, event detection and classification, coreference resolution of entities and events, event factuality, temporal expression recognition and normalization, and extraction of temporal relations. The VUA-Timeline system has been developed as an additional module of the NewsReader NLP pipeline, which consists of tools that provide state-of-the-art results on the subtasks mentioned above. The output of this pipeline is a rich representation including events, their participants, coreference relations, temporal and causal relations, and links to external references such as DBpedia. We extract cross-document timelines concerning specific entities in two steps. First, we identify timelines within documents, selecting events that involve the entity in question and filtering out events that are speculative. These events are placed on a timeline based on normalized temporal indications (if present) and explicit temporal relations. In the second step, we merge document-specific timelines into one cross-document timeline by applying cross-document event coreference and comparing normalized times.
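The second step described above (merging document-specific timelines via event coreference and normalized times) can be sketched as follows. The Event fields and coreference cluster ids are simplifications invented for this sketch, not the NewsReader representation itself:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    mention: str    # event predicate as mentioned in text
    doc_id: str     # source document
    time: str       # normalized temporal anchor, ISO-like (assumption)
    coref_id: int   # cross-document event-coreference cluster id (assumption)

def merge_timelines(doc_timelines):
    """Merge per-document timelines into one cross-document timeline:
    keep one event per coreference cluster, then order by normalized time."""
    seen, merged = set(), []
    for timeline in doc_timelines:
        for event in timeline:
            if event.coref_id not in seen:
                seen.add(event.coref_id)
                merged.append(event)
    return sorted(merged, key=lambda e: e.time)

doc1 = [Event("acquired", "d1", "2014-03-01", 7),
        Event("announced", "d1", "2014-02-20", 3)]
doc2 = [Event("bought", "d2", "2014-03-01", 7)]  # coreferent with "acquired"
print([e.mention for e in merge_timelines([doc1, doc2])])
# -> ['announced', 'acquired']
```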
This paper reports on two crowdsourcing experiments on Temporal Relation Annotation in Italian and English. The aim of these experiments is threefold: first, to evaluate average Italian and English native speakers on their ability to identify and classify a temporal relation between two verbal events; second, to assess the feasibility of crowdsourcing for this kind of complex semantic task; and third, to perform a preliminary analysis of the role of syntax within such a task.
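Crowdsourced judgments of this kind are typically aggregated before evaluation; a common scheme (not necessarily the one used in the paper) is majority voting over workers' labels for each event pair, with ties flagged for review:

```python
from collections import Counter

def majority_label(judgments):
    """Aggregate crowd judgments for one event pair by majority vote;
    ties return None so the pair can be flagged for expert review."""
    counts = Counter(judgments).most_common(2)
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # tie: no majority
    return counts[0][0]

# Hypothetical labels for one verbal event pair: BEFORE / AFTER / OVERLAP
print(majority_label(["BEFORE", "BEFORE", "OVERLAP", "BEFORE", "AFTER"]))
# -> BEFORE
```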
This work describes the evaluation of two approaches, Lexical Matching and Sense Similarity, for word sense alignment between MultiWordNet and a lexicographic dictionary, Senso Comune De Mauro, in a setting with few sense descriptions (MultiWordNet) and no structure over senses (Senso Comune De Mauro). The results obtained from merging the two approaches are satisfying, with F1 values of 0.47 for verbs and 0.64 for nouns.
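A lexical-matching approach to sense alignment can be approximated by gloss overlap between sense definitions; the glosses, stopword list, and threshold below are invented for illustration and are not the paper's actual method or data.

```python
import re

STOPWORDS = {"a", "an", "the", "of", "or", "to", "in", "for", "with", "that"}

def content_words(gloss):
    """Lowercased alphabetic tokens of a gloss, minus stopwords."""
    return set(re.findall(r"[a-z]+", gloss.lower())) - STOPWORDS

def overlap_score(gloss_a, gloss_b):
    """Jaccard overlap of content words between two sense definitions."""
    a, b = content_words(gloss_a), content_words(gloss_b)
    return len(a & b) / len(a | b) if a | b else 0.0

def align(sense_wn, senses_dict, threshold=0.2):
    """Align a MultiWordNet-style gloss to the best-matching dictionary sense,
    if its overlap clears a (hypothetical) threshold; otherwise no alignment."""
    best = max(senses_dict, key=lambda s: overlap_score(sense_wn, s))
    return best if overlap_score(sense_wn, best) >= threshold else None

wn_gloss = "an edible fruit with yellow skin"
dict_senses = ["tropical fruit with a yellow peel, edible raw",
               "unit of electrical resistance"]
print(align(wn_gloss, dict_senses))
# -> tropical fruit with a yellow peel, edible raw
```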
Slides on Parsing, Discourse Processing and Temporal Processing (Ph.D. course, University of Trento, academic year 2013/2014).
Slides from the Computational Linguistics Lab for academic year 2013/2014 at the Free University of Bozen.