Philippe Blache

2024

pdf bib abs
Le corpus BrainKT: Etudier l’instanciation du common ground par l’analyse des indices verbaux, gestuels et neurophysiologiques
Eliot Maës | Thierry Legou | Leonor Becerra-Bonache | Philippe Blache
Actes de la 31ème Conférence sur le Traitement Automatique des Langues Naturelles, volume 2 : traductions d'articles publiès

La quantité croissante de corpus multimodaux collectés permet de développer de nouvelles méthodes d’analyse de la conversation. Dans la très grande majorité des cas, ces corpus ne comprennent cependant que les enregistrements audio et vidéo, laissant de côté d’autres modalités plus difficiles à récupérer mais apportant un point de vue complémentaire sur la conversation, telle que l’activité cérébrale des locuteurs. Nous présentons donc BrainKT, un corpus de conversation naturelle en français, rassemblant les données audio, vidéo et signaux neurophysiologiques, collecté avec l’objectif d’étudier en profondeur les transmission d’information et l’instanciation du common ground. Pour chacune des conversations des 28 dyades (56 participants), les locuteurs devaient collaborer sur un jeu conversationnel (15min), et étaient ensuite libres de discuter du sujet de leur choix (15min). Pour chaque discussion, les données audio, vidéo, l’activité cérébrale (EEG par Biosemi 64) et physiologique (montre Empatica-E4) sont enregistrées. Cet article situe le corpus dans la littérature, présente le setup expérimental utilisé ainsi les difficultés rencontrées, et les différents niveaux d’annotations proposés pour le corpus.

pdf bib abs
Une approche zero-shot pour localiser les transferts d’informations en conversation naturelle
Eliot Maës | Hossam Boudraa | Philippe Blache | Leonor Becerra-Bonache
Actes de la 31ème Conférence sur le Traitement Automatique des Langues Naturelles, volume 2 : traductions d'articles publiès

Les théories de l’interaction suggèrent que l’émergence d’une compréhension mutuelle entre les locuteurs en conversation naturelle dépend de la construction d’une base de connaissances partagée (common ground), mais n’explicitent ni le choix ni les circonstances de la mémorisation de ces informations.Des travaux antérieurs utilisant les métriques dérivées de la théorie de l’information pour analyser la dynamique d’échange d’information ne fournissent pas de moyen efficace de localiser les informations qui entreront dans le common ground. Nous proposons une nouvelle méthode basée sur la segmentation automatique d’une conversation en thèmes qui sont ensuite résumés. L’emplacement des transferts d’informations est finalement obtenu en calculant la distance entre le résumé du thème et les différents énoncés produits par un locuteur. Nous évaluons deux grands modèles de langue (LLMs) sur cette méthode, sur le corpus conversationnel français Paco-Cheese. Plus généralement, nous étudions la façon dont les derniers développement dans le champ des LLMs permettent l’étude de questions s’appuyant normalement fortement sur le jugement d’annotateurs humains.

pdf bib abs
Did You Get It? A Zero-Shot Approach to Locate Information Transfers in Conversations
Eliot Maës | Hossam Boudraa | Philippe Blache | Leonor Becerra-Bonache
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Interaction theories suggest that the emergence of mutual understanding between speakers in natural conversations depends on the construction of a shared knowledge base (common ground), but the details of which information and the circumstances under which it is memorized are not explained by any model. Previous works have looked at metrics derived from Information Theory to quantify the dynamics of information exchanged between participants, but do not provide an efficient way to locate information that will enter the common ground. We propose a new method based on the segmentation of a conversation into themes followed by their summarization. We then obtain the location of information transfers by computing the distance between the theme summary and the different utterances produced by a speaker. We evaluate two Large Language Models (LLMs) on this pipeline, on the French conversational corpus Paco-Cheese. More generally, we explore how the recent developments in the field of LLMs provide us with the means to implement these new methods and more generally support research into questions that usually heavily relies on human annotators.

pdf bib abs
The Distracted Ear: How Listeners Shape Conversational Dynamics
Auriane Boudin | Stéphane Rauzy | Roxane Bertrand | Magalie Ochs | Philippe Blache
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

In the realm of human communication, feedback plays a pivotal role in shaping the dynamics of conversations. This study delves into the multifaceted relationship between listener feedback, narration quality and distraction effects. We present an analysis conducted on the SMYLE corpus, specifically enriched for this study, where 30 dyads of participants engaged in 1) face-to-face storytelling (8.2 hours) followed by 2) a free conversation (7.8 hours). The storytelling task unfolds in two conditions, where a storyteller engages with either a “normal” or a “distracted” listener. Examining the feedback impact on storytellers, we discover a positive correlation between the frequency of specific feedback and the narration quality in normal conditions, providing an encouraging conclusion regarding the enhancement of interaction through specific feedback in distraction-free settings. In contrast, in distracted settings, a negative correlation emerges, suggesting that increased specific feedback may disrupt narration quality, underscoring the complexity of feedback dynamics in human communication. The contribution of this paper is twofold: first presenting a new and highly enriched resource for the analysis of discourse phenomena in controlled and normal conditions; second providing new results on feedback production, its form and its consequence on the discourse quality (with direct applications in human-machine interaction).

2023

pdf bib abs
Are Frequent Phrases Directly Retrieved like Idioms? An Investigation with Self-Paced Reading and Language Models
Giulia Rambelli | Emmanuele Chersoni | Marco S. G. Senaldi | Philippe Blache | Alessandro Lenci
Proceedings of the 19th Workshop on Multiword Expressions (MWE 2023)

An open question in language comprehension studies is whether non-compositional multiword expressions like idioms and compositional-but-frequent word sequences are processed differently. Are the latter constructed online, or are instead directly retrieved from the lexicon, with a degree of entrenchment depending on their frequency? In this paper, we address this question with two different methodologies. First, we set up a self-paced reading experiment comparing human reading times for idioms and both highfrequency and low-frequency compositional word sequences. Then, we ran the same experiment using the Surprisal metrics computed with Neural Language Models (NLMs). Our results provide evidence that idiomatic and high-frequency compositional expressions are processed similarly by both humans and NLMs. Additional experiments were run to test the possible factors that could affect the NLMs’ performance.

pdf bib abs
Studying Common Ground Instantiation Using Audio, Video and Brain Behaviours: The BrainKT Corpus
Eliot Maës | Thierry Legou | Leonor Becerra | Philippe Blache
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

An increasing amount of multimodal recordings has been paving the way for the development of a more automatic way to study language and conversational interactions. However this data largely comprises of audio and video recordings, leaving aside other modalities that might complement this external view of the conversation but might be more difficult to collect in naturalistic setups, such as participants brain activity. In this context, we present BrainKT, a natural conversational corpus with audio, video and neuro-physiological signals, collected with the aim of studying information exchanges and common ground instantiation in conversation in a new, more in-depth way. We recorded conversations from 28 dyads (56 participants) during 30 minutes experiments where subjects were first tasked to collaborate on a joint information game, then freely drifted to the topic of their choice. During each session, audio and video were captured, along with the participants’ neural signal (EEG with Biosemi 64) and their electro-physiological activity (with Empatica-E4). The paper situates this new type of resources in the literature, presents the experimental setup and describes the different kinds of annotations considered for the corpus.

pdf bib abs
Investigating the Effect of Discourse Connectives on Transformer Surprisal: Language Models Understand Connectives, Even So They Are Surprised
Yan Cong | Emmanuele Chersoni | Yu-Yin Hsu | Philippe Blache
Proceedings of the 6th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP

As neural language models (NLMs) based on Transformers are becoming increasingly dominant in natural language processing, several studies have proposed analyzing the semantic and pragmatic abilities of such models. In our study, we aimed at investigating the effect of discourse connectives on NLMs with regard to Transformer Surprisal scores by focusing on the English stimuli of an experimental dataset, in which the expectations about an event in a discourse fragment could be reversed by a concessive or a contrastive connective. By comparing the Surprisal scores of several NLMs, we found that bigger NLMs show patterns similar to humans’ behavioral data when a concessive connective is used, while connective-related effects tend to disappear with a contrastive one. We have additionally validated our findings with GPT-Neo using an extended dataset, and results mostly show a consistent pattern.

2022

pdf bib abs
Shared knowledge in natural conversations: can entropy metrics shed light on information transfers?
Eliot Maës | Philippe Blache | Leonor Becerra
Proceedings of the 26th Conference on Computational Natural Language Learning (CoNLL)

The mechanisms underlying human communication have been under investigation for decades, but the answer to how understanding between locutors emerges remains incomplete. Interaction theories suggest the development of a structural alignment between the speakers, allowing for the construction of a shared knowledge base (common ground). In this paper, we propose to apply metrics derived from information theory to quantify the amount of information exchanged between participants, the dynamics of information exchanges, to provide an objective way to measure the common ground instantiation. We focus on a corpus of free conversations augmented with prosodic segmentation and an expert annotation of thematic episodes. We show that during free conversations, the amount of information remains globally constant at the scale of the conversation, but varies depending on the thematic structuring, underlining the role of the speaker introducing the theme. We propose an original methodology applied to uncontrolled material.

We present in this paper the first natural conversation corpus recorded with all modalities and neuro-physiological signals. 5 dyads (10 participants) have been recorded three times, during three sessions (30mns each) with 4 days interval. During each session, audio and video are captured as well as the neural signal (EEG with Emotiv-EPOC) and the electro-physiological one (with Empatica-E4). This resource original in several respects. Technically, it is the first one gathering all these types of data in a natural conversation situation. Moreover, the recording of the same dyads at different periods opens the door to new longitudinal investigations such as the evolution of interlocutors’ alignment during the time. The paper situates this new type of resources with in the literature, presents the experimental setup and describes different annotations enriching the corpus.

pdf bib abs
Compositionality as an Analogical Process: Introducing ANNE
Giulia Rambelli | Emmanuele Chersoni | Philippe Blache | Alessandro Lenci
Proceedings of the Workshop on Cognitive Aspects of the Lexicon

Usage-based constructionist approaches consider language a structured inventory of constructions, form-meaning pairings of different schematicity and complexity, and claim that the more a linguistic pattern is encountered, the more it becomes accessible to speakers. However, when an expression is unavailable, what processes underlie the interpretation? While traditional answers rely on the principle of compositionality, for which the meaning is built word-by-word and incrementally, usage-based theories argue that novel utterances are created based on previously experienced ones through analogy, mapping an existing structural pattern onto a novel instance. Starting from this theoretical perspective, we propose here a computational implementation of these assumptions. As the principle of compositionality has been used to generate distributional representations of phrases, we propose a neural network simulating the construction of phrasal embedding as an analogical process. Our framework, inspired by word2vec and computer vision techniques, was evaluated on tasks of generalization from existing vectors.

pdf bib abs
Are You Smiling When I Am Speaking?
Auriane Boudin | Roxane Bertrand | Magalie Ochs | Philippe Blache | Stephane Rauzy
Proceedings of the Workshop on Smiling and Laughter across Contexts and the Life-span within the 13th Language Resources and Evaluation Conference

The aim of this study is to investigate conversational feedbacks that contain smiles and laughs. Firstly, we propose a statistical analysis of smiles and laughs used as generic and specific feedbacks in a corpus of French talk-in-interaction. Our results show that smiles of low intensity are preferentially used to produce generic feedbacks while high intensity smiles and laughs are preferentially used to produce specific feedbacks. Secondly, based on a machine learning approach, we propose a hierarchical classification of feedback to automatically predict not only the presence/absence of a smile but, also the type of smiles according to an intensity-scale (low or high).

2021

pdf bib abs
Did the Cat Drink the Coffee? Challenging Transformers with Generalized Event Knowledge
Paolo Pedinotti | Giulia Rambelli | Emmanuele Chersoni | Enrico Santus | Alessandro Lenci | Philippe Blache
Proceedings of *SEM 2021: The Tenth Joint Conference on Lexical and Computational Semantics

Prior research has explored the ability of computational models to predict a word semantic fit with a given predicate. While much work has been devoted to modeling the typicality relation between verbs and arguments in isolation, in this paper we take a broader perspective by assessing whether and to what extent computational approaches have access to the information about the typicality of entire events and situations described in language (Generalized Event Knowledge). Given the recent success of Transformers Language Models (TLMs), we decided to test them on a benchmark for the dynamic estimation of thematic fit. The evaluation of these models was performed in comparison with SDM, a framework specifically designed to integrate events in sentence meaning representations, and we conducted a detailed error analysis to investigate which factors affect their behavior. Our results show that TLMs can reach performances that are comparable to those achieved by SDM. However, additional analysis consistently suggests that TLMs do not capture important aspects of event knowledge, and their predictions often depend on surface linguistic features, such as frequent words, collocations and syntactic patterns, thereby showing sub-optimal generalization abilities.

2020

pdf bib abs
Two-level classification for dialogue act recognition in task-oriented dialogues
Philippe Blache | Massina Abderrahmane | Stéphane Rauzy | Magalie Ochs | Houda Oufaida
Proceedings of the 28th International Conference on Computational Linguistics

Dialogue act classification becomes a complex task when dealing with fine-grain labels. Many applications require such level of labelling, typically automatic dialogue systems. We present in this paper a 2-level classification technique, distinguishing between generic and specific dialogue acts (DA). This approach makes it possible to benefit from the very good accuracy of generic DA classification at the first level and proposes an efficient approach for specific DA, based on high-level linguistic features. Our results show the interest of involving such features into the classifiers, outperforming all other feature sets, in particular those classically used in DA classification.

pdf bib abs
Comparing Probabilistic, Distributional and Transformer-Based Models on Logical Metonymy Interpretation
Giulia Rambelli | Emmanuele Chersoni | Alessandro Lenci | Philippe Blache | Chu-Ren Huang
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing

In linguistics and cognitive science, Logical metonymies are defined as type clashes between an event-selecting verb and an entity-denoting noun (e.g. The editor finished the article), which are typically interpreted by inferring a hidden event (e.g. reading) on the basis of contextual cues. This paper tackles the problem of logical metonymy interpretation, that is, the retrieval of the covert event via computational methods. We compare different types of models, including the probabilistic and the distributional ones previously introduced in the literature on the topic. For the first time, we also tested on this task some of the recent Transformer-based models, such as BERT, RoBERTa, XLNet, and GPT-2. Our results show a complex scenario, in which the best Transformer-based models and some traditional distributional models perform very similarly. However, the low performance on some of the testing datasets suggests that logical metonymy is still a challenging phenomenon for computational modeling.

pdf bib abs
The Brain-IHM Dataset: a New Resource for Studying the Brain Basis of Human-Human and Human-Machine Conversations
Magalie Ochs | Roxane Bertrand | Aurélie Goujon | Deirdre Bolger | Anne-Sophie Dubarry | Philippe Blache
Proceedings of the Twelfth Language Resources and Evaluation Conference

This paper presents an original dataset of controlled interactions, focusing on the study of feedback items. It consists on recordings of different conversations between a doctor and a patient, played by actors. In this corpus, the patient is mainly a listener and produces different feedbacks, some of them being (voluntary) incongruent. Moreover, these conversations have been re-synthesized in a virtual reality context, in which the patient is played by an artificial agent. The final corpus is made of different movies of human-human conversations plus the same conversations replayed in a human-machine context, resulting in the first human-human/human-machine parallel corpus. The corpus is then enriched with different multimodal annotations at the verbal and non-verbal levels. Moreover, and this is the first dataset of this type, we have designed an experiment during which different participants had to watch the movies and give an evaluation of the interaction. During this task, we recorded participant’s brain signal. The Brain-IHM dataset is then conceived with a triple purpose: 1/ studying feedbacks by comparing congruent vs. incongruent feedbacks 2/ comparing human-human and human-machine production of feedbacks 3/ studying the brain basis of feedback perception.

2019

pdf bib abs
Distributional Semantics Meets Construction Grammar. towards a Unified Usage-Based Model of Grammar and Meaning
Giulia Rambelli | Emmanuele Chersoni | Philippe Blache | Chu-Ren Huang | Alessandro Lenci
Proceedings of the First International Workshop on Designing Meaning Representations

In this paper, we propose a new type of semantic representation of Construction Grammar that combines constructions with the vector representations used in Distributional Semantics. We introduce a new framework, Distributional Construction Grammar, where grammar and meaning are systematically modeled from language use, and finally, we discuss the kind of contributions that distributional models can provide to CxG representation from a linguistic and cognitive perspective.

2018

pdf bib
A Semi-autonomous System for Creating a Human-Machine Interaction Corpus in Virtual Reality: Application to the ACORFORMed System for Training Doctors to Break Bad News
Magalie Ochs | Philippe Blache | Grégoire de Montcheuil | Jean-Marie Pergandi | Jorane Saubesty | Daniel Francon | Daniel Mestre
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib abs
Modeling Violations of Selectional Restrictions with Distributional Semantics
Emmanuele Chersoni | Adrià Torrens Urrutia | Philippe Blache | Alessandro Lenci
Proceedings of the Workshop on Linguistic Complexity and Natural Language Processing

Distributional Semantic Models have been successfully used for modeling selectional preferences in a variety of scenarios, since distributional similarity naturally provides an estimate of the degree to which an argument satisfies the requirement of a given predicate. However, we argue that the performance of such models on rare verb-argument combinations has received relatively little attention: it is not clear whether they are able to distinguish the combinations that are simply atypical, or implausible, from the semantically anomalous ones, and in particular, they have never been tested on the task of modeling their differences in processing complexity. In this paper, we compare two different models of thematic fit by testing their ability of identifying violations of selectional restrictions in two datasets from the experimental studies.

2017

pdf bib abs
Logical Metonymy in a Distributional Model of Sentence Comprehension
Emmanuele Chersoni | Alessandro Lenci | Philippe Blache
Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)

In theoretical linguistics, logical metonymy is defined as the combination of an event-subcategorizing verb with an entity-denoting direct object (e.g., The author began the book), so that the interpretation of the VP requires the retrieval of a covert event (e.g., writing). Psycholinguistic studies have revealed extra processing costs for logical metonymy, a phenomenon generally explained with the introduction of new semantic structure. In this paper, we present a general distributional model for sentence comprehension inspired by the Memory, Unification and Control model by Hagoort (2013,2016). We show that our distributional framework can account for the extra processing costs of logical metonymy and can identify the covert event in a classification task.

pdf bib abs
Measuring Thematic Fit with Distributional Feature Overlap
Enrico Santus | Emmanuele Chersoni | Alessandro Lenci | Philippe Blache
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

In this paper, we introduce a new distributional method for modeling predicate-argument thematic fit judgments. We use a syntax-based DSM to build a prototypical representation of verb-specific roles: for every verb, we extract the most salient second order contexts for each of its roles (i.e. the most salient dimensions of typical role fillers), and then we compute thematic fit as a weighted overlap between the top features of candidate fillers and role prototypes. Our experiments show that our method consistently outperforms a baseline re-implementing a state-of-the-art system, and achieves better or comparable results to those reported in the literature for the other unsupervised systems. Moreover, it provides an explicit representation of the features characterizing verb-specific semantic roles.

pdf bib
Is Structure Necessary for Modeling Argument Expectations in Distributional Semantics?
Emmanuele Chersoni | Enrico Santus | Philippe Blache | Alessandro Lenci
Proceedings of the 12th International Conference on Computational Semantics (IWCS) — Long papers

2016

pdf bib
Testing APSyn against Vector Cosine on Similarity Estimation
Enrico Santus | Emmanuele Chersoni | Alessandro Lenci | Chu-Ren Huang | Philippe Blache
Proceedings of the 30th Pacific Asia Conference on Language, Information and Computation: Oral Papers

pdf bib
Representing Verbs with Rich Contexts: an Evaluation on Verb Similarity
Emmanuele Chersoni | Enrico Santus | Alessandro Lenci | Philippe Blache | Chu-Ren Huang
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC)
Dominique Brunato | Felice Dell’Orletta | Giulia Venturi | Thomas François | Philippe Blache
Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC)

pdf bib abs
Towards a Distributional Model of Semantic Complexity
Emmanuele Chersoni | Philippe Blache | Alessandro Lenci
Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC)

In this paper, we introduce for the first time a Distributional Model for computing semantic complexity, inspired by the general principles of the Memory, Unification and Control framework(Hagoort, 2013; Hagoort, 2016). We argue that sentence comprehension is an incremental process driven by the goal of constructing a coherent representation of the event represented by the sentence. The composition cost of a sentence depends on the semantic coherence of the event being constructed and on the activation degree of the linguistic constructions. We also report the results of a first evaluation of the model on the Bicknell dataset (Bicknell et al., 2010).

pdf bib abs
4Couv: A New Treebank for French
Philippe Blache | Grégoire de Montcheuil | Laurent Prévot | Stéphane Rauzy
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The question of the type of text used as primary data in treebanks is of certain importance. First, it has an influence at the discourse level: an article is not organized in the same way as a novel or a technical document. Moreover, it also has consequences in terms of semantic interpretation: some types of texts can be easier to interpret than others. We present in this paper a new type of treebank which presents the particularity to answer to specific needs of experimental linguistic. It is made of short texts (book backcovers) that presents a strong coherence in their organization and can be rapidly interpreted. This type of text is adapted to short reading sessions, making it easy to acquire physiological data (e.g. eye movement, electroencepholagraphy). Such a resource offers reliable data when looking for correlations between computational models and human language processing.

pdf bib abs
MarsaGram: an excursion in the forests of parsing trees
Philippe Blache | Stéphane Rauzy | Grégoire Montcheuil
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The question of how to compare languages and more generally the domain of linguistic typology, relies on the study of different linguistic properties or phenomena. Classically, such a comparison is done semi-manually, for example by extracting information from databases such as the WALS. However, it remains difficult to identify precisely regular parameters, available for different languages, that can be used as a basis towards modeling. We propose in this paper, focusing on the question of syntactic typology, a method for automatically extracting such parameters from treebanks, bringing them into a typology perspective. We present the method and the tools for inferring such information and navigating through the treebanks. The approach has been applied to 10 languages of the Universal Dependencies Treebank. We approach is evaluated by showing how automatic classification correlates with language families.

2015

pdf bib abs
Typologie automatique des langues à partir de treebanks
Philippe Blache | Grégroie de Montcheuil | Stéphane Rauzy
Actes de la 22e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

La typologie des langues repose sur l’étude de la réalisation de propriétés ou phénomènes linguistiques dans plusieurs langues ou familles de langues. Nous abordons dans cet article la question de la typologie syntaxique et proposons une méthode permettant d’extraire automatiquement ces propriétés à partir de treebanks, puis de les analyser en vue de dresser une telle typologie. Nous décrivons cette méthode ainsi que les outils développés pour la mettre en œuvre. Celle-ci a été appliquée à l’analyse de 10 langues décrites dans le Universal Dependencies Treebank. Nous validons ces résultats en montrant comment une technique de classification permet, sur la base des informations extraites, de reconstituer des familles de langues.

pdf bib abs
Création d’un nouveau treebank à partir de quatrièmes de couverture
Philippe Blache | Grégoire Moncheuil | Stéphane Rauzy | Marie-Laure Guénot
Actes de la 22e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

Nous présentons ici 4-couv, un nouveau corpus arboré d’environ 3 500 phrases, constitué d’un ensemble de quatrièmes de couverture, étiqueté et analysé automatiquement puis corrigé et validé à la main. Il répond à des besoins spécifiques pour des projets de linguistique expérimentale, et vise à rester compatible avec les autres treebanks existants pour le français. Nous présentons ici le corpus lui-même ainsi que les outils utilisés pour les différentes étapes de son élaboration : choix des textes, étiquetage, parsing, correction manuelle.

This paper focuses on the representation and querying of knowledge-based multimodal data. This work stands in the OTIM project which aims at processing multimodal annotation of a large conversational French speech corpus. Within OTIM, we aim at providing linguists with a unique framework to encode and manipulate numerous linguistic domains (from prosody to gesture). Linguists commonly use Typed Feature Structures (TFS) to provide an uniform view of multimodal annotations but such a representation cannot be used within an applicative framework. Moreover TFS expressibility is limited to hierarchical and constituency relations and does not suit to any linguistic domain that needs for example to represent temporal relations. To overcome these limits, we propose an ontological approach based on Description logics (DL) for the description of linguistic knowledge and we provide an applicative framework based on OWL DL (Ontology Web Language) and the query language SPARQL.

2011

pdf bib
Predicting Linguistic Difficulty by Means of a Morpho-Syntactic Probabilistic Model
Philippe Blache | Stéphane Rauzy
Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation

2010

pdf bib abs
Un modèle de caractérisation de la complexité syntaxique
Philippe Blache
Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Cet article présente un modèle de la complexité syntaxique. Il réunit un ensemble d’indices de complexité et les représente à l’aide d’un cadre formel homogène, offrant ainsi la possibilité d’une quantification automatique : le modèle proposé permet d’associer à chaque phrase un indice reflétant sa complexité.

pdf bib
A Formal Scheme for Multimodal Grammars
Philippe Blache | Laurent Prévot
Coling 2010: Posters

Large annotation projects, typically those addressing the question of multimodal annotation in which many different kinds of information have to be encoded, have to elaborate precise and high level annotation schemes. Doing this requires first to define the structure of the information: the different objects and their organization. This stage has to be as much independent as possible from the coding language constraints. This is the reason why we propose a preliminary formal annotation model, represented with typed feature structures. This representation requires a precise definition of the different objects, their properties (or features) and their relations, represented in terms of type hierarchies. This approach has been used to specify the annotation scheme of a large multimodal annotation project (OTIM) and experimented in the annotation of a multimodal corpus (CID, Corpus of Interactional Data). This project aims at collecting, annotating and exploiting a dialogue video corpus in a multimodal perspective (including speech and gesture modalities). The corpus itself, is made of 8 hours of dialogues, fully transcribed and richly annotated (phonetics, syntax, pragmatics, gestures, etc.).

2009

pdf bib abs
Des relations d’alignement pour décrire l’interaction des domaines linguistiques : vers des Grammaires Multimodales
Philippe Blache
Actes de la 16ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Un des problèmes majeurs de la linguistique aujourd’hui réside dans la prise en compte de phénomènes relevant de domaines et de modalités différentes. Dans la littérature, la réponse consiste à représenter les relations pouvant exister entre ces domaines de façon externe, en termes de relation de structure à structure, s’appuyant donc sur une description distincte de chaque domaine ou chaque modalité. Nous proposons dans cet article une approche différente permettant représenter ces phénomènes dans un cadre formel unique, permettant de rendre compte au sein d’une même grammaire tous les phénomènes concernés. Cette représentation précise de l’interaction entre domaines et modalités s’appuie sur la définition de relations d’alignement.

pdf bib
A general scheme for broad-coverage multimodal annotation
Philippe Blache
Proceedings of the Third Linguistic Annotation Workshop (LAW III)

2008

pdf bib abs
Influence de la qualité de l’étiquetage sur le chunking : une corrélation dépendant de la taille des chunks
Philippe Blache | Stéphane Rauzy
Actes de la 15ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Nous montrons dans cet article qu’il existe une corrélation étroite existant entre la qualité de l’étiquetage morpho-syntaxique et les performances des chunkers. Cette corrélation devient linéaire lorsque la taille des chunks est limitée. Nous appuyons notre démonstration sur la base d’une expérimentation conduite suite à la campagne d’évaluation Passage 2007 (de la Clergerie et al., 2008). Nous analysons pour cela les comportements de deux analyseurs ayant participé à cette campagne. L’interprétation des résultats montre que la tâche de chunking, lorsqu’elle vise des chunks courts, peut être assimilée à une tâche de “super-étiquetage”.

pdf bib abs
Creating and Exploiting Multimodal Annotated Corpora
Philippe Blache | Roxane Bertrand | Gaëlle Ferré
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The paper presents a project of the Laboratoire Parole & Langage which aims at collecting, annotating and exploiting a corpus of spoken French in a multimodal perspective. The project directly meets the present needs in linguistics where a growing number of researchers become aware of the fact that a theory of communication which aims at describing real interactions should take into account the complexity of these interactions. However, in order to take into account such a complexity, linguists should have access to spoken corpora annotated in different fields. The paper presents the annotation schemes used in phonetics, morphology and syntax, prosody, gestuality at the LPL together with the type of linguistic description made from the annotations seen in two examples.

pdf bib abs
Evaluation of Lexical Resources and Semantic Networks on a Corpus of Mental Associations
Laurianne Sitbon | Patrice Bellot | Philippe Blache
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

When a user cannot find a word, he may think of semantically related words that could be used into an automatic process to help him. This paper presents an evaluation of lexical resources and semantic networks for modelling mental associations. A corpus of associations has been constructed for its evaluation. It is composed of 20 low frequency target words each associated 5 times by 20 users. In the experiments we look for the target word in propositions made from the associated words thanks to 5 different resources. The results show that even if each resource has a useful specificity, the global recall is low. An experiment to extract common semantic features of several associations showed that we cannot expect to see the target word below a rank of 20 propositions.

pdf bib abs
Evaluating Robustness Of A QA System Through A Corpus Of Real-Life Questions
Laurianne Sitbon | Patrice Bellot | Philippe Blache
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper presents the sequential evaluation of the question answering system SQuaLIA. This system is based on the same sequential process as most statistical question answering systems, involving 4 main steps from question analysis to answer extraction. The evaluation is based on a corpus made from 20 questions taken in the set of an evaluation campaign and which were well answered by SQuaLIA. Each of the 20 questions has been typed by 17 native participants, non natives and dyslexics. They were vocally instructed the target of each question. Each of the 4 analysis steps of the system involves a loss of accuracy, until an average of 60 of right answers at the end of the process. The main cause of this loss seems to be the orthographic mistakes users make on nouns.

pdf bib
Le CID - Corpus of Interactional Data. Annotation et exploitation multimodale de parole conversationnelle [The “Corpus of Interactional Data” (CID) - Multimodal annotation of conversational speech”]
Roxane Bertrand | Philippe Blache | Robert Espesser | Gaëlle Ferré | Christine Meunier | Béatrice Priego-Valverde | Stéphane Rauzy
Traitement Automatique des Langues, Volume 49, Numéro 3 : Recherches actuelles en phonologie et en phonétique : interfaces avec le traitement automatique des langues [Current Research in Phonology and Phonetics: Interfaces with Natural-Language Processing]

2007

pdf bib abs
Traitements phrastiques phonétiques pour la réécriture de phrases dysorthographiées
Laurianne Sitbon | Patrice Bellot | Philippe Blache
Actes de la 14ème conférence sur le Traitement Automatique des Langues Naturelles. Posters

Cet article décrit une méthode qui combine des hypothèses graphémiques et phonétiques au niveau de la phrase, à l’aide d’une réprésentation en automates à états finis et d’un modèle de langage, pour la réécriture de phrases tapées au clavier par des dysorthographiques. La particularité des écrits dysorthographiés qui empêche les correcteurs orthographiques d’être efficaces pour cette tâche est une segmentation en mots parfois incorrecte. La réécriture diffère de la correction en ce sens que les phrases réécrites ne sont pas à destination de l’utilisateur mais d’un système automatique, tel qu’un moteur de recherche. De ce fait l’évaluation est conduite sur des versions filtrées et lemmatisées des phrases. Le taux d’erreurs mots moyen passe de 51 % à 20 % avec notre méthode, et est de 0 % sur 43 % des phrases testées.

pdf bib
Le moteur de prédiction de mots de la Plateforme de Communication Alternative [The word prediction engine of the Alternative Communication Platform]
Philippe Blache | Stéphane Rauzy
Traitement Automatique des Langues, Volume 48, Numéro 2 : Communication Assistée [Assisted communication]

pdf bib
Éléments pour adapter les systèmes de recherche d’information aux dyslexiques [Towards adapting information retrieval systems to dyslexic people]
Lauriane Sitbon | Patrice Bellot | Philippe Blache
Traitement Automatique des Langues, Volume 48, Numéro 2 : Communication Assistée [Assisted communication]

2006

pdf bib abs
Constraint-Based Parsing as an Efficient Solution: Results from the Parsing Evaluation Campaign EASy
Tristan Vanrullen | Philippe Blache | Jean-Marie Balfourier
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper describes the unfolding of the EASy evaluation campaign for french parsers as well as the techniques employed for the participation of laboratory LPL to this campaign. Three symbolic parsers based on a same resource and a same formalism (Property Grammars) are described and evaluated. The first results of this evaluation are analyzed and lead to the conclusion that symbolic parsing in a constraint-based formalism is efficient and robust.

pdf bib
Acceptability Prediction by Means of Grammaticality Quantification
Philippe Blache | Barbara Hemforth | Stéphane Rauzy
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

pdf bib abs
Vers une prédiction automatique de la difficulté d’une question en langue naturelle
Laurianne Sitbon | Jens Grivolla | Laurent Gillard | Patrice Bellot | Philippe Blache
Actes de la 13ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Nous proposons et testons deux méthodes de prédiction de la capacité d’un système à répondre à une question factuelle. Une telle prédiciton permet de déterminer si l’on doit initier un dialogue afin de préciser ou de reformuler la question posée par l’utilisateur. La première approche que nous proposons est une adaptation d’une méthode de prédiction dans le domaine de la recherche documentaire, basée soit sur des machines à vecteurs supports (SVM) soit sur des arbres de décision, avec des critères tels que le contenu des questions ou des documents, et des mesures de cohésion entre les documents ou passages de documents d’où sont extraits les réponses. L’autre approche vise à utiliser le type de réponse attendue pour décider de la capacité du système à répondre. Les deux approches ont été testées sur les données de la campagne Technolangue EQUER des systèmes de questions-réponses en français. L’approche à base de SVM est celle qui obtient les meilleurs résultats. Elle permet de distinguer au mieux les questions faciles, celles auxquelles notre système apporte une bonne réponse, des questions difficiles, celles restées sans réponses ou auxquelles le système a répondu de manière incorrecte. A l’opposé on montre que pour notre système, le type de réponse attendue (personnes, quantités, lieux...) n’est pas un facteur déterminant pour la difficulté d’une question.

pdf bib abs
Mécanismes de contrôle pour l’analyse en Grammaires de Propriétés
Philippe Blache | Stéphane Rauzy
Actes de la 13ème conférence sur le Traitement Automatique des Langues Naturelles. Posters

Les méthodes d’analyse syntaxiques hybrides, reposant à la fois sur des techniques statistiques et symboliques, restent peu exploitées. Dans la plupart des cas, les informations statistiques sont intégrées à un squelette contextfree et sont utilisées pour contrôler le choix des règles ou des structures. Nous proposons dans cet article une méthode permettant de calculer un indice de corrélation entre deux objets linguistiques (catégories, propriétés). Nous décrivons une utilisation de cette notion dans le cadre de l’analyse des Grammaires de Propriétés. L’indice de corrélation nous permet dans ce cas de contrôler à la fois la sélection des constituants d’une catégorie, mais également la satisfaction des propriétés qui la décrivent.

pdf bib
Proceedings of the Third Workshop on Constraints and Language Processing
Philippe Blache | Henning Christiansen | Veronica Dahl | Jean-Philippe Prost
Proceedings of the Third Workshop on Constraints and Language Processing

pdf bib
A Robust and Efficient Parser for Non-Canonical Inputs
Philippe Blache
Proceedings of the Workshop on ROMAND 2006:Robust Methods in Analysis of Natural language Data

2005

pdf bib abs
Combiner analyse superficielle et profonde : bilan et perspectives
Philippe Blache
Actes de la 12ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

L’analyse syntaxique reste un problème complexe au point que nombre d’applications n’ont recours qu’à des analyseurs superficiels. Nous faisons dans cet article le point sur les notions d’analyse superficielles et profondes en proposant une première caractérisation de la notion de complexité opérationnelle pour l’analyse syntaxique automatique permettant de distinguer objets et relations plus ou moins difficiles à identifier. Sur cette base, nous proposons un bilan des différentes techniques permettant de caractériser et combiner analyse superficielle et profonde.

pdf bib abs
Une plateforme pour l’acquisition, la maintenance et la validation de ressources lexicales
Tristan Vanrullen | Philippe Blache | Cristel Portes | Stéphane Rauzy | Jean-François Maeyhieux
Actes de la 12ème conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

Nous présentons une plateforme de développement de lexique offrant une base lexicale accompagnée d’un certain nombre d’outils de maintenance et d’utilisation. Cette base, qui comporte aujourd’hui 440.000 formes du Français contemporain, est destinée à être diffusée et remise à jour régulièrement. Nous exposons d’abord les outils et les techniques employées pour sa constitution et son enrichissement, notamment la technique de calcul des fréquences lexicales par catégorie morphosyntaxique. Nous décrivons ensuite différentes approches pour constituer un sous-lexique de taille réduite, dont la particularité est de couvrir plus de 90% de l’usage. Un tel lexique noyau offre en outre la possibilité d’être réellement complété manuellement avec des informations sémantiques, de valence, pragmatiques etc.

2004

pdf bib
Actes de la 11ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs
Philippe Blache | Noël Nguyen | Nouredine Chenfour | Abdenbi Rajouani
Actes de la 11ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

pdf bib abs
Densité d’information syntaxique et gradient de grammaticalité
Philippe Blache
Actes de la 11ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Cet article propose l’introduction d’une notion de densité syntaxique permettant de caractériser la complexité d’un énoncé et au-delà d’introduire la spécification d’un gradient de grammaticalité. Un tel gradient s’avère utile dans plusieurs cas : quantification de la difficulté d’interprétation d’une phrase, gradation de la quantité d’information syntaxique contenue dans un énoncé, explication de la variabilité et la dépendances entre les domaines linguistiques, etc. Cette notion exploite la possibilité de caractérisation fine de l’information syntaxique en termes de contraintes : la densité est fonction des contraintes satisfaites par une réalisation pour une grammaire donnée. Les résultats de l’application de cette notion à quelques corpus sont analysés.

pdf bib
Actes de la 11ème conférence sur le Traitement Automatique des Langues Naturelles. Posters
Philippe Blache | Noël Nguyen | Nouredine Chenfour | Abdenbi Rajouani
Actes de la 11ème conférence sur le Traitement Automatique des Langues Naturelles. Posters

2003

pdf bib abs
Meta-Level Contstraints for Linguistic Domain Interaction
Philippe Blache
Proceedings of the Eighth International Conference on Parsing Technologies

This paper presents a technique for the representation and the implementation of interaction relations between different domains of linguistic analysis. This solution relies on the localization of the linguistic objects in the context. The relations are then implemented by means of interaction constraints, each domain information being expressed independently.

pdf bib abs
Vers une théorie cognitive de la langue basée sur les contraintes
Philippe Blache
Actes de la 10ème conférence sur le Traitement Automatique des Langues Naturelles. Posters

Cet article fournit des éléments d’explication pour la description des relations entre les différents domaines de l’analyse linguistique. Il propose une architecture générale en vue d’une théorie formée de plusieurs niveaux : d’un côté les grammaires de chacun des domaines et de l’autre des relations spécifiant les interactions entre ces domaines. Dans cette approche, chacun des domaines est porteur d’une partie de l’information, celle-ci résultant également de l’interaction entre les domaines.

2002

pdf bib abs
Variabilité et dépendances des composants linguistiques
Philippe Blache | Albert Di Cristo
Actes de la 9ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Nous présentons dans cet article un cadre d’explication des relations entre les différents composants de l’analyse linguistique (prosodie, syntaxe, sémantique, etc.). Nous proposons un principe spécifiant un équilibre pour un objet linguistique donné entre ces différents composants sous la forme d’un poids (précisant l’aspect marqué de l’objet décrit) défini pour chacun d’entre eux et d’un seuil (correspondant à la somme de ces poids) à atteindre. Une telle approche permet d’expliquer certains phénomènes de variabilité : le choix d’une “tournure” à l’intérieur d’un des composants peut varier à condition que son poids n’empêche pas d’atteindre le seuil spécifié. Ce type d’information, outre son intérêt purement linguistique, constitue le premier élément de réponse pour l’introduction de la variabilité dans des applications comme les systèmes de génération ou de synthèse de la parole.

pdf bib
From Shallow to Deep Parsing Using Constraint Satisfaction
Jean-Marie Balfourier | Philippe Blache | Tristan van Rullen
COLING 2002: The 19th International Conference on Computational Linguistics

pdf bib
An evaluation of different symbolic shallow parsing techniques
Tristan Van Rullen | Philippe Blache
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2001

pdf bib
Property Grammars: A Flexible Constraint-Based Approach to Parsing
Philippe Blache | Jean-Marie Balfourier
Proceedings of the Seventh International Workshop on Parsing Technologies

pdf bib abs
Dépendances à distance dans les grammaires de propriétés : l’exemple des disloquées
Philippe Blache
Actes de la 8ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Cet article propose une description des dépendances à distances s’appuyant sur une approche totalement déclarative, les grammaires de propriétés, décrivant l’information linguistique sous la forme de contraintes. L’approche décrite ici consiste à introduire de façon dynamique en cours d’analyse de nouvelles contraintes, appelées propriétés distantes. Cette notion est illustrée par la description du phénomène des disloquées en français.

2000

pdf bib
Property Grammars: a Solution for Parsing with Constraints
Philippe Blache
Proceedings of the Sixth International Workshop on Parsing Technologies

1998

pdf bib
Parsing Ambiguous Structures using Controlled Disjunctions and Unary Quasi-Trees
Philippe Blache
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1

pdf bib
Parsing Ambiguous Structures using Controlled Disjunctions and Unary Quasi-Trees
Philippe Blache
COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics

1997

pdf bib abs
Disambiguating with Controlled Disjunctions
Philippe Blache
Proceedings of the Fifth International Workshop on Parsing Technologies

In this paper, we propose a disambiguating technique called controlled disjunctions. This extension of the so-called named disjunctions relies on the relations existing between feature values (covariation, control, etc.). We show that controlled disjunctions can implement different kind of ambiguities in a consistent and homogeneous way. We describe the integration of controlled disjunctions into a HPSG feature structure representation. Finally, we present a direct implementation by means of delayed evaluation and we develop an example within the functional programming paradigm.