Nathalie Pernelle

Université Sorbonne Paris Nord / Sorbonne Paris Nord University, Institut Galilée, Faculty Member

Papers by Nathalie Pernelle

Ingénierie des Connaissances

The Journées francophones d'Ingénierie des Connaissances (IC) give the French-speaking community a forum for exchanging on the acquisition, representation, and management of data and knowledge. Academic researchers, industry practitioners, and students can present their work in knowledge engineering and compare their points of view. The conference has been held every year since 1997, first under the auspices of the Groupe de Recherche en Acquisition des Connaissances (GRACQ) and then of the Knowledge Engineering college of the Association Française pour l'Intelligence Artificielle (AFIA). This special issue of the Revue d'Intelligence Artificielle gathers four contributions selected from the best papers presented at the 2016 and 2017 editions of the conference.

Generating Referring Expressions from Knowledge Graphs

A referring expression (RE) is a description in natural language or a logical formula that uniquely identifies an entity. For instance, "the 44th president of the United States" unambiguously characterizes Barack Obama. Referring expressions find applications in disambiguation, data anonymization, query answering, and data linking. Many logical expressions may uniquely identify the same entity. Generation of referring expressions is a well-studied task in natural language generation [1]. Hence, various algorithms with different objectives have been proposed to automatically discover REs. These approaches vary in the expressivity of the logical formulas they can generate. For instance, in [1, 2] the REs created are conjunctions of atoms, while in [3] more complex REs represented in description logics are discovered, which can involve the universal quantifier. In this work our focus lies on automatically discovering REs for each entity within a cl...
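As a rough illustration of the simplest case mentioned in this abstract (an RE as a conjunction of atoms), the sketch below checks whether a set of (predicate, object) atoms matches exactly one entity in a toy graph. The triples, the predicate names, and the function names are hypothetical and are not taken from the paper.

```python
# Toy knowledge graph as (subject, predicate, object) triples.
# All entities and facts here are illustrative assumptions.
TRIPLES = {
    ("Obama", "heldOffice", "US_President"),
    ("Obama", "presidencyNumber", "44"),
    ("Bush", "heldOffice", "US_President"),
    ("Bush", "presidencyNumber", "43"),
}


def entities_matching(triples, atoms):
    """Return the subjects that satisfy every (predicate, object) atom."""
    subjects = {s for s, _, _ in triples}
    return {s for s in subjects if all((s, p, o) in triples for p, o in atoms)}


def is_referring_expression(triples, atoms, target):
    """A conjunction of atoms refers to target if it matches target and nothing else."""
    return entities_matching(triples, atoms) == {target}
```

With this data, the single atom ("heldOffice", "US_President") matches two entities and so is not an RE for "Obama", whereas adding ("presidencyNumber", "44") makes the conjunction identify "Obama" alone.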

Enrichissement contrôlé de bases de connaissances à partir de documents semi-structurés annotés

Thanks to Linked Open Data, more and more RDF sources are available on the Web. However, these sources contain relatively little information compared with the volume of information found in semi-structured documents. Many tools aim to annotate these documents semantically, but relation extraction remains a particularly difficult task when the structure and vocabulary of the documents are heterogeneous. We propose an approach for enriching and querying one or more RDF/OWL knowledge bases by exploiting a set of semantically annotated documents. These bases are enriched with uncertain relation instances inferred from the structure of the documents, the ontologies, and the facts present in the knowledge bases. A SPARQL query formulated in the domain vocabulary is reformulated in order to combine the facts from the different bases...

Explanation Dialogues on Erroneous SameAs Using Argumentation Theory

LDM: Link Discovery Method for new Resource Integration

C-SAKey : une approche de découverte de clés conditionnelles dans des données RDF

Détection et Représentation des changements dans les sources de données RDF

Many RDF data sources are constantly evolving, whether at the level of the data or of the vocabulary used. Many integration tasks are affected by these modifications. We present an approach for detecting and representing more or less complex changes that can be detected when only the data are considered. A first experiment was conducted on different versions of DBpedia.

Comment représenter et découvrir des liens d’identité contextuelle dans une base de connaissances : Application à des données expérimentales en sciences du vivant

Revue d'intelligence artificielle, 2018

Détection de liens d'identité contextuels dans une base de connaissances

Journées Francophones d'Ingénierie des Connaissances, 2017

Approche logique pour la réconciliation de références

Extraction et Gestion des Connaissances, 2007

Enrichissement sémantique de documents XML représentant des tableaux

Extraction et Gestion des Connaissances, 2005

On Evaluating the Quality of RDF Identity Links in the LOD

The notion of Linked Data is based on the idea that resources from different data sources can be connected by typed links to enable new knowledge that isolated data sources cannot provide on their own. Today, as the Web of Data is proving its importance, and as a huge amount of data is published on the web in the form of RDF triples, the number of sameAs links is growing rapidly because different data sources often describe equivalent resources. Since most sameAs links are discovered automatically and the quality of the data sources can be poor, it is becoming crucial to develop methods for evaluating the quality of these RDF identity links, thus providing a ranking measure of their reliability. In this context, this paper defines an initial methodology for analyzing and evaluating a given set of RDF identity links in the Web of Data and ranking their reliability.

Approche numérique pour l'invalidation de liens d'identité (owl: SameAs)

Abstract: In recent years, thanks to the standardization of Semantic Web technologies, we have seen an unprecedented production of data published online as linked data. In this context, when a typed link is declared between two distinct resources referring to the same real-world entity, owl:sameAs is generally the predominant choice. However, recent work in the linked data community has revealed problems in the use of owl:sameAs links. These problems arise both when the links are erroneous and when they express a weaker link than the owl:sameAs semantics defined in OWL. In this work, we present a numerical method for invalidating identity links, which relies on a similarity computation and on ontology axioms to detect invalid identity links. We present our first experimental results, obtained on a set of d...

Semantic enrichment of data: annotation and data linking

Thesis submitted for the Habilitation à Diriger des Recherches, speciality "Informatique" (computer science), by Nathalie Pernelle. Over the last fifteen years, I have focused on issues related to the general problem of enriching unstructured documents or structured RDF data with semantic information. This has been done along three main research axes. A first research axis has been dedicated to the definition of approaches that automatically construct class hierarchies over XML data when no domain ontology is available. Since this work is rather old now, I describe it only briefly in this chapter and do not detail it in this thesis. In a second research axis, I have investigated different approaches that can be used to annotate unstructured data using available ontologies. A third research axis has been devoted to the enrichment of RDF datasets with identity links when the RDF datasets conform to OWL ontologies.

Définition de la sémantique des clés dans le web sémantique : un point de vue théorique

Many approaches have been defined to automatically link RDF data sources published on the Web. Some of these approaches are based on selecting the smallest sets of properties that are relevant for comparing two data items. These sets form keys, a notion similar to the keys defined for relational databases. In this article, we explore different key semantics that can be used in the Semantic Web.
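To make the notion of a key over multi-valued RDF properties concrete, here is a minimal sketch under one assumed semantics: a property set is a key if no two distinct instances share at least one value on every property of the set. This is only one of several possible semantics and is not claimed to be the definition adopted in the article; the function name and the data layout are hypothetical.

```python
from itertools import combinations


def is_key(instances, properties):
    """Check a property set against one assumed key semantics.

    instances maps an instance name to a dict of property -> set of values.
    The set is rejected as soon as two distinct instances share a value
    on every listed property.
    """
    for (_, desc_a), (_, desc_b) in combinations(instances.items(), 2):
        shares_all = all(
            desc_a.get(p, set()) & desc_b.get(p, set()) for p in properties
        )
        if shares_all:
            return False
    return True
```

For example, with three people where two share a name and two share a birth year, neither property is a key alone under this semantics, but the pair of properties can be, provided no two people share both.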

Enriching a Relational Data Warehouse by Integrating XML Data : Report on the e . dot Project Applied to Microbiology

In this paper we present two methods for integrating (and querying) data in a relational setting. These methods have been motivated and validated by a knowledge management application in microbiology, the e.dot project. The aim of the e.dot project is to enrich an existing relational database, which stores microbiological data on food risk assessment, with data resulting from a continuous technology watch on the web. The data coming from the web are converted to XML and must be queried through the same relational interface as the pre-existing relational database. The first method integrates possibly heterogeneous XML data through relational views over the schema of the existing database. These relational views are not materialized, but we provide a query rewriting algorithm that decomposes a select-project-join query into a set of local queries expressed in XQuery. The second method focuses on the data tables contained in the documents found on the Web. The method takes advantage of the presenc...

L2R: A Logical Method for Reference Reconciliation

The reference reconciliation problem consists in deciding whether different identifiers refer to the same data, i.e., correspond to the same real-world entity. The L2R system exploits the semantics of a rich data model, which extends RDFS with a fragment of OWL-DL and SWRL rules. In L2R, the semantics of the schema is translated into a set of logical reconciliation rules, which are then used to infer correct decisions of both reconciliation and non-reconciliation. In contrast with other approaches, the L2R method has a precision of 100% by construction. First experiments show promising results for recall and, most importantly, significant increases when rules are added.

RDF data evolution: automatic detection and semantic representation of changes

Many RDF data sources are constantly changing at both the data and the vocabulary (ontology) levels. Many integration tasks are impacted by these changes. In this context, it is important to develop approaches to detect and represent these changes. Many studies have focused on the detection, representation, and management of changes at the ontology level. In this paper, we present an approach that detects and represents the elementary and complex changes that can be found when only the data level is considered. A first experiment was conducted on different versions of DBpedia. CCS Concepts: Information systems → Information integration.
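A minimal sketch of data-level change detection between two versions of an RDF source, assuming each version is available as a set of (subject, predicate, object) triples. Elementary changes are plain set differences; as one hypothetical example of a complex change, a deletion and an addition on the same subject and predicate are grouped into a value update. The grouping rule and the names are assumptions for illustration, not the representation proposed in the paper.

```python
def diff_versions(old, new):
    """Compare two sets of (subject, predicate, object) triples.

    Returns (added, deleted, updates), where updates pairs each deleted
    triple with an added triple on the same subject and predicate.
    """
    added = new - old
    deleted = old - new
    updates = {
        (s, p, o_old, o_new)
        for (s, p, o_old) in deleted
        for (s2, p2, o_new) in added
        if s == s2 and p == p2
    }
    return added, deleted, updates
```

Pairing every deletion with every addition is quadratic in the size of the diff, which is acceptable for a sketch; indexing the additions by (subject, predicate) would be the obvious refinement for large versions.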

Differential Causal Rules Mining in Knowledge Graphs

Proceedings of the 11th on Knowledge Capture Conference

The sameAs Problem: A Survey on Identity Management in the Web of Data

In a decentralised knowledge representation system such as the Web of Data, it is common and indeed desirable for different knowledge graphs to overlap. Whenever multiple names are used to denote the same thing, owl:sameAs statements are needed in order to link the data and foster reuse. Whilst the deductive value of such identity statements can be extremely useful in enhancing various knowledge-based systems, incorrect use of identity can have wide-ranging effects in a global knowledge space like the Web of Data. With several works having already shown that identity in the Web is broken, this survey investigates the current state of this "sameAs problem". An open discussion highlights the main weaknesses of the solutions in the literature and identifies open challenges for the future.
