Kai Eckert | Hochschule der Medien - Academia.edu

Kai Eckert

Hochschule der Medien, Faculty for Communication and Information, Faculty Member

University of Mannheim, School of Business Informatics and Mathematics, Alumnus

Computer and Information Scientist
Address: Mannheim, Germany

Uploads

Research

Usage-driven Maintenance of Knowledge Organization Systems

Knowledge Organization Systems (KOS) are typically used as background knowledge for document indexing in information retrieval. They have to be maintained and adapted constantly to reflect changes in the domain and the terminology. In this thesis, approaches are provided that support the maintenance of hierarchical knowledge organization systems, like thesauri, classifications, or taxonomies, by making information about the usage of KOS concepts available to the maintainer.
The central contribution is the ICE-Map Visualization, a treemap-based visualization on top of a generalized statistical framework that is able to visualize almost arbitrary usage information. The proper selection of an existing KOS for available documents and the evaluation of a KOS for different indexing techniques by means of the ICE-Map Visualization are demonstrated.
For the creation of a new KOS, an approach based on crowdsourcing is presented that uses feedback from Amazon Mechanical Turk to relate terms hierarchically. The extension of an existing KOS with new terms derived from the documents to be indexed is performed with a machine-learning approach that relates the terms to existing concepts in the hierarchy. The features are derived from text snippets in the result list of a web search engine. For the splitting of overpopulated concepts into new subconcepts, an interactive clustering approach is presented that is able to propose names for the new subconcepts. The implementation of a framework is described that integrates all approaches of this thesis and contains the reference implementation of the ICE-Map Visualization. It is extendable and supports the implementation of evaluation methods that build on other evaluations. Additionally, it supports the visualization of the results and the implementation of new visualizations. An important building block for practical applications is the simple linguistic indexer that is presented as a minor contribution. It is knowledge-poor and works without any training.
This thesis applies computer science approaches in the domain of information science. The introduction describes the foundations in information science; in the conclusion, the focus is set on the relevance for practical applications, especially regarding the handling of different qualities of KOSs due to automatic and semi-automatic maintenance.

The Role of Reasoning for RDF Validation

For data practitioners embracing the world of RDF and Linked Data, the openness and flexibility are a mixed blessing. For them, data validation according to predefined constraints is a much sought-after feature, particularly as this is taken for granted in the XML world. Based on our work in the DCMI RDF Application Profiles Task Group and in cooperation with the W3C Data Shapes Working Group, we have so far published 81 types of constraints that are required by various stakeholders for data applications. These constraint types form the basis for investigating the role that reasoning and different semantics play in practical data validation, why reasoning is beneficial for RDF validation, and how to overcome the major shortcomings when validating RDF data by performing reasoning prior to validation. For each constraint type, we examine (1) whether reasoning may improve data quality, (2) how efficiently, in terms of runtime, validation is performed with and without reasoning, and (3) whether validation results depend on the underlying semantics, which differ between reasoning and validation. Using these findings, we determine which constraint types the most common constraint languages can express and give directions for the further development of constraint languages.
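
The idea of performing reasoning prior to validation can be illustrated with a minimal Python sketch. It is not the method or tooling from the paper: the triples, the `ex:` names, and the single constraint ("every subject with an `ex:author` must be typed as `ex:Publication`") are hypothetical, and the only inference applied is the standard rdfs:subClassOf typing rule.

```python
# Triples are (subject, predicate, object) tuples; all ex: names are hypothetical.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def infer_types(triples):
    """Add rdf:type triples implied by rdfs:subClassOf until nothing changes."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        subclass_pairs = [(s, o) for s, p, o in inferred if p == SUBCLASS_OF]
        for s, p, o in list(inferred):
            if p != RDF_TYPE:
                continue
            for sub, sup in subclass_pairs:
                if o == sub and (s, RDF_TYPE, sup) not in inferred:
                    inferred.add((s, RDF_TYPE, sup))
                    changed = True
    return inferred


def violations(triples, prop, required_class):
    """Return subjects that use `prop` but are not typed as `required_class`."""
    typed = {s for s, p, o in triples if p == RDF_TYPE and o == required_class}
    return sorted({s for s, p, o in triples if p == prop} - typed)


data = {
    ("ex:Book", SUBCLASS_OF, "ex:Publication"),
    ("ex:thesis1", RDF_TYPE, "ex:Book"),
    ("ex:thesis1", "ex:author", "ex:alice"),
}

# Validating the raw data reports a violation; validating after reasoning does not.
without_reasoning = violations(data, "ex:author", "ex:Publication")
with_reasoning = violations(infer_types(data), "ex:author", "ex:Publication")
print(without_reasoning)
print(with_reasoning)
```

In this toy case the violation disappears once the implied type is materialized, which is one way validation results can depend on whether reasoning runs first.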

Does it fit? KOS evaluation using the ICE-Map Visualization.

The ICE-Map Visualization was developed to graphically analyze the distribution of indexing results within a given Knowledge Organization System (KOS) hierarchy and allows the user to explore the document sets and the KOSs at the same time. In this paper, we demonstrate the use of the ICE-Map Visualization in combination with a simple automatic indexer to visualize the semantic overlap between a KOS and a set of documents.

Papers

SKOS : eine Sprache für die Übertragung von Thesauri ins Semantic Web

The Semantic Web, or Linked Data, has the potential to revolutionize the availability of data and knowledge as well as access to them. Knowledge organization systems such as thesauri, which index and structure data by content, can make a major contribution to this. Unfortunately, many of these systems are still available only in book form or in specialized applications. How, then, can they be used for the Semantic Web? The Simple Knowledge Organization System (SKOS) offers a way to "translate" knowledge organization systems into a form that can be cited on the Web and linked with other resources.

Validating RDF Data Quality Using Constraints to Direct the Development of Constraint Languages

Towards description set profiles for RDF using SPARQL as intermediate language

International Conference on Dublin Core and Metadata Applications, Oct 8, 2014

Evaluating the Quality of RDF Data Sets on Common Vocabularies in the Social, Behavioral, and Economic Sciences

arXiv (Cornell University), Apr 17, 2015

Results of the Ontology Alignment Evaluation Initiative 2012

Results of the Ontology Alignment Evaluation Initiative 2016

HAL (Le Centre pour la Communication Scientifique Directe), Oct 18, 2016

Results of the Ontology Alignment Evaluation Initiative 2014

HAL (Le Centre pour la Communication Scientifique Directe), Oct 20, 2014

RDF Constraints to Validate Rectangular Data and Metadata on Person-Level Data, Aggregated Data, Thesauri, and Statistical Classifications

arXiv (Cornell University), Apr 17, 2015

Constraints to Validate RDF Data Quality on Common Vocabularies in the Social, Behavioral, and Economic Sciences

arXiv (Cornell University), Apr 17, 2015

Provenance and annotations for linked data

International Conference on Dublin Core and Metadata Applications, Sep 2, 2013

Cross-lingual extreme summarization of scholarly documents

International Journal on Digital Libraries

The number of scientific publications is increasing rapidly, causing information overload for researchers and making it hard for scholars to keep up to date with current trends and lines of work. Recent work has tried to address this problem by developing methods for automated summarization in the scholarly domain, but has so far concentrated only on monolingual settings, primarily English. In this paper, we explore how state-of-the-art neural abstractive summarization models based on a multilingual encoder–decoder architecture can be used to enable cross-lingual extreme summaries of scholarly texts. To this end, we compile a new abstractive cross-lingual summarization dataset for the scholarly domain in four different languages, which enables us to train and evaluate models that process English papers and generate summaries in German, Italian, Chinese and Japanese. We present our new X-SCITLDR dataset for multilingual summarization and thoroughly benchmark different ...

Towards Automated Survey Variable Search and Summarization in Social Science Publications

arXiv (Cornell University), Sep 14, 2022

Overview of the SV-Ident 2022 Shared Task on Survey Variable Identification in Social Science Publications

arXiv (Cornell University), Sep 19, 2022

Extracting Event Metadata from Proceedings Titles

Zenodo (CERN European Organization for Nuclear Research), May 21, 2022

Data science curriculum in the iField

Journal of the Association for Information Science and Technology

Many disciplines, including the broad Field of Information (iField), offer Data Science (DS) programs. There have been significant efforts exploring an individual discipline's identity and unique contributions to the broader DS education landscape. To advance DS education in the iField, the iSchool Data Science Curriculum Committee (iDSCC) was formed and charged with building and recommending a DS education framework for iSchools. This paper reports on the research process and findings of a series of studies to address important questions: What is the iField identity in the multidisciplinary DS education landscape? What is the status of DS education in iField schools? What knowledge and skills should be included in the core curriculum for iField DS education? What are the jobs available for DS graduates from the iField? What are the differences between graduate-level and undergraduate-level DS education? Answers to these questions will not only distinguish an iField approach to ...

Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries

Semantic Web: The Basics (Anhang 1 zum Protokoll der 3. Sitzung des Fachausschusses Regelwerke und Formate vom 27.10.2009)

Summary:
- RDF is a simple, graph-based data model for metadata on the web
- RDF has an XML syntax for:
  - Exchanging RDF models
  - Embedding RDF models into web pages
- Advantages over XML:
  - Data model is agnostic to syntactic variations
  - Information from different models and locations can easily be linked
  - Some important operations are trivial (e.g. merging two models)
- RDF Schema defines special resources and predicates for defining vocabularies
  - Vocabulary: Class, SubClassOf, domain, range
  - Implicit information can be derived using simple derivation rules
  - There is no clear separation between model and schema; schema elements can be part of an RDF model
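
Two points from the summary above, that merging two models is trivial and that implicit information follows from simple derivation rules, can be sketched in a few lines of Python. This is an illustrative sketch only: the `ex:` resources are hypothetical, models are represented as plain sets of triples, and blank nodes (which a real merge has to keep distinct) are ignored.

```python
# Two hypothetical RDF models as sets of (subject, predicate, object) triples.
model_a = {
    ("ex:eckert", "ex:wrote", "ex:thesis"),
}
model_b = {
    ("ex:thesis", "ex:title", "Usage-driven Maintenance of Knowledge Organization Systems"),
    # A schema triple stored alongside the data: model and schema are not separated.
    ("ex:wrote", "rdfs:domain", "ex:Author"),
}

# Merging two models (without blank nodes) is just a set union of their triples.
merged = model_a | model_b


def apply_domain_rule(triples):
    """RDFS domain rule: from (p rdfs:domain C) and (s p o), derive (s rdf:type C)."""
    domains = {(s, o) for s, p, o in triples if p == "rdfs:domain"}
    derived = {
        (s, "rdf:type", cls)
        for s, p, o in triples
        for prop, cls in domains
        if p == prop
    }
    return triples | derived


closed = apply_domain_rule(merged)
for triple in sorted(closed):
    print(triple)
```

The derived triple only appears after the merge, because the data triple and the schema triple come from different models.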

A methodology for supervised automatic document annotation

Bull. IEEE Tech. Comm. Digit. Libr., 2008

Springer LOD Conference Portal. Demo paper

Despite many efforts to make data about scholarly publications available on the Web of Data, information about academic conferences is still contained in (at best) free-text format. Availability of this data in a structured format would enable more efficient decision making for researchers, libraries, publishers, and funding and evaluation bodies. This demo paper showcases the Springer Linked Open Data (LOD) conference portal (available at http://lod.springer.com). We cover the architecture, vocabularies and features of the portal and present usage scenarios.

ArguminSci: A Tool for Analyzing Argumentation and Rhetorical Aspects in Scientific Writing

Proceedings of the 5th Workshop on Argument Mining, 2018