Knowledge Organization Systems (KOS) are typically used as background knowledge for document indexing in information retrieval. They have to be maintained and adapted constantly to reflect changes in the domain and its terminology. This thesis provides approaches that support the maintenance of hierarchical knowledge organization systems, such as thesauri, classifications, or taxonomies, by making information about the usage of KOS concepts available to the maintainer. The central contribution is the ICE-Map Visualization, a treemap-based visualization built on top of a generalized statistical framework that can visualize almost arbitrary usage information. The thesis demonstrates how the ICE-Map Visualization supports two tasks: selecting an appropriate existing KOS for the available documents, and evaluating a KOS under different indexing techniques. For the creation of a new KOS, a crowdsourcing approach is presented that uses feedback from Amazon Mechanical Turk to relate terms hierarchically. An existing KOS is extended with new terms derived from the documents to be indexed. This is done by a machine-learning approach that relates these terms to existing concepts in the hierarchy, using features derived from text snippets in the result list of a web search engine. For splitting overpopulated concepts into new subconcepts, an interactive clustering approach is presented that can propose names for the new subconcepts. The thesis also describes a framework that integrates all of these approaches and contains the reference implementation of the ICE-Map Visualization. The framework is extensible: it supports evaluation methods that build on other evaluations, the visualization of their results, and the implementation of new visualizations.
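The abstract does not say how the ICE-Map Visualization lays out its treemap. The sketch below is offered purely as an illustration, using a toy KOS hierarchy with hypothetical per-concept usage counts. It first aggregates the counts up the hierarchy. It then assigns rectangles with the classic slice-and-dice treemap algorithm, a well-known layout swapped in here as an assumption; the actual ICE-Map layout and statistics may differ. `Concept`, `subtree_count`, `slice_and_dice`, and all labels and counts are hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Concept:
    """A node in a toy KOS hierarchy (hypothetical structure)."""
    label: str
    own_count: int = 0  # documents indexed directly with this concept
    children: list[Concept] = field(default_factory=list)

def subtree_count(c: Concept) -> int:
    """Aggregated usage: the concept's own count plus that of all descendants."""
    return c.own_count + sum(subtree_count(ch) for ch in c.children)

def slice_and_dice(c: Concept, x: float, y: float, w: float, h: float,
                   depth: int = 0, out: list | None = None) -> list:
    """Assign a rectangle (label, x, y, w, h) to every concept.

    Each child's share of its parent's rectangle is proportional to its
    aggregated count. The split direction alternates with depth. The parent's
    own_count fills whatever part of its rectangle the children leave over.
    """
    if out is None:
        out = []
    out.append((c.label, x, y, w, h))
    total = subtree_count(c)
    if total == 0:
        return out
    offset = 0.0
    for ch in c.children:
        share = subtree_count(ch) / total
        if depth % 2 == 0:  # even levels: split along the x axis
            slice_and_dice(ch, x + offset * w, y, share * w, h, depth + 1, out)
        else:               # odd levels: split along the y axis
            slice_and_dice(ch, x, y + offset * h, w, share * h, depth + 1, out)
        offset += share
    return out

# Hypothetical example: 10 indexed documents spread over four concepts.
example = Concept("Economics", 0, [Concept("Labour", 3, [Concept("Wages", 5)]),
                                   Concept("Trade", 2)])
for rect in slice_and_dice(example, 0, 0, 100, 100):
    print(rect)
```

Slice-and-dice is chosen only because it is short and easy to verify. Squarified layouts usually give cells with better aspect ratios.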
An important building block for practical applications is a simple linguistic indexer, presented as a minor contribution. It is knowledge-poor and requires no training. This thesis applies computer science approaches in the domain of information science. The introduction describes the foundations in information science. The conclusion focuses on the relevance for practical applications, especially the handling of KOSs whose quality varies as a result of automatic and semi-automatic maintenance.
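The abstract characterizes the indexer only as knowledge-poor and training-free. The following is a minimal sketch of what such an indexer could look like, under stated assumptions. It normalizes tokens crudely (lowercasing and stripping a trailing plural "s"). It then matches KOS labels against the token stream greedily, longest match first. The label dictionary, the concept IDs, and the function names are hypothetical, and the thesis's actual linguistic processing is likely richer.

```python
import re
from collections import Counter

def normalize(token: str) -> str:
    """Crude, knowledge-poor normalization: lowercase and drop a plural 's'."""
    t = token.lower()
    if len(t) > 3 and t.endswith("s") and not t.endswith("ss"):
        t = t[:-1]
    return t

def tokenize(text: str) -> list[str]:
    """Split on non-word characters and normalize each token."""
    return [normalize(t) for t in re.findall(r"\w+", text)]

def build_lexicon(labels: dict[str, str]) -> dict[tuple[str, ...], str]:
    """Map each normalized KOS label (as a token tuple) to its concept ID."""
    return {tuple(tokenize(label)): cid for label, cid in labels.items()}

def index_document(text: str, lexicon: dict[tuple[str, ...], str]) -> Counter:
    """Count concept occurrences using greedy longest-match label lookup."""
    tokens = tokenize(text)
    max_len = max(len(k) for k in lexicon)
    hits: Counter = Counter()
    i = 0
    while i < len(tokens):
        # Try the longest possible label first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            cid = lexicon.get(tuple(tokens[i:i + n]))
            if cid is not None:
                hits[cid] += 1
                i += n
                break
        else:
            i += 1  # no label starts at this token
    return hits

# Hypothetical labels and concept IDs.
lexicon = build_lexicon({"minimum wage": "C1", "wages": "C2", "trade": "C3"})
print(dict(index_document("Minimum wages and trade policy: wages rose.", lexicon)))
# {'C1': 1, 'C3': 1, 'C2': 1}
```

Longest-match lookup keeps "minimum wages" from also being counted as the broader label "wages". No training data is needed, only the KOS labels themselves.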
For data practitioners embracing the world of RDF and Linked Data, the openness and flexibility are a mixed blessing. For them, data validation according to predefined constraints is a much sought-after feature, particularly as it is taken for granted in the XML world. Based on our work in the DCMI RDF Application Profiles Task Group and in cooperation with the W3C Data Shapes Working Group, we have so far published 81 types of constraints that various stakeholders require for data applications. These constraint types form the basis for investigating three questions: the role that reasoning and different semantics play in practical data validation, why reasoning is beneficial for RDF validation, and how the major shortcomings of RDF validation can be overcome by performing reasoning prior to validation. For each constraint type, we examine (1) whether reasoning may improve data quality, (2) how efficiently, in terms of runtime, validation is performed with and without reasoning, and (3) whether validation results depend on the underlying semantics, which differ between reasoning and validation. Using these findings, we determine which constraint types the most common constraint languages can express and give directions for the further development of constraint languages.
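To make point (1) concrete, here is a toy sketch; it is not the authors' framework. Triples are plain Python tuples, and "reasoning" is limited to the RDFS subclass rules (rdfs9 and rdfs11) applied to a fixed point. The single constraint is a minimum-cardinality check loosely modeled on SHACL's `sh:minCount`: every `ex:Person` needs an `ex:name`. Running the same check before and after inference shows how reasoning can surface a violation that validation on the asserted triples alone misses. All `ex:` IRIs and function names are illustrative.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
Triple = tuple[str, str, str]

def rdfs_subclass_closure(triples: set[Triple]) -> set[Triple]:
    """Apply rdfs11 (subClassOf transitivity) and rdfs9 (type inheritance)
    repeatedly until no new triples are produced."""
    g = set(triples)
    while True:
        sub = {(s, o) for s, p, o in g if p == SUBCLASS}
        new = {(a, SUBCLASS, c) for a, b in sub for b2, c in sub if b == b2}
        new |= {(x, RDF_TYPE, sup) for x, p, c in g if p == RDF_TYPE
                for c2, sup in sub if c == c2}
        if new <= g:
            return g
        g |= new

def min_count_violations(triples: set[Triple], cls: str, prop: str,
                         min_count: int = 1) -> list[str]:
    """Focus nodes typed `cls` that have fewer than `min_count` values for `prop`."""
    focus = {s for s, p, o in triples if p == RDF_TYPE and o == cls}
    return sorted(
        n for n in focus
        if sum(1 for s, p, _ in triples if s == n and p == prop) < min_count
    )

data: set[Triple] = {
    ("ex:Student", SUBCLASS, "ex:Person"),
    ("ex:alice", RDF_TYPE, "ex:Student"),  # no ex:name asserted
    ("ex:bob", RDF_TYPE, "ex:Person"),
    ("ex:bob", "ex:name", '"Bob"'),
}
print(min_count_violations(data, "ex:Person", "ex:name"))                         # []
print(min_count_violations(rdfs_subclass_closure(data), "ex:Person", "ex:name"))  # ['ex:alice']
```

Materializing the closure before validating is a simple strategy but adds runtime, which is the trade-off point (2) asks about.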
The ICE-Map Visualization was developed to graphically analyze the distribution of indexing results within a given Knowledge Organization System (KOS) hierarchy. It allows the user to explore the document sets and the KOSs at the same time. In this paper, we demonstrate the use of the ICE-Map Visualization in combination with a simple automatic indexer to visualize the semantic overlap between a KOS and a set of documents.
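The paper's precise measure of "semantic overlap" is not given here. One plausible proxy, offered as an assumption rather than the paper's definition, is the share of concepts in each subtree that the indexer assigned to at least one document. The sketch computes that coverage for a toy hierarchy stored as a parent-to-children dictionary. The concept IDs and document counts are hypothetical. A ratio like this could, for instance, drive the color of each treemap cell.

```python
def subtree(concept: str, children: dict[str, list[str]]) -> list[str]:
    """All concepts in the subtree rooted at `concept`, including itself."""
    nodes, stack = [], [concept]
    while stack:
        c = stack.pop()
        nodes.append(c)
        stack.extend(children.get(c, []))
    return nodes

def coverage(concept: str, children: dict[str, list[str]],
             doc_counts: dict[str, int]) -> float:
    """Fraction of concepts in the subtree used by at least one document."""
    nodes = subtree(concept, children)
    used = sum(1 for c in nodes if doc_counts.get(c, 0) > 0)
    return used / len(nodes)

# Hypothetical hierarchy and indexer output (documents per concept).
children = {"root": ["a", "b"], "a": ["a1", "a2"]}
doc_counts = {"a": 4, "a1": 2, "b": 0}
print(coverage("root", children, doc_counts))  # 0.4
```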
The Semantic Web, or Linked Data, has the potential to revolutionize the availability of data and knowledge and the access to them. Knowledge organization systems such as thesauri, which index and structure data by subject, can make a major contribution to this. Unfortunately, many of these systems are still available only in book form or in specialized applications. So how can they be used for the Semantic Web? The Simple Knowledge Organization System (SKOS) offers a way to "translate" knowledge organization systems into a form that can be cited on the Web and linked to other resources.
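The sketch below illustrates the kind of "translation" described above. It converts a small, hypothetical thesaurus entry into SKOS serialized as Turtle. The entry has a preferred term, non-preferred terms, and broader and related terms. Mapping USE FOR terms to `skos:altLabel`, BT to `skos:broader`, and RT to `skos:related` is a common convention for thesauri. The SKOS namespace is the W3C one. The base IRI `http://example.org/thesaurus/`, the `Entry` class, and the sample data are placeholders.

```python
from dataclasses import dataclass, field

SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."
BASE = "http://example.org/thesaurus/"  # placeholder namespace for concept IRIs

@dataclass
class Entry:
    """A thesaurus entry (hypothetical structure)."""
    concept_id: str
    pref: str                                         # preferred term
    alt: list[str] = field(default_factory=list)      # USE FOR terms
    broader: list[str] = field(default_factory=list)  # BT (concept IDs)
    related: list[str] = field(default_factory=list)  # RT (concept IDs)
    lang: str = "en"

def literal(text: str, lang: str) -> str:
    """Language-tagged Turtle literal; escapes only backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'

def to_turtle(entries: list[Entry]) -> str:
    """Serialize entries as skos:Concept resources in Turtle."""
    lines = [SKOS_PREFIX, ""]
    for e in entries:
        props = ["a skos:Concept", f"skos:prefLabel {literal(e.pref, e.lang)}"]
        props += [f"skos:altLabel {literal(a, e.lang)}" for a in e.alt]
        props += [f"skos:broader <{BASE}{b}>" for b in e.broader]
        props += [f"skos:related <{BASE}{r}>" for r in e.related]
        lines.append(f"<{BASE}{e.concept_id}> " + " ;\n    ".join(props) + " .")
    return "\n".join(lines)

print(to_turtle([Entry("c42", "Wages", alt=["Salaries"], broader=["c7"])]))
```

Once a concept has its own IRI, it can be cited on the Web and linked to resources in other vocabularies.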
The number of scientific publications is increasing rapidly, causing information overload for researchers and making it hard for scholars to keep up with current trends and lines of work. Recent work has tried to address this problem by developing methods for automated summarization in the scholarly domain, but has so far concentrated only on monolingual settings, primarily English. In this paper, we explore how state-of-the-art neural abstractive summarization models based on a multilingual encoder-decoder architecture can be used to produce cross-lingual extreme summaries of scholarly texts. To this end, we compile a new abstractive cross-lingual summarization dataset for the scholarly domain in four languages. It enables us to train and evaluate models that process English papers and generate summaries in German, Italian, Chinese, and Japanese. We present our new X-SCITLDR dataset for multilingual summarization and thoroughly benchmark different ...
Journal of the Association for Information Science and Technology
Many disciplines, including the broad Field of Information (iField), offer Data Science (DS) programs. There have been significant efforts to explore individual disciplines' identities and unique contributions to the broader DS education landscape. To advance DS education in the iField, the iSchool Data Science Curriculum Committee (iDSCC) was formed and charged with building and recommending a DS education framework for iSchools. This paper reports on the research process and findings of a series of studies addressing several important questions. What is the iField identity in the multidisciplinary DS education landscape? What is the status of DS education in iField schools? What knowledge and skills should be included in the core curriculum for iField DS education? What jobs are available to DS graduates from the iField? What are the differences between graduate-level and undergraduate-level DS education? Answers to these questions will not only distinguish an iField approach to ...
Summary:
- RDF is a simple, graph-based data model for metadata on the web
- RDF has an XML syntax for:
  - Exchanging RDF models
  - Embedding RDF models into web pages
- Advantages over XML:
  - The data model is agnostic to syntactic variations
  - Information from different models and locations can easily be linked
  - Some important operations are trivial (e.g., merging two models)
- RDF Schema defines special resources and predicates for defining vocabularies
  - Vocabulary: Class, subClassOf, domain, range
- Implicit information can be derived using simple derivation rules
- There is no clear separation between model and schema; schema elements can be part of an RDF model
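Three points in the summary above can be illustrated with triples as Python tuples: merging is trivial, implicit information follows from simple derivation rules, and schema and data live in one model. In this toy, merging is plain set union. This ignores blank nodes, which a real RDF merge must first rename apart. The RDFS domain and range rules (rdfs2 and rdfs3 in the RDF 1.1 Semantics) then add `rdf:type` triples. All `ex:` names are illustrative, and the function names are hypothetical.

```python
Triple = tuple[str, str, str]

def merge(*models: set[Triple]) -> set[Triple]:
    """Merge RDF models without blank nodes: the union of their triple sets."""
    merged: set[Triple] = set()
    for m in models:
        merged |= m
    return merged

def apply_domain_range(g: set[Triple]) -> set[Triple]:
    """Add rdf:type triples entailed by rdfs:domain (rdfs2) and rdfs:range (rdfs3).

    One pass suffices here because only rdf:type triples are added and no
    domain or range is declared for rdf:type itself. This toy also types
    literal objects, so use it only with IRI-valued properties.
    """
    domains = {(p, c) for p, rel, c in g if rel == "rdfs:domain"}
    ranges = {(p, c) for p, rel, c in g if rel == "rdfs:range"}
    inferred: set[Triple] = set()
    for s, p, o in g:
        inferred |= {(s, "rdf:type", c) for p2, c in domains if p2 == p}
        inferred |= {(o, "rdf:type", c) for p2, c in ranges if p2 == p}
    return g | inferred

# Schema and data are ordinary triples in the same model.
schema = {("ex:worksFor", "rdfs:domain", "ex:Person"),
          ("ex:worksFor", "rdfs:range", "ex:Organization")}
data = {("ex:alice", "ex:worksFor", "ex:acme")}
g = apply_domain_range(merge(schema, data))
print(("ex:alice", "rdf:type", "ex:Person") in g)  # True
```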
Despite many efforts to make data about scholarly publications available on the Web of Data, information about academic conferences is still available (at best) only as free text. Availability of this data in a structured format would enable more efficient decision making for researchers, libraries, publishers, and funding and evaluation bodies. This demo paper showcases the Springer Linked Open Data (LOD) conference portal (available at http://lod.springer.com). We cover the architecture, vocabularies, and features of the portal and present usage scenarios.