Anita Waard
  • Jericho, Vermont, United States

We propose a deep learning model for identifying structure within experiment narratives in scientific literature. We take a sequence labeling approach to this problem, labeling clauses within experiment narratives to identify the different parts of the experiment. Our dataset consists of paragraphs from open-access PubMed papers, labeled with rhetorical information in our pilot annotation. Our model is a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) cells that labels clauses. The clause representations are computed by combining word representations using a novel attention mechanism that involves a separate RNN. We compare this model against LSTMs whose input layer uses simple attention or no attention, and against a feature-rich CRF model. Furthermore, we describe how our work could be useful for information extraction from scientific literature.
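The abstract describes building a clause representation by weighting its word representations with attention. As a hedged illustration only, the sketch below shows the simplest form of attention pooling: each word vector is scored against a single context vector, the scores are softmax-normalised, and the clause vector is the weighted sum. This corresponds roughly to a "simple attention" baseline, not to the paper's RNN-driven mechanism; the function names, the fixed context vector, and the toy vectors are all hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(words: List[Vector], context: Vector) -> Vector:
    """Combine a clause's word vectors into a single clause vector.

    Each word is scored by its dot product with a context vector
    (hypothetical here; a trained model would learn it), the scores are
    normalised with softmax, and the result is the weighted sum of words.
    """
    scores = [sum(w_i * c_i for w_i, c_i in zip(w, context)) for w in words]
    weights = softmax(scores)
    dim = len(words[0])
    return [sum(a * w[d] for a, w in zip(weights, words)) for d in range(dim)]
```

With a zero context vector every word gets the same weight, so `attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])` returns `[0.5, 0.5]`; a context vector that aligns with one word shifts almost all of the weight onto that word.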
With the advancement of technology and the wide adoption of ontologies as knowledge representation formats, a handful of models have been proposed in the last decade for externalizing the rhetoric and argumentation captured within scientific publications. Conceptually, most of these models represent the scientific publication in a similar way, i.e. as a series of interconnected elementary knowledge items. They differ mainly in the terminology used, the types of rhetorical and/or argumentation relations connecting the knowledge items, and the foundational theories supporting these relations. This paper analyzes the state of the art and provides a concise comparative overview of the five most prominent discourse representation models, with the goal of sketching a unified model for discourse representation.
We describe a novel approach to machine reading of the primary scientific literature. We treat a description of an experiment as a discourse, viewing a scientific corpus not merely as a collection of documents, but also as an extended conversation formed by the collective set of experiments, their introductions and interpretations. This paper introduces this approach as a methodology called ‘Cycles of Scientific Investigation in Discourse’ (CoSID). In CoSID, we capture the central conceptual structure of a paper as a series of nested reasoning loops, composed of passages in results sections, which describe individual research findings. We ground our work in a number of worked examples based on data from the MINTACT and Pathway Logic databases, and illustrate the idea in the context of machine-enabled biocuration. Keywords: interpretive framework for experiments, experiment description as discourse, computational language technology. I. THEORETICAL BACKGROUND All experiments consist of...
To enable better representations of biomedical argumentation over collections of research papers, we propose a model and a lightweight ontology to represent interpersonal, discourse-based, data-driven reasoning. We apply this model to a collection of scientific documents to show how it works in practice. We present three biomedical applications for this work, and suggest connections with other existing ontologies and reasoning tools. Specifically, this model offers a lightweight way to connect nanopublication-like formal representations to scientific papers written in natural language.
We have created a tool to identify and store experimental metadata during the execution of an electrophysiological experiment, and a semantic architecture to enable access, manipulation and integration of this data to support a collaborative research environment. We discuss possible extensions of this work to aid data sharing and semantic research frameworks.
The DOPE project (Drug Ontology Project for Elsevier) is driven by the need to access multiple information sources through a single interface. In this paper, we describe how DOPE allows thesaurus-driven access to heterogeneous and distributed data, based on the RDF data model. The architecture allows for the easy addition of thesauri and data sources, and can facilitate explorations in ontology mapping and data integration.
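To make "thesaurus-driven access" concrete, here is a minimal sketch under stated assumptions: a thesaurus stored as RDF-style triples with SKOS-like preferred and alternative labels is used to expand a user's query term into all of its synonyms, and the expanded set is then matched against triples from several separate sources. The concept URIs, the `ex:mentions` predicate, the sample data, and the `expand`/`search` functions are hypothetical and are not taken from DOPE; a real system would query an RDF store rather than Python lists.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical thesaurus triples: one concept with a preferred label and synonyms.
THESAURUS: List[Triple] = [
    ("ex:aspirin", "skos:prefLabel", "aspirin"),
    ("ex:aspirin", "skos:altLabel", "acetylsalicylic acid"),
    ("ex:aspirin", "skos:altLabel", "ASA"),
]

# Hypothetical index triples from two heterogeneous, separately held sources.
SOURCES: Dict[str, List[Triple]] = {
    "journals": [("doc:1", "ex:mentions", "acetylsalicylic acid")],
    "patents": [
        ("doc:7", "ex:mentions", "aspirin"),
        ("doc:9", "ex:mentions", "ibuprofen"),
    ],
}

LABEL_PREDICATES = {"skos:prefLabel", "skos:altLabel"}


def expand(term: str, thesaurus: List[Triple]) -> Set[str]:
    """Return every label of every concept that has `term` as one of its labels."""
    concepts = {
        s for s, p, o in thesaurus
        if p in LABEL_PREDICATES and o.lower() == term.lower()
    }
    labels = {o for s, p, o in thesaurus if p in LABEL_PREDICATES and s in concepts}
    # Fall back to the raw term when the thesaurus does not know it.
    return labels or {term}


def search(term: str) -> List[Tuple[str, str]]:
    """Find (source, document) pairs that mention any synonym of `term`."""
    labels = {label.lower() for label in expand(term, THESAURUS)}
    hits = []
    for source, triples in SOURCES.items():
        for s, p, o in triples:
            if p == "ex:mentions" and o.lower() in labels:
                hits.append((source, s))
    return sorted(hits)
```

Querying for "ASA" finds both the journal document that uses the chemical name and the patent that uses "aspirin"; adding a new thesaurus or source only means appending triples, which mirrors the extensibility the abstract describes.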
We present our research on establishing a new format for the scientific Research Article. In our paper, we analyze over 20 years of experiments and research conducted by our group, both at Elsevier and at various Dutch academic institutions. First, we provide a theoretical framework for thinking about scientific discourse, based on an analysis of the data and knowledge contained within an article, as well as its rhetorical and semantic roles. We then provide an overview of experiments conducted internally at Elsevier to investigate four directions in electronic publishing: first, the concept of modularity, defining content elements of the research article that can exist and be published independently; second, the concept of multi-modality, where we investigate the role of non-textual data as a formative and navigation principle for allowing access to various content sources; third, experiments in semantic access to data repositories, using a framework based on...
This report documents the program and the outcomes of Dagstuhl Perspectives Workshop 11331 "The Future of Research Communication". The purpose of the workshop was to bring together researchers from these different disciplines, whose core research goal is changing the formats, standards, and means by which we communicate science.
We review over 10 years of research at Elsevier and various Dutch academic institutions on establishing a new format for the scientific research article. Our work rests on two main theoretical principles: the concept of modular documents, consisting of content elements that can exist and be published independently and are linked by meaningful relations, and the use of semantic data standards allowing access to heterogeneous data. We discuss the application of these concepts in five different projects: a modular format for physics articles, an XML encyclopedia in pharmacology, a semantic data integration project, a modular format for computer science proceedings papers, and our current work on research articles in cell biology.
This paper describes a lightweight ontology for representing annotations of declarative, evidence-based clinical guidelines. We present the motivation and requirements for this representation, based on an analysis of several guidelines. The ontology provides the means to connect clinical questions and associated recommendations to underlying evidence, and can capture the strength of recommendations and the quality of evidence, respectively. The ontology was applied in the conversion of manual annotations to RDF and used as part of a prototype clinical decision support system.
To improve the classification of biological texts into rhetorical discourse segments, we need a taxonomy for biological verbs. After reviewing three existing classifications, we created a merged taxonomy that encompasses both biology-specific and scientific discourse-specific elements. We manually classify 239 unique verbs from two full-text biology papers according to this taxonomy, and investigate correspondences with segment type. This leads us to propose a simple model of scientific communication that might enable further...
We describe a large-scale community effort to build an open-access, interoperable, and computable repository of COVID-19 molecular mechanisms - the COVID-19 Disease Map. We discuss the tools, platforms, and guidelines necessary for the distributed development of its contents by a multi-faceted community of biocurators, domain experts, bioinformaticians, and computational biologists. We highlight the role of relevant databases and text mining approaches in enrichment and validation of the curated mechanisms. We describe the contents of the Map and their relevance to the molecular pathophysiology of COVID-19 and the analytical and computational modelling approaches that can be applied for mechanistic data interpretation and predictions. We conclude by demonstrating concrete applications of our work through several use cases and highlight new testable hypotheses.
Over the past five years, Elsevier has focused on implementing the FAIR principles and best practices in data management, from data preservation through reuse. In this paper we describe a series of efforts undertaken in this time to support proper data management practices. In particular, we discuss our journal data policies and their implementation, the current status and future goals of the research data management platform Mendeley Data, and clear, persistent links from published papers to the individual datasets stored in external data repositories, established through our partnership with Scholix. Early analysis of the implementation of our data policies confirms significant differences between subjects in data sharing practices, with the highest uptake in the Physical Sciences. Future directions at Elsevier include improving the discoverability of linked data within an article and incorporating research data usage metrics.
To support better ways of sharing scientific, medical and engineering knowledge, Elsevier has been an active participant in a wide range of research projects, multi-stakeholder communities, alliances and data standards organisations. As an example, we discuss how Elsevier has contributed to organisations that advocate and govern the use of unique identifiers throughout the scholarly lifecycle. The shared use of unique identifiers for finding and linking to papers, authors, and institutions is an area in which Elsevier has invested from the beginning. After the resounding success of the Digital Object Identifier project, which led to the foundation of CrossRef, a multi-stakeholder organisation that manages DOIs across the publishing landscape, Elsevier was one of the founders of ORCID, which assigns unique identifiers to scientists. Further cross-stakeholder initiatives for identifying objects to improve findability were pursued through Force11, a multi-stakeholder ...
The grammatical structures scholars use to express their assertions are intended to convey various degrees of certainty or speculation. Prior studies have suggested a variety of categorization systems for scholarly certainty. However, these have not been objectively tested for their validity, particularly with respect to representing the interpretation by the reader, rather than the intention of the author. In this study, we use a series of questionnaires to determine how researchers classify various scholarly assertions, using three distinct certainty classification systems. We find that there are three categories of certainty perceived by readers: one level of high certainty, and two levels of lower certainty that are somewhat less distinct, but nevertheless show a significant degree of inter-annotator agreement. We show that these categories can be detected in an automated manner, using a machine learning model, with a cross-validation accuracy of 89.2% relative to an author-anno...
The Enabling FAIR Data project has brought together a broad spectrum of Earth, space, and environmental science leaders to ensure that data are findable, accessible, interoperable, and reusable.
This paper presents a three-way perspective on the annotation of discourse in scientific literature. We use three different schemes, each of which focusses on different aspects of discourse in scientific articles, to annotate a corpus of three full-text papers, and compare the results. One scheme seeks to identify the core components of scientific investigations at the sentence level, a second annotates meta-knowledge pertaining to bio-events and a third considers how epistemic knowledge is conveyed at the clause level. We present our analysis of the comparison, and a discussion of the contributions of each scheme.
SePublica-2011 Semantic Publishing 2011 Proceedings of the 1st Workshop on Semantic Publishing 2011 co-located with the 8th Extended Semantic Web Conference, ESWC2011 Hersonissos, Crete, Greece, May 30, 2011.
To advance the pace of scientific discovery, we propose a conceptual format that forms the basis of a truly new way of publishing science. In our proposal, all scientific communication objects (including experimental workflows, direct results, email conversations, and all drafted and published information artifacts) are labeled and stored in a great, big, distributed data store (or many distributed data stores that are all connected). Each item has a set of metadata attached to it, which includes (at least) the person who created it and when, the type of object it is, and the status of the object, including intellectual property rights and ownership. Every researcher can (and must) deposit every knowledge item produced in the lab into this repository. This deposition includes an essential metadata component that states who has the rights to see, use, distribute, buy or sell the item. Into this grand (distributed, cloud-based) architecture, all items produced by a single lab, or several labs, are stored, labeled and connected.
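The proposal lists the minimum metadata every deposited item carries (creator, creation time, object type, status, ownership) and a rights component stating who may see, use, distribute, buy or sell it. A minimal sketch of such a record, assuming a single in-memory store standing in for the distributed one, might look like the following; the class names, the per-person grant dictionary, and the rule that the owner holds every right are illustrative assumptions, not part of the proposal.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Dict, List, Set


class Right(Enum):
    # The five rights named in the proposal.
    SEE = "see"
    USE = "use"
    DISTRIBUTE = "distribute"
    BUY = "buy"
    SELL = "sell"


@dataclass
class KnowledgeItem:
    item_id: str
    creator: str
    created: datetime
    object_type: str  # e.g. "workflow", "result", "email", "draft"
    status: str       # e.g. "draft", "published"
    owner: str
    # Hypothetical representation of the rights metadata: person -> granted rights.
    grants: Dict[str, Set[Right]] = field(default_factory=dict)

    def allows(self, person: str, right: Right) -> bool:
        # Assumption: the owner holds every right; anyone else needs an explicit grant.
        if person == self.owner:
            return True
        return right in self.grants.get(person, set())


class Store:
    """Hypothetical in-memory stand-in for the distributed data store."""

    def __init__(self) -> None:
        self._items: Dict[str, KnowledgeItem] = {}

    def deposit(self, item: KnowledgeItem) -> None:
        self._items[item.item_id] = item

    def visible_to(self, person: str) -> List[str]:
        # Only items the person has the SEE right on are listed.
        return sorted(
            i.item_id for i in self._items.values() if i.allows(person, Right.SEE)
        )
```

For example, a workflow owned by "alice" with `{"bob": {Right.SEE}}` as its grants is listed for both alice and bob, hidden from anyone else, and bob still cannot sell it. Keeping the rights inside each item's metadata means any node holding a copy can enforce them without consulting a central authority.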