Abstract The Open Provenance Specification is composed of the following documents: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]. In this deliverable, we have concatenated 11 distinct documents for the convenience of the reader. Each of these has its own numbering (in the footer) and bibliography.
An artist applies a bit of titanium white to his canvas, takes a soft brush and ever so slightly smears it to form the shape of a pearl. Two hundred years later, the painting is being studied by a student who would love to know how the pearl was made, what paint was used, where it was painted, who painted it, when exactly was it painted, and why the painter did it that way?
Abstract Provenance is increasingly acknowledged as an important requirement in a wide range of information processing endeavours, with a variety of conceptions associated with it that tend to be domain specific. In the area of distributed computing, an initial approach conceptualizes provenance of a data item as the documentation of the process that led to that data item. We adopt this definition and clarify it further through the development of a sample scenario.
Abstract This document represents the start of the EU Provenance project's standardisation effort. It presents a specification of the data model for process documentation. Our approach is top down in nature. We start by describing the p-structure—the logical organisation of process documentation, before drilling down into the models of the different forms of p-assertions. We then show how identification of p-assertions and data items can be achieved, before showing how we model context.
In this course you are introduced to a number of important movements and thinkers from the tradition of social and cultural philosophy, and you learn to read primary philosophical texts in the field of social and cultural philosophy. To that end, you must independently study a survey work that sketches a picture of (post-)modern society and the paradoxes it contains.
Abstract Task management is a core part of knowledge work. However, intelligent assistance for task management is hampered by the lack of large amounts of structured knowledge about user tasks. In this paper, we present a novel approach, Social Task Networks, for obtaining rich user-contributed task information by integrating task management with social networking sites.
Abstract: Scientists today collect, analyze, and generate terabytes and petabytes of data. These data are often shared and further processed and analyzed among collaborators. In order to facilitate sharing and data interpretation, data need to carry with them metadata about how they were collected or generated, and provenance information about how they were processed. This chapter describes metadata and provenance in the context of the data lifecycle.
Abstract: The scientific system that we use today was devised centuries ago and is inadequate for our current ICT-based society: the peer review system encourages conservatism, journal publications are monolithic and slow, data is often not available to other scientists, and the independent validation of results is limited.
Abstract. Within complex scientific domains such as pharmacology, operational equivalence between two concepts is often context-, user- and task-specific. Existing Linked Data integration procedures and equivalence services do not take the context and task of the user into account. We present a vision for enabling users to control the notion of operational equivalence by applying scientific lenses over Linked Data. The scientific lenses vary the links that are activated between the datasets, which affects the data returned to the user.
Abstract Workflow systems can manage complex scientific applications with distributed data processing. Although some workflow systems can represent collections of data with very compact abstractions and manage their execution efficiently, there are no approaches to date to manage collections of application components required to express some scientific applications. We present an approach to handle collections of components and data alike in expressive workflow templates whose basic structure is reusable.
Abstract: We have used the Montage image mosaic engine to investigate the cost and performance of processing images on the Amazon EC2 cloud, and to inform the requirements that higher-level products impose on provenance management technologies. We will present a detailed comparison of the performance of Montage on the cloud and on the Abe high performance cluster at the National Center for Supercomputing Applications (NCSA).
In the first phase of the EU provenance project, it is necessary to clearly define the concept of provenance within the context of the project, as this will provide the driving impetus for the subsequent design and implementation of a provenance architecture. Towards this end, we have chosen to design and develop a simple system, which we term the pre-prototype, that will effectively articulate the project's conception of provenance.
The Social Semantic Web has begun to provide connections between users within social networks and the content they produce across the whole of the Social Web. Thus, the Social Semantic Web provides a basis to analyze the communication behavior of users together with the content of their communication. However, there is little research combining the tools to study communication behaviour and communication content, namely, social network analysis and content analysis.
Abstract. Annotating datasets with metadata is an important part of organizing and curating data. However, it is a time-consuming process and often not done in a rigorous fashion. In this paper, we propose a new approach to annotating datasets through the use of reconstructed provenance. A detailed survey of the related work in this area is given. Additionally, we provide an overview of our approach for both reconstructing provenance and using that provenance to automatically annotate datasets with metadata.
ABSTRACT Scientific discourse occurs both in the academic literature and, increasingly, on the Web. What is discussed in the literature influences what is discussed on the Web, and the reverse. However, the study of this discourse has largely been isolated based on medium, using either bibliometrics for academic literature or webometrics for Web-based communication.
Abstract. In the next 10 years, we will see a Semantic Web that is infused with a richer set of verbs: the ability not just to represent knowledge about static datasets but the ability to use knowledge to perform actions or operations. We argue that there are three trends that make this outcome likely, namely, demand from current web applications (e.g., Facebook's Like), the ubiquity of JavaScript and the increasing instrumentation of the real world.
Linked Data is at its core about the setting of links between resources. Links provide enriched semantics, pointers to extra information and enable the merging of data sets. However, as the amount of Linked Data has grown, there has been the need to automate the creation of links and such automated approaches can create low-quality links or unsuitable network structures. In particular, it is difficult to know whether the links introduced improve or diminish the quality of Linked Data.
Abstract Scientists increasingly use workflows to represent and share their computational experiments. Because of their declarative nature, focus on pre-existing component composition and the availability of visual editors, workflows are often seen as more “natural” than programming or scripting languages for representing data analysis procedures. However, there is still a significant gap between the naturalness of workflow representations and natural language.
Abstract This paper describes how computer-human interaction in ambient computing environments can be best informed by conceptualizing such environments as problem solving systems. Typically, such systems comprise multiple human and technological agents that meet the demands imposed by problem constraints through dynamic collaboration.
Abstract. Network science is an emerging research area focused on developing general network-based approaches for studying phenomena across a range of fields from social science to biology. Techniques from network science include network analysis, network modeling and visualization. A key difficulty facing network science is data acquisition. Network data must often be mined and converted from non-network sources, which is often a laborious and error-prone process.
Abstract In this paper we propose a new provenance model which is tailored to a class of workflow-based applications. We motivate the approach with use cases from the astronomy community. We generalize the class of applications the approach is relevant to and propose a pipeline-centric provenance model. Finally, we evaluate the benefits in terms of storage needed by the approach when applied to an astronomy application.
This document provides an overview of a model of provenance along with a description of a family of detailed specification documents that support the model. Important aspects of the model are specified within these documents in a detailed and clear manner that provides an unambiguous reference for developers.
This article is concerned with enhancing agent coordination in modern sociotechnological systems. To this end, sociotechnological systems are conceptualized as problem solving systems that comprise human and technological agents engaged in dynamic collaboration. Following this, there is a discussion of the challenge of achieving agent coordination in problem solving systems as technological agents become increasingly autonomous.
Scientists increasingly rely on large-scale, open distributed systems such as Grids in order to investigate a wide variety of research questions. In such systems, it is difficult to know exactly how a result is generated; however, such information is necessary for the scientific process. Therefore, it is vital that these systems have an automated mechanism for documenting process from which a result's provenance can be retrieved. The provenance of a result is the process that led to that result.
Abstract Increasingly, web-based applications are created through the composition of multiple functional components provided by different institutions. These so-called “mash-ups” are an effective means to rapidly develop new applications. However, when these mash-ups are embedded within social networking sites that aggregate and expose personal data, such as Facebook, MySpace, or LinkedIn, serious privacy issues arise because personal data can be transmitted outside the application's hosting institution.
Abstract. In this paper, we begin to address the question of which scientists are online. Prior studies have shown that Web users are only a segmented reflection of the actual offline population, and thus when studying online behaviors we need to be explicit about the representativeness of the sample under study to accurately relate trends to populations. When studying social phenomena on the Web, the identification of individuals is essential to be able to generalize about specific segments of a population offline.
Abstract. Several approaches integrating life science data using Semantic Web technologies have been described in the literature. However, these approaches have largely ignored the vast amount of content only available within the scientific literature. In this article, we present an RDF schema for text mining results that enables queries in SPARQL over textual and database data together. We show how real pharmacological queries can be answered over 4 billion text mined triples.
In recent years we have seen an explosion of massive amounts of graph-shaped data coming from a variety of applications that are related to social networks (Facebook, Twitter, blogs and other on-line media) and telecommunication networks. Furthermore, the W3C Linking Open Data Initiative [8] has boosted the publication and interlinkage of a large number of datasets on the Semantic Web [2], resulting in the Linked Open Data Cloud.
ABSTRACT We present the Open PHACTS linked data platform that is being developed to address a set of example drug discovery research questions and which supports several drug discovery applications. The platform retrieves data from many complementary, but overlapping, data sources to present an integrated view of the data. The platform exploits two entity resolution services, one for transforming text to a concept and one for transforming chemical structures to a concept.
Abstract The Web of Data is increasingly becoming an important infrastructure for such diverse sectors as entertainment, government, e-commerce and science. The robustness of this Web of Data is now crucial. Prior studies show that this Web is strongly dependent on a small number of central hubs, making it highly vulnerable to single points of failure. In this paper, we present concepts and algorithms to analyse and repair the brittleness of the Web of Data.
Abstract. Linked Data is at its core about the setting of links between resources. Links provide enriched semantics, pointers to extra information and enable the merging of data sets. However, as the amount of Linked Data has grown, there has been the need to automate the creation of links and such automated approaches can create low-quality links or unsuitable network structures. In particular, it is difficult to obtain an overall picture as to whether the links introduced improve or diminish the quality of Linked Data.
Abstract Although to-do lists are a ubiquitous form of personal task management, there has been no work on intelligent assistance to automate, elaborate, or coordinate a user's to-dos. Our research focuses on three aspects of intelligent assistance for to-dos. We investigated the use of intelligent agents to automate to-dos in an office setting. We collected a large corpus from users, and developed a paraphrase-based approach to matching agent capabilities with to-dos.
Abstract This paper describes how human-technology interaction in modern ambient technology environments can be best informed by conceptualizing such environments as problem solving systems. Typically, such systems comprise multiple human and technological agents that meet the demands imposed by problem constraints through dynamic collaboration.
Abstract Personal tasks are managed with a variety of mechanisms from To-Do lists to calendars to emails. Task management remains challenging, since many tasks are interrelated, some may depend on other people's tasks to be accomplished, and their priority and status change over time. While many tasks could be automated by services and agents available on the Web, To-Do lists often capture tasks in textual form that combine non-automatable aspects.
The purpose of this document is to present the architecture of the Client Side Library (CSL) of a provenance system, its rationale, its implementation and a methodology guiding its use. According to Kruchten [1], “this is a development architecture, which focuses on the actual software module organisation.”
An artist applies a bit of titanium white to his canvas, takes a soft brush and ever so slightly smears it to form the shape of a pearl. Two hundred years later, the painting is being studied by a student who would love to know how the pearl was made, what paint was used, where it was painted, who painted it, when exactly was it painted, and why the painter did it that way?
Abstract. Electronic contracts are a means of representing agreed responsibilities and expected behaviour of autonomous agents acting on behalf of businesses. They can be used to regulate behaviour by providing negative consequences (penalties) where the responsibilities and expectations are not met, that is, where the contract is violated. However, long-term business relationships require some flexibility in the face of circumstances that do not conform to the assumptions of the contract (mitigating circumstances).
Abstract. Understanding the provenance of clinical guidelines is important for both practitioners and researchers as it allows for deeper understanding of the provided recommendations and could potentially provide a basis for updating guidelines. Often such provenance is incomplete or unavailable. We describe a prototype of a multi-signal pipeline for reconstructing provenance and show preliminary results of reconstructing dependencies between documents in the context of clinical guidelines and associated documents.
The discovery of new medicines requires pharmacologists to interact with a number of information sources ranging from tabular data to scientific papers, and other specialized formats. In this application report, we describe a linked data platform for integrating multiple pharmacology datasets that form the basis for several drug discovery applications.
Abstract With billions of assertions and counting, the Web of Data represents the largest multi-contributor interlinked knowledge base that ever existed. We present a novel framework for analyzing and using the Web of Data based on extracting and analyzing thematic subsets of it. We view the Web of Data as a “network of networks” from which to extract meaningful subsets that can be converted into self-contained networks to be further analyzed and reused.
Abstract Artificial intelligence has a long history of learning from domain problems ranging from chess to Jeopardy. In this work, we look at a problem stemming from social science, namely, how social relationships influence communication content and vice versa. The tools used to study communication content (content analysis) have rarely been combined with those used to study social relationships (social network analysis). Furthermore, there is even less work addressing the longitudinal characteristics of such a combination.
Abstract Humans often teach procedures through tutorial instruction to other humans. For computers, learning from natural human instruction remains a challenge as it is plagued with incompleteness and ambiguity. Instructions are often given out of order and are not always consistent. Moreover, humans assume that the learner has a wealth of knowledge and skills, which computers do not always have.
Abstract. Scientists and, more generally, end users of computer systems need to be able to trust the data they use. Understanding the origin or provenance of data can provide this trust. Attempts have been made to develop systems for recording provenance; however, most are not generic and cannot be applied in a general manner across different systems and different technologies. Moreover, many existing systems confuse the concept of provenance with its representation.
The third provenance challenge was organized to evaluate the efficacy of the Open Provenance Model (OPM) in representing and sharing provenance with the goal of improving the specification. A data loading scientific workflow that ingests data files into a relational database for the Pan-STARRS sky survey project was selected as a candidate for collecting provenance.
Process documentation can often be distributed across different provenance stores. To answer provenance queries, a mechanism is required that links disparate but related process documentation so that it can be discovered and collected effectively. This document presents a WS-Addressing profile on distributed process documentation that provides mechanisms to solve this problem.
Abstract. eRDF is an infrastructure for exploring the Web of Data through evolutionary querying. The main idea is to employ the well-known strength of evolutionary strategies to find good, though possibly approximate, answers quickly. This allows us to discover relevant answers to a user's information need in an anytime way.
• Provenance of a resource is a record that describes entities and processes involved in producing and delivering or otherwise influencing that resource.
• Provenance provides a critical foundation for assessing authenticity, enabling trust, and allowing reproducibility.
• Provenance assertions are a form of contextual metadata and can themselves become important records with their own provenance.
This document provides information to the community regarding the specification of a data model for process documentation used to describe a SOAP binding of the process documentation model and has the status of a working draft. It does not define any standards or technical recommendations. Distribution is unlimited. ... This document describes a SOAP binding for the process documentation p-header. It presents a specification of the p-header and can be considered an extension of the process documentation data model presented in [MGJ+06].
Abstract From where did this tweet originate? Was this quote from the New York Times modified? Daily, we rely on data from the Web, but often it is difficult or impossible to determine where it came from or how it was produced. This lack of provenance is particularly evident when people and systems deal with Web information or with any environment where information comes from sources of varying quality. Provenance is not captured pervasively in information systems.
