Discovery of efficient anti-cancer drug combinations is a major challenge, since experimental testing of all possible combinations is clearly impossible. Recent efforts to computationally predict drug combination responses retain this experimental search space, as model definitions typically rely on extensive drug perturbation data. We developed a dynamical model representing a cell fate decision network in the AGS gastric cancer cell line, relying on background knowledge extracted from literature and databases. We defined a set of logical equations recapitulating AGS data observed in cells in their baseline proliferative state. Using the modeling software GINsim, model reduction and simulation compression techniques were applied to cope with the vast state space of large logical models and enable simulations of pairwise applications of specific signaling inhibitory chemical substances. Our simulations predicted synergistic growth inhibitory action of five combinations from a to...
Background: The biosciences increasingly face the challenge of integrating a wide variety of available data, information and knowledge in order to gain an understanding of biological systems. Data integration is supported by a diverse series of tools, but the lack of a consistent terminology to label these data still presents significant hurdles. As a consequence, much of the available biological data remains disconnected or worse: becomes misconnected. The need to address this terminology problem has spawned the building of a large number of bio-ontologies. OBOF, RDF and OWL are among the most used ontology formats to capture terms and relationships in the Life Sciences, opening the potential to use the Semantic Web to support data integration and further exploitation of integrated resources via automated retrieval and reasoning procedures. Methods: We extended the Perl suite ONTO-PERL and functionally integrated it into the Galaxy platform. The resulting ONTO-ToolKit supports the ...
Scientific progress is increasingly dependent on knowledge in computation-ready forms. In the life sciences, among others, many scientists carefully extract and structure knowledge from the scientific literature. In a process called manual curation, they enter knowledge into spreadsheets, or into databases where it serves their and many others' research. Valuable as these curation efforts are, the range and detail of what can practically be captured and shared remains limited, because of the constraints of current curation tools. Many important contextual aspects of observations described in literature simply do not fit in the form defined by these tools, and thus cannot be captured. Here we present the design of an easy-to-use, general-purpose method and interface that enables the precise semantic capture of virtually unlimited types of information and details, using only a minimal set of building blocks. Scientists from any discipline can use this to convert any complex knowl...
The Semantic Web standards OWL and RDF are often used to represent biomedical information as Linked Data; however, the OWL/RDF syntax, which combines both, was never optimised for querying. By combining two formal paradigms for modelling Linked Data, namely multi-digraphs and Description Logic, many precise terms for relations have emerged that are defined in the Metarel relation ontology. They are especially useful in Linked Data and RDF knowledge bases that (1) rely on SPARQL querying and (2) require semantic support for chains of relations. Metarel-described multi-digraphs were used for knowledge integration and reasoning in three RDF knowledge bases in the domain of genome biology: BioGateway, Cell Cycle Ontology and Gene Expression Knowledge Base. These knowledge bases integrate both data, like KEGG, and ontologies, like Gene Ontology, in the same RDF graphs. Their libraries with biomedically relevant SPARQL queries show the practical benefits of this semantic paradigm. In addition ...
Genome-scale 'omics' data constitute a potentially rich source of information about biological systems and their function. There is a plethora of tools and methods available to mine omics data. However, the diversity and complexity of different omics data types is a stumbling block for multi-data integration, hence there is a dire need for additional methods to exploit potential synergy from integrated orthogonal data. Rough Sets provide an efficient means to use complex information in classification approaches. Here, we set out to explore the possibilities of Rough Sets to incorporate diverse information sources in a functional classification of unknown genes. We explored the use of Rough Sets for a novel data integration strategy where gene expression data, protein features and Gene Ontology (GO) annotations were combined to describe general and biologically relevant patterns represented by If-Then rules. The descriptive rules were used to predict the function of unknown genes in Arabidopsis thaliana and Schizosaccharomyces pombe. The If-Then rule models showed success rates of up to 0.89 (discriminative and predictive power for both modeled organisms), whereas models built solely from one data type (protein features or gene expression data) yielded success rates varying from 0.68 to 0.78. Our models were applied to generate classifications for many unknown genes, of which a sizeable number were confirmed either by PubMed literature reports or by electronically inferred annotations. Finally, we studied cell cycle protein-protein interactions derived from both tandem affinity purification experiments and in silico experiments in the BioGRID interactome database and found strong experimental evidence for the predictions generated by our models. The results show that our approach can be used to build very robust models that create synergy from integrating gene expression data and protein features.
The Rough Set-based method is implemented in the Rosetta toolkit kernel version 1.0.1 available at: http://rosetta.lcb.uu.se/
Summary: The BioGateway App is a Cytoscape (version 3) plugin designed to provide easy query access to the BioGateway Resource Description Framework triple store, which contains functional and interaction information for proteins from several curated resources. For explorative network building, we have added a comprehensive dataset with regulatory relationships of mammalian DNA-binding transcription factors and their target genes, compiled both from curated resources and from a text mining effort. Query results are visualized using the inherent flexibility of the Cytoscape framework, and network links can be checked against curated database records or against the original publication. Availability and implementation: Install through the Cytoscape application manager or visit www.biogateway.eu for download and tutorial documents. Supplementary information: Supplementary information is available at Bioinformatics online.
In recent years, several authors have used probabilistic graphical models to learn expression modules and their regulatory programs from gene expression data. Here, we demonstrate the use of the synthetic data generator SynTReN for the purpose of testing and comparing module network learning algorithms. We introduce a software package for learning module networks, called LeMoNe, which incorporates a novel strategy for learning regulatory programs. Novelties include the use of a bottom-up Bayesian hierarchical clustering to construct the regulatory programs, and the use of a conditional entropy measure to assign regulators to the regulation program nodes. Using SynTReN data, we test the performance of LeMoNe in a completely controlled situation and assess the effect of the methodological changes we made with respect to an existing software package, namely Genomica. Additionally, we assess the effect of various parameters, such as the size of the data set and the amount of noise, on t...
The Mauriceville and Varkud plasmids are retroid elements that propagate in the mitochondria of some Neurospora spp. strains. Previous studies of endogenous reactions in ribonucleoprotein particle preparations suggested that the plasmids use a novel mechanism of reverse transcription that involves synthesis of a full-length minus-strand DNA beginning at the 3' end of the plasmid transcript, which has a 3' tRNA-like structure (M. T. R. Kuiper and A. M. Lambowitz, Cell 55:693-704, 1988). In this study, we developed procedures for releasing the Mauriceville plasmid reverse transcriptase from mitochondrial ribonucleoprotein particles and partially purifying it by heparin-Sepharose chromatography. By using these soluble preparations, we show directly that the Mauriceville plasmid reverse transcriptase synthesizes full-length cDNA copies of in vitro transcripts beginning at the 3' end and has a preference for transcripts having the 3' tRNA-like structure. Further, unlike r...
Scientific progress is increasingly dependent on knowledge in computation-ready forms. In the life sciences, among others, many scientists therefore extract and structure knowledge from the literature. In a process called manual curation, they enter knowledge into spreadsheets, or into databases where it serves their and many others' research. Valuable as these curation efforts are, the range and detail of what can practically be captured and shared remains limited, because of the constraints of current curation tools. Many important contextual aspects of observations described in literature simply do not fit in the form defined by these tools, and thus cannot be captured. Here we present the design of an easy-to-use, general-purpose method and interface that enables the precise semantic capture of virtually unlimited types of information and details, using only a minimal set of building blocks. Scientists from any discipline can use this to convert any complex knowledge into a form...
Life Science information is increasingly available on the Semantic Web and this poses a demand for new tools and methodologies if it is to fulfill its potential to advance research in all areas of life sciences, including biomedicine. Life science information is obtained by traditionally distinct and varied scientific disciplines, which explains why it is heterogeneous in its representation and in its semantics. The exploitation of this information by users relies only to a limited extent on well understood and shared formats, relations and metaphors; the interaction of users with biomedical information resources is an integral part of the definition and interpretation of the information that they provide. The need for this interactivity is reflected by current biomedical research practice. A range of life science software tools and methodologies focus on the analysis of biological networks and pathways. They provide interactive environments where relations among biological entities ...
Papers by Martin Kuiper