Background: Positional weight matrix (PWM) is a de facto standard model to describe transcription factor (TF) DNA binding specificities. PWMs inferred from in vivo or in vitro data are stored in many databases and used in a plethora of biological applications. This calls for comprehensive benchmarking of public PWM models with large experimental reference sets. Results: Here we report results from all-against-all benchmarking of PWM models for DNA binding sites of human TFs on a large compilation of in vitro (HT-SELEX, PBM) and in vivo (ChIP-seq) binding data. We observe that the best performing PWM for a given TF often belongs to another TF, usually from the same family. Occasionally, binding specificity is correlated with the structural class of the DNA binding domain, indicated by good cross-family performance measures. Benchmarking-based selection of family-representative motifs is more effective than motif clustering-based approaches. Overall, there is good agreement between in v...
Several Genome Wide Association Studies (GWAS) have reported variants associated with immune diseases. However, the identified variants are rarely the drivers of the associations and the molecular mechanisms behind the genetic contributions remain poorly understood. ChIP-seq data for TFs and histone modifications provide snapshots of protein-DNA interactions, allowing the identification of heterozygous SNPs showing significant allele-specific signals (AS-SNPs). AS-SNPs can change a TF binding site, resulting in altered gene regulation, and are primary candidates to explain associations observed in GWAS and expression studies. We identified 17,293 unique AS-SNPs across 7 lymphoblastoid cell lines. In this set of cell lines we interrogated 85% of common genetic variants in the population for potential regulatory effect and we identified 237 AS-SNPs associated with immune GWAS traits and 714 with gene expression in B cells. To elucidate possible regulatory mechanisms we integrated long-range 3D...
High-throughput data, for instance ChIP-seq data, measure binding of transcription factors (TFs) or other proteins to DNA and have become a widespread data source for de-novo motif discovery. Often, several ChIP-seq data sets study the same TF under different conditions, resulting in several, potentially redundant motifs, which calls for identification and clustering of similar motifs. Here, we propose a refined measure of motif similarity based on the correlation between score profiles on de Bruijn sequences. We demonstrate the utility of the proposed measure in benchmark studies on artificial motifs and motifs discovered from ENCODE ChIP-seq data. We use this measure to cluster motifs discovered from 757 different ENCODE ChIP-seq data sets for 166 TFs and RNA-polymerase II and III. Based on this clustering, we derive a TF interaction network that reflects many known TF-TF interactions, but also reveals novel putative interaction partners.
Czech Journal of Genetics and Plant Breeding, 2011
As a resource for structural and functional barley genome analysis, more than 140 000 ESTs (expressed sequence tags) were generated from 22 cDNA libraries that yielded 25 224 tentative unigenes. About 50% of them belong to gene families. The size of the complete transcriptome is estimated to comprise between 35 000 and 75 000 genes. The barley EST collection is a rich source for the development of novel markers including SSRs (simple sequence repeats) and SNPs (single nucleotide polymorphisms). Several bioinformatic tools have been developed facilitating the computer-assisted analysis of EST databases for the presence of either SNPs or SSRs and the development of SNP-derived CAPS (cleaved amplified polymorphic sequences) markers. In an attempt to systematically map barley genes a high-density transcript map is under construction and presently comprises more than 1000 markers. This map is a gateway to comparative genomics with particular emphasis on the rice genome. 65% of ...
Plant-specific EFFECTORS OF TRANSCRIPTION (ET) are characterised by a variable number of highly conserved ET repeats, which are involved in zinc and DNA binding. In addition, ETs share a GIY-YIG domain, involved in DNA nicking activity. It was hypothesised that ETs might act as epigenetic regulators. Here, methylome, transcriptome and phenotypic analyses were performed to investigate the role of ET factors and their involvement in DNA methylation in Arabidopsis thaliana. Comparative DNA methylation and transcriptome analyses in flowers and seedlings of et mutants revealed ET-specific differentially expressed genes and mostly independently characteristic, ET-specific differentially methylated regions. Loss of ET function results in pleiotropic developmental defects. The accumulation of cyclobutane pyrimidine dimers after ultraviolet stress in et mutants suggests an ET function in DNA repair.
Functions for RNA-binding proteins in orchestrating plant development and environmental responses are well established. However, the lack of a genome-wide view of their in vivo binding targets and binding landscapes represents a gap in understanding the mode of action of plant RNA-binding proteins. Here, we adapt individual nucleotide resolution crosslinking and immunoprecipitation (iCLIP) genome-wide to determine the binding repertoire of the circadian clock-regulated Arabidopsis thaliana glycine-rich RNA-binding protein AtGRP7. iCLIP identifies 858 transcripts with significantly enriched crosslink sites in plants expressing AtGRP7-GFP that are absent in plants expressing an RNA-binding-dead AtGRP7 variant or GFP alone. To independently validate the targets, we performed RNA immunoprecipitation (RIP)-sequencing of AtGRP7-GFP plants subjected to formaldehyde fixation. Of the iCLIP targets, 452 were also identified by RIP-seq and represent a set of high-confidence binders. AtGRP7 can...
Growing evidence makes a strong case that epigenetic mechanisms contribute to complex traits, with implications across many fields of biology from dissecting developmental processes to understanding aspects of human health and disease. In ecology, recent studies have merged ecological experimental design with epigenetic analyses to elucidate the contribution of epigenetics to plant phenotypes, stress response, adaptation to habitat, or species range distributions. While there has been some progress in revealing the role of epigenetics in ecological processes, many studies with non-model species have so far been limited to describing broad patterns based on anonymous markers of DNA methylation. In contrast, studies with model species have benefited from powerful genomic resources, which allow for a more mechanistic understanding but have limited ecological realism. To understand the true significance of epigenetics for plant ecology and evolution, we must combine both approaches tran...
The plant hormone ethylene regulates numerous developmental processes and stress responses. Ethylene signaling proceeds via a linear pathway, which activates transcription factor (TF) EIN3, a primary transcriptional regulator of ethylene response. EIN3 influences gene expression upon binding to a specific sequence in gene promoters. This interaction, however, might be considerably affected by additional co-factors. In this work, we perform a whole-genome bioinformatics study to identify the impact of epigenetic factors on EIN3 functioning. The analysis of publicly available ChIP-Seq data on EIN3 binding in Arabidopsis thaliana showed a bimodal distribution of EIN3 binding regions (EBRs) in gene promoters. Besides a sharp peak in close proximity to the transcription start site, which is a common binding region for a wide variety of TFs, we found an additional extended peak in the distal promoter region. We characterized all EBRs with respect to the epigenetic status appealing to previo...
Sugar beet (Beta vulgaris ssp. vulgaris) is a biennial, sucrose-storing plant, which is mainly cultivated as a spring crop and harvested in the vegetative stage before winter. For increasing beet yield, over-winter cultivation would be advantageous. However, bolting is induced after winter and drastically reduces yield. Thus, post-winter bolting control is essential for winter beet cultivation. To identify genetic factors controlling bolting after winter, an F2 population was previously developed by crossing the sugar beet accessions BETA 1773 with reduced bolting tendency and 93161P with complete bolting after winter. For a mapping-by-sequencing analysis, pools of 26 bolting-resistant and 297 bolting F2 plants were used. Thereby, a single continuous homozygous region of 103 kb was co-localized to the previously published BR1 QTL for post-winter bolting resistance (Pfeiffer et al., 2014). The BR1 locus was narrowed down to 11 candidate genes from which a homolog of the Arabidopsis CL...
Small nucleolar RNAs (snoRNAs) are one of the most ancient families amongst non-protein-coding RNAs. They are ubiquitous in Archaea and Eukarya but absent in bacteria. Their main function is to target chemical modifications of ribosomal RNAs. They fall into two classes, box C/D snoRNAs and box H/ACA snoRNAs, which are clearly distinguished by conserved sequence motifs and the type of chemical modification that they govern. Similarly to microRNAs, snoRNAs appear in distinct families of homologs that affect homologous targets. In animals, snoRNAs and their evolution have been studied in much detail. In plants, however, their evolution has attracted comparably little attention. In order to chart the phylogenetic distribution of individual snoRNA families in plants, we applied a sophisticated approach for identifying homologs of known plant snoRNAs across the plant kingdom. In response to the relatively fast evolution of snoRNAs, information on conserved sequence boxes, target sequences...
High-throughput sequencing techniques have made it possible to assay an organism's entire repertoire of small non-coding RNAs (ncRNAs) in an efficient and cost-effective manner. The moderate size of small RNA-seq datasets makes it feasible to provide free web services to the research community that provide many basic features of a small RNA-seq analysis, including quality control, read normalization, ncRNA quantification, and the prediction of putative novel ncRNAs. DARIO is one such system that so far has been focussed on animals. Here we introduce an extension of this system to plant short non-coding RNAs (sncRNAs). It includes major modifications to cope with plant-specific sncRNA processing. The current version of plantDARIO covers analyses of mapping files, small RNA-seq quality control, expression analyses of annotated sncRNAs, including the prediction of novel miRNAs and snoRNAs from unknown expressed loci and expression analyses of user-defined loci. At present Arabidops...
The transcription of genes is often regulated not only by transcription factors binding at single sites per promoter, but by the interplay of multiple copies of one or more transcription factors binding at multiple sites forming a cis-regulatory module. The computational recognition of cis-regulatory modules from ChIP-seq or other high-throughput data is crucial in modern life and medical sciences. A common type of cis-regulatory modules are homotypic clusters of binding sites, i.e., clusters of binding sites of one transcription factor. For their recognition the homotypic Sunflower Hidden Markov Model is a promising statistical model. However, this model neglects statistical dependences among nucleotides within binding sites and flanking regions, which makes it not well suited for de-novo motif discovery. Here, we propose an extension of this model that allows statistical dependences within binding sites, their reverse complements, and flanking regions. We study the efficacy of thi...