Dawn Field

    We have examined the phylogenetic distribution of the longest, perfect microsatellites in GenBank. Despite the large contributions of model higher-eukaryotic organisms to GenBank, the selective cloning of long microsatellites from these organisms as genetic markers, and the relative lack of concentration on the microsatellites in lower eukaryotes and prokaryotes, we found that simple organisms, defined here as slime molds, fungi, protists, prokaryotes, viruses, organelles and plasmids, contributed 78 of the 375 examined sequences. These 78 simple-organism microsatellites are characterized predominantly by trinucleotide repeats, nearly half of which lie in exons, and in general show a bias towards A+T rich motifs. Simple-organism microsatellites represented more than once in GenBank displayed length polymorphisms when independent clones were compared. These facts collectively raise speculation as to the role of these 'junk' sequences in such highly economical genomes, especially when precise changes in long microsatellites are known to regulate critical virulence factors in several prokaryotes. Regardless of their biological significance, simple-organism microsatellites may provide a general source of molecular markers to track disease outbreaks and the evolution of microorganisms in unprecedented detail.
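As a rough illustration of what "perfect microsatellite" means in the abstract above, the following sketch scans a DNA string for uninterrupted tandem repeats. It is not the authors' GenBank survey method: the function name, the unit-size range (1 to 6 bases) and the minimum repeat count are assumptions chosen for the example.

```python
import re


def find_perfect_microsatellites(seq, min_unit=1, max_unit=6, min_repeats=5):
    """Return sorted (start, motif, repeat_count) tuples for perfect tandem repeats.

    Hypothetical helper for illustration; the thresholds are assumptions,
    not values taken from the study.
    """
    seq = seq.upper()
    hits = []
    for unit in range(min_unit, max_unit + 1):
        # One motif of `unit` bases followed by at least (min_repeats - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves built from a shorter unit
            # (e.g. "AA" or "ATAT"), since the shorter unit already reports them.
            if any(
                motif == motif[:k] * (unit // k)
                for k in range(1, unit)
                if unit % k == 0
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit))
    return sorted(hits)


if __name__ == "__main__":
    example = "TTGA" + "CAG" * 7 + "TTGA"
    for start, motif, count in find_perfect_microsatellites(example):
        print(start, motif, count)
```

Comparing the repeat counts reported for the same locus in independent clones would be one simple way to look for the length polymorphisms the abstract mentions.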
    We describe a general method based on principal coordinates analysis to predict the effects of single-nucleotide polymorphisms within regulatory sequences on DNA–protein interactions. We use binding data for the transcription factor NF-κB as a test system. The ...
    Twelve patients infected with the human immunodeficiency virus (HIV) and with CD4 cell counts below 100 cells/microliter received fluconazole daily (200 mg; five patients) or weekly (400 mg; seven patients) for fungal prophylaxis during a 6-month period. Oropharyngeal swabs were taken at regular intervals in order to detect colonization with Candida spp. All yeast isolates were examined with respect to the development over time of fluconazole resistance. Genetic diversity among the strains was assessed in order to discriminate between selection of a resistant subclone and patient recolonization. Genotyping was performed through random amplification of polymorphic DNA (RAPD) analysis. Specific site polymorphisms were assayed by tracking length variability in several microsatellite loci. Finally, to maximize resolution, one of these loci (ERK1) was analyzed by nucleotide sequencing. Although the number of strains analyzed was too small to allow statistical verification, it appeared that when fluconazole was given weekly, a smaller fraction of the strains showed diminished sensitivity than when it was given daily. Genetic analyses allowed three different scenarios to be discerned. Resistance development in an otherwise apparently unchanged strain was seen for 1 of the 12 patients. Clear strain replacement was observed for 3 of the remaining 11 patients. For all other patients minor differences were seen in either the RAPD genotype or the microsatellite allele composition during the course of treatment. In general, microsatellite sequence data is in agreement with data obtained by other methods, but occasionally within-patient heterogeneity is indicated. The present results show that during fluconazole treatment colonizing strains can remain identical, be replaced by clearly different strains, or undergo small changes. Within a patient there may be different levels of intrastrain variation.
    The Genomic Contextual Data Markup Language (GCDML) is a core project of the Genomic Standards Consortium (GSC) that implements the "Minimum Information about a Genome Sequence" (MIGS) specification and its extension, the "Minimum Information about a Metagenome Sequence" (MIMS). GCDML is an XML Schema for generating MIGS/MIMS compliant reports for data entry, exchange, and storage. When mature, this sample-centric, strongly-typed schema will provide a diverse set of descriptors for describing the exact origin and processing of a biological sample, from sampling to sequencing, and subsequent analysis. Here we describe the need for such a project, outline design principles required to support the project, and make an open call for participation in defining the future content of GCDML. GCDML is freely available, and can be downloaded, along with documentation, from the GSC Web site (http://gensc.org).
    The PlantTribes database (http://fgp.huck.psu.edu/tribe.html) is a plant gene family database based on the inferred proteomes of five sequenced plant species: Arabidopsis thaliana, Carica papaya, Medicago truncatula, Oryza sativa and Populus trichocarpa. We used the graph-based clustering algorithm MCL [Van Dongen (Technical Report INS-R0010 2000) and Enright et al. (Nucleic Acids Res. 2002; 30: 1575–1584)] to classify all of these species’ protein-coding genes into putative gene families, called tribes, using three clustering stringencies (low, medium and high). For all tribes, we have generated protein and DNA alignments and maximum-likelihood phylogenetic trees. A parallel database of microarray experimental results is linked to the genes, which lets researchers identify groups of related genes and their expression patterns. Unified nomenclatures were developed, and tribes can be related to traditional gene families and conserved domain identifiers. SuperTribes, constructed through a second iteration of MCL clustering, connect distant, but potentially related gene clusters. The global classification of nearly 200 000 plant proteins was used as a scaffold for sorting ~4 million additional cDNA sequences from over 200 plant species. All data and analyses are accessible through a flexible interface allowing users to explore the classification, to place query sequences within the classification, and to download results for further study.
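The clustering step named above can be sketched in a few lines. This is a bare-bones, dense-matrix version of the MCL loop (expansion by matrix squaring, then inflation by elementwise powering and column renormalization), not the `mcl` program or the PlantTribes pipeline. The function names, the self-loop weight and the default inflation value are assumptions for illustration; the input is a hypothetical symmetric similarity matrix.

```python
def _normalize_columns(matrix):
    """Scale every column so that it sums to 1 (all-zero columns stay zero)."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [
        [matrix[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster the nodes of a similarity graph; returns sorted lists of node indices.

    Illustrative sketch only: higher `inflation` tends to give finer clusters,
    loosely analogous to choosing a higher clustering stringency.
    """
    n = len(adjacency)
    # Add self-loops, then convert to a column-stochastic matrix.
    m = _normalize_columns(
        [
            [float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
            for i in range(n)
        ]
    )
    for _ in range(max_iter):
        # Expansion: square the matrix (two-step random walks).
        expanded = [
            [sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
        # Inflation: raise entries to a power and renormalize each column.
        inflated = _normalize_columns(
            [[value ** inflation for value in row] for row in expanded]
        )
        delta = max(
            abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n)
        )
        m = inflated
        if delta < tol:
            break
    # Each row that keeps non-negligible mass belongs to an attractor node;
    # the columns it still reaches form one cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(cluster) for cluster in clusters)


if __name__ == "__main__":
    # Two triangles joined by a single weak link between nodes 2 and 3.
    graph = [
        [0, 1, 1, 0, 0, 0],
        [1, 0, 1, 0, 0, 0],
        [1, 1, 0, 1, 0, 0],
        [0, 0, 1, 0, 1, 1],
        [0, 0, 0, 1, 0, 1],
        [0, 0, 0, 1, 1, 0],
    ]
    print(mcl(graph))
```

A production run on hundreds of thousands of proteins would need sparse matrices and pruning of small entries, which this sketch omits.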
    This meeting report summarizes the proceedings of the fifth Genomic Standards Consortium (GSC) workshop held December 12-14, 2007, at the European Bioinformatics Institute (EBI), Cambridge, UK. This fifth workshop served as a milestone event in the evolution of the GSC (launched in September 2005); the key outcome of the workshop was the finalization of a stable version of the MIGS specification (v2.0) for publication. This accomplishment enables, and also in some cases necessitates, downstream activities, which are described in the multiauthor, consensus-driven articles in this special issue of OMICS produced as a direct result of the workshop. This report briefly summarizes the workshop and overviews the special issue. In particular, it aims to explain how the various GSC-led projects are working together to help this community achieve its stated mission of further standardizing the descriptions of genomes and metagenomes and implementing improved mechanisms of data exchange and integration to enable more accurate comparative analyses. Further information about the GSC and its range of activities can be found at http://gensc.org.
    Summary: The first open source software suite for experimentalists and curators that (i) assists in the annotation and local management of experimental metadata from high-throughput studies employing one or a combination of omics and other technologies; (ii) empowers users to uptake community-defined checklists and ontologies; and (iii) facilitates submission to international public repositories. Availability and Implementation: Software, documentation, case studies and implementations at http://www.isa-tools.org. Contact: isatools@googlegroups.com
    The plasmid pQBR103 was found within Pseudomonas populations colonizing the leaf and root surfaces of sugar beet plants growing at Wytham, Oxfordshire, UK. At 425 kb it is the largest self-transmissible plasmid yet sequenced from the phytosphere. It is known to enhance the competitive fitness of its host, and parts of the plasmid are known to be actively transcribed in the plant environment. Analysis of the complete sequence of this plasmid predicts a coding sequence (CDS)-rich genome containing 478 CDSs and an exceptional degree of genetic novelty; 80% of predicted coding sequences cannot be ascribed a function and 60% are orphans. Of those to which function could be assigned, 40% bore greatest similarity to sequences from Pseudomonas spp., and the majority of the remainder showed similarity to other γ-proteobacterial genera and plasmids. pQBR103 has identifiable regions presumed responsible for replication and partitioning, but despite being tra+ lacks the full complement of any previously described conjugal transfer functions. The DNA sequence provided few insights into the functional significance of plant-induced transcriptional regions, but suggests that 14% of CDSs may be expressed (11 CDSs with functional annotation and 54 without), further highlighting the ecological importance of these novel CDSs. Comparative analysis indicates that pQBR103 shares significant regions of sequence with other plasmids isolated from sugar beet plants grown at the same geographic location. These plasmid sequences indicate there is more novelty in the mobile DNA pool accessible to phytosphere pseudomonads than is currently appreciated or understood.
    Genome sequencing, the determination of the complete complement of DNA in an organism, is revolutionizing all aspects of the biological sciences. Genome sequences make available for scientific scrutiny the complete genetic capacity of an organism. With respect to microbes, this means we now have the unprecedented opportunity to investigate the molecular basis of commensal and virulence behavior. We now have genome sequences for a wide range of bacterial pathogens (obligate, facultative, and opportunistic); this has facilitated the discovery of many previously unidentified determinants of pathogenicity and has provided novel insights into what creates a pathogen. In-depth analyses of bacterial genomes are also providing new perspectives on bacterial physiology, molecular adaptation to a preferred niche, and genomic susceptibility to the uptake of foreign DNA, three key factors that can play a significant role in determining whether a species, or a strain, will have pathogenic potential.
    The Genomic Standards Consortium (GSC) invited a representative of the Long-Term Ecological Research (LTER) network to its fifth workshop to present the Ecological Metadata Language (EML) metadata standard and its relationship to the Minimum Information about a Genome/Metagenome Sequence (MIGS/MIMS) and its implementation, the Genomic Contextual Data Markup Language (GCDML). The LTER has been one of the top National Science Foundation (NSF) programs in biology since 1980, representing diverse ecosystems and creating long-term, interdisciplinary research, synthesis of information, and theory. The adoption of EML as the LTER network standard has been key to building network synthesis architectures based on high-quality standardized metadata. EML is the NSF-recognized metadata standard for LTER, and EML is a criterion used to review the LTER program's progress. At the workshop, a potential crosswalk between the GCDML and EML was explored. Also, collaboration between the LTER and GSC developers was proposed to join efforts toward a common metadata cataloging designer's tool. The community adoption success of a metadata standard depends, among other factors, on the tools and training developed to use the standard. LTER's experience in embracing EML may help GSC to achieve similar success. A possible collaboration between LTER and GSC to provide training opportunities for GCDML and the associated tools is being explored. Finally, LTER is investigating EML enhancements to better accommodate genomics data, possibly integrating the GCDML schema into EML. All these action items have been accepted by the LTER contingent, and further collaboration between the GSC and LTER is expected.
    This report summarizes the proceedings of the “Metagenomics, Metadata and Meta-analysis” (M3) Special Interest Group (SIG) meeting held at the Intelligent Systems for Molecular Biology 2009 conference. The Genomic Standards Consortium (GSC) hosted this meeting to explore the bottlenecks and emerging solutions for obtaining biological insights through large-scale comparative analysis of metagenomic datasets. The M3 SIG included 16 talks, half of which were selected from submitted abstracts, a poster session and a panel discussion involving members of the GSC Board. This report summarizes this one-day SIG, attempts to identify shared themes and recapitulates community recommendations for the future of this field. The GSC will also host an M3 workshop at the Pacific Symposium on Biocomputing (PSB) in January 2010. Further information about the GSC and its range of activities can be found at http://gensc.org/.