Cyanobacterial macrocolonies known as Llayta are found in Andean wetlands and have been consumed since pre-Columbian times in South America. Macrocolonies of filamentous cyanobacteria are niches for colonization by other microorganisms; however, the microbiome of edible Llayta has not been explored. Based on a culture-independent approach, we report the presence, identification and metagenomic genome reconstruction of Cyanocohniella sp. LLY associated with Llayta trichomes. The assembled genome of strain LLY is now available for further inquiries, and may be instrumental for taxonomic advances on this genus. All known members of the Cyanocohniella genus have been isolated from salty European habitats. A biogeographic gap for the Cyanocohniella genus is partially filled by the existence of strain LLY in Andean wetlands in South America as a new habitat. This is the first genome available for members of this genus. Genes involved in primary and secondary metabolism are described providin...
Inflammation and infection of bovine mammary glands, commonly known as mastitis, impose significant losses each year on the dairy industry worldwide. While several different bacterial species have been identified as causative agents of mastitis, many clinical mastitis cases remain culture-negative, even after enrichment for bacterial growth. To understand the basis for this increasingly common phenomenon, the composition of bacterial communities from milk samples was analyzed using culture-independent pyrosequencing of amplicons of 16S ribosomal RNA genes (16S rDNA). The microbial community composition of culture-negative milk samples from mastitic quarters was compared with that of non-mastitic quarters from the same animals. Genomic DNA from culture-negative clinical and healthy quarter sample pairs was isolated, and amplicon libraries were prepared using indexed primers specific to the V1–V2 region of bacterial 16S rRNA genes and sequenced using the Roche 454 GS FLX ...
Accurate estimations of the seroprevalence of antibodies to severe acute respiratory syndrome coronavirus 2 need to properly consider the specificity and sensitivity of the antibody tests. In addition, prior knowledge of the extent of viral infection in a population may also be important for adjusting the estimation of seroprevalence. For this purpose, we have developed a Bayesian approach that can incorporate the variabilities of specificity and sensitivity of the antibody tests, as well as the prior probability distribution of seroprevalence. We have demonstrated the utility of our approach by applying it to a recently published large-scale dataset from the US CDC, with our results providing entire probability distributions of seroprevalence instead of single-point estimates. Our Bayesian code is freely available at https://github.com/qunfengdong/AntibodyTest.
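The idea in this abstract can be illustrated with a minimal sketch. It is not the code from the AntibodyTest repository; it is a simplified grid approximation written under stated assumptions: a uniform prior on seroprevalence, Beta distributions for sensitivity and specificity, and entirely hypothetical counts. The function names and parameters (`seroprevalence_posterior`, `sens_ab`, `spec_ab`) are made up for illustration. The probability of a positive test is taken as prevalence × sensitivity + (1 − prevalence) × (1 − specificity), and test uncertainty is averaged out by Monte Carlo draws.

```python
import math
import random


def log_binom_pmf(k, n, p):
    """Log of the binomial probability of k positives out of n at rate p."""
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep the logarithms finite
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))


def seroprevalence_posterior(positives, tested, sens_ab, spec_ab,
                             grid_size=201, draws=500, seed=0):
    """Grid posterior over seroprevalence with a uniform prior (assumption).

    sens_ab and spec_ab are (alpha, beta) parameters of Beta distributions
    describing uncertainty in test sensitivity and specificity.
    """
    rng = random.Random(seed)
    test_draws = [(rng.betavariate(*sens_ab), rng.betavariate(*spec_ab))
                  for _ in range(draws)]
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    weights = []
    for prev in grid:
        likelihood = 0.0
        for sens, spec in test_draws:
            # Probability that a randomly chosen person tests positive.
            p_pos = prev * sens + (1.0 - prev) * (1.0 - spec)
            likelihood += math.exp(log_binom_pmf(positives, tested, p_pos))
        weights.append(likelihood / draws)
    total = sum(weights)
    return grid, [w / total for w in weights]


# Hypothetical data: 50 positives among 1000 people tested; test validation
# summarized as Beta(91, 11) for sensitivity and Beta(99, 3) for specificity.
grid, post = seroprevalence_posterior(50, 1000, sens_ab=(91, 11), spec_ab=(99, 3))
post_mean = sum(g * w for g, w in zip(grid, post))
print(f"posterior mean seroprevalence: {post_mean:.3f}")
```

The output is a full distribution over the grid rather than a single adjusted estimate, which is the point the abstract makes.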
A Gram-positive, coagulase-negative, novobiocin-resistant, and lithium-tolerant bacterium was isolated from Salar de Atacama. Strain LCHXa is closely related to Staphylococcus sciuri. Its genome is 3,013,090 bp long and contains 2,551 predicted protein-coding genes. We observed 58 genes associated with stress response and 17 genes linked to osmoregulation, mainly related to glycine betaine metabolism.
Modern pyrosequencing techniques make it possible to study complex bacterial populations, for example through 16S rRNA gene sequences, directly from environmental or clinical samples without the need for laboratory purification. Alignment of sequences across the resultant large data sets (100,000+ sequences) is of particular interest for the purpose of identifying potential gene clusters and families, but such analysis represents a daunting computational task. The aim of this work is the development of an efficient pipeline for the clustering of large sequence read sets. Pairwise alignment techniques are used here to calculate genetic distances between sequence pairs. These methods are pleasingly parallel and have been shown to reflect genetic distances in highly variable regions of rRNA genes more accurately than traditional multiple sequence alignment (MSA) approaches do. By utilizing Needleman-Wunsch (NW) pairwise alignment in conjunction with novel implementations of interpolative multidimensiona...
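As a rough illustration of the pairwise step described above, the following sketch computes a Needleman-Wunsch global alignment and turns it into a simple distance. It is not the pipeline from the paper: the scoring values (match +1, mismatch −1, linear gap −1), the distance definition (fraction of alignment columns that are not identical), and the function names are all assumptions chosen for brevity.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def alignment_distance(a, b):
    """Fraction of aligned columns that differ (mismatch or gap)."""
    _, aligned_a, aligned_b = needleman_wunsch(a, b)
    if not aligned_a:
        return 0.0
    differing = sum(1 for x, y in zip(aligned_a, aligned_b) if x != y)
    return differing / len(aligned_a)


print(needleman_wunsch("ACGT", "AGT"))
print(alignment_distance("ACGT", "AGT"))
```

Because each pair of sequences is aligned independently, the distance for every pair can be computed on a separate worker, which is what makes the approach pleasingly parallel.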
Derived from the maize Mu1 transposon, RescueMu provides strategies for maize gene discovery and mutant phenotypic analysis. A total of 9.92 Mb of gene-enriched sequences next to RescueMu insertion sites were co-assembled with expressed sequence tags and analyzed. Multiple plasmid recoveries identified probable germinal insertions, and screening of RescueMu plasmid libraries identified plants containing probable germinal insertions. Although frequently recovered parental insertions and insertion hotspots reduce the efficiency of gene discovery per plasmid, RescueMu targets a large variety of genes and produces knockout mutants.
Although GBrowse is popular for visualizing genomic features along a reference sequence, its installation and configuration are difficult for many biologists. WebGBrowse is a web server that takes a user-supplied annotation file, guides users to configure the display of each genomic feature, and allows users to visualize the genome annotation with integrated GBrowse software. This protocol guides the
Urobiome research has the potential to advance the understanding of a wide range of diseases, including lower urinary tract symptoms and kidney disease. Many scientific areas have benefited from early research method consensus to facilitate the greater, common good.
To develop a mathematical model to characterize age-specific case-fatality rates (CFR) of COVID-19. Based on 2 large-scale CFR datasets from China and Italy, a logistic model was derived to provide quantitative insight on the dynamics between CFR and age. We inferred that CFR increased faster in Italy than in China, as well as in females over males. In addition, while CFR increased with age, the rate of growth eventually slowed down, with a predicted theoretical upper limit for males (32%), females (21%), and the general population (23%). Our logistic model provided quantitative insight on the dynamics of CFR.
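A logistic curve with a finite ceiling behaves the way this abstract describes: it rises with age, then flattens as it approaches an upper limit. The sketch below assumes a standard three-parameter logistic form, which may differ from the exact parameterization in the paper. Only the 32% ceiling for males comes from the abstract; the growth rate (0.1 per year) and midpoint age (70) are hypothetical values chosen for illustration.

```python
import math


def logistic_cfr(age, upper_limit, growth_rate, midpoint_age):
    """Three-parameter logistic curve: CFR approaches upper_limit as age grows."""
    return upper_limit / (1.0 + math.exp(-growth_rate * (age - midpoint_age)))


# upper_limit=0.32 is the male ceiling quoted in the abstract; growth_rate and
# midpoint_age are made-up illustrative values, not fitted parameters.
ages = list(range(0, 101, 10))
curve = [logistic_cfr(age, 0.32, 0.1, 70.0) for age in ages]
for age, cfr in zip(ages, curve):
    print(f"age {age:3d}: CFR = {cfr:.4f}")
```

Past the midpoint age, each additional decade adds less to the CFR than the previous one, which is the slowdown in growth that produces a theoretical upper limit.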
Journal of the American Medical Informatics Association
Objective Estimating the hospitalization risk for people with comorbidities infected by the SARS-CoV-2 virus is important for developing public health policies and guidance. Traditional biostatistical methods for risk estimations require: (i) the number of infected people who were not hospitalized, which may be severely undercounted since many infected people were not tested; (ii) comorbidity information for people not hospitalized, which may not always be readily available. We aim to overcome these limitations by developing a Bayesian approach to estimate the risk ratio of hospitalization for COVID-19 patients with comorbidities. Materials and Methods We derived a Bayesian approach to estimate the posterior distribution of the risk ratio using the observed frequency of comorbidities in COVID-19 patients in hospitals and the prevalence of comorbidities in the general population. We applied our approach to 2 large-scale datasets in the United States: 2491 patients in the COVID-NET, a...
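The two inputs named in the abstract are enough to express a risk ratio through Bayes' theorem: RR = P(H|C) / P(H|not C) = [P(C|H) / P(C)] / [(1 − P(C|H)) / (1 − P(C))], where H is hospitalization and C is the comorbidity. The sketch below is one possible Monte Carlo reading of that identity, not the authors' derivation. It assumes that comorbidity prevalence among infected people equals the general population prevalence, uses uniform Beta(1, 1) priors, and runs on entirely hypothetical counts.

```python
import random


def risk_ratio_draws(hosp_with, hosp_total, pop_with, pop_total,
                     draws=10000, seed=0):
    """Sorted posterior draws of the hospitalization risk ratio.

    Assumes Beta(1, 1) priors and that comorbidity prevalence among the
    infected matches the general population (both are assumptions).
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(draws):
        # P(comorbidity | hospitalized), from hospital counts.
        p_c_given_h = rng.betavariate(1 + hosp_with, 1 + hosp_total - hosp_with)
        # P(comorbidity) in the population, from survey counts.
        p_c = rng.betavariate(1 + pop_with, 1 + pop_total - pop_with)
        samples.append((p_c_given_h / p_c) / ((1 - p_c_given_h) / (1 - p_c)))
    return sorted(samples)


# Hypothetical counts: 300 of 1000 hospitalized patients have the comorbidity,
# against 1000 of 10000 people in a population survey.
rr = risk_ratio_draws(300, 1000, 1000, 10000)
median = rr[len(rr) // 2]
low, high = rr[int(0.025 * len(rr))], rr[int(0.975 * len(rr)) - 1]
print(f"risk ratio median {median:.2f}, 95% interval ({low:.2f}, {high:.2f})")
```

Neither the number of non-hospitalized infections nor their comorbidity status appears anywhere in the calculation, which is the limitation the abstract sets out to avoid.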
A common research task in COVID-19 studies is estimating the prevalence of certain medical outcomes. Although point estimates with confidence intervals are typically obtained, a better approach is to estimate the entire posterior probability distribution of the prevalence, which can be easily accomplished with a standard Bayesian approach using a binomial likelihood and its conjugate beta prior distribution. Using two recently published COVID-19 data sets, we performed Bayesian analysis to estimate the prevalence of infection fatality in Iceland and of asymptomatic children in the United States.
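The conjugate update mentioned here is short enough to sketch directly. With a Beta(a, b) prior and k outcomes observed among n cases, the posterior is Beta(a + k, b + n − k). The counts below are hypothetical and are not taken from the Iceland or United States data sets; the function names are illustrative, and the credible interval is approximated by sampling to keep the example within the standard library.

```python
import random


def beta_posterior(successes, trials, prior_a=1.0, prior_b=1.0):
    """Posterior Beta parameters for a binomial proportion (conjugate update)."""
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    return prior_a + successes, prior_b + (trials - successes)


def credible_interval(a, b, level=0.95, draws=20000, seed=0):
    """Equal-tailed credible interval, approximated from Beta(a, b) samples."""
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    lo_idx = int(((1.0 - level) / 2.0) * draws)
    hi_idx = int((1.0 - (1.0 - level) / 2.0) * draws) - 1
    return samples[lo_idx], samples[hi_idx]


# Hypothetical data: 10 outcomes among 1000 cases, uniform Beta(1, 1) prior.
a, b = beta_posterior(successes=10, trials=1000)
posterior_mean = a / (a + b)
low, high = credible_interval(a, b)
print(f"posterior Beta({a:g}, {b:g}), mean {posterior_mean:.4f}, "
      f"95% interval ({low:.4f}, {high:.4f})")
```

The pair (a, b) describes the whole posterior distribution, so any summary (mean, interval, tail probability) can be read from it instead of relying on a single point estimate.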