
    Bret Larget

    The molecular clock hypothesis remains an important conceptual and analytical tool in evolutionary biology despite the repeated observation that the clock hypothesis does not perfectly explain observed DNA sequence variation. We introduce a parametric model that relaxes the molecular clock by allowing rates to vary across lineages according to a compound Poisson process. Events of substitution rate change are placed onto a phylogenetic tree according to a Poisson process. When an event of substitution rate change occurs, the current rate of substitution is modified by a gamma-distributed random variable. Parameters of the model can be estimated using Bayesian inference. We use Markov chain Monte Carlo integration to evaluate the posterior probability distribution because the posterior probability involves high dimensional integrals and summations. Specifically, we use the Metropolis-Hastings-Green algorithm with 11 different move types to evaluate the posterior distribution. We demo...
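As a rough illustration of the rate-change process described in this abstract, the sketch below simulates substitution rates along a single branch. It assumes that each event multiplies the current rate by a gamma-distributed factor (the abstract says only that the rate is "modified"), and the function and parameter names (`simulate_branch_rates`, `event_rate`, `shape`, `scale`) are hypothetical, not taken from the paper.

```python
import random


def simulate_branch_rates(branch_length, start_rate, event_rate, shape, scale, rng):
    """Return (segment_length, rate) pairs describing one branch.

    Rate-change events arrive as a Poisson process with intensity
    `event_rate`. ASSUMPTION: at each event the current rate is multiplied
    by a Gamma(shape, scale) draw; the source abstract does not specify
    the exact form of the modification.
    """
    segments = []
    position = 0.0
    rate = start_rate
    while True:
        # Waiting times between Poisson events are exponential.
        wait = rng.expovariate(event_rate)
        if position + wait >= branch_length:
            # No further event before the end of the branch.
            segments.append((branch_length - position, rate))
            return segments
        segments.append((wait, rate))
        position += wait
        rate *= rng.gammavariate(shape, scale)


def expected_substitutions(segments):
    """Expected substitutions per site: sum of length times rate."""
    return sum(length * rate for length, rate in segments)


rng = random.Random(1)
segments = simulate_branch_rates(1.0, 1.0, 2.0, 2.0, 0.5, rng)
print(len(segments), expected_substitutions(segments))
```

Under a strict molecular clock the branch would consist of a single segment at the starting rate; here the number of segments varies from run to run, which is one reason a sampler over this model needs moves that change the dimension of the parameter space.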
    A fundamental problem in evolutionary biology is determining evolutionary relationships among different taxa. Genome arrangement data is potentially more informative than DNA sequence data in cases where alignment of DNA sequences is highly uncertain. We describe a Bayesian framework for phylogenetic inference from mitochondrial genome arrangement data that uses Markov chain Monte Carlo (MCMC) as the computational engine for inference. Our approach is to model mitochondrial data as a circular signed permutation which is subject to reversals. We calculate the likelihood of one arrangement mutating into another along a single branch by counting the number of possible sequences of reversals which transform the first to the second. We calculate the likelihood of the entire tree by augmenting the state space with the arrangements at the branching points of the tree. We use MCMC to update both the tree and the arrangement data at the branching points.
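To make the reversal operation concrete, the sketch below applies a single reversal to a signed permutation. It is a minimal illustration only: it treats the genome as a linear list rather than a circular one, and the function name `reversal` is hypothetical. It does not implement the likelihood calculation described in the abstract.

```python
def reversal(perm, i, j):
    """Reverse the segment perm[i..j] (inclusive) and flip each sign.

    `perm` is a list of nonzero integers; the sign encodes the strand
    (orientation) of each gene. ASSUMPTION: the genome is handled as a
    linear list, ignoring the circular structure for simplicity.
    """
    if not 0 <= i <= j < len(perm):
        raise ValueError("need 0 <= i <= j < len(perm)")
    flipped = [-gene for gene in reversed(perm[i:j + 1])]
    return perm[:i] + flipped + perm[j + 1:]


genome = [1, 2, 3, 4, 5]
print(reversal(genome, 1, 3))  # [1, -4, -3, -2, 5]
```

Applying the same reversal twice restores the original arrangement, so every reversal is its own inverse.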
    Eight ruminally cannulated lactating dairy cows from a study on the effect of dietary rumen-degraded protein on production and digestion of nutrients were used to assess using sample duplication to control day-to-day variation within animals and errors associated with sampling and laboratory analyses. Two consecutive pooled omasal samples, each representing a feeding cycle, were obtained from each cow in each period. The effectiveness of sample duplication in error control was tested by comparing the variance of the difference in treatment means when taking 2 samples from each cow in each period to the variance when taking only one sample. Compared with no duplication, sample duplication improved precision by reducing variance by 50, 40, 31, 23, 23, and 9% for, respectively, rumen-undegraded protein flows, ruminal neutral detergent fiber digestibility, microbial nonammonia N flow, microbial efficiency, organic matter flow, and organic matter truly digested in the rumen. For these sa...
    Maintenance of genetic variation at loci under selection has profound implications for adaptation under environmental change. In temporally and spatially varying habitats, non-neutral polymorphism could be maintained by heterozygote advantage across environments (marginal overdominance), which could be greatly increased by beneficial reversal of dominance across conditions. We tested for reversal of dominance and marginal overdominance in salinity tolerance in the saltwater-to-freshwater invading copepod Eurytemora affinis. We compared survival of F1 offspring generated by crossing saline and freshwater inbred lines (between-salinity F1 crosses) relative to within-salinity F1 crosses, across three salinities. We found evidence for both beneficial reversal of dominance and marginal overdominance in salinity tolerance. In support of reversal of dominance, survival of between-salinity F1 crosses was not different from that of freshwater F1 crosses under freshwater conditions and saltwa...
    ... A Likelihood Framework for Estimating Phylogeographic History on a Continuous Landscape, Alan R. Lemmon and Emily Moriarty Lemmon, 544. ... for Host-Symbiont Codivergence Indicates Ancient Origin of Fungal Endophytes in Grasses, CL Schardl, KD Craven, S. Speakman ...
    Calculating the likelihood of observed DNA sequence data at the leaves of a tree is the computational bottleneck for phylogenetic analysis by Bayesian methods or by the method of maximum likelihood. Because analysis of even moderately sized data sets can require hours of computational time on fast desktop computers, algorithmic changes that substantially increase the speed of the basic likelihood calculation are significant. It has long been recognized that the contribution to the likelihood at sites with identical patterns is the same and need only be computed once for each unique pattern. We note that sites whose patterns are not identical on the entire tree may be identical on subtrees, and hence partial likelihood calculations made for one site may be stored and used for calculations at another. The bookkeeping and memory requirements are large, but not too excessive for current desktop computers. Timed calculations on many genuine data sets indicate that the computational algori...
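The long-recognized shortcut mentioned in this abstract, computing the likelihood once per unique site pattern, can be sketched as follows. This shows only the whole-tree pattern compression, not the subtree-level reuse the paper proposes. The names `site_pattern_counts` and `total_log_likelihood` are hypothetical, and `site_log_likelihood` stands in for whatever per-site calculation a real program would use.

```python
from collections import Counter


def site_pattern_counts(alignment):
    """Count how often each column (site pattern) occurs in an alignment.

    `alignment` is a list of equal-length sequences, one per taxon.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("alignment must be non-empty with equal-length sequences")
    # zip(*alignment) yields one tuple of characters per site.
    return Counter(zip(*alignment))


def total_log_likelihood(alignment, site_log_likelihood):
    """Sum per-site log-likelihoods, evaluating each unique pattern once.

    `site_log_likelihood` is a placeholder callable (an assumption of
    this sketch) that maps a site pattern to its log-likelihood.
    """
    counts = site_pattern_counts(alignment)
    return sum(n * site_log_likelihood(pattern) for pattern, n in counts.items())


alignment = ["AACGA", "AACGA", "ATCGA"]
counts = site_pattern_counts(alignment)
print(len(counts), "unique patterns for", len(alignment[0]), "sites")
```

In this toy alignment the first and last columns share a pattern, so five sites require only four likelihood evaluations.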
    The main limiting factor in Bayesian MCMC analysis of phylogeny is typically the efficiency with which topology proposals sample tree space. Here we evaluate the performance of seven different proposal mechanisms, including most of those used in current Bayesian phylogenetics software. We sampled 12 empirical nucleotide data sets--ranging in size from 27 to 71 taxa and from 378 to 2,520 sites--under difficult conditions: short runs, no Metropolis-coupling, and an oversimplified substitution model producing difficult tree spaces (Jukes Cantor with equal site rates). Convergence was assessed by comparison to reference samples obtained from multiple Metropolis-coupled runs. We find that proposals producing topology changes as a side effect of branch length changes (LOCAL and Continuous Change) consistently perform worse than those involving stochastic branch rearrangements (nearest neighbor interchange, subtree pruning and regrafting, tree bisection and reconnection, or subtree swapping). Among the latter, moves that use an extension mechanism to mix local with more distant rearrangements show better overall performance than those involving only local or only random rearrangements. Moves with only local rearrangements tend to mix well but have long burn-in periods, whereas moves with random rearrangements often show the reverse pattern. Combinations of moves tend to perform better than single moves. The time to convergence can be shortened considerably by starting with a good tree, but this comes at the cost of compromising convergence diagnostics based on overdispersed starting points. Our results have important implications for developers of Bayesian MCMC implementations and for the large group of users of Bayesian phylogenetics software.
    Two major approaches for blind source separation (BSS) are, respectively, based on the maximum likelihood (ML) principle and mutual information (MI) minimization. They have been mainly studied for simple linear mixtures. We here show that they additionally involve indirect functional dependencies for general nonlinear mixtures. Moreover, the notations commonly employed by the BSS community in calculations performed for these methods may become misleading when using them for nonlinear mixtures, due to the above-mentioned dependencies. In this paper, we first explain this phenomenon for arbitrary nonlinear mixing models. We then accordingly correct two previously published methods for specific nonlinear mixtures, where indirect dependencies were mistakenly ignored. This paper therefore opens the way to the application of the ML and MI BSS methods to many specific mixing models, by providing general tools to address such mixtures and explicitly showing how to apply these tools to practical cases.
    Sclerotinia sclerotiorum, the causal agent of potato stem rot, is prevalent and poorly managed on potatoes in the Columbia Basin of Washington. Because of the ubiquitous nature of the fungus and high crop diversity within the Columbia Basin, understanding the population structure and the potential for outcrossing of the pathogen would be helpful in developing disease management strategies. The population structure of S. sclerotiorum in the Columbia Basin from potato was examined using microsatellite markers and mycelial compatibility. Analysis of molecular variance revealed that 92% of the variability among 167 isolates was found within subpopulations, with limited, yet statistically significant impact of the collection date, but not the year or location of collection. Linkage disequilibrium and index of association analyses noted a potential for outcrossing in two locations, which was substantiated by the discovery of recombinant ascospores in three field-generated apothecia from the 12 apothecia examined. Microsatellite haplotypes were not correlated with mycelial compatibility groups. This high haplotypic diversity did not seem to impact pathologically important phenotypes. Greenhouse inoculations of potato plants exhibited no significant differences in aggressiveness on potato stems. Moreover, in vitro studies of response to fungicides and temperature stimuli yielded no significant differences among studied isolates. These findings illustrate the potential for outcrossing in warm temperate regions of North America, where a diversity of crops are planted simultaneously and in neighboring fields. This study also indicates that the unsatisfactory management of potato stem rot is likely not directly attributable to genetic factors, but to gaps in agricultural practices.
    We describe a Bayesian approach to estimate phylogeny and ancestral genome arrangements on the basis of genome arrangement data using a model in which gene inversion is the sole mechanism of change. While we have described a similar method to estimate phylogenetic relationships in the statistics literature, the novel contribution of the present work is the description of a method to compute probability distributions of ancestral genome arrangements. We assess the robustness of posterior distributions to different specifications of prior distributions and provide an empirical means to selecting a prior distribution. We note that parsimony approaches to ancestral reconstruction in the literature focus on the development of computationally efficient algorithms for searching for optimal ancestral genome arrangements, but, unlike Bayesian approaches, do not include assessment of uncertainty in these estimates. We compare and contrast a Bayesian approach with a parsimony approach to infer phylogenies and ancestral arrangements from genome arrangement data by re-analyzing a number of previously published data sets.
    Genome arrangements are a potentially powerful source of information to infer evolutionary relationships among distantly related taxa. Mitochondrial genome arrangements may be especially informative about metazoan evolutionary relationships because (1) nearly all animals have the same set of definitively homologous mitochondrial genes, (2) mitochondrial genome rearrangement events are rare relative to changes in sequences, and (3) the number of possible mitochondrial genome arrangements is huge, making convergent evolution of genome arrangements appear highly unlikely. In previous studies, phylogenetic evidence in genome arrangement data is nearly always used in a qualitative fashion: the support in favor of clades with similar or identical genome arrangements is considered to be quite strong, but is not quantified. The purpose of this article is to quantify the uncertainty among the relationships of metazoan phyla on the basis of mitochondrial genome arrangements while incorporating prior knowledge of the monophyly of various groups from other sources. The work we present here differs from our previous work in the statistics literature in that (1) we incorporate prior information on classifications of metazoans at the phylum level, (2) we describe several advances in our computational approach, and (3) we analyze a much larger data set (87 taxa) that consists of each unique, complete mitochondrial genome arrangement with a full complement of 37 genes that were present in the NCBI (National Center for Biotechnology Information) database at a recent date. In addition, we analyze a subset of 28 of these 87 taxa for which the non-tRNA mitochondrial genomes are unique where the assumption of our inversion-only model of rearrangement is more plausible. We present summaries of Bayesian posterior distributions of tree topology on the basis of these two data sets.
    ... David Aldous and Bret Larget Department of Statistics, University of California, Berkeley, CA 94720, USA ... I 2677 [8] Domb C, Schneider T and Stoll E 1975 J. Phys. A: Math. Gen. 8 L90 [9] Gaunt D S, Sykes M F, Torrie G M and Whittington S G 1982 J. Phys. ...
    The historical database from the Environmental Measurements Laboratory's Quality Assessment Program from 1982 to 1998 has been analyzed to determine control limits for future performance evaluations of the different laboratories contracted to the U.S. Department of Energy. Seventy-three radionuclides in four different matrices (air filter, soil, vegetation, and water) were analyzed. The evaluation criteria were established based on a z-score calculation.
    Phylogeography investigates the historical process that is responsible for the contemporary geographic distributions of populations in a species. The inference is made on the basis of molecular sequence data sampled from modern-day populations. The estimates, however, may fluctuate depending on the relevant genomic regions, because the evolution mechanism of each genome is unique, even within the same individual. In this article, we propose a genome-differentiated population tree model that allows the existence of separate population trees for each homologous genome. In each population tree, the unique evolutionary characteristics account for each genome, along with their homologous relationship; therefore, the approach can distinguish the evolutionary history of one genome from that of another. In addition to the separate divergence times, the new model can estimate separate effective population sizes, gene-genealogies and other mutation parameters. For Bayesian inference, we developed a Markov chain Monte Carlo (MCMC) methodology with a novel MCMC algorithm which can mix over a complicated state space. The stability of the new estimator is demonstrated through comparison with the Monte Carlo samples and other methods, as well as MCMC convergence diagnostics. The analysis of African gorilla data from two homologous loci reveals discordant divergence times between loci, and this discrepancy is explained by male-mediated gene flows until the end of the last ice age.
    BUCKy is a C++ program that implements Bayesian concordance analysis. The method uses a non-parametric clustering of genes with compatible trees, and reconstructs the primary concordance tree from clades supported by the largest proportions of genes. A population tree with branch lengths in coalescent units is estimated from quartet concordance factors. BUCKy is open source and distributed under the GNU general public license at www.stat.wisc.edu/∼ane/bucky/.
    Libraries of 16S rRNA genes provide insight into the membership of microbial communities. Statistical methods help to determine whether differences in library composition are artifacts of sampling or are due to underlying differences in the communities from which they are derived. To contribute to a growing statistical framework for comparing 16S rRNA libraries, we present a computer program, integral-LIBSHUFF, which calculates the integral form of the Cramér-von Mises statistic. This implementation builds upon the LIBSHUFF program, which uses an approximation of the statistic and makes a number of modifications that improve precision and accuracy. Once integral-LIBSHUFF calculates the P values, when pairwise comparisons are tested at the 0.05 level, the probability of falsely identifying a significant P value is 0.098 for a study with two libraries, 0.265 for three libraries, and 0.460 for four libraries. The potential negative effects of making the multiple pairwise comparisons necessitate correcting for the increased likelihood that differences between treatments are due to chance and do not reflect biological differences. Using integral-LIBSHUFF, we found that previously published 16S rRNA gene libraries constructed from Scottish and Wisconsin soils contained different bacterial lineages. We also analyzed the published libraries constructed for the zebrafish gut microflora and found statistically significant changes in the community during development of the host. These analyses illustrate the power of integral-LIBSHUFF to detect differences between communities, providing the basis for ecological inference about the association of soil productivity or host gene expression and microbial community composition.
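The false-positive probabilities quoted in this abstract can be reproduced with a short calculation. The sketch below assumes that the tests are independent and that each pair of libraries is compared in both directions, giving k(k - 1) tests for k libraries; both assumptions are inferred from the quoted numbers rather than stated in the abstract, and the function name `familywise_error` is hypothetical.

```python
def familywise_error(num_libraries, alpha=0.05):
    """Probability of at least one false positive across all comparisons.

    ASSUMPTIONS: tests are independent, and every ordered pair of
    libraries (X vs Y and Y vs X) is tested, so k libraries give
    k * (k - 1) comparisons.
    """
    comparisons = num_libraries * (num_libraries - 1)
    return 1.0 - (1.0 - alpha) ** comparisons


for k in (2, 3, 4):
    print(k, format(familywise_error(k), ".4f"))
```

Under these assumptions the values for two, three, and four libraries agree with the figures in the abstract to within rounding, which illustrates why a correction for multiple comparisons is needed as the number of libraries grows.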