Proceedings of the National Academy of Sciences of the United States of America, Jun 5, 2018
The American tropics (the Neotropics) are the most species-rich realm on Earth, and for centuries, scientists have attempted to understand the origins and evolution of their biodiversity. It is now clear that different regions and taxonomic groups have responded differently to geological and climatic changes. However, we still lack a basic understanding of how Neotropical biodiversity was assembled over evolutionary timescales. Here we infer the timing and origin of the living biota in all major Neotropical regions by performing a cross-taxonomic biogeographic analysis based on 4,450 species from six major clades across the tree of life (angiosperms, birds, ferns, frogs, mammals, and squamates), and integrate >1.3 million species occurrences with large-scale phylogenies. We report an unprecedented level of biotic interchange among all Neotropical regions, totaling 4,525 dispersal events. About half of these events involved transitions between major environmental types, with a pre...
Rapidly growing biological data, including molecular sequences and fossils, hold an unprecedented potential to reveal how evolutionary processes generate and maintain biodiversity. However, researchers often have to develop their own idiosyncratic workflows to integrate and analyze these data for reconstructing time-calibrated phylogenies. In addition, divergence times estimated under different methods and assumptions, and based on data of varying quality and reliability, should not be combined without proper correction. Here we introduce a modular framework termed SUPERSMART (Self-Updating Platform for Estimating Rates of Speciation and Migration, Ages, and Relationships of Taxa), and provide a proof of concept for dealing with the moving targets of evolutionary and biogeographical research. This framework assembles comprehensive data sets of molecular and fossil data for any taxon and infers dated phylogenies using robust species tree methods, also allowing for the inclusion of genomic data produced through next-generation sequencing techniques. We exemplify the application of our method by presenting phylogenetic and dating analyses for the mammal order Primates and for the plant family Arecaceae (palms). We believe that this framework will provide a valuable tool for a wide range of hypothesis-driven research questions in systematics, biogeography, and evolution. SUPERSMART will also accelerate the inference of a "Dated Tree of Life" where all node ages are directly comparable. [Bayesian phylogenetics; data mining; divide-and-conquer methods; GenBank; multilocus multispecies coalescent; next-generation sequencing; palms; primates; tree calibration.]
Understanding the patterns and processes underlying the uneven distribution of biodiversity across space constitutes a major scientific challenge in systematic biology and biogeography, which largely relies on effectively mapping and making sense of rapidly increasing species occurrence data. There is thus an urgent need to make the process of coding species into spatial units faster, automated, transparent, and reproducible. Here we present SpeciesGeoCoder, an open-source software package written in Python and R that allows for easy coding of species into user-defined operational units. These units may be of any size and be purely spatial (i.e., polygons), such as countries and states, conservation areas, biomes, islands, biodiversity hotspots, and areas of endemism, but may also include elevation ranges. This flexibility allows scoring species into complex categories, such as those encountered in topographically and ecologically heterogeneous landscapes. In addition, SpeciesGeoCoder can be used to facilitate sorting and cleaning of occurrence data obtained from online databases, and for testing the impact of incorrect identification of specimens on the spatial coding of species. The various outputs of SpeciesGeoCoder include quantitative biodiversity statistics, global and local distribution maps, and files that can be used directly in many phylogeny-based applications for ancestral range reconstruction, investigations of biome evolution, and other comparative methods. Our simulations indicate that even datasets containing hundreds of millions of records can be analyzed in a relatively short time on a standard computer.
We exemplify the use of SpeciesGeoCoder by inferring the historical dispersal of birds across the Isthmus of Panama, showing that lowland species crossed the Isthmus about twice as frequently as montane species, with a marked increase in the number of dispersals during the last 10 million years. [ancestral area reconstruction; biodiversity patterns; ecology; evolution; point in polygon; species distribution data.]
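The core operation described for SpeciesGeoCoder, scoring occurrence records into user-defined polygons and optionally into elevation ranges, can be illustrated with a minimal Python sketch. This is not the package's actual API or implementation: the function names `point_in_polygon` and `code_species`, the tuple-based data layout, the even-odd ray-casting test, and the toy regions and elevation bands are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

Point = Tuple[float, float]  # (longitude, latitude)
# (species name, longitude, latitude, elevation in metres or None)
Occurrence = Tuple[str, float, float, Optional[float]]


def point_in_polygon(lon: float, lat: float, polygon: Sequence[Point]) -> bool:
    """Even-odd ray casting: a point is inside if a ray cast eastward from it
    crosses the polygon boundary an odd number of times."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed
        # (this also guarantees y2 != y1, so the division below is safe).
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge meets the point's latitude.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def code_species(
    occurrences: List[Occurrence],
    units: Dict[str, Sequence[Point]],
    elevation_bands: Optional[Dict[str, Tuple[float, float]]] = None,
) -> Dict[str, Set[str]]:
    """Assign each species the set of operational units its records fall into.

    If elevation bands are given, a record with a known elevation is coded as
    "unit:band" (bands are half-open intervals [low, high)); otherwise the
    record is coded by polygon alone.
    """
    coded: Dict[str, Set[str]] = {}
    for species, lon, lat, elev in occurrences:
        labels = coded.setdefault(species, set())
        for unit_name, polygon in units.items():
            if not point_in_polygon(lon, lat, polygon):
                continue
            if elevation_bands is None or elev is None:
                labels.add(unit_name)
                continue
            for band_name, (low, high) in elevation_bands.items():
                if low <= elev < high:
                    labels.add(f"{unit_name}:{band_name}")
    return coded


# Hypothetical toy data: two adjacent square regions and two elevation bands.
units = {
    "A": [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)],
    "B": [(10.0, 0.0), (20.0, 0.0), (20.0, 10.0), (10.0, 10.0)],
}
bands = {"lowland": (0.0, 1000.0), "montane": (1000.0, 5000.0)}
occurrences = [
    ("sp1", 5.0, 5.0, 200.0),
    ("sp1", 15.0, 5.0, 2500.0),
    ("sp2", 25.0, 5.0, 100.0),  # falls outside both regions
]
coded = code_species(occurrences, units, bands)
# coded == {"sp1": {"A:lowland", "B:montane"}, "sp2": set()}
```

The even-odd rule is one common point-in-polygon test; points lying exactly on a shared border are ambiguous under it, so a production tool would need an explicit rule for such records, and would typically rely on a spatial library and indexing to handle very large occurrence datasets efficiently.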
Recent molecular studies have identified substantial fungal diversity in indoor environments. Fungi and fungal particles have been linked to a range of potentially unwanted effects in the built environment, including asthma, decay of building materials, and food spoilage. The study of the built mycobiome is hampered by a number of constraints, one of which is the poor state of the metadata annotation of fungal DNA sequences from the built environment in public databases. In order to enable precise interrogation of such data – for example, “retrieve all fungal sequences recovered from bathrooms” – a workshop was organized at the University of Gothenburg (May 23-24, 2016) to annotate public fungal barcode (ITS) sequences according to the MIxS-Built Environment annotation standard (http://gensc.org/mixs/). The 36 participants assembled a total of 45,488 data points from the published literature, including the addition of 8,430 instances of countries of collection from a total of 83 countries, 5,801 instances of building types, and 3,876 instances of surface-air contaminants. The results were implemented in the UNITE database for molecular identification of fungi (http://unite.ut.ee) and were shared with other online resources. Data obtained from human/animal pathogenic fungi will furthermore be verified against culture-based metadata for subsequent inclusion in the ISHAM-ITS database (http://its.mycologylab.org).
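Why structured annotation matters for a query such as "retrieve all fungal sequences recovered from bathrooms" can be shown with a minimal Python sketch. The record class, its field names (`country`, `building_type`, `room`), and the placeholder accessions are hypothetical; they are not the official MIxS-Built Environment terms or the UNITE schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FungalITSRecord:
    """Hypothetical, simplified metadata record for one fungal ITS sequence.

    Field names are illustrative only, not the official MIxS-BE terms.
    """
    accession: str
    country: Optional[str] = None
    building_type: Optional[str] = None
    room: Optional[str] = None


def retrieve(records: List[FungalITSRecord], **criteria: str) -> List[str]:
    """Return accessions whose metadata match every given field/value pair.

    A record lacking a value for a requested field (None) can never match,
    so unannotated sequences stay invisible to such queries.
    """
    return [
        r.accession
        for r in records
        if all(getattr(r, field) == value for field, value in criteria.items())
    ]


# Placeholder records; SEQ_C was never annotated with a room.
records = [
    FungalITSRecord("SEQ_A", country="Sweden", building_type="residence", room="bathroom"),
    FungalITSRecord("SEQ_B", country="Sweden", building_type="residence", room="kitchen"),
    FungalITSRecord("SEQ_C", country="Estonia"),
    FungalITSRecord("SEQ_D", country="Estonia", building_type="office", room="bathroom"),
]

bathroom_hits = retrieve(records, room="bathroom")
# bathroom_hits == ["SEQ_A", "SEQ_D"]
```

Adding annotations such as country of collection or building type, as done at the workshop, is what turns records like `SEQ_C` from unreachable into retrievable by queries of this kind.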
Papers by Ruud Scharn