Downloaded from rsbl.royalsocietypublishing.org on August 25, 2011

Biol. Lett. doi:10.1098/rsbl.2011.0356
Published online 27 April 2011 in advance of the print journal
This journal is © 2011 The Royal Society

Palaeontology
Meeting report

Synthesizing and databasing fossil calibrations: divergence dating and beyond

Daniel T. Ksepka1,3,*, Michael J. Benton4, Matthew T. Carrano5, Maria A. Gandolfo6, Jason J. Head7, Elizabeth J. Hermsen6, Walter G. Joyce8, Kristin S. Lamm2, José S. L. Patané9, Matthew J. Phillips10, P. David Polly11, Marcel Van Tuinen12, Jessica L. Ware13,14, Rachel C. M. Warnock4 and James F. Parham15

1 Department of Marine, Earth and Atmospheric Sciences, and 2 Bioinformatics Research Center, North Carolina State University, Raleigh, NC 27695, USA
3 Department of Paleontology, North Carolina Museum of Natural Sciences, Raleigh, NC 27601, USA
4 School of Earth Sciences, University of Bristol, Bristol BS8 1RJ, UK
5 Department of Paleobiology, National Museum of Natural History, Smithsonian Institution, Washington, DC 20013, USA
6 LH Bailey Hortorium, Department of Plant Biology, Cornell University, Ithaca, NY 14853, USA
7 Department of Biology, University of Toronto Mississauga, Mississauga, Ontario, Canada L5L 1C6
8 Department of Geosciences, University of Tübingen, 72074 Tübingen, Germany
9 Laboratorio de Ecologia e Evolucao, Instituto Butantan, Sao Paulo, Brazil
10 School of Biological Sciences, University of Queensland, Saint Lucia 4072, Australia
11 Department of Geological Sciences, Indiana University, Bloomington, IN 47405, USA
12 Department of Biology and Marine Biology, University of North Carolina at Wilmington, Wilmington, NC 28403, USA
13 Department of Biology, Rutgers, The State University of New Jersey, Newark, NJ 07102, USA
14 Division of Invertebrate Zoology, American Museum of Natural History, New York, NY 10024, USA
15 Alabama Museum of Natural History, University of Alabama, Tuscaloosa, AL 3548, USA
*Author for correspondence (daniel_ksepka@ncsu.edu).

Keywords: molecular clocks; palaeontology; bioinformatics; database

Divergence dating studies, which combine temporal data from the fossil record with branch length data from molecular phylogenetic trees, represent a rapidly expanding approach to understanding the history of life.
The National Evolutionary Synthesis Center hosted the first Fossil Calibrations Working Group (3–6 March, 2011, Durham, NC, USA), bringing together palaeontologists, molecular evolutionists and bioinformatics experts to present perspectives from disciplines that generate, model and use fossil calibration data. Presentations and discussions focused on channels for interdisciplinary collaboration, best practices for justifying, reporting and using fossil calibrations and roadblocks to synthesis of palaeontological and molecular data. Bioinformatics solutions were proposed, with the primary objective being a new database for vetted fossil calibrations with linkages to existing resources, targeted for a 2012 launch.

Received 28 March 2011
Accepted 8 April 2011

1. INTRODUCTION

Temporal data from the fossil record provide crucial calibration information for divergence time estimation, enabling rates of molecular evolution and absolute time to be disentangled. A survey of the literature reveals a recent, rapid increase in the number of divergence dating analyses that incorporate fossil calibrations (figure 1). This proliferation can be attributed to advances in theory, methods and software. Recent innovations allow age distributions, rather than simple point or hard minimum calibrations, to be applied to nodes. New methods have sparked a greater demand for fossil data to inform more complex age distributions and models of molecular rate variation, and also to increase calibration density across the Tree of Life. Unfortunately, the development of rigorous approaches for choosing fossil calibrations, justifying their phylogenetic placement and incorporating stratigraphic uncertainty has lagged behind advances in molecular methods. Because the fossil record is critical to modern approaches to divergence time estimation, addressing this problem is of immediate concern.
The National Evolutionary Synthesis Center (NESCent) Fossil Calibrations meeting was convened to address best practices in extracting and communicating fossil calibrations. In this report, we examine some of the most compelling issues surrounding the synthesis of palaeontology and molecular biology, characterize several recurring challenges and summarize participant consensus on pathways for advancing the field. Bioinformatics solutions proposed at the meeting are outlined, specifically, a new database for vetted fossil calibrations.

2. CHALLENGES IN FORMULATING AND APPLYING FOSSIL CALIBRATIONS

Despite the ever-increasing use of fossil calibrations (figure 1), many divergence dating studies have spawned controversy. Palaeontologists frequently voice concern over problematic calibrations (e.g. [2]), and molecular evolutionists have repeatedly articulated the dangers of inappropriate application of fossil data [3,4]. Unfortunately, such concerns are often expressed after divergence results for a clade of interest have already been published and rarely result in reanalysis.

Daniel Ksepka presented data from 171 fossil calibrations, quantifying the prevalence of inaccurate calibrations within Aves. For 67 per cent of these calibrations, the phylogenetic position of the fossil remains untested, meaning that the fossil may not even pertain to the node of interest. Only 24 per cent of calibrations were based on fossils that had been included in phylogenetic analyses. Of these, 33 per cent were improperly applied, potentially introducing large errors into divergence results. The most common error was the use of a stem lineage fossil (one that diverged before the last common ancestor of all extant species, and so before the target node) to calibrate the minimum age of the crown clade. Such errors can cause overestimation of node ages throughout the tree. These problems are widespread across the Tree of Life.
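The stem-versus-crown distinction described above can be made concrete with a small sketch. It assumes a toy tree stored as child-to-parent links; the taxon names and the helper functions `ancestors`, `crown_node` and `is_crown_fossil` are hypothetical and do not come from any dataset or software discussed at the meeting. The idea is simply that a fossil can supply a minimum age for the crown node only if it descends from the last common ancestor of the extant species.

```python
# Toy tree as child -> parent links (hypothetical taxa, for illustration only).
PARENT = {
    "crown": "stem_root",        # last common ancestor of all extant species
    "extant_A": "crown",
    "node_BC": "crown",
    "extant_B": "node_BC",
    "extant_C": "node_BC",
    "fossil_crown": "node_BC",   # fossil nested inside the crown clade
    "fossil_stem": "stem_root",  # fossil that diverged before the crown node
}


def ancestors(node):
    """Return the ancestors of a node, nearest first."""
    path = []
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def crown_node(extant_taxa):
    """Return the most recent common ancestor of the extant taxa."""
    shared = set(ancestors(extant_taxa[0]))
    for taxon in extant_taxa[1:]:
        shared &= set(ancestors(taxon))
    # Walk up from the first taxon; the first shared ancestor is the MRCA.
    for node in ancestors(extant_taxa[0]):
        if node in shared:
            return node
    raise ValueError("taxa share no common ancestor in this tree")


def is_crown_fossil(fossil, extant_taxa):
    """True if the fossil descends from the crown node, False if it is stem."""
    return crown_node(extant_taxa) in ancestors(fossil)


EXTANT = ["extant_A", "extant_B", "extant_C"]
for name in ("fossil_crown", "fossil_stem"):
    usable = is_crown_fossil(name, EXTANT)
    print(name, "may calibrate the crown minimum" if usable else "is a stem fossil")
```

In this toy case only `fossil_crown` would be acceptable as a minimum constraint on the crown node; `fossil_stem` could at most constrain the node below it.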
Meeting participants recognized similar patterns for published calibrations in plants, mammals and insects. Because accurate calibrations are strongly correlated with accurate divergence dating results, poor calibrations represent a major problem. Nevertheless, fossil calibrations are often obtained and applied with less rigour than the DNA sequencing, alignment and phylogeny inference steps of an analysis. Some studies apply ages obtained from ‘the fossil record’, without reference to individual species or specimens, while others choose fossils and ages from outdated compendia or previous divergence dating studies. Literature justifying the placement of the fossil or assigned age is only occasionally cited. It is theoretically possible to sidestep the issue of assigning calibration points to nodes a priori by including fossils directly in the phylogenetic analysis as terminal taxa with known ages and simultaneously inferring phylogeny and divergence times. Although a joint estimation approach is not implemented in any widely used program (e.g. BEAST, r8s), there are practical workarounds using existing software [5].

[Figure 1: plot of the number of fossil calibrations (vertical axis, 0 to 1800) against year (horizontal axis, 1966 to 2008).]
Figure 1. Summary of fossil calibrations in the tetrapod literature from the interval between the first fossil calibration by Zuckerkandl & Pauling [1] in 1965 and the year 2008. Data include 1603 individual calibrations derived from over 600 publications.

Some calibration issues are clade-specific. Differences in morphology, ecology and preservation potential among clades affect our ability to define calibration standards that apply uniformly across the Tree of Life.
Plants pose a particular challenge because various structures of a single species (e.g. leaves, pollen, fruit) are frequently recovered in isolation and apomorphies are unevenly distributed throughout these structures [6]. Maria Gandolfo and Elizabeth Hermsen detailed palaeobotanical examples in which pollen and macrofossil records provide vastly different estimates for the first appearance of a clade. In many cases, the oldest potential records of a clade are based on precisely those structures that are the most difficult to place with phylogenetic precision, presenting the dilemma of choosing between fossils that may be far younger than the actual age of origin and fossils that may be misidentified. In some insect clades, preservational biases favouring characters with high levels of homoplasy can be problematic. For example, wing venation pattern is often preserved with high fidelity, but convergences related to flight style or wing size reduction lead to a high risk of erroneous clade assignments when only such venation data are available.

Debates over whether, and how, maximum ages can be extracted reliably from the fossil record were spurred by presentations on formulating calibration age distributions. Establishing a maximum constraint for a node is less straightforward than identifying a minimum constraint based on the oldest fossil from a clade. Many palaeontologists are understandably reluctant to articulate maxima, but given that most divergence dating programs require at least one maximum age, there is a pressing need to tackle this issue. Michael Benton presented an overview of methods for assigning maximum constraints, including ‘soft maxima’ that assign a very low probability of a clade existing beyond a specified maximum age, but allow this age to be surpassed if the molecular data strongly suggest the actual age is older.
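As a rough illustration of what a soft maximum means in practice, the sketch below defines one simple possibility for a calibration density: zero below a hard minimum age, flat between the minimum and the maximum, and an exponentially decaying tail beyond the maximum that carries a small fixed probability. The flat shape, the 2.5 per cent tail, the function name `soft_max_density` and the example ages are all assumptions chosen for illustration; they are not a method prescribed at the meeting or the parameterization of any particular dating program.

```python
import math


def soft_max_density(age, min_age, max_age, tail_prob=0.025):
    """Density of a node age (Ma) under a hard minimum and a soft maximum.

    Assumed shape (illustrative only):
      * zero probability for ages younger than min_age (hard minimum)
      * uniform density between min_age and max_age
      * exponential tail older than max_age holding tail_prob in total
    """
    if not 0.0 < tail_prob < 1.0:
        raise ValueError("tail_prob must lie strictly between 0 and 1")
    if max_age <= min_age:
        raise ValueError("max_age must be older than min_age")
    if age < min_age:
        return 0.0
    height = (1.0 - tail_prob) / (max_age - min_age)
    if age <= max_age:
        return height
    # Rate chosen so the tail is continuous at max_age and integrates to tail_prob.
    rate = height / tail_prob
    return height * math.exp(-rate * (age - max_age))


# Hypothetical calibration: oldest fossil at 60 Ma, soft maximum at 80 Ma.
for age in (55.0, 70.0, 81.0, 85.0):
    print(age, soft_max_density(age, min_age=60.0, max_age=80.0))
```

Ages older than the soft maximum keep a small but non-zero density, so strongly informative molecular data can still pull the estimate past 80 Ma, whereas ages younger than the fossil are excluded outright.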
Methods for informing maxima include confidence intervals based on the stratigraphic distribution of fossils, using the age of fossils from the sister clade, and phylogenetic bracketing with stratigraphic bounding. The first method can provide probability measures based on sampling data, but the second, although often recommended, suffers from the problem that dates of outgroups need not be older than the target node. The last approach, favoured by most meeting participants, relies on the age of the youngest deposit (often a Lagerstätte) that might be anticipated to contain fossils of the clade in question but does not [7].

One underappreciated issue raised in several discussions is the difficulty in specifying calibrations for groups with poor fossil records. If only a single fossil exists, it provides a minimum but no viable justification for a maximum (e.g. there is no reasonable expectation for such a poorly represented clade to turn up in the next oldest Lagerstätte, and confidence intervals cannot be generated from fewer than three data points). It is for precisely these clades that molecular divergence time estimation has the greatest potential to inform, but also the greatest potential to mislead.

3. BIOINFORMATICS SOLUTIONS

At present, there is no centralized source for fossil calibration data. Yet such a bioinformatics resource is clearly needed. A database avoids many problems posed by the static nature of print resources and also serves multiple communities by consolidating data that are spread across disparate palaeontological, geological and systematic literature. A roundtable discussion focused on developing a prototype design for a new open access database took place on the final day of the meeting.
Objectives of the database initiative include: (i) reporting vetted fossil calibrations in a standardized format, (ii) requiring a tree-based representation of a fossil’s phylogenetic placement to make explicit both the most exclusive placement of the fossil and any potential conflicts between morphological and molecular phylogenies, and (iii) incentivizing palaeontologists to contribute by linking publications directly with database search results.

Debate focused on the standards required for vetting a fossil calibration. Participants agreed that calibrations should at minimum be phylogenetically justified through either a phylogenetic analysis or a list of apomorphies that allow unambiguous placement of the fossil within the clade specified. Age should be justified by stratigraphic evidence (which may include radiometric, palaeomagnetic or biostratigraphic data), with dating error appropriately incorporated. A revision field is essential to allow contributors and administrators to alert users to update a calibration when, for example, new stratigraphic research refines the estimated age of a fossil locality. Requiring estimates for soft maximum ages to be submitted alongside minimum constraints was considered, but ultimately it was decided that specifying maximum age estimates should remain optional. This approach balances the need for palaeontologists to inform maximum ages against the difficulties in justifying maxima in cases where the fossil record is poor or phylogenetic uncertainty is high.

A recurring issue with database initiatives has been motivating contributors. Palaeontologists are typically the only specialists with the expertise to identify and vet calibrations, but are seldom directly involved as authors of divergence dating studies. Thus, fossil calibrations often enter the literature haphazardly. Publication credit is one way to encourage the palaeontological community to take an active role in providing clearly justified fossil calibrations.
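The vetting standards agreed by participants could be captured in a record structure along the following lines. This is only a sketch: the class `FossilCalibration`, its field names, the function `vetting_problems` and the example values are hypothetical and do not describe the actual schema of the planned database. It shows the minimum requirements (phylogenetic justification, stratigraphic evidence with dating error), the optional soft maximum and the revision field.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FossilCalibration:
    """One candidate database entry; every field name here is hypothetical."""
    node: str                                     # clade or node being calibrated
    specimen: str                                 # fossil specimen identifier
    min_age_ma: float                             # minimum age constraint (Ma)
    age_error_ma: Optional[float] = None          # dating error on the age
    phylogenetic_analysis: Optional[str] = None   # citation of an analysis, if any
    apomorphies: List[str] = field(default_factory=list)
    stratigraphic_evidence: List[str] = field(default_factory=list)
    soft_max_age_ma: Optional[float] = None       # optional, as agreed at the meeting
    revisions: List[str] = field(default_factory=list)


def vetting_problems(cal: FossilCalibration) -> List[str]:
    """Return the unmet minimum standards; an empty list means the entry passes."""
    problems = []
    if not cal.phylogenetic_analysis and not cal.apomorphies:
        problems.append("no phylogenetic analysis or apomorphy list")
    if not cal.stratigraphic_evidence:
        problems.append("no stratigraphic evidence for the age")
    if cal.age_error_ma is None:
        problems.append("dating error not reported")
    if cal.soft_max_age_ma is not None and cal.soft_max_age_ma < cal.min_age_ma:
        problems.append("soft maximum is younger than the minimum age")
    return problems


# Placeholder entries, not real calibrations.
justified = FossilCalibration(
    node="crown Clade X",
    specimen="placeholder specimen number",
    min_age_ma=55.0,
    age_error_ma=0.5,
    apomorphies=["placeholder character state"],
    stratigraphic_evidence=["radiometric"],
)
unjustified = FossilCalibration(node="crown Clade X", specimen="unspecified", min_age_ma=55.0)

print(vetting_problems(justified))
print(vetting_problems(unjustified))
```

Leaving `soft_max_age_ma` empty does not count against an entry, which mirrors the decision to keep maximum ages optional.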
A partnership with the open access journal Palaeontologia Electronica (PE) has been initiated to address the issue of contributor motivation. David Polly presented an overview of electronic publication logistics and database connectivity potential at PE. Submissions to a special Fossil Calibrations section at PE will serve as a peer-reviewed pipeline for calibrations to enter the database. Linking database entries directly to publications credits palaeontological contributions, facilitates citations and directs data users to details of the calibration justification. Of course, a wealth of fossil calibrations already exists. Calibrations that meet the standards outlined above but were published outside PE will also be eligible for entry into the database.

The importance of creating a centralized clearinghouse for fossil calibration data is balanced against concern over the creation of independent ‘data silos’. If information is stored behind walls, or if linkages to existing databases are lacking, the potential use of the database may be inhibited and effort may be duplicated. Michael Benton, Matthew Carrano and Marcel Van Tuinen presented overviews of existing online resources for obtaining data on the timing of evolution. Date A Clade (fossilrecord.net/dateaclade) provides minimum and maximum constraints from the fossil record for 40 key nodes on the animal Tree of Life. These calibrations provide an ideal example of the type of vetted data the Fossil Calibrations database will curate. The Paleobiology Database (paleodb.org) stores data on over 175 000 fossil taxa and includes many powerful features such as a set of time scales that can be updated centrally (e.g. revised boundaries for a time interval can be simultaneously updated for all fossils from that interval).
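The benefit of a centrally updated time scale can be shown with a minimal sketch. It assumes that each fossil occurrence stores only the name of its time interval, while the numerical boundaries live in one shared table; the interval name, taxon names, ages and the function `age_bounds` are hypothetical and do not reflect the Paleobiology Database's actual data model.

```python
# Shared time scale: interval name -> (older bound, younger bound) in Ma.
# All names and numbers are placeholders for illustration.
time_scale = {"Interval X": (40.0, 30.0)}

# Occurrences refer to intervals by name and never store ages directly.
occurrences = [
    {"taxon": "Taxon a", "interval": "Interval X"},
    {"taxon": "Taxon b", "interval": "Interval X"},
]


def age_bounds(occurrence, scale):
    """Look up the current age bounds of an occurrence from the shared scale."""
    return scale[occurrence["interval"]]


before = [age_bounds(o, time_scale) for o in occurrences]

# A single central revision of the interval boundary...
time_scale["Interval X"] = (41.2, 30.0)

# ...is picked up by every occurrence assigned to that interval.
after = [age_bounds(o, time_scale) for o in occurrences]

print(before)
print(after)
```

Because ages are resolved at query time, no individual fossil record has to be edited when a boundary is revised.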
Some of the myriad fossils listed in the Paleobiology Database as oldest occurrences are already well-established calibration points, but others are more ambiguous and illustrate the perils involved in ‘reading the record’ without expert guidance. Regardless, interconnectivity with the Paleobiology Database was considered a major long-term goal.

In summary, the immediate goal outlined at the NESCent Fossil Calibrations meeting is to create a database that accepts vetted calibration data from palaeontologists and provides these data to molecular evolutionists in a streamlined fashion. As this initiative moves forward, the supply of vetted fossil calibrations will grow and new opportunities for synthesis will emerge.

This meeting was supported by the National Evolutionary Synthesis Center (NESCent), NSF no. EF-0905606. We thank NESCent for hosting the meeting and Karen Cranston, Vladimir Gapayev, Hilmar Lapp, Todd Vision and Danielle Wilson for contributing their expertise to this project.

1 Zuckerkandl, E. & Pauling, L. B. 1965 Evolutionary divergence and convergence in proteins. In Evolving genes and proteins (eds V. Bryson & H. Vogel), pp. 97–166. New York, NY: Academic Press.
2 Parham, J. F. & Irmis, R. B. 2008 Caveats on the use of fossil calibrations for molecular dating: a comment on Near et al. Am. Nat. 171, 132–136. (doi:10.1086/524198)
3 Graur, D. & Martin, W. 2004 Reading the entrails of chickens: molecular timescales of evolution and the illusion of precision. Trends Genet. 20, 80–86. (doi:10.1016/j.tig.2003.12.003)
4 Pulquério, M. J. F. & Nichols, R. A. 2007 Dates from the molecular clock: how wrong can we be? Trends Ecol. Evol. 22, 180–184. (doi:10.1016/j.tree.2006.11.013)
5 Ware, J., Grimaldi, D. & Engel, M. 2010 The effects of fossil placement and calibration on divergence times and rates: an example from the termites (Insecta: Isoptera). Arthropod Struct. Dev. 38, 204–219. (doi:10.1016/j.asd.2009.11.003)
6 Gandolfo, M. A., Nixon, K. C. & Crepet, W. L. 2008 Selection of fossils for calibration of molecular dating models. Ann. Missouri Botanical Garden 95, 34–42. (doi:10.3417/2007064)
7 Benton, M. J. & Donoghue, P. C. 2007 Paleontological evidence to date the Tree of Life. Mol. Biol. Evol. 24, 26–53. (doi:10.1093/molbev/msl150)