Downloaded from rsbl.royalsocietypublishing.org on August 25, 2011
Synthesizing and databasing fossil calibrations: divergence
dating and beyond
Daniel T. Ksepka, Michael J. Benton, Matthew T. Carrano, Maria A. Gandolfo, Jason J. Head,
Elizabeth J. Hermsen, Walter G. Joyce, Kristin S. Lamm, José S. L. Patané, Matthew J. Phillips, P.
David Polly, Marcel Van Tuinen, Jessica L. Ware, Rachel C. M. Warnock and James F. Parham
Biol. Lett. published online 27 April 2011
doi: 10.1098/rsbl.2011.0356
Published online 27 April 2011 in advance of the print journal.
This journal is © 2011 The Royal Society
Keywords: molecular clocks; palaeontology;
bioinformatics; database
Palaeontology
Meeting report
Daniel T. Ksepka1,3,*, Michael J. Benton4,
Matthew T. Carrano5, Maria A. Gandolfo6,
Jason J. Head7, Elizabeth J. Hermsen6,
Walter G. Joyce8, Kristin S. Lamm2,
José S. L. Patané9, Matthew J. Phillips10,
P. David Polly11, Marcel Van Tuinen12,
Jessica L. Ware13,14, Rachel C. M. Warnock4
and James F. Parham15
1 Department of Marine, Earth and Atmospheric Sciences, and 2 Bioinformatics Research Center, North Carolina State University, Raleigh, NC 27695, USA
3 Department of Paleontology, North Carolina Museum of Natural Sciences, Raleigh, NC 27601, USA
4 School of Earth Sciences, University of Bristol, Bristol BS8 1RJ, UK
5 Department of Paleobiology, National Museum of Natural History, Smithsonian Institution, Washington, DC 20013, USA
6 LH Bailey Hortorium, Department of Plant Biology, Cornell University, Ithaca, NY 14853, USA
7 Department of Biology, University of Toronto Mississauga, Mississauga, Ontario, Canada L5L 1C6
8 Department of Geosciences, University of Tübingen, 72074 Tübingen, Germany
9 Laboratorio de Ecologia e Evolucao, Instituto Butantan, Sao Paulo, Brazil
10 School of Biological Sciences, University of Queensland, Saint Lucia 4072, Australia
11 Department of Geological Sciences, Indiana University, Bloomington, IN 47405, USA
12 Department of Biology and Marine Biology, University of North Carolina at Wilmington, Wilmington, NC 28403, USA
13 Department of Biology, Rutgers, The State University of New Jersey, Newark, NJ 07102, USA
14 Division of Invertebrate Zoology, American Museum of Natural History, New York, NY 10024, USA
15 Alabama Museum of Natural History, University of Alabama, Tuscaloosa, AL 3548, USA
*Author for correspondence (daniel_ksepka@ncsu.edu).
Divergence dating studies, which combine temporal data from the fossil record with branch
length data from molecular phylogenetic trees,
represent a rapidly expanding approach to understanding the history of life. The National Evolutionary
Synthesis Center hosted the first Fossil Calibrations Working Group (3–6 March 2011,
Durham, NC, USA), bringing together palaeontologists, molecular evolutionists and bioinformatics
experts to present perspectives from disciplines
that generate, model and use fossil calibration
data. Presentations and discussions focused on
channels for interdisciplinary collaboration, best
practices for justifying, reporting and using fossil
calibrations and roadblocks to synthesis of
palaeontological and molecular data. Bioinformatics solutions were proposed, with the
primary objective being a new database for vetted
fossil calibrations with linkages to existing
resources, targeted for a 2012 launch.
Received 28 March 2011
Accepted 8 April 2011
1. INTRODUCTION
Temporal data from the fossil record provide crucial
calibration information for divergence time estimation,
enabling rates of molecular evolution and absolute
time to be disentangled. A survey of the literature reveals
a recent, rapid increase in the number of divergence
dating analyses that incorporate fossil calibrations
(figure 1). This proliferation can be attributed to
advances in theory, methods and software. Recent innovations allow age distributions, rather than simple point
or hard minimum calibrations, to be applied to nodes.
New methods have sparked a greater demand for fossil
data to inform more complex age distributions and
models of molecular rate variation, and also to increase
calibration density across the Tree of Life. Unfortunately, the development of rigorous approaches for
choosing fossil calibrations, justifying their phylogenetic
placement and incorporating stratigraphic uncertainty
has lagged behind advances in molecular methods.
Because the fossil record is critical to modern approaches
to divergence time estimation, addressing this problem is
of immediate concern.
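
As a minimal illustration of why a calibration is needed to separate rate from time, consider the simplest possible case, a strict molecular clock, in which the pairwise genetic distance is d = 2rt. The sketch below is not the method of any program discussed at the meeting (current analyses use relaxed clocks and age distributions rather than point calibrations); the function names and all numbers are hypothetical.

```python
def rate_from_calibration(distance, calibration_age_ma):
    """Per-lineage rate (substitutions/site/Myr) under a strict clock, d = 2 * r * t."""
    if calibration_age_ma <= 0:
        raise ValueError("calibration age must be positive")
    return distance / (2.0 * calibration_age_ma)


def age_from_rate(distance, rate):
    """Divergence age (Ma) implied by a pairwise distance and a per-lineage rate."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return distance / (2.0 * rate)


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
calibrated_distance = 0.10   # substitutions/site between two calibrated lineages
fossil_age_ma = 50.0         # age assigned to their divergence from a fossil
target_distance = 0.06       # distance across an uncalibrated node

rate = rate_from_calibration(calibrated_distance, fossil_age_ma)
target_age = age_from_rate(target_distance, rate)
print(f"rate = {rate:.4f} subs/site/Myr, target node = {target_age:.1f} Ma")

# Under a strict clock, a calibration that is 20% too old makes every
# rate-derived age 20% too old as well.
inflated_rate = rate_from_calibration(calibrated_distance, fossil_age_ma * 1.2)
inflated_age = age_from_rate(target_distance, inflated_rate)
print(f"with an inflated calibration: {inflated_age:.1f} Ma")
```

Without the fossil age, only the product of rate and time is recoverable from the sequence data, which is why errors in a calibration propagate to every other node dated from it.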
The National Evolutionary Synthesis Center (NESCent) Fossil Calibrations meeting was convened to
address best practices in extracting and communicating
fossil calibrations. In this report, we examine some of
the most compelling issues surrounding the synthesis of
palaeontology and molecular biology, characterize several
recurring challenges and summarize participant consensus on pathways for advancing the field. We also outline the bioinformatics
solutions proposed at the meeting, in particular a new database for vetted fossil calibrations.
2. CHALLENGES IN FORMULATING AND
APPLYING FOSSIL CALIBRATIONS
Despite the ever-increasing use of fossil calibrations
(figure 1), many divergence dating studies have
spawned controversy. Palaeontologists frequently
voice concern over problematic calibrations (e.g. [2]),
and molecular evolutionists have repeatedly articulated the dangers of inappropriate application of fossil
data [3,4]. Unfortunately, such concerns are often
expressed after divergence results for a clade of interest
have already been published and rarely result in reanalysis. Daniel Ksepka presented data from 171 fossil
calibrations, quantifying the prevalence of inaccurate
calibrations within Aves. For 67 per cent of these calibrations, the phylogenetic position of the fossil remains
untested, meaning that the fossil may not even pertain
to the node of interest. Only 24 per cent of calibrations
were based on fossils that had been included in phylogenetic analyses. Of these, 33 per cent were improperly
applied, potentially introducing large errors into divergence results. The most common error was the use of a
stem lineage fossil (one that diverged before the last
common ancestor of all extant species, and so before
the target node) to calibrate the minimum age of the
crown clade. Such errors can cause overestimation of
node ages throughout the tree.
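
The stem versus crown distinction described above can be checked mechanically once a fossil has been placed on a tree. The following is a minimal sketch, assuming the tree is stored as a child-to-parent mapping; the taxon names and the helper functions are hypothetical and are not taken from any calibration software.

```python
def ancestors(node, parent):
    """Return the path from `node` up to the root, starting with `node` itself."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def crown_node(extant_taxa, parent):
    """Last common ancestor of all extant taxa in the clade."""
    paths = [ancestors(taxon, parent) for taxon in extant_taxa]
    shared = set(paths[0]).intersection(*(set(p) for p in paths[1:]))
    # Walking up from any tip, the first shared node is the last common ancestor.
    return next(node for node in paths[0] if node in shared)


def is_crown_fossil(fossil, extant_taxa, parent):
    """True if the fossil descends from the crown node, False if it lies on the stem."""
    return crown_node(extant_taxa, parent) in ancestors(fossil, parent)


# Hypothetical tree: one fossil on the stem lineage, one nested inside the crown.
parent = {
    "stem_fossil": "total_group",
    "crown": "total_group",
    "extant_A": "crown",
    "node_BC": "crown",
    "extant_B": "node_BC",
    "node_C": "node_BC",
    "extant_C": "node_C",
    "crown_fossil": "node_C",
}
extant = ["extant_A", "extant_B", "extant_C"]

for fossil in ("stem_fossil", "crown_fossil"):
    label = "crown" if is_crown_fossil(fossil, extant, parent) else "stem"
    print(f"{fossil}: {label} lineage")
```

In this toy tree only `crown_fossil` could justify a minimum age for the crown node; `stem_fossil` diverged before that node and constrains only the deeper split at `total_group`.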
Figure 1. Summary of fossil calibrations in the tetrapod literature from the interval between the first fossil calibration by Zuckerkandl &
Pauling [1] in 1965 and the year 2008. Data include 1603 individual calibrations derived from over 600 publications. (Horizontal axis: year, 1966 to 2008; vertical axis: fossil calibrations, 0 to 1800.)
These problems are widespread across the Tree of Life.
Meeting participants recognized similar patterns for
published calibrations in plants, mammals and insects.
Because accurate calibrations are strongly correlated
with accurate divergence dating results, poor calibrations
represent a major problem. Nevertheless, fossil calibrations are often obtained and applied with less rigour
than the DNA sequencing, alignment and phylogeny inference steps of an analysis. Some studies apply
ages obtained from ‘the fossil record’, without reference
to individual species or specimens, while others choose
fossils and ages from outdated compendia or previous
divergence dating studies. Literature justifying the placement of the fossil or assigned age is only occasionally
cited. It is theoretically possible to sidestep the issue of
assigning calibration points to nodes a priori by including
fossils directly in the phylogenetic analysis as terminal
taxa with known ages and simultaneously inferring phylogeny and divergence times. Although a joint estimation
approach is not implemented in any widely used program
(e.g. BEAST, r8s), there are practical workarounds using
existing software [5].
Some calibration issues are clade-specific. Differences in morphology, ecology and preservation potential
among clades affect our ability to define calibration standards that apply uniformly across the Tree of Life. Plants
pose a particular challenge because various structures of a
single species (e.g. leaves, pollen, fruit) are frequently
recovered in isolation and apomorphies are unevenly
distributed throughout these structures [6]. Maria Gandolfo and Elizabeth Hermsen detailed palaeobotanical
examples in which pollen and macrofossil records provide vastly different estimates for the first appearance of
a clade. In many cases, the oldest potential records of a
clade are based on precisely those structures that are
the most difficult to place with phylogenetic precision,
presenting the dilemma of choosing between fossils that
may be far younger than the actual age of origin and fossils
that may be misidentified. In some insect clades, preservational biases favouring characters with high levels of
homoplasy can be problematic. For example, wing venation pattern is often preserved with high fidelity, but
convergences related to flight style or wing size reduction
lead to a high risk of erroneous clade assignments when
only such venation data are available.
Debates over whether, and how, maximum ages can
be extracted reliably from the fossil record were spurred
by presentations on formulating calibration age distributions. Establishing a maximum constraint for a node
is less straightforward than identifying a minimum constraint based on the oldest fossil from a clade. Many
palaeontologists are understandably reluctant to articulate maxima, but given that most divergence dating
programs require at least one maximum age, there is
a pressing need to tackle this issue. Michael Benton presented an overview of methods for assigning maximum
constraints, including ‘soft maxima’ that assign a very
low probability of a clade existing beyond a specified
maximum age, but allow this age to be surpassed if the
molecular data strongly suggest the actual age is older.
Methods for informing maxima include confidence intervals based on the stratigraphic distribution of fossils,
using the age of fossils from the sister clade, and phylogenetic bracketing with stratigraphic bounding. The first
method can provide probability measures based on
sampling data, but the second, although often recommended, suffers from the problem that dates of
outgroups need not be older than the target node. The
last approach, favoured by most meeting participants,
relies on the age of the youngest deposit (often a Lagerstätte) that might be anticipated to contain fossils of the
clade in question but does not [7].
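
The idea of a soft maximum can be made concrete with a simple probability density. The sketch below is one possible formulation, assumed here for illustration only: a hard minimum, a flat density up to the soft maximum, and an exponentially decaying tail that holds a small share of the probability (2.5 per cent by default) beyond it. The function name, the default tail probability and the example ages are hypothetical, and the formulation is not claimed to match that of any particular dating program.

```python
import math


def soft_max_density(age, min_age, soft_max_age, tail_prob=0.025):
    """Density of a node age with a hard minimum and a soft maximum (ages in Ma)."""
    if not min_age < soft_max_age:
        raise ValueError("soft maximum must be older than the minimum")
    if not 0.0 < tail_prob < 1.0:
        raise ValueError("tail_prob must lie between 0 and 1")
    # Flat section between the bounds carries probability 1 - tail_prob.
    height = (1.0 - tail_prob) / (soft_max_age - min_age)
    if age < min_age:
        return 0.0                      # younger than the oldest fossil: excluded
    if age <= soft_max_age:
        return height
    # Exponential tail, continuous at the soft maximum, integrating to tail_prob.
    decay = height / tail_prob
    return height * math.exp(-decay * (age - soft_max_age))


# Hypothetical calibration: oldest fossil at 50 Ma, soft maximum at 70 Ma.
for age in (45.0, 60.0, 70.5, 72.0):
    print(f"{age:5.1f} Ma -> density {soft_max_density(age, 50.0, 70.0):.5f}")
```

Ages older than the soft maximum keep a small but non-zero density, so strong molecular evidence can still pull the estimate beyond the bound, whereas ages younger than the fossil minimum are ruled out entirely.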
One underappreciated issue raised in several discussions is the difficulty in specifying calibrations for
groups with poor fossil records. If only a single fossil
exists, it provides a minimum but no viable justification
for a maximum (e.g. there is no reasonable expectation for such a poorly represented clade to turn up in
the next oldest Lagerstätte, and confidence intervals
cannot be generated from fewer than three data
points). It is for precisely these clades that molecular
divergence time estimation has the greatest potential
to inform, but also the greatest potential to mislead.
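
The dependence of stratigraphic confidence intervals on sample size can be illustrated with a widely used classical formula, which assumes that fossil horizons are distributed randomly and independently through the true range: the range extension is R[(1 - C)^(-1/(n - 1)) - 1], where R is the observed range, n the number of fossil horizons and C the confidence level. This particular formula is an assumption of the sketch, not a method prescribed at the meeting, and the function names and ages are hypothetical.

```python
def range_extension(observed_range, n_horizons, confidence=0.95):
    """One-sided extension of an observed stratigraphic range (same units as the range)."""
    if n_horizons < 2:
        raise ValueError("at least two fossil horizons are needed to define a range")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie between 0 and 1")
    alpha = (1.0 - confidence) ** (-1.0 / (n_horizons - 1)) - 1.0
    return alpha * observed_range


def upper_bound_from_range(oldest_ma, youngest_ma, n_horizons, confidence=0.95):
    """Oldest fossil age plus the range extension: a candidate maximum age (Ma)."""
    observed_range = oldest_ma - youngest_ma
    return oldest_ma + range_extension(observed_range, n_horizons, confidence)


# Hypothetical clade with fossils spanning 50 to 40 Ma.
for n in (2, 3, 10, 50):
    bound = upper_bound_from_range(50.0, 40.0, n)
    print(f"{n:2d} horizons -> 95% upper bound at {bound:.1f} Ma")
```

Under this formula a single horizon gives no interval at all, and two horizons give an extension of about 19 times the observed range at 95 per cent confidence, which is too wide to be of practical use; the bound only becomes informative as the number of horizons grows.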
3. BIOINFORMATICS SOLUTIONS
At present, there is no centralized source for fossil
calibration data. Yet such a bioinformatics resource is
clearly needed. A database avoids many problems posed
by the static nature of print resources and also serves multiple communities by consolidating data that are spread
across disparate palaeontological, geological and systematic literature. A roundtable discussion focused on
developing a prototype design for a new open access database took place on the final day of the meeting. Objectives
of the database initiative include: (i) reporting vetted fossil
calibrations in a standardized format, (ii) requiring a tree-based representation of a fossil’s phylogenetic placement
to make explicit both the most exclusive placement of the
fossil and any potential conflicts between morphological and molecular phylogenies, and (iii) incentivizing
palaeontologists to contribute by linking publications
directly with database search results.
Debate focused on the standards required for vetting a
fossil calibration. Participants agreed that calibrations
should at minimum be phylogenetically justified through
either a phylogenetic analysis or a list of apomorphies that
allow unambiguous placement of the fossil within the
clade specified. Age should be justified by stratigraphic
evidence (which may include radiometric, palaeomagnetic or biostratigraphic data), with dating error
appropriately incorporated. A revision field is essential
so that contributors and administrators can alert users
when a calibration needs updating, for example because new stratigraphic research has refined the estimated age of a fossil locality.
Requiring estimates for soft maximum ages to be
submitted alongside minimum constraints was considered, but ultimately it was decided that specifying
maximum age estimates should remain optional. This
approach balances the need for palaeontologists to
inform maximum ages against the difficulties in justifying maxima in cases where the fossil record is poor or
phylogenetic uncertainty is high.
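
The vetting standards described above could be expressed as a record with a small set of required and optional fields. The following is a minimal sketch of such a record; the class, the field names and the example entry are hypothetical and do not describe the schema of the planned database.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FossilCalibration:
    """Hypothetical record layout; field names are illustrative only."""
    node: str                                     # clade or node being calibrated
    specimen: str                                 # specimen identifier
    min_age_ma: float                             # minimum age constraint (Ma)
    age_justification: str                        # stratigraphic evidence for the age
    age_error_ma: float = 0.0                     # dating error on the minimum age
    phylogenetic_analysis: Optional[str] = None   # analysis that included the fossil
    apomorphies: List[str] = field(default_factory=list)
    soft_max_age_ma: Optional[float] = None       # optional, as decided at the meeting
    publication: Optional[str] = None             # linked publication, for credit
    revisions: List[str] = field(default_factory=list)

    def vetting_problems(self) -> List[str]:
        """Return the reasons, if any, why this record fails the minimum standards."""
        problems = []
        if not self.phylogenetic_analysis and not self.apomorphies:
            problems.append("no phylogenetic analysis or apomorphy list")
        if not self.age_justification.strip():
            problems.append("no stratigraphic justification for the age")
        if self.min_age_ma <= 0 or self.age_error_ma < 0:
            problems.append("minimum age or dating error is not valid")
        if self.soft_max_age_ma is not None and self.soft_max_age_ma <= self.min_age_ma:
            problems.append("soft maximum is not older than the minimum")
        return problems


# Placeholder entries, not real calibrations.
vetted = FossilCalibration(
    node="Example crown clade",
    specimen="MUSEUM 0000",
    min_age_ma=50.0,
    age_justification="radiometric date on overlying ash bed",
    age_error_ma=0.5,
    apomorphies=["character 1", "character 2"],
)
unvetted = FossilCalibration(
    node="Example crown clade",
    specimen="unspecified",
    min_age_ma=50.0,
    age_justification="",
)
print(vetted.vetting_problems())
print(unvetted.vetting_problems())
```

Keeping the soft maximum optional while making the phylogenetic and stratigraphic justifications mandatory mirrors the balance reached in the discussion.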
A recurring issue with database initiatives has been
motivating contributors. Palaeontologists are typically
the only specialists with the expertise to identify and
vet calibrations, but are seldom directly involved
as authors of divergence dating studies. Thus, fossil
calibrations often enter the literature haphazardly.
Publication credit is one way to encourage the palaeontological community to take an active role in providing
clearly justified fossil calibrations. A partnership with
the open access journal Palaeontologia Electronica (PE)
has been initiated to address this issue. David Polly presented an overview of electronic publication logistics and
database connectivity potential at PE. Submissions to a
special Fossil Calibrations section at PE will serve as a
peer-reviewed pipeline for calibrations to enter the database. Linking database entries directly to publications
credits palaeontological contributions, facilitates citations and directs data users to details of the calibration
justification. Of course, a wealth of fossil calibrations
already exists. Calibrations that meet the standards
outlined above but were published outside PE will also
be eligible for entry into the database.
The importance of creating a centralized clearinghouse for fossil calibration data is balanced against
concern over the creation of independent ‘data silos’.
If information is stored behind walls, or if linkages to
existing databases are lacking, the potential use of the
database may be inhibited and effort may be duplicated. Michael Benton, Matthew Carrano and
Marcel Van Tuinen presented overviews of existing
online resources for obtaining data on the timing of
evolution. Date A Clade (fossilrecord.net/dateaclade)
provides minimum and maximum constraints from
the fossil record for 40 key nodes on the animal Tree
of Life. These calibrations provide an ideal example
of the type of vetted data the Fossil Calibrations database will curate. The Paleobiology Database
(paleodb.org) stores data on over 175 000 fossil taxa
and includes many powerful features such as a set of
time scales that can be updated centrally (e.g. revised
boundaries for a time interval can be simultaneously
updated for all fossils from that interval). Some of
the myriad fossils listed as oldest occurrences are
already well-established calibration points, but others
are more ambiguous and illustrate the perils involved
in ‘reading the record’ without expert guidance.
Regardless, interconnectivity with the Paleobiology
Database was considered a major long-term goal.
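
The benefit of a centrally updated time scale can be shown with a small data-structure sketch: if each fossil occurrence stores only the name of its time interval, revising the interval boundaries once changes the age reported for every occurrence. This is an assumed design for illustration, not the actual schema of the Paleobiology Database, and the interval names, taxa and ages are placeholders.

```python
# Interval name -> (older boundary, younger boundary) in Ma. Placeholder values.
timescale = {
    "Interval A": (60.0, 50.0),
    "Interval B": (50.0, 45.0),
}

# Occurrences refer to intervals by name and store no numerical ages themselves.
occurrences = [
    {"taxon": "Fossil taxon 1", "interval": "Interval A"},
    {"taxon": "Fossil taxon 2", "interval": "Interval A"},
    {"taxon": "Fossil taxon 3", "interval": "Interval B"},
]


def minimum_age(occurrence, timescale):
    """Conservative minimum age: the younger boundary of the occurrence's interval."""
    older, younger = timescale[occurrence["interval"]]
    return younger


before = [minimum_age(o, timescale) for o in occurrences]

# A single revision of the interval boundaries...
timescale["Interval A"] = (61.0, 51.5)

# ...is reflected in every occurrence assigned to that interval.
after = [minimum_age(o, timescale) for o in occurrences]
print(before)
print(after)
```

A calibrations database that links to such a resource, rather than copying ages into its own records, would inherit these revisions automatically.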
In summary, the immediate goal outlined at the
NESCent Fossil Calibrations meeting is to create a
database that accepts vetted calibration data from
palaeontologists and provides these data to molecular
evolutionists in a streamlined fashion. As this initiative
moves forward, the supply of vetted fossil calibrations
will grow and new opportunities for synthesis will
emerge.
This meeting was supported by the National Evolutionary
Synthesis Center (NESCent), NSF no. EF-0905606. We
thank NESCent for hosting the meeting and Karen
Cranston, Vladimir Gapayev, Hilmar Lapp, Todd Vision
and Danielle Wilson for contributing their expertise to this
project.
1 Zuckerkandl, E. & Pauling, L. B. 1965 Evolutionary divergence and convergence in proteins. In Evolving genes and proteins (eds V. Bryson & H. Vogel), pp. 97–166. New York, NY: Academic Press.
2 Parham, J. F. & Irmis, R. B. 2008 Caveats on the use of fossil calibrations for molecular dating: a comment on Near et al. Am. Nat. 171, 132–136. (doi:10.1086/524198)
3 Graur, D. & Martin, W. 2004 Reading the entrails of chickens: molecular timescales of evolution and the illusion of precision. Trends Genet. 20, 80–86. (doi:10.1016/j.tig.2003.12.003)
4 Pulquério, M. J. F. & Nichols, R. A. 2007 Dates from the molecular clock: how wrong can we be? Trends Ecol. Evol. 22, 180–184. (doi:10.1016/j.tree.2006.11.013)
5 Ware, J., Grimaldi, D. & Engel, M. 2010 The effects of fossil placement and calibration on divergence times and rates: an example from the termites (Insecta: Isoptera). Arthropod Struct. Dev. 38, 204–219. (doi:10.1016/j.asd.2009.11.003)
6 Gandolfo, M. A., Nixon, K. C. & Crepet, W. L. 2008 Selection of fossils for calibration of molecular dating models. Ann. Missouri Botanical Garden 95, 34–42. (doi:10.3417/2007064)
7 Benton, M. J. & Donoghue, P. C. 2007 Paleontological evidence to date the Tree of Life. Mol. Biol. Evol. 24, 26–53. (doi:10.1093/molbev/msl150)