Abstract
Existing benchmark datasets for evaluating variant-calling accuracy are constructed from a consensus of known short-variant callers and are therefore biased toward easy regions that these algorithms can access. We derived a new benchmark dataset from the de novo PacBio assemblies of two fully homozygous human cell lines; it provides a more accurate and less biased estimate of small-variant-calling error rates in a realistic context.
Acknowledgements
We are grateful to E. Eichler (Department of Genome Sciences, University of Washington, Seattle, WA, USA) for providing DNA from CHM cell lines. We thank A. Carrol for testing PacBio’s new consensus caller, Arrow, and M. DePristo, J. Zook and B. Chapman for helpful suggestions. This study was supported by the US National Institutes of Health (NIH) (grants 5U54DK105566-04 and 5U01HG009088-03 to D.M. and B.N.; grant 1R01HG010040-01 to H.L.).
Author information
Contributions
H.L. conceived the study, constructed the benchmark dataset and drafted the manuscript; H.L., J.M.B. and Y.F. designed the experiment; L.G. and M.F. analyzed the data and applied the benchmark; and D.M. and B.N. supervised the project. All of the authors helped to revise the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Software
Syndip evaluation scripts and helper scripts used to generate the benchmark dataset
Supplementary Data 1
Numerical data and gnuplot script used to generate Fig. 2
About this article
Cite this article
Li, H., Bloom, J.M., Farjoun, Y. et al. A synthetic-diploid benchmark for accurate variant-calling evaluation. Nat Methods 15, 595–597 (2018). https://doi.org/10.1038/s41592-018-0054-7