Abstract
The emergence of HIV-1 group M subtype B in North American men who have sex with men was a key turning point in the HIV/AIDS pandemic. Phylogenetic studies have suggested cryptic subtype B circulation in the United States (US) throughout the 1970s (refs 1, 2) and an even older presence in the Caribbean (ref. 2). However, these temporal and geographical inferences, based upon partial HIV-1 genomes that postdate the recognition of AIDS in 1981, remain contentious (refs 3, 4) and the earliest movements of the virus within the US are unknown. We serologically screened >2,000 1970s serum samples and developed a highly sensitive approach for recovering viral RNA from degraded archival samples. Here, we report eight coding-complete genomes from US serum samples from 1978–1979, eight of the nine oldest HIV-1 group M genomes to date. This early, full-genome ‘snapshot’ reveals that the US HIV-1 epidemic exhibited extensive genetic diversity in the 1970s but also provides strong evidence for its emergence from a pre-existing Caribbean epidemic. Bayesian phylogenetic analyses estimate the jump to the US at around 1970 and place the ancestral US virus in New York City with 0.99 posterior probability support, strongly suggesting this was the crucial hub of early US HIV/AIDS diversification. Logistic growth coalescent models reveal epidemic doubling times of 0.86 and 1.12 years for the US and Caribbean, respectively, suggesting rapid early expansion in each location (ref. 3). Comparisons with more recent data reveal many of these insights to be unattainable without archival, full-genome sequences. We also recovered the HIV-1 genome from the individual known as ‘Patient 0’ (ref. 5) and found neither biological nor historical evidence that he was the primary case in the US or for subtype B as a whole. We discuss the genesis and persistence of this belief in the light of these evolutionary insights.
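As a reading aid for the doubling times quoted above, the following minimal sketch converts a doubling time into the exponential growth rate it implies, using the standard relation r = ln(2) / t_d that holds in the early, near-exponential phase of logistic growth. The function names are hypothetical and this is not the coalescent analysis performed in the study; only the two doubling times are taken from the abstract.

```python
import math


def growth_rate_from_doubling_time(doubling_time_years: float) -> float:
    """Exponential growth rate (per year) implied by a doubling time in years."""
    if doubling_time_years <= 0:
        raise ValueError("doubling time must be positive")
    return math.log(2) / doubling_time_years


def doubling_time_from_growth_rate(rate_per_year: float) -> float:
    """Doubling time (years) implied by an exponential growth rate per year."""
    if rate_per_year <= 0:
        raise ValueError("growth rate must be positive")
    return math.log(2) / rate_per_year


if __name__ == "__main__":
    # Doubling times reported in the abstract (years).
    for label, t_d in (("US", 0.86), ("Caribbean", 1.12)):
        r = growth_rate_from_doubling_time(t_d)
        print(f"{label}: doubling time {t_d} years, implied rate {r:.3f} per year")
```

A shorter doubling time corresponds to a larger growth rate, so under this relation the US estimate implies faster early expansion than the Caribbean estimate.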
Change history
02 October 2016
In the online version of this paper the images in Figures 2 and 3 were switched; this has been corrected.
References
1. Korber, B. et al. Timing the ancestor of the HIV-1 pandemic strains. Science 288, 1789–1796 (2000)
2. Gilbert, M. T. et al. The emergence of HIV/AIDS in the Americas and beyond. Proc. Natl Acad. Sci. USA 104, 18566–18570 (2007)
3. Holmes, E. C. When HIV spread afar. Proc. Natl Acad. Sci. USA 104, 18351–18352 (2007)
4. Pape, J. W. et al. The epidemiology of AIDS in Haiti refutes the claims of Gilbert et al. Proc. Natl Acad. Sci. USA 105, E13 (2008)
5. Auerbach, D. M., Darrow, W. W., Jaffe, H. W. & Curran, J. W. Cluster of cases of the acquired immune deficiency syndrome. Patients linked by sexual contact. Am. J. Med. 76, 487–492 (1984)
6. Stevens, C. E. et al. Human T-cell lymphotropic virus type III infection in a cohort of homosexual men in New York City. J. Am. Med. Assoc. 255, 2167–2172 (1986)
7. Szmuness, W., Stevens, C. E., Zang, E. A., Harley, E. J. & Kellner, A. A controlled clinical trial of the efficacy of the hepatitis B vaccine (Heptavax B): a final report. Hepatology 1, 377–385 (1981)
8. Koblin, B. A., Morrison, J. M., Taylor, P. E., Stoneburner, R. L. & Stevens, C. E. Mortality trends in a cohort of homosexual men in New York City, 1978–1988. Am. J. Epidemiol. 136, 646–656 (1992)
9. Jaffe, H. W. et al. The acquired immunodeficiency syndrome in a cohort of homosexual men. A six-year follow-up study. Ann. Intern. Med. 103, 210–214 (1985)
10. Foley, B., Pan, H., Buchbinder, S. & Delwart, E. L. Apparent founder effect during the early years of the San Francisco HIV type 1 epidemic (1978–1979). AIDS Res. Hum. Retroviruses 16, 1463–1469 (2000)
11. Centers for Disease Control (CDC). A cluster of Kaposi’s sarcoma and Pneumocystis carinii pneumonia among homosexual male residents of Los Angeles and Orange Counties, California. MMWR Morb. Mortal. Wkly. Rep. 31, 305–307 (1982)
12. McKay, R. A. Imagining ‘Patient Zero’: Sexuality, Blame, and the Origins of the North American AIDS Epidemic. Doctoral thesis, Univ. of Oxford (2011)
13. Harden, V. A. AIDS at 30: A History (Potomac Books, 2012)
14. Darrow, W. W. Trip report to New York City, July 12–16 and August 3–6, 1982. CDC Task Force on AIDS, internal communication (3 September 1982)
15. Darrow, W. W. Time–space clustering of KS cases in the City of New York: evidence for horizontal transmission of some mysterious microbe. CDC Task Force on Kaposi’s Sarcoma and Opportunistic Infections, internal communication (3 March 1982)
16. Darrow, W. W. & Auerbach, D. M. Los Angeles cluster: background. CDC Task Force on Kaposi’s Sarcoma and Opportunistic Infections, internal communication (12 May 1982)
17. Shilts, R. And the Band Played On: Politics, People, and the AIDS Epidemic (St. Martin’s Press, 1987)
18. McKay, R. A. “Patient Zero”: the absence of a patient’s view of the early North American AIDS epidemic. Bull. Hist. Med. 88, 161–194 (2014)
19. Moss, A. R. In response to: AIDS without end. New York Rev. Books 35, 60 (1988)
20. Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014)
21. Bruen, T. C., Philippe, H. & Bryant, D. A simple and robust statistical test for detecting the presence of recombination. Genetics 172, 2665–2681 (2006)
22. Martin, D. P., Murrell, B., Golden, M., Khoosal, A. & Muhire, B. RDP4: detection and analysis of recombination patterns in virus genomes. Virus Evol. 1, vev003 (2015)
23. Drummond, A. J., Suchard, M. A., Xie, D. & Rambaut, A. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol. Biol. Evol. 29, 1969–1973 (2012)
24. Rambaut, A. Estimating the rate of molecular evolution: incorporating non-contemporaneous sequences into maximum likelihood phylogenies. Bioinformatics 16, 395–399 (2000)
25. Lemey, P., Rambaut, A., Drummond, A. J. & Suchard, M. A. Bayesian phylogeography finds its roots. PLOS Comput. Biol. 5, e1000520 (2009)
26. Drummond, A. J., Ho, S. Y. W., Phillips, M. J. & Rambaut, A. Relaxed phylogenetics and dating with confidence. PLoS Biol. 4, e88 (2006)
27. Rambaut, A., Lam, T. T., de Carvalho, L. & Pybus, O. G. Exploring the temporal structure of heterochronous sequences using TempEst. Virus Evol. 2, DOI: http://dx.doi.org/10.1093/ve/vew007 (2016)
28. Gill, M. S. et al. Improving Bayesian population dynamics inference: a coalescent-based model for multiple loci. Mol. Biol. Evol. 30, 713–724 (2013)
29. Faria, N. R. et al. HIV epidemiology. The early spread and epidemic ignition of HIV-1 in human populations. Science 346, 56–61 (2014)
30. Edwards, C. J. et al. Ancient hybridization and an Irish origin for the modern polar bear matriline. Curr. Biol. 21, 1251–1258 (2011)
31. Minin, V. N. & Suchard, M. A. Counting labeled transitions in continuous-time Markov models of evolution. J. Math. Biol. 56, 391–412 (2008)
32. Lemey, P. et al. Unifying viral genetics and human transportation data to predict the global transmission dynamics of human influenza H3N2. PLoS Pathog. 10, e1003932 (2014)
33. Suchard, M. A. & Rambaut, A. Many-core algorithms for statistical phylogenetics. Bioinformatics 25, 1370–1376 (2009)
34. Gräf, T. et al. Contribution of epidemiological predictors in unraveling the phylogeographic history of HIV-1 subtype C in Brazil. J. Virol. 89, 12341–12348 (2015)
Acknowledgements
We thank C. Stevens and D. Hemmerlein for facilitating access to archival sera; G.-Z. Han, A. Bjork, W. Switzer, V. Sullivan, R. Ruboyianes and P. Sprinkle for technical assistance; T. Spira and M. Owen for geographical data on some published sequences; and the NIH AIDS Reagent program for providing reference virus samples US657 and HT599. W. W. Darrow led the initial 1982 cluster investigation and provided R.A.M. with access to his copies of archival CDC documents. This work was supported by NIH/NIAID R01AI084691 and the David and Lucile Packard Foundation (M.W.); the Wellcome Trust (080651), the University of Oxford’s Clarendon Fund, the Economic and Social Research Council (PTA-026-27-2838), and a J. Armand Bombardier Internationalist Fellowship (R.A.M.); the Research Fund KU Leuven (Onderzoeksfonds KU Leuven, Program Financing no. PF/10/018) and the ‘Fonds voor Wetenschappelijk Onderzoek Vlaanderen’ (FWO) (G066215N) (P.L.); and NSF DMS 1264153, NIH R01 HG006139 and NIH R01 AI107034 (M.A.S.).
Author information
Contributions
M.W., H.W.J., P.L. and R.A.M. conceived the study. T.D.W. and M.W. designed the RNA jackhammering method. T.D.W. generated the sequences. B.A.K. provided serum samples from New York City. W.H. and T.G. acquired specimens and provided serological data. D.E.T. provided conceptual input. M.W., M.A.S. and P.L. prepared the data sets and performed the phylogenetic analyses. R.A.M. performed the historical analyses. M.W., H.W.J., P.L. and R.A.M. wrote the paper. All authors discussed the results and commented on the manuscript. The findings and conclusions in this report are those of the author(s) and do not necessarily represent the official position of the Centers for Disease Control and Prevention.
Ethics declarations
Competing interests
A patent, ‘Methods and systems for RNA or DNA detection and sequencing’ (US patent application 62/325,320), has been filed with the United States Patent and Trademark Office. It will be used to facilitate the licensing of this methodology.
Additional information
Reviewer Information Nature thanks K. Andersen and the other anonymous reviewer(s) for their contribution to the peer review of this work.
Extended data figures and tables
Extended Data Figure 1 Jackhammering schematic and primer panels and pools.
a–d, Detection and amplification of target RNA molecules in old, degraded, low-titre samples. For the purposes of illustration, consider a tube with 10^13 RNA molecules, but (because of the low RNA quality) only one molecule that is (i) capable of being primed by the given reverse primer(s) and (ii) long enough to form a 200-bp product. a, Conventional RT–PCR with a long amplification product, oversized for a sample with RNA less than ~200 bases in length. b, RT–PCR with a shorter amplification product. c, Use of multiple primer pairs to increase the chance of at least one PCR-positive result. d, The jackhammering approach, which overcomes the problems encountered in a–c by (i) targeting an extensive panel of short amplicons appropriately sized to the level of RNA survival in the sample, (ii) conducting reverse transcription with pools of primer pairs that amplify discrete, non-overlapping genomic regions, and (iii) employing a multiplex pre-amplification step, in the tube with the reverse transcription product, to generate sufficient DNA to ensure that each aliquot from it contains numerous template molecules for final PCR amplification. In this schematic, we show just two primer pairs per pool, but we used pools of ten pairs with our largest primer panels (shown in e, HXB2 numbering along HIV-1 genome). With a pool of 10 primer pairs and 10 final reactions, one can reliably recover 10 bands for sequencing. Five such pools (one entire panel of 50 pairs) allow complete HIV-1 genome recovery even in heavily degraded samples.
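The sampling argument behind the pre-amplification step in d can be illustrated with a small calculation. The sketch below is an idealised model, not the authors' protocol: it assumes each template copy ends up in one of the equal aliquots independently and at random, and the function names are hypothetical. Under that assumption, a single surviving template reaches any given aliquot with probability 1/k, whereas many pre-amplified copies make it very likely that every aliquot contains template.

```python
def p_aliquot_has_template(n_templates: int, n_aliquots: int) -> float:
    """Probability that one given aliquot receives at least one template copy.

    Assumes (for illustration only) that each of the n_templates copies lands
    in one of n_aliquots equal aliquots independently and uniformly at random.
    """
    if n_templates < 0 or n_aliquots < 1:
        raise ValueError("need n_templates >= 0 and n_aliquots >= 1")
    return 1.0 - (1.0 - 1.0 / n_aliquots) ** n_templates


def primer_pairs_in_panel(n_pools: int, pairs_per_pool: int) -> int:
    """Total primer pairs in a panel built from equally sized pools."""
    return n_pools * pairs_per_pool


if __name__ == "__main__":
    # One amplifiable molecule split across 10 final reactions.
    print(p_aliquot_has_template(1, 10))
    # Hypothetical copy number after pre-amplification (assumed value).
    print(p_aliquot_has_template(1000, 10))
    # Five pools of ten primer pairs, as in the legend.
    print(primer_pairs_in_panel(5, 10))
```

The copy number of 1000 is an assumed figure chosen only to show the trend; the legend does not state how many copies the pre-amplification step produces.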
Extended Data Figure 2 Maximum clade credibility (MCC) tree summaries of Bayesian spatio-temporal reconstructions based on complete HIV-1 genome data.
a, ‘full genome 46’, b, ‘full genome 38’. The tips of the trees correspond to the year of sampling while the branch (and node) colours reflect location: the sampling location for the tip branches and the inferred location for the internal branches. AF, Africa; CB, Caribbean; US, the United States; CA, California; GA, Georgia; NY, New York. The diameters of the internal node circles reflect posterior location probability values. Thick outer circles represent internal nodes with posterior probability support >0.95. Grey bars indicate the 95% credibility intervals for the internal node ages. The tree in b represents the fully annotated version of the tree in Fig. 1 in the main text.
Extended Data Figure 3 Maximum clade credibility (MCC) tree summaries of Bayesian spatio-temporal reconstructions based on different genome region data sets.
MCC trees for the same strains are shown for a, gag, b, pol, c, env and d, the complete genome. The tips of the trees correspond to the year of sampling while the branch (and node) colours reflect location: the sampling location for the tip branches and the inferred location for the internal branches. AF, Africa; CB, Caribbean; US, the United States. Tip labels are provided for the newly obtained archival HIV-1 genomes. The diameters of the internal node circles reflect posterior location probability values. Thick outer circles represent internal nodes with posterior probability support >0.95. We also depict the posterior probability densities for the time of the introduction event from the Caribbean into the US on the time scale of the trees.
Extended Data Figure 4 Maximum likelihood phylogenies for the different genome region data sets.
a, gag, b, pol, c, env and d, the complete genome. We analysed the same data sets as in Extended Data Fig. 3. The diameters of the internal node circles reflect bootstrap support values. We manually coloured the branches in a similar way as for the Bayesian phylogeographic reconstructions.
Extended Data Figure 5 Maximum clade credibility (MCC) tree summaries of Bayesian spatio-temporal reconstructions based on different env data sets.
a, ‘env 105’, b, ‘env 74’. The tips of the trees correspond to the year of sampling while the branch (and node) colours reflect location: the sampling location for the tip branches and the inferred location for the internal branches. AF, Africa; CB, Caribbean; US, the United States; CA, California; GA, Georgia; NJ, New Jersey; NY, New York; PA, Pennsylvania. The diameters of the internal node circles reflect posterior location probability values. Thick outer circles represent internal nodes with posterior probability support >0.95. We also depict the posterior probability density for the time of the introduction event from the Caribbean into the US on the time scales of the trees. The three partial env sequences from San Francisco (SF) in 1978 (ref. 10) are highlighted with bullets.
Extended Data Figure 6 Maximum clade credibility (MCC) tree summaries of Bayesian spatio-temporal reconstruction comparing early and late strains.
a, ‘env 133’, b, only ‘late’ sequences from ‘env 133’. In a, we classified US sequences as ‘early’ if they were sampled before 1985 and as ‘late’ if they were sampled in or after 1985. In b, the analysis was conducted on an empirical tree distribution of ‘env 133’ from which we pruned early US sequences (in grey), but we still annotate the reconstruction on the complete phylogenies for reference. The tips of the tree correspond to the year of sampling while the branch (and node) colours reflect location: the sampling location for the tip branches and the inferred location for the internal branches. AF, Africa; CB, Caribbean; US early, the United States sampled <1985; US late, the United States sampled in or after 1985; CA, California; GA, Georgia; NC, North Carolina; NY, New York. The diameters of the internal node circles reflect posterior location probability values. Thick outer circles represent internal nodes with posterior probability support >0.95.
Extended Data Figure 7 A cluster of 40 early AIDS patients linked through sexual contact.
Reprinted from figure 1 of ref. 5 with permission from Elsevier.
Extended Data Figure 8 Jackhammering validation with reference viruses.
a, The consensus sequences for primer panels HIVM and HIVR (‘RMcon’ suffix) were included, with previously published sequences for a US (US657) virus and a Haitian (HT599) virus, in a maximum likelihood tree. The two clusters of paired sequences are highlighted by coloured boxes. b, Plot of the root-to-tip genetic distance against sampling time for the tree in a. The colours for the data points are consistent with those used for sampling locations in the phylogenies (the two African outgroup tips are not shown for clarity). The data points with black circles represent the published sequences while the data points with a target symbol represent the newly obtained sequences.
Extended Data Figure 9 Plots of the root-to-tip genetic distance against sampling time for different genome region data sets (gag, pol, env and the complete genome).
We used TempEst (ref. 27) to obtain exploratory regressions based on the maximum likelihood trees (Extended Data Fig. 4). Each data point represents a tip; colours are consistent with those used for sampling locations in the phylogenies. The US data points with black circles represent the new genomes dating back to 1978–1979. The data point with the target symbol represents the Patient 0 genome. In each plot, we provide the R^2 for the regression and the slope, reflecting the evolutionary rate (in substitutions per site per year).
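The kind of exploratory regression described in this legend can be sketched with ordinary least squares: root-to-tip distance is regressed on sampling date, the slope is read as an evolutionary rate in substitutions per site per year, and the x-intercept gives a rough date for the root. This is a minimal illustration, not TempEst itself; the function name and the example data are hypothetical and none of the numbers come from the study.

```python
def root_to_tip_regression(dates, distances):
    """Least-squares fit of root-to-tip distance against sampling date.

    Returns (slope, intercept, r_squared, root_date), where slope is the
    implied rate (substitutions per site per year) and root_date is the
    x-intercept, i.e. the date at which the fitted distance is zero.
    """
    if len(dates) != len(distances) or len(dates) < 2:
        raise ValueError("need two equal-length sequences with >= 2 points")
    n = len(dates)
    mean_t = sum(dates) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in dates)
    syy = sum((d - mean_d) ** 2 for d in distances)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    if sxx == 0 or syy == 0 or sxy == 0:
        raise ValueError("dates and distances must vary and be correlated")
    slope = sxy / sxx
    intercept = mean_d - slope * mean_t
    r_squared = (sxy * sxy) / (sxx * syy)
    root_date = -intercept / slope
    return slope, intercept, r_squared, root_date


if __name__ == "__main__":
    # Hypothetical tips: sampling years and root-to-tip distances
    # (substitutions per site). Values are made up for illustration.
    years = [1978.5, 1979.0, 1983.0, 1990.0, 2000.0]
    dists = [0.021, 0.023, 0.030, 0.045, 0.064]
    slope, intercept, r2, root = root_to_tip_regression(years, dists)
    print(f"rate {slope:.5f} subs/site/year, R^2 {r2:.3f}, root date {root:.1f}")
```

A high R^2 indicates that the data behave in a roughly clock-like way; the regression is exploratory and does not replace the Bayesian dating used for the main estimates.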
Supplementary information
Supplementary Information
This file contains a Supplementary Discussion, Supplementary References and Supplementary Tables 1-2. (PDF 1016 kb)
About this article
Cite this article
Worobey, M., Watts, T., McKay, R. et al. 1970s and ‘Patient 0’ HIV-1 genomes illuminate early HIV/AIDS history in North America. Nature 539, 98–101 (2016). https://doi.org/10.1038/nature19827
DOI: https://doi.org/10.1038/nature19827